Sociometrics And Human Relationships: Analyzing Social Networks To Manage Brands, Predict Trends, And Improve Organizational Performance [1st Edition] 1787141136, 9781787141131, 1787141128, 9781787141124, 1787147258, 9781787147256

Sociometrics and Human Relationships translates the latest academic research into practical business strategies and tech

English Pages 508 Year 2017

Table of contents :
Front Cover
Sociometrics and Human Relationships
Copyright Page
Contents
Acknowledgments
1 Introduction
1.1.1. Part I — Trend Prediction by Analyzing Social Networks
1.1.2. Part II — Analyzing Structure, Dynamics, and Content of Networks with Condor
1.1.3. Part III — Automatic Media Insights COIN Assessment (AMICA)
1.1.4. Part IV — Appendix — Useful Machine Learning and Graph Analysis Tools
1.2. Key Takeaways of This Book
1.3. Study Plan for a One-Semester Course
1.4. Sample Course Syllabus
Part I. Trend Prediction by Measuring Social Networks
2 Coolfarming Organizations
2.1. Knowledge Flow Optimization through Organizational Social Network Analysis
2.2. The Coolfarming Data Collection and Analysis Process
2.2.1. Assessing the Organization’s Communication Patterns
2.2.2. Benchmarking the Organization’s Communication Patterns against Those Seen in Other Organizations
2.2.4. Virtual Mirroring
3.1. Measuring Collective Awareness
3.2. The Coolhunting Process — Finding Trends by Finding Trendsetter
4 The Six Honest Signals of Collaboration
4.1. The Honest Signals Have Different Meanings for Different Organizations
4.3. Dealing with Privacy Concerns
4.4. How to Apply Knowledge Flow Optimization
4.5. Four Examples
4.6. Areas of E-Mail-Based SNA
4.7. Improving Financial Capital through Optimizing Social Capital
5 Essentials of Social Network Analysis and Statistics
5.1. Basics of Social Network Analysis (SNA)
5.2. Basics of Statistics
6 How Ideas Spread in Online Social Networks — Readings
6.1. Theories of Information Diffusion
6.2. Spreading Ideas on Facebook
6.3. Finding Fake Reviews through Machine Learning
6.4. Measuring Financial Performance
6.5. Calculating Demographic Information
6.6. Predicting Election Outcome
Part II. Analyzing Structure, Dynamics, and Content of Networks with Condor
7 The Four-Step Analysis Process
7.1. Social Media Fetchers
7.3. Social Media Visualizers
7.4. Social Media Exporters
8 Getting Started with Condor
8.1. Analyzing the Facebook Wall with Condor
8.2.1. Step 1 — Fetch Data
8.2.2. Step 2 — Process
8.2.3. Step 3 — Visualize
8.2.4. Step 4 — Export
8.3. Measuring the Importance of Brands through Betweenness of Actors in Bipartite Graphs
8.4. Pruning the Leaves in a Graph
8.5. Degree-of-Separation Search with Google CSE
8.6. Degree-of-Separation Search with Twitter
8.7. Wikipedia Search
9 Analyzing E-Mail with Condor
9.1. Creating a Virtual Mirror of Your Own Mailbox
9.1.1. Drawing the Term Graph
9.1.2. Removing the Mailbox Owner
9.2. Finding COINs through Community Detection
9.3. Creating a Virtual Mirror of an Organization
9.4. Analyzing Hillary Clinton’s Mail
9.5. Organizational Aspects of E-Mail-Based SNA
9.7. (Partial) List of E-Mail Studies Conducted by the Author in Various Organizations
10 Calculating Personality Characteristics from E-Mail
10.1. Calculating Correlations between FFI and E-Mail
10.2. Developing a General Prediction Formula
10.2.1. Neuroticism
10.2.4. Agreeability
10.2.5. Conscientiousness
10.3. Adding Gender, Ethnicity, and Nationality as Control Variables
10.3.1. Extroversion
10.3.2. Agreeability
10.4. Follow-on Exercises
11 Predicting Criminal Intent from E-Mail — Analyzing the Enron E-Mail Archive
11.1. Exploratory Analysis
11.2. Identifying Criminal Actors through Their Honest Signals of Collaboration
11.3. “Tribefinder” — Identifying Criminals through Machine Learning in Condor
11.4. Follow-on Exercises
12 Coolhunting on the Internet with Condor
12.1. Expert Analysis — Websites and Blogs
12.2. Swarm Analysis — Wikipedia
12.3. Analysis of the Crowd — Twitter
12.4. Follow-on Exercises
12.5. (Partial) List of Internet Coolhunting Studies
13 Coolhunting — Francogeddon
13.1. Follow-on Exercises
14 Coolhunting the US Presidential Elections
14.1. Bernie Sander’s Presidential Campaign — The Perfect COIN
14.2. Coolhunting Bernie Sanders, Hillary Clinton, Jeb Bush, and Donald Trump
14.3. Tribefinder on Twitter (Using Machine Learning)
14.4. Follow-on Exercises
Part III. Automatic Media Insights COIN Assessment (AMICA)
15 Inside Media Individual Collaboration (IMIC)
15.1. IMIC Annotation Process
16 Outside Media Individual Collaboration (OMIC)
16.1. OMIC Annotation Process
17 Inside Media Organizational Collaboration (IMOC)
17.1. IMOC Annotation Process
18 Outside Media Organizational Collaboration (OMOC)
18.1. OMOC Annotation Process
18.2. Follow-on Exercises
19.1. Survey of Individual Collaboration (SIC)
19.1.1. Individual Motivation
19.1.2. Organizational Motivation
19.1.3. Transparency
19.1.4. Fairness
19.1.5. Trust/Honesty
19.1.6. Forgiveness
19.1.7. Empathy/Listening
19.2. Survey of Organizational Collaboration (SOC)
19.2.1. Collective Consciousness
19.2.2. Leadership
19.2.3. Contribution/Sharing
19.2.4. Responsiveness/Respect
19.3. Sample Download
Part IV. Appendix — Useful Machine Learning and Graph Analysis Tools
Appendix A: Identifying Anti-Vaxxers through Machine Learning Using KNIME
Appendix B: Generating Nice Graph Pictures with Gephi
Appendix C: Sample Mid-Term Exam
Appendix D: References
Biography
Index

Author / Uploaded
Peter A. Gloor

SOCIOMETRICS AND HUMAN RELATIONSHIPS
Analyzing Social Networks to Manage Brands, Predict Trends, and Improve Organizational Performance

BY PETER A. GLOOR
MIT Center for Collective Intelligence, Massachusetts Institute of Technology, Cambridge, MA, USA

United Kingdom North America Japan India Malaysia China

Emerald Publishing Limited
Howard House, Wagon Lane, Bingley BD16 1WA, UK

First edition 2017
Copyright © 2017 Peter A. Gloor

Reprints and permissions service
Contact: [email protected]

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-1-78714-113-1 (Print)
ISBN: 978-1-78714-112-4 (Online)
ISBN: 978-1-78714-725-6 (Epub)

ISOQAR certified Management System, awarded to Emerald for adherence to Environmental standard ISO 14001:2004. Certificate Number 1985 ISO 14001

CONTENTS

Acknowledgments

1. Introduction
1.1. Roadmap
1.2. Key Takeaways of This Book
1.3. Study Plan for a One-Semester Course
1.4. Sample Course Syllabus

PART I. TREND PREDICTION BY MEASURING SOCIAL NETWORKS

2. Coolfarming Organizations
2.1. Knowledge Flow Optimization through Organizational Social Network Analysis
2.2. The Coolfarming Data Collection and Analysis Process
3. Coolhunting and Trend Forecasting on the Web
3.1. Measuring Collective Awareness
3.2. The Coolhunting Process — Finding Trends by Finding Trendsetter
4. The Six Honest Signals of Collaboration
4.1. The Honest Signals Have Different Meanings for Different Organizations
4.2. Virtual Mirroring Leads to Change
4.3. Dealing with Privacy Concerns
4.4. How to Apply Knowledge Flow Optimization
4.5. Four Examples
4.6. Areas of E-Mail-Based SNA
4.7. Improving Financial Capital through Optimizing Social Capital
5. Essentials of Social Network Analysis and Statistics
5.1. Basics of Social Network Analysis (SNA)
5.2. Basics of Statistics
6. How Ideas Spread in Online Social Networks — Readings
6.1. Theories of Information Diffusion
6.2. Spreading Ideas on Facebook
6.3. Finding Fake Reviews through Machine Learning
6.4. Measuring Financial Performance
6.5. Calculating Demographic Information
6.6. Predicting Election Outcome

PART II. ANALYZING STRUCTURE, DYNAMICS, AND CONTENT OF NETWORKS WITH CONDOR

7. The Four-Step Analysis Process
7.1. Social Media Fetchers
7.2. Social Media Filters
7.3. Social Media Visualizers
7.4. Social Media Exporters
8. Getting Started with Condor
8.1. Analyzing the Facebook Wall with Condor
8.2. Sample Four-Step Analysis with Twitter
8.3. Measuring the Importance of Brands through Betweenness of Actors in Bipartite Graphs
8.4. Pruning the Leaves in a Graph
8.5. Degree-of-Separation Search with Google CSE
8.6. Degree-of-Separation Search with Twitter
8.7. Wikipedia Search
9. Analyzing E-Mail with Condor
9.1. Creating a Virtual Mirror of Your Own Mailbox
9.2. Finding COINs through Community Detection
9.3. Creating a Virtual Mirror of an Organization
9.4. Analyzing Hillary Clinton’s Mail
9.5. Organizational Aspects of E-Mail-Based SNA
9.6. Follow-on Exercises
9.7. (Partial) List of E-Mail Studies Conducted by the Author in Various Organizations
10. Calculating Personality Characteristics from E-Mail
10.1. Calculating Correlations between FFI and E-Mail
10.2. Developing a General Prediction Formula
10.3. Adding Gender, Ethnicity, and Nationality as Control Variables
10.4. Follow-on Exercises
11. Predicting Criminal Intent from E-Mail — Analyzing the Enron E-Mail Archive
11.1. Exploratory Analysis
11.2. Identifying Criminal Actors through Their Honest Signals of Collaboration
11.3. “Tribefinder” — Identifying Criminals through Machine Learning in Condor
11.4. Follow-on Exercises
12. Coolhunting on the Internet with Condor
12.1. Expert Analysis — Websites and Blogs
12.2. Swarm Analysis — Wikipedia
12.3. Analysis of the Crowd — Twitter
12.4. Follow-on Exercises
12.5. (Partial) List of Internet Coolhunting Studies
13. Coolhunting — Francogeddon
13.1. Follow-on Exercises
14. Coolhunting the US Presidential Elections
14.1. Bernie Sander’s Presidential Campaign — The Perfect COIN
14.2. Coolhunting Bernie Sanders, Hillary Clinton, Jeb Bush, and Donald Trump
14.3. Tribefinder on Twitter (Using Machine Learning)
14.4. Follow-on Exercises

PART III. AUTOMATIC MEDIA INSIGHTS COIN ASSESSMENT (AMICA)

15. Inside Media Individual Collaboration (IMIC)
15.1. IMIC Annotation Process
16. Outside Media Individual Collaboration (OMIC)
16.1. OMIC Annotation Process
17. Inside Media Organizational Collaboration (IMOC)
17.1. IMOC Annotation Process
18. Outside Media Organizational Collaboration (OMOC)
18.1. OMOC Annotation Process
18.2. Follow-on Exercises
19. Survey of Individual and Organizational Collaboration (SIC & SOC)
19.1. Survey of Individual Collaboration (SIC)
19.2. Survey of Organizational Collaboration (SOC)
19.3. Sample Download

PART IV. APPENDIX — USEFUL MACHINE LEARNING AND GRAPH ANALYSIS TOOLS

Appendix A: Identifying Anti-Vaxxers through Machine Learning Using KNIME
Appendix B: Generating Nice Graph Pictures with Gephi
Appendix C: Sample Mid-Term Exam
Appendix D: References
Biography
Index

ACKNOWLEDGMENTS

The tools and methods described in this book have been developed and tested over the last 12 years in the Collaborative Innovation Networks (COINs) seminar. I am deeply grateful to all my instructor colleagues, and of course to the hundreds of students from the United States, Finland, Germany, Switzerland, Chile, Italy, South Korea, and China who have contributed many creative ideas and have taught me what works and what does not. The COINs seminar was started at MIT Sloan in spring 2005. In fall of the same year, the seminar morphed into a virtual distributed course joined by students from Helsinki, supervised by Maria Paasivaara and Casper Lassenius; students from Cologne, tutored by Detlef Schoder and Kai Fischbach; and students from Savannah College of Art and Design (SCAD), lectured by Christine Z. Miller. In the meantime, the seminar has also repeatedly been taught at Pontificia Universidad Catolica Santiago de Chile, coached by Cristobal Garcia Herrera, and at University of Applied Sciences Northwestern Switzerland, where Michael Henninger has been the indispensable instructor. Since 2011, the students from Cologne have been coached first by Johannes Putzke, and since 2014 by Gloria Volkmann, while at University of Bamberg, students have been instructed by Kai Fischbach and Matthaeus Zylka.

The software tool Condor that is the basis of this course was started in 2003, when the Center for Digital Strategies at Dartmouth College, under the leadership of Hans Brechbühl and Eric Johnson, agreed to support Yan Zhao’s software development efforts as part of her Master’s thesis supervised by Fillia Makedon. For the next three years, Yan, ably supported by the algorithm genius of her husband Song Ye, built the first two versions of Condor, originally called TecFlow. At the end of 2006, she passed the baton to Renauld Richardet, who added Apache Lucene’s text processing capabilities. In 2008, Condor development continued in Switzerland at galaxyadvisors, funded by the Swiss Commission for Technology and Innovation (CTI). Michael Henninger, Hauke Fuehres, Martin Stangl, Lucas Broennimann, Marton Makai, and Kevin Zogg from the University of Applied Sciences Northwestern Switzerland (FHNW) worked on building a fundamentally revised version of Condor in the team of Manfred Vogel and André Csillaghy at the Institute for 4D Technologies (i4ds). Since 2013, Condor development has been done by my colleagues at galaxyadvisors, Marton Makai, Hauke Fuehres, and Joao Marcos Da Oliveira, supported from 2014 to 2015 by Karsten Packmohr.

This book is the product of many people working together over 14 years, building the tools and methods described here. First of all, I am grateful to Ken Riopelle and Michael Henninger, who have been essential in making the social media analysis tool Condor accessible to a wider audience beyond programmers and statisticians. Ken created the first Condor videos and wrote a comprehensive manual, the precursor of this book. Michael wrote the first tutorial for Condor in the COINs seminar at University of Applied Sciences Northwestern Switzerland. Ken Riopelle, Michael Henninger, and Lucas Broennimann provided valuable feedback on earlier versions of this manuscript. Ken also contributed the last section of Chapter 3 of Part II. My sincerest thanks to all of you: without your creative ideas, didactic talent, and Java development and software architecting skills, neither the COINs course nor Condor would exist.

1 INTRODUCTION

Imagine being able to spot if a customer is becoming really unhappy with your product and service — and do something about it before they actually leave you. Imagine ﬁnding out what the constituency of a politician or political party really thinks. Imagine ﬁnding out what your customers love and hate about your product. Imagine being able to identify your most creative employees, your external innovators, and lead users — and help them become even more creative. Imagine being able to predict who wants to leave your company, your department, or your project team — and not just identify them, but help them become happy and motivated workers again. Imagine identifying potentially fraudulent or risky behavior among your employees before they actually commit anything illegal.

If you are looking for answers to these and similar questions, read on. This book gives you a framework to analyze your organization from the inside, by mining e-mail, Skype, and calendar data, and from the outside, by crunching Twitter, Wikipedia, and blog data.

From your own and your organization’s e-mail, Skype, and calendar data, you can:
• Find out about the happiness of your employees (see Section 9.3).
• Find out about the satisfaction of your customers (see Section 9.3).
• Find out who might be leaving your company (see Section 9.3).
• Find your most creative and motivated employees (see Chapter 10).
• Find out about the willingness of your employees to take unnecessary risks (see Section 11.3).

From Twitter, Wikipedia, and blog interaction data, you can:
• Find out what your customers and prospects really think about your company and your brands (see Chapter 12).
• Measure the strength of your brand (see Chapter 12).
• Find out about the demographic profile of the customers and aficionados of your company and brands (see Section 14.3).
• Forecast the popularity and voter share of a politician (see Section 14.2).
• Find out about the demographic profile of the voters of a politician (see Section 14.3).

These are just a few of the use cases that we will address to study how humans communicate and collaborate inside the organization, through e-mail, chat, videoconferencing, and face-to-face communication, and outside it, on online social media. Better communication leads to better collaboration, which leads to more and better innovation!

This book describes algorithms and tools to find and support collaboration within and between organizations. Our approach puts a lens to the organization by mining electronic communications such as e-mail, sociometric badges, telephone, chat, online meetings, Web/videoconferencing, and calendars to make existing communication patterns visible. The Condor software tool, which has been developed over the past decade at the MIT Center for Collective Intelligence and the University of Applied Sciences Northwestern Switzerland, mines these electronic archives and generates a broad range of structural, temporal, and content-based social network metrics, which can be used to calculate and forecast all of the real-world insights mentioned above (Figure 1).

Figure 1: Focus of This Book.

This book provides a practical guide to Coolhunting and Coolfarming on online social media. It explains how to “Coolhunt,” that is, how to find cool trends by finding the trendsetters on Twitter, Facebook, Wikipedia, blogs, online forums, and e-mail. It also teaches how to optimize your own communication behavior by creating a personal virtual mirror from your own e-mail, Skype log, online calendar, or chat log. It then extends this approach to “Coolfarming” an organization: improving collaboration and innovation by finding the best communication behavior to reach a certain goal. It mirrors back to the organization its current communication behavior by mining its e-mail, phone, Web conferencing, or online calendar data. This virtual mirror of communication deficiencies helps the organization to change its communication behavior for better performance and innovation.

The first part of the book explains the theory behind Coolhunting and Coolfarming, the second part provides a series of in-depth hands-on tutorials to analyze online social networks, and the third part introduces Automatic Media Insights COIN Assessment (AMICA), a specific method that uses Condor and applies the procedures and processes introduced in Part II to measure and increase individual and organizational creativity and performance through virtual mirroring.

After having worked through the examples, you will be able to improve your own and your organization’s communication for better collaboration and better innovation:
• First, you will know more about yourself by understanding the social network in which you, as an individual, are embedded, through analyzing your mailbox and your Web network.
• Second, you will be able to understand and optimize the communication network of your organization by analyzing its e-mail and other communication archives. This analysis might increase an organization’s creativity, its employees’ satisfaction, or its sales success.
• Third, you will be able to identify your best customers, your key competitors, and your possible business partners through your communication patterns and position in online social media such as Twitter, blogs, Facebook, and Wikipedia.

This book is geared toward students and practitioners with a background in management, human resources, marketing, design, sociology, psychology, and the humanities. It includes numerous examples with the user-friendly software tool Condor, which analyzes all types of online social networks such as Twitter, Wikipedia, blogs, and Facebook, as well as e-mail. The book is a brief and targeted guide with step-by-step instructions, whose objective is to deliver immediately actionable insights for anybody interested in analyzing online social networks. It explains how to visualize, track, and manage brands, products, and topics on the Internet through online social media, and how to analyze organizations through their e-mail networks. The book translates the latest academic research into practical business strategies and techniques. It provides a wealth of examples of how to apply social network analysis (SNA) for the prediction of trends by mining Twitter, Wikipedia, blogs, and Facebook.

It also illustrates how to improve organizational performance by optimizing communication and collaboration using individual and organizational e-mail archives.

The book is based on a course on Collaborative Innovation Networks (COINs) that has been taught for the last 12 years to students forming virtual teams participating from universities in Boston, Savannah, Helsinki, Cologne, Brugg, Bamberg, Rome, and Chile,[1] with majors in business, statistics, education, design, computer science, psychology, and sociology. In this course, students use and analyze social media to answer complex questions impacting society. The course teaches students how to leverage virtual collaborative creativity in the Internet age. It helps them understand and apply the dynamics of online communication using e-mail, social media, Twitter, Wikipedia, and the Web. This is done using online SNA with Condor. The examples in this book have been drawn from class projects from this course. The book includes a free academic license of Condor to analyze dynamic semantic social networks.

[1] MIT, Savannah College of Art and Design, Aalto University Helsinki, University of Cologne, University of Applied Sciences Northwestern Switzerland, University of Bamberg, University Tor Vergata Rome, Pontificia Universidad Catolica Santiago Chile.

1.1. ROADMAP

1.1.1. Part I — Trend Prediction by Analyzing Social Networks

• Chapter 2, Coolfarming Organizations
This chapter describes the key principles of how innovation can be improved by better collaboration and better communication. It shows how analyzing social networks at companies by mining online communication archives, such as e-mail, Skype, calendars, and phone logs, works, and how organizational performance can be optimized through virtual mirroring.

• Chapter 3, Coolhunting and Trend Forecasting on the Web
This chapter gives an introduction to the key principles of Coolhunting. Coolhunting measures global consciousness by analyzing the wisdom (and madness) of the crowd on Twitter, the (paid) wisdom of experts on blogs and online newspapers, and the wisdom of swarms on Wikipedia, Facebook groups, and online forums.

• Chapter 4, The Six Honest Signals of Collaboration
This chapter introduces six social indicators of creative collaboration, “the six honest signals,” developed over the last 12 years by the MIT research group where Condor was created. The indicators are collected and measured through tweets, blog links, Wikipedia entries, e-mail archives, and body signals captured through sensors. These “honest signals” are predictive of the future creativity, performance, and outcomes of teams. Changing individual communication behavior to adhere to these six indicators will lead to better communication, better collaboration, and more innovative results. The six indicators are central leadership, rotating leadership, balanced contribution, rapid response, honest language, and shared context.

• Chapter 5, Essentials of Social Network Analysis and Statistics
The chapter gives a short introduction to SNA, which is needed to do a social media analysis. It describes actor-level metrics such as degree and betweenness centrality, contribution index, and path length, as well as group-level metrics such as density, group degree, and group betweenness centrality. It also introduces the basic statistical techniques (t-tests, correlation, regression), illustrated using the KNIME environment described in the appendix, that are needed to understand predictive analytics for forecasting organizational variables such as employee satisfaction, personality characteristics, or sales success based on e-mail communication in the organization. The same statistics are needed to analyze online social media such as Twitter to predict friends and foes of politicians, the outcome of elections, or who will win an Oscar.

• Chapter 6, How Ideas Spread in Online Social Networks — Readings
This chapter briefly presents the insights from 22 key papers that provide the theoretical background for the examples described in Part II. They are grouped into six topics: theories of information diffusion; how ideas spread on Facebook; how machine learning can be more accurate than human judgment in analyzing online social networks; how stocks and other financial indicators can be predicted from Twitter, Google, and Wikipedia; how demographic information can be mapped to real-world users by geographic and other indicators; and how the outcome of elections can be predicted from social media. If this book is used for a classroom course, students may be asked to read and present the papers in the classroom as part of the course.

1.1.2. Part II — Analyzing Structure, Dynamics, and Content of Networks with Condor

The second part of the book describes how to use Condor for the Coolhunting and Coolfarming described in Part I.

• Chapter 7, The Four-Step Analysis Process
This chapter describes the key analysis process in Condor, starting with collecting communication data not only from Twitter, Facebook, Wikipedia, and blogs, but also from e-mail and other types of organizational communication archives such as calendars. The collected data is then preprocessed and cleaned using a series of content filters. In the next step, Condor provides a variety of visual analysis tools to explore the social network in many different ways. In the last step, the data is exported as actor-level variables and time series for further statistical analysis in tools like Excel, KNIME, R, or SPSS.

• Chapter 8, Getting Started with Condor
This chapter introduces the basics of Condor on Mac and Windows, including how to install MySQL and Java, which are needed for Condor. It uses precollected datasets from Twitter, Wikipedia, Facebook, blogs, and e-mail to teach the essentials of digital network, sentiment, and content analysis using Condor. It also introduces degree-of-separation search to measure the importance of influential people, brands, and products on the Web.

• Chapter 9, Analyzing E-Mail with Condor
This chapter teaches how to use Condor to analyze e-mail networks and discover hidden communities. Creating a social network to map a personal mailbox gives unprecedented insights into whom one is working most closely with, who the hidden influencers are, whom one likes the best, and whom one respects the most, and into how these measures change over time as the network of relations changes with new projects, employees, suppliers, and clients. The same SNA can also be extended to teams and entire companies. The second example in this chapter analyzes the network of an entire organization using the e-mail communication of a class of 50 students working in 10 teams. The third example analyzes Hillary Clinton’s e-mails released as part of the controversy about her use of a private e-mail server while she was serving as the US Secretary of State. The network map can be used as the foundation to improve communication within the organization by identifying bottlenecks, collaborators, hidden influencers, and people bridging structural holes. Beyond that, it can be used to improve knowledge flow in business processes, and to increase organizational effectiveness by tracking and improving employee satisfaction, customer satisfaction, employee turnover, and salesforce effectiveness. The analysis is based on the “six honest signals of collaboration” introduced in Part I.

Introduction


• Chapter 10, Calculating Personality Characteristics from E-Mail
In this chapter, we calculate the personality characteristics of individuals based on their e-mailing behavior. We compare the six honest signals of collaboration of individual actors with their personality characteristics measured through the Big Five personality characteristics. The Big Five personality test measures Neuroticism, Extraversion, Openness to experience, Agreeableness, and Conscientiousness through a survey and is commonly used to assess personality characteristics by scientific psychologists.

• Chapter 11, Predicting Criminal Intent from E-Mail: Analyzing the Enron E-Mail Archive
In this chapter, we try to catch criminals based on their e-mailing behavior, by analyzing the e-mail archive of Enron. The Enron e-mail archive documents the spectacular crash of Texan energy trading firm Enron at the end of 2001. Enron’s downfall has been widely publicized and has also been described in the book The Smartest Guys in the Room and in a movie of the same name. We identify differences in the six honest signals of collaboration between ordinary Enron employees and the convicted criminals, which in theory could be used to identify potential suspects in other e-mail archives. The chapter also introduces “tribefinder,” which uses Condor’s machine learning capability to identify people with communication patterns similar to the convicted criminals.

• Chapter 12, Coolhunting on the Internet with Condor
This chapter illustrates how to use Condor for analyzing the importance of a brand on the Internet.


Coolhunting for a brand consists of identifying the context of a brand, in particular, its competitors, measuring the relative strength of the brand and its competitors, and identifying the brand’s associated influencers, ranking them by their impact. Thanks to the availability of geotagging, this analysis can be done globally, and can also be restricted by geography, drilling down into different target markets. The Coolhunting process is illustrated using Condor by tracking a brand on the Web, Wikipedia, and Twitter.

• Chapter 13, Coolhunting: Francogeddon
This chapter illustrates Coolhunting on the Web and on Twitter, measuring the global awareness during Francogeddon, when on January 15, 2015, the Swiss National Bank unexpectedly removed the link between Euro and Swiss Franc, leading to huge global currency fluctuations and the bankruptcy of some hedge funds. The global sentiments of those events are analyzed through tweets about “Swiss Franc,” “Euro,” and “USD.”

• Chapter 14, Coolhunting the US Presidential Elections
This chapter gives a detailed example of Internet Coolhunting by analyzing and predicting the outcome of elections. The 2016 US Presidential election provides an excellent opportunity to study Coolhunting and Coolfarming. Not only are the US Presidential elections fought to a large extent on social media, but the differing styles of the candidates also offer a prime example of the difference between COIN-based and hierarchical leadership style. Using machine learning in “tribefinder,” Condor identifies members of the “Bernie Sanders tribe” and the “Donald Trump tribe,”


people with Twitter behavior similar to known Donald Trump and Bernie Sanders fans.

1.1.3. Part III: Automatic Media Insights COIN Assessment (AMICA)

AMICA is an assessment of individual and group behavior that measures, compares, and optimizes the collective mindset of an individual, an organization, or a company. AMICA identifies which types of communication patterns are indicative of the most efficient and effective collaboration and helps individuals and organizations to improve their collaborative behavior.

• Chapter 15, Inside Media Individual Collaboration (IMIC)
IMIC measures the collaboration behavior of individuals inside an organization, based on their e-mail, Skype, and calendar archives. It displays results as a comparative radar chart and offers a drill-down with social network charts, scatter plots, and bar charts.

• Chapter 16, Outside Media Individual Collaboration (OMIC)
OMIC measures the collaboration behavior of individuals seen from the outside through online social media such as Twitter, Facebook, Wikipedia, and Google search. It starts with an analysis of an individual’s footprint on Twitter and drills down through Wikipedia, Facebook wall, and Google Blog search analyses. This chapter also introduces Twitter EgoFetcher, which makes it possible to measure the echo chamber of an individual on Twitter, similar to an individual mailbox analysis.


• Chapter 17, Inside Media Organizational Collaboration (IMOC)
IMOC measures the collaborative performance of an organization based on the organization’s e-mail, Skype, and calendar archives. It compares departments, business units, or companies using the group measures of Condor.

• Chapter 18, Outside Media Organizational Collaboration (OMOC)
OMOC measures the collaboration behavior of companies from the outside through online social media such as Twitter, Facebook, Wikipedia, and Google search. It starts with an analysis of the organization’s footprint on Twitter and drills down with Wikipedia and Google Blog search analysis.

• Chapter 19, Survey of Individual and Organizational Collaboration (SIC & SOC)
The four automated online media-based assessments of AMICA are complemented by two survey-based assessments: the Survey of Individual Collaboration (SIC), focusing on the collaborativeness of the individual, and the Survey of Organizational Collaboration (SOC), with a focus on the organization.

1.1.4. Part IV (Appendix): Useful Machine Learning and Graph Analysis Tools

The appendix describes KNIME and Gephi, two tools besides Condor that are useful for mapping the collective mind on online social media.


• Appendix A: Identifying Anti-Vaxxers through Machine Learning using KNIME
This appendix describes how to use machine learning to distinguish supporters and objectors of the “AntiVaxxer” theory through their online behavior. It analyzes a dataset of tweets that was collected in Spring 2015. The resulting tweets, together with information about the tweeters, were used to manually classify two sets of tweets, one belonging to pro-vaxxers and the other belonging to anti-vaxxers. It illustrates the use of KNIME, an open-source text mining and data analytics tool with a visual front end, to investigate whether pro- and anti-vaxxers use function words in different ways, and thus to develop an automatic way of profiling online users based on their behavior.

• Appendix B: Generating Nice Graph Pictures with Gephi
This appendix discusses how to use the open-source graph drawing tool Gephi to draw and manipulate graphs with additional functionality and layout options not available in Condor, such as clustering, segmenting, and pruning the networks.

1.2. KEY TAKEAWAYS OF THIS BOOK

The goal of the book is to teach you how to read the collective mind through interpreting honest signals of collaboration. As all of us are part of the collective mind, understanding it will also change our own behavior. It will help you understand who you REALLY are and how you can become whom you would like to be.


Applying the principles of social quantum physics introduced in the companion book Swarm Leadership and the Collective Mind: Using Collaborative Innovation Networks to Build a Better Business, this book helps you learn how to build entanglement through empathy, and to reflect and reboot (Figure 2). In particular, after having worked through the examples described in this book, you will know about the following:

• A practical framework for trend prediction based on social media analysis.
• A process description of the “six honest signals of collaboration” as a key mechanism for trend prediction and for increasing organizational creativity, performance, and collaborativeness.
• A tutorial illustrating Coolhunting for trends by finding the trendsetters on online social media.
• Guidelines for Coolfarming through “virtual mirroring” to analyze individual, group, and organizational e-mail archives to increase personal, group, and organizational effectiveness and creativity.

Figure 2. Four Principles of Social Quantum Physics.


• AMICA, a method for conducting virtual mirroring and increasing collaboration through analyzing inside media such as e-mail, and outside media such as Twitter and Wikipedia, on the individual and organizational level.
• Step-by-step tutorials to get started with the user-friendly automated social media analysis and monitoring tool Condor.
• Using machine learning with “tribefinder” to segment online profiles according to sociodemographic criteria based on their different attributes.
• Detailed descriptions to tackle hard business problems through mining communication archives:
    • Track the happiness of your employees
    • Track the satisfaction of your customers
    • Predict which employees consider leaving your company
    • Locate your most creative and motivated employees
    • Forecast the propensity of your employees to take unnecessary risks
    • Track the opinion of customers and prospects about your company and your brands
    • Track the strength of your brand
    • Discover the demographic profile of friends and foes of your company and brands
    • Predict the popularity and voter share of a politician
    • Discover the demographic profile of the voters of a politician and lovers of a brand.


1.3. STUDY PLAN FOR A ONE-SEMESTER COURSE

This section describes the syllabus and study plan of the COIN course that has been taught for the last 12 years at dozens of universities around the world. This course is usually run as a collaboration between different universities, with students from different universities working together in teams. In this course, students learn how to build collective consciousness by becoming “entangled” through building empathy with team members from other countries and cultures (Figure 2). By reflecting on their own communication behavior, they will also get to know themselves better through the eyes of others. This will lead them to reboot, that is, to change their own behavior.

The emergence of online social networks opens up unprecedented opportunities to read the collective mind, discovering emergent trends while they are still being hatched by small groups of creative individuals. Using concepts from psychology and sociology, this course gives students the opportunity to lead and work in a wide range of projects analyzing large corpora of digital traces of human activity. The Web has become a mirror of the real world, allowing course participants to study and better understand why some new ideas change our lives, while others never make it off the drawing board of the innovator.

The aim of the COIN course is to track the emergence of new ideas through SNA. Using concepts from sociology and psychology, students predict what people will be doing next by analyzing their social interactions on three levels: (1) global: on the Internet, blogs, Twitter, Facebook,


and Wikipedia, (2) organizational: through e-mail/phone/chat, and (3) individual: through collecting body signals. Students measure “honest signals” of communication through tweets, bloglinks, Wikipedia entries, e-mails, chats, and phone archives, and body signals captured through cameras and other sensors.

The COIN seminar is a demanding course combining skills from many interdisciplinary fields:

• SNA
• Psychology, sociology, and management
• Using and building software tools for analyzing online social networks
• Selected statistical methods for data mining and data filtering
• Concept visualization and information modeling.

Learning Goals

• Students learn to work and cooperate in virtual international teams
• Students learn to analyze and visualize chosen topics with the help of an appropriate software tool
• On the global level, students learn to correlate Web sentiment with macroeconomic indicators, and blog buzz with the outcome of political elections
• On the organizational level, students learn to compare performance metrics such as revenue, productivity, peer ratings, and customer satisfaction with e-mail network metrics in a variety of settings.


1.4. SAMPLE COURSE SYLLABUS

Below is a sample syllabus for a one-semester course, with a two-hour class every week.

#Lesson  Topic (Two-Hour Lessons)

1   Introduction to Swarmcreativity (Based on the Companion Book Swarm Leadership and the Collective Mind)
    Preparatory reading: Chapter 2, Chapter 3, Chapter 4

2   Introduction to Condor: Following the introductory Condor chapter of this book, students experiment with Condor in class (Chapter 8)
    Preparatory reading: Chapter 7

3   Basics of Social Network Analysis and Statistics: If the students have no previous experience in SNA, a two-hour class explains the basic SNA metrics such as betweenness and degree centralities of actors and networks, t-tests, correlations, and regressions (Chapter 5)

4   Presenting an Analysis of Own E-Mail Network in Class: As a first assignment, each student briefly presents the results of analyzing her/his own e-mailbox, Skype network, or Facebook wall (described in Section 9.1)
    Preparatory reading: Chapter 9

5   Presenting Papers 1: In five-minute presentations, students present the first 13 papers (Chapter 6)

6   Presenting Results of Individual Coolhunting: Each student briefly presents the results of an individual Web Coolhunting project about a topic of her/his choice, collecting and analyzing Twitter, blog, and Wikipedia data.
    Preparatory reading: Chapter 12

7   Mid-Term Exam: Consists of six to eight questions on SNA, statistics, Coolhunting, and Coolfarming. The second part of the exam consists of a Condor Web Coolhunting task. A sample mid-term exam can be found in the appendix.

8   Team Formation: Students from different sites introduce themselves briefly (one minute per student) using Skype, Hangout, or WebEx. Then the topics for the teamwork are presented by the instructors. Next, students sign up for a project, for example, using Doodle. Each student may choose two topics; there have to be students from at least two locations in each team, and a team may have at most five members. Students will cc all their team-specific e-mail traffic to a dummy e-mail address, for example, [email protected], to be used for the virtual mirror in Lesson 12.

9   Virtual Meeting 1: Each team presents in 5 to 8 minutes, using SCRUM. SCRUM is an agile prototyping-oriented software development method, where developers work in iterations called “sprints,” and show prototypes to each other frequently.
    Structure of the presentation per team:
    • Project goals
    • Overall project plan
    • Plan for first iteration
    • Your way-of-working

10  Presenting Scientific Papers 2: In five-minute presentations, students present the second group of 12 papers (Chapter 6)

11  Virtual Meeting 2
    Structure of the presentation per team (five minutes per team):
    • Goals of the project
    • Progress during the last iteration: explain what was done
    • Show the main results
    • Goals and plans for the next iteration
    • Output from the retrospective
    • Problems/questions?

12  Virtual Meeting 3/Virtual Mirror: Students get the e-mail data of their team and of the full class. Each team will present a virtual mirror as described in Section 9.3.
    Structure of the presentation per team (8 to 10 minutes per team):
    • Presentation of the virtual mirror
    • Goals of the project
    • Progress during the last iteration: explain what was done
    • Show the main results
    • Goals and plans for the next iteration
    • Output from the retrospective
    • Problems/questions?

13  Virtual Meeting 4
    Structure of the presentation per team:
    • Goals of the project
    • Progress during the last iteration: explain what was done
    • Show the main results
    • Goals and plans for the next iteration
    • Output from the retrospective
    • Problems/questions?

14  Final Presentations (10 minutes/team)
    • Goals of the project
    • Related work (what others have done in the same area)
    • Work process (how did you organize your work)
    • Results
    • Possible extensions
    • What worked well/what could have been done better (both from the team perspective, and advice for the instructors)


Virtual meetings can either be conducted in the local classrooms, with the classrooms at different sites connected by Web conferencing, or students may be allowed to connect from home. We found that a mix of on-site and off-site virtual meetings works best. In on-site virtual meetings, students are asked to participate from the classroom, to increase bonding and knowledge sharing between teams. In the off-site virtual meetings, students are allowed to join remotely from wherever is most convenient for them.


PART I. TREND PREDICTION BY MEASURING SOCIAL NETWORKS

This first theoretical part gives an introduction to the basic concepts of COINs, Coolhunting, and Coolfarming. Coolfarming means using dynamic semantic social network analysis to increase individual and organizational creativity by creating and nurturing COINs. Coolhunting means finding trends by finding the trendsetters by finding the COINs through dynamic semantic social network analysis.

© 2017 Peter A. Gloor


2 COOLFARMING ORGANIZATIONS

CHAPTER CONTENTS

• What is Coolfarming?
• Knowledge Flow Optimization
• The Coolfarming Data Collection and Analysis Process.

When Robinson Crusoe was stuck on a lonely island for years, in spite of plenty of food and a subtropical climate, he only had one wish, to finally meet and connect with other people. Relationships form the core of human existence. The way we communicate in our relationships is key for building private and professional success and happiness. In this book, we will learn how to analyze and measure individual, organizational, and global social networks by mining online communication archives such as e-mail, Twitter, Facebook, and blogs to increase collaboration and creativity by better communication.

In this initial chapter, we look at networks from the perspective of the individual (called ego networks) and from the perspective of the organization (called organizational networks). Our main means to analyze these networks is


communication archives: most prominently, e-mail logs, but also chat, online calendars, Web conferencing logs, and phone archives. The graph-theoretical foundation used to analyze these networks is Social Network Analysis (SNA). Classic SNA looks at the structure of networks; in our own work, we have added an analysis of the dynamics of network change over time, and an analysis of the content of the networks, for example, the content of e-mails or tweets. Dynamic and content-based SNA affords an X-ray into the inner workings of an organization, mapping the informal relationships that transcend organizational hierarchy. It gives an assessment of communication and knowledge flow, resulting in actionable data to optimize outcomes. The Condor software used in our examples provides an interactive dashboard to deep dive into the structures of ego and organizational networks.

Our approach puts a lens to the organization by mining e-mail archives and, as relevant, other electronic communications (e.g., telephone, chat, online meeting, Web/videoconferencing, calendars) to make existing communication patterns visible. The Condor software tool, which has been developed over the past decade at the MIT Center for Collective Intelligence and the University of Applied Sciences Northwestern Switzerland (FHNW), mines electronic archives (e-mail, phone, chat, Web conferencing, and sociometric badges, which are body-worn sensors) and generates a broad range of SNA metrics. In the following description, the term “e-mail” stands for all types of organizational communication archives.
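To make the dynamic view concrete, the following minimal Python sketch counts, for each month, how many distinct communication partners each actor has. It is only an illustration under stated assumptions: the message log, the actor names, and the helper `contacts_per_month` are hypothetical and do not reproduce Condor's own metrics or data formats.

```python
from collections import defaultdict
from datetime import date

# Hypothetical timestamped message log: (date, sender, recipient).
# The names and dates are made up for illustration only.
log = [
    (date(2016, 1, 5), "ann", "bob"),
    (date(2016, 1, 20), "ann", "cara"),
    (date(2016, 2, 3), "ann", "bob"),
    (date(2016, 2, 9), "bob", "cara"),
]

def contacts_per_month(log):
    """Map (year, month) -> actor -> set of distinct communication partners."""
    windows = defaultdict(lambda: defaultdict(set))
    for day, sender, recipient in log:
        key = (day.year, day.month)
        # Treat each message as an undirected tie between the two actors.
        windows[key][sender].add(recipient)
        windows[key][recipient].add(sender)
    return windows

# Print the number of partners per actor, one line per monthly window.
for month, actors in sorted(contacts_per_month(log).items()):
    sizes = {actor: len(peers) for actor, peers in sorted(actors.items())}
    print(month, sizes)
```

Comparing the per-month counts shows how an actor's position shifts over time, which is the kind of change a purely structural snapshot would miss.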


2.1. KNOWLEDGE FLOW OPTIMIZATION THROUGH ORGANIZATIONAL SOCIAL NETWORK ANALYSIS

Just as weather patterns predict sunshine or thunderstorms, communication flows allow us to anticipate positive and negative developments in groups of people. Like a weather forecast, an SNA can serve as an early warning system, revealing the organizational equivalent of sunny days with cool breezes, or of impending storms, in interactions between members of groups. Organizational forecasts of this kind are difficult to obtain by other means.

Business process reengineering forever changed the way companies do business, introducing a process focus and streamlining structured business processes. SNA can do the same for unstructured, knowledge-intensive processes. By visualizing the flow of knowledge (Figure 3), making it transparent, and reengineering its flow, organizations and individuals can become more creative, innovative, and responsive to change.

Figure 3: Knowledge Flow Is the Nervous System of an Organization.

SNA offers companies an opportunity to complement their organizational charts and business process maps with more fluid maps of communication flows. By making the communication flow transparent, organizations can make better use of people by freeing them from being buried in conventional multilayer hierarchies. By establishing flexible ad hoc workflows based on communication flows, people can assume more efficient roles, which also leads to increased individual motivation. Applying these insights to increase organizational creativity is what I call “Coolfarming.”¹

Coolfarming allows the organization to understand and optimize key parameters of organizational health, such as identifying its most creative employees and finding the communication patterns of creativity for that particular organization. It can also find the happiest employees and the communication patterns of satisfied employees, as well as identify the communication patterns of dissatisfied employees who are ready to leave the firm. It can identify the honest signals of happy and unhappy customers.

Coolfarming can be done from within the organization, by mining e-mail, calendar, and phone/Skype logs, and by measuring face-to-face interaction using sociometric badges, little devices worn on the body. It can also be done from the outside by mining Twitter, Facebook, blogs, and Wikipedia entries discussing the organization to be analyzed. On the outside, it can locate discussions about the relationship with the company on social media such as Twitter, blogs, and Facebook groups. It can identify productive and less productive collaboration with business partners by tracking e-mail exchange between company executives and their outside business partners. It can find social media linkages between the company and its business partners. Finally, it can also spot novel business ideas, for instance, by identifying new vocabulary picked up by employees through corporate e-mail in the outside discussion on online social media.

¹ Gloor (2010).

2.2. THE COOLFARMING DATA COLLECTION AND ANALYSIS PROCESS

The Coolfarming data collection process starts with setting up a way to continuously collect the organization’s communication archive (Figure 4). In the next step, outcome metrics such as customer or employee satisfaction need to be defined and measured. In the third step, these outcome metrics are compared against the social networking metrics, in particular, the six honest signals of collaboration introduced in more detail in Chapter 4. In the fourth step, the communication behavior of the organization is continuously tracked and mirrored back to the employees.

Figure 4: Coolfarming Data Collection and Analysis Process.


The Coolfarming process is therefore conducted in four successive steps:

1. Assessing the organization’s existing communication patterns and structures.
2. Benchmarking the organization’s communication patterns (its “honest signals”) against those seen in other organizations doing similar work.
3. Correlating and calibrating communication patterns against performance metrics.
4. Virtual mirroring: Showing individuals how far away they are from optimal communication, which will lead them to change their behavior, which in turn will lead to improved communication, resulting in better collaboration, leading to more innovation.
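Step 3 of the list above, correlating communication patterns against performance metrics, can be illustrated with a minimal Python sketch. It computes a Pearson correlation between one communication metric and one outcome metric across teams. The per-team values, the variable names `avg_response_hours` and `customer_satisfaction`, and the helper `pearson` are hypothetical and chosen only for illustration; they are not taken from Condor or from the datasets used in this book.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the common 1/n factor,
    # which cancels out in the ratio below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-team values (made up for illustration only):
# average e-mail response time in hours and a satisfaction score.
avg_response_hours = [2.0, 4.0, 6.0, 8.0, 10.0]
customer_satisfaction = [9.0, 8.0, 6.5, 6.0, 4.0]

r = pearson(avg_response_hours, customer_satisfaction)
print(f"Pearson r = {r:.2f}")
```

With real data, a strong correlation of this kind only flags a communication pattern worth investigating; it does not by itself show that the pattern causes the outcome.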

2.2.1. Assessing the Organization’s Communication Patterns

In the first phase, the social network within the organization is visualized through social network pictures, movies, and other charts and statistics. This way a communication matrix between different business areas can be constructed, and the interactions at the department, role, and employee levels can be analyzed. E-mail-based SNA of the organization on its own can provide insights into the following key points at the divisional, departmental, and role/individual levels:

• Who are key influencers? Who is central in the network?
• How do they behave? Do they contribute to discussions or filter them? Do they assume a collegial/creative work style? Do they respond quickly? What is the sentiment of their conversations?
• Potential bottlenecks and ways to alleviate those.
• Prospective future leaders.
• Hidden innovation teams (COINs).
• Spot individuals who build strong trust relationships to connect within the organization.
• Compare in-group and out-group communication behavior.

At the organizational level, e-mail-based SNA can address questions such as how business units interact with the rest of the organization and how outside partners interact with the organization. E-mail-based SNA can reveal otherwise invisible communication behaviors that transcend fixed workflows, revealing patterns present inside an organization and in its interactions with peer groups inside the corporation and outside organizations. Such analysis can assist senior management in coaching individuals and teams and redesigning key processes and organizational structures to foster creativity.
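As a rough illustration of the communication matrix and the question of who is central, the following Python sketch counts messages between departments and the number of distinct communication partners per actor (a simple degree measure). The message list, the department mapping, and the helper names `department_matrix` and `degree` are hypothetical; this is a sketch under those assumptions, not Condor's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical e-mail log as (sender, recipient) pairs, made up for illustration.
messages = [
    ("ann", "bob"), ("ann", "cara"), ("bob", "ann"),
    ("cara", "dan"), ("dan", "ann"), ("ann", "dan"),
]
# Hypothetical mapping of each actor to a department.
department = {"ann": "sales", "bob": "sales", "cara": "r&d", "dan": "r&d"}

def department_matrix(messages, department):
    """Count messages sent from each department to each department."""
    matrix = Counter()
    for sender, recipient in messages:
        matrix[(department[sender], department[recipient])] += 1
    return matrix

def degree(messages):
    """Number of distinct communication partners per actor (undirected)."""
    partners = defaultdict(set)
    for sender, recipient in messages:
        partners[sender].add(recipient)
        partners[recipient].add(sender)
    return {actor: len(peers) for actor, peers in partners.items()}

matrix = department_matrix(messages, department)
degrees = degree(messages)
# The actor with the most distinct partners is the most central by this measure.
most_central = max(degrees, key=degrees.get)

for (src, dst), count in sorted(matrix.items()):
    print(f"{src} -> {dst}: {count}")
print("most central actor:", most_central)
```

Degree is only the simplest centrality measure; questions about bottlenecks and bridging roles usually call for path-based measures such as betweenness, which are not computed here.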

2.2.2. Benchmarking the Organization’s Communication Patterns against Those Seen in Other Organizations

After the initial organizational fingerprint has been revealed, communication patterns within the organization can be compared against those in other organizations. In our projects at MIT and galaxyadvisors, we have studied over 100 different organizations from industries such as automotive, chemical, financial services, health care, management consulting, pharmaceutical, outsourcing, retail, technology, and nonprofit sectors, ranging in size from global top 100 firms to small start-ups, as well as collaborations that occur on the open Internet, for instance, on Wikipedia, Stack Overflow, and other open forums.

2.2.3. Correlating Communication Patterns against Performance Metrics

If the organization has performance metrics that it can share, these can be used to calibrate performance with communication patterns. Performance metrics could, for instance, be customer or employee satisfaction, sales success, completing projects on time and budget, or the number of patents filed. The correlations between communication behavior and performance variables can then be used to identify which communication patterns are associated with superior performance. These insights can then be taught to the members of the organization to optimize their communication behavior.

2.2.4. Virtual Mirroring

Showing individuals their own communication behavior, and telling them which behavior is desirable (based on Steps 1 to 3 above), will change their behavior toward being more collaborative, and thus more innovative. We were able to demonstrate significant performance


Figure 5: Virtual Mirroring Process.

improvements in earlier similar projects through virtual mirroring. Figure 5 describes the four steps of virtual mirroring. The last part of this book, “Automatic Media Insights COIN Assessment” (AMICA), introduces a framework to conduct virtual mirroring.

Before the project starts, key methodological, technical, and legal issues will need to be resolved:

• Agree on the number of actors to be analyzed: will the analysis be of the in-group only (focus exclusively on interactions between people inside the organization); the in-group and corporate peers (includes the above plus interactions between the organization and other units); or in-group, peers, and out-group (all of the above plus interactions with people in outside organizations).
• Agree on the time period to be analyzed.
• Decide if subject line and/or content is to be included in the analysis (without at least one of these, sentiment and innovative influencer analysis cannot be done).


• Decide whether this is a one-time analysis or if the long-term goal is to move toward a continuous collaboration dashboard.
• Resolve privacy/regulatory issues.
• Determine how e-mail can be accessed technically and how the data will be formatted and transmitted.

In the next chapter, we will learn how we can apply the same process of social media analysis and trend prediction to the open Internet through a process we call “Coolhunting.”

MAIN LESSONS LEARNED

• Relationships are at the core of human existence; understanding them allows organizations and individuals to increase their creativity, performance, and happiness.
• Knowledge Flow Optimization streamlines unstructured business processes by constantly monitoring and improving electronic communication.
• Coolfarming nurtures and optimizes COINs through virtual mirroring.
• Comparing knowledge flow in organizational (inside) e-mail, Skype, and calendar networks with outcome variables indicative of performance (e.g., sales success, customer satisfaction, employee creativity) will lead to interventions to increase performance.

3 COOLHUNTING AND TREND FORECASTING ON THE WEB

CHAPTER CONTENTS

• Coolhunting measures collective awareness
• Coolhunting combines expert, swarm, and crowd on blogs, Wikipedia and forums, and Twitter
• Coolhunting finds trends by finding the trendsetters.

3.1. MEASURING COLLECTIVE AWARENESS

Does an organization, and thus ultimately humanity, show some sort of consciousness or self-awareness? One might think so, at least in moments such as on the day when Princess Diana died, or more recently, on that day in April 2013 when I was stuck at home in Cambridge while the Boston Marathon bomber was roaming at large in the neighborhood. In those intense moments, we feel maybe not “collectively intelligent” but certainly “collectively aware” or “collectively conscious.” If we meet a stranger in those moments, we know what they are


thinking, namely “it’s so sad Diana died” or “where might the marathon bomber be hiding and hitting next.” Moments like these motivate an informal definition of “organizational consciousness.” It is analogous to the human body, where the brain is conscious of the toe, and will respond differently depending on whether a person stubs his/her toe on the door or somebody else steps on it. Extending this metaphor, a “collectively conscious” organization will respond differently if somebody hits a member purposefully or if a member hurts himself/herself. Similarly to the neurons in the brain that communicate through their synapses to create consciousness, humans communicate by interacting with each other verbally, through text, or through other signals, either face-to-face or over long distance by phone or Internet.

To prove the existence of consciousness on the individual level, Descartes famously stated “cogito ergo sum” (“I think, therefore I exist”). Extending this definition to an organization: “if the organization thinks and acts as one cohesive organism, it exists” and thus shows collective consciousness. We define organizational consciousness as the common understanding in an organization’s global context, which allows the members of the organization to implicitly coordinate their activities and behaviors.

As an example of a global-level event, in the case of the Boston Marathon bomber, everybody in the Boston area was trying to stay abreast of the most recent developments on Twitter, Facebook, and the news, and looking out for traces of the terrorists. On the organizational level, a well-oiled team of software developers working together closely face-to-face, or using chat or e-mail, to debug a jointly developed application also shows


a high level of organizational consciousness, as they are able to coordinate their work with minimal use of words. In the research by our team described in this book, we aim to make this implicit understanding more measurable, similarly to brain researchers, who measure individual levels of consciousness by attaching probes to individual neurons and tracking the electrical current flowing through the synapses between the neurons. In our work, we measure interaction among people through “Coolhunting” on online media such as e-mail, Twitter, Facebook, and blog posts, applying a framework of “six honest signals of collaboration” to assess the level of global consciousness.

3.2. THE COOLHUNTING PROCESS: FINDING TRENDS BY FINDING TRENDSETTER

The Coolhunting approach distinguishes between three different sources of information: the crowd, the experts, and the swarm. The difference is explained well through the metaphor of Coolhunting for a restaurant as a tourist in a foreign city. Following all other tourists will bring us to the places where all the tourists go; these restaurants will be crowded, full of other tourists, expensive, and not particularly good. This is what following the crowd gives us, as the crowd likes to follow well-trodden paths. If we ask the concierge in our hotel for a recommendation, we will end up in a better restaurant, with better food, but it will most likely still be full of tourists, and much more expensive, as the concierge will be sending tourists to a good restaurant, but most likely to the one that pays him a kick-back for sending people there.


This is what following the advice of the expert brings us. The problem with experts is that they take kick-backs from the organizations they recommend, as they are paid to give advice, just like the rating agencies Moody’s and Standard & Poor’s, which are paid by the same companies and governments they are supposed to assess, leading to a serious conflict of interest.

As tourists in a foreign city, we will find the best places to eat if we visit the places popular with the local residents. The hard part is identifying the locals on the street and in a crowded restaurant, as they are hard to distinguish from the tourists. We might get some hints by looking at their clothing and listening to their language. We call this the swarm, leading in our restaurant example to the best meal at the lowest price.

Figure 6: Coolhunting on Social Media.

When doing Coolhunting on social media (Figure 6), we can make the same distinction between crowd, experts, and swarm, based on the media source. Twitter usually


gives us the wisdom (and madness) of the crowd, blogs and online newspapers give us the (paid) wisdom of the experts, whereas the (intrinsically motivated) swarm might be found among Wikipedia editors, in Facebook groups, and on subject-matter-specific online forums. Obviously, the intrinsically motivated swarm will give us the best information quality. Tracking the right hashtags on Twitter might also lead us to the swarm for a certain topic.

The Coolhunting overview consists of filling in the following 3 by 3 matrix, finding the key topics, key people and organizations, and key websites (Table 1). For example, for the topic “Social Determinants of Health,” doing a Coolhunting (with Condor) would give us the following terms (Table 2). Once we have the first context by reading the Wikipedia page about “Social Determinants of Health,” we can get a feel for the importance of the term by putting it into context using Google Trends. In our example, we compare “Social Determinants of Health” with “poverty reduction,” “minimum wage,” and “early childhood development.” We find that “minimum wage” is by far the most discussed of the four terms; the other three are all on the same (much lower) level of interest.

Table 1: Generic Coolhunting Overview.

Source | Key Topics | Key People and Organizations | Key Websites
Experts (from websites) | | |
Swarm (from Wikipedia) | | |
Crowd (from Twitter) | | |


Table 2: Coolhunting Overview of the Example of the Topic “Social Determinants of Health.”

Source | Key Topics | Key People and Organizations | Key Websites
Experts (from websites) | Education, graduation rate | Thoraya Ahmed Obaid, Jake Grovum | www.cnn.com, www.forbes.com
Swarm (from Wikipedia) | Early childhood development | WHO | http://www.who.int/hia/evidence/doh/
Crowd (from Twitter) | Poverty, minimum wage | Bill Moyers, The Oregonian, Steny Hoyer | http://billmoyers.com/, http://www.oregonlive.com/

Image 1 (a)

(a) For color pictures see the online version of the images, available at http://www.ickn.org/sociometrics/

Detailed Coolhunting examples are provided in Chapter 12.


To summarize, Coolhunting consists of the following key steps:

1. Get an overview of the topic using Google, Wikipedia, and other relevant online sources.
2. Find the right search terms for Twitter, blog search, and Wikipedia; this involves repeated “trial and error” experiments with different search terms.
3. Calculate the strength of the brands and key people by constructing the degree-of-separation networks described in Section 8.5, combining the search terms of the Twitter, blog, and Wikipedia networks, and calculating the betweenness of the combined network.
4. Show the resulting networks and label the search terms and key people.
5. Present the key conclusions and unexpected findings of the Coolhunt.

In the next chapter, we will learn about the six honest signals of collaboration, which have been developed for e-mail-based analysis, but are similarly applicable to Twitter, blog, and Wikipedia Coolhunting results.
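The third step above can be illustrated with a small, self-contained sketch. It is not Condor's implementation and it does not reproduce the degree-of-separation construction of Section 8.5; it only shows the general idea of merging edge lists from several sources into one undirected network and computing unnormalized betweenness with Brandes' algorithm. The function names and the toy edges (brand_x, alice, and so on) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict, deque


def merge_networks(*edge_lists):
    """Union several undirected edge lists into one adjacency map."""
    adjacency = defaultdict(set)
    for edges in edge_lists:
        for a, b in edges:
            if a != b:  # ignore self-loops
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def betweenness(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        dist = dict.fromkeys(adjacency, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse breadth-first order.
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was visited from both of its endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical toy edges standing in for the three search-result networks.
twitter_edges = [("brand_x", "alice"), ("alice", "bob")]
blog_edges = [("bob", "carol")]
wikipedia_edges = [("carol", "dave")]

combined = merge_networks(twitter_edges, blog_edges, wikipedia_edges)
scores = betweenness(combined)
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, value)
```

In this toy network the actor who bridges the Twitter and blog sources ends up with the highest score, which is the kind of brokerage position that the betweenness calculation in step 3 is meant to surface.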

MAIN LESSONS LEARNED

• Coolhunting means finding signals of collective awareness on online social media such as blogs, Facebook, Twitter, Wikipedia, and forums.
• Coolhunting finds trends by finding the trendsetters.
• Coolhunting finds cool people by finding their COINs and measuring their betweenness in online social networks.
• Coolhunting distinguishes between the knowledge of the experts, the madness and wisdom of the crowds, which are both extrinsically motivated, and the wisdom of the swarm, people motivated intrinsically by their cause.
• We find experts on blogs, the crowd on Twitter, and the swarm on Wikipedia, Facebook, and online forums.

4 THE SIX HONEST SIGNALS OF COLLABORATION

CHAPTER CONTENTS

• The six honest signals of collaboration: strong leadership, balanced contribution, rotating leadership, responsiveness, honest sentiment, shared context
• Different interpretations for highly creative and high-performance settings
• Four-step process of Coolfarming: analyze, predict, mirror, optimize
• E-mail use cases: forecasting customer satisfaction, predicting employee attrition, and improving sales effectiveness and creativity of medical researchers
• Improving financial capital through optimizing social capital

Just like a satellite image allows the meteorologist to predict the weather of the next few days with surprising accuracy, interpreting an e-mail archive allows the analyst


to predict personality attributes of the mailbox owner. On the organizational level, the organization’s effectiveness, happiness, the satisfaction of its customers, the propensity of members to leave the organization, or the sales performance of teams and individuals can be predicted by analyzing its e-mail archive.

Over the last 15 years, our research group at the MIT Center for Collective Intelligence, University of Cologne, and University of Applied Sciences Northwestern Switzerland (FHNW) has studied hundreds of organizations through the lens of their social networks extracted from the organization’s e-mail archive. Our goal has been to develop a set of metrics and software tools that make informal communication in organizations as measurable as SAP makes payroll and accounting. Among many others, we have studied social networks at R&D organizations at car manufacturers, marketing departments at banks, sales teams at high-tech manufacturers, medical researchers and doctors at large hospitals, and service delivery teams at large consulting and service provider firms. In addition, we have also looked at open-source organizations like Eclipse software developers, Stack Overflow developers, Wikipedians, storywriters, and online communities of patients with chronic diseases. We studied these groups through their public e-mail archives, their Twitter feeds, their Facebook group pages, and dedicated online forums.

I first noticed that collaborative innovators show a highly specific communication behavior when I was working as a post-doc in the Advanced Network Architecture group at the MIT Lab for Computer Science in the early 1990s. This was right before the emergence of the Web. Tim Berners-Lee, the creator of the Web, had just joined


our group as a visiting scientist. I observed some marked differences in Tim’s behavior when compared to others, for instance, answering e-mails in minutes instead of in weeks, as was the habit in those early Internet days. Since then, I have also observed these differences many times in the communication networks constructed from the e-mail archives of successful organizations. In particular, I have tested this behavior in the COIN seminar, which I have been teaching with colleagues since 2005 at MIT, Aalto University Helsinki, the University of Cologne, and many other universities around the globe.

Figure 7: Social Network Picture of the Author’s COINs Seminar Network (Blue Dot in Center is the Author; color version online at http://www.ickn.org/sociometrics/).

In this seminar, for the duration of one semester, students form global virtual teams in different time zones, providing an excellent test bed for identifying


communication patterns of successful distributed teams. By asking students to share their e-mail archives by cc’ing all communication to dummy e-mail boxes, we can compare communication patterns with teacher and peer ratings of student team performance (Figure 7).

Our social network research of the last 12 years has identified six “social indicators” of collaborative communication collected through e-mail archives, and also through tweets, blog links, Wikipedia entries, and body signals captured through sensors. These signals are predictive of future performance and outcomes. Changing individual communication behavior to adhere to the six indicators increases individual, team, and organizational performance. The six indicators have been made available to individuals and organizations in “Condor.” The six indicators are the following:

Strong leadership

While one might expect that in collaborative teams everybody is a leader, our research showed the opposite: creative swarms need strong leaders. For example, when Tim Berners-Lee came to MIT, he built up his network with global groups of thought leaders such as the World Economic Forum by connecting with other leaders such as the head of the MIT Lab for Computer Science at the time, Michael Dertouzos. Even Wikipedia, the epitome of creative collaboration, shows this pattern, as we found that articles where a small group is in charge become high quality much faster than articles where a large group of editors is working without clear leaders. Since then, we have seen the same patterns for teams of medical


researchers and shared business process service providers, to name just a few.

Rotating leadership

While strong leaders with the right personality characteristics are essential for successful collaboration, a group of leaders taking turns is even better. We first discovered this by studying the e-mail communication among Eclipse open-source developers, where rotating leadership was the best predictor of the most creative teams. By now, we have verified this behavior in dozens of other organizational networks, for example, in the COIN seminar, where student teams with rotating leadership showed better results. This was also confirmed for medical researchers developing innovations for the care of patients with chronic diseases, where teams with different leaders taking turns were found to be more creative.

Balanced contribution

In a team, we differentiate between information consumers and information producers, whom we call the “contributors.” For e-mails in a group, this means that there are people who send more than they receive, and vice versa. For instance, when we studied the e-mail archive of the World Wide Web Consortium in its early days, Tim Berners-Lee frequently was the most active sender of a community. Over time, others would contribute their ideas, leading to an overall well-balanced contribution index. We found that teams with a low variance in contribution (all members of the core team contributing a similar number of messages) were more creative than


teams where one or a few people were contributing most of the messages. On the other hand, for account management teams catering to customers of a global service provider, a centralized contribution index, where a few central leaders sent a steady stream of messages to customers, led to more satisfied customers than a pattern where customers were bombarded in scattergun fashion by many different employees of the service provider.

Responsiveness

When I was a young post-doc at MIT, the speed with which Tim Berners-Lee responded to e-mails was an eye-opening experience. Since then, I have found this pattern many times. The speed of response and the number of “nudges” or “pings” it takes until a prospective communication partner answers e-mails are excellent predictors of employee and customer satisfaction and mutual respect. For example, in a consulting company we calculated the average e-mail response times of different departments in 2008, before smartphone usage became popular. Six departments took about three days on average to respond to e-mails, whereas one department was considerably slower. Since then, in the age of smartphones, this behavior has become more marked, with average response times of well-working groups falling in the two-hour range for employees getting hundreds of e-mails per day.

Honest language

While we are not looking at the specific content of messages, we analyze positive and negative sentiments and emotionality of messages. The Condor system uses a


machine learning approach that can be trained with any large body of classified text, for example, with billions of tweets. We found that if the language is too positive, this might be an indicator of dissatisfied customers. For example, in a project with a global service provider, we found that the more positive language a sales person used in communicating with the customer, the less happy the customer was. On the other hand, in innovation teams we found that using more emotional language, defined as using more positive and negative text at the same time, was a predictor of more creative teams. When looking at employee attrition, we found that employees most likely to terminate their work were becoming less emotional in their language, while also showing less rotating leadership behavior.

Shared context

Highly functioning teams also define their own language. When the World Wide Web was started, new words and acronyms like “HTML,” “HTTP,” “RDF,” and “FOAF” were coined, and existing words like “web,” “semantic web,” and “apache” took on a new meaning. In our analysis, we measure new word usage in two ways. First, we measure the complexity of text as the frequency of rare words in the entire text collection. For instance, we found that the more complex the language of sales people is, the less satisfied their customers are. Second, we also track the diffusion of new words in a community. If somebody introduces a new word in a group, using it for the first time in a message sent to others, we measure how quickly the word is picked up by others. The more somebody succeeds in introducing new words, the more influential she or he is (Table 3).
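The diffusion of new words described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the way Condor computes it: messages are assumed to arrive as (timestamp, sender, text) tuples, a word counts as “new” if it is not in a supplied list of known words, and adoption speed is the number of hours until another sender first reuses the word. The function name and the sample messages are hypothetical.

```python
import re
from datetime import datetime


def new_word_adoption(messages, known_words=frozenset()):
    """Map each new word to (introducer, {adopter: hours until first reuse}).

    messages: iterable of (timestamp, sender, text) tuples, in any order.
    known_words: vocabulary that should not count as new (an assumption).
    """
    first_use = {}  # word -> (timestamp, sender) of its first appearance
    adoption = {}   # word -> {adopter: delay in hours}
    for timestamp, sender, text in sorted(messages, key=lambda m: m[0]):
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            if word in known_words:
                continue
            if word not in first_use:
                first_use[word] = (timestamp, sender)
                adoption[word] = {}
                continue
            introduced_at, introducer = first_use[word]
            if sender != introducer and sender not in adoption[word]:
                delay = (timestamp - introduced_at).total_seconds() / 3600
                adoption[word][sender] = delay
    return {word: (first_use[word][1], adoption[word]) for word in first_use}


# Hypothetical toy thread; names and texts are made up for illustration.
common = {"let", "us", "call", "it", "sounds", "good", "i", "drafted", "the", "spec"}
thread = [
    (datetime(2017, 1, 2, 9, 0), "tim", "Let us call it HTML"),
    (datetime(2017, 1, 2, 11, 0), "ann", "HTML sounds good"),
    (datetime(2017, 1, 3, 9, 0), "bob", "I drafted the HTML spec"),
]
for word, (introducer, adopters) in new_word_adoption(thread, common).items():
    print(word, "introduced by", introducer, "adopted by", adopters)
```

Counting, per introducer, how many of their words are picked up and how quickly would give one possible influence score; how Condor weights these quantities is not specified in the text.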

Table 3: Definition of Six Honest Signals.

Indicator | SNA Term | Definition | How the Variables Are Calculated in Condor
Central leadership | Degree centrality | Number of actors each person is directly connected with in a network | It is the number of nearest neighbors from an actor who are both senders and receivers in the network
Central leadership | Betweenness centrality | It is a measure of the extent to which each actor acts as an information hub and controls the information flow | It is defined as the likelihood to be on the shortest path between any two actors in the network
Rotating leadership | Betweenness centrality oscillation | It is a measure of how frequently actors change their network position in the team, from central to peripheral, and back | Number of local maxima and minima in the betweenness curve of an actor or a group
Balanced contribution | Contribution index | Indicates how balanced a communication is in terms of messages sent and messages received | (msg sent - msg rcvd)/(msg sent + msg rcvd)
Rapid response | Ego ART | Average number of hours the sender takes to respond to e-mails | Time until a frame is closed for the receiver after an e-mail has been sent
Rapid response | Ego nudges | Average number of follow-ups that the sender needs to send in order to receive a response from the receiver | Number of pings until the sender responds
Rapid response | Alter ART | Average number of hours the receiver takes to respond to e-mails | Time until a frame is closed for the sender, after an e-mail has been sent
Rapid response | Alter nudges | Average number of follow-ups that the receiver needs to send in order to receive a response from the sender | Number of pings until the receiver responds
Honest language | Avg. sentiment | Indicates positivity and negativity of communication | Uses automatically generated bag of words, based on a dictionary trained for language/subject area
Honest language | Avg. emotionality | Represents the deviation from neutral sentiment | Standard deviation of sentiment
Shared context | Avg. complexity | It is a measure of complexity of word usage. It is defined as the information distribution, that is, the more diverse words, which are all used evenly, a sender uses, the higher his complexity | Information distribution using TF/IDF, independent of single words
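Three of the calculations in Table 3 are simple enough to sketch directly. The following is a minimal illustration of the formulas as stated in the table, not Condor's code. The function names are hypothetical, and several details are assumptions of this sketch: a person with no messages gets a contribution index of 0, plateaus in the betweenness curve are collapsed before counting turning points, and emotionality uses the population standard deviation.

```python
from statistics import pstdev


def contribution_index(sent, received):
    """(sent - received) / (sent + received): +1 pure sender, -1 pure receiver."""
    total = sent + received
    return (sent - received) / total if total else 0.0


def oscillation_count(betweenness_curve):
    """Count local maxima and minima in a time series of betweenness values."""
    # Collapse consecutive repeats so a plateau counts as one turning point at most.
    values = [
        v for i, v in enumerate(betweenness_curve)
        if i == 0 or v != betweenness_curve[i - 1]
    ]
    return sum(
        1
        for prev, cur, nxt in zip(values, values[1:], values[2:])
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )


def emotionality(sentiments):
    """Standard deviation of per-message sentiment scores."""
    return pstdev(sentiments) if len(sentiments) > 1 else 0.0


# Hypothetical values for one actor, for illustration only.
print(contribution_index(sent=30, received=10))
print(oscillation_count([0.1, 0.4, 0.2, 0.2, 0.5, 0.3]))
print(emotionality([0.2, 0.9, 0.1, 0.8]))
```

A contribution index near zero for all core members corresponds to the balanced pattern described for creative teams, while an actor whose betweenness curve has many turning points is moving frequently between central and peripheral positions.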


4.1. THE HONEST SIGNALS HAVE DIFFERENT MEANINGS FOR DIFFERENT ORGANIZATIONS

The six signals have predictive power for both creative and process-oriented organizations; however, we found that for some indicators the directionality might change, while others are universally applicable.

When I started my research on communication in COINs 14 years ago, I initially expected to find democratic leadership patterns, with all members of the core team sharing in an egalitarian communication pattern. However, I found the opposite, with people like Tim Berners-Lee for the Web, Linus Torvalds for Linux, or Jimmy Wales for Wikipedia assuming strong leadership roles. Even when it looked like an egalitarian leadership team, with a small group of people sharing the lead, adding the temporal dimension made clear that one person was in charge at any given point in time.

Through 12 years of research, we found that rotating leadership was the key indicator of creativity. However, for tasks where creativity is not at a premium and reliability is essential, the opposite, steady leadership, is more important. For example, when studying nurses in the Post Anesthesia Care Unit of a large hospital, we found that patients were on average waking up from anesthesia faster if the same senior nurse was in charge for the entire duration of a day. Democratic leadership and taking turns do not seem beneficial for the patients in this case. This was different for teams of medical researchers, whose research output was rated more creative when they showed more rotating leadership, with different senior and junior people taking turns in occupying the most central network position over time.


We also found that speed of response can be interpreted differently depending on the context. For example, when comparing customer satisfaction of a service provider with the speed of response of account managers, we found no direct influence on customer satisfaction, although intuition might tell us otherwise. However, we found a significant correlation between the speed with which the customer answered the e-mails of the account manager and customer satisfaction. The happier the customers were with the services provided, the faster they would respond to messages from their account manager. This tells us that it is not enough to answer a customer’s messages quickly; one also needs to solve their problems, although answering messages slowly is certainly one way to create unhappy customers. In this particular case, however, we had already raised awareness of being responsive with the service provider, such that all account managers were already quite fast in answering, so that speed no longer provided a competitive advantage (Table 4).
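The two directions of responsiveness discussed above (how fast the account manager answers and how fast the customer answers) can be separated with a simple pass over a two-person thread. The sketch below is an assumption-laden illustration, not Condor's algorithm: a “frame” is taken to open with the first unanswered message and to close with the first reply from the other party, and every additional message sent before that reply counts as one nudge. The function name and the sample thread are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def response_stats(thread):
    """Return {responder: (avg hours to respond, avg nudges needed)}.

    thread: list of (timestamp, sender) tuples exchanged between two people.
    """
    hours = defaultdict(list)
    nudges = defaultdict(list)
    opener = None       # who sent the first unanswered message of the frame
    frame_start = None  # when that message was sent
    pings = 0           # follow-ups sent by the opener while waiting
    for timestamp, sender in sorted(thread):
        if opener is None:
            opener, frame_start, pings = sender, timestamp, 0
        elif sender == opener:
            pings += 1  # a follow-up before any reply arrived
        else:
            # The other party replied: close the frame and credit the responder.
            waited = (timestamp - frame_start).total_seconds() / 3600
            hours[sender].append(waited)
            nudges[sender].append(pings)
            # The reply opens a new frame in the opposite direction.
            opener, frame_start, pings = sender, timestamp, 0
    return {
        person: (sum(hours[person]) / len(hours[person]),
                 sum(nudges[person]) / len(nudges[person]))
        for person in hours
    }


# Hypothetical exchange between an account manager and a customer.
thread = [
    (datetime(2017, 3, 1, 9, 0), "manager"),
    (datetime(2017, 3, 1, 15, 0), "manager"),   # follow-up, no reply yet
    (datetime(2017, 3, 2, 9, 0), "customer"),   # reply after 24 hours
    (datetime(2017, 3, 2, 10, 0), "manager"),   # reply after 1 hour
]
print(response_stats(thread))
```

Seen from the account manager's mailbox, the manager's own figures would correspond to the ego measures of Table 3 and the customer's figures to the alter measures; it was the latter that correlated with customer satisfaction in the case described above.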

Table 4: Directionality of Indicators for High-Performing Teams.

Indicator | Performance Indicator
Central leadership | Higher performing teams have one or more clear leaders
Rotating leadership | Creative teams have rotating leaders; process-oriented teams have steady leaders
Balanced contribution | Creative teams show balanced contribution; process-oriented teams show a few dominant contributors
Rapid response | Higher performing teams show rapid response
Honest language | Higher performing teams use honest language
Shared context | Higher performing teams use their own vocabulary


4.2. VIRTUAL MIRRORING LEADS TO CHANGE

When people know that they are being monitored, they change their behavior. This is called the “Hawthorne effect,” discovered almost a century ago when scientists in the Hawthorne factory outside Chicago experimented with changing the work environment of factory workers. They found that whatever they did, whether it was turning the lighting up or down or changing the floor plan, performance of the workers improved because of the attention paid to the workers.

We call the process of showing people their communication behavior “virtual mirroring.” They are shown a “virtual mirror” of their own communication pattern as a social network picture, created from their e-mail archive, plus a comparative ranking of their six indicators. If exposed to a virtual mirror, people will, based on the Hawthorne principle, initially change their behavior. If, in combination with the virtual mirror, we teach them which type of behavior is indicative of future higher organizational performance, this change in behavior will be permanent and will lead to improved outcomes. When monitoring a business process for higher performance, participants will change their behavior to act in a way leading to a better process; when monitoring for creativity, participants in virtual mirroring will become more creative.
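The “comparative ranking” shown in a virtual mirror can be pictured as a per-indicator position within the group. The sketch below is only an illustration under assumptions: the indicator values are taken as already computed, the function name and the sample people are hypothetical, and the ranking is purely by value (rank 1 is the highest value), whereas Table 4 shows that whether a high value is desirable depends on the indicator and on the type of team.

```python
def comparative_ranking(indicators, person):
    """Rank `person` within the group on every indicator.

    indicators: {person: {indicator name: value}}
    Returns {indicator name: (rank, group size)} with rank 1 = highest value.
    """
    ranking = {}
    for name in indicators[person]:
        ordered = sorted(
            indicators, key=lambda p: indicators[p][name], reverse=True
        )
        ranking[name] = (ordered.index(person) + 1, len(ordered))
    return ranking


# Hypothetical indicator values for a three-person team.
team = {
    "ana": {"degree": 12, "oscillation": 5},
    "ben": {"degree": 7, "oscillation": 9},
    "cy": {"degree": 3, "oscillation": 2},
}
print(comparative_ranking(team, "ben"))
```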

4.3. DEALING WITH PRIVACY CONCERNS

One of the first questions that always comes up when we start a new project analyzing e-mails is about privacy and


data security. When dealing with sensitive company data, the preferred approach is to host the company’s data within the corporate firewall, provide access to aggregated information to management, and give each employee access to their own personal communication insights. People are concerned about the contents becoming known in public. We address this issue on three different levels.

On the strictest level, we commit to doing an anonymized analysis, where insights about individuals are only given to those individuals. This means that in the results screens shown in Figures 9 and 10, individuals log in with their own e-mail address and will only see their own names, with the rest of the people anonymized. Management will only get anonymized results aggregated on the team or business unit level. The problem with this approach is that the insights to be gained are somewhat limited, as it would be quite interesting for people to know, for example, who responds fastest to them.

On the mid-level of privacy, we therefore restrict our analysis to e-mail header information and do not use e-mail content or subject lines. This allows us to calculate all the indicators except “honest sentiment” and “shared context.”

On the most open level, if we are doing an individual analysis using an individual’s mailbox, the individual has access to the full mailbox anyway. Using our Condor software, we can then disclose who responds fastest and with the fewest nudges from the mailbox owner, and who is the most honest and the most influential person in the network of the individual based on new word usage. Frequently, we conduct an analysis on this most open level also for organizations, as they own the


contents of their organization’s e-mail archive, similarly to how they own the payroll and accounting data of their SAP system. Organizations have always kept salary and sales data protected from individual employees, while using the aggregate information for their own competitive advantage. The same applies to e-mail data, which in aggregated form, and broken down into employee communication benchmarks, provides invaluable information to corporate management.

4.4. HOW TO APPLY KNOWLEDGE FLOW OPTIMIZATION

We have developed a four-step process, which we call “Knowledge Flow Optimization,” to analyze and increase the performance of organizations and to “Coolfarm ideas” (Figure 8).

Figure 8: Coolfarming through “Knowledge Flow Optimization.”

It consists of the four


steps: “Analyze, Predict, Mirror, Optimize.” To illustrate our approach, let’s look at the analysis of the sales force of a Fortune 500 high-tech company, where we compared e-mail communication of the organization with sales success of their sales teams in different geographical regions.

Step 1: Analyze: Determining social network metrics and communication patterns

In the first step, we analyzed and quantified the communication patterns and social network structure embedded within organizational communication archives such as e-mail, videoconferencing, and instant messaging. We used this data to calculate the six indicators for the different types of communication archives such as e-mail or videoconferencing.

Step 2: Predict: Six honest signals: Comparing structural attributes with business success

In the second step, we compared communication behavior found in Step 1 with the communication patterns identified as indicators of better connectivity, interactivity, and sharing among the individuals in the network. Having calculated the six indicators from the data in the communication archives, we correlated them with quantified success and failure criteria. The success and failure criteria vary significantly depending on the type of organization, the industry, and the individuals being measured. In this example, we measured sales performance of the sales teams in different geographic regions and for different products.

60

Sociometrics and Human Relationships

Step 3: Mirror: Virtual mirroring

In the next step, we mirror the communication behavior we have identified for the different parts of the organization back to the teams and individuals. By showing them how they differ from the best practices we found in past projects, we help them to improve their behavior for better performance. Just like with a real mirror, looking at how a team “really” communicates can be an eye-opening experience for the team members, leading to fundamental changes in their behavior for the better (Figure 9).

Figure 9: Overview Screen of “Virtual Mirroring,” Showing How the Individual Does Compared to the Rest of the People in the Group (see also Chapter 15; color version online at http://www.ickn.org/sociometrics/).

In the example with the global sales force for the high-tech manufacturer, we found that the more they showed a rotating leadership behavior within the Web conferencing network, the less e-mail they sent to their customers


(the less they “spammed” them), and the less positive and more “honest” language they used, the more they sold. This means that the less they were “talking up” their products, and the more they also admitted upcoming problems, the more willing their customers were to buy their products (Figure 10).

Step 4: Optimize: Devising a plan to optimize communication for greater success

Once we had figured out which of the six indicators were correlated with success and failure, we developed a roadmap and recommendations for the company to act on, to change communication behaviors of the sales staff for more successfully closed deals and more satisfied customers.
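The correlation in Step 2 can be sketched with a plain Pearson coefficient computed per indicator against an outcome measure. The text does not say which statistic was used in the project, so Pearson's r is an assumption here, as are the function names and the made-up numbers; a real analysis would also test significance and control for confounding factors.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_indicators(indicators, outcome):
    """Sort indicators by the absolute strength of their correlation with outcome.

    indicators: {indicator name: [value per team]}
    outcome: [outcome value per team], in the same team order
    """
    correlations = {
        name: pearson(values, outcome) for name, values in indicators.items()
    }
    return sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up numbers for four hypothetical sales teams, for illustration only.
indicators = {
    "rotating_leadership": [1, 2, 3, 4],
    "avg_sentiment": [4, 3, 1, 2],
}
sales = [10, 20, 30, 40]
for name, r in rank_indicators(indicators, sales):
    print(name, round(r, 2))
```

The sign of each coefficient matters as much as its size: a negative value for sentiment in such a table would correspond to the finding above that less positive language went together with higher sales.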

Figure 10: Drill-down on Virtual Mirror, Showing from the Top to Bottom: Social Network, Communication Frequency (Degree Centrality), and Flexibility and Adaptability (Oscillation in Betweenness Centrality) (color version online at http://www.ickn.org/sociometrics/).

4.5. FOUR EXAMPLES

Predicting customer satisfaction of a global service provider

For a large service provider, we tracked 26 accounts for more than two years. In each of these accounts, dozens to hundreds of service provider employees were working on behalf of one Fortune 1000 customer. We collected the e-mail of the account managers of the service provider and calculated the six indicators for every month. We also provided virtual mirroring feedback to the account executives once we had the six variables. When comparing the 26 tracked accounts with a control group of 150 accounts whose e-mail we did not collect, we found that over the duration of our analysis, the satisfaction of the 26 customers, measured by

Net Promoter Score, improved by 5%, whereas average satisfaction decreased by 12% for the control group.

Predicting employee attrition of a global service provider

For a large service provider, using two years of e-mail data, we predicted the likelihood of the top 3000 employees leaving the company. We also distinguished between the “inner termination” phase before employees actually handed in their resignations and the last three months working for the company. We found marked differences in communication patterns. For instance, employees who were not happy in the company anymore showed a 4% decrease in emotionality and a 20% decrease in rotating leadership. Once they had decided to leave, their emotionality levels and patterns of rotating leadership went back to their normal level.

Improving sales effectiveness of a global high-tech company

For a global high-tech manufacturer, we compared two weeks of the entire e-mail archive, together with videoconferencing networks and message networks, with sales performance of the global sales teams. There were some overall patterns, such as more responsive sales associates generating higher sales, or sales people selling collaboration-based products being more successful when showing a rotating leadership pattern. We also found that different types of communication behaviors were advantageous in different geographies. For example, in France, sales people who initiated more video conferences were generating more sales, while sending more e-mails was not leading to increased sales.


Boosting research creativity of medical researchers

In a multi-year project at a leading US research hospital, we analyzed the e-mail traffic of a large project with over 100 members from both outside research organizations and the hospital. We also provided virtual mirroring sessions to the senior leadership team as well as to selected project teams. These mirroring sessions led to increased rotating leadership, more honest sentiment, and higher responsiveness. In the meantime, the research model pioneered in this project has been applied to a variety of other healthcare-related research projects, for instance, to address the needs of chronic care patients, as well as to a large project aiming to reduce infant mortality across the United States.

4.6. AREAS OF E-MAIL-BASED SNA

Using e-mail-based social network analysis gives managers an unprecedented early warning system for the whole organization and allows them to predict flash points before they happen, leading to greatly improved performance and the capability to manage risk in a much better way. Measuring the effectiveness of the communication network is akin to measuring the nervous system of the organization, which until now was unmeasurable. Analyzing e-mail flows permits the CEO to predict mission-critical factors such as the propensity of employees to leave the company, the satisfaction of a company’s customers, and the effectiveness of its sales force well before the events actually happen.


Our e-mail communication analysis system can thus be put to productive use in three ways:

1. Managers can use it as a monitoring and alert system to spot emerging problems before they actually happen.
2. Business units and teams can optimize their performance through better communication. Telling teams through the virtual mirroring process what their strengths and weaknesses in interaction are can be used to give invaluable advice for better collaboration and improved team success.
3. The six indicators of collaboration can also be mirrored back to individuals, so they can improve their individual communication behavior to become more effective and successful team members.

4.7. IMPROVING FINANCIAL CAPITAL THROUGH OPTIMIZING SOCIAL CAPITAL

Just as SAP allows an organization to make better use of its financial resources through continuous monitoring of managerial accounting, Coolfarming through knowledge flow optimization allows an organization to make much better use of its social capital through tracking and optimizing collaboration. While social media monitoring has become quite popular, the social capital-driven approach described in this book is unique in five different ways:

Measuring “True Creativity”: Our framework is based on the notion of COINs. It has been field-tested in over 100 organizations to identify the communication patterns indicative of creativity. This includes far more than counting the number of e-mails of individuals and teams; rather, using the six honest signals of collaboration described in this chapter, users will be able to identify complex networking patterns of true creativity.

Know cool people, and not just “hotspots” and “spammers”: The semantic social network analysis tool Condor finds trends by finding the trendsetters. When doing a Coolhunt on the Web, Twitter, and Wikipedia, in the first step it works like Google, identifying who is using novel words and ideas, but then it finds the most influential people by applying the six honest signals of collaboration. This way it does not just reward “spammers” with a central position, but will also, for instance, identify influencers by measuring who uses novel words first and how quickly they are picked up by others to grow into new COINs.

Anti-gaming: We use social network analysis metrics such as “betweenness centrality” and time series of e-mail exchange, which are far more robust toward “gaming” by employees than simply counting e-mails sent and received. Measuring the betweenness of Twitter users in a retweet network is also more indicative of popularity than just the number of Twitter followers, which can be gamed quite easily by following back other people.

Measuring organizational trust and satisfaction: We do not just count complex words to measure complexity in dialogue, or count positive and negative words such as “great,” “wonderful,” “horrible,” and “awful,” but use supervised machine learning algorithms to track word distribution and measure positivity and negativity as well as complexity in context.


Understanding communication galaxies: We track the evolution of network positions of people, measuring how individuals change from being “stars” to becoming “galaxies,” applying the slogan “don’t be a star, be a galaxy.” The groups of the most creative people and most highly functioning teams act as communication galaxies embedded into clusters of other teams.

MAIN LESSONS LEARNED

• The six honest signals of collaboration are strong leadership, balanced contribution, rotating leadership, responsiveness, honest language, and shared context.
• Showing the six honest signals to individuals and organizations in a virtual mirroring process, and telling them which behavior is predictive of high performance and creativity, will lead to better organizations and more creative and productive employees.
• Monitoring communication will allow organizations to manage social capital just like they manage financial capital.
• To address privacy concerns, just like SAP stores accounting information and calculates financial performance metrics, virtual mirroring allows organizations to calculate and show collaborative performance metrics.

5 ESSENTIALS OF SOCIAL NETWORK ANALYSIS AND STATISTICS

CHAPTER CONTENTS

• Introduction to social network analysis
• Introduction to statistics (t-tests, correlation, regression).

In this chapter, you will learn just enough about SNA and statistics to understand the theory behind the social media analysis with Condor and the trend predictions described in the subsequent chapters. This will enable you to do your own experiments and predictions with social media data collected using Condor. There are many excellent textbooks on SNA (Tsvetovat & Koutznetsov, 2011; Wassermann & Faust, 1994) and statistics (Urdan, 2010) available to learn more.

SNA has been around for a long time. In the classic example of the puzzle of the “seven bridges of Koenigsberg,” Leonhard Euler laid the foundations of graph theory in 1736. Since then, SNA has come a long way, and with the proliferation of the Internet and the Web, most prominently Facebook, it has become a key foundation for understanding the structure of social networks.

© 2017 Peter A. Gloor

5.1. BASICS OF SOCIAL NETWORK ANALYSIS (SNA)

You will learn here about actors and ties; about the actor-level metrics degree centrality, betweenness centrality, and contribution index; and about the group-level metrics group degree centrality, group betweenness centrality, and density.

Networks consist of nodes and connecting edges. In the language of SNA, nodes are called actors and edges are called ties. Figure 11 shows a simple undirected network with seven actors and nine ties.

Figure 11: Undirected Network.

The same graph can also be shown as an adjacency matrix, where all actors are labeled on both the rows and the columns. In Figure 12, the black square in element $a_{12}$ denotes the tie from actor 401 to actor 402. As element $a_{21}$ is empty, there is no tie from actor 402 to actor 401. On the other hand, elements $a_{16}$ and $a_{61}$ are both black, showing that there is a link from actor 407 to actor 401, as well as a link from actor 401 to 407. In other words, between actors 401 and 407, there is a bidirectional link. The adjacency matrix in Figure 12 therefore shows a directed graph (if it were undirected, the matrix would be symmetric). Figure 13 shows the network view of the matrix from Figure 12. All the links in Figures 12 and 13 are bidirectional except the links from 402 to 404 and from 404 to 406.

Figure 12: Adjacency Matrix of (Directed) Graph from Figure 11.

Figure 13: Directed Network View of Adjacency Matrix in Figure 12.

Based on the position of an actor in the network, we can calculate actor-level metrics for each actor in the graph. The simplest actor-level metric is degree centrality, which measures the number of direct neighbors of an actor:

$$C_D(a) = \deg(a)$$

Figure 14 shows the degree centralities of all actors in the network from Figure 13.

Figure 14: Degree Centralities of All Nodes in Network from Figure 13, Nodes Sized by Degree Centrality.
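The adjacency matrix and degree centrality described above can be sketched in a few lines of Python. This is a minimal illustration only: the tie list below is hypothetical and does not reproduce the network of Figures 11 to 13, and the function names are our own. Degree is counted here as the number of distinct direct neighbors, ignoring tie direction, which is one possible reading of the definition above.

```python
# Hypothetical directed ties (source, target); illustrative only,
# not the network shown in Figures 11 to 13.
ties = [(401, 402), (401, 403), (403, 401), (402, 403), (403, 402), (403, 404)]

def adjacency_matrix(ties):
    """Return the sorted actor list and a 0/1 matrix with rows = senders."""
    actors = sorted({actor for tie in ties for actor in tie})
    index = {actor: i for i, actor in enumerate(actors)}
    matrix = [[0] * len(actors) for _ in actors]
    for source, target in ties:
        matrix[index[source]][index[target]] = 1
    return actors, matrix

def is_symmetric(matrix):
    """An undirected network always yields a symmetric adjacency matrix."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

def degree_centrality(actors, matrix):
    """Count distinct direct neighbors of each actor, ignoring tie direction."""
    n = len(actors)
    return {
        actor: sum(1 for j in range(n) if j != i and (matrix[i][j] or matrix[j][i]))
        for i, actor in enumerate(actors)
    }

actors, matrix = adjacency_matrix(ties)
degrees = degree_centrality(actors, matrix)
print(is_symmetric(matrix))  # the tie 401 -> 402 is not reciprocated
print(degrees)
```

Because the tie from 401 to 402 has no counterpart from 402 to 401, the matrix is not symmetric, which is the same test the text uses to tell a directed from an undirected graph.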

The most frequently used metric is betweenness centrality, which measures information flow among nodes. It measures the number of times a node is on the shortest path between any two nodes other than itself (corresponding to the likelihood of the node to be on the shortest path). Betweenness centrality is commonly taken as a proxy for power and influence, as whoever controls information has power (Figure 15). Formally, the nonnormalized betweenness centrality is

$$C_B(a) = \sum_{s \neq a \neq t} \sigma_{st}(a)$$

where $\sigma_{st}(a)$ is the number of shortest paths passing through $a$ between any two actors $s$ and $t$.


Figure 15: Nonnormalized Betweenness Centrality of All Nodes in Network from Figure 13, Nodes Sized by Betweenness Centrality.

The normalized betweenness centrality is

$$C_B(a) = \sum_{s \neq a \neq t} \frac{\sigma_{st}(a)}{\sigma_{st}}$$

where $\sigma_{st}$ is the total number of shortest paths from $s$ to $t$.

Both degree and betweenness centrality are also defined as group metrics. The group metric measures the distribution of the actor-level centralities: the most centralized graph, a star structure in which one actor in the center is connected to every other actor by one tie, is defined as 1, while a totally egalitarian structure, in which every actor has the same number of connections to all other actors, is defined as 0. More formally, group degree centrality is defined as

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g-1)(g-2)}$$

where $C_D(n^*)$ is the maximum degree of any actor in the graph and $g$ is the number of actors. For the example above, this would be ((4 - 3) + (4 - 3) + (4 - 3) + (4 - 2) + (4 - 1) + (4 - 1))/(6*5) = 11/30 = 0.3667.
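To make the two betweenness formulas and the group degree centrality concrete, here is a hedged sketch in plain Python. The small undirected graph is hypothetical (it is not the network of Figure 13), and the function names are our own. The sketch counts ordered pairs (s, t), so in an undirected network every pair of actors contributes twice; halve the results if each unordered pair should count once. The last lines recompute the 11/30 example from the degrees listed in the text.

```python
from collections import deque

def shortest_path_counts(neighbors, source):
    """Breadth-first search: distance from source and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
    return dist, sigma

def betweenness(neighbors):
    """Return (path-count form, fraction form) of betweenness for every actor."""
    info = {v: shortest_path_counts(neighbors, v) for v in neighbors}
    raw = {v: 0 for v in neighbors}
    frac = {v: 0.0 for v in neighbors}
    for s in neighbors:
        dist_s, sigma_s = info[s]
        for t in neighbors:
            if t == s or t not in dist_s:
                continue
            for a in neighbors:
                if a in (s, t) or a not in dist_s:
                    continue
                dist_a, sigma_a = info[a]
                # a is on a shortest s-t path if the two legs add up exactly
                if t in dist_a and dist_s[a] + dist_a[t] == dist_s[t]:
                    through = sigma_s[a] * sigma_a[t]
                    raw[a] += through
                    frac[a] += through / sigma_s[t]
    return raw, frac

def group_degree_centrality(degrees):
    """Sum of gaps to the maximum degree, divided by (g - 1)(g - 2)."""
    g = len(degrees)
    top = max(degrees)
    return sum(top - d for d in degrees) / ((g - 1) * (g - 2))

# Hypothetical undirected network: a diamond a-b-d-c-a with a pendant actor e.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
raw, frac = betweenness(neighbors)
print(raw, frac)

# Degrees taken from the worked example in the text (maximum degree 4, g = 7).
print(group_degree_centrality([4, 3, 3, 3, 2, 1, 1]))  # 11/30
```

Actor d, the only connection to e, collects by far the highest betweenness in this toy network, which matches the reading of betweenness as control over information flow. A pure star network returns a group degree centrality of 1 with this function.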


For group betweenness centrality, the formula is

$$C_B = \frac{\sum_{i=1}^{g} \left[ C_B(n^*) - C_B(n_i) \right]}{g-1}$$

where $C_B(n^*)$ is the maximum normalized betweenness of any actor in the graph and $g$ is the number of actors.

Contribution index measures the balance between sent and received messages for an actor. Formally, it is defined for each node as

$$\frac{\text{messages sent} - \text{messages received}}{\text{messages sent} + \text{messages received}}$$

which in Figure 13 would be translated to outgoing ties and incoming ties. For example, actors 403, 405, and 407 all would have 3 incoming and 3 outgoing links, leading to contribution index 0. Actor 402 has a total of 3 ties, out of which 2 are outgoing and 1 is incoming, leading to contribution index (2 - 1)/3 = 0.3333. In the visual representation, each actor is shown as a dot, with the x-axis denoting the total number of messages an actor has sent and received and the y-axis the contribution index (Figure 16).

The final group-level metric we will be using is graph density, which measures how many connections between any two nodes out of all possible connections actually exist. Formally, where $E$ is the number of all edges and $g$ is the number of actors, density $D$ is

$$D = \frac{E}{g(g-1)}$$

which would be, in the example from Figure 13, D = 16/(7*6) = 0.38095.


Figure 16: Contribution Index for Network from Figure 13.
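The contribution index and density calculations above are simple enough to check by hand, but a short Python sketch may help when applying them to other networks. The function names are our own, and returning 0 for an actor with no messages at all is an assumption, since the text does not cover that case.

```python
def contribution_index(sent, received):
    """(sent - received) / (sent + received); +1 = only sends, -1 = only receives."""
    total = sent + received
    if total == 0:
        return 0.0  # assumption: an actor without any messages counts as balanced
    return (sent - received) / total

def density(num_edges, num_actors):
    """Share of all possible directed ties g(g - 1) that actually exist."""
    return num_edges / (num_actors * (num_actors - 1))

# Values taken from the worked examples in the text.
print(contribution_index(3, 3))            # actors 403, 405, 407
print(round(contribution_index(2, 1), 4))  # actor 402
print(round(density(16, 7), 5))            # network from Figure 13
```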

5.2. BASICS OF STATISTICS

You will learn here about t-tests, which help you to compare two samples and figure out if there are statistically significant differences between them; correlations, which help you to decide if two variables are related; and linear regression, which helps you to measure how much of one outcome variable can be explained by a set of input variables.

For example, assume that we have a set of data points describing gender, body height, and income for a mixed group of women and men. The t-test will help to check whether men have a higher income than women. The correlation will tell us that there is a relationship between body height and income, as men are usually taller than women, and also still on average have a higher income. The regression will tell us that a certain fraction of income in general is explained by body height and gender.


In this chapter, we will look at a social media example to illustrate how t-tests, correlations, and regression can be used to identify patterns in social media data. We analyze the Twitter data described in Section 11.3. It contains 23,484 tweets by 16,948 people tweeting either about “Bernie Sanders” or “Donald Trump” on April 22, 2016 from 6:42 to 15:19. Using insights by James Pennebaker about the “hidden life of pronouns,”1 we count the number of pronouns and other small words such as “I,” “me,” “we,” “us,” “the,” “and,” “or,” “not,” etc. and calculate their frequency in the tweets of each individual person. Condor allows us to do this automatically, calculating the probability that a particular word, for example “the,” appears in a random tweet of the observed person.

1 Pennebaker (2013).

We would like to know if Bernie Sanders fans, recognizable through Twitter handles such as “Latinos4Bernie” or “People4Bernie,” differ in their language from other tweeters by using these words differently. To measure this we use the independent-group t-test. The t-test measures if two normally distributed samples are statistically different. It basically checks whether the difference between the two group averages is large compared with the spread (standard deviation) within the groups. Figure 18 shows the results of the independent samples t-tests as calculated using Condor. For example, Bernie Sanders fans use the pronoun “you” 10 times as much as the other tweeters, with a probability of 0.00077 within a tweet instead of 0.000077. What this means is illustrated in Figure 17, which shows the distribution of the tweeters using “you.” While there are also Bernie Sanders fans who use “you” as rarely as other tweeters, the peak of the distribution curve for Bernie Sanders fans is at 0.00077, whereas most of the other tweeters use “you” in a single tweet with a probability of only 0.000077.

Figure 17: Distribution of Relative Frequency of “you” in Tweets by Bernie Sanders Fans (Orange) and Others (Blue).a

a For color pictures see online version of images, available at http://www.ickn.org/sociometrics/

Figure 18 shows the results of the t-test for all Pennebaker variables. We see that there are 240 Bernie Sanders fans (Group 0 in Figure 18), and 16,708 others (Group 1). We find that the difference in the usage of “you” is highly statistically significant, as the p-value is reported as 0, that is, smaller than the displayed precision. When looking at the number of Twitter followers (friends_count), we see that while Bernie Sanders fans have more followers than others (1527 instead of 1494), the difference is not statistically significant. The p-value is 0.94, which means that a difference at least this large would show up about 94% of the time even if the two groups did not differ at all.

Figure 18: Independent Samples t-Test Result for Pronoun Usage in Tweets for Bernie Sanders Fans (Group = 0) and Others (Group = 1) Calculated by Condor.

Let’s now look at the relationship between the different variables. We would like to know if people who are more popular on Twitter by being “favorited” more often use a specific language, for instance using more or fewer words like “the,” “and,” “to,” “with,” and “in.” To check for this we use correlations. The most popular correlation is the Pearson correlation. If two variables have a 100% linear relationship, the Pearson correlation coefficient is 1 (or -1 if one variable falls as the other rises); if there is no linear relationship whatsoever between the two variables, the correlation coefficient is 0. Figure 19 illustrates two variables x and y which are strongly correlated (at left) and which have no correlation (at right).


Figure 19: Strong Correlation between Two Variables x and y (Left) and Uncorrelated Variables x and y (Right).a

a For color pictures see online version of images, available at http://www.ickn.org/sociometrics/
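The t-test and the Pearson correlation described above can both be computed with the Python standard library. The sketch below is illustrative only: the word frequencies are made-up numbers, not the Condor data, and we use Welch's variant of the independent-samples t-test (which does not assume equal variances) as an assumption, since the text does not say which variant Condor applies. Turning the t statistic into a p-value needs the t distribution, which a statistics package provides.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean a
    vb = variance(sample_b) / nb  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical per-person probabilities of using a given word in a tweet.
fans = [0.0009, 0.0007, 0.0008, 0.0006, 0.0008]
others = [0.0001, 0.0000, 0.0002, 0.0001, 0.0000, 0.0001]
t, df = welch_t(fans, others)
print(t, df)  # a large positive t: fans use the word more often

# Perfect positive, perfect negative, and no linear relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))
```

The larger the difference between the group means relative to the spread within the groups, the larger the absolute value of t, which is the idea behind the verbal description of the t-test above.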

We would now like to see if an actor who has many followers (called “friends” in Twitter) is also more popular, by being “favorited” more. Figure 20 shows the relationship between the two variables friends_count and favorites_count for all 16,947 actors in our dataset. As we can see in Figure 20, there is a correlation, but it is not particularly strong (there is a straight line rising, but it is quite flat).

Figure 20: Correlation between friends_count and favorites_count.

Correlating friends_count and favorites_count, we find that the correlation is significant with a coefficient of r = 0.120 (Table 5). A correlation that is significant at the 0.01 level would arise by chance less than 1% of the time if the two variables were in fact unrelated; at a significance level of 0.05, less than 5% of the time. The share of variation explained is given by the squared coefficient: r = 0.12 gives r squared of about 0.014, so roughly 1.4% of the variation in being “favorited” can be explained through the number of Twitter friends. Table 5 also tells us that there is a weak but significant negative correlation between being “favorited” and the usage of “the,” “and,” “to,” “with,” and “in.” Only a very small part of the variation of being “favorited” can be explained through the usage of these words (the coefficients lie between 0.02 and 0.04 in absolute value); using them less goes along with being more “favorited.”

Table 5: Pearson Correlation between Twitter Friends/Favorite Count and Pronoun Usage.

Variable | friends_count: Pearson correlation | Sig. (two-tailed) | N | favorites_count: Pearson correlation | Sig. (two-tailed) | N
friends_count | 1 | | 14,414 | .120a | .000 | 14,414
favorites_count | .120a | .000 | 14,414 | 1 | | 14,414
avg frequency_The | .006 | .505 | 14,414 | -.038a | .000 | 14,414
avg frequency_And | .009 | .284 | 14,414 | -.021b | .012 | 14,414
avg frequency_To | .008 | .328 | 14,414 | -.027a | .001 | 14,414
avg frequency_With | .002 | .835 | 14,414 | -.021b | .011 | 14,414
avg frequency_In | .003 | .757 | 14,414 | -.031a | .000 | 14,414

a Correlation is significant at the 0.01 level (two-tailed).
b Correlation is significant at the 0.05 level (two-tailed).

To investigate if A is driving B, in this case, if using fewer of these words and having more Twitter friends goes along with an actor being more “favorited” when all variables are considered together, statisticians use a linear regression (Table 6). Table 6 shows the results of the regression with the dependent variable favorites_count and the independent variables avg frequency_With, avg frequency_In, avg frequency_To, avg frequency_The, and friends_count. Entering this data into a statistics package, such as R, SPSS, Stata, SAS, or Excel, returns an adjusted R squared of 0.017. This means that 1.7% of the variance in favorites_count is explained by this linear model, which is quite a small effect, although significant (all coefficients are significant too). The less “the,” “to,” “in,” and “with” somebody uses in her tweets, and the more friends she has, the more she will be favorited. More precisely, the standardized coefficient of -0.032 for “avg frequency_The” means that for an increase of one standard deviation in the predictor variable “avg frequency_The,” the dependent variable favorites_count will change by -0.032 standard deviations. In other words, using a lot of “the’s” in tweets reduces the likelihood of being “favorited” by a small amount.

Table 6: Regression Results of Regressing the Dependent Variable favorites_count against the Independent Variables avg frequency_With, avg frequency_In, avg frequency_To, avg frequency_The, and friends_count.

Model | Unstandardized Coefficients: B | Std. Error | Standardized Coefficients: Beta | t | Sig.
(Constant) | 7445.317 | 188.952 | | 39.403 | 0.000
avg frequency_The | -31497.381 | 8162.346 | -.032 | -3.859 | .000
avg frequency_To | -20985.868 | 9297.563 | -.019 | -2.257 | .024
avg frequency_In | -32535.075 | 10200.098 | -.027 | -3.190 | .001
avg frequency_With | -39646.094 | 17694.318 | -.019 | -2.241 | .025
friends_count | .324 | .022 | .120 | 14.528 | .000

This concludes our bare-bones introduction to SNA and statistics; readers are encouraged to further study this subject in one of the many excellent books or online courses available elsewhere.
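The terms used in the regression discussion (R squared, adjusted R squared, standardized coefficient) can be illustrated with a deliberately simplified sketch. It fits a regression with a single predictor using only the Python standard library, whereas the model in Table 6 has five predictors and needs a statistics package. The data points are hypothetical, and the function name is our own.

```python
from statistics import mean, stdev

def simple_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns slope, intercept, R squared, adjusted R squared, and the
    standardized coefficient (slope rescaled to standard-deviation units).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    total = sum((b - my) ** 2 for b in y)
    r_squared = 1 - residual / total
    # adjusted R squared penalizes the number of predictors (here p = 1)
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - 1 - 1)
    beta = slope * stdev(x) / stdev(y)
    return slope, intercept, r_squared, adjusted, beta

# Hypothetical data: x could be a friend count, y a favorite count.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
slope, intercept, r_squared, adjusted, beta = simple_regression(x, y)
print(slope, intercept, r_squared, adjusted, beta)
```

With one predictor, the standardized coefficient equals the Pearson correlation between x and y, and its square equals R squared. This is consistent with Table 6, where the standardized coefficient of friends_count (.120) is the same as its Pearson correlation with favorites_count in Table 5, presumably because the other predictors are almost uncorrelated with friends_count.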

MAIN LESSONS LEARNED

• SNA provides a framework for comparing the structural properties of a social network with its behavior.
• The SNA key actor-level properties are degree centrality, betweenness centrality, and contribution index.
• The SNA key network-level properties, used to compare differences between different networks, are group degree centrality, group betweenness centrality, and density.
• The statistical t-test allows you to compare the difference of attributes between two samples.
• The Pearson correlation measures if two variables have a relationship.
• A linear regression tracks the impact of a set of input variables (called “independent variables”) on an outcome variable (called “dependent variable”).

6 HOW IDEAS SPREAD IN ONLINE SOCIAL NETWORKS — READINGS

CHAPTER CONTENTS

• Theories of information diffusion on social networks
• Spreading ideas on Facebook
• Finding fake content through machine learning
• Forecasting financial performance on social media
• Extracting demographic information from social media
• Predicting elections from social media.

This chapter covers the theoretical background for the different applications of the tools and methods described in this book. It provides short comments for 25 foundational research papers investigating how ideas are spreading in online social networks and how analyzing online social network structure and content can be used to extract demographic information about the underlying real-world network.


The first section investigates which network structure is most conducive to spreading new ideas and convincing others to accept these new ideas. It also demonstrates that cooperation is a good idea and looks at trustworthiness and uncalculating cooperators. Subsequently, these concepts are tested in Facebook networks, which provide an unbiased platform to verify the algorithms introduced in the first section. The next section demonstrates that machine learning can discover fake text that humans cannot. The next three papers show how financial trends such as stock prices can be predicted using Google Trends, Twitter, and Wikipedia. Similar algorithms are then used to extract demographic information such as personality characteristics, sociodemographic and socioeconomic information, and the most controversial topics in different cultures from Twitter, Wikipedia, blogs, and mobile phone records. Finally, Twitter data is also used to predict the outcome of popular elections (Table 7).

Table 7: Overview of Papers Covered in This Section.

Basic Concept | Main Insight | From Paper

How and why ideas spread in social networks
Weak ties | Dissemination of ideas | Battilana and Casciaro (2012)
Strong ties | Acceptance of ideas | Battilana and Casciaro (2012)
Advantages of embeddedness | Acceptance of ideas | Centola (2010)
Maximum number of close friends | 150 (Dunbar’s number) | Hill and Dunbar (2003)
Dark side of social capital | Acceptance of bad ideas comes from friends | Satyanath, Voigtländer, and Voth (2013)
Social structure to spread ideas | Structural fold (move people between teams) | Vedres and Stark (2010)
Social networks in stone age | Are the same, driven by homophily | Apicella, Marlowe, Fowler, and Christakis (2012)
Network structure of Nobel Prize winners is different | Fewer but fundamental papers, form groups in young years | Wagner, Horlings, Whetsell, Mattsson, and Nordqvist (2015)
Evolution is linking computers to humans | We think Google is part of our brain | Sparrow, Liu, and Wegner (2011)
Is cooperation genetic | Models show that altruistic cooperation benefits the individual | Nowak (2006)
Do we prefer to interact with people similar to us | Preference for interaction with similar people makes groups more similar | Fu, Nowak, Christakis, and Fowler (2012)
In a lab test, calculating cooperation is tested | Uncalculating cooperators are more popular | Jordan, Hoffman, Nowak, and Rand (2016)

Spreading ideas on Facebook
Do Facebook friends share the same musical taste | Yes, for classical music and Jazz, they start liking the same style | Lewis, Gonzalez, and Kaufman (2012)
Do people adopt recommendations from Facebook friends | Yes, preferably if the friend is older and male | Aral and Walker (2012)
Facebook behavior is correlated with happiness | People who get more direct Facebook messages are more popular | Burke, Kraut, and Marlow (2011)

Finding fake content through machine learning
Machine learning finds fake reviews | Using different classifiers | Ott, Choi, Cardie, and Hancock (2011)

Measuring financial performance
Google Trends predicts stock markets | Search for a stock correlates with stock price change | Preis, Moat, and Stanley (2013)
Twitter buzz correlates with NASDAQ | GPOMS “arousal” predicts rise/drop in stock prices | Bollen, Mao, and Zeng (2011)
Do Wikipedia edits and searches correlate with stocks | Searches do, edits do not | Moat et al. (2013)

Calculating demographic information
Most controversial topics in different languages through reverts | Are religion and politics, Israel is also consistently controversial | Yasseri, Spoerri, Graham, and Kertész (2014)
The way how one tweets predicts their personality | Using FFI, Twitterers with many followers are more extrovert | Quercia, Kosinski, Stillwell, and Crowcroft (2011)
Twitterers with more friends are happier | Happiness and popularity are not correlated | Bollen, Gonçalves, van de Leemput, and Ruan (2016)
Twitter behavior predicts income | Low income Twitterers use foul language | Preoţiuc-Pietro, Volkova, Lampos, Bachrach, and Aletras (2015)
Mobile phone records give demographic information | Income and physical activity can be measured from mobile phone records | Aharony, Pan, Ip, Khayal, and Pentland (2011)
Blogs will predict personality | FFI (five-factor inventory) predicted based on Pennebaker pronouns | Yarkoni (2010)

Predicting elections
Twitter buzz can predict outcome of elections | Congressional elections correlate with Twitter buzz | DiGrazia, McKelvey, Bollen, and Rojas (2013)


6.1. THEORIES OF INFORMATION DIFFUSION Battilana and Casciaro: Change agents, networks, and institutions: A contingency theory of organizational change Battilana and Casciaro describe the advantages of strong and weak ties. Weak ties, connections with casual acquaintances, are good for information dissemination. Strong ties — being embedded through many connections in a group of good friends — are beneﬁcial for the adaptation of new ideas. Drawing on a study in eight healthcare organizations in the United Kingdom, the authors ﬁnd that “explorative” innovations, which change the status quo, are better supported in “weak tie” networks with “structural holes,” while “exploitative” healthcare innovations that support the status quo ﬂourish in densely connected networks. In particular, the more structural holes the social networks of inﬂuencers have, the more likely they are to initiate explorative change. Centola: The spread of behavior in an online social network experiment The paper compares the adaptation of new ideas in social networks with different network structures. In a random network, the network diameter is lower than in a clustered one. So we would expect that in a random network, a new idea has a shorter path for spreading through the whole network. However, in experiments Centola found clustered networks to be more efﬁcient at spreading innovative ideas, despite the higher network diameter. A clustered network represents our social networks more

90

Sociometrics and Human Relationships

closely, as an individual’s neighbors often are neighbors to each other as well. This means that a person is repeatedly exposed to a new idea through the many neighbors in the same cluster, leading to faster adaptation of the idea. As we tend to emulate the behavior of our friends, the likelihood of a person adopting the new idea directly correlates with the amount of neighbors that have adopted the idea. Hill and Dunbar: Social network size in humans This paper analyzes the size of social networks in the Western society. The ﬁndings are based on a study in which the number of Christmas cards sent was measured. Forty-three white British households were questioned. The result of the study was that the average network size of a person is 154 people, out of which an average of 125 individuals are contacted explicitly, with the others being included by living in the same household. The relationship with the Christmas card recipients was examined by distance to the sender, relationship with the sender, social status of the recipient, last contact, and emotional closeness. It was found that the more distant the other person was, the more emotionally close, a colleague at work, and living overseas, the higher the likelihood of sending a Christmas card. Satyanath, Voigtländer, and Voth: Bowling for fascism: Social capital, and the rise of the Nazi party in Weimar Germany, 19191933 The authors ﬁnd a dark side of social capital. They measure social capital as the density of associations in a

How Ideas Spread in Online Social Networks — Readings

91

particular region of Germany at the time when Nazi Germany was emerging, collecting association data from 112 German cities where the records survived the 2nd World War. Their analysis found that areas with higher association density registered more entries to the NSDAP. It seems that the more of our friends adapt to a bad idea, the more we are likely to do the same. Vedres and Stark: Structural folds: Generative disruption in overlapping groups In a dataset with personnel ties among the largest 1696 Hungarian enterprises from 1987 to 2001, the researchers identiﬁed a distinctive network topology, the structural folds. Structural folds are bridging ties; actors at the structural folds are members of more than one cohesive group who over time change group membership, acting as knowledge transfer agents. The researchers found that groups with more structural folds show higher revenue growth. Apicella, Marlowe, Fowler, and Christakis: Social networks and cooperation in hunter-gatherers The researchers conduct a social network analysis among members of the Hadza tribe of Tanzania, a people of Stone Age hunter-gatherers. The authors ﬁnd the same networking behavior as for people living in the Western Facebook civilization. Ties between two people are measured through the option of giving honey sticks to each other. Reciprocity (the increased likelihood of an outbound tie to be reciprocated with an inbound tie from the same person), degree assortativity

Sociometrics and Human Relationships

(the tendency of popular people to befriend other popular people), transitivity (the likelihood that two of a person’s friends are in turn friends), and homophily (the tendency of similar people to form ties) seem to hold for both Stone Age and Internet age people. With respect to homophily, similar generosity, strength of handshake, and body mass index are all predictors of the existence of a tie.

Wagner, Horlings, Whetsell, Mattsson, and Nordqvist: Do Nobel laureates create prize-winning networks?
In this paper, a group of 68 Nobel laureates is compared to a group of similarly accomplished scientists who did not win a Nobel Prize. The goal was to compare productivity, impact, coauthorship, and international collaboration patterns of both networks. One big difference between the networks is that the laureates seem to be more likely to close structural holes by building bridges across networks. Therefore, the laureate network has significantly fewer communities and is more interconnected and less clustered. Nevertheless, nonlaureates seem to be more productive and have similar rates of collaboration. Laureates appear to focus on fewer, higher quality publications, and are more highly cited. Furthermore, more connections are found across the laureate network, providing more opportunity for bridging new ideas, methods, and technologies.

Sparrow, Liu, and Wegner: Google effects on memory: Cognitive consequences of having information at our fingertips
The Internet has become a primary form of external or transactive memory, where information is stored

collectively outside our brains. This paper examines whether people who expect to have access to online information have lower rates of recall of the information and enhanced recall of where to access it. In experiments, participants were shown statements, which they thought they would have access to later. A control group who thought they would not have access to the information later was shown the same statements. The researchers found that when participants thought they would have online access, they spent less effort storing the information. The conclusion is that human memory processes are adapting to ready access to information on the Internet, Google, and Wikipedia, by enhancing our memory of where to find information rather than of the information itself.

Nowak: The evolution of cooperation
Nowak’s premise is that cooperation is needed for evolution to construct more complex organizations. On the other hand, natural selection implies competition and therefore opposes cooperation. Nowak then introduces five mechanisms for the evolution of cooperation: kin selection, direct reciprocity, indirect reciprocity, network reciprocity, and group selection. Kin selection means that a member of a network is more likely to help another member of the network if the two are genetically related. On the highest level is group selection, where the members of a group cooperate with each other to beat other groups. According to Nowak, cooperation promotes biological diversity and is the secret behind the open-endedness of the evolutionary process.
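The five mechanisms can be made more concrete with the benefit-to-cost thresholds that Nowak gives in his 2006 Science article "Five rules for the evolution of cooperation". These thresholds are not part of the summary above; the function name and the parameter values in the example are illustrative assumptions, so treat this as a sketch rather than a reproduction of the paper.

```python
def cooperation_favored(b, c, *, r=None, w=None, q=None, k=None, n=None, m=None):
    """Check, per mechanism, whether the benefit-to-cost ratio b/c of a
    cooperative act clears the threshold from Nowak (2006).
    Mechanisms whose parameters are left as None are skipped."""
    ratio = b / c
    results = {}
    if r is not None:            # r: genetic relatedness of donor and recipient
        results["kin selection"] = ratio > 1 / r
    if w is not None:            # w: probability of another encounter
        results["direct reciprocity"] = ratio > 1 / w
    if q is not None:            # q: probability of knowing the other's reputation
        results["indirect reciprocity"] = ratio > 1 / q
    if k is not None:            # k: average number of neighbors in the network
        results["network reciprocity"] = ratio > k
    if n is not None and m is not None:  # n: maximum group size, m: number of groups
        results["group selection"] = ratio > 1 + n / m
    return results


# Made-up values: helping costs 1 unit and gives the recipient 3 units.
example = cooperation_favored(b=3, c=1, r=0.5, w=0.9, q=0.2, k=4, n=10, m=20)
for mechanism, favored in example.items():
    print(f"{mechanism}: {'favored' if favored else 'not favored'}")
```

With these invented numbers, cooperation clears the threshold for kin selection, direct reciprocity, and group selection, but not for indirect or network reciprocity, which shows how the same act can pay off under one mechanism and not under another.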

Fu, Nowak, Christakis, and Fowler: The evolution of homophily
This paper describes the evolution of homophily under different kinds of conditions by creating a model with preferences for either homophily (the tendency for individuals to interact with similar others) or heterophily, with different phenotypes (size, color, behavior), payoffs to interactions, evolution from one generation to the next, and overall fitness (ability to survive). The payoffs from homophilic interactions are called synergy, while payoffs from heterophilic interactions help to increase specialization. The model shows that favoring synergy can significantly reduce the total number of phenotypes, making a group more uniform and dominated by a single phenotype. In the long run, the group alternates between different dominant phenotypes. In heterophilic populations, diversity is maintained by privileging rare phenotypes. Homophilic populations prefer common phenotypes and drive alternate phenotypes to extinction.

Jordan, Hoffman, Nowak, and Rand: Uncalculating cooperation as a signal of trustworthiness
The paper describes a series of experiments relating uncalculating cooperation to trustworthiness. The researchers test the following three hypotheses: (1) People should engage in more uncalculating behavior when their decision process is observable. (2) People should perceive uncalculating cooperators as more trustworthy than calculating cooperators. (3) Uncalculating cooperators should behave in a more trustworthy way than calculating ones. To test their predictions, the researchers conducted two experiments, each structured in two stages.

6.2. SPREADING IDEAS ON FACEBOOK

Lewis, Gonzalez, and Kaufman: Social selection and peer influence in an online social network
The paper addresses the question of whether people influence each other on social networks. This was examined by collecting data from Facebook over four years from 1600 college students, complemented with data from college housing. The question investigated was whether Facebook friends would start picking up each other’s taste in music and reading, based on what they liked on Facebook. The results show relatively little evidence for social selection and social influence on social networks. Only for a few subareas such as “classical music” and “jazz” was a tendency to adopt a friend’s taste found. Musical tastes such as “indie” even show a negative effect: if somebody was a fan of indie music, this would deter their Facebook friends from also becoming indie fans.

Aral and Walker: Identifying influential and susceptible members of social networks
The paper studies the adoption of new ideas by tracking a new Facebook app recommending movies. The researchers measure influence and susceptibility of Facebook users based on how many messages they get from their friends about the new app. Users with high influence are less susceptible. Individuals with high susceptibility are mostly noninfluential. They found that older people are more influential than younger people and men are more influential than women, but that women are less susceptible to influence than men. Married people are least

susceptible to influence, while people reporting their marital status as “it’s complicated” are most susceptible.

Burke, Kraut, and Marlow: Social capital on Facebook: Differentiating uses and users
The authors measure the creation of social capital on Facebook. They combine longitudinal self-report surveys and Facebook server logs to examine how direct communication with friends, broadcasting status updates to a wide audience, and reading others’ news can predict changes in users’ social capital, self-esteem, and communication skills. They define three types of social behavior on social networking sites: (1) directed communication with individual friends (messaging, likes, tags in pictures), (2) passive consumption of social news, when one reads others’ updates, and (3) broadcasting, when one writes for others’ consumption without targeting a particular recipient. Results indicated that for people with low communication skills and self-esteem, passive consumption of information increases their self-esteem and communication skills. Only directed one-to-one communication will actively increase social capital.

6.3. FINDING FAKE REVIEWS THROUGH MACHINE LEARNING

Ott, Choi, Cardie, and Hancock: Finding deceptive opinion spam by any stretch of the imagination
This paper shows that machine learning does a better job than humans at detecting fake reviews. The authors

created a dataset of real and fake hotel reviews and automated the detection with genre identification, psycholinguistic methods, and simple text categorization. As a reference, they took a subset of the data categorized by human judges. The results show that automated methods are more capable of detecting deceptive opinion spam, while human judges perform roughly at chance. Among the automated approaches, n-gram-based text categorization achieved the best individual results. When combined with psycholinguistically motivated features, the detection accuracy reached almost 90%. The paper also studied how to best write credible fake reviews, considering the context as well as the underlying motivation to detect a deception.
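To give a feel for what n-gram-based text categorization means, here is a minimal sketch of a naive Bayes classifier over word unigrams and bigrams. It is not the authors' pipeline: the class name, the choice of naive Bayes with Laplace smoothing, and the four training reviews are assumptions made up for illustration, and a real study would need thousands of labeled reviews.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return all word n-grams (n = 1..n_max) of a lower-cased text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


class NgramNaiveBayes:
    """Tiny multinomial naive Bayes text classifier with Laplace smoothing."""

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.docs = Counter(labels)              # documents per class
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        def log_score(label):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / sum(self.docs.values()))
            for gram in ngrams(text):
                # Add-one smoothing so unseen n-grams do not zero out a class.
                score += math.log((self.counts[label][gram] + 1)
                                  / (total + len(self.vocab)))
            return score
        return max(self.counts, key=log_score)


# Invented toy reviews, for illustration only.
train_texts = [
    "my husband and i had an amazing luxury experience at this hotel",
    "the most amazing luxury vacation my family ever had",
    "room was small and the bathroom floor was dirty",
    "check in took 20 minutes but the location near the station was good",
]
train_labels = ["fake", "fake", "real", "real"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("amazing luxury hotel"))
print(model.predict("the bathroom was dirty"))
```

On this toy data the classifier simply learns that superlatives co-occur with the "fake" label and concrete complaints with the "real" label; whether such cues generalize is exactly what a properly sized, held-out evaluation has to establish.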

6.4. MEASURING FINANCIAL PERFORMANCE

Preis, Moat, and Stanley: Quantifying trading behavior in financial markets using Google Trends
The authors use Google Trends to look at Google search behavior for trading and finance-related search terms. They find a correlation between the number of searches for terms relevant for finance and trading and the Dow Jones Industrial Average (DJIA). They find that increases in searches for these terms precede drops in the Dow Jones by a few days. A trading strategy shorting and buying the Dow Jones each Monday based on the most predictive search terms averaged over the preceding six weeks gives high theoretical returns.
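A rule of this kind can be sketched in a few lines. The sketch below assumes a simplified version of the strategy: if last week's search volume was above its average over the preceding `window` weeks, short the index for one week; if it was below, go long. The function name, the toy numbers, and the three-week window used in the example call are illustrative assumptions, not data or code from the paper.

```python
import math


def trends_strategy_return(volume, price, window=6):
    """Cumulative log return of a 'short after rising search interest' rule.

    volume[t] is the search volume in week t and price[t] is the index
    level at the start of week t; both lists must have the same length.
    """
    total = 0.0
    for t in range(window + 1, len(price) - 1):
        # Average volume over the `window` weeks before last week.
        baseline = sum(volume[t - 1 - window:t - 1]) / window
        change = volume[t - 1] - baseline
        weekly = math.log(price[t + 1] / price[t])
        if change > 0:       # rising interest: short the index for one week
            total -= weekly
        elif change < 0:     # falling interest: hold the index for one week
            total += weekly
    return total


# Made-up weekly data: a spike in searches in week 3, then a price drop.
toy_volume = [10, 10, 10, 14, 9, 9, 12]
toy_price = [100, 100, 100, 100, 100, 95, 99]

toy_return = trends_strategy_return(toy_volume, toy_price, window=3)
print(f"cumulative log return: {toy_return:.4f}")
```

In the toy series the rule shorts the index in the week after the search spike and goes long the week after interest falls back, so both trades happen to be profitable; real returns depend heavily on the search term, the window length, and transaction costs, which the sketch ignores.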

Bollen, Mao, and Zeng: Twitter mood predicts the stock market
This paper investigates whether information about public mood extracted from Twitter messages has predictive capability regarding stock market prices. The researchers collected 10 million tweets over a time period of 10 months and compared them against daily DJIA closing values. The tweets’ contents were analyzed using the tool OpinionFinder to assess the emotional polarity of each tweet. As a second sentiment analysis tool, “Google-Profile of Mood States” was used to calculate the six mood dimensions: calm, alert, sure, vital, kind, and happy. To measure the relationship between DJIA and Google-Profile of Mood States, Granger causality analysis was used, finding a strong correlation between the calm state of tweets and the DJIA. Further analysis using a Self-Organizing Fuzzy Neural Network confirmed the correlation with an accuracy of 87.6% in predicting daily DJIA values based on the calm emotionality metric.

Moat, Curme, Avakian, Kenett, Stanley, and Preis: Quantifying Wikipedia usage patterns before stock market moves
This paper analyzes the correlation between stock market movements and search behavior on Wikipedia. The authors speculate that before trading decisions are made, the traders might look up information on Wikipedia or even make edits to the Wikipedia page. This means that the number of views or edits of financially relevant Wikipedia pages may contain early signs

of stock market moves. Based on previous studies in behavioral economics demonstrating that humans are loss averse, the authors assumed that investors might be willing to invest more effort in information gathering before making a decision that they view as being of greater consequence. This would lead to the conclusion that increases in information gathering precede falls in stock market prices. The authors analyzed Wikipedia metadata generated between 2007 and 2012 and measured changes in page views and page edits from one week to the next. If the number of page views or edits increased, they predicted falling stock prices; otherwise they bet on rising stock prices. The authors found that a hypothetical portfolio trading based on Wikipedia page view changes was highly profitable; however, they could not detect any signal from the Wikipedia page edits.

6.5. CALCULATING DEMOGRAPHIC INFORMATION

Yasseri, Spoerri, Graham, and Kertész: The most controversial topics in Wikipedia: A multilingual and geographical analysis
The authors extracted background information about the most controversial Wikipedia articles by calculating the “controversy” of an article, based on how many times it has been edited and reverted. They analyzed Wikipedia articles in 10 different languages with the additional help of geographical tags and found that articles about religion, Israel, and anti-Semitism are

controversial in every language and region, whereas most of the other topics are only controversial in certain regions and languages, for example, “Gipsy Crime” in Hungarian or “Chile” in Spanish. The English Wikipedia is exceptional because English is widely spoken all over the world; therefore, globally disputed topics like Jesus or anarchism are often represented in the English Wikipedia. The most controversial categories are about politics, geographical locations, and religion.

Quercia, Kosinski, Stillwell, and Crowcroft: Our Twitter profiles, our selves: Predicting personality with Twitter
This project investigated the relationship between personality characteristics of tweeters and their tweeting behavior. Based on a dataset collected from Facebook, where users could take a big five personality test, they compared the personality profiles of the 335 Twitter users included in this dataset against their tweeting behavior. They tracked three features of Twitter users publicly available on profiles: following (number of profiles a user follows), followers (number of followers), and listed counts (number of times the user has been listed in others’ reading lists). Using these three features, they clustered the 335 Twitter users into four categories: listeners, popular, highly read, and influentials. The study produced two main insights. First, all their Twitter users were emotionally stable and most of them were extroverts. Second, and interestingly, popular users tend to be “imaginative” (high in openness), while influential users tend to be “organized” (high in conscientiousness).

Bollen, Gonçalves, van de Leemput, and Ruan: The happiness paradox: Your friends are happier than you
This paper investigates the relationship between popularity and happiness on Twitter. The authors examined a group of 40,000 Twitter users. They assessed the popularity and the happiness of these users, calculating happiness through automatic sentiment analysis of tweets, and popularity by counting their number of followers. The results show that a friendship paradox (on average people are less popular than their friends) and a happiness paradox (on average people are less happy than their friends) exist, but there is no correlation between popularity and happiness.

Preoţiuc-Pietro, Volkova, Lampos, Bachrach, and Aletras: Studying user income through language, behaviour and affect in social media
This research analyzed Twitter users. It inferred their profession from their self-declaration in their Twitter profiles and assigned them a number of user-level psychodemographic features for later comparison. Results confirmed the impact of gender or race on income; the researchers also found that people with higher income post more neutral content, have more followers, and express more emotions of fear and anger. People with lower income on average send more tweets. The researchers assume that this is because lower income users use Twitter more for social interaction. They are also more emotional in their tweets, but compared to people with higher income they retweet less and

are retweeted less often. The researchers found that users with higher income tweet more about NGOs and corporate topics, whereas people with lower income tend to use more swear words.

Aharony, Pan, Ip, Khayal, and Pentland: Social fMRI: Investigating and shaping social mechanisms in the real world
Just like medical fMRI measures brain activity, the authors use social fMRI to measure interpersonal interaction through mobile sensors, most prominently mobile phone records. The paper reports three experiments. The first shows that individuals’ social interaction patterns are influenced by their financial status: the lower the income, the less social interaction they will have. The second experiment concludes that social relationships influence decision-making, particularly through face-to-face interaction; the more interaction somebody had, the more likely they were to install an app on their mobile phone. In the third experiment, the goal was to increase physical exercise through the influence of friends. The researchers found that compensating friends was the best strategy to get couch potatoes to exercise more.

Yarkoni: Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers
This paper compares personality characteristics and word usage among 700 bloggers who took the big five personality test, which measures intro/extroversion,

neuroticism, openness to experience, agreeableness, and conscientiousness. Yarkoni found that word usage indeed predicts personality characteristics. For example, a person with high neuroticism may primarily use adjectives to describe events in a negative way (awful, depressing, stressful) rather than nouns connoting actual negative events. A person with high agreeableness might often speak about love (love, hug) but is less likely to speak about sexual behavior (porn, gay, fuck). Some personality traits like openness have a high correlation with people’s vocabulary. Others like extraversion or conscientiousness might be more difficult to predict.

6.6. PREDICTING ELECTION OUTCOME

DiGrazia, McKelvey, Bollen, and Rojas: More tweets, more votes: Social media as a quantitative indicator of political behavior
This project finds a strong correlation between the number of Twitter mentions of candidates for the US congressional election in 2010 and their eventual vote tally. Using 500 million tweets, the researchers cross-verified their findings with district partisanship, demographics, and media coverage to control for other outside influences. The content of the tweets, other than the mention of the candidate’s name, did not seem to matter for predicting the popularity of a candidate.

MAIN LESSONS LEARNED
• Having a large group of weak-tie friends is good for spreading knowledge of new ideas; having a close-knit group of strong-tie trustworthy friends is good for the adoption of new ideas.
• Humans have on average 150 real friends and “tribe members” (Dunbar’s number); if they have more, for instance, on Facebook, they are not really friends.
• Collaborative groups with in-group altruism will win against noncooperative groups.
• Facebook provides an excellent testbed for social network research on homophily and influence; older men are most influential on Facebook, while in general humans do not tend to adopt many ideas from their Facebook friends.
• For discovering cheating online behavior, machine learning is more accurate than humans reading and assessing the text online.
• Online social media provides an excellent source for predicting financial performance of assets such as stock; in particular, Google or Wikipedia search patterns are predictive of future performance.
• Online behavior on social media such as Twitter can be used to predict demographic attributes of Twitter profiles such as income or health.
• Twitter can also be used to predict the outcome of elections.

PART II. ANALYZING STRUCTURE, DYNAMICS, AND CONTENT OF NETWORKS WITH CONDOR

The second part of this book describes how to use online social media analysis to Coolhunt (identifying trends by finding the trendsetters) and to Coolfarm (studying team networks and improving their communication for better collaboration and more innovation). Analyzing social media to read the collective mind consists of four different parts: virtual mirroring, trend forecasting, Coolhunting, and Coolfarming. These four social media analysis tasks differ fundamentally along two dimensions: (1) time and (2) the nature of what we do not know (Figure 21). We can analyze social media either to gain insights about things that we do not know today or to predict the future. Trend forecasting and Coolfarming are concerned with activities in the future, whereas virtual mirroring and Coolhunting are concerned with things that we do not know today. The second dimension distinguishes between two kinds of unknowns. There are things which we know to exist, but we do not know how they are developing; these are the known unknowns. But there are also things which we do not

© 2017 Peter A. Gloor

Figure 21: Four Fundamental Social Media Analysis Tasks to Read the Collective Mind.

know to exist before starting the analysis; these are the unknown unknowns. Mathematician Nassim Taleb calls unknown unknowns “black swans,”¹ as in the European Middle Ages it was clear that all swans had to be white; it was unimaginable that a swan could be black. Only when the first Europeans came to Australia and found black swans there did they have to change their beliefs.

After two introductory chapters that explain the basic architecture and concepts of the social media analysis tool Condor, the succeeding six chapters demonstrate all four facets of online social media sense making: virtual mirroring, trend forecasting, Coolhunting, and Coolfarming.

Chapter 9, Analyzing E-Mail with Condor (virtual mirroring)

¹ Taleb (2007).

Chapter 10, Calculating Personality Characteristics from E-Mail (trend forecasting)

Chapter 11, Predicting Criminal Intent from E-Mail — Analyzing the Enron E-Mail Archive (trend forecasting)

Chapter 12, Coolhunting on the Internet with Condor (Coolhunting)

Chapter 13, Coolhunting — Francogeddon (Coolhunting)

Section 14.1, Bernie Sanders’s Presidential Campaign: The Perfect COIN (Coolfarming)

Section 14.2, Coolhunting Bernie Sanders, Hillary Clinton, Jeb Bush, and Donald Trump (Coolhunting)

Section 14.3, Tribeﬁnder on Twitter (Using Machine Learning) (trend forecasting)

7 THE FOUR-STEP ANALYSIS PROCESS

CHAPTER CONTENTS
• Overview of Condor’s four-part architecture
• Condor’s social media fetchers
• Condor’s social media filters
• Condor’s social media visualizers
• Condor’s social media exporters

To illustrate Coolhunting, Coolfarming, trend forecasting, and virtual mirroring concepts described in Part I of this book, Part II introduces the social media analysis tool Condor, which has been developed over the last 14 years by a team from the University of Applied Sciences Northwestern Switzerland, the MIT Center for Collective Intelligence, and over the last seven years by the software company galaxyadvisors. Condor is a powerful social media analysis tool for collecting all types of social media data, aggregating the data, visualizing it,

and exporting it to other types of analysis tools such as Excel. Condor consists of four parts:
(1) A series of fetchers to directly load data from e-mail (for example, from Gmail or Exchange), calendars, Skype, Twitter, Google, Wikipedia, and Facebook.
(2) Interactive preprocessing functions to modify and reduce the graph, filter by content and by structure, annotate by geocode, merge multiple e-mail addresses, and create modified graphs.
(3) Visualization functions to show the static network, a dynamic movie of the network over time, geographical word maps, and different views for structure, content, and sentiment of actors.
(4) Export functions to export time series of all variables for later longitudinal analysis in statistics packages such as R or SPSS (or Excel).

Figure 22 lists the four components of Condor: the fetchers, filters, visualizers, and exporters. They work together to calculate the six honest signals of collaboration introduced in Chapter 4. These honest signals can be calculated both from the inside of an organization, tracking mostly e-mail, online calendars, and chat, or from the outside on the Internet, tracking tweets, posts on Facebook walls, and the speed with which Wikipedia pages about a certain topic are edited.

In step 1, the different fetchers collect the raw communication data in an easy and automated way from e-mail archives, Twitter, Facebook, the Web, and Wikipedia. In step 2, the filters allow the user to preprocess the data to prune and shape the network in the most meaningful way for

Figure 22: Four Main Parts of Condor: Fetchers, Filters, Visualizers, Exporters.

analysis. To calculate the six honest signals of collaboration, the network first has to be cleaned. This is where the science of social network analysis meets the art of social network interpretation. To take a simple example, a user with multiple e-mail addresses can be merged into one virtual actor for later analysis. Or mailing list addresses can be removed, as they would show up as the most central actors in the network without having any real social meaning. Or in a complex network with many so-called “leaf” nodes, which are only connected to one other node, these peripheral nodes can be pruned from the network.

In step 3, the Condor user can look at the different visualizations of the network’s structure, dynamics, and content to develop hypotheses about which honest signals might be most indicative of the outcome metrics the user is trying to measure. For example, looking at the contribution index chart of individual actors within a community will tell the analyst which person is the most popular member of the community (receiving the most e-mails or tweets) and who is the most active participant (sending the most e-mails or tweets). In the dynamic view or in time series curves, the development of a community or a discussion about a topic can be tracked over time.

In step 4, the numbers of the time series can be calculated and exported to an external analytics or visualization tool such as Excel or SPSS. This allows the user, for instance, to drill down on the evolution of betweenness over time to see who has been the most central person in a community at any given point in time. A time series of actor-level emotionality metrics will tell who has been most positive at a given point in time. A time series of response times from others will tell how somebody is

gaining respect by others answering her successively faster over time. This is where the statistics briefly described in Section 5.2 will come in handy. We will now look at each of the four Condor components (fetcher, filter, visualizer, exporter) in more detail.
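The cleaning operations of step 2 and a simple activity measure for step 3 can be sketched outside of Condor as follows. This is not Condor's implementation: the function names and the example.com addresses are hypothetical, and the contribution index is assumed here to be (sent − received) / (sent + received), a definition this chapter does not spell out.

```python
from collections import Counter


def clean_messages(messages, aliases, mailing_lists):
    """Merge alias addresses into one virtual actor and drop list traffic."""
    cleaned = []
    for sender, receiver in messages:
        sender = aliases.get(sender, sender)
        receiver = aliases.get(receiver, receiver)
        if sender in mailing_lists or receiver in mailing_lists:
            continue                      # mailing lists carry no social meaning
        if sender != receiver:            # ignore mail to one's own alias
            cleaned.append((sender, receiver))
    return cleaned


def prune_leaves(messages):
    """Drop messages touching actors with only one distinct contact.
    Single pass: actors that become leaves after pruning are kept."""
    contacts = {}
    for sender, receiver in messages:
        contacts.setdefault(sender, set()).add(receiver)
        contacts.setdefault(receiver, set()).add(sender)
    keep = {actor for actor, others in contacts.items() if len(others) > 1}
    return [(s, r) for s, r in messages if s in keep and r in keep]


def contribution_index(messages):
    """Assumed definition: +1 means only sending, -1 means only receiving."""
    sent = Counter(s for s, _ in messages)
    received = Counter(r for _, r in messages)
    return {actor: (sent[actor] - received[actor]) / (sent[actor] + received[actor])
            for actor in set(sent) | set(received)}


# Invented (sender, receiver) pairs for illustration.
raw = [
    ("ann@example.com", "bob@example.com"),
    ("ann.home@example.com", "bob@example.com"),
    ("ann@example.com", "cat@example.com"),
    ("bob@example.com", "ann@example.com"),
    ("bob@example.com", "cat@example.com"),
    ("cat@example.com", "ann@example.com"),
    ("dan@example.com", "ann@example.com"),
    ("ann@example.com", "all-staff@example.com"),
]
cleaned = clean_messages(raw,
                         aliases={"ann.home@example.com": "ann@example.com"},
                         mailing_lists={"all-staff@example.com"})
pruned = prune_leaves(cleaned)
ci = contribution_index(pruned)
for actor in sorted(ci):
    print(actor, round(ci[actor], 2))
```

The order of the operations matters: merging aliases first ensures that an actor's contacts are counted once per person, so that leaf pruning and the contribution index are computed on people rather than on addresses.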

7.1. SOCIAL MEDIA FETCHERS

The fetchers (Figure 23) allow Condor to automatically collect large amounts of social network data. In particular, Condor has fetchers for e-mail, online calendars, Skype, Facebook wall, Wikipedia, Google Custom Search, and Twitter. The fetchers get the data from outside sources and store it in the MySQL database. This data can then be taken and preprocessed for later analysis. Note that besides the live social media fetchers, Condor can also directly import MySQL databases and Microsoft CSV files.

Figure 23: List of Condor Fetchers.

7.2. SOCIAL MEDIA FILTERS

The Condor filters prepare the data stored by the fetchers in the MySQL database for later analysis. They allow the user to merge multiple e-mail addresses into one virtual actor; for example, the e-mail addresses [email protected] and [email protected] can be combined into a single virtual actor, that is, one node on the screen. Actors can also be removed by name or by property, for example, removing all nodes with fewer than three nearest neighbors. One key function is the annotate function, which will calculate the six honest signals of collaboration described in Chapter 4. Note that all these changes are only stored in the MySQL database when the changes are explicitly saved under a new database name. As annotated values such as betweenness centrality or sentiment are calculated over the whole network, they should be recalculated whenever a node has been removed (Figure 24).
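Why network-level values need to be recalculated after a filter has removed nodes can be seen in a small example. The sketch below is independent of Condor: it computes unnormalized betweenness centrality with Brandes' algorithm on a made-up undirected graph, removes all nodes with fewer than three neighbors, and computes betweenness again. The function names and the graph are assumptions for illustration.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph given as
    {node: set_of_neighbors}; returns unnormalized betweenness centrality."""
    bc = dict.fromkeys(adj, 0.0)
    for source in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)     # number of shortest paths from source
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                      # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                bc[w] += delta[w]
    return {v: value / 2 for v, value in bc.items()}  # each pair was counted twice


def remove_low_degree(adj, min_neighbors=3):
    """Keep only nodes with at least min_neighbors neighbors (single pass)."""
    keep = {v for v, neighbors in adj.items() if len(neighbors) >= min_neighbors}
    return {v: adj[v] & keep for v in keep}


# Made-up graph: A, B, C, D all know each other; E and F only know D.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "E"), ("D", "F")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

before = betweenness(graph)
after = betweenness(remove_low_degree(graph))
print("D before filtering:", before["D"])
print("D after filtering:", after["D"])
```

In this toy graph, D owes all of its betweenness to the two peripheral nodes E and F; once they are filtered out, D is no more central than A, B, or C, so a value annotated before the removal would be misleading.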

7.3. SOCIAL MEDIA VISUALIZERS

At the core of Condor is a list of visualizations (Figure 25). Key are the static and dynamic network views; the dynamic view shows a movie of the evolution of an e-mail, Twitter, or Wikipedia network over time. The actor scatter plot allows users to quickly compare and visualize

Figure 24: Condor Filters.

Figure 25: Condor Visualizers.

actor-level metrics. The values are straightforward to compare if the same variable, for example, “total number of messages,” is always chosen for the x-axis. The sentiment views, the word cloud, and the geomap view show the content that Condor finds in the tweets and the e-mail message bodies. The SNA metrics over time view shows the evolution of group betweenness centrality and other graph-level metrics. The temporal social surface view shows the same information, breaking it down to the individual actor level.

7.4. SOCIAL MEDIA EXPORTERS

The Condor export wizards export actor-level metrics in CSV format that are aggregated over time, as well as

Figure 26: Condor Export Wizards.

longitudinal time series for later analysis in SPSS, Excel, and R. The exported data can be shown directly in Excel or can be further manipulated and computed in statistics packages such as SPSS, Matlab, R, or Stata (Figure 26).

After this brief introduction to the architecture of Condor, we will now look at how to get started with Condor.
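Once a time series has been exported, a few lines of code are enough for a first look at it. The sketch below assumes a CSV file with the columns `actor`, `window`, and `betweenness`; these column names and the values are invented for illustration, and the actual layout of a Condor export may differ.

```python
import csv
import io

# Stand-in for an exported file; in practice this would be read from disk.
EXPORT = """actor,window,betweenness
ann,2017-01,0.10
bob,2017-01,0.40
ann,2017-02,0.55
bob,2017-02,0.30
"""


def most_central_per_window(csv_text):
    """Return {time window: actor with the highest betweenness in that window}."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row["betweenness"])
        window = row["window"]
        if window not in best or value > best[window][1]:
            best[window] = (row["actor"], value)
    return {window: actor for window, (actor, _) in best.items()}


leaders = most_central_per_window(EXPORT)
for window, actor in sorted(leaders.items()):
    print(window, actor)
```

This answers the kind of question raised for step 4, namely who was the most central person at a given point in time; for anything beyond such quick checks, a statistics package is the better tool.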

MAIN LESSONS LEARNED
• The four-step analysis process in Condor starts with collecting communication data from Twitter, Facebook, Wikipedia, and blogs, and also from e-mail and other types of organizational communication archives such as calendars or Skype.
• The collected data is preprocessed and cleaned using a series of content filters.
• In the next step, Condor provides a variety of visual analysis tools to visually explore the social network in many different ways.
• In the last step, the data is exported as actor-level variables and time series for further statistical analysis in tools like Excel, KNIME, R, or SPSS.

8 GETTING STARTED WITH CONDOR

CHAPTER CONTENTS
• Analyzing the Facebook wall
• Analyzing Twitter tweets
• Measuring the importance of brands through betweenness in bipartite graphs
• Removing the “Nobodies”: pruning the leaves
• Degree-of-separation search

In order to start Condor, you first need to install Java and MySQL on your machine (Condor runs on Windows, Mac, or Linux). The first step is to download the latest version of the Java runtime from the Oracle website https://java.com/en/download/ by clicking on the large red button “Free Java Download.”

If you have problems installing Java, you will find help in the Condor Manual: http://91.250.82.108:8080/condor/Condor%203%20User%20Manual.pdf

The next step is to install MySQL, which you can download from http://dev.mysql.com/downloads/mysql/. Again, if you have problems, you will find tips in the Condor Manual. There is also a series of YouTube videos, linked from the Condor Manual, that will take you step by step through the process.

Now you are ready to download Condor, which you will find at http://guardian.galaxyadvisors.com/. The free academic version will allow you to analyze and visualize up to 10,000 nodes; you can download as many nodes as you want with the different data fetchers. If you want to analyze and visualize larger networks, you first have to sign up with a valid e-mail address. You will subsequently get a download link for a Condor trial version sent to your e-mail address. In the e-mail, click on “verify email address.” This will take you to your account page on the Condor Guardian website, from which you can download a full trial version of Condor.


Once you have downloaded Condor, you can either double-click the Java icon or, better, start it from the command line, which lets you allocate more memory to Condor. To do so, open a DOS command window or a Mac or Linux terminal window.
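Allocating more memory from the command line is normally done with the standard JVM option -Xmx, which sets the maximum heap size. The sketch below only assembles such a command. The jar file name condor.jar and the 4 GB figure are assumptions made for illustration, so substitute the name of the file you actually downloaded and a value that fits your machine.

```python
import shlex


def build_launch_command(jar_path, max_heap_gb):
    """Return the argument list for starting a Java program with a larger heap."""
    if max_heap_gb < 1:
        raise ValueError("max_heap_gb must be at least 1")
    # -Xmx is the standard JVM flag for the maximum heap size.
    return ["java", f"-Xmx{max_heap_gb}g", "-jar", jar_path]


# "condor.jar" is a hypothetical file name, not taken from the Condor manual.
command = build_launch_command("condor.jar", 4)
print(shlex.join(command))
```

The printed line can be pasted into a terminal window; the Condor Manual remains the reference for the exact file name and recommended memory settings.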

When you try to start Condor on a Mac, depending on your security settings, you might have to go to the “Security & Privacy” pane and allow Condor to run by clicking on “Open Anyway.”


The next step, once you have allowed Condor to start, will be to enter your license key.

After that, the MySQL login window will pop up. If the fields are all red, you most likely do not have MySQL running. Note that installing MySQL does not start it; you have to start it yourself after the installation.
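A quick way to tell whether a MySQL server is running before you open the login window is to test whether anything accepts connections on its port. The following is a minimal sketch that assumes MySQL listens on its default port 3306 on the local machine; it only checks that the port is open, not that the server behind it is really MySQL.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 3306 is MySQL's default port; adjust it if your installation uses another one.
if port_is_open("127.0.0.1", 3306):
    print("A server is listening on port 3306.")
else:
    print("Nothing is listening on port 3306; start MySQL first.")
```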

Once your MySQL login window looks like the image below, you can click “ok” if you have installed MySQL with default settings, that is, as user “root” with no password. If you have set a MySQL password, you have to enter it now.


Condor will now bring up the Condor Workspace, and you are ready to start working.

Now you are ready to jump into your first social media analysis.

8.1. ANALYZING THE FACEBOOK WALL WITH CONDOR

We will start by collecting and analyzing your own personal Facebook wall. This only works if you have a Facebook account.


The ﬁrst step consists of creating a dataset.

Alternatively, you can directly enter a new dataset name and create the new dataset on the ﬂy when starting the Facebook wall fetcher.

This will bring up a window to log into Facebook and authorize Condor to collect the Facebook wall.


After logging in, you will be given a security warning by Facebook, which you can ignore. Click on the “next” button again, to collect the actual data.

Once the wall has been collected, you can create a static view of your network.


To easily ﬁnd the owner of the wall (myself in this case), I annotate my network by betweenness centrality.

Among the options, I click on “betweenness centrality.” Annotating the graph means calculating the corresponding variable (betweenness centrality, degree centrality, contribution index, etc.) for each actor. Note that these variables change whenever even a single actor is removed from or added to the network, and then have to be recomputed by rerunning the corresponding annotate command.

When I now size the nodes by betweenness centrality, I see myself as the largest node.

After this simple use of Condor to have a look at your Facebook wall, we will now run a step-by-step analysis “fetcher — ﬁlter — visualizer — export” as described in Chapter 7.


8.2. SAMPLE FOUR-STEP ANALYSIS WITH TWITTER

Before running the first Twitter Fetcher query, you will need to obtain your own personal Twitter API (application programming interface) keys. For this, you will need to be registered with Twitter. After that, go to https://apps.twitter.com and click on “Create New App.” Fill in the details of the app, for example “measure importance of presidential candidates,” and click on “Create Your Twitter Application” to create your consumer key and consumer secret. You will also have to create your Twitter access token and access token secret, as described here: https://dev.twitter.com/oauth/overview/application-owner-access-tokens (Alternatively, you can generate Twitter access tokens on the fly by clicking on “Login with Twitter” in the first Condor dialog.)

Once you have your Twitter access token, your Twitter access token secret, your Twitter consumer secret, and your Twitter consumer key, you are ready to collect Twitter data with the Condor Twitter Fetcher. For this simple first example, we will compare the US presidential candidates Bernie Sanders, Donald Trump, Hillary Clinton, and Ted Cruz. We start by creating a new MySQL database in Condor, naming it “4candidates.”

8.2.1. Step 1 — Fetch Data

Then we create a dataset for each candidate, starting with Donald Trump.


Next, we run a query collecting the most recent 2000 tweets about the candidate. We need to make sure that the checkbox “Connect nodes with search term” is checked. This will add an additional link from each tweet containing the search term “donald trump,” to the search term “donald trump” which will be shown as an additional actor of type “search term” in the resulting network. Including this link in the graph will allow us to later compare the strength of the brand “donald trump” to the brands “hillary clinton,” “bernie sanders,” and “ted cruz,” which is described in Section 8.3.
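The effect of the “Connect nodes with search term” checkbox can be pictured as one extra link per matching tweet. The sketch below is not Condor's implementation: the tweet records, the field names author, text, and mentions, and the direction of the links are all assumptions chosen for illustration.

```python
def build_edges(tweets, search_term):
    """Turn tweets into (source, target) links, including links to the search term."""
    term_node = ("search term", search_term)
    edges = []
    for tweet in tweets:
        author = ("person", tweet["author"])
        # Link the author to everyone retweeted or mentioned in the tweet.
        for other in tweet["mentions"]:
            edges.append((author, ("person", other)))
        # The extra link described above: tweet author -> search term.
        if search_term.lower() in tweet["text"].lower():
            edges.append((author, term_node))
    return edges


# Invented sample data, not real tweets.
tweets = [
    {"author": "alice", "text": "Donald Trump spoke today", "mentions": []},
    {"author": "bob", "text": "RT @alice: Donald Trump spoke today", "mentions": ["alice"]},
    {"author": "carol", "text": "@bob nice weather", "mentions": ["bob"]},
]
edges = build_edges(tweets, "donald trump")
incoming = sum(1 for _, target in edges if target == ("search term", "donald trump"))
print(incoming)
```

Because the search term is a node of its own type, it can later be sized and compared like any other actor in the network.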

In the next step you can either use the credentials from your app, or log in with Twitter. This process is now repeated for the other three candidates: Hillary Clinton, Bernie Sanders, and Ted Cruz. To do a combined analysis, we then merge the four datasets into one combined dataset.

8.2.2. Step 2 — Process

We now calculate the values to be displayed in Step 3:
• Betweenness and degree centrality
• Betweenness oscillation
• Contribution index annotation
• TurnTaking annotation
• Sentiment


8.2.3. Step 3 — Visualize

The first parameter to look at is the activity of the combined tweets. We see that the last 2000 tweets about each of the four candidates, collected on March 6, 2016, around 14:30, cover just the last hour, from 14:41 to 15:28.

Bringing up the actor scatter plot will tell us who the most respected tweeters are, that is, those who are retweeted or responded to most quickly by others. For instance, CliffWilkin is a proponent of “Convention of States,” a right-wing initiative that wants to take away power from the federal government, and a follower of Ted Cruz. Other tweeters respond to CliffWilkin quickly, within 0.0125 hours on average, which is 45 seconds.

Image 2 (for color pictures see the online version of the images, available at http://www.ickn.org/sociometrics/)

8.2.4. Step 4 — Export

In the final step, we export the calculated actor values. We can then open the file “4candidates.csv” in Excel for later analysis.


8.3. MEASURING THE IMPORTANCE OF BRANDS THROUGH BETWEENNESS OF ACTORS IN BIPARTITE GRAPHS

Condor offers a unique way of measuring the strength of brands by measuring the betweenness of their search terms in a social network graph created through Twitter, Google Custom Search Engine (CSE) search, or Wikipedia. For instance, in a Twitter graph, a link between two actors is drawn if person A retweets person B, or if person A mentions person @B in a tweet. In addition, everybody tweeting about a brand is linked to that brand’s search term, so the more a brand is mentioned in tweets, the more links it will have. For instance, everybody tweeting about “bernie sanders” will have a link to the search term “bernie sanders.” The more people tweet about “bernie sanders,” the more incoming links the search term “bernie sanders” will thus have. If we extend the graph to include all tweets about “donald trump,” then the more incoming links “donald trump” has compared to “bernie sanders,” and the more central the people tweeting about “donald trump” are compared to the people tweeting about “bernie sanders,” the stronger the brand of “donald trump” is.

The image below (image 3) shows the static view with the nodes sized by betweenness. The static view for Twitter contains two types of actors. All the people tweeting are shown as circles, with the tweets as connecting lines from a tweeter to the retweeter or to the person mentioning the tweeter in a tweet. In addition, each original search term is shown as a special node, drawn as a square. In our example, we have four squares: “bernie sanders,” “hillary clinton,” “ted cruz,” and “donald trump.” By the definition of betweenness, the size of the square is a proxy for the importance of the search term. Dragging the mouse over each search term will tell us that in the time from 14:41 to 15:28 on March 6, 2016, Donald Trump had the strongest brand, with a betweenness of 1.23 × 10⁷, followed by Ted Cruz with a betweenness of 1.06 × 10⁷, Hillary Clinton with 9.96 × 10⁶, and Bernie Sanders with 9.86 × 10⁶.
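To see why a search term with more, and better connected, tweeters ends up with higher betweenness, it helps to compute the measure on a toy network. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; whether Condor computes betweenness in exactly this way (direction, weighting, normalization) is not stated here, so treat it as an illustration only. The node names “brand X”, “brand Y”, x1 to x4, y1, and y2 are invented.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency dictionary from a list of (a, b) links."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes, undirected, unweighted)."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        paths[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    paths[neighbor] += paths[node]
                    predecessors[neighbor].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Every undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


edges = [
    ("brand X", "x1"), ("brand X", "x2"), ("brand X", "x3"), ("brand X", "x4"),
    ("brand Y", "y1"), ("brand Y", "y2"),
    ("x4", "y1"),  # x4 retweets y1, joining the two camps
]
scores = betweenness(undirected(edges))
print(scores["brand X"], scores["brand Y"])
```

In this toy network the search term with four tweeters lies on more shortest paths than the one with two, which mirrors the comparison of the squares in the static view.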

8.4. PRUNING THE LEAVES IN A GRAPH

The Twitter network shown below (image 3) about Trump, Cruz, Clinton, and Sanders contains 7435 actors and 17,325 links. We can prune the graph and remove all the peripheral nodes by removing all the nodes with degree centrality 1; these are all the people who did not get retweeted, and whom nobody mentioned in another tweet, or who did not mention anybody else in a tweet. For this we use the “Process dataset->Actor filter” function. Before bringing up the dialog, we need to make sure we have calculated the degree centrality (“Process dataset->Annotate->centrality annotations”).

Image 3

All the “leaf”-nodes at the periphery, that is, all the tweeters that have only tweeted once without being retweeted or being mentioned in another Tweet, are now gone (image 4 on page 140).
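The pruning described above amounts to a simple degree filter. A minimal sketch follows, assuming that degree centrality means the number of distinct neighbors of a node and that the filter is applied once; the node names are invented.

```python
def prune_leaves(edges):
    """Keep only links between nodes that have more than one distinct neighbor."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    keep = {node for node, others in neighbors.items() if len(others) > 1}
    return [(a, b) for a, b in edges if a in keep and b in keep]


edges = [
    ("hub", "x"), ("hub", "y"), ("x", "y"),
    ("hub", "leaf1"), ("y", "leaf2"),  # leaf1 and leaf2 have a single link each
]
pruned = prune_leaves(edges)
print(pruned)
```

A single pass can expose new leaves (nodes whose only other neighbors were just removed), so the filter may have to be repeated, with degree centrality recalculated in between, if a fully leaf-free core is wanted.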

Image 4

8.5. DEGREE-OF-SEPARATION SEARCH WITH GOOGLE CSE

The same four-step analysis process “fetch-filter-visualize-export” also works for other data sources. Repeating the same query on the Web for two of the candidates, Trump and Clinton, leads to a “degree-of-separation” search that measures which candidate is more popular, and which websites contribute most to the candidates’ importance.

Web searches in Condor are conducted through the Google CSE API. Before executing the first Condor Google CSE search, you will need to obtain a Google CSE API key. You can get an API key from https://console.developers.google.com/apis/ by clicking on “Custom Search API” in the “Other popular APIs” group. Enter this key in the dialog box in Condor. You will also need to enter the Condor CX key: 000229616349723713761:mlcaolv1mpw. This process is also described step by step in the Condor manual. Google gives you 100 free queries per day; if you want to run more queries, Google will ask for your credit card number and charge you a few cents per query.

Condor’s “degree-of-separation” search provides a powerful mechanism for measuring the importance of a search term on the Web based on the importance of the websites where it is being used, similar to Google’s PageRank algorithm. Unlike PageRank, which returns a single number per website as a proxy for its importance, “degree-of-separation” search returns the betweenness of the website for the particular search term. For example, for searches for politicians politico.com will be important, while for searches for actors imdb.com will be prominent.
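Condor issues the Google queries for you, but it can be useful to see what such a request looks like. The sketch below builds a request URL for Google's Custom Search JSON API from an API key and a CX key. It does not send the request, and it is an assumption that Condor's fetcher builds its requests this way. YOUR_API_KEY is a placeholder for your own key.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, cx, query, num=10):
    """Return the request URL for one page of Custom Search results."""
    if not 1 <= num <= 10:
        raise ValueError("the API returns at most 10 results per request")
    parameters = {"key": api_key, "cx": cx, "q": query, "num": num}
    return CSE_ENDPOINT + "?" + urlencode(parameters)


url = build_cse_url(
    "YOUR_API_KEY",  # placeholder, replace with the key from the Google console
    "000229616349723713761:mlcaolv1mpw",  # the Condor CX key given above
    "Donald Trump",
)
print(url)
```

Each such request counts against the daily quota mentioned above, which is worth keeping in mind when a search is expanded to the pages linking back to every result.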


To illustrate how “degree-of-separation” works, let’s look at a specific example. For the search for “Donald Trump,” a Google search is run through the Google Custom Search Engine (CSE) API. The resulting 10 or 20 top links (depending on your settings in the CSE fetcher) are then plugged back into the Google search engine, and the top 10 or 20 back links pointing back to each of the top 10 or 20 original sites are taken. In more detail (see Section 12.1 for a step-by-step description in Condor), it works as follows:

Step 1: Using “Fetch->Fetch Web,” search on Google for “Donald Trump” and get the top 10 results

In the example below, done on October 3, 2016, the search for “Donald Trump” returned the following websites:

The image below shows the same result, visualized in Condor as a network with each website containing the search text “Donald Trump” pointing back to the original search term “Donald Trump.”


Step 2: Get the top 10 results pointing back to the top 10 results

Condor will now collect websites such as www.headlinespot.com, which links to www.politico.com, which in turn contains the search text “Donald Trump.” The image below shows the complete backlink network. Blue nodes contain the search term “Donald Trump”; yellow nodes link back to the blue nodes.

Image 5

In the next step, we repeat the same “degree-of-separation” search for Hillary Clinton, to check which one of the candidates is more popular on the Web.

Step 3: “Degree-of-separation” search for “Hillary Clinton”

The image below shows the top 10 websites pointing back to the top 10 websites containing the search term “Hillary Clinton.”


Step 4: Merge degree-of-separation search for “Donald Trump” and degree-of-separation search for “Hillary Clinton” and calculate betweenness centrality

The image below illustrates the resulting bipartite (with nodes of two types) website network, with the search terms shown as squares and the websites shown as circles. The size of a node shows its importance, measured as betweenness centrality. In the network below, Hillary Clinton has higher betweenness centrality (her purple square is slightly larger than the one for Donald Trump), based on the higher betweenness centrality of the websites pointing back to her (nytimes.com, nymag.com).

Image 6
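The two-level structure built in Steps 1 and 2 can be sketched in a few lines. This is an illustration under stated assumptions rather than Condor's fetcher: the functions search and backlinks stand in for live Google CSE queries, the lookup tables are invented, and news.example.org is a made-up site. Only the headlinespot.com to politico.com link is taken from the example above.

```python
def degree_of_separation(term, search, backlinks, top_n=10):
    """Return links page -> term and referrer -> page for one search term."""
    edges = []
    for page in search(term)[:top_n]:
        edges.append((page, term))  # Step 1: the page contains the search term
        for referrer in backlinks(page)[:top_n]:
            edges.append((referrer, page))  # Step 2: the referrer links to the page
    return edges


# Invented lookup tables standing in for live search results.
SEARCH_RESULTS = {"Donald Trump": ["www.politico.com", "news.example.org"]}
BACKLINKS = {"www.politico.com": ["www.headlinespot.com"], "news.example.org": []}

edges = degree_of_separation(
    "Donald Trump",
    search=lambda term: SEARCH_RESULTS.get(term, []),
    backlinks=lambda page: BACKLINKS.get(page, []),
)
for source, target in edges:
    print(source, "->", target)
```

Running the same function for a second search term and concatenating the two edge lists gives the merged bipartite network of Step 4, on which betweenness centrality is then calculated.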

8.6. DEGREE-OF-SEPARATION SEARCH WITH TWITTER

The same degree-of-separation search as for websites can also be run to collect and analyze Twitter data. The trick is to include the search term as an extra node in the network. The other nodes in the network are not websites, but Twitter users, and a link between two nodes (except a link to the search term) denotes one user either retweeting another user or mentioning her or him in a tweet. We again compare the popularity of Donald Trump with the popularity of Hillary Clinton on Twitter (see Section 12.3 for a detailed step-by-step example in Condor).


Step 1: Collect last 500 tweets about “Donald Trump”

In Step 1, we collect 500 tweets, which will only cover a few minutes of tweets on October 3, 2016, as the topic is being tweeted about frantically. We have to make sure to click the checkbox that connects the search term “Donald Trump” with all the tweets. Visualizing the search results leads to the image below; the grey box in the center is the search term.

On a side note… Removing the search term will show the most important tweeters. The image below shows the network above with the search term “Donald Trump” removed; the tweeters are sized by betweenness centrality. Nytimes and ShazBooty714 are the most important tweeters.


Step 2: Collect last 500 tweets about “Hillary Clinton”

We repeat the same search on Twitter for the term “Hillary Clinton,” again connecting the search term with all tweeters. The image below shows the results.


Step 3: Merge Twitter degree-of-separation search results for “Donald Trump” and for “Hillary Clinton” and calculate betweenness centrality

The image below shows the merged datasets; the size of a node denotes its betweenness centrality. The squares for Hillary Clinton and Donald Trump are of comparable size; this means that on Twitter both candidates show approximately the same brand strength.


8.7. WIKIPEDIA SEARCH

We can also use Wikipedia to measure the strength of a brand. This is done not by “degree-of-separation” search, but by direct search. For a detailed example of how to use the Wikipedia Evolution fetcher, see Section 12.2. To compare the strength of Hillary Clinton and Donald Trump on Wikipedia, we used Condor’s Wiki Evolution fetcher to collect the last 250 edits on the Wikipedia pages about “Donald Trump” and “Hillary Clinton.” Condor will collect all pages linking from and back to the pages (called “bidirectional links” in Condor) about “Donald Trump” or “Hillary Clinton” on Wikipedia which have been referenced in the last 250 edits from other Wikipedia pages. The image below shows the resulting Wikipedia link network; each node is a Wikipedia page, and each connecting line is a bidirectional link between two pages. The two most central pages are, not surprisingly, the pages about Hillary Clinton and Donald Trump. Each node is sized by betweenness centrality. Hillary Clinton, who, in addition to being the US presidential candidate, is the wife of a former president, a former senator, and a former US secretary of state, is more central than Donald Trump in the Wikipedia link network.

After this general introduction to the four-step analysis process with Condor, we will now look at how to analyze your own mailbox.


MAIN LESSONS LEARNED
• Condor runs on Mac, Windows, and Linux.
• Before running Condor, Java and MySQL must be installed.
• The Facebook wall fetcher allows you to collect and analyze your personal Facebook wall, if you have a Facebook account.
• The Twitter fetcher allows you to analyze tweets about any topic; you will need to obtain Twitter API keys first.
• The Google CSE fetcher allows you to analyze the most important websites about a certain topic; you will need to obtain Google CSE API keys first.
• Degree-of-separation search measures the importance of brands by constructing a bipartite graph, either of websites using Google CSE or of tweeters using Twitter.
• Brand importance is measured by calculating the betweenness centrality of the nodes.

9 ANALYZING E-MAIL WITH CONDOR

CHAPTER CONTENTS
• Analyzing your personal social network through your mailbox
• Finding COINs through community detection
• Analyzing the social network of an organization through its e-mail archive
• Analyzing Hillary Clinton’s e-mail
• How to deal with an organization when analyzing its e-mail archive


9.1. CREATING A VIRTUAL MIRROR OF YOUR OWN MAILBOX

In the first e-mail analysis example, I will be studying four months of e-mail messages from my own personal mailbox. This analysis will allow me to much better understand what worked and what did not work in my collaboration with dozens of teams on a variety of topics. A personal e-mail-box analysis consists of the following steps:

1. Create a new e-mail database and dataset.
2. Fetch e-mail using the options to filter by date and mail folders.
3. Create a static view of the network for an initial look.
4. Use Process Dataset to continue to clean up the network by merging e-mail addresses and removing mailing lists and other unwanted actors.
5. Annotate the dataset with network measures using Process Dataset.
6. Use the View menu to create a scatter plot of Contribution Index, Average Response Time (ART), and Word Cloud.
7. Remove yourself from the network to understand who else is important in holding your network together and rerun all the annotations.
8. Graph group centrality measures and temporal social surface to examine creativity or performance behavior; use the actor scatter plot to examine sentiment, complexity, betweenness centrality oscillation, and adjacency matrix.
9. Calculate the influence measure; remove noninfluential actors and graph.
10. Summary.

I start by creating a MySQL database “mail_peter_Aug15” in Condor, into which I will load my mailbox data from May to August 2015.

Note: Database and dataset names cannot include spaces. Use an underscore “_” as a separator.

Once the database is created, I switch to it.

I then create a dataset “mail_May_Aug15,” into which I will load the last four months of mail from my mailbox.


Now I can use Condor like an e-mail client, to download the mail into the MySQL database.


Note: If a dialog window is hidden behind the main Condor window, it can always be brought to the foreground by clicking on .

Condor can fetch e-mail from an Exchange, IMAP, or POP3 account. In this example, I am collecting my MIT mailbox, which is stored on a Microsoft Exchange server. I also collect the contents of the messages by checking the box “Fetch content.” For Exchange, if the server is set up well, it is sufficient to enter the e-mail address and password (just like logging into Webmail); the Exchange Autodiscover server will then automatically figure out the hostname and username.


If you would like to download mail from an IMAP account, you will also have to enter the name of the IMAP host. For example, for Gmail the settings in the dialog below would log you into your account with Condor.

Note: If you have enabled Google’s two-step verification, you will get the following error message and will have to generate an app password (see https://support.google.com/accounts/answer/185833).


Either way, whether you use the Google-generated Condor app password or your own password (if you are not using two-step verification), your Gmail login dialog will look as follows.

I will now log in to my MIT Exchange mailbox. In the next dialog, I have the option to set the time period for the e-mails I want to download. I only collect my mails from May 1, 2015, to the collection time (August 27, 2015).
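For readers who collect mail over IMAP rather than Exchange, restricting a download to a time period corresponds to an IMAP search with the SINCE criterion. The sketch below uses Python's standard imaplib module and is independent of Condor; the function names are made up for this example, and the connection function is defined but not called, since it needs a real host and credentials.

```python
import imaplib
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(day):
    """Format a date the way IMAP SEARCH expects it, e.g. 01-May-2015."""
    return f"{day.day:02d}-{MONTHS[day.month - 1]}-{day.year}"


def message_ids_since(host, user, password, since, folder="INBOX"):
    """Return the ids of all messages in a folder received on or after a date."""
    with imaplib.IMAP4_SSL(host) as connection:
        connection.login(user, password)
        connection.select(folder, readonly=True)  # read-only: nothing is changed
        status, data = connection.search(None, "SINCE", imap_date(since))
        return data[0].split() if status == "OK" else []


print(imap_date(date(2015, 5, 1)))
```

A call such as message_ids_since("imap.gmail.com", user, app_password, date(2015, 5, 1)) would list the matching messages in a Gmail inbox, using an app password if two-step verification is enabled.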

The next dialog gives me the option to only collect speciﬁc folders. I collect all the folders that I expect to have gotten new content in the speciﬁed time period.


Now the e-mails are loaded into the dataset “mail_May_Aug15” as a network with 1381 actors and 13,074 links, and I can create my ﬁrst visualization (View->Create static view).

The “asteroid belt” outside the large connected cluster in the center comes from e-mails which were not sent to my own address, but to mailing list addresses.


I now clean up my mailbox by removing the mailing list addresses and some other mails that are not interesting (“Process dataset->Remove speciﬁc actors”).

In the next step, I merge people who are using more than one e-mail address, using the “Manual node merging” wizard (“Process dataset->Node merging->Manual node merging”). I can ﬁlter names by typing substrings in the box at the top. By shift-clicking multiple actors and then clicking on “Merge actors and/or group()” I can merge multiple e-mail addresses into one node in the graph.
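Conceptually, merging actors relabels every alias address with one canonical name before the network is analyzed. The following is a minimal sketch of that idea, not of Condor's wizard; the addresses and the alias table are invented.

```python
def merge_actors(edges, aliases):
    """Relabel every alias with its canonical actor and drop resulting self-loops."""
    merged = []
    for sender, recipient in edges:
        sender = aliases.get(sender, sender)
        recipient = aliases.get(recipient, recipient)
        if sender != recipient:
            merged.append((sender, recipient))
    return merged


# Invented addresses: two of them belong to the same person.
aliases = {"ann@work.example.com": "Ann", "ann@home.example.com": "Ann"}
edges = [
    ("ann@work.example.com", "bob@example.com"),
    ("bob@example.com", "ann@home.example.com"),
    ("ann@work.example.com", "ann@home.example.com"),  # Ann forwarding to herself
]
merged = merge_actors(edges, aliases)
print(merged)
```

After merging, all of Ann's messages attach to a single node, so her centrality is no longer split across several addresses.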


In a further (optional) cleanup step, I reduce graph size without losing key information by removing all nodes that are isolated or have only one connection to the connected component in the center. To do this, I calculate the degree centrality for all nodes (Process dataset->Annotate->Centrality annotations).


I then use the actor ﬁlter dialog (Process dataset->Actor ﬁlter) to only keep the nodes which have degree centrality larger than 1.

The network has now been reduced to 516 actors and 10,530 edges.


Note: These changes to the original dataset are not saved in the original database. If you want to keep them, you have to right-click on the dataset and save it under a new name.

Caution: This might take a lot of space on your hard disk, as Condor databases will get large very quickly!

Now we can calculate the different networking attributes of all actors, using the annotate functions:
• Process dataset->annotate->Centrality annotations (Central Leadership)
• Process dataset->annotate->Oscillation annotations (Rotating Leadership)
• Process dataset->annotate->Contribution Index annotations (Balanced Contribution)
• Process dataset->annotate->Turntaking annotations (Responsiveness)
• Calculate sentiment (Honest Sentiment)

There are also two annotations on the group level:
• Process dataset->annotate->AWVCI annotation
• Process dataset->annotate->Group density annotation

These annotations calculate five of the six honest signals of collaboration, listed above in parentheses and introduced in Chapter 4, plus additional networking metrics. The image below illustrates the settings for calculating betweenness oscillation annotations for e-mail. The sliding time window to aggregate e-mails is set to 7 days, which means that Condor always takes a week’s worth of e-mail to calculate the betweenness of each actor, recalculating betweenness in 1-day increments for the entire duration from May 1 to August 27, 2015. The resulting time series of betweenness values is then smoothed over a 3-day time window.
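The sliding-window settings can be made concrete by enumerating the windows. The sketch below lists every full 7-day window in 1-day increments for the period above and smooths a series with a 3-day average. It is an assumption that only complete windows are used and that the smoothing is a trailing average; Condor's exact conventions are not specified here.

```python
from datetime import date, timedelta


def sliding_windows(first_day, last_day, window_days=7, step_days=1):
    """Yield (start, end) date pairs, end inclusive, for each full window."""
    start = first_day
    while start + timedelta(days=window_days - 1) <= last_day:
        yield start, start + timedelta(days=window_days - 1)
        start += timedelta(days=step_days)


def moving_average(values, width=3):
    """Smooth a series with a trailing window of the given width."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - width + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


windows = list(sliding_windows(date(2015, 5, 1), date(2015, 8, 27)))
print(len(windows), windows[0], windows[-1])
print(moving_average([0, 3, 6, 3]))
```

Each window yields one betweenness value per actor, so an actor's betweenness becomes a time series with one point per day, and it is the oscillation of that smoothed series that the annotation counts.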


The contribution index annotations can be run with default settings. Optionally, edges can be deduplicated in the graph if one person sends single e-mails to hundreds of recipients simultaneously. I am not doing this for my own mailbox.

The turntaking annotations are run with a minimum response time of 15 seconds, which means that if somebody just sends back an empty reply to an e-mail within 15 seconds (e.g., to try to game their response time) the reply will be ignored. Also, if a message is not answered within 4 days, it will be ignored when calculating the average response time. This assumes that if an e-mail is not answered within 4 days, it did not need an answer.
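The two thresholds act as a filter on the response delays before they are averaged. A minimal sketch follows, assuming both bounds are inclusive and that the delays between a message and its reply have already been extracted; the sample delays are invented.

```python
from datetime import timedelta

MIN_RESPONSE = timedelta(seconds=15)
MAX_RESPONSE = timedelta(days=4)


def average_response_time(delays):
    """Mean of the delays inside [MIN_RESPONSE, MAX_RESPONSE], or None if none qualify."""
    valid = [d for d in delays if MIN_RESPONSE <= d <= MAX_RESPONSE]
    if not valid:
        return None
    return sum(valid, timedelta()) / len(valid)


delays = [
    timedelta(seconds=5),  # too fast: treated as an attempt to game the measure
    timedelta(hours=1),
    timedelta(hours=3),
    timedelta(days=5),     # too slow: assumed not to have needed an answer
]
art = average_response_time(delays)
print(art)
```

Only the one-hour and three-hour replies count here, so the extreme values at either end do not distort the average.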


We also calculate sentiment, emotionality, and complexity, using the following settings.

Now we can look at the results, identifying my most active communication partners. We start using the actor scatter plot view (View->Actor scatter plot). Not surprisingly I am the most active participant, sending and receiving a combined total of 6500 e-mails. I am sending slightly more messages than I receive, leading to a contribution index of 0.2. Remember that a contribution index of 1 means that a person only sends e-mail, while a contribution index of −1 means that the person is only receiving messages.
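A contribution index that ranges from −1 (only receiving) to 1 (only sending) is obtained by dividing the difference between messages sent and received by their sum. The sketch below uses that definition. The split of the 6500 messages into 3900 sent and 2600 received is an assumption chosen to reproduce the value of 0.2; the actual counts are not given in the text.

```python
import math


def contribution_index(sent, received):
    """(sent - received) / (sent + received), ranging from -1 to 1."""
    total = sent + received
    if total == 0:
        raise ValueError("actor has neither sent nor received messages")
    return (sent - received) / total


# Assumed split of the 6500 messages that yields the reported index of 0.2.
print(contribution_index(3900, 2600))
```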

Image 7

Looking at how quickly somebody responds to somebody else (ego ART) is a proxy for how passionate somebody is.

Image 8

Looking at how quickly everybody else answers somebody (alter ART) is a proxy for how much that person is respected.

Image 9

We can now also look at the content of my e-mails. What immediately catches the eye is the bright green of the words, indicating an overall very positive mood. The only negative word is “problem.” My own name is the most popular word. As this is not so interesting, I right-click on it to delete it in the view.

Image 10

Clicking on “problem” in the above view will tell us what the “problem” is. It turns out it is mostly about Condor bugs. Also note that Condor’s sentiment detection algorithm can automatically analyze messages in English, German, French, Spanish, Portuguese, and Italian.


9.1.1. Drawing the Term Graph

Condor provides a second way to visualize content, using the “term graph” function. The term graph is a semantic network of “terms.” “Terms” are the most important words in the content of a document. The network is constructed by co-occurrence of keywords; if two words appear in the same document (each document is an e-mail message in this example), they will be connected by an edge. Running the function with the default settings will create a new dataset “mail_May_Aug15term1,” using the keyword vector generated from the content field by the “Calculate Sentiment” function. The “minimal number of co-occurrences per edge” defines a cut-off value for including keywords in the term graph; in the example below two keywords need to appear together in at least four different documents to be included in the term graph.
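The construction of the term graph can be sketched as counting, for every pair of keywords, the number of documents in which both appear, and keeping the pairs that reach the cut-off. The keyword lists below are invented, and the sketch leaves out how Condor selects the keywords in the first place.

```python
from collections import Counter
from itertools import combinations


def term_graph(documents, min_cooccurrences=4):
    """Return {(term_a, term_b): count} for pairs sharing enough documents."""
    counts = Counter()
    for keywords in documents:
        # Each document contributes at most once per pair of distinct keywords.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_cooccurrences}


# Invented keyword lists, one per e-mail message.
documents = [["team", "project"]] * 4 + [["team", "bug"]] * 2
graph = term_graph(documents)
print(graph)
```

The pair “team” and “bug” co-occurs in only two messages and is therefore dropped, while “project” and “team” meet the cut-off of four and form an edge.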

After clicking the “Next” button, I get a dialog showing all keywords fulﬁlling the selection criteria.


Clicking on the headings “Term,” “Occurrences,” “Type,” and “Language” allows me to sort the terms in different ways, and to choose the ones I want to have included in the final term graph. In the dialog below I have manually chosen 97 words, which I suspect are meaningful in the context of my mailbox from May to August 2015. In the next step, the new dataset “mail_May_Aug15term1” is created. Next, I calculate the betweenness centrality annotation, and call up the static view, showing all labels corresponding to the 97 keywords or terms, sizing the nodes by betweenness. We see that “team,” “work,” “meetings,” and “project” are the most important words by betweenness in the term graph, while the names of my collaborators “Michael,” “Andrea,” and “Ken,” shown on the left, are related (meaning they occasionally show up in the same documents) but more peripheral in the term graph network.


9.1.2. Removing the Mailbox Owner

The next key step in the analysis of an individual mailbox is to remove the owner of the mailbox. As the static view of communication illustrates, I am by far the most central person in my own network (mailbox owners are usually the most central actors in their own e-mail network, although there are exceptions).

Image 11

The conclusion is therefore to delete myself from my own network, which I can easily do using the “remove speciﬁc actor” function.


This reduces the number of links in the graph from 13,074 to 4945. Note how the network now falls apart into different clusters.

Image 12
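Why the network falls apart once the mailbox owner is removed can be seen by counting connected components before and after the removal. The sketch below does this for an invented toy mailbox in which the owner is the only link between two pairs of collaborators.

```python
def connected_components(edges, removed=None):
    """Return the connected components of the graph, ignoring one removed node."""
    adjacency = {}
    for a, b in edges:
        if removed in (a, b):
            continue  # drop every link that touches the removed node
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Invented toy mailbox: the owner talks to everybody, the others only in pairs.
edges = [("owner", "a"), ("owner", "b"), ("owner", "c"), ("owner", "d"),
         ("a", "b"), ("c", "d")]
print(len(connected_components(edges)))
print(len(connected_components(edges, removed="owner")))
```

Actors who communicated only with the owner disappear from this view altogether, which is also why the link count drops so sharply.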

As the network metrics were heavily influenced by my own position in the network, I have to rerun all annotations. Note how the betweenness changes considerably after rerunning the annotations.

Image 13

Next we can look at the evolution of the network centrality metrics over time (View->Group centrality measures). They oscillate considerably, which, as our research has shown, is an indicator of creativity.

Next we look at the temporal social surface, using a sliding time window of 7 days, and unchecking the “with history” option.


This will use a sliding time window approach, taking the last 7 days in 1-day increments to calculate betweenness for each actor, re-sorting the actors each day by betweenness, and then plotting their betweenness curves as a three-dimensional surface. Note how in the first six weeks a few individuals show very high betweenness (not me, as I removed myself just before), suggesting they were actively collaborating with groups of people and bridging structural holes, while in the second half of the time period, there are no high-betweenness individuals, and the overall activity of people (people above “flatland”) drops significantly.

Image 14a


Using the actor scatter plot will show us who the most positive people are, plotted against the number of messages they send. Image 15a


And who is using the most complex language in the e-mails they send.


Image 16a


And who are the most creative people, measured through oscillations in betweenness centrality. Image 17a


Using the adjacency matrix view (View->Adjacency matrix) and sorting actors by degree centrality will show the most gregarious, that is, most connected individuals in the upper right corner.

We can now also look at the evolution of sentiment in my mailbox with and without me. Note the drop in activity during vacation time in July and August. There is also a drop in sentiment at the end of June. Compare this with the activity, sentiment, emotionality, and complexity of my mailbox, including the messages sent and received by me. Notice how the drop in activity is much less marked for the original mailbox including me, which also includes the mails sent exclusively to me, or received exclusively by me. This means that while overall business activity drops, I am still pretty active during the summer break. Also, while the sentiment shows some noticeable drops in the mailbox where I removed myself, the mood is steadier and less oscillating in the original mailbox including myself.


Finally, I would also like to know who the most inﬂuential people in my network are. It is better to calculate this with myself taken out of the picture, to not skew the inﬂuencer calculation algorithm. I check the box “create new dataset” to get a new network of inﬂuential people, based on new word usage being picked up by others.


I am getting a new dataset, “mail_May_Aug15_Influence.” Drawing this network shows a large group of noninfluential people: the isolated dots in the “asteroid belt” in the image below.

To remove all these noninﬂuencers, I calculate degree centrality, and remove all the nodes with degree smaller than 1, as we already did before, to prune the network.


The image now looks much different, and my colleagues from galaxyadvisors suddenly become most inﬂuential, followed by the COIIN project to reduce infant mortality. Image 18a


This concludes the analysis of three months’ worth of e-mail data. We have been able to analyze the following:

1. Overall personality characteristics of my friends. Who is most creative (betweenness oscillation)? Who is most passionate (answers the fastest and uses the most emotional language)? Who is most open and gregarious (has the highest degree centrality)? Who is using the most positive language?

2. Who has the most influence in my daily life? Who are the most influential people in my work life? Whom do I respect most (whom do I answer the fastest)?

3. What are the key determinants of my professional life? Which communities and teams am I working with most (the biggest clusters)? What are the key topics I have worked on over the last three months? What were the most hectic periods, and what were the quiet periods, over the last three months?

9.2. FINDING COINS THROUGH COMMUNITY DETECTION

Condor offers an automatic way to find COINs through its community detection algorithm. It uses the Louvain algorithm,1 which assigns each actor to one community based on its social network connections. It finds the “modularity” of a community, which is defined as a value between -1 and 1, calculated by comparing the density of links inside the community with the density of links to actors outside the community.

In this example to illustrate the detection of COINs, I use my mailbox from January 1, 2009 to December 31, 2015 to locate the key projects I have worked on over the last six years. First, I load the top 6000 actors of a dataset with 15,364 actors, and merge the multiple e-mail addresses that represent the same person into single actors using the “Manual node merging” wizard, as described in the previous section. Then, I run the community detection algorithm.

1

https://en.wikipedia.org/wiki/Louvain_Modularity
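To make the modularity score concrete, the sketch below evaluates the standard (Newman) modularity of a given assignment of actors to communities: for each community, the share of links that fall inside it minus the share expected from the degrees of its members. It only scores a partition; the search for the best partition, which is what the Louvain algorithm does, is not shown. The example network and all names are invented, and nothing here is taken from Condor's code.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted network.

    edges:     list of (u, v) links, each link listed once
    community: dict mapping every actor to a community id
    Q = sum over communities of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of links, L_c the number of links inside
    community c, and d_c the summed degree of its members.
    """
    m = len(edges)
    internal = {}    # L_c
    degree_sum = {}  # d_c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two tightly knit triangles joined by a single link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {actor: 0 for actor in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(round(q_split, 3), round(q_single, 3))
```

Splitting the network into its two triangles scores clearly higher than lumping everyone together, which is the signal a community detection algorithm exploits.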


I then also annotate the actors by betweenness centrality, so that they can be drawn in their respective size. The communities come out nicely. They are numbered from 0 for the largest community up to the number of communities (1271 in this example), so community 0 has the most members. Image 19a


When checking the communities by looking at their members, I find that the community detection algorithm did an excellent job of grouping related people together. Community 0 (yellow) is most of the COIN seminar students, described in Section 9.3, plus many outside collaborators. Community 1 (green) is the C3N Chronic Collaborative Care Network, a project on improving the lives of patients with Crohn’s disease. Community 2 (turquoise) is the US government part of the IM-COIIN, a project reducing infant mortality in the United States. Community 3 (blue) is the sponsored projects of our MIT research center. Community 4 (purple) is the other half of the IM-COIIN. Community 5 (light blue) is the CFF C3N project, a successor of the C3N project applying the same approach to patients with cystic fibrosis. Community 6 (gray) is the HV-COIIN, another part of the IM-COIIN project focusing on home visiting. Community 7 (olive green) is around Technopark Aargau, a startup incubator in Switzerland.

As I am part of the initial community, my betweenness centrality is by far the largest, and I am the glue linking the communities, which means that the community detection algorithm will produce one large cluster with me in the core, plus many small communities. I therefore remove myself from the analysis, as described in the previous section, using the “remove specific actor” menu function, and rerun the community detection algorithm. Note that I will also have to recalculate betweenness centrality. Image 20a


I also increase the number of communities to be shown in different colors to the top 12 communities. Note that the clusters are now much smaller, as the big connector in the center (myself) has been removed. Community 0 (yellow), the largest cluster, is now the C3N project focusing on improving the lives of patients with Crohn’s disease, 1 (bright green) is US government part of the IM-COIIN, 2 (turquoise) is my own COINonCOINs, 3 (dark blue) is the MIT sponsored projects, 4 (purple) is my collaborators from University of Cologne, 5 (light blue) is the second part of the IM-COIIN around NICHQ — the contractor running the IM-COIIN project, 6 (gray) is the CFF cystic ﬁbrosis C3N, 7 (olive) is the HV-COIIN, 8 (dark green) is my collaboration with service provider ﬁrm Genpact, 9 (blue green) is around Technopark Aargau, a startup incubator, 10 (dark violet) is the MIT Sloan administration, 11 (violet) is a group of particularly active COIN seminar students from 2010. The next step is to drill down into one community at a time. I start by looking at the COINonCOINs, the turquoise cluster with community ID 2. Using the “Actor Filter” dialog with the setting below will remove all actors with a community ID other than 2.


After clicking on the next button, I obtain the following image. It will still show all the links, but only include the actors from community 2, the COINonCOINs.


To obtain a network only showing the links between the members of the COINonCOIN community, I close the static view, and recompute the betweenness centrality as well as the community structure, as the network structure has now radically changed. The resulting image shows the network and subcommunity structure of the COINonCOINs community. Image 21a


Cluster 0 (yellow) is the students from University of Applied Sciences Northwestern Switzerland (FHNW) and from Wayne State, connected through their respective instructors Michael Henninger and Ken Riopelle. Community 6 (gray) is a particularly active team from FHNW, as are communities 5 (purple) and 7 (olive). Community 1 (green) is students from one COINs course, mostly from Germany and SCAD, connected through instructors Christine Miller and Julia Gluesing. Community 2 (turquoise) is students from Universidad Cattolica Santiago di Chile connected through instructor Cristobal Garcia. Community 3 (blue) is another COINs course with students from Helsinki and IIT connected through instructor Maria Paasivaara.

In the next section, we will extend this e-mail analysis framework from analyzing single-user mailboxes to studying the e-mail network of an entire organization.

9.3. CREATING A VIRTUAL MIRROR OF AN ORGANIZATION

The same process as analyzing an individual’s mailbox (called an “ego network”) can be used to track organizational networks, with the goal of improving organizational performance. Social network variables can be compared with organizational performance metrics such as employee or customer satisfaction, productivity, sales force success, or propensity to leave the organization. Based on these correlations, interventions to optimize the organization can be developed.

In this example, we will analyze the e-mail network of the COINs 2015 spring seminar. Before starting the exploration, it is useful to acquire as much context information as possible about the organization we will be analyzing. The context information about the COINs seminar is as follows: The COINs seminar has been taught as a distributed seminar since 2005. COINs is the acronym for “Collaborative Innovation Networks,” which are the focus of the seminar. Students participate from MIT, Illinois Institute of Technology, Aalto University, University of Cologne, and University of Bamberg, collaborating as virtual distributed teams and tackling problems of social media analysis and other COINs-related issues. For most sites, the course consists of a 3-day introductory block course taught on-site, followed by 3 to 4 months of virtual collaboration by distributed student teams. The virtual collaboration projects are divided into iterations of 1 to 3 weeks. At the end of each iteration, each team presents the results of the last iteration and the plans for the next iteration in a virtual meeting. Half-way through the course, students are shown their own communication behavior captured through e-mail as a virtual mirror. At the end of the course, the students deliver a final presentation and submit a final paper reporting on their project.

All e-mail communication during the teamwork period is captured by asking the students to cc all their messages to a team-specific dummy GMAIL folder. We start our analysis by downloading the 10 GMAIL boxes of the 10 teams of the 2015 COINs spring seminar, which are, in addition to students from University of Cologne and University of Bamberg, made up of students from University of Applied Sciences Northwestern Switzerland and University of Rome Tor Vergata. The e-mails are stored in one dedicated MySQL database, putting each mail folder into one dataset. First, we create a dataset for each team.


Then, we load the GMAIL mailbox for the team into the newly created dataset using the e-mail fetcher.

This process is now repeated for teams 2 to 10, leading to 10 different datasets, one for each team.


The next step is to merge the 10 datasets into one combined dataset.

We are now ready to take a first look at the merged dataset, coloring it by original dataset. This will show the communication for each team in a different color. Note, however, that an actor who is in more than one dataset, like the instructor, will only be shown in the color of the first dataset he is in. We also size the nodes by betweenness, which means that we have to annotate them first by betweenness (Process dataset->Annotate->Centrality annotations).

Image 22a


We can already see the central role of the main instructor (Peter). We can also very nicely see the different teams, each represented as a separate cluster, connected by inter-team ties and ties to the instructors. We also note that there are more actors than students per team. Further investigation reveals that many of the students have been using more than one e-mail address, so these need to be combined. In addition, we see that for each team we have a virtual actor called coinproject1 to coinproject11. We start by removing these virtual actors. We also discover some spam and mailing list actors, which we remove for this analysis, as we are interested in human interactions. Note that if we were interested in knowledge flow, we might keep these virtual actors. We also note some script-generated actors (e.g., [email protected]), which we also remove. To remove specific actors we use the “Process dataset -> Remove actors” function.

In the next step, we manually go through the e-mail addresses, using the “Process dataset->Node merging ->manual node merging” function. Condor provides some support for this process. Starting to type a name into the search bar at the top will bring up all names starting with the same characters. Clicking on the heading “Uuid” or “Name” will sort by Uuid or Name. Shift-clicking on more than one actor in a pane will allow us to merge these actor aliases under the first name by clicking “Merge actors and/or groups.”


Note: A company’s e-mail is usually quite clean because employees generally use only one e-mail address for ofﬁcial business correspondence. However, there are exceptions in case a person changes her/his name after a marriage or a divorce. And, a different e-mail address may be assigned to an employee who quits and then returns. Overall, it is recommended to periodically check for people having more than one e-mail address. Once we have done our labor-intensive merging work, we can save the merge ﬁle as an Excel CSV ﬁle, by clicking the button “Save actor merge CSV.” Clicking on “Load actor merge CSV” allows us to load a previously saved actor merge ﬁle.


A line in the actor merge file is of the format “uuid to keep, name to keep, uuid to merge, name to merge,” for example, [email protected], Peter Gloor, [email protected], Peter A Gloor. This will lead to the combined actor being called [email protected]. The network has now been greatly reduced in number of nodes; each node now corresponds to one actor in the COINs seminar.
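For readers who want to apply such a merge file outside Condor, the following sketch shows one possible way to do it. It assumes only the four-column line format described above; the addresses are invented placeholders, and the function names are hypothetical rather than part of any Condor API.

```python
import csv
import io

# Invented merge file: uuid to keep, name to keep, uuid to merge, name to merge.
MERGE_CSV = """\
peter@example.org, Peter Gloor, p.gloor@example.com, Peter A Gloor
peter@example.org, Peter Gloor, pg@example.net, P. Gloor
"""

def load_merges(text):
    """Map every alias uuid to the uuid that should be kept."""
    alias = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 4:
            continue  # ignore blank or malformed lines
        keep_id, _keep_name, merge_id, _merge_name = (f.strip() for f in row)
        alias[merge_id] = keep_id
    return alias

def resolve(actor, alias):
    """Follow alias chains (a -> b -> c) until a kept uuid is reached."""
    seen = set()
    while actor in alias and actor not in seen:
        seen.add(actor)
        actor = alias[actor]
    return actor

def merge_links(links, alias):
    """Rewrite links onto merged actors, counting messages per actor pair."""
    merged = {}
    for sender, receiver in links:
        s, r = resolve(sender, alias), resolve(receiver, alias)
        if s != r:  # a message between two aliases of one person is dropped
            merged[(s, r)] = merged.get((s, r), 0) + 1
    return merged

links = [
    ("p.gloor@example.com", "ann@example.org"),
    ("peter@example.org", "ann@example.org"),
    ("pg@example.net", "peter@example.org"),
]
alias = load_merges(MERGE_CSV)
merged = merge_links(links, alias)
print(merged)
```

Dropping self-links after merging is a design choice of this sketch: a message someone forwards from one of their own addresses to another says nothing about collaboration between people.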

Image 23a


At this point in the analysis, it is a good idea to save the cleaned-up ﬁle under a new name, for example “allteams-cleaned.”

We now close the original merged dataset and open the cleaned-up dataset “allteams-cleaned.” The next step is to annotate the network by calculating the actor-level metrics:

• Process dataset->annotate->Centrality annotations (betweenness, degree)
• Process dataset->annotate->Oscillation annotations
• Process dataset->annotate->Contribution Index annotations
• Process dataset->annotate->Turntaking annotations
• Process dataset->Calculate sentiment

After that, we obtain a first overview by creating a dynamic movie of the social network (View->Create dynamic view). The image below shows four snapshots of the movie over the time period from April 14 to June 16, 2015. Note how in the first picture, in the upper left, the teams self-organize without the instructor. In pictures 2 and 3 the instructor becomes increasingly central, communicating intensively with a few teams, while others go their own way. In picture 4 the teams are again communicating among themselves, with the instructor just attached to one team in the center. Image 24a


Next we look at the e-mail activity over time. We can clearly see a spike in communication activity of the students before the end of an iteration, when they frantically prepare their presentations for the virtual meetings. Likewise, in business we see high rates of team member interaction right before a project deadline, management review meeting, or other signiﬁcant event. Check your calendars to identify these key event dates.


Next we look at sentiment (in blue), emotionality (in green), activity (in red) and complexity (in yellow) over time. We see that sentiment starts quite positive, but goes down in the last third of the course, when laggards are pushed in less friendly words to contribute their share to the ﬁnal presentation. In prior work we have found that the sentiment in the language of well-functioning teams tends to move down somewhat to get more “honest,” as team members are not just giving praise to each other, but also say in clear words what can be done to improve the product. In the end, sentiment goes up again, with mutual back patting after the job well done. In this analysis of the COINs course teams we ﬁnd the same pattern.


Image 25a


Right-clicking on an actor brings up his individual sentiment, emotionality, activity, and complexity.

Clicking on the instructor (Peter) shows that he is using quite clear language, with very positive sentiment, but also quite negative periods, particularly right before the end of the course.


Now we look at individual actors, starting with the actor scatter plot (View->Actor scatter plot) of the contribution index (see Section 5.1). Coloring individual actors by “original dataset” (which corresponds to their team), we find that members of the same team show similar send/receive ratios. In contrast to the analysis of COINs seminar teams in previous years, we find that there is always a team member who is more active than the rest of the team. In prior years, we had found that teams whose members showed a similar contribution index and number of e-mails sent or received performed better, but in this analysis we did not find such teams. We still notice that members of a team are close together in the overall number of messages sent, but there are usually one or two members who are more “up-right” in the scatter plot and thus the main communicators and senders of e-mails.


Next we look at the most oscillating (betweenness oscillation), responsive (ego ART), respected (alter ART), connected (degree), positive (sentiment), and emotional students. Note: It is important to keep in mind the time zone differential among team members when reviewing a person’s responsiveness (ego ART), and respect (alter ART). Vacations, sick time, and leaves as well as cross cultural differences in answering e-mails over the weekends may make it difﬁcult to compare these indicators uniformly across team members. Again, it is important to understand the context of team members.


Next we are looking at the betweenness centrality of individual actors over time, by exporting a time series of betweenness values per actor to Excel as a CSV ﬁle (Export->Export time series). We have to uncheck the “with history” checkbox and set the sliding time window to 7 days to extract actor values that are always representing the last 7 days. If we had checked the “with history” box, the social network would have been built by subsequently adding all edges to the network graph for each actor, up to the ﬁnal, most complete graph. For this application of tracking what’s happening each week, this would have been the wrong setting.

Opening the export ﬁle in Excel, we ﬁnd that Peter is the most central actor in the beginning, but that different students assume leadership roles of high-betweenness centrality over the progress of the course (the orange, green, and black lines in the image below).


Comparing the individual betweenness curves with the temporal social surface (View->Create temporal social surface view) leads to a similar image. See the Condor manual for steps to identify actors in the temporal social surface view.


Initially, there is one person who is highly central in the temporal social surface (Peter). After this initial more centralized structure, all participants are active with similar betweenness centrality, interrupted by small bursts of centrality along the way toward June 16, 2015. This is the pattern of rotating leadership, which is indicative of creative teams.

Next, we look at word usage over time, selecting four words whose usage over time will be shown.


The usage pattern of these four words correlates quite well with the overall activity of the students. The students are planning “skype meetings” for “presentations tomorrow.” In mid-June, there are no “meetings” anymore, while students are still working on the “presentation.” Discussion about “tomorrow” is also going down as the semester progresses. Note: Business communication often contains a boilerplate nondisclosure statement at the bottom of each e-mail that can bias or distort the words appearing in a word cloud. It is recommended to identify those words in these nondisclosure statements and exclude them. This is particularly important when many suppliers may be involved who have different nondisclosure wording.


Image 26a


Next we look at the overall word usage by computing the word cloud. The overall sentiment is very positive; there is no keyword with a negative context in the word cloud of the top 40 words. Meetings and Skype, as well as the names of the instructor (Peter) and of students who are particularly active are the most popular words. Image 27a


We can also check where in the world people involved in the COINs seminar have been active. As we do not have geotagged data in e-mail, we use natural language processing (NLP) to map names of geographical places such as countries and cities to locations on the world map. First, we have to run the location annotation process (Process dataset->Location annotation). Condor has two options to look up the mapping of a location. Either it uses the Google geocoding API (this only works for the first 2500 lookups) or it uses a locally installed geomapping dataset. Note that the first time Condor runs location annotation with the local database, the database has to be installed on the local machine by checking the box “Import local data first.”

The image below shows the results of the geotagged data, running “View->Create geographical view.” Note that while there are no participants from Oceania, there is discussion in Australia and Indonesia, as one of the projects was working on a global health project.


Image 28a


Bringing up the European map illustrates the focus on Switzerland, Germany, and Italy, where most of the participating students were coming from. Image 29a


For the aggregate analysis on the team level, we combine all e-mail records of a team into one virtual actor per team, running “Analyze->Create collapsed graph,” thereby collapsing the graph by the original datasets that correspond to the 10 teams.

Creating the static view of the newly created network shows the ties between the 10 teams. Note that each person can only be allocated to one team, namely the one in which the person is first mentioned. In the image below, node size corresponds to the number of team members.
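The idea of a collapsed graph can be sketched as follows. This is an assumption-laden illustration, not Condor's implementation: actors are assigned to the first team they appear in, as described above, and messages between members of the same team are simply dropped, which is a choice made for this sketch. All names and data are invented.

```python
from collections import Counter

def first_team(memberships):
    """Assign each actor to the first team in which the actor appears."""
    team_of = {}
    for actor, team in memberships:
        team_of.setdefault(actor, team)  # later mentions are ignored
    return team_of

def collapse_by_team(links, team_of):
    """Aggregate actor-to-actor links into undirected team-to-team ties."""
    ties = Counter()
    for sender, receiver in links:
        ts, tr = team_of[sender], team_of[receiver]
        if ts != tr:  # intra-team messages are dropped in this sketch
            ties[tuple(sorted((ts, tr)))] += 1
    return ties

# Invented data: "ann" shows up in two team mailboxes.
memberships = [("ann", "team1"), ("bob", "team1"),
               ("cat", "team2"), ("ann", "team2")]
links = [("ann", "bob"), ("ann", "cat"), ("cat", "bob"), ("cat", "ann")]

team_of = first_team(memberships)
team_sizes = Counter(team_of.values())   # would drive the node size
team_ties = collapse_by_team(links, team_of)
print(team_sizes, team_ties)
```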


Using the contribution index on the aggregated team level shows the most active teams. Team 4 is most proactive, and team 11 is most active overall.

We can now also check which teams are most responsive (ego ART), most respected (alter ART), most positive (sentiment), and most creative (betweenness centrality oscillation). We ﬁnd that team 4 is the most creative (highest betweenness centrality oscillation) while also being highly respected (alter ART), that is, others respond to them the fastest; they are also very positive in their sentiment.


Next we calculate the change in responsiveness of the different teams, starting with exporting the ego ART time series. We set the time window to 7 days, the minimum turn delay to one minute, ignoring replies sent within less than a minute, and e-mails which have not been answered after more than 4 days.
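The filtering rule for response times can be written down compactly. The sketch below assumes that reply delays have already been extracted as (responder, delay) pairs, which is the hard part and is not shown; the thresholds are the ones named above, while the function name and the sample data are hypothetical.

```python
from datetime import timedelta

MIN_DELAY = timedelta(minutes=1)  # faster replies are ignored
MAX_DELAY = timedelta(days=4)     # slower replies count as unanswered

def ego_art(reply_delays):
    """Average response time in hours per responder, after filtering."""
    kept = {}
    for responder, delay in reply_delays:
        if MIN_DELAY <= delay <= MAX_DELAY:
            kept.setdefault(responder, []).append(delay.total_seconds() / 3600)
    return {r: sum(hours) / len(hours) for r, hours in kept.items()}

# Invented reply delays for two teams.
reply_delays = [
    ("team4", timedelta(seconds=30)),  # dropped: under one minute
    ("team4", timedelta(hours=2)),
    ("team4", timedelta(hours=4)),
    ("team4", timedelta(days=5)),      # dropped: over four days
    ("team7", timedelta(hours=12)),
]
art = ego_art(reply_delays)
print(art)
```

Running the same calculation once per 7-day window yields the time series that is exported in this step.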


The resulting CSV ﬁle is then opened in Excel, and shows that most teams start slow, but are getting more responsive as the course progresses, illustrating the effectiveness of the virtual mirroring method: If you tell people what you will be measuring, they will start changing what is being measured! Image 30a


Compared to the first ego-centric network analysis example, where a single mailbox was explored, this second example illustrates the virtual mirroring process on the organizational level. It can be done for small and large organizations, ranging from teams with a dozen members to companies with hundreds of thousands of employees. Unlike the ego-network analysis, which mostly studies the networking behavior of individuals, the organizational analysis can create virtual actors by aggregating communication records using grouping attributes, for example, on the team, business unit, or geography level. We can study the following:

1. Personality characteristics of individual actors. Who is most creative (betweenness oscillation)? Who is most passionate (answers the fastest and uses the most emotional language)? Who is most respected (whom others answer the fastest)? Who is most open and gregarious (has the highest degree centrality)? Who is using the most positive language?

2. Longitudinal analysis of the organization. Who are the most influential people at different points in time? What are the key events at different points in time?

3. Aggregated analysis of the organization. What are the key topics and sentiment? What are the key locations?

4. Organizational unit analysis. What are the most creative teams (betweenness oscillation)? What are the happiest teams (answer the fastest and use the most positive language)? What are the most respected teams (whom others answer the fastest)? What are the most connected teams (highest degree centrality)?

These four steps are building blocks toward developing recommendations for interventions to increase organizational performance, by correlating social network metrics calculated for Steps 1 to 4 with organizational performance metrics such as sales performance, customer satisfaction, or employee turnover.

Next, we will study another publicly available e-mail dataset, Hillary Clinton’s e-mail.

9.4. ANALYZING HILLARY CLINTON’S MAIL

This short example is based on Hillary Clinton’s mailbox from her time as US secretary of state 2009 to 2014.2 Against official regulations, Hillary Clinton had been using a private e-mail server for her official correspondence. The main criticism focused on Hillary Clinton deleting 32,000 e-mails, which she considered private. As part of a subsequent investigation into potential misbehavior, the US department of justice published a subset of her e-mail from 2009 to 2014 as 7000 individual pdf files. These e-mails have been scanned in and provided as a SQLite database and an Excel spreadsheet on the Kaggle website.3 I loaded this data into Condor as a dataset. It is available at www.ickn.org/sociometrics/.

2

https://en.wikipedia.org/wiki/Hillary_Clinton_email_controversy

First, we create a new database. In this database, we load the CSV data from file “emails.csv” containing the messages, and “persons.csv” containing the actor names. For emails.csv we delete the redundant columns, keeping only the following columns:

• Id: the id of the link; an e-mail can result in more than one link if it has multiple recipients
• E-mail id: the id given by Kaggle to the e-mail message
• Subject: subject line
• SenderPersonId: the id of the person sending the e-mail, given by Kaggle and listed in persons.csv
• ReceiverId: the id of the person receiving the e-mail, given by Kaggle and listed in persons.csv
• DateSent: the date the message was sent
• ExtractedBodyText: the body text extracted by Kaggle from the pdf file

The persons.csv file contains three fields. Note that we need to add a mock starttime field, as the Condor CSV importer is expecting it.

3

https://www.kaggle.com/kaggle/hillary-clinton-emails

• Id: the id of the person, used to link to the person name in the emails.csv file
• Name: the name of the person
• Starttime: a fake start time for Condor

This leads to the following import CSV dialog.

In the next step we assign dateSent both to Condor’s starttime and endtime.
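The mapping performed in the import dialog can be mimicked in plain Python to check the data before loading it. The sketch below is only an approximation of what the importer does: the column headers follow the field list given above (`EmailId` is an assumed spelling of “E-mail id”), the real Kaggle files may use different headers, and the sample rows and the `load_links` helper are invented for illustration.

```python
import csv
import io

# Invented sample rows standing in for persons.csv and emails.csv.
PERSONS_CSV = """\
Id,Name,Starttime
1,Person A,2009-01-01
2,Person B,2009-01-01
3,Person C,2009-01-01
"""

EMAILS_CSV = """\
Id,EmailId,Subject,SenderPersonId,ReceiverId,DateSent,ExtractedBodyText
1,101,Schedule,2,1,2009-09-12,See you tomorrow
2,102,Briefing,1,2,2009-09-13,Thanks
3,102,Briefing,1,3,2009-09-13,Thanks
4,103,No sender,,1,2009-09-14,Sender id is missing
"""

def load_links(persons_text, emails_text):
    """Join e-mail rows with person names and build one link per row."""
    names = {row["Id"]: row["Name"]
             for row in csv.DictReader(io.StringIO(persons_text))}
    links = []
    for row in csv.DictReader(io.StringIO(emails_text)):
        sender = names.get(row["SenderPersonId"])
        receiver = names.get(row["ReceiverId"])
        if sender is None or receiver is None:
            continue  # skip rows whose sender or receiver cannot be resolved
        links.append({
            "source": sender,
            "target": receiver,
            "starttime": row["DateSent"],  # DateSent is used for both
            "endtime": row["DateSent"],    # start and end of the link
            "subject": row["Subject"],
        })
    return links

links = load_links(PERSONS_CSV, EMAILS_CSV)
actors = {l["source"] for l in links} | {l["target"] for l in links}
print(len(actors), "actors,", len(links), "links")
```

Note that e-mail 102 yields two links because it has two recipients, which matches the description of the Id column above.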


This leads to a dataset with 514 actors and 9291 links. The static view below shows the resulting social network image. After annotating the actors with betweenness centrality, the most central actors stand out.


Not surprisingly, the most central people are Hillary Clinton herself, and her trusted staffers Cheryl Mills, Huma Abedin, Sidney Blumenthal, and Harold Hongju Koh. Creating a tag cloud of the most frequent words does not bring up any “smoking guns.” Rather, the words such as “government,” “secretary,” “state,” “president,” “time,” “tomorrow” are the ones to be expected in the daily communication of the US secretary of state.


Image 31a

a

For color pictures see online version of images, available at http://www.ickn.org/sociometrics/

The activity chart below tells us that the bulk of the released e-mails are from 2009 to 2011, peaking at the end of 2009 with 25 messages per day. The sentiment is quite neutral, confirming Hillary Clinton’s reputation as a well-tempered person who uses neutral, non-emotional language.

Image 32 (for color pictures, see the online version of the images, available at http://www.ickn.org/sociometrics/)

The contribution index scatter plot below tells us that Hillary is, not surprisingly, the most active sender and recipient of messages, receiving somewhat more messages than she sends (her contribution index is -0.25). The second most active participants are Huma Abedin and Cheryl Mills, who send significantly more messages to Hillary than they receive (their contribution index is 0.3 to 0.5).
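To make the contribution index concrete, here is a small Python sketch. It assumes the commonly used definition (messages sent minus messages received, divided by their sum), which this passage does not spell out; the message counts in the example are made up for illustration.

```python
def contribution_index(sent: int, received: int) -> float:
    """Return (sent - received) / (sent + received).

    +1 means the actor only sends, -1 means the actor only receives,
    and 0 means sending and receiving are balanced.
    Assumed definition; check it against Condor's documentation.
    """
    total = sent + received
    if total == 0:
        return 0.0  # no traffic at all: treat as balanced
    return (sent - received) / total


if __name__ == "__main__":
    # Hypothetical counts: an actor who receives more than she sends.
    print(contribution_index(sent=300, received=500))
    # Hypothetical counts: an actor who sends three times as much as she receives.
    print(contribution_index(sent=75, received=25))
```

Under this definition, an actor who receives more than she sends has a negative index, and an actor who mostly sends has a positive one.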


Huma Abedin is blazingly fast in answering her e-mail, with less than one-hour responses. Hillary takes on average nine hours to answer her mails, Cheryl Mills 32 hours.


The most creative people, measured through their betweenness oscillation shown below, are Hillary Clinton herself, Huma Abedin, Jake Sullivan (another member of Hillary’s inner circle), and Cheryl Mills.

The last image shows the temporal social surface. I already removed Hillary from that image, because she would dominate the image. It shows us that there are very few people, among them Huma Abedin, Cheryl Mills, Jake Sullivan, and Harold Hongju Koh, who dominate the discussion, while the remaining 500 actors play mostly peripheral roles and have few connections among themselves.


9.5. ORGANIZATIONAL ASPECTS OF E-MAIL-BASED SNA

Analyzing the e-mail of individual members of an organization needs to be approached carefully, in particular if e-mail content is included in the analysis. A series of legal and ethical issues have to be addressed when conducting such a project. It has been our experience that analyzing employee e-mail is, and has been, a very sensitive subject. This issue usually brings the company’s legal staff, human resources, information technology, and senior management into a conversation about how such analysis will be conducted. Nevertheless, we have found that with careful consideration, transparency, and clear rules of involvement, e-mail analysis can be conducted at the level of a team, department, division, or entire enterprise for the benefit of employees, teams, and the enterprise. As an engineering manager said, “We know that our e-mail is monitored for drugs, pornography, gambling etc. Let’s use it for a positive purpose rather than for just punitive purpose.”

There are different ways to collect the e-mails of organizations with Condor:

1. The easiest way is to use Condor’s single-user e-mail fetcher to collect the team members’ e-mail one dataset at a time and then merge the datasets. This means that the user id and password are needed for each e-mail account.
2. Using Microsoft Outlook/Exchange impersonation (an e-mail admin account), multiple e-mail accounts can be loaded in batch mode without the need for individual passwords.
3. Condor is also available in a server version called CondorCore, which allows organizations to download multiple e-mail accounts automatically using “CondorCore annotations”; this option, however, requires all e-mail passwords to be stored on the server.
4. Exporting e-mail from the server as Excel CSV files or MySQL tables, and loading the exported CSV files or MySQL tables directly into Condor, using Condor’s import CSV and import MySQL functions.
5. Extracting e-mail from a person’s Outlook .pst files, importing it into Condor 2.6.6, and converting it to Condor 3.


6. Microsoft Outlook rules can be used to automatically collect e-mail matching given keywords, from both historical archives and current activity, and forward it to a dummy mailbox, for example, a Gmail dummy account.

My friends Julia Gluesing and Ken Riopelle, who have done many e-mail analysis projects, once worked with a corporate manager who had over 20 subteams reporting to him. He insisted that all his subteams cc him on all e-mails. In this case, his e-mail folder for the subteams was used for the analysis, and no dummy mailbox had to be created to be cc’ed on any correspondence.

When doing a group e-mail analysis, how does one know who is on the team and should be invited to the e-mail analysis? This might seem like an obvious question, but for large global teams it is often difficult to identify team members beyond a visible common core. One way to resolve this issue is to first analyze a project team leader’s e-mail, calculate the degree and betweenness centrality for each person, and rank-order the people as a guide for selecting the team members to include in the project team analysis; this process is also called “snowball sampling” in social network analysis.

This approach can be used to analyze the e-mail of tens of thousands or even hundreds of thousands of employees; some of these projects are described in the scientific papers listed at the end of this chapter. As each of these projects is highly dependent on local requirements and the e-mail architecture employed at the organization, describing a general process is beyond the scope of this book.
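The ranking step of the snowball approach described above can be sketched in plain Python. This is an illustrative sketch only, not Condor’s implementation: it treats the team leader’s mailbox as an undirected, unweighted graph, uses Brandes’ algorithm for betweenness, and all names in the example are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (sender, receiver) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack, queue = [], deque([source])
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        while queue:                      # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


def rank_candidates(edges, top=5):
    """Rank people by betweenness, then degree, as team-member candidates."""
    graph = build_graph(edges)
    degree = {v: len(neighbors) for v, neighbors in graph.items()}
    between = betweenness_centrality(graph)
    ranked = sorted(graph, key=lambda v: (between[v], degree[v]), reverse=True)
    return [(v, degree[v], between[v]) for v in ranked[:top]]


if __name__ == "__main__":
    mailbox = [("leader", "ann"), ("leader", "bob"),
               ("leader", "cem"), ("ann", "bob")]
    for person, deg, btw in rank_candidates(mailbox):
        print(person, deg, btw)
```

The people at the top of the list (other than the leader) are candidates whose mailboxes could be added in the next round.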


The collection of insights below reflects some lessons learned doing hundreds of e-mail analysis projects at organizations.

• Only project-related e-mails should be included. Personal e-mails can be excluded through Outlook rules or by setting up specific folders that will be ignored in the collection process.
• With the same exclusion process, patent-related and other legal e-mails can be excluded because of their privileged or confidential nature.
• Employees can volunteer to opt into the analysis and can opt out at any time.
• The analysis will not be used for any punitive or termination purposes.
• The individual analysis will be shared with team members first before upper management gets to see it.
• In some cases, content is excluded and only the subject line retained.
• In a global study, which involved numerous countries around the world, it was decided to follow German law because it was the most restrictive.
• E-mail analysis might stay inside the company’s firewall, by installing Condor and the MySQL database on a server inside the company’s firewall.
• When e-mail analysis is shared with people outside the company, it is anonymized or deidentified so no individual is identified.
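As an illustration of the last point, one common way to deidentify actors before sharing a network is to replace each e-mail address with a keyed hash. The sketch below is an assumption about how this could be done, not a description of Condor’s own anonymization; the function names and the `actor_` prefix are hypothetical, and the secret key must stay inside the company.

```python
import hashlib
import hmac


def pseudonym(address: str, secret: bytes, length: int = 12) -> str:
    """Map an e-mail address to a stable pseudonym using HMAC-SHA256."""
    normalized = address.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return "actor_" + digest[:length]


def deidentify(links, secret: bytes):
    """Replace sender and receiver addresses in (sender, receiver) links."""
    return [(pseudonym(s, secret), pseudonym(r, secret)) for s, r in links]


if __name__ == "__main__":
    key = b"keep-this-key-inside-the-firewall"  # hypothetical secret
    links = [("Alice@example.com", "bob@example.com"),
             ("bob@example.com", "alice@example.com")]
    for sender, receiver in deidentify(links, key):
        print(sender, "->", receiver)
```

Because the same address always maps to the same pseudonym, the network structure is preserved, while the mapping cannot be reversed without the key. Message content and subject lines would need separate treatment.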


9.6. FOLLOW-ON EXERCISES

1. Load your own mailbox into Condor and conduct an analysis similar to the single-mailbox analysis. Whom are you answering fastest, and who is answering you fastest? Who is most creative? How positive is the sentiment? What are the key terms you are using in your everyday language?
2. Using Hillary Clinton’s e-mail box, who are the most central people outside her close collaborators? What is the role of Barack Obama in her network?
3. Can you find out what happened to the key people in Hillary Clinton’s mailbox? Do a Coolhunting for them on social media: what are their activities on Twitter? What are others saying about them in Wikipedia, blogs, and social media? Who are the most prominent people in the public perception on social media?

9.7. (PARTIAL) LIST OF E-MAIL STUDIES CONDUCTED BY THE AUTHOR IN VARIOUS ORGANIZATIONS

Downloadable from http://www.ickn.org/publications.html

Gloor, P. A. (2015, Winter). What email reveals about your organization. Sloan Management Review.

Gloor, P., Woerner, S., Schoder, D., Fischbach, K., & Fronzetti Colladon, A. (2016). Size does not matter in the virtual world. Comparing online social networking behavior with business success of entrepreneurs. International Journal of Entrepreneurial Venturing. Accepted for publication.

Allen, T., Gloor, P., Woerner, S., Raz, O., & Fronzetti Colladon, A. (2016). The power of reciprocal knowledge sharing relationships for startup success. Journal of Small Business and Enterprise Development. Accepted for publication.

Gloor, P., & Fronzetti, A. (2015). Measuring organizational consciousness through e-mail based social network analysis. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 12–14.

Gloor, P., Paasivaara, M., & Miller, C. (2015). Lessons from the coin seminar. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 12–14.

Maddali, H. T., Gloor, P., & Margolis, P. (2015). Comparing online community structure of patients of chronic diseases. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 12–14.

Hybbeneth, S., Brunnberg, D., & Gloor, P. (2014). Increasing knowledge worker productivity through a “Virtual Mirror” of the social network. International Journal of Organisational Design and Engineering, 3(3/4).

Gloor, P., & Giacomelli, G. (2014, Spring). Reading global clients’ signals. Sloan Management Review, March 4.

Grippa, F., Provost, S., Gloor, P., McKean, M., & Thakkar, S. A. (2014). Systematic methodology to characterize communication patterns in chronic care innovation networks. In S. Long, E-H. Ng, & C. Downing (Eds.), Proceedings of the American society for engineering management international annual conference.

Zhang, X., Gloor, P., & Grippa, F. (2013). Measuring creative performance of teams through dynamic semantic social network analysis. International Journal of Organisational Design and Engineering, 4(2).

Gloor, P., & Paasivaara, M. (2013). COINs change leaders – Lessons learned from a distributed course. Proceedings of the fourth international conference on collaborative innovation networks COINs, Santiago de Chile, August 11–13.

Hybbeneth, S., Brunnberg, D., & Gloor, P. (2013). Increasing knowledge worker efficiency through a “virtual mirror” of the social network. Proceedings of the fourth international conference on collaborative innovation networks COINs, Santiago de Chile, August 11–13.

Brunnberg, D., Gloor, P., & Giacomelli, G. (2013). Predicting customer satisfaction through (e-mail) network analysis: The communication score card. Proceedings of the fourth international conference on collaborative innovation networks COINs, Santiago de Chile, August 11–13.

Gloor, P., Dorsaz, P., Fuehres, H., & Vogel, M. (2012). Choosing the right friends – Predicting success of startup entrepreneurs and innovators through their online social network structure. International Journal of Organisational Design and Engineering, 3(2).

Grippa, F., Palazzolo, M., Buccuvalas, J., & Gloor, P. (2012). Monitoring changes in the social network structure of clinical care teams resulting from team development efforts. International Journal of Organisational Design and Engineering, 2(4).

Gloor, P., Margolis, P., Seid, M., & Dellal, G. (2014). Coolfarming – Lessons from the beehive to increase organizational creativity. MIT Sloan School Working Paper No. 5123-14.

Gloor, P., Paasivaara, M., Lassenius, C., Schoder, D., Fischbach, K., & Miller, C. (2011). Teaching a global project course: Experiences and lessons learned. ICSE International Conference on Software Engineering, Collaborative Teaching of Globally Distributed Software Development Community Building Workshop, Honolulu, Hawaii, May 23.

Gloor, P., Grippa, F., Borgert, A., Colletti, R., Dellal, G., Margolis, P., & Seid, M. (2011). Toward growing a COIN in a medical research community. Procedia – Social and Behavioral Sciences, 26. Proceedings COINs 2010, Collaborative innovations networks conference, Savannah GA, October 7–9.

Merten, F., & Gloor, P. (2009). Too much e-mail decreases job satisfaction. Proceedings COINs, Collaborative innovations networks conference, Savannah GA, October 8–11.

Grippa, F., & Gloor, P. (2009). You are who remembers you. Detecting leadership through accuracy of recall. Social Networks, August 11.

Allen, T., Raz, O., & Gloor, P. (2009). Does geographic clustering still benefit high tech new ventures? The case of the Cambridge/Boston biotech cluster. MIT ESD-WP2009-01 Working Paper.

DiMaggio, M., Gloor, P., & Passiante, G. (2009). Collaborative innovation networks, virtual communities, and geographical clustering. International Journal of Innovation and Regional Development, 1(4), 387–404.

Fischbach, K., Gloor, P., & Schoder, D. (2009, February). Analysis of informal communication networks – A case study. Business & Information Systems Engineering, 2 (also in German).

Gloor, P., Paasivaara, M., Schoder, D., & Willems, P. (2007, April). Finding collaborative innovation networks through correlating performance with social network structure. Journal of Production Research.

Kidane, Y., & Gloor, P. (2007, March). Correlating temporal communication patterns of the Eclipse open source community with performance and creativity. Computational & Mathematical Organization Theory, 13(1).

Zilli, A., Grippa, F., Gloor, P., & Laubacher, R. (2006). One in four is enough – Strategies for selecting ego mailboxes for a group network view. Proceedings of European conference on complex systems ECCS ’06, Oxford, UK, September 25–29.

Gloor, P., & Zhao, Y. (2006). Analyzing actors and their discussion topics by semantic social network analysis. Proceedings of 10th IEEE international conference on information visualisation IV06, London, July 5–7.

Grippa, F., Zilli, A., Laubacher, R., & Gloor, P. (2006). E-mail may not reflect the social network. NAACSOS Conference, Notre Dame IN, North American Association for Computational Social and Organizational Science, June 22–23.

Gloor, P., Niepel, S., & Li, Y. (2006, January). Identifying potential suspects by temporal link analysis. MIT CCS Working Paper.

Gloor, P., Laubacher, R., Dynes, S., & Zhao, Y. (2003). Visualization of communication patterns in collaborative innovation networks: Analysis of some W3C working groups. ACM CIKM international conference on information and knowledge management, New Orleans, November 3–8.


MAIN LESSONS LEARNED

• Creating a social network map of your personal mailbox will give you unprecedented insights into your social network.
• You will find out with whom you are working most closely, who the hidden influencers are, who likes you best, and who respects you most, but also who the bottlenecks are, and who is bridging structural holes among your friends.
• Through automatic community detection, you will find your COINs.
• You will also see how these measures can change over time as your network of relationships changes with new projects, collaborators, suppliers, and clients.
• The same social network analysis can be extended to teams and entire companies to improve communication within the organization, by identifying bottlenecks, collaborators, hidden influencers, and people bridging structural holes.
• Even more, the network map can be used to improve knowledge flow in business processes, by tracking and improving employee satisfaction, customer satisfaction, employee turnover, and sales force performance to increase organizational effectiveness.


• The analysis is based on a list of “six honest signals of collaboration” that Condor continuously measures and that are indicative of creative or high-performing communication (see Chapter 4).
• The second example in this chapter analyzes the network of an entire organization using the e-mail communication of a class of 50 students from the COINs seminar working in 10 teams.
• The third example analyzes Hillary Clinton’s e-mails released as part of her e-mail controversy, identifying her closest collaborators.


10 CALCULATING PERSONALITY CHARACTERISTICS FROM E-MAIL

CHAPTER CONTENTS

• Predicting personality characteristics by correlating FFI with e-mail behavior
• Developing a prediction formula through ordinary least squares regression
• Adding gender and nationality as control variables

© 2017 Peter A. Gloor

10.1. CALCULATING CORRELATIONS BETWEEN FFI AND E-MAIL

E-mail behavior of people is indicative of their personality characteristics. If we have both an e-mail archive of a person and their personality characteristics, we can correlate e-mail behavior and personality, leading to a general mapping of the honest signals of collaboration to personality characteristics. In this example, we use the e-mails of a group of 50 students from Germany, Finland, and the United States participating in the COINs course. Out of the 50 students collaborating in 11 teams, 34 also took the Neo-FFI test,1 a shortened version of the Big Five personality test. We will first calculate the honest signals of collaboration, and then correlate them with the Neo-FFI results.

First, we load the full e-mail archive into Condor. The archive covers the group communication of the 50 students and their instructors over a period of three months. The static view below shows the full network.

1 https://en.wikipedia.org/wiki/Revised_NEO_Personality_Inventory

The next step is the calculation of the six honest signals of collaboration. We compute them using the annotate functions:

• Process dataset -> annotate -> Centrality annotations (betweenness and degree) (Central Leadership)
• Process dataset -> annotate -> Oscillation annotations (Rotating Leadership)
• Process dataset -> annotate -> Contribution Index annotations (Balanced Contribution)
• Process dataset -> annotate -> Turntaking annotations (Responsiveness)
• Calculate sentiment (Honest Sentiment)
• Calculate influence (Shared Context)

We then export them to Excel:


Next, we correlate the resulting values with the FFI metrics (Neuroticism, Extroversion, Openness, Agreeability, and Conscientiousness) for the 34 students for whom we have FFI metrics. Using SPSS, we find the following correlations (Table 8).

We find that agreeable people are more central by betweenness as well as by degree. This means they have more important and more numerous friends. One explanation for their popularity could be that they are easier to get along with. People who are more open to experience have fewer friends, that is, lower degree centrality. This could perhaps be because they are more focused on their projects or because they have more connections to people outside their course project not captured in this analysis. More conscientious people send and receive more messages, and others respond faster to them (alter ART). This might be because they act as timekeepers and note takers for their less conscientious peers. More neurotic people send less positive e-mails (lower sentiment), and they also send less complex e-mails. More extrovert people send more positive e-mails.
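For readers without SPSS, the statistic behind each cell of such a correlation table can be sketched in a few lines of Python. This is an illustration of the standard Pearson correlation and its t statistic, not the author’s SPSS procedure; the function names are hypothetical, and the p-value lookup (a t distribution with n - 2 degrees of freedom) is left to a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom).

    Undefined for |r| == 1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # A correlation of magnitude .363 with N = 34, as for agreeability and
    # betweenness centrality in Table 8.
    print(round(t_statistic(0.363, 34), 2))
```

With 32 degrees of freedom, the two-tailed 5% critical value is roughly 2.04, so a t statistic of about 2.2 is significant at the 0.05 level but not at the 0.01 level, consistent with the single asterisk in Table 8.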

10.2. DEVELOPING A GENERAL PREDICTION FORMULA

Based on the correlations above, we develop five regression equations to predict the Big Five personality characteristics based on e-mail behavior. Using IBM SPSS or another statistics package, we regress the six honest signals against the Big Five values: Neuroticism, Extroversion,

Correlations

Betweenness centrality

Neuroticism

Extroversion

Openness

Pearson correlation

.189

.236

.268

Sig. (two-tailed)

.285

.178

.125

N Degree centrality

34

.285

.259

Sig. (two -tailed)

.102

.139

Pearson correlation

34

N

34

.014

Sig. (two-tailed)

Alter ART [h]

34

Pearson correlation

N Messages total

34

.939 34

.563 34

.348*

.119

.044

.503

34 .371*

.072

.031

.686

.169 .339 34

34 .259

.198

.199

.108

Sig. (two-tailed)

.551

.262

.258

.543

34

.357*

34

.038 34 .370* .031 34


34

34

.139 34

.106

34

34

.035

Pearson correlation

N

Conscientiousness

.363*

34

.103

Agreeability


Table 8: Correlation Results of FFI Metrics with Six Honest Signal SNA Metrics.


Table 8: (Continued ) Correlations

Neuroticism avg sentiment

Pearson correlation Sig. (two-tailed)

avg complexity

Pearson correlation Sig. (two-tailed) N

Betweenness centrality oscillation

Pearson correlation Sig. (two-tailed) N

Alter Nudges

.002 34

Openness

Agreeability

Conscientiousness

.465**

.174

.233

.308

.006

.326

.185

.076

34

34

34

34

.371*

.249

.017

.085

.284

.031

.156

.924

.632

.103

34 .076 .669 34

34 .104 .558 34

34 .060 .737 34

34 .023 .895 34

34 .109 .538 34

Pearson correlation

.107

.159

.104

.032

.020

Sig. (two-tailed)

.547

.368

.558

.859

.912

N

34

34

34

34

34


N

.522**

Extroversion

Pearson correlation Sig. (two-tailed) N

Total inﬂuence

Pearson correlation Sig. (two-tailed) N

Messages received

.289

.322+

.633

.097

.063

34 .087 .624 34

34 .009 .960 34

34 .212 .230 34

34 .261 .136 34

34 .290+ .097 34

.066

.227

.179

.327+

Sig. (two-tailed)

.912

.711

.196

.312

.059

Pearson correlation

N

34

34

34

34

34

.180

.165

.267

.124

.146

.308

.351

.128

.486

.409

34

34

34

34

34

Pearson correlation

.088

.023

.035

.057

.198

Sig. (two-tailed)

.622

.897

.846

.750

.262

N avg emotionality

.085

.502

.020

Sig. (two-tailed)

Ego ART [h]

.119

.810

Pearson correlation

N Contribution index

.043

Pearson correlation Sig. (two-tailed)

34

34

34

34

.221

.175

.227

.171

.007

.208

.322

.196

.334

.970

34

34

34

34

34


N

34


Messages sent

Table 8: (Continued)

| Correlations | Statistic | Neuroticism | Extroversion | Openness | Agreeability | Conscientiousness |
|---|---|---|---|---|---|---|
| Average influence per message | Pearson correlation | .213 | .249 | .205 | .164 | .081 |
| | Sig. (two-tailed) | .226 | .155 | .244 | .353 | .649 |
| | N | 34 | 34 | 34 | 34 | 34 |
| Ego nudges | Pearson correlation | .014 | .005 | .125 | .111 | .062 |
| | Sig. (two-tailed) | .936 | .979 | .482 | .532 | .727 |
| | N | 34 | 34 | 34 | 34 | 34 |

*Correlation is significant at the 0.05 level (two-tailed). **Correlation is significant at the 0.01 level (two-tailed). + marginally significant.

Openness, Agreeability, and Conscientiousness, calculating the goodness of fit and the coefficients of the regression equations.

10.2.1. Neuroticism

The Adjusted R Square for this regression is 0.39, which means that 39% of the neuroticism of people can be explained by the structure and content of their e-mail (Table 9). In other words, the fewer messages people send, the less positive they are, the less central by betweenness they are, and the more different communication partners they have (degree centrality), the more neurotic they are. The equation is as follows:

\[ N = -0.028 \cdot \text{messages sent} - 76.135 \cdot \text{sentiment} - 0.019 \cdot \text{betweenness centrality} + 0.995 \cdot \text{degree centrality} + 87.781 \]

Table 9: Regression Coefficients for Regressing Six Honest Signals against Neuroticism.

| Model | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 87.781 | 12.679 | | 6.924 | .000 |
| Messages sent | -.028 | .011 | -.461 | -2.652 | .013 |
| avg sentiment | -76.135 | 20.757 | -.518 | -3.668 | .001 |
| Betweenness centrality | -.019 | .010 | -.683 | -1.932 | .063 |
| Degree centrality | .995 | .366 | 1.054 | 2.718 | .011 |
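The neuroticism equation can be applied to a new actor with a few lines of Python. This is a sketch under stated assumptions: the coefficient magnitudes are those of Table 9 (0.028 for messages sent), the signs follow the verbal description (fewer messages, lower sentiment, lower betweenness, and higher degree go with higher neuroticism), and the input values in the example are invented.

```python
def predict_neuroticism(messages_sent: float, avg_sentiment: float,
                        betweenness: float, degree: float) -> float:
    """Predicted neuroticism score from four e-mail metrics.

    Coefficient magnitudes from Table 9; signs inferred from the
    accompanying description, so treat them as an assumption.
    """
    return (87.781
            - 0.028 * messages_sent
            - 76.135 * avg_sentiment
            - 0.019 * betweenness
            + 0.995 * degree)


if __name__ == "__main__":
    # Hypothetical actor: 100 messages sent, average sentiment 0.5,
    # betweenness centrality 200, degree centrality 10.
    print(round(predict_neuroticism(100, 0.5, 200, 10), 2))
```

The metrics must be on the same scales that Condor exports, since the coefficients were fitted on those scales for this particular group of 34 students.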

10.2.2. Extroversion

The Adjusted R Square is 0.33, which means 33% of the extroversion of people can be explained by the structure and content of their e-mail (Table 10). In other words, the more people oscillate in their network position, the more positive they are in their messages, the faster other people answer them (the lower the alter ART), and the fewer different communication partners they have (degree centrality), the more extrovert they are.

10.2.3. Openness

The Adjusted R Square is 0.11, which means 11% of the openness to experience of people can be explained by the structure and content of their e-mail (Table 11). In other words, the less central people are by betweenness centrality, and the more they send e-mails compared to receiving them, the more open they are to new experiences.

10.2.4. Agreeability

The Adjusted R Square is 0.21, which means 21% of the agreeability of people can be explained by the structure and content of their e-mail (Table 12). In other words, the more influential people’s e-mails are by word usage, the less complex their messages, and the more positive their messages, the more agreeable people are.
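The Adjusted R Square values quoted in these subsections correct the plain R Square for the number of predictors. A minimal Python sketch of the standard formula follows; the plain R Square values are not reported in the text, so the numbers in the example are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    n is the number of observations, k the number of predictors
    (not counting the constant).
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical: R^2 = 0.50 with 34 students and 4 predictors.
    print(round(adjusted_r_squared(0.50, 34, 4), 3))
```

With only 34 observations, each additional predictor lowers the adjusted value noticeably, which is why models with many predictors should be read with caution.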

Table 10: Regression Coefficients for Regressing Six Honest Signals against Extraversion.

| Model | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 16.015 | 14.669 | | 1.092 | .284 |
| Betweenness centrality oscillation | .466 | .203 | .363 | 2.299 | .029 |
| avg sentiment | 73.989 | 23.723 | .465 | 3.119 | .004 |
| Alter ART [h] | -.263 | .121 | -.326 | -2.171 | .038 |
| Degree centrality | -.348 | .165 | -.341 | -2.113 | .043 |

Table 11: Regression Coefficients for Regressing Six Honest Signals against Openness.

| Model | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 54.912 | 1.334 | | 41.156 | .000 |
| Betweenness centrality | -.007 | .004 | -.310 | -1.869 | .071 |
| Contribution index | 10.960 | 5.894 | .308 | 1.860 | .072 |

Table 12: Regression Coefficients for Regressing Six Honest Signals against Agreeability.

| Model | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 47.976 | 14.418 | | 3.328 | .002 |
| avg sentiment | 51.502 | 18.594 | .495 | 2.770 | .010 |
| avg complexity | -5.967 | 2.673 | -.405 | -2.232 | .033 |
| Total influence | .145 | .051 | .504 | 2.864 | .008 |

10.2.5. Conscientiousness

The Adjusted R Square is 0.57, which means 57% of the conscientiousness of people can be explained by the network structure and content of their e-mails (Table 13). In other words, the fewer e-mail messages people send, the more messages they receive, the more positive their messages, the more central they are by betweenness, the fewer communication partners they have, the faster others respond to them, the fewer nudges they need until they answer, the higher the total influence of their messages, and the lower the influence of an individual message, the more conscientious people are.

Table 13: Regression Coefficients for Regressing Six Honest Signals against Conscientiousness.

| Model | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 71.128 | 17.006 | | 4.183 | .000 |
| Messages sent | -.167 | .055 | -2.618 | -3.047 | .006 |
| avg sentiment | 57.422 | 18.404 | .374 | 3.120 | .005 |
| Betweenness centrality | .039 | .011 | 1.345 | 3.651 | .001 |
| Alter ART [h] | -.393 | .116 | -.507 | -3.401 | .002 |
| Degree centrality | -1.905 | .453 | -1.933 | -4.201 | .000 |
| Total influence | 1.481 | .387 | 3.498 | 3.832 | .001 |
| Messages received | .023 | .012 | .335 | 1.846 | .077 |
| Average influence per message | -195.283 | 53.294 | -.844 | -3.664 | .001 |
| Ego nudges | -9.781 | 5.331 | -.316 | -1.835 | .079 |

10.3. ADDING GENDER, ETHNICITY, AND NATIONALITY AS CONTROL VARIABLES

Research about personality characteristics suggests that they differ by gender, ethnicity, and nationality. We therefore introduce three categorical variables: one for gender (female/male), one for ethnicity, where, based on the student population, we have Asian, Caucasian, and Arab, and one for nationality (Finnish, German, US, and other). Besides giving insights about the cultural differences of personality characteristics, adding these control variables also increases the accuracy of some of the predictions.

A simple one-way ANOVA in SPSS for gender, ethnicity, and nationality leads to Table 14. Females are more neurotic, more open, more agreeable, and somewhat less conscientious, while there is no difference in extroversion. However, these differences are not statistically significant.

The table for ethnicity looks like Table 15. We again see differences: the Arabs are most extrovert, open, and agreeable, and are tied with the Asians as most neurotic. The Asians, on the other hand, are least extrovert, open, and conscientious. The Caucasians are least agreeable and most conscientious. Note, however, that these results are not significant, and thus are of mostly anecdotal value.

The table for nationality is shown in Table 16.
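The one-way ANOVA behind these comparisons can be reproduced from group sizes, means, and standard deviations alone. The Python sketch below is an illustration, not the SPSS procedure; the function name is hypothetical, and the example uses the extroversion figures by nationality as reported in Table 16.

```python
def anova_f(groups):
    """One-way ANOVA F statistic from summary statistics.

    groups: list of (n, mean, std_deviation) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Extroversion by nationality (n, mean, std. deviation), read from Table 16.
EXTROVERSION_BY_NATIONALITY = [
    (9, 53.33, 7.036),    # Finnish
    (15, 53.67, 10.069),  # German
    (8, 61.63, 8.943),    # The United States
    (2, 38.50, 4.950),    # Other
]

if __name__ == "__main__":
    f_value, df_b, df_w = anova_f(EXTROVERSION_BY_NATIONALITY)
    print(round(f_value, 2), df_b, df_w)
```

The resulting F value of about 3.9 with (3, 30) degrees of freedom has to be looked up in an F distribution to obtain the p-value; it is in line with the significant national difference in extroversion reported for Table 16.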

Table 14: ANOVA Results by Gender for FFI Characteristics.

| Gender | Statistic | Neuroticism | Extroversion | Openness | Agreeability | Conscientiousness |
|---|---|---|---|---|---|---|
| Male | Mean | 44.26 | 54.58 | 52.84 | 45.26 | 51.11 |
| | N | 19 | 19 | 19 | 19 | 19 |
| | Std. deviation | 8.608 | 9.651 | 6.517 | 4.965 | 8.869 |
| Female | Mean | 48.60 | 54.53 | 55.13 | 47.73 | 49.80 |
| | N | 15 | 15 | 15 | 15 | 15 |
| | Std. deviation | 9.840 | 10.901 | 9.149 | 8.172 | 10.936 |
| Total | Mean | 46.18 | 54.56 | 53.85 | 46.35 | 50.53 |
| | N | 34 | 34 | 34 | 34 | 34 |
| | Std. deviation | 9.288 | 10.061 | 7.746 | 6.582 | 9.699 |

Table 15: ANOVA Results by Ethnicity for FFI Characteristics.

| Ethnicity | Statistic | Neuroticism | Extroversion | Openness | Agreeability | Conscientiousness |
|---|---|---|---|---|---|---|
| Asians | Mean | 48.50 | 51.25 | 51.00 | 46.00 | 45.25 |
| | N | 4 | 4 | 4 | 4 | 4 |
| | Std. deviation | 11.269 | 15.196 | 10.231 | 5.228 | 12.148 |
| Caucasians | Mean | 45.68 | 54.21 | 53.82 | 45.86 | 51.32 |
| | N | 28 | 28 | 28 | 28 | 28 |
| | Std. deviation | 9.121 | 9.049 | 7.354 | 6.530 | 9.623 |
| Arabs | Mean | 48.50 | 66.00 | 60.00 | 54.00 | 50.00 |
| | N | 2 | 2 | 2 | 2 | 2 |
| | Std. deviation | 13.435 | 11.314 | 9.899 | 8.485 | 5.657 |
| Total | Mean | 46.18 | 54.56 | 53.85 | 46.35 | 50.53 |
| | N | 34 | 34 | 34 | 34 | 34 |
| | Std. deviation | 9.288 | 10.061 | 7.746 | 6.582 | 9.699 |

Table 16: ANOVA Results by Nationality for FFI Characteristics.

| Nationality | Statistic | Neuroticism | Extroversion | Openness | Agreeability | Conscientiousness |
|---|---|---|---|---|---|---|
| Finnish | Mean | 47.44 | 53.33 | 58.11 | 44.11 | 48.33 |
| | N | 9 | 9 | 9 | 9 | 9 |
| | Std. deviation | 9.888 | 7.036 | 5.183 | 6.194 | 11.147 |
| German | Mean | 45.33 | 53.67 | 50.80 | 45.20 | 53.13 |
| | N | 15 | 15 | 15 | 15 | 15 |
| | Std. deviation | 8.006 | 10.069 | 7.720 | 6.461 | 8.593 |
| The United States | Mean | 44.50 | 61.63 | 55.13 | 50.75 | 50.25 |
| | N | 8 | 8 | 8 | 8 | 8 |
| | Std. deviation | 11.662 | 8.943 | 7.039 | 6.042 | 8.430 |
| Other | Mean | 53.50 | 38.50 | 52.50 | 47.50 | 42.00 |
| | N | 2 | 2 | 2 | 2 | 2 |
| | Std. deviation | 7.778 | 4.950 | 16.263 | 7.778 | 16.971 |
| Total | Mean | 46.18 | 54.56 | 53.85 | 46.35 | 50.53 |
| | N | 34 | 34 | 34 | 34 | 34 |
| | Std. deviation | 9.288 | 10.061 | 7.746 | 6.582 | 9.699 |

We find that the Americans are least neurotic, most extrovert, and most agreeable, confirming national stereotypes. The Finns are the most open, while they are as extrovert as the Germans, which is quite surprising. The Germans, again confirming national stereotypes, are most conscientious. This time, national differences in extroversion are statistically significant (p = 0.018).

When rerunning the regressions with the three control variables (gender, ethnicity, and nationality), we find that these do not influence the coefficients for calculating neuroticism, openness, and conscientiousness with the six honest signals of collaboration. However, the accuracy of the predictions for extroversion and agreeability increases.

10.3.1. Extroversion

When regressing the six honest signals of collaboration against extroversion using ethnicity as a control variable, the Adjusted R Square goes up from 0.33 to 0.49; this means that, now, 49% of the extroversion of people can be explained by the structure and content of their e-mail (Table 17).

As Table 17 shows, ethnicity explains part of extroversion. Also, adding ethnicity as a control variable has added significance to other social signals of collaboration. When controlling for ethnicity, extroverts keep their behavior of being more oscillating and showing more positive sentiment. But it also seems extroverts are less popular. For instance, their betweenness centrality is lower, and their messages have less influence. However, they are more responsive, as they need fewer nudges until they respond (ego nudges), and others respond faster (alter ART).

Table 17: Regression Coefficients for Regressing Six Honest Signals against Extraversion with Ethnicity as Control Variable.

| Model | Unstandardized B | Unstandardized Std. error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 33.648 | 16.820 | | 2.000 | .056 |
| Ethnicity | 8.695 | 3.156 | .365 | 2.755 | .011 |
| Betweenness centrality oscillation | .695 | .209 | .541 | 3.324 | .003 |
| avg sentiment | 59.658 | 21.116 | .375 | 2.825 | .009 |
| Betweenness centrality | -.010 | .004 | -.331 | -2.360 | .026 |
| Alter ART [h] | -.355 | .118 | -.441 | -3.002 | .006 |
| Average influence per message | -102.152 | 34.816 | -.425 | -2.934 | .007 |
| Ego nudges | -11.257 | 5.786 | -.351 | -1.946 | .063 |

10.3.2. Agreeability

When regressing the six honest signals of collaboration against agreeability and adding ethnicity and nationality as control variables, the Adjusted R Square goes up from 0.21 to 0.34; this means that 34% of the variance in people's agreeability can now be explained by the structure and content of their e-mail (Table 18).

Table 18: Regression Coefficients for Regressing Six Honest Signals against Agreeability with Ethnicity as Control Variable.

Model            B (unstd.)   Std. error (unstd.)   Beta (std.)   t       Sig.
(Constant)       33.236       14.583                              2.279   .030
Ethnicity        5.612        2.614                 .360          2.147   .041
Nationality      2.962        1.320                 .390          2.243   .033
Messages sent    .020         .007                  .471          2.875   .008
avg sentiment    36.241       18.072                .348          2.005   .055
avg complexity   4.938        2.566                 .335          1.924   .065

As Table 18 shows, ethnicity and nationality explain part of agreeability. Adding ethnicity and nationality as control variables has also made the number of messages sent a significant variable. When controlling for ethnicity and nationality, it seems that agreeable people show more positive sentiment and use less complex language. They also send more messages.

10.4. FOLLOW-ON EXERCISES

1. Use the coefficients in the tables above to formulate five equations to calculate the Big Five personality characteristics of the other 15 students and of the instructors, who have not taken the neo-FFI test. What would they be?

2. Using the personality insights gained, take your own e-mail archive and make an educated guess on the personality characteristics of the people in your mailbox, based on the correlations between personality and the six honest signals of collaboration identified in this chapter.

3. Using personality recognition through word usage, identify the personality characteristics of the people in your mailbox, using the system mentioned here: Celli and Poesio (2014). An online version of the system is available here: http://personality.altervista.org/pear.php

MAIN LESSONS LEARNED

• Personality characteristics of individuals can be calculated based on their e-mailing behavior by comparing the six honest signals of collaboration of individual actors with their personality characteristics measured through the Big Five personality test.

• The Big Five personality test measures Neuroticism, Extraversion, Openness to experience, Agreeableness, and Conscientiousness through a survey and is commonly used to assess personality characteristics by scientific psychologists.


11 PREDICTING CRIMINAL INTENT FROM E-MAIL — ANALYZING THE ENRON E-MAIL ARCHIVE

CHAPTER CONTENTS

• The initial phase consists of an exploratory SNA to develop the hypotheses

• The second phase identifies criminals through their six honest signals of collaboration with t-tests

• Finally, machine learning with “tribefinder” finds the “tribe” of suspected criminals.

© 2017 Peter A. Gloor


Enron’s downfall has been widely publicized and is also described in the book The Smartest Guys in the Room¹ and a movie of the same name. Enron was once a high-flying energy trading company located in Houston, TX, and named by Fortune for six consecutive years as “America’s most innovative company.” Enron employed 20,000 people and claimed revenues of $111 billion before it went bankrupt on December 2, 2001. Enron traded in electricity, natural gas, communications, pulp, and paper. Founded by longtime CEO Ken Lay, and led by COO/CEO Jeffrey Skilling and CFO Andrew Fastow, Enron engaged in a sophisticated game of hiding bad assets in offshore vehicles and booking future earnings as profits.

¹ McLean and Elkind (2013).

11.1. EXPLORATORY ANALYSIS

For the discovery process during the criminal proceedings against Enron, the prosecution also obtained and screened the e-mails of the 155 indicted Enron employees. After the proceedings, these e-mails were cleaned up by academics and placed in the public domain. They are now widely used to benchmark e-mail-based social network research projects. There are different variants of the Enron e-mail archive available on the Internet. In this analysis, we use a version from 2006 containing 261,852 e-mails and 27,742 actors, collected from the 155 employees of Enron who were indicted in the criminal proceedings that were started by the US government after Enron’s bankruptcy in 2002. The dataset in Condor format is available from www.ickn.org/sociometrics.

The data goes from 1997 to 2002. For this analysis we restrict the dataset to the top 2862 actors active in the period January 1, 2000 to January 4, 2002, the time when most of the criminal activity at Enron was happening. This dataset includes 170,239 links. In the loading dialog of the dataset “enron2000to2002,” we set the start time to “January 1, 2000.”
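Outside Condor, the same kind of restriction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(sender, recipient, timestamp)` tuple layout, the helper name `restrict_dataset`, and the choice to rank actors by the number of messages they send or receive are all hypothetical and are not taken from Condor's loading dialog.

```python
from collections import Counter
from datetime import datetime


def restrict_dataset(messages, start, end, top_n):
    """Keep messages inside [start, end] whose sender and recipient are both
    among the top_n most active actors in that window (hypothetical helper)."""
    in_window = [m for m in messages if start <= m[2] <= end]

    # Activity = number of messages an actor sent or received in the window.
    activity = Counter()
    for sender, recipient, _ in in_window:
        activity[sender] += 1
        activity[recipient] += 1

    top_actors = {actor for actor, _ in activity.most_common(top_n)}
    links = [m for m in in_window if m[0] in top_actors and m[1] in top_actors]
    return top_actors, links


# Toy data standing in for the e-mail archive (made-up addresses).
messages = [
    ("a@example.com", "b@example.com", datetime(1999, 5, 1)),    # before window
    ("a@example.com", "b@example.com", datetime(2000, 3, 1)),
    ("b@example.com", "a@example.com", datetime(2001, 10, 15)),
    ("c@example.com", "a@example.com", datetime(2001, 11, 2)),
    ("d@example.com", "c@example.com", datetime(2002, 6, 1)),    # after window
]
actors, links = restrict_dataset(
    messages, datetime(2000, 1, 1), datetime(2002, 1, 4), top_n=2
)
```

With the real archive, `top_n` would play the role of the actor limit and `links` would correspond to the edges that remain in the restricted network.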

The activity chart below shows that there are peaks of activity in October/November 2000, April/May 2001, and October/November 2001.


Coloring the static view of communication by organization shows a cohesive network in which most of the actors are from Enron (shown in green in the graph below).

Image 33a

a For color pictures see the online version of the images, available at http://www.ickn.org/sociometrics/

In our analysis, we would like to compare the nonindicted employees with the indicted employees.² Research on the Internet leads to the following list of people convicted in the context of the Enron criminal process.

² http://fuelfix.com/blog/2011/11/28/the-defendants-of-the-enronera-and-their-cases/#1875101=0

Andrew Fastow

Chief ﬁnancial ofﬁcer — Pleaded guilty to two conspiracy counts; testiﬁed against Lay and Skilling; ﬁnishing six-year sentence that ended Dec. 2011 at his Houston home

[email protected]

Ben Glisan

Treasurer — Pleaded guilty to conspiracy; served two-thirds of 5-year sentence

[email protected]

Christopher Calger

Vice president, pleaded guilty to a charge of conspiracy to commit wire fraud

NOT FOUND

Dan Boyle

Vice president global finance group: convicted in Nigerian barge case; did not appeal; served three-year 10-month sentence

NOT FOUND

David Bermingham

British banker pleaded guilty to misleading his former employer in an Andrew Fastow finance scheme; sentenced to three years

[email protected]

David Delainey

Former CEO retail energy division: pleaded guilty to insider trading; served nine-month prison term

[email protected]

NOT FOUND

David Duncan

Arthur Andersen auditor — Withdrew a guilty plea after the Supreme Court reversed ﬁrm’s conviction; settled Securities and Exchange Commission complaint of securities laws violations

NOT FOUND

Gary Mulgrew

British banker pleaded guilty to misleading his former employer in an Andrew Fastow ﬁnance scheme; sentenced to three years

NOT FOUND

Giles Darby

British banker pleaded guilty to misleading his former employer in an Andrew Fastow ﬁnance scheme; sentenced to three years

James A. Brown

Merrill Lynch banker — Convicted in Nigerian barge case; some charges thrown out on appeal; served 47 months on remaining charges

NOT FOUND

Jeff Skilling

President: serving 24-year sentence

[email protected]

Jeffrey Richter

[email protected]

Trader Enron Energy Services — Pleaded guilty to manipulating California power markets; served two-year probation

NOT FOUND

Joe Hirko

Co-CEO Enron Broadband Services — Pleaded guilty to charge arising from overstating performance of Broadband division; served 16-month sentence

[email protected]

John M. Forney

Energy trader — Pleaded guilty to manipulating California power markets; served two-year probation

[email protected]

Ken Lay

CEO — Tried with former Enron President Jeff Skilling; conviction thrown out because Lay died before sentencing

Ken Rice

Co-CEO Enron Broadband: pleaded guilty to securities fraud in Broadband case; served 27-month sentence

[email protected]

Kevin Hannon

Chief operating officer Enron Broadband: pleaded guilty to conspiracy in Broadband case; served two-year sentence

NOT FOUND

Kevin Howard

Finance chief Enron Broadband — Pleaded guilty to one count of falsifying records in Broadband case; served one-year probation

[email protected]

Lawrence Lawyer

Vice president global markets — Pleaded guilty to failing to report income; served two-year probation

NOT FOUND

Lea Fastow

Assistant treasurer, Andrew Fastow’s wife: pleaded guilty to lying on tax return; served one-year sentence

[email protected]

Mark E. Koenig

Head of investor relations: pleaded guilty to securities fraud; served 18-month sentence

[email protected]

Michael Kopper

Finance managing director — First Enron executive to enter plea bargain; served less than two-thirds of three-year one-month sentence; released January 2009

[email protected]

Paula Rieker

Managing director of investor relations — Pleaded guilty to insider trading; served two-year probation

[email protected]

Rex T. Shelby

Vice president of engineering operations Enron Broadband — The last of the Enron employees to be sentenced; pleaded guilty to one count of insider trading; sentenced to two-year probation;

[email protected]

Richard Causey

Chief accounting ofﬁcer — Pleaded guilty to securities fraud; completed ﬁve-year six-month sentence

[email protected]

Timothy Belden


NOT FOUND

Head of trading Enron Energy Services — Pleaded guilty to manipulating California power markets; served two-year probation

[email protected]

Timothy DeSpain

Assistant treasurer — Pleaded guilty to conspiracy; served four-year probation


Acquitted

NOT FOUND

Robert Furst

Merrill Lynch banker — Tried in Nigerian barge case; conviction thrown out on appeal

NOT FOUND

Daniel Bayly

Former head of investment banking for Enron: tried in Nigerian barge case; conviction thrown out on appeal

NOT FOUND

Sheila Kahanek

Enron in-house accountant: tried in Nigerian barge case; acquitted

In full dataset

Michael Krautz

Enron Broadband: tried in Broadband case; acquitted

NOT FOUND

William Fuhs

Merrill Lynch banker: tried in what prosecutors alleged was a scheme to inflate earnings through transactions involving power generation barges in Nigeria; conviction thrown out on appeal

In full dataset

Scott Yeager

Strategic business executive Enron Broadband: appeals court ordered Yeager acquitted on all charges after his case went to US Supreme Court

We start by tracking the 16 convicted criminals from the list above who show sufﬁcient e-mail activity in the full network. They are the following: [email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

We first visually identify these 16 people in the full network in the static view of communication. First, we color all the nodes in blue. When coloring the nodes, clicking on the “advanced” button brings up the advanced coloring dialog, which allows setting the color of a list of people by their UUIDs.

Using this dialog we load a text file (also available at www.ickn.org/sociometrics) containing the e-mail addresses we want to show in red, separated by commas:

[email protected], [email protected], [email protected], david.delainey@enron.com, [email protected], [email protected], [email protected], [email protected], ken [email protected], [email protected], mark. [email protected], [email protected], rex. [email protected], [email protected], tim. [email protected], [email protected]

The image below shows the recolored static view of communication. Red dots are the convicted criminals. As the image shows, they are not the most central actors in the network. In the next phase of our analysis we will use the six honest signals of collaboration, calculating them for all actors and comparing them between the 16 convicted criminals and the rest of the actors.

Image 34a

a For color pictures see the online version of the images, available at http://www.ickn.org/sociometrics/

11.2. IDENTIFYING CRIMINAL ACTORS THROUGH THEIR HONEST SIGNALS OF COLLABORATION

Our goal is to identify the differences in communication behavior between the 16 criminals (the experimental group) and the other 2845 people in the dataset (the control group). We will compare the six honest signals of collaboration between the two groups. We calculate them using the annotate functions:

• Process dataset->annotate->Centrality annotations (betweenness and degree) (Central Leadership)

• Process dataset->annotate->Oscillation annotations (Rotating Leadership)

• Process dataset->annotate->Contribution Index annotations (Balanced Contribution)

• Process dataset->annotate->Turntaking annotations (Responsiveness)

• Calculate sentiment (Honest Sentiment)

• Calculate influence (Shared Context).

We then export them to Excel:


Next, we split the table into two samples: the experimental group of 16 convicts and the 2845 people in the control group. In this example, we use the SPSS “compare means/Independent sample” function and store the 2861 records in a single SPSS table, adding a variable “0_inno_1_convict” that codes innocent people with “0” and convicts with “1.” Running the SPSS t-test to compare the two groups leads to Table 19 (alternatively, we could have used Condor’s built-in t-test function).

Table 19: t-Tests of Structural SNA Metrics between Convicts and Control Group.

Metric                               0_control_1_convict   N      Mean        Std. Deviation   Std. Error Mean
Messages sent                        0                     2845   59.31       243.01           4.56
                                     1                     16     94.56       201.01           50.25
Messages received                    0                     2845   58.98       164.14           3.08
                                     1                     16     152.06      204.89           51.22
Messages total                       0                     2845   118.28      385.43           7.23
                                     1                     16     246.63      334.27           83.57
Degree centrality                    0                     2845   16.25       25.94            0.49
                                     1                     16     53.13       55.23            13.81
Betweenness centrality               0                     2845   2724.50     12,702.99        238.16
                                     1                     16     13,856.93   23,057.97        5764.49
Contribution index                   0                     2845   0.13        0.53             0.01
                                     1                     16     0.31        0.42             0.11
Betweenness centrality oscillation   0                     2845   56.05       54.44            1.02
                                     1                     16     130.38      73.01            18.25
Ego ART [h]                          0                     1604   62.05       54.64            1.36
                                     1                     16     58.74       71.31            17.83
Alter ART [h]                        0                     1465   63.53       56.03            1.46
                                     1                     16     52.76       69.86            17.47
Ego nudges                           0                     1604   1.27        0.64             0.02
                                     1                     16     0.80        0.58             0.15
Alter nudges                         0                     1465   1.40        1.06             0.03
                                     1                     16     0.87        0.57             0.14

As we can see, the criminals send and receive more messages; they are also more central in the network by degree and betweenness; their contribution index is more negative, which means they get more mail than the rest. These indicators might all stem from their high positions in the hierarchy. They also have much higher betweenness centrality oscillation, which means they are more creative than their peers at Enron, and they are on average more responsive, and others answer them faster. They also need fewer nudges until they respond to an e-mail, and others need fewer nudges until they respond to the criminals. The speed of response of others (Alter ART) might again be an indicator of high respect due to the high hierarchical position of the convicts. The higher responsiveness of the convicts might indicate that they have “more skin in the game.”

Table 20 shows for which of these variables there are statistically significant differences between criminals and their peers at Enron. As Table 20 shows, the differences in messages received, degree and betweenness centrality, betweenness centrality oscillation, and ego and alter nudges are statistically significant. This means that the convicted criminals indeed behave differently from the rest of the people at Enron. Note that these differences might be an artifact of data collection. The mailboxes of the 16 convicted criminals are included in the 155 mailboxes that are the basis for the construction of this e-mail dataset. If we wanted a cleaner comparison, we would have to restrict our peer group to the other 139 people whose mailboxes were also collected, instead of taking all 2845 nonconvicts as the control group.

We now compare the content of the e-mails, calculating complexity, sentiment, emotionality, and influence using Condor’s “honest signals of collaboration” content-based values for each actor (Tables 21 and 22). Statistical analysis with the same SPSS t-test as before shows that the messages of criminals are more influential (this might again be due to their higher rank in the company), but also more complex and less emotional. There is no difference in average influence per message and average sentiment between the two groups.

Table 20: Significances of t-Tests between Convicts and Control Group for Structural SNA Metrics.

Columns F and Sig. report Levene’s Test for Equality of Variances; the remaining columns report the t-Test for Equality of Means.

Metric                               Variances     F       Sig.   t      df      Sig. (two-tailed)   Mean difference   Std. error difference   95% CI lower   95% CI upper
Messages sent                        assumed       0.48    0.49   0.58   2859    0.56                35.26             60.87                   154.62         84.10
                                     not assumed                  0.70   15.25   0.50                35.26             50.46                   142.65         72.14
Messages received                    assumed       5.22    0.02   2.26   2859    0.02                93.09             41.21                   173.89         12.28
                                     not assumed                  1.81   15.11   0.09                93.09             51.31                   202.39         16.22
Messages total                       assumed       1.65    0.20   1.33   2859    0.18                128.34            96.57                   317.69         61.00
                                     not assumed                  1.53   15.23   0.15                128.34            83.88                   306.90         50.21
Degree centrality                    assumed       29.94   0.00   5.62   2859    0.00                36.87             6.56                    49.74          24.01
                                     not assumed                  2.67   15.04   0.02                36.87             13.82                   66.32          7.43
Betweenness centrality               assumed       23.31   0.00   3.47   2859    0.00                11,132            3203                    17,414         4850
                                     not assumed                  1.93   15.05   0.07                11,132            5769                    23,426         1161
Contribution index                   assumed       1.79    0.18   1.33   2859    0.18                0.18              0.13                    0.08           0.44
                                     not assumed                  1.67   15.27   0.11                0.18              0.11                    0.05           0.40
Betweenness centrality oscillation   assumed       5.58    0.02   5.43   2859    0.00                74.32             13.68                   101.14         47.51
                                     not assumed                  4.07   15.09   0.00                74.32             18.28                   113.27         35.38
Ego ART [h]                          assumed       2.92    0.09   0.24   1618    0.81                3.31              13.77                   23.70          30.32
                                     not assumed                  0.19   15.18   0.86                3.31              17.88                   34.76          41.38
Alter ART [h]                        assumed       3.00    0.08   0.76   1479    0.45                10.77             14.12                   16.93          38.47
                                     not assumed                  0.61   15.21   0.55                10.77             17.53                   26.54          48.08
Ego nudges                           assumed       1.41    0.24   2.92   1618    0.00                0.47              0.16                    0.15           0.78
                                     not assumed                  3.20   15.36   0.01                0.47              0.15                    0.16           0.78
Alter nudges                         assumed       0.03    0.86   2.00   1479    0.05                0.53              0.26                    0.01           1.05
                                     not assumed                  3.65   16.15   0.00                0.53              0.14                    0.22           0.83

Table 21: t-Tests of Content-Based E-Mail Metrics between Convicts and Control Group.

Metric                          0_control_1_convict   N      Mean   Std. Deviation   Std. Error Mean
avg complexity                  0                     2845   5.68   1.92             0.04
                                1                     16     6.19   0.71             0.18
Total influence                 0                     2845   4.40   27.56            0.52
                                1                     16     4.80   9.08             2.27
avg emotionality                0                     2845   0.26   0.08             0.00
                                1                     16     0.23   0.02             0.01
Average influence per message   0                     2845   0.07   0.10             0.00
                                1                     16     0.07   0.07             0.02
avg sentiment                   0                     2845   0.53   0.09             0.00
                                1                     16     0.54   0.09             0.02

Note that we could have done this analysis entirely in Condor, as it includes the t-test function. If we want to run some regression analysis, however, we will need the exported data loaded into a statistics package such as SPSS (or R, Matlab, SAS, or Stata).
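The t statistics in Table 20 can also be recomputed from the group means, standard deviations, and sample sizes in Table 19, without access to the raw records. The following is a minimal sketch in plain Python; the helper name `t_from_summary` is hypothetical, and only the t statistics and degrees of freedom are computed, since a p-value would additionally need the t distribution from a statistics package.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled_t, pooled_df, welch_t, welch_df) for group 1 minus group 2."""
    diff = mean1 - mean2

    # Student's t-test: equal variances assumed, pooled variance estimate.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    pooled_df = n1 + n2 - 2

    # Welch's t-test: equal variances not assumed.
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    welch_t = diff / math.sqrt(v1 + v2)
    welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return pooled_t, pooled_df, welch_t, welch_df


# Betweenness centrality oscillation, summary statistics from Table 19:
# convicts (mean 130.38, SD 73.01, N 16) versus control (mean 56.05, SD 54.44, N 2845).
pooled_t, pooled_df, welch_t, welch_df = t_from_summary(
    130.38, 73.01, 16, 56.05, 54.44, 2845
)
print(round(pooled_t, 2), pooled_df, round(welch_t, 2), round(welch_df, 2))
```

For this metric the two statistics should land close to the values listed in Table 20 for "equal variances assumed" and "equal variances not assumed". Because the two groups differ so much in size and spread, the Welch variant is usually the safer one to read.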

11.3. “TRIBEFINDER”: IDENTIFYING CRIMINALS THROUGH MACHINE LEARNING IN CONDOR

As a next step, we use Condor to directly identify potential criminals in the Enron dataset. In Table 21 we have found that convicted criminals show a different behavior from nonconvicted people. Using Condor’s machine-learning function, we can identify other people showing the same “suspicious behavior.”

Table 22: Significances of t-Tests between Convicts and Control Group for Content-Based Metrics.

Columns F and Sig. report Levene’s Test for Equality of Variances; the remaining columns report the t-Test for Equality of Means.

Metric             Variances     F      Sig.   t      df      Sig. (two-tailed)   Mean difference   Std. error difference   95% CI lower   95% CI upper
avg complexity     assumed       3.19   0.07   1.07   2859    0.29                0.51              0.48                    1.46           0.43
                   not assumed                 2.84   16.26   0.01                0.51              0.18                    0.90           0.13
Total influence    assumed       0.00   0.95   0.06   2859    0.95                0.40              6.89                    13.91          13.12
                   not assumed                 0.17   16.59   0.87                0.40              2.33                    5.32           4.52
avg emotionality   assumed       3.49   0.06   1.30   2859    0.20                0.03              0.02                    0.01           0.07
                   not assumed                 4.54   17.30   0.00                0.03              0.01                    0.01           0.04

First, we load the Enron dataset.

We load the top 2000 actors out of the 27,742 actors in the dataset, in the time interval from January 1, 1997 to January 4, 2002, which is when most of the activity of the Enron employees happened.

Once we have loaded the data, we need to calculate the features used to classify the 2000 actors in the dataset into suspects and nonsuspects, using the convicts as our training dataset. As features we use all the honest signals of collaboration:

• Process dataset->annotate->Centrality annotations (betweenness and degree) (Central Leadership)

• Process dataset->annotate->Oscillation annotations (Rotating Leadership)

• Process dataset->annotate->Contribution Index annotations (Balanced Contribution)

• Process dataset->annotate->Turntaking annotations (Responsiveness)

• Calculate sentiment (Honest Sentiment)

• Calculate influence (Shared Context).

After annotating all actors with the social networking and content-based features, we also need to mark each actor as being a convict or nonconvict. In order to later test the accuracy of our machine-learning algorithm, we tag each actor as either being part of the class of convicts or nonconvicts. For this, we load a CSV ﬁle of the structure (available at www.ickn.org/sociometrics):


[email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]

When loading the data, we find that criminal mastermind Andrew Fastow seems to have been a very reluctant user of e-mail, with just 31 messages, which means he does not make it into the top 2000 users, so we end up with 15 “criminals” instead of 16.

Now we are ready to run the machine-learning algorithm. The idea is that “criminals” and “ordinary” people exhibit fundamental differences in their communication patterns. As there are many more “ordinary” people than “criminals,” our sample is highly unbalanced. For the training phase, we have 15 “criminals” and 1985 “ordinary” people. This means that in Phase 1, described below, we will use all 15 “criminals” for training the machine-learning system, but multiply them by 9000% to obtain 1350 actors and thus make the two samples approximately the same size. Using the Synthetic Minority Over-sampling Technique (SMOTE) algorithm, Condor modifies their features by small random increments to obtain an additional group of 1335 “virtual criminals” with communication patterns similar to those of the original 15.

Condor currently provides two algorithms: decision trees and random forests. Decision trees are simpler and faster, but they tend to overfit the training dataset. This means that while the model will be very accurate on the data it was trained on, it will produce false results as soon as the underlying structure of the data changes only slightly. It is better to have a less accurate but more robust prediction algorithm. We therefore use the random forest algorithm, which combines a number of decision trees, each built with small random variations of the original samples, into an “averaged” classifier.

We now start the random forest machine-learning wizard. First, we load an external file with the e-mail addresses of the 16 convicts. Clicking on the “include” button leads to the following dialog, which shows all the actors on the left and the ones that will be used for training (the “convicts” in our example) on the right.
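To make the oversampling step concrete, here is a minimal sketch of textbook SMOTE in plain Python. It is an assumption-laden illustration, not Condor's implementation: the function name `smote`, the two made-up features, the neighbour count `k=5`, and the fixed random seeds are all hypothetical. Textbook SMOTE creates each synthetic record by interpolating between a minority sample and one of its nearest minority neighbours, which is one way of producing the "small random increments" described above.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (textbook SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# 15 made-up two-feature vectors standing in for the 15 convicts.
rng = random.Random(42)
convicts = [(rng.uniform(0, 1), rng.uniform(50, 150)) for _ in range(15)]

virtual = smote(convicts, n_synthetic=1335)
oversampled = convicts + virtual  # 15 original + 1335 synthetic = 1350
```

In practice the features would usually be scaled to comparable ranges before computing distances, since otherwise a metric with large values (such as betweenness centrality) would dominate the choice of neighbours.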


The next dialog asks which ﬁelds we would like to use for training the classiﬁer with the attributes of suspicious behavior. For our experiment, we use all the attributes as training features:


Remember that this dataset is unbalanced, because the two categories of “convicts” and “nonconvicts” are of vastly different size, with the convict class having 15 members and the nonconvict class having 1985 members. In the next dialog, we deal with the imbalance of the two classes. We check the box SMOTE mentioned above which applies the SMOTE algorithm to create additional records for the smaller class, blowing up the original 15 actors by 9000% to 1350 records.

The next image shows the training data on the left and how well each record ﬁts within the random forest training algorithm. On the right, we see a list of people with similar features (i.e., similar communication behavior). [email protected] has the best ﬁt with a communication pattern similar to the “criminals.”


We check the quality of the fit by creating a receiver operating characteristic (ROC) curve. An ROC curve is a graphical way of visualizing the accuracy of a binary classifier: it plots the rate of true positives against the rate of false positives. To obtain it, the dataset is split into two parts, one for training and the other for testing. Condor calculates nine different ROC curves, starting with 90% of the data for testing and 10% for training, down to 10% of the data for testing and 90% for training. The bigger the area under the curve (AUC), the better the classifier separates the two classes, that is, the higher the rate of true positives it reaches at a given rate of false positives. In the image below we see that the AUC is very large; for a random classifier the ROC curve would be a diagonal. This means that our identification of potential suspects is correct with high likelihood.
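The area under the ROC curve can be computed without drawing the curve at all. The sketch below uses the rank formulation of the AUC (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative); the function name `roc_auc` and the labels and scores are made up for illustration and do not come from Condor or the Enron data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random positive scores higher than a random
    negative, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = convict, 0 = nonconvict) and classifier scores.
labels = [1, 1, 0, 0, 0, 0]
perfect = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]  # every convict outranks every nonconvict
uninformative = [0.5] * 6                   # same score for everyone
mixed = [0.9, 0.2, 0.3, 0.1, 0.8, 0.05]

print(roc_auc(labels, perfect), roc_auc(labels, uninformative), roc_auc(labels, mixed))
```

A classifier that ranks every positive above every negative reaches an AUC of 1, while one that cannot tell the classes apart stays at 0.5, which corresponds to the diagonal mentioned above.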


Finally, we visualize the convicts, which have been our training dataset, in light blue, and the suspects, which are the actors with communication behavior similar to the convicts, in green, and the rest of the people are shown in yellow. Each node is sized by the goodness of ﬁt to the training dataset. The larger the node, the more similar an actor is to a convict.

Image 35a

a For color pictures see the online version of the images, available at http://www.ickn.org/sociometrics/

As a very last analysis, we look at the t-test and compare the mean for suspects and nonsuspects using the “convicted” attribute we originally loaded.


11.4. FOLLOW-ON EXERCISES

1. As a follow-on exercise, analyze the communication behavior of the two acquitted suspects included in the full “Enron” dataset (Michael Krautz, Scott Yeager) and explore differences in their six honest signals of collaboration from convicted criminals and from the control group of the remaining people in the dataset.

2. As a second exercise, use the same data as input for a regression analysis and develop a formula that predicts if somebody else in the dataset might be a criminal based on a linear combination of their six honest signals of collaboration.

3. As a third exercise, increase the accuracy and predictability of this formula by looking up on the Internet the names of the 155 people whose e-mail has been collected for this analysis. This means that while the experimental group (the 16 criminals) will stay the same, the control group will now be 155 - 16 = 139 instead of 2845 people.

4. Finally, focus on the “California Energy Crisis” and repeat the above analysis for this subtopic of the Enron downfall, by only loading the messages containing these keywords and constructing the network with the resulting messages.


Important: We CANNOT convict anybody based on this formula. There might be perfectly legitimate reasons why somebody shows a certain combination of communication attributes. The only thing we can say is that with certain likelihood the communication behavior of this person resembles the behavior of convicted criminals.

MAIN LESSONS LEARNED

• The six honest signals of collaboration can be used to find people with similar behavior.

• In this example, we try to identify criminals based on their e-mail behavior, analyzing the e-mail archive of Enron. The Enron e-mail archive documents the spectacular crash of Texan energy trading firm Enron at the end of 2001.

• We identify the differences in the six honest signals of collaboration between ordinary Enron employees and the convicted criminals.

• Using machine learning, Condor can be trained with the criminal actors and will then find others with similar (suspicious) behavior.

• This is called “tribefinder,” as it uses Condor’s machine-learning capability to identify people with communication patterns similar to a “tribe,” a homogeneous group of people, in this case the convicted criminals.


12 COOLHUNTING ON THE INTERNET WITH CONDOR

CHAPTER CONTENTS

• The wisdom of the experts: finding trends and trendsetters on the Web and blogs

• The wisdom of the swarm: thought leaders and topics from Wikipedia

• The “wisdom” of the crowd: identifying influencers and trends on Twitter.

© 2017 Peter A. Gloor


As the Web mirrors the real world, it provides an excellent data source for measuring the importance of a brand, product, politician, philosopher, or concept. Condor provides rich functionality for analyzing the importance of a brand, product, or person on the Internet. Coolhunting for a topic consists of
• Identifying the context of a topic or brand, in particular, its competitors
• Measuring the relative strength of the topic or brand and its competitors
• Identifying the topic or brand’s associated influencers, ranking them by influence.

Without any further restrictions this analysis is done globally. Thanks to the availability of geotagging, the analysis can also be restricted by drilling down into different target markets. We distinguish three different information spheres where we track context, strength, and influencers of a brand, namely the
• Crowd
• Experts
• Swarm.

The crowd is defined as the broad and indiscriminating masses, which easily flip between wisdom and madness. The crowd is found mostly on Twitter. Experts might be journalists, movie critics, or professional consultants. Experts are primarily found on the Web, on blogs and on news websites. The swarm is defined as people with “real skin in the game.” They might be medical researchers when Coolhunting for drugs, or plumbers when Coolhunting for ideas for aircraft toilets, or “treehuggers” when Coolhunting for alternative energy sources. The swarm is found in Wikipedia and in domain-specific online forums and Facebook pages.

In this section, the Coolhunting process is illustrated using Condor for a full 360-degree scan on the Web, Wikipedia, and Twitter for “Amity University,” a private university in India. The key preliminary step consists of understanding the context of the search term, using Google and common sense. In a netnographic analysis, the Google search results are interpreted qualitatively. In this initial phase, key people and influencers, products associated with the search term, and competitors are identified. Just by using Google and Wikipedia, we learn that “Amity University” is a private university in India, established by Ashok Chauhan in 2003, with 125,000 students and 4500 faculty and staff, and campuses not only all across India, but also in the United Arab Emirates, China, the United States, and the United Kingdom.

Subsequently, the context of the brand, brand strength, key people and products, and competitors are searched for on the Web, Wikipedia, and Twitter using Condor. Based on the Wikipedia entry we can identify “Amity University” and “Ashok Chauhan” as initial search terms. We also identify some further universities as “competitors” to calibrate the strength of Amity’s brand. Based on my own experience in teaching and working at MIT and University of Cologne, I will use “University of Cologne” and “MIT” as global brands to measure “Amity University” against, with “MIT” as a top brand that will come out much stronger, and “University of Cologne” as a local brand in Germany, comparable to Amity. I will compare these terms against “IIT,” the Indian Institutes of Technology, as the top Indian brand and “University of Mumbai,” a local competitor of possibly comparable brand strength.

12.1. EXPERT ANALYSIS — WEBSITES AND BLOGS

We start by creating a new database in Condor called “amity.”


We then create a new dataset “amity_uni_web.”

This will lead to a new dataset “amity_Uni_Web” being shown in Condor. Tip: Database and dataset names cannot contain spaces. Use an underscore “_” as a separator.

We then start the Web fetcher using the “fetch Web” command, with the following settings. Note that you must previously have obtained your own Google CSE keys [1]. Google will allow you to do 100 free searches per day; after that it charges a few cents per query.

[1] See the following YouTube video for how to obtain your Google CSE keys: https://www.youtube.com/watch?v=zME1-j9yPvI


After clicking the “Next >” button, Condor will bring up the top 20 search hits from Google. These can be manually checked to make sure they are really about the “Amity University” we are interested in. This is not a problem for a strong brand like “Amity University,” but might be more problematic if we were measuring the strength of a local politician named “John Smith” in Alabama. In this case, the search term would be “John Smith Alabama,” and we still might have to check some URLs to make sure they are not about John Smith the teacher in Huntsville.
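How Condor talks to Google internally is not described here. As a rough sketch, assuming the Google Custom Search JSON API (which returns at most 10 results per request), collecting the top 20 hits takes two paged requests. The function name and the placeholder key and engine ID below are illustrative, and no request is actually sent.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def cse_request_urls(query, api_key, engine_id, hits=20, page_size=10):
    """Build the request URLs needed to page through the first `hits` results."""
    urls = []
    for start in range(1, hits + 1, page_size):
        params = {
            "key": api_key,        # your own API key
            "cx": engine_id,       # your own search engine ID
            "q": query,
            "num": min(page_size, hits - start + 1),
            "start": start,        # 1-based index of the first result on this page
        }
        urls.append(f"{CSE_ENDPOINT}?{urlencode(params)}")
    return urls


# Placeholder credentials; replace with your own CSE keys.
urls = cse_request_urls("Amity University", "YOUR_API_KEY", "YOUR_ENGINE_ID")
for url in urls:
    print(url)
```

Each paged request presumably counts as a separate query against the daily free quota, so a fetch of 20 hits per search term uses the quota faster than the number of search terms alone suggests.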


Condor will now conduct a degree-of-separation search, collecting the top 20 web pages pointing back to each of the URLs shown in the image above. We then repeat this process for “Ashok Chauhan,” “Mumbai University,” and “University Cologne,” making sure to store each of the resulting datasets in a separate Condor dataset called “Ashok_Chauhan_Web,” “Uni_Cologne_Web,” and “Uni_Mumbai_Web.” Once all four queries have been completed, we merge the four datasets into one combined dataset which we call “Web_combined.”

Storing each of the Web fetches into a separate dataset will allow us to display the websites belonging to each of the different universities (Amity, Cologne, Mumbai) and Ashok Chauhan in different colors. As a next step we will have to make sure that each domain is shown separately. We do this by collapsing the dataset by domain.
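Collapsing by domain can be pictured as grouping page URLs under their host name while keeping a link from each page to its domain node. The following is a minimal sketch of that idea, not Condor's implementation; the URLs are hypothetical examples and the `www.` stripping is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlparse


def collapse_by_domain(page_urls):
    """Map each domain to the web pages that belong to it."""
    domains = defaultdict(list)
    for url in page_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):   # assumption: treat www.x and x as one domain
            host = host[4:]
        domains[host].append(url)
    return dict(domains)


# Hypothetical URLs, for illustration only.
pages = [
    "https://en.wikipedia.org/wiki/Amity_University",
    "https://en.wikipedia.org/wiki/University_of_Cologne",
    "https://www.topuniversities.com/universities/amity-university",
]
collapsed = collapse_by_domain(pages)

# Keep the original page nodes and link each one to its collapsed domain node.
links = [(url, domain) for domain, urls in collapsed.items() for url in urls]
print(collapsed)
```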


Note that we need to check the box “keep original nodes and link each to its collapsed one” to make sure to keep the web page nodes in addition to the domain nodes, and also check the box “keep nodes with missing ‘collapse by’ value” to keep the original search terms. Now we are ready to open the combined dataset.


By clicking the “Next >” button we will be using default settings and not filtering or changing anything during the load dataset process.

We will be loading both the “Query” and the “web pages” by just clicking the “Next >” button again.


The next dialog would give us the option to manually remove some of the URLs from the resulting network. By just clicking the “Next >” button again, we include all URLs.

Now we have loaded the full dataset “Web combinedcollapsed.”


By dragging the mouse over the resulting dataset box, we can see that it contains 686 actors and 822 links. Choosing the “View->Create static view” menu, we are now ready to display the network. We also color the network by the original datasets.

Image 36 (for color pictures see the online version of the images, available at http://www.ickn.org/sociometrics/)

The next step will be to calculate the importance of the four brands “Amity, Cologne, Mumbai, Ashok Chauhan” and of the different websites. We do this by annotating the actors with betweenness centrality, using “Process dataset->Annotate->Centrality annotations,” which will bring up the following dialog. Clicking on the “Next >” button will calculate the betweenness centrality for each node in the graph.
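For readers who want to see what the centrality annotation computes, here is a small sketch of betweenness centrality using Brandes' algorithm on an undirected, unweighted graph. It is not Condor's code, the scores are unnormalized, and the toy graph (`brand_x`, `brand_y`, `page_1` and so on) is invented.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Invented toy graph: two search terms linked to the pages that mention them.
edges = [("brand_x", "page_1"), ("brand_x", "page_2"), ("brand_y", "page_2"),
         ("brand_y", "page_3"), ("brand_y", "page_4")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = betweenness(adjacency)
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, value)
```

In this toy graph the search term connected to more pages sits on more shortest paths and therefore gets the larger score, which is the intuition behind reading node size as brand strength.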

We can now change the size of the nodes by selecting “Size by betweenness centrality” in the window pane on the right. Note that this option is available only now, after having calculated betweenness centrality. Keeping the nodes in the graph colored by the original datasets leads to the following image.

Image 37

In the image above, we can already see that “Ashok Chauhan” seems to be a stronger brand than “Amity University.” Right-clicking on a node shows its name; dragging the slider “Node labels” increases the font size of the label (see the image below). Squares represent the search terms, with node size indicating the strength of each brand; circles are URLs.

We can more exactly track the strength of the brands on the Web by exporting the values to Excel (Export->Export CSV). There is no need to also export the links for now.


Importing the actor.csv file into Excel leads to the image below. We separate search terms and URLs and draw a separate chart for each of the two ordered lists. The image below shows the search terms sorted by betweenness centrality, which gives a ranked list of the importance of the four brands:
1. Ashok Chauhan
2. Mumbai University
3. University of Cologne
4. Amity University.

One unexpected insight of the analysis is that the brand equity of its founder Ashok Chauhan is stronger than the brand of Amity University itself. This means Ashok Chauhan is more prominent on the Web than Amity University. The second list in the image below shows the most prominent websites, again sorted by betweenness in the degree-of-separation graph shown before (reverse the graph sort for the highest score at the top). The top websites boosting the brands of Ashok Chauhan, Mumbai University, University of Cologne, and Amity University are
1. Wikipedia
2. Facebook
3. Topuniversities.com
4. Indianexpress.com
5. Quora
6. Portal.uni-koeln.de
7. Amazon.
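The separation and sorting done in Excel can also be scripted. The sketch below assumes an exported file with columns `name`, `kind`, and `betweenness`; these column names and all values are invented for illustration and may not match what Condor actually exports.

```python
import csv
import io

# Hypothetical stand-in for an exported actor file; every value is invented.
ACTOR_CSV = """name,kind,betweenness
brand_x,search_term,310.0
brand_y,search_term,120.5
example.org,url,95.0
example.com,url,210.25
"""


def ranked(rows, kind):
    """Return the names of one node kind, highest betweenness first."""
    selected = [row for row in rows if row["kind"] == kind]
    selected.sort(key=lambda row: float(row["betweenness"]), reverse=True)
    return [row["name"] for row in selected]


rows = list(csv.DictReader(io.StringIO(ACTOR_CSV)))
print("Search terms:", ranked(rows, "search_term"))
print("Websites:", ranked(rows, "url"))
```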


The presence of the uni-koeln portal is explained by content about a Nobel Prize winning scientist featured on the uni-koeln website. This illustrates the power of star scientists to promote the popularity of brands such as universities.

12.2. SWARM ANALYSIS — WIKIPEDIA

The second analysis compares the presence of the different universities on Wikipedia. As Wikipedia provides access to the world’s knowledge, particularly for measuring the strength of universities, Wikipedia editors represent the intrinsically motivated swarm of “knowledge gathering worker bees.” To calculate the link network for “Amity University,” we create a new dataset “Amity Wiki.”

Using the Wiki Evolution Fetcher, we first search for the key Wikipedia pages containing the term “Amity University” in the English Wikipedia.


Out of the returned pages, we extract the Wikipedia network originating from the page “Amity University.”

We collect all the links (not just the bidirectional ones) originating from and pointing to “Amity University” by unchecking the “restrict static fetcher to bidirectional links” check box shown in the image below.

Note: Bidirectionality of a link means that if page A has a link to page B, there will also be a link on page B linking back to page A. Including unidirectional links will give us a broader overview of the network, and as we do not expect “Amity University” to be wildly popular, the number of links found will still be manageable even when the unidirectional links are included. If we searched for a page like “United States,” collecting all the links either originating on the “United States” page or pointing back to that page would most likely return a significant part of the entire Wikipedia; for such a page we would collect only the bidirectional links.
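The distinction in the note above can be made concrete with a few lines of code. This is a sketch of the filtering idea only, with an invented link list (`Page X` is a placeholder), not the fetcher's actual logic.

```python
def bidirectional_links(links):
    """Keep only directed links (a, b) for which the reverse link (b, a) also exists."""
    link_set = set(links)
    return {(a, b) for (a, b) in link_set if a != b and (b, a) in link_set}


# Invented example: "Page X" links back, "India" does not.
links = [
    ("Amity University", "India"),
    ("Amity University", "Page X"),
    ("Page X", "Amity University"),
]
print(sorted(bidirectional_links(links)))
```

Restricting the fetch to bidirectional links therefore drops pages that are merely mentioned by the seed page but do not link back to it.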

We are not collecting the dynamic network, as for now we are interested only in the full network at collection time, not the evolution of the links over time. We also collect the snippets, the text before the table of contents of a Wikipedia page, to obtain the most important words describing Amity University on Wikipedia.


Once Wiki Evolution has finished its data collection, we can run “create static view” and calculate betweenness by executing “centrality annotations.” Before drawing the network, we can check for nodes that might distort the network picture without adding meaning to the overall graph, by bringing up the dialog for removing nodes (“Process dataset->Remove specific actor”). The page “Geographic_Coordinate_System” and some template pages have nothing to do with the universities; we therefore remove them.

This leads to the following network, with the nodes sized by betweenness.

Image 38

The above network shows that Ashok Chauhan commands a much less central position in the Wikipedia network, while India is the defining attribute of Amity University. It is also surprising that only other private universities in India, that is, local competitors (the cluster on the lower left), as well as private Indian high schools (the cluster on the top right) dominate the network. The only research topic that shows up somewhat prominently is biotechnology. Rerunning the same analysis, but only including bidirectional links (keeping the box “Restrict static fetcher to bidirectional links” checked as shown in the image below), will identify the most important relationships.


The image below shows the resulting network in the static view of communication. We see that there are far fewer nodes, and general geography nodes like the page about “India” no longer appear. This makes sense, because Amity University is not important enough to make it onto any of the general geography pages.

In the image above, Amity University is clustered together with other local competitors, while the second large cluster on the bottom right shows the importance of Amity School and its local Indian competitors such as Ryan International School. Next, we are looking at the key terms around “Amity University” on Wikipedia. First, we need to calculate the sentiment for the Wikipedia snippets (the snippet is the text above the table of contents on a Wikipedia page), which we have collected for the “amity” “wiki” dataset. “Process dataset->calculate sentiment” brings up the following dialog, where we calculate the sentiment for the field “content.”

The menu “View->Create word cloud view” creates the following word cloud. It tells us that for Amity University, Amity School and its associated middle and high schools are a stronger brand than the university proper. The dark red color of “school” tells us that the sentiment around the word “school” is slightly negative.

Image 39

Clicking on “school” on the word cloud view brings up the drill-down view of the word “school” shown below. The view shows that “school” has been found in 50 articles, out of which 12 are positive, 13 are neutral, and 25 are negative. The drill-down view also displays the context where the search word “school” has been used for each of the 50 occurrences. The reason why there are 25 negative texts in the context of “school” is the use of words like “strictly,” “poor,” “compulsory,” and “lower.” The next step in the analysis is to compare the Wikipedia link structure of “Amity University” with its competitors, “IIT,” “University of Mumbai,” “MIT,” and “University of Cologne.” Running the Wiki Evolution Fetcher with these four search terms, while collecting all links, not just bidirectional links, produces the following four network images.


Comparing the information structure of “Amity” with competitors, both in the same “league” and well above it, such as IIT and MIT, shows both its strengths and weaknesses. MIT is number 1 in the 2015/16 QS World University Rankings, IIT Mumbai is 222, University of Cologne is 305, and University of Mumbai is 551–600. In the QS Asia ranking, IIT Delhi is 42, University of Mumbai is 125, and Amity University is 251–300.

Images 40 to 43

At first glance, we can see that MIT (top right), IIT (bottom right), and University of Mumbai (bottom left) are all much denser and have more nodes in the network than University of Cologne (top left), which has a similar network to Amity University. This tells us at a glance that MIT, IIT, and also University of Mumbai all play in a league above Amity University. As IIT is spread out across multiple campuses, it has different local clusters, although IIT Mumbai seems to be the dominant one, while the largest node in the IIT network is about India. The links of IIT to other universities are also surprisingly local, almost exclusively linking to other Indian universities. Compared to IIT’s 976 actors and 44,448 Wikipedia links, MIT has 977 actors and 474,976 links. The first link of MIT is to Harvard, illustrating how these two top brands in academia boost each other. The first MIT university links to non-US institutions are to McGill and University of Toronto; the next one is to Tsinghua. The first link to a person in the IIT network is to Narendra Modi, the current prime minister of India, followed by some historical kings. Only quite far down in the link network is the link to Ashoke Sen, a highly respected Indian physics professor. The first link in the MIT network to a person is to John F. Kennedy, followed by links to MIT presidents of the late 19th century, followed by some recent MIT presidents, and then some more recent Nobel Prize winners.

The network of University of Cologne is much sparser, showing fewer links. The University of Cologne network, however, has more international links. After a first link to University of Munich, it is next linked to National University of Singapore, University of Vienna, and London School of Economics. After some politicians, it is linked to Heinrich Böll, a famous poet, and then to Peter Grünberg, a Nobel Prize winner in physics teaching at University of Cologne. The conclusions for Amity University are to build an international alliance network with universities abroad and establish connections to a few star scientists.
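The actor and link counts quoted above allow a rough density comparison. The sketch below assumes the links are directed, counted once per ordered pair of pages, and free of self-links; whether Condor counts links this way is not stated, so the numbers are indicative only.

```python
def directed_density(actors, links):
    """Share of all possible directed links (no self-links) that are present."""
    return links / (actors * (actors - 1))


iit = directed_density(976, 44_448)    # counts quoted in the text
mit = directed_density(977, 474_976)

print(f"IIT density: {iit:.3f}")
print(f"MIT density: {mit:.3f}")
print(f"MIT is about {mit / iit:.1f} times denser than IIT")
```

Under these assumptions the two networks have almost the same number of actors, but the MIT network is roughly ten times denser than the IIT network.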


12.3. ANALYSIS OF THE CROWD — TWITTER

In our final analysis, we will be investigating what the crowd has to say about “Amity University” on Twitter. To collect the tweets about Amity, we create a new dataset “Amity_twi.”

Searching Twitter for “Amity University” tells us that @amityuni is the Twitter account of Amity University. Following only two people, Amity has 3957 followers.

We now also look at the Twitter accounts of University of Cologne and University of Mumbai. While University of Cologne seems to have an official Twitter account @UniCologne, University of Mumbai seems to be somewhat disorganized with two accounts, one having tweeted 23 times with only 228 followers and another with 1088 followers, but only one single tweet.

We now collect the tweets for Amity University into the amity_twi dataset, searching for “amityuni.” We uncheck the box “collect only retweets” to collect all the tweets about “amityuni,” even if they have never been retweeted (a retweet means that a tweet has been read at least once). We also increase the maximum number of results to collect from 250 to 1000. Note that the Twitter API will only let us collect at most 18,000 tweets from the last seven days, by running at most 180 queries returning 100 tweets each within a 15-minute time window. This means that we could have entered 18,000 into the “Number of result” box, and then we would have to wait 15 minutes for the next query. As our search in the Twitter search box told us that there is not much active tweeting about amityuni, asking to collect at most 1000 tweets within the last seven days gives us the 30 tweets actually tweeted in the last seven days.
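The rate-limit arithmetic above can be written down as a small planning helper. It uses only the limits stated in the text (100 tweets per query, 180 queries per 15-minute window); the function name is illustrative, and current Twitter API limits may differ.

```python
import math

TWEETS_PER_QUERY = 100      # as stated in the text
QUERIES_PER_WINDOW = 180    # as stated in the text
WINDOW_MINUTES = 15


def collection_plan(requested_tweets):
    """Return (queries, windows, minutes spent waiting) for a requested tweet count."""
    queries = math.ceil(requested_tweets / TWEETS_PER_QUERY)
    windows = math.ceil(queries / QUERIES_PER_WINDOW)
    wait_minutes = max(windows - 1, 0) * WINDOW_MINUTES
    return queries, windows, wait_minutes


for requested in (1_000, 18_000, 36_000):
    print(requested, collection_plan(requested))
```

A request for 1000 tweets needs only 10 queries and fits easily into one window, while anything above 18,000 tweets forces at least one 15-minute wait.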


Condor then brings up the Twitter API key window, where we have already entered our search keys [2].

To go further into the past, we also collect all the tweets sent by @amityuni using the Twitter people fetcher.

[2] See the following YouTube movie for how to obtain your Twitter API keys: https://www.youtube.com/watch?v=6zMW7YEKJzw


The activity view shows that the tweets about amityuni go back to the beginning of 2015, with most of the tweets sent around the time of collection, toward the end of August 2015, when at most 13 tweets were sent per day.

The resulting static communication view shows the importance of the search term “amityuni.”

Image 44


This process is now repeated for “unicologne,” collecting the tweets with the “Twitter fetcher” and the “Twitter accounts fetcher,” and storing the results in a newly created dataset cologne_twi. Next, the tweets for University of Mumbai are collected. As both “mumbai_uni” and “umumbai” are used, two queries, one for “uni mumbai” and another for “mumbai uni,” are run. Afterwards the two search terms are merged with the Twitter account “Uni_Mumbai,” using Condor’s actor merge function (“Process dataset->Node merging->manual node merging,” see image below).
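Conceptually, manual node merging relabels several nodes as one and then drops links that have become duplicates or self-links. The sketch below illustrates that idea on an invented edge list (`user_1` and `user_2` are placeholders); it is not Condor's merge function.

```python
def merge_nodes(edges, aliases):
    """Relabel nodes via `aliases`, dropping duplicate links and self-links."""
    merged = set()
    for source, target in edges:
        source = aliases.get(source, source)
        target = aliases.get(target, target)
        if source != target:
            merged.add((source, target))
    return merged


# The two search terms are folded into the account node named in the text.
aliases = {"uni mumbai": "Uni_Mumbai", "mumbai uni": "Uni_Mumbai"}

# Invented links for illustration.
edges = [
    ("user_1", "uni mumbai"),
    ("user_1", "mumbai uni"),
    ("user_2", "mumbai uni"),
    ("uni mumbai", "Uni_Mumbai"),
]
print(sorted(merge_nodes(edges, aliases)))
```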

To make this change persistent, it is saved into the MySQL database under the new name “mumbai_twi2” by right clicking on the dataset box “mumbai_twi.”


The three datasets, amity_twi, cologne_twi, and mumbai_twi2, are now merged into a new dataset.

In the combined static view, we see that for the seven days when the data was collected, Amity Uni is the strongest brand in the Twittersphere, while Uni Cologne was the most tweeted about.

Image 45

Now we go back to the original amity_twi dataset, looking at its content by calculating sentiment.

Next, we look at what people are tweeting about Amity University in the week of August 17–24, 2015. It seems the tweets about Amity are universally happy, somewhat emotional, and somewhat complex.

Image 46

Generating the word cloud shows the keywords being used in the tweets; clicking on a word shows it in context.

Image 47


Compared to Amity University, tweets about University of Mumbai are very angry; people are complaining about the “outrageous fees” and “harassment of females.” Tweets about University of Cologne are much happier. Condor is capable of automatically calculating sentiment for English, German, French, Italian, Spanish, and Portuguese; note that there are some German words in the word cloud about University of Cologne.

Image 48

We can also calculate where in the world people are tweeting about Amity. For this we first need to run “Process dataset->Location annotation.”


The World Map shows that almost all of the tweeting is happening in India, with a few tweets from Indians in the United States or Europe tweeting about relatives graduating at Amity Noida.

Image 49


Finally, we would like to identify the most influential twitterers about Amity University. Toward that goal, we remove the search term “AmityUni.”

Note that the network, which was initially connected in the static view, now breaks up into many small clusters. These represent individual retweet networks, with a central individual in the core being retweeted by the people in the periphery. We now have the option to size the nodes by different Twitter-specific attributes such as the number of followers or the number of times a tweeter has been listed. This helps to identify the most influential twitterers about Amity.
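Why removing the search term splits the network can be seen in a small sketch: the search term acts as a hub, and without it only the retweet links between users hold clusters together. The edge list and user names below are invented, and users linked only to the hub simply disappear in this simplified version.

```python
from collections import defaultdict


def components(edges, removed=None):
    """Connected components (largest first) after dropping all links of `removed`."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if removed in (a, b):
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                stack.append(neighbor)
        result.append(members)
    return sorted(result, key=len, reverse=True)


# Invented links: every user is tied to the search term, a few retweet each other.
edges = [
    ("user_1", "amityuni"), ("user_2", "amityuni"), ("user_3", "amityuni"),
    ("user_4", "amityuni"), ("user_6", "amityuni"),
    ("user_2", "user_1"), ("user_4", "user_3"), ("user_5", "user_3"),
]
print(len(components(edges)))                                # with the hub
print([len(c) for c in components(edges, "amityuni")])       # without the hub
```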

Image 50

We find that the most influential twitterer by follower count is @Brothers2015, which is the official and verified account of the Bollywood movie “Brothers.” It seems Amity University is a sponsor of this movie. Entering “brothers” in the search box at the top will highlight all the tweets and actors containing the keyword, showing a lot of traction for this association with Amity.


There is also a second tier of influential twitterers such as BeingExample, Nick_Ksg, and 6670G with a decent number of followers, in the range of 4000–5000 people. However, they follow a similar or even larger number of people, which raises the suspicion that they have “gamed” their follower counts by following each other. This means that they might be tweeting into an “echo chamber” without wider reach.
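One simple way to operationalize this suspicion is to compare followers with accounts followed. The sketch below is a heuristic under stated assumptions: the threshold is arbitrary, `account_a` and `account_b` are invented, and only the @amityuni counts (3957 followers, following 2) come from the text.

```python
def follower_ratio(followers, following):
    """Followers per followed account; values near or below 1 hint at reciprocal following."""
    if following == 0:
        return float("inf")
    return followers / following


SUSPICIOUS_BELOW = 1.2  # assumed threshold, not taken from the text

accounts = {
    "account_a": (4500, 4900),   # invented
    "account_b": (4200, 150),    # invented
    "amityuni": (3957, 2),       # counts quoted in the text
}

flagged = sorted(
    name
    for name, (followers, following) in accounts.items()
    if follower_ratio(followers, following) < SUSPICIOUS_BELOW
)
print(flagged)
```

A low ratio is only a hint, not proof, since some accounts follow many people for legitimate reasons.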

12.4. FOLLOW-ON EXERCISES

1. Repeat the analysis for another university, company, or organization, for example, your own university.
2. Repeat the analysis for a product or brand, for example, comparing “Samsung Galaxy” with “iPhone 6s.”
3. Compare the importance of a politician (Hillary Clinton) with the importance of a university.
4. Monitor the twittersphere over time to assess how content and twitterers change.
5. Rerun the Web search but restrict the date range to the most recent four weeks to assess what is new and what links have been added.
6. Rerun the Wikipedia analysis including the dynamic fetcher to assess what has been recently added.


12.5. (PARTIAL) LIST OF INTERNET COOLHUNTING STUDIES

Downloadable from http://www.ickn.org/publications.html

Gloor, P., De Boer, P., Lo, W., Wagner, S., Nemoto, K., & Fuehres, H. (2015). Cultural anthropology through the lens of Wikipedia – A comparison of historical leadership networks in the English, Chinese, and Japanese Wikipedia. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 12–14.

Maddali, H. T., Gloor, P., & Margolis, P. (2015). Comparing online community structure of patients of chronic diseases. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 12–14.

Gloor, P., & Nemoto, K. Who really matters in the world – Leadership networks in different language Wikipedias. Places and Spaces Mapping Science, Map #157.

Frick, K., Guertler, D., & Gloor, P. (2013). Coolhunting for the world’s thought leaders. Proceedings 4th international conference on collaborative innovation networks COINs 2013, Santiago de Chile, August 11–13.

Futterer, T., Gloor, P., Malhotra, T., Mfula, Packmohr, K. H., & Schultheiss, S. (2013). Wikipulse – A newsportal based on Wikipedia. Proceedings 4th international conference on collaborative innovation networks COINs 2013, Santiago de Chile, August 11–13.

Yun, Q., & Gloor, P. (2012). The web mirrors value in the real world – Comparing a firm’s valuation with its web network position. Sloan Technical Report, Cambridge, MA.

Fuehres, H., Gloor, P., Henninger, Kleeb, M., & Nemoto, K. (2012). Galaxysearch: Discovering the knowledge of many by using Wikipedia as a meta-search index. Proceedings collective intelligence, Cambridge, MA, April 18–20.

Garcia, C., Parraguez, P., Barahona, M., & Gloor, P. (2012). Tracking the 2011 student-led collective movement in Chile through social media use. Proceedings collective intelligence 2012, Cambridge, MA, April 18–20.

Kleeb, R., Gloor, P., & Nemoto, K. (2011). Wikimaps: Dynamic maps of knowledge. Proceedings 3rd international conference on collaborative innovation networks COINs 2011, Basel, Switzerland, September 8–10.

Zhang, X., Fuehres, H., & Gloor, P. (2011). Predicting asset value through twitter buzz. In J. Altmann, U. Baumoel, & B. Kraemer (Eds.), Proceedings 2nd symposium on collective intelligence Collin 2011, Seoul, June 9–10, Springer Advances in Intelligent and Soft Computing, vol. 112.

Gloor, P., Grippa, F., Borgert, A., Colletti, R., Dellal, G., Margolis, P., & Seid, M. (2011). Towards growing a COIN in a medical research community. Procedia Social and Behavioral Sciences (Vol. 26). Proceedings COINs 2010, collaborative innovations networks conference, Savannah GA, October 7–9, 2010.

Zhang, X., Fuehres, H., & Gloor, P. (2010). Predicting stock market indicators through twitter: “I hope it is not as bad as I fear.” Procedia Social and Behavioral Sciences, 26, 2011. Collaborative innovations networks conference, Savannah GA, October 7–9, 2010.

Doshi, L., Krauss, J., Nann, S., & Gloor, P. (2009). Predicting movie prices through dynamic social network analysis. Proceedings COINs 2009, collaborative innovations networks conference, Savannah GA, October 8–11.

Gloor, P., Krauss, J., Nann, S., Fischbach, K., & Schoder, D. (2009). Web Science 2.0: Identifying trends through semantic social network analysis. IEEE conference on social computing (SocialCom-09), Vancouver, August 29–31.

Krauss, J., Nann, S., Simon, D., Fischbach, K., & Gloor, P. (2008). Predicting movie success and academy awards through sentiment and social network analysis. Proceedings of European conference on information systems (ECIS), Galway, Ireland, June 9–11.

Gloor, P. (2007). Coolhunting for trends on the Web (invited paper). Proceedings of IEEE 2007 international symposium on collaborative technologies and systems, Orlando, May 21–25.


MAIN LESSONS LEARNED
• Coolhunting with Condor measures and tracks the importance of a brand on the Internet.
• Coolhunting for a brand consists of identifying the context of a brand, in particular, its competitors.
• It tracks the relative strength of the brand and its competitors through degree-of-separation search, constructing a bipartite graph and measuring betweenness of the search terms.
• It also identifies the brand’s associated influencers, ranking them by their impact.
• This analysis can be done globally or be restricted by geography or by demographic subgroups.

13 COOLHUNTING — FRANCOGEDDON

CHAPTER CONTENTS
• “Francogeddon”: breaking the link between Euro and Swiss Franc on January 15, 2015, by the Swiss National Bank
• The Web and Twitter reflect the sentiment of the market in response to “Francogeddon”
• The six honest signals of collaboration for “Swiss Franc,” “Euro,” and “USD” on Twitter track the influence of “Francogeddon” on the three currencies.


On January 15, 2015, financial markets were in turmoil. In a surprise move, later termed Francogeddon, the Swiss National Bank removed the artificial exchange rate of Swiss Franc 1.20 to the Euro, which it had set and defended by buying massive amounts of Euro and Dollars since September 6, 2011. Within hours the exchange rate between Euro and Swiss Franc fluctuated from 1.20 Francs per Euro to 0.95 Francs per Euro, leading to massive losses at stock markets around the world and forcing some hedge funds into insolvency. Such an unexpected event in the financial markets offers a unique natural experiment to measure the global consciousness of financial markets.

Using Condor, we collected the most recent 12,000 tweets containing the string “Swiss Franc,” as well as another 12,000 tweets each containing “Euro” and “USD” on January 18, when Francogeddon was still a major issue and currencies were still fluctuating wildly. We repeated the data collection at two later points in time, on February 3 and February 6, 2015, when Francogeddon was over and things had settled down. This nine-part dataset allows us to compare a moment of high public consciousness, when Francogeddon was at the top of the mind of everybody involved in currency trading, with a baseline of two later points in time, when the event was over and public consciousness of this topic was low again. The nine charts in Figure 27 illustrate the activity of the tweeters on these three days. While the tweet activity about Euro and USD is about the same on all three sampling days (20–30 tweets per minute), tweet activity for Swiss Franc is about 200 tweets per hour on January 18, dropping to 50 tweets per hour on February 3 and 6.
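To put the size of the move in perspective, the two rates quoted above can be converted into percentage changes. This is plain arithmetic on the figures in the text; 0.95 was an intraday extreme, not a settled rate.

```python
francs_per_euro_before = 1.20   # defended rate, from the text
francs_per_euro_after = 0.95    # intraday low, from the text

# Change in the value of one Euro, measured in Francs.
euro_change = (francs_per_euro_after - francs_per_euro_before) / francs_per_euro_before

# Change in the value of one Franc, measured in Euros.
franc_change = francs_per_euro_before / francs_per_euro_after - 1

print(f"Euro against Franc: {euro_change:.1%}")
print(f"Franc against Euro: {franc_change:+.1%}")
```

The Euro lost about a fifth of its value in Francs, which is the same as the Franc gaining about a quarter against the Euro. This helps explain the complaints about Franc-denominated mortgages that show up in the tweets.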


Figure 27: Twitter Activity after January 18, 2015 for Search Strings, “Swiss Franc,” “USD,” and “Euro.”


Figure 28 shows the network structure of the three currency Twitter networks on January 18 and February 6. Each node is a person tweeting; a link is added between two nodes if one person is mentioned in the other person’s tweet, or if one person retweets the other. As Figure 28 illustrates, the tweets about Swiss Franc on January 18 form a large connected component. The Euro network (which was more influenced by the Swiss Franc) shows a somewhat smaller connected component, while the USD tweet network is barely connected, which tells us that the tweeters have little to do with each other. On February 6, all three tweet networks have similar structures of mostly unconnected tweets, with the Euro still showing a somewhat larger connected component.

The six word clouds depicted in Figure 29 show what people are tweeting about. While the sentiment about the Swiss Franc on January 18 is overwhelmingly negative (the darker the red of a keyword, the more negative its context), it is somewhat negative for the Euro tweets, and almost exclusively positive for the USD. The Swiss Franc tweets on February 6 are becoming more positive but are still mostly negative, as a lot of people in Eastern Europe, particularly in Poland, but also in Romania and Austria, complain about having taken out mortgages in Swiss Francs, which have now ballooned against their local currency. A look at the USD tweets on both January 18 and February 6 shows that they mostly consist of retweets of items auctioned on eBay. This illustrates that the US tweeters don’t care much about Francogeddon. Tweets about the Euro are somewhat negative, but the concerns, which are growing on February 6, are more about Draghi and the possible Grexit, that is, the exit of Greece from the Eurozone.
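The network construction described above (one node per tweeter, a link for every mention or retweet) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Condor's implementation: the field names `author`, `mentions`, and `retweet_of` and the toy tweets are hypothetical.

```python
from collections import defaultdict

def build_network(tweets):
    """Return an undirected adjacency map linking authors to the users
    they mention or retweet (field names are hypothetical)."""
    graph = defaultdict(set)
    for tweet in tweets:
        author = tweet["author"]
        graph[author]  # make sure isolated tweeters still appear as nodes
        targets = list(tweet.get("mentions", []))
        if tweet.get("retweet_of"):
            targets.append(tweet["retweet_of"])
        for other in targets:
            if other != author:
                graph[author].add(other)
                graph[other].add(author)
    return graph

def connected_components(graph):
    """Return the connected components as a list of sets, largest first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Made-up toy tweets, for illustration only.
tweets = [
    {"author": "anna", "mentions": ["ben"]},
    {"author": "ben", "retweet_of": "carl"},
    {"author": "dora"},
    {"author": "emil", "mentions": ["fay"]},
]
graph = build_network(tweets)
components = connected_components(graph)
# Share of all tweeters that sit in the largest connected component.
largest_share = len(components[0]) / len(graph)
```

The share of tweeters in the largest component is one simple way to put a number on the visual difference between a "large connected component" and a network of "mostly unconnected tweets."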


Figure 28: Twitter Network Structure on January 18 and February 6, 2015 for Search Strings, “Swiss Franc,” “USD,” and “Euro.”


Figure 29: Word Clouds of Tweets on January 18 and February 6, 2015 for Search Strings, “Swiss Franc,” “USD,” and “Euro.” (a)

(a) For color pictures see online version of images, available at http://www.ickn.org/sociometrics/

So let’s now calculate the six honest signals of communication for the nine datasets:
(1) Group betweenness centrality (how centralized the tweet networks are).
(2) Oscillation in group betweenness centrality (how much the centrality of individual tweeters in the network changes over time, measured in 15-minute intervals).
(3) Average weighted variance in contribution index, that is, how much individual tweeters are being retweeted over time.
(4) Average response time (ART) and nudges, which tell how long it takes for a tweet to be retweeted and whether people are mutually retweeting each other.
(5) Sentiment and emotionality, which show how positive and negative the tweets are.
(6) Complexity of language.

The charts below illustrate the changes over the three points in time in emotionality (Figure 30), ART (Figure 31), and the number of nudges per tweeter (Figure 32). For example, the response time (ART) drops considerably for USD from January 18 (day 1) to February 6 (day 3), while it goes up for Swiss Francs. This means things are cooling down for tweets about Swiss Francs, and it takes more time until they are retweeted.

Figure 30: Average Emotionality of Tweets Containing Search Strings, “Swiss Franc,” “USD,” and “Euro.”

Figure 31: Average Response Time (ART) of Tweeters Using Search Strings, “Swiss Franc,” “USD,” and “Euro.”

Figure 32: Average Number of Nudges (Retweets) of Tweeters Using Search Strings, “Swiss Franc,” “USD,” and “Euro.”

Comparing the six honest signals of communication for the three currencies using the Mann-Whitney U-test, we see that even for this small sample, tweeting behavior about Swiss Franc is different from tweeting about Euro and USD with regard to the number of nudges as well as the variance between nudges until one tweeter responds to another tweeter (p = 0.024). To put this in other words: comparing the three Twitter networks about the three currencies over three points in time, there seems to be a higher global consciousness among people tweeting about Swiss Franc than among people tweeting about Euro and USD. Is this maybe a glimpse of the global consciousness of currency traders related to Francogeddon?
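To make the Mann-Whitney comparison concrete, here is a minimal sketch in plain Python. It assumes that the three Swiss Franc measurements (one per sampling day) are compared against the six Euro and USD measurements; the text does not state how the samples were grouped, and the nudge values below are invented for illustration, not read from Figure 32.

```python
from itertools import combinations

def u_statistic(sample_a, sample_b):
    """Mann-Whitney U for sample_a: count pairs with a > b, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def exact_two_sided_p(sample_a, sample_b):
    """Exact p-value: share of all relabelings whose U is at least as far
    from its expected value as the observed U."""
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    mean_u = n_a * len(sample_b) / 2
    observed = abs(u_statistic(sample_a, sample_b) - mean_u)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical nudges per tweeter, NOT the values behind Figure 32.
swiss_franc = [2.9, 2.4, 2.6]                   # three sampling days
euro_and_usd = [1.2, 1.5, 1.1, 1.4, 1.3, 1.0]   # two currencies x three days

u = u_statistic(swiss_franc, euro_and_usd)
p = exact_two_sided_p(swiss_franc, euro_and_usd)
```

With three observations against six, there are only 84 ways to split the nine values, so the smallest two-sided exact p-value is 2/84, or about 0.024. That is what this toy example yields, because every Swiss Franc value exceeds every Euro and USD value; it happens to match the value reported above, although that does not prove the samples were grouped this way.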

13.1. FOLLOW-ON EXERCISES
1. Do a Coolhunting today for USD, EUR, and CHF, collecting 12,000 tweets for each of the three symbols, and compare with the social network and the word cloud from January 2015.
2. Calculate the six honest signals of collaboration for your Coolhunting data for CHF, EUR, and USD and compare with the six honest signals from January 2015.
3. Who are the most influential people and websites for CHF, EUR, and USD in your new data?
4. Collect the Twitter and Wikipedia data for CHF, EUR, and USD for one month and correlate with the exchange rate for these three currencies, comparing the
   1. number of tweets about each currency;
   2. sentiment and emotionality of tweets of each currency;
   3. ART and nudges for each currency.
   Which of the three time series results in the highest correlation with the actual exchange rate?
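As a starting point for exercise 4, the correlation between a daily Twitter series and the exchange rate can be computed with a few lines of Python. This is a sketch under assumptions: Pearson correlation is only one possible measure, and the numbers below are placeholders to show the call, not collected data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Placeholder numbers; replace them with your own daily series.
daily_tweets = [120, 150, 400, 380, 200]
eur_chf_rate = [1.20, 1.20, 0.98, 1.00, 1.04]
r = pearson(daily_tweets, eur_chf_rate)
```

Running the same function on the sentiment series and on the ART and nudges series gives the three coefficients that the exercise asks you to compare.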

MAIN LESSONS LEARNED
• Coolhunting on the Web and on Twitter measures global awareness during “Francogeddon.”
• “Francogeddon” happened when, on January 15, 2015, the Swiss National Bank unexpectedly removed the link between Euro and Swiss Franc, leading to huge global currency fluctuations and the bankruptcy of some hedge funds.
• The global sentiment about those events is analyzed through tweets about “Swiss Franc,” “Euro,” and “USD,” using the six honest signals to compare the impact of Francogeddon on the three currencies.

14 COOLHUNTING THE US PRESIDENTIAL ELECTIONS

CHAPTER CONTENTS
• Online social media provides a microscope into the inner workings of the US presidential elections
• The 2015/2016 Bernie Sanders campaign: a prime example of swarm-based COIN leadership
• Comparing sentiment, demographics, and popularity of four candidates: Donald Trump, Jeb Bush, Hillary Clinton, Bernie Sanders
• “Tribefinders” categorizes supporters of Donald Trump and Bernie Sanders through their tweets.

The 2016 US Presidential election provides an excellent opportunity to study Coolhunting and Coolfarming. US Presidential elections are fought to a large extent on social media, with the candidates flooding the Internet with their tweets, video messages, and Instagram pictures. The contrasting styles of the candidates also offer a prime example of the difference between COIN-based and hierarchical leadership styles. We start the Coolhunting using Google Trends (the query below was made on March 5, 2016, eight months before the elections).

Image 51

Google Trends tells us that Donald Trump trumps the other candidates in the number of Google searches by an order of magnitude. Bernie Sanders is in second place and Hillary Clinton in third place, both generating significantly fewer Google searches than Trump. Marco Rubio is in fourth place, with the lowest level of interest.

Table 23: Coolhunting Overview Results for US 2016 Presidential Elections.

Experts (Web)
Key Topics: Hillary, Bernie, Trump, Rubio
Key People: Jon Stewart, Robert Reich, Paul Krugman
Key Websites: www.huffingtonpost.com, www.politico.com

Swarm (Wikipedia)
Key Topics: Democratic Party, Republican Party, US Presidential Election 2016
Key People: Vermonty_Python, IrrationalTsunami
Key Websites: Reddit.com, FeelTheBern.com

Crowd (Twitter)
Key Topics: #USElection, #MakeAmericaGreatAgain, #FeelTheBern
Key People: RealDonaldTrump, HillaryClinton, BernieSanders, RFSchatten, Libertea2012
Key Websites: http://t.co/tPiqUzQ0pZ, https://about.me/collaborateforrights, http://t.co/QN6DgkANlr, http://www.zeustechnologies.com

Table 23 shows the summary of the subsequent Coolhunting with Condor, described in detail in Section 14.2. Key topics on the Web are the Google search terms for the candidates: “Hillary,” “Bernie,” “Trump,” and “Rubio.” The key people talking about the candidates are the technical pundits, commentators, and talk show hosts Jon Stewart, Robert Reich, and NYT commentator Paul Krugman. The most influential websites are the Huffington Post, owned by AOL, and Politico. Key topics on Wikipedia are the three Wikipedia pages for the Democratic Party, the Republican Party, and the 2016 US Presidential Election. The main swarm, that is, the intrinsically motivated people, is active on Reddit; the two most active people on the Bernie Sanders forum on Reddit are Vermonty_Python and IrrationalTsunami. The website for Bernie Sanders is another product of the swarm: FeelTheBern.com was created by volunteers without initial financial backing from the Bernie Sanders campaign. For the crowd, on Twitter, the most important hashtags about the election are the general tag #USElection, Donald Trump’s #MakeAmericaGreatAgain, and Bernie Sanders’ #FeelTheBern. The most central and influential twitterers are Donald Trump, Hillary Clinton, and Bernie Sanders, followed by two politically active volunteers, RFSchatten and Libertea2012. The most central websites on Twitter at the time of the analysis in September 2015 were the personal page of a human rights activist at about.me/collaborateforrights and zeustechnologies.com, the site of a Web marketing agency in the United Kingdom.

In the next section, we look at a Coolfarming example of how a COIN operates: how the Bernie Sanders campaign is leveraging the Web and intrinsically motivated, self-organizing volunteers to run a highly efficient campaign. The subsequent section will compare the Web activities of two Republican and two Democratic campaigns through Coolhunting.

14.1. BERNIE SANDERS’ PRESIDENTIAL CAMPAIGN: THE PERFECT COIN

The way Bernie Sanders’ campaign to become the next President of the United States has unfolded is a great example of COINs, very different from the hierarchical style of his opponent at the right end of the spectrum, Donald Trump. For a start, the entire progress of the campaign is documented online, on Reddit (https://www.reddit.com/r/SandersForPresident), from its humble beginnings up to when Bernie Sanders conceded defeat to Hillary Clinton in July 2016. In December 2013, the Reddit forum SandersForPresident was started, and on April 30, 2015, on the same Reddit forum, Sanders announced his candidacy:

Reddit - I am running for President of the United States, and seeking the Democratic nomination. I need you to stand with me and organize an unprecedented grass-roots campaign. Are you in? - B

In true COIN fashion, three people formed the original Reddit COIN: Vermonty_Python, IrrationalTsunami, and scriggities (their Reddit screen names) created and moderated the SandersForPresident forum. Making excellent use of social media, Sanders became a heavy user of Reddit, Twitter, and Facebook pages. The reason why he has been resonating so much on online social media is that he has been very consistent in his message for the last 30 years. During his campaign he closed in as a close second to Hillary Clinton, the frontrunner for the Democratic nomination. For instance, in September 2015 Sanders was leading in the critical early voting state New Hampshire and was a close second in Iowa.

The hundreds of thousands of people on Reddit, Facebook, and Twitter form a perfect Collaborative Learning Network (CLN) learning about Sanders’ viewpoint. Some of them even self-organized their own COINs to further Sanders’ cause. For instance, Sanders succeeded in tapping into the Web savvy of young IT professionals, with whom his message of Northern European style social democracy resonated very well.


Jumpstarted by a young IT professional in NYC, hundreds of software developers volunteered their time, energy, and creativity to create all sorts of social media apps, websites, and idea tracking tools. Under the title “A legion of tech volunteers are leading a charge for Bernie Sanders,” the New York Times describes (1) how this group created the website “FeelTheBern.org” to showcase Bernie Sanders’ position on key issues. They coordinated their work using the communication tool Slack, moonlighting and contributing their skills for free to create interactive maps, donation collection apps, and grassroots organizing tools. This archetypical COIN is there for all to study through Reddit, Twitter, Facebook, and the Web.

(1) http://www.nytimes.com/2015/09/04/us/politics/bernie-sanders-presidential-campaign-tech-supporters.html

14.2. COOLHUNTING BERNIE SANDERS, HILLARY CLINTON, JEB BUSH, AND DONALD TRUMP

In this section, we compare the social media footprint in fall 2015 of the campaign of Bernie Sanders with that of his counterpart on the right of the political spectrum, Donald Trump, and contrast both with their more established competitors Hillary Clinton and Jeb Bush. In a nutshell, in September 2015 the two outsiders Sanders and Trump were sharing the spotlight, while the two establishment candidates, Hillary Clinton and Jeb Bush, were badly trailing not just in the polls but also on social media.

We start by comparing the global Twitter footprint of the four candidates. On September 6, 2015, I collected the last 4000 tweets about “Bernie Sanders,” “Hillary Clinton,” “Donald Trump,” and “Jeb Bush.” I also collected an additional 4000 tweets on their most popular hashtags #feelthebern, #Hillary2016, #makeAmericaGreatAgain, and #jeb2016.

Image 52

As the curve in Image 52 above shows, the messages are quite emotional, and just above the positivity line. There are also quite a few people around the world tweeting about Sanders. And, quite importantly, the tweets are about Sanders. The next image shows the tweets about Hillary.

Image 53

Unlike Bernie Sanders, Hillary, while also having a global presence, is strongly dominated in the United States by topics other than herself. Besides Sanders showing up in tweets about Hillary, “unitedblue,” a grassroots campaign against Super PACs (organizations sponsored by wealthy individuals to circumvent US election funding restrictions), is also quite prominent.

Image 54

In true celebrity fashion, Donald Trump succeeded in making himself the topic of his own campaign. However, all the positivity of his campaign comes from outside the United States, while the sentiment of his US tweets is very negative, strongly influenced by his attacks against the Latino minority and illegal immigrants in the United States.

Image 55

Jeb Bush is in the weakest position of the four candidates. Even in Jeb’s own Twitter feed Donald Trump features prominently, and overall his tweets are scattered and not very positive. (So his early dropping out of the presidential race was no surprise, and it was predicted by these Coolhunting results.) The next image shows the tag clouds of the tweets of the four candidates.

Image 56

Hillary and Jeb are dominated by their opponents. Trump’s feud with the Hispanic immigrants (top left) shows up prominently. There is hope for Hillary (bottom right), because she shows up in all four word clouds. The outlook for Jeb Bush, however, is not good (bottom left). Even in his own tag cloud, his greatest asset is not his own achievements, but his family name. Bernie Sanders’ most prominent tag (top right) is his grassroots campaign “feelthebern.”

Next, we look at the importance of the four candidates on Twitter. The next image shows the tweets for each candidate in a different color. It also measures the betweenness centrality of each candidate, drawing a line to the search terms “Bernie Sanders,” “Hillary Clinton,” “Donald Trump,” and “Jeb Bush” and to the hashtags #feelthebern, #Hillary2016, #makeAmericaGreatAgain, and #jeb2016.

Image 57

Owing to his celebrity status, Donald Trump has the highest centrality in the Twittersphere. However, Hillary’s hashtag #hillary2016 is the most prominent. We repeat the analysis, this time ﬁltering out all negative tweets, only keeping the ones that the automatic sentiment analysis function of Condor categorizes as above 0.5, that is, having positive sentiment. Condor automatically recognizes sentiment in English, Spanish, German, French, Italian, and Portuguese.
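The positive-only filter can be pictured as a simple threshold on a per-tweet sentiment score. The sketch below assumes a score between 0 and 1 stored in a hypothetical `sentiment` field; how Condor computes that score is not shown here, and the toy values are invented.

```python
def keep_positive(tweets, threshold=0.5):
    """Keep only tweets whose sentiment score is strictly above the threshold."""
    return [t for t in tweets if t["sentiment"] > threshold]

def actors(tweets):
    """Return the set of people who authored at least one of the tweets."""
    return {t["author"] for t in tweets}

# Made-up scores, for illustration only.
tweets = [
    {"author": "ada", "sentiment": 0.81},
    {"author": "bob", "sentiment": 0.35},
    {"author": "cyd", "sentiment": 0.50},
    {"author": "ada", "sentiment": 0.20},
]
positive = keep_positive(tweets)
# Tweeters who vanish from the network once negative tweets are removed.
lost_actors = actors(tweets) - actors(positive)
```

Because every removed tweet may also remove a node and its links, a filter like this shrinks the network and can break it into separate pieces.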

Image 58

The first thing we notice in the new chart is that the network has far fewer nodes (i.e., people tweeting) and that the structure falls apart. Jeb Bush drops out, and among the other three candidates, Sanders’ hashtag #FeelTheBern becomes the most central.

Next, we use Condor’s influencer function to calculate the most influential twitterers for each candidate. Condor looks at word usage among twitterers to identify how ideas spread from one actor to the next. Somebody who introduces a new word that is picked up quickly by others is considered influential.
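The idea behind word-based influence can be illustrated with a small sketch: credit the first user of a word for every other tweeter who reuses it soon afterwards. This is not Condor's algorithm, whose details are not given here; the one-hour window, the field names, and the toy tweets are all assumptions.

```python
from collections import defaultdict

def influence_scores(tweets, window_seconds=3600):
    """Score each author by how many other authors reuse, within the window,
    a word that this author was the first to use in the dataset."""
    first_use = {}               # word -> (author, time of first use)
    adopters = defaultdict(set)  # word -> other authors who reused it in time
    for tweet in sorted(tweets, key=lambda t: t["time"]):
        for word in set(tweet["text"].lower().split()):
            if word not in first_use:
                first_use[word] = (tweet["author"], tweet["time"])
                continue
            origin, start = first_use[word]
            if tweet["author"] != origin and tweet["time"] - start <= window_seconds:
                adopters[word].add(tweet["author"])
    scores = defaultdict(int)
    for word, users in adopters.items():
        scores[first_use[word][0]] += len(users)
    return dict(scores)

# Made-up tweets; "time" is in seconds since the start of the collection.
tweets = [
    {"author": "ada", "time": 0, "text": "feelthebern rally tonight"},
    {"author": "bob", "time": 600, "text": "see you at the rally"},
    {"author": "cyd", "time": 900, "text": "feelthebern rally was great"},
    {"author": "bob", "time": 7200, "text": "tonight was great"},
]
scores = influence_scores(tweets)
```

In this toy data only "ada" scores, because the words she introduced are reused by two other tweeters within the hour, while later reuse outside the window is ignored.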

Image 59

We again see that Bush has very few influencers, while the Sanders group is highly creative, coining its own vernacular and busily retweeting its new words within its own sphere (the turquoise cluster at the lower left). Finally, we repeat the same analysis on the Web, constructing a degree-of-separation search with Condor. This search identifies the most prominent websites for each candidate, and then constructs the Web link structure between these sites.

Image 60

The input parameters for the Google CSE fetcher were set for this example to collect only websites that had been updated in the four weeks before September 6, 2015. The picture spells good news for Hillary, as she is most central on these blogs, followed by Donald Trump. Huffington Post, Politico, and Twitter are the most important sites boosting the centrality of these candidates. This predicts by seven months the outcome of the primaries, where Hillary Clinton and Donald Trump were elected as official candidates for their parties. In sum, based on this analysis both Hillary Clinton and Bernie Sanders have reason to be optimistic, with Sanders leading the creative and optimistic crowd on Twitter, and Hillary being the strongest brand among the establishment. Donald Trump also commands a strong position, but this might come from his prior celebrity status and confrontational communication style, which provokes furious responses from the people he attacks.

14.3. TRIBEFINDER ON TWITTER (USING MACHINE LEARNING)

Condor has a built-in machine-learning function that allows the user to discover “virtual tribes” (see Section 11.3 for an example using the machine-learning function of Condor with e-mail data). Virtual tribes are groups of people who exhibit similar communication behavior in terms of network structure, communication dynamics, and message content and word usage. To ﬁnd a tribe, Condor is trained with “exemplary tribe members.” In this example, in a collection of tweets about the 2016 US presidential election, we will identify the tribe of Bernie Sanders supporters and the tribe of Donald Trump supporters. Similarly to the supporters of a politician, we can also identify supporters of brands and products, for instance ﬁnding people who prefer Pepsi Cola to Coca Cola, or Android to the iPhone.


A more primitive way of achieving the same goal would be to search for tweets containing “Donald Trump” and then check whether the sentiment of each tweet is positive or negative. However, there is no guarantee that a positive tweet containing the string “Donald Trump” is from a Donald Trump supporter; for example, the tweet “I like Bernie Sanders better than Donald Trump” would be categorized as positive by Condor and also contains the string “Donald Trump.” A better way to identify supporters is to use the machine-learning function of Condor.

As a first step we collect 10,000 tweets about Bernie Sanders and 10,000 tweets about Donald Trump. The Twitter data for Sanders generates a graph with 23,484 edges in the network from 16,948 actors, covering the period from 6:42 to 15:19 on April 22, 2016. The chart below shows the sentiment and activity for the tweets about Bernie Sanders.

Image 61

The same query for “Donald Trump” leads to 9540 actors and 21,466 edges, covering the two hours from 11:00 to 13:00 on April 22, 2016. This tells us that people tweeting about Donald Trump are less connected, but are more active than tweeters about Sanders, as the last 10,000 tweets only cover two hours of tweeting about Trump instead of the four hours in the case of Sanders. The chart below shows activity and sentiment for Donald Trump.

Image 62

The sentiment of the people tweeting about Donald Trump (the blue line above) seems to be somewhat more positive than that of the tweets about Bernie Sanders (blue line in the chart before).

To find a virtual tribe, Condor uses a three-step process:
1. Locate a smaller subset of known tribe members; they will be the training dataset for the subsequent machine learning.
2. Train the classifier in Condor with the honest signals (called “features” in machine learning) of the tribe members; these features can be variables of network structure, network dynamics, and network contents. Currently, Condor includes decision tree and random forest machine-learning classifiers.
3. Apply the same classifier to the other actors in the dataset, identifying actors with similar features and thereby locating the other tribe members that until now have been hidden in the entire dataset.

Let’s first extract the tribe of “Bernie Sanders” supporters, classifying the Sanders fans based on their honest signals. The first step is to identify a group of known supporters. This can be done in different ways. The most direct, although somewhat tedious, way is to read their self-descriptions in their Twitter profiles. The second method consists of looking at their Twitter names, assuming that anybody with a handle such as “NH4bernie” or “veteransforbernie” will be a Sanders fan. The third way is to look at who is consistently retweeting the tweets of Bernie Sanders, thereby declaring themselves Bernie Sanders fans.

The next step is to train Condor’s machine-learning functionality with their behavior. We will calculate all individual networking attributes of the six honest signals of collaboration, as well as the Pennebaker pronouns (see Section 5.2), as our features for the machine-learning step. Next, we will run the random forest classifier, which delivers better results than decision trees in this context. The classifier will then find other Twitter users in the same dataset with similar communication behavior, that is, similar combinations of features.
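The second method of finding known supporters, looking at Twitter handles, lends itself to a simple pattern match. The following sketch is one possible heuristic, not a Condor function: it assumes that supporter handles combine "4" or "for" with the candidate's name, and the handle "bernie_news" is invented to show a non-match.

```python
import re

def seed_supporters(handles, candidate_tokens):
    """Return handles that pair a '4' or 'for' marker with a candidate token."""
    tokens = "|".join(re.escape(t) for t in candidate_tokens)
    pattern = re.compile(rf"(4|for)({tokens})", re.IGNORECASE)
    return [h for h in handles if pattern.search(h)]

handles = ["NH4bernie", "veteransforbernie", "nytpolitics", "FL4trump", "bernie_news"]
sanders_seeds = seed_supporters(handles, ["bernie", "sanders"])
trump_seeds = seed_supporters(handles, ["trump", "donald"])
```

A heuristic like this will miss supporters with unrelated handles and may pick up satire accounts, so the resulting seed list is best checked by hand before it is used for training.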


To start the process of identifying the Sanders tribe, we merge the two datasets and remove the two search terms “donald trump” and “bernie sanders” as actors.

We only keep the actors who have sent at least ﬁve tweets, reducing our dataset to 448 actors and 508 connections.
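The merging and filtering just described can be sketched as follows. The field name `author` and the toy data are assumptions; in Condor these steps are carried out through the user interface rather than in code.

```python
from collections import Counter

def merge_and_filter(datasets, search_terms, min_tweets=5):
    """Merge tweet lists, drop the search terms that appear as pseudo-actors,
    and keep only tweets by authors with at least min_tweets tweets."""
    excluded = {term.lower() for term in search_terms}
    merged = [t for tweets in datasets for t in tweets
              if t["author"].lower() not in excluded]
    counts = Counter(t["author"] for t in merged)
    active = {author for author, n in counts.items() if n >= min_tweets}
    return [t for t in merged if t["author"] in active]

# Made-up toy datasets, for illustration only.
sanders_tweets = [{"author": "ada"}] * 3 + [{"author": "bernie sanders"}] * 6
trump_tweets = [{"author": "ada"}] * 2 + [{"author": "bob"}] * 4
kept = merge_and_filter([sanders_tweets, trump_tweets],
                        ["bernie sanders", "donald trump"])
kept_authors = {t["author"] for t in kept}
```

Note that "ada" only reaches the five-tweet threshold because her tweets from both datasets are counted together, which is one reason to merge before filtering.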


As the image below shows, the Bernie Sanders tweeters (yellow) are much more connected than the Donald Trump tweeters (green).

Image 63

Then we annotate the actors with social networking attributes, which will be used as features for the machine learning.


We also calculate the usage of function words such as “the,” “and,” “or,” etc., for each actor, based on James Pennebaker’s insights about the use of pronouns (see Section 5.2 for a discussion of Pennebaker’s insights). Now we are ready to start the machine learning, using the random forest classifier. First, we identify the actors who are Bernie Sanders supporters by looking at their names, guessing that people with names like “mexicans4bernie” will be Bernie Sanders fans.
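The word-usage features mentioned above can be approximated by counting, for each actor, what share of all words falls on each function word. The word list below is a small illustrative selection, not Pennebaker's full set of categories, and the field names are assumptions.

```python
import re
from collections import defaultdict

# Illustrative selection only; a real analysis would use a fuller list.
FUNCTION_WORDS = {"the", "and", "or", "i", "we", "you", "he", "she", "they", "it"}

def function_word_rates(tweets):
    """Return, per author, the share of all words taken by each function word."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for t in tweets:
        for word in re.findall(r"[a-z']+", t["text"].lower()):
            totals[t["author"]] += 1
            if word in FUNCTION_WORDS:
                hits[t["author"]][word] += 1
    return {author: {w: hits[author][w] / totals[author] for w in FUNCTION_WORDS}
            for author in totals}

tweets = [{"author": "ada", "text": "We love the rally and the crowd"}]
rates = function_word_rates(tweets)
```

Each actor ends up with one number per function word, which can be appended to the network attributes as additional features for the classifier.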

Next, we decide on the features to be used for the training. In the ﬁrst run we keep all the features, including Twitter attributes such as the number of followers.


Now we run the random forest learner, creating 10 times as many training records as the original eight known Bernie Sanders fans by changing their features by small random increments with SMOTE, and undersampling the other class to two times the number of Bernie Sanders supporters (see Section 11.3 for a discussion of SMOTE). We end up with 80 virtual Bernie Sanders fans and 160 nonfans.
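The balancing step can be sketched in plain Python. The core SMOTE idea is to create a synthetic record on the line between a known fan and one of its nearest fellow fans; the sketch below follows that idea but is not Condor's implementation. It assumes that the 80 fan records include the original eight (the text leaves this open), and the two-feature rows are random stand-ins for the honest-signal features.

```python
import random

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (needs at least two rows)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position on the line segment from base to partner
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic

def undersample(majority, n_keep, rng=None):
    """Randomly keep n_keep rows of the majority class."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)

# Made-up feature rows: 8 known fans and 440 other actors.
data_rng = random.Random(42)
fans = [[data_rng.gauss(5, 1), data_rng.gauss(2, 0.5)] for _ in range(8)]
nonfans = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(440)]

train_fans = fans + smote(fans, n_new=72)                             # 80 fan records
train_nonfans = undersample(nonfans, n_keep=2 * len(train_fans))      # 160 nonfans
```

Because the synthetic rows lie between existing fans, they stay inside the range of feature values already observed, which keeps the enlarged training set plausible while giving the classifier more examples of the rare class.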


We ﬁnd 29 possible Bernie Sanders supporters.


A check among the top matches for additional Bernie fans brings up:
+LisaBeliveau is a Bernie Sanders delegate
+Tthomaslew76 is a social activist
+Liberalllatchr is a Bernie supporter
+Barbos2 is for Bernie
+Ronraj777 is for Bernie
+we3fordemocracy is a New Zealand open democracy tweeter
+debdlund is a female black democrat, she seems to be a Hillary supporter
+tcooper9999 is a female white Hillary supporter
+aroyaldmd seems to be a female Hillary supporter
+denver_rose is for Hillary.

This means that among the top 10 hits, the first five are US Bernie fans, the next one is a Bernie fan from abroad, and the next four are Democrats, although for Hillary. The image below sizes the nodes in the combined dataset by their similarity to the original eight known Bernie Sanders supporters.

Image 64

As we can see in the t-test, Bernie Sanders fans (the 35 members of the group “0”) are more active, and they mention each other much more; they also oscillate more (are more creative), are less emotional, but use more complex language.
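A comparison of this kind can be reproduced for any single feature with a two-sample t-test. The text does not say which variant Condor applies; the sketch below assumes Welch's t-test, which does not require equal variances, and the activity values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Hypothetical tweets per actor, NOT values from the dataset.
fans = [12, 15, 11, 14, 13]
others = [7, 9, 6, 8, 10]
t, df = welch_t(fans, others)
```

A positive t value means the first group has the higher mean; the p-value then follows from the t distribution with the computed degrees of freedom, for example from a statistics table or library.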


We now repeat this process with Donald Trump supporters. Trump fans seem less proliﬁc in their tweeting. Therefore, to get enough actors, we include in our analysis the top 3000 actors. We then select the 23 declaring themselves through their Twitter handles as Trump fans.


We then run the random forest learner, again creating a total of 460 artiﬁcial Trump fans using SMOTE. We do not do undersampling this time, as the risk would be too big to remove proliﬁc Trump fans in the undersampling step.


The random forest classifier proposes 879 matches. It is less accurate than for Bernie Sanders, as we had to take a larger sample with fewer tweets per actor for our analysis in order to include enough self-declared Trump fans.

Among the top 10 Donald Trump fans:
+Slowdownandlove is a self-declared Trump fan
+Jamesspivey is a self-declared Trump fan
+Mwbrown51358 is a Republican taking swings at Ted Cruz
Parantokristine is a Bernie Sanders fan
BCLaraby is a Canadian Bernie Sanders fan
Liepardestin is a Bernie Sanders fan
+Luimbe tweets are all over the spectrum, but seem to support Trump
+Oldgoatsmell seems to be a Trump fan
(Nytpolitics is the newsfeed of the New York Times)
(AFixhold is a SEO optimizer)
+Trumpeterswin is a Trump fan.

As the image below shows, Trump and Bernie fans are scattered between the two datasets collected with the search terms “bernie sanders” and “donald trump.” The second picture below (Image 66) shows the connected component in the center of the larger picture (Image 65), colored by the original datasets: the blue dots at left are the actors collected with the query “donald trump,” and the purple dots at right are the actors collected with the query “bernie sanders.” In the larger picture on top (Image 65), the turquoise dots are the original 23 self-declared Donald Trump fans, and the green dots are potential members of the Trump tribe identified by our random forest classifier.

Image 65

Image 66

The t-tests show that the Trump fans (the 899 people of group “0”) have fewer friends on Twitter and tweet less per person than the rest of the people, and that they use fewer pronouns. They also oscillate more but, contrary to the Sanders fans, they use less complex language and are more emotional.


To summarize, the Twitter Tribefinder process consists of four steps:
1. Collect as many tweets as possible about “bernie sanders” and “donald trump.”
2. Merge the two datasets.
3. Find the most prominent Bernie or Donald supporters (e.g., from Twitter handles like “vets4bernie” or “FL4trump”).
4. Train with these supporters to find the other hidden Bernie or Donald fans.


14.4. FOLLOW-ON EXERCISES
1. Collect 10,000 tweets for the search term “Apple.” Find the Apple fans by taking the top 20 retweeters of the official Apple accounts of Tim Cook (@tim_cook), Apple’s CEO, “@applenws,” and “@applesupport.” Find their tribe, and extract its features, comparing them with their peers in the same dataset with a t-test.
2. Collect 10,000 tweets for the search term “Google.” Find the Google fans by taking the top 20 retweeters of the official Google accounts “@Google” and “@googleresearch.” Find their tribe, and extract its features, comparing them with their peers in the same dataset with a t-test.
3. What is the difference between the Apple and the Google fans?

MAIN LESSONS LEARNED

• Internet Coolhunting is well-suited for analyzing and predicting the outcome of political elections.
• Today’s US presidential elections are fought to a large extent on social media.
• The contrasting styles of the two candidates Bernie Sanders and Donald Trump demonstrate the difference between COIN-based and hierarchical leadership styles.
• The popularity of four candidates early in the race for the 2016 US presidential election is analyzed: Bernie Sanders, Donald Trump, Hillary Clinton, and Jeb Bush.
• Using machine learning in “tribefinder,” Condor identifies members of the “Bernie Sanders tribe” and the “Donald Trump tribe” and their communication behavior.

PART III. AUTOMATIC MEDIA INSIGHTS COIN ASSESSMENT (AMICA)

© 2017 Peter A. Gloor

The final part of this book illustrates how Web-based Coolhunting and e-mail-based Coolfarming can be combined to measure and optimize social capital. Just as SAP, Oracle Financials, and Microsoft Dynamics provide financial capital management systems, Automatic Media Insights COIN Assessment (AMICA) provides a social capital management system for individuals and organizations. AMICA is an assessment of individual and group behavior that measures, compares, and optimizes the collective mind of an individual, an organization, or a company.

Borrowing two metaphors from medicine, AMICA offers both collaboration diagnosis and collaboration therapy. The AMICA diagnosis identifies which types of communication patterns are indicative of the most efficient and effective collaboration. Monitoring collaboration with AMICA allows individuals and organizations to measure the health of their relationships based on their communication style with friends and colleagues. The AMICA therapy helps individuals to improve their collaborative behavior; it suggests interventions to change individual and organizational collaboration based on their communication patterns, resulting in healthier and more satisfactory relationships.

Based on the principles of social quantum physics introduced in the companion book Swarm Leadership and the Collective Mind: Using Collaborative Innovation Networks to Build a Better Business and briefly mentioned in the introduction chapter, AMICA helps individuals and organizations to build entanglement through empathy, and to reboot and reflect, constantly improving collaboration behavior through self-reflection triggered by a virtual mirror of communication patterns. AMICA provides a complete virtual mirror of an individual’s and an organization’s collaborative conduct, applying Durkheim’s concept of “collective consciousness” (Durkheim and Swain, 2008). AMICA uses Condor to collect a set of comparative data points for individuals and organizations to benchmark their collaborative performance, comparing it against a normalized benchmark based on the six honest signals of collaboration. AMICA also includes a two-part online survey to assess individual and organizational collaboration with a series of qualitative questions. The goal is to assist individuals and organizations in the formation and growth of COINs, emergent ad hoc structures of intrinsically motivated people getting together to create radically new things.

Note that these patterns have been identified in our current research and may not exactly fit your team or organization. More work is needed to test and verify them more widely; in their current form they should be treated as experimental and explorative research. We are still at an early discovery period in using these metrics and tools, but have enough confidence in the initial results to share the findings with you in this book.

AMICA consists of four automated analysis modules abbreviated as IMIC, OMIC, IMOC, and OMOC (Table 24), measuring the communication of individuals and organizations through their patterns in inside communication archives (e-mail, Skype, online calendars) and outside online social media (Twitter, Wikipedia, blogs, and Facebook). It is complemented by an individual and an organizational online survey (SIC and SOC). The automated part of AMICA comprises a description of how to calculate the four different metrics in Condor, as well as reference benchmarks and recommendations on how to change behavior for better collaboration.

The three parts of AMICA focusing on the individual are IMIC, OMIC, and SIC. IMIC measures the collective mind of individuals, automatically analyzing the inside media. It specifies a process to collect the six honest signals of individuals using Condor, based on analyzing their e-mail, calendar, and Skype archives. OMIC measures the communication behavior of individuals on the outside media. It specifies a process to track the position of individuals in social media, looking at their social network extracted from Twitter, the Web (through Google CSE), Wikipedia, and Facebook walls. IMIC and OMIC are complemented by SIC, a survey of individual collaboration behavior according to the key principles of social quantum physics. SIC provides a qualitative approach to measuring collaboration; we hope to identify correlations between SIC scores and the six honest signals of collaboration, but this will only be possible once we have collected enough SIC ratings.

Table 24: The Six Different Parts of the AMICA Analysis.

| | Inside Media “Honest Signals” | Outside Media “Honest Signals” | Collaboration Readiness Assessment |
|---|---|---|---|
| Individual Collaboration Assessment | (1) IMIC, Inside Media Individual Collaboration: individual message archive (e-mail, calendar, Skype,…), star or galaxy; Condor | (2) OMIC, Outside Media Individual Collaboration: social media (Twitter, Web, Wikipedia…) presence & perception of the individual; Condor | (3) SIC, Survey of Individual Collaboration: measuring collective intelligence and collaborative capabilities; Online Survey |
| Organizational Collaboration Assessment | (4) IMOC, Inside Media Organizational Collaboration: organizational messaging archive (e-mail, calendar, phone,…); Condor | (5) OMOC, Outside Media Organizational Collaboration: social media awareness and presence of company, brand, and products on Twitter, Wikipedia, Blogs,…; Condor | (6) SOC, Survey of Organizational Collaboration: measuring collective consciousness and collective creativity of organizations; Online Survey |

AMICA also includes three corresponding assessments for measuring the collective mind of organizations. IMOC automatically analyzes the inside media from the perspective of the organization. It specifies a process to collect the six honest signals of organizations using Condor, based on analyzing their e-mail, calendar, and Skype archives. It also provides benchmarks to interpret the six honest signals of collaboration of organizations, comparing them against a database of different organizations from different cultures. This database is still under development; currently we have results from the United States, Australia, India, Switzerland, Germany, and Finland.

OMOC automatically analyzes the outside media from the organizational perspective. OMOC specifies a process in Condor to track the social network position of organizations in social media, looking at the strength of company names, brands, and products, extracted from Twitter, the Web (through Google CSE), Wikipedia, and corporate Facebook walls.

Finally, AMICA also provides SOC, an organizational online survey that tracks, through a series of questions, the collective consciousness and collaborative creativity of organizations. The survey questions are to be answered on a Likert-type scale. Just like SIC, SOC provides a qualitative approach to measuring collaboration; we hope to identify correlations between SOC scores and the six honest signals of collaboration, but this will only be possible once we have collected enough SOC ratings.


15 INSIDE MEDIA INDIVIDUAL COLLABORATION (IMIC)

IMIC measures collaboration behavior of individuals inside the organization, based on their e-mail, Skype, and calendar archives.


| Individual Collaboration Characteristics | Operationalization in Condor | Diagnosis: Indication of High Collaborators | Therapy: What You Can Do to Improve Collaboration |
|---|---|---|---|
| Open communication | Degree centrality of e-mail | Connecting to many people can be an indicator of openness | Reach out to new people outside the core team |
| Frequent communication | Number of e-mail messages sent | Sending more messages than receiving can be an indicator of proactive behavior | Be proactive and responsive, but conscious of not flooding others with messages |
| Group flow | AWVCI average-weighted variance in contribution index | Having a low variance in sending and receiving among group members means having a shared culture | Integrate all group members into information exchange, encourage passives to participate more actively, get spammers to send less |
| Creativity | Betweenness centrality oscillation | Repeatedly changing individual networking position from central to peripheral may be an indicator of creativity | Empower others by delegating, and by rotating between a central position of responsibility and letting others lead |
| Passion | Ego ART (average response time of individual) to e-mail | A person who is responsive and answers quickly shows more passion | Be responsive to everybody, independent of status and prestige |
| Respect | Alter ART (average response time of everybody else to an individual) to e-mail | An individual who elicits fast response from others is highly respected | Treat others with respect, then they will treat you respectfully too |
| Emotionality | Emotionality is defined as standard deviation of positive and negative sentiments | Saying what is good as well as what could be better is a sign of a high-functioning community | Be more honest, and refrain from using overly positive language |
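To make the passion and respect rows of the table concrete, the following minimal sketch computes ego ART and alter ART from a handful of e-mail records. The record layout (message id, sender, recipient, timestamp, id of the message being answered) and all names are assumptions for illustration; Condor's own calculation may differ in detail, for example in how it detects replies.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical records: (message_id, sender, recipient, sent_at, in_reply_to)
messages = [
    ("m1", "andrew", "charles", datetime(2016, 6, 1, 9, 0), None),
    ("m2", "charles", "andrew", datetime(2016, 6, 1, 10, 0), "m1"),
    ("m3", "charles", "andrew", datetime(2016, 6, 2, 9, 0), None),
    ("m4", "andrew", "charles", datetime(2016, 6, 2, 9, 30), "m3"),
]


def response_times(messages):
    """Return per-person ego ART and alter ART, both in hours."""
    by_id = {m[0]: m for m in messages}
    ego, alter = defaultdict(list), defaultdict(list)
    for _, sender, _, sent_at, in_reply_to in messages:
        original = by_id.get(in_reply_to)
        if original is None:
            continue  # not a reply, or the original is not in the archive
        hours = (sent_at - original[3]).total_seconds() / 3600
        ego[sender].append(hours)         # how quickly the replier answered
        alter[original[1]].append(hours)  # how quickly the original sender was answered
    ego_art = {person: mean(times) for person, times in ego.items()}
    alter_art = {person: mean(times) for person, times in alter.items()}
    return ego_art, alter_art


ego_art, alter_art = response_times(messages)
print("ego ART:", ego_art)
print("alter ART:", alter_art)
```

In this toy archive Andrew answers within half an hour (low ego ART, high passion), while Charles is the one who receives the faster answers (low alter ART, high respect).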

To calculate the main scores, this example uses e-mail, which can be replaced by Skype or calendar data if e-mail is not readily available. For the drill-down, Skype and calendar networks can easily be integrated if these archives are accessible. Skype data will include content from any chat conducted in parallel to the call; calendar analysis will include content from the entries and comments in the calendar.

The radar chart below gives a comparative overview of the seven individual collaboration characteristics for three individuals working at the same company, comparing their IMIC metrics against each other. The example illustrates that scores for the seven characteristics depend heavily on the role and function of the individual, since the three employees hold very different roles at the company. Andrew, the SVP for products, has a more outside-focused role, which leads to a communication profile much higher in communication frequency than those of IT technician Jacob and software developer Charles. Andrew is very popular with his customers, as he is high in passion for his job. Nevertheless, his customers are much slower in responding to him than the internal IT developers are in responding to their colleague Charles, who enjoys high respect inside his department; this leads to a lower respect score for Andrew. On the other hand, Andrew scores high on flexibility and adaptability.

Image 67a

Drilling down in Condor shows the different social network structures for Jacob and Andrew. Filtering the top 20 actors by betweenness (a proxy for importance) in Condor and then sizing them by betweenness oscillation (a proxy for creativity) shows, in the social network images below, that while Andrew communicates as the center of a galaxy, Jacob has a star network. In Andrew’s network, Bob is the most creative person.

Images 68 and 69a
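The filter-and-size step described above can be sketched as follows. This section does not spell out how Condor computes betweenness oscillation, so the sketch assumes one plausible reading: the number of times an actor's betweenness curve switches between rising and falling across consecutive time windows. The betweenness values are invented for illustration.

```python
def oscillation(series):
    """Count how often a time series switches between rising and falling."""
    changes, previous = 0, 0
    for earlier, later in zip(series, series[1:]):
        direction = (later > earlier) - (later < earlier)  # +1, -1, or 0
        if direction and previous and direction != previous:
            changes += 1
        if direction:
            previous = direction
    return changes


def top_actors(betweenness, k):
    """Keep the k actors with the highest average betweenness."""
    def average(actor):
        return sum(betweenness[actor]) / len(betweenness[actor])
    return sorted(betweenness, key=average, reverse=True)[:k]


# Hypothetical betweenness values per actor for five consecutive time windows
betweenness = {
    "andrew": [0.40, 0.42, 0.41, 0.43, 0.42],
    "bob": [0.10, 0.35, 0.05, 0.30, 0.02],
    "jacob": [0.20, 0.21, 0.22, 0.23, 0.24],
}

# Filter by importance first, then size the remaining nodes by oscillation
sizes = {actor: oscillation(betweenness[actor]) for actor in top_actors(betweenness, 2)}
print(sizes)
```

Note that filtering happens before sizing: an actor with a strongly oscillating but low betweenness (Bob in this toy data) drops out of the picture when the cut-off is small.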


The same IMIC representation can be used to track the change of a single person in the seven personality characteristics over time. The image below compares my own communication behavior between April and June 2016. It shows that in April I was communicating more (measured as frequency of communication), but always with the same people, whereas in May (the red line) the openness of communication with many different people increases, reaching its peak in June. On the other hand, my shared vision and emotionality pattern is quite consistent over all three months. Passion is highest in May, while respect peaks in April and June. Flexibility and adaptability show a low point in April, when I was single-mindedly focused on organizing a workshop at MIT.

Image 70a

A drill-down of my network shows the top 30 people of my mailbox in April and May 2016. The size of the nodes indicates the most respected people, that is, the ones to whom others respond the fastest in April and May. We currently rely on Exchange or IMAP to adjust for time zone differences; note, however, that the automatic time zone translation systems are far from perfect, in particular if legacy e-mail data is directly imported, for example from Novell or Lotus Notes.

Images 71 and 72a

I also analyze my Skype network from 2010 to 2015; the ﬁrst chart shows my overall activity in number of calls per day. There were some group calls where I was conducting meetings with my students; thus, I was in contact with up to 100 participants in June 2011.

The image below shows my full Skype network over the entire duration from 2010 to 2015.


Removing the star in the center (myself) from the Skype call network in the next step creates a much clearer picture, as now my colleagues and collaborators stand out distinctly. The nodes of both networks are sized by betweenness centrality oscillation, which is a proxy for creativity.


Besides Skype, the online calendar is also a rich source of information. As the activity chart below illustrates, the number of my meetings started to explode in the second half of 2014 and 2015.

The social network chart below shows the reason: it is mostly because of my participation in the healthcare projects with NICHQ and HRSA, the Health Resources and Services Administration, where I participated in the Infant Mortality reduction Collaborative Improvement and Innovation Networks (IM CoIIN), and with Cincinnati Children’s Hospital Medical Center (CCHMC), where I participated in the Type 1 diabetes (T1D) and cystic fibrosis (CF) projects.

Image 73a

15.1. IMIC ANNOTATION PROCESS

These values can easily be calculated in Condor. The first step consists of collecting the mailbox data from the server.


The second step consists of loading the dataset for the desired time interval, and calculating the group values. This step is the same for all four automatic assessments of AMICA, not only IMIC, but also OMIC, IMOC, and OMOC. For example, for analyzing my mails of June 2016, I load the dataset with the following parameters.

Next I compute the group values using Condor’s annotate functions. They are calculated using the centrality annotations (group betweenness and degree centrality), group betweenness oscillation, AWVCI, TurnTaking, and graph density, as well as the “calculate sentiment” function, which will calculate the average sentiment, emotionality, and complexity for all the messages in the dataset.1

1 In principle, these variables could all be calculated automatically; however, their parameters need to be adjusted for the data being analyzed (e-mail, Skype, calendar, monthly/weekly data collection, snowball sampling or corporate archive, etc.). Once the parameters have been defined, the server version of Condor could be used, which can collect and calculate the AMICA values automatically on a server using a RESTful API.
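Two of the group values named above, graph density and group degree centrality, have standard textbook definitions that can be sketched without any library. The sketch below uses the directed-graph density and Freeman's degree centralization on the undirected version of the graph; whether Condor applies exactly these variants is an assumption, and the small example networks are made up.

```python
def graph_density(nodes, edges):
    """Directed density: existing links divided by all possible links."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def group_degree_centrality(nodes, edges):
    """Freeman degree centralization on the undirected version of the graph."""
    neighbours = {node: set() for node in nodes}
    for source, target in edges:
        if source != target:
            neighbours[source].add(target)
            neighbours[target].add(source)
    n = len(nodes)
    if n < 3:
        return 0.0
    degrees = [len(v) for v in neighbours.values()]
    # Sum of differences to the most connected node, divided by the
    # maximum possible sum, which a star network attains
    return sum(max(degrees) - d for d in degrees) / ((n - 1) * (n - 2))


nodes = ["peter", "ann", "bo", "cy"]
star = {("peter", "ann"), ("peter", "bo"), ("peter", "cy")}
ring = {("peter", "ann"), ("ann", "bo"), ("bo", "cy"), ("cy", "peter")}

print(graph_density(nodes, star), group_degree_centrality(nodes, star))
print(graph_density(nodes, ring), group_degree_centrality(nodes, ring))
```

A star network yields the maximum centralization of 1, while a ring, in which everybody has the same number of contacts, yields 0.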


After having calculated the values, I export them to Excel for comparative analysis, using the “Export -> Export dataset properties” menu. The resulting CSV file can then be loaded into Excel and visualized using Excel’s radar chart function.

Note that to show them as a radar chart in Excel, we need to normalize the values into the interval [0,1] by dividing them by their maximum value. For the values of passion and respect (ego ART and alter ART) we have to invert the scale, so that 1 stands for the smallest response time and 0 for the highest, as the lowest response time reflects the highest passion and respect. A sample Excel template is provided on the companion book website (www.ickn.org/sociometrics).
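The normalization described above can also be done outside Excel. The sketch below is one possible reading of the text: ordinary values are divided by their maximum, while the response times are min-max scaled and inverted so that the smallest value maps exactly to 1 and the largest to 0. The function names and the monthly values are hypothetical.

```python
def normalize(values):
    """Scale values into [0, 1] by dividing by the maximum."""
    top = max(values.values())
    return {key: (value / top if top else 0.0) for key, value in values.items()}


def normalize_inverted(values):
    """Min-max scale so the smallest value maps to 1 and the largest to 0."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {key: 1.0 for key in values}  # all equal, nothing to rank
    return {key: (high - value) / (high - low) for key, value in values.items()}


# Hypothetical monthly values exported from Condor
messages_sent = {"April": 400, "May": 300, "June": 200}
ego_art_hours = {"April": 6.0, "May": 2.0, "June": 4.0}

print(normalize(messages_sent))
print(normalize_inverted(ego_art_hours))
```

With these numbers, May has the shortest response time and therefore receives the highest passion score on the radar chart.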


16 OUTSIDE MEDIA INDIVIDUAL COLLABORATION (OMIC)

OMIC measures collaboration behavior of individuals as seen from the outside through the lens of online social media such as Twitter, Facebook, Wikipedia, and Google search. OMIC starts with an analysis of an individual’s footprint on Twitter and extends the drill-down exploration with Wikipedia, Facebook wall, and Google Blog search analysis.


| Individual Collaboration Characteristics | Operationalization in Condor on Social Media | Diagnosis: Indication of High Popularity and Influence on Social Media | Therapy: What You Can Do to Improve Your Social Media Footprint |
|---|---|---|---|
| Activity | Number of tweets collected with EgoFetcher | The more people tweet about a person, the more popular the person is | Be selective in tweets, and crosslist them on Facebook and LinkedIn |
| Central leadership | Group degree centrality of the Twitter network | The less centralized the network, the more different people form their own sub-communities | For a Twitter community, a decentralized network might be desirable |
| Creativity | Oscillation in betweenness centrality of actors | The more diverse the network structure of people tweeting about the person, the more creative the person | Reach out to new people outside the core group by including them in tweets, and add blog content and outside links |
| Passion | Ego ART (average response time of individual to tweets from others) | The faster actors respond to tweets, the more passionate they are about the person originally tweeting | Be responsive; however, only tweet when you have something to say |
| Respect | Alter ART (average response time of everybody else to tweets from original actor) | The faster other people respond to tweets from the original actors, the more the original actors are respected | Add substance to your tweets, and cross-reference blogposts and other interesting content on the Web |
| Complexity | Complexity measures word usage by looking at the diversity of the vocabulary, and its distribution among the different tweets | More complex language means that people are having more diverse and thought-provoking discussions on Twitter | Make good use of the 140 characters, and add links to pictures and videos |
| Sentiment | Sentiment measures the positivity and negativity of tweets using the machine-learning function of Condor | Having a more positive sentiment in Twitter is an indicator of long-term success | Use a fundamentally positive tweeting style, and also point out positive things, instead of complaining |
| Emotionality | Emotionality is defined as standard deviation of positive and negative sentiments in tweets | Being more emotional on Twitter is an indicator of being more engaged and committed | Be honest in your own tweets, and refrain from using overly positive language |
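The emotionality definition in the table (standard deviation of positive and negative sentiments) lends itself to a short illustration. The per-tweet sentiment scores below are invented, the 0-to-1 scale is an assumption, and so is the use of the population rather than the sample standard deviation; how Condor's machine-learning function produces the scores is not covered here.

```python
from statistics import mean, pstdev


def sentiment_and_emotionality(scores):
    """Return the average sentiment and its standard deviation (emotionality)."""
    return mean(scores), pstdev(scores)


# Hypothetical per-tweet sentiment scores, 0 = negative, 1 = positive
mixed = [0.9, 0.1, 0.8, 0.2]   # praise and criticism alternate
flat = [0.5, 0.5, 0.5, 0.5]    # uniformly neutral tone

mixed_sentiment, mixed_emotionality = sentiment_and_emotionality(mixed)
flat_sentiment, flat_emotionality = sentiment_and_emotionality(flat)
print(mixed_sentiment, mixed_emotionality)
print(flat_sentiment, flat_emotionality)
```

Both samples have the same average sentiment, yet only the first one shows high emotionality, which is why the two measures are reported separately.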

The radar chart below compares the social media footprints of Bill Gates, Neil deGrasse Tyson, Paul Krugman, and Arianna Huffington, collected on July 12, 2016. On this day, Bill Gates had the highest Twitter activity, followed by Neil deGrasse Tyson. Tyson and Gates were also seen as most creative. In that particular week, there was not much activity around Paul Krugman, which is why his network shows centralized leadership; his Twitter network, however, uses the most complex language. Arianna Huffington’s network uses the most positive language and is the most passionate and respectful, meaning that tweets by her swarm are retweeted quickly, while the members of her swarm are also quick to retweet. In this analysis, we are not just looking at the tweets of Bill Gates, Neil deGrasse Tyson, Paul Krugman, and Arianna Huffington, but also including the importance of the people retweeting them, as well as analyzing the behavior of the people tweeting about them. Note that Twitter reflects the fickleness of the crowd; one week later the values might be quite different.

Image 74a

Out of the total 56,284 actors in Bill Gates’ Twitter network on July 12, the image below shows the top 20 actors by betweenness. They are his most influential supporters; Bill Gates himself is one of them. RealDonaldTrump also shows up, as he has a dominating presence in the Twittersphere in August 2016, being cross-referenced numerous times by other tweeters. The most central Twitter ID, though, is YouTube, thanks to the numerous links to science and other videos tweeted by Bill Gates and his supporters.

Image 75a

To further explore the social media presence of a person, we can also do a Wikipedia query and look at the Wikipedia network surrounding a person. The image below compares my own Wikipedia network (note that I do not have a page on Wikipedia) against the network of Bill Gates on Wikipedia, illustrating Bill Gates’ position as a global thought leader, tycoon of IT, and leader of his charitable foundation.


My Facebook wall shows the people who left comments on my Facebook wall. Note that only I as the owner of my Facebook page will be able to download the content of my page into Condor. As the activity chart below illustrates, my Facebook wall is not particularly active from September 2013 to January 2016.

Not surprisingly, I am the most central person on my own Facebook wall; my colleague and Facebook friend Yoshiaki also has left some well-connected comments on my page.


To better understand the network, I remove myself from the network. Yoshiaki now becomes the center of his own galaxy, while many other Facebook friends become much more central. My Facebook friends Mattaeus, Sergey, Andreas Pollak, Azadeh, Jun, and Puja suddenly stand out as gatekeepers of information and posts on my Facebook wall.


16.1. OMIC ANNOTATION PROCESS

The main tool for the online social media analysis is Condor’s Twitter EgoFetcher, which simulates an individual’s social reach on Twitter by factoring in the popularity and importance of the people retweeting the individual’s tweets. The EgoFetcher works in four steps:

(1) It takes the last N (e.g., 10,000) tweets about the search term or Twitter handle (e.g., “Bill Gates”). Note that normally, except for an individual’s own tweets, the search API of Twitter only returns last week’s tweets. If the Twitter search API’s limit of 180 tweets is reached, the EgoFetcher will pause for 15 minutes and then continue fetching. There is no limit on the tweets from an individual’s timeline, so you might be able to get much higher numbers of tweets before hitting the rate limit.
(2) It constructs a network with a link from actor B to actor A if B retweets A or B mentions A in a tweet.
(3) It then takes the timelines of the 480 most influential people in the search results; the influence of people is measured through their degree in the retweet network from step (2). For Twitter users, their timeline is all their tweets, sorted from newest to oldest.
(4) For each tweet collected in the previous steps, it adds the first 100 retweets.

This leads to a network that shows the impact and reach of a person in the Twittersphere, and is a better proxy of their importance than just the number of followers, which can be gamed or bought. For example, to collect the Twitter Ego network of Bill Gates, we would run the Twitter EgoFetcher with the following settings.


The image below shows the 56,284 actors of the Twitter Ego network for Bill Gates; the dark grey dots are the tweets collected in step (1) above, running a search for the string “Bill Gates” on Twitter. The light grey dots come from the timelines of the top 480 users collected in step (1), and also include the top 100 retweets for each of the tweets — these are the scattergun-like funnels in the periphery of the graph.
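Steps (2) and (3) of the EgoFetcher process described in this section can be approximated with a short sketch. It does not call the Twitter API and is not Condor's code: the tweet record layout and function names are hypothetical, and "degree" is taken here to mean the total number of links touching an actor, which is an assumption.

```python
from collections import Counter

# Hypothetical tweet records as they might look after step (1)
tweets = [
    {"author": "bob", "retweet_of": "BillGates", "mentions": []},
    {"author": "carol", "retweet_of": None, "mentions": ["BillGates", "YouTube"]},
    {"author": "dave", "retweet_of": "carol", "mentions": []},
]


def build_network(tweets):
    """Step (2): add a link B -> A if B retweets A or B mentions A."""
    edges = set()
    for tweet in tweets:
        targets = list(tweet["mentions"])
        if tweet["retweet_of"]:
            targets.append(tweet["retweet_of"])
        for target in targets:
            if target != tweet["author"]:
                edges.add((tweet["author"], target))
    return edges


def most_influential(edges, k=480):
    """Step (3): rank actors by their degree in the network from step (2)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return [actor for actor, _ in degree.most_common(k)]


edges = build_network(tweets)
print(sorted(edges))
print(most_influential(edges, k=2))
```

The timelines of the returned actors would then be fetched and fed back into the same network-building function, which is what produces the funnel-like structures around influential users.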


After the data collection process has been completed, the annotations are calculated using the same process as described above for IMIC, and exported to Excel, where they are then visualized using the radar chart function of Excel.


17 INSIDE MEDIA ORGANIZATIONAL COLLABORATION (IMOC)

IMOC measures collaborative performance of organizations through the aggregated communication behavior of their individual members conversing inside the organization, based on their e-mail, Skype, and calendar archives. It looks at how departments, business units, and companies communicate as collective entities. It is calculated using the group measures of Condor.


| Organizational Collaboration Characteristics | Operationalization in Condor | Diagnosis: Indication of High Collaborators | Therapy: What You Can Do to Improve Collaboration |
|---|---|---|---|
| Central leadership | Group degree centrality of e-mail | More centralized leadership might lead to more innovation | Encourage qualified group members to assume leadership roles |
| Group creativity | Oscillation of group betweenness centrality over time | Many individuals changing their networking positions from central to peripheral may be an indicator of organizational creativity | Empower others by delegating, and by rotating between a central position of responsibility and letting others step up |
| Group flow | AWVCI average-weighted variance in contribution index | Having a low variance in sending and receiving among group members means having a shared culture | Integrate all group members into information exchange, encourage passives to participate more actively, get spammers to send less |
| Empowerment | Graph density | The more directly connected employees are to others, the more they are empowered | Increase connectivity by encouraging inter-organizational and across-hierarchy communication |
| Satisfaction | Ego ART (Average Response Time of individuals to others) | A community where members are responsive is an indicator of high satisfaction | Try to create a respectful work culture |
| Empathy | Alter ART (Average Response Time of everybody else to individuals) | Answering quickly to others is an indicator of respect and empathy | Be more responsive to everybody, independent of status and prestige |
| Emotionality | Emotionality is defined as standard deviation of positive and negative sentiments | Saying what is good as well as what could be better is a sign of a high-functioning community | Teach members of the organization to be more honest, and refrain from using overly positive language |
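The group flow row relies on the variance in sending and receiving among group members. As a rough illustration, the sketch below assumes a contribution index of (sent - received) / (sent + received) per person and takes the plain, unweighted variance across the group. Both the formula and the omission of any weighting are assumptions; the exact AWVCI calculation is not described in this section, and the message counts are invented.

```python
from statistics import pvariance


def contribution_index(sent, received):
    """+1 for a pure sender, -1 for a pure receiver, 0 when balanced (assumed formula)."""
    total = sent + received
    return (sent - received) / total if total else 0.0


def contribution_variance(counts):
    """Unweighted variance of the contribution index across group members."""
    return pvariance(contribution_index(s, r) for s, r in counts.values())


# Hypothetical (sent, received) message counts per group member
unbalanced_team = {"ann": (50, 50), "bo": (90, 10), "cy": (10, 90)}
balanced_team = {"ann": (50, 50), "bo": (40, 40), "cy": (30, 30)}

print(contribution_variance(unbalanced_team))
print(contribution_variance(balanced_team))
```

The balanced team has zero variance, which in the table's terms would indicate a shared culture, whereas the team with one heavy sender and one passive receiver shows a clearly higher value.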

The example below compares the collaboration performance of a professional services company with 45,000 employees over three months, plotting the values for the seven IMOC variables in the radar chart. The image illustrates that in June 2016 the company communicated most centrally, most likely because some company-wide campaigns were run. April was the most creative month, where different leaders took turns assuming central roles. In May, employees were most empathic, responding to each other the fastest.

Image 76a


The drill-down image below shows the most central 2000 employees of the company in April (in blue), as well as their communication with key customers (shown in red). Note that the customers are scattered in the periphery, while the employees of the company are doing a lot of the talking among themselves.

Image 77a

When ﬁltering the company employees for only the account managers of the customers, the position of the customers (marked in red) notably changes (image 78), and the customers move into the core of the network. In order to measure the customer focus of the company, it would therefore be worthwhile to repeat the calculation of the IMOC variables using the network below, to track the change in customer focus of the company’s account managers.

Image 78a

17.1. IMOC ANNOTATION PROCESS

The IMOC annotation process is identical to the IMIC annotation process described in Chapter 15. The main difference is that the e-mail, Skype, or calendar archive comes not from an individual but from an entire organization. This means that it is normally much more voluminous, easily containing billions of communication records, which might necessitate calculating the Condor variables on a cloud server equipped with over 100 GB of RAM. For preprocessing the communication data, a MapReduce- or Hadoop-based cluster might be necessary.
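The preprocessing idea mentioned above can be illustrated at toy scale. The sketch below mimics the map and reduce phases in plain Python: each chunk of raw communication records is counted independently, and the partial counts are then merged into weighted links. On a real cluster the chunks would be processed on different machines; the record layout and function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_chunk(records):
    """Map: count messages per (sender, recipient) pair within one chunk."""
    return Counter((sender, recipient) for sender, recipient in records)


def reduce_counts(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


# Hypothetical chunks of (sender, recipient) records from a mail archive
chunks = [
    [("ann", "bo"), ("ann", "bo"), ("bo", "cy")],
    [("ann", "bo"), ("cy", "ann")],
]

edge_weights = reduce(reduce_counts, map(map_chunk, chunks), Counter())
print(edge_weights)
```

The resulting weighted edge list is far smaller than the raw archive, which is what makes the subsequent network calculations feasible in memory.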


18 OUTSIDE MEDIA ORGANIZATIONAL COLLABORATION (OMOC)

OMOC measures collaboration behavior of companies as seen from the outside through the lens of online social media such as Twitter, Facebook, Wikipedia, and Google search. OMOC starts with an analysis of the organization’s footprint on Twitter and extends the drill-down exploration with Wikipedia and Google Blog search analysis.


The table below lists, for each organizational collaboration characteristic, its operationalization in Condor on social media, the diagnosis (indication of high popularity and influence on social media), and the therapy (what you can do to improve your organization’s social media footprint).

Central leadership
Operationalization: Group degree centrality of the Twitter network about the company name
Diagnosis: The less centralized the network, the more different people form their own sub-communities
Therapy: For a Twitter community, a decentralized network might be desirable

Activity
Operationalization: Number of tweets collected with EgoFetcher about company name
Diagnosis: The more people tweet about a company, the more popular it is
Therapy: Be selective when tweeting, and crosslist tweets on Facebook and LinkedIn

Creativity
Operationalization: Oscillation in betweenness centrality of company name in Twitter network
Diagnosis: The more diverse the network structure of people tweeting about the company, the more creative the company’s brand
Therapy: Reach out to new people outside the company by including them in tweets, and add blog content and outside links

Passion
Operationalization: Ego ART (Average Response Time of individuals to tweets from others)
Diagnosis: The faster actors respond to tweets, the more passionate they are about the person tweeting about the company
Therapy: Be responsive yourself, and only tweet about the company when you have something to say

Respect
Operationalization: Alter ART (Average Response Time of everybody else to tweets from original actors)
Diagnosis: The faster other people respond to tweets from the original actors, the more the original actors tweeting about the company are respected
Therapy: Add substance to your tweets, and cross-reference blog posts and other interesting content on the Web

Complexity
Operationalization: Complexity measures word usage by looking at the diversity of the vocabulary, and its distribution among the different tweets
Diagnosis: More complex language means that people are having more diverse and thought-provoking discussions on Twitter about the company
Therapy: Make good use of the 140 characters of tweets, and add links to pictures and videos

Sentiment
Operationalization: Sentiment measures the positivity and negativity of tweets using the machine-learning function of Condor
Diagnosis: Having a more positive sentiment in Twitter is an indicator of a positive attitude toward the brand
Therapy: Use a fundamentally positive tweeting style, and also point out positive things, instead of complaining

Emotionality
Operationalization: Emotionality is defined as standard deviation of positive and negative sentiments in tweets
Diagnosis: Being more emotional on Twitter is an indicator of being more engaged and committed toward the brand
Therapy: Be honest in your own tweets, and refrain from using overly positive language when tweeting about the brand
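Two of the operationalizations in the table, emotionality and average response time (ART), reduce to simple arithmetic. The sketch below is only an illustration and not Condor's implementation: the function names, the sentiment scale, and the sample data are hypothetical, and it assumes that emotionality is the population standard deviation of per-tweet sentiment scores and that ART is the mean delay between a tweet and its reply.

```python
from statistics import mean, pstdev

def emotionality(sentiment_scores):
    """Standard deviation of per-tweet sentiment (assumed scale: 0 = negative, 1 = positive)."""
    return pstdev(sentiment_scores)

def average_response_time(tweet_reply_times):
    """Mean delay in seconds over (tweet_time, reply_time) pairs."""
    return mean(reply - tweet for tweet, reply in tweet_reply_times)

# Hypothetical data for illustration only.
scores = [0.9, 0.1, 0.8, 0.2]
pairs = [(0, 60), (100, 400), (1000, 1240)]

print(round(emotionality(scores), 3))
print(average_response_time(pairs))
```

A community that swings between very positive and very negative tweets yields a high emotionality value even when its average sentiment is neutral, which is why the table treats sentiment and emotionality as separate characteristics.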

The radar chart below compares the social media footprint of Microsoft and Google, collected on July 12, 2016. On this day, Microsoft and Google had comparable Twitter activity; however, Google is seen as more creative. People tweeting about Google are also more passionate and respectful. The text of tweets about Microsoft is slightly more complex. Note that Twitter reflects the fickleness of the crowd; one week later the values might be quite different.

The image below shows a drill-down with the top 30 most central nodes; their node size indicates betweenness centrality oscillation. We ﬁnd that Prashantrjoshi, MSFTExchange, and sladner show the most creative Twitter networking behavior, which means they are engaged in a rapid exchange of tweets.

Image 79

The image below shows the most central websites boosting the Microsoft brand. Nodes are sized by betweenness. Facebook, Forbes, and the New York Times are most central.


18.1. OMOC ANNOTATION PROCESS

The OMOC annotation process is identical to the OMIC annotation process described above. The main difference is that the Twitter, Wikipedia, and blog archives are not about an individual, but about an organization. Similarly to OMIC, OMOC mostly uses the Twitter EgoFetcher, collecting tweets about a company’s name, and the Twitter timelines of the most important 480 people tweeting about the company as well as the first 100 retweets of each tweet. Note that the current Condor Facebook fetcher is restricted to the walls of individual people; it does not directly provide walls of Facebook groups and organizations.

18.2. FOLLOW-ON EXERCISES

1. Using your own mailbox, do a radar chart IMIC analysis, comparing your communication behavior of the last three months, and replicate the drill-down as described in Chapter 15.
2. Compare the social media profiles of Pope Francis, Tim Berners-Lee, and Roger Federer using OMIC and replicate the drill-down as described in Chapter 16.
3. Combine the e-mail boxes or Skype archives of you and your three closest friends into one combined team mailbox and analyze your communication behavior over the last three months using IMOC, and replicate the drill-down as described in Chapter 17.
4. Compare the social media profiles of Samsung, Nokia, Apple, and Huawei using OMOC and replicate the drill-down as described in Chapter 18.

19 SURVEY OF INDIVIDUAL AND ORGANIZATIONAL COLLABORATION (SIC & SOC)

The four automated online media-based assessments are complemented by two survey-based assessments: Survey of Individual Collaboration (SIC), focusing on the individual, and Survey of Organizational Collaboration (SOC), with a focus on the organization. The survey questions are grounded in extensive prior research. The survey is also online at http://5.35.249.27/sociometrics/sicsoc

19.1. SURVEY OF INDIVIDUAL COLLABORATION (SIC)

SIC measures the attitude of individuals toward collaboration in seven dimensions: individual motivation, organizational motivation, transparency, fairness, trust/honesty, forgiveness, and empathy/listening. These dimensions are explained in detail in my companion book Swarm Leadership and the Collective Mind: Using Collaborative Innovation Networks to Build a Better Business.

19.1.1. Individual Motivation

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• I would take a different job paying the same
• If I could start again, I would not learn my current profession
• If I got all the money I ever wanted, I would still stay in my current profession
• In my private time I spend time reading up on professional material
• I am very personally involved in my job
• I consider my profession central to my existence

Source: Blau (1985).

19.1.2. Organizational Motivation

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• I am willing to put in extra effort for my organization
• I talk up my organization as a great place to work
• I would accept almost any job to stay in my organization
• In my private time I spend time reading up on professional material
• I really care about the fate of my organization
• This is the best organization to work for

Source: Blau (1985).

19.1.3. Transparency

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• The people I work with keep me informed
• It is important for me to know if a website that collects my information will use it in a way that will identify me
• I think ordinary citizens should have access to records of government contracts, including the amount and who got the contracts
• Citizen requests for government documents are just a big distraction for government workers
• Do you think whistleblowers, anticorruption activists, and journalists should enjoy legal protections that make them feel secure about reporting cases of corruption?
• My organization wants people like me to know what it is doing and why it is doing it

Sources: Awad and Krishnan (2006); Piotrowski and Van Ryzin (2007); http://www.transparency.org/

19.1.4. Fairness

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• I help others to acquire the skills they might need
• My organization treats people like me fairly and justly
• My organization can be relied on to keep its promises
• This organization does not mislead people like me
• My organization is interested in the well-being of people like me, not just itself
• My organization freely admits when it has made mistakes

Source: Rawlins (2008).

19.1.5. Trust/Honesty

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• I give information to the group, even if it might jeopardize my position or job
• I am not afraid to offend other people if I think I am right
• When my friends have a problem, they usually ask me for help
• I believe my organization takes the opinions of people like me into account when making decisions
• I’m willing to let my organization make decisions for people like me
• I trust my organization to take care of people like me

Source: Rawlins (2008).

19.1.6. Forgiveness

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• Your significant other has just broken up with you, leaving you hurt and confused. You learn that the reason for the break up is that your significant other started dating a good friend of yours. You will never forgive her/him
• I feel hatred whenever I think about the person who wronged me
• I think my life is ruined because of this person’s wrongful actions
• If I encountered the person who wronged me I would feel at peace
• I hope the person who wronged me is treated fairly by others in the future
• A friend borrows your most valued possession, and then loses it. You will never forgive her/him

Source: Rye et al. (2001).

19.1.7. Empathy/Listening

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• I learn a lot from others in the team
• Most of my expertise has developed as a result of working with others
• When I am in need my colleagues will go out of their way to help me
• We are continuously encouraged to bring new knowledge in our team
• I believe my organization takes the opinions of people like me into account when making decisions
• My organization asks the opinions of people like me before making decisions

Sources: Sveiby and Simons (2002); Narayan and Cassidy (2001); Grootaert (2004).

19.2. SURVEY OF ORGANIZATIONAL COLLABORATION (SOC)

SOC measures the collective attributes of individual group members toward their organization in four dimensions: collective consciousness, leadership, contribution/sharing, and responsiveness/respect. It focuses on the collective consciousness of group members, their leadership behavior, their attitude toward sharing, and the respect they give to everybody.

19.2.1. Collective Consciousness

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• I am a worthy member of the group I belong to
• I feel good about the group I belong to
• Overall, my group is considered good by others
• We have a strong organizational culture with shared vision, values, norms, systems, symbols, language, assumptions, beliefs, and habits in our group
• The group I belong to is an important reflection of who I am
• I have a strong sense of belonging to my own group

Sources: Luhtanen and Crocker (1992); Ashmore, Deaux, and McLaughlin-Volpe (2004).

19.2.2. Leadership

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• My group chooses its own leaders
• If a member of the group has a problem, the group members will collectively help her/him
• My group gives me the power to make important decisions concerning myself by myself
• I feel free of conflict with myself in the context of my group
• My group supports me to become a better person
• My group wants me to stand up for my beliefs, independently of who opposes them

Source: Braithwaite and Law (1985).

19.2.3. Contribution/Sharing

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• It is imperative to lessen the gap between the rich and the poor
• All nations of the earth should work together to help each other
• I prefer to gain new insights even if it comes at a cost to myself
• My personal success depends on aligning my goals with the goals of my group or organization
• My goal is to serve my organization, not myself
• I am willing to invest more into my organization than what I get out

Sources: Schwartz (2012); Waldman et al. (2006).

19.2.4. Responsiveness/Respect

Scale: Agree (1) (2) (3) (4) (5) (6) (7) Disagree

• Each group member needs to be treated as someone of worth, independent of social position and income
• We need to give every group member an equal chance, even if this means I have less
• It is imperative to prevent the destruction of nature, even if this means less income for me
• In the context of my group I am taking responsibility for my own actions
• I like making fun about the members of my group
• Every group member deserves respect

Source: Watson, Newton, and Kim (2003).

19.3. SAMPLE DOWNLOAD

The samples mentioned in this book can be downloaded from the following link: http://www.ickn.org/sociometrics/

There you will find the following:
• Hillary Clinton’s e-mail as provided on Kaggle.
• Enron e-mail archive in Condor format.
• List of Enron convicts to test machine learning with Condor.
• 9000 tweets about Donald Trump on April 22, 2016.
• 9000 tweets about Bernie Sanders on April 22, 2016.
• Antivaxxer Twitter example in a KNIME format.
• Excel spreadsheet to create AMICA output: IMIC, OMIC, IMOC, and OMOC examples.

The SIC and SOC survey is available online at http://5.35.249.27/sociometrics/sicsoc

PART IV. APPENDIX — USEFUL MACHINE LEARNING AND GRAPH ANALYSIS TOOLS The appendix describes KNIME and Gephi, two additional tools besides Condor useful for mapping the collective mind on online social media.


APPENDIX A: IDENTIFYING ANTIVAXXERS THROUGH MACHINE LEARNING USING KNIME

CHAPTER CONTENTS
• KNIME is an open source machine learning tool with a visual front end.
• A training dataset of tweeters is manually classified into pro-vaxxers and anti-vaxxers.
• Machine learning distinguishes supporters and objectors of the “Anti-Vaxxer” theory through their word usage in Twitter.

In this example, we will learn how to use machine learning to identify proponents of the “Anti-Vaxxer” theory through their Twitter behavior. Anti-vaccination, the refusal of parents to vaccinate their infants against common infectious diseases, has been scientifically debunked, but is still propagated by a small but vocal minority of parents in the United States. They claim that vaccination of infants causes autism. The consequence is that in some parts of the United States more than 10% of children are not vaccinated, thereby becoming potential carriers of infectious diseases, such as measles, for their peers.[1]

We will analyze a dataset of tweets that was collected in spring 2015 by a team of students of the COINs seminar at FHNW Brugg and University of Cologne. They gathered all the tweets containing the words “vaccination,” “vaccinate,” “vaxxer,” “vaccine,” “anti-vaccination,” and “anti-vaxxers.” The resulting tweets, together with information about the tweeters, were used to manually classify two sets of tweets: one belonging to pro-vaxxers and the other belonging to anti-vaxxers. These tweets were used to categorize pro- and anti-vaxxers based on their word usage.

Pennebaker (2013), a social psychology professor at UT Austin, has found that how people use small function words such as “the,” “and,” “or,” “to,” “in,” “it,” “what,” “I,” “my,” “me,” “you,” etc., has high predictive value. As Condor did not count these words initially, the students wrote an external program that calculates these statistics for each tweet. (In the meantime this function has been added to Condor.) We then use KNIME, an open source text mining and data analytics tool with a visual front end, to apply the results of Pennebaker and see whether pro- and anti-vaxxers use these function words in different ways.

[1] This example is based on a class project in the COINs 2015 Spring seminar done by Juerg Dietrich (FHNW Brugg) and Matthias Sambale (University of Cologne), together with their team members in Brugg and Tor Vergata University Rome, Yannick Gaugler, Benjamin Schaja, Luca Balestra, and Rosy Innarella.

As an input for KNIME, we use the dataset put together by the student team. The dataset contains the following fields:
• ID
• Username
• isAntivaxxer
• meanNumberOfMentions
• meanNumberOfHashtags
• meanNumberOfHtmls
• meanTextLength
• Frequency_The
• Frequency_And
• Frequency_To
• Frequency_In
• Frequency_It
• Frequency_My
• Frequency_You
• Frequency_Was
• Frequency_For
• Frequency_Have
• Frequency_With
• Frequency_Me
• Frequency_But
• Sum_smallwords
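To make the Frequency_* fields more concrete, here is a minimal sketch of how such function-word statistics could be computed for a single tweet. It is not the students' program: the function name, the tokenizer, and the choice to express each frequency as a share of all tokens in the tweet are assumptions made for illustration.

```python
import re

# The function words mirror the Frequency_* fields of the dataset.
FUNCTION_WORDS = ["the", "and", "to", "in", "it", "my", "you",
                  "was", "for", "have", "with", "me", "but"]

def function_word_frequencies(tweet):
    """Return the share of tokens in `tweet` taken by each function word."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    total = len(tokens)
    freqs = {}
    for word in FUNCTION_WORDS:
        count = tokens.count(word)
        # Assumption: frequency is relative to the tweet's token count.
        freqs["Frequency_" + word.capitalize()] = count / total if total else 0.0
    # Sum over all function words, analogous to Sum_smallwords.
    freqs["Sum_smallwords"] = sum(freqs.values())
    return freqs

print(function_word_frequencies("I have to go to the store"))
```

Averaging these per-tweet values over all tweets of one user would give one profile row per tweeter, which is the granularity the dataset uses.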

The file “Pro_Anti_Vaxxer_Twitter.csv” contains 171 manually classified profiles of pro-vaxxers and 171 profiles of anti-vaxxers, with their attributes such as meanNumberOfMentions, …, Frequency_But, Sum_smallwords. After downloading KNIME from www.knime.org, we start by creating a new workflow “Antivaxxer2.” From the Node Repository, we first drag a File Reader icon into the workspace, then right-click it and open the configure window. Selecting the “Pro_Anti_Vaxxer_Twitter.csv” file leads to the following configuration window. Note that the ID and Username columns need to be skipped for the analysis: for the purpose of machine learning they are random values that would confuse the results. To skip a column, right-click its heading and select the box “Don’t include column in output table.”
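Outside KNIME, the same column skipping can be sketched with Python's csv module. The two sample rows, the 1/0 encoding of isAntivaxxer, and the function name are hypothetical and serve only to show the idea; the real file may be encoded differently.

```python
import csv
import io

# Two made-up rows with a subset of the columns, for illustration only.
SAMPLE = """ID,Username,isAntivaxxer,meanTextLength,Frequency_Me
101,user_a,1,98.5,0.01
102,user_b,0,120.0,0.03
"""

SKIP = {"ID", "Username"}  # identifiers carry no signal for the classifier

def load_profiles(handle):
    """Read profiles from a CSV handle and drop the identifier columns."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({k: v for k, v in record.items() if k not in SKIP})
    return rows

profiles = load_profiles(io.StringIO(SAMPLE))
print(profiles[0])
```

With the downloaded file, the same function could be called on `open("Pro_Anti_Vaxxer_Twitter.csv", newline="")` instead of the in-memory sample.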

Next we add “Equal Size Sampling” (use exact sampling) and “Partitioning” nodes. Our nominal column has been automatically set to “isAntivaxxer.” The Partitioning node will split the input table into a training and a test dataset. We specify equal sampling (relative 50%) and stratified sampling. We do this because our sample contains an equal number of classified anti-vaxxer and pro-vaxxer profiles.

After that we drag “Naïve Bayes Learner,” “Naïve Bayes Predictor,” “Decision Tree Learner,” “Decision Tree Predictor,” “Logistic Regression Learner,” and “Regression Prediction” nodes into the workspace and connect their inputs and outputs according to the network plan below. This means that we are applying three different machine learning algorithms to the anti-vaxxer dataset, building three different models that can classify new, unclassified profiles. The predictors will test the second half of our dataset against the three models developed by the three learners, to give us an indication of the quality of the three models. To measure the accuracy of the output, we need to add a “Scorer” to the output of each predictor.
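The idea behind a stratified 50% partition can be sketched in a few lines of Python. This is an illustration of the technique rather than a description of how the KNIME Partitioning node is implemented; the function name, the seed, and the stand-in rows are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, fraction=0.5, seed=42):
    """Split rows into (train, test), keeping class proportions in both parts."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Hypothetical stand-in for the 171 + 171 classified profiles.
rows = [{"isAntivaxxer": label, "n": i} for label in (0, 1) for i in range(171)]
train, test = stratified_split(rows, "isAntivaxxer")
print(len(train), len(test))
```

Splitting each class separately guarantees that the test set stays balanced, so accuracy figures are not distorted by one class being over-represented.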


The scorer needs to be set up to test the accuracy of the prediction “isAntivaxxer” against the preclassiﬁed variable “isAntivaxxer.”

Clicking on the Naïve Bayes Scorer shows the following confusion matrix. The confusion matrix shows that out of 170 test cases classiﬁed, 57 were correctly classiﬁed as pro-vaxxers and 54 were correctly classiﬁed as antivaxxers, leading to an accuracy of 65.3%.
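The accuracy figure follows directly from the confusion matrix: it is the number of correctly classified cases divided by all test cases. A small sketch, using only the counts quoted above (the function name is hypothetical):

```python
def accuracy(correct_counts, total):
    """Share of test cases whose predicted class equals the manual label."""
    return sum(correct_counts) / total

# Figures quoted in the text: 57 pro-vaxxers and 54 anti-vaxxers
# correctly classified out of 170 test cases.
naive_bayes = accuracy([57, 54], 170)
print(f"{naive_bayes:.1%}")  # prints 65.3%
```

Since the test set is balanced, a classifier that guesses at random would reach about 50%, which is the baseline against which the 65.3% should be read.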


Looking at the Naïve Bayes learning view shows that pro-vaxxers use more “me” and “my”; according to Pennebaker this is a sign of humility.


Clicking on the Decision Tree Scorer shows the following confusion matrix, telling us that 18 anti-vaxxers and 44 pro-vaxxers have been misclassiﬁed.

Looking at the decision tree shows that pro-vaxxers use more complex language (Sum_smallwords > 0.075).


Clicking on the Logistic Regression Scorer shows the following confusion matrix.

We can now also look at the coefficients of the Logistic Regression. As we see, we get some significant predictors, for instance meanNumberOfMentions, meanNumberOfHtmls, meanTextLength, etc. Putting them into the regression equation would allow us to calculate the probability F(x) for new people to be a pro- or an anti-vaxxer.

$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
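Once the coefficients are known, the logistic function F(x) can be evaluated directly. The sketch below assumes the standard single-predictor logistic form; the function name and the coefficient values are hypothetical, and the real coefficients would be read from the KNIME learner output.

```python
import math

def logistic_probability(x, beta0, beta1):
    """F(x) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical coefficients, for illustration only.
print(logistic_probability(x=0.0, beta0=0.0, beta1=1.5))  # prints 0.5
```

A value above 0.5 would assign the profile to one class and a value below 0.5 to the other; which class corresponds to which side depends on how the learner encodes isAntivaxxer.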

This concludes a very brief introduction to KNIME; more details can be found online. The book Guide to Intelligent Data Analysis by Borgelt, Höppner, and Klawonn (2010) gives a broad introduction to machine learning with KNIME examples.

MAIN LESSONS LEARNED
• Supervised learning needs a training and a test dataset.
• A dataset of categorized tweets from anti-vaxxers (denying the benefits of vaccinations) and pro-vaxxers (public health officials) helps identify anti-vaxxers based on their word usage in tweets.
• James Pennebaker’s “small words” are used as features in the machine learning (Pennebaker, 2013).
• Anti-vaxxers use fewer personal pronouns and less complex language.

APPENDIX B: GENERATING NICE GRAPH PICTURES WITH GEPHI

CHAPTER CONTENTS
• Gephi is an open source graph drawing and manipulation tool for Mac, Windows, and Linux.
• Gephi includes additional functions for graph drawing, filtering, clustering, and manipulation not available in Condor.

The open source graph drawing tool Gephi offers additional functionality to draw and manipulate graphs, as well as sophisticated layout options, that is not available in Condor. First, you will need to download the most recent version of Gephi (currently 0.9.1) from gephi.org. You might need to adjust the parameters of Gephi in the config file, by right-clicking on the “Gephi” icon and selecting “show package contents.”


In the gephi.conf file, you can decide what version of the Java virtual machine to run (jdkhome) (Gephi still seems to use Java 1.6) and how much memory to allocate (for instance, -Xmx11468m assigns Gephi 11,468 MB):

# ${HOME} will be replaced by user home directory according to platform
default_userdir="${HOME}/.${APPNAME}/0.8.2/dev"
default_mac_userdir="${HOME}/Library/Application Support/${APPNAME}/0.8.2/dev"

# options used by the launcher by default, can be overridden by explicit
# command line switches
default_options="--branding gephi -J-Xms64m -J-Xmx11468m -J-Xverify:none -J-Dsun.java2d.noddraw=true -J-Dsun.awt.noerasebackground=true -J-Dnetbeans.indexing.noFileRefresh=true -J-Dplugin.manager.check.interval=EVERY_DAY"
# for development purposes you may wish to append: -J-Dnetbeans.logger.console=true -J-ea

# default location of JDK/JRE, can be overridden by using --jdkhome switch
jdkhome="/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home/"

# clusters' paths separated by path.separator (semicolon on Windows, colon on Unices)
#extra_clusters=

Once Gephi is started, we are ready to load the data. In this example, I will look at my own mailbox data that I have downloaded into Condor previously, exporting it as a MySQL dump which can be directly loaded into Gephi.

The exported file, for example, “peter.sql,” needs to be loaded into MySQL, for example, by using Navicat, MySQL Workbench, or by directly using the command line:

PETERs-MacBook-Pro-2:~ pgloor$ /usr/local/mysql/bin/mysql -u root
mysql> create database petermail2;
Query OK, 1 row affected (0.00 sec)
mysql> use petermail2;
Database changed
mysql> source /Users/pgloor/Desktop/peter.sql

Afterwards, the network can be loaded into Gephi using the command “File->import database”


This will lead to the following image.

Calculating statistics, choosing a layout algorithm, coloring the nodes by cluster, and sizing nodes and labels by betweenness leads to the following image.

Image 80

Filtering only the top nodes by betweenness results in the following graph.

Image 81
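For readers who want to see what a betweenness filter does, the following sketch computes unnormalized betweenness centrality for a small undirected, unweighted network and keeps the top-ranked nodes. It uses Brandes' algorithm and is not Gephi's code; the function names and the toy network are hypothetical, and Gephi's own statistics may be normalized differently.

```python
from collections import deque

def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}

def top_nodes(adjacency, k):
    """Return the k nodes with the highest betweenness."""
    scores = betweenness(adjacency)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy network: a star around "hub" plus one extra edge a-b.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub", "b"],
    "b": ["hub", "a"],
    "c": ["hub"],
}
print(top_nodes(graph, 1))
```

In the toy network only "hub" lies on shortest paths between other nodes (a to c and b to c), so it is the single node that survives a top-1 filter.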

A discussion of the rich feature set of Gephi is beyond the scope of this book; I encourage you to experiment and try it out for yourself.

MAIN LESSONS LEARNED
• Gephi offers rich functionality to manipulate and visualize networks.
• It is particularly useful to produce appealing network pictures for presentations and reports.
• Gephi also supports export of many graph metrics for subsequent analysis.

APPENDIX C: SAMPLE MID-TERM EXAM

This section includes a sample Mid-Term Exam suitable for a one-semester course on digital social network analysis, organizational redesign and engineering, social media-based trend forecasting and prediction, and Coolhunting for trends and trendsetters.

1. Briefly explain the following (in your own words):
   a. Social network (2p)
   b. Reciprocity (2p)
   c. Egocentric network (2p)
   d. Draw a network that includes a clique with four nodes (2p)

2. Given the graph G below:
   a. What is the betweenness centrality for b? Use steps to show your thought process. (6p)
   b. What is the degree centrality for b? (2p)

3. Based on graph G from task 2 above, fill in the table below (8p; 0.5p each for the multiple choice and 0.5p for the explanation). For each statement, mark True, False, or Impossible to Say, and explain why.
   • G is connected
   • G is weighted
   • G is directed
   • G is complete
   • G has a bridge
   • G has a gatekeeper
   • G has strong ties
   • The density of G is more than 0.5

4. Draw a network with at least four nodes, where the group degree centrality is 1. (2p)
5. Draw a graph with five nodes with density 1. (2p)
6. What is the correlation between the two variables visualized below for (a), (b), and (c)? Explain why.
7. Explain the pros and cons of “collaborative competition” and “competitive collaboration,” and give some examples. (4p)
8. Which social network structure is best for spreading ideas quickly? Which network structure would you create to get others to accept your new idea? (4p)
9. Do a Coolhunting with Condor for “Hillary Clinton,” comparing the results against “Donald Trump.” (30p)

APPENDIX D: REFERENCES

Aharony, N., Pan, W., Ip, C., Khayal, I., & Pentland, A. (2011). Social fMRI: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Computing, 7(6), 643–659.

Allen, T., Gloor, P., Woerner, S., Raz, O., & Fronzetti Colladon, A. (2016). The power of reciprocal knowledge sharing relationships for startup success. Journal of Small Business and Enterprise Development, 23(3), 636–651.

Allen, T., Raz, O., & Gloor, P. (2009). Does geographic clustering still benefit high tech new ventures? The case of the Cambridge/Boston biotech cluster. MIT ESD-WP2009-01 Working Paper #1, 2009.

Apicella, C. L., Marlowe, F. W., Fowler, J. H., & Christakis, N. A. (2012). Social networks and cooperation in hunter-gatherers. Nature, 481(7382), 497–501.

Aral, S., & Walker, D. (2012). Identifying influential and susceptible members of social networks. Science, 337(6092), 337–341.

Ashmore, R. D., Deaux, K., & McLaughlin-Volpe, T. (2004). An organizing framework for collective identity: Articulation and significance of multidimensionality. Psychological Bulletin, 130(1), 80.

Awad, N. F., & Krishnan, M. S. (2006). The personalization privacy paradox: An empirical evaluation of information transparency and the willingness to be profiled online for personalization. MIS Quarterly, 13–28.

Battilana, J., & Casciaro, T. (2012). Change agents, networks, and institutions: A contingency theory of organizational change. Academy of Management Journal, 55(2), 381–398.

Blau, G. J. (1985). The measurement and prediction of career commitment. Journal of Occupational Psychology, 58(4), 277–288.

Bollen, J., Gonçalves, B., van de Leemput, I., & Ruan, G. (2016). The happiness paradox: Your friends are happier than you. arXiv preprint arXiv:1602.02665.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8.

Borgelt, C., Höppner, F., & Klawonn, F. (2010). Guide to intelligent data analysis. London: Springer-Verlag.

Braithwaite, V. A., & Law, H. G. (1985). Structure of human values: Testing the adequacy of the Rokeach Value Survey. Journal of Personality and Social Psychology, 49(1), 250.

Brunnberg, D., Gloor, P., & Giacomelli, G. (2013). Predicting customer satisfaction through (e-mail) network analysis: The communication score card. Proceedings of the 4th international conference on collaborative innovation networks COINs 2013, Santiago de Chile, August 11–13.

Burke, M., Kraut, R., & Marlow, C. (2011, May). Social capital on Facebook: Differentiating uses and users. Proceedings of the SIGCHI conference on human factors in computing systems, ACM (pp. 571–580).

Celli, F., & Poesio, M. (2014). Pr2: A language independent unsupervised tool for personality recognition from text. arXiv preprint arXiv:1402.2796.

Centola, D. (2010). The spread of behavior in an online social network experiment. Science, 329(5996), 1194–1197.

DiGrazia, J., McKelvey, K., Bollen, J., & Rojas, F. (2013). More tweets, more votes: Social media as a quantitative indicator of political behavior. PloS One, 8(11), e79449.

DiMaggio, M., Gloor, P., & Passiante, G. (2009). Collaborative innovation networks, virtual communities, and geographical clustering. International Journal of Innovation and Regional Development, 1(4), 387–404.

Doshi, L., Krauss, J., Nann, S., & Gloor, P. (2009). Predicting movie prices through dynamic social network analysis. Proceedings COINs 2009, collaborative innovations networks conference, Savannah GA, Oct 8–11.

Durkheim, E., & Swain, J. W. (2008). The elementary forms of the religious life. Courier Corporation.

Fischbach, K., Gloor, P., & Schoder, D. (2009). Analysis of informal communication networks: A case study. Business & Information Systems Engineering, 2 (also in German).

472

Sociometrics and Human Relationships

Frick, K., Guertler, D., & Gloor, P. (2009). Coolhunting for the world’s thought leaders. Proceedings 4rd International conference on collaborative innovation networks COINs 2013, Santiago de Chile, August 1113. Fu, F., Nowak, M. A., Christakis, N. A., & Fowler, J. H. (2012). The evolution of homophily. Scientiﬁc Reports, 2. Fuehres, H., Gloor, P., Henninger, M., Kleeb, R., & Nemoto, K. (2012). Galaxysearch: Discovering the knowledge of many by using Wikipedia as a meta-search index. Proceedings of collective intelligence 2012, Cambridge, MA, April 1820. Futterer, T., Gloor, P., Malhotra, T., Mfula, H., Packmohr, K. H., & Schultheiss, S. (2013). WikiPulse A news-portal based on Wikipedia. Proceedings 4rd International conference on collaborative innovation networks COINs 2013, Santiago de Chile, August 1113. Garcia, C., Parraguez, P., Barahona, M., & Gloor, P. (2012). Tracking the 2011 Student-led collective movement in Chile through social media use. Proceedings collective intelligence 2012, Cambridge, MA, April 1820. Gloor, P. (2007). Coolhunting for trends on the Web. (invited paper). Proceedings IEEE 2007 international symposium on collaborative technologies and systems, Orlando, May 2125. Gloor, P. (2010). Coolfarming Turn your great idea into the next big thing AMACOM, New York, NY. Gloor, P. (2011). To become a better manager stop being a manager. Ivey Business Journal. March/April 2011.

Appendix D

473

Retrieved from: http://iveybusinessjournal.com/publication/to-become-a-better-manager-stop-being-a-manager/ Gloor, P. (2015). What email reveals about your organization. Sloan Management Review, Winter. Gloor, P., De Boer, P., Lo, W., Wagner, S., Nemoto, K., & Fuehres, H. (2015). Cultural anthropology through the lens of Wikipedia A comparison of historical leadership networks in the English, Chinese, and Japanese Wikipedia. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 1214. Gloor, P., Dorsaz, P., Fuehres, H., & Vogel, M. (2012). Choosing the right friends Predicting success of startup entrepreneurs and innovators through their online social network structure. International Journal of Organisational Design and Engineering, 3(2), 6885. Gloor, P., & Fronzetti, A. (2015). Measuring organizational consciousness through e-mail based social network analysis. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 1214. Gloor, P., & Giacomelli, G. (2014). Reading global clients’ signals. Sloan Management Review, Spring. Gloor, P., Grippa, F., Borgert, A., Colletti, R., Dellal, G., Margolis, P., & Seid, M. (2011). Towards growing a COIN in a medical research community. Procedia Social and Behavioral Sciences, 26, Proceedings COINs 2010, Collaborative innovations networks conference, Savannah GA, October 79, 2010.

474

Sociometrics and Human Relationships

Gloor, P., Krauss, J., Nann, S., Fischbach, K., & Schoder, D. (2009). Web Science 2.0: Identifying trends through semantic social network analysis. IEEE conference on social computing (SocialCom-09), Vancouver, August 2931. Gloor, P., Laubacher, R., Dynes, S., & Zhao, Y. (2003). Visualization of communication patterns in collaborative innovation networks: Analysis of some W3C working groups. ACM CKIM international conference on information and knowledge management, New Orleans, November 38. Gloor, P., Margolis, P., Seid, M., & Dellal, G. (2014). Coolfarming Lessons from the beehive to increase organizational creativity. MIT Sloan School Working Paper No. 5123-14. Gloor, P., & Nemoto, K. (2013). Who really matters in the world Leadership networks in different language Wikipedias. Places and Spaces Mapping Science, Map #157. Gloor, P., Niepel, S., & Li, Y. (2006, January). Identifying potential suspects by temporal link analysis. MIT CCS Working Paper. Gloor, P., & Paasivaara, M. (2013). COINs change leaders Lessons learned from a distributed course. Proceedings 4rd International conference on collaborative innovation networks COINs 2013, Santiago de Chile, August 1113. Gloor, P., Paasivaara, M., Lassenius, C., Schoder, D., Fischbach, K., & Miller, C. (2011). Teaching a global

Appendix D

475

project course: Experiences and lessons learned. ICSE international conference on software engineering Collaborative teaching of globally distributed software development Community building workshop, Honolulu, Hawaii, May 23. Gloor, P., Paasivaara, M., & Miller, C. (2015). Lessons from the coinseminar. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 1214. Gloor, P., Paasivaara, M., Schoder, D., & Willems, P. (2007). Finding collaborative innovation networks through correlating performance with social network structure. Journal of Production Research. Gloor, P., Woerner, S., Schoder, D., Fischbach, K., & Fronzetti Colladon, A. (2016). Size does not matter In the virtual world. Comparing online social networking behavior with business success of entrepreneurs. International Journal of Entrepreneurial Venturing. In press. Gloor, P., & Zhao, Y. (2006). Analyzing actors and their discussion topics by semantic social network analysis. Proceedings of 10th IEEE international conference on information visualisation IV06, London, July 57. Grippa, F., & Gloor, P. (2009). You are who remembers you. Detecting leadership through accuracy of recall. Social Networks, 31, 255261. Grippa, F., Palazzolo, M., Buccuvalas, J., & Gloor, P. (2012). Monitoring changes in the social network structure of clinical care teams resulting from team

476

Sociometrics and Human Relationships

development efforts. International Journal of Organisational Design and Engineering, 2(4), 380401. Grippa, F., Provost, S., Gloor, P., McKean, M., & Thakkar, S. A. (2014). Systematic methodology to characterize communication patterns in chronic care innovation networks. In S. Long, E.-H. Ng, & C. Downing (Eds.), Proceedings of the American society for engineering management international annual conference. Grippa, F., Zilli, A., Laubacher, R., & Gloor, P. (2006). E-mail may not reﬂect the social network. NAACSOS conference, Notre Dame IN, North American Association for Computational Social and Organizational Science, June 2223. Grootaert, C. (Ed.). (2004). Measuring social capital: An integrated questionnaire. No. 18. World Bank Publications. Hill, R. A., & Dunbar, R. I. (2003). Social network size in humans. Human nature, 14(1), 5372. Hybbeneth, S., Brunberg, D., & Gloor, P. (2014). Increasing knowledge worker productivity through a “Virtual Mirror” of the social network. International Journal of Organisational Design and Engineering, 3(34), 302316. Jordan, J. J., Hoffman, M., Nowak, M. A., & Rand, D. G. (2016). Uncalculating cooperation as a signal of trustworthiness. Retrieved from SSRN. Kidane, Y., & Gloor, P. (2007). Correlating temporal communication patterns of the eclipse open source

Appendix D

477

community with performance and creativity. Computational & Mathematical Organization Theory, 13(1), 1727. Kleeb, R., Gloor, P., & Nemoto, K. (2011). Wikimaps: Dynamic maps of knowledge. Proceedings 3rd International conference on collaborative innovation networks COINs 2011, Basel, Switzerland, September 810. Krauss, J., Nann, S., Simon, D., Fischbach, K., & Gloor, P. (2008). Predicting movie success and academy awards through sentiment and social network analysis. Proceedings European conference on information systems (ECIS), Galway, Ireland, June 911. Lewis, K., Gonzalez, M., & Kaufman, J. (2012). Social selection and peer inﬂuence in an online social network. Proceedings of the National Academy of Sciences, 109(1), 6872. Luhtanen, R., & Crocker, J. (1992). A collective selfesteem scale: Self-evaluation of one’s social identity. Personality and Social Psychology Bulletin, 18(3), 302318. Maddali, H. T., Gloor, P., & Margolis, P. (2015). Comparing online community structure of patients of chronic diseases. Proceedings of the 5th international conference on collaborative innovation networks COINs15, Tokyo, Japan, March 1214. McCrae, R. R., & Costa Jr, P. T. (1997). Personality trait structure as a human universal. American Psychologist, 52(5), 509.

478

Sociometrics and Human Relationships

McLean, B., & Elkind, P. (2013). The smartest guys in the room: The amazing rise and scandalous fall of Enron. London: Penguin. Merten, F., & Gloor, P. (2009). Too much e-mail decreases job satisfaction. Proceedings COINs 2009, collaborative innovations networks conference, Savannah GA, October 811. Moat, H. S., Curme, C., Avakian, A., Kenett, D. Y., Stanley, H. E., & Preis, T. (2013). Quantifying Wikipedia usage patterns before stock market moves. Scientiﬁc Reports, 3. Narayan, D., & Cassidy, M. F. (2001). A dimensional approach to measuring social capital: Development and validation of a social capital inventory. Current Sociology, 49(2), 59102. Naveen Farag, A. S., & Krishnan, M. S. (2006). The personalization privacy paradox: An empirical evaluation of information transparency and the willingness to be proﬁled online for personalization. MIS Quarterly, pp. 1328. Nowak, M. A. (2006). Five rules for the evolution of cooperation. Science, 314(5805), 15601563. Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011, June). Finding deceptive opinion spam by any stretch of the imagination. Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies (Vol. 1, pp. 309319). Pennebaker, J. (2013). The secret life of pronouns: What our words say about us. London: Bloomsbury Press.

Appendix D

479

Piotrowski, S. J., & Van Ryzin, G. G. (2007). Citizen attitudes toward transparency in local government. The American Review of Public Administration, 37(3), 306323. Preis, T., Moat, H. S., & Stanley, H. E. (2013). Quantifying trading behavior in ﬁnancial markets using Google trends. Scientiﬁc Reports, 3. Preot¸iuc-Pietro, D., Volkova, S., Lampos, V., Bachrach, Y., & Aletras, N. (2015). Studying user income through language, behaviour and affect in social media. PloS One, 10(9), e0138717. Quercia, D., Kosinski, M., Stillwell, D., & Crowcroft, J. (2011). Our twitter proﬁles, our selves: Predicting personality with Twitter. Privacy, Security, Risk and Trust (PASSAT) and IEEE third international conference on social computing (SocialCom), 2011, pp. 180185. Rawlins, B. (2008). Measuring the relationship between organizational transparency and employee trust. Public Relations Journal, 2(2), 121. Rye, M. S., Loiacono, D. M., Folck, C. D., Olszewski, B. T., Heim, T. A., & Madia, B. P. (2001). Evaluation of the psychometric properties of two forgiveness scales. Current Psychology, 20(3), 260277. Satyanath, S., Voigtländer, N., & Voth, H. J. (2013). Bowling for fascism: Social capital and the rise of the Nazi Party (No. w19201). National Bureau of Economic Research.

480

Sociometrics and Human Relationships

Schwartz, S. H. (2012). An overview of the Schwartz theory of basic values. Online Readings in Psychology and Culture, 2(1). doi:10.9707/2307-0919.1116 Sparrow, B., Liu, J., & Wegner, D. M. (2011). Google effects on memory: Cognitive consequences of having information at our ﬁngertips. Science, 333(6043), 776778. Sveiby, K. E., & Simons, R. (2002). Collaborative climate and effectiveness of knowledge work-an empirical study. Journal of Knowledge Management, 6(5), 420433. Taleb, N. N. (2007). The black swan: The impact of the highly improbable. New York, NY: Random House. Tsvetovat, M., & Koutznetsov, A. (2011). Social network analysis for startups. O’Reilly. Urdan, T. (2010). Statistics in plain English. Abingdon: Routledge. Vedres, B., & Stark, D. (2010). Structural folds: Generative disruption in overlapping Groups1. American Journal of Sociology, 115(4), 11501190. Wagner, C. S., Horlings, E., Whetsell, T. A., Mattsson, P., & Nordqvist, K. (2015). Do Nobel laureates create prize-winning networks? An analysis of collaborative research in physiology or medicine. PloS One, 10(7), e0134164. Waldman, D. A., de Luque, M. S., Washburn, N., House, R. J., Adetoun, B., Barrasa, A., & Dorfman, P. (2006). Cultural and leadership predictors of corporate social responsibility values of top management: A GLOBE study

Appendix D

481

of 15 countries. Journal of International Business Studies, 37(6), 823837. Wassermann, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge: Cambridge University Press. Watson, D. L., Newton, M., & Kim, M. S. (2003). Recognition of values-based constructs in a summer physical activity program. The Urban Review, 35(3), 217232. Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44(3), 363373. Yasseri, T., Spoerri, A., Graham, M., & Kertész, J. (2014). The most controversial topics in Wikipedia: A multilingual and geographical analysis. Yun, Q., & Gloor, P. (2012). The web mirrors value in the real world Comparing a ﬁrm’s valuation with its web network position. Sloan Technical Report, Cambridge, MA. Zhang, X., Fuehres, H., & Gloor, P. (2011a). Predicting asset value through twitter Buzz. In J. Altmann, U. Baumoel, B. Kraemer (Eds.), Proceedings 2nd symposium on collective intelligence, Collin 2011 (Vol. 112), Seoul, Springer Advances in Intelligent and Soft Computing, June 910. Zhang, X., Fuehres, H., & Gloor, P. (2011b). Predicting stock market indicators through twitter: “I hope it is not as bad as I fear,” Procedia Social and Behavioral

482

Sociometrics and Human Relationships

Sciences, 26, Collaborative innovations networks conference, Savannah GA, October 79, 2010. Zhang, X., Gloor, P., & Grippa, F. (2013). Measuring creative performance of teams through dynamic semantic social network analysis. International Journal of Organisational Design and Engineering, 4(2), 118. Zilli, A., Grippa, F., Gloor, P., & Laubacher, R. (2006). One in four is enough strategies for selecting ego mailboxes for a group network view. Proceedings European conference on complex systems ECCS ‘06, Oxford UK, September 2529.

BIOGRAPHY

Peter A. Gloor is a Research Scientist at the Center for Collective Intelligence at MIT’s Sloan School of Management, where he leads a project exploring Collaborative Innovation Networks. He is also the Founder and Chief Creative Officer of the software company galaxyadvisors, an Honorary Professor at the University of Cologne, a Distinguished Visiting Professor at P. Universidad Católica de Chile, and an Honorary Professor at Jilin University, Changchun, China. Earlier, he was a partner with Deloitte and PwC, and a manager at UBS. He received his Ph.D. in computer science from the University of Zurich and was a Post-Doc at the MIT Lab for Computer Science. In his spare time, Peter likes to work on projects bridging the digital divide, enjoy nature, and play the piano.

© 2017 Peter A. Gloor
