Handbook of Social Computing

Edited by
Peter A. Gloor Research Scientist, MIT Center for Collective Intelligence, Cambridge, USA
Francesca Grippa Professor of Business Strategy, College of Professional Studies, Northeastern University, USA
Andrea Fronzetti Colladon Associate Professor of Business Management and Analytics, University of Perugia, Italy
Aleksandra Przegalinska Associate Professor and Vice-Rector, Kozminski University, Poland
Cheltenham, UK • Northampton, MA, USA
© Peter A. Gloor, Francesca Grippa, Andrea Fronzetti Colladon and Aleksandra Przegalinska 2024
Cover image: olena ivanova on Unsplash.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise without the prior permission of the publisher.

Published by
Edward Elgar Publishing Limited
The Lypiatts
15 Lansdown Road
Cheltenham
Glos GL50 2JA
UK

Edward Elgar Publishing, Inc.
William Pratt House
9 Dewey Court
Northampton
Massachusetts 01060
USA

A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2023952157

This book is available electronically in the Business subject collection
http://dx.doi.org/10.4337/9781803921259
ISBN 978 1 80392 124 2 (cased) ISBN 978 1 80392 125 9 (eBook)
Contents
List of contributors
Introduction – Social computing: panacea or abyss?
Peter A. Gloor, Francesca Grippa, Andrea Fronzetti Colladon and Aleksandra Przegalinska

PART I INTRODUCTION TO SOCIAL COMPUTING

1 Network data visualization
Walter Didimo, Giuseppe Liotta and Fabrizio Montecchiani

2 Exponential random graph models: explaining strategic patterns of collaboration between artists in the music industry with data from Spotify
Claudia Zucca

3 Knowing what you get when seeking semantic similarity: exploring classic NLP method biases
Johanne Saint-Charles, Pierre Mongeau and Louis Renaud-Desjardins

PART II PREDICTION WITH ONLINE SOCIAL MEDIA

4 Chasing the Black Swan in cryptocurrency markets by modeling cascading dynamics in communication networks
Christian Schwendner, Vanessa Kremer, Julian Gierenz, Hasbi Sevim, Jan-Marc Siebenlist and Dilber Güclü

5 Presidential communications on Twitter during the COVID-19 pandemic: mediating polarization and trust, moderating mobility
Mikhail Oet, Tuomas Takko and Xiaomu Zhou

6 COVID-19 Twitter discussions in social media: disinformation, topical complexity, and health impacts
Mikhail Oet, Xiaomu Zhou, Kuiming Zhao and Tuomas Takko

PART III MEASURING EMOTIONS

7 Predicting YouTube success through facial emotion recognition of video thumbnails
Peter-Duy-Linh Bui, Martin Feldges, Max Liebig and Fabian Weiland

8 Do angry musicians play better? Measuring emotions of jazz musicians through body sensors and facial emotion detection
Lee J. Morgan and Peter A. Gloor

9 Using plants as biosensors to measure the emotions of jazz musicians
Anushka Bhave, Fritz K. Renold and Peter A. Gloor

PART IV APPLICATIONS IN BUSINESS AND MARKETING

10 How does congruence between customer and brand personality influence the success of a company?
Tobias Olbrück, Peter A. Gloor, Ludovica Segneri and Andrea Fronzetti Colladon

11 Netnography 2.0: a new approach to examine crowds on social media
Mathias Efinger, Xisa Lina Eich, Marius Heck, Dung Phuong Nguyen, Halil Ibrahim Özlü, Teresa Heyder and Peter A. Gloor

12 Crowdfunding success: how campaign language can predict funding
Andrea Fronzetti Colladon, Julia Gluesing, Francesca Greco, Francesca Grippa and Ken Riopelle

13 Design, content and application of consent banners on plastic surgeon websites: derivation of a typology and discussion of possible implications for data analytics and AI applications
Michael Beier and Katrin Schillo

PART V MORE SUSTAINABILITY THROUGH SOCIAL COMPUTING

14 Creating a systematic ESG (Environmental Social Governance) scoring system using social network analysis and machine learning for more sustainable company practices
Aarav Patel and Peter A. Gloor

15 Two chambers, no silver bullets: the growing polarity of climate change discourse
Jacek Mańko and Dariusz Jemielniak

PART VI HUMAN INTERACTION WITH OTHER SPECIES

16 Plants as biosensors: tomato plants’ reaction to human voices
Patrick Fuchs, Rebecca von der Grün, Camila Ines Maslatón and Peter A. Gloor

17 Prototyping a mobile app which detects dogs’ emotions based on their body posture: a design science approach
Alina Hafner, Thomas M. Oliver, Benjamin B. Paßberger and Peter A. Gloor

PART VII TEACHING AI FOR SOCIAL COMPUTING

18 Say ‘yes’ to ‘no-code’ solutions: how to teach low-code and no-code competencies to non-IT students
Monika Sońta and Aleksandra Przegalinska

Index
Contributors
Michael Beier (Doctorate, Dr. rer. pol., Faculty of Management, Economics and Social Sciences, University of Cologne, Germany) is a Senior Research Scientist at the Swiss Institute for Entrepreneurship at the University of Applied Sciences of the Grisons in Switzerland. In his work, he deals with topics of organization science, computational social science, digital transformation, corporate data analytics and AI applications, as well as business ethics.

Anushka Bhave graduated with a Bachelor of Technology (B.Tech) in Computer Engineering from Savitribai Phule Pune University in 2022. She has been working as a student researcher at the MIT Center for Collective Intelligence with Peter Gloor since 2022. Her research interests lie in multimodal deep learning, human-centered AI, and natural language processing (NLP). Prior to this, she lead-authored four research papers and completed several internship stints in the fields of machine learning as well as software development.

Peter-Duy-Linh Bui (Cologne Institute of Information Systems, University of Cologne, Germany) is an employee of Adesso SE, where he works as a software developer. He earned both his Bachelor of Science and Master of Science degrees in Information Systems from the University of Cologne, Germany. His research focuses specifically on scaling agile development.

Walter Didimo (Department of Engineering, University of Perugia, Italy) received a PhD degree in computer science from the University of Rome “La Sapienza” in 2000. He is currently an Associate Professor with the Department of Engineering, University of Perugia. His research interests include graph drawing, information visualization, algorithm engineering, and computational geometry. He has authored more than 150 international publications in the above areas and chaired the program committee of the International Symposium on Graph Drawing.

Mathias Efinger (Faculty of Information Systems and Applied Computer Sciences, University of Bamberg, Germany) is a graduate student pursuing a Master’s degree in International Information Systems Management. With a keen interest in network analysis, Mathias is preparing to embark on a career as an IT consultant in the aviation industry.

Xisa Lina Eich (Faculty of Information Systems and Applied Computer Sciences, University of Bamberg, Germany) holds a Master’s degree in International Information Systems Management from the University of Bamberg. She currently works as an IT consultant at BCG Platinion.

Martin Feldges (Cologne Institute of Information Systems, University of Cologne, Germany) is currently employed as a business intelligence consultant at areto consulting GmbH. He holds a Bachelor of Science and a Master of Science in Information Systems from the University of Cologne in Germany. His research interests lie in the fields of machine learning and data warehousing, with a specific focus on how these technologies can be used to solve business problems.
Andrea Fronzetti Colladon (Department of Engineering, University of Perugia, Italy) is Associate Professor of Business Management and Analytics and Head of the Business and Collective Intelligence Lab at the University of Perugia. Previously, he was Visiting Professor at the Department of Management of Kozminski University and Visiting Scholar at the MIT Center for Collective Intelligence. In his research, Professor Fronzetti Colladon combines methods from network science, natural language processing, and machine learning with theories from the social sciences, psychology, humanities, and linguistics to advance knowledge and discovery about management and human behavior. He is the author of more than 100 scientific publications, and his work has been featured in magazines and newspapers such as Harvard Business Review, Psychology Today, Il Sole 24 Ore, and Warsaw Business Journal. Professor Fronzetti Colladon serves as Area Editor of Computers and Industrial Engineering, Associate Editor of the International Journal of Engineering Business Management, and Academic Editor of PLoS One and Nature Scientific Reports. He is also the instructor of several courses on Social Network Analysis, Text Mining, Business Analytics, and Change Management.

Patrick Fuchs (University of Bamberg, Germany) studied business administration in his Bachelor’s degree and worked as a bank clerk. He later began studying Information Systems (MSc) at the Otto-Friedrich-Universität in Bamberg. He currently lives in Bamberg and is writing his thesis in the field of communication, telecommunications, and computer networks.

Julian Gierenz (Otto-Friedrich-Universität, Germany) received his BA in Staatswissenschaften with a focus on Computational Social Science from the University of Erfurt and his MA in Political Science with a focus on Computational Social Science from Otto-Friedrich-Universität Bamberg. He currently works as a student assistant at the Chair of Business Informatics, especially Social Networks.

Peter A. Gloor is a Research Scientist at the Center for Collective Intelligence at MIT, where he leads a project exploring Collaborative Innovation Networks and Happiness. He is also Founder and Chief Creative Officer of the software company galaxyadvisors, and Honorary Professor at the University of Cologne and Jilin University, Changchun, China. Previously he was a partner with Deloitte and PwC, and a manager at UBS. He obtained his PhD in Computer Science from the University of Zurich and was a post-doc at the MIT Lab for Computer Science. His newest book, Happimetrics: Leveraging AI to Untangle the Surprising Link Between Ethics, Happiness, and Business Success, was published by Edward Elgar in October 2022.

Julia Gluesing (Wayne State University, USA) is a business and organizational anthropologist with more than 40 years’ experience in industry and academia as a consultant, researcher, and trainer in global business development, focusing on global leadership development, managing global teams, managing change, innovating across cultures, and cross-cultural communication and training. Julia is a part-time faculty member in the Industrial and Systems Engineering Department at Wayne State University, where she teaches the management of technology change and serves as a leadership project advisor in the Engineering Management Master’s Program. She also teaches courses in qualitative methods, global leadership, and global perspectives in the Global Executive Track PhD Program, for which she was a founder and co-director.
Francesca Greco (University of Udine, Italy), PhD in sociology and psychology, is a tenure-track Assistant Professor at the Department of Languages and Literatures, Communication, Education and Society, University of Udine, where she teaches Advertising Communication and Public Relations. She is the “Organ donation and transplants” expert of the Health and Medicine Board, Italian Association of Sociology (AIS). She is an expert in communication and cultural processes, big data, computational sociology, and natural language processing. She developed a socio-cultural profiling method, emotional text mining (ETM), which won awards in Italy and France.

Francesca Grippa (Northeastern University, USA) is a Professor of Business Strategy and Associate Dean of Research at Northeastern University, College of Professional Studies, where she leads the Lab for Inclusive Entrepreneurship. Dr Grippa is a Research Affiliate at the MIT Media Lab in the Human Dynamics group. Dr Grippa has authored several scientific papers on collaborative innovation networks in the Journal of Business Research, Scientific Reports, Knowledge Management Research & Practice, and Journal of Knowledge Management and Social Networks, among others. She has co-authored three books on the digital transformation of collaboration and collaborative innovation networks.

Dilber Güclü (Universität zu Köln, Germany) has a BSc in Wirtschaftsinformatik and an MSc in Information Systems from Universität zu Köln. She is currently a working student at Adesso, where she works as a front-end developer.

Alina Hafner (University of Bamberg, Germany) was an Information Systems Master’s student at the University of Bamberg. She is now working on her PhD at the Technical University of Munich.

Marius Heck (University of Bamberg, Germany) is a Master’s student in International Information Systems Management at the University of Bamberg and is currently writing his thesis about digital trends in social networks. He works as a student assistant at the Department of Information Systems and Social Networks at the University of Bamberg.

Teresa Heyder (University of Bamberg, Germany) holds a Master’s degree in Business Administration from the University of Bamberg with a focus on human resource management (HRM) and innovation. After working as an employer branding manager at HUK Coburg, she now works as a research associate at the Department of Information Systems and Social Networks at the University of Bamberg.

Dariusz Jemielniak specializes in the strategy of organizations in the technology (Internet) industry and the study of open collaboration communities. He holds a Master’s degree (2000), PhD (2004), habilitation (2009), and professorship in management, along with another habilitation in sociology (2019). In 2020, he became the youngest correspondent member of the Polish Academy of Sciences in humanities and social sciences in the history of the institution. Since 2015, Jemielniak has been a faculty associate at the Berkman Klein Center for Internet and Society at Harvard University. His most significant works include The New Knowledge Workers (Edward Elgar, 2012), Common Knowledge? An Ethnography of Wikipedia (Stanford University Press, 2014), Thick Big Data: Doing Digital Social Sciences (Oxford University Press, 2020), Collaborative Society (MIT Press, 2020, together with A. Przegalinska), and Strategizing AI in Business and Education: Emerging Technologies and Business Strategy (Cambridge University Press, 2023, together with A. Przegalinska).
Vanessa Kremer (Universität zu Köln, Germany) is a skilled data specialist with a BSc in Information Informatics and an MSc in Information Systems, both from Universität zu Köln. She currently works in data engineering and data analytics in the real estate sector at Aachener Grundvermögen mbH, with a focus on data integration, transformation, analysis, and visualization.

Max Liebig (University of Cologne, Germany) is currently employed as a software developer at the Gothaer Allgemeine Versicherung AG insurance company. He earned a BSc degree in Business Administration, a BSc degree in Information Systems, and an MSc degree in Information Systems from the University of Cologne, Germany. The main focus of his research is in the area of process mining and organizational routines.

Giuseppe Liotta is a Professor with the Department of Engineering, University of Perugia, Italy. His research interests include information visualization, graph drawing, and computational geometry. On these topics, he has published more than 250 research papers. He chaired the Steering Committee of the International Symposium on Graph Drawing and Network Visualization and currently serves as the editor in chief of Computer Science Review and of the Journal of Graph Algorithms and Applications.

Jacek Mańko holds an MA degree in cognitive science obtained at the Adam Mickiewicz University in Poznan (2014). In the past, he also studied and worked at the Albert-Ludwig University in Freiburg, Germany. In 2020, he graduated from the postgraduate program “Data Processing – Big Data” at Adam Mickiewicz University in Poznan. His research interests include AI across a broad spectrum of other disciplines such as sociology, psychology, cognitive science, economics, and, last but not least, the ethics of AI.

Camila Ines Maslatón (University of Bamberg, Germany) studied Multimedia Design at the UADE University in Buenos Aires, Argentina. She then completed a Master’s in Computing in the Humanities at the Otto-Friedrich-Universität Bamberg in Germany, where she wrote her thesis on localizing the center of plants using machine learning. She is currently based in Bonn.

Pierre Mongeau is a Full Professor of Social and Public Communication at the University of Quebec at Montreal. His work focuses on the study of phenomena related to human communication: communication in groups, analysis of social networks, and socio-semantic networks. He has published about ten books, about 50 articles, and as many conference papers in collaboration with researchers and practitioners in the fields of communication, psychology, education, and mathematics. He was successively program director, department head, and Dean of the Faculty of Communication.

Fabrizio Montecchiani (Department of Engineering, University of Perugia, Italy) received a PhD degree in information engineering from the University of Perugia in 2014, where he currently works as an Associate Professor. His research interests include graph drawing, computational geometry, visual analytics, and Big Data algorithms. He has published more than 100 scientific papers in the above areas. In 2021, he was recognized by the Italian Chapter of the European Association for Theoretical Computer Science (IC-EATCS) as the Best Young Italian Researcher in Theoretical Computer Science. He has been a guest editor of the Journal of Graph Algorithms & Applications and of Computational Geometry: Theory & Applications, and has been a program committee member of international conferences such as the European Workshop on Computational Geometry, the International Symposium on Graph Drawing and Network Visualization, Mathematical Foundations of Computer Science, and the International Workshop on Graph-Theoretic Concepts in Computer Science.
Lee J. Morgan is a sophomore in electrical engineering and computer science at the Massachusetts Institute of Technology, USA.

Dung Phuong Nguyen is a Master’s student in International Information Systems at the University of Bamberg, Germany.

Mikhail Oet is an Associate Teaching Professor at Northeastern University and Lead Faculty in the Master of Science in Commerce and Economic Development program. He has held teaching positions at the Federal Reserve System and Case Western Reserve University, where he obtained his PhD. Mikhail also holds degrees from Yale (Architecture), Harvard (Architecture), Cooper Union (Engineering), and New York University (Finance). Mikhail began his public service career with the Federal Reserve System, focusing on the resilience of complex financial intermediaries. In 2006, Mikhail initiated the development of early warning methodologies to identify instability in the US financial system. As an economist at the Federal Reserve Bank of Cleveland, Mikhail led the development of financial stability analytics. In this area, Mikhail has authored 21 refereed papers receiving over 550 citations and published in Review of Finance, Journal of Financial Stability, Journal of Banking & Finance, and European Journal of Finance, among others.

Tobias Olbrück is a Cloud Advisory Consultant at Accenture. During his Master’s degree at the University of Cologne, Germany, he specialized in the field of data analytics and artificial intelligence. In his Master’s thesis he investigated the influence of congruence between customer and brand personality on the success of companies.

Thomas M. Oliver was a data science Master’s student at Lucerne University of Applied Sciences and Arts, Switzerland. He is now working as a software engineer.

Halil Ibrahim Özlü (Faculty of Management, Economics, and Social Sciences, University of Cologne, Germany) holds a Master’s degree from the University of Cologne in Information Systems. He currently works as a Data Scientist at PricewaterhouseCoopers.

Benjamin B. Paßberger was an Information Systems Master’s student at the University of Bamberg, Germany. He is now working as a consultant.

Aarav Patel is a senior at Amity Regional High School who is conducting research at the MIT Center for Collective Intelligence under Dr Peter Gloor. He is passionate about the intersection of data science, business sustainability, and AI, and has three pending first-author publications in this area. He looks forward to studying Computer Science and Business through the Jerome Fisher Program in Management & Technology at the University of Pennsylvania.

Aleksandra Przegalinska is an Associate Professor and Vice-Rector of Kozminski University, Poland, responsible for International Relations and ESR, as well as Senior Research Associate at the Center for Labour and Just Economy, Harvard University. Aleksandra is the head of the Human–Machine Interaction Research Center at Kozminski University, and the Leader of the AI in Management Program. Until recently, she conducted post-doctoral research at the Center for Collective Intelligence at the Massachusetts Institute of Technology in Boston. She graduated from The New School for Social Research in New York. She is the co-author of Collaborative Society (MIT Press) and Strategizing AI in Business and Education (Cambridge University Press), published together with Dariusz Jemielniak.
Louis Renaud-Desjardins holds a Master’s degree in quantum physics and a Master’s degree in oceanography. For the past four years, he has been working at the Office of Digital Initiatives (ODI), University of Quebec at Montreal, Canada, helping researchers in the humanities and social sciences with digital issues and digital methods.

Fritz K. Renold (Music Productions Renold & Co.) is a Swiss saxophonist, composer, and producer of the Jazzaar Festival who has been working with international musicians for years. As early as 1987, Renold was elected the youngest member of the faculty of the Berklee College of Music. Since February 1997, Fritz Renold and the “Bostonian Friends” have been under contract with SONY – BMG Music – Columbia Records in New York. Renold’s body of work includes over 1,500 compositions and arrangements, many of which have been recorded on Shanti Records, EPM Records, and SONY Music, Columbia Records. With the concept “Bandstand Learning With Role Models”, which he developed together with his wife Helen Savari, he integrates international jazz stars with young talents of the Swiss music scene. During the past 30 years he has worked with Randy Brecker, Mike Mainieri, Cecil Bridgewater, Benny Golson, Buster Williams, Christian Jacob, Britt Woodman, Bob Berg, Victor Lewis, Harvie S., Jerry Bergonzi, Benny Bailey, Mark Soskin, Bill Pierce, Billy Cobham, Michael Baker, John Abercrombie, Miroslav Vitous, Alphonso Johnson, and many more. Together with his wife Helen, he produced the album Ron Carter & The Jazzaar Festival Big Band, which was nominated for a Grammy Award.

Ken Riopelle is an educator, entrepreneur, management consultant, and retired Research Professor at Wayne State University, USA. His professional career spans over 40 years in both the auto industry and academia. His primary research interests include accelerating the diffusion of innovations in globally networked organizations, which was funded by a National Science Foundation (NSF) grant from 2005 to 2010, the study of Collaborative Innovation Networks or COINs, and the Science of Team Science, using co-author and co-citation analysis as a method to visualize, measure, and understand scientific collaboration.

Johanne Saint-Charles is a Full Professor at the Faculty of Communication of the Université du Québec à Montréal (UQAM), Canada and Director of the Health and Society Institute and the WHO/PAHO Collaborating Centre on Occupational and Environmental Health. In the field of health and the environment, most of her research work is carried out within the framework of an ecosystem approach to health. In communication, she has developed expertise on relational dynamics within groups and networks; in particular, she is interested in how social relations affect the dissemination of ideas and discourse and how these in turn transform social relations. She has been teaching for several years on social networks and small groups.

Katrin Schillo (Dr. rer. pol., School of Business, Economics and Society, Friedrich-Alexander-University of Erlangen-Nuremberg, Germany) is a Senior Research Scientist and Lecturer in General and Strategic Management at the University of Applied Sciences of the Grisons. Her primary research interests are managerial and cultural aspects of digital transformation processes, as well as the corporate development of small and medium-sized enterprises (SMEs).
Christian Schwendner holds a BSc in Industrial and Electrical Engineering from Bergische Universität Wuppertal and a Master of Science in Information Systems from Universität zu Köln (Germany). He has worked as a consultant for companies in crisis situations and as an expert for processes, methods, and tools in product development at Mercedes-Benz Trucks. Currently, he works as a Cyber Security Expert in Global Cyber Security at Daimler Trucks, with a focus on Vehicle Cyber Security and Data Science.

Ludovica Segneri is currently a second-year PhD student in Industrial and Information Engineering at the University of Perugia, Italy. Since 2021, she has been a member of the Business and Collective Intelligence Lab at the University of Perugia. Her research focuses on combining social network analysis, machine learning, and text mining to uncover the unconscious social signals that complement conscious language and human behaviors, with the aim of understanding people’s intentions, goals, and values.

Hasbi Sevim holds a BSc in Wirtschaftsinformatik from TH Köln and an MSc in Information Systems from Universität zu Köln (Germany). He is currently a consultant at WTS Deutschland, supporting the consulting business.

Jan-Marc Siebenlist holds a BSc in BWL from Universität Augsburg and a Master of Science in International Information Systems Management from Otto-Friedrich-Universität Bamberg, Germany. He currently works as a Digital Marketing Specialist at Roland Berger.

Monika Sońta is a Communication Professional and Social Scientist who specializes in the areas of Organisational Culture, Internal Communication, and Creativity in Business. She has over 15 years’ experience in HR and Communication departments of multinational companies. Monika is a certified facilitator of creative methods of work: the FORTH Innovation Method (https://www.forth-innovation.com/), LEGO SERIOUS PLAY, and PLAYMOBILpro; she is also a certified SCRUM Master and Design Sprint Master. Monika is an Assistant Professor at Kozminski University, Poland, in the Management of Networked and Digital Societies program (http://nerds.kozminski.edu.pl/).

Tuomas Takko is a doctoral researcher in the Complex Systems group in the computer science department at Aalto University, Finland. He holds a BSc (Tech) degree in bioinformation technology and an MSc (Tech) degree in Complex Systems, and he is currently working on his PhD. The topic of his dissertation is data-driven modeling of human behavior with complex networks. The subjects of his publications include modeling human decision-making in cooperative games with autonomous agents, mining unstructured data into knowledge graphs for modeling, and investigating election interference in social media. Most recently, he has been working on human mobility and modeling exposure between spatially separated populations during the COVID-19 pandemic in Finland.

Rebecca von der Grün completed a BA degree in English and American Studies and Ibero-Romance Studies at the Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany. This was followed by studies in Computing in the Humanities (MSc) at the University of Bamberg with a focus on cognitive systems.
Fabian Weiland (Cologne Institute of Information Systems, University of Cologne, Germany) works as a data scientist at Taod Consulting GmbH, where he specializes in the development of cloud-based machine learning applications and data warehouse solutions. He obtained his Bachelor’s and Master’s degrees in Information Systems from the University of Cologne. Leveraging his expertise in the field, Fabian helps organizations harness the potential of data to improve business insights and decision-making processes.

Kuiming Zhao obtained his BA degree in economics at Stony Brook University and his MSc degree in commerce and economic development (concentrating on data analytics) at Northeastern University, USA. He is currently working as a business consultant at PwC in Shanghai. The subjects of his publications include statistical analysis, data mining, and network analysis in economic development and social media. Most recently, he has been working on how disinformation impacted people’s health during the COVID-19 pandemic in the United States.

Xiaomu Zhou is the faculty lead in the Master of Professional Studies in Informatics program at the Northeastern University College of Professional Studies, USA, where she is an Associate Teaching Professor. She received her BSc in Computer and Information Technology from Shandong University, a Master’s degree in Computer Science and Engineering from Beijing Institute of Technology, and a PhD in Information Science from the University of Michigan (Ann Arbor). Before joining Northeastern, she was an Assistant Professor at the Rutgers University School of Communication and Information. Dr Zhou’s research focuses on Human–Computer Interaction (HCI) and Health Informatics. She has published in or presented at the ACM International Conferences on Human Factors in Computing Systems, Computer-Supported Cooperative Work, the American Medical Informatics Association, and the Association for Information Science and Technology. More specifically, her research provides a better understanding of the information behaviors of clinicians and patients in a healthcare context, which promotes improved information technology design. She also studies human-centered design and evaluation of AI-empowered Clinical Decision Support Systems. In recent years, Dr Zhou has explored the intersection of social media and public health, examining how disinformation affects public health compliance and outcomes.

Claudia Zucca is an Assistant Professor of Organizational Networks at the Jheronimus Academy of Data Science, Tilburg University (the Netherlands). She holds a PhD in computational political science from the University of Exeter and previously worked as a post-doctoral researcher at the University of Glasgow. She is also a Marie Curie Alumna. Her research focuses mainly on computational social science, especially network analysis applied to social science domains such as political opinion formation, the introduction of innovation in organizations, and health policy implementation.
Introduction – Social computing: panacea or abyss? Peter A. Gloor, Francesca Grippa, Andrea Fronzetti Colladon and Aleksandra Przegalinska
Using digital technologies to analyze social behavior is rapidly becoming popular both in science and industry. Online social media, email, and body sensors provide a sea of data, which can be used to measure, explain, and improve human behavior and social interactions. Social computing fishes in this sea of data to recognize the interaction patterns of individuals and groups. The field initially leveraged social network analysis (SNA) and network science; the advent of AI has further turbocharged it. Technologies such as text mining, natural language processing and generation (NLP/NLG), machine learning and deep learning, voice and face emotion recognition, and virtual reality permit analysis of fuzzy data on a scale and at a level of accuracy unimaginable just a few years ago. Data sources such as Wikipedia permit demographic insights at the global macro level, while body sensors and smartwatches provide behavioral data at the individual micro level. Social computing thus permeates all aspects of our daily life.
SOCIAL COMPUTING EMPOWERS HYBRID WORKING

While the restrictions of COVID-19 are quickly fading away, the pandemic has undoubtedly changed the way we structure and conduct our private and business lives. Being stuck at home for months led many to question the meaning of their jobs. Some even decided that they no longer wanted to return to their office work and switched careers for more meaningful occupations. However, even those who have remained in their traditional roles are undergoing significant transformations, thanks to the numerous social computing tools available. These tools have enabled individuals to enhance their work processes and communication methods, resulting in a more efficient and effective work environment. Hybrid work has become a popular and well-accepted way of doing business, seeping into all aspects of our daily life. One of the main changes is how business meetings are conducted. Before COVID, salespeople, managers, and many others would hop on a plane, taking multi-hour flights for a short meeting with a customer or manager, assuming that there was no substitute for a face-to-face meeting. Since COVID, videoconference meetings have become a well-accepted replacement for in-person on-site meetings. Real estate brokers can sell houses over Zoom, while buying a car does not even need the Zoom call and can be done with a few mouse clicks. Teaching and learning have also become much more virtual, particularly at the secondary and tertiary level, where students occasionally come to the classroom to meet the professor and each other, but frequently attend college and university classes using videoconferencing. Regular meetings have also changed in character, as people started to passively participate in boring video meetings – with the camera turned off, while using the time to help their children with homework or to do other household chores. This opens up many opportunities for social computing research to improve the efficiency and user experience of virtual meetings (Roessler et al., 2021).
COMBINING AI WITH SOCIAL COMPUTING CAN MAKE US MORE ETHICAL

The most recent breakthrough advances in artificial intelligence, most prominently through ChatGPT and other large language models, raise huge questions about the ethical use of these technologies. Approaches based on social computing are key to better understanding the ethical behavior of these AI systems (Gloor et al., 2022). Currently it is not clear if it will ever be possible to guarantee that AI will behave fully ethically, if only because there is no universally agreed-upon definition of what ethical behavior is. Marvin Minsky once said: “Will robots inherit the earth? Yes, but they will be our children.” What he meant is that because we made them, the robots will follow our ethical understanding and thus be ethically well-behaved – unfortunately, there will always be unethical hackers, leading to unethical robots. Isaac Asimov defined the three laws of robotics, which say that (1) a robot may not injure a human being or, through inaction, allow a human being to come to harm; (2) a robot must obey the orders given it by human beings except where such orders would conflict with the First Law; and (3) a robot must protect its own existence as long as such protection does not conflict with the First or Second Law. Looking at the military uses of AI, where both Russia and the U.S.A. announced that they are working on self-guided missiles that use AI to find their target autonomously, the First Law has already been violated. Unfortunately, there will always be rogue programmers who, either for amoral selfish purposes or under the premise of jingoistic nationalism, will program AI and robots to do their unethical bidding. However, what social computing can do is use AI to measure ethical and moral values. AI models can measure one’s ethical values based on body signals tracked with a smartwatch, based on the words one uses, and even based on the “honest signals” computed from the email interaction network and its dynamics – without even looking at the words that are used (Gloor et al., 2022; Altuntas et al., 2022). This approach will enable you to know your personal moral and ethical values, as well as those of the people interacting with you. As shown in previous research projects (e.g., Kim et al., 2019), we are frequently bad judges of our own personal and moral values; sometimes our family and friends are better at assessing them for us. By creating a virtual mirror (Gloor et al., 2017) of your own moral values and ethics, AI will assume the role of family members and friends in showing you how caring, fair, honest, and collaborative you truly are. You will be able to reflect on yourself through the eyes of the people you are interacting with, aggregated and computed through AI. Getting such a virtual mirror will assist in living with higher ethical values.
SOCIAL COMPUTING HAS THE POTENTIAL TO BRING BACK “HUMANITY” TO “HUMAN RESOURCES”

In preindustrial times, people were like bees, creating what they needed, growing their grain, baking their bread, sewing and stitching their clothes, and building their furniture and houses.
Farming brought slavery, while with the onset of industrialization, division of labor arose, and managers started to manage their resources – one of which happened to be their human resources. But humans don’t particularly like to be resources: being a resource implies being a passive asset that is moved around like a pawn in a game of chess, without agency or a will of its own. Synonyms like “human capital”, “talent”, “labor”, or “manpower” are not much better. “Human capital” is even worse, as it conveys the connotation of foreign ownership: by definition, capital is an asset owned by an individual or organization and available for a purpose such as running a company or investing. I don’t think people want to be owned by their manager or company. “Talent” is somewhat better, as a synonym for natural aptitude or skill; however, the word’s origin is similar to that of “capital” – it too has monetary roots, as the talent was a currency unit of the Greeks and Romans. Again, I don’t want to be a piece of money owned by my manager or company for my skill. “Labor” implies hard work and great effort with little interest in creativity and imagination. “Manpower” is not much better; it stands for the (amorphous and anonymous) number of people available for work and service, without valuing individual ingenuity and originality. Happiness research has clearly shown that we humans highly value what psychologists call “agency” and “experience”: agency means the capacity to make independent decisions and to be in control of one’s own destiny. For instance, happiness researcher Bruno S. Frey (2010) has found that in already quite happy Switzerland, those Swiss cantons whose citizens have the most to say – that is, they get to vote the most – are the happiest. He also found that being stuck in traffic, when we totally lose control over where to go, is sure to reduce happiness and make us miserable. The second psychological property is experience, the capability to enjoy and to suffer, to experience compassion and empathy for others. As “human resources” we are denied both agency and experience. While AI has been prominent in Human Resources (HR), for instance by automatically conducting job interviews with candidates through a chatbot – taking away agency and supporting the resource-centric view in HR – AI combined with social computing can also be used in more positive ways by providing transparency about self and others. In our own research at MIT, we have analyzed communication archives for 20 years and derived patterns of successful and unsuccessful teams and individuals. For example, the text of emails can be used to infer personality traits of the authors such as OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism). The same AI algorithms (Gloor et al., 2022) also detect moral and ethical core values and risk behaviors of individuals (Altuntas et al., 2022). These email analytics can also detect team communication patterns, and improve team communication and collaboration by mirroring these patterns back through “virtual mirroring”. In a research project together with a global consulting firm with over 100,000 employees, “virtual mirroring” was performed for 24 client teams serving 240 large global clients. This involved collecting emails from the 4,000 employees who worked on these 24 client teams and developing communication metrics that were indicative of customer satisfaction, such as a stable vendor contact, email response time, or emotional email content. Subsequently, the team leaders of these 24 customer teams were shown for six months how they and their employees behaved in communicating with customers in relation to these communication metrics. Before and after this project, customer satisfaction was measured for all 240 key accounts using the Net Promoter Score. Those teams that received feedback on their communication behaviors related to the communication metrics achieved a 17 percent improvement in customer satisfaction (Gloor et al., 2017).
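The study’s actual metrics are not detailed here beyond the examples above, but a minimal sketch of one such metric – average email response time – can be written down from first principles. The message-log format and the naive “Re: subject” threading rule below are simplifying assumptions for illustration, not the method used in the project.

```python
from datetime import datetime
from statistics import mean

# Hypothetical message log: (sender, recipient, subject, timestamp).
messages = [
    ("client@acme.com", "team@firm.com", "Q3 report", datetime(2017, 5, 2, 9, 0)),
    ("team@firm.com", "client@acme.com", "Re: Q3 report", datetime(2017, 5, 2, 11, 30)),
    ("client@acme.com", "team@firm.com", "Invoice", datetime(2017, 5, 3, 14, 0)),
    ("team@firm.com", "client@acme.com", "Re: Invoice", datetime(2017, 5, 4, 8, 0)),
]

def avg_response_hours(log, team_domain="firm.com"):
    """Average hours the team takes to answer a client email,
    matching replies naively by 'Re: <subject>'."""
    delays = []
    for sender, _, subject, sent in log:
        if sender.endswith(team_domain):
            continue  # only measure replies to client-initiated mail
        for sender2, _, subject2, sent2 in log:
            if (sender2.endswith(team_domain)
                    and subject2 == "Re: " + subject and sent2 > sent):
                delays.append((sent2 - sent).total_seconds() / 3600)
                break
    return mean(delays) if delays else None

print(avg_response_hours(messages))  # -> 10.25
```

In a real deployment one would thread messages with proper message IDs and aggregate per team over time; the point here is only that such a metric is a simple, mechanical function of the communication log.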
In a research project on virtual mirroring, we equipped 20 employees in the IT department of a bank with Happimeters, an AI smartwatch app that measures employees’ state of mind and emotions based on their body signals, tells them when they are stressed, and then prompts them to perform relaxing activities (Roessler & Gloor, 2021). In the experiment, ten employees received “virtual mirroring” feedback from the Happimeter app about what made them happy and what stressed them out, while the other half, the control group, just wore the watches; their state of mind was also measured, but they did not receive feedback. At the end of the project, those employees who received virtual mirroring feedback were 16 percent happier than the control group. Social computing not only measures individual emotions to increase happiness and well-being; it is also key to identifying fake news.
TOO MANY “FAKE” EXPERTS: WHOM SHOULD WE TRUST?

ChatGPT and other large language models make creating fake news easy, helping self-proclaimed experts to fake expertise. To illustrate how social computing can be used to distinguish experts from conspiracy theorists, the authors analyzed different communities of self-proclaimed COVID-19 experts on Twitter (now known as X). These experts come from two opposing sides: mainstream science and government experts on the one side, and conspiracy theorists on the other. Figuring out whom to believe, with so many “experts” contradicting each other, can become a real headache. In the context of COVID, politicians, regulators, and scientists in different countries, and even within the same country, fundamentally disagreed and fought with each other about the best strategies for coping with the disease. To shed some light, we did an analysis at the height of the pandemic using the social computing tool Galaxyscope, which identifies digital virtual tribes based on word usage (Gloor et al., 2020). We created two digital tribes, “Covid-Experts” and “Alt-health” (spiritual healing believers, COVID deniers, and fringe conspiracy theorists). While the COVID-Experts form a solid cluster, Alt-health has several separate clusters, connected by a few gatekeepers. However, the most shocking insight is that Dr Tedros (the director of the World Health Organization (WHO), who was supposed to take the lead in the fight against COVID-19) appears in both networks – at the periphery. This means that mainstream science and spiritual healers are both desperately looking for leadership – and not finding it. In the COVID-Experts community, people like Eric Topol, Director of the Scripps Research Translational Institute, Helen Branswell, a Canadian global health reporter, and Marion Koopmans, a Dutch virologist, occupy the central positions. In the Alt-health cluster, activists like FLAutismMom, Bret Weinstein, and Spiro Skouras are in the center. We see that fear and sadness dominate the COVID-19 discourse, with experts recommending handwashing and mask wearing, while the “Alt-health” members see conspiracies everywhere and recommend miracle cures against COVID such as melatonin and vitamin C instead of wearing masks. Both groups place strong emphasis on maintaining mental health amid all the chaos. The application of social computing technologies thus assists the end user in more clearly identifying the origins of conspiracy theories and understanding the positions of experts.
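Galaxyscope itself is a proprietary tool (and Gloor et al., 2020 use deep learning), but the underlying idea of grouping accounts into tribes by word usage can be sketched with standard open-source components. In the toy example below, each user is represented by the TF-IDF vector of their aggregated tweets and users are grouped with k-means; the handles, texts, and choice of two clusters are invented purely for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpora: one document per user, concatenating that user's tweets.
# All handles and texts below are invented for illustration.
user_docs = {
    "user_a": "vaccine trial efficacy peer reviewed data epidemiology masks",
    "user_b": "mask mandate clinical evidence transmission study vaccine",
    "user_c": "melatonin vitamin c miracle cure natural healing detox",
    "user_d": "conspiracy natural immunity detox healing miracle cure",
}

# Represent each user by the words they use (TF-IDF weighting).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(user_docs.values())

# Group users into k word-usage "tribes" with k-means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for user, tribe in zip(user_docs, kmeans.labels_):
    print(f"{user} -> tribe {tribe}")
```

On this toy data, the science-flavored vocabularies and the alternative-cure vocabularies land in different clusters, which is the basic intuition behind word-usage tribes, even though production systems use far richer language models.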
After this brief overview of some fundamental applications of social computing, we will now briefly introduce the remaining chapters of the book.
CHAPTER OVERVIEW

This Handbook surveys and categorizes this vast, newly emerging discipline, providing a sorely needed overview of this exciting field of research from both the computer science and AI side and the sociological and psychological behavioral side. Starting with a discussion of the theoretical foundations and methods of social computing, the volume presents use cases from a variety of industries and covers important topics such as the ethical implications of AI applications, regulatory frameworks protecting data privacy, and technological platforms supporting safe data sharing among individuals and organizations.

Chapter 1 delves into the fascinating world of “Network Data Visualization.” Didimo, Liotta, and Montecchiani provide a comprehensive review of the fundamental paradigms and techniques used to visualize networked data sets. They also highlight the benefits of new research lines, such as immersive analytics, which have the potential to open up new visualization models and forms of human–machine collaboration. Chapter 2 explores the use of exponential random graph models as tools to answer causal inference questions through cross-sectional network data. Claudia Zucca provides evidence on how to conduct this type of research by analyzing patterns of collaboration among rock musicians using network data from Spotify. In Chapter 3, Saint-Charles, Mongeau, and Renaud-Desjardins delve into classic NLP method biases and suggest new criteria for the selection of an NLP method. They compare the results of well-known NLP methods such as Jaccard, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), and Term Frequency-Inverse Document Frequency (TF-IDF) on text corpora with different characteristics.

The three chapters in Part II showcase case studies that demonstrate the immense potential of predicting outcomes through online social media. In Chapter 4, Schwendner, Kremer, Gierenz, Sevim, Siebenlist and Güclü delve into the concept of “Black Swan events” and how they can cause cascading behaviors in social networks. Their research aims to enhance the predictive capabilities of traditional models and engineer a predictive indicator by analyzing the corresponding communication network. This approach can help predict market fluctuations with greater accuracy. By leveraging the vast amounts of data available on social media platforms, researchers can uncover hidden patterns and insights that were previously impossible to detect. This has significant implications for businesses, policymakers, and individuals alike, as it enables them to make more informed decisions and stay ahead of the curve. Overall, the case studies presented in these chapters provide a compelling argument for the importance of social media in predictive analytics.

Chapter 5 presents a fascinating analysis of how presidential administrations communicated on Twitter during the COVID-19 pandemic. Oet, Takko, and Zhou conducted a study to measure social media polarization by extracting sentiment and topical dimensions from localized tweets. They then calculated state-specific dispersion and bimodality.
The results of their study indicate that clear online communication can lead to increased public trust, better control over the spread of the disease, and reduced economic uncertainty. In Chapter 6, the authors focus on the health impact of disinformation during the COVID-19 pandemic. Oet, Zhou, Zhao, and Takko introduce a new, scalable model for measuring disinformation and estimate the prevalence of bots and their role in generating and spreading pandemic disinformation compared to human accounts. This chapter sheds light on the dangers of disinformation and the importance of accurate information during a public health crisis, and it provides valuable insights into the role of social media in shaping public opinion during a pandemic – a must-read for anyone interested in the intersection of social media and public health.

Part III on emotions and social computing comprises three chapters, each offering unique insights into the use of computational methods for emotion recognition and prediction. Chapter 7, authored by Bui, Feldges, Liebig, and Weiland, introduces a novel indicator of success for YouTube videos, utilizing a prediction model trained on facial attributes within thumbnails. In Chapter 8, Morgan and Gloor explore the use of sensors embedded in smartwatches to predict the emotions of musicians. Finally, Chapter 9, written by Bhave and colleagues, delves into the construction of a machine learning model that predicts human emotions using features extracted from electrical signals emitted by plants. These chapters offer a fascinating glimpse into the intersection of technology and human emotion. By utilizing cutting-edge computational methods, researchers are able to gain new insights into the ways in which emotions are expressed and experienced. From predicting the success of online videos to understanding the emotional states of musicians, these studies have the potential to revolutionize our understanding of human emotion and its role in our lives.

Part IV, “Applications in Business and Marketing”, showcases the power of user-generated social media content in predicting entrepreneurial success and inferring the feelings, behaviors, and personalities of online users. This section features four case studies that demonstrate the effectiveness of this approach. In Chapter 10, Olbrück and colleagues analyze Twitter data from 29 car manufacturers in the U.S.A., along with data from their potential customers. Utilizing IBM Watson Personality Insights and the Griffin Tribefinder, they compute personality features of customers and demonstrate that brands with a personality similar to their customers’ achieve increased business performance. Chapter 11 introduces a new methodology called Netnography 2.0, which can help companies improve their segmentation strategies using social media data. This approach is particularly useful in identifying and targeting specific customer groups. Chapter 12 combines emotional text mining (ETM) and machine learning to explore how the use of language can impact the success of crowdfunding campaigns. This approach can help companies craft more effective messaging and increase their chances of success. Finally, Chapter 13 offers a unique ethical perspective on the implications of using consent banners and cookies on websites.
The chapter discusses the possible implications for data analytics and AI applications and highlights the importance of ethical considerations in these areas. Overall, this section provides valuable insights into the power of social media data in driving business success and highlights the importance of ethical considerations in data analytics and AI applications. Part V on sustainability explores various applications of machine learning and predictive analytics techniques to help organizations create more socially and environmentally impactful initiatives. In Chapter 14, Patel and Gloor collected data from Wikipedia, Twitter, LinkedIn,
and Google News for the S&P 500 companies and created a data-driven ESG evaluation system that can provide better guidance and more systematic scores by incorporating social sentiment and using NLP algorithms. Social sentiment allows for more balanced perspectives that directly highlight public opinion. In Chapter 15, Mańko and Jemielniak observe how polarized groups of climate-change believers and deniers on Twitter cohabit separate digital echo chambers. To investigate how the online discourse differs between the two ideological camps, they analyzed nearly 400,000 tweets about climate change and conducted both qualitative and quantitative content analysis.

In Part VI, devoted to the frontiers of human interaction with other species, Fuchs, von der Grün, Maslatón and Gloor present an unconventional experiment to gather insights about the environment in which both people and plants live. Chapter 16 explores how the bioelectrochemical signals generated by tomato plants react to human voices of different frequencies, discussing an experiment in which a plant was exposed to recordings of people singing, talking, and reading. Chapter 17 continues the exploration of communication between humans and other species by presenting a mobile app that relies on various machine learning models to predict the emotions of dogs based on their body posture.

The Handbook concludes with thought-provoking questions about the impact of low-code and no-code tools as solutions that aim to simplify the process of creating software applications. Sońta and Przegalinska guide us in this discussion by providing valuable insights on how to teach these competencies to non-IT students, enabling them to create applications that meet their specific needs. This approach not only empowers individuals but also has the potential to revolutionize the software development industry by democratizing the creation of applications.

In conclusion, as we explored the emerging field of social computing in this Handbook, we were expertly guided by contributors who sifted through the vast sea of data to identify patterns of interaction among individuals, machines, and groups. The use of computers to analyze human and social behavior is a rapidly evolving field that holds great promise for both scientific and industrial applications. With the latest advancements in AI and data analysis technologies, researchers can now delve deeper into the complexities of human behavior and interaction than ever before.
REFERENCES

Altuntas, E., Gloor, P. A., & Budner, P. (2022). Measuring ethical values with AI for better teamwork. Future Internet, 14(5), 133.

Asimov, I. (1950). Runaround. In I, Robot (The Isaac Asimov Collection ed., p. 40). New York: Doubleday.

Frey, B. S. (2010). Happiness: A Revolution in Economics. MIT Press.

Gloor, P., Fronzetti Colladon, A., de Oliveira, J. M., & Rovelli, P. (2020). Put your money where your mouth is: using deep learning to identify consumer tribes from word usage. International Journal of Information Management, 51, 101924.

Gloor, P., Fronzetti Colladon, A., Giacomelli, G., Saran, T., & Grippa, F. (2017). The impact of virtual mirroring on customer satisfaction. Journal of Business Research, 75, 67–76.

Gloor, P., Fronzetti Colladon, A., & Grippa, F. (2022). Measuring ethical behavior with AI and natural language processing to assess business success. Scientific Reports, 12(1), 10228.

Kim, H., Di Domenico, S. I., & Connelly, B. S. (2019). Self–other agreement in personality reports: a meta-analytic comparison of self- and informant-report means. Psychological Science, 30(1), 129–38.
Minsky, M. L. (1994). Will Robots Inherit the Earth? Scientific American, October. https://web.media.mit.edu/~minsky/papers/sciam.inherit.html (retrieved Dec 12, 2023). Roessler, J., & Gloor, P. A. (2021). Measuring happiness increases happiness. Journal of Computational Social Science, 4(1), 123–46. Roessler, J., Sun, J., & Gloor, P. (2021). Reducing videoconferencing fatigue through facial emotion recognition. Future Internet, 13(5), 126.
PART I INTRODUCTION TO SOCIAL COMPUTING
1. Network data visualization Walter Didimo, Giuseppe Liotta and Fabrizio Montecchiani
1. INTRODUCTION Data and information visualization gathers research in human–computer interaction, computer science, graphics, visual design, psychology, and business methods (Bederson & Shneiderman, 2003). The ultimate goal is to represent data and information in a meaningful, visual way that users can interpret and easily comprehend, which is often a key requirement for businesses and organizations. Indeed, information visualization tools can support decision-makers in the process of navigating the data effectively, to deliver value to the entire organization. In this context, many data sets arising from a variety of applications, such as social networks, exhibit a relational nature and can be conveniently modeled as graphs, possibly enriched with vertex and edge attributes. Network visualization is hence a key research area aimed at producing effective visual representations of networked data sets, and it requires a broad range of skills to be successfully tackled. Some of the main challenges can be briefly summarized as follows. In terms of their connectivity structure, real-world networks are often locally dense and globally sparse, making it difficult to create a readable layout of the entire data set within a single visualization model (Angori et al., 2022). Also, nodes and edges may have multiple heterogeneous attributes, which may require different types of graphical representations. Additional facets that are often relevant for an effective visualization may involve a network's dynamics and spatialization. This chapter introduces and briefly surveys some fundamental paradigms and algorithms to visualize networked data sets. In line with the subject of this book, we mostly focus on social network visualization. In Section 2 we provide a brief overview of the main paradigms that can be used to visualize networks, ranging from the classical node-link paradigm to multi-faceted representations. In Section 3 we give application examples by discussing two notable real-world scenarios in which network visualization plays a crucial role in supporting analysts and decision-makers, namely financial and fiscal risk analysis, and influence maximization. We conclude in Section 4.
2. SOCIAL NETWORK VISUALIZATION PARADIGMS
A visualization paradigm, also called visualization model, defines the type of representation adopted to visually convey the main elements of a network, i.e., its nodes and edges. Additionally, the paradigm may specify how to visually represent node and edge attributes when required. Although there are many different network visualization paradigms proposed in the literature, they can be classified into three main categories: node-link representations, space-filling representations, and hybrid representations (discussed below).
2.1 Node-link Representations
The node-link representation is the most popular and arguably the most intuitive way for humans to visually convey a graph. It represents each node as a circle, a rectangle, an icon, or some other type of geometric feature, depending on the specific application domain. The edges are represented as simple curves connecting their end-nodes. In the visualization of social networks, edges are most often drawn as straight lines, although polygonal lines or smoothed curves are sometimes used in the production of hierarchical layouts or to reduce visual clutter. Figures 1.1 and 1.2 show examples of node-link representations of social networks with different shapes for the nodes and for the edges.
Notes: In the figure to the left, each node represents a scientist and each edge corresponds to a collaboration between scientists. In the figure to the right, some clusters of nodes have been collapsed into bigger circles.
Figure 1.1 Two node-link representations of a collaboration network in computer science, where nodes are represented as circles and edges are drawn as straight lines
In some application contexts, constrained node-link representations are also considered, to adhere to types of diagrams that are commonly adopted. For instance, node-link representations of business processes, flow charts, or circuit schematics typically adhere to the so-called orthogonal drawing convention, where the edges are drawn as chains of horizontal and vertical segments (Di Battista et al., 1999). The orthogonal drawing convention has also been used for the visualization of social networks to represent the high-level structure of the network in terms of communities of nodes and connections between these communities. We remark that there exist multiple software packages offering off-the-shelf modules and components to create visually appealing node-link representations of networks; among them we mention (in alphabetical order) Cytoscape1 (Shannon et al., 2003), Gephi2 (Bastian et al., 2009), TomSawyer,3 and yWorks.4 In many application domains, the visualization of social networks is flat and neither highlights hierarchical relationships between nodes nor guarantees specific geometric constraints on the layout. In these cases, the automatic computation of a node-link representation of the network is done using so-called force-directed algorithms. They follow two basic principles: (i) edges should not be too long and hence adjacent vertices should be drawn near to each other; (ii) vertices should not overlap and should be evenly distributed in the drawing area.
Notes: Each node is a person in the hierarchy, represented as a rectangle. In the left-hand drawing, some edges are polygonal lines instead of straight lines; this helps keep the visualization compact without creating edge crossings. In the right-hand drawing, polygonal lines are replaced by smoothed curves.
Figure 1.2 Node-link representations of hierarchical networks
These two principles can be encoded in a system of forces acting on the vertices of the input graph. The goal of the algorithm is then finding a placement of the nodes that corresponds to a minimum energy configuration of the force system. For example, nodes can be modeled as electric charges that repulse each other, edges as springs attracting adjacent nodes, and a placement can be found by iteratively computing the forces acting on each node and updating the nodes' positions accordingly. Refer to Cheong & Si (2020) and Kobourov (2013) for extensive surveys on this type of algorithm.

Although node-link representations are very popular and widely used, they also have some clear limits when the network is large or when its structure exhibits very dense portions. These limits can be observed in terms of both the visual complexity of the layout and the computational time needed to obtain the visualization. For instance, naïvely applying force-directed algorithms to large networks with a scale-free structure often results in so-called “hairball drawings”, in which overplotting and visual clutter hinder readability; see, for instance, Figure 1.3a.

In terms of computational complexity, a quantum leap towards the applicability of force-directed algorithms to larger graphs is represented by multilevel force-directed algorithms (Hachul & Jünger, 2007). In the first phase, these algorithms produce from the original graph a hierarchy of coarser graphs; in the second phase, they recursively compute the layout through a sequence of drawing refinements, from the coarsest graph to the initial graph. Notably, the OGDF library5 (Chimani et al., 2013) is an open-source toolkit containing C++ implementations of several force-directed algorithms, including multilevel algorithms. Also, to unleash the power of modern computing infrastructures, different directions have been investigated, such as GPU-based implementations (Ingram et al., 2009), as well as parallel and distributed implementations (Arleo et al., 2019; Meyerhenke et al., 2018). Furthermore, there exist several JavaScript libraries for network and data visualization that can be easily exploited in web and mobile applications, such as the very popular D3.js6 (Bostock et al., 2011).
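To make the force-directed principle concrete, the following minimal sketch computes a Fruchterman-Reingold layout in R with the igraph package; the toy scale-free graph is an assumption for illustration, not one of the chapter's datasets:

library(igraph)
set.seed(42)
g <- sample_pa(200, directed = FALSE)  # toy scale-free graph (assumed)
# spring-like attraction along edges, charge-like repulsion between vertex pairs
coords <- layout_with_fr(g)
plot(g, layout = coords, vertex.size = 3, vertex.label = NA)

For larger graphs, layout_with_fr can be swapped for a layout designed for large networks, such as igraph's layout_with_drl.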
Note: The “hairball effect” at the core of the network (caused by overplotting and visual clutter) hinders readability.
Figure 1.3a A node-link representation of a complex network
Concerning the visual complexity issue, visual summaries can be used to provide a succinct and yet faithful visual abstraction of the underlying network. Visual summaries can take many different forms: examples include schematic representations, possibly using glyphs, of either the underlying network or its statistics. Inspired by Shneiderman’s mantra, “Overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996), several authors have proposed multilevel visualizations aimed at computing multiple abstractions of the underlying network (Consalvi et al., 2022; Perrot & Auber, 2020; Zinsmaier et al., 2012). Figure 1.3b shows a screenshot of a system that computes a density-based visual summary with tunable level-of-detail rendering.
Note: The in-browser system computes visual summaries of large networks with tunable level-of-detail representations. Source: Consalvi et al. (2022).
Figure 1.3b A screenshot of BrowVis
2.2 Matrix-based and Hybrid Representations
A matrix-based representation of a network is a simple and compact visualization inspired by the so-called “adjacency matrix” data structure: rows and columns of a square matrix correspond to the nodes of the network, and a colored cell indicates the presence of an edge between the row and the column that identify that cell. While this type of representation is less intuitive for humans than a node-link diagram and makes it difficult to recognize paths between pairs of nodes, it avoids visual clutter when the density of the network is relatively high. Often, a matrix-based representation helps answer overview analysis tasks better than node-link diagrams; for example, it may facilitate the identification of clusters of nodes that are highly connected, also called communities (Okoe et al., 2019). However, the ability of a matrix-based representation to visually convey structural properties of the network in a clear manner strongly depends on the specific ordering of its rows and columns, for which different criteria have been proposed and evaluated in the literature (Behrisch et al., 2016). Based on the considerations above, and since many real-world networks are globally sparse but locally dense, several types of hybrid representations have also been proposed in recent years. They exploit the node-link paradigm to represent the inter-community structure of the network, and combine it with matrices or other types of representations (such as chord diagrams) to represent the structure of each community (Angori et al., 2022; Batagelj et al., 2011; Henry et al., 2007). An example of a hybrid representation of a social network is shown in Figure 1.4.
Note: This hybrid representation combines node-link representations with matrix-based representations and chord diagrams for visualizing the structure of different communities in the network.
Figure 1.4 A hybrid representation of a collaboration network
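As an illustration of the matrix-based paradigm (a sketch on a random toy graph, not one of the chapter's networks), rows and columns can be reordered by detected communities so that densely connected groups appear as blocks along the diagonal:

library(igraph)
set.seed(1)
g <- sample_gnp(60, 0.08)                     # toy graph (assumed)
comm <- cluster_fast_greedy(g)                # community detection
ord <- order(membership(comm))                # order rows/columns by community
A <- as_adjacency_matrix(g, sparse = FALSE)[ord, ord]
image(A, col = c("white", "steelblue"), axes = FALSE)  # colored cells mark edges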
The utility of exploiting hybrid representations in the execution of analysis tasks has been investigated by a recent user study, which highlights some advantages and drawbacks of these types of representations.
2.3 Multi-faceted Network Visualization
While conveying the basic graph connectivity structure is a common goal of any network visualization approach, there are other additional facets that are often useful to consider. They depend on the specific application for which the visualization is designed and on the specific tasks of analysis. These facets can be incorporated into the visualization with different approaches (e.g., juxtaposition, superimposition, nesting) and can be roughly classified as follows (see Hadlak et al., 2015 for a survey): ● Network’s partitions/groupings: they give an idea of the clustering structure of the network, which complements the low-level connectivity information (Vehlow et al., 2017). As also observed in the previous subsection, this high-level view may be of great support in the interactive exploration of globally sparse but locally dense networks. ● Network’s attributes: the possibility of displaying different node and edge attributes without creating visual clutter plays a central role in the analysis of real-world networks from different perspectives (Kerren et al., 2014).
● Network’s dynamics: the temporal factor represents a relevant facet when the network changes dynamically. Different approaches have been proposed to effectively include this factor in the visualization (Beck et al., 2017; Didimo et al., 2019). ● Network’s spatialization: this facet introduces geometric constraints in the visualization, which are often relevant to relate nodes and edges with some geographic contexts (di Giacomo et al., 2010; Rodgers, 2005).
3. APPLICATION EXAMPLES
This section gives some examples of the usefulness of network data visualization in real-world application domains. Focusing on social networks, we briefly discuss an application to financial and fiscal risk analysis, and an application to influence maximization.
3.1 Financial and Fiscal Risk Analysis
The importance of using visualization to analyze financial activity networks and identify economic and financial frauds is well described in the literature (e.g., Didimo et al., 2014; Dilla & Raschke, 2015; Kielman et al., 2009; Leite et al., 2018). Notable examples can be found, for instance, in a series of works focused on the fiscal risk analysis of Italian taxpayers (Didimo et al., 2018, 2019, 2020). Here, the relationships among taxpayers are modeled by a social network and suspicious graph patterns on such networks can be defined through a simple visual language. Next, the adoption of graph databases makes it possible to efficiently retrieve matching subgraphs, while a visual interface supports the exploration of these results. The authors also describe how social network analysis, machine learning, information diffusion algorithms and visual analytics tools can be suitably blended to compose advanced decision support systems, which can drastically improve the performance of the analysts and officers.
3.2 Influence Maximization
In social networks, it is well understood that individuals' decisions, such as purchasing a product or adopting an opinion, are strongly influenced by recommendations from their friends, acquaintances, and very important people (Kempe et al., 2015). It hence comes as no surprise that online social networking platforms are becoming the favorite venue where companies advertise their products and politicians run their campaigns (Petrova & Sen, 2016). In this context, research on so-called influence maximization focuses on understanding and leveraging such influence to obtain a much larger spread of the product or opinion than traditional marketing campaigns targeted at single individuals (Li et al., 2018). The goal is to identify and select a so-called “seed set” of users that maximizes the expected number of users positively influenced by an underlying (stochastic) information diffusion process. Recently, researchers experimented with the adoption of visual analytics to support domain experts in analyzing information diffusion processes and influence maximization algorithms. We point the reader to the recent work by Arleo et al. (2022) for a discussion of related work and for the description of a system, called VAIM (Visual Analytics for Influence
Maximization), which provides facilities to simulate an information diffusion process over a given network and offers problem-oriented visual analytics tools to explore the related data. VAIM follows a focus+context approach and offers an interface organized as a dashboard with multiple coordinated views. Each view provides a different visualization paradigm aimed at highlighting different aspects of the underlying data. In particular, the system makes use of visual summaries to create a succinct overview of the social network, and of node-link representations to provide detailed representations of restricted subgraphs.
4. CONCLUSION Motivated by the increasing relevance of visualization to guide data analysis and support decision-makers, in this chapter we have provided a brief introduction to some of the fundamental paradigms and techniques to visualize networked data sets, with a particular focus on social networks. While the network visualization field has reached a notable maturity level and visual analytics tools are nowadays integrated in nearly all software systems dealing with data analysis and business intelligence, there are still several challenges that should be faced to obtain more readable representations and more effective systems. A key challenge worth mentioning is dealing with the ever-increasing volume and complexity of data, which requires new forms of data exploration and more sophisticated ideas to create effective visual summaries. In the direction of tackling this challenge, the research line on immersive analytics (Ens et al., 2021) is still in its infancy and we believe it has great potential, as it can open the way to new visualization models, as well as to new forms of human–computer interaction and user collaboration.
NOTES 1. See https://cytoscape.org/. 2. See https://gephi.org/. 3. See https://www.tomsawyer.com/. 4. See https://www.yworks.com/. 5. See https://ogdf.uos.de/. 6. See https://d3js.org/.
REFERENCES Angori, L., Didimo, W., Montecchiani, F., Pagliuca, D., & Tappini, A. (2022). Hybrid graph visualizations with ChordLink: algorithms, experiments, and applications. IEEE Transactions on Visualization and Computer Graphics, 28(2), 1288–300. https://doi.org/10.1109/TVCG.2020.3016055. Arleo, A., Didimo, W., Liotta, G., Miksch, S., & Montecchiani, F. (2022). Influence maximization with visual analytics. IEEE Transactions on Visualization and Computer Graphics, 28(10), 3428–40. https://doi.org/10.1109/TVCG.2022.3190623. Arleo, A., Didimo, W., Liotta, G., & Montecchiani, F. (2019). A distributed multilevel force-directed algorithm. IEEE Transactions on Parallel and Distributed Systems, 30(4), 754–65. https://doi.org/10.1109/TPDS.2018.2869805.
Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. Proceedings of the International AAAI Conference on Web and Social Media, 3(1), 361–2. https://doi.org/10.1609/icwsm.v3i1.13937. Batagelj, V., Brandenburg, F. J., Didimo, W., Liotta, G., Palladino, P., & Patrignani, M. (2011). Visual analysis of large graphs using (X,Y)-clustering and hybrid visualizations. IEEE Transactions on Visualization and Computer Graphics, 17(11), 1587–98. https://doi.org/10.1109/TVCG.2010.265. Beck, F., Burch, M., Diehl, S., & Weiskopf, D. (2017). A taxonomy and survey of dynamic graph visualization. Computer Graphics Forum, 36(1), 133–59. https://doi.org/10.1111/cgf.12791. Bederson, B., & Shneiderman, B. (eds) (2003). The Craft of Information Visualization. Morgan Kaufmann. Behrisch, M., Bach, B., Henry Riche, N., Schreck, T., & Fekete, J.-D. (2016). Matrix reordering methods for table and network visualization. Computer Graphics Forum, 35(3), 693–716. https://doi.org/10.1111/cgf.12935. Bostock, M., Ogievetsky, V., & Heer, J. (2011). D3 data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2301–9. https://doi.org/10.1109/TVCG.2011.185. Cheong, S.-H., & Si, Y.-W. (2020). Force-directed algorithms for schematic drawings and placement: a survey. Information Visualization, 19(1), 65–91. https://doi.org/10.1177/1473871618821740. Chimani, M., Gutwenger, C., Jünger, M., Klau, G. W., Klein, K., & Mutzel, P. (2013). The Open Graph Drawing Framework (OGDF). In R. Tamassia (ed.), Handbook of Graph Drawing and Visualization. CRC Press. Consalvi, L., Didimo, W., Liotta, G., & Montecchiani, F. (2022). BrowVis: visualizing large graphs in the browser. IEEE Access, 10, 115776–86. https://doi.org/10.1109/ACCESS.2022.3218884. di Battista, G., Eades, P., Tamassia, R., & Tollis, I. G. (1999). Graph Drawing: Algorithms for the Visualization of Graphs. Prentice-Hall. di Giacomo, E., Didimo, W., Liotta, G., & Palladino, P. (2010). Visual analysis of one-to-many matched graphs. Journal of Graph Algorithms and Applications, 14(1), 97–119. https://doi.org/10.7155/jgaa.00200. Didimo, W., Giamminonni, L., Liotta, G., Montecchiani, F., & Pagliuca, D. (2018). A visual analytics system to support tax evasion discovery. Decision Support Systems, 110, 71–83. https://doi.org/10.1016/j.dss.2018.03.008. Didimo, W., Grilli, L., Liotta, G., Menconi, L., Montecchiani, F., & Pagliuca, D. (2020). Combining network visualization and data mining for tax risk assessment. IEEE Access, 8, 16073–86. https://doi.org/10.1109/ACCESS.2020.2967974. Didimo, W., Grilli, L., Liotta, G., Montecchiani, F., & Pagliuca, D. (2019). Visual querying and analysis of temporal fiscal networks. Information Sciences, 505, 406–21. https://doi.org/10.1016/j.ins.2019.07.097. Didimo, W., Liotta, G., & Montecchiani, F. (2014). Network visualization for financial crime detection. Journal of Visual Languages & Computing, 25(4), 433–51. https://doi.org/10.1016/j.jvlc.2014.01.002. Dilla, W. N., & Raschke, R. L. (2015). Data visualization for fraud detection: practice implications and a call for future research. International Journal of Accounting Information Systems, 16, 1–22. https://doi.org/10.1016/j.accinf.2015.01.001. Ens, B., Bach, B., Cordeil, M., Engelke, U., Serrano, M., Willett, W., Prouzeau, A., Anthes, C., Büschel, W., Dunne, C., Dwyer, T., Grubert, J., Haga, J.
H., Kirshenbaum, N., Kobayashi, D., Lin, T., Olaosebikan, M., Pointecker, F., Saffo, D., … Yang, Y. (2021). Grand challenges in immersive analytics. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–17. https://doi.org/10.1145/3411764.3446866. Hachul, S., & Jünger, M. (2007). Large-graph layout algorithms at work: an experimental study. Journal of Graph Algorithms and Applications, 11(2), 345–69. https://doi.org/10.7155/jgaa.00150. Hadlak, S., Schumann, H., & Schulz, H.-J. (2015). A survey of multi-faceted graph visualization. Eurographics Conference on Visualization (EuroVis) – STARs. http://dx.doi.org/10.2312/eurovisstar.20151109. Henry, N., Fekete, J.-D., & McGuffin, M. J. (2007). NodeTrix: a hybrid visualization of social networks. IEEE Transactions on Visualization and Computer Graphics, 13(6), 1302–9. https://doi.org/10.1109/TVCG.2007.70582.
Ingram, S., Munzner, T., & Olano, M. (2009). Glimmer: multilevel MDS on the GPU. IEEE Transactions on Visualization and Computer Graphics, 15(2), 249–61. https://doi.org/10.1109/TVCG.2008.85. Kempe, D., Kleinberg, J., & Tardos, E. (2015). Maximizing the spread of influence through a social network. Theory of Computing, 11(1), 105–47. https://doi.org/10.4086/toc.2015.v011a004. Kerren, A., Purchase, H. C., & Ward, M. O. (eds) (2014). Multivariate Network Visualization. Springer International Publishing. https://doi.org/10.1007/978-3-319-06793-3. Kielman, J., Thomas, J., & May, R. (2009). Foundations and frontiers in visual analytics. Information Visualization, 8(4), 239–46. https://doi.org/10.1057/ivs.2009.25. Kobourov, S. G. (2013). Force-directed drawing algorithms. In R. Tamassia (ed.), Handbook of Graph Drawing and Visualization. CRC Press. Leite, R. A., Gschwandtner, T., Miksch, S., Gstrein, E., & Kuntner, J. (2018). Visual analytics for event detection: focusing on fraud. Visual Informatics, 2(4), 198–212. https://doi.org/10.1016/j.visinf.2018.11.001. Li, Y., Fan, J., Wang, Y., & Tan, K.-L. (2018). Influence maximization on social graphs: a survey. IEEE Transactions on Knowledge and Data Engineering, 30(10), 1852–72. https://doi.org/10.1109/TKDE.2018.2807843. Meyerhenke, H., Nollenburg, M., & Schulz, C. (2018). Drawing large graphs by multilevel maxent-stress optimization. IEEE Transactions on Visualization and Computer Graphics, 24(5), 1814–27. https://doi.org/10.1109/TVCG.2017.2689016. Okoe, M., Jianu, R., & Kobourov, S. (2019). Node-link or adjacency matrices: old question, new insights. IEEE Transactions on Visualization and Computer Graphics, 25(10), 2940–52. https://doi.org/10.1109/TVCG.2018.2865940. Perrot, A., & Auber, D. (2020). Cornac: tackling huge graph visualization with big data infrastructure. IEEE Transactions on Big Data, 6(1), 80–92. https://doi.org/10.1109/TBDATA.2018.2869165. Petrova, M., & Sen, A. (2016). Social media and political donations: new technology and incumbency advantage in the United States. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2836323. Rodgers, P. (2005). Graph drawing techniques for geographic visualization. In Exploring Geovisualization (pp. 143–58). Elsevier. https://doi.org/10.1016/B978-008044531-1/50425-5. Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B., & Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–504. https://doi.org/10.1101/gr.1239303. Shneiderman, B. (1996). The eyes have it: a task by data type taxonomy for information visualizations. IEEE Symposium on Visual Languages, 336–43. https://doi.org/10.1109/VL.1996.545307. Vehlow, C., Beck, F., & Weiskopf, D. (2017). Visualizing group structures in graphs: a survey. Computer Graphics Forum, 36(6), 201–25. https://doi.org/10.1111/cgf.12872. Zinsmaier, M., Brandes, U., Deussen, O., & Strobelt, H. (2012). Interactive level-of-detail rendering of large graphs. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2486–95. https://doi.org/10.1109/TVCG.2012.238.
2. Exponential random graph models: explaining strategic patterns of collaboration between artists in the music industry with data from Spotify Claudia Zucca
1. INTRODUCTION The abundance of data produced by the interaction of people online and the availability of information shared through online platforms offers an immense potential impact on social and business growth (Parameswaran & Whinston, 2007). The urgency of developing and applying techniques to make sense of these realities increases every day. The analysis and interpretation of the user-generated content that we can retrieve online requires, at the same time, computer science and social science skills to understand phenomena of interest (Lazer et al., 2009). Social Computing and, broadly speaking, Computational Social Sciences respond to the need to revolutionize the field separations we used to rely on to organize scientific knowledge. There is no longer use in keeping the social and the computational approaches distant since we can make sense of crucial social phenomena only by means of computational techniques (Mason et al., 2014). This interdisciplinarity is at the core of this scholarship, and social scientists and computer scientists must join forces investing resources in these new interdisciplinary fields (Lazer et al., 2009). Given the interactive nature of the data generation process, a very large quantity of this big data is constituted of networks (Lazer et al., 2009). This chapter introduces the reader to the use of Exponential Random Graph Models (ERGMs) in analyzing network data retrieved online. This class of models is able to answer causal inference questions with the analysis of cross-sectional network data. New, powerful computational resources enable us to apply techniques, once only applicable to small networks carefully selected to represent a population, to much larger datasets. Several other statistical models allow researchers to test hypotheses of relevance on cross-sectional network data. Among the most popular, we can cite the multiple regression quadratic assignment procedures (MRQAP) (Dekker et al., 2007), Logistic Models for Network Data (Butts & Carley, 2001), and Logistic Network Autocorrelation Models (LNAM) (Anselin, 1988). Compared to these models, the ERGM provides considerably more flexibility for model specification, translating into a very large selection of terms to be inserted in the model to check the probability of observing any possible pattern the researcher might need. Moreover, the ERG models can be easily employed to test hypotheses on several types of cross-sectional networks, such as bipartite, weighted, or ego-centric. A network of collaborations between musicians playing Rock music retrieved from Spotify is analyzed as a case study to show the applicability of this class of causal inference models for dyadic data to research relevant to the scholarship of Social Computing. More specifically,
Section 2 will introduce the reader to the ERGM models. Section 3 will discuss what type of research can be performed with the employment of ERGMs, with the help of the case study. The section will show how to conduct the analysis by discussing (a) the type of theory that can be tested using ERGMs, (b) the type of data needed for this type of model, and (c) how to specify the model with the R programming language. Afterward, Section 4 will present the results of the ERGM used as a case study, explaining how to interpret the results. Finally, Section 5 will discuss the type of conclusion that can be drawn from the ERG models.
2. THE ERGM
Causal inference approaches enable the testing of hypotheses of relevance using a deductive epistemological approach (Rothman & Greenland, 2005). The goal of causal inference consists of advancing theory by explaining the reasons that led to a particular phenomenon we were able to observe (collecting data). For instance, we could test the hypothesis that richer people buy more goods by analyzing a data set containing information about the number of goods purchased on Amazon by a relevant sample of observed people with their individual income. This very simple case would require the employment of an ordinary least squares (OLS) linear model. However, dyadic data cannot be analyzed with OLS, nor with any other sophisticated and complex model that we use for independently and identically distributed (IID) data (e.g., regular survey data that assume the independence between the observations) (Snijders, 2011). In order to test theoretical hypotheses to explain why a network of relationships observed in reality looks the way it does, we need to employ ‘special’ types of models specifically designed to handle network data. One of them is the p-star or exponential-family random graph model, built on the work of Holland and Leinhardt (1981) and Wasserman and Pattison (1996). Every network comprises nodes representing entities of interest that can be connected by ties representing a possible relationship between each pair. An ERGM accounts for the presence and absence of ties in a network and provides an environment to model a network structure (Lusher et al., 2013). We consider the network as the outcome variable of an ERG model that can be explained considering exogenous and endogenous explanatory variables. Intuitively, we can think of the outcome variable of this model as the complete list of every possible combination between each pair of nodes considering a binary outcome: ties and absence of ties (0;1). If we think of the model this way, our reasoning can be supported by prior knowledge of logistic regression, where we consider a dichotomous outcome variable. The ERGM is similar to a logistic model, but a considerably more flexible one. First, we can include variables of relevance as predictors in the model as much as in any logistic regression. However, the ERGM goes further, making it possible to model several types of network effects generated by a variable on the network instead of a single one, as in the majority of the statistical models we know. For instance, if we consider a network where nodes are people with an eBay account that are connected to each other if they buy or sell something from/to each other, we can predict whether an increase in individual income increases the probability of selling/buying (forming edges) as much as in any model for IID data. Still, we have many other options. For instance, we can also see whether the difference between the income of each pair of nodes is a possible explanation for their transaction. There are several options to model the influence of variables of relevance in the formation of the network or, in other words, to model exogenous effects (Cranmer et al., 2020). Still, the most innovative
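Although the chapter does not write it out, the standard form of the model (see, e.g., Lusher et al., 2013) assigns a probability to every possible graph y on the given set of nodes:

P(Y = y) = exp(θᵀ g(y)) / κ(θ)

where g(y) is the vector of network statistics (the model terms), θ the corresponding parameters, and κ(θ) the normalizing constant obtained by summing exp(θᵀ g(y′)) over all possible graphs y′. It is this intractable sum that the MCMC machinery discussed below approximates.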
side of ERGMs consists of modeling endogenous effects that, in the network realm, identify patterns that we can observe in the network structure. For instance, what is the probability that if node A sells to node B, also node B sells to node A (reciprocity)? Or what is the probability that if node A sells to node B and C, B and C also trade with each other (transitivity)? This model class is highly flexible and can be tailored to explain very diverse and complex structures. The estimation of the model allows for the identification of the parameters that maximize the likelihood of a graph, finding the parameters of the terms that maximize the probability of simulating graphs resembling the observed real one. According to the model specified, different algorithms can compute the probability of observing that network given the predictors. There are two primary distinctions. If the predictors estimate the probability of forming ties independently from the existence of other ties – for instance, the difference in income between pairs of nodes (dyadic-independent) – the model employs a maximum likelihood approach, and it estimates the parameters in a very similar way to a logistic regression since the problem is mathematically tractable (Cranmer et al., 2020). Differently, if the predictors specified in the model focus on the probability of forming ties that depend on other ties – for instance, reciprocity, where a tie can be reciprocated only if there is already another – the problem is mathematically intractable (dyadic-inter-dependency). This second case requires the employment of Markov Chain Monte Carlo (MCMC) simulations to solve the problem by approximation (Cranmer et al., 2020). An ERGM simulates a large number of random graphs similar to the one observed in reality to estimate the probability of observing the particular features of the real network compared to the random ones. If the observed network is sufficiently different from the distribution of the parameters generated by the simulated random networks, it is possible to claim that we are observing something ‘interesting’, called significant in frequentist statistics. On the other hand, if the simulated networks are very similar to the observed one, we believe that the actual network was generated by chance as much as those simulated and hence consider it to be of negligible scientific interest. Also, the simulation might not lead to a final result if we have a dyadic inter-dependent model. In fact, if the model is poorly specified, it will not ‘converge’. In other words, the simulations are so different from each other that it is impossible to make any inference based on that data. This is a sign that the model requires more work and is heading toward a very different scenario from the one captured by the actual network. The estimation of an ERGM is not elementary, and it might require long coding time and a high level of statistical and computer science proficiency. One of the most established software packages for its estimation is the ergm package, part of the statnet suite developed for the R language (Handcock et al., 2008; Statnet Development Team, 2003–22). The ergm package is a self-contained environment for its estimation, simulation, and diagnostics (Hunter et al., 2008). It also provides a selection of the most common types of terms to insert in the model and a guided procedure to create new ones if needed (Morris et al., 2008).
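As a minimal illustration of the package's workflow (using the classic Florentine marriage network shipped with statnet rather than the chapter's Spotify data):

library(ergm)
data(florentine)                              # loads the flomarriage network
fit <- ergm(flomarriage ~ edges + triangle)   # a small dyad-dependent model
summary(fit)                                  # estimates, standard errors, p-values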
3. EXPLAINING COLLABORATIONS BETWEEN MUSICIANS USING SPOTIFY DATA
An ERG model can be used to test hypotheses of relevance in reference to a network structure. The model allows disentangling very specific dynamics that can be observed in a network, trying to find the processes that generated the structure we observe in reality. Using a case study relevant to the domain of Social Computing will help illustrate the process behind the employment of an ERG model. First, we introduce a theory to be tested with the help of research questions and hypotheses; then, we explore the data set available for the hypothesis testing; and finally, we specify a model in the R environment using the ergm package and the statnet suite.
3.1 Theory Testing
Weatherston (2009) explains that staff and students in a university music department see themselves as entrepreneurs. Entrepreneurial and marketing research has been trying to explain the factors that drive success in creative businesses (Chen et al., 2015). Collaboration has been proven to be a strategy that brings advantages to entrepreneurs since it can make new knowledge available and increase the chances of succeeding (Castañer & Oliveira, 2020). Artists, such as musicians, show entrepreneurial profiles and, as entrepreneurs, promote their artistic products in a very competitive music market (Woronkowicz & Noonan, 2019). Hendry (2004) claims that creative workers are expected to deploy entrepreneurial skills motivated by competitive self-interest rather than cooperation. However, as a matter of fact, some of them choose to collaborate and release songs and albums together. What drives collaboration between musicians? Does this strategy increase their chances of being successful players in the music business ecosystem? Given the artistic and competitive nature of the creative industry, musicians should be naturally inclined not to cooperate (Hendry, 2004). However, collaboration is a strategy that only some entrepreneurs use due to personal marketing strategies and opportunities and the fact that collaboration takes extra effort that only some are able to perform (Anderson & Narus, 1991). Hence a first hypothesis can be formulated as follows: H1 Many musicians decide not to collaborate as their career strategy. However, in the contemporary creative industry, networking is considered an essential skill, and musicians are asked to work on their contacts to survive in the music business (Coulson, 2012). Also, collaboration provides the advantage of sharing knowledge and increasing exposure (Castañer & Oliveira, 2020). Since collaborations require effort, those who collaborate usually build a relationship with an established partner instead of putting work into several collaborations at the same time (Anderson & Narus, 1991). A second hypothesis can be formulated as follows: H2 Those musicians who collaborate usually do that only with one other artist or band.
Since establishing new collaborations takes effort (Anderson & Narus, 1991), musicians reduce the effort of working with someone new by building on the relationships of their existing collaborators. From this we can hypothesize that: H3 Those musicians who collaborate with more than one other artist do so with their collaborators' collaborators. In this way, they can also rely on the artistic affinity they already have with their previous collaborators. Moreover, research shows that collaboration increases the chances of a successful entrepreneurial business (Sorenson et al., 2008). More specifically, referring to the music business, Silva et al. (2019) show that collaboration between musicians is positively associated with their popularity. Hence we hypothesize that: H4 Collaboration is associated with higher popularity. Finally, Granovetter (1973) discusses a theory known as ‘The Strength of Weak Ties’ to explain that outsiders of a specific target group are fundamental for the group's prosperity since they provide access to external resources not available in the current environment. In relation to the behavior of musicians establishing collaborations, we can hypothesize that: H5 Popular artists expand their popularity to new smaller markets by collaborating with niche artists (exploit the strength of weak ties).
3.2 The Spotify Data Set
The hypotheses introduced in the previous sections will be tested on a network of collaboration between musicians retrieved from Spotify. Figure 2.1 provides a graphical representation of the network under examination. It is undirected and unweighted, and consists of 4,208 nodes and 2,548 ties representing musicians or bands playing Rock music and their collaborations with other musicians or bands. The musicians playing Rock music and their collaborations have been subsetted from a large sample of Spotify data collected from September 29, 2013, to October 10, 2022, available on Kaggle (Szamek & Freyberg, 2022).1,2 The ‘Spotify popularity index’ is also available for each of the musicians in the sample. The index ranges from 0 to 100 and measures how popular each musician is on Spotify compared to any other musician or band on Spotify. This network shows 2,142 isolated nodes representing musicians who do not collaborate with any other node in the sample. This large number of isolates generates 8,848,980 null connections (connections that could occur but are not there) and a very small network density of 0.00028. Table 2.1 provides a descriptive analysis of the possible patterns involving three nodes (triad census – Holland & Leinhardt, 1971) available for an undirected network. Given the small density of the network, we observe a majority of unconnected triads (three musicians or bands that could collaborate but do not), followed by connected pairs (collaborations between two artists) and open triads (one artist with two collaborators that do not collaborate with each other). Finally, only a relatively small number of nodes constitute 311 closed triads (a clique of three musicians collaborating together).
Figure 2.1 Network of collaboration between Rock musicians
Source: Spotify data.
Table 2.1 Triad census for the Rock Spotify undirected network

Triad census – type     Count
003 – Unconnected       12,399,138,495
102 – Connected pair    10,690,945
201 – Open triad        12,505
300 – Closed triad      311
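Assuming the collaboration network has been loaded into a statnet network object called net (a hypothetical name; a loading sketch appears before Section 3.3), the counts in Table 2.1 can be reproduced with the sna package:

library(sna)
triad.census(net, mode = "graph")  # undirected census: 0-, 1-, 2- and 3-edge triads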
The Spotify Application Programming Interface (API) does not provide access to data concerning any demographics of bands and musicians. Hence, we cannot include in the analysis any information such as nationality, number of musicians in the band, or age.
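Before specifying the model, the collaboration network and the popularity attribute must be loaded into a network object. A sketch, assuming hypothetical file and column names (integer vertex ids, with artists ordered by vertex id):

library(network)
edges   <- read.csv("rock_collaborations.csv")  # columns: from, to (assumed)
artists <- read.csv("rock_artists.csv")         # columns: id, popularity (assumed)
net <- network(as.matrix(edges), directed = FALSE, matrix.type = "edgelist")
net %v% "popularity" <- artists$popularity      # attach the Spotify popularity index
network.density(net)                            # ~0.00028 for the chapter's network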
3.3 Model Specification
Each of the hypotheses we stated above can be translated into terms to include in the model specification. Table 2.2 summarizes the terms capturing each of the hypotheses to be included in the model using the aforementioned ergm package. H1 expects to find that there are disconnected nodes and that those nodes are not there by chance (but by choice). This idea translates into a model term that expects to find nodes with degree(0). The term degree considers the number of connections each node has, and it checks the probability that the number of connections specified in brackets is not random but the outcome of a specific social process. Degree(0) tests whether musicians/bands have zero collaborators by chance or whether this is a choice of the musician/band. H2 points to edges disconnected from other nodes. Hence we insert into the model the term isolatededges. This term first counts the number of connections between pairs of nodes where these two nodes do not have any other connections. Second, it checks whether the number of observed isolated edges occurs by chance or whether it is the outcome of a particular social process that makes it not random. H3 wants to verify the probability that the closed triads we observe are not happening by chance. Hence, we insert into the model a term called ‘gwesp’, which stands for geometrically weighted edgewise shared partner. A closed triad is essentially a triangle. A term called triangle could be used to check the probability that the number of triangles observed in the network is not random. However, this specific term suffers from so-called ‘degeneracy’ problems, leading the simulation to insert too many triangles and fail the model. It is common practice to avoid this issue using the gwesp term. Inside the term, it is possible to specify a parameter called decay that tells the simulation to keep only the specified number of triangles and to discard the others to avoid model degeneracy. Still, the term checks the probability that the specific number of observed triangles is not random but generated by a specific social process. H4 and H5 want to see whether popularity plays a role in edge formation. Then we insert the variable popularity into the model twice. In the case of H4, we insert it with the ‘nodecov’ term, which computationally works like inserting a numeric variable as a predictor into a logistic regression model. The term considers popularity as a node attribute and checks the probability that nodes with a high popularity level are more likely to be connected to other nodes. Finally, since the ERGM allows us to do so, we also insert popularity with the term ‘absdiff’, which checks the probability of connection between two nodes based on the similarity of the attribute, enabling us to relate musicians with heterogeneous levels of popularity to each other. In other words, the term absdiff computes the difference between the popularity of each pair of nodes so that popularity is no longer considered a property of each node but a property of the relationship between the nodes. Hence, it becomes an edge attribute, and the term computes the probability of observing edges with the highest weight.
Table 2.2 Matching hypotheses to model terms

Hypothesis                                                                          Model term
H1  Many musicians decide not to collaborate as their career strategy              Degree(0)
H2  Those musicians who collaborate usually do that only with one other            isolatededges
    artist or band
H3  Those musicians who collaborate with more than one other artist do so          gwesp
    with their collaborators' collaborators
H4  Collaboration is associated with higher popularity                             nodecov – popularity
H5  Popular artists expand their popularity to new smaller markets by              absdiff – popularity
    collaborating with niche artists (exploit the strength of weak ties)
The first three terms are dyadic dependent. This means that the probability of observing one edge is not independent of the probability of observing others. This feature increases the complexity of the model. Hence, the model is mathematically intractable and needs to be solved using MCMC simulations. For further details about specifying an ERG model in R, see Goodreau et al. (2008) and the workshops available at the Statnet Development Team (2003–22).
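Putting the terms of Table 2.2 together, the specification might look as follows (a sketch: the network object net comes from the loading step above, and the gwesp decay value of 0.5 is an assumption, not a value reported in the chapter):

library(ergm)
model <- ergm(net ~ edges +                  # baseline term playing the intercept role
              degree(0) +                    # H1: non-collaborating musicians
              isolatededges +                # H2: exclusive collaboration pairs
              gwesp(0.5, fixed = TRUE) +     # H3: collaborators' collaborators
              nodecov("popularity") +        # H4: popularity and collaboration
              absdiff("popularity"))         # H5: heterogeneous popularity levels
summary(model)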
4. RESULTS This section presents the final model selected for this illustrative study. Before a final model can be selected, many others need to be run to ensure the best possible model configuration is chosen. This is true for every model but is particularly important for ERGMs. In fact, before checking the estimates of the model and the significance of the coefficients, it is necessary to make sure that the simulation works the way it is supposed to and that the model shows a certain level of goodness of fit. Hence, this section presents first the MCMC diagnostics for the model, second the goodness of fit, and only then the model results.
4.1 MCMC Diagnostics
Figure 2.2 displays the MCMC diagnostics for the final model. We can observe two plots for each term inserted in the model, after the first two (first row of plots), which refer to the term edges, typically employed in every model to play the intercept role. These plots show how
Figure 2.2a MCMC diagnostics plots
Figure 2.2b MCMC diagnostics plots
Source: Spotify data.
the simulation evolved. We want a simulation that does not go too far from the real data we observed. Assuming that the real parameter estimates are represented as 0, we can see how the simulations evolved with both the right- and left-side plots. While the left-side plots tell us how the MCMC chains evolved during the simulations, the right-side plots tell us how the estimation of the parameters evolved. Figure 2.2a shows that the plots on the left-hand side indicate that the chains are ‘mixing well’. Mixing well means the simulation traces compose a unified shape, like a fuzzy caterpillar. The different colors should not trick the reader: they only trace the parallelization employed in running the model to make it faster. Therefore, the fuzzy caterpillar shape should be considered independently of the colors. Also, the traces revolve around a zero placed at the center of the y axis. This is also a sign of the goodness of the model. If the plot showed scattered lines that do not revolve around zero, it would signify a poorly specified model or poorly specified simulation parameters. The plots on the right-hand side display the density of the parameters' distribution across the simulations. They ideally need to be normally distributed around zero. In this case, all of them are normally distributed around zero on the x axis. A poor model specification would show non-normal distributions not centered around 0. Figure 2.2b shows each parameter chain resembling a fuzzy caterpillar shape centered around zero on the left-hand side and density distributions of the parameters normally distributed around zero on the right-hand side. We can conclude that the simulation in the model worked very well, allowing us to move on to examine the goodness of fit.
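The trace and density plots of Figures 2.2a-b are produced directly by the package (model being the fitted object from the sketch above):

mcmc.diagnostics(model)  # trace plots (left) and density plots (right) per term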
Figure 2.3 Model goodness of fit
Source: Spotify data.
4.2 Model Goodness of Fit
Figure 2.3 displays the goodness of fit (GOF) for the final model. The boxplots show the variation of the parameters coming from the simulation, while the black thick line shows the parameters of the observed network. The black thick line should ideally lie inside the boxplots, so that the actual values lie within the simulation's variation range. We can appraise from Figure 2.3 that this final model has a good fit, even if there could still be some improvements, for instance for the value 1 of edge-wise shared partners. The first plot on the upper left displays the model terms, while the other three are standard ways to look at the GOF implemented by the gof() function in the ergm package.
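The GOF plots of Figure 2.3 come from the same package (again a sketch using the model object from above):

model_gof <- gof(model)  # simulates from the fitted model and compares statistics
plot(model_gof)          # one panel per statistic, as in Figure 2.3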
4.3 Model Results
Considering that both the MCMC diagnostics and the GOF show good results, we can rely on the chosen model and interpret its results displayed in Table 2.3.

Table 2.3 Model results

Model terms  Estimates  Standard error  Odds ratios  Probabilities  P-value
edges        −9.33      0.11            0            0
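As a check on how the columns relate (assuming the usual logistic link), the odds ratio is exp(estimate) and the probability is exp(estimate)/(1 + exp(estimate)); for the edges term above:

est <- -9.33
exp(est)     # odds ratio, about 8.9e-05 (rounds to 0 in the table)
plogis(est)  # probability, also about 8.9e-05: ties are extremely rare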
p > 1 − (1 − a) = a (4.5)
Figure 4.5 Network structure after processing
The decision has been made not to include initial adopters in the model. This is because it would have added a layer of complexity without adding value to the model. During the process of mathematically modeling a complex phenomenon, one should always reflect on the level of complexity of the model. The mindless addition of features to the model in order to achieve better prediction scores on the training data can lead to overfitting and, therefore, to a decrease in the model's predictive performance on unknown data (Saltelli et al., 2020). Instead of perfectly simulating an ongoing cascade inside the given network, the model assumes that a hypothetical set of initial adopters of arbitrary size “infiltrates” the network from the outside. Therefore, the model tries to answer the more general question: if the network were infiltrated by a set of initial strategy-A adopters from outside the network, would the underlying network structure allow a complete cascade to happen? For a specific day, the answer to that question depends on three things: first, the general sentiment of the news articles; second, the general market sentiment; and third, how tightly the nodes are interconnected inside their clusters. Figure 4.5 gives an overview of the model's data flow.

6.1.2 News sentiment analysis
To evaluate the current market events outside of social media, news articles from the area of cryptocurrencies were downloaded, as explained in Section 5. In order to be able to process these for the later model, the time and the sentiment of each article are required to relate them to the sentiment picture on Twitter and the prices. The news article analysis was done using the Generative Pre-trained Transformer 3 (GPT-3). To predict the sentiment of each news article, a few-shot learning method was used. In concrete terms, each request included an example of a sentiment classification. Due to its high generative capability, GPT-3 is already able to produce satisfactory results for classification. The Davinci instruct model was used, which is the most highly developed GPT-3 variant. The model was accessed through an API that takes a so-called prompt as input. The prompt contains the text snippet of the news article (i.e., the title and the short description) and a textual instruction for the AI. GPT-3 relies on the examples and the instruction to carry out the classification. With this procedure, the whole dataset was classified. The sentiment labels “positive”, “negative”, and “neutral” were predicted in the first step and afterward translated, in a further postprocessing step, into 0 for “positive” and “negative” values and −1 for “neutral”. The prediction process classified 7 percent of the newspaper articles as “neutral”, 69 percent as “positive”, and 25 percent as “negative”. Thus, 7 percent of the final processed news dataset values are −1, and the other 93 percent are equal to 0.

6.1.3 Market sentiment
A critical influencing factor in the model is the data obtained from the technical indicators, which were gathered from the API provided by Yahoo. In each case, the analysis includes the technical indicators that have already been explained in Subsection 5.3. Redundant columns have been deleted to ensure that only essential information remains. “Date”, “Close”, and “Volume” are used, which are necessary for the calculations of the respective indicators (Table 4.2).
It is essential to keep the data consistent so that the output from all indicators is homogeneous and facilitates further work. The individual indicators are implemented according to their applicability. Since past raw market behavior manifests itself in the technical indicators, it is interesting to include such influencing variables in the evaluation.
Table 4.2 Market sentiment – dataframe
Date
Close
Volume
signal_rsi signal_bb
signal_obv signal_mva
calculated_average
ta_query
2017-11-29
427,523,010
2675940096
0
0
−2
−2
−1.0
ETH
2017-11-30
447,114,014
1903040000
0
0
2
−2
−0.2
ETH
2017-12-01
466,540,009
1247879936
0
0
2
−2
−0.2
ETH
2017-12-02
463,449,005
943649984
0
0
−2
−2
−1.0
ETH
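The weighting scheme just described can be reproduced in a few lines of pandas. The sketch below uses the signal columns of Table 4.2; the dataframe contents are copied from the table for illustration.

import pandas as pd

# Indicator signals as in Table 4.2: 2 = buy potential, -2 = sell, 0 = neutral
df = pd.DataFrame({
    "signal_rsi": [0, 0, 0, 0],
    "signal_bb":  [0, 0, 0, 0],
    "signal_obv": [-2, 2, 2, -2],
    "signal_mva": [-2, -2, -2, -2],
})

# Weights reflect how intensively each indicator is applied in practice
weights = {"signal_bb": 0.3, "signal_mva": 0.3, "signal_rsi": 0.2, "signal_obv": 0.2}

df["calculated_average"] = sum(df[col] * w for col, w in weights.items())
print(df["calculated_average"].tolist())  # [-1.0, -0.2, -0.2, -1.0]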
6.1.4 Network calculation
Several issues with the data at hand were detected during the data-fetching process. First, the resulting network for a specific day consisted of several components. This is a problem because the model assumes that the analyzed network consists of only one component. The unprocessed network structure can be seen in Figure 4.7. As stated in Subsection 5.1, the nodes of the network represent the Twitter users who participate in the discussion around the respective keywords, e.g., Bitcoin (BTC) or Ethereum (ETH). Each node reflects one single user. The edges are the connections between those users. As also mentioned in Subsection 5.1, the model does not distinguish between incoming and outgoing connections; a link between two nodes thus implies that those two nodes are engaging with each other.

To deal with this issue, the assumption was made that the fragmentation of the network into components is due to missing data stemming from a data-fetching process with too little granularity. Based on this assumption, the nodes in the raw network are reconnected using the k-edge-augmentation algorithm, which searches for the smallest set of edges needed to transform the network into a one-component network; the augmented network fractures again only if at least k edges are removed (Hagberg, Schult, & Swart, 2008; Filippidou & Kotidis, 2016). For this model, the algorithm was run with k = 1, so the data is augmented in the smallest way possible.

After the network augmentation, a modularity-based approach was chosen to detect clusters inside the network. The concept of modularity centers on the fraction of edges that fall within given groups minus the expected fraction if edges were distributed at random (Brandes et al., 2008). For this purpose, the Clauset–Newman–Moore greedy modularity maximization method was chosen (Clauset, Newman, & Moore, 2004). This algorithm starts with each node in its own cluster. If the modularity can be increased by moving a node into an adjacent cluster, the node is moved. This step is repeated until no node remains whose move could further increase modularity.
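Both network steps are available in NetworkX (Hagberg, Schult, & Swart, 2008). A minimal sketch, with an illustrative toy graph in place of the fetched Twitter network:

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from networkx.algorithms.connectivity.edge_augmentation import k_edge_augmentation

# Illustrative fragmented network: two components of Twitter users
G = nx.Graph([(1, 2), (2, 3), (1, 3), (4, 5)])

# k = 1 augmentation: the smallest edge set that joins the components into one;
# removing a single edge can fracture the augmented network again
G.add_edges_from(k_edge_augmentation(G, k=1))
assert nx.is_connected(G)

# Clauset-Newman-Moore greedy modularity maximization for cluster detection
clusters = list(greedy_modularity_communities(G))
print(clusters)  # e.g., [frozenset({1, 2, 3}), frozenset({4, 5})]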
6.1.5 Cluster strength
After performing the described network calculations, the cluster strength (p) can be calculated. This scalar value in the range between 0 and 1 indicates how strongly the nodes of a cluster are interconnected: the higher the value of (p) for a given cluster, the lower the number of edges to nodes from other clusters. The cluster strength (p) of a cluster therefore holds information about how strongly isolated that cluster is from the rest of the network. To calculate (p) for a given cluster, equation (4.6) is evaluated for every node in the cluster and the results are collected in a list. The lowest value in that list corresponds to the cluster strength (p) of that cluster. This is repeated for every cluster inside the network. Of all calculated cluster strengths (p), the highest value is of interest and is therefore saved for further analysis.

p = NeighbouringNodesInSameCluster / TotalNeighbouringNodes (4.6)
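Following equation 4.6, a short sketch of the cluster strength computation; the graph and clustering are illustrative, e.g., the output of the NetworkX sketch above:

import networkx as nx

def max_cluster_strength(G, clusters):
    """Return the highest cluster strength (p) over all clusters (equation 4.6)."""
    strengths = []
    for cluster in clusters:
        # Per node: fraction of its neighbours that lie in the same cluster
        fractions = [
            sum(nbr in cluster for nbr in G.neighbors(node)) / G.degree(node)
            for node in cluster if G.degree(node) > 0
        ]
        # A cluster is only as strong as its least embedded node
        strengths.append(min(fractions))
    # Only the strongest cluster is relevant for blocking a cascade
    return max(strengths)

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])
clusters = [{1, 2, 3}, {4, 5}]
print(max_cluster_strength(G, clusters))  # 2/3, from cluster {1, 2, 3}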
6.1.6 Analysis
After (a) and (p) have been calculated for a specific day, these values can be used to analyze whether the network structure is susceptible to a complete cascade. For this purpose, the values of (a) and (p) are compared. If (p) is higher than (a), the network is not susceptible to a complete cascade because, according to Easley and Kleinberg (2010), the network then contains a cluster whose strength (p) is high enough to block a possible strategy A outbreak from spreading. This comparison is encapsulated in equation 4.5 (above).

6.2 Prediction Model
One of the goals of this project is to improve the predictive performance of existing models for cryptocurrency prices. This is, however, not the project's core, which is rather the network-theoretical approach to predicting and measuring hype or panic reactions from various influencing factors. To test whether the cascade model's output increases predictive performance, an LSTM neural network was created. The approach was to train one LSTM solo (i.e., without the values of the cascade model) and one including these values as features (LSTM + cascade model). As features of the LSTM solo, the price values "Open", "High", "Low", "Close", and "Volume" were used. For the LSTM + cascade model, the output values of the cascade model were added as features: "a", "cluster strength", "hype bool", and "hype num". The features "hype bool" and "hype num" are derived from "a" and "cluster strength": if the value of "a" exceeds the value of "cluster strength", "hype bool" is 1, otherwise it is 0; "hype num" gives the numerical difference and can thus be seen as the amplitude of the cascade excess. Since the values of the cascade model are available on a daily basis, daily price histories were used as the basis for training.
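As a brief sketch, the two derived features can be computed as follows (column names and values are illustrative):

import pandas as pd

# Daily cascade-model outputs (illustrative values)
df = pd.DataFrame({"a": [0.3, 0.6, 0.5], "cluster_strength": [0.4, 0.4, 0.5]})

# hype_bool is 1 whenever a exceeds the cluster strength, otherwise 0
df["hype_bool"] = (df["a"] > df["cluster_strength"]).astype(int)

# hype_num is the numerical difference, i.e., the amplitude of the cascade excess
df["hype_num"] = df["a"] - df["cluster_strength"]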
LSTM ARCHITECTURE

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# X_train has shape (samples, timesteps, features)
model = Sequential()
model.add(LSTM(units=50, activation='relu', return_sequences=True,
               input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=60, activation='relu', return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=80, activation='relu', return_sequences=True))
model.add(Dropout(0.4))
model.add(LSTM(units=120, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
7. RESULTS
In the subsequent sections, the results of the respective models are presented and briefly described.

7.1 Cascading Model
Inequality (4.5) is calculated for every day of the data-fetching period. The resulting diagrams display the crucial parameters (p) and (a) as time series and can be seen in Figures 4.6–4.11. Periods where the networks are susceptible to a complete strategy A cascade are shaded dark gray. In addition to these parameters, the price of the corresponding cryptocurrency is displayed to indicate whether a substantial price increase or decrease follows the detected susceptibility for the observed cryptocurrency.
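The shading of susceptible periods can be reproduced with matplotlib, for example as in the following sketch, where the daily series for (p), (a), and the price are placeholders:

import matplotlib.pyplot as plt
import numpy as np

days = np.arange(60)
p = np.random.rand(60)                        # placeholder daily cluster strength
a = np.random.rand(60)                        # placeholder daily threshold a
price = 100 + np.cumsum(np.random.randn(60))  # placeholder price series

fig, ax = plt.subplots()
ax.plot(days, p, label="p")
ax.plot(days, a, label="a")
ax.twinx().plot(days, price, color="gray")    # price on a secondary axis

# Shade days on which a > p, i.e., the network is susceptible to a complete cascade
ax.fill_between(days, 0, 1, where=a > p,
                transform=ax.get_xaxis_transform(), color="0.3", alpha=0.3)
ax.legend()
plt.show()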
Figure 4.6 BTC

Figure 4.7 ETH

Figure 4.8 SOL

Figure 4.9 LINK

Figure 4.10 SHIB

Figure 4.11 DOGE

Figure 4.12 Too few data points to train a neural network
7.2 Predictive Performance
Unfortunately, the prediction results are not satisfactory. More extensive training data would be needed to obtain acceptable results even for the LSTM solo approach. A further issue is the small amount of training data available for the LSTM model with the additional features obtained from the cascade model: these data are only available for the period between November and December, amounting to only about 45 days. This was not sufficient to generate reliable results, as neural networks in particular require large amounts of training data (Figure 4.12). Thus, it is not possible to verify whether adding features from the output of the cascade model increases predictive performance.
8. WEAKNESSES
When interpreting the results of the cascade model, one has to be cautious for several reasons.

First, the database on which the model operates is far from complete. As mentioned earlier, the data fetched for this work turned out to be insufficient. The decision to fetch the data every Sunday, combined with the limitations of the Twitter API, resulted in an uneven daily distribution of tweets with a strong bias towards the days on which the data had been fetched. The network data had to be artificially altered to deal with this issue, as described in the former sections.

Second, the model operates in a highly simplified environment. Since the model tries to reason about an exceptionally complex real-world phenomenon with exceedingly high uncertainty, simplification is unavoidable. It should therefore be noted that not all potentially influential factors and channels were included in this study, as other indicators can also influence price trends. The price is not driven only by the indicators discussed in the analysis, so future work might consider several other influential factors. In any case, the assumptions made along the way should be kept in mind when analyzing the model's output.

Third, even though the model does indicate some sort of correlation, this does not establish a causal relationship between the structure of the Twitter communication network and the price fluctuation of the corresponding asset. The model contains essential indicators that could strongly influence price development, and the inclusion of Twitter data and the addition of news data lend the work credibility. Together, the endogenous and exogenous influencing factors provide an overall picture of the market, which future research can extend with further indicators.

Regarding the prediction model (LSTM), it should be noted that the observation period and the amount of data were not sufficient for training the LSTM network. Feeding the model with more data should therefore be considered in order to obtain better results; observing a more extended time interval and including these data in the model is recommended.
9. DISCUSSION
The key characteristics of a Black Swan event, according to Taleb (2007a), are the unpredictability, massive impact, and randomness of the event in question. These characteristics render predicting such events extremely difficult, if not utterly impossible, and can thus be considered a reason for the thin body of academic papers in this research field. Taking this into account, it is essential to reformulate the underlying problem. In trying to answer the problem of predicting Black Swan events, this chapter focused on the transition of an emotional reaction from the actor level to the whole network level. The problem of predicting Black Swan events has therefore been reduced to predicting human behavior, a task that can be seen as equally tricky but that has a more sophisticated body of academic literature. This literature supplied the considerations taken in this chapter and supports the decision to examine whether the structure of the communication network associated with a specific cryptocurrency market can support a complete cascade of panic (or hype) behavior.

The endeavor of predicting human behavior inside social networks has been complemented by an effort to predict the price fluctuations resulting from soft Black Swan events. For this purpose, the model output has been used, alongside historical market data, as an additional feature to train an LSTM model for predicting cryptocurrency prices. Furthermore, comparing the output of the solo LSTM model with the output of the LSTM that includes the cascade model as a feature could have given indications about the statistical significance of the cascade model's results. This effort did not provide satisfactory results because there was too little cascade-model data to train the LSTM.

Keeping in mind the lack of statistically significant proof for the results of the cascade model, the results for ETH (Figure 4.7), SOL (Figure 4.8), LINK (Figure 4.9), and SHIB (Figure 4.10) do indicate days on which the structure of the Twitter communication network is susceptible to a complete cascade, therefore fulfilling the prerequisite for a soft Black Swan event as defined in Section 4.3.2 of this chapter. In addition, periods of complete cascade susceptibility are often followed by a sudden price drop or increase in the range of several percent, further substantiating the claims laid out in Section 4.3.2. For BTC (Figure 4.6) and DOGE (Figure 4.11), however, the model did not signal any day on which the network structure was susceptible to a complete cascade, even though the corresponding price for BTC dropped by about 24 percent in that period. These unsatisfactory results show that the model is not sensitive enough to detect every possible constellation of news, market sentiment, and network structure that supports a complete cascade.

Based on the insights gained from modeling the cascade approach, several extensions are possible. In particular, to increase the number of data points, the granularity of the downloaded data should be increased to an hourly level; this would increase the number of data points by a factor of 24. Furthermore, crypto markets are very volatile, as already explained in the theoretical background of this chapter, so predicting fast price fluctuations on an hourly level or in even shorter intervals requires analyzing the market's current cascade readiness. The algorithm could additionally be enriched with the sentiment of the Twitter data; this was not done in the first step, in order to keep the model simple.
However, including it would make it even easier to infer positive hype or negative panic, and thus the expected price trend. Furthermore, the data must be drawn over a more extended period: the eight weeks currently used are not sufficient to train an LSTM thoroughly. A final extension of this work could be an automatic trading pipeline. If the time series prediction by the LSTM achieves satisfactory performance, short-term prices could be predicted better than with conventional analyses. If the trading of small amounts can be automated and price changes converted into profits at short intervals, a cost-average effect could be exploited through frequent trading. This design assumes that, after the number of data points has been increased, the cascade model provides a significant benefit to predictive performance.

Thus, this chapter provides an essential basis for identifying hard Black Swan events, whose exogenous influences may diverge. The model approach could determine the market impact of soft Black Swan events. Obviously, the occurrence of a hard Black Swan cannot be accurately predicted. However, by reference to past events and the successive reactions and behavior of market participants, the model output could be translated into signals indicating the future path of price development and, therefore, potentially make an essential contribution to the realm of quantitative finance.
10. CONCLUSION
The scope of this work has been to improve existing endeavors in predicting the price fluctuations of markets. Unlike similar attempts, which approached the underlying problem by implementing sophisticated AI models and training them on historical market data, this chapter explored the feasibility of increasing the predictive performance of such prediction models by engineering a predictive indicator based on the analysis of a corresponding communication network. The output of the cascade model can be used as an additional feature for sophisticated AI models trained on historical market data. Unfortunately, an LSTM model with the integrated feature did not provide satisfactory results; unfavorable decisions at the beginning of the research period and the resulting lack of data were significant contributing factors. Nevertheless, the results of the cascade model hint at the possibility of predicting strong price fluctuations in the observed markets and therefore justify further research in the direction taken by this chapter.
REFERENCES
Aven, T. (2013). On the meaning of a black swan in a risk context. Safety Science, 57, 44–51.
BaFin (2019). Zweites Hinweisschreiben zu Prospekt- und Erlaubnispflichten im Zusammenhang mit der Ausgabe sogenannter Krypto-Token. https://www.bafin.de/SharedDocs/Downloads/DE/Merkblatt/WA/dl_wa_merkblatt_ICOs.html (accessed 10 February 2022).
Best, R. D. (2022). Bitcoin dominance 2022. https://www.statista.com/statistics/1269302/crypto-market-share/ (accessed 10 February 2022).
Bollinger Bands Explained (2018). https://academy.binance.com/en/articles/bollinger-bands-explained.
Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., & Wagner, D. (2008). On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2), 172–88. https://doi.org/10.1109/TKDE.2007.190689.
Braun, W. (2010). Die (Psycho-)Logik des Entscheidens. Hogrefe.
Clauset, A., Newman, M. E. J., & Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70(6). https://doi.org/10.1103/PhysRevE.70.066111.
Digital News Report (2021). Executive summary and key findings of the 2021 report. https://reutersinstitute.politics.ox.ac.uk/digital-news-report/2021/dnr-executive-summary.
Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: reasoning about a highly connected world. https://doi.org/10.1017/CBO9780511761942.
Filippidou, I., & Kotidis, Y. (2016). Effective and efficient graph augmentation in large graphs. 2016 IEEE International Conference on Big Data (pp. 875–80). IEEE. https://doi.org/10.1109/BigData.2016.7840681.
Frick, K., Guertler, D., & Gloor, P. (2013). Coolhunting for the world's thought leaders. https://arxiv.org/abs/1308.1160.
Global Cryptocurrency Ownership Data 2021 (2021). https://triple-a.io/crypto-ownership/.
Gloor, P. (2007). Coolhunting for trends on the web. https://www.researchgate.net/publication/4369790_Coolhunting_for_trends_on_the_Web (accessed 10 February 2022).
Gloor, P., Krauss, J., Nann, S., Fischbach, K., & Schoder, D. (2009). Web science 2.0: identifying trends through semantic social network analysis. 2009 International Conference on Computational Science and Engineering, 4, 215–22.
Gloor, P., & Zhao, Y. (2005). TeCFlow: a temporal communication flow visualizer for social network analysis. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=49569dff712e85a4356a634499557a480c75c391.
Hagberg, A. A., Schult, D. A., & Swart, P. J. (2008). Exploring network structure, dynamics, and function using NetworkX. In G. Varoquaux, T. Vaught, & J. Millman (eds), Proceedings of the 7th Python in Science Conference (pp. 11–15). https://conference.scipy.org/proceedings/scipy2008/SciPy2008_proceedings.pdf.
Hayes, A. (2021). On balance volume (OBV) definition. https://www.investopedia.com/terms/o/onbalancevolume.asp (accessed 10 February 2022).
Hönig, M. (2018). Initial Coin Offering: Studie zu Kryptowährungen und der Blockchaintechnologie. https://www.frankfurt-university.de/fileadmin/standard/Hochschule/Fachbereich_3/Kontakt/Professor_inn_en/Hoenig/20180502_Bitcoin_Studie_fra_uas_Hoenig_V1.0.pdf (accessed 10 February 2022).
Houben, R., & Snyers, A. (2018). Cryptocurrencies and blockchain: legal context and implications for financial crime, money laundering and tax evasion. European Parliament. https://op.europa.eu/en/publication-detail/-/publication/631f847c-b4aa-11e8-99ee-01aa75ed71a1.
Ji, S., Kim, J., & Im, H. (2019). A comparative study of bitcoin price prediction using deep learning. Mathematics, 7(10), 898. https://doi.org/10.3390/math7100898.
Li, T., Chamrajnagar, A., Fong, X., Rizik, N., & Fu, F. (2019). Sentiment-based prediction of alternative cryptocurrency price fluctuations using gradient boosting tree model. Frontiers in Physics, 7, 98. https://doi.org/10.3389/fphy.2019.00098.
Maume, P., & Fromberger, M. (2019). Regulation of initial coin offerings: reconciling U.S. and E.U. securities laws. Chicago Journal of International Law, 19(2), article 5.
Mittal, A., Dhiman, V., Singh, A., & Prakash, C. (2019). Short-term bitcoin price fluctuation prediction using social media and web search data. Twelfth International Conference on Contemporary Computing (IC3), Noida, India, pp. 1–6. https://doi.org/10.1109/IC3.2019.8844899.
Moving Average Explained (2018). https://academy.binance.com/en/articles/moving-averages-explained.
Poongodi, M., Ashutosh, S., Vignesh, V., Bhardwaj, V., Abhinav, S., Razi, I., & Rajiv, K. (2019). Prediction of the price of Ethereum blockchain cryptocurrency in an industrial finance system. Computers & Electrical Engineering, 81, 106527. https://doi.org/10.1016/j.compeleceng.2019.106527.
Rockefeller, B. (2014). Technical Analysis for Dummies. Wiley.
Saltelli, A., Bammer, G., Bruno, I., Charters, E., Di Fiore, M., Didier, E., … Vineis, P. (2020). Five ways to ensure that models serve society: a manifesto. Nature, 582(7813), 482–4. https://doi.org/10.1038/d41586-020-01812-9.
Sin, E., & Wang, L. (2017). Bitcoin price prediction using ensembles of neural networks. 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Guilin, China, pp. 666–71. https://doi.org/10.1109/FSKD.2017.8393351.
Taleb, N. N. (2007a). The Black Swan: The Impact of the Highly Improbable. Random House.
Taleb, N. N. (2007b). Preface. In The Black Swan: The Impact of the Highly Improbable. Random House.
What Is the RSI Indicator? (2018). https://academy.binance.com/en/articles/what-is-the-rsi-indicator.
Zhang, X., Fuehres, H., & Gloor, P. (2011). Predicting stock market indicators through twitter – "I hope it is not as bad as I fear". Procedia – Social and Behavioral Sciences, 26, 55–62. https://doi.org/10.1016/j.sbspro.2011.10.562.
Zhuge, Q., Xu, L., & Zhang, G. (2017). LSTM neural network with emotional analysis for prediction of stock price. Engineering Letters, 25, 167–75.
APPENDIX
Note: Normalized in the range [0;1].
Figure 4A.13 Past Black Swan events
Source: Easley & Kleinberg (2010).
Figure 4A.14 Coordination game
Source: Easley & Kleinberg (2010).
Figure 4A.15 Exemplary cascade
5. Presidential communications on Twitter during the COVID-19 pandemic: mediating polarization and trust, moderating mobility Mikhail Oet, Tuomas Takko and Xiaomu Zhou
1. INTRODUCTION
How we communicate, the extent of discourse polarization in our communities, and the level of public trust in our society can influence many aspects of our lives. They affect our understanding of a novel disease, the actions we take to protect ourselves and our families, economic vitality, and societal security. Since the outbreak of the COVID-19 pandemic, much of the international discourse regarding the role of government in resolving public health emergencies has centered on providing a clearer understanding of how epidemiology and public health information affect policy, public anxiety, and economic uncertainty (Van Bavel et al., 2020). We have yet to see much research exploring whether, and how, government communications and public trust in government affect infectious disease spread and macroeconomic outcomes. We hypothesize that clear communication by government officials attenuates communication polarity and public anxiety, increases public trust and compliance, improves control over the disease spread, and reduces economic uncertainty and financial market volatility. By contrast, unclear government communication amplifies communication polarity and public anxiety, lowers public trust and compliance, and increases disease spread, economic uncertainty, and financial market volatility.

This study collects observable and unobservable information from the COVID-19 pandemic, in a sample of two months from 2020 and 2021 across different U.S. states, to investigate the effects of public communications on social media polarization, public anxiety, and pandemic human mobility in a novel framework. This framework considers the broad category of government crisis communications as a mediating factor in the interactions between public sentiments, polarization, economic conditions, uncertainty, and pandemic outcomes. We aim to construct a set of measures for mining unstructured textual data from social media for sentiment-level polarization and public trust. This approach differs from many polarization-related studies focusing on the structural properties of social media, as we consider not the semantics of opinion but how people express their feelings in their public communications. We test our hypotheses on publicly available datasets and attempt to observe the effects and relationships in our model framework.

In this study, we first define and describe the background of our research in terms of prior studies and theories and their relation to our research questions. After defining the scope of our research, we describe the data and methodology used for this study. Finally, we report and discuss the results of the variables and their interactions, first with preliminary observations and then with the implementation of the model framework. We also aim to establish
a connection between the government communication policy and strategy choices and the related effects within the model framework.
2. BACKGROUND
The COVID-19 virus poses a multi-dimensional problem to policymakers around the world. One aspect of the challenge is the positive problem of measurement. The spread of COVID-19 is largely hidden: the virus can infect people who themselves may not exhibit symptoms but serve as carriers of contagion. Because testing capacity is constrained in many countries, the number of infections is not directly observable. Furthermore, the number of deaths is subject to measurement error, as global evidence shows that the actual cause of death of COVID-19 victims can be misidentified and underreported. Given this difficulty of observation, policymakers are likely using incomplete and erroneous data to weigh the benefits of control policies such as containment and testing against the economic costs of containment, such as growing economic slack, unemployment, and market disruptions.

Another aspect of the challenge is the normative problem of government communication. Governments have a choice of communication focus and strategy. Recognizing its pivotal role in the population's complex mechanism of sentiment formation, a government may focus its communication on addressing the emergence of public anxiety. Alternatively, a government may focus its communication on the economic fallout from pandemic control measures, on the generation of hope through the promotion of potential treatment options, or, indeed, on active resistance to the catastrophization of the pandemic. Furthermore, a government may choose to shift its communication strategy dynamically in order to shift public opinion. In this setting, the choice of government communication policy becomes an important channel for influencing the hidden phenomenon of public uncertainty. However pivotal the government's influence on public sentiment may be, news media and social media provide other essential channels influencing public uncertainty, and each communication channel plays some role in forming general uncertainty. Public uncertainty manifests itself in market phenomena such as volatility and in expectations observed in the slope of the yield curve.

Since the early 2010s, a significant shift has occurred in how governments communicate with the public, particularly on social media. While social media has been used for many years by individuals and organizations to communicate with their audiences, the rise of social media use by government entities is relatively recent. Many governments worldwide have embraced social media to communicate with citizens (West, 2005; Skoric et al., 2016). Since the election of President Obama in 2008, there has been a concerted effort by the U.S. government to use social media platforms to engage with citizens and provide transparency in government operations (Bertot et al., 2012; Mergel, 2013). This trend has continued under subsequent administrations, including the Trump administration and now the Biden administration. Presidential communication strategies in the U.S. have included policy announcements (Obama, Trump, Biden), photos of the First Family (Obama), criticism of political opponents (Trump), sharing personal opinions and views (Trump, Biden), and engaging with and responding to questions from the public (Trump, Biden).

The objective of this research is to provide deeper insights into the effect of presidential social media communication strategies on public health and economic outcomes in the context
of a global pandemic and to explore the relationships between pandemic health impact, government communication, social media polarization, public anxiety, human mobility, public trust, and economic uncertainty. We consider the following research questions:

RQ1. Does presidential social media communication respond to the human toll of the pandemic?
RQ2. Is presidential social media communication associated with social media polarization, public anxiety, human mobility, public trust, and economic uncertainty?
RQ3. Does presidential social media communication mediate the relationship between the human toll of the pandemic and the effects of polarization, public anxiety, human mobility, public trust, and economic uncertainty?
RQ4. Does presidential social media communication moderate the relationships of human mobility with public anxiety and public trust, and of economic uncertainty with human mobility and public trust?

To answer the above questions, we offer a literature review around a few themes that closely relate to our research inquiries.

2.1 Role of Communication in a Public Health Crisis
Maslow (1943) considered safety the foundation of the "hierarchy of human needs." Disasters not only threaten human safety but also shock standards of living. Such impacts constitute Maslowian shocks, leading disaster victims to lose confidence in the government (Olson & Gawronski, 2010). Patterson et al. (2010) discuss the duality observable in community responses to disasters: responses to natural disasters tend to foster "therapeutic communities," whereas man-made disasters tend to be characterized by "corrosive communities," resulting in blame, anger, and greater stress. Leiss and Powell (1997) describe the importance of two parallel communication channels influencing public perception of disaster risk: the expert channel and the public channel. Glass and Schoch-Spana (2002) emphasize the mediating role of government communication in controlling the public's risk perception. Ding (2014) observes that pandemic communications reveal both the structure of the communication channels and the role of government strategy in shaping distinct discourse narratives in each channel. The author considers the communication in the expert channel between experts and the government to be "classified governmental discourse" with "conveyed messages completely different from and, in some cases, contradictory to those produced for public consumption." The second, public discourse takes place in the public channel. In the case of epidemics, the classified discourse communicates epidemiological information, such as the extent of the outbreak, to an audience with classified access in order to facilitate the selection of the containment strategy. The public discourse promotes stability and controls mass panic, rumors, and overpricing. As Ding (2014) points out, during the SARS epidemic in China in 2002–03, the national and regional governments effectively controlled public panic by creating the illusion of transparency. Government strategy consisted of
censorship of all governmental and commercial media, banning relevant private messages, and controlling the discourse through partial disclosures in mainstream and commercial media. Sandman (1994: p. 254) generalizes that the media's "alarming content about risk is more common than reassuring content or intermediate content—except, perhaps, in crises, when the impulse to prevent panic seems to moderate the coverage." Ungar (1998: p. 36) observes such moderation during the 1995 Ebola outbreak in Zaire, when media discourse shifted its narrative from Ebola mutations and contagion to a containment discourse that used "a strategy of 'othering' to allay the fear." Ungar (2008) analyzed the Sandman (1994) proposition empirically using Google news data (2004–06) covering the Avian Flu and identified three stages of an epidemic discourse: a fearful narrative, mixed messages, and a containment narrative. Drawing on these precedents, Ding (2014) suggests that Western governments can pursue effective disaster communications by crafting a containment narrative rather than through censorship and media control.

Scholars have extensively studied the pivotal role of social media in emerging epidemics. For instance, Ding (2009) examines rhetorical data on the emergence of the SARS epidemic in China and finds that social media effectively circumvented institutional controls such as state censorship to provide vital information to the public. Ding's analysis is based on the limited epidemiological information available between November 2002 and March 2003 in Guangdong Province. The author argues that social media, word-of-mouth communication, and the panic buying triggered by social media played crucial roles in pressuring the Guangdong Municipal Government. As a result, the government held its first and only official press conference on February 11, 2003, during which officials declared that the local epidemic was under control. The study further concludes that when government communication is absent in a crisis, social media may play an essential role in disseminating information from those with access to inside information and in warning the larger public.

2.2 Government Communication as a Mediator of Critical Information, Public Trust, and Ease of Public Anxiety
Reynolds and Quinn (2008: p. 13S) consider public trust essential to enabling desired public behavior during a crisis. They suggest that, in the face of a crisis, trust is an effective outcome of persuasive communication shown by public officials who can communicate with "empathy and caring, competence and expertise, honesty and openness, and dedication and commitment." In their seminal research on the epidemic theory of communication processes, Goffman and Newill (1964) suggest that people tend to be more affected by some ideas than by others, resist the rest, and pass the ideas they believe on to others. Ironically, this selective yet fast information transmission phenomenon, "an intellectual epidemic," mirrors the exact mechanism by which an infectious disease spreads (Goffman & Newill, 1967). In the same context, Alaszewski and Coxon (2009) extend the notion of viral information from ideas to sentiment, finding that emotions, such as fear, can spread in the same pattern as a viral disease. Similarly, Van Damme and Van Lerberghe (2000) discuss two parallel anxiety channels, psychological and bio-demographic, with the former easier to feel but more challenging to measure quantitatively. Based on historical examples, the authors state that when the public faces various challenges and threats in a pandemic, it turns to administrative leadership or religious authorities for answers. Government
and public health leaders should therefore consider both psychological (such as fear) and bio-demographic burdens in their communication; this approach can be more effective than focusing on one dimension alone. These findings suggest the conjecture that epidemiological shocks are accompanied by the emergence of several interdependent communication channels, each with an intrinsic mechanism to propagate information and anxiety exponentially. The first exponential channel is the epidemiological process itself. Government communication provides a second channel for the viral transmission of human sentiments such as fear/anxiety, calm/trust, harmony/unity, discord/polarity, and confusion/uncertainty. These sentiments can also be transmitted exponentially in the media channels of communication.

2.3 Quality of Communication as a Mediator of Trust in Government
By examining the role of communication in a public health emergency in the context of the 2009 H1N1 pandemic, Quinn et al. (2013) find that the quality of communication, described by the alignment of government communication and the news, is associated with higher trust in government actions. At the same time, there is evidence that governments are perceived as more transparent, credible, and trustworthy when they adopt social media to provide channels for information dissemination, communication, and participation (Park et al., 2016; Song & Lee, 2016).

2.4 Communication Polarity as a Mediating Amplifier of Public Anxiety
Scholars have studied political polarization on the Twitter platform. Conover et al. (2011: p. 89) find that networks of political retweets are polarized, i.e., left- and right-leaning users rarely connect. At the same time, the network of retweets with person-to-person mentions "is dominated by a single politically heterogeneous cluster of users in which ideologically opposed individuals interact at a much higher rate than the network of retweets." In a related study, Hong and Kim (2016) emphasize that social media is capable of both polarization and alignment, demonstrating its dual aspects: social media polarization is manifested by echo chambers that push public opinion to further extremes, whereas, through "crosscutting interactions" among other political communications, social media alignment can build connections across polarized communities.

Bonneux and Van Damme (2006) discuss how media sentiment amplifies public fear during epidemics. The media amplification effect is not exclusive to epidemics but is likely characteristic of acute episodes that feed public anxiety. For example, Bollen et al. (2011) examine how strongly Twitter influenced public sentiments during the financial crisis of 2008. They argue for a more thorough and larger-scale analysis of public mood formation, which potentially has predictive value for social and economic indicators. Baker et al. (2020) examined the extraordinary market response to the COVID-19 pandemic. They observed that as the virus transitioned from a regional crisis in China's Hubei Province to a worldwide pandemic, equity values plunged and market volatility surged globally. Volatility levels in the United States approached or exceeded those recorded in October 1987, December 2008, and during the late 1920s and early 1930s. A possible explanation for this striking reaction is that information about pandemics now spreads more
quickly. This contrasts with the Spanish Flu, which had significantly higher mortality rates but failed to trigger even a small fraction of such daily stock market jumps.
3. METHODS AND MATERIALS
In this section, we describe the hypotheses, related to research questions RQ1 to RQ4 listed in Section 2, that our study tests, and we present literature as supporting evidence for generating these hypotheses (Table 5.1). After the hypotheses and the related conceptual framework, we describe the metrics and methods used for testing and investigating the framework. We begin by describing the data sources and the construction of the measures. The bulk of this work is constructing methods for measuring social media sentiment polarization and public trust from unstructured data. For measuring polarization, we use dimensionality reduction to construct composite measures as a proxy for emotional polarization.

3.1 Conceptual Framework
Our conceptual framework consists of seven variable constructs: Pandemic Public Health (PPH), Presidential Communications (PC), Social Media Polarization (SMP), Public Anxiety (PA), Pandemic Human Mobility (PHM), Public Trust (PT), and Economic Uncertainty (EU). The interactions between the system variables are shown in Figure 5.1.
Figure 5.1 Study framework depicting each hypothesis
Table 5.1 Hypotheses

H1 (Direct Association): PPH is directly associated with PC (Kim & Kreps, 2020; Musolff et al., 2022).

H2 (Direct Association):
(H2a) PC is directly associated with SMP (Jiang et al., 2020; Hong & Kim, 2016).
(H2b) PC is directly associated with PA (Van Scoy et al., 2021; Mheidly & Fares, 2020).
(H2c) PC is directly associated with PHM (Young, 2022; Levitt et al., 2022).
(H2d) PC is directly associated with PT (Hyland-Wood et al., 2021; Young, 2022).
(H2e) PC is directly associated with EU (Huynh et al., 2021).

H3 (Mediation):
(H3a) PC mediates the relationship of PPH with SMP (Claeys & De Waele, 2022; Hyland-Wood et al., 2021; Xu et al., 2022).
(H3b) PC mediates the relationship of PPH with PA (Kim & Kreps, 2020).
(H3c) PC mediates the relationship of PPH with PHM (Chapin & Roy, 2021; Hu et al., 2021; Huang et al., 2020).
(H3d) PC mediates the relationship of PPH with PT (Van Scoy et al., 2021).
(H3e) PC mediates the relationship of PPH with EU (Huynh et al., 2021).

H4 (Moderation):
(H4a) PC moderates the relationship of PA with PHM (Kim & Kreps, 2020; Hu et al., 2021; Huang et al., 2020).
(H4b) PC moderates the relationship of PT with PHM (Van Scoy et al., 2021; Hu et al., 2021; Huang et al., 2020).
(H4c) PC moderates the relationship of EU with PHM (Huynh et al., 2021; Hu et al., 2021; Huang et al., 2020).
(H4d) PC moderates the relationship of EU with PT (Huynh et al., 2021; Van Scoy et al., 2021).

Note: All hypotheses are evaluated at 95% confidence.
3.2 System Constructs
The system described in Section 3.1 consists of seven constructs depicting the system's state, with their definitions provided in Table 5.2. In the following subsections, we detail how these constructs were derived, including the relevant data sources and methods used for calculation.

3.2.1 Social media polarization
We utilize two Twitter datasets from separate sources: TBCOV (Imran et al., 2022) data from November 2020 and GeoCOV (Qazi et al., 2020) data from December 2021. Both datasets consist of tweets related to COVID-19, and thus we do not perform any further topical filtering. In November 2020, the dataset shows a daily average of 2,956 tweets per state, ranging from 99 to 29,744. The December 2021 dataset is sparser, displaying a daily average of 90 tweets per state, with a range of 0 to 965. The decline in the number of original COVID-19-related tweets in the data impacts the design of our social media polarization measure and is one of the underlying reasons for extracting the polarizing emotional and categorical dimensions separately for each month.

The unit of analysis in the present study is an ij observation: the i-th daily observation, from November 2020 and December 2021, for the j-th U.S. state, with users assigned to states based on their self-proclaimed location. To accomplish this, we matched users to the full name of each state or its two-letter code. We limited our analysis to original tweets, removing retweets from the dataset. We analyzed each tweet's content using sentiment scores generated by the Semantic Brand Score (SBS) software (Fronzetti Colladon & Grippa, 2020). This software combines two text analysis lexicons: the Linguistic Inquiry and Word Count (LIWC) lexicon (Pennebaker et al., 2015) and the National Research Council (NRC) word-emotion association lexicon (Mohammad & Turney, 2013).
Table 5.2 System constructs in this study

Social Media Polarization (SMP)
Formulas:
SMP_ij = √(BC_ij^2 + NDV_ij^2) (5.1)
BC_ij = (m3_ij^2 + 1) / (m4_ij + 3(n − 1)^2 / ((n − 2)(n − 3))) (5.2)
m3 = (1/n) Σ_{i=1}^{n} ((x_i − x̄)^3 / s^3) (5.3)
m4 = (1/n) Σ_{i=1}^{n} ((x_i − x̄)^4 / s^4) − 3 (5.4)
NDV_j = V_j / V_total, where V_total = Σ_{j=1}^{J} V_j (5.5)
Notes: In Equation 5.1, Social Media Polarization (SMP_ij) is calculated from the bimodality coefficient (BC_ij) and dispersion, i.e., the normalized dimension-specific variance (NDV_ij). The index i corresponds to daily observations from November 2020 and December 2021; the index j describes the U.S. state corresponding to the observation. The final SMP_ij is obtained from dimensionality reduction. The bimodality coefficient (BC_ij) is calculated as shown in Equation 5.2, where m3 is the sample skewness and m4 is the sample excess kurtosis; these are defined as shown in Equations 5.3 and 5.4, where n is the sample size, x_i is the i-th observation, x̄ is the sample mean, and s is the sample standard deviation. NDV_j is the normalized dimension-specific variance for the j-th dimension, V_j is the variance in the j-th dimension, Σ_j V_j is the sum of variances across all dimensions, J is the total number of dimensions, and V_total is the total variance in the set of dimensions.

Public Trust (PT)
Formulas:
PT_ij = Σ_{pt} p̄t_ij (5.6)
pt ∈ {Honesty, Caring, Commitment, Competency, Consistency, Fairness} (5.7)
Notes: Public Trust (PT_ij) is the sum of the means of each lexical trust value (p̄t_ij), constructed using the pt set of custom dictionaries for the constructs of Honesty, Caring, Commitment, Competency, Consistency, and Fairness, using the Linguistic Inquiry and Word Count (LIWC) software version 22 (LIWC-22).

Public Anxiety (PA)
Formula:
PA_ij = p̄a_ij (5.8)
Notes: Public Anxiety (PA_ij) is the mean of LIWC anxiety (Pennebaker et al., 2015) in the set of tweets, estimated using the Semantic Brand Score (SBS) (Fronzetti Colladon, 2018) software available in the SBS web app (Fronzetti Colladon & Grippa, 2020).

Presidential Communication (PC)
Formula:
PC_ij = {p̄c^1_ij, p̄c^2_ij, …, p̄c^n_ij} (5.9)
Notes: Presidential Communication (PC_ij) is a set of n = 99 daily mean dimensions p̄c^n_ij available in the lexicons of LIWC and the National Research Council (NRC) word-emotion association (Mohammad & Turney, 2013), estimated using the SBS web app.

Pandemic Public Health (PPH)
Formulas:
PPH_ij = (1/4) Σ_c (c_ij − min(c)) / (max(c) − min(c)) (5.10)
c_ij ∈ {cpc_ij, tpc_ij, dpc_ij, cpt_ij} (5.11)
Notes: Pandemic Public Health (PPH_ij) is the mean of the normalized categorical condition indices c_ij observed from the set of cases per capita (cpc_ij), tests per capita (tpc_ij), deaths per capita (dpc_ij), and cases per test (cpt_ij).

Pandemic Human Mobility (PHM)
Formulas:
PHM_ij = Σ_m m_ij (5.12)
m ∈ {retail, grocery, residential, transit, parks, work} (5.13)
Notes: Pandemic Human Mobility (PHM_ij) is the mean mobility difference to baseline.

Economic Uncertainty (EU)
Formula:
EU_ij = 0.5 EPU_ij + 0.5 BLS_ij (5.14)
Notes: Economic Uncertainty (EU_ij) is the mean of the state-specific indicators of Economic Policy Uncertainty (EPU_ij) and the economic condition index from the Bureau of Labor Statistics (BLS_ij).

Note: Detailed descriptions for the constructs are provided in Subsections 3.2.1 through 3.2.7.
We employed the LIWC lexicon to assess the topics in each tweet and the NRC lexicon to calculate sentiment scores. This approach yielded 27 numerical dimensions of information for each tweet, some of which depict the prevalence of a particular sentiment or emotion in the text (e.g., joy, anger, fear), while others depict the topical or linguistic properties of the text (e.g., health, pronouns, we, they). These dimensions are listed in Table 5.3, Panel A. We examined the relationship between social media communication polarity and other constructs, including PPH, PA, and PHM. To do this, we computed numerical measures from sentiment- and emotion-specific distributions for each state (j) on a specific date (i).
Table 5.3 All variables used in the analysis and component variables with significant loadings from the dimensionality reduction methods for both time windows

Panel A: Lexicon
LIWC: Tone, I-pronoun, You-pronoun, We-pronoun, They-pronoun, Affective, Sad, Social, Biological processes, Health, Death, Risk, Work, Leisure, Home, Money, Perceptive, Cognitive processes
NRC: Anger, Fear, Anticipation, Surprise, Positive, Negative, Sadness, Disgust, Joy

Panel B: November 2020 (component variables with significant loadings)
First Principal Component: Tone, Joy
First Factor: Anger, Sad, Biological processes, Health, Death, Risk, Leisure, Home, Money, Anticipation, Sadness(NRC), Disgust(NRC), Perceptive
Second Factor: I-pronoun, You, We, They, Affective processes, Social, Cognitive processes, Joy(NRC)

Panel C: December 2021 (component variables with significant loadings)
First Principal Component: Fear, Surprise, Positive(NRC), Negative(NRC), Sadness(NRC), Disgust(NRC)
First Factor: Tone, Fear, Surprise, Positive(NRC), Negative(NRC), Sadness(NRC), Disgust(NRC), Cognitive process
Second Factor: Tone, You, Affective processes, Social, Biological processes, Health

Note: For detailed descriptions of each variable, please refer to Pennebaker et al. (2015) and Mohammad & Turney (2013).
DiMaggio et al. (1996) define communication polarization in terms of dispersion and bimodality. Our operationalization of this approach (Eq. 5.1) is interdisciplinary, combining concepts from social network analysis, data mining, and statistical modeling. We estimate Social Media Polarization (SMP_ij) for the ij unit of analysis as a distance metric combining the Bimodality Coefficient (BC_ij) and dispersion. We measure the bimodality of the sentiment distribution using Sarle's bimodality coefficient (Knapp, 2007) (Eq. 5.2), with auxiliary measures of sample skewness (Eq. 5.3) and sample excess kurtosis (Eq. 5.4). We define dispersion as a sentiment's Normalized Dimension-specific Variance (NDV_ij) for the j-th dimension in tweets within a state during an i-th one-day time window (Eq. 5.5). Coefficient values above .555 are conventionally considered bimodal. In this study, however, we compare dataset values to themselves without classifying distributions, assuming that higher bimodality coefficients and variances indicate greater user sentiment-level polarization. Using these methods, we obtain 54 time series of daily values, 27 bimodality and 27 dispersion variables, spanning the 30 days of November 2020 and the 31 days of December 2021.
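A minimal sketch of equations 5.1 to 5.5 using NumPy and SciPy; the tweet-by-dimension matrix below is an illustrative placeholder for one state and day:

import numpy as np
from scipy.stats import kurtosis, skew

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient (Eq. 5.2), using sample skewness (Eq. 5.3)
    and sample excess kurtosis (Eq. 5.4)."""
    n = len(x)
    m3 = skew(x)
    m4 = kurtosis(x)  # Fisher definition, i.e., excess kurtosis
    return (m3**2 + 1) / (m4 + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))

# Placeholder sentiment matrix: 200 tweets x 27 lexicon dimensions
sentiment = np.random.rand(200, 27)

variances = sentiment.var(axis=0)
ndv = variances / variances.sum()  # normalized dimension-specific variance (Eq. 5.5)
bc = np.array([bimodality_coefficient(sentiment[:, j])
               for j in range(sentiment.shape[1])])

smp = np.sqrt(bc**2 + ndv**2)      # distance metric of Eq. 5.1, one value per dimension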
84 Handbook of social computing For each bimodality–dispersion pair, we construct a heuristic combined variable. We first normalize dispersion values over the time window for all states. Then, we construct a combined value for each sentiment using sci-kit learn’s transform functionalities and factor analysis (Figure 5.2). Table 5.3 (Panels B and C) lists variables with relevant loadings for each component from dimensionality reduction methods.
Figure 5.2 Processing pipeline for constructing measures of sentiment level polarization from unstructured data from Twitter
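The combination step of Figure 5.2 can be sketched with scikit-learn as follows; the input matrix, its dimensions, and the component counts are assumptions for illustration:

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import MinMaxScaler

# Rows: state-day observations; columns: 27 bimodality plus 27 dispersion values
X = np.random.rand(1500, 54)

# Normalize values over the time window (applied here to all columns for brevity)
X = MinMaxScaler().fit_transform(X)

pca_scores = PCA(n_components=1).fit_transform(X)                # first principal component
factor_scores = FactorAnalysis(n_components=2).fit_transform(X)  # two EFA factors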
3.2.2 Public trust
In addition to the sentiment extraction for measuring polarization, we utilize a novel approach for measuring public trust from unstructured data in order to study its interactions within the framework. Similar to Grimmelikhuijsen and Knies (2017) and Bonavita and Maitland (2021), we construct bags of words for six categories of trust (caring, commitment, competency, consistency, fairness, and honesty), integrating the work of Renn and Levine (1991), Meredith et al. (2007), and Grimmelikhuijsen and Knies (2017). Renn and Levine (1991) reviewed previous research on trust and credibility in risk communication and synthesized their findings to propose a scale for measuring trust, using case studies to illustrate the importance of trust in risk communication. Meredith et al. (2007) conducted a comprehensive literature review to identify factors that influence trust in public health messaging and made recommendations for improving such trust during a bioterrorist event. Grimmelikhuijsen and Knies (2017) developed and validated a scale for measuring citizen trust in government organizations by surveying 2,013 Dutch citizens, employing exploratory and confirmatory factor analysis to validate the scale and examining its reliability and validity.

In our implementation of similar principles, the bag of words for honesty contains unigrams such as "believable" and "candidly" for the positive measure, and "deceive" and "devious" for the negative measure. The complete list of unigrams across all categories contains 1,103 tokens. For each tweet, we calculate the frequency of words from each category (pt) appearing in the text using the LIWC-22 software, yielding six distributions of public trust for each state on each day. In the scope of this study, we use the sum of the means of the six categories as the measure of overall perceived public trust PT on a single date (Eq. 5.6, Eq. 5.7).
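A minimal sketch of the scoring logic is shown below. The four honesty unigrams are the examples given above; the tweets, the tokenization, and the reduction to a single category are simplifications of the LIWC-22 procedure used in the study.

# Illustrative excerpt of the custom trust dictionaries described above
trust_lexicon = {
    "honesty_pos": {"believable", "candidly"},
    "honesty_neg": {"deceive", "devious"},
}

def category_frequency(tweet, words):
    """Relative frequency of dictionary words in one tweet."""
    tokens = tweet.lower().split()
    return sum(token in words for token in tokens) / max(len(tokens), 1)

tweets = [
    "officials spoke candidly about the outbreak",
    "they deceive us again",
]

# Mean frequency per category; PT is the sum of the category means (Eq. 5.6)
means = {cat: sum(category_frequency(t, words) for t in tweets) / len(tweets)
         for cat, words in trust_lexicon.items()}
pt = sum(means.values())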
3.2.3 Public anxiety
In the conceptual framework, we investigate the effect of SMP on PA. PA is constructed using the SBS software (Fronzetti Colladon & Grippa, 2020) implementation of LIWC anxiety (Pennebaker et al., 2015). The measure of public anxiety for a state is the mean anxiety of all tweets during that day (Eq. 5.8).

3.2.4 Presidential communications
To investigate the effect of Presidential Communications, we collect a dataset of all texts from the official Twitter account of the President of the United States ("@POTUS") and perform an analysis similar to that of the public communication polarization, calculating the sentiments using the LIWC and NRC lexicons estimated via the SBS web app (Eq. 5.9). We do not topically filter the tweets of the POTUS account, as topical hashtags are not used consistently in the tweets, and separating the crisis communications from other topics would imply that the other topics do not affect the system. The finalized dataset consists of daily means for each of the 89 LIWC sentiment dimensions and 10 NRC dimensions, which we investigate in parallel as part of the model framework.

3.2.5 Pandemic public health
This study's pandemic health outcomes (PPH) are constructed as a numerical index from the data provided by Johns Hopkins (Dong et al., 2020) and the CDC (Centers for Disease Control and Prevention, 2023). For each state, we consider the number of daily cases per capita, tests per capita, deaths per capita, and cases per number of tests. We normalize each of these values across all states, such that the state with the highest value has a value of 1 and the lowest has a value of 0, and calculate the mean for each state (Eq. 5.10, Eq. 5.11).

3.2.6 Pandemic human mobility
The pandemic human mobility measure is an index similar to the pandemic public health outcomes, but in terms of movement from the Google Community Mobility Reports (Google LLC, n.d.). Mobility is measured as a percentage difference from a baseline, the median daily mobility before the pandemic (i.e., between January 3 and February 6, 2020). Within the scope of the present study, we make the simplifying assumption that reduced mobility indicates higher compliance, regardless of the restrictions in individual states during the time windows of our data. The mobility categories for this measure are retail & recreation, grocery & pharmacy, residential, transit, parks, and workplaces. To construct an index for pandemic human mobility (PHM), we calculate the sum of the percentage differences from the baseline for each state for each day (Eq. 5.12, Eq. 5.13). As the measure of baseline difference does not account for weekdays or seasonal fluctuations, we use the monthly mean in our preliminary analysis.

3.2.7 Economic uncertainty
The final measure in our framework is an economic uncertainty index (EU_ij) constructed as the mean of forward-looking economic conditions, namely economic policy uncertainty (EPU_ij) (Baker et al., 2016) and the economic condition index from the Bureau of Labor Statistics (BLS_ij) (Eq. 5.14). The data are recorded at weekly intervals for each state. We also use both values separately in testing the proposed model.
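As a brief illustration of the cross-state min-max normalization behind PPH (Eqs. 5.10 and 5.11), the following pandas sketch uses invented values for three states on one day:

import pandas as pd

# Daily values per state for the four categorical condition indices
df = pd.DataFrame({
    "cpc": [0.002, 0.004, 0.001],     # cases per capita
    "tpc": [0.02, 0.05, 0.03],        # tests per capita
    "dpc": [0.0001, 0.0002, 0.0001],  # deaths per capita
    "cpt": [0.10, 0.08, 0.03],        # cases per test
}, index=["state_A", "state_B", "state_C"])

# Normalize each index across states to [0, 1], then average (Eq. 5.10)
normalized = (df - df.min()) / (df.max() - df.min())
pph = normalized.mean(axis=1)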
4. RESULTS

4.1 Measurement
This section reports the results of applying the described methodology to the Twitter data. In the scope of this study, we restrict the reporting of the preliminary analysis (figures and correlations) to the SMP interactions. We calculate the constructs for SMP using the methods described in Section 3.2 and perform a preliminary analysis of the components from the dimensionality reduction. First, we analyze the emotional polarization measures state by state, investigating the loadings of the significant components passing our criteria. We find that most of the polarization components in each state are unique, meaning that the most significant emotional dimensions in bimodality and dispersion differ from state to state. The most common significant components appeared three times across the states in November 2020 and contained dimensions such as affective, social, risk, home, anger, biological processes, joy, and fear. The number of relevant components by state is shown in Figure 5.3 for the two months of our data.
Figure 5.3  Number of states with several relevant principal components for November 2020 (left) and December 2021 (right)
As we aim to compare and evaluate the effects of SMP on PA, pandemic human mobility, and pandemic public health, we perform principal component analysis (PCA) and factor analysis on the time series of all states simultaneously. Using the filtering described in Section 3.2.1, we obtain a single relevant component from PCA and two components from factor analysis for each of the two months separately. The overlap in the variables loading onto the dimensionality reduction solutions of PCA and exploratory factor analysis (EFA) in 2021 emphasizes the importance of those particular loadings. In contrast, the difference between the two solutions in November 2020 is larger, implying that the general emotional polarization was more complex. However, it should be noted that once we calculate the component values of SMP, we utilize all the dimensions, even those with low loadings.
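For readers who want to reproduce the dimensionality reduction step, the sketch below shows the general shape of the computation with scikit-learn. The explained-variance cutoff is a stand-in assumption for the filtering criteria of Section 3.2.1, and the input matrix (state-day rows by LIWC/NRC polarization dimensions) is assumed.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

def relevant_components(X: np.ndarray, var_cutoff: float = 0.10):
    """Fit PCA on standardized data and keep components that each
    explain at least `var_cutoff` of the total variance."""
    Z = StandardScaler().fit_transform(X)
    pca = PCA().fit(Z)
    keep = pca.explained_variance_ratio_ >= var_cutoff
    loadings = pca.components_[keep]      # rows: retained components
    scores = pca.transform(Z)[:, keep]    # SMP component values per state-day
    return loadings, scores, pca.explained_variance_ratio_[keep]

def efa_two_factor(X: np.ndarray):
    """EFA counterpart used for the two-factor solutions."""
    Z = StandardScaler().fit_transform(X)
    fa = FactorAnalysis(n_components=2).fit(Z)
    return fa.components_, fa.transform(Z)
```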
The extracted components can be characterized between the two time windows in terms of the relevant variables. The most important variables in November 2020 are related to emotional dimensions (sadness, joy, disgust) and physical categories related to life (health, work, home, money). The second EFA factor contains variables related to pronouns, affective talk, cognitive processes, and joy, which can be characterized as affectionate dialogue between groups and individuals. In December 2021, the PCA components and EFA factors captured important variables such as fear, surprise, and general emotions (sadness, positive, negative, disgust). The appearance of fear and surprise in the dimension reduction solution differs from the previous time window and reinforces the case for window-specific analysis. The second EFA factor overlaps with that of the previous time window, including social, biological processes, and health variables.
For a preliminary analysis of the SMP measure in our model framework, we first compare the monthly changes and means by visual inspection (see Figures 5.4–5.6) and calculate the corresponding Pearson correlations. Visually inspecting the figures that plot the change in pandemic public health against the monthly mean SMP, we observe a low correlation between the two constructs. States with higher mean SMP can be observed to have a more positive change in pandemic public health (i.e., the pandemic outcomes worsen during the month compared to the other states). However, the correlations between mean SMP and the change in pandemic public health are low for both time windows (≈0.11 to ≈0.4), which can be expected from the direct interaction in our conceptual model. It should be noted that the implementation of the dimensionality reduction methods can also yield components that show oppositely signed correlations to the other constructs in the conceptual model.
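The preliminary correlation check itself reduces to a few lines once the monthly aggregates exist; a hedged pandas sketch with assumed column names:

```python
import pandas as pd

def smp_pph_correlation(df: pd.DataFrame) -> float:
    """df: daily rows with columns ['state', 'date', 'smp', 'pph'].
    Correlates monthly mean SMP with the monthly change in PPH by state."""
    by_state = df.sort_values("date").groupby("state")
    mean_smp = by_state["smp"].mean()
    delta_pph = by_state["pph"].last() - by_state["pph"].first()
    return mean_smp.corr(delta_pph)  # Pearson by default
```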
Notes: SMP is constructed as the principal component 1. US states are shown as dots or circles according to the political leaning based on the results of the 2020 presidential election.
Figure 5.4a  The change in Social Media Polarization (SMP) during November 2020 and Pandemic Public Health (PPH) by state
Notes: SMP is constructed as the factor analysis factor 1. US states are shown as dots or circles according to the political leaning based on the results of the 2020 presidential election.
Figure 5.4b  The change in Social Media Polarization (SMP) during November 2020 and Pandemic Public Health (PPH) by state
Notes: US states are shown as dots or circles according to the political leaning based on the results of the 2020 presidential election.
Figure 5.5  Social Media Polarization (SMP) as principal component 1 for November 2020 and Public Anxiety (PA) by state
Notes: US states are shown as dots or circles according to the political leaning based on the results of the 2020 presidential election.
Figure 5.6  Social Media Polarization (SMP) as principal component 1 for November 2020 and Pandemic Human Mobility (PHM) by state
The second interaction in our conceptual framework is between SMP and PA. The preliminary investigation (Figure 5.5) shows a similarly low level of correlation (≈0.22) for November 2020 and roughly the same magnitude for December 2021 (≈0.16). The third interaction with SMP in our conceptual model is between SMP and PHM. The correlation between the states’ polarization and the monthly average of pandemic human mobility shows an interesting change, from practically 0 in November 2020 to −0.289 in December 2021. The negative sign would suggest that states with higher polarization have lower average mobility compared to the baseline (i.e., higher compliance). All the correlations not included in the hypotheses of this study are listed in Table 5.4.
Table 5.4  Pearson correlation between the monthly means of Pandemic Human Mobility (PHM), Public Trust (PT), Public Anxiety (PA), Economic Uncertainty (EU), Social Media Polarization (SMP), and the change of Pandemic Public Health (PPH) during the months of November 2020 and December 2021

Construct/variable pair                                    November 2020   December 2021
Public Anxiety, Public Trust                                0.42596         0.62292
Social Media Polarization, Public Anxiety                   0.19261         0.15527
Social Media Polarization, Pandemic Public Health           0.39595         0.0427
Social Media Polarization, Pandemic Human Mobility          0.03447        −0.289
Pandemic Human Mobility, Public Trust                       0.1198         −0.07778
Pandemic Human Mobility, Economic Uncertainty (BLS)         0.33886        −0.11572
Pandemic Human Mobility, Public Anxiety                     0.05969         0.06688
Pandemic Public Health, Pandemic Human Mobility             0.39817        −0.3764
Pandemic Public Health, Public Trust                        0.10548         0.05944
Pandemic Public Health, Public Anxiety                      0.37318         0.24567
Pandemic Public Health, Economic Uncertainty (BLS)         −0.10606         0.17426
4.2 Direct Association, Mediation, and Moderation

4.2.1 H1 direct association of presidential communication and pandemic public health
We estimate the partial least squares structural equation model (PLS-SEM) shown in Figure 5.1 using SmartPLS software, version 4 (Ringle et al., 2022). The results provide strong evidence for a direct relationship between the Pandemic Public Health (PPH) construct and the Presidential Communication (PC) construct in both the November 2020 and December 2021 samples. As shown in Table 5.5, in both time samples the relationship between PPH and PC was statistically significant (p-values < 0.05). Interestingly, the results evidence a change in PC strategy. In 2020, every unit of increase in public health adversity (increases in PPH) was associated with a 0.184 unit increase in presidential communications on social media. In contrast, in 2021, PC activity remained virtually flat and turned slightly negative: every unit of PPH increase was associated with a 0.057 unit decline in PC.

Table 5.5  Direct association results

Hypothesis  Proposition                                              Direct Beta (2020)  Direct Beta (2021)  Hypothesis outcome
H1          Pandemic Public Health → Presidential Communication       0.184***           −0.057**            Supported
H2a         Presidential Communication → Social Media Polarization   −0.631***            0.001              Partially Supported
H2b         Presidential Communication → Public Anxiety               0.063              −0.024              Not Supported
H2c         Presidential Communication → Pandemic Human Mobility     −0.076*              0.341***           Supported
H2d         Presidential Communication → Public Trust                 0.424***           −0.001              Partially Supported
H2e         Presidential Communication → Economic Uncertainty         0.003              −0.024              Not Supported

Notes: * significant at 10%; ** significant at 5%; *** significant at 1%.
4.2.2 H2 direct association of presidential communication and endogenous system variables
Based on the results of the PLS-SEM analysis, we found mixed evidence regarding the direct association between the construct of PC and the five remaining endogenous constructs for the November 2020 and December 2021 samples (Table 5.5). First, the relationship between PC and SMP was partially supported, with evidence of a significant direct association in 2020 (path coefficient = −0.631, p-value < 0.01) but not in 2021 (path coefficient = 0.001, p-value > 0.1). Second, the relationship between PC and PA was not supported, with no evidence of a significant direct association in either 2020 or 2021 (path coefficients < 0.10, p-values > 0.1). Third, the relationship between PC and PHM was supported in both 2020 and 2021, with evidence of a significant direct association in both time samples (path coefficients = −0.076 in 2020 and 0.341 in 2021, with respective p-values < 0.1 and < 0.01). Fourth, the relationship between PC and PT was partially supported, with evidence of a significant direct association in 2020 (path coefficient = 0.424, p-value < 0.01) but not in 2021 (path coefficient = −0.001, p-value > 0.1). Finally, the relationship between PC and EU was not supported, with no evidence of a significant direct association in either 2020 or 2021 (path coefficients < 0.1, p-values > 0.1).
These findings suggest that the impact of PC on the various constructs varied across periods and contexts. Specifically, PC appears to have a stronger association with SMP and PT in 2020 than in 2021, whereas its association with PHM remains consistently strong across both time samples. The lack of a significant association with PA and EU suggests that other factors may be more important in driving these constructs.
4.2.3 H3 mediated associations of presidential communication
We tested the hypothesized mediated relationships using the Baron and Kenny (1986) approach, with and without mediators, via bootstrapping (5,000 samples with a 95 percent bias-corrected confidence level). Our results (Table 5.6) suggest that the association between Pandemic Public Health and Presidential Communication is partially mediated by Social Media Polarization and Public Trust in 2020 but not in 2021. No mediation was found for the association between Pandemic Public Health and Presidential Communication with Public Anxiety, Pandemic Human Mobility, and Economic Uncertainty in either 2020 or 2021.
4.2.4 H4 moderated association of presidential communication
We investigated whether higher or lower levels of Presidential Communication moderated four different relationships: (1) between Public Anxiety and Pandemic Human Mobility (H4a), (2) between Pandemic Human Mobility and Public Trust (H4b), (3) between Pandemic Human Mobility and Economic Uncertainty (H4c), and (4) between Public Trust and Economic Uncertainty (H4d). We tested these moderation hypotheses via PLS-SEM with bootstrapping (5,000 samples with a 95 percent bias-corrected confidence level) and conducted simple slope analyses to examine the nature of the moderation further. Our findings support hypotheses H4a, H4b, and H4d, but not H4c (Table 5.7).
Simple slope analysis is a method commonly used to examine moderation effects in the context of structural equation modeling (SEM) and PLS-SEM. According to Hayes (2017), moderation occurs when the relationship between two variables changes as a function of a third variable.
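Before turning to the slope analyses, note that both the mediation (H3) and moderation (H4) tests rest on bootstrapped estimates. The actual estimates come from SmartPLS 4, but the resampling logic behind a 5,000-sample bias-corrected bootstrap of an indirect effect can be sketched in a few lines; the code below is an illustrative ordinary-least-squares analogue (not PLS-SEM), and the inputs are assumed to be numpy arrays.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def indirect_effect(x, m, y):
    a = sm.OLS(m, sm.add_constant(x)).fit().params[1]   # X -> M path
    b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit().params[2]  # M -> Y, controlling X
    return a * b

def bc_bootstrap_ci(x, m, y, n_boot=5000, conf=0.95, seed=0):
    """Bias-corrected bootstrap CI for the indirect effect a*b."""
    rng = np.random.default_rng(seed)
    theta = indirect_effect(x, m, y)
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)           # resample cases with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx])
    z0 = norm.ppf((boots < theta).mean())      # bias-correction constant
    z = norm.ppf(0.5 + conf / 2)
    lo, hi = norm.cdf(2 * z0 - z), norm.cdf(2 * z0 + z)
    return theta, np.quantile(boots, [lo, hi])
```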
Simple slope analysis can explore the relationship between two variables at different levels of the moderating variable (Preacher et al., 2006).
Table 5.6  Mediation results

Hypothesis  Proposition       Mediation observed (2020)  Mediation observed (2021)  Hypothesis outcome
H3a         PPH → PC → SMP    Partial mediation          No mediation               Supported
H3b         PPH → PC → PA     No mediation               No mediation               Not supported
H3c         PPH → PC → PHM    No mediation               No mediation               Not supported
H3d         PPH → PC → PT     Partial mediation          No mediation               Supported
H3e         PPH → PC → EU     No mediation               No mediation               Not supported

Notes: * significant at 10%; ** significant at 5%; *** significant at 1%. Effect size categorizations follow Cohen (1992): f2 = 0.02 (small), f2 = 0.15 (medium), f2 = 0.35 (large).
Table 5.7  Moderation results

Hypothesis  Proposition                                                                   Direct Beta (2020)  Direct Beta (2021)  Hypothesis outcome
H4a         Presidential Communication × Public Anxiety → Pandemic Human Mobility         0.080**             0.000              Partially Supported
H4b         Presidential Communication × Public Trust → Pandemic Human Mobility          −0.119**            −0.063*             Supported
H4c         Presidential Communication × Economic Uncertainty → Pandemic Human Mobility  −0.009              −0.019              Not Supported
H4d         Presidential Communication × Economic Uncertainty → Public Trust              0.003               0.030              Partially Supported

Notes: * significant at 10%; ** significant at 5%; *** significant at 1%.
The appropriateness of using simple slope analysis to examine moderation in SEM and PLS-SEM depends on the research question and the nature of the data being analyzed. As Hayes (2017) suggested, when the path coefficients are small and the p-values are not statistically significant, this may indicate that the independent variable has little or no direct effect on the dependent variable. However, if the simple slope analysis shows clear moderation effects, this suggests that the relationship between the independent and dependent variables varies depending on the value of the moderating variable. In this case, the simple slope analysis should be interpreted as evidence of moderation, even if the path coefficients are small and the p-values are not statistically significant. The size of the moderation effect can be estimated by calculating the differences in the slopes of the simple regression lines at different levels of the moderating variable (Preacher et al., 2006).
H4a: Presidential Communication partially moderates the association of Public Anxiety with Pandemic Human Mobility, as the interaction was significant in 2020 (β = 0.08, p < 0.05) but not in 2021 (β < 0.05, p > 0.1). The slope analysis revealed that at lower levels of Public Anxiety, a negative 1 SD shock in Presidential Communication amplifies Pandemic Human Mobility, while a positive 1 SD shock significantly dampens it. At higher levels of Public Anxiety, Presidential Communication does not affect Pandemic Human Mobility.
H4b: Presidential Communication moderates the association of Public Trust with Pandemic Human Mobility, as the interaction was significant in both 2020 (β = −0.119, p < 0.05) and 2021 (β = −0.063, p < 0.1). The slope analysis for the 2020 sample showed that at lower levels of Public Trust, a positive 1 SD shock in Presidential Communication significantly amplifies Pandemic Human Mobility, while a negative 1 SD shock significantly dampens it. At higher levels of Public Trust, the effect is reversed. The slope analysis for the 2021 sample showed that across all levels of Public Trust, a positive 1 SD shock in Presidential Communication significantly dampens Pandemic Human Mobility, while a negative 1 SD shock significantly amplifies it.
H4c: On the other hand, our results do not support hypothesis H4c, which proposes that Presidential Communication moderates the association of Pandemic Human Mobility with Economic Uncertainty. The interaction between Presidential Communication and Economic Uncertainty was not significant in either 2020 or 2021, and the slope analysis did not reveal any significant relationship between the two constructs.
H4d: Lastly, our results partially support hypothesis H4d, which proposes that Presidential Communication moderates the relationship between Public Trust and Economic Uncertainty. The interaction between Presidential Communication and Economic Uncertainty partially supported the hypothesis based on the slope analysis results in 2021. The slope analysis showed that at high levels of Economic Uncertainty, a positive 1 SD shock in Presidential Communication leads to higher Public Trust, while a negative 1 SD shock leads to lower Public Trust. At lower levels of Economic Uncertainty, the effect is reversed.
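As a companion to these results, a minimal simple-slope probe in the spirit of Preacher et al. (2006) can be written as follows; the OLS interaction model and the variable names are illustrative stand-ins for the PLS-SEM implementation, with the ±1 SD levels matching the shocks discussed above.

```python
import numpy as np
import statsmodels.api as sm

def simple_slopes(pc, mod, phm):
    """Regress PHM on PC, a moderator (e.g., Public Trust), and their
    interaction (all standardized), then report the PC slope at
    low/mean/high moderator levels."""
    z = lambda v: (v - v.mean()) / v.std()
    pc, mod, phm = z(pc), z(mod), z(phm)
    X = sm.add_constant(np.column_stack([pc, mod, pc * mod]))
    fit = sm.OLS(phm, X).fit()
    b_pc, b_int = fit.params[1], fit.params[3]
    # Slope of PC at moderator value w is b_pc + b_int * w
    return {level: b_pc + b_int * w for level, w in
            [("-1 SD", -1.0), ("mean", 0.0), ("+1 SD", 1.0)]}
```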
5. DISCUSSION

The results of the dimensionality reduction methods show interesting differences between states. Dimensionality reduction run individually for each state shows that the polarizing components are unique, with the most common loadings appearing only three times. This finding reinforces the notion that different populations react nonuniformly to crises. In Figure 5.3, we showed the number of relevant components by state and that the number of polarized components also varies from state to state. Even with a lack of shared components, some variables appeared more frequently than others: the five most common variables with high loadings were joy, affective, tone, health, and social in November 2020, and anticipatory, joy, negative, anger, and fear in December 2021. Notably, the variables change between the two months, reflecting the public’s attitudes towards the shifting state of the pandemic. Going forward, collecting and computing a more extensive dataset consisting of all dates between the two sample months could reveal interesting changes in these measures and provide more data for our conceptual framework.
After the preliminary state-by-state analysis, we used the dimensionality reduction methods to extract common components across the states. This approach resulted in varying amounts of variance explained, with the filtered PCA components explaining approximately half of the variance and the EFA factors explaining less. As a preliminary step, we compared the mean polarization component values to the other constructs in our conceptual framework,
which provided some interesting results regarding the strengths and signs of the correlations. The most notable sign change involves Pandemic Human Mobility, which could be explained by the easing of mobility restrictions in states with lower polarization. A similar interaction could be speculated for the Economic Uncertainty construct. The strongest correlations in our preliminary analysis, with a single monthly value for each state, hint that a relationship between them may exist. However, a more exhaustive analysis, variable by variable, would be warranted to explain these correlations fully. The constructs extracted from unstructured social media text can overlap in methodology, and thus analyzing independent text corpora should be pursued in future work.
The results of this study highlight the complexity of the relationships between Presidential Communication and the constructs it interacts with in the context of a public health crisis. The study found strong evidence to support a direct association between the constructs of PPH and PC across the November 2020 and December 2021 samples. Interestingly, the study also found that the relationship between Presidential Communication and Pandemic Public Health differed between the two time samples. The study also found mixed evidence regarding the direct association between Presidential Communication and the other endogenous constructs, including Social Media Polarization, Public Anxiety, Pandemic Human Mobility, Public Trust, and Economic Uncertainty. These findings suggest that the impact of Presidential Communication on the various constructs varies across periods and contexts.
Furthermore, the study tested the hypothesized mediated relationships using the Baron and Kenny (1986) approach via bootstrapping with and without mediators. The results suggest that the association between Pandemic Public Health and Presidential Communication is partially mediated by Social Media Polarization and Public Trust in 2020 but not in 2021. No mediation was found for the association between Pandemic Public Health and Presidential Communication with Public Anxiety, Pandemic Human Mobility, and Economic Uncertainty in 2020 and 2021. The study’s findings show the importance of taking a nuanced approach to analyzing Presidential Communication in public health crises, as different constructs interact differently in different contexts. Overall, the results highlight the need for a deeper understanding of the complex relationships between Presidential Communication and the various constructs it interacts with to better inform public health policies and practices.
It should be noted that our methodology and implementation of the constructs have limitations related to coverage and comparability. The first and most apparent limitation in interpreting the results is the limited scope of the data. In the Twitter datasets, we limited the processed tweets to original tweets from users declaring to be located in a particular state, due to data availability. This choice reduces the amount of data analyzed by an order of magnitude, as retweeting accounts for many of the interactions on the platform. Thus, there remains work to be done in improving the coverage of tweets to include retweets, constructing more extensive time series, and investigating changes in the dimensions of emotional polarization throughout the pandemic.
This expansion would also allow us to compare the polarization during different stages of the pandemic (waves, vaccination, restrictions, and viral variants) in different states. A similar task remains in extending the scope of government communications by including additional sources, such as local decision-makers and organizations.
The second categorical limitation arises from the assumptions made for capturing Social Media Polarization. Our proposed method differs from the methods used in many polarization-related studies. Extensive work has been done, with robust results and measures, in structural polarization (i.e., the network aspect of users forming communities and echo chambers on social media platforms). Our method of capturing emotional polarization assumes that people showing highly different emotions towards a single topic are polarized. This approach relies on the empirical distributions in terms of dispersion and bimodality. Our measure of bimodality uses Sarle’s bimodality coefficient (Sarle, 1983), which has limitations in accuracy and suitability for certain distributions; for instance, it can misclassify certain strictly unimodal distributions (Knapp, 2007; Tarbă et al., 2022).
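For reference, Sarle’s bimodality coefficient is straightforward to compute from sample skewness and excess kurtosis; values above 5/9 are conventionally read as evidence of bimodality. A minimal sketch:

```python
from scipy.stats import skew, kurtosis

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient with the finite-sample correction;
    BC > 5/9 (about 0.555) is the usual bimodality flag."""
    n = len(x)
    g1 = skew(x)       # sample skewness
    g2 = kurtosis(x)   # excess kurtosis
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (g1 ** 2 + 1) / (g2 + correction)
```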
6. CONCLUSION

In this study, we proposed a novel conceptual framework for investigating the effects of government crisis communications on public polarization, anxiety, trust, economic uncertainty, and health-related outcomes of the pandemic. We measured Social Media Polarization in terms of emotional polarization by extracting a set of sentiment and topical dimensions from localized tweets and calculating state-specific dispersion and bimodality. We also presented a bag-of-words method with novel dictionaries and dimensions for measuring public trust from online social media. Our results show that the most polarizing dimensions differ between states. The extracted general polarization has a low correlation with pandemic outcomes, pandemic human mobility, and public anxiety.
Applying the measures to our conceptual framework showed that the results of this study have important implications for public health crises. Our findings suggest that the clarity of government communication significantly impacts social media polarization, public anxiety, and human mobility. Clear communication can increase public trust and compliance, improving control over the spread of the disease and reducing economic uncertainty. Conversely, unclear communication can amplify communication polarity and public anxiety, lower public trust and compliance, and increase disease spread and economic uncertainty.
The study also highlights the complex role of presidential communications within the context of a public health crisis. We found that the impact of presidential communications varies across periods and contexts. Our results suggest that presidential communications can significantly influence the progression of a health crisis through their direct and mediated effects on emotional polarization, public trust, and human mobility, and their moderated effects on public trust in an uncertain economic environment.
In conclusion, our study contributes to a better understanding of the role of government communication in shaping public attitudes and behaviors during public health crises. Our results suggest that policymakers should prioritize clear communication and transparency to increase public trust and compliance. Additionally, policymakers should be aware that the impact of communication on public attitudes and behaviors can vary across periods and contexts. Future research should explore the effectiveness of communication strategies in shaping public attitudes and behaviors during public health crises and develop more targeted communication strategies that account for the complex relationships between communication, public attitudes, and behaviors.
ACKNOWLEDGMENTS

We thank our research assistants, Chase Zhang and Ryan Oet, for their vital contributions to this study’s conceptual and empirical aspects. We also thank the participants of the 10th International Conference on Collaborative Innovation Networks (COINs), particularly Peter Gloor, Francesca Grippa, Julia Gluesing, Ken Riopelle, Richard B. Freeman, and Aleksandra Przegalinska, for their constructive critiques and suggestions. Furthermore, we acknowledge Professor Andrea Fronzetti Colladon from the University of Perugia for his invaluable mentorship on the Semantic Brand Score (SBS) software, which facilitated our research. We thank the Northeastern University Diplomacy Lab for its support and assistance as part of the Department of State’s Diplomacy Lab academic research initiative. We also gratefully recognize the financial support that Tuomas Takko received from the Vilho, Yrjö, and Kalle Väisälä Foundation of the Finnish Academy of Science and Letters. Finally, we express our thanks for the funding support provided by the Northeastern University Office of the Provost.
REFERENCES

Alaszewski, A., & Coxon, K. (2009). Uncertainty in everyday life: risk, worry and trust. Health, Risk & Society, 11(3), 201–7. https://doi.org/10.1080/13698570902906454.
Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics, 131(4), 1593–636.
Baker, S. R., Bloom, N., Davis, S. J., Kost, K., Sammon, M., & Viratyosin, T. (2020). The unprecedented stock market reaction to COVID-19. The Review of Asset Pricing Studies, 10(4), 742–58.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–82.
Bertot, J. C., Jaeger, P. T., & Hansen, D. (2012). The impact of policies on government social media usage: issues, challenges, and recommendations. Government Information Quarterly, 29(1), 30–40.
Bollen, J., Mao, H., & Pepe, A. (2011). Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, July. https://doi.org/10.48550/arXiv.0911.1583.
Bonavita, I., & Maitland, E. (2021). A trust-motivated framework for assessing governments engagement with citizens on social media. Workshop Proceedings, SocialSens 2021. https://doi.org/10.36190/2021.33.
Bonneux, L., & Van Damme, W. (2006). An iatrogenic pandemic of panic. British Medical Journal, 332(7544), 786–8.
Centers for Disease Control and Prevention (2023). COVID data tracker. Atlanta, GA: US Department of Health and Human Services, CDC. https://covid.cdc.gov/covid-data-tracker (accessed 7 March 2023).
Chapin, C., & Roy, S. S. (2021). A spatial web application to explore the interactions between human mobility, government policies, and COVID-19 cases. Journal of Geovisualization and Spatial Analysis, 5(12), 1–8.
Claeys, A. S., & De Waele, A. (2022). From message to messenger: should politicians lead-by-example to increase compliance with public health directives? Health Communication, 38(14), 3393–408. https://doi.org/10.1080/10410236.2022.2150806.
Cohen, J. (1992). Quantitative methods in psychology: a power primer. Psychological Bulletin, 112(1), 155–9.
Conover, M. D., Ratkiewicz, J., Francisco, M., Gonçalves, B., Menczer, F., & Flammini, A. (2011). Political polarization on Twitter. Fifth International AAAI Conference on Weblogs and Social Media, July. https://ojs.aaai.org/index.php/ICWSM/article/view/14126.
DiMaggio, P., Evans, J., & Bryson, B. (1996). Have Americans’ social attitudes become more polarized? American Journal of Sociology, 102(3), 690–755.
Ding, H. (2009). Rhetorics of social media in an emerging epidemic: SARS, censorship, and extra-institutional risk communication. Technical Communication Quarterly, 18(4), 327–50.
Ding, H. (2014). Rhetoric of a Global Epidemic: Transcultural Communication about SARS. SIU Press.
Dong, E., Du, H., & Gardner, L. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases, 20(5), 533–4.
Fronzetti Colladon, A. (2018). The semantic brand score. Journal of Business Research, 88, 150–60.
Fronzetti Colladon, A., & Grippa, F. (2020). Brand intelligence analytics. In A. Przegalinska, F. Grippa, & P. A. Gloor (eds), Digital Transformation of Collaboration (pp. 125–41). Springer Nature.
Glass, T. A., & Schoch-Spana, M. (2002). Bioterrorism and the people: how to vaccinate a city against panic. Clinical Infectious Diseases, 34(2), 217–23.
Goffman, W., & Newill, V. A. (1964). Generalization of epidemic theory: an application to the transmission of ideas. Nature, 204(4955), 225–8.
Goffman, W., & Newill, V. A. (1967). Communication and epidemic processes. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences, 298(1454), 316–34.
Google LLC (n.d.). Google COVID-19 community mobility reports. https://www.google.com/COVID19/mobility/ (accessed 7 March 2023).
Grimmelikhuijsen, S., & Knies, E. (2017). Validating a scale for citizen trust in government organizations. International Review of Administrative Sciences, 83(3), 583–601.
Hayes, A. F. (2017). Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach (2nd edn). The Guilford Press.
Hong, S., & Kim, S. H. (2016). Political polarization on Twitter: implications for the use of social media in digital governments. Government Information Quarterly, 33(4), 777–82.
Hu, T., Wang, S., She, B., Zhang, M., Huang, X., Cui, Y., … & Li, Z. (2021). Human mobility data in the COVID-19 pandemic: characteristics, applications, and challenges. International Journal of Digital Earth, 14(9), 1126–47.
Huang, X., Li, Z., Jiang, Y., Li, X., & Porter, D. (2020). Twitter reveals human mobility dynamics during the COVID-19 pandemic. PloS One, 15(11), e0241957.
Huynh, N., Dao, A., & Nguyen, D. (2021). Openness, economic uncertainty, government responses, and international financial market performance during the coronavirus pandemic. Journal of Behavioral and Experimental Finance, 31, 100536.
Hyland-Wood, B., Gardner, J., Leask, J., & Ecker, U. K. (2021). Toward effective government communication strategies in the era of COVID-19. Humanities and Social Sciences Communications, 8(1), 1–11. https://doi.org/10.1057/s41599-020-00701-w.
Imran, M., Qazi, U., & Ofli, F. (2022). TBCOV: two billion multilingual COVID-19 tweets with sentiment, entity, geo, and gender labels. Data, 7(1), 1–27.
Jiang, J., Chen, E., Yan, S., Lerman, K., & Ferrara, E. (2020). Political polarization drives online conversations about COVID‐19 in the United States. Human Behavior and Emerging Technologies, 2(3), 200–211.
Kim, D. K. D., & Kreps, G. L. (2020). An analysis of government communication in the United States during the COVID‐19 pandemic: recommendations for effective government health risk communication. World Medical & Health Policy, 12(4), 398–412.
Knapp, T. R. (2007). Bimodality revisited. Journal of Modern Applied Statistical Methods, 6(1), 8–20.
Leiss, W., & Powell, D. (1997). Mad Cows and Mother’s Milk: The Perils of Poor Risk Communication. McGill-Queen’s University Press.
Levitt, E. E., Gohari, M. R., Syan, S. K., Belisario, K., Gillard, J., DeJesus, J., … MacKillop, J. (2022). Public health guideline compliance and perceived government effectiveness during the COVID-19 pandemic in Canada: findings from a longitudinal cohort study. The Lancet Regional Health – Americas, 9(May), 1–11, 100185.
Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50(4), 370–96.
Meredith, L. S., Eisenman, D. P., Rhodes, H., Ryan, G., & Long, A. (2007). Trust influences response to public health messages during a bioterrorist event. Journal of Health Communication, 12(3), 217–32.
Mergel, I. (2013). Social media adoption and resulting tactics in the U.S. federal government. Government Information Quarterly, 30(2), 123–30.
Mheidly, N., & Fares, J. (2020). Leveraging media and health communication strategies to overcome the COVID-19 infodemic. Journal of Public Health Policy, 41(4), 410–20.
Mohammad, S. M., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada, 2, 1–234. https://doi.org/10.4224/21270984.
Musolff, A., Breeze, R., Kondo, K., & Vilar-Lluch, S. (eds) (2022). Pandemic and Crisis Discourse: Communicating COVID-19 and Public Health Strategy. Bloomsbury.
Olson, R. S., & Gawronski, V. T. (2010). From disaster event to political crisis: a “5C+A” framework for analysis. International Studies Perspectives, 11(3), 205–21.
Park, M. J., Kang, D., Rho, J. J., & Lee, D. H. (2016). Policy role of social media in developing public trust: Twitter communication with government leaders. Public Management Review, 18(9), 1265–88.
Patterson, O., Weil, F., & Patel, K. (2010). The role of community in disaster response: conceptual models. Population Research and Policy Review, 29(2), 127–41.
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. https://www.liwc.app/static/documents/LIWC-22%20Manual%20-%20Development%20and%20Psychometrics.pdf (accessed 7 March 2023).
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interaction effects in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31(4), 437–48.
Qazi, U., Imran, M., & Ofli, F. (2020). GeoCoV19: a dataset of hundreds of millions of multilingual COVID-19 tweets with location information. SIGSPATIAL Special, 12(1), 6–15.
Quinn, S. C., Parmer, J., Freimuth, V. S., Hilyard, K. M., Musa, D., & Kim, K. H. (2013). Exploring communication, trust in government, and vaccination intention later in the 2009 H1N1 pandemic: results of a national survey. Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science, 11(2), 96–106.
Renn, O., & Levine, D. (1991). Credibility and Trust in Risk Communication. Springer.
Reynolds, B., & Quinn, S. C. (2008). Effective communication during an influenza pandemic: the value of using a crisis and emergency risk communication framework. Health Promotion Practice, 9(4_suppl.), 13S–17S.
Ringle, C. M., Wende, S., & Becker, J.-M. (2022). SmartPLS 4. Oststeinbek: SmartPLS. https://www.smartpls.com (accessed 7 March 2023).
Sandman, P. M. (1994). Mass media and environmental risk: seven principles. Risk, 5(3), 251–60.
Sarle, W. S. (1983). The Cubic Clustering Criterion. SAS Technical Report A-108. Cary, NC: SAS Institute Inc.
Skoric, M. M., Zhu, Q., Goh, D., & Pang, N. (2016). Social media and citizen engagement: a meta-analytic review. New Media & Society, 18(9), 1817–39.
Song, C., & Lee, J. (2016). Citizens’ use of social media in government, perceived transparency, and trust in government. Public Performance & Management Review, 39(2), 430–53.
Tarbă, N., Voncilă, M. L., & Boiangiu, C. A. (2022). On generalizing Sarle’s bimodality coefficient as a path towards a newly composite bimodality coefficient. Mathematics, 10(7), 1042.
Ungar, S. (1998). Hot crises and media reassurance: a comparison of emerging diseases and Ebola Zaire. British Journal of Sociology, 49(1), 36–56.
Ungar, S. (2008). Global bird flu communication: hot crisis and media reassurance. Science Communication, 29(4), 472–97.
Van Bavel, J. J., Baicker, K., Boggio, P. S., Capraro, V., Cichocka, A., Cikara, M., … Drury, J. (2020). Using social and behavioral science to support COVID-19 pandemic response. Nature Human Behaviour, 4(5), 460–71.
Van Damme, W., & Van Lerberghe, W. (2000). Epidemics and fear. Tropical Medicine and International Health, 5(8), 511–14.
Van Scoy, L. J., Snyder, B., Miller, E. L., Toyobo, O., Grewel, A., Ha, G., … Lennon, R. P. (2021). Public anxiety and distrust due to perceived politicization and media sensationalism during early COVID-19 media messaging. Journal of Communication in Healthcare, 14(3), 193–205.
West, D. M. (2005). Digital Government: Technology and Public Sector Performance. Princeton University Press.
Xu, D., Li, J. Y., & Lee, Y. (2022). Predicting publics’ compliance with containment measures at the early stages of COVID-19: the role of governmental transparent communication and public cynicism. International Journal of Strategic Communication, 16(3), 364–85.
Young, S. (2022). Rituals, reassurance, and compliance: government communication in Australia during the COVID-19 pandemic. In P. J. Maarek (ed.), Manufacturing Government Communication on Covid-19: A Comparative Perspective (pp. 147–74). Springer International.
6. COVID-19 Twitter discussions in social media: disinformation, topical complexity, and health impacts

Mikhail Oet, Xiaomu Zhou, Kuiming Zhao and Tuomas Takko
1. INTRODUCTION

Have you heard that sunlight and disinfectant water can cure COVID-19? The pandemic infodemic has made rumors viral, exacerbated by the spread of disinformation.1 Social media has become an essential source of information for millions seeking answers about the novel virus, its symptoms, and ways to stay safe (Cinelli et al., 2020). Over 80 percent of Americans use their social media networks to obtain news (Shearer, 2021), and social media significantly influences public sentiments (Auxier, 2020). Unfortunately, much online information was misleading, causing confusion, fear, and harm. A recent Northwestern University study (Kulke, 2020) found that people who get news from social media are more likely to believe coronavirus misinformation. During a White House press briefing in April 2020, the president mused about using disinfectants to treat the virus; a subsequent Centers for Disease Control and Prevention (CDC) survey found that a third of respondents had consumed diluted bleach solutions, soapy water, and other disinfectants to protect themselves from the coronavirus (Centers for Disease Control and Prevention, 2020). As people increasingly rely on social media for pandemic information, distinguishing fact from fiction becomes critical for personal health and safety.
The rise of digital media has created a new landscape for disinformation spread by bots, trolls, and others (Howard, 2020). For instance, Benson (2020) finds Twitter (later rebranded as X) bots responsible for the massive spread of disinformation by leading users to low-credibility resources, complicating the differentiation of truth, and exacerbating public health risks. Furthermore, the infodemic (Jones, 2020) resulted in health and safety decisions based on false information, furthering the spread of the virus and putting people at risk. Disinformation influenced the public to adopt dangerous health behaviors and mistrust essential health information, leading to poorer health outcomes. A 2021 study by the Center for Countering Digital Hate found that 12 individuals were responsible for 65 percent of anti-vaccine hoaxes on social media (Nogara et al., 2022).

1.1 Research Questions and Claims
This study investigates the topical complexity of COVID-19 discussions on social media, disinformation detection, its pandemic health impact, and the role of bots in spreading COVID-19 disinformation. The study answers four primary research questions:
RQ1. What are the prevalent COVID-19-related disinformation themes on social media?
RQ2. What are the factors and indicators of social media disinformation?
RQ3. What are the effects of disinformation on public health?
RQ4. What is the relative role of bots versus human accounts in generating and spreading disinformation?

The study makes four claims:

Claim 1: The topical complexity in social media COVID-19 discussions grows over time, partly due to two phenomena: the spread of disinformation and the growing resilience of the public to known disinformation.
Claim 2: A disinformation measurement that is grounded in theory and supported by empirical evidence can be accurately assessed in a linear time frame with minimal human involvement.2
Claim 3: The relationship between disinformation and negative health impact during the COVID-19 pandemic is greater in 2021 than in 2020.
Claim 4: Bots play a significant role in spreading COVID-19-related disinformation on social media.

This chapter is structured into six sections, beginning with an introduction (Section 1) that outlines the significance of the study’s focus. In Section 2, the conceptual framework, we discuss the connection of this study to communication theories of disinformation. The framework provides a comprehensive understanding of disinformation and its impact on public health. We explore the relevant literature and existing frameworks to develop a novel approach to measuring disinformation on social media. This approach operationalizes the measurement of social media disinformation by constructing the Probability of Social Media Disinformation, including theoretical indicators of manipulation and deception and empirical indicators informed by communication theories, and applying a falsifiable quantitative strategy to assign weights to the indicators, resulting in a rigorous validation process. This framework forms the basis for the subsequent analysis of COVID-19 discussions on social media.
We then describe our methodology in Section 3, which employs a variety of computational techniques, such as text analysis, sentiment analysis, and network analysis, to comprehensively understand the topical complexity of COVID-19 discussions on social media and identify the most influential topics and actors. For the data analysis, we use U.S.-based Twitter data collected during specific periods (November 2020 and December 2021), utilizing COVID-19-related tweets and additional datasets from Johns Hopkins University, the New York Times, and the COVID States University Consortium Project. We use natural language processing techniques and psychometric algorithms to structure the textual data and extract reliable discourse topics, which are validated through triangulation among multiple methods of analysis and reliability statistics.
Section 4 presents our findings regarding the relationship between disinformation and negative health impacts during the COVID-19 pandemic. We compare the average disinformation level against a state-level negative health impact metric to determine if the relationship between the two has increased over time. We use descriptive and inferential statistics to present our results. Section 5 provides a discussion and interpretation of our results, comparing them with existing literature and evaluating the implications and limitations of our findings. Finally, in Section 6, we summarize the contributions of this study and provide recommendations for future research.
2. CONCEPTUAL FRAMEWORK
2.1 Definitions

2.1.1 Disinformation
Disinformation is the “deliberate creation and sharing of information known to be false” (Bakir & McStay, 2018, p. 159). Disinformation can be spread through various mediums, including social media, traditional media, and online forums, and it can have significant political and social consequences (Woolley & Howard, 2016). According to Woolley and Howard (2018), disinformation can be categorized into three types: misinformation, where information is shared without malicious intent but is inaccurate; propaganda, where information is shared with the intent to manipulate or deceive; and strategic communication, where data is shared with the intent to influence a specific audience. As disinformation can harm democracy, public health, and social cohesion (Lazer et al., 2018), it is crucial for individuals to be vigilant and critical of the information they encounter and for policymakers to develop strategies to combat the spread of disinformation (Fletcher et al., 2018).

2.1.2 Health impacts
Public health impacts refer to the effects of disease on the health of populations (Kass, 2001). These impacts are often measured using various metrics, including deaths per capita, cases per capita, and cases per test. Deaths per capita measure mortality from a particular disease or condition per population unit. Cases per capita measure the number of confirmed cases of a specific illness or condition per population unit (Bonita et al., 2006). During the COVID-19 pandemic, deaths per capita have been used to compare the impact of the disease across countries and to track the effectiveness of interventions, while cases per capita have been used to compare the prevalence of the disease across countries and to track the effectiveness of testing and contact tracing efforts (Heuveline & Tzen, 2021). Cases per test measure the positivity rate of COVID-19 tests, calculated as the number of positive test results divided by the total number of tests conducted (Johns Hopkins University, 2021). This metric is used to assess the accuracy and efficiency of testing programs and to monitor the spread of outbreaks.

2.1.3 Human mobility
Public mobility, also known as human mobility, is a construct that refers to the physical movement of people within and between geographic locations (González et al., 2008). Measuring public mobility has become increasingly important for researchers and policymakers, particularly in the context of public health (Chang et al., 2021) and COVID-19 transmission (Bergman & Fishman, 2020; Badr et al., 2020; Askitas et al., 2020; Sulyok & Walker, 2020; Sadowski et al., 2021; Kim & Kwan, 2021). By analyzing mobility patterns, researchers can better understand how the virus spreads between communities and develop more effective strategies for controlling its transmission. These studies highlight the role of human mobility in spreading the virus and the need for tailored interventions that consider regional mobility patterns. For example, Takko et al. (2023) developed a network model to simulate the exposure between populations based on their mobility patterns; they found that the activity-induced contact between people in different regions is correlated with government-imposed restrictions, indicating their effectiveness in reducing possible contact.
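These three health-impact metrics reduce to simple ratios once the underlying counts are available; a small sketch with assumed column names for the JHU/CDC extracts:

```python
import pandas as pd

def health_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """df columns (assumed): ['state', 'population', 'cases', 'deaths', 'tests']."""
    out = pd.DataFrame(index=df["state"])
    out["cases_per_capita"] = (df["cases"] / df["population"]).values
    out["deaths_per_capita"] = (df["deaths"] / df["population"]).values
    out["positivity"] = (df["cases"] / df["tests"]).values  # cases per test
    return out
```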
2.2 Research Questions

2.2.1 RQ1: what are the prevalent COVID-19-related disinformation themes on social media?
The relationship between the COVID-19 pandemic and false information has been studied in academia, demonstrating its importance in understanding the connection between social media disinformation and public health (Gallotti et al., 2020; Lee et al., 2022). Identifying prevalent COVID-19-related disinformation themes can provide insights into the strategies and motivations behind their spread, guiding public health interventions and communication strategies. However, there is disagreement among studies on the prevalence of COVID-19-related disinformation themes on social media. Although some studies propose that disinformation themes have a low prevalence and that accurate information is shared more frequently than inaccurate information (Kouzy et al., 2020), other studies suggest that widespread disinformation themes considerably affect public attitudes and behaviors, such as vaccine hesitancy and political polarization.
2.2.2 RQ2: what factors and indicators can reliably measure disinformation on social media?
While there is general agreement that disinformation is a problem on social media, there is an ongoing debate about how to identify and measure disinformation reliably and validly. Some researchers argue that traditional content analysis methods are not effective in identifying disinformation reliably (Zhou et al., 2004; Kumar & Geethakumari, 2014; Kim & Hastak, 2018; Roozenbeek & Van Der Linden, 2019; de Oliveira et al., 2021), while others argue that new machine learning and natural language processing techniques may be too complex and opaque to be reliable (Wardle & Derakhshan, 2017; Guo et al., 2022; Zhou et al., 2019).

2.2.3 RQ3: what are the effects of disinformation on public health?
Studying the effects of disinformation on public health is critical for protecting public health, but there is controversy regarding these effects. While some studies have found a negative impact of disinformation on public health (Cuan-Baltazar et al., 2020; Pennycook et al., 2020), others suggest that the relationship between disinformation and public health is complex and may be influenced by factors such as political ideology, individual susceptibility, and media literacy (Roozenbeek et al., 2020). For example, Cuan-Baltazar et al. (2020) found that COVID-19 misinformation was associated with lower knowledge and adoption of preventive behaviors, while Pennycook et al. (2020) found that exposure to COVID-19 misinformation was associated with lower compliance with public health guidelines. Roozenbeek et al. (2020) found that political ideology may influence how individuals process health-related misinformation.

2.2.4 RQ4: what is the relative role of bots versus human accounts in generating and spreading disinformation?
Understanding the relative role of bots versus human accounts in spreading disinformation is crucial for designing effective interventions to combat this problem. While considerable evidence highlights that bots can be crucial in spreading disinformation, the relative importance of bots versus human accounts has yet to be determined. Some studies suggest that bots play a significant role in spreading disinformation on social media (Bessi & Ferrara, 2016; Shao et al., 2018), while others suggest that the role of bots may be overstated (Howard et al., 2018; Starbird, 2019; Nogara et al., 2022). For example, Bessi and Ferrara (2016) find that bots spread many links to conspiracy theories and false information on Twitter. Similarly, Shao et al. (2018) find that bots were crucial in promoting content related to the 2016 U.S. presidential election. However, Howard et al. (2018) found that human accounts spread most of the Twitter content related to the 2016 U.S. presidential election. Starbird (2019) argues that “effective disinformation campaigns involve diverse participants” (p. 449), including state-sponsored actors, bots, and individual users, who create and disseminate disinformation. Starbird et al. (2019) provide several case studies of strategic information operations (SIOs) to illustrate how different actors collaborate to develop and spread disinformation.
2.3 Conceptual Framework

Figure 6.1 depicts the conceptual framework that guides this research. The framework examines the interaction between social media disinformation, social media topical complexity, and the health impacts of the pandemic. Alternative nomological formulations may optimally operationalize these constructs. For example, disinformation can be identified as a multiple indicators multiple causes (MIMIC) latent factor if it is configurally and metrically invariant across various groups (Figure 6.1a). Alternatively, suppose the structure of the disinformation measurement is only configurally invariant. In that case, distinct groups interpret disinformation differently, leading to different health impacts (Figure 6.1b). It is also possible that the concept of social media disinformation lacks a common configural pattern across distinct groups. In this case, individual path relationships between elements of social media discourse and their impacts on human mobility and health must be considered. Figure 6.1c depicts the complex relationship between social media disinformation, social media topical complexity, and health impacts, which can differ across time, states, and social media topics.
2.4 Hypothesis Development

This section develops a model to test the relationship between social media disinformation and pandemic health impacts. We propose two testable hypotheses to address our research questions: a measurement hypothesis (H1) and a structural hypothesis (H2). The model incorporates disinformation as a mediator of the effects of emotional contagion, emotional arousal, positive affect, linguistic complexity, and violation of expectation on pandemic health impacts, measured by disease impact and human mobility.
Figure 6.1a  Conceptual framework of disinformation measurement and impact analysis under configural and metric invariance
Figure 6.1b  Conceptual framework of disinformation measurement and impact analysis under configural invariance
Figure 6.1c  Conceptual framework of disinformation measurement and impact analysis under non-invariance
Source: Authors’ own.
2.4.1 Measurement hypothesis
To measure disinformation, we propose a multi-dimensional construct that includes emotional contagion, emotional arousal, positive affect, linguistic complexity, and violation of expectation. By combining these factors, we aim to capture the complex nature of disinformation and the various tactics that may be used to manipulate individuals. Ensuring this measurement model’s reliability, convergent validity, and discriminant validity is crucial for measuring disinformation accurately in different contexts. Moreover, the model’s invariance across different samples makes it a valuable tool for researchers and policymakers combating disinformation, enabling its use in various settings and facilitating the formulation of effective strategies across different populations and contexts.
Hypothesis 1 (H1). Disinformation can be measured as a latent factor of emotional contagion, emotional arousal, positive affect, linguistic complexity, and violation of expectation, such that each factor contributes to the overall construct of disinformation and achieves:
• 1(H1a). Reliability, such that each factor’s composite reliability (CR) exceeds 0.70, ensuring that the disinformation measurement is consistent and stable across multiple measurements.
• 1(H1b). Convergent validity, such that a unique one-to-one factor loading is shown by the factor components, with average variance extracted (AVE) exceeding 0.50 and CR exceeding AVE. This property confirms that each factor contributes to the overall construct of disinformation and that the factors measure the same underlying construct.
• 1(H1c). Discriminant validity, such that AVE exceeds maximum shared variance (MSV) for each factor, and AVE exceeds average shared variance (ASV). This property confirms that each factor is distinct from the others, measuring a different aspect of disinformation, and that there is no overlap between the factors.
• 1(H1d). The factor measurement model is invariant configurally or metrically at 95 percent confidence, such that there is no statistically significant difference in the factor construction between different samples, including those from different populations or contexts.
2.4.2 Structural hypotheses
Disinformation is expected to correlate positively with public mobility during the pandemic, potentially leading to risky behaviors and decreased adherence to public health guidelines. Accurate information, by contrast, promotes informed decision-making, potentially reducing public mobility and improving health outcomes. The spread of disinformation is also expected to increase adverse health impacts by reducing trust in public health messaging, promoting unproven treatments, and preventing individuals from following evidence-based recommendations. Given these considerations, we hypothesize that community mobility mediates the relationship between disinformation and pandemic health impacts: disinformation can lead to risky behaviors that increase community transmission, leading to adverse health impacts. To test this mediation hypothesis, we analyze disinformation's direct and indirect effects on pandemic health impacts, with community mobility as the mediator variable.

Hypothesis 2 (H2). At a 95 percent confidence level, the relationship between disinformation and negative health impacts during the pandemic is mediated by community mobility.
3. METHODOLOGY

We use an applied economics approach to analyze our social media datasets. We employ several methods to extract insights from the data (Table 6.1): basic text analysis, sentiment analysis, association mining, latent Dirichlet allocation (LDA), exploratory and confirmatory factor analysis (EFA and CFA), the disinformation index, network analysis, and community influence. Basic text and sentiment analysis are exploratory methods that improve our understanding of the text structure and provide insights into the narrative's tone, language, emotion, and other characteristics. EFA and CFA identify the most influential topics in the data, while association mining and LDA help to extract insights from the textual data. The disinformation index quantifies the probability that text contains disinformation on specific topics. Network analysis helps us analyze social structures through connections among social media actors. Finally, community influence identifies the right avenue of influence against a particular target through social media or traditional messaging.

Table 6.1 Methods

Method | Description
Basic Text Analysis | Preprocesses data for further analysis.
Sentiment Analysis | Discovers emotional tendencies in analyzed articles.
Association Mining | Extracts insights from textual data and performs topic modeling.
Latent Dirichlet Allocation (LDA) | A statistical method for unsupervised topic discovery that generates topics as unobserved groups of elements (e.g., words), where the same textual units can occur in different topics.
Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) | Statistical methods that allow for unsupervised topic discovery, identifying the most influential topics discussed in analyzed articles.
Disinformation Index | Quantifies the probability that text contains disinformation on specific topics.
Network Analysis | Analyzes social structures through connections among social media actors.
Community Influence | Identifies the right avenue of influence against a specific target (individual or group) through social media or traditional messaging.
3.1 Text Data Preprocessing
Textual data must be preprocessed using natural language processing (NLP) techniques before analysis (Figure 6.2). To preprocess our collected tweets, we utilized the R software packages "tm," "textclean," and "textstem." Finally, we used R's DocumentTermMatrix() function from the "tm" package to structure the textual data.
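The following minimal sketch illustrates such a preprocessing pipeline in R. It is a hedged illustration rather than our exact script: the sample tweets and the particular sequence of cleaning steps are assumptions, while the packages ("tm," "textclean," "textstem") and the DocumentTermMatrix() call are those named above.

```r
library(tm)
library(textclean)
library(textstem)

tweets <- c("Pfizer & Moderna understand COVID-19 as a NEW industry!!",
            "Wear a mask, wash your hands, and don't panic.")

# Normalize the raw strings before building the corpus
clean <- replace_url(tweets)                # strip links (textclean)
clean <- replace_contraction(clean)         # "don't" -> "do not"
clean <- lemmatize_strings(tolower(clean))  # lowercase and lemmatize (textstem)

# Apply standard tm transformations to the corpus
corpus <- VCorpus(VectorSource(clean))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Structure the cleaned text as a document-term matrix
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```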
3.2 Association Mining and Topic Modeling
In addition to LDA, we also used Factor Analysis (FA) for topic extraction. FA is a statistical method that reduces the number of observed and correlated variables to a smaller set of unobserved variables called factors. Using FA, we extracted latent topics from the structured document term matrix, providing a unique mapping of observed terms to extracted factors. This approach allowed for a reliable human interpretation of each factor and AI-based topic naming, addressing the limitations of LDA for interpretation.
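As a hedged illustration of this factor-based topic extraction, in the spirit of the manual R implementation (method 2 in Section 3.3), the sketch below runs the "psych" package's fa() on a toy document-term matrix and maps terms to topics after suppressing loadings below 0.3, the cutoff applied later in Section 4.1.1. The toy data and the number of factors are assumptions.

```r
library(psych)

set.seed(1)
# Toy document-term matrix standing in for the real one: 100 tweets by
# 12 terms, with two planted clusters of co-occurring terms
m <- matrix(rpois(100 * 12, 1), nrow = 100,
            dimnames = list(NULL, paste0("term", 1:12)))
m[, 1:4] <- m[, 1:4] + rpois(100, 2)   # cluster 1 co-occurs across tweets
m[, 5:8] <- m[, 5:8] + rpois(100, 2)   # cluster 2 co-occurs across tweets

# Extract latent topics as factors of the term co-occurrence structure
fit <- fa(m, nfactors = 2, rotate = "varimax", fm = "minres")

# Suppress loadings below 0.3 so each term maps to at most one topic
loads <- unclass(fit$loadings)
loads[abs(loads) < 0.3] <- 0
topic_terms <- apply(loads, 2, function(x) names(x[x > 0]))
topic_terms
```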
Figure 6.2 Text preprocessing algorithm

3.3 Triangulation Method of Validation
To validate our latent topic extraction results, we applied the triangulation method (Denzin, [1970] 2017), which involves comparing the results obtained by at least three FA implementations. We confirmed that the FA results are similar, demonstrating a consensus in extracting latent topics. The triangulation process also enables the selection of an optimal topic extraction. We extracted topics using four different FA methods: method 1 (manual FA algorithm using IBM SPSS software), method 2 (manual implementation of the FA algorithm using R software), method 3 (automated Non-negative Matrix Factorization FA algorithm (Lee & Seung, 1999, 2000) using Provalis Research WordStat software), and method 4 (automated FA algorithm using Provalis Research WordStat software).

3.4 Sampling

We use the Keyword Retrieve function in WordStat software to extract tweets and related data for specific topics. This extraction yields a multi-dimensional data frame of tweets, their terms, and user data, including keywords, keyword phrases, and case numbers. We utilize the case number to extract native and inferred text attributes for further analysis.
3.5 Disinformation Index
Our disinformation index uses a scientific model that combines theoretical and empirical validation to understand online disinformation and misinformation. While the precedent Global Disinformation Index (GDI, 2019) and NewsGuard ratings (Pratelli & Petrocchi, 2022) evaluate the underlying site's "organizational and journalistic standards," news domain ownership, funding sources, content moderation policies, editorial independence, and so on, we refine, extend, and support the measurement of disinformation constructs to focus on unstructured information, regardless of its origin. Disinformation is characterized by a deliberate intent to mislead (American Psychological Association, 2022). Therefore, to measure disinformation, we drew on communication theory to provide indicators of manipulation and deception, including verbal cues, persuasive language techniques, and emotional manipulation. These factors encompass the face validity of the disinformation construct, distinct from misinformation, which involves getting the facts wrong. Our goal is to develop an algorithmic measure of disinformation that minimizes human interaction, is automated, and can scale in real-time operations. Figure 6.3 presents a conceptual framework of the disinformation index grounded in empirical evidence and theoretical foundations.

Figure 6.3 The conceptual model for disinformation measurement

Notes: To measure disinformation, we have referred to various communication theories, including Grice's (1975) conversational communication in social situations, the Undeutsch hypothesis (Amado et al., 2015), reality monitoring theory (Johnson & Raye, 1981), four-factor theory (Zuckerman et al., 1981), and information manipulation theory (McCornack et al., 2014). Rastogi and Bansal (2022) suggest integrating disinformation measurement with normative influence theory (Deutsch & Gerard, 1955), availability cascade theory (Kuran & Sunstein, 1998), and social cognitive theory (Bandura, 2009).
3.6 Disinformation Constructs
Emotional Contagion refers to the spread of emotions from one person to another, which deceivers can use to elicit negative emotions, such as disgust, in their targets, making them more vulnerable to manipulation. Using informal language and first-person personal pronouns can create a sense of closeness or trust, further facilitating manipulation.

Emotional Arousal refers to the intensity of an emotional experience, which can be elicited through various means, including communication. Deceivers may use fear or sadness to create high emotional arousal in their targets, making it harder for them to think rationally or critically. A negative tone can also create tension or conflict, increasing emotional arousal.

Positive Affect refers to the experience of positive emotions, such as joy, which can be elicited through communication. Deceivers may use positive affect to create a sense of goodwill or positive association with their targets, increasing compliance. Invoking trust also enhances credibility, making the target more likely to believe the deceiver's claims.

Linguistic Complexity refers to the level of complexity or sophistication in the language used in communication. Deceivers may use linguistic complexity to create an impression of credibility or authority. Long sentences and less common words, known as hapaxes, can indicate higher linguistic complexity, and using "they" can suggest a nuanced understanding of the subject.

Violation of Expectation refers to the experience of surprise or confusion resulting from unexpected or unusual occurrences. Deceivers may use a violation of expectations to create a sense of disorientation or confusion, making it harder for targets to evaluate messages critically. Intense surprises can further enhance the sense of expectation violation.
3.7 Feature Selection for Empirical Disinformation Index
Constructing an influential multilevel disinformation factor relies on several operational decisions that qualitative, quantitative, or mixed methods considerations can support. First, we must choose among alternative higher-level factors that describe the dimensions of disinformation. Second, we must select correlated indicators that reflect the latent level-2 factors or select uncorrelated indicators that form these factors. Third, we must have a quantitative strategy for assigning weights. Without a classified sample in which the outcome variable (disinformation) is available, we analyze the model using unsupervised methods; when a classified sample is available, we can estimate optimal weights using classification models. This study considers that some indicators of manipulation, deception, and disinformation are more powerful and essential than others and therefore require higher weights, so it is important to quantify these indicators' importance in a replicable and falsifiable way.

We leverage the semantic brand score (SBS) platform (Fronzetti Colladon, 2018) to run a complete emotional analysis and to access a rich and customizable portfolio of lexical dimensions, including replications of Linguistic Inquiry and Word Count (LIWC) output (Pennebaker et al., 2015), the NRC (National Research Council Canada) lexical categories, recent contributions from other NLP researchers, and the ability to create customized linguistic measures of our own design.

Rastogi and Bansal (2022) suggest that prior manipulation and deception theories can be considered in two groups: news-related and social impact theories. News-related theories are well described by the constructs of "writing style," "more unique words," "sentiments," and "word counts." In comparison, social impact theories can be described by the three constructs of "likes," "retweets," and "popularity." Following Rastogi and Bansal's analysis, we focus on the news-related constructs for our study. Based on their insights, we select linguistic features of "writing style," "complexity," "sentiments," and "word counts" to identify disinformation in social media networks. The construct of "writing style" supports the Undeutsch (1967) hypothesis that a statement derived from real-life experiences is expected to differ in content and quality from fabricated statements. The construct of "complexity" is captured through more unique words and reflects the reality monitoring theory view that facts should contain more detailed sensory information than artificial creations or manipulated content. The construct of "sentiments" comes from the four-factor theory, which states that deceptions have an underlying manipulative intent reflected in arousal, guilt, and dominance while trying to appear authentic. Lastly, the "word counts" construct stems from the information manipulation theory insight that deception contains extensive information with unnecessary detail. We considered these insights in selecting linguistic features for the disinformation measurement model.

Another set of linguistic features comes from the seminal work of Newman et al. (2003) on predicting deception from linguistic styles. Based on the results of the experiments in the study, the researchers concluded that five linguistic features play a significant role in accurate predictions of deception: first-person singular pronouns and third-person pronouns,3 negative emotion words,4 unique words,5 exclusive words,6 and motion verbs.7 According to Shrestha (2018), researchers leveraging linguistic cues have shown that deceivers use language in a specific way to make fake news seem legitimate. Shrestha found that fake news tends to convey expertise, appear negative in tone, and denote lesser analytical thinking.

Informed by the precedent theoretical literature and the recent empirical work, we specify a two-level, five-factor model of disinformation using 22 indicators, as shown in Figure 6.3. To operationalize the features suggested by prior research, we used LIWC-22 (Pennebaker et al., 2022) and the full-text analysis function of the SBS platform to obtain quantitative values for our selected features.
3.8 Sign Expectations, Model Calibration, and Validity Testing
To validate the disinformation construct, we must test the assumptions of equal importance for each indicator and of a positive relationship between all manifest variables and latent constructs. To achieve this, we trained a logistic regression model on two pre-classified datasets of COVID-19-related social media. We considered prior findings and economic reasoning when establishing the sign expectations for each indicator, as shown in Table 6.2, columns 4 and 5. For the first dataset, 6 of 18 indicators violated expected signs, including arousal, fear, joy, surprise, negative tone, and first-person pronouns. Furthermore, 6 out of 18 indicators failed to achieve statistical significance in explaining disinformation. This relatively inferior performance suggested that the dataset's properties, such as inferior classification, may have contributed to the results. To cross-validate the directional associations of indicators with disinformation, we applied a logistic regression to a second COVID-19-related misinformation tweet dataset from a different source. The second model agreed completely with the expected signs, and all indicators except one (surprise) were statistically significant in explaining disinformation. These results supported the disinformation model's theoretical and empirical validity and suggested that the inferior quality of the first dataset may have been the cause of the poor results. Additionally, the model calibrated on the classified data allows us to improve the estimation of disinformation factors beyond the naïve equal-weights assumption.
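A hedged sketch of this validation step follows: a binomial logistic regression of a pre-classified disinformation label on the linguistic indicators, with coefficient signs compared against the expectations in Table 6.2. The file name and column names are hypothetical placeholders, not our actual data.

```r
# classified_tweets.csv is a hypothetical pre-classified dataset; "label"
# (1 = disinformation) and the indicator columns are placeholder names
df <- read.csv("classified_tweets.csv")

fit <- glm(label ~ word_count + lexical_diversity + trust + arousal + disgust +
             fear + joy + sadness + surprise + negative_tone,
           data = df, family = binomial)
AIC(fit)   # comparable to the AIC values reported for models [1] and [2]

# Compare estimated coefficient signs with the expectations in Table 6.2
expected <- c(arousal = 1, disgust = 1, fear = 1, joy = 1,
              sadness = -1, surprise = 1, negative_tone = 1)
observed <- sign(coef(fit)[names(expected)])
observed == expected   # FALSE flags a sign violation
```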
3.9 Estimating Disinformation Probability
We use a three-step algorithm to estimate each tweet's disinformation probability. First, we normalize the values of the selected indicators. Next, we sum the normalized values of all selected indicators. Finally, we rescale the sum to a range of 0 to 100. This algorithm allows us to interpret the score as the probability of disinformation for each tweet based on the selected indicators from LIWC and SBS.
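The three-step scoring algorithm can be sketched as follows; the indicator matrix X (tweets by indicators) and its toy dimensions are illustrative assumptions.

```r
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

score_disinformation <- function(X) {
  Z <- apply(X, 2, rescale01)  # step 1: normalize each indicator to [0, 1]
  s <- rowSums(Z)              # step 2: sum the normalized indicators per tweet
  100 * rescale01(s)           # step 3: rescale the sum to a 0-100 score
}

# Toy matrix of 22 LIWC/SBS indicator values for 50 tweets (assumption)
set.seed(1)
X <- matrix(runif(50 * 22), nrow = 50,
            dimnames = list(NULL, paste0("indicator", 1:22)))
head(score_disinformation(X))
```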
3.10 Classification of Disinformation
After estimating the disinformation probability of each tweet in our sample, we obtain their distribution. The crucial question is how to classify the distribution efficiently and accurately. Several approaches are available, with a tradeoff between computational efficiency and accuracy. We use a simple strategy, setting a threshold value for the disinformation probability within a subsample of each topical discourse in a biweekly period. This approach enables us to easily filter out a subsample for further study. We set the threshold value separately for each topic and biweekly period. The classification decision is made using a boxplot procedure. Within each topic-period subsample, two groups of tweets are classified: (1) potential disinformation (disinformation probability > subsample median) and (2) highly potential disinformation (disinformation probability > subsample threshold). Potential disinformation is set to observations exceeding the 50th percentile, and highly potential disinformation is set to observations exceeding a chosen threshold percentile (e.g., the 75th percentile, the upper bound of the interquartile range).
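A minimal sketch of this classification rule follows, assuming a vector of disinformation probability scores for one topic-period subsample and the 75th percentile as the upper threshold.

```r
classify_disinfo <- function(score, upper = 0.75) {
  med <- median(score)           # boundary for "potential" disinformation
  thr <- quantile(score, upper)  # boundary for "highly potential"
  ifelse(score > thr, "highly potential disinformation",
         ifelse(score > med, "potential disinformation", "unlikely"))
}

# Toy probability scores for one topic-period subsample (assumption)
set.seed(1)
scores <- 100 * rbeta(1000, 2, 5)
table(classify_disinfo(scores))
```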
3.11 Disinformation Outlier Detection
Our validated disinformation identification model, based on 22 indicators grouped into five factors, enables estimation of the probability of disinformation for each tweet and filters out the disinformation outliers as "highly potential disinformation," topic by topic and period by period. Each filtered outlier's Case ID can be located in the dataset to check detailed information, such as the author, the number of retweets, and whether a verified user sent it. Additionally, the data can help answer advanced analytical questions about communication's influence. Table 6.3 shows a filtered subsample for an outlier in the "vaccine manufacturers" topic that spread widely in December 2021. The top tweet claimed that "Pfizer and Moderna understand COVID-19 as a new industry. Their execs be [sic] the leader of a new, powerful global mafia. They do not care about your health. They want you on a lifetime subscription through the force of government coercion." This tweet exemplifies high potential disinformation and illustrates the algorithm's effectiveness.
Table 6.2 Sign expectations

The table reports, for each indicator, its factor and estimation source (SBS or LIWC), the expected sign, the reasoning, and logistic regression estimates from the two pre-classified datasets, [1] (AIC = 5079) and [2] (AIC = 2311.6).

Word count (Quality; LIWC). Expected sign: ≠ 0. Estimates: [1] 10.8228***; [2] 71.8767***. Reasoning: context-dependent, with negative signs for disinformation spread through simple messages and positive signs for disinformation spread through fully articulated arguments.

Lexical diversity (Quality; LIWC). Expected sign: ≠ 0. Estimates: [1] 2.8420***; [2] 5.0799***. Reasoning: context-dependent, with a negative sign for disinformation spread through catchy slogans and a positive sign for disinformation spread by nuanced arguments.

Trust (Quality; SBS). Expected sign: ≠ 0. Estimates: [1] 0.6351*; [2] 1.5374***. Reasoning: context-dependent. A genuinely trusting narrative is likely inversely associated with disinformation; however, information designed to look trustworthy is more likely to contain disinformation.

Arousal (Relation; SBS). Expected sign: +. Estimates: [1] −0.5504; [2] 19.3321***. Reasoning: messages eliciting arousal are more likely to contain disinformation.

Disgust (Relation; SBS). Expected sign: +. Estimates: [1] 1.0186*; [2] 1.0183. Reasoning: messages eliciting disgust are more likely to contain disinformation.

Fear (Relation; SBS). Expected sign: +. Estimates: [1] −0.3846; [2] 2.9785***. Reasoning: messages eliciting fear are more likely to contain disinformation.

Joy (Relation; SBS). Expected sign: +. Estimates: [1] −0.8197**; [2] 1.0225*. Reasoning: context-dependent. In person-to-person communications, genuine messages eliciting joy are less likely to contain disinformation; however, online messages in a joyful, upbeat tone that inspire the desire to accept the positive message can be misleading and spread disinformation.

Sadness (Relation; SBS). Expected sign: −. Estimates: [1] 0.5735; [2] −3.1260***. Reasoning: messages eliciting sadness are less likely to contain disinformation, as they are less likely to be shared.

Surprise (Relation; SBS). Expected sign: +. Estimates: [1] −0.6133; [2] 0.6509. Reasoning: messages eliciting surprise are more likely to contain disinformation.

Negative tone (Relation; LIWC). Expected sign: +. Estimates: [1] −1.1183●; [2] 2.6966●. Reasoning: a negative tone can create fear, mistrust, and confusion and is more likely to contain disinformation.

Certainty (Manner; SBS). Expected sign: −. Estimates: [1] −0.5201; [2] −0.6934●. Reasoning: disinformation campaigns typically rely on creating uncertainty.

Informality (Manner; SBS). Expected sign: ≠ 0. Estimates: [1] −2.8055***; [2] −1.9493**. Reasoning: context-dependent. Disinformation campaigns typically rely on informal and colloquial language to create a sense of familiarity and facilitate disinformation acceptance; however, highly colloquial conversation among friends is less associated with disinformation, which tends to use more rhetorical devices.

Dominance (Manner; SBS). Expected sign: ≠ 0. Estimates: [1] −0.5007; [2] −14.5195***. Reasoning: the sign depends on context and tone. Dominant messages may be perceived as confident and more credible, but a dominant message may also be viewed as manipulative.

Hapaxes (Manner; SBS). Expected sign: −. Estimates: [1] −13.0180***; [2] −39.997***. Reasoning: fewer unique words suggest lower cognitive complexity and are more consistent with lies and disinformation.

Authenticity (Manner; LIWC). Expected sign: −. Estimates: [1] −0.6311***; [2] −0.7337***. Reasoning: a lack of authenticity is more likely to be manipulative and contain disinformation.

First-person pronouns (Quantity; LIWC). Expected sign: +. Estimates: [1] −1.9037**; [2] 2.6188***. Reasoning: context-dependent. A lower rate of self-references in person-to-person communications can be consistent with liars' attempts to dissociate themselves from lies and disinformation; however, using first-person pronouns to convey honesty on social media can also be manipulative and designed to conceal disinformation.

Third-person pronouns (Quantity; LIWC). Expected sign: +. Estimates: [1] −0.7430*; [2] 4.6424***. Reasoning: a higher rate of third-person references is more consistent with an alienation (us vs. them) narrative like exclusivity and is more likely to contain disinformation.

Exclusive words (Quantity; LIWC). Expected sign: +. Estimates: [1] 0.8319●; [2] 1.4310**. Reasoning: exclusive words are often used to inflame in-group vs. out-group distinctions and are more consistent with disinformation.

Notes: * significant at 10%; ** significant at 5%; *** significant at 1%.
Figure 6.4a Threshold analysis and tweet classification
Notes: The figure shows sample results for tweets from the "vaccine manufacturers" topic in December 2021. Tweets with a probability score above 30 percent are considered "potential disinformation," while those exceeding 62 percent are "highly potential disinformation."
4. RESULTS

4.1 Prevalence of COVID-19-related Disinformation Themes on Social Media

4.1.1 Choice of the optimal method
Choosing the optimal topic extraction method involves comparing the costs and benefits of each method. The manual factor analysis (FA) algorithm using IBM SPSS software is conceptually simple but lacks computational efficiency. To remedy this, we implement a manual FA algorithm using the R software "psych" package, providing a programmatic solution for factor analysis. Additionally, we consider two automatic methods using Provalis Research WordStat software, which automatically extract latent topics of COVID-19-related tweets. In addition to the standard FA algorithm, WordStat offers an automated Non-negative Matrix Factorization (NNMF) FA algorithm, which applies comparatively less weight to words with less coherence and reduces the dimensions of the input corpus.
Figure 6.4b Threshold analysis and tweet classification
Note: The boxplot indicates outliers with black dots.
To select an optimal method, we use a sample of tweets collected within December 2021 and analyze this dataset sample using the four FA methods. We suppress factor loadings lower than 0.3 for all four methods to reduce the incidence of cross-loading and enable unique term mapping to factors. To verify the reliability of extracted factors, we calculate Cronbach's Alpha for each extracted topic and retain only the topics that are at least marginally reliable (Cronbach's Alpha > 0.6). We use mixed methods reasoning to select the optimal method, first considering the relative performance of the four methods quantitatively and then considering computational efficiency versus performance qualitatively. We rank the methods using two metrics: the average alpha for each method, describing its reliability, and the number of reliable topics extracted, describing the richness of distinct reliable topics for each method. Method 3 extracted the highest number of reliable topics (29), followed by method 4 with 28 reliable topics, method 2 with 25 reliable topics, and baseline method 1 with 24 reliable topics. Table 6.4 groups nine reliable topics shared by all four methods, four reliable topics shared by three methods, reliable topics shared by two methods, and unique topics extracted by a single method.
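As a hedged illustration of this reliability screen, the sketch below computes Cronbach's Alpha for the terms loading on each topic and keeps topics above the 0.6 cutoff. It reuses the toy matrix m and the topic_terms mapping from the factor-analysis sketch in Section 3.2, both of which are assumptions rather than our actual data.

```r
library(psych)

# Cronbach's Alpha for the terms loading on each extracted topic; topics
# with fewer than two terms cannot be scored and are dropped
topic_alpha <- sapply(topic_terms, function(terms) {
  if (length(terms) < 2) return(NA_real_)
  psych::alpha(m[, terms, drop = FALSE])$total$raw_alpha
})

reliable <- topic_alpha[!is.na(topic_alpha) & topic_alpha > 0.6]
length(reliable)   # number of reliable topics (one ranking metric)
mean(reliable)     # average alpha (the second ranking metric)
```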
Table 6.3 A sample disinformation outlier

All seven rows are retweets of the same source tweet (RETWEET_SCREEN: JordanSchachtel; RETWEET_COUNT: 1,750; disinformation score: 78.10 for every row): "pfizer and moderna understand as a new industry their execs be [sic] the leader of a new powerful global mafia they do not care about your health they want you on a lifetime subscription through the force of government coercion."

USER_ID | USER_SCREEN | DATE | USER_FOLLOWERS | USER_FRIENDS | USER_FAVOURITES
1088175382061380000 | SpiceLord15 | 12/3/2021 | 25 | 234 | 8455
20668212 | aquilalux | 12/2/2021 | 3213 | 3980 | 34197
52191473 | MSOswald | 12/2/2021 | 1864 | 4977 | 1085
1423236166007070000 | AnonimAktivista | 12/2/2021 | 44 | 475 | 268
1326024187106910000 | AmericaEffYeaa | 12/2/2021 | 39 | 203 | 16692
1120490917310420000 | jeffpeenut | 12/2/2021 | 266 | 1223 | 15326
1849216412 | Techno_Vking | 12/2/2021 | 1676 | 1983 | 66264
Based on Table 6.4, method 3 extracted 29 reliable topics but produced the lowest average alpha (0.764). Method 1 had the highest average alpha (0.817) but extracted only 24 reliable topics. Method 4 had the second-highest number of reliable topics (28) and the second-highest average alpha (0.806). Method 4 was also computationally efficient and effective, so we chose it to perform topic modeling for subsequent samples.

4.1.2 Sankey diagram method for visualization of dynamic changes in topics
We use the Sankey diagram visualization method to understand how COVID-19-related topics evolved on Twitter in relation to the epidemiological stages of the pandemic (Malik et al., 2013). We identify topics using method 4 on biweekly time slices and aggregate them across quarterly periods. Figure 6.5 shows the dynamic changes in topics and themes over time, providing a logical way to consider how different information regimes, as reflected in the topics and more prominent themes, relate to the course of the pandemic. This approach achieves face validity as supported by reason and precedent (e.g., Malik et al., 2013; Resnick et al., 2014; Pépin et al., 2017). Based on the results, we identify pandemic impact (phase 1), preventive measures (phase 2), COVID-19 cases (phase 3), vaccines (phase 4), and complete vaccination (phase 5) as the important topics for each respective phase. Figure 6.5 also shows that the number of topics discussed increased from phase 1 (January 2020–April 2020) to phase 5 (May 2021–August 2021), with additional topics associated with each phase noted at the bottom of the diagram.
4.2 Measurement of Disinformation on Social Media
4.2.1 Exploratory factor analysis
We conducted an exploratory factor analysis on a 19-item scale measuring disinformation on social media, using Hinkin's (1998) recommendation of considering eigenvalues greater than 1 and the scree test of the percentage of variance explained (Cattell, 1966) to determine the number of factors (Table 6.5). The analysis revealed a five-factor solution with high factor loadings and low cross-loadings of the retained items, providing a suitable conceptualization of disinformation on social media in the context of pandemic public health. The data were appropriate for factor analysis, as evidenced by a Kaiser–Meyer–Olkin (KMO) statistic of 0.71, a significant Bartlett's Test of Sphericity, communalities above 0.30, and Measures of Sampling Adequacy (MSAs) across the diagonal of the anti-image matrix above 0.56. The first five factors explained 82.5 percent of the total variance (see Table 6.5). Cronbach's Alphas for the five factors were above the recommended level of 0.65 (Nunnaly, 1978), indicating good internal consistency. Factor 1 (Emotional Contagion), Factor 2 (Emotional Arousal), Factor 3 (Positive Affect), Factor 4 (Linguistic Complexity), and Factor 5 (Expectation Violation) had alphas of 0.903, 0.861, 0.830, 0.857, and 0.686, respectively (see Table 6.5), confirming convergent and discriminant validity.

4.2.2 Configural and metric invariance
Our analysis reveals that the first-order latent construct of disinformation on social media is configurally invariant across time groups, state groups, and discussion topic groups, indicating the same underlying factor structure across different contexts. However, our analysis also demonstrates that the model does not achieve metric invariance across these groups compared to an unconstrained, freely estimated model.
Table 6.4 Method selection using performance matrix

Methods: method 1 – manual SPSS FA; method 2 – manual R FA; method 3 – automated WordStat NNMF FA; method 4 – automated WordStat FA. Cronbach's Alpha values appear in parentheses.

Topics shared by all four methods (method 1; method 2; method 3; method 4): Refuse Mandate (0.963; 0.970; 0.97; 0.97); Analytics and Healthcare (0.951; 0.990; 0.99; 0.99); Premier League (0.793; 0.770; 0.77; 0.77); COVID-19 variant (0.911; 0.820; 0.82; 0.82); Test Positive (0.720; 0.702; 0.72; 0.72); Booster Dose (0.720; 0.827; 0.72; 0.72); Trump Administration Response (0.825; 0.830; 0.83; 0.83); Biden Administration Response (0.886; 0.690; 0.69; 0.69); Vaccination Mandate (0.884; 0.910; 0.91; 0.91).

Topics shared by three methods: Public Health (methods 1, 2, and 3: 0.927; 0.927; 0.654); Clinic and Vaccination (methods 1, 2, and 3: 0.670; 0.670; 0.837); Healthcare Worker (methods 1, 2, and 4: 0.880; 0.880; 0.780); Pandemic during holidays (methods 1, 3, and 4: 0.918; 0.867; 0.829).

Topics shared by two methods (alpha values as extracted): Isolation and Quarantine (0.880; 0.800); Conspiracy Theory (0.847; 0.810); Official COVID-19 Statistics (0.831; 0.830); Walter Reed (0.807; 0.740); Tory Party (0.947; 0.938); Mass Vaccination (0.935; 0.935); Bill (0.913; 0.888); One-day Case Increase (0.881; 0.864); COVID-19 Patients and disease (0.780; 0.780); Lawmakers and Vaccination Companies; Travel Ban (0.770; 0.770); Hospital Beds and COVID-19 Patients (0.860; 0.860); Omicron Wave (0.745; 0.740); Mohammed Alyamani (0.790; 0.726); Failed Reopening (0.959; 0.959); Paid Leave Unvaccinated Employee (0.951; 0.940); The outbreak in Christmas (0.723; 0.670); Government and Pandemic (0.710; 0.690); Healthcare (0.700; 0.655); Show Proof (0.852; 0.787); FDA Vaccination Approval (0.750; 0.750); Ron Desantis and Florida (0.958; 0.958); Children (0.692; 0.728); Severe Illness Study (0.792; 0.855); Wear a Mask (0.670; 0.704).

Topics unique to one method: CDC Safety Protocol (0.720); South Africa Outbreak (0.765); CDC Omicron Study (0.877); Positive Rate Total (0.650); South Korea Outbreak (0.660); Mild Symptoms (0.623); Monoclonal Antibody Treatment (0.884); Active Cases Total (0.806).

Performance: method 1 – average alpha = 0.817, 24 reliable topics; method 2 – average alpha = 0.783, 25 reliable topics; method 3 – average alpha = 0.765, 29 reliable topics; method 4 – average alpha = 0.806, 28 reliable topics.
Figure 6.5 Visualization of the dynamics of COVID-19-related topics on social media
Notes: The gray shaded nodes at the left boundary of each phase represent thematic topic groups, while nodes on the right boundary represent sub-topics. The curvilinear edge weights represent topical prevalence.
Table 6.5 Results of exploratory factor analysis

Component 1 (Emotional Contagion): Disgust intensity (.943); Negative emotion (.940); Disgust count (.933); Informal (.903); First-person pronouns (.761).
Component 2 (Emotional Arousal): Fear count (.860); Fear intensity (.832); Sadness count (.828); Sadness intensity (.821); Negative tone (.739).
Component 3 (Positive Affect): Joy count (.849); Trust intensity (.828); Trust count (.824); Joy intensity (.808).
Component 4 (Linguistic Complexity): Words per sentence (.916); Hapaxes (.870); Third-person pronouns (.552).
Component 5 (Expectation Violation): Surprise intensity (.930); Surprise count (.923).

Note: The table shows the rotated component matrix result using principal component analysis with Kaiser-normalized varimax rotation.
This outcome suggests that the strength of relationships between the observed variables and the latent construct varies across groups. This variation may be due to differences in the understanding and interpretation of the observed variables or to differences in the reliability of the measured variables across groups. The freely estimated model with multiple groups achieves a goodness of fit index (GFI) of .917 and an adjusted goodness of fit index (AGFI) of .893, indicating a good fit for the measurement model across time and state groups. However, the fit deteriorates when the model is estimated across groups defined by different topics on social media, suggesting that the complexity of topics affects the structure of the disinformation measurement model.

Overall, while the factor structure of the first-order latent construct of disinformation on social media is invariant across groups and contexts, the meaning and interpretation of the observed variables may differ. To account for these differences, researchers should use group-specific factor loadings or conduct separate analyses for each group when comparing groups. This approach enables the analysis of relationships between disinformation and public health impacts separately across time, states, and social media topics, providing insights into how disinformation affects public health in different contexts. By accounting for group-specific differences, we can assess relationships between disinformation and public health impacts more accurately, informing the development of more effective interventions to combat the adverse effects of disinformation on public health.
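The chapter does not name the software used for these invariance tests; as a hedged illustration, an equivalent configural-versus-metric comparison could be run with the R package lavaan, as sketched below. The data frame, the grouping variable, and the item names (abbreviated from Table 6.5) are placeholders.

```r
library(lavaan)

# df is a hypothetical data frame holding the 19 indicators from Table 6.5
# plus a grouping variable "period"; all names here are placeholders
model <- '
  contagion  =~ disgust_intensity + negative_emotion + disgust_count +
                informal + first_person_pronouns
  arousal    =~ fear_count + fear_intensity + sadness_count +
                sadness_intensity + negative_tone
  positive   =~ joy_count + trust_intensity + trust_count + joy_intensity
  complexity =~ words_per_sentence + hapaxes + third_person_pronouns
  violation  =~ surprise_intensity + surprise_count
'

configural <- cfa(model, data = df, group = "period")
metric     <- cfa(model, data = df, group = "period",
                  group.equal = "loadings")

# A significant chi-square difference implies that metric invariance fails,
# matching the result reported above
anova(configural, metric)
```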
4.3 The Effects of Disinformation on Public Health
We examine the relationship between disinformation and negative health impact quantitatively by plotting a single-point average disinformation level against a state-level negative health impact metric (Figures 6.6a–f).
Figure 6.6a Average disinformation score for all topics (X-axis) versus pandemic score (Y-axis) in November 2020 (model sensitivity 5.1%)
Figure 6.6b Average disinformation score for all topics (X-axis) versus pandemic score (Y-axis) in December 2021 (model sensitivity 9.1%)
Figure 6.6c Average disinformation score for the "COVID-19 vaccine" topic (X-axis) versus pandemic score (Y-axis) in November 2020 (model sensitivity 1.7%)
Figure 6.6d Average disinformation score for the "Vaccine boosters" topic (X-axis) versus pandemic score (Y-axis) in December 2021 (model sensitivity 1.3%)
Figure 6.6e Average disinformation score for the "COVID-19 symptoms and patients" topic (X-axis) versus pandemic score (Y-axis) in November 2020 (model sensitivity 2.3%)
Figure 6.6f Average disinformation score for the "Full vaccinations" topic (X-axis) versus pandemic score (Y-axis) in December 2021 (model sensitivity 1.8%)
Notes: In all panels, US states are shown as dots (Republican) or circles (Democrat) according to political leaning based on the results of the 2020 presidential election.
The disinformation index is calculated by averaging each state's disinformation level, while the negative health impact during the COVID-19 pandemic is measured using a weighted factor constructed from three indicators: cases per capita, deaths per capita, and cases per test. To construct the COVID-19 negative health impact index, we normalized the values of the three indicators using a three-step algorithm. First, for each state, we divided the sums of cases and deaths by the state's population to obtain cases and deaths per capita, respectively, and divided the sum of cases by the sum of tests to obtain cases per test. Second, we normalized the resulting values on a given day by subtracting, for each variable, the minimum value among all states and dividing by the variable's range. Finally, the resulting index consists of the average score of a state ranked by change, subtracting the change from the minimum change of all states and dividing by the maximum.

Our findings show that in 2021, the sensitivity of public health outcomes to disinformation appears to be systematically greater than in 2020 (Figure 6.6). However, these results do not control for other variables such as the availability of vaccines, human immune response, genetic changes in the virus, and human learning to recognize and respond to disinformation rationally.

Figures 6.6a–f show mixed results for the relationship between negative health impact and disinformation. Panels 6.6a and 6.6b show that, across all topics, the statewide negative health impact in December 2021 is more sensitive to disinformation (with a slope of 9.1 percent) than in November 2020 (with a slope of 5.1 percent). The comparison of similar vaccine-related topics in panels 6.6c–f shows that sensitivity to disinformation is systematically lower in 2021 than in 2020.

We believe that the growth in the adverse health effect in 2021 is due to the increasing cumulative toll of COVID-19 disinformation. Aggregate disinformation negatively impacts health during a pandemic by spreading false information about the disease, its causes, symptoms, and treatments, leading to fear, anxiety, and confusion among the public. As the pandemic continues, the impact of disinformation grows over time because more people are exposed to it, and it becomes harder to correct false information once it has taken root in people's minds. The attenuation of the negative health effect for specific vaccine-related topics in 2021 is due to several mechanisms, including improved public education, increased media literacy, correction of false information, and weariness from repetition. As the public becomes better informed and more aware of the dangers of disinformation, the impact of familiar and specific thematic narratives on adverse health impacts should diminish over time. These results and interpretations should nevertheless be considered with caution, since disinformation and negative health impacts are compared without controlling for other variables such as vaccine availability, human immune response, genetic changes in the virus, morbidity, and human learning to recognize mis- and disinformation and respond to it rationally.
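A minimal sketch of the index construction described at the beginning of this subsection follows, under simplifying assumptions: the final ranking step is approximated by an equal-weight average of the min-max normalized indicators, and all inputs are toy values.

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

health_impact_index <- function(cases, deaths, tests, pop) {
  ind <- cbind(cases_per_capita  = cases / pop,
               deaths_per_capita = deaths / pop,
               cases_per_test    = cases / tests)
  rowMeans(apply(ind, 2, normalize))  # equal-weight average across indicators
}

# Toy values for five states (assumptions)
set.seed(42)
pop    <- c(4.9e6, 39.5e6, 21.5e6, 19.5e6, 29.1e6)
cases  <- pop * runif(5, 0.05, 0.15)
deaths <- cases * runif(5, 0.01, 0.02)
tests  <- cases * runif(5, 5, 12)
round(health_impact_index(cases, deaths, tests, pop), 3)
```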
4.4 The Role of Bots in Generating and Spreading Disinformation
Social media bots perform various automated tasks on social media platforms, including spreading and amplifying information, connecting with other users, and reaching a wider audience. They can be either adverse or beneficial, depending on their intended purposes and actions. In this study, we utilized the tweetbotornot2 R software package and its built-in XGBoost bot classifier, predict_bot(), to understand the influence of bots in spreading COVID-19-related disinformation on social media.
Figure 6.7 Directed Twitter network for December 2021, random sample of 10,000 tweets
Notes: The sample has 6,354 nodes and 3,401 edges. Light gray dots represent 401 detected bots, while dark gray dots represent non-bots. Highly connected users are labeled, and the directed edges (arrows) indicate the flow of information from the original tweet to the first set of receivers.
We randomly sampled 10,000 tweets from all December 2021 tweets, analyzed this sample using NodeXL Pro software, and displayed it as a directed network in which the directed edges (arrows) represent the flow of information from one node to another. We marked bot nodes in light gray and non-bot nodes in dark gray in the network visualization. Highly connected accounts operated by legitimate media companies, such as Reuters, A.P., the New York Times, and the Washington Post, were found to use bots to distribute their content efficiently. Bots can help to spread information across different communities and promote the cross-pollination of ideas. However, they can also be used maliciously or for advertising rent extraction through clickbait. After identifying bots in the sample, we analyzed the accounts influenced by bots and found a sub-network of 913 nodes with 598 edges, where the 401 detected bots directly influenced 512 accounts (Figure 6.8). This analysis provides insights into bots' role in shaping information dissemination on Twitter.
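A hedged sketch of the bot-scoring step is shown below. It relies on the tweetbotornot2 package named above; the example accounts and the 0.5 probability cutoff are assumptions, and the call requires Twitter API access via the rtweet package.

```r
library(tweetbotornot2)

# Example accounts (assumptions); predict_bot() fetches the accounts' recent
# activity through the Twitter API (rtweet) and returns estimated bot
# probabilities in the prob_bot column
users <- c("Reuters", "AP", "nytimes", "washingtonpost")
scores <- predict_bot(users)

# Flag accounts as bots above an assumed 0.5 probability cutoff
scores$is_bot <- scores$prob_bot > 0.5
scores
```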
Figure 6.8 Initial retweets in the bot-influenced flow of information for December 2021, random sample of 10,000 tweets
Notes: Each node in this visualization corresponds to a tweet, and each directed edge indicates a retweet. The graph displays tweets that were directly retweeted by bots without taking subsequent retweets into account.
Comparing the network with and without bots (Figure 6.9) showed that removing bots resulted in greater fragmentation of Twitter communities. Social media communities tend to form isolated groups centered around specific topics or viewpoints.8 This effect leads to "echo chambers," where users are only exposed to viewpoints like their own, reinforcing existing beliefs and limiting exposure to alternative perspectives; the resulting lack of diversity in shared information can increase polarization. Bots play a critical role in connecting different communities and facilitating the flow of information between them, preventing the network from becoming too fragmented. Therefore, removing bots from social media networks could lead to the disintegration of the network into isolated communities or echo chambers.
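The fragmentation comparison can be sketched as follows on a toy network with randomly assigned bot labels (both assumptions), contrasting connected components and graph density with and without bots, in the spirit of Table 6.6.

```r
library(igraph)

set.seed(7)
g <- sample_gnm(200, 150, directed = TRUE)  # toy retweet network (assumption)
V(g)$is_bot <- runif(vcount(g)) < 0.1       # randomly assigned bot labels

g_no_bots <- delete_vertices(g, V(g)[V(g)$is_bot])

data.frame(network    = c("with bots", "without bots"),
           components = c(components(g)$no, components(g_no_bots)$no),
           density    = c(edge_density(g), edge_density(g_no_bots)))
```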
Figure 6.9 Initial retweets in the bot-influenced flow of information for December 2021, random sample of 10,000 tweets
Notes: The network is less fragmented without bots, as seen in the right-hand graph and columns 4 through 6 of Table 6.6. The right-hand graph (without bots) shows fewer overlapping communities by comparison with the left-hand graph (with bots).
Table 6.6 Network statistics with and without bots

Graph metric | With bots | Without bots
Vertices | 5,463 | 6,354
Unique edges | 3,397 | 3,985
Edges with duplicates | 4 | 14
Total edges | 3,401 | 3,999
Self-loops | 39 | 42
Reciprocated vertex pair ratio | 0 | 0
Reciprocated edge ratio | 0 | 0
Connected components | 2,103 | 2,406
Single-vertex connected components | 35 | 38
Maximum vertices in a connected component | 93 | 93
Maximum edges in a connected component | 93 | 93
Maximum geodesic distance (diameter) | 5 | 5
Average geodesic distance | 1.5469 | 1.5671
Graph density | 1.13E-4 | 9.78E-5

Top influencers with bots: DrEricDing, joncoopertweets, soompi, ShamsCharania, medriva, DWUhlfelderLaw, NikkiCallowayy, Jim_Jordan, ElectionWiz, EssexPR, BernieSpofforth, OccupyDemocrats, RBReich.
Top influencers without bots: DrEricDing, joncoopertweets, thehill, soompi, ShamsCharania, medriva, washingtonpost, DWUhlfelderLaw, NikkiCallowayy, Jim_Jordan, ElectionWiz, OccupyDemocrats, A.P., ANI, SkyNews.

Notes: Table 6.6 indicates that the graph density without bots (9.78E-5) is 15 percent lower than with bots (1.13E-4), suggesting less connectivity and a more dispersed network without bots.
We further analyzed the impact of bots on the "Vaccine Manufacturers" topic in our December 2021 sample, using the leading eigenvector community detection algorithm to identify densely connected groups. Our analysis suggests that bots operated by legitimate news and media sources play a significant role in originating and spreading disinformation. Of the 45 accounts that sent highly potential disinformation, 13 were identified as bots, and these originated 29 percent of the original content on vaccines as bot-originated disinformation. Bots retweeted 28 of the 46 highly potential disinformation tweets, spreading 61 percent of the disinformation. These results are based on a small subsample of the data (Figure 6.9). Further research is needed to understand the full extent of bots' influence on the spread of disinformation during the pandemic. Nevertheless, our findings provide important insights into the role of bots in spreading disinformation on social media and their potential impact on public health.
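A minimal sketch of this community-detection step follows, using the igraph package as a stand-in for the NodeXL workflow used in the chapter. The toy edge list is an assumption, and igraph's leading-eigenvector routine requires an undirected graph, so the directed retweet network is converted first.

```r
library(igraph)

# Toy retweet edge list (assumption)
edges <- data.frame(from = c("a", "a", "b", "c", "d", "e", "f"),
                    to   = c("b", "c", "c", "d", "e", "f", "d"))
g <- graph_from_data_frame(edges, directed = TRUE)

# The leading-eigenvector method works on undirected graphs, so convert
# before detecting densely connected groups
comm <- cluster_leading_eigen(as.undirected(g))
membership(comm)   # community assignment per account
sizes(comm)        # size of each detected community
```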
5. DISCUSSION

The intricate connection between disinformation and adverse health outcomes during the COVID-19 pandemic (Pennycook et al., 2020) emphasizes the importance of ongoing surveillance and measures to curb its proliferation (Lewandowsky et al., 2017; Swire-Thompson & Lazer, 2019; Sylvia Chou et al., 2020). Enhancing public health education, media literacy, and debunking false information are crucial in reducing the harmful effects of disinformation and fostering public health and well-being (Allington et al., 2021). Our Disinformation Index offers a probabilistic assessment of disinformation likelihood and represents a noteworthy contribution to the growing field of disinformation studies (Shu et al., 2020; Camargo & Simon, 2022; Pavliuc et al., 2023) and the literature on disinformation detection (Conroy et al., 2015; Rastogi & Bansal, 2022). It builds upon existing work on journalistic approaches to reliability evaluation (Pratelli & Petrocchi, 2022) and incorporates empirical indicators of manipulation and deception, as well as a quantitative strategy for assigning weights.

We found that social media discussions around COVID-19 have grown increasingly complex, partly due to the spread of disinformation (Bruns et al., 2020) and the public's heightened resilience to it (Zhao et al., 2011; Friggeri et al., 2014). Health impact sensitivity to disinformation increased in 2021 due to cumulative disinformation (Lwin et al., 2020; Singh et al., 2020), while sensitivity to vaccine disinformation diminished (Cinelli et al., 2020; Jamison et al., 2020). Our research confirms bots' significant role in initiating and disseminating COVID-19 disinformation (Ferrara, 2020). These insights underline the complicated connection between disinformation and adverse health outcomes during the pandemic and the necessity for persistent surveillance and countermeasures.

This investigation presents several further insights. Our findings regarding the escalation of topical complexity and its association with adverse health impacts concur with prior research on the role of disinformation in shaping public discourse and attitudes (Scheufele & Tewksbury, 2007; Lewandowsky et al., 2012; Towers et al., 2015; Ecker et al., 2022; Pierri et al., 2022; Van Der Linden, 2022). The increased health impact sensitivity to cumulative disinformation in 2021 emphasizes the importance of continuous monitoring and
intervention during public health crises (Broniatowski et al., 2018; Vosoughi et al., 2018; Pennycook et al., 2020). This research enriches the literature on disinformation by introducing novel methods to measure and comprehend its topical complexity and effect on health (Lewandowsky et al., 2017). Our findings on the evolving role of bots in originating and spreading COVID-19 disinformation stress the need for continued investigation in this domain (Bruns et al., 2020). We also contribute to understanding disinformation evolution on social media, demonstrating the growing topical complexity of COVID-19 discussions and the rising sensitivity to cumulative disinformation (Cinelli et al., 2020). These outcomes emphasize the importance of ongoing monitoring and intervention to address disinformation during public health crises (Bode & Vraga, 2018). The development of the Disinformation Index serves as a valuable instrument for identifying disinformation, enhancing our ability to examine online disinformation and its impact on public health (Shu et al., 2020). By providing a probabilistic measure of disinformation likelihood, the Disinformation Index can facilitate the development of targeted interventions to counter the spread of disinformation and support evidence-based public health strategies (Wang et al., 2019; Brennen et al., 2020; Depoux et al., 2020).
6. CONCLUSIONS

Our study has several limitations that must be acknowledged, including the focus on a single platform (Twitter), a limited sample size, and the focus on COVID-19-related disinformation. Additionally, the Disinformation Index's design has limitations that require further investigation, including alternative methods for measuring disinformation and improving its real-time monitoring capabilities. Future research should explore the effectiveness of counter-communication strategies, the long-term effects of disinformation on public health and well-being, and the relationship between disinformation and adverse health impacts in other health crises. It should also focus on identifying and quantifying disinformation in more complex forms of media, on integrating linguistic and network approaches to disinformation detection, and on the effectiveness of public education and media literacy campaigns in reducing disinformation's impact and improving the public's ability to distinguish between fact and fiction.

As public health crises become increasingly complex, with multiple sources of information competing for attention, it is essential to develop strategies for mitigating disinformation's negative impact. Our study contributes to this effort by creating a new Disinformation Index that can measure disinformation in real time and provide insights into the relationship between disinformation and negative health impacts.

In conclusion, our study offers several important contributions to the literature on disinformation and public health crises. We have shed new light on this critical issue by introducing innovative methods for measuring and understanding disinformation's topical complexity and impact on health. Our findings underscore the need for continued monitoring and intervention to combat disinformation during public health crises and highlight the importance of media literacy, public health education, and the correction of false information.
Developing a comprehensive strategy for addressing disinformation is essential to promoting public health and well-being, and future research should continue exploring this important area. As disinformation continues to pose a significant threat to public health, we must develop effective strategies for mitigating its impact and ensuring that the public can access accurate and reliable information. The public's health and well-being depend on it.
ACKNOWLEDGMENTS The authors would like to express their gratitude to several individuals and organizations that supported this study. The Northeastern University Diplomacy Lab is acknowledged for its support and assistance as part of the Department of State’s Diplomacy Lab academic research initiative. Tuomas Takko received funding from the Vilho, Yrjö, and Kalle Väisälä Foundation of the Finnish Academy of Science and Letters, which is greatly appreciated. Xiaomu Zhou and Mikhail Oet acknowledge the funding support of the Northeastern University Office of the Provost, Full-Time Faculty Professional Development Fund. Special recognition is given to Chase Zhang and Olivia Liu, previous research assistants who contributed to the conceptual and empirical research. The authors thank Brandon Smith and Jennifer Counter for teaching and mentoring this project. The authors would also like to sincerely thank the participants of the 10th International Conference on Collaborative Innovation Networks (COINs), including Peter Gloor, Francesca Grippa, Julia Gluesing, Ken Riopelle, Richard B. Freeman, and Aleksandra Przegalinska, for their invaluable critiques and suggestions. Furthermore, the authors would like to acknowledge Professor Andrea Fronzetti Colladon from the University of Perugia for his mentorship on the Semantic Brand Score (SBS) software, making the SBS software available to the authors, and providing insightful guidance and recommendations for this research. Their contributions and support were critical to the success of this study.
NOTES 1. Infodemic is defined as "too much information or false and misleading information" that "causes confusion, risk taking behaviors … and mistrust of health officials" (World Health Organization, n.d.). 2. Linear time refers to a time complexity in which an algorithm's running time grows in direct proportion to the size of its input data (O(n)), a desirable property for many computational tasks. 3. According to previous literature, liars use fewer first-person pronouns, potentially to dissociate themselves from the lie. 4. Liars use more negative emotion words, which may indicate tension and guilt related to the lies they are telling. Another interpretation is that liars may use negative emotions to manipulate their audience by generating fear, anxiety, or anger. 5. Liars use fewer unique words than truth tellers, suggesting lower cognitive complexity. Telling a complete and convincing story is a cognitively demanding task, and liars may not possess the necessary cognitive resources. 6. Exclusive words are often used in political speech to create a sense of in-group and out-group and to reinforce existing power dynamics, according to Newman et al. (2003).
7. Motion verbs are frequently used in political discourse to create an impression of action or progress as a distraction from the lack of concrete actions or progress on a particular issue, according to Newman et al. (2003). 8. Pariser (2011) warns that social media platforms' personalization algorithms can create "filter bubbles," where users only encounter information and perspectives that align with their existing beliefs and values. Meanwhile, Cinelli et al. (2021) observe that echo chambers can arise on social media platforms due to users' selective attention and content choices. Echo chambers reinforce existing beliefs and opinions, while also promoting the spread of false information.
REFERENCES Allington, D., Duffy, B., Wessely, S., Dhavan, N., & Rubin, J. (2021). Health-protective behaviour, social media usage and conspiracy belief during the COVID-19 public health emergency. Psychological Medicine, 51(10), 1763–9. Amado, B. G., Arce, R., & Fariña, F. (2015). Undeutsch hypothesis and Criteria Based Content Analysis: a meta-analytic review. The European Journal of Psychology Applied to Legal Context, 7(1), 3–12. American Psychological Association (2022). Misinformation and disinformation. https://www.apa.org/topics/journalism-facts/misinformation-disinformation. Askitas, N., Tatsiramos, K., & Verheyden, B. (2020). Lockdown strategies, mobility patterns and COVID-19. In Covid Economics-Vetted and Real-Time Papers (pp. 293–302). The Centre for Economic Policy Research (CEPR). Auxier, B. (2020). 64% of Americans say social media have a mostly negative effect on the way things are going in the U.S. today. Washington, D.C.: Pew Research Center. Badr, H. S., Du, H., Marshall, M., Dong, E., Squire, M. M., & Gardner, L. M. (2020). Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. The Lancet Infectious Diseases, 20(11), 1247–54. Bakir, V., & McStay, A. (2018). Fake news and the economy of emotions: problems, causes, solutions. Digital Journalism, 6(2), 154–75. Bandura, A. (2009). Social cognitive theory of mass communication. In J. Bryant & M. B. Oliver (eds), Media Effects (pp. 110–40). Routledge. Benson, T. (2020). Twitter bots are spreading massive amounts of Covid-19 misinformation. IEEE Spectrum 29. https://spectrum.ieee.org/twitter-bots-are-spreading-massive-amounts-of-covid-19-misinformation. Bergman, N. K., & Fishman, R. (2020). Mobility reduction and Covid-19 transmission rates. medRxiv, 2020–05. https://www.medrxiv.org/content/10.1101/2020.05.06.20093039v1. Bessi, A., & Ferrara, E. (2016). Social bots distort the 2016 US presidential election online discussion. First Monday, 21(11). https://firstmonday.org/ojs/index.php/fm/article/view/7090/5653. Bode, L., & Vraga, E. K. (2018). See something, say something: correction of global health misinformation on social media. Health Communication, 33(9), 1131–40. Bonita, R., Beaglehole, R., & Kjellström, T. (2006). Basic Epidemiology. World Health Organization. Brennen, J. S., Simon, F. M., Howard, P. N., & Nielsen, R. K. (2020). Types, sources, and claims of COVID-19 misinformation (doctoral dissertation, University of Oxford). https://doi.org/10.60625/risj-awvq-sr55. Broniatowski, D. A., Jamison, A. M., Qi, S., AlKulaib, L., Chen, T., Benton, A., … Dredze, M. (2018). Weaponized health communication: Twitter bots and Russian trolls amplify the vaccine debate. American Journal of Public Health, 108(10), 1378–84. Bruns, A., Harrington, S., & Hurcombe, E. (2020). ‘Corona? 5G? or both?’: the dynamics of COVID-19/5G conspiracy theories on Facebook. Media International Australia, 177(1), 12–29.
Camargo, C. Q., & Simon, F. M. (2022). Mis- and disinformation studies are too big to fail: six suggestions for the field’s future. Harvard Kennedy School (HKS) Misinformation Review, 3(5). https://doi.org/10.37016/mr-2020-106. Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–76. Centers for Disease Control and Prevention (2020). Knowledge and practices regarding safe household cleaning and disinfection for COVID-19 prevention—United States, May. Morbidity and Mortality Weekly Report, 69(23), 705–9. https://doi.org/10.15585/mmwr.mm6923e2. Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021). Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589(7840), 82–7. Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W., & Starnini, M. (2021). The echo chamber effect on social media. Proceedings of the National Academy of Sciences, 118(9), e2023301118. Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., Schmidt, A. L., … Scala, A. (2020). The COVID-19 social media infodemic. Scientific Reports, 10(1), 1–10. Conroy, N. K., Rubin, V. L., & Chen, Y. (2015). Automatic deception detection: methods for finding fake news. Proceedings of the Association for Information Science and Technology, 52(1), 1–4. Cuan-Baltazar, J. Y., Muñoz-Perez, M. J., Robledo-Vega, C., Pérez-Zepeda, M. F., & Soto-Vega, E. (2020). Misinformation of COVID-19 on the internet: infodemiology study. JMIR Public Health and Surveillance, 6(2), e18444. de Oliveira, N. R., Pisa, P. S., Lopez, M. A., de Medeiros, D. S. V., & Mattos, D. M. (2021). Identifying fake news on social networks based on natural language processing: trends and challenges. Information, 12(1), 38. Denzin, N. K. ([1970] 2017). The Research Act: A Theoretical Introduction to Sociological Methods. Routledge. Depoux, A., Martin, S., Karafillakis, E., Preet, R., Wilder-Smith, A., & Larson, H. (2020). The pandemic of social media panic travels faster than the COVID-19 outbreak. Journal of Travel Medicine, 27(3). https://doi.org/10.1093/jtm/taaa031. Deutsch, M., & Gerard, H. B. (1955). A study of normative and informational social influences upon individual judgment. The Journal of Abnormal and Social Psychology, 51(3), 629–36. Ecker, U. K., Lewandowsky, S., Cook, J., Schmid, P., Fazio, L. K., Brashier, N., … Amazeen, M. A. (2022). The psychological drivers of misinformation belief and its resistance to correction. Nature Reviews Psychology, 1(1), 13–29. Ferrara, E. (2020). What types of COVID-19 conspiracies are populated by Twitter bots? https://doi.org/10.48550/arXiv.2004.09531. Fletcher, R., Cornia, A., Graves, L., & Nielsen, R. K. (2018). Measuring the reach of “fake news” and online disinformation in Europe. Australasian Policing, 10(2), 25–34. Friggeri, A., Adamic, L., Eckles, D., & Cheng, J. (2014). Rumor cascades. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 101–10. Fronzetti Colladon, A. (2018). The semantic brand score. Journal of Business Research, 88, 150–60. Gallotti, R., Valle, F., Castaldo, N., Sacco, P., & De Domenico, M. (2020). Assessing the risks of ‘infodemics’ in response to COVID-19 epidemics. Nature Human Behaviour, 4(12), 1285–93. GDI (2019). The global disinformation index. https://www.disinformationindex.org/blog/2019-12-17-rating-disinformation-risk-the-gdi-approach-to-news-sites/. González, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779–82. Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (eds), Syntax and Semantics, Vol. 3: Speech Acts (pp. 41–58). Academic Press.
Guo, Z., Schlichtkrull, M., & Vlachos, A. (2022). A survey on automated fact-checking. Transactions of the Association for Computational Linguistics, 10, 178–206. Heuveline, P., & Tzen, M. (2021). Beyond deaths per capita: comparative COVID-19 mortality indicators. BMJ Open, 11(3), e042934. Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1(1), 104–21. Howard, P. N. (2020). Lie Machines: How to Save Democracy from Troll Armies, Deceitful Robots, Junk News Operations, and Political Operatives. Yale University Press. Howard, P. N., Woolley, S., & Calo, R. (2018). Algorithms, bots, and political communication in the US 2016 election: the challenge of automated political communication for election law and administration. Journal of Information Technology & Politics, 15(2), 81–93. Jamison, A. M., Broniatowski, D. A., Dredze, M., Sangraula, A., Smith, M. C., & Quinn, S. C. (2020). Not just conspiracy theories: vaccine opponents and proponents add to the COVID-19 ‘infodemic’ on Twitter. Harvard Kennedy School Misinformation Review, 1. https://misinforeview.hks.harvard.edu/article/not-just-conspiracy-theories-vaccine-opponents-and-pro-ponents-add-to-the-covid-19-infodemic-on-twitter/. Johns Hopkins University (2021). COVID-19 testing. https://coronavirus.jhu.edu/testing. Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88(1), 67–85. Jones, J. (2020). Americans struggle to navigate COVID-19 ‘infodemic’. Gallup Poll. https://news.gallup.com/poll/310409/americans-struggle-navigate-covid-infodemic.aspx. Kass, N. E. (2001). An ethics framework for public health. American Journal of Public Health, 91(11), 1776–82. Kim, J., & Hastak, M. (2018). Social network analysis: characteristics of online social networks after a disaster. International Journal of Information Management, 38(1), 86–96. Kim, J., & Kwan, M. P. (2021). The impact of the COVID-19 pandemic on people’s mobility: a longitudinal study of the US from March to September of 2020. Journal of Transport Geography, 93, 103039. Kouzy, R., Abi Jaoude, J., Kraitem, A., El Alam, M. B., Karam, B., Adib, E., … Baddour, K. (2020). Coronavirus goes viral: quantifying the COVID-19 misinformation epidemic on Twitter. Cureus, 12(3), e7255. https://doi.org/10.7759/cureus.7255. Kulke, S. (2020). Social media contributes to misinformation about COVID-19. Northwestern Now. https://news.northwestern.edu/stories/2020/09/social-media-contributes-to-misinformation-about-covid-19/. Kumar, K. P., & Geethakumari, G. (2014). Detecting misinformation in online social networks using cognitive psychology. Human-centric Computing and Information Sciences, 4(1), 1–22. Kuran, T., & Sunstein, C. R. (1998). Availability cascades and risk regulation. Stanford Law Review, 51, 683–768. Lazer, D. M., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., … Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094–6. Lee, D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–91. Lee, D., & Seung, H. S. (2000). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13. https://proceedings.neurips.cc/paper_files/paper/2000/file/f9d1152547c0bde01830b7e8bd60024c-Paper.pdf. Lee, S. K., Sun, J., Jang, S., & Connelly, S. (2022). Misinformation of COVID-19 vaccines and vaccine hesitancy. Scientific Reports, 12(1), 13681. Lewandowsky, S., Ecker, U. K., & Cook, J. (2017). Beyond misinformation: understanding and coping with the “post-truth” era. Journal of Applied Research in Memory and Cognition, 6(4), 353–69.
Lewandowsky, S., Ecker, U. K., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106–31. Lwin, M. O., Lu, J., Sheldenkar, A., Schulz, P. J., Shin, W., Gupta, R., & Yang, Y. (2020). Global sentiments surrounding the COVID-19 pandemic on Twitter: analysis of Twitter trends. JMIR Public Health and Surveillance, 6(2), e19447. Malik, S., Smith, A., Hawes, T., Papadatos, P., Li, J., Dunne, C., & Shneiderman, B. (2013). TopicFlow: visualizing topic alignment of Twitter data over time. Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 720–26). https://doi.org/10.1145/2492517.2492639. McCornack, S. A., Morrison, K., Paik, J. E., Wisner, A. M., & Zhu, X. (2014). Information manipulation theory 2: a propositional theory of deceptive discourse production. Journal of Language and Social Psychology, 33(4), 348–77. Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5), 665–75. Nogara, G., Vishnuprasad, P. S., Cardoso, F., Ayoub, O., Giordano, S., & Luceri, L. (2022). The disinformation dozen: an exploratory analysis of Covid-19 disinformation proliferation on Twitter. 14th ACM Web Science Conference (pp. 348–58). https://doi.org/10.1145/3501247.3531573. Nunnally, J. C. (1978). Psychometric Theory (2nd edn). McGraw-Hill. Pariser, E. (2011). The Filter Bubble: What the Internet Is Hiding From You. Penguin UK. Pavliuc, A., George, A., Spezzano, F., Giachanou, A., Spaiser, V., & Bright, J. (2023). Multidisciplinary approaches to mis- and disinformation studies. Social Media + Society, 9(1). https://doi.org/10.1177/20563051221150405. Pennebaker, J. W., Boyd, R. L., Booth, R. J., Ashokkumar, A., & Francis, M. E. (2022). Linguistic Inquiry and Word Count: LIWC-22. Pennebaker Conglomerates. https://www.liwc.app. Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. https://repositories.lib.utexas.edu/bitstream/handle/2152/31333/LIWC2015_LanguageManual.pdf. Pennycook, G., McPhetres, J., Zhang, Y., Lu, J. G., & Rand, D. G. (2020). Fighting COVID-19 misinformation on social media: experimental evidence for a scalable accuracy-nudge intervention. Psychological Science, 31(7), 770–80. Pépin, L., Kuntz, P., Blanchard, J., Guillet, F., & Suignard, P. (2017). Visual analytics for exploring topic long-term evolution and detecting weak signals in company targeted tweets. Computers & Industrial Engineering, 112, 450–58. Pierri, F., Perry, B. L., DeVerna, M. R., Yang, K. C., Flammini, A., Menczer, F., & Bryden, J. (2022). Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal. Scientific Reports, 12(1), 5966. Rastogi, S., & Bansal, D. (2022). Disinformation detection on social media: an integrated approach. Multimedia Tools and Applications, 81(28), 40675–707. Resnick, P., Carton, S., Park, S., Shen, Y., & Zeffer, N. (2014). Rumorlens: a system for analyzing the impact of rumors and corrections in social media. Proceedings of the Computation + Journalism Symposium, 5(7). http://computation-and-journalism.brown.columbia.edu/. Roozenbeek, J., Schneider, C. R., Dryhurst, S., Kerr, J., Freeman, A. L., Recchia, G., … Van Der Linden, S. (2020). Susceptibility to misinformation about COVID-19 around the world. Royal Society Open Science, 7(10), 201199. Roozenbeek, J., & Van Der Linden, S. (2019). The fake news game: actively inoculating against the risk of misinformation. Journal of Risk Research, 22(5), 570–80. Sadowski, A., Galar, Z., Walasek, R., Zimon, G., & Engelseth, P. (2021). Big data insight on global mobility during the Covid-19 pandemic lockdown. Journal of Big Data, 8(1), 78.
Scheufele, D. A., & Tewksbury, D. (2007). Framing, agenda setting, and priming: the evolution of three media effects models. Journal of Communication, 57(1), 9–20. Shao, C., Ciampaglia, G. L., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature Communications, 9(1), 1–9. Shearer, E. (2021). More than eight-in-ten Americans get news from digital devices. Pew Research Center. https://www.pewresearch.org/short-reads/2021/01/12/more-than-eight-in-ten-americans-get-news-from-digital-devices/. Shrestha, M. (2018). Detecting fake news with sentiment analysis and network metadata. Earlham College. https://portfolios.cs.earlham.edu/wp-content/uploads/2018/12/Fake_News_Capstone.pdf. Shu, K., Wang, S., Lee, D., & Liu, H. (2020). Disinformation, Misinformation, and Fake News in Social Media. Springer International. Singh, L., Bansal, S., Bode, L., Budak, C., Chi, G., Kawintiranon, K., … Wang, Y. (2020). A first look at COVID-19 information and misinformation sharing on Twitter. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7280879/. Starbird, K. (2019). Disinformation’s spread: bots, trolls and all of us. Nature, 571(7766), 449–50. Starbird, K., Arif, A., & Wilson, T. (2019). Disinformation as collaborative work: surfacing the participatory nature of strategic information operations. Proceedings of the ACM on Human–Computer Interaction, 3(CSCW), 1–26. Sulyok, M., & Walker, M. (2020). Community movement and COVID-19: a global study using Google’s community mobility reports. Epidemiology & Infection, 148, e284. Swire-Thompson, B., & Lazer, D. (2019). Public health and online misinformation: challenges and recommendations. Annual Review of Public Health, 41(1), 433–51. Sylvia Chou, W. Y., Gaysynsky, A., & Cappella, J. N. (2020). Where we go from here: health misinformation on social media. American Journal of Public Health, 110(S3), S273–S275. Takko, T., Bhattacharya, K., & Kaski, K. (2023). Modeling exposure between populations using networks of mobility during Covid-19. https://doi.org/10.48550/arXiv.2301.03663. Towers, S., Afzal, S., Bernal, G., Bliss, N., Brown, S., Espinoza, B., … Castillo-Chavez, C. (2015). Mass media and the contagion of fear: the case of Ebola in America. PloS One, 10(6), e0129179. Undeutsch, U. (1967). Beurteilung der glaubhaftigkeit von aussagen. In P. Lersch (ed.), Handbuch der Psychologie, Vol. 11: Forensische Psychologie (pp. 26–181). Hogrefe. Van Der Linden, S. (2022). Misinformation: susceptibility, spread, and interventions to immunize the public. Nature Medicine, 28(3), 460–67. Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146–51. Wang, Y., McKee, M., Torbica, A., & Stuckler, D. (2019). Systematic literature review on the spread of health-related misinformation on social media. Social Science & Medicine, 240, 112552. Wardle, C., & Derakhshan, H. (2017). Information Disorder: Toward an Interdisciplinary Framework for Research and Policymaking. Council of Europe. https://rm.coe.int/information-disorder-report-november-2017/1680764666. Woolley, S. C., & Howard, P. N. (2016). Political communication, computational propaganda, and autonomous agents: introduction. International Journal of Communication, 10, 4882–90. Woolley, S. C., & Howard, P. N. (eds) (2018). Computational Propaganda: Political Parties, Politicians, and Political Manipulation on Social Media. Oxford University Press.
World Health Organization. (n.d.). Infodemic. https://www.who.int/health-topics/infodemic. Zhao, L., Wang, Q., Cheng, J., Chen, Y., Wang, J., & Huang, W. (2011). Rumor spreading model with consideration of forgetting mechanism: a case of online blogging LiveJournal. Physica A: Statistical Mechanics and its Applications, 390(13), 2619–25.
Zhou, L., Burgoon, J. K., Nunamaker, J. F., & Twitchell, D. (2004). Automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications. Group Decision and Negotiation, 13, 81–106. Zhou, Z., Guan, H., Bhat, M. M., & Hsu, J. (2019). Fake news detection via NLP is vulnerable to adversarial attacks. https://doi.org/10.48550/arXiv.1901.09657. Zuckerman, M., DePaulo, B. M., & Rosenthal, R. (1981). Verbal and nonverbal communication of deception. In L. Berkowitz (ed.), Advances in Experimental Social Psychology, Vol. 14 (pp. 1–59). Academic Press.
PART III MEASURING EMOTIONS
7. Predicting YouTube success through facial emotion recognition of video thumbnails Peter-Duy-Linh Bui, Martin Feldges, Max Liebig and Fabian Weiland
1. INTRODUCTION Nowadays, YouTube is the second most visited website globally after Google (The top 500 sites on the web, 2021). YouTube attracts active users in over 75 countries (Holmbom, 2015), drawn by video categories such as music, sports, trailers, or gaming (Nier, 2018). Accordingly, YouTube offers a multi-faceted video database that covers the preferences of many viewers. Furthermore, the demand for these videos creates incentives for businesses to place advertisements, which in turn triggers a steadily increasing number of new content creators on YouTube (YouTubers) (Holmbom, 2015). However, depending on viewer preferences, discrepancies in popularity arise among YouTubers (Holmbom, 2015). Hence, we conclude that many factors influence the success of YouTube videos. Given the abundance of videos on offer, viewers rarely commit to a selection in advance and prefer short video previews, which make it easier to select videos according to their preferences (Song et al., 2016). Accordingly, YouTube offers the feature of uploading preview images, which present a first impression of the video content (Koh & Cui, 2022). These images are commonly referred to as thumbnails. YouTube provides guidelines for creating thumbnails; however, these guidelines can be interpreted in different ways, and YouTubers consequently use a wide variety of thumbnail designs. Nevertheless, many popular thumbnails share one characteristic: they contain faces with emotions (Shimono et al., 2020). Overall, 72 percent of the most popular videos on YouTube in 2020 featured an image of a human face (Digital Information World, 2020). Selecting the thumbnail type that leads to the highest video success may therefore be difficult. It can be assumed that YouTubers are interested in increasing their popularity and thus generating more attention for their videos. Utilizing human emotions in thumbnail images is one way to increase recognition. Studies indicate that using negative emotions in thumbnails leads to increased views (Digital Information World, 2020). In addition to emotions, other facial attributes can also increase perceived attractiveness. For instance, one study finds that feminine faces are perceived as more attractive in pictures than masculine faces (McLellan & McKelvie, 1993). Perceived attractiveness also decreases with increasing age, which holds for both sexes but more strongly for women's faces (McLellan & McKelvie, 1993). In terms of ethnicity, Stepanova and Strube (2018) found that faces mixing several ethnicities are perceived as more attractive than faces of a single ethnicity. Also, visually appealing people on social media are perceived as likable (Bradley et al., 2019). In the context of this chapter, the previously mentioned facial characteristics (emotion, ethnicity, age, and gender) are referred to as facial attributes. Based on the studies mentioned above, we presume a positive relationship between different characteristics of human faces and YouTube video success. Besides the video title, YouTube
video thumbnails are the primary reference viewers use when deciding whether to watch a particular video. Therefore, the following research question arises: How do facial attributes within their corresponding thumbnails influence the success of YouTube videos?
2. RELATED WORK
The main related research focuses on automatic thumbnail extraction from videos (Song et al., 2016; Shimono et al., 2020; Pretorious & Pillay, 2020; Zhang et al., 2014) and the development of prediction models for video popularity (Fontanini et al., 2016; Cremer, 2017; Koh & Cui, 2022). Most of the literature identifies thumbnails as one of the most important factors influencing the number of video views (Chang et al., 2019; Song et al., 2016; Shimono et al., 2020; Koh & Cui, 2022). In addition, the video title (Koh & Cui, 2022), the video content, and the number of views and likes are important factors (Chang et al., 2019). However, we found only a limited number of articles dealing with faces and their emotions in video thumbnails. Hence, we present three of these articles in more detail. Cremer (2017) analyzes the correlation between facial emotions in thumbnails and the number of views. Using artificial intelligence in thumbnail imagery analysis, he ascertains a positive correlation between increasing emotional complexity and the number of views. He observes an increasing number of views when faces with negative emotions are used; these emotions appear to significantly influence the number of views as they attract attention (Cremer, 2017). Koh and Cui (2022) investigate the connection between the view-through rate of videos – which indicates the proportion of completed views relative to the absolute number of views – and the characteristics of image elements such as text, products, persons, and celebrities. One of their results indicates that the view-through rate of videos correlates with the informativeness and visual attractiveness of thumbnails. On the one hand, they show that thumbnails correlate negatively with the view-through rate if they contain only human elements. On the other hand, the correlation is positive if the human elements are combined with text or objects. Furthermore, the presence of celebrities increases the view-through rate if they are used as a single design element. Shimono et al. (2020) develop and evaluate a method to automatically generate thumbnails using video frames and automatic text insertion. They utilize the emotion recognition API from Microsoft to create the thumbnail's background. However, they only consider the emotions happiness and surprise in the background because they suppose that “YouTube videos [are] cheerful and funny” (Shimono et al., 2020: 26). An API summarizing and displaying the video title is applied for the foreground design. Nevertheless, they find that a thumbnail designed by the video creator is more attractive than a frame simply taken from the video. In addition, they observe that images of people and products lead to more views. None of the aforementioned literature specifically describes the impact of facial attributes on a video's success, even though prior work suggests that successful thumbnails contain one (Shimono et al., 2020) or multiple faces (Cremer, 2017). We therefore conclude that this research area has not been sufficiently studied.
3. HYPOTHESES To guide our work and specify the research question, we derive the following hypotheses, which are examined against our findings in the course of this chapter: H1: There is a positive correlation between facial attributes in thumbnails and the success of YouTube videos. H2: There are differences between YouTube categories regarding the success of facial attributes. H3: It is possible to predict the success of a YouTube video based on facial attributes within its thumbnail.
4. METHOD We conduct our analysis according to the Cross-Industry Standard Process for Data Mining (CRISP-DM). This method represents a standard approach for data mining projects and helps to transform business problems into data mining tasks. According to Wirth and Hipp (2000), CRISP-DM contains the phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. Based on the CRISP-DM process, we derive our implementation steps (Figure 7.1), which are explained in more detail in the following sections.
4.1 Data Extraction
To create a comprehensive dataset for our project, data must be extracted from YouTube and transformed. We consider the 100 most successful German content creators on YouTube based on their number of subscribers (Nindo, 2021). Thus, we ensure a similar range of subscribers among the content creators. Furthermore, the restriction to the German market improves the comparability of individual videos. The video information is extracted from the publicly available YouTube API using a custom-built YouTube scraper. Table 7.1 provides an overview of the stored attributes for each video. The data extraction yields an overall dataset of 160,000 records fetched in January 2021. The attributes view_count and subscriber_count are employed to create our target feature (see Section 4.2). We apply a mapping function to translate the category_id to a specific category such as Entertainment, Sports, or Music. Then, we execute the main feature extraction process, which deals with processing faces in thumbnails. First, we download thumbnails from their corresponding thumbnail_url. Next, we extract clippings containing individual faces for each thumbnail using the Python face detection library MTCNN (Multi-Task Cascaded Convolutional Networks; Zhang et al., 2016). Subsequently, we determine specific attributes for each face. Our first approach focuses on the extraction of emotions only. Later, we find that additional attributes such as a person's ethnicity, age, and gender improve our predictive power (see Section 4.4). Therefore, these attributes are also analyzed for each face. For this purpose, we utilize the publicly available engine DeepFace, which offers facial attribute analysis (Serengil, 2018, 2019a, 2019b; Serengil & Ozpinar, 2020). We use DeepFace to determine the composition of the attributes described in Table 7.2. For emotions and ethnicities, DeepFace indicates the likelihood of each expression matching the face, with the probabilities summing to 1. A minimal sketch of this extraction pipeline is given after Table 7.2.
Figure 7.1 Executed implementation steps
Table 7.1 Overview of the extracted video attributes

Attribute – Description
video_id – Unique identifier for the video
youtube_channel_id – Unique identifier for each YouTube channel
view_count – Number of views
thumbnail_url – URL to the location of the video thumbnail
published_at – Video publish date and time
category_id – ID of the video category representing the genre of the video
subscriber_count – Number of subscribers of the corresponding YouTube channel
Table 7.2 Description of facial attributes

Attribute – Description
Emotion – Emotion expressed by the corresponding face. For each face, the engine provides percent matches for all emotions. We differentiate between the emotions Anger, Fear, Neutral, Sadness, Disgust, Happiness, and Surprise
Ethnicity – Ethnicity of the corresponding face. For each face, the engine provides percent matches for all ethnicities. We differentiate between the ethnicities Asian, Black, Indian, Latino/Hispanic, Middle Eastern, and White
Age – Age of the corresponding face (in years)
Gender – Gender of the corresponding face. We differentiate between the Male and Female gender
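The extraction step can be illustrated with the following minimal sketch. It assumes the mtcnn and deepface Python packages plus OpenCV for image loading; the function name, the confidence threshold, and the return format are our illustrative assumptions, not the authors' actual code (DeepFace's return structure also varies between versions).

```python
# A minimal sketch of the face-attribute extraction, assuming the mtcnn
# and deepface packages; names and thresholds are illustrative.
import cv2
from mtcnn import MTCNN
from deepface import DeepFace

detector = MTCNN()

def extract_face_attributes(thumbnail_path, min_confidence=0.90):
    """Detect faces in a thumbnail and analyze emotion, ethnicity, age, gender."""
    img = cv2.cvtColor(cv2.imread(thumbnail_path), cv2.COLOR_BGR2RGB)
    results = []
    for face in detector.detect_faces(img):
        if face["confidence"] < min_confidence:  # skip low-confidence faces (cf. Section 7)
            continue
        x, y, w, h = face["box"]
        clipping = img[max(y, 0):y + h, max(x, 0):x + w]
        # DeepFace returns per-class probabilities for emotion and race,
        # plus point estimates for age and gender
        analysis = DeepFace.analyze(
            clipping,
            actions=["emotion", "race", "age", "gender"],
            enforce_detection=False,  # the clipping is already a detected face
        )
        results.append(analysis)
    return results
```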
4.2 Data Preparation
This section describes all necessary processing steps to transform and evaluate the data collected. For this purpose, outliers are identified and removed. Then, the target feature, our Success Score, is designed. In the first step, we prepare our data by removing outliers related to an extraordinarily high number of views. We assume these videos went viral for reasons other than the thumbnail and corresponding facial attributes. Within this step, a total of 2,221 videos were removed. In the second step, we create a measure reflecting the thumbnail success (Success Score). First, we include the number of views since these are strongly influenced by a thumbnail (Chang et al., 2019; Song et al., 2016; Shimono et al., 2020; Koh & Cui, 2022). Since already popular channels with many subscribers receive more views naturally, we also include the component views/subscribers to relativize our Success Score. We also assume that successful thumbnails cause the number of views to deviate from the average number. A video having more views than the average number of views per channel is potentially a result of a superior thumbnail and vice versa. Thus, we include the component views/rolling average views. Considering the growing average number of views for each YouTuber over time, we use a rolling window for the average number of views (average number of views for the five previous and five subsequent videos). We also contemplate using the number of comments, likes
and dislikes, and various combinations of these attributes. However, these attributes did not show significant correlations with our input features. Comments, likes, and dislikes probably refer almost exclusively to a video's content, not the thumbnail. Therefore, we did not include any of these variables within our Success Score. Finally, we standardize the three components to ensure an equal weighting and add them up. We observe that the distribution of views is right-skewed. Additionally, since the views are included within each Success Score component, our Success Score also follows a right-skewed distribution. We assume YouTube success to be normally distributed. Therefore, we use a cube root function to handle the skewness. This representation of thumbnail success ranges from −7 (bad) to 11 (good) and is composed as follows:

\[
\text{Success Score} = \operatorname{standardize}\left(\sqrt[3]{\text{views}}\right) + \operatorname{standardize}\left(\sqrt[3]{\frac{\text{views}}{\text{subscribers}}}\right) + \operatorname{standardize}\left(\sqrt[3]{\frac{\text{views}}{\text{rolling average views}}}\right) \tag{7.1}
\]
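A hedged sketch of Equation (7.1) in pandas follows. It assumes a DataFrame with one row per video, sorted by publish date within each channel, and column names following Table 7.1; the centered window of 11 is our reading of "five previous and five subsequent videos".

```python
# A sketch of the Success Score (Eq. 7.1); column names follow Table 7.1
# and the DataFrame is assumed sorted by publish date within each channel.
import numpy as np
import pandas as pd

def standardize(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

def success_score(df: pd.DataFrame) -> pd.Series:
    # rolling average over the 5 previous + current + 5 subsequent videos
    rolling_avg = (
        df.groupby("youtube_channel_id")["view_count"]
          .transform(lambda v: v.rolling(window=11, center=True, min_periods=1).mean())
    )
    return (
        standardize(np.cbrt(df["view_count"]))
        + standardize(np.cbrt(df["view_count"] / df["subscriber_count"]))
        + standardize(np.cbrt(df["view_count"] / rolling_avg))
    )
```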
Figure 7.2 illustrates the distribution of our Success Score after every calculation step has been performed. It is approximately normally distributed with an expected value of zero (due to standardization) and a standard deviation of 2.7. We observe that the proportion of videos with faces in their thumbnails increases as the Success Score increases. For example, at a Success Score of −4.0, the proportion of faces is 55.2 percent, while at 4.0 it is 70.3 percent. However, the proportions at very low or very high Success Scores may not be representative, since very few observations fall into these ranges.
Figure 7.2 Distribution of the success score and proportion of thumbnails with faces
After creating the Success Score, a second data cleaning is executed. In this step, all videos with five or more faces are removed. These videos were nevertheless needed to calculate the Success Score, because we use a rolling average window for one of the three Success Score components. For further investigation, we consider these thumbnails as outliers: facial attributes in a thumbnail with five or more faces are difficult to recognize, since the faces are relatively small and thus have a negligible effect on the consumer. In total, 2,902 videos were removed. To ensure comparability, all input features (see Section 4.1) are normalized. Since emotion- and ethnicity-based features provide values for each face, we need to determine an aggregation for thumbnails with multiple faces. Therefore, we compare the use of the average and the sum for the emotions and ethnicities. We find that the Success Score correlates more strongly with the summed values for emotions and ethnicities, and we therefore select this form of aggregation (a sketch follows below). Finally, we remove sparsely represented video categories: categories containing less than 1 percent of the extracted videos are deleted. This way, we remove 22 categories and their corresponding 4,415 videos from our dataset.
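The following sketch illustrates this aggregation step, assuming a DataFrame with one row per detected face and probability columns for each emotion and ethnicity; all column names are illustrative.

```python
# A sketch of the per-thumbnail aggregation; column names are illustrative.
import pandas as pd

EMOTIONS = ["anger", "fear", "neutral", "sadness", "disgust", "happiness", "surprise"]
ETHNICITIES = ["asian", "black", "indian", "latino_hispanic", "middle_eastern", "white"]

def aggregate_thumbnail_features(faces: pd.DataFrame) -> pd.DataFrame:
    """Sum per-face probabilities per video (the sum correlated more
    strongly with the Success Score than the mean), then normalize."""
    agg = faces.groupby("video_id")[EMOTIONS + ETHNICITIES].sum()
    return (agg - agg.min()) / (agg.max() - agg.min())  # min-max normalization
```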
4.3 Data Analysis
After the data preparation, we continue investigating our dataset through descriptive analysis. Here, the goal is to provide insights into our data. To achieve this goal, we first study general information about our dataset (Table 7.3).

Table 7.3 General information about the dataset

Attribute – Value
Number of videos – 145,088
Average number of views – 356,256
Number of categories – 8
Percentage of faces – 58%
Number of men – 112,720
Number of women – 12,930
Another part of our descriptive analysis is to examine the feature distributions. Figure 7.3 shows the summed occurrences of each emotion. The emotion happiness is the most frequently represented, with more than 35,000 occurrences. Content creators on YouTube presumably utilize this emotion to associate positivity and a feeling of joy with their videos; this positive emotion may cause the audience to connect more deeply with the thumbnail protagonist, increasing the desire to watch the video. On the other hand, the emotions disgust and surprise occur fewer than 5,000 times each. Anger, sadness, neutral, and fear occur around 15,000 to 25,000 times each and are thus in the middle range. To compare differences in emotions between categories, we investigate the allocation of emotions within eight different video categories (Figure 7.4). Overall, there are no drastic differences between the categories. For example, happiness is the most common emotion within all categories, while disgust occurs seldom. The categories Cars & Vehicles and How To & Style show a very high proportion of happiness but a below-average proportion of sadness; the Music category shows the opposite pattern. Furthermore, the emotion surprise occurs rarely in the Cars & Vehicles and Sports categories.
Figure 7.3 Occurrences of emotions

Figure 7.4 Proportion of emotions per category
We also examine the occurrences of different ethnicities in our dataset (Figure 7.5). The largest proportion is White, with over 60,000 occurrences. All other ethnicities occur roughly equally often, with about 5,000 to 15,000 occurrences each. This observation is strongly influenced by the exclusive consideration of the German market. We then calculate correlations between our input features and our Success Score using the Pearson correlation. This type of correlation fits our use case because our Success Score is normally distributed and the Pearson correlation measures the linear relationship between two continuous variables.
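A minimal sketch of this correlation screening follows, assuming scipy and a DataFrame holding the normalized input features alongside the Success Score; the feature and column names are illustrative.

```python
# A sketch of the significance-filtered Pearson correlations; the
# DataFrame and feature names are illustrative assumptions.
from scipy.stats import pearsonr
import pandas as pd

def significant_correlations(videos: pd.DataFrame, features, alpha=0.05):
    """Return only correlations significant at the given level."""
    out = {}
    for feat in features:
        r, p = pearsonr(videos[feat], videos["success_score"])
        if p < alpha:  # significance level from the text (α = 0.05)
            out[feat] = r
    return out
```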
Figure 7.5 Occurrences of ethnicities and proportion of gender

Figure 7.6 Correlations between the Success Score and the input features for the categories
In the following, we only consider significant correlations, using a significance level of α = 0.05. These correlations are visualized in Figure 7.6, which includes the correlations for our whole dataset and for each category separately. We observe a positive relationship between
the use of faces in thumbnails and their success. However, there are substantial differences between the categories, which we describe and evaluate in the following. First, we investigate the relationship between a person's age and thumbnail success. The most successful German YouTubers do not deviate strongly in age, and we consider the average age for thumbnails with multiple faces. As a result, the age attribute spans a very short interval (20 to 30 years); thus, age correlates almost one-to-one with the presence of faces, and consequently age and the presence of faces correlate almost equally with the Success Score. In the following analysis, we therefore do not elaborate further on the attribute age, because its correlation mainly results from the presence of a face. Accordingly, it does not follow that a thumbnail is more successful the older the person is, even though the correlations initially suggest this. According to Figure 7.6, the highest positive correlation can be found in the category How To & Style (0.32) and the highest negative correlation in the category Sports (−0.13); both refer to the attribute Includes Faces. Additionally, we can see that the correlations vary between and within the categories. In the category Gaming, the emotions happiness and surprise and the ethnicity White are highly correlated with the Success Score. The success of White people is probably motivated by socio-cultural factors, because we only consider the German market. The features happiness and surprise have positive connotations and create associations such as joy and excitement, which may make gaming videos more attractive. In the category How To & Style, the presence of faces correlates strongly with the Success Score, so it is essential to use faces in this category. Specific facial attributes, on the other hand, are comparatively unimportant: make-up and beauty videos (part of the category How To & Style) may focus on the person's attractiveness and less on emotions. In this category, it is also noticeable that the presence of women correlates almost as strongly as the presence of men with the Success Score, which is not observed in many other categories. In the category Music, emotions are largely irrelevant, but ethnicities are not: thumbnails with White, Middle Eastern, or Latino/Hispanic men are generally successful. A possible explanation could be that viewers listen to both German and international music. In the categories Sports and Cars & Vehicles, all facial attributes are negatively correlated with the Success Score. Therefore, the presence of faces should be avoided in thumbnails; sports equipment, club emblems, or vehicles are probably more important than human faces in these categories. In the category Education, the emotion sadness has the highest correlation with the Success Score. Although this observation is unexpected, sadness could be put to good use in thumbnails. On the other hand, surprise correlates negatively with success and should probably be avoided. On average, thumbnails with Middle Eastern, Latino/Hispanic, or Indian men are particularly successful; it could be concluded that German viewers associate these ethnicities with high educational competence. Regarding the Entertainment category, the emotion surprise and the ethnicity Latino/Hispanic show the highest positive correlation with success. Therefore, these attributes could be well applied.
Surprise is probably successful because uncertainty builds tension, which is perceived as attractive in the Entertainment category. However, the other facial attributes have only a minor influence on success. In the People & Blogs category, the number of faces correlates notably with the Success Score. Consequently, thumbnails with multiple faces are more successful than thumbnails with
only a single face. This is probably because viewers are more likely to be interested in interactions between numerous people when watching videos about People & Blogs. Regarding ethnicities, Asian and Latino/Hispanic people show the highest correlations with the Success Score; within the People & Blogs category, there is probably a stronger interest in different cultures and geographies.
4.4 Predictive Analysis
Instead of giving general recommendations for using facial attributes in thumbnails, we aim to predict the success of individual thumbnails based on these factors. Therefore, we build and train a deployable machine learning model to predict our Success Score for given thumbnails. Since we only consider face-related attributes, we perform an additional data cleaning step and remove all videos that do not include human faces in their corresponding thumbnails; accordingly, our model cannot predict the success of thumbnails without faces. After this step, our dataset consists of 81,395 thumbnail instances. In Section 4.3, we observed low correlations between our input features and the target feature; therefore, we do not exclude any features from the prediction. We start our predictive analysis by splitting the dataset, following common proportions: Training Set (60 percent), Validation Set (10 percent), and Test Set (30 percent). All models are trained using the training set and evaluated using the test set, and we tune the models' hyperparameters utilizing the validation set. We begin the prediction with a regression, as the Success Score is a numerical label for our data. We then apply different modeling approaches, whose performances are listed in Table 7.4.
Table 7.4 Accuracy of the regression models

Model – R² Score – Mean Absolute Error (MAE)
Linear Regression – 0.1273 – 2.0729
Polynomial Regression – 0.1403 – 2.0510
XGBoost – 0.1522 – 2.0335
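The regression setup can be sketched as follows, assuming scikit-learn and xgboost, with X holding the facial-attribute features and y the Success Score; the hyperparameter values are illustrative, not the tuned values behind Table 7.4.

```python
# A sketch of the 60/10/30 split and the XGBoost regression; X and y are
# assumed to hold the facial-attribute features and the Success Score.
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from xgboost import XGBRegressor

X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.6, random_state=42)
# 0.75 of the remaining 40 percent yields the 30 percent test set
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.75, random_state=42)

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1)  # illustrative values
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("R²:", r2_score(y_test, pred), "MAE:", mean_absolute_error(y_test, pred))
```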
Our models explain 12.73 to 15.22 percent of the Success Score's variance, with the XGBoost algorithm as the best-performing model (R² Score of 0.1522). Specifically, we perform hyperparameter tuning using a GridSearch for the XGBoost-based model: a model is trained for every combination of a finite number of predefined hyperparameter values, and we select the model instance that performs best on the validation set with regard to the R² Score. Our second approach consists of multiple regressions based on individual video categories. Thereby, we create a separate model for each category previously used as an input feature; the resulting models are based solely on facial attributes as input features. However, since each per-category regression model performed worse than the first approach, we did not elaborate further on this approach. A classification approach is used to obtain higher accuracy and more interpretable results. In this step, we transform our numerical Success Score into three equally sized categories based on the number of thumbnails within each category: Unsuccessful, Neutral, and Successful. We compare two different classification models: while our Logistic Regression reaches an accuracy of
0.4583, the accuracy of our XGBoost is 0.4881. We measure the models' performance based on the accuracy metric, representing the ratio of correct classifications to the total number of classifications. We observe that the XGBoost-based model is our best-performing model.
Figure 7.7 Confusion matrix based on the XGBoost classification
As explained in our first approach, we also use GridSearch to tune the XGBoost hyperparameters and thus improve our model accuracy. Figure 7.7 shows the detailed Confusion Matrix for our XGBoost-based classification model, which is evaluated using the test set. The number of correctly classified instances is shown on the diagonal from top left to bottom right. Out of 24,419 values, 11,918 (48.8 percent) were correctly predicted. It is noticeable that our model is more likely to classify thumbnails as successful than unsuccessful. A possible reason for predicting more thumbnails as successful is that facial attributes positively correlate with the Success Score in most categories. Another reason could be that only thumbnails with faces were used in the model.
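A hedged sketch of the classification approach with validation-set tuning follows, reusing the splits from the regression sketch above; the tertile construction via pd.qcut, the shared index between X and y, and the parameter grid are our illustrative assumptions.

```python
# A sketch of the tertile classification with validation-set tuning;
# splits, X, and y are assumed from the regression sketch above.
import pandas as pd
from sklearn.model_selection import ParameterGrid
from sklearn.metrics import confusion_matrix
from xgboost import XGBClassifier

# Three equally sized classes: Unsuccessful (0), Neutral (1), Successful (2)
labels = pd.qcut(y, q=3, labels=[0, 1, 2]).astype(int)

best_acc, best_clf = -1.0, None
for params in ParameterGrid({"max_depth": [3, 4, 6], "n_estimators": [100, 300],
                             "learning_rate": [0.05, 0.1]}):
    clf = XGBClassifier(**params).fit(X_train, labels.loc[X_train.index])
    acc = clf.score(X_val, labels.loc[X_val.index])  # select on the validation set
    if acc > best_acc:
        best_acc, best_clf = acc, clf

# Confusion matrix on the held-out test set (cf. Figure 7.7)
print(confusion_matrix(labels.loc[X_test.index], best_clf.predict(X_test)))
```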
5. RESULTS In this section, we present the results of our work and address the hypotheses stated in Section 3. Hypothesis H1 can be conditionally confirmed. As shown in Figure 7.6, some facial attributes in YouTube thumbnails correlate moderately positively with the success of the videos. For example, Happiness and Surprise have a correlation of 0.10 with our Success Score. These two emotions are part of the "Top 4 positive emotions" (Harvard Business Review, 2015), indicating their high potential for thumbnail success. Happiness is also the most common emotion, while surprise is rarely used. Therefore, we particularly recommend using the underused emotion surprise to increase the attractiveness of a thumbnail.
In addition, White people and the presence of women in thumbnails each showed a correlation of 0.11 with the Success Score. That White people increased thumbnail success the most might be motivated by socio-cultural factors: the target group, and thus the audience, of our extracted video data is the German market, which is probably most attracted by people of the same ethnicity. However, these correlations are not remarkable; one potential reason is that various other factors, addressed in Section 7, influence the success of a YouTube video. Some attributes, such as Disgust, Anger, Sadness, Indian, or Black, show no or only a weak correlation with the Success Score. In general, the presence of faces in thumbnails is positively correlated with the success of the videos, with a correlation coefficient of 0.13. Hypothesis H2 can be confirmed. We compared different video categories regarding the success of different facial attributes (see Section 4.3). There are categories such as How To & Style, Gaming, Education, and Music where certain facial attributes in thumbnails correlate positively with video success, but there are large differences across these categories. For example, while the emotion surprise performs well in gaming-related videos, it exhibits a negative correlation with success in the Education category. Conversely, sadness is one of the most successful emotions in educational videos, while it does not correlate with success in the Gaming category. People's ethnicities also show varying success across categories: White is highly correlated with success in the Gaming category, while Asian performs best in the People & Blogs category and Latino/Hispanic in the Music category. The correlations for the attribute Includes Faces also confirm hypothesis H2. Except for the category Sports, there is a positive correlation between the attribute Includes Faces and success in all categories. While the correlation in the category Sports is −0.13, the highest correlation is found in the category How To & Style with 0.32. Our answer to hypothesis H2 emphasizes the complexity of the relationship between facial attributes and thumbnail success in different categories. Hypothesis H3 can be conditionally confirmed. Our machine learning models predict the success of thumbnails to a certain extent. For this purpose, a regression model was developed that explains 15 percent of the variance in success. In addition, a classification model was developed that correctly predicts whether a thumbnail is Unsuccessful, Neutral, or Successful with an accuracy of 49 percent. The modest predictive performance can probably be explained by the influence of many other factors (besides facial attributes) on thumbnail success. We expected the prediction to have limited performance, which was confirmed during our work by the low correlations between the Success Score and the facial attributes investigated beforehand.
6. DISCUSSION This work investigates the interrelation between facial attributes in YouTube thumbnails and thumbnail success. We provide proposals for using facial attributes in each of the major video categories and train a prediction model to predict thumbnail success based on facial attributes.
Thus, we complement existing research, which focuses mainly on finding indicators for creating thumbnails that gain more attention (Koh & Cui, 2022). Koh and Cui (2022) examined various thumbnail properties in detail and investigated the effects of certain attributes on the number of views; in this context, facial attributes such as age, gender, or ethnicity were dealt with only to a limited extent. Our findings about emotions differ from Cremer's (2017) research, which finds that negative emotions are more prevalent and have a greater influence on thumbnail success. In contrast to Cremer (2017), we find that positive emotions such as happiness and surprise perform best. Cremer (2017) already revealed increased attention when emotions are used; we confirm this finding in our hypothesis H1 and contribute to the research by extending his work, which deals with the relation between complexity dimensions (visual, emotional, and social) in thumbnails and the number of views. We thus differ from previous work by investigating which facial attributes, such as age, gender, ethnicity, and emotion, can lead to more success. The interrelation between facial features and the success of a thumbnail had not previously been researched with respect to individual video categories; Koh and Cui (2022) only consider video categories in their study on optimizing thumbnails to attract more users. In our work, we evaluate the differences between YouTube categories regarding the success of facial attributes and find that facial attributes lead to higher success in specific categories. By partially confirming hypothesis H3, we extend the research of Shimono et al. (2020), who focus on a method that generates more effective thumbnails with facial expressions. This also coincides with Cremer (2017), who predicts the popularity of hedonic digital content with artificial intelligence. Furthermore, we have developed a machine learning model that classifies thumbnail success based on facial attributes. Thus, we expand existing research by considering facial attributes and success in more detail. This discussion highlights a clear contribution to existing research on the influence of facial attributes on the success of a YouTube video.
7. LIMITATIONS AND FUTURE WORK
In the following, we describe our limitations and explain how future work can address or complement them. Our results are mainly limited by the data preparation and thumbnail analysis process. Our findings may be biased due to the sample of 160,000 thumbnails from the top 100 most popular German content creators on YouTube. The dataset could be extended to the international market to generalize our propositions and improve the predictive power. This approach would also allow the analysis of cultural or regional differences regarding thumbnail success (Zhang et al., 2021). The Success Score was developed to measure the success of YouTube thumbnails, but it is potentially not the most precise approach. The click-through rate (CTR), which represents the ratio between views and impressions, is an essential indicator of a thumbnail's attractiveness (Wilson, 2019). Due to Application Programming Interface restrictions from YouTube, it is impossible to obtain this figure, since it is only accessible to the particular YouTube channel owner. Further research could cooperate with these video creators.
First, such cooperation would enable our approach to extract the CTR, which would probably be a more precise label for thumbnail quality than our custom Success Score; our prediction model could then probably be more accurate. In addition, our label does not consider the time since the publication of the videos. Chowdhury and Makaroff (2013) show that the number of views increases most strongly during the first days after publication. It is possible that newly published videos received a low number of views for that reason alone, which may have negatively influenced the validity of our Success Score. To mitigate this effect, we did not include each channel's five most recent videos in the dataset. This issue of relative time restriction could be reduced in future research by only considering videos with a specific uptime. The data quality in facial analysis depends on the accuracy of the applied face detection and the extraction of facial attributes. Consequently, it is possible that faces are not detected or are incorrectly identified. We counter this inaccuracy by only utilizing faces with a detection confidence of 90 percent or higher; faces below this threshold often have a poor resolution or are indistinct, which negatively influences our facial attribute recognition. The accuracy of the applied facial attribute recognition affects both our descriptive and our predictive analysis, and its inaccuracy could partially explain the low correlation and predictive accuracy results. An open problem is the bias of deep learning models such as Convolutional Neural Networks (CNNs), which can result in overfitted models (Thom & Hand, 2020). However, new algorithms could better handle existing face detection challenges such as pose, facial attributes, or occlusion (Yang et al., 2002). Furthermore, it may be possible to determine additional emotions and ethnicities. The feature extraction is limited to facial attributes, but additional characteristics may influence thumbnail success. Therefore, the feature extraction could be extended to include further attributes such as charisma or body language. For example, there is already research on detecting and classifying attractive hairstyles (Nakamae et al., 2020), which could be used to extend the input features. Using non-personal features regarding the colors and text used in thumbnails is also possible. In addition to the thumbnail itself, further studies could investigate the video title, which is also essential when browsing video platforms and influences the number of views (Hoiles et al., 2016; Koh & Cui, 2022). Finally, the impact of "sexually appealing thumbnails" (mostly referred to as "clickbait") on the success of videos could be considered in future work (Zannettou et al., 2018). All of these additional factors probably influence the success of YouTube videos, which is why the correlation between facial attributes and the Success Score is probably weak in some cases. The video success prediction could also be improved with the supplementary use of the mentioned features.
ACKNOWLEDGMENT We greatly appreciate the help from Professor Peter Gloor, Center for Collective Intelligence at MIT’s Sloan School of Management. Not only did he provide us with regular guidance on our research topic, but he also gave us the opportunity to present our work at the 10th international COINs conference.
REFERENCES

Bradley, S. W., Roberts, J. A., & Bradley, P. W. (2019). Experimental evidence of observed social media status cues on perceived likability. Psychology of Popular Media Culture, 8(1), 41–51. https://doi.org/10.1037/ppm0000164.
Chang, W.-L., Chen, L.-M., & Verkholantsev, A. (2019). Revisiting online video popularity: a sentimental analysis. Cybernetics and Systems, 50(6), 563–77. https://doi.org/10.1080/01969722.2019.1646012.
Chowdhury, S., & Makaroff, D. (2013). Popularity growth patterns of YouTube videos: a category-based study. WEBIST 2013 – Proceedings of the 9th International Conference on Web Information Systems and Technologies, 233–42.
Cremer, S. (2017). Predicting popularity of hedonic digital content via artificial intelligence imagery analysis of thumbnails. http://aisel.aisnet.org/pacis2017/186 (accessed 10 December 2022).
Digital Information World (2020). The perfect thumbnail is key to YouTube success, study finds. Digital Information World, October 27. https://www.digitalinformationworld.com/2020/10/the-perfect-thumbnail-is-key-to-youtube.html (accessed 5 December 2020).
Fontanini, G., Bertini, M., & del Bimbo, A. (2016). Web video popularity prediction using sentiment and content visual features. Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 289–92. https://doi.org/10.1145/2911996.2912053.
Harvard Business Review (2015, September 1). Why some videos go viral. https://hbr.org/2015/09/why-some-videos-go-viral (accessed 10 December 2020).
Hoiles, W., Aprem, A., & Krishnamurthy, V. (2016). Engagement dynamics and sensitivity analysis of YouTube videos. ArXiv E-Prints, 1–12. https://doi.org/10.48550/arXiv.1611.00687.
Holmbom, M. (2015). The YouTuber: a qualitative study of popular content creators. Institutionen För Informatik, Umeå University. http://umu.diva-portal.org/smash/get/diva2:825044/FULLTEXT01.pdf (accessed 10 December 2020).
Koh, B., & Cui, F. (2022). An exploration of the relation between the visual attributes of thumbnails and the view-through of videos: the case of branded video content. Decision Support Systems, 160, 113820. https://doi.org/10.2139/ssrn.3611735.
McLellan, B., & McKelvie, S. J. (1993). Effects of age and gender on perceived facial attractiveness. Canadian Journal of Behavioural Science/Revue Canadienne des Sciences du Comportement, 25(1), 135–42. https://doi.org/10.1037/h0078790.
Nakamae, Y., Wang, X., & Yamasaki, T. (2020). Recommendations for attractive hairstyles. Proceedings of the 2020 Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia, 19–24. https://doi.org/10.1145/3379173.3393709.
Nier, H. (2018). Infografik: Musik und Comedy sind bei Youtube am beliebtesten. Statista. https://de.statista.com/infografik/12526/beliebteste-videokategorien-bei-youtube/ (accessed 1 December 2020).
Nindo (2021). Socialmedia Charts & Statistiken. Nindo. https://nindo.de/ (accessed 1 December 2020).
Pretorious, K., & Pillay, N. (2020). A comparative study of classifiers for thumbnail selection. 2020 International Joint Conference on Neural Networks (IJCNN), 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206951.
Serengil, S. I. (2018). Facial expression recognition with keras. Sefiks. https://sefiks.com/2018/01/01/facial-expression-recognition-with-keras/ (accessed 15 November 2020).
Serengil, S. I. (2019a). Apparent age and gender prediction in keras. Sefiks. https://sefiks.com/2019/02/13/apparent-age-and-gender-prediction-in-keras/ (accessed 15 November 2020).
Serengil, S. I. (2019b). Race and ethnicity prediction in keras. Sefiks. https://sefiks.com/2019/11/11/race-and-ethnicity-prediction-in-keras/ (accessed 15 November 2020).
Serengil, S. I., & Ozpinar, A. (2020). LightFace: a hybrid deep face recognition framework. 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), 1–5. https://doi.org/10.1109/ASYU50717.2020.9259802.
Shimono, A., Kakui, Y., & Yamasaki, T. (2020). Automatic YouTube-thumbnail generation and its evaluation. Proceedings of the 2020 Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia, 25–30. https://doi.org/10.1145/3379173.3393711.
Song, Y., Redi, M., Vallmitjana, J., & Jaimes, A. (2016). To click or not to click. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 659–68. https://doi.org/10.1145/2983323.2983349.
Stepanova, E. V., & Strube, M. J. (2018). Attractiveness as a function of skin tone and facial features: evidence from categorization studies. The Journal of General Psychology, 145(1), 1–20. https://doi.org/10.1080/00221309.2017.1394811.
The top 500 sites on the web (2021). Alexa. https://www.alexa.com/topsites (accessed 1 December 2020).
Thom, N., & Hand, E. M. (2020). Facial attribute recognition: a survey. In K. Ikeuchi (ed.), Computer Vision: A Reference Guide (pp. 1–13). Springer International.
Wilson, L. (2019). Clickbait works! The secret to getting views with the YouTube algorithm. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3369353.
Wirth, R., & Hipp, J. (2000). CRISP-DM: towards a standard process model for data mining. Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, 1, 29–39.
Yang, M.-H., Kriegman, D. J., & Ahuja, N. (2002). Detecting faces in images: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1), 34–58. https://doi.org/10.1109/34.982883.
Zannettou, S., Chatzis, S., Papadamou, K., & Sirivianos, M. (2018). The good, the bad and the bait: detecting and characterizing clickbait on YouTube. 2018 IEEE Security and Privacy Workshops (SPW), 63–69. https://doi.org/10.1109/SPW.2018.00018.
Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499–503. https://doi.org/10.1109/LSP.2016.2603342.
Zhang, S., Aktas, T., & Luo, J. (2021). Mi YouTube es su YouTube? Analyzing the cultures using YouTube thumbnails of popular videos. 2021 IEEE International Conference on Big Data (Big Data), 4999–5006. https://doi.org/10.1109/BigData52589.2021.9672037.
Zhang, W., Liu, C., Wang, Z., Li, G., Huang, Q., & Gao, W. (2014). Web video thumbnail recommendation with content-aware analysis and query-sensitive matching. Multimedia Tools and Applications, 73(1), 547–71. https://doi.org/10.1007/s11042-013-1607-5.
8. Do angry musicians play better? Measuring emotions of jazz musicians through body sensors and facial emotion detection Lee J. Morgan and Peter A. Gloor
1. INTRODUCTION

Emotions play a core role in the decisions humans make every day. It therefore comes as no surprise that the study of emotions and their recognition is a heavily researched area. Indeed, studies have found that in the workplace, emotions affect engagement and creativity, with positive emotions, like happiness, increasing workforce performance (Amabile et al., 2005; Amabile & Kramer, 2011). The analysis of emotions has also taken a more computational turn, as datasets, machine learning and deep learning models that can be used for recognizing emotions have become increasingly available. Indeed, researchers have used emotion recognition techniques to enhance safe driving by recording driver emotions (De Nadai et al., 2016), note the emotions of mental health patients in unobtrusive ways (Guo et al., 2013) and detect lies (Verschuere et al., 2006). A related concept – flow – has been introduced by Csikszentmihalyi (1990). Flow is the mental state of elation resulting from being energized and immersed in an activity while fully enjoying it. More recently, the flow concept has been extended from a state defined for individuals to groups. In particular, jazz musicians, their physiological signals and their group behavior have been researched. Studies have used sociometric badges to gauge the "group flow" of jazz musicians, which occurs when a group of people is in an extremely focused and productive state (Gloor et al., 2013). Another study used functional magnetic resonance imaging (fMRI) data, which measures brain activity through analyzing blood flow, to gauge jazz improvisation (Donnay et al., 2014). Emotion recognition software typically uses one method to gauge emotions. Some studies use speech data (Kerkeni et al., 2019) or physiological signals (Ali et al., 2018). Other studies use facial emotions, but these, along with speech data, can be biased and skewed by people masking their true emotions. Physiological signals cannot be easily controlled, and thus it is easier to accurately measure emotions using physiological sensors (Shu et al., 2018). Besides the problem of human subjects skewing their facial and speech data, emotions are extremely complicated, and using a single method or metric to gauge them might not be sufficiently accurate (Qiu et al., 2018). Thus, some studies investigate multimodal methods to better classify people's emotions (for example, predicting emotions through both facial and physiological data). Indeed, an analysis of 50 multimodal sentiment analysis papers concluded that they resulted in more accurate emotional predictions, while also finding much variation in the methods used in each paper (Ullah et al., 2017). The dynamics between the modalities used in multimodal sentiment analysis have also been explored (Zadeh et al., 2017).
Despite the recent research in multimodal emotion recognition, there is limited research when it comes to analyzing the inter-modal dynamics of one entire group. Most studies of multimodal sentiment analysis and the dynamics within it look at an individual. Other studies have looked at the inter-modal dynamics between two groups of people (Gloor et al., 2019). Motivated by the advances in multimodal sentiment analysis, this chapter investigates the interactions between the facial and physiologically gauged emotions in one group of individuals, in this case, jazz musicians. The contributions of this chapter are as follows: We collected data from a four-hour jazz rehearsal where an orchestra of 30 students at a music college was rehearsing for a concert with a world-famous musician. The orchestra included string players, percussionists, singers and dancers, ten of whom were equipped with smartwatches to collect their body signals, while a camera was recording the emotions shown in their faces. We then analyzed the correlations between the different results for each emotional prediction method, comparing the emotions predicted by the physiological signals to those based on facial features. Moreover, we investigated the correlations between emotions and physical indicators, like heart rate and speech volume. Finally, we created machine learning models to predict the intensity of certain facial emotions using physiological signals and emotions as inputs.
2. THEORETICAL BACKGROUND
The topic of human emotions has been widely explored by psychologists, but there has yet to be a consensus on the proper framework for emotion classification (Shu et al., 2018). Indeed, many different models have emerged, and they can be divided into dimensional and discrete models. One of the most prominent discrete models, developed by Paul Ekman and his colleagues, divides emotions into six categories: happiness, sadness, anger, fear, surprise, and disgust (Ekman & Friesen, 1971). According to this theory, all other emotions are derived from these basic categories. A wheel model has also been proposed by Plutchik, composed of eight basic emotions: joy, trust, fear, surprise, sadness, disgust, anger and anticipation (Plutchik, 2001). As in Ekman's model, other emotions are mixtures of these basic emotions. Dimensional models, on the other hand, create a two- or three-dimensional graphic representation of emotions, with the axes representing different factors that play into emotions. For instance, the Circumplex model of affect has two axes: valence and arousal (Posner et al., 2005). Valence represents the positivity or negativity of a response, and arousal represents how active a certain emotion is. For example, boredom would have extremely low arousal and moderately low valence. One of the emotions that has been under the most scrutiny is happiness, as many models try to define happiness and what determines this emotion. Robertson and Cooper (2011), for example, explain that happiness is based on the well-being of the mind and body, while Frey and Stutzer (2018) posit that happiness is attained when one meets a personal goal. In many cases, emotions – and happiness specifically – are measured through surveys. Experimental subjects are offered questionnaires to fill out based on their satisfaction with
their life, which is then used to draw conclusions on the impact of different emotions in many areas, like productivity or creativity (Robertson & Cooper, 2011). However, besides being costly and difficult to implement, these surveys are plagued by cognitive biases (Kahneman & Krueger, 2006). Emotions are typically demonstrated through language, physiology and facial expressions. Among these, facial and speech data are some of the most powerful indicators of emotions (Ambady & Weisbuch, 2010; Rule & Ambady, 2010). Physiological changes are also indicators of emotional changes (Purves et al., 2001). It is important to note that each modality has its own strengths and weaknesses, as well as specific situations in which it can be deployed (Egger et al., 2019). Machine learning has been deployed for predicting emotions from the modalities presented above. For example, in facial emotion recognition, both neural networks and traditional algorithms have been used. Traditional algorithms typically extract geometric and appearance information from the face and utilize this information as features. For example, Ghimire and Lee (2013) used the position and angle of 52 points on the human face as features, which were fed into a multi-class AdaBoost and a support-vector machine. Another study ran a principal component analysis with local binary pattern histograms of different block sizes from the faces as features to predict emotion (Happy et al., 2012). Other studies in the area of facial recognition use neural networks. Popular models in this field are Long Short Term Memory models, a type of Recurrent Neural Network, and Convolutional Neural Networks (CNNs), a type of Deep Neural Network. For example, one study employed two CNNs for sentiment analysis, one extracting appearance features and the other extracting geometric features of the face (Jung et al., 2015). In another analysis, it was found that a hybrid Recurrent and Convolutional Neural Network outperformed a CNN running on its own (Kahou et al., 2013). Lastly, Jain et al. (2017) used a Long Short Term Memory model along with a CNN to label facial expressions as part of a multi-angle optimal pattern-based deep learning method. Physiological-based emotion recognition, on the other hand, tends to be restricted to simpler machine learning algorithms, like random forests (Ali et al., 2018). This could be a result of a lack of datasets with many labeled points containing physiological and emotional data – datasets which are much more common in the field of facial emotion recognition. Research regarding physiological signals and their use for emotion prediction employs an abundance of different metrics to predict emotions, like heart rate, movement, blood pressure and body temperature. As a result, different methods have emerged for emotion recognition with body-based signals. One study, for example, employed an "Emotive Couch," a piece of smart furniture that detected the movement of the human body to classify emotions (Rus et al., 2018). Other studies utilized wearable devices that contain sensors, like smartwatches. In those cases, researchers used these wearable sensors to measure pulse, blood pressure, blood oxygen, temperature and other physiological metrics in their relation with emotions (Khan & Lawo, 2016). Another study used a wrist band which measured heart rate and some of its derived features for sentiment analysis (Nguyen et al., 2017).
Yet another study used a chest-worn badge that measured acceleration, sound level, temperature and location for sentiment analysis (Yano et al., 2015). Lastly, the Happimeter, which is used in this chapter, uses an accelerometer sensor, step counter, heart rate sensor, microphone, position sensor and some derived features (Roessler & Gloor, 2020).
In some cases, the results of emotion classification algorithms can be reported back to their subjects in a process called "virtual mirroring." In earlier instances of virtual mirroring, workers were shown their email habits, and as a result they changed their behavior to become more productive and innovative (Gloor et al., 2017). Virtual mirroring has also succeeded in the field of emotion recognition: when subjects received happiness feedback, their happiness increased in turn (Roessler & Gloor, 2020). As mentioned earlier, most sentiment analysis studies have used unimodal data, but the use of multimodal methods, where multiple modalities are utilized, has become increasingly popular. An important topic in multimodal sentiment analysis is the exploration of inter-modal dynamics, where the interactions between each modality are documented (Marechal et al., 2019). Research which tries to track inter-modal dynamics has been conducted (Zadeh et al., 2017); however, most of it focuses on the individual level. It is important to move beyond individual-level analysis, though, as research shows that one person's expression of their emotions can impact the emotions of another – for instance, the facial expression of an individual can change the mood of another (Ekman et al., 1980). There has been limited exploration of the inter-modal dynamics within a larger group of people. One study has analyzed the dynamics between a group of actors and an audience, but each group was provided with different sensors, making it harder to look into the dynamics between each modality for only one group (Gloor et al., 2019). This chapter aims to investigate the inter-modal dynamics within one group of people, specifically by comparing emotions extracted from a group's facial expressions and physiological signals.
3. METHODOLOGY

3.1 Data Collection
Table 8.1 Summary of the collected data from the rehearsal

Heart rate: The smartwatch measured the performers' heart rates in beats per minute (BPM).
Microphone: The smartwatch microphone recorded the noise level throughout the performance.
Movement: The sum of the absolute values of the smartwatch's accelerometer values in the X, Y and Z directions.
Standard Deviation: The standard deviation of the performers' movement values as measured during the rehearsal.
Physiological Emotions: The Happimeter's machine learning model predicted the intensity of the performers' levels of activation, pleasance and stress; each emotion was labeled 0, 1 or 2.
Facial Emotions: Video of the participants' faces was fed into a CNN that classified faces as happy, sad, angry, surprised, neutral or fearful.
We collected physiological and facial data from different performers, namely string players, percussionists, singers and dancers. This data was collected at a four-hour jazz rehearsal where 30 college music students rehearsed for a concert with a prominent musician in their field. The Happimeter app (Budner et al., 2017) was installed on smartwatches given to the participants; it measured their physiological signals and gathered their activation, pleasance and stress levels, determined by a machine learning model that used
physiological signals as its features. Moreover, a video camera was used to capture the faces of the participants throughout the duration of the experiment. Due to the limited number of smartwatches, along with privacy concerns, only ten of the participants had their data collected (Table 8.1).

3.2 Model Implementation
3.2.1 Facial emotion recognition
In this experiment, we used the facial emotion recognition (FER) algorithm developed by Gloor et al. (2019). This model was trained on four different datasets, including CK+ and JAFFE. It employs a VGG16 (Simonyan & Zisserman, 2014) CNN that was pre-trained on ImageNet. To extract the faces from the video data, the Python face_recognition package was used to identify faces. The extracted faces were then labeled by the FER model described above.

3.2.2 Physiological emotion recognition
We also employed the Happimeter app's machine learning model for physiological sentiment analysis (Budner et al., 2017). The model took physiological and environmental features into account for its predictions. Using Scikit-learn's (Pedregosa et al., 2011) gradient boosting algorithm, the Happimeter achieved a prediction accuracy of 79 percent after training on data collected from users over a three-year period. In our experiment, we used the Happimeter app on the smartwatches to label the performers' levels of activation, pleasance and stress with values of 0, 1 or 2.

3.2.3 Correlation analysis
After applying a rolling-window calculation (simple moving average) to smooth out short-term fluctuations in the data and expose long-term trends, we conducted a correlation analysis using the Pearson correlation coefficient. Specifically, we calculated correlations between facial and physiological emotions, as well as between emotional intensity and the physiological variables.

3.2.4 Cross-modal predictions
Lastly, we used the physiological data and the emotions extracted from it to predict the intensity of certain facial expressions. To do so, a random forest regression was trained for each facial emotion, each taking in the same features: heart rate, microphone values, activation, pleasance and stress levels, and the average and standard deviation of movement. To measure the importance of variables in the random forest regression, we used the mean decrease in impurity mechanism. This analysis was based on the Scikit-learn library (Pedregosa et al., 2011).
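As a concrete illustration of the face extraction and correlation steps (Sections 3.2.1 and 3.2.3), the following is a minimal sketch rather than our actual implementation: the file name, column names, window size and the assumption that the FER outputs and Happimeter readings have already been exported as one time-aligned table are all illustrative.

```python
# Minimal sketch, not the authors' implementation. Assumes a hypothetical
# CSV ("rehearsal_aligned.csv") with one row per time step containing the
# FER emotion intensities and the Happimeter signals, already aligned.
import face_recognition
import pandas as pd
from scipy.stats import pearsonr

def extract_faces(frame):
    """Detect faces in a video frame (numpy array) and return face crops."""
    boxes = face_recognition.face_locations(frame)  # (top, right, bottom, left)
    return [frame[t:b, l:r] for t, r, b, l in boxes]

df = pd.read_csv("rehearsal_aligned.csv")

# Simple moving average (rolling window) to smooth short-term fluctuations.
smoothed = df.rolling(window=60, min_periods=1).mean()

# Pearson correlations between facial and physiological emotions.
for facial in ["happy", "sad", "angry", "fear", "surprise", "neutral"]:
    for physio in ["activation", "pleasance", "stress"]:
        r, p = pearsonr(smoothed[facial], smoothed[physio])
        print(f"{facial:8s} vs {physio:10s}: r = {r:+.2f} (p = {p:.3f})")
```

The rolling mean suppresses frame-level noise so that the Pearson coefficients reflect the longer-term trends discussed in the results.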
4. RESULTS

Figure 8.1 shows the moving-average levels of some facial emotions (happiness, sadness, anger, fear) and physiological emotions (activation, pleasance, stress) over a two-hour rehearsal period. Activation, pleasance and stress steadily rose throughout the duration of the rehearsal, while the facial emotions were more volatile.
Figure 8.1
Emotions from the Happimeter and the FER model
Unsurprisingly, activation, pleasance and stress have strong positive correlations, with r > 0.75*** for each pairing of the emotions.1 This reflects the trend seen in Figure 8.1 and implies that as the performers become more activated, they feel more stressed and pleasant at the same time, which might be an indicator of group flow, a state that has also been called "eustress" or positive stress. Another interesting result is the insignificant correlation between the FER happiness and the Happimeter's pleasance metric (r = −0.018), suggesting a disconnect between the two; this could be because the FER captured angry faces while the musicians were experiencing eustress. Moreover, anger tended to have positive, significant correlations with activation, pleasance and stress, with r > 0.50*** for each of these pairings. In addition, the FER sadness and happiness emotions had a significant but weak positive correlation (r = 0.19***). As fear increased, activation, pleasance, stress and anger tended to decrease (r between −0.13*** and −0.24***). Some intuitive results were the negative correlations of the neutral emotion with activation, pleasance, stress, anger and happiness, with correlation values ranging from −0.34*** to −0.62***. As Figure 8.2 shows, the intensity of facial emotions and the Happimeter sensors display erratic behavior, with some spikes but no general trends throughout the rehearsal, apart from a steady rise in anger levels after an initial rapid fall and a relatively static heart rate level. The Pearson correlations nevertheless demonstrate many significant relationships. For example, the average movement of the performers negatively correlated with their surprise and happiness levels, while it positively correlated with their anger, fear and sadness levels. Each of these correlations was statistically significant. Heart rate had a small, positive correlation with surprise (r = 0.08*), meaning that as people were more surprised, their heart rates tended to increase as well. Heart rate also tended to increase as neutrality did (r = 0.47***) and decrease as anger and happiness increased (r = −0.41*** and −0.12**, respectively). The volume of
Figure 8.2
Emotions from the FER model and metrics from the Happimeter
the performance increased as the surprise and happiness levels increased as well (r = 0.21*** and 0.41***). On the other hand, fear and neutrality decreased as the volume increased (r ≈ −0.25*** for both). Lastly, the standard deviation of the performers' movement, which could represent how in sync they were throughout the performance, had notable negative correlations with fear, happiness and sadness. It also tended to increase as the anger level of the performers increased, suggesting that as they were less in sync they tended to appear more angry (r = 0.32***).

Table 8.2 Results of random forest regression predicting facial emotion intensity

Emotion Predicted    R-Square    MAE
Anger                0.9671      0.00596
Fear                 0.9381      0.00078
Happiness            0.9866      0.00069
Sadness              0.9849      0.00545
Neutrality           0.9741      0.00548
Surprise             0.9688      0.00665

Note: MAE = mean absolute error.
In addition to calculating correlations, we also ran a random forest regression with movement, heart rate, standard deviation of movement, microphone measurements, and activation, pleasance and stress levels as features. The labels were the intensity of facial emotions. As shown in Table 8.2, the regressions performed very well, with R-Square scores ranging from 93 percent to 98 percent, and MAE ranging from 0.0007 to 0.006. This suggests that these features are powerful for identifying facial emotions, as large percentages of the variance of facial emotion
intensity can be explained by the random forest we used. In addition, we calculated the mean decrease in impurity for the features in the regressions to see which had the greatest influence.
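The following is a minimal sketch of this regression and the impurity-based importances, reusing the hypothetical smoothed DataFrame from the earlier sketch; the feature column names and hyperparameters are illustrative assumptions, not our exact settings.

```python
# Minimal sketch of the cross-modal random forest regression; column names
# and hyperparameters are assumptions for illustration.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

features = ["heart_rate", "microphone", "activation", "pleasance",
            "stress", "movement_mean", "movement_std"]
X, y = smoothed[features], smoothed["angry"]  # repeated for each facial emotion

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
print("R-Square:", r2_score(y_test, pred))
print("MAE:     ", mean_absolute_error(y_test, pred))

# Mean decrease in impurity, as visualized in Figure 8.3.
for name, importance in zip(features, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```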
Figure 8.3
Feature importances in random forest regression
Figure 8.3 shows the importance of each feature taken from the Happimeter for determining the intensity of facial emotions. The feature importances indicate how much each feature contributes when predicting emotional intensity, revealing which pieces of Happimeter data have strong relationships with emotions.
5. DISCUSSION

5.1 Physiological vs Facial Emotions
One of the most important results to emerge from the relationships between facial and physiological emotion recognition concerns anger, activation, pleasance and stress. Anger had strong, significant correlations with activation, pleasance and stress, which are indicators of flow or eustress, because flow is a state where one feels happy to do an activity, although stressed while completing it, and highly activated throughout the endeavor. Anger's positive correlations with the indicators of flow suggest that angry faces were captured by the FER during a period of flow. Thus, the FER "anger" emotion might not really be capturing angry faces, but instead the faces of performers who are in the state of flow. This hypothesis is further supported by the correlation between anger and sadness, as sad faces among musicians might also indicate the flow state.
Figure 8.4
Significant correlations between Happimeter (left) and facial emotions (right)
Other interesting relationships between group flow and facial emotions are present as well, with all significant correlations shown in Figure 8.4. Fear, for example, had significant negative correlations with activation, pleasance, stress and anger, implying that the more the performers were experiencing eustress, the less fear they felt. The flow indicators also negatively correlated with surprise and neutrality, again suggesting that as flow increased, the performers appeared less surprised or neutral. Furthermore, they had insignificant and weak correlations with happiness, implying that happy faces have no identifiable relationship with flow. Activation and pleasance did not correlate with sadness, whereas anger and stress had negative and positive correlations with sadness, respectively. This indicates that sadness increases as stress
does, suggesting that stress is an important component of sadness or vice versa. In addition, the more the performers appeared angry, which could hint at group flow, the less sad they appeared. The happiness facial emotion also demonstrated interesting relationships. It did not correlate with pleasance, anger or surprise, negatively correlated with neutrality and stress, and positively correlated with sadness. Most of these relationships are counterintuitive, which could signal that the performers did not demonstrate their happiness facially throughout the performance, instead showing other emotions, like anger, while feeling pleasant.

5.2 Physiological Signals vs Facial Emotions
There are also significant relationships between the facial emotions and physiological signals, which are shown in Figure 8.5. Anger, which could indicate eustress, had positive correlations with movement and the standard deviation of movement, and a negative correlation with the heart rate of the performers. It had no significant or notable correlation with the microphone values. Again, since anger can indicate group flow, this would imply that as the performers had higher levels of flow, they moved more and their magnitude of movement differed more as well. In addition, their heart rates tended to decrease as group flow increased.
Figure 8.5
Correlations between physiological signals and facial emotions
The standard deviation of the performers' movement is negatively correlated with the fear, happiness and sadness levels of the performers. When they got out of sync (as expressed through the standard deviation of their movement), their anger and surprise increased, and they started to fidget around more. Conversely, as the amount of movement between the performers became more similar, they appeared to be more happy, sad and fearful. It positively
correlated with surprise and anger, hinting that as the movement began to vary more, the anger and surprise levels increased. Movement in general negatively correlated with surprise and happiness, while it positively correlated with sadness, fear and anger. This hints that as the performers moved less, they appeared to be more happy or surprised, while appearing less sad, fearful or angry. Moving the body more appears to increase the musicians' felt pleasance, while at the same time their stress levels also go up. So, the more actively the musicians play their instruments, the more they get into the state of positive eustress. The volume of audio that the microphone picked up increased as the fear and neutrality of the performers decreased, pointing out that during quieter portions of the performance, their faces showed higher levels of those emotions. On the other hand, volume positively correlated with surprise, sadness and happiness, indicating that during louder portions of the performance, faces appeared to be more sad, surprised and happy. The correlation of the musicians' surprise and happiness with the volume of the music they produce indicates that playing louder makes the musicians happier.

5.3 Random Forest Regression
Lastly, the feature importances in the random forest indicate how much the physiological emotions and signals contribute when determining the facial emotion levels. This gives insight into non-monotonic relationships, something Pearson's correlation cannot capture, and highlights important connections between emotions and physiological signals. For anger, the most important determinant was the activation of the performers, suggesting that activation has a strong relationship with "angry" faces; this makes sense because activation is part of flow, which might be indicated by angry faces. For predicting fear, stress had the highest importance, at around 0.7, which indicates that stress is closely related to fear. For happiness, the standard deviation of the performers' movement and the sound level were the most important features, implying that the volume of the music and voices, along with the performers' synchronization, strongly relate to happiness. Surprisingly, for the neutral emotion, pleasance had the highest importance by far, hinting at a strong relationship between the two. A more intuitive relationship is that between surprise and its most important feature, stress. Lastly, the most important features for sadness were activation, stress, and the microphone values.
6. CONCLUSIONS AND FUTURE WORK
One of our insights is that the "anger" and "sad" emotions in the facial emotion recognition system might not really indicate anger or sadness, but instead demonstrate group flow. The higher the heart rate, the lower both anger and happiness. This seems to indicate that when people are in flow, their hearts beat more slowly. This is a very early result, which will need much further investigation. In order to classify emotions more accurately, later research should include flow as one of the emotions in the FER; in fact, we are working on adding the "flow" emotion to our FER. Moreover, facial recognition algorithms mostly tend to perform worse for some ethnic groups, which could also be a problem for facial emotion recognition. Thus, developing a FER that performs well across different ethnic groups is necessary as well. Other limitations include the small sample size: only ten performers had their data
recorded, and the experiment spanned just one rehearsal. Lastly, we did not compare the traits of the performers with external variables, like audience satisfaction, which could indicate the quality of the performance and help us understand which emotions help the artists perform better. Overall, this chapter describes early research on measuring the emotions of musicians with the goal of improving their experience. It illustrates the usefulness of the chosen approach of multimodal emotion analysis. It addresses relationships between facial emotion recognition, physiological emotion recognition and physiological signals, examining what these relationships have to do with flow or eustress. Information about the collected data could be virtually mirrored back to performers to help them find ways to increase their flow and happiness, leading to more pleasant and productive rehearsals. Our end goal is to produce a system that helps performers understand how they feel while they are working, in order to increase productivity while further exploring inter-modal dynamics.
ACKNOWLEDGMENTS We thank Lydia Renold for helping us with the data collection, and the Berklee College of Music as well as the participating musicians for agreeing to share their sensor data.
NOTE

1. The asterisks in the text indicate significance levels: *** = significant at the 0.001 level; ** = significant at the 0.01 level; * = significant at the 0.05 level.
REFERENCES

Ali, M., Mosa, A. H., Al Machot, F., & Kyamakya, K. (2018). Emotion recognition involving physiological and speech signals: a comprehensive review. In K. Kyamakya et al. (eds), Recent Advances in Nonlinear Dynamics and Synchronization (pp. 287–302). Cham: Springer.
Amabile, T. M., Barsade, S. G., Mueller, J. S., & Staw, B. M. (2005). Affect and creativity at work. Administrative Science Quarterly, 50(3), 367–403.
Amabile, T., & Kramer, S. (2011). The Progress Principle: Using Small Wins to Ignite Joy, Engagement, and Creativity at Work. Boston, MA: Harvard Business Review Press.
Ambady, N., & Weisbuch, M. (2010). Nonverbal behavior. In S. T. Fiske, D. T. Gilbert, & G. Lindzey (eds), Handbook of Social Psychology, Vol. 1, 5th edn (pp. 464–97). Hoboken, NJ: John Wiley & Sons.
Budner, P., Eirich, J., & Gloor, P. A. (2017). "Making you happy makes me happy": measuring individual mood with smartwatches. arXiv preprint arXiv:1711.06134.
Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. New York: Harper & Row.
De Nadai, S., D'Incà, M., Parodi, F., Benza, M., Trotta, A., Zero, E., ... & Sacile, R. (2016). Enhancing safety of transport by road by on-line monitoring of driver emotions. 11th System of Systems Engineering Conference (SoSE) (pp. 1–4). IEEE.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural substrates of interactive musical improvisation: an fMRI study of 'trading fours' in jazz. PLoS One, 9(2), e88665.
Egger, M., Ley, M., & Hanke, S. (2019). Emotion recognition from physiological signal analysis: a review. Electronic Notes in Theoretical Computer Science, 343, 35–55.
Ekman, P., & Friesen, W. V. (1971). Constants across cultures in the face and emotion. Journal of Personality and Social Psychology, 17(2), 124.
Ekman, P., Friesen, W. V., & Ancoli, S. (1980). Facial signs of emotional experience. Journal of Personality and Social Psychology, 39(6), 1125–34.
Frey, B. S., & Stutzer, A. (2018). Economics of Happiness. New York: Springer International.
Ghimire, D., & Lee, J. (2013). Geometric feature-based facial expression recognition in image sequences using multi-class adaboost and support vector machines. Sensors, 13(6), 7714–34.
Gloor, P. A., Araño, K. A., & Guerrazzi, E. (2019). Measuring audience and actor emotions at a theater play through automatic emotion recognition from face, speech, and body sensors. Collaborative Innovation Networks Conference of Digital Transformation of Collaboration (pp. 33–50). Cham: Springer.
Gloor, P., Colladon, A. F., Giacomelli, G., Saran, T., & Grippa, F. (2017). The impact of virtual mirroring on customer satisfaction. Journal of Business Research, 75, 67–76.
Gloor, P. A., Oster, D., & Fischbach, K. (2013). JazzFlow—analyzing "group flow" among jazz musicians through "honest signals". KI-Künstliche Intelligenz, 27(1), 37–43.
Guo, R., Li, S., He, L., Gao, W., Qi, H., & Owens, G. (2013). Pervasive and unobtrusive emotion sensing for human mental health. 7th International Conference on Pervasive Computing Technologies for Healthcare and Workshops (pp. 436–9). IEEE.
Happy, S. L., George, A., & Routray, A. (2012). A real time facial expression classification system using local binary patterns. 4th International Conference on Intelligent Human Computer Interaction (IHCI) (pp. 1–5). IEEE.
Jain, D. K., Zhang, Z., & Huang, K. (2017). Multi angle optimal pattern-based deep learning for automatic facial expression recognition. Pattern Recognition Letters, 139, 157–65. https://doi.org/10.1016/j.patrec.2017.06.025.
Jung, H., Lee, S., Yim, J., Park, S., & Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. Proceedings of the IEEE International Conference on Computer Vision (pp. 2983–91). IEEE.
Kahneman, D., & Krueger, A. B. (2006). Developments in the measurement of subjective well-being. Journal of Economic Perspectives, 20(1), 3–24.
Kahou, S. E., Pal, C., Bouthillier, X., Froumenty, P., Gülçehre, Ç., Memisevic, R., ... & Mirza, M. (2013). Combining modality specific deep neural networks for emotion recognition in video. Proceedings of the 15th ACM International Conference on Multimodal Interaction (pp. 543–50). ACM.
Kerkeni, L., Serrestou, Y., Mbarki, M., Raoof, K., Mahjoub, M. A., & Cleder, C. (2019). Automatic speech emotion recognition using machine learning. Social Media and Machine Learning. IntechOpen. https://doi.org/10.5772/intechopen.84856.
Khan, A. M., & Lawo, M. (2016). Developing a system for recognizing the emotional states using physiological devices. 12th International Conference on Intelligent Environments (IE) (pp. 48–53). IEEE.
Marechal, C., Mikołajewski, D., Tyburek, K., Prokopowicz, P., Bougueroua, L., Ancourt, C., & Węgrzyn-Wolska, K. (2019). Survey on AI-based multimodal methods for emotion detection. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11400, 307–24.
Nguyen, N. T., Nguyen, N. V., Tran, M. H. T., & Nguyen, B. T. (2017). A potential approach for emotion prediction using heart rate signals.
9th International Conference on Knowledge and Systems Engineering (KSE) (pp. 221–6). IEEE. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … & Vanderplas, J. (2011). Scikit-learn: machine learning in Python. The Journal of Machine Learning Research, 12, 2825–30. Plutchik, R. (2001). The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist, 89(4), 344–50. Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17(3), 715–34. Purves, D., Augustine, G., Fitzpatrick, D., Katz, L., LaMantia, A., McNamara, J., & Williams, S. (2001). Neuroscience, 2nd edn. Sunderland, MA: Sinauer Associates.
Qiu, J. L., Liu, W., & Lu, B. L. (2018). Multi-view emotion recognition using deep canonical correlation analysis. In International Conference on Neural Information Processing (pp. 221–31). Cham: Springer.
Robertson, I., & Cooper, C. (2011). Well-being: Productivity and Happiness at Work. Basingstoke: Palgrave Macmillan.
Roessler, J., & Gloor, P. A. (2020). Measuring happiness increases happiness. Journal of Computational Social Science, 4, 1–24. https://doi.org/10.1007/s42001-020-00069-6.
Rule, N., & Ambady, N. (2010). First impressions of the face: predicting success. Social and Personality Psychology Compass, 4(8), 506–16.
Rus, S., Joshi, D., Braun, A., & Kuijper, A. (2018). The emotive couch - learning emotions by capacitively sensed movements. Procedia Computer Science, 130, 263–70.
Shu, L., Xie, J., Yang, M., Li, Z., Li, Z., Liao, D., … & Yang, X. (2018). A review of emotion recognition using physiological signals. Sensors, 18(7), 2074.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. https://doi.org/10.48550/arXiv.1409.1556.
Ullah, M. A., Islam, M. M., Azman, N. B., & Zaki, Z. M. (2017). An overview of multimodal sentiment analysis research: opportunities and difficulties. 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR) (pp. 1–6). IEEE.
Verschuere, B., Crombez, G., Koster, E., & Uzieblo, K. (2006). Psychopathy and physiological detection of concealed information: a review. Psychologica Belgica, 46(1–2), 99–116.
Yano, K., Akitomi, T., Ara, K., Watanabe, J., Tsuji, S., Sato, N., … Moriwaki, N. (2015). Profiting from IoT: the key is very-large-scale happiness integration. Symposium on VLSI Technology (VLSI Technology) (pp. C24–C27). IEEE.
Zadeh, A., Chen, M., Poria, S., Cambria, E., & Morency, L. P. (2017). Tensor fusion network for multimodal sentiment analysis. https://doi.org/10.48550/arXiv.1707.07250.
9. Using plants as biosensors to measure the emotions of jazz musicians Anushka Bhave, Fritz K. Renold and Peter A. Gloor
1. INTRODUCTION

Emotions play an indispensable role in decision-making and collaboration among humans. Emotion recognition is the process of identifying human emotions from data collected about an individual or groups of individuals (Zhang et al., 2020). Such data include facial configurations and expressions, textual sentiments (Kerkeni et al., 2019), voice, granular data amassed from wearable devices (Ali et al., 2018), or even neurological data obtained from brain–computer interfaces (Czarnocki, 2021). The detection and analysis of emotions leveraging recent developments in artificial intelligence has seen progressive advancements using multimodal datasets, machine learning, and state-of-the-art deep learning models. Scientific research has led to applications of emotion recognition in tasks such as monitoring the mental health of patients (Guo et al., 2013), driving vehicles safely (Nadai et al., 2016), and ensuring social security in public places (Verschuere et al., 2006). Nevertheless, during formal social situations, humans might hide or try to control their true emotions. For example, a person showing a smile can in reality be in a gloomy mood. Similarly, even speech data may yield skewed results (Shu et al., 2018). On the contrary, biological signals gathered from sensors are predominantly involuntarily activated and cannot be easily manipulated (Shu et al., 2018). Data collection for emotion recognition has often been associated with privacy concerns. The collection of facial emotion data, physiological data, or audio data for studying emotions has been considered an invasive form of surveillance (Czarnocki, 2021). The accumulated data can be confidential and contain sensitive information, making the data collection process intrusive for human subjects and potentially violating General Data Protection Regulation (GDPR) rules. Plants are known to respond to movement and sound (Mancuso and Viola, 2015). Plants can discern electrostatic changes in their environment, like the walking of a person or, in our case, the body movement of musicians (Oezkaya and Gloor, 2020). For instance, striking the drum foot pedal or tapping one's feet during the musical performance will trigger a response in the plant. Plants also show spikes in their action potential as a reaction to sound played in the background (Peter, 2021). The electrical signals of the plant can then be recorded, for instance using a Plant Spikerbox (https://backyardbrains.com/products/plantspikerbox), and analyzed to decipher how plants react to specific frequencies and rhythms. Figure 9.1 shows a Plant Spikerbox connected to a basil plant, which has been used in the experiments described in this chapter. Using plants as biosensors is a promising solution for data acquisition, as it avoids GDPR concerns because the data obtained do not contain any private information. Furthermore, incorporating plants in the set-up for human emotion sensing and measurement is affordable, suitable for tracking over lengthy periods of time, and contact-free. Even more,
Figure 9.1
Basil plant as a movement and sound sensor
it has repeatedly been shown that interaction with plants is beneficial to human health (Sacks, 2018). Studies have shown that using one metric to gauge emotions can be inadequate, whereas employing multimodal techniques for labeling human emotions can yield more precise results (Qiu et al., 2018). A survey of 50 multimodal sentiment analysis papers concluded that they resulted in more accurate emotion predictions (Ullah et al., 2017). The mathematical relationships between the modalities used in sentiment analysis have also been investigated (Zadeh et al., 2017). However, explorations of multimodal emotion prediction are rarer in the field of intermodal dynamics among teams (Gloor et al., 2019). Current research in multimodal emotion prediction chiefly consists of studying an individual or the intermodal dynamics between two different groups. Motivated by this gap, in this chapter we investigate ways of measuring collective group emotions of humans using electrical signals from plants, facial emotions, and body signals. The primary contributions of this chapter are: 1. We collected data from a two-hour jazz rehearsal session where an orchestra of 19 musicians at the Jazzaar festival (www.jazzaar.com) was rehearsing for an upcoming live-streamed concert (Figure 9.2). The group consisted of percussionists, string players, wind instrumentalists, trumpeters, and vocal singers as well as the audience. In the
orchestra, there were two world-famous saxophone players, a star drum player, a few professional musicians, and high school students. 2. We inspected the correlations among the trimodal emotion prediction data, which included facial emotions, plant MFCCs (Mel-Frequency Cepstral Coefficients; a feature extraction sketch follows below), heart rate, and body movement of the musicians measured with a smartwatch. We further performed a regression analysis to predict plant response from the human emotions measured with facial emotion recognition (FER). 3. We constructed machine learning models to predict the emotions of jazz musicians using the features extracted from the plant's electrical signals.
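To illustrate what extracting MFCC features from the plant signal can look like, here is a minimal sketch; the use of librosa, the file name, and the assumption that the SpikerBox recording was exported as a WAV file are illustrative, not necessarily the toolchain used in the experiment.

```python
# Minimal sketch of MFCC feature extraction from the plant's electrical
# signal, assuming a hypothetical WAV export of the SpikerBox recording.
import librosa

signal, sr = librosa.load("plant_recording.wav", sr=None)  # keep native rate
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
print(mfccs.shape)  # (13, n_frames): one 13-dimensional vector per frame
```

Each frame's 13-dimensional vector can then be aligned with the smartwatch and FER time series for the correlation and regression analyses.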
Figure 9.2
Orchestra of jazz musicians playing diverse musical instruments
2. THEORETICAL BACKGROUND

2.1 Facial Emotions
Psychologists have built several frameworks for understanding and classifying human emotions. Emotion classification, that is, how we distinguish one emotion from another, is approached by researchers from two different viewpoints. The first asserts that emotions are discrete and fundamentally disparate constructs, and the second states that emotions can be characterized on a dimensional basis (Colombetti, 2009). Paul Ekman identifies six basic emotions categorized as anger, disgust, sadness, happiness, fear, and surprise (Ekman and Friesen, 1971). According to this theory, other complex emotions are derived from these fundamental emotions. Plutchik's wheel of emotions lists eight core emotions: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation (Plutchik, 2001).
Dimensional models of emotion like the Circumplex model developed by James Russell (Posner et al., 2005) suggest that emotions are distributed in a two-dimensional circular space, incorporating arousal and valence dimensions. Arousal represents the vertical axis and valence represents the horizontal axis, while the center of the circle depicts a neutral valence and a medium level of arousal. Emotional states can be represented at any level of valence and arousal, or at a neutral level of one or both of these factors. Circumplex models have been used most commonly to test stimuli of emotional words and emotional facial expressions (Posner et al., 2005). Emotions are usually exhibited through language, voice or tone of speaking, physiology, and facial expressions (Ambady and Weisbuch, 2010). Speech data and facial expressions are among the strongest indicators for accurately predicting emotions (Rule and Ambady, 2010). Physiological changes also help gauge emotions (Purves et al., 2001). Although plants are not intelligent in the human sense and do not recognize human emotions directly, they perceive the surrounding environment, which we observe in the form of responses to stimuli (Mancuso and Viola, 2015). Plants also show adaptations to environmental conditions. Inter-plant communication through mycorrhizal networks mediates complex adaptive behavior in plants that takes place through the secretion of biochemicals (Gorzelak et al., 2015). Traditional machine learning algorithms have been widely employed to create emotion recognition models. Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and Random Forests (RF) have been utilized to attain intensity estimation along with emotion classification (Mehta et al., 2019). A facial emotion classification algorithm implemented by Happy et al. (2012) uses a Haar classifier for face detection along with Local Binary Pattern (LBP) histograms of different block sizes of a face image as feature vectors, and classifies six basic human expressions by applying Principal Component Analysis (PCA). Ghimire and Lee (2013) explore geometric feature-based facial expression recognition by identifying 52 points on the human face as features, which are then used as input to a multi-class AdaBoost and an SVM, achieving recognition accuracies of 95.17 percent and 97.35 percent, respectively. State-of-the-art research in emotion classification leverages deep learning techniques like Convolutional Neural Networks (CNN), a type of artificial neural network, as well as Long Short Term Memory (LSTM), a type of Recurrent Neural Network (RNN). Jung et al. (2015) consolidate a deep learning method where the first deep network extracts temporal appearance features from image sequences, while the other deep network extracts temporal geometry features from temporal facial landmark points. Jain et al. (2017) construct an LSTM network along with a CNN to label facial expressions as part of a multi-angle optimal pattern-based deep learning method. In another analysis, Kahou et al. (2013) discover that a hybrid CNN–RNN model outperforms a lone CNN for facial emotion recognition tasks when combining multiple deep neural networks for different data modalities.
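The classic pipeline can be sketched as follows, combining elements of the approaches cited above (Haar cascade face detection, block-wise LBP histograms, PCA, SVM); the parameters are illustrative assumptions, not the settings of the cited papers.

```python
# Illustrative sketch of a classic FER pipeline; parameters are assumptions.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Haar cascade face detector shipped with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def lbp_features(gray_face, grid=(4, 4), P=8, R=1):
    """Concatenate uniform LBP histograms over a grid of face blocks."""
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    h, w = lbp.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = lbp[i * h // grid[0]:(i + 1) * h // grid[0],
                        j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(block, bins=P + 2, range=(0, P + 2),
                                   density=True)
            feats.append(hist)
    return np.concatenate(feats)  # 4 * 4 * 10 = 160-dimensional vector

# Example usage on a grayscale frame:
# faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
# X = np.stack([lbp_features(gray[y:y + h, x:x + w])
#               for (x, y, w, h) in faces])
# clf = make_pipeline(PCA(n_components=50), SVC(kernel="rbf"))
# clf.fit(X_train, y_train)  # y_train: six basic emotion labels
```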
2.2 Plants as Biosensors
The notion of using plants as biosensors is rooted in plant physiology. The specificity and sensitivity of biological systems are the basis of building biosensors (Turner et al., 2013). To use a plant as a sensor for monitoring changes in its environment, electrodes are placed on the plant to measure its electrical potential, and conclusions are drawn about the external effects causing the observed signal. To measure activity, the plant should be able to detect the stimuli applied
and respond to them while showing an observable reaction. Plant biosensors have been used to study growth reactions, visual movement, and internal communication in plants (Peter, 2021). Jagdish Chandra Bose pioneered the study of electrical impulses in plants at the beginning of the twentieth century. He used an electrical detector to experiment with Mimosa pudica's reaction to touch, identifying the internal responses generated by the plant by recording and analyzing the signal. In modern research, the question is no longer how to measure electrical signals in plants but how we can leverage this capability to understand the world better. The electrical signal is the potential difference between the electrode in the plant cell and the ground. The signals consist of three types of reactions: Local Electrical Potential (LEP), Action Potential (AP), and Variation Potential (VP) (Chatterjee et al., 2015). An action potential is a short spike in the electrical potential triggered by environmental activity. APs are usually generated when there is no permanent damage to the plant. For instance, Mimosa pudica, whose sensitivity is especially high, can detect a person walking in its environment through the electrostatic discharge produced (Oezkaya and Gloor, 2020). Scientific research has shown that a majority of plants generate electrical impulses, and it is feasible to capture a variety of stimuli by measuring electrical responses. If the reaction is recognizable, the exact stimulus that triggered a specific reaction can be studied. One major advantage of using plants to measure electrical signals is their sensitivity, which makes them suitable for discovering small movements that are hard to detect with mechanical sensors. In earlier work, differences in the walking patterns of distinct humans were identified by extracting information from the electrical signals of Mimosa pudica. A non-intrusive signal detector was used to record movement around the plant when electrostatic discharge was produced by the lifting and placing of a foot on the floor; hence, the plant acts as a movement sensor (Oezkaya and Gloor, 2020). In another work, the relationship between the movement of the lateral leaflets of Codariocalyx motorius and the human voice was studied. The researchers hypothesized that the reaction of the plant is due to the fact that Codariocalyx motorius senses sound (Duerr and van Delden, 2020). The effect of sound on plants and the reactions obtained have been explored by many different researchers. Plant biosensors have been used to investigate the effect of sound on growth reactions, sound vibrations, and inter-plant communication. Through experimental analysis, it has been shown that plants change their behavior when sound is present in the environment (Mishra et al., 2016).
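A minimal sketch of detecting such action-potential-like spikes in a recorded plant signal is shown below; the file name, threshold, and minimum spacing are assumptions for illustration, not a calibrated detection method.

```python
# Minimal sketch of action potential (spike) detection in a plant recording;
# the threshold and minimum spacing between spikes are illustrative.
import numpy as np
from scipy.signal import find_peaks

signal = np.loadtxt("plant_potential.txt")  # hypothetical SpikerBox export
baseline = np.median(signal)
threshold = baseline + 5 * np.std(signal)   # flag large positive deflections

peaks, props = find_peaks(signal, height=threshold, distance=1000)
print(f"Detected {len(peaks)} candidate action potentials")
```

The detected spike times can then be compared against the timeline of movements and sounds in the plant's environment.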
Body Signals
Physiological signals have been identified as useful features for research in human emotion recognition. Numerous metrics, like heart rate, body movement, blood pressure, body temperature, and pulse count, have been used to predict emotions. In one study, an electrical device worn on the chest measured temperature, location, sound, and body acceleration for sentiment analysis (Kahou et al., 2013). Bulagang et al. (2020) conducted an experiment in multi-class emotion prediction using heart rate and virtual reality stimuli to investigate whether heart rate signals could be utilized to classify four emotions, using common classifiers like SVM, KNN, and RF. An emotion classifier built with Decision Tree (J48) and IBK classifiers on blood volume pulse, galvanic skin response, and skin temperature data attained an accuracy of 97 percent (Khan and Lawo, 2016). The Emotive Couch, a sensor-augmented piece of smart furniture, detects the proximity and motion of the human body and predicts the three basic emotions of anxiety, relaxation, and interest with an accuracy of 77.7 percent (Rus et al., 2018). Lastly, the Happimeter app, which infers the emotions of a smartwatch wearer with 80 percent accuracy, uses an accelerometer, step counter, heart rate sensor, microphone, position sensor, and some derived features (Roessler and Gloor, 2021). Physiology-based emotion recognition is frequently done using simpler machine learning algorithms, such as RF. This could be due to the paucity of datasets with many labeled points containing both physiological and emotional data.
In this chapter, we combine several innovative concepts for the task of emotion recognition. We investigate the intermodal dynamics within a group of jazz musicians by comparing facial emotions, electrical signals recorded by plants in their vicinity, and body signals collected using the Happimeter app on a smartwatch.
3. METHODOLOGY

3.1 Data Collection
We collected multimodal data – facial emotions, physiological signals, and the plant's electrical signals – from a two-hour rehearsal session of a jazz orchestra of 19 musicians taking part in the Jazzaar festival (www.jazzaar.com). The group was rehearsing for an upcoming live-streamed concert and consisted of three star musicians, some professional musicians, and some high school students. The Happimeter app was installed on the smartwatches worn by the musicians; it measured the heart rate and the body's proper acceleration in the X, Y, and Z directions. As a plant sensor, basil (Ocimum basilicum) was used: it is commonly available, inexpensive, and has good electrical conductivity, which makes it possible to position it in the vicinity of humans for real-time analysis of human emotions. In our experiment it acts as a highly sensitive sound and movement sensor generating electrical activity. A Plant SpikerBox (https://www.backyardbrains.com/products/plantspikerbox) was placed near the musicians to measure the electrical potential of the basil plant triggered by electrostatic discharge and to track the action potential variations observed in response to sound and body movement. Moreover, a video camera captured the emotions on the faces of the participants throughout the entire duration of the experiment. A summary of the collected data is shown in Table 9.1.

Table 9.1
Synopsis of the data collected from the jazz rehearsal
Data Type: Description
Heart Rate: The beats per minute (BPM) measured as the heart rate by the smartwatch
Movement: The sum of the absolute values of the smartwatch's accelerometer values in the X, Y, and Z directions
Plant Data: The electric potential of the basil plant, collected in the form of electrical waveforms in a ".wav" file. The waveform audio file consists of the digitally sampled data obtained from the analog signal of the electrical wave.
Facial Emotions: Videos containing faces of musicians are passed as input to a CNN that classifies faces into seven emotions – angry, sad, happy, disgusted, surprised, fearful, and neutral. Based on the individual scores, a total score is generated to determine the dominant group emotion.
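As a worked example of the movement metric in Table 9.1, the sum of absolute accelerometer readings could be computed as in the following minimal sketch (function name and values are illustrative):

```python
# A worked example of the movement metric; values are illustrative.
def movement(ax: float, ay: float, az: float) -> float:
    """Sum of the absolute accelerometer values in the X, Y, and Z directions."""
    return abs(ax) + abs(ay) + abs(az)

print(movement(0.12, -0.40, 0.05))  # ~0.57
```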
3.2 Model Implementation
3.2.1 Face emotion recognition
In this experiment, we employ the FER algorithm developed by Page et al. (2021). Figure 9.3 demonstrates the real-time output of this algorithm. Facial emotion recognition is done with faceapi.js (https://justadudewhohacks.github.io/face-api.js), a JavaScript Application Programming Interface (API) implemented on top of the TensorFlowJS core API that can perform face recognition in the browser. Face detection and facial expression recognition are performed by two different neural networks. Once per second, the model receives a frozen frame from the video window with all participants' cameras and calculates, for each detected face, the probability of each emotion. Thus, each of the emotions neutral, happy, sad, angry, fearful, disgusted, and surprised is assigned a probability between 0 and 1. For example, "angry (0.8)" and "disgusted (0.2)" mean that the particular face is predicted to look 80 percent angry and 20 percent disgusted. The overall group emotion score is determined by calculating the mean emotion value across all detected faces.
Figure 9.3
Real-time group face emotion recognition (FER) algorithm
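For illustration, the per-second averaging step could look like the following minimal sketch, in which the per-face probability dictionaries stand in for the output of the FER networks (all names and values are illustrative):

```python
# A minimal sketch of the per-second averaging described in Section 3.2.1;
# the per-face dictionaries stand in for the output of the FER networks.
EMOTIONS = ["neutral", "happy", "sad", "angry", "fearful", "disgusted", "surprised"]

def group_emotion_scores(faces: list[dict]) -> dict:
    """Average each emotion's probability over all detected faces."""
    if not faces:
        return {e: 0.0 for e in EMOTIONS}
    return {e: sum(f[e] for f in faces) / len(faces) for e in EMOTIONS}

faces = [  # two faces detected in the current frame (illustrative values)
    {"neutral": 0.1, "happy": 0.7, "sad": 0.0, "angry": 0.1,
     "fearful": 0.0, "disgusted": 0.0, "surprised": 0.1},
    {"neutral": 0.3, "happy": 0.5, "sad": 0.1, "angry": 0.0,
     "fearful": 0.0, "disgusted": 0.0, "surprised": 0.1},
]
scores = group_emotion_scores(faces)
dominant = max(scores, key=scores.get)  # "happy"
```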
3.2.2 Plant Spikerbox
To collect the electric potential of the basil, the Plant SpikerBox by BackyardBrains is used. One electrode is placed on a leaf of the basil; the other is placed in the ground near the roots of the plant. The electrical signals show a spike when action potential triggers occur. The SpikeRecorder saves the recordings as sound files in ".wav" format, with a sample rate of 10,000 samples per second. To characterize the spectral components of the signal, we perform a spectral analysis by transforming time-dependent frequency information onto the Mel scale: applying a Fast Fourier Transform and converting the results to the Mel scale yields the Mel-Frequency Cepstral Coefficients (MFCCs). MFCCs are a compact representation of the spectrum of an audio signal and contain information about the rate of change in the different spectral bands. The implementation used here is provided by the Librosa library (McFee et al., 2015). We calculate a total of eight MFCCs. The parameters set for the calculation are summarized in Table 9.2; a code sketch follows the table.

Table 9.2
Selected parameters for MFCC feature extraction
Parameter: Value
Sampling Rate: 10,000 samples per second
Number of MFCCs: 8
Window Size: 2,500
Hop Length: 2,000
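A minimal sketch of this feature extraction with Librosa, using the parameters of Table 9.2, could look as follows (the file name is a placeholder):

```python
# A sketch of the MFCC extraction with the parameters of Table 9.2;
# the file name stands in for a SpikeRecorder ".wav" recording.
import librosa

y, sr = librosa.load("plant_recording.wav", sr=10_000)
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=8,
                             n_fft=2500,       # window size
                             hop_length=2000)  # hop length
print(mfccs.shape)  # (8, n_frames): one row per coefficient, MFCC-1 ... MFCC-8
```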
3.2.3 Physiological signals
The Happimeter app (Roessler and Gloor, 2021) tracks the physiological signals extracted from the sensors of the smartwatch worn by the jazz musicians. It measures the heart rate, body acceleration in the X, Y, and Z directions, body activity, step count, the latitude, longitude, and altitude of the location, and the Vasomotor Center (VMC) signal of the wearer. The information is obtained as time series covering the entire two-hour rehearsal session, with one value per second for each metric.
3.2.4 Correlation analysis
We use cubic spline interpolation to smooth short-term data inconsistencies and reveal long-term trends. We then carry out a correlation analysis using the Pearson correlation coefficient, comparing all three modes of data (see the sketch at the end of this section). We specifically plot correlations of the plant MFCCs with group facial emotions, of physiological signals with plant MFCCs, and of group facial emotions with physiological signals.
3.2.5 Regression analysis
Regression analysis is used to estimate the dependency between two or more variables and to assess how well the data fit the hypothesized relationship. We perform a regression analysis with the emotions "surprised," "disgusted," "angry," and "neutral" as independent variables and "MFCC-7" as the dependent variable. The results are exhibited in Section 4.
3.2.6 Machine learning model for emotion prediction
Finally, we use the plant MFCC features extracted from the electrical signals to predict group facial emotions. We employ the Extreme Gradient Boosting (XGBoost) algorithm for a multi-class classification of emotions. We select the best MFCC coefficients based on the correlation heatmap as input features to the model, and the dominant emotion with the highest probability from the FER output as the label for each sample. The implementation is done using the XGBoost library in Python.
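As an illustration of the smoothing and correlation analysis of Section 3.2.4, the following minimal sketch applies a cubic smoothing spline to two illustrative per-second series and computes their Pearson correlation (the data are synthetic placeholders):

```python
# A sketch of the smoothing and correlation step of Section 3.2.4; a cubic
# smoothing spline (k=3) stands in for the chapter's cubic spline step,
# and the two series below are synthetic placeholders.
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
t = np.arange(600)                                        # 10 minutes at 1 Hz
emotion = np.sin(t / 60) + 0.3 * rng.standard_normal(600)  # e.g., "happy" score
mfcc7 = np.sin(t / 60 + 0.2) + 0.3 * rng.standard_normal(600)

emotion_s = UnivariateSpline(t, emotion, k=3, s=len(t))(t)  # smoothed series
mfcc7_s = UnivariateSpline(t, mfcc7, k=3, s=len(t))(t)

r, p = pearsonr(emotion_s, mfcc7_s)
print(f"Pearson r = {r:.3f}, p = {p:.2e}")
```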
4. RESULTS

Figure 9.4 demonstrates the Pearson correlation values (r) of each of the group facial emotions – angry, surprised, disgusted, neutral, sad, fearful, and happy – with the MFCC-7 coefficient.
Figure 9.4
Diagonal correlation heatmap of group facial emotions and MFCC-7 coefficient
The r-values of surprise and anger are particularly high and positive. The r-values of sadness and fear are also moderately positive. The emotion neutral, however, is negatively correlated with the MFCC-7 coefficient. Unsurprisingly, happiness, in contrast to sadness, also has a weak negative r-value. Figure 9.5 displays graphs of the moving average of each facial emotion and MFCC-7 over time. We ran a regression analysis with the dependent variable MFCC-7 and the emotions of surprise, anger, disgust, and neutral as predictors. Table 9.3 lists the results of the analysis. We built a machine learning model for predicting group emotions using plant MFCCs, implemented with XGBoost. XGBoost is an implementation of gradient-boosted decision trees: decision trees are trained sequentially, with the samples that earlier trees predicted wrongly receiving higher weights in the next tree, and the resulting multiple classifiers are then assembled to give a more precise model.
Note: x-axis is time, y-axis is scaled values of MFCC and emotion.
Figure 9.5
Time plots of group facial emotions with the MFCC-7 coefficient
Table 9.3
Regression analysis, dependent variable MFCC-7
              B      Std. Error   Beta     t         Sig.
(Constant)    0.604  0.003                 172.841   0.000
surprised     0.794  0.053        0.313    15.115    0.000
angry         0.069  0.004        0.735    15.692    0.000
disgusted     2.600  0.186        0.204    13.987    0.000
neutral       0.044  0.004        0.549    10.371    0.000
Notes: B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient. R sq. adj = 0.34.
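A regression of the form reported in Table 9.3 could be run with statsmodels as sketched below; the DataFrame holds random placeholder data, not the study's measurements:

```python
# A sketch of the regression in Table 9.3 with statsmodels; the DataFrame
# below is random placeholder data, not the study's data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((849, 5)),
                  columns=["surprised", "angry", "disgusted", "neutral", "mfcc7"])

fit = smf.ols("mfcc7 ~ surprised + angry + disgusted + neutral", data=df).fit()
print(fit.summary())  # B, standard errors, t, p, and adjusted R-squared
```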
Based on the inter-correlation among the MFCCs that are the input features, we select MFCC-1, MFCC-2, and MFCC-7 as the best inputs to our model to prevent overfitting. We consider the emotion with the maximum probability score – the dominant emotion – as the label of each sample. The labels comprise three classes: happy, sad, and angry. To mitigate the class imbalance and prevent training a biased model, we use the Synthetic Minority Oversampling Technique (SMOTE), which synthesizes new minority instances to balance the number of samples of each class. We finally split the data, containing 849 samples, into a 70–30 train-test split and use K-Fold Cross-Validation with K=10. Figure 9.6 demonstrates the confusion matrix for the XGBoost model.
Figure 9.6
Confusion matrix of the machine learning model
We train the XGBoost model over 100 epochs, using the multi-class log loss ("mlogloss") and the multi-class classification error ("merror") as evaluation metrics, together with accuracy. We achieve a training accuracy of 69 percent and a test accuracy of 64 percent, as shown in Figure 9.7. The classification error for training and testing is exhibited in Figure 9.8, and the training and testing loss, decreasing over the epochs, is shown in Figure 9.9. Hence, we successfully evaluate our hypothesis that human emotions can be predicted from plant features.
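The training pipeline described above could be sketched as follows, assuming xgboost 1.6 or later and the imbalanced-learn package; the feature matrix and labels are random placeholders:

```python
# A sketch of the training pipeline; features and labels are placeholders.
import numpy as np
from imblearn.over_sampling import SMOTE          # pip install imbalanced-learn
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier                 # pip install xgboost

rng = np.random.default_rng(7)
X = rng.random((849, 3))                          # MFCC-1, MFCC-2, MFCC-7
y = rng.integers(0, 3, 849)                       # 0 = happy, 1 = sad, 2 = angry

# SMOTE synthesizes minority-class samples to balance the three classes
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)

# 70-30 train-test split
X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.3,
                                          random_state=42, stratify=y_bal)

# 100 boosting rounds, monitoring multi-class log loss and error
model = XGBClassifier(n_estimators=100, objective="multi:softprob",
                      eval_metric=["mlogloss", "merror"])
model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_te, y_te)], verbose=False)
print("test accuracy:", model.score(X_te, y_te))
```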
Figure 9.7
Train and test accuracy of the machine learning model
Figure 9.8
Train and test classification error of the machine learning model
Figure 9.9
Train and test loss of the machine learning model
5. DISCUSSION

5.1 Smartwatch Metrics and Plant MFCCs
Among the most important results from relating the body signals recorded by the smartwatch to the plant MFCCs is the average body movement of drum players, vocal singers, and saxophone players. The body movement of the vocal singers showed a high positive correlation with the MFCC-7 coefficient, evident from the Pearson correlation value of 0.654 and a significance value of 0.00001. The body movement of drum players also shows a moderate positive correlation of 0.418 (significance value of 0.000001) with the MFCC-7 feature. This corroborates that the plant features contain information about movement, since the plant acts as a movement sensor. The body movement of drum players could stem from actions like striking the drum foot pedal or the constant movement of the hands while playing with drumsticks; the body movement of vocal singers could stem from swaying while singing or tapping their feet. One interesting result at the individual level concerns the average body movement of student saxophone players versus that of star saxophone players. We observe that the movement of the student saxophone players is positively correlated with MFCC-7, with a Pearson correlation of 0.212 and a significance value of 0.00001. By contrast, the body movement of the star saxophone players is negatively correlated with MFCC-7, with a Pearson correlation of −0.286 and a significance value of 0.00001, which suggests that star saxophone players, who are more skilled at playing the instrument, move relatively less.
5.2 Face Emotion Recognition and Smartwatch Metrics
We also observe meaningful relationships between the group's facial emotion recognition results and the physiological signals. The average body movement of drum players is strongly negatively correlated with the emotions of disgust, surprise, and fear, with Pearson correlation values of −0.714, −0.657, and −0.611 and a significance value of 0.00001. The average heart rate of vocal singers is strongly negatively correlated with happiness and strongly positively correlated with the emotions of anger and surprise; the Pearson values obtained are −0.916, 0.962, and 0.944, respectively, with a significance value of 0.00001. The body movement of vocal singers shows a strong positive Pearson correlation of 0.868 with the emotion of happiness (significance value of 0.00001), whereas it is negatively correlated with the emotions of fear and sadness, with Pearson correlations of −0.917 and −0.921, respectively (significance value of 0.00001). In other words, the more the singers move, the happier they are. The notions of an increased heartbeat while experiencing anger and surprise, and of increased happiness accompanying the body movement of the jazz musicians, make intuitive sense. For saxophone players, the student musicians have a lower positive Pearson correlation between happiness and heart rate than the star saxophone players: the values are 0.321 and 0.529, respectively, with a significance value of 0.00001. Also, the Pearson correlation between anger and body movement is higher for the star saxophone players than for the student saxophone players: the values are 0.732 and 0.421, respectively, with a significance value of 0.00001. This suggests the hypothesis that star saxophone players are happier while playing the saxophone and are more likely to be angry while moving their body, which may signify passion while playing the instrument and a possibility of reaching the state of flow.
6. CONCLUSION, LIMITATIONS, AND FUTURE WORKS
We presented a privacy-preserving machine learning approach for human emotion detection using features extracted from a plant's electrical signals, achieving a train and test accuracy of 69 percent and 64 percent, respectively. We were also able to draw conclusions about the relationship between the emotions of musicians and their body movements. A key limitation of this work is that the evaluation is based on data from a relatively small group of musicians. Furthermore, the accuracy of the tools used for collecting FER and physiological signals limits the precision of our model. Future directions include predicting human emotions from plant MFCCs using a larger dataset containing more musicians. We also envision implementing state-of-the-art deep learning models, studying the relative contributions of each mode of data, exploring other techniques for signal analysis, and finding interesting relationships between emotions and group flow.
REFERENCES

Ali, M., Mosa, A., Almachot, F., and Kyamakya, K. (2018), "Emotion recognition involving physiological and speech signals: a comprehensive review", in K. Kyamakya et al. (eds), Recent Advances in Nonlinear Dynamics and Synchronization (pp. 287–302), Cham: Springer.
Ambady, N., and Weisbuch, M. (2010), "Nonverbal behavior", in S. T. Fiske et al. (eds), Handbook of Social Psychology, Vol. 1, 5th edn (pp. 464–97), New York: Wiley.
Bulagang, A., Weng, N., Mountstephens, J., and Teo, J. (2020), "A review of recent approaches for emotion classification using electrocardiography and electrodermography signals", Informatics in Medicine Unlocked, 20, https://doi.org/10.1016/j.imu.2020.100363.
Chatterjee, S., Das, S., Maharatna, K., Masi, E., Santopolo, L., Mancuso, S., and Vitaletti, A. (2015), "Exploring strategies for classification of external stimuli using statistical features of the plant electrical response", Journal of the Royal Society Interface, https://doi.org/10.1098/rsif.2014.1225.
Colombetti, G. (2009), "From affect programs to dynamical discrete emotions", Philosophical Psychology, 407–25, https://doi.org/10.1080/09515080903153600.
Czarnocki, J. (2021), "Will new definitions of emotion recognition and biometric data hamper the objectives of the proposed AI Act?", in International Conference of the Biometrics Special Interest Group (BIOSIG) (pp. 1–4), https://doi.org/10.1109/BIOSIG52210.2021.9548285.
Duerr, S., and van Delden, J. (2020), "Eurythmic dancing with plants – measuring plant response to human body movement in an anthroposophic environment", https://doi.org/10.48550/arXiv.2012.12978.
Ekman, P., and Friesen, W. (1971), "Constants across cultures: the face and emotion", Journal of Personality and Social Psychology, 17(2), 124.
Ghimire, D., and Lee, J. (2013), "Geometric feature-based facial expression recognition in image sequences using multi-class AdaBoost and support vector machines", Sensors, 13(6), 7714–34.
Gloor, P., Araño, K., and Guerrazzi, E. (2019), "Measuring audience and actor emotions at a theater play through automatic emotion recognition from face, speech, and body sensors", in Digital Transformation of Collaboration: Proceedings of the 9th International COINs Conference (pp. 33–50), Cham: Springer.
Gorzelak, M., Asay, A., Pickles, B., and Simard, S. (2015), "Inter-plant communication through mycorrhizal networks mediates complex adaptive behavior in plant communities", AoB Plants, https://doi.org/10.1093/aobpla/plv050.
Guo, R., Li, S., He, L., Gao, W., Qi, H., and Owens, G. (2013), "Pervasive and unobtrusive emotion sensing for human mental health", in 7th International Conference on Pervasive Computing Technologies for Healthcare and Workshops (pp. 436–9), IEEE.
Happy, S., George, A., and Routray, A. (2012), "Real-time facial expression classification system using local binary patterns", in 4th International Conference on Intelligent Human Computer Interaction (IHCI) (pp. 1–5), IEEE.
Jain, D., Zhang, Z., and Huang, K. (2017), "Multi-angle optimal pattern-based deep learning for automatic facial expression recognition", Pattern Recognition Letters, 139, 157–65, https://doi.org/10.1016/j.patrec.2017.06.025.
Jung, H., Lee, S., Yim, J., Park, S., and Kim, J. (2015), "Joint fine-tuning in deep neural networks for facial expression recognition", in Proceedings of the IEEE International Conference on Computer Vision (pp. 2983–91), IEEE.
Kahou, S., Pal, C., Bouthillier, X., Froumenty, P., Gülçehre, Ç., Memisevic, R., and Mirza, M. (2013), "Combining modality specific deep neural networks for emotion recognition in video", in Proceedings of the 15th ACM International Conference on Multimodal Interaction (pp. 543–50), https://doi.org/10.1145/2522848.2531745.
Kerkeni, L., Serrestou, Y., Mbarki, M., Raoof, K., Mahjoub, M., and Cleder, C. (2019), "Automatic speech emotion recognition using machine learning", in A. Cano (ed.), Social Media and Machine Learning, IntechOpen.
Khan, A., and Lawo, M. (2016), "Developing system for recognizing the emotional states using physiological devices", in 12th International Conference on Intelligent Environments (IE) (pp. 48–53), IEEE.
Mancuso, S., and Viola, A. (2015), Brilliant Green: The Surprising History and Science of Plant Intelligence, Washington, DC: Island Press.
McFee, B. et al. (2015), "librosa: audio and music signal analysis in Python", in Python in Science Conference, https://doi.org/10.25080/Majora-7b98e3ed-003.
Mehta, D., Siddiqui, M., and Javaid, A. (2019), "Recognition of emotion intensities using machine learning algorithms: a comparative study", Sensors (Basel), 19(8), 1897, https://doi.org/10.3390/s19081897.
Mishra, R., Ghosh, R., and Bae, H. (2016), "Plant acoustics: in the search of a sound mechanism for sound signaling in plants", Journal of Experimental Botany, 67(15), 4483–94, https://doi.org/10.1093/jxb/erw235.
Nadai, S., D'Incà, M., Parodi, F., Benza, M., Trotta, A., and Sacile, R. (2016), "Enhancing safety of transport by road by online monitoring of driver emotions", in 11th System of Systems Engineering Conference (SoSE) (pp. 1–4), IEEE, https://doi.org/10.1109/SYSOSE.2016.7542941.
Oezkaya, B., and Gloor, P. (2020), "Recognizing individuals and their emotions using plants as biosensors through electro-static discharge", arXiv preprint, https://doi.org/10.48550/arXiv.2005.04591.
Page, P., Kilian, K., and Donner, M. (2021), "Enhancing quality of virtual meetings through facial and vocal emotion recognition", COINs Seminar Paper Summer Semester 2021, University of Cologne.
Peter, P. (2021), "Do plants sense music? An evaluation of the sensorial abilities of the Codariocalyx motorius", Master's thesis, University of Cologne, https://kups.ub.uni-koeln.de/53756/ (accessed 30 November 2022).
Plutchik, R. (2001), "The nature of emotions: human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice", American Scientist, 89(4), 344–50.
Posner, J., Russell, J., and Peterson, B. (2005), "The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology", Development and Psychopathology, 17(3), 715–34.
Purves, D., Augustine, G., Fitzpatrick, D., Katz, L., LaMantia, A., McNamara, J., and Williams, S. (2001), Neuroscience, 2nd edn, Sunderland, MA: Sinauer Associates.
Qiu, J., Liu, W., and Lu, B. (2018), "Multi-view emotion recognition using deep canonical correlation analysis", in International Conference on Neural Information Processing (pp. 221–31), Cham: Springer, https://doi.org/10.48550/arXiv.1908.05349.
Roessler, J., and Gloor, P. (2021), "Measuring happiness increases happiness", Journal of Computational Social Science, 4(1), 123–46, https://doi.org/10.1007/s42001-020-00069-6.
Rule, N., and Ambady, N. (2010), "First impressions of the face: predicting success", Social and Personality Psychology Compass, 4(8), 506–16, https://doi.org/10.1111/j.1751-9004.2010.00282.x.
Rus, S., Joshi, D., Braun, A., and Kuijper, A. (2018), "The Emotive Couch – learning emotions by capacitively sensed movements", Procedia Computer Science, 130, 263–70.
Sacks, O. (2018), "The healing power of gardens", New York Times, April 18.
Shu, L., Xie, J., Yang, M., Li, Z., Li, Z., Liao, D., and Yang, X. (2018), "A review of emotion recognition using physiological signals", Sensors, 18(7), 2074.
Turner, A., Karube, I., and Wilson, G. (2013), Biosensors: Fundamentals and Applications, Oxford: Oxford University Press.
Ullah, M., Islam, M., Azman, N., and Zaki, Z. (2017), "An overview of multimodal sentiment analysis research: opportunities and difficulties", in IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR) (pp. 1–6), IEEE, https://doi.org/10.1109/ICIVPR.2017.7890858.
Verschuere, B., Crombez, G., Koster, E., and Uzieblo, K. (2006), "Psychopathy and physiological detection of concealed information: a review", Psychologica Belgica, 46(1–2), https://doi.org/10.5334/PB-46-1-2-99.
Zadeh, A., Chen, M., Poria, S., Cambria, E., and Morency, L. (2017), "Tensor fusion network for multimodal sentiment analysis", https://doi.org/10.48550/arXiv.1707.07250.
Zhang, J., Yin, Z., Chen, P., and Nichele, S. (2020), "Emotion recognition using multi-modal data and machine learning techniques: a tutorial and review", Information Fusion, 59, 103–26, https://doi.org/10.1016/j.inffus.2020.01.011.
PART IV APPLICATIONS IN BUSINESS AND MARKETING
10. How does congruence between customer and brand personality influence the success of a company? Tobias Olbrück, Peter A. Gloor, Ludovica Segneri and Andrea Fronzetti Colladon
1. INTRODUCTION

Increasing competition in the business environment leads companies to think about the most effective strategies to increase their competitive advantage. Traditional car manufacturers, for example, are facing new competitors because of electrification: not only has Tesla newly entered the market, but technology companies such as Google and Apple are examining or already preparing for a market entry (Demling & Jahn, 2021; Tyborski & Demling, 2020). Other industries are also fiercely competitive. One way to succeed is to know potential customers and their needs, to differentiate from competitors with a strong brand personality (Lin, 2010; Maehle & Shneor, 2010), and to strengthen the relationship with customers through a targeted social media presence (Assaad & Gómez, 2011). However, gaining customer insights with traditional customer surveys is costly and cannot be carried out regularly with many participants (Azucar et al., 2018; Lindemann et al., 2020). At the same time, more and more people are using social media platforms and disclosing information about themselves on these networks (Tankovska, 2021): the proportion of the world's population with a social media account is expected to increase from 48.7 percent in 2020 to 56.7 percent in 2025, which corresponds to 4.41 billion people (Tankovska, 2021). User-generated content (e.g., Twitter tweets or Facebook posts) and its language tell a lot about a person: feelings, behavior, and personality can be inferred from it (Schwartz & Ungar, 2015). This shows the enormous potential of social media data and the insights that can be gained from it. Various researchers from a wide range of disciplines have already exploited this potential and conducted analyses on social media data (De Choudhury et al., 2014; Eichstaedt et al., 2015; Settanni & Marengo, 2015).
One popular application of social media analytics is personality analysis. Personality is a central concept in psychology. The best-known and most-researched model of personality is the Big Five, which comprehensively describes a personality along five areas: Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness to Experience (McCrae & Costa, 1987; McCrae & John, 1992). Researchers have already shown that personality can be accurately analyzed from social media data (Celli et al., 2014; Farnadi et al., 2016; Pratama & Sarno, 2015; Tandera et al., 2017). Companies could use this information to better understand potential customers and tailor products (Lindemann et al., 2020).
Another application is the analysis of tribe affiliations. A tribe is a group of people who share certain passions and emotions (Cova & Cova, 2002). Since belonging to tribes is more important for individuals than belonging to a particular social segment or class (Cova, 1996), the construct of tribes is also very important for companies (Holzweber et al., 2015). Gloor et al. (2019) have shown that people can be assigned to tribes based on their tweets. For this purpose, they defined six macro-categories – Alternative Reality, Ideology, Lifestyle, Emotions, Personality, and Recreation – each comprising four tribes (Gloor et al., 2019). Looking at each category to which individuals belong, Tribefinder separates them into four tribes: Fatherlander, Nerd, Spiritualist, and Treehugger for the Alternative Reality category; Anger, Fear, Happy, and Sad for the Emotions category; Capitalism, Complainers, Liberalism, and Socialism for the Ideology category; Fitness, Sedentary, Vegan, and Yolo for the Lifestyle category; Journalist, Politician, Risk-Taker, and Stock-Trader for the Personality category; and, lastly, Art, Fashion, Sport, and Travel for the Recreation category.
These tribes, which also reflect personality characteristics, impact how customers perceive a brand. Mulyanegara et al. (2009) have studied the influence of human personality on preferences for particular brand personalities and discovered that neurotic individuals prefer trusted brands, whereas extroverted individuals prefer sociable brands. Similarly, Smith (2020) has investigated the influence of personality on customer loyalty. In general, researchers have shown that people are often interested in those brands whose personality is congruent with their own (Dikcius et al., 2013; Maehle & Shneor, 2010; Mulyanegara et al., 2009). Although much research has been done in the area of brand and customer personality, to the best of our knowledge, no study exists that uses social media data and machine learning approaches to automatically investigate whether congruence between customer and brand personality has a positive influence on the success of a company.
In this study, we analyze how congruence between customer and brand personality affects a company's success in the automotive industry. First, the theoretical background of personality and tribe theory is explained, along with how these characteristics can be analyzed based on social media data (Section 2). Next, the methodology of this work is presented in Section 3: we used IBM Watson Personality Insights and the Griffin Tribefinder to analyze the personality of brands and customers; then, congruence between potential customers and brand personality was calculated using a similarity index and correlated with the sales data of 29 companies operating in the car industry. The results are shown, tested, and discussed in Section 4. Lastly, the limitations of our work, implications for marketers, and ideas for future research are explained.
2. THEORETICAL BACKGROUND
Digital transformation and the resulting new offerings, such as mobility-as-a-service and autonomous and electric vehicles, are changing the automotive industry (Llopis-Albert et al., 2021). The transformation makes the automotive industry more complex and diversified, and traditional car manufacturers have to prepare for new competitors (Gao et al., 2016). Brand management plays an essential role in succeeding in a competitive market, since a strong and focused brand personality can be used to differentiate from competitors (Lin, 2010; Maehle & Shneor, 2010). Since people generally buy a car only every few years, they usually pay much attention to which car fits them best. They evaluate quality and price and are affected by brand image (Dhanabalan et al., 2018), of which brand personality is a key factor (Lin, 2010).
Table 10.1
Explanation of the Five Factors Model
Factor: Explanation
Agreeableness: People with high levels of Agreeableness are cooperative and helpful. They are optimistic and trust other people.
Conscientiousness: Humans with a high value of Conscientiousness are reasonable and rational. They are described as tidy, punctual, and well organized. They have a high level of aspiration and strive for excellence.
Extraversion: Individuals scoring high in Extraversion are friendly and energetic. They like to be active and love excitement and thrills. In addition, they enjoy being around many people and drawing inspiration from others.
Neuroticism: People with a high Neuroticism score worry a lot and are often anxious. They are easily irritable, cannot handle stress well, and are often angry. They seem shy, especially in contact with strangers.
Openness: Those with a high level of Openness enjoy new activities and like variety. They have a medium-to-high level of curiosity and often have a liberal vision in social and political aspects.
2.1 Big Five Personality Model
Personality is "a characteristic way of thinking, feeling, and behaving. Personality embraces moods, attitudes, and opinions and is most clearly expressed in interactions with other people. It includes inherent and acquired behavioral characteristics that distinguish one person from another" (Holzmann, 2020). For a long time, parallel research was conducted on many different personality models until McCrae and John (1992) developed the Five-Factor Model (also known as the Big Five or OCEAN Model), built on the principles developed by Norman (1963). Although Norman's factors (Extroversion or Surgency, Agreeableness, Conscientiousness, Emotional Stability, and Culture) are very similar to the Five-Factor Model, Norman (1963) has received little attention for his model (McCrae & John, 1992). Over recent years, the Five-Factor Model has prevailed over other models, such as the HEXACO Model (Ashton & Lee, 2001), and has become the predominant model in the literature (Ambroise et al., 2005; Donvito et al., 2020; Feher & Vernon, 2021). It consists of Five Factors (Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness) and 30 underlying Facets (McCrae & John, 1992).
The most comprehensive test to measure the Five Factors, and thus a human's personality, is the Revised NEO Personality Inventory (NEO-PI-R) (Gosling et al., 2003; McCrae et al., 2005; Schmitt et al., 2007). The NEO-PI-R, developed by McCrae and Costa (2004), consists of 240 items used to determine the 30 Facets, from which the Five Factors are then calculated (Costa & McCrae, 2008). The current version is the NEO-PI-3 (McCrae et al., 2005). Since the NEO-PI-R, with 240 items, is very comprehensive and takes about 45 minutes (Gosling et al., 2003), McCrae and Costa (2004) developed another version, the Revised NEO Five-Factor Inventory (NEO-FFI-R), consisting of only 60 items. However, this test can only determine the Five Factors, not the 30 Facets of a personality.

2.2 Predicting Personality Traits using Social Media Data
The Cambridge Analytica scandal has shown that taking a 45-minute NEO-PI-R test is no longer necessary to analyze a person's personality. For the 2016 U.S. election campaign, Cambridge Analytica analyzed the Facebook accounts of about 230 million Americans for their personalities (Leitel, 2018). Donald Trump used this personality information in his successful presidential campaign to target negative ads (Schwarz, 2018). The analysis examined how traditional, fearful, or social the individuals were, and the generated data was used for targeted advertising (Leitel, 2018).
Researchers have previously found that user behavior on social media platforms is related to personality traits (Amichai-Hamburger & Vinitzky, 2010; Azucar et al., 2018; Kuss & Griffiths, 2011; Schwartz et al., 2013; Seidman, 2013). This idea is mainly supported by the meta-analysis of Azucar et al. (2018), in which 14 previously published studies were examined and compared. The results show that data collected from social media platforms can be used to analyze user personality traits reliably and accurately, as the correlations are moderate to strong. For instance, individuals with a high Agreeableness score use fewer swear words and express significantly more positive than negative emotions (Schwartz et al., 2013). Furthermore, they mostly post positive pictures in which they laugh (Liu et al., 2016). Conscientious individuals have more friends on social networks (Amichai-Hamburger & Vinitzky, 2010) but like fewer posts and participate less in groups (Kosinski et al., 2014). They also upload fewer pictures (Amichai-Hamburger & Vinitzky, 2010). Extroverts are often highly active on social media platforms and are in danger of becoming addicted to them (Blackwell et al., 2017; Kuss & Griffiths, 2011). Furthermore, they also have more friends than introverts and act more often in groups (Kosinski et al., 2014). People high in Neuroticism are more likely to use social media passively to learn about others (Seidman, 2013) and publish very little private information, such as email addresses or phone numbers (Amichai-Hamburger & Vinitzky, 2010). Open people use more features of social media platforms (Amichai-Hamburger & Vinitzky, 2010), frequently like posts, and are active in groups (Kosinski et al., 2014).
Researchers use these correlations to build machine learning algorithms that predict a user's personality from social media data such as written text, pictures, or likes (Golbeck et al., 2011; Liu et al., 2016). In their literature review, Marengo and Montag (2020) described a standard process for such a social media analysis consisting of three major steps: determining participants' personality traits with self-report questionnaires, fetching participants' social media data, and applying machine learning models to predict the previously determined personality traits. The models are evaluated using various metrics such as Mean Absolute Error (MAE) or R-squared (R2). The best model is then used to predict the personality traits of new users. Several researchers have successfully used this process to predict personality traits (Blackwell et al., 2017; Liu et al., 2016). Researchers have focused on different platforms (mainly Facebook and Twitter) and data types to see which platforms are best suited for personality trait predictions. Farnadi et al. (2016) showed that using socio-demographic and language data from Twitter to predict personality traits is possible.
Similarly, Pratama and Sarno (2015) found that, based on tweets, Openness is predicted best (accuracy: 64.92) using the K-Nearest-Neighbor algorithm, while Neuroticism is predicted with the worst accuracy (57.54). Sumner et al. (2012) also highlighted that personality traits are related to word usage on Twitter. Quercia et al. (2011) used the number of followers, the number of people a user follows, and the number of listings and achieved interesting results in their analysis: popular users are extroverted, emotionally stable, and imaginative, while influential users tend to be organized and unneurotic. Liu et al. (2016) gave evidence that profile pictures can be used to predict personality on Twitter. Skowron et al. (2016) showed that Openness (MAE: 0.11) and Conscientiousness (MAE: 0.11) could be predicted best using a combination of Twitter and Instagram data. In addition to Facebook, Twitter, and Instagram, other platforms, such as Sina Weibo, one of the largest Chinese microblogging services, have also been used as a data source for personality analysis (Gao et al., 2013; Wei et al., 2017). From the studies presented above, it can be deduced that the individual factors of the Five-Factor Model are best derived from different features, such as likes or free text, and that the best features differ depending on the platform. In summary, social media platforms are well suited for analyzing people's personalities based on the data made available there.

2.3 Tribes

When buying a product, both the personality of a person and micro-social factors such as tribes need to be considered (Cova & Cova, 2002). A tribe "is defined as a network of heterogeneous people – in terms of age, sex, income, etc. – who are linked by a shared passion or emotion; a tribe is capable of collective action; its members are not simple consumers, they are also advocates" (Cova & Cova, 2002: 10). Since belonging to tribes is more important for members than belonging to a particular social segment or class (Cova, 1996), the construct of tribes is very important for the survival of companies (Holzweber et al., 2015). Especially for marketing managers, it can be vital to understand how to create and sustain tribal communities (Goulding et al., 2013). Companies should organize their marketing activities to meet the needs of tribe members (Holzweber et al., 2015). Tribal marketing can address either individuals or the entire tribe (Cova & Cova, 2002). Understanding which tribes are particularly attracted to which products or services can greatly improve the success of marketing campaigns (Cova & Cova, 2002; Gloor et al., 2020).
Identifying tribes requires significant effort (Cova & Cova, 2002), and the shift from offline to online makes identification and definition even more difficult (Gloor et al., 2020; Hamilton & Hewer, 2010). With the increased use of social media (Tankovska, 2021), the concept of e-tribes or virtual tribes has emerged in the literature (Cova & Pace, 2006; Hamilton & Hewer, 2010). To overcome the difficulties and limitations of previous systems in identifying and defining tribes, Gloor et al. (2019) developed Tribefinder, a system that identifies tribes based on Twitter data. The system is based on the analysis of the common language of tribes (Gloor et al., 2020). After tribes have been defined, the probability that an individual belongs to a certain tribe can be calculated using Tribefinder (Gloor et al., 2019). Since Tribefinder is used for the analysis of tribe membership in this paper, the system is explained in more detail in the Methodology section.

2.4 Brand Personality and Human Personality
Brand personality, which is a core element of a brand’s image (Maehle & Shneor, 2010), is defined as “the set of human characteristics associated with a brand” (Aaker, 1997, p. 347). This definition by Aaker is one of the most widely used (Dikcius et al., 2013). Accordingly, brand personality can be described by the same characteristics that are used to describe humans: socio-demographics (e.g., gender or age), lifestyle (e.g., hobbies), and other characteristics (e.g., warmth or sentimentality) (Aaker, 1996).
Self-congruity "is a psychological process and outcome in which consumers compare their perception of a brand image (more specifically, brand personality or brand-user image) with their self-concept (e.g. actual self, ideal self, social self)" (Kressmann et al., 2006). Researchers have shown that congruence between a customer's and a brand's personality impacts the choice of brands, customer satisfaction, and loyalty (Ambroise et al., 2005; Donvito et al., 2020; Jamal & Goode, 2001). A study by Jamal and Goode (2001) showed that self-image congruence positively influences the choice of a product. Similarly, Wu et al. (2020) showed that self-image congruity significantly affects the intention to use mobile apps. These findings are consistent with a study by Ambroise et al. (2005) in the food industry and a study by Lin (2010) in the gaming industry. A meta-analysis by Sop (2020) also showed that self-image congruity impacts vacation choice and that research in this area has intensified in recent years. The study by Jamal and Goode (2001) showed that, in addition to the positive influence on brand choice, self-image congruence positively influences customer satisfaction. Furthermore, Alguacil et al. (2019) showed that self-image congruence also affects satisfaction with services and does not only apply to product evaluations. Closely related to satisfaction and product/service evaluation is loyalty. As Lin (2010) has shown for the Taiwanese toy market, the perceived brand personality traits of Competence and Sophistication have a significant positive influence on customer action and loyalty. An international study by Donvito et al. (2020), who analyzed the luxury brand market, showed that a high congruence between brand and customer personality significantly impacts the attachment between customer and brand. These results are consistent with findings from other studies, such as Kressmann et al. (2006). In summary, brand personality is a powerful way to differentiate a brand from its competitors and their products (Lin, 2010; Maehle & Shneor, 2010). Thus, brands with the same personality as their customers should be more successful. We test this by analyzing the personality of customers and car manufacturers using social media data (Section 3): car manufacturers that appeal to customers with the same personality should be more successful than others, where success is assessed based on sales in the United States.
3. METHODOLOGY

The methodology implemented in this work consists of four steps. The first step is data collection: in Section 3.1, we show how the brands and potential consumers were selected and how we collected their Twitter data and sales figures. In Section 3.2, we show how we predict personality traits, describing the IBM Watson Personality Insights system. In Section 3.3, we explain Tribefinder, the tool used to predict tribe affiliation for brands and customers. Finally, in Section 3.4, we introduce the similarity index used to calculate congruence between potential customers and brand personality, which we use to compare sales with brand personality alignment.

3.1 Data Collection
First, we gathered quarterly sales data of car manufacturers in the U.S.A. They were collected from the portal goodcarbadcar.net, which publishes sales and other information on car manufacturers in the U.S.A. and Canada. Sales data were also cross-validated with a second source, namely statista.com. We collected sales data for 37 car manufacturers in the period from the third quarter of 2018 (2018/Q3) to the first quarter of 2021 (2021/Q1). To ensure that only manufacturers serving the mass market are considered, only those selling more than 10,000 cars each quarter were included. Due to this exclusion criterion, 29 of the 37 manufacturers remained. Alfa Romeo, Bentley, Fiat, Genesis, Jaguar, Maserati, Mini, and Smart were not included; these brands have a combined market share of less than 1 percent in the U.S.A. The remaining car manufacturers are: Acura, Audi, BMW, Buick, Cadillac, Chevrolet, Chrysler, Dodge, Ford, GMC, Honda, Hyundai, Infiniti, Jeep, Kia, Land Rover, Lexus, Lincoln Motor, Mazda, Mercedes-Benz, Mitsubishi, Nissan, Porsche, RAM, Subaru, Tesla, Toyota, Volvo, and VW.
Having decided which manufacturers to include in the data analysis, corresponding Twitter accounts were determined for each brand. Since some brands have multiple accounts, we developed criteria for choosing a Twitter account. In general, U.S.-specific accounts (e.g., mercedesbenzusa) were preferred for the data analysis. Furthermore, the accounts had to have at least 100,000 followers to ensure a sufficiently large reach. However, for two U.S.-specific accounts (bmwusa and nissanusa) that had a large enough reach, only a few potential customers could be identified; in these cases, the global account (bmw and nissan) was preferred. For the selected brands, the timeline of each Twitter account – composed of all tweets of a user – was fetched via the Twitter Application Programming Interface (API). As only the latest 3,200 tweets can be accessed via the Twitter API, the points in time up to which a manufacturer's tweets could be collected differed significantly, which affected the period in which brand and potential customer personality can be analyzed. Since the tweets of most manufacturers went back to at least 2019/Q4, the tweets in the period from 2019/Q4 to 2021/Q1 were considered in this data analysis.
To identify potential customers from the tweets of a manufacturer, we applied a two-step process. First, we considered manufacturers congratulating Twitter users on a car purchase: some users post information on a car purchase, and most manufacturers respond to these tweets to congratulate them. Actual customers were assigned to the quarter in which they posted information about their purchase. Second, potential customers were identified by looking at users who shared tweets from manufacturers. To increase data quality, only those tweets from a manufacturer were considered that mentioned a current car model sold at least until 2019. All Twitter users who shared a tweet related to a car model were considered potential customers. However, car dealers also share manufacturers' tweets and were removed from the list of potential customers. To separate car dealers from real potential customers, we defined car dealer-specific terms like auto group, customer service, deal, family-owned, repair, retail, sales, sell, shop, store, test drive, and vehicle provider. Potential customers who had one of these terms in their name or profile description were removed, as were Twitter users who had the brand name in their Twitter name.
However, Twitter users with the brand name in their profile description were not removed, so brand enthusiasts were not excluded from this data analysis. The number of car dealers was used to determine a dealer ratio per quarter for each manufacturer. The dealer ratio per quarter (q) and brand (b) is calculated as follows:

$$\mathit{dealer\ ratio}_{b,q} = \frac{\#\mathit{identified\ dealers}_{b,q}}{\#\mathit{identified\ dealers}_{b,q} + \#\mathit{true\ potential\ customers}_{b,q}} \tag{10.1}$$
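A minimal sketch of the dealer filter and of the dealer ratio in Equation (10.1) follows; the term list is the one given above, while the user records and brand are illustrative:

```python
# A sketch of the dealer filter and Equation (10.1); user records are
# illustrative dictionaries with "name" and "description" fields.
DEALER_TERMS = ["auto group", "customer service", "deal", "family-owned",
                "repair", "retail", "sales", "sell", "shop", "store",
                "test drive", "vehicle provider"]

def is_dealer(user: dict, brand: str) -> bool:
    """Flag accounts with dealer terms in name/description or the brand in their name."""
    text = (user["name"] + " " + user["description"]).lower()
    return any(term in text for term in DEALER_TERMS) or brand.lower() in user["name"].lower()

def dealer_ratio(n_dealers: int, n_true_customers: int) -> float:
    """Equation (10.1): share of dealers among all identified accounts."""
    return n_dealers / (n_dealers + n_true_customers)

users = [{"name": "Springfield Auto Group", "description": "Best deals in town"},
         {"name": "Jane Doe", "description": "Coffee, code, and cars"}]
dealers = sum(is_dealer(u, "Toyota") for u in users)
print(dealer_ratio(dealers, len(users) - dealers))  # 0.5
```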
In this way, in addition to the congruence between customer and brand personality, it is also possible to determine how the number of car dealers affects sales. In total, 33,924 potential customers were identified for the 29 manufacturers. The number of identified customers varied significantly, from 182 customers for Mazda to 4,821 customers for Mercedes-Benz.

3.2 Predicting Personality Traits: IBM Watson Personality Insights
We analyzed personality traits using IBM Watson Personality Insights, a cloud service from IBM that infers personality traits from free-text information (IBM, 2021b). The service is therefore well suited to analyzing the tweets of the selected Twitter users and brands. In addition to the already introduced Five Factors and their underlying Facets, the service also provides scores for 12 Needs, 5 Values, and 42 Consumption Preferences. However, IBM Watson Personality Insights does not return absolute personality scores but percentile scores (IBM, 2021b): the Twitter profiles of 1 million English-speaking people were analyzed, and the returned values are calibrated against these results (IBM, 2021b). A score of 0.5 for a certain characteristic means that this characteristic corresponds to the median of the 1 million people. In a validation study conducted by IBM, the accuracy of the machine learning models was evaluated using 1,500–2,000 Twitter users per language (IBM, 2021b).
IBM Watson calculates scores for the Five Factors, the underlying Facets, 12 Needs, 5 Values, and 42 Consumption Preferences subdivided into buying behavior, film, music, reading and learning, health and activity, entrepreneurship, environmental awareness, and volunteering. Returned scores range from 0 (weakly expressed) to 1 (strongly expressed). The scores are normalized as percentiles, as they are compared to a sample population (IBM, 2021a): a score greater than 0.5 means that a personality characteristic is above average, whereas a score less than 0.5 indicates that it is below average. Consumption preferences, on the other hand, are categorical values that express how likely it is that a person has a certain preference: 0 means that someone is unlikely to have a consumption preference, 0.5 that someone is neutral, and 1 that someone is very likely to have this preference.
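IBM has since withdrawn the Personality Insights service; purely as an illustration, a profile request with the ibm-watson Python SDK looked roughly like the following sketch (API key, service URL, and input text are placeholders):

```python
# Illustrative sketch only: IBM withdrew Personality Insights in 2021.
from ibm_watson import PersonalityInsightsV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

service = PersonalityInsightsV3(version="2017-10-13",
                                authenticator=IAMAuthenticator("YOUR_API_KEY"))
service.set_service_url("https://api.us-south.personality-insights.watson.cloud.ibm.com")

tweets_text = "concatenated tweets of one brand or customer group ..."  # placeholder

profile = service.profile(tweets_text,
                          accept="application/json",
                          content_type="text/plain",
                          consumption_preferences=True).get_result()

# Percentile scores for the Big Five dimensions
big5 = {trait["name"]: trait["percentile"] for trait in profile["personality"]}
```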
3.3 Predicting Tribe Affiliation: Tribefinder
We performed the tribe affiliation analysis using the Griffin web tool (https://griffin.galaxyadvisors.com/), which includes the Tribefinder. Tribefinder is an AI-based system with the objective to "categorize Twitter users into alternative orthogonal tribes" (Gloor et al., 2019, p. 3). To realize this objective, Tribefinder extracts information on key people, brands, and topics from tweets of Twitter users (Gloor et al., 2019). Based on this information, users (in our case, consumers and brands) are assigned to one tribe for each macro-category. The different macro-categories and tribes are shown in Figure 10.1 and are detailed in Gloor et al. (2019).
Figure 10.1
Tribal macro-categories with tribes

3.4 Similarity Index and Test Validation
After analyzing the personality characteristics of potential customers and producers, we systematically determined the congruence between customers and brand personality characteristics. Congruence was calculated in the form of a similarity index, which describes for each personality characteristic and each quarter how similar the average customer personality is to the manufacturer’s brand personality. The similarity index is calculated as follows, where c represents the personality characteristic and q the respective quarter:
$$\mathit{SimilarityIndex}_{c,q} = 1 - \left|\mathit{brandPersonality}_{c,q} - \mathit{customerPersonality}_{c,q}\right| \tag{10.2}$$
The similarity index can take scores between 0 and 1: the higher the similarity index, the more similar customers and brands are to each other. We first calculated Pearson correlation coefficients and then, based on the significant correlations, used a panel data regression model to test how much sales variance could be explained by congruence between customer and brand personality. All results are shown in the next section.
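A minimal sketch of the similarity index of Equation (10.2), applied to illustrative percentile scores, could look as follows:

```python
# A minimal sketch of Equation (10.2); the percentile scores below are
# illustrative, not measured values.
def similarity_index(brand_scores: dict, customer_scores: dict) -> dict:
    """1 minus the absolute difference, per personality characteristic."""
    return {c: 1 - abs(brand_scores[c] - customer_scores[c]) for c in brand_scores}

brand = {"Agreeableness": 0.91, "Neuroticism": 0.12, "Openness": 0.88}
customers = {"Agreeableness": 0.55, "Neuroticism": 0.35, "Openness": 0.71}
print({c: round(s, 2) for c, s in similarity_index(brand, customers).items()})
# {'Agreeableness': 0.64, 'Neuroticism': 0.77, 'Openness': 0.83}
```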
4. RESULTS AND DISCUSSION
For brevity, we do not report the individual results of the personality and tribe affiliation analyses for each brand and each group of potential consumers. Since the goal of the study is to assess whether congruence between brand and consumer personality generates a competitive advantage for companies, we first show, as an example, the comparison between the IBM Watson Personality and Tribefinder results for the Ford brand and its consumers; we then present the results of the correlations and the panel data regression models. We use radar charts to visualize the IBM Watson Personality and tribe affiliation analysis results.
Figure 10.2a
Comparison of Five Factors of Ford and their customers

Most car manufacturers have either very high or very low scores on the Five Factors. As a result, Ford's Five Factors differ significantly from those of its customers, which lie mostly in the moderate range. The highest similarity between brand and customer personality at Ford can be seen for Neuroticism and Openness (Figure 10.2a). Figure 10.2b shows that Ford's Needs and those of its customers are more similar: the Needs of Practicality and Stability have the highest similarity, whereas Self Expression has the lowest. Figure 10.2c shows the Values of Ford and its customers; the highest fit exists for the Values of Conservation and Self Transcendence, while Self Enhancement and Hedonism exhibit a rather low fit. For Consumption Preferences (Figure 10.2d), there is a high similarity in Concerned Environment and Influenced by Brand Name. For the other preferences, Ford has either very high or very low scores, which results from the fact that Consumption Preferences can only take the scores 0, 0.5, and 1.
Figure 10.3 shows that, comparing Ford with its customers, tribe affiliations in the Alternative Reality, Personality, and Recreation categories are rather similar, although there are minor differences. In the Emotions category, tribe affiliation differs between Ford and its customers: Ford is predominantly assigned to the Happy tribe, while Ford's customers are predominantly assigned to the Fear and Anger tribes. Likewise, there is a low fit in the Ideology and Lifestyle categories.
This analysis was repeated for the customer and brand personality characteristics of all car manufacturers in our sample, and the congruence between customer and brand personality characteristics was systematically determined using the similarity score. Figure 10.4 shows the distribution of the similarity scores for all 25 brands for which the personality of at least 70 potential customers could be analyzed in at least one quarter. For four brands (Infiniti, Lincoln, Mazda, Volvo), the personality could not be analyzed for at least 70 potential customers in any of the six quarters (2019/Q4–2021/Q1); these are therefore not considered further in the data analysis. In total, this yields 102 observations across the six quarters in which more than 70 customer personalities could be analyzed per manufacturer. As the upper whiskers in Figure 10.4 show, all brands have a similarity index of 1 for at least one personality characteristic. At the same time, however, some brands tend to have higher similarity indexes, while others tend to have lower ones: Porsche (0.86) and Toyota (0.84) have the highest median values, Audi (0.58) and BMW (0.66) the lowest.
Since this data analysis aims to show correlations between the congruence of customer and brand personality and sales, Figure 10.5 shows the distributions of the similarity indexes of two car manufacturers from the ten best-selling (Toyota and Nissan) and two from the ten worst-selling (Audi and Land Rover) manufacturers. The x-axis shows the similarity index, and the y-axis the probability density. In Figure 10.5, the relationship between high similarity indexes and sales strength can be seen clearly: Toyota and Nissan have more high similarity indexes (above 0.8), while Audi and Land Rover tend to have lower ones (below 0.8). To better understand the relationship between the similarity index and sales, the influence of the similarity index on sales was systematically examined, with results shown in Table 10.2. Neither the number of a manufacturer's Twitter followers nor the number of identified potential customers significantly impacted sales. By contrast, the dealer ratio shows a significant positive correlation with sales.

Figure 10.2b
Comparison of needs of Ford and their customers

Figure 10.2c
Comparison of values of Ford and their customers
202 Handbook of social computing
Figure 10.2d Comparison of consumption preferences of Ford and their customers
Figure 10.3
Comparison of tribe affiliation of Ford and their customers
Congruence between customer and brand personality 203
Figure 10.4
Distribution of the similarity scores per brand
Figure 10.5
Distribution of similarity index of Audi, Land Rover, Nissan, and Toyota
Table 10.2    Correlations of control variables and sales

Control Variables       Correlation with Sales
Twitter Follower        −0.1547
Identified Customers    −0.1545
Dealer Ratio            0.5248*

Notes: n = 102; * p < 0.05.
Table 10.3    Correlations of Five Factor similarity and sales

Five Factors        Correlation of Similarity with Sales
Agreeableness       −0.0157
Conscientiousness   0.0989
Extraversion        0.0899
Neuroticism         0.2462*
Openness            0.1216

Notes: n = 102; * p < 0.05.
Table 10.4    Correlations of Five Factors facet similarity and sales

Facets                            Correlation of Similarity with Sales
Agreeableness
  Altruism                        0.0452
  Cooperation                     0.1468
  Modesty                         0.0124
  Morality                        0.0868
  Sympathy                        −0.051
  Trust                           0.2341*
Conscientiousness
  Achievement striving            0.0905
  Cautiousness                    0.2905**
  Dutifulness                     0.2076*
  Orderliness                     −0.0502
  Self-discipline                 0.0761
  Self-efficacy                   0.0185
Extraversion
  Activity level                  0.1723
  Assertiveness                   0.0695
  Cheerfulness                    −0.0647
  Excitement-seeking              0.1144
  Friendliness                    0.0598
  Gregariousness                  0.0417
Neuroticism
  Anger                           0.2828**
  Anxiety                         0.2259*
  Depression                      0.1174
  Immoderation                    0.0401
  Self-consciousness              −0.04
  Vulnerability                   0.1336
Openness
  Adventurousness                 0.1482
  Artistic interests              0.1121
  Emotionality                    −0.1881
  Imagination                     0.0892
  Intellect                       −0.1011
  Liberalism                      −0.0401

Notes: n = 102; * p < 0.05; ** p < 0.01.
Table 10.5    Correlations of need similarity and sales

Needs             Correlation of Similarity with Sales
Excitement        0.2152*
Harmony           0.0428
Curiosity         0.2119*
Ideal             0.3286**
Closeness         0.0488
Self-expression   −0.0821
Liberty           0.1772
Love              0.4382**
Practicality      0.2347*
Stability         0.4528**
Challenge         0.0689
Structure         −0.3460**

Notes: n = 102; * p < 0.05; ** p < 0.01.
Table 10.6    Correlations of value similarity and sales

Values               Correlation of Similarity with Sales
Self-transcendence   0.0231
Conservation         0.3885**
Hedonism             0.3838**
Self-enhancement     0.1827
Open to change       0.0087

Notes: n = 102; ** p < 0.01.
Neither the number of a manufacturer's Twitter followers nor the number of identified potential customers significantly impacted sales. By contrast, the dealer ratio seems to have a strong positive influence on sales. This is not surprising: potential customers purchase cars at car dealers, and the more car dealers there are and the more active they are on Twitter, the easier it is to purchase a car.
Table 10.3 shows the correlations between similarity indexes for the Five Factors and sales. The correlations indicate that congruence between customer and brand personality for Neuroticism leads to a significant increase in sales. Table 10.4 indicates significant correlations for the facets of Trust, Cautiousness, Dutifulness, Anger, and Anxiety, with Cautiousness having the highest correlation (0.29). Regarding Needs (Table 10.5), we find a particularly large number of significant correlations. Only Structure similarity has a negative influence, which shows that similarity can also negatively impact sales for certain personality dimensions. Two out of five correlations for Values are also significant (Table 10.6): the Values of Conservation and Hedonism have strong and positive correlations.
Table 10.7 shows the correlations between similarity for Consumption Preferences and sales. It is important to note that although all Consumption Preferences are shown, the relevance of some preferences for the automotive industry must be doubted. Even though the similarity indexes of the preferences Movie Science Fiction and Music Latin correlate strongly with sales, it is difficult to interpret these scores unambiguously. Still, some relationships make much sense for the automotive industry. For example, the Clothes Quality and Clothes Comfort preferences can be transferred well to the automotive industry: there are customers for whom the quality of a car is important, and there are customers for whom comfort is more important.
Table 10.7    Correlations of consumption preference similarity and sales

Consumption Preferences           Correlation of Similarity and Sales
Shopping preferences
  Automobile ownership cost       −0.0336
  Automobile safety               0.0065
  Clothes quality                 0.3453**
  Clothes style                   0.0165
  Clothes comfort                 0.3445**
  Influence brand name            0.1491
  Influence utility               0.0153
  Influence online ads            −0.0927
  Influence social media          0.1436
  Influence family members        0.0500
  Spur of moment                  0.2013*
  Credit card payment             0.3324**
Movie preferences
  Movie romance                   −0.0423
  Movie adventure                 0.3227**
  Movie horror                    −0.0974
  Movie musical                   0.2385*
  Movie historical                0.1821
  Movie science fiction           0.3234**
  Movie war                       0.0758
  Movie drama                     −0.0112
  Movie action                    −0.1683
  Movie documentary               0.3352**
Music preferences
  Music rap                       0.1348
  Music country                   −0.0938
  Music R&B                       0.1835
  Music hip hop                   0.1736
  Music live event                −0.0259
  Music playing                   0.0979
  Music Latin                     0.3947**
  Music rock                      0.2134*
  Music classical                 0.1187
Reading and learning preferences
  Read frequency                  0.1750
  Books entertainment magazines   −0.0905
  Books nonfiction                0.3625**
  Books financial investing       0.0934
  Books autobiographies           −0.1838
Health and activity preferences
  Eat out                         0.2488*
  Gym membership                  −0.0067
  Outdoor                         0.0356
Entrepreneurship preferences
  Start business                  0.2904**
Environmental concern preferences
  Concerned environment           0.3592**
Volunteering preferences
  Volunteer                       −0.1109

Notes: n = 102; * p < 0.05; ** p < 0.01.
Similarity for both preferences correlates strongly with sales, suggesting that customers take quality and comfort into account when buying a car and choose a brand that fits their expectations of quality and comfort. Likewise, similarity regarding Concerned Environment correlates significantly with sales, indicating that customers pay strong attention to a brand's attitude towards the environment when choosing a manufacturer. In total, 14 significant correlations were identified for Consumption Preferences.
As shown in Table 10.8, only similarity for the tribe Personality Journalist is significantly and positively correlated with sales. In contrast, four significant negative correlations (Liberalism, Socialism, Risk-taker, and Travel) were found. These negative correlations additionally illustrate that similarity does not always impact sales positively but can also have a negative impact. This can be the case when customers see the opportunity to complement their personality to increase self-esteem (Gao et al., 2009; Park & John, 2010).
Table 10.8    Correlations of tribe affiliation similarity and sales

Tribes                  Correlation of Similarity with Sales
Alternative Realities
  Fatherlander          0.1177
  Nerd                  0.1786
  Spiritualism          −0.0567
  Treehugger            −0.0412
Emotions
  Anger                 −0.1308
  Fear                  0.1046
  Happy                 0.0496
  Sad                   0.1900
Ideology
  Capitalism            −0.0556
  Complainers           −0.0391
  Liberalism            −0.2165*
  Socialism             −0.3031**
Lifestyles
  Fitness               −0.0269
  Sedentary             −0.1598
  Vegan                 0.1805
  Yolo                  0.0295
Personality
  Journalist            0.1948*
  Politician            −0.099
  Risk-taker            −0.2118*
  Stock-trader          −0.155
Recreation
  Arts                  −0.1008
  Fashion               −0.0708
  Sport                 0.0241
  Travel                −0.2539*

Notes: n = 102; * p < 0.05; ** p < 0.01.
Figure 10.6    Four important categories of brand–customer congruence
Note: Light gray text stands for a positive influence and black for a negative influence.
Figure 10.6 summarizes the results presented above. The most significant influencing factors were classified into four categories based on their correlations and meaning. Steadiness describes how much people value continuity and stability, whereas Product Quality summarizes personality characteristics that define how much people value high-quality products. Adventure contains personality characteristics that describe how adventurous someone is. Similarity in Steadiness, Product Quality, and Adventure positively affects a company's success. In contrast, customers seem to value dissimilarity with characteristics from the Complements category: the more different customer and brand personalities are in Complements, the more successful the company is. This difference is illustrated in Figure 10.6. Based on these results, marketing managers can infer which personality characteristics their customers have and which brand personality traits they value the most. For example, the two brands with the best-selling cars, Ford and Toyota, are excellent at matching their brand personality to that of their customers. Both are among the top three brands with the highest similarity indexes in Steadiness. In addition, Ford has the best similarity index for Product Quality, and Toyota is among the top three manufacturers with the best similarity for Adventure. Other manufacturers such as Infiniti or Cadillac still have a long way to go to adapt their brand personality to that of their customers.
Finally, to test how much of the variance in sales can be explained by congruence between customer and brand personality, a panel regression was conducted (Baltagi, 2008). Since the Breusch–Pagan Test indicates heteroskedasticity (p-value = 6.03e−05) and the Durbin–Watson Test indicates positive autocorrelation in the data (1.23), either a Fixed-effects Panel Regression or a Random-effects Panel Regression must be used.
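These diagnostics can be reproduced with standard Python tooling; the following is a rough sketch under stated assumptions — the input file brand_quarters.csv and its columns are hypothetical placeholders, with the regressors named after the features that survive in Table 10.9, not the authors' original script.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

# Hypothetical panel: one row per brand-quarter observation
df = pd.read_csv("brand_quarters.csv")
X = sm.add_constant(df[["need_stability", "need_structure",
                        "ideology_socialism", "dealerRatio",
                        "potentialCustomers"]])
resid = sm.OLS(df["sales"], X).fit().resid

# Breusch-Pagan: a small p-value rejects homoskedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)

# Durbin-Watson: values well below 2 suggest positive autocorrelation
print("Durbin-Watson statistic:", durbin_watson(resid))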
Table 10.9    Results of fixed-effects Panel Regression

Panel OLS Estimation Summary
Dep. Variable: sales            Estimator: PanelOLS
No. Observations: 102           Cov. Estimator: Unadjusted
Date: Tue, Aug 03 2021          Time: 07:23:05
Entities: 25                    Time periods: 6
Obs. per entity: avg 4.08, min 1, max 6
Obs. per time period: avg 17, min 10, max 21
R²: 0.5404                      R² (Within): 0.6004
R² (Between): −4.3693           R² (Overall): 0.5182
Log-likelihood: −1311.7
F-statistic: 21.339, distribution F(5,91), p-value 0.000
F-statistic (robust): 21.339, distribution F(5,91), p-value 0.000

Parameter Estimates
Parameter            Estimate     Std. Err.   T-stat    P-value   Lower CI     Upper CI
const                6.644e+05    2.245e+05   2.9591    0.0039    2.184e+05    1.11e+06
need_stability       3.226e+05    9.008e+04   3.5816    0.0006    1.437e+05    5.015e+05
need_structure       −4.417e+05   1.028e+05   −4.2988   0.0000    −6.459e+05   −2.376e+05
ideology_socialism   −6.543e+05   2.271e+05   −2.8812   0.0049    −1.105e+06   −2.032e+05
dealerRatio          1.138e+06    1.955e+05   5.8199    0.0000    7.494e+05    1.526e+06
potentialCustomers   274.75       69.820      3.9352    0.0002    136.06       413.44
To test which of these two panel regressions fits best, a Hausman Test was performed. The low p-value (8.03e−38 < 0.05) shows that the Fixed-effects Panel Regression is the best choice. For the Fixed-effects Panel Regression, all significant similarity indexes from the Pearson correlation analysis (see Tables 10.3–10.8) and the control variables Dealer Ratio and the number of Potential Customers were used as input features. The number of followers was not considered, as it was not found to be significant. Then, one by one, the least significant features were removed until only significant features remained. The results of the Fixed-effects Panel Regression are shown in Table 10.9. The significant similarity indexes and the two control variables could explain more than 54 percent of the sales variance. If only the control variables are considered, the explained variance decreases to 28.9 percent. Taken alone, the similarity indexes of Need Stability, Need Structure, and Ideology Socialism can explain 25.1 percent of the sales variance. All other significant correlations from Tables 10.2 to 10.8 are significant on their own but lose significance in the regression models.
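A sketch of the final estimation with the linearmodels package might look as follows; the brand/quarter index and the file name are assumptions, while the regressors mirror Table 10.9. This illustrates the technique rather than reproducing the authors' code.

import pandas as pd
from linearmodels.panel import PanelOLS

df = pd.read_csv("brand_quarters.csv")
panel = df.set_index(["brand", "quarter"])  # entity and time dimensions

# Fixed-effects (entity) regression of sales on the surviving features
model = PanelOLS.from_formula(
    "sales ~ 1 + need_stability + need_structure + ideology_socialism"
    " + dealerRatio + potentialCustomers + EntityEffects",
    data=panel,
)
print(model.fit().summary)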
5. CONCLUSION, IMPLICATIONS, AND FUTURE RESEARCH
Our findings show that customer and brand personality congruence can impact company sales. We find 34 significant correlations between similarity in customer and brand personality and sales (29 of them positive). However, there are also characteristics with a negative effect. Previously published papers on congruence between customer and brand personality considered other outcome variables, such as the impact on customer satisfaction or loyalty (Alguacil et al., 2019; Donvito et al., 2020; Jamal & Goode, 2001). In addition, to the best of our knowledge, this study is the first to demonstrate the impact of congruent customer and brand personality in the automotive industry. Studies on customer and brand personality have already been published, but without measuring a possible correlation with sales (Dikcius et al., 2013).
This work shows that personality analysis based on social media data can lead to good results. Such data analyses make it easier to analyze large customer samples without going through time-consuming interviews or surveys. Nonetheless, privacy should always be considered, which is why data should only be published anonymously and not with profile names or other personal information.
Our findings also have implications for car manufacturers. The main finding shows that brands that are similar to their customers are more successful than other brands. However, the results also show that a fit does not have to exist for every personality characteristic for a brand to be successful. For example, we find that, while customers must be financially able to afford a car, a fit in other factors plays a more significant role. Likewise, a fit in safety-related aspects does not seem to be significantly relevant. This could be because potential customers assume that all registered cars are safe and have been tested by the respective authorities. On the other hand, there are many factors for which a fit significantly impacts sales. Congruence in the scores of Excitement, Ideal, Practicality, Stability, Clothes Quality, Clothes Comfort, and Concerned Environment seems particularly relevant. Although the consumption preferences Clothes Quality and Clothes Comfort refer to clothing, they can also be applied to the automotive industry. The relevance of Concerned Environment also shows that customers take into account the environmental impact of cars. As much of the population becomes more environmentally conscious, car manufacturers
should continue to invest in green technologies, which they will have to do in any case due to stricter environmental regulations.
In addition to information on how personality congruence affects sales, car manufacturers can also use our tools to analyze their customers' personalities. This enables marketing channels to be tailored to customers and their needs. For example, Bowden-Green et al. (2020) show that people with a high score for Extraversion spend much time on social media platforms. Thus, a marketing campaign on social media platforms pays off especially for companies with many extroverted customers.
Future research could apply the methodology presented in this work to other sectors besides the automotive industry. One could also identify factors that reinforce or mediate the influence. For example, one could examine whether the cost of a product has a significant impact on the influence of brand–customer personality congruence, ultimately affecting business success. In addition, one could examine the effect of cultural factors (Hofstede et al., 2005). To obtain a larger dataset with even more reliable results, this study could be conducted over a more extended period of time. In addition, different dimensions of success could be considered (e.g., revenue, profit, return on equity, return on sales). Lastly, the limitations in data collection could be addressed by cooperating with car manufacturers and asking their customers to provide a social media profile after a car purchase. Other social media platforms could be included in the analysis so that customers who only have a Facebook account, for example, could also be covered. In addition to free-text information, other features such as images could also be evaluated to increase the quality of the personality analysis.
In conclusion, it can be said that brands that have a similar personality to their customers are more successful than others. However, there are some traits for which it is better to have dissimilar scores to complement customers' personality characteristics. Using text mining to analyze customer and brand personality can help marketing managers align new products with the expectations of their customers.
REFERENCES

Aaker, D. (1996). Building Strong Brands. The Free Press.
Aaker, J. L. (1997). Dimensions of brand personality. Journal of Marketing Research, 34(3), 347–56. https://doi.org/10.1177/002224379703400304.
Alguacil, M., Núñez-Pomar, J., Pérez-Campos, C., & Prado-Gascó, V. (2019). Perceived value, satisfaction and future intentions in sport services. Academia Revista Latinoamericana de Administración, 32(4), 566–79. https://doi.org/10.1108/ARLA-04-2019-0099.
Ambroise, L., Jean Marc, F., Merunka, D., Valette-Florence, P., & De Barnier, V. (2005). How well does brand personality predict brand choice? A measurement scale and analysis using binary regression models. Advances in Consumer Research Asia-Pacific Conference Proceedings, Vol. 6 (pp. 30–38). ACR.
Amichai-Hamburger, Y., & Vinitzky, G. (2010). Social network use and personality. Computers in Human Behavior, 26(6), 1289–95. https://doi.org/10.1016/j.chb.2010.03.018.
Ashton, M. C., & Lee, K. (2001). A theoretical basis for the major dimensions of personality. European Journal of Personality, 15(5), 327–53. https://doi.org/10.1002/per.417.
Assaad, W., & Gómez, J. M. (2011). Social network in marketing (social media marketing) opportunities and risks. International Journal of Managing Public Sector Information and Communication Technologies (IJMPICT), 2(1), 13–22. https://doi.org/10.5121/ijmpict.2011.2102.
Azucar, D., Marengo, D., & Settanni, M. (2018). Predicting the Big 5 personality traits from digital footprints on social media: a meta-analysis. Personality and Individual Differences, 124, 150–159. https://doi.org/10.1016/j.paid.2017.12.018.
Baltagi, B. H. (2008). Econometric Analysis of Panel Data (Vol. 4). Springer.
Blackwell, D., Leaman, C., Tramposch, R., Osborne, C., & Liss, M. (2017). Extraversion, neuroticism, attachment style and fear of missing out as predictors of social media use and addiction. Personality and Individual Differences, 116, 69–72. https://doi.org/10.1016/j.paid.2017.04.039.
Bowden-Green, T., Hinds, J., & Joinson, A. (2020). How is extraversion related to social media use? A literature review. Personality and Individual Differences, 164, 110040. https://doi.org/10.1016/j.paid.2020.110040.
Celli, F., Bruni, E., & Lepri, B. (2014). Automatic personality and interaction style recognition from Facebook profile pictures. Proceedings of the 22nd ACM International Conference on Multimedia, 1101–4. https://doi.org/10.1145/2647868.2654977.
Costa, P. T., & McCrae, R. R. (2008). The Revised NEO Personality Inventory (NEO-PI-R). In G. J. Boyle et al. (eds), The SAGE Handbook of Personality Theory and Assessment: Volume 2 – Personality Measurement and Testing (pp. 179–98). SAGE Publications. https://doi.org/10.4135/9781849200479.n9.
Cova, B. (1996). What postmodernism means to marketing managers. European Management Journal, 14(5), 494–9. https://doi.org/10.1016/0263-2373(96)00043-6.
Cova, B., & Cova, V. (2002). Tribal marketing: the tribalisation of society and its impact on the conduct of marketing. European Journal of Marketing, 36(5–6), 595–620. https://doi.org/10.1108/03090560210423023.
Cova, B., & Pace, S. (2006). Brand community of convenience products: new forms of customer empowerment – the case "my Nutella The Community". European Journal of Marketing, 40(9–10), 1087–105. https://doi.org/10.1108/03090560610681023.
De Choudhury, M., Counts, S., Horvitz, E. J., & Hoff, A. (2014). Characterizing and predicting postpartum depression from shared Facebook data. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, 626–38. https://doi.org/10.1145/2531602.2531675.
Demling, A., & Jahn, T. (2021). Apple konkretisiert seine Autopläne – und findet wohl einen Partner. https://www.handelsblatt.com/technik/itinternet/autoindustrie-apple-konkretisiert-seine-autoplaene-und-findet-wohl-einenpartner/26881614.html (last accessed 23 May 2022).
Dhanabalan, T., Subha, K., Shanthi, R., & Sathish, A. (2018). Factors influencing consumers' car purchasing decision in Indian automobile industry. International Journal of Mechanical Engineering and Technology, 9(10), 53–63.
Dikcius, V., Seimiene, E., & Zaliene, E. (2013). Congruence between brand and consumer personalities. Economics and Management, 18(3), 526–536. https://doi.org/10.5755/j01.em.18.3.5071.
Donvito, R., Aiello, G., Grazzini, L., Godey, B., Pederzoli, D., Wiedmann, K.-P., Halliburton, C., Chan, P., Tsuchiya, J., Skorobogatykh, I. I., Oh, H., Singh, R., Ewing, M., Lee, Y., Fei, L., Chen, C. R., & Siu, N. Y.-M. (2020). Does personality congruence explain luxury brand attachment? The results of an international research study. Journal of Business Research, 120, 462–472. https://doi.org/10.1016/j.jbusres.2020.06.047.
Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S., Agrawal, M., Dziurzynski, L. A., Sap, M., Weeg, C., Larson, E. E., Ungar, L. H., & Seligman, M. E. P. (2015). Psychological language on Twitter predicts county-level heart disease mortality. Psychological Science, 26(2), 159–69. https://doi.org/10.1177/0956797614557867.
Farnadi, G., Sitaraman, G., Sushmita, S., Celli, F., Kosinski, M., Stillwell, D., Davalos, S., Moens, M.-F., & De Cock, M. (2016). Computational personality recognition in social media. User Modeling and User-Adapted Interaction, 26(2–3), 109–42. https://doi.org/10.1007/s11257-016-9171-0.
Feher, A., & Vernon, P. A. (2021). Looking beyond the Big Five: a selective review of alternatives to the Big Five model of personality. Personality and Individual Differences, 169, 110002. https://doi.org/10.1016/j.paid.2020.110002.
Gao, L., Wheeler, S. C., & Shiv, B. (2009). The "shaken self": product choices as a means of restoring self-view confidence. Journal of Consumer Research, 36(1), 29–38. https://doi.org/10.1086/596028.
Gao, P., Kaas, H.-W., Mohr, D., & Wee, D. (2016). Automotive revolution – perspective towards 2030: how the convergence of disruptive technology-driven trends could transform the auto industry. Advanced Industries, McKinsey & Company. https://www.mckinsey.com/~/media/mckinsey/industries/automotive%20and%20assembly/our%20insights/disruptive%20trends%20that%20will%20transform%20the%20auto%20industry/auto%202030%20report%20jan%202016.pdf.
Gao, R., Hao, B., Bai, S., Li, L., Li, A., & Zhu, T. (2013). Improving user profile with personality traits predicted from social media content. Proceedings of the 7th ACM Conference on Recommender Systems, 355–8. https://doi.org/10.1145/2507157.2507219.
Gloor, P., Fronzetti Colladon, A., de Oliveira, J. M., & Rovelli, P. (2020). Put your money where your mouth is: using deep learning to identify consumer tribes from word usage. International Journal of Information Management, 51, 101924. https://doi.org/10.1016/j.ijinfomgt.2019.03.011.
Gloor, P. A., Fronzetti Colladon, A., de Oliveira, J. M., Rovelli, P., Galbier, M., & Vogel, M. (2019). Identifying tribes on Twitter through shared context. In Y. Song, F. Grippa, P. A. Gloor, & J. Leitão (eds), Collaborative Innovation Networks: Latest Insights from Social Innovation, Education, and Emerging Technologies Research (pp. 91–111). Springer International.
Golbeck, J., Robles, C., & Turner, K. (2011). Predicting personality with social media. Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems – CHI EA '11, 253. https://doi.org/10.1145/1979742.1979614.
Gosling, S. D., Rentfrow, P. J., & Swann, W. B. (2003). A very brief measure of the Big-Five personality domains. Journal of Research in Personality, 37(6), 504–28. https://doi.org/10.1016/S0092-6566(03)00046-1.
Goulding, C., Shankar, A., & Canniford, R. (2013). Learning to be tribal: facilitating the formation of consumer tribes. European Journal of Marketing, 47(5–6), 813–32. https://doi.org/10.1108/03090561311306886.
Hamilton, K., & Hewer, P. (2010). Tribal mattering spaces: social-networking sites, celebrity affiliations, and tribal innovations. Journal of Marketing Management, 26(3–4), 271–89. https://doi.org/10.1080/02672571003679894.
Hofstede, G., Hofstede, G. J., & Minkov, M. (2005). Cultures and Organizations: Software of the Mind (Vol. 2). McGraw-Hill.
Holzmann, P. S. (2020). Personality. Encyclopedia Britannica. https://www.britannica.com/topic/personality (last accessed 23 May 2022).
Holzweber, M., Mattsson, J., & Standing, C. (2015). Entrepreneurial business development through building tribes. Journal of Strategic Marketing, 23(7), 563–78. https://doi.org/10.1080/0965254X.2014.1001864.
IBM (2021a). IBM Watson personality insights API documentation. https://cloud.ibm.com/apidocs/personality-insights (last accessed 23 May 2022).
IBM (2021b). The science behind the service. https://cloud.ibm.com/docs/personality-insights?topic=personality-insights-science (last accessed 23 May 2022).
Jamal, A., & Goode, M. M. H. (2001). Consumers and brands: a study of the impact of self-image congruence on brand preference and satisfaction. Marketing Intelligence & Planning, 19(7), 482–92. https://doi.org/10.1108/02634500110408286.
Kosinski, M., Bachrach, Y., Kohli, P., Stillwell, D., & Graepel, T. (2014). Manifestations of user personality in website choice and behaviour on online social networks. Machine Learning, 95(3), 357–80. https://doi.org/10.1007/s10994-013-5415-y.
Kressmann, F., Sirgy, M. J., Herrmann, A., Huber, F., Huber, S., & Lee, D.-J. (2006). Direct and indirect effects of self-image congruence on brand loyalty. Journal of Business Research, 59(9), 955–64. https://doi.org/10.1016/j.jbusres.2006.06.001.
Kuss, D. J., & Griffiths, M. D. (2011). Online social networking and addiction—a review of the psychological literature. International Journal of Environmental Research and Public Health, 8(9), 3528–3552. https://doi.org/10.3390/ijerph8093528.
Leitel, K. (2018). Cambridge Analytica – die Firma hinter dem Facebook-Datenskandal. https://www.handelsblatt.com/unternehmen/it-medien/politischekampagnen-cambridge-analytica-die-firma-hinter-dem-facebookdatenskandal/21087302.html (last accessed 23 May 2022).
Lin, L. (2010). The relationship of consumer personality trait, brand personality and brand loyalty: an empirical study of toys and video games buyers. Journal of Product & Brand Management, 19(1), 4–17. https://doi.org/10.1108/10610421011018347.
Lindemann, M., Briele, K., & Schmitt, R. H. (2020). Methodical data-driven integration of customer needs from social media into the product development process. Procedia CIRP, 88, 127–132. https://doi.org/10.1016/j.procir.2020.05.023.
Liu, L., Preotiuc-Pietro, D., Samani, Z. R., Moghaddam, M. E., & Ungar, L. (2016). Analyzing personality through social media profile picture choice. Tenth International AAAI Conference on Web and Social Media. https://doi.org/10.1609/icwsm.v10i1.14738.
Llopis-Albert, C., Rubio, F., & Valero, F. (2021). Impact of digital transformation on the automotive industry. Technological Forecasting and Social Change, 162, 120343. https://doi.org/10.1016/j.techfore.2020.120343.
Maehle, N., & Shneor, R. (2010). On congruence between brand and human personalities. Journal of Product & Brand Management, 19(1), 44–53. https://doi.org/10.1108/10610421011018383.
Marengo, D., & Montag, C. (2020). Digital phenotyping of big five personality via Facebook data mining: a meta-analysis. Digital Psychology, 1(1), 52–64. https://doi.org/10.24989/dp.v1i1.1823.
McCrae, R. R., & Costa, P. T. (1987). Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology, 52(1), 81–90. https://doi.org/10.1037/0022-3514.52.1.81.
McCrae, R. R., & Costa Jr., P. T. (2004). A contemplated revision of the NEO Five-Factor Inventory. Personality and Individual Differences, 36(3), 587–96.
McCrae, R. R., Costa Paul, T. J., & Martin, T. A. (2005). The NEO–PI–3: a more readable revised NEO personality inventory. Journal of Personality Assessment, 84(3), 261–70.
McCrae, R. R., & John, O. P. (1992). An introduction to the Five-Factor Model and its applications. Journal of Personality, 60(2), 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x.
Mulyanegara, R. C., Tsarenko, Y., & Anderson, A. (2009). The Big Five and brand personality: investigating the impact of consumer personality on preferences towards particular brand personality. Journal of Brand Management, 16(4), 234–47. https://doi.org/10.1057/palgrave.bm.2550093.
Norman, W. T. (1963). Toward an adequate taxonomy of personality attributes: replicated factor structure in peer nomination personality ratings. The Journal of Abnormal and Social Psychology, 66(6), 574–83. https://doi.org/10.1037/h0040291.
Park, J. K., & John, D. R. (2010). Got to get you into my life: do brand personalities rub off on consumers? Journal of Consumer Research, 37(4), 655–69. https://doi.org/10.1086/655807.
Pratama, B. Y., & Sarno, R. (2015). Personality classification based on Twitter text using Naive Bayes, KNN and SVM. 2015 International Conference on Data and Software Engineering (ICoDSE), 170–74. https://doi.org/10.1109/ICODSE.2015.7436992.
Quercia, D., Kosinski, M., Stillwell, D., & Crowcroft, J. (2011). Our Twitter profiles, our selves: predicting personality with Twitter. 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing, 180–85. https://doi.org/10.1109/PASSAT/SocialCom.2011.26.
Schmitt, D. P., Allik, J., McCrae, R. R., & Benet-Martínez, V. (2007). The geographic distribution of big five personality traits. Journal of Cross-Cultural Psychology, 38(2), 173–212. https://doi.org/10.1177/0022022106297299.
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E. P., & Ungar, L. H. (2013). Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS ONE, 8(9), e73791. https://doi.org/10.1371/journal.pone.0073791.
Schwartz, H. A., & Ungar, L. H. (2015). Data-driven content analysis of social media. The Annals of the American Academy of Political and Social Science, 659(1), 78–94. https://doi.org/10.1177/0002716215569197.
Schwarz, D. (2018). Datenmissbrauch und Beeinflussung der US-Wahl – Facebook kündigt Analyseunternehmen. https://www.handelsblatt.com/unternehmen/it-medien/datensicherheitdatenmissbrauch-und-beeinflussung-der-us-wahl-facebook-kuendigtanalyseunternehmen/21083604.html.
Seidman, G. (2013). Self-presentation and belonging on Facebook: how personality influences social media use and motivations. Personality and Individual Differences, 54(3), 402–7. https://doi.org/10.1016/j.paid.2012.10.009.
Settanni, M., & Marengo, D. (2015). Sharing feelings online: studying emotional well-being via automated text analysis of Facebook posts. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01045.
Skowron, M., Tkalčič, M., Ferwerda, B., & Schedl, M. (2016). Fusing social media cues: personality prediction from Twitter and Instagram. Proceedings of the 25th International Conference Companion on World Wide Web – WWW '16 Companion, 107–8. https://doi.org/10.1145/2872518.2889368.
Smith, T. A. (2020). The role of customer personality in satisfaction, attitude-to-brand and loyalty in mobile services. Spanish Journal of Marketing – ESIC, 24(2), 155–75. https://doi.org/10.1108/SJME-06-2019-0036.
Sop, S. (2020). Self-congruity theory in tourism research: a systematic review and future research directions. European Journal of Tourism Research, 26, 1–19, 2604. https://doi.org/10.54055/ejtr.v26i.1935.
Sumner, C., Byers, A., Boochever, R., & Park, G. J. (2012). Predicting dark triad personality traits from Twitter usage and a linguistic analysis of tweets. 11th International Conference on Machine Learning and Applications, 386–93. https://doi.org/10.1109/ICMLA.2012.218.
Tandera, T., Hendro, Suhartono, D., Wongso, R., & Prasetio, Y. L. (2017). Personality prediction system from Facebook users. Procedia Computer Science, 116, 604–11. https://doi.org/10.1016/j.procs.2017.10.016.
Tankovska, H. (2021). Number of global social network users 2017–2025. Statista. https://www.statista.com/statistics/278414/number-of-worldwide-socialnetwork-users/ (last accessed 23 May 2022).
Tyborksi, R., & Demling, A. (2020). Autoindustrie droht beim autonomen Fahren den Anschluss an Google zu verlieren. https://www.handelsblatt.com/unternehmen/industrie/roboterautos-autoindustriedroht-beim-autonomen-fahren-den-anschluss-an-google-zu-verlieren/25832202.html (last accessed 23 May 2022).
Wei, H., Zhang, F., Yuan, N. J., Cao, C., Fu, H., Xie, X., Rui, Y., & Ma, W.-Y. (2017). Beyond the words: predicting user personality from heterogeneous information. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 305–14. https://doi.org/10.1145/3018661.3018717.
Wu, S., Ren, M., Pitafi, A. H., & Islam, T. (2020). Self-image congruence, functional congruence, and mobile app intention to use. Mobile Information Systems, 2020, 1–17. https://doi.org/10.1155/2020/5125238.
11. Netnography 2.0: a new approach to examine crowds on social media Mathias Efinger, Xisa Lina Eich, Marius Heck, Dung Phuong Nguyen, Halil Ibrahim Özlü, Teresa Heyder and Peter A. Gloor
1. INTRODUCTION

Today, companies compete in digital environments in essentially all of their domains, which brings new opportunities but also new challenges (Park & Mithas, 2020). One of the biggest new opportunities is to use the surrounding flow of data for the benefit of both vendors and customers (Mariani & Fosso Wamba, 2020). The information created by the members of the numerous online communities exists as big data and is accessible to every connected actor, including companies (Yun et al., 2020). This new data backbone can serve as a customer registry and, if used correctly, lead to improved communication with the customer. A registry that offers a company detailed insights into all its customers has been shown to be highly valuable (Chi-Hsien & Nagasawa, 2019). As research points out, collecting and using data correctly is one of the crucial tasks in the digital environment (Tan et al., 2017).
Following this thought, the value of knowing your customers can be vividly seen in daily marketing and product design operations (Varadarajan, 2020). Customer-centric approaches are increasingly found in companies, which place questions such as "What does the market want? What does the customer need?" right at the beginning of their innovation process (Knight et al., 2020). Alongside these important questions, companies aim to create products or services that are tailored to satisfy each user's needs (Wang et al., 2021). Again, these processes build strongly on customer data, which is needed to create an effective marketing strategy (Varadarajan, 2020). Even after product design and development is done, marketers face the challenge of offering the tailored product to the individual customer. It is common for individualized advertisements to pop up on users' social networks and visited web pages (Cho & Hongsik, 2004). By understanding why a user visited certain pages and searched for specific products, even more effective advertising can be applied, for example with the previously named customer registry. Advertisements could thus be targeted more precisely and additionally be duplicated for every other user who appears similar to the original user. Important prerequisites for exploiting the full potential of these methods are being connected with and informed about the customer.
While the idea of interacting with every customer individually is interesting for companies in theory, it is almost infeasible in practice: the effort required to handle the numerous customers of the digital landscape person by person is immense. To use this performance-enhancing customer knowledge and cope with the size of networks at the same time, similar actors must be grouped. The process of grouping or segmenting people with
similar interests has been well recognized in various conceptual approaches before (e.g., Deng & Gao, 2020; Kuruba Manjunath & Kashef, 2021; Zhou et al., 2021). Building on these concepts, the goal of this chapter is to combine them with new techniques to propose a new customer segmentation approach that is contemporary, comprehensive, and easily applicable. Together with our customer segmentation approach, we contribute a set of guidelines for companies that are interested in effective and targeted segmentation based on digital social media data.
To develop the guidelines, we conducted a case study of group segmentation based on the Covid-19 pandemic. This current topic offers an extensive data basis to interact with, as it has been discussed extensively on social media (Cinelli et al., 2020). The topic is additionally interesting because the different opinion groups are diverse and multifaceted, and therefore ideal for a segmentation case study. This sample thus challenged our approach with a complex instead of a homogeneous dataset, and helped us apply the foundations of previous literature on customer segmentation and social network analysis in a real and recent environment.
The following section comprises the theoretical background relevant to our approach. We rethought customer segmentation with a particular focus on social interactions in digital environments, building on the concept of Netnography proposed by Kozinets (1998). In the sections "Methodology" and "Results" we illustrate the practical use of our new approach, exemplified by analyzing anti-vaxxers and protesters during the Covid-19 pandemic. Processing the available data is described in the Methodology section, while the results are presented in the subsequent section. In the penultimate section we elaborate and discuss the insights gained by the methodology presented beforehand; this leads to several practical and theoretical implications that can be applied to other use cases. Additionally, we discuss the limitations of our approach. In the concluding section, we provide a summary of the most important findings and contributions of this chapter and offer an outlook on possible future research.
2. THEORETICAL BACKGROUND

2.1 Customer Segmentation
Customer segmentation is important for successful marketing strategies and business development (An et al., 2018; Kim et al., 2006). It is the process of dividing customers into various mutually exclusive groups based on their common characteristics and interests (An et al., 2018; Datta et al., 2020). This enhances the understanding of customers and aims to address them in a more precise and targeted way (Datta et al., 2020). Customer segmentation is therefore helpful for companies across industries, especially to enhance their product and service offerings (Abdulhafedh, 2021). Over recent decades, many segmentation approaches and techniques have been developed and introduced (e.g., Wedel & Kamakura, 2000). Depending on the segmentation purpose, Yankelovich and Meer (2006, p. 1) posit that it "need[s] a different kind of non-demographic segmentation to investigate", as traditional demographic traits like age, education, gender, and income are not enough to tailor product and service offerings in today's business world. Furthermore, the authors found that personal preferences, values, and attitudes are more likely to affect a customer's decision to purchase a product. They suggest that attitudinal indicators can help to segment customers based on their shared world view. Thus,
it is more promising to examine "people's lifestyles, attitudes, self-image, and aspirations", as these characteristics are dynamic and may change along with an individual's values and environment (Yankelovich & Meer, 2006, p. 2).
Organizations have increasingly started to use social media as a source of customer insights to identify and profile customers. For example, the qualitative study by Canhoto et al. (2013) revealed that emerging segmentation practices based on social media data are a promising approach to complement traditional segmentation approaches that are still valid in the present socio-technical world. Social media data offer meaningful insights into customers' thoughts and feelings about a product or an organization in real time, thus contributing to a more holistic view and understanding of the customers – even in a fast-paced and changing environment. Another significant advantage of observing social media activities is that customers may segment themselves by joining online communities of interest, which can improve the accuracy of segmentation (Canhoto et al., 2013; Hines & Quinn, 2005). Güçdemir and Selim (2015) showed that using a clustering and multi-criteria decision-making approach for business customer segmentation can lead to a sustainable competitive advantage in the market. Another study relying on big data by Vojtovič et al. (2016) introduced a method for implementing real-time customer segmentation to provide a long-term competitive advantage inside an enterprise. Teichert et al. (2008) conducted a study with 5,800 airline passengers to develop a customer segmentation that differentiates more than simply between business and economy class, as customers have far more complex desires nowadays and are willing to pay different prices for different services. The segments they used are distinguished by behavioral and socio-demographic variables. They conclude that "effective customer segmentation is crucial for a sustainable product strategy" (Teichert et al., 2008, p. 228). Krause and Battenfeld (2019, p. 890) even state that effective customer segmentation is essential to "promote sustainable consumption in finance". Consequently, effective customer segmentation has the potential to positively contribute to a company's success.

2.2 Netnography

The rise of social media such as Facebook and X digitally brought together large communities (van Dijck, 2013). Many of the users in those communities share no more than a specific interest with other users and are therefore different from previous non-digital communities (Bollini, 2011): they do not have to share the same age, gender, nationality, residence, and so on to be part of a crowd. To address those superficially diverse crowds, it is necessary to use methods that adapt to the digital environment. As seen earlier in this chapter, companies in particular, which understand crowds as potential customers, use segmentation tools for that reason (An et al., 2018). A method that emerged in this domain and embraces its digital nature is "Netnography" (Kozinets, 1998). Netnography combines ethnographic methods of traditional customer research and culture studies with the internet and its communities to form a qualitative segmentation approach.
The procedure Kozinets (2002) outlines "include[s] (1) making cultural entree, (2) gathering and analyzing data, (3) ensuring trustworthy interpretation, (5) [sic] conducting ethical research, and (6) [sic] providing opportunities for culture member feedback". He points out that, after selecting the desired communities as well as collecting and interpreting the data, steps 5 and 6 are especially distinctive for Netnography. Researchers analyze only the content of a digital
community, but not its full set of acts. Also, users act differently depending on whether the space is public or private. To ascertain the accuracy of the method, user feedback is recommended, giving the observed community the opportunity to verify whether the observations are correct or false (Kozinets, 2002). During this process, data is created by users, which forms the basis of different cyber-cultural groups or tribes (Kozinets, 2002: 63). In addition to directly including written words or pictures, researchers can explore deeper constructs such as emotions, opinions, attitudes, rituals, and symbols that strengthen the mapping (Lee & Broderick, 2007). Companies that use this method gain an accurate understanding of their customer groups and are consequently able to apply more specific advertising and marketing strategies (Kozinets, 2002).
In an experiment, Kozinets studied the behavior of members of an online coffee community. Here, it was important to understand the principle of "lead users" proposed by von Hippel (1986: 798), as they are the people who "are at the leading edge of significant new marketing trends" (Kozinets, 2002: 70). This means that these users can spot upcoming trends months or even years before the majority (von Hippel, 1986), which makes them very valuable from a marketing and product placement perspective. Originally, the screening method was used to identify these users by interviewing a large number of potentially relevant users through written questionnaires or telephone interviews (Belz & Baumbach, 2010). Netnography simplifies this process, as it can be used to analyze entire online communities and their needs using public data (Kozinets, 2002).
When choosing communities to analyze, some guidelines should be followed: "In general, online communities should be preferred that have (1) a more focused and research question-relevant segment, topic, or group; (2) higher 'traffic' of postings; (3) larger numbers of discrete message posters; (4) more detailed or descriptively rich data; and (5) more between-member interactions of the type required by the research question" (Kozinets, 2002: 63). Our following case study was chosen according to these guidelines, and our approach therefore relies on Kozinets' recommendations for an effective segmentation approach.
3. METHODOLOGY

We decided to demonstrate our method using a case study. This seemed appropriate, as case studies are "particularly well suited to new research areas" (Eisenhardt, 1989: 548f). In our approach we build on and use parts of Kozinets' (1998) Netnography method, as this well-established approach delivers a suitable baseline for our research. The data collection and processing are based on Kozinets' approach and are complemented by the division into different segments.
Motivated by the ongoing Covid-19 pandemic, we demonstrated our approach using openly accessible data containing Covid-19-related content in the German-speaking region (Germany, Austria, and Switzerland). The content was obtained from the platform X, which is referred to below by the name by which it was known during the research, namely Twitter. We used the network analysis software Condor to perform data collection, processing, and analysis. It allowed us to gather Twitter data and to calculate relevant metrics (Zhang & Luo, 2017) for deeper analysis and interpretation. The definitions of the metrics we use are based on the work of Zhang and Luo because they provide a comprehensive approach and explanations for measuring relationships between actors in a network. This is extended by further metrics that follow the work of Gloor (2017). An overview of our process
is presented in Figure 11.1. The following sub-sections explain how we conducted the data collection, processing, and analysis.

3.1 Data Collection
To obtain accurate and reliable results, it was essential to establish an efficient way of collecting and storing the data. Condor allows its users to extract, process, visualize, and interpret data from various sources. By applying the 'Fetch Twitter' function, we collected the Twitter data that we used for the analysis. By specifying search terms, the most recent tweets containing these terms can be fetched and saved. Since a single search term would not have been sufficient to collect data across a broader range, we decided on the 15 search terms listed in Table 11.1 before initiating the data collection. Since our aim was to examine the German-speaking area, we used the German terms that can be found in parentheses. When defining the search terms, we followed the guidelines of Kozinets (2002) explained at the end of the previous section, and we chose terms associated with Covid-19 conspiracy theories to ensure that the right communities were identified for this analysis. Additionally, the frequency of tweet fetches needed to be determined to ensure that there were no major gaps in the collected data. We started the data collection on November 16, 2020. To ensure central data access for every team member and to accelerate data fetching, a virtual machine was set up; this enabled each team member to access the central MySQL database from their local Condor client independently. The data collection was terminated on January 31, 2021 and resulted in 24 individual datasets per search term, for a total of 360 datasets across all search terms. Table 11.2 shows the number of actors and connections for each search term over the observed period, before all datasets were merged into one final set. One should note that these numbers may contain duplicates due to overlaps during the data fetching on Twitter.
Table 11.1    Search terms for Twitter data collection

#    Search Term
1    Conspiracy (Verschwörung)
2    Bill Gates
3    Virus
4    Anti-Vaxxers (Impfgegner)
5    Side-effects (Nebenwirkungen)
6    Vaccine (Impfstoff)
7    Covid-19
8    Antibodies (Antikörper)
9    Covid vaccination (Corona-Impfung)
10   Vaccination damage (Impfschäden)
11   Compulsory vaccination (Impfzwang)
12   Contergan
13   Vaccination skeptics (Impfskeptiker)
14   Covid lie (Coronalüge)
15   Vaccination opposition (Impfgegnerschaft)
Table 11.2    Number of actors and connections based on selected Twitter keywords

Keyword                  Number of Actors   Number of Connections
Conspiracy               ≈69,000            ≈107,000
Bill Gates               ≈59,000            ≈88,000
Virus                    ≈57,000            ≈100,000
Side-effects             ≈55,000            ≈100,000
Anti-Vaxxers             ≈54,000            ≈99,000
Vaccine                  ≈51,000            ≈95,000
Covid-19                 ≈50,000            ≈94,000
Antibodies               ≈47,000            ≈84,000
Covid vaccination        ≈46,000            ≈91,000
Compulsory vaccination   ≈35,000            ≈67,000
Vaccination damage       ≈33,000            ≈64,000
Contergan                ≈14,000            ≈24,000
Vaccination skeptics     ≈8,000             ≈14,000
Corona lie               ≈4,000             ≈7,000
Vaccination opposition   ≈1,000             ≈1,000
3.2 Data Processing
To analyze our network, it was necessary to merge the datasets of the different search terms into a single dataset. Before merging, we analyzed the data of the individual search terms in more detail. In a second step, we merged these datasets, which formed the basis for our subsequent analysis. The merging process removed all duplicate actors and connections, resulting in a network of 115,412 actors and 584,406 connections, where each actor represents a Twitter account and each connection a tweet. Metrics for the processing and analysis were then calculated for the final dataset, as explained in the following paragraphs.
Following Zhang and Luo (2017), we use the following definitions for our selected metrics:
● Degree centrality is defined as the total number of direct links one actor has to other actors in the network.
● Betweenness centrality measures the influence of an actor in a network. An actor that lies on a relatively large number of paths between other actors is defined to have a high betweenness centrality.
● Closeness centrality indicates how close one actor is to all the other actors in a network.
Following Gloor (2017), we add these further definitions:
● The sentiment calculation, a function of Condor, evaluates the content of the collected tweets and assesses their sentiment.
● The contribution index indicates how many times someone has tweeted or has been retweeted.
● The turn-taking annotations indicate how quickly an account responds to a message or, in this case, a tweet.
● Ego ART (average response time) shows how quickly an actor responds to somebody else and is thus a proxy for an actor's passion.
● Alter ART indicates how quickly other actors respond to an individual and is therefore an indicator of how much somebody is respected.
After calculating the degree centrality, all actors that had fewer than 15 connections (degree centrality < 15) were removed. This way, we ensured that non-influential users were removed from the dataset; influence in this context refers to the number of direct connections of an actor. It allowed us to perform all subsequent calculations faster and made our dataset more manageable in terms of visualization, as the number of actors and connections decreased significantly.
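As an illustration of the degree filter and the centrality metrics just listed, a small sketch with the networkx library follows; the edge-list file is a hypothetical stand-in for the merged Condor dataset. Note that networkx normalizes degree centrality, whereas the definition above uses the raw number of direct links.

import networkx as nx

# Hypothetical edge list: one line per tweet connection between two accounts
G = nx.read_edgelist("tweet_edges.txt")

# Remove non-influential actors: raw degree (direct links) below 15
low_degree = [node for node, deg in G.degree() if deg < 15]
G.remove_nodes_from(low_degree)

degree = dict(G.degree())                   # raw direct-link counts
betweenness = nx.betweenness_centrality(G)  # share of shortest paths through a node
closeness = nx.closeness_centrality(G)      # inverse average distance to all others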
Figure 11.1    Process of our case study
The degree centrality threshold of 15 was chosen as it ensured that users were active on the platform while maintaining a network large enough for a comprehensive analysis. This resulted in a network of 9,666 actors and 212,073 connections. For the reduced network, the following metrics were calculated: betweenness centrality, closeness centrality, sentiment, contribution index, and turn-taking annotations. This resulted in the final dataset on which we performed our main analysis. We used Condor's Tribefinder function (Gloor, 2017) to analyze the identified segments of our conspiracy theorists. Based on the content of each user's posts, the Tribefinder algorithm divides users into the categories of Alternative Realities, Recreation, Emotions, Lifestyle, Ideology, and Personality. To be able to do so, we had to translate our final dataset into English. For this purpose, we wrote a Python script that translated each item using the Google Translator API via HTTP REST requests.
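The translation script itself is not reproduced in the chapter; a minimal sketch of the step it describes, using the Google Cloud Translation v2 REST endpoint as one possible way to issue the HTTP requests, could look as follows (the API key and example sentence are placeholders).

import requests

API_KEY = "..."  # placeholder credential

def to_english(text):
    # One REST call per item, German to English
    resp = requests.post(
        "https://translation.googleapis.com/language/translate/v2",
        params={"key": API_KEY},
        json={"q": text, "source": "de", "target": "en", "format": "text"},
    )
    resp.raise_for_status()
    return resp.json()["data"]["translations"][0]["translatedText"]

print(to_english("Impfgegner demonstrieren gegen die Maßnahmen"))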
Table 11.3 Selected key events of the Covid-19 pandemic in Europe

Date | Event | Country
19.10.2020 | Introduction of mask obligation | Switzerland
28.10.2020 | Announcement of light lockdown | Germany
02.11.2020 | Start of light lockdown | Germany
02.11.2020 | Prohibition of classroom teaching at universities | Switzerland
03.11.2020 | Start of the second lockdown | Austria
16.11.2020 | Introduction of stricter measures | Germany
27.11.2020 | Introduction of stricter measures | Austria
02.12.2020 | Extension of light lockdown | Germany
11.12.2020 | Introduction of stricter measures | Switzerland
13.12.2020 | Announcement of the second lockdown | Germany
21.12.2020 | EU approval of BioNTech’s vaccine | EU
23.12.2020 | Start of lockdown | Switzerland
27.12.2020 | Start of vaccination | EU
01.01.2021 | First mass protest in Austria | Austria
04.01.2021 | First cases of SARS-CoV-2 mutation reported | Austria
06.01.2021 | Extension of lockdown | Germany
3.3 Data Analysis
To analyze our final network, we used different metrics and methods. Two important metrics were degree centrality and betweenness centrality, which indicate how a node, in our case a Twitter user, is positioned in the network. Users with a high degree centrality have many direct contacts and interactions within the network (Wasserman & Faust, 1994). Users with a high betweenness centrality lie on many of the shortest paths between other users (Zhang & Luo, 2017); information traveling from one user to another is therefore more likely to pass through them. Lastly, we used the Tribefinder function to look at the personal characteristics of the users in our network. Additionally, we examined whether there were correlations between key events of the Covid-19 pandemic, such as the announcement of lockdowns or news about vaccination, and the Twitter activity of our network. To this end, we collected the main Covid-related events in the considered countries (see Table 11.3 for selected events in Germany, Austria, and Switzerland from the end of October 2020 to the beginning of January 2021). The results of the analysis we performed on our dataset are presented in the following section.
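To make the event-correlation step concrete, here is a minimal Python sketch (our illustration, not the authors’ pipeline; the input file and column names are assumptions) that aggregates tweet volume per day and reads off the activity around selected event dates from Table 11.3:

```python
# Hedged sketch: daily tweet volume around key Covid-19 events (Table 11.3).
# "network_tweets.csv" and its column names are illustrative assumptions.
import pandas as pd

tweets = pd.read_csv("network_tweets.csv", parse_dates=["created_at"])
daily = tweets.set_index("created_at").resample("D").size()

# Example event dates from Table 11.3: announcement of the second lockdown in
# Germany, EU vaccine approval, and the start of vaccination.
events = pd.to_datetime(["2020-12-13", "2020-12-21", "2020-12-27"])
for day in events:
    print(day.date(), "tweets:", int(daily.get(day, 0)))
```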
4. RESULTS

In this section, we present our most significant results and the identified categories with their respective segments. Based on the selected keywords (see Table 11.1) tracked within the observed time frame, we detected 9,666 actors and 212,073 connections representing the interactions between those actors on Twitter. This final network formed the basis for all further consideration and analysis.
4.1 External Influential Factors
As the line chart in Figure 11.2 shows, some external events produced a large echo in the activity of Twitter users over time. Users in the German-speaking region (Germany, Austria, Switzerland) were highly active around certain governmental announcements and incidents (see Table 11.3). The first peak in the chart reflects increasing activity around the discussion and introduction of stricter measures in Germany and Austria. The second peak coincides with the start of stricter measures in Switzerland and the announcement of the second full lockdown in Germany in mid-December 2020, followed by another spike on December 21, 2020, with the European Medicines Agency’s approval of BioNTech’s vaccine. Further peaks in our network’s Twitter activity coincide with the start of vaccination in the three countries and the confirmation of the first cases of the SARS-CoV-2 mutation.
Figure 11.2 The Twitter activity of the analyzed network over time

4.2 (Most) Influential Actors
Table 11.4 lists the top ten actors with the highest degree, betweenness, and closeness centrality, indicating the most influential actors within the network according to the number of their direct and indirect connections and their ability to quickly connect others with each other. Table 11.5 lists the top ten actors with the highest Ego ART and the highest Alter ART. The two tables show that the ranking of the most influential actors varies depending on the metric considered. As an example, the most influential actors identified by degree centrality were put together in a group to which the Tribefinder algorithm was applied to identify the segments.
Table 11.4 Ranking of Twitter users according to selected measures

# | Username | DC | Username | BC | Username | CC
1 | Jensspahn | 379 | stephanbartosch | 7.8971 × 10^16 | Karl_Lauterbach | 0.1138
2 | Welt | 257 | luebberding | 7.5821 × 10^16 | SHomburg | 0.1131
3 | Nikitheblogger | 250 | gerhard_zeiler | 6.8049 × 10^16 | Jensspahn | 0.1128
4 | Reitschuster | 239 | toheckx | 6.6240 × 10^16 | Welt | 0.1123
5 | Maxotte_says | 217 | frauhui | 6.3684 × 10^16 | M_T_Franz | 0.1122
6 | Jens140180 | 156 | atlan35 | 6.3670 × 10^16 | Tagesschau | 0.1121
7 | Hendrikstreeck | 151 | weckgeschnappt | 6.3183 × 10^16 | DrPeurner | 0.1120
8 | Derspiegel | 130 | iwonalaub | 6.1741 × 10^16 | Volksverptzer | 0.1119
9 | Punktpreradovic | 112 | woelkchen32 | 6.1492 × 10^16 | Reitschuster | 0.1119
10 | Drluebbers | 107 | pappensatt | 6.1128 × 10^16 | WaltiSiegrist | 0.1118

Notes: DC = Degree centrality; BC = Betweenness centrality; CC = Closeness centrality.
Table 11.5 Ranking of Twitter users with respect to Ego ART (passion) and Alter ART (respect)

# | Username | Ego ART | Username | Alter ART
1 | MadMax6914 | 7.9638 × 10^15 | Zusehrverkuerzt | 0.99055
2 | GoqHasko | 7.5375 × 10^15 | betei_geuze | 0.99
3 | sarkasmus_sa | 7.4759 × 10^15 | FrankyWilliam | 0.9822
4 | FrankenDemo | 7.1433 × 10^15 | tribunjgringo | 0.98055
5 | velerion | 6.5697 × 10^15 | IhbeManfred | 0.9755
6 | maddi_madlen | 6.3372 × 10^15 | FrancoisPuerro | 0.9713
7 | kakape | 6.3220 × 10^15 | kattascha | 0.9694
8 | ElkeU18 | 5.9375 × 10^15 | taubentod | 0.9668
9 | MaibachRenee | 5.8083 × 10^15 | TK_Presse | 0.9542
10 | Azarias_Ananias | 5.7508 × 10^15 | meissner_udo | 0.9527
As mentioned previously, the Tribefinder tool identifies and studies characteristics of Twitter users by analyzing user-generated content and language use. Based on the textual data, the automated tool extracts information about key individuals, brands, and other topics and categorizes Twitter users into different segments. Table 11.6 illustrates the unique characteristics of the identified segments within the German-speaking region (Germany, Austria, Switzerland) that posted a lot of content about conspiracy theories and against vaccination during the observed time frame. The Condor Tribefinder algorithm classifies the vast majority of the members in the “Alternative Realities” category as “Treehugger”, representing 90 percent of it (see Appendix, Table 11A.1 for a description of tribal macro categories and tribes). Personality-wise, members of the segment can be categorized as risk-takers (60 percent), while their ideology best fits liberalism. However, they appear to be unhappy and fearful people, since the prevailing emotions within the tribe are annotated mainly with fear and sadness. Concerning the categories ‘Lifestyle’ and ‘Recreation’, roughly half of the identified accounts belong to the segments vegan and sport, respectively.
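Tribefinder itself is part of the proprietary Condor toolkit, so its internals cannot be shown here. Purely to illustrate the underlying idea (content-based assignment of users to predefined tribes), a toy scikit-learn sketch with hypothetical training examples might look as follows:

```python
# Toy sketch of content-based tribe assignment; NOT Condor's Tribefinder.
# Training texts, labels, and the test post are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "protect nature limits of growth alternative energies",  # hypothetical Treehugger post
    "stocks buy the dip portfolio trading",                   # hypothetical Stock Trader post
]
train_tribes = ["Treehugger", "Stock Trader"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_tribes)

print(clf.predict(["we must protect nature from gene manipulation"])[0])  # -> Treehugger
```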
Table 11.6 Different categories and percentage distribution of the segments

Category | Segments and percentage distribution
Alternative Realities | Treehugger 90%; Nerd 7%; Fatherlander 3%
Recreation | Arts 11%; Sport 51%; Fashion 13%; Travel 25%
Emotions | Anger 7%; Fear 45%; Happy 3%; Sad 45%
Lifestyle | Vegan 53%; Yolo 40%; Fitness 1%; Sedentary 6%
Ideology | Socialism 3%; Complainers 3%; Liberalism 85%; Capitalism 9%
Personality | Journalist 12%; Politician 17%; Risk-taker 60%; Stock Trader 11%
5. DISCUSSION

As our methodology shows, the segments or communities discovered with our method are digital and therefore need to be handled and monitored differently than offline communities. Marketers must be aware that they face a specific image of such a community, as individual users show their personality only insofar as it suits the interests of the whole community. Whereas physical communities can be observed in many additional ways, online communities offer just one frame to every spectator. It becomes critical to ensure that this frame matters to the monitoring organization; otherwise, it will not be useful for detecting personal traits or information about the users. As online communities are built around shared interests or opinions, the more specific the center of the community is, the less broad and general the observations will be.

Customer segmentation plays an important role in the forming and monitoring of groups. The concepts in this chapter use segmentation methods to identify communities (Kozinets, 2002). Companies can use the segmented constructs to speak to every user in the community on a personal level without actually being personal. The community lens is suitable for understanding why users came together and what external points of interest influence their behavior. As our chapter shows, a group is influenced not just from the outside but also from the inside. Some very central actors in the analyzed networks had the power to change the group’s view of a topic through the selection of their output. The community is therefore a very dynamic construct: not only does it change when the outer world changes, but it can also be changed internally by the will and opinions of central users, which are shared and diffused. Consequently, it can be advantageous to observe not just the community but also these “influential users”, to fully understand its dynamics.

Concerning our methodology, we used the Covid-19 case because it is a vividly discussed topic that provides a lot of data. These huge amounts of data, especially on Twitter, were a good source of information for our initial analysis. Conversely, this means that our methodology may not serve practitioners who want to investigate a topic that is not discussed on Twitter. Nevertheless, adapting the time span or fetching data over a longer period could compensate for a lack of data. At the same time, we cannot guarantee that the gathered data is unaffected by the specific time span in which we fetched it, or that it still represents the current mindset of users; fetching the same data today might lead to different results. Furthermore, it is questionable whether events on the market have a direct effect on customer segments. As shown in the previous section, we do see peaks of reactions to incidents in the market; nevertheless, we cannot clearly define the consequences for our customer segments. We therefore encourage future research to investigate the influence of real-time factors on customer segmentation.
As described in Section 2, we use a clustering and multi-criteria decision-making approach to customer segmentation, which can lead to a sustainable competitive advantage in the relevant market (Güçdemir & Selim, 2015). Furthermore, as presented, customer segments evolve in line with market developments and react to outer and unplanned circumstances; they are therefore only accurate to a certain degree, depending on the industry and the market. The advantage of this approach is that it can easily be applied in real-world settings, reflecting pure and unbiased inputs from the actors. Moreover, it can be applied to monitor the evolution of attitudes and trends that may have social impacts relevant for proper customer segmentation. All in all, our methodology can make customer segmentation more effective and provide a better overview of the relevant customer segments, as it enables tracking and observing dynamic changes within a network over time. It thus contributes to addressing customers in a more precise and targeted way (Datta et al., 2020).

5.1 Practical Implications
With regard to the research objective formulated at the beginning of the chapter, some recommendations can be made for companies seeking effective customer segmentation based on social media content. First, it has been shown that user content on social media such as Twitter is suitable for classifying people into different segments. We have shown that, depending on the perspective chosen, such as climate awareness or technology aversion, different segments can be formed from the same dataset. However, it should be noted that this finding relates to data spanning at least four and a half months; it may not hold for shorter periods of content extraction. We therefore generally recommend that companies choose as long a data-fetching period as possible and perform regular, short-interval data extractions during this period, for example every three days.

A comparison of our approach with previous approaches to online segmentation reveals another strong advantage of our methodology. Our approach works with data that users voluntarily disclose publicly on social media. Thus, user privacy is respected to the highest possible degree, while a meaningful segmentation based on user interests is still achieved. Consequently, we recommend applying our approach in particular when no individual and private user data, such as browser data, is available and publicly accessible user data must be used instead for availability or privacy reasons.

As part of our analysis, we have shown that external factors and events often produce an echo in social media activity. This suggests that companies could indirectly influence user activities on social media, and thus media trends, through their own activities. In other words, companies could actively influence the ‘social media market’ with, for example, targeted marketing campaigns. It can be assumed that such active intervention in the social media market would also affect customer segmentation: it may influence the activity of individual users and thus lead to partly different segmentation results if our approach is followed. Consequently, companies following our approach should forgo large-scale marketing campaigns for the duration of the data fetch to obtain an unbiased market overview.

As presented earlier, we identified multiple segments and were able to classify all customers into customer segments. These tribal affiliations need to be viewed critically: they show specific and clear characteristics, but they require fresh interpretation with respect to the specific company that wants to use the method.
A clear reflection on a company’s own customer segments is therefore recommended, as well as careful consideration of how to classify the company’s customers.

5.2 Theoretical Implications
Our approach enables publicly available information sources to be used as a data basis for segmenting people based on their interests. Specifically, this involves publicly shared posts on social media platforms such as Twitter, as used in this study. Since Twitter data is available to everyone and no special permission is required, this form of data acquisition is comparatively inexpensive. This makes it particularly suitable for small and medium-sized companies, which often do not have a large budget at their disposal. However, two factors must be taken into account. First, the copyright of the people whose content is extracted must be respected. Second, the integrity of social media content should always be questioned: data extraction tools, such as the Condor tool used in our example, often have trouble accurately interpreting an author’s intent. It is therefore necessary to weigh the relatively effective and inexpensive, but not 100 percent accurate, segmentation of people based on social media content. We recommend that a cost–benefit analysis be performed before our methodology is applied.

The question of which metrics and key performance indicators should or must be used with our approach cannot be answered in a generalized manner; details vary depending on the use case for which the methodology is applied. For example, some ventures may want to follow a lead user approach, a concept introduced by von Hippel (1986) (see e.g., Herstatt & von Hippel, 1992; Lilien et al., 2002; Urban & von Hippel, 1988). According to this approach, lead users must be identified at an early stage so that analyses can subsequently be performed on them (von Hippel, 1986). From the characteristics of lead users described by von Hippel (1986), it can be inferred that on social media such as Twitter they tend to create and share content more frequently and have a relatively larger network than non-lead users, owing to their level of knowledge and interest in a product or process. Consequently, the influence of lead users on other users can be assumed to be greater than that of non-lead users. This can be measured, compared, and interpreted using metrics such as degree, betweenness, and closeness centrality, among others. Accordingly, as soon as our methodology is used in the context of a lead user approach, such metrics are indispensable.
6. CONCLUSION

Customer segmentation has become particularly important for digital companies that aim to tailor individual offers to each customer (An et al., 2018). Introducing a new methodology for customer segmentation using the anti-vaccination community during the Covid-19 pandemic, we propose an approach that extends Kozinets’ (1998) Netnography to a more modern context. Our method can identify early trends and influential persons by tracking online social network activities on Twitter. The content-based approach measures and visualizes the diffusion of ideas and attitudes based on the density and frequency of communication between actors in the network. It can be of great help for companies in identifying customer segments from user-generated content on social media, as it can be applied in real-world settings
reflecting pure and unbiased inputs. Moreover, it can be utilized to monitor the evolution of attitudes and trends that may have continuous social impacts relevant for proper customer segmentation.

Implementing the real-time approach introduced by Vojtovič et al. (2016) could be a good extension of our methodology. Real-time updates can give companies a live overview of their customer segments and their development over time. Applying our approach in conjunction with real-time updates could enable more patterns to be identified and allow, for example, responses to market changes in real time, so that customer segments can be targeted more effectively, resulting in a more efficient and lasting competitive advantage.

Our approach was limited by three design choices: (a) the time frame, (b) the data source, and (c) the selected data itself. The data was collected from November 16, 2020 to January 31, 2021. Our results are restricted to this time frame, as we pointed out that real events influence the actors and their behavior on social networks. Especially for describing the overall change, the time frame can be extended into both the past and the future. We used Twitter as our main data source. Other social media networks considered during our research were Facebook, Wikipedia, YouTube, and Reddit. We decided against Facebook, for example, because its strict data protection made it impossible to use the data of Facebook groups properly. Extensions such as adding English- and other-language sources on conspiracy theories and other topics have the potential to uncover additional insights. By overcoming this limitation and running the same calculations and filters on the new data, not only additions but also interesting comparisons could be made. Furthermore, we tried to avoid major data gaps by fetching our data every two to three days to capture all the relevant data.

We introduced the different steps necessary to get an extensive overview of existing customer segments and how to analyze them. Using this new methodology, which we call Netnography 2.0 because it complements the previously introduced Netnography approach, companies will be able to get to know their customers and customer segments in more detail and improve marketing and product design operations. Compared to the traditional Netnography approach, we exploit the possibilities of social media by integrating real user-generated content and monitoring network dynamics. Additionally, analyzing different customer segments with different metrics can yield valuable insights regarding market changes and reactions. Future work could investigate the real-time monitoring of customer segments and the effects that market dynamics have on them. The adoption of our Netnography 2.0 method can thus have a great impact on the establishment of effective customer segmentation, providing an innovative framework for researchers and practitioners.
ACKNOWLEDGMENTS

Authors Efinger, Eich, Heck, Nguyen, and Özlü contributed equally to the design, conduct, and writing of this study. Authors Heyder and Gloor contributed significantly to the development of this chapter by supervising the study. All authors have read and agreed to the published version of the manuscript.
REFERENCES

Abdulhafedh, A. (2021). Incorporating K-means, hierarchical clustering and PCA in customer segmentation. Journal of City and Development, 3(1), 12–30. https://www.researchgate.net/profile/azadabdulhafedh/publication/349094412_incorporating_kmeans_hierarchical_clustering_and_pca_in_customer_segmentation/links/601f494292851c4ed554724d/incorporating-k-means-hierarchical-clustering-and-pca-in-customersegmentation.pdf.
An, J., Kwak, H., Jung, S., Salminen, J., & Jansen, B. J. (2018). Customer segmentation using online platforms: isolating behavioral and demographic segments for persona creation via aggregated user data. Social Network Analysis and Mining, 8(1), 54. https://doi.org/10.1007/s13278-018-0531-0.
Belz, F.‑M., & Baumbach, W. (2010). Netnography as a method of lead user identification. Creativity and Innovation Management, 19(3), 304–13. https://doi.org/10.1111/j.1467-8691.2010.00571.x.
Bollini, L. (2011). Territories of digital communities: representing the social landscape of web relationships. In B. Murgante, O. Gervasi, A. Iglesias, D. Taniar, & B. O. Apduhan (eds.), Lecture Notes in Computer Science: Vol. 6782. Computational Science and Its Applications – ICCSA 2011: International Conference, Santander, Spain, June 20–23, 2011; Proceedings (pp. 501–11). Springer. https://doi.org/10.1007/978-3-642-21928-3_36.
Canhoto, A. I., Clark, M., & Fennemore, P. (2013). Emerging segmentation practices in the age of the social customer. Journal of Strategic Marketing, 21(5), 413–28. https://doi.org/10.1080/0965254X.2013.801609.
Chi-Hsien, K., & Nagasawa, S. (2019). Applying machine learning to market analysis: knowing your luxury consumer. Journal of Management Analytics, 6(4), 404–19. https://doi.org/10.1080/23270012.2019.1692254.
Cho, C.‑H., & Hongsik, J. C. (2004). Why do people avoid advertising on the internet? Journal of Advertising, 33(4), 89–97. https://doi.org/10.1080/00913367.2004.10639175.
Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., Schmidt, A. L., Zola, P., Zollo, F., & Scala, A. (2020). The COVID-19 social media infodemic. Scientific Reports, 10(1), 16598. https://doi.org/10.1038/s41598-020-73510-5.
Datta, D., Agarwal, R., & David, P. E. (2020). Performance enhancement of customer segmentation using a distributed Python framework, ray. International Journal of Scientific & Technology Research, 9(11), 130–39.
Deng, Y., & Gao, Q. (2020). A study on e-commerce customer segmentation management based on improved K-means algorithm. Information Systems & E-Business Management, 18(4), 497–510. https://doi.org/10.1007/s10257-018-0381-3.
Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management Review, 14(4), 532–50. https://doi.org/10.5465/amr.1989.4308385.
Gloor, P. A. (2017). Sociometrics and Human Relationships: Analyzing Social Networks to Manage Brands, Predict Trends, and Improve Organizational Performance (1st edn.). Emerald Publishing.
Gloor, P. A., Fronzetti Colladon, A., de Oliveira, J. M., Rovelli, P., Galbier, M., & Vogel, M. (2019). Identifying tribes on Twitter through shared context. In Collaborative Innovation Networks: Latest Insights from Social Innovation, Education, and Emerging Technologies Research, 91–111.
Güçdemir, H., & Selim, H. (2015). Integrating multi-criteria decision making and clustering for business customer segmentation. Industrial Management & Data Systems, 115(6), 1022–40. https://doi.org/10.1108/IMDS-01-2015-0027.
Herstatt, C., & von Hippel, E. (1992). From experience: developing new product concepts via the lead user method: a case study in a “low-tech” field. Journal of Product Innovation Management, 9(3), 213–21. https://doi.org/10.1016/0737-6782(92)90031-7.
Hines, T., & Quinn, L. (2005). Socially constructed realities and the hidden face of market segmentation. Journal of Marketing Management, 21(5–6), 529–43. https://doi.org/10.1362/0267257054307372.
Kim, S.‑Y., Jung, T.‑S., Suh, E.‑H., & Hwang, H.‑S. (2006). Customer segmentation and strategy development based on customer lifetime value: a case study. Expert Systems with Applications, 31(1), 101–7. https://doi.org/10.1016/j.eswa.2005.09.004.
Knight, E., Daymond, J., & Paroutis, S. (2020). Design-led strategy: how to bring design thinking into the art of strategic management. California Management Review, 62(2), 30–52. https://doi.org/10.1177/0008125619897594.
Kozinets, R. V. (1998). On netnography: initial reflections on consumer research investigations of cyberculture. ACR North American Advances, 25(1), 366–71. https://www.acrwebsite.org/volumes/8180/volumes/ (last accessed 18 January 2021).
Kozinets, R. V. (2002). The field behind the screen: using netnography for marketing research in online communities. Journal of Marketing Research, 39(1), 61–72. https://doi.org/10.1509/jmkr.39.1.61.18935.
Krause, K., & Battenfeld, D. (2019). Coming out of the niche? Social banking in Germany: an empirical analysis of consumer characteristics and market size. Journal of Business Ethics, 155(3), 889–911. https://doi.org/10.1007/s10551-017-3491-9.
Kuruba Manjunath, Y. S., & Kashef, R. F. (2021). Distributed clustering using multi-tier hierarchical overlay super-peer peer-to-peer network architecture for efficient customer segmentation. Electronic Commerce Research & Applications, 47. https://doi.org/10.1016/j.elerap.2021.101040.
Lee, N., & Broderick, A. J. (2007). The past, present and future of observational research in marketing. Qualitative Market Research: An International Journal, 10(2), 121–9. https://doi.org/10.1108/13522750710740790.
Lilien, G. L., Morrison, P. D., Searls, K., Sonnack, M., & von Hippel, E. (2002). Performance assessment of the lead user idea-generation process for new product development. Management Science, 48(8), 1042–59. https://doi.org/10.1287/mnsc.48.8.1042.171.
Mariani, M. M., & Fosso Wamba, S. (2020). Exploring how consumer goods companies innovate in the digital age: the role of big data analytics companies. Journal of Business Research, 121, 338–52. https://doi.org/10.1016/j.jbusres.2020.09.012.
Park, Y., & Mithas, S. (2020). Organized complexity of digital business strategy: a configurational perspective. MIS Quarterly, 44(1), 85–127. https://doi.org/10.25300/MISQ/2020/14477.
Tan, K. H., Ji, G., Lim, C. P., & Tseng, M.‑L. (2017). Using Big Data to Make Better Decisions in the Digital Economy. Taylor & Francis.
Teichert, T., Shehu, E., & von Wartburg, I. (2008). Customer segmentation revisited: the case of the airline industry. Transportation Research Part A: Policy and Practice, 42(1), 227–42. https://doi.org/10.1016/j.tra.2007.08.003.
Urban, G. L., & von Hippel, E. (1988). Lead user analyses for the development of new industrial products. Management Science, 34(5), 569–82. https://doi.org/10.1287/mnsc.34.5.569.
van Dijck, J. (2013). Facebook and the engineering of connectivity. Convergence: The International Journal of Research into New Media Technologies, 19(2), 141–55. https://doi.org/10.1177/1354856512457548.
Varadarajan, R. (2020). Customer information resources advantage, marketing strategy and business performance: a market resources based view. Industrial Marketing Management, 89, 89–97. https://doi.org/10.1016/j.indmarman.2020.03.003.
Vojtovič, S., Navickas, V., & Gruzauskas, V. (2016). Strategy of sustainable competitiveness: methodology of real-time customers’ segmentation for retail shops. Journal of Security and Sustainability Issues, 5(4), 489–99. https://doi.org/10.9770/jssi.2016.5.4(4).
von Hippel, E. (1986). Lead users: a source of novel product concepts. Management Science, 32(7), 791–805. https://doi.org/10.1287/mnsc.32.7.791.
Wang, Z., Chen, C.‑H., Li, X., Zheng, P., & Khoo, L. P. (2021). A context-aware concept evaluation approach based on user experiences for smart product-service systems design iteration. Advanced Engineering Informatics, 50, 101394. https://doi.org/10.1016/j.aei.2021.101394.
Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications (reprint). Structural Analysis in the Social Sciences: Vol. 8. Cambridge University Press.
Wedel, M., & Kamakura, W. A. (2000). Market Segmentation: Conceptual and Methodological Foundations (2nd edn). International Series in Quantitative Marketing: Vol. 8. Springer Science+Business Media.
Yankelovich, D., & Meer, D. (2006). Rediscovering market segmentation. Harvard Business Review, 84(2), 1–11. https://hbr.org/2006/02/rediscovering-market-segmentation (last accessed 2 February 2021).
Yun, J. T., Vance, N., Wang, C., Marini, L., Troy, J., Donelson, C., Chin, C.‑L., & Henderson, M. D. (2020). The social media macroscope: a science gateway for research using social media data. Future Generation Computer Systems, 111, 819–28. https://doi.org/10.1016/j.future.2019.10.029.
Zhang, J., & Luo, Y. (2017). Degree centrality, betweenness centrality, and closeness centrality in social network. Proceedings of the 2017 2nd International Conference on Modelling, Simulation and Applied Mathematics (MSAM2017). Atlantis Press. https://doi.org/10.2991/msam-17.2017.68.
Zhou, J., Wei, J., & Xu, B. (2021). Customer segmentation by web content mining. Journal of Retailing & Consumer Services, 61. https://doi.org/10.1016/j.jretconser.2021.102588.
APPENDIX A

Table 11A.1 Tribefinder tribal macro categories and tribes

Tribal macro category | Tribes | Description
Alternative reality | Fatherlander | They believe in God and fatherland, and that their fatherland is the best one. They cling to the good old times, hold the idea of family in high regard and have little time for foreigners.
Alternative reality | Nerd | They believe that progress, science and technology are a blessing. They want to overcome death and colonize Mars. They are fans of globalization and network with each other.
Alternative reality | Spiritualist | They believe in a subjective experience of a sacred dimension. They find strength in contemplation, and their behavior is driven by the search for sacred meaning.
Alternative reality | Treehugger | They believe in the limits of growth and in the protection of nature. They challenge some elements of technological progress (e.g., gene manipulation) and welcome others (e.g., alternative energies).
Lifestyle | Fitness | They love doing sports and are addicted to training. They show an almost compulsive engagement in any form of physical exercise.
Lifestyle | Sedentary | Opposite to the fitness addicted, they are characterized by much sitting and little physical exercise.
Lifestyle | Vegan | They follow a plant-based diet avoiding all animal foods, as well as avoiding using animal products.
Lifestyle | Yolo | They follow the motto “You only live once” and they think that one should make the most of the present without worrying about the future (“carpe diem”). As a consequence, they often adopt impulsive and reckless behavior.
Recreation | Art | They are interested in any form of art (e.g., paintings, sculptures, music, dance, literature, films), of which they appreciate the beauty and emotional power.
Recreation | Fashion | They are interested in popular or the latest style of clothing, hair, decoration, or behavior.
Recreation | Sport | They love watching any kind of sport on TV, and attending sports events. Some also actually like to practice these sports.
Recreation | Travel | They love travelling around the world, for both pleasure and business, experiencing different cultures and environments.

Source: Gloor et al. (2019).
12. Crowdfunding success: how campaign language can predict funding

Andrea Fronzetti Colladon, Julia Gluesing, Francesca Greco, Francesca Grippa and Ken Riopelle
1. INTRODUCTION

Crowdfunding has been recognized as a relevant contributor to innovation and economic growth, as it allows individuals or organizations seeking support to gather financial resources from a large pool of small-scale investors (Belleflamme et al., 2014; Shneor et al., 2020). Crowdfunding platforms have grown dramatically in both volume and importance in recent years, representing one of the most important emerging channels for project fundraising (Mollick, 2014). Posted projects vary in nature, from cultural events or weddings to more entrepreneurial endeavors, and differ in the compensation promised to investors or the requested investment amount. Some of the most widely adopted platforms, such as Kickstarter, Starteed, Indiegogo, Eppela, and Ulule, are reward based: backers offer funding to individuals or organizations in exchange for non-monetary rewards, products, or services. Other platforms work on different models: some are equity based (e.g., Seedrs and GrowVc), while others are donation based, with backers providing funding out of philanthropic motivations and without the expectation of monetary rewards. Since the goal of our study is to understand the key factors determining resource acquisition by entrepreneurs, we selected Kickstarter as the initial platform on which to explore the language used to promote projects. In particular, we aim to understand the role that both linguistic choices and audio-visual selection play in determining the success of a crowdfunding campaign. On Kickstarter, entrepreneurs create project narratives to describe their ideas, set funding goals, and define the length of the campaigns. In our study, we apply a computational linguistics methodology to predict which innovative ideas are more likely to be backed by investors. This study provides practical guidance to entrepreneurs interested in learning how to tailor the language of a crowdfunding campaign to enhance funding prospects. We extend the work of Butticè and Rovelli (2020) by going beyond the discovered negative relation between narcissism and crowdfunding success. Our goal is to explore further which aspects of the language used by entrepreneurs are conducive to being rewarded with financial support. Specifically, we use the same 59,538 Kickstarter crowdfunding campaigns to extend their work in multiple areas. First, we determine whether the Crovitz 42 Relational Words can predict campaign success or failure. Second, we adopt emotional text mining (ETM) to understand the broader categorization of words associated with successful and unsuccessful campaigns. Third, we use a machine learning model to evaluate which textual and audio-visual elements of a campaign best help predict its success.
2. LITERATURE REVIEW

2.1 Determinants of Crowdsourcing Success
Since the early 2000s, an increasing number of studies have explored the determinants of crowdsourcing success. Several factors can influence the ability of entrepreneurs to run a successful campaign, from the organizational form adopted by entrepreneurs in their chosen industry, to the available information on crowdfunders’ previous experience, to the investors’ experience in the field (Kim & Viswanathan, 2014; Belleflamme et al., 2014). While traditional studies focus primarily on qualitative observations and surveys, more recent work has adopted content analysis techniques to understand what makes some projects more successful than others. For instance, Greenberg et al. (2013) looked at the quality of the project presentation obtained by classifying the sentiment of its text, while Mollick (2014) counted spelling errors to predict success. Recent studies have looked at platforms like Ulule, Kickstarter, and Indiegogo to determine the probability that a given project is successful, exploring the factors that influence which projects obtain a higher amount than the one they asked for (Cordova et al., 2015). By conducting a content analysis of campaign updates, Xu et al. (2014) found that adding further content and providing a progress report correlated positively with a project’s success. Other studies stressed the importance of communication with the platform members and visitors for successful project funding (Xiao et al., 2014). Very few studies have used text mining techniques and predictive analytics tools to determine success from the specific language used in the project description. For example, Yuan et al. (2016) introduce a text analytics-based framework to extract latent semantics from the textual descriptions of projects and predict their fundraising outcomes. Etter et al. (2013) use information gathered from tweets and Kickstarter’s projects to explore which general project characteristics are most predictive of success.
2.2 Linguistic Selection as a Determinant of Success
As language expectancy theory suggests, audiences develop specific expectations concerning the experience and credibility of the communicator, as well as the context in which the communication occurs (Burgoon et al., 2002). Recent studies have focused on the specific words that comprise the entrepreneurship discourse and how it has evolved over time, with evidence suggesting that entrepreneurs’ lexicon is rather complex and fluid (Roundy and Asllani, 2019). To understand the impact of words and sentences on the outcome of crowdfunding campaigns, Peng et al. (2022) analyzed the narratives of thousands of film projects from Kickstarter and found that successful projects included words reflecting the credibility of the project creators, while failed projects often used words that transmitted uncertainty. Other studies used natural language processing and neural network methods to show the positive impact of enthusiastic language (Kaminski & Hopp, 2020). By looking at videos in addition to text, that study suggested that depicting the product in action may capture investors’ attention and increase a campaign’s success. Another study on crowdfunding for a technological innovation found that campaigns using more proximal language (e.g., we-pronouns, you-pronouns) and concrete words were associated with
increased success, while campaigns using more distant terms (e.g., self-referential I-pronouns) were less likely to succeed (Zhu, 2022).
3. METHODS

Our integrated methodology combines the analysis of relational word frequency (Crovitz, 1967) with the application of ETM techniques, and uses predictive analytics to determine the linguistic characteristics most likely to be associated with a successful campaign. This study used, with permission, the 59,538 Kickstarter crowdfunding campaigns from 2016 and 2017 from the work of Butticè and Rovelli (2020). We used the Kickstarter campaign pitch, as this represents how entrepreneurs make their case for funding through words, pictures, and multimedia.

In his 1967 article “The Form of Logical Solutions”, Crovitz used Polya’s (1957) principles to describe the goal of heuristics as the search for methods and rules to help with both discovery and invention. Crovitz emphasized that creativity and the solution of problems often occur when two entities are brought together in a new relationship. He proposed a framework that included a set of words that could be used to foster new thoughts about relations, including new ideas that could lead to innovative solutions. Based upon Ogden’s 1934 book The System of Basic English, Crovitz’s 42 Relational Words comprise primarily conjunctions and prepositions. They have already been used to distinguish “innovators” from “non-innovators” by mining internal online communication forums that collected more than 16,626 posts from 3,754 employees (Greco et al., 2020). This is the list of words presented by Crovitz (1967): about, at, for, of, round, to, across, because, from, off, still, under, after, before, if, on, so, up, against, between, in, opposite, then, when, among, but, near, or, though, where, and, by, not, out, through, while, as, down, now, over, till, with. For the purposes of this chapter, we are interested in knowing whether these 42 relational words discriminate successful from failed campaigns, overall and by industry.

After analyzing the Crovitz words, we applied ETM, a type of sentiment analysis based on a socio-constructivist approach, used to classify unstructured data and describe the sentiments expressed in a campaign. ETM is a text mining procedure that uses a bottom-up logic for a context-sensitive text mining approach (Greco and Polli, 2020).
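As a minimal illustration of the first step (our own sketch, not the exact preprocessing used in the study), the following Python snippet counts occurrences of the 42 relational words in a campaign pitch; the tokenization is deliberately simple:

```python
# Count Crovitz's 42 Relational Words in a campaign pitch (illustrative sketch).
import re
from collections import Counter

CROVITZ_42 = {
    "about", "at", "for", "of", "round", "to", "across", "because", "from", "off",
    "still", "under", "after", "before", "if", "on", "so", "up", "against",
    "between", "in", "opposite", "then", "when", "among", "but", "near", "or",
    "though", "where", "and", "by", "not", "out", "through", "while", "as",
    "down", "now", "over", "till", "with",
}

def crovitz_counts(pitch: str) -> Counter:
    """Return the frequency of each Crovitz relational word in the pitch."""
    tokens = re.findall(r"[a-z']+", pitch.lower())
    return Counter(t for t in tokens if t in CROVITZ_42)

print(crovitz_counts("We built this for you, not for us, because ideas matter."))
```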
4. RESULTS

The results of our analysis of the Kickstarter crowdfunding data are presented in three sections. The first set of findings results from an analysis using the Crovitz 42 Relational Words to understand their relationship to campaign success. This is followed by an examination of the campaigns’ text through ETM, to further understand how words relate to crowdfunding success. Finally, we present the results of our machine learning modeling, which combines text and other crowdfunding variables thought to predict success.
4.1 Results of the Analysis of Crovitz’s 42 Relational Words
The 59,538 Kickstarter crowdfunding campaigns vary by industry in the number of campaigns and their success rate. Table 12.1 shows the distribution of the 59,538 Kickstarter crowdfunding campaigns from 2016 to 2017, organized by industry. For each of the 15 industries, Table 12.1 includes five columns: the Failed Campaign Count, the Success Campaign Count, the row Total, the Count of Success minus Failed Campaigns, and the Percentage Difference ((Success − Failed)/Total Count). The 15 Kickstarter industries are ranked by the percentage difference, which measures the success rate within that industry category. Journalism campaigns had the lowest chances of success: out of 880 Journalism campaigns, only 169 were successful, a percentage difference of −61.59 percent. In contrast, Comics campaigns had the highest chances of success: out of 2,467 Comics campaigns, 1,545 were successful, a percentage difference of +25.25 percent. In addition, the total number of campaigns within an industry varied considerably: Dance had the lowest number of campaigns (452), while Games had the highest (8,464). Thus, just knowing into which Kickstarter industry an entrepreneur’s campaign falls provides some insight into the amount of competition and the likelihood of success or failure in obtaining crowdsourced funding.
Table 12.1 Kickstarter crowdfunding campaigns 2016–17 by industry

Kickstarter Industry | Failed Campaign Count | Success Campaign Count | Total Count | Count of Success − Failed Campaigns | % Difference (Success − Failed)/Total Count
Journalism | 711 | 169 | 880 | −542 | −61.59
Technology | 5,735 | 1,473 | 7,208 | −4,262 | −59.13
Food | 2,718 | 841 | 3,559 | −1,877 | −52.74
Crafts | 922 | 320 | 1,242 | −602 | −48.47
Fashion | 3,338 | 1,299 | 4,637 | −2,039 | −43.97
Publishing | 3,839 | 2,277 | 6,116 | −1,562 | −25.54
Photography | 864 | 515 | 1,379 | −349 | −25.31
Film & Video | 4,460 | 2,774 | 7,234 | −1,686 | −23.31
Art | 2,168 | 1,482 | 3,650 | −686 | −18.79
Design | 3,604 | 2,646 | 6,250 | −958 | −15.33
Music | 2,695 | 2,106 | 4,801 | −589 | −12.27
Games | 4,541 | 3,923 | 8,464 | −618 | −7.30
Theater | 495 | 704 | 1,199 | 209 | 17.43
Dance | 182 | 270 | 452 | 88 | 19.47
Comics | 922 | 1,545 | 2,467 | 623 | 25.25
Total | 37,194 | 22,344 | 59,538 | −14,850 | −24.94
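The percentage-difference column is straightforward to reproduce; a small pandas sketch (using two industries from Table 12.1 as a check) follows:

```python
# Reproduce the last two columns of Table 12.1 from the raw counts.
import pandas as pd

df = pd.DataFrame(
    {"industry": ["Journalism", "Comics"], "failed": [711, 922], "success": [169, 1545]}
)
df["total"] = df["failed"] + df["success"]
df["pct_difference"] = 100 * (df["success"] - df["failed"]) / df["total"]
print(df)  # Journalism ≈ -61.59, Comics ≈ +25.25, matching Table 12.1
```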
In short, do successful entrepreneurs use the Crovitz 42 Relational Words more often in their campaigns than unsuccessful entrepreneurs in their failed campaigns? To answer this question, we used the WORDij software (Danowski, 2013), a text analysis tool that computes a Z-Score test of two proportions on the words in two files.1 We sought to determine whether there were any significant differences in the use of these 42 words between successful and failed campaigns. First, Table 12.2 presents the Z-Scores for the first-person personal pronouns “I”, “we”, and “us”, and confirms the prior results on narcissism and successful campaigns reported in Butticè and Rovelli (2020).
Table 12.2 Z-Scores for first-person personal pronouns

First-Person Personal Pronoun | Success Campaign Frequency | Failed Campaign Frequency | Success Campaign Proportion | Failed Campaign Proportion | Z-Score
I | 113,454 | 170,764 | 0.006692 | 0.009513 | −92.70
we | 187,709 | 184,203 | 0.011071 | 0.010261 | 23.29
us | 38,716 | 36,848 | 0.002283 | 0.002053 | 14.66
In our study, the word “I” was used more often in failed campaigns, with the highest negative Z-Score of −92.70. In contrast, successful campaigns used the two other first-person personal pronouns, “we” and “us”, significantly more often, with Z-Scores of +23.29 and +14.66, respectively. This result is aligned with Zhu (2022), who used Kickstarter to analyze technological innovation campaigns and demonstrated the benefits of using proximal language and concrete words to reach funding goals. Campaigns that emphasize a shared innovation, as distinguished through the use of “we” and “us”, were more likely to succeed than those of entrepreneurs focused on themselves.

Subsequently, we computed a Z-Score to determine the significant differences for Crovitz’s 42 Relational Words, overall and by the 15 Kickstarter industries. Figure 12.1 presents the Z-Score results. Overall, 36 of the 42 Crovitz Relational Words (86 percent) discriminated between successful and unsuccessful campaigns, with a significant Z-Score of ±1.64, p < .05, while only six words showed no discrimination. Failed campaigns used 17 of the 42 words more frequently: to, not, because, where, or, but, then, when, so, while, up, against, off, near, down, and, through (dark gray in Figure 12.1). At the other end, successful campaigns used the following 19 words more often: by, of, at, from, if, for, on, over, as, in, before, round, after, among, out, between, now, about, still (light gray). Only six of the 42 words (14 percent) showed no significant difference: with, till, opposite, across, though, under (unshaded).

Figure 12.2 presents the results by industry. Its columns are: the Crovitz 42 Relational Words; the total Z-Score for all 15 industries; the 15 industries’ Z-Scores, sorted from left to right by the number of Crovitz words found not to be significant (from low to high); and, finally, the industry counts with failed and successful significance scores. Publishing, Technology, and Comics have the fewest Crovitz words found not to be significant (10, 12, and 13, respectively), while Art, Theater, and Journalism have the most (24, 24, and 28, respectively). The rows of Figure 12.2 are the Crovitz 42 words, sorted by the industry count of success and then by the total Z-Score. Two words were used more often by failed campaigns across all 15 Kickstarter industries: “to” and “not”; in other words, successful campaigns used “to” and “not” significantly less often than unsuccessful ones. By contrast, two words, “by” and “at”, were used more often in successful campaigns in 14 out of 15 Kickstarter industries. Overall, Crovitz’s 42 words provide insights into Kickstarter campaign language that can be used successfully or unsuccessfully. Nevertheless, their discrimination ability varies across the 15 industries, from a high of 76 percent of words in the Publishing industry to a low of 33 percent in the Journalism industry.
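The WORDij Z-Score is the standard two-proportion z-test. As a hedged illustration (our own sketch, not WORDij’s internal code), the following Python snippet reproduces the Z-Score for the pronoun “I” from Table 12.2, recovering the corpus sizes from the reported frequencies and proportions:

```python
# Two-proportion z-test, verified against the pronoun "I" in Table 12.2.
from math import sqrt

def z_two_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z-score for the difference between proportions x1/n1 and x2/n2,
    using the pooled proportion under the null hypothesis."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Corpus sizes recovered from Table 12.2 (frequency divided by proportion):
n_success = round(113_454 / 0.006692)   # ~17.0 million words in successful pitches
n_failed = round(170_764 / 0.009513)    # ~18.0 million words in failed pitches
print(z_two_proportions(113_454, n_success, 170_764, n_failed))  # ≈ -92.7
```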
Figure 12.1 Z-Score overall results for Crovitz’s 42 Relational Words

4.2 Results of the Emotional Text Mining Analysis
The Kickstarter crowdfunding corpus comprised a total of 37,033,847 words, divided into 273,725 comparable chunks of text characterized by a good lexical richness (type/token ratio = 0.01; hapax = 48.0 percent). On average, each campaign comprised 765.2 words, with a large variability in text length (SD = 755.7). Although successful and unsuccessful campaigns are similar in number of “chunks”, i.e., in word usage (successful = 49 percent; unsuccessful = 51 percent), ETM results show that they differ in their lexical profiles. Through 1,950 terms, we were able to classify 89 percent of campaigns. Table 12.3 presents the identified six clusters and five factors. The five main axes of communication characterize the crowdfunding proposals globally: the item, the advantage, the support, the target, and the specificity. The six themes are: easy device, people storytelling, pledge reward, art and music, customer service, and product and service details. In Table 12.4, we highlight the lexical profiles characterizing the main axes of communication (i.e., the factors). The first factor helps differentiate the proposal of an idea from that of an object; the second factor distinguishes the advantage in funding the proposal between the possibility of receiving a reward and that of addressing a specific target, such as old or young people; the third factor specifies whether crowdfunding will support a service or a product; the fourth factor focuses on the area of the proposal, distinguishing art from people, and particularly the family, since words like parent, kid, child, and family characterize this polarity; and the fifth factor specifies the item characteristics in terms of materials or solutions.
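The lexical-richness statistics reported at the start of this section are straightforward to compute. A minimal sketch follows (simple whitespace tokenization; since the chapter does not specify the denominator for the hapax percentage, the sketch assumes it is the share of the vocabulary, i.e., of types):

```python
# Type/token ratio and hapax share (hapax legomena = words occurring exactly once).
from collections import Counter

def lexical_richness(tokens: list[str]) -> tuple[float, float]:
    counts = Counter(tokens)
    ttr = len(counts) / len(tokens)                               # types / tokens
    hapax_share = sum(1 for c in counts.values() if c == 1) / len(counts)
    return ttr, hapax_share

ttr, hapax = lexical_richness("to be or not to be".split())
print(f"TTR = {ttr:.2f}, hapax = {hapax:.0%}")  # TTR = 0.67, hapax = 50%
```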
Figure 12.2a Z-Score results for the Crovitz 42 Relational Words by 15 Kickstarter industries

Notes: Cells are shaded by Z-Score significance: dark gray cells mark failed campaigns with significant negative Z-Scores of −1.64 or lower; light gray cells mark successful campaigns with significant positive Z-Scores of +1.64 or higher; unshaded cells have no significant Z-Score. Critical Z values for the two proportions are: ±1.64 for p < .05; ±2.389 for p < .01; ±3.5 for p < .001.
Figure 12.2b Z-Score results for the Crovitz 42 Relational Words by 15 Kickstarter industries

Notes: Cells are shaded by Z-Score significance: dark gray cells mark failed campaigns with significant negative Z-Scores of −1.64 or lower; light gray cells mark successful campaigns with significant positive Z-Scores of +1.64 or higher; unshaded cells have no significant Z-Score. Critical Z values for the two proportions are: ±1.64 for p < .05; ±2.389 for p < .01; ±3.5 for p < .001.

Table 12.3 Summary of the ETM results

Cl | CU% | Label | Factor 1 Item (30.4%) | Factor 2 Advantage (23.8%) | Factor 3 Support (19.3%) | Factor 4 Target (14.1%) | Factor 5 Specificity (12.5%)
1 | 11.75 | Easy device | Object 0.67 | Target 0.31 | −0.17 | Art −0.34 | Solution 0.30
2 | 18.73 | People storytelling | Idea −0.28 | Target 0.48 | 0.09 | People 0.39 | 0.17
3 | 15.53 | Pledge reward | −0.18 | Rewards −0.69 | 0.13 | 0.04 | Solution 0.33
4 | 15.52 | Art & Music | Idea −0.62 | 0.16 | 0.15 | Art −0.46 | −0.17
5 | 13.33 | Customer service | −0.06 | −0.16 | Service −0.67 | 0.14 | Material −0.29
6 | 12.28 | Product & service details | Object 0.48 | −0.10 | Product 0.50 | 0.14 | Material −0.40

Notes: Explained inertia is reported in parentheses next to each factor; the factor coordinate is reported next to the polarity label. The polarity label is not reported for coordinate values < ±0.2.
These five axes of communication set a symbolic space of sensemaking in which the representations of crowdfunding campaigns (the clusters) are located. Although the main axes of communication characterize all the proposals, some campaigns are more focused on specific axes, according to their location in the factorial space. In particular, successful campaigns are more often classified in clusters located on the positive polarity of the first factor. The first cluster, Easy Device (11.75 percent of campaigns), focuses mostly on the usefulness of the proposal, which can improve people’s lives by offering solutions to specific problems (Table 12.5). Most (52 percent) of the campaigns classified in this cluster are successful. The second cluster, People Storytelling (18.73 percent of campaigns), represents the people who could benefit from the proposal; it seems to characterize the unsuccessful campaigns, which make up 55 percent of the campaigns classified in it. The third cluster, Pledge Reward (15.53 percent of campaigns), represents the possibility of rewarding those who support the proposal; 55 percent of the campaigns classified in it are successful. The fourth cluster represents Art and Music initiatives (15.52 percent of campaigns), and the fifth cluster Customer Service (13.33 percent of campaigns); both clusters characterize the unsuccessful campaigns more strongly (54 percent and 64 percent, respectively). Finally, the sixth cluster, Product and Service Details (12.28 percent of campaigns), focuses on the need to specify the characteristics of what is proposed: 60 percent of the campaigns in this cluster are successful.
Table 12.4 Correspondence analysis results

Factor | Negative pole (label: term, a.c.%) | Positive pole (label: term, a.c.%)
1. Item | Idea: community 0.56; dream 0.48; hope 0.43; student 0.38; city 0.36; country 0.24 | Object: easy 1.01; bag 0.68; fit 0.67; water 0.66; device 0.64; phone 0.55
2. Focus | Reward: reward 3.84; pledge 3.43; cost 2.73; backer 2.68; thank 1.71; money 1.45 | Target: young 0.74; old 0.52; school 0.40; category 0.37; change 0.28; home 0.23
3. Support | Service: app 4.29; browser 3.77; capable 3.43; business 2.08; user 1.92; market 1.62 | Product: color 1.19; black 0.77; size 0.76; white 0.48; paper 0.32; choose 0.28
4. Area | Art: album 3.95; record 3.78; song 3.55; artist 2.15; studio 1.28; band 1.13 | People: child 2.31; family 1.14; learn 0.69; kid 0.63; woman 0.61; parent 0.58
5. Specificity | Material: card 2.39; leather 1.10; unique 0.83; style 0.74; hand 0.71; quality 0.55 | Solution: charge 0.48; power 0.35; problem 0.25; issue 0.16; physical 0.16; solar 0.14
Successful campaigns seem to be more practical: they directly mention the benefits people can receive if they support the project, and they tend to better reflect the details of the project and what entrepreneurs plan to do with the money. Unsuccessful campaigns seem to dwell on the story, the artistic value, and the services they offer. These themes also characterize successful campaigns, but they are emphasized less.

4.3 Results of the Machine Learning Model
After characterizing the language style of successful crowdfunding campaigns, we extended our analysis to see which elements of a campaign (including its text) primarily affect successful funding. Accordingly, we trained a machine learning model that uses decision trees and is designed for unbiased boosting with categorical features, namely CatBoost (Prokhorenkova et al., 2018). CatBoost is a machine learning method for dealing with “big data” that enables the automatic extraction of knowledge and the implementation of optimization tasks while also considering interactions and nonlinear relationships among predictors and dependent variables (Choudhury et al., 2019; Cui et al., 2006).
Table 12.5 Cluster lexical profiles

Cluster 1 Easy Device | Cluster 2 People Storytelling | Cluster 3 Pledge Reward | Cluster 4 Art & Music | Cluster 5 Customer Service | Cluster 6 Product & Service Details
easy | child | reward | album | content | card
device | family | pledge | song | browser | cards
water | young | cost | record | app | deck
phone | story | backer | music | html | color
battery | tell | thank | artist | sound | player
system | life | ship | film | replay | size
usb | woman | goal | studio | capable | leather
light | old | stretch | musician | business | hand
bag | learn | money | dance | market | game
smart | kid | copy | band | user | dice
charge | parent | fund | festival | play | black
easily | lives | kickstarter | ep | website | mm
carry | dream | print | director | service | quality
control | girl | receive | theater | customer | unique
pocket | man | campaign | perform | product | gold
cable | school | raise | producer | need | style
power | people | cover | tour | food | design
solution | mother | reach | musical | store | white
patent | human | order | th | online | piece
bike | father | tier | city | platform | paper

Note: The words are ordered in ascending order according to their chi-square value.
Our model included features extracted through text mining (those described in the previous sections) together with a set of control variables that proved to impact crowdfunding success in past research (Greco et al., 2021). These are the number of pictures or videos appearing on the campaign webpage; the campaign goal, expressed in U.S. dollars; the industry category; the country; the currency; a dummy variable indicating if the campaign was selected as a "staff pick", thus getting more attention; and the time from the launch to the final deadline, expressed in days. We validated the model results through Monte Carlo cross-validation (Dubitzky et al., 2007), with 300 random dataset splits into training and test data. At each split, we trained the model on 75 percent of the sample data and used the remaining observations for testing. Therefore, we validated our results considering predictions made on 300 different test and train sets. In general, we obtained good classification results, correctly identifying successful campaigns in 84.4 percent of cases (on average, considering the different test sets). We also obtained good average values of the Area Under the ROC Curve (0.85) and Cohen's Kappa (0.68). As we were also interested in evaluating the impact of the different predictors on the final classification, we used the SHapley Additive exPlanations (SHAP) package, designed for the Python programming language (Lundberg & Lee, 2017). In particular, we used SHAP to determine feature importance. This method is applicable to the output of different machine learning models and showed better consistency than other approaches (Lundberg et al., 2020; Lundberg & Lee, 2017), particularly for tree ensembles (Lundberg et al., 2018, 2019).
Results of this last analysis are presented in Figure 12.3, which shows the SHAP values for the top 15 most important model features (from top to bottom). We observe that the most important features are the number of images (the more, the better), the campaign goal (the lower, the better), and the industry category. Being a "staff pick" also significantly increases the probability of success. Making the campaign more "personal" (i.e., using the personal pronouns she, he, and we) also has a positive impact, even if smaller than that of the other features just mentioned. Having longer descriptions is usually beneficial, up to a certain threshold (2,000–3,000 words), as is having more hapaxes and types (which probably makes the message less redundant). In terms of videos, the optimum seems to be one or two. High use of the negation term "not" seems to decrease campaign success probability, which is aligned with our results based on the Crovitz model. Indeed, the word "not" was the only one in the Crovitz set to rank among the top 15 features. Similarly, in terms of emotions, negative ones seem to count more than positive ones in distinguishing successful campaigns. Fewer words related to negative emotions in a campaign text increase the probability of successful funding.
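To make this pipeline concrete, the following minimal Python sketch reproduces the kind of training and validation procedure described above. It is only an illustration: the input file, the feature and target column names, and the CatBoost settings are assumptions, not the authors' actual configuration.

    # Minimal sketch of CatBoost training with Monte Carlo cross-validation
    # and SHAP feature importance; file and column names are hypothetical.
    import numpy as np
    import pandas as pd
    import shap
    from catboost import CatBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, roc_auc_score, cohen_kappa_score

    df = pd.read_csv("kickstarter_campaigns.csv")        # assumed input file
    y = df.pop("success")                                # 1 = funded, 0 = not funded
    cat_features = ["category", "country", "currency"]   # assumed categorical controls

    accs, aucs, kappas = [], [], []
    for seed in range(300):                              # 300 random train/test splits
        X_tr, X_te, y_tr, y_te = train_test_split(df, y, train_size=0.75, random_state=seed)
        model = CatBoostClassifier(cat_features=cat_features, verbose=False)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        accs.append(accuracy_score(y_te, pred))
        aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
        kappas.append(cohen_kappa_score(y_te, pred))
    print(f"Accuracy {np.mean(accs):.3f}, AUC {np.mean(aucs):.3f}, Kappa {np.mean(kappas):.3f}")

    # SHAP values for the last trained model; the summary plot ranks the most
    # important features, analogous to Figure 12.3.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_te)
    shap.summary_plot(shap_values, X_te, max_display=15)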
Figure 12.3 SHAP values for the top 15 model features
It is also interesting to observe a dependency between the length of the campaign text and its number of images. Figure 12.4 shows that the probability of success increases with the number of words, with optimal values between 1,000 and 2,000. This positive effect fades out when the text is probably too long (i.e., beyond 2,000 words). In the graph, we also notice an interaction of the word count with the number of images. In particular, the probability of success is even lower when the campaign text is very short and there are many images. Having many images also seems to reduce the positive effect of longer texts. Probably, when there are many images, long descriptions are not necessary.

Figure 12.4 Word count interaction with the number of images
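The kind of interaction shown in Figure 12.4 corresponds to what the SHAP package calls a dependence plot. Continuing the hypothetical sketch above (again, the feature names "word_count" and "n_images" are placeholders for the actual model columns):

    # SHAP dependence plot of text length, colored by the number of images,
    # to visualize the word count/image interaction discussed above.
    shap.dependence_plot("word_count", shap_values, X_te, interaction_index="n_images")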
5. DISCUSSION

This study has important implications for practicing entrepreneurs engaging in crowdfunding campaigns. Entrepreneurs can use the proposed methodology to prepare and review their project descriptions by avoiding specific words and using a more balanced combination of images and text to improve the chance of raising additional funds. The selection of words has a significant role in predicting crowdfunding success, and entrepreneurs should craft
their messages to potential investors through a careful choice of words and visual supporting material. Our findings on the role of the Crovitz 42 Relational Words in determining a campaign's success indicate that using language that signals proximity between speakers and the intended audience helps reduce spatial or temporal barriers. In contrast, words that convey a sense of distance may negatively impact the outcomes, in our case receiving funding. Despite some differences across industries and contexts, we confirm what research in communication accommodation theory (Semin, 2007; Zhu, 2022) had identified: when speakers signal their relational closeness with their audiences and reduce the differences in linguistic features during their interaction, there is an increased opportunity to establish a more meaningful connection. Regarding the length of project descriptions, our findings support recent evidence that providing adequate information positively impacts the fundraising outcome, as it decreases the information asymmetry and provides potential investors with a more complete basis for making a decision (Adamska-Mieruszewska et al., 2021). Our study contributes to this literature as it shows the need to balance the length of the project description, which we also found positively associated with success, and the use of supporting multimedia. It seems that excessively lengthy descriptions do not lead to successful funding, as the positive effect vanishes when the text is longer than 2,000 words. Overall, the three methodologies adopted in this study, Crovitz analysis, ETM, and predictive models, converge to suggest similar results: to be successful, crowdfunding project descriptions need to stick to a simple message, offer enough details to catch attention, and combine textual information with a mix of images and videos. Multimedia offers the opportunity to make up for the supporting details not used in the project description. Our results also indicate the need to balance simplicity, focus, and descriptiveness; including fewer campaign goals in the description may help attract and retain investors' attention.
Both the ETM and the Crovitz model illustrate the need to use positive language. Excessive use of negation terms like "not" tends to decrease campaign success probability. This confirms previous studies (Greco et al., 2021), which also used the Crovitz model and found that non-innovators mostly use the adversative and restrictive conjunction "but" that expresses an explicit opposition, exception, or correction to a previous concept. Non-innovators also tend to use the adverb "not" more often, a negative particle that negates and excludes, which is the opposite of the conjunction "and", characterizing the language of innovators. This chapter contributes to the crowdfunding literature in several ways. Our findings confirm the importance of language choice to the success of a campaign, particularly wording that is positive and inclusive to bring investors into the spirit of the campaign. Our study also found that if entrepreneurs include personal and relational terms in their narratives, they have a higher probability of attracting investors. This research also extends the literature on crowdfunding success by providing additional evidence that language is connected to action (Denning and Dunham, 2006), and specifically that a key factor in a campaign's success is to find the "sweet spot" in the mix of images and words and in the number of words to attract and keep investors' attention, but also to convey goals in a direct and concise manner.
6. CONCLUSION

Leaders can adopt approaches similar to the one described in this study to identify the most creative and innovative individuals in their organizations. Here, we provide suggestions for how entrepreneurs can use language and other controllable variables to increase their chances for success with their crowdfunding campaigns from the very start, as well as understand which uncontrollable variables may boost their chances of success during a campaign. Future research could examine more closely the differences in results across industries that were uncovered in the analysis of the Crovitz 42 Relational Words. For example, entrepreneurs thought to be skilled in communication and writing, such as journalists, actors, artists, and designers, showed less discrimination in how they used the Crovitz words. While our research has demonstrated that Kickstarter campaigns using English are similar across cultures, with the likelihood that campaigners and investors are part of a transnational entrepreneurial culture, it would be interesting to learn if industry culture is salient and has an impact on word choice. A limitation of this study is that we relied on data from one crowdfunding platform, Kickstarter. Future studies could extend the analysis by examining projects posted on other crowdfunding platforms, such as Ulule, Eppela, and Indiegogo. We encourage future researchers to further extend the work reported in this chapter to examine contextual differences in language that might be specific to industries and platforms. Such research could provide more precise guidance about language style and word choice to entrepreneurs to help them craft successful campaigns.
ACKNOWLEDGMENT

The authors are grateful to Vincenzo Butticè and Paola Rovelli for providing the crowdfunding campaign database used in this research.
NOTE

1. For a description and formula of the Z-Score for two population proportions, see https://www.socscistatistics.com/tests/ztest/default.aspx.
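For readers who prefer code to the online calculator referenced in the note, a minimal Python implementation of the z-score for two population proportions might look as follows (the counts in the example are purely illustrative):

    # z-score for H0: p1 == p2, using the pooled sample proportion.
    from math import sqrt

    def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
        p1, p2 = x1 / n1, x2 / n2
        p = (x1 + x2) / (n1 + n2)                   # pooled proportion
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error under H0
        return (p1 - p2) / se

    # e.g., comparing how often a Crovitz word appears in successful vs.
    # unsuccessful campaigns (invented counts)
    print(two_proportion_z(120, 1000, 80, 1000))    # ~2.98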
REFERENCES

Adamska-Mieruszewska, J., Mrzygłód, U., Suchanek, M., & Fornalska-Skurczyńska, A. (2021). Keep it simple: the impact of language on crowdfunding success. Economics & Sociology, 14(1), 130–44.
Belleflamme, P., Lambert, T., & Schwienbacher, A. (2014). Crowdfunding: tapping the right crowd. Journal of Business Venturing, 29(5), 585–609.
Burgoon, M., Denning, V. P., & Roberts, L. (2002). Language expectancy theory. In J. P. Dillard & M. Pfau (eds), The Persuasion Handbook: Developments in Theory and Practice, 117–36. Sage.
Butticè, V., & Rovelli, P. (2020). "Fund me, I am fabulous!" Do narcissistic entrepreneurs succeed or fail in crowdfunding? Personality and Individual Differences, 162, 110037.
Choudhury, P., Wang, D., Carlson, N. A., & Khanna, T. (2019). Machine learning approaches to facial and text analysis: discovering CEO oral communication styles. Strategic Management Journal, 40(11), 1705–32. https://doi.org/10.1002/smj.3067.
Cordova, A., Dolci, J., & Gianfrate, G. (2015). The determinants of crowdfunding success: evidence from technology projects. Procedia – Social and Behavioral Sciences, 181, 115–24.
Crovitz, H. F. (1967). The form of logical solutions. American Journal of Psychology, 80(3), 461–2.
Cui, G., Wong, M. L., & Lui, H.-K. (2006). Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Management Science, 52(4), 597–612. https://doi.org/10.1287/mnsc.1060.0514.
Danowski, J. A. (2013). WORDij version 3.0: semantic network analysis software. University of Illinois at Chicago. https://www.wordij.net.
Denning, P., & Dunham, R. (2006). Innovation as language action. Communications of the ACM, 49(5), 47–52.
Dubitzky, W., Granzow, M., & Berrar, D. (2007). Fundamentals of Data Mining in Genomics and Proteomics. Springer.
Etter, V., Grossglauser, M., & Thiran, P. (2013). Launch hard or go home! Predicting the success of Kickstarter campaigns. Proceedings of the First ACM Conference on Online Social Networks (pp. 177–82). https://doi.org/10.1145/2512938.2512957.
Greco, F., & Polli, A. (2020). Emotional text mining: customer profiling in brand management. International Journal of Information Management, 51, 101934. https://doi.org/10.1016/j.ijinfomgt.2019.04.007.
Greco, F., Riopelle, K., Grippa, F., Fronzetti Colladon, A., & Gluesing, J. (2020). Linguistic sleuthing for innovators. Quality & Quantity, 55(3), 1027–45. https://doi.org/10.1007/s11135-020-01038-x.
Greco, F., Riopelle, K., Polli, A., & Gluesing, J. (2021). Using stop words in text mining: immigration and the election campaigns. Lexicometrica, Proceedings JADT2020.
Greenberg, M. D., Hariharan, K., Gerber, E., & Pardo, B. (2013). Crowdfunding support tools: predicting success & failure. CHI 2013: Changing Perspectives, Paris, France. Work-in-progress: Web and Ecommerce.
Kaminski, J. C., & Hopp, C. (2020). Predicting outcomes in crowdfunding campaigns with textual, visual, and linguistic signals. Small Business Economics, 55(3), 627–49.
Kim, K., & Viswanathan, S. (2014). The experts in the crowd: the role of experienced investors in a crowdfunding market. The 41st Research Conference on Communication, Information and Internet Policy. https://doi.org/10.25300/MISQ/2019/13758.
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67. https://doi.org/10.1038/s42256-019-0138-9.
Lundberg, S. M., Erion, G. G., & Lee, S. I. (2019). Consistent individualized feature attribution for tree ensembles (1802.03888). https://arxiv.org/abs/1802.03888.
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Proceedings of the 31st Conference on Neural Information Processing Systems, 1–10. https://www.arxiv-vanity.com/papers/1705.07874/ (last accessed 18 December 2023).
Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., Liston, D. E., Low, D. K.-W., Newman, S.-F., Kim, J., & Lee, S.-I. (2018). Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2(10), 749–60. https://doi.org/10.1038/s41551-018-0304-0.
Mollick, E. (2014). The dynamics of crowdfunding: an exploratory study. Journal of Business Venturing, 29(1), 1–16.
Ogden, C. K. (1934). The System of Basic English. Harcourt Brace.
Peng, L., Cui, G., Bao, Z., & Liu, S. (2022). Speaking the same language: the power of words in crowdfunding success and failure. Marketing Letters, 33(2), 311–23.
Polya, G. (1957). How to Solve It: A New Aspect of Mathematical Method. Anchor Books [edition published by arrangement with Princeton University Press].
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: unbiased boosting with categorical features. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (eds), Advances in Neural Information Processing Systems 31 (pp. 6638–48). Neural Information Processing Systems Foundation.
Roundy, P. T., & Asllani, A. (2019). Understanding the language of entrepreneurship: an exploratory analysis of entrepreneurial discourse. Journal of Economic and Administrative Sciences, 35(2), 113–27. https://doi.org/10.1108/JEAS-08-2017-0084.
Semin, G. R. (2007). Linguistic markers of social distance and proximity. In K. Fiedler (ed.), Social Communication (pp. 389–408). Psychology Press.
Shneor, R., & Vik, A. A. (2020). Crowdfunding success: a systematic literature review 2010–2017. Baltic Journal of Management, 15(2), 149–82.
Xiao, S., Tan, X., Dong, M., & Qi, J. (2014). How to design your project in the online crowdfunding market? Evidence from Kickstarter. Proceedings of the Thirty Fifth International Conference on Information Systems, Auckland, Australia. http://dblp.uni-trier.de/db/conf/icis/icis2014.html#XiaoTDQ14 (last accessed 18 December 2023).
Xu, A., Yang, X., Rao, H., Fu, W.-T., Huang, S.-W., & Bailey, B. P. (2014). Show me the money! Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 591–600. https://doi.org/10.1145/2556288.2557045.
Yuan, H., Lau, R. Y., & Xu, W. (2016). The determinants of crowdfunding success: a semantic text analytics approach. Decision Support Systems, 91, 67–76.
Zhu, X. (2022). Proximal language predicts crowdfunding success: behavioral and experimental evidence. Computers in Human Behavior, 131, 107213.
13. Design, content and application of consent banners on plastic surgeon websites: derivation of a typology and discussion of possible implications for data analytics and AI applications

Michael Beier and Katrin Schillo
1. INTRODUCTION

New legislation such as the General Data Protection Regulation (GDPR) generally prohibits website operators from collecting certain data from European website visitors and setting non-essential cookies without their explicit consent. Thus, for numerous purposes of data collection, the permission of the user must be specifically obtained via a consent dialog (Machuletz & Böhme, 2020). Such consent dialogs are usually carried out in special tools (so-called consent banners), which are displayed when a website for which no consent has yet been recorded is accessed. Consent banners display relevant information about the possible use of cookies on the website to visitors and offer them options to accept or reject the cookies displayed, their vendors, and their respective purposes of use. If sensitive contexts are involved (e.g., in the health sector), particularly strict data protection standards should apply in this regard (Gradow & Greiner, 2021). Against this background, in this study we empirically explore the question of how consent banners on plastic surgeon websites in Germany are designed and how they are applied. To our knowledge, there have been no studies on consent banners in the medical field.1 However, healthcare providers differ, sometimes significantly, in how and to what extent they engage in online communications (Beier & Früh, 2020). Accordingly, we focused only on a narrow area of healthcare providers for this study. In addition, we specifically selected plastic surgeons as healthcare providers that tend to do more online marketing than others and have a particular focus on communicating through their own websites (Mess et al., 2019). The objectives of this study are twofold. In a first (empirical) step, we exploratively assess the variety of the design and applications of consent banners on the websites of plastic surgeons in Germany. From the collected data we then derive a typology of application patterns and record the quantitative distributions of certain measures on the websites. In a second (conceptual) step, we evaluate the results and consider possible implications, which may result from a systematic bias based on differences in how people generally react to consent banners as well as on potential manipulations to increase the opt-in rates. Such new biases should be considered and corrected appropriately in data analytics and AI applications based on cookie data.
2. LITERATURE

To provide a sound introduction to the background of our study, we briefly review the relevant literature on cookies and their applications, legislation in this regard, and the basic design of consent banners, as well as several aspects of potential manipulations to increase consent rates of users during consent dialogs.

2.1 Cookies and Applications
Cookies are very small text files sent by web servers to the browsers of users who visit a website. These text files are stored on the users' computers so that the web server can recognize them when the page is accessed again by the same user on the same computer (Peters & Sikorski, 1997). In this way, cookies can also store further information about the user's behavior (e.g., passwords, contents of shopping carts, or website settings like the preferred language of a user) and make this available to the web server when the page is accessed again (Peng & Cisna, 2000). In general, a distinction is made between first-party cookies and third-party cookies. From a technical point of view, there are hardly any differences between these two types in terms of the information stored. The key differences are where cookies are set and in what context (Gradow & Greiner, 2021). First-party cookies are stored by the website a user visits and enable the website operator to store certain settings and actions of the user (e.g., shopping cart, language) or to collect analytics data on the visitors' behavior on a website. In contrast, third-party cookies are set by parties (domains) outside the website that a user visits, and they are used by these third parties for purposes such as cross-page tracking, re-targeting, aggregation of individual profiles, or performance optimization in the delivery of digital advertising (Gradow & Greiner, 2021; Tappenden & Miller, 2009; Trevisan et al., 2019). In this context, the term "online behavioral advertising" (OBA) is also frequently used, in which people's behavior in digital environments is monitored, user profiles are generated, and the information collected is used to show users individually tailored advertising, operated by complex, data-driven real-time auctions (Fassl et al., 2021). The monitored behavior of users in this regard may include numerous digital activities, e.g., web browsing behavior, search histories, text and video consumption, app usage, purchases, responses to advertisements, as well as email and social media communications (Boerman et al., 2017). Examples of common tracker cookie vendors include Google, AppNexus, Facebook, RubiconProject, and comScore (Matte et al., 2020). In simplified terms, the comparison between first-party and third-party cookies suggests that first-party cookies are used primarily to provide value to the user or at least to support the website provider's service delivery. In contrast, third-party cookies are mainly used for the benefit of a third party (e.g., an advertising network or a data broker). For instance, it is often argued that by optimizing advertising based on tracking technologies, users are shown ads that are more relevant and better suited to them (Ravichandran & Korula, 2019). In economic terms, however, this primarily benefits the advertising network, which can achieve higher prices for advertising on this basis, mainly because targeted digital advertising is clicked on much more frequently than untargeted advertising (Guérin, 2020; Johnson et al., 2022).
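As a purely illustrative sketch (not taken from this chapter), the following shows how a first-party cookie is typically set via the HTTP Set-Cookie header, here in Python with Flask as an example framework; the route and cookie names are invented:

    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/")
    def index():
        resp = make_response("Welcome back!")
        # First-party cookie: set by the site the user visits, e.g., to
        # remember a language preference on the next request.
        resp.set_cookie("lang", "de", max_age=60 * 60 * 24 * 365)
        return resp

    # A third-party cookie, by contrast, would be set by another domain whose
    # content (e.g., an ad or tracking pixel) is embedded in the page; the
    # website operator merely includes the third party's script or image tag.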
2.2 Legislation and Consent Banners
The far-reaching power and strong market positions of today's digital advertising networks and involved data brokers are largely based on the surveillance concepts and tracking technologies described above (Johnson et al., 2022; Matte et al., 2020). Some even speak of a new kind of surveillance capitalism (Reviglio, 2022; Zuboff, 2015). This situation has motivated advertising networks and data brokers over the years to continue pushing the boundaries of what is possible and to collect increasingly more data (Fassl et al., 2021). Also, since their early beginnings, the functionalities and applications of cookies have grown far beyond the initial intentions (Hormozi, 2005). In contrast, many users had (and still have) no clear understanding of what cookies do on their computers and what they mean for the surveillance of their digital activities (Kulyk et al., 2018). At the same time, however, many users had an uneasy feeling about cookies, especially third-party cookies and their potential applications (Hormozi, 2005; McDonald & Cranor, 2010). In line with the developments in the public perception that the possibilities of using cookies had become too extensive, as well as the need for more information for users and more individual control over the consent to the use of cookies, various legislative initiatives were launched. Accordingly, 2018 saw the introduction of the GDPR, a new data protection regulation in the European Union that aims to improve user privacy by mandating better information about cookie activity on individual websites and allowing Internet users to better control their data and cookie settings (Ma & Birrell, 2022). In addition to the GDPR, the ePrivacy Directive (ePD) contains complementary rules for the processing of personal data by websites. The GDPR is a regulation that is directly enforceable in every European country. The ePD, on the other hand, is a directive, which means that it must be implemented by the individual EU member states in their own national law (Matte et al., 2020). In response to the legal requirements to inform users in a sufficiently simple and sufficiently detailed manner and to offer them customized choices for accepting cookies, many websites nowadays apply consent banners (Gray et al., 2021).2 Such consent banners are applications that are displayed when a user accesses a new website, which (1) display an informational statement about cookies that are prepared for use on the website, (2) ask for consent for all cookies, a category of cookies, or individual cookies, and (3) depending on the user's choices, obtain or require the user's consent to the collection, storage, and processing of his or her data (Ma & Birrell, 2022; Nouwens et al., 2020; Utz et al., 2019).

2.3 Potential Manipulation of Consent Decisions
If website visitors made significant use of the options for opting out of third-party cookies, the providers of these third-party cookies or the associated advertising networks and data brokers would lose massive amounts of revenue (Johnson et al., 2020; Ravichandran & Korula, 2019). Furthermore, the restriction of possible cookie applications would also make it more difficult for website operators to provide personalized services and monitor customers (Kant, 2021). Accordingly, many website operators are motivated to influence users in their consent dialog so that they agree to the use of cookies as often as possible. Users tend to react to the specific kind of presentation of consent options or the specific design of the consent dialog. For instance, users are more likely to accept or click elements that are color highlighted (Bermejo Fernandez et al., 2021). In a similar vein, about 90 percent
of users accept the predefined default settings of consent dialogs (Machuletz & Böhme, 2020). The knowledge about such tendencies in user behavior allows website operators, to a certain extent, to influence users on their websites to give their consent to more cookie applications. Similar to the logic of conversion rates in e-commerce (e.g., the proportion of visitors to a sales page who actually buy something), today's website operators often optimize the design, content, and application of consent banners in such a way that they lead to the highest possible consent rate among visitors to a website (Fassl et al., 2021; Machuletz & Böhme, 2020; Nouwens et al., 2020). Regarding the optimization of consent rates in the application of consent banners, various practices of nudging and dark patterns have been observed. The literature on manipulation during consent dialogs does not clearly distinguish between the terms "nudging" and "dark patterns". However, it seems that nudging is more based on the behavioral economics literature (e.g., Thaler & Sunstein, 2008), while the perspective of dark patterns is more rooted in the User Experience (UX) design literature (e.g., Gray et al., 2018). In general, however, nudging tends to be seen as a lighter form of manipulation compared to dark patterns. Thus, nudging is more about persuading users to perform desired actions based on subtle manipulations through minor adjustments to the design. The user's decision-making autonomy is largely retained. Thus, certain decisions are suggested to the user, but it still requires an independent action from users to make them (Benner et al., 2022). In contrast, dark patterns are already considered a more severe form of manipulation (Nouwens et al., 2020). For example, they allow decisions to be made without a user action, or they exclude options relevant to users from the dialog (Matte et al., 2020).
3. METHOD In order to address the research questions outlined above, the first step in this study was to systematically collect data on consent banners on the homepages of plastic surgeons in Germany. For this purpose, homepages of plastic surgeons in Germany were identified, which tend to have a high traffic. To this end, the profiles of the plastic surgeons with the highest numbers of ratings were identified on the physician rating platform “jameda” (https://www.jameda.de) and the Internet addresses of their homepages were collected. In this way, 87 German plastic surgeon websites (without duplicates) could be collected. Originally, a higher number of websites was planned for this project. However, the jameda platform has considerably limited the query capacity for global searches on physicians in a specific medical specialty. Nevertheless, for this initial exploratory study, the number of 87 websites appears to be sufficient. To examine the consent banners on the websites in detail, the websites were visited via a VPN connection (virtual private network with localization Germany) and displayed in a Google Chrome browser without any cookie restrictions. From each homepage, all elements of the dialog within each consent banner were captured via screenshots and collected (covering all pages and windows of the dialog). In addition, texts of the consent dialogs were copied where possible or necessary to allow for full text searches during the analyses. Because consent banners are implemented with different strategies and instruments in the plastic surgeon sample, the first step in preparing the analysis for this study was to develop a general, abstract scheme for comparative evaluation of the sample’s consent banners. The
abstract scheme for the design, content, and application of the interactive dialogs in the consent banners (Figure 13.1) was developed jointly by the authors. For this purpose, the collected data were discussed together and relevant features and their characteristics were defined. Subsequently, all data sets were coded collaboratively. Inconsistencies were discussed conclusively and the final coding scheme was defined. After the finalization of the abstract scheme, it was applied for a quantitative content analysis of the sample's consent dialogs. In the results section of this chapter, it is applied to present different types of consent banner applications in aggregated visualizations.
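A hypothetical automation of the collection step described above might look as follows; the chapter's screenshots were captured across all pages of each dialog, whereas this sketch only captures the first page, and the URL list is a placeholder:

    import time
    from selenium import webdriver

    urls = ["https://www.example-plastic-surgeon.de"]   # placeholder for the 87 sites

    driver = webdriver.Chrome()       # run behind a VPN with German localization
    for i, url in enumerate(urls):
        driver.get(url)
        time.sleep(5)                 # crude wait so the consent banner can appear
        driver.save_screenshot(f"banner_{i:03d}.png")
    driver.quit()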
Figure 13.1 Coding scheme for integrated analysis of consent dialogs
The coding scheme is divided into four different sections, which are numbered in Figure 13.1. Section 1 indicates whether the consent dialog provided general description texts on cookies and what they are necessary for on the website. Section 2 provides visual information on specific buttons with options to accept or reject all cookies or to ask for further options in the consent dialog. Section 3 indicates categories of cookies and check boxes for aggregated preselections of their acceptance. The naming of the individual categories in the diagram ("Necessary", "Statistics", "Marketing") is only exemplary. In practice, additional categories are also used (e.g., "Performance Cookies", "Audience Measurement", or "Social Media") (Utz et al., 2019). Section 4 indicates whether lists of specific cookie vendors are provided, each with description, specification, and selection option. Such lists are intended by the legislator for application in consent banners to make transparent in the consent dialogs the different providers of cookies and the individual purposes of their use (Hils et al., 2020). In addition, users should be able to make decisions regarding individual cookies or individual providers (Gradow & Greiner, 2021).
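As an illustration of the subsequent quantitative content analysis, tallying the collaboratively assigned codes could be done along the following lines (the file and column names are invented for this sketch):

    import pandas as pd

    codes = pd.read_csv("consent_banner_codings.csv")   # one row per website, N=87

    # Distribution of, e.g., the granularity types reported in Table 13.1
    counts = codes["granularity_type"].value_counts().sort_index()
    shares = (counts / len(codes) * 100).round(1)
    print(pd.DataFrame({"Number": counts, "Share %": shares}))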
4. RESULTS

In this study, we examined four different approaches to the design, content, and application of consent banners on plastic surgeon websites in Germany. The results on the four approaches are presented in the following sections, sorted by increasing severity of manipulation for optimized consent rates in consent banners.

4.1 Granularity and Specificity of Cookie Setting Options During Consent Dialogs
In our first analysis we assessed the degree of granularity and specificity of cookie setting options and information provided during the whole consent dialogs of the homepages in the
sample (N=87). Legislation requires website operators to inform users in a sufficiently simple and comprehensive manner about the use of cookies and the purposes pursued with them, and to offer them individual choices for accepting cookies (Gradow & Greiner, 2021). Therefore, website operators face the difficulty of finding the right balance between the amount of information and options provided vs. the simplicity and accessibility of the presentation (Utz et al., 2019). Accordingly, the granularity and specificity of cookie setting options in consent dialogs can be classified as a very low-threshold or even involuntary measure to increase consent rates. Nevertheless, the granularity and specificity of cookie setting options can be used to optimize consent rates. A higher number of presented options or information is associated with a higher perceived cognitive effort by users necessary to solve the task (Machuletz & Böhme, 2020). This can make it more difficult for users to make informed decisions about accepting cookies. Likewise, it may increase their tendency to not fully engage with the consent dialog and instead make a simple, global decision (e.g., accepting all cookies) (Nouwens et al., 2020). The results in this regard are presented in Table 13.1. In the data we could identify five different types of granularity and specificity of cookie setting options. The visualization of our coding scheme provides all content elements and options possible for consent dialogs (white). The elements of the individual types are presented additively. Thus, the elements of the previous types are included in the next type (marked in dark gray), extended by new elements (marked in black).

Table 13.1 Granularity/specificity of cookie setting options during consent dialogs (N=87)

Label                          Number   Share
1: No information at all       18       20.7%
2: Only Opt-in Cookie Banner   18       20.7%
3: Global Decision             5        5.7%
4: Set Cookie Categories       12       13.8%
5: Set Individual Cookies      34       39.1%

Regarding the whole consent dialog, on the lowest level of granularity and specificity, in 18 cases (20.7 percent of the sample) no information about cookies on the website is provided at all (Type 1). In a further 18 cases (20.7 percent) the users were only informed about the cookie application and offered an implicit or explicit option to accept this, meaning solely the provision of an opt-in (Type 2). In 5 cases (5.7 percent) the users had the additional option to reject all the described cookie applications (Type 3; global decision to opt-in or opt-out). In 12 cases (13.8 percent), in addition an overview of main categories for cookies (e.g., "Necessary", "Statistics", "Marketing") was offered with check boxes for their selection, as
well as the option to accept the selected choice (Type 4). In 34 cases (39.1 percent) the users got the opportunity to specify settings for individual cookies provided in a list (Type 5). Our results show a very high share of cases where no option for an opt-out is provided at all (41.4 percent, Type 1 or Type 2). Admittedly, this approach contradicts the current legislation. Moreover, such patterns have also been referred to as dark patterns in previous studies (e.g., Matte et al., 2020). They may be applied intentionally to prevent the rejection of cookie consent. Conversely, it may also be the case that website operators have simply been too slow to adapt their websites, so that the new cookie consent requirements have not yet been implemented. In contrast, 39.1 percent of the websites in our sample provided users with comprehensive information and options for their cookie consent decisions. At this end of the continuum, there is more of a risk that some users will be overwhelmed by the multitude of information and options and will not be able to make a thoughtful decision.

4.2 Information and Options on First Page of Consent Dialog
In a next step, we analyzed what information and options have been presented directly on the first page of the consent dialogs. Data from a large consent management service provider indicate that more than 97 percent of website visitors only interact with the first page of a consent dialog (Usercentrics, 2020). Accordingly, the positioning of information and decision options on the first page can be understood as a significant possibility for nudging in consent dialogs. It is indeed the case that more information and decision options are usually offered in the further course of the consent dialog on later pages of the consent banner. But only what is displayed on the first page of the banner is considered by the majority of users in their consent decision. Therefore, in our second analysis we assessed which information and options are provided on the first page of the consent banners on the websites in our sample (N=87). We were able to identify six different types of first page consent dialogs. The results are presented in Table 13.2. The visualization of our coding scheme provides all content elements and options possible for consent banners (white). Here, too, the elements of the individual types are shown additively, with the elements of the previous type being included in the next type (marked in dark gray) and extended by new elements (marked in black).

Table 13.2 Information and options on first page of consent dialogs (N=87)

Type                                 Number   Share
1: No information at all             18       20.7%
2: Only Opt-in Cookie Banner         18       20.7%
3: Opt-in Consent Banner             9        10.3%
4: Opt-out with Invisible Settings   16       18.4%
5: Set Cookie Categories             25       28.7%
6: Set Individual Cookies            1        1.1%
In 18 cases (20.7 percent of the sample), no consent dialog or similar was offered on the first page of the website (Type 1). In 18 cases (20.7 percent) only a description text about cookie applications and their purposes was provided, combined with an option to accept all cookies (Type 2; opt-in only). In 9 cases (10.3 percent), buttons to ask for more information or to specify differentiated cookie settings on later pages were additionally provided (Type 3). In 16 cases (18.4 percent), a further option to reject all cookies was offered (Type 4; opt-out). In 25 cases (28.7 percent), in addition an overview of main cookie categories (e.g., "Necessary", "Statistics", "Marketing") was offered with check boxes for their selection, as well as the option to accept the selected choices (Type 5). And finally, in one case (1.1 percent), all potential options of a consent dialog were included on the first page of the dialog (Type 6). In this case, even a list of individual cookies of different vendors was offered with descriptions, specifications, and individual selection options. Echoing our results from the first analysis, the results for the first pages of the consent dialogs show a very high share of cases where no relevant option for a total opt-out is provided at all (41.4 percent, Type 1 or Type 2). In addition, however, our results show that about 57.5 percent (50 cases) of the websites in our sample provided simplified options and information on the first pages of their consent dialogs to their users (Types 3, 4, and 5). These options will likely fall within a range that attempts to balance simplicity and comprehensiveness of presentation within a reasonable scope. Furthermore, this type of presentation can be interpreted as a slight form of nudging, since additional information and options are accessible on the next pages of the consent dialog. Accordingly, some information is hidden from direct access on the first page of the consent banner.

4.3 Color Highlighting of Buttons
Regarding the application of moderately severe nudging tactics, we searched for specific color highlighting in the consent banners and interpreted the intention behind these applications. In this regard, we searched the data of our sample for cases where color highlighting of elements was applied to influence the attention of users and their reactions during consent dialogs. More specifically, we analyzed the color highlighting of the global decisions of accepting (opt-in) or rejecting (opt-out) all cookies. It appears to be a simple and effective nudging application to highlight such elements with a positively connotated color to channel the users' attention to a specific button as well as to reduce resistance to pressing the button (Gray et al., 2018). The relevant codings of these elements were covered in Section 2 of our evaluation scheme. Table 13.3 shows our results in this regard.

Table 13.3 Color highlighting of buttons (N=87)

Label                                                                 Number   Share
1: Not applicable (no options presented during the consent dialog)   36       41.4%
2: Defensive color scheme                                            0        0.0%
3: Neutral color scheme                                              20       23.0%
4: Offensive color scheme                                            31       35.6%

First of all, we found 36 cases (41.4 percent) where no options for general acceptance or rejection of all cookies had been provided. Accordingly, in these cases no options for nudging applications via color highlighting were given. Further, we found no cases (0.0 percent) where a defensive color scheme had been applied to highlight buttons for the global decision of
rejecting all cookies (opt-out). Such a "defensive" scheme would have highlighted the "Reject All" option with a positively connotated color and would have made it easier, to a certain extent, for users to reject all cookies during the cookie dialog (Gray et al., 2018). In 20 cases (23.0 percent), we observed that no color highlighting had been applied at all to the buttons for the global decisions of accepting or rejecting all cookies. We called this a "neutral color scheme". Finally, in 31 cases (35.6 percent) an "offensive" color scheme had been applied, highlighting the "Accept All" option (opt-in) with a positively connotated color. Such a scheme nudges users into accepting all cookies more easily during the cookie dialog (Bermejo Fernandez et al., 2021; Utz et al., 2019).

4.4 Preselection of Options in Consent Banners
Regarding dark patterns, we searched the data of our sample for cases where check boxes of cookie categories had been preselected by design. If such a preselection is misleading or manipulative, this is a much stronger manipulative intervention in the user's decision process than mere misleading color highlighting of buttons. In our analysis, such a dark pattern represents the most severe level of intentional manipulation in consent banners. The corresponding codes for these elements were visualized in Section 3 of our evaluation scheme. Table 13.4 shows our results in this regard.

Table 13.4 Preselection of options in consent banners (N=87)

Label                                                                 Number   Share
1: Not applicable (no options presented during the consent dialog)   41       47.1%
2: Defensive presettings                                              43       49.4%
3: Manipulative presettings                                           3        3.4%
In 41 cases (47.1 percent), classification was not possible because no categories were offered for selection in the consent banner. In 43 cases (49.4 percent), website operators refrained from an aggressive form of manipulation by presetting only the mandatory category of the necessary cookies among the selectable categories. More precisely, this means that in 93.5 percent of the cases where such a pattern was possible (43 out of 46), it has not been applied.
In contrast, three cases in our sample (3.4 percent of the sample and 6.5 percent of the cases where an application was possible) applied such a dark pattern to manipulate users into opting in to all cookies.
5. DISCUSSION

In the following sections we evaluate the empirical results of this exploratory study and consider potential implications for applications of cookie data. In this regard, we combine practical and ethical considerations.

5.1 Evaluation of Results
The results of this exploratory study show the extent to which plastic surgeon websites in Germany use putatively manipulative approaches that can significantly increase website cookie consent rates. The main uses of manipulative approaches that emerge from our results are (1) omitting any information on cookie applications or the option to opt out (41.4 percent), (2) offensive color schemes that highlight the "Accept All" option (35.6 percent), and (3) manipulative cookie category defaults to get users to accept all categories (3.4 percent). To the best of our knowledge, these are the first findings on consent banners in the medical field. To date, there have been primarily broad-based studies that have looked at the most-used websites in a region (e.g., Matte et al., 2020; Utz et al., 2019). Unfortunately, however, it is difficult to compare the results because the evaluation schemes for the design of consent banners as well as the classifications of manipulative approaches were defined differently in these studies. For instance, the study by Utz et al. (2019), analyzing a sample of 1,000 of the most popular websites in the European Union, found 57.4 percent using some kind of nudging or dark patterns to increase the cookie consent rates of the pages. Nevertheless, they evaluated several types of such measures in combination (including color highlighting of buttons, hiding additional settings on later pages of the consent dialog, and preselecting check boxes to activate data collection). The percentage of websites in our sample of plastic surgeons in Germany that do not offer a general opt-out option for cookie consent at all seems to be quite high, at 41.4 percent. Matte et al. (2020) analyzed 1,426 European websites (with a particular focus on France, the U.K., Italy, Belgium, and Ireland) and found that only 6.8 percent of them did so. However, the percentages in the individual countries varied between 3.6 percent and 11.1 percent. One possible reason for the high proportion of missing opt-outs may be that plastic surgeons are relatively slow in developing their online presences and, accordingly, many websites have simply not yet been converted to the required cookie opt-out option since the introduction of the GDPR. In addition, at 35.6 percent, the high proportion of offensive color schemes that highlight the "Accept All" option stands out in our data. Unfortunately, we could not find studies showing numbers on the specific prevalence of color-highlighted buttons in consent banners. Available studies in this regard used experimental approaches with simulated websites to analyze the impact of color-highlighted buttons in consent banners on the resulting consent rates (e.g., Bermejo Fernandez et al., 2021). In the context of our empirical results, however, we consider
color highlighting to be a rather mild form of manipulation in consent banners, since color highlighting only steers users' attention, leaving them to make their own decisions. In contrast, cases where check boxes of cookie categories have been preselected for opt-ins by design can be classified as severe forms of manipulation in consent banners (Machuletz & Böhme, 2020). In our sample, however, such manipulations are rare, accounting for 6.5 percent of the cases in which they were applicable. In this regard, Matte et al. (2020) observed in their sample of 1,426 European websites that 46.5 percent of the websites applied preselected choices. However, while in our sample only preselections in the category check boxes were evaluated, Matte et al. also included preselections in any other check boxes of the consent banners (e.g., regarding individual vendors) in their analysis.

5.2 Practical and Ethical Implications
The results of our study raise two important questions that have both practical and ethical dimensions. The first issue is manipulative interventions in the design of consent banners. As shown in our analyses, different measures can be ranked according to the severity of manipulation and ethical questionability. However, as is common with ethical norms, the order of such rankings is sometimes viewed differently by different stakeholders (Gray et al., 2021). Similar findings were seen in the imprecise and varying definitions of nudging and dark patterns in the different studies presented above in our literature review. Nevertheless, it is obvious that manipulative measures in consent dialogs are clearly unethical above a certain level. This is also the view of the legislator (Gradow & Greiner, 2021). In contrast, website operators, connected advertising networks, data brokers, and the final exploiters of the generated data are dependent on high consent rates in consent banners and thus on the availability of certain data volumes in sufficient quality. Accordingly, there is also a practical need for a sufficiently good solution regarding high consent rates. Overly ethical behavior could lead to significant disadvantages here. Yet this challenge is probably far smaller for plastic surgeons in Germany than for e-commerce companies or other businesses that rely heavily on comprehensive knowledge of their online customer interactions (Johnson et al., 2020; Kant, 2021; Ravichandran & Korula, 2019). However, in addition to aggressive manipulation, there are other ways to increase the approval rate, e.g., by offering users benefits or incentives and communicating them clearly (Mager & Kranz, 2021a, 2021b). The second issue arises from changes in the availability of data as well as changes in the data available. Aside from the fact that less data is available (making many operations difficult or even impossible), consent dialogs and manipulations also affect the content of the available data itself. Different people have different privacy concerns and attitudes (e.g., Bermejo Fernandez et al., 2021; Tifferet, 2019; Zhang et al., 2013). Different people also react differently to manipulation measures during consent dialogs (e.g., Coventry et al., 2016). All these influences will lead to bias in the available data. Today's applications for processing user profiles, segmenting customer data sets, and optimizing digital advertising playouts make extensive use of complex computations and artificial intelligence applications (e.g., Brito et al., 2015; Huang & Rust, 2021; Wang et al., 2017; Zulfa et al., 2022). Many of these applications are fully automated, without humans having any insight into the complex processes of execution (Fassl et al., 2021). Hence, the risks are high that miscalculations or biases arise in such applications due to changes in the available data and that these are not detected in a timely manner. On the one hand, this again has practical implications in that applications of the data
are less successful or generate less revenue. On the other hand, miscalculation or bias in data applications might also systematically put people at a disadvantage or lead to discrimination (Srinivasan & Chander, 2021). Accordingly, such new types of biases should be anticipated and corrected in data analytics and AI applications by data exploiters. At first glance, one might think that the issues highlighted are hardly relevant for plastic surgeons in Germany, assuming that they have only minor use for the data collected via cookies. However, via third-party cookies on their websites, they may actively support the collection of data from their visitors for third parties (in particular, advertising networks, data brokers, and data exploiters). This is all the more serious in a medical context, as medical information and data on search and online behavior require special protection (Gradow & Greiner, 2021). Even if data collection and utilization is not part of the core business of plastic surgeons, they should probably live up to their responsibility as website operators in a sensitive medical environment. On the one hand, all plastic surgeons should (as required by legislation) provide users with a simple and clear way to generally opt out of non-essential cookies on their websites. On the other hand, they should ask themselves how much data they actually need themselves and refrain from further data collection for third parties as far as possible. With respect to data collectors and exploiters, the considerations of this chapter suggest that consent management and its manipulations can introduce new biases into the data collected. This presents both practical and ethical challenges that should be appropriately considered in analytics and AI applications based on this data.
6. CONCLUSION

The empirical results of our study show significant variations in the design, content, and application of consent banners on German plastic surgeon websites. To the best of our knowledge, these are the first empirical findings on consent banners in the healthcare sector. On the basis of our empirical results, we were also able to point to possible new biases in cookie data, arising both from variations in user responses to consent dialogs in general as well as from variations in user responses to manipulations in this regard. This should raise awareness of new relevant issues related to cookie data and consent management and provide initial guidance on possible approaches to address these issues. However, this is only an exploratory empirical study with a small sample (N=87) in a narrow range of healthcare providers. Future research should therefore aim to place our findings on a broader empirical basis. Larger samples comparing different types of physicians or healthcare providers would allow for deeper and more empirically sound insights into the use of consent banners in healthcare. On the other hand, this study provides only preliminary considerations of systematic biases in cookie data resulting from consent management and manipulations therein. Further research should explore in case studies the details of how such biases occur in practice and what exactly can be done to adequately correct for them in data analytics and AI applications that use such biased cookie data.
NOTES

1. A systematic search of the literature for common terms for consent banners ("cookie banner", "cookie disclaimers", "consent notices", "consent pop-ups", "consent dialogs", "cookie consent") yielded no hits in the medical science databases PubMed and MEDLINE.
2. However, in the literature on this topic, consent banners are often referred to quite differently: e.g., as "cookie banners" (Utz et al., 2019), "cookie disclaimers" (Kulyk et al., 2018), "consent notices" (Bermejo Fernandez et al., 2021), "consent pop-ups" (Nouwens et al., 2020), or "consent dialogs" (Machuletz & Böhme, 2020), which makes the topic somewhat difficult to access.
REFERENCES
Beier, M., & Früh, S. (2020). Technological, organizational, and environmental factors influencing social media adoption by hospitals in Switzerland: cross-sectional study. Journal of Medical Internet Research, 22(3), e16995.
Benner, D., Schöbel, S. M., Janson, A., & Leimeister, J. M. (2022). How to achieve ethical persuasive design: a review and theoretical propositions for information systems. AIS Transactions on Human-Computer Interaction, 14(4), 548–77.
Bermejo Fernandez, C., Chatzopoulos, D., Papadopoulos, D., & Hui, P. (2021). This website uses nudging: MTurk workers’ behaviour on cookie consent notices. Proceedings of the ACM on Human-Computer Interaction, 5, CSCW2, article 346.
Boerman, S. C., Kruikemeier, S., & Zuiderveen Borgesius, F. J. (2017). Online behavioral advertising: a literature review and research agenda. Journal of Advertising, 46(3), 363–76.
Brito, P. Q., Soares, C., Almeida, S., Monte, A., & Byvoet, M. (2015). Customer segmentation in a large database of an online customized fashion business. Robotics and Computer-Integrated Manufacturing, 36, 93–100.
Coventry, L. M., Jeske, D., Blythe, J. M., Turland, J., & Briggs, P. (2016). Personality and social framing in privacy decision-making: a study on cookie acceptance. Frontiers in Psychology, 7, 1341.
Fassl, M., Gröber, L. T., & Krombholz, K. (2021). Stop the consent theater. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3411763.3451230.
Gradow, L., & Greiner, R. (2021). Consent-Management. Springer Gabler.
Gray, C. M., Kou, Y., Battles, B., Hoggatt, J., & Toombs, A. L. (2018). The dark (patterns) side of UX design. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3173574.3174108.
Gray, C. M., Santos, C., Bielova, N., Toth, M., & Clifford, D. (2021). Dark patterns and the legal requirements of consent banners: an interaction criticism perspective. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3411764.3445779.
Guérin, M. M. (2020). The improvement of retargeting by big data: a decision support that threatens the brand image? European Journal of Marketing and Economics, 4(1), 31–44.
Hils, M., Woods, D. W., & Böhme, R. (2020). Measuring the emergence of consent management on the web. In Proceedings of the ACM Internet Measurement Conference, 317–32. https://doi.org/10.1145/3419394.3423647.
Hormozi, A. M. (2005). Cookies and privacy. Information Security Journal, 13(6), 51–9.
Huang, M. H., & Rust, R. T. (2021). A strategic framework for artificial intelligence in marketing. Journal of the Academy of Marketing Science, 49(1), 30–50.
Johnson, G., Runge, J., & Seufert, E. (2022). Privacy-centric digital advertising: implications for research. Customer Needs and Solutions, 9(1), 49–54.
Johnson, G. A., Shriver, S. K., & Du, S. (2020). Consumer privacy choice in online advertising: who opts out and at what cost to industry? Marketing Science, 39(1), 33–51.
Kant, T. (2021). Identity, advertising, and algorithmic targeting: or how (not) to target your “ideal user”. MIT Case Studies in Social and Ethical Responsibilities of Computing, Summer 2021. https://doi.org/10.21428/2c646de5.929a7db6.
Kulyk, O., Hilt, A., Gerber, N., & Volkamer, M. (2018). “This website uses cookies”: users’ perceptions and reactions to the cookie disclaimer. In European Workshop on Usable Security (EuroUSEC), Vol. 4. https://doi.org/10.14722/eurousec.2018.23012.
Ma, E., & Birrell, E. (2022). Prospective consent: the effect of framing on cookie consent decisions. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491101.3519687.
Machuletz, D., & Böhme, R. (2020). Multiple purposes, multiple problems: a user study of consent dialogs after GDPR. In Proceedings on Privacy Enhancing Technologies, 2, 481–98. https://doi.org/10.2478/popets-2020-0037.
Mager, S., & Kranz, J. (2021a). Consent notices and the willingness-to-sell observational data: evidence from user reactions in the field. In Proceedings of the European Conference on Information Systems (ECIS). https://aisel.aisnet.org/ecis2021_rp/89/.
Mager, S., & Kranz, J. (2021b). On the effectiveness of overt and covert interventions in influencing cookie consent: field experimental evidence. In Proceedings of the International Conference on Information Systems (ICIS). https://aisel.aisnet.org/icis2021/cyber_security/cyber_security/5/.
Matte, C., Bielova, N., & Santos, C. (2020). Do cookie banners respect my choice? Measuring legal compliance of banners from IAB Europe’s transparency and consent framework. In Proceedings of the IEEE Symposium on Security and Privacy, 791–809. https://doi.org/10.48550/arXiv.1911.09964.
McDonald, A. M., & Cranor, L. F. (2010). Americans’ attitudes about internet behavioral advertising practices. In Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society, 63–72. https://doi.org/10.1145/1866919.1866929.
Mess, S. A., Bharti, G., Newcott, B., Chaffin, A. E., Van Natta, B. W., Momeni, R., & Swanson, S. (2019). To post or not to post: plastic surgery practice marketing, websites, and social media? Plastic and Reconstructive Surgery Global Open, 7(7). https://doi.org/10.1097/GOX.0000000000002331.
Nouwens, M., Liccardi, I., Veale, M., Karger, D., & Kagal, L. (2020). Dark patterns after the GDPR: scraping consent pop-ups and demonstrating their influence. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3313831.3376321.
Peng, W., & Cisna, J. (2000). HTTP cookies: a promising technology. Online Information Review, 24(2), 150–53.
Peters, R., & Sikorski, R. (1997). Cookie monster? Science, 278(5342), 1486–7.
Ravichandran, D., & Korula, N. (2019). Effect of disabling third-party cookies on publisher revenue. Google White Paper. https://services.google.com/fh/files/misc/disabling_third-party_cookies_publisher_revenue.pdf (accessed 28 December 2023).
Reviglio, U. (2022). The untamed and discreet role of data brokers in surveillance capitalism: a transnational and interdisciplinary overview. Internet Policy Review, 11(3), 1–27.
Srinivasan, R., & Chander, A. (2021). Biases in AI systems. Communications of the ACM, 64(8), 44–9.
Tappenden, A. F., & Miller, J. (2009). Cookies: a deployment study and the testing implications. ACM Transactions on the Web, 3(3), article 9. https://doi.org/10.1145/1541822.1541824.
Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press.
Tifferet, S. (2019). Gender differences in privacy tendencies on social network sites: a meta-analysis. Computers in Human Behavior, 93, 1–12.
Trevisan, M., Traverso, S., Bassi, E., & Mellia, M. (2019). 4 years of EU cookie law: results and lessons learned. Proceedings on Privacy Enhancing Technologies, 2019(2), 126–45.
Usercentrics (2020). Die Optimierung der Opt-in Rate – eine neue Disziplin im Onlinemarketing [Optimizing the opt-in rate – a new discipline in online marketing]. https://usercentrics.com/de/ressourcen/whitepaper-opt-in-optimierung/ (accessed 28 December 2023).
Utz, C., Degeling, M., Fahl, S., Schaub, F., & Holz, T. (2019). (Un)informed consent: studying GDPR consent notices in the field. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 973–90. https://doi.org/10.48550/arXiv.1909.02638.
Wang, J., Zhang, W., & Yuan, S. (2017). Display advertising with real-time bidding (RTB) and behavioural targeting. Foundations and Trends® in Information Retrieval, 11(4–5), 297–435.
Zhang, R., Chen, J. Q., & Lee, C. J. (2013). Mobile commerce and consumer privacy concerns. Journal of Computer Information Systems, 53(4), 31–8.
Zuboff, S. (2015). Big other: surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 30(1), 75–89.
Zulfa, A. A., Sihabuddin, S., & Widhiyanti, H. N. (2022). Utilization of personal data through cookies using artificial intelligence from human rights perspective. International Journal of Multicultural and Multireligious Understanding, 9(3), 293–303.
PART V MORE SUSTAINABILITY THROUGH SOCIAL COMPUTING
14. Creating a systematic ESG (Environmental Social Governance) scoring system using social network analysis and machine learning for more sustainable company practices Aarav Patel and Peter A. Gloor
1. INTRODUCTION
Many feel companies need to place more emphasis on social responsibility. For instance, 100 companies have been responsible for 71 percent of global greenhouse gas emissions since 1988 (Carbon Majors Database1). Many business leaders have publicly stated that they are on board with incorporating sustainability measures. In 2016, a UN survey found that 78 percent of CEO respondents believed corporate efforts should contribute to the UN Sustainable Development Goals, which are goals adopted by the United Nations as a universal call to action to end poverty and protect the planet (UN, 2016). However, while many executives pledged greater focus on these areas of concern, only a few took noticeable tangible action. In a more recent 2019 UN survey, only ~20 percent of responding CEOs felt that businesses were making a difference in the worldwide sustainability agenda (UN, 2019). These surveys highlight a disconnect between sustainability goals and sustainability actions. They also highlight inefficiencies in current executive actions since many feel they are not making enough progress toward social responsibility.
Environmental Social Governance (ESG) is a commonly used metric that determines the sustainability and societal impact of a company’s practices. ESG raters such as MSCI (Morgan Stanley Capital International), S&P Global, and FTSE (Financial Times Stock Exchange) do this by measuring sub-categories such as pollution, diversity, human rights, community impact, etc. (Figure 14.1). Measuring these areas of concern is necessary since it encourages companies to rectify bad practices. This is because ESG ratings can influence factors such as investor capital, public perception, credit ratings, and so on. Furthermore, ESG ratings can provide companies with specific information on which key areas to improve, which can help better guide their initiatives.
At the moment, ESG is assessed by rating agencies using self-reported company filings. As a result, companies can often portray themselves in an artificially positive light. These biased reports have led to subjective and inconsistent analysis between different ESG rating organizations, despite them seeking to measure the same thing (Kotsantonis et al., 2019). For instance, the correlation among six prominent ESG rating agencies is 0.54; in comparison, mainstream credit ratings have a stronger correlation of 0.99 (Berg et al., 2019). As a result, many feel that there is a disconnect between ESG ratings and a company’s true social responsibility. This highlights how subjective assessment and limited data transparency from self-reporting can create inconsistent ratings.
Note: Based on S&P Global ESG evaluation framework. Source: S&P Global (n.d.).
Figure 14.1  Categories determining ESG score
Having more consistent and accurate ESG evaluation is important. Divergence and imprecision in ESG ratings hamper motivation for companies to improve since they give executives mixed signals on what to change (Stackpole, 2021). As a result, it becomes difficult to create better-targeted sustainability initiatives. Furthermore, self-reporting allows companies with more resources to portray themselves better. This is why there is a significant positive correlation between a company’s size, available resources, and ESG score (Drempetic et al., 2019). These issues ultimately defeat the purpose of ESG by failing to motivate companies toward sustainable practices. This raises the need for a more holistic and systemized approach to ESG evaluation that can more precisely measure a company’s social responsibility. By establishing a more representative ground truth, it can better guide company initiatives toward social responsibility, thus increasing the impact of ESG.
2. RELATED WORKS
Existing ESG-related research falls under two main categories. Some papers aim to correlate ESG performance with financial performance and see if a company’s corporate social responsibility (CSR) can be used to predict future stock performance (Jain et al., 2019). Other papers propose new data-driven methods for enhancing and automating ESG rating measurement to avoid existing fallacies/inefficiencies (Hisano et al., 2020; Krappel et al., 2021; Liao et al., 2017; Lin & Hsu, 2018; Shahi et al., 2011; Sokolov et al., 2021; Venturelli et al., 2017; Wicher et al., 2019). This chapter falls into the latter category. Since many firms publish sustainability reports on an annual basis, many researchers use this content for analysis. This is typically done using text mining to identify ESG topics and trends. In order to parse out and leverage this data, researchers have created classification
models that can classify sentences/paragraphs into various ESG subdimensions (Liao et al., 2017; Lin & Hsu, 2018). Additionally, some researchers have used these text classification algorithms to analyze the completeness of sustainability reports (Shahi et al., 2011). This is because companies sometimes limit disclosure regarding negative ESG aspects within their filings. Both tools can assist in automatic ESG scoring using company filings, which increases access for companies without ESG coverage. However, there are deficiencies in solely relying on self-reported filings for analysis since this fails to consider omitted data or newer developments. As a result, researchers have been testing alternative methods to solve this. For instance, some researchers utilize a Fuzzy Expert System (FES) or a Fuzzy Analytic Network Process (FANP), pulling data from quantitative indicators (i.e., metrics provided by the Global Reporting Initiative) and qualitative features from surveys/interviews (Venturelli et al., 2017; Wicher et al., 2019). Others collected data from online social networks like X (formerly Twitter) to analyze a company’s sustainability profile. For example, some used natural language processing (NLP) frameworks to classify tweets into various ESG topics and determine whether they are positive or negative (Sokolov et al., 2021). Furthermore, some used heterogeneous information networks that combined data from various negative news datasets and used machine learning to predict ESG (Hisano et al., 2020). Finally, others explored the viability of using fundamental data such as a company’s profile and financials to predict ESG (Krappel et al., 2021). Overall, all these methods aimed to improve on self-reported filings by using more balanced, unbiased, and real-time data.
3. PURPOSE
The purpose of this project was to create a systematic ESG rating system that gives executives and outsiders a more balanced and representative view of a company’s practices for greater social responsibility. To do this, a machine-learning algorithm was created using social network data to quantitatively evaluate ESG. Social network data was used instead of self-reported filings since it can provide various outsider perspectives on issues people feel a corporation should address. By directly showcasing public opinion, it can remove the bias of self-reporting and help executives create more targeted initiatives for meaningful change. Furthermore, a data-driven system can provide ESG ratings for companies without coverage. To test the predictive power of the proposed system, the correlation and the mean absolute average error (MAAE) were measured against current ESG ratings. This can help determine whether the system is viable for rating prediction. However, potential constraints include limited access to high volumes of social network data, the accuracy of NLP algorithms, and limited computational resources. The contributions of this work can be summarized as follows:
● It gives a real-time social-sentiment ESG score that highlights how people feel regarding a company’s practices. This can give executives a way to monitor the ESG health of their organization. It also shows which areas people feel need the most change, and this can help target executive initiatives to be more effective.
● It provides a full-stack method for gathering real-time ESG data and converting it into a comprehensive score. This allows for the readily available creation of initial ESG ratings that can be used either directly by investors to ensure they are making socially conscious
investments (especially for non-rated companies) or by ESG rating agencies to scale up coverage.
● The proposed approach utilizes multiple social networks for score prediction. Most papers about ESG social network analysis typically hyperfocus on one specific network such as Twitter or Google News (Sokolov et al., 2021). This chapter seeks to combine them while also adding other under-analyzed social networks (i.e., LinkedIn, Wikipedia).
4. METHODS
The creation of this project was divided into three steps. The first step was data collection through web scrapers across various social networks. Afterward, text data was pre-processed and converted into sub-category scores using NLP. Finally, machine-learning algorithms were trained using this data to compute a cohesive ESG rating (Figure 14.2).
Figure 14.2  An overview of how the data-driven ESG index uses social network data to compute a cohesive ESG rating
4.1 Data Collection
Rather than use self-reported corporate filings, social network data was used to holistically quantify ESG. Social network analysis and web scraping can be used to identify trends (Gloor et al., 2009). Popular social networks such as Twitter, LinkedIn, and Google News have a plethora of data pertaining to nearly any topic. This data can provide a balanced view of company ESG practices, and it can help cover both short-term and long-term company ESG trends. It can also gather data that might not be reflected in filings. Finally, this data can directly highlight the concerns of outsiders, which can better guide company ESG initiatives to be more impactful.

Table 14.1  Keywords/topics used for data collection
Environment: environment, carbon, climate, emission, pollution, sustainability
Social: social, community, discrimination, diversity, human rights, labor
Governance: governance, compensation, corruption, ethical, fraud, justice, transparency
To do this, a comprehensive list of ESG-relevant keywords was created (Table 14.1). This list of keywords was inspired by sub-categories commonly used in current ESG rating methodologies. This list was used to help collect publicly available company data from Wikipedia, LinkedIn, Twitter, and Google News. To collect data, web scrapers were developed in Python. Wikipedia data was collected using the Wikipedia Application Programming Interface (API).
Wikipedia serves to give a general overview of a company’s practices. Google News data was collected by identifying top news articles based on a Google search. The links to these articles were stored. The news serves to give overall updates on notable ESG developments. Twitter data was collected with the help of the Snscrape library. Snscrape is a lightweight API that allows users to collect near-unlimited tweets (with certain restrictions on how many can be collected per hour) from almost any time frame. Twitter was chosen primarily to give consumer-sided feedback on a company’s practices. Since the LinkedIn API does not support the collection of LinkedIn posts, an algorithm was created from scratch to do so instead. The algorithm utilized the Selenium Chromedriver to simulate a human scrolling through a LinkedIn query. Based on this, each post’s text was collected and stored using HTML requests via BeautifulSoup. LinkedIn serves to provide more professional-sided information on a company’s practices. This data collection architecture allows for ratings to be refreshed and generated in real time as needed. Afterward, data for each sub-category was stored in a .csv file. These four social networks cover a wide range of company ESG data.
Data was collected for most S&P 500 companies (excluding real estate). Real estate was excluded primarily because it did not receive as much coverage pertaining to ESG issues (based on surface-level analysis) and so did not seem viable for the proposed system. This ensures the collected companies were well balanced across sectors and industries. The web scrapers attempted to collect ~100 posts/articles for each keyword on a social network. However, sometimes less data would be collected because of API rate limits and limited data availability for lesser-known companies. In order to speed up collection, multiple scripts were run simultaneously. At first, the programs would often get rate-limited for collecting so much data in such a short time frame. To resolve this, safeguards were added to pause the program whenever a rate limit was encountered. All data collection was done following each site’s terms and conditions. In total, ≈ 937,400 data points were collected across ≈ 470 companies, with an average of ≈ 37 points per social network keyword. Most of this data was concentrated in 2021. However, a hard date range was not imposed because it would remove data points for lesser-known companies that already struggled to gather enough information. Once all data was collected, it was exported onto a spreadsheet for further analysis.
Data was preprocessed using RegEx (Regular Expressions). First, URLs and links were removed. Mentions were replaced with a generic word to abstractify names. Finally, uncommon characters and punctuation were removed – this helped filter out words/characters that might interfere with NLP analysis. A minimal sketch of this collection and cleaning pipeline is shown below.
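The following is only an illustrative sketch of the described Twitter collection and RegEx cleaning, not the chapter’s original code: the query format, per-keyword limit, output file, and the generic replacement word for mentions are assumptions, and the tweet text attribute differs across snscrape versions.

    import csv
    import re
    import snscrape.modules.twitter as sntwitter

    def collect_tweets(company, keyword, limit=100):
        """Collect up to `limit` tweets mentioning a company and an ESG keyword."""
        query = f"{company} {keyword}"
        texts = []
        for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
            if i >= limit:
                break
            # `content` in older snscrape versions; `rawContent` in newer ones
            texts.append(getattr(tweet, "rawContent", None) or tweet.content)
        return texts

    def preprocess(text):
        """RegEx cleaning as described: drop URLs, abstractify mentions,
        and remove uncommon characters/punctuation."""
        text = re.sub(r"https?://\S+", "", text)          # remove URLs and links
        text = re.sub(r"@\w+", "user", text)              # generic word for mentions (assumed)
        text = re.sub(r"[^A-Za-z0-9\s.,!?']", " ", text)  # drop uncommon characters
        return re.sub(r"\s+", " ", text).strip()

    if __name__ == "__main__":
        cleaned = [preprocess(t) for t in collect_tweets("Apple", "climate")]
        with open("apple_climate.csv", "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([row] for row in cleaned)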
4.2 NLP Analysis
After the data was cleaned and organized, an NLP algorithm was built for analysis. First, an ESG relevancy algorithm was created to filter out ESG-irrelevant data that might obstruct results. To do this, keyword detection was used to see if the post/article discussed the current company as well as one or more of the ESG sub-categories. Next, Python’s Natural Language Toolkit (NLTK) Named Entity Recognition library was used to determine whether a post related to the organization, in order to remove unintended data. For example, if the query “apple climate” was searched, then a post might come up saying “Spring climate is the best time to grow apple trees.” However, Named Entity Recognition would be able to identify that this sentence is not ESG relevant, since “apple” here refers to the fruit rather than the organization. Therefore, the algorithm will disregard it
from the analysis. On the other hand, if the post said, “Apple is pouring 500 million dollars into initiatives for climate change,” then the algorithm would determine that the post is talking about Apple the organization. This filtration step helps remove irrelevant information to improve data quality. A sketch of such a relevance filter is given below.
After filtration, NLP sentiment analysis was used to score whether a post was ESG positive or negative. Two NLP algorithms were created to do this: the short-post NLP algorithm analyzed shorter bodies of text (tweets, LinkedIn posts) while the long-article NLP algorithm analyzed longer ones (news articles, Wikipedia articles). A comparative analysis of different Python sentiment analysis libraries was carried out. After comparing libraries such as TextBlob, VADER, FastText, and Flair, it was found that Flair outperformed the other classifiers. This is likely because simple bag-of-words classifiers, such as VADER or TextBlob, fail to identify the relations that different words have with each other. Flair, on the other hand, uses contextual word vectors to analyze a sentence’s word-level and character-level relationships. This is likely why, when these algorithms were tested on the Stanford Sentiment Treebank (SST) to rate movie review sentiment on a scale of 1–5, the Flair algorithm performed the best with an F1 score of 49.90 percent (Akbik et al., 2018; Rao, 2019) (Figure 14.3). The short-post algorithm was therefore built using the Flair sentiment analysis library. The long-article algorithm is essentially the short-post algorithm averaged across all relevant body paragraphs (i.e., paragraphs containing the company name) in an article.
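A minimal sketch of such a relevance check with NLTK’s named entity chunker follows; the helper name and the set of entity labels consulted are illustrative assumptions. The chunker requires NLTK’s “punkt”, “averaged_perceptron_tagger”, “maxent_ne_chunker”, and “words” data packages.

    import nltk

    def mentions_organization(text, company):
        """Return True if NLTK's NER recognizes the company name as a named
        entity (e.g., ORGANIZATION) rather than an ordinary word."""
        tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
        for subtree in tree.subtrees():
            if subtree.label() in {"ORGANIZATION", "PERSON", "GPE"}:
                entity = " ".join(token for token, _ in subtree.leaves())
                if company.lower() in entity.lower():
                    return True
        return False

    # Expected behavior on the examples from the text:
    print(mentions_organization(
        "Spring climate is the best time to grow apple trees.", "Apple"))   # False
    print(mentions_organization(
        "Apple is pouring 500 million dollars into initiatives for climate change.",
        "Apple"))                                                            # True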
Figure 14.3  Comparison of accuracy of different sentiment analysis algorithms on SST-5 database
Sources: Akbik et al., 2018; Rao, 2019.
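For illustration, the Flair-based scoring step can be sketched roughly as follows. The signed-score convention and helper names are assumptions; “en-sentiment” is Flair’s standard pretrained sentiment model, and “sentiment-fast” is the lighter variant mentioned later in the text.

    from flair.data import Sentence
    from flair.models import TextClassifier

    # Standard Flair sentiment model; "sentiment-fast" is the lighter RNN variant
    classifier = TextClassifier.load("en-sentiment")

    def short_post_score(text):
        """Map Flair's POSITIVE/NEGATIVE label and confidence to a signed score."""
        sentence = Sentence(text)
        classifier.predict(sentence)
        label = sentence.labels[0]
        return label.score if label.value == "POSITIVE" else -label.score

    def long_article_score(paragraphs, company):
        """Average the short-post score over paragraphs mentioning the company,
        mirroring the long-article algorithm described above."""
        relevant = [p for p in paragraphs if company.lower() in p.lower()]
        if not relevant:
            return 0.0
        return sum(short_post_score(p) for p in relevant) / len(relevant)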
These umbrella algorithms were further optimized for each specific social network. For example, the LinkedIn algorithm analyzed the author’s profile of a LinkedIn post to eliminate self-reporting. This is because executives often discuss their positive initiatives and goals, which can dilute other unbiased observations and thus skew results. Additionally, for the Twitter and LinkedIn algorithms, if a link address was found within the text, then the algorithm would analyze that article for evaluation.
Initially, the analysis algorithm was very slow since it would take Flair 3–4 seconds to analyze one post. So, a variation called “Flair sentiment-fast” was installed. This allowed Flair to conduct batch analysis, analyzing multiple posts simultaneously. This significantly cut down on analysis time while slightly sacrificing accuracy. Once all raw data was scored, the scores were averaged into a cohesive spreadsheet. Mean imputing was used to fill in any missing sub-score data, as sketched below. These sub-category scores can provide executives with breakdowns of social sentiment on key issues, giving them concrete information about which areas to improve. These scores can be used raw to help guide initiatives, or they can be compiled further through machine learning to provide an ESG prediction.
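Mean imputing of this kind is a one-liner in pandas; the following sketch assumes a hypothetical spreadsheet with one row per company and one column per sub-score.

    import pandas as pd

    # Hypothetical file layout: one row per company, one column per sub-score
    scores = pd.read_csv("esg_subscores.csv", index_col="company")
    # Replace missing sub-score cells with the column (sub-category) mean
    scores = scores.fillna(scores.mean(numeric_only=True))
    scores.to_csv("esg_subscores_imputed.csv")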
4.3 Machine-Learning Algorithms
After compiling the data, different machine-learning models were tested. The goal of these models was to predict an ESG score from 0 to 100, with 0 being the worst and 100 being the best. Most of these supervised learning models were lightweight regression algorithms that can learn non-linear patterns with limited data. These algorithms include Random Forest Regression, Support Vector Regression, K-Nearest Neighbors Regression, and XGBoost (Extreme Gradient Boosting) Regression. Random Forest Regression operates by constructing several decision trees during training time and outputting the mean prediction (Ho, 1995). Support Vector Regression identifies the best-fit line within a threshold of values (Awad & Khanna, 2015). K-Nearest Neighbors Regression predicts a value based on the average value of its neighboring data points (Kramer, 2013). XGBoost Regression uses gradient boosting by combining the estimates/predictions of simpler regression trees (Chen & Guestrin, 2016). These regression algorithms were trained using 19 features. These features include the average sentiment for each of the 18 keywords with an additional category for Wikipedia. They were calibrated to public S&P Global ESG ratings to ensure they did not diverge much from existing solutions. A publicly licensed ESG rating scraper on GitHub was used to retrieve S&P Global ESG scores for all companies that were analyzed (Shweta-29, n.d.). Optimization techniques such as regularization were used to prevent overfitting for greater accuracy. Before creating the algorithms, companies with fewer than five articles/posts per ESG sub-category were filtered out. This left ≈ 320 companies for analysis. In order to create and test the algorithm, ≈ 256 companies were used as training data, while ≈ 64 companies were used as testing data. These results were used to determine the predictive capabilities of the algorithm. A sketch of this training and evaluation setup follows.
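The setup can be sketched with scikit-learn as below; the feature file layout, hyperparameters, and the 80/20 split are illustrative assumptions rather than the authors’ exact configuration.

    import pandas as pd
    from scipy.stats import pearsonr
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    # Hypothetical layout: 19 sentiment features plus the S&P Global target (0-100)
    data = pd.read_csv("esg_features.csv")
    X = data.drop(columns=["company", "sp_esg_score"])
    y = data["sp_esg_score"]

    # Roughly 256 training / 64 testing companies, as in the text
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    maae = mean_absolute_error(y_test, pred)  # "MAAE" on the 0-100 scale
    r, p_value = pearsonr(y_test, pred)       # correlation and its p-value
    print(f"MAAE: {maae:.1f}, r: {r:.3f}, p: {p_value:.4f}")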
5. RESULTS
The Random Forest Regression model displayed the strongest overall results when tested on a holdout sample of 64 companies. It had the strongest correlation with current S&P Global ESG scores, with a statistically significant correlation coefficient of 26.1 percent and a MAAE of 13.4 percent (Figures 14.4, 14.5). The correlation has a p-value of 0.0372 (< 0.05), showing that the model is well calibrated to existing ESG solutions. On the other hand, while the other models have similar MAAE, they also have lower correlation coefficients that do not prove to be statistically significant (Figure 14.5). For example, the Support Vector Regression algorithm had a correlation of 18.3 percent and a MAAE of 13.7 percent, which results in a p-value of 0.148 (Figure 14.7). The XGBoost model had a correlation of 16.0 percent and a MAAE of 14.7 percent, which results in a p-value of 0.207
(Figure 14.6). Finally, the K-Nearest Neighbors algorithm had a correlation of 13.2 percent and a MAAE of 14.0 percent, corresponding to a p-value of 0.298 (Figure 14.8). However, all the algorithms had a similar MAAE that fell between 13 percent and 15 percent, with the Random Forest model having the lowest at 13.4 percent (Figure 14.9). All the algorithms met the MAAE criterion of 20.0 percent.
Figure 14.4  Mean absolute average error of different machine-learning algorithms against S&P Global ESG score
6. DISCUSSION
The Random Forest regression model likely performed the best because it works by combining the predictions of multiple decision trees. This allows it to improve its accuracy and reduce overfitting to one specific tree, thus producing superior results. The Random Forest regression algorithm had a statistically significant R2 correlation of 26.1 percent (p-value < 0.05), and it had a low MAAE of 13.4 percent. These results align with similar work done using other
Figure 14.5  R2 correlation of different machine-learning algorithms
sources of data (Krappel et al., 2021). For example, a paper by Krappel et al. (2021) created an ESG prediction system by feeding fundamental data (i.e., financial data and general information surrounding the company) into ensemble machine-learning algorithms. Their most accurate model received an R2 correlation of 54 percent and a MAAE of 11.3 percent. While the proposed algorithm does not correlate as well as Krappel et al.’s model, likely because it leverages qualitative data, it still highlights the viability of using social sentiment as a proxy for ESG. The proposed algorithm displayed encouraging results, highlighting its viability in ESG rating prediction. Unlike current ESG raters who determine ESG using self-disclosed sustainability reports, the proposed algorithm’s data-driven approach allows for a more holistic and balanced evaluation. Utilizing social sentiment also allows executives to measure which areas people want a company to improve on, helping to focus actions on change. Additionally, the system’s architecture allows for scores to be updated within short time frames. Finally,
Figure 14.6  XGBoost model predictions v actual scores (scale 0–100)
Figure 14.7  Support Vector Regression predictions v actual scores (scale 0–100)
executives can test additional keywords by inputting them into the algorithm. These attributes showcase the system’s flexibility as well as its advantages over the conventional methodology. A limitation of the results, however, is that the system was tested only on S&P 500 companies. Therefore, results might not carry over to smaller companies below this index. Another limitation could be misinformation within the social network data. While this should be diluted by other comments, it can potentially alter the algorithm’s ratings. Additionally, the Flair sentiment analysis algorithm sometimes misclassified post/article sentiment, especially if the post/article had a sarcastic tone. Finally, for this research, access to certain paid native APIs
Figure 14.8  K-Nearest Neighbor model predictions v actual scores (scale 0–100)
Figure 14.9  Random Forest model predictions v actual scores (scale 0–100)
was not available. As a result, the collected data might not encompass all data available for a keyword due to rate limiting. While the algorithm has displayed statistically significant results, there is room for improvement in future research. Some of this can include gathering more data, either by analyzing more companies beyond the S&P 500 or by collecting data for more keywords and ESG sub-topics. This can also be done by using native APIs to collect more data points per individual keyword. Additionally, more data sources could be incorporated into the model. This can be done by incorporating other social networks (i.e., Reddit, Glassdoor) or
by including quantitative data/statistics (i.e., percentage of women as board members, amount of scope 1 carbon emissions, etc.) from company reports and government databases. Furthermore, to better fit the task at hand, NLP algorithms can be created specifically for ESG. For instance, while the current method filters much of the irrelevant data, some unrelated data still gets through. To solve this, a new supervised learning algorithm can be trained to identify related bodies of text using Term Frequency–Inverse Document Frequency (TF-IDF) vectorization; a minimal sketch of this idea follows below. The algorithm can be trained by hand-labeling the data that has already been collected. In addition, the long-article/short-post NLP algorithms can be further optimized. While Flair can already provide satisfactory results, some articles seem to be misclassified, which might be a source of error for the algorithm. By creating a sentiment analysis algorithm specifically tailored to ESG classification, the long-article and short-post NLP algorithm accuracy can be further improved. This can be done either by creating a custom ESG lexicon with weights or by training a novel NLP algorithm against classified ESG data. Another area to be improved is post credibility: while small amounts of misinformation would not significantly alter results, it is still best to mitigate this risk as much as possible. There is a growing body of literature that explores fake news identification on social networks, and these approaches can potentially be used to identify fake posts/articles (de Beer & Matthee, 2020). Also, adding “hard” quantitative data from company filings to the algorithm can serve as an added safeguard. Finally, the algorithm can prioritize more centralized/credible actors over others to yield safer outputs.
Overall, this research provides a proof-of-concept framework for a social-network-based ESG evaluation system. This work can serve as the backend logic for a social sentiment ESG product which can eventually be used by executives. While pre-packaged libraries were used for prototyping purposes, these aspects of the project can be optimized in future works. Unlike existing frameworks that rely on self-reported company filings, the proposed models take a more balanced view of a company’s ESG positives and negatives. In general, this can help approach an ESG ground truth that can better influence company practices to be more sustainable.
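Since this classifier is only proposed rather than built, the following is merely a minimal sketch of the idea with scikit-learn; the toy texts and labels stand in for the hand-labeled data described above.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for hand-labeled posts (1 = ESG-relevant, 0 = irrelevant)
    texts = [
        "Apple is pouring 500 million dollars into climate initiatives",
        "Spring climate is the best time to grow apple trees",
    ]
    labels = [1, 0]

    # TF-IDF features feeding a simple supervised relevance classifier
    relevance_clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    relevance_clf.fit(texts, labels)
    print(relevance_clf.predict(["The company cut its carbon emissions in half"]))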
7. CONCLUSION
The proposed ESG analysis algorithm can help standardize ESG evaluation for all companies. This is because it limits self-reporting bias by incorporating outside social network analysis for more balanced results. A social network-based ESG index can also directly show which areas people want to change, which can better focus executive efforts on meaningful change. Additionally, using machine learning, the model can generate a proxy for a company’s social responsibility, which can help determine ESG for smaller companies that do not have analyst coverage. This will help more companies receive ESG ratings in an automated way, which can create a more level playing field between small and large companies and ultimately help more socially responsible firms prevail. Overall, the project can have broad implications for bridging the gap in ESG, helping redirect large quantities of ESG capital toward more sustainable and ethical initiatives.
NOTE
1. For more information, see https://climateaccountability.org/carbonmajors.html (accessed 24 May 2022).
REFERENCES
Akbik, A., Blyth, D., & Vollgraf, R. 2018. Contextual string embeddings for sequence labeling. Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, August 20–26, pp. 1638–49.
Awad, M., & Khanna, R. 2015. Support vector regression. Efficient Learning Machines. Berkeley, CA: Apress. https://doi.org/10.1007/978-1-4302-5990-9_4.
Berg, F. et al. 2019. Aggregate confusion: the divergence of ESG ratings. SSRN Electronic Journal. http://doi.org/10.2139/ssrn.3438533.
Chen, T., & Guestrin, C. 2016. XGBoost: a scalable tree boosting system. KDD ʼ16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August, pp. 785–794. http://doi.org/10.1145/2939672.2939785.
de Beer, D., & Matthee, M. 2020. Approaches to identify fake news: a systematic literature review. Integrated Science in Digital Age 136, pp. 13–22. https://doi.org/10.1007/978-3-030-49264-9_2.
Drempetic, S. et al. 2019. The influence of firm size on the ESG score: corporate sustainability ratings under review. Journal of Business Ethics 167(2), pp. 333–60. http://doi.org/10.1007/s10551-019-04164-1.
Gloor, P. A. et al. 2009. Web Science 2.0: identifying trends through semantic social network analysis. 2009 International Conference on Computational Science and Engineering. http://doi.org/10.1109/cse.2009.186.
Hisano, R., Sornette, D., & Mizuno, T. 2020. Prediction of ESG compliance using a heterogeneous information network. Journal of Big Data 7(1), p. 22.
Ho, T. K. 1995. Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition (Vol. 1), pp. 278–82.
Jain, M., Sharma, G. D., & Srivastava, M. 2019. Can sustainable investment yield better financial returns: a comparative study of ESG indices and MSCI indices. Risks 7(1), 15. https://doi.org/10.3390/risks7010015.
Kotsantonis, S., & Serafeim, G. 2019. Four things no one will tell you about ESG data. Journal of Applied Corporate Finance 31(2), pp. 50–58. https://doi.org/10.1111/jacf.12346.
Kramer, O. 2013. Dimensionality Reduction with Unsupervised Nearest Neighbors, Intelligent Systems Reference Library Volume 51, Springer.
Krappel, T., Bogun, A., & Borth, D. 2021. Heterogeneous ensemble for ESG ratings prediction. KDD Workshop on Machine Learning in Finance. https://doi.org/10.48550/arXiv.2109.10085.
Liao, P.-C., Ni-Ni, X., Chun-Lin, W., Zhang, X.-L., & Yeh, J.-L. 2017. Communicating the corporate social responsibility (CSR) of international contractors: content analysis of CSR reporting. Journal of Cleaner Production 156, pp. 327–36.
Lin, S.-J., & Hsu, M.-F. 2018. Decision making by extracting soft information from CSR news report. Technological and Economic Development of Economy 24(4), pp. 1344–61.
Rao, P. 2019. Fine-grained sentiment analysis in Python (Part 1). Medium, Towards Data Science, Sept. 9. https://www.towardsdatascience.com/fine-grained-sentiment-analysis-in-python-part-1-2697bb111ed4 (accessed 3 January 2022).
S&P Global. n.d. ESG evaluation. http://www.spglobal.com/ratings/en/products-benefits/products/esg-evaluation (accessed 1 November 2021).
Shahi, A. M., Issac, B., & Modapothala, J. R. 2011. Analysis of supervised text classification algorithms on corporate sustainability reports. Proceedings of 2011 International Conference on Computer Science and Network Technology, Vol. 1, pp. 96–100.
Shweta-29. n.d. Shweta-29/Companies_ESG_Scraper. GitHub. https://github.com/shweta-29/Companies_ESG_Scraper (accessed 15 November 2022).
Sokolov, A., Mostovoy, J., Ding, J., & Seco, L. 2021. Building machine learning systems for automated ESG scoring. The Journal of Impact and ESG Investing 1(3), pp. 39–50.
Stackpole, B. 2021. Why sustainable business needs better ESG ratings. MIT Sloan, Dec. 6. https://mitsloan.mit.edu/ideas-made-to-matter/why-sustainable-business-needs-better-esg-ratings (accessed 24 May 2022).
UN [United Nations Global Compact] (2016). The UN Global Compact – Accenture Strategy CEO Study 2016. https://www.unglobalcompact.org/library/4331 (accessed 24 May 2022).
UN [United Nations Global Compact] (2019). UN Global Compact – Accenture Strategy 2019 CEO Study – The Decade to Deliver: A Call to Business Action. https://www.unglobalcompact.org/library/5715 (accessed 26 May 2022).
Venturelli, A., Caputo, F., Rossella, L., Giovanni, M., & Chiara, M. 2017. How can CSR identity be evaluated? A pilot study using a Fuzzy Expert System. Journal of Cleaner Production 141, pp. 1000–1010.
Wicher, P., Zapletal, F., & Lenort, R. 2019. Sustainability performance assessment of industrial corporation using Fuzzy Analytic Network Process. Journal of Cleaner Production 241. https://doi.org/10.1016/j.jclepro.2019.118132.
15. Two chambers, no silver bullets: the growing polarity of climate change discourse Jacek Mańko and Dariusz Jemielniak
1. INTRODUCTION
Echo chambers or filter bubbles are growing phenomena in the infosphere and have a disastrous effect on topics prone to misinformation and disinformation (Carmichael et al., 2017; Meyer et al., 2019). Though clear definitions of echo chambers or filter bubbles remain elusive, the two terms nevertheless share many common characteristics. For example, an echo chamber might be defined as a virtual community in which an overwhelming proportion of members share crucial views while not being exposed to opposing ones. Filter bubbles can be seen as a natural consequence of echo chamber formation: they encapsulate people in highly selective, isolated opinion silos, depriving them of views different from their own and thus amounting to polarization (Einav et al., 2022). It comes as no surprise, therefore, that discourse on the Internet becomes increasingly agonistic and polarized (Marichal & Neve, 2019). Social media are an arena of cultural wars, where misogyny, racism, as well as conspiracy theories thrive (Bricker & Justice, 2019; Jemielniak, 2016; Siapera, 2019). COVID-19 has made low-credibility information on Twitter particularly prevalent (Jemielniak & Krempovych, 2021; Yang et al., 2020), but the problem persists across all topics. The current discourse about climate change is particularly worth studying, given its high popularity in the mainstream media and its well-proven polarization (Williams et al., 2015). Like the COVID-19 pandemic, climate change fits well into combative, highly politicized discourse featuring a science-oriented narrative versus diverse forms of science denialism and anti-scientific attitudes.
2. LITERATURE REVIEW
2.1 Twitter
Even though it would be difficult to claim that Twitter is representative of social views, its heated discussions are arguably an emanation of topics important in the public discourse. Climate change disputes on Twitter are particularly interesting, as they are highly polarized and saturated with the plethora of filter bubbles that permeate social media platforms. Twitter is a primarily text-based platform designed for posting and/or exchanging short comments (up to 280 characters), further referred to as tweets. Though Twitter is falling behind when it comes to the number of active users (roughly 370 million as of December 2022, compared with 2.8 billion users by Meta/Facebook or 1.5 billion by TikTok), it has managed to establish a clearly distinctive character in the kaleidoscope of other social media platforms. What makes Twitter so unique is the fact that, unlike any other platform, it attracts a combination of various opinion
leaders (i.e., celebrities, journalists, researchers, academicians, and, finally, politicians) who, very often, run their accounts, interact with other users, and post their tweets solely by themselves (Hwang, 2013; Tanupabrungsun & Hemsley, 2018). Effectively, Twitter has become a transnational public sphere, used for social activism and general public persuasion, also on topics related to the climate crisis (Ganczewski & Jemielniak, 2022; Neff & Jemielniak, 2022). Furthermore, due to its distributed and decentralized character, Twitter poses very few, if any, entry barriers for anyone willing to engage in a public discourse of interest. Therefore, Twitter has become a unique blend of conflicting yet influential first-hand opinions, as well as discussions on a number of high-profile topics, in a character and scope that is nowhere else to be found on the Internet (Jungherr, 2016; Mazid, 2022). Previous studies have shown that Twitter can be a good platform for analyzing climate change polarization and, in general, online tribes (Gloor et al., 2019, 2020).
2.2 Research about Climate Change Polarization on Twitter
Climate change is such a prevalent topic in mainstream media that it has been accumulating the attention of scientists, decision-makers, and public opinion for decades now. Though the principal mechanisms of climate change, including its anthropogenic causes through excessive greenhouse gas emissions, have reached broad consensus among the scientific community, public debate on the topic remains highly divisive and politicized (Williams et al., 2015). Twitter naturally offers a viable venue for some of this divisive climate change-related discourse, as it allows its users to reach a wide audience easily. Yet, somewhat unsurprisingly, by accommodating such discourse, Twitter has become yet another arena for a digital clash of incommensurable world views. Previous research on climate change-related discourse on Twitter, however diversified in scope and methods, has consistently demonstrated a certain degree of politicization of such discourse (Bednarek et al., 2022; Wei et al., 2021) and the presence of opposing scientific as well as various anti-scientific camps advancing their contrasting positions. For example, one study (Tyagi et al., 2020) investigated echo chambers within climate change-related discussions during the United Nations’ Climate Change Conference taking place in Katowice, Poland in December 2018. Climate change denialists, also referred to as disbelievers, resorted to different hashtags (e.g., #ClimateHoax) as compared with believers, who were tweeting under hashtags like #ClimateChangeIsReal. Both these groups differed significantly in terms of word use and topic selection, which further amplifies the notion of existing echo chambers between those groups. A subsequent study (Tyagi & Carley, 2021) demonstrated the existence on Twitter of a robust community of climate change disbelievers spreading various conspiracy theories and showing positive sentiment towards them. The most popular conspiracy theories about climate change claim that some group with power, such as an official government or a secret organization, for example in the form of a covert deep state, is intentionally causing or worsening the problem with the climate. These theories further suggest that those entities are responsible for conducting experiments or following a premeditated plan to alter the climate. Tyagi and Carley (2021) also pointed out that it is climate change disbelievers who predominantly share such conspiracy theories on Twitter bona fide. However, believers were also found to retweet some conspiracy theories, yet mostly to debunk or invalidate such views. Another study (Moernaut et al., 2022) employed both qualitative and quantitative methods within Dutch and Flemish communities on Twitter to investigate public reaction to hot weather in the Netherlands during summer 2018
and observed that, indeed, there is a significant amount of climate change denialism present on social media platforms, despite such views not being reflected in mainstream news coverage. This might suggest that somewhat marginalized groups of deniers are actively looking for different online public spheres where they could disseminate their opinions uninterrupted. On the other hand, climate change believers resorted to highly unilateral and, to some extent, repetitive, or even authoritative modes of interaction, often retweeting mainstream media coverage or otherwise respected public figures, such as climate scientists. Though in their sample Moernaut et al. (2022) report some limited interactions between the ideologically opposed camps of deniers and believers, their results seem rather to support the notion of polarized echo chambers. A recent qualitative study by Eslen-Ziya (2022) about the debate on climate change on Twitter in the Turkish language also provided corroborating evidence for the existence of polarized groups of deniers and believers who promote their own discourse using highly emotional, humorist, or even sarcastic rhetoric. Interestingly, climate change believers resorted to, for example, hope when tweeting about ways of dealing with climate change from a scientific and/or political perspective instead of, as one might assume, fearmongering about climate change’s imminent and destructive nature. At the same time, climate change disbelievers were found to resort to distrust towards science and mainstream politicians and, hence, to conspiratorial thinking. Eslen-Ziya (2022) also points to the fact that echo chambers can lead to the intensification of views and emotions represented by those two camps, which in turn can serve to further emphasize and strengthen such already polarized opinions.
2.3 Goal of the Study
Summarizing this brief literature review, it becomes evident that polarized echo chambers of climate change deniers and believers coexist on Twitter. Yet, the character of such groups has not been studied systematically so far. The goal of our study, inspired by the approach of Thick Big Data (Jemielniak, 2020), is to perform an exploratory, mixed-methods, quantitative and qualitative investigation of this growing polarity and of echo chambers, using the example of climate change-related discussion on Twitter. We look at the general sentiment of the tweets as well as at the media links shared within them. From a qualitative standpoint, we also look at the most popular tweets and the most popular words used across both groups of deniers and believers.
3. METHODOLOGY
3.1 Keyword/Hashtag Selection
Our chapter presents the results of a study of a self-collected database of nearly 400,000 tweets in English from two ideologically opposing camps, one represented by the keyword #climatehoax (38,672 unique users), the other by #sustainability (118,536 unique users). For the climate change deniers, tweets in English containing one or more of the following keywords were obtained: climatehoax, climatechangehoax, and globalcooling. For climate change believers, these were, analogously, the keywords: sustainability, sustainableliving, and ecofriendly. These keywords were chosen because of their a priori polarity and prevalence in the global discourse about climate change. Since climate change deniers consider the general scientific
consensus on climate change to be untruthful, they refer to it as a hoax. Furthermore, they adopt hashtags and/or keywords that ridicule or even mock the idea of climate change, hence referring to putative global cooling as opposed to the mainstream scientific narrative of global warming. Interestingly, keywords such as ecofriendly, sustainability, or sustainable living, though not referring to climate or climate change directly, are considered among the most supportive of the “climate change is real” narrative, as more general, thematic hashtags such as #climatechange or #globalwarming are known to attract both climate change deniers and believers (Effrosynidis et al., 2022). Including such general hashtags would first require stance detection analysis among Twitter users to distinguish believers from deniers; since this was not a primary goal of our study, we resorted to already polarized hashtags.
3.2 Tweets Collection
Tweets were collected in May 2021 using an open-source scraping tool for Twitter called TWINT (https://github.com/twintproject/twint); a minimal collection sketch is shown after Table 15.2. For each keyword, the limit was set to collect a maximum of 100,000 tweets. All relevant tweets, including historic ones, prior to the collection date were obtained, so that the time range for tweets spanned between June 2008 and May 2021. We only looked at tweets that were still available on the platform during the period of collection, which implies that we studied tweets that followed Twitter’s community standards and were considered acceptable for public discussion. This aligns with the ethical practices recommended by the Association of Internet Researchers (AoIR). Additionally, it takes into account Twitter’s moderation practices, which influence the content on the platform.
Table 15.1  Keywords and tweets frequencies for climate change believers
Keyword              Count
sustainability       93,697
sustainableliving    89,741
ecofriendly          82,835
Total:               266,273
Table 15.2  Keywords and tweets frequencies for climate change deniers
Keyword              Count
climatehoax          85,031
climatechangehoax    27,971
globalcooling        18,641
Total:               131,643
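The TWINT collection referenced in section 3.2 can be sketched roughly as follows. The configuration mirrors the text (English tweets, a 100,000-tweet cap per keyword), while the hashtag form of the query and the output file names are assumptions.

    import twint

    KEYWORDS = ["climatehoax", "climatechangehoax", "globalcooling",
                "sustainability", "sustainableliving", "ecofriendly"]

    for keyword in KEYWORDS:
        c = twint.Config()
        c.Search = f"#{keyword}"   # hashtag search (assumed form of the query)
        c.Lang = "en"              # English tweets only
        c.Limit = 100000           # per-keyword cap used in the study
        c.Store_csv = True
        c.Output = f"{keyword}.csv"
        twint.run.Search(c)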
There were 397,916 tweets in total. There are twice as many tweets for climate change believers as there are for deniers. This disproportion is attributable to the higher popularity of mainstream scientific climate change discourse on large social media platforms such as Twitter, as well as to Twitter’s terms of use and service, which may result in deleting tweets and/or accounts spreading misinformation. Importantly, the two datasets are all but mutually exclusive: only two tweets contained keywords from both the deniers’ and the believers’ camps.
3.3 Data Analysis
All analyses were performed in Python v.3.8. First, we analyzed overall tweet sentiment in each camp. Sentiment analysis was performed using the Twitter Toolbox package (https://github.com/eschibli/twitter-toolbox). We also show the media links most commonly shared by the opposing groups, to draw conclusions on the perceived neutrality and objectivity of the media sources used. In doing so, we also investigated the application of the concept of sustainability for various marketing and commercial purposes. We extracted un-shortened outgoing URLs to establish the media links used – in other words, we checked the final URL destinations whenever a link shortener, such as bit.ly or ow.ly, was used (a minimal resolution sketch is shown below). We also studied the most frequent words used in both datasets and exploratively analyzed, in a qualitative manner, a small sample of the ten most popular (defined as the sum of replies, retweets, and favorites) tweets from both communities. We focused on the analysis of the content, nature, and emotional tone of tweets within each camp, without studying, for example, how and, if so, to what extent climate change believers and deniers interact among themselves or even between the camps.
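The URL resolution step can be sketched with the requests library as below; the helper names, timeout, and the example link are assumptions.

    import requests
    from urllib.parse import urlparse

    def resolve_final_url(url, timeout=10):
        """Follow redirects (e.g., bit.ly, ow.ly) to the final destination."""
        try:
            return requests.head(url, allow_redirects=True, timeout=timeout).url
        except requests.RequestException:
            return url  # keep the original link if resolution fails

    def domain_of(url):
        """Extract the bare domain used in the media-link frequency tables."""
        netloc = urlparse(resolve_final_url(url)).netloc
        return netloc[4:] if netloc.startswith("www.") else netloc

    print(domain_of("https://bit.ly/example"))  # hypothetical shortened link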
4. RESULTS
4.1 Sentiment Analysis
Sentiment analysis involves identifying the emotions and tone of the person writing the tweets. For this task, we used an open-source tool called Twitter Toolbox with an implemented sentiment classifier based on Spacy’s small English language model (Sharma, 2020), supported by a bag-of-words logistic regression model and two long short-term memory neural networks (https://pypi.org/project/twitter-nlp-toolkit/). To show the polarized sentiment of the tweets, we assigned them a numerical value ranging from 0 to 1, where 0 is most negative and 1 most positive. The examples given below illustrate both positive and negative sentiment polarity in the two datasets.
One of the most positive tweets from the climate change believers camp:
“Omg omg omggg … This is unbelievable! #Hyderabad declared as most #vegan friendly city in #India This makes me super happy and proud. #veganlife is the way forward! Thank you @TOIIndiaNews #sustainableliving #veganism #veganfood https://timesofindia.indiatimes.com/entertainment/events/hyderabad/hurray-hyderabad-is-indias-most-vegan-friendly-city-2019/articleshow/71940234.cms”
One of the most negative tweets from the climate change deniers camp:
“#Fake #GlobalCooling #GlobalWarming #ClimateChange Activists Want ‘Meat Tax.’ Just another way the poor get poorer. https://www.newsmax.com/Newsfront/climate-change-meat-tax/2015/11/28/id/703802/?ns_mail_uid=83561757”
Mean sentiment values for each keyword in each of the camps are presented in Tables 15.3 and 15.4. Figure 15.1 depicts sentiment polarization between the climate change deniers and believers over time. Sentiment for the latter is consistently higher throughout the whole study
Mean sentiment values for each keyword in each of the camps are presented in Tables 15.3 and 15.4. Figure 15.1 depicts sentiment polarization between the climate change deniers and believers over time. Sentiment for the latter is consistently higher throughout the whole study period, indicating more positive tweets from climate change believers. This is in accordance with previous studies documenting that tweets from anti-scientific groups have lower sentiment than those from groups supporting the scientific mainstream narrative (Gerts et al., 2020).
Table 15.3 Mean sentiment for tweets from climate change denialists camp

Keyword              Mean sentiment
globalcooling        0.67
climatechangehoax    0.66
climatehoax          0.62
Table 15.4 Mean sentiment for tweets from climate change believers camp

Keyword              Mean sentiment
sustainability       0.85
sustainableliving    0.84
ecofriendly          0.83
Figure 15.1 Mean sentiment polarity across each month from January 2019 to May 2021 for both camps

4.2 Analysis of Media Links
To investigate how online discourse differs between the two ideological camps, we looked at the media sources referred to in tweets. This allowed us to see which media sources are considered trustworthy and relevant enough to support a given argument. If the media sources referenced in the two camps overlapped, it could imply that the differences in discourse are due to differing interpretations of the same information. However, if there are significant differences in the most frequently used media sources, it could suggest that the camps and their users belong to ideologically different groups with different standards for what constitutes a reliable and important source, and possibly different cultural norms. In order to investigate that, we aggregated all media links used in tweets and divided the most frequent ones into subjective categories (such as e-commerce website, blog, or social media platform) to which we believe a given media link belongs. The category was assigned by one of the authors after manual inspection of the content of a given website and reading through its Wikipedia description whenever applicable. Tables 15.5 and 15.6 present the 15 most popular media links, as measured by the number of times they appeared in tweets, alongside the respective categories for climate change believers and deniers. Figure 15.2 shows the proportional distributions of media link categories across the camps.
Table 15.5 Most frequent media links with categories for climate change believers camp

     Domain                Frequency   Category
1.   etsy.com              6,572       E-commerce
2.   instagram.com         5,988       Social Media
3.   youtube.com           3,103       Social Media
4.   gleam.io              2,196       Marketing
5.   livegoodinc.com       2,098       E-commerce
6.   linkedin.com          2,030       Social Media
7.   blogspot.com          1,154       Blogs
8.   amazon.com            708         E-commerce
9.   bfmcreates.com        661         E-commerce
10.  rateitgreen.com       660         Blogs
11.  ecofriendlylink.com   550         Blogs
12.  facebook.com          525         Social Media
13.  theguardian.com       512         Mainstream media outlet
14.  olioex.com            432         E-commerce
15.  forbes.com            388         Mainstream media outlet
Table 15.6 Most frequent media links with categories for climate change deniers camp

     Domain                Frequency   Category
1.   youtube.com           6,224       Social Media
2.   infiniteunknown.net   1,440       Climate change denialism
3.   blogspot.com          1,099       Blogs
4.   breitbart.com         886         Conservative media outlet
5.   wordpress.com         743         Blogs
6.   iceagenow.info        616         Climate change denialism
7.   wattsupwiththat.com   498         Climate change denialism
8.   drtimball.com         423         Climate change denialism
9.   dailymail.co.uk       401         Mainstream media outlet
10.  theguardian.com       387         Mainstream media outlet
11.  dailycaller.com       315         Conservative media outlet
12.  americanthinker.com   305         Conservative media outlet
13.  zerohedge.com         294         Conservative media outlet
14.  foxnews.com           289         Conservative media outlet
15.  facebook.com          266         Social Media
Figure 15.2 Stacked bar chart demonstrating proportional distributions of media link categories across the climate change denialists and believers camps
For climate change denialists, social media platforms are used relatively sparsely, with the notable exception of YouTube, which is notorious for hosting a large number of conspiracy videos (Röchert et al., 2022). However, since we did not perform any content analysis of the shared YouTube videos, we can only assume that they very likely contained some forms of climate change denialism. Furthermore, and not surprisingly, links associated with (far) right-wing or alt-right media outlets dominate within the climate change denialists camp: Breitbart, American Thinker, The Daily Caller, or Fox News, though mainstream outlets such as the Daily Mail and The Guardian are present as well. Media links from the private blogs of prominent climate change denialists such as Anthony Watts or Timothy Ball appeared to be popular too. It is worth mentioning that far- or alt-right media tend to propagate the climate change denialists' agenda themselves (Krange et al., 2021; Żuk & Szulecki, 2020), so the distinction between those two categories might be fuzzy in this regard.

As for the climate change believers, the most popular media link categories turn out to be various e-commerce platforms, followed by social media (mostly Instagram and YouTube) and marketing platforms (i.e., http://www.gleam.io). Notably, Instagram is a social media platform with a strong emphasis on visual material in the form of images and short videos such as reels or stories. There is compelling evidence that Instagram has been widely used for marketing strategies across various industries, including tourism and food, which are particularly relevant for climate, sustainability, and eco-friendliness (Rejeb et al., 2022). The popularity of Instagram can thus be, at least partially, attributed to the fact that those hashtags are apparently often employed as part of marketing campaigns advertising certain ecological products or lifestyles. One interesting question to consider is how the topic of climate change has been repurposed by marketing agencies as a means to promote their products. Some research suggests that such advertising can be part of the problem because it leaves a carbon footprint itself, promotes and increases overall consumption, or even spreads greenwashing (Hartmann et al., 2022). Conversely, sustainability marketing can be part of the solution by raising awareness about climate-friendly consumption and stigmatizing climate-harmful products. Other popular media links included blogs or platforms aggregating ecological products or sustainable patterns of production and consumption. Surprisingly, The Guardian, though targeted primarily at a liberal or left-wing public, has been quite frequently referred to by both camps. It has already been evidenced that this British mainstream newspaper finds its way among right-wing sympathizers as well (Górska et al., 2022), though the exact character and scope of how it is brought up by that group remain unclear.

4.3 Qualitative Content Analysis
We performed a preliminary qualitative analysis of the ten most popular tweets in each of the camps. Additionally, the word clouds (Figures 15.3 and 15.4) illustrate the most common words used in each camp. As visible in the word clouds, both camps' content focused on climate. Yet climate change deniers often referred to science or scientists, though very likely in a negative sense, hence also the popularity of the term hoax. Notably, the recurring political terms (Trump, Gore, tax, liberal, Obama) suggest how climate change denialists might perceive the climate change crisis: as a political method of coercion, enforcing an agenda hostile towards citizens and the whole economy. On the contrary, climate change believers referred predominantly to ecological concepts such as environment, plastic, or waste.
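For illustration, word clouds like those in Figures 15.3 and 15.4 can be generated with the third-party `wordcloud` package; the chapter does not name the tool actually used, so the choice of library here is our assumption.

```python
# A sketch of generating a word cloud from one camp's tweets using the
# third-party `wordcloud` package (the chapter does not name its tooling).
from wordcloud import WordCloud
import matplotlib.pyplot as plt

def plot_word_cloud(tweets, title):
    text = " ".join(tweets)  # tweets: list of tweet strings from one camp
    cloud = WordCloud(width=800, height=400,
                      background_color="white").generate(text)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()

# plot_word_cloud(denier_tweets, "Climate change denialists")  # hypothetical list
```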
Figure 15.3 Word cloud of most common words used by climate change denialists

Figure 15.4 Word cloud of most common words used by climate change believers

The most popular tweets from climate change believers are, as a matter of fact, very heterogeneous – including promotions of celebrities' partnerships with pro-ecological brands, comparisons of the eco-friendliness of mining certain cryptocurrencies, advocacy for sustainable space rockets, or announcements about planting trees. One tweet was even very political in its message, highlighting class inequality in the fight against climate change (see below).

Selected examples of the most popular tweets from climate change believers:

“We planted 5000 trees yesterday with #NYFoundation … Doing our bit for the planet. #SustainableLiving #GreenIndiaChallenge”

“One of the great cons of the past few decades was convincing the working class that they bore the burden of fixing climate change and sustainability with their choices, rather than forcing systems to make meaningful change and asking the rich to change their extravagant ways”

The most popular tweets from climate change deniers were quite homogeneous instead, focusing on denying the problem in the first place and expressing negative sentiments towards mainstream media and Democratic politicians.

Selected examples of the most popular tweets from climate change deniers:

“Good morning – after the laughable CNN climate hoax debate, it’s painstakingly obvious … Every Democrat presidential candidate is a raging communist who wants to be the next Joseph Stalin. Anyone of these want-to-be dictators will destroy the USA in 4 years. They’re nuts.”

“Leftist Senator threatens Americans with increased energy costs and massive job losses – for the climate hoax, which is the biggest fraud and shakedown of taxpayers in modern history.”
5. DISCUSSION

In our study, we investigated the emotional tone of tweets from climate change denialists and believers. As we observed, tweets from believers were consistently more positive than those from deniers. This is in accordance with previous studies suggesting that misinformation tweets exhibit more negative sentiment than non-misinformation tweets (Gerts et al., 2020). Also, Górska et al. (2022) provide evidence that tweet sentiment in some other predominantly conservative, right-wing groups is lower in comparison with liberal, mainstream hashtags. The very positive sentiment of climate change believers' tweets can also be attributed to the fact that eco-friendly discourse promoting pro-environmental, sustainable actions and attitudes, including green consumption, is generally more enthusiastic in order to motivate people to engage in such pro-ecological activities. Furthermore, eco-friendliness has already been identified as a very salient topic in Twitter discussions on climate change (Camarillo et al., 2021); familiarity and prevalence might therefore have additionally enhanced the positive sentiment of eco-friendly tweets. As we have also demonstrated, the hashtags sustainability and ecofriendly are often used for marketing and commercial purposes, which is a noteworthy observation in itself. However, this raises the question whether such hashtags are used scientifically – in the sense of somehow contributing to solving the climate crisis by advocating scientific and/or political means – or whether they are simply part of a problem of overconsumption and amount merely to greenwashing. Wiedmann et al. (2020) point out that for efforts to promote sustainability to be successful, individuals must make significant changes to their lifestyles in addition to technological advancements. Yet apparently, current social, economic, and cultural systems often encourage increased consumption and the need for economic growth, which can hamper the necessary changes at the societal level (Wiedmann et al., 2020). Therefore, the entanglement of sustainability and eco-friendliness with consumption or greenwashing, and its impact on pro-scientific discourse around climate change, is multifaceted and merits separate investigation.

We also looked at the media links invoked in the tweets. The two groups differed largely in terms of media links used, with deniers resorting predominantly to alt- or far-right or otherwise blatantly conspiratorial media links, while believers invoked media links associated with e-commerce, social media, or marketing. Similarly, both groups differed in the choice of most frequent words and in the nature of the most popular tweets, with climate change deniers tweeting substantially in derogatory terms about their disagreement with mainstream politics. Notably, tweets from climate change believers were more diversified, ranging in scope from various marketing and ecological campaigns to some forms of political activism. Apparently, both camps seem to be to some extent aware of the politicization of the climate change crisis; however, they approach the problem in diametrically opposed ways. The issue of political awareness among such echo chamber groups is therefore more complex than one might think and merits further research.
6. CONCLUSION

In our study, we were able to observe polarized groups of climate change believers and deniers on Twitter. Climate change believers encompass all those users whose tweets contained at least one of the following keywords: ecofriendly, sustainability, and sustainableliving. Climate change deniers tweeted using the keywords climatehoax, climatechangehoax, and globalcooling. There were fewer deniers and they tweeted less, possibly because such beliefs are not widespread in society and might also be subjected to some forms of online censorship, especially on social media. Notably, tweets from deniers demonstrated overall more negative sentiment than tweets from climate change believers. Both groups also differed in terms of the media links used in their tweets as well as in the scope and nature of the most popular tweets and topics.

Our research adds to the growing body of evidence on climate change-related discourse on social media, especially on Twitter, by supporting the notion of the existence of polarized groups of deniers and believers that cohabit separate digital echo chambers. However, the scope of our research was rather exploratory, and a more detailed characterization of such polarized groups, including interactions within them as well as between them, if there are any, would shed new light on climate change-related echo chambers on Twitter. Furthermore, one additional strength of our research is that it analyzed tweets from all around the world, as long as they were published in English. However, this global scope could also be seen as a limitation, because we were not able to identify specific local weather events that might have influenced the conversation between the two camps. Previous research on climate change on Twitter has demonstrated that single events can significantly impact ensuing online discussions, sometimes restricted to certain geographies and languages (Leas et al., 2016; Moernaut et al., 2022; Tyagi et al., 2020; Willson et al., 2021). Lastly, a different selection of hashtags for climate change believers, for example #climatechangeisreal, might have resulted in a dataset with different characteristics. It is worth repeating that discourse on Twitter among climate change believers is to some extent diversified and pluralistic, encompassing a variety of topics. Apart from eco-friendly products and sustainability, the list of topics includes, for example, raising climate awareness, discussing ongoing politics, or various calls for pro-environmental initiatives and actions (Camarillo et al., 2021). Any future research on climate change polarization, on social media in general, and on Twitter in particular should take into account first and foremost the existence of pluralistic echo chambers that might be merely loosely united under certain hashtags. In order to fully grasp the complexity and inner dynamics of such polarized groups, a combination of a wide repertoire of both quantitative and qualitative techniques, including, for example, stance detection, social network analysis, sentiment analysis, topic modeling, or discourse analysis, would be essential.
REFERENCES

Bednarek, M., Ross, A. S., Boichak, O., Doran, Y. J., Carr, G., Altmann, E. G., & Alexander, T. J. (2022). Winning the discursive struggle? The impact of a significant environmental crisis event on dominant climate discourses on Twitter. Discourse, Context & Media, 45, 100564. https://doi.org/10.1016/j.dcm.2021.100564.
Bricker, B., & Justice, J. (2019). The postmodern medical paradigm: a case study of anti-MMR vaccine arguments. Western Journal of Speech Communication: WJSC, 83(2), 172–89. https://doi.org/10.1080/10570314.2018.1510136.
Camarillo, G., Ferguson, E., Ljevar, V., & Spence, A. (2021). Big changes start with small talk: Twitter and climate change in times of Coronavirus pandemic. Frontiers in Psychology, 12. https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661395 (last accessed 24 January 2024).
Carmichael, J. T., Brulle, R. J., & Huxster, J. K. (2017). The great divide: understanding the role of media and other drivers of the partisan divide in public concern over climate change in the USA, 2001–2014. Climatic Change, 141(4), 599–612. https://doi.org/10.1007/s10584-017-1908-1.
Effrosynidis, D., Karasakalidis, A. I., Sylaios, G., & Arampatzis, A. (2022). The climate change Twitter dataset. Expert Systems with Applications, 204, 117541. https://doi.org/10.1016/j.eswa.2022.117541.
Einav, G., Allen, O., Gur, T., Maaravi, Y., & Ravner, D. (2022). Bursting filter bubbles in a digital age: opening minds and reducing opinion polarization through digital platforms. Technology in Society, 71, 102136. https://doi.org/10.1016/j.techsoc.2022.102136.
Eslen-Ziya, H. (2022). Humour and sarcasm: expressions of global warming on Twitter. Humanities and Social Sciences Communications, 9(1), 1–8. https://doi.org/10.1057/s41599-022-01236-y.
Ganczewski, G., & Jemielniak, D. (2022). Twitter is garbage: a Thick Big Data exploration of #zerowaste hashtag on Twitter in relation to packaging and food packaging materials. Packaging Technology & Science, 35(12), 893–902. https://doi.org/10.1002/pts.2685.
Gerts, D., Shelley, C. D., Parikh, N., Pitts, T., Ross, C. W., Fairchild, G., Chavez, N. Y. V., & Daughton, A. R. (2020). “Thought I’d share first” and other conspiracy theory tweets from the Covid-19 infodemic: exploratory study. http://arxiv.org/abs/2012.07729 (last accessed 30 December 2022).
Gloor, P. A., Fronzetti Colladon, A., de Oliveira, J. M., & Rovelli, P. (2020). Put your money where your mouth is: using deep learning to identify consumer tribes from word usage. International Journal of Information Management, 51, 102031. https://doi.org/10.1016/j.ijinfomgt.2019.03.011.
Gloor, P. A., Fronzetti Colladon, A., de Oliveira, J. M., Rovelli, P., Galbier, M., & Vogel, M. (2019). Identifying tribes on Twitter through shared context. In Y. Song, F. Grippa, P. A. Gloor, & J. Leitão (eds), Collaborative Innovation Networks: Latest Insights from Social Innovation, Education, and Emerging Technologies Research (pp. 91–111). Springer International. https://doi.org/10.1007/978-3-030-17238-1_5.
Górska, A. M., Kulicka, K., & Jemielniak, D. (2022). Men not going their own way: a thick big data analysis of #MGTOW and #Feminism Tweets. Feminist Media Studies, 23(8), 3774–92. https://doi.org/10.1080/14680777.2022.2137829.
Hartmann, P., Marcos, A., Castro, J., & Apaolaza, V. (2022). Perspectives: advertising and climate change – part of the problem or part of the solution? International Journal of Advertising, 42(2), 430–57. https://doi.org/10.1080/02650487.2022.2140963.
Hwang, S. (2013). The effect of Twitter use on politicians’ credibility and attitudes toward politicians. Journal of Public Relations Research, 25(3), 246–58. https://doi.org/10.1080/1062726X.2013.788445.
Jemielniak, D. (2016). Breaking the glass ceiling on Wikipedia. Feminist Review, 113(1), 103–8. https://doi.org/10.1057/fr.2016.9.
Jemielniak, D. (2020). Thick Big Data: Doing Digital Social Sciences. Oxford University Press. https://doi.org/10.1093/oso/9780198839705.001.0001.
Jemielniak, D., & Krempovych, Y. (2021). An analysis of AstraZeneca COVID-19 vaccine misinformation and fear mongering on Twitter. Public Health, 200, 4–6. https://doi.org/10.1016/j.puhe.2021.08.019.
Jungherr, A. (2016). Twitter use in election campaigns: a systematic literature review. Journal of Information Technology & Politics, 13(1), 72–91. https://doi.org/10.1080/19331681.2015.1132401.
Krange, O., Kaltenborn, B. P., & Hultman, M. (2021). “Don’t confuse me with facts”—how right wing populism affects trust in agencies advocating anthropogenic climate change as a reality. Humanities and Social Sciences Communications, 8(1), 1–9. https://doi.org/10.1057/s41599-021-00930-7.
Leas, E. C., Althouse, B. M., Dredze, M., Obradovich, N., Fowler, J. H., Noar, S. M., Allem, J.-P., & Ayers, J. W. (2016). Big Data sensors of organic advocacy: the case of Leonardo DiCaprio and climate change. PloS One, 11(8), e0159885. https://doi.org/10.1371/journal.pone.0159885.
Marichal, J., & Neve, R. (2019). Antagonistic bias: developing a typology of agonistic talk on Twitter using gun control networks. Online Information Review, 44(2), 343–63. https://doi.org/10.1108/OIR-11-2018-0338.
Mazid, I. (2022). Social presence for strategic health messages: an examination of state governments’ use of Twitter to tackle the Covid-19 pandemic. Public Relations Review, 48(4), 102223. https://doi.org/10.1016/j.pubrev.2022.102223.
Meyer, S. B., Violette, R., Aggarwal, R., Simeoni, M., MacDougall, H., & Waite, N. (2019). Vaccine hesitancy and Web 2.0: exploring how attitudes and beliefs about influenza vaccination are exchanged in online threaded user comments. Vaccine, 37(13), 1769–74. https://doi.org/10.1016/j.vaccine.2019.02.028.
Moernaut, R., Mast, J., Temmerman, M., & Broersma, M. (2022). Hot weather, hot topic: polarization and sceptical framing in the climate debate on Twitter. Information, Communication and Society, 25(8), 1047–66. https://doi.org/10.1080/1369118X.2020.1834600.
Neff, T., & Jemielniak, D. (2022). How do transnational public spheres emerge? Comparing news and social media networks during the Madrid climate talks. New Media & Society, 14614448221081426. https://doi.org/10.1177/14614448221081426.
Rejeb, A., Rejeb, K., Abdollahi, A., & Treiblmaier, H. (2022). The big picture on Instagram research: insights from a bibliometric analysis. Telematics and Informatics, 73, 101876. https://doi.org/10.1016/j.tele.2022.101876.
Röchert, D., Neubaum, G., Ross, B., & Stieglitz, S. (2022). Caught in a networked collusion? Homogeneity in conspiracy-related discussion networks on YouTube. Information Systems, 103, 101866. https://doi.org/10.1016/j.is.2021.101866.
Sharma, M. (2020). Polarity detection in a cross-lingual sentiment analysis using spaCy. 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 490–96. https://doi.org/10.1109/ICRITO48877.2020.9197829.
Siapera, E. (2019). Online misogyny as witch hunt: primitive accumulation in the age of techno-capitalism. In D. Ging & E. Siapera (eds), Gender Hate Online: Understanding the New Anti-Feminism (pp. 21–43). Springer International. https://doi.org/10.1007/978-3-319-96226-9_2.
Tanupabrungsun, S., & Hemsley, J. (2018). Studying celebrity practices on Twitter using a framework for measuring media richness. Social Media + Society, 4(1), 2056305118763365. https://doi.org/10.1177/2056305118763365.
Tyagi, A., Babcock, M., Carley, K. M., & Sicker, D. C. (2020). Polarizing tweets on climate change. In R. Thomson, H. Bisgin, C. Dancy, A. Hyder, & M. Hussain (eds), Social, Cultural, and Behavioral Modeling (pp. 107–17). Springer International. https://doi.org/10.1007/978-3-030-61255-9_11.
Tyagi, A., & Carley, K. M. (2021). Climate change conspiracy theories on social media. https://doi.org/10.48550/arXiv.2107.03318.
Wei, Y., Gong, P., Zhang, J., & Wang, L. (2021). Exploring public opinions on climate change policy in “Big Data Era”—a case study of the European Union Emission Trading System (EU-ETS) based on Twitter. Energy Policy, 158, 112559. https://doi.org/10.1016/j.enpol.2021.112559.
Wiedmann, T., Lenzen, M., Keyßer, L. T., & Steinberger, J. K. (2020). Scientists’ warning on affluence. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-16941-y.
Williams, H. T. P., McMurray, J. R., Kurz, T., & Hugo Lambert, F. (2015). Network analysis reveals open forums and echo chambers in social media discussions of climate change. Global Environmental Change: Human and Policy Dimensions, 32, 126–38. https://doi.org/10.1016/j.gloenvcha.2015.03.006.
Willson, G., Wilk, V., Sibson, R., & Morgan, A. (2021). Twitter content analysis of the Australian bushfires disaster 2019–2020: futures implications. Journal of Tourism Futures, 7(3), 350–55. https://doi.org/10.1108/JTF-10-2020-0183.
Yang, K.-C., Torres-Lugo, C., & Menczer, F. (2020). Prevalence of low-credibility information on Twitter during the COVID-19 outbreak. http://arxiv.org/abs/2004.14484.
Żuk, P., & Szulecki, K. (2020). Unpacking the right-populist threat to climate action: Poland’s pro-governmental media on energy transition and climate change. Energy Research & Social Science, 66, 101485. https://doi.org/10.1016/j.erss.2020.101485.
PART VI HUMAN INTERACTION WITH OTHER SPECIES
16. Plants as biosensors: tomato plants’ reaction to human voices

Patrick Fuchs, Rebecca von der Grün, Camila Ines Maslatón and Peter A. Gloor
1. INTRODUCTION

Plants are everywhere around us: at home, in the office, at university, in coffee shops and parks. They are easy to maintain and look after, and they contribute to our well-being. They are also very sensitive to perturbations such as changes in light and temperature, gravity, soil and air pollution, and insect attacks. They perceive these variations through an excitable membrane from which impulses are sent to regulate growth and function (Volkov & Ranatunga, 2006). This survival mechanism, possessed by all plants, not only helps them live longer but also says a lot about the environment they live in. But can plants also perceive people in the room?

In this chapter, our goal is to find out how tomato plants perceive people’s voices. People communicate through voice and other nonverbal sounds that fall into different pitch categories. Pitch depends on sex (men’s voices are usually lower pitched than women’s voices) and on psychological, anatomical, and social factors (Pisanski et al., 2020). Verbal communication such as talking, singing, and shouting takes place every day of our lives, and these are the sounds we focus on. Our plan is to monitor signals emitted by plants when exposed to human voice, that is, to use plants as sensors. Since plants are natural organisms, we call these “biosensors” (Manzella et al., 2013). For this purpose, we use a tomato plant, as it is easy to grow and maintain, and a Plant SpikerBox from Backyard Brains,1 a device that measures and displays the electrical activity in a plant’s cells.

This chapter is structured as follows. First, our motivation for this work is presented. Then, related work about biosensors, plant reactions, and human voice is reviewed. Our research questions are introduced in Section 3. Our method and how we planned to measure the plant’s reactions are presented in Section 4. Lastly, findings are shown and discussed, and challenges concerning model accuracy and possible future work are examined.
2. BACKGROUND

2.1 Plants as Biosensors
The definition of biosensors is broad, but generally any “device that intimately associates a biological sensing element and a transducer” can be classified as a biosensor (Coulet & Blum, 2019: 1). Most people know them, for example, as measuring devices for the blood sugar level of diabetes patients. But biosensors have a wide field of application, which in recent years has also come to include the Internet of Things. However, for the extensive use of such sensing devices, a new, cheap, and easy solution is needed, which a growing community sees in the use of plants as biosensors (Manzella et al., 2013; Oezkaya & Gloor, 2020). The idea of using plants as such sensors was already introduced in the 1980s, when Kuriyama and Rechnitz built a sensor to detect glutamate with the tissue of a yellow squash (Kuriyama & Rechnitz, 1981). Since then, plants have been used, for example, in the detection of acid rain, the presence of insects, and much more (Volkov & Ranatunga, 2006).

It is known that plants react to a large spectrum of external stimuli, which makes them suitable as sensing devices. This is because plants need to respond to events which affect them in order to survive or live in the best possible way. Many such events announce themselves through sounds. The sound of water, for example, has been shown to make plants grow their roots towards its source. Conversely, plants also react to negative sounds, for example those of harmful insects, which have been shown to stimulate some plants to send out signals warning other plants (Khait et al., 2019). The plant’s reactions to different stimuli can be measured via action potentials, which are visible in the electric signal produced by the plant. Action potentials arise from different causes such as chemical treatment, wounding, or temperature changes. They are propagated through the whole plant and are used to communicate changes in the environment inter- and intracellularly. Therefore, connected to a device recording these action potentials, plants can be used as biosensors to monitor their surroundings (Volkov & Brown, 2004).

However, to adopt plants as biosensors for applications within the Internet of Things (e.g., to replace smartwatches), they would have to respond to humans as well. This was indeed shown in an experiment by Oezkaya and Gloor, where a Mimosa pudica was able to distinguish between different people and different emotions through their gait. The plant responded to the electrostatic discharge produced by people walking by; its response was measured as an electrical signal which was later transformed to be suitable for a machine learning algorithm. This algorithm was able to classify the happy or sad mood of the participants walking by with an accuracy of 85 percent (Oezkaya & Gloor, 2020).

2.2 Plants’ Reactions to Sound
Whether plants react in a similar way to human voices has not yet been well investigated. Besides the myth that talking to your plants helps them grow faster, some less scientific experiments seem to support this assumption. In a month-long study by the Royal Horticultural Society, it was shown that plants exposed to female voices grew faster. Recordings read by different people were played to the plants and, on average, the ones that listened to recordings of female readers were an inch taller after a month (Alleyne, 2009). Another experiment, conducted by a gardening community, wanted to see whether the way of speaking to plants affects them. To do so, they chose two speeches with quite different tones: the “I have a dream” speech by Martin Luther King, and a speech by Hitler from the 1930s. Thirty bean seeds were exposed to either the positive speech about unity, peace, justice, and equality, or the Hitler speech about anger and conflict, or served as a control group and were not exposed to any recording. At the end of the experiment, the group that was exposed to Hitler’s speech had the smallest plants (Stoica, 2020). Nevertheless, the scientific credibility and reproducibility of these experiments remain uncertain.

The influence of the human voice on plants can otherwise only be inferred from studies investigating the effects of sound treatments on plants. It is a proven fact that plants can perceive “various ecologically relevant frequencies” (Mishra et al., 2016). Clearly, plants respond to natural sounds like bees, but artificial sounds of certain frequencies have also been shown to have an impact on plants. An experiment by Collins and Foreman used beans and impatiens, which were treated with sounds of different frequencies. The plants selected were as genetically close as feasible and were kept in the same environmental conditions. Random noise and sounds of 500 Hz, 5000 Hz, 6000 Hz, 12000 Hz, and 14000 Hz were each played to an impatiens plant and a bean plant, and the height of the plants was measured for 28 days. For the beans, the 6000 Hz sound produced the tallest plants, while the random noise, control group, and 500 Hz treatment produced the smallest. The impatiens, however, grew best when exposed to a sound of 14000 Hz (Collins & Foreman, 2001). Another experiment, conducted by Choi et al., found that sound treatment can lead to better plant health. They exposed Arabidopsis thaliana to sounds of the same amplitude (100 dB) at frequencies of 500 Hz, 1000 Hz, and 3000 Hz. The plants exposed to the 1000 Hz treatment were significantly more resistant to the gray mold disease caused by Botrytis cinerea (Choi et al., 2017). In yet another experiment, it was shown that sound treatment can delay the ripening of tomatoes. This was achieved by treating tomato fruits with different frequencies in sound-proof boxes kept at the same temperature. The experiment was conducted for 17 days with frequencies of 250 Hz, 500 Hz, 1000 Hz, and 1500 Hz, and a control group without treatment. The ripening of the fruits was delayed most at a frequency of 1000 Hz, while at 1500 Hz the fruits had gotten even riper than in the control group (Kim et al., 2015). Sound treatment thus affects plants differently, either positively or negatively, depending on the frequency; frequencies with a positive impact on some types of plants may have a negative impact on others (Mishra et al., 2016).

2.3 Human Voice
The human voice covers a broad frequency range, roughly between 100 Hz and 3000 Hz. The frequency range of each person is individual, as it depends on several factors such as sex and age, which determine the length and shape of the vocal tract and thus enable the speaker to produce different frequencies. Additionally, other factors like a speaker’s biosocial profile and identity can influence their voice frequency, and it can even be trained, as singers do. Changes in voice frequency can also be observed when people change their social context or express emotions (Pisanski et al., 2020; Sataloff, 1992). Therefore, it might be possible to determine a speaker or their emotional state through their voice frequency. A study by Pisanski et al. (2020) demonstrates the differences between the fundamental frequencies of male and female speakers for different emotions. The fundamental frequency (F0) is the lowest frequency within the spectrum, which is perceived as the pitch. Generally, the F0 of male speakers is lower than that of female speakers. For neutral utterances, the mean F0 measured in the study was 116 Hz for male speakers and 204 Hz for female speakers. The speakers reached a higher F0 in aggressive speech (437 Hz for female speakers, 312 Hz for male speakers), even higher values when expressing pain, and the highest F0 in fear screams (900 Hz for female speakers, 467 Hz for male speakers), where individuals nearly reached an F0 of 2000 Hz (Pisanski et al., 2020). Regarding these F0 values, it is clear that the frequency of the human voice lies in roughly the same range as the frequencies plants can respond to.
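For readers who want to estimate F0 themselves, a minimal sketch using librosa's YIN implementation is given below; the parameter choices and file name are our assumptions, not part of the study.

```python
# Sketch of estimating a speaker's fundamental frequency (F0) from a voice
# recording with librosa's YIN algorithm; parameters and file name are ours.
import numpy as np
import librosa

def mean_f0(wav_path, fmin=80.0, fmax=1000.0):
    y, sr = librosa.load(wav_path, sr=None)           # keep native sample rate
    f0 = librosa.yin(y, fmin=fmin, fmax=fmax, sr=sr)  # frame-wise F0 estimates
    return float(np.nanmean(f0))

# print(mean_f0("speaker.wav"))  # hypothetical file; ~116 Hz for neutral male speech
```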
3. RESEARCH QUESTIONS
To work towards the long-term goal of finding out whether plants can sense different people and their emotions, we decided to first concentrate on human voices of different frequencies. Individual voices differ strongly in their frequencies, and there are no common frequencies for gender, age, or other groups of people. Therefore, we concentrated on the question: How do tomato plants react to human voices of different frequencies? With this question we want to find out whether exposing the plant to higher- and lower-pitched voices can influence the way the plant reacts. This is an important factor in deciding whether tomato plants are suitable as biosensors. Another question we investigated during the project was to what extent our tomato plant can recognize emotions conveyed through language and singing, and how it reacts to them. For this, among other things, tests were made with yodeling sounds and songs from the One Voice Children’s Choir.2 The last question we asked ourselves was whether tomato plants can recognize certain people by their voices, which we tried to demonstrate with the help of a machine learning model. In the big picture, we wanted to show that plants respond to audio stimulation through human voices, are able to act as biosensors, and can identify moods and people.
4. METHOD

Our time management is depicted in Figure 16.1: the bigger the bubble, the more time we invested in the task. In addition to the four tasks shown, another even more important one was the documentation, which is shared via a media wiki. When we started working on this project, we decided to facilitate our work with a knowledge platform. Doing research about plant care and SpikerBox installation and managing resources were time consuming, and the knowledge we needed for the project was dispersed across the web. We wanted to create a way to share our results, knowledge, data, and analysis (for more information on the platform, please see the Appendix). In this section, the methods for data collection and data analysis are described.

Figure 16.1 Work processes
4.1 Laboratory, Measuring, and Data Collection
Our test trials took place in an office at the data center of the University of Bamberg. The office we used has a large window facing north, protecting the plant from direct sunlight but still providing sufficient daylight. Only we entered the office during the entire duration of the project, so the plant was undisturbed most of the time. The room has few sources of electronic interference, which we further reduced to a minimum during the tests. It is also equipped with a fire door and well soundproofed windows, which absorb noise. To reduce stress, the plant was seldom moved from its place and was additionally protected from floor vibrations with foam (Figure 16.3a). In addition, we noted environmental parameters such as the temperature, weather conditions, humidity, and the time and duration of each measurement. Figure 16.2 shows the set-up of the office with the plant.
Figure 16.2 Controlled environment where experiments were carried out
For the collection of the data, we used the Plant SpikerBox from Backyard Brains, which was provided by the University of Bamberg. The SpikerBox measures the electrical signals produced by the plant via an electrode attached to a twig and outputs the signal as a .wav file. Figures 16.3a–16.3c show the materials and the SpikerBox, how the cables are connected from the box to our tomato plant, and how we set up a test and make all the adjustments. It is also important to mention that during the tests all electronic devices were turned off and only the laptop was running, on its internal battery, to reduce the impact of interference to a minimum. During the tests we stayed out of the room.

In our first test experiment, on May 5, 2021, we tested the functionality of the box and how our plant reacts to certain influences. Among other things, we measured the influence of light, watering, and touching with the hand and other materials. These experiments clearly showed that the plant reacts to stimulation and that we can measure this reaction.

On June 4, 2021, our second test attempt took place. This time we wanted to investigate how our tomato plant reacts to individual frequencies as well as to different volumes. In addition, we conducted a first test to find out whether the plant detects a difference between male and female voices. The basis for the comparison was a recording of the plant without any stimulation by audio signals, in complete silence (no stimulus). We repeated the no-stimulus test for each subsequent recording. For the first tests we used a 200 Hz sine wave which we created with Python (see the sketch after Table 16.1): once with an amplitude of 100 and once with an amplitude of 200. For the comparison between sexes, we used the RAVDESS database3 with a female as well as a male voice. Table 16.1 shows the recordings and the documented environmental parameters.

Note: The foam pad to prevent/reduce vibration transmission can be seen clearly under the plant pot and SpikerBox.

Figure 16.3a Plant SpikerBox
Table 16.1 Second test: 200 Hz frequency and male and female voice

No.  Time   Length   Temperature  Humidity %  Weather  Stimuli
01   11:25  45 mins  28.2         65          Cloudy   None
02   12:55  49 mins  26.3         58          Sunny    200 Hz, Amp 100
03   13:55  33 mins  27.8         60          Sunny    200 Hz, Amp 200
04   15:03  31 mins  27.9         59          Sunny    RAVDESS 7 male
05   15:40  –        28.8         59          Sunny    RAVDESS 8 female
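The 200 Hz stimulus mentioned above can be generated along the following lines. This is our reconstruction, not the original script; the duration, sample rate, file names, and the interpretation of “amplitude” as raw 16-bit sample units are assumptions.

```python
# Reconstruction of the 200 Hz sine-wave stimulus generation; duration,
# sample rate, file names, and the meaning of "amplitude" (raw 16-bit
# sample units) are our assumptions, not taken from the chapter.
import numpy as np
from scipy.io import wavfile

def write_sine_wav(path, freq_hz=200.0, amplitude=100, duration_s=60, sr=44100):
    t = np.arange(int(duration_s * sr)) / sr            # time axis in seconds
    wave = amplitude * np.sin(2 * np.pi * freq_hz * t)  # pure sine tone
    wavfile.write(path, sr, wave.astype(np.int16))      # 16-bit PCM .wav

write_sine_wav("sine_200hz_amp100.wav", amplitude=100)
write_sine_wav("sine_200hz_amp200.wav", amplitude=200)
```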
Figure 16.3b Connection of electrode wire to the plant

Figure 16.3c Data measurement

On June 19 and 21, 2021, our third test took place, in which we mainly tested how the plant reacts to different songs. We tried two different songs, one performed by the One Voice Children’s Choir and the other a yodeling song. Table 16.2 shows the recorded parameters.
Table 16.2 Third test: yodeling and One Voice Children’s Choir

No.  Time   Length   Temperature  Humidity %  Weather  Note            Stimuli
06   13:39  –        29.5         65          Sunny    Cable fell off  None
07   14:54  –        29.4         65          Sunny    Electricity on  None
08   15:25  –        29.4         65          Sunny                    None
09   15:39  40 mins  29.3         65          Sunny                    One Voice Children’s Choir
10   15:30  –        27.9         52          Sunny    Record 21/06    None
11   16:28  10 mins  28.3         52          Sunny    Record 21/06    Yodeling 2 mins, 5 repetitions
Due to a few mistakes, we had to repeat the recording without stimulation; then we played the One Voice Children’s Choir to the plant for about 40 minutes. To avoid inaccurate results due to an overload of different stimuli, we let the plant rest and completed the second part of the test experiment on June 21. On that day, we played a yodel recording to the plant in a continuous loop for a total of 10 minutes, in five runs of two minutes each.

On July 2, 2021, our last test session took place. Here, we wanted to investigate in more detail how the tomato plant reacts to specific voices. The basis for this was an excerpt from the audiobook Harry Potter und der Stein der Weisen read by Rufus Beck.4 The audiobook Die Tribute von Panem: Tödliche Spiele read by Maria Koschny,5 referred to as Hunger Games in the following text, served as the female counterpart. We also repeated the recording of the One Voice Children’s Choir, since the test results had been quite striking at the first attempt.
Table 16.3 Fourth test: audiobooks and singing

No.  Time   Length   Temperature  Humidity %  Weather  Stimuli
12   13:37  30 mins  24.7         68          Cloudy   None
13   14:31  30 mins  25.5         65          Cloudy   Harry Potter
14   15:25  30 mins  26.2         62          Sunny    Hunger Games
15   16:17  30 mins  26.9         59          Sunny    One Voice Children’s Choir
The files obtained from this last test trial also form the basis for our machine learning algorithms.

4.2 Data Analysis and Machine Learning
To better understand how stimulation via audio files affects the plant, we visualized the signals with Python and also quantified them by counting and comparing spikes. Figures 16.4a–16.4d show that the plant produced very strong spikes, especially with the Children’s Choir and the Harry Potter audiobook. It was also clearly visible in previous tests that the plant reacts to audio stimulation. Figure 16.4a also shows that smaller deflections occur again and again despite the absence of stimulation. Despite the described isolation and avoidance of interference sources, this illustrates how readily plants react to their environment.

As stated earlier, we used the recordings of the last session for our classification algorithms. Hence, the data we used consisted of four different stimuli to classify: silence, the Harry Potter audiobook, the Hunger Games audiobook, and the Children’s Choir, each given in a 30-minute .wav file. For the classification with a machine learning algorithm, we followed the approach described in an experiment conducted with Mimosa pudica to classify people walking past the plant (Oezkaya & Gloor, 2020).
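The spike counting mentioned above can be sketched as follows; the peak-detection threshold and file names are our assumptions, not the authors' parameters.

```python
# Sketch of counting spikes in a recorded plant signal; the prominence
# threshold and file names are assumptions, not the authors' settings.
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

def count_spikes(wav_path, prominence_factor=5.0):
    sr, signal = wavfile.read(wav_path)
    signal = signal.astype(float) - signal.mean()   # remove DC offset
    threshold = prominence_factor * signal.std()    # scale to baseline noise
    peaks, _ = find_peaks(np.abs(signal), prominence=threshold)
    return len(peaks)

for stimulus in ["silence", "harry_potter", "hunger_games", "choir"]:
    print(stimulus, count_spikes(f"{stimulus}.wav"))  # hypothetical file names
```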
Figure 16.4a Plot of the plant’s electrical signal produced during exposure to: no stimulus
Figure 16.4b Plot of the plant’s electrical signal produced during exposure to: Hunger Games
Figure 16.4c Plot of the plant’s electrical signal produced during exposure to: Harry Potter
Figure 16.4d Plot of the plant’s electrical signal produced during exposure to: One Voice Children’s Choir
We omitted normalization of the data since we only used recordings from our last session. These recordings were all done in a single shot (with rest times in between) under the same conditions and were only split up into individual files afterwards. We trained two different machine learning algorithms to classify our stimuli: a Random Forest (RF) and a Convolutional Neural Network (CNN). For our RF, we split the recorded data into 20-second recordings, which yields 90 short recordings for each stimulus. The data was shuffled, and a train–test split was performed with the training dataset containing 80 percent of the data and the test dataset 20 percent. Afterwards, we calculated 20 Mel Frequency Cepstral Coefficients (MFCCs) with a sampling rate of 1000 (given by the recording of the SpikerBox), a window size of 2500, a hop length of 1250, and power 2. For our CNN, we split the recordings into 5-second files to obtain enough data; shorter snippets of 3 and 1 seconds, which we also tried for training the model, resulted in worse accuracy. With our 5-second snippets we ended up with 1,440 short recordings, 360 for each stimulus, which we shuffled again. Then they were split into training and test datasets as before for the Random Forest, so that we had 1,152 recordings for training and 288 for testing. The same MFCC features as stated previously were extracted. We stuck to the parameters given in the Oezkaya and Gloor (2020) experiment, since changing them produced worse results. Our CNN model consisted of convolutional layers and a max-pooling layer for downsizing our data, followed by a flattening layer and several dense and dropout layers. We used the categorical cross-entropy loss and the Adam optimizer, evaluating the accuracy. Our model was then trained for 100 epochs but included early stopping at the maximum validation accuracy. Several changes in the layers, the optimizer, and the number of epochs which we tested did not produce results as good as this model.
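The pipeline described above can be sketched as follows. The MFCC parameters match those stated in the text; the exact layer sizes, filter counts, and Random Forest settings are our assumptions, since the chapter does not report them.

```python
# Sketch of the described pipeline; MFCC parameters follow the text, while
# layer sizes, filter counts, and RF settings are assumptions.
import numpy as np
import librosa
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SR = 1000  # sampling rate given by the SpikerBox recordings

def extract_mfcc(signal):
    # 20 MFCCs, window size 2500, hop length 1250, power 2 (as in the text)
    return librosa.feature.mfcc(y=signal, sr=SR, n_mfcc=20,
                                n_fft=2500, hop_length=1250, power=2.0)

def train_random_forest(snippets, labels):
    # snippets: list of 1-D arrays (20-second recordings); labels: ints 0..3
    features = np.array([extract_mfcc(s).flatten() for s in snippets])
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, shuffle=True, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)

def build_cnn(input_shape, n_classes=4):
    # Convolutional layers, max pooling, flattening, dense and dropout layers,
    # as described above; labels are expected one-hot encoded.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Early stopping keeps the weights from the epoch with the best
# validation accuracy during the (up to) 100 training epochs.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", patience=10, restore_best_weights=True)
```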
5. RESULTS

In this section we present the results and performance of the machine learning models that were trained with the gathered data. To increase the available information, we extracted MFCC features from the electrical signals produced by the tomato plant during the recordings and used them for training the models.

5.1 Random Forest (RF)
We divided each collected 30-minute signal into 20-second-long files and fed them to an RF classifier that achieved up to 86 percent accuracy when comparing two classes. This means that, from the tomato plant’s signal, the model can differentiate, for example, between the Harry Potter audiobook and the One Voice Children’s Choir 86 out of 100 times. For other pairs of classes, the model’s accuracy varies from 72 percent to 86 percent. For all four classes together (silence, Harry Potter, Hunger Games, and One Voice Children’s Choir), our RF model achieves roughly 25 percent accuracy, which is at chance level.

5.2 Convolutional Neural Network (CNN)
For the prediction of the played stimuli with our CNN, we were able to achieve a maximum validation accuracy of 60 percent.
Figure 16.5 Confusion matrix of our trained CNN showing the absolute values for our test dataset of size 288
Within the 288 test recordings, 64.2 percent of the recordings with the Harry Potter stimulus were classified correctly, one of the best classification results. The recordings without stimulus achieved 57.4 percent validation accuracy, the Hunger Games stimulus 65.3 percent, and lastly the One Voice Children’s Choir was classified correctly in 51.2 percent of the test cases. The results are shown in the confusion matrix in Figure 16.5. A validation accuracy of only 60 percent might not be very high, but it is an indicator that it might be possible to classify played sound stimuli by means of the signal of a tomato plant. The CNN that we trained can probably be optimized further to improve accuracy; data augmentation or gathering more data might improve the results as well.

5.3 General Results
The results achieved by our machine learning models show that plants indeed react differently to different human voices. The spike signals in the .wav files already indicated this, since the height and number of spikes varied across the different stimuli the plant was exposed to. Most apparent were the high spikes in the recording made when exposing the plant to the songs of the Children’s Choir (see Figure 16.4d). Although our overall accuracy for the classification of all four stimuli together was not extraordinarily high, our results show that it is possible to distinguish between audio stimuli from the plant’s electrical signal. The low accuracy might be the result of imperfectly optimized models, the conditions of data collection, and a lack of data. However, we were able to achieve good accuracy in differentiating between two stimuli, and not only between a voice and silence. With our CNN we obtained more than half correct classifications.
All this leads us to the assumption that, with more research and optimization of data gathering and classification, different human voices can be distinguished with the help of a plant’s electrical signal. Our results serve as a first impression and clearly show that it could be possible to use tomato plants as biosensors in the future.
6. DISCUSSION

Since the question of how plants (here, tomato plants) react to human voices of different frequencies has not been researched in this way before, we can only support our findings with other experimental results. The fact that tomato plants have been treated with sounds of frequencies between 100 Hz and 1000 Hz (Hassanien et al., 2014; Jung et al., 2018) supports our statement that tomato plants respond to human voices, which also lie within this range. Nevertheless, we expected the recording of the Children’s Choir to be the easiest and most accurate to determine from the plant’s signal. As the frequency range of the choir’s songs was much broader and included higher frequencies more intensely, we would have thought that the plant’s produced signal would differ most from the others. While the amplitude of the plant’s electric signal (Figure 16.4d), with a clearly broader range than in the other signals, supported this assumption, our CNN could not classify the One Voice Children’s Choir as reliably as expected. The distinction between the choir and the signal without stimulus seemed to be especially problematic. This was also the case for the RF, where the accuracy was lower for the distinction between the choir and silence. It would be interesting to find out why this is the case: whether it happened due to a distortion in our data, which seems probable, or whether the frequency range of the choir was somehow problematic for our tomato plant. In general, we support the assumption that plants, in our case tomato plants, can be used as biosensors, as stated in prior works (Manzella et al., 2013; Oezkaya & Gloor, 2020). We further conclude that they can be used to detect different human voices.
7. CONCLUSION

The aim of this chapter was to assess how a tomato plant reacts to human voices of different frequencies. Three different audio stimuli with voices of different frequency ranges were played to the plant, and the signals the plant produced during this stimulation were measured. Signals were also recorded during a period in which the plant was exposed to silence. The three stimuli besides silence were an audiobook read by a male reader, one read by a female reader, and a children’s choir. In predicting the played stimulus from the electrical signals, the machine learning model reached a moderate accuracy of around 60 percent. So far, the algorithm is only able to distinguish between the four stimuli used. To achieve higher accuracy and an algorithm that is also able to recognize different voices, their mood, and so on, additional tests are needed, and therefore more data to train a machine learning algorithm. There are also plenty of possibilities to improve the algorithm or the model, such as using different machine learning models or varying the length of the split recordings.

It is also possible to improve the quality of the data by better understanding and better isolating environmental factors, for instance by reducing soil vibration, using hydroculture to provide constant nutrition to the plants, or ensuring a stable climate with an air conditioning or temperature control system. Since the main goal is to understand the response of the plant to different audio sounds, it is especially important to protect the plant from disturbing sounds such as the noise of the street, crowds, and so on, during or even before the test, to obtain the best possible results. Another important point to mention is that all test trials were carried out on a single tomato plant, and it cannot be said with certainty whether these results are transferable to other plants or whether they are only representative of tomato plants. For this purpose, it would be possible to test the machine learning model with other plants. Overall, we were able to create a relatively useful test environment; an expansion of the environment would have been beneficial for the quality of the data.
ACKNOWLEDGMENT

We would like to thank Dr Bökels for making it possible to use a room in the data center of the University of Bamberg for our experiments and for accommodating our plants.
NOTES

1. Information about the Plant SpikerBox can be found at https://backyardbrains.com/products/plantspikerbox (last accessed 28 May 2021).
2. For more information about the One Voice Children’s Choir, see https://www.onevoicechildrenschoir.com/ (last accessed 28 May 2021).
3. The audio files of the RAVDESS database are available at https://zenodo.org/record/1188976#.YRU89tMza3I (last accessed 28 May 2021).
4. Audiobook of the novel Harry Potter und der Stein der Weisen by J. K. Rowling, read by Rufus Beck (der Hörverlag, 2010).
5. Audiobook of the novel Die Tribute von Panem: Tödliche Spiele by Suzanne Collins, read by Maria Koschny (Oetinger Media, 2013).
6. For further information and a template, see https://vercel.com/new/docusaurus-2.
REFERENCES

Alleyne, R. (2009, June 22). Women’s voices “make plants grow faster” finds Royal Horticultural Society. https://www.telegraph.co.uk/news/earth/earthnews/5602419/Womens-voices-make-plants-grow-faster-finds-Royal-Horticultural-Society.html (last accessed 19 July 2021).
Choi, B., Ghosh, R., Gururani, M. A., Shanmugam, G., Jeon, J., Kim, J., Park, S.-C., Jeong, M.-J., Han, K.-H., Bae, D.-W., & Bae, H. (2017). Positive regulatory role of sound vibration treatment in Arabidopsis thaliana against Botrytis cinerea infection. Scientific Reports, 7(1), 2527. https://doi.org/10.1038/s41598-017-02556-9.
Collins, M. E., & Foreman, J. E. K. (2001). The effect of sound on the growth of plants. Canadian Acoustics, 29(2), article 2.
Coulet, P. R., & Blum, L. J. (2019). Biosensor Principles and Applications. CRC Press.
Hassanien, R. H., Hou, T., Li, Y., & Li, B. (2014). Advances in effects of sound waves on plants. Journal of Integrative Agriculture, 13(2), 335–48. https://doi.org/10.1016/S2095-3119(13)60492-X.
Jung, J., Kim, S.-K., Kim, J. Y., Jeong, M.-J., & Ryu, C.-M. (2018). Beyond chemical triggers: evidence for sound-evoked physiological reactions in plants. Frontiers in Plant Science, 9. https://doi.org/10.3389/fpls.2018.00025.
308 Handbook of social computing Khait, I., Obolski, U., Yovel, Y., & Hadany, L. (2019). Sound perception in plants. Seminars in Cell & Developmental Biology, 92, 134–8. https://doi.org/10.1016/j.semcdb.2019.03.006. Kim, J.-Y., Lee, J.-S., Kwon, T.-R., Lee, S.-I., Kim, J.-A., Lee, G.-M., Park, S.-C., & Jeong, M.-J. (2015). Sound waves delay tomato fruit ripening by negatively regulating ethylene biosynthesis and signaling genes. Postharvest Biology and Technology, 110, 43–50. https://doi.org/10.1016/j .postharvbio.2015.07.015. Kuriyama, S., & Rechnitz, G. A. (1981). Plant tissue-based bioselective membrane electrode for glutamate. Analytica Chimica Acta, 131, 91–6. https://doi.org/10.1016/S0003-2670(01)93537-8. Manzella, V., Gaz, C., Vitaletti, A., Masi, E., Santopolo, L., Mancuso, S., Salazar, D., & de las Heras, J. J. (2013). Plants as sensing devices: the PLEASED experience. Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, 1–2. https://doi.org/10.1145/2517351.2517403. Mishra, R., Ghosh, R., & Bae, H. (2016). Plant acoustics: in the search of a sound mechanism for sound signaling in plants. Journal of Experimental Botany, 67(15), 4483–4494. https://doi.org/10.1093/jxb/ erw235. Oezkaya, B., & Gloor, P. A. (2020). Recognizing individuals and their emotions using plants as bio-sensors through electro-static discharge. http://arxiv.org/abs/2005.04591 (last accessed 28 May 2021). Pisanski, K., Raine, J., & Reby, D. (2020). Individual differences in human voice pitch are preserved from speech to screams, roars and pain cries. Royal Society Open Science, 7(2), 191642. https://doi .org/10.1098/rsos.191642. Sataloff, R. T. (1992). The human voice. Scientific American, 267(6), 108–15. Stoica, A. (2020, January 27). Plant experiment: words and growth. You Had Me At Gardening. https:// youhadmeatgardening.com/plant-experiment-hitler-vs-king/ (last accessed 19 July 2021). Volkov, A. G., & Brown, C. L. (2004). Electrochemistry of Plant Life. Electrochemistry Encyclopedia. https://knowledge.electrochem.org/encycl/art-p01-plants.htm (last accessed 10 August 2021). Volkov, A., & Ranatunga, D. (2006). Plants as environmental biosensors. Plant Signaling & Behavior, 1(3), 105–15. https://doi.org/10.4161/psb.1.3.3000.
APPENDIX
We implemented this platform as a modern media wiki, which can be accessed through an internet browser at https://plantsasbiosensors.vercel.app/. The content of the media wiki is managed via a GitHub repository and deployed using Vercel. The template we use is Docusaurus 2,6 which uses React (a JavaScript library for creating user interfaces) for the layout. Documentation was created using Markdown/MDX documents; JSON components and HTML data are also included. The GitHub repository is publicly linked on our media wiki. The platform contains four sections. The first, "About our project", briefly introduces our goals. The second, "Knowledge", presents a small introduction to the Plant SpikerBox from Backyard Brains; the page also contains the blueprint and the Arduino code of the box. The third, "Literature", contains all the research papers related to our project. The fourth, "Project status", shows our trials and tests, the data analysis, and the preparation for the machine learning algorithm; it includes code, tables with the files and results, as well as information about the measured data and the .wav plots. We provide a rich collection of resources for students and artificial intelligence enthusiasts who aim to enter the world of plants (Figure 16A.6).
Figure 16A.6 Impressions of our knowledge platform
17. Prototyping a mobile app which detects dogs' emotions based on their body posture: a design science approach
Alina Hafner, Thomas M. Oliver, Benjamin B. Paßberger and Peter A. Gloor
1. INTRODUCTION
Dogs are considered to be "a man's best friend" (The History Place – Great Speeches Collection: George Graham Vest Speech – A Tribute to Dogs, 1855). Ad hoc recognition of a dog's emotions can be important for dog owners as well as for dog trainers and similar professionals, who need to interpret and classify these emotions correctly. Several researchers have shed light on animal emotions and how they can be recognized. Dog emotions are often categorized into four types: Anger, Fear, Happiness, and Relaxation. These emotions can, amongst other indicators, be derived from a dog's body posture (Coren, n.d.; Henninger, n.d.; Simpson, 1997). Classifying emotions in images using machine learning (ML) is well established in research (Brownlee, 2019). For instance, several applications use ML to analyse human emotions from facial expressions in pictures (Ekman & Friesen, 1978). The general idea of analysing a dog's body posture using ML is also not entirely new; different approaches to detecting dog emotions already exist (Ferres, 2021; Niklas & Ferres, 2019; Waller et al., 2013). However, detecting dogs' emotions remains difficult, since dogs' facial expressions and body postures vary considerably (Karmitsa & Tiedottaja, 2016; Meridda et al., 2014). In this chapter, we build upon the work of Ferres (2021), who used ML approaches to predict dogs' emotions; we adopt a slightly different approach and investigate how different deep learning models can be used to classify these emotions. Since, up to now, there is no mobile app that dog owners can use to check their dog's emotional state, one goal of this work is to implement such an app. Another goal is to extend an already existing dataset of labelled dog images, following different approaches. This dataset is then used to train an ML model, which the mobile app uses. Since the goal of this work is to design and implement a software artefact, we follow a Design Science Research approach, an adequate research paradigm for such use cases (Hevner et al., 2004). This chapter is structured as follows. The next section provides the theoretical background. This is followed by the methodological section describing our research process based on Design Science Research. The next section presents the analysis and the results. Lastly, we conclude by discussing our contribution, future research directions, and limitations of our work.
2. THEORETICAL BACKGROUND
2.1 Emotions of Dogs and Link to Their Body Posture
While emotional research on dogs is still in an early phase (Kujala, 2018), many researchers agree on the existence of some basic emotions. For instance, Amici et al. (2019), Bloom and Friedman (2013), and Meridda et al. (2014) identify an emotion associated with happiness, joy, playfulness, contentment, or excitement, which we further refer to as happiness. The opposing emotion, covering sadness, distress, or frustration, is also often listed. Other basic emotions that have been identified in dogs are anger, fear, surprise, and disgust (Amici et al., 2019; Bloom & Friedman, 2013; Meridda et al., 2014). Regarding emotions in general, we also consider secondary emotions (e.g. pride or shame), which require a conscious self and self-reflection (Tracy & Robins, 2004). According to the current state of science, however, no secondary emotions have been explicitly detected in dogs (e.g. Kujala, 2018). Coren (n.d.), Henninger (n.d.), and Simpson (1997) define six common postures as meaningful for canine communication. These postures refer to non-resting, active dogs and define specific body signals as essential for a dog to display its state of mind: neutral, alarmed, dominant-aggressive, active-submissive, passive-submissive, and playful. Following these authors, a dog in a neutral position is relaxed and approachable, since it feels unconcerned about its environment. In contrast, an alarmed dog is in an aroused or attentive condition because it has detected something in its surroundings. A dog shows dominant aggression to communicate its social dominance and that it will answer a challenge with an aggressive attack. A defensive-aggressive dog, by comparison, may also attack but is driven by fear. An active-submissive canine is likewise fearful and worried and shows weak signals of submission, whereas passive-submissive behaviour signals total surrender and submission and indicates extreme fear in the animal. A dog in a playful posture invites others to interact with it and is mostly in a good mood (Coren, n.d.; Henninger, n.d.; Simpson, 1997). The named postures and how they are expressed in detail are listed in Table 17.1. It can be derived that the most critical indicators of a dog's mood are the body weight distribution, the tail position, the position of the head and ears, and the condition of the mouth.
Table 17.1 Overview of a dog's posture characteristics

Posture | Front end | Back end | Head | Ears | Tail | Mouth
Neutral | Normal | – | Up | – | Down | Open, tongue visible
Alarmed | Normal | – | Up | Up, forward | Horizontal | Slightly open
Dominant-aggressive | Lowered | – | Lowered | – | Down | Closed
Active-submissive | Strongly lowered | Down | Down | Down, back | Down, tucked | Closed
Passive-submissive | Underbelly exposed | Down | Down | Down, flat, back | Down, back | Closed
Playful | Strongly lowered | Normal | – | Up | Moving | Open, tongue visible
Sources: Based on Coren (n.d.); Henninger (n.d.); Simpson (1997).
2.2 Machine Learning for Image Recognition
To enable the automatic pose recognition of animals by computational image analysis, techniques from the field of image recognition are required. Following Solomon and Breckon (2011), image recognition, also called image processing, is an automatic identification method for objects, living beings, or patterns in photographic recordings. When identifying the posture of an animal using image recognition, the concepts of object detection and keypoint detection are used. Dasiopoulou et al. (2005) describe object recognition as a method of image recognition in which an instance displayed in a digital image is assigned to a defined object class. By applying object recognition to different parts of an image, the area containing a specific object can be determined, which opens the possibility of object detection in photographs. Each object class has features, such as a specific colour spectrum or a certain similarity in shape, that define its characteristics. Using several examples of an object class, a model can be trained that stores this class's characteristics. An algorithm can then assign a class by aligning the features of an image, or part of an image, with those of the object classes in the previously trained model. For identifying animal postures, object detection can be applied to individual body parts such as the eyes, the nose, or the mouth. Furthermore, object recognition can also determine the condition of a recognized body part. For instance, Tsai et al. (2020) used this method to determine the mouth condition of a dog and thus classify a mouth as closed or opened. Rohr (2001) describes keypoint detection as a category of image recognition that involves estimating the coordinates of essential, mutually separable points. Models for keypoint detection can be constructed using supervised learning, which means the algorithm is given sample pictures together with the coordinates of the key points to be learned. To be robust, a keypoint recognition algorithm must tolerate changes in perspective, changes in size, and movement of the regarded object or being. To determine an animal's posture using keypoint detection, a model must be trained with images of the animal and the coordinates of all critical key points. Putting the detected key points into relation to each other allows the pose, or essential features of the pose, to be calculated; the joint coordinates are highly relevant for an accurate reconstruction of the pose (Rohr, 2001). Using an ML model to recognize multiple canine emotions falls under the category of multiclass supervised classification. In the context of ML, classification is the identification of the class, from a set of classes and categories, to which a sample belongs. A sample is a single item characterized by its attribute values and is also known as a row, observation, or instance. Assigning an instance to a class is based on its attribute values, also referred to as features. Supervision is a method for training models in which ML algorithms are given a set of examples whose class membership is already known, referred to as a training set. Computational learning can then derive feature criteria for the different classes using different types of pattern recognition. Since more than two emotions are considered in the context of this research work, the applied supervised classification approach needs the ability to predict a class out of a set of multiple classes.
Therefore, the model to be trained needs to be a multiclass classifier (Géron, 2017).
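To make the keypoint idea concrete, the following is a minimal sketch of how detected key points can be put into relation to derive pose features; the key-point names, coordinates, and the two features computed here are illustrative assumptions, not the features used later in this chapter.

```python
import numpy as np

# Hypothetical key points (x, y) as a keypoint-detection model might return them.
keypoints = {
    "head": np.array([310.0, 120.0]),
    "withers": np.array([250.0, 160.0]),   # top of the shoulders
    "tail_base": np.array([140.0, 170.0]),
    "tail_tip": np.array([120.0, 110.0]),
}

def angle_deg(v1: np.ndarray, v2: np.ndarray) -> float:
    """Angle between two vectors in degrees."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Feature 1: tail angle relative to the back line (tail raised vs. tucked).
back_line = keypoints["withers"] - keypoints["tail_base"]
tail_line = keypoints["tail_tip"] - keypoints["tail_base"]
tail_angle = angle_deg(back_line, tail_line)

# Feature 2: head height relative to the withers, normalized by body length
# (image y grows downwards, so a negative value means the head is raised).
body_length = np.linalg.norm(keypoints["head"] - keypoints["tail_base"])
head_drop = (keypoints["head"][1] - keypoints["withers"][1]) / body_length

feature_vector = np.array([tail_angle, head_drop])
print(feature_vector)  # one input row for the posture classifier
```

Feature vectors of this kind, one per image, form the rows that a multiclass classifier is trained on.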
Here, artificial neural networks (ANNs) are used as supervised classifiers with the ability of multiclass prediction. We therefore give a quick overview of this topic and also explain how to evaluate classifier performance.
2.3 Machine Learning for Image Prediction
ANNs are computer systems inspired by biological neural networks, which learn to perform complex tasks by recognizing features in processed data. They are an essential component of ML, especially supervised learning (Goodfellow et al., 2016). There are many types of ANNs, but the basic principles are very similar. ANNs consist of multiple layers, each containing a set of artificial neurons. Each neuron in the network can receive input signals, process them, and send an output signal. Each neuron is connected to at least one other neuron, and each connection is weighted with a real number, the weight coefficient, which reflects the importance of that connection in the ANN. The main advantage of ANNs is that they can exploit a priori unknown information hidden in the data. The process of acquiring this unknown information is called learning or training the ANN. Learning means adjusting the weight coefficients to satisfy certain constraints in a mathematical formalism. In the context of supervised training, the ANN knows the desired output and adjusts the weight coefficients so that the computed output and the desired output are as close together as possible (Goodfellow et al., 2016). ANNs are suitable for both classification and regression tasks. For multiclass classification, the network needs as many output nodes as there are classes, and a probability value is calculated for each of them. An instance is then assigned to the class with the highest computed probability. During training, the network connections that led to the correct result are reinforced by increasing their weights in a process known as back-propagation. In this way, the network gradually learns how the values of the input variables should be used to make predictions about class membership that are as accurate as possible. ANNs react sensitively to significant differences in the value ranges of the input variables, which is why feature scaling should be applied to the data in a pre-processing step (Goodfellow et al., 2016). A commonly used ANN type is the multi-layer perceptron (MLP). An MLP has one or more hidden layers in addition to an input and output layer. Increasing the number of hidden layers can help model highly complex structures better, which is called deep learning (Goodfellow et al., 2016). An example of an MLP is shown in Figure 17.1. The concept behind deep learning is to build networks with hidden layers between the input and output layers of the network. Each hidden layer applies the mathematical structures of neurons to perform the learning task. The learning approach continuously analyses data according to a logical structure, similar to how a human would draw conclusions. Data analysis is repeated until accurate predictions occur; if the system provides low prediction accuracy, the learning approach automatically adjusts. The learning process consists of two critical elements: forward feature abstraction and backward error feedback. Forward feature abstraction is vital for analysing the input data, while backward error feedback performs the adaptation of the neurons (Goodfellow et al., 2016).
Source: Own representation.
Figure 17.1 Artificial neural network: multi-layer perceptron
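As an illustration of such an MLP, the following minimal Keras sketch builds a small multiclass classifier on synthetic data. The layer sizes and all hyperparameter values are assumptions chosen for demonstration, and the snippet shows where the tuning parameters discussed next (activation function, optimizer, batch size, number of epochs, loss function) enter the code.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Toy data: 200 samples, 8 features, 4 classes (all made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)).astype("float32")
y = rng.integers(0, 4, size=200)

# ANNs are sensitive to differing value ranges, so scale the features first.
X = StandardScaler().fit_transform(X)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(4, activation="softmax"),  # one output node per class
])

model.compile(
    optimizer="adam",                        # optimizer
    loss="sparse_categorical_crossentropy",  # loss function
    metrics=["accuracy"],
)

# Batch size and number of epochs as discussed in the text.
model.fit(X, y, batch_size=32, epochs=20, validation_split=0.2, verbose=0)
```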
Furthermore, other important parameters for adjusting a network, besides the number of layers and the number of nodes per layer, are the following (Géron, 2017):
● Activation function: a mathematical expression used to calculate a node's output from its inputs. The function can be chosen individually for each layer of nodes.
● Optimizer: the method applied for changing the network's attributes, such as the weight values or the learning rate. The choice of optimizer influences the duration and intensity of training.
● Batch size: the number of samples considered in one pass through the network to estimate the error gradient. The higher the batch size, the more memory is required.
● Number of epochs: defines how many complete passes are performed, i.e. how many times all training samples are used exactly once as part of a batch for error gradient estimation. If the number of epochs is too small, the model may be insufficiently trained; if it is too high, it might lead to overfitting, meaning the model describes the given training set very well but does not generalize to data it has never seen before.
● Loss function: the function that evaluates the current solution, meaning the weight settings, by calculating the error gradient.
The great advantage of ANNs is their ability to comprehend a high degree of complexity, if present in the data, without needing extensive pre-processing. Their downsides include high computational requirements, a need for considerably more extensive datasets, and a loss of decision transparency for humans (Géron, 2017; Goodfellow et al., 2016). The performance of a supervised classifier must be evaluated on a dataset that has not been used for the learning process but for which the correct class memberships are available. This is why datasets are commonly divided into training and test sets. Typically, the training set contains 80–90 per cent of the instances, and the test set contains 10–20 per cent. Various training sets can be used to train the model, which is then evaluated on an independent test set; through this method, prediction performance can be measured for each
combination. As more samples are used in the testing process, a better approximation of the actual model performance can be obtained by using the mean or median of the measurements. The most popular performance measure for classifiers is accuracy, which simply indicates the percentage of correctly classified samples out of all tested samples:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (17.1)
The accuracy measure can often be helpful, though it should be used with caution since, depending on the dataset and circumstances, it may not be meaningful. Problems arise, for example, if classes are very unevenly distributed: a classifier with reasonable overall accuracy can categorize many instances correctly yet fail to assign instances to classes for which only a few training samples exist. Confusion matrices can be used to discover such problems. A confusion matrix (Table 17.2) shows how often a classifier assigns instances of a class to the other classes: for each class, it counts how many samples were assigned to each of the possible classes (Géron, 2017; Goodfellow et al., 2016).

Table 17.2 Example of a confusion matrix to evaluate the outcome of a supervised classifier

 | Actual class = yes | Actual class = no
Predicted class = yes | TP | FP
Predicted class = no | FN | TN
Besides accuracy, another measure is precision, which is the fraction of correctly classified positive samples in relation to all samples classified as positive:

Precision = TP / (TP + FP)    (17.2)

The recall measure, in contrast, is the fraction of correctly classified positive samples in relation to all actually positive samples:

Recall = TP / (TP + FN)    (17.3)
A measure that combines these two measures is the so-called F-Measure:

F-Measure = (2 × Precision × Recall) / (Precision + Recall)    (17.4)
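As a small sketch, equations (17.1) to (17.4) can be computed directly from the counts of a binary confusion matrix; the counts below are invented for demonstration.

```python
# Illustrative counts from a binary confusion matrix (made-up numbers).
tp, fp, fn, tn = 42, 8, 6, 44

accuracy = (tp + tn) / (tp + tn + fp + fn)                  # equation (17.1)
precision = tp / (tp + fp)                                  # equation (17.2)
recall = tp / (tp + fn)                                     # equation (17.3)
f_measure = 2 * precision * recall / (precision + recall)   # equation (17.4)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f_measure:.3f}")
```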
2.4 Related Work
This work is based on previous research in the field of recognizing dog emotions or, more generally, animal emotions, of which we first give a quick overview. One relevant work is the Dog Facial Action Coding System (DogFACS), a scientific observational tool for identifying and coding facial movements in dogs (Waller et al., 2013). The system is based on the facial anatomy of dogs and has been adapted from the original FACS system for humans created by Ekman and Friesen (1978). The DogFACS manual details how to use the system and code the facial movements of dogs objectively (Ekman & Friesen, 1978). Building on this, Franzoni et al. (2019) and Niklas and Ferres (2019) applied ML techniques to emotion image datasets to create dog emotion classifiers relying on facial analysis. There is also scientific work that introduces smart devices to recognize the emotional patterns of dogs. For example, Brugarolas et al. (2013) developed a behaviour recognition application that analyses data from a wireless sensor worn by the dog. Furthermore, Aich et al. (2019) designed and programmed a canine emotion recognizer that is also based on sensors, in this case worn around the dog's neck and tail; the collected data were fed into an ANN in the training phase, and the resulting model achieves an accuracy of 92.87 per cent in analysing the sensor outputs for emotions. Moreover, Tsai et al. (2020) made a relevant contribution by developing a dog surveillance tool. They combined ML techniques for image recognition on continuous images with bark analysis to compute a dog's mood, differentiating the emotions of happiness, anger, sadness, and a neutral condition. Image recognition works by applying object detection to a dog's body parts and predicting a condition for each identified part. For the sound analysis, examples of barking, growling, and crying were used to train an audio identification model. Tsai et al. (2020) then determined the feature combinations that need to be fulfilled for each emotional condition in order to predict an emotion from given image and sound information. As already mentioned, Ferres (2021) investigated how ML can classify dogs' emotions by analysing their body posture. Her research was limited with respect to the accuracy of the implemented models, and she therefore suggested investigating this particular point further in future work. Also, until now, we are not aware of a mobile app that can be used to recognize dogs' emotions in an ad hoc manner.
3. RESEARCH METHOD
To achieve our research objective, we followed Design Science Research (DSR) to develop the mobile app, which can be seen as the resulting artefact. We split the design process into different parts, along with the different project goals described earlier. In general, DSR is a research paradigm in which a designer answers questions related to human problems by creating innovative artefacts, thereby generating new knowledge. The basic principle of DSR is that knowledge and understanding of a design problem and its solution are acquired through the design and application of an artefact (Hevner & Chatterjee, 2010). The concept of an artefact is used to describe something artificial or constructed by humans. Such artefacts are intended to either improve upon existing solutions to a problem or provide an initial solution to a problem. IT artefacts, which are the ultimate goal of any DSR project in information systems, can essentially be constructs (concepts and symbols), models (abstractions and representations), methods (algorithms and practices), instantiations (implemented and prototyped systems), or better design theories (Hevner & Chatterjee, 2010).
Prototyping a mobile app which detects dogs’ emotions 317 Research activities in the context of design science research have been described using a conceptual framework for understanding research in the field of information systems, the so-called Information Systems (IS) Research Framework (Hevner et al., 2004). This was further developed into the Three Cycle View (Hevner, 2007). In addition, seven guidelines for conducting and evaluating design science research were established (Hevner et al., 2004). Since in this work the seven guidelines are somehow prominent, Table 17.3 shows these seven guidelines and illustrates how they are addressed in this work. Table 17.3
Table 17.3 Design Science Research guidelines and their occurrence in this work

Design Science Guideline | Occurrence in this work
Guideline 1: Design as an Artefact | By designing and implementing a mobile app, we are designing an artefact.
Guideline 2: Problem Relevance | As described earlier, the mobile app can be useful to analyse dog emotions in an ad hoc manner.
Guideline 3: Design Evaluation | We are constantly evaluating the design of each part of the artefact and the final artefact as well.
Guideline 4: Research Contributions | Our research contributions are mainly insights into what went well and what could be improved.
Guideline 5: Research Rigor | We are rigorous in the way we pass through all phases of the research process.
Guideline 6: Design as a Search Process | Throughout the whole research, we have been searching for what could be better solutions.
Guideline 7: Communication of Research | We are communicating our research contribution mainly through this paper and several presentations and by making our source code publicly available.

Source: Design Science Research guidelines from Hevner et al. (2004).
According to Österle et al. (2011), design-oriented IS research ideally follows an iterative process comprising four basic phases, Analysis, Design, Evaluation, and Diffusion, which fit together with the previously mentioned concepts of Hevner (2007) and Hevner et al. (2004). Since the whole research process is best understood if clearly described, we follow these four basic phases in a slightly adapted way. We split our final artefact into three parts and went through the phases of Analysis, Design, and Evaluation with each of them; we then evaluated the final artefact as a whole and took it through the Diffusion phase. The pursued outcome of our DSR process is the mobile app, which can be seen as the final artefact. As described earlier, we divided it into three parts: the enlargement of the image database, the ML model, and the front end of the app. Figure 17.2 shows how these parts relate and how the results of one part are used by the others, starting with the first part, the enlargement of the dataset of dog images. Next, this dataset is used in the second part to implement an ML model which classifies dog emotions based on the images. Afterwards, the ML model is connected to the front end of the mobile app. As already mentioned, we adapted the research process so that the Analysis phase is first done for the whole artefact. The Design phase is then split into three parts, each of which goes through its own Analysis, Design, and Evaluation. Lastly, the whole artefact goes through an Evaluation phase, and the results are presented in the Diffusion phase. An overview of the described research process can be found in Appendix A.
Figure 17.2 Usage and artefact cycle
4. ANALYSIS AND RESULTS
4.1 Phase of Analysis
As described in the theoretical background section, there are several approaches in research to predict the emotions of dogs. As discussed, the performance of the presented algorithms needs to be improved. Also, there is no mobile app with which dog owners could easily predict their dog's emotional state ad hoc. Our goal was therefore to implement such an app, predicting the emotional state of a dog and differentiating between Anger, Fear, Happiness, and Relaxation. Since Ferres' (2021) dataset was somewhat rudimentary, our first goal was to augment it. Since this project can be split into separate parts, we adjusted our design process slightly so that every part of the artefact is analysed, designed, and evaluated on its own before looking at the whole artefact. The process of augmentation is described in Section 4.2.1. Secondly, we need a working ML model to classify the emotions; the implementation of this model is described in Section 4.2.2. Lastly, the implementation of the front end of the mobile app is described in Section 4.2.3.
4.2 Phase of Design
4.2.1 Part one of the artefact – image dataset
Phase I: analysis The ML model needs a sufficient number of images to learn to predict emotions. A good starting point was therefore the dataset created and used in the research work of Ferres (2021). Nevertheless, we needed to expand this dataset to increase the prediction performance. During the project, several procedures were used to increase the dataset, some of which proved more successful than others. In particular, we had problems locating images for the two emotions Anger and Fear.
Furthermore, data protection was a complicating factor and raised concerns in many of the approaches we used. For this reason, we explored many different possibilities to obtain images: traditional ways, e.g. postings (at the veterinarian's office or in pet shops), mailings, outside photos, and so on, and new ways, e.g. social media (pet groups, Instagram), web search, and so on. In short, we followed a crowdsourcing approach, leveraging social networks (online and in real life) to support our project by encouraging their members to share their pictures with us. Regarding the labelling of the images, we followed the approach of labelling them ourselves, respecting the researchers' findings on the emotions expressed by dogs' body postures. However, we wanted the dog owners to first label their pictures themselves, to benefit from collective intelligence, and we subsequently cross-validated their labelling. Phase II: design The starting point was the dataset of Ferres (2021). Here we were able to access 367 images, divided into the four named emotions: Anger (112 pictures), Fear (95 pictures), Relaxation (85 pictures), and Happiness (75 pictures). To increase our dataset, we initially set up a form for people to upload images of their dogs' respective emotions. Within this survey, we provided a declaration of consent which included all the relevant data protection information. We also clarified that the dog's body posture must be visible because we needed this information to label the pictures correctly. We asked the dog owners to label their pictures themselves, which they sometimes did; regardless, we cross-checked their labelling against the findings of researchers (Coren, n.d.; Henninger, n.d.; Simpson, 1997). Next, we contacted animal shelters and dog breeders by mail. The output was not satisfying: if we received pictures at all, they mainly covered the emotions Relaxation and Happiness, and we received little to no feedback from animal shelters; when we did, the response was often that they did not have enough time. Pursuing this approach led to a dataset of 112 pictures with the emotion Anger, 106 with Fear, 122 with Relaxation, and 115 with Happiness. In addition, we tried to take pictures of dogs in parks or public places ourselves. We also posted notices with a QR code for our Google Forms survey in animal shelters, pet stores, and veterinarians' offices. Unfortunately, people were hesitant to make these images freely available to us because of privacy concerns. After these methods proved not very successful, we adapted our approach to acquiring images, especially regarding the emotions Anger and Fear. Fortunately, we were able to find someone on Instagram who supported our project and shared her pictures with us. We also joined many social media groups of dog communities to receive pictures. This switch to social media grew our dataset to 116 pictures of Anger, 106 of Fear, 156 of Relaxation, and 191 of Happiness. However, since it was still challenging to get pictures of the emotions Anger and Fear, we turned again to web search, which expanded our dataset to 127, 122, 156 and 191 images, respectively.
As a last step, we received the hint to narrow our search and look for street dogs in Chile, which might provide images of angry and fearful dogs. This approach enlarged our dataset once more: in the end, we had 625 images overall, divided into 139 of Anger, 139 of Fear, 156 of Relaxation, and 191 of Happiness. This is an increase of 41 per cent in total (Table 17.4).
Table 17.4 Increase of the image dataset over time

Step | Anger | % | Fear | % | Relaxation | % | Happiness | % | Total | %
1 | – | – | – | – | – | – | – | – | – | –
2 | – | – | – | – | – | – | – | – | – | –
3 | 112 | 19 | 95 | 32 | 85 | 46 | 75 | 61 | 367 | 41
4 | 112 | | 106 | | 122 | | 115 | | 455 |
5 | 116 | | 106 | | 156 | | 191 | | 569 |
6 | 127 | | 122 | | 156 | | 191 | | 596 |
7 | 139 | | 139 | | 156 | | 191 | | 625 |
Phase III: evaluation The project has shown that it is generally not easy to obtain pictures of dogs, in particular pictures showing the emotions Anger and Fear. On the one hand, the difficulties were due to data protection; on the other hand, people prefer to share happy dog pictures, possibly out of a desire to present themselves as good dog owners. Even though trainers working in animal shelters can classify dog emotions more reliably, their willingness to help was very low. Hence, our best sources were web search and social media, through which we increased our dataset quickly and significantly in the short term.
4.2.2 Part two of the artefact – machine learning model
Phase I: analysis Since the goal was to derive dogs' emotions by classifying an image of their body posture, the present problem can be seen as a multiclass classification problem with four classes that distinguish the four emotions. As described in the background section, the use of ML, more precisely the use of ANNs, is an appropriate approach for (biological) image classification (Affonso et al., 2017). Phase II: design In building the ML solution, we followed different approaches. Since this work builds upon previous work in this field, we evaluated existing solutions. For example, there is work that uses decision trees for classifying emotions; unfortunately, the performance of decision trees has been quite low, so we decided to build an ANN. We also decided to check whether commercial tools could reach a better accuracy, and to decide afterwards which model to use in our final app. We now briefly describe each approach. First approach: convolutional neural network with DeepLabCut Our first approach was to build a convolutional neural network (CNN). According to Goodfellow et al. (2016), CNNs are a specialized kind of ANN for processing data with a known grid-like topology. Examples include time-series data, which can be considered a 1-D grid of samples taken at regular time intervals, and image data, a 2-D grid of pixels. Convolutional networks have been tremendously successful in practical applications. The term "convolutional neural network" indicates that the network employs a mathematical operation called convolution, a specialized kind of linear operation. Convolutional networks are simply ANNs that use convolution in place of general matrix multiplication in at least one of their layers (Goodfellow et al., 2016).
We decided to use the library DeepLabCut (DeepLabCut – The Mathis Lab of Adaptive Motor Control, n.d.): its markerless pose estimation, based on transfer learning with deep neural networks, achieves excellent results with minimal training data (Mathis et al., 2018). First, we installed the necessary packages: imageai, DeepLabCut, the TensorFlow object detection API, and some other packages (e.g. pandas, TensorFlow/Keras) to help train our model. Next, we applied some image cleaning before calculating the posing points, which are fixed key points. Afterwards, the DeepLabCut library and the dog images were prepared, new key points were predicted from the already calculated ones, and the pose was estimated. Finally, for the actual classification part, we computed class weights and implemented EarlyStopping; as the model optimizer we used Adam (Kingma & Ba, 2017) and trained for 100 epochs with 70 steps per epoch. Figure 17.3 shows the process of building the model. Evaluating the implemented model led to an accuracy of 68 per cent. We also evaluated the performance metrics for each emotion, which can be seen in Table 17.5, and added a confusion matrix visualizing the classification results in Appendix B.
Figure 17.3 Process of building a machine learning model with DeepLabCut
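In terms of concrete library calls, the keypoint part of this process can be sketched as follows using DeepLabCut's top-level API; project name, paths, and arguments are placeholders, and exact signatures may vary between library versions.

```python
import deeplabcut

# Create a project for labelling dog key points (all paths are placeholders).
config_path = deeplabcut.create_new_project(
    "dog-emotions", "coin-team", ["videos/dogs.mp4"],
    working_directory="dlc-projects", copy_videos=False)

deeplabcut.extract_frames(config_path, mode="automatic")  # select frames to label
deeplabcut.label_frames(config_path)                      # opens the labelling GUI

deeplabcut.create_training_dataset(config_path)
deeplabcut.train_network(config_path)      # transfer learning on a pretrained CNN
deeplabcut.evaluate_network(config_path)

# Predict key points for a folder of still dog images; the resulting
# coordinates become the input features for the emotion classifier.
deeplabcut.analyze_time_lapse_frames(config_path, "images/dogs/", frametype=".jpg")
```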
Table 17.5 Performance metrics for own DeepLabCut model for each emotion

Emotion | F1 score | Precision | Recall
Happiness | 0.200 | 0.182 | 0.190
Anger | 0.111 | 0.125 | 0.118
Fear | 0.471 | 0.320 | 0.381
Relaxation | 0.429 | 0.556 | 0.484
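The classification step described above can be sketched as follows. This is a minimal illustration assuming the DeepLabCut-derived pose features have already been exported; file names and layer sizes are assumptions, while Adam, EarlyStopping, class weights, 100 epochs, and 70 steps per epoch follow the setup described in the text.

```python
import numpy as np
import tensorflow as tf

# Assumed inputs: pose features derived from DeepLabCut key points and integer
# labels for the four emotions (0=Anger, 1=Fear, 2=Happiness, 3=Relaxation).
X = np.load("pose_features.npy")   # illustrative file names
y = np.load("emotion_labels.npy")

# Class weights counteract the uneven number of images per emotion.
counts = np.bincount(y, minlength=4)
class_weight = {i: len(y) / (4 * c) for i, c in enumerate(counts)}

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stop training early once the validation loss stops improving.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

split = int(0.8 * len(y))  # simple train/validation split
train = (tf.data.Dataset.from_tensor_slices((X[:split], y[:split]))
         .shuffle(split).batch(8).repeat())

model.fit(train, epochs=100, steps_per_epoch=70,
          validation_data=(X[split:], y[split:]),
          class_weight=class_weight, callbacks=[early_stopping])
```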
Second approach: use of Microsoft Azure Cognitive Services Secondly, we decided to try out commercial options for image classification. Microsoft offers Azure Cognitive Services Custom Vision (Custom Vision – Microsoft Azure, n.d.), a customized, state-of-the-art embedded computer vision image analysis for specific domains. We uploaded the labelled dog images we had collected and let the service learn to classify the four emotions. We obtained a precision score of 76.3 per cent, a recall score of 75.2 per cent, and an F1 score of 78.4 per cent. We also evaluated the performance metrics for each emotion (Table 17.6).
Table 17.6 Performance metrics of Azure model for each emotion

Emotion | F1 score | Precision | Recall
Happiness | 0.841 | 0.75 | 0.875
Anger | 0.764 | 0.739 | 0.68
Fear | 0.723 | 0.773 | 0.63
Relaxation | 0.764 | 0.789 | 0.732
Third approach: use of Amazon Web Services Rekognition Lastly, we decided to try out Amazon Web Services with their Rekognition service, which offers proven, highly scalable deep learning technology. With Amazon Rekognition, objects, people, text, scenes, and activities in images and videos can be identified. It also offers highly accurate facial analysis and facial search capabilities that can be used to detect, analyse, and compare faces for a wide variety of user verification, people counting, and public safety use cases. We obtained an overall model performance of 79 per cent, an average precision of 79.7 per cent, an overall recall of 77.7 per cent, and an F1 score of 77.3 per cent. We also evaluated the performance metrics for each emotion (Table 17.7).
Table 17.7 Performance metrics for Amazon Web Services Rekognition model for each emotion

Emotion | F1 score | Precision | Recall
Happiness | 0.788 | 0.689 | 0.894
Anger | 0.857 | 0.913 | 0.808
Fear | 0.778 | 0.650 | 1.00
Relaxation | 0.738 | 0.828 | 0.667
Phase III: evaluation We compared the resulting approaches by their performance values in the testing and validation phases. Amazon Rekognition performed best for our use case, so we decided to embed the Amazon Rekognition model into our final mobile app. Amazon offers various integration APIs, among others one for JavaScript, which we used for our app. Using a commercial approach like Azure or Amazon Rekognition yields better performance on the one hand; on the other hand, the model is a kind of black box, and you do not know what it is doing. Since the images all show dogs in different body postures labelled with the four emotions, the model presumably also looks for these body postures when predicting emotions. Our own model performed less well than the commercial ones and would need much further tuning.
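As a sketch of how such an embedded model can be queried, the following uses the Rekognition Custom Labels API via Python's boto3 (the app itself used the equivalent JavaScript SDK); the project version ARN and file name are placeholders.

```python
import boto3

# Placeholder ARN of a trained Rekognition Custom Labels model version.
PROJECT_VERSION_ARN = ("arn:aws:rekognition:eu-west-1:123456789012:"
                       "project/dog-emotions/version/1")

client = boto3.client("rekognition")

with open("dog_photo.jpg", "rb") as f:  # illustrative file name
    image_bytes = f.read()

response = client.detect_custom_labels(
    ProjectVersionArn=PROJECT_VERSION_ARN,
    Image={"Bytes": image_bytes},
    MinConfidence=50,
)

# Pick the emotion label with the highest confidence.
labels = response["CustomLabels"]
if labels:
    best = max(labels, key=lambda label: label["Confidence"])
    print(f"{best['Name']} ({best['Confidence']:.1f}%)")
```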
4.2.3 Part three of the artefact – front end of the mobile app
Phase I: analysis Since the research objective is, as described earlier, to implement a mobile app that can be used to classify or predict the emotion of dogs, we defined a few use cases which our app should support. First, the user needs the opportunity to take a photo of their dog immediately. Secondly, the user should also be able to upload an already existing photo. The user then needs to receive feedback in the form of a message indicating how the dog is probably feeling. We considered a few prominent frameworks for implementing the front end, and we also considered whether to implement a native smartphone app or a progressive web app. Phase II: design We decided to implement a progressive web app (PWA) (Progressive Web Apps, n.d.). A PWA is built and enhanced with modern APIs to deliver enhanced capabilities, reliability, and installability while reaching anyone, anywhere, on any device with a single codebase. PWAs are just web applications: by using progressive enhancement, new capabilities are enabled in modern browsers, and by using service workers and a web app manifest, the web application becomes reliable and installable. If the new capabilities are unavailable, users still get the core experience. We then decided to use a modern framework to implement the functionalities of the front end, as we wanted the user to experience a well-designed user interface. We therefore used React JS (React – A JavaScript Library for Building User Interfaces, n.d.) as our framework. Using JavaScript for the front end ensured that it would work with common API calls as supported by Microsoft Azure Cognitive Services or AWS Rekognition. We also used the Material-UI design package. Before implementing, we designed a mock-up of the app, although we changed the design a few times during the implementation phase. Phase III: evaluation We evaluated the implemented front end by conducting a pilot study with seven participants (dog owners) who tried out our app. As described earlier, we had to change the design a few times. We also needed to adapt the API call whenever we changed the hosting service of our ML model, since each model expects the image in a different format in order to process it properly. Also, as the ML models provided different outputs (e.g. probabilities for each emotion, or simply the most likely emotion), all in JSON format, we needed to adjust the corresponding methods each time to display the emotion properly.
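The output normalization mentioned in the evaluation can be illustrated with a short sketch, shown here in Python for brevity although the app performs this step in JavaScript; both response shapes are illustrative assumptions.

```python
def extract_emotion(response: dict) -> str:
    """Normalize different model outputs to a single emotion label.

    Some hosting services return per-class probabilities, others only the
    most likely label; both shapes below are illustrative assumptions.
    """
    if "probabilities" in response:
        # e.g. {"probabilities": {"Anger": 0.1, "Fear": 0.2,
        #                         "Happiness": 0.6, "Relaxation": 0.1}}
        probs = response["probabilities"]
        return max(probs, key=probs.get)
    if "label" in response:
        # e.g. {"label": "Happiness", "confidence": 0.87}
        return response["label"]
    raise ValueError("Unknown response format")

print(extract_emotion({"label": "Happiness", "confidence": 0.87}))
```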
4.3 Phase of Diffusion
The diffusion of the scientific contribution must be part of every design science project. For our project, the results are communicated through this chapter and other channels (e.g. GitHub). We will make the code of our ML model with DeepLabCut and our code for the PWA publicly available. We will also share our dataset with further researchers working on the same topic so that their research can build on our work, just as we built on the work of Ferres (2021) and Niklas and Ferres (2019). A strong collaboration between project groups characterizes the research output built within the COIN seminar, so this will probably not be the last time this particular topic is investigated; clear and transparent communication is therefore important.
5. CONCLUSION
We intended to contribute to research and practice by conducting a design science approach. We believe that the fundamental limitations are as follows. First, we tried to augment our image dataset using a crowdsourcing approach, meaning that we wanted other people to help us augment our dataset. We explicitly joined social networks (online and in real life), which seemed promising since their members are involved in this topic. However, we found that dog owners and members of the social networks we joined were somewhat sceptical towards work like the one described in this chapter, fearing that their data might be abused. This made it more challenging to obtain the expected amount of data, which in turn affected the accuracy of the implemented ML models (our own and the commercial ones). We labelled the images based on the expressed body posture. It needs to be said that dog emotions are also research in progress from the psychological and biological side, and we are no experts in analysing dogs' emotions. We therefore wanted dog owners to label the pictures themselves and compared their labelling to researchers' findings (Coren, n.d.; Henninger, n.d.; Simpson, 1997) to ensure that the images were labelled correctly. Nevertheless, we are satisfied with the outcome of this research work. Our contribution mainly lies in the learnings we have acquired and the implemented mobile app. This is ongoing research, and in future work the self-built ML model can be further improved to achieve better performance without having to use the commercial options, since these are non-transparent (for an example of the final app, see Appendix C). We also believe that the more images we obtain, the better the model will become. What could also be interesting for further research is the implementation of a reinforcement learning algorithm within the app to make use of the collective intelligence of dog owners, which could result in more precise outcomes. Since domestic animals express similar emotions with similar body postures, the implemented ML model could be used for other domestic animals (e.g. cats) with no change of code. Since this research is aligned with research on animal and dog emotions in general, it could be extended to automatically identify additional emotions whenever research finds out more about animal emotions.
ACKNOWLEDGEMENTS
The authors thank Teresa Heyder for her comments and helpful suggestions on earlier drafts of this chapter, as well as for her guidance throughout the whole project. Furthermore, the authors acknowledge the continuous collaboration with earlier project teams; in particular, we thank Kim Ferres for her advice. Lastly, the authors are grateful to everyone who supported this research by sharing their images with us.
REFERENCES
Affonso, C., Rossi, A. L. D., Vieira, F. H. A., & de Carvalho, A. C. P. de L. F. (2017). Deep learning for biological image classification. Expert Systems with Applications, 85, 114–22. https://doi.org/10.1016/j.eswa.2017.05.039.
Aich, S., Chakraborty, S., Sim, J.-S., Jang, D.-J., & Kim, H.-C. (2019). The design of an automated system for the analysis of the activity and emotional patterns of dogs with wearable sensors using machine learning. Applied Sciences, 9(22). https://doi.org/10.3390/app9224938.
Amici, F., Waterman, J., Kellermann, C. M., Karimullah, K., & Bräuer, J. (2019). The ability to recognize dog emotions depends on the cultural milieu in which we grow up. Scientific Reports, 9(1), 16414. https://doi.org/10.1038/s41598-019-52938-4.
Bloom, T., & Friedman, H. (2013). Classifying dogs' (Canis familiaris) facial expressions from photographs. Behavioural Processes, 96, 1–10. https://doi.org/10.1016/j.beproc.2013.02.010.
Brownlee, J. (2019). Deep Learning for Computer Vision: Image Classification, Object Detection, and Face Recognition in Python. Machine Learning Mastery.
Brugarolas, R., Loftin, R., Yang, P., Roberts, D., Sherman, B., & Bozkurt, A. (2013). Behavior recognition based on machine learning algorithms for a wireless canine machine interface. 2013 IEEE International Conference on Body Sensor Networks. https://doi.org/10.1109/BSN.2013.6575505.
Coren, S. (n.d.). Which emotions do dogs actually experience? Modern Dog Magazine. Retrieved 1 June, 2022, from https://moderndogmagazine.com/articles/which-emotions-do-dogs-actually-experience/32883.
Custom Vision – Microsoft Azure (n.d.). Retrieved 1 June, 2022, from https://azure.microsoft.com/en-us/services/cognitive-services/custom-vision-service/.
Dasiopoulou, S., Mezaris, V., Kompatsiaris, I., Papastathis, V.-K., & Strintzis, M. G. (2005). Knowledge-assisted semantic video object detection. IEEE Transactions on Circuits and Systems for Video Technology, 15(10), 1210–24. https://doi.org/10.1109/TCSVT.2005.854238.
DeepLabCut – The Mathis Lab of Adaptive Motor Control (n.d.). Retrieved 1 June, 2022, from http://www.mackenziemathislab.org/deeplabcut.
Ekman, P., & Friesen, W. V. (1978). Facial Action Coding System (FACS) [Database record]. APA PsycTests. https://doi.org/10.1037/t27734-000.
Ferres, K. (2021). Predicting dog emotions based on posture analysis using machine learning algorithms [Master's thesis]. University of Cologne.
Franzoni, V., Milani, A., Biondi, G., & Micheli, F. (2019). A preliminary work on dog emotion recognition. IEEE/WIC/ACM International Conference on Web Intelligence – Companion Volume, 91–6. https://doi.org/10.1145/3358695.3361750.
Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Henninger, J. (n.d.). Reading canine body posture. The American Society for the Prevention of Cruelty to Animals. Retrieved 10 January, 2024, from https://www.scanimalshelter.org/sites/default/files/Canine_Body_Language_ASPCA.pdf.
Hevner, A. (2007). A three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2), article 4.
Hevner, A., & Chatterjee, S. (2010). Design science research in information systems. Management Information Systems Quarterly – MISQ, 28, 9–22. https://doi.org/10.1007/978-1-4419-5653-8_2.
Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. Management Information Systems Quarterly, 28(1), 75–105.
Homepage – Material Design (n.d.). Retrieved 1 June, 2022, from https://material.io/.
Karmitsa, E., & Tiedottaja, E. T. (2016). Dogs distinguish emotions behind facial expressions. Retrieved 1 June, 2022, from https://www2.helsinki.fi/en/news/health/dogs-distinguish-emotions-behind-facial-expressions.
Kingma, D. P., & Ba, J. (2017). Adam: a method for stochastic optimization. Retrieved 1 June, 2022, from http://arxiv.org/abs/1412.6980.
Kujala, M. (2018). Canine emotions: guidelines for research. Animal Sentience, 2(14). https://doi.org/10.51291/2377-7478.1350.
Mathis, A., Mamidanna, P., Cury, K. M., Abe, T., Murthy, V. N., Mathis, M. W., & Bethge, M. (2018). DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience, 21(9), 1281–9. https://doi.org/10.1038/s41593-018-0209-y.
Meridda, A., Gazzano, A., & Mariti, C. (2014). Assessment of dog facial mimicry: proposal for an emotional dog facial action coding system (EMDOGFACS). Journal of Veterinary Behavior, 9(6), e3. https://doi.org/10.1016/j.jveb.2014.09.012.
Niklas, L., & Ferres, K. (2019). Creating a smart system to detect dog emotions based on facial expressions [Seminar thesis, COINs Seminar 2019]. University of Cologne.
Österle, H., Becker, J., Frank, U., Hess, T., Karagiannis, D., Krcmar, H., Loos, P., Mertens, P., Oberweis, A., & Sinz, E. J. (2011). Memorandum on design-oriented information systems research. European Journal of Information Systems, 20(1), 7–10. https://doi.org/10.1057/ejis.2010.55.
Progressive Web Apps (n.d.). Web.Dev. Retrieved 1 June, 2022, from https://web.dev/progressive-web-apps/.
React – A JavaScript Library for Building User Interfaces (n.d.). Retrieved 1 June, 2022, from https://reactjs.org/.
Rohr, K. (2001). Introduction and overview. In K. Rohr (ed.), Landmark-Based Image Analysis: Using Geometric and Intensity Models (pp. 1–34). Springer Netherlands.
Simpson, B. (1997). Canine communication. Veterinary Clinics of North America: Small Animal Practice, 27(3), 445–64. https://doi.org/10.1016/s0195-5616(97)50048-9.
Solomon, C., & Breckon, T. (2011). Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab. Wiley.
Tracy, J. L., & Robins, R. W. (2004). Putting the self into self-conscious emotions: a theoretical model. Psychological Inquiry, 15(2), 103–25. https://doi.org/10.1207/s15327965pli1502_01.
Tsai, M.-F., Lin, P.-C., Huang, Z.-H., & Lin, C.-H. (2020). Multiple feature dependency detection for deep learning technology – smart pet surveillance system implementation. Electronics, 9(9), 1387. https://doi.org/10.3390/electronics9091387.
The History Place – Great Speeches Collection: George Graham Vest Speech – A Tribute to Dogs (1855) (testimony of George Graham Vest). Retrieved 1 June, 2022, from https://www.historyplace.com/speeches/vest.htm.
Waller, B., Caeiro, C. C., Peirce, K., Burrows, A., & Kaminski, J. (2013, March 15). DogFACS: the dog facial action coding system. University of Portsmouth. Retrieved 1 June, 2022, from http://dogfacs.com/manual.html.
APPENDIX A
Figure 17A.4 Adjusted design science research process
APPENDIX B
Figure 17B.5 Confusion matrix of our model using DeepLabCut
APPENDIX C
Figure 17C.6 Screenshots of the final app
PART VII TEACHING AI FOR SOCIAL COMPUTING
18. Say 'yes' to 'no-code' solutions: how to teach low-code and no-code competencies to non-IT students
Monika Sońta and Aleksandra Przegalinska
1. INTRODUCTION
Most of the barriers to using AI have, in one way or another, been related to the difficulty of mastering a formal language and the time needed to do so. The cumbersome path from programming and coding, to data science and data analytics, and further on to machine learning and deep learning, seemed to some like a never-ending story with no happy ending. Effectively, this prevented AI from going mainstream and becoming an Internet-like general-purpose technology (Ciechanowski et al., 2018). This research aims to identify the main challenges and barriers for non-IT students learning low-code skills in order to propose a set of practices, the 'Teaching Tips', that help course leaders teach low-code solutions (Wang & Hall, 2021). Even though there is a huge demand for AI technologies right now, very few AI experts are available; thus, low-code and no-code platforms are often promoted as 'democratizers', making it possible for anyone to create an app without any coding experience. The authors also discuss the democratization of AI (Jemielniak & Przegalinska, 2020), which demands novel approaches to learning to code among non-technical individuals. The originality of this research lies in the application of a creative explorations approach and in distilling insights from participant stories through projective, metaphor-based techniques such as LEGO® SERIOUS PLAY® and Playmobil.pro®.
2. LITERATURE REVIEW OR BACKGROUND
2.1 Definition of Low-code Development Platforms
With no-code software, developers can easily create full-featured apps without having to learn to code. In most cases, this type of software doesn’t require coding experience. No-code software is commonly used by non-technical individuals to create full-featured apps. They can simply drag and drop software elements into the application using a visual development interface. A low-code platform is a type of software that enables developers to create code quickly and easily using a graphical user interface (GUI). This type of software can be used to develop applications that are simple to understand and implement. We understand low-code and no-code tools (Shaikh, 2020) as solutions designed to simplify the process of creating software 330
applications, making it possible for people with little or no coding experience to build them. Owing to this ease of use, developers can create simple and effective software applications without writing complex code, and can draw flowcharts using a visual editor. The term ‘low-code development platform’ (LCDP) appeared in a commercial report by Forrester in 2017 (Hammond, 2017): “platforms that enable rapid delivery of business applications with a minimum of hand-coding and minimal upfront investment in setup, training, and deployment” (Di Ruscio et al., 2022: 438). Another definition calls an LCDP a “novel paradigm for developing software applications with minimal hand-coding through visual programming, a graphical user interface, and model-driven design” (Alamin et al., 2023: 2). This approach embodies end-user software programming. Low-code solutions are based on ‘drag-and-drop’ software blocks that fit into a structure and can form an adjustable template (framework); the individual builds the desired behavior from already existing blocks. The entry barrier is lowered even for novice learners, who work with graphical blocks and can mix and match the elements of templates. A ‘prompting-like’ style of coding lets the individual focus on understanding the logical flows and data infrastructure rather than on writing the code itself (Tan-a-ram et al., 2022, p. 2). Such a simple syntax, with frequent moments to test that the blocks work properly, makes the approach less prone to human error and more conducive to testing and experimenting with the blocks used in a solution. Moreover, drag-and-drop instruction blocks help learners grasp data structures and wireless sensor network programming (Barkana & McDonough, 2019), enable integration with various systems, and create opportunities to apply coding to real-world implementations of embedded systems, opening the door to more advanced coding solutions and tool companions that enhance the quality and stability of business application software.
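To make the block metaphor concrete, the following minimal sketch mimics in plain Python what a block-based runtime does behind its visual editor. It is purely illustrative and not any vendor’s API: the block names, the app_spec structure, and the tiny run() interpreter are all invented for this example.

# A hypothetical catalogue of pre-built blocks; in a real platform these
# would be the widgets the user drags and drops.
PRE_BUILT_BLOCKS = {
    "load_rows": lambda cfg, data: list(cfg["rows"]),                       # stands in for a data-source block
    "filter":    lambda cfg, data: [r for r in data if r[cfg["field"]] >= cfg["min"]],
    "show":      lambda cfg, data: print(cfg["title"], data) or data,       # stands in for a UI widget block
}

# What the "developer" actually edits: a declarative list of blocks,
# not code. This mirrors dragging blocks onto a canvas.
app_spec = [
    {"block": "load_rows", "rows": [{"score": 3}, {"score": 9}]},
    {"block": "filter", "field": "score", "min": 5},
    {"block": "show", "title": "High scores:"},
]

def run(spec):
    data = None
    for step in spec:
        data = PRE_BUILT_BLOCKS[step["block"]](step, data)  # each block consumes the previous block's output
    return data

run(app_spec)  # prints: High scores: [{'score': 9}]

The point of the sketch is that the user only rearranges the declarative app_spec list; the pre-built blocks themselves are never edited, which is exactly what lowers the entry barrier described above.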
2.2 Low-code Development Platforms as Agile Software Development Life Cycle (SDLC) Supporters
Clay Richardson used the term ‘low-code development’ in 2014 in a report for Forrester Research (Richardson & Rymer, 2014). The same report, “New Development Platforms Emerge for Customer-Facing Applications”, positions low-code solutions in the area of technology management and enterprise architecture (EA), showing, on the one hand, the great benefit of a faster-than-usual application development and delivery (AD&D) process and, on the other, discussing long-term fit and scale risks (Richardson & Rymer, 2014). Five segments of low-code platforms – database, mobile, request handling, process, and general purpose – based on two dimensions of categorization (scenario flexibility vs functional breadth), are developed in “Vendor Landscape: The Fractured, Fertile Terrain of Low-Code Application Platforms” (Richardson & Rymer, 2016, p. 6). Simplification can also be seen in the way these tools make it easier to deploy and manage applications – a further confirmation of the reduced complexity of coding activities, with probable limitations in the scalability and security of the solutions (Hammond, 2017). This novel paradigm is also discussed as an accelerator that supports an agile software development life cycle (SDLC), especially when we talk about
general-purpose low-code platforms and multidimensional areas of application. Because of their ease of use, low-code platforms are widely used to develop software applications for industries such as the Internet of Things (IoT), edge computing, and artificial intelligence. According to Galhardo and Silva (2022), low-code platforms demand mastery and precision when it comes to setting the requirements. Their benefits are most visible in the prototyping phase, for example by transforming an Application Specification Language (ASL) into a software application via a low-code platform (Luo et al., 2021). These platforms also often include features that automate setting up and maintaining a development or test environment, which can save significant time and effort, particularly for larger applications. Low-code and no-code tools can likewise reduce the cost of developing and deploying applications: by simplifying the process and making it possible for more people to get involved, they reduce the need for still scarce specialists such as developers and testers. As noted in the introduction, the difficulty of mastering a formal language has been one of the main barriers keeping AI from going mainstream (Bisser, 2021; Ciechanowski et al., 2019). Finally, low-code solutions use platform-independent specification languages that enable software development by practitioners from different backgrounds (e.g., Oracle’s APEX, which uses basic SQL elements); an overall orientation in the flow of programming logic is recommended but not required. This holds not only in the APEX environment but also on other low-code development platforms such as Mendix, OutSystems and Appian. By making the process of developing software applications easier, these tools can be used by a wider audience.
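To illustrate how little hand-written code a SQL-centric low-code platform expects, the sketch below stands in for an APEX-style report page using only Python’s standard-library sqlite3 module. This is not APEX itself, and the employees table and its columns are invented for the example; the point is that the developer supplies one basic SQL statement and the platform generates the user interface around the result set.

import sqlite3

# Set-up that a low-code platform would hide entirely (invented sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, role TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Ada", "developer", 9000), ("Ben", "tester", 7000)])

# The one line a low-code developer would actually write:
page_query = "SELECT name, role FROM employees WHERE salary > 8000"

# Everything from here on stands in for what the platform renders for free.
for row in conn.execute(page_query):
    print(row)  # ('Ada', 'developer')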
2.3 Low-code Development Platforms as New Coding Learning Enablers
From the point of view of the individual developer, non-technical individuals can effortlessly develop and deploy business-grade apps through a low-code visual design interface, using universal and customizable templates. More advanced content-detection tools, based on keywords or colors, increase the quality of the information inserted into templates. Some researchers describe this process of app design and creation as facilitation rather than coding; other descriptions are closer to ‘prompting the content’, as the essence is choosing the right combination of ready-to-be-used elements and displaying them to create something new. Part of the low-code infrastructure consists of reusable language components, configured so that generic components are combined into customized solutions.
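A deliberately simple stand-in for such reusable components follows, using Python’s built-in string.Template. The card component and its parameters are invented for illustration and come from no real platform; the sketch only shows that the learner mixes and matches parameters instead of editing the component itself.

from string import Template

# One generic, reusable component (a hypothetical HTML card).
card_component = Template("<div class='card $theme'><h2>$title</h2><p>$body</p></div>")

# Customization = choosing parameter values, never touching the template.
print(card_component.substitute(theme="dark", title="Welcome", body="Hello, no-code world!"))
print(card_component.substitute(theme="light", title="Stats", body="42 users online"))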
Thanks to the intuitive instructional design of low-code platforms, this novel paradigm is said to be a way to democratize programming. Demand for low-code platforms is generated by:
● a shortage of engineers
● rising demand for enterprise-grade business apps
● a need for upskilling of junior IT professionals, as only 39.6 percent of candidates for IT job opportunities fulfill all the employer’s standards (Forrester, 2017).
In summary, the literature discusses low-code platforms in three main contexts:
1. Efficiency, through business automation and smart manufacturing, using custom purpose-built software solutions to design digital twins (Dalibor et al., 2022). As demand rises, the advantages of an LCDP for manufacturing include shorter development life cycles, faster implementation of new processes, more efficient workflows, and less need for specialized IT (Waszkowski & Bocewicz, 2022). For digital twins, for example, a two-step method facilitates creating tailored low-code development platforms as well as creating and operating customized digital twins for a variety of applications.
2. A general trend: an efficient way to develop applications. In the educational context, coding is regarded as a gateway to computational thinking for non-technical individuals experimenting with block-based coding environments. With visual interface simplifiers such as icon-based or block-based programming, learning curiosity and the sense of creating an end-to-end working element are higher. Moreover, the creation of “playful, relaxed environments for students to have fun exploring and tinkering with the designs they have created” is a recurring theme in design thinking and engineering education design (Psenka et al., 2017, p. 14).
3. The democratization of AI-integrating solutions (Hedberg, 1998): further developments such as generative AI will only support the flourishing of low-code solutions, with visual templates generated and customized, creating high-growth opportunities for adjusting templates based on keywords in the content and developing low-code solutions into automated software generation.
2.4 Low-code Development Platforms to Democratize AI
These characteristics enable impact at a massive scale on the way to the democratization of AI. Low-code and no-code artificial intelligence tools are designed to make it easier for people with little or no programming experience to build and use AI models. These tools typically provide a user-friendly interface that allows users to build, train, and deploy AI models without writing any code (see the sketch at the end of this section). There are several ways in which low-code and no-code AI tools can contribute to the democratization of artificial intelligence:
● They make it easier for non-technical users to get started with AI: by providing a user-friendly interface and pre-built models, low-code and no-code AI tools lower the barrier to entry for people who want to use AI but don’t have programming experience.
● They allow for rapid prototyping and experimentation: low-code and no-code AI tools allow users to quickly build and test prototypes of AI models, enabling rapid experimentation and iteration.
● They can help to democratize access to AI: by making it easier for people to use AI without requiring specialized knowledge or resources, low-code and no-code AI tools can make AI more widely available (Masood & Hashmi, 2019).
● They encourage organizations to adopt a data-driven decision-making approach (Cai & Zhu, 2015) and to see themselves as data-centric systems (Faulkner & Nicholson, 2020).
● They can facilitate the development of AI applications by a wider range of organizations: low-code and no-code AI tools enable organizations of all sizes, including small businesses and start-ups, to develop and deploy AI applications, which can help to drive innovation and increase the adoption of AI (Pipino et al., 2002).
This approach can be smoothly linked to empowerment through tech-savviness, and one of the most interesting findings is that low-code tools can empower the youth and boost curiosity to experiment with coding, reaching a sense of co-creation of realities (Papert & Harel, 1991). The hope is that this reduction of barriers to app development will lead to more innovative and useful applications, as well as a wider range of people being able to use them (Kobayashi et al., 2019). Finally, the most promising interpretation leading to the democratization of AI and empowerment through tech-savviness is the scale of potential growth: low-code platforms are named ‘the next big thing’; according to Gartner (2021), by 2024 around 65 percent of customized IT application development will be generated this way; and the global market for low-code platforms should reach $68.84 billion by 2026 at a CAGR of 28.6 percent (PR Newswire, 2022).
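What a no-code AI tool wraps behind its ‘build, train, deploy’ buttons is, at its core, a short conventional training script. The sketch below shows one such script using scikit-learn and its bundled iris dataset; it is a generic illustration of the code path these tools automate, not the internals of any particular product.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)                 # the "upload your data" step
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "choose a model" dropdown, as code:
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)                       # the "train" button
print("held-out accuracy:", model.score(X_test, y_test))  # the "evaluate" panel

Hiding these dozen lines behind a graphical interface is precisely the ‘democratizing’ move the tools discussed above promise.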
3. RESULTS

3.1 Research Design
From September to December 2022 the authors conducted six focus groups with Kozminski University students, who were asked to share their viewpoints on what supports and what limits their learning journey. These students had no previous experience with programming. The research questions were as follows:
● What is the most challenging element in learning to code?
● What is the most rewarding element of learning to code?
The research was conducted within a human-centered design approach, with an emphasis on exploring and identifying the challenges that may slow down the process of learning. Each focus group was a large-scale (more than 16 participants) interactive workshop using one of the metaphor-based projective techniques taken from visual thinking (the collage method) or serious play approaches (LEGO® SERIOUS PLAY®). Thanks to the application of projective techniques, the participants creatively explored their identities and expressed their experiences through individual stories. Additionally, the workshop itself was a social experience, and the stories emerged from collective creative explorations (Gauntlett, 2007).
Before collecting insights from students, we asked 21 academic teachers from Central and Eastern European countries about their observations on how to teach low-code skills. This 2.5-hour focus group was part of a project meeting of Better Employability with APEX (https://beeapex.eu/), co-financed by the European Union. Twelve of the 21 were experienced academic teachers who lead courses covering low-code skills and were preparing the manual for Oracle’s APEX low-code development platform.

Table 18.1  Focus group profiles

Date | Participants | Technique of facilitation
September 2022 | 21 participants, teachers of IT-related academic courses from Poland, Slovenia, Slovakia, Greece, and Croatia | LEGO® SERIOUS PLAY® + Playmobil.pro
October 2022 | 3x30 students in the first year of Management at Kozminski University | Collage, vision boards
October 2022 | 1x18 part-time students (experienced professionals), students of Economics at Kozminski University | LEGO® SERIOUS PLAY®
November 2022 | 2x27 students of full-time studies | Collage, vision boards

Source: Authors’ elaboration.
The themes distilled from the conversations were derived through thematic categorization (a toy illustration of the saturation tallying follows below). Each focus group was a creative workshop in which projective techniques such as vision boards, Playmobil.pro, or LEGO® SERIOUS PLAY® framed the discussion. Each group or individual, depending on the exercise, shared stories answering the questions about the challenges/blockers and enablers/supporters of a coding learning journey.
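Purely as an illustration of what ‘saturation’ means in this kind of thematic tallying, the toy fragment below counts how often each theme recurs across coded statements. The authors’ coding was done qualitatively, not with a script; the statements, theme labels, and the code itself are invented for this example.

from collections import Counter

# (focus group, theme assigned by the researcher) – invented sample codes
coded_statements = [
    ("FG1", "overwhelmed"), ("FG1", "where to start"),
    ("FG2", "overwhelmed"), ("FG2", "tedious work"),
    ("FG3", "overwhelmed"), ("FG3", "where to start"),
]

# Themes that recur across groups are the 'saturated' ones.
saturation = Counter(theme for _, theme in coded_statements)
for theme, count in saturation.most_common():
    print(f"{theme}: {count} coded statements")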
3.2 Creative Facilitation as a Method Used in Focus Groups
According to various researchers (Ribeiro, 2012; Gauntlett, 2007), social meaning is generated through creative connecting and by enabling a space to share. Workshop participants come together, produce their ideas individually, and contribute to the creation of new ideas and the general vision of the group; in this way a participatory culture is created. This type of exchange resembles a creative allotment and is associated with the empowerment of marking the presence of the individual in the group. A sense of exploration and ‘agency’ plays a crucial role in the process of learning. The workshops, with their projections and metaphorical individual stories, help communicate the individuals’ personal beliefs, values, and assumptions about challenges; then, through a transformational dialogue, meaning is attached to the group stories. The process of forming an interpretation is collaborative and generative. Writing about psychological knowledge in social constructionism, Gergen (1973) claims that participants in social interactions aim at integrating into the community and reconciling individuals with social life, and not only communicate what appears but also “subtly prescribe what is desirable” (Gergen, 1973, p. 311).
3.3 Key Findings

3.3.1 Challenges/blockers in the low-code learning journey
Experience No. 1: “Confusing and overwhelming”
The leading theme is that learning how to code is a “confusing and overwhelming experience”. The promises and benefits seem unlimited – it is all about high-growth opportunities – yet these promises are abstract and blurred, paired with the feeling that this is not a quick win at all but rather tedious work with many “give-up” points along the way. One of the participants called it “a promised land” but “without a clear path and no obvious starting point”. The dominant perception is that, having no prior experience, learners are entering unrecognized territory. One of the students compared the experience to learning a new foreign language: without a real conversation, the probability of giving up is high, as the vision of success stays blurred.
Experience No. 2: “I don’t know where to start from”
There are too many tools and programming languages. It’s hard to get orientation on what will be the most useful: Python, Java, R? You don’t know what you will need it for, and you need to choose, and declare without having orientation what are the purpose and practical application.
The students in the part-time group mentioned that requirements differ from one company to another, so the decision is dictated by the requirements of the company. They also noticed various possibilities to join programming crash courses, which they view positively. However, the conditions imposed after completing such a program can limit career choices: as part of the agreement, participants must join a project indicated by the course organizers or providers, usually tied directly to the company that funds the ‘open free program’ in coding.
Experience No. 3: “Tedious work with long-term commitment”
In this experience, success is seen as very abstract. Even with a growing ability to code some small pieces, the student doesn’t see the final picture of success. An analogy to studying appeared: “when you study, you know that it will be a kind of exam and finally graduation”. The fear is related also to the feeling of being a ‘never-ending beginner’: in this learning scheme, if you start something, it is hard to say that you have mastered it, as there are always new languages and new tools to learn. If you stop learning due to other priorities, you keep thinking that something is not completed, so you come back to learning, knowing that otherwise opportunities are wasted. An illustrative quote is as follows:
At first you see only challenges. You need to invest a lot of energy to understand the whole concept.
Stories about investing energy in something seen as a long-term investment occurred in every workshop. Another illustration is the analogy to a ‘fireman’s job’ – the work is hard, you always need to be in training, you can always perform better and improve your skills, and there are not many occasions for real practice of those skills.
Experience No. 4: “Pressure to learn, but no clear reason why” and “course to survive and pass on”
People around you put this pressure on you: “you should get orientation at least to understand and follow reality”, and at the same time you hear that “it is not necessary to code as sooner or later we will be working with efficient no-code tools and all you need to do is to use the technology to translate the requirements and generate no-code solutions”.
On one side there is a pressure that we should be able to code to survive in the world, that this is a competence that will be needed in the future. On the other hand, we hear that there will be no-code reality and we don’t need to learn languages or programming as we will have an automated translation or no-code solution.
A few stories explained this type of pressure: there are courses “we don’t like and don’t find practical, but we need to ‘survive them’ to complete our studies”; “If learning programming was mandatory, I would treat it as a subject to ‘survive’.” A recommendation for a solution emerges in the same story:
It would be great to be ready with the basic skills, basic understanding of tools and solutions and accelerate the learning when starting the work. Now, I don’t need it. I see it as something that demands a lot of effort and I have too many things on my plate to treat it as a separate skillset to learn.
3.3.2 Solutions/enablers in the low-code learning journey
As the workshops were led in a problem–solution scheme, the stories provided directions on how to deal with the challenges.
Recommendation No. 1: “Give yourself a community boost”
The highest saturation of stories related to working together, as “we need an extra boost of energy to move forward”. Working together was described as a space for discussing problems, and community stands for a motivation boost in the process of learning. In the eyes of the participants, the community is an occasion to observe people experienced in programming. It is not only about role-modeling or mentoring, but more about the opportunity to stay motivated to continue learning. The stories describe informal events: “It should be like a meet-up with an opportunity to ask questions and discuss problems.”
Recommendation No. 2: “Benefits are both: money and lifestyle”
In each group, the benefits started with money. In the groups building models from LEGO bricks, one expression was a model of ‘treasure’ – a box full of money – while students who were drawing expressed the vision of a junior developer’s high salary in numbers. The money factor was also mentioned by the part-time students, who already have some experience in the job market; they mentioned a new start with programming skills and a better-quality job. “Of course, it is all about money, especially when we talk about the students and they’re entering the job market”, or, as another participant said, “It is easier to reach a higher wage ceiling just after graduating”. Talking about investing their energy in programming, students mentioned that this is just one of many skills they would need to master. “Programming should be just one element of the reality, not everything, we need this wider perspective” – a statement illustrating that talk of programming as an opportunity to enter a new world full of
opportunities is overwhelming. Students mentioned that they do not want to be independent IT professionals; IT and programming are just one of many perspectives at their disposal that should empower the realization of their professional visions and ambitions:
Recommendation No. 3: “It should be a cheerful opportunity for celebrations” As there is too much pressure around the demand to learn to program we should change the narrative to express that this is an opportunity to never stop learning and not studying just for a certificate, or confirmation of skills. “We should have a student-minded attitude, open to learning without the expectation to master something soon”, one of the students said, adding also that competition of studying or finishing the certification training should be just the beginning of the learning journey. Many of the stories were related to the celebration of small steps on the way like a celebration of new skills and not just final success. In those stories, the essence was about celebrating those moments together with other learners. Switching the narrative into the direction of lifestyle can be illustrated with the statements such as “the best in coding is this feeling that you are ‘smart’”. The change in the storytelling about the programming could be directed toward lowering the entry barrier to enter this reality. You learn this language of programming to understand the universal syntax and the framework. The essence of this approach is understanding the logic of coding is like a universal framework that should be learned. This is more about learning the scheme, the flow of logic, and reasoning to be able to customize the universal elements. It is like ‘prompting’ instead of learning how to code. The knowledge is about how to mix and match the elements to generate the solution. Recommendation No. 4: “It is a good feeling when you build something” – entering the world of creativity and agility Another insight is related to the agile reality they feel they can enter when they learn to program. One of the metaphors is ‘getting hard skin, tough skin’. The student told us that he wished he had learned to program earlier as it would be a road map of gradual growth. Now, when studying Management, he learns about innovative solutions and agility and the approach to failures, but this is not something he understands. He used a ‘hard skin’ metaphor to illustrate the long-learning journey with lots of failures and experimentations and the iterative way to learn new skills and build his resilience when it comes to the way he gains new skills and mindsets.
4. DISCUSSION AND PRACTICAL IMPLICATIONS FOR TEACHERS AND MENTORS OF JUNIOR IT PROFESSIONALS
Having identified the key findings about enablers and blockers when learning low-code skills (Table 18.2), the discussion centers on insights from the workshop with the academic IT teachers. They were asked about blockers and enablers when learning low-code solutions, and then completed projective exercises to understand the reality of
students and empathize with them. Finally, they were asked about the key elements that build a friendly learning environment for non-technical students.
1. Curiosity and excitement. “The learning environment should be exciting, not just stressful. It should be more like an informal discussion than a ‘learning session’” – this statement is in line with the concept of learning through self-discovery. As learning to code is a long-term commitment with a vision of hard work, there should be various sources of curiosity with various ‘checkpoints’. For example, the teachers mentioned hackathons as an efficient way to be part of a team and contribute to its success from a non-technical perspective, with the opportunity to confront one’s mindset with colleagues from different educational backgrounds.
2. Failure-friendliness and experimentation mode. “Failures are important when you learn how to program. The mindset that the failures are building blocks of your success is needed, so make your failures” – this aligns with experimentation, rapid prototyping, testing new solutions, and an iterative approach to learning.
3. Taking care of mental health and a healthy workstyle. “This environment should insist on work–life balance as, when you are focused on coding (building an app, for example), it is easy to forget about the world until you make the code work and solve the problem.” As coding creates a state of never-ending learning, teachers can use this opportunity to show students how to hold the tension of ‘unfinished work’ or a ‘jobs to be done’ attitude. “It is more like a long process and, if you succeed, the reward will be consistent and long-term. Now, the narrative is more like programming is a Holy Grail of all.” Furthermore, one of the workshop participants said: “This learning journey is long, you need to understand how to navigate” – in this context, a programming class can develop other life competencies in the background. Technical knowledge is just one part of the story; the syllabus can carry social skills – resilience, agility, a healthy mind, a sense of creation, and cooperation – in the background.
4. Self-paced learning and self-awareness of your learning style. One of the key insights from the workshop with the academic teachers was confirmation of a hectic and stressful environment that leaves students feeling overwhelmed. The main approach should be self-paced learning with checkpoints and moments to talk to other people. Dividing the journey into smaller pieces supports the gradual increase of skills: “The more you learn, the more advancement in the tool you get.” The challenge is that non-IT students usually learn to program in short-term courses and then at events such as hackathons, without reinforcement moments along the way; the learning process is not designed for gradual consistency but resembles a ‘checklist’ of tasks.
Table 18.2  Summary of blockers and enablers when learning to program

Blockers | Enablers
Feeling of being overwhelmed | Group work – supporting each other; community boost
Confusion about where to start | Quality of working life – building the feeling of empowerment and agency through tech-savviness; money; high-profile opportunities; employability
Tedious work/hard work | Positive lifestyle – learning the flow of logic, not the programming language
Blurred picture of success | Agency, creativity, and resilience

Source: Authors’ elaboration.
4.1 Limitations

The originality of this study rests on the creative techniques used during the focus groups to capture the essence of the stories around the participants’ experiences. The level of insight was sufficient to form findings based on the saturation of statements appearing in the participant discussions. Secondly, two perspectives were combined: the 162 students with no prior programming experience and the 21 experienced academic teachers who design low-code skills courses. Future research on the learning path, its enablers, and its blockers using a quantitative approach, with audiences from cultural backgrounds other than Central and Eastern Europe, would be recommended.
5. CONCLUSION

The conclusion is that courses should start with a ‘creation and discovery’ narrative around ‘trying new skills just out of curiosity’, with the option to leave after a trial period. “For some, too much pressure, too many stressful moments are too much, but if you are the type of person that like discovering things, this is something for you” – so courses should be inviting in a ‘try and see if you like it’ style instead of a ‘the-developer-to-be’ style. “The attitude that the programming opportunities are the great opener to the dream job is harmful”, adds one of the professors. A low-code learning approach should promote self-directed learning agency; easy access to so-called wizard content creators and the possibility to apply ready-to-use elements such as cloud-based databases or pre-defined templates also strengthen the user’s self-agency and self-directed learning (Bull, 2017). Low-code learning design should also include a collaborative atmosphere with curiosity-driven learning (Burda et al., 2019), and low-code elements could be ‘embedded’ in traditional courses rather than taught as a separate academic course. Low-code tools enrich the classroom with both individual and reflective learning opportunities and enable a connective learning environment. Finally, the authors emphasize how low-code tools can empower the youth and boost curiosity to experiment with coding, reaching a sense of co-creation – which also opens a discussion about the impact of empowerment and agency building through tech-reskilling and upskilling.
ACKNOWLEDGMENTS

The authors are working on the ‘BEE with APEX’ (Better Employability with APEX) project (for more details, see https://beeapex.eu/). BEE with APEX introduces a set of educational tools and solutions for no-code and low-code development. This chapter was financed by grant no. 2021-1-SI01-KA220-HED-000032218.
REFERENCES

Alamin, M. A. A., Uddin, G., Malakar, S. et al. (2023). Developer discussion topics on the adoption and barriers of low code software development platforms. Empirical Software Engineering, 28(4), 1–59. https://doi.org/10.1007/s10664-022-10244-0.
Barkana, B. Z., & McDonough, W. (2019). AP computer science principles: designing the hour.ly app in MIT App Inventor. In Proceedings of the 2019 IEEE Long Island Systems, Applications and Technology Conference (LISAT), Farmingdale, NY, USA, May 3, pp. 1–6.
Bisser, S. (2021). Introduction to the Microsoft Conversational AI platform. In Microsoft Conversational AI Platform for Developers. https://doi.org/10.1007/978-1-4842-6837-7_1.
Bull, B. (2017). Adventures in Self-Directed Learning: Nurturing Learner Agency and Ownership. Wipf & Stock.
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., & Efros, A. A. (2019). Large-scale study of curiosity-driven learning. In 7th International Conference on Learning Representations (ICLR 2019), pp. 1–17. https://openreview.net/forum?id=rJNwDjAqYX (last accessed 28 December 2023).
Cai, L., & Zhu, Y. (2015). The challenges of data quality and data quality assessment in the big data era. Data Science Journal, 14, 2, 1–10. http://dx.doi.org/10.5334/dsj-2015-002.
Ciechanowski, L., Przegalinska, A., Magnuski, M., & Gloor, P. (2019). In the shades of the uncanny valley: an experimental study of human–chatbot interaction. Future Generation Computer Systems, 92, 539–48. https://doi.org/10.1016/j.future.2018.01.055.
Dalibor, M. et al. (2022). Generating customized low-code development platforms for digital twins. Journal of Computer Languages, 70. https://doi.org/10.1016/j.cola.2022.101117.
Di Ruscio, D., Kolovos, D., de Lara, J. et al. (2022). Low-code development and model-driven engineering: two sides of the same coin? Software and Systems Modeling, 21, 437–46. https://doi.org/10.1007/s10270-021-00970-2.
Faulkner, A., & Nicholson, M. (2020). Data-centric systems. In Data-Centric Safety. https://doi.org/10.1016/b978-0-12-820790-1.00018-8.
Galhardo, P., & Silva, A. R. D. (2022). Combining rigorous requirements specifications with low-code platforms to rapid development software business applications. Applied Sciences (Switzerland), 12(19), 1–26. https://doi.org/10.3390/app12199556.
Gartner (2021). Gartner says the majority of technology products and services will be built by professionals outside of IT by 2024. https://www.gartner.com/en/newsroom/press-releases/2021-06-10-gartner-says-the-majority-of-technology-products-and-services-will-be-built-by-professionals-outside-of-it-by-2024.
Gauntlett, D. (2007). Creative Explorations: New Approaches to Identities and Audiences. Routledge.
Gergen, K. J. (1973). Social psychology as history. Journal of Personality and Social Psychology, 26(2), 309–20. https://doi.org/10.1037/h0034436.
Hammond, J. (2017). The Forrester Wave: mobile low-code development platforms, Q1 2017. Forrester Research. https://www.forrester.com/report/The-Forrester-Wave-Digital-Experience-Platforms-Q3-2017/RES137663.
Hedberg, S. R. (1998). Is AI going mainstream at last? A look inside Microsoft Research. IEEE Intelligent Systems and their Applications, 13(2), 21–5.
Jemielniak, D., & Przegalinska, A. (2020). Collaborative Society. The MIT Press.
Kobayashi, Y., Ishibashi, M., & Kobayashi, H. (2019). How will ‘democratization of artificial intelligence’ change the future of radiologists? Japanese Journal of Radiology, 37(1), 9–14.
Luo, Y., Liang, P., Wang, C., Shahin, M., & Zhan, J. (2021). Characteristics and challenges of low-code development: the practitioners’ perspective. In Proceedings of the International Symposium on Empirical Software Engineering and Measurement. https://doi.org/10.48550/arXiv.2107.07482.
Masood, A., & Hashmi, A. (2019). Democratization of AI using cognitive services. In A. Masood & A. Hashmi (eds), Cognitive Computing Recipes: Artificial Intelligence Solutions Using Microsoft Cognitive Services and TensorFlow (pp. 1–17). Apress.
Papert, S., & Harel, I. (1991). Situating constructionism. In I. Harel & S. Papert (eds), Constructionism (pp. 1–17). Ablex Publishing Corporation.
Pipino, L. L., Yang, W. L., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–18.
PR Newswire (2022). Low-code development platform global market to reach $68.84 billion by 2026. PR Newswire US, September 20. https://search.ebscohost.com/login.aspx?direct=true&db=bwh&AN=202209200645PR.NEWS.USPR.IO78299&site=eds-live (last accessed 28 December 2023).
Psenka, C. E., Kyoung-Yun, K., Okudan Kremer, G. E., Haapala, K. R., & Jackson, K. L. (2017). Translating constructionist learning to engineering design education. Journal of Integrated Design & Process Science, 21(2), 3–20. https://doi.org/10.3233/jid-2017-0004.
Ribeiro, E. M. P. (2012). Making is connecting: the social meaning of creativity, from DIY and knitting to YouTube and Web 2.0. Comunicação e Sociedade, 22, 206–10. https://doi.org/10.17231/comsoc.22(2012).1282.
Richardson, C., & Rymer, J. (2014). New development platforms emerge for customer-facing applications. Forrester Research. https://www.forrester.com/report/New+Development+Platforms+Emerge+For+CustomerFacing+Applications/-/E-RES113411 (last accessed 28 December 2023).
Richardson, C., & Rymer, J. R. (2016). Vendor landscape: the fractured, fertile terrain of low-code application platforms. Forrester Research. https://www.forrester.com/report/Vendor-Landscape-The-Fractured-Fertile-Terrain-Of-LowCode-Application-Platforms/RES122549 (last accessed 28 December 2023).
Shaikh, K. (2020). AI with low code. In Demystifying Azure AI. https://doi.org/10.1007/978-1-4842-6219-1_5.
Tan-a-ram, S., Leelayuttho, A., Kittipiyakul, S., Pornsukjantra, W., Sereevoravitgul, T., Intarapanich, A., Kaewkamnerd, S., & Treeumnuk, D. (2022). KidBright: an open-source embedded programming platform with a dedicated software framework in support of ecosystems for learning to code. Sustainability, 14(21), 14528. https://doi.org/10.3390/su142114528.
Wang, H., & Hall, N. C. (2021). Exploring relations between teacher emotions, coping strategies, and intentions to quit: a longitudinal analysis. Journal of School Psychology, 86, 64–77. https://doi.org/10.1016/j.jsp.2021.03.005.
Waszkowski, R., & Bocewicz, G. (2022). Visibility matrix: efficient user interface modelling for low-code development platforms. Sustainability, 14(13), 8103. https://doi.org/10.3390/su14138103.
Index

accuracy 315
activation function 314
advertising 216, 219, 250–51, 259–60, 286–7
age (predicting YouTube success study) 151
agile software development life cycle (SDLC) 331–2
Amazon Web Services Rekognition 322
anger 164–5, 167–9, 175, 181, 186, 319–20, 321–2
application programming interfaces (APIs) 55, 56, 62, 143, 144, 179, 196, 268–9, 323
artefacts 316–23
artificial intelligence (AI) see low-code and no-code competencies, teaching
artificial neural networks (ANNs) 313–14, 320
asset price prediction approaches 49–50
association mining 108
automated text mining 29
automotive industry see congruence between customer and brand personality
Aven, T. 52
basic text analysis 108
batch size 314
betweenness centrality 221, 223, 224, 225
Biden Administration, use of social media 75
Big Five Personality Model 192
biosensors see measuring emotions of jazz musicians using plants as biosensors; plants as biosensors
Bitcoin 50
Black Swan events see predicting Black Swan events in cryptocurrency markets
blockchain 50
body posture of dogs, and emotion recognition see emotion recognition in dogs
body sensors for dogs 316; see also measuring emotions of jazz musicians; measuring emotions of jazz musicians using plants as biosensors
body signals and movement 173, 177–8, 185–6
Bollinger Bands (BB) 58
bots, and disinformation 100–101, 104, 128–31, 132
brand personality see congruence between customer and brand personality
Cambridge Analytica scandal 192–3
Canhoto, A.I. 218
Carley, K.M. 280
cascading behavior in social networks 53–4, 72–3
cascading model 65–7, 68, 69–70
CatBoost 242–3
causal inference approaches 13
Circumplex model 176
climate change discourse 279
  conclusion 289–90
  discussion 289
  literature review/background 279–81
  methodology 281–3
  results 283–8
closeness centrality 221, 224, 225
cluster strength 64
code and coding see low-code and no-code competencies, teaching
collaboration between musicians (exponential random graph models case study) 15–24
Collins, M.E. 296
community influence 108
competition 190
Condor 49, 219–20, 221, 222
confirmatory factor analysis (CFA) 108
confusion matrix 315, 321, 327
congruence between customer and brand personality 190–91
  conclusion 210–11
  definitions 194–5
  methodology 195–8
  results and discussion 198–210
  theoretical background 191–5
consent banners on plastic surgeon websites 249
  conclusion 260
  discussion 258–60
  literature 250–52
  method 252–3
  results 253–8
conspiracy theories 280
convolutional neural networks (CNNs) 161, 176, 304–5, 320–21
cookies see consent banners on plastic surgeon websites
coolhunting projects 49
Coren, S. 311
corpora (NLP methods) see natural language processing (NLP) methods
corporate social responsibility (CSR) 266
Corpus of Presidential Speeches (CoPS) 32, 34–7, 38
Cova, B. & V. 194
COVID-19 pandemic
  customer (group) segmentation study 217, 219–26
  see also disinformation on Twitter during COVID-19 pandemic; Presidential communications on Twitter during COVID-19 pandemic
Cremer, S. 143, 155
Cross-Industry Standard Process for Data Mining (CRISP-DM) 144
Crovitz, H.F. and the Crovitz 42 Relational Words 236, 237–9, 240, 241, 244, 245–6
crowdfunding success 234
  conclusion 246
  discussion 244–6
  literature review 235–6
  methods 236
  results 236–44, 245
cryptocurrency markets see predicting Black Swan events in cryptocurrency markets
Cui, F. 143, 155
customer and brand personality see congruence between customer and brand personality
customer registry 216
customer segmentation 216–17
  conclusion 228–9
  discussion 226–8
  methodology 219–23
  results 223–6
  theoretical background 217–19
dark patterns 252, 257–8, 259
data protection 173, 249, 251, 258
data visualization see network data visualization
deep learning 28, 176, 304–6, 313, 322
deep neural networks (DNNs) 50, 161, 321
DeepLabCut 321, 327
degree centrality 221–2, 223, 224–5
democratization of AI 330, 333–4
design science research (DSR) 316–18, 327
Didimo, W. 8
Ding, H. 76–7
disinformation index 108, 109–12, 132–3
disinformation on Twitter during COVID-19 pandemic 100–102
  conceptual framework 102–7
  conclusion 133–4
  discussion 132–3
  methodology 108–16, 119
  results 117–18, 120–31
  see also Presidential communications on Twitter during COVID-19 pandemic
Easley, D. 53–4, 72–3
echo chambers 279, 280–81
economic uncertainty (Presidential communications on Twitter during COVID-19 study) 76, 78–9, 80, 82, 85, 90, 91, 92–3, 94, 95
emotion recognition see measuring emotions of jazz musicians; measuring emotions of jazz musicians using plants as biosensors; plants as biosensors; predicting YouTube success
emotion recognition in dogs 310
  analysis and results 318–23, 327
  conclusion 324
  final app screenshots 328
  research method 316–18, 327
  theoretical background 311–16
emotional text mining (ETM) 236, 239–42, 243, 245–6
Enron corpus 32, 34–7, 38
entrepreneurs in creative businesses 15
environmental social governance (ESG) rating system 265–6
  conclusion 276
  discussion 272–6
  methods 268–71
  project purpose 267–8
  related works 266–7
  results 271–2, 273, 274, 275
epidemics, discourse during 76–7, 78
epochs 314
Eslen-Ziya, H. 281
ethnicity (predicting YouTube success study) 146, 148, 149, 150, 151, 154
exploratory factor analysis (EFA) 108, 120, 124
exponential random graph models (ERGMs) 12–14
  case study (explaining collaborations between musicians using Spotify data) 15–19
  conclusion 24
  discussion 23
  study results 19–23
F-Measure 315
face emotion recognition (FER) algorithm 179
facial emotion recognition
  dogs 315–16
  and physiological signals 186
  see also measuring emotions of jazz musicians
facial emotion recognition and facial attributes see predicting YouTube success
facial emotions 175–6
factor analysis (FA) 108, 109, 117–18, 120, 121–2, 124
Ferres, K. 310, 316, 318, 319
filter bubbles 279
financial and fiscal risk analysis 8
Five-Factor Model 192
Flair 270–71, 276
flow 159, 167–8, 169–70
force-directed algorithms 3–4
Ford (congruence between customer and brand personality study) 198–200, 201, 202, 208
Foreman, J.E.K. 296
game theory 53–4, 72–3
General Data Protection Regulation (GDPR) 173, 249, 251, 258
General Themes Books corpus 32
Generative Pre-trained Transformer 3 (GPT-3) 62
Gloor, P.A. 49, 177, 191, 194, 197, 221, 222, 295
goodness of fit (GOF) 21, 22
government communications see Presidential communications on Twitter during COVID-19 pandemic
granularity and specificity of cookie setting options 253–5
Griffin 49, 55, 58, 197
Guardian, The 287
Gutenberg corpus 32, 34–7, 38
‘hairball drawings’ 4, 5
Happimeter 161, 162–3, 164–5, 166, 167, 178, 180
happiness
  in dogs 311, 319–20, 321, 322
  measuring emotions of jazz musicians using plants as biosensors 181, 186
  see also measuring emotions of jazz musicians; predicting YouTube success
Henninger, J. 311
human mobility
  disinformation on Twitter during COVID-19 study 102–3, 105–6, 107
  Presidential communications on Twitter during COVID-19 study 76, 79–80, 82, 85, 89–90, 91, 92–4, 95
human voices, plants’ reaction to see plants as biosensors
hybrid network representations 6–7
IBM Watson Personality Insights 197, 198
image recognition 312–13, 316
influence maximization 8–9
influencers 49
information systems (IS) research framework 317
Instagram 286
inter-modal dynamics 160, 162
interdisciplinarity 12
Jaccard Similarity Index see natural language processing (NLP) methods
jazz musicians see measuring emotions of jazz musicians; measuring emotions of jazz musicians using plants as biosensors
K-nearest neighbors regression 271, 272, 273, 275
Kendall Tau rank correlation 35–6
keypoint detection 312
Kickstarter see crowdfunding success
Kleinberg, J. 53–4, 72–3
Koh, B. 143, 155
Kozinets, R.V. 218–19, 222
language use, and crowdfunding success see crowdfunding success
latent Dirichlet allocation (LDA) 108; see also natural language processing (NLP) methods
latent semantic analysis (LSA) see natural language processing (NLP) methods
‘lead users’ 219, 228
LinkedIn (environmental social governance rating system project) 269, 270
long short-term memory (LSTM) models 50, 64–5, 67–8, 69, 161, 176
loss function 314
low-code and no-code competencies, teaching 330
  conclusion 340
  discussion and practical implications 338–40
  literature review/background 330–34
  results 334–8
loyalty, customer 195
Luo, Y. 221, 222
machine learning
  crowdfunding success study 242–4, 245
  emotion recognition in dogs 310, 312–16, 317–22, 323, 324, 327
  environmental social governance (ESG) rating system project 267–8, 271–2, 273, 274, 275
  measuring emotions of jazz musicians study 161, 162–3
  and natural language processing (NLP) 28, 29
  plants as biosensors study 301–6
  predicting personality traits using social media 193, 197
  predicting YouTube success study 152, 154, 155
  see also measuring emotions of jazz musicians using plants as biosensors
marketing 194, 286–7; see also customer segmentation
matrix-based network representations 6–7
McCrae, R.R. 192
MCMC diagnostics 20–21
measuring emotions of jazz musicians 159–60
  conclusion and future work 169–70
  discussion 167–9
  methodology 162–3
  results 163–6
  theoretical background 160–62
measuring emotions of jazz musicians using plants as biosensors 173–5
  body signals and movement 177–8, 185–6
  conclusion 186
  discussion 185–6
  facial emotions 175–6
  methodology 178–80
  plants as biosensors 176–7
  results 181–5
  theoretical background 175–8
Meer, D. 217–18
Microsoft Azure Cognitive Services 321–2
mobility see human mobility
modularity 63
Moernaut, R. 280–81
moving average (MA) 59
multi-faceted network visualization 7–8
multi-layer perception (MLP) 313, 314
musicians
  collaborations between (exponential random graph models case study) 15–24
  see also measuring emotions of jazz musicians; measuring emotions of jazz musicians using plants as biosensors
natural language processing (NLP) analysis (environmental social governance rating system project) 269–71
natural language processing (NLP) methods 27–9, 108, 109
  background 29–31
  conclusion 40–41
  discussion 37–9
  methodology 31–3
  results 33–7
NEO-PI-R (Revised NEO Personality Inventory) test 192
Netnography 218–19, 229
network analysis 108
network data visualization 2
  application examples 8–9
  conclusion 9
  matrix-based and hybrid representations 6–7
  multi-faceted network visualization 7–8
  node-link representations 3–5
networks
  cascading behavior in social networks 53–4, 72–3
  parameters for adjusting 314
  social network analysis (environmental social governance rating system project) 267–9
  see also exponential random graph models (ERGMs); predicting Black Swan events in cryptocurrency markets
neural networks
  artificial neural networks (ANNs) 313–14, 320
  convolutional neural networks (CNNs) 161, 176, 304–5, 320–21
  deep neural networks (DNNs) 50, 161, 321
  long short-term memory (LSTM) models 50, 64–5, 67–8, 69, 161, 176
no-code competencies see low-code and no-code competencies, teaching
node-link network representations 3–5
nudging 252, 256–7, 258–9
Obama Administration, use of social media 75
object recognition 312
Oezkaya, B. 177, 295
on-balance volume (OBV) 59
online behavioral advertising (OBA) 250
optimizer 314
orthogonal drawing convention 3
personality and personality analysis see congruence between customer and brand personality
Phoenix (software) 58
Pisanski, K. 296
Plant Spikerbox 173, 174, 178, 179–80, 298–300
plants as biosensors 294
  background 294–6
  conclusion 306–7
  discussion 306
  method 297–304, 309
  research questions 297
  results 304–6
  see also measuring emotions of jazz musicians using plants as biosensors
plastic surgeon websites see consent banners on plastic surgeon websites
polarization, social media 74, 76, 78, 79–81, 83–4, 86–91, 92, 94–6; see also climate change discourse
precision 315
predicting Black Swan events in cryptocurrency markets 48
  Black Swan events 51–3, 72
  conclusion 70
  cryptocurrencies 50–51
  cryptomarkets 51
  data collection 54–9
  discussion 68–70
  hypothesis 48–9
  model 59–65
  related work 49–50
  results 65–8
  theory 50–54
  weaknesses 68
predicting YouTube success 142–3
  discussion 154–5
  hypotheses 144
  limitations and future work 155–6
  method 144–53
  related work 143
  results 153–4
Presidential communications on Twitter during COVID-19 pandemic 74–5
  background 75–9
  conclusion 96
  discussion 94–6
  disinformation 100
  methods and materials 79–85
  results 86–94
  see also disinformation on Twitter during COVID-19 pandemic
price prediction see predicting Black Swan events in cryptocurrency markets
programming see low-code and no-code competencies, teaching
progressive web app (PWA) 323
public anxiety (Presidential communications on Twitter during COVID-19 study) 74, 75, 76, 77–80, 81, 84–5, 88, 90, 91, 92–3, 95
public health
  and disinformation on Twitter during COVID-19 pandemic 101, 102, 103–4, 107, 120, 121, 124–8, 132, 133, 134
  see also Presidential communications on Twitter during COVID-19 pandemic
public trust (Presidential communications on Twitter during COVID-19 study) 74, 76, 77, 78, 79–80, 81, 84, 90, 91, 92–3, 94, 95
Python 269–70, 283
random forest 163, 165–6, 169, 176, 271, 272–3, 275, 304
random graphs see exponential random graph models (ERGMs)
recall 315
relative strength indicator (RSI) 58–9
Richardson, C. 331
Rymer, J.R. 331
Sankey diagram 120, 123
SARS epidemic 76–7
self-image congruity 195
semantic similarity see natural language processing (NLP) methods
sentiment analysis 108, 112, 159–60, 161, 162, 163, 174, 270–71, 283–4
SHapley Additive exPlanations (SHAP) 243–4, 245
Shimono, A. 143
similarity index 198, 201, 203–7, 208
similarity, semantic see natural language processing (NLP) methods
Simpson, B. 311
social meaning 335
social media see climate change discourse; congruence between customer and brand personality; customer segmentation; disinformation on Twitter during COVID-19 pandemic; predicting Black Swan events in cryptocurrency markets; Presidential communications on Twitter during COVID-19 pandemic
social network analysis, environmental social governance (ESG) rating system project 267–9
social network visualization see network data visualization
social networks, cascading behavior in 53–4, 72–3
socio-semantic approach 27–8, 39
specificity and granularity of cookie setting options 253–5
Spotify data, explaining collaborations between musicians using 15–24
support vector regression 271–2, 273, 274
sustainability see environmental social governance (ESG) rating system
Taleb, N.N. 51–2
Teichert, T. 218
term frequency-inverse document frequency (TF-IDF) 276; see also natural language processing (NLP) methods
thumbnails see predicting YouTube success
tomato plants see plants as biosensors
triangulation method of validation 109
tribes and Tribefinder 190–91, 194, 197–8, 199–200, 202, 207, 222, 223, 224–6, 233
Trump Administration, use of social media 75
Tsai, M.-F. 316
tunable level-of-detail 5–6
Twitter (X) see climate change discourse; congruence between customer and brand personality; customer segmentation; disinformation on Twitter during COVID-19 pandemic; predicting Black Swan events in cryptocurrency markets; Presidential communications on Twitter during COVID-19 pandemic
Tyagi, A. 280
VAIM (visual analytics for influence maximization) 8–9
video thumbnails see predicting YouTube success
virtual mirroring 162
visual summaries (networks) 5–6
visualization see network data visualization
walking, plants’ responses to 177, 295
weak ties 16
WORDij (Z-Scores) 237–9, 240, 241
XGBoost 128–9, 152, 153, 180, 181–3, 271–2, 273, 274
Yankelovich, D. 217–18
YouTube see predicting YouTube success
ZClassic 50
Zhang, J. 221, 222