Disease Control Through Social Network Surveillance 9783031078682, 9783031078699



English, 237 pages, 2022


Table of contents
Preface
Disease Control Through Social Network Surveillance
Chapter Contributions
Contents
Editors and Contributors
About the Editors
Contributors
Analysis of Public Perceptions Towards the COVID-19 Vaccination Drive: A Case Study of Tweets with Machine Learning Classifiers
1 Introduction
2 A Brief Literature Survey
3 Data Collection, Pre-processing and Methodology
3.1 Raw-Data Acquisition
3.2 Data Pre-processing
3.3 Data Visualization
3.3.1 Geographical Analytics of Tweets
3.3.2 Users with Maximum Tweets
3.3.3 Most Frequent Hashtags
3.3.4 Monthly Statistics of Tweets
3.3.5 Textual Analysis of Tweets
3.3.6 Word and Phrase Associations
4 Experimental Design, Results and Discussions
4.1 Feature Selection
4.2 Platform Employed and Performance Evaluation Parameters
5 Conclusions
References
Spreader-Centric Fake News Mitigation Framework Based on Epidemiology
1 Introduction
2 Related Work
3 Epidemiology Inspired Framework
4 Preliminaries
4.1 Trustingness and Trustworthiness
4.2 Believability
4.3 Community Health Assessment Model
5 Vulnerability Assessment
6 Identification of Infected Population
7 Risk Assessment of Population
8 Infection Control and Prevention
9 Conclusion
References
Understanding How Readers Determine the Legitimacy of Online Medical News Articles in the Era of Fake News
1 Introduction
2 Background and Related Work
2.1 Presentation and Content in True and Fake News Articles
2.2 Detecting Fake News Articles: The Reader's Side
3 Methodology
3.1 Survey 1
3.2 Survey 2
3.3 Survey 3
3.4 Clustering Analysis
4 Results
4.1 Survey 1
4.2 Survey 2
4.3 Survey 3
5 Discussion
6 Conclusion
References
Trends, Politics, Sentiments, and Misinformation: Understanding People's Reactions to COVID-19 During Its Early Stages
1 Introduction
1.1 Contributions
1.2 Organization
2 Related Work
3 Reactions to COVID-19 During its Early Stages: Social Media Analytics
3.1 Dataset and Implementation Environment
3.2 Analysis Results
3.2.1 Number of Posts Related to COVID-19 Over Time
3.2.2 Number of Published News Per Web Site, Per Month
3.2.3 Geographic Distribution of Shared News
3.2.4 Geographic and Temporal Trends in Fake News
3.2.5 Opinions About Public Figures
4 Conclusion
References
Citation Graph Analysis and Alignment Between Citation Adjacency and Themes or Topics of Publications in the Area of Disease Control Through Social Network Surveillance
1 Introduction
2 Literature Review
3 The Citation Graph Methodology
4 Data
4.1 Data Collection
4.2 Derived Networks
5 Discussion of Nodal Attributes of the DCSNS Citation Graph
5.1 Degrees
5.2 Types
5.3 Themes
5.4 Topics
5.5 Relationships Between Nodal Attributes
5.6 Degree and Attribute Assortativities
6 Conclusions
References
Privacy in Online Social Networks: A Systematic Mapping Study and a Classification Framework
1 Introduction
2 Related Work
3 Systematic Mapping
3.1 Definition of Key Terms
3.2 Definition of Research Questions: Step 1
3.3 Conduct Search for Primary Studies and Screening of Papers for Inclusion and Exclusion: Steps 2 and 3
3.4 Classification Scheme and Mapping: Steps 3 and 4
3.4.1 RQ1 and RQ2: Topics in OSN Privacy Research
3.4.2 RQ3 and RQ4: Theoretical Contributions in OSN Privacy Research
3.4.3 RQ5 and RQ6: RE Research Papers in OSN Privacy Research
3.4.4 RQ7 and RQ8: Venues in OSN Privacy Research
4 Classification Framework for the Design and Action Theoretical Contributions
5 Discussion
6 Conclusion
References
Beyond Influence Maximization: Volume Maximization in Social Networks
1 Introduction
2 Related Work
3 Method
3.1 Data
3.2 Volume Maximization
3.3 Independent Cascade (IC) Diffusion Model
3.4 Reinforcement Learning Framework
3.5 Reward for the RL Framework
3.5.1 Diffusion Degree
3.5.2 Maximum Influence Degree
3.6 RL Learning Model
3.6.1 Q-Learning
3.6.2 SARSA
3.7 IBL Framework
3.7.1 Instance-Based Learning (IBL) Model
3.8 CELF-Volume Algorithm
3.9 Baseline Algorithms
3.10 Model Calibration
3.11 Expectation
4 Result
5 Discussion and Conclusion
References
Concerns of Indian Population on Covid-19 Vaccine Shortage Amidst Second Wave Infection Rate Spikes: A Social Media Opinion Analysis
1 Introduction
2 Literature Review
3 Methodological Approach
3.1 Preprocessing
3.2 Topic Modeling Process
4 Results
5 Discussion
6 Conclusions
References
The Effects of Face Masks on the Performance of Modern MWIR Face Detectors
1 Introduction
1.1 Goals and Contributions
2 Related Work
3 Methodology
3.1 Deep Learning Models
3.1.1 SSD MobileNet V2
3.1.2 SSD ResNet50 V1
3.1.3 CenterNet HourGlass104
3.1.4 CenterNet ResNet50 V2
3.1.5 Faster R-CNN Inception-ResNet V2
4 Experiments and Results
4.1 Datasets
4.2 Experimental Protocol
4.3 Results
4.4 Face Recognition Experiments
5 Conclusions and Future Work
References
Multispectral Face Mask Compliance Classification During a Pandemic
1 Introduction
2 Related Work
2.1 Masked Face Recognition
2.2 Mask Detection and Classification
3 Methodology
3.1 Classification Models
3.2 Dataset
3.3 Experimental Setup
3.4 Evaluation Metrics
4 Results and Discussion
4.1 Visible Results
4.2 Thermal Results
4.3 FMLD Test Set Results
4.4 Limitations
5 Conclusion and Future Work
References
On the Effectiveness of Visible and MWIR-Based Periocular Human Authentication When Wearing Face Masks
1 Introduction
1.1 Goals and Contributions
2 Related Research
3 Methodology
3.1 Pre-processing
3.1.1 MTCNN
3.2 FaceNet
3.3 VGG Face
3.4 Selecting Pre-trained Model
4 Experimental Results
4.1 Datasets
4.1.1 MILAB(B)-VTF
4.1.2 RMFD
4.2 Effects of Different Datasets
4.3 Data Preprocessing
4.4 Visible vs Thermal Data
4.5 Performance of Different Models
5 Conclusion and Future Work
References

Lecture Notes in Social Networks

Thirimachos Bourlai • Panagiotis Karampelas • Reda Alhajj, Editors

Disease Control Through Social Network Surveillance

Lecture Notes in Social Networks

Series Editors
Reda Alhajj, University of Calgary, Calgary, AB, Canada
Uwe Glässer, Simon Fraser University, Burnaby, BC, Canada

Advisory Editors
Charu C. Aggarwal, Yorktown Heights, NY, USA
Patricia L. Brantingham, Simon Fraser University, Burnaby, BC, Canada
Thilo Gross, University of Bristol, Bristol, UK
Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Raúl Manásevich, University of Chile, Santiago, Chile
Anthony J. Masys, University of Leicester, Ottawa, ON, Canada

Lecture Notes in Social Networks (LNSN) comprises volumes covering the theory, foundations and applications of the newly emerging multidisciplinary field of social network analysis and mining. LNSN publishes peer-reviewed works (including monographs and edited works) on the analytical, technical, and organizational sides of social computing, social networks, network sciences, graph theory, sociology, the semantic web, web applications and analytics, information networks, theoretical physics, modeling, security, crisis and risk management, and other related disciplines. The volumes are guest-edited by experts in a specific domain. This series is indexed by DBLP. Springer and the Series Editors welcome book ideas from authors. Potential authors who wish to submit a book proposal should contact Annelies Kersbergen, Publishing Editor, Springer, e-mail: [email protected]


Editors

Thirimachos Bourlai
Multispectral Imagery Lab—MILAB, ECE
University of Georgia
Athens, GA, USA

Panagiotis Karampelas
Department of Informatics and Computers
Hellenic Air Force Academy
Acharnes, Attica, Greece

Reda Alhajj
Department of Computer Science
University of Calgary
Calgary, AB, Canada

ISSN 2190-5428 ISSN 2190-5436 (electronic)
Lecture Notes in Social Networks
ISBN 978-3-031-07868-2 ISBN 978-3-031-07869-9 (eBook)
https://doi.org/10.1007/978-3-031-07869-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Disease Control Through Social Network Surveillance

The general topic of disease control through social network surveillance is complex and requires an optimal balance of ingredients. The intention of this book was not to produce a textbook covering all possible areas related to this topic. Rather, we invited experienced researchers to cover a list of interesting areas of novel trends and technologies related to disease control, including (1) disease control and public health surveillance, (2) social network surveillance and analysis, and (3) human authentication studies that utilize surveillance data and apply deep learning algorithms. These trends and technologies are covered in the eleven chapters summarized below, which aim to provide the reader with essential information, knowledge, and understanding of disease control via social surveillance, as well as a discussion of processes related to efficient and secure public health monitoring.

Chapter Contributions

In the first chapter, the authors note that the analysis of public perceptions and opinion mining (OM) has received considerable attention due to the ready availability of colossal amounts of unstructured text generated by social media, e-commerce portals, blogs, and similar web resources. They choose the Twitter platform to study public perceptions of the global vaccination drive. More than 112,000 tweets from users in different countries around the globe are extracted based on hashtags related to the COVID-19 vaccine. A three-tier framework is proposed in which raw tweets are first extracted and cleaned, then visualized and converted into numerical vectors through word embedding and N-gram models, and finally analyzed through a set of machine learning classifiers using the standard performance metrics: accuracy, precision, recall, and F1-measure. The authors show that the bag-of-words (BoW) model achieves the highest classification accuracy. They conclude that most people seem to have a neutral attitude towards the current COVID-19 vaccination drive, and that people favoring the COVID-19 vaccination program outnumber those who doubt it and its consequences.

In the second chapter, the authors argue that computational models for detecting and preventing the spread of false information (popularly called fake news) have gained a lot of attention over the last decade, with most proposed models focusing on identifying the veracity of information. They therefore propose a framework based on a complementary approach to false information mitigation, inspired by the domain of epidemiology. In this analogy, false information corresponds to an infection, the social network to a population, and the likelihood of people believing a piece of information to their vulnerability to infection. The framework consists of four phases that fall within the domain of social network analysis: vulnerability assessment, identification of the infected population, risk assessment of the population, and infection control and prevention. Through experiments on real-world information spreading networks on Twitter, the authors show the effectiveness of their proposed models and confirm their hypothesis that the spreading of false information (fake news) is more sensitive to behavioral properties, such as trust and credibility, than the spreading of true information (real news).

In the third chapter, the authors also address fake news mitigation strategies. They argue that the rapid spread of fake news during the COVID-19 pandemic aggravated the situation and made it extremely difficult for the World Health Organization (WHO) and government officials to ensure that people were informed only by accurate scientific findings.
Misinformation spread so freely that social media sites ultimately had to hide posts related to COVID-19 entirely and allow users to see only WHO- or government-approved information. Such action was necessary because news readers lack the ability to efficiently discern fact from fiction and thereby indirectly aid the spread of fake news, believing it to be true. The authors focus on understanding the thought process of an individual reading a news article. They extend the study of misinformation's impact on users by conducting a set of surveys to identify the factors consumers deem most important when deciding whether the information they receive is true. The results show that what people consider important in judging whether information is true changes when they are confronted with actual articles. The authors conclude that prior beliefs and political leanings affect people's ability to judge the legitimacy of the information they receive.

The fourth chapter discusses trends, politics, sentiments, and misinformation during COVID-19. The authors conduct a large-scale spatiotemporal data analytics study to understand people's reactions to the COVID-19 pandemic during its early stages. In particular, they analyze a JSON-based dataset of 5.2 million English-language posts about COVID-19 (news, messages, boards, and blogs), collected over a period of 4 months, from December 2019 to March 2020, from several social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon, and VK. The study mainly aims to understand which implications of COVID-19 interested social media users the most and how this interest varied over time, to determine the spatiotemporal distribution of misinformation, and to gauge public opinion toward public figures during the pandemic. The authors claim that their results can be used by many stakeholders (e.g., government officials and psychologists) to make more informed decisions that take into account the actual interests and opinions of the people.

In the fifth chapter, the authors present a Data-Network Science study of a dataset of publications archived in “The Semantic Scholar Open Research Corpus” (S2ORC) database and categorized under the area of “Disease Control through Social Network Surveillance,” hereafter abbreviated as “DCSNS.” Their dataset consists of 10,866 documents (articles and reviews), retrieved through a Boolean search and published between 1983, the first year in which S2ORC catalogues such publications, and 2020. By retrieving the abstracts of these publications and applying the standard LDA topic modeling technique, the authors report that six is the optimal number of topics, as it produces the maximum topic coherence score among topic models with varying numbers of topics. The network under study is therefore a directed citation graph of DCSNS publications, with nodes (publications) labeled by topic. The authors aim to study global and local network properties with regard to clustering under triadic relationships among connected nodes (publications), and with regard to the assortativity of attributes related to the content of publications.
They report that they have succeeded in analyzing the interplay between semantics and structure in DCSNS publications: by examining the occurrence of certain important publication attributes, they find that aggregating publications according to these attributes associates attribute affiliations with particular structural patterns of clustering in the bibliographic citation network of the collected publications.

The sixth chapter focuses on privacy in online social networks. The authors begin by arguing that disease control through online social networks (OSNs) has become particularly relevant in the past few months. Given the sensitive nature of the data collected and processed in that context, privacy is a major concern for (potential) users of such surveillance applications. Privacy has been studied from many different angles, and this work aims to offer a general systematic literature review of the area. The contributions of the chapter are twofold. First, the authors conduct a systematic mapping study covering papers related to privacy in OSNs, which results in a coarse-grained overview of the landscape of existing work in the field. In this first phase, 345 papers were examined. The findings show the characteristics and trends of publications in the area and highlight areas where publications are scarce, guiding researchers to gaps in the literature. Second, the authors propose a classification framework and apply it to the subset of 108 papers that offer a solution to protect user privacy. The results give researchers a way to position a solution relative to existing ones and highlight trends in existing solutions. The main practical implications of the chapter are guidelines and recommendations for application designers, including designers of applications for disease control.

In the seventh chapter, the authors argue that the health crisis brought about by COVID-19 has heightened the need for proper and correct information dissemination to counter the prevalence of fake news and other misinformation. Doctors are the most reliable source of information regarding patients' health status, disease, treatment options, and necessary lifestyle changes. Prior research has tackled the problem of influence maximization (IM), which tries to identify the most influential physicians within a physician social network. However, less research has addressed the problem of volume maximization (VM), which deals with finding the set of physicians that maximizes the combined volume (e.g., medicine prescribed) and influence (i.e., information disseminated). The primary objective of this chapter is to address the VM problem by proposing different algorithmic frameworks, including a reinforcement learning (RL) one. The authors compare the frameworks on the physicianSN dataset (a physician social network with 181 nodes and 19,026 edges) and test different algorithms. Their research highlights the utility of reinforcement learning algorithms for finding critical physicians who can swiftly disseminate critical information to both physicians and patients.

In the eighth chapter, the authors present a study of the perceptions of the Indian population regarding the COVID-19 vaccine shortage during the rapid rise in cases in the COVID-19 second wave. Using a Twitter API, 46,000 unique tweets by Indian citizens containing the keywords “vaccine,” “second wave,” and “COVID” were scraped. The authors then applied a topic model.
In machine learning and text processing, a topic model is a type of statistical model for discovering the abstract topics that occur in a collection of documents (in this study, tweets relevant to the second wave of COVID-19). In practice, topic modeling is a text-mining tool used to discover hidden semantic structures in a body of text such as tweets. In this work, it was used to identify a set of key themes in people's perceptions. The study shows that the Indian population is concerned about the vaccine shortage, and the authors recommend a collective effort to improve the well-being of Indian citizens.

In the ninth chapter, the authors present a case study relevant to the pandemic era. They address the problem of face detection in the MWIR (thermal) spectrum when faces are occluded by face masks. Since the publicly available datasets are not large enough to train models from scratch, transfer learning is applied to models pre-trained on the COCO dataset. The models are first trained and tested on unmasked face images, which results in high precision and recall values for all models. When these models are then tested on masked face images, precision and recall drop significantly, and inference time increases marginally but noticeably. Finally, the models are trained and tested on masked face data, yielding 89.4% precision and 92.1% recall. The improved face recognition results obtained with an efficient automated face detection approach further demonstrate the importance of such models operating in the MWIR spectrum. The authors propose extending this work to scenarios where data are collected in the visible spectrum under constrained conditions, as well as to scenarios where data are collected in both the thermal and visible bands in outdoor, unconstrained settings. The study concludes that, with sufficient real data, efficient unified models that detect human faces at each distance can be developed for each band.

The tenth chapter investigates the problem of face mask compliance classification in response to the coronavirus disease (COVID-19) pandemic. Since the start of the pandemic, many governments and businesses have continually updated their policies to help slow the spread of the virus, including requiring face masks for access to many public and private services. In response to these policies, many researchers have developed new face detection and recognition techniques for masked faces, almost exclusively focusing on detecting whether or not a person is wearing a face mask. In this work, the authors investigate the capability of modern classification algorithms to efficiently determine, from face images captured in the visible and thermal bands, whether masks are worn in compliance with the guidelines provided by health organizations. The proposed deep learning (DL) based approach consists of three steps: first, creating a multispectral masked face database from subjects wearing or not wearing face masks; second, augmenting this database with synthetic face masks to simulate two different levels of non-compliant mask wearing; and finally, assessing a variety of DL-based architectures on the augmented database to investigate the efficiency of different classifiers for face mask compliance classification in either the visible or the thermal band.
Experimental results show that face mask compliance classification reaches 100% accuracy for most of the studied models in both bands, when experimenting on frontal face images captured at short distances and under adequate illumination.

The eleventh and final chapter discusses the effectiveness of periocular-based human authentication algorithms when subjects wear face masks intended to slow the spread of viruses such as the one that causes COVID-19. According to a study published by the National Institute of Standards and Technology (NISTIR 8311), the accuracy of facial recognition algorithms on masked faces is reduced by between 5% and 50% compared with the accuracy of the same algorithms when subjects are not wearing face masks. The same report also states that face images of subjects wearing masks increase the failure-to-enroll rate (FER). In addition, masked face images lower the efficiency of surveillance (unconstrained) face recognition systems, which become even more prone to error due to occlusion, distance, camera quality, outdoor conditions, and low light. In this chapter, the authors focus on the effectiveness of dual-eye periocular-based recognition algorithms when subjects wear face masks under controlled and challenging conditions, and when the face images are captured in both the visible and MWIR (mid-wave infrared) bands. The authors first utilize MILAB-VTF(B), a challenging multispectral face dataset composed of thermal and visible videos collected in 2021 at the University of Georgia (the largest and most comprehensive dual-band face dataset to date). They then manually crop the faces from the images and use existing pre-trained face recognition algorithms to perform periocular-based matching. The study reports that the proposed dual-eye periocular-based recognition model yields a rank-1 face identification accuracy of 100% and 99.52% in the thermal and visible bands, respectively. Additionally, the authors perform same-spectral face recognition experiments (visible-to-visible and thermal-to-thermal) and report the results.

Thirimachos Bourlai, Athens, GA, USA
Panagiotis Karampelas, Dekelia, Greece
Reda Alhajj, Calgary, AB, Canada

Contents

Analysis of Public Perceptions Towards the COVID-19 Vaccination Drive: A Case Study of Tweets with Machine Learning Classifiers
Koushal Kumar and Bhagwati Prasad Pande

Spreader-Centric Fake News Mitigation Framework Based on Epidemiology
Bhavtosh Rath and Jaideep Srivastava

Understanding How Readers Determine the Legitimacy of Online Medical News Articles in the Era of Fake News
Srihaasa Pidikiti, Jason Shuo Zhang, Richard Han, Tamara Silbergleit Lehman, Qin Lv, and Shivakant Mishra

Trends, Politics, Sentiments, and Misinformation: Understanding People’s Reactions to COVID-19 During Its Early Stages
Omar Abdel Wahab, Ali Mustafa, and André Bertrand Abisseck Bamatakina

Citation Graph Analysis and Alignment Between Citation Adjacency and Themes or Topics of Publications in the Area of Disease Control Through Social Network Surveillance
Moses Boudourides, Andrew Stevens, Giannis Tsakonas, and Sergios Lenis

Privacy in Online Social Networks: A Systematic Mapping Study and a Classification Framework
Sarah Bouraga, Ivan Jureta, and Stéphane Faulkner

Beyond Influence Maximization: Volume Maximization in Social Networks
Abhinav Choudhury, Shruti Kaushik, and Varun Dutt

Concerns of Indian Population on Covid-19 Vaccine Shortage Amidst Second Wave Infection Rate Spikes: A Social Media Opinion Analysis
Remya Lathabhavan and Arnob Banik

The Effects of Face Masks on the Performance of Modern MWIR Face Detectors
Victor Philippe, Suha Reddy Mokalla, and Thirimachos Bourlai

Multispectral Face Mask Compliance Classification During a Pandemic
Jacob Rose, Haiying Liu, and Thirimachos Bourlai

On the Effectiveness of Visible and MWIR-Based Periocular Human Authentication When Wearing Face Masks
Ananya Zabin, Suha Reddy Mokalla, and Thirimachos Bourlai

Editors and Contributors

About the Editors

Thirimachos Bourlai is an associate professor in the School of Electrical and Computer Engineering and an adjunct faculty member at the Institute for Cybersecurity and Privacy, both at the University of Georgia; he is also an adjunct faculty member at WVU in CSEE, the School of Medicine, Forensics, and Chemical Engineering. He is the founder and director of the Multi-Spectral Imagery Lab (milab.uga.edu), a Springer Nature Series Editor of Advanced Sciences and Technologies for Security Applications, a member of the board of directors of the Document Security Alliance, the VP of Education at the IEEE Biometrics Council, and a member of the Academic Research and Innovation Expert Group at the Biometrics Institute. He has published four books with Springer Nature: “Face Recognition Across the Imaging Spectrum” (2016), “Surveillance in Action” (2018), “Securing Social Identity in Mobile Platforms” (2020), and “Disease Control Through Social Network Surveillance” (2022). He holds various patents and has authored journal and conference publications.

Panagiotis Karampelas holds a PhD in electronic engineering from the University of Kent at Canterbury, UK, and a Master of Science degree from the Department of Informatics, Kapodistrian University of Athens, with specialization in “High Performance Algorithms.” He also holds a bachelor's degree in mathematics from the same university, majoring in applied mathematics. Currently, he is with the Department of Informatics and Computers at the Hellenic Air Force Academy, where he teaches courses to pilots and engineers and participates in a number of internationalization activities on behalf of his institution.
He is the author of the book Techniques and Tools for Designing an Online Social Network Platform, published in Lecture Notes in Social Networks (2013), and editor of a number of books, such as Electricity Distribution: Intelligent Solutions for Electricity Transmission and Distribution Networks in the book series Energy Systems (2016); Surveillance in Action: Technologies for Civilian, Military and Cyber Surveillance (2018) and Securing Social Identity in Mobile Platforms: Technologies for Security, Privacy and Identity Management (2020) in the book series Advanced Sciences and Technologies for Security Applications; and From Security to Community Detection in Social Networking Platforms (2019) in the book series Lecture Notes in Social Networks. He is also a contributor to the Encyclopedia of Social Network Analysis and Mining. He serves as a series editor of the book series Advanced Sciences and Technologies for Security Applications and as an associate editor of the Social Network Analysis and Mining journal. He also serves as a program committee member for a large number of scientific journals and international conferences in his fields of interest. Finally, he participates in the Organizing Committee of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

Reda Alhajj is a professor in the Department of Computer Science at the University of Calgary, Alberta, Canada. He has published over 550 papers in refereed international journals, conferences, and edited books. He has served on the program committees of several international conferences. He is founding editor-in-chief of the Springer premier journal Social Network Analysis and Mining, founding editor-in-chief of the Springer series Lecture Notes in Social Networks, founding editor-in-chief of the Springer journal Network Modeling Analysis in Health Informatics and Bioinformatics, founding co-editor-in-chief of Springer's Encyclopedia of Social Network Analysis and Mining (ranked third among the most downloaded sources in computer science in 2018), and founding steering chair of the flagship conference “IEEE/ACM International Conference on Advances in Social Network Analysis and Mining” and three accompanying symposiums: FAB (for big data analysis), FOSINT-SI (for homeland security and intelligence services), and HI-BIBI (for health informatics and bioinformatics).
He is a member of the editorial boards of the Journal of Information Assurance and Security, the Journal of Data Mining and Bioinformatics, and the Journal of Data Mining, Modeling and Management; he has been guest editor of a number of special issues and has edited a number of conference proceedings. Dr. Alhajj’s primary work and research interests focus on various aspects of data science, network science, and big data, with emphasis on areas such as (1) scalable techniques and structures for data management and mining; (2) social network analysis with applications in computational biology and bioinformatics, homeland security, disaster, and management; (3) sequence analysis with emphasis on domains such as finance, weather, traffic, and energy; and (4) XML, schema integration and re-engineering. He currently leads a large research group of PhD and MSc candidates. He received the best graduate supervision award and the community service award from the University of Calgary. He recently mentored a number of successful teams, including SANO, which ranked first in the Microsoft Imagine Cup Competition in Canada and received the KFC Innovation Award in the World Finals held in Russia; TRAK, which ranked in the top 15 teams in the open data analysis competition in Canada; Go2There, which ranked first in the Imagine Camp competition organized by Microsoft Canada; and Funiverse, which ranked first in the Microsoft Imagine Cup Competition in Canada.


Contributors

André Bertrand Abisseck Bamatakina, Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, QC, Canada
Arnob Banik, VIT University, Vellore, Tamil Nadu, India
Moses Boudourides, Department of Computer Science, Haverford College, Haverford, PA, USA
Sarah Bouraga, Department of Business Administration, University of Namur, Namur, Belgium
Thirimachos Bourlai, Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA
Abhinav Choudhury, School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Varun Dutt, School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Stéphane Faulkner, Department of Business Administration, University of Namur, Namur, Belgium
Richard Han, University of Colorado Boulder, Boulder, CO, USA
Ivan Jureta, Department of Business Administration, University of Namur, Namur, Belgium
Shruti Kaushik, School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Koushal Kumar, Sikh National College, Qadian, Guru Nanak Dev University, Amritsar, Punjab, India
Remya Lathabhavan, Indian Institute of Management Bodh Gaya, Bodh Gaya, Bihar, India
Tamara Silbergleit Lehman, University of Colorado Boulder, Boulder, CO, USA
Sergios Lenis, Citrix, Patras, Greece
Haiying Liu, Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA
Qin Lv, University of Colorado Boulder, Boulder, CO, USA
Shivakant Mishra, University of Colorado Boulder, Boulder, CO, USA
Suha Reddy Mokalla, Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA
Ali Mustafa, Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, QC, Canada
Bhagwati Prasad Pande, Department of Computer Applications, LSM Government PG College, Pithoragarh, Uttarakhand, India
Victor Philippe, Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA
Srihaasa Pidikiti, University of Colorado Boulder, Boulder, CO, USA
Bhavtosh Rath, University of Minnesota, Minneapolis, MN, USA
Jacob Rose, Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA
Jaideep Srivastava, University of Minnesota, Minneapolis, MN, USA
Andrew Stevens, SPS, Northwestern University, Evanston, IL, USA
Giannis Tsakonas, Library & Information Center, University of Patras, Patras, Greece
Omar Abdel Wahab, Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, QC, Canada
Ananya Zabin, Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA
Jason Shuo Zhang, University of Colorado Boulder, Boulder, CO, USA

Analysis of Public Perceptions Towards the COVID-19 Vaccination Drive: A Case Study of Tweets with Machine Learning Classifiers

Koushal Kumar and Bhagwati Prasad Pande

Abstract In the realm of contemporary soft computing practices, analysis of public perceptions and opinion mining (OM) have received considerable attention owing to the easy availability of colossal amounts of unstructured text generated by social media, e-commerce portals, blogs, and other similar web resources. The year 2020 witnessed one of the gravest epidemics in the history of mankind, and in the present year we stand amid a global, massive, and exhaustive vaccination movement. Since the introduction of COVID-19 vaccines, people across the globe, from the ordinary public to celebrities and VIPs, have been expressing their fears, doubts, experiences, expectations, dilemmas, and perceptions about the current COVID-19 vaccination program. Given its popularity across a large section of modern society, Twitter has been chosen in this research to study public perceptions about this global vaccination drive. More than 112 thousand Tweets from users in different countries around the globe are extracted based on hashtags related to COVID-19 vaccines. A three-tier framework is proposed in which raw Tweets are first extracted and cleaned, then visualized and converted into numerical vectors through word embedding and N-gram models, and finally analyzed with a few machine learning classifiers using the standard performance metrics of accuracy, precision, recall, and F1-measure. The Logistic Regression (LR) and Adaptive Boosting (AdaBoost) classifiers attained the highest accuracies, 87% and 89%, with the Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) word embedding models, respectively. Overall, the BoW model achieved a slightly better average classification accuracy (78.33%) than the TF-IDF model (77.89%). Moreover, the experimental results show that most people have a neutral attitude towards the current COVID-19 vaccination drive, and that people favoring the COVID-19 vaccination program outnumber those who doubt it and its consequences.

Keywords Vaccine · COVID-19 vaccination program · Sentiment analysis · Twitter · Machine learning · N-gram

K. Kumar, Sikh National College, Qadian, Guru Nanak Dev University, Amritsar, Punjab, India
B. P. Pande, Department of Computer Applications, LSM Government PG College, Pithoragarh, Uttarakhand, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_1

1 Introduction

The year 2020 witnessed one of the gravest epidemics of human civilization: the COronaVirus Disease of 2019 (COVID-19). Life all over the globe halted under the shackles of lockdowns, quarantines, and restrictions. Since the very start of 2020, our lives have been spent amid masks, sanitizers, social distancing, threat, anxiety, bewilderment, and fear. During the current pandemic, digital social media platforms have been the first choice of people in the modern age to express thoughts, emotions, experiences, sentiments, ideas, and perceptions about the COVID-19 pandemic, its effects, and its consequences. A year after the emergence of this deadly disease, we find ourselves amid one of the largest and most exhaustive vaccination campaigns in the history of mankind. According to an international survey, 5.26 vaccination doses have been administered per 100 people globally so far [1]. Those who have taken the first or both shots of a vaccine, besides posting pictures of themselves being vaccinated, are writing extensively about their experiences: the physical, mental, and emotional effects after vaccination, their expectations, and their feedback. On the other hand, people who have not yet been vaccinated, or who are waiting for the turn of their social or age group, are also expressing their doubts, dilemmas, expectations, perceptions, and thoughts on social media platforms. Sharing false, misleading, and illusory information is common among this second group. Although the successful trials of vaccines from several drug vendors across the globe restored hope and faith in the fight against COVID-19, a substantial part of the population remains doubtful, insecure, and uneasy about the vaccination program. Some anti-vaccination ideologies and sentiments have also been trending on different social media platforms.
Researchers have observed consistent hesitancy among the general public in accepting and committing to COVID-19 vaccination programs. Such observations reveal a herculean challenge for governments across the globe to tame the COVID-19 pandemic and to mitigate the spread of the virus through vaccines [2]. There is therefore a need to ascertain public opinions, sentiments, and perceptions about the global COVID-19 vaccination drive. Sentiment analysis (SA) is a soft computing technique that detects the positive, negative, or neutral opinion, known as polarity, expressed in a piece of text. This text can be a sentence, a Tweet, a Facebook/Instagram post, a paragraph, or a whole blog. Table 1 shows some illustrative Tweets with their polarities. SA gauges users’ perceptions of the current vaccination drive, which they express as a combination of text, hashtags, and emojis.


Table 1 Tweets about the global COVID-19 vaccination campaign and their polarities

Tweet text | Sentiment
“Australia to Manufacture Covid-19 Vaccine and give it to the Citizens for free of cost AFP quotes Prime Minister CovidVaccine” | Positive
“Is more dangerous yet to come? Don’t know when CovidVaccine will come out” | Negative
“Serum Institute India is looking to raise up to 1 billion around Rs 7500 crore for Covid-19 vaccine development” | Neutral

Automatic SA of the current vaccination program may help governments and health authorities to design and implement effective awareness, communication, know-how, and action plans. In the present article, the sentiments and opinions of the public about the global COVID-19 vaccination campaign are analyzed through their Twitter posts. Twitter was chosen for the current study for two reasons: first, its popularity among different kinds of users, from ordinary people to celebrities; and second, the availability of the latest technologies to extract, or scrape, data (Tweets) easily. More than 112 thousand Tweets about COVID-19 vaccines and vaccination drives are first scraped. The scraped dataset comprises Tweets from global users in different countries¹ (refer to Table 3 for more details). These raw Tweets are then preprocessed and cleaned. Next, the preprocessed data are visualized from different perspectives to develop insights into various key aspects, trends, and hidden relationships in the data. The polarity and subjectivity of the Tweets are determined with Python libraries. To enhance the prediction accuracies of the ML models, the processed Twitter data are converted into numerical vectors through the BoW and TF-IDF word embedding models, and the N-gram (unigram, bigram, and trigram) feature selection method is exploited. Three ML algorithms, Multinomial Naïve Bayes (MNB), AdaBoost, and LR, are used to classify and predict public sentiments. The performances of these classifiers are assessed with the standard metrics, viz., accuracy, precision, recall, and F1-measure, for both word embedding models. The accuracies of the ML classifiers are calculated for all three N-gram feature selection methods, and the remaining three metrics are calculated for each sentiment class. Section 2 presents a short literature review; Sect. 3 deals with data collection, methodology, and data visualization; Sect. 4 discusses the experimental design and results; and Sect. 5 presents the conclusions and future scope of the present research.
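The cleaning, N-gram, and vectorization steps can be made concrete with the minimal pure-Python sketch below. It is illustrative only: the cleaning rules (lower-casing and stripping URLs, @mentions, '#' signs, and punctuation), the helper names `clean_tweet`, `ngrams`, `bag_of_words`, and `tf_idf`, and the two toy Tweets are assumptions made for this example, not the authors' code or data, and the IDF shown is the textbook unsmoothed ln(N/df) variant.

```python
import math
import re
from collections import Counter

# Illustrative cleaning rules (assumptions, not the study's exact pipeline).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # ASCII-only: also drops emojis and accents


def clean_tweet(text):
    """Lower-case a Tweet, strip URLs and @mentions, drop punctuation, and tokenize.

    Removing punctuation also removes '#', so hashtag words such as
    'covidvaccine' are kept as ordinary tokens.
    """
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return text.split()


def ngrams(tokens, n):
    """Contiguous n-token sequences: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    """BoW representation: raw n-gram counts for each tokenized document."""
    return [Counter(ngrams(tokens, n)) for tokens in docs]


def tf_idf(bow):
    """Weight each count by idf(t) = ln(N / df(t)), where df(t) is the number
    of documents containing term t (textbook, unsmoothed variant)."""
    n_docs = len(bow)
    df = Counter(term for counts in bow for term in counts)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for counts in bow
    ]


if __name__ == "__main__":
    # Two toy Tweets invented for illustration.
    tweets = [
        "Got my #CovidVaccine today! https://example.com",
        "@someone is the #CovidVaccine safe?",
    ]
    docs = [clean_tweet(t) for t in tweets]
    print(docs)  # [['got', 'my', 'covidvaccine', 'today'], ['is', 'the', 'covidvaccine', 'safe']]
    print(bag_of_words(docs, n=2)[0])
    # A term that occurs in every Tweet ('covidvaccine') gets weight 0.
    print(tf_idf(bag_of_words(docs, n=1))[0])
```

In practice, scikit-learn's `CountVectorizer` and `TfidfVectorizer` build the same kinds of representations, with the `ngram_range` parameter selecting unigrams, bigrams, or trigrams (e.g. `(1, 1)`, `(2, 2)`, `(3, 3)`). Note that `TfidfVectorizer` by default uses a smoothed IDF, ln((1 + N)/(1 + df)) + 1, followed by L2 normalization, so its weights differ from this sketch.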

1 The reader can download the complete dataset from: https://drive.google.com/drive/folders/1AjA4PhZL7kAWfY_WItnQwJGxBUoLfP-W?usp=sharing


2 A Brief Literature Survey

Alam et al. [3] extracted Tweets about COVID-19 vaccinations and applied Natural Language Processing (NLP) tools and the Valence Aware Dictionary and sEntiment Reasoner (VADER) to observe the polarities of the Tweets, finding 33.96% positive, 17.55% negative, and 48.49% neutral responses. They also looked at how sentiments changed over time and evaluated the accuracy of various ML models, such as the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and bidirectional LSTM (Bi-LSTM). The empirical results showed that LSTM and Bi-LSTM attained accuracies of 90.59% and 90.83%, respectively. Ansari and Khan [4] analyzed vaccine-related Tweets to evaluate the public’s reactions to the current vaccination drives through thematic sentiment analysis, emotion analysis, and demographic interpretation. The authors found that the Tweets were largely negative in sentiment, with a rising lack of trust. They commented that fear may have remained the dominant emotion, expressing concern about the desire to receive the COVID-19 vaccine. Cotfas et al. [5] examined the trends of public opinion on the COVID-19 vaccination drive in the United Kingdom. They extracted more than two million Tweets for their study and compared classical ML and Deep Learning (DL) algorithms to find the best one. The authors found that the majority of Tweets expressed neutral opinions and that the number of positive Tweets surpassed the number of negative Tweets. Dubey [6] studied Tweets posted by Indian users on the COVID-19 vaccination program in India. The authors employed the Syuzhet package to classify the sentiments of Tweets as positive or negative for both Indian vaccines, namely Covishield and Covaxin. The authors also categorized Tweets into eight emotions and reported that Covishield attracted more positive emotions, such as trust and anticipation, than the other vaccine, Covaxin.
Leung [7] collected Tweets related to COVID-19 vaccination through the Twitter API and used three different sentiment analyzers, namely NLTK VADER, TextBlob, and Stanza, to analyze people’s opinions. The author reported that the empirical results of NLTK VADER and TextBlob are almost similar, that TextBlob returned the most positive sentiments, and that Stanza detected the most negative sentiments. Leung [7] concluded that people’s reactions to the vaccination program of the current pandemic tend to be mixed, slightly inclined towards the positive side. Piedrahita-Valdés et al. [8] noted that social media play a key role in the proliferation of perceptions about vaccines and that monitoring social media posts about vaccines may help to control the factors that can affect the public’s faith and confidence. The authors collected vaccine-related Tweets over a long span of 8 years and applied a hybrid approach to carry out OM. Their approach classified the Tweets into three classes, neutral, positive, and negative, with a substantial share of neutral Tweets (69.36%). For better insight, the authors also presented the sentiment polarities across a multitude of countries. Roe et al. [9] studied COVID-19 vaccine hesitancy among people across the globe through their Tweets. The authors found that the majority of Tweets were negative in sentiment, while positive and neutral Tweets came second and third, respectively. The authors also observed that the intensity of negative Tweets was higher than that of positive Tweets. Singh et al. [10] studied the mental state of people by analyzing Tweets with the Bidirectional Encoder Representations from Transformers (BERT) model. The authors prepared and analyzed two datasets: Tweets posted by people all over the globe and Tweets posted by the Indian public. They claimed to achieve a validation accuracy of 94%. Yousefinaghani et al. [11] analyzed publicly available Tweets about COVID-19 vaccine sentiments and compared their progression by time, geographical distribution, keywords, post engagement metrics, and account characteristics. The authors observed the following order of Tweet sentiments: Neutral > Positive > Negative. They also found that the anti-vaccine Twitter accounts included political activists, authors, artists, and Twitter bots that generated automated content. Hum [12] analyzed people’s opinions about the COVID-19 vaccine expressed on Twitter. With the help of NLP techniques and data visualization, the author classified the Tweets into positive, neutral, and negative. The author concluded that people’s perception of the COVID-19 vaccine is mostly positive and neutral. Raghupathi et al. [13] studied Tweets related to vaccination during a measles outbreak in the United States in 2019. The authors collected a sample of 9581 Tweets and studied it using TF-IDF. They reported that about 77% of users believe in the need for a more sophisticated vaccine. The authors concluded that health experts must be aware of prospective rumors, falsehoods, and myths among the public for the vaccination campaign. Samuel et al. [14] studied and identified people’s sentiments associated with COVID-19 vaccination across the United States. The authors demonstrated insights into fear sentiment around this vaccination drive over time using descriptive textual analytics.
They employed a wide range of ML algorithms to classify Tweets of various lengths and found that some ML classifiers were not effective for longer Tweets. Tavoschi et al. [15] performed OM and SA on vaccination in Italy. They studied vaccine-related Tweets spanning one year and automatically classified them into in-favour, against, and neutral classes with the help of supervised ML techniques. The results of their research exhibited an increasing polarization of public opinions and sentiments on vaccination. The authors claimed that their work may be useful for estimating the inclination of public opinion towards vaccination programs and that such SA and OM techniques have the potential to inform communication and awareness strategies in the future. Zhang et al. [16] studied public sentiments and opinions on social media about vaccination programs against the Human Papilloma Virus (HPV) using ML-based techniques. The authors proposed a few transfer learning approaches to analyze Tweets on HPV vaccines. They compared the empirical results of their suggested techniques on the HPV dataset and reported that the fine-tuned BERT model performed the best. The authors concluded that this model may help to develop insight into the low exposure of the vaccination program and suggested that their approaches can be applied to SA problems in other public health affairs. On et al. [17] collected Korean social media posts on childhood vaccination using the Smart Insight technique. The authors extracted emotional terms from the posts and classified the posts as positive, negative, or neutral according to the frequency and trend of emotional words. They employed LR analysis and association analysis for SA. The authors reported that public sentiments about vaccination appeared to be affected by news about vaccination. Mitra et al. [18] studied people’s attitudes towards vaccination campaigns. They collected a large volume of Twitter data containing perceptions, expressions, and discussions about vaccination over a substantial span of four years and identified persistent users participating in pro- and anti-vaccination arguments. Users who had recently begun to engage in anti-vaccination debates were also identified. The authors concluded that the users continuously involved in anti-vaccination movements tended to exhibit conspiratorial thinking, were adamant, and mistrusted the government, while the new joiners of the anti-vaccination group tended to be more social and concerned about health but less determined. Numnark et al. [19] proposed a monitoring system, VaccineWatch, to analyze vaccine-related posts from Twitter and RSS feeds. The authors developed a GUI-based web application to mine social media information related to vaccination. The authors claimed that their tool serves as a disease surveillance system capable of visualizing information to correlate parameters such as vaccines, diseases, countries and cities, and firms. Salathé and Khandelwal [20] used freely available social media data to analyze sentiments towards a new vaccine. The authors identified a strong correlation between users’ posts and regional vaccination rates. They observed that users who hold similar opinions or sentiments tend to exchange information more often than those who do not share their opinions, and that people with negative sentiments are more likely to be vulnerable. The authors concluded that analysis of social media posts may prove to be a cheap but productive tool for evaluating vaccination impact and acceptance.
Table 2 summarizes the above discussion and presents a comparative analysis of the literature. Data visualization has seen relatively little use in this domain, and very few studies have compared the results of ML classifiers across word embedding techniques for SA of Twitter data. Therefore, the present work carries out an extensive visualization of the processed Twitter text to develop new insights. Moreover, it compares the empirical results of three popular ML algorithms with two standard word embedding models (BoW and TF-IDF).

3 Data Collection, Pre-processing and Methodology

In the present era of digital social media, Twitter has undeniably been one of the most popular forums for people to express their thoughts, opinions, sentiments, and beliefs. A Tweet is a message of at most 280 characters (the limit was 140 characters until late 2017). In contrast to other social media platforms such as Facebook, Twitter has been preferred for data retrieval because it focuses on keywords/hashtags and offers reach to a larger audience. With such a large volume of users’ views, these massive data give researchers the opportunity to extract users’ inclination towards a specific emotion

Ansari & Khan (2021)

Cotfas et al. (2021)

2.

3.

S. No. 1.

Author(s) and year Alam et al. (2021)

Annotated Twitter dataset

Twitter

Dataset Kaggle (Twitter)

Multinomial Naive Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), Bi-LSTM and Convolutional Neural Network (CNN)

Naïve Bayes (NB)

ML/DL algorithms/ Techniques applied RNN, LSTM, Bi-LSTM

Against, Neutral, and In-favor

Positive, Neutral, and Negative

Polarity classes Positive, Neutral, and Negative

Table 2 Research works on SA of vaccine-related affairs through ML technology

Removal of links, normalization of email ids and emojis, BoW, TF-IDF

Preprocessing/ Feature selection/Other technique(s) applied Dropping unnecessary columns, e-mails, URLs, quotes, tokenization-detokenization, VADER –

Accuracy, precision, recall, and F-score



Performance parameter(s) studied Precision, F1-score, and, confusion matrix

Python

Python

Tool(s)/ Software employed Python

(continued)

Most Tweets are negative in nature and lack of trust has been observed Tweets are mostly neutral while the in-favor class of Tweets found to be greater than that of against class; Trend of events affects the occurrence of Tweets

Result(s) Neutral responses are highest, Bi-LSTM achieves the highest accuracy of 90.83%


Author(s) and year Dubey (2021)

Leung (2021)

S. No. 4.

5.

Table 2 (continued)

Twitter

Dataset Twitter

NLTK VADER, TextBlob, and Stanza

ML/DL algorithms/ Techniques applied

Positive, Neutral, and Negative

Polarity classes Positive and Negative

Preprocessing/ Feature selection/Other technique(s) applied Removal of white spaces, links, punctuations and stop words; conversion into lower case Removal of URLs, hashtags, emojis, and reserved words –

Performance parameter(s) studied –

Python

Tool(s)/ Software employed R

Result(s) Covaxin: 69% positive and 31% negative sentiments; Covishield: 71% positive and 29% negative sentiments Mixed sentiments: 40% positive while 38.4% negative; TextBlob and NLTK Vader exhibit the same sentiment trends, viz. Positive > Neutral > Negative. On the other hand, Stanza’s sentiment trends are completely different: Negative > Neutral > Positive


Piedrahita-Valdés et al. (2021)

Roe et al. (2021)

Singh et al. (2021)

6.

7.

8.

Two datasets from Twitter: Entire world and India only

Twitter

Twitter

Bidirectional Encoder Representations from Transformers (BERT)

NLTK, VADER

SVM

Positive, Neutral, and Negative

Positive, Neutral, and Negative

Positive, Neutral, and Negative

Regression, Minimum Redundancy Maximum Relevancy (mRMR)



Dropping non-English and non-Spanish Tweets, removing Tweets with only URLs and hashtags, anonymization of user IDs

Average likes and retweets, intensity analysis, polarity, subjectivity

p-value, median, mean, variance, skewness, standard deviation (SD)

p-value, R2

Python 3.7 on Jupyter Notebook

Python, Microsoft Azure, JISC

SPSS

(continued)

Most Tweets are found to be neutral (69.36%), positive Tweets (21.78%) outnumber negative Tweets (8.86%); Neutral Tweets decreased, while positive and negative Tweets increased with time The majority of Tweets are negative, followed by positive and neutral Tweets. The intensity of negative Tweets is higher than that of positive Tweets The proposed model achieves a validation accuracy of 94%; Indians exhibit positive reactions and are less inclined to spread negativity


Author(s) and year Yousefinaghani et al. (2021)

Hum (2020)

Raghupathi et al. (2020)

S. No. 9.

10.

11.

Table 2 (continued)

Twitter

Twitter

Dataset Twitter

Clustering

NLP, Data visualization

ML/DL algorithms/ Techniques applied –



Positive, Neutral, and Negative

Polarity classes Positive, Neutral, and Negative

Preprocessing/ Feature selection/Other technique(s) applied Dropping punctuations, Unicode errors, email IDs, web-links, currency symbols and numbers, VADER Removal of @’s, hashtags, stop words, and links TF-IDF Word count frequencies and co-occurrences, and sentiment score

-

Performance parameter(s) studied The average number of retweets, favourites, replies and quotes

Python

Python

Tool(s)/ Software employed Python

About 77% of users feel the need for new/better vaccines; Trends of positive attitudes have been observed in connection with public health issues

Positive: 40–42% Neutral: 41–48% Negative: 12–16%

Result(s) Neutral Tweets are most common; Opinions and sentiments change with vaccine-related events


Samuel et al. (2020)

Tavoschi et al. (2020)

Zhang et al. (2020)

12.

13.

14.

Twitter

Italian Twitter stream

Twitter

SVM, BERT, Bidirectional Gated Recurrent Unit (BiGRU)

SVM

Linear Regression (LinR), NB, LR, and K-Nearest Neighbor (KNN)

Positive, Neutral, and Negative

In favour, Against, and Neutral

Positive, and Negative

Dropping hashtags, @’s, URLs, replacing lowercases, and cross-validation

Tokenization, removal of various PoS, stop words, stemming, and cross-validation

Removal of stop words, tokenization, part-of-speech (PoS) tagging



R, Weka toolkit and JAVA APIs

p-value, and R2

F1-score, micro-F1, SD, and Root Mean Square Error (RMSE)

R

Accuracy

(continued)

NB and LR exhibited the classification accuracies of 91% and 74% for shorter Tweets respectively but they were not effective for longer Tweets Neutral: 60%, Against: 23%, In favour: 17%; It has been observed that events related to vaccination influence the polarity of Tweets Fine-tuned BERT model performed the best


Mitra et al. (2016)

Numnark et al. (2014)

Salathé and Khandelwal (2011)

16.

17.

18.

S. No. 15.

Author(s) and year On et al. (2019)

Table 2 (continued)

Twitter

Twitter and RSS feed

Dataset Korean Social Media posts’ database Twitter

NB, Maximum Entropy (ME), and Dynamic Language Model (DML)

Visualization

ML/DL algorithms/ Techniques applied LR, Association rules Meaning Extraction Method (MEM)

Positive, Negative, Neutral and Irrelevant



Polarity classes Positive, Neutral, and Negative Positive, and Negative



Tokenization, PoS tagging, information tagging



Preprocessing/ Feature selection/Other technique(s) applied –

Sentiment score, vaccination coverage



p-value, mean difference

Performance parameter(s) studied p-value

NLTK, and MegaM

MySQL, VaccineWatch



Tool(s)/ Software employed R

Continuous anti-vaccine mentality tends to exhibit conspiratorial thinking and mistrust The proposed system enabled the users to focus on the parameters of their interest Neutral responses are highest

Result(s) 64.08% of posts are positive


Fig. 1 Methodology for the present research

or sentiment [21]. Tweets related to COVID-19 vaccination posted by users across various nations over a period of about six months (from 18th August 2020 to 14th February 2021) are extracted from Twitter. We propose a three-stage framework for this research, which takes Twitter data as input and produces a comparative analysis report based on certain parameters. In the first stage, Tweets are scraped from the Twitter platform, and pre-processing and cleaning techniques are then used to repair and eliminate dirty data. In the second stage, the pre-processed data are visualized to find hidden relationships among various attributes, and the BoW and TF-IDF word embedding models are applied to represent the Twitter text as numerical vectors. In the final stage, ML algorithms, namely MNB, AdaBoost, and LR, are applied to the processed dataset to classify users’ opinions into positive, negative, and neutral sentiments, and their performance is measured in terms of accuracy, precision, recall, and F1-score. Finally, the present study compares the outputs of these three classifiers on these parameters. The elements of the proposed framework are depicted in Fig. 1.
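The evaluation in the final stage can be illustrated with a small sketch that computes overall accuracy and one-vs-rest precision, recall, and F1 for each sentiment class, using the −1/0/1 coding for negative, neutral, and positive Tweets described in Sect. 3.1. The function names and the toy label lists are hypothetical; they only show how the metrics are defined and are not taken from the study.

```python
from collections import Counter

LABELS = (-1, 0, 1)  # negative, neutral, positive


def accuracy(y_true, y_pred):
    """Fraction of Tweets whose predicted class matches the reference class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {class: (precision, recall, f1)}, treating each class one-vs-rest.

    A score whose denominator is zero is reported as 0.0.
    """
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    scores = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Toy labels invented for illustration only.
    y_true = [1, 0, 0, -1, 1, 0]
    y_pred = [1, 0, 1, -1, 0, 0]
    print(round(accuracy(y_true, y_pred), 3))  # 0.667
    for label, (p, r, f) in per_class_scores(y_true, y_pred).items():
        print(label, round(p, 2), round(r, 2), round(f, 2))
```

The same per-class quantities are reported by `classification_report` and `precision_recall_fscore_support` in scikit-learn's `sklearn.metrics` module, which would be the usual choice in a full pipeline.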

3.1 Raw-Data Acquisition The virtual digital world has been producing unprecedented volumes of data with each passing day. According to some projections, there will be about 175 zettabytes of data by 2025. The social media platforms such as Twitter contribute a substantial

14

K. Kumar and B. P. Pande

Table 3 Key summary of the dataset
Attribute | Summary
Number of Tweets analyzed | 112,431
Number of unique users | 61,213
First Tweet time | 18th August 2020, 10:26:43
Last Tweet time | 14th February 2021, 3:34:16
Number of unique locations | 16,887
Total hashtag count | 30,000
Number of verified users | 14,058
Number of devices used to Tweet | 221

fraction of the total amount of data produced every day, and Twitter is one of the most widely accessed digital platforms in the world [22]. Twitter provides easy-to-use web-based application programming interfaces (APIs), which encourage researchers to analyze people’s sentiments from the text of Tweets. An automatic technique called scraping has been used to collect data from the Twitter portal. Scraping is a data extraction method used to retrieve data from a variety of sources, including social media platforms such as Facebook, Twitter, LinkedIn and Instagram, as well as websites, blogs and even PDF documents. The Python library pytwitterscraper was used to retrieve the Twitter data, which were saved into a data frame on a local machine. One of the key benefits of the pytwitterscraper library is that it enables users to scrape Tweets from any period, despite the Twitter API’s limits on the number of Tweets that can be retrieved. In the present study, a total of 112,431 COVID-19 vaccine-related Tweets posted between 18th August 2020 and 14th February 2021 by users all over the world were extracted with pytwitterscraper version 1.3.4.post2. Specific hashtags such as ‘CovidVaccine’, ‘COVID19Vaccine’, ‘LargestVaccineDrive’, ‘CovidVaccineModerna’ and ‘CovidVaccinePfizer’ were used to extract the Tweets. The complete dataset is saved in the Comma Separated Values (CSV) format, in which commas separate the values of different fields. Table 3 summarizes the important attributes of the dataset under study. Since the goal of this study is to analyze public sentiment from the text of Tweets, two Python text libraries have been employed: the Natural Language Toolkit (NLTK) and TextBlob, which are among the most widely used libraries for text processing. TextBlob is applied to determine the polarity score of each Tweet, i.e., whether the Tweet is negative, neutral or positive.
After applying the TextBlob library, the Tweets are divided into three sentiment classes, −1, 0 and 1, where −1 corresponds to a negative Tweet, 0 to a neutral Tweet and 1 to a positive Tweet. To determine the compound score of each Tweet, the SentimentIntensityAnalyzer of the VADER lexicon-based analysis module in the NLTK library is used. Table 4 lists the attributes retrieved for the Twitter users and their Tweets, together with a brief description of each.

Table 4 Description of each retrieved attribute
S. No. | Attribute name | Description | Data type
1 | user_name | Name of the Twitter user | Object (String)
2 | user_location | Location of the user | Object (String)
3 | user_description | User's self-description, such as hobbies, profession, age, etc. | Object (Integer and characters)
4 | user_created | Date on which the user created his/her Twitter profile | Object (Float)
5 | user_followers | Number of followers the user has on Twitter | Int 64
6 | user_friends | Number of friends the user has on Twitter | Int 64
7 | user_favourites | Number of Tweets the user has marked as favorites (liked) | Int 64
8 | user_verified | Whether the account is verified as authentic, notable and active | Boolean (Yes/No)
9 | date | Date on which the Tweet was posted | Object
10 | text | Content of the Tweet | Object (String)
11 | hashtags | Keywords used to search for and link with trending topics | Object
12 | source | Source (application or device) used to post the Tweet | Object
13 | is_retweet | Whether the Tweet is a re-post (retweet) | Boolean (Yes/No)
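As a minimal sketch of how a CSV file with the attributes of Table 4 might be loaded, the example below uses only Python's standard `csv` module (the chapter itself relies on Pandas). It keeps a subset of columns and fills missing values, two of the operations described in Sect. 3.2. The kept columns (`KEEP`), the placeholder value and the two sample records are illustrative assumptions, not the authors' actual selection or data.

```python
import csv
import io

# Illustrative assumption: which attributes the chapter kept is not stated.
KEEP = ["user_location", "date", "text", "hashtags", "source"]


def load_tweets(fileobj, keep=KEEP, fill=""):
    """Read Tweet records from a CSV file object.

    Only the columns in `keep` are retained; absent or empty cells are
    replaced by the placeholder `fill`.
    """
    rows = []
    for record in csv.DictReader(fileobj):
        rows.append({col: (record.get(col) or fill) for col in keep})
    return rows


# Hypothetical two-row sample using a subset of the Table 4 column names.
sample = io.StringIO(
    "user_name,user_location,date,text,hashtags,source,is_retweet\n"
    "alice,India,2020-08-18 10:26:43,Vaccine news,['CovidVaccine'],Twitter Web App,False\n"
    "bob,,2021-02-14 03:34:16,Waiting for my dose,,Twitter for Android,False\n"
)
tweets = load_tweets(sample, fill="unknown")
```

Dropping columns at load time keeps the in-memory records small; the `fill` placeholder makes missing locations and hashtags explicit instead of leaving empty strings.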

Analysis of Public Perceptions Towards the COVID-19 Vaccination Drive: A. . . 15


3.2 Data Pre-processing
Users’ Tweets are unstructured text, which poses a major challenge for SA researchers. Such unstructured data contain noise and inconsistencies and therefore must be pre-processed and cleaned before being fed to ML models. Text pre-processing is a vital step in text mining because it transforms raw text into a form suitable for text analysis and visualization. Pre-processing reduces the noise present in the data, which in turn increases the processing speed and classification accuracy of the ML models. The following pre-processing steps are applied to enhance the quality of the data: (i) filling missing values, (ii) removal of stop and rare words, (iii) removal of emojis, (iv) removal of punctuation, (v) removal of multiple spaces and numeric digits, (vi) removal of special symbols, hyperlinks and URLs, (vii) removal of expressions, (viii) tokenization, (ix) lemmatization and (x) case conversion. To improve the performance of the classifier models, some irrelevant attributes were also dropped after the pre-processing step. Table 5 presents a step-by-step process for transforming an unstructured Tweet into structured text to be consumed by ML models; each pre-processing step turns the Tweet into a more refined version.
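The cleaning steps listed above can be sketched with Python's standard `re` module. This is a simplified illustration, not the authors' pipeline: URLs are removed first (so that punctuation removal cannot break them apart), emojis and mis-encoded characters are dropped crudely as non-ASCII characters, and the tiny `STOP_WORDS` set is a hypothetical stand-in for a full list such as NLTK's. Rare-word removal and lemmatization (for example with NLTK's WordNet lemmatizer, which turns 'updates' into 'update') are omitted to keep the block dependency-free.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word list (NLTK's list is far larger).
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "of", "the", "to", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # hyperlinks and web addresses
NON_ASCII_RE = re.compile(r"[^\x00-\x7f]")     # emojis, curly quotes, mojibake
NON_LETTER_RE = re.compile(r"[^a-z\s]")        # punctuation, symbols, digits
SPACES_RE = re.compile(r"\s+")                 # runs of whitespace


def preprocess(tweet: str) -> List[str]:
    """Clean a raw Tweet and return its lowercase tokens without stop words."""
    text = URL_RE.sub(" ", tweet)
    text = NON_ASCII_RE.sub(" ", text)
    text = text.lower()
    text = NON_LETTER_RE.sub(" ", text)
    text = SPACES_RE.sub(" ", text).strip()
    return [tok for tok in text.split(" ") if tok and tok not in STOP_WORDS]
```

Because digits are stripped after lowercasing, 'COVID-19' becomes the token 'covid', mirroring the digit-removal step shown in Table 5.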

3.3 Data Visualization
This section deals with Tweet analytics, where we explore important aspects, trends and hidden relationships in the Tweets. Data visualization makes it possible to understand data more effectively and efficiently through various pictorial representations. Python libraries such as Pandas, NLTK, seaborn, ggplot, Missingno, Flashgeotext and plotly have been used for pre-processing and data visualization.

3.3.1 Geographical Analytics of Tweets
It is of interest to find out how the public from various countries engaged in discussions about the COVID-19 vaccination campaign. We found 16,887 unique locations from which Tweets about this vaccination campaign were posted. These locations include only the names of towns, cities, states or provinces, and therefore Tweet metadata are exploited to convert them into country names. Geocoding services, Tweet time zones and geo-coordinates are used to translate user locations into country names. User locations are represented as latitude and longitude coordinates, from which a country name can easily be extracted with a geocoding service such as the Google Maps Geocoding API. Figure 2 depicts the top 10 countries and their percentage share of the total Tweets about the vaccination campaign across the globe.

Table 5 Data pre-processing steps and outcomes
S. No. | Pre-processing treatment | Output
1 | Initial Tweet | CORONAVIRUS UPDATES – “American businessman with Turkish-Armenian roots leads COVID-19 vaccine development in the Laboratory”!! #€¦ “it â€™s #Covid19Millionares #CovidVaccine”https://t.co/EsBS5MC4kW
2 | Removal of emojis | CORONAVIRUS UPDATES – “American businessman with Turkish-Armenian roots leads COVID-19 vaccine development in the Laboratory” !!#€¦ “it â€™s #Covid19Millionares #CovidVaccine”https://t.co/EsBS5MC4kW
3 | Removal of stop and rare words | CORONAVIRUS UPDATES – “American businessman Turkish-Armenian roots leads COVID-19 vaccine development Laboratory”!!#€¦ “it â€™s #Covid19Millionares #CovidVaccine”https://t.co/EsBS5MC4kW
4 | Removal of special symbols and punctuation | CORONAVIRUS UPDATES American businessman Turkish Armenian roots leads COVID19 vaccine development Laboratory Covid19Millionares CovidVaccine https://t.co/EsBS5MC4kW
5 | Conversion to lowercase | coronavirus updates american businessman turkish armenian roots leads covid19 vaccine development laboratory covid19millionares covidvaccine https://t.co/EsBS5MC4kW
6 | Removal of hyperlinks, multiple spaces | coronavirus updates american businessman turkish armenian roots leads covid19 vaccine development laboratory covid19 millionares covidvaccine https://t.co/EsBS5MC4kW
7 | Removal of web addresses and numeric digits | coronavirus updates american businessman turkish armenian roots leads covid vaccine development laboratory covid millionares covidvaccine
8 | Lemmatization and tokenization | coronavirus update american businessman turkish armenian root lead covid vaccine development laboratory covid millionares covidvaccine

Fig. 2 Top 10 countries involved in posting COVID-19 vaccination Tweets across the world

Fig. 3 Top 10 Twitter users who posted about the COVID-19 global vaccination drive

3.3.2 Users with Maximum Tweets
In the Twitter dataset under study, 61,213 unique users posted Tweets on the vaccination drive. We aim to identify users who have been continuously involved in discussions specifically related to the COVID-19 vaccination drive. Figure 3 depicts the top 10 Twitter users who wrote about this global vaccination program and the percentage share of their Tweets.
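The percentage shares plotted in Figs. 2 and 3 amount to counting values and normalizing by the total. The sketch below assumes a hypothetical, hand-made `GAZETTEER` dictionary for turning free-text locations into country names; the chapter instead relied on geocoding services, time zones and geo-coordinates, so this lookup is only a stand-in for that step. The same `top_shares` helper applies unchanged to user names for Fig. 3.

```python
from collections import Counter

# Hypothetical gazetteer for illustration only; the chapter resolved locations
# with geocoding services, Tweet time zones and geo-coordinates instead.
GAZETTEER = {
    "mumbai": "India",
    "new delhi": "India",
    "london": "United Kingdom",
    "new york": "United States",
    "california": "United States",
}


def to_country(location):
    """Map a free-text location to a country via the gazetteer, else None."""
    for part in (location or "").lower().split(","):
        country = GAZETTEER.get(part.strip())
        if country:
            return country
    return None


def top_shares(values, n=10):
    """The n most common non-missing values with their percentage share."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return [(v, round(100 * c / total, 2)) for v, c in counts.most_common(n)]
```

Unresolved locations are excluded before the shares are computed, so percentages refer only to Tweets whose country could be determined.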

3.3.3 Most Frequent Hashtags
The dataset under study contains a total of 30,000 hashtags used by Twitter users. We identify the hashtags most popular with the general public when expressing their views and perceptions regarding the current vaccination drive. Figure 4 reveals the most frequent hashtags used by Twitter users when writing about this topic.

Fig. 4 Most popular hashtags among users
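Hashtag frequencies such as those in Fig. 4 can be obtained by extracting `#tag` tokens from the Tweet text and counting them. This sketch assumes a simple regular expression and case-insensitive counting (so #CovidVaccine and #covidvaccine are merged); both are assumptions, and the dataset's own `hashtags` attribute (Table 4) could be counted instead.

```python
import re
from collections import Counter

# Assumed pattern: '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(texts):
    """Count hashtags across Tweet texts, merging differently capitalized forms."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts
```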

3.3.4 Monthly Statistics of Tweets
Since the onset of the COVID-19 pandemic, discussions about a permanent remedy had been under way, and as time passed, people increasingly joined debates about a feasible vaccine and related issues. Figure 5 presents the frequency of Tweets about COVID-19 vaccination from mid-August 2020 to mid-February 2021. The frequency of Tweets peaks in November 2020, when the Pfizer-BioNTech alliance requested emergency use authorization (EUA) for the vaccine it had developed. The next surge of discussions on the COVID-19 vaccination drive can be observed in January 2021, and such discussions have persisted since then. By January 2021, nine different vaccine technologies had emerged across the globe and the campaign for a massive vaccination drive had gained pace.
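Counting Tweets per month, as plotted in Fig. 5, needs only the posting date of each Tweet. The sketch assumes timestamps formatted as `YYYY-MM-DD HH:MM:SS` (an assumption about the `date` attribute) and buckets them by calendar month.

```python
from collections import Counter
from datetime import datetime


def tweets_per_month(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count Tweets per calendar month; keys are 'YYYY-MM' strings in order."""
    months = Counter()
    for ts in timestamps:
        months[datetime.strptime(ts, fmt).strftime("%Y-%m")] += 1
    return dict(sorted(months.items()))
```

Parsing with `datetime.strptime` rather than slicing the string makes malformed timestamps fail loudly instead of silently creating bogus months.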

3.3.5 Textual Analysis of Tweets
A high-level text visualization, popularly known as a Wordcloud, is presented for better insight into the processed Twitter dataset. A word that occurs more frequently in the corpus is displayed in a relatively larger and bolder font in the Wordcloud.


Fig. 5 Frequency of Tweets about the COVID-19 vaccine

Fig. 6 Wordcloud of the frequent words in Tweets about the COVID-19 vaccination drive

More precisely, a word displayed in a larger size carries more weight than a word displayed in a smaller size [23]. The Wordcloud for the complete corpus of Tweets under study is presented in Fig. 6. The terms ‘covid19vaccine’, ‘vaccine’ and ‘covid’ are the most common in the corpus and accordingly carry the most weight. The present research also found that the active users who have been posting regularly about the COVID-19 vaccination drive come from China, the US, India and the UK (Fig. 2). Wordclouds for these four countries were therefore generated and are presented in Fig. 7.
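The size rule behind a Wordcloud can be sketched as a linear mapping from word frequency to font size. This is a simplified illustration: `min_size`, `max_size` and the cut-off `top` are hypothetical parameters, and word-cloud packages such as the Python `wordcloud` library use their own scaling and also handle word placement.

```python
from collections import Counter


def font_sizes(tokens, top=50, min_size=10, max_size=80):
    """Font size per frequent word, growing linearly with its relative frequency.

    The most frequent word gets max_size; sizes approach min_size as a
    word's frequency approaches zero.
    """
    ranked = Counter(tokens).most_common(top)
    if not ranked:
        return {}
    highest = ranked[0][1]
    return {
        word: round(min_size + (max_size - min_size) * freq / highest)
        for word, freq in ranked
    }
```

For a corpus in which 'vaccine' occurs twice as often as 'covid', 'vaccine' receives the maximum size and 'covid' a size halfway along the scale.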


Fig. 7 Wordclouds of Tweets’ text of users from top four countries

3.3.6 Word and Phrase Associations
When classifying people’s sentiments, the information carried by words and their context plays a critical role. A significant aspect of textual analysis is identifying the most commonly used words, word pairs and word sequences in the corpus; in computational linguistics and NLP this is known as N-gram identification. For the Twitter data on the COVID-19 vaccination program, the most common single words (unigrams), two-word sequences (bigrams) and three-word sequences (trigrams) have been identified, and their frequencies are presented in Fig. 8.
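Counting unigrams, bigrams and trigrams, as in Fig. 8, reduces to sliding a window of length n over each token list and tallying the windows. The helper below works on token lists and, unchanged, on plain strings, where it yields character n-grams such as those of the word RESEARCHER discussed in Sect. 4.1; it is a minimal sketch rather than the authors' code.

```python
from collections import Counter


def ngrams(seq, n):
    """Contiguous n-grams of a token list (as tuples) or of a string (as substrings)."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    # Lists are unhashable, so token n-grams are converted to tuples for counting.
    return grams if isinstance(seq, str) else [tuple(g) for g in grams]


def top_ngrams(docs, n, k=10):
    """The k most frequent n-grams across tokenized documents, with counts."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)
```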

Fig. 8 Most frequent unigrams, bigrams and trigrams in COVID-19 vaccination Tweets

4 Experimental Design, Results and Discussions
This section presents the experimental design and empirical analysis for classifying users’ emotions about the COVID-19 vaccination drive. The public sentiments expressed in Tweets are examined by treating the Tweet text as the independent variable and the sentiment class as the dependent variable. The NLTK and TextBlob libraries are used to evaluate the polarity and subjectivity of Tweets on the COVID-19 vaccination program. The polarity of each Tweet is a score from −1 to 1, assigned on the basis of the words the user employs to express his or her perception of the vaccine. Using the TextBlob library, the vaccination Tweets are classified into three classes: −1 (negative), 0 (neutral) and 1 (positive). Each Tweet is also assigned a subjectivity score ranging from 0 to 1, where a value near 0 indicates an objective Tweet and a value near 1 a subjective Tweet. The VADER lexical analyzer is used to assign each Tweet a compound sentiment intensity score between −1 and 1. Figure 9 presents a snapshot of a few processed Tweets from our experiments along with their sentiment scores, subjectivity scores, polarity scores and sentiment classes, and Table 6 lists sample pre-processed Tweets from the three sentiment classes with the same scores. The negative, neutral and positive VADER scores are the proportions of a Tweet’s text that fall into each category, and therefore sum to 1. Figure 10 shows the distribution of neutral, positive and negative Tweets about the global COVID-19 vaccination drive: most people hold a neutral opinion about this vaccination program.


Fig. 9 Processed Tweets and experimental scores

Table 6 Sample pre-processed Tweets and various analysis scores
Tweet text | Sentiment scores | Subjectivity score | Polarity | Analysis
“Australia to Manufacture Covid-19 Vaccine and give it to the Citizens for free of cost AFP quotes Prime Minister CovidVaccine” | Negative: 0.0, Neutral: 0.858, Positive: 0.142, Compound: 0.5106 | 0.800000 | 0.400000 | Positive
“Is more dangerous yet to come? Don’t know when CovidVaccine will come out” | Negative: 0.184, Neutral: 0.816, Positive: 0.0, Compound: 0.5859 | 0.700000 | −0.050000 | Negative
“Serum Institute India is looking to raise up to 1 billion around Rs 7 500 crore for Covid-19 vaccine development” | Negative: 0.0, Neutral: 1.0, Positive: 0.0, Compound: 0.0 | 0.00000 | 0.00000 | Neutral
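The mapping from a TextBlob polarity score to the three classes can be written as a sign function. The zero thresholds are an assumption, chosen because they reproduce the labels in Table 6. In practice the scores would come from `TextBlob(text).sentiment.polarity` and, for the VADER values, from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)`, which returns `neg`, `neu`, `pos` and `compound` entries; those libraries are left out so the sketch runs without extra downloads.

```python
# Class codes used in the chapter: -1 negative, 0 neutral, 1 positive.
LABELS = {-1: "Negative", 0: "Neutral", 1: "Positive"}


def sentiment_class(polarity: float) -> int:
    """Map a polarity score in [-1, 1] to -1, 0 or 1 by its sign (assumed rule)."""
    if polarity > 0:
        return 1
    if polarity < 0:
        return -1
    return 0


# Polarity scores of the three sample Tweets in Table 6.
table6_polarities = [0.4, -0.05, 0.0]
labels = [LABELS[sentiment_class(p)] for p in table6_polarities]
```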

Fig. 10 Distribution of the sentiments about the COVID-19 vaccination drive (Tweet counts: Neutral 45,500; Positive 40,600; Negative 26,331)

4.1 Feature Selection
Many researchers have highlighted that irrelevant or partially relevant features may negatively affect the predictive capability of ML models. Feature selection is an essential process that helps build more efficient classification models and increases accuracy [24]. Such techniques derive an optimal set of features for ML classifiers: by choosing a subset of appropriate features for model building, they reduce training time and help prevent overfitting. Since ML algorithms cannot work directly on textual data, the BoW and TF-IDF models are used to represent the processed Twitter text as numerical vectors. BoW is a text modelling technique used in NLP, often described as feature extraction from textual data. It represents a document by the words it contains and how often they occur, disregarding the order in which they appear. BoW expresses each document as a fixed-length vector over the vocabulary, where each component is a numerical value such as the term frequency or TF-IDF weight [25]. TF-IDF, on the other hand, assigns each word a score that reflects its importance to a document relative to the whole corpus: the frequency of a term in a document is multiplied by its inverse document frequency, which decreases as the number of documents containing the term grows, so that terms common across the corpus receive low weights [26].

For feature selection, the N-gram model is used to extract meaningful multi-gram features from the processed Twitter corpus. An N-gram model estimates the probability of a given N-gram occurring within a sequence of words in the target language. N-grams are usually collected from a text or speech corpus, and their elements can be words, letters, phonemes, or pairs and sequences of these, depending on the application. N-grams are generated by splitting a text string into overlapping fixed-length substrings; for example, the character 3-grams of the word RESEARCHER are ‘RES’, ‘ESE’, ‘SEA’, ‘EAR’, ‘ARC’, ‘RCH’, ‘CHE’ and ‘HER’ [27, 28].
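To make the difference between the two representations concrete, the sketch below builds word n-gram count vectors (BoW) and textbook TF-IDF weights, tf × log(N/df), in plain Python. It is an illustration, not the authors' setup: a scikit-learn workflow would more likely use `CountVectorizer` and `TfidfVectorizer` with the `ngram_range` parameter, and `TfidfVectorizer` by default uses a smoothed idf and L2-normalizes each row, so its numbers differ from the ones produced here. The three toy documents are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Join each run of n consecutive tokens into a single feature string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_vectors(docs, n=1):
    """Return the sorted n-gram vocabulary and one count vector per document."""
    grams = [Counter(word_ngrams(doc, n)) for doc in docs]
    vocab = sorted(set().union(*grams))
    return vocab, [[g[term] for term in vocab] for g in grams]


def tfidf_vectors(docs, n=1):
    """Weight each count by log(N / df), where df counts documents containing the term."""
    vocab, counts = bow_vectors(docs, n)
    num_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(num_docs / d) for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


docs = [["covid", "vaccine", "safe"], ["covid", "vaccine", "delay"], ["vaccine", "safe"]]
vocab, bow = bow_vectors(docs)
_, tfidf = tfidf_vectors(docs)
```

A term such as 'vaccine' that occurs in every document gets an idf of log(1) = 0, which is exactly the down-weighting of corpus-wide terms described above; passing `n=2` switches both representations to bigram features.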

4.2 Platform Employed and Performance Evaluation Parameters
The study is implemented using Google Colab, Google’s cloud-based research platform for writing and executing Python scripts. For training and testing the ML classifiers, the dataset is divided into a training set and a testing set in a 75%:25% ratio. Accuracy, precision, recall and F1-score are used to assess the performance of the classifiers. The simplest and most fundamental performance parameter is accuracy, the ratio of correctly predicted observations to total observations. Accuracy is most meaningful when the dataset is balanced and the counts of false positives and false negatives are almost equal. Precision is the ratio of correctly predicted positive observations to all predicted positive observations; it is a good metric to use when false positives are costly, and an effective classifier should have a precision close to 1. Recall is the ratio of correctly predicted positive observations to all observations in the actual class; when the classes are highly unbalanced, recall is a useful indicator of prediction success. The F1-score is the harmonic mean of precision and recall, F1 = 2 × precision × recall / (precision + recall); it is the preferable statistic when a balance between precision and recall is needed and the class distribution is uneven [29, 30].

The Python library Scikit-learn, which includes several ML models, is used for sentiment prediction and classification with three classifiers: MNB, AdaBoost and LR. The performance of the classifiers using the BoW representation with N-gram feature selection is shown in Table 7. The results in Table 7 clearly show that the LR classifier with the bigram approach performed better than the other classifiers: it attained an accuracy of 87%, with a neutral-class precision of 90%, recall of 88% and F1-score of 93%. Since our Twitter dataset contains a higher percentage of neutral sentiments, the precision, recall and F1-scores for the neutral class are relatively higher than those for the positive and negative classes. The AdaBoost classifier with trigram features achieved second place and the MNB classifier with trigram features third place in terms of these performance parameters.
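The metric definitions above can be checked with a few lines of plain Python. This sketch computes accuracy and one-vs-rest precision, recall and F1 for a chosen class, using toy label lists that are purely hypothetical. In a scikit-learn workflow, `sklearn.model_selection.train_test_split(..., test_size=0.25)` would give the 75%/25% split and `sklearn.metrics.classification_report` the per-class figures.

```python
def accuracy(y_true, y_pred):
    """Share of observations whose predicted class equals the actual class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy labels using the chapter's codes: -1, 0, 1.
y_true = [1, 1, 0, 0, -1, -1, 0, 1]
y_pred = [1, 0, 0, 0, -1, 1, 0, 1]
```

For the neutral class (0) of the toy labels, precision is 0.75 and recall 1.0, so F1 is 6/7 ≈ 0.857: as a harmonic mean it always lies between the two.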
Figure 11 compares the three best classifiers under BoW. The performance of the classifiers using the TF-IDF representation with N-gram feature selection is shown in Table 8. Under TF-IDF, the AdaBoost classifier with the bigram approach performed best, attaining an accuracy of 89%, with a neutral-class precision of 92%, recall of 89% and F1-score of 86%. The LR classifier with the trigram approach achieved second place. TF-IDF degraded the performance of the MNB classifier, whose best configuration (unigram) ranked sixth. Figure 12 compares the three classifiers under the TF-IDF approach. From the above discussion, the average classification accuracy of the MNB classifier is higher with BoW (76%) than with TF-IDF (72%). This classifier is based on the multinomial distribution and follows a generative approach in which all attributes are assumed to be conditionally independent of each other given the class. For the AdaBoost classifier, the average classification accuracy achieved with TF-IDF (84%) is higher than with BoW (78%). AdaBoost is an ensemble method that builds a strong classifier from a sequence of weak classifiers by iteratively adjusting the weights of the training samples. For the LR classifier, the average classification accuracy obtained with BoW (81%) is higher than with TF-IDF (78%). One possible reason for the better accuracy of BoW over TF-IDF for these two classifiers is the high dimensionality of the feature vectors, due to the large vocabulary size, and their high sparsity.

Table 7 Performance comparison of ML classifiers through the BoW method
Classifier | Accuracy | Negative precision | Negative recall | Negative F1 | Neutral precision | Neutral recall | Neutral F1 | Positive precision | Positive recall | Positive F1
MNB(Unigram) | 0.72 | 0.75 | 0.78 | 0.72 | 0.91 | 0.90 | 0.86 | 0.88 | 0.87 | 0.90
MNB(Bigram) | 0.75 | 0.79 | 0.81 | 0.85 | 0.92 | 0.95 | 0.89 | 0.86 | 0.88 | 0.91
MNB(Trigram) | 0.80 | 0.81 | 0.81 | 0.85 | 0.90 | 0.92 | 0.86 | 0.90 | 0.89 | 0.91
AdaBoost(Unigram) | 0.75 | 0.75 | 0.79 | 0.81 | 0.93 | 0.89 | 0.85 | 0.85 | 0.86 | 0.83
AdaBoost(Bigram) | 0.78 | 0.70 | 0.76 | 0.80 | 0.90 | 0.94 | 0.93 | 0.75 | 0.83 | 0.80
AdaBoost(Trigram) | 0.81 | 0.77 | 0.80 | 0.82 | 0.90 | 0.92 | 0.83 | 0.93 | 0.90 | 0.90
LR(Unigram) | 0.79 | 0.83 | 0.84 | 0.76 | 0.92 | 0.93 | 0.90 | 0.80 | 0.93 | 0.86
LR(Bigram) | 0.87 | 0.87 | 0.86 | 0.85 | 0.90 | 0.88 | 0.93 | 0.90 | 0.87 | 0.91
LR(Trigram) | 0.78 | 0.78 | 0.76 | 0.80 | 0.89 | 0.89 | 0.86 | 0.92 | 0.84 | 0.90

Fig. 11 Comparison of the top three classifiers under the BoW approach: LR (Bigram), AdaBoost (Trigram) and MNB (Trigram); accuracy, precision, recall and F1 score (%)

Fig. 12 Comparison of the three classifiers under the TF-IDF approach: AdaBoost (Bigram), LR (Trigram) and MNB (Unigram); accuracy, precision, recall and F1 score (%)

Table 8 Performance comparison of ML classifiers through the TF-IDF method
Classifier | Accuracy | Negative precision | Negative recall | Negative F1 | Neutral precision | Neutral recall | Neutral F1 | Positive precision | Positive recall | Positive F1
MNB(Unigram) | 0.77 | 0.75 | 0.80 | 0.82 | 0.90 | 0.91 | 0.84 | 0.86 | 0.82 | 0.84
MNB(Bigram) | 0.72 | 0.72 | 0.74 | 0.76 | 0.80 | 0.85 | 0.87 | 0.90 | 0.91 | 0.87
MNB(Trigram) | 0.68 | 0.70 | 0.70 | 0.68 | 0.75 | 0.72 | 0.80 | 0.79 | 0.82 | 0.76
AdaBoost(Unigram) | 0.82 | 0.84 | 0.85 | 0.81 | 0.93 | 0.94 | 0.90 | 0.91 | 0.95 | 0.89
AdaBoost(Bigram) | 0.89 | 0.87 | 0.89 | 0.84 | 0.92 | 0.89 | 0.86 | 0.92 | 0.89 | 0.86
AdaBoost(Trigram) | 0.80 | 0.81 | 0.78 | 0.80 | 0.90 | 0.86 | 0.84 | 0.90 | 0.90 | 0.87
LR(Unigram) | 0.78 | 0.80 | 0.82 | 0.84 | 0.88 | 0.90 | 0.84 | 0.83 | 0.84 | 0.90
LR(Bigram) | 0.72 | 0.74 | 0.79 | 0.79 | 0.84 | 0.82 | 0.82 | 0.86 | 0.82 | 0.84
LR(Trigram) | 0.83 | 0.83 | 0.88 | 0.84 | 0.86 | 0.86 | 0.90 | 0.92 | 0.88 | 0.89
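The average accuracies quoted in the discussion (BoW vs TF-IDF: 76% vs 72% for MNB, 78% vs 84% for AdaBoost, 81% vs 78% for LR) are plain means over the unigram, bigram and trigram rows of Tables 7 and 8. The short check below recomputes them from the accuracy columns transcribed from those tables.

```python
# Accuracy columns of Tables 7 (BoW) and 8 (TF-IDF): unigram, bigram, trigram.
ACCURACY = {
    "BoW": {"MNB": [0.72, 0.75, 0.80], "AdaBoost": [0.75, 0.78, 0.81], "LR": [0.79, 0.87, 0.78]},
    "TF-IDF": {"MNB": [0.77, 0.72, 0.68], "AdaBoost": [0.82, 0.89, 0.80], "LR": [0.78, 0.72, 0.83]},
}


def average_accuracy(table):
    """Mean accuracy per classifier and representation, as a whole percentage."""
    return {
        scheme: {clf: round(100 * sum(vals) / len(vals)) for clf, vals in clfs.items()}
        for scheme, clfs in table.items()
    }


averages = average_accuracy(ACCURACY)
```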


5 Conclusions
The ease with which social media platforms can be accessed through their APIs has resulted in an explosion of data services and research domains such as text mining, web scraping and SA. The present research analyzed the sentiments and opinions of the general public about the global COVID-19 vaccination drive expressed through Tweets. The Tweets were scraped, pre-processed and visualized, and then transformed into character N-grams and numerical vectors with the BoW and TF-IDF techniques. The resulting data were submitted to three separate classifiers, viz. MNB, AdaBoost and LR, and their performance was evaluated against standard parameters. The empirical results revealed that most people hold neutral opinions about the COVID-19 vaccination drive, and that people with an optimistic mindset outnumber those with pessimistic sentiments towards this vaccination campaign. With the BoW technique, the LR classifier with the bigram approach outperformed the other models by achieving the highest accuracy; with the TF-IDF technique, the AdaBoost classifier with the bigram approach outperformed the other models and achieved the highest accuracy. The MNB and LR classifiers worked more effectively with the BoW approach, while the AdaBoost classifier was more effective with the TF-IDF technique. An extension of the present research may increase the amount and quality of experimental data, which may help the ML classifiers produce better predictions. With a larger and more balanced dataset, in which the positive, negative and neutral classes have approximately equal numbers of training samples, accuracy could be further improved.

References
1. Ritchie H et al (2021) Coronavirus (COVID-19) vaccinations. Our World Data
2. Infographic RE (2021) Public opinion about COVID-19 vaccination. KANTAR
3. Alam KN, Khan MS, Dhruba AR, Khan MM, Al-Amri JF, Masud M, Rawashdeh M (2021) Deep learning-based sentiment analysis of COVID-19 vaccination responses from Twitter data. Comput Math Methods Med 2021:Article ID 4321131, 1–15
4. Ansari MTJ, Khan NA (2021) Worldwide COVID-19 vaccines sentiment analysis through Twitter content. Electron J Gen Med 18(6):em329, 1–10
5. Cotfas LA, Delcea C, Roxin I, Ioanas C, Simona GD, Tajariol F (2021) The longest month: analyzing COVID-19 vaccination opinions dynamics from tweets in the month following the first vaccine announcement. IEEE Access 9:33203–33223
6. Dubey AK (2021) Public sentiment analysis of COVID-19 vaccination drive in India. SSRN
7. Leung K (2021) COVID-19 vaccine: what’s the public sentiment? Towards Data Sci
8. Piedrahita-Valdés H, Piedrahita-Castillo D, Bermejo-Higuera J, Guillem-Saiz P, Bermejo-Higuera JR, Guillem-Saiz J, Sicilia-Montalvo JA, Machío-Regidor F (2021) Vaccine hesitancy on social media: sentiment analysis from June 2011 to April 2019. Vaccines 9(1):28
9. Roe C, Lowe M, Williams B, Miller C (2021) Public perception of SARS-CoV-2 vaccinations on social media: questionnaire and sentiment analysis. Int J Environ Res Public Health 18(24):13028
10. Singh M, Jakhar AK, Pandey S (2021) Sentiment analysis on the impact of coronavirus in social life using the BERT model. Soc Netw Anal Min 11(33)
11. Yousefinaghani S, Dara R, Mubareka S, Papadopoulos A, Sharif S (2021) An analysis of COVID-19 vaccine sentiments and opinions on Twitter. Int J Infect Dis 108:256–262
12. Hum K (2020) Sentiment analysis: evaluating the public’s perception of the COVID19 vaccine. Towards Data Sci
13. Raghupathi V, Ren J, Raghupathi W (2020) Studying public perception about vaccination: a sentiment analysis of tweets. Int J Environ Res Public Health 17(10)
14. Samuel J, Ali GGMN, Rahman MM, Esawi E, Samuel Y (2020) COVID-19 public sentiment insights and machine learning for tweets classification. Information 11(16):314
15. Tavoschi L, Quattrone F, D’Andrea E, Ducange P, Vabanesi M, Marcelloni F, Lopalco PL (2020) Twitter as a sentinel tool to monitor public opinion on vaccination: an opinion mining analysis from September 2016 to August 2017 in Italy. Hum Vaccines Immunother 16(5):1062–1069
16. Zhang L, Fan H, Peng C, Rao G, Cong Q (2020) Sentiment analysis methods for HPV vaccines related tweets based on transfer learning. Healthcare 8(3):307
17. On J, Park H, Song T (2019) Sentiment analysis of social media on childhood vaccination: development of an ontology. J Med Internet Res 21(6):e13456
18. Mitra T, Counts S, Pennebaker JW (2016) Understanding anti-vaccination attitudes in social media. In: Proceedings of the tenth international AAAI conference on web and social media, Cologne, Germany
19. Numnark S, Ingsriswang S, Wichadakul D (2014) VaccineWatch: a monitoring system of vaccine messages from social media data. In: 8th international conference on systems biology (ISB), Qingdao, China, pp 112–117
20. Salathé M, Khandelwal S (2011) Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. PLoS Comput Biol
21. Martí P, Serrano-Estrada L, Nolasco-Cirugeda A (2019) Social media data: challenges, opportunities and limitations in urban studies. Comput Environ Urban Syst 74:161–174
22. Hajirahimova MS, Aliyeva AS (2017) About big data measurement methodologies and indicators. Int J Modern Educ Comput Sci 9(10):1–94
23. Arora I (2020) Create a word cloud or tag cloud in python. Analytics Vidhya
24. Nagamanjula R, Pethalakshmi A (2020) A novel framework based on bi-objective optimization and LAN2FIS for Twitter sentiment analysis. Soc Netw Anal Min 10(34)
25. Nisha VM, Kumar AR (2019) Implementation on text classification using bag of words model. In: Proceedings of the second international conference on emerging trends in science & technologies for engineering systems (ICETSE-2019)
26. Rahman S, Talukder KH, Mithila SK (2021) An empirical study to detect cyberbullying with TF-IDF and machine learning algorithms. In: International conference on electronics, communications and information technology (ICECIT), pp 1–4
27. Aksayli ND, Islek I, Karaman CC, Güngör O (2021) Word-wise explanation method for deep learning models using character N-gram input. In: 29th signal processing and communications applications conference (SIU), pp 1–4
28. Hamarashid HK, Saeed SA, Rashid TA (2020) Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji. Neural Comput Appl 33:4547–4566
29. Powers DMW (2000) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. ArXiv
30. Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: Losada DE, Fernández-Luna JM (eds) Advances in information retrieval. ECIR 2005. Lecture notes in computer science, vol 3408

Spreader-Centric Fake News Mitigation Framework Based on Epidemiology
Bhavtosh Rath and Jaideep Srivastava

Abstract Computational models for detecting and preventing the spread of false information (popularly called fake news) have gained a lot of attention over the last decade, with most proposed models focusing on identifying the veracity of information. In this chapter we propose a framework based on a complementary approach to false information mitigation, inspired by the domain of epidemiology, where false information is analogous to an infection, the social network is analogous to a population, and the likelihood of people believing a piece of information is analogous to their vulnerability to infection. The framework comprises four phases that fall within the domain of social network analysis. Through experiments on real-world information spreading networks on Twitter, we show the effectiveness of our models and confirm our hypothesis that the spreading of false information is more sensitive to behavioral properties such as trust and credibility than the spreading of true information.
Keywords Fake news mitigation · Epidemiology · Social network analysis

B. Rath · J. Srivastava
University of Minnesota, Minneapolis, MN, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_2

1 Introduction

The wide adoption of social media platforms like Facebook, Twitter and WhatsApp has resulted in the creation of behavioral big data, motivating researchers to propose various computational models for combating fake news. So far, most research has focused on determining the veracity of information using features extracted manually or automatically through techniques such as deep learning. We propose a novel fake news prevention and control framework, inspired by epidemiology, that incorporates people's behavioral data along with their underlying network structure. The framework covers the entire life cycle of spreading, i.e., before the fake news originates, after it starts spreading, and the containment of its further spreading. The framework should not be confused with popular information diffusion based models [12], because such models (a) usually categorize certain nodes and cannot be generalized to all nodes, (b) consider only the propagation paths but not the underlying graph structure, and (c) apply to information diffusion in general rather than being particular to fake news spreading.

The research problem we try to address is as follows. False information generally gets very low coverage from mainstream news platforms (such as the press or television) compared to true information, so an important factor in a user's decision to spread fake news on social media is their inherent trust in the neighbor endorsing it. On the other hand, a user is likely to endorse true news because it is typically endorsed by multiple credible news sources. Thus, we hypothesize that the less credible nature of false information makes its further spread much more reliant on users' trust relationships than that of true news.

2 Related Work

The literature on fake news detection and prevention strategies is vast and can be divided broadly into three categories: content-based, propagation-based and user-based.

Content-Based
The majority of research in false information mitigation can be categorized as content-based models. They rely on extracting knowledge from textual or visual data. Earlier work relied mostly on hand-engineered features that exploited linguistic cues, such as the grammatical structure of a sentence [14] and parts of speech [23]. Other models used textual features including topic distribution and sentiment information [11], language complexity and stylistic features [10], pronouns and exclamation marks [7], content embedding [37], language complexity, readability, moral foundations and psycholinguistic cues [42], and topic distribution [18], to mention a few. A major limitation of such hand-engineered features is that they do not fully exploit the rich semantic and syntactic information in the content. More recently, deep learning based models have gained popularity because they can automatically extract both simple and complex features for text classification. Models implementing recurrent neural networks [13], convolutional neural networks [33], and combined convolutional and recurrent neural networks [17] have been proposed, along with more sophisticated deep learning architectures. Zhang et al. [45] proposed a graph neural network based multimodal model that aggregates textual information from news articles, creators and subjects to identify news veracity. Shu et al. [38] applied an attention mechanism that captures both news contents and user comments to propose an explainable fake news detection system. Khattar et al. [15] used textual and visual information in a variational autoencoder model coupled with a binary classifier for the task of fake news detection.


Propagation-Based
Information propagates in social networks through sharing/retweeting, which results in a diffusion cascade or tree structure with the source poster at the root. The propagation dynamics of information have also been used to build false information detection models. Ma et al. [20] proposed a propagation tree kernel method to compare the similarity between information propagation trees. Wu et al. [44] defined a random walk graph kernel to compute the similarity between different propagation trees. Ma et al. [21] also proposed an improved model using recursive neural networks that used syntactic and semantic parsing to extract features from information cascades. Some models inspired by information diffusion concepts such as Linear Threshold and Independent Cascades have also been proposed [12]. Ruths [35] proposed a context-based detection method that mainly leverages features extracted from the process of information propagation. Recurrent neural networks [33] have been used to model propagation as sequential data, integrating both temporal and content features. Lu and Li [19] integrated an attention mechanism with graph neural networks using text information and propagation structure to identify whether the source information is fake.

User-Based
User-based models rely on features extracted from two types of data. The first kind of features is extracted from the content of user profiles. Castillo et al. [4] analyzed hand-crafted user features to study user credibility. Wu and Liu [43] constructed user representations by applying network embedding approaches to the social network using profile data. The second kind of features is obtained from behavioral data of sharing and responding. Tacchini et al. [39] built a classification model based on who liked the news. Qazvinian et al. [24] collected user engagements to model user behavior patterns that could help identify false news.
The role of bots in false information spreading has also been studied. Their use in political campaigns has been studied in detail [2], which has led to the building of bot detection tools [3]. The role of the echo-chamber effect (polarization of a person's viewpoints leading to less diverse exposure and discussion between unaligned users) in the spreading of false information has been studied [25]. Two other major behavioral characteristics, namely confirmation bias (the tendency of people to interpret evidence as confirming their pre-existing notions) and naive realism (people's belief that their perception of reality is true) [36], have been found to make people vulnerable to believing false information. A major limitation of existing models is that they rely on the presence of fake news to generate meaningful features, which makes it difficult to model fake news mitigation strategies. Our framework proposes models using two important components that do not rely on the presence of fake news: the underlying network structure and people's historical behavioral data. Few computational models have explored the psychological concepts, derived from historical behavioral data, that make people vulnerable to spreading fake news; our proposed framework can be used to address this gap.


3 Epidemiology Inspired Framework

Epidemiology is the field of medicine that deals with the incidence, distribution and control of infection among populations. In the proposed framework, fake news is analogous to infection, the social network is analogous to the population, and the likelihood of people believing a news endorser in their immediate neighborhood is analogous to their vulnerability to getting infected when exposed. We consider fake news as a pathogen that intends to infect as many people as possible. An important assumption we make is that fake news of all kinds is generalized as a single infection, unlike in epidemiology where people have different levels of immunity against different kinds of infections (i.e., the framework is information agnostic). The likelihood of a person getting infected (i.e., believing and spreading the fake news) depends on two important factors: (a) the likelihood of trusting a news endorser (a person is more likely to spread news without verifying its claim if it is endorsed by a neighbor they trust); and (b) the density of their neighborhood: just as high population density increases the likelihood of infection spreading, a modular network structure is more prone to fake news spreading.

Once the spread of infection is identified, the population needs to be de-contaminated. In epidemiology, a medicinal cure is used to treat the infected population and thus prevent further spreading of infection. In the context of fake news, refutation news can serve this purpose. Refutation news can be defined as true news that fact-checks a fake news item. Content from popular fact-checking websites¹ is an example of refutation news. In epidemiology, medicine can serve two purposes: as a control mechanism (i.e., medication), with the intention of curing infected people (i.e., explicitly informing fake news spreaders about the refutation news), and as a prevention mechanism (i.e., immunization), with the intention of preventing the uninfected population from becoming infection carriers in future (i.e., preventing the unexposed population from becoming fake news spreaders). An infected person is said to have recovered if they decide to retract from sharing the fake news, decide to share the refutation news, or both. The mapping of epidemiological concepts to the context of fake news spreading is summarized in Table 1. An overview of the proposed epidemiological framework is shown in Fig. 1.

Data Collection
We evaluate our proposed models using real-world Twitter datasets. The ground truth of false information and the refuting true information was obtained from popular fact-checking websites. The source tweet related to each piece of information was obtained through keyword-based search on Twitter. Using the Twitter API, we determined the source tweeter and the retweeters (a proxy for spreaders) as well as the follower-following network of the spreaders (a proxy for the social network). The dataset is publicly accessible [26].

¹ https://www.snopes.com/, https://www.politifact.com/.


Table 1 Mapping epidemiological concepts to false information spreading

Concept | Epidemiology context | False information spreading context
Infection | Infection | False information
Population | People and communities | Nodes and modular sub-graphs
Vulnerable | Likely to become infection carriers | Likely to become false information spreaders
Exposed | Neighbors are infected | Neighbor nodes are false information spreaders
Spreaders | Infected people | False information spreaders
Prevention | Immunization | Refutation news
Control | Medication | Refutation news

Fig. 1 Epidemiology inspired framework for fake news mitigation

4 Preliminaries

4.1 Trustingness and Trustworthiness

In the context of social media, researchers have used social networks to understand how trust manifests among users (i.e., nodes in a social network). A recent work is the Trust in Social Media (TSM) algorithm, which assigns a pair of complementary trust scores to each user, called Trustingness and Trustworthiness scores [32]. Trustingness quantifies how much a user trusts its neighbors, and Trustworthiness quantifies the willingness of the neighbors (i.e., the nodes at the other end of edges to and from the user) to trust the user. The TSM algorithm takes a directed graph as input, together with a specified convergence criterion or a maximum permitted number of iterations. In each iteration, for every user in the network, trustingness and trustworthiness are computed using the equations below:

$$ti(v) = \sum_{\forall x \in out(v)} \frac{w(v,x)}{1 + (tw(x))^{s}} \qquad (1)$$

$$tw(u) = \sum_{\forall x \in in(u)} \frac{w(x,u)}{1 + (ti(x))^{s}} \qquad (2)$$

where u and v are user nodes, ti(v) and tw(u) are the trustingness and trustworthiness scores of v and u, respectively, w(v, x) is the weight of the edge from v to x, out(v) is the set of outgoing edges of v, in(u) is the set of incoming edges of u, and s is the involvement score of the network. Involvement captures the potential risk a user takes when creating a link in the network; it is set empirically to a constant. Details of the algorithmic implementation and of the involvement score computation are omitted due to space constraints and can be found in [32].
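The fixed-point iteration behind Eqs. (1) and (2) can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the TSM implementation of [32]: scores start at 1, both equations are updated simultaneously from the previous iteration's values, the involvement s is a constant supplied by the caller, and iteration stops when the largest score change falls below a tolerance. Any normalization or involvement-estimation steps described in [32] are omitted, and the function name `tsm_scores` is hypothetical.

```python
from collections import defaultdict


def tsm_scores(edges, s=1.0, max_iter=100, tol=1e-9):
    """Iterate Eqs. (1)-(2): trustingness ti and trustworthiness tw.

    edges maps a directed pair (src, dst) to its weight w(src, dst).
    s is the involvement score, assumed here to be a fixed constant.
    """
    nodes = {n for pair in edges for n in pair}
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for (src, dst), w in edges.items():
        out_nbrs[src].append((dst, w))  # x in out(src), weight w(src, x)
        in_nbrs[dst].append((src, w))   # x in in(dst), weight w(x, dst)

    ti = {n: 1.0 for n in nodes}  # assumed initialization
    tw = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Both updates read the previous iteration's scores (simultaneous update).
        new_ti = {v: sum(w / (1.0 + tw[x] ** s) for x, w in out_nbrs[v]) for v in nodes}
        new_tw = {u: sum(w / (1.0 + ti[x] ** s) for x, w in in_nbrs[u]) for u in nodes}
        change = max(max(abs(new_ti[n] - ti[n]), abs(new_tw[n] - tw[n])) for n in nodes)
        ti, tw = new_ti, new_tw
        if change < tol:  # assumed convergence criterion
            break
    return ti, tw


cycle = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0}
ti, tw = tsm_scores(cycle)
print(round(ti["a"], 3), round(tw["a"], 3))  # 0.618 0.618
```

On a directed 3-cycle with unit weights and s = 1, every node converges to ti = tw = (√5 − 1)/2 ≈ 0.618, the positive root of t = 1/(1 + t). A node with no outgoing edges gets ti = 0, since the sum in Eq. (1) is empty.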

4.2 Believability

Believability is an edge score derived from the Trustingness and Trustworthiness scores [27]. It quantifies the potential, or strength, of a directed edge to transmit information by capturing the intensity of the connection between the sender and the receiver. Believability for a directed edge is computed as a function of the trustworthiness of the sender and the trustingness of the receiver. While Independent Cascade and Linear Threshold are information diffusion models, Believability is a metric that quantifies edge weight. More specifically, given users u and v on microblogs such as Twitter, a directed edge from u to v exists if u follows v. Believability quantifies how strongly u trusts v when u decides to follow v. Therefore, u is very likely to believe v if:

1. v has a high trustworthiness score, i.e., v is highly likely to be trusted by other users in the network, or
2. u has a high trustingness score, i.e., u is highly likely to trust others.

The believability score should therefore be proportional to both values, and it is computed as follows (Figs. 2 and 3):

$$Believability(u \rightarrow v) = tw(v) \cdot ti(u) \qquad (3)$$
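As a small illustration of Eq. (3), the sketch below computes believability for a list of follow edges, assuming the trustingness and trustworthiness scores are already available as plain dictionaries (for instance from a TSM run). The function name `believability_scores` and the toy scores are hypothetical.

```python
def believability_scores(follow_edges, ti, tw):
    """Eq. (3): Believability(u -> v) = tw(v) * ti(u), where u follows v.

    follow_edges: iterable of (u, v) pairs; ti, tw: dicts of per-user scores.
    """
    return {(u, v): tw[v] * ti[u] for u, v in follow_edges}


ti = {"u": 0.5, "v": 0.25}
tw = {"u": 0.2, "v": 0.8}
bel = believability_scores([("u", "v"), ("v", "u")], ti, tw)
# bel[("u", "v")] is 0.8 * 0.5 = 0.4; bel[("v", "u")] is 0.2 * 0.25 = 0.05
```

The example shows that believability is asymmetric: a mutual follow relationship yields two different edge weights, one per direction.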


Fig. 2 Community Health Assessment model

Fig. 3 Vulnerability Assessment of nodes and communities

4.3 Community Health Assessment Model

A social network characteristically exhibits community structures that form through inter-node interactions. Communities tend to be modular groups in which within-group members are highly connected and across-group members are loosely connected. Thus, members within a community tend to have a higher degree of trust among each other than members across different communities. If such communities are exposed to fake news propagating in their vicinity, the likelihood of all community members getting infected would be high. Motivated by this ease of spreading within a community, we proposed the Community Health Assessment model. The model identifies three types of nodes with respect to a community: neighbor, boundary and core nodes, which are explained below:

1. Neighbor nodes: These nodes are directly connected to at least one node of the community but are not part of it. The set of neighbor nodes is denoted by N_com.
2. Boundary nodes: These are community nodes that are directly connected to at least one neighbor node. The set of boundary nodes is denoted by B_com. It is important to note that only community nodes that have an outgoing edge towards a neighbor node are in B_com.
3. Core nodes: These are community nodes that are connected only to members within the community. The set of core nodes is denoted by C_com.
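The three node sets defined above can be extracted from a directed edge list with a single pass, as in the sketch below. It reads the definitions literally: any outside node adjacent to the community is a neighbor node, a community node with an outgoing edge to an outside node is a boundary node, and a community node with no edge to or from outside is a core node. Under this literal reading, a community node whose only outside contact is an incoming edge (node B in the example) is neither boundary nor core; how such nodes are treated is not stated in this section, so that split is an assumption. The function name is hypothetical.

```python
def community_node_sets(edges, community):
    """Return (N_com, B_com, C_com) for a community in a directed graph.

    edges: iterable of directed (src, dst) pairs; community: set of node ids.
    """
    community = set(community)
    neighbors, boundary, touches_outside = set(), set(), set()
    for src, dst in edges:
        src_in, dst_in = src in community, dst in community
        if src_in and not dst_in:
            neighbors.add(dst)
            boundary.add(src)         # outgoing edge towards a neighbor node
            touches_outside.add(src)
        elif dst_in and not src_in:
            neighbors.add(src)
            touches_outside.add(dst)  # incoming edge only: not a boundary node
    core = community - touches_outside  # connected only within the community
    return neighbors, boundary, core


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("E", "B")]
n_com, b_com, c_com = community_node_sets(edges, {"A", "B", "C"})
# n_com == {"D", "E"}, b_com == {"A"}, c_com == {"C"}
```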

5 Vulnerability Assessment

In this phase we proposed novel metrics to quantify the vulnerability of nodes and communities to false information spreading for the scenario in which false information spreading has not yet begun. In the context of epidemiology, this would help assess the vulnerability of people before infection spreading begins. We assume that the information spreading is widespread outside the community, i.e., at least some of the neighbor nodes of the community are spreaders. The node- and community-level metrics and their computational details are as follows:

1. Vulnerability of a boundary node, V(b): This metric measures the likelihood of a boundary node b becoming a spreader. Note that the method used to quantify the vulnerability of a boundary node can be generalized to any node. The metric is derived as follows. The likelihood of node b believing an immediate neighbor n is a function of the trustworthiness of n (n ∈ N_b, where N_b is the set of all neighbor nodes of b) and the trustingness of b, and is quantified as bel_nb = tw(n) ∗ ti(b), that is, Believability(n → b). Thus, the likelihood that b is not vulnerable to n can be quantified as (1 − bel_nb). Generalizing this, the likelihood of b not being vulnerable to any of its neighbor nodes is $\prod_{\forall n \in N_b} (1 - bel_{nb})$. Therefore, the likelihood of b believing any of its neighbors, i.e., the vulnerability of the boundary node b, is computed as:

$$V(b) = 1 - \prod_{\forall n \in N_b} \left(1 - bel_{nb}\right) \qquad (4)$$

2. Vulnerability of a community, V(C): To compute the vulnerability of a community, we take the community health perspective, i.e., the vulnerability of the community to information approaching from neighbor nodes (i.e., outside the community) towards the boundary nodes (i.e., the circumference of the community). As the scenario does not include information diffusion within the community, the metric is independent of the core nodes of the community. This metric measures the likelihood of the boundary node set of a community C (B_C) believing information from any of its neighbors. The metric is derived as follows. Following the idea in item 1, the likelihood that boundary node b is not vulnerable to its neighbors can be quantified as (1 − V(b)). Generalizing this to all b ∈ B_C, the likelihood that none of the boundary nodes of a community are vulnerable to their neighbors can be quantified as $\prod_{\forall b \in B_C} (1 - V(b))$. Thus, the likelihood of community C being vulnerable to any of its neighbors, i.e., the vulnerability of the community, is defined as:

$$V(C) = 1 - \prod_{\forall b \in B_C} \left(1 - V(b)\right) \qquad (5)$$

The pseudo-code of the algorithm that generates the vulnerability metrics is provided in Algorithm 1.

Results [28]
Through experiments on large news spreading networks on Twitter, we showed that our proposed metrics identify vulnerable nodes with higher precision in false information networks than in true information networks.
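Eqs. (4) and (5) are both "at least one success" probabilities built from independent per-edge or per-node terms, so they reduce to products. The sketch below assumes that the trustingness and trustworthiness scores and the neighbor list N_b are given as plain Python containers; it is an illustration, not the authors' Algorithm 1, and the function names are hypothetical.

```python
from math import prod


def boundary_vulnerability(b, neighbors, ti, tw):
    """Eq. (4): V(b) = 1 - prod over n in N_b of (1 - bel_nb), with bel_nb = tw(n) * ti(b)."""
    return 1.0 - prod(1.0 - tw[n] * ti[b] for n in neighbors)


def community_vulnerability(boundary_vulns):
    """Eq. (5): V(C) = 1 - prod over b in B_C of (1 - V(b))."""
    return 1.0 - prod(1.0 - v for v in boundary_vulns)


ti = {"b": 0.5}
tw = {"n1": 1.0, "n2": 0.4}
v_b = boundary_vulnerability("b", ["n1", "n2"], ti, tw)  # 1 - (1 - 0.5)(1 - 0.2) = 0.6
v_c = community_vulnerability([v_b, 0.5])                # 1 - (1 - 0.6)(1 - 0.5) = 0.8
```

A node with no neighbors gets V(b) = 0, because the empty product is 1, which matches the intuition that an isolated node cannot be exposed.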


6 Identification of Infected Population

In this phase we proposed a node-embedding-based recurrent neural network model to identify false and true information spreaders for the scenario in which information spreading has finished. In the context of epidemiology, this would help identify the infected population after infection spreading is complete. We propose a novel machine learning based approach for automatically identifying users spreading false information by leveraging the concept of believability, i.e., the extent to which the propagated information is likely to be perceived as truthful, based on the trust measures of users in Twitter's retweet network. With the retweet network edge-weighted by believability scores, we use network representation learning to generate user embeddings, which are then used to classify users as false information spreaders or not. The learned user representations can thus reflect the varying spreadability of different edges. The key reason this can yield better user representations is that the inter-user believability scores bias the random walk toward nodes reached via high-believability edges, potentially maximizing the transmission of information over the network. The computational details are as follows:

(a) User embeddings: We adopt the second-order proximity of LINE [40], a network representation learning method, to learn user embeddings from the retweet network described above. The goal is to embed each user u_i ∈ V into a lower-dimensional space R^d by learning a function f_G : V → R^d, where d is the dimension of the projected vector. Specifically, for each u_i, let $\vec{v}_i$ denote the embedding of u_i as a node and $\vec{v}'_i$ be the representation of u_i when treated as a specific context of other nodes. For each edge u_i → u_j, the conditional probability of u_j being generated by u_i as context is defined as follows:

$$p(u_j \mid u_i) = \frac{\exp(\vec{v}'_j \cdot \vec{v}_i)}{\sum_{k=1}^{|V|} \exp(\vec{v}'_k \cdot \vec{v}_i)} \qquad (6)$$

Given this definition, nodes sharing similar contexts will have similar conditional distributions over the entire set of nodes. To preserve the context proximity, the objective is to make p(u_j | u_i) close to its empirical distribution p̂(u_j | u_i), which can be observed from the weighted social context network. Thus, the objective function is defined as:

$$\min \sum_{(i,j) \in E} \lambda_i \cdot d\left(\hat{p}(u_j \mid u_i),\, p(u_j \mid u_i)\right) \qquad (7)$$
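To make Eqs. (6) and (7) concrete, the NumPy sketch below evaluates the objective for given node and context vectors, with λ_i and p̂(u_j | u_i) = w_ij / d_i defined as in the paragraph that follows. It assumes d_i is the weighted out-degree (so that each row of p̂ sums to 1) and that d(·, ·) is the KL-divergence KL(p̂ ‖ p). It only evaluates the objective; the original LINE [40] optimizes an approximation of it with negative sampling, which is not shown. The function name is hypothetical.

```python
import numpy as np


def line_second_order_objective(W, emb, ctx):
    """Weighted KL objective of Eqs. (6)-(7) for LINE-style second-order proximity.

    W:   (n, n) array of non-negative edge weights w_ij (e.g., believability).
    emb: (n, dim) node vectors v_i; ctx: (n, dim) context vectors v'_i.
    """
    scores = emb @ ctx.T                                  # scores[i, j] = v'_j . v_i
    scores = scores - scores.max(axis=1, keepdims=True)   # stabilize the softmax
    p = np.exp(scores)
    p = p / p.sum(axis=1, keepdims=True)                  # Eq. (6): row i is p(. | u_i)
    out_deg = W.sum(axis=1)                               # d_i, used as prestige lambda_i
    total = 0.0
    for i, j in zip(*np.nonzero(W)):
        p_hat = W[i, j] / out_deg[i]                      # empirical p_hat(u_j | u_i)
        total += out_deg[i] * p_hat * np.log(p_hat / p[i, j])
    return float(total)


W = np.array([[0, 1, 0], [0, 0, 2], [1, 0, 0]], dtype=float)
zeros = np.zeros((3, 2))
print(line_second_order_objective(W, zeros, zeros))  # 4 * ln 3, about 4.394
```

With all-zero vectors the model distribution is uniform (1/3 per node), while each node has a single out-neighbor, so every edge contributes d_i · ln 3 and the total is (1 + 2 + 1) · ln 3.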


where d(·, ·) is the distance between two probability distributions based on KL-divergence, λ_i is the prestige of u_i, which is set to u_i's out-degree d_i following [40], and the empirical distribution is computed as p̂(u_j | u_i) = w_ij / d_i.

(b) RNN classification model: An RNN is a type of neural network with recurrent connections that can be used to model variable-length sequential information such as sentences or time series. A basic RNN is formalized as follows: given an input sequence (x_1, ..., x_T), at each time step the model updates the hidden states (h_1, ..., h_T) and generates the output vector (o_1, ..., o_T), where T depends on the length of the input. From t = 1 to T, the algorithm iterates over the following equations:

$$h_t = \tanh(U x_t + W h_{t-1} + b), \qquad o_t = V h_t + c \qquad (8)$$

where U, W and V are the input-to-hidden, hidden-to-hidden and hidden-to-output weight matrices, respectively, b and c are the bias vectors, and tanh(·) is the hyperbolic tangent nonlinearity. Typically, the gradients of RNNs are computed via back-propagation through time [34]. In practice, because of vanishing or exploding gradients [1], the basic RNN cannot learn long-distance temporal dependencies with gradient-based optimization. One way to deal with this is an extension that includes "memory" units to store information over long time periods, commonly known as the Long Short-Term Memory (LSTM) unit [6, 9] and the Gated Recurrent Unit (GRU) [5]. A GRU has gating units that modulate the flow of content inside the unit, but it is simpler than an LSTM, with fewer parameters. The following equations are used for a GRU unit in the hidden layer [5]:

$$z_t = \sigma(x_t U_z + h_{t-1} W_z)$$

$$r_t = \sigma(x_t U_r + h_{t-1} W_r)$$

$$\tilde{h}_t = \tanh\left(x_t U_h + (h_{t-1} \odot r_t) W_h\right)$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where the reset gate r_t determines how to combine the new input with the previous memory, the update gate z_t defines how much of the previous memory is cascaded into the current time step, h̃_t denotes the candidate activation of the hidden state h_t, and ⊙ denotes element-wise multiplication. We use GRU recurrent units to fit the time steps as the basic identification framework. For each source tweet, all of its retweeting users are ordered by the time stamps indicating when each user retweeted it. At each step, we input the embedding of the user who retweeted the message at that time step. Suppose the dimensionality of the generated user embedding is K. The structure of our GRU-RNN model is illustrated in Fig. 4. Note that an output unit is associated with each time step, which uses a sigmoid function for the probabilistic output of the two classes indicating whether the input user is a rumor spreading user or not. Let g_c, where c denotes the class label, be the ground-truth 2-dimensional multinomial distribution of a user. Here, the distribution is [1, 0] for rumor spreading users and [0, 1] for non-rumor spreading users. For each training instance (i.e., each source tweet), our goal is to minimize the squared error between the predicted and ground-truth probability distributions:

$$\min \sum_{c} (g_c - p_c)^2 + \sum_{i} \lVert \theta_i \rVert^2$$

where g_c and p_c are the gold and predicted distributions, respectively, θ_i represents the model parameters to be estimated, and the L2-regularization penalty is used to trade off the error against the scale of the problem.

Fig. 4 Proposed recurrent neural network based spreader classification model

Results [27]
Experimental results on a large real-world user classification dataset collected from Twitter demonstrate that the proposed method outperformed four baselines by a large margin.
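The GRU forward pass and the squared-error objective described above can be sketched in NumPy as follows. This is a minimal forward-only illustration under stated assumptions: the hidden size H, the random initialization, and the toy input embeddings are hypothetical, training (back-propagation through time) is not shown, and the per-step output applies a sigmoid to two output units as the text describes. The function and parameter names are illustrative, not the authors' code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(K, H, seed=0, scale=0.1):
    """Random GRU weights for K-dim user embeddings and H hidden units (assumed sizes)."""
    rng = np.random.default_rng(seed)
    shapes = {"Uz": (K, H), "Wz": (H, H), "Ur": (K, H), "Wr": (H, H),
              "Uh": (K, H), "Wh": (H, H), "V": (H, 2), "c": (2,)}
    return {name: scale * rng.standard_normal(shape) for name, shape in shapes.items()}


def gru_outputs(X, p):
    """X: (T, K) embeddings of retweeting users in time order; returns (T, 2) outputs."""
    h = np.zeros(p["Wz"].shape[0])
    outputs = []
    for x in X:
        z = sigmoid(x @ p["Uz"] + h @ p["Wz"])              # update gate
        r = sigmoid(x @ p["Ur"] + h @ p["Wr"])              # reset gate
        h_tilde = np.tanh(x @ p["Uh"] + (h * r) @ p["Wh"])  # candidate state
        h = (1.0 - z) * h + z * h_tilde
        outputs.append(sigmoid(h @ p["V"] + p["c"]))        # per-step sigmoid output
    return np.array(outputs)


def squared_error_loss(pred, gold, p):
    """Squared error between distributions plus an L2 penalty on all parameters."""
    return float(np.sum((gold - pred) ** 2) + sum(np.sum(w ** 2) for w in p.values()))


params = init_params(K=8, H=16)
X = np.random.default_rng(1).standard_normal((5, 8))  # five retweeters, toy embeddings
probs = gru_outputs(X, params)                         # shape (5, 2)
```

With all parameters set to zero the hidden state stays at zero and every output is 0.5, which gives a quick sanity check of the recurrence.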

7 Risk Assessment of Population

In this phase we proposed a graph neural network model that samples and aggregates features to identify false information spreaders and non-spreaders for the scenario in which information is in the course of spreading. In the context of epidemiology, this would be equivalent to identifying the exposed population that needs to be quarantined to prevent infection spreading. In the context of the Community Health Assessment model visualized in Fig. 5, we consider the following two scenarios in which assessing the risk of infection spreading in the population becomes important.

1. Information reaches the neighborhood of a community: Consider the scenario in which a message is propagated by D_1, a neighbor node of community 3. Node A_3 is exposed and is likely to spread the information, thus beginning the spread of information into a densely connected community. Thus, it is important to predict which nodes on the boundary of communities are likely to become information spreaders.
2. Information penetrates the community: Consider the scenario in which A_3 decides to propagate a message. Nodes B_3, D_3 and E_3, which are immediate followers of A_3, are now exposed to the information. Due to their close proximity, they are vulnerable to believing the endorser. The remaining nodes of the community (C_3, F_3) are two steps away from A_3. Similarly, for community 8, when the message has reached node A_8, nodes D_8 and F_8 are one step away and the remaining community members (E_8, C_8, B_8) are two steps away. Intuitively, in a closely-knit community structure, if one of the nodes decides to spread a piece of information, the likelihood of it spreading quickly within the entire community is very high. Thus, it is important to detect nodes within a community that are likely to become information spreaders in order to protect the health of the entire community.

Fig. 5 Community Health Assessment model. The figure shows a social network with 8 disjoint modular communities. Red nodes show false information spreaders and red edges show the spreading path. Black nodes denote non-spreaders

The computational details are as follows.

Inductive Representation Learning
As fake news spreads rapidly, the network structure around the spreaders also evolves quickly. Thus, it is important to have a scalable model that can quickly learn meaningful representations for newly seen (i.e., exposed) nodes without relying on the complete network structure. Most graph representation learning techniques, however, employ a transductive approach to learning node representations, which optimizes the embeddings for nodes based on the entire graph structure. We employ an inductive approach inspired by GraphSAGE [8] to generate embeddings for the nodes as the information spreading network gradually evolves. It learns an aggregator function that generalizes to unseen node structures that could become potential information spreaders. The idea is to simultaneously learn the topological structure and node features from the neighborhood (Nbr) nodes by training a set of aggregator functions instead of individual node embeddings. Using an inductive representation learning model, we learn features of the exposed population (i.e., followers of the spreaders) by aggregating trust-based features from their neighborhood nodes. Figure 6 shows how we model the proposed approach from the perspective of the Community Health Assessment model: nodes outside the solid oval represent N_com, nodes between the solid and dotted ovals represent B_com, and nodes within the dotted oval represent C_com. The proposed framework is as follows. First, we generate a weighted information spreading network based on interpersonal trust. We then sample the neighborhood with a probability proportional to the trust-based edge weights. For the sampled neighborhood, we aggregate their feature representations. Finally, we define the loss function used to learn the model parameters.

Fig. 6 Inductive learning framework


(a) Generating the weighted graph: The graph of the information spreading network has edge weights that quantify the likelihood of trust formation between senders and receivers. Once we compute these edge scores, we normalize the weights of all out-edges connecting the boundary node:

$$\hat{w}_{bx} = \frac{bel_{bx}}{\sum_{\forall x' \in out(b)} bel_{bx'}} \qquad (9)$$

where bel_bx = tw(b) ∗ ti(x) and out(b) is the set of outgoing edges of b. Similarly, we normalize the weights of all in-edges connecting the boundary node.

(b) Sampling the neighborhood: Instead of sampling the neighborhood from a uniform distribution, we sample a subset of neighbors proportional to the weights of the edges connecting them. Sampling is done recursively up to depth K. The idea is to learn features from neighbors in proportion to the level of interpersonal trust. Algorithm 2 explains the sampling strategy.

(c) Aggregating features: After sampling the neighborhood as an unordered set, we recursively aggregate the embeddings of the sampled nodes up to depth K for each boundary node. The intuition is that at each depth, the boundary nodes incrementally learn trust-based features from the sampled neighborhood. Three aggregation architectures, namely mean, LSTM and pooling, explained in [8], can be used. For simplicity, we only apply the mean aggregator, which takes the mean of the representations h_u^{k−1} where u ∈ Nbr^{k−1}(b). The aggregator is represented below:

$$h_b^k \leftarrow \sigma\left(W_b^k \cdot \mathrm{MEAN}\left(\{h_b^{k-1}\} \cup \{h_u^{k-1}, \forall u \in Nbr(b)\}\right)\right) \qquad (10)$$
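Steps (a) to (c) can be sketched in NumPy as below: believability weights are normalized as in Eq. (9), neighbors are sampled with replacement in proportion to those weights, and Eq. (10) is applied for each depth. This is a minimal sketch, not the authors' Algorithms 2 and 3. It assumes positive believability weights, one weight matrix per depth shared across nodes (the text writes W_b^k), tanh as σ by default, and no embedding normalization between depths; the function names are hypothetical.

```python
import numpy as np


def normalize_believability(bel_out):
    """Eq. (9): bel_out maps each neighbor x of b to bel_bx (assumed positive total)."""
    total = sum(bel_out.values())
    return {x: w / total for x, w in bel_out.items()}


def sample_neighbors(bel_out, n_samples, rng):
    """Sample n_samples neighbors with probability proportional to believability."""
    probs = normalize_believability(bel_out)
    nbrs = list(probs)
    idx = rng.choice(len(nbrs), size=n_samples, replace=True, p=[probs[x] for x in nbrs])
    return [nbrs[i] for i in idx]


def trust_mean_aggregate(features, bel, weights, n_samples=5, seed=0, sigma=np.tanh):
    """Apply Eq. (10) once per depth; weights holds one matrix W^k per depth k = 1..K."""
    rng = np.random.default_rng(seed)
    h = {node: np.asarray(vec, dtype=float) for node, vec in features.items()}
    for W in weights:
        new_h = {}
        for b, h_b in h.items():
            nbr_bel = bel.get(b, {})
            sampled = sample_neighbors(nbr_bel, n_samples, rng) if nbr_bel else []
            stacked = np.stack([h_b] + [h[u] for u in sampled])  # {h_b} plus sampled h_u
            new_h[b] = sigma(W @ stacked.mean(axis=0))
        h = new_h
    return h


feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
out = trust_mean_aggregate(feats, {"a": {"b": 1.0, "c": 0.0}}, [np.eye(2)],
                           n_samples=3, sigma=lambda v: v)
# out["a"] is the mean of h_a and three copies of h_b: [0.25, 0.75]
```

In the usage example, all believability mass of node a sits on b, so every sample is b; an identity W and identity σ make the mean aggregation directly visible. Nodes without sampled neighbors simply transform their own previous representation.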

Algorithm 3 explains the aggregation strategy.

(d) Learning parameters: The weight matrices in Algorithm 3 are tuned using stochastic gradient descent on a loss function in order to learn the parameters. We train the model to minimize cross-entropy:

$$Loss(y, \hat{y}) = -\sum_{\forall b \in B_{com}} \sum_{i \in \{b_{Sp},\, b_{\overline{Sp}}\}} y_i \log \hat{y}_i \qquad (11)$$

The loss function is modeled to predict whether the boundary node is an information spreader ($b_{Sp}$) or a non-spreader ($b_{\overline{Sp}}$). y represents the actual class (a 2-dimensional multinomial distribution of [1, 0] for a spreader and [0, 1] for a non-spreader) and ŷ represents the predicted class.
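Eq. (11) can be evaluated directly once the model outputs a [spreader, non-spreader] distribution per boundary node. The sketch below assumes predictions are already probabilities and adds a small epsilon to guard against log(0); the function name and the toy values are hypothetical.

```python
import math


def spreader_cross_entropy(y_true, y_pred, eps=1e-12):
    """Eq. (11), summed over boundary nodes.

    y_true, y_pred: sequences of [spreader, non-spreader] pairs, one per boundary node;
    eps guards against log(0).
    """
    return -sum(y * math.log(max(p, eps))
                for true_pair, pred_pair in zip(y_true, y_pred)
                for y, p in zip(true_pair, pred_pair))


loss = spreader_cross_entropy([[1, 0], [0, 1]], [[0.5, 0.5], [0.25, 0.75]])
# -(ln 0.5 + ln 0.75), about 0.981
```

Because y is one-hot, only the predicted probability of each node's true class contributes to the sum.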


Results [29]
Through experiments on real-world false and refutation information spreading networks, we showed that the proposed inductive learning based framework could identify false information spreaders better than refutation information spreaders and non-spreaders.

8 Infection Control and Prevention

In this final phase we proposed a model that applies an attention-based graph neural network to identify information spreaders for the scenario in which false information and its refutation co-exist. In the context of epidemiology, this would help in targeting people with refutation information to (a) change the role of a false information spreader into a true information spreader (i.e., as an antidote) and (b) prevent further spread of false information (i.e., as a vaccine).

Spreader-Centric Fake News Mitigation Framework Based on Epidemiology


Very often, information of conflicting veracity co-exists: false information and the true information refuting it (i.e. refutation information). In a typical scenario, false information originates at time $t_1$ and starts propagating. Once it is identified, its refutation is created at time $t_2$ ($t_1 < t_2$). Both pieces of information then propagate simultaneously, with many nodes lying on their common spreading paths. While detecting false information is an important and widely researched problem, an equally important problem is preventing the impact of false information spreading. Techniques include containing or suppressing the false information, as well as accelerating the spread of its refutation. Being able to predict the likely action of such users before they are exposed to false information is an important aspect of such a strategy.

Trust and credibility are important psychological and sociological concepts, respectively, with subtle differences in meaning. While trust represents the confidence one person has in another person, credibility represents generalized confidence in a person based on their perceived performance record [31]. Thus, in a graph representation of a social network, trust is a property of a (directed) edge, while credibility is a property of an individual node. Metzger et al. [22] showed that a node's interpretation of a neighbor's credibility relies on its perception of that neighbor, based on their trust dynamics. Motivated by this idea, we propose a graph neural network model that integrates people's credibility and interpersonal trust features in a social network to predict whether a node is likely to spread false information.

Motivation
Figure 7a shows the distribution of spreaders in F ∩ T who spread false information followed by its refutation (FT) and those who spread the refutation followed by the false information (TF). N1 and N9 are excluded from the analysis because we did not have the spreaders' timestamp information for them. An interesting

Fig. 7 Analysis of spreaders in F ∩ T. F ∩ T: people who spread both false information and its refutation (i.e. true information). N1–N10: follower-followee social networks for 10 news events on Twitter. F: false news event. T: refutation news. FT: people who spread false information followed by its refutation. TF: people who spread the refutation followed by the false information


Fig. 8 Architecture overview. (a) The node neighborhood is fed into the graph neural network. (b) Interpersonal trust dynamics are evaluated using trust ($Tr$) features. (c) An importance score $e$ is assigned to each neighbor using a graph attention mechanism. (d) Credibility ($Cr$) features are aggregated in proportion to the neighbors' importance scores using Graph Convolution Networks. (e) The node is identified as a false information spreader (red), a refutation spreader (green), or a non-spreader (black)

observation is that the majority of spreaders belong to FT. Intuitively, these are spreaders who trusted the endorser without verifying the information and later corrected their position, implying that they did not intend to spread false information. Consequently, the proposed model can help identify such people proactively, so that measures can be taken to prevent them from endorsing false information in the first place. Spreaders belonging to TF are comparatively fewer and their intentions are uncertain, but the proposed model can help identify them as well, so that effective containment strategies can be adopted. Figure 7b shows the time that elapsed between spreading the false information and spreading its refutation for FT spreaders. Because these endorsers corrected themselves only after a significant amount of time (∼1 day), large portions of the network must already have been exposed to the false information by then. This serves as strong motivation for a spreader prediction model that proactively identifies likely future spreaders. The proposed attention-based graph neural network framework (Fig. 8) is explained below.

(a) Importance score using attention: We apply a graph attention mechanism [41] that attends over the neighborhood of node $i$ and, based on trust features, assigns an importance score to every neighbor $j$ ($j \in N_i$). First, a parameterized weight matrix $W$ is applied to every node to perform a linear transformation. Then, self-attention is performed using a shared attention mechanism $a$ (a single-layer feed-forward neural network), which computes trust-based importance scores.


The unnormalized trust score between $i$ and $j$ is represented as:

$$e_{ij} = a\left(W Tr_i, W Tr_j\right) \qquad (12)$$

where $e_{ij}$ quantifies $j$'s importance to $i$ in the context of interpersonal trust. We perform masked attention by only considering nodes in $N_i$. This way, we aggregate features based only on the neighborhood's structure. To make the importance scores comparable across all neighbors, we normalize them using the softmax function:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})} \qquad (13)$$

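The attention layer $a$ is parameterized by a weight vector $\mathbf{a}$ and applied with a LeakyReLU nonlinearity. The normalized neighborhood edge weights can then be written as:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W Tr_i \,\|\, W Tr_j\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{T}\left[W Tr_i \,\|\, W Tr_k\right]\right)\right)} \qquad (14)$$

where $\|$ denotes concatenation.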
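The masked attention of Eqs. (12) to (14) and the resulting attention-based adjacency matrix can be illustrated with a short pure-Python sketch. Several things here are assumptions. The caller supplies the trust features $Tr$, the shared matrix $W$, and the attention vector $\mathbf{a}$. The LeakyReLU negative slope is 0.2, the value used in the graph attention network paper [41]. The helper names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_weights(tr, neighbors, W, a):
    """alpha[i][j] for every j in N_i, following Eqs. (12)-(14).

    tr: node -> trust feature vector Tr_i
    neighbors: node -> list N_i (attention is masked to these nodes)
    W: shared linear transform; a: attention vector of length 2 * len(W)."""
    z = {n: matvec(W, v) for n, v in tr.items()}  # W Tr_n for every node
    alpha = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            alpha[i] = {}
            continue
        # e_ij = LeakyReLU(a^T [W Tr_i || W Tr_j]); list + list concatenates.
        e = {j: leaky_relu(sum(ak * xk for ak, xk in zip(a, z[i] + z[j])))
             for j in nbrs}
        m = max(e.values())  # subtract the max for numerical stability
        denom = sum(math.exp(v - m) for v in e.values())
        alpha[i] = {j: math.exp(v - m) / denom for j, v in e.items()}  # softmax
    return alpha


def attention_adjacency(alpha, nodes):
    """Dense |V| x |V| matrix [alpha_ij], with zeros where j is not in N_i."""
    return [[alpha.get(i, {}).get(j, 0.0) for j in nodes] for i in nodes]


tr = {"i": [1.0, 0.0], "j": [0.0, 1.0], "k": [1.0, 1.0]}
neighbors = {"i": ["j", "k"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, 0.0]  # scores depend only on the first entry of W Tr_j
alpha = attention_weights(tr, neighbors, W, a)  # {"i": {"j": ≈0.269, "k": ≈0.731}}
A_attn = attention_adjacency(alpha, ["i", "j", "k"])
```

Each row of the matrix sums to one over $N_i$, so the resulting matrix is generally not symmetric.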

$\alpha_{ij}$ thus represents trust between $i$ and $j$ with respect to all nodes in $N_i$. The $\alpha_{ij}$ obtained for the edges are used to create an attention-based adjacency matrix $\hat{A}_{attn} = [\alpha_{ij}]_{|V| \times |V|}$, which is later used to aggregate credibility features.

(b) Feature aggregation: The Graph Convolution Network [16] is a graph neural network model that efficiently aggregates features from a node's neighborhood. It consists of multiple neural network layers, and the information propagation between layers can be generalized by Eq. (15). Here, $H$ represents a hidden layer and $A$ represents the adjacency matrix of the subgraph ($A = \hat{A}_{attn}$). $H^{(0)} = Cr$ and $H^{(L)} = Z$, where $Z$ denotes the node-level output.

$$H^{(l+1)} = f\left(H^{(l)}, A\right) \qquad (15)$$

We implement a two-layer Graph Convolution Network using the propagation rule explained in [16]:

$$H^{(l+1)} = \sigma\left(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\right) \qquad (16)$$

Here, $\hat{A} = A + I$, where $I$ is the identity matrix of the neighborhood subgraph. This operation ensures that a node's own features are included when aggregating its neighbors' credibility features. $\hat{D}$ is the diagonal matrix of node degrees for $\hat{A}$, where $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$. $W^{(l)}$ is the layer weight matrix, and $\sigma$ denotes the activation function. Symmetric normalization by $\hat{D}^{-1/2}$ ensures that our model is not sensitive to the varying scale of the features being aggregated.
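A single propagation step of Eq. (16) on a small dense matrix can be sketched as follows. The sketch is illustrative only. It uses plain Python lists rather than a tensor library. It assumes non-negative edge weights, so every degree is at least one after self-loops are added. It takes ReLU as the activation $\sigma$, although Eq. (16) leaves $\sigma$ generic. The function names are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def matmul(X, Y):
    """Dense matrix product of X (n x m) and Y (m x p), as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def renormalized_adjacency(A):
    """D_hat^{-1/2} (A + I) D_hat^{-1/2}, with D_hat_ii = sum_j (A + I)_ij."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A_hat]  # degree >= 1 (self-loop)
    return [[inv_sqrt[i] * A_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def gcn_layer(A, H, W, activation=relu):
    """H^{(l+1)} = sigma(D_hat^{-1/2} A_hat D_hat^{-1/2} H^{(l)} W^{(l)})."""
    Z = matmul(matmul(renormalized_adjacency(A), H), W)
    return [[activation(v) for v in row] for row in Z]


A = [[0.0, 1.0], [1.0, 0.0]]   # two connected nodes
H = [[1.0], [3.0]]             # one credibility feature per node
W = [[1.0]]
H_next = gcn_layer(A, H, W)    # ≈ [[2.0], [2.0]]: each node averages itself and its neighbor
```

Both nodes in this toy graph have degree 2 after the self-loops are added, so every normalized entry is 0.5 and the layer reduces to a plain average.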


(c) Node classification: Using the credibility features and network structure of the nodes in $i$'s neighborhood, node representations are learned from the graph using a symmetric adjacency matrix with attention-based edge weights, $\hat{A} = \hat{D}^{-1/2} \hat{A}_{attn} \hat{D}^{-1/2}$. The following forward propagation model is applied:

$$Z = f\left(X, \hat{A}_{attn}\right) = \mathrm{softmax}\left(\hat{A}\, \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) \qquad (17)$$

$X$ represents the credibility features. $W^{(0)}$ and $W^{(1)}$ are the input-to-hidden and hidden-to-output weight matrices, respectively, and are learned using gradient descent. Classification is performed using the following cross-entropy loss function:

$$\mathcal{L} = -\sum_{l \in Y_L} \sum_{f \in Cr} Y_{lf} \ln Z_{lf} \qquad (18)$$

where $Y_L$ represents the indices of labeled vertices, $f$ represents each of the credibility features used in the model, and $Y \in \mathbb{R}^{|Y_L| \times |Cr|}$ is the label indicator matrix.

Results [30]
Classification results of the baselines and the proposed model are summarized in Table 2. The results are averaged over the 10 news events. We report the accuracy, precision, recall, and F1 scores of the false information spreader class ($V_F$) in the F (false) and F ∪ T (false and refutation combined) networks, and of the refutation spreader class ($V_T$) in the T (refutation) network. We observe that the structure-only baseline (LINE) performs better than the feature-only baselines (SVM), and that models combining node features and network structure (GCN, SAGE) show further improvement in performance. The proposed SCARLET model outperforms the other baselines on most metrics.
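The two-layer forward model of Eq. (17) and the loss of Eq. (18), restricted to labeled vertices, can be sketched together in pure Python. Several details are assumptions rather than statements from the chapter. Self-loops are added to $\hat{A}_{attn}$ before symmetric normalization, as in Eq. (16). The index $f$ simply runs over the columns of $Y$ and $Z$. The function names and the toy matrices are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def normalize(A_attn):
    """Symmetric normalization of A_attn + I (self-loops assumed, as in Eq. 16)."""
    n = len(A_attn)
    A_hat = [[A_attn[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A_hat]
    return [[inv_sqrt[i] * A_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def forward(A_attn, X, W0, W1):
    """Z = softmax(A ReLU(A X W0) W1), with A the normalized adjacency (Eq. 17)."""
    A = normalize(A_attn)
    H = [[max(0.0, v) for v in row] for row in matmul(matmul(A, X), W0)]
    return softmax_rows(matmul(matmul(A, H), W1))


def masked_cross_entropy(Y, Z, labeled, eps=1e-12):
    """Eq. (18): -sum_{l in Y_L} sum_f Y_lf ln Z_lf, over labeled rows only."""
    return -sum(Y[l][f] * math.log(max(Z[l][f], eps))
                for l in labeled for f in range(len(Y[l])))


A_attn = [[0.0, 0.6, 0.4], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
X = [[0.2, 0.9], [0.5, 0.1], [0.7, 0.3]]     # credibility features per node
W0 = [[1.0, -1.0], [0.5, 0.5]]
W1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # zero weights -> uniform Z
Z = forward(A_attn, X, W0, W1)
Y = {0: [1.0, 0.0, 0.0]}                      # only node 0 is labeled
loss = masked_cross_entropy(Y, Z, labeled=[0])  # = ln 3 for uniform Z
```

Unlabeled vertices still influence the labeled ones through the two rounds of neighborhood aggregation, but they do not contribute terms to the loss.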

9 Conclusion
There has been significant interest in developing computational models to mitigate the problem of fake news spreading, with the majority of the focus on determining the veracity of the information and on content analysis. We propose a novel spreader-centric false information detection and control model that analyzes social network structure, inspired by the domain of epidemiology. This research differs from most existing research in two ways. First, it proposes a spreader-centric rather than a content-centric modeling approach. Second, it does not rely on features extracted from false information, which motivates building false information mitigation strategies even for the scenario in which the false information has not yet originated. The research has shown encouraging results, which motivate pursuing the idea further. It is also worth noting that while the

Table 2 Model performance evaluation (V_F: false information spreader, V_T: refutation spreader)

Model      | F (V_F)                     | T (V_T)                     | F ∪ T (V_F)
           | Accu.  Prec.  Rec.   F1     | Accu.  Prec.  Rec.   F1     | Accu.  Prec.  Rec.   F1
SVM_Tr     | 0.497  0.512  0.468  0.478  | 0.473  0.472  0.452  0.445  | 0.398  0.19   0.465  0.229
SVM_Cr     | 0.508  0.517  0.517  0.509  | 0.501  0.477  0.565  0.509  | 0.408  0.196  0.542  0.272
SVM_Tr,Cr  | 0.516  0.514  0.579  0.53   | 0.52   0.513  0.598  0.545  | 0.444  0.193  0.489  0.267
LINE       | 0.686  0.626  0.896  0.733  | 0.635  0.608  0.881  0.717  | 0.688  0.71   0.896  0.786
SAGE_Tr    | 0.734  0.762  0.691  0.722  | 0.680  0.698  0.719  0.705  | 0.752  0.743  0.859  0.793
SAGE_Cr    | 0.747  0.772  0.710  0.736  | 0.714  0.692  0.764  0.725  | 0.764  0.747  0.881  0.805
SAGE_Tr,Cr | 0.779  0.831  0.720  0.763  | 0.755* 0.787* 0.732  0.755  | 0.785  0.764  0.878  0.814
GCN_Tr     | 0.784  0.726  0.947  0.821  | 0.718  0.675  0.916  0.767  | 0.753  0.783  0.930  0.845
GCN_Cr     | 0.800  0.742  0.953  0.834  | 0.731  0.697  0.906  0.773  | 0.762  0.786  0.940  0.851
GCN_Tr,Cr  | 0.824  0.774  0.942  0.848  | 0.743  0.702  0.916  0.783  | 0.776  0.788* 0.954  0.861
SCARLET    | 0.876* 0.834* 0.966* 0.893* | 0.734  0.674  0.981* 0.794* | 0.789* 0.785  0.972* 0.866*

* marks the highest value in each metric column.


epidemiology framework is proposed for false information prevention, it can also be generalized to mine insights for applications such as cyberbullying, abuse detection, and viral marketing.

Limitations
The proposed models are based on the assumption that the action of retweeting (without commenting) is a proxy for believing. Mainly due to API limitations, the research does not consider retweeting with commenting, replying with affirmation, or liking as proxies of trust. Epidemiology in the infection-spreading context is concerned not only with infection spreaders but also with people who become ill or die. This distinction is not made in our framework. While epidemiology models are infection-specific, our research treats all kinds of false information as a single kind of infection, i.e. we do not distinguish between types of false information. Also, since we consider retweeting as the proxy for infection spreading (i.e. for believing false information), our model cannot include spreaders who believe the false information but, instead of retweeting it on Twitter (platform-specific), spread it offline (word of mouth) or on another social networking platform. The research also does not distinguish between strong ties (e.g. family, friends) and weak ties (e.g. acquaintances).

Future Work
Since we propose a content-agnostic framework, our models do not factor in the interestingness of the information (whether it is carefully manipulated around a popular news topic) or of the information endorser (whether the person usually shares information that is considered interesting and likely to spark activity among other social media users). A possible research extension is to integrate models from natural language processing and sequential data analysis on social networks to capture knowledge from the content shared by the people who make up the social network.

References
1. Bengio Y, Simard P, Frasconi P, et al (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Networks 5(2):157–166
2. Bessi A, Ferrara E (2016) Social bots distort the 2016 US Presidential election online discussion. First Monday 21
3. Davis C, Varol O, Ferrara E, Flammini A, Menczer F (2016) BotOrNot: a system to evaluate social bots. In Proceedings of the 25th international conference companion on world wide web, pp 273–274
4. Castillo C, Mendoza M, Poblete B (2011) Information credibility on twitter. In Proceedings of the 20th international conference on world wide web, pp 675–684
5. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: encoder-decoder approaches. Preprint. arXiv:1409.1259
6. Graves A (2013) Generating sequences with recurrent neural networks. Preprint. arXiv:1308.0850
7. Gupta A, Lamba H, Kumaraguru P, Joshi A (2013) Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In Proceedings of the 22nd international conference on world wide web, pp 729–736


8. Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In Advances in neural information processing systems
9. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
10. Horne B, Adali S (2017) This just in: fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Eleventh international AAAI conference on web and social media
11. Hu X, Tang J, Gao H, Liu H (2014) Social spammer detection with sentiment information. In 2014 IEEE international conference on data mining, pp 180–189
12. Jin F, Dougherty E, Saraf P, Cao Y, Ramakrishnan N (2013) Epidemiological modeling of news and rumors on twitter. In Proceedings of the 7th workshop on social network mining and analysis, pp 1–9
13. Jin Z, Cao J, Guo H, Zhang Y, Luo J (2017) Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In Proceedings of the 25th ACM international conference on multimedia, pp 795–816
14. Johnson M (1998) PCFG models of linguistic tree representations. Comput Linguist 24:613–632
15. Khattar D, Goud J, Gupta M, Varma V (2019) MVAE: multimodal variational autoencoder for fake news detection. In The world wide web conference, pp 2915–2921
16. Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. Preprint. arXiv:1609.02907
17. Liu Y, Wu Y (2018) Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In Thirty-second AAAI conference on artificial intelligence
18. Long Y (2017) Fake news detection through multi-perspective speaker profiles. Association for Computational Linguistics, Stroudsburg
19. Lu Y, Li C (2020) GCAN: graph-aware co-attention networks for explainable fake news detection on social media. Preprint. arXiv:2004.11648
20. Ma J, Gao W, Wong K (2017) Detect rumors in microblog posts using propagation structure via kernel learning. In Proceedings of the 55th annual meeting of the association for computational linguistics (volume 1: long papers), pp 708–717
21. Ma J, Gao W, Wong K (2018) Rumor detection on twitter with tree-structured recursive neural networks. Association for Computational Linguistics, Stroudsburg
22. Metzger MJ, Flanagin AJ (2013) Credibility and trust of information in online environments: the use of cognitive heuristics. J Pragmat 59:210–220
23. Ott M, Choi Y, Cardie C, Hancock J (2011) Finding deceptive opinion spam by any stretch of the imagination. Preprint. arXiv:1107.4557
24. Qazvinian V, Rosengren E, Radev D, Mei Q (2011) Rumor has it: identifying misinformation in microblogs. In Proceedings of the conference on empirical methods in natural language processing, pp 1589–1599
25. Quattrociocchi W, Scala A, Sunstein C (2016) Echo chambers on facebook. Available at SSRN 2795110
26. Rath B (2021) False and refutation information network and historical behavioral data. Harvard Dataverse. https://doi.org/10.7910/DVN/GHAMOE
27. Rath B, Gao W, Ma J, Srivastava J (2017) From retweet to believability: utilizing trust to identify rumor spreaders on twitter. In Proceedings of ASONAM. ACM, New York, pp 179–186
28. Rath B, Gao W, Srivastava J (2019) Evaluating vulnerability to fake news in social networks: a community health assessment model. In 2019 IEEE/ACM international conference on advances in social networks analysis and mining. IEEE, New York, pp 432–435
29. Rath B, Salecha A, Srivastava J (2020) Detecting fake news spreaders in social networks using inductive representation learning. In 2020 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM), pp 182–189


30. Rath B, Morales X, Srivastava J (2021) SCARLET: explainable attention based graph neural network for fake news spreader prediction. In Pacific-Asia conference on knowledge discovery and data mining, pp 714–727
31. Renn O, Levine D (1991) Credibility and trust in risk communication. In Communicating risks to the public. Springer, New York
32. Roy A, Sarkar C, Srivastava J, Huh J (2016) Trustingness & trustworthiness: a pair of complementary trust measures in a social network. In Proceedings of the 2016 IEEE/ACM international conference on advances in social networks analysis and mining. IEEE Press, New York, pp 549–554
33. Ruchansky N, Seo S, Liu Y (2017) CSI: a hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on conference on information and knowledge management, pp 797–806
34. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
35. Ruths D (2019) The misinformation machine. Science 363:348
36. Shu K, Sliva A, Wang S, Tang J, Liu H (2017) Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor Newslett 19(1):22–36
37. Shu K, Wang S, Liu H (2017) Exploiting tri-relationship for fake news detection. Preprint. arXiv:1712.07709
38. Shu K, Cui L, Wang S, Lee D, Liu H (2019) dEFEND: explainable fake news detection. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 395–405
39. Tacchini E, Ballarin G, Della Vedova M, Moret S, Alfaro L (2017) Some like it hoax: automated fake news detection in social networks. Preprint. arXiv:1704.07506
40. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) LINE: large-scale information network embedding. In Proceedings of the 24th international conference on world wide web. International World Wide Web Conferences Steering Committee, pp 1067–1077
41. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. In ICLR, 2018
42. Volkova S, Jang J (2018) Misleading or falsification: inferring deceptive strategies and types in online news and social media. In Companion proceedings of the web conference 2018, pp 575–583
43. Wu L, Liu H (2018) Tracing fake-news footprints: characterizing social media messages by how they propagate. In Proceedings of the eleventh ACM international conference on web search and data mining, pp 637–645
44. Wu K, Yang S, Zhu K (2015) False rumors detection on sina weibo by propagation structures. In 2015 IEEE 31st international conference on data engineering, pp 651–662
45. Zhang J, Dong B, Philip S (2020) FakeDetector: effective fake news detection with deep diffusive neural network. In 2020 IEEE 36th international conference on data engineering (ICDE), pp 1826–1829

Understanding How Readers Determine the Legitimacy of Online Medical News Articles in the Era of Fake News
Srihaasa Pidikiti, Jason Shuo Zhang, Richard Han, Tamara Silbergleit Lehman, Qin Lv, and Shivakant Mishra

Abstract The rapid spread of fake news during the COVID-19 pandemic has aggravated the situation and made it extremely difficult for the World Health Organization and government officials to ensure that people are informed only by accurate scientific findings. Misinformation spread so freely that social media sites ultimately had to conceal COVID-19-related posts entirely and allow users to see only WHO- or government-approved information. This action had to be taken because news readers lack the ability to efficiently discern fact from fiction, and thereby indirectly aid the spread of fake news, believing it to be true. Our work helps in understanding the thought process of an individual reading a news article. This information can further be used to develop their critical thinking ability. We extend the study of misinformation's impact on users by conducting our own surveys to understand the factors consumers deem most important when deciding whether a piece of information is true. Results from our study show that what people perceive to be important in deciding what is true differs from what they rely on when confronted with actual articles. We also find that prior beliefs and political leanings affect people's ability to detect the legitimacy of information.
Keywords Fake news · Critical thinking · Source · Social media

1 Introduction
The emergence of the 24-hour news cycle, citizen journalism, and the abundant information available at our fingertips on social media has had a profound impact not only on how we consume news, but also on how we trust it. Moreover, in the digital era, there are ecosystems created by domestic groups or foreign actors

S. Pidikiti · J. S. Zhang · R. Han · T. S. Lehman · Q. Lv · S. Mishra
University of Colorado Boulder, Boulder, CO, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_3


to intentionally promote fake news and conspiracy theories, making this problem much more complicated. In November 2016, an analysis by BuzzFeed News found that the top fake election news stories generated more engagement (likes and shares) on Facebook than the top election stories from nineteen major news outlets combined [27]. The spread of misinformation is a real threat to our society, as it can erode public trust in legitimate news sources and undermine our understanding of important medical findings. To prevent the spread of misinformation, fact-checking has become an article of faith in the era of dueling facts. For example, social media platforms like Facebook and Twitter have teamed up with fact-checking organizations to label fake stories and remove them from their platforms [2]. However, the effectiveness of fact-checking is being questioned, as many citizens may resist fact-checking messages due to their prior beliefs. Moreover, in a world in which an entire industry exists to deceive consumers of information, it is hard for these service providers to scale up and automatically fact-check the plethora of information coming from different news sources. Given these challenges, it is critical for fighting misinformation to develop people's ability to discern fact from fiction and to respond productively to those who counter data with belief. We investigate this by conducting three surveys on Amazon Mechanical Turk, in which volunteers answered several questions designed to reveal how users discern true articles from fake ones. In the first survey, participants rated the importance of each factor based on their prior experience, without reading any news article. This survey revealed that Content and Source were rated as the most important factors, while Picture and Date were rated as the least important.
To understand how readers weigh these factors when determining the veracity of an article, the second survey presented each participant with 20 news articles and asked them to rank the factors used to assess each article's veracity. Interestingly, when reading these news articles, Content and Source turned out to be much less important in determining the legitimacy of the article than in the first survey. In contrast, Title turned out to be a much more important factor than in the first survey. These results suggest that the factors people actually rely on when judging whether a news article is fake differ from those they report relying on. Furthermore, results show that both conservatives and liberals are more likely to correctly judge the credibility of political news that is true and consistent with their beliefs. Finally, to understand the critical thinking process across different domains, the third survey presented volunteers with a set of news articles with health-related content. This survey revealed that people's political ideologies affect their ability to discriminate fake news articles from true ones, even when they are not reading political articles. Results show that content and source are still important for readers to identify fake articles, and we validate the responses through qualitative analysis. In this chapter, we systematically investigate how people determine the legitimacy of medical news articles by extending our prior work to this new domain [23]. First, we detail the work done in the research community to understand people's


perceptions of an online news article. Then, we explore factors that play important roles in this determination and examine how they vary in different scenarios. We look at several pieces of work in the literature and compare and contrast their approaches, including our own prior work. We extend our prior investigation to understand the user’s critical thinking process when reading medical news articles.

2 Background and Related Work
According to a 2016 survey, the majority of U.S. adults (about 62%) get news on social media, and 18% do so often [13]. Social confirmation on such platforms plays a major role in evaluating the credibility of news articles [19]: if several people consume some information, recommend it, and agree with it, users assume that it is credible. There is also a high chance that a person is never exposed to opposing beliefs and, as a result, experiences echo chamber effects [6]. Prior studies have successfully used user profiles and the sentiment scores of user comments on a news article to detect fake news [8, 26]. While our study is limited to presenting news articles in a vacuum, an article's representation on social media can still affect a user's ability to discern fake news, and we therefore see our work as complementary to these social media studies.

2.1 Presentation and Content in True and Fake News Articles
Fake news articles are presented in a way that favors their dissemination and credibility in comparison to true articles. One study showed that fake and true news articles are notably distinguishable, specifically in their titles [16]. Horne et al. used sources such as the BuzzFeed election data set and the Burfoot and Baldwin data set [5] to understand the distinct differences in presentation between fake and true news articles. By computing various stylistic, complexity, and psychological features of these articles, they found that the titles of fake news articles tend to use more verb phrases and proper nouns to get the point of the article across without requiring the reader to read the content. Chakraborty et al. observed that money-making clickbait articles, unlike true articles, have longer titles containing both content and function words, frequently misleading users [7]. This observation was based on a semantic and syntactic analysis of 15,000 clickbait and non-clickbait headlines using the Stanford CoreNLP tool. Another notable distinction between true and fake news lies in the way the content is presented: in fake news articles, the content is presented similarly to satire. There is evidence that the sentiment score of the content is useful for fake news detection [9]. Dickerson et al. computed the sentiment score of tweets during the 2014 Indian elections for this purpose. This score was based on


variables such as positive or negative sentiment strength, sentiment polarity fractions, and contradiction rank. It has also been found that misinformation spreaders include links to fact-checking websites such as Snopes [28] and PolitiFact [24] with misleading or inconsistent wording, thereby defeating the purpose of fact-checking [25]. Shao et al. studied this competition between misinformation and subsequent fact-checking by designing an open platform named Hoaxy, which collects and analyzes tweets about news from low-credibility sources and their fact-checks. The source of an article can also be a good indicator of fake news. Helmstetter et al. framed the problem as weakly supervised two-class classification on a data set of tweets whose sources were labeled as either trustworthy or untrustworthy. They extended this classifier to determine the legitimacy of the article as a whole and achieved high-quality results, implying a strong correlation between the source and the article's credibility [15]. Indeed, in a study of the factuality of various news reporting websites, Baly et al. found that the source of an article is one of the most important factors for credibility [3]. To determine factuality, this study collected features such as the existence of a Wikipedia page, the existence of a Twitter account for the website, and website traffic information. It combined these features with an in-depth analysis of the articles' bias, presentation, and sentiment. This prior work focuses on the distinguishing differences between true and fake news articles and thereby aids automatic fake news detection, essentially focusing on the producer side of online news dissemination. In addition, we explore the consumer side of online news dissemination in order to understand the relationship between an article's representation and its credibility.

2.2 Detecting Fake News Articles: The Reader's Side
Several pieces of work have studied the role of analytical thinking in detecting fake news [4, 20, 22, 29]. Here we present a brief survey of these methods. In one study, Moravec et al. set out to investigate the neural activity of Facebook users while they read news articles [20]. This study found that Facebook users are susceptible to confirmation bias, with only 17% of the studied users able to identify fake news better than chance. It also showed that even though flagging articles as fake news causes an increase in neural activity, it does not lead to higher accuracy in fake news detection. The authors point out that once users realize that an article's headline does not align with their beliefs, they simply ignore the article. Harper et al. suggest that party identification is a major factor in shaping people's perceptions of and reactions to political news [14]. In this study, researchers presented participants with customized news articles to understand how participants' perception of an article differed based on their political leaning. Articles were presented in two formats: one that purposefully aligned with the participants' political views and one that did not. The authors found that news articles inconsistent with the participants' political views ignited more cognitive

Understanding How Readers Determine the Legitimacy of Online Medical. . .


activity but did not lead to higher accuracy in recognizing fake news. This effect is notable given that medical news is sometimes presented in a politically charged manner, which affects consumers' trust even in medical news articles [12]. Gollust et al. designed a study with news articles about policies to reduce Type 2 diabetes, framed with different causal explanations. They found that exposure to messages about social determinants of health led to different levels of acceptance of these policies among participants. This behavior was explained by a possible polarization based on political leaning, with the social determinants framing presumed to reflect a liberal worldview. Besides a person's own prior beliefs, repeated exposure to a fake article may ultimately lead to the belief that it is true, and misinformation can keep influencing beliefs even after it has been corrected, a phenomenon often referred to as the continued influence effect of misinformation. In a prior study, this phenomenon was studied in relation to the correction of misinformation [29]. Swire et al. found that it affects a person's ability to detect false information even after that information has been corrected. Another study established that delusion-prone individuals, dogmatic individuals and religious fundamentalists were more likely to believe fake news, which may be explained by such individuals' reduced engagement in analytical thinking [4]. In this study, Bronstein et al. administered a series of measures, such as the Cognitive Reflection Test [10] and an illusory truth test, to determine the correlation between the participants' beliefs and their gullibility. While these studies established that factors other than analytical thinking ability affect a reader's ability to discern fake news, other studies have reported contradictory findings.
One study found that analytical thinking helps readers determine the veracity of political articles irrespective of whether the articles align with their beliefs [22]. Pennycook et al. also administered the Cognitive Reflection Test to understand how participants' beliefs relate to their perceived veracity of a news article. They found no difference in participants' fake news detection accuracy between politically consistent and politically discordant articles, and their results show a clear association between participants' analytical thinking and their accuracy in identifying fake news. In a more recent study, Pehlivanoglu et al. concluded that analytical thinking plays a larger role in identifying fake news than in identifying true news. They also found that the perceived credibility of the news source negatively affected participants' ability to identify true and fake articles [21]. How a person decides whether an article is truthful is a complex process that combines multiple factors. To better understand this process, our research analyzes the relative importance of an article's representation factors (Title, Content, Source, Picture, Author and Date) in how readers determine the legitimacy of online news articles, particularly in the medical field.


S. Pidikiti et al.

3 Methodology

We conducted three surveys using Amazon Mechanical Turk to understand the role of an article's representation in determining its veracity. Participants volunteered from all parts of the United States and were evenly distributed in age, political leaning, gender, news reading frequency and the type of community they live in (urban, rural or suburban). The surveys were administered to three mutually exclusive groups of participants. The first survey was designed to uncover which factors are most important when judging the credibility of an article. We offered participants six options (Title, Picture, Content, Source, Author and Date) and a free-text field where they could enter other factors they considered important. Based on the results of the first survey, we conducted two further surveys that presented participants with articles and asked them to determine whether each was true or fake. Surveys two and three differ in the nature of the articles presented: survey two focuses on political news, while survey three focuses on medical news. The goal of these two surveys is to understand differences in how readers judge an article's veracity in the two contexts. Our experimental protocol was reviewed and conducted under IRB Protocol 19-0610.

3.1 Survey 1

This survey was designed to identify the most important factors that users consider when determining the credibility of online news articles. We asked participants to rate the importance of six factors (Title, Picture, Content, Source, Author and Date) without showing them any specific news article. The goal of this survey is to understand the relative importance of the six factors in general, irrespective of the specifics of a news article. Each factor was rated on a four-point scale: (1) High, (2) Moderately High, (3) Moderately Low and (4) Low. We also asked participants for demographic information such as gender, political leaning and age to verify that the sample was evenly distributed. The survey also included test questions for validation purposes and paid 50 cents to each participant. A total of 100 responses passed our quality checks. These 100 respondents came from many parts of the United States and were equally distributed across age, with 20% in each of five age groups: 18–24, 25–34, 35–44, 45–54 and 55 and above. The political leaning of respondents was 41% liberal, 25% conservative and 33% moderate, and the remaining 1% chose "other".


3.2 Survey 2

Based on the first survey's findings, we designed survey two to understand how readers determine the veracity of political news articles given the article's representation. We selected 20 news articles (see Fig. 1 for an example) and asked participants to determine whether each was true or fake. We then asked participants to share their thought process by rating the importance of each of the article's representation factors (as identified in the first survey) in making their determination. The articles were selected from fact-checking organizations such as Snopes [28] and Politifact [24]. We maintained a balanced distribution of fake and true articles across liberal- and conservative-leaning articles: five liberal-leaning true articles, five conservative-leaning true articles, five liberal-leaning fake articles and five conservative-leaning fake articles. To minimize the temporal impact of the articles, we selected seven articles published in 2020, eight published in 2019, two published in 2018, two published in 2016 and one published in 2014. To gain deeper insight into participants' thought processes, at the end of the survey we asked them for more details about the article they were most confident about and the one they were least confident about. Participants were asked to give a specific explanation of their veracity decision and their confidence level for these two articles. This survey paid $1.50 to each participant. We received 100 responses that passed our quality checks. These 100 respondents came from many parts of the United States. The political leaning of respondents was 40% liberal, 36% conservative and 23% moderate, and the remaining 1% chose "other".

Fig. 1 Example of survey two questionnaire


3.3 Survey 3

As in survey two, in survey three we asked participants to determine the veracity of news articles, but this time the articles contained medical content. The purpose of this survey was to characterize how judging the veracity of a medically focused article differs from judging a political one. We selected 10 news articles (see Fig. 2 for an example) and asked participants to determine the veracity of each article and to rate the importance of the article's representation factors in making that determination. The articles were selected from fact-checking organizations such as Snopes [28] and Politifact [24]. Again, to maintain a balanced distribution, we selected five true articles and five fake articles. To minimize the temporal impact, we selected articles from a variety of publication years: two from 2021, three from 2020, two from 2019, two from 2018 and one from 2017. The political leaning of respondents was 47% liberal, 34% conservative and 19% moderate. To get a better sense of participants' decision-making process, at the end of the survey we asked them for a more detailed explanation of their decisions for the article they felt most confident about and the one they felt least confident about. We also asked participants to explain why they felt that level of confidence for those particular articles. The survey paid $1.50 to each participant. We received 100 responses that passed our quality checks. These 100 respondents came from many parts of the United States and were evenly distributed in age and gender.

Fig. 2 Example of survey three questionnaire


3.4 Clustering Analysis

For all three surveys, we used the k-means clustering algorithm [18] to analyze participant responses, primarily to understand differences in the responses with respect to participants' political leaning. Because respondents fall into three political leanings (liberal, moderate and conservative), we clustered the factor ratings into three groups (k = 3). To perform the clustering, we first transformed each respondent's answers into a vector containing the ratings given to each of the six factors (Title, Picture, Content, Source, Author and Date). The resulting vector is n-dimensional, where n is the number of factors (six) in survey one and the number of factors times the number of articles in surveys two and three (6 × 20 = 120 and 6 × 10 = 60, respectively). For surveys two and three, to understand participants' accuracy in determining whether an article was fake, we also clustered participants based on their veracity responses, choosing k with the elbow method [30]. Each participant's veracity responses were converted into an x-dimensional vector, where x is the number of articles presented in the survey (20 in survey two and 10 in survey three).
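The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It implements plain Lloyd's k-means in NumPy and reports the within-cluster sum of squared distances (inertia) for a range of k, which is the quantity usually plotted for the elbow method. The encoding of each veracity response as 1 (article judged correctly) or 0 (judged incorrectly), the function names `kmeans` and `elbow_curve`, and the random data are all hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise the centroids with k distinct respondents chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every respondent to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Inertia: total squared distance of each respondent to its own centroid.
    inertia = float(dists[np.arange(len(X)), labels].sum())
    return labels, centroids, inertia


def elbow_curve(X, k_values, seed=0):
    """Inertia for each candidate k; the 'elbow' is where the curve flattens."""
    return {k: kmeans(X, k, seed=seed)[2] for k in k_values}


# Hypothetical placeholder data (NOT the study's responses):
# 100 respondents x 20 articles, 1 = judged correctly, 0 = judged incorrectly.
rng = np.random.default_rng(42)
veracity = rng.integers(0, 2, size=(100, 20))

for k, inertia in elbow_curve(veracity, range(1, 7)).items():
    print(k, round(inertia, 1))

labels, centroids, _ = kmeans(veracity, k=3)
```

A single random initialisation can land in a poor local optimum; in practice one would run several initialisations and keep the lowest-inertia result, as scikit-learn's `KMeans` does through its `n_init` parameter.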

4 Results

We present a detailed analysis of the survey results. Surveys two and three build on the results of survey one, which identified the factors that matter most in determining the veracity of a news article. Each factor's level of importance was converted to a numerical scale (High = 3, Moderately High = 2, Moderately Low = 1, Low = 0) for data analysis.
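The numeric conversion above, together with the vector construction described in Sect. 3.4, can be sketched as below. The mapping values come from the text; the fixed factor order, the label strings and the function names are assumptions for illustration only.

```python
# Numeric scale stated in the text: High = 3, Moderately High = 2, Moderately Low = 1, Low = 0.
RATING_SCALE = {"High": 3, "Moderately High": 2, "Moderately Low": 1, "Low": 0}

# Assumed fixed factor order so that every respondent's vector is aligned.
FACTORS = ["Title", "Picture", "Content", "Source", "Author", "Date"]


def encode_ratings(response):
    """Map one article's factor labels, e.g. {"Title": "High", ...}, to a 6-value vector."""
    return [RATING_SCALE[response[factor]] for factor in FACTORS]


def encode_respondent(responses):
    """Concatenate per-article vectors (surveys two and three): 6 values per article."""
    vector = []
    for response in responses:
        vector.extend(encode_ratings(response))
    return vector


example = {"Title": "Moderately High", "Picture": "Low", "Content": "High",
           "Source": "High", "Author": "Moderately Low", "Date": "Low"}
print(encode_ratings(example))  # [2, 0, 3, 3, 1, 0]
print(len(encode_respondent([example] * 10)))  # 60, as for survey three's 10 articles
```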

4.1 Survey 1

Participants were asked to rate the importance of six factors of an article's representation when considering the veracity of a news article. Their responses were based on prior experience; no articles were shown to participants when the questions were asked. The initial representation factors were chosen by consensus within our group, and most group members agreed on the six factors: (1) title, (2) picture, (3) content, (4) source, (5) author and (6) date. In case other factors mattered, we also asked participants to enter any additional factors they considered. As shown in Fig. 3, Content and Source were rated as the most important factors in determining the credibility of news, whereas Title, Picture, Author and Date were


Fig. 3 Average factor ratings for survey one

Fig. 4 Average factor ratings grouped by age for survey one

rated as less important. The observed mean differences among the six factors were statistically significant (p < 0.05) according to a one-way ANOVA. To understand the results further, we examined the relationship between factor importance and age group. Figure 4 shows the responses grouped by age. Content and Source were the most important factors across all age groups, and the mean differences among the six factors were statistically significant (p < 0.05) within every age group according to one-way ANOVA. Title was also rated as highly important in the 18–24, 25–34 and 45–54 age groups, followed by Author in the same age groups. We also took respondents' political leaning into account and found that it did not affect the responses. Figure 5 shows the responses grouped by political affiliation. Content and Source were still the most important


Fig. 5 Average factor ratings grouped by political leaning for survey one

Table 1 Political leaning distribution of clustering analysis for survey one

Cluster | # participants | Liberal | Conservative | Moderate
Cluster 1 | 28 | 46.43% | 25% | 28.57%
Cluster 2 | 38 | 40.54% | 24.32% | 35.14%
Cluster 3 | 34 | 41.18% | 35.29% | 23.53%

factors for determining the veracity of an article, irrespective of political leaning. The mean differences among the six factors were statistically significant (p < 0.05) according to one-way ANOVA, regardless of political leaning. Among moderate respondents, Title was the next most important factor, whereas among conservative respondents, Author was. We performed k-means (k = 3) clustering to further understand the relationship between the factor ratings and participants' political leaning, representing each respondent's ratings as a 6-dimensional vector. The political leaning distribution of each cluster is shown in Table 1, and the results grouped by cluster are shown in Fig. 6. Cluster 1 has the highest percentage of liberals (46%), cluster 2 the highest percentage of moderates (35%) and cluster 3 the highest percentage of conservatives (35%). Respondents in cluster 1 rated every factor higher than respondents in the other clusters. In clusters 1 and 3, Title and Content were rated as the most important factors, whereas cluster 2 rated Content and Source as the most important. Finally, beyond the six factors originally presented, participants mentioned two additional factors: Popularity and Recommendation. We therefore ran subsequent surveys that included these two factors (not presented here), but respondents consistently rated them as the least important, so we excluded them from further surveys to simplify the questions.
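Once each respondent has a cluster label from a k-means run on the 6-dimensional rating vectors, the per-cluster political composition reported in Table 1 is a simple tally. The sketch below is illustrative; the function name is hypothetical. The example uses the counts implied by cluster 1 of Table 1 (13 liberal, 7 conservative and 8 moderate out of 28), which reproduce its published shares.

```python
from collections import Counter


def leaning_distribution(labels, leanings):
    """Share (%) of each political leaning within each cluster, as in Table 1."""
    table = {}
    for cluster in sorted(set(labels)):
        members = [lean for lab, lean in zip(labels, leanings) if lab == cluster]
        counts = Counter(members)
        row = {"n": len(members)}
        for lean, count in counts.items():
            row[lean] = round(100 * count / len(members), 2)
        table[cluster] = row
    return table


# Counts implied by cluster 1 of Table 1 (28 respondents).
labels = [1] * 28
leanings = ["Liberal"] * 13 + ["Conservative"] * 7 + ["Moderate"] * 8
print(leaning_distribution(labels, leanings))
# {1: {'n': 28, 'Liberal': 46.43, 'Conservative': 25.0, 'Moderate': 28.57}}
```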


Fig. 6 Average factor ratings grouped by cluster for survey one

Fig. 7 Accuracy of the veracity determination of participants grouped by political leaning in survey two

4.2 Survey 2

We designed survey two with a balanced set of true and fake political news articles to understand how accurately readers with different political leanings could distinguish them. Figure 7 shows the results of survey two. Respondents who self-identified as liberal were better at detecting true news articles than the other groups, and this difference was statistically significant (p < 0.05) according to a Mann-Whitney test. In general, respondents were more accurate in judging the legitimacy of articles that aligned with their political leaning. Conservatives achieved higher accuracy (74% vs. 63%) in identifying fake articles that were conservative in nature,


while liberals achieved higher accuracy (81% vs. 75%) in identifying fake articles that were liberal. One surprising result is the lower accuracy in identifying true news articles across political groups: across all groups, true news detection accuracy averaged 43%, compared with 71% for fake news. This result suggests that participants were already skeptical of the articles presented in the survey. We did find some differences between political groups in true news detection accuracy. Concretely, liberals were more accurate than conservatives in identifying true articles, both conservative ones (44% vs. 42%) and liberal ones (53% vs. 36%). We also performed k-means clustering on participants' veracity accuracy for all articles (20-dimensional data). Based on the elbow method, k = 3 was chosen as the optimal number of clusters. Respondents in cluster one had higher fake news detection accuracy, while respondents in cluster two had lower true news detection accuracy. Respondents in cluster three were more accurate on true news than on fake news. Of the respondents in cluster one, 56% were liberal, and of those in cluster two, 61% were conservative. Taken together, these results indicate that political leaning may influence people's accuracy in judging the veracity of news articles.

Qualitative Analysis We performed a final step in survey two to assess the quality of our results. We asked participants to explain their choices for the articles they were most and least confident about, and we grouped the responses by the topics mentioned. The results are shown in Table 2 for the most confident article and in Table 3 for the least confident one. In general, there were three major reasons for being confident or not: content, source and familiarity. These responses align with

Table 2 Reasons for respondents being highly confident about a political news article

Category | # Participants | Sample responses
Unreliable content | 21 | "Incredible stupidity of the content." "Content was implausible with visual style similar to other fake websites." "Poorly written and highly biased content."
Familiarity | 20 | "I heard about this from multiple sources." "I remember when the article was debunked in 2016." "I recall reading in a well known newspaper the Washington examiner."
Unreliable source | 11 | "Name of the source website and ludicrous claims in the article." "Source seemed particularly iffy." "Source sounds unknown not reputable fake."
Misleading title | 8 | "Title seemed completely click-bait." "Content had an incendiary title." "Headline seemed fishy to me."


Table 3 Reasons for respondents being highly uncertain about a political news article they read

Category | # Participants | Sample responses
Unfamiliar with the story | 18 | "I haven't heard the story before today." "This was unfamiliar to me therefore I wasn't sure about its accuracy." "It looked real but I was unsure because I wasn't aware of the alleged facts."
Trust source but not content | 10 | "The It didn't sound credible but came from credible source." "It came from reliable source but content seemed suspicious." "I highly trust National Public Radio but the content would be weird for a possible presidential candidate."
Event date | 6 | "I do not know what happened in 1995 and Trump helping out." "It is hard to know because it was so long ago." "Because the date of news is outdated it should be in 2020."
Trust content but not source | 4 | "Content seemed legitimate but the source did not." "Content seemed credible but I could not get past the source." "Fox News is generally untrustworthy but they do have some news people that are actual journalists despite their conservative news."

our prior findings and reinforce the conclusion that content and source are the most important factors affecting readers' trust in a political news article.
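The group comparison in survey two relies on a Mann-Whitney test. A minimal sketch with SciPy is shown below, assuming (this is not stated in the text) that the compared quantity is each respondent's accuracy, i.e. the fraction of articles judged correctly. The function names and the accuracy values are hypothetical placeholders, not the survey data. A rank-based test suits these bounded, discrete scores because it does not assume normality.

```python
from scipy.stats import mannwhitneyu


def respondent_accuracy(answers, truth):
    """Fraction of articles a respondent labelled correctly ("true"/"fake")."""
    return sum(a == t for a, t in zip(answers, truth)) / len(truth)


def compare_groups(acc_a, acc_b):
    """Two-sided Mann-Whitney U test on per-respondent accuracies of two groups."""
    return mannwhitneyu(acc_a, acc_b, alternative="two-sided")


# Hypothetical per-respondent accuracies (NOT the survey data).
liberal = [0.80, 0.85, 0.90, 0.75, 0.80, 0.95, 0.85, 0.90]
conservative = [0.40, 0.50, 0.45, 0.55, 0.50, 0.35, 0.60, 0.45]
stat, p_value = compare_groups(liberal, conservative)
print(f"U = {stat}, p = {p_value:.4f}")
```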

4.3 Survey 3

We extended our study to understand readers' thought processes for medical news articles. Participants were presented with a balanced set of true and fake articles and were asked to rate the factors that helped them determine each article's veracity. As shown in Fig. 8, the results were similar to those of survey two (political content): Content, Source and Title were identified as the most important factors in determining the legitimacy of a medical news article. The mean differences among the six factors were statistically significant (p < 0.05) according to one-way ANOVA. We also found a statistically significant correlation between the importance rating of Content and participants' age (p < 0.05) according to Spearman's rank-order correlation test. Results for survey three grouped by age are shown in Fig. 9: the importance of the Content factor increases with age.
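The two tests named above can be sketched with SciPy as follows. This is an illustration on hypothetical data, not the authors' analysis script. The ordinal coding of the five age groups as 0 to 4 and the function names are assumptions.

```python
import numpy as np
from scipy import stats


def factor_anova(ratings):
    """One-way ANOVA comparing the six factor columns of a respondents x 6 matrix."""
    ratings = np.asarray(ratings, dtype=float)
    groups = [ratings[:, j] for j in range(ratings.shape[1])]
    return stats.f_oneway(*groups)


def rating_age_correlation(ratings, age_group):
    """Spearman rank-order correlation between one factor's rating and ordinal age group."""
    return stats.spearmanr(ratings, age_group)


# Hypothetical ratings on the 0-3 scale, columns Title..Date (NOT the survey data).
ratings = np.array([
    [1, 0, 3, 3, 1, 0],
    [2, 1, 2, 3, 1, 1],
    [1, 0, 3, 2, 2, 0],
    [1, 1, 3, 3, 1, 0],
] * 5)
f_stat, p_anova = factor_anova(ratings)

# Age groups coded 0..4 for 18-24, 25-34, 35-44, 45-54 and 55+ (assumed ordinal coding).
content = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3]
ages = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
rho, p_spearman = rating_age_correlation(content, ages)
print(f"F = {f_stat:.2f} (p = {p_anova:.2g}); rho = {rho:.3f} (p = {p_spearman:.2g})")
```

The one-way test treats the six factor columns as independent groups even though each respondent rates all six; a repeated-measures alternative such as the Friedman test (`scipy.stats.friedmanchisquare`) would account for that pairing.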


Fig. 8 Average factor ratings for survey three

Fig. 9 Average factor ratings grouped by age for survey three

Additionally, Content, Source and Title were the three most important factors irrespective of participants' age. Participants determined the legitimacy of the presented medical news articles with an average accuracy of 64.6%. Looking at true and fake articles separately, accuracy was higher for fake medical articles (74.8%) than for true medical articles (54.4%). To compare the results of this survey with survey two, we examined the impact of participants' political leaning on their legitimacy judgments for medical articles. As shown in Fig. 10, liberals performed better than conservatives, with a statistically significant difference (p < 0.05) according to a Mann-Whitney test. This result for medical news articles is consistent with that observed in survey two for political news articles.
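The overall, true-article and fake-article accuracies above can be computed with a short helper such as the one below. This is a sketch; the function name and the "true"/"fake" answer strings are assumptions. With the balanced design (five true and five fake articles), overall accuracy equals the mean of the two per-label accuracies, which matches the reported figures: (74.8% + 54.4%) / 2 = 64.6%.

```python
def accuracy_by_veracity(judgements, truth):
    """Overall accuracy and accuracy split by each article's ground truth.

    judgements: one list of "true"/"fake" answers per respondent.
    truth: the ground-truth label of each article, in the same order.
    """
    correct = {"true": 0, "fake": 0}
    total = {"true": 0, "fake": 0}
    for answers in judgements:
        for answer, label in zip(answers, truth):
            total[label] += 1
            correct[label] += int(answer == label)
    per_label = {label: correct[label] / total[label] for label in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_label


# Balanced design: five true and five fake articles (labels assumed for illustration).
truth = ["true"] * 5 + ["fake"] * 5
# A single hypothetical respondent who calls every article fake.
overall, per_label = accuracy_by_veracity([["fake"] * 10], truth)
print(overall, per_label)  # 0.5 {'true': 0.0, 'fake': 1.0}
```

The example also shows how uniform skepticism plays out: a respondent who labels everything fake scores perfectly on fake articles, zero on true ones, and exactly 50% overall.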


Fig. 10 Accuracy grouped by political leaning in survey three

Fig. 11 Average factor ratings grouped by cluster for survey three

We performed k-means (k = 3) clustering on the ratings for each of the six factors for each of the 10 articles, i.e., a 60-dimensional vector per participant. Figure 11 shows the results for survey three grouped by cluster. Respondents in cluster one gave the lowest ratings overall for all factors, and respondents in cluster three gave the highest. To further understand the clustering results, we examined the political leaning distribution in each cluster (shown in Table 4). The majority of participants in cluster one were conservative, whereas clusters two and three had a higher proportion of liberal-leaning individuals. Participants in cluster one were less accurate in determining a medical news article's veracity (56.53%) than participants in cluster two (68.04%) and cluster three (66.43%).
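Figure 11 summarizes each cluster by its average factor ratings. Assuming the 60-dimensional vector stores the six factor ratings of article 1, then those of article 2, and so on (an ordering assumed for illustration), the per-cluster averages can be recovered by reshaping each vector into a 10 × 6 array and averaging over member respondents and articles. The function name and data below are hypothetical.

```python
import numpy as np

FACTORS = ["Title", "Picture", "Content", "Source", "Author", "Date"]


def mean_factor_ratings_by_cluster(vectors, labels, n_articles=10):
    """Average rating of each factor per cluster from article-major rating vectors."""
    X = np.asarray(vectors, dtype=float)
    # (respondents, articles * factors) -> (respondents, articles, factors)
    X = X.reshape(len(X), n_articles, len(FACTORS))
    labels = np.asarray(labels)
    result = {}
    for cluster in np.unique(labels):
        # Average over every member respondent and every article.
        per_factor = X[labels == cluster].mean(axis=(0, 1))
        result[int(cluster)] = dict(zip(FACTORS, per_factor.round(2).tolist()))
    return result


# Hypothetical respondents: the same six ratings repeated for all 10 articles.
vectors = [[1, 0, 3, 3, 1, 0] * 10, [3, 2, 3, 3, 1, 2] * 10, [0, 0, 1, 1, 0, 0] * 10]
print(mean_factor_ratings_by_cluster(vectors, labels=[0, 0, 1]))
```

The ordering assumption matters: if the vectors were factor-major instead, the reshape would need the shape (respondents, factors, articles).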


Table 4 Political leaning distribution of clustering analysis for survey three

Cluster | # participants | Liberal | Conservative | Moderate
Cluster 1 | 26 | 19.23% | 57.69% | 23.08%
Cluster 2 | 46 | 56.52% | 21.74% | 21.74%
Cluster 3 | 28 | 57.14% | 32.14% | 10.71%

Table 5 Reasons for respondents being highly confident about a medical news article

Category | # Participants | Sample responses
Unreliable content | 20 | "The language used and just the article content itself was ridiculous and obviously fake." "Sounds heavily opinionated piece, not a lot of facts presented." "The content seemed really over the top and exaggerated."
Familiarity | 18 | "I read about it on several news sites when this happened and I found it interesting which is why I remembered it." "I remember when this happened during the pandemic."
Plausibility | 16 | "Because it was impossible claim." "This is extremely unbelievable to the point that it's comical."
Unreliable source | 13 | "The source isn't looking legit." "The source was Pharmacist Steve, which is just ridiculous." "The source is 'Natural News' which I'm quite sure isn't a reputable source."

Qualitative Analysis We conducted further qualitative analysis to understand why users were confident or uncertain about an article's legitimacy and how these factors entered their thought process. Each participant was asked to choose the article they were most confident about and the one they were least confident about, and to explain each choice. We grouped the responses into categories according to the topics discussed. The top categories and sample responses for each category are summarized in Tables 5 and 6. In line with the main survey results, content and source were the two main factors mentioned when participants explained why they were confident about an article's veracity. As shown in Table 5, 20 respondents felt confident about their decision because the article's content was untrustworthy, which led them to conclude that the article was fake. Another 18 respondents felt confident because they were already familiar with the content. Sixteen respondents felt confident because of the implausibility of the article's content: they simply did not believe the story could be real. Thirteen respondents felt confident because they considered the source unreliable. Responses in these four categories, especially those about Content and Source, are consistent with the main results of survey three, in which these factors were rated as the most important in determining the legitimacy of a medical article. Again corroborating prior findings, content and source were the two main factors mentioned as reasons for being uncertain about an article's veracity. As


Table 6 Reasons for respondents being highly uncertain about a medical news article they read

Category | # Participants | Sample responses
Trust source but not content | 12 | "The New York Post is a fairly truthful newspaper but couldn't be sure about this article." "I live in new jersey, and trust nj.com but the content felt a little bit over the top and weird." "The Guardian is a reputable source but the article itself seems fake."
Familiarity with conflicting reports | 9 | "There are other stories about amoebas and neti pots and while not sure if those are true, it still makes this one seem possible." "I have heard conflicting reports of people dying from the flu shot so I was never sure if they were accurate or not." "I think I remember reading something similar, but wasn't sure if it was debunked or not."
Trust content but not source | 8 | "It cited a psychiatry journal. But the Daily Mail is hit or miss for facts." "The story made sense but I've never heard of this publication." "The content seemed like it could be legit but the source seemed like it could be shady."

shown in Table 6, 12 respondents felt uncertain about their decision because, even though they trusted the source, they doubted the content's credibility. Nine respondents felt uncertain because they had previously encountered another article that contradicted the content of the presented article. Interestingly, eight respondents felt uncertain because, even though the content sounded credible, they did not trust the source. Six respondents were uncertain because the events mentioned in the article happened a long time ago. Most of these responses specifically mention Source and Content, which is again in line with the main results of survey three. Since Source and Content are rated as important in determining an article's legitimacy, one of them appearing credible while the other seemed untrustworthy may have led respondents to feel uncertain about that article. Moreover, familiarity with the content was another cause for being confident (when prior knowledge was in line with the presented story) or uncertain (when prior knowledge contradicted it). Together, these responses about respondents' decision-making provide qualitative evidence for the factors rated highly in all three surveys.


5 Discussion

Prior work shows that people generally make judgments about a news article's legitimacy very quickly [17]. Rather than carefully reading the source and content, which are supposedly the most important factors, and possibly cross-checking the content, many people make their judgments immediately after reading the title. This problem is prevalent on social media: earlier research points out that nearly 80% of people share news articles online right after reading the headline [11], and only the remaining 20% read the rest. Even though our participants rated Content and Source as the most important factors, this is not necessarily reflected in their real-life behavior. This observation highlights the challenge people face in identifying fake news. Furthermore, the "filter bubbles" that social media spawns and "fake news" continue to have drastic consequences for political partisanship. Recent studies show that both Democrats and Republicans are 15% more likely to believe "ideologically aligned headlines" [1]. The results of survey two echo these findings and show that people are more likely to believe political news that is consistent with their political leaning, regardless of whether the news is fake or true. Because of these "filter bubbles," it is critically important to maintain a politically neutral context for medical news. In this study, we have shown that the thought process people use to determine the veracity of political news articles is the same as the one they use for medical articles. For this reason, it is important to present medical news without any political leaning attached to it and to help readers avoid the "filter bubbles" that shape whether they accept certain medical findings. Our survey results also show that participants who identified as liberal performed better than conservatives in determining the legitimacy of an article.
The findings of our work suggest fruitful directions for future research. First, social media platforms may consider the factors identified here, such as the fact that most people rely on content and source to determine the veracity of news articles, when designing their news-forwarding mechanisms to minimize the spread of misinformation. Second, future work may investigate not only the perception of fake news but also how reading news changes beliefs and behavior in real life. Understanding to what extent and how fake news affects ongoing medical events worldwide is an important research question.

Limitations There are several limitations of our studies to acknowledge. First, our surveys were run on Amazon Mechanical Turk, and each survey was limited to 100 responses, which is a relatively small and not representative sample of the population. However, prior work suggests that Mechanical Turk workers are a reliable source for studying behavior associated with fake news determination, which is why we chose this method and are confident in our findings [22]. Moreover, the statistical comparisons reported above were significant at p < 0.05. Second, the user interface of the news articles in our study is a confounding factor that is difficult to rule out; however, the surveys were designed to reflect how people read diverse news articles on social media. Third, our study environment is not a perfect simulation of a real-world setting. Participants were

74

S. Pidikiti et al.

asked to rate a stream of news articles with no posts from friends, family members, or advertisements. Finally, participants were asked to rate each of the article’s representation factors’ importance before determining the credibility of the article. This could have prompted participants to think more analytically. In future work, we plan to design an application to monitor participants’ real-time online news consumption and query them regarding the most important factors aiding their judgment about each article’s credibility to further explore and address these problems.

6 Conclusion
Guided by the open question of which factors contribute to determining the legitimacy of online news articles, in both the political and medical arenas, we conducted empirical surveys and found that Content and Source are the most important factors, regardless of the theme. We further found that when participants are shown actual news content, Content and Source remain the most important factors, but their importance is lower than when participants rate these factors in the abstract. Our studies, which use an equal number of true and fake articles, reveal differences in accuracy between ideological groups, particularly for true news articles. Furthermore, we performed a qualitative analysis to understand why users were confident or uncertain about an article’s legitimacy and how the factors shaped their thought process. Together, these results shed light on readers’ thinking process when deciding the credibility of an article and can inform efforts to improve individuals’ critical thinking skills to combat fake news online.

References
1. Allcott H, Gentzkow M (2017) Social media and fake news in the 2016 election. J Econ Perspect 31(2):211–236
2. Ananny M (2018) The partnership press: lessons for platform-publisher collaborations as Facebook and news outlets team to fight misinformation. Tow Center for Digital Journalism
3. Baly R, Karadzhov G, Alexandrov D, Glass J, Nakov P (2018) Predicting factuality of reporting and bias of news media sources. Preprint. arXiv:1810.01765
4. Bronstein MV, Pennycook G, Bear A, Rand DG, Cannon TD (2019) Belief in fake news is associated with delusionality, dogmatism, religious fundamentalism, and reduced analytic thinking. J Appl Res Memory Cogn 8(1):108–117
5. Burfoot C, Baldwin T (2009) Automatic satire detection: are you having a laugh? In Proceedings of the 2009 ACL-IJCNLP association for computational linguistics and the international joint conference on natural language processing, pp 161–164
6. Cerf VG Information and misinformation on the internet. Commun ACM 60(1):9
7. Chakraborty A, Paranjape B, Kakarla S, Ganguly N (2016) Stop clickbait: detecting and preventing clickbaits in online news media. In 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, New York, pp 9–16


8. Cui L, Wang S, Lee D (2019) SAME: sentiment-aware multi-modal embedding for detecting fake news. In Proceedings of the 2019 IEEE/ACM international conference on advances in social networks analysis and mining, pp 41–48
9. Dickerson JP, Kagan V, Subrahmanian V (2014) Using sentiment to detect bots on Twitter: are humans more opinionated than bots? In 2014 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM 2014). IEEE, New York, pp 620–627
10. Frederick S (2005) Cognitive reflection and decision making. J Econ Perspect 19(4):25–42
11. Gabielkov M, Ramachandran A, Chaintreau A, Legout A (2016) Social clicks: what and who gets read on Twitter? In Proceedings of the 2016 ACM SIGMETRICS international conference on measurement and modeling of computer science, pp 179–192
12. Gollust SE, Lantz PM, Ubel PA (2009) The polarizing effect of news media messages about the social determinants of health. Am J Publ Health 99(12):2160–2167
13. Gottfried J, Shearer E (2016) News use across social media platforms 2016. Pew Research Center, Washington
14. Harper CA, Baguley T (2019) You are fake news: ideological (a)symmetries in perceptions of media legitimacy. Preprint. PsyArXiv:10.31234
15. Helmstetter S, Paulheim H (2018) Weakly supervised learning for fake news detection on Twitter. In 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, New York, pp 274–277
16. Horne B, Adali S (2017) This just in: fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the international AAAI conference on web and social media, vol 11
17. Klein N, O’Brien E (2018) People use less information than they think to make up their minds. Proc Natl Acad Sci 115(52):13222–13227
18. Krishna K, Murty MN (1999) Genetic k-means algorithm. IEEE Trans Syst Man Cybernet Part B (Cybernetics) 29(3):433–439
19. Metzger MJ, Flanagin AJ, Medders RB (2010) Social and heuristic approaches to credibility evaluation online. J Commun 60(3):413–439
20. Moravec P, Minas R, Dennis AR (2018) Fake news on social media: people believe what they want to believe when it makes no sense at all. Kelley School of Business Research Paper (18–87)
21. Pehlivanoglu D, Lin T, Deceus F, Heemskerk A, Ebner NC, Cahill BS (2021) The role of analytical reasoning and source credibility on the evaluation of real and fake full-length news articles. Cognit Res Prin Impl 6(1):1–12
22. Pennycook G, Rand DG Lazy, not biased: susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition 188:39–50
23. Pidikiti S, Zhang JS, Han R, Lehman T, Lv Q, Mishra S (2020) Understanding how readers determine the legitimacy of online news articles in the era of fake news. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, New York, pp 768–775
24. Politifact (2019). http://www.politifact.com
25. Shao C, Hui PM, Wang L, Jiang X, Flammini A, Menczer F, Ciampaglia GL (2018) Anatomy of an online misinformation network. Publ Lib Sci One 13(4):e0196087
26. Shu K, Zhou X, Wang S, Zafarani R, Liu H (2019) The role of user profiles for fake news detection. In Proceedings of the 2019 IEEE/ACM international conference on advances in social networks analysis and mining, pp 436–439
27. Silverman C (2016) This analysis shows how viral fake election news stories outperformed real news on Facebook. BuzzFeed News 16
28. Snopes (2019). http://www.snopes.com
29. Swire B, Ecker UK, Lewandowsky S (2017) The role of familiarity in correcting inaccurate information. J Exp Psychol Learn Memory Cogn 43(12):1948
30. Thorndike RL (1953) Who belongs in the family? Psychometrika 18(4):267–276

Trends, Politics, Sentiments, and Misinformation: Understanding People’s Reactions to COVID-19 During Its Early Stages
Omar Abdel Wahab, Ali Mustafa, and André Bertrand Abisseck Bamatakina

Abstract
The sudden outbreak of COVID-19 resulted in large volumes of data being shared on social media platforms. Analyzing and visualizing these data is essential to understanding in depth the pandemic’s impacts on people’s lives and their reactions to them. In this work, we conduct a large-scale spatiotemporal data analytic study to understand people’s reactions to the COVID-19 pandemic during its early stages. In particular, we analyze a JSON-based dataset of 5.2M English-language posts about COVID-19 collected from news/message boards/blogs over a period of 4 months, from December 2019 to March 2020, on several social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK. Our study mainly aims to understand which implications of COVID-19 interested social media users the most and how these interests varied over time, the spatiotemporal distribution of misinformation, and public opinion toward public figures during the pandemic. Our results can help many parties (e.g., governments, psychologists) make better-informed decisions that take into account the actual interests and opinions of the people.
Keywords COVID-19 pandemic · Social media · Misinformation · Sentiment analysis · Spatiotemporal analysis · Data analytics

O. A. Wahab · A. Mustafa · A. B. A. Bamatakina
Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, QC, Canada
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_4

1 Introduction
The abrupt outbreak of COVID-19 has created a global crisis that has affected not only our physical health but also our mental health and way of living [9, 14]. As a result of the pandemic, social media usage has undeniably gone up. In fact, the stay-at-home orders that followed the rapid outbreak of COVID-19 pushed us to rely more and more on the Internet, not only for entertainment but also to work from home, to pursue our education virtually, and to catch up with family and friends. Moreover, as shopping centres, stores and restaurants closed their doors for months, most of our shopping shifted online. For example, a survey of over 4500 members of the Influenster community (a product discovery and reviews platform for consumers) in North America, conducted by the media and research organization Digital Commerce 360, reported that social media consumption increased by up to 72% and posting activity by up to 43% during the pandemic.¹ In addition, social media has been the primary communication platform for health professionals, governments, universities and organizations to deliver pandemic-related information to the public [4, 7]. Thus, the pandemic and the subsequent nationwide lockdowns have entailed an unprecedented surge in social media usage across the world. Consequently, it is crucial to perform in-depth social media analysis to extract useful insights about the COVID-19 pandemic and people’s reactions to it. Given that traditional survey methods are time-consuming and expensive to conduct [16], there is a clear need for proactive and timely data analytic studies to understand and respond to the rapidly emerging effects of the pandemic on our physical and mental health. Several social media analysis studies [1, 3, 6, 10–12, 15] have recently been conducted in an attempt to understand the impacts of COVID-19 on people’s lives and attitudes. Our work aims to complement these studies by providing a large-scale spatiotemporal analysis of people’s reactions to the COVID-19 pandemic during its early stages.
Our work differs from these studies in two respects: (1) unlike most of these studies, which rely on Twitter data, we analyze data collected from many social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK, some of which have not been included in earlier studies; (2) we focus on the first four months of the pandemic in order to understand how people’s reactions and opinions regarding the pandemic evolved over time.

1.1 Contributions
We conduct a large-scale study on a dataset [5] that contains 5.2M posts about COVID-19 collected from news/message boards/blogs over a period of 4 months (December 2019 to March 2020). The goal is to understand how people reacted to the COVID-19 pandemic during its early stages and how the pandemic affected people’s opinions on several matters. To attain this goal, we formulate the following research questions:
1. How did the number of COVID-19-related posts on social media evolve over time?
2. Which Web sites were most consulted by social media users to get updates on COVID-19?
3. How did the interests of social media users in the ramifications of COVID-19 vary across the first four months of the pandemic?
4. How did the spread of illegitimate information evolve over time?
5. Which countries were most targeted by the posts shared on social media?
6. What are the temporal and geographic distributions of illegitimate information?
7. Which public figures were most mentioned on social media?
8. What were the public sentiments toward the most mentioned public figures?

¹ https://www.digitalcommerce360.com/2020/09/16/covid-19-is-changing-how-why-and-howmuch-were-using-social-media/

1.2 Organization
In Sect. 2, we review the related studies and highlight the unique contributions of this work. In Sect. 3, we first describe the environment and tools used to conduct our study and then present and discuss the results. Finally, in Sect. 4, we summarize the main findings of the paper.

2 Related Work
In [11], the authors examine the volume, content, and geospatial distribution of tweets related to telehealth during the COVID-19 pandemic. To do so, they use public Twitter data on telehealth in the United States collected from March 30, 2020 to April 6, 2020, and analyze the tweets with a combination of cluster analysis and natural language processing techniques. The study highlights the importance of social media in promoting telehealth-favoring policies to counter mental health problems in highly affected areas. In [6], the authors study the impact, advantages and limitations of using social networks during the COVID-19 pandemic. They conclude that social media is important for fostering the dissemination of important information, diagnostics, treatments and follow-up protocols. However, they also note that social media can be misused to spread false data, pessimistic information and myths, which could contribute to increased depression and anxiety among people. In [3], the authors perform a large-scale analysis of COVID-19-related data shared on Instagram, Twitter, Reddit, YouTube and Gab. In particular, they investigate the engagement of social media users with COVID-19 and provide a comparative evaluation of the evolution of the discourse on each platform. The main finding of the article is that the interaction patterns of each platform, along with the distinctiveness of each platform’s audience, are crucial factors in the spread of information and misinformation. In [15], the authors investigate the propagation, authors and content of false information related to COVID-19. To do so, they gathered 1500 fact-checked tweets associated with COVID-19 from January to mid-July 2020, of which 1274 are false and 226 are partially false. The study suggests that (1) verified Twitter accounts, including those of organisations and celebrities, contributed to generating or propagating misinformation; (2) tweets with false information often tend to discredit legitimate information on social media; and (3) authors of false information use less cautious language, seeking to harm others. In [10], the authors develop a Web application, called CoVerifi, to assess the credibility of COVID-19-related news. CoVerifi combines machine learning [17] and human feedback to evaluate the credibility of the news. It first enables users to vote on the content of the news, resulting in a labelled dataset. A Bidirectional Long Short-Term Memory (LSTM) model is then trained on this dataset to predict future false information. In [1], the authors propose a Markov-inspired computational method to characterize topics in tweets within a specific period in Brazil. The proposed solution seeks to address the abuse of social media from three perspectives: (1) providing a better understanding of fact-checking actions during the pandemic; (2) studying the contradictions between the pandemic and political agendas; and (3) detecting false information. In [12], the authors conduct a large-scale study on Twitter-generated data spanning a period of two months.
The study concludes that social media have been used to delude users, divert them toward extraneous topics, and promote incorrect medical measures and information. On the bright side, the authors note the importance of credible social media users in different roles (e.g., influencers, content developers) in the battle against the COVID-19 pandemic. Compared to these studies, our work makes two unique contributions: (1) while most studies base their analysis on data collected from Twitter, we rely on data collected from various social media platforms, i.e., Facebook, LinkedIn, Pinterest, StumbleUpon and VK, some of which were not considered in previous studies; (2) our study covers the first four consecutive months of the pandemic, with the goal of understanding how people’s reactions and opinions on matters related to the pandemic evolved over time. Thus, the insights extracted from our study are original, given the social media platforms and time interval we consider.


3 Reactions to COVID-19 During its Early Stages: Social Media Analytics
3.1 Dataset and Implementation Environment
Our analysis is performed on a JSON-based dataset [5] collected from news/message boards/blogs about COVID-19 over a period of 4 months, for a total of 5.2M posts. The data span December 2019 to March 2020. The posts are in English and mention at least one of the following: “Covid”, “CoronaVirus” or “Corona Virus”. To analyze the dataset, we employ MongoDB,² a document-oriented, distributed, JSON-based database platform. More specifically, we write the code as MapReduce queries in MongoDB, which lets us analyze large volumes of data in a distributed way and generate useful aggregated results.
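The counting described above follows the map, shuffle and reduce pattern. As a minimal sketch of that pattern, assuming each post is a JSON object with a hypothetical ISO-8601 `published` field (the dataset’s actual schema is not described here), the Python code below emulates the three phases in memory to count posts per month. It is not the authors’ MongoDB code; note also that MongoDB deprecated its `mapReduce` command in version 5.0 in favor of the aggregation pipeline.

```python
import json
from collections import defaultdict


def map_post(post):
    """Map phase: emit a (month, 1) pair for each post.

    Assumes a hypothetical ISO-8601 'published' field such as '2020-03-14T09:30:00Z'.
    """
    published = post.get("published")
    if published:
        yield published[:7], 1  # key is 'YYYY-MM'


def reduce_counts(key, values):
    """Reduce phase: combine all values emitted for one key (key kept for symmetry)."""
    return sum(values)


def map_reduce(posts):
    # Shuffle phase: group emitted values by key, as a MapReduce engine would.
    grouped = defaultdict(list)
    for post in posts:
        for key, value in map_post(post):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in sorted(grouped.items())}


# Toy JSON Lines input standing in for the dataset's documents.
raw = """{"published": "2019-12-31T10:00:00Z", "title": "a"}
{"published": "2020-01-05T08:00:00Z", "title": "b"}
{"published": "2020-01-20T12:00:00Z", "title": "c"}"""
posts = [json.loads(line) for line in raw.splitlines()]
monthly = map_reduce(posts)
print(monthly)  # {'2019-12': 1, '2020-01': 2}
```

In a distributed engine, the shuffle phase is where emitted pairs move between nodes; here it is simply a dictionary of lists.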

3.2 Analysis Results
We present below the results of our analysis in terms of the number of posts related to COVID-19 over time; the number of published news items per Web site, per month; the geographic distribution of shared news; geographic and temporal trends in fake news; and opinions about public figures.

3.2.1 Number of Posts Related to COVID-19 Over Time
Fig. 1 shows the evolution of the number of COVID-19-related posts on the studied social media platforms during the first four months of the pandemic. The number of posts grew exponentially from December 2019 to March 2020. This may be explained by the fact that in December 2019, the virus was still in its infancy and largely confined to China. Starting in January 2020, when the World Health Organization (WHO) published its first Disease Outbreak News on COVID-19 (January 5, 2020) and the first case of COVID-19 was recorded outside of China,³ the number of COVID-19-related posts started to increase exponentially. The sharpest increase, however, occurred in March 2020. This can be attributed to the fact that during this month, the virus spread across the globe and many countries began applying restrictions such as lockdowns and social distancing to contain it.
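Whether monthly counts grow exponentially can be checked by looking at the month-over-month ratios, or equivalently by fitting a straight line to the logarithm of the counts: a roughly constant ratio means the log-counts rise linearly. The sketch below does both on made-up counts; it illustrates the check under that assumption and is not a reanalysis of Fig. 1.

```python
import math


def growth_ratios(counts):
    """Month-over-month ratios; roughly constant ratios suggest exponential growth."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


def fitted_monthly_factor(counts):
    """Fit ln(count) = a + b * month by least squares and return exp(b).

    exp(b) is the average multiplicative growth per month implied by the fit.
    Requires at least two positive counts.
    """
    n = len(counts)
    xs = list(range(n))
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return math.exp(slope)


counts = [100, 300, 900, 2700]  # made-up monthly counts, December to March
print(growth_ratios(counts))  # [3.0, 3.0, 3.0]
print(round(fitted_monthly_factor(counts), 6))  # 3.0
```

Ratios that increase from month to month, rather than staying flat, would indicate growth even faster than a single exponential.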

² https://www.mongodb.com/
³ https://www.who.int/news/item/27-04-2020-who-timeline---covid-19


[Figure: Number of Posts Related to COVID-19 (x-axis: Month; y-axis: Number of Posts)]
Fig. 1 The number of posts increased exponentially from December 2019 to March 2020 with the sharpest increase being in March 2020

3.2.2 Number of Published News Per Web Site, Per Month
In Fig. 2, we give a breakdown of the Web sites cited on the social media platforms as sources of information for December 2019 (Fig. 2a), January 2020 (Fig. 2b), February 2020 (Fig. 2c) and March 2020 (Fig. 2d). Fig. 2a shows that a medical Really Simple Syndication (RSS) feed provider (medworm.com) was the most cited Web site in December 2019, by a wide margin. This suggests that at that stage of the pandemic, people were mostly interested in learning about the new virus from a medical perspective. As for January 2020, Fig. 2b shows that the most cited Web site was MarketScreener, followed by BNN Bloomberg. Since MarketScreener is an international stock market and financial news Web site and BNN Bloomberg is Canada’s Business News Network reporting on finance and markets, we conclude that in the second month of the pandemic, people were more interested in the impacts of the pandemic on local and global financial markets. In contrast, Fig. 2c and d show that the trend changed in February and March 2020, when the most cited Web sites became news-oriented ones such as Fox News, Yahoo News and The Guardian. This indicates that during this period, people increasingly consulted news sites to follow the emergency measures adopted by governments and the impacts of the pandemic on the political situation, such as the 2020 United States presidential election [2, 8].
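A per-month ranking of cited Web sites, like the one in Fig. 2, can be produced by extracting the domain of each cited URL and counting occurrences month by month. The sketch below assumes hypothetical `published` and `url` fields and uses toy records; stripping a leading `www.` is our own normalization choice, not something the text specifies.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def top_sites_per_month(posts, k=10):
    """Return the k most frequently cited domains for each 'YYYY-MM' month.

    Assumes each post has hypothetical 'published' (ISO-8601) and 'url' fields.
    """
    per_month = defaultdict(Counter)
    for post in posts:
        month = post["published"][:7]
        domain = urlparse(post["url"]).netloc.lower()
        if domain.startswith("www."):
            domain = domain[len("www."):]  # treat www.x and x as the same site
        per_month[month][domain] += 1
    # most_common() orders by count; ties keep first-seen order.
    return {month: c.most_common(k) for month, c in sorted(per_month.items())}


posts = [  # toy records, not taken from the dataset
    {"published": "2019-12-10", "url": "https://www.site-a.example/x"},
    {"published": "2019-12-11", "url": "https://site-a.example/y"},
    {"published": "2019-12-12", "url": "https://site-b.example/z"},
    {"published": "2020-01-02", "url": "https://site-c.example/w"},
]
print(top_sites_per_month(posts, k=2))
# {'2019-12': [('site-a.example', 2), ('site-b.example', 1)], '2020-01': [('site-c.example', 1)]}
```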

[Figure: Number of Published Posts Per Site, one panel per month from December 2019 to March 2020 (x-axis: Number of Posts; y-axis: Social Media Site)]
Fig. 2 The interests of social media users varied significantly across the first four months of the pandemic from being medical-oriented to being market-oriented and then news-oriented. (a) December 2019. (b) January 2020. (c) February 2020. (d) March 2020

3.2.3 Geographic Distribution of Shared News
Fig. 3 shows the geographic distribution of the news shared on social media in each of the four studied months. In December 2019 (Fig. 3a), Germany accounted for 38% of the news, followed by the United States (29%), Hong Kong (9%), Canada (8%), the United Kingdom and Ireland (4% each), Switzerland (3.5%), South Korea (2.5%), and Luxembourg and France (1% each). In January 2020 (Fig. 3b), the US accounted for more than half of the news (53%), followed by the United Kingdom (9%), with a wide gap between the two countries. Some new countries also started to appear in the shared news, such as India, Australia, Singapore, the Philippines and South Africa. In February 2020 (Fig. 3c) and March 2020 (Fig. 3d), the distribution remained largely unchanged, with the US in the lead at 63% in February 2020 and 62% in March 2020.
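Monthly country shares such as those in Fig. 3 amount to a normalized frequency count. The minimal sketch below assumes each post has already been attributed to a single country (how the dataset does this is not described here). It reports every share on one 0 to 100 scale and checks that the shares add up to roughly 100%, a cheap guard against mixing fractions (0.025) with percentages (2.5%).

```python
from collections import Counter


def country_shares(countries, digits=1):
    """Percentage (0-100) of posts per country, largest share first."""
    counts = Counter(countries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {c: round(100 * n / total, digits) for c, n in counts.most_common()}


# Toy attribution: one country label per post (hypothetical data).
shares = country_shares(["US", "US", "UK", "DE"])
# Sanity check: rounded shares should add up to about 100%.
assert abs(sum(shares.values()) - 100) < 0.5
print(shares)  # {'US': 50.0, 'UK': 25.0, 'DE': 25.0}
```

With many countries, rounding each share can push the total slightly away from 100, so the tolerance should grow with the number of categories.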

[Figure: Geographic Distribution of Published Posts, one pie chart per month from December 2019 to March 2020]
Fig. 3 Starting from January 2020, the US accounts for more than half of the news shared on social media. (a) December 2019. (b) January 2020. (c) February 2020. (d) March 2020

[Figure: Evolution of Fake News over Time (x-axis: Date, December 2019 to March 2020; y-axis: News Legitimacy (%); series: Legitimate, Probably Legitimate, Fake)]
Fig. 4 The share of fake news increased roughly sixfold from December 2019 to March 2020

3.2.4 Geographic and Temporal Trends in Fake News
Fig. 4 shows the evolution of fake news across the first four months of the pandemic. News items are classified into three categories: legitimate, probably legitimate and fake. In the dataset, each shared news item is associated with a spam score in the interval [0, 1] (expressed below as a percentage). A spam score quantifies the percentage of news items with features similar to those of news items already classified as illegitimate. To classify the news, we adopt the method proposed by Link Explorer,⁴ which is based on the following criteria:
– News with a spam score between 1% and 30% is considered legitimate.
– News with a spam score between 31% and 60% is considered probably legitimate.
– News with a spam score between 61% and 100% is considered illegitimate.
As Fig. 4 shows, the spread of fake news increased considerably over time, from 2% of the news in December 2019 to 13% in March 2020. Thus, the share of fake news increased roughly sixfold in four months. Fig. 5 shows the geographic distribution of fake news: 70% of the fake news came from the United States, followed by India (9%), the United Kingdom (5%), Australia (5%), the Philippines (4%), Canada (4%) and China (3%). Note that because the collected news is restricted to English, the geographic distribution of the news in general, including the illegitimate news, may have been influenced by this restriction.

Fig. 5 The United States accounted for 70% of the (English-based) illegitimate news on the studied social media platforms between December 2019 and March 2020
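The three-band rule above can be sketched as a small classifier. This sketch keeps the dataset’s [0, 1] scale and compares against 0.30 and 0.60 directly, which avoids floating-point surprises from multiplying by 100. Because the listed bands leave gaps between whole percentages (for example, between 30% and 31%), it treats any score up to 0.30 as legitimate and any score up to 0.60 as probably legitimate; both choices are ours. The scores in the usage lines are hypothetical.

```python
def classify_spam_score(score: float) -> str:
    """Classify a spam score in [0, 1] into the three bands described above.

    Assumption: gaps between the listed percentage bands are closed by
    treating <= 0.30 as legitimate and <= 0.60 as probably legitimate.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("spam score must lie in [0, 1]")
    if score <= 0.30:
        return "legitimate"
    if score <= 0.60:
        return "probably legitimate"
    return "illegitimate"


def fake_share(scores) -> float:
    """Percentage of posts whose spam score falls in the 'illegitimate' band."""
    if not scores:
        return 0.0
    labels = [classify_spam_score(s) for s in scores]
    return 100 * labels.count("illegitimate") / len(labels)


december = [0.05, 0.20, 0.45, 0.70]  # hypothetical scores for four posts
print([classify_spam_score(s) for s in december])
# ['legitimate', 'legitimate', 'probably legitimate', 'illegitimate']
print(fake_share(december))  # 25.0
```

Computing `fake_share` separately for each month gives the kind of trend plotted in Fig. 4.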

⁴ https://moz.com/help/link-explorer/link-building/spam-score


[Figure: Opinions about Public Figures During the First Months of the Pandemic (x-axis: Public Figure Name; y-axis: Opinions Distribution (%); series: Positive Feeling, Negative Feeling, Neutral Feeling)]
Fig. 6 The most controversial public figures in the period between December 2019 and March 2020 were Joe Biden and Donald Trump

3.2.5 Opinions About Public Figures
Finally, Fig. 6 identifies the public figures who were most mentioned on social media during the first four months of the pandemic and provides a detailed breakdown of the overall public sentiment toward them. Specifically, the eight most mentioned public figures on the considered social media platforms in that period were: Justin Trudeau (Prime Minister of Canada), Narendra Modi (Prime Minister of India), Joe Biden (presidential candidate in the United States election during the analyzed period), Andrew Cuomo (Governor of New York), Bernie Sanders (United States Senator), Mike Pence (Vice President of the United States during the analyzed period), Boris Johnson (Prime Minister of the United Kingdom during the analyzed period) and Donald Trump (President of the United States during the analyzed period). To perform the sentiment analysis, we use the AFINN lexicon [13], which contains over 3300 words, each associated with an integer polarity score between -5 (negative) and +5 (positive). For Justin Trudeau, 89% of the authors were neutral, 9% had a negative feeling and 2% had a positive feeling. For Narendra Modi, 86% were neutral, 12% negative and 2% positive. For Joe Biden, 71% were neutral, 20% negative and 9% positive. For Andrew Cuomo, 86% were neutral, 12% negative and 2% positive. For Bernie Sanders, 83% were neutral, 15% negative and 2% positive. For Mike Pence, 85% were neutral, 13% negative and 2% positive. For Boris Johnson, 89% were neutral, 9% negative and 2% positive. For Donald Trump, 77% were neutral, 18% negative and 5% positive. Overall, we conclude from this figure that the most controversial figures (those attracting higher positive and negative sentiment) were Joe Biden and Donald Trump, who were in a fierce competition in the 2020 United States presidential election. This also hints that the COVID-19 pandemic affected people’s general opinion of the candidates in the 2020 United States presidential election.
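Lexicon-based scoring of the kind AFINN supports can be sketched as summing per-word scores and labeling a post by the sign of the total. The sign rule and the tiny lexicon below are illustrative assumptions (the values are made up, not AFINN’s entries), and the text does not state how post-level labels were derived. The second function ranks public figures by their combined positive and negative share, one possible way to operationalize “controversial”, applied to the percentages reported above.

```python
import re

# Made-up scores in AFINN's -5..+5 range; NOT the real AFINN entries.
TOY_LEXICON = {"great": 3, "hope": 2, "bad": -3, "terrible": -3, "failure": -2}


def post_sentiment(text, lexicon=TOY_LEXICON):
    """Label a post by the sign of its summed word scores (assumed rule)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def rank_by_non_neutral(dist):
    """Rank figures by positive + negative share, largest first."""
    ranked = [(name, pos + neg) for name, (pos, neg, _neutral) in dist.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# (positive %, negative %, neutral %) per figure, as reported in the text above.
reported = {
    "Justin Trudeau": (2, 9, 89),
    "Narendra Modi": (2, 12, 86),
    "Joe Biden": (9, 20, 71),
    "Andrew Cuomo": (2, 12, 86),
    "Bernie Sanders": (2, 15, 83),
    "Mike Pence": (2, 13, 85),
    "Boris Johnson": (2, 9, 89),
    "Donald Trump": (5, 18, 77),
}

print(post_sentiment("A terrible failure of leadership"))  # negative
print(post_sentiment("Press briefing at noon"))  # neutral
print(rank_by_non_neutral(reported)[:2])  # [('Joe Biden', 29), ('Donald Trump', 23)]
```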

4 Conclusion
In this work, we analyze a dataset of English-language posts about COVID-19 from news/message boards/blogs for the period December 2019 to March 2020, collected from several social media platforms such as Facebook, LinkedIn, Pinterest, StumbleUpon and VK. Our results suggest that (1) the number of posts related to COVID-19 increased exponentially from December 2019 to March 2020; (2) the interests of social media users shifted from being health-oriented in December 2019 to economics-oriented in January 2020 and news-oriented in February and March 2020; (3) the share of fake news increased roughly sixfold from December 2019 to March 2020; (4) most of the news, including the illegitimate news, originated from the United States; (5) people mostly had a neutral sentiment toward public figures, with negative sentiments outweighing positive ones; and (6) the most controversial public figures in the studied period, with more positive and negative sentiment, were Joe Biden and Donald Trump.
Acknowledgments
This work is partially funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant number RGPIN-2020-04707 and by the Université du Québec en Outaouais (UQO).

References
1. Ceron W, de Lima-Santos MF, Quiles MG (2021) Fake news agenda in the era of covid-19: identifying trends through fact-checking content. Online Soc Netw Media 21:100116
2. Chen E, Chang H, Rao A, Lerman K, Cowan G, Ferrara E (2021) Covid-19 misinformation and the 2020 us presidential election. The Harvard Kennedy School Misinformation Review
3. Cinelli M, Quattrociocchi W, Galeazzi A, Valensise CM, Brugnoli E, Schmidt AL, Zola P, Zollo F, Scala A (2020) The covid-19 social media infodemic. Sci Rep 10(1):1–10
4. Featherstone RM, Boldt RG, Torabi N, Konrad SL (2012) Provision of pandemic disease information by health sciences librarians: a multisite comparative case series. J Med Libr Assoc 100(2):104
5. Geva R (2020) Free dataset from news/message boards/blogs about coronavirus (4 month of data - 5.2m posts). https://doi.org/10.21227/kc4v-q323
6. González-Padilla DA, Tortolero-Blanco L (2020) Social media influence in the covid-19 pandemic. Intl Braz J Urol 46:120–124
7. Hussain W (2020) Role of social media in covid-19 pandemic. Intl J Front Sci 4(2):59–60

O. A. Wahab et al.

8. James TS, Alihodzic S (2020) When is it democratic to postpone an election? Elections during natural disasters, covid-19, and emergency situations. Elect Law J Rules Polit Pol 19(3):344–362
9. Khanna RC, Cicinelli MV, Gilbert SS, Honavar SG, Murthy GV (2020) Covid-19 pandemic: lessons learned and future directions. Indian J Ophthalmol 68(5):703
10. Kolluri NL, Murthy D (2021) Coverifi: a covid-19 news verification system. Online Soc Netw Media 22:100123
11. Massaad E, Cherfan P (2020) Social media data analytics on telehealth during the covid-19 pandemic. Cureus 12(4)
12. Mourad A, Srour A, Harmanai H, Jenainati C, Arafeh M (2020) Critical impact of social networks infodemic on defeating coronavirus covid-19 pandemic: Twitter-based study and research directions. IEEE Trans Netw Serv Manage 17(4):2145–2155
13. Nielsen FÅ (2011) AFINN. March 2011. http://www2.compute.dtu.dk/pubdb/pubs/6010-full.html
14. World Health Organization (2020) Mental health and psychosocial considerations during the covid-19 outbreak, 18 March 2020. Tech. Rep., World Health Organization
15. Shahi GK, Dirkson A, Majchrzak TA (2021) An exploratory study of covid-19 misinformation on twitter. Online Soc Netw Media 22:100104
16. Valdez D, Ten Thij M, Bathina K, Rutter LA, Bollen J (2020) Social media insights into us mental health during the covid-19 pandemic: longitudinal analysis of twitter data. J Med Internet Res 22(12):e21418
17. Wahab OA, Mourad A, Otrok H, Taleb T (2021) Federated machine learning: survey, multilevel classification, desirable criteria and future directions in communication and networking systems. IEEE Commun Surv Tutor 23(2):1342–1397

Citation Graph Analysis and Alignment Between Citation Adjacency and Themes or Topics of Publications in the Area of Disease Control Through Social Network Surveillance

Moses Boudourides, Andrew Stevens, Giannis Tsakonas, and Sergios Lenis

Abstract This paper presents a Data-Network Science study of a dataset of publications archived in The Semantic Scholar Open Research Corpus (S2ORC) database and categorized under the area of “Disease Control through Social Network Surveillance,” abbreviated from now on as “DCSNS.” In particular, our dataset consists of 10,866 documents (articles and reviews), retrieved through a Boolean search and published between 1983, the first year in which S2ORC catalogues such publications, and 2020. We also retrieved the corpus of abstracts of these documents and applied the standard LDA Topic Modeling technique; among topic models with varying numbers of topics, six topics produced the maximum topic coherence score. The network of our study is thus a directed citation graph of publications in the area of DCSNS, with nodes/publications labeled by the topics into which Topic Modeling categorizes the words of their abstracts. Our aim is to study global and local network properties with regard to clustering under triadic relationships among connected nodes/publications, and with regard to the assortativity of attributes related to the content of publications (such as types of publications and themes in the employed keyword searches). In this way, we analyze the interplay between semantics and structure in the area of publications on DCSNS by identifying important attributes of publications whose aggregation associates attribute affiliations with certain structural patterns of clustering exhibited by the bibliographic citation network of the collected publications.

Keywords Scientometric networks · Citation graphs · DAG · Nodal degrees · Triadic closure and openness · Assortativity · Topic modeling

M. Boudourides
Department of Computer Science, Haverford College, Haverford, PA, USA
School of Public Affairs, Arizona State University, Phoenix, AZ, USA

A. Stevens
SPS, Northwestern University, Evanston, IL, USA

G. Tsakonas
Library & Information Center, University of Patras, Patras, Greece

S. Lenis
Citrix, Patras, Greece

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_5

1 Introduction

A graph or network is a pattern of pairwise interactions between nodes or vertices, where such interactions are formally represented by links or edges, which, in general (among other typologies), might be directed or undirected, simple (binary) or multiple (weighted), while both nodes and links can be labeled by various attributes that characterize them. Graph Theory, a relatively old mathematical field, is the discipline that studies graphs, while the relatively new interdisciplinary fields of Social Network Analysis and Network Science are the typical domains for the study of networks. According to [1], the field of bibliographic network analysis “contrasts, compares, and integrates techniques and algorithms developed in disciplines as diverse as mathematics, statistics, physics, social network analysis, information science, and computer science.” Thus, the study of networks in science communication has become a key interest of the field of Scientometrics. In this study, we explore publications in the field of “Disease Control through Social Network Surveillance” (DCSNS). Our dataset consists of 10,866 documents (articles and reviews) published from 1983 to 2020. Basing our analysis on topics extracted through Topic Modeling, we study global and local network properties with regard to clustering, triadic relationships among connected nodes/publications, and the assortativity of attributes related to the content of publications (in particular, the types of publications and the themes of the employed keyword searches).

2 Literature Review

According to the Merriam-Webster Dictionary, Scientometrics is “the application of statistical methods to the study of bibliographic data” [2], and it can be regarded as a sub-field of Bibliometrics that measures and analyzes scientific literature [3]. Scientometrics emphasizes the current digital technological framework, in which the formats, actors and venues of scientific publishing, indexing, analysis and assessment extend beyond printed media. An example of a scientometric study is the analysis of the networks deployed in Wikipedia and of the citation (referencing) behaviors expressed therein [4].

In general, scientometric networks address the qualities of two main concepts, works and authors, and are used to analyze and/or model the cohesion of fields, the structure of institutions or disciplines, or the changes observed in certain periods of activity. By works, we mainly mean publications in scientific journals. The current schemes of indexing the journal literature through well-structured bibliographic databases enable researchers to identify the associations (links) among works, and one of the aims of scientometrics is to study the distribution of these associations in and among documents or publications. A citation is a reference, embedded in the body of a document (and usually summarized in its bibliography or list of references), that cites (or attributes to) another document whose relevance is acknowledged and discussed in the former. Citation analysis is used to examine several issues, such as the connectedness of the research of highly cited authors and the distribution of the topics they study [5], or to create charts (or diagrams) of productivity in the main subject categories of the scientific world [6]. Author-level networks may analyze (again, based on citations) relations between individual authors or groups of authors (co-authors). However, as the dialogue to which authors contribute through their work is continuous, the interplay among them can be reciprocal and weighted, depending on the intensity of citations [7]. For example, in [8], bibliometric networks are examined with respect to properties of authors, such as gender and seniority, to show differences in the structure of co-authorship networks.
The common ground of citations has been used to analyze both concepts, as in [9], which explores the interdisciplinarity of publications in Information Science through direct citations and co-authorships, or in [10], which uses multi-layer networks to identify author-based social drivers of citation behavior. The increased interest in scientometrics, together with parallel developments in scholarly communication, has led to the exploration of other qualities as well, for instance those resulting from aggregations of publications. Properties of journals, such as their access status, have been studied to unearth the effects that publication venues have on the exposure and impact of scientific research [11]. Disease control through social media refers to a concept and a research methodology that use community applications to record, track and analyze urgent matters of public health and to take informed decisions about them. The term is not new, but its use has spread extensively in the context of the COVID-19 pandemic. In this regard, the role of social media is quite important, despite various concerns about the ethics and validity of the process [12]. Nonetheless, in [13], digital surveillance systems were acknowledged as important instruments for early warning about emerging or already known diseases. As disease outbreaks spread over networks, network analysis (and network science) contributes to the tracing of cases and to informed decision making on specific socially sensitive issues. According to [13], “[n]etwork analysis, in general, allows us to anticipate individual-level epidemiological risk and can thereby help us improve and strategically extend surveillance systems to enhance the early and reliable identification of outbreaks.” Therefore, given the methodological salience of network analysis for disease control, surveillance and the safeguarding of public health, we have opted to use the same methodology to survey the scientific literature that gathers the accumulated output and published efforts of the scientific community, for the general good, to come to grips with the social challenges of such extraordinary emergencies.

3 The Citation Graph Methodology

Stated formally, a citation graph extracted from a bibliographic dataset (i.e., a dataset of publications together with their reference lists) is composed of nodes, which are the dataset publications, and links among nodes, i.e., attributions of citations among publications. The resulting citation graph is what in Graph Theory [14, 15] is called a directed acyclic graph (DAG), since a publication can only cite chronologically earlier publications. Notice that every node/publication in a citation graph can be citing other publications, cited by other publications, or both, according to the values of its in-degree and out-degree (whose sum is its total degree). Often in Bibliometrics, as in Library and Information Science, documents in a bibliographic dataset are generically referred to as ‘sources’; although the term source is commonly used for citations (as references to sources), here we use the term source to mean (only) “citing document,” reserving the term citation for “cited document.” In this terminology, and after removing isolated nodes/publications (with zero total degree) from a citation graph, every node/publication in the graph is of one of the following three types (adopting the convention that in a citation graph the direction of edges/links goes from citations to sources): (A) source, but not citation (zero out-degree, positive in-degree); (B) citation, but not source (zero in-degree, positive out-degree); (C) source and citation (positive in- and out-degree). Although this classification (according to the distribution of in- and out-degrees) might appear to be complete, there are certain circumstances that elude its purview.
For example, assuming that most publications in a bibliographic dataset have reference lists of roughly the same length, the resulting citation graph may contain relatively more nodes/publications of type (B) than of types (A) and (C). In such cases, either the citation graph is highly disconnected, when each citation is, on average, cited by sufficiently few sources, or, otherwise, the induced co-citation graph is highly disconnected (although, in the latter case, the corresponding graph of bibliographic coupling might happen to be well connected). In the extreme case in which there are no nodes of type (C), the citation graph becomes a bipartite graph with the bipartition of types (A) and (B). Thus, the problem in these circumstances comes from the fact that the mixing in the citation graph among nodes of type (A) and among nodes of type (B) is very low, in such a way that the citation graph becomes highly assortative (or “homophilic”) with regard to the nodal attribute of type.

What could cause such high type-assortativity (or low type-mixing)? To understand this, one might argue both in terms of the inherent endogenous structural patterns of the citation graph and in terms of the observed exogenous attributes with which its nodes/publications might be labeled.

First, let us consider the case of non-structural nodal labels. These attributions may originate either from the data collection protocols or from the content of documents in the collected corpora. In the former case, a bibliographic dataset is typically harvested from a large bibliographic database by querying the occurrence of certain search keywords. The search query can then be decomposed into a number of elementary themes that any node/publication of the resulting citation graph might display. For example, if the search query is a composite statement in which certain basic terms (variables) are assembled with Boolean connectives, then each of these basic search terms might be a theme-attribute of nodes/publications of the citation graph. Of course, this presupposes that the database from which the bibliographic dataset is extracted by keyword querying is already categorized in certain fields to which every publication in the database might be attributed. Normally, such fields are derived from taxonomies already embedded in the available information about publications (for instance, title or abstract words, keywords given by authors, or any other classification scheme used in the database). Hence, in some way, nodal theme-attributes are always related to the content, i.e., to the semantics, of the nodes/publications of a citation graph extracted from a bibliographic database.
However, there exist other nodal attributes that hinge directly on the contents of such publications or, partially, on the text of their abstracts. For example, Topic Modeling [16, 17] is a popular unsupervised machine-learning classification technique that categorizes a corpus of documents (here, the content of all the publications or the content of their abstracts) into topics, in such a way that each document/publication is associated with a dominant topic. In short, theme-labels and topic-labels are two attributions of non-structural (exogenous) characteristics to the nodes of a citation graph.

Next, let us examine possible structural reasons for the occurrence of high or low type-assortativity of citation graph nodes/publications. A first reason might be sought in the possible clustering of these nodes. Since the seminal work of Ronald Burt on structural holes [18], it has been known that the mechanism of triadic closure (or closure-producing transitivity) might increase the clustering patterns in a graph. However, in the context of citation graphs having the three types of nodes defined above, the only possible transitivity completion that can be attained is through the brokerage of nodes of type (C) bridging linkages among nodes of types (A) and (B). As we have already seen, the complete absence of nodes of type (C) creates a bipartition between nodes of type (A) and type (B), implying zero clustering inside each of these partitions. Nevertheless, when nodes of type (C) come into play by being attached to nodes of either type (A) or type (B), the attained mixing is not always the desired one. There are two extreme ends in the way that nodes of type (C) articulate linkages with nodes of type (A) or (B). These ends operate through the following two mechanisms of type assembling:
• (I) At one end, all nodes of type (C) might be placed exclusively as adjacent nodes to nodes of type (A) or (B), in such a way that they are completely subordinated to the latter, creating a configuration of segmented ego-nets (with egos being nodes of type (A) or (B)) that are not linked to each other. In this case, what increases is the mixing of nodes of type (A) with nodes of type (C), or of nodes of type (B) with nodes of type (C), but not the mixing between nodes of types (A) and (B).
• (II) At the opposite end, each node of type (C) might be bridging a node of type (A) with a node of type (B). The more often this occurs, the higher the indirect (as mediated by nodes of type (C)) clustering among nodes of types (A) and (B) that can be attained and, at the same time, the higher the overall mixing (or disassortativity) of nodal types that might be achieved.

In mechanism (I), triadic closure occurs exclusively around nodes of either type (A) or type (B), while in mechanism (II) nodes of type (C) might create structural holes (triadic incompletion) among nodes of types (A) and (B). Motivated by these two extreme mechanisms, we introduce the following four new nodal degrees. For this purpose, let us consider a node u of the citation graph (DAG) G = (V, E). By convention, an edge (u, v) ∈ E of a citation graph G is interpreted as node/publication v citing node/publication u (i.e., the direction of links in G goes from citation to source). Moreover, the in/out-neighbors of u are denoted as follows:

$$N_{\mathrm{in}}(u) = \{v \in V : (v, u) \in E\}, \qquad N_{\mathrm{out}}(u) = \{v \in V : (u, v) \in E\},$$

and, thus, the in/out-degrees of u are

$$\text{in-degree}(u) = |N_{\mathrm{in}}(u)|, \qquad \text{out-degree}(u) = |N_{\mathrm{out}}(u)|,$$

where, for a set X, |X| denotes the cardinality of X, i.e., the number of elements of X.
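These definitions, together with the three node types (A), (B) and (C), can be sketched in Python. This is a minimal sketch, assuming the citation graph is given as a list of (cited, citing) pairs that follows the edge convention above; the helper names `neighbor_sets` and `node_type` are hypothetical, not part of the authors' code, and the example nodes are toy data.

```python
from __future__ import annotations

# Convention from the text: an edge (u, v) means that publication v cites publication u,
# i.e. links point from the citation (cited paper) to the source (citing paper).


def neighbor_sets(edges: list[tuple[str, str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return N_in and N_out for every node that appears in an edge."""
    nodes = {n for edge in edges for n in edge}
    n_in: dict[str, set[str]] = {n: set() for n in nodes}
    n_out: dict[str, set[str]] = {n: set() for n in nodes}
    for u, v in edges:
        n_out[u].add(v)  # v cites u, so v is an out-neighbor of u
        n_in[v].add(u)   # ... and u is an in-neighbor of v
    return n_in, n_out


def node_type(node: str, n_in: dict[str, set[str]], n_out: dict[str, set[str]]) -> str:
    """Classify a non-isolated node as type A, B or C from its in/out-degrees."""
    in_degree, out_degree = len(n_in[node]), len(n_out[node])
    if in_degree > 0 and out_degree == 0:
        return "A"  # source, not citation
    if in_degree == 0 and out_degree > 0:
        return "B"  # citation, not source
    return "C"      # source and citation (nodes taken from edges are never isolated)


if __name__ == "__main__":
    # Hypothetical toy data: p1 cites c1 and c2; p2 cites c1 and p1.
    toy_edges = [("c1", "p1"), ("c2", "p1"), ("c1", "p2"), ("p1", "p2")]
    n_in, n_out = neighbor_sets(toy_edges)
    print({n: node_type(n, n_in, n_out) for n in sorted(n_in)})
    # {'c1': 'B', 'c2': 'B', 'p1': 'C', 'p2': 'A'}
```

Storing neighbor sets rather than plain counts keeps the degrees available as `len(...)` while also supporting the subset and difference tests needed for the triadic degrees that follow.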
Furthermore, the symbol TC stands for ‘triadically closed’ and TO for ‘triadically open’.

The TC-in-adjacency set of u, denoted as $TC_{\mathrm{in}}(u)$, consists of all the in-neighbors v of u whose own in-neighbors are all also in-neighbors of u (a case of triadic in-adjacency completion). Symbolically,

$$TC_{\mathrm{in}}(u) = \{v \in N_{\mathrm{in}}(u) : N_{\mathrm{in}}(v) \subseteq N_{\mathrm{in}}(u)\}.$$

Moreover, the TC-in-degree of u is defined as

$$\text{TC-in-degree}(u) = |TC_{\mathrm{in}}(u)|,$$

i.e., TC-in-degree(u) is equal to the number of in-neighbors of u that pass the in-adjacency relation on to their own in-neighbors. Thus, $\text{TC-in-degree}(u) \le \text{in-degree}(u)$.

The TC-out-adjacency set of u, denoted as $TC_{\mathrm{out}}(u)$, consists of all the out-neighbors v of u whose own out-neighbors are all also out-neighbors of u (a case of triadic out-adjacency completion). Symbolically,

$$TC_{\mathrm{out}}(u) = \{v \in N_{\mathrm{out}}(u) : N_{\mathrm{out}}(v) \subseteq N_{\mathrm{out}}(u)\}.$$

Moreover, the TC-out-degree of u is defined as

$$\text{TC-out-degree}(u) = |TC_{\mathrm{out}}(u)|,$$

i.e., TC-out-degree(u) is equal to the number of out-neighbors of u that pass the out-adjacency relation on to their own out-neighbors. Thus, $\text{TC-out-degree}(u) \le \text{out-degree}(u)$.

The TO-in-adjacency set of u, denoted as $TO_{\mathrm{in}}(u)$, consists of those in-neighbors v of u that possess at least one in-neighbor which is not an in-neighbor of u (a case of triadic in-adjacency incompletion). Symbolically,

$$TO_{\mathrm{in}}(u) = \{v \in N_{\mathrm{in}}(u) : N_{\mathrm{in}}(v) \setminus N_{\mathrm{in}}(u) \neq \emptyset\}.$$

Moreover, the TO-in-degree of u is defined as

$$\text{TO-in-degree}(u) = |TO_{\mathrm{in}}(u)|,$$

i.e., TO-in-degree(u) is equal to the number of in-neighbors of u that break the transitivity of the in-adjacency relation they hold with u. Thus, $\text{TO-in-degree}(u) \le \text{in-degree}(u)$.

The TO-out-adjacency set of u, denoted as $TO_{\mathrm{out}}(u)$, consists of those out-neighbors v of u that possess at least one out-neighbor which is not an out-neighbor of u (a case of triadic out-adjacency incompletion). Symbolically,

$$TO_{\mathrm{out}}(u) = \{v \in N_{\mathrm{out}}(u) : N_{\mathrm{out}}(v) \setminus N_{\mathrm{out}}(u) \neq \emptyset\}.$$

Moreover, the TO-out-degree of u is defined as

$$\text{TO-out-degree}(u) = |TO_{\mathrm{out}}(u)|,$$

i.e., TO-out-degree(u) is equal to the number of out-neighbors of u that break the transitivity of the out-adjacency relation they hold with u. Thus, $\text{TO-out-degree}(u) \le \text{out-degree}(u)$.
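The four adjacency sets translate directly into Python set operations: `<=` for the subset test of the TC sets and a non-empty difference for the TO sets. This is an illustrative sketch on a hypothetical five-node DAG, not the authors' implementation; the checks in the main block also exercise the Proposition and the Handshaking Lemma (summed over all nodes) stated next.

```python
from __future__ import annotations


def neighbor_sets(edges: list[tuple[str, str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """N_in / N_out per node; an edge (u, v) means v cites u."""
    nodes = {n for edge in edges for n in edge}
    n_in: dict[str, set[str]] = {n: set() for n in nodes}
    n_out: dict[str, set[str]] = {n: set() for n in nodes}
    for u, v in edges:
        n_out[u].add(v)
        n_in[v].add(u)
    return n_in, n_out


def triadic_degrees(u: str, n_in: dict[str, set[str]], n_out: dict[str, set[str]]) -> dict[str, int]:
    """TC/TO in- and out-degrees of node u, following the set definitions above."""
    return {
        # in-neighbors whose own in-neighbors are all in-neighbors of u
        "tc_in": sum(1 for v in n_in[u] if n_in[v] <= n_in[u]),
        # in-neighbors with at least one in-neighbor outside N_in(u)
        "to_in": sum(1 for v in n_in[u] if n_in[v] - n_in[u]),
        # out-neighbors whose own out-neighbors are all out-neighbors of u
        "tc_out": sum(1 for v in n_out[u] if n_out[v] <= n_out[u]),
        # out-neighbors with at least one out-neighbor outside N_out(u)
        "to_out": sum(1 for v in n_out[u] if n_out[v] - n_out[u]),
    }


if __name__ == "__main__":
    # Hypothetical toy DAG as (cited, citing) pairs: c cites a and b; d cites a and c;
    # e cites b and c.
    edges = [("a", "c"), ("b", "c"), ("a", "d"), ("c", "d"), ("b", "e"), ("c", "e")]
    n_in, n_out = neighbor_sets(edges)
    degrees = {u: triadic_degrees(u, n_in, n_out) for u in sorted(n_in)}
    for u, deg in degrees.items():
        # Proposition: TC + TO reproduces the ordinary in- and out-degrees.
        assert deg["tc_in"] + deg["to_in"] == len(n_in[u])
        assert deg["tc_out"] + deg["to_out"] == len(n_out[u])
    # Handshaking Lemma: summed over all nodes, the four degrees give 2|E|.
    assert sum(sum(deg.values()) for deg in degrees.values()) == 2 * len(edges)
    print(degrees["d"], degrees["a"])
```

In this toy graph, d cites a (whose own citations are trivially covered) and c (which cites b, a paper d does not cite), so d has one TC-in-neighbor and one TO-in-neighbor.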

Proposition For any node u ∈ V in a DAG G = (V, E),

$$\text{TC-in-degree}(u) + \text{TO-in-degree}(u) = \text{in-degree}(u), \qquad \text{TC-out-degree}(u) + \text{TO-out-degree}(u) = \text{out-degree}(u).$$

This holds because every in-neighbor v of u satisfies exactly one of $N_{\mathrm{in}}(v) \subseteq N_{\mathrm{in}}(u)$ and $N_{\mathrm{in}}(v) \setminus N_{\mathrm{in}}(u) \neq \emptyset$, so $TC_{\mathrm{in}}(u)$ and $TO_{\mathrm{in}}(u)$ partition $N_{\mathrm{in}}(u)$; the same argument applies to out-neighbors.

The Handshaking Lemma For a DAG G = (V, E),

$$\sum_{u \in V} \big[\text{TC-in-degree}(u) + \text{TC-out-degree}(u) + \text{TO-in-degree}(u) + \text{TO-out-degree}(u)\big] = 2|E|,$$

since, by the Proposition, the bracketed sum equals in-degree(u) + out-degree(u), and every edge contributes exactly one to the in-degree of one node and one to the out-degree of another.

In terms of interpretation, the higher/lower the TC-in/out-degree of a node/publication u, the stronger/weaker the dependence or influence of u on its in/out-adjacency set. Similarly, the higher/lower the TO-in/out-degree of u, the stronger/weaker the mixing or association of u with non-in/out-adjacent nodes in the citation graph, to which u is connected through directed 2-paths. In particular, a zero TC-in/out-degree of a non-isolated node u means that u depends solely on non-in/out-adjacent nodes, which are accessible through directed 2-paths, while a zero TO-in/out-degree means that u is sustained exclusively by its in/out-adjacent nodes, which are again accessible through directed 2-paths. As an example of the degrees introduced above, let us display the radius-2 ego-net citation subgraph of the paper with id 203619203 (Fig. 1). This paper, entitled “The centers for disease control and prevention strive initiative: construction of a national program to reduce health care associated infections at the local level,” was published in the Annals of Internal Medicine in 2019. Paper 203619203 (the ego) has in-degree 3 (it cites the three papers 203619205, 203619201 and 203619216) and out-degree 5 (it is cited by the five papers 1006874, 653882, 23897107, 43880477 and 51718781). Moreover, two predecessors of the ego are TC-in-adjacent to it (papers 203619201 and 203619205, with their citation links colored red) and one predecessor is TO-in-adjacent to the ego (paper 203619216, with a green citation link).
Furthermore, four successors of the ego are TC-out-adjacent to it (papers 23897107, 43880477, 51718781 and 653882, with cyan inverse citation links) and one successor is TO-out-adjacent (paper 1006874, with an orange inverse citation link). Notice that each of the six triadically closed neighbors of the ego either cites only papers that the ego also cites (for predecessors), or is cited only by papers that also cite the ego (for successors), or has no such citations at all. On the other hand, of the two triadically open neighbors, the predecessor cites at least one paper that the ego does not cite, and the successor is cited by at least one paper that does not cite the ego.
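To make the example concrete, the sketch below rebuilds the ego's direct citation links exactly as reported above and adds two clearly hypothetical second-order links (`hypothetical_W`, `hypothetical_Z`), invented only so that the TO counts match the text; the real radius-2 neighborhood drawn in Fig. 1 is not reproduced. Treating the radius-2 ego-net as all nodes within two hops when edge direction is ignored is likewise an assumption about how the figure was built.

```python
from __future__ import annotations

from collections import deque

EGO = "203619203"
# Direct citation links reported in the text, as (cited, citing) pairs.
REPORTED_EDGES = [(p, EGO) for p in ("203619205", "203619201", "203619216")] + [
    (EGO, p) for p in ("1006874", "653882", "23897107", "43880477", "51718781")
]
# HYPOTHETICAL second-order links, invented only to reproduce the reported TO counts:
# 203619216 cites a paper the ego does not cite, and 1006874 is cited by a paper
# that does not cite the ego.
HYPOTHETICAL_EDGES = [("hypothetical_W", "203619216"), ("1006874", "hypothetical_Z")]


def neighbor_sets(edges: list[tuple[str, str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """N_in / N_out per node; an edge (u, v) means v cites u."""
    nodes = {n for edge in edges for n in edge}
    n_in: dict[str, set[str]] = {n: set() for n in nodes}
    n_out: dict[str, set[str]] = {n: set() for n in nodes}
    for u, v in edges:
        n_out[u].add(v)
        n_in[v].add(u)
    return n_in, n_out


def triadic_degrees(u: str, n_in: dict[str, set[str]], n_out: dict[str, set[str]]) -> dict[str, int]:
    """TC/TO in- and out-degrees of u (subset test for TC, non-empty difference for TO)."""
    return {
        "tc_in": sum(1 for v in n_in[u] if n_in[v] <= n_in[u]),
        "to_in": sum(1 for v in n_in[u] if n_in[v] - n_in[u]),
        "tc_out": sum(1 for v in n_out[u] if n_out[v] <= n_out[u]),
        "to_out": sum(1 for v in n_out[u] if n_out[v] - n_out[u]),
    }


def ego_net(ego: str, n_in: dict[str, set[str]], n_out: dict[str, set[str]], radius: int = 2) -> set[str]:
    """Nodes within `radius` hops of `ego`, ignoring edge direction (breadth-first search)."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the requested radius
        for nb in n_in[node] | n_out[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


if __name__ == "__main__":
    n_in, n_out = neighbor_sets(REPORTED_EDGES + HYPOTHETICAL_EDGES)
    print(triadic_degrees(EGO, n_in, n_out))  # {'tc_in': 2, 'to_in': 1, 'tc_out': 4, 'to_out': 1}
    print(len(ego_net(EGO, n_in, n_out)))     # 11: the ego, 8 neighbors, 2 hypothetical papers
```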

Fig. 1 The radius 2 egonet around paper 203619203

4 Data

4.1 Data Collection

The Semantic Scholar Open Research Corpus (S2ORC), which includes 81.1 million publications, served as the source for retrieving a subcorpus related to the area of interest. All processing was performed in Python [19]. The rake_nltk [20] and supplementary yake [21] packages were used to extract keywords as features of graph nodes, and each citation is represented as an edge in a NetworkX graph [22]. Attempts to obtain data for the analysis included Web of Science (WoS), Scopus, Elsevier and its ICSR Lab, Microsoft Academic Graph (MAG), and finally S2ORC. Efforts to use the sources tried before S2ORC were impeded by various degrees of bureaucracy, paywalls, lack of support, or difficulty of implementation. S2ORC was the most substantial source available to us while also being the easiest to obtain and mine. It is distributed as uniformly shuffled shards of compressed JSON Lines (‘.jsonl’) files, which were downloaded onto the Northwestern Quest Analytics computing platform. Of the 25 fields available in the dataset, ‘paper_id’ was used as the unique identifier of a record, and the following fields were associated with it: ‘title’, ‘abstract’, ‘year’ and ‘outbound_citations’. Titles and abstracts of the full S2ORC corpus were searched using the Boolean phrase “(‘network’ or ‘surveillance’) and (‘disease control’ or ‘disease network’)”. The search returned 3566 publications, which we use as entries of a bibliographic dataset in the area of Disease Control through Social Network Surveillance (from now on referred to by the acronym DCSNS). For each of the source nodes, the ‘outbound_citations’ field (a list) was traversed to obtain outbound citations, totaling 67,572. The corresponding citation graph was constructed, with paper IDs as nodes and an edge created for each ‘paper_id’ to ‘outbound_citations’ link. The graph was then pruned of nodes with degree one and then of nodes with degree zero. This reduced the ‘source set’, i.e., the publications matching the search phrase, to 3044 publications. The reference set has 8467 documents, with 645 items existing in both sets (sources referenced by other sources), resulting in a final corpus size of 3044 + 8467 − 645 = 10,866. Publication titles and abstracts from the S2ORC dataset were then combined into documents representing each text. The four component search terms/phrases were split into binary fields named ‘themes’, indicating whether each term is found in each document. The ‘LatentDirichletAllocation’ (LDA) module of scikit-learn [23] generated the topic modeling classification.
Fields recording the dominant topic of each classification added six more fields to each document. These processes result in a single topic and between zero and four themes attributed to each document. All data processing code and resulting data are available in our GitHub repository at https://github.com/zaphodnothingth/dcsns.
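The search and theme-flagging steps can be sketched as follows. Assumptions: matching is case-insensitive substring matching on the concatenated title and abstract (the chapter does not specify its tokenization), and the helper names are hypothetical. Under substring matching, every ‘disease network’ hit also sets the ‘network’ flag, which would be consistent with every combined theme in Sect. 5.3 that contains Disease Network also containing Network. The last function reproduces the inclusion-exclusion arithmetic behind the corpus size.

```python
from __future__ import annotations

# The four component terms of the Boolean search phrase, used as binary 'theme' fields.
THEME_TERMS = ("network", "surveillance", "disease control", "disease network")


def theme_flags(title: str, abstract: str) -> dict[str, bool]:
    """Binary theme fields: does each search term occur in the title + abstract?

    Assumption: case-insensitive substring matching, so 'networks' also sets 'network'.
    """
    text = f"{title} {abstract}".lower()
    return {term: term in text for term in THEME_TERMS}


def matches_query(title: str, abstract: str) -> bool:
    """('network' or 'surveillance') and ('disease control' or 'disease network')."""
    f = theme_flags(title, abstract)
    return (f["network"] or f["surveillance"]) and (f["disease control"] or f["disease network"])


def corpus_size(sources: int, references: int, overlap: int) -> int:
    """Inclusion-exclusion: number of documents in the union of the two sets."""
    return sources + references - overlap


if __name__ == "__main__":
    print(matches_query("Surveillance for disease control", ""))  # True
    print(theme_flags("A disease network model", ""))
    print(corpus_size(3044, 8467, 645))  # 10866
```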

4.2 Derived Networks

The DCSNS citation graph extracted from the S2ORC dataset described above is a directed acyclic graph (DAG) [14, 15] composed of 10,852 nodes/publications and 23,173 edges/citations. As a DAG, it is not strongly connected (it has as many strongly connected components as nodes, i.e., 10,852). Nor is it weakly connected: it has 25 weakly connected components, the largest of which is composed of 10,749 nodes and 23,065 edges. The density of this graph is 0.0002 and its transitivity is 0.009. Since plotting a graph of this size would result in a hardly intelligible visualization, Fig. 2 displays the 8-core [24] of the citation graph, which is the maximal subgraph in which every node has (total) degree 8 or more within the subgraph.
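The reported density and the k-core construction behind Fig. 2 can be illustrated with standard-library Python. This sketch assumes the usual density formula for a simple digraph, |E| / (|V|(|V| − 1)), and a k-core computed on total (in + out) degree by iterative peeling; the chapter relies on NetworkX [22] for such computations, so this version is only illustrative, and the example graph is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def directed_density(n_nodes: int, n_edges: int) -> float:
    """Density of a simple digraph: |E| / (|V| (|V| - 1))."""
    return n_edges / (n_nodes * (n_nodes - 1))


def k_core(edges: list[tuple[str, str]], k: int) -> set[str]:
    """Nodes of the k-core: repeatedly peel nodes whose total degree is below k.

    Edge direction is ignored, so in a DAG (no reciprocal links) the undirected
    degree equals in-degree + out-degree.
    """
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    stack = [n for n, d in degree.items() if d < k]
    removed: set[str] = set()
    while stack:
        node = stack.pop()
        if node in removed:
            continue  # a node may be pushed more than once
        removed.add(node)
        for nb in adj[node]:
            if nb not in removed:
                degree[nb] -= 1  # losing a neighbor may push nb below k
                if degree[nb] < k:
                    stack.append(nb)
    return set(adj) - removed


if __name__ == "__main__":
    print(round(directed_density(10852, 23173), 4))  # 0.0002
    # Hypothetical toy DAG: an acyclically oriented 4-clique plus one pendant node.
    toy = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
    print(sorted(k_core(toy, 3)))  # ['a', 'b', 'c', 'd']
```

For the DCSNS graph, 23,173 / (10,852 × 10,851) ≈ 0.000197, which rounds to the reported density of 0.0002.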

Fig. 2 The 8-core of the citation graph (with nodes colored in 3 Girvan-Newman communities)

5 Discussion of Nodal Attributes of the DCSNS Citation Graph

5.1 Degrees

Figs. 3 and 4 display, respectively, the boxplots of all the degrees of the nodes of the DCSNS citation graph and the correlation matrix among these degrees.
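The chapter does not state which correlation coefficient underlies Fig. 4. Assuming Pearson's coefficient (the default of, e.g., pandas' `DataFrame.corr`), the matrix can be sketched as below, where each column holds one degree measure (in-degree, TC-in-degree, and so on) across all nodes. The column names and values in the example are hypothetical toy data.

```python
from __future__ import annotations

from math import nan, sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; nan if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return nan if sx == 0 or sy == 0 else cov / (sx * sy)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """All pairwise Pearson correlations between named degree columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


if __name__ == "__main__":
    # Hypothetical degree columns for four nodes.
    cols = {"in_degree": [1, 2, 3, 4], "tc_in_degree": [2, 4, 6, 8], "to_in_degree": [4, 3, 2, 1]}
    for pair, r in correlation_matrix(cols).items():
        print(pair, round(r, 3))
```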

Fig. 3 Boxplots of various degrees of nodes of the DCSNS citation graph

Fig. 4 Correlation matrix of various degrees of nodes of the DCSNS citation graph

5.2 Types

As we have already specified, the citation graph nodes/publications are grouped by type. The three types of nodes are the following (the number of nodes/publications of each type is given in parentheses):
• Type (A): Source, not Citation (7840)
• Type (B): Citation, not Source (2450)
• Type (C): Source and Citation (562)

This is the first of the three attributes labeling the nodes of the citation graph. Before examining the other two attributes, let us display the reduced graph of types, obtained when the nodes of the citation graph are aggregated according to their type and the edges are correspondingly aggregated among types (Fig. 5). The reduced graph aggregated in this way is a digraph of three nodes (the three types of nodes) and six weighted edges. These are the only possible edges, because the node “Citation – not Source” cannot be citing, since otherwise its in-degree would become positive, making it a source (which it is not). Similarly, the node “Source – not Citation” cannot be cited, since otherwise its out-degree would become positive, making it a citation (which it is not).

Fig. 5 The reduced citation graph for types of publications
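Aggregating the citation graph by node type is a quotient-graph construction: each node is replaced by its label and parallel edges between labels are merged into one weighted edge. A minimal sketch, assuming (cited, citing) edge pairs as in Sect. 3 and hypothetical helper names and toy data; under this convention no aggregated edge can enter type (B) or leave type (A).

```python
from __future__ import annotations

from collections import Counter


def node_types(edges: list[tuple[str, str]]) -> dict[str, str]:
    """Type A/B/C per node; an edge (u, v) means v cites u (citation -> source)."""
    has_in = {v for _, v in edges}   # positive in-degree: the node cites something
    has_out = {u for u, _ in edges}  # positive out-degree: the node is cited
    types = {}
    for node in has_in | has_out:
        if node in has_in and node in has_out:
            types[node] = "C"  # source and citation
        elif node in has_in:
            types[node] = "A"  # source, not citation
        else:
            types[node] = "B"  # citation, not source
    return types


def reduced_graph(edges: list[tuple[str, str]], label: dict[str, str]) -> Counter:
    """Aggregate edges by the labels of their endpoints; weights count original edges."""
    return Counter((label[u], label[v]) for u, v in edges)


if __name__ == "__main__":
    # Hypothetical toy data: p1 cites c1 and c2; p2 cites c1 and p1.
    toy_edges = [("c1", "p1"), ("c2", "p1"), ("c1", "p2"), ("p1", "p2")]
    print(reduced_graph(toy_edges, node_types(toy_edges)))
```

The same `reduced_graph` helper works for any node labeling, for example the combined themes of Sect. 5.3.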

5.3 Themes

As we have already discussed, the nodes/publications of the DCSNS citation graph were classified into four categories, called themes, related to the terms appearing in the keyword searches employed to collect the DCSNS bibliographic dataset from the S2ORC database. The four themes are the following (the number of nodes/publications characterized by each theme is given in parentheses):
• Disease Control Theme (3089)
• Disease Network Theme (354)
• Network Theme (2243)
• Surveillance Theme (3663)

Moreover, among the 10,852 nodes/publications of the citation graph, 5143 did not possess any theme categorization, because they appeared only as extra publications inside the reference lists of the primary dataset collected by the aforementioned keyword searches in the S2ORC database. In addition, several of the remaining 5709 publications were assigned to more than one theme. In other words, the theme attribute of nodes was non-exclusive (overlapping). Thus, to partition the citation graph nodes into distinct, thematically determined groups, we had to consider the set of combined themes (i.e., the combinations of the above four themes that actually occur as categories of the citation graph nodes). In this way, the following eleven combinations of themes were identified (the numbers of corresponding nodes are given in parentheses):
• Disease Control Theme (247)
• Network Theme (905)
• Surveillance Theme (1207)
• Network & Disease Control Theme (548)
• Network & Disease Network Theme (344)
• Network & Surveillance Theme (159)
• Network, Disease Control & Disease Network Theme (2)
• Network, Surveillance & Disease Control Theme (277)
• Network, Surveillance & Disease Network Theme (5)
• Network, Surveillance, Disease Control & Disease Network Theme (3)
• Surveillance & Disease Control Theme (2012)

There remain, of course, the 5143 nodes/publications that are not categorized by any of these combined themes. For the sake of completeness, we place them in a twelfth category of combined themes, designated “Reference without Theme.” In this way, the reduced DCSNS citation graph of combined themes is a weighted digraph composed of 12 nodes (combined themes) and 41 aggregated edges (Fig. 6).
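Turning overlapping theme assignments into exclusive combined themes amounts to mapping each node's set of themes to a single label. A minimal sketch, assuming each node carries the (possibly empty) set of themes matched by the keyword searches; the theme ordering is read off the combined-theme names listed above, and the example data and function names are hypothetical:

```python
from collections import Counter

# Theme order used in the combined-theme names listed in the text.
THEME_ORDER = ("Network", "Surveillance", "Disease Control", "Disease Network")


def combined_theme(themes):
    """Map a node's (possibly empty) set of overlapping themes to one exclusive label."""
    if not themes:
        return "Reference without Theme"
    unknown = set(themes) - set(THEME_ORDER)
    if unknown:
        raise ValueError(f"unknown themes: {sorted(unknown)}")
    parts = [t for t in THEME_ORDER if t in themes]
    if len(parts) == 1:
        name = parts[0]
    else:
        name = ", ".join(parts[:-1]) + " & " + parts[-1]
    return name + " Theme"


def combined_theme_counts(node_themes):
    """node_themes: dict node -> set of themes. Returns a Counter of combined labels."""
    return Counter(combined_theme(ts) for ts in node_themes.values())


# Hypothetical theme assignments, for illustration only.
example = {
    "p1": {"Surveillance", "Disease Control"},
    "p2": {"Network"},
    "p3": set(),
    "p4": {"Disease Control", "Surveillance"},
}
print(combined_theme_counts(example))
```

Because every node receives exactly one label, the resulting categories form a partition, which is what allows the reduced graph of combined themes to be built in the same way as the reduced graph of types.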

5.4 Topics

The 10,852 nodes of the DCSNS citation graph are publications that form a corpus of documents. In this corpus, the document of a publication consisted of its abstract, as extracted from the S2ORC database, when the publication belonged to the primary dataset collected by the aforementioned keyword searches, or simply of its title words when the publication appeared only in the reference lists of the former. This textual corpus was processed through the unsupervised machine learning technique of Topic Modeling (using the LDA model) to classify the content of the corpus into six topics and to associate each document (i.e., each publication) with a dominant topic. The six resulting topics are listed below; they were named by interpreting the top terms of each topic, and the number in parentheses is the number of publications for which the corresponding topic is dominant.

Citation Graph Analysis and Alignment Between Citation Adjacency. . .


Fig. 6 The reduced citation graph for combined themes

• Disease Networks Topic (719)
• Infectious Diseases Topic (1245)
• Disease Control Topic (2636)
• Health-Related Data Topic (2053)
• Surveillance Topic (2925)
• Network Models Topic (1274)
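The chapter does not say which LDA implementation produced the six topics. scikit-learn's `LatentDirichletAllocation` (scikit-learn is reference [23]) would be a common choice, and it fits LDA by variational Bayes. To make the notion of a dominant topic concrete without dependencies, here is a small collapsed Gibbs sampler, a different inference method for the same model, that returns each document's estimated topic mixture and its argmax. The hyperparameters, iteration count, function name, and toy documents are all assumptions:

```python
import random
from collections import Counter


def lda_dominant_topics(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (abstract words, or title words for reference-only items).
    Returns (dominant, theta): theta[d] is document d's estimated topic mixture and
    dominant[d] is the index of its largest component.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    n_words = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]   # tokens of topic k in document d
    nkw = [[0] * n_words for _ in topics]  # tokens of word w assigned to topic k
    nk = [0] * n_topics                    # tokens assigned to topic k
    z = []                                 # current topic of every token
    for d, doc in enumerate(docs):         # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove token i from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its new topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    dominant = [max(topics, key=lambda t: row[t]) for row in theta]
    return dominant, theta


# Toy corpus (hypothetical), already tokenized and lower-cased.
docs = [
    ["influenza", "surveillance", "data", "outbreak"],
    ["surveillance", "outbreak", "reporting", "data"],
    ["network", "model", "epidemic", "graph"],
    ["graph", "network", "contact", "model"],
]
dominant, theta = lda_dominant_topics(docs, n_topics=2)
print(Counter(dominant))  # documents per dominant topic
```

With real abstracts one would normally remove stop words and very rare terms first; the counts per dominant topic are what the parenthesized numbers in the list above report.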

Because each node is assigned exactly one dominant topic, the six topics partition the set of nodes: no node is left uncategorized and no topics overlap on any node. Thus, the reduced DCSNS citation graph of (dominant) topics is a weighted digraph composed of six nodes (topics) and 36 aggregated edges (Fig. 7), that is, one edge for every ordered pair of topics, self-loops included.

Fig. 7 The reduced citation graph for dominant topics

5.5 Relationships Between Nodal Attributes

Each of the three attributes considered here (types, combined themes, and topics) partitions the set of all nodes of the DCSNS citation graph, as depicted in the bar plot of Fig. 8. To examine the pairwise relationships among these three attributes, we first plot the corresponding cross-tabulations (contingency tables) (Figs. 9 and 10). In addition, we visualize the correlation matrix of all three attributes together (Fig. 11).
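A cross-tabulation such as those in Figs. 9 and 10 is simply a count of nodes per pair of categories. The text does not state which association measure underlies the correlation matrix of Fig. 11; for categorical attributes, Cramér's V is one standard option and is included here only as an assumption. A dependency-free sketch with hypothetical data and function names:

```python
import math
from collections import Counter


def crosstab(row_attr, col_attr):
    """Contingency table of two categorical node attributes.

    row_attr, col_attr: dicts node -> category, defined on the same nodes.
    Returns (row_labels, col_labels, table) where table[i][j] counts the nodes
    with row category row_labels[i] and column category col_labels[j].
    """
    pairs = Counter((row_attr[n], col_attr[n]) for n in row_attr)
    rows = sorted(set(row_attr.values()))
    cols = sorted(set(col_attr.values()))
    return rows, cols, [[pairs[(r, c)] for c in cols] for r in rows]


def cramers_v(table):
    """Cramér's V (0 = no association, 1 = perfect association) of a contingency table
    whose row and column totals are all positive."""
    n = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    k = min(len(row_tot), len(col_tot)) - 1
    if k == 0:
        raise ValueError("need at least two categories on each side")
    chi2 = sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(len(row_tot))
        for j in range(len(col_tot))
    )
    return math.sqrt(chi2 / (n * k))


# Hypothetical node attributes.
topic = {"p1": "Surveillance", "p2": "Surveillance", "p3": "Network Models"}
ntype = {"p1": "A", "p2": "B", "p3": "A"}
rows, cols, table = crosstab(topic, ntype)
print(rows, cols, table)  # ['Network Models', 'Surveillance'] ['A', 'B'] [[1, 0], [1, 1]]
print(cramers_v(table))
```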

5.6 Degree and Attribute Assortativities

Here, we examine the degree and attribute assortativity coefficients [25] of the DCSNS citation graph. The results are summarized in Table 1.
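For reference, Newman's coefficients [25] can be written out directly. The attribute coefficient is r = (Σ_i e_ii − Σ_i a_i b_i) / (1 − Σ_i a_i b_i), where e_ij is the fraction of edges from category i to category j, and a_i and b_i are the row and column sums of e. The degree coefficient is the Pearson correlation, over all edges, between a degree of each edge's source node and a degree of its target node. NetworkX [22] provides `degree_assortativity_coefficient` (whose `x` and `y` arguments select "in" or "out" degrees) and `attribute_assortativity_coefficient`. The plain-Python sketch below mirrors those definitions. Which degree pairing each row of Table 1 uses, and what the TC- and TO- prefixes denote, is not stated in this excerpt, so the sketch exposes the pairing as parameters and covers only plain in/out degrees:

```python
from collections import Counter


def _pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("coefficient undefined: a degree sequence is constant")
    return cov / (vx * vy) ** 0.5


def degree_assortativity(edges, x="out", y="in"):
    """Pearson correlation between the x-degree of each edge's source
    and the y-degree of its target (x, y in {"in", "out"})."""
    deg = {"out": Counter(u for u, _ in edges), "in": Counter(v for _, v in edges)}
    xs = [deg[x][u] for u, _ in edges]
    ys = [deg[y][v] for _, v in edges]
    return _pearson(xs, ys)


def attribute_assortativity(edges, attr):
    """Newman's assortativity for a categorical node attribute (attr: node -> category)."""
    m = len(edges)
    e = Counter((attr[u], attr[v]) for u, v in edges)  # mixing counts
    a = Counter(attr[u] for u, _ in edges)             # row sums (source side)
    b = Counter(attr[v] for _, v in edges)             # column sums (target side)
    trace = sum(e[(c, c)] for c in set(a) | set(b)) / m
    ab = sum(a[c] * b[c] for c in a) / m ** 2
    if ab == 1:
        raise ValueError("coefficient undefined: all edges join a single category")
    return (trace - ab) / (1 - ab)


print(degree_assortativity([(0, 1), (0, 2), (1, 2)]))  # approximately -0.5
attr = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
print(attribute_assortativity([("a", "b"), ("c", "d")], attr))  # 1.0 (perfectly assortative)
print(attribute_assortativity([("a", "c"), ("d", "b")], attr))  # -1.0 (perfectly disassortative)
```

Read against Table 1, values close to zero (as for the three nodal attributes) indicate that citations link categories roughly as often as random mixing would.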

Fig. 8 Partition of the set of citation nodes by the three attributes

Fig. 9 Topics vs. nodal types cross-tabulation

Fig. 10 Combined themes vs. nodal types cross-tabulation

Fig. 11 Correlation matrix among attributes of type, combined theme, and topics

Table 1 Degree and attribute assortativity coefficients
Node measure or attribute | Degree assortativity coefficient | Attribute assortativity coefficient
InDegree | 0.094 | –
OutDegree | 0.028 | –
TC-InDegree | −0.007 | –
TC-OutDegree | −0.001 | –
TO-InDegree | 0.348 | –
TO-OutDegree | 0.217 | –
Type | – | 0.003
Combined Theme | – | 0.029
Topic | – | 0.019

6 Conclusions

In our network analysis of publications on DCSNS, we have highlighted the importance of both global (citation) patterns and local properties (triads of citing/cited nodes/publications). Our tool for doing this was the analysis of structural triadic clustering together with the assortativities of attributes related to the content of publications (types of publications and themes in the employed keyword searches). In this way, we have connected a structural network analysis with an NLP-based semantic exploration (through Topic Modeling) of a complex bibliographic dataset. Moreover, we were able to associate particular aggregations of publications, in which certain semantic attributes co-occur, with network patterns of triadic closure. All of this was obtained from the study of a bibliographic citation network; in future work, we plan to apply a similar scientometric analysis to the corresponding co-citation and bibliographic coupling networks [26] induced from the DCSNS bibliographic dataset.

Acknowledgements

This research was supported in part through the computational resources and staff contributions provided for the Quest High Performance Computing facility at Northwestern University, which is jointly supported by the Office of the Provost, the Office for Research, and Northwestern University Information Technology.

References

1. Börner K, Sanyal S, Vespignani A (2008) Network science. Annu Rev Inf Sci Technol 41:537–607
2. Bibliometrics. https://www.merriam-webster.com/dictionary/bibliometrics
3. Garfield E (2009) From the science of science to Scientometrics visualizing the history of science with HistCite software. J Inf 3:173–179
4. Arroyo-Machado W, Torres-Salinas D, Herrera-Viedma E, Romero-Frías E (2020) Science through Wikipedia: a novel representation of open knowledge through co-citation networks. PLoS One 15:1–20
5. Chandrasekharan S, Zaka M, Gallo S, Zhao W, Korobskiy D, Warnow T, Chacko G (2021) Finding scientific communities in citation graphs: articles and authors. Quant Sci Stud 2:184–203
6. Leydesdorff L, Rafols I (2009) A global map of science based on the ISI subject categories. J Am Soc Inf Sci Technol 60:348–362
7. West JD, Vilhena DA (2014) A network approach to scholarly evaluation. In: Beyond bibliometrics: harnessing multidimensional indicators of scholarly impact. MIT Press, Cambridge, MA
8. Mählck P, Persson O (2000) Socio-bibliometric mapping of intra-departmental networks. Scientometrics 49:81–91
9. Huang M-H, Chang Y-W (2011) A study of interdisciplinarity in information science: using direct citation and co-authorship analysis. J Inf Sci 37:369–378
10. Zingg C, Nanumyan V, Schweitzer F (2020) Citations driven by social connections? A multilayer representation of coauthorship networks. Quant Sci Stud 1:1493–1509
11. Gray RJ (2020) Sorry, we’re open: golden open-access and inequality in non-human biological sciences. Scientometrics 124:1663–1675
12. Aiello AE, Renson A, Zivich PN (2020) Social media– and internet-based disease surveillance for public health. Annu Rev Public Health 41:101–118
13. Herrera JL, Srinivasan R, Brownstein JS, Galvani AP, Meyers LA (2016) Disease surveillance on complex social networks. PLoS Comput Biol 12:1–16
14. Bang-Jensen J, Gutin GZ (2009) Digraphs: theory, algorithms and applications. Springer, London
15. West D (2002) Introduction to graph theory. Pearson, Singapore
16. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
17. Blei DM. Topic modeling and digital humanities. J Digit Human 2:1
18. Burt RS (1995) Structural holes: the social structure of competition. Harvard University Press, Cambridge, MA
19. van Rossum G (1995) Python tutorial. Centrum voor Wiskunde en Informatica, Amsterdam
20. Rose S, Engel D, Cramer N, Cowley W (2010) Automatic keyword extraction from individual documents. In: Text mining. Wiley, New York, pp 1–20
21. Campos R, Mangaravite V, Pasquali A, Jorge A, Nunes C, Jatowt A (2020) YAKE! Keyword extraction from single documents using multiple local features. Inf Sci 509:257–289
22. Hagberg A, Swart D, Chult S (2008) Exploring network structure, dynamics, and function using networkx. In: Report Number: LA-UR-08-05495; LA-UR-08-5495, Research Org.: Los Alamos National Lab. (LANL), Los Alamos, NM
23. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
24. Batagelj V, Zaversnik M (2003) An O(m) algorithm for cores decomposition of networks. CoRR. cs.DS/0310049
25. Newman MEJ (2003) Mixing patterns in networks. Phys Rev E 67
26. Garfield E (2001) From bibliographic coupling to co-citation analysis via algorithmic historio-bibliography: a citationist’s tribute to Belver C. Griffith. In: A paper presented at the Drexel University, Philadelphia, PA

Privacy in Online Social Networks: A Systematic Mapping Study and a Classification Framework

Sarah Bouraga, Ivan Jureta, and Stéphane Faulkner

Abstract

Disease control through Online Social Networks (OSNs) has become particularly relevant in the past few months. Given the sensitive nature of the data collected and manipulated in that context, a major concern for (potential) users of such surveillance applications is privacy. The concept of privacy has been studied from many different angles, and this work aims to offer a general systematic literature review of the area. The contributions of this paper are twofold. Firstly, we propose a systematic mapping study covering papers related to privacy in OSNs. This study results in a coarse-grained overview of the landscape of existing work in the field. In this first phase, 345 papers were examined. The findings show the characteristics and trends of publications in the area. They also highlight the areas where publications are scarce, thereby pointing researchers to gaps in the literature. Secondly, we propose a classification framework and apply it to the subset of papers (108 papers) that offer a solution to protect the user’s privacy. The results provide a way for researchers to position a solution in comparison with other existing solutions. The results also highlight trends in existing solutions. The main practical implications of this paper are guidelines and recommendations for designers of applications, including applications for disease control.

Keywords Online social networks · Privacy · Systematic mapping study · Classification framework

S. Bouraga · S. Faulkner
Department of Business Administration, University of Namur, Namur, Belgium
Namur Digital Institute, University of Namur, Namur, Belgium

I. Jureta
Department of Business Administration, University of Namur, Namur, Belgium
Namur Digital Institute, University of Namur, Namur, Belgium
Fonds de la Recherche Scientifique – FNRS, Brussels, Belgium

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_6


S. Bouraga et al.

1 Introduction

Ever since Online Social Networks (OSNs) became popular, privacy in these services has been an issue. OSNs allow a user to share content about herself, both directly and indirectly: directly by giving, e.g., her birthdate, and indirectly by performing actions that suggest her preferences, e.g., commenting on specific content on Facebook. The availability of such personal content has led to concerns about, and research on, trust and privacy. Users are concerned about how OSNs use their data and about who can see their data. These concerns may be heightened when sensitive data are involved, as is the case with disease control. Disease control carried out using OSNs involves the collection, processing, and analysis of critical and private data. Protecting these data and ensuring user privacy should be a top priority for the designers and administrators of OSNs. Many papers have addressed privacy in OSNs, and from many different angles. In this work, we aim to understand the landscape and status of recent work on privacy in OSNs. Among other things, this will allow us to determine whether privacy in the context of disease control has been studied. Surveys of privacy in OSNs have already been carried out by Joshi and Kuo [41] and by Zheleva and Getoor [126]. However, OSNs, and what is expected of them in terms of privacy, have evolved greatly since 2011. It is thus relevant to have a picture of the current status of this research. Kayes and Iamnitchi [45] proposed a more recent survey of privacy in OSNs. The authors focused on privacy and security issues in OSNs, and provided a taxonomy of attacks and an overview of existing solutions.

The scope of the current study is different, as is our approach: (1) the goal is to offer an overview of research on privacy in OSNs, from the psychological to the legal aspects of privacy, rather than solely on privacy attacks, and (2) we propose a Systematic Mapping Study (SMS) and a classification framework, offering two levels of detail in the analysis. In this work, we make two contributions. Firstly, we provide an SMS summarizing the publications that fall within the scope of our study. This type of study can benefit several types of readers: (1) readers seeking an introduction to the subject can use the map to obtain a coarse-grained overview of privacy in OSNs, and (2) practitioners can use the study to locate the most popular (types of) publications. Secondly, we propose a classification framework offering a more in-depth analysis of a subset of papers. While the SMS considers publications addressing various topics, the classification framework gives a more in-depth overview of the solutions proposed to protect user privacy in OSNs. Again, practitioners who want a more fine-grained analysis of the types of solutions proposed in the literature can use the application of this framework. It can help them identify gaps in the literature, and it can also help them position their own solution by comparing it with existing ones. To carry out the systematic mapping, we followed the methodology suggested by Petersen et al. [74].

Privacy in OSNs


The remainder of the paper is structured as follows. Section 2 reviews related work. Section 3 presents the Systematic Mapping. Section 4 presents and applies the Classification Framework. Section 5 discusses the results, and Section 6 concludes the paper.

2 Related Work

Ours is not the first systematic mapping study conducted in the field of OSNs. Other such surveys have been carried out, covering, for example, social media for business process improvement [64]; social media and its relation to business [6]; transportation research based on social media analysis [120]; sentiment analysis for Arabic in OSNs [2]; and automatic classification of fake news in social media [27]. Such surveys can be distinguished by (1) the search terms, (2) the publication period, and (3) the digital libraries used to retrieve the articles [120]. Our work differs from those mentioned above in its query and its time frame: we are interested in recent works addressing privacy in OSNs.

3 Systematic Mapping

“A systematic mapping study provides a structure of the type of research reports and results that have been published by categorizing them and often gives a visual summary, the map, of its results” [74]. The process proposed by Petersen et al. consists of five steps: (1) definition of research questions (RQs), (2) search for primary studies, (3) screening of papers for inclusion and exclusion, (4) keywording of abstracts, and (5) data extraction and mapping of studies. Each of these steps is applied in the following subsections.

3.1 Definition of Key Terms

In this work, we adopt the commonly accepted definition of OSNs proposed by Boyd and Ellison [18]:

Web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system. The nature and nomenclature of these connections may vary from site to site.

As far as the definition of privacy is concerned, we adopt the view of Clarke [22], who proposed to consider privacy as “the interest that individuals have in sustaining a ‘personal space’, free from interference by other people and organisations.” This definition has the benefit of being broad enough to account for the many perspectives addressed in the literature.

Table 1 Research questions
Topics | RQ1: Which topics have been covered by privacy publications? RQ2: How have these topics evolved over the past few years?
Contribution to theory | RQ3: Which type of theory is proposed? RQ4: How has this evolved over the past few years?
Research approach | RQ5: Which type of research approach is applied? RQ6: How has this evolved over the past few years?
Venue | RQ7: In which venues do the papers typically appear? RQ8: How has this evolved over the past few years?

3.2 Definition of Research Questions: Step 1

The goal here is to define the research scope by clarifying and articulating the RQs guiding the study. The RQs are listed in Table 1. Given the plethora of research focusing on privacy in OSNs, it is interesting to see which topics these papers address (RQ1). Next, just as Belanger and Crossler [15] adapted Gregor’s framework [35] to privacy in the digital age, we apply Gregor’s taxonomy to the even more specific subject of privacy in OSNs (RQ3). The type of research paper is also of interest (RQ5). Finally, as argued by Petersen et al. [74], one of the goals of a systematic mapping study can be to identify the venues in which the papers typically appear (RQ7). For each of these dimensions, we also want to analyse whether a trend has emerged over the years or whether the frequency of publications has remained the same. This is covered by RQ2, RQ4, RQ6, and RQ8.

3.3 Conduct Search for Primary Studies and Screening of Papers for Inclusion and Exclusion: Steps 2 and 3

The search results are summarized in Table 2. We only included works published in 2013 or later, given that OSNs, and the way privacy is addressed in them, have evolved significantly since OSNs first became popular. Our intention was to propose an in-depth analysis of the newer publications and solutions proposed in the literature. We searched for the term “Social networks” but only included papers that dealt with OSNs. We focused on papers published in peer-reviewed journals or conferences. The papers had to be written in English. Finally, we excluded theses, editorials, and papers from regional conferences. Hence, out of all these publications, we kept 345 papers.

Table 2 Search results
Terms used in the query | # of results
“Online social network” and “Privacy” in the title, published in 2013 or later | 58
“Online social networks” and “Privacy” in the title, published in 2013 or later | 270
“Social network” and “Privacy” in the title, published in 2013 or later | 381
“Social networks” and “Privacy” in the title, published in 2013 or later | 720
“Social Media” and “Privacy” in the title, published in 2013 or later | 391
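The screening criteria above can be read as a filter over the raw query results. A minimal sketch, assuming hypothetical record fields (`kind`, `regional`, `about_osn`) that a reviewer would fill in while reading each paper; removing title duplicates first is also an assumption, motivated by the overlap between the five queries of Table 2:

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    year: int
    language: str
    kind: str                # hypothetical labels: "journal", "conference", "thesis", "editorial", ...
    regional: bool = False   # hypothetical flag: published at a regional conference
    about_osn: bool = True   # reviewer's judgment: is the paper about *online* social networks?


PEER_REVIEWED = {"journal", "conference"}


def passes_criteria(r: Record) -> bool:
    """Inclusion and exclusion criteria as described in the text."""
    return (
        r.year >= 2013
        and r.language == "English"
        and r.about_osn
        and r.kind in PEER_REVIEWED  # theses and editorials fall outside this set
        and not r.regional
    )


def screen(records):
    """Drop title duplicates across overlapping queries, then apply the criteria."""
    seen, kept = set(), []
    for r in records:
        key = " ".join(r.title.lower().split())  # case- and whitespace-insensitive title
        if key in seen:
            continue
        seen.add(key)
        if passes_criteria(r):
            kept.append(r)
    return kept


# Made-up records, for illustration only.
recs = [
    Record("Privacy in OSNs", 2015, "English", "journal"),
    Record("privacy in  OSNs", 2015, "English", "journal"),  # same paper from another query
    Record("Old survey", 2011, "English", "journal"),
    Record("A thesis", 2016, "English", "thesis"),
    Record("Regional talk", 2017, "English", "conference", regional=True),
    Record("Offline networks", 2018, "English", "journal", about_osn=False),
]
print([r.title for r in screen(recs)])
```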

3.4 Classification Scheme and Mapping: Steps 4 and 5

After screening the papers, three facets emerged relatively easily. The first facet used to analyze the papers is the Topic, as summarized in Table 3. The second facet captures the type of theoretical contribution, as summarized in Table 4.

Table 3 Paper count classified by topic
Topic | # of papers
Privacy mechanism. How can privacy be ensured? | 108
Legal aspect of privacy. How does the law handle privacy in OSNs? | 27
Psychological aspect of privacy. A study of the attitude and/or behavior of users towards privacy in OSNs | 94
Privacy and security challenges. What kinds of problems or attacks could OSN users face? | 77
Privacy and Applications or Case Studies. A discussion or analysis of a specific problem related to privacy | 31
Ethical aspect of privacy | 8

Table 4 Paper count classified by theoretical contribution
Theoretical contribution [35] | # of papers
Analyzing. Describing what is, without providing any explanations | 123
Explaining. Describing what is, providing explanations, but without providing any predictions or proposing any testable solutions | 63
Predicting. Describing what is and will be, but without providing any precise causal explanations | 3
Explaining and Predicting. Describing what is and will be, and providing explanations | 28
Design and Action. Describing how to do something | 128


Finally, the third facet is the research approach. We use Wieringa’s typology of Requirements Engineering (RE) research papers [112]. Zave [119] defines RE as:

The branch of software engineering concerned with the real-world goals for, functions of, and constraints on software systems. It is also concerned with the relationship of these factors to precise specifications of software behavior, and to their evolution over time and across software families.

We consider privacy to be a requirement for OSNs today: anyone studying, analyzing, or designing an OSN today is concerned with privacy, whether as a goal or as a constraint. Privacy has emerged as one of the most critical requirements for any OSN platform. Hence, we believe it is relevant to apply Wieringa’s classification of RE papers. Tables 3, 4, and 5 summarize the data extraction step, and Figs. 1, 2, and 3 show the relationship between pairs of facets. Figure 1 indicates that most of the Evaluation research papers fall into the Analyzing (120 papers) and Explaining (62 papers) categories of theoretical contribution. The Proposal of solution papers, as might be expected, fall almost exclusively into the Design and Action type of contribution. Figure 2 shows that all Privacy mechanism papers (108 papers) fall into the Design and Action type of contribution. Papers dealing with the Psychological aspect topic are mostly distributed among the Explaining (43 papers), Explaining and Predicting (28 papers), and Analyzing (21 papers) categories. The papers addressing Privacy and security challenges mostly fall into the Analyzing (54 papers) and Explaining (16 papers) types of contribution. Finally, Fig. 3 shows that all Privacy mechanism papers (108 papers, Topic axis) fall into the Proposal of solution type of RE research paper. We can also observe that most of the Privacy and security challenges papers and most of the papers addressing a Psychological aspect of privacy are classified as Evaluation research (73 and 91 papers, respectively).

Table 5 Paper count classified by RE research paper
RE research paper [112] | # of papers
Evaluation research: investigation of a problem in RE or implementation of an RE technique in practice | 213
Proposal of solution: proposal of a novel solution technique and argumentation for its relevance | 127
Validation research: investigation of the properties of a solution technique that has not yet been implemented in practice | 1
Philosophical papers: proposal of a new way of looking at something | 0
Opinion papers: proposal of the author’s opinion about something | 4
Personal experience papers: description of the author’s personal experience | 0


Fig. 1 Visualization of the systematic mapping—RE research papers vs theoretical contribution

Fig. 2 Visualization of the systematic mapping—Topic vs theoretical contribution


Fig. 3 Visualization of the systematic mapping—Topic vs RE research papers

3.4.1 RQ1 and RQ2: Topics in OSN Privacy Research

A little more than half of the papers fall into the Privacy mechanism (108 papers) and Psychological aspect (94 papers) topics (Figs. 2 and 3). Figure 4 shows that the three most popular topics peaked (1) in 2016 for Privacy mechanism, with 25 papers, and (2) in 2017 for Privacy and security challenges and for Psychological aspect, with 17 and 22 papers, respectively.
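The peak years reported in this and the following subsections come from counting papers per category and per year. A small sketch of that bookkeeping, using made-up records rather than the study's data; the function names are hypothetical, and breaking ties toward the earlier year is an arbitrary choice:

```python
from collections import Counter, defaultdict


def yearly_counts(records):
    """records: iterable of (year, category) pairs, one per paper.
    Returns {category: Counter(year -> paper count)}."""
    by_category = defaultdict(Counter)
    for year, category in records:
        by_category[category][year] += 1
    return dict(by_category)


def peak_year(year_counts):
    """Year with the highest count; ties are broken by the earliest year."""
    return min(year_counts, key=lambda y: (-year_counts[y], y))


# Made-up records, not the 345 papers of the study.
records = (
    [(2016, "Privacy mechanism")] * 3
    + [(2017, "Privacy mechanism")] * 2
    + [(2017, "Psychological aspect")] * 2
    + [(2015, "Psychological aspect")]
)
counts = yearly_counts(records)
for category, yc in counts.items():
    year = peak_year(yc)
    print(category, year, yc[year])
```

The same two helpers apply unchanged to the theoretical-contribution, RE-research-paper, and venue facets.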

3.4.2 RQ3 and RQ4: Theoretical Contributions in OSN Privacy Research

About 73% of the papers (251 of 345) fall into the Design and Action (128 papers) and Analyzing (123 papers) categories (Figs. 1 and 2). Design and Action contributions peaked in 2016 with 32 papers, and Analyzing contributions peaked in 2013 with 26 papers (Fig. 5).


Fig. 4 Evolution of paper count by topic between 2013 and 2019

Fig. 5 Evolution of paper count by theoretical contribution between 2013 and 2019


Fig. 6 Evolution of paper count by RE research papers between 2013 and 2019

3.4.3 RQ5 and RQ6: RE Research Papers in OSN Privacy Research

As shown in Figs. 1, 3, and 6, the works on privacy fall mainly into two types of research papers: Evaluation research and Proposal of solution. Both types followed a similar evolution over the years. Evaluation research peaked in 2017 with 42 papers, while Proposal of solution peaked in 2016 with 32 papers. Only 4 Opinion papers, 1 Validation research paper, and no Personal experience papers were recorded.

3.4.4 RQ7 and RQ8: Venues in OSN Privacy Research

Most papers were published either in a journal (213) or at a conference (114) (Fig. 7); 45 journal papers were published in 2017, and 31 papers were presented at a conference (or workshop) in 2016 (Fig. 8).

Fig. 7 Paper count by venue

Fig. 8 Evolution of paper count by venue between 2013 and 2019

4 Classification Framework for the Design and Action Theoretical Contributions

This section presents the classification framework we propose for analyzing the privacy mechanisms for OSNs identified in the literature, providing a more in-depth analysis than the first part of this paper. The axes and their dimensions listed here emerged naturally from reading the papers. The framework is composed of the following axes:

– What? Describes the nature of what is being protected. Possible values: the whole profile, posts, images, relationships, shared data, or the identity of the user.
– When? States whether the solution proposed by the paper is meant to prevent violations or to detect whether a privacy breach has occurred. Possible values: (1) Prevention: the solution implements its decision before the data are shared; or (2) Detection: the solution gives a warning after the data are published.
– How? Explores the nature of the solution’s output. Possible values: (1) Binary: the output resembles “Disclose data” or “Do not disclose data”; or (2) Continuous: the output resembles a privacy score.
– Who? Who makes the decision regarding privacy? Possible values: (1) the System: the user has no direct influence on the decision; the system takes advantage of all the available information and makes the decision by itself; (2) the User: the system has no direct influence on the decision; or (3) Hybrid: both the system and the user are involved in the final decision.
– Against whom? Indicates the type of entity against which the solution protects the user. Possible values: (i) Other users: the published data will not be accessible to unauthorized regular (i.e., non-malicious) users; (ii) Attackers: the solution protects the published data against malicious entities; (iii) OSN providers: the published data will be inaccessible to the OSN; or (iv) Third-party applications: if the OSN allows the installation of third-party applications, these will not be able to access the published data.
– For whom? Indicates the type of disclosure being protected. Possible values: (1) Self: the user protects elements in her profile and/or posts so that her own privacy is protected; (2) Others: the user protects elements in her profile and/or posts so that the privacy of others is protected; or (3) Both: the user protects elements in her profile and/or posts so that both her own privacy and that of others are protected.

The classification framework is summarized in Table 6, and its application is reported in Tables 7–12.
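One way to make the six axes operational when coding papers is to give each axis a closed set of values, taken from Table 6. In this sketch, What and Against whom are set-valued because Tables 7 and 11 list some papers under more than one dimension (e.g., [52] under both Posts and Shared data, and [1] under both Other users and OSN providers). The class and function names are hypothetical, and the three hybrid Who values follow the hybrid variants listed in Table 6:

```python
from dataclasses import dataclass
from enum import Enum


class When(Enum):
    PREVENTION = "Prevention"
    DETECTION = "Detection"
    BOTH = "Both"


class How(Enum):
    BINARY = "Binary"
    CONTINUOUS = "Continuous"
    HYBRID = "Hybrid"


class Who(Enum):
    USER = "User"
    SYSTEM = "System"
    HYBRID_INPUT = "Hybrid (input)"
    HYBRID_OUTPUT = "Hybrid (output)"
    HYBRID_INPUT_OUTPUT = "Hybrid (input and output)"


class Against(Enum):
    OTHER_USERS = "Other users"
    ATTACKERS = "Attackers"
    OSN_PROVIDERS = "OSN providers"
    THIRD_PARTY_APPS = "Third-party applications"


class ForWhom(Enum):
    SELF = "Self"
    OTHERS = "Others"
    BOTH = "Both"


WHAT_VALUES = frozenset({
    "Whole profile", "Posts", "Images", "Relationships", "Shared data",
    "Profile attributes", "Identity of the user", "Location",
})


@dataclass(frozen=True)
class Classification:
    what: frozenset     # one or more of WHAT_VALUES
    when: When
    how: How
    who: Who
    against: frozenset  # one or more Against members
    for_whom: ForWhom

    def __post_init__(self):
        if not self.what:
            raise ValueError("What needs at least one value")
        unknown = set(self.what) - WHAT_VALUES
        if unknown:
            raise ValueError(f"unknown What values: {sorted(unknown)}")
        if not self.against or not all(isinstance(a, Against) for a in self.against):
            raise ValueError("Against whom needs one or more Against members")


def tally(papers, axis):
    """Count papers per dimension of one axis; set-valued axes count each member."""
    counts = {}
    for c in papers.values():
        value = getattr(c, axis)
        for v in (value if isinstance(value, frozenset) else [value]):
            counts[v] = counts.get(v, 0) + 1
    return counts


# Hypothetical coding of two papers.
papers = {
    "p1": Classification(frozenset({"Posts", "Shared data"}), When.PREVENTION, How.BINARY,
                         Who.HYBRID_INPUT, frozenset({Against.OTHER_USERS}), ForWhom.SELF),
    "p2": Classification(frozenset({"Whole profile"}), When.DETECTION, How.CONTINUOUS,
                         Who.SYSTEM, frozenset({Against.ATTACKERS, Against.OSN_PROVIDERS}),
                         ForWhom.SELF),
}
print(tally(papers, "what"))
```

Because set-valued axes count a paper once per dimension, per-dimension counts on those axes can add up to more than the number of papers classified.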

Table 6 Summary of the classification framework
Axes | Dimensions
What | Whole profile, Posts, Images, Relationships, Shared data, Profile attributes, Identity of the user, Location
When | Prevention, Detection, Both
How | Binary, Continuous, Hybrid
Who | User, System, Hybrid—Input, Hybrid—Output, Hybrid—Input and output
Against whom | Other users, Attackers, OSN providers, Third-party applications
For whom | Self, Others, Both

Table 7 Application of the classification framework for the What axis
What axis | Papers
Whole profile | [1, 3, 4, 7–9, 11, 17, 20, 24, 30, 34, 36, 37, 49–51, 55, 56, 60, 66, 70, 71, 73, 78–80, 83, 84, 88, 98, 116, 127, 127, 128]
Posts | [12, 13, 16, 19, 21, 25, 26, 40, 42, 44, 48, 52, 65, 68, 72, 77, 86, 87, 100, 103–105, 108, 122]
Images | [38, 69, 92, 95, 97, 99, 115]
Relationships | [28, 125]
Shared data | [10, 46, 47, 52, 62, 67, 89–91, 101, 102]
Profile attributes | [23, 32, 33, 117]
Identity of the user | [14, 19, 29, 54, 61, 96, 106, 107, 109, 110, 124]
Location | [39, 53, 57–59, 75, 85, 94, 113, 114, 118, 123]

5 Discussion

This paper offers a coarse-grained overview of current research on privacy in OSNs using an SMS. Someone interested in an introduction to the subject can get a better idea of the state of the literature by reading the results of this systematic mapping. The results of RQ1 to RQ8 allow the reader to draw some insights. Out of the 345 papers considered, many studies focused on the following aspects: (1) Design and Action (128 papers), Analyzing (123 papers), and Explaining (63 papers) for the Theoretical contribution; (2) Evaluation research (213 papers) and Proposal of solution (127 papers) for the RE Research papers; and (3) Privacy mechanism (108 papers), Psychological aspect (94 papers), and Privacy and security challenges (77 papers) for the Topic. On the other hand, there seems to be a shortage of publications in the following areas: (1) Theoretical contribution: Predicting (3 papers) and Explaining and Predicting (28 papers); (2) RE Research papers: Validation research (1 paper), Opinion papers (4 papers), Philosophical papers (0 papers), and Personal experience papers (0 papers); and (3) Topic: Ethical aspect (8 papers) and Legal aspect (27 papers). Thus, researchers seeking to make a contribution could focus on Validation research and/or address the ethical aspect of privacy. Practitioners who are interested in a more detailed analysis can turn to the classification framework. The benefit of combining an SMS with a classification framework in one study is that it allowed us to deal with some validity problems that may arise when papers are not evaluated in much detail. More specifically, during the classification and mapping steps of the SMS process, we classified a few papers under the Privacy mechanism topic. However, while applying the classification framework, we realized that these papers did not actually propose a mechanism to protect the privacy of users, and we therefore changed our initial categorization. Without the finer-grained analysis, some studies would have remained misclassified. The application of the classification framework offers the following insights (Tables 8, 9, 10, 11, and 12). Out of the 108 works considered, many studies focused on the following aspects: (1) What: the Whole profile (35 papers) and Posts (24 papers); (2) When: Prevention (94 papers); (3) How: Binary (75 papers); (4) Who: Hybrid input (63 papers); (5) Against whom: Other users (62 papers); and (6) For whom: Self (91 papers). The fact that researchers focused on these aspects of privacy makes sense. Users want their whole profile to be protected, not only a specific part of it (What). Users want to know beforehand that their profile or posts are at risk (When), and they usually want information in the form of “Risky” vs. “Safe” rather than a risk score (How). In most of the solutions considered here, the user has to identify the group of friends with whom she wants to share some information. This process seems logical: the user controls who sees her profile/posts, but she lets the mechanism enforce the protection (Who).
Authors focused mainly on protection against other users (Against whom), while there is a shortage of papers addressing the protection against Adversaries, Third-Party applications and the OSN providers. Researchers seeking to make a contribution about privacy on OSNs could focus on a solution protecting

Table 8 Application of the classification framework for the when axis When axis Prevention

Detection Both

Papers [1, 4, 7–10, 12–14, 16, 17, 20, 23, 24, 24, 26, 28–30, 32–34, 36–40, 44, 46– 48, 52, 52–54, 56–62, 65–69, 75, 77–80, 82–106, 106, 108–111, 113–118, 122– 125, 127, 128, 128] [3, 11, 19, 21, 49–51, 55, 70–72, 121] [42, 73]

Table 9 Application of the classification framework for the how axis How axis Binary

Continuous Hybrid

Papers [1, 4, 7, 9, 10, 12, 13, 16, 17, 19, 20, 23–26, 28, 30, 34, 36, 37, 39, 42, 44, 46– 54, 56–58, 60–62, 65–69, 75, 77–80, 82–95, 97–99, 101, 102, 105, 111, 113, 114, 118, 122, 123, 127] [3, 8, 11, 14, 21, 29, 32, 33, 38, 40, 55, 59, 70– 72, 96, 100, 103, 104, 106, 106, 108–110, 116, 117, 121, 124, 125, 128] [73, 115]

Privacy in OSNs

123

Table 10 Application of the classification framework for the who axis Who axis User System Hybrid input

Hybrid—Output Hybrid—Input and output

Papers [79, 88] [4, 9, 12, 14, 17, 19, 21, 29, 32, 33, 36, 53–56, 58, 60, 75, 82, 84, 85, 95, 96, 99, 100, 103, 106, 106, 109, 110, 113, 114, 117, 124, 125] [1, 3, 7, 10, 11, 13, 16, 20, 24–26, 28, 30, 34, 37– 40, 42, 44, 46, 47, 49–52, 57, 59, 61, 62, 65–73, 77, 78, 80, 83, 86, 89–94, 97, 98, 101, 102, 104, 105, 111, 116, 118, 121–123, 127] [8, 108, 115, 128] [23, 48, 87]

Table 11 Application of the classification framework for the against whom axis Against whom axis Other users

Attackers OSN providers 1/3 application

Papers [1, 9, 10, 13, 16, 24–26, 28, 30, 34, 38, 40, 42, 44, 46– 52, 54, 57, 60, 62, 65–69, 73, 77–80, 82, 83, 86–94, 97, 99, 101–105, 108, 115, 116, 122, 123, 128, 128] [3, 8, 11, 12, 14, 19, 21, 23, 29, 32, 33, 36, 39, 53, 55, 58, 60, 61, 71, 75, 82, 85, 95, 96, 106, 106, 109, 110, 113, 114, 117, 118, 121, 124, 125] [1, 17, 56, 70, 72, 84, 86, 127] [7, 20, 37, 59, 70, 72, 98, 100, 111]

Table 12 Application of the classification framework for the for whom axis For whom axis Self

Others Both

Papers [1, 3, 4, 7–9, 11–13, 16, 17, 19–21, 23–26, 28–30, 32–34, 36– 40, 42, 44, 48–51, 53–61, 65, 68–73, 75, 77, 78, 80, 82–89, 92–100, 103– 106, 106, 108–111, 113, 114, 116–118, 123–125, 127, 128, 128] [10, 14, 46, 47, 52, 62, 67, 90, 91, 101, 102, 115, 122] [66, 79, 121]

the user against these agents. Finally, a large proportion of papers centered on the self protection instead of the protection of other users (For whom). An indisputable gap is the management of privacy related to disease control in OSNs. When analyzing the topics covered by articles addressing privacy issues in OSNs, one can clearly identify a lack of papers attending to disease control. Several works related to disease management and OSNs have been proposed in the literature. These works assess vaccine sentiment in OSNs [43, 81], the impact of social media during the measles outbreak in the Netherlands in 2013 [63] or during the COVID-19 pandemic [5, 31], and disease awareness in OSNs [76]. However, we did not find any work addressing the privacy management in this area; while this is relevant since disease control will involve some sensitive data. Specifically, what are the practical implications of these results? New solutions in general and solutions for disease control in particular should take these findings into account. First of all, and most importantly, novel research in this field should recognize the importance of privacy management. Secondly, developers of such

124

S. Bouraga et al.

solutions should pay attention to the various axes of the classification framework. Their solutions should protect the Whole profile (What). They should prevent any privacy breach (When), and should do so wholly (Binary—How). The decision regarding the information disclosure should be made by both the user and the system, i.e. both parties should be involved in the decision (Hybrid—Who). The sensitive information should be protected from other users, attackers and OSN providers (Against whom). Finally, the solutions should protect both the user and her contacts (For whom).
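The per-category counts quoted in the discussion can be cross-checked against the tables by expanding the range notation in each entry. The helper below is a minimal sketch, and its name (`expand_citations`) is hypothetical. It keeps repeated numbers, such as the two occurrences of 127 in the first What-axis list, as separate entries. Counted this way, the Binary entry of Table 9 gives the 75 papers reported in the text, whereas counting only distinct numbers would give smaller totals for lists that contain repeats.

```python
import re
from typing import List


def expand_citations(cell: str) -> List[int]:
    """Expand a table entry such as "[1, 4, 7–10, 24, 24]" into reference numbers.

    Ranges may use an en dash or a hyphen, with stray spaces (e.g. "46– 54").
    Repeated numbers are kept, so each occurrence counts as one entry.
    """
    numbers: List[int] = []
    for token in cell.strip().strip("[]").split(","):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"(\d+)\s*[–-]\s*(\d+)", token)
        if match:
            first, last = int(match.group(1)), int(match.group(2))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(token))
    return numbers


# Binary entry of Table 9 (How axis).
binary = ("[1, 4, 7, 9, 10, 12, 13, 16, 17, 19, 20, 23–26, 28, 30, 34, 36, 37, 39, "
          "42, 44, 46–54, 56–58, 60–62, 65–69, 75, 77–80, 82–95, 97–99, 101, 102, "
          "105, 111, 113, 114, 118, 122, 123, 127]")
print(len(expand_citations(binary)))  # 75
```

Wrapping the result in `set(...)` before taking its length counts distinct reference numbers instead.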

6 Conclusion

This paper provides a general SMS of privacy in OSNs, covering 345 publications and analyzing them along multiple facets. Eight RQs guided both the search for and the screening of existing papers. This work also offers a classification framework (and its application) for the fine-grained analysis of privacy mechanisms. The SMS and the application of the classification framework offer insights into the existing literature, showing the areas where extensive efforts have been made and highlighting those where publications are scarce. Specifically, the results can be summarized as follows.

The SMS shows that:
– Many studies focused on the following aspects: (i) Design and Action, Analyzing, and Explaining (Theoretical contribution); (ii) Evaluation research and Proposal of solution (RE Research papers); and (iii) Privacy mechanism, Psychological aspect, and Privacy and security challenges (Topic).
– Researchers seeking to make a contribution could focus on Validation papers and/or address the ethical aspect of privacy.

The application of the classification framework shows that:
– Many studies focused on the following aspects: (1) the Whole profile and the Posts (What); (2) Prevention (When); (3) Binary (How); (4) Hybrid input (Who); (5) Other users (Against whom); (6) Self (For whom).
– Researchers seeking to make a contribution could focus on a solution protecting the user against Adversaries, Third-Party applications or OSN providers.
– There is a gap in the management of privacy related to disease control in OSNs.
– New solutions, including solutions for disease control, should take these findings into account, i.e., developers should pay attention to the various axes of the classification framework when designing their solution.

Finally, a limitation of this study is that it focuses solely on the technical aspects of solutions. However, we believe that cultural and sociodemographic differences could influence what constitutes a suitable solution and, by extension, its adoption by users. Some studies on this question have been carried out, e.g., by the Pew Research Center for the US,¹ and the topic has also been discussed in the news. Hence, we believe that governments and institutions should take these cultural differences into account when designing new solutions.
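The six axes of the classification framework, together with the design recommendations stated before this Conclusion, can be encoded as a small record type so that a candidate solution is checked axis by axis. The sketch below is illustrative only: all class, enum and function names are hypothetical, and two readings are assumptions rather than the authors' rules. It treats a Both value on the When axis as satisfying "prevent any privacy breach", and it accepts any of the three Hybrid values on the Who axis as meeting the requirement that user and system decide jointly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class What(Enum):
    WHOLE_PROFILE = "Whole profile"
    POSTS = "Posts"
    IMAGES = "Images"
    RELATIONSHIPS = "Relationships"
    SHARED_DATA = "Shared data"
    PROFILE_ATTRIBUTES = "Profile attributes"
    IDENTITY = "Identity of the user"
    LOCATION = "Location"


class When(Enum):
    PREVENTION = "Prevention"
    DETECTION = "Detection"
    BOTH = "Both"


class How(Enum):
    BINARY = "Binary"
    CONTINUOUS = "Continuous"
    HYBRID = "Hybrid"


class Who(Enum):
    USER = "User"
    SYSTEM = "System"
    HYBRID_INPUT = "Hybrid input"
    HYBRID_OUTPUT = "Hybrid output"
    HYBRID_INPUT_OUTPUT = "Hybrid input and output"


class Agent(Enum):  # values of the "Against whom" axis
    OTHER_USERS = "Other users"
    ATTACKERS = "Attackers"
    OSN_PROVIDERS = "OSN providers"
    THIRD_PARTY_APPS = "Third-party applications"


class ForWhom(Enum):
    SELF = "Self"
    OTHERS = "Others"
    BOTH = "Both"


@dataclass(frozen=True)
class ClassifiedSolution:
    what: FrozenSet[What]
    when: When
    how: How
    who: Who
    against_whom: FrozenSet[Agent]
    for_whom: ForWhom


def unmet_recommendations(s: ClassifiedSolution) -> List[str]:
    """Return the axes on which a solution departs from the recommendations."""
    unmet = []
    if What.WHOLE_PROFILE not in s.what:
        unmet.append("What")
    if s.when not in (When.PREVENTION, When.BOTH):  # assumption: Both includes prevention
        unmet.append("When")
    if s.how is not How.BINARY:
        unmet.append("How")
    if s.who in (Who.USER, Who.SYSTEM):  # assumption: any Hybrid value is acceptable
        unmet.append("Who")
    required = {Agent.OTHER_USERS, Agent.ATTACKERS, Agent.OSN_PROVIDERS}
    if not required <= s.against_whom:
        unmet.append("Against whom")
    if s.for_whom is not ForWhom.BOTH:
        unmet.append("For whom")
    return unmet


# A solution matching the most common profile found in the survey.
typical = ClassifiedSolution(
    what=frozenset({What.POSTS}), when=When.PREVENTION, how=How.BINARY,
    who=Who.HYBRID_INPUT, against_whom=frozenset({Agent.OTHER_USERS}),
    for_whom=ForWhom.SELF)
print(unmet_recommendations(typical))  # ['What', 'Against whom', 'For whom']
```

Using enums rather than free strings keeps each axis limited to the values that appear in the tables, so a misspelled category fails immediately instead of silently counting as "unmet".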

References

1. AbdulKader H, ElAbd E, Ead W (2016) Protecting online social networks profiles by hiding sensitive data attributes. Procedia Comput Sci 82:20–27
2. Abo MEM, Raj RG, Qazi A, Zakari A (2019) Sentiment analysis for Arabic in social media network: A systematic mapping study. Preprint arXiv:1911.05483
3. Aghasian E, Garg S, Gao L, Yu S, Montgomery J (2017) Scoring users’ privacy disclosure across multiple online social networks. IEEE Access 5:13118–13130
4. Aghasian E, Garg S, Montgomery J (2018) A privacy-enhanced friending approach for users on multiple online social networks. Computers 7(3):42
5. Ahmad AR, Murad HR (2020) The impact of social media on panic during the covid-19 pandemic in iraqi kurdistan: online questionnaire study. J Med Int Res 22(5):e19556
6. Alarcón CN, Sepúlveda AU, Valenzuela-Fernández L, Gil-Lafuente J (2018) Systematic mapping on social media and its relation to business. Eur Res Manag Busin Econ 24(2):104–113
7. Ali S, Solehria SAS (2013) User interaction based framework for protecting user privacy in online social networks. In: ICISO 2013, p 82
8. Almasoud SK, Almogren A, Hassan MM, Alrassan I (2018) An efficient approach of improving privacy and security in online social networks. Concurr Comput Pract Exp 30(5):e4272
9. Alsalibi BA, Zakaria N (2013) CFPRS: collaborative filtering privacy recommender system for online social networks. J Eng Res Appl 3(5):1850–1858
10. Amrutha P, Sathiyaraj R (2013) Privacy management of multi user environment in online social networks (OSNs). Global Journal of Computer Science and Technology
11. Ananthula S, Abuzaghleh O, Alla NB, Chaganti S, Kaja P, Mogilineedi D (2015) Measuring privacy in online social networks. Int J Sec Privacy Trust Manag 4(2):1–9
12. Anuradha P, Srinivas Y, Prasad MK (2015) A frame work for preserving privacy in social media using generalized gaussian mixture model. Int J Adv Comp Sci Appl 6:68–71
13. Bahri L, Carminati B, Ferrari E, Lucia W (2016) Lamp-label-based access-control for more privacy in online social networks. In: IFIP International conference on information security theory and practice. Springer, Berlin, pp 171–186
14. Beato F, Halunen K, Mennink B (2016) Recipient privacy in online social networks (short paper). In: International workshop on security. Springer, Berlin, pp 254–264
15. Bélanger F, Crossler RE (2011) Privacy in the digital age: a review of information privacy research in information systems. MIS Quart 35(4):1017–1042
16. Boonkrong S (2013) A step towards a solution to information privacy problem on online social networks. GSTF J Comput (JoC) 2(4):139
17. Boshrooyeh ST, Küpçü A, Özkasap Ö (2018) Ppad: privacy preserving group-based advertising in online social networks. In: 2018 IFIP networking conference (IFIP networking) and workshops. IEEE, Piscataway, pp 1–9
18. Boyd DM, Ellison NB (2007) Social network sites: definition, history, and scholarship. J Comput-Med Commun 13(1):210–230

¹ https://www.pewresearch.org/fact-tank/2020/05/04/how-americans-see-digital-privacy-issues-amid-the-covid-19-outbreak/

19. Caviglione L, Coccoli M, Merlo A (2013) A graph-based approach to model privacy and security issues of online social networks. In: Social network engineering for secure web data and services. IGI Global, Hershey, pp 184–205
20. Cheng Y, Park J, Sandhu R (2013) Preserving user privacy from third-party applications in online social networks. In: Proceedings of the 22nd international conference on world wide web, pp 723–728
21. Cheng C, Zhang Ch, Yang J (2014) Background knowledge based privacy metric model for online social networks. J China Univer Posts Telecommun 21(2):75–82
22. Clarke R (1999) Internet privacy concerns confirm the case for intervention. Commun ACM 42(2):60–67
23. De SJ, Imine A (2018) To reveal or not to reveal: balancing user-centric social benefit and privacy in online social networks. In: Proceedings of the 33rd annual ACM symposium on applied computing, pp 1157–1164
24. De Salve A, Mori P, Ricci L (2015) A privacy-aware framework for decentralized online social networks. In: Database and expert systems applications. Springer, Berlin, pp 479–490
25. De Salve A, Mori P, Ricci L, Al-Aaridhi R, Graffi K (2016) Privacy-preserving data allocation in decentralized online social networks. In: IFIP international conference on distributed applications and interoperable systems. Springer, Berlin, pp 47–60
26. De Salve A, Di Pietro R, Mori P, Ricci L (2017) A logical key hierarchy based approach to preserve content privacy in decentralized online social networks. IEEE Trans Dependable Secure Computing 17:2–21
27. de Souza JV, Gomes Jr J, de Souza Filho FM, de Oliveira Julio AM, de Souza JF (2020) A systematic mapping on automatic classification of fake news in social media. Soc Netw Analy Mining 10(1):1–21
28. Fu Y, Wang Y, Peng W (2014) Commonfinder: a decentralized and privacy-preserving common-friend measurement method for the distributed online social networks. Comput Netw 64:369–389
29. Gao T, Li F, Chen Y, Zou X (2017) Preserving local differential privacy in online social networks. In: International conference on wireless algorithms, systems, and applications. Springer, Berlin, pp 393–405
30. García-Recuero Á, Burdges J, Grothoff C (2016) Privacy-preserving abuse detection in future decentralised online social networks. In: Data privacy management and security assurance. Springer, Berlin, pp 78–93
31. Garg H, Chauhan A, Bhatia M, Sethi G, Chauhan G, et al (2021) Role of mass media and it’s impact on general public during coronavirus disease 2019 pandemic in north India: an online assessment. Indian J Med Sci 73:1–5
32. Georgiou T, El Abbadi A, Yan X (2017) Privacy cyborg: towards protecting the privacy of social media users. In: 2017 IEEE 33rd international conference on data engineering (ICDE). IEEE, Piscataway, pp 1395–1396
33. Georgiou T, El Abbadi A, Yan X (2017) Privacy-preserving community-aware trending topic detection in online social media. In: IFIP annual conference on data and applications security and privacy. Springer, Berlin, pp 205–224
34. Ghemri L (2015) A user centered approach to managing privacy in online social networks. In: InSITE 2015: informing science + IT education conferences: USA, pp 187–199
35. Gregor S (2006) The nature of theory in information systems. MIS Quart 30:611–642
36. Guo L, Zhang C, Fang Y (2014) A trust-based privacy-preserving friend recommendation scheme for online social networks. IEEE Trans Dependable Secure Comput 12(4):413–427
37. Hasan MR (2013) Emergence of privacy conventions in online social networks. In: Proceedings of the 2013 international conference on autonomous agents and multi-agent systems, pp 1433–1434
38. Hu D, Chen F, Wu X, Zhao Z (2016) A framework of privacy decision recommendation for image sharing in online social networks. In: 2016 IEEE first international conference on data science in cyberspace (DSC). IEEE, Piscataway, pp 243–251

39. Hwang RH, Hsueh YL, Wu JJ, Huang FH (2016) Socialhide: a generic distributed framework for location privacy protection. J Netw Comput Appl 76:87–100
40. Imran-Daud M, Sánchez D, Viejo A (2016) Privacy-driven access control in social networks by means of automatic semantic annotation. Comput Commun 76:12–25
41. Joshi P, Kuo CCJ (2011) Security and privacy in online social networks: a survey. In: 2011 IEEE international conference on multimedia and expo. IEEE, Piscataway, pp 1–6
42. Kafalı Ö, Günay A, Yolum P (2014) Detecting and predicting privacy violations in online social networks. Distrib Parall Databases 32(1):161–190
43. Kang GJ, Ewing-Nelson SR, Mackey L, Schlitt JT, Marathe A, Abbas KM, Swarup S (2017) Semantic network analysis of vaccine sentiment in online social media. Vaccine 35(29):3621–3638
44. Kaosar M, Mamun Q (2014) Privacy-preserving interest group formation in online social networks (OSNs) using fully homomorphic encryption. J Inf Privacy Security 10(1):44–52
45. Kayes I, Iamnitchi A (2017) Privacy and security in online social networks: a survey. Online Soc Netw Media 3:1–21
46. Keküllüoğlu D, Kökciyan N, Yolum P (2016) Strategies for privacy negotiation in online social networks. In: Proceedings of the 1st international workshop on AI for privacy and security, pp 1–8
47. Kekulluoglu D, Kokciyan N, Yolum P (2018) Preserving privacy as social responsibility in online social networks. ACM Trans Int Technol 18(4):1–22
48. Kepez B, Yolum P (2016) Learning privacy rules cooperatively in online social networks. In: Proceedings of the 1st international workshop on AI for privacy and security, pp 1–4
49. Kökciyan N, Yolum P (2014) Commitment-based privacy management in online social networks. In: First international workshop on multiagent foundations of social computing at AAMAS
50. Kökciyan N, Yolum P (2016) Priguard: A semantic approach to detect privacy violations in online social networks. IEEE Trans Knowl Data Eng 28(10):2724–2737
51. Kökciyan N, Yolum P (2016) Priguardtool: A tool for monitoring privacy violations in online social networks. In: AAMAS, pp 1496–1497
52. Kökciyan N, Yaglikci N, Yolum P (2017) An argumentation approach for resolving privacy disputes in online social networks. ACM Trans Int Technol 17(3):1–22
53. Li J, Yan H, Liu Z, Chen X, Huang X, Wong DS (2015) Location-sharing systems with enhanced privacy in mobile online social networks. IEEE Sys J 11(2):439–448
54. Li C, Palanisamy B, Joshi J (2016) Socialmix: supporting privacy-aware trusted social networking services. In: 2016 IEEE international conference on web services (ICWS). IEEE, Piscataway, pp 115–122
55. Li X, Yang Y, Chen Y, Niu X (2018) A privacy measurement framework for multiple online social networks against social identity linkage. Appl Sci 8(10):1790
56. Lin YH, Wang CY, Chen WT (2014) A content privacy-preserving protocol for energy-efficient access to commercial online social networks. In: 2014 IEEE international conference on communications (ICC). IEEE, Piscataway, pp 688–694
57. Liu Z, Li J, Chen X, Li J, Jia C (2013) New privacy-preserving location sharing system for mobile online social networks. In: 2013 eighth international conference on P2P, parallel, grid, cloud and internet computing. IEEE, Piscataway, pp 214–218
58. Liu Z, Luo D, Li J, Chen X, Jia C (2016) N-mobishare: new privacy-preserving location-sharing system for mobile online social networks. Int J Comput Math 93(2):384–400
59. Löchner M, Dunkel A, Burghardt D (2018) A privacy-aware model to process data from location-based social media. In: VGI geovisual analytics workshop, colocated with BDVA 2018
60. Ma X, Ma J, Li H, Jiang Q, Gao S (2018) Armor: a trust-based privacy-preserving framework for decentralized friend recommendation in online social networks. Future Gener Comput Syst 79:82–94
61. Maheswaran J, Jackowitz D, Wolinsky DI, Wang L, Ford B (2014) Crypto-book: Bootstrapping privacy preserving online identities from social networks. Preprint arXiv:1406.4053

62. Mester Y, Kökciyan N, Yolum P (2015) Negotiating privacy constraints in online social networks. In: International workshop on multiagent foundations of social computing. Springer, Berlin, pp 112–129
63. Mollema L, Harmsen IA, Broekhuizen E, Clijnk R, De Melker H, Paulussen T, Kok G, Ruiter R, Das E (2015) Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in the Netherlands in 2013. J Med Int Res 17(5):e3863
64. Nascimento AM, Da Silveira DS (2017) A systematic mapping study on using social media for business process improvement. Comp Human Behavior 73:670–675
65. Pace GJ, Pardo R, Schneider G (2016) On the runtime enforcement of evolving privacy policies in online social networks. In: International symposium on leveraging applications of formal methods. Springer, Berlin, pp 407–412
66. Palomar E, González-Manzano L, Alcaide A, Galán A (2015) Implementing a privacy-enhanced ABC system for online social networks with co-ownership management. github.io
67. Palomar E, González-Manzano L, Alcaide A, Galán Á (2016) Implementing a privacy-enhanced attribute-based credential system for online social networks with co-ownership management. IET Inf Security 10(2):60–68
68. Pang J, Zhang Y (2015) A new access control scheme for facebook-style social networks. Comput Security 54:44–59
69. Patsakis C, Zigomitros A, Papageorgiou A, Galván-López E (2014) Distributing privacy policies over multimedia content across multiple online social networks. Comput Netw 75:531–543
70. Pensa RG (2019) Enhancing privacy awareness in online social networks: a knowledge-driven approach. In: 2018 international workshop on knowledge-driven analytics impacting human quality of life (KDAH 2018), CEUR-WS, vol 2482, pp 1–2
71. Pensa RG, Di Blasi G (2016) A centrality-based measure of user privacy in online social networks. In: 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, Piscataway, pp 1438–1439
72. Pensa RG, Di Blasi G (2016) A semi-supervised approach to measuring user privacy in online social networks. In: International conference on discovery science. Springer, Berlin, pp 392–407
73. Pensa RG, Di Blasi G (2017) A privacy self-assessment framework for online social networks. Expert Sys Appl 86:18–31
74. Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: 12th international conference on evaluation and assessment in software engineering (EASE) 12, pp 1–10
75. Pramanik MI, Hasan MR, Karmaker B, Alom T (2014) An art of location privacy on social media. Int J Sci Eng Res 5(12):1115–1122
76. Preciado VM, Sahneh FD, Scoglio C (2013) A convex framework for optimal investment on disease awareness in social networks. In: 2013 IEEE global conference on signal and information processing. IEEE, Piscataway, pp 851–854
77. Raji F, Miri A, Jazi MD (2013) Cp2: Cryptographic privacy protection framework for online social networks. Comput Electr Eng 39(7):2282–2298
78. Raji F, Miri A, Davarpanah Jazi M (2014) A centralized privacy-preserving framework for online social networks. ISeCure-The ISC Int J Inf Security 6(1):35–52
79. Ratikan A, Shikida M (2014) Privacy protection based privacy conflict detection and solution in online social networks. In: International conference on human aspects of information security, privacy, and trust. Springer, Berlin, pp 433–445
80. Razzaq N (2013) Securewall-a framework for fine-grained privacy control in online social networks. Int J Inf Technol Model Comput 1:51–72
81. Salathé M, Khandelwal S (2011) Assessing vaccination sentiments with online social media: implications for infectious disease dynamics and control. PLoS Comput Biol 7(10):e1002199
82. Samanthula BK, Cen L, Jiang W, Si L (2015) Privacy-preserving and efficient friend recommendation in online social networks. Trans Data Privacy 8(2):141–171

83. Saxena A, Jain I, Gorantla MC (2013) An integrated framework for enhancing privacy in online social networks. In: Proceedings of the 6th ACM India computing convention, pp 1–6
84. Schwittmann L, Boelmann C, Wander M, Weis T (2013) Sonet–privacy and replication in federated online social networks. In: 2013 IEEE 33rd international conference on distributed computing systems workshops. IEEE, Piscataway, pp 51–57
85. Shen N, Yuan K, Yang J, Jia C (2014) B-mobishare: Privacy-preserving location sharing mechanism in mobile online social networks. In: 2014 ninth international conference on broadband and wireless computing, communication and applications. IEEE, Piscataway, pp 312–316
86. Singh I, Akhoondi M, Arslan MY, Madhyastha HV, Krishnamurthy SV (2015) Resource efficient privacy preservation of online social media conversations. In: International conference on security and privacy in communication systems. Springer, Berlin, pp 233–255
87. Srivastava A, Geethakumari G (2014) A privacy settings recommender system for online social networks. In: International conference on recent advances and innovations in engineering (ICRAIE-2014). IEEE, Piscataway, pp 1–6
88. Stern T, Kumar N (2014) Improving privacy settings control in online social networks with a wheel interface. J Assoc Inf Sci Technol 65(3):524–538
89. Such JM, Criado N (2014) Adaptive conflict resolution mechanism for multi-party privacy management in social media. In: Proceedings of the 13th workshop on privacy in the electronic society, pp 69–72
90. Such JM, Criado N (2016) Resolving multi-party privacy conflicts in social media. IEEE Trans Knowl Data Eng 28(7):1851–1863
91. Such JM, Rovatsos M (2016) Privacy policy negotiation in social media. ACM Trans Auton Adapt Sys 11(1):1–29
92. Sun W, Zhou J, Lyu R, Zhu S (2016) Processing-aware privacy-preserving photo sharing over online social networks. In: Proceedings of the 24th ACM international conference on multimedia, pp 581–585
93. Sun G, Xie Y, Liao D, Yu H, Chang V (2017) User-defined privacy location-sharing system in mobile online social networks. J Netw Comput Appl 86:34–45
94. Tang C, Cai C (2017) Verifiable mobile online social network privacy-preserving location sharing scheme. Concurrency Comput Pract Exp 29(24):e4238
95. Tayeb S, Week A, Yee J, Carrera M, Edwards K, Murray-Garcia V, Marchello M, Zhan J, Pirouz M (2018) Toward metadata removal to preserve privacy of social media users. In: 2018 IEEE 8th annual computing and communication workshop and conference (CCWC). IEEE, Piscataway, pp 287–293
96. Tian H, Zhong B, Shen H (2014) Diffusion wavelet-based analysis on traffic matrices by different diffusion operators. Comput Electr Eng 40(6):1874–1882
97. Tierney M, Spiro I, Bregler C, Subramanian L (2013) Cryptagram: photo privacy for online social media. In: Proceedings of the first ACM conference on online social networks, pp 75–88
98. Tomy S, Pardede E (2016) Controlling privacy disclosure of third party applications in online social networks. Int J Web Inf Sys 12:215–241
99. Tonge A, Caragea C (2015) Privacy prediction of images shared on social media sites using deep features. Preprint arXiv:1510.08583
100. Tucker R, Tucker C, Zheng J (2015) Privacy pal: improving permission safety awareness of third party applications in online social networks. In: 2015 IEEE 17th international conference on high performance computing and communications, 2015 IEEE 7th international symposium on cyberspace safety and security, and 2015 IEEE 12th international conference on embedded software and systems. IEEE, Piscataway, pp 1268–1273
101. Ulusoy O (2018) Collaborative privacy management in online social networks. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, international foundation for autonomous agents and multiagent systems, pp 1788–1790

102. Ulusoy O, Yolum P (2018) Pano: privacy auctioning for online social networks. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, international foundation for autonomous agents and multiagent systems, pp 2103–2105
103. Valliyammai C, Bhuvaneswari A (2019) Semantics-based sensitive topic diffusion detection framework towards privacy aware online social networks. Cluster Comput 22(1):407–422
104. Viejo A, Sánchez D (2016) Enforcing transparent access to private content in social networks by means of automatic sanitization. Exp Sys Appl 62:148–160
105. Wang Z, Minsky NH (2015) A novel, privacy preserving, architecture for online social networks. EAI Endorsed Trans Collabor Comput 1(5):e3
106. Wang S, Sinnott RO (2016) Supporting geospatial privacy-preserving data mining of social media. Soc Netw Analy Mining 6(1):109
107. Wang S, Sinnott RO (2017) Protecting personal trajectories of social media users through differential privacy. Comput Security 67:142–163
108. Wang Y, Leon PG, Scott K, Chen X, Acquisti A, Cranor LF (2013) Privacy nudges for social media: an exploratory facebook study. In: Proceedings of the 22nd international conference on world wide web, pp 763–770
109. Wang S, Sinnott R, Nepal S (2018) Privacy-protected statistics publication over social media user trajectory streams. Future Gener Comput Syst 87:792–802
110. Wang Y, Yang L, Chen X, Zhang X, He Z (2018) Enhancing social network privacy with accumulated non-zero prior knowledge. Inf Sci 445:6–21
111. Wang H, He D, Yu J (2019) Privacy-preserving incentive and rewarding scheme for crowd computing in social media. Inf Sci 470:15–27
112. Wieringa R, Maiden N, Mead N, Rolland C (2006) Requirements engineering paper classification and evaluation criteria: a proposal and a discussion. Req Eng 11(1):102–107
113. Xiao X, Chen C, Liu X, Hu G, Jiang Y (2017) Privacy-preserving location sharing system with client/server architecture in mobile online social network. Int J Comput Inf Eng 11(2):200–206
114. Xiao X, Chen C, Sangaiah AK, Hu G, Ye R, Jiang Y (2018) Cenlocshare: a centralized privacy-preserving location-sharing system for mobile online social networks. Future Gener Comput Syst 86:863–872
115. Xu K, Guo Y, Guo L, Fang Y, Li X (2015) My privacy my decision: control of photo sharing on online social networks. IEEE Trans Depend Secure Comput 14(2):199–210
116. Yang M, Yu Y, Bandara AK, Nuseibeh B (2014) Adaptive sharing for online social networks: a trade-off between privacy risk and social benefit. In: 2014 IEEE 13th international conference on trust, security and privacy in computing and communications. IEEE, Piscataway, pp 45–52
117. Yang D, Qu B, Cudré-Mauroux P (2018) Privacy-preserving social media data publishing for personalized ranking-based recommendation. IEEE Trans Knowl Data Eng 31(3):507–520
118. Ye A, Chen Q, Xu L, Wu W (2018) The flexible and privacy-preserving proximity detection in mobile social network. Future Gener Comput Sys 79:271–283
119. Zave P, Jackson M (1997) Four dark corners of requirements engineering. ACM Trans Softw Eng Methodol 6(1):1–30
120. Zayet T, Ismail MA, Varathan KD, Noor R, Chua HN, Lee A, Low YC, Singh SKJ (2021) Investigating transportation research based on social media analysis: a systematic mapping review. Scientometrics 126(8):6383–6421
121. Zeng Y, Sun Y, Xing L, Vokkarane V (2014) Trust-aware privacy evaluation in online social networks. In: 2014 IEEE international conference on communications (ICC). IEEE, Piscataway, pp 932–938
122. Zhang L, Guo Y, Chen X (2013) Patronus: augmented privacy protection for resource publication in online social networks. In: 2013 IEEE seventh international symposium on service-oriented system engineering. IEEE, Piscataway, pp 578–583
123. Zhang S, Lin Y, Liu Q, Jiang J, Yin B, Choo KKR (2017) Secure hitch in location based social networks. Comput Commun 100:65–77

124. Zhang J, Sun J, Zhang R, Zhang Y, Hu X (2018) Privacy-preserving social media data outsourcing. In: IEEE INFOCOM 2018-IEEE conference on computer communications. IEEE, Piscataway, pp 1106–1114
125. Zhang S, Li X, Liu H, Lin Y, Sangaiah AK (2018) A privacy-preserving friend recommendation scheme in online social networks. Sustain Cities Soc 38:275–285
126. Zheleva E, Getoor L (2011) Privacy in social networks: a survey. In: Social network data analytics. Springer, Berlin, pp 277–306
127. Zheng Y, Wang B, Lou W, Hou YT (2015) Privacy-preserving link prediction in decentralized online social networks. In: European symposium on research in computer security. Springer, Berlin, pp 61–80
128. Zhou YS, Peng EW, Guo CQ (2016) A random-walk based privacy-preserving access control for online social networks. Int J Adv Comput Sci Appl 7(2):74–79

Beyond Influence Maximization: Volume Maximization in Social Networks

Abhinav Choudhury, Shruti Kaushik, and Varun Dutt

Abstract The health crisis brought about by Covid-19 has heightened the need for accurate information dissemination to counter the prevalence of fake news and other misinformation. Doctors are the most reliable source of information regarding patients’ health status, disease, treatment options, or necessary lifestyle changes. Prior research has tackled the problem of influence maximization (IM), which seeks to identify the most influential physicians within a physician social network. However, less research has addressed the problem of volume maximization (VM), which deals with finding the set of physicians that maximizes the combined volume (e.g., medicine prescribed) and influence (i.e., information disseminated). The primary objective of this work is to address the VM problem by proposing three frameworks: a reinforcement learning (RL) framework, an Instance-Based Learning (IBL) framework, and a heuristic framework called the Cost-Effective Lazy Forward (CELF)-volume algorithm, a variant of the popular CELF algorithm. We compared the proposed algorithms with a Weighted-greedy algorithm and with the prefix excluding maximum influence arborescence (PMIA) IM algorithm. We tested the different algorithms on the physicianSN dataset, a physician social network with 181 nodes and 19,026 edges. Results revealed that the CELF-volume algorithm gave an average volume spread increment of 58% compared to the baseline algorithm but an average influence spread increment of only 12%. PMIA, on the other hand, gave the highest average influence spread increment (46%) but an average volume spread increment of only 28%. Q-learning gave an average volume spread increment of 34% and an influence spread increment of 14%. This research highlights the utility of reinforcement learning algorithms for finding critical physicians who can swiftly disseminate critical information to both physicians and patients.
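The CELF family of heuristics named above rests on lazy forward evaluation: under a monotone submodular objective, a node's marginal gain can only shrink as the seed set grows, so gains computed in earlier rounds remain valid upper bounds and most re-evaluations can be skipped. The sketch below shows that lazy greedy loop for the unit-cost case. It is not the chapter's CELF-volume algorithm. The `volume_coverage` objective (a seed set is worth the total prescription volume of the physicians it reaches in one hop) and the toy graph are hypothetical stand-ins, chosen only because such a weighted coverage function is submodular.

```python
import heapq
from typing import Callable, Dict, FrozenSet, Hashable, List, Set, Tuple


def celf(nodes: List[Hashable],
         objective: Callable[[FrozenSet[Hashable]], float],
         k: int) -> Tuple[List[Hashable], float]:
    """Lazy-forward greedy (CELF, unit costs) for a monotone submodular objective."""
    selected: List[Hashable] = []
    value = objective(frozenset())
    # Heap entries: (-marginal_gain, insertion_index, node, seed_set_size_when_computed).
    # The insertion index breaks ties, so nodes themselves are never compared.
    heap = []
    for idx, node in enumerate(nodes):
        gain = objective(frozenset([node])) - value
        heapq.heappush(heap, (-gain, idx, node, 0))

    while len(selected) < k and heap:
        neg_gain, idx, node, computed_at = heapq.heappop(heap)
        if computed_at == len(selected):
            # Gain is fresh for the current seed set; by submodularity every other
            # stored gain is an upper bound, so this node is the greedy choice.
            selected.append(node)
            value += -neg_gain
        else:
            # Stale gain: re-evaluate against the current seed set and re-queue.
            gain = objective(frozenset(selected) | {node}) - value
            heapq.heappush(heap, (-gain, idx, node, len(selected)))
    return selected, value


def volume_coverage(graph: Dict[str, Set[str]], volume: Dict[str, int]):
    """Hypothetical objective: total volume of the seeds plus their direct neighbours."""
    def objective(seeds: FrozenSet[str]) -> int:
        reached = set(seeds)
        for s in seeds:
            reached |= graph.get(s, set())
        return sum(volume[n] for n in reached)
    return objective


# Toy physician network (edges point from a physician to those they influence).
graph = {"A": {"B", "C"}, "B": {"C"}, "C": set(), "D": {"E"}, "E": set()}
volume = {"A": 5, "B": 3, "C": 4, "D": 2, "E": 6}
seeds, total = celf(list(graph), volume_coverage(graph, volume), k=2)
print(seeds, total)  # ['A', 'D'] 20
```

Because a stale entry is re-evaluated only when it reaches the top of the heap, this loop returns the same seeds as plain greedy on a submodular objective while typically calling the objective far fewer times; a cost-aware variant would rank candidates by gain per unit cost instead.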

A. Choudhury · S. Kaushik · V. Dutt
School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_7

Keywords Volume maximization · Influence maximization · Reinforcement learning · Instance-based learning · Influence spread · Volume spread · Physicians · Social networks

1 Introduction
Doctors are the most reliable sources of information regarding patients' health, disease, treatment options, or necessary lifestyle changes [1]. Although many easily accessible medical information sources exist, such as websites, discussion forums, and support groups, doctors remain the most trusted source of information [2]. Moreover, information found online may be incorrect or outdated, or commercial websites might predominate the search results [3]. Therefore, doctor-patient interactions play a significant role in knowledge transfer and in the dissemination of information to patients. The World Health Organization (WHO) has recommended a doctor–population ratio of 1:1000; however, over 44% of WHO member states report having fewer than one physician per 1000 people [4]. Thus, it is imperative to identify opinion leaders or critical physicians in the healthcare sector for proper dissemination of healthcare information and medications to patients. Prior research has shown that opinion leaders in medical communities effectively speed up the adoption of clinical guidelines by other physicians [5]. However, many opinion leaders may be from academia or may occupy administrative positions [6] and, as such, may not interact frequently with patients. This research measures the number of interactions between patients and physicians by the number of scripts prescribed by each physician. Each script is prescribed to a single patient. Therefore, a higher volume of scripts prescribed by a physician signifies a higher number of visits by different patients. With more patient visits, physicians interact with more patients and can therefore disseminate critical healthcare information swiftly and appropriately. Furthermore, as per [7], influential physicians may not be high-volume prescribers.
Therefore, opinion leaders in the medical community, though possibly influential among other physicians, may not interact frequently with patients. Overall, it is critical to identify physicians who are both influential and interact frequently with patients, so that healthcare information reaches physicians and patients alike. Influence Maximization (IM) formulates the identification of critical physicians as an algorithmic problem [8]. IM is the problem of finding an initial set of physicians (also termed the seed set) to expose to a piece of information such that the number of physicians in a physician social network who become aware of it is maximized [8, 9]. This set of physicians can be termed influential or critical physicians, who can influence other physicians in the network. The IM problem is NP-hard, but it can be approximated efficiently with a greedy approach [8, 9]. An extension to the IM problem is the weighted IM problem, where a non-negative weight is associated with each node, capturing the importance
of the node [8]. In the weighted IM problem, we select the nodes with the highest weights, where weights may signify the average number of scripts prescribed by a physician. However, in the volume maximization (VM) problem, we need to select physicians who are both influential (in the context of other physicians) and who disseminate the most healthcare information and medications among patients (signified by the weight or volume assigned to each physician). Although prior research has investigated the IM and the weighted IM problems [8, 9], no research has yet investigated the VM problem. Researchers have developed greedy heuristic algorithms such as Greedy [9] and CELF [10] to solve the IM problem. These algorithms could be extended to solve the weighted IM problem [8]. However, these algorithms select nodes that maximize either the influence or the weight, but not both simultaneously. To overcome this issue, we propose two frameworks: a reinforcement learning (RL) framework and an Instance-based learning (IBL) framework. These frameworks select nodes (physicians) who are both influential and disseminate the most healthcare information and medications among patients. Moreover, social networks evolve over time as connections are added or removed. As such, most IM algorithms need to be rerun from scratch on the same social network after small network changes. However, the RL and IBL frameworks calculate each node's Q-value or blended value based on the reward function. Thus, by traversing the whole graph, these frameworks may be able to generalize to evolving graphs. Furthermore, to the best of the authors' knowledge, a comparison of RL and IBL frameworks with heuristic methods for solving the VM problem is still lacking in the literature. The primary objective of this research is to bridge the literature gaps highlighted above by proposing the VM problem and different algorithmic solutions to it.
The objective of the VM problem is to select a set of initial users (called the seed set) in a social network such that the combined volume (e.g., medicine prescribed and information disseminated, also referred to as the volume spread) is maximized. To solve this problem algorithmically, we propose two frameworks: an RL [10] framework that develops Q-learning [10, 11] and SARSA [10, 12] models, and an IBL framework [13, 14] that develops an IBL model. Lastly, we also propose the CELF-volume algorithm, a variant of a popular greedy algorithm developed for solving the IM problem. We compare these algorithms with the Weighted-greedy and PMIA [12] algorithms on a large healthcare dataset. In what follows, we first describe the background and related studies on VM. In Sect. 3, we describe the proposed RL and IBL models in detail. In Sect. 4, we compare our models with certain baseline algorithms for solving the VM problem. Finally, in Sect. 5, we discuss the conclusions from our results and their implications for solving the VM problem in social networks.

2 Related Work
Opinion leaders act as gatekeepers for interventions, help change social norms, and accelerate behavior change [15]. They influence the opinions, attitudes, beliefs, motivations, and behavior of others around them [15]. Opinion leaders in medical communities are effective in speeding up the adoption of clinical guidelines [5]. Moreover, the lag between the publication of clinical and healthcare research and the application of its findings is substantial, and it delays healthcare improvement [5]. Therefore, it is important to identify key opinion leaders or critical physicians in the healthcare sector for proper and faster dissemination of healthcare information. Patient-physician communication is an integral part of clinical practice [17]. Patients who understand and communicate properly with their doctors are more likely to acknowledge health problems, understand their treatment options, modify their behavior accordingly, and follow their medication schedules [18–21]. Research has shown that effective patient-physician communication can improve a patient's health [18–21]. However, not all opinion leaders may interact with patients. Lewis [6] conducted a study to examine the positional and personal influence of people in health policy in Victoria, Australia. He found that many opinion leaders were from academia or occupied administrative positions, such as healthcare researchers or deans of medicine in a hospital. Therefore, these opinion leaders did not have frequent interactions with patients. Furthermore, Choudhury, Kaushik, and Dutt [7] created a physician social network using data from more than 30 hospitals in Boston, covering more than 2000 physicians together with their prescription data and adoption behavior for pain medications. These authors found that influential physicians were not high-volume prescribers.
Thus, one may need to identify physicians who are both influential and spread the most medical information to their patients through medicine prescriptions. IM formulates the identification of critical physicians as an algorithmic problem [8]. The objective of the IM problem is to select a set of k nodes (i.e., users) from the influence graph such that the expected number of influenced users (also called the influence spread, σ) is maximized [8]. The influence spread is a set function defined on subsets of the graph's nodes, i.e., $\sigma: 2^V \to \mathbb{R}_{\ge 0}$, where $2^V$ is the power set of V, i.e., the set of all possible seed sets (of which only those with at most k nodes are considered). The influence of a node in a social network is estimated by the number of nodes it activates (e.g., influences to adopt a product or re-tweet a tweet) during information diffusion in the social network [16]. An extension to the IM problem is the weighted IM problem, where a non-negative weight capturing the importance of the node is associated with each node [8]. In the weighted IM problem, we define the weighted influence function (ω) and try to maximize it. Multiple models of information diffusion have been introduced in the literature [8, 9]. Some of the most widely studied are the linear threshold (LT) model and the independent cascade (IC) model [8]. The IM problem is NP-hard, but under the LT or IC diffusion models, greedy hill-climbing algorithms such as
Greedy and CELF can efficiently approximate a solution within a factor of 1 − 1/e of the optimal solution [8]. The Greedy algorithm starts with an empty set S = ∅. At each step, the node u with the largest marginal influence gain Δ(u | S) = σ(S ∪ {u}) − σ(S) is inserted into S. The process is repeated until the size of the seed set equals k, i.e., |S| = k. Similarly, for the weighted IM problem (where the weight may signify the average number of scripts prescribed), the Weighted-greedy algorithm [8] selects the node with the largest marginal weighted gain Δω(u | S). In a physician social network, the k nodes selected based on marginal influence gain are the critical physicians or opinion leaders. In contrast, in the weighted IM problem, the k nodes are selected based on marginal weighted gain, i.e., they are popular physicians with high patient interaction. However, critical physicians may be influential among other physicians but not among patients, while popular physicians may be popular among patients but not influential among physicians. In this research, we try to overcome these limitations by proposing the volume maximization (VM) problem and different algorithmic solutions to it. The objective of the VM problem is to select a set of k initial users (the seed set) in a social network such that the combined volume (e.g., medicine prescribed and information disseminated, also referred to as the volume spread) is maximized. For solving the VM problem, we propose two frameworks: an RL framework that develops Q-learning and SARSA models, and a cognitive framework that develops an IBL model. We develop custom reward functions using different centrality measures for the RL and IBL frameworks to select nodes that are both influential and disseminate the most healthcare information and medications among patients.
Lastly, we also propose the CELF-volume algorithm, a variant of a popular greedy algorithm developed for solving the IM problem. We compare these algorithms with the Weighted-greedy and PMIA [12] algorithms on a large healthcare dataset. Furthermore, we compare these models with the degree centrality [22], Diffusion degree, and maximum influence degree [30] heuristics and a RANDOM (baseline) algorithm.
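The greedy hill-climbing procedure described above, and the way the choice of objective (σ for IM versus a volume-weighted ω for weighted IM) changes which seeds it picks, can be illustrated with a short Python sketch. It is not the authors' implementation: to keep it deterministic, it assumes every influence probability equals 1, in which case the IC process reduces to reachability and σ(S) is simply the number of nodes reachable from S. The toy graph, volumes, and helper names (`reachable`, `greedy`) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

Succ = Dict[Hashable, List[Hashable]]  # u -> list of successors v


def reachable(succ: Succ, seeds: Iterable[Hashable]) -> Set[Hashable]:
    """Nodes activated from `seeds` when every p_uv = 1 (deterministic IC)."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        for v in succ.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def greedy(nodes: List[Hashable], f: Callable[[FrozenSet], float], k: int) -> List[Hashable]:
    """Greedy hill-climbing: k times, add the node with the largest marginal gain of f."""
    seeds: List[Hashable] = []
    for _ in range(k):
        current = f(frozenset(seeds))
        candidates = [v for v in nodes if v not in seeds]
        if not candidates:
            break
        # Marginal gain Delta(v | S) = f(S + {v}) - f(S)
        best = max(candidates, key=lambda v: f(frozenset(seeds) | {v}) - current)
        seeds.append(best)
    return seeds


# Hypothetical toy network: node 0 reaches many nodes, node 3 has a large volume.
succ = {0: [1, 2], 1: [], 2: [], 3: [4], 4: []}
volume = {0: 1, 1: 1, 2: 1, 3: 10, 4: 1}


def sigma(S: FrozenSet) -> float:   # influence spread
    return len(reachable(succ, S))


def omega(S: FrozenSet) -> float:   # volume spread
    return sum(volume[v] for v in reachable(succ, S))


print(greedy(list(succ), sigma, 1))  # IM objective picks node 0
print(greedy(list(succ), omega, 1))  # volume-weighted objective picks node 3
```

On this toy graph the IM objective prefers node 0, which reaches the most nodes, while the volume-weighted objective prefers node 3, whose own volume dominates; this is the tension that motivates the VM problem. Plain greedy re-evaluates every candidate in every round, which is the cost CELF reduces.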

3 Method
3.1 Data
We tested the proposed frameworks on a real-world social network dataset, PhysicianSN [23]. Table 1 provides details about the PhysicianSN dataset. It is a physician social network in which the physicians are the nodes and the edges represent the relationships between them. The PhysicianSN dataset was created by using a medical-prescriptions dataset and the Healthcare Organization Services (HCOS) [27] physician-affiliation dataset to extract physicians' attributes. It is a large directed graph with 181 nodes and 19,026 edges, with an average node degree of 105.7 (Table 1). The relationships between physicians were created based on the similarity between physicians' attributes. Moreover, the authors of [23] used real-world prescription logs to calculate the influence probabilities between physicians. In this dataset, the individual volume $w_v$ represents the average number of scripts prescribed by each physician between 1996 and 2017.

Table 1 Dataset used for analyses in this research

Graph         # of nodes   # of edges   Avg. degree   Max. degree
PhysicianSN   181          19,026       105.7         120

3.2 Volume Maximization
Consider a weighted social graph G = (V, E, p, w), where V is the set of vertices, E is the set of edges with an influence probability $p_{u,v}$ attached to each edge, and w assigns an individual volume $w_u$ to each user u; information spreads over G according to a diffusion model M. The VM problem can be defined as follows:

$$S^{*}_{\omega} = \arg\max_{S \subseteq V,\ |S| \le k} \; \omega(S) \cap \sigma(S) \qquad (1)$$

where ω is the volume spread and σ is the influence spread. In volume maximization, we select an initial set of k users (the seed set) in an influence graph such that both the combined volume (e.g., medicine prescribed) of the expected influenced users (the volume spread, ω) and the expected number of influenced users (the influence spread, σ) reached through the seed set are maximized. We modeled the VM problem as a sequential decision-making task and developed RL, IBL, and other frameworks to solve it. In these frameworks, we calculated the objective value (the Q-value in RL or the blended value in IBL) of each node and selected the k nodes with the highest objective values as the seed set. By comparison, the IM problem focuses on selecting a set S of k nodes (i.e., users) from the influence graph G such that the expected number of influenced nodes, σ(S), is maximized. Figure 1 shows a directed social influence graph G = (V, E, p, w) in which every influence probability $p_{u,v} = 1$ and each node u is assigned a weight $w_u$. The objective of the influence maximization problem is to select an initial set S of k nodes (k = 2 in this example) from the vertex set V such that the maximum number of nodes gets influenced. Therefore, as shown in Fig. 1a, most IM algorithms will select nodes 2 and 0, as this pair maximizes the total influence spread σ(S). In contrast, the aim of volume maximization (VM) is to select a set S of k nodes from V such that both the volume spread (ω) and the influence spread (σ) are maximized. Accordingly, VM algorithms should select nodes 1 and 2, as this pair maximizes both the volume and the influence spread (Fig. 1b).

Fig. 1 (a) IM problem (b) VM problem

3.3 Independent Cascade (IC) Diffusion Model
In the IC model [8], when a node (user) u becomes active (i.e., is influenced to perform an action) at time step t, it becomes contagious and tries to influence each neighbor v with probability $p_{u,v}$, the influence probability between u and v. The activated node u has a single chance to activate each of its currently inactive neighbors v. If v has multiple newly activated neighbors, their attempts at activating v are sequenced in an arbitrary order. If u succeeds in activating v at step t, then v becomes active at step t + 1; if u fails, u cannot make further attempts to activate v in subsequent time steps. The process runs until no more activations are possible. Using the IC model, one can calculate the influence spread (σ) of each node, i.e., the expected number of users influenced by that node. Additionally, since each node has a volume (w) attached to it, the volume spread (ω) is defined as the total combined volume of the influenced nodes. Following prior work [7, 25, 26], we used the IC model to calculate the influence spread (σ) and volume spread (ω) of each node.
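The IC process and the two spreads can be estimated by Monte Carlo simulation, as in the minimal sketch below. The adjacency-list format, the function names, and the default number of runs are our assumptions rather than details from the chapter; as in the usual IC formulation, the seeds themselves count as activated.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # u -> [(v, p_uv), ...]


def ic_cascade(graph: Graph, seeds: Iterable[Hashable], rng: random.Random) -> Set[Hashable]:
    """One Independent Cascade run; returns every node that ends up active."""
    active = set(seeds)
    frontier = list(active)                  # nodes activated in the previous step
    while frontier:                          # stop when no new activation happened
        newly_active = []
        for u in frontier:
            for v, p_uv in graph.get(u, []):
                # u gets exactly one chance to activate each still-inactive neighbor v
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def estimate_spreads(graph: Graph, volume: Dict[Hashable, float], seeds: Iterable[Hashable],
                     runs: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of influence spread sigma(S) and volume spread omega(S)."""
    rng = random.Random(seed)
    seeds = list(seeds)
    sigma_total = omega_total = 0.0
    for _ in range(runs):
        active = ic_cascade(graph, seeds, rng)
        sigma_total += len(active)                       # number of influenced users
        omega_total += sum(volume[v] for v in active)    # their combined volume
    return sigma_total / runs, omega_total / runs
```

With all probabilities equal to 1 (as in the Fig. 1 example) a single run is exact; with fractional probabilities, the estimates converge as `runs` grows.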

3.4 Reinforcement Learning Framework
Reinforcement learning (RL) [10] consists of an agent A that interacts with an environment E at each time step t. Figure 2 explains the basic structure of the RL framework. The agent chooses an action $A_t$ from the set of viable actions available at time step t. The environment then moves to a new state $S_{t+1}$, and the agent receives a reward $R_{t+1}$. The goal of reinforcement learning is to maximize the total reward received. In our framework, the environment E is a social network graph whose nodes are the states S; transitions from one state to another are governed by the influence probabilities $p_{u,v}$ between the nodes of the social network. The agent A interacts with environment E and selects an action $A_t$ at time step t. The neighbors of the current state $S_t$ form the set of viable actions for the agent. The agent starts from an initial state $S_0$, selects an action $A_0$, and receives a reward $R_1$. It then transitions to state $S_1$ based on the IC model, i.e., the agent can transition from state $S_t$ to $S_{t+1}$ with probability $p_{S_t, S_{t+1}}$. If the agent is unable to transition to any of the neighboring states, or if the number of hops exceeds a certain limit (termed the "hop limit"), the episode terminates. Each node in the network was used as an initial state, and we ran multiple iterations for each initial state.

Fig. 2 Flow chart of the proposed RL framework
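One episode of the framework described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: an action is the choice of a neighboring node, the move succeeds with that edge's influence probability, a failed move ends the episode, and the reward is a caller-supplied function of the node entered (for example, a reward such as Eq. (8)). `run_episode` and its arguments are hypothetical names.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # u -> [(v, p_uv), ...]
Step = Tuple[Hashable, Hashable, float, Hashable]     # (state, action, reward, next_state)


def run_episode(
    graph: Graph,
    start: Hashable,
    choose_action: Callable[[Hashable, List[Tuple[Hashable, float]]], Tuple[Hashable, float]],
    reward_fn: Callable[[Hashable], float],
    hop_limit: int,
    rng: random.Random,
) -> List[Step]:
    """Roll out one episode on the social graph and return its transitions."""
    trajectory: List[Step] = []
    state = start
    for _ in range(hop_limit):              # stop once the hop limit is reached
        neighbors = graph.get(state, [])
        if not neighbors:                   # no viable action: terminal state
            break
        action, p = choose_action(state, neighbors)   # an action is a neighbor node
        # Assumption: the move succeeds with the IC probability p_{s,a}, and a
        # failed activation ends the episode.
        if rng.random() >= p:
            break
        reward = reward_fn(action)          # e.g., R1 or R2 of the node entered
        trajectory.append((state, action, reward, action))
        state = action
    return trajectory
```

In a learning agent, `choose_action` would be the ε-greedy policy over the current Q-values, and each returned transition would feed one Q-learning or SARSA update.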

3.5 Reward for the RL Framework
3.5.1 Diffusion Degree
The Diffusion degree [30] of a node u is defined as the cumulative contribution of the node itself and of its immediate neighbors to the diffusion of information inside a social network. The Diffusion degree ($C_{DD}$) is defined as follows:

$$C_{DD}(u) = \mathrm{Exp}(u) + \mathrm{Exp}\left(u^{(2)}\right) \qquad (2)$$

where Exp(u) is the expected number of nodes activated (influenced) by u, and $\mathrm{Exp}(u^{(2)})$ is the expected number of nodes activated by the active neighbors of u. Exp(u) is defined as follows:

$$\mathrm{Exp}(u) = \sum_{v \in \Gamma(u)} p_{u,v} \qquad (3)$$

where $p_{u,v}$ is the influence probability that u exerts on v and Γ(u) is the set of nodes in the neighborhood of u. As the diffusion propagates further, the active neighbors v of u will try to activate their own inactive neighbors. Therefore, $\mathrm{Exp}(u^{(2)})$ is defined as follows:

$$\mathrm{Exp}\left(u^{(2)}\right) = \sum_{v \in \Gamma(u)} \left( \sum_{i \in \Gamma(v)} p_{u,v} \times p_{v,i} \right) \qquad (4)$$

where $p_{u,v}$ is the influence probability that u exerts on v, $p_{v,i}$ is the influence probability that v exerts on its neighbor i ∈ Γ(v), and Γ denotes the set of neighbors of a node.
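Equations (2)–(4) translate directly into code. The sketch below assumes a directed graph stored as adjacency lists of (neighbor, $p_{u,v}$) pairs; the function names are ours. Like Eq. (4) as written, it does not exclude u itself or nodes already counted in Exp(u).

```python
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]  # u -> [(v, p_uv), ...]


def exp_one_hop(graph: Graph, u: Hashable) -> float:
    """Eq. (3): expected number of neighbors activated directly by u."""
    return sum(p_uv for _, p_uv in graph.get(u, []))


def exp_two_hop(graph: Graph, u: Hashable) -> float:
    """Eq. (4): expected activations by u's neighbors, weighted by p_uv."""
    return sum(
        p_uv * p_vi
        for v, p_uv in graph.get(u, [])
        for _, p_vi in graph.get(v, [])
    )


def diffusion_degree(graph: Graph, u: Hashable) -> float:
    """Eq. (2): C_DD(u) = Exp(u) + Exp(u^(2))."""
    return exp_one_hop(graph, u) + exp_two_hop(graph, u)
```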

3.5.2 Maximum Influence Degree
The maximum influence degree (MID) [30] of a node v is the total number of nodes reachable from v at different path lengths. A path length is defined as the number of edges in a path from a node u to a node v. If there are multiple paths between u and v, then u has multiple opportunities to activate v through these paths. The adjacency matrix $A = [a_{u,v}]$ is defined as follows:

$$a_{u,v} = \begin{cases} 1, & \text{if there is a link from node } u \text{ to node } v \\ 0, & \text{otherwise} \end{cases}$$

The non-zero elements $a_{u,v}$ in the adjacency matrix A of a graph signify the presence of a 1-length path from u to v. If we multiply the adjacency matrix by itself (A × A), we get the 2nd power of the adjacency matrix, $A^2$. Each non-zero element $a^{(2)}_{u,v}$ in $A^2$ signifies a 2-length path from u to v. Similarly, a non-zero element $a^{(n)}_{u,v}$ in $A^n$ signifies an n-length path from u to v. For a node v in graph G = (V, E), where V is the set of vertices and E is the set of edges, MID may be theoretically defined as:

$$C_{MID}(v) = \sum_{n=1}^{\infty} \alpha_n(v) \qquad (5)$$

where $\alpha_n(v)$ is the number of nodes reachable from v by an n-length path:

$$\alpha_n(v) = \left| \xi_v(n) \right| \qquad (6)$$

and $\xi_v(n)$ contains the nodes corresponding to the non-zero elements in row v of the nth power of the adjacency matrix, $A^n$:

$$\xi_v(n) = \left\{ u \in V \mid a^{(n)}_{v,u} > 0 \right\}, \qquad A^n = \left[ a^{(n)}_{u,v} \right] \qquad (7)$$
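Because Eq. (7) only asks which entries of row v of $A^n$ are positive, $C_{MID}(v)$ can be sketched without forming matrix powers: the set of nodes reachable from v by a path of exactly n edges (as in $A^n$, repeated vertices are allowed) is obtained from the set for n − 1 by following one more edge. The sketch below takes the truncation length (e.g., the diameter d) as a parameter; the successor-list representation and the function name are our assumptions.

```python
from typing import Dict, Hashable, List

Succ = Dict[Hashable, List[Hashable]]  # u -> list of successors v


def max_influence_degree(succ: Succ, v: Hashable, max_len: int) -> int:
    """C_MID(v) of Eqs. (5)-(7), truncated at path length max_len (e.g., diameter d).

    `layer` holds the nodes u with a positive entry in row v of A^n, i.e. the
    nodes reachable from v by a path of exactly n edges, so alpha_n(v) = len(layer).
    """
    total = 0
    layer = {v}                      # row v of A^0 (the identity matrix)
    for _ in range(max_len):
        # Row v of A^n is positive at u iff some w in row v of A^(n-1) has an edge w -> u.
        layer = {u for w in layer for u in succ.get(w, [])}
        if not layer:                # every higher power has an all-zero row v as well
            break
        total += len(layer)          # alpha_n(v)
    return total
```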

For practical purposes, $C_{MID}(v)$ is not computed up to ∞ but only up to d, the diameter of the graph, i.e., the greatest distance between any pair of vertices in the graph. Figure 3 shows a directed graph G, its adjacency matrix A, and the 2nd power of the adjacency matrix, $A^2$. The $A^2$ matrix shows that one can go from node 0 to node 1 (0 → 4 → 1) or from node 0 to node 3 (0 → 1 → 3) in two steps. Therefore, node 0 can activate node 1 either directly or through 0 → 4 → 1 by activating node 4. $C_{MID}(v)$ thus counts, for each path length, the vertices of G that v can reach by a path of that length, and sums these counts. A vertex v can only activate a vertex u if there exists a path from v to u. When the agent goes from state u to state v, the reward $R_{t+1}$ is calculated as follows:

$$R_1(t+1) = C_{DD}(v) + C_{MID}(v) + \omega(v) + \sigma(v) \qquad (8)$$

where $C_{DD}(v)$ is the normalized Diffusion degree, $C_{MID}(v)$ is the normalized maximum influence degree, ω(v) is the normalized volume spread of node v, and σ(v) is the normalized influence spread of node v. $C_{DD}(v)$, $C_{MID}(v)$, and σ(v) measure the influence exerted by a node on other nodes of the graph, while ω(v) measures the volume spread of a node. $C_{DD}(v)$ and $C_{MID}(v)$ were included in the reward function in addition to the influence spread because these centrality measures provide a significant improvement over other existing centrality-based heuristics and even over some greedy algorithms such as PMIA [12] and SGA [31]. This reward function gives importance to nodes that are both influential and prescribe large volumes of medications.

Fig. 3 (a) Social network G (b) Adjacency matrix A of G (c) 2nd power adjacency matrix A² of G

Additionally, we used a second reward function, $R_2$, defined as follows:

$$R_2(t+1) = C_{DD}(v) + \omega(v) + \sigma(v) \qquad (9)$$

In this reward function, we removed $C_{MID}(v)$ because it gave one of the worst results in the VM task. Lastly, the k nodes with the highest Q-values were selected as the solution set, i.e., the set of critical physicians. In the next section, we define the different learning agents used in this paper.
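The reward functions $R_1$ and $R_2$ can be assembled from per-node scores as sketched below. The chapter states only that the components are normalized; min-max scaling to [0, 1] over all nodes is our assumption, as are the function names.

```python
from typing import Dict, Hashable, Tuple

Scores = Dict[Hashable, float]


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1]; min-max scaling is our assumption for 'normalized'."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {n: (s - lo) / span if span else 0.0 for n, s in scores.items()}


def build_rewards(c_dd: Scores, c_mid: Scores, omega: Scores,
                  sigma: Scores) -> Tuple[Scores, Scores]:
    """Per-node rewards for entering node v: R1 (Eq. 8) and R2 (Eq. 9)."""
    dd, mid, om, sg = (min_max(x) for x in (c_dd, c_mid, omega, sigma))
    r1 = {v: dd[v] + mid[v] + om[v] + sg[v] for v in dd}
    r2 = {v: dd[v] + om[v] + sg[v] for v in dd}   # R2 drops the C_MID term
    return r1, r2
```

Normalizing each component first keeps a term with a large raw range (for example, script volumes) from dominating the sum.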

3.6 RL Learning Model
3.6.1 Q-Learning
Q-learning [8, 28] is a model-free, off-policy reinforcement learning algorithm. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations [10]. The core of the Q-learning algorithm is the value iteration update rule defined below:

$$Q^{new}(S_t, A_t) \leftarrow (1 - \alpha)\, Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) \right) \qquad (10)$$

where $Q(S_t, A_t)$ is the state-action value function, α is the learning rate, and γ is the discount factor. The function $Q: S \times A \to \mathbb{R}$ calculates the quality of a state-action pair. The Q-learning algorithm maintains a Q-table with one entry per state-action pair, where the initial Q-values are randomly assigned. At each time step t, the Q-learning model (at state $S_t$) interacts with the environment: it selects an action $A_t$ using the ε-greedy policy [10, 11], observes a reward $R_{t+1}$, and reaches a new state $S_{t+1}$ (based on the action $A_t$ taken from state $S_t$). The algorithm then updates the current value of $Q(S_t, A_t)$ using a greedy policy, $\max_a Q(S_{t+1}, a)$. In an ε-greedy policy, the greedy action (i.e., the action with the highest Q-value) is taken with probability 1 − ε and a random action with probability ε. An episode ends when $S_{t+1}$ is a final (terminal) state. Q-learning is called off-policy because it updates $Q(S_t, A_t)$ using the Q-value of the next state $S_{t+1}$ and the greedy action a, $\max_a Q(S_{t+1}, a)$, while following an ε-greedy policy for action selection. That is, it uses different policies for selecting an action (ε-greedy) and for updating the Q-value (greedy).
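The ε-greedy action selection and the update of Eq. (10) can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the Q-table is a dictionary keyed by (state, action) pairs and is zero-initialized for simplicity (the chapter initializes it randomly), and a terminal next state contributes a future value of 0.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def new_q_table() -> QTable:
    # Simplification: zero-initialized, whereas the chapter initializes Q randomly.
    return defaultdict(float)


def epsilon_greedy(q: QTable, state: Hashable, actions: List[Hashable],
                   epsilon: float, rng: random.Random) -> Hashable:
    """With probability epsilon explore uniformly; otherwise exploit the best Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q: QTable, s: Hashable, a: Hashable, r: float, s_next: Hashable,
                      next_actions: List[Hashable], alpha: float, gamma: float) -> float:
    """Eq. (10): the target uses the greedy value max_a Q(s_next, a) (off-policy)."""
    best_next = max((q[(s_next, a2)] for a2 in next_actions), default=0.0)  # 0 if terminal
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * (r + gamma * best_next)
    return q[(s, a)]
```

In the RL framework of Sect. 3.4 an action is a neighboring node, so `s_next` coincides with the chosen action `a`.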

3.6.2 SARSA
SARSA [10, 12] is a model-free, on-policy reinforcement learning algorithm. Like Q-learning, SARSA does not require a model of the environment. The core of the SARSA algorithm is the update rule defined below:

$$Q^{new}(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right) \qquad (11)$$

where $Q(S_t, A_t)$ is the state-action value function, α is the learning rate, and γ is the discount factor. At each time step t, the SARSA model (currently at state $S_t$) interacts with the environment: it selects an action $A_t$ using the ε-greedy policy, observes a reward $R_{t+1}$, and reaches a new state $S_{t+1}$ (based on the action $A_t$ taken from state $S_t$). It then selects the next action $A_{t+1}$ with the same ε-greedy policy and updates $Q(S_t, A_t)$ using $Q(S_{t+1}, A_{t+1})$. SARSA is called on-policy because the agent updates $Q(S_t, A_t)$ based on the action it actually takes next, i.e., the policy used to select actions (ε-greedy) is the same one used to update the Q-value.
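The SARSA update of Eq. (11) differs from Q-learning only in its target, which uses the action actually selected in the next state. A minimal sketch with a zero-initialized dictionary Q-table (a simplification on our part) follows; `sarsa_update` and its `terminal` flag are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def new_q_table() -> QTable:
    return defaultdict(float)   # zero-initialized for simplicity


def sarsa_update(q: QTable, s: Hashable, a: Hashable, r: float, s_next: Hashable,
                 a_next: Hashable, alpha: float, gamma: float, terminal: bool = False) -> float:
    """Eq. (11): the target uses Q(s_next, a_next), where a_next is the action the
    same epsilon-greedy policy actually chose in s_next (on-policy)."""
    target = r if terminal else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]
```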

3.7 IBL Framework
Instance-based learning (IBL) [13, 14, 31, 32] is a theory of how individuals make decisions from experience in dynamic tasks. In dynamic tasks, individuals make repeated decisions in an attempt to maximize gains over the long run [14]. IBL proposes a key representation of cognitive information: an instance. An instance is a representation of each decision option and often consists of three parts: a situation (a set of attributes that define the option), a decision for one of the many options, and an outcome resulting from making that decision. In reinforcement learning terms, a decision is analogous to an action, situations are represented as states, and the outcome is the reward. The theory proposes a generic decision-making process that starts by recognizing decision situations, generates instances through interaction with the decision task, and finishes with the reinforcement, through feedback, of instances that led to good decision outcomes. The IBL agent creates instances of the form (situation, decision, utility) and stores them in memory. Based on these instances, it then takes the decision that was most profitable in the current situation.

3.7.1 Instance-Based Learning (IBL) Model
The IBL model [31–34] chooses an action based on the blended value V of that action. The blended value [13] of an action j is the sum of all the observed rewards $x_i$ weighted by their probability of retrieval $p_i$:

$$V_j = \sum_{i=1}^{n} p_i x_i \qquad (12)$$

where $x_i$ is the ith observed reward for action j. The observed reward $x_i$ is calculated using Eq. (8), i.e., the same reward as in the RL framework. The probability of retrieval captures the imprecision of recalling past experiences from memory [35]. At any trial t, the probability of retrieval of observed outcome i is a function of the activation of that outcome relative to the activation of all the observed outcomes within that option, given by:

$$P_{i,t} = \frac{e^{A_{i,t}/\tau}}{\sum_{j} e^{A_{j,t}/\tau}} \qquad (13)$$

where τ is a noise (temperature) parameter defined as $\tau = \sigma\sqrt{2}$, and σ is a free noise parameter.

The activation of an outcome in a given trial is a function of the frequency and recency of its occurrence. At each trial t, the activation A of outcome i is given by:

$$A_{i,t} = \sigma \ln\left(\frac{1 - \gamma_{i,t}}{\gamma_{i,t}}\right) + \ln\left(\sum_{t_p \in \{1, \dots, t-1\}} \left(t - t_p\right)^{-d}\right) \qquad (14)$$

where d is a free decay parameter, $\gamma_{i,t}$ is a random draw from a uniform distribution bounded between 0 and 1, and $t_p$ ranges over the previous trials in which outcome i was observed. Lastly, the k nodes with the highest blended values were selected as the solution set, i.e., the set of critical physicians.

3.8 CELF-Volume Algorithm
CELF-volume exploits the submodularity of the volume spread function ω. A function ω(·) is submodular iff ω(S ∪ {v}) − ω(S) ≥ ω(S′ ∪ {v}) − ω(S′) for any S ⊆ S′ ⊆ V and v ∈ V \ S′, where V is the set of nodes in the graph and S and S′ are subsets of V [21]. By submodularity, the marginal gain from adding an element to a set S is at least as high as the marginal gain from adding the same element to a superset of S. In the first iteration, CELF computes Δω(u | ∅) for every node in V, adds the node with the highest Δω(u | ∅) to form $S_1$, and uses this gain as an upper bound for the next iteration. In the next iteration, CELF calculates Δω(u | $S_1$) using Monte Carlo (MC) simulations for the nodes in V \ $S_1$, in descending order of Δω(u | ∅). If the marginal gain of a node falls below the upper bound, the iteration is terminated, the node with the highest Δω(u | $S_1$) is added to form $S_2$, and its gain becomes the upper bound for the next iteration. This process continues until k nodes have been added to the seed set. By calculating marginal gains only for nodes whose previous gain exceeds the upper bound, rather than for all nodes, CELF reduces its computational overhead.
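A standard lazy-forward (CELF) selection loop is sketched below for any monotone submodular set function. It is a generic illustration, not the authors' CELF-volume code: instead of the MC estimate of ω, it uses a deterministic stand-in in which every influence probability is 1, so ω(S) is the total volume of the nodes reachable from S. The stopping rule follows the common formulation (a node is selected once its freshly recomputed gain is still the largest in the priority queue), which may differ in detail from the variant described above; the function names are ours.

```python
import heapq
from typing import Callable, Dict, FrozenSet, Hashable, List, Set

Succ = Dict[Hashable, List[Hashable]]  # u -> list of successors v


def celf(nodes: List[Hashable], f: Callable[[FrozenSet], float], k: int) -> List[Hashable]:
    """Lazy-forward greedy: pick k nodes maximizing a monotone submodular f."""
    seeds: List[Hashable] = []
    current = f(frozenset())
    # Heap entries: (-gain, tie-breaker, node, |S| when the gain was last computed).
    heap = [(-(f(frozenset([v])) - current), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, i, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Fresh gain on top of the heap: by submodularity, the stale gains below
            # are upper bounds that cannot exceed it, so v is selected.
            seeds.append(v)
            current -= neg_gain
        else:
            # Stale upper bound: recompute Delta(v | S) and push the node back.
            gain = f(frozenset(seeds) | {v}) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds


def reach_volume(succ: Succ, volume: Dict[Hashable, float]) -> Callable[[FrozenSet], float]:
    """Volume spread omega(S) when every p_uv = 1: total volume of reachable nodes."""
    def omega(seeds: FrozenSet) -> float:
        seen: Set[Hashable] = set(seeds)
        stack = list(seen)
        while stack:
            for v in succ.get(stack.pop(), []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return sum(volume[v] for v in seen)
    return omega
```

The tie-breaker index keeps the heap from ever comparing node objects directly, so nodes need only be hashable.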

3.9 Baseline Algorithms
PMIA
PMIA is a well-known proxy-based model that extends the MIA model [12]. In the MIA model, to estimate the information diffusion of node u, we create an arborescence rooted at u. An arborescence is a rooted digraph in which all edges either point toward the root (in-arborescence) or point away from the root (out-arborescence) [12]. The path between the root and any other vertex in an arborescence is always unique. Thus, using arborescences, we can efficiently and exactly calculate the influence of a node without MC simulations. To reduce the computational overhead of creating an arborescence, MIA makes the following two reductions. First, for any pair of nodes (u, v), u can influence v only
through the maximum influence path (MIP). Second, all MIP’s having influence probability less than the threshold θ are not considered. The influence probability of a path pp(P) is the product of influence probabilities of all edges in the path P. Intuitively, the probability of u activating v through path P, is dependent on u activating all nodes in the path from u to v. The maximum influence path MIPG (u, v) is the path with the maximum influence probability among all paths from u to v. Using these two assumptions, the MIA model used the Dijkstra [22] shortest-path algorithm to construct a maximum influence in-arborescence MIIA(u, θ ) containing all MIPs that are pointing towards u and a maximum influence out-arborescence MIOA(u, θ ) containing all MIPs that are pointing out of u each node u ∈ V. Using the maximum influence in-arborescence MIIA(v, θ ) and maximum influence outarborescence MIOA(v, θ ), we estimate the influence to v from all the other nodes in the graph and the influence of v on all other nodes in the graph respectively. By using MIIAs and MIOAs, the marginal gain (u| S) of adding any user u to a seed set S can be computed efficiently. One issue in the MIA model is that a node v will block the influence of another seed w in MIIA(u, θ ) if v is on the path from w to u in the in-arborescence. To address this issue, PMIA updates the influenced in-arborescence only after adding a node into the seed set so as to not block the influence of future seeds. Furthermore, using the individual spread of each node and dividing it by the total influence spread, the PMIA expert assigns weights to the recommended k nodes. Degree Centrality The degree centrality [22] for a node v is the fraction of nodes it is connected to. It is calculated as follows: CDD (u) =

dv (|N| − 1)

(15)

where, dv is the degree of node v, and N is the set of all nodes of the graph. The dv of a node in a directed graph is the sum of its in-degree and out-degree. RANDOM The RANDOM algorithm chooses nodes randomly from the vertex set of the social graph. Table 2 details the different algorithms used for solving the VM problem and their seed selection criteria.
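The MIP construction described in the PMIA paragraph can be illustrated with a short Python sketch. It is not the authors' implementation: the function names and the toy graph are hypothetical, and only the single-root MIOA/MIIA search is shown (PMIA's incremental seed-set updates and expert weighting are omitted). Because edge probabilities never exceed 1, always expanding the most probable frontier node is equivalent to running Dijkstra's algorithm with edge lengths −log p.

```python
import heapq


def max_influence_paths(graph, root, theta):
    """Approximate MIOA(root, theta): best path probability from root to each node.

    graph maps node -> {neighbor: influence probability of the edge}.
    Returns {v: pp(MIP(root, v))}, keeping only nodes whose maximum influence
    path has probability >= theta (the second MIA reduction).
    """
    # Edge probabilities are <= 1, so a path's probability can only shrink as
    # it grows; expanding the most probable frontier node first is therefore
    # exact, like Dijkstra's algorithm with edge lengths -log(p).
    best = {root: 1.0}
    heap = [(-1.0, root)]  # max-heap on path probability via negation
    settled = set()
    while heap:
        neg_prob, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry; a more probable path was already settled
        settled.add(u)
        for v, p in graph.get(u, {}).items():
            prob = -neg_prob * p  # pp(P) is the product of edge probabilities
            if v in settled or prob < theta:
                continue
            if prob > best.get(v, 0.0):
                best[v] = prob
                heapq.heappush(heap, (-prob, v))
    return best


def reverse_graph(graph):
    """Flip every edge so that MIIA(v, theta) can reuse the same search."""
    reversed_graph = {u: {} for u in graph}
    for u, neighbors in graph.items():
        for v, p in neighbors.items():
            reversed_graph.setdefault(v, {})[u] = p
    return reversed_graph


if __name__ == "__main__":
    g = {"a": {"b": 0.5, "c": 0.2}, "b": {"c": 0.8}, "c": {}}
    mioa = max_influence_paths(g, "a", theta=0.1)
    miia = max_influence_paths(reverse_graph(g), "c", theta=0.1)
    # Under MIA, a single seed's spread is the sum of its MIP probabilities.
    print(mioa, sum(mioa.values()))
    print(miia)
```

In the toy graph, the direct edge a→c (0.2) loses to the two-hop path a→b→c (0.5 × 0.8 = 0.4), which is exactly the "maximum influence path" choice MIA makes.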

A. Choudhury et al.

Table 2 Algorithms and seed selection criteria

Algorithm | Seed set selection criteria
Weighted-greedy | Highest marginal gain, w(u | Φ)
CELF-volume | Highest marginal gain, w(u | Φ)
Diffusion degree | Cumulative contribution of a node and its immediate neighbors to the diffusion of information
Degree centrality | C_DD(u); nodes with the highest degree
RANDOM | Random
PMIA | Maximum influence in-arborescence MIIA(v, θ) and maximum influence out-arborescence MIOA(v, θ)
Maximum influence degree | Total number of nodes reachable from v at different path lengths
Q-learning (R2) | R2(t + 1) = C_DD(v) + ω(v) + σ(v)
SARSA (R2) | R2(t + 1) = C_DD(v) + ω(v) + σ(v)
Q-learning (R1) | R1(t + 1) = C_DD(v) + C_MID(v) + ω(v) + σ(v)
SARSA (R1) | R1(t + 1) = C_DD(v) + C_MID(v) + ω(v) + σ(v)
IBL | R1(t + 1) = C_DD(v) + C_MID(v) + ω(v) + σ(v)

3.10 Model Calibration

We used a genetic algorithm (GA) [36] with a 5% mutation rate and an 80% crossover rate to tune the following hyper-parameters of Q-learning and SARSA: hop limit, number of episodes, α, and γ. The hop limit defines the maximum number of steps an RL model can take before the episode terminates. The α parameter is the learning rate, and γ is the discount factor that determines the contribution of future rewards to the total reward received by the agent [8]. The hyper-parameters were varied in the following ranges: hop limit (10, 40), number of episodes (200, 1000), α (0.1–0.99), and γ (0.1–0.99). For the IBL model, we calibrated the following parameters in the given ranges: hop limit (10, 40), number of episodes (200, 1000), noise (0.1–0.9), decay (0.1–0.9), and utility (100, 1000). The fitness value optimized by the GA was the cumulative volume of the k selected nodes returned by each model.
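The GA calibration described in Section 3.10 can be sketched as follows. Only the mutation rate (5%), crossover rate (80%), and the Q-learning/SARSA search ranges come from the text; the population size, number of generations, binary tournament selection, uniform crossover, and keeping the best individual seen so far are assumptions, and `toy_fitness` is a hypothetical placeholder for the cumulative volume of the k selected nodes.

```python
import random

# Search ranges stated in Section 3.10; hop limit and episodes are integers.
RANGES = {
    "hop_limit": (10, 40),
    "episodes": (200, 1000),
    "alpha": (0.1, 0.99),
    "gamma": (0.1, 0.99),
}
INTEGER_PARAMS = {"hop_limit", "episodes"}


def sample_gene(name, rng):
    lo, hi = RANGES[name]
    return rng.randint(lo, hi) if name in INTEGER_PARAMS else rng.uniform(lo, hi)


def random_individual(rng):
    return {name: sample_gene(name, rng) for name in RANGES}


def tournament(population, fitness, rng):
    # Binary tournament: the fitter of two random individuals becomes a parent.
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(a, b, rng):
    # Uniform crossover: each hyper-parameter is copied from either parent.
    return {name: (a if rng.random() < 0.5 else b)[name] for name in RANGES}


def mutate(individual, rng, rate):
    # Each hyper-parameter is resampled within its range with probability `rate`.
    return {name: sample_gene(name, rng) if rng.random() < rate else value
            for name, value in individual.items()}


def genetic_search(fitness, pop_size=20, generations=30,
                   crossover_rate=0.8, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            child = crossover(p1, p2, rng) if rng.random() < crossover_rate else dict(p1)
            children.append(mutate(child, rng, mutation_rate))
        population = children
        best = max(population + [best], key=fitness)  # keep the best seen so far
    return best


def toy_fitness(params):
    # Hypothetical stand-in: the real fitness is the cumulative volume of the
    # k nodes selected by a model trained with these hyper-parameters.
    return -((params["alpha"] - 0.6) ** 2 + (params["gamma"] - 0.9) ** 2
             + ((params["hop_limit"] - 28) / 30) ** 2)


if __name__ == "__main__":
    print(genetic_search(toy_fitness))
```

Because each fitness evaluation would require training a full RL model, a real run would likely cache fitness values per individual; the sketch recomputes them for brevity.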

3.11 Expectation

We expected the Q-learning, SARSA, and IBL models to select nodes that give high influence and volume spread due to the nature of their reward functions. Next, we expected the Weighted-greedy and CELF-volume algorithms to give the best results compared to the other algorithms, because both are based on the greedy approach proposed in reference [8], which can efficiently approximate a solution within a factor of 1 − 1/e of the optimal solution for the IM problem [8]. Lastly, we expected the nodes selected by Weighted-greedy and CELF-volume not to be highly influential but to have high volume spread, specifically in the case of PhysicianSN (again on account of the approximation guarantee above), whereas we expected the nodes selected by PMIA to be influential but not to have high volume spread.


4 Result

In this section, we compare the results of our proposed framework on the PhysicianSN dataset with the Weighted-greedy, PMIA, Diffusion degree, Maximum influence degree, RANDOM, and degree centrality algorithms. For the PhysicianSN dataset, the optimal hyper-parameters for the Q-learning model were: hop limit = 28, number of episodes = 375, α = 0.59, and γ = 0.89. The best parameters for the SARSA model were: hop limit = 18, number of episodes = 525, α = 0.55, and γ = 0.53. For the IBL model, the best parameters were: hop limit = 30, number of episodes = 100, noise = 0.25, decay = 0.5, and utility = 1000. Table 3 shows the proportion of volume spread for all twelve algorithms. As shown in Table 3, the highest proportion of volume spread was obtained by the Weighted-greedy and CELF-volume algorithms, followed by Q-learning (R2), PMIA, Diffusion degree, SARSA (R2), degree centrality, SARSA (R1), Q-learning (R1), IBL, RANDOM, and the maximum influence degree algorithm. The Weighted-greedy and CELF-volume algorithms gave the best results for all seed set sizes. Q-learning (R2) performed consistently well across all seed set sizes (k = 10, 20, 30, 40, and 50), while PMIA performed poorly for k = 10 but gave results similar to Q-learning (R2) for all other seed set sizes (k = 20, 30, 40, and 50). This result is in line with our expectation. The volume spread values in Table 3 are means over 5 runs of each algorithm. Table 4 shows the proportion of influence spread for all twelve algorithms. As shown in Table 4, the highest proportion of influence spread was obtained by the PMIA algorithm, followed by Diffusion degree, Q-learning (R2), SARSA (R2), CELF-volume, SARSA (R1), degree centrality, Q-learning (R1), Weighted-greedy, IBL, RANDOM, and the maximum influence degree algorithm. This result is also in line with our expectation. Q-learning (R2) performed consistently well across all seed set sizes (k = 10, 20, 30, 40, 50).

Table 3 Proportion of volume spread for different algorithms on the PhysicianSN dataset

Algorithm | k = 10 | k = 20 | k = 30 | k = 40 | k = 50
Weighted-greedy | 0.64 | 0.74 | 0.81 | 0.85 | 0.88
CELF-volume | 0.63 | 0.75 | 0.81 | 0.86 | 0.89
Q-learning (R2) | 0.53 | 0.62 | 0.69 | 0.73 | 0.77
Diffusion degree | 0.48 | 0.60 | 0.65 | 0.72 | 0.75
SARSA (R2) | 0.46 | 0.59 | 0.65 | 0.72 | 0.77
Degree centrality | 0.45 | 0.57 | 0.63 | 0.69 | 0.74
Q-learning (R1) | 0.43 | 0.53 | 0.61 | 0.67 | 0.74
SARSA (R1) | 0.43 | 0.50 | 0.59 | 0.68 | 0.74
RANDOM | 0.42 | 0.45 | 0.47 | 0.55 | 0.60
IBL | 0.41 | 0.49 | 0.59 | 0.66 | 0.69
PMIA | 0.38 | 0.60 | 0.69 | 0.73 | 0.80
Maximum influence degree | 0.22 | 0.33 | 0.35 | 0.37 | 0.46

Table 4 Proportion of influence spread for different algorithms on the PhysicianSN dataset

Algorithm | k = 10 | k = 20 | k = 30 | k = 40 | k = 50
PMIA | 0.33 | 0.45 | 0.53 | 0.60 | 0.65
Diffusion degree | 0.23 | 0.31 | 0.38 | 0.42 | 0.47
Q-learning (R2) | 0.21 | 0.31 | 0.36 | 0.42 | 0.46
SARSA (R2) | 0.19 | 0.29 | 0.36 | 0.42 | 0.47
CELF-volume | 0.20 | 0.29 | 0.35 | 0.42 | 0.47
SARSA (R1) | 0.20 | 0.28 | 0.35 | 0.41 | 0.46
Degree centrality | 0.20 | 0.28 | 0.36 | 0.42 | 0.47
Q-learning (R1) | 0.21 | 0.28 | 0.35 | 0.41 | 0.47
Weighted-greedy | 0.20 | 0.27 | 0.34 | 0.40 | 0.46
IBL | 0.185 | 0.27 | 0.34 | 0.40 | 0.46
RANDOM | 0.192 | 0.26 | 0.31 | 0.38 | 0.40
Maximum influence degree | 0.13 | 0.18 | 0.24 | 0.31 | 0.38

Table 5 Relative change in average proportion of volume and influence spread compared to the baseline algorithm

Algorithm | Relative change in proportion of volume spread | Relative change in proportion of influence spread
Weighted-greedy | 0.57 | 0.08
CELF-volume | 0.58 | 0.12
Q-learning (R2) | 0.34 | 0.14
Diffusion degree | 0.29 | 0.17
SARSA (R2) | 0.28 | 0.12
Degree centrality | 0.24 | 0.12
Q-learning (R1) | 0.20 | 0.11
SARSA (R1) | 0.18 | 0.10
IBL | 0.14 | 0.07
PMIA | 0.29 | 0.66
Maximum influence degree | −0.31 | −0.19

Furthermore, as can be seen from Table 4, both Weighted-greedy and CELF-volume performed poorly in influence spread compared to the Q-learning (R2) and SARSA (R2) algorithms. The influence spread values in Table 4 are means over 5 runs of each algorithm. Table 5 shows the relative change in the average proportion of volume and influence spread compared to the baseline algorithm (RANDOM). As shown in Table 5, the CELF-volume algorithm had the highest relative increment (0.58) in average proportion of volume spread (averaged over all 5 seed set sizes), while PMIA had the highest relative increment in average proportion of influence spread (0.66) compared to the baseline algorithm. In comparison, Q-learning (R2) performed consistently well on both criteria (a relative increment of 0.34 in volume spread and 0.14 in influence spread).
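The text does not spell out how the relative change in Table 5 is computed. Assuming it is (mean over the five seed set sizes for an algorithm − mean for RANDOM) / mean for RANDOM, the sketch below recomputes several entries from a subset of the Table 3 and Table 4 values; under this reading it reproduces, for example, 0.57, 0.58, 0.34, 0.29, and −0.31 for volume spread and 0.66 and 0.14 for influence spread. A few influence entries (such as Q-learning (R1)) come out 0.01 away from the published values, which may reflect rounding in the underlying data.

```python
from statistics import mean

# Proportion of volume spread for k = 10, 20, 30, 40, 50 (subset of Table 3).
VOLUME = {
    "Weighted-greedy": [0.64, 0.74, 0.81, 0.85, 0.88],
    "CELF-volume": [0.63, 0.75, 0.81, 0.86, 0.89],
    "Q-learning (R2)": [0.53, 0.62, 0.69, 0.73, 0.77],
    "PMIA": [0.38, 0.60, 0.69, 0.73, 0.80],
    "Maximum influence degree": [0.22, 0.33, 0.35, 0.37, 0.46],
    "RANDOM": [0.42, 0.45, 0.47, 0.55, 0.60],
}

# Proportion of influence spread for the same seed set sizes (subset of Table 4).
INFLUENCE = {
    "PMIA": [0.33, 0.45, 0.53, 0.60, 0.65],
    "Q-learning (R2)": [0.21, 0.31, 0.36, 0.42, 0.46],
    "RANDOM": [0.192, 0.26, 0.31, 0.38, 0.40],
}


def relative_change(table, algorithm, baseline="RANDOM"):
    """(mean over k for algorithm - mean over k for baseline) / baseline mean."""
    base = mean(table[baseline])
    return (mean(table[algorithm]) - base) / base


if __name__ == "__main__":
    for name in VOLUME:
        if name != "RANDOM":
            print(f"{name}: volume {relative_change(VOLUME, name):.2f}")
    for name in ("PMIA", "Q-learning (R2)"):
        print(f"{name}: influence {relative_change(INFLUENCE, name):.2f}")
```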


Fig. 4 Proportion of volume spread for different algorithms trained on 80% of the PhysicianSN dataset (x-axis: initial seed set size k = 10 to 50; y-axis: proportion of volume spread; series: PMIA, Q-learning (R2), SARSA (R2))

Fig. 5 Proportion of influence spread for different algorithms trained on 80% of the PhysicianSN dataset (x-axis: initial seed set size k = 10 to 50; series: PMIA, Q-learning (80), SARSA (80))

Lastly, to simulate an evolving social network, we trained the PMIA, Q-learning, and SARSA models on 80% of the graph, obtained by randomly dropping 20% of the edges from the social network graph. We then tested the nodes selected by all three algorithms on the whole graph. Figure 4 shows the volume spread of all three algorithms trained on 80% of the social network graph for different seed set sizes. As can be seen from Fig. 4, the volume spread of PMIA decreased when it was trained on 80% of the graph, while the volume spread of Q-learning (R2) and SARSA remained similar. Figure 5 shows the influence spread of all three algorithms trained on 80% of the graph for different seed set sizes. As can be seen from Fig. 5, the influence spread of all three algorithms remained the same even when they were trained on 80% of the social network graph.
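The 80/20 edge-dropping step used to simulate an evolving network can be sketched as below, assuming the graph is stored as an edge list. `drop_edges` is a hypothetical helper; training on the reduced graph and evaluating the selected seeds on the full graph are not shown.

```python
import random


def drop_edges(edges, drop_fraction=0.2, seed=None):
    """Return a random subset of `edges` with `drop_fraction` of them removed.

    Only edges are removed; the caller should keep the original vertex set,
    since a node may lose all of its edges and vanish from the edge list.
    """
    rng = random.Random(seed)
    edges = list(edges)
    n_keep = len(edges) - round(drop_fraction * len(edges))
    return rng.sample(edges, n_keep)


if __name__ == "__main__":
    full = [(u, v) for u in range(5) for v in range(5) if u != v]  # 20 edges
    train = drop_edges(full, 0.2, seed=42)
    print(len(full), len(train))
```

Sampling without replacement keeps the retained edges distinct, so the reduced graph is always a subgraph of the original.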

5 Discussion and Conclusion

Physicians can positively influence patients’ health habits by counseling them about prevention and health-promoting behaviors [37]. Oberg and Frank [37] found that physicians’ health practices strongly influence patients’ health practices. As such, through effective communication, physicians can improve the overall healthcare provided to patients. Physicians who interact with a large number of patients are better able to diffuse pertinent information (such as safety guidelines for COVID-19) to patients than physicians who rarely interact with patients. We therefore proposed the problem of volume maximization (VM), which aims to identify the set of influential physicians who disseminate the greatest quantity of medical information owing to their frequent interactions with patients. To solve this problem, we proposed three frameworks: an RL framework that developed Q-learning and SARSA models, a cognitive framework that developed an IBL model, and the CELF-volume algorithm. We compared these algorithms with the Weighted-greedy, PMIA [12], degree centrality [22], Diffusion degree [30], maximum influence degree [30], and RANDOM (baseline) algorithms on a large healthcare dataset.

First, we found that the highest proportion of volume spread was obtained by the Weighted-greedy and CELF-volume algorithms, followed by Q-learning (R2), PMIA, Diffusion degree, SARSA (R2), degree centrality, SARSA (R1), Q-learning (R1), IBL, RANDOM, and the maximum influence degree algorithm (see Table 3). The Weighted-greedy and CELF-volume algorithms gave nearly identical results. This can be attributed to the fact that both select nodes based on the largest marginal weighted gain w(u | S) = ω(S ∪ {u}) − ω(S), where ω is the volume spread. As such, both the Weighted-greedy algorithm [8] and CELF-volume can efficiently approximate a solution within a factor of 1 − 1/e of the optimal solution.
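Selection by the largest marginal weighted gain w(u | S) = ω(S ∪ {u}) − ω(S) can be sped up with CELF's lazy evaluation, sketched below. This is an illustration, not the chapter's CELF-volume code: the patient data and `volume` function are hypothetical and model ω as a weighted coverage function (each physician reaches patients with fixed prescription volumes). Lazy evaluation is only valid when ω is monotone submodular, which holds for this toy ω and is an assumption about the real volume spread.

```python
import heapq


def celf(candidates, omega, k):
    """Lazy-greedy (CELF) choice of k seeds maximizing the set function omega.

    The marginal weighted gain is w(u | S) = omega(S ∪ {u}) - omega(S). Reusing
    stale gains as upper bounds is only valid when omega is monotone submodular.
    """
    selected = []
    current = omega(frozenset())
    # Heap entries: (-gain, node, seed-set size when the gain was computed).
    heap = [(-(omega(frozenset([u])) - current), u, 0) for u in candidates]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        neg_gain, u, computed_at = heapq.heappop(heap)
        if computed_at == len(selected):
            # The gain is up to date for the current seed set, so u is the
            # best remaining choice.
            selected.append(u)
            current += -neg_gain
        else:
            # Stale gain: recompute w(u | S) and push u back into the heap.
            gain = omega(frozenset(selected) | {u}) - current
            heapq.heappush(heap, (-gain, u, len(selected)))
    return selected, current


# Hypothetical data: physician -> {patient: prescription volume}. omega(S) is
# the total volume of the patients reached by S (weighted coverage).
PATIENTS = {
    "p1": {"a": 5, "b": 3},
    "p2": {"b": 3, "c": 4},
    "p3": {"d": 2},
    "p4": {"a": 5, "c": 4, "e": 1},
}


def volume(seeds):
    reached = {}
    for physician in seeds:
        reached.update(PATIENTS[physician])
    return sum(reached.values())


if __name__ == "__main__":
    print(celf(PATIENTS, volume, k=2))
```

With k = 2, p4 is chosen first (volume 10); p1 and p2 then tie at a marginal gain of 3, and the heap's tie-break on the node name picks p1, for a total volume of 13.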
Second, we found that the highest proportion of influence spread was obtained by the PMIA algorithm, followed by Diffusion degree, Q-learning (R2), SARSA (R2), CELF-volume, SARSA (R1), degree centrality, Q-learning (R1), Weighted-greedy, IBL, RANDOM, and the maximum influence degree algorithm (Table 4). This can be attributed to the fact that the PMIA algorithm has theoretical guarantees that it can efficiently approximate a solution within a factor of 1 − 1/e of the optimal solution for the IM problem [12]. Next, we found that both Weighted-greedy and CELF-volume gave the highest volume spread but low influence spread. This result supports the observation made by Choudhury, Kaushik, and Dutt [7] that influential physicians are not high-volume prescribers. Moreover, as can be seen from Table 5, the CELF-volume algorithm had a relative increment of 0.58 (the highest) in average proportion of volume spread and a relative increment of 0.12 in average proportion of influence spread compared to the baseline algorithm. PMIA, on the other hand, had a relative increment of 0.66 (the highest) in average proportion of influence spread and a relative increment of 0.29 in average proportion of volume spread compared to the baseline algorithm. In contrast, Q-learning (R2) had a relative increment of 0.34 in average proportion of volume spread and a relative increment of 0.14 in average proportion of influence spread. As such, Q-learning (R2) performed consistently well on both criteria, showcasing its strength in tackling the VM problem efficiently. This is likely due to the reward function used by the RL and IBL models, which gives equal importance to the volume spread and the influence spread of a node. However, reward R2 performed better than reward R1 for the Q-learning and SARSA models. This was because reward R1 included the maximum influence degree, which gave the worst volume and influence spread among all the models, whereas reward R2 did not (Table 2).

Furthermore, to simulate evolving graphs, we trained the models on 80% of the graph and tested the selected nodes on the whole graph. We found that the nodes selected by the Q-learning and SARSA models had volume and influence spread similar to those of the nodes selected when the models were trained on the whole graph. However, the volume and influence spread decreased in the case of the PMIA algorithm. A likely reason is that the Q-learning and SARSA models calculate the Q-value of each node based on the reward function while traversing the whole graph from multiple different start nodes. Lastly, because Q-learning, SARSA, and IBL store the Q-value or blended value of each node, one key advantage of these algorithms is that they need to run only once, irrespective of the initial seed set size k. In contrast, Weighted-greedy, CELF-volume, and PMIA need to be run n times for n different initial seed set sizes k.
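The point about running tabular methods only once can be illustrated with the standard tabular Q-learning and SARSA update rules plus a single ranking step. This is a sketch under stated assumptions, not the chapter's models: the state/action encoding (an action moves the agent to another node), the node-scoring rule in `node_scores`, and all names are hypothetical, and the reward would be R1 or R2 from Table 2 rather than the constants used here.

```python
def q_learning_update(Q, state, action, reward, next_state, next_actions, alpha, gamma):
    """Off-policy TD update: bootstrap from the best action available next."""
    best_next = max((Q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def sarsa_update(Q, state, action, reward, next_state, next_action, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually taken next."""
    old = Q.get((state, action), 0.0)
    target = reward + gamma * Q.get((next_state, next_action), 0.0)
    Q[(state, action)] = old + alpha * (target - old)


def node_scores(Q):
    """Score each node by the best Q-value of any move into it (assumed rule)."""
    scores = {}
    for (_, node), value in Q.items():
        scores[node] = max(value, scores.get(node, float("-inf")))
    return scores


def top_k_nodes(scores, k):
    """One ranking serves every k: the seed set for k is a prefix of it."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    Q = {}
    # Hypothetical transitions between physicians "a", "b", and "c".
    q_learning_update(Q, "a", "b", 1.0, "b", ["c"], alpha=0.5, gamma=0.9)
    sarsa_update(Q, "b", "c", 2.0, "c", "a", alpha=0.5, gamma=0.9)
    q_learning_update(Q, "a", "b", 1.0, "b", ["c"], alpha=0.5, gamma=0.9)
    scores = node_scores(Q)
    for k in (1, 2):
        print(k, top_k_nodes(scores, k))
```

Once the table is learned, changing k only changes how long a prefix of the ranking is taken, whereas a greedy method would have to re-run its selection loop.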
The objective of VM is to identify physicians who are both influential and spread the most medical information to their patients through medicine prescriptions. By identifying such critical physicians, knowledge transfer between doctors and patients will improve, leading to better healthcare. Prior research has shown that patients who understand and communicate properly with their doctors are more likely to acknowledge health problems, understand their treatment options, and follow their medication schedules [18]. The Q-learning and SARSA algorithms selected nodes that were both influential and had high volume spread. However, because we use tables to store the Q-values of nodes, continuing to use tables becomes computationally expensive as the size of the graph increases. Therefore, in future work, we would like to implement deep Q-networks (DQNs) and other RL approaches for solving the VM problem. DQNs and other RL approaches may be able to generalize to graphs drawn from the same distribution. This idea and others form the immediate next steps in our research program on social network analyses.

Acknowledgement

The project was supported by a grant (award #IITM/CONS/PPLP/VD/033) to Varun Dutt.


References

1. Dutta-Bergman M (2003) Trusted online sources of health information: differences in demographics, health beliefs, and health-information orientation. J Med Internet Res 5(3):e21. https://doi.org/10.2196/jmir.5.3.e21
2. Pilnick A, Dingwall R (2011) On the remarkable persistence of asymmetry in doctor/patient interaction: a critical review. Soc Sci Med (1982) 72(8):1374–1382. https://doi.org/10.1016/j.socscimed.2011.02.033
3. Sára Z, Csedő TTZ, Fejes J, Pörzse G (2013) Doctor-patient knowledge transfer: innovative technologies and policy implications. J Inf Eng Appl 3(3):32–38
4. World Bank (2020) World Health Organization's Global Health Workforce Statistics, OECD. data.worldbank.org. [Online]. Available: https://data.worldbank.org/indicator/SH.MED.PHYS.ZS. Accessed 27 July 2020
5. Borbas C, Morris N, McLaughlin B, Asinger R, Gobel F (2000) The role of clinical opinion leaders in guideline implementation and quality improvement. Chest 118(2):24–32
6. Lewis JM (2006) Being around and knowing the players: networks of influence in health policy. Soc Sci Med 62(9):2125–2136
7. Choudhury A, Kaushik S, Dutt V (2017) Social-network analysis for pain medications: influential physicians may not be high-volume prescribers. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining, Sydney, Australia
8. Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: The ninth ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC
9. Kempe D, Kleinberg J, Tardos É (2005) Influential nodes in a diffusion model for social networks. In: International colloquium on automata, languages and programming, Berlin, Heidelberg
10. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
11. Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3-4):279–292
12. Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Cambridge University Press, Cambridge
13. Lebiere C (1999) Blending. In: Proceedings of the Sixth ACT-R Workshop, Fairfax, VA
14. Gonzalez C, Lerch JF, Lebiere C (2003) Instance-based learning in dynamic decision making. Cogn Sci 27(4):591–635
15. Valente TW, Pumpuang P (2007) Identifying opinion leaders to promote behavior change. Health Educ Behav 34(6):881–896
16. Li Y, Fan J, Wang Y, Tan K-L (2018) Influence maximization on social graphs: a survey. IEEE Trans Knowl Data Eng 30(10):1852–1872
17. Travaline JM, Ruchinskas R, D’Alonzo GE Jr (2005) Patient-physician communication: why and how. J Am Osteopath Assoc 105(1):13
18. Stewart M (1995) Effective physician-patient communication and health outcomes: a review. Can Med Assoc J 15(9):1423–1433
19. Bull SA, Hu XH, Hunkeler EM, Lee JY, Ming EE, Markson LE, Fireman B (2002) Discontinuation of use and switching of antidepressants: influence of patient-physician communication. JAMA 288(11):1403–1409
20. Ciechanowski PS, Katon WJ, Russo JE, Walker EA (2001) The patient-provider relationship: attachment theory and adherence to treatment in diabetes. Am J Psychiatr 158(1):29–35
21. Bogardus ST Jr, Holmboe E, Jekel JF (1999) Perils, pitfalls, and possibilities in talking about medical risk. JAMA 281(11):1037–1041
22. Friedkin NE (1991) Theoretical foundations for centrality measures. Am J Sociol 96(6):1478–1504


23. Choudhury A, Kaushik S, Dutt V (2018) Social-network analysis in healthcare: analysing the effect of weighted influence in physician networks. Netw Model Anal Health Inf Bioinf 7(17)
24. Rossi RA, Ahmed NK (2015) The network data repository with interactive graph analytics and visualization. In: Twenty-ninth AAAI conference on artificial intelligence, Austin, Texas
25. Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems, Atlanta, Georgia
26. Rozemberczki B, Davies R, Sarkar R, Sutton C (2019) GEMSEC: graph embedding with self clustering. In: Proceedings of the 2019 IEEE/ACM international conference on advances in social networks and mining, Vancouver, British Columbia
27. IMS health. Healthcare organization services: professional and organization affiliations maintenance process
28. Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: 15th ACM SIGKDD international conference on Knowledge discovery and data mining, Paris, France
29. Chen W, Wang Y, Yuan Y (2013) Combinatorial multi-armed bandit: general framework, results and applications. In: International conference on machine learning, Sydney, Australia
30. Pal SK, Kundu S, Murthy CA (2014) Centrality measures, upper bound, and influence maximization in large scale directed social networks. Fundamenta Inf 130(3):317–342
31. Gonzalez C, Dutt V (2010) Instance-based learning models of training. In: Proceedings of the human factors and ergonomics society annual meeting, Los Angeles, CA
32. Dutt V, Gonzalez C (2012) Making instance-based learning theory usable and understandable: the instance-based learning tool. Comput Hum Behav 28(4):1227–1240
33. Gonzalez C, Dutt V (2011) Instance-based learning: integrating sampling and repeated decisions from experience. Psychol Rev 18(4):523
34. Lejarraga T, Dutt V, Gonzalez C (2012) Instance-based learning: a general model of repeated binary choice. J Behav Decis Mak 25(2):143–153
35. Whitley D (1994) A genetic algorithm tutorial. Stat Comput 4(2):65–85
36. Oberg EB, Frank E (2009) Physicians’ health practices strongly influence patient health practices. J R College Physicians Edinb 39(4):290

Concerns of Indian Population on Covid-19 Vaccine Shortage Amidst Second Wave Infection Rate Spikes: A Social Media Opinion Analysis

Remya Lathabhavan and Arnob Banik

Abstract

The study aims to understand the perception of the Indian population regarding the COVID-19 vaccine shortage in India during the rapid rise in cases in the COVID-19 second wave. Using the Twitter API, we scraped 46,000 unique tweets by Indian citizens containing the words ‘vaccine’, ‘second wave’, and ‘COVID’. A topic modeling method was then used to analyze the data. The study identifies five key themes based on people's perceptions. The Indian population is concerned about the vaccine shortage, and a collective effort is required for the wellbeing of citizens.

Keywords COVID-19 · Second wave · India · Social media

1 Introduction

The novel coronavirus disease (COVID-19) has affected global health, raising concerns about local public health [16]. The second wave of COVID-19 had a greater impact than the first, with various mutations and fast infection rates [2]. India is one of the countries most affected by the second wave of COVID-19, with 22,296,414 confirmed cases, 242,362 confirmed deaths, and 403,738 cases per day, as documented during the Fall of 2021 [19]. As the second most populous country in the world, India faces growing vaccination challenges and concerns [4]. Along with the calamities of the second wave, India faces a huge crisis in vaccine availability and supply. The present work uses social media opinion analysis to investigate the concerns of the Indian population about the COVID-19 vaccine shortage amidst second wave infection rate spikes. What follows is a brief literature review on the topic and the methodological approach we used to process and analyze our data.

R. Lathabhavan, Indian Institute of Management Bodh Gaya, Bodh Gaya, Bihar, India
A. Banik, VIT University, Vellore, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_8

R. Lathabhavan and A. Banik

2 Literature Review

The COVID-19 pandemic has had severe effects throughout the globe, raising health concerns among citizens [14]. These health concerns are not limited to high infection rates and death tolls but also extend to the mental health of much of the population [9, 10]. Strengthening both physical and mental health structures became a priority for nations [11, 12]. India was among the countries hit hardest by the pandemic. Strengthening the healthcare system through proper availability and distribution of COVID-19 vaccines is challenging for India due to its huge population and the mindset of its people [17]. However, during the COVID-19 second wave in India, as people became aware of the pandemic's effects and realized its calamities, they were keener to get vaccinated [13]. As social media plays a major role in public opinion sharing during the pandemic [3], in this work we chose a large-scale dataset generated from Twitter feeds, which contain public opinions on vaccine management in India. We then focused on analyzing these opinions, first using a topic modeling technique. Topic Modeling is a text-mining technique suited to analyzing tweets, as a machine learning approach is a better fit for the huge volume of free-text data in tweets [8]. We selected Twitter as the source of social media opinion because tweets are publicly available and a large-scale dataset is easy to collect. Moreover, Twitter is among the most popular social media platforms globally, including in the United Kingdom and the United States. All data used in this study were collected according to the Twitter terms of use and were publicly available at the time of collection and analysis.

3 Methodological Approach

As the Twitter feeds involve huge volumes of data, a suitable approach for analysis was Topic Modeling [1]. Topic Modeling is an unsupervised machine learning technique that identifies key topics within free-text data based on statistical probabilities and correlations among words. It is similar to thematic analysis in qualitative methodology [1]. The major difference between Topic Modeling and thematic analysis is that Topic Modeling does not require manual processing to classify the free-text data and is hence well suited to analyzing large volumes of free-text data, such as those used in this study.

Concerns of Indian Population on Covid-19 Vaccine Shortage Amidst Second. . .


3.1 Preprocessing

Before conducting Topic Modeling, we followed a four-step approach to preprocess the free-text data [1]. What follows is a brief description of these four steps:

– In the first step, each sentence in the Twitter feeds was reduced to individual words. For example, the sentence ‘COVID-19 pandemic impacted badly’ is reduced to the individual words ‘COVID-19’, ‘pandemic’, ‘impacted’, and ‘badly’.
– In the second step, stop words were removed from the lists of words. Stop words are words that occur so frequently in the English language that they do not add value in identifying topics. Examples of stop words are ‘and’, ‘the’, and ‘a’.
– In the third step, the remaining words were converted to their ‘root’ form. For example, ‘impacted’ would be converted to ‘impact’.
– In the fourth and last step, words that occurred in fewer than 50 Twitter feeds were removed to reduce statistical noise and improve accuracy [1].

Using the Twitter API [6], we scraped 46,000 unique tweets by Indian citizens containing the words ‘vaccine’, ‘second wave’, and ‘covid’. The tweets considered were posted from the 15th of April 2021 to the 1st of June 2021.
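The four preprocessing steps can be sketched in a few lines of standard-library Python. Only the step order and the 50-tweet threshold come from the text; the lowercasing, the tokenizing regular expression, the tiny stop-word list, and the suffix-stripping `crude_stem` function are illustrative assumptions, and a real pipeline would normally use a full stop-word list and a Porter or Snowball stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "are"}


def tokenize(text):
    # Step 1: split a tweet into lowercase words, keeping forms like "covid-19".
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def crude_stem(word):
    # Step 3 stand-in: strip a few common suffixes. This is NOT a real stemmer;
    # a Porter or Snowball stemmer would normally be used instead.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweets, min_docs=50):
    docs = []
    for text in tweets:
        words = [w for w in tokenize(text) if w not in STOP_WORDS]  # step 2
        docs.append([crude_stem(w) for w in words])                 # step 3
    # Step 4: keep only words that appear in at least `min_docs` tweets.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, n in doc_freq.items() if n >= min_docs}
    return [[w for w in doc if w in keep] for doc in docs]


if __name__ == "__main__":
    print(tokenize("COVID-19 pandemic impacted badly"))
    print(preprocess(["COVID-19 pandemic impacted badly",
                      "The pandemic and the vaccine"], min_docs=2))
```

The document-frequency filter counts each word at most once per tweet (via `set(doc)`), matching the "occurred in fewer than 50 Twitter feeds" criterion rather than raw word counts.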

3.2 Topic Modeling Process

In Topic Modeling, the processed words were presented to the unsupervised machine learning algorithm to identify clusters of words that tended to occur together. Words with a high probability of co-occurring in Twitter feeds were then considered to belong to the same ‘topic’. Three covariates were included in the Topic Modeling (country of the Twitter feed, date of the Twitter feed, and number of followers of the Twitter user) to improve the accuracy of the model in identifying topics. The optimal number of topics was identified using the anchor words algorithm [15]. The anchor words algorithm performs efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space. After the optimal number of topics was identified, related topics were further grouped into themes using hierarchical clustering, with the optimal number of themes identified through the elbow method [7]. After identifying the optimal number of topics and themes from the Twitter feeds, the authors manually crafted a descriptive label for each topic or theme based on its keywords and sample Twitter feeds.
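The anchor-word idea can be illustrated with a minimal sketch, assuming tweets have already been preprocessed into word lists. It builds a row-normalized word co-occurrence matrix and greedily picks "anchor" words whose rows stick out farthest from the span of rows already chosen. This is a simplified stand-in for the approximate convex-hull search: it measures distance from a linear span rather than an affine hull, and it omits the covariates, the recovery of topic-word distributions, and the choice of the number of topics. All names are hypothetical.

```python
import math
from itertools import combinations


def cooccurrence_rows(docs):
    """Row-normalized co-occurrence: row w estimates P(other word | w) per tweet."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(vocab) for _ in vocab]
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[index[a]][index[b]] += 1.0
            counts[index[b]][index[a]] += 1.0
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([x / total for x in row] if total else list(row))
    return vocab, rows


def greedy_anchor_words(vocab, rows, n_topics, tol=1e-12):
    """Repeatedly pick the word whose row is farthest from the span of earlier picks."""
    residuals = [list(r) for r in rows]
    anchors = []
    for _ in range(n_topics):
        norms = [math.sqrt(sum(x * x for x in r)) for r in residuals]
        best = max(range(len(vocab)), key=norms.__getitem__)
        if norms[best] < tol:
            break  # every remaining row already lies in the span
        anchors.append(vocab[best])
        unit = [x / norms[best] for x in residuals[best]]
        # Gram-Schmidt step: project the new direction out of every row.
        for r in residuals:
            dot = sum(x * u for x, u in zip(r, unit))
            for j, u in enumerate(unit):
                r[j] -= dot * u
    return anchors


if __name__ == "__main__":
    docs = [["price", "hike"], ["price", "hike"], ["vaccine", "shortage", "price"]]
    vocab, rows = cooccurrence_rows(docs)
    print(greedy_anchor_words(vocab, rows, n_topics=2))
```

Words that co-occur with only a few specific partners (here "hike", which appears only next to "price") have concentrated rows and therefore large norms, which is why they tend to be picked as anchors first.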


4 Results

A total of 55,000 Twitter feeds were initially identified in the period from 15th April 2021 to 1st June 2021. After removing Twitter feeds without the relevant terms ‘vaccine’, ‘second wave’, and ‘covid’, duplicate posts, and Twitter feeds by organizations, a final set of 46,000 Twitter feeds was included in the study. The majority of the posts originated from India (89.3%). By continent, the remaining tweets came from other Asian countries (7.2%), Europe and America (2.4%), and unknown locations (1.1%). Altogether, 30 topics were identified from the included Twitter feeds. As suggested by the elbow method, the 30 topics could be clustered into 5 key themes using hierarchical clustering. The results of our analysis are summarized in Table 1. Our study identified five key themes:

1. Anxiety on COVID-19 vaccine unavailability
2. COVID-19 vaccine shortage concerns
3. Concerns of COVID-19 vaccine management
4. Possibility of COVID-19 price hike
5. Anxiety due to health conditions around

As India adopted a phased approach to vaccine distribution across cohort groups, demand for COVID-19 vaccines rose suddenly during the second wave due to fear of unavailability or shortage. This also caused price hikes, which became a concern for many people. The surrounding public health situation also raised concerns among people about the gap between vaccine demand and supply. Figures 1a, 1b, 1c, 1d, and 1e depict the different themes. Theme 1 was discussed in 33.4% of the Twitter feeds and involves keywords such as vaccine, availability, doses, registration, hidden, and provider. It describes anxiety on COVID-19 vaccine unavailability. Theme 2 was discussed in 21.9% of the Twitter feeds and involves keywords such as vaccine shortage, unavailable, queues, short, app, failed, and 18–45. It describes COVID-19 vaccine shortage concerns. Theme 3 was discussed in 17.2% of the Twitter feeds and involves keywords such as management, allocation, covishield, dropped, app, and circulated. It describes concerns of COVID-19 vaccine management. Theme 4 was discussed in 15.2% of the Twitter feeds and involves keywords such as supply, procure, increase, hike, launching, and angry. It describes the possibility of a COVID-19 price hike. Theme 5 was discussed in 12.3% of the Twitter feeds and involves keywords such as severity, variant, cases, travel, and exam. It describes anxiety due to health conditions around.

Table 1 Topic labels and top words

Topic label | Top words
Anxiety on COVID-19 vaccine unavailability | Vaccine, availability, doses, registration, hidden, provider
COVID-19 vaccine shortage concerns | Vaccine shortage, unavailable, queues, short, app, failed, 18–45
Concerns of COVID-19 vaccine management | Management, allocation, covishield, dropped, app, circulated
Possibility of COVID-19 price hike | Supply, procure, increase, hike, launching, angry
Anxiety due to health conditions around | Severity, variant, cases, travel, exam

Fig. 1a COVID Vaccine availability

Fig. 1b COVID Vaccine shortage


Fig. 1c COVID Vaccine management

Fig. 1d COVID Vaccine price hike


Fig. 1e Concerns among Indian population

5 Discussion

This study adopted a novel approach, using a social media platform, to harness close-to-real-time public sentiments related to the COVID-19 vaccine shortage during the second wave of infection in India. It provided an alternative avenue to gather ground-level data on the COVID-19 vaccine shortage, especially at a time when traditional research methodologies can be restricted by social distancing measures. From the Twitter feeds related to vaccine shortage posted by citizens in India, 5 key themes could be identified, viz.

1. Anxiety on COVID-19 vaccine unavailability,
2. COVID-19 vaccine shortage concerns,
3. Concerns of COVID-19 vaccine management,
4. Possibility of COVID-19 price hike, and
5. Anxiety due to health conditions around.

A two-fold discussion covering both physical and mental health is needed on this issue, as the two have an intertwined effect on overall public health.
– First, a systematic strategy and proper supply methods can help prevent further crises, but this requires a collective effort by institutions, health workers, and authorities at both the state and national levels [5].
– Second, to uphold mental health among citizens, proper awareness, information tracking, technological support such as helplines and online counseling, and other mental health care facilities should be provided through a collective support system, with a cooperative approach from individuals, institutions, mental healthcare professionals, and government [12, 18].
These proactive measures can contribute to a healthier society.

6 Conclusions

By analyzing social media tweet feeds, the present study identified five key themes, viz. (1) Anxiety on COVID-19 vaccine unavailability, (2) COVID-19 vaccine shortage concerns, (3) Concerns of COVID-19 vaccine management, (4) Possibility of COVID-19 price hike, and (5) Anxiety due to surrounding health conditions. The anxiety and concerns were highly visible and point to mental health problems during this period. A collective and supportive system of individuals, healthcare professionals, institutions, and government can bring about change in such a scenario.

References
1. Banks GC, Woznyj HM, Wesslen RS, Ross RL (2018) A review of best practice recommendations for text analysis in R (and a user-friendly app). J Bus Psychol. https://doi.org/10.15139/S3/R4W7ZS
2. Cacciapaglia G, Cot C, Sannino F (2020) Second wave COVID-19 pandemics in Europe: a temporal playbook. Sci Rep 10(1):1–8. https://doi.org/10.1038/s41598-020-72611-5
3. Chen Q, Min C, Zhang W, Wang G, Ma X, Evans R (2020) Unpacking the black box: how to promote citizen engagement through government social media during the COVID-19 crisis. Comput Hum Behav 110:106380. https://doi.org/10.1016/j.chb.2020.106380
4. Forman R, Shah S, Jeurissen P, Jit M, Mossialos E (2021) COVID-19 vaccine challenges: what have we learned so far and what remains to be done? Health Policy 125(5):553–567. https://doi.org/10.1016/j.healthpol.2021.03.013
5. Foy BH, Wahl B, Mehta K, Shet A, Menon GI, Britto C (2021) Comparing COVID-19 vaccine allocation strategies in India: a mathematical modelling study. Int J Infect Dis 103:431–438. https://doi.org/10.1016/j.ijid.2020.12.075
6. Hasan MR, Maliha M, Arifuzzaman M (2019) Sentiment analysis with NLP on Twitter data. Materials and Electronic Engineering (IC4ME2). IEEE
7. James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning with applications in R
8. Koh JX, Liew TM (2020) How loneliness is talked about in social media during COVID-19 pandemic: text mining of 4,492 Twitter feeds. J Psychiatr Res. https://doi.org/10.1016/j.jpsychires.2020.11.015
9. Lathabhavan R (2021a) A psychometric analysis of fear of COVID-19 scale in India. Int J Ment Heal Addict. https://doi.org/10.1007/s11469-021-00657-1
10. Lathabhavan R (2021b) Covid-19 effects on psychological outcomes: how do gender responses differ? Psychol Rep. https://doi.org/10.1177/00332941211040428
11. Lathabhavan R (2021c) First and second waves of COVID-19: a comparative study on the impact of pandemic fear on the mental health of university students in India. J Loss Trauma:1–2. https://doi.org/10.1080/15325024.2021.1950432
12. Lathabhavan R (2021d) People and social media platforms for positive mental health- a paradigm shift: a case on COVID-19 impact from India. Asian J Psychiatr 56:102460. https://doi.org/10.1016/j.ajp.2020.102460
13. Lathabhavan R, Padhy PC (2022) Role of fear of COVID-19 in the relationship of problematic internet use and stress: a retrospective cohort study among Gen X, Y and Z. Asian J Psychiatr 67:102937. https://doi.org/10.1016/j.ajp.2021.102937
14. Lathabhavan R, Sudevan S (2022) The impacts of psychological distress on life satisfaction and wellbeing of the Indian general population during the first and second waves of COVID-19: a comparative study. Int J Ment Heal Addict. https://doi.org/10.1007/s11469-021-00735-4
15. Lee M, Mimno D (2014) Low-dimensional embeddings for interpretable anchor-based topic inference. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 1319–1328. http://metaoptimize.com/projects/wordreprs/
16. Paakkari L, Okan O (2020) COVID-19: health literacy is an underestimated problem. Lancet Public Health 5(5):e249–e250. https://doi.org/10.1016/S2468-2667(20)30086-4
17. Sv P, Lathabhavan R, Ittamalla R (2021) What concerns Indian general public on second wave of COVID-19? A report on social media opinions. Diabetes Metab Syndr Clin Res Rev 15(3):829–830. https://doi.org/10.1016/j.dsx.2021.04.001
18. Tandon R (2020) COVID-19 and mental health: preserving humanity, maintaining sanity, and promoting health. Asian J Psychiatr 51
19. WHO (2021) World Health Organization. https://covid19.who.int/region/searo/country/in. Accessed 24 May 2021

Remya Lathabhavan is an Assistant Professor at the Indian Institute of Management Bodh Gaya, India. Her research interests include glass ceiling, corporate social responsibility, human resource management, data analytics, artificial intelligence, career progression, mental health, and psychology. She has authored many articles in peer-reviewed journals, as well as book chapters and books.

Arnob Banik is a B Tech student in Computer Science and Engineering at VIT University, Vellore, India. His research interests include data mining, text analytics, positive psychology, and mental health.

The Effects of Face Masks on the Performance of Modern MWIR Face Detectors Victor Philippe, Suha Reddy Mokalla, and Thirimachos Bourlai

Abstract

Since the pandemic is still a challenge all over the world, wearing a mask has become the new norm. Even if we emerge from this pandemic soon, another one may come in the future; thus, wearing a mask will always be considered a step in the right direction for preventing the spread of a virus. Face masks pose a challenge for existing biometric systems that are passive and depend on full facial information to perform well. This book chapter explains the impact of face masks on the performance of various face detectors trained using some of the most efficient object detection models to date, in terms of accuracy and computational complexity. The chapter provides insight into the effects of face masks on face detection in one of the most challenging spectral bands, i.e., the Mid-Wave Infrared (MWIR; 3–5 μm), and addresses the problem by training new models using masked face image data. Initially, we train and test two versions of each model, one with masked and one with unmasked data, to establish a performance baseline. Then, we test each model's ability to generalize face boundaries to the opposite category of face data from the one it was trained on. Subsequently, we determine the percent drop in performance, called a penalty, by comparing a model's performance when tested on the same category of data it was trained on versus the opposite category. For models trained on unmasked data and tested on masked data, we observed average precision and recall penalties of 37.27% and 37.25%, respectively. Similarly, the computational time for detecting faces in a single image increased by an average of 12.72%. Then, this process is reversed: we test the models trained on masked data with unmasked face data and determine the drop in performance when each model is asked to detect faces in this unfamiliar data.
We observe a nearly identical detection-time penalty (11.81%), as well as average precision and recall drops of 30.24% and 29.50%, respectively. Faster R-CNN Inception-ResNet V2 was found to perform best: across all experiments, it outperforms all other models in terms of precision and recall. Its downside is that it is slow compared

V. Philippe · S. Mokalla · T. Bourlai Multispectral Imagery Lab (MILAB), ECE, University of Georgia, Athens, GA, USA © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_9


V. Philippe et al.

to all other models, making it a less attractive solution for mobile or time-sensitive operations. Our fastest model is CenterNet ResNet50 V2, but it performs poorly when the training and test data differ. This shows that models trained on unmasked data are considerably disadvantaged in scenarios where masks are required and face detection algorithms are used. Models trained solely on masked face data perform marginally better, but largely cannot stand on their own. This highlights the need to use masked face data when training face detection models in today's ever-changing COVID-19 landscape.

Keywords Face detection · Deep learning · Masked faces · Thermal imaging · MWIR · Face recognition · Night-time environments · COVID-19
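The penalty described in the abstract can be written as a relative change between the same-category and opposite-category results. The following is a minimal sketch, assuming the penalty is measured relative to the same-category score (the chapter does not say whether it is relative or in percentage points) and that detection time is penalized as a relative increase; the function names and sample numbers are hypothetical, not taken from the chapter's tables.

```python
def accuracy_penalty(same_category: float, opposite_category: float) -> float:
    """Percent drop of a metric (e.g. precision) when testing on the opposite data category."""
    if same_category <= 0:
        raise ValueError("same-category score must be positive")
    return 100.0 * (same_category - opposite_category) / same_category


def time_penalty(same_category_s: float, opposite_category_s: float) -> float:
    """Percent increase in per-image detection time on the opposite data category."""
    if same_category_s <= 0:
        raise ValueError("same-category time must be positive")
    return 100.0 * (opposite_category_s - same_category_s) / same_category_s


# Hypothetical U-U vs. U-M values for a single model.
print(round(accuracy_penalty(0.90, 0.60), 2))  # precision penalty in percent
print(round(time_penalty(0.050, 0.056), 2))    # detection-time penalty in percent
```

Averaging such per-model values over the five detectors would yield summary figures of the kind reported above.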

1 Introduction

With mask mandates all over the world, wearing a face mask has become a new habit. This creates a need to develop face-based biometric applications for masked face data. In the past, the majority of face-based biometric research focused on faces without any occlusion. Then, different types of occlusion scenarios became the focus (e.g., eyewear, or face coverings worn in different cultures). Within the last two years, the problem of visible band facial occlusion caused by face masks (surgical or not) worn to protect the health of subjects has attracted the interest of many researchers around the world. This creates a need for new face-based models that work well with this type of occluded face. A natural question is why the human face is chosen as the biometric modality. Collecting face image data for the identification and verification of individuals has advantages over other biometric modalities such as fingerprint or iris. These advantages include the fact that visible face image data can be collected covertly, without the subject's knowledge, in a non-cooperative environment, at various stand-off distances, and passively (no physical contact with the biometric sensor is required). Face-based research spans many areas. Among many challenging problems, researchers still focus on face detection, classification (soft biometric problems such as gender or ethnicity detection), and recognition models. Most of the solutions proposed in the literature address visible band face recognition challenges. Developing deep learning based models that operate in the visible spectrum is relatively easier than developing them for thermal or other IR bands, owing to the availability of large-scale, publicly available visible band face datasets [1, 2].
However, visible band facial recognition fails on image data collected under low-light or night-time (no light) conditions. Using images collected in spectra other than the visible has gained researchers' interest in recent decades owing to their invariance to lighting conditions [3].


The IR spectrum is broadly divided into active and passive IR. Active IR comprises the NIR (Near IR) and SWIR (Short Wave IR) bands, while passive IR comprises the MWIR (Mid-Wave IR) and LWIR (Long Wave IR) bands. This chapter focuses on face images collected in the MWIR spectrum. MWIR has the advantage of not requiring any external light source. When an image or video is captured using an MWIR camera, heat emitted from the subject's face is captured as IR radiation to form the image. Since capturing the thermal image requires only the heat pattern and not color information, challenging lighting conditions (low or no light) do not affect the data collection procedure or the quality of the collected data. Due to the lack of real MWIR face data, and especially of MWIR face data occluded by face masks, it is important to design and develop face recognition models for MWIR faces occluded by masks. A face recognition model consists of several building blocks, namely face image acquisition, face detection, face alignment, and face identification or verification. Face detection refers to localizing a face in a given image to remove face-irrelevant information, such as background clutter, which is not useful and hinders face recognition. The localized and cropped face then needs to be aligned (as expected by the majority of modern face matchers). One way to perform face alignment is through geometric normalization, where the eye centers are detected and the image is rotated so that the line joining the eye centers (not necessarily the pupils) is horizontal and the eye center coordinates are fixed. After all faces in the dataset are aligned, face recognition is performed using either the identification functionality (identifying a person within a given set of gallery images) or the verification functionality (comparing two face images to verify that they belong to the same person).
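Geometric normalization of this kind amounts to a 2D similarity transform (rotation, uniform scale, and translation) that sends the two detected eye centers to fixed target positions. The sketch below solves for that transform with complex arithmetic. The target eye coordinates and the function name are hypothetical placeholders, and a real pipeline would still need to warp the image pixels with the resulting transform, which is not shown.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def eye_alignment_transform(left_eye: Point, right_eye: Point,
                            target_left: Point = (30.0, 40.0),
                            target_right: Point = (70.0, 40.0)) -> Callable[[Point], Point]:
    """Return a function mapping image points so that the eyes land on the target positions.

    A 2D similarity transform is z' = s * z + t, with complex s (rotation + scale)
    and complex t (translation); two point correspondences determine it exactly.
    Because both targets share the same y value, the eye line becomes horizontal.
    """
    l_src, r_src = complex(*left_eye), complex(*right_eye)
    l_dst, r_dst = complex(*target_left), complex(*target_right)
    if l_src == r_src:
        raise ValueError("eye centers must be distinct")
    s = (r_dst - l_dst) / (r_src - l_src)
    t = l_dst - s * l_src

    def apply(p: Point) -> Point:
        z = s * complex(*p) + t
        return (z.real, z.imag)

    return apply


# Hypothetical detected eye centers in a tilted face crop.
align = eye_alignment_transform((10.0, 20.0), (30.0, 40.0))
print(align((10.0, 20.0)), align((30.0, 40.0)))
```

Fixing both eye positions, rather than only levelling the eye line, also normalizes the inter-eye distance, so every aligned face ends up at the same scale.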
Face detection is an important building block of any face recognition model and can greatly affect the accuracy of face matchers. One way to localize faces is to manually annotate each face in every image in the dataset. This is feasible when the dataset is limited to a few hundred to a few thousand images. However, in practical face recognition applications, the dataset can contain millions of images, making manual annotation impractical. The solution to this problem is to develop a face detection model that works across varying MWIR-based scenarios. One of these scenarios is where faces are occluded by masks. This is important because mask mandates may be in place in many countries, especially during a pandemic and its successive waves.

1.1 Goals and Contributions

This chapter focuses on training and testing various deep learning based object detection models on MWIR face images with and without face masks. Figure 1 shows the current shortcomings of face detection in MWIR, where the faces are not correctly detected. The results shown are obtained when models trained on unmasked data are used on masked data, and vice versa. The first goal of this chapter is to demonstrate the importance of training new face detection models in the thermal spectrum with masked face data. Next, we aim to train and test new models using transfer learning on masked and unmasked data. Finally, we aim to demonstrate the importance of automatic face detection models and their impact on face recognition accuracy. The specific contributions of this work are threefold:
– Designed and developed efficient face detectors by training pre-trained deep learning based object detection models on MWIR face images.
– Yielded results comparable to the state of the art for face detection without having to train the original models from scratch.
– Trained models on masked MWIR face images that yield high precision and recall values.
The rest of the chapter is organized as follows. Section 2 presents a detailed, up-to-date literature review in the field of face detection. The methodological approach followed to address the problem of masked face detection is presented in Sect. 3. Section 4 presents the implementation details of the models for training and testing, while conclusions and the scope for future work are presented in Sect. 5.

Fig. 1 This figure shows face detection examples in the MWIR band where the bounding box computed by a model is unacceptable. For our purposes, the face must be centrally enclosed: the box includes the top of the forehead and the bottom of the chin and is bounded laterally by the ears. We use this convention to maximize the information available to downstream models, such as face segmentation and facial recognition. This figure demonstrates the need for models that can handle masked face data more effectively


2 Related Work

Modern deep learning based face detection algorithms have improved substantially over the last ten years owing to the availability of newer datasets, novel algorithms, and other technologies that enable them to better detect human faces [4–10]. Kabakus analyzed well-known, open-source face detectors and found that YOLOFace, which is based on the CNN (Convolutional Neural Network) design of YOLOv3 [11], was the most accurate on the CelebA dataset [12] at the time. Improvements on CNN-based approaches to face detection and segmentation came with Mask R-CNN (Region-based Convolutional Neural Network) [13], which performs better on the Face Detection Dataset and Benchmark (FDDB) [14] than previous R-CNN models. Zheng et al. similarly adapted an R-CNN model to be more versatile, creating their Multiple Scale Faster R-CNN model, which achieved superior precision and recall on the WIDER FACE dataset [2] compared to the standard Faster R-CNN approach. Lin et al. [5] proposed a face detection and segmentation method based on an improved version of Mask R-CNN, called G-Mask. They used ResNet-101 [15] to extract features, an RPN (Region Proposal Network) to generate ROIs (Regions of Interest), and ROIAlign to preserve exact spatial locations when generating binary masks through an FCN (Fully Convolutional Network). Zheng et al. [6] proposed a CNN-based approach for face detection called Multiple Scale Faster R-CNN, which extends the Faster R-CNN framework by allowing it to span the receptive fields in the ConvNet across multiple deep feature maps. They also introduced a multiple scale RPN to generate a set of region proposals, used the proposed multiple scale region-based CNN to extract ROIs of facial regions, and calculated a confidence score for each ROI. Finally, the quality of the detections was estimated by thresholding these confidence scores across all face images. Guo et al. [7] proposed an occlusion-robust face detection model that combines shallow and deep proposals to produce a more comprehensive set of candidate regions, instead of using an RPN that generates only deep proposals and might fail to detect faces under extreme conditions caused by low resolution, severe pose, or large occlusions. They embed shallow proposals provided by a human upper body detector consisting of a coarse, Haar-like feature based cascade classifier and a fine Support Vector Machine (SVM) classifier. Faster R-CNN is then employed to make the final decisions. Qin et al. [8] proposed a joint cascaded CNN for face detection in which the detection and calibration networks share a multi-loss network used for both detection and bounding-box regression. They showed that the cascaded CNN can be trained using the back-propagation algorithm used to train traditional CNNs.

As masks became part of everyday life starting in 2019, the pandemic created a need for algorithms tailored to the large new segments of the world population that chose, or were required, to wear masks. Early efforts in this new domain used statistical methods such as PCA (Principal Component Analysis) to categorize images as masked or unmasked, and found unmasked data considerably easier to categorize [16]. In [17], the authors proposed a hybrid method combining ResNet50 feature extraction with three machine learning classifiers to achieve high performance on a large set of unmasked data versus real and fake masked data. While image categorization works for some applications, it is often necessary to detect multiple faces in an image or to use the face boundaries themselves. Some settings, such as banks, casinos, and subway stations, rely on fast and accurate detection performance such as that reported in [18]. Their MobileNetV2 and Single-shot Multi-box Detector (SSD) based model achieved an F1 score of 0.93 and a frame rate of 15.71 frames per second (FPS) [19, 20]. In [21], Chavda et al. proposed a two-stage face and mask detector using pre-trained deep learning models to achieve better results on masked faces than [16] and [17]. They used one CNN to find human faces in an image; these ROIs (Regions of Interest) are then fed to another CNN to determine whether or not the face is wearing a mask.

Our approach focuses on frontal face detection exclusively in the MWIR band, on subjects with and without face masks. The main challenge is that, in the MWIR band, facial recognition and related tasks are more complicated than in the visible band owing to the lack of publicly available face data [22–30]. In this work, we compare how models trained exclusively on one type of data fare on the opposite type as well as on their own. This helps determine the performance challenges that face detection algorithms encounter when they have not been properly trained with MWIR masked face data but are asked to infer on it. Similarly, we record inference speed across all models and data types to determine the trade-off between the speed and accuracy of different face detectors.

3 Methodology

This section explains the methodological approach followed in this chapter to address the problem of masked face detection in the MWIR spectrum. A graphical overview of the methodology is outlined in Fig. 3. First, all images are manually annotated by drawing bounding boxes around human faces. Sample masked and unmasked face images are provided in Fig. 2. These annotations are used as training labels and as ground-truth annotations for calculating the detection accuracy metrics, namely precision and recall. Then, the deep learning models described below are trained and tested using transfer learning. Once the models are trained, the bounding boxes produced by our automatic face detection algorithm are used to crop the images. These cropped faces are used for same-spectrum, same-scenario face recognition experiments, and the results are compared to those obtained using manual annotations (baseline).
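Computing precision and recall against the manual annotations requires a rule for when a predicted box matches a ground-truth box. The following is a minimal sketch, assuming boxes given as (xmin, ymin, xmax, ymax), greedy one-to-one matching in descending confidence order, and an IoU threshold supplied by the caller. The Fig. 4 caption uses IoU ≥ 0.95 to mark correct detections; whether the same threshold underlies the reported tables is an assumption. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections: List[List[Tuple[Box, float]]],
                     ground_truth: List[List[Box]],
                     iou_threshold: float) -> Tuple[float, float]:
    """Precision and recall over a set of images; detections are (box, score) pairs per image."""
    tp = fp = fn = 0
    for dets, gts in zip(detections, ground_truth):
        matched = [False] * len(gts)
        # Visit detections from most to least confident and match each to at most one face.
        for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
            best_j, best_iou = -1, iou_threshold
            for j, gt in enumerate(gts):
                overlap = iou(box, gt)
                if not matched[j] and overlap >= best_iou:
                    best_j, best_iou = j, overlap
            if best_j >= 0:
                matched[best_j] = True
                tp += 1
            else:
                fp += 1  # no unmatched face overlaps enough: spurious box
        fn += matched.count(False)  # faces that no detection claimed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision penalizes spurious boxes (false positives) and recall penalizes missed faces (false negatives); with one face per image, each image contributes at most one true positive.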


Fig. 2 The same two subjects are shown with and without masks. While there is a considerable change in lighting (from the camera or otherwise) between subjects 1 and 2, the subjects' facial features remain undisturbed because thermal signatures are not affected by light. Images 1b and 2b show masked faces, where the mouth and nose are fully obscured. However, thermal peaks caused by warm breath condensation can be seen around the nose area

3.1 Deep Learning Models

The deep learning models used in this chapter for training and testing face detection on masked and unmasked data are SSD MobileNet V2, SSD ResNet50 V1, CenterNet HourGlass104, CenterNet ResNet50 V2, and Faster R-CNN Inception-ResNet V2. A brief description of each model follows, along with a brief justification for its use:

Fig. 3 Overview of the proposed methodology. Image data in the MWIR spectrum is acquired with a sensor and then split into training and testing sets. Then, a pre-trained object detection model is trained and tested to determine its applicability to the acquired data. We pull these models from TensorFlow's Object Detection API [31]. Subsequently, the model with the best performance metrics, including but not limited to precision, recall, and speed, is selected for further applications such as face recognition

3.1.1 SSD MobileNet V2

The SSD MobileNet model we trained, listed in the TensorFlow Object Detection API as “SSD MobileNet v2 320 × 320”, uses the established SSD (Single Shot Detector) as the object detection algorithm and MobileNet V2 as the feature extractor [19, 20]. SSD uses a feed-forward convolutional neural network to produce bounding boxes and object presence scores, and has been praised for its speed and accuracy compared to other detection algorithms such as Faster R-CNN [20]. The MobileNet V2 architecture improves on its predecessor in terms of computational cost, ImageNet Top-1 accuracy, and speed. The input size is 320 × 320. This model was chosen because it is designed for mobile phones and fast, real-time detection, which is often needed in biometric security applications.
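SSD produces its bounding boxes by predicting offsets relative to a fixed set of default (anchor) boxes tiled over several feature maps. The sketch below computes default box scales and shapes with the linear scale rule of the original SSD formulation. The values s_min = 0.2 and s_max = 0.9 are taken from that formulation; the configuration used in this chapter may differ. The aspect-ratio list and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def default_box_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced scales s_k, one per feature map, as fractions of the input size."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_box_shapes(scale: float,
                       aspect_ratios: Sequence[float] = (1.0, 2.0, 0.5)) -> List[Tuple[float, float]]:
    """(width, height) of the default boxes at one scale: w = s*sqrt(a), h = s/sqrt(a)."""
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


for s in default_box_scales(6):
    print(round(s, 2), [(round(w, 3), round(h, 3)) for w, h in default_box_shapes(s)])
```

Every box at a given scale has the same area (w × h = s²); only its aspect ratio changes. For the 320 × 320 input above, a scale of 0.2 corresponds to boxes about 64 pixels on a side.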

3.1.2 SSD ResNet50 V1

Similar to the previous MobileNet model, this model couples the SSD algorithm with the widely known ResNet50 feature extractor [15, 20]. As previously noted, the SSD algorithm remains one of the most efficient and fastest object category score predictors for single-class and multi-class applications. ResNet50 aims to outperform MobileNets, while, unsurprisingly, sacrificing speed to compute its features. We used an input size of 1024 × 1024, which increases training time because each sample contains more pixels, but is expected to yield improved detection performance. We also tuned a ResNet50 based model because it is one of the most widely cited ones.
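As a rough worked figure (simple arithmetic, not a measurement from the chapter), the pixel count per sample relative to the 320 × 320 input of SSD MobileNet V2 grows by

```latex
\frac{1024 \times 1024}{320 \times 320} = \frac{1\,048\,576}{102\,400} = 10.24
```

So each 1024 × 1024 sample carries about ten times as many pixels. The actual increase in training time also depends on the backbone, batch size, and number of training steps.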

3.1.3 CenterNet HourGlass104

Another model we experimented with is CenterNet HourGlass104. CenterNet is a one-stage detector based on the CornerNet architecture [32], which was known for creating bounding boxes from key-points and outperformed all single-stage detectors, such as SSD, at the time of its publication [33]. CenterNet introduces a three key-point design and novel cascade corner pooling and center pooling modules to build its detector. It improved on the MS-COCO test-dev benchmarks across the board for single-stage detectors and was comparable, if not superior, to most two-stage detectors at the time of its creation. This solution offers impressive results. The HourGlass104 feature extractor is based on the Hourglass module introduced by Newell et al. [34]. Originally used for human pose estimation, the Hourglass module was designed to capture as much information from an image as possible in order to properly identify knees, wrists, and other body parts. Given its notable image analysis performance, we decided to include it and analyze its capabilities on masked and unmasked thermal face data. We used an input size of 512 × 512 face images for training and testing.
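Center pooling, as described for the three key-point CenterNet, scores each location by adding the maximum response along its row to the maximum response along its column. This lets a center key-point draw on evidence from the full horizontal and vertical extent of an object. The following stdlib sketch applies that rule to a single 2D feature map given as plain lists. It illustrates the operation only and is not the TensorFlow implementation used in the chapter; the function name is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def center_pool(feature_map: Grid) -> Grid:
    """Return out[i][j] = max(row i) + max(column j) for a rectangular 2D feature map."""
    row_max = [max(row) for row in feature_map]
    col_max = [max(col) for col in zip(*feature_map)]
    return [[row_max[i] + col_max[j] for j in range(len(col_max))]
            for i in range(len(row_max))]


print(center_pool([[1.0, 2.0], [3.0, 4.0]]))
```

A strong response anywhere in a row or column raises the score of every cell in that row or column. For this reason, center pooling is paired with a check that a detected center key-point actually falls inside the central region of a candidate box.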

3.1.4 CenterNet ResNet50 V2

We also included a Google model that uses CenterNet's detection algorithm in order to compare ResNet versions. As with the previous model, we use CenterNet as the detector, but with ResNet50 V2 as the feature extractor [33, 35]. This version of ResNet50 improves upon V1 [15] by changing where the batch normalization and ReLU layers are placed with respect to the weight layers. By placing them before the weight layers rather than after them, as in [15], the ResNet V2 family of models achieves lower final training loss, is less prone to overfitting, and yields better performance on the CIFAR-10 and CIFAR-100 datasets. We configure our training data to a size of 512 × 512 for the purposes of this chapter. We chose this model to examine differences in performance when using an alternative feature extractor to that of Sect. 3.1.3 while keeping the same detection algorithm.
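The ordering difference can be illustrated with a toy one-dimensional residual block in which each "weight layer" is simply multiplication by a scalar, a hypothetical stand-in for a convolution. In the pre-activation (V2) ordering, batch normalization and ReLU come before each weight layer and nothing follows the addition. So when the residual branch outputs zero, the block reduces to an exact identity. In the original (V1) ordering, the ReLU after the addition still clips negative inputs. This sketch illustrates the layer ordering only and is not the ResNet50 V2 implementation.

```python
import math
from typing import Callable, List

Vec = List[float]


def batch_norm(x: Vec, eps: float = 1e-5) -> Vec:
    """Normalize to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x: Vec) -> Vec:
    return [max(0.0, v) for v in x]


def weight(w: float) -> Callable[[Vec], Vec]:
    """Toy weight layer: scales every element by w (stand-in for a convolution)."""
    return lambda x: [w * v for v in x]


def residual_v1(x: Vec, w1: float, w2: float) -> Vec:
    # Original ordering: weight -> BN -> ReLU -> weight -> BN, add shortcut, then ReLU.
    f = relu(batch_norm(weight(w1)(x)))
    f = batch_norm(weight(w2)(f))
    return relu([a + b for a, b in zip(x, f)])


def residual_v2(x: Vec, w1: float, w2: float) -> Vec:
    # Pre-activation ordering: BN -> ReLU -> weight, twice; nothing after the addition.
    f = weight(w1)(relu(batch_norm(x)))
    f = weight(w2)(relu(batch_norm(f)))
    return [a + b for a, b in zip(x, f)]


x = [-1.0, 2.0]
print(residual_v1(x, 0.0, 0.0), residual_v2(x, 0.0, 0.0))
```

With zeroed weights, the V2 block passes the input through unchanged, whereas the V1 block returns the input with its negative entry clipped to zero. This clean identity path is the property that the pre-activation ordering is designed to preserve.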

3.1.5 Faster R-CNN Inception-ResNet V2

Given several one-stage detectors and pure deep learning models, we also considered including a two-stage detector and a hybrid model. Two-stage detectors, such as the Faster R-CNN we used, work by having a region proposal network propose potential bounding boxes; in the second stage, features are extracted from these boxes for classification [36]. Single-stage detectors, such as SSD and CenterNet, do not have a region proposal network or a similar operation, and are thus considered faster and more efficient for mobile or real-time applications. Faster R-CNN builds on Fast R-CNN and R-CNN to create a well-known Convolutional Neural Network (CNN) for fast region proposal using an RPN (region proposal network), which creates a set of object boxes, each with a score, that are then fed into Fast R-CNN for Region of Interest (ROI) pooling. The feature extractor we used to generate the feature map for the Fast R-CNN module is the hybrid Inception-ResNet V2 model [15]. Most notably, this network replaces the filter concatenation stage of the traditional Inception blocks with residual connections, each followed by a ReLU (Rectified Linear Unit) activation [37], to create Inception-ResNet blocks (with some other modifications). This model was shown to have accuracy comparable to the latest version of Inception (v4) while training significantly faster. We chose this model to compare it to the traditional ResNet architecture and measure the differences. We used an input size of 640 × 640 for our experiments.
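ROI pooling turns a variable-size proposal into a fixed-size feature grid by splitting the region into output_size × output_size bins and max-pooling each bin. The sketch below shows this classic ROI max pooling on a plain 2D list, with an integer-aligned region given as half-open cell ranges. It illustrates the operation named above and is not the TensorFlow Object Detection API's implementation, which may use a crop-and-resize variant instead. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def roi_max_pool(feature_map: Grid, roi: Tuple[int, int, int, int], output_size: int) -> Grid:
    """Max-pool the region [x0, x1) x [y0, y1) of feature_map into output_size x output_size bins."""
    x0, y0, x1, y1 = roi
    if x1 <= x0 or y1 <= y0:
        raise ValueError("ROI must have positive width and height")
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(output_size):
        # Bin boundaries: floor for the start, ceil for the end, so every bin is non-empty.
        ys = y0 + math.floor(i * h / output_size)
        ye = y0 + math.ceil((i + 1) * h / output_size)
        row = []
        for j in range(output_size):
            xs = x0 + math.floor(j * w / output_size)
            xe = x0 + math.ceil((j + 1) * w / output_size)
            row.append(max(feature_map[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fmap = [[4 * y + x for x in range(4)] for y in range(4)]  # toy 4 x 4 feature map
print(roi_max_pool(fmap, (0, 0, 4, 4), 2))
```

Because every proposal is pooled to the same grid size, the second-stage classifier can use fixed-size layers regardless of how large or small the proposed face box is.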

4 Experiments and Results

This section explains in detail the datasets used, the pre-processing steps performed, the experiments completed, and the results obtained. All experiments are performed on a deep learning computer with two NVIDIA GeForce RTX 3080 Ti GPUs.

4.1 Datasets

The datasets used for all experiments in this chapter are described here. MILAB(B)-VTF is the largest MWIR-visible paired face dataset in the world to date [38]. It includes dual band (visible and thermal) face images from 400 subjects, captured under various indoor, outdoor, and stand-off distance scenarios. The subset of this dataset featured in this chapter contains face images collected under constrained indoor conditions, with subjects either wearing or not wearing face masks. In this work, the masked and unmasked face datasets are each divided into training and test sets. The unmasked training set consists of 1152 face images and the unmasked test set of 288 face images. Similarly, the masked training set consists of 1085 face images and the masked test set of 272 face images. All images are manually annotated by drawing bounding boxes around the faces; these boxes serve as training labels and ground-truth annotations and are saved as XML files. Example images and bounding box annotations are shown in Fig. 4.
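The annotation-reading and splitting steps can be sketched as follows. This assumes the XML files follow the Pascal VOC layout (object/bndbox elements with xmin, ymin, xmax, ymax), which is common in TensorFlow Object Detection API workflows but is not confirmed by the chapter. The reported counts are consistent with an 80/20 split (1152 + 288 = 1440 and 1085 + 272 = 1357). However, the chapter does not state whether the split was random or subject-disjoint, so the seeded random split below is only illustrative. All names are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def read_voc_boxes(xml_text: str) -> List[Box]:
    """Parse face bounding boxes from one Pascal VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        c = [int(float(bb.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((c[0], c[1], c[2], c[3]))
    return boxes


def split_train_test(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed and keep floor(80%) of the items for training."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 4 // 5
    return shuffled[:n_train], shuffled[n_train:]


sample = ("<annotation><object><name>face</name><bndbox><xmin>12</xmin><ymin>30</ymin>"
          "<xmax>140</xmax><ymax>190</ymax></bndbox></object></annotation>")
print(read_voc_boxes(sample))
train, test = split_train_test([f"unmasked_{i:04d}" for i in range(1440)])
print(len(train), len(test))
```

With 1440 unmasked images this gives 1152/288, and with 1357 masked images it gives 1085/272, matching the reported set sizes.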


Fig. 4 Sample failed detections from the least efficient model, i.e., SSD MobileNet V2, compared to detections on the same images by the most efficient model, i.e., Faster R-CNN Inception-ResNet V2. The blue box represents the ground-truth bounds of the face, the red box shows a failed detection, and the green box corresponds to a successful detection. Detections with IoU (Intersection over Union) ≥ 0.95 were counted as correct, and the rest as failed. This was done to illustrate the difference in performance between the two models, as well as the problems that can arise when a model is tested on a kind of data it has not been trained on


V. Philippe et al.

4.2 Experimental Protocol All five models described in Sect. 3 are fine-tuned, trained, and tested using our datasets. The models were originally pre-trained on the COCO dataset, and these pre-trained weights are frozen; the models are then re-trained using our data. Four sets of experiments are performed in this study:
– The first set consists of baseline experiments, in which unmasked face images are used to train and test the models.
– In the second set, the models are trained on unmasked data and tested on masked data. Comparing these results with those of the first set further demonstrates the need to train new models on masked data.
– In the third set, all the models are trained and tested on images of masked faces.
– In the final set, the models trained on masked data are tested on unmasked data.
All models are trained with a learning rate of 10⁻⁵. The batch size and the number of training steps varied depending on memory requirements and on the time the models needed to converge during training. SSD MobileNet V2 is always trained with a batch size of 32, and all the other models with a batch size of 4. Tables 1, 2, 3, 4, and 5 present the number of training steps, the precision and recall, and the processing time for a single detection for the SSD MobileNet V2, SSD ResNet50 V1, CenterNet HourGlass104, CenterNet ResNet50 V2, and Faster R-CNN Inception-ResNet V2 models, respectively.

Table 1 Training parameters and precision and recall metrics for the SSD MobileNet V2 model. Results are shown for the following scenarios: (1) U-U, the model trained and tested on unmasked face data; (2) U-M, trained on unmasked and tested on masked face data; (3) M-M, trained and tested on masked face data; and (4) M-U, trained on masked and tested on unmasked face data

Protocol                          U-U        U-M        M-M        M-U
Number of steps                   41.8k      41.8k      36.3k      41.8k
Precision                         0.728      0.559      0.743      0.330
Recall                            0.766      0.630      0.788      0.346
Time for single detection (sec)   0.5618652  0.6321278  0.5619196  0.6324498

Table 2 Training parameters and precision and recall metrics for the SSD ResNet50 V1 model

Protocol                          U-U        U-M        M-M        M-U
Number of steps                   41.8k      41.8k      41.8k      41.8k
Precision                         0.899      0.238      0.899      0.589
Recall                            0.927      0.227      0.928      0.634
Time for single detection (sec)   0.784428   0.866674   0.777717   0.878021

Table 3 Training parameters and precision and recall metrics for the CenterNet HourGlass104 model

Protocol                          U-U        U-M        M-M        M-U
Number of steps                   41.8k      41.8k      41.8k      41.8k
Precision                         0.924      0.816      0.889      0.717
Recall                            0.947      0.822      0.922      0.743
Time for single detection (sec)   0.949356   1.070911   0.957466   1.060222

Table 4 Training parameters and precision and recall metrics for the CenterNet ResNet50 V2 model

Protocol                          U-U        U-M        M-M        M-U
Number of steps                   41.8k      41.8k      23.0k      23.0k
Precision                         0.863      0.245      0.894      0.655
Recall                            0.884      0.268      0.921      0.672
Time for single detection (sec)   0.2060142  0.2359750  0.2104698  0.2324458

Table 5 Training parameters and precision and recall metrics for the Faster R-CNN Inception-ResNet V2 model

Protocol                          U-U        U-M        M-M        M-U
Number of steps                   25.0k      25.0k      35.0k      35.0k
Precision                         0.953      0.841      0.978      0.831
Recall                            0.969      0.871      0.987      0.856
Time for single detection (sec)   1.3513816  1.5306270  1.3564610  1.5248608

All precision and recall values are computed at an IoU (Intersection over Union) threshold of 0.95.
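Computing precision and recall at an IoU threshold of 0.95 requires a rule for pairing detections with ground-truth boxes, which the chapter does not spell out. A minimal sketch, assuming the common greedy rule: detections are sorted by descending confidence, each is matched to the best still-unmatched ground-truth box with IoU at or above the threshold, duplicates and low-overlap detections count as false positives, and unmatched ground truths count as false negatives. The function name and the IoU values are hypothetical.

```python
def precision_recall(iou_matrix, n_ground_truth, threshold=0.95):
    """Greedy one-to-one matching of detections to ground-truth boxes.

    iou_matrix[i][j] is the IoU between detection i and ground-truth box j;
    rows must already be sorted by descending detection confidence.
    """
    matched = [False] * n_ground_truth
    true_pos = false_pos = 0
    for row in iou_matrix:
        best_j, best_iou = -1, threshold
        for j, value in enumerate(row):
            # Only still-unmatched ground truths at or above the threshold qualify.
            if not matched[j] and value >= best_iou:
                best_j, best_iou = j, value
        if best_j >= 0:
            matched[best_j] = True
            true_pos += 1
        else:
            false_pos += 1  # duplicate detection or insufficient overlap
    false_neg = n_ground_truth - true_pos
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    # Three detections, two ground-truth faces (illustrative numbers only).
    ious = [
        [0.97, 0.00],  # matches face 0
        [0.96, 0.00],  # duplicate on face 0 -> false positive
        [0.00, 0.90],  # below 0.95 -> false positive; face 1 becomes a false negative
    ]
    print(precision_recall(ious, 2))                 # strict threshold
    print(precision_recall(ious, 2, threshold=0.5))  # lenient threshold
```

At 0.95 this toy case yields precision 1/3 and recall 1/2; relaxing the threshold to 0.5 raises them to 2/3 and 1, which illustrates how demanding a 0.95 threshold is.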

4.3 Results From Table 1, it can be seen that when the SSD MobileNet V2 model is trained and tested on unmasked face images, its precision and recall are 72.8% and 76.6%, respectively. When this model is tested on images of masked faces, precision and recall drop to 55.9% and 63.0%, respectively. This further supports the need to develop and test new models that can operate on datasets with masked faces. When this model is trained and tested on masked face images, precision and recall are 74.3% and 78.8%, respectively, which is higher than when it is trained and tested on unmasked faces. This pattern holds for most of the trained models: CenterNet HourGlass104 is the exception, and SSD ResNet50 V1 performs almost identically in both cases. In the last set of experiments, where the models are trained on masked data and tested on unmasked data, the models behave quite differently from one another. SSD MobileNet V2 yields the lowest precision and recall, at 33.0% and 34.6%, respectively, and Faster R-CNN Inception-ResNet V2 the highest, at 83.1% and 85.6%. When masked face images are used for both training and testing, Faster R-CNN Inception-ResNet V2 yields the highest precision and recall, 97.8% and 98.7%, respectively, while SSD MobileNet V2 yields the lowest, 74.3% and 78.8%. Overall, Faster R-CNN Inception-ResNet V2 achieves the highest accuracy in every set of experiments, whereas SSD MobileNet V2 achieves the lowest in every set except the second. When the models trained on unmasked data are tested on masked data, SSD ResNet50 V1 performs worst, with 23.8% precision and 22.7% recall, followed by CenterNet ResNet50 V2 with 24.5% precision and 26.8% recall. Figure 5 shows the precision of all the models for all the scenarios tested; the Faster R-CNN Inception-ResNet V2 model yields the highest precision in every scenario. Figure 6 shows the corresponding recall plot, in which the same model again yields the highest recall in every scenario. Looking at the percentage changes in precision, recall, and inference time across all the models together (Figs. 7 and 8), performance changes sharply whenever a model is given data unlike the data it was trained on: precision and recall decline by as much as 75%, and inference time increases by roughly 10–15%. Example detections from SSD MobileNet V2 and Faster R-CNN Inception-ResNet V2, together with the ground-truth boxes, are shown in Fig. 4.
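The percentage changes summarized in Figs. 7 and 8 can be cross-checked against Tables 1–5. The sketch below assumes the simple relative change 100 × (after − before) / before between the U-U and U-M columns; the figures may have been computed differently, so treat this only as an approximate check. The values are copied from the tables; the function names are hypothetical.

```python
# (precision, recall, seconds per detection) copied from Tables 1-5.
RESULTS = {
    "SSD MobileNet V2": {"U-U": (0.728, 0.766, 0.5618652),
                         "U-M": (0.559, 0.630, 0.6321278)},
    "SSD ResNet50 V1": {"U-U": (0.899, 0.927, 0.784428),
                        "U-M": (0.238, 0.227, 0.866674)},
    "CenterNet HourGlass104": {"U-U": (0.924, 0.947, 0.949356),
                               "U-M": (0.816, 0.822, 1.070911)},
    "CenterNet ResNet50 V2": {"U-U": (0.863, 0.884, 0.2060142),
                              "U-M": (0.245, 0.268, 0.2359750)},
    "Faster R-CNN Inception-ResNet V2": {"U-U": (0.953, 0.969, 1.3513816),
                                         "U-M": (0.841, 0.871, 1.5306270)},
}


def relative_change(before, after):
    """Signed relative change in percent (negative = decrease)."""
    return 100.0 * (after - before) / before


def unmasked_to_masked_changes(results):
    """Percent change in precision, recall, and time from U-U to U-M per model."""
    changes = {}
    for model, cols in results.items():
        changes[model] = tuple(
            relative_change(before, after)
            for before, after in zip(cols["U-U"], cols["U-M"])
        )
    return changes


if __name__ == "__main__":
    for model, (d_prec, d_rec, d_time) in unmasked_to_masked_changes(RESULTS).items():
        print(f"{model:34s} precision {d_prec:+6.1f}%  "
              f"recall {d_rec:+6.1f}%  time {d_time:+5.1f}%")
```

Under this definition, the largest drop is in the recall of SSD ResNet50 V1 (about 75.5%), and the inference-time increases range from about 10.5% (SSD ResNet50 V1) to 14.5% (CenterNet ResNet50 V2), consistent with the 14.54% quoted for CenterNet ResNet50 V2 in Sect. 5.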

Fig. 5 Precision plot for all the models trained and tested on all the scenarios: (1) SSD MobileNet V2; (2) SSD ResNet50 V1; (3) CenterNet HourGlass104; (4) CenterNet ResNet50 V2; (5) Faster R-CNN Inception-ResNet V2. It can be seen that the Faster R-CNN Inception ResNet V2 model results in the highest precision in all the scenarios investigated


Fig. 6 Recall plot for all the models trained and tested on all the scenarios: (1) SSD MobileNet V2; (2) SSD ResNet50 V1; (3) CenterNet HourGlass104; (4) CenterNet ResNet50 V2; (5) Faster R-CNN Inception-ResNet V2. It can be seen that the Faster R-CNN Inception ResNet V2 model resulted in the highest recall in all the scenarios

For the models trained and tested on unmasked data, failed detections were largely due to the angle of the face relative to the camera, to which the model failed to generalize. The Inception-ResNet model was much more accurate in locating the face across a range of poses, not only full-frontal ones. For the models trained on unmasked data and tested on masked data, MobileNet had trouble interpreting the mask because it had not been trained on masked faces. The successful detectors were able to extrapolate the structure of the face from the priors they established in training. However, this category always showed an increase in inference time, likely due to the uncertainty the models faced when dealing with unfamiliar data. The occlusion of facial features such as the mouth, which would otherwise have helped the models locate the chin, is reflected in the shortened detection boxes, as in the U-M row of Fig. 4. Regarding the models trained and tested on masked data, SSD MobileNet V2 produced a considerably less accurate and larger bounding box for the face than Faster R-CNN Inception-ResNet V2, which may again be due to poor generalization. Figure 4 shows this in the M-M row, where the red MobileNet box covers more area than the green Inception-ResNet box or the blue ground-truth box.


Fig. 7 This collection of box plots shows the collective drop in performance when models are tested on data they are not familiar with. Unmasked Precision and Unmasked Recall refer to the percentage drops in the models' performance between the unmasked-train, unmasked-test setting and the unmasked-train, masked-test setting. Masked Precision and Masked Recall are defined analogously, with the roles of masked and unmasked data reversed. The plot shows that the drop varies widely across models (up to 75%), demonstrating the need for a model capable of handling masked face data

Fig. 8 This collection of box plots shows the collective increase in inference time when models are tested on data they are not familiar with. Unmasked refers to the difference in inference time between a model trained and tested on unmasked data and the same model trained on unmasked data but tested on masked faces. Masked is defined analogously, with the roles of masked and unmasked data reversed. Inference time increases considerably across the board, which can have serious real-world implications in applications such as school or workplace security, where every second or half second can make the difference between a criminal wreaking havoc and a life saved


As for the last category (models trained on masked data and tested on unmasked data), MobileNet struggled with the angle of the face, while Inception-ResNet produced a detection box much closer to the ground truth. The success of a large model such as the Faster R-CNN based one, compared with the MobileNet model, may be largely due to the sheer difference in model size. Figure 4 again shows this in the M-U row, where the red box cuts through the right eye of the subject, which is not the case for the ground truth or the Faster R-CNN Inception-ResNet V2 box. The Inception-ResNet architecture is known to have many more parameters and to consume more memory than MobileNet [39]. This argument can be extended to the ResNet50 based models, which are larger than MobileNets but smaller than Inception-ResNets. The HourGlass model is more interesting, since it has outperformed the Faster R-CNN detector in the past [33]. However, in our experiments it fell short. This may be due to over-fitting, given the number of steps it was trained for, as well as to suboptimal hyper-parameters.

4.4 Face Recognition Experiments The main purpose of developing a face detection model is to help a face recognition model by successfully localizing the face. Therefore, to further demonstrate the importance of automatic face detection, same-spectrum, same-scenario face recognition experiments are performed using both the manual and the automated annotations, and the results are compared and discussed. The gallery set always consists of manually cropped masked faces, one image per subject. The first probe set also consists of one manually cropped image per subject. The second probe set consists of masked face images cropped using the bounding boxes obtained from the model with the highest accuracy metrics, i.e., the Faster R-CNN Inception-ResNet V2 model. FaceNet [40] and VGGFace [41] are used to perform these experiments. The rank-1 face identification accuracy of these models is presented in Table 6, and Figs. 9 and 10 illustrate the CMC (Cumulative Match Characteristic) curves. Table 6 shows that face recognition accuracy is comparable whether manual or automated annotations are used, while automated detection has the advantage of requiring no operator intervention, saving hours or days of manual work.

Table 6 Rank-1 face identification accuracy for masked data. Faces are cropped using manual annotations and using the detected bounding boxes on the test set

Annotation      FaceNet   VGGFace
Manual (%)      79.22     100
Proposed (%)    76.62     100


Fig. 9 CMC curves using FaceNet face recognition on faces cropped manually and faces cropped using automatically detected bounding boxes. The face recognition accuracy obtained with automated face detection is very close to that obtained with the manually annotated images

Figure 9 shows the CMC curves obtained using the pre-trained FaceNet face recognition model. The curves for the two approaches almost overlap, demonstrating the effectiveness of automatic face detection. Figure 10 shows the CMC curves obtained using the pre-trained VGGFace model, where the curves for the manual and automated annotations overlap.
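The rank-1 accuracies in Table 6 and the CMC curves in Figs. 9 and 10 follow from ranking gallery identities by their similarity to each probe. A minimal sketch, assuming a closed-set protocol in which every probe identity appears in the gallery (one image per subject, as described above) and a precomputed similarity matrix, for example cosine similarity between FaceNet or VGGFace embeddings; the similarity measure, function name, and scores are assumptions, since the chapter does not state them.

```python
def cmc_curve(similarity, probe_ids, gallery_ids, max_rank=None):
    """Closed-set CMC: entry k-1 is the fraction of probes whose true identity
    appears among the top-k most similar gallery entries."""
    n_gallery = len(gallery_ids)
    max_rank = max_rank or n_gallery
    hits = [0] * max_rank
    for scores, true_id in zip(similarity, probe_ids):
        # Gallery indices from most to least similar to this probe.
        order = sorted(range(n_gallery), key=lambda j: scores[j], reverse=True)
        rank = [gallery_ids[j] for j in order].index(true_id)  # 0-based
        # A hit at rank r also counts as a hit at every larger rank.
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(probe_ids) for h in hits]


if __name__ == "__main__":
    gallery = ["A", "B", "C"]
    probes = ["A", "B", "C"]
    sim = [
        [0.9, 0.2, 0.1],  # probe A: correct identity ranked first
        [0.3, 0.4, 0.8],  # probe B: "C" outranks the true identity
        [0.1, 0.2, 0.7],  # probe C: correct identity ranked first
    ]
    cmc = cmc_curve(sim, probes, gallery)
    print("rank-1:", cmc[0], "CMC:", cmc)
```

In this toy example, two of three probes are identified at rank 1 (rank-1 accuracy 2/3) and all three by rank 2, so the curve rises to 1.0; the first value of the curve is the rank-1 accuracy of the kind reported in Table 6.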

5 Conclusions and Future Work This book chapter addresses the problem of face detection in the MWIR spectrum when faces are occluded by face masks, a case study relevant to the pandemic era. Instead of training models from scratch, transfer learning is used, in which frozen weights from pre-trained models are used to train new models. The pre-trained models were trained on the COCO dataset, which contains hundreds of thousands of images. Since the academically available datasets are not large enough to train models from scratch, transfer learning is the preferred approach. The models are first trained and tested on unmasked face images, which resulted in high precision and recall for all the models. When these models are then tested on masked face images, precision and recall drop significantly.


Fig. 10 CMC curves using VGGFace face recognition on faces cropped manually and faces cropped using detected annotations. The face recognition accuracy obtained with automated face detection is equal to that obtained with the manually annotated images

For instance, the precision and recall of CenterNet ResNet50 V2 trained and tested on unmasked data are 86.3% and 88.4%, respectively. Performance dropped to 24.5% and 26.8%, respectively, when the model was tested on masked data, with inference time increasing by 14.54%. The model trained and tested on masked data then achieved 89.4% precision and 92.1% recall, with an inference time similar to that of the first scenario. This demonstrates the necessity of training and testing new models in order to perform face recognition and related tasks on masked face images without such challenges. Real-life face detection applications such as video security require high detection accuracy and fast response times in order to minimize false negatives. The face recognition results further demonstrate the importance of automatic face detection in the MWIR spectrum. This work can be further extended to scenarios where the data are collected in the visible spectrum under constrained conditions, as well as to scenarios where the data are collected in both the thermal and visible spectra under unconstrained outdoor settings. With sufficient data (preferably real rather than synthesized), a unified model for each band that detects human faces at each distance could be developed using masked face data, similar to MT-CNN (Multi-Task Cascaded Convolutional Networks).
Acknowledgments This work was partially supported by an STTR Phase II contract W911QX20C0022 from the US Army Research Laboratory, Adelphi, MD.


References
1. Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on faces in 'real-life' images: detection, alignment, and recognition
2. Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5525–5533
3. Bourlai T (2016) Face recognition across the imaging spectrum. Springer, Berlin
4. Kabakus AT et al (2019) An experimental performance comparison of widely used face detection tools. Adv Distrib Comput Artif Intell J 8:5–12
5. Lin K, Zhao H, Lv J, Li C, Liu X, Chen R, Zhao R (2020) Face detection and segmentation based on improved mask r-cnn. Discrete Dyn Nat Soc 2020:1–11
6. Zheng Y, Zhu C, Luu K, Bhagavatula C, Le THN, Savvides M (2016) Towards a deep learning framework for unconstrained face detection. In: 2016 IEEE 8th international conference on biometrics theory, applications and systems (BTAS). IEEE, Piscataway, pp 1–8
7. Guo J, Xu J, Liu S, Huang D, Wang Y (2016) Occlusion-robust face detection using shallow and deep proposal based faster R-CNN. In: Chinese conference on biometric recognition. Springer, Berlin, pp 3–12
8. Qin H, Yan J, Li X, Hu X (2016) Joint training of cascaded CNN for face detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3456–3465
9. Jiang H, Learned-Miller E (2017) Face detection with the faster R-CNN. In: 2017 12th IEEE international conference on automatic face and gesture recognition (FG 2017). IEEE, Piscataway, pp 650–657
10. Sun X, Wu P, Hoi SC (2018) Face detection using deep learning: an improved faster RCNN approach. Neurocomputing 299:42–50
11. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. Preprint arXiv:1804.02767
12. Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of international conference on computer vision (ICCV)
13. He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV)
14. Jain V, Learned-Miller E (2010) FDDB: a benchmark for face detection in unconstrained settings. UMass Amherst technical report, Technical report
15. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
16. Ejaz MS, Islam MR, Sifatullah M, Sarker A (2019) Implementation of principal component analysis on masked and non-masked face recognition. In: 2019 1st international conference on advances in science, engineering and robotics technology (ICASERT), pp 1–5
17. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 167:108288
18. Nagrath P, Jain R, Madan A, Arora R, Kataria P, Hemanth J (2021) SSDMNV2: a real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustainable Cities Soc 66:102692
19. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
20. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Berlin, pp 21–37
21. Chavda A, Dsouza J, Badgujar S, Damani A (2021) Multi-stage CNN architecture for face mask detection. In: 2021 6th international conference for convergence in technology (I2CT), pp 1–8
22. Osia N, Bourlai T (2012) Holistic and partial face recognition in the MWIR band using manual and automatic detection of face-based features. In: 2012 IEEE conference on technologies for homeland security (HST). IEEE, Piscataway, pp 273–279
23. Mokalla SR, Bourlai T (2020) Face detection in MWIR spectrum. In: Securing social identity in mobile platforms. Springer, Berlin, pp 145–158
24. Bourlai T, Cukic B (2012) Multi-spectral face recognition: identification of people in difficult environments. In: 2012 IEEE international conference on intelligence and security informatics. IEEE, Piscataway, pp 196–201
25. Bourlai T, Ross A, Chen C, Hornak L (2012) A study on using mid-wave infrared images for face recognition. In: Sensing technologies for global health, military medicine, disaster response, and environmental monitoring II; and biometric technology for human identification IX, vol 8371. International Society for Optics and Photonics, Baltimore, p 83711K
26. Abaza A, Bourlai T (2013) On ear-based human identification in the mid-wave infrared spectrum. Image Vision Comput 31(9):640–648
27. Narang N, Bourlai T (2015) Face recognition in the SWIR band when using single sensor multi-wavelength imaging systems. Image Vision Comput 33:26–43
28. Mokalla SR, Bourlai T (2019) On designing MWIR and visible band based deep face detection models. In: 2019 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, Piscataway, pp 1140–1147
29. Mokalla SR (2020) Deep learning based face detection and recognition in MWIR and visible bands. West Virginia University, Morgantown
30. Bourlai T, Jafri Z (2011) Eye detection in the middle-wave infrared spectrum: towards recognition in the dark. In: 2011 IEEE international workshop on information forensics and security. IEEE, Piscataway, pp 1–6
31. Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7310–7311
32. Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
33. Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 6569–6578
34. Newell A, Yang K, Deng J, Stacked hourglass networks for human pose estimation. In: European conference on computer vision. Springer, Berlin, pp 483–499
35. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, Berlin, pp 630–645
36. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neur Inf Process Sys 28:91–99
37. Agarap AF (2018) Deep learning using rectified linear units (ReLU). CoRR, vol abs/1803.08375 [Online]
38. Peri N, Gleason J, Castillo CD, Bourlai T, Patel VM, Chellappa R (2021) A synthesis-based approach for thermal-to-visible face verification. In: 2021 16th IEEE international conference on automatic face and gesture recognition (FG 2021). IEEE, Piscataway, pp 01–08
39. Bianco S, Cadene R, Celona L, Napoletano P (2018) Benchmark analysis of representative deep neural network architectures. IEEE Access 6:64270–64277
40. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823
41. Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. In: Proceedings of the British Machine Vision Conference (BMVC), pp 41.1–41.12

Multispectral Face Mask Compliance Classification During a Pandemic
Jacob Rose, Haiying Liu, and Thirimachos Bourlai

Abstract In this paper, we investigate the problem of face mask compliance classification in response to the Coronavirus Disease 2019 (COVID-19) pandemic. Since the start of the pandemic, many governments and businesses have continually updated their policies to help slow the spread of the virus, including requiring face masks for access to many public and private services. In response to these policies, many researchers have developed new face detection and recognition techniques for masked faces. Many of the approaches proposed for masked face detection have been relatively successful. However, they almost exclusively focus on detecting the presence or absence of a face mask. It is widely reported in the media and elsewhere that people do not always follow the guidelines provided by public health authorities for wearing face masks, which include ensuring that the mask properly covers the nose and mouth. To date, very few publications investigate the capability of modern classification algorithms to distinguish between masked faces in the visible band that are compliant or non-compliant with these guidelines. Furthermore, to the best of our knowledge, no publication in the open literature focuses on face mask classification in the thermal band. As thermal sensors continue to improve in quality and decrease in cost, surveillance applications using thermal imagery are expected to grow and to benefit organizations considering the automation of face mask detection and compliance monitoring. In this study, we investigate face mask compliance in both the thermal and visible bands.
The study is composed of the following salient steps: (1) the creation of a multispectral masked face database from subjects wearing or not wearing face masks, (2) the augmentation of the generated database with synthetic face masks to simulate two different levels of non-compliant mask wearing, and (3) the assessment of a variety of CNN architectures on the augmented database to investigate any differences between classifying thermal and visible masked faces. Experimental results show that face mask compliance classification in both studied bands yields a classification accuracy that reaches 100% for most of the models studied, when experimenting on frontal face images captured at short distances with adequate illumination.
Keywords Face masks · COVID-19 · Pandemic · Face mask classification · Deep learning · Mask compliance · Thermal imaging · Multi-spectral Imaging

J. Rose · H. Liu · T. Bourlai Multispectral Imagery Lab (MILAB), ECE, University of Georgia, Athens, GA, USA e-mail: [email protected]; [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_10

1 Introduction As of August 2021, the ongoing COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in 200,840,180 confirmed cases and 4,265,903 deaths worldwide, including 35,229,302 confirmed cases and 610,180 deaths in the United States alone. According to the World Health Organization (WHO), wearing face masks helps prevent our respiratory droplets from reaching others. In addition, wearing a face mask over the nose and mouth reduces the spray of droplets. In response to this recommendation, governments around the world launched a set of initiatives, including some that use machine learning techniques to detect whether passengers in metro stations are wearing face masks. Many other members of the artificial intelligence community have also started developing automatic face mask detection models that can aid in monitoring and screening face mask usage. However, most current models focus only on detecting whether a mask is present in a face image. According to the mask guidelines provided by the Centers for Disease Control and Prevention (CDC), face masks should (1) have two or more layers of washable, breathable fabric, (2) completely cover the nose and mouth, (3) fit snugly against the sides of the face with no gaps, and (4) have a nose wire to prevent air from leaking out of the top of the mask. Therefore, methods that only detect the presence of a mask will fail to identify subjects who are wearing their mask improperly and are thus not complying with CDC guidelines. It is important to note that our work uses 2D face images, and we focus on detecting whether guideline (2) above is being followed.
More specifically, we want to determine whether a detected face is "compliant", where a face mask is properly worn over the nose and mouth, or "non-compliant", where the detected face has (a) no face mask, (b) a mask worn below the nose, or (c) a mask worn below the chin. While we cannot account for all types of non-compliance, especially when the mask otherwise fits snugly against the sides of the face without gaps, scenarios (b) and (c) appear to be the most common cases of non-compliant mask wearing observed during the pandemic. In the remainder of this work, compliance with face mask guidelines refers specifically to the automated monitoring and detection of face masks that properly cover the nose and mouth areas.
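The four cases above collapse into a single binary classification target. A minimal sketch of that mapping, with hypothetical label names (the chapter does not specify how its annotations are encoded):

```python
from enum import Enum


class MaskState(Enum):
    """Hypothetical annotation labels for the four cases described above."""
    OVER_NOSE_AND_MOUTH = "mask properly worn over the nose and mouth"
    NO_MASK = "no face mask"
    BELOW_NOSE = "mask worn below the nose"
    BELOW_CHIN = "mask worn below the chin"


def is_compliant(state: MaskState) -> bool:
    """Only guideline (2) is checked: the mask must cover both nose and mouth."""
    return state is MaskState.OVER_NOSE_AND_MOUTH


def to_binary_labels(states):
    """Map annotations to 1 (compliant) / 0 (non-compliant) training targets."""
    return [int(is_compliant(s)) for s in states]


if __name__ == "__main__":
    sample = [MaskState.OVER_NOSE_AND_MOUTH, MaskState.BELOW_NOSE,
              MaskState.NO_MASK, MaskState.BELOW_CHIN]
    print(to_binary_labels(sample))
```

Keeping the three non-compliant causes as separate labels, rather than a single "non-compliant" class, leaves room to analyze later which kind of non-compliance a classifier confuses most often.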


As more businesses and government agencies mandate mask usage in the workplace or in certain public venues, there is a need for methods that automate the monitoring of face mask compliance. Although several works have been proposed since the start of the pandemic, very few address different levels of face mask non-compliance, including (a) no face mask present, (b) face mask below the nose, and (c) face mask below the mouth and nose. To our knowledge, only one large-scale publicly available dataset exists with visible band face images annotated for masks that are present but not worn correctly [3]. We have found no work extending face mask compliance classification to the thermal band, specifically the middle wave infrared (MWIR) band. MWIR sensors operate in the passive IR band, in the 3.0–5.0 μm spectral range. Operating in the MWIR band has many benefits, with applications including biometrics and biomedicine [4–7, 28, 29, 32, 33]. Passive IR sensors need no external light source; instead, they detect the IR radiation emitted from objects as heat. In addition to being tolerant of commonly encountered environmental conditions such as fog, smoke, and dust, MWIR imaging sensors are ideal for low-light and night-time environments. Any operational scenario with less-than-ideal lighting conditions can greatly benefit from the use of MWIR imaging sensors. In this chapter, we focus on classifying faces as either compliant (the mask properly covers the nose and mouth) or non-compliant (no face mask, or a mask placed below the nose or mouth). In addition to evaluating classification performance on visible band face images, we also investigate how different CNN architectures perform on face images of subjects wearing face masks captured with an MWIR imaging sensor.
The main contributions of our work are the following:
– The creation of a multispectral masked face database (MMF-DB) of 100 subjects with various levels of compliant and non-compliant mask wearing in the visible and MWIR bands.
– The augmentation of the MMF-DB with synthetically applied masks at two levels of non-compliance: masks placed below the nose and below the mouth.
– The assessment of the performance of nine well-established CNN architectures on masked and unmasked face images.
– The development of an efficient deep learning based approach for classifying face images as either compliant or non-compliant in either the visible or the thermal band.
Experimental results show that face mask compliance classification in both studied bands yields a classification accuracy that reaches 100% for most of the models studied, when experimenting on frontal face images captured at short distances with adequate illumination.
The main contributions of our work are the following: – The creation of a multispectral masked face (MMF-DB) database of 100 subjects with various levels of non-compliant and compliant mask wearing in the visible and MWIR bands. – The augmentation the MMF-DB with synthetically applied masks at two levels of non-compliance, masks placed below the nose and below the mouth. – The assessment of the performance of nine well-established CNN architectures on masked and unmasked face images. – The development of an efficient deep learning based approach on solving the problem of classifying face images wearing masks as either compliant or noncompliant when operating in either the visible or thermal bands. Experimental results show that face mask compliance classification in both studied bands yield a classification accuracy that reaches 100% for most models studied, when experimenting on frontal face images captured at short distances with adequate illumination.


2 Related Work With the outbreak of COVID-19, there has been increasing interest in face mask research for face recognition as well as for other face detection and classification tasks. In this section, we review this work and highlight a gap in the literature: face mask classification by varying degrees of non-compliance, and classification in the thermal spectrum.

2.1 Masked Face Recognition

One of the major challenges in still image-based face recognition is partial occlusion of the face in unconstrained environments [15]. While a large body of literature is dedicated to the facial occlusion problem, several publications since the COVID-19 outbreak focus specifically on face masks. One of the first studies was performed by the National Institute of Standards and Technology (NIST) on the performance of face recognition algorithms on masked faces [31]. All algorithms used in the study were provided to NIST before the pandemic, thus offering a verification benchmark for algorithms not specifically developed to handle masked face images. The occlusions were created by synthetically applying masks of different shapes, colors, and nose coverage to the probe images. Experimental results show that using masked probe images substantially decreased the overall accuracy of all algorithms, and that masks covering more of the face resulted in more false non-matches. In [11], the authors also assess the effects of face masks, using their own database designed to simulate realistic use cases of people with and without masks. They assessed two high-performing academic algorithms and one of the most efficient Commercial Off-the-Shelf (COTS) algorithms, finding that masks have a large impact on the score separability between genuine and impostor comparisons for all three methods. Their dataset was extended in [12] to include more participants with real and simulated masks. They compared the effect of masked faces on verification performance by evaluating 12 human experts and 4 popular face recognition algorithms. Among other observations, they found that human experts and the algorithms are similarly affected when comparing masked probes to unmasked references or when comparing pairs of masked faces.
More recently, the Masked Face Recognition Competition was held at the 2021 International Joint Conference on Biometrics and summarized in [8]. The competition included 10 teams from academia and industry from nine different countries. The submissions were evaluated on a database of individuals wearing real face masks in two scenarios: masked vs. masked face verification and masked vs. non-masked face verification. Ten of the 18 solutions submitted by the teams achieved a lower verification error than the ArcFace baseline.

Multispectral Face Mask Compliance Classification During a Pandemic

Another challenge is face recognition in the thermal band, which involves either same-spectral face matching (thermal to thermal) or cross-spectral face matching (visible against thermal). Face recognition in the MWIR band [4, 7, 17, 28, 33] is an active area of research that can be applied to a variety of surveillance and law enforcement applications. The face recognition pipeline is often similar to that in the visible band: faces must be detected, normalized, and matched. These challenges are addressed in [28], where face detectors are trained and assessed on thermal data captured at 5 and 10 m, both indoors and outdoors. Same-spectral, cross-scenario (indoor vs. outdoor) face recognition is then used to compare faces detected by the trained models against the annotated ground-truth faces. We recognize that face recognition with mask occlusions in the visible and thermal bands is a new challenge during the pandemic, and that the ability to detect and classify occluded faces across different spectra is a prerequisite for face matching. Therefore, the relevant literature on masked face detection and classification is presented next.

2.2 Mask Detection and Classification

Interest in the detection and classification of masked faces has increased since the start of the COVID-19 pandemic. Earlier works [14, 42] used the term “masked” to describe faces that are occluded in some way, not necessarily by a homemade or medical face mask. More recent works focus only on the localization and classification of medical or cloth face mask occlusions. Of the recent face mask literature, most works detect either the presence or absence of a mask on a human face [1, 10, 20, 21, 25–27, 36, 39, 40], not whether it is being worn correctly. In [20], one of the first dedicated face mask detectors, RetinaFaceMask, was proposed. This one-stage detector uses a feature pyramid network with two novel additions to improve masked face detection: a context attention detection head that focuses on detecting masks, and a cross-class object removal algorithm that removes objects with low confidence and high IoU scores. The authors assessed two backbone architectures, namely a ResNet architecture for high-computation scenarios and a MobileNet architecture for low-computation scenarios, and achieved state-of-the-art results on images from the MAFA [14] and WIDER Face [44] datasets. In [36], an automated system to detect persons not wearing a face mask in a smart city network was proposed. The authors created their own CNN-based detection architecture that can accurately detect faces with masks; its decisions are then forwarded to the proper authority to ensure that precautionary measures are being maintained. An ensemble face mask detector that combines a one-stage and a two-stage detector during pre-processing was created in [39], resulting in high accuracy and low inference time. The inclusion of a bounding box transformation that improved mask localization allowed their model to achieve higher precision in face and mask detection than the previously mentioned RetinaFaceMask detector. The authors also address the large class imbalance of the MAFA dataset by creating a balanced version whose imbalance ratio is nearly equal to one.

Loey et al. proposed two different face mask detection methods. In [26], classical machine learning and deep learning methods are combined for accurate mask detection on three masked face datasets. ResNet50 [16] was used for feature extraction, and SVM, decision tree, and ensemble algorithms were used for classification. The proposed model outperformed related works, with the SVM achieving 99.64% accuracy on the Real-World Masked Face Dataset [43]. In [25], the authors use a conventional deep learning approach for face mask detection, again using ResNet50 for feature extraction, this time together with the YOLOv2 [37] detector. Using the Adam optimizer and mean IoU to estimate the number of anchor boxes, they achieved better results than the related work reported in their paper.

Some other works, from which we draw inspiration, detect not only whether a face mask is present but also whether it is being worn correctly [3, 9, 35]. In [35], SRCNet was proposed for face mask detection. The method consists of a super-resolution network and a face mask condition classification network that distinguishes three face mask wearing conditions: no face mask, incorrect face mask wearing, and correct face mask wearing. SRCNet applies super-resolution to cropped faces whose width or height is no more than 150 pixels. After the network enhances the image to an output size of 224 × 224 × 3, face mask condition classification is performed. An ablation study showed that both transfer learning and the super-resolution network contributed substantially to the accuracy of SRCNet. Chavda et al. [9] used a two-stage CNN architecture for detecting masked and unmasked faces that can be applied to CCTV footage. The authors constructed their own database from several publicly available masked face datasets and online sources. They noted that the dataset contains improperly worn face masks and palms covering the face, and labeled those instances as non-masked faces. After training several face detectors and classifiers, they obtained the highest scores with the RetinaFace face detector and the NASNetMobile classifier. Results on video data were further improved with a modified version of the centroid tracking technique from [30].

One of the largest studies investigating the proper wearing of face masks was carried out in [3]. The authors investigate three research questions: (a) how well do existing face detectors perform on masked face images, (b) is it possible to detect compliant placement of face masks, and (c) are existing face mask detection techniques useful for monitoring applications during the current pandemic. To address these questions, they performed a comprehensive examination of seven pre-trained face detection models for masked face detection and 15 classification models for correct face mask placement. For this study, the authors also created the Face-Mask-Label Dataset (FMLD), compiled from the MAFA and WIDER Face datasets. Because most existing techniques only detect the presence of a face mask, FMLD is annotated for compliant and non-compliant face masks, and it is publicly available. By evaluating the face detection and classification stages separately, the authors found that RetinaFace and ResNet152 yield the highest performance. Their results indicated that masked faces are a challenge for most face detectors, but RetinaFace achieved an average precision of 92.93% on the entire dataset. The classification models performed better, with all methods achieving an average recognition accuracy of over 97% and only a 1.12% difference between the least and most accurate models.

Because of the current pandemic, the detection and classification of face masks has been studied with increasing urgency. However, there are still relatively few publications in the area, especially works that classify different levels of face mask compliance. Furthermore, at the time of this writing, we could not find any literature classifying face mask compliance in the thermal band. Given the limited literature in this area, in this work we focus on the classification of face mask compliance in both the visible and MWIR bands. In the next section, we describe our methodological approach.
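The cross-class object removal step attributed to RetinaFaceMask [20] above is only described in one sentence, so the following Python sketch is our hedged reading of it, not the authors' implementation. Detections are visited from highest to lowest confidence, and a detection is dropped when it strongly overlaps an already-kept detection of a different class (for example, a "face" box sitting on top of a "mask" box). The box format, the label strings, the function names, and the 0.5 IoU threshold are all illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1
Detection = Tuple[Box, float, str]       # (box, confidence score, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cross_class_removal(dets: List[Detection], iou_thresh: float = 0.5) -> List[Detection]:
    """Visit detections from highest to lowest score and drop a detection if it
    overlaps an already-kept detection of a *different* class with IoU above
    iou_thresh. Same-class overlaps are left to ordinary NMS."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][1], reverse=True)
    kept: List[int] = []
    for i in order:
        box, _, label = dets[i]
        clashes = any(
            dets[j][2] != label and iou(box, dets[j][0]) > iou_thresh for j in kept
        )
        if not clashes:
            kept.append(i)
    return [dets[i] for i in sorted(kept)]  # preserve the input order


detections = [
    ((0, 0, 10, 10), 0.9, "mask"),
    ((1, 1, 10, 10), 0.4, "face"),    # same region, other class, lower score: removed
    ((20, 20, 30, 30), 0.8, "face"),  # separate face: kept
]
print(cross_class_removal(detections))
# [((0, 0, 10, 10), 0.9, 'mask'), ((20, 20, 30, 30), 0.8, 'face')]
```

Restricting the check to different classes is what separates this idea from standard non-maximum suppression, which compares boxes within one class.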

3 Methodology

3.1 Classification Models

To determine face mask compliance in the visible and thermal spectra, we assess nine well-established, pre-trained CNNs on our dual-band face datasets. All models determine whether a cropped face belongs to the “compliant” class, where a face mask is properly worn over the nose and mouth, or to the “non-compliant” class, where the face has either (a) no face mask, (b) a mask worn below the nose, or (c) a mask worn below the chin. While we cannot account for all types of improper mask wearing, especially if the mask fits snugly against the sides of the face without gaps, scenarios (b) and (c) appear to be the most common cases of masked non-compliance observed during the pandemic. All cropped faces are resized during training and inference to the input size required by each classification model. We selected a wide array of classifiers that vary in depth and number of parameters, to identify any differences in performance due to model complexity on our data (see Table 1). All networks were trained using MATLAB R2020b. The following classifiers were assessed in our work:
– AlexNet [24] (2012): AlexNet heavily influenced the field of deep learning, winning the ILSVRC in 2012 by a very large margin. Its features included ReLU instead of tanh activations to introduce non-linearity, and dropout regularization to reduce overfitting.
– SqueezeNet [19] (2016): We use SqueezeNet v1.1 in our experiments. SqueezeNet uses 1×1 convolutions inside fire modules that squeeze and expand feature maps, reducing the number of parameters while maintaining accuracy.

Table 1 Model depth, parameters, and image input size. Parameters are in millions. *From MATLAB, the NASNet-Mobile network does not consist of a linear sequence of modules

Model              Depth   Parameters (M)   Input size
AlexNet            8       61               227×227
SqueezeNet v1.1    18      1.24             227×227
ResNet18           18      11.7             224×224
ResNet50           50      25.6             224×224
DarkNet53          53      41.6             256×256
EfficientNet-B0    82      5.3              224×224
ResNet101          101     44.6             224×224
DenseNet-201       201     20               224×224
NASNetMobile       *       5.3              224×224

– ResNet [16] (2016): ResNet addresses the vanishing gradient problem in very deep networks by using residual blocks that allow gradients to flow through skip connections. We use three versions of ResNet: (1) ResNet18, (2) ResNet50, and (3) ResNet101.
– DenseNet-201 [18] (2017): DenseNet is another architecture for training deeper networks. Within each dense block, DenseNet connects every layer directly to every subsequent layer, which allows for feature reuse and reduces the number of parameters.
– DarkNet53 [38] (2018): DarkNet53 is the feature extractor used in the YOLOv3 object detector. It uses 3×3 and 1×1 convolutions with shortcut connections, an improvement over the DarkNet19 feature extractor of YOLOv2.
– NASNetMobile [46] (2018): Neural Architecture Search (NAS) is an approach that learns the model architecture from data rather than relying on a hand-designed architecture. NASNetMobile is a smaller version of the NASNet architecture.
– EfficientNet-B0 [41] (2019): The authors of EfficientNet use NAS to create the baseline EfficientNet-B0 architecture, together with a novel compound coefficient that scales up the network's depth, width, and resolution to improve performance.
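The per-model resizing step can be sketched in Python as follows, using the input sizes from Table 1. We use nearest-neighbour resizing in plain NumPy only to keep the example dependency-free. The interpolation used in the chapter's MATLAB pipeline is not stated, and replicating single-channel thermal crops to three channels for ImageNet-pretrained networks is our assumption. `INPUT_SIZE`, `resize_nearest`, and `prepare_face` are hypothetical names.

```python
import numpy as np

# Input sizes (height, width) per model, transcribed from Table 1.
INPUT_SIZE = {
    "AlexNet": (227, 227),
    "SqueezeNet v1.1": (227, 227),
    "ResNet18": (224, 224),
    "ResNet50": (224, 224),
    "DarkNet53": (256, 256),
    "EfficientNet-B0": (224, 224),
    "ResNet101": (224, 224),
    "DenseNet-201": (224, 224),
    "NASNetMobile": (224, 224),
}


def resize_nearest(img: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Nearest-neighbour resize of an H x W or H x W x C array to out_hw."""
    h, w = img.shape[:2]
    oh, ow = out_hw
    rows = np.arange(oh) * h // oh  # source row for each output row
    cols = np.arange(ow) * w // ow  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def prepare_face(crop: np.ndarray, model_name: str) -> np.ndarray:
    """Resize a cropped face to the input size of `model_name`. Single-channel
    (e.g. thermal) crops are replicated to three channels, an assumption made
    here so that the array matches ImageNet-pretrained RGB inputs."""
    out = resize_nearest(crop, INPUT_SIZE[model_name])
    if out.ndim == 2:
        out = np.repeat(out[:, :, None], 3, axis=2)
    return out


thermal_crop = np.zeros((150, 120), dtype=np.uint8)  # stand-in for a manually cropped MWIR face
print(prepare_face(thermal_crop, "DarkNet53").shape)  # (256, 256, 3)
```

In practice, a library resize with bilinear or bicubic interpolation would usually be preferred. Nearest-neighbour is used here only because it can be written in a few transparent lines.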

3.2 Dataset

Our multispectral masked database [34] was collected during the winter of 2021, when mask mandates were the norm in most places. For this work, we selected 100 of the 280 participating subjects to compose the dataset. The visible data was captured using a Canon EOS 5D Mark IV DSLR camera, which has a 30.4 MP full-frame CMOS sensor, fitted with a Canon EF 70–200 mm f/2.8 L-series lens. The thermal data was captured using a FLIR A8581 MWIR camera with a 50 mm, f/2.5 manual focus lens. The FLIR camera has an indium antimonide detector, a 3.0–5.0 μm spectral range, and a thermal sensitivity of ≤30 mK. Examples of the visible and thermal data we collected are shown in Figs. 1 and 2, respectively.


Fig. 1 Visible spectrum examples of masked and unmasked faces

Of the subjects who arrived wearing a mask, those interested were filmed indoors at six feet, both with and without their mask (this created a subset of our original MILAB-VTF(B) unmasked face dataset). All subjects wore their mask over the nose and mouth for the masked portion of the data collection. For every subject, we extracted 10 frontal face frames from each video. Faces were then cropped in the visible band using the MTCNN face detector [45], and all thermal frames were cropped manually. The total number of images for each spectrum is given in Table 2. After the fully compliant and non-compliant faces were processed, synthetic masks were added to the compliant faces to create the two levels of non-compliant mask wearing. For visible images, the method of [2] was applied. It uses facial landmarks detected by Dlib [22] to identify the face tilt and the six key points of the face that are required for applying the mask: the nose bridge, the chin, and four points along the jawline, two on each side. We modified these six key points to shrink the mask and shift it down, creating the non-compliant cases. The color and mask type options were set to select random masks: surgical-style masks in blue and white, and cloth-style masks in blue, gray, black, and dark green. For the thermal face images, on which the Dlib detector did not work, we manually fitted masks onto faces without one using image-editing software. We first extracted a mask from a face in the visible dataset and saved it as a template. We then shrank the mask to a suitable size for each of the two non-compliant conditions, placed it over the face, and saved the image. Since thermal images do not record color information, the color of the mask is restricted to black, similar to the masks captured with the FLIR camera. Examples of synthetic masks in the visible spectrum are presented in Fig. 3.

Fig. 2 Thermal spectrum examples of masked and unmasked faces

Table 2 Compliant samples have real masks that cover the nose and mouth. Non-compliant samples have synthetically applied masks that do not cover the nose or the nose and mouth

Spectrum   Compliant   Non-compliant   Total
Visible    1000        1000            2000
Thermal    995         1000            1995
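The key-point modification used to create the visible non-compliant cases can be illustrated as a shrink about the key points' centroid followed by a downward shift. The chapter does not give the exact amounts, so the scale and shift values in `LEVELS`, the function name, and the point ordering are hypothetical. The sketch also does not reproduce the mask-rendering code of [2]; it only shows the geometric adjustment of the six anchor points.

```python
import numpy as np

# Hypothetical settings: shrink factor about the key-point centroid, and a
# downward shift expressed as a fraction of the nose-bridge-to-chin distance.
LEVELS = {
    "below_nose": {"scale": 0.8, "shift": 0.25},
    "below_mouth": {"scale": 0.6, "shift": 0.45},
}


def noncompliant_keypoints(points: np.ndarray, level: str) -> np.ndarray:
    """points: (6, 2) array of (x, y) mask anchor points in image coordinates
    (y grows downward), ordered as nose bridge, chin, then four jawline points.
    Returns the points shrunk about their centroid and shifted down."""
    cfg = LEVELS[level]
    face_len = np.linalg.norm(points[1] - points[0])  # nose bridge to chin
    centroid = points.mean(axis=0)
    shrunk = centroid + cfg["scale"] * (points - centroid)
    return shrunk + np.array([0.0, cfg["shift"] * face_len])


anchors = np.array([[50, 40], [50, 100], [20, 60], [25, 85], [80, 60], [75, 85]], dtype=float)
moved = noncompliant_keypoints(anchors, "below_nose")
# The centroid moves straight down by 0.25 * 60 = 15 px, and the mask narrows.
print(moved.mean(axis=0) - anchors.mean(axis=0))
```

Tying the shift to the nose-bridge-to-chin distance keeps the displacement proportional to face size, so the same settings behave similarly on larger and smaller face crops.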

3.3 Experimental Setup

In this section, we discuss the experiments performed on our dataset. Given the amount of data available for each class and spectrum, we perform five-fold cross-validation. All folds are split evenly between the compliant and non-compliant classes, with no overlap between samples. The different levels of non-compliant mask wearing are also evenly distributed among the folds. For a fair comparison, all models were pre-trained on the ImageNet database [13] and then trained with the same hyper-parameters using transfer learning. Transfer learning is a technique that leverages the feature representations a network has already learned and uses this knowledge to learn features on a new set of images. To perform transfer learning, the weights in the early and middle layers, where features such as edges and textures have been learned, are generally frozen. The final layers of the network, where more complex features have been learned, are then fine-tuned during training to learn new representations of the dataset. This process is especially useful when the problem to be solved does not have a large amount of training data available. We empirically assessed a range of learning rates and found that our chosen rate converged quickly during training for all models, so we use the same parameters for every model. Optimization is performed using the Adam [23] algorithm, with an initial learning rate of 0.0001 that is decreased by a factor of 0.1 every 3 epochs. The models are trained for 5 epochs with a batch size of 16 on an NVIDIA GeForce RTX 2080 Ti graphics card with 11 GB of memory. Data augmentation is performed during training using random reflection, scale, and translation changes. Training is performed 5 times for each model, once for every cross-validation fold, on both the visible and thermal datasets.

Fig. 3 Examples of synthetically masked faces
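The fold assignment and the learning-rate schedule described above can be sketched in Python. This is not the MATLAB code used in the chapter. The sub-label names and function names are hypothetical, and the split is done per sample, stratified by sub-label, following the description of folds that are balanced across classes and non-compliance levels with no overlap between samples.

```python
import numpy as np


def stratified_folds(labels, k=5, seed=0):
    """Return an array giving the fold (0..k-1) of each sample. Samples of each
    label are shuffled and dealt round-robin, so every label is spread as evenly
    as possible across folds and each sample lands in exactly one fold."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    fold_of = np.empty(len(labels), dtype=int)
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        rng.shuffle(idx)
        fold_of[idx] = np.arange(len(idx)) % k
    return fold_of


def learning_rate(epoch, initial=1e-4, drop=0.1, period=3):
    """Piecewise-constant schedule: multiply by `drop` every `period` epochs
    (epoch is 0-based)."""
    return initial * drop ** (epoch // period)


# Hypothetical sub-labels; the real per-level counts are not given in the chapter.
labels = ["compliant"] * 10 + ["nc_no_mask"] * 5 + ["nc_below_nose"] * 5
folds = stratified_folds(labels)
for f in range(5):
    train_idx = np.flatnonzero(folds != f)  # 16 training samples per fold
    test_idx = np.flatnonzero(folds == f)   # 4 test samples per fold
    print(f, len(train_idx), len(test_idx))

# Over 5 epochs: the first three use 1e-4, the last two use about 1e-5.
print([learning_rate(e) for e in range(5)])
```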

3.4 Evaluation Metrics

We report the classification accuracy, which is the number of predictions the model got correct divided by the total number of samples, for all models. The formula for computing accuracy is:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (1)$$

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.
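Eq. (1) translates directly into code. The Python sketch below also works through one case from Sect. 4.1, NASNetMobile on the visible set, which missed 5 of the 1000 non-compliant and 2 of the 1000 compliant faces. The chapter does not say which class is treated as positive, so taking "non-compliant" as the positive class is our assumption. Accuracy is the same either way.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (1): fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for zero samples")
    return (tp + tn) / total


# NASNetMobile, visible band, "non-compliant" taken as the positive class:
# 5 non-compliant faces predicted compliant (FN), 2 compliant faces predicted
# non-compliant (FP).
print(accuracy(tp=995, tn=998, fp=2, fn=5))  # 0.9965
```

Table 3 reports this case as 0.996.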

4 Results and Discussion

In this section, we present the results of our experiments on face mask compliance classification in the visible and thermal spectra. Our intention is to provide face mask compliance results for established classification models in the thermal band, together with a complementary visible dataset that was captured simultaneously under the same conditions.

4.1 Visible Results

The accuracy of all models is presented in Table 3. Across all five folds, six of the nine models classified every sample correctly, and two of the remaining three models misclassified just a single image. NASNetMobile yielded the lowest accuracy, missing seven samples. Five of those seven misclassifications were from the non-compliant class and the remaining two were from the compliant class. The non-compliant below-the-nose and non-compliant below-the-mouth images each accounted for one of the errors, with the remainder being either no mask (non-compliant) or fully masked (compliant) faces. Some of these misclassifications are shown on the left side of Fig. 4. Across all models, fully compliant samples and non-compliant samples with no mask account for most of the misclassifications. The DenseNet and ResNet101 models both failed on the same compliant subject wearing a bright red patterned mask (bottom right in Fig. 4). We observe that all errors in the visible band come from the deepest models. This could be partially due to overfitting on the relatively small database; these models would likely benefit from training on a larger dataset. Other misclassifications could be due to the large variability of the masks, including color, texture, and patterns that may confuse the classifier.

Table 3 Accuracy results for the problem of thermal and visible mask compliance classification

Model              Thermal   Visible
AlexNet            0.999     1
SqueezeNet v1.1    1         1
ResNet18           1         1
ResNet50           1         1
DarkNet53          1         1
EfficientNet-B0    1         1
ResNet101          1         0.999
DenseNet-201       1         0.999
NASNetMobile       1         0.996

Fig. 4 Samples of compliant (top left) and non-compliant (bottom left) faces misclassified using NASNetMobile. The only thermal face misclassified by AlexNet (top right), and the face misclassified by ResNet101 and DenseNet (bottom right)

202

J. Rose et al.

4.2 Thermal Results

Classification accuracy on the thermal dataset was very high, with only one misclassification across all five folds, made by AlexNet (top right corner of Fig. 4). With only one sample classified incorrectly, it is difficult to draw conclusions about where these models struggle on this face mask classification task in the thermal band. We can, however, infer that mask compliance classification is more accurate in the thermal spectrum than in the visible one. This is likely due to the low variance among the masks themselves and to how pronounced the nose is in thermal images. Thermal masks show very little to no texture or color information, all appearing as shades of black to dark gray. The nose in most samples is colder than the rest of the face, which makes it a very distinguishable feature and makes it easy to detect whether it is covered by a mask. Additionally, different levels of mask compliance appear to be easier to classify because of the lack of mask variation: in thermal images, a mask appears as a large, dark, highly distinct covering over the face. The masks in the visible images vary in color, pattern, and texture, which makes the final classification decision more difficult.

4.3 FMLD Test Set Results

We also evaluate our visible-trained models on the FMLD test set from [3]. This data is quite different from our own, with a high degree of diversity in face pose, occlusions, degree of face mask coverage, illumination, and image resolution. All results are shown in Table 4. Our models' performance varied considerably, with a 33.4 percentage-point difference between the best and worst performing models. SqueezeNet, ResNet101, and AlexNet performed best, with accuracies of 89.3%, 87.3%, and 85.9%, respectively. Two of the top three are our shallowest models in terms of depth, the exception being ResNet101. DenseNet also performed well. These scores are not as good as the results in [3], where all models had over 97% accuracy, with only a 1.12% difference between the worst (AlexNet) and best (ResNet152) models. This was expected, as we did not train on this data; we were interested in better understanding where our models fail on more challenging data. The compliant cases are much easier to label correctly, with all models achieving over 91% and many over 99%. The non-compliant cases where the mask is worn incorrectly were the most difficult, with the best model getting 77% correct and the next best getting only about half of them correct. This case highlights a need for more data, because such samples account for only 2.5% of both the training and test sets. The other non-compliant cases, those with no mask present, were also quite challenging, with results varying widely from 28.7% to 89.1%. Samples of misclassified images from all three levels of masking that we investigated are shown in Fig. 5.

Table 4 Accuracy results on FMLD test set using our trained classifiers. Compliant samples account for 38% of the test set, while the non-compliant cases account for the remaining 62%. The non-compliant incorrectly worn samples are clearly the most difficult to classify for all models, with results ranging from 20.1 to 77.2% accurate

Model              Compliant (38%)   NC-incorrectly worn (2.5%)   NC-no mask (58.5%)   Total
AlexNet            0.981             0.438                        0.796                0.859
SqueezeNet v1.1    0.921             0.500                        0.891                0.893
ResNet18           0.992             0.407                        0.482                0.678
ResNet50           0.969             0.503                        0.726                0.814
DarkNet53          0.991             0.244                        0.287                0.559
EfficientNet-B0    0.993             0.201                        0.315                0.575
ResNet101          0.912             0.772                        0.851                0.873
DenseNet-201       0.993             0.469                        0.599                0.748
NASNetMobile       0.992             0.207                        0.364                0.604

Fig. 5 Samples of misclassified images from the FMLD test set. The true labels are (a) compliant, (b) non-compliant with a mask, and (c) non-compliant with no mask
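The model ranking and the 33.4-point spread quoted above can be checked directly against the Total column of Table 4 with a few lines of Python. The values are transcribed from the table (nothing is recomputed from raw predictions), and the dictionary name is ours.

```python
# "Total" accuracies on the FMLD test set, transcribed from Table 4.
FMLD_TOTAL = {
    "AlexNet": 0.859, "SqueezeNet v1.1": 0.893, "ResNet18": 0.678,
    "ResNet50": 0.814, "DarkNet53": 0.559, "EfficientNet-B0": 0.575,
    "ResNet101": 0.873, "DenseNet-201": 0.748, "NASNetMobile": 0.604,
}

ranked = sorted(FMLD_TOTAL, key=FMLD_TOTAL.get, reverse=True)  # best first
spread = FMLD_TOTAL[ranked[0]] - FMLD_TOTAL[ranked[-1]]        # best minus worst

print(ranked[:3])        # ['SqueezeNet v1.1', 'ResNet101', 'AlexNet']
print(ranked[-1])        # DarkNet53
print(f"{spread:.3f}")   # 0.334
```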

4.4 Limitations

Although the results of this evaluation are very good on our dataset, there are a few limitations that must be mentioned. Most importantly, the dataset is small, with only about 1000 samples per class in each spectrum. A larger dataset with additional subjects would introduce more variance and likely more errors, allowing us to further investigate the factors that lead to a decrease in performance. Next, the samples are relatively easy to classify because they were collected indoors, with consistent lighting, at the same distance. We also did not include any profile faces, which would increase the degree of difficulty. Lastly, we ignore the face detection step in the pipeline and only assess the classification portion, using faces that have already been detected and cropped. All of these limitations should be addressed in future work.

5 Conclusion and Future Work

In this work, we assess nine well-established CNN architectures for classification of face mask compliance. We collect and annotate our own dataset of 100 subjects for this task. All subjects appear both in a compliant state, with the mask covering their nose and mouth, and in a non-compliant state, with no mask on. Additionally, we augment this data with synthetic masks sitting either below the nose or below the mouth, to account for common instances where a mask is worn but not in compliance with the current CDC mask wearing guidelines. The synthetic masks we applied to visible images differ in shape, color, and pattern to introduce as much variance as possible into the dataset. After assessing all classification models, we observe that the thermal band offers a more accurate option for mask compliance classification, with every model except AlexNet classifying 100% of faces correctly. The models trained on visible data are nearly as good, with accuracy above 99% for all models. Additionally, our SqueezeNet visible model, even though it was trained on a small, high-quality dataset, achieved an accuracy on the FMLD test set only about 9% lower than that of the same model trained on FMLD in [3]. These results should, however, be read in light of the limitations discussed in Sect. 4.4. Future work could involve the creation of a larger and more challenging dataset, especially for the thermal portion. This may include more subjects with off-pose samples instead of only full-frontal samples. Data captured outdoors and at longer distances would also benefit future research, as would assessing face detection and recognition with various levels of mask compliance.

Acknowledgments

This work was partially supported by an STTR Phase II contract W911QX20C0022 from the US Army Research Laboratory, Adelphi, MD

References

1. Abbasi S, Abdi H, Ahmadi A (2021) A face-mask detection approach based on yolo applied for a new collected dataset. In: 2021 26th international computer conference, computer society of Iran (CSICC). IEEE, Piscataway, pp 1–6
2. Anwar A, Raychowdhury A (2020) Masked face recognition for secure authentication. arXiv:2008.11104
3. Batagelj B, Peer P, Štruc V, Dobrišek S (2021) How to correctly detect face-masks for covid-19 from visual information? Appl Sci 11(5):2070
4. Bourlai T (2016) Face recognition across the imaging spectrum. Springer, Berlin

5. Bourlai T, Hornak LA (2016) Face recognition outside the visible spectrum. Image Vis Comput 55:14–17
6. Bourlai T, Pryor RR, Suyama J, Reis SE, Hostler D (2012) Use of thermal imagery for estimation of core body temperature during precooling, exertion, and recovery in wildland firefighter protective clothing. Prehosp Emerg Care 16(3):390–399
7. Bourlai T, Ross A, Chen C, Hornak L (2012) A study on using mid-wave infrared images for face recognition. In: Sensing technologies for global health, military medicine, disaster response, and environmental monitoring II; and biometric technology for human identification IX, vol 8371. International Society for Optics and Photonics, p 83711K
8. Boutros F, Damer N, Kolf JN, Raja K, Kirchbuchner F, Ramachandra R, Kuijper A, Fang P, Zhang C, Wang F, et al (2021) MFR 2021: masked face recognition competition. In: 2021 IEEE international joint conference on biometrics (IJCB). IEEE, Piscataway, pp 1–10
9. Chavda A, Dsouza J, Badgujar S, Damani A (2021) Multi-stage CNN architecture for face mask detection. In: 2021 6th international conference for convergence in technology (I2CT). IEEE, Piscataway, pp 1–8
10. Chowdary GJ, Punn NS, Sonbhadra SK, Agarwal S (2020) Face mask detection using transfer learning of inceptionv3. In: International conference on big data analytics. Springer, Berlin
11. Damer N, Grebe JH, Chen C, Boutros F, Kirchbuchner F, Kuijper A (2020) The effect of wearing a mask on face recognition performance: an exploratory study. In: 2020 International conference of the biometrics special interest group (BIOSIG). IEEE, pp 1–6
12. Damer N, Boutros F, Süßmilch M, Fang M, Kirchbuchner F, Kuijper A (2021) Masked face recognition: human vs. machine. arXiv:2103.01924
13. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, Piscataway, pp 248–255
14. Ge S, Li J, Ye Q, Luo Z (2017) Detecting masked faces in the wild with LLE-CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2682–2690
15. Guo G, Zhang N (2019) A survey on deep learning based face recognition. Comput Vis Image Understand 189:102805
16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
17. Hu S, Choi J, Chan AL, Schwartz WR (2015) Thermal-to-visible face recognition using partial least squares. JOSA A 32(3):431–442
18. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
19. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv:1602.07360
20. Jiang M, Fan X, Yan H (2020) Retina facemask: a face mask detector, vol 2. arXiv:2005.03950
21. Khandelwal P, Khandelwal A, Agarwal S, Thomas D, Xavier N, Raghuraman A (2020) Using computer vision to enhance safety of workforce in manufacturing in a post covid world. arXiv:2005.05287
22. King DE (2009) Dlib-ml: a machine learning toolkit. J Mach Learn Res 10:1755–1758
23. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
24. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
25. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) Fighting against covid-19: a novel deep learning model based on yolo-v2 with resnet-50 for medical face mask detection. Sustain Cities Soc 65:102600
26. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2021) A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the covid-19 pandemic. Measurement 167:108288

J. Rose et al.

27. Mohan P, Paul AJ, Chirania A (2021) A tiny CNN architecture for medical face mask detection for resource-constrained endpoints. In: Innovations in electrical and electronic engineering. Springer, Berlin, pp 657–670
28. Mokalla SR, Bourlai T (2019) On designing MWIR and visible band based deepface detection models. In: 2019 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, Piscataway, pp 1140–1147
29. Mokalla SR, Bourlai T (2020) Face detection in MWIR spectrum. In: Securing social identity in mobile platforms. Springer, Berlin, pp 145–158
30. Nascimento JC, Abrantes AJ, Marques JS (1999) An algorithm for centroid-based tracking of moving objects. In: 1999 IEEE international conference on acoustics, speech, and signal processing. Proceedings. ICASSP99 (Cat. No. 99CH36258), vol 6. IEEE, Piscataway, pp 3305–3308
31. Ngan M, Grother P, Hanaoka K (2020) Ongoing face recognition vendor test (FRVT) part 6a: face recognition accuracy with masks using pre-covid-19 algorithms. https://doi.org/10.6028/NIST.IR.8311
32. Osia N, Bourlai T (2012) Holistic and partial face recognition in the MWIR band using manual and automatic detection of face-based features. In: 2012 IEEE conference on technologies for homeland security (HST). IEEE, Piscataway, pp 273–279
33. Osia N, Bourlai T (2017) Bridging the spectral gap using image synthesis: a study on matching visible to passive infrared face images. Mach Vis Appl 28(5):649–663
34. Peri N, Gleason J, Castillo CD, Bourlai T, Patel VM, Chellappa R (2021) A synthesis-based approach for thermal-to-visible face verification. In: 2021 16th IEEE international conference on automatic face and gesture recognition (FG 2021). IEEE, Piscataway, pp 01–08
35. Qin B, Li D (2020) Identifying facemask-wearing condition using image super-resolution with classification network to prevent covid-19. Sensors 20(18):5236
36. Rahman MM, Manik MMH, Islam MM, Mahmud S, Kim JH (2020) An automated system to limit covid-19 using facial mask detection in smart city network. In: 2020 IEEE international IOT, electronics and mechatronics conference (IEMTRONICS). IEEE, Piscataway, pp 1–5
37. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
38. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767
39. Sethi S, Kathuria M, Kaushik T (2021) Face mask detection using deep learning: an approach to reduce risk of coronavirus spread. J Biomed Inf 120:103848
40. Suresh K, Palangappa M, Bhuvan S (2021) Face mask detection by using optimistic convolutional neural network. In: 2021 6th international conference on inventive computation technologies (ICICT). IEEE, Piscataway, pp 1084–1089
41. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. Proceedings of machine learning research, pp 6105–6114
42. Wang J, Yuan Y, Yu G (2017) Face attention network: an effective face detector for the occluded faces. arXiv:1711.07246
43. Wang Z, Wang G, Huang B, Xiong Z, Hong Q, Wu H, Yi P, Jiang K, Wang N, Pei Y, et al (2020) Masked face recognition dataset and application. arXiv:2003.09093
44. Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5525–5533
45. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
46. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8697–8710

On the Effectiveness of Visible and MWIR-Based Periocular Human Authentication When Wearing Face Masks
Ananya Zabin, Suha Reddy Mokalla, and Thirimachos Bourlai

Abstract In the COVID-19 (Coronavirus Disease 2019) era, everyone is advised to wear a face mask. According to guidance published by the Centers for Disease Control and Prevention (CDC) on 19 April 2021 (Centers for Disease Control and Prevention: Guidance for wearing masks, 2021), masks are most effective when everyone wears one that completely covers the vital facial features, i.e., the nose and mouth, and fits correctly against the sides of the face so that there is no air gap. While the benefits of masks in controlling the spread of a pandemic, or even an endemic, are known, masks pose many challenges to state-of-the-art face recognition systems. According to a study published by the National Institute of Standards and Technology (NISTIR 8311), the accuracy of facial recognition algorithms drops by between 5% and 50% when subjects wear face masks, compared with the accuracy the same algorithms achieve when subjects are not wearing masks. The periocular region is the feature-rich region around the eye, which includes features such as the eyelids, eyelashes, eyebrows, tear ducts, eye shape, and skin texture. The NIST report also states that face images of subjects wearing masks can increase the failure-to-enroll rate (FER). In addition, masked face images lower the efficiency of surveillance (unconstrained) face recognition systems, which become even more prone to error due to occlusion, distance, camera quality, outdoor conditions, and low light. In this study, we focus on the effectiveness of periocular-based face recognition algorithms (combining both eye regions rather than only the left or right eye) when subjects wear face masks under controlled and challenging conditions, for both visible and MWIR (mid-wave infrared) band face images. We use MILAB-VTF(B), a challenging multi-spectral face dataset of thermal and visible videos collected at the University of Georgia, which is the largest and most comprehensive dual-band face dataset to date (Peri et al (2021) A synthesis-based approach for thermal-to-visible face verification. In: 2021 16th IEEE international conference on automatic face and gesture recognition (FG 2021). IEEE, pp 01–08). We manually crop the faces from the images and use existing pre-trained face recognition (FR) models, namely FaceNet and VGG-Face, to perform same-spectral periocular face recognition. FaceNet yields a Rank-1 face identification accuracy of 87.54% and 83.54% in the thermal and visible bands, respectively, while VGG-Face yields superior performance with 100% and 99.52% in the thermal and visible bands, respectively. We also perform same-spectral face recognition experiments (visible-to-visible and thermal-to-thermal) and report the results.

Keywords Covid-19 · face mask · Periocular · Biometric authentication · MILAB-VTF(B) · Large-scale data · Thermal band · Visible band

A. Zabin · S. Mokalla · T. Bourlai
Multispectral Imagery Lab—MILAB, ECE, University of Georgia, Athens, GA, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
T. Bourlai et al. (eds.), Disease Control Through Social Network Surveillance, Lecture Notes in Social Networks, https://doi.org/10.1007/978-3-031-07869-9_11

1 Introduction

Biometrics is the automated process of recognizing individuals based on their unique behavioral and physiological traits. These traits include, but are not limited to, the face, fingerprint, iris, sclera, and ears. Face and iris are among the most widely used biometrics. The face is one of the most important biometric modalities, since face images can be captured covertly, without the subject's knowledge, at various stand-off distances. The iris, on the other hand, requires greater user cooperation, but once it is properly captured, it is very feature rich and unobtrusive. The iris pattern, referred to as the iris stroma, is stochastic in nature, genetically impenetrable, inimitable, complex, and different even in identical twins [22]. In this work, we focus on face-based recognition systems, which can be used for both human identification (1:N matching) and verification (1:1 matching). Modern face recognition systems can also detect whether the person claiming an identity in a live session is real and physically present (liveness detection) [35]. Such systems aim to prevent attacks such as presenting a picture of an enrolled person to gain access to a system without that person's presence and consent. There are many academic and commercial examples where this technology is useful. For example, Facebook's auto-tagging feature is built on facial recognition technology: it identifies people and tags them whenever anyone uploads their picture, and it can do so even when a person's face is partially occluded or the picture is taken in poor lighting. These successful face recognition systems are the result of recent advances in computer vision, driven by powerful deep learning algorithms [8]. The outbreak of the novel coronavirus disease in 2019 (COVID-19) has made us rethink most aspects of our everyday life.
Within a brief period, it evolved into a global pandemic, and no one knows if and when the spread of this virus can be contained. To stay safe, we had to make a habit of wearing a mask [9], keeping a 6 ft distance from others, and being contact-less as much as possible so that the virus does not spread. Along with everything else, we also had to rethink the way we handle technology. In these circumstances, existing biometric solutions face many challenges too. Most organizations are leaning towards non-contact authentication systems over contact-based ones such as fingerprint recognition [34]. Even though the face is a non-contact biometric modality, it is becoming more challenging to use because of the occlusion caused by masks [5, 36]. Since iris recognition, on the other hand, requires greater user cooperation, periocular recognition is a convenient next-step solution. The periocular region is the feature-rich region around the eye, which includes features such as the eyelids, eyelashes, eyebrows, tear ducts, eye shape, and skin texture, as shown in Fig. 1. A periocular region-based authentication system is a good trade-off between face- and iris-based biometric authentication systems, since face-based systems cannot be used without the whole (or almost the whole) face in view, and iris-based systems require high user cooperation. In practice, it can be applied to any image where the face is partially occluded (e.g., by a hat, facial hair, or clothing in front of the face; see Fig. 2), provided the periocular region is of sufficient quality for a periocular matching algorithm to be effective. Researchers now consider periocular region-based authentication as an alternative to other types of authentication systems, and Park et al. [30] analyzed the feasibility of the periocular region as a biometric trait. Other researchers have pointed out that the periocular area is less affected by expression variations [38] and aging [18] than the entire face [44]. Another challenge for the aforementioned systems is the selection of the operational band. Most passive recognition systems (based on face, iris, periocular, etc.)
focus on visible band images, i.e., images from cameras operating in roughly the 400–700 nm range of the Electro-Magnetic (EM) spectrum. Visible band-based authentication systems are more cost efficient, given the low cost of visible band imaging sensors. There is also a wide availability of large-scale visible band face datasets, such as LFW [17] and FRGC V2.0 [33]. However, operating in the visible band becomes challenging in low-light to no-light conditions, since visible band imaging is highly sensitive to ambient lighting [5]. This issue can be addressed by using other bands of the EM spectrum. One of these is the Infrared (IR) band, which can be broadly classified into active and passive IR. Active IR comprises Short-Wave Infrared (SWIR) [3, 26] and Near Infrared (NIR) [27], and passive IR comprises the Mid-Wave Infrared (MWIR) and Long-Wave Infrared (LWIR) bands [10]. This chapter focuses on MWIR (3–5 μm) band-based authentication [25, 29] alongside the visible band [8, 28]. When an image or a video is captured in the MWIR spectrum, the camera sensor detects the IR radiation emitted by the subject's face in the form of heat. This makes MWIR a feasible option in low-light to no-light conditions [24]. In addition, MWIR face data is not significantly affected when subjects are wearing masks. Since the visible and MWIR bands each have their own advantages and disadvantages, designing and developing a band-independent set of algorithms for face detection and recognition is a viable approach [4].

Fig. 1 The periocular region contains both eyes, the eyebrows, and the surrounding areas, which include significant facial features that remain visible while wearing a face mask

Fig. 2 Example face images where periocular recognition can be applied. The examples include, but are not limited to, faces partially occluded by helmets, garments, or hair

1.1 Goals and Contributions

The goal of this chapter is to perform same-spectral periocular recognition experiments using two of the most popular face recognition (FR) models in the literature, namely FaceNet [37] and VGG-Face [31]. The contributions of the chapter are as follows:
1. We introduce a new dataset (MILAB-VTF(B)) that includes thermal and visible face images collected in constrained indoor and unconstrained outdoor environments. The indoor data is collected at a stand-off distance of 6 ft, and the outdoor data at 100, 200, 300, and 400 m (Fig. 3). This dataset will become publicly available in 2022.
2. We propose a novel approach for periocular-based recognition using existing state-of-the-art deep learning models.
3. We demonstrate the effectiveness of these models when used to recognize individuals with and without face masks.

The rest of this chapter is organized as follows. Section 2 reviews research related to periocular authentication. Section 3 describes the Multi-Task Cascaded Convolutional Networks (MTCNN) used for face detection and the FaceNet and VGG Face models used for face recognition, and explains how we selected the model for our experiments. Section 4 describes the datasets and data pre-processing, explains how the models were applied, and compares their results. Conclusions and future work are presented in Sect. 5.

2 Related Research

Many face recognition algorithms in the open literature focus on periocular region-based authentication, and Table 1 summarizes the approaches used over time. Although they achieve promising results, most of these experiments use visible band face images, whereas our research uses visible and thermal band images of subjects wearing masks.

Table 1 Different methods for periocular authentication in the literature
Method | Used feature | Band | Mask (Y/N)
Kumari [19] | HOG + ResNet101/VGG19 | Visible | Y
Mason [23] | Modified AlexNet | Visible | N
Hwang [16] | Feature extraction and ResNet | Visible | N
Boutros [7] | MobileNetV3 | Visible | N
Park [30] | HOG + LBP + SIFT | Visible | N
FaceNet [37] | Triplet loss using the L2 distance between the Euclidean embeddings of face images | Visible | N
SphereFace [21] | Angular softmax loss that incorporates an angular margin and a parameter that quantitatively controls the size of the angular margin | Visible | N
CosFace [40] | Large Margin Cosine Loss that maximizes the inter-class cosine margin, with the decision margin in the cosine space | Visible | N
ArcFace [11] | Arc-cosine function and an additive angular margin | Visible | N
Sub-center ArcFace [12] | Variant of ArcFace where sub-centers are designed within the same class | Visible | N
CircleLoss [39] | Penalizes positive and negative scores differently | Visible | N
Bourlai [5] | CSU Face Identification Evaluation System [2] | MWIR | N
Proposed | Masked face recognition using pre-trained models | MWIR/Visible | Y

Kumari et al. [19] compute the Euclidean distance between the medial and lateral canthus points¹, find the midpoint of the line joining them, and use these to calculate the top-left and bottom-right corners of a rectangular region of interest (ROI). As Fig. 1 shows, the periocular region (both eyes, the eyebrows, and the surrounding areas) retains notable features while a mask is worn; whatever the shape of the mask, it hardly affects this area, which is the advantage of using this part of the face for authentication. To extract features from the selected ROI, Kumari et al. use the Histogram of Oriented Gradients (HOG), because its computational complexity is exceptionally low and it can extract handcrafted features in non-ideal conditions, together with several transfer learning models that extract other types of features. Finally, they use a multi-class Support Vector Machine (SVM) to perform face recognition. Handcrafted features extracted from periocular data with descriptors such as HOG, LBP, and SIFT have also been used with promising results [30]. Mason et al. [23] propose using periocular biometrics in healthcare systems instead of a physical identification device; to reduce training and processing time, they apply a modified AlexNet through transfer learning. He et al. [13] and Hwang et al. [16] emphasize the importance of periocular authentication in head-mounted displays (HMDs), proposing a combination of feature extraction and ResNet that achieves a very low equal error rate (EER), false acceptance rate (FAR), and false rejection rate (FRR). Boutros et al. [7] and Howard et al.
[15] propose periocular authentication for HMDs, achieving a relatively low EER with MobileNetV3. Mokalla and Bourlai [24] show the benefit of Mid-Wave Infrared (MWIR) data for face recognition by assessing both visible and thermal face recognition with state-of-the-art algorithms; in some cases, MWIR band data produces even better results than visible band data. Bourlai et al. [5, 6] study the spectral gap between visible and MWIR band images and show how MWIR images can be beneficial in harsh environmental conditions characterized by unfavorable lighting and pronounced shadows (such as at night). Bakshi et al. [1] analyze the accuracy of face recognition algorithms on periocular data from both visible and MWIR images and compare how adjusting the threshold value can improve performance for both types of data.

Several recent visible-to-visible face recognition algorithms in the open literature use deep Convolutional Neural Networks (CNNs). Most, if not all, of these models use the distance between Euclidean (or other) embeddings as a measure of similarity between faces. FaceNet [37] is one of the state-of-the-art algorithms for visible-to-visible face recognition. It uses a triplet loss function based on the L2 distance between Euclidean face embeddings, so that the distances represent face similarity, i.e., faces of the same person have small distances and faces of distinct people have large distances. Li [20] presents an implementation of triplet loss for face recognition and conducts several experiments to analyze the factors that influence training with triplet loss. Each triplet pairs an anchor image with a positive image (same identity) and the same anchor with a negative image (different identity), and the samples are mapped into feature vectors through deep CNNs such as ResNet [13] or MobileNet [14]. One major drawback of this method is that it requires mining triplets to train the model. In addition, an angular margin is preferable to a Euclidean margin because the cosine of the angle has an intrinsic consistency with softmax. To overcome these problems, Liu et al. [21] propose SphereFace, which introduces the angular softmax (A-Softmax) loss that incorporates an angular margin. A-Softmax loss learns discriminative features that lie on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. In addition, they introduce a parameter that quantitatively controls the size of the angular margin and derive lower bounds on this margin such that A-Softmax loss can approximate the condition that the minimal inter-class distance is larger than the maximal intra-class distance. This decision boundary is defined in the angular space. CosFace [40] uses the Large Margin Cosine Loss (LMCL), which takes normalized features as input and learns highly discriminative features by maximizing the inter-class cosine margin. This loss defines the decision margin in the cosine space rather than in the angular space as in [21].

ArcFace [11] uses the arc-cosine function to calculate the angle between the current feature and the target weight, since, after feature and weight normalization, the dot product between the DCNN (Deep Convolutional Neural Network) feature and the last fully connected layer equals the cosine distance. An additive angular margin is then added to the target angle, and the target logit is recovered through the cosine function. This has an advantage over the other angular margin losses because it directly optimizes the geodesic distance margin, owing to the exact correspondence between the angle and the arc on the normalized hypersphere. The above margin-based face recognition methods are susceptible to label noise in the training data and thus require human effort to clean the datasets. To address this problem, Deng et al. [12] propose the sub-center ArcFace. In ArcFace, the intra-class constraint forces all samples close to the corresponding positive center, including noisy images that do not actually belong to that class. Instead, sub-center ArcFace designs several sub-centers within the same class, and a training sample only needs to be close to any one of them. This encourages one dominant sub-class that contains the majority of the clean faces and multiple non-dominant sub-classes that include hard or noisy faces, resulting in a model that is robust to noise. Sun et al. [39] propose the Circle Loss, which penalizes different similarity scores differently, i.e., a similarity score that deviates far from its optimum receives a strong penalty. To this end, they use two different weighting parameters for the intra-class and inter-class similarity scores, allowing them to learn at different paces. This leads to a unified loss function that can learn with both class-level labels (as with the softmax cross-entropy loss) and pair-wise labels (as with the triplet loss).

While algorithms have been proposed for face and periocular recognition in the visible band, to the best of our knowledge this is the first study that proposes using thermal band images for periocular recognition. We use existing methods and test their efficiency on periocular recognition of individuals wearing face masks. This research is useful in various operational scenarios, including health-related ones such as a pandemic with mask mandates in place.

¹ Canthus is either corner of the eye where the upper and lower eyelids meet. More specifically, the inner and outer canthi are, respectively, the medial and lateral ends/angles of the palpebral fissure.
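The margin-based losses above differ mainly in how they modify the logit of the target class. The following minimal NumPy sketch is not the training code of any of the cited works; it assumes L2-normalized features and class-weight vectors and contrasts the CosFace additive cosine margin, cos θ − m, with the ArcFace additive angular margin, cos(θ + m). The function names are hypothetical, the scale s and margin m are illustrative values, and SphereFace's multiplicative margin is omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so that dot products become cosines."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def margin_logits(features, weights, labels, kind="arcface", s=64.0, m=0.5):
    """Scaled logits with a margin applied to each sample's target class.

    features: (B, D) embeddings; weights: (C, D), one row per identity;
    labels: (B,) integer class indices.
    """
    cos = np.clip(l2_normalize(features) @ l2_normalize(weights).T, -1.0, 1.0)
    rows = np.arange(len(labels))
    target = cos[rows, labels]
    if kind == "cosface":
        target = target - m                      # margin in cosine space
    elif kind == "arcface":
        target = np.cos(np.arccos(target) + m)   # margin in angular space
        # Practical ArcFace implementations also handle theta + m > pi;
        # that correction is omitted in this sketch.
    else:
        raise ValueError(f"unknown margin type: {kind}")
    logits = cos.copy()
    logits[rows, labels] = target
    return s * logits


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Three unit-norm embeddings, each identical to its class weight vector.
W = np.eye(3)
X = np.eye(3)
y = np.array([0, 1, 2])
plain = softmax_cross_entropy(margin_logits(X, W, y, "arcface", s=1.0, m=0.0), y)
arc = softmax_cross_entropy(margin_logits(X, W, y, "arcface", s=1.0, m=0.5), y)
print(arc > plain)  # True: the margin makes the target class harder to satisfy
```

Even when an embedding already matches its class weight perfectly, the margin keeps the loss above its unmargined value, which is what pushes same-identity features closer together during training.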

3 Methodology

This section presents the methodological approach proposed in this study to build a periocular region-based authentication system that identifies masked faces in the COVID-19 era. Figure 4 illustrates the overall approach. After the face images are collected from the different sensors, face detection and normalization are performed with MTCNN on the visible band unmasked images. Faces in the thermal band, both masked and unmasked, as well as masked faces in the visible band, are cropped manually. Pre-trained same-spectral face recognition models are then used to identify these masked faces.

3.1 Pre-processing

The pre-processing steps include extracting the required frames from the videos and cropping the face region out of each frame to remove unnecessary background information. Only frontal frames are used in the experiments, so these frames are selected manually. The next step is to crop the faces. For the visible band unmasked face images, MTCNN is used to detect and crop the faces. For all the other images, i.e., thermal masked and unmasked images and visible masked images, faces are cropped using manually annotated bounding boxes.
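The routing and manual-cropping steps described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes each manual annotation is stored as a hypothetical (x, y, w, h) tuple in pixel coordinates, with (x, y) the top-left corner, and that frames are NumPy arrays of shape (height, width[, channels]).

```python
import numpy as np


def needs_manual_crop(band: str, masked: bool) -> bool:
    """Only visible-band unmasked images go through MTCNN; thermal images
    (masked or unmasked) and visible masked images are cropped manually."""
    return not (band == "visible" and not masked)


def crop_box(image: np.ndarray, box) -> np.ndarray:
    """Crop a face given a manually annotated (x, y, w, h) box.

    The box is clipped to the image borders so that annotations that spill
    slightly outside the frame still yield a valid crop.
    """
    x, y, w, h = (int(round(v)) for v in box)
    height, width = image.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("bounding box does not overlap the image")
    return image[y0:y1, x0:x1]


# A blank 1280x1024 thermal frame (NumPy shape is rows x columns).
frame = np.zeros((1024, 1280), dtype=np.uint16)
face = crop_box(frame, (100, 50, 200, 300))
print(needs_manual_crop("thermal", masked=False), face.shape)  # True (300, 200)
```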

3.1.1 MTCNN

To perform face recognition, faces first need to be detected and cropped. Multi-Task Cascaded Convolutional Networks (MTCNN) [43] is used to perform face detection and alignment in visible band unmasked face images. The process starts by resizing the input image to different scales to build an image pyramid, which is the input for the following three stages.
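The image pyramid can be illustrated with the scale computation below. This is a sketch of the scheme commonly used in public MTCNN implementations rather than a detail given in this chapter: the minimum face size of 20 pixels and the scale factor of 0.709 are assumed defaults, and a face of the minimum size is mapped to P-Net's 12 × 12 input window.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_input=12):
    """Scales at which an image is resized before being passed to P-Net.

    The first scale maps a face of `min_face_size` pixels onto P-Net's
    `net_input` x `net_input` window; each further level shrinks the image
    by `factor` until its shorter side would drop below `net_input` pixels.
    """
    scale = net_input / min_face_size
    shorter_side = min(height, width) * scale
    scales = []
    while shorter_side >= net_input:
        scales.append(scale)
        scale *= factor
        shorter_side *= factor
    return scales


# A 1920x1080 visible frame, the resolution of the indoor visible camera.
scales = pyramid_scales(1080, 1920)
print(len(scales), round(scales[0], 3))  # 12 0.6
```

A smaller `min_face_size` adds larger scales (and more computation) so that smaller faces can be found; a factor closer to 1 gives a denser pyramid.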

Fig. 3 The RMFD dataset (a, b) consists of unconstrained, randomly collected (indoor and outdoor) face images, whereas the MILAB(B)-VTF dataset contains images captured in a more controlled environment. For the identification experiments with masked face images, we use only indoor data from the MILAB(B)-VTF dataset (c, d, e, and f) to analyze how accurately the pre-trained models work with masked face data

Fig. 4 The methodological approach followed. The first step is to acquire the datasets (MILAB(B)-VTF and RMFD); the faces are then detected and cropped, and pre-trained models are used for face recognition

Stage 1 A fully convolutional network called the Proposal Network (P-Net) produces candidate facial windows and their bounding box regression vectors. The candidates are then calibrated using the estimated bounding box regression vectors, and non-maximum suppression (NMS)² is used to merge highly overlapping candidates.

Stage 2 The resulting candidates are fed to another CNN, called the Refine Network (R-Net), which rejects a large number of false candidates, performs calibration with bounding box regression, and applies NMS.

Stage 3 The third stage is similar to the second, except that it aims to identify face regions with more supervision. Notably, this stage also outputs the positions of five facial landmarks. MTCNN yields state-of-the-art face detection performance and is therefore used in face identification, verification, and facial expression identification experiments [42].
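The NMS step used in the stages above can be sketched as greedy suppression based on intersection over union (IoU). This is a minimal version under stated assumptions: boxes are given as [x1, y1, x2, y2] arrays, and the IoU threshold of 0.5 is illustrative (MTCNN implementations may use a different threshold at each stage).

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]          # candidate indices, best first
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with every remaining box.
        w = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        h = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = w * h
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]    # drop boxes that overlap too much
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.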

3.2 FaceNet

FaceNet [37], developed by researchers at Google in 2015, is a unified system for face verification (is this the same person?), recognition (who is this person?), and clustering (grouping together images that belong to the same person). It achieved state-of-the-art results on a range of face recognition benchmark datasets. FaceNet learns a Euclidean embedding per image using a deep convolutional network: it extracts high-quality features from a face image and maps them to a 128-element vector representation, called a face embedding.

² NMS is a class of algorithms that selects one entity (e.g., a bounding box) out of many overlapping entities.

Fig. 5 Example images from the MILAB(B)-VTF dataset. The periocular region refers to the periphery of the eyes, which contains the eyes, the eyebrows, and the surrounding orbital region. (a) Visible face images with mask. (b) Thermal face images with mask. (c) Visible face images without mask. (d) Thermal face images without mask

FaceNet uses the squared L2 distance between embeddings to measure face similarity: the more similar two faces are, the smaller the distance, i.e., images of the same person are much closer to each other than images of different people. Once the embeddings are computed, face verification amounts to thresholding the distance between two embeddings, and recognition becomes a k-NN³ classification problem. The deep convolutional network is trained using Stochastic Gradient Descent (SGD) with standard backpropagation and the adaptive gradient (AdaGrad) optimizer, with an initial learning rate of 0.05 that is decayed exponentially to finalize the model. FaceNet uses the triplet loss [37], which operates on triplets: sets of three images consisting of an anchor image, a positive image of the same identity as the anchor, and a negative image of a different identity. The premise of the triplet loss is that the Euclidean distance between images of the same identity should be smaller than the distance between images of different identities. Triplet selection plays an important role in training an efficient model; to train FaceNet, triplets are generated either offline or online.
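Verification by thresholding and identification as a k-NN problem can be sketched as follows, given embeddings already produced by a model such as FaceNet. This is illustrative only: the function names are hypothetical, the embeddings are assumed to be L2-normalized, and the threshold of 1.1 on the squared distance is a placeholder that would normally be tuned on validation data; it is not a value reported in this chapter.

```python
from collections import Counter

import numpy as np


def squared_l2(a, b):
    """Squared Euclidean distance along the last axis (broadcasts over rows)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.sum(diff * diff, axis=-1)


def verify(emb1, emb2, threshold=1.1):
    """1:1 verification: accept if the squared distance is below the threshold."""
    return bool(squared_l2(emb1, emb2) < threshold)


def identify(probe, gallery, gallery_labels, k=1):
    """1:N identification: majority vote among the k nearest gallery embeddings."""
    distances = squared_l2(gallery, probe)   # one distance per enrolled embedding
    nearest = np.argsort(distances)[:k]
    votes = Counter(gallery_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy 4-D unit vectors standing in for 128-D FaceNet embeddings.
gallery = np.eye(4)[:3]
labels = ["subject_a", "subject_b", "subject_c"]
probe = np.array([0.9, 0.1, 0.0, 0.0])
probe /= np.linalg.norm(probe)
print(identify(probe, gallery, labels), verify(gallery[0], gallery[1]))  # subject_a False
```

With k = 1 the identification result is the Rank-1 match; for unit-norm embeddings the squared distance lies between 0 and 4.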

3.3 VGG Face

VGG Face [31] uses a long sequence of convolutional layers, and the end of the network is set up as an N-way classification problem, where N is the number of identities in the training set. Each training image passing through the layers is associated with a score vector produced by the final fully connected layer, which computes N linear predictions. These scores are compared with the ground-truth class by computing the empirical softmax log-loss. After training, the classification layer can be removed, and the scores can be used for face identification or verification. The scores can be improved further by fine-tuning in the Euclidean space with the triplet loss.
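The triplet loss used to train FaceNet, and optionally to fine-tune VGG Face, can be written as the batch mean of max(‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α, 0). The NumPy sketch below is illustrative only: α = 0.2 is the margin reported in the FaceNet paper, the function names are hypothetical, and the semi-hard negative filter shows one way of selecting triplets, not necessarily the procedure used to train the pre-trained models in this chapter.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Mean triplet loss over a batch of (B, D) embedding arrays."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # anchor-positive distances
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # anchor-negative distances
    return float(np.mean(np.maximum(d_pos - d_neg + alpha, 0.0)))


def semi_hard_negatives(anchor, positive, candidates, alpha=0.2):
    """Indices of candidate negatives with d_pos < d_neg < d_pos + alpha,
    i.e. farther from the anchor than the positive but still inside the margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((candidates - anchor) ** 2, axis=1)
    return np.flatnonzero((d_neg > d_pos) & (d_neg < d_pos + alpha))


anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[0.0, 1.0], [0.0, 1.0]])
negative = np.array([[1.0, 0.0], [-1.0, 0.0]])   # one violating, one easy triplet
print(round(triplet_loss(anchor, positive, negative), 3))  # 1.1
```

Easy triplets, whose negative is already farther away than the positive by more than α, contribute zero loss, which is why triplet selection matters so much for training.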

3.4 Selecting Pre-trained Model

In this research, we use pre-trained models to perform the face recognition tasks. We ran both the FaceNet and VGG Face models on RMFD and MILAB(B)-VTF, and VGG Face consistently outperformed FaceNet; as Fig. 6 shows, it yields more than 90% rank-1 accuracy. Detailed results are shown in Table 2.

³ k-nearest neighbors (k-NN) is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).

Fig. 6 Cumulative match characteristic (CMC) curves, which measure the performance of recognition algorithms as the identification rate at each rank. VGG Face consistently produces more than 90% rank-1 accuracy, whereas FaceNet's lowest accuracy is around 75%. (a) Combined CMC curve for FaceNet. (b) Combined CMC curve for VGGFace
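A CMC curve of the kind shown in Fig. 6 can be computed from a probe-by-gallery distance matrix. The sketch below is illustrative, with hypothetical function and argument names: it assumes smaller distances mean more similar faces (as with FaceNet's squared L2 distance) and counts a probe whose identity never appears in the gallery as a miss at every rank. The rank-1 accuracy reported in this chapter corresponds to the first entry of the curve.

```python
import numpy as np


def cmc_curve(dist, probe_labels, gallery_labels, max_rank=None):
    """Rank-r identification rates for r = 1..max_rank from a (P, G) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    probe_labels = np.asarray(probe_labels)
    gallery_labels = np.asarray(gallery_labels)
    num_gallery = dist.shape[1]
    max_rank = num_gallery if max_rank is None else max_rank
    order = np.argsort(dist, axis=1)               # closest gallery entries first
    hits = gallery_labels[order] == probe_labels[:, None]
    # 0-based rank of the first correct match; probes with no match get num_gallery.
    first_hit = np.where(hits.any(axis=1), hits.argmax(axis=1), num_gallery)
    ranks = np.arange(max_rank)
    return (first_hit[:, None] <= ranks[None, :]).mean(axis=0)


dist = np.array([[0.1, 0.5, 0.9],
                 [0.7, 0.2, 0.3],
                 [0.4, 0.6, 0.1]])
curve = cmc_curve(dist, ["a", "c", "b"], ["a", "b", "c"])
print(curve.tolist())  # rank-1 = 1/3, rank-2 = 2/3, rank-3 = 1.0
```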

4 Experimental Results

This section describes in detail the datasets used, the pre-processing performed, the experiments, and the results obtained. All experiments are performed using two NVIDIA 2080 Ti GPUs (Graphics Processing Units).

4.1 Datasets

The face recognition experiments in this chapter are performed on two datasets: MILAB(B)-VTF, which was captured in a controlled environment specifically for this purpose, and RMFD, an in-the-wild dataset that includes images from CCTV footage, face images with simulated masks, and face images (with and without masks) collected from news media and other online sources.

4.1.1 MILAB(B)-VTF The MILAB(B)-VTF dataset comprises MWIR and visible images of 400 individuals, captured in a constrained indoor environment and an unconstrained outdoor environment. Indoor images were captured at a stand-off distance of 6 ft, and outdoor images at four stand-off distances: 100, 200, 300, and 400 m. Unmasked face data was collected from all 400 subjects at all distances, whereas masked data was collected for 280 people in the indoor setting and at 100 m outdoors. The MWIR cameras used were the FLIR A8581 (indoor) and the FLIR RS8513 (outdoor), and the visible cameras were the Canon EOS 5D Mark IV (indoor) and the Nikon Coolpix P1000 (outdoor). The dataset was collected in early 2021 by the MILAB team led by Dr. Bourlai at UGA and is in the final stages of processing before being made publicly available on request. For this study, we used only the indoor, fully frontal visible and thermal images of 100 subjects, taken with the Canon EOS 5D Mark IV and the FLIR A8581 (8 images per subject, covering both enrollment and recognition). The Canon EOS 5D Mark IV images have a resolution of 1920×1080, and the FLIR A8581 images have a resolution of 1280×1024.

4.1.2 RMFD The Real-World Masked Face Dataset (RMFD) [41] is a large dataset for masked face detection. It has several parts, collected primarily from websites or simulated from other publicly available face recognition datasets. The part collected from websites contains 5000 masked face images of 525 people and 90,000 regular (unmasked) face images, and the simulated part contains 500,000 images of 10,000 subjects. At present, RMFD, which is available online, is the world's largest masked face dataset; it was built to accumulate data resources for the intelligent management and control of similar public safety events in the future. The images in this dataset have a resolution of 256×256. For this study, we used masked images of 100 of the 525 subjects collected from websites. Because this dataset was collected randomly, without a strictly controlled environment, some photos contain occluded face regions other than the face mask (e.g., sunglasses and other random obstructions). These occlusions cause face recognition algorithms to perform very poorly, with accuracy as low as 10–15%.

Table 2 Experiments with masked and unmasked data from the MILAB(B)-VTF dataset show promising results, which inspired us to expand this work into more sophisticated research and to make greater use of the periocular region in authentication systems

Type of data                      FaceNet accuracy (%)   VGG Face accuracy (%)
Thermal, masked vs masked         87.54                  100
Thermal, unmasked vs unmasked     96.22                  99.84
Visible, masked vs masked         83.54                  99.59
Visible, unmasked vs unmasked     70.00                  99.52

4.2 Effects of Different Datasets Fig. 3 shows that the RMFD dataset contains various types of images captured in different modes and circumstances. While some of the images are of high quality, the rest are of surveillance quality, which makes it difficult to detect or identify a face. MTCNN could not detect the faces in these lower-quality images due to occlusion and poor data quality, and FaceNet and VGGFace could not produce satisfactory results on the RMFD dataset (either masked or unmasked). Because the MILAB(B)-VTF data was captured in a more controlled environment and the facial landmarks are relatively visible, both models achieve higher recognition accuracy on this dataset than on RMFD. Both thermal and visible band images from the MILAB(B)-VTF dataset are used in the face recognition experiments, and Table 2 shows that FaceNet and VGGFace performed better on the thermal images.


4.3 Data Preprocessing As Fig. 3 shows, the images in the datasets differ considerably, including in the regions surrounding the face. For a face recognition algorithm to work accurately, the data must first be preprocessed. We detect the faces using the Multi-Task Cascaded Convolutional Neural Networks (MTCNN) algorithm and crop them (shown in Fig. 5), because both FaceNet and VGGFace require detected faces as input.
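The cropping step can be sketched as below, assuming the detector returns a bounding box as (x1, y1, x2, y2) pixel corners; some MTCNN implementations return (x, y, width, height) instead, which would need converting first. The helper name `crop_face` and the 10% margin are hypothetical choices, and resizing the crop to each model's expected input size is not shown.

```python
import numpy as np

def crop_face(image: np.ndarray, box, margin: float = 0.1) -> np.ndarray:
    """Crop a detected face from an H x W (x C) image array.

    box: (x1, y1, x2, y2) pixel corners from a face detector (assumed format).
    margin: fraction of the box width/height added on each side.
    The expanded box is clipped to the image borders.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    dx, dy = margin * (x2 - x1), margin * (y2 - y1)
    # Expand the box by the margin, then clip it to the valid pixel range.
    left = max(int(round(x1 - dx)), 0)
    top = max(int(round(y1 - dy)), 0)
    right = min(int(round(x2 + dx)), w)
    bottom = min(int(round(y2 + dy)), h)
    return image[top:bottom, left:right]
```

Keeping a small margin around the detected box preserves some context (e.g., the forehead and periocular area) that a tight crop might cut off.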

4.4 Visible vs Thermal Data The results in Table 2 and Fig. 6 show that both face recognition models perform better on thermal data. This is because a thermal camera forms images by detecting the heat emitted by the subject, so the occlusion caused by the mask has less effect on the performance of the face recognition models on thermal images [32]. Visible face images yield excellent face matching results when they are captured in good quality under full lighting and are not occluded by masks.

4.5 Performance of Different Models Table 2 shows that VGGFace outperformed FaceNet in all scenarios. These models were trained on unmasked visible faces, yet they achieved high accuracy on masked data. VGGFace achieves a 100% rank-1 score on masked thermal face images, which is 12.46 percentage points higher than FaceNet (87.54%). This 100% accuracy is also 0.48 percentage points higher than VGGFace's accuracy on unmasked visible data (99.52%), and it is the highest accuracy in all experiments. Fig. 6b shows that VGGFace achieves rank-1 accuracy above 99% in all four experiments, which is not the case for FaceNet (Fig. 6a).
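The gaps quoted above are differences in percentage points between entries of Table 2, not relative improvements. The short sketch below transcribes Table 2 into a hypothetical `TABLE_2` dictionary and reproduces that arithmetic; it also shows that a 12.46-point gain over 87.54% corresponds to a relative improvement of about 14.2%.

```python
# Accuracies (%) transcribed from Table 2 (MILAB(B)-VTF).
TABLE_2 = {
    "thermal, masked": {"FaceNet": 87.54, "VGGFace": 100.00},
    "thermal, unmasked": {"FaceNet": 96.22, "VGGFace": 99.84},
    "visible, masked": {"FaceNet": 83.54, "VGGFace": 99.59},
    "visible, unmasked": {"FaceNet": 70.00, "VGGFace": 99.52},
}

def gap_pp(a: float, b: float) -> float:
    """Absolute difference a - b in percentage points, rounded to 2 decimals."""
    return round(a - b, 2)

def rel_gain(a: float, b: float) -> float:
    """Relative improvement of a over b, in percent, rounded to 1 decimal."""
    return round(100.0 * (a - b) / b, 1)

for condition, acc in TABLE_2.items():
    vgg, fn = acc["VGGFace"], acc["FaceNet"]
    print(f"{condition:18s} VGGFace - FaceNet = {gap_pp(vgg, fn):6.2f} pp "
          f"({rel_gain(vgg, fn):5.1f}% relative)")
```

Reporting both forms avoids ambiguity: "12.46% higher" can be misread as a relative gain when it is an absolute difference.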

5 Conclusion and Future Work We have presented a study of facial recognition on visible and thermal band images using VGGFace and FaceNet. Section 4 shows that applying VGGFace to good-quality, close-distance, indoor, frontal-pose thermal masked face data yields high rank-1 scores: it achieves an accuracy of 100%, whereas FaceNet yields 87.54% on the same data. For unmasked thermal data, VGGFace and FaceNet yield 99.84% and 96.22% accuracy, respectively, and for visible masked data they yield 99.59% and 83.54%, respectively (Table 2). Thus, thermal face images are a potential solution for recognizing faces occluded by face masks. Currently, our research on periocular recognition is carried out only with existing face datasets. To the authors' knowledge, no dataset has yet been collected solely to capture the periocular region of the face. Researchers who combine iris patterns with the periocular region to improve authentication accuracy would benefit most if such a periocular dataset were available, and the same holds for MWIR data. Facial recognition is crucial for security applications, where user cooperation (i.e., subjects looking toward the camera) is not always possible and where facial occlusion is sometimes unavoidable. For such occasions, a periocular region-based authentication system can be a particularly satisfactory solution.

Acknowledgments This work was partially supported by an STTR Phase II contract W911QX20C0022 from the US Army Research Laboratory, Adelphi, MD.

References
1. Bakshi S, Sa PK, Wang H, Barpanda SS, Majhi B (2018) Fast periocular authentication in handheld devices with reduced phase intensive local pattern. Multimedia Tools Appl 77(14):17595–17623
2. Bolme DS, Beveridge JR, Teixeira M, Draper BA (2003) The CSU face identification evaluation system: its purpose, features, and structure. In: International Conference on Computer Vision Systems. Springer, Berlin, pp 304–313
3. Bourlai T (2012) Short-wave infrared for face-based recognition systems. SPIE Newsroom Magazine-Defense & Security, pp 1–2
4. Bourlai T (2013) Mid-wave IR face recognition systems. SPIE Newsroom Magazine-Defense & Security, pp 1–3
5. Bourlai T (2016) Face recognition across the imaging spectrum. Springer, Berlin
6. Bourlai T, Ross A, Chen C, Hornak L (2012) A study on using mid-wave infrared images for face recognition. In: Sensing technologies for global health, military medicine, disaster response, and environmental monitoring II; and biometric technology for human identification IX, vol 8371. International Society for Optics and Photonics, p 83711K
7. Boutros F, Damer N, Raja K, Ramachandra R, Kirchbuchner F, Kuijper A (2020) Fusing IRIS and periocular region for user verification in head mounted displays. In: 2020 IEEE 23rd international conference on information fusion (FUSION). IEEE, pp 1–8
8. Cao Z, Schmid NA, Bourlai T (2016) Composite multilobe descriptors for cross-spectral recognition of full and partial face. Opt Eng 55(8):083107
9. Centers for Disease Control and Prevention (2021) Guidance for wearing masks. https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/cloth-face-cover-guidance.html
10. Dawson J, Leffel S, Whitelam C, Bourlai T (2016) Face recognition across the imaging spectrum. In: Collection of multi-spectral biometrics data for cross-spectral identification applications. Springer, Berlin
11. Deng J, Guo J, Xue N, Zafeiriou S (2019) Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4690–4699


12. Deng J, Guo J, Liu T, Gong M, Zafeiriou S (2020) Sub-center arcface: boosting face recognition by large-scale noisy web faces. In: European Conference on Computer Vision. Springer, Berlin, pp 741–757
13. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
14. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861
15. Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, et al (2019) Searching for mobilenetv3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324
16. Hwang H, Lee EC (2020) Near-infrared image-based periocular biometric method using convolutional neural network. IEEE Access 8:158612–158621
17. Jain V, Learned-Miller E (2010) FDDB: a benchmark for face detection in unconstrained settings. Tech. Rep. UM-CS-2010-009, University of Massachusetts, Amherst
18. Juefei-Xu F, Luu K, Savvides M, Bui TD, Suen CY (2011) Investigating age invariant face recognition based on periocular biometrics. In: 2011 international joint conference on biometrics (IJCB). IEEE, pp 1–7
19. Kumari P, Seeja K (2021) A novel periocular biometrics solution for authentication during covid-19 pandemic situation. J Ambient Intell Humanized Comput 34(4):1–17
20. Li Y (2019) Massface: an efficient implementation using triplet loss for face recognition. arXiv:1902.11007
21. Liu W, Wen Y, Yu Z, Li M, Raj B, Song L (2017) Sphereface: deep hypersphere embedding for face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 212–220
22. Marra F, Poggi G, Sansone C, Verdoliva L (2018) A deep learning approach for IRIS sensor model identification. Pattern Recognit Lett 113:46–53
23. Mason J, Dave R, Chatterjee P, Graham-Allen I, Esterline A, Roy K (2020) An investigation of biometric authentication in the healthcare environment. Array 8:100042
24. Mokalla SR, Bourlai T (2019) On designing MWIR and visible band based deepface detection models. In: 2019 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, pp 1140–1147
25. Mokalla SR, Bourlai T (2020) Face detection in MWIR spectrum. In: Securing social identity in mobile platforms. Springer, Berlin, pp 145–158
26. Narang N, Bourlai T (2018) Deep feature learning for classification when using single sensor multi-wavelength based facial recognition systems in swir band. In: Surveillance in action. Springer, Berlin, pp 147–163
27. Narang N, Bourlai T (2020) Classification of soft biometric traits when matching near-infrared long-range face images against their visible counterparts. In: Securing social identity in mobile platforms. Springer, Berlin, pp 77–104
28. Osia N, Bourlai T (2017) Bridging the spectral gap using image synthesis: a study on matching visible to passive infrared face images. Mach Vis Appl 28(5):649–663
29. Osia N, Bourlai T, Hornak L (2018) Facial surveillance and recognition in the passive infrared bands. In: Surveillance in action. Springer, Berlin, pp 127–145
30. Park U, Ross A, Jain AK (2009) Periocular biometrics in the visible spectrum: a feasibility study. In: 2009 IEEE 3rd international conference on biometrics: theory, applications, and systems. IEEE, pp 1–6
31. Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. In: Proceedings of the British Machine Vision Conference, vol 1, no 3, p 6
32. Peri N, Gleason J, Castillo CD, Bourlai T, Patel VM, Chellappa R (2021) A synthesis-based approach for thermal-to-visible face verification. In: 2021 16th IEEE international conference on automatic face and gesture recognition (FG 2021). IEEE, Piscataway, pp 1–8


33. Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Min J, Worek W (2005) Overview of the face recognition grand challenge. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), vol 1. IEEE, Piscataway, pp 947–954
34. Poh N, Bourlai T, Kittler J (2010) A multimodal biometric test bed for quality-dependent, cost-sensitive and client-specific score-level fusion algorithms. Pattern Recognit 43(3):1094–1105
35. Purnapatra S, Smalt N, Bahmani K, Das P, Yambay D, Mohammadi A, George A, Bourlai T, Marcel S, Schuckers S, Fang M, Damer N, Boutros F, Kuijper A, Kantarci A, Demir B, Yildiz Z, Ghafoory Z, Dertli H, Ekenel HK, Vu S, Christophides V, Dashuang L, Guanghao Z, Zhanlong H, Junfu L, Yufeng J, Liu S, Huang S, Kuei S, Singh JM, Ramachandra R (2021) Face liveness detection competition (livdet-face) - 2021. In: 2021 IEEE international joint conference on biometrics (IJCB), pp 1–10. https://doi.org/10.1109/IJCB52358.2021.9484359
36. Rose J, Bourlai T, Liu H (2020) Face mask compliance classification during a pandemic. In: Disease control through social network surveillance. Springer, Berlin
37. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823
38. Smereka JM, Boddeti VN, Kumar BV (2015) Probabilistic deformation models for challenging periocular image verification. IEEE Trans Inf Forensics Secur 10(9):1875–1890
39. Sun Y, Cheng C, Zhang Y, Zhang C, Zheng L, Wang Z, Wei Y (2020) Circle loss: a unified perspective of pair similarity optimization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6398–6407
40. Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) Cosface: large margin cosine loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5265–5274
41. Wang Z, Wang G, Huang B, Xiong Z, Hong Q, Wu H, Yi P, Jiang K, Wang N, Pei Y, et al (2020) Masked face recognition dataset and application. arXiv:2003.09093
42. Xiang J, Zhu G (2017) Joint face detection and facial expression recognition with MTCNN. In: 2017 4th international conference on information science and control engineering (ICISCE). IEEE, Piscataway, pp 424–427
43. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
44. Zhao Z, Kumar A (2018) Improving periocular recognition by explicit attention to critical regions in deep neural network. IEEE Trans Inf Forensics Secur 13(12):2937–2952