English Pages 204 [194] Year 2022
LNBIP 442
Ewa Ziemba Witold Chmielarz (Eds.)
Information Technology for Management Business and Social Issues 16th Conference, ISM 2021 and FedCSIS-AIST 2021 Track, Held as Part of FedCSIS 2021 Virtual Event, September 2–5, 2021 Extended and Revised Selected Papers
Lecture Notes in Business Information Processing Series Editors Wil van der Aalst RWTH Aachen University, Aachen, Germany John Mylopoulos University of Trento, Trento, Italy Sudha Ram University of Arizona, Tucson, AZ, USA Michael Rosemann Queensland University of Technology, Brisbane, QLD, Australia Clemens Szyperski Microsoft Research, Redmond, WA, USA
More information about this series at https://link.springer.com/bookseries/7911
Editors Ewa Ziemba University of Economics in Katowice Katowice, Poland
Witold Chmielarz University of Warsaw Warsaw, Poland
ISSN 1865-1348 ISSN 1865-1356 (electronic)
Lecture Notes in Business Information Processing
ISBN 978-3-030-98996-5 ISBN 978-3-030-98997-2 (eBook)
https://doi.org/10.1007/978-3-030-98997-2
© Springer Nature Switzerland AG 2022

Chapter “Artificial Intelligence Project Success Factors—Beyond the Ethical Principles” is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter.

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Six editions of this book have appeared in the last six years:
• Information Technology for Management in 2016 (LNBIP 243);
• Information Technology for Management: New Ideas or Real Solutions in 2017 (LNBIP 277);
• Information Technology for Management: Ongoing Research and Development in 2018 (LNBIP 311);
• Information Technology for Management: Emerging Research and Applications in 2019 (LNBIP 346);
• Information Technology for Management: Current Research and Future Directions in 2020 (LNBIP 380);
• Information Technology for Management: Towards Business Excellence in 2021 (LNBIP 413).

Given new opportunities and threats for economic and social development, emerging business and social challenges, and the rapid developments in information systems and technologies and their adoption for improving business and society, there was a clear need for an updated publication. The present book includes extended and revised versions of a set of selected papers submitted to the Advances in Information Systems and Technologies conference track (FedCSIS-AIST 2021) organized within the 16th Conference on Computer Science and Intelligence Systems (FedCSIS 2021), which was held online during September 2–5, 2021.

FedCSIS provides a presentation, discussion, and reputable publication forum in computer science and intelligence systems. It invites researchers and practitioners from around the world to contribute their research results focused on emerging topics related to research and practice in computer science and intelligence systems. Since 2012, the proceedings of the FedCSIS have been indexed in the Thomson Reuters Web of Science, Scopus, the IEEE Xplore Digital Library, and the DBLP Computer Science Bibliography. In 2021 FedCSIS was ranked by CORE as a B conference, a major achievement for the series.
FedCSIS-AIST covers a broad spectrum of topics which bring together information technologies, information systems, and social sciences, i.e., economics, management, business, finance, and education. This year’s edition consisted of the following four scientific sessions: Advances in Information Systems and Technologies (AIST 2021), the 3rd Special Session on Data Science in Health, Ecology and Commerce (DSH 2021), the 16th Conference on Information Systems Management (ISM 2021), and the 27th Conference on Knowledge Acquisition and Management (KAM 2021). AIST focuses on the most recent innovations, current trends, professional experiences, and new challenges in designing, implementing, and using information
systems and technologies for business, government, education, healthcare, smart cities, and sustainable development. DSH topics include data analysis, data economics, information systems, and data science research, focusing on the interaction of the three fields, i.e., health, ecology, and commerce. ISM concentrates on various issues of planning, organizing, resourcing, coordinating, controlling, and leading management functions to ensure the smooth, effective, and high-quality operation of information systems in organizations. KAM discusses approaches, techniques, and tools in knowledge acquisition and other knowledge management areas with a focus on the contribution of artificial intelligence to improving human-machine intelligence and facing the challenges of this century.

For FedCSIS-AIST 2021, we received 30 papers from authors in 13 countries across all continents. The quality of the papers was carefully evaluated by the members of the Program Committees by taking into account the criteria for papers, relevance to conference topics, originality, and novelty. Based on their extensive reviews, a set of 24 papers was selected. After further reviews, only eight papers were accepted as full papers and five as short papers, yielding an acceptance rate of 43%. Finally, eight papers of the highest quality were carefully reviewed and chosen by the chairs of the four sessions, and the authors were invited to extend their research and submit the new extended papers for consideration for this LNBIP publication. Our guiding criteria for including papers in the book were the excellence of submissions as indicated by the reviewers, the relevance of the subject matter for improving management by adopting information technology, and the promise of the scientific contributions and the implications for practitioners.
The selected papers reflect state-of-the-art research work that is often oriented toward real-world applications and highlights the benefits of information systems and technologies for business and public administration, thus forming a bridge between theory and practice. The papers selected to be included in this book contribute to the understanding of relevant trends of current research on and future directions of information systems and technologies for improving and developing business and society. The first part of the book focuses on improving management systems, the second part presents solutions to social problems, and the third part explores methods and approaches for business and societal support.

Finally, we and the authors hope readers will find the content of this book useful and interesting for their own research activities. It is in this spirit and conviction that we offer our monograph, which is the result of the intellectual effort of the authors, for the final judgment of the readers. We are open to discussion on the issues raised in this book and look forward to critical or even polemical voices as to the content and form. The final evaluation of this publication is up to you, our readers.

February 2022
Ewa Ziemba Witold Chmielarz
Acknowledgment
We would like to express our thanks to everyone who made the FedCSIS-AIST 2021 track successful. First of all, we thank our authors for offering very interesting research and submitting new findings for publication in LNBIP. We express our appreciation to the members of the Program Committees for taking the time and effort necessary to provide valuable insights for the authors. Their high standards enabled the authors to ensure the high quality of the papers, which in turn enabled us to ensure the high quality of the conference sessions, excellent presentations of the authors' research, and valuable scientific discussion. We acknowledge the chairs of FedCSIS 2021, Maria Ganzha, Leszek A. Maciaszek, and Marcin Paprzycki, for building an active international community around the FedCSIS conference. Last but not least, we are indebted to the team at Springer without whom this book would not have been possible. We cordially invite you to visit the FedCSIS website at https://fedcsis.org and join us at the future AIST tracks.
Organization
FedCSIS-AIST 2021 Track Chairs
Ewa Ziemba, University of Economics in Katowice, Poland
Witold Chmielarz, University of Warsaw, Poland
Alberto Cano, Virginia Commonwealth University, USA
Program Chairs
Witold Chmielarz, University of Warsaw, Poland
Daphne Raban, University of Haifa, Israel
Jarosław Wątróbski, University of Szczecin, Poland
Ewa Ziemba, University of Economics in Katowice, Poland
Program Committee
Anton Agafonov, Samara National Research University, Russia
Andrzej Białas, Institute of Innovative Technologies EMAG, Poland
Ofir Ben-Assuli, Ono Academic College, Israel
Robertas Damasevicius, Silesian University of Technology, Poland
Gonçalo Dias, University of Aveiro, Portugal
Rafal Drezewski, AGH University of Science and Technology, Poland
Leila Halawi, Embry-Riddle Aeronautical University, USA
Ralf Haerting, Aalen University, Germany
Adrian Kapczyński, Silesian University of Technology, Poland
Wojciech Kempa, Silesian University of Technology, Poland
Agnieszka Konys, West Pomeranian University of Technology, Poland
Eugenia Kovatcheva, University of Library Studies and Information Technologies, Bulgaria
Jan Kozak, University of Economics in Katowice, Poland
Marcin Lawnik, Silesian University of Technology, Poland
Antoni Ligeza, AGH University of Science and Technology, Poland
Amit Rechavi, Ruppin Academic Center, Israel
Nina Rizun, Gdansk University of Technology, Poland
Joanna Santiago, University of Lisbon, Portugal
Wojciech Sałabun, West Pomeranian University of Technology, Poland
Marcin Sikorski, Gdansk University of Technology, Poland
Francesco Taglino, IASI-CNR, Italy
Łukasz Tomczyk, Pedagogical University of Cracow, Poland
Gerhard-Wilhelm Weber, Poznan University of Technology, Poland
Paweł Ziemba, University of Szczecin, Poland
DSH 2021 Chairs
Bogdan Franczyk, University of Leipzig, Germany
Carsta Militzer-Horstmann, WIG2 Institute for Health Economics and Health Service Research, Germany
Dennis Häckl, WIG2 Institute for Health Economics and Health Service Research, Germany
Jan Bumberger, Helmholtz-Centre for Environmental Research – UFZ, Germany
Olaf Reinhold, University of Leipzig/Social CRM Research Center, Germany
Program Committee
Alpkoçak Adil, Dokuz Eylul University, Turkey
Douglas Cirqueira, Dublin City University, Ireland
Nilanjan Dey, Techno International New Town, India
Nils Kossack, WIG2 Institute for Health Economics and Health Service Research, Germany
Karol Kozak, Fraunhofer IWS and Uniklinikum Dresden, Germany
Marco Müller, WIG2 Institute for Health Economics and Health Service Research, Germany
Piotr Popowski, Medical University of Gdańsk, Poland
Shelly Sachdeva, National Institute of Technology Delhi, India
Malte Viehbahn, WIG2 Institute for Health Economics and Health Service Research, Germany
Katarzyna Wasielewska-Michniewska, Systems Research Institute of the Polish Academy of Sciences, Poland
ISM 2021 Chairs
Bernard Arogyaswami, Le Moyne University, USA
Witold Chmielarz, University of Warsaw, Poland
Jarosław Jankowski, West Pomeranian University of Technology, Poland
Dimitris Karagiannis, University of Vienna, Austria
Jerzy Kisielnicki, University of Warsaw, Poland
Ewa Ziemba, University of Economics in Katowice, Poland
Program Committee
Jānis Bičevskis, University of Latvia, Latvia
Alberto Cano, Virginia Commonwealth University, USA
Vincenza Carchiolo, Universita di Catania, Italy
Beata Czarnacka-Chrobot, Warsaw School of Economics, Poland
Pankaj Deshwal, Netaji Subhas Institute of Technology, India
Robertas Damasevicius, Silesian University of Technology, Poland
Monika Eisenbardt, University of Economics in Katowice, Poland
Marcelo Fantinato, University of São Paulo, Brazil
Renata Gabryelczyk, University of Warsaw, Poland
Nitza Geri, The Open University of Israel, Israel
Dariusz Grabara, University of Economics in Katowice, Poland
Jarosław Jankowski, West Pomeranian University of Technology, Poland
Andrzej Kobylinski, Warsaw School of Economics, Poland
Christian Leyh, Dresden University of Technology, Germany
Karolina Muszyńska, University of Szczecin, Poland
Tomasz Parys, University of Warsaw, Poland
Uldis Rozevskis, University of Latvia, Latvia
Nina Rizun, Gdansk University of Technology, Poland
Andrzej Sobczak, Warsaw School of Economics, Poland
Jakub Swacha, University of Szczecin, Poland
Symeon Symeonidis, Democritus University of Thrace, Greece
Oskar Szumski, University of Warsaw, Poland
Jarosław Wątróbski, University of Szczecin, Poland
Janusz Wielki, Opole University of Technology, Poland
Dmitry Zaitsev, Odessa State Environmental University, Ukraine
Marek Zborowski, University of Warsaw, Poland
KAM 2021 Chairs
Krzysztof Hauke, Wroclaw University of Economics, Poland
Małgorzata Nycz, Wroclaw University of Economics, Poland
Mieczysław Owoc, Wroclaw University of Economics, Poland
Maciej Pondel, Wroclaw University of Economics, Poland
Program Committee
Witold Abramowicz, Poznan University of Economics, Poland
Frederic Andres, National Institute of Informatics, Japan
Yevgeniy Bodyanskiy, Kharkiv National University of Radio Electronics, Ukraine
Witold Chmielarz, University of Warsaw, Poland
Dimitar Christozov, American University in Bulgaria, Bulgaria
Jan Vanthienen, Katholieke Universiteit Leuven, Belgium
Eunika Mercier-Laurent, Jean Moulin University Lyon 3, France
Małgorzata Sobińska, Wroclaw University of Economics, Poland
Jerzy Surma, Warsaw School of Economics, Poland/University of Massachusetts Lowell, USA
Julian Vasiliev, University of Economics – Varna, Bulgaria
Yungang Zhu, Jilin University, China
Contents
Approaches to Improving Management Systems

Analysis of Critical Success Factors for Successfully Conducting Digitalization Projects
Christian Leyh, Konstanze Köppel, Sarah Neuschl, and Milan Pentrack

Analysis of Concurrent Processes in Internet of Things Solutions
Janis Bicevskis, Girts Karnitis, Zane Bicevska, and Ivo Oditis

A Framework for Emotion-Driven Product Design Through Virtual Reality
Davide Andreoletti, Marco Paoliello, Luca Luceri, Tiziano Leidi, Achille Peternier, and Silvia Giordano

Solutions to Social Issues

Artificial Intelligence Project Success Factors—Beyond the Ethical Principles
Gloria J. Miller

Planning a Mass Vaccination Campaign with Balanced Staff Engagement
Salvatore Foderaro, Maurizio Naldi, Gaia Nicosia, and Andrea Pacifici

Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset
Federica Rollo, Giovanni Bonisoli, and Laura Po

Methods for Supporting Business and Society

Towards Reliable Results - A Comparative Analysis of Selected MCDA Techniques in the Camera Selection Problem
Aleksandra Bączkiewicz, Jarosław Wątróbski, Bartłomiej Kizielewicz, and Wojciech Sałabun

Towards a Web-Based Platform Supporting the Recomposition of Business Processes
Piotr Wiśniewski, Agata Bujak, Krzysztof Kluza, Anna Suchenia, Mateusz Zaremba, Paweł Jemioło, and Antoni Ligęza

Author Index
Approaches to Improving Management Systems
Analysis of Critical Success Factors for Successfully Conducting Digitalization Projects

Christian Leyh¹ (✉), Konstanze Köppel¹, Sarah Neuschl², and Milan Pentrack²

¹ Technische Universität Dresden, 01062 Dresden, Germany
[email protected]
² Fraunhofer Center for International Management and Knowledge Economy IMW, 04109 Leipzig, Germany
{sarah.neuschl,milan.pentrack}@imw.fraunhofer.de
Abstract. Our article provides insights into which critical success factors (CSFs) for digital transformation projects are considered particularly relevant from the perspective of companies. To this end, we conducted an online survey among German companies, based on an existing set of CSFs. The aim of our study was to obtain a practice-oriented ranking from which to derive approaches for further research into CSFs in digital transformation. Our study shows that it is primarily CSFs in the dimensions of Corporate organization and Technology that are considered to be particularly relevant. For example, when asked about important CSFs, the companies stated that digital transformation projects require, among other factors, a suitable Corporate culture, Top management support, and a Unified digital corporate strategy/vision. This article thereby contributes practical experience and orientation knowledge to CSF research in the context of digital transformation. This will make it possible to develop practical recommendations for action and assistance in shaping the digital transformation.

Keywords: Digitalization · Digital transformation · Critical success factor · CSF · Digitalization projects
© Springer Nature Switzerland AG 2022
E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 3–25, 2022. https://doi.org/10.1007/978-3-030-98997-2_1

1 Introduction

Today, more than ever, society is undergoing a rapid digital transformation. Government institutions, households, enterprises, and their interactions are changing due to the increasing prevalence and rapid growth potential of digital technologies. It is exceedingly important for companies to be able to rely on a deep understanding of information technology (IT) and of digital innovation. The technological possibilities, especially concerning the merging of the physical with the digital world, are leading to fundamental paradigm shifts that affect all industries. Furthermore, the progressive and steady digitalization of society itself, with associated changes, also plays a role in the daily activities of enterprises. The consequences of this development and the question of whether these changes should be seen as positive or negative are omnipresent [1–5].

Digitalization has long since ceased to be a mere buzzword and has instead become a strategic competitive factor. Moreover, digitalization is often seen as an enabler to increase resilience in companies. Here, the positive effects of digital technologies and business models are emphasized. The COVID-19 crisis lends new relevance to this thesis, as many companies were only able to maintain certain processes with the help of digital tools [6–8]. In the COVID-19 crisis, it became particularly apparent that the challenge is not merely the implementation and use of digital technologies, since the accompanying, appropriate changes at every organizational level, e.g., business process adjustments, business model innovations, and restructuring the company organization itself, are at least of equal importance. Consequently, mastering the challenges posed by digitalization is no longer merely the task of the IT department but of the entire company [5].

Activities and projects in digital transformation are usually highly complex and time-intensive, thus leading to great opportunities for companies as well as enormous risks. To avoid being “swallowed up” by the risks, it is imperative for companies to focus on the factors that influence digitalization projects. In this context, various studies (e.g., [9–15]) have shown that paying attention to these so-called “critical success factors” (CSFs) can have a positive influence on the success of IT projects and their subsequent use, thus minimizing the project risks. In both scientific and practice-oriented literature, the CSFs for digitalization projects are primarily discussed against the background of the difference between digitalization projects and “classic IT projects.” Addressing this discussion and topic, we set up a long-term research project that specifically addresses CSFs influencing projects in the context of the digital transformation of enterprises.
As a first step, we conducted an extensive systematic literature analysis to identify the CSFs of digitalization projects. Second, we set up an interview study with several selected companies to verify the factors identified in the literature and identify additional factors (see [16]). This resulted in 25 CSFs that form the basis of the third step of the research project and, thus, the basis of this paper. The aim of this third step is to examine the importance/relevance of the identified 25 CSFs for digitalization projects with a quantitative study using an online survey. Furthermore, we aim to examine the implementation and characteristics of these factors in the companies’ projects as a fourth step (which will not be part of this paper).

For the third step, we derived four research questions to guide our analysis. In the following, we focus only on the central research question for the aim and scope of this paper:

Which critical success factors are considered (particularly) important in digitalization projects?

Taking up this research question, this paper (an extended version of [17]) aims to provide initial answers by presenting and discussing selected results of the online survey. To this end, we structured the paper as follows. This introduction is followed by a brief overview of the theoretical foundations of our study as well as a description of the most important CSFs. Afterwards, we present the design of our study and the structure of the questionnaire. Then we describe selected results of the survey. Finally, the paper concludes with a discussion and an overview of further research steps.
2 Theoretical Background

2.1 Digital Transformation

Digital transformation is the inner engine of a highly extensive transformation, the effects of which are technologically detectable but whose overall consequences for the economy and society are not traceable. Driven by the fourth industrial revolution, it is not only customer behavior that has changed but also the way people, organizations, and industries interact with each other [18]. So far, there is no universal definition of digital transformation in the literature. The terms digital transformation, digitalization, and digital age are frequently used as synonyms. Therefore, we use the term digital transformation (DT) in this study. Despite the different views of DT, we can see that DT is a development driven by digital technologies and constant changes in society as well as companies. DT is described as linking together the changes in strategies, business models, cultures, structures, and processes in companies with the goal of strengthening the company’s market position using digital technologies [19].

Furthermore, DT differs from a classic change process in three specific characteristics. The first characteristic is that DT often starts with the customer. Here, the digital customer data plays a central role. New business models, for example, can emerge from this resource. The second characteristic is that DT represents more than just the optimization of business processes and IT. In general, DT encompasses the complete renewal of the entire business model. The third and final characteristic is that DT is an open-ended and long-term process. The most profound difference between DT and a classic change process is its open-endedness. It poses a completely new kind of management challenge, since there are few to no standards or best practices that companies can draw on for help. Management must start from new premises for the conception and implementation of DT processes [20].
In conclusion, there is no clear definition for DT in science and practice, as various definitions represent DT in a general or highly simplified way. In the context of our study, we define DT as follows: DT refers to the fundamental transformation of society, as well as the economy, using digital technologies. DT not only has social, cultural, legal, and political implications, but also consequences for all corporate structures and value chains. For companies to master DT successfully, new business models, strategies, organizational forms, and processes are necessary, as well as a strong customer-centricity.

2.2 Digitalization Projects

The transformation of the company in DT is often carried out through several digitalization projects. However, there is no uniform definition for digitalization projects in the literature. In general, a digitalization project can involve not only redesigning parts of the working environment but also networking systems or production facilities through machines. In any digitalization project, it is important to consider the reservations, wishes, and goals of the various target groups. In general, the procedure of a digitalization project is divided into four phases: goal setting, strengthening project acceptance, implementation, and control. The first phase derives the objectives and strategies for the DT project. Since this
is the basis of the entire digitalization project, it is essential to involve all target groups. In the second phase, a strategic and tactical concept design of the digitalization project must be developed and implemented in the company. For the successful completion of a digitalization project, it is helpful to define a person responsible for the project who is already familiar with the implementation of DT. In the third phase, the actual implementation of the digitalization project takes place through suitable measures in the company. The final phase monitors the success of the digitalization project. Feedback should be obtained from all stakeholders involved to derive the potential for improvement [21].

2.3 Critical Success Factors

For several decades, practitioners have been dealing with the idea that corporate success is based on specific influencing factors and measures of management. In practice, success factor research first gained acceptance through the much-cited PIMS study (Profit Impact of Market Strategy), which addressed corporate success and its causes in the early 1970s. This was a pioneering study in the field of success factor research. Over the years, other works have also had a significant influence on the domain. For example, Rockart [22] took up the ideas of the initial success factor research and further developed them in his concept of critical success factors using a variety of methods. Rockart [22] conducted intensive interviews with chief executive officers (CEOs) of specific companies to identify success factors. Since 1980, research in the field has shifted from specific individual cases to holistic or industry-specific research on critical success factors [23]. However, the literature defines the term success factors differently. The terms critical success factors, strategic success factors, and key factors are often used as synonyms. In this study, we use the term critical success factors (CSFs).
Table 1 shows selected definitions in the literature. The definition by Rockart [22] is the most influential.

Table 1. Definitions of CSFs.
Reference | Definition
[22] | “Critical success factors thus are, for any business, the limited number of areas in which results, if they are satisfactory, will ensure successful competitive performance for the organization. They are the few key areas where ‘things must go right’ for the business to flourish. If results in these areas are not adequate, the organization’s efforts for the period will be less than desired.”
[24] | “Key success factors are those variables, which management can influence through its decisions that can affect significantly the overall competitive positions of the various firms in an industry.”
[25] | “Critical Success Factors (CSFs) are those characteristics, conditions, or variables that when properly sustained, maintained, or managed can have a significant impact on the success of a firm competing in a particular industry.”
All authors of the definitions presented in Table 1 point out that CSFs play a decisive role in the success of the company and the project. These are areas of action that management must monitor continuously and carefully and that contribute to the achievement of the company’s goals [22]. However, CSFs vary by company and industry. Therefore, it is important for each company to identify the specific CSFs of their industry and respective project areas.

2.4 Critical Success Factors for Digitalization Projects

In the first step of our research project, we identified 25 CSFs of DT (see [16]), which form the basis of this paper. Table 2 lists the 25 CSFs of DT associated with their respective dimensions.

Table 2. CSFs of digitalization projects (adapted from [16])
Dimension | CSFs
Corporate organization | Corporate culture; Implementation of a digital mindset; Unified digital corporate strategy/vision; Leadership; Top management support; Change management; Digital talent in leadership positions; Qualification
Technology | Data collection/Big data analysis; Hardware; Software; Unified database in an overall system; Data security
Customer | Customer centric management model; Omni-channel-management
Project management | Network effects through open systems/partnerships; Long-term implementation through short intensive sprints; Resources
Value creation | Networking of the entire value network; Implementation of new KPIs; Cross-functional development teams; Lean thinking/OpEx
Value proposition | Servitization; Fast prototyping; Scalability
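For readers who want to work with the factor set programmatically, for instance to code survey responses per dimension, the content of Table 2 can be held in a simple mapping. The following is only a minimal sketch: the names `CSFS_BY_DIMENSION`, `count_csfs`, and `dimension_of` are hypothetical and are not part of the study or of [16]; only the dimension and factor labels are taken from Table 2.

```python
# Illustrative encoding of Table 2. The structure and all identifiers are
# assumptions made for this sketch; only the labels come from the table.
CSFS_BY_DIMENSION = {
    "Corporate organization": [
        "Corporate culture",
        "Implementation of a digital mindset",
        "Unified digital corporate strategy/vision",
        "Leadership",
        "Top management support",
        "Change management",
        "Digital talent in leadership positions",
        "Qualification",
    ],
    "Technology": [
        "Data collection/Big data analysis",
        "Hardware",
        "Software",
        "Unified database in an overall system",
        "Data security",
    ],
    "Customer": [
        "Customer centric management model",
        "Omni-channel-management",
    ],
    "Project management": [
        "Network effects through open systems/partnerships",
        "Long-term implementation through short intensive sprints",
        "Resources",
    ],
    "Value creation": [
        "Networking of the entire value network",
        "Implementation of new KPIs",
        "Cross-functional development teams",
        "Lean thinking/OpEx",
    ],
    "Value proposition": [
        "Servitization",
        "Fast prototyping",
        "Scalability",
    ],
}


def count_csfs(csfs_by_dimension):
    """Return the total number of CSFs across all dimensions."""
    return sum(len(factors) for factors in csfs_by_dimension.values())


def dimension_of(csf, csfs_by_dimension=CSFS_BY_DIMENSION):
    """Return the dimension a CSF belongs to, or None if it is not listed."""
    for dimension, factors in csfs_by_dimension.items():
        if csf in factors:
            return dimension
    return None


if __name__ == "__main__":
    print(count_csfs(CSFS_BY_DIMENSION))   # 25, matching the text
    print(dimension_of("Data security"))   # Technology
```

The count check is a quick way to confirm that the six dimensions together cover the 25 CSFs stated in the text.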
To provide a comprehensive understanding of the different CSFs and their concepts, they are described in this section before presenting the research methodology and discussing the results. However, only the most important factors per dimension are described
subsequently. A detailed description of the remaining factors can be requested from the authors.

CSF Dimension “Corporate Organization”

Top Management Support: In DT, top management support and commitment is an important CSF. Insufficient top management support or a lack of project understanding by top management can be problematic in DT implementation [26]. For this reason, it is important that this project is given highest priority by top management and that top management is actively involved in the transformation process. Since top management is responsible for the digital transformation strategy, strategic decisions and measures can only be taken with the regular, active involvement of top management; without it, they are executed slowly or not at all [27]. When the Chief Information Officer (CIO) manages the transformation, the focus is usually too much on business processes. However, in the context of DT, the focus should be directed toward the interface with the customer. By introducing a Chief Digital Officer (CDO), the right focus can be ensured and, in addition, sufficient top management attention can be generated [28]. The CDO's activities in the company include the development of new digital business models, the introduction of innovative technologies, and the promotion of networked working in the company. In this context, the CDO acts as a disruptor who acts strategically and entrepreneurially in the network organization. On the one hand, the CDO should penetrate new industries and challenge existing market standards; on the other hand, the CDO should be the first point of contact for the operational digitization of existing business areas and the transfer of digital skills throughout the hierarchical part of the company [29].

Corporate Culture: Corporate culture is considered one of the most important CSFs of DT. This is because DT not only creates new business models, but also improves business processes and changes the way companies process real-time information.
High expectations are placed on the resulting disruptive innovations. However, such promising business changes, which produce many benefits for the customer and the organization, often also cause problems because they collide with the corporate culture. Companies should therefore invest a lot of time and resources of various kinds so that a new digital corporate culture can be created. The sooner such cultural changes are managed, the sooner it is possible to implement new business models and opportunities. The corporate culture should therefore be data-based, data-driven, agile, risk-taking, creative, and have flat hierarchies [30]. To create agile and digital organizations, it is necessary to profoundly change the distribution of work and roles as well as the associated shared values, norms, and attitudes. Interdisciplinarity and personal responsibility are important characteristics of agile teams. The definition and pursuit of goals take place jointly because each member is seen as a relevant part of the team and thus bears direct responsibility for the outcome of his or her actions [29]. All agile methods are based on the same principle: the failure of a project in the early phases is encouraged and the lessons learned are used in later project phases. Therefore, companies should support the culture of “learning from mistakes” and stimulate the innovative power of employees [31]. High creativity and risk-taking are closely linked in this approach. The hierarchy levels of a company are also crucial for this approach. Hierarchies must be flattened to ensure shorter decision-making paths on the one hand and to enable accelerated adaptability
Analysis of Critical Success Factors
to dynamic market changes on the other. Employees should be able to make their own decisions without having to pay too much attention to formalities. Companies are subject to constant change, so openness and willingness to change, especially in DT, must be deeply embedded in the company’s cultural identity and mission statement [32].

Implementation of a Digital Mindset: DT is forcing companies to develop a new way of thinking: a digital mindset. Companies need to revise or innovate their current business model to meet the challenges of shifting the economy from automation to digitalization. With a digital mindset, the time needed for innovation can be shortened. Thus, companies can recognize the potential of an innovation early and analyze it to ultimately adopt the innovation faster and more easily [33]. The digital mindset not only supports the confident use of new technologies, but also a variety of other behavioral patterns. At the heart of this is the idea of an unlimited number of possibilities and opportunities for growth. When companies have a digital mindset, it means they are comfortable with embracing new technologies and understand the importance of data and information for progress. They also see change as an opportunity and accept new ways of working. Furthermore, they can make decisions based on incomplete information or adapt them to new circumstances in an emergency. Employees are open-minded, curious, and willing to learn when dealing with new technologies and agile working methods [34]. Companies must succeed in ensuring that data and information are perceived as valuable resources that are essential in the transformation to a digital company and can thus generate or achieve competitive advantages [31].

CSF Dimension “Technology”

Data Security: In DT, topics such as data security, data protection, or compliance play a major role.
Data or applications must always be available to employees, but legal frameworks must be observed, and the information must be protected from third parties [35]. Due to overly strict guidelines for companies in Germany, they may lose out to their international competitors, both as users of DT, and as developers of digital business models and products. Alongside data protection, the issue of data security is a critical factor for the success of a company’s digitalization projects. Effective data security management can ensure this success. To guarantee corporate security, an individual security analysis should be carried out by specialists in every company. This allows attack methods and scenarios to be tested and potential threats to be identified in good time. Companies can only successfully master DT if the increasing volumes of digital data are adequately protected and safe from cyberattacks [36].

Software: Until now, many companies have had many heterogeneous systems and software solutions. For example, there are individual spreadsheets, small special applications, or enterprise resource planning systems (ERP systems). All in all, the complexity in companies is already high. However, this will be increased by digital value networks and smart products. For DT, the right software is very important. In this regard, a cloud solution can provide an appropriate way of obtaining the right software “easily” [33]. Cloud solutions make it possible to store, manage, and process data without the company needing local hardware capacities for operation [37]. By using cloud computing, regardless of the deployment model, companies are not only offered the flexibility to
perform operational business processes, but also customized solutions to digitize the business model [33].

Unified Database in an Overall System: The integration of individual applications in an overall system is also considered critical to success in DT. In a digitalized environment, a large number of computer devices, household appliances, mobile devices, industrial plants, sensors, cameras, and vehicles are connected by different software and hardware and operate on different platforms. To ensure a seamless exchange of information, the individual applications must be integrated via standardized interfaces and not used in isolation. To avoid redundancies and ensure free and cross-process data maintenance, it is important that the applications access a unified database [26, 38].

CSF Dimension “Customer”

Customer Centric Management Model: For successful digital companies, the focus is on the customer, with whom the company can connect digitally. However, the customer-centric management model goes far beyond the previous customer orientation of companies. For a traditional company, the product is usually the focal point through which the company defines itself. In a customer-centric company, on the other hand, all customer experiences form the benchmark for the development of services and their implementation. This means, for example, that a company in the automotive industry no longer sees itself solely as a carmaker, but rather as a mobility service provider [39]. Whether incremental or radical innovations, companies should pursue a benefit-centric mindset in all development processes. For companies that are digitally transformed, a core task is to identify relevant customer needs and translate them into usable solutions, as well as to innovate the customer experience [29], iteratively and continuously.
Not only must the sales, marketing, and service departments work together, but product design, supply chain management, human resources, IT, and finance all have to interact at every stage of the development process. By integrating the customer into the entire value chain, a customer-centric company differs from those that only engage the customer through innovative media [40].

Omni-Channel-Management: In recent years, the simultaneous use of offline and online channels has led many companies to establish multi-channel management. In this process, different content is harmonized across different channels to leave a consistent overall impression on the customer. In today’s world, omni-channel management is required in companies. This represents the further development of multi-channel management. While multi-channel management attempts to give the customer a consistent overall impression across several channels, this is already a prerequisite for omni-channel management. The novelty of omni-channel management is that the channel-specific possibilities within a campaign are exhausted in each case. To reach and convince a company’s customers in the best possible way, an omni-channel presence is unavoidable. Above all, this requires more individual, flexible, and adaptable sales areas [36, 41]. In DT, successful omni-channel management can deliver great added value. If different channels are intelligently linked at the point of sale, this can contribute to higher
customer loyalty by analyzing user behavior and identifying customer needs even better. Here, companies usually “only” have to face the challenge of effectively using the enormous amount of data [27]. The use of all analog and digital channels as well as the modification or renewal of business models is a critical factor for the success of digitization projects in companies [42].

CSF Dimension “Project Management”

Network Effects Through Open Systems/Partnerships: Companies’ own products can become even more powerful in the network if the companies open these products or services, enabling easier integration of new devices, people, and objects through open standards. Network effects can be created by increasing the value of a product or service. These effects can be further enhanced as more users adopt these products or services, or as other providers extend the service. For companies, network effects can become a differentiator and driver of value creation by making products and services more digitized and connected. Until now, traditional companies largely controlled the speed to market of their standalone autonomous products themselves. However, DT requires companies to coordinate their product and service launches with complementary products and services from networks or partnerships that have emerged [43]. Effective and efficient partnerships can arise not only between established companies, but also between established companies and start-ups. In the case of larger companies, despite sufficient relevant resources, such as budget or technical equipment, the innovative power is lower. Start-ups, on the other hand, have the necessary prerequisites for a fast and agile introduction of digital technologies or innovations [44]. With the help of new creativity and innovation methods, new business areas can be worked on in the network-oriented part of the company.
Companies can promote and develop promising ideas in innovation projects and, if successful, incorporate them into the existing corporate portfolio or transfer them to their own corporate structure through a spin-off [29].

Long-Term Implementation Through Short Intensive Sprints: For companies, the “how” plays an important role in successful DT implementation. The path and the result are associated with a certain degree of uncertainty when planning DT projects. Since companies have no previous experience in this regard, unexpected obstacles can arise at any time. The more extensive the project is and the longer it takes to implement, the more challenges accumulate. In sum, the problems are more difficult to eliminate than if each obstacle is considered and solved individually. A long-term strategy is important for DT, but rather than one enterprise-wide project, DT implementation should be approached in several smaller projects, characterized by a test-and-learn approach. By splitting the project, integration can be faster and, in addition, acceptance within the company can be reached [42, 45].

Resources: Sufficient resources are needed to implement transformation initiatives and digitization projects. Examples of such projects are the acquisition of digital technologies, the targeted recruiting of digital talent, or the improvement of employee qualifications [28]. In the case of financial resources, it is not the amount that is decisive, but where and how the funds are used. Companies usually prefer existing, well-known systems and processes and try to update them through investments or replace older versions with
newer ones. For unknown projects, the monetary benefits are difficult to estimate, and companies are less willing to invest. When in doubt, companies are more likely to use their human and financial resources for safer and better-known methods. However, automating old processes or making them more efficient would not achieve the desired results. In contrast, it is better to analyze and question the existing process to redefine and redesign it if necessary. Problems such as incorrect prioritization of issues or resource allocations, an insufficient number of project staff, or a lack of time due to cost constraints can make successful DT implementation difficult or even impossible [37, 45].

CSF Dimension “Value Creation”

Networking of the Entire Value Network: Value networks that place a strong focus on innovation are usually still unable to meet the challenges of DT. The speed factor is very important in DT because, for example, product life cycles are becoming shorter and more unpredictable. In addition, customers are demanding faster development and delivery of new products or services. This demand can be met through better response times, new organizational structures, or business processes. Crucial to this change is the transformation of traditional value networks into demand-centric networks. The focus of supply chain managers should shift from cost reduction to the introduction of new processes to interconnect the entire value network more closely. Thus, benefits can be gained from technological innovations [46].

Cross-functional Development Teams: Consistent collaboration in cross-functional development teams is essential for successful digital enterprises. New forms of division of labor between different corporate functions and divisions are also emerging because of new networking opportunities in DT. Project-based collaboration across organizational departmental boundaries is a key CSF for companies.
For smooth collaboration, it is important that the operational experts of the different areas can contribute their knowledge to the digital development process at any time. Cross-functional and cross-departmental integration leads to numerous benefits, such as an improved exchange of ideas, an increase in problem-solving skills, or the consideration of different perspectives [29, 47].

Lean Thinking/OpEx: The four principles of lean management are creating customer value, eliminating waste, continuous improvement, and respect for people. Lean management includes various methods for implementing these principles, such as 5S, Total Productive Maintenance (TPM), or Kanban [48]. With the help of lean management, companies can guarantee the efficiency, scalability, reliability, predictability, and quality of core operations and ultimately achieve operational excellence (OpEx). The company’s existing business operations and processes should be free of waste (lean) before the company moves forward with DT. If flawed or inefficient processes were to be digitized, the waste would be duplicated, and the processes would have to be optimized and changed afterwards at great expense [49]. The operational business of companies that have introduced OpEx is automated, standardized, effective, and reliable. This allows companies to focus more intensively on strategic issues and the development of innovations. Companies that have not achieved OpEx lack basic competencies. However, these are necessary to carry out the transformation into a digital business operation in the
long term and to implement a strategy for customer retention or for digitized solutions. For the successful implementation of DT in companies, the achievement of OpEx is an important component [50].

CSF Dimension “Value Proposition”

Servitization: Servitization is the transformation that takes place in a company when more and more services are offered that are directly related to the products. The aim is to maximize the benefits for the customer as well as for the provider [51]. DT increasingly blurs the boundaries between physical products and intangible services. Companies that sell hardware are equally providers of related services. Customers are making their decision to buy a product increasingly dependent on the service offered. Therefore, for many companies, the service is often more decisive for differentiation on the market than the product itself [29, 52]. In many industries, pure product offerings initially dominated. Over time, complementary services developed alongside the products. By now, the product range of companies is increasingly being penetrated by services. It is only through the acquisition of specific services that some products achieve their full depth of application. For this reason, more business models that focus on services are emerging. By offering digital services, interaction with the customer is simplified, and companies can gain more information about the wishes or requirements of customers and respond to them better. Big data is collected through various services. The analysis of this data can provide important information for the company that helps to successfully implement DT [29, 50].

Fast Prototyping: The success of a new product is ensured by an iterative process of frequent feedback loops and responses to that feedback. Learning processes can be accelerated if rapid prototyping is encouraged within the company.
Companies can continuously and immediately test their products when these new products are quickly modeled and immediately brought to market. With direct testing at the customer site, many important insights can be generated. With the information, product development staff can more quickly and efficiently develop ideas that meet customer needs and expectations [14]. This allows companies to escape the previous rigid product development and thus work on constant, iterative changes and further developments [49].

Scalability: Companies developing new products and services should ask themselves two elementary questions: The first question is whether the solutions developed are scalable. The second question is how quickly these solutions can be adapted to ten, one thousand, or even ten million users. Even the best products and services generate little added value if they cannot be easily scaled. Solutions with shared standards often offer new opportunities in the market as well as fast and scalable growth options [29]. Therefore, all companies always strive for high scalability of their products or services. The results of virtual corporations have already shown the extent to which scalability is possible in a digital environment. However, a digital environment alone is not sufficient. Skilled management is also essential for such scalability. If companies want to be successful, they must find a way to best manage exponential growth of their products or services. To ensure that growth does not overwhelm managers, specific foundations for scalability should be laid in good time [49].
3 Research Methodology

3.1 Structure of the Online Questionnaire

With our central research question, we aim to gain initial insights into companies’ assessments and understandings of CSFs in DT. Therefore, we chose an explorative approach for this study. Accordingly, the study is intended as a starting point for more in-depth investigations of the characteristics of the individual CSFs in the further course of our research project. For this reason, we also make no claim regarding the representativeness of the participants in this study. To design our online questionnaire, we looked at existing CSF study designs (e.g., for ERP system implementation projects) and used them for orientation. In total, our questionnaire was divided into three parts.

Part A comprises 11 questions (Part A.1: four questions, Part A.2: seven questions). In Part A.1, the first two questions address the company’s industry sector and number of employees. In the last two questions of Part A.1, the participant is asked to state his/her position in the company and the location (federal state) of the company. In Part A.2, first, we asked the participant if he/she agrees with the given definition of DT (see Sect. 2). The next question discusses the company’s attitude regarding DT against the background of the current COVID-19 crisis. We then asked whether the pandemic has favored the attitude towards digitalization projects in certain companies. The next three questions address the participants’ assessment of the extent to which their companies have already implemented digitalization projects, in general. The last two questions in Part A.2 cover the structure of the IT department and the current digital trends the company is focusing on, such as smart factories or IoT.

Part B comprises seven questions. First, the participant must assess all CSFs regarding the perceived influence on the success of digitalization projects.
To this end, we query the 25 CSFs within the dimensions of corporate organization, technology, customer, project management, value creation, and value proposition. Each dimension represents one matrix question (six questions in total). We chose a 5-point Likert scale to measure the influence of the CSFs of digitalization projects: 1 = No influence; 2 = Little influence; 3 = Medium influence; 4 = High influence; and 5 = Very high influence. Parts A and B are relevant for addressing our central research question within this paper.

The aim of the final Part C is to evaluate the implementation of the CSFs in the company. Due to the complexity of the factors, it was, unfortunately, not possible to ask about all CSFs. Therefore, we examined only the three CSFs that were determined in Part B as the three most important CSFs of DT. To prevent the questionnaire from becoming too long, we asked a maximum of three questions for each CSF, so that the total number of questions in Part C did not exceed nine. The results of Part C are not part of this paper.

For the implementation of the questionnaire, we used the online survey application LimeSurvey. For a better overview, we displayed all the questions of a question group on one page to reduce the number of clicks needed. To ensure a common understanding of the terms, we provided an overlay option for all terms/concepts, allowing participants to view a given definition.
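To illustrate how an average score per CSF and a ranking based on it could be derived from 5-point Likert responses, a minimal Python sketch is given here. It is not the evaluation procedure used in the study: the labels "CSF A" to "CSF C", the response counts, and the function names are hypothetical placeholders chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Likert codes as in the questionnaire: 1 = no influence ... 5 = very high influence.
LIKERT_CODES = (1, 2, 3, 4, 5)


def mean_score(counts: Dict[int, int]) -> float:
    """Weighted mean of the Likert codes, given the number of responses per code."""
    total = sum(counts.get(code, 0) for code in LIKERT_CODES)
    if total == 0:
        raise ValueError("no responses")
    return sum(code * counts.get(code, 0) for code in LIKERT_CODES) / total


def rank_by_mean(responses: Dict[str, Dict[int, int]]) -> List[Tuple[int, str, float]]:
    """Return (rank, label, mean) tuples, highest mean first (ties keep input order)."""
    means = {label: mean_score(counts) for label, counts in responses.items()}
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [(rank, label, round(mean, 2))
            for rank, (label, mean) in enumerate(ordered, start=1)]


# Hypothetical response counts per Likert code (NOT data from the survey).
example = {
    "CSF A": {1: 0, 2: 2, 3: 8, 4: 40, 5: 50},
    "CSF B": {1: 5, 2: 10, 3: 25, 4: 40, 5: 20},
    "CSF C": {1: 1, 2: 4, 3: 15, 4: 50, 5: 30},
}

for rank, label, mean in rank_by_mean(example):
    print(rank, label, mean)
```

Treating Likert codes as interval-scaled numbers is itself an assumption; a median or the share of "high" and "very high" answers would be alternative summaries.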
Before the online survey started, we performed a pretest to check the questionnaire instructions and individual items for comprehensibility and errors. Within the scope of the pretest, seven people from the target group (e.g., managing directors, department heads) went through the questionnaire. Their answers were not included in the final data evaluation. Based on their feedback, we made final changes to the online questionnaire.

3.2 Data Collection

For the online survey, we invited companies to participate primarily via e-mail. We used the AMADEUS company database (https://amadeus.bvdinfo.com/) by Bureau van Dijk as the main source for contact information. The query in the AMADEUS database was limited to “active companies,” regardless of industry sector, that provided an e-mail address, were headquartered in Germany, and had at least 20 employees. From the resulting list, 7,360 e-mails were randomly sorted and sent to companies in the period from December 1, 2020, to January 31, 2021. In addition, we shared the link to the online survey in various groups on the XING platform (https://www.xing.com/). After the survey period closed, the questionnaire had been at least partially completed 225 times. Of these 225 questionnaires, 101 were completed in full. Before the data analysis was carried out, the 101 fully completed questionnaires were checked for plausibility. As a result, we had to exclude four data sets, which meant that 97 data sets could be considered for the evaluation of results presented in the following section.
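The figures reported for the data collection can be turned into a few simple rates. The following Python sketch uses only the numbers stated above; the variable names and the helper `share` are illustrative assumptions. Because the survey link was also shared on XING, the ratio of questionnaires to e-mails is at best a rough upper bound for the e-mail response rate.

```python
# Figures as reported in Sect. 3.2.
emails_sent = 7360
partially_completed = 225
fully_completed = 101
excluded = 4

usable = fully_completed - excluded


def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of started questionnaires that were completed in full.
completion_rate = share(fully_completed, partially_completed)
# Share of fully completed questionnaires that passed the plausibility check.
usable_rate = share(usable, fully_completed)
# Upper bound only: some responses may stem from the XING link, not the e-mails.
approx_response_rate = share(partially_completed, emails_sent)

print(usable, completion_rate, usable_rate, approx_response_rate)
```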
4 Selected Results

4.1 General Participants’ Characteristics

First, we asked the 97 participants about the general characteristics of their companies, which included location, industry affiliation, number of employees, and position of the participants (Part A.1). The largest group of companies (n = 22) belongs to the manufacturing industry/production of goods. The next largest groups are the provision of economic services (n = 18) and education and training (n = 11). The remaining companies are distributed roughly equally among the other industry sectors. The aggregation of the individual sectors to the secondary sector (industrial production) or tertiary sector (service enterprises in the broader sense) shows that most companies belong to the service sector (n = 65), and the remaining businesses are industrial enterprises (n = 32). According to the indicated number of employees, most of the companies (n = 70) are SMEs (i.e., companies with up to 249 employees). Large companies are in the minority in our sample (n = 27).

4.2 Digital Transformation Within the Companies

Following the general questions regarding company specifics, we asked seven questions with a specific focus on the characteristics of DT (Part A.2). First, participants were asked to evaluate a presented definition of DT (see Theoretical Background). Almost two-thirds of the participants (n = 63) fully agreed with the given
definition. One-third of the respondents (n = 32) agreed at least partially. Reasons for partial agreement with the DT definition vary widely. For example, it was noted that each company must overcome individual challenges in the context of DT, and that the definition can, therefore, only be regarded as a rough guide. Furthermore, participants put into perspective that new business models and strategies at existing companies are not necessary for the success of DT. The companies then assessed to what extent their attitude towards DT has changed due to the COVID-19 crisis. Most companies (n = 58) indicated that their attitude toward DT has not changed because of the current COVID-19 crisis, since they had already perceived DT as an important issue. This indicates that many companies had already addressed DT in their strategies or are currently doing so. One-third of the companies (n = 32) perceived DT as more important than before due to the COVID-19 crisis. In turn, five companies indicated that their attitudes toward DT have not changed because of the COVID-19 crisis, in that DT does not play an important role in their companies. Furthermore, the DT status of the company was of interest; DT was already an integral part of the business strategy in almost half of the companies surveyed (n = 47). In 40% of the companies, there was no overarching corporate strategy for DT, but they had already started or implemented single digitalization projects. Ten companies are currently in the planning phase in digitalization projects, and only one company indicated that it has not yet addressed the issue of DT at all. The companies were then asked about digitalization projects conducted or planned along key business functions (logistics, production, human resources, purchasing, sales, marketing, accounting/controlling, service, other). 
In human resources (n = 61), marketing (n = 54), and accounting/controlling (n = 56) functions, most companies have already conducted digitalization projects. One in three companies, viewed cumulatively across all functions, is currently conducting or has already completed digitalization projects. In the final question of Part A.2, we asked for the DT trend topics the companies have already addressed. The topic that most companies (n = 66) have already addressed or currently focus upon is cloud technologies. While many companies (n = 34) also focus on big data, some also deal with trends like additive manufacturing processes, IoT, cyber-physical systems, and smart factory. The trends a company chooses to address often also depend on the industry sector. For example, smart factory or cyber-physical systems play a role more often in the manufacturing sector and less frequently in service companies. Some companies also listed additional trends, i.e., artificial intelligence (AI), telematics infrastructure, and hybrid commerce.

4.3 Assessment of CSFs for Digitalization Projects

The core of our survey consisted of assessing all identified CSFs (see [18]) in terms of their influence on the success of digitalization projects (Part B). Companies rated their importance using the 5-point Likert scale (1 = No influence; 2 = Little influence; 3 = Medium influence; 4 = High influence; 5 = Very high influence). The respective ranking is shown in Table 3.
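The verbal shares used in Sect. 4.2 (e.g., “almost two-thirds”, “one-third”, “almost half”) can be checked against the 97 usable data sets. The following Python sketch does this for a selection of the reported counts; the dictionary keys are shorthand labels introduced here for illustration, not wording from the questionnaire.

```python
N = 97  # usable data sets (Sect. 3.2)

# Counts as reported in Sect. 4.2; the labels are illustrative shorthand.
reported = {
    "fully agreed with DT definition": 63,
    "partially agreed with DT definition": 32,
    "DT part of business strategy": 47,
    "addressed cloud technologies": 66,
    "focus on big data": 34,
}

# Percentage of the sample, rounded to whole numbers.
shares = {label: round(100 * n / N) for label, n in reported.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```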
The dimension Corporate organization is the largest and comprises eight CSFs. The entire dimension seems to have a high to very high impact (on average, rated with a 4.14), as the participants mostly rated the pertinent CSFs with a four or five:
• About nine out of ten companies rated the CSF of Corporate culture as very important for digitalization projects.
• Most companies rated the CSFs Implementation of a digital mindset and Unified digital corporate strategy/vision as high (n = 46; n = 40) to very high (n = 37; n = 38).
• About eight out of ten companies believe that the CSF Leadership has a high (n = 32) or very high (n = 44) impact for digitalization projects.
• For the CSF Top management support, over 50% of the respondents (n = 52) indicated that this factor has a very high influence in DT project implementation.
• For Change management and Digital talent in leadership positions, the percentage of companies rating the influence as only moderate is higher (n = 20; n = 16) than for the other CSFs in this dimension. However, even for these two CSFs, companies rated their influence as high (n = 40; n = 43) or very high (n = 31; n = 24).
• The final CSF in this dimension, Qualification, is also rated as having a high (n = 49) to very high (n = 32) influence with respect to the success of digitalization projects.

The dimension Technology is the second largest dimension and includes five CSFs. This dimension is also assigned a high to very high influence, as the individual CSFs were predominantly rated as a four or five. On average, companies rated all CSFs in this dimension with 4.11:
• The CSF Data security stands out in having the highest influence on project success in DT: 23 participants perceive a high influence on digitalization projects. Two thirds of the companies (n = 64) stated that the influence of this factor is very high.
• For the two CSFs Software and Unified database in an overall system, the influence on the successful implementation of digitalization projects is mainly rated as high (n = 40; n = 39) to very high (n = 50; n = 44).
• The assessment was not so clear-cut for the last two CSFs Data collection/Big data analysis and Hardware. In both cases, participants indicated a high (n = 34; n = 41) or very high impact (n = 18; n = 22). Compared to the other three CSFs in this dimension, respondents also indicated that these two CSFs each had a rather medium influence (n = 24 and n = 21). In addition, about one in ten of the respondents (n = 11 and n = 12) rated the influence as low in each case.

The dimension Customer covers the two CSFs Customer centric management model and Omni-channel-management. On average, respondents in this dimension rated the impact of the CSFs on project success in DT only with a 3.54: About half of the respondents each rated the influence of the two CSFs as high to very high. Just under one-fifth of the companies (n = 18; n = 21) rated the influence as medium. Compared to the CSFs of the first two dimensions considered so far, some participants stated that these CSFs have no influence on the success of digitalization projects. This may be due, for example, to
the fact that these two CSFs are somewhat more specific for individual industry sectors and many respondents may not be able to assess this for their company.

Table 3. Ranking of CSFs from questionnaire part B (ranking based on average score using the 5-point Likert scale)

| Critical Success Factor | Rank | Critical Success Factor | Rank |
|---|---|---|---|
| Data security | 1 | Cross-functional development teams | 14 |
| Software | 2 | Hardware | 15 |
| Top management support | 3 | Customer centric management model | 16 |
| Unified database in an overall system | 4 | Long-term implementation through short intensive sprints | 17 |
| Corporate culture | 5 | Scalability | 18 |
| Implementation of a digital mindset | 6 | Network effects through open systems/partnerships | 19 |
| Unified digital corporate strategy / Vision | 7 | Lean thinking/OpEx | 20 |
| Leadership | 8 | Data collection/Big data analysis | 21 |
| Qualification | 9 | Omni-channel-management | 22 |
| Resources | 10 | Servitization | 23 |
| Change management | 11 | Fast prototyping | 24 |
| Networking of the entire value network | 12 | Implementation of new KPIs | 25 |
| Digital talent in leadership positions | 13 | | |

The following three CSFs belong to the dimension Project management: Network effects through open systems/partnerships; Long-term implementation through short intensive sprints; and Resources. On average, this dimension is rated with a 3.74. This is slightly above the score for the dimension Customer (3.54) but below the dominant ones (Corporate Organization: 4.14; Technology: 4.11). All three CSFs of this dimension were assigned a high influence on the success of digitalization projects by more than 40% of the companies (n = 40, n = 45 and n = 43). Almost one-third of respondents (n = 31) even rated the influence of Resources as very high. In contrast, for the other two CSFs, a quarter of the companies (n = 24 and n = 26) think that the influence on project success in DT is rather moderate. Furthermore, 15% of the participants (n = 12 and n = 13) believe that the CSF Network effects through open systems/partnerships has little to no influence on the success of the digitalization projects.

The dimension Value creation consists of four CSFs. On average, there is a rating of 3.59 in this dimension. The influence of the two CSFs Networking of the entire value network and Cross-functional development teams was rated higher than for the other two CSFs. More than 40% of the companies (n = 39; n = 43) indicated that the two CSFs mentioned had a high influence on project success in DT, and several companies (n = 23; n = 19) even rated this as very high. However, about 22% of the participants (n = 22 and
n = 21) are of the opinion that the two CSFs have only a medium influence on success. Regarding the other two CSFs, Implementation of new KPIs and Lean thinking/OpEx, most companies (n = 35; n = 34) stated that the influence here is neutral. The final dimension Value proposition includes the three CSFs Servitization, Fast prototyping, and Scalability. On average, participants rated this dimension the lowest of all dimensions with a 3.38. When evaluating the CSF Servitization, 27 of the respondents stated that its influence on the success of the digitalization projects is medium, 21 of the respondents estimated it to be high, and 10 companies very high. For Fast prototyping and Scalability, approximately 30% of respondents (n = 29; n = 30) believe that their influence on project success is high. The percentage of respondents who find their influence to be neutral is one-fifth (n = 20) and one-quarter (n = 23), respectively.
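The six dimension averages reported above can be put side by side with a few lines of code. The following is only an illustrative sketch: the six means are taken from the text, whereas the helper `likert_mean` and the response counts passed to it in the last line are hypothetical and serve only to show how an average score on a 5-point Likert scale is formed.

```python
# Average ratings per dimension as reported in the text (5-point Likert scale).
dimension_means = {
    "Corporate organization": 4.14,
    "Technology": 4.11,
    "Customer": 3.54,
    "Project management": 3.74,
    "Value creation": 3.59,
    "Value proposition": 3.38,
}


def rank_dimensions(means):
    """Return (rank, dimension, mean) tuples sorted by descending mean rating."""
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, mean)
            for rank, (name, mean) in enumerate(ordered, start=1)]


def likert_mean(counts):
    """Mean rating from a mapping {rating: number of respondents} (hypothetical helper)."""
    total = sum(counts.values())
    return sum(rating * n for rating, n in counts.items()) / total


ranking = rank_dimensions(dimension_means)
for rank, name, mean in ranking:
    print(f"{rank}. {name}: {mean:.2f}")

# Made-up counts, NOT survey data: one "3", one "4", and two "5" ratings.
example_mean = likert_mean({3: 1, 4: 1, 5: 2})
```

Sorting by the mean reproduces the order discussed in the text: the two dominant dimensions first, Value proposition last.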
5 Discussion

Regardless of the size of the company, DT projects are complex and extensive undertakings that sometimes intervene strongly in the company's processes and daily business. A structured approach to implementing digitalization projects can therefore prove highly useful, and companies can use CSFs as a guide for the specific implementation of such projects.

According to the assessments of the companies surveyed, organizational factors play a decisive role in DT. For example, Corporate culture was rated as an important CSF for digitalization projects. As DT gives rise to new business models, companies must adapt and improve their business processes for these new circumstances. Because these changes often collide with the existing corporate culture, it is particularly important that companies invest time and resources in creating a digital corporate culture. The associated, necessary changes should be openly communicated and, above all, implemented together with the employees (internal co-creation). The newly created corporate culture should have flatter hierarchies and be data-based, data-driven, agile, risk-aware, and creative [30].

However, Top management support (among the Top 3 CSFs, see Table 3) is also necessary for successful implementation. Without a certain level of commitment and project understanding, most digitalization projects will fail. Since top management is responsible for the digital transformation strategy, it must also be actively involved in the digital transformation process. Managers should define appropriate goals for DT and harmonize them with the other corporate goals.

Regarding DT, the future positioning of a company must be anchored in a Unified digital corporate strategy/vision. For a successful implementation of digitalization projects, companies must define corresponding goals, determine expected developments, and derive the resulting measures. Because corporate strategy and corporate culture depend strongly on each other, it is important that both are aligned [14, 53]. A particularly helpful way to align them is to implement a digital mindset in the company.

Setting the right course at the organizational and strategic level is one thing; the effective and efficient implementation of concrete DT projects is another. Since digitalization projects, being predominantly innovation projects (new-to-the-firm or even new-to-the-market), can have different focuses, not all factors are equally relevant for every company or every project. With this in mind, we set the identified success
factors in relation to the different dimensions of DT. This resulted in a comprehensive model as a starting point for digitalization projects (see Fig. 1). The ranks of the Top 10 CSFs shown in this figure refer to Table 3.

When the survey results are integrated into our model (see Fig. 1), the discussion above is confirmed: Most of the Top 10 CSFs are assigned to the dimension Digital corporate organization. The other Top 10 CSFs (except for the CSF Resources) belong to the dimension Digital technology. This can be interpreted to mean that these two dimensions are particularly fundamental, flanking areas and are, therefore, rated as the most important. The Digital corporate organization permeates the other dimensions of DT and influences all transformation tasks. The technological basis, in turn, is a component of all developments in the other dimensions.

Differences emerge in the results when the data provided by SMEs (n = 70) and large companies (n = 27) are considered separately:
• Top 3 CSFs of SMEs: Data security; Software; and Top management support.
• Top 3 CSFs of large enterprises: Data security; Corporate culture; and Implementation of a digital mindset.

As in the Top 10 CSFs across all company sizes (see Table 3), both SMEs and large companies consider the dimensions Digital corporate organization and Digital technology to be of particular importance. For SMEs, however, technological CSFs rank even higher, whereas large companies give more importance to organizational CSFs. One possible explanation for this difference is that large companies have more resources (human and financial) to create the technological basis for DT and have already done so to a much greater extent than SMEs.
Fig. 1. Model of digital transformation – integration of dimensions and critical success factors
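The statement that the Top 10 CSFs concentrate in two dimensions can be checked by tallying them. The sketch below is illustrative only: the rank order follows Table 3 and the dimension assignment follows the results section, while the variable names are our own. The dimension labels use the names from the model (Digital corporate organization, Digital technology), which we assume correspond to the dimensions Corporate organization and Technology of the results section.

```python
from collections import Counter

# Top 10 CSFs in rank order (Table 3).
top10 = [
    "Data security",
    "Software",
    "Top management support",
    "Unified database in an overall system",
    "Corporate culture",
    "Implementation of a digital mindset",
    "Unified digital corporate strategy/vision",
    "Leadership",
    "Qualification",
    "Resources",
]

# Dimension of each Top 10 CSF, as assigned in the results section.
dimension_of = {
    "Data security": "Digital technology",
    "Software": "Digital technology",
    "Unified database in an overall system": "Digital technology",
    "Top management support": "Digital corporate organization",
    "Corporate culture": "Digital corporate organization",
    "Implementation of a digital mindset": "Digital corporate organization",
    "Unified digital corporate strategy/vision": "Digital corporate organization",
    "Leadership": "Digital corporate organization",
    "Qualification": "Digital corporate organization",
    "Resources": "Project management",
}

top10_by_dimension = Counter(dimension_of[csf] for csf in top10)
for dimension, count in top10_by_dimension.most_common():
    print(f"{dimension}: {count}")
```

The tally gives six organizational and three technological CSFs, with Resources as the only Top 10 factor from another dimension.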
The distinction between SMEs and large enterprises can shed new light on CSFs, since these must be interpreted against the background of company specifics in structure and processes (e.g., flatter structures, familiarity, scarcity of resources in SMEs). Since organizations are social systems that consist of complex interactions between individuals and groups, Corporate culture (as a CSF of the dimension Digital corporate organization) is, therefore, incorporated. Groups share convictions and attitudes, which influence perception, reactions to change, and, consequently, the occurrence of resistance and barriers [54]. Corporate culture thus influences perceptions, attitudes, and behaviors [55], which, in turn, influence the success of digitalization projects [56].
6 Conclusion

The results of our study make a significant contribution to the CSF research focusing on digitalization projects. However, the need for a more detailed and diversified view of different CSFs becomes even more evident when considering the potential impact of the COVID-19 crisis on DT. The crisis brought DT into sharper focus, especially for companies that had not previously addressed DT in such detail. This is also illustrated by the answers to the question on the influence of the COVID-19 crisis: Even though nearly 60% of the companies had seen DT as important before the pandemic, an additional 33% now see DT as more important than before. This shows the importance of focusing strongly on DT in research and of deriving concrete, practice-oriented recommendations for action and assistance for companies in shaping DT.

Discussing CSFs within the context of the COVID-19 crisis implies different questions on short-term and long-term time horizons, e.g., what changes were organizations able to implement ad hoc, what lessons have been learned, and which changes will remain after the COVID-19 crisis? At the interface of digitalization projects, the call for new work imperatives has emerged. However, since the advancement/adaptation strategies of large companies are often clearer than the respective coping mechanisms of SMEs, we are currently working on a study that focuses on CSFs for improved data management and data analysis within SMEs (as an exemplary digitalization project) in times of the COVID-19 crisis.

Future research activities in this area can build on the insights gained from our study. Within our research project, we plan to investigate individual CSFs (especially the Top 10 factors) in more detail to derive recommendations for action for the best possible implementation of the CSFs in the company. To this end, we will apply a qualitative approach by conducting several in-depth interviews with different enterprises. Furthermore, we want to set up a larger quantitative study (with a claim to representativeness) with specific viewpoints on individual industry sectors and with a more specific consideration of company sizes to further specify the importance of CSFs for digitalization projects in this regard. Another starting point for future research could be to analyze CSFs with reference to the different types of digitalization projects, such as logistics or human resources, to highlight any differences. Furthermore, it should be investigated what makes the implementation of individual CSFs in companies more difficult and how these obstacles can be minimized.
A few limitations of our study must be mentioned as well. For our literature review (see [16]), we are aware that we cannot be certain that we have identified all relevant papers published in journals and conferences. Therefore, journals that are not included in our selected databases and the proceedings from other conferences might also provide relevant articles. In addition, since DT is rapidly evolving, it is necessary to update the results of the literature review every few years to possibly identify new emerging CSFs that must be included in further studies. Due to the small sample of participating enterprises, the obtained results possess limited statistical generalizability. However, the applied method allowed us to identify important details and obtain insights into the enterprises’ views on CSFs for digitalization projects and their importance, which was the main focus of this step in our long-term research project. Another limitation is that the participants’ origins are limited to German enterprises, which implies that the study reflects the situation in only one country. German specifics could have influenced the results. Furthermore, our study pertains mostly to small and medium-sized companies. Therefore, including a higher number of larger companies would widen the results of our investigation.
References

1. Leyh, C., Schäffer, T., Bley, K., Forstenhäusler, S.: Assessing the IT and software landscapes of industry 4.0-enterprises: the maturity model SIMMI 4.0. In: Ziemba, E. (ed.) AITM/ISM-2016. LNBIP, vol. 277, pp. 103–119. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53076-5_6
2. Pagani, M.: Digital business strategy and value creation: framing the dynamic cycle of control points. MIS Q. 37(2), 617–632 (2013). https://doi.org/10.25300/MISQ/2013/37.2.13
3. Leyh, C., Bley, K., Ott, M.: Chancen und risiken der digitalisierung – befragungen ausgewählter KMU. In: Hofmann, J. (ed.) Arbeit 4.0 – Digitalisierung, IT und Arbeit. EH, pp. 29–51. Springer, Wiesbaden (2018). https://doi.org/10.1007/978-3-658-21359-6_3
4. Mathrani, S., Mathrani, A., Viehland, D.: Using enterprise systems to realize digital business strategies. J. Enterp. Inf. Manag. 26(4), 363–386 (2013). https://doi.org/10.1108/JEIM-01-2012-0003
5. Bley, K., Leyh, C., Schäffer, T.: Digitization of German Enterprises in the Production Sector – Do they know how “digitized” they are? In: Proceedings of the 22nd Americas Conference on Information Systems (AMCIS 2016) (2016)
6. Helmenstein, C., Zalesak, M., El-Rayes, J., Krabb, P.: Raise the Curve: Mit Digitalisierung zu mehr Resilienz und Wachstum. Accenture and Industriellenvereinigung (2020)
7. Berg, A.: Digitalisierung der Wirtschaft – Auswirkungen der Corona-Pandemie. bitkom, Berlin (2020)
8. Streibich, K.-H., Winter, J.: Resiliente Vorreiter aus Wirtschaft und Gesellschaft. acatech – Deutsche Akademie der Technikwissenschaften, Munich (2020)
9. Jones, A., Robinson, J., O’Toole, B., Webb, D.: Implementing a bespoke supply chain management system to deliver tangible benefits. Int. J. Adv. Manuf. Technol. 30(9–10), 927–937 (2006). https://doi.org/10.1007/s00170-005-0065-2
10. Hentschel, R., Leyh, C., Baumhauer, T.: Critical success factors for the implementation and adoption of cloud services in SMEs. In: Proceedings of the 52nd Hawaii International Conference on System Sciences (HICSS 2019) (2019)
11. Achanga, P., Shehab, E., Roy, R., Nelder, G.: Critical success factors for lean implementation within SMEs. J. Manuf. Technol. Manag. 17(4), 460–471 (2006). https://doi.org/10.1108/17410380610662889
12. Leyh, C., Thomschke, J.: Critical success factors for implementing supply chain management systems – the perspective of selected German enterprises. In: Proceedings of the 2015 Federated Conference on Computer Science and Information Systems (FedCSIS 2015), pp. 1403–1413 (2015). https://doi.org/10.15439/2015F245
13. Denolf, J.M., Trienekens, J.H., Wognum, P.M., van der Vorst, J.G.A.J., Omta, S.W.F.: Towards a framework of critical success factors for implementing supply chain information systems. Comput. Ind. 68, 16–26 (2015). https://doi.org/10.1016/j.compind.2014.12.012
14. Holotiuk, F., Beimborn, D.: Critical success factors of digital business strategy. In: Wirtschaftsinformatik Proceedings 2017 (WI 2017) (2017)
15. Leyh, C., Crenze, L.: ERP system implementations vs. IT projects: comparison of critical success factors. In: Poels, G. (ed.) Enterprise Information Systems of the Future. LNBIP, vol. 139, pp. 223–233. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36611-6_20
16. Leyh, C., Meischner, N.: Erfolgsfaktoren von Digitalisierungsprojekten - Einflussfaktoren auf Projekte zur Digitalen Transformation von Unternehmen. ERP Management 2/2018, 35–38 (2018). https://doi.org/10.30844/ERP18-2_35-38
17. Leyh, C., Köppel, K., Neuschl, S., Pentrack, M.: Critical success factors for digitalization projects. In: Proceedings of the 16th Conference on Computer Science and Intelligence Systems (FedCSIS 2021) (2021). https://doi.org/10.15439/2021F122
18. Sauer, R., Dopfer, M., Schmeiss, J., Gassmann, O.: Geschäftsmodell als gral der digitalisierung. In: Gassmann, O., Sutter, P. (eds.) Digitale Transformation im Unternehmen gestalten: Geschäftsmodelle, Erfolgsfaktoren, Handlungsanweisungen, Fallstudien, pp. 15–27. Hanser Verlag, Munich (2016)
19. Wallmüller, E.: Praxiswissen Digitale Transformation: Den Wandel verstehen, Lösungen entwickeln, Wertschöpfung steigern. Hanser Verlag, Munich (2017). https://doi.org/10.3139/9783446452732
20. Barghop, D., Deekeling, E., Schweer, D.: Herausforderung Disruption: Konsequenzen und Erfolgsfaktoren für die Kommunikation. In: Deekeling, E., Barghop, D. (eds.) Kommunikation in der digitalen Transformation, pp. 5–19. Springer, Wiesbaden (2017). https://doi.org/10.1007/978-3-658-17630-3_2
21. Falkenreck, C.: Digitalisierungsprojekte erfolgreich planen und steuern: Kunden und Mitarbeiter für die digitale Transformation begeistern. Springer, Wiesbaden (2019). https://doi.org/10.1007/978-3-658-24890-1
22. Rockart, J.F.: Chief executives define their own data needs. Harv. Bus. Rev. 57(2), 81–93 (1979)
23. Nicolai, A., Kieser, A.: Trotz eklatanter Erfolgslosigkeit: Die Erfolgsfaktorenforschung weiter auf Erfolgskurs. Die Betriebswirtschaft 62(6), 579–596 (2002)
24. Hofer, C.W., Schendel, D.: Strategy Formulation: Analytical Concepts. West Publishing, St. Paul/Minnesota (1978)
25. Leidecker, J.K., Bruno, A.V.: Identifying and using critical success factors. Long Range Plan. 17(1), 23–32 (1984). https://doi.org/10.1016/0024-6301(84)90163-8
26. Himmler, F., Amberg, M.: Die digitale fabrik – eine literaturanalyse. In: Wirtschaftsinformatik Proceedings 2013 (WI 2013) (2013)
27. Boersma, T.: Erfolgsfaktoren der digitalen Transformation. In: Heinemann, G., Gehrckens, H.M., Wolters, U.J. (eds.) Digitale Transformation oder digitale Disruption im Handel, pp. 509–528. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-13504-1_24
28. Hess, T., Matt, C., Benlian, A., Wiesböck, F.: Options for formulating a digital transformation strategy. MIS Q. Exec. 15(2), 123–139 (2016)
29. Kreutzer, R.T., Neugebauer, T., Pattloch, A.: Digital Business Leadership. Springer, Wiesbaden (2017). https://doi.org/10.1007/978-3-658-11914-0
30. Wokurka, G., Banschbach, Y., Houlder, D., Jolly, R.: Digital Culture: Why Strategy and Culture Should Eat Breakfast Together. In: Oswald, G., Kleinemeier, M. (eds.) Shaping the Digital Enterprise, pp. 109–120. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-40967-2_5
31. Dremel, C., Herterich, M.M., Wulf, J., Waizmann, J.-C., Brenner, W.: How AUDI AG Established Big Data Analytics in its Digital Transformation. MIS Q. Exec. 16(2), 81–100 (2017)
32. Pinkwart, A.: Change Management in Zeiten digitalen Wandels. In: Bruhn, M., Kirchgeorg, M. (eds.) Marketing Weiterdenken, pp. 349–363. Springer, Wiesbaden (2018). https://doi.org/10.1007/978-3-658-18538-1_27
33. Kowalkiewicz, M., Safrudin, N., Schulze, B.: The Business Consequences of a Digitally Transformed Economy. In: Oswald, G., Kleinemeier, M. (eds.) Shaping the Digital Enterprise, pp. 29–67. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-40967-2_2
34. Creusen, U., Gall, B., Hackl, O.: Digital Leadership. Springer, Wiesbaden (2017). https://doi.org/10.1007/978-3-658-17812-3
35. Abolhassan, F.: Digitalisierung als Ziel – Cloud als Motor. In: Abolhassan, F. (ed.) Was treibt die Digitalisierung?, pp. 15–26. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-10640-9_1
36. Strecker, F., Kellermann, J.: Die Cloud in der Praxis. In: Abolhassan, F. (ed.) Was treibt die Digitalisierung?, pp. 75–89. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-10640-9_6
37. Ryan, M., et al.: Digitale Führungsintelligenz in der Praxis. In: Summa, L. (ed.) Digitale Führungsintelligenz: „Adapt to win“, pp. 171–412. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-10802-1_4
38. Xu, J.: Managing Digital Enterprise. Atlantis Press, Paris (2014). https://doi.org/10.2991/978-94-6239-094-2
39. Große Holtforth, D., Geibel, R.C., Kracht, R.: Schlüsselfaktoren im E-Commerce: Innovationen, Skaleneffekte, Datenorientierung und Kundenzentrierung. Springer, Wiesbaden (2020). https://doi.org/10.1007/978-3-658-31959-5
40. Berman, S.J.: Digital transformation: opportunities to create new business models. Strategy Leadersh. 40(2), 16–24 (2012). https://doi.org/10.1108/10878571211209314
41. Kreutzer, R.T., Land, K.-H.: Digitaler Darwinismus. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-11306-3
42. Schütt, P.: Der Weg zum Digitalen Unternehmen. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-44707-9
43. Bharadwaj, A., El Sawy, O.A., Pavlou, P.A., Venkatraman, N.: Digital business strategy: toward a next generation of insights. MIS Q. 37(2), 471–482 (2013). https://doi.org/10.25300/MISQ/2013/37:2.3
44. Islam, N., Buxmann, P., Eling, N.: Why should incumbent firms jump on the start-up bandwagon in the digital era? - a qualitative study. In: Wirtschaftsinformatik Proceedings 2017 (WI 2017) (2017)
45. Summa, L.: (Un) Bequeme Denkimpulse für Veränderung zugunsten einer digitalen Welt. In: Summa, L. (ed.) Digitale Führungsintelligenz: „Adapt to win“, pp. 13–150. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-10802-1_2
46. Farahani, P., Meier, C., Wilke, J.: Digital supply chain management agenda for the automotive supplier industry. In: Oswald, G., Kleinemeier, M. (eds.) Shaping the Digital Enterprise, pp. 157–172. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-40967-2_8
47. Tomczak, T., Vogt, D., Frischeisen, J.: Wie Konsumenten Innovationen wahrnehmen Neuartigkeit und Sinnhaftigkeit als zentrale Determinanten. In: Hoffmann, C.P., Lennerts, S., Schmitz, C., Stölzle, W., Uebernickel, F. (eds.) Business Innovation: Das St. Galler Modell. BIUSG, pp. 187–209. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-07167-7_12
48. Kieviet, A.: Digitalisierung der Wertschöpfung: Auswirkung auf das Lean Management. In: Künzel, H. (ed.) Erfolgsfaktor Lean Management 2.0. ES, pp. 41–59. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49752-4_3
49. Weinreich, U.: Lean Digitization. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-50502-1
50. Ross, J.W., Sebastian, I., Beath, C., Mocker, M., Moloney, K., Fonstad, N.: Designing and executing digital strategies. In: Proceedings of the 2016 International Conference on Information Systems (ICIS 2016) (2016)
51. Coreynen, W., Matthyssens, P., Van Bockhaven, W.: Boosting servitization through digitization: Pathways and dynamic resource configurations for manufacturers. Ind. Mark. Manage. 60, 42–53 (2017). https://doi.org/10.1016/j.indmarman.2016.04.012
52. Châlons, C., Dufft, N.: Die rolle der it als enabler für digitalisierung. In: Abolhassan, F. (ed.) Was treibt die Digitalisierung?, pp. 27–37. Springer, Wiesbaden (2016). https://doi.org/10.1007/978-3-658-10640-9_2
53. vom Brocke, J., Fay, M., Schmiedel, T., Petry, M., Krause, F., Teinzer, T.: A journey of digital innovation and transformation: the case of Hilti. In: Oswald, G., Kleinemeier, M. (eds.) Shaping the Digital Enterprise, pp. 237–251. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-40967-2_12
54. Weiner, B.J.: A theory of organizational readiness for change. Implementation Sci. 4(1), 1–9 (2009). https://doi.org/10.1186/1748-5908-4-67
55. Hu, Q., Dinev, T., Hart, P., Cooke, D.: Managing employee compliance with information security policies: the critical role of top management and organizational culture. Decis. Sci. 43(4), 615–660 (2012). https://doi.org/10.1111/j.1540-5915.2012.00361.x
56. Ries, S.: Veränderungen in kleinen und mittelständischen Unternehmen: Innerbetrieblichen Widerstand überwinden, Unterstützung von Veränderung herbeiführen und organisationale Veränderungsbereitschaft leben. Fraunhofer Center for International Management and Knowledge Economy IMW, Leipzig (2021)
Analysis of Concurrent Processes in Internet of Things Solutions

Janis Bicevskis1(✉), Girts Karnitis1, Zane Bicevska2, and Ivo Oditis2

1 Department of Computing, University of Latvia, Raina Boulevard 19, Riga 1586, Latvia
{Janis.Bicevskis,Girts.Karnitis}@lu.lv
2 DIVI Grupa Ltd., Avotu Street 40a-33, Riga 1009, Latvia
{Zane.Bicevska,Ivo.Oditis}@di.lv
Abstract. The rapid development of ICT has enabled services that did not exist before. Such services include shared vehicles like e-scooters, which offer quick and simple mobility. This paper proposes a method for analyzing e-scooter and client collaboration in order to detect the collaboration risks that arise when e-scooters are used concurrently by many clients. The method requires an exact description of the e-scooter and customer cooperation processes, from which all executable scenarios are derived through symbolic execution. Analyzing the results of scenario execution identifies the e-scooter and client collaboration risks. The obtained result allows e-scooter system developers to prevent or mitigate the identified risks, while customers can use the new transport service effectively, aware of the potential risks.

Keywords: Concurrent processes · Risk analysis · Internet of things
© Springer Nature Switzerland AG 2022
E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 26–41, 2022. https://doi.org/10.1007/978-3-030-98997-2_2

1 Introduction

The rapid development of technology has created opportunities for services that the public could only guess at in the past. First, the rapid spread of Internet shops should be mentioned: they serve many customers, require no visit to a traditional store, and allow choosing, ordering, and paying for an item entirely remotely. Many customers may be served concurrently without any individual customer noticing it.

However, new services that raise society's quality of life also hide new risks. Purchase/sales risks in Internet shops are analyzed in [1]. The main identified risks are that the buyer is afraid to pay for goods he has not even seen, while the seller is afraid to send the goods before receiving payment. If several customers are served at the same time (concurrently), the processes can interfere with each other; for example, a product selected by one customer but not yet paid for may be sold to another customer.

The method of analyzing the risks of Internet shops is universal enough to be applied to assessing the risks of using shared e-scooters. This new form of public transport has gained widespread popularity among young people in large cities with developed street infrastructure. It offers a quick and easy way to move: a trip starts from an e-scooter location, continues along a customer-selected route, and ends at a place of the customer's choice. The new mode of transport is offered to a wide range of customers, i.e., many customers are served concurrently.

The proposed method analyzes e-scooter and client collaboration in several steps. The first step creates a detailed description of the e-scooter and client collaboration process. Two types of client processes are offered: with and without e-scooter reservation. The process with reservation enables the customer to choose one of the available e-scooters on the city map and reserve it. The customer can then go to the e-scooter's location, certain that no other customer will take the same e-scooter in the meantime. Of course, the reservation service requires certain financial resources. The process without reservation enables the customer to receive information on the available e-scooters on the city map. The customer can go to an e-scooter location knowing that another customer may take the same e-scooter in the meantime. If that happens, the client can choose another e-scooter and repeat the process. When many e-scooters are available, the client risks little by not using the reservation process. Many customers can be served concurrently, and each customer can use the process with or without reservation.

A deeper analysis of the risks of e-scooter and client collaboration in the following sections shows that there are seven e-scooter and client process collaboration risks. These risks are caused by the selected e-scooter and client cooperation model, not by the technical characteristics of the e-scooter. The main goal of the customer is to reach the destination; he is not interested in how the e-scooter functions.

The most labor-intensive part of the method is building a feasibility tree of executable scenarios. Each tree branch represents one feasible e-scooter and client collaboration scenario. The result of the scenario is calculated symbolically.
Analysis of all scenarios included in the feasibility tree allows the e-scooter and client collaboration risks to be identified.

The research is an extension of the previous studies published in [1–3]. First, it shows the wide range of applications of the method, this time in the domain of e-scooter and client collaboration analysis. Second, the method is applied and extended to the Internet of Things, described by a finite state machine, and used for analyzing collaboration between informally described business processes. It should be noted that the work differs from the previous studies by more than 60%.

The paper is structured as follows: Sect. 2 forms the theoretical background of the research, Sect. 3 describes the methodology of e-scooter and client collaboration analysis, Sect. 4 presents an analysis of the proposed solution by applying it to a real system (e-scooter and customer collaboration), Sect. 5 discusses the study's contribution, implications, and limitations, and Sect. 6 concludes.
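The idea of enumerating all scenarios of two concurrent client processes, as outlined in the introduction, can be illustrated with a deliberately small model. The sketch below is not the authors' method: it replaces symbolic execution with a plain enumeration of interleavings over concrete states, and it assumes a single shared e-scooter, two clients A and B, and three steps per client. The step names (`select`, `reserve`, `walk`, `unlock`) and the outcome labels are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one scenario on a single shared e-scooter; return each client's outcome."""
    rider = None        # client who has unlocked the e-scooter
    reserved_by = None  # client holding a reservation, if any
    outcome = {}
    for client, action in schedule:
        if client in outcome:
            continue  # this client's process has already ended
        free = rider is None and reserved_by is None
        if action == "select" and not free:
            outcome[client] = "no e-scooter offered"
        elif action == "reserve":
            if free:
                reserved_by = client
            else:
                outcome[client] = "no e-scooter offered"
        elif action == "unlock":
            if rider is None and reserved_by in (None, client):
                rider = client
                outcome[client] = "riding"
            else:
                outcome[client] = "walked in vain"
        # "walk" changes nothing in this model; it only takes time
    return outcome


def risky_scenarios(first_step):
    """Return all schedules in which some client walks to the e-scooter in vain."""
    steps = [first_step, "walk", "unlock"]
    a = [("A", step) for step in steps]
    b = [("B", step) for step in steps]
    return [s for s in interleavings(a, b)
            if "walked in vain" in run(s).values()]


without_reservation = risky_scenarios("select")
with_reservation = risky_scenarios("reserve")
print(len(without_reservation), "risky scenarios without reservation")
print(len(with_reservation), "risky scenarios with reservation")
```

In this toy model, a client can only walk in vain when both clients select the e-scooter before either unlocks it; with reservation, the second client is turned away at selection time and never sets out.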
2 Theoretical Background

2.1 Concurrent Processes Execution Correctness Insolvability

Theoretical studies of the problem of identifying incorrect execution of concurrent processes have been published in [2]. First, it should be noted that identifying incorrect execution of concurrent processes is an algorithmically unsolvable problem. It follows from the theory of algorithms that the Turing machine halting problem cannot be resolved algorithmically. If the process description language is rich enough to simulate the functioning of a Turing machine (a Turing-complete language), the halting problem can be reduced to the problem of identifying incorrect execution of concurrent processes. Languages with two bidirectional counters and loops are Turing-complete. Hence, identifying incorrect execution of concurrent processes is algorithmically unsolvable if a language with two bidirectional counters and loops is used to describe the business processes. A similar problem, constructing complete sets of tests, was addressed by Barzdins and co-authors [4].

2.2 Process Description Language CPL-1

The study [2] offers a simple process description language, CPL-1 (Concurrent Programming Language), with a built-in transaction processing mechanism, and an algorithm that determines whether incorrect concurrent execution of any two processes described in CPL-1 is possible. The algorithm contains two steps: (1) for any process execution scenario, determine by symbolic execution whether the scenario is feasible, and (2) assess whether the scenario is permissible/correct for the business process being analyzed. The algorithm used for analyzing processes described in CPL-1 has been taken as the basis for the analysis of e-scooter and client collaboration.

In addition to theoretical studies, the paper [2] also analyzes bank payment processes with and without reservation of the transfer amount. The study shows that the concurrent execution of two payments without reservation can produce an incorrect result if the balance is read and written in different transactions. An incorrect result is not possible if two payments run concurrently using processes with reservation of the transfer amount. This idea is also used for the analysis of e-scooter and customer collaboration in the following sections.
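The difference between the two payment processes can be made concrete with a small simulation. The following sketch is our own illustration, not CPL-1 and not the analysis from [2]: it assumes two payments A and B of 60 each against a starting balance of 100, models each payment as two steps, and takes "correct" to mean that the final balance equals the starting balance minus the amount actually paid out. All names and numbers are hypothetical.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every order of n_a steps of payment A and n_b steps of payment B."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        yield "".join("A" if i in a_slots else "B" for i in range(total))


def run_without_reservation(order, start=100, amount=60):
    """Step 1 reads the balance; step 2 writes it back in a separate transaction."""
    balance, paid = start, 0
    seen = {}  # balance value each payment read in its first transaction
    for who in order:
        if who not in seen:
            seen[who] = balance            # transaction 1: read
        elif seen[who] >= amount:          # transaction 2: write based on the old read
            balance = seen[who] - amount
            paid += amount
    return balance, paid


def run_with_reservation(order, start=100, amount=60):
    """Step 1 checks and deducts atomically (reservation); step 2 transfers."""
    balance, paid = start, 0
    reserved = {}
    for who in order:
        if who not in reserved:
            ok = balance >= amount
            if ok:
                balance -= amount          # reservation in a single transaction
            reserved[who] = ok
        elif reserved[who]:
            paid += amount                 # transfer of the reserved amount
    return balance, paid


def incorrect_orders(run, start=100):
    """Return the interleavings whose final balance does not match the money paid."""
    bad = []
    for order in interleavings(2, 2):
        balance, paid = run(order)
        if balance != start - paid:
            bad.append(order)
    return bad


bad_without = incorrect_orders(run_without_reservation)
bad_with = incorrect_orders(run_with_reservation)
print("incorrect without reservation:", bad_without)
print("incorrect with reservation:", bad_with)
```

Under these assumptions, every interleaving in which both payments read the balance before either writes it pays out more than was deducted, whereas the reservation variant stays consistent in all orders.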
2.3 Other Relevant Studies

The problem of identifying risks in computing systems has been known since the 1970s [5], and research continues [6] to this day [7]. Methods for discovering potential deadlocks are the most sought after (a deadlock is a situation in which one process waits for another process to finish, while that process in turn waits for the first one to finish). A computing system that cannot overcome this situation stops working. This can cause serious problems, especially in telecommunications systems. Deadlock can be classified as a technological risk caused by inadequate design and programming solutions; it does not relate to financial or organizational aspects. Specification languages such as the Specification and Description Language SDL [8] are used to detect deadlock situations; they make it possible to define the functioning of a system precisely at a high level of abstraction and to analyze its behavior. This reduces the possibility of errors when specifications are implemented as programs. Many works in testing theory are also devoted to the discovery of deadlocks; they try to generate a large number of test cases in the hope of detecting risky situations in the system. However, it should be acknowledged that no universal solution to the deadlock problem has been found so far.
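The circular wait in this definition can be illustrated with a wait-for graph, a standard representation that is not taken from the cited works. In the sketch below the dictionary `waits_for` and the process names are hypothetical; a depth-first search reports the first cycle it finds.

```python
def find_deadlock(waits_for):
    """Return a list of processes that wait on each other in a cycle, or None.

    waits_for maps a process name to the names of the processes it waits for.
    """
    in_progress, done, path = set(), set(), []

    def visit(p):
        in_progress.add(p)
        path.append(p)
        for q in waits_for.get(p, ()):
            if q in in_progress:
                # q is already on the current path, so the path from q onwards is a cycle
                return path[path.index(q):]
            if q not in done:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(p)
        done.add(p)
        return None

    for p in waits_for:
        if p not in done:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


print(find_deadlock({"A": ["B"], "B": ["A"]}))
print(find_deadlock({"A": ["B"], "B": []}))
```

The first call describes exactly the situation in the text (A waits for B, B waits for A); the second has no circular wait.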
Deadlock situations are rare in business processes and, as pointed out in [9], financial and organizational risks are mostly of interest. Forsythe et al. [10] identified six types of perceived risk that may have a negative impact on the experience of buyers: financial, product performance, social, psychological, physical, and time/convenience loss. Respondents found financial risk to be the most important and significant. Financial risks were primarily associated with potential losses of money due to fraudulent misuse of credit card information. Nowadays, the paradigm of financial risk has changed [11, 12]. The risks related to online credit card usage have been thoroughly discussed, and practical solutions have been implemented in online shopping platforms, including 128-bit RSA encryption, digital certificates, firewalls, etc. [12]. Another finance-related risk is less covered: trust between the customer and the shopping service provider [11–16]. Trust and reputation are now considered the dominant concepts in e-commerce [16]. One study rejects the significance of financial risks [16]. That study, based on a survey of 245 country residents, identifies a “convenience risk” and a “non-delivery risk” (other classifications consider both to be financial risks) as very significant, as well as the “reliability of shipper” and “settling disputes”. This study also focuses on risks of a communication nature and pays less attention to risks of a financial nature. The main risk for e-scooter and client collaboration is assumed to be a situation in which the customer cannot make the necessary trip because of the chosen collaboration scenario rather than the technical features of the e-scooter.
3 Research Methodology

3.1 E-scooter and Client Collaboration Model

Electric scooters are an example of Internet of Things (IoT) devices or, more specifically, of the Internet of Vehicles, and they make it possible to demonstrate the wide applicability of the proposed method. Three processes are included in the analysis of e-scooter and client collaboration: (1) E-scooter process, (2) Client process without reservation, (3) Client process with reservation. E-scooter operation is described by a finite state machine with four states (Fig. 1). Transitions from one state to another are caused by events of the client’s process, and they are embedded in e-scooters as part of the centralized management system functionality. Collaboration between a client process without reservation and e-scooter processes is shown in Fig. 2a. The chart describes the collaboration of one client with an e-scooter and with other clients, who may use collaborative processes with or without reservation. Collaboration between a client process with reservation and e-scooter processes is shown in Fig. 2b. As in the previous case, the chart describes how one client using the process with reservation collaborates with an e-scooter and with other clients, who may use collaborative processes with or without reservation. Further analysis defines the e-scooter and client collaboration situations that are considered risky. The most complicated task is the creation of a feasibility tree that contains all the e-scooter and client collaboration scenarios to be followed. Symbolic execution of scenarios, which calculates the outcome of each scenario, makes it possible to identify potential risks in e-scooter and client collaboration. The next section gives a brief description of the algorithm for the analysis of e-scooter and customer collaboration.

3.2 Algorithm on Risk Analysis

This study considers an algorithm for the analysis of business processes that use a transaction mechanism; the algorithm can detect the possibility of incorrect concurrent execution of processes. The main steps of the universal analysis algorithm are the following:
(1) Create a description of the business process. To analyze a business process, a model of the business process should be created. If the model consists of CPL-1 program code, as in [2], the semantics of each statement are strictly defined. In this paper, the algorithm is applied to an informally defined business process, described as a graphical diagram in which the vertices of the graph represent activities of the business process and the arcs represent the sequence of activities. To perform process analysis, it must be possible to assess the feasibility of scenarios and the outcome of scenarios.
(2) Define business process transactions. Business process activities or sets of activities are defined as a transaction when their execution is delegated to another part of an information system, or when their execution takes time during which access to common resources must not be blocked for other processes. For example, the e-scooter selection process should not be blocked for other remote clients, as the selection may take a long time.
(3) Define incorrect business process execution. This step identifies situations that are not acceptable from a business perspective. If a process execution scenario leads to a situation that does not meet the business requirements, then the definition of the business process probably needs to be revised to avoid incorrect execution results.
Collaboration between e-scooter and client processes without reservation is risky, and therefore another way of collaboration between e-scooter and client processes was introduced: with reservation. In the case of a formal model in CPL-1, concurrent process execution is considered incorrect if the results of concurrent execution differ from all the results of serial process execution. This requirement is borrowed from database theory.
(4) Construct a feasible scenario tree. If an informal model is used, its author selects different process execution scenarios and evaluates their feasibility; the author makes sure that there are input events that cause the selected scenario to be executed. The result of the analysis is represented as a tree in which each branch represents one feasible scenario, and the tree contains all possible different scenarios. In general, depending on the activities of the business process, this can be a difficult goal to achieve. If the model is defined by program code, the widely used method of “white box” analysis is applied: symbolic execution of the program code makes it possible to compile the conditions for the execution of a pre-defined scenario. Solving these conditions yields a test case that should be executed to cover the pre-defined path. This approach has been known since the 1970s, and it is currently applied in the IntelliTest tool [17]. The tool generates test case sets that cover all arcs of a C# program control graph.
(5) Calculate scenario execution results. In the case of an informal model, its author evaluates the expected result of the scenario execution from the business point of view. In the case of a formal model, a symbolic expression is calculated using symbolic execution.
(6) Identify scenarios that lead to risks during business process execution. The risks of e-scooter and client collaboration are identified by the message “no-scooter”. The risks, which will be dealt with in detail in the following chapter, relate only to the client’s use of the e-scooter and show situations where it is not possible for the customer to ride. The e-scooter process could be risky if some e-scooters were excluded from circulation for a long time. As will be shown in the next chapter, the e-scooter process is correct, and risks do not arise, if an e-scooter reservation is cancelled after some critical time tc.
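Step (4) can be illustrated by a toy feasibility check. The sketch below is not a symbolic executor and does not reproduce how IntelliTest works: each scenario is written by hand as the list of branch conditions met along one path, and an exhaustive search over a small input domain stands in for a constraint solver. The inputs `t` (arrival time) and `c` (battery charge), the thresholds `T_C` and `C_C`, and the scenario names are assumptions made for illustration.

```python
from itertools import product

T_C = 10  # assumed critical reservation time
C_C = 20  # assumed critical battery level, in percent

# Each scenario is the list of branch conditions that must hold along its path.
SCENARIOS = {
    "trip starts": [lambda t, c: t <= T_C, lambda t, c: c >= C_C],
    "reservation expired": [lambda t, c: t > T_C],
    "contradictory path": [lambda t, c: t <= T_C, lambda t, c: t > T_C],
}


def find_test_case(conditions, times=range(0, 21), charges=range(0, 101, 10)):
    """Return the first (t, c) satisfying every condition, or None if no such
    input exists within the searched domain."""
    for t, c in product(times, charges):
        if all(cond(t, c) for cond in conditions):
            return (t, c)
    return None


test_cases = {name: find_test_case(conds) for name, conds in SCENARIOS.items()}
print(test_cases)
```

A scenario with a test case is feasible and becomes a branch of the tree; a scenario for which the search returns `None` is infeasible within the searched domain and is dropped.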
4 Research Findings

This chapter offers an algorithm that makes it possible to analyse the risks of concurrent execution of multiple e-scooter processes.

4.1 Analysis of E-scooter Processes

Three processes are considered: (1) e-scooter operation process, (2) client process when e-scooter reservation is not possible, (3) client process with e-scooter reservation. The purpose of the analysis is to identify the risks that could occur in these processes when multiple customers are served concurrently. In line with the methodology outlined above, let us first clarify these three processes:

(1) E-scooter operation process. E-scooter operation is described by a finite state machine with four states: Wait, Run, Reserved and Service, as shown in Fig. 1. An e-scooter is in the state Wait when the vehicle is ready for use and a customer can rent it. Many e-scooters can be deployed in the city, and multiple vehicles can be placed in one location. An e-scooter is in the state Run when it is used by a customer. An e-scooter is in the state Reserved when a client has ordered the e-scooter service and booked a specific vehicle; the e-scooter is then not available to other customers. A vehicle is in the state Service during maintenance, for example, during battery charging. Transitions between the states are caused by events sent to or by clients and e-scooters. The e-scooter is started from the state Wait when the event Start is received. Customer service is completed when the event End is received from the client, and the vehicle returns to the state Wait. A client has the option to book an e-scooter that is in the state Wait by sending the event Reserve to the vehicle; the trip can then be started with the event Start. The service provider sets the critical time tc after which the e-scooter is automatically returned from the state Reserved to the state Wait, thereby allowing it to be used by other customers and preventing unproductive e-scooter downtime. The process also allows the client to cancel the reservation by sending the event Free to the e-scooter. If the e-scooter battery is discharged below the critical level cc, the event c < cc is sent to the vehicle, leading to the state Service. After the service, the vehicle is returned to the state Wait by the event Done. Although an e-scooter works continuously, the process shows start and end positions, which enable and stop the use of the vehicle.
Fig. 1. E-scooter operation process
(2) Client process without e-scooter reservation. The process is described in the chart in Fig. 2a. A client can view a city map with e-scooter locations and states on a smart device. In the simplest case, the customer chooses the nearest vehicle and goes to it to start the trip. The operation of the vehicle is managed by sending events to it, which change its state. At the end of the trip, the customer pays for the service received. If the customer’s account does not have sufficient funds to pay for the trip, the customer can contact a representative of the service provider. Obviously, the process is risky because, while a client is on the way to an e-scooter location, another client can take that vehicle. This situation is caused by the concurrent execution of multiple processes; a later section is devoted to a deeper analysis of it.
Fig. 2. Client processes: (a) - without reservation, (b) - with reservation.
(3) Client process with e-scooter reservation. The process is described in Fig. 2b. As in the case without reservation, the customer can view a city map with e-scooter locations and states on a smart device. The client can book the vehicle before the trip (the state of the vehicle changes to Reserved) and start using the selected e-scooter during the reserved period. The e-scooter control system sends the access code to the client once the client has reached the vehicle. The e-scooter is in the state Run during the trip, and the state changes to Wait after the trip. This is followed by payment for the reservation and the trip; a failure to pay may be resolved with a representative of the service provider. The process is less risky than the simple one because another client cannot take the vehicle while the client is on the way to the e-scooter location. However, the e-scooter service provider requires that the reservation be cancelled automatically if the client has not started using the vehicle by the critical time tc. This is determined by the financial interests of the service provider: a reserved vehicle brings less income than a paid trip. A more detailed analysis of multiple concurrently executed processes with and without reservation follows in the next sections.
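The e-scooter operation process from item (1) and Fig. 1 can be written down as a transition table. The sketch below is a reading of the textual description, not a copy of the figure: the event labels `t>tc` and `c<cc` are plain strings chosen here, and, since the text does not say from which states the event c < cc may fire, allowing it from Wait, Reserved and Run is an assumption.

```python
TRANSITIONS = {
    ("Wait", "Start"): "Run",
    ("Wait", "Reserve"): "Reserved",
    ("Reserved", "Start"): "Run",
    ("Reserved", "Free"): "Wait",      # the client cancels the reservation
    ("Reserved", "t>tc"): "Wait",      # automatic cancellation after the critical time
    ("Run", "End"): "Wait",
    ("Service", "Done"): "Wait",
    # Assumption: a low battery can interrupt any operational state.
    ("Wait", "c<cc"): "Service",
    ("Reserved", "c<cc"): "Service",
    ("Run", "c<cc"): "Service",
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}") from None


def run(events, state="Wait"):
    """Apply a sequence of events, starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


print(run(["Reserve", "Start", "End"]))
print(run(["Reserve", "t>tc", "Start"]))
```

Both example sequences are accepted; an event that the table does not list for the current state, such as Reserve while the vehicle is in Run, is rejected with an error.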
Fig. 3. Feasibility tree describing collaboration between a client without reservation and an e-scooter.
4.2 Conditions of Incorrect Concurrent Execution

A concurrent execution of e-scooter and client processes is considered incorrect if it leads to a situation in which a customer cannot make a trip. Such a situation is recorded with the message “no-scooter” and is possible in a number of cases:
(1) In a process without e-scooter reservation, the client has reached the e-scooter location and does not find the vehicle because it has been taken by another client.
(2) In a process with e-scooter reservation, the client has reached the e-scooter location and does not find the vehicle because the client did not reach the selected e-scooter by the critical time tc, the reservation was cancelled, and the vehicle has been taken by another client.
(3) In other cases that will be discussed in the next sections.
A failed customer payment for the use of an e-scooter, which is identified by the message “payment-problem”, should also be considered an incorrect result. However, these situations are not analysed in this paper because they are not caused by the concurrent execution of client and e-scooter processes. Detailed information on the risks of payment processes can be found in [2].
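Case (1) can be reproduced with a minimal simulation. The sketch below assumes one e-scooter and two clients without reservation, each modelled as two hypothetical steps (select the vehicle, then arrive and try to start it); trips never end in this simplified model. The source records every failure as the message “no-scooter”; the suffixes `:selection` and `:arrival` are added here only to tell the two situations apart.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def client(name):
    """Hypothetical client process without reservation: select a scooter, then walk to it."""
    def seek(world):
        if world["scooter"] == "Wait":
            world["selected"].add(name)
        else:
            world["result"][name] = "no-scooter:selection"

    def start(world):
        if name not in world["selected"]:
            return  # nothing was selected, so there is nothing to start
        if world["scooter"] == "Wait":
            world["scooter"] = "Run"
            world["result"][name] = "trip"
        else:
            world["result"][name] = "no-scooter:arrival"

    return [seek, start]


def run(schedule):
    """Execute one interleaving and return the outcome of each client."""
    world = {"scooter": "Wait", "selected": set(), "result": {}}
    for action in schedule:
        action(world)
    return world["result"]


outcomes = [run(s) for s in interleavings(client("A"), client("B"))]
races = [o for o in outcomes if "no-scooter:arrival" in o.values()]
print(len(outcomes), len(races))
```

In this model four of the six interleavings end with one client arriving at a vehicle that the other client has already taken, which is the situation described in case (1).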
4.3 Building of a Feasibility Tree

Following the method for analysing concurrent execution of processes set out in the previous chapters, let us look at the creation of a feasibility tree (FT) for e-scooter and client processes (see Fig. 3). The tree consists of client process steps, and each FT branch represents one concurrent execution scenario of client and e-scooter processes. All scenarios start with the first step of the client process, seekScooter: the selection of an e-scooter from all available e-scooters in the state Wait. If there is no such e-scooter, the trip cannot take place and the message “no-scooter” is sent. If any e-scooters are available, the trip may take place and the client has a choice between two options in the process: with or without reservation. In the following, a number of options for concurrent execution of client and e-scooter processes are analysed, identifying potential situations and risks leading to the result “no-scooter”.

4.4 Concurrent Execution of E-scooter and Client Processes Without Reservation

First, let us look at the “left” FT branch, marked with number 1 in the end-of-scenario symbol, and follow the scenario step by step. In the first step, the client chooses a suitable e-scooter. In the second step, the customer goes to the selected e-scooter and begins the trip by changing the state of the vehicle to Run. After the trip, the customer changes the state of the vehicle back to Wait and pays for the ride. This typical e-scooter scenario can be repeated many times. Obviously, concurrent execution of e-scooter and client processes without reservation cannot lead to an incorrect result in this scenario. Let us look at the next scenario, corresponding to branch 5. In the second step of the scenario, the client has reached the location where the e-scooter was at the moment of the choice (in the first step), but the vehicle is not found because it has been picked up by another client.
The client can choose another e-scooter and go to its current location in the hope that it is available. If the new e-scooter is available, the next steps are the same as in the corresponding branch of scenario 1. If the new e-scooter cannot be found, the choice can be repeated. Such a cyclical process can be repeated many times and can end with a successful trip or a denial of service. Thus Risk-1 is identified, which can be mitigated by allowing an e-scooter to be reserved. Similarly, Risk-2 occurs when a customer reaches the selected e-scooter, but it has been reserved by another client. Risk-3 occurs when an e-scooter runs out of battery power, which must be restored at the charging station (event c < cc). This stops the customer’s trip (see branch 3); the customer has to pay for the service received and then choose how to continue: either search for another e-scooter or opt out of the e-scooter service. As a result of the scenario, the e-scooter enters the Service state, which is considered a “no-scooter” result. Risk-4 occurs when the customer is unable to settle the payment for the service received, as represented by branches 2 and 4. This situation does not concern customer and e-scooter collaboration and is therefore not analysed further. On the other hand, if the customer does not find a suitable e-scooter in the first step of the scenario and declines the service (branches 9 and 10), then Risk-5 is identified. Thus, the analysis identifies five risks:
• Risk-1 – the selected e-scooter cannot be found because it has been taken by another customer and has changed location,
• Risk-2 – the selected e-scooter is not available because it has been reserved by another customer,
• Risk-3 – the selected e-scooter has changed location because it has been delivered to a battery charging station,
• Risk-4 – the customer has not paid for the trip,
• Risk-5 – the customer has not found a suitable e-scooter.
The analysis shows that no other risks are possible in the collaboration of the e-scooter, the client process without reservation and other customers. Figure 3 depicts the FT for the case where the customer was unable to start the trip because another customer had already taken the selected e-scooter, and the customer repeats the choice of e-scooter (see branches 5, 6, 7, 8). These scenarios repeat the situations already analysed and do not reveal any new risks. Similarly, there is a case where a customer cannot make a trip because another customer has reserved the selected e-scooter during the period when the first customer had not yet reached it. These scenarios are not displayed in the tree (see branch 11).

4.5 Concurrent Execution of E-scooter and Client Processes with Reservation

In this case, e-scooter and client collaboration differs from the collaboration described above in that, in the first step of the scenario, the client can book a suitable vehicle by sending the Reserve event to it (see Fig. 4). This ensures that another client cannot take this e-scooter, so Risk-1 is removed. However, Risk-3, Risk-4 and Risk-5 remain valid.
Risk-3 may occur because battery charging is required and the service personnel have delivered the reserved vehicle to a charging station, as permitted by the e-scooter operating scenarios. In addition, two new risks, Risk-6 and Risk-7, are identified. Risk-6 occurs when the client fails to reach the selected e-scooter by the critical time (event t > tc). The vehicle changes state from Reserved to Wait according to its operational process, so that another customer can take the previously reserved vehicle. Risk-7 is triggered when a client does not reach the selected e-scooter by the critical time and the vehicle changes state from Reserved to Wait, which allows another client to book the previously reserved vehicle. The reservation requires additional information, client-number and scooter-number, to identify the customer and the reserved vehicle and to prevent other customers from using it. This information was not necessary in the process without reservation. Thus, seven risks have been identified for the processes of the e-scooter, the customer with reservation and other customers. The process with reservation eliminates the frequently possible Risk-1, which is why it is useful in many cases.
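Risk-6 depends only on the order of three events: the client's arrival, the timeout at tc, and a rival client's attempt to start the released vehicle. The sketch below replays these events in time order. The function name, its parameters and the tie-breaking rule (an arrival exactly at tc is still on time, because the source writes the timeout event as t > tc) are assumptions made for illustration; a rival who books the vehicle instead of starting it (Risk-7) would leave the first client in the same position.

```python
def reserved_trip_outcome(arrival, t_c, rival_start=None):
    """Outcome for a client who reserved at time 0 and arrives at time `arrival`.

    rival_start is the time at which another client tries to start the same
    scooter, or None if there is no rival.
    """
    # The second tuple element orders simultaneous events: arrivals before the timeout.
    events = [(arrival, 0, "client_start"), (t_c, 1, "timeout")]
    if rival_start is not None:
        events.append((rival_start, 0, "rival_start"))

    state, outcome = "Reserved", None
    for _, _, event in sorted(events):
        if event == "timeout":
            if state == "Reserved":
                state = "Wait"          # reservation cancelled automatically
        elif event == "rival_start":
            if state == "Wait":
                state = "Run"           # the rival takes the released scooter
        else:  # client_start
            if state in ("Reserved", "Wait"):
                state, outcome = "Run", "trip"
            else:
                outcome = "no-scooter"
    return outcome


print(reserved_trip_outcome(arrival=5, t_c=10, rival_start=7))
print(reserved_trip_outcome(arrival=12, t_c=10, rival_start=11))
```

The first call models a client who arrives in time; the second models Risk-6, where the reservation expires at time 10 and the rival starts the vehicle at time 11, before the client arrives.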
Fig. 4. Feasibility tree describing collaboration between a client with reservation and an e-scooter.
5 Discussion of Findings

The method used in the study originates from the study [2], in which the processes to be analyzed are described in the programming language CPL-1, which allows the processes to be defined very accurately. For the real processes we face every day, such as an Internet store or the distribution of theatre tickets, such a precise definition of processes is difficult to achieve. Therefore, the method proposed by the theory was modified and applied to the analysis of real processes. The modifications to the theoretical method depend on the language in which the processes to be analyzed are defined. If the process is described precisely, for example in the programming language CPL-1 [2], then the semantics of symbolic execution of the scenarios are defined precisely, and risks can be mathematically proven. If the description of the processes is informal, the analysis is nevertheless possible provided that the process scenarios can be executed symbolically, the feasibility of the scenarios can be assessed, and the results of scenario execution can be calculated. This in turn allows the identification of the risks of cooperation between processes.
5.1 E-commerce

Let us first compare the risks of e-scooter processes with the risks of e-commerce processes analyzed for four e-commerce solutions: Internet Shop for Theatre Ticketing, Hotel Reservation, Online Store, and Airline Ticketing. The results are published in the FedCSIS 2021 conference materials [1]. In all four cases, customer and information system processes are analyzed. Process collaboration is described informally, with graphic charts and textual explanations. However, the description is given in sufficient detail to enable all possible collaborative scenarios to be evaluated and the risks of cooperation between processes to be identified. This approach can be found in [17]. When comparing the analysis of e-commerce processes with the analysis of e-scooter processes, it can be concluded that in both cases the method used is similar to the one derived from theoretical studies. In both cases, the description of the processes is informal but sufficiently precise to carry out risk analysis through symbolic execution. The most significant difference between the two studies is that the e-scooter is described by means of a finite state machine. Therefore, the method is also applicable to the analysis of risks in the collaboration of informally described customer processes and well-defined finite state machines.

5.2 Petri Networks

Petri networks are traditionally offered as a language for modelling systems and analysing concurrent execution [18]. Petri networks are an exact mathematical modelling language, and in many cases a model is required to satisfy the properties of soundness:
1. Correct end of action. When the Petri network finishes, no tokens are left in it (the Petri network stops working correctly);
2. Possibility to finish functioning. From any position, a token can reach the end (the Petri network does not contain states from which there is no exit);
3. No unattainable states. For each state, there is a series of transitions from the starting state of the Petri network by which a token reaches that state.
A departure from the properties of soundness is identified as a risk situation of a distributed system. Since the Petri network has a rigorous mathematical formulation, many of its properties have been well studied, including the properties of soundness [19]. However, real-world processes are difficult to define as Petri networks, because modelling common resources in a relatively simple way is only possible for resources with very few possible values, for instance {0, 1}. If a resource can take many values, an extended Petri network formalism must be used to model it, and the resulting networks are sophisticated.

5.3 Model of Collaboration Between Autonomous Robots

Autonomous robots are capable of executing their intended tasks without human involvement in an unstructured environment [20]. Autonomous robots are being used more and more broadly in a variety of spheres of human life, often taking over the execution and monitoring of activities that, in the past, only intelligent beings were able to perform. Therefore, methods for building and testing autonomous robot systems are needed to verify the correctness of their operation in various difficult-to-anticipate external environments. Finite state machines (FSM) are often used to describe robots. Thus, the analysis of the cooperation of robots can be reduced to the analysis of two finite state machines. Unfortunately, the product of two FSMs is difficult to apply to the analysis of collaboration between two machines because of the “explosion” of the number of FSM states. The method used in this work is easy to apply to the analysis of machine-to-machine cooperation, since robots defined by finite state machines can be described precisely in robot-collaboration scenarios, and the results of the scenarios can be calculated with symbolic execution [21].
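The state “explosion” can be quantified with a short calculation. The sketch below reuses the four e-scooter states from Fig. 1 as a stand-in for a robot's state set, which is an assumption made purely for illustration; the product machine of n such FSMs has 4^n states.

```python
from itertools import product

# Stand-in state set: the four e-scooter states named in the text.
STATES = ("Wait", "Run", "Reserved", "Service")


def product_states(*machines):
    """Return all combined states of the given machines (their Cartesian product)."""
    return list(product(*machines))


def product_size(n, states_per_machine=len(STATES)):
    """Number of states in the product of n machines with equally many states each."""
    return states_per_machine ** n


pairs = product_states(STATES, STATES)
print(len(pairs), product_size(10))
```

Two machines already give 16 combined states and ten machines give 1,048,576, which is why scenario-based analysis is attractive compared with building the full product.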
6 Conclusion

The study offers the use of a concurrent process analysis method for the analysis of e-scooter and customer collaboration. The method makes it possible to discover e-scooter and client collaboration risks when e-scooters are used concurrently by many clients. The key steps of the algorithm are: (1) creating an exact description of e-scooter and customer collaboration processes, (2) specifying when the collaboration of e-scooter and client is considered incorrect, (3) creating a feasibility tree with all e-scooter and client collaboration scenarios, (4) calculating execution results for all scenarios, (5) identifying e-scooter and client collaboration risks. The proposed method is derived from theoretical studies; its use in the analysis of e-scooter and client collaboration shows the wide applicability of the method. The obtained results allow e-scooter software developers to reduce the identified risks; customers, on the other hand, can use new transport services effectively while understanding the potential risks. A suitable further direction of application would be an analysis of the vaccination processes carried out to limit the COVID-19 pandemic. As is known, fraudulent cases have been reported in both the recording of vaccinations and the use of vaccination certificates. Analysis aimed at identifying the risks of vaccination processes is a practical problem.

Acknowledgements. This work has been supported by University of Latvia projects: AAP2016/B032 “Innovative information technologies” and research project “Competence Centre of Information and Communication Technologies” of EU Structural funds, contract No. 1.2.1.1/18/A/003 signed between IT Competence Centre and Central Finance and Contracting Agency, Research No. 1.6 “Concurrence analysis in business process models”.
References

1. Bicevskis, J., Nikiforova, A., Karnitis, G., Oditis, I., Bicevska, Z.: Risks of concurrent execution in E-commerce processes. In: Ganzha, M., Maciaszek, L., Paprzycki, M., Ślęzak, D. (eds.) Proceedings of the 16th Conference on Computer Science and Intelligence Systems ACSIS, pp. 447–451 (2021). https://doi.org/10.15439/2021F70
2. Bicevskis, J., Karnitis, G.: Testing of execution of concurrent processes. In: Robal, T., Haav, H.-M., Penjam, J., Matulevičius, R. (eds.) DB&IS 2020. CCIS, vol. 1243, pp. 265–279. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-57672-1_20
3. Nikiforova, A., Bicevskis, J., Karnitis, G.: Towards a concurrence analysis in business processes. In: Proceedings of the 7th International Conference on Social Network Analysis, Management and Security (SNAMS), pp. 1–6 (2020). https://doi.org/10.1109/SNAMS52053.2020.9336566
4. Barzdins, J., Bicevskis, J., Kalnins, A.: Automatic construction of complete sample systems for testing. In: Proceedings of the IFIP 1977 Congress, North Holland, pp. 57–62 (1977)
5. Coffman, E.G., Elphick, M.J., Shoshani, A.: System deadlocks. ACM Comput. Surv. 3(2), 67–78 (1971). https://doi.org/10.1145/356586.356588
6. Xi, Y., Zhang, J., Xiang, F.: The implementation of a communication deadlock analysis based on the theory of deadlock analysis for the actor model. J. Softw. Eng. Appl. 12, 393–405 (2021). https://doi.org/10.4236/jsea.2019.1210024
7. Rajesh Kumar, R., Shanbhag, V., Dinesha, K.V.: Automated deadlock detection for large Java libraries. In: Goswami, D., Hoang, T.A. (eds.) ICDCIT 2021. LNCS, vol. 12582, pp. 129–144. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65621-8_8
8. Mitschele-Thiel, A.: Systems Engineering with SDL: Developing Performance-Critical Communication Systems. Wiley, Hoboken (2001)
9. O’Callaghan, E., Murray, J.: The International Review of Retail, Distribution and Consumer Research 27(5), 435–436 (2017). https://doi.org/10.1080/09593969.2017.1391957
10. Forsythe, S., Shi, B.: Consumer patronage and risk perceptions in Internet shopping. J. Bus. Res. 56(11), 867–875 (2003). https://doi.org/10.1016/S0148-2963(01)00273-9
11. Amirtha, R., Sivakumar, V.J., Hwang, Y.: Influence of perceived risk dimensions on e-shopping behavioral intention among women: a family life cycle stage perspective. J. Theor. Appl. Electron. Commer. Res. 16(3), 320–355 (2021). https://doi.org/10.3390/jtaer16030022
12. Rita, P., Oliveira, T., Farisa, A.: The impact of e-service quality and customer satisfaction on customer behavior in online shopping. Heliyon 5(10), e02690 (2019). https://doi.org/10.1016/j.heliyon.2019.e02690
13. Pappas, N.: Marketing strategies, perceived risks, and consumer trust in online buying behavior. J. Retail. Consum. Serv. 29, 92–103 (2016). https://doi.org/10.1016/j.jretconser.2015.11.007
14. Sellami, C., Baron, M., Bechchi, M., Hadjali, A., Jean, S., Chabot, D.: Towards a unified framework for computational trust and reputation models for e-commerce applications. In: Cherfi, S., Perini, A., Nurcan, S. (eds.) RCIS 2021. LNBIP, vol. 415, pp. 616–622. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75018-3_44
15. Bezes, C.: Comparing online and in-store risks in multichannel shopping. Int. J. Retail Distrib. Manag. 44(3), 284–300 (2016). https://doi.org/10.1108/IJRDM-02-2015-0019
16. Wai, K., Dastane, O., Johari, Z., Ismail, N.B.: Perceived risk factors affecting consumers’ online shopping behavior. J. Asian Finance Econ. Bus. 6(4), 249–260 (2019). https://doi.org/10.2139/ssrn.3498766
17. Ziemba, E., Chmielarz, W. (eds): Information Technology for Management: Towards Business Excellence. 15th Conference, ISM 2020, and FedCSIS-IST 2020 Track, held as part of FedCSIS, Sofia, Bulgaria, 6–9 September 2020, extended and revised selected papers. Springer, Cham (2020). https://link.springer.com/book/10.1007/978-3-030-71846-6
18. Padua, D. (ed.): Encyclopedia of Parallel Computing. Springer, Boston (2011). https://doi.org/10.1007/978-0-387-09766-4
19. van der Aalst, W.M.P., van Hee, K.M., Hofstede, A.H.M.: Soundness of workflow nets: classification, decidability, and analysis. Formal Aspects Comput. 23(3), 333–363 (2011). https://doi.org/10.1007/s00165-010-0161-4
20. Bekey, A.: Autonomous Robots. Massachusetts Institute of Technology (2005). ISBN 0-262-02578-7
21. Bicevskis, J., Gaujens, A., Kalnins, J.: Testing of RUAV and UGV robots’ collaboration in the simulink environment. Baltic J. Modern Comput. 1(3–4), 156–177 (2013)
A Framework for Emotion-Driven Product Design Through Virtual Reality
Davide Andreoletti(✉), Marco Paoliello, Luca Luceri, Tiziano Leidi, Achille Peternier, and Silvia Giordano
University of Applied Sciences of Southern Switzerland, Via la Santa 1, 6962 Viganello, Switzerland
{davide.andreoletti,marco.paoliello,luca.luceri,tiziano.leidi,achille.peternier,silvia.giordano}@supsi.ch
Abstract. Emotions play a significant role in the interaction between products and users. However, it is still not well understood how users’ emotions can be incorporated in product design [34]. We argue that this gap is due to the lack of a methodological and technological framework for investigating the conditions under which emotions are elicited and for recognizing those emotions. For example, the effectiveness of emotion elicitation conditions is generally validated by assessing users’ emotional response through ineffective means (e.g., surveys and interviews [36]). In this paper, we argue that Virtual Reality (VR) is the most suitable means to perform this investigation, and we propose a novel methodological framework, referred to as the Virtual-Reality-Based Emotion-Elicitation-and-Recognition loop (VEE-Loop), that can be exploited to realize it. The VEE-Loop consists of continuous monitoring of users’ emotions, which are then provided to product designers as implicit user feedback. This information is used to dynamically change the content of the VR environment, and the process is iterated until the desired affective state is elicited. In this work, we develop a proof-of-concept implementation of the VEE-Loop and apply it to two real use cases. The obtained results show that designers can precisely identify when users feel negative emotions (e.g., frustration) simply by analyzing their movements. As negative emotions signal troublesome interactions with the virtual representation of their products, designers obtain valuable feedback on how to enhance them.
Keywords: Virtual Reality · Emotion Recognition · Emotion Elicitation · Emotion-driven design
1 Introduction
© Springer Nature Switzerland AG 2022. E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 42–61, 2022. https://doi.org/10.1007/978-3-030-98997-2_3
It is well known that, when making a purchase decision, customers often give more importance to emotional aspects than to rational ones. Indeed, various sources (e.g., [22]) report that customers value the emotional engagement provided by a product as much as they value its functionalities. Therefore, the ability to
develop products capable of inducing certain emotions is today a very important skill for designers. The term emotion-driven design refers to a set of working practices that designers follow to develop products that evoke specific emotions in their users. Following Ref. [21], we believe that the realization of emotion-driven design requires a deep comprehension of emotions, and specifically of the conditions of their elicitation (i.e., what triggers a specific emotion) and of their manifestation in people (i.e., how to recognize that a person is actually feeling that emotion). In line with many existing works (e.g., [4,6]), we also argue that emotion-driven design should exploit iterative product adaptation strategies to effectively evoke the desired emotion in users. More formally, emotion-driven design should be characterized by i) an emotion elicitation phase, in which designers systematically experiment with various emotion elicitation conditions (e.g., by modifying the sensory qualities of the product and the context of its usage), ii) an emotion recognition phase, in which designers measure users’ emotional response, and iii) a loop phase, in which emotion elicitation and emotion recognition are iterated until the designer converges to a product configuration capable of effectively evoking the intended emotion.
Despite the relevant role that emotions play, however, emotion-driven design practices are still not widespread, and the main concern of designers remains the fulfillment of products’ functional requirements [23,34]. The authors of [23] explain this contradiction with the argument of cultural inertia, according to which consolidated working practices are resistant to change. While supporting this interpretation, in this work we try to move beyond it by arguing also that emotion-driven practices (i.e., emotion elicitation, emotion recognition, and their iteration) are hardly practical without the help of suitable technology.
We believe, specifically, that designers lack a technological and methodological framework for the systematic experimentation of emotion elicitation conditions and for the assessment of users’ emotional responses. Such a framework would open the door to the effective practice of emotion-driven design and would foster a radical transformation of the whole field of product design [33]. In our view, Virtual Reality (VR) represents the most suitable technology to turn the aforementioned scheme into a practical design instrument. First, VR is the digital technology that guarantees the most tangible experience across different domains, by allowing users to feel a sense of presence that makes emotion elicitation conditions quite similar to a real scenario [15]. Second, VR is a completely controlled environment in which virtual contents can be flexibly modified. Third, VR allows gathering a set of user-related data from which users’ emotions can be inferred (e.g., by processing bio-feedback, such as heartbeat and gestures, with an intelligent system). Note that most bio-feedback signals can be directly gathered through the Head-Mounted Displays (HMDs) used by the VR system, coupled with external devices (e.g., wearables) when needed. The use of VR to implement the emotion-driven design framework based on the iteration of emotion elicitation and recognition tasks gives rise to a scheme that we refer to as the Virtual-Reality-Based Emotion-Elicitation-and-Recognition loop or, in short, the VEE-Loop. A high-level representation of the
VEE-Loop is depicted in Fig. 1. This framework can find applications in a large number of areas. For example, product designers can prototype their products in VR and, exploiting flexible content modification, enhance them based on the emotional feedback of potential customers. As this feedback is obtained before the actual product development, the risk of designing unsuccessful products is significantly reduced. Moreover, the VEE-Loop can be exploited to adapt a service delivered through VR to the current emotional status of its users (e.g., in remote learning). We stress that the proposed system is not intended to create addictive products leading to compulsive purchases, but only to develop products increasingly tailored to their users’ needs.
Please also note that the proposed scheme may come with several limitations. For instance, different people (and even the same person, at different times) can react differently to the same emotion elicitation choice, which makes it harder to identify which product characteristics evoke a particular emotion. Finding such characteristics may therefore require a large set of experiments. We note, however, that the VEE-Loop can be designed as a portable instrument, which facilitates remote experiments and, in turn, the involvement of a large base of potential customers. Another limitation concerns the subjective and cultural aspects of emotions (e.g., emotions are expressed differently across regions of the world), which can limit the generalization of the obtained results.
In this work, we apply a first proof-of-concept implementation of the VEE-Loop to the use cases of personalized teaching in industrial manufacturing and product design. In these use cases, designers aim to identify the moments of troublesome interaction between users and virtual scenes.
This information, obtained using the VEE-Loop to detect moments in which users experience negative emotions, is valuable feedback that enables designers to understand how to make targeted enhancements to their products. In this first implementation, only one iteration of emotion elicitation and emotion recognition is performed. The extension to multiple iterations is left as future work.
In summary, the main contributions of our work are the following: (i) we present an extensive review and discussion of the importance of emotions in product design and delivery, (ii) we provide a formalization of iterative emotion-driven design approaches, (iii) we propose a technological framework to realize this approach by means of VR, and (iv) we apply a proof-of-concept implementation of the VEE-Loop to two use cases and evaluate its effectiveness in evoking and recognizing emotions.
The structure of the paper is as follows. Section 2 introduces the main concept of emotion-driven design, elaborates on the role of VR as the main enabler for its realization, and briefly reviews the related work. Then, in Sect. 3 the VEE-Loop is formally introduced, along with a description of its architecture and of how it has been implemented in this work. The main research findings, obtained by applying the VEE-Loop to the use cases of teaching in industrial manufacturing and product design, are presented in Sect. 4 and further discussed in Sect. 5. Finally, Sect. 6 concludes the paper.
Fig. 1. High-level representation of the VEE loop
2 Theoretical Background
This section presents the theoretical background required to understand the proposed methodology. Specifically, we first elaborate on the paradigm of emotion-driven design, formalizing its main components of emotion elicitation, emotion recognition, and iteration of the two. Then, we provide arguments supporting the use of VR as a main enabling technology for realizing emotion-driven design.
2.1 Emotion-Driven Product Design
Emotion-driven design is a methodology aimed at developing products capable of evoking specific emotions in their users [20]. Due to the prominent role that customers ascribe to emotions, emotion-driven design is rapidly gaining interest. Existing frameworks and instruments for emotion-driven design (see, for instance, the Emotion-Driven Innovation paradigm in Ref. [21] and the Wheel of Emotions in Ref. [5]) revolve around the effective implementation of emotion elicitation and recognition strategies. Indeed, to implement such emotion-driven methodologies, designers are required to deeply understand how emotions are evoked (i.e., emotion elicitation) and how they are manifested (i.e., emotion recognition). In Ref. [20], the ability to effectively perform emotion elicitation and recognition is referred to as emotion knowledge. Consistently with other iterative design methodologies (see Refs. [4,6]), we also argue that emotion elicitation and emotion recognition should be iterated until the developed product can effectively arouse the intended emotional reaction. In the rest of this subsection, we provide further details about the three main building blocks of an emotion-driven design strategy, namely emotion elicitation, emotion recognition, and their iteration.
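To make the interplay of the three building blocks concrete, the following minimal Python sketch iterates elicitation and recognition until a target emotion is observed. It is only an illustration under stated assumptions: the names `emotion_driven_loop`, `recognize`, and `adapt`, the dictionary-based product configuration, and the toy "warmth" parameter are all hypothetical and are not part of the framework described in this paper.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: a product configuration is a dict of design
# parameters, and an emotion is a plain string label.
Config = Dict[str, float]


def emotion_driven_loop(
    config: Config,
    target: str,
    recognize: Callable[[Config], str],
    adapt: Callable[[Config, str, str], Config],
    max_iterations: int = 10,
) -> Tuple[Config, int, bool]:
    """Iterate elicitation and recognition until the target emotion is observed."""
    for iteration in range(max_iterations):
        # Elicitation + recognition: expose the user to the current
        # configuration and measure the emotional response.
        observed = recognize(config)
        if observed == target:
            return config, iteration, True
        # Loop phase: modify the elicitation condition and try again.
        config = adapt(config, observed, target)
    return config, max_iterations, False


# Toy stand-ins (assumptions): the "user" feels joy once colors are warm enough,
# and the "designer" increases warmth by a fixed step at every iteration.
def toy_recognize(config: Config) -> str:
    return "joy" if config["warmth"] >= 0.7 else "sadness"


def toy_adapt(config: Config, observed: str, target: str) -> Config:
    return {**config, "warmth": round(config["warmth"] + 0.2, 2)}


final_config, iterations, converged = emotion_driven_loop(
    {"warmth": 0.2}, "joy", toy_recognize, toy_adapt
)
```

In a real setting, `recognize` would wrap a user session plus an emotion recognition model, and `adapt` would encode the designer's choices; both are left abstract here on purpose.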
Emotion Elicitation. Emotion elicitation is the task of evoking specific emotions. The most difficult aspect of this task is the identification of the conditions under which specific emotions are most likely evoked. In product design, such conditions span two main dimensions, namely the formal characteristics of a product (e.g., shape and color) and the context of its use [33]. Several theories have been formulated to evoke emotions by changing the formal characteristics of a product (e.g., warm colors generally induce positive emotions). However, the subjective nature of emotions renders these theories difficult to generalize. For instance, the relationship between certain types of stimuli and emotions is neither deterministic nor constant, as it can change over time even for the same person [20]. To the best of our knowledge, no theory has ever been formulated on how to evoke emotions through the modification of a product’s usage context.
Emotion Recognition. Emotion recognition is the task of classifying a set of observations (e.g., the bio-feedback of a person, such as heart pressure and skin conductance) into a class of emotion, e.g., joy, sadness, or frustration. The authors of Ref. [20] define the ability to distinguish similar emotions (e.g., frustration and nervousness) as emotion granularity. As the spectrum of possible emotions is very vast (for instance, customers can feel up to 25 different positive emotions when interacting with a product [20]), it is essential that designers effectively recognize even the most subtle differences among emotions, going beyond the distinction between positive and negative emotions [18].
Loop of Emotion Elicitation and Recognition. The phase of emotion recognition provides designers with implicit feedback about the affective state of a product user.
This feedback can inform the designer about the suitability of the current emotion elicitation condition (e.g., shape and colors of the product) and can properly drive its modification. In practice, emotion elicitation conditions should be modified according to the current emotional state of the user, and this operation should be repeated until the designer understands which conditions are effectively capable of evoking the desired emotion. This iterative scheme follows existing design strategies (e.g., [6]) in which the feedback is explicit (e.g., obtained through surveys and questionnaires).
2.2 VR as Enabling Technology
VR represents the most suitable technology to realize emotion elicitation, emotion recognition, and their iteration. In fact, VR allows creating flexibly modifiable and dynamic experiences, which can be distributed online, replayed at will, and regularly updated. The level of immersion provided by VR induces a sense of presence that amplifies emotional reactions [15,29,30], which renders VR more effective than other digital media in both emotion elicitation and recognition. The recent uptake in VR adoption, as well as the increased availability of affordable VR headsets, provide a much lower entry point to the technology,
which is now considered a commodity, off-the-shelf option no longer limited to research laboratories or professional contexts. Moreover, modern VR devices are equipped with sensors that acquire multiple types of data. For example, eye tracking and body tracking sensors can be used to precisely determine what the user is currently looking at, but also to derive a series of additional metrics such as heartbeat and respiratory rate [38]. In addition, HMDs can be coupled with additional monitoring devices to increase the amount, types, and accuracy of users’ bio-feedback signals (e.g., by combining the full-body tracking provided by the Microsoft Kinect with the head and hand positions returned by the headset). Moreover, tools have recently been proposed to enable HMDs to track the movements of the face¹. These data alone are sufficient ingredients to build an effective emotion recognition system. Nevertheless, VR also enables simulating the context in which a user acts and analyzing his/her behaviors in relation to it. Since emotions are observable from behaviors as well, VR has the potential to open the door to the development of revolutionary emotion recognition systems, which exploit both bio-signals and contextual data, such as the interaction between users and the virtual scene (VE).
2.3 Literature Review
This section starts by elaborating on the role of emotions in user-product interaction. Then, it briefly reviews early prototypes of the VEE-Loop.
Emotions in User-Product Interaction. In Ref. [21], three main product characteristics have been identified as crucial to evoke emotions in users: appearance, functionality, and symbolic meaning. The appearance of a product is defined by its sensory qualities, such as its shape and color. It is well known that there is a causal relation between certain sensory qualities and the emotional response of customers (see the Kansei model [3]). For instance, warmer colors generally increase the arousal levels of evoked emotions [14]. The functionality of a product influences not only its utility, i.e., whether it meets the needs of its users, but also users’ emotions. In fact, functional products that minimize the effort required of users are regarded as a source of positive emotions. For instance, a product that improves a situation that is perceived as frustrating and limiting (e.g., by making it possible to gain space in small environments) is likely to evoke positive emotions. On the contrary, a product that is cumbersome and reduces the available space likely leads to frustration and annoyance. Finally, the symbolic meaning of a product refers to its connection with a broader scheme of beliefs and values. According to the appraisal theory [13], users feel different emotions based on the idea they have regarding the role of a certain product in their lives. Another example is the symbolic value given by the affinity of a product with a certain idea (e.g., a flag that represents a certain political view). The importance of the symbolic meaning is well expressed in the famous quote by Simon Sinek in one of the most viewed TED talks ever²: people don’t buy what you do, they buy why you do it [12]. In other words, the meaning given to a product depends on people’s beliefs and values, as well as on their affinity with a company’s mission and concerns. In this respect, the law of concern [19] affirms that every emotion hides a personal concern and a disposition to prefer particular states of the world. Because of that, emotions have been defined in [20] as gateways to uncover people’s underlying concerns. Similarly, the authors of Ref. [23] argue that understanding users at the emotional level allows a deeper comprehension of their values, which is crucial to produce radical product innovations, while the sole understanding of users’ functional needs yields only superficial and slight product modifications. Moreover, products capable of emotionally engaging their users foster creative and innovative thinking [20] and benefit well-being [18]. Therefore, the capability of understanding and engaging users at the emotional level is crucial to design products that are appreciated and that guarantee customer loyalty in the long term, as a clear and tight connection between a product and a specific emotion reinforces brand identification [23].
¹ https://uploadvr.com/htc-facial-tracker-quest-index/
Prototypes of the VEE-Loop. Reference [1] proposes an architecture to perform emotion-driven generation of a VE in the context of mental health treatment. In this solution, users’ emotions are inferred from the analysis of various bio-feedback signals and, based on the detected emotions, a VE is generated to stabilize them, e.g., to induce calm. Another work that investigates the use of VR as a tool to perform emotion recognition (ER) and emotion elicitation (EE) can be found in Ref. [24], where ER is performed using a very simple machine learning algorithm that works on users’ electroencephalograms, while EE is implemented using static VEs.
Most of the research on ER is done on data that carry a single emotion, which is often non-spontaneous and exaggerated [25]. Instead, the proposed VEE-Loop aims to consider streams of multi-modal data (which introduce the challenge of identifying the onset and end of emotions) and to exploit the sense of presence typical of VR experiences to induce (and then recognize) more spontaneous emotions [26].
Concerning the iteration of emotion elicitation and recognition, several existing works (e.g., [4,6]) show how iterative schemes are widespread in product design. These schemes are based on the idea that products can be iteratively improved by incrementally modifying their characteristics in response to the perception of various persons (e.g., designers and customers). Reference [32] represents an attempt to partially automate this iterative process by employing a genetic algorithm that suggests how to modify a product based on the current product’s characteristics and on users’ perception. References [21] and [18] describe methodological frameworks that designers can follow to develop products with emotional intentions. However, as far as we know, we are the first to propose a technological solution that can help to perform emotion-driven and iterative product design.
We also remark that the proposed VEE-Loop can be used to dynamically modify virtual products based on users’ emotions (e.g., a remote schooling service). In this respect, similar previous works (which, however, do not make use of VR technologies) are [9–11] and [14], which propose systems for emotion-driven recommendation and advertising, respectively. Similarly, a digital system that adapts the characteristics of a service based on users’ emotions is proposed in [8]. Then, Ref. [2] proposes a gaming framework that changes the characteristics of the game based on users’ emotions. Finally, Ref. [7] describes the realization of a smart office in which sensory features (e.g., light in the office) and tasks assigned to users are changed to regulate their emotions.
² https://www.ted.com/talks/simon_sinek_how_great_leaders_inspire_action?language=en
3 Research Methodology
3.1 The VEE-Loop
The VEE-Loop is the realization, by means of VR, of the Emotion-Recognition-Emotion-Elicitation loop described in Subsect. 2.1. The architecture of the VEE-Loop is mainly based on a module for Emotion Elicitation (EE module) and a module for Emotion Recognition (ER module). The interaction between such modules is depicted in Fig. 2: the ER module infers the emotion most likely perceived by the user by processing information such as the user’s bio-feedback and the interaction between the user and the VE (e.g., her behavior or the attention she pays to a particular virtual object). The emotion detected by the ER module and the emotion that the designer aims to evoke are then given as input to the EE module, which is responsible for dynamically changing the VE. In the following, we provide more details about the two modules.
Emotion Elicitation Module. The Emotion Elicitation (EE) module outputs a modified VE based on i) a representation of the current VE, ii) the emotion detected by the ER module, and iii) the emotion that the designer is willing to induce. Based on the distance between the emotion intended by the designer and that recognized by the ER module, the EE module modifies the VE (e.g., by changing the color of a given virtual object). How to measure the distance between emotions, and how to modify the VE accordingly, are two open research questions. A possible approach is to describe emotions using a dimensional representation (e.g., in the valence/arousal plane), which allows quantifying the difference among them. Moreover, the difficulty in modifying content to elicit emotions is a well-known and unsolved problem [15,34]. To tackle this problem, the VE should be clearly represented, e.g., in terms of the positions, semantics (i.e., functional role), and sensory qualities (e.g., shape and size) of the most salient virtual objects. Then, a model to modify the VE according to the measured distance between desired and actual emotion needs to be devised.
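As an illustration of the dimensional representation mentioned above, the following Python sketch measures the distance between two emotions placed in the valence/arousal plane. It is a minimal sketch under explicit assumptions: the coordinates in `EMOTION_COORDS` are illustrative placeholders rather than values taken from the literature or from this work, and the Euclidean metric is only one possible choice, since the text leaves the distance measure as an open question.

```python
import math
from typing import Dict, Tuple

# Assumed (valence, arousal) coordinates in [-1, 1]; placeholder values only.
EMOTION_COORDS: Dict[str, Tuple[float, float]] = {
    "joy": (0.8, 0.5),
    "sadness": (-0.7, -0.4),
    "frustration": (-0.6, 0.6),
    "calm": (0.5, -0.6),
}


def emotion_distance(first: str, second: str) -> float:
    """Euclidean distance between two emotions in the valence/arousal plane."""
    (v1, a1), (v2, a2) = EMOTION_COORDS[first], EMOTION_COORDS[second]
    return math.hypot(v1 - v2, a1 - a2)


def adjustment_direction(detected: str, target: str) -> Tuple[float, float]:
    """Change in (valence, arousal) needed to move from detected to target."""
    (vd, ad), (vt, at) = EMOTION_COORDS[detected], EMOTION_COORDS[target]
    return vt - vd, at - ad


# Example: the designer targets joy, but the ER module detects sadness.
gap = emotion_distance("sadness", "joy")
delta_valence, delta_arousal = adjustment_direction("sadness", "joy")
```

A positive `delta_valence` would suggest moving the VE toward more pleasant stimuli; how such a direction maps to concrete changes of the virtual objects is exactly the modeling problem left open in the text.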
For instance, if the target emotion is joy, but the one detected by the ER module is sadness, the colors of the objects could be tuned to be warmer, given that warmer colors are usually associated with joy. In our view, this task is still too complex to be automated and would require the manual tuning of the VE’s characteristics. However, should the VEE-Loop become a tool of common use, it would also function as a tool for data collection. Specifically, the tuning choices of designers, along with the emotional reactions of users, might be collected and then used to train automatic systems (e.g., machine learning algorithms).
Fig. 2. Architecture of the proposed VEE-Loop
Emotion Recognition Module. The ER module infers the emotion of users from a set of multi-modal signals. Such data can be i) users’ bio-feedback signals and ii) user-VE interactions. Users’ bio-feedback (e.g., vital parameters) can generally be acquired using the HMD or other supporting tools that do not hinder the VR experience (e.g., wearable devices). On the other hand, the interaction between user and VE is much harder to model and, therefore, the related data are difficult to obtain. However, as users’ emotions are strongly correlated with their interaction with the VE [18], we argue that such models are essential to fully exploit the potential of ER in VR. For the processing of bio-feedback and other contextual data, we envision an architecture composed of the following four layers:
– Feature Extraction: a set of suitable features has to be defined to capture the properties of users’ bio-feedback and contextual data that are beneficial for the emotion recognition task. A well-established literature can help in the definition of features, either handcrafted or automatically learned [31], to represent a wide range of bio-feedback signals (e.g., joint acceleration for body movements and spectrograms for voice, or features learned with a deep learning approach). The definition of features to represent contextual data (e.g., the interaction of the user with the VE), instead, requires a more pioneering attitude. We argue that existing features used to model the level of attention and engagement can help to define new, more emotion-oriented features.
– Fusion: this layer is meant to combine data, features, and algorithms to maximally exploit the information contained in the users’ data and to increase the generalization of the ER module. A research challenge here is to combine data of different domains (e.g., voice, heart rate, and interaction with the VE) that have radically different properties, such as acquisition frequency and temporal dynamics.
– Segmentation: during the use of VR, users’ emotions may change over time. This layer performs data segmentation, i.e., it segments the stream into portions of signal that are coherent with respect to the particular emotion they carry. This is a remarkable difference with respect to existing studies on emotion recognition, which generally assume that the observed data are associated with a single emotional state.
– Emotion Classification: finally, this layer infers the emotion that the segmented data most likely carry. Note that emotions can be represented either using a set of classes (e.g., joy, fear, etc.) or using a dimensional approach (e.g., arousal, valence, dominance [17]). Based on the type of representation that is chosen, this layer performs a supervised learning task, either a classification or a regression.
3.2 Proof-of-Concept Implementation
In this work, we implement a simplified version of the VEE-Loop, and we apply it to two use cases. Specifically, we apply VR to implement emotion elicitation and emotion recognition in the scenarios of personalized teaching in industrial manufacturing and product design. The modification of the virtual scene based on the obtained feedback (i.e., the iteration of emotion elicitation and recognition) is left as future work. Indeed, our aim in this work is to prove that VR is a reliable means to evoke and recognize emotions in two real-world settings and that designers can obtain useful feedback to improve their products in an informed manner. In more detail, designers develop the virtual representations of their products and aim to identify the points at which the interaction between users and products is troublesome. This is performed by inferring, from the analysis of users’ movements, the instants in which they feel negative emotions. The identification of negative emotions allows designers to understand the specific points at which products need to be improved to make the user experience more comfortable. By
movements, we specifically refer to the trajectory of the head and hands tracked with the headset used to run the VR simulation. By negative emotions, we refer to a broad spectrum of affective states that are likely correlated with a difficult interaction with the virtual content (e.g., frustration, disappointment, and nervousness). On the other hand, non-negative emotions (i.e., neutral states) are associated with the normal behavior of the user, in which no specific emotion is perceived. Only two classes of emotional states (namely, neutral and negative) have been considered because i) the task of emotion recognition from body movement is extremely challenging (since the same movement can express different emotions), and it becomes much simpler if the algorithm has to decide between two classes of emotions only, and ii) this distinction was sufficient for the scope of the use cases in which we applied the VEE-Loop.
Data Processing. The trajectory of a user’s movements (i.e., positions of head and hands) is sliced into a set of windows of fixed length, which overlap with adjacent windows by a certain percentage of their length. Each window is then under-sampled to reduce data redundancy. From each window, a set of movement-related features (e.g., speed and acceleration) is extracted, both in the time and in the frequency domain. These features are then fed to a machine learning module, which has been previously trained to classify each window as belonging to either the neutral or the negative class. The considered machine learning module is the Extreme Gradient Boosting Regressor, which significantly outperformed other candidates (e.g., neural networks and a gradient boosting classifier). This module returns a stream of values representing the likelihood that the corresponding windows are associated with a negative emotion.
Please note that the closer such values are to 0, the higher the likelihood that the user is feeling a neutral emotion. Instead, the closer such values are to 1, the higher the likelihood that the user is feeling a negative emotion. This stream is then provided to the segmentation module, which returns the instants in which users most likely experience a negative emotion. These instants are then collected and used to identify which parts of the virtual scene are responsible for the problematic interaction. This allows designers to modify the virtual product in an informed manner. Please note that the parameters considered in the implementation of the VEE-Loop vary based on the considered use cases, as further detailed in Subsects. 4.1 and 4.2.
Data Collection. In order to train the proposed system and validate its effectiveness, we collected datasets specific to the considered use cases. Each sample of these datasets is characterized by a sequence of movements (from which we perform the feature extraction previously described) and the corresponding ground truth, i.e., a sequence of labels belonging to the negative or neutral class. Datasets have been collected with the help of a set of actors and an assistant. The assistant uses our tagging system to associate ground-truth values with the streams of movements.
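The windowing, under-sampling, and time-domain feature extraction described under Data Processing can be sketched as follows in plain Python. This is a hedged illustration rather than the implementation used in this work: the function names, the single-coordinate toy trajectory, the summary statistics (mean absolute velocity and acceleration), and all parameter values are assumptions, and both the frequency-domain features and the gradient boosting step are omitted.

```python
from typing import Dict, List, Sequence


def sliding_windows(signal: Sequence[float], length: int, overlap: float) -> List[List[float]]:
    """Slice a signal into fixed-length windows overlapping by a fraction of their length."""
    if length <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("length must be positive and overlap must lie in [0, 1)")
    step = max(1, round(length * (1.0 - overlap)))
    return [list(signal[i:i + length]) for i in range(0, len(signal) - length + 1, step)]


def undersample(window: Sequence[float], factor: int) -> List[float]:
    """Keep one sample every `factor` samples to reduce redundancy."""
    return list(window[::factor])


def derivative(values: Sequence[float], dt: float) -> List[float]:
    """Discrete first-order derivative with sampling period dt."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def mean_abs(values: Sequence[float]) -> float:
    return sum(abs(v) for v in values) / len(values) if values else 0.0


def window_features(window: Sequence[float], dt: float) -> Dict[str, float]:
    """Summarize one window by its mean absolute velocity and acceleration."""
    velocity = derivative(window, dt)
    acceleration = derivative(velocity, dt)
    return {
        "mean_abs_velocity": mean_abs(velocity),
        "mean_abs_acceleration": mean_abs(acceleration),
    }


# Toy usage: one coordinate of a trajectory sampled once per time unit (assumed).
positions = [float(t * t) for t in range(12)]
windows = sliding_windows(positions, length=8, overlap=0.5)
# After under-sampling by 2, consecutive samples are 2 time units apart.
features = [window_features(undersample(w, 2), dt=2.0) for w in windows]
```

Each feature dictionary would then be passed to the trained regressor, which returns a value between 0 and 1 for the corresponding window.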
A Framework for Emotion-Driven Product Design Through Virtual Reality
Performance Evaluation. The effectiveness of the system is evaluated considering its capability of identifying the instants of transition between negative and neutral states. Such an instant is correctly identified if the instant inferred by the proposed system differs from the actual ground-truth instant by less than a tolerance threshold T. The effectiveness of the system is measured by its Precision, Recall, and F-score in reaching this objective. The precision is the ratio between the number of correctly identified instants and the total number of identified instants. The recall is the ratio between the number of correctly identified instants and the total number of actual ground-truth instants. The F-score balances precision and recall. Training and testing follow a leave-one-sample-out approach, i.e., a single sample is used for the test, while all the remaining ones form the training set. The results presented in the following are the average of those obtained over each sample of the dataset. In the following subsections, we describe in more detail how this scheme has been implemented in the two considered use cases, and we present the corresponding results.
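The tolerance-based metrics described above can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state: each ground-truth instant can be matched by at most one inferred instant (greedy nearest matching), and the F-score is the harmonic mean of precision and recall (F1). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def count_correct(predicted: Sequence[float], actual: Sequence[float],
                  tolerance: float) -> int:
    """Count inferred instants lying within `tolerance` of a ground-truth instant.

    Each ground-truth instant is consumed by at most one prediction
    (assumed one-to-one matching).
    """
    unmatched: List[float] = sorted(actual)
    correct = 0
    for p in sorted(predicted):
        # "Lower than a tolerance threshold T" is read as a strict inequality.
        candidates = [g for g in unmatched if abs(p - g) < tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda g: abs(p - g)))
            correct += 1
    return correct


def precision_recall_f(predicted: Sequence[float], actual: Sequence[float],
                       tolerance: float) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one sample."""
    correct = count_correct(predicted, actual, tolerance)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(actual) if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall (assumed definition).
    return precision, recall, 2.0 * precision * recall / (precision + recall)
```

In a leave-one-sample-out setting, this function would be evaluated once per held-out sample and the three values averaged over all samples.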
4
Research Findings
4.1
Scenario 1: Training in Industrial Manufacturing
The first use case is carried out within the EIT Manufacturing V-Machina project (https://v-machina.supsi.ch/), whose aim is to exploit VR-based simulations to teach practitioners in manufacturing how to correctly use industrial machinery. The virtual scene considered in the project is depicted in Fig. 3. The VEE-Loop is employed to automatically identify when a practitioner (i.e., a trainee) is experiencing a negative emotion, which indicates a difficult interaction with the virtual machinery. With this data in hand, the trainer can understand which characteristics of the training session have generated a negative emotion, and modify them accordingly (i.e., emotion elicitation). For example, the trainer can decide to simplify the training session in its most problematic parts.

Specific Implementation of the VEE-Loop. The signal is sliced into windows of 2 s, which overlap for 50% of their length. Each window is under-sampled by a factor of 5. From this signal, we computed the following features: the trajectory of head and hands, the discrete derivative of the first order (i.e., velocity), the discrete derivative of the second order (i.e., acceleration), and the discrete derivative of the third order. These features are then fed to the machine-learning module, which returns the stream of values between 0 and 1. Note that a requirement of the considered use case is to perform a real-time identification of negative emotions. To meet this requirement, at each instant, the values corresponding to the last 3 s are sent to the segmentation module. This number is a hyperparameter of our system, which has been tuned to find a balance between delay and effectiveness of the system. Specifically, low values allow real-time processing, but also generally decrease the effectiveness of the system. The segmentation module relies on a heuristic approach that we developed to select only the most relevant transitions from negative to neutral states. The numerical results obtained by the proposed system are presented in Subsect. 4.3.
D. Andreoletti et al.
Fig. 3. Interaction between a trainee and industrial machinery represented in VR
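The windowing and time-domain feature extraction of Scenario 1 (2 s windows, 50% overlap, under-sampling by a factor of 5, derivatives up to the third order) can be sketched as below. This is only an illustration under stated assumptions: the signal is a single coordinate of one tracked point rather than the full head and hands trajectory, the sampling rate is not given in the text and is left to the caller, and all function names are hypothetical.

```python
from typing import List, Sequence


def slice_windows(signal: Sequence[float], window_len: int,
                  overlap: float) -> List[List[float]]:
    """Split `signal` into fixed-length windows sharing a fraction `overlap`."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, round(window_len * (1.0 - overlap)))
    return [list(signal[start:start + window_len])
            for start in range(0, len(signal) - window_len + 1, step)]


def undersample(window: Sequence[float], factor: int) -> List[float]:
    """Keep one sample out of every `factor` samples."""
    return list(window[::factor])


def discrete_derivative(values: Sequence[float], dt: float) -> List[float]:
    """First-order finite difference; the result is one sample shorter."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def window_features(window: Sequence[float], dt: float,
                    max_order: int = 3) -> List[List[float]]:
    """Return [trajectory, velocity, acceleration, third-order derivative].

    `dt` is the time between consecutive samples of `window`, i.e. the
    under-sampling factor divided by the (assumed) sampling rate.
    """
    features = [list(window)]
    current = list(window)
    for _ in range(max_order):
        current = discrete_derivative(current, dt)
        features.append(current)
    return features
```

With a hypothetical 50 Hz tracker, a 2 s window holds 100 samples, so one would call `slice_windows(signal, 100, 0.5)`, then `undersample(w, 5)` and `window_features(w, dt=5 / 50)` on each window. Raising `max_order` to 5 would give the higher-order derivatives used in Scenario 2.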
4.2
Scenario 2: Product Prototyping in VR
The second use case is carried out within the Innosuisse VR+4CAD project (https://artanim.ch/project/vr4cad/), whose aim is to increase the interoperability between CAD and VR technologies. By using the tools developed within VR+4CAD, product designers can automatically convert CAD projects into their virtual representation in VR. Users interact with such virtual representations and provide feedback on their usability. The VEE-Loop is used to identify when a user feels negative emotions (i.e., emotion recognition), which indicate a problematic interaction with the product. By collecting such implicit feedback, the designer understands how to precisely modify the product (i.e., emotion elicitation). The product that users interact with is an AC wall-mounted charger.

Specific Implementation of the VEE-Loop. The signal is sliced into overlapping rectangular windows of 4 s, which overlap with nearby windows for 2.5% of their length. Each window is under-sampled by a factor of 5 to reduce data redundancy. The following features have been extracted in the time domain: i) trajectory of head and hands within each window, ii) discrete first-order derivative (i.e., the velocity of the movements, or the difference between trajectories of two adjacent windows), iii) discrete second-order derivative (i.e., the acceleration of the movements), iv) discrete third-order derivative, v) discrete fourth-order derivative, and vi) discrete fifth-order derivative. Although the signal in the temporal domain already carries significant information for the considered inference task, other relevant information is more evident from the signal in the frequency domain. Hence, we also computed vii) the modulus (i.e., the magnitude) of the Fourier Transform.
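The frequency-domain feature, the magnitude of the Fourier Transform of a window, can be illustrated with a short sketch. It uses a naive discrete Fourier transform so that the example needs only the standard library; a production system would more plausibly rely on an FFT routine from a numerical library. The function name `dft_magnitude` and the choice to keep only the non-negative frequency bins are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft_magnitude(window: Sequence[float]) -> List[float]:
    """Return |X[k]| for k = 0 .. n//2 of a real-valued window.

    Naive O(n^2) discrete Fourier transform; for real input the
    remaining bins mirror these, so they are omitted.
    """
    n = len(window)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        magnitudes.append(abs(acc))
    return magnitudes
```

For a window containing a pure oscillation, the magnitude concentrates in the bin matching its frequency, which is the kind of pattern that is harder to read from the time-domain derivatives alone.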
These features are then fed to a machine-learning module that returns the stream of values between 0 and 1. This stream is fed to the segmentation module, which returns the central instants of the most likely negative emotions. To reduce noise and remove spurious artifacts, we applied the Savitzky-Golay filter [35] to the stream. Finally, we implemented a peak detection strategy on the filtered signal, which returns only the most significant instants, corresponding to the central timeframes of negative emotions. Product designers collect the instants at which negative emotions have most likely been experienced by users. From these instants, they can identify which parts of the product caused such emotions, and obtain useful information to modify the product in an informed manner.
4.3
Illustrative Numerical Results
Scenario 1. To train and validate the effectiveness of the VEE-Loop in the first scenario, we collected two datasets. Dataset 1 has been collected with the help of 6 actors and consists of 92 samples, whose average duration is 120 s. Dataset 2 has been collected with the help of 5 actors and consists of 40 samples, whose average duration is 180 s. Table 1 shows the F-score, Precision, and Recall computed on these datasets, for tolerance values T of 1, 3, and 5 s. From these numbers, we observe that the system is significantly more effective when trained and validated on Dataset 1 than on Dataset 2. This can be explained by the fact that emotions in the first dataset are more exaggerated than in the second one and, therefore, are easier to detect. In addition, precision and recall are perfectly balanced on Dataset 1, while on Dataset 2 the system shows a precision higher than the recall (i.e., the number of false negatives is higher than the number of false positives). This can also be explained by the fact that, as negative emotions in Dataset 2 are not exaggerated, they are more easily confused with neutral states. We note that, when a tolerance of 5 s is considered, the F-score is satisfactory on both datasets (i.e., 0.91 on Dataset 1 and 0.75 on Dataset 2). In our view, such a tolerance is sufficiently small to precisely correlate the inferred instants with the actions that a user is performing in the virtual scene, therefore enabling the trainer to understand what triggered a negative emotion (e.g., a piece of machinery that is difficult to use).

Scenario 2. To train and validate the effectiveness of the VEE-Loop in the second scenario, we collected a dataset consisting of 40 samples. Each sample represents the movements of a user in VR, which last, on average, 120 s. Table 1 shows the F-score, Precision, and Recall computed on this dataset, for tolerance values T of 1, 3, and 5 s. From these numbers, we observe that precision and recall are fairly balanced, regardless of the considered tolerance value. This means that the proposed system is biased neither toward negative emotions (which would be associated with a recall much higher than the precision) nor toward the neutral state (which would be associated with a precision much higher than the recall).
Moreover, we note that, when a tolerance level of 5 s is considered, the reached F-score is fairly high (i.e., 0.82). We argue that such tolerance is sufficiently small to precisely correlate the inferred instants with the actions that a user is doing in the virtual scene, therefore enabling designers to understand what triggered a negative emotion (e.g., a piece of the product difficult to use).

Table 1. Effectiveness of the detection of instants of negative emotions, for several tolerance values of T seconds, with T ∈ {1, 3, 5}

            Scenario 1, Dataset 1    Scenario 1, Dataset 2    Scenario 2
            T=1    T=3    T=5        T=1    T=3    T=5        T=1    T=3    T=5
Precision   0.57   0.88   0.91       0.21   0.74   0.78       0.32   0.77   0.87
Recall      0.57   0.88   0.91       0.18   0.70   0.74       0.32   0.72   0.81
F-score     0.57   0.88   0.91       0.19   0.71   0.75       0.32   0.73   0.82

5
Discussion of Findings
This section is dedicated to the discussion of our main research findings in light of the performed experiments. Along with this discussion, we also present several challenges that need to be properly tackled to make the VEE-Loop an effective design instrument. Finally, we elaborate on the application of the VEE-Loop, and on its potential impact across various areas.
5.1
Results of the Experiments
From our experiments, we found that VR is a suitable means to induce emotions in users. Indeed, we observed that users actually felt negative emotions whenever the interaction with the virtual scene was troublesome. In scenario 1 (see Subsect. 4.1), users felt negative emotions mainly when the goals set by the trainer were difficult to accomplish. In scenario 2 (see Subsect. 4.2), negative emotions generally occurred when pieces of the AC wall-mounted charger were difficult to manage and assemble. Future experiments will be devoted to further studying VR as a tool for emotion elicitation. In particular, we plan to investigate the effect that the number of iterations of the VEE-Loop has on the evoked emotion (e.g., users might get bored after long experiments), as well as the impact of changing the context of product usage.

Our experiments also show that VR is a suitable technology to perform emotion recognition from a stream of data carrying different emotional states. To our knowledge, such a scenario is still largely unexplored in the emotion recognition field, where signals are generally assumed to carry a single emotion. The proposed architecture, composed of emotion recognition and segmentation blocks, is therefore a novel solution, which proved capable of identifying instants of negative emotions with high accuracy (F-scores between 0.75 and 0.91 at a tolerance of 5 s; see Table 1). Hence, designers can effectively exploit the VEE-Loop to precisely identify which parts of their products need to be enhanced to avoid troublesome usage. In future experiments, we will consider a richer set of observations, spanning both users' bio-feedback (e.g., gestures and blood pressure) and behaviors. The main challenge for the former is the fusion of different types of data, while for the latter it is the definition of suitable experiments to gain relevant insights from users' behaviors (e.g., how to model their interaction with the virtual scene).
5.2
Areas of Application and Potential Impact
We identify the following three main areas of application: 1) product design, 2) virtual service delivery, and 3) research in emotion recognition and elicitation.

In product design, the VEE-Loop can be used to validate the capability of a product to evoke the desired emotions before its expensive tangible production. The sense of presence given by VR technologies guarantees that evoked emotions are similar to those induced using real products. Since the VEE-Loop is a portable technology, experiments can easily involve more users than traditional on-site experiments. Therefore, designers can get insights from both single users and large populations of customers (e.g., to understand which factors reinforce brand identification [27,28]).

In virtual service delivery, the VEE-Loop can help deliver services that are increasingly tailored to their users' emotions. For instance, in training activities, learning tasks can be adapted to the emotions of trainees. This is especially relevant in situations where information on users' emotional states is highly beneficial [16], but unavailable for some reason (e.g., in remote schooling during the Covid-19 pandemic).

Finally, the VEE-Loop can be used to perform research in emotion elicitation and recognition. In emotion elicitation, the VEE-Loop can be used to better study the effectiveness of elicitation conditions (e.g., in marketing, interior design, etc.). In emotion recognition, it can improve current approaches, for example by including other types of users' bio-feedback and their behaviors.

In the following, we discuss several impacts that the VEE-Loop can have:
– Economic Impact: by understanding users' emotional response to the characteristics of a product before its tangible development, product designers can apply modifications in an informed manner, which in turn reduces the risks (and associated costs) of creating unsuccessful products and services.
– Social and Environmental Impact: the VEE-Loop can be employed in remote teaching activities, especially those requiring interaction with expensive, cumbersome, and potentially dangerous machinery and robots. In industry, for example, the cost of expensive teaching activities can be significantly lowered by using virtual machinery instead of real machinery. This opens the door to a more widespread and fair adoption of sophisticated technologies. In addition, the possibility of performing remote teaching also in areas that previously required physical presence (e.g., learning how to use industrial machinery) limits the need for unnecessary travel and, in turn, the emissions produced by means of transport [37].
– Research Impact: the VEE-Loop can be used as a research instrument in the growing fields of EE and ER, which can be studied in more realistic scenarios than what is currently done (e.g., in the interaction with industrial machinery).
6
Conclusions
In this paper, we proposed a methodological framework for implementing emotion-driven design. This framework is based on the iteration of emotion elicitation and recognition phases through the aid of VR. Because of this, we named this framework the Virtual-Reality-Based Emotion-Elicitation-and-Recognition loop (VEE-Loop). Emotion elicitation and emotion recognition are required to experiment with various product characteristics, and to assess users' emotional reactions. These reactions are used by designers as implicit feedback to drive the modification of the product design. The iteration of emotion elicitation and recognition is performed to let designers converge to a version of the product that actually evokes a target emotion. In this work, we have provided arguments that support the claim that VR is the most suitable technology to implement such an iterative scheme. Indeed, VR allows creating virtual yet very realistic environments that designers can flexibly modify to induce specific emotional reactions. Moreover, in VR it is possible to easily gather data (e.g., users' bio-feedback) to infer users' emotional states.

We then developed a proof-of-concept implementation of the VEE-Loop, and we applied it to the use cases of personalized teaching in industrial manufacturing and product design. In these use cases, we performed experiments that showed the capability of the proposed system to correctly identify troublesome interactions between users and virtual content. These promising results show that the VEE-Loop has the potential to become an effective instrument for emotion-driven design, provided that several issues are properly handled. From a methodological perspective, the VEE-Loop needs to be validated over a comprehensive set of samples which, to handle the different expressions of emotions across cultures, should include users from many countries. From a technological perspective, additional users' bio-feedback (i.e., not only their movements) should be considered in the emotion recognition task. Finally, from a legal perspective, the treatment of sensitive information, such as that related to a person's emotions, should be performed carefully. Indeed, potential customers should be informed of the sensitivity of emotion-related data, and should grant explicit permission for its processing. Future work will mainly address the methodological and technological aspects.

Acknowledgments. This research was funded by the EU EIT Manufacturing initiative through the V-Machina project, and by the Innosuisse innovation project VR+4CAD. Many thanks to ArtAnim (Geneva) for the collaboration and support on this project, and to Green Motion SA (Lausanne) for providing us with the 3D model of the AC wall-mounted charger.
References
1. Badia, S.B., et al.: Toward emotionally adaptive virtual reality for mental health applications. IEEE J. Biomed. Health Inform. 23, 1877–1887 (2018). https://doi.org/10.1109/jbhi.2018.2878846
2. Hossain, M.S., Muhammad, G., Song, B., Hassan, M.M., Alelaiwi, A., Alamri, A.: Audio-visual emotion-aware cloud gaming framework. IEEE Trans. Circuits Syst. Video Technol. 25, 2105–2118 (2015). https://doi.org/10.1109/TCSVT.2015.2444731
3. Nagamachi, M.: Kansei engineering: a new ergonomic consumer-oriented technology for product development. Int. J. Ind. Ergon. 15, 3–11 (1995). https://doi.org/10.1016/0169-8141(94)00052-5
4. Luh, P.B., Liu, F., Moser, B.: Scheduling of design projects with uncertain number of iterations. Eur. J. Oper. Res. 113(3), 575–592 (1999). https://doi.org/10.1016/S0377-2217(98)00027-7
5. Stals, S.: Exploring emotion, affect and technology in the urban environment. In: Proceedings of the 2017 ACM Conference Companion Publication on Designing Interactive Systems, pp. 404–406 (2017). https://doi.org/10.1145/3064857.3079172
6. Dou, R., Zhang, Y., Nan, G.: Iterative product design through group opinion evolution. Int. J. Prod. Res. 55(13), 3886–3905 (2017). https://doi.org/10.1080/00207543.2017.1316020
7. Munoz, S., Araque, O., Sánchez-Rada, J.F., Iglesias, C.A.: An emotion aware task automation architecture based on semantic technologies for smart offices. Sensors 18(5), 1499 (2018). https://doi.org/10.3390/s18051499
8. Condori-Fernandez, N.: HAPPYNESS: an emotion-aware QoS assurance framework for enhancing user experience. In: 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), pp. 235–237 (2017). https://doi.org/10.1109/ICSE-C.2017.137
9. Sindhu, N., Jerritta, S., Anjali, R.: Emotion driven mood enhancing multimedia recommendation system using physiological signal. IOP Conf. Ser. Mater. Sci. Eng. 1070(1), 012070 (2021). https://doi.org/10.1088/1757-899X/1070/1/012070
10. Mariappan, M.B., Suk, M., Prabhakaran, B.: FaceFetch: a user emotion driven multimedia content recommendation system based on facial expression recognition. In: 2012 IEEE International Symposium on Multimedia, pp. 84–87 (2012). https://doi.org/10.1109/ISM.2012.24
11. Polignano, M., Narducci, F., De Gemmis, M., Semeraro, G.: Towards emotion-aware recommender systems: an affective coherence model based on emotion-driven behaviors. Expert Syst. Appl. 170, 114382 (2021). https://doi.org/10.1016/j.eswa.2020.114382
12. Sinek, S.: Start with Why: How Great Leaders Inspire Everyone to Take Action. Penguin, New York (2009)
13. Desmet, P.: Designing emotions. Delft University of Technology, Department of Industrial Design (2002). https://doi.org/10.4236/msce.2020.84008
14. Liu, Y., Sourina, O., Hafiyyandi, M.R.: EEG-based emotion-adaptive advertising. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pp. 843–848 (2013). https://doi.org/10.1109/ACII.2013.158
15. Riva, G., et al.: Affective interactions using virtual reality: the link between presence and emotions. Cyberpsychol. Behav. 10(1), 45–56 (2007). https://doi.org/10.1089/cpb.2006.9993
16. Ortigosa, A., Martín, J.M., Carro, R.M.: Sentiment analysis in Facebook and its application to e-learning. Comput. Hum. Behav. 31, 527–541 (2014). https://doi.org/10.1016/j.chb.2013.05.024
17. Mehrabian, A.: Pleasure-arousal-dominance: a general framework for describing and measuring individual differences in temperament. Current Psychol. 14(4), 261–292 (1996). https://doi.org/10.1007/BF02686918
18. Kim, C., Yoon, J., Desmet, P., Pohlmeyer, A.: Designing for Positive Emotions: Issues and Emerging Research Directions. Ashgate Publishing Ltd. (2021). https://doi.org/10.1080/14606925.2020.1845434
19. Frijda, N.H.: The Emotions. Cambridge University Press, Cambridge (1986)
20. Desmet, P.M.A., Fokkinga, S.F., Ozkaramanli, D., Yoon, J.: Emotion-driven product design. Emot. Meas., 405–426 (2016). https://doi.org/10.1016/B978-0-08-100508-8.00016-3
21. Alaniz, T., Biazzo, S.: Emotional design: the development of a process to envision emotion-centric new product ideas. Procedia Comput. Sci. 158, 474–484 (2019). https://doi.org/10.1016/j.procs.2019.09.078
22. Zaltman, G.: The subconscious mind of the consumer (and how to reach it). Harvard Business School. Working Knowledge (2003). http://hbswk.hbs.edu/item/3246.html
23. Wrigley, C., Straker, K.: Affected: Emotionally Engaging Customers in the Digital Age. Wiley, Milton (2019)
24. Marín Morales, J.: Modelling human emotions using immersive virtual reality, physiological signals and behavioural responses (2020)
25. Saxena, A., Khanna, A., Gupta, D.: Emotion recognition and detection methods: a comprehensive survey. J. Artif. Intell. Syst. 2(1), 53–79 (2020). https://doi.org/10.33969/AIS.2020.21005
26. Susindar, S., Sadeghi, M., Huntington, L., Singer, A., Ferris, T.K.: The feeling is real: emotion elicitation in virtual reality. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 63, no. 1, pp. 252–256 (2019). https://doi.org/10.1177/1071181319631509
27. De Luca, V.: Emotions-based interactions: design challenges for increasing wellbeing (2016). https://doi.org/10.13140/RG.2.2.27698.81609
28. De Luca, V.: Oltre l'interfaccia: emozioni e design dell'interazione per il benessere. MD J. 1(1), 106–119 (2016)
29. Diemer, J., Alpers, G.W., Peperkorn, H.M., Shiban, Y., Mühlberger, A.: The impact of perception and presence on emotional reactions: a review of research in virtual reality. Front. Psychol. 6, 26 (2015). https://doi.org/10.3389/fpsyg.2015.00026
30. Marín-Morales, J., et al.: Real vs. immersive-virtual emotional experience: analysis of psycho-physiological patterns in a free exploration of an art museum. PloS One 14, 10 (2019). https://doi.org/10.1371/journal.pone.0223881
31. Buccoli, M., Zanoni, M., Sarti, A., Tubaro, S., Andreoletti, D.: Unsupervised feature learning for music structural analysis. In: 2016 24th European Signal Processing Conference (EUSIPCO), pp. 993–997 (2016). https://doi.org/10.1109/EUSIPCO.2016.7760397
32. Fung, K.Y., Kwong, C.K., Siu, K.W.M., Yu, K.M.: A multi-objective genetic algorithm approach to rule mining for affective product design. Expert Syst. Appl. 39(8), 7411–7419 (2012). https://doi.org/10.1016/j.eswa.2012.01.065
33. Andreoletti, D., Luceri, L., Peternier, A., Leidi, T., Giordano, S.: The virtual emotion loop: towards emotion-driven product design via virtual reality. In: 2021 16th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 371–378 (2021). https://doi.org/10.15439/2021F120
34. Fishwick, M.: Emotional design: why we love (or hate) everyday things. J. Am. Cult. 27(2), 234–236 (2004). https://doi.org/10.1108/07363760610655069
35. Press, W.H., Teukolsky, S.A.: Savitzky-Golay smoothing filters. Comput. Phys. 4(6), 669–672 (1990). https://doi.org/10.1063/1.4822961
36. Boonjing, V., Pimchangthong, D.: Data mining for positive customer reaction to advertising in social media. Info. Technol. Manag. Ongoing Res. Dev., 83–95 (2017). https://doi.org/10.15439/2017F356
37. Ziemba, E.: Synthetic indexes for a sustainable information society: measuring ICT adoption and sustainability in Polish government units. In: Ziemba, E. (ed.) AITM/ISM 2018. LNBIP, vol. 346, pp. 214–234. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15154-6_12
38. Floris, C., et al.: Feasibility of heart rate and respiratory rate estimation by inertial sensors embedded in a virtual reality headset. Sensors 20(24), 1424–8220 (2020). https://doi.org/10.3390/s20247168
Solutions to Social Issues
Artificial Intelligence Project Success Factors—Beyond the Ethical Principles
Gloria J. Miller
maxmetrics, Heidelberg, Germany
Abstract. The algorithms implemented through artificial intelligence (AI) and big data projects are used in life-and-death situations. Despite research that addresses varying aspects of moral decision-making based upon algorithms, the definition of project success is less clear. Nevertheless, researchers place the burden of responsibility for ethical decisions on the developers of AI systems. This study used a systematic literature review to identify five categories of AI project success factors in 17 groups related to moral decision-making with algorithms. It translates AI ethical principles into practical project deliverables and actions that underpin the success of AI projects. It considers success over time by investigating the development, usage, and consequences of moral decision-making by algorithmic systems. Moreover, the review reveals and defines AI success factors within the project management literature. Project managers and sponsors can use the results during project planning and execution.

Keywords: Algorithms · Artificial intelligence · Big data · Critical success factors · Moral decision-making · Project management
This work was not supported by any organization.
© The Author(s) 2022. E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 65–96, 2022. https://doi.org/10.1007/978-3-030-98997-2_4

1 Introduction

Algorithmic decision-making is replacing or augmenting human decision-making across many industries and functions [1, 2]. The decisions range from trivial to life and death. For example, predicting the customer reaction to a social media advertisement [3] is insignificant compared to diagnosing Parkinson's disease [4], breast cancer [5], or COVID-19 [6]. An "algorithm is a defined, repeatable process and outcome based on data, processes, and assumptions" [7]. Algorithms can take many forms and have been used in decision-making for centuries. Today, data-driven algorithms are usually the result of artificial intelligence (AI) or big data projects. Scholars anticipate that AI will significantly impact society, generating productivity and efficiency gains and changing the way people work [8]. Understanding success factors in AI projects is critical given the considerable impact on individuals, society, and the environment.

Sponsoring organizations invest in AI projects with the expectation that they will deliver measurable, meaningful benefits such as revenue or productivity gains [9]. These benefits are usually realized long after the projects are completed and the algorithms are used. However, the on-time and cost limits of the task or the goal orientation of projects create the risk that the interests of significant stakeholders may not be considered. Thus, weighing short-term project objectives against long-term social and environmental consequences raises essential questions about the definition of project success. While the literature acknowledges the importance of client consultation and client acceptance as critical success factors, the public interest is not considered in the most referenced project management success model [10].

Mitchell, Agle, and Wood [11] argue that managers should serve legitimate stakeholders' legal and moral interests. However, the technology view of moral decision-making with AI does not consider non-technical stakeholders, or the ethical principles are so broad that they are not practical for developers [12–14]. Manders-Huits [15] contends that the notion of consequences and the level of autonomy of action are preconditions or considerations for moral responsibility, arguing that the burden of responsibility for moral decisions is on the system designers' shoulders. Martin [16] makes a similar case, stating that "developers are those most capable of enacting change in the design and are sometimes the only individuals in a position to change the algorithm." However, Wachnik [17] warns that a high level of information asymmetry creates a moral hazard when the supplier (i.e., developers) and the client are from differing organizations. Moral hazards are defined as the ineffectiveness or the abuse of trust created by opportunistic behavior [17]. Thus, while research exists to address varying aspects of ethics and morality in AI projects, a practical definition of project success is needed that addresses all legitimate stakeholders.
This research uses a systematic review of the literature to answer a novel question regarding success factors in AI projects: what are the project success factors for moral decision-making with algorithms? It addresses the lack of literature that translates AI ethical principles into practice [14] and introduces AI success factors into the project management literature. The paper is structured as follows. Section two provides a literature review. Section three includes the theoretical framework and a description of the research methodology, and section four outlines the findings. Section five discusses the results, and section six provides conclusions, including the study's contribution, implications, limitations, and considerations for future research.
2 Literature Review

2.1 Project Success Factors

Projects are temporary endeavors with an end date planned from the beginning. Thus, the project objectives and success criteria should be agreed upon with stakeholders before starting a project and be reflected in the project objectives, business case, and investments [18, 19]. Otherwise, the long-term needs of internal and external stakeholders may conflict with the project's temporality and end date [19].

Project success refers to delivering the expected output, achieving the intended objective, and meeting stakeholder requirements. Project efficiency is a subset of project success and is defined as project management success in relation to time, costs, and quality (the iron triangle) [18, 20]. Within a project, success criteria and success factors are the dimensions of success [18, 20]: criteria measure success while factors identify the circumstances, conditions, and events for reaching the objectives. Consequently, the efficiency of a project can be measured once the outputs are produced. In contrast, project benefits and organizational performance impacts can be measured after the project outputs are put into use, fully or incrementally.

The project management critical success factor model drawn from Pinto and Slevin [21] is the most referenced [18]. It outlines ten success factors that the project team controls and four factors that influence project success but are not under their control. Davis [10] identifies three gaps with the Pinto and Slevin [21] model: project efficiency, accountability and stakeholder involvement, and benefits to stakeholder groups. The project success model from Turner and Zolin [20] examines how stakeholders perceive success after the project is completed. It was the first model to look at success outside the typical project life cycle and simultaneously consider multiple stakeholders [10]. It defines the project results at multiple timescales: project outputs at the end of the project, outcomes months after the end, and impacts years after the end. The success model from Shenhar, Dvir, Levy, and Maltz [9] identifies five dimensions of success: project efficiency, impact on the customer, business success, impact on the team, and preparation for the future. It also acknowledges the importance and relevance of shifts in the measures over time, with efficiency being important during the project but less so once it is complete. Since each project management success model argues that success depends on the project objectives and type [9], success models from information technology projects were further investigated.
The investigation included models for projects to digitally transform a specific area of a company (digitalization projects) [22]; software engineering projects [23, 24]; projects to implement and customize standardized information systems (enterprise resource planning) [13, 25, 26]; information systems, big data analytics, and business intelligence implementation projects [27–29]; and artificial intelligence and robot projects [30]. Though phrased differently depending on the subject material, customer consultation and acceptance were success factors in all models. System usage was considered a prerequisite to realizing organizational performance benefits or societal impacts. Using the framework from Pinto and Slevin [21], Miller [27] concludes that ethical knowledge is a specialized skill needed by project personnel. The author also questions the role of moral decision-making in extreme situations and identifies ethical concerns as a risk to manage. In summary, stakeholder involvement and stakeholder benefits were concepts in some models [10, 20]; otherwise, the ethical and moral interests related to the public interest were not considered success factors in the reviewed models.

2.2 AI Projects

Algorithm Methods and Techniques. The methods and techniques used for data-driven algorithm development encompass multiple disciplines or branches within computer science and statistics. They include machine learning and deep learning techniques to learn from data and define algorithms [27, 31, 32]. Machine learning uses supervised and unsupervised methods to discover and model the patterns and relationships in data,
G. J. Miller
allowing it to make predictions. Deep learning uses machine-learning approaches to automatically learn and extract features from complex unverified data without human involvement [27, 31]. Machine-learning methods such as regression, discriminant analysis, and time series analysis are mature methods with many decades of usage [33]. Other approaches have emerged in the last few decades, such as natural language processing (NLP) and artificial neural networks (ANN). NLP covers automated methods to process and manipulate human language [32, 33]. Conceptually inspired by the human brain’s biological neurons and synapses, ANN models are trained on past data and use pattern recognition to make predictions [32, 33]. Other methods or combinations of methods exist for dealing with imprecise or vague data or approximating reasoning, e.g., artificial neural networks with fuzzy logic [33]. Each method requires data for learning. Most methods require high-performance computing systems and architectures [27]. The resulting algorithms are incorporated into computer systems (hereafter referred to as AI systems); technologies such as big data, predictive analytics, business intelligence, data mining, advanced analytics, and digitization are used in the system development.

Project Lifecycle. The algorithmic decision-making life cycle can have three stages: development, usage, and consequence. The development stage produces an AI system in four steps [16, 27, 34]. The first step is the planning and design of the system. In the second step, the source data are collected from multiple sources; made fit for purpose, including augmenting the data with tags, identifiers, or metadata; and stored in data repositories. For the third step, subsets of source data are transformed to train the models (subsequently referred to as training data). The models and algorithms are developed through extensive data and analytical methods.
Here high-performance computing is needed to support the computational load and data volumes [27]. In the fourth step, the algorithms are verified and validated. A user interface (UI) is developed to produce autonomous decisions or provide input for human decision-making. This step may include other technical aspects, such as system deployment. However, though relevant, these topics are not the main focus of this study. The usage stage includes the operation and monitoring of the system [16, 27, 34]. The algorithms are used by inputting parameters or data to invoke them; the algorithms then output the decisions. The algorithm or AI system may be a standalone system or integrated into other systems, robots, automobiles, or digital technology platforms such as a social media platform. The algorithms may produce the output with or without the user’s knowledge of their existence. They are monitored over time for their effectiveness and retrained or otherwise adjusted. In the consequences stage, the decision is finalized, and the consequences begin to impact people, organizations, and groups (i.e. stakeholders).
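The development steps and the usage stage described above can be illustrated with a small, self-contained sketch. It is not taken from any of the reviewed studies: the data values, the `source` tags, the one-variable least-squares model, and all function names are hypothetical and serve only to show how source data, training data, validation, and usage relate to one another.

```python
from statistics import mean

# Step 2: source data collected from several sources and tagged with
# metadata (all values are made up for illustration).
source_data = [
    {"x": 1.0, "y": 2.1, "source": "system_a"},
    {"x": 2.0, "y": 3.9, "source": "system_a"},
    {"x": 3.0, "y": 6.2, "source": "system_b"},
    {"x": 4.0, "y": 7.8, "source": "system_b"},
    {"x": 5.0, "y": 10.1, "source": "system_a"},
    {"x": 6.0, "y": 11.9, "source": "system_b"},
]

# Step 3: a subset of the source data becomes the training data.
training_data = source_data[:4]
validation_data = source_data[4:]


def train(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar = mean(r["x"] for r in rows)
    y_bar = mean(r["y"] for r in rows)
    sxy = sum((r["x"] - x_bar) * (r["y"] - y_bar) for r in rows)
    sxx = sum((r["x"] - x_bar) ** 2 for r in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(model, x):
    """Apply the fitted line to one input value."""
    slope, intercept = model
    return slope * x + intercept


def validate(model, rows):
    """Step 4: mean absolute error on data not used for training."""
    return mean(abs(predict(model, r["x"]) - r["y"]) for r in rows)


model = train(training_data)
validation_error = validate(model, validation_data)

# Usage stage: the algorithm is invoked with an input and returns an
# output that a person or another system may then act on.
decision_input = 7.0
decision_output = predict(model, decision_input)
```

In a real project the validation error would be compared with an agreed acceptance threshold before deployment, and the same check would be repeated during monitoring to decide when retraining is needed.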
2.3 Morality and Ethics in AI

Jones [35] defines a moral issue as one where a person’s actions, when freely performed, have consequences (harms or benefits) for others. The moral issue must involve a choice on the part of the actor or the decision-maker. Jones also states that many decisions have
a moral component—a moral agent is a person who makes a decision even when the decision-maker may not recognize that a moral issue is at stake. An ethical decision is legally and morally acceptable to the larger community; an unethical decision violates legal or moral codes. Much of the research reviewed treats the terms moral and ethical as equivalent and uses them interchangeably depending on the context. The thesis from Anscombe [36] holds that the concepts “right” and “wrong” should be discarded and replaced with a definition of morality in terms of “intrinsically unjust” versus “unjust given the circumstances.” The author argues that the boundary between the two concepts is “according to what’s reasonable.” Anscombe [36] further theorizes that determining the expected consequences plays a part in determining what is just. These arguments place the responsibility for morality on the decision-maker. However, they do not define who is accountable when the decisions are delegated from humans to systems. Shaw, Stöckel, Orr, Lidbetter, and Cohen [37] argue that machines are artificial agents that should not be held to a higher moral standard than humans; they define four meta-moral qualities that machines should possess to be considered proper moral agents (robustness, consistency, universality, and simplicity). Manders-Huits [15] contends that the notion of consequences and level of autonomy of action are preconditions or considerations for moral responsibility. First, the notion of consequences in information technology (IT) places the burden of responsibility for moral decisions on the shoulders of those who design complex IT systems. However, the definition of designers is unclear—including technicians, finance providers, and instructors—and the relationship between the responsibility of designers compared to end-users in final decision-making is also opaque. 
Martin [16] also places the responsibility for moral decision-making in the hands of system developers and their companies. Second, the abundance of information that individuals have and understand enhances their ability to act autonomously. However, the actions or decisions integrated into IT applications are limited based on “implying an adequate understanding of all relevant propositions or statements that correctly describe the nature of the action and the foreseeable consequences of the action” [15]. Moreover, it is not likely that modelers can predict all potential uses of their models [38].

Wachnik [17] identifies several incidents of moral hazard in IT projects when the supplier and the client are from differing organizations, classified by ineffectiveness or the abuse of trust created by a high level of information asymmetry in a supplier-agent relationship or the opportunistic behavior of the supplier. The study reveals unwanted behaviors of the supplier that introduce a risk of project failure, increase the transaction cost for the client, and result in a loss of mutual trust.

Much research has focused on defining values, principles, frameworks, and guidelines for ethical AI development and deployment [14, 39]. However, Mittelstadt [14] maintains that principles alone have a limited impact on AI design and governance. In an analysis of 21 AI ethics guidelines, Hopkins and Booth [40] similarly find that such regulations are ineffective and do not change the behavior of professionals throughout the technology community. One challenge is the difficulty in translating concepts, theories, and values into practice. Specifically, the translation process is likely to “encounter incommensurable moral norms and frameworks which present true moral dilemmas that principles cannot resolve” [14]. Furthermore, there are no proven methods
to translate principles into practice. For example, Mittelstadt [14] warns that the solution to AI ethics should not be oversimplified to AI technical design or expertise alone. Jobin, Ienca, and Vayena [39] conducted a content analysis of 84 AI ethical guidelines and identified five ethical principles that have converged globally (transparency, justice and fairness, non-maleficence, responsibility, and privacy). Building on this research [39], Ryan and Stahl [12] provide a detailed explanation of the normative implications of AI ethics guidelines for developers and organizational users. Their paper defines AI ethical principles and describes what users and developers should do to realize their moral responsibilities. However, it explicitly excludes other stakeholders. The study distinguishes 11 AI ethical categories: beneficence, dignity, freedom and autonomy, justice and fairness, non-maleficence, privacy, responsibility, solidarity, sustainability, transparency, and trust.
3 Research Methodology

3.1 Theoretical Framework

To answer the research question, this review seeks the deliverables, acts, or situations (success factors) necessary to avoid harm or ensure the benefits of an algorithm developed in projects. Thus, the project success model from Turner and Zolin [20] is relevant for identifying success factors, and the ethical principles from Ryan and Stahl [12] are useful to quantify and contextualize the success factors.

The model from Turner and Zolin [20] attempts to forecast project success beyond the initial project outputs, recognizing that project outcomes and impacts, and stakeholder interests, change over time. It was chosen for four key reasons. First, the model focuses on projects, which are bound by time, team, tasks, and activity. These boundaries limit environmental considerations and give context to the investigation. This approach is relevant as personal experience, organizational norms, industry norms, and cultural norms affect stakeholders’ perceived alternatives, consequences, and importance. Second, decisions made during the project will have an impact many months or years in the future. However, project participants may not be aware of the magnitude of the consequences of their decisions initially, particularly in terms of harms or benefits for victims or beneficiaries. Thus, it is essential to consider the multiple time dimensions available in the model. Third, stakeholders influence the project’s planning and outputs and are impacted by the results. Thus, multiple stakeholder perspectives are useful to consider the influence of decision-making and the resulting impact on the stakeholders. Finally, the model outlines the multiple types of success indicators that should be included in the investigation. Consequently, the algorithm development, usage, and consequence stages and the AI components were aligned with the timescales of the model from Turner and Zolin [20].
Algorithm development aligns with the output, algorithm usage with the outcome, and decision-making consequences with the impact. Table 1 identifies the alignment of AI components according to the time scales. The study from Ryan and Stahl [12] categorizes and subcategorizes ethical principles and concepts. It was chosen for the thematic mapping of success factors to ethical principles for three reasons. First, it includes a comprehensive set of categories by building
Table 1. Project timescales and AI components

| Time scales [20] | AI component |
|---|---|
| Output (end of project) | Source Data, Data Collection, and Storage; Training Data; Model and Algorithm Development and Validation; User Interface Development; System Architecture and Configuration |
| Outcome (plus months) | Input Interface Usage; Model and Algorithm Usage; Monitoring |
| Impact (plus years) | Decisions |
on the 84 AI ethical guidelines reviewed by Jobin, Ienca, and Vayena [39]. Second, the subcategories contain semantically similar concepts that were not merged into a single concept. This method provides rich descriptions that permit the qualitative mapping of individual success factors to the most relevant ethical concept. Third, Ryan and Stahl [12] argue that the meanings within the corpus of AI ethics guidelines should include activities that are morally appropriate for developers and users. This argument aligns with this study’s goal to understand the activities necessary for project success.

3.2 Systematic Review Procedure

A single researcher explored the research question through a systematic review of the literature, defined as “a review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from the studies that are included in the review” [41]. The purpose of the systematic review was to synthesize existing knowledge in a structured and rigorous manner. The procedure included 1) identifying the bibliographic databases from which to collect the literature, 2) defining the search process, including the keywords and the search string, 3) defining the inclusion and exclusion criteria, 4) removing duplicates and screening the articles, 5) extracting data based on a full-text review of the articles, and 6) synthesizing the data using a coherent coding method. The details are described in the following sections, and Fig. 1 shows the flow of information through the systematic review.

Bibliographic Databases. The first literature search was begun in October 2020 to collect peer-reviewed articles from the ProQuest, Emerald, ScienceDirect, and IEEE Xplore bibliographic databases. The focal keywords were “algorithm” and “stakeholder.” This search revealed key themes in how success was viewed in algorithmic projects.
Keywords such as ethics, fairness, accountability, transparency, and explainability were frequently referenced in the articles. The analysis also identified the “ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)” as an important source for cross-disciplinary research. Thus, subsequent bibliographic searches were undertaken in March, July, and December 2021, adding the ACM Digital Library to the previous
bibliographic databases. Articles from the initial search and the FAccT conference were retained for analysis. Table 2 shows the article distribution by database.

Search String. The final search emphasized “accountability” instead of “stakeholder.” “Stakeholder” in combination with “algorithm” was not a frequent keyword. Accountability focuses on the relationship between project actors and those to whom the actors should be accountable [42]. Other frequent keywords were also included in the search string to make the results meaningful. Since not all databases allowed wildcards, variations of the search string were used, and adjustments were made in the syntax for each search engine. There were adjustments for typographical errors and refinements between database searches. The wildcard version of the last used search is as follows:

All = accountab* AND Title = (“machine learning” OR “artificial intelligence” OR AI OR “big data” OR algorithm*) AND Title = (fair* OR ethic* OR moral* OR success OR transparency OR expla* OR accountab*)

Inclusion and Exclusion Criteria. Peer-reviewed journal articles and conference papers in the English language were retained in the search results; book reviews were excluded. There were no filters for the dates. Duplicate entries and entries with no available document were removed. Next, literature was excluded or retained in an iterative process based on a review of the title, the abstract, and the full article text. First, the titles of the articles were reviewed, and articles were retained that considered the process or considerations for the development, use, or outcomes of algorithms. Articles were excluded if they discussed the structure or content of an algorithm or a specific use case, or if they had been wrongly identified as articles (e.g., magazine articles and panel descriptions). Next, the abstracts were reviewed to determine if the article could yield information on the success factors.
Finally, the full text of the included articles was reviewed and coded to answer the research question. Articles from the initial search and the FAccT conference that did not appear in subsequent searches were retained for analysis. In total, the full text of 144 articles was analyzed; a subset of 55 articles introduced relevant and novel details and is included in the summary tables. The majority of the analyzed articles (64%) were published after 2019, and many (40%) were conference papers. Figure 1 shows the preferred reporting items for systematic reviews and meta-analyses (PRISMA) process flow.

Coding Strategy and Data Analysis. Each of the 144 articles was reviewed in detail to code its success factors. The coding was conducted in NVivo 12 (Windows) software. We extracted terms and explanations to determine what was known about how different stakeholders viewed success; we then used the literature to clarify the definitions, provide examples, determine the main elements of success, and develop context. We compiled a resulting list of success factors that must be delivered: acts or situations that contributed to a positive outcome with algorithm decision-making projects. Then, where possible, each success factor was mapped to one or more ethical principles, as described in Ryan and Stahl [12]. The success factors were qualitatively grouped based on their common characteristics or responsibility patterns. Subsequently, the relationship between the
Fig. 1. PRISMA process flow. Stages: Identification, Screening, Eligibility, Included. Records from database search (n = 932) and records from initial search (n = 26) gave total records (n = 958). Duplicates and missing studies were excluded (n = 42), leaving articles screened by title per database (n = 916). Articles excluded due to title (n = 390) left articles screened by abstract (n = 526). Articles excluded due to abstract (n = 382) left studies included in synthesis (n = 144), of which a subset was coded in success factors (n = 55).
Table 2. Article distribution by database

| Database | Search results | Duplicates/Exclusions | Screened by title | Screened by abstract | Full text reviewed |
|---|---|---|---|---|---|
| ACM | 300 | 5 | 295 | 179 | 57 |
| Emerald | 25 | 5 | 20 | 20 | 5 |
| IEEE Xplore | 143 | 3 | 140 | 118 | 14 |
| ProQuest | 299 | 13 | 286 | 137 | 35 |
| ScienceDirect | 165 | 4 | 161 | 58 | 19 |
| Initial search | 26 | 12 | 14 | 14 | 14 |
| Total | 958 | 42 | 916 | 526 | 144 |
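The screening counts reported in Fig. 1 and Table 2 can be cross-checked with simple arithmetic. The following sketch copies the published numbers into a small data structure and tests whether the stages are consistent with one another; the variable names are illustrative and not part of the original study.

```python
# Per-database counts copied from Table 2, in column order:
# (search results, duplicates/exclusions, screened by title,
#  screened by abstract, full text reviewed)
table_2 = {
    "ACM": (300, 5, 295, 179, 57),
    "Emerald": (25, 5, 20, 20, 5),
    "IEEE Xplore": (143, 3, 140, 118, 14),
    "ProQuest": (299, 13, 286, 137, 35),
    "ScienceDirect": (165, 4, 161, 58, 19),
    "Initial search": (26, 12, 14, 14, 14),
}

# Column totals across all sources.
totals = tuple(sum(column) for column in zip(*table_2.values()))

# Exclusion counts reported in Fig. 1.
excluded_by_title = 390
excluded_by_abstract = 382

checks = {
    "column totals match the Total row": totals == (958, 42, 916, 526, 144),
    "records minus duplicates": totals[0] - totals[1] == totals[2],
    "title screening": totals[2] - excluded_by_title == totals[3],
    "abstract screening": totals[3] - excluded_by_abstract == totals[4],
    "each row is internally consistent": all(
        found - removed == by_title
        for found, removed, by_title, _, _ in table_2.values()
    ),
}

for name, passed in checks.items():
    print(f"{name}: {'ok' if passed else 'MISMATCH'}")
```

A check of this kind is a cheap way to catch transcription errors when a flow diagram and a summary table are maintained separately.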
groups of success factors was theorized, and a relationship model was defined. The results are summarized in the Research Findings section.
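The wildcard search string given in the Search String paragraph can also be expressed as a small filter over bibliographic records. This is only a sketch of the Boolean logic: the actual searches ran in each database’s own search engine, and the record fields, the sample records, and the function names here are hypothetical. Case-insensitive matching is a simplifying assumption.

```python
import re


def term_to_regex(term):
    """Turn a search term such as 'fair*' or 'big data' into a regex."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"   # trailing wildcard
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def any_term(terms, text):
    """True if at least one term of an OR group occurs in the text."""
    return any(term_to_regex(t).search(text) for t in terms)


# The three groups that the search string joins with AND.
ALL_FIELDS = ["accountab*"]
TITLE_TOPIC = ["machine learning", "artificial intelligence", "AI",
               "big data", "algorithm*"]
TITLE_THEME = ["fair*", "ethic*", "moral*", "success", "transparency",
               "expla*", "accountab*"]


def matches(record):
    """Apply the three AND-ed groups to one bibliographic record."""
    everything = " ".join(record.values())
    return (any_term(ALL_FIELDS, everything)
            and any_term(TITLE_TOPIC, record["title"])
            and any_term(TITLE_THEME, record["title"]))


# Hypothetical records, for illustration only.
records = [
    {"title": "Explaining algorithmic decisions",
     "abstract": "We discuss accountability for automated systems."},
    {"title": "Fairness in machine learning",
     "abstract": "A survey of bias metrics."},
    {"title": "Supply chain optimization",
     "abstract": "Accountable procurement with big data."},
]

selected = [r["title"] for r in records if matches(r)]
```

Only the first record satisfies all three groups: the second never mentions accountability, and the third has no matching topic or theme term in its title.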
3.3 Validity and Reliability

This approach of defining elements is an acceptable method for placing boundaries around the meaning of a term [41]. First, external validity was ensured by using a theoretical model as a guiding framework for the thematic analysis. Second, validity was maintained by using the literature as primary and validation sources. The success factors were mapped at a detailed level to the original sources in the literature. The results were cross-validated with previous AI ethics literature reviews [12] to ensure completeness. The checklist and phase flow from the PRISMA statement were used to guide the study and report the results [41].
4 Research Findings

The literature review identified 82 success factors that were qualitatively consolidated in an iterative process into five broad categories and 17 groups. The meaning of each category is provided in Table 3. Table 4 identifies the success factors and references them according to success category and group. The ethical principles are shown next to each success factor based on mapping them to the 11 ethical categories from Ryan and Stahl [12]. Figure 2 visualizes the intersection between the success factors and the ethical principles, each column being one of the 11 ethical categories. The colors represent the percentage and the numbers the count of success factors that address the principle. The results describe the practical requirements for success with AI development and usage based on the moral issues and ethical principles found in the literature.

Table 3. Success category descriptions

| Category | Description |
|---|---|
| Project Governance | Deciding the scope and managing AI implementation or operational resources |
| Product Quality | Incorporating ethical principles into the design, development, and documentation |
| Usage Qualities | Establishing usage practices for ethical decision-making with a given AI system |
| Benefits and Protections | Protecting and benefiting from investments in the project and the AI system |
| Societal Impacts | Protecting individual rights and the environment |
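Table 4. Success groups, success factors, and ethical principles

| Category | Success group | Success factor | Principle(s) | Ref(s) |
|---|---|---|---|---|
| Project Governance | Project Management | Scope definition document | BE DY JF NM RY SO TY | [12, 43–47] |
| Project Governance | Project Management | Risk assessment records | FA NM RY | [45] |
| Project Governance | Project Management | Responsibility assignment matrix | JF NM RY SU TT | [12, 47–49] |
| Project Governance | Project Management | Community engagement | JF SO | [38, 50] |
| Project Governance | Project Management | Disclosure records | JF TY | [8, 45, 51, 52] |
| Project Governance | Project Management | Standards and guidelines, Diverse working environment | JF | [12, 14, 38, 51, 53, 54] |
| Project Governance | Project Management | Model risk assessment | NM | [55] |
| Project Governance | Project Management | Recordkeeping, Procurement records | RY | [12, 45, 51, 56] |
| Project Governance | Ethical Practices | Ethics policies | JF NM RY TY | [2, 12, 14, 51] |
| Project Governance | Ethical Practices | Ombudsman | JF NM RY | [12, 42, 53] |
| Project Governance | Ethical Practices | Professional membership | JF | [14] |
| Project Governance | Ethical Practices | Ethics training | RY | [2, 12, 51] |
| Project Governance | Investigation | Algorithm impact assessment | BE JF NM TY | [42, 53, 57] |
| Project Governance | Investigation | Audit response records | JF NM RY TY | [2, 45] |
| Project Governance | Investigation | Audit finding records | JF NM RY TY | [45] |
| Project Governance | Investigation | Algorithm auditing | JF NM RY TY | [1, 2, 7, 45, 58, 59] |
| Project Governance | Investigation | Certification | JF TY | [38, 59] |
| Product Quality | Source Data Qualities | Data accessibility | JF PY | [1, 7, 51, 52, 60, 61] |
| Product Quality | Source Data Qualities | Data transparency | JF TT TY | [6, 38, 51, 62–66] |
| Product Quality | Source Data Qualities | Data collection records | TY | [2, 45, 51, 62, 63] |
| Product Quality | Training Data Qualities | Data quality and relevance | JF PY RY TY | [45, 62, 63] |
| Product Quality | Training Data Qualities | Equitable representation | JF | [38, 66–70] |
| Product Quality | Training Data Qualities | Model training records | TY | [62, 71] |
| Product Quality | Models and Algorithms Qualities | Model validation | FA JF NM | [2, 38, 59, 72, 73] |
| Product Quality | Models and Algorithms Qualities | Accuracy, Equitable treatment, Consistency | JF | [1, 2, 6, 7, 12, 38, 66–69, 74] |
| Product Quality | Models and Algorithms Qualities | Model validation records | NM TY | [45, 71] |
| Product Quality | Models and Algorithms Qualities | Algorithm transparency | TT TY | [1, 6, 38, 45, 51, 52, 58, 66, 71, 75, 76] |
| Product Quality | Models and Algorithms Qualities | Interpretability, Auditability | TY | [2, 6, 12, 56, 63, 66] |
| Product Quality | User Interface Qualities | Front-end transparency | DY FA JF TT TY | [47, 52, 60, 64, 66, 77] |
| Product Quality | User Interface Qualities | Equitable accessibility, Human intervention | JF | [1, 6, 12, 38, 47, 78] |
| Product Quality | System Configuration | System and architecture quality | JF TT | [7, 30, 56] |
| Product Quality | System Configuration | Security safeguards | NM PY RY TT | [2, 52, 79] |
| Product Quality | System Configuration | Technical deployment records, Technical logging | RY | [45] |
| Product Quality | System Configuration | Versioning and metadata | TY | [40] |
| Product Quality | Data and Privacy Protections | Informed consent | FA JF PY TY | [6, 7, 12, 38, 52] |
| Product Quality | Data and Privacy Protections | Privacy safeguards | JF PY TT TY | [1, 6, 7, 38, 51, 60] |
| Product Quality | Data and Privacy Protections | Data anonymization, Data encryption | JF PY TY | [1, 6, 7, 38, 51, 60] |
| Product Quality | Data and Privacy Protections | Personal data controls | JF PY | [1, 6, 60] |
| Product Quality | Data and Privacy Protections | Data governance, Data retention policy | PY | [52, 63, 80] |
| Product Quality | Data and Privacy Protections | Confidentiality | RY | [7, 12, 38, 56, 65, 74] |
| Usage Qualities | System Transparency and Understandability | Interaction safety | DY NM RY TT | [61] |
| Usage Qualities | System Transparency and Understandability | Stakeholder-centric communication | FA JF TY | [2, 6, 12, 52, 65, 81, 82] |
| Usage Qualities | System Transparency and Understandability | Choices | FA JF | [6, 12, 38, 47, 56, 83] |
| Usage Qualities | System Transparency and Understandability | Specialized skills and knowledge | JF NM RY | [56, 84] |
| Usage Qualities | System Transparency and Understandability | Privacy and confidentiality | JF NM PY RY TY | [7, 12, 38, 56, 65, 74] |
| Usage Qualities | Usage controls | Onboarding procedures | JF | [40] |
| Usage Qualities | Usage controls | Problem reporting | RY | [58] |
| Usage Qualities | Usage controls | Interpretable models | TY | [6, 12] |
| Usage Qualities | Usage controls | Quality controls | JF NM RY TT | [38, 47] |
| Usage Qualities | Usage controls | Usage records | JF RY TY | [45, 51] |
| Usage Qualities | Usage controls | Complaint process | JF | [64] |
| Usage Qualities | Usage controls | System monitoring | NM TY | [84] |
| Usage Qualities | Usage controls | Consequence records | RY TY | [45, 59] |
| Usage Qualities | Usage controls | Process deployment records | RY | [45] |
| Usage Qualities | Usage controls | Staff monitoring, Algorithm renewal process | TY | [38, 42, 84] |
| Usage Qualities | Decision Quality | Awareness | DY JF TY | [2, 6] |
| Usage Qualities | Decision Quality | Decision accountability | JF RY TT | [2, 12, 66, 85] |
| Usage Qualities | Decision Quality | Access and redress | JF | [1, 2, 6, 12, 65, 85] |
| Benefits and Protections | Financial Benefits | Licensing or service fees | | [38, 74] |
| Benefits and Protections | Financial Benefits | Financial gains | | [38, 65, 74] |
| Benefits and Protections | Financial Benefits | Intellectual property rights | | [38, 53] |
| Benefits and Protections | Financial Benefits | Investment funds | | [30] |
| Benefits and Protections | Financial Protections | Energy costs, Environmental impacts | SU | [12, 69] |
| Benefits and Protections | Financial Protections | Intellectual property protection | | [38, 53, 58] |
| Benefits and Protections | Financial Protections | Cost efficiency | | [74] |
| Benefits and Protections | Financial Protections | Project efficiency | | [18, 20] |
| Benefits and Protections | Legal Protections | Regulatory and legal compliance | FA JF NM TY | [7, 30, 45, 51, 57, 59, 65] |
| Benefits and Protections | Legal Protections | Legal safeguards | NM RY | [2, 52] |
| Societal Impacts | Individual Protections | Civil rights and liberties protections | DY FA JF NM RY | [12, 47, 60] |
| Societal Impacts | Sustainability | Environmental sustainability | SU | [12, 69] |

Legend: beneficence (BE), dignity (DY), freedom and autonomy (FA), justice and fairness (JF), non-maleficence (NM), privacy (PY), responsibility (RY), solidarity (SO), sustainability (SU), transparency (TY), trust (TT)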
Fig. 2. Intersection between success factors and ethical principles
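The counts and percentages summarized in Fig. 2 follow from tallying the principle codes listed in Table 4. The sketch below illustrates the tally for five rows of the Project Management group only, so its numbers are not those of the figure. The denominator used for the percentage (the number of listed factors) is an assumption, since the figure does not state how its percentages were computed.

```python
from collections import Counter

# Codes from the Table 4 legend.
LEGEND = {
    "BE": "beneficence", "DY": "dignity", "FA": "freedom and autonomy",
    "JF": "justice and fairness", "NM": "non-maleficence", "PY": "privacy",
    "RY": "responsibility", "SO": "solidarity", "SU": "sustainability",
    "TY": "transparency", "TT": "trust",
}

# Five rows copied from Table 4 (Project Management group).
factor_principles = {
    "Scope definition document": "BE DY JF NM RY SO TY",
    "Risk assessment records": "FA NM RY",
    "Responsibility assignment matrix": "JF NM RY SU TT",
    "Community engagement": "JF SO",
    "Disclosure records": "JF TY",
}

# Count how many of the listed success factors address each principle.
counts = Counter(
    code
    for codes in factor_principles.values()
    for code in codes.split()
)

# Share of the listed factors that address each principle
# (assumed denominator: number of factors in this subset).
share = {code: counts[code] / len(factor_principles) for code in LEGEND}

for code, name in LEGEND.items():
    print(f"{name:22s} count={counts[code]}  share={share[code]:.0%}")
```

Extending `factor_principles` to all rows of Table 4 would give a tally of the same form as the figure, one column per ethical category.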
4.1 Project Governance

Project Management. The scope definition document, or problem statement, defines the aims and rationale for the algorithmic system [45]. The requirement for the system, the moral issues, and all aspects of the project are impacted by the algorithm’s context (e.g., country, industry sector, functional topic, and use case). Trust is context-dependent since systems can work in one context but not another; thus, the scope should act as a contract that reveals the algorithm’s goal and the behavior that can be anticipated [46]. Furthermore, a clearly defined scope protects against spurious claims and the misapplication or misuse of the system. Next, AI ethical principles argue that AI systems should be developed to do good or benefit someone or society as a whole (beneficence); they should avoid harming others (non-maleficence) [12, 47]. Bondi, Xu, Acosta-Navas, and Killian [50] propose that the communities impacted by the algorithms should be active participants in AI projects. The community members should have some say in granting the project a moral standing based on whether it benefits the community or makes the community more vulnerable. While these arguments were made in relation to “AI for social good” projects, community engagement is relevant for all types of AI projects. Finally, rules should be established to manage conflict of interest situations within the team or when the values of the system conflict with the interests or values of the users [43, 44]. A responsibility assignment matrix defines roles and responsibilities within a project, distinguishing between persons or organizations with responsibility and accountability [48]. Accountability ensures a task is satisfactorily done, and responsibility denotes an obligation to perform a task satisfactorily, with transparency in reporting outcomes,
corrective actions, or interactive controls [47–49]. Both responsibility and accountability assume a degree of subject matter understanding and knowledge and should include moral, legal, and organizational responsibilities [12, 47]. The project organization should promote a diverse working environment, involving various stakeholders and people from numerous backgrounds and disciplines and promoting the exchange and cooperation across regions and among organizations [12, 54]. Furthermore, the project team needs members with specialized skills and knowledge to process data and accomplish the design and development of the algorithms; the quality of the team’s skills and expertise impacts the usability of the algorithms and the governance policies [30]. Standards and policy guidelines should be used to build consensus and provide understanding on the relevant issues, such as algorithmic transparency, accountability, fairness, and ethical use [14, 38, 51, 53]. Systematic record-keeping is the mechanism for retaining logs and other documents with contextual information about the process, decisions, and decision-making from project inception through system operations [12, 45, 51, 56]; the various types of records are listed as individual success factors. The disclosure records are logs that are themselves about disclosures or the processes for disclosure, what was actually released, how information was compiled, how it was delivered, in what format, to whom, and when [45, 51, 52]. Bonsón, Lavorato, Lamboglia, and Mancini [8] found that some European publicly-listed companies are disclosing their AI ethical approaches and facts about their AI projects in their annual and sustainability reports. The procurement records are contractual arrangements, tender documents, design specifications, quality assurance measures, and other documents that detail the suppliers and relevant due-diligence [45]. 
The risk assessment records identify the potential implications and risks of the system, including legality and compliance, discrimination and equality, impacts on basic rights, ethical issues, and sustainability concerns [45]. However, model risk assessment should be an active process of identifying and avoiding or accepting risks, changing the likelihood of a risk occurring or of its consequences, sharing risk responsibility, or retaining the risk as an informed decision [55]. Retraining and fine-tuning models, adding new components to original solutions (i.e., wrappers), and building functionally equivalent models (i.e., copies) are three methods suggested by Unceta, Nin, and Pujol [55] for mitigating AI risks. Ethics Practices. Ethics policies should include ethical principles with guidelines and rules for implementation and to verify and remedy any violations; the ethics policies should be shareable externally with the public or public authorities [2, 12, 14, 51]. Ethics training should cover the practical aspects of addressing ethical principles [2, 12, 51, 53]. An independent official such as an ombudsman, or a whistleblower process, should be available to hear or investigate ethical or moral concerns or complaints from team members [12, 42, 53]. Finally, professional membership in an association or standard organization (e.g., ACM, IEEE) that provides standards, guidelines, practices on ethical design, development, and usage activities should be encouraged and supported [14]. Investigations. Algorithm auditing is a method that reveals how algorithms work. Testing algorithms based on issues that should not arise and making inferences from the algorithms’ data is a technique for auditing complex algorithms [1, 2, 7, 45, 53, 58,
59]. Audit finding records document the audit, the basis or other reasons it was undertaken, how it was conducted, and any findings [45]. Audit response records document remediations and subsequent actions or remedial responses based on audit findings [2, 45]. Algorithmic impact assessments investigate aspects of the system to uncover the impacts of the systems and propose steps to address any deficiencies or harm [42, 53, 57]. Certification ensures that people or institutions comply with regulations and safeguards and punishes institutions for breaches; it offers independent oversight by an external organization [38, 53, 59].
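One way to read “testing algorithms based on issues that should not arise” is as a black-box test in which a single attribute is varied while all other inputs are held fixed; a change in the decision is then an issue to record. The sketch below follows that reading and stores the result in an audit finding record. The decision function, the test cases, the attribute, and the record fields are all hypothetical; the reviewed literature does not prescribe this particular procedure.

```python
from dataclasses import dataclass, field


@dataclass
class AuditFindingRecord:
    """Documents one audit; the field names are illustrative."""
    basis: str    # why the audit was undertaken
    method: str   # how it was conducted
    findings: list = field(default_factory=list)


def audit_attribute_flip(decide, cases, attribute, values):
    """Return the cases whose decision changes when only `attribute` changes."""
    findings = []
    for case in cases:
        outcomes = {}
        for value in values:
            variant = dict(case, **{attribute: value})  # copy with one change
            outcomes[value] = decide(variant)
        if len(set(outcomes.values())) > 1:
            findings.append({"case": case, "outcomes": outcomes})
    return findings


# Hypothetical system under audit. The auditor only calls it and does
# not need to see how it works inside.
def loan_decision(applicant):
    threshold = 600 if applicant["gender"] == "m" else 650
    return "approve" if applicant["score"] >= threshold else "reject"


test_cases = [
    {"score": 700, "gender": "f"},
    {"score": 620, "gender": "f"},
    {"score": 500, "gender": "m"},
]

record = AuditFindingRecord(
    basis="Periodic review (hypothetical)",
    method="Varied 'gender' while holding all other inputs fixed",
    findings=audit_attribute_flip(
        loan_decision, test_cases, attribute="gender", values=["f", "m"]
    ),
)
```

Here only the applicant with a score of 620 receives different decisions for the two attribute values, so the record holds a single finding. An audit response record would then document the remediation chosen for it.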
4.2 Product Quality
Source Data Qualities. Data accessibility refers to data access and usage in the algorithm creation process. Several regulations and laws constrain how data may be accessed, processed, and used in analytical processes. Thus, a legal agreement to use the data should be in place, and the confidentiality of personal data should be preserved [1, 7, 51, 52, 60, 61]. Data transparency reveals the source of the data collected, including the context or purpose of the data collection, application, or sensors (or users who collected the data), and the location(s) where the data are stored [6, 38, 51, 62–66]. The reviewability framework [45] recommends maintaining data collection records that include details on their lifecycle: purpose, creators, funders, composition, content, collection process, usage, distribution, limitations, maintenance, and data protection and privacy concerns [2, 45, 51, 62, 63]. Datasheets for datasets by Gebru, et al. [62] provides detailed guidance on document content.
Training Data Qualities. Data quality and relevance refer to possessing data that are fit for purpose. The quality challenges relating to training data include the diversity of the data collected and used, how well it represents the future population, and the representativeness of the sample [45, 62, 63]. Individuals are entitled to physical and psychological safety when interacting with and processing data, i.e., interaction safety [7, 12, 38, 51, 54, 61, 74]. Kang, Chiu, Lin, Su, and Huang [70] recommend that, in addition to labelling data points, human experts annotate representations and rules to optimize models for human interpretation. Equitable representation applies to data and people. For data, it means having enough data to represent the whole population for whom the algorithm is being developed while also considering the needs of minority groups such as disabled people, minors (under 13 years old), and ethnic minorities.
For people, it means, for example, including representatives from minority groups or their advocates in the project governance structures or teams that design and develop algorithms [38, 66–69]. Model training records should document the training workflow, model approaches, predictors, variables, and other factors; datasheets for datasets by Gebru, et al. [62] and model cards by Mitchell, et al. [71] provide a framework for the documentation.
Model and Algorithm Qualities. Algorithm transparency refers to using straightforward language to provide clear, easily accessible descriptive information (including
trade secrets) about the algorithms and data and explanations for why specific recommendations or decisions are relevant. The need for end-users to understand and explain the decisions produced influences the algorithm, data, and user interface transparency requirements [1, 6, 38, 51, 52, 58, 66, 71, 75]. For example, transparency may be needed for compatibility between human and machine reasoning, for the degree of technical literacy needed to comprehend the model, or to understand the model’s relevance or processes [45]. Here it is worth noting that the method and technique used to create the model influence its explainability and transparency [76]. Models can be interpretable by design, provide explanations of their internals, or provide post-hoc explanations of their outputs. Mariotti, Alonso, and Confalonieri [76] provide a framework for analyzing a model’s transparency. Chazette, Brunotte, and Speith [75] provide a knowledge framework that maps the relationship between a model’s non-functional requirements and their influence on explainability. Equitable treatment means eliminating discrimination and differential treatment, so that similarly situated people are given similar treatment. In this context, discrimination should not be equated to prejudice based on race. It is based on forming groups using “statistical discrimination” and refers to anti-discrimination and human rights protections [1, 2, 12, 66, 67, 74]. Model qualities include consistency, accuracy, interpretability, and suitability; there are no legal standards for acceptable error rates or ethical designs. Consistency means receiving the same results given the same inputs, as non-deterministic effects can occur based on architectures with opaque encodings or imperfect computing environments [7]. Accuracy is how effectively the model provides the desired output with the fewest mistakes (e.g., false positives, error rates) [6, 7, 38, 66, 68, 69].
Overconfidence is a common modeling problem that occurs when the model’s average confidence level exceeds its average accuracy [40]. Interpretability refers to how the model is designed to provide reliable and easy-to-understand explanations of its predictions [6, 12, 56]. Auditability refers to the degree to which the algorithm is transparent to, or obscured from, an external view, which determines whether other parties can monitor or critique it [2, 63, 66]. Model validation is the execution of mechanisms to measure or validate the models for adherence to defined principles and standards, effectiveness, performance in typical and adverse situations, and sensitivity. Validation should include bias testing, i.e., an explicit attempt to identify any unfair bias, avoid human subjectivity that introduces individual and societal biases, and reverse any biases detected. Models can be biased based on a lack of representation in the training data or how the model makes decisions, e.g., the selected input variables. The model outcomes should be traceable to input characteristics [2, 38, 40, 59, 72, 73]. The reviewability framework suggests maintaining model validation records that contain details on the model and how it was validated, including dates, version, intended use, factors, metrics, evaluation data, training data, quantitative analyses, ethical considerations, caveats and recommendations, or any other restrictions [45, 71]. Model cards by Mitchell, et al. [71] provide detailed guidance on the content.
User Interface (UI) Qualities. Expertise is embodied in a model in a generalized form that may not be applicable in individual situations. Thus, human intervention is the ability to override default decisions [1, 6, 47]. Equitable accessibility ensures usability for all
potential users, including people with disabilities; it considers the ergonomic design of the user interface [12, 38, 78]. Front-end transparency designs should meet transparency requirements and not unduly influence, manipulate, confuse, or trick users [52, 60, 64, 66, 77]. Furthermore, dynamic settings or parameters should consider the context to avoid individual and societal biases such as those created by socio-demographic variables [47]. App-Synopsis by Albrecht [77] provides detailed guidance on the recommended description of algorithmic applications.
System Configuration. The system and architecture quality may impact the algorithm’s outcomes, introduce biases, or result in indeterminate behavior. Default choices (e.g., where thresholds are set and defaults specified) may introduce bias in the decision-making. Specifically, the selected defaults may be based on the personal values of the developer. The availability, robustness, cost, and safety capabilities of the software, hardware, and storage are essential to algorithm development and use [30]. Decisions on methods and the parallelism of processes may cause system behavior that does not always produce the same results when given the same inputs. Obfuscated encodings may make it difficult to process the results or audit the system. The degree of automation may limit the user’s choices [7, 56]. Security safeguards allow technology, processes, and people to resist accidental, unlawful, or malicious actions that compromise data availability, authenticity, integrity, and confidentiality [2, 52, 53, 79]. The reviewability framework suggests that systems should provide a technical logging process, including mechanisms to capture the details of inputs, outputs, and data processing and computation [45].
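A technical logging process of the kind described above can be sketched as one structured record per decision. This is an illustrative assumption, not the reviewability framework's specification: the field names, the `make_log_record` and `to_json_line` helpers, and the choice of a SHA-256 digest over the canonicalized inputs are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict


def make_log_record(model_version: str, inputs: Dict[str, Any], output: Any) -> Dict[str, Any]:
    """Build one log entry capturing what went in, what came out, and when."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical_inputs = json.dumps(inputs, sort_keys=True)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        "input_digest": hashlib.sha256(canonical_inputs.encode("utf-8")).hexdigest(),
        "output": output,
    }


def to_json_line(record: Dict[str, Any]) -> str:
    """Serialize a record as a single line, suitable for an append-only log."""
    return json.dumps(record, sort_keys=True)


record = make_log_record("model-v1", {"age": 30, "income": 52000}, "approve")
line = to_json_line(record)
```

Recording the model version next to each input and output is what allows a later reviewer to trace an outcome to the model that produced it. If the inputs contain personal data, the log itself falls under the data retention and privacy safeguards discussed in this section.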
The framework also recommends records relevant to technical deployment and operations, including installation procedures, hardware, software, network, storage provisions or architectural plans, system integration, security plans, logging mechanisms, technical audit procedures, technical support processes, and maintenance procedures [45]. Hopkins and Booth [40] highlight the need for model and data versioning and metadata to evaluate data, direct model changes over time, document changes, align the product to the end documentation, and act as institutional knowledge.
Data and Privacy Protections. Data governance includes the practices for allocating authority and control over data, including the way it is collected, managed, and used; it should consider the data development lifecycle of the original source and training data [63, 80]. Data retention policy specifies the time and obligations for keeping data; personal data should be retained for the least amount of time possible [12, 52]. Privacy safeguards include processes, strategies, guidelines, and measures to protect and safeguard data privacy and implement remedies for privacy breaches [1, 6, 7, 38, 51, 53, 60]. Informed consent is the right of individuals to be informed of the collection, use, and repurposing of their personal data [6, 7, 12, 38, 52, 53]. The legal and regulatory rules covering consent vary by region and usage purposes. Personal data control means giving people control of their data [1, 6, 60], while confidentiality concerns protecting and keeping data and proprietary information confidential [7, 12, 38, 56, 65, 74]. Data encryption, data anonymization, and privacy notices are examples of privacy measures [1, 6, 7, 38, 51, 53, 60]. Data anonymization involves applying rules and processes that randomize data so an individual is not identifiable and cannot be found through combining data
sources. Data protection principles do not apply to anonymous information [38, 51, 53, 60]. Data encryption is an engineering approach to secure data with electronic keys.
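The overconfidence condition defined under Model and Algorithm Qualities (average confidence exceeding average accuracy) lends itself to a small worked check. The sketch below assumes that each prediction comes with a confidence value between 0 and 1 and a flag saying whether it was correct; the function name `overconfidence_gap` and the sample values are hypothetical.

```python
from typing import Sequence


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Return average confidence minus average accuracy.

    A positive value indicates overconfidence in the sense used in the text.
    """
    if len(confidences) == 0 or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equally long")
    average_confidence = sum(confidences) / len(confidences)
    average_accuracy = sum(1 for flag in correct if flag) / len(correct)
    return average_confidence - average_accuracy


# Hypothetical validation sample: confident predictions, half of them wrong.
confidences = [0.9, 0.8, 0.9, 0.8]
correct = [True, False, True, False]
gap = overconfidence_gap(confidences, correct)
is_overconfident = gap > 0
```

In this sample the average confidence is about 0.85 against an accuracy of 0.5, so the gap is positive. A single average hides where the model is overconfident; a model validation record might therefore report the gap per confidence range or per user group, which this sketch does not attempt.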
4.3 Usage Qualities
System Transparency and Understandability. Stakeholder-centric communication considers the explainability of the algorithm to the intended audience. Scoleze Ferrer Paulo, Galvão Graziela Darla, and de Carvalho Marly [82] note that there are different audiences for model and algorithm explanations: executives and managers who are accountable for regulatory compliance, third parties that check for compliance with laws and regulations, expert users with domain-specific knowledge, people affected by the algorithms who need to understand how they will be impacted, and model designers and product owners who research and develop the models. Thus, explanations should be comprehensible and transmit essential, understandable information rather than legalistic terms and conditions, even for complex algorithms [2, 6, 12, 52, 65, 81, 82]. Interpretable models refer to having a model design that is reliable, understandable, and facilitates the explanation of predictions by expert users [6, 12]. Choices allow users to decide what to do with the model results, maintaining a human-in-the-loop for a degree of human control [6, 12, 38, 47, 53, 56, 83]. Expertise is embodied in a generalized form that may not be applicable in individual situations, so specialized skills and knowledge may be required to choose among alternatives. Consequently, professional expertise, staff training and supervision, and on-the-job coaching may be necessary to ensure appropriate use and decision quality [56, 84]. Similarly, onboarding procedures are required to orient users on the system usage, their responsibilities, and system adjustments needed to address confidence levels, adjust thresholds, or override decisions [40]. Interaction safety refers to ensuring physical and psychological safety for the people interacting with AI systems [61]. Problem reporting is a mechanism that allows users to discuss and report concerns such as bugs or algorithmic biases [58].
Usage Controls. The complaint process means having mechanisms to identify, investigate, and resolve improper activity or receive and mediate complaints [64]. Quality controls detect improper usage or under-performance. Improper usage occurs when the system is used in a situation for which it was not originally intended [38, 47]. Monitoring is a continual process of surveying the system’s performance, environment, and staff for problem identification and learning [84]. Staff monitoring identifies absent or inadequate content areas, identifies systematic errors, anticipates and prevents bias, and identifies learning opportunities. System monitoring verifies how the system behaves in unexpected situations and environments. Model values or choices become obsolete and must be reviewed or refreshed through an algorithm renewal process [38, 42, 53]. The reviewability framework recommends retaining usage, consequence, and process deployment records. Usage records contain model inputs and the outputs of parameters, operational records at the technical (systems log) level, and usage instructions [45, 51]. Consequence records document the quality assurance processes for a decision and log any actions taken to affect the decision, including failures or near misses
[45, 59]. Logging and recording decision-making information are appropriate means of providing traceability. Process deployment records document relevant operational and business processes, including workflows, operating procedures, manuals, staff training and procedures, decision matrices, operational support, and maintenance processes [45].
Decision Quality. Awareness is educating the public about the existence and the degree of automation, the underlying mechanisms, and the consequences [2, 6, 53]. Access and redress are ways to investigate and correct erroneous decisions, including the ability to contest automated decisions by, e.g., expressing a point-of-view or requesting human intervention in the decision [1, 2, 6, 12, 53, 65, 85]. Decision accountability is knowing who is accountable for the actions when decisions are taken by the automated systems in which the algorithms are embedded [2, 12, 53, 66, 85]. Privacy and confidentiality are the activities to protect and maintain confidential information of an identified or identifiable natural person [7, 12, 38, 56, 65, 74].
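The usage qualities in this section (human-in-the-loop choices, adjustable thresholds, and the ability to contest an automated decision) can be pictured as a simple routing rule. The sketch below is one possible arrangement under stated assumptions, not a design taken from the cited sources: the `Decision` type, the `route` and `contest` functions, and the idea of a single confidence threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    outcome: str
    confidence: float
    decided_by: str  # "model" or "human", recorded for decision accountability


def route(
    model_outcome: str,
    confidence: float,
    threshold: float,
    human_review: Callable[[str], str],
) -> Decision:
    """Accept the model outcome only when its confidence reaches the threshold."""
    if confidence >= threshold:
        return Decision(model_outcome, confidence, "model")
    # Below the threshold, a person decides and may override the model.
    return Decision(human_review(model_outcome), confidence, "human")


def contest(decision: Decision, human_review: Callable[[str], str]) -> Decision:
    """Redress path: any decision can be sent to a human on request."""
    return Decision(human_review(decision.outcome), decision.confidence, "human")


# Hypothetical reviewer who overturns whatever the model proposed.
def reviewer(proposed: str) -> str:
    return "reject" if proposed == "approve" else "approve"


automatic = route("approve", 0.95, threshold=0.8, human_review=reviewer)
reviewed = route("approve", 0.60, threshold=0.8, human_review=reviewer)
appealed = contest(automatic, reviewer)
```

Where the threshold is set is itself a default choice that may embed the developer's values, as noted under System Configuration, so it is the kind of parameter that onboarding procedures and usage records would need to document.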
4.4 Benefits and Protections
Financial Benefits. Intellectual property rights consist of the ownership of the design of the models, including the indicators. Innovation levels have to be balanced against the liability and litigation risks for novel concepts [38, 53]. Financial gains include increased revenues from a sale or from licensing models that produce revenue through license or service fees [38, 74]; cost reductions from making faster, less expensive, or better decisions [65]; or improved efficiency from reducing or eliminating tasks [65]. Furthermore, proven successful models, concepts, algorithms, or businesses can attract investment funds [38]. Investment funds are needed to finance project resources and activities [30].
Financial Protections. Intellectual property protection is achieved by hiding the algorithm’s design choices, partly or entirely, and establishing clear ownership of AI artifacts (e.g., data, models) [38, 53, 58]. Data and algorithm transparency and auditing requirements should be considered in deciding what to reveal [58]. Further, model development has environmental impacts and energy costs. The environmental impacts occur because training models may be energy-intensive, using as much computing energy as a trans-American flight as measured by carbon emissions [12, 69]. The energy costs from computing power and electricity consumption (for on-premise or cloud-based services) are relevant for training models [12, 69]; for an incremental increase in accuracy, the cost of training a single model may be extreme (e.g., 0.1 increase in accuracy for 150,000 USD) [69]. Cost efficiency occurs when acquiring and using information is less expensive than the costs involved if the data were absent [74]. Project efficiency evaluates the project management’s success in meeting stakeholder requirements for quality, schedule, and budget [18, 20].
Legal Protections. Legal safeguards include protection against legal claims or regulatory issues that arise from algorithmic decisions [2, 52], such as limiting liability or risk of litigation for users and balancing risks from adaptations and customizations with fear of penalties or liability in situations of malfunction, error, or harm [12, 38]. Regulatory and legal compliance involves meeting the legal and regulatory obligations for collecting,
storing, using, processing, profiling, and releasing data and complying with other laws, regulations, or ordinances [7, 30, 45, 51, 53, 57, 59, 65].
4.5 Societal Impacts
Individual Protection. Civil rights and liberties protection secures natural persons’ fundamental rights and freedoms, including the right to data protection and privacy and to have opinions and decisions made independently of an automated system [12, 60]. To ensure such rights, the product and usage qualities enumerated in their respective success groups must be implemented (e.g., equitable treatment, accessibility, choices, privacy, and confidentiality) [12, 47, 60]. Finally, AI systems may introduce new job structures and patterns, eliminate specific types of jobs, or change the way of working. Thus, programs may be required to address employee and employment-related issues [12, 53].
Sustainability. Environmental sustainability is supported by limiting environmental impacts and reducing energy consumption in model creation [12, 69].
5 Discussion
This study framed the question of project success from the perspective of moral decision-making with algorithms. It listed success factors for each component of an AI system based on the development, usage, and consequences stages of an AI project lifecycle. The research revealed that the project team’s actions influence who judges the reasonability of a given decision [15]. Thus, the project team has some responsibility for the moral decisions produced by algorithmic systems. Based on the definition from Jones [35], project team members are moral agents because they make decisions that may affect others (whether harmful or beneficial), even if they do not recognize that a moral issue is at stake. Hence, the systems they develop are artificial agents that should abide by the moral laws of society. The analysis produced success groups that differentiate activities, clarifying the requirements for technicians, finance providers, instructors, and end-users in the final decision-making process. This separation provides some clarity to avoid any role confusion among IT designers, as described by Manders-Huits [15]. Figure 3 visualizes the relationship between the AI success categories and the project governance success groups.
The findings differ from the information technology (IT) [13, 22–30] and project management success models [9, 10, 20, 21] in several important ways. First, the success groups include the ethical practices and investigation success categories, which were not explicitly considered in the investigated models. For the IT models, there are three additional differences. First, there is a direct bi-directional relationship between benefits and protections and product qualities. That is, there are consequences in developing AI systems for stakeholders with or without system usage. Second, the findings stress that usage has non-financial performance impacts that may harm or benefit society.
Third, the benefits and protection category introduces new factors such as intellectual property rights and protections.
For the project management models, there are four additional differences. First, the factors consider stakeholders beyond the customer and wider interests than those typically considered by customer consultation and customer acceptance. This finding is consistent with the observation from Davis [10] that accountability and stakeholder impact are project success factors. Consequently, external stakeholder engagement and interest occur throughout the project management success group (i.e., inclusion in a diverse working environment) through the investigation and societal impacts categories. Second, in addition to the technical task success category identified by Pinto and Slevin [21], usage qualities are an equally important success category. Third, the success group for legal and financial protections is explicit and specific in content. These qualities differentiate it from the “preparing for the future” success factor foreseen by Shenhar, Dvir, Levy, and Maltz [9] and the generic public success measures described by Turner and Zolin [20]. Fourth, the success categories consider a time dimension similar to Shenhar, Dvir, Levy, and Maltz [9] and Turner and Zolin [20] but with an AI focus. Consequently, a project team could forecast success expectations months or years after completion. The success categories, groups, and factors are discussed in the following sections.
Fig. 3. Success categories relationship model (diagram relating the categories Ethical Practices, Project Management, Product Quality, Investigations, Usage Qualities, Social Impact, and Benefits & Protections)
5.1 Project Governance
The intersectionality of success groups and ethical principles in Table 4 highlights project managers’ and sponsors’ challenges in balancing the public’s interest with the organization’s interest. Some of the most crucial success groups have little or no overlap with ethical principles, while others are highly relevant across several specific principles. The project scope document is a factor in determining the product’s relationship to benefiting society as a whole (beneficence). The ethical practices drive how the project managers, team members, and end-users evaluate what is in the best interest of the public and the organization. The project team may face challenges in dealing with tensions between expectations and reality and with moral hazards. There are likely to be tensions between the
expectations of users and managers and what is feasible given the available data, computing power, and expertise of the team [40]. In addition, when some or all project team members, project sponsors, or system users are from differing organizations, people may be prone to behaviors that create a moral hazard [17]. For example, a supplier may demonstrate opportunistic behavior such as creating a high transparency gap by hiding information, engaging specialists with low competence and little experience, intentionally completing tasks poorly to create a competence gap for the client, or underpricing the development work to win the operational work. Thus, the responsibility assignment matrix is a vital tool to use when assigning tasks and roles to mitigate the risk of morally hazardous behaviors [17]. Third parties such as investigative reporters in the media, internal or external auditors, public crowds, or regulators also may attempt to expose algorithmic harms [53]. Furthermore, the user organization may be the target of investigations or investigate or audit the algorithms. Thus, product and usage quality may be challenged as part of an investigation.
5.2 Product Quality
Product quality success factors are under the control of the project team but influenced by many stakeholders. Thus, each development aspect needs to consider technical product qualities, usability features, information requirements, and legal and regulatory requirements. In this regard, several conflicting success factors have to be balanced. For example:
• End-users may want a high degree of flexibility for human intervention, including making alternative choices. Similarly, the person impacted by the decision outcome will want to have erroneous (or biased) decisions reviewed and corrected. Conversely, the user’s organization would want to limit legal liabilities, leading to fewer choices. The more open the system, the harder it is to differentiate between a system error and user error and assign accountability.
• The need for the end-users to understand and explain the decisions produced by the algorithms suggests a high degree of transparency for the algorithm, data, and front-end user interface. Conversely, preserving intellectual property rights is a factor for a lesser degree of transparency. Furthermore, external bad actors may try to manipulate the algorithm if too much information is understood about how it works, which would be a security concern [82].
• Unbiased models can produce high error rates (or be inaccurate), and biased models can be accurate. Thus, there is a trade-off between utility and fairness due to bias or inaccuracies.
• There is an additional trade-off between the degree of automation and human autonomy. Too much automation can give the perception that people are under constant surveillance, or lead to a reality where they are, suggesting that the system knows too much and is what Watson and Conner [65] call “creepy.” Meanwhile, the system can offer flexibility, accuracy, or benefits not available through human autonomy.
• Developing large-scale language models produces carbon emissions and has a financial cost. However, the assumption (which is challenged) is that large models increase
accuracy. Thus, there is a trade-off between accuracy, environmental impacts, and financial costs.
5.3 Usage Qualities
Shin and Park [66] have found empirically that end-users understand, perceive, and process algorithm fairness, accountability, and transparency differently. Furthermore, the interaction between trust and algorithmic features influences user satisfaction. Thus, the end user’s confidence in and understanding of the system are some of the essential success factors that impact decision quality. Product quality determines the procedures implemented by the end-users and their organization. For example, modelers may defer responsibility to the users for addressing model overconfidence or adjusting thresholds. Such cases require onboarding procedures to calibrate user trust and understanding in the system [40]. The user’s organization and the platform providers must follow regulations and laws relevant to the industry, data processing, and data profiling. Furthermore, in the European Union, the requirements of the artificial intelligence regulation act should be considered as of April 2021 [86]. Thus, the success factors include robust operational rules, policies, contracts, quality controls, and privacy and security safeguards.
5.4 Benefits and Protections
There are several success factors from a business and governance perspective for delivering the product, intellectual property rights, and protections, limiting liability, and ensuring legal safeguards and regulatory compliance. Similar to product quality, there are multiple conflicting success factors. For example:
• The trade-off between accuracy, environmental impacts, and financial costs, as already discussed.
• The need for financial gains from algorithmic systems and the need to benefit society (beneficence) may result in conflicting objectives.
• Project efficiency concerning quality, time, budget, and regulatory and legal compliance.
• The need for algorithm, data, and front-end user interface transparency and the protection of intellectual property rights through trade secrets.
• The need for legal safeguards, comparing the need for system flexibility to allow for choices at the point of decision versus restricting human intervention.
5.5 Societal Impacts
First, people impacted by algorithmic decisions want fairness, meaning moral or “just” treatment from algorithmic decision-making. However, fairness or the perception of fairness has several subjective components that fall out of the scope of any development project, including pre-established attitudes and emotional reactions to algorithmic outcomes [1, 67]. Therefore, product and usage qualities should incorporate the enumerated success factors to protect individual rights and avoid potential harms [12, 53].
Next, since the development of algorithms has an environmental impact, sustainable development is a success factor. Ziemba [87] proposes that sustainable development should include four dimensions; they each appear relevant to algorithmic development: ecological, socio-cultural, economic, and political. Ecological sustainability is the conservation and proper use of renewable resources (air, water, land), and an economic approach adopts sustainable practices for innovation, financial benefits, and reputation and brand value. Socio-cultural sustainability contributes to the community and its stability and security. Finally, political sustainability upholds democratic values and effectively advocates for all enumerated rights.
6 Conclusions
The importance of algorithms in society and individuals’ lives is becoming increasingly apparent. Therefore, it is crucial to discuss the success factors for AI projects, which are dramatically more expansive than a typical information systems project. This research identified five categories of AI project success factors in 17 groups related to moral decision-making, the relationship among the success factor categories, and the relationship between the success factors and AI ethical principles. The review summarized the concerns for fair, moral algorithm development and usage in decision-making. It revealed that the project manager and project team need to consider many factors when defining the project scope and executing it, arguing that people who develop and operate AI systems are moral agents. Consequently, these actors should build AI systems and procedures to avoid harm and ensure benefits. Projects are constrained by time and budget, limiting the availability of people and other resources. Nevertheless, the algorithms that result from AI projects can have a significant, long-term impact. Thus, it is necessary and relevant that a broad view of success be considered in planning and executing these projects and that the societal impacts be addressed as an explicit critical success factor. In particular, designers and project managers should evaluate how well the public interest and concerns are addressed by the product and usage quality.
6.1 Contributions
The findings from this study provide some guidelines on the success factors that may only be used indirectly or over time to judge a project’s success. It makes four contributions. First, it closes the gap on a lack of literature that translates AI ethical principles into practice [14]. Second, it provides a descriptive view of the ethical and additional related project deliverables, acts, or situations necessary for AI project success from the perspective of moral decision-making.
Third, it considers success across time by investigating the post-project activities—usage and investigations—and impacts. Finally, since AI projects are rarely discussed in the project management literature, it contributes a broader review of AI success factors.
6.2 Practical Implications
Projects, and especially AI projects, are context-sensitive. Because the factors presented here are generic, it is vital to adjust and validate these features in specific contexts. For example, developing an algorithm for a healthcare situation would have different considerations than a marketing approach. However, success factors provide insights into the activities and deliverables that should be considered during planning to ensure fair, ethical decisions. First, the project manager and sponsor should ensure project scopes consider moral decision-making with algorithms. This approach will dramatically affect the team compositions, the deliverables produced as part of the project, and the project budget. Moreover, the benefits to society and the environment could be highlighted and potentially measured. Next, they should consider the success factors described herein to recognize moral issues that require decisions during the development process to mitigate project risks. Project managers and sponsors may be limited in influencing future usage and operational processes. Nevertheless, they should try to exert their influence on the ethical practices of system users and user organizations. As an agent of the sponsoring organization with a reputation to manage and business objectives to reach, the project manager should consider these success factors to ensure adherence to ethical, privacy, security, and societal norms and to avoid regulatory fines and legal issues.
6.3 Theoretical Implications
The research expands the existing project management literature on project success factors specific to the AI domain.
This contribution is consistent with the direction identified by Ika [18] “that one should turn to context-specific and even symbolic and rhetoric project success and CSFs [critical success factors].” In addition, the study qualitatively confirmed the finding from Davis [10] that accountability and stakeholder impacts are project critical success factors. 6.4 Limitations and Future Research This research was based on the latest available literature at a single point in time. However, AI is a fast-moving topic, judging from the number of recent articles. Thus, other methods such as a Delphi study with field experts could extend and update the study and validate the findings. Since a single researcher conducted the analysis, the results may be biased by the researcher’s perspective. As an opportunity for additional research, the success factors could be used to investigate project accountability or stakeholder management. It could be expanded to identify measurable success criteria for some success factors. AI literature regarding ways to measure bias, inequality, and accuracy should be left to specialists; however, it would be interesting to understand how to evaluate the trade-offs needed to complete a project and still meet all stakeholder requirements while retaining an honest approach. Furthermore, following the model from Ziemba [87], the factors and their relationships should be empirically investigated in future research.
Artificial Intelligence Project Success Factors
References
1. Helberger, N., Araujo, T., de Vreese, C.H.: Who is the fairest of them all? Public attitudes and expectations regarding automated decision-making. Comput. Law Secur. Rev. 39, 1–16 (2020). https://dx.doi.org/10.1016/j.clsr.2020.105456
2. Garfinkel, S., Matthews, J., Shapiro, S.S., Smith, J.M.: Toward algorithmic transparency and accountability. Commun. ACM 60(9), 5 (2017). https://dx.doi.org/10.1145/3125780
3. Boonjing, V., Pimchangthong, D.: Data mining for positive customer reaction to advertising in social media. In: Ziemba, E. (ed.) AITM/ISM -2017. LNBIP, vol. 311, pp. 83–95. Springer, Cham (2018). https://dx.doi.org/10.1007/978-3-319-77721-4_5
4. Yadav, G., Kumar, Y., Sahoo, G.: Predication of Parkinson’s disease using data mining methods: a comparative analysis of tree, statistical and support vector machine classifiers. In: Proceedings International Conference Computing Communication Systems, pp. 1–8. IEEE (2012). https://doi.org/10.1109/NCCCS.2012.6413034
5. Abdelaal, M.M.A., Sena, H.A., Farouq, M.W., Salem, A.-B.M.: Using data mining for assessing diagnosis of breast cancer. In: Proceedings International Multiconference Computing Science Information Technology, pp. 11–17. IEEE (2010). https://dx.doi.org/10.1109/IMCSIT.2010.5679647
6. Hamon, R., Junklewitz, H., Malgieri, G., De Hert, P., Beslay, L., Sanchez, I.: Impossible explanations? Beyond explainable AI in the GDPR from a COVID-19 use case scenario. In: FAccT 2021: Proceedings 2021 ACM Conference Fairness Accountability and Transparency, pp. 549–559. ACM (2021). https://dx.doi.org/10.1145/3442188.3445917
7. Sherer, J.A.: When is a chair not a chair?: Big data algorithms, disparate impact, and considerations of modular programming. Comput. Internet Lawyer 34(8), 6–10 (2017)
8. Bonsón, E., Lavorato, D., Lamboglia, R., Mancini, D.: Artificial intelligence activities and ethical approaches in leading listed companies in the European Union. Int. J. Account. Inf. Syst. 43, 100535 (2021). https://doi.org/10.1016/j.accinf.2021.100535
9. Shenhar, A.J., Dvir, D., Levy, O., Maltz, A.C.: Project success: a multidimensional strategic concept. Long Range Plan. 34(6), 699–725 (2001). https://doi.org/10.1016/S0024-6301(01)00097-8
10. Davis, K.: An empirical investigation into different stakeholder groups perception of project success. Int. J. Project Manage. 35(4), 604–617 (2017). https://dx.doi.org/10.1016/j.ijproman.2017.02.004
11. Mitchell, R.K., Agle, B.R., Wood, D.J.: Toward a theory of stakeholder identification and salience: defining the principle of who and what really counts. Acad. Manage. Rev. 22(4), 853–886 (1997). https://dx.doi.org/10.5465/amr.1997.9711022105
12. Ryan, M., Stahl, B.C.: Artificial intelligence ethics guidelines for developers and users: clarifying their content and normative implications. J. Inf. Commun. Ethics Soc. 19(1), 61–86 (2021). https://dx.doi.org/10.1108/JICES-12-2019-0138
13. Leyh, C.: Critical success factors for ERP projects in small and medium-sized enterprises - the perspective of selected German SMEs. In: Proceedings 2014 Federated Conference Computing Science Information Systems FedCSIS 2014, pp. 1181–1190. ACSIS (2014). https://dx.doi.org/10.15439/2014F243
14. Mittelstadt, B.: Principles alone cannot guarantee ethical AI. Nat. Mach. Intell. 1(11), 501–507 (2019). https://doi.org/10.1038/s42256-019-0114-4
15. Manders-Huits, N.: Moral responsibility and IT for human enhancement. In: SAC 2006: Proceedings 2006 ACM Symposium Application Computing, pp. 267–271. ACM (2006). https://dx.doi.org/10.1145/1141277.1141340
16. Martin, K.: Ethical implications and accountability of algorithms. J. Bus. Ethics 160(4), 835–850 (2018). https://dx.doi.org/10.1007/s10551-018-3921-3
17. Wachnik, B.: Moral hazard in IT project completion. An analysis of supplier and client behavior in Polish and German enterprises. In: Ziemba, E. (ed.) Information Technology for Management. LNBIP, vol. 243, pp. 77–90. Springer, Cham (2016). https://dx.doi.org/10.1007/978-3-319-30528-8_5
18. Ika, L.A.: Project success as a topic in project management journals. Proj. Manag. J. 40(4), 6–19 (2009). https://dx.doi.org/10.1002/pmj.20137
19. Weninger, C.: Project initiation and sustainability principles: what global project management standards can learn from development projects when analyzing investments. In: PMI Research Education Conference Newtown Square, PA: Project Management Institute (2012)
20. Turner, R.J., Zolin, R.: Forecasting success on large projects: developing reliable scales to predict multiple perspectives by multiple stakeholders over multiple time frames. Proj. Manag. J. 43(5), 87–99 (2012). https://dx.doi.org/10.1002/pmj.21289
21. Pinto, J.K., Slevin, D.P.: Critical success factors across the project life cycle. Proj. Manag. J. 19(3), 67–75 (1988)
22. Leyh, C., Köppel, K., Neuschl, S., Pentrack, M.: Critical success factors for digitalization projects. In: Proceedings 16th Conference Computing Science Intelligent System FedCSIS 2021, pp. 427–436. ACSIS (2021). https://dx.doi.org/10.15439/2021F122
23. Włodarski, R., Poniszewska-Marańda, A.: Measuring dimensions of software engineering projects’ success in an academic context. In: Proceedings 2017 Federated Conference Computing Science Information System FedCSIS 2017, pp. 1207–1210. ACSIS (2017). https://dx.doi.org/10.15439/2017F295
24. Ralph, P., Kelly, P.: The dimensions of software engineering success. In: Proceedings - 2017 IEEE/ACM 39th International Conference Software Engineering, pp. 24–35. ACM (2014). https://doi.org/10.1145/2568225.2568261
25. Chatzoglou, P., Chatzoudes, D., Fragidis, L., Symeonidis, S.: Examining the critical success factors for ERP implementation: an explanatory study conducted in SMEs. In: Ziemba, E. (ed.) AITM/ISM -2016. LNBIP, vol. 277, pp. 179–201. Springer, Cham (2017). https://dx.doi.org/10.1007/978-3-319-53076-5_10
26. Leyh, C., Gebhardt, A., Berton, P.: Implementing ERP systems in higher education institutes critical success factors revisited. In: Proceedings 2017 Federated Conference Computing Science Information System FedCSIS 2017, pp. 913–917. ACSIS (2017). https://dx.doi.org/10.15439/2017F364
27. Miller, G.J.: A conceptual framework for interdisciplinary decision support project success. In: 2019 IEEE Technology Engineering Management Conference TEMSCON 2019, pp. 1–8. IEEE (2019). https://dx.doi.org/10.1109/TEMSCON.2019.8813650
28. Miller, G.J.: Quantitative comparison of big data analytics and business intelligence project success factors. In: Ziemba, E. (ed.) AITM/ISM -2018. LNBIP, vol. 346, pp. 53–72. Springer, Cham (2019). https://dx.doi.org/10.1007/978-3-030-15154-6_4
29. Petter, S., McLean, E.R.: A meta-analytic assessment of the DeLone and McLean IS success model: an examination of IS success at the individual level. Inform. Manage. 46(3), 159–166 (2009). https://dx.doi.org/10.1016/j.im.2008.12.006
30. Umar Bashir, M., Sharma, S., Kar, A.K., Manmohan Prasad, G.: Critical success factors for integrating artificial intelligence and robotics. Digit. Policy Regul. Gov. 22(4), 307–331 (2020). https://doi.org/10.1108/DPRG-03-2020-0032
31. Iqbal, R., Doctor, F., More, B., Mahmud, S., Yousuf, U.: Big data analytics and computational intelligence for cyber-physical systems: Recent trends and state of the art applications. Future Gener. Comput. Syst. 105, 766–778 (2017). https://doi.org/10.1016/j.future.2017.10.021
32. Aggarwal, J., Kumar, S.: A survey on artificial intelligence. Int. J. Res. Eng. Sci. Manage. 1(12), 244–245 (2018). https://dx.doi.org/10.31224/osf.io/47a85
33. Homayounfar, P., Owoc, M.L.: Data mining research trends in computerized patient records. In: Proceedings 2011 Federated Conference Computing Science Information System FedCSIS 2011, pp. 133–139. IEEE (2011)
34. OECD: Artificial intelligence in society. OECD Publishing, Paris (2019)
35. Jones, T.M.: Ethical decision making by individuals in organizations: an issue-contingent model. Acad. Manage. Rev. 16(2), 366–395 (1991)
36. Anscombe, G.E.M.: Modern moral philosophy. In: Hudson, W.D. (ed.) The Is-Ought Question. CP, pp. 175–195. Palgrave Macmillan UK, London (1969). https://doi.org/10.1007/978-1-349-15336-7_19
37. Shaw, N.P., Stöckel, A., Orr, R.W., Lidbetter, T.F., Cohen, R.: Towards provably moral AI agents in bottom-up learning frameworks. In: AIES 2018: Proceedings 2018 AAAI/ACM Conference AI, Ethics Society, pp. 271–277. ACM (2018). https://dx.doi.org/10.1145/3278721.3278728
38. Cohen, I.G., Amarasingham, R., Shah, A., Xie, B., Lo, B.: The legal and ethical concerns that arise from using complex predictive analytics in health care. Health Affair 33(7), 1139–1147 (2014). https://dx.doi.org/10.1377/hlthaff.2014.0048
39. Jobin, A., Ienca, M., Vayena, E.: The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1(9), 389–399 (2019). https://doi.org/10.1038/s42256-019-0088-2
40. Hopkins, A., Booth, S.: Machine learning practices outside big tech: How resource constraints challenge responsible development. In: AIES 2018: Proceedings 2018 AAAI/ACM Conference AI, Ethics Society, pp. 134–145. ACM (2021). https://dx.doi.org/10.1145/3461702.3462527
41. Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G.: Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Int. J. Surg. 8(5), 336–341 (2010). https://dx.doi.org/10.1016/j.ijsu.2010.02.007
42. Wieringa, M.: What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In: FAT* 2020 - Proceedings 2020 Conference Fairness Accountability Transparency, pp. 1–18. ACM (2020). https://dx.doi.org/10.1145/3351095.3372833
43. Aguirre, A., Dempsey, G., Surden, H., Reiner, P.B.: AI loyalty: a new paradigm for aligning stakeholder interests. IEEE Trans. Technol. Soc. 1(3), 128–137 (2020). https://dx.doi.org/10.1109/TTS.2020.3013490
44. Brady, A.P., Neri, E.: Artificial intelligence in radiology—ethical considerations. Diagnostics 10(4), 231 (2020). https://dx.doi.org/10.3390/diagnostics10040231
45. Cobbe, J., Lee, M.S.A., Singh, J.: Reviewable automated decision-making: a framework for accountable algorithmic systems. In: FAccT 2021: Proceedings 2021 ACM Conference Fairness Accountability Transparency, pp. 598–609. ACM (2021). https://dx.doi.org/10.1145/3442188.3445921
46. Jacovi, A., Marasović, A., Miller, T., Goldberg, Y.: Formalizing trust in artificial intelligence: prerequisites, causes and goals of human trust in AI. In: FAccT 2021: Proceedings 2021 ACM Conference Fairness Accountability Transparency, pp. 624–635. ACM (2021). https://dx.doi.org/10.1145/3442188.3445923
47. Loi, M., Heitz, C., Christen, M.: A comparative assessment and synthesis of twenty ethics codes on AI and big data. In: 7th Swiss Conference Data Science, pp. 41–46. IEEE (2020). https://dx.doi.org/10.1109/SDS49233.2020.00015
48. McGrath, S.K., Whitty, S.J.: Accountability and responsibility defined. Int. J. Manag. Proj. Bus. 11(3), 687–707 (2018). https://dx.doi.org/10.1108/IJMPB-06-2017-0058
49. Rezania, D., Baker, R., Nixon, A.: Exploring project managers’ accountability. Int. J. Manag. Proj. Bus. 12(4), 919–937 (2019). https://dx.doi.org/10.1108/IJMPB-03-2018-0037
50. Bondi, E., Xu, L., Acosta-Navas, D., Killian, J.A.: Envisioning communities: a participatory approach towards AI for social good. In: AIES 2018: Proceedings 2018 AAAI/ACM Conference AI, Ethics Society, pp. 425–436. ACM (2021). https://dx.doi.org/10.1145/3461702.3462612
51. Bertino, E., Kundu, A., Sura, Z.: Data transparency with blockchain and AI ethics. ACM J. Data Inf. Qual. 11(4), 1–8 (2019). https://dx.doi.org/10.1145/3312750
52. Rossi, A., Lenzini, G.: Transparency by design in data-informed research: a collection of information design patterns. Comput. Law Secur. Rev. 37, 1–22 (2020). https://dx.doi.org/10.1016/j.clsr.2020.105402
53. Rodrigues, R.: Legal and human rights issues of AI: gaps, challenges and vulnerabilities. J Responsible Tech. 4, 100005 (2020). https://doi.org/10.1016/j.jrt.2020.100005
54. Lim, J.H., Kwon, H.Y.: A study on the modeling of major factors for the principles of AI ethics. In: DG.O2021: 22nd Annual International Conference Digital Government Research, pp. 208–218. ACM (2021). https://doi.org/10.1145/3463677.3463733
55. Unceta, I., Nin, J., Pujol, O.: Risk mitigation in algorithmic accountability: the role of machine learning copies. PLoS One 15(11), e0241286 (2020). https://dx.doi.org/10.1371/journal.pone.0241286
56. Langer, M., Landers, R.N.: The future of artificial intelligence at work: a review on effects of decision automation and augmentation on workers targeted by algorithms and third-party observers. Comput. Hum. Behav. 123, 106878 (2021). https://dx.doi.org/10.1016/j.chb.2021.106878
57. Metcalf, J., Moss, E., Watkins, E.A., Singh, R., Elish, M.C.: Algorithmic impact assessments and accountability: the co-construction of impacts. In: FAccT 2021: Proceedings 2021 ACM Conference Fairness Accountability Transparency, pp. 735–746. ACM (2021). https://dx.doi.org/10.1145/3442188.3445935
58. Eslami, M., Vaccaro, K., Lee, M.K., On, A.E.B., Gilbert, E., Karahalios, K.: User attitudes towards algorithmic opacity and transparency in online reviewing platforms. In: CHI 2019: Proceedings 2019 CHI Conference Human Factors Computing Systems, pp. 1–14. ACM (2019). https://dx.doi.org/10.1145/3290605.3300724
59. Shneiderman, B.: Bridging the gap between ethics and practice: guidelines for reliable, safe, and trustworthy human-centered AI systems. ACM Trans. Interact. Intell. Syst. 10(4), 1–31 (2020). https://dx.doi.org/10.1145/3419764
60. Büchi, M., Fosch-Villaronga, E., Lutz, C., Tamò-Larrieux, A., Velidi, S., Viljoen, S.: The chilling effects of algorithmic profiling: mapping the issues. Comput. Law Secur. Rev. 36, 1–15 (2020). https://dx.doi.org/10.1016/j.clsr.2019.105367
61. Munoko, I., Brown-Liburd, H.L., Vasarhelyi, M.: The ethical implications of using artificial intelligence in auditing. J. Bus. Ethics 167(2), 209–234 (2020). https://dx.doi.org/10.1007/s10551-019-04407-1
62. Gebru, T., et al.: Datasheets for datasets. arXiv preprint https://arxiv.org/abs/1803.09010v7 (2018)
63. Hutchinson, B., et al.: Towards accountability for machine learning datasets: Practices from software engineering and infrastructure. In: FAccT 2021: Proceedings 2021 ACM Conference Fairness Accountability Transparency, pp. 560–575. ACM (2021). https://dx.doi.org/10.1145/3442188.3445918
64. Wagner, B., Rozgonyi, K., Sekwenz, M.-T., Cobbe, J., Singh, J.: Regulating transparency? Facebook, Twitter and the German network enforcement act. In: FAT* 2020 - Proceedings 2020 Conference Fairness Accountability Transparency, pp. 261–271. ACM (2020). https://doi.org/10.1145/3351095.3372856
65. Watson, H.J., Conner, N.: Addressing the growing need for algorithmic transparency. Commun. Assoc. Inf. Syst. 45, 488–510 (2019). https://dx.doi.org/10.17705/1CAIS.04526
66. Shin, D., Park, Y.J.: Role of fairness, accountability, and transparency in algorithmic affordance. Comput. Hum. Behav. 98, 277–284 (2019). https://dx.doi.org/10.1016/j.chb.2019.04.019
67. Adam, H.: The ghost in the legal machine: algorithmic governmentality, economy, and the practice of law. J. Inf. Commun. Ethics Soc. 16(1), 16–31 (2018). https://dx.doi.org/10.1108/JICES-09-2016-0038
68. Alasadi, J., Al Hilli, A., Singh, V.K.: Toward fairness in face matching algorithms. In: FAT* 2019 - Proceedings 2019 Conference Fairness Accountability Transparency MultiMedia, pp. 19–25. ACM (2019). https://dx.doi.org/10.1145/3347447.3356751
69. Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S.: On the dangers of stochastic parrots: can language models be too big? In: FAccT 2021: Proceedings 2021 ACM Conference Fairness Accountability Transparency, pp. 610–623. ACM (2021). https://dx.doi.org/10.1145/3442188.3445922
70. Kang, Y., Chiu, Y.W., Lin, M.Y., Su, F.Y., Huang, S.T.: Towards model-informed precision dosing with expert-in-the-loop machine learning. In: Proceedings - 2021 IEEE 22nd International Conference Information Reuse Integrated Data Science IRI 2021, pp. 342–347. IEEE (2021). https://dx.doi.org/10.1109/IRI51335.2021.00053
71. Mitchell, M., et al.: Model cards for model reporting. In: FAT* 2019 - Proceedings 2019 Conference Fairness Accountability Transparency, pp. 220–229. ACM (2019). https://dx.doi.org/10.1145/3287560.3287596
72. Wan, W.X., Lindenthal, T.: Towards accountability in machine learning applications: a system-testing approach. SSRN Electron. J. 1–64 (2021). https://dx.doi.org/10.2139/ssrn.3758451
73. Harrison, G., Hanson, J., Jacinto, C., Ramirez, J., Ur, B.: An empirical study on the perceived fairness of realistic, imperfect machine learning models. In: FAT* 2020 - Proceedings 2020 Conference Fairness Accountability Transparency, pp. 392–402. ACM (2020). https://dx.doi.org/10.1145/3351095.3372831
74. Gandy, O.H.: Engaging rational discrimination: exploring reasons for placing regulatory constraints on decision support systems. Ethics Inf. Technol. 12(1), 29–42 (2010). https://dx.doi.org/10.1007/s10676-009-9198-6
75. Chazette, L., Brunotte, W., Speith, T.: Exploring explainability: a definition, a model, and a knowledge catalogue. In: Proceedings - 2021 IEEE 29th International Requirements Engineering Conference RE 2021, pp. 197–208. IEEE (2021). https://dx.doi.org/10.1109/RE51729.2021.00025
76. Mariotti, E., Alonso, J.M., Confalonieri, R.: A framework for analyzing fairness, accountability, transparency and ethics: a use-case in banking services. In: 2021 IEEE International Conference Fuzzy System (FUZZ-IEEE), pp. 1–6. IEEE (2021). https://dx.doi.org/10.1109/FUZZ45933.2021.9494481
77. Albrecht, U.-V.: Transparency of health-apps for trust and decision making. J. Med. Internet Res. 15(12), 1–5 (2013). https://dx.doi.org/10.2196/jmir.2981
78. Givens, A.R., Morris, M.R.: Centering disability perspectives in algorithmic fairness, accountability and transparency. In: FAT* 2020 - Proceedings 2020 Conference Fairness Accountability Transparency, p. 684. ACM (2020). https://dx.doi.org/10.1145/3351095.3375686
79. Vallejos, E.P., Koene, A., Portillo, V., Dowthwaite, L., Cano, M.: Young people’s policy recommendations on algorithm fairness. In: WebSci 2017: Proceedings 2017 ACM Web Science Conference, pp. 247–251. ACM (2017). https://dx.doi.org/10.1145/3091478.3091512
80. Janssen, M., Brous, P., Estevez, E., Barbosa, L.S., Janowski, T.: Data governance: organizing data for trustworthy artificial intelligence. Gov. Inf. Q. 37(3), 101493 (2020). https://doi.org/10.1016/j.giq.2020.101493
81. Bhatt, U., et al.: Explainable machine learning in deployment. In: FAT* 2020 - Proceedings 2020 Conference Fairness Accountability Transparency, pp. 648–657. ACM (2020). https://dx.doi.org/10.1145/3351095.3375624
82. Scoleze Ferrer Paulo, S., Galvão Graziela Darla, A., de Carvalho Marly, M.: Tensions between compliance, internal controls and ethics in the domain of project governance. Int. J. Manag. Proj. Bus. 13(4), 845–865 (2020). https://dx.doi.org/10.1108/IJMPB-07-2019-0171
83. Mowbray, A., Chung, P., Greenleaf, G.: Utilising AI in the legal assistance sector—testing a role for legal information institutes. Comput. Law Secur. Rev. 38, 1–9 (2020). https://dx.doi.org/10.1016/j.clsr.2020.105407
84. Joerin, A., Rauws, M., Fulmer, R., Black, V.: Ethical artificial intelligence for digital health organizations. Cureus 12(3), e7202 (2020). https://dx.doi.org/10.7759/cureus.7202
85. Matthews, J.: Patterns and antipatterns, principles, and pitfalls: accountability and transparency in artificial intelligence. AI Mag. 41(1), 82–89 (2020)
86. Artificial intelligence act, Proposal for a regulation of the European Parliament and of the Council: Laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts C.F.R. (2021)
87. Ziemba, E.: The ICT adoption in enterprises in the context of the sustainable information society. In: Proceedings 2017 Federated Conference Computing Science Information System FedCSIS 2017, pp. 1031–1038. ACSIS (2017). https://dx.doi.org/10.15439/2017F89
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Planning a Mass Vaccination Campaign with Balanced Staff Engagement
Salvatore Foderaro¹, Maurizio Naldi² (✉), Gaia Nicosia³, and Andrea Pacifici¹
¹ Università degli Studi di Roma “Tor Vergata”, via del Politecnico, 1, Rome, Italy
² LUMSA University, via Marcantonio Colonna 19, Rome, Italy
³ Università degli Studi Roma Tre, via della Vasca Navale 79, Rome, Italy
Abstract. The outbreak of the COVID pandemic calls for mass vaccination campaigns worldwide. Pharmaceutical companies struggle to ramp up their production to meet the demand for vaccines but cannot always guarantee a perfectly regular delivery schedule. On the other hand, governments must devise plans to have most of their population vaccinated in the shortest possible time and have the vaccine booster administered after a precise time interval. The combination of delivery uncertainties and those time requirements may make such planning difficult. In this paper, we propose several heuristic strategies to meet those requirements in the face of delivery uncertainties. The outcome of those strategies is a daily vaccination plan that suggests how many initial doses and boosters can be administered each day. We compare the results with the optimal plan obtained through linear programming, which, however, assumes that the whole delivery schedule is known in advance. As for performance metrics, we consider both the vaccination time (which has to be as low as possible) and the balance between vaccination capacities over time (which has to be as uniform as possible). The strategies achieving the best trade-off between those competing requirements turn out to be the q-days ahead strategies, which put aside doses to guarantee that we do not run out of stock on just the next q days. Increasing the look-ahead period q reduces the number of out-of-stock days, at the cost of worsening the other performance indicators.
Keywords: Health management · Vaccination · Linear programming · Heuristic algorithms · Load balancing
This work is partially supported by MIUR PRIN Project AHeAD (Efficient Algorithms for HArnessing Networked Data).
© Springer Nature Switzerland AG 2022. E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 97–116, 2022. https://doi.org/10.1007/978-3-030-98997-2_5
1 Introduction
The coronavirus known as COVID has stayed with us for two years now [1]. Though an intense debate still exists concerning the best way to fight it, with divisions among supporters of prevention [2], cures [3], and vaccines [4], the last has so far emerged as the major tool [5]. However, mass vaccination on a global scale poses unprecedented challenges [6]. All the stages on the way from vaccine conception to engineering to production and delivery are subject to huge scaling stress. In particular, production struggles to cope with the growing demand, and delays in delivery crop up [7]. The irregular delivery schedule hampers an orderly vaccination plan. Such plans must also cater to the administration of boosters at regularly spaced intervals. In addition, a mass vaccination campaign calls for massive use of human and material resources. Medical and nursing staff have to be organized. Vaccination clinics have to be readied. Medical and government authorities must then devise a plan that optimizes the use of resources and the campaign objectives, which can be summed up as vaccinating as many people as possible in the shortest possible time.
In this paper, we deal with the problem of devising a vaccination plan that can guarantee the shortest possible vaccination time to as many people as possible in a scenario of uncertain vaccine deliveries and under the constraint of guaranteeing the administration of a booster after a tight time interval.¹ The planning problem addressed here was introduced in [8]. This paper extends that study and considers an additional objective related to the problem of suitably sizing the overall vaccination system. Minimizing the vaccination time could lead to resource needs that fluctuate wildly over time. Since the staff has to be contracted and/or seconded over a suitable period, managing a staff whose size varies heavily from day to day is not feasible. In order to achieve a smooth deployment of resources (i.e., balancing the usage of resources over time), we aim at minimizing the maximum workload of the system, i.e., the maximum number of doses that shall be inoculated per day.² Such a quantity has clear relevance to the cost of the system; it conflicts, however, with the above-mentioned objective, which, in a socially fair fashion, seeks to provide vaccines quickly and extensively to the largest possible fraction of the population. Trade-offs between profit and fairness are often encountered in problems concerning the distribution of resources within a collection of individuals as well as within an organization (see, for instance, [9]). In this paper, we propose a possible approach to build a vaccination plan that suitably considers both fairness and cost-related issues. We propose several heuristic strategies and compare them against the benchmark represented by a linear programming solution that assumes the actual schedule of dose arrivals is known in advance (which we call the clairvoyant strategy). Our major findings are the following:
– it is not worthwhile to deploy a vaccination capacity larger than twice the average workload;
– the optimization focus may be heavily shifted towards reducing the maximum workload, since the vaccination time is impacted only when the optimization concerns the maximum workload alone;
– the 1-day look-ahead strategy appears to be the best strategy if we wish to minimize the average vaccination time and achieve a uniform workload with minimal stock-outs at the end of the planning horizon;
– look-ahead strategies with a longer look-ahead horizon are instead to be preferred if we wish to reduce the number of out-of-stock days.
¹ It is to be noted that this paper deals with the case where a single booster (i.e., second dose) is administered during the planning horizon. Still, the approach can easily include the possibility of further boosters (e.g., third and fourth dose), which are under discussion these days.
² This objective is pursued together with the constraint that all (or the largest possible part of) the supplied doses have to be used. Without such a constraint, an obvious optimal solution would be not to administer any vaccine.
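To make the workload objective concrete, the following is a minimal sketch, not the authors' model: it ignores the distinction between first doses and boosters (and hence the booster timing constraint), and it assumes the whole delivery schedule is known in advance, as in the clairvoyant benchmark. Under these simplifying assumptions, the smallest daily capacity that still lets every delivered dose be used by the end of the horizon can be computed directly. The function names `min_max_workload` and `greedy_plan` are hypothetical.

```python
def min_max_workload(arrivals):
    """Smallest daily capacity that allows all delivered doses to be used.

    arrivals[t] is the number of doses delivered on day t; a dose can be
    administered on its delivery day or on any later day of the horizon.
    Simplifying assumption: no booster timing, one kind of dose only.
    """
    horizon = len(arrivals)
    capacity = 0
    tail = 0  # doses delivered on day t or later
    for t in range(horizon - 1, -1, -1):
        tail += arrivals[t]
        days_left = horizon - t
        # Doses arriving from day t on must fit into the remaining days.
        capacity = max(capacity, -(-tail // days_left))  # integer ceiling
    return capacity


def greedy_plan(arrivals, capacity):
    """Administer as many doses as possible each day, up to `capacity`.

    Returns the daily plan and the stock left unused at the end.
    """
    stock = 0
    plan = []
    for delivered in arrivals:
        stock += delivered
        used = min(capacity, stock)
        plan.append(used)
        stock -= used
    return plan, stock


if __name__ == "__main__":
    deliveries = [0, 10, 0, 0, 2, 0]  # illustrative numbers, not real data
    c = min_max_workload(deliveries)
    print(c, greedy_plan(deliveries, c))
```

With the illustrative deliveries above, a capacity of 3 doses per day is enough to use all 12 doses, whereas a capacity of 2 leaves some doses unused at the end of the horizon, which is the kind of tension between workload and dose usage described in the text.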
2 Literature Review
Our paper concerns the application of operational research techniques to plan a mass vaccination campaign. In this section, we briefly review the literature concerning both vaccination campaign planning and the use of OR techniques for such a purpose. Influenza pandemics (COVID may be considered as a virulent form of influenza) are not new. Iskander et al. have counted four influenza pandemics during the past century [10]. In addition to pandemics, numerous other events have spurred the need for public health emergency plans: avian influenza, the anthrax attack, and the Severe Acute Respiratory Syndrome infection. Among the ingredients of any pandemic response plan, vaccination plays a major role [11,12]. Mass vaccination plans under the emergency caused by a pandemic outbreak have been advocated to rapidly increase population (herd) immunity [13]. Most countries have started massive vaccination campaigns against COVID. A notable example is India, where the target was set at 300 million vaccinations in a country that has extensive experience in mass vaccination [14]. On quite a smaller scale, the Nepalese authorities launched and reported a campaign to cover the whole population [15]. The Our World in Data COVID-19 vaccination dataset has been put into place to offer an accurate and up-to-date view of the status of vaccinations around the world [16]. A critical issue common to any mass vaccination campaign under an emergency is the availability of vaccines. The outbreak of a new virus calls for the development of a new vaccine, which in turn may require a significant amount of time. Also, the mass production of such vaccines unavoidably exhibits a slow start phase. The shortage of vaccine doses is mentioned among the major criticalities to be dealt with during pandemics, so that supply problems emerge [17–19]. In [20], delays in the procurement processes and suppliers’ reliability are mentioned as
important sources of variability of the process. The proposed countermeasure of building a vaccine stockpile is not applicable in this context, since vaccines are not readily available in the first place [21]. An additional hampering factor for the deployment of a vaccination campaign is the rise of virus variants, which may in turn call for further mass vaccination campaigns [22]. A fast vaccination cycle is then a significant success indicator of a vaccination campaign [23]. The importance of having an efficient decision-making process in place in health management crises is also stressed in [24]. The literature on applying operations research techniques to mass vaccination campaigns is not ample. So far, most efforts have been devoted to the optimal design of vaccination clinics, i.e., to optimising vaccination once vaccines have been allocated to vaccination centres and their supply is guaranteed. In the context of the 2009 H1N1 outbreak, a study has reported the use of simulation with accompanying computer animation to assess several clinic configurations and optimise the staffing level [25]. A more thorough simulation study was conducted in [26] to analyse the performance and optimally design a vaccination clinic for smallpox. Using data extracted from the guidelines developed by the U.S. Centers for Disease Control and Prevention and from a mock vaccination campaign, the authors deployed a mixture of discrete-event simulation and queueing model approximations to assess the time spent by the patient in the clinic and compute the optimal staffing level, as well as to explore the impact of layout changes. A similar study was conducted in [27], where both exact and fast heuristic optimisation techniques were used to design a mass medication dispensing system. On a different scale, and closer to our level of analysis, a recent paper by Bertsimas et al. considers the allocation of vaccines within the population to optimally fight the diffusion of the pandemic [28]. A common assumption with our paper is the scarcity of vaccines. The major difference is that Bertsimas et al. discriminate between subpopulations based on their risks, while we discriminate between first-dose receivers and booster receivers, favouring the completion of the vaccination cycle. Another paper of interest concerns the optimal location of vaccination centres [29]. There, the focus is on spatial prioritisation rather than temporal prioritisation (unlike the focus of our paper and that in [28]). Finally, all the papers, and particularly [27], mention the variety of objective functions that can be considered. Multi-objective optimisation is probably the best choice, with an example of such approaches reported in [30].
3 Research Methodology
In this section, we first present, in Sect. 3.1, our approach to modelling the arrival process using suitable probability distributions, since the number $b^i_t$ of doses of vaccine $i \in V$ delivered on day $t$ is treated as a random variable. We then describe the methodologies that help establish an acceptable vaccination plan. We illustrate different procedures, which depend on whether the schedule of the dose arrivals is completely or only partially known. These topics are presented in Sects. 3.2 and 3.3. In the first one, it is assumed that the arrival dates of supplies are completely deterministic and perfectly known. In particular, we provide a linear optimization model and a greedy heuristic. We refer to these two approaches with the term off-line. In Sect. 3.3, we present two different non-clairvoyant algorithms with different degrees of out-of-stock risk.

3.1 Modeling the Arrival of Vaccine Doses
Vaccine doses represent the major input to the vaccination campaign. Rapid and large-scale adoption of vaccines is considered the major weapon against the disruption brought by the diffusion of the pandemic [31]. Their scarce availability at the beginning of the vaccination campaign has been a hard obstacle for the fast abatement of the virus spread [32], and bottlenecks still exist for the large-scale manufacturing of vaccines [33]. The actual arrival of doses has followed a somewhat irregular pattern, with the frequency of arrivals and size of shipments deviating significantly from the desirable periodic and reliable delivery (see footnote 3). The irregularities cannot be expected to vanish in the short run. In order to consider a realistic scenario in the vaccination planning effort, we must incorporate that irregularity in the model for the arrival of vaccine doses. This need calls for a stochastic model of arrivals. As we did in [8], in this paper, we keep focusing on a steady-state phase rather than on the very initial one, where the number of delivered doses kept growing until the manufacturing plant went past the start-up period. After observing the time series of arrivals, we have discarded rank-size models based on the day of the week (an example of rank-size models is reported in [34]), since they would have implied a mild regularity in the days of arrival, which we have not observed. We have also discarded models assuming a continuous distribution of values, such as in [35]. Instead, since the pattern of arrivals shows a significant number of days when there are no arrivals, we have opted for a zero-inflated Poisson model, also known as ZIP, where the probability distribution for the number $X$ of dose arrivals is

\[
P[X = k] =
\begin{cases}
\pi + (1-\pi)e^{-\lambda} & \text{if } k = 0,\\
(1-\pi)e^{-\lambda}\lambda^{k}/k! & \text{if } k = 1, 2, \dots
\end{cases}
\tag{1}
\]

where $0 \le \pi \le 1$ and $\lambda \ge 0$. This model is then represented by two parameters, $\pi$ and $\lambda$, which could be estimated from the observed time series of delivered doses.
Among the several methods that can be adopted to estimate those parameters [36], we could opt for the method of moments. After indicating the sample mean and variance respectively as $m$ and $s^2$, the estimates are:

\[
\hat{\pi} = \frac{s^2 - m}{s^2 + m^2 - m}, \qquad
\hat{\lambda} = \frac{s^2 + m^2}{m} - 1.
\tag{2}
\]

Footnote 3: The actual daily shipments to Italy can be observed in the datasets provided at https://github.com/italia/covid19-opendata-vaccini under an OpenData agreement.
However, rather than mimicking the figures observed for a specific country, we have set those parameters to be representative of the latest pandemic peak in Europe. In particular, we have employed a ZIP model where the parameters are respectively $\pi = 0.85$ and $\lambda = 10^7$, to achieve an average daily number of doses equal to $1.5 \cdot 10^6$ (the figure refers to a single country and can be considered as representative of the situation in Italy in spring 2021).
Fig. 1. Time series of doses arrivals.
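As a quick numerical check of the ZIP model and of the moment estimates, the following minimal sketch evaluates the distribution in Eq. (1), the theoretical mean $(1-\pi)\lambda$ and variance $(1-\pi)\lambda(1+\pi\lambda)$ of a ZIP variable, and the estimators in Eq. (2). The function names are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
import math


def zip_pmf(k, pi, lam):
    """P[X = k] for a zero-inflated Poisson variable, as in Eq. (1)."""
    if lam > 0:
        # Poisson term computed in log space to avoid overflow for large k
        poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    else:
        poisson = 1.0 if k == 0 else 0.0
    inflation = pi if k == 0 else 0.0
    return inflation + (1 - pi) * poisson


def zip_mean_var(pi, lam):
    """Theoretical mean and variance of the ZIP distribution."""
    mean = (1 - pi) * lam
    var = mean * (1 + pi * lam)
    return mean, var


def zip_moment_estimates(m, s2):
    """Method-of-moments estimates (pi_hat, lambda_hat) from Eq. (2)."""
    pi_hat = (s2 - m) / (s2 + m * m - m)
    lam_hat = (s2 + m * m) / m - 1
    return pi_hat, lam_hat


# Parameters used in the paper: pi = 0.85, lambda = 10^7
mean, var = zip_mean_var(0.85, 1e7)
pi_hat, lam_hat = zip_moment_estimates(mean, var)
```

Feeding the theoretical moments back into the estimators recovers the original pair of parameters, which is a convenient sanity check before applying Eq. (2) to an observed time series.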
3.2 Off-line Vaccination Planning Approaches
In this section, we describe the algorithms that define a vaccine administration plan over the next $T$ days. A plan establishes the number of doses (precisely, initial doses and boosters) of vaccines that shall be administered every day. Hereafter, the set of vaccine types is denoted as $V$, and the planning horizon consists of a number $T$ of periods (days), $\mathcal{T} = \{1, \dots, T\}$. Another critical input parameter is the capacity limit of the system, i.e., the maximum number of vaccine doses that may be administered at time $t$. This limit may or may not be independent of the vaccine type (this depends, e.g., on whether vaccination hubs provide a single or multiple types of vaccines). In our algorithms, we assume that a certain capacity $k^i_t$ is reserved for each single vaccine type $i$, which is an upper bound on the number of doses (of that vaccine) that can be administered on day $t$, for all $i \in V$, $t \in \mathcal{T}$. Of course, the size of these parameters (so that $\sum_{i \in V} k^i_t$ is constant or slightly variable over time) can be suitably tuned, depending on the available supply. Note that our model can be easily adapted to consider an overall daily capacity limit that is independent of the vaccine type. The methods we are proposing are basically prescriptive models. In particular, the output of the algorithms is the number $x^i_t$, resp. $y^i_t$, of people receiving the initial dose, resp. the booster, of vaccine $i$ on day $t$, for all $t = 1, \dots, T$. As a consequence, an additional output of these procedures is the stock level of each vaccine $i$ at (the end of) day $t \in \mathcal{T}$. For the developments to follow, we denote the total number of doses of vaccine $i$ supplied until day $t$ by

\[
B^i(t) = \sum_{\theta=1}^{t} b^i_\theta .
\]
Let $\Delta_i$ be the recommended (given) time interval, expressed in number of periods, between the first dose and the (mandatory) booster of vaccine $i \in V$. Since

\[
\sum_{t=1}^{T-\Delta_i} x^i_t = \sum_{t=\Delta_i+1}^{T} y^i_t \le \tfrac{1}{2} B^i(T)
\quad\text{and}\quad
\sum_{t=1}^{T-\Delta_i} x^i_t \le B^i(T-\Delta_i),
\]

there is a feasible solution with $s^i_T = 0$ if and only if $B^i(T-\Delta_i) \ge \tfrac{1}{2} B^i(T)$. In fact, since the overall number of (first and booster) doses that can be administered is at most twice the number of inoculated first doses, which in turn cannot be larger than $B^i(T-\Delta_i)$, any feasible solution has $s^i_T \ge B^i(T) - 2B^i(T-\Delta_i)$. With no loss of generality, we assume that the right-hand side of the latter inequality is not positive. Otherwise, we may subtract this quantity from the arrivals of the last $T-\Delta_i$ days and then apply the algorithm. So doing, we eventually have $s^i_T = B^i(T) - 2B^i(T-\Delta_i)$.

Off-line Optimization Model. Hereafter, we present a linear programming model providing a solution to maximize the number of vaccinated people per day by exploiting as much as possible the doses supplied during the planning time window $\mathcal{T} = \{1, \dots, T\}$. We optimize two different figures of merit: the maximum daily workload of the system and the average vaccination time. The first index is measured as the maximum number of doses administered in a single day over the whole planning period $\mathcal{T}$, while the second figure is the average number of days required to serve the booster. In our model, we may distinguish between vaccine types that require one or two doses: the set of available vaccine types is $V = V' \cup V''$. Vaccine types in $V'$ require a single dose, while vaccine types in $V''$ require an additional booster shot. We consider the following decision variables:
– $x^i_t$ and $y^i_t$, $t \in \mathcal{T}$, $i \in V$, indicate the number of first doses and of boosters of vaccine $i$ administered during period $t$;
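– $s^i_t$ is the number of doses of vaccine $i$ remaining in stock at (the end of) period $t$.

We also assume that, in the initial period $t = 1$, a given inventory $s^i_0 \ge 0$ of doses is available in stock and that a given maximum inventory level $\bar{s}^i_F \ge 0$ is required at the end of the planning horizon. We discuss possible feasible values for $\bar{s}^i_F$ later on. The LP model is

\[
\begin{aligned}
\min\ & \alpha f_1(x,y) + (1-\alpha) f_2(y) + p(s) && && \text{(3)}\\
\text{s.t.}\ & x^i_t + y^i_t + s^i_t - s^i_{t-1} = b^i_t && i \in V,\ t \in \mathcal{T} && \text{(4)}\\
& x^i_t + y^i_t \le k^i_t && i \in V,\ t \in \mathcal{T} && \text{(5)}\\
& x^i_t = y^i_{t+\Delta_i} && i \in V'',\ t \in \mathcal{T} && \text{(6)}\\
& x^i_t = 0 && i \in V',\ t \in \mathcal{T} && \text{(7)}\\
& x^i_t,\ y^i_t,\ s^i_t \ge 0 && i \in V,\ t \in \mathcal{T} && \text{(8)}
\end{aligned}
\]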
As we already mentioned, in this model, the objective function (3) is a linear combination of the maximum staff workload, i.e., the maximum number of doses administered in a single day over all the days $t \in \mathcal{T}$, and the average vaccination time, plus a penalty function $p(s)$ which aims at reducing the final level of stocks. The first two quantities can be expressed as:

\[
f_1(x,y) = \max_{t \in \mathcal{T}} \Big\{ \sum_{i \in V} (x^i_t + y^i_t) \Big\}
\tag{9}
\]

\[
f_2(y) = \frac{\sum_{t \in \mathcal{T}} \big( t \sum_{i \in V} y^i_t \big)}{\sum_{t \in \mathcal{T}} \sum_{i \in V} y^i_t}
\tag{10}
\]
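Both expressions can be brought into linear form. One standard way to do so, given here as a sketch rather than as the exact formulation used in the experiments, introduces an auxiliary variable $z$ (a hypothetical name) that bounds the daily workload from above, and replaces the denominator of (10) with the constant $\tfrac{1}{2}\sum_{i \in V} B^i(T)$, i.e., the maximum number of boosters that the supplied doses allow:

```latex
\begin{aligned}
\min\ & \alpha z
      + (1-\alpha)\,
        \frac{\sum_{t \in \mathcal{T}} t \sum_{i \in V} y^i_t}
             {\tfrac{1}{2}\sum_{i \in V} B^i(T)}
      + p(s) \\
\text{s.t.}\ & z \ge \sum_{i \in V} \big( x^i_t + y^i_t \big),
      \qquad t \in \mathcal{T}, \\
& \text{constraints (4)--(8)} .
\end{aligned}
```

Since $z$ is minimized and must dominate every daily total, at an optimum with $\alpha > 0$ it equals the maximum in (9); the second term is linear in $y$ because its denominator no longer depends on the decision variables.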
Parameter $\alpha$ measures the relative importance of the figure expressing the maximum workload with respect to that indicating the average vaccination time. Note that both Expressions (9) and (10) can be linearized in order to obtain an equivalent suitable objective for the above linear program. In particular, observe that in Expression (10) each time period $t$ is “weighted” by the number of people $\sum_i y^i_t$ receiving the final dose at $t$. In the same equation, the denominator is approximated by $\tfrac{1}{2}\sum_{i \in V} B^i(T)$, i.e., the maximum number of boosters that can be administered in the face of certain dose supplies $b^i$, in order to obtain a linear expression. In fact, if the final stocks are $s^i_T = 0$, the two quantities have equal values. The penalty function $p(s)$ is introduced in order to minimize as much as possible the final overall inventory level. Indeed, without such a penalty in the objective, an optimal solution of the LP model would be not to administer any dose (i.e., $x^i_t = y^i_t = 0$ for all $i$ and $t$). A simple choice for the penalty, which we implemented in our experiments, is $p(s) = \varepsilon \sum_{i \in V} s^i_T$, where $\varepsilon$ is chosen small enough in order to prioritize the objectives in Eqs. (9) and (10). Equations (4) are simple continuity constraints expressing the obvious relationship among the variables and the supplied number of doses. Constraints (5) bound the total number of doses administered in each period. Constraints (6)
ensure that the booster is given after the recommended time interval, while constraints (7) refer to the single-dose vaccines (see footnote 4). An alternative to adding a penalty function to the model objective function consists of using an explicit constraint imposing that the final stocks, for every vaccine type $i$, cannot exceed a given (small) quantity $\bar{s}^i_F$. These values can be chosen small enough to guarantee that we are using as much as possible of the supplied doses. In any case, it is clear that

\[
\bar{s}^i_F \ge \max\{0,\ B^i(T) - 2B^i(T-\Delta_i)\}
\tag{11}
\]

should hold. The above model computes an optimal solution of our planning problem under the assumption that the exact amount of supply $b^i_t$ of each vaccine $i$ is perfectly known for each period $t$, i.e., the LP solves a deterministic (or off-line) version of the actual stochastic problem. Such solutions can be used as a benchmark to assess the quality of blind heuristic algorithms providing prescriptive information on the number of doses that can be administered each day without perfect knowledge of the doses supplied in the future. As it is expected that the supply process will become steady and the data about arrival dates trustworthy enough, the above linear programming models could be used as a reliable decision support system. Clearly, these algorithms could also be used in a rolling-horizon fashion, i.e., re-optimizing every day or every $n$ days, taking into account current inventory levels and new estimates of future dose arrivals.

Off-line Heuristic Algorithm. With regard to the off-line version of the problem, in which the supply $b^i_t$ is deterministic and given for all $i \in V$, $t \in \mathcal{T}$, one may ask whether a simpler mechanism than the LP-based one described above would determine an optimal (or close to optimal) solution without resorting to a mathematical program. A greedy strategy seems a viable tool due to the simple continuity relations binding the different quantities together (similar, for instance, to those of the classical lot-sizing problem). Therefore, in [8] we presented a greedy algorithm that disregards capacities, guarantees the maximum consumption of all the supplied doses and tries to schedule the vaccinations as early as possible while keeping inventory levels nonnegative. Note that, if necessary, capacity constraints can be easily taken into account by applying an additional straightforward procedure. We do not report on this heuristic here since the LP optimization model outperforms it, though at an additional computational cost. For those instances in which this additional cost becomes substantial, the Greedy algorithm can be a useful alternative to the linear optimization model to provide a benchmark for assessing the performance of short-sighted planning procedures, which are illustrated in the next section.

Footnote 4: While we showed that single-dose vaccines might be easily included in our models, due to the scarcity of data about this type of immunization, in the remainder of the paper we only present algorithms and experiments concerning two-dose vaccines.
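The lower bound in (11) depends only on the arrival series of a vaccine and on its inter-dose interval, so it can be computed before any optimization is run. The sketch below does this for a single vaccine; the function names and the 0-based list layout (entry `b[t-1]` holds the arrivals of day $t$) are assumptions made for illustration.

```python
from itertools import accumulate


def cumulative_supply(b):
    """Return [B(0), B(1), ..., B(T)], with B(0) = 0 and B(t) = b_1 + ... + b_t."""
    return [0] + list(accumulate(b))


def min_final_stock(b, delta):
    """Lower bound max{0, B(T) - 2 B(T - delta)} on the final stock, as in (11)."""
    B = cumulative_supply(b)
    T = len(b)
    return max(0, B[T] - 2 * B[T - delta])


def zero_final_stock_possible(b, delta):
    """True when B(T - delta) >= B(T) / 2, i.e. the bound in (11) is zero."""
    return min_final_stock(b, delta) == 0


early = [10, 0, 0, 6, 0]   # most doses arrive early: all of them can be used
late = [4, 0, 0, 10, 0]    # most doses arrive too late to be paired with a booster
bound_early = min_final_stock(early, delta=2)
bound_late = min_final_stock(late, delta=2)
```

In the second series, ten doses arrive after day $T - \Delta_i$, so at most eight doses (four first doses and their boosters) can be used within the horizon and six doses necessarily remain in stock.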
3.3 Short-Sighted Vaccination Planning Approaches
Hereafter we illustrate different greedy approaches to establish the number of doses to administer each day $t$, when no clairvoyance can be assumed on the future supplies. In particular, in this section, we present two different procedures: the first one is denoted as the conservative heuristic, and it always produces feasible vaccination plans, i.e., it guarantees that no out-of-stock would ever occur. The second set of algorithms (called q-days-ahead heuristics) is based on the idea that the solution must be feasible at least in the next $q$ days. In the following, we assume that the number of doses $b^i_t$ supplied on day $t$ is known at the beginning of day $t$ and can be used to compute the plan for the same day, for each vaccine $i$. The pseudo-code of the conservative algorithm is reported in Algorithm 1. As already mentioned, this procedure guarantees that an adequate level of inventory is always available for boosters, assuming that no knowledge of the future supplies is provided. The idea is that whenever a number $b$ of doses becomes available at $t$, one can immediately administer $b/2$ first doses and reserve the remainder to administer the corresponding boosters after the prescribed period. The output plan is obtained by augmenting the current solution each time a positive supply of new doses arrives. Capacity constraints that upper-bound the number of doses to be administered are easily taken into account (see Step 5 of the algorithm). Additional details on this procedure can be found in our previous paper [8]. As one may expect, this conservative attitude of the algorithm has its disadvantages in terms of residual inventory and average vaccination times. Moreover, the algorithm does not consider the minimization of the maximum workload.
Algorithm 1. Conservative algorithm
 1: for $i \in V$ do
 2:   Initialize $x$, $y$ and $a$ as null vectors;
 3:   for $t = 1, 2, \dots, T$ do
 4:     $a^i_t := a^i_t + b^i_t$;
 5:     $\delta := \max\{x^i_t + y^i_t + \tfrac{1}{2} a^i_t - k^i_t,\ x^i_{t+\Delta_i} + y^i_{t+\Delta_i} + \tfrac{1}{2} a^i_t - k^i_{t+\Delta_i}\}$; {Excess is computed}
 6:     if $\delta > 0$ then
 7:       $a^i_{t+1} := a^i_{t+1} + \tfrac{1}{2} a^i_t + \delta$;
 8:       $a^i_t := a^i_t - 2\delta$ {Excess is transferred}
 9:     end if
10:     if $a^i_t > 0$ then
11:       $x^i_t := x^i_t + \tfrac{1}{2} a^i_t$;
12:       $y^i_{t+\Delta_i} := y^i_{t+\Delta_i} + \tfrac{1}{2} a^i_t$
13:     end if
14:   end for
15: end for
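The conservative idea can be sketched in a few lines of Python for a single vaccine. The sketch follows the verbal description (half of the available doses go to first doses, the other half is reserved for the matching boosters, and whatever exceeds capacity is carried to the next day) rather than transcribing Algorithm 1 step by step. The function name, the integer rounding and the 0-based day indices are assumptions made for illustration.

```python
def conservative_plan(b, k, delta):
    """Plan first doses x and boosters y for one vaccine with no knowledge of future arrivals.

    b[t]  : doses arriving on day t (0-based), t = 0..T-1
    k[t]  : daily capacity, t = 0..T+delta-1 (boosters may fall after day T-1)
    delta : number of days between first dose and booster
    Returns (x, y, carry), where carry is the stock left uncommitted after day T-1.
    """
    T = len(b)
    x = [0] * (T + delta)
    y = [0] * (T + delta)
    carry = 0  # doses received but not yet committed to a first dose/booster pair
    for t in range(T):
        available = carry + b[t]
        pairs = available // 2  # each new vaccinee needs two doses
        room_today = k[t] - x[t] - y[t]
        room_later = k[t + delta] - x[t + delta] - y[t + delta]
        n = max(0, min(pairs, room_today, room_later))
        x[t] += n
        y[t + delta] += n  # the booster is booked now, so it can never lack a dose
        carry = available - 2 * n  # excess is transferred to the next day
    return x, y, carry


x, y, carry = conservative_plan(b=[10, 0, 0, 0], k=[3] * 6, delta=2)
```

With a capacity of three doses per day, the ten doses arriving on the first day are spread over two days of first doses and the two matching booster days, and no booster is ever left without a dose.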
Different administration strategies are provided by the q-days-ahead heuristics, which are based on a less conservative but simple idea exploiting a partial knowledge of arrivals in the near future. In these algorithms, we assume that the amount of stock at time $t-1$ should be enough to cover the overall demand over the next $q$ days. Considering the (possibly null) supply at time $t$, a sufficient condition not to undergo an out-of-stock in the next $q$ days is $s^i_{t-1} + b^i_t \ge \sum_{\ell=1}^{q} y^i_{t+\ell}$. Clearly, this assumption does not ensure that possible demand after day $t+q$ is covered. By following this policy over a sliding window that is shifted each day $t = 1, \dots, T$, we obtain the procedure sketched in Algorithm 2.

Algorithm 2. q-days-ahead algorithm
1: Initialize $x$, $y$ and $a$ as null vectors;
2: for all $i \in V$ do
3:   for all $t \in \mathcal{T}$ do
4:     $x^i_t := \max\{0, \min\{k^i_t - y^i_t,\ s^i_{t-1} - \sum_{\ell=1}^{q} y^i_{t+\ell} + b^i_t\}\}$;
5:     Update values $s^i_t$ and $y^i_{t+\Delta_i} := x^i_t$;
6:   end for
7: end for
Note that Step 4 indicates the number of first doses that can be safely administered at time $t$ while taking into account capacity constraints on the maximum number $k^i_t$ of doses of vaccine $i$ that can be inoculated on day $t$. Since we can rely on new arrivals to meet the constraint on the stock at the end of the day, we will not be able to safely administer first doses if the following condition holds:

\[
s^i_{t-1} - x^i_{t-\Delta_i} + b^i_t - \sum_{\ell=1}^{q} y^i_{t+\ell} \le 0
\quad\Rightarrow\quad
b^i_t \le \sum_{\ell=0}^{q} y^i_{t+\ell} + x^i_{t-\Delta_i} - s^i_{t-1}.
\tag{12}
\]
The above procedure is a simple myopic algorithm in which the risk of running out of stock decreases as the value of $q$ grows. This intuitive relation is also confirmed experimentally, in the case of ZIP processes, in [8] and in Sect. 4: there, it is shown that the risk of not being able to administer either initial doses or boosters under q-days-ahead strategies, with q = 7, 14, 21, is actually decreasing for increasing values of q and, in particular, it is lower than that suffered in a day-by-day (q = 1) strategy. In fact, though a suitable choice of q avoids the risk of dose shortages with a reasonable degree of confidence, it does not exclude the possibility of such an undesirable situation, which is instead ruled out by the conservative algorithm presented above.
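A minimal single-vaccine sketch of the q-days-ahead rule is given below. It rests on a few assumptions that go beyond Algorithm 2: the boosters due on day $t$ are reserved together with those of the next $q$ days (in the spirit of condition (12)), boosters that cannot be served are only counted and not rescheduled, and all names and the 0-based indexing are hypothetical.

```python
def q_days_ahead_plan(b, k, delta, q, s0=0):
    """Myopic plan for one vaccine: keep enough stock for the boosters due within q days.

    b[t]  : doses arriving on day t (0-based), known at the start of day t
    k[t]  : daily capacity
    delta : days between first dose and booster (assumed >= 1)
    q     : look-ahead horizon in days
    Returns (x, y, shortage, stock); shortage[t] counts boosters due on day t
    that could not be served because the stock had run out.
    """
    T = len(b)
    x = [0] * T
    y = [0] * (T + delta + q + 1)  # padded so that the look-ahead never runs off the end
    shortage = [0] * T
    stock = s0
    for t in range(T):
        stock += b[t]
        reserved = sum(y[t:t + q + 1])  # boosters due today and in the next q days
        x[t] = max(0, min(k[t] - y[t], stock - reserved))
        y[t + delta] = x[t]  # book the matching booster
        served = min(y[t], stock)
        shortage[t] = y[t] - served
        stock -= served + x[t]
    return x, y[:T + delta], shortage, stock


# A single early shipment: every dose is spent on first doses and the boosters fall short.
_, _, short_single, _ = q_days_ahead_plan([10, 0, 0, 0, 0, 0], [100] * 6, delta=2, q=1)
# A second shipment arrives just in time for the boosters.
x, y, short_double, stock = q_days_ahead_plan([10, 0, 10, 0, 0, 0], [100] * 6, delta=2, q=1)
```

The first call illustrates the risk discussed above: the rule relies on future arrivals, so when none materialize the boosters booked for day $t + \Delta_i$ find an empty stock, something the conservative algorithm rules out by construction.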
4 Research Findings
In this section, we describe our computational experience to compare the different strategies described above against the benchmark obtained by linear programming optimization (which embodies the clairvoyant strategy). In all cases, the outcome is a strategy that designs daily vaccination plans over a time horizon of $T$ days. We first define a set of performance metrics and then report the results of our simulation experiments. In the experiments, the capacity values are variable parameters expressed as an integer multiple of a base-step capacity depending on the particular instance at hand. In particular, for all $i \in V$, $c^i = \frac{1}{T} \sum_{t \in W} b^i_t$. The idea is that a system with a base-step capacity $c^i$ for all $i \in V$ would be able to consume all the arrived doses of vaccine $i$ by the end of the planning period only if it delivered doses at its capacity level every single day. The arrivals of doses are identical for all the strategies involved, i.e., represented by a ZIP model where the parameters are respectively $\pi = 0.85$ and $\lambda = 10^7$, as in [8]. We considered 1000 instances of that stochastic model, i.e., 1000 sequences of arrivals over the time horizon $T$, assumed here to be equal to six months, i.e., $T = 180$. However, in order to smooth the effect of doses arriving at the end of the planning period (which therefore could not possibly be administered within the time horizon), the algorithms run over an extended time window of $T + 28$ days, where no supply is provided in the additional periods, thus allowing the remaining stocks to be reduced. We first consider the clairvoyant strategy to examine the maximum range of possibilities that we could exploit if we had full knowledge of the arrival process. As mentioned above, this is the best that we could achieve since we assume to know the complete sequence of arrivals (a hypothesis unrealizable in the real world). The performance metrics that we consider in the following are:
– vaccination time;
– maximum workload;
– out-of-stock days.
The average vaccination time for the clairvoyant strategy is shown in Fig. 2, where we seek to understand the impact of the optimization weight $\alpha$ and the vaccination capacity.
We see that the average vaccination time is constant for nearly the whole range of α. This performance metric is then quite tolerant towards shifts in the optimization weight. However, things change dramatically as the weight is heavily shifted towards minimizing the maximum workload. As the weight exceeds 0.99, the vaccination time explodes. The good news is that we can safely set the optimization weight at or below 0.99 without worsening the vaccination time while probably getting some significant benefit on the maximum workload. In addition, we see that there is no significant reduction of the vaccination time when we increase the daily vaccination capacity over 2c. In contrast, the cost probably increases linearly with that capacity (in order to increase the capacity, we need to staff the vaccination clinics with more nurses, roughly proportionally to the desired capacity). Fixing, for the sake of simplicity, the optimization weight as α = 0.9, we can see what happens with heuristic strategies in Fig. 3, where we have plotted the average vaccination time vs the normalized capacity (i.e., the vaccination capacity divided by c). As expected, the vaccination time decreases with the
Fig. 2. Average vaccination time under objective trade-off in the clairvoyant case
capacity for all strategies, with a knee located where the capacity equals 2c. There is an exception, though, represented by the conservative strategy, where the average vaccination time keeps decreasing smoothly even when the daily capacity is a large multiple of c. The distance between the benchmark and the q-days-ahead strategies reduces as the capacity grows, but, again, the reduction is probably not worthwhile beyond 2c. As to the choice of q in those strategies, we see that the vaccination time keeps reducing as we decrease q. The penalty in the vaccination time with respect to the benchmark reaches 41.5% when we plan for 21 days ahead but reduces to 10.8% when we plan for one day ahead only. The other metric we have considered in our combined objective function is the maximum workload, i.e., the maximum number of doses administered in a day over the whole period. Since we aim to distribute the workload as uniformly as possible over the time horizon to achieve the optimal balance, we want this metric to be as low as possible and as uniform as possible at the same time. A large maximum workload implies some peak days when we are vaccinating exceedingly many people. However, the maximum alone could hide situations where the number of daily vaccinations fluctuates wildly, with undesired implications on the stability of the staff (or, in other terms, the actual utilization rate). On the practical side, an unstable staff calls for continuously adjusting the number of nurses assigned to the vaccination clinic, a task that may prove quite difficult. Also, we could end up with a large stock of unused vaccine doses at the end of the period. The alternative is to size vaccination clinics according to the maximum workload, ending with days where the staff is oversized and poorly utilized. For the first aspect (the absolute maximum), we must consider both the maximum workload and the stocks at the end of the period. We can first take
Fig. 3. Average vaccination time: comparison of heuristic strategies
a look at Fig. 4, where we have reported the maximum workload normalized to the capacity, which is the allocated staff. We see that for the sensible capacity values (we have assumed that we would not go further than 2c), the maximum workload is very close to 100% (actually higher than 99%) for all the strategies but the conservative one. For the latter, the maximum workload is 95%. The straightforward conclusion would be to consider the conservative strategy as the best one. However, if we take a look at Fig. 5, we see that the conservative strategy terminates the period with the largest stock of unused doses. The combination of the two criteria (maximum workload and end-of-period stocks) leads us to consider the look-ahead strategies as the preferred ones, since their maximum workload is not much larger than the conservative strategy’s (with no significant differences among the different look-ahead horizons) but they end up with considerably lower stocks (the shorter the look-ahead horizon, the lower the stocks). As to the second aspect, we can measure the dispersion of the maximum workload through its coefficient of variation, i.e., its standard deviation normalized to its mean value over the 1000 instances. We would aim at a zero coefficient of variation, which means a perfectly uniform utilization of the nursing staff over time. In Fig. 6, the dispersion is not very large, ranging from roughly 0.24 to 0.29. The utilization of the staff appears quite balanced over time, with no significant variations among the strategies. Finally, we consider the number of out-of-stock days. Any greedy approach aiming at employing all the available doses as fast as possible runs the risk of incurring out-of-stock days, where, due to the irregularity of dose arrivals, there are no more first doses available to administer. The out-of-stock days result in partially idle staff since they only administer boosters. In Fig. 7, we see that the percentage of days when we run out of stock may be pretty large if we operate
Fig. 4. Maximum workload: comparison of heuristic strategies (α = 0.9)
Fig. 5. End-of-period stocks (α = 0.9)
on a day-by-day basis while planning over a longer time horizon smooths out the irregularity of arrivals. We also see the negative side of increasing the capacity, which allows exploiting the peaks of arrivals at the expense of idling during troughs.
Fig. 6. Maximum workload balance: comparison of heuristic strategies (α = 0.9)
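The performance metrics used in this section can be computed from a plan in a few lines. The sketch below is only illustrative: the function names are hypothetical, the plan is assumed to be given as two equally long lists of daily first doses `x` and boosters `y` (already summed over vaccine types), the population standard deviation is assumed for the coefficient of variation, and a day is counted as out-of-stock whenever no first dose is administered, which is a simplifying proxy.

```python
from statistics import mean, pstdev


def max_workload(x, y):
    """Largest number of doses (first doses plus boosters) administered in one day."""
    return max(first + booster for first, booster in zip(x, y))


def average_vaccination_time(y):
    """Booster-weighted average day, as in Expression (10); days are numbered from 1."""
    return sum(t * n for t, n in enumerate(y, start=1)) / sum(y)


def coefficient_of_variation(values):
    """Standard deviation normalized to the mean, e.g. over simulated instances."""
    return pstdev(values) / mean(values)


def out_of_stock_share(x):
    """Share of days on which no first dose is administered (assumed proxy)."""
    return sum(1 for n in x if n == 0) / len(x)


x = [3, 2, 0, 0]
y = [0, 0, 3, 2]
peak = max_workload(x, y)
avg_time = average_vaccination_time(y)
idle_share = out_of_stock_share(x)
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```

In an experiment such as the one described above, `coefficient_of_variation` would be applied to the list of maximum workloads collected over the simulated instances.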
5 Discussion of Findings
In Sect. 4, we have shown the performance metrics of some heuristic strategies, comparing them with the clairvoyant strategy. Is there a best heuristic strategy among those we have considered, based on those results? Before commenting on those results, we can safely state that we should not oversize the staff. A capacity above 2c (i.e., twice the staff needed to administer the daily average number of doses) does not bring any significant advantages: the average vaccination time reduces by ten percentage points at most when we increase the capacity to 6c (which roughly means trebling the cost with respect to the 2c size). Having set the capacity to 2c, we observe that q-days-ahead strategies with a low-to-moderate look-ahead horizon perform better than the conservative strategy. In particular, the 1-day-ahead strategy achieves the minimum average vaccination time, the minimum amount of end-of-period stocks, and a maximum workload and workload standard deviation either in line with or better than the other strategies (as long as we stick to the 2c capacity). Instead, if we wish to reduce the number of out-of-stock days, we have to employ a longer look-ahead horizon.
Fig. 7. Out-of-stock days: comparison of heuristic strategies
6 Conclusion
Our study suggests a close-to-optimal way of planning vaccine booster administration in the face of delivery uncertainties. Vaccination still seems to be at the centre of COVID-containment campaigns, with a massive effort in communication strategies [37]. It appears then that planning a mass vaccination will still be a critical task in the foreseeable future. Though our study was conceived at a time when the second dose was being planned, we are already in the stage where a third dose (second booster) is being administered in Europe and many other countries [38]. It must then be noted that our study can be easily extended to cater for further boosters, which now looks like a likely occurrence. In addition, though deliveries have somewhat stabilized, they are still subject to uncertainties, and the weekly delivery may be delayed. Moreover, the occurrence of waves in contagion figures, typically related to the appearance of new variants of the virus, leads to sudden bumps in the demand, with which delivery procedures struggle to cope. We may therefore assume that significant uncertainties will stay with us. The next research threads we are considering concern: a) refining the arrival model (exploiting the mass of data that are now being accumulated); b) adding degrees of freedom in the parameters (such as the range of vaccines, their interchangeability, and the inter-booster time); c) dealing with the problem at different spatial scales, e.g. down to the regional level or even at the level of metropolitan vaccination clinics. In addition to those refinements, we would like to consider alternative approaches. An interesting perspective is offered by looking at the problem of dealing with arrival uncertainties from an algorithmic-game-theoretic point of view. Following a consolidated approach (e.g., [39,40]), one may consider that vaccine doses are supplied by an
opponent, and the decision-maker has to provide the best feasible response, i.e., a plan which rationally reacts to any possible undesirable scenario. A similar methodological flavour is that of Stackelberg games (e.g., [41,42]), which therefore provide a natural direction to deal with the above-mentioned uncertainty issues.
References

1. Yesudhas, D., Srivastava, A., Gromiha, M.M.: COVID-19 outbreak: history, mechanism, transmission, structural studies and therapeutics. Infection 49(2), 199–213 (2020). https://doi.org/10.1007/s15010-020-01516-2
2. Abbasi, K.: COVID-19: why prioritising prevention matters in a pandemic of cures. BMJ Br. Med. J. 373(1275) (2021). https://doi.org/10.1136/bmj.n1275
3. Dai, H., Han, J., Lichtfouse, E.: Smarter cures to combat COVID-19 and future pathogens: a review. Environ. Chem. Lett. 19(4), 2759–2771 (2021). https://doi.org/10.1007/s10311-021-01224-9
4. Mallapaty, S.: Can COVID vaccines stop transmission? Scientists race to find answers. Nature (2021). https://doi.org/10.1038/d41586-021-00450-z
5. Desmond, A., Offit, P.A.: On the shoulders of giants—from Jenner’s cowpox to mRNA COVID vaccines. New Engl. J. Med. 384(12), 1081–1083 (2021). https://doi.org/10.1056/NEJMp2034334
6. Wouters, O.J., et al.: Challenges in ensuring global access to COVID-19 vaccines: production, affordability, allocation, and deployment. Lancet 397(10278), 1023–1034 (2021). https://doi.org/10.1016/S0140-6736(21)00306-8
7. Feinmann, J.: COVID-19: global vaccine production is a mess and shortages are down to more than just hoarding. BMJ Br. Med. J. (Online) 375, 2375 (2021). https://doi.org/10.1136/bmj.n2375
8. Foderaro, S., Naldi, M., Nicosia, G., Pacifici, A.: Mass vaccine administration under supply uncertainty. In: 2021 16th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 393–402 (2021). https://doi.org/10.15439/2021F78
9. Naldi, M., Nicosia, G., Pacifici, A., Pferschy, U.: Profit-fairness trade-off in project selection. Socio Econ. Plan. Sci. 67, 133–146 (2019). https://doi.org/10.1016/j.seps.2018.10.007
10. Iskander, J., Strikas, R.A., Gensheimer, K.F., Cox, N.J., Redd, S.C.: Pandemic influenza planning, United States, 1978–2008. Emerg. Infect. Dis. 19(6), 879–885 (2013). https://doi.org/10.3201/eid1906.121478
11. Fedson, D.S.: Preparing for pandemic vaccination: an international policy agenda for vaccine development. J. Public Health Policy 26(1), 4–29 (2005). https://doi.org/10.1057/palgrave.jphp.3200008
12. Gostin, L.O.: Pandemic influenza: public health preparedness for the next global health emergency. J. Law Med. Ethics 32(4), 565–573 (2004). https://doi.org/10.1111/j.1748-720x.2004.tb01962.x
13. Heymann, D.L., Aylward, R.B.: Mass vaccination: when and why. In: Plotkin, S.A. (ed.) Mass Vaccination: Global Aspects — Progress and Obstacles. CT MICROBIOLOGY, vol. 304, pp. 1–16. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-36583-4_1
14. Bagcchi, S.: The world’s largest COVID-19 vaccination campaign. Lancet Infect. Dis. 21(3), 323 (2021). https://doi.org/10.1016/S1473-3099(21)00081-5
15. Sah, R., et al.: AZD1222 (Covishield) vaccination for COVID-19: experiences, challenges and solutions in Nepal. Travel Med. Infect. Dis. 40(101989) (2021). https://doi.org/10.1016/j.tmaid.2021.101989
16. Mathieu, E., et al.: A global database of COVID-19 vaccinations. Nat. Hum. Behav. 5(7), 1–7 (2021). https://doi.org/10.1038/s41562-021-01122-8
17. Rambhia, K.J., Watson, M., Sell, T.K., Waldhorn, R., Toner, E.: Mass vaccination for the 2009 H1N1 pandemic: approaches, challenges, and recommendations. Biosecurity Bioterrorism Biodefense Strategy Pract. Sci. 8(4), 321–330 (2010). https://doi.org/10.1089/bsp.2010.0043
18. Hessel, L., et al.: Pandemic influenza vaccines: meeting the supply, distribution and deployment challenges. Influenza Other Respir. Viruses 3(4), 165–170 (2009). https://doi.org/10.1111/j.1750-2659.2009.00085.x
19. Harris, P., Moss, D.: Managing through the COVID second wave: public affairs and the challenge of COVID vaccination. J. Public Affairs 21(e2642) (2021)
20. De Boeck, K., Decouttere, C., Vandaele, N.: Vaccine distribution chains in low- and middle-income countries: a literature review. Omega 97, 102097 (2020). https://doi.org/10.1016/j.omega.2019.08.004
21. Yen, C., et al.: The development of global vaccine stockpiles. Lancet Infect. Dis. 15(3), 340–347 (2015). https://doi.org/10.1016/S1473-3099(14)70999-5
22. Al Khalaf, R., Alfonsi, T., Ceri, S., Bernasconi, A.: CoV2K: a knowledge base of SARS-CoV-2 variant impacts. In: Cherfi, S., Perini, A., Nurcan, S. (eds.) RCIS 2021. LNBIP, vol. 415, pp. 274–282. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75018-3_18
23. Rosen, B., Waitzberg, R., Israeli, A.: Israel’s rapid rollout of vaccinations for COVID-19. Israel J. Health Policy Res. 10(1), 6 (2021). https://doi.org/10.1186/s13584-021-00440-6
24. Liapis, A., et al.: A position paper on improving preparedness and response of health services in major crises. In: Bellamine Ben Saoud, N., Adam, C., Hanachi, C. (eds.) ISCRAM-med 2015. LNBIP, vol. 233, pp. 205–216. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24399-3_18
25. Yaylali, E., Ivy, J.S., Taheri, J.: Systems engineering methods for enhancing the value stream in public health preparedness: the role of Markov models, simulation, and optimization. Public Health Rep. 129(6_suppl4), 145–153 (2014). https://doi.org/10.1177/00333549141296s419
26. Aaby, K., Herrmann, J.W., Jordan, C.S., Treadwell, M., Wood, K.: Montgomery county’s public health service uses operations research to plan emergency mass dispensing and vaccination clinics. Interfaces 36(6), 569–579 (2006). https://doi.org/10.1287/inte.1060.0229
27. Lee, E.K., Maheshwary, S., Mason, J., Glisson, W.: Decision support system for mass dispensing of medications for infectious disease outbreaks and bioterrorist attacks. Ann. Oper. Res. 148(1), 25–53 (2006). https://doi.org/10.1007/s10479-006-0087-7
28. Bertsimas, D., et al.: Optimizing vaccine allocation to combat the COVID-19 pandemic. medRxiv (2020). https://doi.org/10.1101/2020.11.17.20233213
29. Bertsimas, D., Digalakis, V., Jr., Jacquillat, A., Li, M.L., Previero, A.: Where to locate COVID-19 mass vaccination facilities? Naval Res. Logistics (NRL) 69(2), 179–200 (2021). https://doi.org/10.1002/nav.22007
30. Karczmarczyk, A., Wątróbski, J., Jankowski, J.: Multi-criteria approach to planning of information spreading processes focused on their initialization with the use of sequential seeding. In: Ziemba, E. (ed.) AITM/ISM -2019. LNBIP, vol. 380, pp. 116–134. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43353-6_7
31. Graham, B.S.: Rapid COVID-19 vaccine development. Science 368(6494), 945–946 (2020). https://doi.org/10.1126/science.abb8923
32. Chen, W.-H., Strych, U., Hotez, P.J., Bottazzi, M.E.: The SARS-CoV-2 vaccine pipeline: an overview. Curr. Trop. Med. Rep. 7(2), 61–64 (2020). https://doi.org/10.1007/s40475-020-00201-6
33. Rosa, S.S., Prazeres, D.M., Azevedo, A.M., Marques, M.P.: mRNA vaccines manufacturing: challenges and bottlenecks. Vaccine 39(16), 2190–2200 (2021). https://doi.org/10.1016/j.vaccine.2021.03.038
34. Naldi, M., Salaris, C.: Rank-size distribution of teletraffic and customers over a wide area network. Eur. Trans. Telecommun. 17(4), 415–421 (2006). https://doi.org/10.1002/ett.1084
35. Naldi, M.: A probability model for the size of investment projects. In: 2015 IEEE European Modelling Symposium (EMS), pp. 169–173. IEEE (2015). https://doi.org/10.1109/EMS.2015.35
36. Beckett, S., et al.: Zero-inflated Poisson (ZIP) distribution: parameter estimation and applications to model data from natural calamities. Involve J. Math. 7(6), 751–767 (2014). https://doi.org/10.2140/involve.2014.7.751
37. Motta, M., Sylvester, S., Callaghan, T., Lunz-Trujillo, K.: Encouraging COVID-19 vaccine uptake through effective health communication. Front. Polit. Sci. 3, 1 (2021). https://doi.org/10.3389/fpos.2021.630133
38. Barda, N., et al.: Effectiveness of a third dose of the BNT162b2 mRNA COVID-19 vaccine for preventing severe outcomes in Israel: an observational study. Lancet 398(10316), 2093–2100 (2021). https://doi.org/10.1016/S0140-6736(21)02249-2
39. Marini, C., Nicosia, G., Pacifici, A., Pferschy, U.: Strategies in competing subset selection. Ann. Oper. Res. 207(1), 181–200 (2013). https://doi.org/10.1007/s10479-011-1057-2
40. Nicosia, G., Pacifici, A., Pferschy, U.: Competitive subset selection with two agents. Discrete Appl. Math. 159(16), 1865–1877 (2011). https://doi.org/10.1016/j.dam.2010.11.011
41. Pferschy, U., Nicosia, G., Pacifici, A.: A Stackelberg knapsack game with weight control. Theoret. Comput. Sci. 799, 149–159 (2019). https://doi.org/10.1016/j.tcs.2019.10.007
42. Pferschy, U., Nicosia, G., Pacifici, A., Schauer, J.: On the Stackelberg knapsack game. Eur. J. Oper. Res. 291(1), 18–31 (2021). https://doi.org/10.1016/j.ejor.2020.09.007
Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset

Federica Rollo, Giovanni Bonisoli, and Laura Po

Enzo Ferrari Engineering Department, University of Modena and Reggio Emilia, Modena, Italy
{federica.rollo,giovanni.bonisoli,laura.po}@unimore.it
Abstract. The automatic categorization of crime news is useful for creating statistics on the types of crime occurring in a certain area. This task can be treated as a text categorization problem. Several studies have shown that the use of word embeddings improves outcomes in many Natural Language Processing (NLP) tasks, including text categorization. The aim of this paper is to explore the use of word embeddings for Italian crime news text categorization. The approach followed is to compare different document pre-processing configurations, Word2Vec models and methods to obtain word embeddings, including the extraction of bigrams and keyphrases. Then, supervised and unsupervised Machine Learning categorization algorithms have been applied and compared. In addition, the imbalance of the input dataset has been addressed by using the Synthetic Minority Oversampling Technique (SMOTE) to oversample the elements in the minority classes. Experiments conducted on an Italian dataset of 17,500 crime news articles collected from 2011 to 2021 show very promising results. The supervised categorization has proven to be better than the unsupervised categorization, exceeding 80% in both precision and recall and reaching an accuracy of 0.86. Furthermore, lemmatization, bigrams and keyphrase extraction are not so decisive. Finally, the availability of our model on GitHub, together with the code we used to extract word embeddings, allows our approach to be replicated on other corpora, either in Italian or in other languages.

Keywords: Text categorization · Word embeddings · Word2Vec · Crime category · Keyphrase extraction

1 Introduction
© Springer Nature Switzerland AG 2022
E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 117–139, 2022. https://doi.org/10.1007/978-3-030-98997-2_6

The categorization of news articles consists of understanding the topic of the articles and associating each of them with a category. In the case of news articles related to crimes, the aim is to identify the type of crime (crime categorization). This task is important for many reasons. The first one is the need to create statistics on the type of events. Indeed, categorization allows understanding how often a certain type of crime occurs [1]. Secondly, categorization enables further processing that is within the scope of crime analysis. From each news article, it is possible to retrieve detailed information about the event it reports: the place, the author of the crime, the victim [2]. If we know the type of crime, we can also look for information specific to that crime type, e.g., the stolen items in a theft. Analyzing crime news articles also allows studying how exposure to crime news content is associated with perceived social trust [3]. Moreover, Machine Learning approaches can help crime analysts to identify connected events and to generate alerts and predictions that lead to better decision-making and optimized actions [4].

In this paper, we introduce an approach to perform crime categorization on Italian news articles based on word embeddings. This work also addresses the imbalance of the input dataset by using the Synthetic Minority Oversampling Technique (SMOTE) [5] to oversample the elements in the minority classes. This paper extends the work done in [6] in several respects:
– bigram and keyphrase extraction has been added to the pre-processing for the extraction of document embeddings,
– experiments have been conducted on a bigger dataset and, consequently, the Word2Vec model has been trained on the new dataset,
– both supervised and unsupervised algorithms have been applied to the whole dataset of 17,500 instances, while in the previous paper unsupervised categorization was done only on 200 instances of each category,
– the silhouette coefficient has also been taken into account for the clustering evaluation, and precision, recall and accuracy have been averaged considering the number of news articles in each category because of the imbalance of the dataset.
In addition, we release a new Word2Vec model along with the code used to extract the document embeddings and train some supervised and unsupervised algorithms (code available at: https://github.com/SemanticFun/Word2Vec-for-text-categorization/). This could be useful for other researchers to replicate our experiments on a different dataset.

The rest of the paper is organized as follows. Section 2 reviews the related literature. The general approach is described in Sect. 3, focusing on the document vector extraction and the application of crime categorization algorithms. Section 4 details the experiments of crime categorization, which is performed on Italian news articles using both supervised and unsupervised techniques and different pre-processing phases, and discusses the obtained results. Section 5 is dedicated to conclusions.

2 Literature Review

Crime analysis is a set of systematic, analytical processes for providing timely and pertinent information relative to crime patterns and trend correlations to assist the police in crime reduction, prevention, and evaluation. If police reports are made public, they can be analyzed and geolocalized for the above-mentioned purposes. However, police reports are usually private documents. In addition, even when they are made public, the delay between the occurrence of the event and the publication of the report can range from days to months or even years. Therefore, police reports cannot be considered a source for timely crime analysis for citizens. In those cases, newspapers are a valuable source of authentic and timely information [7]. In Italy, police crime reports are not available to citizens; only some aggregated analyses are published yearly. Indeed, several works concerning crime analysis exploit news articles [7–10]. Detailed information about the crime events can be extracted by applying Natural Language Processing (NLP) techniques to the text of the news articles.

The task of assigning a news article to a crime category can be addressed following several approaches, such as text classification, community detection or topic detection [11–14]. In this work, we model this problem as a text classification task, which consists of automatically assigning text documents to one of the predefined categories. Given the information overload, this is a well-proven way to organize free document sources, improve browsing, or identify related content by tagging content or products based on their categories. Newspaper articles are part of the increasing volume of textual data generated every day, together with company data, healthcare data, social network content, and others. News categorization can be useful to organize articles by topic for generating statistics [15] or to detect fake news [16–18].

Automatic text classification has been widely studied since the 1960s, and over the years different types of approaches to this problem have arisen.
Recent surveys [19,20] mainly distinguish between two categories of approaches: conventional methods (or shallow learning methods) and deep learning-based methods. Conventional methods are those that need a pre-processing step to transform the raw text input into flat features which can be fed to a Machine Learning model. In the literature, there are several feature extraction techniques, such as term frequency (TF), term frequency-inverse document frequency (TF-IDF), N-grams, Bag-of-words and word embeddings. Among these, word embedding is one of the most recent text representations and is swiftly growing in popularity. A word embedding is a continuous vector representation of a word that encodes its meaning, such that words that are closer in the vector space are expected to be similar in meaning. The use of word embeddings as additional features improves the performance in many NLP tasks, including text classification [21–29]. Different Machine Learning algorithms can be trained to derive these vectors, such as Word2Vec [30], FastText [31], and GloVe [32].

In the last decade, deep neural networks have been surpassing state-of-the-art results in many fields, including Natural Language Processing. This success relies on their capacity to model complex and non-linear relationships within data. This has led to the increasing development of deep learning-based methods also in text classification. They exploit many of the best-known deep learning architectures, such as CNNs [33–35], RNNs [36,37], LSTMs [38–40] and the most recent Transformers [41,42]. Unlike conventional methods, they do not require rules and features to be designed by humans, since they automatically provide semantically meaningful representations. These advantages come at the cost of considerable complexity and computational expense.

Many of the works regarding news categorization fall into the category of the conventional methods mentioned above. Keyword extraction, term frequency, document frequency, TF-IDF, and POS tagging are mainly used as feature extraction methods, along with traditional Machine Learning models as classification methods, such as Naive Bayes, Decision Tree, Support Vector Machine or K-Nearest Neighbour [43–45]. There are some examples also for the Italian language [46–48]. However, none of them exploit word embeddings for feature extraction. In [49,50] two multi-label classification approaches are described; in these works, the feature extraction methods leverage topic modeling through Latent Dirichlet Allocation, which has the advantage of reducing the dimensionality of the text before the features are extracted. More recent works include the use of deep learning architectures, in particular CNNs [51–53] and BERT [54,55].

In the literature, there are very few works on the categorization of crime news articles. The authors of [56] proposed an approach for classifying Thai online crime news that uses TF-IDF as the feature extraction method and tested six different Machine Learning algorithms for classification. The classifiers with the best results are Support Vector Machine and Multinomial Naive Bayes, which reach an F-measure of around 80%. In [39] better results (98.87% accuracy) are achieved by using an LSTM to classify Spanish news texts, deriving the text representation from a pre-trained Spanish Word2Vec model.
From the above-reviewed literature and to the best of our knowledge, there are no works devoted to developing methods for the automatic classification of criminal and violent activities from documents written in Italian.
3 Research Methodology

The general procedure consists of using word embeddings (also referred to as word vectors) to assign a document vector to each news article. Then, the document vectors are exploited by a categorization algorithm to assign a category to each news article. We use Word2Vec as the word embedding model and perform categorization through both supervised and unsupervised algorithms.

3.1 Document Vector Extraction
To start with, the text of the news articles is pre-processed in consecutive phases, as illustrated in Fig. 1. The first phase is tokenization (Point 1), which returns the list of the words that are present in the text. The stop word removal phase (Point 2), a commonly used technique before performing NLP tasks, then removes the stop words (e.g., articles, prepositions, conjunctions) from this list, since they usually occur many times in texts and do not provide any relevant information. The result is a list of the most relevant words that are present in the text. Then, lemmatization (Point 3) is applied to replace the words in the list with their lemmas. At the end of these phases, the final result is a list of meaningful tokens for every news article (Point 4).
Fig. 1. Document vector extraction.
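The first pre-processing phases of Fig. 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list is a deliberately tiny, hypothetical one, the function names are invented for illustration, and lemmatization is left as an optional callable because the text does not name the lemmatizer used.

```python
import re

# Hypothetical, deliberately tiny Italian stop-word list (illustration only).
STOP_WORDS = {"il", "lo", "la", "i", "gli", "le", "un", "una", "di", "a", "da",
              "in", "con", "su", "per", "e", "o", "che", "ha", "nel", "del"}


def tokenize(text):
    """Point 1: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Point 2: drop the tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text, lemmatize=None):
    """Run Points 1-3; `lemmatize` is an optional token -> lemma callable."""
    tokens = remove_stop_words(tokenize(text))
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    return tokens


tokens = preprocess("Il ladro ha rubato una bicicletta nel centro di Modena")
print(tokens)
```

Passing `lemmatize=None` corresponds to a pipeline without lemmatization, while supplying a lemmatizer adds the third phase.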
In addition, bigrams and keyphrases are extracted to identify the most frequent sequences of two adjacent words (bigrams) (Point 5) and the most relevant expressions that can contain multiple words (keyphrases) (Point 6). The bigrams are extracted from the list of tokens after removing the stop words, considering a minimum number of co-occurrences of the two words (min_count) and a threshold on the score [57] obtained by the following formula:

\[ \text{score} = \frac{L \cdot (\text{bigram\_count} - \text{min\_count})}{\text{count}(X) \cdot \text{count}(Y)} \]

where L is the number of unique tokens in the text of the news article, bigram_count is the number of occurrences of the bigram, and X and Y are the two words of the candidate bigram. The keyphrases are identified in the text of the news articles by using the RAKE (Rapid Automatic Keyword Extraction) algorithm [58], which is an unsupervised and domain-independent method. Both bigrams and keyphrases are added to the list of tokens (Point 4). The news articles with an empty list of tokens are removed.

Then, each token is replaced by its corresponding word embedding using a trained Word2Vec model (Points 7 and 8). Word2Vec is based on a shallow neural network whose input data are generated by a window sliding over the text of the training corpus. This window selects a context within which it chooses a target to obscure and predict based on the rest of the selected context. Through this “fake task”, the internal parameters of the network are learned; these constitute the word embeddings, which are the real objective of training. If a token in the list is not found in the vocabulary of the model, it is simply discarded from the list without any replacement.

Subsequently, an aggregation function is applied to the obtained word embeddings to get the document vector of each news article (Point 9). As the authors of [28] suggest, two vector representations can be extracted:
A1 the simple average of the word vectors,
A2 the average of the word vectors weighted by the TF-IDF score of each word, computed on the text of the news articles in the dataset. This representation gives more importance to the vectors of words that have a high frequency in the text of a news article and a low frequency in the others.
The obtained document vectors (Point 10) are the input data for any categorization algorithm.
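The bigram score and the two aggregation functions A1 and A2 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the released code: the function names are hypothetical, the embeddings are a plain dictionary standing in for a trained Word2Vec model, and the TF-IDF weights are assumed to be precomputed and passed in as a token-to-weight dictionary.

```python
def bigram_score(bigram_count, count_x, count_y, vocab_size, min_count):
    """Score of a candidate bigram (X, Y); `vocab_size` plays the role of L."""
    return vocab_size * (bigram_count - min_count) / (count_x * count_y)


def simple_average(vectors):
    """A1: component-wise mean of the word vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def weighted_average(vectors, weights):
    """A2: component-wise mean of the word vectors with one weight per vector."""
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, column)) / total
            for column in zip(*vectors)]


def document_vector(tokens, embeddings, weights=None):
    """Aggregate the embeddings of the in-vocabulary tokens of one article.

    `embeddings` maps token -> vector; `weights` (assumed precomputed TF-IDF
    scores) maps token -> weight. Out-of-vocabulary tokens are discarded.
    """
    known = [t for t in tokens if t in embeddings]
    if not known:
        return None  # no usable token for this article
    vectors = [embeddings[t] for t in known]
    if weights is None:
        return simple_average(vectors)
    return weighted_average(vectors, [weights[t] for t in known])


# Toy two-dimensional embeddings (illustrative values only).
toy_embeddings = {"furto": [1.0, 2.0], "bicicletta": [3.0, 4.0]}
toy_tokens = ["furto", "bicicletta", "sconosciuta"]
print(document_vector(toy_tokens, toy_embeddings))
print(document_vector(toy_tokens, toy_embeddings, {"furto": 1.0, "bicicletta": 3.0}))
```

In the weighted case, a token with a larger weight pulls the document vector towards its own embedding, which is the effect described for A2.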
3.2 News Categorization
After obtaining the document vector of each news article in the dataset, several algorithms can be used to identify the category each news article belongs to. Both supervised and unsupervised techniques can be taken into account. Supervised text categorization algorithms predict the topic of a document within a predefined set of categories, named labels. In our case, the labels are the crime categories listed in Sect. 4.1 and the documents are the texts of the crime news articles, which are represented by the document vectors. Unsupervised text categorization, also known as clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in the other groups. The use of clustering for crime categorization consists of feeding the obtained document vectors into an algorithm and checking whether the final clusters correspond to the crime categories listed in Sect. 4.1.

As suggested by the authors of [59], the Synthetic Minority Oversampling Technique (SMOTE) [5] is employed to address the imbalance of the input dataset. The approach is to oversample the elements of the minority classes. Starting from an imbalanced dataset, this technique creates new samples for the under-represented classes so that each of them reaches the number of elements of the most represented category. The algorithm works in the feature space, so the new points do not correspond to real data. SMOTE first selects a minority class instance a at random and finds its k nearest minority class neighbors. The synthetic instance is then created by choosing one of the k nearest neighbors b at random and connecting a and b to form a line segment in the feature space. The synthetic instances are generated as a convex combination of the two chosen instances a and b.
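The SMOTE interpolation step described above can be sketched in a few lines. This is a simplified illustration, not the implementation used in the experiments: the function names, the default k and the fixed seed are assumptions, and the minority class must contain at least two points.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote_sample(minority, k, rng):
    """Create one synthetic point from a list of minority-class vectors."""
    a = rng.choice(minority)                      # random minority instance
    others = [p for p in minority if p is not a]
    b = rng.choice(k_nearest(a, others, k))       # one of its k nearest neighbors
    lam = rng.random()                            # position on the segment a-b
    return [x + lam * (y - x) for x, y in zip(a, b)]


def oversample(minority, target_size, k=5, seed=0):
    """Add synthetic points until the class reaches `target_size` elements."""
    rng = random.Random(seed)
    missing = max(0, target_size - len(minority))
    synthetic = [smote_sample(minority, k, rng) for _ in range(missing)]
    return minority + synthetic


minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced = oversample(minority_class, target_size=10, k=2)
print(len(balanced))
```

Each synthetic point is `a + lam * (b - a)` with `lam` in [0, 1), i.e., a convex combination of two real minority instances, so it always lies on the segment joining them.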
4 Research Findings and Discussion

This section presents the Italian Crime News dataset used in the experiments (Sect. 4.1) and the setup of our experiments (Sect. 4.2), and describes the metrics used for the evaluation (Sect. 4.3). In Sects. 4.4 and 4.5 we discuss the results obtained with the supervised and unsupervised algorithms, respectively.
4.1 Italian Crime News Dataset
The texts to categorize are extracted from an Italian dataset of crime news articles. The information about the news articles is collected by the Crime Ingestion App [10], a Java application that extracts, geolocalizes and deduplicates crime-related news articles from two online newspapers of the province of Modena, in Italy: “ModenaToday” (https://www.modenatoday.it/) and “Gazzetta di Modena” (https://gazzettadimodena.gelocal.it/modena). The selection of these newspapers is motivated by their popularity. They publish on average 850 news articles per year related to crimes in the Modena province. There are three other minor online newspapers; however, integrating them would not substantially change the results, since they cover just 5% of the total news articles we already collect, and their news articles are in almost all cases duplicates of the news reported by the two main newspapers.

The data extracted from the newspapers include the URL of the web page containing the news article, the title of the news article, the sub-title, the text, the information related to the place where the crime occurred (municipality, area, and address), the publication datetime, i.e., the date and time of publication of the news article, and the event datetime, which refers to the date of the crime event. Part of these data is automatically extracted from the web page of the news article; the rest is identified by applying NLP techniques to the text of the news article.

Besides, the newspapers we consider already classify news articles according to the crime type (this classification is done manually by the journalist who authored the news article). Each news article is assigned to a specific crime category. The list of categories we elaborated is based on two lists of crimes: the annual crime reports of the Italian National Institute of Statistics (ISTAT) and the data of the Italian Department of Public Security of the Ministry of the Interior (published by Sole24Ore, https://lab24.ilsole24ore.com/indice-della-criminalita/?Modena).

The ISTAT annual report (http://dati.istat.it/Index.aspx?DataSetCode=dccv_delittips) shows, aggregated by time and space, the number of crimes that happen in each Italian province, divided by category. The ISTAT classification is very detailed, with 53 types of crimes organized in a hierarchy. The Italian Department of Public Security of the Ministry of the Interior publishes a list of the most frequent crimes in each province, including Modena. It uses 37 categories of crimes. For the city of Modena in 2021, only 13 categories have a number of complaints greater than 0.4 per 100,000 inhabitants. Based on those two lists, and on the broader categories of crimes used by newspapers, we elaborated our own list of crimes. The total number of categories is 13: “furto” (theft), “rapina” (robbery), “omicidio” (murder), “violenza sessuale” (sexual violence), “maltrattamento” (mistreatment), “aggressione” (aggression), “spaccio” (illegal sale, most commonly used to refer to drug trafficking), “droga” (drug dealing), “truffa” (scam), “frode” (fraud), “riciclaggio” (money laundering), “evasione” (evasion), and “sequestro” (kidnapping).
The current dataset contains 17,500 news articles published in the two selected newspapers from 2011 to 2021 (approximately 10 years). The dataset is imbalanced with respect to the category of the crimes described in the news articles.

4.2 Experimental Setup
Considering that the news articles in our dataset are written in Italian, three Word2Vec models have been chosen for our experiments:
M1 a pre-trained model [60], whose dimension is 300. The dataset used to train Word2Vec was obtained from a dump of Wikipedia, the main categories of Italian Google News and some anonymized chats between users and the customer care chatbot Laila (https://www.laila.tech/). The dataset (composed of 2.6 GB of raw text) includes 17,305,401 sentences and 421,829,960 words.
M2 a Skip-Gram model trained from scratch on the crime news articles of our dataset for 30 epochs (window_size = 10, min_count = 20, negative_sampling = 20, embedding_dim = 300).
M3 a Skip-Gram model trained on the crime news articles of our dataset for 5 epochs, starting from the embeddings of M1 (window_size = 10, min_count = 20, negative_sampling = 20, embedding_dim = 300).
The experiments have been conducted employing each of the three models separately to extract the word embeddings.

In addition, three different configurations of the pre-processing phase have been set up to allow a comparison of the results:
P1 pre-processing with tokenization and stop word removal. The result is a list of relevant words for each news article.
P2 pre-processing with tokenization, stop word removal, and lemmatization. The result is a list of relevant lemmatized words for each news article.
P3 pre-processing with tokenization, stop word removal, lemmatization, and bigram and keyphrase extraction. The result is the list of P2 with the addition of bigrams and keyphrases.

Finally, the two aggregation functions mentioned in Sect. 3 (A1 and A2) have been applied to the word embeddings. In total, we obtained 18 different combinations of pre-processing configuration, word embedding average, and Word2Vec model. In the following, Sect. 4.4 presents our tests with supervised text categorization algorithms, while Sect. 4.5 discusses some experiments with unsupervised methods.
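As a quick check of the experimental grid, the 18 combinations can be enumerated as below. The sketch only restates the labels and hyper-parameter values given in the text; the variable and dictionary key names are hypothetical and do not refer to any specific library.

```python
from itertools import product

MODELS = ["M1", "M2", "M3"]          # Word2Vec models
PREPROCESSING = ["P1", "P2", "P3"]   # pre-processing configurations
AGGREGATIONS = ["A1", "A2"]          # simple vs. TF-IDF-weighted average

# Hyper-parameters reported for M2 (M3 uses the same values with 5 epochs);
# the key names are illustrative only.
M2_PARAMS = {"epochs": 30, "window_size": 10, "min_count": 20,
             "negative_sampling": 20, "embedding_dim": 300}

configurations = list(product(PREPROCESSING, AGGREGATIONS, MODELS))
print(len(configurations))  # 3 * 2 * 3 = 18
```

Each tuple, e.g. `("P3", "A2", "M3")`, identifies one set of document vectors on which the categorization algorithms are then trained and evaluated.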
4.3 Evaluation Metrics
Precision, recall and accuracy are the most common metrics when evaluating a categorization task. They are obtained by the following formulas:

\[ \text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad \text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \]

where, given a category, TP is the number of samples that are correctly assigned to that category (true positives), FP is the number of samples that are assigned to that category but belong to a different one (false positives), FN is the number of samples of that category that are assigned by the algorithm to another one (false negatives), and TN is the number of samples that are correctly not assigned to that category (true negatives). Using the precision and recall values, the F1-score can be calculated:

\[ F1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]
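The four formulas can be computed per category as in the following minimal sketch. The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def category_counts(y_true, y_pred, category):
    """Count TP, FP, FN and TN for one category (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    tn = sum(t != category and p != category for t, p in pairs)
    return tp, fp, fn, tn


def category_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention
    recall = tp / (tp + fn) if tp + fn else 0.0     # assumed convention
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


y_true = ["theft", "theft", "scam", "theft", "scam"]
y_pred = ["theft", "scam", "scam", "theft", "theft"]
print(category_metrics(*category_counts(y_true, y_pred, "theft")))
```

In this toy example the category "theft" has TP = 2, FP = 1, FN = 1 and TN = 1, so precision, recall and F1 are all 2/3 and accuracy is 3/5.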
For the supervised algorithms, we calculated these metrics for each category and then averaged the obtained values. Since our dataset is very imbalanced, the average is weighted by the support, i.e., the number of news articles in each category.

Before calculating the same metrics for the unsupervised categorization, we need to find the best match between the class labels and the cluster labels, i.e., to assign a crime category to each cluster. We start by finding the highest number of samples of a certain category in a cluster, and assign that category to that cluster. Then, we proceed with the other clusters and the other categories, again starting from the highest number of samples. The process assigns only one category to each cluster, and a category cannot be assigned to multiple clusters. For each cluster, we calculate the values of precision, recall, accuracy, and F1-score, and then average these values to obtain the overall values.

In addition to the above-mentioned metrics, the Silhouette Coefficient [61] is used in unsupervised categorization to assess the quality of the clusters and to determine how well they fit the input data. This metric evaluates the density of the clusters generated by the model. The score is computed by averaging the silhouette coefficient for each sample, which is computed as the difference between the mean nearest-cluster distance (b), i.e., the average distance between all clusters, and the average intra-cluster distance (a), i.e., the average distance between each point within a cluster, normalized by the maximum of the two values:

\[ \text{silhouette} = \frac{1}{N} \sum_{k=1}^{N} \frac{b_k - a_k}{\max(a_k, b_k)} \]

where N is the number of generated clusters. The Silhouette Coefficient is a score between −1 and 1, where 1 means that the clusters are highly dense and clearly distinguished, while −1 stands for completely incorrect clustering. A value near 0 represents overlapping clusters with samples very close to the decision boundary
of the neighboring clusters. Different distance metrics can be used to calculate a and b, the most common distances are the euclidean distance, the manhattan distance, canberra, cosine, jaccard, minkowski. We use the euclidean distance. Table 1. The number of news articles in the training and test sets for each category in supervised categorization. Category Theft
7,062
3,658
Drug dealing
1,180
617
Illegal sale
769
382
Aggression
619
301
Robbery
500
303
Scam
414
204
Mistreatment
225
125
Evasion
196
92
Murder
169
81
Kidnapping
162
85
99
56
Money laundering Sexual violence
4.4
Training set Test set
106
39
Fraud
42
14
Total
11,543
5,957
Supervised Categorization
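The support-weighted averaging described in Sect. 4.3 can be sketched as follows. This is a minimal, library-free illustration and not the code used for the experiments; the function name `weighted_precision_recall` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Return precision and recall averaged over categories, weighted by support."""
    support = Counter(y_true)      # true articles per category
    predicted = Counter(y_pred)    # predicted articles per category
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    precision = recall = 0.0
    for category, n in support.items():
        # Precision is taken as 0 when a category is never predicted.
        p = correct[category] / predicted[category] if predicted[category] else 0.0
        r = correct[category] / n
        weight = n / total         # share of the test set in this category
        precision += weight * p
        recall += weight * r
    return precision, recall


# Toy, imbalanced example (labels are illustrative only).
y_true = ["theft"] * 6 + ["scam"] * 3 + ["fraud"]
y_pred = ["theft"] * 6 + ["scam", "scam", "theft"] + ["theft"]
precision, recall = weighted_precision_recall(y_true, y_pred)
print(round(precision, 3), round(recall, 3))
```

With support weights, the averaged recall coincides with the overall accuracy, because each category contributes its true positives divided by the total number of articles. The rare category in the toy data is never predicted, so it pulls both averages down only in proportion to its small support.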
Several supervised machine learning algorithms were tested in order to compare their performance. For each algorithm, around 65% of the news articles in the dataset are used as the training set, while the remaining articles are used as the test set. Both sets contain articles from both newspapers. Table 1 shows the number of news articles of each category included in each set. There is a considerable imbalance among the categories; the dominant category is “theft”.

Table 2 shows the precision and recall of 15 supervised algorithms trained on the document embeddings obtained from the 18 combinations of pre-processing configuration (P1, P2, P3), word-embedding average (A1, A2) and Word2Vec model (M1, M2, M3). The first column contains the name of the categorization algorithm; the best values of precision or recall are those greater than or equal to 0.78. As can be seen, six algorithms reach the highest values: Linear SVC (C = 1.0), SVC (RBF kernel, C = 1.0, gamma = ‘scale’), SGD (both configurations), Bagging, and XGBoost.

Table 2. Precision (P) and recall (R) of the application of different categorization algorithms on the embeddings derived from the three selected models.

| Algorithm | Metric | P1-M1-A1 | P1-M1-A2 | P1-M2-A1 | P1-M2-A2 | P1-M3-A1 | P1-M3-A2 | P2-M1-A1 | P2-M1-A2 | P2-M2-A1 | P2-M2-A2 | P2-M3-A1 | P2-M3-A2 | P3-M1-A1 | P3-M1-A2 | P3-M2-A1 | P3-M2-A2 | P3-M3-A1 | P3-M3-A2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linear SVC (C=1.0) | P | .77 | .75 | .80 | .79 | .80 | .79 | .77 | .74 | .80 | .79 | .80 | .79 | .74 | .72 | .79 | .77 | .79 | .77 |
| | R | .78 | .76 | .81 | .80 | .81 | .79 | .78 | .76 | .81 | .80 | .81 | .79 | .76 | .73 | .80 | .78 | .80 | .78 |
| SVC (RBF, C=1, gamma=‘scale’) | P | .75 | .74 | .79 | .79 | .79 | .79 | .74 | .74 | .79 | .79 | .78 | .79 | .69 | .71 | .79 | .79 | .78 | .78 |
| | R | .75 | .74 | .80 | .80 | .80 | .80 | .74 | .74 | .80 | .80 | .79 | .80 | .73 | .72 | .80 | .80 | .79 | .79 |
| SGD (L2 norm, Hinge loss) | P | .76 | .75 | .80 | .77 | .80 | .79 | .74 | .74 | .79 | .78 | .80 | .78 | .71 | .72 | .79 | .79 | .79 | .79 |
| | R | .76 | .74 | .80 | .78 | .81 | .79 | .74 | .76 | .80 | .79 | .80 | .79 | .74 | .73 | .79 | .73 | .79 | .77 |
| XGBoost | P | .73 | .71 | .77 | .76 | .77 | .76 | .72 | .70 | .77 | .76 | .77 | .76 | .69 | .68 | .76 | .75 | .77 | .76 |
| | R | .74 | .73 | .78 | .77 | .78 | .78 | .74 | .72 | .78 | .77 | .78 | .78 | .72 | .70 | .77 | .76 | .78 | .78 |
| Bagging (KNN(n=5)) | P | .72 | .70 | .76 | .76 | .77 | .76 | .72 | .70 | .77 | .76 | .77 | .76 | .67 | .65 | .76 | .75 | .76 | .75 |
| | R | .74 | .72 | .77 | .77 | .78 | .77 | .74 | .72 | .78 | .77 | .78 | .77 | .70 | .68 | .77 | .76 | .77 | .76 |
| KNN (k=5) | P | .71 | .70 | .76 | .75 | .76 | .76 | .71 | .70 | .76 | .75 | .76 | .75 | .65 | .65 | .75 | .74 | .76 | .75 |
| | R | .73 | .72 | .77 | .76 | .77 | .77 | .73 | .72 | .78 | .77 | .77 | .77 | .69 | .68 | .77 | .76 | .77 | .76 |
| SGD (L1 norm, Perceptron) | P | .80 | .68 | .76 | .76 | .80 | .77 | .73 | .73 | .83 | .75 | .78 | .75 | .73 | .69 | .79 | .77 | .72 | .77 |
| | R | .57 | .69 | .73 | .76 | .73 | .75 | .69 | .70 | .68 | .76 | .75 | .74 | .70 | .69 | .74 | .72 | .71 | .68 |
| KNN (k=3) | P | .70 | .69 | .75 | .74 | .76 | .75 | .69 | .68 | .76 | .75 | .76 | .75 | .66 | .64 | .74 | .73 | .74 | .73 |
| | R | .72 | .71 | .77 | .76 | .77 | .76 | .72 | .71 | .77 | .76 | .77 | .76 | .69 | .67 | .75 | .75 | .75 | .75 |
| KNN (k=1) | P | .69 | .68 | .74 | .73 | .75 | .73 | .69 | .67 | .75 | .74 | .75 | .73 | .64 | .63 | .73 | .72 | .73 | .73 |
| | R | .69 | .67 | .74 | .73 | .75 | .74 | .69 | .67 | .75 | .74 | .75 | .74 | .65 | .63 | .73 | .72 | .73 | .74 |
| Random Forest (n=100) | P | .71 | .66 | .74 | .73 | .75 | .74 | .64 | .63 | .75 | .73 | .75 | .74 | .62 | .61 | .71 | .71 | .74 | .74 |
| | R | .70 | .68 | .74 | .72 | .75 | .74 | .69 | .67 | .75 | .73 | .75 | .75 | .67 | .65 | .72 | .71 | .74 | .74 |
| Bagging (Decision Tree) | P | .64 | .62 | .70 | .68 | .70 | .70 | .62 | .61 | .71 | .69 | .72 | .72 | .60 | .60 | .68 | .67 | .71 | .70 |
| | R | .69 | .67 | .72 | .71 | .73 | .72 | .68 | .66 | .73 | .72 | .74 | .73 | .66 | .66 | .71 | .70 | .73 | .72 |
| Adaboost (Decision Tree) | P | .63 | .64 | .71 | .71 | .70 | .71 | .62 | .59 | .72 | .69 | .71 | .70 | .59 | .56 | .67 | .66 | .70 | .70 |
| | R | .67 | .68 | .73 | .72 | .72 | .73 | .67 | .65 | .74 | .72 | .73 | .72 | .65 | .64 | .71 | .69 | .72 | .72 |
| BernoulliNB | P | .69 | .69 | .74 | .74 | .74 | .74 | .69 | .68 | .74 | .74 | .74 | .75 | .67 | .66 | .74 | .74 | .73 | .74 |
| | R | .55 | .54 | .62 | .62 | .58 | .58 | .58 | .55 | .63 | .63 | .61 | .61 | .56 | .54 | .60 | .61 | .58 | .58 |
| GaussianNB | P | .73 | .71 | .76 | .75 | .76 | .75 | .73 | .70 | .76 | .76 | .76 | .76 | .71 | .70 | .74 | .73 | .74 | .73 |
| | R | .52 | .43 | .60 | .58 | .59 | .57 | .52 | .39 | .61 | .58 | .60 | .58 | .50 | .41 | .60 | .59 | .58 | .57 |
| Decision Tree | P | .57 | .55 | .62 | .62 | .64 | .64 | .56 | .55 | .64 | .62 | .65 | .63 | .54 | .52 | .61 | .60 | .63 | .62 |
| | R | .56 | .55 | .62 | .61 | .63 | .63 | .55 | .54 | .63 | .61 | .64 | .62 | .53 | .51 | .61 | .59 | .62 | .62 |

Considering the performance of these algorithms across the different configurations, the lowest values are found when model M1 is used. Therefore, even though the embeddings of M1 are trained on a dataset that largely consists of news articles and contains contexts very similar to those of our dataset, M2 and M3 outperform M1 in terms of precision and recall. This is probably because the word embeddings of M2 and M3 are learned from the same documents that are then categorized (indeed, both M2 and M3 are trained on our crime news articles). This makes certain words more discriminative for certain contexts and, therefore, for certain crime categories. Comparing models M2 and M3, we notice that the use of lemmatization and the extraction of bigrams and keyphrases have little influence on performance. The same holds when comparing the simple average and the TF-IDF weighted average.

Table 3 shows in detail the per-category results of the best algorithm (Linear SVC) using the embeddings of M3, the pre-processing with tokenization and stop word removal (P1), and the simple average (A1). The first column contains the name of the crime category, while the second column indicates the number of news articles in the test set for that category. The values of precision, recall and F1-score show that the algorithm suffers from the imbalance of the training set. The less a category is represented in the dataset, the more the recall (and sometimes also the precision) decreases. In some cases recall is equal to zero, which means that there are no true positives (every article of that category is a false negative), i.e. the algorithm was not able to identify the news articles of that category. On the other hand, a precision equal to 1 means that the number of false positives is zero, i.e. no news article of another category has been mislabeled with that category.

Table 3. Precision, recall and F1-score for each crime category obtained using the embeddings of model M3 with pre-processing P1 and average A1, and Linear SVC.

| Category | #news articles | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| Theft | 525 | .96 | .96 | .96 |
| Drug dealing | 177 | .81 | .81 | .81 |
| Illegal sale | 173 | .82 | .82 | .82 |
| Robbery | 143 | .87 | .78 | .82 |
| Aggression | 90 | .72 | .81 | .76 |
| Scam | 81 | .80 | .86 | .83 |
| Murder | 42 | .80 | .93 | .86 |
| Kidnapping | 41 | .79 | .83 | .81 |
| Mistreatment | 22 | 1 | .27 | .43 |
| Sexual violence | 8 | 0 | 0 | 0 |
| Money laundering | 7 | 0 | 0 | 0 |
| Evasion | 5 | 1 | .40 | .57 |

After some analysis, we discovered that the annotation of the “Gazzetta di Modena” news articles in our dataset is not very accurate; therefore, these categorization tests are “dirty”. We then decided to repeat the test on the news articles published in “ModenaToday” only, using the embeddings of M2 and M3 and the six best categorization algorithms of the previous experiments. Table 4 shows the precision and recall achieved by these algorithms on the “ModenaToday” news articles. Compared with Table 2, the values are slightly higher. The highest values are obtained when Linear SVC is applied to the document embeddings built from the word embeddings of model M3 using the simple average (A1) and the pre-processing with tokenization and stop word removal (P1).

Table 4. Precision (P) and recall (R) of the application of the best six algorithms on the embeddings of M2 and M3 on “ModenaToday” news articles.

| Algorithm | Metric | P1-M2-A1 | P1-M2-A2 | P1-M3-A1 | P1-M3-A2 | P2-M2-A1 | P2-M2-A2 | P2-M3-A1 | P2-M3-A2 | P3-M2-A1 | P3-M2-A2 | P3-M3-A1 | P3-M3-A2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linear SVC (C=1.0) | P | .85 | .82 | .86 | .83 | .84 | .82 | .85 | .83 | .83 | .80 | .83 | .81 |
| | R | .85 | .82 | .86 | .83 | .85 | .82 | .85 | .83 | .83 | .81 | .84 | .81 |
| SVC (RBF, C=1, gamma=‘scale’) | P | .84 | .84 | .84 | .83 | .83 | .83 | .83 | .83 | .82 | .81 | .81 | .81 |
| | R | .85 | .85 | .85 | .84 | .84 | .83 | .83 | .84 | .83 | .83 | .82 | .82 |
| SGD (L2 norm, Hinge loss) | P | .85 | .83 | .85 | .84 | .83 | .84 | .85 | .83 | .82 | .81 | .81 | .83 |
| | R | .85 | .83 | .84 | .83 | .83 | .83 | .85 | .82 | .83 | .81 | .80 | .80 |
| SGD (L1 norm, Perceptron) | P | .84 | .81 | .85 | .82 | .84 | .81 | .85 | .83 | .81 | .79 | .82 | .83 |
| | R | .79 | .81 | .81 | .82 | .80 | .80 | .79 | .80 | .80 | .79 | .80 | .81 |
| XGBoost | P | .80 | .80 | .81 | .81 | .81 | .80 | .80 | .81 | .79 | .78 | .80 | .79 |
| | R | .81 | .80 | .82 | .81 | .81 | .81 | .80 | .81 | .80 | .79 | .80 | .80 |
| Bagging (KNN(n=5)) | P | .78 | .79 | .79 | .80 | .81 | .79 | .81 | .80 | .79 | .77 | .79 | .78 |
| | R | .78 | .78 | .79 | .79 | .80 | .79 | .81 | .80 | .79 | .77 | .79 | .79 |

In conclusion, further pre-processing steps, such as lemmatization, bigram extraction and keyphrase extraction, do not seem to be beneficial, because precision and recall do not improve. Also, the simple average (A1) gives better results than the TF-IDF weighted average (A2). M3 is the preferable model for two reasons:

– training a Word2Vec model from scratch on our dataset requires 15 min, while retraining M3 through transfer learning requires less than 3 min;
– the pre-trained model has a wider vocabulary, which could be useful when extracting document embeddings for new news articles that contain words not appearing in the training corpus. However, it is highly likely that all the words that are discriminative for crime categories are already present in the vocabulary of M2.
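The vocabulary point above can be illustrated with a small sketch of how a document embedding might be built from word embeddings. It assumes that words missing from the vocabulary are simply skipped, which the text does not state; the function `document_embedding` and the two-dimensional toy vectors are hypothetical. Passing no weights corresponds to a simple average (A1), while per-word weights such as TF-IDF scores give a weighted average (A2).

```python
def document_embedding(tokens, word_vectors, weights=None):
    """Average the vectors of the in-vocabulary tokens of one document.

    weights=None gives a simple average; a dict of per-token weights
    (for example TF-IDF scores) gives a weighted average.
    Returns None when no token can contribute.
    """
    summed = None
    total_weight = 0.0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue  # out-of-vocabulary token: skipped (assumption)
        w = 1.0 if weights is None else weights.get(token, 0.0)
        if summed is None:
            summed = [0.0] * len(vector)
        summed = [s + w * v for s, v in zip(summed, vector)]
        total_weight += w
    if summed is None or total_weight == 0.0:
        return None
    return [s / total_weight for s in summed]


# Hypothetical 2-dimensional vectors, for illustration only.
toy_vectors = {"furto": [1.0, 0.0], "auto": [0.0, 1.0]}
simple = document_embedding(["furto", "auto", "monopattino"], toy_vectors)
weighted = document_embedding(["furto", "auto"], toy_vectors,
                              weights={"furto": 3.0, "auto": 1.0})
print(simple, weighted)
```

In this sketch a word outside the vocabulary contributes nothing to the document vector, so a wider vocabulary only matters when the missing words would have been informative for the category.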
4.5
Unsupervised Categorization
Clustering tests were performed on the features obtained with M3, in line with the results of the supervised categorization. We decided to use only the news articles published in the “ModenaToday” newspaper, since the annotations available for this newspaper are more reliable than those available for the “Gazzetta di Modena” newspaper. The dataset contains 5,896 news articles and is imbalanced. Table 5 shows the number of news articles for each category in the dataset. Again, the most frequent category is “theft”, while the least frequent category is “fraud”, with only 3 news articles. SMOTE has been applied to overcome the imbalance problem, generating new points until every other category reaches the number of “theft” instances. In the end, our test includes 30,888 points in the feature space (2,376 points for each category). Four unsupervised algorithms are chosen for our experiments:

– K-means
– Mini Batch K-means
– Agglomerative Clustering
– Spectral Clustering
For all these algorithms, the number of clusters n has to be set in advance. We start by setting n = 13, which is the number of crime categories extracted from the newspapers.

Table 5. The number of news articles from the “ModenaToday” newspaper for each category.

| Category | #news articles |
| --- | --- |
| Theft | 2,376 |
| Drug dealing | 810 |
| Illegal sale | 739 |
| Robbery | 616 |
| Aggression | 427 |
| Scam | 415 |
| Murder | 185 |
| Kidnapping | 168 |
| Mistreatment | 85 |
| Evasion | 35 |
| Sexual violence | 20 |
| Money laundering | 17 |
| Fraud | 3 |
| Total | 5,896 |
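The oversampling step can be sketched as below. The counts come from Table 5; the interpolation routine is a simplified, SMOTE-style illustration (the function name, the parameter `k`, and the toy points are assumptions), not the implementation used in the experiments. Because “fraud” has only 3 articles, each of its points has at most 2 neighbours, so the sketch caps k accordingly.

```python
import math
import random

# Articles per category, copied from Table 5.
counts = {"theft": 2376, "drug dealing": 810, "illegal sale": 739, "robbery": 616,
          "aggression": 427, "scam": 415, "murder": 185, "kidnapping": 168,
          "mistreatment": 85, "evasion": 35, "sexual violence": 20,
          "money laundering": 17, "fraud": 3}
target = max(counts.values())  # every category is grown to the size of "theft"
to_generate = {category: target - n for category, n in counts.items()}
print(sum(counts.values()), target * len(counts), sum(to_generate.values()))
# 5896 30888 24992


def smote_like_oversample(points, target_size, k=2, seed=0):
    """Grow `points` to `target_size` with interpolated synthetic points.

    Each synthetic point lies on the segment between a randomly chosen
    original point and one of its k nearest original neighbours
    (Euclidean distance).
    """
    rng = random.Random(seed)
    k = min(k, len(points) - 1)
    if k < 1:
        raise ValueError("at least two points are needed to interpolate")
    synthetic = []
    while len(points) + len(synthetic) < target_size:
        base = rng.choice(points)
        neighbours = sorted((p for p in points if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return points + synthetic


# Toy stand-in for a 3-article category, in a 2-dimensional feature space.
fraud_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced = smote_like_oversample(fraud_points, target_size=10)
print(len(balanced))
```

Under these counts, 24,992 of the 30,888 points would be synthetic, and a category such as “fraud” would consist almost entirely of points interpolated among its 3 original articles.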
In line with the results of the supervised categorization, we applied clustering to the document embeddings obtained by model M3 with the simple average of the word embeddings, considering all three pre-processing phases. Table 6 shows the values of precision, recall, F1-score, and accuracy in each test. Precision, recall, and F1-score are always low (the highest value is 0.52), while accuracy reaches high values (from 0.86 to 0.93). This means that the numbers of false positives and false negatives are very high with respect to the true positives, while accuracy also benefits from the large number of true negatives counted for each cluster. Summing up, the highest metric values are obtained by K-means with the second type of pre-processing. The silhouette coefficient in this test is 0.132, which is a low value, while the highest silhouette value (0.138) corresponds to the K-means algorithm with the first pre-processing type.

Table 6. Evaluation of unsupervised categorization using the document embeddings obtained by model M3 with the simple average and the three different pre-processing phases.

| Algorithm | Precision P1 | Precision P2 | Precision P3 | Recall P1 | Recall P2 | Recall P3 | F1-score P1 | F1-score P2 | F1-score P3 | Accuracy P1 | Accuracy P2 | Accuracy P3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| K-means | .44 | .52 | .47 | .47 | .51 | .46 | .45 | .51 | .47 | .90 | .92 | .89 |
| Mini Batch K-means | .51 | .49 | .49 | .50 | .50 | .49 | .50 | .50 | .49 | .91 | .90 | .92 |
| Agglomerative Clustering | .45 | .47 | .47 | .49 | .48 | .39 | .47 | .48 | .43 | .90 | .92 | .92 |
| Spectral Clustering | 0 | .40 | .33 | .40 | .50 | .46 | 0 | .45 | .38 | .93 | .89 | .86 |

In Fig. 2 the histograms show the number of news articles in each cluster for each crime category. For better visualization, only the names of the dominant categories are shown. As can be seen, every cluster has few (at most 3) dominant categories, as expected from the value of the silhouette coefficient. Figure 3 displays the silhouette coefficient of each sample, showing which clusters are dense and which are not. The red line indicates the average (0.132). This plot also makes the cluster imbalance visible. In all the clusters except cluster 4 there are some instances with a negative coefficient, which means that these instances are in the wrong cluster. The highest coefficients belong to some samples in cluster 3; indeed, looking at the histograms, that cluster contains most of the samples of “murder” (almost 2,000 samples), and this category is also present in clusters 9 and 12, but with a very low number of samples (around 10).

Fig. 2. Histograms of the cluster distribution obtained with K-means (n = 13) applied to the embeddings of model M3, simple average and second pre-processing type.

Fig. 3. Plot of the silhouette coefficient in the clusters of K-means (n = 13) on the embeddings of model M3, simple average (A1) and pre-processing with lemmatization (P2).

Analyzing in detail the results of this experiment, we notice that the clusters group together categories that are semantically similar. Based on this consideration, we decided to run a test in which semantically similar categories are grouped into macro-categories. The seven macro-categories chosen are:

– “Kidnapping”,
– “Murder”,
– “Robbery”, “Theft”,
– “Mistreatment”, “Aggression”, “Sexual Violence”,
– “Scam”, “Fraud”, “Money Laundering”,
– “Illegal Sale”, “Drug Dealing”,
– “Evasion”.
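The grouping above can be expressed as a simple lookup table. The sketch below is only an illustration: the snake_case keys and the helper `to_macro_labels` are naming choices made here, and the size check assumes that the SMOTE-balanced points (2,376 per category) are the ones being relabeled.

```python
from collections import Counter

# Macro-categories as listed above; key names are chosen for this sketch.
MACRO_CATEGORIES = {
    "kidnapping": ["kidnapping"],
    "murder": ["murder"],
    "robbery_theft": ["robbery", "theft"],
    "aggression_mistreatment_sexual_violence":
        ["mistreatment", "aggression", "sexual violence"],
    "fraud_scam_money_laundering": ["scam", "fraud", "money laundering"],
    "drug_dealing_illegal_sale": ["illegal sale", "drug dealing"],
    "evasion": ["evasion"],
}

# Invert the grouping into a category -> macro-category lookup.
TO_MACRO = {category: macro
            for macro, categories in MACRO_CATEGORIES.items()
            for category in categories}


def to_macro_labels(labels):
    """Map original category labels (case-insensitive) to macro-category labels."""
    return [TO_MACRO[label.lower()] for label in labels]


macro_labels = to_macro_labels(["Theft", "Robbery", "Scam", "Evasion"])
print(Counter(macro_labels))

# With 2,376 balanced points per category, macro-category sizes differ.
sizes = {macro: 2376 * len(categories)
         for macro, categories in MACRO_CATEGORIES.items()}
print(sizes["robbery_theft"], sizes["fraud_scam_money_laundering"], sum(sizes.values()))
# 4752 7128 30888
```

One consequence of merging balanced categories is that the macro-categories are no longer balanced: those built from two or three categories contain two or three times as many points as the single-category ones.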
All four algorithms tested before are reused to perform the categorization with macro-categories. In this case, the best result in terms of silhouette coefficient is given by Spectral Clustering using the document embeddings generated by the simple average and the third pre-processing type, which also includes bigram and keyphrase extraction. The results are shown in Table 7.

Table 7. Results of unsupervised text categorization obtained by Spectral Clustering (n = 7) applied to the document embeddings of model M3, simple average and the third pre-processing type. Columns 1 to 7 are the clusters.

| Macro-category | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Robbery and theft | 258 | 3959 | 354 | 11 | 33 | 137 | 0 |
| Drug dealing and illegal sale | 119 | 156 | 4226 | 2 | 7 | 240 | 2 |
| Fraud, scam and money laundering | 98 | 1354 | 90 | 0 | 13 | 5573 | 0 |
| Aggression, mistreatment and sexual violence | 5940 | 937 | 132 | 5 | 11 | 99 | 4 |
| Kidnapping | 156 | 77 | 41 | 0 | 19 | 2075 | 8 |
| Murder | 2123 | 209 | 0 | 0 | 11 | 16 | 17 |
| Evasion | 1694 | 147 | 346 | 0 | 0 | 0 | 189 |

Considering the table row by row, we notice that each macro-category is dominant in only one cluster. However, clusters 1 and 6 have two dominant macro-categories. In addition, while clusters 1, 2, 3, and 6 contain a high number of samples, the other clusters contain few samples. Following the procedure described in Sect. 4.3 to assign a category to each cluster, the category assigned to cluster 4 is “evasion”, which has no instances in that cluster. The overall accuracy achieved in this experiment is 0.90.
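A one-to-one assignment of this kind can leave a category paired with a cluster in which it has no samples, whenever the clusters left at the end contain none of the remaining categories. The sketch below shows one possible reading of the procedure of Sect. 4.3; it is an assumption-laden illustration in which ties are broken by iteration order and the counts are hypothetical, not those of Table 7.

```python
def assign_categories(counts):
    """Greedy one-to-one matching of categories to clusters.

    counts[category][cluster] is the number of samples of `category`
    that fall in `cluster`. The largest remaining count is picked first;
    its category and cluster are then excluded from later picks.
    """
    pairs = sorted(
        ((n, category, cluster)
         for category, row in counts.items()
         for cluster, n in row.items()),
        key=lambda item: item[0],
        reverse=True,
    )
    assigned = {}            # cluster -> category
    used_categories = set()
    for n, category, cluster in pairs:
        if cluster in assigned or category in used_categories:
            continue
        assigned[cluster] = category
        used_categories.add(category)
    return assigned


# Hypothetical contingency table (not taken from the paper's results).
toy_counts = {
    "theft":   {1: 90, 2: 10, 3: 0},
    "scam":    {1: 5,  2: 80, 3: 0},
    "evasion": {1: 40, 2: 20, 3: 0},
}
assignment = assign_categories(toy_counts)
print(assignment)
```

In the toy data, “evasion” loses clusters 1 and 2 to categories with larger counts and ends up paired with cluster 3, where it has no samples, which mirrors the situation described for cluster 4.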
5
Conclusion
In this paper, the use of word embeddings for crime categorization on an Italian dataset of 17,500 news articles has been demonstrated. Both supervised and unsupervised categorization algorithms have been explored. The model used to obtain the word embeddings is Word2Vec, and we selected 15 supervised and 4 unsupervised categorization algorithms. The method described in the paper can also be applied in other contexts and is suitable for documents in languages other than Italian. However, since a trained Word2Vec model is language-dependent, it is necessary to use an appropriate Word2Vec model (if one exists) or to train the model on documents in the specific language. It is also possible to test this approach on word embeddings generated by other models, such as GloVe or FastText. After generating the word embeddings, supervised and unsupervised algorithms can be applied as described in the paper.

The experimental results confirm those obtained in our previous work [6], showing that the representation of texts through word embeddings is suitable for text categorization. The supervised algorithm with the best values of precision and recall is Linear SVC, which reached an accuracy of 0.86 when using the re-trained model M3, the simple average of the word embeddings (A1) and the pre-processing with tokenization and stop word removal (P1). The unsupervised approach reaches an accuracy of 0.93 using the Spectral Clustering algorithm with the same configuration as Linear SVC. The use of lemmatization and the integration of bigram and keyphrase extraction do not improve the results; besides, the re-trained model M3 outperforms the other two models in most of the configurations.

We release model M3 in a GitHub repository (https://github.com/SemanticFun/Word2Vec-for-text-categorization/), along with the code used to extract the document embeddings and train the model on them, and the code for the application of both supervised and unsupervised algorithms. The released Word2Vec model has an enriched vocabulary that contains terminology related to crimes. This can help other researchers to replicate our experiments on a different dataset.

Both supervised and unsupervised approaches are affected by the imbalance of the dataset and the uncertainty of the annotation provided by the newspapers. In addition, in some cases, news articles report general information about crimes and do not describe a specific crime event. For the first problem, the use of the SMOTE technique enhances the results of the unsupervised approach. To overcome the difficulties due to the inaccurate annotation of the newspapers, a manual re-annotation is needed. Since this is a very time-consuming operation, supervised text categorization can be combined with active learning, which makes it possible to categorize more news articles in a short time, with no need to manually check the annotations that the algorithm predicts with high confidence. This approach will be explored in future work.

References
1. Ghankutkar, S., Sarkar, N., Gajbhiye, P., Yadav, S., Kalbande, D., Bakereywala, N.: Modelling machine learning for analysing crime news. In: 2019 International Conference on Advances in Computing, Communication and Control (ICAC3), pp. 1–5 (2019). https://doi.org/10.1109/ICAC347590.2019.9036769
2. Hassan, M., Rahman, M.Z.: Crime news analysis: Location and story detection. In: 2017 20th International Conference of Computer and Information Technology (ICCIT), pp. 1–6 (2017). https://doi.org/10.1109/ICCITECHN.2017.8281798
3. Velásquez, D., et al.: I read the news today, oh boy: the effect of crime news coverage on crime perception and trust. In: IZA Discussion Papers 12056, Institute of Labor Economics (IZA), December 2018. https://doi.org/10.1016/j.worlddev.2020.105111, https://ideas.repec.org/p/iza/izadps/dp12056.html
4. Ghosh, D., Chun, S.A., Shafiq, B., Adam, N.R.: Big data-based smart city platform: Real-time crime analysis. In: Kim, Y., Liu, S.M. (eds.) Proceedings of the 17th International Digital Government Research Conference on Digital Government Research, DG.O 2016, Shanghai, China, 08–10 June 2016, pp. 58–66. ACM (2016). https://doi.org/10.1145/2912160.2912205
5. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16(1), 321–357 (2002). https://doi.org/10.1613/jair.953
6. Bonisoli, G., Rollo, F., Po, L.: Using word embeddings for Italian crime news categorization. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M., Slezak, D. (eds.) Proceedings of the 16th Conference on Computer Science and Intelligence Systems, Online, 2–5 September 2021, pp. 461–470 (2021). https://doi.org/10.15439/2021F118
7. Srinivasa, K., Thilagam, P.S.: Crime base: towards building a knowledge base for crime entities and their relationships from online news papers. Inf. Process. Manage. 56(6), 102059 (2019). https://doi.org/10.1016/j.ipm.2019.102059
8. Po, L., Rollo, F.: Building an urban theft map by analyzing newspaper crime reports. In: 2018 13th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), pp. 13–18 (2018). https://doi.org/10.1109/SMAP.2018.8501866
9. Dasgupta, T., Naskar, A., Saha, R., Dey, L.: CrimeProfiler: crime information extraction and visualization from news media. In: Proceedings of the International Conference on Web Intelligence. WI 2017, pp. 541–549. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3106426.3106476
10. Rollo, F., Po, L.: Crime event localization and deduplication. In: Pan, J.Z., et al. (eds.) ISWC 2020. LNCS, vol. 12507, pp. 361–377. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-62466-8_23
11. Po, L., Rollo, F., Trillo Lado, R.: Topic detection in multichannel Italian newspapers. In: Calì, A., Gorgan, D., Ugarte, M. (eds.) IKC 2016. LNCS, vol. 10151, pp. 62–75. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53640-8_6
12. Rollo, F.: A key-entity graph for clustering multichannel news: student research abstract. In: Seffah, A., Penzenstadler, B., Alves, C., Peng, X. (eds.) Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, 3–7 April 2017, pp. 699–700. ACM (2017). https://doi.org/10.1145/3019612.3019930
13. Bergamaschi, S., Po, L., Sorrentino, S.: Comparing topic models for a movie recommendation system. In: Monfort, V., Krempels, K. (eds.) WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies, vol. 2, Barcelona, Spain, 3–5 April 2014, pp. 172–183. SciTePress (2014). https://doi.org/10.5220/0004835601720183
14. Po, L., Malvezzi, D.: Community detection applied on big linked data. J. Univ. Comput. Sci. 24(11), 1627–1650 (2018). https://doi.org/10.3217/jucs-024-11-1627
15. Bracewell, D.B., Yan, J., Ren, F., Kuroiwa, S.: Category classification and topic discovery of Japanese and English news articles. Electron. Notes Theor. Comput. Sci. 225, 51–65 (2009). https://doi.org/10.1016/j.entcs.2008.12.066
16. Jiang, T., Li, J.P., Haq, A.U., Saboor, A., Ali, A.: A novel stacking approach for accurate detection of fake news. IEEE Access 9, 22626–22639 (2021). https://doi.org/10.1109/ACCESS.2021.3056079
17. Do, T.H., Berneman, M., Patro, J., Bekoulis, G., Deligiannis, N.: Context-aware deep Markov random fields for fake news detection. IEEE Access 9, 130042–130054 (2021). https://doi.org/10.1109/ACCESS.2021.3113877
18. Kaliyar, R.K., Goswami, A., Narang, P.: FakeBERT: fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 80(8), 11765–11788 (2021). https://doi.org/10.1007/s11042-020-10183-2
19. Dhar, A., Mukherjee, H., Dash, N.S., Roy, K.: Text categorization: past and present. Artif. Intell. Rev. 54(4), 3007–3054 (2020). https://doi.org/10.1007/s10462-020-09919-1
20. Li, Q., Peng, H., Li, J., Xia, C., Yang, R., Sun, L., Yu, P.S., He, L.: A survey on text classification: from shallow to deep learning. CoRR (2020)
21. Wang, C., Nulty, P., Lillis, D.: A comparative study on word embeddings in deep learning for text classification. In: Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval. NLPIR 2020, pp. 37–46. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3443279.3443304
22. Moreo, A., Esuli, A., Sebastiani, F.: Word-class embeddings for multiclass text classification. Data Mining Knowl. Disc. 35(3), 911–963 (2021). https://doi.org/10.1007/s10618-020-00735-3
23. Fesseha, A., Xiong, S., Emiru, E.D., Diallo, M., Dahou, A.: Text classification based on convolutional neural networks and word embedding for low-resource languages: Tigrinya. Information 12(2), 52 (2021). https://doi.org/10.3390/info12020052
24. Borg, A., Boldt, M., Rosander, O., Ahlstrand, J.: E-mail classification with machine learning and word embeddings for improved customer support. Neural Comput. App. 33(6), 1881–1902 (2020). https://doi.org/10.1007/s00521-020-05058-4
25. Christodoulou, E., Gregoriades, A., Pampaka, M., Herodotou, H.: Application of classification and word embedding techniques to evaluate tourists’ hotel-revisit intention. In: Filipe, J., Smialek, M., Brodsky, A., Hammoudi, S. (eds.) Proceedings of the 23rd International Conference on Enterprise Information Systems, ICEIS 2021, Online Streaming, 26–28 April 2021, vol. 1, pp. 216–223. SciTePress (2021). https://doi.org/10.5220/0010453502160223
26. Semberecki, P., Maciejewski, H.: Deep learning methods for subject text classification of articles. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M. (eds.) Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017, Prague, Czech Republic, 3–6 September 2017. Annals of Computer Science and Information Systems, vol. 11, pp. 357–360 (2017). https://doi.org/10.15439/2017F414
27. Vita, M., Kríz, V.: Word2vec based system for recognizing partial textual entailment. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M. (eds.) Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016, Gdańsk, Poland, 11–14 September 2016. IEEE Annals of Computer Science and Information Systems, vol. 8, pp. 513–516 (2016). https://doi.org/10.15439/2016F419
28. Lin, T.: Performance of Different Word Embeddings on Text Classification (2019). https://towardsdatascience.com/nlp-performance-of-different-wordembeddings-on-text-classification-de648c6262b. Accessed 7 June 2021
29. Lilleberg, J., Zhu, Y., Zhang, Y.: Support vector machines and word2vec for text classification with semantic features. In: Ge, N., et al. (eds.) 14th IEEE International Conference on Cognitive Informatics & Cognitive Computing, ICCI*CC 2015, Beijing, China, 6–8 July 2015, pp. 136–140. IEEE Computer Society (2015). https://doi.org/10.1109/ICCI-CC.2015.7259377
30. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Bengio, Y., LeCun, Y. (eds.) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, 2–4 May 2013, Workshop Track Proceedings (2013). http://arxiv.org/abs/1301.3781
31. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2016). https://doi.org/10.1162/tacl_a_00051
32. Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, 25–29 October 2014, Doha, Qatar, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 1532–1543. ACL (2014). https://doi.org/10.3115/v1/d14-1162
33. Kim, Y.: Convolutional neural networks for sentence classification. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, 25–29 October 2014, Doha, Qatar, A Meeting of SIGDAT, a Special Interest Group of The ACL, pp. 1746–1751. ACL (2014). https://doi.org/10.3115/v1/d14-1181
34. Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: Bonet, B., Koenig, S. (eds.) Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 25–30 January 2015, Austin, Texas, USA, pp. 2267–2273. AAAI Press (2015). https://doi.org/10.1109/IJCNN.2019.8852406, http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9745
35. Zhang, X., Zhao, J.J., LeCun, Y.: Character-level convolutional networks for text classification. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 7–12 December 2015, Montreal, Quebec, Canada, pp. 649–657 (2015). https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html
36. Johnson, R., Zhang, T.: Deep pyramid convolutional neural networks for text categorization. In: Barzilay, R., Kan, M. (eds.) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, 30 July–4 August, Volume 1: Long Papers, pp. 562–570. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1052
37. Dieng, A.B., Wang, C., Gao, J., Paisley, J.W.: Topic-RNN: a recurrent neural network with long-range semantic dependency. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017, Conference Track Proceedings. OpenReview.net (2017). https://openreview.net/forum?id=rJbbOLcex
38. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, 26–31 July 2015, Beijing, China, Volume 1: Long Papers, pp. 1556–1566. The Association for Computer Linguistics (2015). https://doi.org/10.3115/v1/p15-1150
39. Vidal, M.T., Rodríguez, E.S., Reyes-Ortiz, J.A.: Classification of criminal news over time using bidirectional LSTM. In: Lu, Y., et al. (eds.) ICPRAI 2020. LNCS, vol. 12068, pp. 702–713. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59830-3_61
40. Cheng, J., Dong, L., Lapata, M.: Long short-term memory-networks for machine reading. In: Su, J., Carreras, X., Duh, K. (eds.) Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, 1–4 November 2016, pp. 551–561. The Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/d16-1053
41. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/n19-1423
42. Yang, Z., Dai, Z., Yang, Y., Carbonell, J.G., Salakhutdinov, R., Le, Q.V.: Xlnet: Generalized autoregressive pretraining for language understanding. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 December 2019, Vancouver, BC, Canada, pp. 5754–5764 (2019). https://proceedings.neurips.cc/paper/2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html
43. Sanwaliya, A., Shanker, K., Misra, S.C.: Categorization of news articles: a model based on discriminative term extraction method. In: Laux, F., Strömbäck, L. (eds.) The Second International Conference on Advances in Databases, Knowledge, and Data Applications, DBKDA 2010, Menuires, France, 11–16 April 2010, pp. 149–154. IEEE Computer Society (2010). https://doi.org/10.1109/DBKDA.2010.18
44. Tahrawi, M.: Arabic text categorization using logistic regression. Int. J. Intell. Syst. App. 7, 71–78 (2015). https://doi.org/10.5815/ijisa.2015.06.08
45. Wongso, R., Luwinda, F.A., Trisnajaya, B.C., Rusli, O., Rudy: News article text classification in Indonesian language. In: ICCSCI, pp. 137–143 (2017). https://doi.org/10.1016/j.procs.2017.10.039
46. Totis, P., Stede, M.: Classifying Italian newspaper text: news or editorial? In: Cabrio, E., Mazzei, A., Tamburini, F. (eds.) Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, 10–12 December 2018. CEUR Workshop Proceedings, vol. 2253 (2018). https://doi.org/10.4000/books.aaccademia.3645, http://ceur-ws.org/Vol-2253/paper02.pdf
47. Camastra, F., Razi, G.: Italian text categorization with lemmatization and support vector machines. In: Esposito, A., Faundez-Zanuy, M., Morabito, F.C., Pasero, E. (eds.) Neural Approaches to Dynamics of Signal Exchanges. SIST, vol. 151, pp. 47–54. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8950-4_5
48. Bondielli, A., Ducange, P., Marcelloni, F.: Exploiting categorization of online news for profiling city areas. In: 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2020, Bari, Italy, 27–29 May 2020, pp. 1–8. IEEE (2020). https://doi.org/10.1109/EAIS48028.2020.9122777
49. Bai, Y., Wang, J.: News classifications with labeled LDA. In: Fred, A.L.N., Dietz, J.L.G., Aveiro, D., Liu, K., Filipe, J. (eds.) KDIR 2015 - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval, Part of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015), vol. 1, Lisbon, Portugal, 12–14 November 2015, pp. 75–83. SciTePress (2015). https://doi.org/10.5220/0005610600750083
50. Li, Z., Shang, W., Yan, M.: News text classification model based on topic model. In: 15th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2016, Okayama, Japan, 26–29 June 2016, pp. 1–5. IEEE Computer Society (2016). https://doi.org/10.1109/ICIS.2016.7550929
51. He, C., Hu, Y., Zhou, A., Tan, Z., Zhang, C., Ge, B.: A web news classification method: fusion noise filtering and convolutional neural network. In: SSPS 2020: 2020 2nd Symposium on Signal Processing Systems, Guangdong China, July 2020, pp. 80–85. ACM (2020). https://doi.org/10.1145/3421515.3421523
52. Zhu, Y.: Research on news text classification based on deep learning convolutional neural network. Wirel. Commun. Mob. Comput. 2021, 1–6 (2021). https://doi.org/10.1155/2021/1508150
Supervised and Unsupervised Categorization
139
53. Duan, J., Zhao, H., Qin, W., Qiu, M., Liu, M.: News text classification based on MLCNN and BiGRU hybrid neural network. In: 3rd International Conference on Smart BlockChain, SmartBlock 2020, Zhengzhou, China, 23–25 October 2020, pp. 137–142. IEEE (2020). https://doi.org/10.1109/SmartBlock52591.2020.00032 54. Kim, D., Koo, J., Kim, U.: EnvBERT: multi-label text classification for imbalanced, noisy environmental news data. In: Lee, S., Choo, H., Ismail, R. (eds.) 15th International Conference on Ubiquitous Information Management and Communication, IMCOM 2021, Seoul, South Korea, 4–6 January 2021, pp. 1–8. IEEE (2021). https://doi.org/10.1109/IMCOM51814.2021.9377411 55. Nugroho, K.S., Sukmadewa, A.Y., Yudistira, N.: Large-scale news classification using BERT language model: spark NLP approach. In: 6th International Conference on Sustainable Information Engineering and Technology 2021. SIET 2021, pp. 240–246. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3479645.3479658 56. Thaipisutikul, T., Tuarob, S., Pongpaichet, S., Amornvatcharapong, A., Shih, T.K.: Automated classification of criminal and violent activities in Thailand from online news articles. In: 13th International Conference on Knowledge and Smart Technology, KST 2021, Bangsaen, Chonburi, Thailand, 21–24 January 2021, pp. 170–175. IEEE (2021). https://doi.org/10.1109/KST51265.2021.9415789 57. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a Meeting Held 5–8 December 2013, Lake Tahoe, Nevada, United States, pp. 3111–3119 (2013). https://proceedings.neurips. cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html 58. 
Rose, S., Engel, D., Cramer, N., Cowley, W.: 1. Automatic Keyword Extraction from Individual Documents, pp. 1–20. John Wiley & Sons Ltd., Hoboken (2010). https://doi.org/10.1002/9780470689646.ch1, https://onlinelibrary.wiley.com/doi/ abs/10.1002/9780470689646.ch1 59. Kumar, L., Kumar, M., Neti, L.B.M., Misra, S., Kocher, V., Padmanabhuni, S.: An empirical study on application of word embedding techniques for prediction of software defect severity level. In: Ganzha, M., Maciaszek, L.A., Paprzycki, M., Slezak, D. (eds.) Proceedings of the 16th Conference on Computer Science and Intelligence Systems, Online, 2–5 September 2021, pp. 477–484 (2021). https:// doi.org/10.15439/2021F100 60. Di Gennaro, G., Buonanno, A., Di Girolamo, A., Ospedale, A., Palmieri, F.A.N., Fedele, G.: An analysis of word2vec for the Italian language. In: Esposito, A., Faundez-Zanuy, M., Morabito, F.C., Pasero, E. (eds.) Progresses in Artificial Intelligence and Neural Systems. SIST, vol. 184, pp. 137–146. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-5093-5 13 61. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987). https://doi.org/10. 1016/0377-0427(87)90125-7
Methods for Supporting Business and Society
Towards Reliable Results - A Comparative Analysis of Selected MCDA Techniques in the Camera Selection Problem

Aleksandra Bączkiewicz¹(✉), Jarosław Wątróbski¹, Bartłomiej Kizielewicz², and Wojciech Sałabun²

¹ University of Szczecin, Szczecin, Poland
[email protected], [email protected]
² West Pomeranian University of Technology in Szczecin, Szczecin, Poland
{bartlomiej-kizielewicz,wojciech.salabun}@zut.edu.pl
Abstract. Objective evaluation in real-life decision problems that involve many conflicting criteria is a considerable challenge for the decision-maker. This paper presents an approach employing several MCDA methods to objectify the multi-criteria assessment procedure in the camera selection problem. The proposed approach includes a comparative analysis of four MCDA methods, namely the Preference Ranking Organization METHod for Enrichment of Evaluations (PROMETHEE II), the Multi-Attributive Border Approximation area Comparison method (MABAC), the Evaluation based on Distance from Average Solution method (EDAS), and the Multi-Objective Optimization Method by Ratio Analysis (MOORA), applied with two objective criterion weighting methods. The similarity of the rankings provided by the employed MCDA methods was determined using two ranking correlation coefficients. In addition, a compromise ranking strategy called the Dominance-Directed Graph was applied to obtain a single reliable ranking. The research confirmed the significance of selecting appropriate multi-criteria decision-making methods for the problem being solved, and the relevance of benchmarking in method selection and in the construction of objective rankings of alternatives.

Keywords: MCDA evaluation · Objective weighting methods · Compromise ranking

© Springer Nature Switzerland AG 2022. E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 143–165, 2022. https://doi.org/10.1007/978-3-030-98997-2_7

1 Introduction

Dealing with complex, real-world decision-making problems involves recognizing conflicting goals, making decisions with multiple criteria, and aiming for compromise solutions [1,2]. To address these requirements, many methods have been developed, both dedicated to selected areas and general-purpose [3–6]. Most research is oriented toward developing new MCDA methods and improving existing ones. The methods vary in many aspects, such as the algorithms for establishing the weights of criteria in the computations, the algorithms' complexity, the preference functions and representation of evaluation criteria, the kind of data aggregation, and the possibility of handling uncertain data [7]. Although many MCDA methods are available, no single method can be considered suitable for every decision situation or decision problem [8]. Choosing a decision support method appropriate for the considered problem therefore becomes an important research problem, since only a properly selected method can provide a solution that reflects the decision maker's preferences [9]. The assessment of alternatives with MCDA methods requires either considering the decision maker's subjective preferences regarding criteria importance, which means that the final recommendation may change depending on those preferences [10], or using objective weighting techniques that determine criteria importance from the performance values included in the decision matrix [11].

Although new MCDA methods are being developed dynamically and existing algorithms improved, relatively little attention is given to their appropriate selection for a particular decision problem. Applying an inadequate method to a particular decision situation can reduce the quality of the recommendation, as different MCDA methods produce inconsistent outcomes. Furthermore, the complexity and uniqueness of decision situations, and the fact that they can appear simultaneously over a short time, make their analysis challenging. Consequently, it is necessary to apply formal procedures and guidelines for selecting MCDA methods when knowledge about the decision situation is partially lacking [12,13]. Mobile device selection is a common real-life decision problem to which MCDA methods are applied.
Such problems concern the selection of devices such as a mobile phone, mobile handset, laptop, or camera, where the criteria represent attributes and capabilities such as the size of the built-in camera, battery life and talk time, brand, color, and camera size and resolution [14]. There are many multi-criteria decision-making methods arising from different streams, of which the two main ones are the American school and the European school.

This paper presents a multi-criteria decision problem involving an objective selection of the most advantageous camera model. The authors' main goal was to obtain the most reliable solution by integrating the results provided by four selected MCDA methods. Another objective was to compare the obtained results and determine their convergence. To keep the results objective, the authors chose two objective methods for criteria weighting. It was hypothesized that, because differences in the algorithms of particular MCDA methods lead to different solutions to the same problem, benchmarking with a set of selected methods is an important phase in assessing a multi-criteria problem. Because MCDA methods are designed for application in different domains, a customized approach that considers the particular character of the problem is needed [15,16]. Using rank correlation coefficients in the next step allows an objective assessment of the convergence of the rankings and identification of the methods that give consistent and outlying results in a specific problem. To solve the described problem, a model-based approach including four MCDA methods, namely the Preference Ranking Organization METHod for Enrichment of Evaluations (PROMETHEE II), the Multi-Attributive Border Approximation area Comparison method (MABAC), the Evaluation based on Distance from Average Solution method (EDAS), and the Multi-Objective Optimization Method by Ratio Analysis (MOORA), has been applied, taking into account two techniques for determining the criteria weights: the entropy weighting method and the Gini coefficient-based weighting method.

The rest of this paper is organized as follows. Section 2 provides a literature review of studies covering the application of MCDA methods to multi-criteria evaluation problems. Section 3 gives the basics, fundamental assumptions, and steps of the MCDA methods and supporting methods involved in this research and explains the subsequent stages of the research. In Sect. 4, the research results are presented and discussed. Finally, Sect. 5 summarizes the research and provides conclusions and future work directions.
2 Literature Review
Development in the domain of MCDA research has led to the establishment of two main groups of methods: the American school and the European school. Methods arising from the American school use a utility function in evaluating alternatives [17]. This group includes, for example, the Analytic Hierarchy Process (AHP), the Analytic Network Process (ANP), the Simple Multi-Attribute Rating Technique (SMART), the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR), and COmplex PropoRtional ASsessment (COPRAS) [18]. The methods representing the European school, on the other hand, are based on a relational model that takes into account relations of indifference, weak or strong preference, and incomparability. Popular methods belonging to the European school are the Elimination and Choice Expressing the Reality family of methods (ELECTRE), PROMETHEE, the Novel Approach to Imprecise Assessment and Decision Environment (NAIADE), ORESTE, and the Treatment of the Alternatives according To the Importance of Criteria (TACTIC) [19]. In addition, there are methods that combine the approaches of the American and European schools, such as EVAMIX, QUALIFLEX, and the PCCA group of methods, as well as a group of methods based on decision rules using fuzzy set theory, represented by the Characteristic Objects METhod (COMET).

The Preference Ranking Organization METHod for Enrichment of Evaluations (PROMETHEE II) belongs to the European school. It shares attributes with other European school-based methods, such as outranking relations, threshold values, various types of preference functions [9], and a reduced effect of linear compensation of criteria [20]. The advantage of PROMETHEE II over other methods from the European school is that it provides a quantitative final ranking. The method performs a pairwise comparison of the performance values of alternatives against the criteria. It was developed in 1982 by J.P. Brans and further developed by Brans and Vincke in 1985 [21]. It applies preference functions to compare alternatives across criteria and uses preference and indifference thresholds [22]. PROMETHEE II was used to evaluate the websites of enterprises promoting renewable energy [23], to evaluate investment scenarios for a geothermal field in comparison with ELECTRE III [24], and to assess locations for onshore farms [25]. It has also been applied to evaluate sustainable energy consumption in the industrial sector, in comparison with COMET, TOPSIS, and VIKOR [26], and to assess banking services [27], sustainability in e-banking websites [28], and electronic banking services [29], in comparison with PROSA (PROMETHEE for Sustainability Assessment) and TOPSIS.

The Multi-Attributive Border Approximation area Comparison method (MABAC) was introduced by Pamučar and Ćirović in 2015 [30]. It involves calculating the values of the criterion function for each alternative and determining their distance from the border approximation area. The MABAC method has found practical application in various multi-criteria decision-making problems, such as forklift evaluation in comparison with the SAW, COPRAS, TOPSIS, MOORA, and VIKOR methods [30], identifying the most viable locations for wind farms [31], identifying areas most susceptible to flooding in comparison with TOPSIS [32], and selecting the most favorable mobile phone model in an e-commerce recommender system in comparison with COMET, COCOSO (Combined Compromise Solution), EDAS, and MAIRCA (Multi Attributive Ideal-Real Comparative Analysis) [33].
Moreover, the MABAC method has been applied in many studies in the cited literature and has shown high stability of solutions under shifts in criteria weights compared to other methods, confirming its suitability for rational decision-making.

The Evaluation based on Distance from Average Solution method (EDAS) is based on reference points, similarly to well-known methods from the American school, including TOPSIS [34], VIKOR, and CODAS (COmbinative Distance-based ASsessment). However, the algorithm of the EDAS method differs from the conventional approach, in which ideal solutions, anti-ideal solutions, or both serve as reference points: EDAS evaluates alternatives by their distance from the average solution. Studies reported in the literature demonstrate the usefulness of this method, especially in problems requiring the consideration of conflicting criteria [35]. EDAS is relatively new compared with other well-established MCDA methods. It was developed by Keshavarz Ghorabaee, Zavadskas, Olfat, and Turskis in 2015 [36]. The literature reports that searching for the best solution by measuring the distance from the average solution reduces the possibility of deviation from the best solution, providing more accurate results in solving real-life multi-criteria problems [37]. The EDAS method has been applied to the multi-criteria evaluation of sustainable socioeconomic development in European Union countries [38], to the evaluation of zero-carbon transport policies [39], and to the selection of the most advantageous mobile phone [40].
The Multi-Objective Optimization Method by Ratio Analysis (MOORA) includes both maximization and minimization goals [41]. It was introduced by Brauers and Zavadskas [42]. The method is characterized by simplicity, resulting in low computational complexity. In MOORA, alternatives are evaluated by aggregating maximum profits and minimum costs [43]. MOORA was applied to the multi-criteria problem of industrial robot selection in comparison with WSM, WPM, and WASPAS [41], to the evaluation of the potential of regions to transition to renewable energy in comparison with TOPSIS [43], to the assessment of road design alternatives [44], and to the selection of the most advantageous energy storage device in comparison with COMET, PROMETHEE II, and TOPSIS [45].

Based on the literature review, the authors selected PROMETHEE II for the set of MCDA methods employed in this study because it considers not only indifference or preference between comparable alternatives, as is the case with methods from the American school, but also other relations such as incomparability and weak or strong preference. The other selected methods, namely MABAC, EDAS, and MOORA, are based on the assumptions of the American school but have some distinctive advantages. The MABAC method is robust to changes in the model, such as changes in the values of the weights, which has been confirmed in the literature; EDAS is resistant to deviation from the best solution, thus producing accurate results; and MOORA has a straightforward algorithm and low computational complexity.

The above literature review shows that most published investigations involve more than one MCDA method. This implies that evaluating a multi-criteria problem using only one method is a limitation of a study, because no single MCDA method is ideal for a particular problem.
Various methods may provide different solutions to the same problem, which is worth considering in the benchmarking procedure. On the other hand, divergences in the rankings provided by several MCDA methods raise the question of which ranking is the most reliable. The variations in solutions provided by different MCDA methods are caused by differences in algorithms that, for example, determine reference solutions and calculate the distance of alternatives from them in different ways. For example, MCDA methods may use the distance from the best, worst, or average solution in the procedure for determining the most favorable alternative. At the same time, obtaining different solutions can help achieve more reliable results that take into account the advantages of each method [46]. However, achieving this goal requires an additional method that determines a ranking based on multiple rankings. Tools that aggregate multiple rankings into a single integrated ranking list are called compromise ranking techniques. They include the Dominance-Directed Graph, also known as Tournaments [47], the Rank Position Method [47], the Technique of Precise Order Preference, Borda Count [48], Improved Borda Rules [49], and the Copeland Strategy [50]. Most of these strategies are used in the MULTIMOORA method, which integrates three specific approaches providing three rankings [51]. However, some of these strategies can be successfully adapted to construct reliable rankings based on rankings provided by different MCDA methods.
3 Research Methodology
The purpose of this research is to compare the results of four different MCDA methods, namely PROMETHEE II [21,52,53], MABAC [30], EDAS [35], and MOORA [44], applied to the problem of evaluating 20 different camera models, and to generate a compromise ranking based on the results obtained with the particular MCDA methods. The basic assumptions and subsequent steps of the MCDA methods employed in this research are provided in Sects. 3.1, 3.2, 3.3 and 3.4.

3.1 The PROMETHEE II Method
The subsequent steps of the PROMETHEE II method are presented on the basis of [54].

Step 1. Calculate the preference function value for each pair of compared alternatives with regard to each criterion. The preference function is defined by Eq. (1)

$$P_j(a, b) = F_j[d_j(a, b)], \quad \forall a, b \in A \tag{1}$$

where $A$ represents the set of evaluated alternatives, $d_j(a, b)$ denotes the difference resulting from the pairwise comparison of two actions, given by Eq. (2) for profit criteria and Eq. (3) for cost criteria, and $j = 1, 2, \ldots, n$ indexes the considered criteria.

$$d_j(a, b) = g_j(a) - g_j(b) \tag{2}$$

$$d_j(a, b) = g_j(b) - g_j(a) \tag{3}$$

The value of the preference function $P$ always lies between 0 and 1. The V-shape with indifference criterion, also known as a linear function, was applied in this research.

$$P(d) = \begin{cases} 0 & d \le q \\ \dfrac{d - q}{p - q} & q < d \le p \\ 1 & d > p \end{cases} \tag{4}$$

This type of preference function requires two additional parameters, namely $p$, the threshold of absolute preference, and $q$, the indifference threshold. In this research, both thresholds were calculated from the standard deviation $\sigma_j$ of all alternatives' values with respect to each criterion. The preference threshold was calculated as $p = 2\sigma_j$ and the indifference threshold as $q = 0.5\sigma_j$, as proposed in [55].
Step 2. Calculate the aggregated preference indices according to Eq. (5)

$$\pi(a, b) = \sum_{j=1}^{n} P_j(a, b)\, w_j, \qquad \pi(b, a) = \sum_{j=1}^{n} P_j(b, a)\, w_j \tag{5}$$

where $a$ and $b$ represent the compared alternatives, $\pi(a, b)$ expresses how much $a$ is preferred to $b$ with respect to all criteria, and $w_j$ represents the weight of the $j$-th criterion.

Step 3. Calculate the positive $\Phi^+$ and negative $\Phi^-$ outranking flows for each alternative. The positive outranking flow expresses how much alternative $a$ is preferred to the other alternatives and is calculated according to Eq. (6).

$$\Phi^+(a) = \frac{1}{m - 1} \sum_{i=1}^{m} \pi(a, b_i) \tag{6}$$

The negative outranking flow, on the other hand, expresses how much the other alternatives are preferred to alternative $a$ and is computed as Eq. (7) shows.

$$\Phi^-(a) = \frac{1}{m - 1} \sum_{i=1}^{m} \pi(b_i, a) \tag{7}$$

In both Eqs. (6) and (7), $i = 1, 2, \ldots, m$ indexes the evaluated alternatives.

Step 4. Compute the net flow $\Phi$ according to Eq. (8)

$$\Phi(a) = \Phi^+(a) - \Phi^-(a) \tag{8}$$

Finally, alternatives are ranked in descending order of their $\Phi(a)$ values: the alternative with the highest net flow value is the most preferable.
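The PROMETHEE II steps of Eqs. (1)-(8) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `v_shape_with_indifference` and `promethee_ii` are hypothetical, criterion types are encoded as +1 (profit) and -1 (cost), and the population standard deviation is assumed for $\sigma_j$, since the text does not say which estimator was used.

```python
from statistics import pstdev


def v_shape_with_indifference(d, q, p):
    """Eq. (4): 0 up to q, linear between q and p, 1 above p."""
    if d <= q:
        return 0.0
    if d <= p:
        return (d - q) / (p - q)
    return 1.0


def promethee_ii(matrix, weights, types):
    """Return the net flows Phi(a), one per row (alternative) of `matrix`.

    types[j] is +1 for a profit criterion and -1 for a cost criterion
    (an encoding assumed for this sketch).
    """
    m, n = len(matrix), len(weights)
    # Thresholds from the per-criterion standard deviation (assumed population form).
    sigma = [pstdev([row[j] for row in matrix]) for j in range(n)]
    q = [0.5 * s for s in sigma]
    p = [2.0 * s for s in sigma]

    def pi(a, b):
        # Eq. (5): weighted sum of the preference degrees of a over b.
        total = 0.0
        for j in range(n):
            d = types[j] * (matrix[a][j] - matrix[b][j])  # Eqs. (2)-(3)
            total += weights[j] * v_shape_with_indifference(d, q[j], p[j])
        return total

    net = []
    for a in range(m):
        others = [b for b in range(m) if b != a]
        phi_plus = sum(pi(a, b) for b in others) / (m - 1)   # Eq. (6)
        phi_minus = sum(pi(b, a) for b in others) / (m - 1)  # Eq. (7)
        net.append(phi_plus - phi_minus)                     # Eq. (8)
    return net
```

Sorting the alternatives by the returned values in descending order gives the ranking. Because every preference degree counted in one alternative's positive flow is also counted in another alternative's negative flow, the net flows sum to zero, which is a convenient sanity check.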
3.2 The MABAC Method
The subsequent stages of the MABAC (Multi-Attributive Border Approximation area Comparison) method are detailed below, based on [30].

Step 1. Normalize the decision matrix $X = [x_{ij}]_{m \times n}$, where $m$ denotes the number of alternatives ($i = 1, 2, \ldots, m$) and $n$ represents the number of criteria ($j = 1, 2, \ldots, n$), using Minimum-Maximum normalization according to Eq. (9) for profit criteria and Eq. (10) for cost criteria.

$$r_{ij} = \frac{x_{ij} - \min_j(x_{ij})}{\max_j(x_{ij}) - \min_j(x_{ij})} \tag{9}$$

$$r_{ij} = \frac{\max_j(x_{ij}) - x_{ij}}{\max_j(x_{ij}) - \min_j(x_{ij})} \tag{10}$$

Step 2. Compute the weighted normalized matrix $V = [v_{ij}]_{m \times n}$ as Eq. (11) shows

$$v_{ij} = w_j (r_{ij} + 1) \tag{11}$$

where $r_{ij}$ are the elements of the normalized decision matrix and $w_j$ denotes the weight of the $j$-th criterion. The weighted matrix $V$ is obtained as Eq. (12) demonstrates.

$$V = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{bmatrix} = \begin{bmatrix} w_1(r_{11} + 1) & w_2(r_{12} + 1) & \cdots & w_n(r_{1n} + 1) \\ w_1(r_{21} + 1) & w_2(r_{22} + 1) & \cdots & w_n(r_{2n} + 1) \\ \vdots & \vdots & \ddots & \vdots \\ w_1(r_{m1} + 1) & w_2(r_{m2} + 1) & \cdots & w_n(r_{mn} + 1) \end{bmatrix} \tag{12}$$

Step 3. Determine the border approximation area matrix $G$. The border approximation area for each criterion is computed by applying Eq. (13)

$$g_j = \left( \prod_{i=1}^{m} v_{ij} \right)^{1/m} \tag{13}$$

where $m$ represents the number of alternatives. Computing $g_j$ for each of the $n$ criteria yields the border approximation area matrix $G$ of size $1 \times n$, as Eq. (14) demonstrates.

$$G = \begin{bmatrix} g_1 & g_2 & \cdots & g_n \end{bmatrix} \tag{14}$$

Step 4. Calculate the distance of the alternatives from the border approximation area, which gives the elements of matrix $Q$, as Eqs. (15) and (16) present.

$$Q = V - G = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{bmatrix} - \begin{bmatrix} g_1 & g_2 & \cdots & g_n \end{bmatrix} \tag{15}$$

$$Q = \begin{bmatrix} v_{11} - g_1 & v_{12} - g_2 & \cdots & v_{1n} - g_n \\ v_{21} - g_1 & v_{22} - g_2 & \cdots & v_{2n} - g_n \\ \vdots & \vdots & \ddots & \vdots \\ v_{m1} - g_1 & v_{m2} - g_2 & \cdots & v_{mn} - g_n \end{bmatrix} = \begin{bmatrix} q_{11} & q_{12} & \cdots & q_{1n} \\ q_{21} & q_{22} & \cdots & q_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{m1} & q_{m2} & \cdots & q_{mn} \end{bmatrix} \tag{16}$$

Step 5. Calculate the criterion function value for each alternative using Eq. (17).

$$S_i = \sum_{j=1}^{n} q_{ij}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n \tag{17}$$

The final ranking of alternatives is built by sorting the alternatives according to their $S_i$ values in descending order: the alternative with the highest value of $S_i$ is the most advantageous.
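A compact sketch of the MABAC steps in Eqs. (9)-(17) is given below. It is an illustration only: the function name `mabac` and the +1/-1 encoding of profit and cost criteria are assumptions of this example, and the code assumes that every criterion takes at least two distinct values, so that the min-max denominator is non-zero.

```python
import math


def mabac(matrix, weights, types):
    """Return the criterion function values S_i for the rows of `matrix`.

    types[j] is +1 for a profit criterion and -1 for a cost criterion
    (an encoding assumed for this sketch).
    """
    m, n = len(matrix), len(weights)
    columns = list(zip(*matrix))
    scores = [0.0] * m
    for j in range(n):
        lo, hi = min(columns[j]), max(columns[j])
        span = hi - lo  # assumed non-zero
        if types[j] == 1:
            r = [(x - lo) / span for x in columns[j]]   # Eq. (9)
        else:
            r = [(hi - x) / span for x in columns[j]]   # Eq. (10)
        v = [weights[j] * (rij + 1.0) for rij in r]     # Eq. (11)
        g = math.prod(v) ** (1.0 / m)                   # Eq. (13), geometric mean
        for i in range(m):
            scores[i] += v[i] - g                       # Eqs. (15)-(17)
    return scores
```

Alternatives whose weighted values lie above the border approximation area contribute positive distances, so a higher returned score indicates a better alternative.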
3.3 The EDAS Method
The subsequent steps of the EDAS (Evaluation based on Distance from Average Solution) method are provided based on [35].

Step 1. Determine the average solution $AV = [AV_j]_{1 \times n}$ for all criteria as Eq. (18) shows.

$$AV_j = \frac{\sum_{i=1}^{m} x_{ij}}{m} \tag{18}$$

Step 2. Determine $PDA = [PDA_{ij}]_{m \times n}$ and $NDA = [NDA_{ij}]_{m \times n}$. For profit criteria, Eqs. (19) and (20) are employed.

$$PDA_{ij} = \frac{\max(0, (x_{ij} - AV_j))}{AV_j} \tag{19}$$

$$NDA_{ij} = \frac{\max(0, (AV_j - x_{ij}))}{AV_j} \tag{20}$$

For cost criteria, Eqs. (21) and (22) are applied

$$PDA_{ij} = \frac{\max(0, (AV_j - x_{ij}))}{AV_j} \tag{21}$$

$$NDA_{ij} = \frac{\max(0, (x_{ij} - AV_j))}{AV_j} \tag{22}$$

where $PDA_{ij}$ represents the positive distance of the $i$-th alternative from the average solution with respect to the $j$-th criterion and $NDA_{ij}$ is the negative distance of the $i$-th alternative from the average solution with respect to the $j$-th criterion.

Step 3. Calculate the weighted sums of the positive and the negative distances according to Eqs. (23) and (24)

$$SP_i = \sum_{j=1}^{n} w_j PDA_{ij} \tag{23}$$

$$SN_i = \sum_{j=1}^{n} w_j NDA_{ij} \tag{24}$$

where $w_j$ represents the weight of the $j$-th criterion.

Step 4. Normalize the weighted sums of $PDA$ and $NDA$ using Eqs. (25) and (26).

$$NSP_i = \frac{SP_i}{\max_i SP_i} \tag{25}$$

$$NSN_i = 1 - \frac{SN_i}{\max_i SN_i} \tag{26}$$

Step 5. Calculate the final appraisal score $AS_i$ of each alternative as the mean of its normalized weighted sums of the positive and negative distances from the average solution, using Eq. (27)

$$AS_i = \frac{1}{2} (NSP_i + NSN_i) \tag{27}$$

where $0 \le AS_i \le 1$. Alternatives are ranked in decreasing order of their appraisal scores: the most preferred alternative has the highest appraisal score.
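The EDAS procedure of Eqs. (18)-(27) can be illustrated with the short sketch below. The function name `edas` and the +1/-1 type encoding are assumptions of this example; the code also assumes that every criterion average is positive and that at least one alternative lies above and one below the average, so that none of the denominators is zero.

```python
def edas(matrix, weights, types):
    """Return the appraisal scores AS_i for the rows of `matrix`.

    types[j] is +1 for a profit criterion and -1 for a cost criterion
    (an encoding assumed for this sketch).
    """
    m, n = len(matrix), len(weights)
    av = [sum(row[j] for row in matrix) / m for j in range(n)]  # Eq. (18)
    sp = [0.0] * m
    sn = [0.0] * m
    for i in range(m):
        for j in range(n):
            # Signed relative deviation, oriented so that a positive value
            # means "better than the average solution" for both criterion types.
            dev = types[j] * (matrix[i][j] - av[j]) / av[j]
            sp[i] += weights[j] * max(0.0, dev)    # Eqs. (19), (21), (23)
            sn[i] += weights[j] * max(0.0, -dev)   # Eqs. (20), (22), (24)
    nsp = [s / max(sp) for s in sp]                # Eq. (25)
    nsn = [1.0 - s / max(sn) for s in sn]          # Eq. (26)
    return [0.5 * (a + b) for a, b in zip(nsp, nsn)]  # Eq. (27)
```

Folding the criterion type into the sign of the deviation lets one expression cover the profit formulas (19)-(20) and the cost formulas (21)-(22), which differ only in the order of subtraction.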
3.4 The MOORA Method
The following stages of the MOORA method are described based on [42].

Step 1. Normalize the decision matrix $X = [x_{ij}]_{m \times n}$ according to Eq. (28)

$$r_{ij} = \frac{x_{ij}}{\left( \sum_{i=1}^{m} x_{ij}^2 \right)^{1/2}} \tag{28}$$

where $x_{ij}$ represents the value of the $i$-th alternative with respect to the $j$-th criterion in the decision matrix and $r_{ij}$ is its normalized value.

Step 2. Determine the overall performance index of each alternative. This involves calculating the difference between the sums of its weighted normalized performance values for the profit and the cost criteria, as Eq. (29) demonstrates

$$Q_i = \sum_{j \in \Omega_{max}} w_j r_{ij} - \sum_{j \in \Omega_{min}} w_j r_{ij} \tag{29}$$

where $Q_i$ represents the preference value of the $i$-th alternative, $w_j$ represents the weight assigned to the $j$-th criterion, $\Omega_{max}$ is the set of profit criteria, $\Omega_{min}$ denotes the set of cost criteria, $i = 1, 2, \ldots, m$ indexes the evaluated alternatives, and $j = 1, 2, \ldots, n$ indexes the considered criteria. The final ranking is constructed by ordering the alternatives decreasingly according to the $Q$ value. Thus, the alternative with the highest $Q$ value is the most advantageous.

In modeling decision problems, determining the importance of the decision criteria is a very significant stage. Because this research aimed to assess the alternatives objectively, the authors chose two objective weighting techniques instead of subjective methods to determine the criteria weights: the entropy weighting method [56], detailed in Sect. 3.5, and the Gini coefficient-based weighting method [57], described in Sect. 3.6.
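For reference, the two MOORA steps of Sect. 3.4 (Eqs. (28) and (29)) reduce to a few lines of code. The sketch below is illustrative only; the function name `moora` and the +1/-1 encoding of profit and cost criteria are assumptions of this example.

```python
import math


def moora(matrix, weights, types):
    """Return the overall performance indices Q_i for the rows of `matrix`.

    types[j] is +1 for a profit criterion and -1 for a cost criterion
    (an encoding assumed for this sketch).
    """
    n = len(weights)
    # Denominator of Eq. (28): Euclidean norm of each criterion column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    scores = []
    for row in matrix:
        q = 0.0
        for j in range(n):
            r = row[j] / norms[j]            # Eq. (28)
            q += types[j] * weights[j] * r   # Eq. (29): add profits, subtract costs
        scores.append(q)
    return scores
```

The single loop per alternative reflects the low computational complexity attributed to MOORA in the text: no pairwise comparisons or reference solutions are needed.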
3.5 The Entropy Weighting Method
Step 1. Normalize the decision matrix according to Eq. (30)

$$p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}} \tag{30}$$

Step 2. Calculate the entropy as Eq. (31) shows

$$h_j = -(\ln m)^{-1} \sum_{i=1}^{m} p_{ij} \cdot \ln p_{ij} \tag{31}$$

where $p_{ij} \cdot \ln p_{ij}$ is set to 0 if $p_{ij} = 0$.

Step 3. Calculate the weights using Eq. (32).

$$w_j = \frac{1 - h_j}{\sum_{j=1}^{n} (1 - h_j)} \tag{32}$$
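The three steps of the entropy weighting method can be sketched as follows. The function name `entropy_weights` is hypothetical, and the sketch assumes non-negative performance values with a positive column sum and at least one criterion whose values are not all equal (otherwise the denominator of Eq. (32) is zero).

```python
import math


def entropy_weights(matrix):
    """Return objective criteria weights from Eqs. (30)-(32)."""
    m, n = len(matrix), len(matrix[0])
    h = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        p = [x / total for x in column]                             # Eq. (30)
        # Terms with p_ij = 0 are skipped, i.e. p_ij * ln(p_ij) is taken as 0.
        s = sum(pij * math.log(pij) for pij in p if pij > 0)
        h.append(-s / math.log(m))                                  # Eq. (31)
    d = [1.0 - hj for hj in h]
    return [dj / sum(d) for dj in d]                                # Eq. (32)
```

A criterion on which all alternatives perform identically has maximal entropy $h_j = 1$ and therefore receives zero weight, which matches the intuition that such a criterion cannot discriminate between alternatives.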
3.6 The Gini Coefficient-Based Weighting Method
Step 1. Calculate the Gini coefficient value for each $j$-th criterion using Eq. (33)

$$G_j = \sum_{i=1}^{m} \sum_{k=1}^{m} \frac{|x_{ij} - x_{kj}|}{2 m^2 E_j} \tag{33}$$

where $x$ denotes the performance values of the decision matrix $X = [x_{ij}]_{m \times n}$, $m$ represents the number of alternatives ($i = 1, 2, \ldots, m$), $n$ denotes the number of criteria ($j = 1, 2, \ldots, n$), and $E_j$ represents the average value of all alternatives with respect to the $j$-th criterion. Eq. (33) is applied if $E_j$ is not equal to 0. Otherwise, the Gini coefficient is calculated according to Eq. (34).

$$G_j = \sum_{i=1}^{m} \sum_{k=1}^{m} \frac{|x_{ij} - x_{kj}|}{m^2 - m} \tag{34}$$

Step 2. Calculate the weight of each $j$-th criterion using Eq. (35).

$$w_j = \frac{G_j}{\sum_{j=1}^{n} G_j} \tag{35}$$
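A direct transcription of Eqs. (33)-(35) is sketched below. The function name `gini_weights` is an assumption of this example, and the code assumes that at least one criterion has non-identical values, so that the sum of the Gini coefficients is non-zero.

```python
def gini_weights(matrix):
    """Return objective criteria weights based on the Gini coefficient."""
    m, n = len(matrix), len(matrix[0])
    g = []
    for j in range(n):
        column = [row[j] for row in matrix]
        mean = sum(column) / m  # E_j
        # Double sum of absolute differences over all ordered pairs (i, k).
        diff = sum(abs(a - b) for a in column for b in column)
        if mean != 0:
            g.append(diff / (2 * m * m * mean))   # Eq. (33)
        else:
            g.append(diff / (m * m - m))          # Eq. (34)
    return [gj / sum(g) for gj in g]              # Eq. (35)
```

As with the entropy method, criteria with more dispersed values receive larger weights; the two methods differ in how dispersion is measured.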
The next phase of the investigation was to determine the convergence of the rankings obtained using the different MCDA methods. For this purpose, two ranking correlation coefficients were used: the symmetrical $r_w$ and the asymmetrical $WS$, which are detailed in Sects. 3.7 and 3.8.

3.7 Weighted Spearman's Rank Correlation Coefficient
The symmetrical $r_w$ correlation coefficient is calculated using Eq. (36), where $N$ is the sample size and $x_i$ and $y_i$ are the positions in the compared rankings [9].

$$r_w = 1 - \frac{6 \sum_{i=1}^{N} (x_i - y_i)^2 \left( (N - x_i + 1) + (N - y_i + 1) \right)}{N^4 + N^3 - N^2 - N} \tag{36}$$
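Eq. (36) translates into code almost literally. In the sketch below, the function name `weighted_spearman` is hypothetical, and both arguments are assumed to be lists of rank positions (1 = best) of the same alternatives.

```python
def weighted_spearman(x, y):
    """Weighted Spearman rank correlation r_w of two rankings, Eq. (36)."""
    n = len(x)
    numerator = 6 * sum(
        (xi - yi) ** 2 * ((n - xi + 1) + (n - yi + 1))
        for xi, yi in zip(x, y)
    )
    return 1 - numerator / (n**4 + n**3 - n**2 - n)
```

The factor $(N - x_i + 1) + (N - y_i + 1)$ is largest for positions near the top of either ranking, so disagreements about the best alternatives lower $r_w$ more than disagreements about the worst ones. Swapping the two arguments leaves the value unchanged, which is what makes the coefficient symmetrical.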
3.8 The WS Similarity Coefficient
The asymmetrical $WS$ similarity coefficient is calculated according to Eq. (37), where $N$ denotes the sample size and $x_i$ and $y_i$ represent the ranks in the compared rankings $x$ and $y$. For this coefficient, rank reversals at the top of the ranking affect its value most significantly [9].

$$WS = 1 - \sum_{i=1}^{N} 2^{-x_i} \frac{|x_i - y_i|}{\max(|x_i - 1|, |x_i - N|)} \tag{37}$$
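The $WS$ coefficient of Eq. (37) can be sketched in the same way. The function name `ws_coefficient` is an assumption of this example; the rankings are assumed to contain positions from 1 to $N$ with $N > 1$, so that the denominator is never zero.

```python
def ws_coefficient(x, y):
    """WS similarity of ranking y to reference ranking x, Eq. (37)."""
    n = len(x)
    return 1 - sum(
        2.0 ** -xi * abs(xi - yi) / max(abs(xi - 1), abs(xi - n))
        for xi, yi in zip(x, y)
    )
```

For example, with $x = (1, 2, 3)$, swapping the top two positions, $y = (2, 1, 3)$, gives $WS = 1 - (0.25 + 0.25) = 0.5$, whereas swapping the bottom two, $y = (1, 3, 2)$, gives $WS = 1 - (0.25 + 0.0625) = 0.6875$. The weight $2^{-x_i}$ depends only on the first ranking, which is why exchanging the arguments can change the result.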
3.9 The Dominance-Directed Graph for Compromise Ranking Determination
The final step of this research was to determine the compromise rankings within each weighting method based on the rankings provided by the four MCDA methods investigated. In order to generate the compromise rankings, the DominanceDirected Graph method [47] was applied. In the dominance directed graph method, each ranking is regarded as a tournament in which each alternative is an individual team. Step 1. Assume that matrix Z presented in Eq. (38) contains n different rankings (j = 1, 2, . . . , n) of m alternatives (i = 1, k, . . . , m). ⎡ ⎤ z11 z12 · · · z1n ⎢ zk1 zk2 · · · zkn ⎥ ⎢ ⎥ (38) Z = [zij ]m×n = ⎢ . .. .. .. ⎥ ⎣ .. . . . ⎦ zm1 zm2 · · · zmn Step 2. The alternatives have to be compared within each j-th tournament such that when alternative zij dominates over alternative zkj , it receives a value of 1; otherwise, it gets a value of 0. For performing this step, construct a vertex matrix H = [hik ]m×m in which the values resulting from each pairwise comparison of the particular i-th and k-th alternatives are added within each j-th tournament, as Eq. (39) shows H = [hik ]m×m =
n
P (zij , zkj ) ∀ i, k ∈ {1, 2, . . . , m}
(39)
j=1
where P is performed as Eq. (40) demonstrates 1 f (zij ) > f (zkj ) P (zij , zkj ) = 0 f (zij ) ≤ f (zkj )
(40)
where f represents function which has to be maximized. Step 3. In the final step, the rows of the matrix H are summed. The sums of the matrix H rows represent the preference value for each alternative. The highest preference value indicates the best alternative. The final ranking is generated by ordering the alternatives in descending order according to the preference values.
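Steps 1 to 3 can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the function name `dominance_directed_graph` is hypothetical, f is taken as the negated rank position (so that a better position gives a larger value to maximize), and alternatives with equal preference values share one position in the final ranking (dense ranking). The small input matrix is invented for the example.

```python
import numpy as np


def dominance_directed_graph(rankings):
    """Compromise ranking from an (m, n) matrix of rank positions.

    Row i is an alternative, column j is one ranking (tournament), 1 = best.
    Returns the vertex matrix H, the preference values, and the final ranks.
    """
    z = np.asarray(rankings, dtype=float)
    f = -z  # assumption: lower rank position means a larger value of f
    # wins[i, k, j] is True when alternative i dominates k in tournament j (Eq. 40)
    wins = f[:, None, :] > f[None, :, :]
    H = wins.sum(axis=2)          # Eq. (39): sum over the n tournaments
    preference = H.sum(axis=1)    # Step 3: row sums of H
    # Dense ranking in descending order of preference (ties share a position)
    ranks = np.unique(-preference, return_inverse=True)[1] + 1
    return H, preference, ranks


# Hypothetical example: 3 alternatives evaluated in 3 rankings.
example = [[1, 1, 2],
           [2, 3, 1],
           [3, 2, 3]]
H, preference, ranks = dominance_directed_graph(example)
print(preference, ranks)
```

With rankings that contain no ties, the preference value of an alternative equals the number of pairwise wins collected over all tournaments, so the procedure behaves like a Borda-type count of the individual rankings.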
3.10 The Study Case
Data on the evaluation criteria values of the considered camera models were obtained from various websites. The selected quantitative criteria, which are provided in Table 1 together with their types and units, represent camera parameters considered by customers during purchase decisions.
Table 1. Evaluation criteria, their types and units.
| Ci | Name | Type | Unit |
|---|---|---|---|
| C1 | Thickness | Cost | Millimeters [mm] |
| C2 | Width | Cost | Millimeters [mm] |
| C3 | Height | Cost | Millimeters [mm] |
| C4 | Weight | Cost | Gram [g] |
| C5 | Resolution | Profit | Megapixel [Mpx] |
| C6 | 4K | Profit | Frames per second [FPS] |
| C7 | FullHD | Profit | Frames per second [FPS] |
| C8 | HD | Profit | Frames per second [FPS] |
| C9 | Viewing angle | Profit | Degree [°] |
| C10 | Battery life | Profit | Minutes [min] |
| C11 | Price | Cost | Polish zloty [PLN] |
Table 2. Decision matrix with performance values of alternatives for considered criteria. Ai
Alternatives
C1
C2
C3
C4
A1
SONY FDR-X3000
29.4
83
47
114 12
C5
C6 C7
C8
C9
C10 C11
A2
DJI Pocket 2 Creator Combo
30
38.1
124.7
117 16
A3
GÖTZE & JENSEN S-Line SC501
29.28
59.27
41.13
58 16
A4
GOPRO HERO9
33.6
71
55
A5
Xblitz Move 4K+
21
59
41
66 16
A6
DJI Osmo Action
35
65
42
134 12
60 240 240 145
60 1087
A7
Insta360 ONE R-1-Inch Edition
47
79
54
158 19
60 120 120 360
72 2499
A8
GOPRO HERO7
28.3
62.3
44.9
116 12
30
90 999.99
A9
DJI Osmo Pocket
36.9
28.6
121.6 130 12
60 120 120
A10 GOXTREME Enduro
32
59
41
30 120 120 170
A11 GOPRO HERO8
28.4
66.3
48.6
126 12
A12 Insta360 One X2
29.8
46
113
47 18
50
50
A13 SJCAM A20
20.2
64
80
70 8
24
60 120 166 480 699.99
30 120 240 170
90 1717.75
60
60
70 2389
30
60 120 170
60
93
78 239.99
159 23.6 60 240 240 132 140 2099
60 16
24
60 120 170
60
60 130 80
70 439
80 1099 60 302.96
60 240 240 132 135 1629 50 360
72 2099
A14 LAMAX X9.1
33
60
44
59 12
30
60 120 170
A15 MANTA MM9259
29
59
41
55 16
30
60 120 170 120 299
90 388
A16 SJCAM SJ4000 WiFi
29
59
41
182 12
30
30
A17 LAMAX Action X3.1 Atlas
29.8
59.2
41
65 16
30
60 120 160
A18 SJCAM SJ10 Pro
28.8
62.5
41
70 12
60 120 120 170 138 1399.99
A19 GOXTREME Pioneer
24
40
59
60 12
10
30
30 140
78 269.99
A20 TRACER eXplore SJ 4561
30
60
45
201 16
30
30
30 170
90 199.99
60
94 140 249 90 219.99
The performance values of the evaluated alternatives Ai for the considered criteria Cj are included in the decision matrix provided in Table 2. Based on the data in the decision matrix, the weights of each criterion were calculated using the entropy and Gini coefficient-based weighting methods. The weights assigned to particular criteria by each weighting technique are displayed in Table 3.
Table 3. Criteria weights determined by two objective weighting methods.
| Weighting method | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Entropy weights | 0.0127 | 0.0190 | 0.0729 | 0.0778 | 0.0211 | 0.0663 | 0.1661 | 0.1163 | 0.0586 | 0.1519 | 0.2372 |
| Gini weights | 0.0362 | 0.0437 | 0.0848 | 0.0984 | 0.0480 | 0.0842 | 0.1379 | 0.1125 | 0.0745 | 0.1107 | 0.1690 |
It can be noticed that both weighting methods indicated C11 (Price) as the most significant criterion. In addition, criteria C7 (FullHD frame rate), C8 (HD frame rate), and C10 (Battery life) also received high weights, indicating their high importance in the decision-making context.
4 Research Findings and Discussion
4.1 MCDA Results of Evaluation Using the Entropy Weighting Method
Table 4 provides the results of the evaluation with the entropy weighting method: the preference and rank values obtained with each MCDA method and the compromise ranking generated with the Dominance-Directed Graph method. The rankings are also visualized as a column chart in Fig. 1. With the weights determined by the entropy method, the MCDA methods did not agree on the ranking leader. PROMETHEE II and MABAC ranked alternative A6 (DJI Osmo Action) first, while EDAS and MOORA ranked it second. In contrast, EDAS and MOORA identified A13 (SJCAM A20) as the most advantageous alternative, which ranked third in the PROMETHEE II and MABAC rankings. Alternative A11 (GOPRO HERO8) was ranked second by PROMETHEE II and MABAC and third by EDAS and MOORA. Despite these differences, all methods clearly identified A6, A13, and A11 as the three most favorable alternatives. The results indicate that the PROMETHEE II and MABAC rankings are very similar, as are the EDAS and MOORA rankings. To assess the similarities and differences reliably, the two rank correlation coefficients, rw and WS, were determined for each pair of rankings. The results of this convergence analysis are visualized in Fig. 2.
Table 4. Preference values and rankings provided by MCDA methods using entropy weighting method.
| Ai | PROMETHEE II (pref.) | MABAC (pref.) | EDAS (pref.) | MOORA (pref.) | PROMETHEE II (rank) | MABAC (rank) | EDAS (rank) | MOORA (rank) | Compromise (rank) |
|---|---|---|---|---|---|---|---|---|---|
| A1 | −0.0526 | 0.0133 | 0.4527 | 0.0213 | 13 | 12 | 13 | 12 | 12 |
| A2 | −0.2974 | −0.2351 | 0.0354 | −0.0746 | 20 | 20 | 20 | 20 | 20 |
| A3 | 0.0629 | 0.0947 | 0.5960 | 0.0513 | 8 | 8 | 8 | 8 | 8 |
| A4 | 0.1045 | 0.1070 | 0.6177 | 0.0579 | 4 | 5 | 6 | 6 | 5 |
| A5 | 0.0491 | 0.0634 | 0.5387 | 0.0387 | 9 | 11 | 11 | 11 | 11 |
| A6 | 0.2247 | 0.1949 | 0.7393 | 0.0854 | 1 | 1 | 2 | 2 | 1 |
| A7 | −0.1303 | −0.0866 | 0.2569 | −0.0213 | 18 | 18 | 18 | 18 | 18 |
| A8 | −0.1072 | −0.0596 | 0.3588 | −0.0026 | 17 | 17 | 17 | 17 | 17 |
| A9 | −0.0747 | −0.0296 | 0.4056 | 0.0077 | 14 | 15 | 16 | 15 | 15 |
| A10 | 0.1022 | 0.1271 | 0.6460 | 0.0628 | 5 | 4 | 4 | 4 | 4 |
| A11 | 0.1632 | 0.1644 | 0.7163 | 0.0800 | 2 | 2 | 3 | 3 | 3 |
| A12 | −0.1825 | −0.1296 | 0.1900 | −0.0374 | 19 | 19 | 19 | 19 | 19 |
| A13 | 0.1308 | 0.1358 | 0.8946 | 0.1119 | 3 | 3 | 1 | 1 | 2 |
| A14 | 0.0419 | 0.0733 | 0.5705 | 0.0455 | 10 | 9 | 9 | 9 | 9 |
| A15 | 0.0669 | 0.1057 | 0.6329 | 0.0594 | 6 | 6 | 5 | 5 | 6 |
| A16 | −0.0772 | −0.0244 | 0.4729 | 0.0192 | 15 | 13 | 12 | 13 | 13 |
| A17 | 0.0634 | 0.0954 | 0.6057 | 0.0532 | 7 | 7 | 7 | 7 | 7 |
| A18 | 0.0312 | 0.0717 | 0.5515 | 0.0436 | 11 | 10 | 10 | 10 | 10 |
| A19 | −0.0275 | −0.0275 | 0.4271 | 0.0105 | 12 | 14 | 14 | 14 | 14 |
| A20 | −0.0914 | −0.0467 | 0.4136 | 0.0063 | 16 | 16 | 15 | 16 | 16 |
Fig. 1. Comparison of rankings provided by MCDA methods using entropy weighting method.
The results confirm the high convergence of the rankings provided by PROMETHEE II and MABAC and the high similarity of the rankings given by EDAS and MOORA. Furthermore, the high values of the WS coefficient, which is particularly sensitive to reversals at the top of the rankings, confirm the high convergence of the leading positions between the PROMETHEE II and MABAC rankings, and likewise between the EDAS and MOORA rankings.
Fig. 2. Correlations of rankings provided by MCDA methods using entropy weighting method determined by rw and WS coefficients.
As the rankings obtained were not identical, a compromise ranking that considers the rankings provided by all methods was generated using the Dominance-Directed Graph method. The advantage of the compromise ranking technique is that it exploits the information provided by several MCDA methods, which makes the final ranking more reliable. The leader of the compromise ranking is alternative A6. This alternative has very advantageous performance values for criteria C6, C7, and C8, while its price (C11) is relatively low compared to the other alternatives in the evaluated set. Alternative A13, which has an outstandingly favorable performance value for criterion C10 and a price even lower than that of A6, was ranked second in the compromise ranking. Alternative A11, which was ranked third in the compromise ranking, has favorable values for criteria C6, C7, and C8, but its price (C11) is higher than that of A6.
4.2 MCDA Results of Evaluation Using the Gini Coefficient-Based Weighting Method
In the next stage of the study, all procedures performed for the weights determined by the entropy weighting method were repeated for the weights generated by the Gini coefficient-based weighting method. The results are included in Table 5, and the rankings are displayed in Fig. 3. With the Gini coefficient-based weights, alternative A6 received the same ranks as with the entropy weights and was again the leader of the compromise ranking. In contrast to the results obtained for entropy weighting, second place was taken by alternative A11. The ranks of A11 provided by the four MCDA methods employed in this research are the same as for the entropy weighting method.
Table 5. Preference values and rankings provided by MCDA methods using Gini coefficient-based weighting method.
| Ai | PROMETHEE II (pref.) | MABAC (pref.) | EDAS (pref.) | MOORA (pref.) | PROMETHEE II (rank) | MABAC (rank) | EDAS (rank) | MOORA (rank) | Compromise (rank) |
|---|---|---|---|---|---|---|---|---|---|
| A1 | −0.0584 | 0.0122 | 0.4857 | 0.0197 | 13 | 12 | 12 | 12 | 12 |
| A2 | −0.2220 | −0.1812 | 0.0691 | −0.0589 | 20 | 20 | 20 | 20 | 19 |
| A3 | 0.0545 | 0.0872 | 0.6176 | 0.0433 | 8 | 6 | 8 | 8 | 7 |
| A4 | 0.1303 | 0.1194 | 0.7023 | 0.0563 | 3 | 3 | 4 | 4 | 3 |
| A5 | 0.0593 | 0.0668 | 0.5648 | 0.0341 | 6 | 10 | 11 | 11 | 10 |
| A6 | 0.1888 | 0.1725 | 0.7946 | 0.0743 | 1 | 1 | 2 | 2 | 1 |
| A7 | −0.0853 | −0.0473 | 0.3386 | −0.0070 | 16 | 15 | 18 | 17 | 16 |
| A8 | −0.1177 | −0.0625 | 0.3423 | −0.0071 | 17 | 18 | 17 | 18 | 18 |
| A9 | −0.0822 | −0.0349 | 0.3949 | 0.0012 | 14 | 14 | 15 | 14 | 14 |
| A10 | 0.0851 | 0.1127 | 0.6701 | 0.0530 | 4 | 4 | 5 | 5 | 5 |
| A11 | 0.1572 | 0.1553 | 0.7806 | 0.0713 | 2 | 2 | 3 | 3 | 2 |
| A12 | −0.0849 | −0.0604 | 0.2854 | −0.0170 | 15 | 17 | 19 | 19 | 18 |
| A13 | 0.0698 | 0.0850 | 0.8608 | 0.0785 | 5 | 8 | 1 | 1 | 4 |
| A14 | 0.0176 | 0.0581 | 0.5782 | 0.0360 | 11 | 11 | 10 | 10 | 11 |
| A15 | 0.0586 | 0.0966 | 0.6515 | 0.0496 | 7 | 5 | 6 | 6 | 6 |
| A16 | −0.1271 | −0.0600 | 0.4202 | 0.0011 | 19 | 16 | 13 | 15 | 15 |
| A17 | 0.0523 | 0.0842 | 0.6204 | 0.0436 | 10 | 9 | 7 | 7 | 8 |
| A18 | 0.0526 | 0.0859 | 0.6007 | 0.0428 | 9 | 7 | 9 | 9 | 9 |
| A19 | −0.0251 | −0.0336 | 0.4181 | 0.0028 | 12 | 13 | 14 | 13 | 13 |
| A20 | −0.1236 | −0.0714 | 0.3720 | −0.0070 | 18 | 19 | 16 | 16 | 17 |
Fig. 3. Comparison of rankings provided by MCDA methods using Gini coefficient-based weighting method.
However, for the Gini coefficient-based weighting method, alternative A13 did not achieve third place in the PROMETHEE II and MABAC rankings as it did for the entropy weighting method, but only fifth place in the PROMETHEE II ranking and eighth place in the MABAC ranking. In the EDAS and MOORA rankings, A13 was the leader, as it was for the entropy weighting method. However, the lower positions of A13 in the PROMETHEE II and MABAC rankings caused this alternative to rank only fourth in the compromise ranking for the Gini coefficient-based weighting method. Third place in the compromise ranking created for the Gini coefficient-based weighting method was taken by alternative A4 (GOPRO HERO9), which was third in the PROMETHEE II and MABAC rankings and fourth in the EDAS and MOORA rankings. A4 ranked fifth in the compromise ranking generated for the entropy weighting method. Figure 4 presents the correlation values between the individual rankings generated using the weights determined with the Gini coefficient-based weighting method.
Fig. 4. Correlations of rankings provided by MCDA methods using Gini coefficient-based weighting method determined by rw and WS coefficients.
Compared to the results obtained for the weights determined by the entropy weighting method, the divergences between the compared rankings are slightly larger. In particular, PROMETHEE II and MABAC gave rankings that differ from those provided by EDAS and MOORA. It is worth noting that alternative A6 was identified as the leader of the compromise ranking for both weighting methods used in this study. Although this camera model did not have the most favorable value of the most relevant criterion, namely price, the high values of the other parameters that have to be maximized indicated this alternative as the most favorable model. This implies that even a distinctly advantageous value of a single relevant criterion cannot guarantee the best ranking, as the example of alternative A20 (TRACER eXplore SJ 4561) shows. This model did not reach the top of the rankings despite having the lowest price, because of the unattractive values of its other criteria.
Table 6. Correlations of compromise rankings with rankings provided by individual MCDA methods for different weighting methods.
| Weighting method | Correlation coefficient | PROMETHEE II | MABAC | EDAS | MOORA |
|---|---|---|---|---|---|
| Entropy | rw | 0.9849 | 0.9974 | 0.9930 | 0.9950 |
| Entropy | WS | 0.9722 | 0.9788 | 0.9566 | 0.9566 |
| Gini coefficient-based | rw | 0.9680 | 0.9729 | 0.9757 | 0.9797 |
| Gini coefficient-based | WS | 0.9884 | 0.9902 | 0.8948 | 0.8948 |
In the last stage of this research, the correlation coefficients rw and WS were used to check which MCDA method provides results most strongly correlated with the compromise ranking. The results are displayed in Table 6. The rankings provided by the MABAC method were the most consistent with the compromise rankings for both weighting techniques according to the WS coefficient and, for the entropy weights, also according to rw; for the Gini coefficient-based weights, the highest rw value (0.9797) was obtained for MOORA.
5 Conclusions
This paper aimed to investigate the effect of selected MCDA methods and objective weighting techniques on the objectivity and reliability of the resulting rankings. The case study was a camera selection problem. The results confirm that several conditions must be met to obtain appropriate assessment results using MCDA methods. First, the methods must be selected so that they suit the problem being solved. Second, benchmarking against other methods is required to allow comparative analysis. Third, it is recommended to select criteria weighting methods that determine criteria importance objectively, based on the data in the decision matrix containing the alternatives' performance values for the considered criteria. The research shows that the highest convergence was found between the PROMETHEE II and MABAC rankings and between the EDAS and MOORA rankings. The compromise ranking strategy allowed the advantages and contributions of all methods to be exploited by integrating their results into a single reliable and transparent ranking. The results encourage continuing the research with other MCDA methods, weighting techniques, and compromise ranking strategies to extend the set of methods that enable the objectivization of evaluations in the undertaken problem.
Acknowledgements. The work was supported by the project financed within the framework of the program of the Minister of Science and Higher Education under the name "Regional Excellence Initiative" in the years 2019–2022, Project Number 001/RID/2018/19; the amount of financing: PLN 10.684.000,00 (A.B. and J.W.) and by the National Science Centre, Decision number UMO-2018/29/B/HS4/02725 (B.K. and W.S.).
References
1. Faizi, S., Sałabun, W., Nawaz, S., ur Rehman, A., Wątróbski, J.: Best-Worst method and Hamacher aggregation operations for intuitionistic 2-tuple linguistic sets. Expert Syst. Appl. 181, 115088 (2021). https://doi.org/10.1016/j.eswa.2021.115088
2. Pamučar, D., Behzad, M., Božanić, D., Behzad, M.: Decision making to support sustainable energy policies corresponding to agriculture sector: case study in Iran's Caspian Sea coastline. J. Clean. Prod. 292, 125302 (2021). https://doi.org/10.1016/j.jclepro.2020.125302
3. Ziemba, E.: Synthetic indexes for a sustainable information society: measuring ICT adoption and sustainability in Polish government units. In: Ziemba, E. (ed.) AITM/ISM -2018. LNBIP, vol. 346, pp. 214–234. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15154-6_12
4. Ziemba, E.: The contribution of ICT adoption to the sustainable information society. J. Comput. Inf. Syst. 59(2), 116–126 (2019). https://doi.org/10.1080/08874417.2017.1312635
5. Ziemba, E.: The contribution of ICT adoption to sustainability: households' perspective. Inf. Technol. People (2019). https://doi.org/10.1108/ITP-02-2018-0090
6. Puška, A., Stević, Ž., Pamučar, D.: Evaluation and selection of healthcare waste incinerators using extended sustainability criteria and multi-criteria analysis methods. Environ. Dev. Sustain. 1–31 (2021). https://doi.org/10.1007/s10668-021-01902-2
7. Wątróbski, J., Jankowski, J., Piotrowski, Z.: The selection of multicriteria method based on unstructured decision problem description. In: Hwang, D., Jung, J.J., Nguyen, N.-T. (eds.) ICCCI 2014. LNCS (LNAI), vol. 8733, pp. 454–465. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11289-3_46
8. Guitouni, A., Martel, J.M.: Tentative guidelines to help choosing an appropriate MCDA method. Eur. J. Oper. Res. 109(2), 501–521 (1998). https://doi.org/10.1016/S0377-2217(98)00073-3
9. Sałabun, W., Wątróbski, J., Shekhovtsov, A.: Are MCDA methods benchmarkable? A comparative study of TOPSIS, VIKOR, COPRAS, and PROMETHEE II methods. Symmetry 12(9), 1549 (2020). https://doi.org/10.3390/sym12091549
10. Lombardi Netto, A., Salomon, V.A.P., Ortiz Barrios, M.A.: Multi-criteria analysis of green bonds: hybrid multi-method applications. Sustainability 13(19), 10512 (2021). https://doi.org/10.3390/su131910512
11. Tuş, A., Aytaç Adalı, E.: The new combination with CRITIC and WASPAS methods for the time and attendance software selection problem. Opsearch 56(2), 528–538 (2019). https://doi.org/10.1007/s12597-019-00371-6
12. Wątróbski, J., Jankowski, J.: Guideline for MCDA method selection in production management area. In: Różewski, P., Novikov, D., Bakhtadze, N., Zaikin, O. (eds.) New Frontiers in Information and Production Systems Modelling and Analysis. ISRL, vol. 98, pp. 119–138. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-23338-3_6
13. Wątróbski, J., Jankowski, J., Ziemba, P., Karczmarczyk, A., Zioło, M.: Generalised framework for multi-criteria method selection. Omega 86, 107–124 (2019). https://doi.org/10.1016/j.omega.2018.07.004
14. Kumar, G., Parimala, N.: A weighted sum method MCDM approach for recommending product using sentiment analysis. Int. J. Bus. Inf. Syst. 35(2), 185–203 (2020). https://doi.org/10.1504/IJBIS.2020.110172
15. Marsh, K., Thokala, P., Mühlbacher, A., Lanitis, T.: Incorporating preferences and priorities into MCDA: selecting an appropriate scoring and weighting technique. In: Marsh, K., Goetghebeur, M., Thokala, P., Baltussen, R. (eds.) Multi-Criteria Decision Analysis to Support Healthcare Decisions, pp. 47–66. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-47540-0_4
16. Cinelli, M., Kadziński, M., Gonzalez, M., Słowiński, R.: How to support the application of multiple criteria decision analysis? Let us start with a comprehensive taxonomy. Omega 96, 102261 (2020). https://doi.org/10.1016/j.omega.2020.102261
17. Chmielarz, W., Zborowski, M.: Scoring method versus TOPSIS method in the evaluation of E-banking services. In: 2018 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 683–689. IEEE (2018). https://doi.org/10.15439/2018F115
18. Wątróbski, J., Ziemba, E., Karczmarczyk, A., Jankowski, J.: An index to measure the sustainable information society: the Polish households case. Sustainability 10(9), 3223 (2018). https://doi.org/10.3390/su10093223
19. Karczmarczyk, A., Wątróbski, J., Jankowski, J., Ziemba, E.: Comparative study of ICT and SIS measurement in Polish households using a MCDA-based approach. Procedia Comput. Sci. 159, 2616–2628 (2019). https://doi.org/10.1016/j.procs.2019.09.254
20. Chmielarz, W., Zborowski, M.: The selection and comparison of the methods used to evaluate the quality of e-banking websites: the perspective of individual clients. Procedia Comput. Sci. 176, 1903–1922 (2020). https://doi.org/10.1016/j.procs.2020.09.230
21. Brans, J.P., Vincke, P., Mareschal, B.: How to select and how to rank projects: the PROMETHEE method. Eur. J. Oper. Res. 24(2), 228–238 (1986). https://doi.org/10.1016/0377-2217(86)90044-5
22. Bagherikahvarin, M., De Smet, Y.: A ranking method based on DEA and PROMETHEE II (a rank based on DEA & PR. II). Measurement 89, 333–342 (2016). https://doi.org/10.1016/j.measurement.2016.04.026
23. Andreopoulou, Z., Koliouska, C., Galariotis, E., Zopounidis, C.: Renewable energy sources: using PROMETHEE II for ranking websites to support market opportunities. Technol. Forecast. Soc. Change 131, 31–37 (2018). https://doi.org/10.1016/j.techfore.2017.06.007
24. Polatidis, H., Haralambidou, K., Haralambopoulos, D.: Multi-criteria decision analysis for geothermal energy: a comparison between the ELECTRE III and the PROMETHEE II methods. Energy Sources Part B 10(3), 241–249 (2015). https://doi.org/10.1080/15567249.2011.565297
25. Sotiropoulou, K.F., Vavatsikos, A.P.: Onshore wind farms GIS-Assisted suitability analysis using PROMETHEE II. Energy Policy 158, 112531 (2021). https://doi.org/10.1016/j.enpol.2021.112531
26. Bączkiewicz, A., Kizielewicz, B.: Towards sustainable energy consumption evaluation in Europe for industrial sector based on MCDA methods. Procedia Comput. Sci. 192, 1334–1346 (2021). https://doi.org/10.1016/j.procs.2021.08.137
27. Chmielarz, W., Zborowski, M.: On the assessment of e-banking websites supporting sustainable development goals. Energies 15(1), 378 (2022). https://doi.org/10.3390/en15010378
28. Chmielarz, W., Zborowski, M.: Towards sustainability in E-banking website assessment methods. Sustainability 12(17), 7000 (2020). https://doi.org/10.3390/su12177000
29. Chmielarz, W., Zborowski, M.: A hybrid method of assessing individual electronic banking services in 2019. The case of Poland. Procedia Comput. Sci. 176, 3881–3889 (2020). https://doi.org/10.1016/j.procs.2020.10.093
30. Pamučar, D., Ćirović, G.: The selection of transport and handling resources in logistics centers using Multi-Attributive Border Approximation area Comparison (MABAC). Expert Syst. Appl. 42(6), 3016–3028 (2015). https://doi.org/10.1016/j.eswa.2014.11.057
31. Gigović, L., Pamučar, D., Božanić, D., Ljubojević, S.: Application of the GIS-DANP-MABAC multi-criteria model for selecting the location of wind farms: a case study of Vojvodina, Serbia. Renew. Energy 103, 501–521 (2017). https://doi.org/10.1016/j.renene.2016.11.057
32. Shahiri Tabarestani, E., Afzalimehr, H.: A comparative assessment of multi-criteria decision analysis for flood susceptibility modelling. Geocarto Int. 1–24 (2021). https://doi.org/10.1080/10106049.2021.1923834
33. Bączkiewicz, A., Kizielewicz, B., Shekhovtsov, A., Wątróbski, J., Sałabun, W.: Methodical aspects of MCDM based E-commerce recommender system. J. Theor. Appl. Electron. Commer. Res. 16(6), 2192–2229 (2021). https://doi.org/10.3390/jtaer16060122
34. Chmielarz, W., Zborowski, M.: Analysis of e-banking websites' quality with the application of the TOPSIS method-a practical study. Procedia Comput. Sci. 126, 1964–1976 (2018). https://doi.org/10.1016/j.procs.2018.07.256
35. Dhanalakshmi, C.S., Madhu, P., Karthick, A., Mathew, M., Kumar, R.V.: A comprehensive MCDM-based approach using TOPSIS and EDAS as an auxiliary tool for pyrolysis material selection and its application. Biomass Conv. Bioref. 1–16 (2020). https://doi.org/10.1007/s13399-020-01009-0
36. Keshavarz Ghorabaee, M., Zavadskas, E.K., Olfat, L., Turskis, Z.: Multi-criteria inventory classification using a new method of evaluation based on distance from average solution (EDAS). Informatica 26(3), 435–451 (2015). https://doi.org/10.15388/Informatica.2015.57
37. Tadić, S., Krstić, M., Brnjac, N.: Selection of efficient types of inland intermodal terminals. J. Transp. Geogr. 78, 170–180 (2019). https://doi.org/10.1016/j.jtrangeo.2019.06.004
38. Skvarciany, V., Jurevičienė, D., Volskytė, G.: Assessment of sustainable socioeconomic development in European Union countries. Sustainability 12(5), 1986 (2020). https://doi.org/10.3390/su12051986
39. Krishankumar, R., Pamucar, D., Deveci, M., Ravichandran, K.S.: Prioritization of zero-carbon measures for sustainable urban mobility using integrated double hierarchy decision framework and EDAS approach. Sci. Total Environ. 797, 149068 (2021). https://doi.org/10.1016/j.scitotenv.2021.149068
40. Aggarwal, A., Choudhary, C., Mehrotra, D.: Evaluation of smartphones in Indian market using EDAS. Procedia Comput. Sci. 132, 236–243 (2018). https://doi.org/10.1016/j.procs.2018.05.193
41. Karande, P., Zavadskas, E., Chakraborty, S.: A study on the ranking performance of some MCDM methods for industrial robot selection problems. Int. J. Ind. Eng. Comput. 7(3), 399–422 (2016). https://doi.org/10.5267/j.ijiec.2016.1.001
42. Brauers, W.K., Zavadskas, E.K.: The MOORA method and its application to privatization in a transition economy. Control. Cybern. 35(2), 445–469 (2006)
43. Indrajayanthan, V., Mohanty, N.K.: Assessment of clean energy transition potential in major power-producing states of India using multi-criteria decision analysis. Sustainability 14(3), 1166 (2022). https://doi.org/10.3390/su14031166
44. Brauers, W.K.M., Zavadskas, E.K., Peldschus, F., Turskis, Z.: Multi-objective optimization of road design alternatives with an application of the MOORA method (2008)
45. Kizielewicz, B., Bączkiewicz, A., Shekhovtsov, A., Wątróbski, J., Sałabun, W.: Towards the RES development: multi-criteria assessment of energy storage devices. In: 2021 International Conference on Decision Aid Sciences and Application (DASA), pp. 766–771. IEEE (2021). https://doi.org/10.1109/DASA53625.2021.9682220
46. Shekhovtsov, A., Więckowski, J., Kizielewicz, B., Sałabun, W.: Towards Reliable Decision-Making in the green urban transport domain. Facta Universitatis Ser. Mech. Eng. (2021). https://doi.org/10.22190/FUME210315056S
47. Altuntas, S., Dereli, T., Yilmaz, M.K.: Evaluation of excavator technologies: application of data fusion based MULTIMOORA methods. J. Civ. Eng. Manag. 21(8), 977–997 (2015). https://doi.org/10.3846/13923730.2015.1064468
48. Wu, W.W.: Beyond Travel & Tourism competitiveness ranking using DEA, GST, ANN and Borda count. Expert Syst. Appl. 38(10), 12974–12982 (2011). https://doi.org/10.1016/j.eswa.2011.04.096
49. Hafezalkotob, A., Hafezalkotob, A., Liao, H., Herrera, F.: An overview of MULTIMOORA for multi-criteria decision-making: theory, developments, applications, and challenges. Inf. Fusion 51, 145–177 (2019). https://doi.org/10.1016/j.inffus.2018.12.002
50. Ecer, F.: A consolidated MCDM framework for performance assessment of battery electric vehicles based on ranking strategies. Renew. Sustain. Energy Rev. 143, 110916 (2021). https://doi.org/10.1016/j.rser.2021.110916
51. Karabasevic, D., Stanujkic, D., Urosevic, S., Maksimovic, M.: Selection of candidates in the mining industry based on the application of the SWARA and the MULTIMOORA methods. Acta Montanist. Slovaca 20(2), 116–124 (2015)
52. Ziemba, P.: Towards strong sustainability management-a generalized PROSA method. Sustainability 11(6), 1555 (2019). https://doi.org/10.3390/su11061555
53. Ziemba, P., Wątróbski, J., Zioło, M., Karczmarczyk, A.: Using the PROSA method in offshore wind farm location problems. Energies 10(11), 1755 (2017). https://doi.org/10.3390/en10111755
54. Papathanasiou, J., Ploskas, N., et al.: Multiple Criteria Decision Aid. Methods, Examples and Python Implementations, vol. 136. Springer, Cham (2018)
55. Ziemba, P.: Multi-criteria stochastic selection of electric vehicles for the sustainable development of local government and state administration units in Poland. Energies 13(23), 6299 (2020). https://doi.org/10.3390/en13236299
56. Lotfi, F.H., Fallahnejad, R.: Imprecise Shannon's entropy and multi attribute decision making. Entropy 12(1), 53–62 (2010). https://doi.org/10.3390/e12010053
57. Lai, H., Liao, H., Šaparauskas, J., Banaitis, A., Ferreira, F.A., Al-Barakati, A.: Sustainable cloud service provider development by a Z-number-based DNMA method with Gini-coefficient-based weight determination. Sustainability 12(8), 3410 (2020). https://doi.org/10.3390/su12083410
Towards a Web-Based Platform Supporting the Recomposition of Business Processes
Piotr Wiśniewski¹(B), Agata Bujak¹, Krzysztof Kluza¹, Anna Suchenia², Mateusz Zaremba¹, Paweł Jemioło¹, and Antoni Ligęza¹
¹ AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Krakow, Poland [email protected]
² Cracow University of Technology, ul. Warszawska 24, 31-155 Kraków, Poland
Abstract. Modeling is an integral part of business process management. However, it is often repetitive work, and methods for improving it are therefore being developed. In this paper, we aim to facilitate the modeling phase by reusing components from existing repositories and assembling them into new models in a manual or automated way. As a first step towards assessing the implementation aspects of this method, we propose a web-based system that supports the recomposition of business processes. The aim of our research is to obtain an application that is clear and easy to use, and at the same time functional for advanced users. The user interface was designed with the Double Diamond method, which included a benchmark and personas. The application was tested in two iterations on groups of four people. The test results show that the final system is intuitive and also suitable for experienced business users.
Keywords: Business process management · Process models · BPMN · Process decomposition · Process composition
1 Introduction
The business process approach is a common way to manage organizations, especially in the technical and financial industries [1–4]. The number of notations for visualizing various aspects of processes and software products is growing [5]. There are plenty of tools supporting knowledge management processes [6]. Industrial applications show that manually created models often have quality defects [7]. For this reason, computer-assisted modeling is a valuable technique for eliminating basic errors at the implementation stage. The goal of our research was to offer a method and tool for computer-aided recomposition of process models, defined as the use of existing processes to create new ones. This technique entails breaking down old BPMN diagrams into smaller pieces from which new models can be built. This type of model generation might also be done manually by business analysts.
© Springer Nature Switzerland AG 2022 E. Ziemba and W. Chmielarz (Eds.): FedCSIS-AIST 2021/ISM 2021, LNBIP 442, pp. 166–185, 2022. https://doi.org/10.1007/978-3-030-98997-2_9
In this paper, we provide an overview of the research aimed at developing a web-based platform that supports the recomposition of business process models, a method presented in our short conference paper [8]. We focus on the architecture and the interface design of our system, which uses the recomposition method and may speed up the modeling and prototyping of business processes. The goal was to support business users with any level of experience with modeling software. To achieve this, it was necessary to review the existing technologies and the subject matter of the system to be built. As there are no applications using the recomposition method yet, it was crucial to design the user interface of the system properly. In the following section, we provide an overview of the recomposition method and the principles of interface design, including the Double Diamond method that guides the design process. Then, in Sect. 3, the application development is described, starting from defining the target group using personas and benchmarking, through defining the problems and project goals, to prototyping the application. The tests of the prototypes, carried out in two iterations on a predefined group of respondents, are presented in Sect. 4. Finally, we discuss the related works and our results in Sect. 5, and conclude the paper in Sect. 6.
2 Theoretical Background
Process modeling allows for graphical representation of tasks and events occurring in the process. The notation for modeling business processes is an important aspect of their representation. The most commonly used notation is BPMN (Business Process Model and Notation) [9].

Running Example. Two example BPMN processes are used to illustrate the approach presented in this paper. They represent purchasing furniture in an online shop (Fig. 1) and in a physical store (Fig. 2).
Fig. 1. Furniture buying process – online shop.
P. Wiśniewski et al.
Fig. 2. Furniture buying process – physical shop.
2.1 Business Process Recomposition
The concept of business process recomposition consists in building graphs from the sub-diagrams available in the process repository, which are created by decomposing previously used models into smaller elements [8].

Decomposition of Process Models into Sub-diagrams. A process model can be broken down into sub-diagrams of size k, where 2 ≤ k ≤ n, k denotes the number of elements in each sub-diagram, and n denotes the number of elements in the source process model. Once the number k is determined, the next step is to build a neighborhood matrix for each subgraph. To do this, a neighborhood matrix is created for the entire process, and then the columns and rows that do not apply to the subgraph are removed. Figures 3 and 4 present the results of decomposition performed on the example models from Figs. 1 and 2, respectively.
Fig. 3. Process 1 after decomposition into 3-element diagrams.
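The row-and-column removal used in the decomposition step can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions rather than the implementation used in the platform: the process is given as a 0/1 neighborhood (adjacency) matrix, a candidate sub-diagram is any set of k elements that stays connected when flow direction is ignored, and the helper names submatrix, is_connected and decompose are hypothetical.

```python
from itertools import combinations


def submatrix(adj, keep):
    """Drop every row and column of adj whose index is not in keep."""
    keep = sorted(keep)
    return [[adj[i][j] for j in keep] for i in keep]


def is_connected(adj, nodes):
    """Check that the chosen nodes form one piece, ignoring flow direction."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if adj[u][v] or adj[v][u]:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def decompose(adj, k):
    """Return (element indices, neighborhood matrix) for each k-element fragment."""
    n = len(adj)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    return [
        (subset, submatrix(adj, subset))
        for subset in combinations(range(n), k)
        if is_connected(adj, subset)
    ]


# Hypothetical four-element chain: start event -> task -> task -> end event.
chain = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
for elements, matrix in decompose(chain, 3):
    print(elements, matrix)
```

For the four-element chain and k = 3, the sketch keeps only the two fragments made of consecutive elements; subsets that skip an element are rejected because they are not connected. Enumerating all subsets is exponential in n, so this form is only suitable for small models.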
After breaking down a process model into sub-diagrams, the resulting components are stored in the component repository. There, the components are categorized according to their features. The main tasks of the repository are to facilitate component discovery and to avoid anomalies that cause errors in the process.
Fig. 4. Process 2 after decomposition into 4-element diagrams.
The categorization is based on the similarities between the diagrams and the potential number of inputs and outputs. Objects are divided into five basic groups:

1. Sources,
2. Sinks,
3. SESE (single entry, single exit),
4. SEME (single entry, multiple exits),
5. MEME (multiple entries, multiple exits).
Table 1 helps to determine the group of a component, based on the functions σ0 and σ1. Their values depend on the number of inputs for σ0 and outputs for σ1 and take values:

– 0 no entry/exit,
– 1 single entry/exit,
– 2 other cases.
Table 1. Values of the σ functions. Based on [8].

σ1 \ σ0    0            1         2
0          Subprocess   Output    Output
1          Source       SESE      SEME
2          Source       SEME      MEME
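The lookup in Table 1 can be expressed as a small function. The sketch below is illustrative only: the names sigma, TABLE_1 and classify are hypothetical, the cell values are copied from Table 1 as printed, and we assume that rows are indexed by σ1 (outputs) and columns by σ0 (inputs), which is the orientation suggested by the Source and Output cells.

```python
def sigma(count):
    """Map a number of entries or exits to 0 (none), 1 (single) or 2 (other cases)."""
    if count == 0:
        return 0
    if count == 1:
        return 1
    return 2


# Rows: sigma_1 (outputs); columns: sigma_0 (inputs). Assumed orientation of Table 1.
TABLE_1 = {
    0: {0: "Subprocess", 1: "Output", 2: "Output"},
    1: {0: "Source", 1: "SESE", 2: "SEME"},
    2: {0: "Source", 1: "SEME", 2: "MEME"},
}


def classify(n_inputs, n_outputs):
    """Return the group of a component from its numbers of inputs and outputs."""
    return TABLE_1[sigma(n_outputs)][sigma(n_inputs)]


print(classify(0, 1))  # component with no input and one output
print(classify(1, 1))  # component with one input and one output
```

Keeping the table as data rather than as nested conditions makes it easy to compare the code against the printed table cell by cell.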
Synthesis of process models based on the sub-diagrams stored in a repository can be done in two ways. In the first one, the user manually builds a process model in a graphical editor. The second approach is automated: the user selects the tasks that should be performed in the process, and the system then constructs a set of syntactically correct models using constraint programming. In the formulated Constraint Satisfaction Problem, the input consists of the sub-diagrams that contain the selected tasks along with neighboring flow objects such as gateways and events. Generated models must satisfy a set of predefined constraints such as:
1. The process must start with only one start event.
2. There has to be at least one end event in the process.
3. All the inputs and outputs of the flow objects must be connected.

Then, from the generated results, a user can choose the most suitable one to be finally saved and implemented. Figure 5 presents a recomposed process of buying furniture.
Fig. 5. Result of recomposition of processes 1 and 2.
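The three constraints listed for generated models can be illustrated with a simple checker for a single candidate model. This is not the constraint programming formulation itself, which searches for models, but a hedged sketch of what a valid result has to satisfy. The data layout is our assumption: each element declares how many inputs and outputs it expects, and check_model is a hypothetical helper name.

```python
from collections import Counter


def check_model(nodes, flows):
    """Return a list of violated constraints (an empty list means the model passes).

    nodes: dict mapping element id -> (kind, expected_inputs, expected_outputs)
    flows: list of (source_id, target_id) sequence flows
    """
    problems = []
    kinds = Counter(kind for kind, _, _ in nodes.values())
    if kinds["start"] != 1:
        problems.append("exactly one start event required")
    if kinds["end"] < 1:
        problems.append("at least one end event required")
    out_degree = Counter(source for source, _ in flows)
    in_degree = Counter(target for _, target in flows)
    for node_id, (_, n_in, n_out) in nodes.items():
        # A mismatch means that some declared input or output is left dangling.
        if in_degree[node_id] != n_in or out_degree[node_id] != n_out:
            problems.append(f"unconnected input or output at {node_id}")
    return problems


# Hypothetical candidate: start -> choose -> pay -> end.
candidate = {
    "start": ("start", 0, 1),
    "choose": ("task", 1, 1),
    "pay": ("task", 1, 1),
    "end": ("end", 1, 0),
}
flows = [("start", "choose"), ("choose", "pay"), ("pay", "end")]
print(check_model(candidate, flows))  # prints []
```

A solver-based generator would enforce the same conditions during the search instead of filtering afterwards; the checker form is shown only because it makes each constraint explicit.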
2.2 Application Interface Design
Usability Heuristics for User Interface Design. A user interface is the way a device presents information. The elementary principles for designing and checking the usability of interfaces are the so-called Nielsen heuristics, a set of ten principles for human-machine interaction, created on the basis of several years of usability research by Jakob Nielsen and Rolf Molich [10]. These ten general principles for interaction design were used in the design of our application interface.

Double Diamond Method. The Double Diamond method, created and disseminated by the British Design Council in 2005, was used in the design of our application. The approach consists of four steps, arranged in two diamonds that define the most important points of the design:

1. Discover – getting to know the user and understanding their expectations and preferences resulting from the solutions used so far.
2. Explore and Define – determining user requirements based on the information gathered earlier. When working with a larger team, this is the time to also learn about the technical capabilities and limitations of the project.
3. Develop – making a prototype based on the collected materials.
4. Deliver and Listen – testing the prototype and gathering feedback.

The relationship between the steps grouped in each diamond is also important, as earlier steps must be revisited to check whether the results match the previous assumptions. Accordingly, after completing the second step, Explore and Define, we compare the obtained conclusions with the information gathered during the user needs survey and, if the insights we obtained are unsatisfactory, perform the survey again. A similar mechanism applies to the second diamond: after the Deliver and Listen step and gathering feedback from the tests conducted, we return to improve the mockups using the users’ comments.

The British Design Council also presents additional design principles that should be taken into account alongside the Double Diamond method:

– Put people first. The design process should focus on understanding the users, their needs, and their habits based on the services used so far.
– Use visual communication that everyone can understand. Help the user gain a shared understanding of the problem and idea.
– Collaborate and co-create. Gain inspiration by working with others.
– Repeat the process. This helps to avoid mistakes and be confident in your solution.

2.3 User Interface Testing
Testing is a crucial part of the design process, allowing designers to check the product and fix bugs before launch. In addition, by exploring user sentiment early on, we can prevent financial loss due to rejection of the technology and check how well we understand the target audience. We can distinguish several aims of usability studies [11]:

– Usability of the product, i.e. how intuitive users find it to understand and use,
– The emotions the user experiences when using the product,
– How attractive the solutions used are, whether they are preferred by the user and whether they meet the user’s aesthetic requirements.

Because the other two aspects are difficult to measure, designers most often deal with usability testing. Apart from being easy to identify and repair, usability problems directly affect the user’s evaluation of the product.

Usability studies are divided into two groups, which differ in the manner and purpose of the study. The first one, formative, is used to determine the advantages and disadvantages of the interface. Such research is carried out qualitatively, and its result is a list of problems encountered by the user. The second group, summative, is based on the quantitative method and focuses on determining the interface’s usability through its overall evaluation. It works well when there are several solutions to choose from, and one wants to select the one that suits the users best.

Test Sample. According to Nielsen’s analysis of the share of usability problems found as a function of the number of test users [12], a group of five is sufficient for usability testing, as it detects about 85% of the interface deficiencies that occur.
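The 85% figure can be reproduced from the model used in [12], where the share of problems found with n test users is 1 − (1 − L)^n and L is the proportion of problems a single user reveals, reported there as about 31% on average. The sketch below simply evaluates this formula; the function name share_found is hypothetical, and the value of L is an average that may differ between projects.

```python
def share_found(n_users, single_user_rate=0.31):
    """Expected share of usability problems found by n_users test users."""
    return 1 - (1 - single_user_rate) ** n_users


for n in (1, 3, 4, 5, 10):
    print(n, round(share_found(n), 3))
```

With L = 0.31, five users give roughly 0.84, which matches the "about 85%" quoted above, while each additional user contributes less than the previous one.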
Some researchers remain sceptical of this approach, including Laura Faulkner. She carried out usability tests on a group of sixty people and checked how the results would look for different sized study groups with a random selection of participants [13]. It showed that the average number of errors found for the five subjects coincides with Nielsen’s results, but there is a significant deviation between the groups. The weakest detected only 55% of the errors, while the most effective detected 95%. Increasing the number of testers to ten had the effect of raising the lowest score to 85% error detection efficiency. It is therefore recommended that the number of test users should be between five and ten. Another issue is the number of test iterations. Steve Krug [14] suggests that instead of testing on a larger group of test subjects, a better practice is to reduce the group and run another test after fixing the detected bugs first. During the second test, users can notice smaller bugs that they did not pay attention to before, as they would be focused on the flaws that interfere with their understanding of the site to a greater extent (Fig. 6).
Fig. 6. Relationship between number of iterations and effectiveness of testing based on S. Krug [14]
In our project, it was decided that the interface testing would be carried out in two iterations on groups of four people. There is little benefit in testing additional users [15]. On the other hand, having three regular users ensures that the diversity of behavior within the group is covered [12], and one expert provides additional professional insight.
Task-Based Usability Testing. Task-based usability testing is one of the most popular testing methods. It consists in guiding the user through a test scenario in which they are asked to perform selected actions while using an interface. While observing the task, the researcher tries to understand how users use the system and where they encounter problems. There are many varieties of usability tests. The ones used in our project are described below.

Five Second Testing. The Five Second Test is a method based on the belief that the first few seconds of user contact with the system are crucial, as this is enough time for the user to form an opinion about it and decide whether to leave it. It consists in showing a mock-up of the interface and, after 5 s, asking about the impression it made on the user. It may include questions about the elements they remember or the feelings the presented system evoked in them. If one of the website’s functionalities is important, one can try to find out where the user would go to reach it.

A/B Testing. In A/B tests [16], two solutions are presented to the user, who has to choose the one that suits them better. The differences are usually slight, but they can affect the perception of a website or brand. This test is traditionally used in the field of e-commerce, but is also suitable for evaluating website design.

Coaching Method. In the coaching method, the researcher answers the user’s questions and presents the system while the user performs the tasks planned in the scenario. The method encourages users to share their difficulties and thoughts during the study, which can produce better results than simply observing the user.
3 Designing User Interface for Recomposition Method
In our research on the user interface for the recomposition method, we followed the good practices in application interface design described in the previous section. To explore the target group, we collected data about users and created personas. To explore the existing software, we used the benchmarking method.

3.1 Personas
Personas [17] are user models created from data collected about the target group. They should represent individual characteristics that help designers empathise with users, as well as the users’ needs concerning the product. Figure 7 presents graphics depicting our two personas. The first persona has been working with process modelling for a long time. They are familiar with the available software and are looking for a tool that would make their work more efficient.
Fig. 7. Graphics depicting two personas.
The second persona is a beginner in the industry. They want the applications they use to be intuitive and easy to use, even for people who do not have much experience with similar software.

3.2 Benchmark
Benchmarking is a procedure often used during system redesign in order to learn about newer solutions and gain inspiration. In this project, it is intended to help understand user habits and identify language and solutions that are intuitive to the user. Adapting ICT for enterprise management is an important factor [18], especially in the case of BPM systems [19]. Currently, there is no system supporting our solution, so this research focuses on interfaces for designing business processes. We selected four modeling applications: Lucidchart, Smartdraw, Diagrams.net and Camunda Modeler. We first describe the common elements that may determine what the user is used to, and then present the advantages and disadvantages of each application (Table 2).

Figures 8, 9, 10 and 11 present the user interfaces of the four selected tools. One can see that the interfaces of the selected applications do not differ significantly in design. Most of their functions are located in similar places. Each of the interfaces consists of a horizontal menu bar in the upper part and a panel on the left, from which we can select elements of the business process divided into categories. Below the menu bar, a toolbar is placed. Additional options in the Lucidchart and Diagrams.net applications are located in a panel on the right.
Fig. 8. Lucidchart user interface.
Fig. 9. Smartdraw user interface.
Fig. 10. Diagrams.net user interface.
Fig. 11. Camunda Modeler user interface.

Table 2. Comparison of the advantages and disadvantages

Editor: Lucidchart
Advantages:
- clear user interface,
- possibility of importing data,
- easy-to-use editing menu bar,
- easy way of choosing the notation,
- ability to design several processes at the same time and move elements between them,
- extensive page editing menu,
- possibility of introducing automatic formatting rules
Disadvantages:
- small number of the available BPMN elements

Editor: Smartdraw
Advantages:
- possibility of changing options in the toolbar on the left,
- possibility of adding elements in a quick way,
- ability to design several processes at the same time and move elements between them,
- check spelling
Disadvantages:
- least pleasant to use,
- no ’undo’ function,
- no item search option,
- quick elements joining is not possible

Editor: Diagrams.net
Advantages:
- clear user interface,
- many BPMN elements available,
- ability to design several processes at the same time and move elements between them
Disadvantages:
- small number of page editing options,
- very simple editor

Editor: Camunda Modeler
Advantages:
- clear user interface,
- minimalistic,
- no unnecessary features
Disadvantages:
- lack of some functions, e.g. text editing,
- lack of explicit element types in the menu (each type has to be edited after inserting element),
- no possibility to change the appearance of diagrams
3.3 Defining the Problem
The main design challenge is to create an interface that is intuitive and easy to use and, at the same time, offers advanced users ways to streamline their work. The application should also build on existing habits and experiences and provide a better alternative to existing systems.

3.4 Developing User Interface
Based on our research, we developed a web-based user interface for the application that implements the recomposition method. The home page is presented in Fig. 12. The most significant part of the home page is the diagramming space. A grid has been used to facilitate the positioning of elements. On the left-hand side, in the side panel, there is a button directing the user to the model archive. Below are grouped elements of the BPMN notation that can be dragged onto the grid to create the process. In the middle, there is a panel for editing text. Above it, there is a menu toolbar with options for saving, opening a new file, basic editing tasks, view handling and help.
Fig. 12. Home page interface.
After clicking the ARCHIVE button, the user is taken to a new window (Fig. 13). Here, they can select the saved elements of the process and, after proceeding further (Fig. 14), choose from the generated solutions the one that meets their requirements. Another option available in the new window is to decompose a process and add the resulting sub-diagrams to the archive. The user can manually select which of the resulting sub-diagrams are saved to the repository. The proposed system also includes an import option that allows a user to add elements from an external source in BPMN 2.0 XML format to the archive.
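To make the import option more concrete, the sketch below shows one possible way of reading flow elements and sequence flows from a BPMN 2.0 XML document with the Python standard library. It is an illustration under our own assumptions, not the platform's import code: load_process and SAMPLE are hypothetical names, and every child of a process that carries an id and is not a sequence flow is treated as a flow element, which is a simplification.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model elements.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def load_process(xml_text):
    """Return (nodes, flows) read from a BPMN 2.0 XML string.

    nodes: dict mapping element id -> (element type, name)
    flows: list of (sourceRef, targetRef) pairs taken from sequenceFlow elements
    """
    root = ET.fromstring(xml_text)
    nodes, flows = {}, []
    for process in root.iter(f"{{{BPMN_NS}}}process"):
        for element in process:
            tag = element.tag.split("}")[-1]  # strip the namespace prefix
            if tag == "sequenceFlow":
                flows.append((element.get("sourceRef"), element.get("targetRef")))
            elif element.get("id"):
                nodes[element.get("id")] = (tag, element.get("name", ""))
    return nodes, flows


# Hypothetical minimal document used only for this example.
SAMPLE = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="start"/>
    <task id="t1" name="Choose furniture"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="end"/>
  </process>
</definitions>"""

nodes, flows = load_process(SAMPLE)
print(nodes)
print(flows)
```

The resulting node and flow lists are the kind of input from which a neighborhood matrix could be built before decomposing the imported model into sub-diagrams.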
Fig. 13. Archive interface.
Fig. 14. Process selection interface.
4 Testing of Prototypes
As a part of the delivery of the project, we tested our prototypes. Two groups of four people took part in the tests. Each group included one person with experience in process design and three people who are not involved in process design but have a lot of experience with diverse systems.

4.1 First Iteration
Five Second Test. After the five second test, each respondent remembered the ARCHIVE button, the side panel and the text editing panel. They had positive impressions of the application.

A/B Test. Several colour options were presented during A/B testing – pink and orange, nautical and purple. The second option was unanimously chosen.

Coaching Method. After a brief introduction, users were asked to add sub-diagrams to the application repository and see which ones were already there. Two users had trouble finding the archive because they thought the ARCHIVE button was meant for archiving the created process. The other two had no problem finding the repository.

First Iteration Conclusions. The button should be renamed because the current name might be confusing. The word ’library’ was selected instead.

4.2 Second Iteration
Five Second Test. After the five-second test, the users stated that the application looks clear and professional. The most striking features were the LIBRARY button and the notation elements below it.

A/B Test. Once again, several colour options were shown, and once again the marine version was chosen.

Coaching Method. The respondents had no problem performing tasks such as finding the repository or dividing the created process into sub-diagrams and adding them to the application repository. However, the user experienced in creating processes pointed out that a function that would select elements and combine them into a process would be a great help.

Second Iteration Conclusions. The function mentioned by the experienced user should be included.
4.3 Final Result
As suggested, we added a new function that allows a user to select elements of the diagram and have the algorithm combine them into a process (Fig. 15). After clicking NEXT, the user is taken to a screen that displays the available possibilities and selects the one that best suits their needs (Fig. 16).
Fig. 15. The interface of the process creation function using an algorithm.
Fig. 16. Interface of the second step of the process creation function using an algorithm.
5 Discussion
Business process models can be obtained as a result of a mapping, transformation or translation from other existing diagrams. The easiest way of obtaining business process models is translation from models in other representations. This produces high-quality models; however, it requires the existing models to be available. There are several existing mappings between notations, e.g. the mapping of Unified Modeling Language (UML) use case diagrams into process models is an example of a transformation approach [20,21]. Moreover, logic expressions can also be used to represent UML activity diagrams [22] and translated into BPMN. In such an approach, it is possible to connect process patterns to formal specifications. Although the set of mapping rules is primarily intended to extract logical specifications from the process model, it also allows for the generation of a model based on a logical expression that establishes relationships between activities. There are also integration approaches, which could be used for a partial translation from use case diagrams [23,24], activity diagrams [25–28], sequence diagrams [29] as well as a combination of different UML diagrams integrated with the process [30,31].

In our approach, however, we consider using the existing diagrams, e.g. from business process repositories [32], especially in the BPMN notation, but decomposed into sub-diagrams. Although managing large business process collections is not a straightforward task [33], it provides a way to reuse the existing process models. One way is to find a required model in the repository using some matching technique [34]. In this way, mostly whole process models can be reused. However, the repository might also store reusable process fragments. Such fragments might be just syntactically connected elements from the existing diagrams, i.e. sub-diagrams without precise semantics [35], or more complex domain patterns with specific semantics [36,37]. Moreover, the patterns might also be supported by specified services [38,39], making the models executable.

As the repository might contain process models representing various workflows of similar processes, these models can be used for composing a more universal model. An example of such an approach is Decomposition Driven Consolidation [40], which supports the modeling of processes by reducing the possibility of inconsistencies that can occur when creating models manually, as well as by eliminating redundancies in the created models. In our research, however, we provided the method and the design of the user interface for modeling processes using the recomposition method, which is not provided in state-of-the-art solutions.

In the future, we want to focus on improving the semantic description of the decomposed parts and the similarity calculation for them. Moreover, we want to provide support for collective decision making [41] about the process composition, as well as for adding new process fragments acquired from other representations such as natural language texts (documents, e-mails, log data) [42–46].
6 Conclusions
In this paper, we have made a step towards implementing the process recomposition method, the aim of which is to model business processes more efficiently. Our aim was to present the design method and the architecture of a user interface that clearly and intuitively enables the construction of a new process model based on reusable fragments. In developing our solution, a benchmark and personas were used. Competitor analysis was also helpful and revealed many user habits and conventions. The personas made it possible to empathise with the target group as well as to focus on beginner users who had no experience with competing solutions. Respondents with no experience of process modelling helped to make the product more intuitive. Furthermore, we consulted experts with professional experience in process building about our solution. Compared to the concepts reviewed in the benchmarking phase (Sect. 3.2), our solution benefits from being based on the recomposition method, which includes not only using existing process fragments, but also deconstructing complete models into sub-diagrams. As a limitation, we should mention that a live application is still under development. Therefore, it was not yet possible to test its usability on a set of real-life business examples.
References

1. Ziemba, E., Oblak, I.: Critical success factors for ERP systems implementation in public administration. Interdisc. J. Inf., Knowl., Manage. 8, 1–19 (2013). https://doi.org/10.28945/1785
2. Bitkowska, A., et al.: The relationship between business process management and knowledge management-selected aspects from a study of companies in Poland. J. Entrepreneurship, Manag. Innov. 16(1), 169–193 (2020)
3. Chmielarz, W., Zborowski, M., Biernikowicz, A.: Analysis of the importance of business process management depending on the organization structure and culture. In: 2013 Federated Conference on Computer Science and Information Systems, pp. 1079–1086. IEEE (2013)
4. Ziemba, E. (ed.): AITM/ISM -2019. LNBIP, vol. 380. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43353-6
5. Kluza, K., Wiśniewski, P., Jobczyk, K., Ligęza, A., Mroczek, A.S.: Comparison of selected modeling notations for process, decision and system modeling. In: Ganzha, M., Maciaszek, L., Paprzycki, M. (eds.) Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, Annals of Computer Science and Information Systems, vol. 11, pp. 1095–1098. IEEE (2017)
6. Pondel, M., Pondel, J.: Selected IT tools in enterprise knowledge management processes – overview and efficiency study. In: Mercier-Laurent, E., Boulanger, D. (eds.) AI4KM 2017. IAICT, vol. 571, pp. 12–28. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29904-0_2
7. Leopold, H., Mendling, J., Günther, O.: Learning from quality issues of BPMN models from industry. IEEE Software 33(4), 26–33 (2015). https://doi.org/10.1109/MS.2015.81
8. Wiśniewski, P., Kluza, K., Jemiolo, P., Ligęza, A., Suchenia, A.: Business process recomposition as a way to redesign workflows effectively. In: 2021 16th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 471–474. IEEE (2021). https://doi.org/10.15439/2021F138
9. Chinosi, M., Trombetta, A.: BPMN: an introduction to the standard. Comput. Stand. Interface. 34(1), 124–134 (2012). https://doi.org/10.1016/j.csi.2011.06.002
10. Nielsen, J., Molich, R.: Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 249–256. CHI 1990, Association for Computing Machinery, New York, NY, USA (1990). https://doi.org/10.1145/97243.97281
11. Seffah, A., Gulliksen, J., Desmarais, M.: Human-Centered Software Engineering - Integrating Usability in the Software Development Lifecycle. Springer, Human-Computer Interaction Series (2006)
12. Nielsen, J.: Why you only need to test with 5 users (2000). https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/. Accessed 12 June 2020
13. Faulkner, L.: Beyond the five-user assumption: benefits of increased sample sizes in usability testing. Behav. Res. Methods, Inst., Comput. 35(3), 379–383 (2003). https://doi.org/10.3758/bf03195514
14. Krug, S., Black, R.: Don’t Make Me Think! A Common Sense Approach to Web Usability, 1st edn. Que Corp, USA (2000)
15. Nielsen, J., Landauer, T.K.: A mathematical model of the finding of usability problems. In: Proceedings of the INTERACT 1993 and CHI 1993 conference on Human factors in computing systems, pp. 206–213 (1993)
16. Christian, B.: The A/B test: Inside the technology that’s changing the rules of business (2012). https://www.wired.com/2012/04/ff-abtesting/. Accessed 12 June 2020
17. Cooper, A.: Why High-tech Products Drive Us Crazy and how to Restore the Sanity. Sams Publishing, Carmel (2004)
18. Ziemba, E.: Synthetic indexes for a sustainable information society: measuring ICT adoption and sustainability in Polish enterprises. In: Ziemba, E. (ed.) AITM/ISM -2017. LNBIP, vol. 311, pp. 151–169. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77721-4_9
19. Gabryelczyk, R.: Exploring BPM adoption factors: insights into literature and experts knowledge. In: Ziemba, E. (ed.) AITM/ISM -2018. LNBIP, vol. 346, pp. 155–175. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15154-6_9
20. Nawrocki, J.R., Nedza, T., Ochodek, M., Olek, L.: Describing business processes with use cases. In: Abramowicz, W., Mayr, H.C. (eds.) Business Information Systems, 9th International Conference on Business Information Systems, BIS 2006, May 31 - June 2, 2006, Klagenfurt, Austria, pp. 13–27. Gesellschaft für Informatik e.V, Bonn (2006)
21. Lubke, D., Schneider, K., Weidlich, M.: Visualizing use case sets as BPMN processes. In: 2008 Requirements Engineering Visualization, pp. 21–25. IEEE (2008). https://doi.org/10.1109/REV.2008.8
22. Klimek, R., Faber, L., Kisiel-Dorohinicki, M.: Verifying data integration agents with deduction-based models. In: Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on, pp. 1029–1035. IEEE (2013)
23. Zafar, U., Bhuiyan, M., Prasad, P., Haque, F.: Integration of use case models and BPMN using goal-oriented requirements engineering. J. Comput. 13(2), 212–222 (2018). https://doi.org/10.17706/jcp.13.2.212-221
24. Wautelet, Y., Poelmans, S.: An integrated enterprise modeling framework using the RUP/UML business use-case model and BPMN. In: Poels, G., Gailly, F., Serral Asensio, E., Snoeck, M. (eds.) PoEM 2017. LNBIP, vol. 305, pp. 299–315. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70241-4_20
25. Salma, K., Khalid, B., et al.: Product design methodology for modeling multi business products: comparative study between UML and BPMN modeling for business processes. J. Theor. Appl. Inf. Technol. 79(2), 279 (2015)
26. Khabbazi, M.R., Hasan, M.K., Sulaiman, R., Shapi’i, A.: Business process modeling in production logistics: complementary use of BPMN and UML. Middle East J. Sci. Res. 15(4), 516–529 (2013)
27. Badura, D.: Modelling business processes in logistics with the use of diagrams BPMN and UML. Forum Scientiae Oeconomia 2(4), 35–50 (2014)
28. Geambaşu, C.V.: BPMN vs UML activity diagram for business process modeling. In: Proceedings of the 7th International Conference Accounting and Management Information Systems AMIS 2012, pp. 934–945. ASE Bucharest (2012)
29. Mroczek, A.S., Kluza, K., Jobczyk, K., Wiśniewski, P., Wypych, M., Ligęza, A.: Supporting BPMN process models with UML sequence diagrams for representing time issues and testing models. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2017. LNCS (LNAI), vol. 10246, pp. 589–598. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59060-8_53
30. Aversano, L., Grasso, C., Tortorella, M.: Managing the alignment between business processes and software systems. Inf. Softw. Technol. 72, 171–188 (2016). https://doi.org/10.1016/j.infsof.2015.12.009
31. Alreshidi, E., Mourshed, M., Rezgui, Y.: Cloud-based BIM governance platform requirements and specifications: software engineering approach using BPMN and UML. J. Comput. Civil Eng. 30(4), 04015063 (2015). https://doi.org/10.1061/(ASCE)CP.1943-5487.0000539
32. Yan, Z., Dijkman, R., Grefen, P.: Business process model repositories-framework and survey. Inf. Softw. Technol. 54(4), 380–395 (2012). https://doi.org/10.1016/j.infsof.2011.11.005
33. Dijkman, R.M., La Rosa, M., Reijers, H.A.: Managing large collections of business process models-current techniques and challenges. Comput. Ind. 63(2), 91–97 (2012). https://doi.org/10.1016/j.compind.2011.12.003
34. Dijkman, R., Dumas, M., García-Bañuelos, L.: Graph matching algorithms for business process model similarity search. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 48–63. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03848-8_5
35. Wiśniewski, P.: Decomposition of business process models into reusable subdiagrams. In: ITM Web of Conferences, vol. 15, p. 01002. EDP Sciences (2017). https://doi.org/10.1051/itmconf/20171501002
36. Fellmann, M., Delfmann, P., Koschmider, A., Laue, R., Leopold, H., Schoknecht, A.: Semantic technology in business process modeling and analysis, part 1: matching, modeling support, correctness and compliance. In: EMISA Forum, vol. 35, pp. 15–31. EMISA (2015)
37. Fellmann, M., Delfmann, P., Koschmider, A., Laue, R., Leopold, H., Schoknecht, A.: Semantic technology in business process modeling and analysis, part 2: domain patterns and (semantic) process model elicitation. In: EMISA Forum, vol. 35, pp. 12–23. EMISA (2015)
38. Sheng, Q.Z., Qiao, X., Vasilakos, A.V., Szabo, C., Bourne, S., Xu, X.: Web services composition: a decade’s overview. Inf. Sci. 280, 218–238 (2014). https://doi.org/10.1016/j.ins.2014.04.054
39. Klimek, R.: Towards formal and deduction-based analysis of business models for SOA processes. In: Filipe, J., Fred, A. (eds.) Proceedings of 4th International Conference on Agents and Artificial Intelligence (ICAART 2012), 6–8 February, 2012, Vilamoura, Algarve, Portugal, vol. 2, pp. 325–330. SciTePress (2012)
40. Milani, F., Dumas, M., Matulevičius, R.: Decomposition driven consolidation of process models. In: Salinesi, C., Norrie, M.C., Pastor, Ó. (eds.) CAiSE 2013. LNCS, vol. 7908, pp. 193–207. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38709-8_13
41. Kucharska, E., Grobler-Dębska, K., Klimek, R.: Collective decision making in dynamic vehicle routing problem. In: MATEC Web of Conferences, vol. 252, p. 03003. EDP Sciences (2019). https://doi.org/10.1051/matecconf/201925203003
42. Adrian, W.T., Leone, N., Manna, M., Marte, C.: Document layout analysis for semantic information extraction. In: Esposito, F., Basili, R., Ferilli, S., Lisi, F.A. (eds.) AI*IA 2017 Advances in Artificial Intelligence. LNCS, vol. 10640, pp. 269–281. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70169-1_20
43. Weichbroth, P.: Mining e-mail message sequences from log data. In: 2018 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 855–858. IEEE (2018)
44. Weichbroth, P.: Frequent sequence mining in web log data. In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds.) ICMMI 2017. AISC, vol. 659, pp. 459–467. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-67792-7_45
45. Chambers, A.J., Stringfellow, A.M., Luo, B.B., Underwood, S.J., Allard, T.G., Johnston, I.A., Brockman, S., Shing, L., Wollaber, A., VanDam, C.: Automated business process discovery from unstructured natural-language documents.
In: Del R´ıo Ortega, A., Leopold, H., Santoro, F.M. (eds.) BPM 2020. LNBIP, vol. 397, pp. 232–243. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66498-5 18 46. Robak, M., Buchmann, E.: How to extract workflow privacy patterns from legal documents. In: Ziemba, E. (ed.) AITM/ISM -2019. LNBIP, vol. 380, pp. 214–234. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-43353-6 12
Author Index
Andreoletti, Davide 42
Bączkiewicz, Aleksandra 143
Bicevska, Zane 26
Bicevskis, Janis 26
Bonisoli, Giovanni 117
Bujak, Agata 166
Foderaro, Salvatore 97
Giordano, Silvia 42
Jemioło, Paweł 166
Karnitis, Girts 26
Kizielewicz, Bartłomiej 143
Kluza, Krzysztof 166
Köppel, Konstanze 3
Leidi, Tiziano 42
Leyh, Christian 3
Ligęza, Antoni 166
Luceri, Luca 42
Miller, Gloria J. 65
Naldi, Maurizio 97
Neuschl, Sarah 3
Nicosia, Gaia 97
Oditis, Ivo 26
Pacifici, Andrea 97
Paoliello, Marco 42
Pentrack, Milan 3
Peternier, Achille 42
Po, Laura 117
Rollo, Federica 117
Sałabun, Wojciech 143
Suchenia, Anna 166
Wątróbski, Jarosław 143
Wiśniewski, Piotr 166
Zaremba, Mateusz 166