Developments in Information & Knowledge Management for Business Applications: Volume 3 (Studies in Systems, Decision and Control, 377). ISBN 3030779157, 9783030779153

This book provides practical knowledge on different aspects of information and knowledge management in businesses.


Language: English · Pages: 823 [809] · Year: 2021



Table of contents :
Preface
Contents
Creating a System Based on CRM Solutions that Will Manage the Supplier Base
1 Introduction
1.1 HTML Language
1.2 Cascading Style Sheets (CSS)
1.3 PHP Language
1.4 MySQL
1.5 JavaScript Language
1.6 AJAX
2 Description of the Created System
2.1 Description of the Problem and System Requirements
3 Technical Documentation
3.1 Technical Specification of the Customer Section
3.2 Technical Specification of the Employee Part
4 User Guide
4.1 Description of the User View Application
4.2 Application Description—Employee View
5 Summary
References
Voucher 4.0—Digitisation Potential in Voucher Sales from the Works Council’s Point of View
1 Introduction
1.1 Relevance
1.2 Initial Situation
1.3 Objective
1.4 Structure of the Work
2 Voucher 4.0
3 Characteristics of Digitisation and Industry 4.0
3.1 Digitisation
3.2 Industry 4.0
3.3 Realization of Industry 4.0
4 Summary
5 Empirical Survey to Determine the Potential for Digitization in the Sale of Vouchers
5.1 Conception of the Data Collection
5.2 Data Evaluation
5.3 Interpretation
5.4 Summary and Outlook
References
Use of E-service Analytics in Slovakia
1 Introduction
1.1 Theoretical Basis
1.2 Analytics Defined
1.3 Why Analytics Matter
1.4 Analytics and Service Science
2 Service Analytics
2.1 E-services
2.2 Research Problem and Research Goal
2.3 Research Methodology
2.4 Research Sample
3 Results and Discussion
3.1 Analytics Department or Team, Organizational Structure, Internal Communication
3.2 Approach to Service Analytics
3.3 Source of Data
3.4 Dashboards and Spreadsheets
3.5 Impact of Service Analytics on Business Success and Decision-Making
4 Service Analytics and Strategy
4.1 Service Analytics and Human Resources
4.2 Challenges Connected to Service Analytics
4.3 Additional Findings
4.4 Companies That Do Not Use Analytics
5 Conclusion
References
Managing Quality of Human-Based Electronic Services
1 Introduction
1.1 Relevance
2 Theoretical and Conceptual Background
2.1 Human Intelligence Tasks (HIT)
2.2 Human-Based Electronic Services
2.3 Crowdsourcing
2.4 Quality of a Service
2.5 Quality Management
3 Key Quality Requirements for Human-Based Electronic Services
3.1 Quality Management Approaches
4 The Economical Background of Human-Based Electronic Services
5 Human-Based Electronic Services Overview: Business Models and Used Quality Management System
5.1 UpWork.com
5.2 Textbroker.com
5.3 Designenlassen.de
5.4 MTurk.com
5.5 Standard Quality Management Systems in Human-Based Electronic Services
6 Conclusion
References
Sustainability Drives of the Sharing Economy
1 Introduction
1.1 Relevance
1.2 Goals and Objectives
2 Social Impact
2.1 Reputation Systems
3 Economic Impact
4 Technology Confluence
4.1 Technological Capabilities
4.2 Digital Marketing Channels
5 Environmental Impacts and Sustainability
6 Conclusion
6.1 Synopsis
References
Sentiment Analysis for Diagnostic Purposes
1 Introduction
2 Text Representation
2.1 Text Pre-processing
2.2 Vectorization
2.3 Supervised Automatic Classification Methods
3 Social Aspects of Text Representation
3.1 Theory of Attitude
3.2 Diagnostic Elements
4 Text Analyzer Application
5 Experiments
6 Summary
References
SZZ Unleashed-RA-C: An Improved Implementation of the SZZ Algorithm and Empirical Comparison with Existing Open Source Solutions
1 Introduction
2 Literature Review
2.1 The SZZ Algorithm
2.2 Existing SZZ Algorithm Implementations
2.3 Existing Implementations Comparison
2.4 Research Questions
3 Methods and Materials
3.1 Bug Data Set
3.2 SZZ Unleashed and OpenSZZ Comparison
3.3 Base SZZ Algorithm Choice
3.4 Github Issues
3.5 SZZ Improvements Implementation
3.6 SZZ Improvements Comparison
3.7 SZZ Unleashed Fix Impact
4 Results
4.1 SZZ Unleashed Fix Impact Results
4.2 Proposed Improvements Impact Results
5 Discussion
5.1 Threats to Validity
6 Future Research
7 Conclusions
8 Appendix: Research Reproduction
8.1 Dependencies Installation
8.2 Steps to Reproduce
References
Which Static Code Metrics Can Help to Predict Test Case Effectiveness? New Metrics and Their Empirical Evaluation on Projects Assessed for Industrial Relevance
1 Introduction
2 Literature Review
2.1 Lightweight Assessment of Test-Case Effectiveness Using Source-Code-Quality Indicators
2.2 Predictive Mutation Testing
2.3 Comparison of Lightweight Assessment and Predictive Mutation Testing
3 Methods and Materials
3.1 Study Reproduction
3.2 New Metrics Propositions
3.3 Model Creation
4 Results
5 Discussion
6 Conclusions
7 Appendix: Reproducibility of the Presented Research
7.1 Study Reproduction
7.2 Chosen Environment
7.3 Reproduction Instructions
References
Intelligent Freight Forwarder with Tabu Search Algorithm
1 Introduction
2 State of the Art
2.1 Ant Colony Optimization
2.2 Simulated Annealing
2.3 Tabu Search
2.4 Transport Exchange Market
3 Project Specification and Requirements
3.1 Design and Architecture
3.2 Data Layer
3.3 Planning Layer
3.4 Dynamic Scheduling Layer
3.5 Real-Time Coordination Layer
4 Conclusion and Discussion
References
Comparison the Genetic Algorithm and Selected Heuristics for the Vehicle Routing Problem with Capacity Limitation
1 Introduction
2 Problem of Route Planning
2.1 Traveling Salesman Problem
2.2 Route Planning
2.3 Mathematical Model
3 Genetic Algorithm
3.1 Initialization
3.2 Coding Scheme
3.3 Fitness Value
3.4 Selection Operator
3.5 The Crossover Operator
3.6 Mutation Operator
3.7 Elite
4 Heuristics
4.1 Savings Algorithm
4.2 Dijkstra Algorithm
4.3 Christofides Algorithm
5 Experiments
5.1 Testing the Efficiency of the Proposed Solutions of Selected Heuristics in Comparison with the Genetic Algorithm
5.2 Experiment 1
5.3 Experiment 2
5.4 Experiment 3
5.5 Experiment 4
5.6 Experiment 5
5.7 Comparison
6 Summary
References
Dynamic Analysis of Website Content Using a Mobile Application
1 Introduction
2 Web Scraping
2.1 Web Scraping Techniques
3 Multi-platform Applications
4 Proposed Solution
4.1 Architecture
4.2 Data Layer
4.3 Logical Layer
4.4 Graphical Layer
4.5 User Perspective
5 Conclusions
References
Code Smells Detection Using Artificial Intelligence Techniques: A Business-Driven Systematic Review
1 Introduction
1.1 Related Work
1.2 Contributions of This Study
2 Methods
2.1 Research Questions
2.2 Protocol Development
2.3 Search Process
2.4 Primary Study Selection Process
2.5 Assessing Study Quality
2.6 Data Extraction
2.7 Data Synthesis and Aggregation Process
3 Results
3.1 RQ1: Which Predictors Are Used in Prediction Models to Detect Code Smells?
3.2 RQ2: Which ML/AI Methods Are Used in Prediction Models to Detect Code Smells?
3.3 RQ3: Which Code Smells Are Analyzed in Scientific Literature?
3.4 RQ4: What Datasets and Projects, and of What Sizes Are Used in Research Papers to Predict Code Smells?
3.5 RQ5: Which Performance Metrics Are Most Commonly Used in the Literature?
3.6 RQ6: What Are the Ideas, in the Existing Research, Upon Which Code Smell Prediction Using Machine Learning May Be Built?
4 Discussion
4.1 Threats to Validity
5 Conclusions
References
Risk Management of Procurement of the German Medium-Sized Industrial Companies with the Focus on Security of Supply
1 Introduction
1.1 Relevance
1.2 Goals and Objectives
2 Theoretical and Conceptual Background
2.1 Procurement and Supply Security
2.2 Small and Medium-Sized Industrial Companies in Germany
2.3 Procurement and Risk Management in SMEs
2.4 Sub-steps of the Risk Management of Procurement
3 Organization of Risk Management
3.1 Organizational Structure
3.2 Process-Oriented Organization
3.3 Cooperation
3.4 Risk Identification
3.5 Risk Assessment
3.6 Risk Postprocessing
3.7 Risk Controlling
3.8 Interim Conclusion for the Derivation of Recommendations for Action
3.9 Transferability
4 Discussion
4.1 Outline of Recommendations for Action
4.2 Completeness Postulate
4.3 ABC Analysis
4.4 Master Data
4.5 Risk Management in Procurement
4.6 Business Management Research and Teaching
4.7 Organizational Integration
4.8 Sector-Specific Risk Management
4.9 Size-Specific Risk Management
4.10 Procurement Cooperation
4.11 Risk Management as a Risk
5 Conclusion
5.1 Synopsis
5.2 Further Research
References
The Documentation in the Project of Software Creation
1 Introduction
2 Programming Documentation
2.1 Types of Documentation Developed in the Software Development Cycle
2.2 Software Development Process
2.3 Software Development Methodologies
2.4 System Modeling Tools
2.5 Summary of Part 2
3 Cascading Software Development Procedure
3.1 Strategic Phase
3.2 Requirements Setting Phase
3.3 Analysis Phase
3.4 Design Phase
3.5 Implementation Phase
3.6 Software Testing, Verification and Validation
3.7 Summary of Part 3
4 RUP Methodology
4.1 What Is RUP
4.2 RUP Structure
4.3 Artifacts in Individual RUP Disciplines
4.4 Summary of Part 4
5 Extreme Programming Methodology
5.1 What Is XP?
5.2 XP Components
5.3 Roles of Project Participants
5.4 Documentation in XP Methodology
5.5 User Documentation
5.6 Summary of Part 5
6 Analysis and Development of the Form of Documentation
6.1 Balance Between Agility and Discipline
6.2 Project Planning and Management Documentation
6.3 Documentation of the Requirements Definition Process
6.4 Documentation of Analysis and Implementation
6.5 Test Documentation and Product Implementation
6.6 Summary of Part 6
7 Summary of the Work
References
E-Commerce Platform Using SQLite
1 Introduction
2 E-Commerce
2.1 History
2.2 Types of E-Commerce
2.3 Examples of Use
2.4 M Store
3 Electronic Commerce in Poland
3.1 General Information
3.2 Internet Marketing
3.3 Transactions and Payments
3.4 Technology
4 Online Software Store
4.1 Open Source Software
4.2 Platforms Providing Online Stores
4.3 Commercial Software
4.4 Comparison and Summary
5 Discussion of Technologies and Tools Used During Project Implementation
5.1 Java
5.2 Frameworks
5.3 Databases
5.4 Analysis of Application Performance with Different Databases
6 Technical Documentation Describing the Created Application
6.1 Functional and Non-Functional Requirements
6.2 Database Schema
6.3 Schemes of Components and Packages
6.4 Schemes of Use
6.5 Schemes of Activity
7 User Documentation
7.1 Management Panel
7.2 Online Shop
8 Summary
References
How to Prevent Unsafe Behaviour of Employees? Explanatory Models of Insecure Behaviour at the Workplace and Prevention Methods
1 Introduction
2 Methods
3 Results
3.1 The Definition of Occupational Accidents
3.2 Researched Causes of Occupational Accidents
3.3 Human Error in the Context of Occupational Safety
3.4 Explanatory Models for Causes of Human Error in Occupational Safety and Health
3.5 The ABC Model of Behavioural Analysis in the Context of Occupational Safety
3.6 Pre-existing Conditions of Employee Behaviour
3.7 Behaviour and Consequences
3.8 Dissemination of Explanatory Models for Human Error in Occupational Safety and Health
3.9 Use of Methods to Prevent Human Error in Occupational Safety
4 Discussion and Conclusion
5 Attachment
References
Privacy and Cost Concerns in Online Advertising—Literature Review and Analysis
1 Introduction
2 Online Advertisement in Literature
2.1 Keywords
2.2 Search Terms and Digital Libraries
2.3 Search Results
3 Analysis
3.1 Authors
3.2 Publication Types and Publishers
3.3 Keywords and Search Terms
4 Conclusion
Appendix
References
Technological Advancements Within the Canadian Electric Vehicle Industry
1 Introduction
2 Conceptual Background
3 Methodology
3.1 Online Query for Relevant Publications
3.2 Data Cleansing and Publication Selection
3.3 Full-Text Review of Preselected Publications
3.4 2nd Draft—Full-Text Review of Remaining Publications
3.5 Final Text Review
4 Results
4.1 Chosen Publications and Respective Research Disciplines
4.2 Researched Context Attributes
5 Discussions
5.1 Core Findings from Qualitative Research
5.2 Application of Advancements to the Canadian EV Industry
6 Conclusions
References
Game Analytics—Business Impact, Methods and Tools
1 Introduction
2 Business Impact and Methodological Aspects of Game Analytics
2.1 Range of Business Performance of Game Analytics
2.2 Methodological Aspects Based on Literature Review
3 Tools and Metrics in Game Analytics
3.1 Tools and Methods in Game Analytics
3.2 Metrics and KPIs in Game Analytics
4 Conclusion
References
Synergistics and Collaboration in Supply Chains: An Integrated Conceptual Framework for Simulation Modeling of Supply Chains
1 Introduction
2 Issues of Strategic Management of Developing Supply Chains
2.1 Integration Paradigm in Supply Chain Management. Supply Chain Synergies
2.2 Inter-Organizational Coordination. Strategic Partnership Between Supply Chain Participants
2.3 Modern Logistics Concepts/Technologies Based on Participants' Integration
3 Theoretical Background
3.1 System Modeling and Stratification of Supply Chains
3.2 SC Modeling Methods and the Analysis of Dynamic SC
4 Ontological Modeling of SCM Domain
5 Simulation Applications
6 Main Conclusions and Future Research
References
Time Management and Procrastination
1 Introduction
2 Procrastination
2.1 Forms of Procrastination
2.2 Types of Procrastination
2.3 Causes of Procrastination
2.4 Procrastination Solutions
3 Time Management
3.1 Time Management Definitions
3.2 Evolution of Time Management
3.3 Time Management and Procrastination
3.4 Time Management Methods
3.5 Ten Common Mistakes in Time Management
3.6 Free-Time Management
4 Project and Time Management Software
4.1 Monday.com
4.2 Asana.com
4.3 Trello
4.4 Basecamp
4.5 Task Lists
4.6 Evernote
4.7 Todoist
5 Conclusion
References
Creating Database Models in Rational Data Architect
1 Introduction
2 Databases
2.1 Types of Databases
2.2 Database Concepts
2.3 Advantages and Disadvantages of Using Databases
2.4 SQL Language
2.5 Modeling Process
2.6 Cloud Databases
3 Database Modeling Programs
3.1 Rational Data Architect
3.2 MySQL Workbench
3.3 DbDesigner
4 Create a Base Model
4.1 Logical Model Design
4.2 Database Preconditions
4.3 Defining Functionality
4.4 Data Flow Diagrams
4.5 Rational Data Architect
4.6 DbDesigner
4.7 MySQL Workbench
5 Analysis of Programs
5.1 Rational Data Architect Rating
5.2 Rating DBDesigner
5.3 MySQL Workbench Rating
6 Summary
References
The Dynamic Environment of Pricing in E-Commerce and the Impact on Customer’s Behavior
1 Introduction
1.1 Relevance
1.2 Goals and Objectives
2 Methodology
3 Literature Review
3.1 Drives of Price Dispersion
3.2 The Online Environment of Airline and Hotel Businesses
3.3 The Perception of Price Fairness
4 Conclusions
5 Future Recommendations
References
An Investigation of the Complexity of Bitcoin Pricing
1 Introduction
2 Methodology
2.1 Database Search to Find Relevant Literature
2.2 Quantitative Analysis of the Sample
2.3 Qualitative Analysis of the Sample
3 Literature Review
3.1 Quantitative Analysis
3.2 Qualitative Analysis
4 Discussion and Introduction of Regression Model
4.1 Conclusions from the Theoretical Framework
4.2 Findings to Be Included in the Regression Model
5 Regression Model
5.1 Testing
6 Conclusion and Limitations
References

Studies in Systems, Decision and Control 377

Natalia Kryvinska Aneta Poniszewska-Marańda   Editors

Developments in Information & Knowledge Management for Business Applications Volume 3

Studies in Systems, Decision and Control Volume 377

Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Systems, Decision and Control” (SSDC) covers both new developments and advances, as well as the state of the art, in the various areas of broadly perceived systems, decision making and control–quickly, up to date and with a high quality. The intent is to cover the theory, applications, and perspectives on the state of the art and future developments relevant to systems, decision making, control, complex processes and related areas, as embedded in the fields of engineering, computer science, physics, economics, social and life sciences, as well as the paradigms and methodologies behind them. The series contains monographs, textbooks, lecture notes and edited volumes in systems, decision making and control spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/13304

Natalia Kryvinska · Aneta Poniszewska-Marańda Editors

Developments in Information & Knowledge Management for Business Applications Volume 3

Editors Natalia Kryvinska Department of Information Systems Faculty of Management Comenius University Bratislava, Slovakia

Aneta Poniszewska-Marańda Institute of Information Technology Lodz University of Technology Łódź, Poland

ISSN 2198-4182 ISSN 2198-4190 (electronic) Studies in Systems, Decision and Control ISBN 978-3-030-77915-3 ISBN 978-3-030-77916-0 (eBook) https://doi.org/10.1007/978-3-030-77916-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

In contemporary unstable times, enterprises and businesses deal with various challenges, such as large-scale competition, high levels of uncertainty and risk, rapid technological advancement, and increasing customer requirements. Thus, businesses work continually on improving the efficiency of their operations and resources toward enabling sustainable solutions based on the knowledge and information accumulated previously. Consequently, this third volume of our subline continues to highlight different approaches to handling enterprise knowledge and information management, pointing to the importance of continuous improvement of structural management for steady growth. We hope that the works in this volume will encourage and initiate further research on this topic. Hence, the starting chapter “Creating a System Based on CRM Solutions that Will Manage the Supplier Base” authored by Żabicki et al. describes a solution supporting the management of the company’s IT clients. The work presents the technology of creating the solution, its functional requirements, technical documentation, and complete instructions for using the website. The aim of the project was to increase the functionality of the solution supporting the management of the company’s clients at low costs for the creation and operation of the system. The next chapter presents a study on “Voucher 4.0—Digitisation Potential in Voucher Sales from the Works Council’s Point of View.” It deals with the digitization potential in the distribution of vouchers in Austrian companies from the perspective of the works council, which is an important, if not the most important, multiplier in the distribution of these vouchers. It begins with a discussion of the literature on digitization, Industry 4.0, and its integration in the value chain, in order to provide a basis for the subsequent survey and to underline the relevance of the topic. The digitization potential in value voucher sales was determined through a systematic collection of empirical facts by means of an online survey. The insights gained were evaluated and analyzed in a subsequent step and serve to ascertain the initial situation. The evaluation showed that there are some problems in the current voucher distribution which can be solved with the help of the digitized form of value voucher distribution, “Voucher 4.0.” The advantages of digitizing this distribution system for the works councils and also for the employees are presented in detail in the course of the work.


In the next work, titled “Use of E-service Analytics in Slovakia,” the authors outline the current state of analytics use in companies providing electronic services in Slovakia. Service analytics provides companies with many advantages; on the other hand, companies face various challenges that arise from its use. The research is focused on companies providing e-services in Slovakia, their perception of and approach to service analytics, as well as key issues they face. The subsequent chapter “Managing Quality of Human-Based Electronic Services” explores how, with globalization and technology, it is possible to allocate work to workers across the globe and save time, costs, and resources for a company. Human-based electronic services provide different options for the outsourcing of tasks that cannot be fully automated. Managing quality is one of the most important elements in this field, as low-quality solutions might lead to delays in delivery, exceeded budgets, and overall dissatisfaction. Various types of human-based electronic services are analyzed and examined for potential threats in quality management, and workflows that improve quality assurance during their use are described. By selecting the right type and approach and by combining different services with outsourcing, the result can be optimized in terms of time, quality, or budget. The chapter authored by Šepeľová et al., “Sustainability Drives of the Sharing Economy,” investigates the impact of informational services on the driving forces of the sharing economy that have resulted in the expansion of sharing platforms, specifically on the example of the ride-sharing platform Uber and the accommodation-sharing platform Airbnb. The methodology is based on an analysis of the existing literature from developed and developing countries, focusing on a model of four contributing drivers. As a result, the study reveals the most important factors that have prompted the development of the sharing economy. Understanding these concepts as potential driving forces for participation in the sharing economy is necessary for exploring the consumers’ needs that motivate them to participate in it. The work entitled “Sentiment Analysis for Diagnostic Purposes,” authored by Urszula Krzeszewska and Joanna Ochelska-Mierzejewska, focuses on the analysis of emotional attitudes in texts that are statements of computer science students at the Lodz University of Technology. Due to the free style of expression, the created application is a good basis for automatic analyses from a diagnostic angle, as an aid for psychologists, educators, or sociologists. The bag-of-words and n-gram methods were used to vectorize the text, while k-nn and NBC were used for the classification of sentiments. In the next study, on “SZZ Unleashed-RA-C: An Improved Implementation of the SZZ Algorithm and Empirical Comparison with Existing Open Source Solutions,” the authors note that the SZZ algorithm is one of the most important algorithms in mining software defects, as it allows the creation of datasets for software defect prediction, and yet very few open-source implementations of this algorithm have been created. In recent years, two interesting open-source implementations of the SZZ algorithm have appeared, namely SZZ Unleashed and OpenSZZ.
Thus, in this study, they compare how well these implementations perform, as well as propose an improved implementation named SZZ
Unleashed-RA-C. The most important features of the proposed algorithm and implementation include ability to identify and handle refactoring changes when tracing bug-introducing changes (RA functionality), discarding comments and files based on a regular expression, and last but not least the ability of using GitHub as the issue tracker. In the chapter “Which Static Code Metrics Can Help to Predict Test Case Effectiveness? New Metrics and Their Empirical Evaluation on Projects Assessed for Industrial Relevance,” the authors tested possibility of predicting test case effectiveness, strictly on a basis of static code metrics of production and test classes. To solve this task, the authors employed three different learning classifiers, to check feasibility of the process and compare their performance. They created own set of metrics all of which were later assessed for their impact on prediction. Created models yield a promising result, with best of them achieving over 85% for both F-Measure and Precision along with 73% for Matthews correlation coefficient. With the fact of wellbalanced data used in creation of model, it is safe to assume that they hold some merit. All steps taken to achieve this result are explained in detail. The next work named “Intelligent Freight Forwarder with Tabu Search Algorithm” aims to determine which part of Freight Forwarder processes can be enhanced with use of this cutting-edge technology. Initial step toward the goal was to split the whole process into smaller independent parts. This way four layers of the issue were obtained: data layer, planning layer, realtime coordination layer, and dynamic scheduling layer. Each of the layers required unique approach and deep understanding of the knowledge behind it. Obviously for the found problem, there were no absolute solutions, hence for every classified case paper depicts various ideas of fixing it. The chapter authored by Joanna Ochelska-Mierzejewska and Przemysław Zakrzewski “Comparison the Genetic Algorithm and Selected Heuristics for the Vehicle Routing Problem with Capacity Limitation” compares the operation of the genetic algorithm with selected heuristics (savings heuristics, Dijkstra heuristic, Christofides heuristics) for the routing problem with capacity constraints, for which the following comparison criteria were defined: time to find a solution, filling the fleet and accuracy of the solution. The chapter analyzes five random datasets differing in the location of points (cities) and the size of orders. Such a variety of data made it possible to analyze the effectiveness of selected heuristics. Results from genetic algorithm were compared with other heuristic. The results are presented in appropriate graphs, which facilitate the analysis of the results and their comparison. In the research performed by Krzysztof Stepien and Dawid Kossowski “Dynamic Analysis of Website Content Using a Mobile Application,” a mobile application, which is used to analyze changes on websites, was created. It allows user to track all or any user-selected items on any site that uses Hypertext Transfer Protocol Secure (HTTPS). Nowadays, the use of mobile and Internet applications has become extremely popular. It has led to a very large development in these areas. More and more data and information are provided to users, and their processing is a very timeconsuming process. The existing solutions that allow for the improvement of the data selection process are usually created in a limited way for the user.


The following chapter presents a study on the “Code Smells Detection Using Artificial Intelligence Techniques: A Business-Driven Systematic Review.” It aims to identify and investigate the current state of the art with respect to: (1) predictors used in prediction models to detect code smells, (2) machine learning/artificial intelligence (ML/AI) methods used in prediction models to detect code smells, and (3) code smells analyzed in scientific literature. Most researchers still use source code metrics as predictors. Precision, recall, and F-measure are the go-to performance metrics. There seems to be a need for modern reference data/projects sets that reflect modern constructs of programming languages. Thus, the authors identified various promising paths of research that have the potential to advance the state of the art in the area of code smells prediction. The work authored by Stephanie Burghart and Milan Fekete “Risk Management of Procurement of the German Medium-Sized Industrial Companies with the Focus on Security of Supply” presents developed recommendations for action to strategically secure the supply of goods not produced by the company itself. The authors conceive these recommendations for action which are suitable for strategically improving the security of supply for German medium-sized industrial companies. For this purpose, a research approach based on Hans Ulrich’s demand for applicationoriented research was chosen. No theories are developed or tested by hypotheses. Instead, the focus is on advising the practice. The chapter “The Documentation in the Project of Software Creation” describes the documentation process in software development projects, which are based on various methodologies. The classic waterfall model of the software development process, Rational Unified Process, and eXtreme Programming were chosen as examples of methodologies. The RUP and XP methodologies are the main examples of two different groups of methodologies—agile and traditional. Although these methodologies represent completely different approaches to the design of the system and the process of its documentation, both have gained great popularity and are currently used in many software companies. The aim of the work is to provide the reader with various documentation processes, compare their essential content, and demonstrate their impact on the success of the project. Due to the advantages and disadvantages of the presented documentation processes, the result of the work is the creation of a universal form of the documentation process in software development projects. The next chapter presents a research on the chapter “E-Commerce Platform Using SQLite.” It describes the process of online store software development and administration panel for its administration, based on SQLite database. It lists the technologies and tools that were used to create the online store application. In addition, the performance of the database used was tested on another popular database management system. Their advantages and disadvantages are presented, as well as exemplary deals that were realized with their help. Part of the lever is the technical documentation of the created project and the administration panel of the online store. The documentation contains a general description of the application, which explains its structure. At the end of the work, the documentation of the system administrator and the user documentation of the store are presented.


In the following chapter “How to Prevent Unsafe Behaviour of Employees? Explanatory Models of Insecure Behaviour at the Workplace and Prevention Methods” authored by Valéry Wöll, Rozália Sulíková, human error is considered to be the main cause of occupational accidents, accounting for up to 96%. Four models are currently cited in the german-speaking world to explain human error in the occupational safety. Except for the ABC model, which is regarded as the only holistic, scientifically proven and practicable model for explaining the causes of human error in occupational accidents, the other models are controversial or are not considered adequate when used in isolation. A quantitative literature analysis of 56 legal texts, regulations, and official notices from the field of occupational safety made it possible to investigate which methods are currently required by law to prevent the causes of occupational accidents in companies. It can be seen that the elements of the qualification method, with a share of approx. 76 %, are the measures most frequently required by law. In the work named “Privacy and Cost Concerns in Online Advertising—Literature Review and Analysis,” Tomas Lego claims that thanks to technological advancements, improvements in data analysis, and the widespread of the Internet, online advertising became a growing industry worth billions of US dollars. With large numbers of active users and data on their behavior being easily retrievable, the Internet provides a cost-efficient way of targeting individuals. Through its reliance on the behavioral data of individual Internet users, it however also poses a great threat to users’ privacy. Having conducted a keyword search in six independent digital libraries, the author provides a description of the topic of privacy and cost issues of targeted online advertisement in the form of a literature review. Drawing on a sample of 70 unique journal articles, conference papers, or book chapters, the author introduces the reader to the relevant sources and offers an overview of the most salient keywords used in this context. In the subsequent work “Technological Advancements Within the Canadian Electric Vehicle Industry,” the authors find that electric vehicle industry is experiencing moderate market adoption rates in Canada, and as of 2019, they are even becoming more affordable. Inevitably, the technology available in these vehicles will surpass that of the traditional internal combustion engines (ICEs). However, there is much more business owners can do to elevate the consumer perception of EV and synthesize market attractiveness. The rise in Big Data analysis and IoT has promised to provide incredible benefits to both consumers and corporations; however, their significance can be blown out of proportion. For that reason, the specific technologies that can be implemented in the EV are further explored to find their realities. It is found that the most prominent barrier slowing adoption is consumers’ range anxiety, and the technologies researched can be used strategically to minimize this negative perception while simultaneously providing businesses fantastic insight. The findings show that the opportunity cost is high but can be excellent for accelerating market share and altering consumer perceptions. The chapter “Game Analytics—Business Impact, Methods and Tools” authored by Flunger et al. outlines the relevance and potential of game analytics in the context of gaming business. The authors identify and discuss crucial aspects of analytical

and predictive models for free-to-play (F2P) business models. Based on a literature review, they analyze several business issues where game analytics may provide major benefit. Besides identifying motivations for small- and medium-sized game developers to use game analytic tools, the authors furthermore introduce six studies, which discuss churn prediction models in F2P games, as well as four studies on prediction of customers’ lifetime value. Emphasis is laid on methods, metrics, and tools in game analytics, such as player churn prediction and customer lifetime value (CLV) prediction, and their functionalities. The next chapter “Synergistics and Collaboration in Supply Chains: An Integrated Conceptual Framework for Simulation Modeling of Supply Chains” explores the approaches to simulation of supply chains’ strategic development specifically focusing on formation of cooperation strategies between supply chain partners. The objective of this paper is to suggest a conceptual scheme and stratification approaches that enable creation of a model reflecting polysystemic representation of the supply chain. The following base levels of the supply chain representation are considered: object-based, configuration/network-based, process-based, and logistics coordination levels. In the field of supply chain transformation and strategic development, there is a strong need in concurrent and aligned usage of different supply chain representations. That defines the approach to building generic supply chain representation based on composite simulation models. Depending on addressable tasks of supply chain analysis and synthesis, process and system dynamic simulation models of different degrees of detail may be used. Agent-based modeling is used to model interorganizational coordination between supply chain partners. The chapter authored by Loretta Pinke, René Pawera, and Oskar Karlík “Time Management and Procrastination” introduces the time management tools that can be used to combat procrastination. Organizations that use time management software have a better overview of their activities, so it can be expected their performance to grow. In the next work called “Creating Database Models in Rational Data Architect” the authors are analyzing various uses, types, and programming languages. As a practical demonstration and better conduct analysis, they also create database models or better put one single database model in several programs to better compare their advantages and disadvantages. Additionally, the authors are analyzing which databases are best in what kind of business situations. The authors conclude that there is no single best data modeling software and that the decision on which to use needs to be made with a prepared list of requirements in mind. While a simple database may be modeled in free software, more complex will require paid software, such as Rational Data Architect. The analysis conducted in this paper aids in making such decisions. The research performed in the chapter “The Dynamic Environment of Pricing in E-Commerce and the Impact on Customer’s Behavior” highlights the relevant sources regarding the problem of fairness, and the outcome this price strategy can cause within a consumer, while also covering an example of this dynamism in airline and hotel businesses. Results indicate that businesses should aim for a stable longterm relationship with customers to win their loyalty, as they view small price changes as fair (e.g., roughly around 5% price difference). Future researches can

focus on a combination of dynamic pricing and loyalty programs, to investigate whether substantial price differences (e.g., above 25%) could be accepted by loyal customers when additional benefits (e.g., a discount bonus for the future) are included. The final chapter, “An Investigation of the Complexity of Bitcoin Pricing,” aims to investigate whether it is possible to combine existing research regarding specific attributes of Bitcoin and other cryptocurrencies into one model of Bitcoin price explanation. To do so, an extensive literature review is conducted to explore the publications available. The literature review results in a list of variables used to explore various research areas regarding Bitcoin. The most popular variables (such as the amount of web searches regarding Bitcoin, the gold price, or the security of blockchain technologies) are selected and combined into a regression model. Even though the coefficient estimates for the Google Trends index, the mean transaction fee, the number of Bitcoin wallets, the security breach dummy variable, and the lagged Bitcoin price are reported as significant, statistical testing indicates severe issues with the model. The paper therefore concludes with the finding that research regarding Bitcoin is not advanced enough and that its pricing mechanisms are too complex to build a sum-of-the-parts model. For future research, the exploration of an advanced model with measures implemented to counteract the mentioned issues is suggested.

Bratislava, Slovakia
Łódź, Poland
May 2021

Natalia Kryvinska
[email protected]

Aneta Poniszewska-Marańda
[email protected]

Contents

Creating a System Based on CRM Solutions that Will Manage the Supplier Base . . . . . . . . . . 1
Dawid Żabicki, Vincent Karovič, and Iryna Ivanochko

Voucher 4.0—Digitisation Potential in Voucher Sales from the Works Council’s Point of View . . . . . . . . . . 47
Wolfgang Neussner and Perinne Rapp

Use of E-service Analytics in Slovakia . . . . . . . . . . 83
Martina Halás Vančová and Marián Mikolášik

Managing Quality of Human-Based Electronic Services . . . . . . . . . . 123
Zuzana Takacsova and Sergiy Masalitin

Sustainability Drives of the Sharing Economy . . . . . . . . . . 139
Lucia Šepeľová, Jennifer R. Calhoun, and Michaela Straffhauser-Linzatti

Sentiment Analysis for Diagnostic Purposes . . . . . . . . . . 155
Urszula Krzeszewska and Joanna Ochelska-Mierzejewska

SZZ Unleashed-RA-C: An Improved Implementation of the SZZ Algorithm and Empirical Comparison with Existing Open Source Solutions . . . . . . . . . . 181
Jarosław Pokropiński, Jakub Gasiorek, Patryk Kramarczyk, and Lech Madeyski

Which Static Code Metrics Can Help to Predict Test Case Effectiveness? New Metrics and Their Empirical Evaluation on Projects Assessed for Industrial Relevance . . . . . . . . . . 201
Bartosz Boczar, Michał Pytka, and Lech Madeyski

Intelligent Freight Forwarder with Tabu Search Algorithm . . . . . . . . . . 217
Mateusz Bujnowicz, Adam Dabrowski, Mateusz Szubański, Mateusz Wasilewski, and Witold Marańda


Comparison the Genetic Algorithm and Selected Heuristics for the Vehicle Routing Problem with Capacity Limitation . . . . . . . . . . 231
Joanna Ochelska-Mierzejewska and Przemysław Zakrzewski

Dynamic Analysis of Website Content Using a Mobile Application . . . . . . . . . . 267
Krzysztof Stepień and Dawid Kossowski

Code Smells Detection Using Artificial Intelligence Techniques: A Business-Driven Systematic Review . . . . . . . . . . 285
Tomasz Lewowski and Lech Madeyski

Risk Management of Procurement of the German Medium-Sized Industrial Companies with the Focus on Security of Supply . . . . . . . . . . 321
Stephanie Burghart and Milan Fekete

The Documentation in the Project of Software Creation . . . . . . . . . . 361
Adam Szewc, Vincent Karovič, and Peter Veselý

E-Commerce Platform Using SQLite . . . . . . . . . . 443
Michał Kieszek, Vincent Karovič, and Iryna Ivanochko

How to Prevent Unsafe Behaviour of Employees? Explanatory Models of Insecure Behaviour at the Workplace and Prevention Methods . . . . . . . . . . 499
Valéry Wöll and Rozália Sulíková

Privacy and Cost Concerns in Online Advertising—Literature Review and Analysis . . . . . . . . . . 529
Tomas Lego

Technological Advancements Within the Canadian Electric Vehicle Industry . . . . . . . . . . 569
Michael Vice and Marián Mikolášik

Game Analytics—Business Impact, Methods and Tools . . . . . . . . . . 601
Robert Flunger, Andreas Mladenow, and Christine Strauss

Synergistics and Collaboration in Supply Chains: An Integrated Conceptual Framework for Simulation Modeling of Supply Chains . . . . . . . . . . 619
Natalia Lychkina

Time Management and Procrastination . . . . . . . . . . 649
Loretta Pinke, René Pawera, and Oskar Karlík

Creating Database Models in Rational Data Architect . . . . . . . . . . 731
Artur Bogusławski, Peter Veselý, Lucia Husenicová, and Ondrej Čupka

The Dynamic Environment of Pricing in E-Commerce and the Impact on Customer’s Behavior . . . . . . . . . . 767
Jozef Sirotnak and Dmitry Ushakov


An Investigation of the Complexity of Bitcoin Pricing . . . . . . . . . . 781
Philipp Saborosch and Dmitry Ushakov

Creating a System Based on CRM Solutions that Will Manage the Supplier Base

Dawid Żabicki, Vincent Karovič, and Iryna Ivanochko

Abstract The project presented in the work describes a solution supporting the management of the company’s IT clients. The work presents the technology of creating the solution, its functional requirements, technical documentation and complete instructions for using the website. The aim of the project was to increase the functionality of the solution supporting the management of the company’s clients at low costs for the creation and operation of the system. The following technologies were used: HTML, CSS, PHP, AJAX, JavaScript and MySQL. These are the most popular open source solutions that guarantee high quality and, in most cases, free use. Another advantage of using these technologies is the number of publications, professional courses, as well as prepared examples and scripts about them, which can serve as a model. Keywords WEB technologies · CRM application model · Database structure

D. Żabicki
Lodz University of Technology, Lodz, Poland
e-mail: [email protected]

V. Karovič (B) · I. Ivanochko
Faculty of Management, Comenius University in Bratislava, Bratislava, Slovakia
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_1

1 Introduction

The purpose of the engineering work described below was to develop a CRM (Customer Relationship Management) system, i.e. a customer contact management system. The work describes a fictitious company, Business Solutions, dealing with the sale of business management software. The company’s board of directors cares about the best possible customer service and increased revenues, and it was therefore decided to create a system based on CRM solutions that will manage the supplier base. The main goals are low costs of application programming, system implementation and maintenance. In relation to these criteria, attention has been focused on open source solutions that ensure high quality at low cost.


In addition, the technologies used in the application are very popular, which guarantees wide access to teaching materials and a wide range of specialists in the given fields. The proposed solution differs from a standard CRM system because, in addition to the most frequently recurring functions, it is based on an Internet platform that is accessible not only to the company’s employees, but also to its active clients. This is a very important issue because CRM databases are “alive”: customers change addresses, phone numbers and email addresses, lay off employees, and hire new ones. Companies that work on the basis of CRM systems very often find, when they try to re-establish contact after a relatively long time, that the data in their database is outdated and it is not possible to contact the supplier. Customer data is the most valuable information for a manufacturing and trading company, and keeping it up to date is very important. Through the online platform, the customer can check the status of their contact details, purchased products and services, the status of payments, messages and current promotions, and can obtain much more information depending on the business profile of the seller. Completing and continually updating data through the proposed system is beneficial for both the customer and the seller. Another important advantage of the proposed system is easy adaptation to the individual needs of each client.

The work is divided into four chapters. The first chapter describes the basic technologies used in the project: the most popular solutions used to create websites and web applications, such as HTML, CSS, PHP, MySQL, AJAX and JavaScript. The second chapter discusses the theoretical aspects of the issue; it describes the most important features of Business Solutions and the requirements that are set for the proposed system. The third chapter describes the technical way of solving the problem: the most important aspects of the application and its functions, as well as the method of data transfer between individual areas, divided into two parts, the client side of the application and the system administrator section. The fourth chapter presents a description of the application itself in terms of its operation; in the form of instructions for use with many illustrations, it shows how to use all the proposed functions, both at the level of the supplier and of the administrator.

Website Creation Techniques

The following chapter describes the technologies that were used in the design. The most popular ways of creating websites and web applications have been used (Fig. 1).

Fig. 1 Technologies used in the project

1.1 HTML Language

The first and most important technology is HTML (HyperText Markup Language) [1]. Hypertext—the name was coined with reference to the “hyperjump” popular in sci-fi movies, a technology that allows you to jump quickly between star systems. Hypertext works in the same way, allowing you to “jump” immediately from one page to the next using a link (hyperlink). Tags—a predefined set of markers (you cannot create your own) that give elements special attributes. This allows the browser to know how to interpret them.


Language—HTML is a language for publishing on the web. HTML allows you to mark text so that web browsers can recognize it as webpage code. Tags allow you to describe the appearance and layout of all objects on a page, such as text, images, tables and forms. The creator of HTML and HTTP (Hypertext Transfer Protocol) is Tim Berners-Lee, who worked as an IT specialist at the Swiss scientific institute CERN (European Organization for Nuclear Research) at the turn of the 1980s and 1990s. He wanted to develop a system to help scientists exchange articles and scientific papers via the Internet. HTML is based on SGML (Standard Generalized Markup Language), which is used to describe different types of specifications. HTML has evolved several times over the years [2]. Meanwhile, Tim Berners-Lee founded the World Wide Web Consortium (W3C), which has since been involved in HTML standardization, developing the technology and creating and approving specifications. The W3C is an institution that shows the direction of the development of the Internet. It is a kind of agreement between companies and organizations that seek to develop common standards for the functioning of the “global village” [3]. The W3C accepted the HTML 2.0 specification, HTML 3.2 was released in 1996, and HTML 4.0 followed in 1998. The next step in the development of HTML was the connection with XML (eXtensible Markup Language), which led to the creation of XHTML 1.0 (eXtensible Hypertext Markup Language) [4], the next generation of markup languages. Informally, XHTML is called HTML 5.0. At first glance, HTML is a complete language that allows the presentation of data in the form of web pages, but its original intention was only to describe the structure of a document. It was created for the purpose of presenting articles by scientists, but as its popularity grew it became necessary to “decorate” pages as well. Due to the growing demand, version 3.2 introduced tags describing colors, tables and backgrounds, which enabled the visual development of web pages (Fig. 2).


Fig. 2 HTML workflow
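To make the role of tags more concrete, the sketch below shows what a very simple page template might look like. It is purely illustrative and is not taken from the system described in this chapter; the file name, titles and link targets are invented. Because the project serves its pages through PHP, the example is written as a PHP file whose body is ordinary HTML markup:

<?php /* index.php - illustrative page template only; not part of the described system */ ?>
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Business Solutions - client panel</title>
  </head>
  <body>
    <h1>Welcome</h1>
    <!-- a paragraph of ordinary text marked up with tags -->
    <p>Our offer is described on the <a href="products.php">products page</a>.</p>
    <!-- an image element; the browser decides how to render it from the tag alone -->
    <img src="logo.png" alt="Company logo">
  </body>
</html>

Each tag only states what a fragment is (a heading, a paragraph, a link, an image); how it is displayed is left to the browser and, as the next section shows, to CSS.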

1.2 Cascading Style Sheets (CSS)

Over time, the W3C introduced Cascading Style Sheets (CSS) [5]. As of HTML 4.0, the entire layer describing document formatting can be moved to a separate CSS document. Before CSS, developers most often built page layouts with tables. For each subpage they defined the background color, border color, font and all kinds of other information that determined the appearance of the page. The HTML code became longer, more complex and unreadable. Each page contained hundreds of tags and elements that not only supported the presentation, but also increased the page size and slowed down the work. The basic objectives of introducing cascading styles were to simplify HTML/XHTML code, reduce data transfers and shorten page load times [6]. The introduction of block elements made it possible to abandon table-based layouts and standardized the arrangement of elements on pages regardless of the browser used (Fig. 3). CSS is a very simple, intuitive and easy-to-learn language, but despite its simplicity it provides great possibilities for creating the layout of a web page. CSS determines the parameters of each HTML/XHTML tag and, by being separated into its own file (.css), allows one configuration to be applied to many pages (Fig. 4). You do not have to edit the code of each subpage to change the look of your site; you only need to change the appropriate parameters in the external style sheet, and these are applied in all documents that reference the sheet. Cascading styles are shown in Fig. 3. Thanks to CSS, the HTML code becomes clear and transparent and contains only the necessary content.


Fig. 3 Cascading Style Sheets CSS

Fig. 4 External CSS
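A minimal sketch of the external-style-sheet idea follows. The file names (header.php, styles.css) and the sample rules are assumptions made only for illustration; every subpage would include the same header, so the whole layout is controlled from one place:

<?php
// header.php - hypothetical shared page header; every subpage include()s this file.
// The rules themselves would live in styles.css, for example:
//   body { font-family: Arial, sans-serif; background-color: #f4f4f4; }
//   h1   { color: #004080; }
$pageTitle = isset($pageTitle) ? $pageTitle : 'Business Solutions';
?>
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title><?php echo htmlspecialchars($pageTitle); ?></title>
    <!-- one external sheet shared by all subpages; change it once, every page follows -->
    <link rel="stylesheet" href="styles.css">
  </head>
  <body>

Changing a colour or a font in styles.css is then immediately reflected on every page that links the sheet, which is exactly the maintenance benefit described above.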

1.3 PHP Language

PHP (PHP Hypertext Preprocessor) was originally called Personal Home Page Tools; as the language developed, its name changed as well, so the original name is a thing of the past. PHP is the most popular scripting language in the world that works on the server side [7]. It can be used for many tasks, but the most popular is dynamic web page generation. The PHP code is inserted into the page and then the server interprets it and creates a result visible to the user (Fig. 5).


Fig. 5 PHP query processing
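As a rough illustration of this processing model (not code from the described project; the file name and the hard-coded data are invented for the example), the following script mixes static HTML with a PHP loop. The server executes the PHP part and sends the visitor only the finished HTML:

<?php
// promotions.php - minimal sketch of server-side page generation.
// In the real system the entries would come from the database, not from an array.
$promotions = array('10% discount on annual licences', 'Free training for new modules');
?>
<!DOCTYPE html>
<html>
  <body>
    <h2>Current promotions</h2>
    <ul>
      <?php foreach ($promotions as $promo): ?>
        <li><?php echo htmlspecialchars($promo); ?></li>
      <?php endforeach; ?>
    </ul>
  </body>
</html>

The visitor's browser receives only the generated list items; the PHP source itself never leaves the server, which is also the basis of the security advantage listed below.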

The most important advantages of PHP are [8]:

Availability and Pricing—PHP is available under an open-source license, so you can use it for free and have access to its full source code.
Simplicity—PHP is a very friendly language both for novice programmers and for those who already have programming experience, because its syntax is based on other popular high-level languages. Despite its simplicity, PHP also gives advanced users access to very professional tools.
Portability—PHP is available on many system platforms and server applications. It allows you to read, save and even create and delete files in the server’s operating system, and to execute system commands.
Integration—PHP is very strongly integrated with the most important language of the web, i.e. HTML. In addition, PHP allows direct connection to a database; the most popular combination is PHP + MySQL. PHP also supports ODBC (Open Database Connectivity), i.e. an open standard for communication with databases.
Security—the PHP script is invisible to the user of the site and cannot be copied or modified; the recipient sees only the generated HTML.
Performance—even simple servers running PHP can handle high traffic and many transactions.
Popularity—a large number of publications, courses, websites and forums facilitate learning PHP and getting support.
Libraries—PHP has hundreds of built-in libraries that support solutions used to create websites.

HTML was a static language that only displayed data. With the development of the Internet, a need arose for user interaction, data collection and personalization, and out of these needs PHP was born. The father of the language is Rasmus Lerdorf, who in 1995 created a script based on Perl/CGI (Common Gateway Interface) that allowed him to check the number of visits to his website. The script had two functions: logging visitors and counting them. His solution gained more and more popularity, which resulted in the release of Personal Home Page Tools a year later. A second version, PHP/FI (Form Interpreter), was soon released, i.e. a tool for executing SQL (Structured Query Language) queries. The growing popularity of the language meant that a group of programmers joined Rasmus to support the development of the project. Version 3.0 was soon born, which brought changes to the parser. The number of language users and enthusiasts kept growing, and programmers added hundreds of new features.

Creating a System Based on CRM Solutions that Will Manage …

7

is constantly growing, programmers have added hundreds of new features. At one point during the statistical surveys it turned out, which are used by PHP by more than a million people worldwide. The manufacturers therefore concluded that the language requirements of the users could be too high compared to their abilities and it was decided to go further and further develop them. The two best programmers, Zeev Suraski and Andi Giutsmas, have taken on the challenge of completely rethinking the way we work and rebuilding PHP. They created version 4.0 by improving the parser and adding scripting support. The main changes in this version are object-oriented, session-like, encrypted, ISAPI (Internet application programming interface for Internet servers), support and compatibility with the Microsoft IIS (Internet Information Services) web server, connection to Java objects. In addition, hundreds of new features have been added that significantly improve language skills. The language began to be treated seriously. Work on version 5.0 began soon. This version was a turning point because it did not include as many changes and innovations as previous versions, but focused on improving and enhancing previously introduced features. With the introduction of version 5, the popularity of PHP has increased to 19 million pages. The current version is 5.3.3, while PHP 6 is currently under development.

1.4 MySQL

As already mentioned, PHP is closely related to MySQL. MySQL is an RDBMS (Relational Database Management System) solution, i.e. a relational database management system. It is currently the most popular open-source database software in the world; according to statistics, more than 100 million copies of the application have already been distributed [9]. The history of MySQL dates back to the 1970s, and its creators are considered to be employees of the private Swedish company TcX AB: Michael "Monty" Widenius, David Axmark and Allan Larsson. Monty originally wrote UNIREG, an interface that allowed direct access to ISAM (Indexed Sequential Access Method) databases. At one point TcX began working on network applications, which led to a search for better solutions than those offered by UNIREG. The programmers' attention was attracted by mSQL, written by David Hughes, which stood out on the market with a pleasant API (Application Programming Interface) and a low price. Unfortunately, this solution was not fully compatible with UNIREG, and so the first concept of creating MySQL was born. TcX decided to continue developing UNIREG, but with an API very similar to the one implemented in mSQL. In 1995 the company prepared the first version of its application, and a year later it released it under the name MySQL. The product was licensed in two ways: as open source, to expand the number of users and promote the product, and in a commercial version, which funded development and further improvements. TcX AB was transformed into MySQL AB. Most users valued the product for its fast operation, ease of use and scalability, overlooking its shortcomings and the advanced features that hardly anyone used. Another argument for using this solution was its adoption by and cooperation with the most popular technologies, such as PHP, Java or Python. Today MySQL is a major part of LAMP (Linux, Apache, MySQL, PHP/Perl/Python), the fastest-growing set of open-source solutions. In subsequent versions MySQL confirmed the trust of its customers and introduced innovations that developed the project and addressed the identified shortcomings [10]. All this meant that MySQL became an important player in the database market and can compete with the largest companies in this segment. The success of MySQL lies in its rich functionality and ease of use (Fig. 6), speed and reliability at a very low cost. MySQL was acquired by Sun Microsystems in 2008, which was in turn acquired by Oracle in January 2010.

Fig. 6 MySQL Workbench
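The PHP + MySQL combination mentioned above can be sketched as follows; this is an illustration only, with invented connection data and an assumed companies table, and it uses the mysqli extension rather than the older mysql_* functions that code of that period often relied on.

<?php
// list_companies.php - illustrative sketch; credentials and table name are assumptions.
$db = new mysqli('localhost', 'crm_user', 'secret', 'crm_demo');
if ($db->connect_error) {
    die('Connection failed: ' . $db->connect_error);
}

// Read and print rows from a hypothetical "companies" table.
$result = $db->query('SELECT name, city FROM companies ORDER BY name');
while ($row = $result->fetch_assoc()) {
    echo htmlspecialchars($row['name']) . ' (' . htmlspecialchars($row['city']) . ")<br>\n";
}
$db->close();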

1.5 JavaScript Language

Everyone wants their site to be dynamic and to respond to events and to changes introduced by visitors. JavaScript (JS), a scripting language that complements plain HTML, is useful here; using these two technologies together with CSS produces so-called DHTML (Dynamic HTML). JavaScript is both a programming language and a scripting language, allowing the author to write code and instructions that browsers should follow. A few years ago, Netscape Communications had LiveScript in its browser. At the same time, Java, the object-oriented language of Sun Microsystems, began to develop very strongly; one of its features was applets, i.e. mini-applications launched directly from the browser window. In the second version of its browser, Netscape added Java support and LiveScript changed its name to JavaScript, which was meant to popularise the solution among web developers. This clever marketing approach was a hit, but to this day it causes a lot of confusion among beginning programmers. The main similarity between Java and JS is that both are object-oriented languages, but the biggest difference is the approach: JavaScript runs on the client in the browser, while Java runs on the server [11]. The main advantages of JavaScript are:
• Ease of use—function names resemble real operations; for example, to start an action on a mouse click it is enough to use the "onClick" handler.
• Code transparency—JavaScript code can be entered directly in the header of an HTML document or in an external file with a ".js" extension.
• Speed—JavaScript is an interpreted language, so the progress of the work can be followed without compiling the code. The advantage and disadvantage of this solution is that a script may work well for the first 20 lines of code while the last 10 are broken; no compiler means no warnings or error messages.
The original goal of JavaScript was to support form validation, which until then had been handled on the server side by languages such as PHP. Checking the data required a connection to the server, which significantly extended the response time and increased the amount of transmitted data in times of slow Internet connections. Thanks to JS, all of this work is done in the browser window, and only after the initial verification is the data sent to the server. JavaScript consists of three parts (Fig. 7): the kernel, i.e. ECMAScript, the Document Object Model (DOM) and the Browser Object Model (BOM) [12].

Fig. 7 JavaScript structure

1.6 AJAX

AJAX (Asynchronous JavaScript and XML) is not really a single technology but a set of technologies combined into one very functional tool. It uses XHTML and CSS to present data; the DOM (Document Object Model) is responsible for the dynamic display of data and for user interaction; asynchrony is provided by XMLHttpRequest; and XML or XSLT (Extensible Stylesheet Language Transformations [13]) is responsible for accessing and transforming data. All of this is tied together using JavaScript. AJAX is the most important element of Web 2.0, i.e. the new trend and idea of creating interactive websites. AJAX arose as a response to attempts to make websites resemble desktop applications and to give them comparable functionality. The first person to use the name AJAX was Jesse James Garrett of Adaptive Path. To understand how AJAX works, its name has to be broken down into its individual parts.

Asynchronous—on standard websites the user sends a query to the server and waits for a response; after receiving it, the page is reloaded and the results are displayed (Fig. 1). This is acceptable when everything goes right, but what if a mistake was made and the entire form has to be filled out from the beginning? With AJAX the situation is different. A query sent while the user is working goes first to the AJAX engine and only then to the server. The user can keep working, the application does not stop, the page is not reloaded, all displayed data remains visible and further queries can still be issued; all commands and operations run in the background. When the engine receives a response from the server, it is displayed on the screen at a location specified by the programmer. Figure 8 shows this process.

Fig. 8 AJAX operating diagram

JavaScript—JavaScript plays a very important role in AJAX; for the user it is the "face" of the solution. It sends queries to the server, receives answers and is the basic way of displaying data. The element that makes these changes and the so-called background work possible is an object built into JS: XMLHttpRequest. XML—XML is the most popular way of transferring and describing documents on the Internet; data is sent to and received from the server in the form of XML files. AJAX also works with DHTML and CSS. These two technologies are very useful for dynamically displaying the content of a page and for changing only some of its elements without constantly refreshing it. All of these technologies have existed since 1995 but were used only sporadically; it was their use by Google that popularised this approach and led to the coining of the name AJAX [14]. The main disadvantage of using AJAX is that the standard built-in "back" and "forward" buttons of the browser often stop being useful, because all the work is done on one page.

This chapter has presented the technologies used in the project created as part of this work. Their combination makes it possible to create a fully functional application that meets current standards and requirements; users, even though they work in a browser, will feel as if they were working with a large and advanced desktop system. The advantages of the application include a useful, clear interface, a stable connection and a flexible database.
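On the server side, an AJAX request of the kind described above ends at an ordinary PHP script that returns only the fragment to be placed in the target div. The sketch below is illustrative: the endpoint name, parameter and output are assumptions, not part of the created system.

<?php
// ajax_fragment.php - hypothetical endpoint called in the background by XMLHttpRequest.
// It returns an HTML fragment; the browser-side script inserts it into a chosen <div>
// without reloading the page.
header('Content-Type: text/html; charset=utf-8');

$topic = isset($_GET['topic']) ? htmlspecialchars($_GET['topic']) : 'general';

// Only a fragment is sent back, not a complete page.
echo '<p>Fragment for topic "' . $topic . '" generated at ' . date('H:i:s') . '.</p>';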

2 Description of the Created System

The paper presents a solution designed for a fictitious company, "Business Solutions" (Figs. 9 and 10). It is a company engaged in the production, sale and implementation of enterprise software supporting business management in various fields. The "Business Solutions" offer is focused on the small and medium-sized enterprise (SME) sector. Business Solutions offers the following applications:
• "BS-Finance"—supports the company's accounting area; it allows an individual chart of accounts and custom document types to be defined, simplifies bookkeeping and account assignment, and allows standard statements such as the balance sheet or the profit and loss statement, and many more, to be generated.
• "BS-Human Resources"—a solution supporting the human resources and payroll area of the company. It automates processes such as the calculation of wages, benefits and holidays, keeps records of employee data and helps in creating declarations for the authorities.
• "BS-Sales"—supports the sales and warehousing areas; it allows documents of various types to be defined and issued in the categories of sales, purchasing, warehousing and orders. The system can generate detailed statements that allow various types of transaction analysis.
• "BS-Measures"—the program keeps records of the company's fixed and intangible assets. It handles the purchase, sale, inventory and depreciation of fixed assets.
• "BS-Analyzes"—the system extends the range of reports and summaries obtained from the "BS-Finance" program. It allows the company's financial situation to be analysed, budgets to be planned and their implementation to be controlled.
• "BS-Little Accounting"—an application supporting small businesses in keeping a book of income and expenditure.
• "BS-Declaration"—an application that allows reports generated in other modules to be sent directly to the authorities.

Fig. 9 Scheme of hierarchy of functions, part 1

Fig. 10 Scheme of hierarchy of functions, part 2


2.1 Description of the Problem and System Requirements

The software that the company produces depends on applicable and frequently changing laws, so it is important to maintain constant contact with the customer. The customer must be informed of changes in Polish legislation and should know his applications, their versions and serial numbers well. Launching the platform allows continuous contact with current customers, but it also supports informing potential customers and presenting offers to them, increasing sales opportunities, which in turn leads to increased profits, i.e. to realising the basic mission of every company. The functionality of the system is shown in Fig. 11. CRM (Customer Relationship Management) is a method of cooperating with clients, gathering information about them, and a set of activities and procedures that facilitate this cooperation. The application created as part of this work covers most areas of standard CRM software, goes beyond the average scope and tries to eliminate the most common problems.

Fig. 11 Scheme of use cases of the created system

The first element worth paying attention to in the implementation of a CRM system is the availability of the data stored in it. CRM applications are often not very mobile: for the user to have access to the data, he must work in the office. A common workaround is to export part of the database and take it along to the client, then update and re-import the corrected data into the system. However, this solution is associated with a limit on the number of exported records (we do not always know which fragment will be needed), as well as with the risk of those records being updated by other employees while the colleague is out of the office. The project proposes to place the application on an Internet platform, so that employees can access the database from anywhere in the world without having to come to the office and perform "exports" and "imports" of data (Fig. 12). An alternative is to use terminal servers and connect remotely. This is also a good solution, but the web application has another advantage—it can be opened from any electronic device with a browser and Internet access, such as the mobile phone in one's pocket. The structure of the database is shown in Fig. 13.

Fig. 12 Communication with the application

Fig. 13 Database structure

Another problem facing CRM systems is keeping data up to date. Customers change telephone numbers and addresses, lay off and hire employees; databases live their own lives. If we do not contact a client for too long, his data may become outdated, which may result in the loss of any relationship with him. In this project it was decided to make the database update itself. This was achieved by a simple method: giving the customer access to it. If the customer is in any way encouraged to keep his data up to date, the likelihood that he will actually update it increases. In the proposed system the client uses a number of functions; he must log in to the system to check current offers, and resolving one matter increases the likelihood that he will be interested in others. Let us try to analyse the requirements that a customer of an IT company may set for its vendor. First, every customer wants to be served quickly, efficiently and comprehensively. Several mechanisms were therefore implemented in the created system to support the relationship between the customer and the manufacturer (Fig. 14). The exchange of information must take place in both directions. The message box on the customer panel allows the manufacturer to send simple information about current events; for example, it can include information about issued documents, overdue payments or other activities performed. This ensures that, should traditional means of communication fail (an e-mail is lost, a message does not reach the recipient or the telephone lines are down), the system records whether such contact has taken place and the customer knows what happened. Another place where the seller provides information to the customer is the Promotions tab. Here the customer can always find detailed information about current marketing activities, check the regulations, download the order form and, under very advantageous conditions, expand or upgrade his system.

Fig. 14 Data exchange


The user is also provided with a "Contact" tab, which contains a form with several predefined problem topics and space to describe the case. Using this mechanism, the user can be sure that his request will go to the right person and be resolved. An important element of contacts between companies is the handling of legal matters. In such cases the client has to fill in the relevant statements or requests, which often means sending many pages of documents. An applications section was therefore developed in the created system, in which users gradually fill in the necessary fields; after confirming this action, the system generates finished documents in .rtf format, which require only the signature of an authorised person. This solution saves a lot of the time otherwise spent on laborious data entry, and the user can be sure that the procedure has been performed correctly. Customer satisfaction is very important here. The mission of every business is to maximise profits, and therefore a price calculator was implemented in the presented system. The customer does not have to call the salesman or visit the company's branches: from his panel he can open the Offer tab, select the product he is interested in, and within a few seconds he receives a price calculation. The result is a mutual benefit: the customer receives a specific offer, and the seller saves the time needed to prepare it and can spend it looking for new contracts. An important thing for a customer buying software is being able to check its properties. The user must know what software he may use and to what extent, what the serial numbers of his licenses are and on how many seats the application may be operated. For this purpose a Licenses tab was introduced on the user's panel, in which the customer can check the status of his purchases, all the above parameters and even the date of purchase. Such information is very important in the case of official inspections and in the event of losing the invoices or the software boxes on which this information is printed. To obtain information about any of these features, the user must log in to his client panel. He is also entitled to edit his contact and employee details there; by providing this possibility, the seller encourages the customer to check and update this information frequently. On the Business Solutions side, the system provides the employee with control over all customers and allows company data to be edited and new customers to be entered. Customers are acquired in two ways. The first is that the customer registers himself: at present it is possible to register using the prepared form or to be entered directly into the system by the administrator. The second option is to find a prospective customer and try to persuade him to buy; in this case the salesperson enters the customer in the database, and if the sales process is successful, the company information is verified again and a new login is added for the client to use. Usually one login name is assigned to each customer: most companies have a designated decision maker who should have access to this type of information. However, there are situations where the structure of a company is so large that several people handle the software, so a mechanism is available that allows multiple logins to be added to a single counterparty. The system also allows information to be sent to customers in the form of messages and records all contacts that have occurred with them. Reliable recording of all events is a very important issue, especially when it comes to assessing business opportunities, as well as in complaint proceedings. After completing a transaction, the employee can use the system to directly add new licenses and generate serial numbers for them. During contacts with the client, the employee can obtain information about changes in the client's staff or contact details; the system of course allows all data related to employees, their names, telephone numbers and positions, to be updated. The system has a built-in customer search module based on specific criteria. Several basic fields are currently available, but additional fields and more detailed filtering can be introduced if necessary. A CRM system should be integrated with the other management modules in the company; unfortunately, such an implementation is not possible in the proposed model. There are several solutions that could be applied in subsequent versions of the application, the first being the integration mentioned above. A very interesting solution would be to connect the system with the sales and accounting modules. The accounting module could generate payment statements, the current account balance and statements from the customer's account; this information would appear on a separate tab called "Finance" and would be presented in tabular form. As a result, the customer would have full control over the status of his receivables and payables towards the manufacturer. The advantage of integration with the sales system would be the possibility of issuing advance invoices on the basis of electronic orders placed by customers. Another advantage of such a solution would be a "Documents" tab, in which the user would see a list of issued documents together with their due dates and settlement status. Every customer wants the software he buys and implements to be infinitely functional, flexible and stable, as well as extremely inexpensive, easy to use, easy to modify and tailored to his specific needs. The answer to all these requests is LAMP, i.e. Linux, Apache, MySQL and PHP. It is a server platform based on open-source solutions: Linux is the most popular free operating system, Apache the most popular free HTTP server, PHP the most popular free programming language for websites and MySQL an extremely stable and efficient database available to companies at a very low cost. For companies, the proposed variant is WAMP, i.e. with Windows instead of Linux, because companies most often use Microsoft operating systems, which is due to the number of commercial applications released for this platform. Based on an analysis of the theoretical assumptions of CRM systems, a proprietary application was designed and implemented. The functionalities and methods used fully meet the needs of the IT company, and the costs of their implementation are reduced to a minimum. A detailed description of how the application works and of the mechanisms used is given in the following chapters.


3 Technical Documentation

The developed application consists of two parts. The first is intended for the client, i.e. for Internet users. The second part is for the administrator and is available only from the company's internal network. The aim of this division is to increase the security of the data stored in the system [15–17]. The application is located in the main directory bs_crm, which is divided into two functional directories: user, i.e. the part visible to clients, and administrator, i.e. the part visible to company employees. Each of the directories has an identical subdirectory structure.

3.1 Technical Specification of the Customer Section

The user directory contains the index.php file and the sites subdirectory; a number of subpages and catalogues can be found in the sites directory. Index.php is the first page the system user encounters. Its most important elements are the following. The login form consists of two fields, for the login name and the password. After the data is entered, the form sends the login name and password to the index.php file, where a function checks the correctness of the data and returns the appropriate message. The login and password fields accept only the letters A to Z and the digits 0 to 9, and each must have at least 5 characters. The security of the entered data is ensured by two mechanisms [18, 19]: the first is validation of the form written in JavaScript, which displays instructions and warnings; the second consists of validation filters stored in the PHP code. The form also includes a "Submit" button with the AJAX function onclick="SendRequest()" attached, which is responsible for passing the result of processing the form to a specific sector (div). A very important function is checkPass(), which verifies the password. Based on the information provided in the login form, the page connects to the database and compares the entered login name and password with those already registered. This action is supported by a query that selects the stored password from the user table for the entered login name ($user). The system first searches for the user; if it finds him, it reads the password stored in the database and compares it with the password entered by the user. If the procedure succeeds, the value "true" is written to the "logged in" session variable and the user is redirected to the main menu located in the main.php file. If there are discrepancies, the function returns an explanatory message. The function also checks whether the session variable is already set; if so, a user who reaches the login screen is taken directly to the menu. Figure 15 illustrates this process. Link to the registration form—a user who does not have an account is sent to the registration_company.php file for registration.
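The checkPass() flow described above can be sketched as follows. The table and column names (users, name, password), the session keys and the connection data are reconstructions and assumptions, not the original listing; the sketch also uses a prepared statement, whereas code of that period often concatenated the query string directly.

<?php
// check_pass_sketch.php - a hedged reconstruction of the login check described above.
session_start();

function checkPass($db, $user, $pass) {
    // Read the stored password for the given login name (table/column names assumed).
    $stmt = $db->prepare('SELECT password FROM users WHERE name = ?');
    $stmt->bind_param('s', $user);
    $stmt->execute();
    $stmt->bind_result($stored);
    $found = $stmt->fetch();
    $stmt->close();
    // Plain-text comparison for illustration only; a real system should store hashes.
    return $found && $stored === $pass;
}

$db = new mysqli('localhost', 'crm_user', 'secret', 'crm_demo');
$login    = isset($_POST['login']) ? $_POST['login'] : '';
$password = isset($_POST['password']) ? $_POST['password'] : '';

if (checkPass($db, $login, $password)) {
    $_SESSION['logged_in'] = true;     // the "logged in" flag from the description
    $_SESSION['user']      = $login;   // user name kept for the following pages
    header('Location: main.php');      // redirect to the main menu
    exit;
}
echo 'Incorrect login name or password.';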


Fig. 15 Login scheme

Company_company.php—the file contains a registration form called "registration", which must be completed in order to become a potential customer of the company. The form contains the following fields (an illustrative validation sketch follows the list):

• Name—2–60 characters, letters A–Z, digits 0–9,
• NIP—digits and the character "-" only, input format: AAA-BB-CC-DDD,
• City—letters only, minimum 3 characters, maximum 30 characters,
• Street—2–50 characters, letters A–Z, digits 0–9, the special character "." allowed,
• House number—digits only, maximum 5 characters,
• Apartment number—digits only, maximum 5 characters,
• Phone number—digits only, maximum 5 characters,
• Fax number—digits only, maximum 5 characters,
• Mail—letters A–Z, digits 0–9 and the special characters "@", ".", "-", "_",
• Industry—a drop-down list with predefined values,
• Employment—a list with predefined values,
• Turnover—a drop-down list with predefined values.
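The PHP-side validation filters mentioned in the next paragraph can be illustrated with a short sketch covering three of the fields listed above; the field names, regular expressions and messages are assumptions, not the original code.

<?php
// validate_registration_sketch.php - illustrative server-side checks for a few of the
// registration fields listed above; names and patterns are assumptions.
$nip  = isset($_POST['nip'])  ? trim($_POST['nip'])  : '';
$city = isset($_POST['city']) ? trim($_POST['city']) : '';
$mail = isset($_POST['mail']) ? trim($_POST['mail']) : '';

$errors = array();

// NIP in the format AAA-BB-CC-DDD (digits and dashes only).
if (!preg_match('/^\d{3}-\d{2}-\d{2}-\d{3}$/', $nip)) {
    $errors[] = 'NIP must have the format AAA-BB-CC-DDD.';
}
// City: letters only, 3-30 characters.
if (!preg_match('/^[A-Za-z]{3,30}$/', $city)) {
    $errors[] = 'City must contain 3-30 letters.';
}
// Mail: letters, digits and the characters @ . - _ only.
if (!preg_match('/^[A-Za-z0-9@.\-_]+$/', $mail)) {
    $errors[] = 'E-mail contains characters that are not allowed.';
}

if ($errors) {
    foreach ($errors as $message) {
        echo htmlspecialchars($message) . "<br>\n";
    }
} else {
    echo 'Data accepted.';
}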

The input formats are enforced using JavaScript (walidacja.js) and the values are additionally validated by PHP code. Messages are displayed in the reply block. After confirmation, the information from the form is sent to the registr_firme.php file. This page reads the data received from the form defined in the registration.php file, checks its correctness and verifies whether the company is already stored in the database; if it is, it sends a message stating that no record has been added. The verification key is the tax identification number (NIP), because it is unique. In addition to displaying the data, the page also generates a form in which the user enters the proposed login name and password (letters and digits, at least 6 characters), confirms them and sends them to the registration_login.php file. In addition, the company's NIP number is sent in a hidden field.


If all the data is correct, the record is stored in the database. Each login name and password is assigned a NIP number, so that the system knows which company's information the user may view. On the following pages, before displaying any content, the application checks the user name stored in the session variable during login and looks up the NIP number assigned to that login in order to display the relevant entries from the tables.
Registration_login.php—the addsUser() function is part of this file. The function checks the correctness of the entered login name and passwords in terms of the number and type of characters, verifies whether the given login already exists in the database, and also checks whether the password was entered correctly. In case of discrepancies, a message is displayed.
Main.php—a page containing the navigation menu for the entire site, from which the most important functions can be reached. At the beginning of the document the external CSS style sheets describing the appearance of the entire site are declared: main_styl.css, tables_styl.css, text_styl.css, form_styl.css. A style that changes the default icon in the browser's address bar is also included. The page contains several blocks laid out with CSS:
• a block containing graphics with the company logo,
• a block containing information about the logged-in person, taken from the session variable. It also holds two links, to the settings.php and logout.php pages. Logout.php clears the value of the "logged in" session variable, displays a logout message and sends the user back to the login screen. Settings.php is a page containing a form for changing the password; after the new password is entered, it is sent to the change_pass.php page, where, based on the session variable, the login is found and the new password is written to the database, replacing the old one,
• a block with the navigation menu and links to the subpages: news, company data, people, licenses, contact, applications, promotions, offers. Clicking one of these links does not reload the entire page; the content of the called file is displayed in the "contents" block, using the CSS styles declared in the glowna.php file,
• a block that displays the content of the called page. If no page is selected, a predefined message is shown; if the selected page does not exist, a warning is displayed.
Messages.php—this page contains a set of commands that filter from the database all messages that have been entered for the company. The script checks the name of the logged-in user, finds the assigned NIP number, selects the contents of the fields date, name and contents from the "Messages" table and creates a small table for each record. After opening this page, a list of such tables is displayed (Fig. 16).


Fig. 16 Message display
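A hedged reconstruction of the messages.php flow is sketched below. The field names date, name and contents follow the description, but the remaining table names, the login-to-NIP lookup and the session keys are assumptions.

<?php
// messages_sketch.php - illustrative reconstruction of the flow described for messages.php.
session_start();
if (empty($_SESSION['logged_in'])) {
    header('Location: index.php');   // users who are not logged in return to the login page
    exit;
}

$db = new mysqli('localhost', 'crm_user', 'secret', 'crm_demo');

// 1. Find the NIP assigned to the logged-in user (table/column names assumed).
$stmt = $db->prepare('SELECT nip FROM logins WHERE login = ?');
$stmt->bind_param('s', $_SESSION['user']);
$stmt->execute();
$stmt->bind_result($nip);
$stmt->fetch();
$stmt->close();

// 2. Select the messages addressed to that company and print one small table per record.
$stmt = $db->prepare('SELECT date, name, contents FROM Messages WHERE nip = ?');
$stmt->bind_param('s', $nip);
$stmt->execute();
$stmt->bind_result($date, $name, $contents);
while ($stmt->fetch()) {
    echo '<table border="1">'
       . '<tr><td>' . htmlspecialchars($date) . '</td><td>' . htmlspecialchars($name) . '</td></tr>'
       . '<tr><td colspan="2">' . htmlspecialchars($contents) . '</td></tr>'
       . '</table>';
}
$stmt->close();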

company_company.php—based on the user name stored in the session variable, the page finds the associated company and generates a form for changing the company data. By default, the fields of the form contain the information declared during registration (Fig. 17). The user can edit all data except the NIP number; a change of the NIP number is connected with legal changes and requires an official document, so it is made at the customer's request after a special declaration has been filled in. After the form is filled in, the data is sent to the data_change.php file and the corresponding messages are displayed using AJAX in the "response" div block.
Change_data.php—the page checks the correctness of the entered data and updates the database with it.
Ludzi.php—the system checks the logged-in user and generates a list of people working in his company. The data is presented as forms filled with default values taken from the Employees table, which contains the fields:

Fig. 17 Change of contact details


• Name—at least 3 characters, letters only,
• Last name—at least 3 characters, letters only,
• Telephone—input format KK-NNN-NN-NN, digits and the character "-" only,
• Mobile phone—input format NNN-NNN-NNN, digits and the character "-" only,
• Mail—letters A–Z, digits 0–9, the characters "@", ".", "-", "_",
• Position—letters only,
and two hidden fields:
• Id_pracownik—zero value; the database assigns the next number automatically (auto-increment),
• Company_name—taken from the logged-in user.
After confirmation, the data is sent to the change_person.php file. A form for adding new contacts has been prepared in the same way, except that it has no default values and its contents are sent to the dodanie_osob.php file.
Change_person.php—checks the correctness of the data, finds the selected record and updates it (Fig. 18).
Add_person.php—validates the data and enters it as a new record in the table.
Licences.php—the system checks which user is logged in, connects to the database and, based on the NIP number, reads the contents of the License table, in which the following fields are declared: license_id, program_name, version, number_of_places, license_number, company_number, sale_date, activation. It then finds the licenses assigned to the given NIP number and displays the contents of the fields in the form of an HTML table (Fig. 19).
Contact.php—the code contains a form with the following fields:
• case—a selection field with a declared list of options to choose from,
• content—a text field in which the user can describe the problem,
• user—a hidden field containing the name of the logged-in user.
After confirmation, the data is sent to the send.php file and a success or error message is displayed in the "response" block.

Fig. 18 Change of employee data


Fig. 19 License Information

Send.php—the file contains the code needed to use and configure the PHPMailer component: the host name, login and password for the SMTP server, as well as information about the message itself: the recipient, the sender, the sender's name, and the subject and content taken from the form. It also contains the messages displayed when the letter is sent or when an error occurs.
Promotions.php—the appearance of this page depends on the company's current promotion. The content is written in HTML; its appearance, links and functionality depend on the company and its configuration.
Offer.php—the page contains a form in which two selection lists are declared: the first contains the programs available in the offer, the second a list of available services and products. There is also one text field in which the user enters the number of positions (seats) to be used in the application. After confirmation, the data is sent to the calculate_oferte.php file and the result is displayed using AJAX in the "response" block.
Calculate_oferte.php—after receiving the data from the offer form, the code uses a switch statement to check which program has been selected and which price multiplier corresponds to it. It then checks which service the user has chosen and selects the appropriate price from the Goods table in the database. Finally, the product price is calculated using the formula [multiplier * (service price * number of seats)] / 2 (Fig. 20); an illustrative sketch of this calculation is given after the field list below.
Applications.php—in this window the customer selects the application (declaration) he wants to send (Fig. 21). Three options are proposed: rewriting the license (1_prescription.php), reducing the number of places (2_reduction.php) and changing the tax identification number (3_change_nip.php).

Fig. 20 Price information

Fig. 21 Submission of applications

1_przepisanie.php—the page generates a form in which the user enters data about the company to which he transfers his licenses and about the person responsible for contacts with the software manufacturer. The form fields are as follows:
• company name—2–60 characters, letters A–Z, digits 0–9,
• company NIP—digits and the character "-" only, input format: AAA-BB-CC-DDD,
• company city—letters only, minimum 3 characters, maximum 30 characters,
• company street—2–50 characters, letters A–Z, digits 0–9, the character ".",
• company house number—digits only, maximum 5 characters,
• company apartment number—digits only, maximum 5 characters,
• company phone number—digits only, maximum 5 characters,
• company fax number—digits only, maximum 5 characters,
• company mail—letters A–Z, digits 0–9, the characters "@", ".", "-", "_",
• Name—at least 3 characters, letters only,
• Last name—at least 3 characters, letters only,
• Telephone—input format KK-NNN-NN-NN, digits and the character "-" only.
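The pricing logic of calculate_oferte.php, referred to above, might look roughly like the sketch below. The multiplier values, the Goods table lookup and the file name are assumptions; only the formula [multiplier * (service price * number of seats)] / 2 follows the description.

<?php
// calculate_offer_sketch.php - illustrative pricing logic; multipliers and the Goods
// lookup are invented, only the formula follows the description.
function calculatePrice($db, $program, $serviceId, $seats) {
    // Pick the price multiplier for the selected program (example values).
    switch ($program) {
        case 'BS-Finance':         $multiplier = 1.5; break;
        case 'BS-Sales':           $multiplier = 1.2; break;
        case 'BS-Human Resources': $multiplier = 1.3; break;
        default:                   $multiplier = 1.0;
    }

    // Read the unit price of the chosen service from a hypothetical Goods table.
    $stmt = $db->prepare('SELECT price FROM Goods WHERE id = ?');
    $stmt->bind_param('i', $serviceId);
    $stmt->execute();
    $stmt->bind_result($servicePrice);
    $stmt->fetch();
    $stmt->close();

    // Formula from the description: [multiplier * (service price * number of seats)] / 2
    return $multiplier * ($servicePrice * $seats) / 2;
}

$db = new mysqli('localhost', 'crm_user', 'secret', 'crm_demo');
echo 'Calculated price: ' . number_format(calculatePrice($db, 'BS-Sales', 3, 5), 2);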


After confirmation, the data is sent to the from1.php file, where the values are assigned to variables; the template in .rtf format is opened and the values are substituted for its macros. An example of the rewriting.rtf layout is shown in Fig. 22.

Fig. 22 Sample form "rewrite.rtf"

2_reduction.php—this page generates a form for reducing the number of places in program licenses. The code checks which user is logged in and displays the appropriate fields:
• name—at least 3 characters, letters only,
• last name—at least 3 characters, letters only,
• telephone—input format KK-NNN-NN-NN, digits and the character "-" only,
• target number of positions—digits only, maximum 3 characters.
After the data is entered, it is sent to the from2.php file, where it is verified and inserted into a form template named redu.rtf. In addition, the company data is read from the Company table and also inserted into the file. A sample form is shown in Fig. 23.

Fig. 23 Sample form "reduction.rtf"

3_zmiana_nip.php—the PHP code checks which user is currently logged in and then generates a form with the fields:
• name—at least 3 characters, letters only,
• last name—at least 3 characters, letters only,
• telephone—input format KK-NNN-NN-NN, digits and the character "-" only,
• company number (NIP)—digits and the character "-" only, input format: AAA-BB-CC-DDD,
• nip_dalej—a hidden field with the current NIP number of the company.
After confirmation, the data is sent to the from3.php file. The PHP code reads the complete customer data from the Company table and assigns all values to variables. It then opens the nip.rtf template and inserts the text into the file. An example of this file is shown in Fig. 24.

Fig. 24 Sample form "NIP"
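The generation of the .rtf declarations described above can be sketched as follows; the template file name, the %PLACEHOLDER% macro syntax and the example values are assumptions, and only the general mechanism of substituting form values into a prepared RTF template follows the text.

<?php
// generate_rtf_sketch.php - illustrative RTF generation by macro substitution.
// Placeholder names, the template path and the sample values are assumptions.
$values = array(
    '%COMPANY_NAME%' => 'Example Sp. z o.o.',
    '%NIP%'          => '123-45-67-890',
    '%FIRST_NAME%'   => 'Jan',
    '%LAST_NAME%'    => 'Kowalski',
);

// Load the prepared template, replace the macros and offer the result for download.
$template = file_get_contents('templates/declaration.rtf');
$document = str_replace(array_keys($values), array_values($values), $template);

header('Content-Type: application/rtf');
header('Content-Disposition: attachment; filename="declaration.rtf"');
echo $document;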

3.2 Technical Specification of the Employee Part

In addition to the client's perspective, the system also offers a number of mechanisms for system administrators. A description of the solutions used is given below.
Index.php—employees log in to the application in the same way as customers; the difference lies in the table from which the login name and password are taken, in this case the Administrators table.
Glowna.php—as in the customer version, the page contains a menu with links to all the most important functions of the application: customer, company data, people, licenses, information, new customer, new customer login, new administrator. From here the user can also call the password-change page (settings_admin.php, which sends the data to change_pass_admin.php) and log out (logout.php); these work the same as in the user version. Before opening any link, the employee must decide which company's information he wants to display. To do this, he calls client_definition_admin.php. The page generates a customer filter based on the specified criteria. The form fields are as follows (an illustrative sketch of the filtering is given after the list):
• ID—digits only, maximum 6 characters,
• Name—2–60 characters, letters A–Z, digits 0–9,
• NIP—digits and the character "-" only, input format: AAA-BB-CC-DDD,
• Mail—letters A–Z, digits 0–9 and the characters "@", ".", "-", "_",
• City—letters only, minimum 3 characters, maximum 30 characters.
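The customer filter described above might be implemented along the lines of the sketch below, in which only the criteria actually filled in are added to the WHERE clause; the table and column names are assumptions.

<?php
// filter_customers_sketch.php - illustrative filter; table and column names are assumed.
$db = new mysqli('localhost', 'crm_user', 'secret', 'crm_demo');

// Map form fields to columns of a hypothetical Company table.
$map = array('id' => 'id', 'name' => 'name', 'nip' => 'nip', 'mail' => 'mail', 'city' => 'city');

$conditions = array();
foreach ($map as $field => $column) {
    if (!empty($_POST[$field])) {
        // Escape the value and add a partial-match condition for this criterion.
        $value = $db->real_escape_string($_POST[$field]);
        $conditions[] = $column . " LIKE '%" . $value . "%'";
    }
}

$sql = 'SELECT id, name, nip, city FROM Company';
if ($conditions) {
    $sql .= ' WHERE ' . implode(' AND ', $conditions);
}

$result = $db->query($sql);
while ($row = $result->fetch_assoc()) {
    echo htmlspecialchars($row['name']) . ' - ' . htmlspecialchars($row['nip']) . "<br>\n";
}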

After confirmation, the data is sent to the company_company.php file, which processes the content of the form and displays a list of customers who meet the criteria.
Data_of_company_admin.php—this page generates a form for changing the company data. By default, the fields of the form contain the information declared during registration. The employee can edit all of the data; it should be borne in mind, however, that a change of the NIP number takes place only at the customer's request, after a special declaration has been completed. The fields available in the form are:
• company name,
• NIP,
• city of the company,
• company street,
• company house number,
• number of the business premises,
• telephone number of the company,
• fax number of the company,
• company mail,
• industry,
• employment,
• turnover.

After the form is filled in, the data is sent to the change.php file and the relevant messages are displayed using AJAX in the "response" block.
Data_change.php—the page contains a set of commands that retrieve and update the contents of tables. In this case, not only the Company table is updated but also all tables that contain a reference to the company's NIP number, i.e. the Reports, Licenses and Employees tables.
Person_admin.php—generates a list of people working in the selected company. The data is presented as forms filled with default values taken from the Employees table, which contains the fields: employee ID, name, surname, position, telephone, telephone 2, mail and the tax identification number of the company. After confirmation, the data is sent to the change_person.php file. A form for adding new contacts was prepared in an analogous way, except that it has no default values and its contents are sent to the add_person.php file.
Change_person.php—checks the correctness of the data, finds the selected record and updates it.
Add_person.php—validates the data and inserts it as a new record in the Company table.
Licenses_admin.php—the page displays a list of licenses purchased by the customer, as well as a form for entering further licenses (an illustrative sketch of adding a license is given below). The form fields are:
• program name,
• program version,
• number of positions.
Contact_admin.php—the page displays a list of all messages that have been sent to the client and also generates a form for entering additional messages.
Licenses2_admin.php—the page generates a list of several forms with default values corresponding to the purchased licenses. To transfer licenses from one company to another, the employee edits the NIP field and submits the form; the values are transferred to the variable_licences.php file and then written to the database.
Add_customer_admin.php—the page contains a form that allows new customer companies to be added. After approval, the contents of the fields go to the add_data_company_admin.php file, where they are verified and then inserted into the database.
New_login_klient_admin.php—the page contains a form that allows more than one user login to be assigned to one company. The form fields are the login name, the password and the NIP of the company. The data is sent to add_login_admin.php and, after verification, saved in the Administrator table.
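Adding a license with a generated serial number, as described for Licenses_admin.php, could be sketched as follows; the serial-number format and the exact columns of the License table are assumptions based on the fields listed earlier.

<?php
// add_license_sketch.php - illustrative only; serial format and column names are assumed.
$db = new mysqli('localhost', 'crm_user', 'secret', 'crm_demo');

// Generate a pseudo-random serial number such as BS-3F9A-1C22-7B04 (format invented).
$hash   = strtoupper(md5(uniqid('', true)));
$serial = 'BS-' . substr($hash, 0, 4) . '-' . substr($hash, 4, 4) . '-' . substr($hash, 8, 4);

$program = isset($_POST['program_name'])     ? $_POST['program_name']           : '';
$version = isset($_POST['program_version'])  ? $_POST['program_version']        : '';
$places  = isset($_POST['number_of_places']) ? (int) $_POST['number_of_places'] : 0;
$nip     = isset($_POST['nip'])              ? $_POST['nip']                    : '';

$stmt = $db->prepare(
    'INSERT INTO License (program_name, version, number_of_places, license_number, nip, sale_date)
     VALUES (?, ?, ?, ?, ?, CURDATE())'
);
$stmt->bind_param('ssiss', $program, $version, $places, $serial, $nip);
$stmt->execute();

echo 'License added with serial number ' . htmlspecialchars($serial);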


Add_admina.php—this page allows multiple logins to be entered for employees. After the form is filled in, the data goes to the add_admin_admin.php file and is saved in the database.
In addition to PHP and HTML, the system also uses several supporting technologies. JavaScript, stored in the walidacja.js file, is used to verify the data entered in the forms. The file contains declarations of the functions needed to verify the specified strings. Next to each field of the form, depending on the type of data to be entered, there is a reference to the appropriate verification function, which checks whether the entered string is correct and displays a specific message and graphic symbol depending on the situation. There are three states: bad, meaning non-compliance with the assumed conditions; intermediate, meaning partial fulfilment of the conditions; and good, meaning complete fulfilment of the conditions. The appearance of the messages is defined in the style.css sheet; the content of the messages is defined in the form itself, and the conditions to be met are specified in the walidacja.js file. Another element that PHP and HTML alone could not provide is the dynamic display of the results of queries sent to the server. For this purpose, a ready-made open AJAX library called mintAjax was used. With this script, a number of functions can be declared, each of which is assigned the identifier of the block in which its content should be displayed; code that triggers the AJAX functions is then attached to the form confirmation button. In this way the page does not have to be reloaded when the data is submitted, and the results are displayed in the selected place on the page. In summary, this chapter has given a technical description of the created application: the views needed for the system were described, as well as the most important functions and libraries, forms, fields and tables.

4 User Guide

This chapter describes and illustrates, in the form of user instructions, all the most important functions and options that the application offers.

4.1 Description of the User View Application

After entering the website address, a login window appears in the browser (Fig. 25). At this point the customer has a choice: he can fill in the required fields or click the "Register" button. The fields should be filled in according to the displayed prompts.

Fig. 25 Customer login window

If the customer does not yet have an account on the website, he should click the "Register" link; he is then taken to the registration form page (Fig. 26). The information entered must be reliable and must correspond to the information submitted to the tax office. This is very important, because these data will be used when issuing invoices and other documents.

Fig. 26 Company registration window

Next, a page opens in which the suggested login name and password should be entered (Fig. 27). The system verifies that all data has been provided correctly; if the process is successful, the user is shown a link to the login panel.

Fig. 27 Login window

After logging in to the website, the user sees the main page (Fig. 28), which contains links to all the most important functions of the application. In the upper left corner is the company logo; on the right the system shows which user is currently logged in. In the middle of the page is the navigation menu, and below it the content of the selected subpage. The user has eight options to choose from: information, company data, people, licenses, contact, applications, promotions and offers.

Fig. 28 Navigation menu

After selecting the information option, the system displays a window with all the messages that the administrators have sent to the client. They are presented as a list of tables (Fig. 29): first the creation date of the message and its name are given, and below them the entire content of the announcement. After selecting the company data link from the main menu, a window with a form for changing the company data is displayed (Fig. 30). By default, the fields show the information that the customer entered when registering on the website or reported directly to a Business Solutions employee. Here the user has the opportunity to correct and complete the data; the only value that cannot be changed is the tax identification number of the company. When the user selects the people link from the menu, a window appears in which he can enter and view his company's employees. These persons will be entitled to contact the software manufacturer and to represent their company. The data is presented in the form of forms; the form for adding users is empty (Fig. 31), while the other forms, with default data, represent the employees already entered.

Fig. 29 Message window


Fig. 30 Company data window

Fig. 31 Company employees window


Fig. 32 License window

Fig. 33 Message window

After selecting the licenses option, the system displays a window containing a list of all programs purchased by the client. The columns of the table (Fig. 32) show the license identifier, program name, version, license number, number of seats and date of sale of the product. The contact link takes the user to the contact form (Fig. 33); a customer who wants to send a question must select a topic and then describe the case in detail in the content field. Selecting the applications option from the navigation menu allows one of three applications to be submitted. After the form is filled in, document templates are generated, which are then signed and sent to Business Solutions. The user can request a change of the NIP number (Fig. 35), a reduction in the number of places in the license (Fig. 36), or the transfer of his programs to another company (Fig. 34). Based on the entered data, the program creates RTF files; an example of such a document is shown in Fig. 37. Another page that the user can call from the main menu is promotions. Here he will find the latest announcements related to the manufacturer's current marketing activities. The appearance of this page may change frequently; by default it is a six-row table in which the promotional slogan is displayed on the first line, the full description on the second, and links to the detailed rules and the order form below. An example of the table is shown in Fig. 38.

Fig. 34 License transcription form

Fig. 35 NIP number change form


Fig. 37 Generated statement

35

36

˙ D. Zabicki et al.

Fig. 38 Promotional window

Fig. 39 Price list window

The last option that the user can choose from the navigation menu is the offer. It allows the current prices of selected products to be checked: it is enough to select the program and the type of service or product of interest and to enter the target number of positions. After the form is confirmed (Fig. 39), the system calculates and displays the price.

4.2 Application Description—Employee View

As in the case of customers, the login window is the first window the user sees after opening the application (Fig. 40). It differs from the customer version in that there is no registration link; new logins are added by the administrator, whose account is configured when the system is deployed and who then adds further users.

Fig. 40 Employee login window

The main application window for employees is built in a similar way to the client screen (Fig. 41); the only difference is the larger number of options available in the menu. Before starting any work, the user must choose the company he wants to work with. He selects the filter option from the menu; after the page loads, a filter form appears with five selection criteria: id, name, tax identification number, city and mail. Any of them can be entered. After the "filter" button is pressed, a list of companies meeting the specified requirements is displayed (Figs. 42 and 43).

Fig. 41 Employees panel menu

After selecting the data option from the main menu, the user is taken to a page with the customer data declared during registration (Fig. 44). The employee can edit all of it, including the tax identification number and the company name; this feature is used in response to customer requests. Selecting the people option from the main menu opens a window containing a number of forms filled with default values. Each of them corresponds to one employee reported by the client; in addition, the system generates a form for adding a new employee (Fig. 43) who has not previously been stored in the database.

Fig. 42 Selection of the supported company

Fig. 43 Adding an employee

Fig. 44 Editing customer data


The licenses link shows a list (Fig. 45) of the programs currently owned by the customer, together with the serial numbers, number of seats, date of sale and ID of these licenses, as well as a form for adding new licenses (Fig. 46). To add a license, it is enough to select the program and version and enter the target number of seats; the system generates a serial number and enters the new license in the table. The information item of the main menu leads to a subpage containing the form for adding a message (Fig. 47); with this option the employee can enter information that the customer will see in his panel. The change page filters all of the client's licenses and creates a number of forms (Fig. 48) in which the most important information about the customer's purchases can be edited. On this card two customer requests can be handled: a request for a reduction, by entering a new number of places, and the rewriting of a license, by entering the new company's NIP. There is also a link for adding a company. The form it calls (Fig. 49) allows a new company that is not yet in the database to be added; this could be a potential customer to whom the salespeople are offering the software, or a company that has contacted the manufacturer itself.

Fig. 45 License window

Fig. 46 Add license window


Fig. 47 Window for adding messages

Fig. 48 License information edit window

The form is identical to the one the client sees when registering. Selecting the login option from the navigation menu allows additional users to be assigned to the given customer. Each customer normally declares one login, but he may request the registration of additional accounts; in addition, a customer who was previously entered by a Business Solutions employee must have a login name added. This is what the form on the login page is for (Fig. 50).

Fig. 49 Window for adding a new customer


Fig. 50 Window for adding the customer’s login name

Fig. 51 Window for adding a new employee login name

Employee accounts can be added freely; for this purpose the admin tab and the form for adding new logins are used (Fig. 51). The functions described above illustrate what the designed and built system looks like. Easy and accessible navigation menus and clear forms are the main advantages of the application; uncomplicated procedures and ease of use support ergonomics and make the website easier to use.

5 Summary

The project presented in this work describes a solution supporting the management of an IT company's clients. The work presents the technology used to create the solution, its functional requirements, the technical documentation and complete instructions for using the website. The aim of the project was to increase the functionality of a solution supporting the management of the company's clients while keeping the costs of creating and operating the system low. The following technologies were used: HTML, CSS, PHP, AJAX, JavaScript and MySQL. These are the most popular open-source solutions, which guarantee high quality and, in most cases, free use. Another advantage of using these technologies is the number of publications, professional courses and ready examples and scripts available, which can serve as models [20–22]. The paper presents a description of the functional requirements for the created project. The application must be fully adapted to the company's requirements; its role is to provide support in all areas related to serving current and potential customers. The project represents an innovative approach based on making part of the database available to the customers themselves, which enables more frequent updating and increases the reliability of the entered data. Clients connect to the servers through a web browser. Users can enter company and employee data, review offers and promotions, receive messages from administrators, and create all the necessary forms and statements. All contact details are taken from the database, from previously entered information. Service administrators, who oversee the entire application, have broader access to the data [23–25]. Another important aspect of the work is the technical documentation and the description of how specific assumptions were implemented. The individual views and functions of the created application were presented; thanks to the descriptions and diagrams, the reader can learn about the data flow in the application, the use cases and the structure of the database. The last elements presented in the work are the instructions for using the application from the perspective of the client and of the administrator. This is an extremely important issue that enables the proper operation and maintenance of the application. The project presented in the work is a solution that meets the basic requirements stated in the second chapter. However, it leaves room for significant expansion into other areas of the company's operation. It could be integrated with the financial and accounting module and the sales module, which would give the user free access to information on the status of his current settlements, receivables and payables, as well as a view of the issued documents. Integration with the service module would in turn allow continuous monitoring of reported errors and issues; the user would be able to check the current status of his requests and who is handling them [26, 27]. There are also a number of other modules that could complete the project over time as the system develops, resulting in a significant improvement and increase in the functionality of the presented application. Nevertheless, the project presented in the work is an important starting point for creating a fully professional and convenient way of managing the IT company's clients, offering a number of facilities for managers and users.


References

1. HTML, XHTML, and CSS Bible, 3rd edn. Wiley Publishing, Inc., Indianapolis, Indiana (2004)
2. Sokół, M.: ABC Języka HTML
3. www.w3schools.com/css/cssintro.asp, accessed 01.02.2011
4. Duckett, J.: Accessible XHTML and CSS Web Sites: Problem—Design—Solution. Wiley Publishing, Inc., Indianapolis, Indiana (2005)
5. York, R.: Beginning CSS: Cascading Style Sheets for Web Design. Wiley Publishing, Inc., Indianapolis, Indiana (2005)
6. Andrew, R.: The CSS Anthology: 101 Essential Tips, Tricks & Hacks. SitePoint Pty. Ltd. (2004)
7. www.php.net, accessed 01.02.2011
8. Valade, J.: PHP 5 For Dummies. Wiley Publishing, Inc., Indianapolis, Indiana (2004)
9. http://www.mysql.com/about/, accessed 01.02.2011
10. Converse, T., Park, J., Morgan, C.: PHP5 and MySQL Bible. Wiley Publishing, Inc., Indianapolis, Indiana (2004)
11. Zakas, N.C.: Professional JavaScript for Web Developers. Wiley Publishing, Inc. (2005)
12. Goodman, D., Morrison, M.: JavaScript Bible, 5th edn. Wiley Publishing, Inc. (2004)
13. Babin, L.: Beginning Ajax with PHP: From Novice to Professional. Apress (2007)
14. Lauriat, S.M.: Advanced Ajax: Architecture and Best Practices. Pearson Education, Inc. (2008)
15. Poniszewska-Maranda, A., Majchrzycka, A.: Access control approach in development of mobile applications. In: Younas, M., et al. (eds.) Mobile Web and Intelligent Information Systems, MobiWIS 2016, LNCS 9847, pp. 149–162. Springer-Verlag, Heidelberg (2016). https://doi.org/10.1007/978-3-319-44215-0_12. ISSN 0302-9743, ISBN 978-3-319-44214-3
16. Poniszewska-Marańda, A.: Security constraints in access control of information system using UML language. In: Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE-2006) (2006)
17. Stępień, K., Poniszewska-Marańda, A.: Towards the security measures of the vehicular ad-hoc networks. In: Skulimowski, A.M.J., et al. (eds.) Internet of Vehicles. Technologies and Services Towards Smart City, IOV 2018, LNCS 11253, pp. 233–248. Springer-Verlag, Heidelberg (2018). https://doi.org/10.1007/978-3-030-05081-8_17. ISSN 0302-9743, ISBN 978-3-030-05080-1
18. Poniszewska-Marańda, A.: Access control coherence of information systems based on security constraints. In: SafeComp 2006: 25th International Conference on Computer Safety, Security and Reliability, LNCS 4166, pp. 412–425. Springer-Verlag, Heidelberg (2006)
19. Poniszewska-Marańda, A., Rutkowska, R.: Access control approach in public software as a service cloud. In: Zamojski, W., et al. (eds.) Theory and Engineering of Complex Systems and Dependability, Advances in Intelligent and Soft Computing, vol. 365, pp. 381–390. Springer-Verlag, Heidelberg (2015). ISSN 2194-5357, ISBN 978-3-319-19215-4
20. Gregus, M., Kryvinska, N.: Service Orientation of Enterprises—Aspects, Dimensions, Technologies. Comenius University in Bratislava (2015). ISBN 9788022339780
21. Kryvinska, N., Gregus, M.: SOA and its Business Value in Requirements, Features, Practices and Methodologies. Comenius University in Bratislava (2014). ISBN 9788022337649
22. Molnár, E., Molnár, R., Kryvinska, N., Greguš, M.: Web intelligence in practice: the society of service science. J. Serv. Sci. Res. 6(1), 149–172 (2014)
23. Kryvinska, N., Poniszewska-Maranda, A., Gregus, M.: An approach towards service system building for road traffic signs detection and recognition. Procedia Computer Science, Special Issue on the 9th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2018), vol. 141, pp. 64–71 (2018). https://doi.org/10.1016/j.procs.2018.10.150
24. Pawlak, M., Poniszewska-Maranda, A., Kryvinska, N.: Towards the intelligent agents for blockchain e-voting system. Procedia Computer Science, Special Issue on the 9th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2018), vol. 141, pp. 239–246 (2018). https://doi.org/10.1016/j.procs.2018.10.177
25. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Endowing IoT devices with intelligent services. In: Barolli, L., et al. (eds.) The 6th International Conference on Emerging Internet, Data & Web Technologies (EIDWT-2018), March 15–17, 2018, Polytechnic University of Tirana, Albania. Lecture Notes on Data Engineering and Communications Technologies (LNDECT), vol. 17, pp. 359–370. Springer (2018)
26. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N., Ansar-Ul-Haque, Y.: A real-time service system in the cloud. J. Ambient Intell. Humanized Comput., Springer. https://doi.org/10.1007/s12652-019-01203-7
27. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Springer J. Comput. 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z

Voucher 4.0—Digitisation Potential in Voucher Sales from the Works Council’s Point of View Wolfgang Neussner and Perinne Rapp

Abstract Digitisation in the context of Industry 4.0 not only changes value creation processes and makes entire business models obsolete, but could also have an impact on various areas, such as the sale of vouchers. Value vouchers are vouchers that can be redeemed at retailers or service providers like cash. Newly developed technologies or solutions, such as the Internet of Things, Big Data, Cloud Computing, etc. are playing an increasingly important role. By using such new solutions, both effectiveness and efficiency can be increased in many areas. The sale of vouchers also benefits from digitalization, as it can be made faster, more flexible and more adaptable in the future. This paper deals with the digitization potential in the distribution of vouchers in Austrian companies from the perspective of the works council, which is an important, if not the most important, multiplier in the distribution of these vouchers. To begin with, literature on digitisation, Industry 4.0 and its integration in the value chain is discussed in order to provide a basis for the subsequent survey and to underline the relevance of the topic. The digitisation potential in value voucher sales has been determined through a systematic collection of empirical facts by means of an online survey. The insights gained were evaluated and analyzed in the subsequent step and serve to ascertain the initial situation. The evaluation showed that there are some problems in the current voucher distribution which can be solved with the help of the digitised form of the value voucher distribution "Voucher 4.0". The advantages of digitising this distribution system for the works councils and also for the employees were presented in detail in the course of the work. Keywords Digitisation · Voucher · Retail · Works Council

W. Neussner (B) · P. Rapp University of Applied Sciences Technikum Vienna, Höchstädtplatz 1, Vienna 1200, Austria e-mail: [email protected] P. Rapp e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 N. Kryvinska and A. Poniszewska-Mara´nda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_2


1 Introduction 1.1 Relevance "If technologies and society change faster than companies are able to adapt, then, according to the rules of evolution, certain types of companies will become extinct" [1]. Due to the increasing digitization and networking of the physical and digital world, the value chains and business relationships of different companies are undergoing a major change. This is illustrated in a figurative sense by the above quote from Karl-Heinz Land. Digitisation is intervening more and more in daily life. In connection with this, terms such as Industry 4.0, Big Data or the Internet of Things very often appear. Digitalisation goes far beyond the paperless office. It also describes the changes brought about by digitization and is consistently supported by the Internet and increasing networking. Networking via the Internet is achieved through laptops, tablets and smartphones, which have become permanent companions. Needs and habits have been significantly changed by technological change. The consumption of culture, movies and music has been strongly influenced by the development of the Internet. Other areas that have changed significantly are shopping behaviour, media consumption, information behaviour and work habits [1–3]. More and more people spend time in front of the computer during their working day in order to be able to collect, process and prepare information faster as a basis for decision-making. Networking via social media is used for both private and professional purposes. Communication platforms or channels such as Skype, WhatsApp, Facebook, Instagram, etc. are frequently used for this purpose [1, 2]. The increasing digitalisation is thus also having a growing influence on daily life. Various digitisation solutions are now indispensable, even though digitisation is still in its infancy and a great deal of potential is still unused [1, 2]. One business model that has not yet been digitized is the sale of discounted vouchers in Austria. When we speak of value vouchers, we always mean value vouchers (as opposed to discount vouchers) which can be purchased by works councils / trade unions from dealers or service providers who have been contractually bound in advance and which are then sold on by them, at a discount and as a service, to the company's employees or to those they represent. So far, the vouchers are only available in paper and plastic form, but not in digitalised form. In order to qualify for a rebate, it is usually necessary to purchase a minimum quantity from the retailer or service provider. These agreements are based on rebates, which are usually related to the annual purchase volume. This discount can be offered in the form of a lower purchase price or additional vouchers as a discount in kind, which the employee or represented person can then purchase from the works council or the trade union. This should not be confused with Christmas gifts in the form of vouchers which the works council gives


to employees. In this way works councils encourage exchange with those they represent on the one hand, and on the other hand it also offers the opportunity to give voters an advantage. However, this form of voucher distribution entails numerous risks. The greatest economic risk lies in the possible insolvency of the party issuing the voucher (retailer or service provider). There is a high risk of losing all the money already invested if the retailer becomes insolvent, as vouchers are pre-financed by works council funds. For better readability, from now on only works councils will be mentioned instead of works councils and trade unions, and dealers instead of dealers and service providers. There are other risks, such as vouchers being lost or damage occurring during the purchase process. Since vouchers usually also have an expiry date, there is also the risk that the vouchers become worthless [1, 4, 5]. In order to reduce these risks, a trading platform is therefore to be created under the name "Voucher 4.0". "Voucher 4.0" aims at the digitization of the voucher business in Austria. In this context, digital access is to be used to guarantee employees around-the-clock access, seven days a week, during which they can purchase vouchers. This would also be advantageous for employees in that they would be able to purchase these vouchers via the app directly in the local retailer's shop and would therefore not have to buy them in advance without knowing whether they will receive the desired goods or not.

1.2 Initial Situation Works councils offer company employees the opportunity to purchase discounted vouchers from selected Austrian retail companies and service providers. After negotiating discounts with the selected retailers, the works council is responsible for reselling them to the employees. This can be on demand or on stock, depending on demand and the number of shops where the vouchers can be redeemed. The vouchers are pre-financed from the funds of the works council fund. In the event of the dealer's insolvency, this financing can lead to a total loss, as the vouchers represent a claim against the dealer that can become worthless in the event of insolvency. Furthermore, the distribution of the vouchers is very time-consuming for the works council/trade union and involves a high administrative effort, as employees/members always have to request the vouchers first and then receive them in a subsequent step. Therefore, many works councils decide not to offer vouchers for the workforce. In addition, there is a handling risk, as retailers with low demand or regional access are not included in the range. The corona pandemic that has been prevalent in Austria since March 2020 strongly underlines the importance of digitizing the sale of vouchers. Due to home offices, many employees have or had no access to the vouchers, and after the reopening of the offices, purchasing vouchers was made even more difficult by the shift operation in many offices (access to the office only every three weeks). Furthermore, the retailers are suffering from Covid-related declines in sales and are looking for


ways to avoid discount battles and achieve “healthy” sales. Voucher 4.0 can make a contribution in this respect.

1.3 Objective Based on the problems and challenges outlined above in Chap. 1.1, the aim of this paper is to answer the following questions:

• Are the works councils surveyed satisfied with the current voucher sales system?
• What problems arise with the existing voucher sales system?
• What do the processes (currently) look like in the companies?
• To what extent are the voucher offers taken up by the employees?
• Would the implementation of "Voucher 4.0" solve the existing problems described by the works councils surveyed?
• What are the advantages for works councils of handling voucher sales via "Voucher 4.0"?

The aim of this thesis is to determine the actual situation in companies from the works council's point of view by systematically recording empirical facts by means of an online survey. By means of this investigation, the motives for voucher sales as well as the current processes, existing problems, suggestions for improvement, etc. are to be investigated and subsequently evaluated. In addition, it will also be examined whether there is a demand for centralised processing of voucher sales.

1.4 Structure of the Work The first section of the thesis deals with the digital trading platform “Voucher 4.0”. Here it is explained how the concept is designed without going into the works councils in more detail. In addition, the process is also described in more detail. The next step is to work out the advantages for the works councils by introducing “Voucher 4.0”. Subsequently, by means of literature research, the topics digitisation, industry 4.0 and their integration into the value chain will be discussed in order to ensure an adequate basis and to demonstrate the importance of the topic. Furthermore, current trends and developments will also be discussed. The third part of this thesis is the empirical part, the survey of works councils which was conducted online. First the definition of the method is explained and then the procedure of the survey is discussed. In this passage, the self-created survey is then evaluated. Special attention is paid to the actual situation as well as to the problem definition and improvements of the processes. Finally, it will be evaluated whether there is potential for digitization in the distribution of vouchers in Austrian companies from the works council’s point of view.


The last section presents the summary and the outlook of the master thesis. Here the research questions are answered and a final result is presented.

2 Voucher 4.0 "Voucher 4.0" is used for the digitisation of value vouchers in Austria. By means of this trading platform, the employee is granted around-the-clock access, seven days a week, during which he/she can purchase vouchers. This means that the employee no longer has to go to the works council to buy the desired vouchers. With "Voucher 4.0", these can be purchased via smartphones, laptops, etc. at a specific discount if required. The vouchers can be redeemed both online (if the dealers so wish and are technically able to do so) and offline [6]. Vouchers are currently only available in paper and plastic form. Retailers sell vouchers in their own stores, online or via third-party distribution (e.g. on stands at food retailers), and these are only available without a discount. However, the vouchers can be purchased at a discount when sold through works councils. Retailers sell value vouchers in order to bind customers to the company and acquire new customers through "gift vouchers". In addition, the sale of vouchers increases liquidity for retailers. "Voucher 4.0" also helps to achieve a significant reduction in the amount of PVC cards that do not decompose and are usually only used once. Thus, the platform also ensures a high level of sustainability and a strong reduction of non-degradable cards [6]. "Voucher 4.0" represents an innovative solution for all three parties involved (dealers, works councils and employees). The employee of a company registers for "Voucher 4.0" on the homepage. After successful registration, an automatically generated e-mail is sent to the relevant works council of the company where the employee is employed. The works council must then confirm the employee's company affiliation and release him or her for use. Only after confirmation does the employee receive the access data for downloading the app in order to gain unrestricted access to the value voucher offers without ongoing costs [6]. After the successful download of the app, vouchers can be purchased, which are paid by direct debit. Direct debits are free of charge at most banks. A voucher code is then sent to the employee in the app so that the discounted voucher can be obtained [6]. Employees will find the works council representing them in the app, which also has the possibility to use the app for its own purposes (e.g. communication with employees) and to purchase the discounted vouchers under the function "Voucher purchase" [6]. Since the voucher no longer has to be purchased and paid for in advance by the works council through "Voucher 4.0", the risk of insolvency of the retailer is no longer relevant for the works council. The risk of insolvency is thus shifted from the works council to the purchaser of the vouchers. Furthermore, the handling risk for the works council is also more or less eliminated, as the works council no longer has


to take any financial risk. A further advantage is that vouchers that are less in demand can also be included in the offer, as the risk of a voucher becoming invalid due to the passage of time can be excluded [6].
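To make the flow described above easier to follow, the sketch below models it in PHP, the language used elsewhere in this volume. It is only an illustration under assumed names (EmployeeAccount, registerEmployee, purchaseVoucher, the notification address), not the actual "Voucher 4.0" implementation.

<?php
// Illustrative sketch of the registration, confirmation and purchase flow;
// all names and the notification address are assumptions, not the real platform.
class EmployeeAccount {
    public $email;
    public $company;
    public $confirmed = false;            // set to true once the works council approves

    public function __construct($email, $company) {
        $this->email = $email;
        $this->company = $company;
    }
}

function registerEmployee(EmployeeAccount $account) {
    // after registration, the relevant works council is notified automatically
    mail('works.council@example.com', 'New registration',
         "Please confirm the company affiliation of {$account->email}");
}

function purchaseVoucher(EmployeeAccount $account, $value, $discount) {
    if (!$account->confirmed) {
        return null;                      // access to the offers only after confirmation
    }
    $price = $value * (1 - $discount);    // e.g. a 100 EUR voucher at 10% costs 90 EUR
    // the price is collected by direct debit; a voucher code is then sent to the app
    return sprintf('Debit %.2f EUR, voucher code %s', $price, bin2hex(random_bytes(8)));
}
?>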

3 Characteristics of Digitisation and Industry 4.0 In the following chapter, the basic terms and fundamentals of digitisation will be developed and the connection between digitisation and Industry 4.0 will be established.

3.1 Digitisation The interpretation of the term digitisation is manifold and also has many meanings. The starting point was the change from analogue to digital. This can concern data, images or information. A well-known example is the burning of music onto a CD, which revolutionised the music industry. Subsequently, there is an increasing use of digitalisation in both the private and professional spheres with the aim of recording and storing information [7, 8]. On the one hand, the basis for the ongoing digitalization is the ever higher data rates that are becoming increasingly available and the ever more powerful terminal devices. Due to increased use, the price spiral has also begun to move downwards. As a result, more and more people or companies can and want to afford these end devices in order to reap the benefits of digitization. Thanks to the improvements mentioned above, investments are also paying off faster than in the past [9]. The rapid pace of digitisation, innovation and technical progress has led to the development of a wide range of applications that offer companies a multitude of growth opportunities [10].

3.1.1 Internet of Things

"It seems to me that the rapid growth of the WWW was just the spark of a much more powerful explosion. It will go off the minute things start using the Internet." [11]—Neil Gershenfeld, MIT. The technical term "Internet of Things" was first used by Kevin Ashton in 1999, who was then head of the Auto-ID laboratory at the Massachusetts Institute of Technology (MIT). The goal of this research facility was to develop computers to the point where they had the ability to organize information independently of humans. "Internet of Things" means the connection of objects via the Internet with other objects. This enables production facilities to communicate with each other regardless of location, e.g. to exchange production information [12–15].


If machines exchange information with each other, this is called "M2M communication". Here, information and data are exchanged via interfaces independently of human beings in order to achieve advantages in the production process [16–18]. An analysis by the German Federal Statistical Office in cooperation with Morgan Stanley estimates that within five years there will be more than seventy billion Internet-enabled end devices, three billion smartphone users and more than seven billion Internet users. Due to this rapidly increasing number of Internet-capable devices and users, the prices of the devices will continue to move downwards and the number of applications will rise sharply [16].

3.1.2 Big Data

The term "Big Data" is not yet clearly and conclusively defined. The scientific literature speaks of three or four dimensions that make up Big Data. The basic characteristics are volume (quantity), variety (diversity) and velocity (speed of data growth). The fourth "V" stands for veracity (reliability or truthfulness). Therefore, one speaks of either the 3 or the 4 Vs [19–21].

• Volume refers to the amount of data to be considered; in the Big Data context, this means quantities in the terabyte and/or petabyte range [17–19].
• Variety stands for the diversity or difference regarding the origin, background and composition of the data [17–19].
• Velocity is the speed at which data changes or becomes available [17–19].
• Veracity refers to the integrity of a data source [17–19].

The advantage of Big Data is seen in the up-to-dateness of the data and the high relevance of the results, whereby the following components serve as prerequisites for increasing the profitability of a company:

• the improvement of business processes
• the improved assessment of market potentials
• the recognition of previously unknown possibilities of a market
• improved cooperation with customers generated by a better understanding [19, 20]

3.1.3 Cloud Computing

Cloud computing means the possibility of using storage capacities, software applications or computer performance of third parties. The prerequisite for cloud computing is a functioning interface to the cloud, which allows access to external computing capacities. This can be used either always or only to cushion bottlenecks in computer performance or if the company’s own storage capacities are not sufficient or the investment is to be avoided as a corporate principle. Often the cloud fees already


cover the license costs of required software, which offers additional potential. Usual pay per use fee structures allow a better calculability of the running costs [22–24]. With cloud computing it is also possible to grant third parties access to the data without the need to access the company network, which entails security aspects [22]. This advantage of cloud computing can also become a disadvantage at the same time if unauthorized persons gain unauthorized access to data in the cloud [25].

3.2 Industry 4.0 To date there is no conclusive definition or demarcation of Industry 4.0 or the fourth industrial revolution. Neither concrete procedures for implementation and use nor generally accepted standards are available. The first indicators of the fourth industrial revolution appeared at the beginning of the twenty-first century. The four stages of the industrial revolution can be outlined as follows: the first industrial revolution began in the middle of the eighteenth century. It was here that the first development of mechanical working machines and power engines powered by water and steam took place. This was the basis for the industrialization of the iron, textile and steel industry. The invention of mass production based on the division of labour with the aid of conveyor and assembly lines is described as the second industrial revolution, which took place towards the end of the nineteenth century. The third industrial revolution refers to the automation of production through the use of information and communication technologies and electronics, driven by the German economic miracle, beginning in the 1960s. Even if there is no uniform definition, the fourth industrial revolution continues to focus on information and communication technologies supplemented by cyber-physical systems (CPS) [1, 23, 24]. The following figure describes the four stages of the industrial revolution [1] (Fig. 1). The term "Industry 4.0" was first coined in 2011 and is used repeatedly to emphasize the competitiveness of German industry and to point out the (necessary) changes, although at the same time it is not defined what exactly is being talked about. On a scientific level, a discourse is being held on whether this is a "revolution" or an "evolution". Supporters of "revolution" argue that the changes in the world of work are radical, which is by no means sufficient for supporters of "evolution". To give an overview of different approaches to the definition of "Industry 4.0", a summary of common definitions is given here [1, 3, 27]. Industry 4.0 uses intelligent networking to link the virtual and real worlds together along the entire value chain. The necessary information is collected and evaluated. The aim is to build up the processes of value creation efficiently and transparently and to perfect the customer benefit with intelligent services and products. [1, 3, 9, 23, 24, 27–33]


Fig. 1 Four stages of the industrial revolution. (Source modified version taken from [26])

3.2.1 Implementation Status Industry 4.0

Industry 4.0 has become a widely used term and is gradually gaining relevance in more and more industries and applications. In the following, the status of the four major industrial nations USA, Japan, China and Germany is presented. Since 2011, German policy has been geared towards achieving or maintaining technological leadership in the manufacturing sector in order to keep the location competitive despite high wage costs, and towards developing innovative business models through Industry 4.0 that are conducive to achieving this goal [28]. In 2015, the Chinese government defined the strategy "Made in China 2025". The goal of this strategy is to introduce Industry 4.0 in industrial enterprises throughout China and to harmonise the heterogeneous landscape regarding Industry 4.0 in order to enable major progress throughout the country. This progress, with a focus on increasing the degree of automation and the resulting improvement of the competitiveness of Chinese companies, should be implemented across all industries and company sizes [28, 29]. In 2015, Japan also launched Industry 4.0 initiatives such as the Value Chain Initiative, the IoT Acceleration Consortium, the Robot Revolution Initiative, etc. The main focus of these activities is the robotics sector, the optimization of the country's own production capacities and the development of new business models through robotics [28, 29]. The USA has a focus on the data-driven part of Industry 4.0, with the aim of developing new business models. The development of these new business models should lead to improved customer value, which in turn should result in competitive


advantages. The following sectors have been identified as the most important: production, energy, the public sector and logistics [28, 29]. Germany's Industry 4.0 strategy led to the country quickly assuming a pioneering role and earning itself a high reputation [30, 31].

3.2.2 Maturity Level Model

According to the acatech study "Industry 4.0 Maturity Index" (2017), digitization lays the foundation for Industry 4.0, which can be seen in the maturity model below [32]. Figure 2 shows that "computerisation" and "connectivity", collectively referred to as "digitisation", serve as a basis on which the following levels are built. The term "computerization" is understood to mean the use of information technology that is detached from other systems. The already widespread use of computerization is mainly aimed at a more efficient use of resources, especially for repetitive activities. If the devices, which previously worked in isolation, are networked with each other, one speaks of "connectivity". This serves as the basis for digitization, as it allows entire processes in companies to be tracked and controlled [32]. In order to maintain or improve competitiveness, it will not be possible in many industries to do without digitization and, based on this, Industry 4.0 solutions. The term "visibility" in connection with Industry 4.0 is understood to mean the recording and monitoring of processes with the aid of sensor technology. The higher the number of points where data is collected, the better the information situation is, in order to be able to intervene quickly in the optimization of processes. Due to the increased use of sensors, they are becoming cheaper and cheaper, which makes implementation easier, as there is better cost-effectiveness. Schuh [32] speaks of the "digital shadow", which means the digital image of a company, individual machines and/or processes. The digital shadow represents exclusively the digital image without the creation of analyses or evaluations and serves as a basis for the next steps [32].

Fig. 2 Maturity Model Digitization and Industry 4.0. (Source modified version taken from [32], p. 16)


After the data have been systematically collected in the context of "visibility", "transparency" is understood to mean the structured evaluation and analysis of the previously collected data. An essential task at this level is the structured analysis of the recorded data in order to identify problems and potentials and to derive measures from them. Identifying the essential findings from Big Data requires experience and also the right tools, whereby visualisation can be helpful here, as the ability of humans to absorb data is limited [32]. In the area of forecasting capability, the digital shadow is simulated on the basis of the previously systematically collected and evaluated data in order to learn from the generated data and generate added value for the future through simulations that take into account the probability of occurrence. Added value can be the improvement of processes or qualities or the faster recognition of problems that will occur in the future. The earlier possible problems are identified, the easier it should be to counteract them [32]. At the last stage, "adaptability", the aim is to ensure that all the activities described so far are carried out automatically, without human intervention, and that humans only check whether the measures taken are effective and efficient [32].

3.3 Realization of Industry 4.0 In a study conducted in 2016 ("Industry 4.0—Building the Digital Enterprise"), PricewaterhouseCoopers (PWC) Germany found that the majority of the management of German companies have recognized the importance and necessity of Industry 4.0 and are pursuing the digitization of the integration of the horizontal and vertical value chain as their goal [31]. In another study by PWC (2014), the management of German companies stated that they wanted to achieve the digitization of 86% of horizontal and 80% of vertical value chains by 2020 [33–35]. A comparison of these targets with the measures actually implemented is still pending. In the view of several authors it can be assumed that digitisation will serve as the most important basis for the implementation of Industry 4.0 in order to ensure the competitiveness and progress of companies [24, 33, 36].

3.3.1 Vertical Integration of the Value Chain

The digitization of the vertical value chain ensures the capture of the flow of data and information. Figure 3 shows the vertical value chain of a company [33]. Interfaces are used to ensure that the internal systems are networked throughout. Standardised interfaces reduce the complexity and error-proneness of the transmissions. Ideally, communication systems with real-time capability are used for real-time information exchange. This promotes prompt intervention in the event of deviations and the replacement of rigid automation systems [37, 38].


Fig. 3 Vertical value chain. (Source modified version taken from [33])

3.3.2 Horizontal Integration of the Value Chain

In contrast to the vertical value-added chain, the horizontal value-added chain also shows the value-added partners that are located outside the company, such as suppliers, customers, partner companies. These should also be integrated via standardized interfaces. Figure 4 shows the horizontal value chain by means of an example [33]. In the context of the digitalization of the horizontal value chain, the goal is the real-time networking of all partners involved in a process. This means real-time data transfer between different companies, whereby the data can be stored at one company or even in a cloud. All participants in the exchange of data should benefit from the real-time data transfer [37, 38].

Fig. 4 Horizontal value chain. (Source modified version taken from [33])


One speaks of a Smart Factory when there is vertical and horizontal integration, which is the goal of Industry 4.0. Smart Factories aim at the real-time networking of people, objects and machines, thus optimizing processes [39]. In order to enable real-time communication, sensor technology will be required in addition to communication and information technology, so that the data can be recorded and processed quickly and measures can be derived from it. The implementation of the Smart Factory components leads to advantages on the one hand, but on the other hand the susceptibility to errors increases due to additional components, and there may also be pressure to pass on the cost advantages to the customer. If a competitor in an industry switches to the Smart Factory concept, a downward price spiral can set in, triggered either by the competitor itself or by pressure from customers [40].

4 Summary Even though there is no uniform definition of Industry 4.0, more and more companies from different industries are starting to digitize their data. The increasing progress in the area of vertical (internal company) as well as horizontal (cross-company) digitization as a basis for the implementation of Industry 4.0 will lead to a serious change, especially for production companies. The merging of the analog and digital world requires investments in information and communication technologies as well as sensor technology. Different industrial nations are pursuing different strategies, such as the development of new business models, the promotion of robotics or digitization, especially of production companies.

5 Empirical Survey to Determine the Potential for Digitization in the Sale of Vouchers Digitization, as well as Industry 4.0, require radical change in many areas and especially for companies. In the first part of this thesis, the prerequisites, influencing factors and consequences for companies were explained. The following part will survey and evaluate the status of the digitization of vouchers in a structured way and recommend measures. In the first step, the motivation for offering value vouchers by works councils as well as the employees’ preferred vouchers will be surveyed.


5.1 Conception of the Data Collection The following chapter describes the design of the study, the method of data collection and the procedure.

5.1.1 Research Design

The study design explains the methodological procedure of the study, for which, as shown in Table 1, a distinction is made between nine classification criteria. Once the basic variants have been defined, details are dealt with in a further step in order to develop the ideal study design.

Table 1 Classification criteria based on [41]

Characteristics of the study design: Variants of study designs
1. The scientific theory approach of the study: Quantitative study / Qualitative study / Mixed methods study
2. Insight objective of the study: Basic scientific study / Applied scientific study
3. Subject of the study: Empirical study / Methodology study / Theory study
4. Data basis for empirical studies: Primary analysis / Secondary analysis / Meta-analysis
5. Knowledge interest in empirical studies: Explorative study / Descriptive study / Explanative study
6. Formation and treatment of study groups in explanatory studies: Experimental study / Quasi-experimental study / Non-experimental study
7. Place of investigation for empirical studies: Laboratory study / Field study
8. Number of times empirical studies have been conducted: (Quasi-)experimental studies with/without repeated measurements / Non-experimental studies with/without repeated measurements
9. Number of objects of investigation in empirical studies: Group study / Single case study


In the theory of science, a distinction is made between qualitative and quantitative research strategies. The more appropriate of the two or, increasingly often, a combination of both approaches is used [41]. The qualitative research strategy has its origin in the humanities and systematically analyses observations, interviews or images. Due to the fact that the approach is often not structured, the result is not anticipated, and thus unexpected results come about. Open questions contribute to the possibility of the unexpected result, which is important for the development of theories, as well as the typically detailed analysis and interpretation [41, 42]. Coming from the natural sciences, the quantitative approach measures and statistically evaluates variables. The survey is usually carried out through the systematic analysis of cases in large numbers or experimental tests in laboratory environments, which enables hypotheses to be tested. Further development is carried out with standardised data collection instruments as well as structured procedures, which are assessed using three essential criteria. These evaluate objectivity (traceability), detached from the respondents, whether the study is repeatable (= replicability) and whether the results are valid (= validity). Validity is the most important criterion. A distinction is made between two validities: the validity of the cause-and-effect relationships is checked on the basis of internal validity and the generalizability of the results on the basis of external validity [41]. Originally strictly separated, the combination of both methods has been adopted in recent years. In the literature, this is described as a "mixed-method approach", whereby the qualitative and quantitative methods are used simultaneously or consecutively [41, 43, 44]. Under the premise of solving either scientific or practical problems, the science-theoretical approach always aims at the knowledge goal as the result. In the first step, a basic scientific study is carried out, the primary goal of which is the progress of scientific knowledge, which is to be achieved through the expansion of theories and procedures or through the analysis of facts. As an approach to solving practical problems, the so-called applied scientific study is applied, using scientific methods and theories. This can be carried out as independent research or as a commission [41]. The next step is to clarify the content of the survey, distinguishing between three types, namely the theoretical study, the methodological study and the empirical study. In the context of the theory study, the previous state of theory and research is worked out on the basis of publications and research. The method study serves to compare research methods and, beyond that, aims at their further development. The empirical study aims to find solutions to content-related research problems, whereby systematic data collection and data analysis can be applied here [41]. In the analysis phase, a distinction is made between primary, secondary and meta-analysis, which describes the data basis for empirical studies. If data is collected and analysed by the researchers themselves, this is referred to as primary analysis. In the secondary analysis, available data are analysed by the researchers themselves,


but are not collected independently. Meta-analysis, on the other hand, summarises existing data on a topic [41]. Within the definition of the epistemological interest of the empirical study it is determined whether the study is explorative, descriptive or explanatory. The explorative study is used when the objective is a precise exploration and description in order to develop scientific research questions, hypotheses and theories. The descriptive study is used to survey the dispersion of characteristics and effects in large units. If an explanatory study is chosen, hypothesis testing is carried out, with the formation and treatment of study groups being an essential component. Experimental studies, quasi-experimental studies and non-experimental studies are used. In experimental studies, at least two groups are formed for hypothesis testing. The quasi-experimental study uses existing groups for testing the hypothesis, which are not formed separately. Due to their unsuitability, non-experimental studies are rarely used to test hypotheses [41]. Great importance is attached to the location of scientific research. A distinction is made between laboratory and field investigations. In the context of laboratory studies, we speak of a controlled environment; influences can thus be avoided or encouraged. Field studies, where a distinction is made between quantitative and qualitative, reflect everyday conditions, as they take place in a natural environment. If a personal interview is conducted in a natural environment, one speaks of a qualitative field study. If standardised interview guidelines or questionnaires are used, this is referred to as a quantitative field study [41]. If measurement repetitions occur within the framework of (quasi-/non-)experimental studies, these serve to ensure measurement accuracy. A further distinction can be made between group and individual case studies. The group study deals with a sample from a population; this sample is examined and evaluated. Individual case studies examine typical or untypical individual cases. Various data collection methods can be used for such a case study [41]. The study design of the following data collection is therefore defined as [41]:

• Quantitative research approach
• Applied scientific study
• Empirical study
• Primary analysis
• Explorative study
• Quantitative field study
• Group study

5.1.2 Data Collection Method

The survey could be carried out by telephone, in person or in writing, by post or online. Since in this case possible respondents are distributed nationwide and the respondents should also be given time flexibility, an online survey was preferred. This decision was also accompanied by the fact that no interviewers were used and the associated possible interviewer effects, such as influencing or controlling the


interview, could be excluded. A further aspect is also that no costs are associated with it [44, 45]. In contrast to the advantages mentioned above, the reach of such an approach is limited, as it is necessary to be able to reach the target group, and the target group must then be able to be motivated at a distance [44]. The high number of survey e-mails sent can lead to non-participation, which means that a low participation rate can be expected [46]. Attempts are made to compensate for these disadvantages by a serious appearance, an informative letter of announcement and "friendly reminders" to encourage the participants to take part [44].

5.1.3 Procedure

After identifying Austrian companies with works council members, they were contacted in a personally addressed email in which the purpose of the survey and the benefits of participation were explained. On 28.1.2020 a further email with the link to the online survey was sent to the works councils. The email addresses were generated either via the union's own network, addresses available on the Internet or via a union mailing list. A standardized questionnaire was developed on the "Survey Online" platform. This was available via the link: https://www.umfrageonline.com/s/gutschein_4-0. The questionnaire was available for almost one month (28.01.2020 to 25.02.2020).

5.2 Data Evaluation The survey was structured in three parts. First, general company information was asked for. This included data such as the size of the company, number of locations, etc. After collecting this data, the first part of the survey was concluded with the following closed question: "Does your company offer discounted vouchers for employees?" Depending on the answer, the respondent was either redirected to the area "respondent with voucher sales system" or to the area "works council without voucher sales system". If the answer was "No", a distinction was made as to whether or not there was interest in implementing a voucher sales system.

5.2.1 Collection of General Information

In the first step, the number of employees in the company was surveyed among the 91 participating works councils of Austrian companies. 7 participants indicated that they represent between 1 and 50 employees. 6 respondents ticked 51 to 100 as the number of employees. 16 indicated 101–200 and the remaining 62 respondents more than 200. It is worth checking whether companies with more than 200 employees have a works


council statistically more often than companies with a smaller number of employees (Fig. 5).

Fig. 5 Number of employees (own presentation)

Since the physical distribution of the vouchers is very complex if there are several or many locations, the question of the number of locations was also raised. The answer categories were 1, 2, 3, 4 or ≥ 5, plus a free-text field. 35 respondents stated that one location, and 8 that two locations, should be provided with vouchers. The group of respondents whose company has five or more locations is the largest (Fig. 6).

Fig. 6 Number of sites (own representation)

In order to get to the core of the survey, the final question was: "Does your company offer discounted vouchers for employees?" The question was asked as a closed question and was answered "Yes" 52 times. The remaining 39 companies currently do not offer discounted vouchers for employees. Since 31 of the 91 respondents did not complete the survey, the total for the following questions is reduced to 60. Potential reasons why 31 respondents did not complete the survey are explained in Chap. 6.3. As shown in Fig. 7, the distribution is as follows: About two thirds have no vouchers and one third have vouchers for their employees.



Fig. 7 Subdivision of enterprises with/without voucher sales system (Own representation)

5.2.2 Companies with a Voucher Distribution System

Information on the Discounted Vouchers In order to survey the status of works councils that offer vouchers to their employees, it was determined from how many retailers vouchers are offered. Among other things, this requires an agreement with the retailer regarding discounts, payment deadlines and who bears the logistics costs. When asked from how many retailers vouchers could be purchased, there were six possible answers, which were answered by 22 respondents. Just under 41% selected the category 1–5 retailers as their answer. In each case, 18% stated that they had a contract with 5–10 or 21–30 retailers. Figure 8 shows the breakdown.

Fig. 8 Number of retailers (own representation)

In response to the question of the dealers represented and the discounts received, the respondents replied as follows:

• Eurotherme/Aquapulco = 8%
• Bauhaus = 10%
• Bellaflora = 10%
• C&A = 10%
• Douglas = 10%
• Dänisches Bettenlager = 10%
• Geinberg = 10%
• H&M = 8%
• Humanic = 10%
• Ikea = 4%
• Interspar, Spar = 3%
• Intersport = 8%
• Landes-/Musiktheater = 10%
• Libro = 10%
• Lidl = 3%
• Maximarkt = 3%
• Müller = 10%
• OBI = 8%
• Pizzamann = 15%
• Rewe = 3%
• Shell Tankstellen = 3%
• Thalia = 10%
• Timberland = 10%
• Vamed Thermen = 8%
• XXXLutz = 10%
• XXL Sports = 10%
• Zalando = 8%

The discounts negotiated are between 3 and 15%, but predominantly between 8 and 10%. Again, this should not be seen as a representative estimate of the success of negotiations by works council representatives throughout Austria, but it should be seen as a first impression of the magnitude of the discounts currently being negotiated. It is noticeable that the range of products is broad—many areas of daily life are covered, from food, thermal baths, sport, culture to gastronomy. Food retailers seem to grant the lowest discounts, which is presumably explained by the (low) margins achievable in this sector. Respondents also indicate that a minimum order volume is specified for many voucher providers. This naturally makes individual purchase transactions and the addition of new suppliers to the range more difficult. As a consideration, which might need to be empirically investigated in more detail in a subsequent survey, the question arises as to whether it is attractive for companies to include in their product range specifically discounted vouchers in sectors or from suppliers that support their corporate strategy. These could be, for example, wellness providers, sports and health products if the company pays special attention to the long-term health and well-being of its employees. If a company wishes to present itself to the workforce and the labour market as being particularly international or particularly family-friendly, the choice would be different in each case—e.g. family-friendly holiday providers and care facilities. When asked about the highest amount of a voucher, the answers vary between 50 euros and 5000 euros. From the financial restrictions mentioned above, it follows that five of the respondents will only buy vouchers if explicitly ordered by employees, whereas nine works


council members would order vouchers on stock and the remaining nine would order both on order and on stock. This is due on the one hand to the seasonality of vouchers (e.g. Christmas business) and on the other hand to the value of the required vouchers. For example, if an employee would like to refurnish a room, the required amounts can run to several thousand euros, which are then ordered for a specific occasion rather than bought in stock. In order to be able to estimate the sales potential of these works councils, it was asked how many vouchers, or vouchers of what value, they sell per year. Twelve works council members answered this question with a range of 30–20,000 vouchers. The question of value was answered by 18 works councils, with the highest value being 1.5 million euros.

Reasons for the Purchase of Vouchers The following questions serve to explore the motivation behind why works councils allow the employees they represent to purchase discounted vouchers. The following answers were given:

• Services of the works council
• Service for the employees
• Maintaining contact with the employees
• Increasing the presence of the works council in the company
• Enabling employees to take advantage of the listed dealers or service providers
• Making the vouchers available as Christmas presents or gifts for anniversaries
• Promotion of the image of the works council

The support of employees and the possibility of contact with employees was mentioned most often. Figure 9 shows the frequency of this service. The additional field was used to determine that the usage is very person-dependent, since some employees use it very regularly and others do not use it at all. The question of the availability of the vouchers at all locations was answered in such a way that more than 50% of the respondents stated that they could reach between 75 and 100% of those represented. Further details on employee availability are shown in Fig. 10. Fig. 9 Claim (own representation)


Fig. 10 Accessibility of the locations (own representation)

Again, more specific questions can be derived for follow-up examinations. In order to tie in with the above-mentioned example of the concretization of corporate strategy, the question would have to be asked in view of this feedback to what extent the quality of cooperation between works council and company has an influence on the motivations of works councils for (digitized) voucher offers. Conceivable extreme positions would be, for example, the motive of a works council to position itself primarily as an antagonist of the employer and therefore to use its own offers or services for internal company image cultivation in order to achieve a high level of interest and corresponding negotiating power in the discourse with the employer. A completely opposite model would be the friendly cooperation of the employee representatives with the employer, which tries to bundle the respective possibilities in the service of the common goal with a common understanding of values (e.g. employee health, labour market attractiveness). This could go as far as company purchasing and works council representatives pooling purchasing volumes from suitable suppliers in order to achieve higher discounts for their employees. Possibly, however, a corresponding empirical study would also show that there is no or hardly any connection between the cooperation culture of works council and management. For platforms such as "Voucher 4.0", but also for works councils that (want to) offer value vouchers or for suppliers that wish to be listed in such value voucher programs of companies, such analyses would be interesting in so far as—depending on these answers—the respective business model could be based on very different drivers. Seasonal Preferences of the Employees In order to find out whether a broad voucher offer is advantageous and whether there are seasonal fluctuations or whether certain vouchers are preferred in winter, the works councils were also asked about this. Differences in demand were identified by the next question. Of the 23 respondents, about half each stated that there is or is not a seasonality in demand. In the context of further surveys it should be examined whether this is due to the range of vouchers (Fig. 11). Seasonality is identified by the works councils in the following form.

Fig. 11 Seasonal trends (own presentation)

• Winter: thermal baths, hotels, ski tickets, Christmas presents (here especially drugstores, clothing and books)
• Spring: DIY stores, sporting goods
• Summer: hardware stores, parking (this refers to discounted vouchers for airport parking).

In order to estimate the further potential, it was also surveyed whether the works councils are of the opinion that a wider range of products would generate more demand. Four out of five works councils affirm this and only one fifth see no additional benefit from the inclusion of further vouchers.

Satisfaction with Existing Systems
In a further step, the satisfaction with the current voucher situation was surveyed. This was done with the question “How satisfied are you with your current voucher sales system?”. The available answer categories were very satisfied, satisfied, less satisfied and not satisfied. Figure 12 shows the satisfaction of the respondents with the current approach. With 17% who are very satisfied with the current voucher distribution system and 61% who are satisfied, a total of about four out of five works councils are satisfied or very satisfied. In contrast, 13% are less satisfied and 9% are not satisfied.

Fig. 12 Satisfaction (own representation)


15 works council members used the opportunity to comment on their answers as follows:
• High effort
• Availability of change and cash desk settlement
• Process time from order to availability very long and cumbersome
• Great organisational effort
• Logistical processing partly laborious
• Need for employees to buy the vouchers in stock, as they are usually needed directly in the store
• In order to achieve minimum order quantities, works council members purchase the vouchers together with others; this increases the complexity and the process duration
• Need for pre-financing
• High effort if companies have several locations, because several times a week you have to drive to one location to pick up the vouchers
• Long distribution channels
• High effort in the distribution to the employees
• Cash payments demanded by dealers instead of purchase on account
• High administration effort
• Challenge to get dealer offers.

In order to gain an insight into the process of purchasing vouchers, the following was collected: Works councils have the option of purchasing vouchers from dealers alone, together with others or through distributors. Payment is made immediately after receipt of the vouchers. The distribution of the vouchers to the employees works differently. Half of the works councils have the employees fill out order forms, which have to be sent to the works council so that the works council can place the order with the retailer. At the other half, employees go directly to the works council to announce their order. The vouchers can be paid for by card or in cash. Occasionally it is also necessary to pay only when receiving the vouchers; in this case, the works council assumes the risk of subsequent collection as well as a temporary financing function. Once the vouchers are available, the employees are notified and asked to collect them. The paper or plastic voucher is then handed over. A monthly mailing of the current conditions to the employees is intended to motivate them to make further purchases of vouchers. The works councils indicated the following challenges in connection with the purchase and sale of vouchers:
• Short-term requests and orders that cannot be fulfilled
• Insufficient amount of required vouchers available
• Expenses for the collection of the vouchers at different locations of the company or the retailers


• Pre-financing—some works councils use 25% of the works council fund for pre-financing
• High expenditure of time in the distribution and in the verification of the incoming payments of the employees
• Employees must purchase vouchers in advance
• Merchants do not offer to reuse plastic cards, and the paper vouchers are too large.

In order to ascertain the expenditure incurred by the works council in selling the vouchers, the time required was asked for. 15 works council members submitted an estimate. The average value for the administrative burden is 70 h, followed by the administrative effort with an estimated 57 h. The time required for ordering is estimated at 17 h and for distribution at 48 h. In total, the estimate amounts to 192 h per works council. As mentioned above, pre-financing entails the risk of insolvency of the dealer. If the dealer becomes insolvent, the vouchers are no longer redeemable and only a claim on the insolvency estate remains. Works councils were therefore asked whether the works council fund they administer has been affected by such an insolvency. 42% replied that they had already suffered damage from an insolvency. In the last part of the survey, it was determined which suggestions for improvement and which wishes exist regarding the current situation. Several works councils would like to see the logistics from the retailer to them optimised. The personal collection of the vouchers requires a lot of time; it would be helpful if the vouchers were sent by post or messenger, in which case the bearing of costs would have to be clarified. Furthermore, the works councils would like to be able to offer their employees a more extensive range of vouchers. From the point of view of the works councils, it would be desirable if employees could collect vouchers themselves in the retailer’s shop on presentation of their employee ID card. This would reduce the need for stockpiling and thus the pre-financing by the works councils. Dealers would also be expected to be more proactive, so that offers for employees are transmitted without the need to ask. In summary, the following wishes of the works councils await a solution:
• A solution to meet short-term needs
• Risk reduction in handling large amounts of money
• Increased use of the service
• Improvement of employee satisfaction
• Reduction of the administrative effort
• Reduction of administrative expenses and administrative costs
• Time saving.

5.2.3 Companies Without a Voucher Distribution System

As described above, the works councils which do not sell any vouchers were also asked the general questions. They were then asked why no distribution of value vouchers takes place. The answer categories given were: time expenditure too high, administrative expenditure too high, financial expenditure too high, due to the risks in handling, and due to pre-financing. Additionally, there was an optional free-text field. Figure 13 shows both the reasons and the number of responses. More than half of the respondents believe that the administrative burden is too high. 43% stated that they do not have a voucher sales system because of the necessary pre-financing.

Fig. 13 Motives (own representation)

The additional field was used for the following comments:
• No need
• Employees receive a discount by presenting their company ID card
• Due to possible taxes
• So far not relevant or no provider known
• Currently in negotiation
• No demand from employees
• According to the Vienna Prohibition of Industrial Action Act 1957, collective orders by the works council are prohibited and punishable by law.

The issues of tax and of the Vienna Prohibition of Business Promotions Act have already been discussed in more detail above. The next point was the interest in the introduction of a voucher distribution system. Of the total of 37 participants, almost half are interested in its introduction. Those who are not interested had the opportunity to justify this. The following arguments were given:
• Time expenditure too high, administrative expenditure too high, financial expenditure too high, handling risk, pre-financing
• The vivo card as an alternative to the discounted vouchers (compare [47])
• Obligation of the works council to pay value added tax, as it would be engaged in a trade
• The agreement of discounts on presentation of the service card involves less effort
• No need, as trade unions offer employees numerous discounts/reductions; it should be noted here that not every employee is a union member
• The company is too “big” for an in-house voucher distribution system.


Interest in the Introduction
Anyone who had expressed interest in a centralised system of voucher distribution was questioned in more depth. 17 out of the 18 people who were interested in such a system were of the opinion that centralised processing would be advantageous for their company; one works council argued that such a system was not applicable. Of the 17 works councils surveyed, 87% see the advantage of centralised processing in minimising administrative costs. Three quarters see an advantage in the elimination of pre-financing. One quarter mention easier handling as an advantage and 19% see the elimination of the insolvency risk for the works council. Figure 14 shows the mentions graphically. In order to understand the lack of interest in introducing such a system, 19 works councils were asked why this was the case. The possible answers offered were: voucher sales had never been considered, lack of employee interest, no suitable voucher offers, no free resources, and an optional additional field. Figure 15 shows which answers were chosen. 42% see the lack of resources as an obstacle to offering this service. 32% of the respondents are not aware of a suitable voucher offer. 32% of respondents say that the employees they represent have no interest in this issue and 37% of works councils answer that they had never considered a voucher distribution system (Fig. 15).

Fig. 14 Advantages (own presentation)

Fig. 15 Justification (own presentation)


Fig. 16 Voucher sales system (own presentation)

To the final question of whether the works councils would consider a voucher distribution system if the problems described above were resolved, the following answers were given: one person answered “Yes, definitely”, whereas three respondents stated that there is no way they would consider implementing a voucher distribution system. The other 15 persons could not or would not give a yes/no answer to this question (Fig. 16).

5.3 Interpretation

The first step is to find out whether there is a connection between the size of the company and the existence of an offer of discounted vouchers. About two thirds (62 out of 91 participants) state that they have more than 200 employees. 65% of them offer the employees they represent the possibility of acquiring discounted vouchers; 35% of the companies with more than 200 employees do not offer this. From this it can be concluded that works councils in larger companies tend to be more likely to offer discounted vouchers to the employees they represent. If the survey was already terminated after four questions, it can be assumed that the topic was not relevant or that there was no interest in novelties, and not that the survey had taken too long or was too complex. If we take this interpretation as a basis for further considerations, it can be assumed that those 32% (29 out of 91 respondents) who already offer vouchers for their employees are not interested in a new voucher distribution system. The question arises whether there is a connection between the number of people represented and the value of the largest voucher. Five respondents indicated that they represented between 1 and 50 people; this is the lowest number that was surveyed. On average, it was stated that these works councils distribute a maximum of five different vouchers. The number of value vouchers purchased annually could not be determined, but it was stated that the five respondents spent approximately 1000 euros annually. As the three works councils with 51–100 employees did not provide any information in this respect, no evaluation can be made for this group.


The next group of respondents (12) represents companies with 101–200 employees. Here the highest values of the vouchers are between 100 and 500 euros. 39 of the respondents state that they have more than 200 employees, with the values per voucher card ranging from 50 to 5000 euros. From this it can be concluded that the greater the number of people represented, the higher the highest individual value. This may be due to the larger works council fund and the resulting possibility of financing, or it may also be due to the fact that the more employees there are, the greater the likelihood that large sums will be demanded. 83% of the works councils state that they are not satisfied with the existing distribution system for vouchers. First and foremost, they mention the high costs, the pre-financing, the long distribution channels from order to employee and the long lead times. All of the problems mentioned can be solved by digitalisation and thus by software solutions. The main reasons given by the group of respondents not offering vouchers are the high costs and the necessary pre-financing. The Viennese law prohibiting such sales, which was mentioned by several works councils as an obstacle to the sale of vouchers, was already repealed in 2001, as it had become functionless and was based on a currency that was no longer valid [48, 49]. Trade unions, such as the Public Service Union (GÖD), represent the interests of civil servants and contract staff. These include teachers, university staff, police, public construction services, federal and state companies and institutions, the judiciary, finance, the labour market service, etc. These professional groups can also become members of the GÖD. The services include free legal advice, security packages or assistance with employee assessments, but also discounts [50]. The monthly membership fee is 1% of gross income [51]. Members receive discounts on shopping, cultural and educational offers, leisure events and holidays. All offers can be viewed at https://www.goedvorteil.at/ [52, 53]. In summary, it can be said that centralised processing of the voucher distribution system would be advantageous and attractive for many companies in order to solve the challenges described above. 89% of the respondents who do not have a system today are interested in implementing such a system. Five of the respondents would not consider a system despite some advantages.

5.4 Summary and Outlook

The goal of the study was to survey and analyze the digitization potential in the distribution of vouchers in cooperation with Austrian companies and works councils, from the works council’s perspective. The survey to ascertain the actual situation and possible potentials was carried out by means of a systematic collection through an online survey.

Are the works councils surveyed satisfied with the current voucher sales system? 79% of the works councils surveyed stated that they are satisfied with the current system and 17% that they are very satisfied. The remaining 21% said they were less satisfied or not satisfied.

What problems arise with the existing voucher sales system? Many of the respondents stated that the high costs of the current voucher distribution system are seen as a disadvantage. The high administrative burden stands out, although the time-consuming administrative work must not be neglected either (e.g. checking the incoming payments of employees falls under this effort). Furthermore, the time required to provide the service is also considered significant. The process from the order to the receipt of the vouchers is perceived as very laborious and time-consuming, since personal collection and distribution to the employees is necessary. A short-term demand of the employees often cannot be covered. The lack of supply as well as the prepayment are costly and/or risky for the works council. In the absence of a works council fund, some works councils are unable to order vouchers at all.

Current procedure: The employee must inform the works council of the desired voucher either personally or by means of an order form. Once the employee’s order has been received, it is placed with the retailer directly or with the help of a distributor, via an individual or a joint purchase. The works council can collect the voucher or have it sent by post. After the voucher has been delivered, the discounted value is paid; depending on the works council, the retailer is paid in advance, or payment is made after the money has been received from the employee.

Is the offer of vouchers from works councils popular? Even though the use of this option is person-dependent, the works councils report that the option is used. 63% of the works councils surveyed replied that employees use this offer very frequently, 19% that employees use it occasionally, and the remaining 18% that employees rarely or almost never buy vouchers from the works council.

Does the introduction of “Voucher 4.0” solve the existing problems described by the works councils surveyed? The issues raised by the works councils can be resolved across the board (Table 3).

What are the advantages for works councils when using voucher sales via “Voucher 4.0”? Table 3 shows the answers to the challenges faced by works councils in the current system.

The present survey clearly shows the digitization potential in the distribution of vouchers among Austrian companies. This is supported by the 21% of the works councils surveyed who are less or not at all satisfied with their current voucher sales. Further potential can be found in the works councils which currently do not have a system in place.

Table 2 Valuation of expenses (own presentation)

Activity                        Person hours per year    x̄ (mean)
Order of the vouchers             248.00                  16.53
Distribution of the vouchers      713.00                  47.53
Administrative burden           1,052.00                  70.13
Administrative effort             847.00                  56.47
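The per-works-council averages in Table 2 are consistent with dividing the reported yearly totals by the 15 works councils that submitted an estimate; a minimal sketch of that arithmetic (totals taken from Table 2, the respondent count from the text above):

```python
# Reported totals in person hours per year (Table 2) and the number of
# works councils that submitted an estimate (15, as stated in the text).
totals = {
    "Order of the vouchers": 248.0,
    "Distribution of the vouchers": 713.0,
    "Administrative burden": 1052.0,
    "Administrative effort": 847.0,
}
respondents = 15

means = {activity: hours / respondents for activity, hours in totals.items()}
for activity, mean in means.items():
    print(f"{activity}: {mean:.2f} h per works council and year")

# The exact means sum to about 190.7 h; the 192 h quoted in the text result
# from summing the rounded values 17 + 48 + 70 + 57.
print(f"Total: {sum(means.values()):.1f} h per works council and year")
```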

Table 3 Solution competence through the use of “Voucher 4.0” (own presentation)

Existing problem → Solution
• High administrative costs & high administrative effort → The works council only has to confirm the employee’s affiliation to the company in order to ensure that the employee has unrestricted access to the offers
• Pre-financing → No more pre-financing necessary
• Risk of insolvency of a trader → Insolvency risk is avoided, as pre-financing is no longer necessary
• Short-term wishes and orders cannot be covered → 24/7 access to all voucher offers
• Long lead time, long distribution channels, receipt of the vouchers very time-consuming → Immediate receipt of the voucher after payment via the app (e-mail, app, etc.)
• Distribution to employees partly costly, change, retailers often only accept cash → Direct payment (direct debit) from the employee to the trader via the trading platform
• Dealer offers difficult to obtain (handling risk) → All trader offers can be accessed via the trading platform
• Environmental aspect → High sustainability, as the vouchers are no longer transmitted in paper or plastic form

Most of the known problems of the existing processes are eliminated by “Voucher 4.0”, whereby digitalisation not only speeds up processing but also reduces the risks (handling, insolvencies). The high demand for paid union cards shows that there is a great need for such offers. If a works council or even a company can now offer such benefits (free of charge) to its employees, this will also have an impact on the retention of its own workforce. Furthermore, the digitisation of voucher sales makes sense at a time when many areas are being digitised. The ever-increasing shift from local trade to “international online shopping” also requires the digitisation of the associated vouchers, although COVID-19 could lead to an opposite trend, namely regionalisation (“buy local”). Not to be forgotten is the climate friendliness of Voucher 4.0: instead of many single-use cards, an app is used. In the days of “Fridays for Future” and a constantly growing environmental awareness, the abandonment of paper or even plastic vouchers is a politically welcome step. In summary, it can be said that the existing concept of the value voucher appears to be outdated and that with Voucher 4.0 a digitalised solution is available whose success will depend above all on the willingness to try something new. In order to evaluate the digitisation potential in value voucher sales even more precisely and representatively, the next step would be to ascertain the willingness and the economic ability to pay for this service. The advantages of digitization alone may not be enough to ensure success. Furthermore, it is recommended to survey retailers. This could include an investigation of the effects of voucher sales on companies and of the motives of retailers for selling discounted vouchers to works councils. In addition, the current procedures and processes could be identified and evaluated in more detail in order to quantify the savings and assess the economic viability of a possible investment.

References 1. Roth, A.: Industrie 4.0—Grundlagen und Gesamtzusammenhang. In: Roth A (ed.) Einführung und Umsetzung von Industrie 4.0: Grundlagen, Vorgehensmodell und Use Cases aus der Praxis, pp. 17–82. Springer Gabler, Berlin, Heidelberg (2016) 2. Cachelin, J.L.: The Consequences of Digitalization: New Working Environments, Knowledge Cultures and Leadership Understandings. Knowledge Factory, St.Gallen, (2012) 3. Kaufmann, T.: Business Models in Industry 4.0 and the Internet of Things: The Path from Aspiration to Reality, 1st edn. Springer Vieweg, Wiesbaden (2015) 4. Kryvinska, N.: Building consistent formal specification for the service enterprise agility foundation. Soc Serv Sci J Serv Sci Res Springer 4(2), 235–269 (2012) 5. Kaczor, S., Kryvinska, N.: It is all about services—fundamentals, drivers, and business models. Soc Serv Sci J Serv Sci Res Springer 5(2), 125–154 (2013) 6. Neussner, W.: Gutschein 4.0. Vienna (2017) 7. Bendel.: Gabler Wirtschaftslexikon: Digitalisierung, 2018 [Online]. Available at: https://wir tschaftslexikon.gabler.de/definition/digitalisierung-54195/version-277247. Accessed 30 July 2019 8. Cole, T.: Digital Transformation: Why the German Economy is Sleeping Through the Digital Future and What Needs to be Done Now! 2nd edn. Vahler, Munich (2015) 9. IFAA.: Digitisation & Industry 4.0: as Individual as Demand—Productivity Growth through Information, 2016 [Online]. Available at: https://www.arbeitswissenschaft.net/fileadmin/Dow nloads/Angebote_und_Produkte/Broschueren/ifaa_2016_Digitalisierung_I40.pdf. Accessed 08 Aug 2019 10. Streissler Guide, A.: Digitization, Productivity and Employment. Agnes Streissler: Economic Policy Project Consulting, 2016 [Online]. Available at: https://www.digitales.oesterreich.gv. at/documents/22124/30428/Studie_Digitalisierung,+Produktivität+und+Beschäftigungung/ 4fa3af4d-bc03–416c-87a0–33f2707ac88f. Accessed 09 Aug 2019 11. Mattern, F.: Die technische Basis für das Internet der Dinge. In: Fleisch, E., Mattern, F. (eds.) The Internet of Things: Ubiquitous Computing and RFID in Practice: Visions, Technologies, Applications, Instructions for Action, 1st edn., pp. 39–66. Springer-Verlag, Berlin Heidelberg (2005)


12. Fraunhofer Institute for Material Flow and Logistics, Rapid Growth: The Internet of Things Scales Exponentially, [Online]. Available at: https://www.internet-der-dinge.de/. Accessed: 12 Sep 2019 13. Weber, R.H., Weber, R.: Introduction, in Internet of Things : Legal Perspectives, pp. 1–22. Springer, Berlin Heidelberg (2010) 14. Haller, S., Karnouskos, S., Schroth, C., Ag, S.A.P.S., Zurich, C.E.C.: The Internet of Things in an Enterprise Context. In: Domingue, J., Fensel, D., Traverso, P. (eds.) Future Internet—FIS 2008, pp. 14–28. Springer, Berlin Heidelberg (2009) 15. Kagermann, H., Wahlster, W., Helbig, J.: Implementation recommendations for the future project Industry 4.0, Acatech—German Academy of Science and Engineering 2013, [Online]. Available at: https://www.acatech.de/wp-content/uploads/2018/03/Abschlussbericht_Industr ie4.0_barrierefrei.pdf. Accessed: 11 Aug 2019 16. Industrial Internet of Things—The Role of Telecommunications Companies, Deloitte, 2016. Available at: https://www2.deloitte.com/content/dam/Deloitte/de/Documents/techno logy-media-telecommunications/Deloitte_TMT_IndustriellesInternet%20of%20Things.pdf. Accessed: 09 Sep 2019 17. Wachter, B.: Big Data—Applications in Market Research. In: König, C., Schröder, J., Wiegand, E. (eds.) Big Data: Opportunities, Risks, Development Trends, pp. 17–25. Springer, Wiesbaden (2018) 18. Pendyala, V.: The Big Data Phenomenon. In: Pendyala, V. (ed.) Veracity of Big Data, pp. 1–15. Apress, Berkeley, CA (2018) 19. Bloehdorn, S., Fromm, H.: Big Data—Technologies and Potential. In: Schuh, G., Stich, V. (eds.) Enterprise Integration: On the Way to the Collaborative Enterprise, pp. 107–124. Springer, Berlin Heidelberg (2014) 20. Morabito, V.: Big Data and Analytics: Strategic and Organizational Impacts, 1. Springer International Publishing, Aufl (2015) 21. Verl, A., Lechler, A.: Control from the Cloud. In: Bauernhansl, T., Ten Hompel, M., Vogel-Heuser, B. (ed.) Industry 4.0 in Production, Automation and Logistics: Application, Technologies, Migration, pp. 235–247. Springer Vieweg, Wiesbaden (2014) 22. Appelrath, H.-J., Kagermann, H., Krcmar, H. (2014) Future Business Clouds: a Contribution to the Future Project Internet-based Services for the Economy, acatech STUDY, 2014 [Online]. Available at: https://www.acatech.de/wp-content/uploads/2018/03/acatech_S TUDIE_FutureBusinessClouds_WEB.pdf. Accessed: 04 Sep 2019 23. Bauernhansl, T.: Introduction. In: Bauernhansl, T, Ten Hompel, M., Vogl-Heuser, B (eds.) Industry 4.0 in Production, Automation and Logistics, pp. 1–48. Springer Vieweg, Wiesbaden (2014) 24. Schrauf, S., Berttram, P.: Industry 4.0: How digitization Makes the Supply Chain more e Client, Agile, and Customer-Focused, PwC Strategy&, 2016. [Online]. Verfügbar unter: https://www.strategyand.pwc.com/gx/en/insights/2015/industry-4-opportuni ties-and-challenges/industry-4-0.pdf. Zugegriffen: 21 Okt 2019. 25. Fallenbeck, N., Eckert, C.: T Security and Cloud Computing. In: Bauernhansl, T., Ten Hompel, M., Vogel-Heuser, B (ed.) Industry 4.0 in Production, Automation and Logistics: Application, Technologies, Migration, pp. 397–431. Springer Vieweg, Wiesbaden (2014) 26. Diaonescu, R.: Status and Trends in the Global Manufacturing Sector, IIOT-World, 2020. [Online]. Verfügbar unter: https://iiot-world.com/connected-industry/status-and-trends-in-theglobal-manufacturing-sector/. Zugegriffen: 16 März 2020 27. Dais, S.: Industrie 4.0—Anstoß, Vision, Vorgehen. In: Bauernhansl, T., Ten Hompel, M., VoglHeuser, B (eds.) 
Industry 4.0 in Production, Automation and Logistics, pp. 625–634. Springer Vieweg, Wiesbaden (2014) 28. Gausemeier, J., Klocke, F.:Industry 4.0: International Benchmark, Future Options and Recommendations for Action in Production Research, Acatech—German Academy of Science and Engineering 2016. [Online]. Available at: https://www.acatech.de/publikation/industrie-4-0internationaler-benchmark-zukunftsoptionen-und-handlungsempfehlungen-fuer-die-produk tionsforschung/. Accessed: 18 Oct 2019


29. Kagermann, H., Anderl, R., Gausemeier, J., Schuh, G., Wahlster, W.: Industry 4.0 in a Global Context: Strategies of Cooperation with International Partners, Acatech STUDY, 2016 [Online]. Available at: https://www.acatech.de/publikation/industrie-4-0-im-globalen-kontextstrategien-der-zusammenarbeit-mit-internationalen-partnern/. Accessed: 18 Oct 2019 30. Federal Ministry of Economics and Energy: Chancen durch Industrie 4.0, Federal Ministry of Economics and Energy, 2019 [Online]. Available at: https://www.plattform-i40.de/PI40/Naviga tion/DE/Industrie40/ChancenIndustrie40/chancen-durch-industrie-40.html. Accessed: 18 Oct 2019 31. Bischoff, J.:Tapping the Potential of the Application of Industry 4.0” in Medium-Sized Companies, agiplan GmbH, 2015 [Online]. Available at: https://www.bmwi.de/Redaktion/ DE/Publikationen/Studien/erschliessen-der-potenziale-der-anwendung-von-industrie-4-0-immittelstand.pdf%3F__blob%3DpublicationFile%26v%3D5. Accessed: 11 Sep 2019 32. Schuh, G., Anderl, R., Gausemeier, J., Ten Hompel, M., Wahlster, W., Anderl, R.: Industrie 4.0 Maturity Index: Shaping the Digital Transformation of Companies, Acatech STUDY, 2017 [Online]. Available at: https://i40mc.de/wp-content/uploads/sites/22/2016/11/acatech_S TUDIE_Maturity_Index_de_WEB.pdf. Accessed: 16 Sep 2019 33. Geissbauer, R., Schrauf, S.: Industry 4.0: Opportunities and challenges of the fourth industrial revolution, PwC Strategy, 2014 [Online]. Available at: https://www.strategyand.pwc.com/de/ de/studie/industrie-4-0.pdf. Accessed: 21 Oct 2019 34. Gregus, M., Kryvinska, N.: Service Orientation of Enterprises—Aspects, Dimensions, Technologies. Comenius University in Bratislava (2015). ISBN: 9788022339780 35. Kryvinska, N., Gregus, M.: SOA and its Business Value in Requirements, Features, Practices and Methodologies. Comenius University in Bratislava (2014). ISBN: 9788022337649 36. Geissbauer, R., Vedso, J., Schrauf, S.: Industry 4.0: Building the Digital Enterprise, PwC, 2016. Available at: https://www.pwc.com/gx/en/industries/industries-4.0/landing-page/industry-4.0building-your-digital-enterprise-april-2016.pdf. Accessed: 21 Oct 2019 37. Siepmann, D.: Industrie 4.0—Grundlagen und Gesamtzusammenhang. In: Roth, A (ed.) Introduction and Implementation of Industry 4.0: Basics, Process Model and Use Cases from Practice. Springer Gabler, Berlin Heidelberg (2016) 38. Huber, W.: Industry 4.0 in Automobile Production: A Practical Book. Springer Vieweg, Wiesbaden (2016) 39. Light blue et al.:Industry 4.0 Readiness, 2015. [Online]. Verfügbar unter: https://industrie40. vdma.org/documents/4214230/26342484/Industrie_40_Readiness_Study_1529498007918. pdf/0b5fd521-9ee2-2de0-f377-93bdd01ed1c8. Accessed: 27 Nov 2019. 40. Zillman, M., Wilk, C.: Smart Factory—How digitization changes factories: Transformation from the factory floor to management”, Lünendonk—Whithepaper, 2016 [Online]. Available at: https://www.telekom.com/resource/blob/323140/53ed1330933baaca2da24915 b4b7cfce/dl-160620-whitepaper-smart-factory-data.pdf. Accessed: 29-Nov-2019]. 41. Döring, N., Bortz, J.: Research Methods and Evaluation in the Social and Human Sciences, 5th edn. Springer, Berlin Heidelberg (2016) 42. Creswell, J.W., Creswell, D.: Research Design: Qualitative, Quantitative and Mixed Methods Approaches, 5. SAGE Publications, Aufl (2018) 43. Tashakkori, A., Teddlie, C.: Foundations of Mixed Methods Research: Integrating Quantitative and Qualitative Approaches in the Social and Behavioral Sciences. SAGE Publications (2009) 44. 
Wagner-Schelewsky, P., Hering, L.: Online Survey. In: Bauer, N., Blasius, J (eds.) Handbuch Methoden der empirischen Sozialforschung, 2nd ed., pp. 787–800. Springer Fachmedien Wiesbaden (2019) 45. Taddicken, M.: Online-Befragung. In: Möhring, W., Schlütz, D. (eds.) Handbuch standardisierte Erhebungsverfahren in der Kommunikationswissenschaft, pp. 201–217. Springer Fachmedien Wiesbaden, Hrsg (2013) 46. From Maurer, M., Jandura, O.: Mass Instead of Class? Some Critical Remarks on the Representativeness and Validity of Online Surveys. In: Jackob, Z. T. N., Schoen H. (eds.) Sozialforschung im Internet, pp. 61–73. VS Verlag für Sozialwissenschaften (2009)


47. vivo Mitarbeiter-Service GmbH, Becoming a vivo Card holder, 2020 [Online]. Available at: https://www.vivo-service.at/VIVO/vivo-Card-erwerben. Accessed: 20 Mar 2020 48. Land, W.: Entwurf Gesetz, mit dem Aufhebung des Betriebsaktionen-Verbotsgesetzes (Supplement No 7/2001)’, 2001 [Online]. Available at: https://www.wien.gv.at/ma08/hist-gesetzese ntwurf/2001/beilage-7-01.pdf. Accessed: 18 March 2020 49. Land, W.: Betriebsaktionen-Verbotsgesetz; Aufhebung, 2001 [Online]. Available at: https:// www.wien.gv.at/recht/landesrecht-wien/landesgesetzblatt/jahrgang/2001/html/lg2001121. htm. Accessed: 18 March 2020 50. Public Service Union: GÖD Leitbild, 2020 [Online]. Available at: https://www.goed.at/ueberuns/leitbild/. Accessed: 19 March 2020 51. Public Service Union: GÖD Mitglied werden, 2020 [Online]. Available at: https://www.goed. at/mitgliedschaft/goed-mitglied-werden/. Accessed: 19 Mar 2020 52. Public Service Union: CEDA Benefits, 2020 [Online]. Available at: https://www.goedvorte il.at/. Accessed: 19 March 2020 53. Molnár, E., Molnár, R., Kryvinska, N., Greguš, M.: Web Intelligence in practice. Socf Serv Sci J Serv Sci Res Springer 6(1), 149–172 (2014)

Use of E-service Analytics in Slovakia

Martina Halás Vančová and Marián Mikolášik

Abstract The aim of the paper is to outline the current state of analytics use in the companies providing electronic services in Slovakia. Service analytics provides companies with many advantages. On the other hand, companies face various challenges that arise from the use of service analytics. The research is focused on companies providing e-services in Slovakia, their perception of and approach to service analytics as well as key issues they face. It was conducted in all regions of Slovakia. Keywords Electronic services · Analytics · ICT · Goals

1 Introduction

We live in a world where services prevail in the economy and, at the same time, services are increasingly offered via the Internet. When companies provide electronic services, they have little opportunity for face-to-face communication with their customers. Thus, they cannot influence customers directly as they can in classical offline services. On the other hand, companies providing e-services can benefit from the wealth of data. Data are an endless well of information and knowledge. Thus, the modern age leads us to a place where we no longer need to meet our clients personally and are still able to adjust our services to their needs. The “data age” has brought us to data processing, data mining and analyzing, or more precisely to analytics. The Internet has brought us to providing services electronically. And the combination of both—e-services and data—has brought us to the age of service analytics.



We were motivated to focus on the topic of service analytics for two main reasons: it is a new, largely unexplored topic, and it is becoming the core of decision-making in e-service-providing organizations. Another point is that service analytics is taking over the data world faster than expected. Succeeding in the world of information requires scientists to understand what is actually happening in companies. Therefore, we focused on companies and their approach to service analytics. Whether a company is large or small, service analytics influences it as a whole, from human resources through marketing to sales. Service analysis is carried out in compliance with standards [1]. Service analytics is already being carried out in companies. However, it still offers countless options for scientists to research, observe and shape the field based on the obtained knowledge. Scientific insight into service analytics is required mainly from the perspective of knowledge sharing, idea generation, and innovation. In contrast to companies, scientists who focus on service analytics can approach it more freely, since they are not affected by the profit or loss of companies but look at service analytics from the perspective of knowledge creation or value co-creation. In addition to this, service analytics has not been sufficiently researched yet, and it is often confused with terms such as business analytics, data analytics, or business intelligence. We aim to focus on service analytics and explain our perception of it from a theoretical as well as an empirical perspective. Data protection must be considered a priority [2].

1.1 Theoretical Basis

Analytics has evolved with the increasing amount of data [3]. Actually, at the beginning of analytics, there were data. With the development of ICT, organizations have been able to store increasing amounts of data in databases with constantly growing capacities. However, storing data might be useless when the data are not used for the improvement of the organization. Thus, organizations started to “use” the data. In order to use them, they had to apply various mathematical and statistical operations and methods that would enable them to extract knowledge from data [4, 5]. Nowadays, organizations operating in the knowledge economy cannot rely on pure data. Data must be transformed into knowledge. Based on the DIKW principle [6], we can deduce that analytics can be understood as one of the ways in which an organization can obtain wisdom or knowledge from pure data. We have visualized our synthesis in Fig. 1.

Fig. 1 Our visualization of the DIKW principle connected with analytics

Naturally, analytics has evolved from mathematics and statistics. Actually, analytics has its roots in operations research, which was applied during World War II. Only the improvement of IT and the increasing availability of computers enabled its growth into its current form as well as its simple application in business [7]. Obviously, since analytics is derived from statistics, as mentioned above, it was already present in ancient times. In the 1970s, Edgar Codd introduced relational databases, which make data analysis simpler. We can claim that relational databases, together with the development of SQL, strongly contributed to the increased interest in faster analysis of data. However, relational databases are applicable only when data are of a certain type. The invention of the Internet brought a new perspective on data and data sources, and it became necessary to come up with non-relational databases and subsequently with NoSQL. All of these steps were necessary for the later emergence of Big Data [8]. Actually, the term Big Data was used for the first time in 2005. Business intelligence, as a related part of business analytics, was first mentioned in 1965 but adopted in 1989. Similarly, data mining was introduced in the 1990s. In 1999, cloud analytics was already present. Predictive analytics was already used in the 1940s thanks to the first use of computers and the need of governments to predict the development of certain economic phenomena [9–11]. All of these aspects led to the current view of the sphere of analytics, which is very broad and confusingly structured. Currently, analytics is dealing with new concepts such as service analytics or artificial intelligence, and many others.

1.2 Analytics Defined

Analytics is a process [3, 12]. Its inputs are data, which are stored in various systems, clouds, databases, etc. They are processed through different methods (e.g. data mining, sorting, extrapolation, linear programming, etc.) in order to reach an output—in the form of information or knowledge. Basically, the main aim of business analytics is to deliver the right decision to the right people at the right time [13]. In the end, every company tries to find a competitive advantage, and analytics may be its source [14]. The term “analytics” is still considered a buzzword. It is currently quite overused, and therefore different applications of analytics occur in practice. Thus, its definition can also be understood from different perspectives, and it may vary from user to user and from company to company. Generally, scientists claim that analytics is “a concept of data-driven decision-making”. Probably the best way to define analytics is to describe it. Therefore, we can sum up that “analytics is the process of developing actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data” [15].

1.3 Why Analytics Matter

Nowadays, we can clearly claim that analytics really matters. It is the way data can be transformed into value for a company. This fact is widely known. However, companies also have to realize that analytics itself is not a lifesaver. It is necessary to align it with business strategy, business performance management, and day-to-day tasks and responsibilities [16]. As mentioned above, analytics results in “actionable insights”. This means that the output of analytics should, or even has to, lead to a change, an improvement or at least the stabilization of a trend. Additionally, since analytics is based on data, i.e. rational information, it should give a user a sense of reliability and validity—something a company can rely on. And this reliability and validity should also hold from a long-term perspective [15]. “Companies that embrace and effectively use analytics will be more successful than companies that don’t” [17]. Scientists and companies already know that analytics is a part of a successful future. Emphasis is placed on it already at primary schools. Analytics and big data are considered elements that will shape the future of higher education, even though they are something intangible and difficult to explain to ordinary people [11, 18]. It is clear that the latest interest in the sphere of analytics has been caused by improvements in computer and data science. Actually, analytics has seen its greatest growth in the last decade. Since analytics is able to improve a company’s decision-making, it also has a strong impact on the future of the whole company. This means that correct decision-making based on analytics can increase a company’s profits, market share and position in the market, and naturally also revenue and return to shareholders. It may seem that the main precondition for the successful use of analytics in a company is an understanding of business data. However, it might be vice versa: analytics may actually help a company to understand its data or, to be more specific, provide knowledge about the data from different perspectives. Since we live in the digital age, the mere presence of computers and data science does not give any company a competitive advantage. Analytics might seem to be a business need; however, it is also a people’s need. Even though we might not realize it, analytics is useful for ordinary citizens, for example when they come into contact with electronic services provided by the government. Analysis of historical data, prediction of future needs, data mining—all of these techniques are a benefit for citizens and their daily life. Examples of its use include a better understanding of criminality, fraud in social insurance, education planning and allocation of students, testing at schools, disease tracking, predicting road repairs, and many others [19–21].

1.4 Analytics and Service Science

Recently, researchers have introduced a new concept within the field of analytics—service analytics [22]. The new concept has not been properly reviewed yet; however, it lies at the interface between Service Science and business analytics [23]. Thus, our aim is to provide a theoretical basis for the sphere of the services sector and the related Service Science. Analytics is a powerful method for achieving a competitive advantage. Service organizations can use analytics in order to attract customers, improve internal processes and find a “niche” where they can exploit resources. Analytics is an enabler helping them to improve service offerings and innovate existing services, but it may also modify their relationship with customers to make it more responsive, fair, and aligned with customers’ desires [24, 25].

2 Service Analytics

2.1 E-services

E-services generate huge amounts of data. The main aim of service analytics is to capture, process, and analyze the collected data in order to advance, extend or personalize the service delivered to customers. Actually, service analytics enables e-service providers to generate better business outcomes, which they achieve thanks to the availability of data and the right insight into the data [26]. Service analytics enables the providers of e-services to discover the strengths, weaknesses, opportunities and threats of their business more easily, as well as to optimize processes, save costs and improve customer orientation and relationships with customers [27, 28]. If we compare service analytics to the broader topic of business analytics, we can deduce that service analytics belongs to the general group of business analytics; however, service analytics specializes in the analysis of provided e-services. Additionally, if we compare traditional human-based service delivery to service delivery via ICT, we can naturally claim that human-based or offline services are provided with a face-to-face connection between a provider and a customer. In many cases, the provider knows the customer, his/her needs and wants, and the reason why he/she decided on a certain service [29]. Based on this knowledge, the service provider can improve, extend or personalize the service in order to best fit the needs of the customer and, on the other hand, to achieve a better position in the market as well as higher profits in the future, or at least stable customers and stable profits. This face-to-face communication is not possible in the case of e-services. Thus, the knowledge obtained during the offline delivery of services is missing [30, 31]. However, e-services generate tons of data. The main task for an e-service provider is not only to store the data securely but to work with the data in order to extract as much information as possible, which will in return bring benefits for both sides—the customer as well as the provider [26].

2.2 Research Problem and Research Goal

After studying the available foreign and Slovak literature related to e-services, we identified the research problem that we are not fully aware of the situation regarding the use of analytics in Slovak organizations providing e-services. In an ideal world, companies should be aware of the power of data, but we cannot expect this globally for the whole country. Thus, the main scientific goal of our paper is to outline the current state of analytics use in companies providing electronic services (e-services) in Slovakia. This means that we are specifically focused on e-service analytics. The main scientific goal will be achieved thanks to its decomposition into the following partial goals:
• By the methods of primary research, find out whether service analytics is used in Slovak companies
• Identify the differences in service analytics use according to company size
• Identify the main problems that companies face with the use of service analytics
• Identify whether companies use internal or external data for service analytics or whether they rely on a combination of both types
• Identify whether service analytics influences decision-making in companies and to what extent
• Identify whether companies consider service analytics as a source of competitive advantage
• Identify how employees and managers approach analytics
• Find out whether the companies that do not use analytics collect, store and work with data
• Find out which factors cause companies not to use analytics.

2.3 Research Methodology

The quantitative research was carried out via the method of a questionnaire survey. The quantitative research had explorative characteristics. Its main aim was to outline the situation with the use of service analytics in Slovak companies. The questionnaire was distributed via email in an electronic form. We used Google Forms to create the questionnaire. The questionnaire was logically divided into three sections, since we sent it to a research sample without previous knowledge of whether they use service analytics or not. Thus, we sectioned it into three parts:
(a) Respondents who know what analytics is and use it in their company
(b) Respondents who know what analytics is but do not use it in their company
(c) Respondents who do not know what analytics is and do not use it in their company.

In the case of Group (a), we focused on how analytics is used, what its position is within the company from the perspective of organizational structure as well as strategy, what the main challenges are, and how analytics influences decision-making in e-service-providing companies. This section was the richest in questions because we needed to discover the as-is state in Slovak companies. It might seem that Group (b) would not directly help us with uncovering the state of service analytics use in Slovak companies; however, it provided us with valuable insights into the perception of data use and storage in Slovak companies as well as opinions about the perceived value of analytics and its potential use in the future. Last but not least, Group (c) enabled us to find out whether Slovak companies collect and store data and whether they further work with the data. The qualitative research consisted of unstructured personal interviews with 8 companies operating in the Slovak Republic and situated in Bratislava. The interviews were carried out with various representatives of selected companies about which we already knew that they use analytics. The interviews were attended by one, two or three representatives of the individual companies. All of the interviewed companies have more than 250 employees; therefore, we assign them to the group of large companies [32]. The research project consisted of three main parts:
(a) Preparation phase—identification of the research problem, the main research goal and partial goals, and preparation of the questionnaire survey and personal interviews
(b) Realization phase—collection and processing of data, i.e. distribution of questionnaires, realization of personal interviews and analysis of data in MS Excel
(c) Evaluation phase—analysis of the obtained data, interpretation of findings, verification of hypotheses and formulation of the main research contribution.

The research subject was primarily the state of the use of service analytics in Slovak companies as well as the identification of the main challenges they face when using analytics. The research object consisted of Slovak companies operating in the sphere of information technology and telecommunication, because only in the case of these two groups out of all Slovak companies could we expect:
(a) Use of analytics
(b) Providing of e-services.


The research technique consisted of a combination of quantitative and qualitative research methods—questionnaire survey and personal interviews with representatives of analytics departments of Slovak companies.

2.4 Research Sample

The research sample was selected based on the chosen research methods. For the personal interviews, we used an intentional sample—8 companies operating in the Slovak Republic, providing electronic services and using analytics, while the main limiting factor was the willingness of the representatives of the analytics departments to have a personal interview with us. We carried out personal interviews with 3 companies operating in the banking industry, 3 shared services centers and 2 telecommunication companies. For the questionnaire survey, the research sample consisted of companies for which we can assume the providing of e-services and the use of analytics, i.e. the application of service analytics techniques. Based on logical elimination, we narrowed the population down to the large group of Slovak companies operating in the sector of information technology and communication technology. We delimited the research population based on SK NACE codes; thus, the research population consists only of those SK NACE codes that cover the ICT sector. The list of SK NACE code names is available in Table 1. The full list is available in the Slovak language at the webpage of the Slovak Statistics Office [33]. Based on the revision of the SK NACE codes, we were able to select the suitable research population at the webpage Finstat.sk. We used the paid version, where we could search according to the entrepreneurial sector. We selected the information technology and telecommunication technology sector [34]. The questionnaire was created in Google Forms and consisted of three sections. Question no. 6 directed the respondents to the particular sections. The sections were dedicated to those respondents who:
(a) Know what analytics is and use it in their company
(b) Know what analytics is but do not use it in their company
(c) Do not know what analytics is and do not use it in their company.

Table 1 List of SK NACE codes and names for the sector of information technology and communication technology (Section J—information and communication)

Class   Name
58.11   Book publishing
58.12   Publishing of directories and catalogs
58.13   Publishing of newspapers
58.14   Publishing of magazines and periodicals
58.19   Other publishing activities
58.21   Computer games publishing
58.29   Other software publishing
59.11   Motion picture, video and television program production services
59.12   Supporting activities relating to the production of films, videos and television programs
59.13   Distribution of films, videos and television programs
59.14   Movie screening
59.20   Preparation and publication of sound recordings
60.10   Radio broadcasting
60.20   Television broadcasting and television subscription programs
61.10   Wired telecommunications activities
61.20   Wireless telecommunications activities
61.30   Satellite telecommunications activities
61.90   Other telecommunications activities
62.01   Computer programming
62.02   Computer consultancy
62.03   Computer accessories management activities
62.09   Other information technology and computer services
63.11   Data processing, web server provisioning and related services
63.12   Web portal services
63.91   Activities of news agencies
63.99   Other information services
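The delimitation of the research population by SK NACE class can be illustrated with a short sketch. This is illustrative only (the actual filtering was carried out through the paid Finstat.sk interface), and the company records and field names below are hypothetical:

```python
# Illustrative only: the actual selection was carried out in the paid
# Finstat.sk interface; the company records below are hypothetical.
ICT_NACE_CLASSES = {
    "58.11", "58.12", "58.13", "58.14", "58.19", "58.21", "58.29",
    "59.11", "59.12", "59.13", "59.14", "59.20", "60.10", "60.20",
    "61.10", "61.20", "61.30", "61.90", "62.01", "62.02", "62.03",
    "62.09", "63.11", "63.12", "63.91", "63.99",
}  # the classes listed in Table 1 (Section J)

companies = [
    {"name": "Example Soft s.r.o.", "nace": "62.01"},
    {"name": "Example Bakery s.r.o.", "nace": "10.71"},
]

# Keep only companies whose SK NACE class falls under the ICT sector.
population = [c for c in companies if c["nace"] in ICT_NACE_CLASSES]
print([c["name"] for c in population])  # -> ['Example Soft s.r.o.']
```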

The questions in the questionnaire were created to fit our research objective:
– Firstly, we needed to divide our respondents into three groups: those who use analytics, those who know what analytics is but do not use it, and those who do not know analytics and do not use it. This division enabled us to see the ratio between analytics users and those who do not use it. The main selection of respondents for the questionnaire was delimited by business focus. To be more specific, we focused only on those companies that operate in the ICT sector, because there is a higher probability that they will use analytics. Another perspective of this selection is that we are focused on service analytics and therefore primarily needed those companies that provide electronic services (e-services). The selection was made based on SK NACE codes, which distinguish companies based on their business focus. Furthermore, we asked companies about their business focus in the questionnaire in order to confirm our selection by SK NACE codes.
– We focused on the perceived importance of analytics for companies.
– We focused on data collection and storage in the case of those companies that do not use analytics.
– We focused on the most crucial challenges that companies face.
– We focused on the type of analytics that companies carry out.


– We focused on employees included in the analytics processes as well as on those who are the users of analytics outputs.

The target population of the questionnaire are companies that are:
– operating in the Slovak Republic
– providing electronic services.

According to the set elimination criteria, we selected only those companies that operate in the sector of information technology and communication. We filtered them in the paid version of Finstat.sk. The target population consisted of 16,117 companies. The random sample was drawn from the 4,598 companies for which we were able to find email contacts in the Finstat.sk database. With a population size of 4,598 companies, a margin of error of 0.1 and a confidence level of 0.95, we calculated the recommended sample size as 95 questionnaires. The minimum sample size was 67 questionnaires in the case of a margin of error of 0.1. We used the webpage www.raosoft.com for the calculation of the sample size [35]. In order to achieve a sufficient return of filled questionnaires, we had to send an email reminder asking respondents to fill in the questionnaire. In the end, we obtained 108 filled and valid questionnaires. From the perspective of statistics, we achieved more filled questionnaires than the recommended sample size; therefore, we consider the research sample sufficiently large.

Our research sample covers all 8 regions of the Slovak Republic. Most answers were obtained from companies operating in the Bratislava region (53.4%), followed by the Banská Bystrica region (14.3%) and the Trenčín region (10.8%). From the perspective of size, we divided the companies in the questionnaire into small (1–49 employees), middle-sized (50–249 employees) and large (above 250 employees). We obtained the filled questionnaire mostly from small companies (71.44%); the rest came from middle-sized companies (27.13%). Only one questionnaire was obtained from a large company (1.43%). We expected a low return rate from large companies, and that was one of the reasons why we also planned personal interviews with large companies. The composition of the research sample from the perspective of region and size of a company is shown in Fig. 2.

From the perspective of the legal form of a company, we received the majority of answers from limited liability companies (82.55%), followed by public limited companies (16.90%). Only 2 answers (0.54%) were from sole traders. The obtained composition according to legal form corresponds with the distribution in the Slovak IT sector, where most of the companies operate as limited liability companies. The questionnaire was intended to be filled in by managing employees, owners, statutory bodies, directors, CEOs, etc., which was also noted in the body of the email. This rule was reviewed in question no. 4, where we asked respondents for their position at work. We considered as valid only those questionnaires where the job role/position fulfilled our rule. Table 2 shows the job roles that respondents stated in question no. 4. Similarly, our research is focused on service analytics; therefore, we are focused only on those companies that provide electronic services.
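The recommended sample size quoted above can be reproduced with the standard formula for estimating a proportion with a finite population correction, which is essentially the calculation the Raosoft calculator performs. A minimal sketch, assuming the worst-case proportion p = 0.5:

```python
import math

def sample_size(population: int, margin_of_error: float,
                z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's formula for a proportion, with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 4,598 companies with e-mail contacts, margin of error 0.1, confidence 0.95.
print(sample_size(4598, 0.1, z=1.96))   # -> 95
# The minimum of 67 mentioned in the text matches the same formula at a 0.90
# confidence level (z = 1.645); we assume that is the setting it refers to.
print(sample_size(4598, 0.1, z=1.645))  # -> 67
```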


Fig. 2 Composition of the research sample according to region and size of a company—questionnaire survey

However, we needed to confirm this rule, and therefore we asked respondents about the focus of their business. Again, we used only those questionnaires where the business orientation fulfilled the rule of the ICT sector. In addition to this, some of the respondents specified a more detailed area within the ICT sector. Table 3 shows the business focus that respondents stated in question no. 5. In question no. 6, we divided the respondents into 3 groups:

(a) those who know what analytics is and use it in their company
(b) those who know what analytics is but do not use it in their company
(c) those who do not know what analytics is and do not use it in their company.

The question was obligatory for all respondents. 68 respondents (67.52%) belong to group (a)—they use analytics in their company. 35 respondents (31.63%) belong to group (b)—they know what analytics is but do not use it in their company.


Table 2 The position of the employee who filled this questionnaire

Question 4: State the position of the employee who filled this questionnaire
CEO | 19 | 17.59%
Owner/co-owner | 17 | 15.74%
Manager (general) | 15 | 13.89%
Manager (HR) | 3 | 2.78%
IT manager/CTO | 9 | 8.33%
Project manager/coordinator | 7 | 6.48%
Data analyst/business analyst | 15 | 13.89%
Programmer/developer | 6 | 5.56%
Delivery manager | 2 | 1.85%
Board member | 6 | 5.56%
IT consultant/consultant | 8 | 7.41%
Area manager | 1 | 0.93%
Total | 108 | 100.00%

Table 3 The business focus of the companies that filled the questionnaire

Question 5: State the business focus of your company
Webpage creation | 19 | 17.59%
Information technologies for the forestry industry | 1 | 0.93%
Internet services/Internet connection services | 9 | 8.33%
IT (general) | 28 | 25.93%
IT consulting | 8 | 7.41%
SW development | 23 | 21.30%
HW leasing/HW configuration | 3 | 2.78%
Cybersecurity | 1 | 0.93%
IT marketing | 2 | 1.85%
Navigation SW | 1 | 0.93%
R&D in IT | 2 | 1.85%
SW for school canteens | 1 | 0.93%
Software, licencing | 1 | 0.93%
Cloud services | 1 | 0.93%
SW for banks | 1 | 0.93%
Creation of online business systems | 1 | 0.93%
Telecommunications | 3 | 2.78%
SW for accounting | 1 | 0.93%
SW for procurement | 1 | 0.93%
Custom-made web apps | 1 | 0.93%
Total | 108 | 100.00%


5 respondents (0.85%) belong to group (c)—they do not know what analytics is and do not use it in their company.

From the perspective of small companies (1–49 employees; 84 respondents in total), the distribution according to question no. 6 was as follows: 49 respondents (58.33%) know and use analytics; 32 respondents (38.10%) know what analytics is but do not use it in their company; and the remaining 3 respondents (3.57%) do not know what analytics is and do not use it. In case of middle-sized companies (50–249 employees; 23 respondents in total), the distribution according to question no. 6 was as follows: 18 respondents (78.26%) know analytics and use it in their company; 3 respondents (13.04%) know what analytics is but do not use it in their company; and the remaining 2 respondents (8.70%) do not know what analytics is and do not use it in their company. From the perspective of large companies (more than 250 employees; 1 respondent), the respondent belongs to group (a), i.e. to those companies that know the term analytics and use it in their company. The visualization of the distribution of respondents into groups (a), (b) and (c) according to company size is shown in Fig. 3.

Fig. 3 Composition of the research sample according to the approach to service analytics—according to size of a company


The personal interviews were carried out in 8 large companies (more than 250 employees) that operate in Bratislava. Since we did not achieve a sufficient number of respondents from large companies in the questionnaire survey, we supported our research with personal interviews in order to complete the full picture of the use of service analytics in Slovak ICT companies. The interviews also enabled us to obtain information about the daily processes of the interviewed companies and the obstacles and challenges that they face in the area of analytics, and we were able to discuss best practices and success stories. The interviews were semi-structured. We applied several questions from our questionnaires; however, we wanted the interviews to be sufficiently open to achieve better insight into the companies' approach and attitude to analytics. Within the interviews we focused on the following main areas:
– Analytics and organizational structure—how the organizational structure influences analytics and its impact on decision making
– How analytics influences decision-making
– Attitude of employees towards analytics
– Technological perspective, used tools, obstacles in technology
– The major challenges
– How/what data are used.
The interviews were carried out with representatives of 8 large companies operating in the following business areas:
– Banking industry (3 companies)—we focused mainly on service analytics carried out on e-services provided within mobile banking applications
– Telco industry (2 companies)—we focused on service analytics carried out on telecommunication electronic services, internet services as well as B2B services
– Shared services centers operating primarily in the IT sector (3 companies)—we focused on service analytics carried out on data obtained from IT support help desks, cloud services and software delivery. The shared services centers also provide other services apart from those that belong to the ICT sector (e.g. outsourcing of accounting, contract analysis, etc.); however, we were purely focused on the area of e-service provision.
The interviews were carried out with representatives of data processing departments, business intelligence departments and business analytics departments. The interviews were attended by 1–3 employees, a mixture of males and females, and in some cases they were observed by a representative of the HR department due to security concerns. The respondents who participated in the interviews had at least 5 years of experience in the business sphere. Detailed information about the respondents is stated in Table 4.


Table 4 Composition of respondents of personal interviews

Business area | Number of respondents at the interview | Department/team | Experience of respondents
Banking industry | 2 (male) | Data processing department | 6 years; 15 years
Banking industry | 1 (male) | Business intelligence team | 5 years
Banking industry | 2 (male and female) | Business analytics team | 20+ years; 9 years
Telco industry | 2 (female, male); 1 observer from HR department (female) | Business intelligence department | 13 years; 11 years
Telco industry | 1 (female) | Business analytics team | 5 years
Shared services center | 2 (male) | Data analysis team | 10+ years; 8+ years
Shared services center | 1 (male); 1 observer (female) | Business intelligence team (center of excellence) | 5 years
Shared services center | 2 (male) | Business analytics department | 10+ years; 10+ years

3 Results and Discussion

This chapter contains the assessment of the empirical part of the paper, which consisted of two research methods: a questionnaire survey and personal interviews. The respondents from the questionnaire survey cover mostly the sector of small and middle-sized companies (except for one large company), and the respondents in the personal interviews cover the sector of large companies. In the following subchapters, we combine our findings from the questionnaire survey with findings from the personal interviews.

3.1 Analytics Department or Team, Organizational Structure, Internal Communication

In Section A of the questionnaire, i.e. the section with respondents who know what analytics is and use it in their company, we asked the respondents whether they have a specialized team or department for analytics processes. In general, 49 respondents (75.49%) do not have a specialized department and 19 do (24.51%). From the perspective of size of a company, all large companies (i.e. those from the questionnaire survey as well as those who attended the personal interviews) have either a specialized department or team. In case of middle-sized companies, 8 companies (35.04%) have a specialized department/team and the remaining 10 companies (64.96%) do not have such a department.


Lastly, in case of small companies, our research sample showed similar results as in case of middle-sized companies—10 respondents (16.56%) have a specialized analytics department/team and 39 companies (83.44%) do not. Thus, from the perspective of size, we can conclude that small and middle-sized companies located in Slovakia mostly do not tend to create analytics departments/teams. However, the situation changes significantly with an increasing number of employees—in case of large companies, we can see that all of the companies that attended our research have a specialized analytics department or team. From the perspective of the regions of Slovakia, we did not observe any specific deviations from the general distribution of answers for this question in the total sample.

From the perspective of the organizational structure, we observed that the majority of the companies in our research have a team, a department or a specialized employee who works with analytics. This is perceived as a positive factor. Researchers have already highlighted the fact that the main aim of companies in the area of analytics should be integration. Companies should integrate analytics as a business function, or more specifically, as business knowledge [1, 36]. As we observed from our questionnaires as well as interviews, integration of analytics teams/departments is nowadays a common practice, and companies understand that such a department is necessary for their success. The key factor in the positioning of an analytics department within a company is the perception of analytics as a main function—similarly as in the case of a financial department or an HR department. We observed that this level has already been achieved in the companies. Undoubtedly, as we could see during the technology boom, companies change under the pressure of innovation. If we consider companies in the 1980s or earlier, the IT department was only a supportive function. Today, IT is considered the main function enabling all other functions to work. The same scenario was visible with analytics. In the 2000s companies simply "played" with data and started to observe its value. 15 years later, analytics is considered an important function and a game changer. The interviewed companies confirmed that they had already used service analytics (or its forms and more basic functions) before it became a "buzzword" and a popular topic. It came as a natural part of their job, and only after a certain time did their analytics efforts turn into specialized teams, departments and individuals with specific knowledge.

According to practitioners, when allocating an analytics department within the organizational structure, it is important to discover where its placement would be the most beneficial [8]. A company has to consider where analytics fits best and where it will bring the most value for decision-making. Another perspective is effective management. An analytics department should be allocated (and aligned) with the ability to effectively manage its functions and employees and, at the same time, provide a sufficient internal service to the whole organization. Naturally, the organizational structure directly influences the communication flow between departments. The interviewed companies did not specify their type of organizational structure; however, they perceive their companies as having a matrix organizational structure. In such an organizational structure, it is typical that the communication flow might create obstacles and communication noise.


Since we perceive the organizational structure as one of the factors that can influence the success of service analytics in organizations, we focused on this topic in our research questions in the interviews as well as in the questionnaire survey. The correct communication flow between employees, teams or departments assures that organizational processes and goals are mutually presented, understood and interconnected. Therefore, we proposed the following statements under question no. 9 in Section A of the questionnaire:
• Other departments are aware of what the analytics department/team/employee does (statement f)
• Other teams share their goals with the analytics department/team/employee (statement g)
• The analytics department/team/employee shares the goals with other teams (statement h).
The respondents were requested to assess the statements on a scale from 0 to 10, where 0 means absolutely disagree with the statement and 10 means absolutely agree with the statement. Results are shown in Fig. 4. In case of goals sharing, the highest number of respondents agree with the statements that goals are mutually shared between the analytics department/team and other departments. Additionally, the companies perceive that other departments are aware of what is done within the analytics processes. In addition to this, companies that have an analytics department/team established did not show significant deviation from the results valid for the whole research sample. Thus, we can conclude that in case of small and middle-sized companies, communication of goals and analytics processes is not influenced by a formally established analytics team/department.

Fig. 4 Communication of goals and analytics processes


If we look at the answers from middle-sized companies in case of the statement "The analytics department/team/employee shares the goals with other teams", the obtained answers were rather ambiguous, and we cannot deduce a direct conclusion. The results are ambiguous also from the perspective of an established analytics department. Respondents chose almost equally among numbers 2, 3, 9 and 10 on the scale, as shown in Fig. 5.

Similarly, we requested large companies during the interviews to assess the position of the analytics team/department within the organizational structure. We asked the respondents whether they are satisfied with how the analytics team/department is organized within the company from the perspective of the organizational structure. The respondents were requested to answer the question on a scale from 0 to 10, where 0 means not satisfied at all and 10 extremely satisfied. The majority of the respondents were strongly satisfied, i.e. they mostly chose options higher than 6. Results are shown in Fig. 6.

Fig. 5 Analytics department/team/employee shares goals with other departments

Fig. 6 Satisfaction with the analytics department from the perspective of organizational structure in large companies


In general, we can conclude that our research showed that it is reasonable for large companies to establish a specialized analytics team/department that focuses purely on data processing, data mining or analytics. In case of small and middle-sized companies, this is not so typical. However, we have found out that having a specialized department/team does not have a significant impact on communication about analytics processes and goals within a company. The state of communication about analytics goals is perceived mostly positively. In case of small companies, employees are usually aware of analytics processes. In case of middle-sized companies, the perception of this state was ambiguous.

3.2 Approach to Service Analytics

In question no. 8, we asked respondents to select what type of analytics they carry out—analyzing historical data and trends, searching for reasons for certain outcomes or analyzing future trends. We proposed the following options:
• We analyze past data and trends (descriptive analytics)
• We analyze past data and trends and look for answers why something happened (diagnostic analytics)
• We forecast likely future trends (predictive analytics).
The question was also included in the personal interviews, where we asked large companies to agree or disagree with the proposed statements. As shown in Table 5, 86% of respondents that use analytics apply descriptive analytics methods (i.e. they describe past trends), 82% apply diagnostic analytics methods (i.e. they analyze why something happened), and 64% of respondents apply predictive analytics methods (i.e. they predict likely future outcomes and trends). In case of large companies, all of the interviewed companies as well as the one company that attended the questionnaire survey use descriptive analytics. Only one of them does not use predictive analytics and only two of them do not use diagnostic analytics. In case of large companies, we can see the best results for predictive analytics. We can assume that the level of analytics processes in large companies is more sophisticated than in small and middle-sized companies, which is why we obtained higher results for this type. In small companies, only 57% of respondents use predictive analytics, and in middle-sized companies it is 72%. We can conclude that with an increasing number of employees and more time dedicated to analytics, there is increasing interest among companies in predicting future trends.

3.3 Source of Data

In question no. 8, we also included statements about the use of data. We provided the respondents with the following options:

Table 5 Type of applied analytics according to size of a company

Comp. size | Total number | Descriptive analytics | Diagnostic analytics | Predictive analytics
Small | 49 (100%) | 41 (84%) | 40 (82%) | 28 (57%)
Middle-sized | 18 (100%) | 15 (83%) | 15 (83%) | 13 (72%)
Large | 9 (100%) | 9 (100%) | 7 (78%) | 8 (89%)
Total | 76 (100%) | 65 (86%) | 62 (82%) | 49 (64%)


Table 6 Type of data used within analytics

Comp. size | Total number | We only use internal data | We only use external data | We use the combination of internal and external data
Small | 49 (100%) | 12 (24%) | 2 (4%) | 35 (71%)
Middle-sized | 18 (100%) | 3 (17%) | 0 (0%) | 14 (78%)
Large | 9 (100%) | 2 (22%) | 0 (0%) | 7 (78%)
Total | 76 (100%) | 17 (22%) | 2 (3%) | 56 (74%)

• We only use internal data
• We only use external data
• We use the combination of internal and external data.
As shown in Table 6, the majority of companies rely on the combination of internal and external data (71% of small companies, 78% of middle-sized companies and 78% of large companies). Only two small companies use solely external data (4%). In case of data use, we do not see significant differences in the type of used data between small, middle-sized and large companies. Naturally, the combination of internal and external data might be the most relevant for the majority of companies, because external data may provide necessary information about the world outside the company.

3.4 Dashboards and Spreadsheets

In question no. 8, we proposed the option for respondents to select whether they use spreadsheets (e.g. MS Excel) and dashboards within analytics processes. We also asked this question during the interviews. All of the large companies included in our research use spreadsheets and 7 of them (78%) use dashboards. During the interviews, the companies explained that "Automated dashboards help us quickly visualize the data. We have learned that many people understand better when we provide visuals rather than numbers and words." In case of small companies, 53% use spreadsheets and 29% dashboards. 83% of middle-sized companies use spreadsheets and 39% use dashboards. During the interviews we also obtained the following information: "A company cannot do analytics without spreadsheets. There is always something you need to process in Excel." "Without dashboards we would not be able to highlight KPIs to employees. It is better for them to see the current state and whether we are improving the reaction time or not." The obtained results are shown in Table 7.


Table 7 The use of dashboards and spreadsheets according to size of a company

Company size | Total number | Dashboard use | Spreadsheet use
Small | 49 (100%) | 14 (29%) | 26 (53%)
Middle-sized | 18 (100%) | 7 (39%) | 15 (83%)
Large | 9 (100%) | 7 (78%) | 9 (100%)
Total | 76 (100%) | 28 (37%) | 50 (66%)

3.5 Impact of Service Analytics on Business Success and Decision-Making

According to our questionnaire survey, 52 respondents (76.47%) cannot imagine the success of their company without analytics. On the other hand, 9 respondents (13.24%) can imagine it and 7 respondents (10.29%) were not able to assess such a situation. In addition to this, 37 respondents (54.41%) expressed that their business changed after they started to use analytics. 24 respondents (42.65%) could not assess the situation and 7 respondents (9.24%) answered that their business did not change after they started to use analytics. According to these findings, we can conclude that the use of analytics has a positive impact on companies, because it shows them how a company should change its operation in order to be more successful. We did not find significant deviations in these areas from the perspective of company size. Furthermore, 38 companies (55.88%; consisting of 24 small companies, 11 middle-sized companies and 9 large companies) plan to use analytics more intensively in the future.

Within the questionnaire as well as the personal interviews, we proposed the following question: To what extent does analytics influence your business decisions? In case of large companies, 77.78% of respondents perceive that analytics influences 75–100% of their decisions. In case of middle-sized companies, the perception was almost evenly distributed among the four possible answers. In case of small companies, the most frequently selected answer was 50–75% of decisions, chosen by 20 respondents (40.82%), followed by 17 respondents (34.69%) who chose the option 25–50% of decisions. In the perception of analytics and its impact on decision-making, we can see significant differences according to the size of a company. This can be caused by the fact that in small companies, decisions are not made only based on data, because smaller companies usually have a better and closer connection with customers and can decide based on feelings and other factors. However, in case of large companies, we can see that decision-making is more significantly influenced by data and knowledge obtained from data. It is also connected with the fact that it is usually more difficult for large companies to know their customers very well, and therefore data are the best source for evidence-based decision-making. The results of this question are shown in Table 8 and Fig. 7.


Table 8 Extent to which analytics influences decision-making of companies

Extent to which analytics influences decision-making | Absolute number | Relative number (%)
Small companies | 49 | 100.00
Influences 0–25% of our decisions | 3 | 6.12
Influences 25–50% of our decisions | 17 | 34.69
Influences 50–75% of our decisions | 20 | 40.82
Influences 75–100% of our decisions | 9 | 18.37
Middle-sized companies | 18 | 100.00
Influences 0–25% of our decisions | 5 | 27.78
Influences 25–50% of our decisions | 5 | 27.78
Influences 50–75% of our decisions | 5 | 27.78
Influences 75–100% of our decisions | 3 | 16.67
Large companies | 9 | 100.00
Influences 50–75% of our decisions | 2 | 22.22
Influences 75–100% of our decisions | 7 | 77.78
Total | 76 | 100.00

Fig. 7 Extent to which analytics influences decision-making of companies

It is undeniable that deciding based on knowledge obtained from analytics has a direct impact on a company's profit. On the one hand, the investment in a successful analytics team might be costly, but on the other hand, its benefits may outweigh the cost. Moreover, an analytics team is one way a company can find a gap in the market and achieve a competitive advantage. The topic of competitive advantage was another viewpoint that we asked our respondents about during the personal interviews.


Fig. 8 Analytics as a source of competitive advantage

We wanted to know if they perceive analytics as a source of competitive advantage. We offered the respondents a scale from 0 to 10, where 0 means that they do not agree at all that analytics brings competitive advantage and 10 means that they extremely agree that analytics brings competitive advantage. The results, shown in Fig. 8, highlight the fact that the perception of analytics as assuring competitive advantage is more positive than negative. All of the respondents mentioned that the analytics team/department and the use of analytics are currently so deeply rooted in their work processes that they cannot imagine working without analytics. In addition to this, the respondents expressed the following opinions about the need to use analytics:
• "For instance, we discovered when the major need for our services occurs and decided to have more part-time and supporting employees during that time of the year."
• "We can better predict when interest in our services declines and react to it early."
• "Many of our departments provide IT support to various company teams in the world. Thanks to analytics we are able to discover when particular teams are requested the most and we can rotate employees between individual teams. It means that in January more employees are needed for the US team and in March we rotate them to the UK team, because that is the place where we need them the most, and it is also the way we can better plan our resources and avoid idle time."


4 Service Analytics and Strategy

Studies suggest that analytics is nowadays considered a crucial part of companies. The analytics strategy should be considered from the beginning together with the overall business strategy and both should be aligned; the analytics strategy should also consider the goals of other partial business strategies and vice versa. Another aspect that is important from the perspective of a high-quality analytics team is mutual trust—the company's departments trust the analytics output, and the analytics department can trust the data and information from other teams as well as understand their metrics, goals or KPIs. Goals of individual departments must be mutually aligned and shared. In addition to this, employees must be aware of the existence of the analytics department, of its processes, work, abilities, improvements, etc. Basically, sharing of information is considered a crucial part of a successful analytics department.

From a strategic perspective, the scientific literature suggests that it is beneficial for a company when its analytics strategy is aligned with its business objectives. Thus, during the interviews we asked the respondents whether their overall strategy covers the analytics strategy and whether they are mutually interconnected. Most of the respondents could not clearly state whether they are interconnected or not; the answers were ambiguous. We believe that this was caused by the fact that large companies are very complex, and the representatives of analytics departments cannot clearly assess the whole perspective of the company. We also offered the respondents a scale from 0 to 10 on which they should assess the mutual interconnection of the analytics strategy with the overall strategy; however, we did not obtain relevant answers that would show us a specific trend for large companies.

In the questionnaire survey, we also included the opinion about the interconnection of the analytics strategy with the overall strategy in question no. 9, where we proposed the statement: "The strategy of the analytics department is closely connected to the overall business strategy". We requested the respondents to assess their agreement with the statement on a scale from 0 to 10, where 0 means do not agree at all and 10 means totally agree. The respondents mostly chose numbers 8, 9 and 10 (11.76%, 13.24% and 23.53% respectively) on the scale, and the trend was similar for small as well as middle-sized companies. Thus, we can conclude that in case of small and middle-sized companies, the perception of the mutual interconnection of the analytics strategy with the overall business strategy is positive.

4.1 Service Analytics and Human Resources

In the questionnaire survey, we proposed several questions and statements related to the connection of human resources to service analytics. Firstly, we wanted to know whether employees are informed about the presence of analytics processes carried out within companies. 56 respondents (82.35%) expressed the opinion that employees are aware of the analytics department/team within the company.


54 respondents (79.41%) confirmed that the majority of employees can work with the output of analytics, i.e. with information or knowledge that is a result of analytics processes. In relation to this, 58 respondents in the questionnaire survey (85.29%) never considered outsourcing analytics to external companies, while 10 respondents (14.71%) considered it. In addition to this, during the interviews with large companies, we learned that none of the interviewed companies would outsource analytics processes to external companies at the moment; however, 3 of them confirmed that it may happen in the future, mostly because the scope of their analytics processes is constantly increasing. One of the interviewed companies (from the banking sector) expressed the opinion that it would be dangerous to outsource analytics processes, because the data they work with are very precious for their business success.

In the questionnaire survey, we introduced the following statement under question no. 9; the respondents were requested to assess the statement on a scale from 0 to 10, where 0 means do not agree at all and 10 means totally agree: Employees understand analytics processes very well. The obtained results, shown in Fig. 9, suggest that the respondents either were not sure about the statement (selected number 5 on the scale) or mostly agreed with the statement (selected number 8 and higher on the scale).

Another statement was focused on employees in managerial roles—top managers and business leaders. The following statement was proposed in the questionnaire survey as well as during the personal interviews; the respondents were again requested to assess the statement on a scale from 0 to 10, where 0 means do not agree at all and 10 means totally agree: Management and business leaders know what they can expect from the analytics team. The results in Fig. 10 suggest that in small companies, respondents had a tendency to agree with the statement. The situation was similar with middle-sized companies.

Fig. 9 Employees understand analytics processes well


Fig. 10 Management and business leaders know what they can expect from the analytics team

We can assume that communication in small and middle-sized companies is easier and more direct, and employees and management are better mutually informed about the processes and capabilities of individual teams. In small companies, where a specialized analytics team or department is rare, even managers can carry out analytics themselves. However, the situation is completely different in case of large companies. During the interview in one of the telecommunication companies, the respondent explained the opinion as follows: "Our business leaders often require analytics outputs that do not help them in reality. And then we are blamed." There was no strong tendency to agree with the statement, and the assessment of the statement was different in almost every company. In case of large companies, we cannot draw a clear conclusion about their opinion on this statement.

Furthermore, we proposed the following two statements:
• People capital is the major factor for analytics success.
• Technology is the main factor for analytics success.
We again asked the respondents in the questionnaire survey to express their agreement with the statements on a scale from 0 to 10, where 0 means do not agree at all and 10 totally agree. As we can see in Figs. 11 and 12, the answers were quite similar, and the tendency was to agree with both statements. In case of analytics, we can conclude that the capabilities of human resources and the advancement of technology are two factors that complement each other and have a direct impact on the success of analytics. During the interviews with large companies, all of them agreed that skilled employees are the main factor that assures the success of an analytics department. However, they also mentioned that this is one of their biggest challenges—to attract skilled people and retain them for longer than 2 years.


Fig. 11 People capital is the major factor for analytics success

Fig. 12 Technology is the major factor for analytics success

A respondent in one of the shared services centers explained that: "Even though we are using one of the most sophisticated analytics tools such as SPSS, our success does not lie in the software, but in the people who can perfectly work with the software and their willingness to improve."


4.2 Challenges Connected to Service Analytics

As with any other process in a company, analytics also faces various challenges. During our interviews we opened several areas of challenges and discussed with the companies whether they face similar problems. We also left the communication about challenges open in order to let the companies talk about the challenges that they perceive as crucial. We discussed the following areas of challenges:
• Financial challenges—analytics teams/departments do not have enough financial resources allocated for their day-to-day operation, for future needs, for development of employees, for up-to-date hardware and software, or for increasing the salaries of high-quality employees and managers to retain them for a longer time.
• Technological challenges—companies lack up-to-date technology and are unable to upgrade it due to various reasons (financial, building capacity, operational capacity, etc.).
• Security concerns—companies collect huge amounts of data and have trouble with security; the introduction of GDPR was mentioned as a difficult period by 2 of the companies, since they had to assure compliance.
• Insufficient skills of employees—according to all interviewed companies, the labor market does not offer enough skilled employees; or employees do not have particular skills required by the company; or employees are not able to react to changes (develop skills, attend education).
• Data quality and data management—all of the interviewed companies mentioned that they have already faced problems with data in some form. Either data were not correctly stored, they had problems with format or with compatibility of databases, some data were refreshed in bulk at a certain period of time and not available when needed; companies would prefer a wider range of data; databases are not sufficient; etc.
• Formulation of metrics—3 of the companies mentioned that they sometimes have misunderstandings in metrics formulation: business users do not specify or poorly specify KPIs and metrics, and the analytics team is unable to provide relevant outputs.
• Poor understanding of analytics output by employees outside the analytics team—this challenge was mentioned by one of the interviewed companies; since analytics might be difficult to carry out, employees within the organization might face problems with understanding the output, and they may not feel confident about the output or may not trust the data.
• Insufficient alignment of the analytics team within the organizational structure of a company—in one shared services center, the respondent explained that sometimes the analytics team is not perceived as a part of the company; it is not the first "go-to" department when decision-making takes place, communication channels are not explained and employees do not know where to look for analytics help; in addition to this, they have found out that other departments carry out analytics procedures on their own.
• Inability to quickly react to changing business needs—4 companies mentioned that they are not always sure they work with up-to-date tools and use the most relevant methods of looking at data.


One of the interviewed companies mentioned that when they hire new employees who already have some practice in analytics from different companies, these employees always bring new ideas and methods that have an impact on their regular analytics processes. Furthermore, we asked the interviewed respondents to evaluate the following statements related to analytics challenges on a scale from 0 to 10, where 0 means we do not face it at all and 10 means we face it very often. Afterwards we averaged the answers. Results are shown in Table 9 and Fig. 13. As we can see based on this question, the most crucial are technological challenges, data quality, insufficient skills of employees and financial challenges. Thus, we can conclude that analytics in companies mostly faces four areas of challenges:

• technology
• skills of people
• data quality
• money (cost).

Naturally, analytics teams require sufficient financial resources to be able to purchase new or upgraded analytics software or to invest in the further development of self-made software. Usually, money is tightly connected to technology and to the skills of employees. Analytics teams cannot be prosperous without highly educated, motivated and skilled employees; moreover, the skills of the employees need to be continuously developed and maintained. The companies also agreed that the labor market does not provide a sufficient quality of graduates who would be able to quickly fit into robust analytics teams, and it usually requires a lot of time to improve their skills and professional qualities. In addition to this, the shared services centers in Bratislava, where some of our interviews took place, face a huge fluctuation of the workforce, and therefore it is costly for companies to educate employees who are likely to move to competitors.

However, challenges occur in every sector of a company. They are an inseparable part of success. The key question is whether these challenges are properly communicated towards management and solved by the responsible bodies. Thus, we proposed the following question to the respondents during the personal interviews: To what extent are challenges with analytics being solved in your company? The respondents were requested to answer the question on a scale from 0 to 10, where 0 means the challenges are not solved at all and 10 means that the challenges are being solved very actively. The results of the question, shown in Fig. 14, do not give us a clear general answer about the situation in the companies. A majority of respondents (50%) chose number 5 on the scale, which shows that challenges are being solved with a certain activeness, but not actively enough. Some of the companies mentioned that some of the challenges are solved immediately, while others are not solved for several months or years.

In the questionnaire survey, the topic of challenges was included in question no. 16. In our questionnaire survey, there was only one large company, and the respondent stated that they face problems with insufficient quality and management of data.

Table 9 Analytics challenges—large companies (C is an abbreviation for company)

Analytics challenges | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | Average
Financial challenges | 7 | 8 | 6 | 3 | 4 | 6 | 6 | 8 | 6
Technological challenges | 9 | 8 | 7 | 4 | 5 | 8 | 8 | 10 | 7
Security concerns | 3 | 2 | 1 | 2 | 4 | 2 | 1 | 3 | 2
Insufficient skills of employees | 5 | 6 | 6 | 9 | 8 | 7 | 5 | 9 | 7
Data quality | 8 | 7 | 6 | 7 | 5 | 9 | 6 | 6 | 7
Data management | 3 | 4 | 4 | 5 | 4 | 2 | 1 | 4 | 3
Formulation of metrics | 1 | 0 | 1 | 2 | 4 | 1 | 1 | 3 | 2
Poor understanding of analytics output by employees outside the analytics team | 7 | 6 | 4 | 4 | 3 | 8 | 7 | 7 | 6
Insufficient alignment of the analytics team within the organizational structure of a company | 3 | 8 | 6 | 5 | 3 | 2 | 1 | 7 | 4
Inability to quickly react to changing business needs | 2 | 2 | 3 | 6 | 1 | 1 | 3 | 2 | 3
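As a brief worked illustration of the averaging described above (a simple arithmetic mean over the eight companies' scores is assumed, since the text does not spell out the computation), the average for financial challenges in Table 9 is obtained as:

\text{Average}_{\text{financial}} = \frac{7 + 8 + 6 + 3 + 4 + 6 + 6 + 8}{8} = \frac{48}{8} = 6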


Fig. 13 Seriousness of analytics issues—large companies

Fig. 14 How actively are issues solved—large companies



In case of small and middle-sized companies, the respondents stated that the most crucial challenges are:

• insufficient time to carry out analytics
• inability to promptly use information obtained by analytics for business needs
• insufficient quality and management of data
• insufficient financial resources
• insufficient technological equipment
• insufficient knowledge of employees.
The detailed answers are stated in Table 10.

4.3 Additional Findings

In the questionnaire survey, we proposed a list of well-known analytics tools and asked respondents to choose the ones that they know. The question was proposed in Section A as well as in Section B, i.e. for those respondents who know what analytics is and use it in their company as well as for those who know what analytics is but do not use it in their company. Our aim was to compare whether the knowledge about different analytics tools increases when companies actually use some of the tools. Based on the results shown in Table 11, we can see that the best-known analytics tools in both groups of companies are: SAP BI, Oracle BI, Microsoft Power BI, QlikView and Tableau. Both groups of respondents added Google Analytics to our list. The companies that use analytics also added IBM SPSS and Pohoda BI, and 4 of them (6%) mentioned that they use their own analytics software. According to our comparison, the companies that use analytics also have more information about the available commercial analytics tools.

4.4 Companies That Do Not Use Analytics

As mentioned above, our questionnaire survey was divided into three parts:

(a) Section A: respondents who know what analytics is and use it in their company (68 respondents)
(b) Section B: respondents who know what analytics is but do not use it in their company (35 respondents)
(c) Section C: respondents who do not know what analytics is and do not use it in their company (5 respondents).

According to the survey, 46% of those companies that know analytics but do not use it and 60% of those companies that do not know and do not use analytics collect and store data. As shown in Table 12, 31% and 40% of these companies (respectively) work with the stored data. Moreover, 31% of the respondents who know what analytics is but do not use it yet answered that they think that the use of analytics would contribute to the success of their company; 37% of them have already considered using analytics in their company.


Table 10 Challenges connected to analytics in small and middle-sized companies

Challenge | Small companies | Middle-sized companies
Challenges proposed by us
Insufficient knowledge how to work with analytics | 8 (16%) | 3 (17%)
Insufficient time for carrying out analytics | 21 (43%) | 7 (39%)
Insufficient financial resources for carrying out analytics | 10 (20%) | 1 (6%)
Insufficient technological equipment (hardware, software) | 10 (20%) | 2 (11%)
Insufficient data quality and management | 13 (27%) | 3 (17%)
Management and business leaders poorly understand output of analytics | 5 (10%) | 2 (11%)
We cannot promptly use information obtained from analytics for business needs | 16 (33%) | 5 (28%)
Analytics team/department is unsuitably positioned within the organizational structure | 3 (6%) | 0 (0%)
Challenges specified by respondents
Only I understand our data and it requires lots of time to analyze it | 1 (2%) | –
It is a very specific area which I cannot assign to anyone else | 1 (2%) | –
Our data are relatively accurate, but they change extremely quickly | 1 (2%) | –
Systems for data processing are labile | 1 (2%) | –


Table 11 Analytics tools recognized by respondents

Tool | Companies that use analytics | Companies that do not use analytics
Tools proposed by us
SAP BI | 25 (37%) | 8 (23%)
MicroStrategy | 6 (9%) | 4 (11%)
Sisense | 2 (3%) | –
SAS BI | 10 (15%) | 0 (0%)
YellowFin BI | 1 (1%) | –
Dundas BI | – | 1 (3%)
Microsoft Power BI | 17 (25%) | 5 (14%)
Clear Analytics | 10 (15%) | 4 (11%)
Oracle BI | 17 (25%) | 7 (20%)
QlikView | 14 (21%) | 5 (14%)
QlikSense | – | 1 (3%)
IBM Cognos | – | 1 (3%)
Tableau | 8 (12%) | 2 (6%)
Tools specified by respondents
IBM SPSS | 2 (3%) | –
Google Analytics | 6 (9%) | –
Pohoda BI | 3 (4%) | –
Metabase | 1 (1%) | –
Logi Analytics | 1 (1%) | –
Own software | 4 (6%) | –
3D experience platform | 1 (1%) | –

Table 12 Work with data in companies that do not use analytics

 | Companies that know analytics | Companies that do not know analytics
Does your company collect and store data about its activities?
Yes | 16 (46%) | 3 (60%)
No | 17 (49%) | 2 (40%)
I do not know | 2 (6%) | 0 (0%)
Does your company work with the stored data?
Yes | 11 (31%) | 2 (40%)
No | 21 (60%) | 3 (60%)
I do not know | 3 (9%) | 0 (0%)


We asked the companies that know what analytics is but do not use it in their company what the main obstacles are that limit them in the use of analytics. They specified the following:

• We do not need it in our company (4 respondents)
• We do not have enough time for analytics (6 respondents)
• We do not have enough money for analytics (2 respondents)
• We do not have enough employees for analytics (2 respondents)
• We do not have a sufficient amount of data (3 respondents)
• We do not have enough knowledge about analytics (2 respondents)
• Actually, we are not limited. We might start using it (2 respondents).

5 Conclusion

Service analytics is a new "buzzword" that has recently started to appear in the scientific literature. The interest in the subject is caused by two main factors: the world's economy is increasingly oriented towards services, and people are generating extreme amounts of data. The combination of the service economy and data analytics has brought about the topic of service analytics. The study of foreign literature sources opened a discussion that we do not know how service analytics is used in Slovak companies providing e-services; moreover, the English term "service analytics" does not even have a Slovak equivalent yet. Thus, we were reasonably interested in the situation in Slovakia. We set up a research project in order to uncover the situation with service analytics use in Slovakia, and we applied diverse scientific methods to solve the research problem. We primarily applied the empirical methods of a questionnaire survey and personal interviews. The empirical research methods were supported by analysis, synthesis, comparison, deduction, induction, abstraction, instantiation as well as generalization. The research sample for the questionnaire survey consisted of 108 respondents who filled in the questionnaire, which included 84 small companies, 23 middle-sized companies and 1 large company. Additionally, our research sample in case of the personal interviews consisted of 8 large companies operating in the banking, IT and telco sectors. Our findings may be interesting for practitioners, companies, researchers as well as developers of analytics tools.

We have found out that larger companies tend to create specific departments or teams oriented purely towards analytics processes. Small and middle-sized companies do not have this tendency, which is naturally influenced by a smaller number of employees. A majority of the companies included in our research apply methods of descriptive analytics and diagnostic analytics; however, predictive analytics is more typical for large companies. The majority of companies use the combination of external and internal data, so they do not focus only on data they generate but also on data that come from the external environment.


The use of dashboards is more typical in large companies than in small and middle-sized ones. More than 76% of our respondents cannot imagine the success of their company without analytics. Additionally, 54% of respondents confirmed that their business changed after they started to use analytics. On top of that, 56% of the respondents plan to use analytics more intensively in the future. Decision-making based on the information obtained by service analytics is more significant in large companies, where all of the large companies included in our research confirmed that analytics influences at least 50% of their decisions; more specifically, 78% of large companies confirmed that it is even more than 75% of decisions. In case of small and middle-sized companies, service analytics influences 50–75% of the decisions of 41% of them. In addition to this, companies confirmed that human resources together with technology are equally the most important factors for successful service analytics. From the perspective of outsourcing of analytics processes, which is currently very common in the most successful companies in the world, none of the interviewed companies perceives this as an actual solution for them. We have summarized that companies face several crucial challenges connected to service analytics, which include insufficient time, inability to promptly use information obtained by analytics for business needs, insufficient quality and management of data, insufficient financial resources, insufficient technological equipment and insufficient knowledge of employees. Our survey discovered that 46% of those companies that know analytics but do not use it and 60% of those companies that do not know and do not use analytics collect and store data. Moreover, 31% of the respondents who know what analytics is but do not use it yet answered that they think that the use of analytics would contribute to the success of their company; 37% of them have already considered using analytics in their company. The paper enables us to look inside Slovak companies providing electronic services and explain what the current state of the use of service analytics is. We believe that service analytics is a phenomenon that will be more frequently used, discussed and researched, and therefore we consider the information obtained in our paper a valuable source for all interested parties.

References

1. Veselý, P.: Technické normy a další metodiky pro bezpečnost ICT. In: Bezpečnostní vědy: úvod do teorie, metodologie a bezpečnostní terminologie, p. 175. Vydavatelství a nakladatelství Aleš Čeněk, Plzeň (2019)
2. Jurčák, V., Klimek, L., Porada, V., Veselý, P., Pawera, R.: Evropská bezpečnost. 4. In: Bezpečnostní vědy: úvod do teorie, metodologie a bezpečnostní terminologie, pp. 89–140. Vydavatelství a nakladatelství Aleš Čeněk, Plzeň (2019)
3. Schniederjans, M.J., Schniederjans, D.G., Starkey, C.M.: Business Analytics Principles, Concepts, and Applications: What, Why, and How. Pearson Education (2014)
4. Dinsmore, T.W.: A short history of analytics. In: Dinsmore, T.W. (ed.) Disruptive Analytics: Charting Your Strategy for Next-Generation Business Analytics, pp. 23–46. Apress, Berkeley, CA (2016)


5. Rich, D., Harris, J.G.: Why predictive analytics is a game-changer. In: Forbes. https://www.forbes.com/2010/04/01/analytics-best-buy-technology-data-companies-10-accenture.html (2010). Accessed 24 Jan 2021
6. Liew, A.: DIKIW: data, information, knowledge, intelligence, wisdom and their interrelationships. Bus. Manag. Dyn. 2, 14 (2013)
7. Business Analytics—Meaning, Importance and its Scope. https://www.managementstudyguide.com/business-analytics.htm. Accessed 24 Jan 2021
8. Veselý, P.: Definice kybernetické bezpečnosti, kybernetických útoků a kybernetické kriminality. In: Bezpečnostní vědy: úvod do teorie, metodologie a bezpečnostní terminologie, pp. 160–175. Vydavatelství a nakladatelství Aleš Čeněk, Plzeň (2019)
9. Bendre, M.R., Thool, V.R.: Analytics, challenges and applications in big data environment: a survey. J. Manag. Anal. 3, 206–239 (2016). https://doi.org/10.1080/23270012.2016.1186578
10. Chen, H., Chiang, R.H.L., Storey, V.C.: Business intelligence and analytics: from big data to big impact. MIS Q. 36, 1165–1188 (2012). https://doi.org/10.2307/41703503
11. Tien, J.M.: Big data: unleashing information. J. Syst. Sci. Syst. Eng. 22, 127–151 (2013). https://doi.org/10.1007/s11518-013-5219-4
12. Anandarajan, M., Harrison, T.D.: Aligning Business Strategies and Analytics: Bridging Between Theory and Practice, pp. 1–7. Springer International Publishing, Cham (2019)
13. Laursen, G.H.N., Thorlund, J.: Business Analytics for Managers: Taking Business Intelligence Beyond Reporting. Wiley (2016)
14. Davenport, T.H.: Competing on analytics. Harv. Bus. Rev. http://hosteddocs.ittoolbox.com/competinganalytics.pdf (2006). Accessed 24 Jan 2021
15. Cooper, A.: What is Analytics? Definition and Essential Characteristics, vol. 1, p. 10 (2012)
16. Acito, F., Khatri, V.: Business analytics: why now and what next? Bus. Horiz. 57, 565–570 (2014). https://doi.org/10.1016/j.bushor.2014.06.001
17. Simon, P.: Why Analytics Matter More Than Ever. https://www.philsimon.com/blog/data/bigdata/why-analytics-matter-more-than-ever/ (2015). Accessed 25 Jan 2021
18. Ebner, M., Schön, M.: Why learning analytics for primary education matters! Bull. Tech. Comm. Learn. Technol. 5 (2014)
19. Ansari, M.: Top 82+ Question and Answer Website List | Best Q&A Sites. https://www.seotutorialpoint.com/question-answer-website-list/ (2020). Accessed 14 Jan 2021
20. Hanna, N.K.: Mastering Digital Transformation, pp. i–xxvi. Emerald Group Publishing Limited (2016)
21. Saxena, S., Kumar Sharma, S.: Integrating big data in "e-Oman": opportunities and challenges. info 18, 79–97 (2016). https://doi.org/10.1108/info-04-2016-0016
22. Diao, Y., Jan, E., Li, Y., Rosu, D., Sailer, A.: Service analytics for IT service management. IBM J. Res. Dev. 60, 13:1–13:17 (2016). https://doi.org/10.1147/JRD.2016.2520620
23. Fromm, H., Habryn, F., Satzger, G.: Service analytics: leveraging data across enterprise boundaries for competitive advantage. In: Bäumer, U., Kreutter, P., Messner, W. (eds.) Globalization of Professional Services: Innovative Strategies, Successful Processes, Inspired Talent Management, and First-Hand Experiences, pp. 139–149. Springer, Berlin, Heidelberg (2012)
24. Carroll, N., Helfert, M.: Service capabilities within open innovation: revisiting the applicability of capability maturity models. J. Enterp. Inf. Manag. 28, 275–303 (2015). https://doi.org/10.1108/JEIM-10-2013-0078
25. Spohrer, J.C., Demirkan, H.: Introduction to the smart service systems: analytics, cognition, and innovation minitrack. In: 2015 48th Hawaii International Conference on System Sciences, pp. 1442–1442. IEEE, Kauai, HI (2015)
26. Cardoso, J., Hoxha, J., Fromm, H.: Service analytics. In: Cardoso, J., Fromm, H., Nickel, S., Satzger, G., Studer, R., Weinhardt, C. (eds.) Fundamentals of Service Systems, pp. 179–215. Springer International Publishing, Cham (2015)
27. Loukis, E., Pazalos, K., Salagara, A.: Transforming e-services evaluation data into business analytics using value models. Electron. Commer. Res. Appl. 11, 129–141 (2012). https://doi.org/10.1016/j.elerap.2011.12.004


28. Tian, C.H., Cao, R.Z., Zhang, H., Li, F., Ding, W., Ray, B.: Service analytics framework for web-delivered services. Int. J. Serv. Oper. Inf. 4, 317–332 (2009). https://doi.org/10.1504/IJSOI.2009.029182
29. Porada, V., Smejkal, V., Veselý, P.: Kybernetická bezpečnost. 6. In: Bezpečnostní vědy: úvod do teorie, metodologie a bezpečnostní terminologie, pp. 160–182. Vydavatelství a nakladatelství Aleš Čeněk, Plzeň (2019)
30. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N., Yasar, A.U.H.: A real-time service system in the cloud. J. Ambient Intell. Humaniz. Comput. 11, 961–977 (2020). https://doi.org/10.1007/s12652-019-01203-7
31. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Computing 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z
32. Šrenkel, Ľ.: Malý, stredný alebo mikro podnik—veľkostné kritériá. In: Podnikajte.sk. https://www.podnikajte.sk/podpora-podnikania/maly-stredny-mikro-podnik (2016). Accessed 25 Jan 2021
33. ŠTATISTICKÝ ÚRAD SLOVENSKEJ REPUBLIKY. 183
34. FinStat.sk—information on all Slovak companies. https://finstat.sk/information-of-slovak-companies. Accessed 25 Jan 2021
35. Sample Size Calculator by Raosoft, Inc. http://www.raosoft.com/samplesize.html. Accessed 25 Jan 2021
36. Parmar, R.: Driving innovation through data (2014)

Managing Quality of Human-Based Electronic Services

Zuzana Takacsova and Sergiy Masalitin

Abstract Automation and outsourcing of different tasks have become increasingly popular over the last years. Globalization and technology make it possible to allocate work to workers across the globe and thus save time, costs and resources for a company. Human-Based Electronic Services provide different options for outsourcing tasks that cannot be fully automated. Managing quality is one of the most important elements in this field, as low-quality solutions may lead to delays in delivery, exceeded budgets and overall dissatisfaction. Various types of Human-Based Electronic Services are analysed, potential threats to quality management are examined, and workflows that improve quality assurance during their use are described. By selecting the right type and approach, and by combining different services with outsourcing, the result can be optimized in terms of time, quality or budget.

Keywords Crowdsourcing · Quality management · Outsourcing · Human-based electronic services

Z. Takacsova (B)
Faculty of Management, Comenius University, Odbojarov 10, 82005 Bratislava, Slovakia
e-mail: [email protected]

S. Masalitin
University of Vienna, Oskar Morgenstern Platz 1, 1090 Vienna, Austria
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Maranda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_4

1 Introduction

1.1 Relevance

Globalisation, the standardisation of business processes and the development of communication technologies enable effective cooperation between people and organisations across the globe. Modern companies profit from the effective use of outsourcing by means of telecommunication technologies.


Delegating some business processes at a low cost seems to be a desirable prospect for employers, e.g., by using labour in cheaper locations. The first remarkable player on this market was the portal Elance-oDesk (currently called Upwork), founded in 1999. In 2004–2005, further powerful players such as mTurk.com by Amazon and UpWork.com entered the market. This development coined a new term, Human-Based Electronic Services [1]. Despite the similarity of their ideas and working principles, these services show remarkable differences caused by the variety of task types and levels of people's engagement. Although these platforms have become more popular, their use has revealed some negative features. Among these disadvantages are: (1) direct supervision of service workers, (2) absence of project performance control, (3) difficulties in recognizing a worker's qualification and (4) lack of possibilities to predict and influence the task results before the work is finished. The economic rationale for using these services can be undermined by the need for continuous control by requesters due to insufficient service quality. Relevant studies focusing on the question of quality in human-based services date back to 2011 [2]. These issues may lead to exceeded budgets, caused by additional iterations of the tasks or the necessity of complete rework.

The first part of this chapter describes quality management as performed by the users of such electronic services. The theoretical background is followed by an analysis of how the most common types of Human-Based Electronic Services work, their business models and their quality management approaches. The purpose of this chapter is to define the requirements for quality management systems in Human-Based Electronic Services and to give an overview of the quality control methods in use.

2 Theoretical and Conceptual Background

2.1 Human Intelligence Tasks (HIT)

It is largely clear which diverse tasks can be automated and which types of tasks can be fully automated, but automation also has certain limitations. There are still human intelligence tasks (HIT), normally understood as tasks that require human intelligence or action. The definition of HIT is closely connected to the meaning of human intelligence itself [1]. Cognitive abilities, i.e. understanding, learning, applying logic, recognizing and solving problems, and making decisions, are crucial components of human intelligence. The term HIT is used on MTurk.com by Amazon, where it is defined as one accurately specified and structured task which is performed by a service worker using his intellectual capabilities.


2.2 Human-Based Electronic Services

The efficiency of running a business depends considerably on its level of rationalization (e.g. cost reduction, revenue growth). The Internet enables human intelligence tasks to be performed virtually, so that workers effectively provide companies with electronic services. This phenomenon has been driven by the increasing rationalization of business processes and the outsourcing of human capital. In other words, these two developments, coming from opposite directions (employer and employee), have introduced a new product area, a market of human-based electronic services: services that are the result of human activity performed virtually and/or provided on a market [1]. In this chapter, this definition refers to electronic online portals that supply the infrastructure to carry out such services, exemplified by mTurk.com, TextBroker.com, DesignenLassen.de and UpWork.com [3–6].

2.3 Crowdsourcing

Crowdsourcing is defined by Howe [7] as the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call. It represents the connection between workforce demand and supply. The main promoted added values of crowdsourcing can be clustered as follows:

Crowd creativity. Tapping talent pools in order to develop and design products in different fields. It is also applicable in the original art industry and media, including advertising, film, photo and video production, graphic design, apparel, consumer goods, and branding concepts. Examples are 99Designs, a platform that hosts public design contests, and Spreadshirt, a platform for the crowdsourced design of personalized apparel [8, 9].

Crowd knowledge. Development of knowledge assets or information resources from a distributed pool of contributors. Crowdsourcing is used to develop, aggregate, and share knowledge and information through open Q&A, user-generated knowledge systems, news, citizen journalism, and forecasting. Wikipedia is a key example of the concept of the wisdom of crowds introduced by Surowiecki, which holds that a high level of collective intelligence can be reached when the imperfect intuitive judgements of a group of people are aggregated in the right way [10]; a small illustrative simulation of this effect follows below.

Crowdsourcing involves coordination, motivation, contracting, steady development and cooperation. On the other hand, there is a lack of employment rights. Cloud labor is usually considered a particular form of crowdsourcing in which the activities tend to be more specific, so-called micro jobs.
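The aggregation effect mentioned above can be illustrated with a small, purely hypothetical simulation (the quantity, crowd size and noise level are invented for this sketch and are not taken from the chapter's sources): many noisy individual estimates, combined by a simple mean, usually land much closer to the true value than a typical individual estimate does.

```python
import random
import statistics

# Illustrative only: each "crowd member" gives a noisy, individually
# unreliable estimate of an unknown quantity; the simple mean of all
# estimates tends to be far more accurate than a typical member.
random.seed(42)
TRUE_VALUE = 1000          # hypothetical quantity to be estimated
CROWD_SIZE = 500

estimates = [random.gauss(TRUE_VALUE, 250) for _ in range(CROWD_SIZE)]

crowd_estimate = statistics.mean(estimates)
typical_individual_error = statistics.mean(
    abs(e - TRUE_VALUE) for e in estimates
)

print(f"crowd estimate:           {crowd_estimate:.1f}")
print(f"crowd error:              {abs(crowd_estimate - TRUE_VALUE):.1f}")
print(f"typical individual error: {typical_individual_error:.1f}")
```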


2.4 Quality of a Service

Even though Quality of Service (QoS) cannot be called an indispensable characteristic of a product or service, it remains a crucial attribute and can be estimated by the level of consumer satisfaction. The ISO standard 9000:2005 defines 'quality' as the degree to which a set of inherent characteristics fulfils requirements [11]. The traditional definition of quality is based on the viewpoint that products and services must meet the requirements of those who use them [12]. Quality is generally considered optimal if the expectations of the requestor have been perfectly satisfied; if they have not, this is caused by a gap between the product characteristics and the customer's expectations. In line with the gap theory of Parasuraman et al. [13], the quality that a customer perceives in a service is a function of the magnitude and direction of the gap between expected service and perceived service. In this chapter, QoS therefore denotes the accuracy with which the supplied service fulfils the requestor's expectations.
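The gap idea can be written compactly. The notation below is ours, a simplified SERVQUAL-style reading of [13], rather than a formula taken from the chapter's sources:

$$G_i = P_i - E_i, \qquad Q \approx \frac{1}{n}\sum_{i=1}^{n} G_i ,$$

where $E_i$ is the expected and $P_i$ the perceived level of the $i$-th quality determinant; a negative overall gap $Q$ indicates that the delivered service falls short of the requestor's expectations.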

2.5 Quality Management

According to the ISO standard 9000:2005, the term management refers to all the activities that are used to coordinate, direct, and control an organization [11]. Quality management includes all the activities that organizations use to direct, control, and coordinate quality. These activities include formulating a quality policy and setting quality objectives, as well as quality planning, quality control, quality assurance, and quality improvement [11]. In this chapter, QM refers to the efforts and actions of the party performing management (service requestors, platform managers or specific system tools of the platform) in order to carry out tasks effectively and efficiently.

The term quality management system (QMS) can be understood as the elements of an organization that interact with each other to manage responsibilities, processes, and resources related to the implementation of quality policies. In the ISO standard 9000:2005, a QMS is defined as a set of interrelated or interacting elements that organizations use to direct and control how quality policies are implemented and quality objectives are achieved [11].

3 Key Quality Requirements for Human-Based Electronic Services

In the service-oriented computing paradigm and the Web service architecture, the broker role is a key facilitator to leverage technical capabilities of loose coupling to achieve organizational capabilities of dynamic customer-provider relationships [14].


Table 1 Determinants of service quality defined in [13], applied to Human-Based Electronic Services according to [1]

Characteristic  | Description
----------------|------------------------------------------------------------
Reliability     | Consistency, dependability, accuracy, correctness, timely delivery of results
Responsiveness  | Willingness or readiness of employees to provide service
Competence      | Skills and knowledge to perform the service
Access          | Approachability and ease of contact
Courtesy        | Politeness, respect, consideration, and friendliness of contact personnel
Communication   | Active listening and elaborating on the clients' needs, keeping the clients up-to-date
Credibility     | Trustworthiness, believability, honesty
Security        | Freedom from danger, risk, or doubt
Understanding   | Making the effort to understand the customer's needs
Tangibles       | Physical evidence of the service

Quality is not an essential attribute of a product or a service but can be determined by considering the expectations of the Service Requester. The traditional definition of quality is based on the viewpoint that products and services must meet the requirements of those who use them [12]. Electronic services may have specific quality benchmarks [15]. For the general case, the following main requirements can be highlighted according to [1]:

• Performance: ability to return a result within certain response time limits and deadlines
• Scalability: ability to handle a certain average and peak number of requests
• Availability: ability to provide the service continuously
• Correctness: ability to return a minimum percentage of correct results
• Economic effectiveness: ability to provide a comparably inexpensive service for the Service Requesters and a comparably high and attractive payment for the Workers.

Further "determinants of service quality" can be highlighted. In this context, the phrase "determinants of service quality" should be read as requirements on the workers who stand behind the Human-Based Electronic Services, not on the system itself (Table 1).

3.1 Quality Management Approaches

Numerous QM approaches are described by different researchers in the scientific literature. The pros and cons of the main approaches were analysed and are summarised below.


Qualification Tests and Ranking Systems. Qualification tests are widely used for assessing the qualifications of workers before they start working on their tasks. A qualification test is usually a list of questions that represents a real task or theoretical problem the worker will face during the assigned duty. Depending on the type of test, it may be assessed manually or by an automatic algorithm, and the result yields an error rate for the applicant. The error rate can be used to rank applicants in order to select the best of them, and it may serve as a parameter in the system's subsequent Quality Control (QC) algorithms. Systems such as MTurk.com and CrowdFlower use the error rate as a fundamental parameter in their QC systems: it helps to draw a distinct line between spammers and unskilled applicants and is one of the fundamental parameters in more sophisticated Quality Management (QM) algorithms. A further advantage of qualification tests is that they can be used for training purposes, and it is usually not difficult to add scoring to existing qualification tests [16]. Different rating systems for ranking workers are also available. In some systems, the worker's rating and score are assigned automatically after the results of the work are provided; this is usually applied to simple tasks. Other systems use crowdsourcing to provide this scoring, or allow service requesters to rank workers manually by adding notes, comments or stars.

Output-Based Quality Management. A wide variety of approaches used to measure accuracy are output-based: they assess quality based on the results received from the workers. Within this variety, some algorithms rely on crowd-based QM, returning tasks to the workers' pool until the quality of the work reaches the desired level. Another subcase of output-based quality management is goal-based QM or "gold patterns", applicable when the final result of deterministic tasks is already known [17, 18]. In this case, the average accuracy rate can be measured by randomly sending such tasks to the workers and comparing the received results with the known results of those tasks [19] (a minimal sketch of this idea follows below). An alternative option is to have a manager responsible for project implementation and QM. This manager can be provided by the platform itself, which assigns these duties to one or a few managers, or can be appointed by the service requester; usually this raises the cost of the QM service.

Execution Process Monitoring. Execution process monitoring is a quality management approach aimed at obtaining information about the way a task is performed by monitoring, tracking, or analysing the actual execution process, rather than at assessing the accuracy or quality of the results. The best-known example of such an approach is Upwork's Work Diary, which takes screenshots on workers' computers and makes them available to service requesters. This allows requesters to track, monitor and document whether a worker is working on the project and demonstrates the worker's effectiveness. It is optional and available for projects that use a pay-per-hour payment model [20].
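A minimal sketch of the gold-pattern and error-rate ideas described above: worker error rates are estimated by comparing submitted answers against tasks whose correct results are already known, and the rates are then used to rank and filter workers. All worker IDs, answers and the threshold below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: worker answers to "gold" tasks whose correct
# results are already known to the requester (gold-pattern QM).
gold_answers = {"t1": "cat", "t2": "dog", "t3": "bird"}
submissions = [                      # (worker_id, task_id, answer)
    ("w1", "t1", "cat"), ("w1", "t2", "dog"), ("w1", "t3", "bird"),
    ("w2", "t1", "cat"), ("w2", "t2", "cow"), ("w2", "t3", "bird"),
    ("w3", "t1", "dog"), ("w3", "t2", "cow"), ("w3", "t3", "fish"),
]

counts = defaultdict(lambda: [0, 0])          # worker -> [errors, total]
for worker, task, answer in submissions:
    counts[worker][1] += 1
    if answer != gold_answers[task]:
        counts[worker][0] += 1

error_rates = {w: errs / total for w, (errs, total) in counts.items()}

# Rank workers by error rate; a simple threshold separates spammers or
# unskilled applicants from workers admitted to the real tasks.
THRESHOLD = 0.4
ranking = sorted(error_rates.items(), key=lambda kv: kv[1])
admitted = [w for w, rate in ranking if rate <= THRESHOLD]

print("error rates:", error_rates)    # e.g. w1: 0.0, w2: 0.33, w3: 1.0
print("admitted workers:", admitted)  # ['w1', 'w2']
```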


A more sophisticated mechanism for monitoring the execution process was developed in 2011 [21]. This mechanism, introduced as CrowdScape, uses JavaScript to gather information about workers' behaviour, its deviations, and the way it affects result quality. CrowdScape transforms these statistical data into a task fingerprint, which allows the accuracy and quality of a worker's result to be estimated, and it combines this approach with additional approaches such as crowd-based and gold-based QM. By combining and visualising predictions and insights gathered through machine learning tools, CrowdScape aims to give service requesters a better analytical tool for exploiting the crowd.

Response Time Management. There are many applications where the response time of Human-Based Electronic Services matters. It is especially important when the user awaits a response from an application that is generated by means of Human-Based Electronic Services. An example is the VizWiz app [22], developed to help blind people take a picture with a camera and then receive a verbal description of what they captured. The application guides the blind user through the picture-taking and voice-recording process with its own voice guidance and then sends the picture with the voice comment to MTurk workers, who describe what is depicted in the picture. In the promo video, a user takes a picture of a T-shirt, asks in an audio comment what is written on it, and receives an audio answer from the application. Similar ideas could be used in other applications, such as creating text translations based on an online video stream; in such cases, response time management becomes critical [16]. In general, the response time can be divided into two parts: the delay from the submission of the task until the task starts being executed, and the time needed for the actual task execution. Response time can be improved by parallelizing the task to be performed, with the subtasks then executed by a group of workers (a small illustrative sketch is given at the end of this subsection). These facts directly influence the methods and algorithms aimed at reducing response time.

High-Level Management Outsourcing. The availability of different services and approaches for organising Human-Based Electronic Services on the market suggests combining them into a highly self-organizable, convenient tool for solving a wide variety of tasks. Services like MTurk, with its focus on identical recurring tasks, can be combined with, e.g., UpWork services focused on more sophisticated tasks. This combination makes it possible to complete and properly control a large number of tasks, both at the level of individual tasks and from the overall project Quality Assurance (QA) point of view. For example, transforming medical record books into electronic form requires an MTurk-like workforce to perform the actual tasks and some higher-level supervision, which can be performed by UpWork-like freelancers hired according to the required qualification. This combination provides relatively effortless yet greater control over the whole project execution.
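To make the parallelization point from the response-time discussion concrete, the toy sketch below (our own illustration, not taken from any platform) splits one task into subtasks and dispatches them to a pool of simulated workers; in a real setting, the queueing delay before a crowd worker accepts the assignment would be added on top of the measured execution time.

```python
import concurrent.futures
import time

def execute_subtask(chunk):
    """Stand-in for handing one subtask to a crowd worker."""
    time.sleep(0.2)                      # pretend execution time
    return f"result({chunk})"

task = list(range(8))                    # a task split into 8 subtasks
chunks = [task[i::4] for i in range(4)]  # distribute over 4 "workers"

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(execute_subtask, chunks))
elapsed = time.perf_counter() - start

# With 4 parallel workers the execution part of the response time is
# roughly a quarter of the sequential case (queueing delay excluded).
print(results)
print(f"elapsed: {elapsed:.2f}s")
```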


4 The Economical Background of Human-Based Electronic Services

Outsourcing internal business processes to Human-Based Electronic Services undoubtedly brings economic benefit to a company. Besides the advantages of scalability and the delegation of labour cost management, the price difference between such services and non-electronic outsourcing is remarkable. For example, logo development on the DesignenLassen.de platform costs about 200–500 euro, while the minimum market price is set at 225 euro [5]; projects on MTurk.com show the same trend. According to the investigation carried out by Horton, an average service worker on MTurk.com (a Turker) agrees to perform tasks for about half of the average salary in the US [23]. Moreover, the average US salary is above the living wage in developing countries, so a considerable labour force is available to serve US requesters. It should also be mentioned that MTurk.com introduced a cash payment system for its workers precisely to increase their number in India [24]. A study from 2013 stated that, despite the important role pricing plays in crowdsourcing campaigns and the complexity of the market, most platforms do not provide requesters with appropriate tools for effective pricing and allocation of tasks [25]. Therefore we also examine the pricing models of the analysed platforms.

5 Human-Based Electronic Services Overview: Business Models and Used Quality Management Systems

After outlining the main features and methods of quality management in Human-Based Electronic Services, the key services available online should be examined. Every mentioned service follows the scheme Service Requester → online Service Platform → Service Worker, or a combination of it. Service Requesters access the Service Platform and create a request which, as soon as it appears in the system, becomes available to the service workers (humans). Four platforms have been chosen for study in this chapter: UpWork.com, DesignenLassen.de, TextBroker.com and MTurk.com, as representatives of the main business models for supplying Human-Based Electronic Services. The platforms differ in the mechanisms and models for handling a service request, in the typical tasks that can be processed with them, and in the applied quality control systems.

Another special type of work request is the checking of computing results, where a computer is the Service Requester. A practical example of this scheme are tutoring computer systems, which are involved in creating artificial intelligence. This case is not relevant for this research, because here a machine becomes the service requester.

Within our study, we focus neither on issues of work relations between requester and performer, nor on the regulation of the platforms that provide these kinds of services, as there is no respective regulatory framework for the Internet, and existing employment and labour laws for local markets and international relations cannot cover the topic adequately. For the same reason, we do not address copyright issues within this scope. To investigate the quality control capabilities and requirements of electronically provided human services, we study the business models applied by these platforms and their structural schemes.

5.1 UpWork.com

The UpWork platform is an expert crowdsourcing website that emerged from two strong companies in this field, oDesk and Elance. On the UpWork platform a service requester can solve a wide range of issues, ranging from website development, design projects, software development and SEO optimization to data analysis and work with large data sets. UpWork.com takes 5 to 20% of the money paid by requesters for completed projects. The platform allows users to create three types of projects: fixed-price, paid by the hour, and contest projects.

In the case of a fixed-price project, the requester sets up a service request in the form of a posting [6]. He gives a detailed project description that defines the task, indicating the performance time and an approximate price he is ready to pay. Within the first minutes, some of the roughly one million registered freelancers (here: service workers) who have seen the project respond with an offer indicating the price and the time needed to perform it. Normally, a qualification profile and previously performed tasks are attached to this offer, which enables an indirect qualification test. Communication between customer and service worker, and consequently execution process monitoring, is carried out both within the platform using the embedded communication tools (chat, messages) and outside it. After the worker has completed a milestone of the project, the requester checks it and, if the results meet expectations, pays the service worker [6]. After the project is completed, a mutual assessment takes place in which the worker and the requester each receive a rating and comments about the project. This information is available to all platform users and feeds the ranking system, which is used as one of the elements of QoS management.

The second type of request offered on the platform is paid by the hour. The selection process for a service worker is the same as in the project type mentioned above, but the control of the performance process is more automated, using the Team App application developed by UpWork.com and the online Work Diary service working along with it. The Team App program, installed on the worker's computer, records how much time the project takes, creating a track file that includes screenshots [20]. This track file is then sent to the UpWork Internet platform, into the Work Diary web application, which is available on the active-tasks panel of both the service worker and the requester. With this feature, the requester can estimate the quality of the project performance during its execution and also track the real time spent. The applied mechanism of quality control can thus be characterised as execution process monitoring. For such requests, output- and ranking-based QM is also applied.

The third kind of order is a contest. The requester defines the requirements for the result, the time frame and the payment rate. Performers submit their accomplished works, and the requester rates them. The work with the highest rating wins and is rewarded. This type of project is especially common and advantageous for design projects, which is why it is described more precisely in Sect. 5.3 of this chapter, covering a service specialized in creating graphic design. This workflow uses output- and ranking-based QM.

5.2 Textbroker.com

The Textbroker platform is similar to UpWork but focuses exclusively on creating text content and includes some specific mechanisms of quality control. Its worker pool currently amounts to about one million registered copywriters. The platform earns its profit from the difference between what it pays service workers and the rate charged to the requester; the difference between the price per word paid by the requester and the price paid to the copywriter ranges from 45 to 85% [4].

The platform offers two types of orders: Self-Service and Managed Service. In the case of Self-Service, the service requester places an order as in a fixed-price project on UpWork.com; the workflow and the applied quality control systems are identical to those of UpWork.com, i.e. output- and ranking-based QM. A specific feature of this service is the possibility of using the method described earlier as High-Level Management Outsourcing to control quality: a dedicated manager is placed between the platform itself and the service requester on the TextBroker.com platform [26–28]. This manager leads and controls the request. TextBroker's managers are selected from the portal's proven copywriters who qualify for this position, and experienced service workers can be promoted to managers after successfully completing various tasks.

Working with Managed Services also starts with a service request. As soon as a request is made, a manager qualified to work with this type of project takes charge of it. The manager offers consultations and defines the requirements for the project. The personal manager then redirects the project to specially selected copywriters. When the task is completed, the manager verifies that it meets the requirements before the requester sees it, and in this way service quality is controlled.


5.3 Designenlassen.de

The Internet portal designenlassen.de offers design services and is based on contests among its community members. This working scheme corresponds to a contest project on UpWork.com. Once a service request has been made, service workers upload their designs for the task. There are generally about 100 design proposals per project, depending on the time frame and the payment rate. During the "contest" the requester leaves a comment under each proposal, rates it, and can also express wishes or give advice on the design; this illustrates the execution process monitoring approach to QM. When the time is up, the requester reviews all submitted design variants and decides which one is best and will receive the reward [5]. The service infrastructure also includes moderators who, for a modest fixed additional payment, offer consultations and the possibility of admitting only the best designers to the project [5].

Service quality is determined, on the one hand, by a high-quality infrastructure designed for cooperation with the requester and, on the other hand, by the requester's capacity to adapt the offered framework to his business needs and, above all, to evaluate work results and the quality of workers' performance quickly and accurately. These platforms are therefore also considered an example of the output-based QM approach and of the application of High-Level Management.

5.4 MTurk.com

MTurk.com was launched in 2005 by Amazon. The MTurk service is oriented towards large numbers of tasks of the same type, so-called HITs, which do not require high worker qualification, preliminary training or special education [29]. Since every project consists of a large number of similar tasks, it is natural to involve more than one service worker in a project; broadly speaking, this way of working on projects corresponds to the crowdsourcing definition [3]. The task performers, so-called Turkers, may have the lowest qualification; what matters most is that they are able to read the task description, perform the HIT and return a correct result within the appropriate deadline [16]. From the project control point of view, it is possible to require workers to obtain a certificate for performing certain types of tasks with sufficient quality, which is an example of the qualification test approach in QM. After a project is finished, the system calculates an accuracy rate for the workers who participated in it, based on the HIT results approved by the Requester. The Requester can define not only the time frame for the whole project but also the time frame for a single HIT; such a parameter opens up possibilities for the Response Time QM approach. A more detailed analysis of QM in MTurk is given in [30].


Fig. 1 Alternative scheme of work with MTurk.com platform (own illustration)

According to its scheme of work, the MTurk.com service is the simplest but also the most flexible one, thanks to the variability of the Workers involved in task execution and the possibility of building tailored interfaces for particular project needs. In addition to this flexibility, MTurk.com offers a possibility that is unique on this market: using its resources through Application Programming Interfaces (API). This enables third parties to create custom solutions based on the capabilities of MTurk.com [3]. An alternative scheme of work with the MTurk.com service is also possible (Fig. 1); this scheme makes it possible to create plugins and external services that work with MTurk.com. In the next section of this chapter, the Statistical Quality Management System presented by Robert Kern is described [16]. The development of this system was possible only because of the API, and it opens up a new approach to QM.
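As a hedged illustration of the programmatic access mentioned above, the sketch below creates a HIT through the AWS MTurk API using the boto3 client and later approves the submitted assignments. It assumes the requester sandbox endpoint and configured AWS credentials; every title, reward and question value is invented for this example and is not taken from the chapter.

```python
import boto3

# Sandbox endpoint: HITs created here are not shown to real workers.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# Minimal HTMLQuestion with an invented yes/no question.
question_xml = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <!DOCTYPE html><html><body>
      <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
      <crowd-form>
        <p>Is the attached image a cat? (yes/no)</p>
        <input name="answer" type="text"/>
      </crowd-form>
    </body></html>]]></HTMLContent>
  <FrameHeight>300</FrameHeight>
</HTMLQuestion>"""

hit = mturk.create_hit(
    Title="Classify one image",
    Description="Answer a single yes/no question about an image.",
    Reward="0.05",                       # USD, passed as a string
    MaxAssignments=3,                    # redundancy for output-based QM
    LifetimeInSeconds=3600,
    AssignmentDurationInSeconds=300,
    Question=question_xml,
)

# Later: collect submitted work and approve it, which triggers payment.
assignments = mturk.list_assignments_for_hit(
    HITId=hit["HIT"]["HITId"], AssignmentStatuses=["Submitted"]
)
for a in assignments["Assignments"]:
    mturk.approve_assignment(AssignmentId=a["AssignmentId"])
```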

5.5 Standard Quality Management Systems in Human-Based Electronic Services

Let us take a brief look at the smart and sophisticated Statistical Quality Management System developed by Robert Kern, which became the focal point of his Ph.D. thesis [16]. The Statistical Quality Management System consists of two main components: a QC mechanism and a Dynamic Voting Mechanism (DVM). After the Requester submits a task to the QM system, it immediately publishes the first assignment to the Human-Based Electronic Service. A Worker receives the task assignment, works on it and returns a result, which goes back to the QM system together with the Worker's ID. Depending on the Worker's status in the Continuous Sampling Plan (CSP), an inspection of the material is initiated or not. If no inspection is required, the Worker's result is accepted as the final result and is passed back to the Requester without any additional validation. If an inspection is needed, the DVM is initiated. The DVM publishes another assignment with the same task in the Human-Based Electronic Service. After receiving the result of this assignment from a different Worker, the DVM uses the historically collected error rates of the Workers who performed these assignments to identify the result with the higher probability of correctness. The DVM then checks whether the performed task fulfils the predefined minimum inspection quality. If all quality checks are passed, the result is accepted as final and is sent to the Requester, and the CSP status of the Worker is updated accordingly. If the predefined minimum inspection quality is not yet met, the DVM increases redundancy by publishing yet another assignment with the same task, receiving a result from another Worker and again checking the result with the highest probability of correctness. This process continues until the minimum inspection quality is met or meeting it has become very unlikely for the involved Workers, considering their historical error rates. In the latter case, it can be assumed that there is something wrong with the task or its description, and the task is sent back to the Requester for further review, validation, and improvement [16] (Fig. 2).

Fig. 2 Functional schema of the Statistical Quality Management System [16]. Process flow of the CSP/DVM in the context of the service requester, the Human-Based Electronic Service and the crowd worker variety
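The control loop just described can be sketched in a few lines. This is our own simplified illustration of the CSP/DVM idea, not Kern's implementation; the error rates, the threshold and the naive independence assumption are invented for the example.

```python
from collections import defaultdict
from math import prod

MIN_INSPECTION_QUALITY = 0.98                       # illustrative threshold
error_rate = {"w1": 0.05, "w2": 0.20, "w3": 0.40}   # historical, per worker

def leading_result(votes):
    """Pick the answer whose supporters are jointly least likely to be
    all wrong (naive independence assumption, illustration only)."""
    supporters = defaultdict(list)
    for worker, answer in votes:
        supporters[answer].append(error_rate[worker])
    scores = {a: 1.0 - prod(rates) for a, rates in supporters.items()}
    answer = max(scores, key=scores.get)
    return answer, scores[answer]

votes = [("w1", "A")]                                # first assignment
further_assignments = iter([("w2", "A"), ("w3", "B")])

answer, quality = leading_result(votes)
while quality < MIN_INSPECTION_QUALITY:              # increase redundancy
    try:
        votes.append(next(further_assignments))      # publish one more
    except StopIteration:
        answer = None                                 # return task to requester
        break
    answer, quality = leading_result(votes)

print(answer, round(quality, 3))                      # e.g. A 0.99
```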

6 Conclusion

To conclude, Human-Based Electronic Services represent the next step in the evolution of outsourcing, one that not only casts new light on the optimization of business processes using IT but also brings a new level of cooperation between human and machine. These services combine all the benefits of outsourcing, namely the use of a cheap labour force without the need to create physical working conditions or to watch whether labour law is followed in the resulting relationships. By utilizing crowdsourcing, Human-Based Electronic Services enlarge the scale of outsourcing, adding features such as availability, agility, scalability, elasticity and maintainability, and showing a particular similarity to the Cloud principle.

The cost reduction guaranteed by Human-Based Electronic Services carries some risk of a decrease in the quality of work, so their further development requires more attention to quality control. Alongside the hybrid methods mentioned in this chapter, which are performed by quality managers with the help of mechanisms provided by the online platforms, new kinds of automatic quality control have arisen, for example the Statistical Quality Management System [16]. Based on the research performed, we can conclude that the presented system and similar available systems will support steady development in this field and its application in human-computer interaction, and will accelerate the development of self-tutoring computer systems and, as a consequence, of artificial intelligence.

References 1. Kern, R., Zirpins, C., Agarwal, S.: Managing quality of human-based eServices. In: Feuerlicht, G., Lamersdorf, W. (eds.) Service-Oriented Computing—ICSOC 2008 Workshops. ICSOC 2008. Lecture Notes in Computer Science, vol. 5472, pp. 304–309. Springer, Berlin Heidelberg (2009) 2. Bermbach, D., Kern, R., Wichmann, P., Rath, S., Zirpins, C.: An extendable toolkit for managing quality of human-based electronic services. In: Human Computation. AAAI Workshop, vol. 11(11) (2011) 3. mTurk.com: Requester Overview. https://requester.mturk.com/ (2020). Accessed 24 Nov 2020 4. TextBroker.com: Prices and Services. https://www.textbroker.com/clients-prices-conditions (2020). Accessed 24 Nov 2020 5. DesignenLassen.de: Preise und Projektoptionen. https://www.designenlassen.de/preise/ (2020). Accessed 24 Nov 2020 6. Upwork.com: How It Works for Clients. https://www.upwork.com/i/how-itworks/client/ (2020). Accessed 24 Nov 2020 7. Howe, J.: Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business. Random House (2008) 8. 99Designs.com: http://99designs.com/ (2020). Accessed 24 Nov 2020 9. Spreadshirt.com: http://www.spreadshirt.com/ (2020). Accessed 24 Nov 2020 10. Surowiecki, J.: The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations. Doubleday, New York (2004) 11. ISO—The International Organization for Standardization: http://www.iso.org/iso/home/store/ catalogue_ics.htm (2005). Accessed 24 Nov 2020 12. Montgomery, D.C.: Introduction to Statistical Quality Control, 6th edn. Wiley, New York (2008) 13. Parasuraman, A., Zeithaml, V.A., Berry, L.L.: A conceptual model of service quality and its implications for future research. J. Mark. 49(4), 41–50 (1985)


14. Scholten, U., Fischer, R., Zirpins, C.: Perspectives for web service intermediaries: how influence on quality makes the difference. In: Di Noia, T., Buccafurri, F. (eds.) E-Commerce and Web Technologies. EC-Web 2009. Lecture Notes in Computer Science, vol. 5692, pp. 145–156. Springer, Berlin Heidelberg (2009) 15. Kahn, B.K., Strong, D.M., Wang, R.Y.: Information quality benchmarks: product and service performance. Commun. ACM 45(4), 184–192 (2002) 16. Kern, R., Zirpins, C., Agarwal, S., Thies, H., Satzger, G.: Dynamic and goal-based quality management for human-based electronic services. Int. J. Coop. Inf. Syst. 21(1), 3–29 (2012) 17. Le, J., Edmonds, A., Hester, V., Biewald, L.: Ensuring quality in crowdsourced search relevance evaluation: the effects of training question distribution. In: SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation, pp. 21–26 (2010) 18. Oleson, D., Sorokin, A., Laughlin, G.P., Hester, V., Le, J., Biewald, L.: Programmatic gold: targeted and scalable quality assurance in crowdsourcing. In: Proceedings of the 11th AAAI Conference on Human Computation, pp. 43–48 (2011) 19. Sorokin, A., Forsyth, D.: Utility data annotation with Amazon Mechanical Turk. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8 (2008) 20. Upwork.com: Work Diary. https://www.upwork.com/hiring/community/upworks-work-diary/ (2020). Accessed 24 Nov 2020 21. Rzeszotarski, J.M., Kittur, A.: Instrumenting the crowd: using implicit behavioral measures to predict task performance. In: Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 13–22 (2011) 22. VizWiz: Overview. https://vizwiz.org/ (2021). Accessed 11 Jan 2021 23. Horton, J., Chilton, L.: The labor economics of paid crowdsourcing. In: Proceedings of the 11th ACM Conference on Electronic Commerce, pp. 209–218. ACM (2010) 24. Ipeirotis, P.: The new demographics of Mechanical Turk. N. Y. Univ. J. (2010) 25. Singer, Y., Mittal, M.: Pricing mechanisms for crowdsourcing markets. In: Proceedings of the 22nd International Conference on World Wide Web, May 2013, pp. 1157–1166 (2013) 26. TextBroker.com: Why Managed Services. https://www.textbroker.com/why-managed-service (2020). Accessed 24 Nov 2020 27. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N., Yasar, A.-U.-H.: A real-time service system in the cloud. J. Ambient Intell. Humaniz. Comput. 11, 961–977 (2020). https://doi.org/ 10.1007/s12652-019-01203-7 28. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Computing 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z 29. mTurk.com: All HITs. https://www.mturk.com/ (2020). Accessed 24 Nov 2020 30. Ipeirotis, P.: Crowdsourcing using Mechanical Turk: quality management and scalability. In: Proceedings of the 8th International Workshop on Information Integration on the Web: In Conjunction with WWW 2011, IIWeb’11, p. 1 (2011)

Sustainability Drives of the Sharing Economy

Lucia Šepel'ová, Jennifer R. Calhoun, and Michaela Straffhauser-Linzatti

Abstract Due to the development of information and communication technologies, the process of sharing is expanding and has simplified many daily and working activities. Many companies are currently taking advantage of the benefits of digital technology, resulting in improved prosperity. The most important factors behind the rise and sustainability of the sharing economy are economic, social, environmental, and technological indicators, which are influenced by changes in consumer requirements and values as well as by market innovations. The purpose of this chapter is to investigate the impact of informational services on the driving forces of the sharing economy that have resulted in the expansion of sharing platforms, using the example of the ride-sharing platform Uber and the accommodation-sharing platform Airbnb. The methodology is based on an analysis of the existing literature from developed and developing countries, focusing on a model of four contributing drivers. As a result, the study reveals the most important factors that have prompted the development of the sharing economy. Understanding these concepts as potential driving forces for participation in the sharing economy is necessary in order to explore the consumer needs that motivate people to participate in it.

Keywords Sharing economy · Driving forces · Airbnb · Uber

L. Šepel’ová (B)
Comenius University, Odbojárov 10, 831 04 Bratislava, Slovakia
e-mail: [email protected]

J. R. Calhoun
E. Craig Wall Sr. College of Business Administration, Coastal Carolina University, Conway, SC, USA
e-mail: [email protected]

M. Straffhauser-Linzatti
University of Vienna, Oskar Morgenstern Platz 1, 1090 Vienna, Austria
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Maranda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_5


1 Introduction

1.1 Relevance

Just as it is difficult to clearly define the terminology of the sharing economy or to name its founder, it is equally difficult to determine the period of its emergence. The notion of sharing has been known for ages, since the first forms of trade involved exchanges of goods or services among individuals without the use of currency. Sharing economy platforms are a disruptive technology that increases efficiency across many industries by dramatically reducing transaction costs, building on new technology, and increasing the efficiency with which various economic assets are used. With the rapid expansion of the sharing economy, the approach to life and the economic system itself are changing, which requires the development of new strategies. Above all, the possibility of offering one's own unused resources can generate a certain income. Moreover, the platforms have changed the way people work and use assets, which leads to new socio-economic interactions [1].

The main advantages of the sharing economy include better utilization of resources (platforms target potential customers via mobile applications and meet their demands more efficiently), new working opportunities (especially in developing countries, the emergence of online platforms has provided new possibilities for work), environmental protection, and convenience (community-based platforms offer a comfortable space for sharing for both parties with a single finger click) [2]. These benefits have resulted in very significant changes in society and in the functioning of traditional sectors.

The emergence of the sharing economy was supported by four driving factors: technological, economic, ecological, and social [3]. One of them is the recent economic decline, which led people to search for cheaper goods and services. Equally important are environmental issues and the need to find new contacts. Social networks have also encouraged the expansion of the sharing economy: the involvement of social media supports the sharing economy and social commerce, not only encouraging people to buy equivalent products but also encouraging them to seek group agreements on related products. Finally, technological advances have made it possible to simply connect people around the world [4].

1.2 Goals and Objectives

This chapter explores the role and influence of informational services in the sharing economy, specifically on the ride-sharing platform Uber and the accommodation-sharing platform Airbnb. The purpose of this chapter is to investigate the impact of informational services on the driving forces of the sharing economy, i.e. the technological, social, economic, and environmental factors that have resulted in the expansion of sharing economy platforms in the last decade, and in particular the impact of these drivers on the market as well as on the platforms themselves, by answering the following questions:

• Which driving forces have contributed to the emergence of the sharing economy?
• What is the role of informational services regarding the driving forces of the sharing economy?

The methodology is based on an analysis of the existing literature from developed and developing countries, focusing on the model of four contributing drivers, and on statistical analysis. As a result, the study reveals the impacts of the technological revolution that have caused the disruption of traditional business models.

2 Social Impact

Digital communication, namely virtual social networking, has made it possible to connect known and unknown users, who exchange both positive and negative attitudes and thereby influence the decisions of other users. In this environment, peer-to-peer markets have become more user-friendly, mainly through the use of smartphone applications. This has increased the convenience and flexibility of services and enabled real-time collaboration regardless of user location. The social impact of Information Systems (IS) is thus related to the ability of consumers to adapt to changes in traditional business models. However, the implications of the use of IS also include some negative aspects, for example threats to privacy, which may lead target segments to reject the adoption of these practices [5].

Social networking has contributed to a large extent to the development of social connections between consumers and has become a key communication tool for traders, allowing more effective and faster communication and trust-building among participants of the sharing economy [6]. A survey of Car2go and Airbnb users found a significant positive relation between community membership and the probability of reusing the sharing option; the desire to belong to the community has a significant impact on the decision to participate in sharing activities [7]. According to Botsman, trust is regarded as the 'currency' of the sharing economy [8]. Many authors agree that trust is the main precondition for the sustainability of electronic commerce, as well as of the sharing economy. Web 2.0 and social media, based on communication and interpersonal interaction, are ideal platforms for building human relationships, which could not fully operate without mutual trust [9]. To build trust between the two parties, platforms mostly use evaluation systems through which consumers and providers can give feedback about their purchase after each transaction. Additionally, the advancement of the sharing economy is largely based on the creation of new social circles within society, especially on the establishment of principles among individual participants. If a user is not satisfied with the services provided, his negative or low rating can significantly affect other consumers' decisions about participation [10].


2.1 Reputation Systems

A significant disadvantage for users within various sectors of the sharing economy is asymmetric information. In the case of Uber, for example, passengers lack knowledge about the qualifications of drivers; in the case of Airbnb, guests have limited knowledge about the quality of their accommodation. The informational asymmetries resulting from uncertainty about newly provided services or about their quality may lead to a lower number of shares or participations. Therefore, trust building between participants of the sharing economy is more crucial than in traditional businesses [11]. Sharing economy platforms help reduce the negative impacts of information asymmetries mostly by raising the importance of reputation systems, through which potential users can learn about 'social capital' and build trust between parties [12]. Reputation systems are a well-proven mechanism of social control in settings where asymmetric information often occurs and can negatively affect the functioning of the community because formal control systems are absent. These systems also represent a significant factor in building trust relationships in the sharing economy [13]. Resnick suggested that Dellarocas's research omitted some variables, such as the attractiveness of websites or the way the items were presented [14]. To dispel uncertainty and support trustworthiness, some sharing economy platforms use the services of other networks to create professional presentations of their offers; for instance, Airbnb uses the services of professional photographers to support its accommodation offers [15]. Online consumer reviews have contributed to a change in consumer perceptions, which has significantly changed the way consumers purchase and has even transformed traditional consumer industries; this has increased the engagement of ordinary consumers in feedback rating systems [16]. These reputation systems have also become a specific resource for transferring information among users, representing a certain form of online advertising.

Fig. 1 Trust framework [17]


Teng designed the theoretical framework presented in Fig. 1 for examining the effects of trust on the Airbnb platform. The research posits that positive ratings in reputation systems increase confidence in the website, and that hosts, as participants, respond positively to product ratings and to the ratings of other platform members. In the final phase, the framework considers the relationship between participants' perceived risk and the reputation system that records the quality of the listings and the experiences with hosts. The research showed a positive relationship with trust in the Airbnb platform, contingent on increasing trust in the host [17]. Another study demonstrates that greater confidence in traders in traditional businesses increases confidence in the trader's company [18]. A feedback system is intended to provide information for assessing the probability of an unsatisfactory outcome; for this reason, a negative effect of participants' risk perception is observed in relation to positive ratings of the host. When they interact, participants weigh risk against trust: if trust is higher than perceived risk, participants are more likely to perceive the host or the sharing economy platform in a positive light [17]. The interconnection of participants' social profiles with sharing economy platforms creates the potential to accumulate social capital digitally across networks and platforms [19]. The accumulation of social capital from different contexts increases the amount of feedback and can therefore build trust, especially on social networks [20]. Platforms usually share some information with consumers through private channels, such as e-mail or SMS, which consumers consider more trustworthy and safer [21].
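As a simple illustration of how such reputation systems aggregate feedback, the sketch below (our own example; the prior mean, prior weight and ratings are invented) computes a smoothed "Bayesian average" score of the kind a platform might use, so that a host with only two perfect ratings does not automatically outrank a host with hundreds of slightly lower ones.

```python
# Illustrative reputation aggregation: a smoothed ("Bayesian average")
# score that blends a host's own ratings with a platform-wide prior.
PRIOR_MEAN = 4.0      # assumed platform-wide average rating
PRIOR_WEIGHT = 20     # how many "virtual" ratings the prior is worth

def reputation(ratings):
    n = len(ratings)
    return (PRIOR_WEIGHT * PRIOR_MEAN + sum(ratings)) / (PRIOR_WEIGHT + n)

new_host = [5, 5]                                         # 2 ratings
established_host = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4] * 20    # 200 ratings

print(round(reputation(new_host), 2))          # pulled toward the prior
print(round(reputation(established_host), 2))  # close to its raw mean
```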

3 Economic Impact

The economic recession triggered in 2008 caused a significant decline in household economic activity and a rise in unemployment. Declining incomes and the increasing cost of goods put pressure on innovation and structural change. For many property owners, the availability of existing long-term goods gave rise to the possibility of short-term rental, whose income allowed them to mitigate the impacts of the crisis and to develop new sources of income [6]. Consumers began to perceive the offer as more innovative, specialized, extensive, and above all affordable. The result is an expansion of the market to a new level as well as higher competition, both on the demand side and on the supply side. Many users who participate in the sharing economy today have benefited from the global economic crisis, which resulted in higher price sensitivity of consumers. Because of financial difficulties, consumer habits changed and tended to turn away from ownership towards consumption-based access [22].


Estimating the official size of the sharing economy is extremely difficult because many providers are private. A comparative measurement by PwC, forecasting the revenues of the five most significant sharing economy sectors (accommodation, car sharing, online staffing, finance, and music streaming) and of traditional sectors, estimates that in 2025 sharing economy revenues will reach $335 billion, compared with $15 billion generated by online platforms in 2013. Revenues are thus expected to grow by 2133% between 2013 and 2025, compared with an estimated increase of 39.6% for traditional enterprises [23]. The expansion of the sharing economy facilitated by online platforms has had numerous economic impacts. Traditional businesses are motivated to enter the sharing economy in some form in order to maintain their market position and social sustainability [24].

Significant reductions in transaction costs, such as the costs of switching, searching, and negotiating, have been associated with the sharing economy for both providers and users. Without Internet-based sharing platforms these costs would be too high and the development of the sharing economy in commercial markets would be very difficult. Cost reduction in electronic commerce represents a considerable advantage, since it removes certain entry barriers for new providers entering the sharing economy market [25]. Airbnb can reduce costs significantly: the costs of contracting or of client search, for instance, were found to be considerably higher and harder to bear for private businesses in traditional industries. Through sharing economy platforms, this market niche and new "infrastructure" provide easier access to capital and services [26]. In addition, scholars agree that the reduction in transaction costs contributed significantly to the decline in the prices of goods and services offered in the sharing economy, resulting in a consumer-friendly pricing policy [26]. Uber showed cost savings in the form of a reduction in the search time between customer and driver: the geolocation technologies that Uber applies match the nearest localized driver with the person ordering the ride, and in doing so Uber effectively reduced transaction and search costs for rides to almost zero [27] (a small illustrative sketch of such matching follows below). Moreover, growing broadband mobile connectivity is reflected, for example, in increased advertising revenues in the mobile segment; advertising revenues in the European Union increased by 55% in 2014 and represent an indispensable source of income for many sharing economy platforms [28]. Farronato and Fradkin analyzed the impact of Airbnb on the hotel industry in the United States market and found that, due to Airbnb's entry into the hotel market, there was a 1.3% decline in reservations and a 1.5% loss in hotel chain revenues [29].

The advantages of cost reduction were also found in smaller businesses and platforms that use mobile payments in their business models, especially in mediating financial services. The self-service platform CaptureCode suggests that mobile payments can cut costs by up to 20%, allowing companies to save a considerable amount in the long term [30]. In traditional banking, transaction costs include, among others, conversion, storage, and calculation fees; likewise, businesses accepting card payments cannot avoid monthly charges, which increase transaction costs. Payments through mobile systems reduce these costs to a substantial extent, which puts pressure on financial intermediaries to reduce the billing charges for transactions carried out through them [31]. Sharing economy platforms strive to increase the trustworthiness of and awareness about mobile payments, thereby benefiting from the reduction of transaction costs to a minimum.
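Returning to the driver-matching mechanism mentioned above, the sketch below (our own illustration, with invented coordinates) shows the core of such a match: compute the great-circle distance from the pickup point to each available driver and pick the nearest one.

```python
# Illustrative only: match a ride request to the nearest available
# driver by great-circle (haversine) distance; coordinates are made up.
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))      # Earth radius approx. 6371 km

rider = (48.1447, 17.1077)               # hypothetical pickup point
drivers = {
    "d1": (48.1500, 17.1200),
    "d2": (48.1390, 17.1010),
    "d3": (48.1600, 17.0800),
}

nearest = min(drivers, key=lambda d: haversine_km(rider, drivers[d]))
print(nearest, round(haversine_km(rider, drivers[nearest]), 2), "km")
```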

The proliferation of smartphones provides a tool for transferring digital currency, resulting in changes in consumer behavior and in the simplicity and security of payments. The Millennial generation is the largest group of users of these systems, as well as a major segment of the sharing economy [32]. The revolution in the sharing of digital currency affects traditional financial intermediaries through the advancement of online peer-to-peer financial platforms such as PayPal. A comparison by Pastorek analyzed money transfers through PayPal against typical banks. The analysis found PayPal's transfer fees to be more favorable, and its exchange rates lower and more affordable for users. The process of exchange rate evaluation is the same for both types of institution: the market rate plus a margin. Considering the speed of transfer, PayPal recorded faster transactions between two PayPal accounts than traditional banks [33].

4 Technology Confluence

The emergence and fast adoption of modern technologies such as social, mobile, analytics and cloud computing are an accelerator of the progress of the sharing economy, transforming not only the way humans interact but also the way consumers consume [34]. The structure of the market has changed considerably in relation to customers, whose interests and demands have developed greatly over time. As a result, customers have begun to accept a new form of business brokerage through the rise of social networks such as Facebook, LinkedIn, or YouTube. Sharing on social networks has become a common practice of everyday life. A breakthrough occurred in 2006 when Web 1.0 was upgraded to Web 2.0 [35, 36]. Web 2.0 refers to "the ability for people to collaborate and share information online via social media, blogging and Web-based communities" [37]. Unlike the previous generation of the Internet, which served only as a platform for providing information, through Web 2.0 sellers could offer their unused consumer goods and buyers could consume the goods of other participants. As a result of this process, the seller could transform passive capital into an asset [38]. The Internet, as well as Web 2.0, thus became the new sharing channels, enabling the emergence and development of new forms of communication and of the way in which information is generated and consumed by online users. The availability of information technology tools has also developed new forms of sharing, or facilitated the expansion of existing ones, with the result that virtually every participant is able to take part in creating web content. The growing number of people connected to the Internet, since interconnected people are one of the most important elements of every business, contributed to the fast exchange of information and data around the world. According to the social media management platforms Hootsuite and We Are Social, the number of people using the Internet in 2018 exceeded half of the world's population; of these, mobile phone users represent almost 5.2 billion, which indicates a year-on-year increase of 4%, and about 3 billion people use social media each month [39]. Clearly, technological advancement has a significant impact on the development of the sharing


Fig. 2 Laws of disruption [39]

economy, while other indicators, such as social, economic, or political ones, have much slighter impacts. It is clear from Fig. 2 that the development of technology grows exponentially, while the growth of the other indicators is rather incremental, the curve being almost linear, and thus has a more negligible impact on the development of the sharing economy [40].

4.1 Technological Capabilities

Implementation of ICT in sharing economy services has simplified the interconnection of strangers, promoted the collaboration of users, increased and encouraged cooperation, gathered information about past and present actions, and helped predict the future behaviour of participants [41]. Through the sharing applications, operators gain access to large quantities of data about users of the sharing economy platforms. As a result, companies with the available information can more easily reach their customers with a specific offer of services, regardless of geographic location. The platforms also attract new users, especially through word-of-mouth and mediation programs. Simmons identifies the main entities causing disruption in traditional industry: hardware, which gives both parties access to the necessary applications; software, an inseparable part of the hardware that enables its functionality (sharing platforms synchronize the applications of both actors with geolocation features such as GPS and mapping systems, and the role of the software is to connect mobile devices and applications together); and the Internet, which makes it possible to download and install the applications used to interact with the sharing services and to order a service. The findings suggest that the part of the population familiar with ICT has shifted from the use of traditional industry to the sharing economy. In addition, no industry is immune to emerging digital advancement, and every industry is likely to experience disruption [42].


Table 1 Technology innovation and implementation for the sharing economy (based on [43])

World Wide Web: a platform for presentation and exchange of content, functionality, and media
Web-based consumer commerce: peer-to-peer commerce and introduction of reputation, trust, recommendation
E-mail: low-cost means for communication online
3G phone network: access to the Internet when moving
Social media: communication means between informal groups of participants, online profile creation, tool for building trust and awareness
Cloud storage: low-cost data storage, providing a high level of complex functionality
Mobile broadband connection: access to the Internet for wireless communication
Mobile devices: access to all listed technological innovations

Harvey examined the online capabilities driving innovation on sharing economy platforms, summarized in Table 1. The late twentieth century is associated with technological expansion and the introduction of the World Wide Web, which provided cheaper access to unlimited resources, followed later by the advancement of cloud storage and mobile broadband connections [43]. Scholars argue that participation in sharing platforms depends on ICT skills [44]. Therefore, a lack of ICT skills (digital competency) poses a considerable disadvantage for the sharing economy. Dillahunt found that individuals who have less technological knowledge or fewer skills do not participate in sharing services. Participants in the study sample stated that they were more comfortable using platform applications when someone in their social community supported them in case of operating difficulties [45]. Computer self-efficacy has "an impact on whether the users adopted technology" [46]. Scholars agree that individuals with higher education [23, 47, 48], higher income [23, 47, 48], a lower age range [23], and higher ICT skills are the typical sharing economy segments, unlike those with lower education, lower income, a higher age range, and lower technological skills [45]. Hsiao's research on how computer self-efficacy and ease of computer use affect an individual's participation in the sharing economy found that technological self-efficacy and the perceived ease of use of sharing economy services correlate positively with future willingness to pay for these services and experiences [49–51].


4.2 Digital Marketing Channels

Sharing economy platforms communicate seamlessly with customers via their digital marketing activities. As a result, consumers obtain information from unfamiliar people not only through the Internet in general, but also through various interactive channels for sharing information such as social networks, blogs, forums, and online communities. Marketing communication is no longer limited to traditional one-way interaction, partly due to ICT, which stimulates the flow of information, especially through the relationships that individuals have with each other. The use of Web 2.0 has led to a significant change in traditional word-of-mouth (WOM), especially through social media and blogging [52]. Digital marketing is characterized as "the practice of promoting products and services using digital distribution channels via computers, mobile phones, smart phones, or other digital devices" [48]. The tool is frequently used by companies in the sharing economy to communicate and build relationships with both current and future users, representing a promising communication strategy due to its minimal cost and the significant bonding of relationships. Digital marketing can also act as viral marketing: a form of communication in which the marketing message appears so interesting to the recipient that it causes self-dissemination through the recipient's own technological means. The viral message spreads exponentially through the media without the initiator's control. Consumers take an active role in distributing marketing messages to friends, and recipients have more confidence in messages from well-known sources. According to Keller, WOM is perceived as one of the most effective and influential channels of communication [53]. Within the current online environment, Internet product reviews are one of the most influential types of electronic WOM (eWOM) because they can shape consumer attitudes and influence purchasing decisions. According to the global information, data, and measurement company Nielsen, consumer opinions posted online are the third most trusted online advertising format, with the confidence of 66% of respondents from around the world [54]. A company that takes significant advantage of viral marketing campaigns is Uber. Its former Chief Executive Officer, Travis Kalanick, claims that Uber spends "virtually zero dollars on marketing", spreading almost exclusively through (e)WOM. Kalanick further notes that their virality is enormous, as 95% of customers have learned about their services from other travelers [55]. Similar approaches have been applied by Airbnb, which has also launched several marketing campaigns with the features of digital marketing. Since 2010, Airbnb has used a marketing strategy that includes Craigslist. After room listings are placed on Craigslist, Airbnb responds with automatic emails suggesting a direct link to offers on the company's website. In addition, users are encouraged to share their experiences through social networking sites, which creates a powerful WOM effect [56].


5 Environmental Impacts and Sustainability

Since sharing platforms are based on the concept of temporary secondary rentals, and thus on abandoning the purchase of first-hand assets, the sharing economy contributes to environmental protection in the form of a reduced use of the energy resources needed to produce consumer goods and a reduction of emissions. Given that the market of the sharing economy is constantly growing and that the providers in peer-to-peer markets are private persons, a clear analysis of the environmental impacts of the sharing economy remains uncertain. Nowadays, humanity is more conscious of the impacts of the environmental crisis, resulting in significant changes in consumers' values. According to a study by the consulting firm PwC, 76% of survey participants regard their participation in the sharing economy as a way to protect the environment [23]. Research by Frenken found that sharing in B2C and C2C markets can reduce emissions by 8–13%. With the growth of the world's population there is an increasing consumer demand for goods and services [57]. Environmental sustainability is increasingly associated with electronic applications [58], and the new business models of sharing economy platforms, enabled by ICT, provide an optimal solution for the re-use of existing resources. The underlying factor in the emergence of the sharing economy is information technology, but the question remains what role technology plays in terms of environmental impact and environmental sustainability. In general, experts predict that enterprises of the sharing economy have sustainability potential in both local and global markets [57]. In 2013, worldwide electronic waste amounted to 53 million tons, compared to 67 million tons of new electronic devices entering the market [59]. Considering environmental impacts and sustainability, Frenken emphasizes the importance of political interference and the pressure of restrictive laws, which seek to influence businesses to respond favorably towards the environment and support the efforts of international companies trying to reduce the negative aspects of environmental damage [57]. ICT has significant systemic effects, defined as the long-term reaction of the dynamic socio-economic system to the availability of ICT services, including behavioral change (lifestyles) and economic structural change [60]. Based on these assumptions, Pouri and Hilty point out that sustainability in commerce and in the sharing economy depends not only on the subject itself, but also on the associated resource. With an increased interest in ride-sharing services, lower demand for newly manufactured cars is expected, together with a related reduction in production costs. For effective sustainability, it is necessary to consider the resources needed for the production and consumption of goods and services. Therefore, to obtain a comprehensive understanding of the impact of the sharing economy on the environment, it is necessary to consider the long-term life-cycle effects resulting from the value obtained through participation in sharing [61]. Chowdhury and Veeramani, in a study of the environmental impact of ICT and its sustainable development, calculated the amount of energy consumed in 13 selected


Fig. 3 Environmental effects of electric consumption [62]

companies and its impact on the environment. The sample consisted of businesses operating in the ICT sector, such as Microsoft, IBM, Apple, Dell, etc. [62]. The result of the study, depicted in Fig. 3, is that the more electricity companies demand, the more harmful their effect on the environment. As the illustration shows, Google, Facebook, and Microsoft are the largest energy consumers; such companies are an inseparable part of the sharing economy, selling electronic devices or mediating social interconnection. The researchers conclude that, in general, increased interest in consumer goods has a negative impact on the environment. Achieving environmental sustainability would be possible if demand for consumer goods were reduced. One of the positive effects of the sharing economy is the use of second-hand goods, which leads to a decrease in production.

6 Conclusion

6.1 Synopsis

As demonstrated in the study, innovative technologies have significantly disrupted the business models of platforms, making sharing in the global dimension much easier than ever before. The innovativeness of the new ways of transmitting shared capital and services lies in the integration of the ICT platform as a mediator in the sharing relations. Among other things, the main reasons for the expansion of the sharing economy are the number of people with an Internet connection, especially mobile Internet, and the increasing number of smartphone users. The studies showed that the number of 'connected' people in the world is growing every year. Moreover, the incorporation of disruptive technology into the business models of the sharing economy results in ease of interconnection and communication between participants located across the world. Applications have become the place for


sales and purchases, i.e. digital stores, and the technology driving them, such as cloud and storage services, makes it possible to gather and evaluate data on these platforms in the simplest way. As shown, the sharing economy threatens traditional businesses with its diverse offerings, friendly prices, and omnipresence. A consequence of technological innovations in the sharing economy is a change in the mode of thinking of traditional suppliers of goods and services. These suppliers are now incorporating technological advancements into their existing business models to compete with the sharing economy platforms. Since interaction through platforms is carried out between two completely unknown people, there is a strong need to build trust between the purchaser and the provider, that is, between two entities that have had no previous personal contact. In the sharing economy, trust is placed not only in the providers of goods and services, but also in the platform itself. Therefore, to build trust, platforms use reputation systems as well as social networks. Social networks can be considered a type of reputation system that significantly influences the decision to participate in the sharing economy. The issue of trust in the sharing economy is very important for platform providers because, as shown, trust in the seller increases the overall trust in the business and thus in the platform. If trust between supply and demand actors were low, the existence of the sharing economy would be impossible. As payment transfers are made through mobile devices, the perceived risk of mobile payments is another important factor influencing the functioning of, and participation in, the sharing economy. As has been shown, the most populous age group of mobile payment users is the same population that forms the core segment of the sharing economy. Technological advancement that has increased security in mobile payment systems has increased the use of these services. Hence, trust associated with reputation systems, mobile payments, and the providers of shared assets and services increases the intention to participate in sharing economy services. The technology incorporated into business models has made it possible to reduce to a minimum the transaction costs that are normally associated with doing business and that, in the case of micro-enterprises, would absorb a large part of the profit from the business. For example, the development of a virtual platform requires a one-time expense, and there is therefore no need to pay monthly rents or insurance. Additionally, due to the technology used, platforms have been able to exclude from their business models other cost factors that are necessary for traditional businesses. Even when considering environmental issues, the impact of the sharing economy itself is significant. However, when the underlying technology is considered, the studies do not lead to a single conclusion. The amount of electronic waste is rising globally, but it is not possible to state clearly to what extent the sharing economy is involved. Enterprises can move towards environmental sustainability by decreasing production and consumption. Moreover, political restrictions put pressure on companies to be more environmentally friendly. The issue of environmental impact leading to sustainability thus remains challenging.


References 1. PwC: Assessing the size and presence of collaborative economy in Europe. https://www.pwc. es/es/publicaciones/digital/evaluacion-economia-colaborativa-europa.pdf (2016). Accessed 29 June 2019 2. Ernst & Young: The rise of the sharing economy. The Indian landscape. http://sharehub.kr/wpcontent/uploads/2015/11/e1a7c1d73dfae19dcfa0.pdf (2015). Accessed 2 Nov 2019 3. Latitude: The new sharing economy. http://files.latd.com.s3.amazonaws.com/New_Sharing_E conomy-Report.pdf (2010). Accessed 23 Sept 2018 4. Selloni, D.: New forms of economies: sharing economy, collaborative consumption, peer-topeer economy. In: Codesign for Public-Interest Services, pp. 15–26 (2017) 5. Dé, R.: Societal impacts of information and communications technology. IIMB Manag. Rev. 28(2), 111–118 (2016) 6. Katsoni, V.: An investigation into the sharing economy phenomenon in the Greek tourism industry in the accommodation sector. Tur. Rozw. Reg. 25–35 (2017) 7. Möhlmann, M.: Collaborative consumption: determinants of satisfaction and the likelihood of using a sharing economy option again. J. Consum. Behav. 14(3), 193–207 (2015) 8. Botsman, R.: The Sharing Economy Lacks a Shared Definition. https://www.fastcompany.com/ 3022028/the-sharing-economy-lacks-a-shared-definition#1 (2013). Accessed 14 Nov 2018 9. Hawlitschek, F., Teubner, T., Weinhardt, C.: Trust in the sharing economy. Die Unternehmung Swiss J. Bus. Res. Pract. 70, 26–44 (2016) 10. Grybait˙e, V., Stankeviciene, J.: Motives for participation in the sharing economy—evidence from Lithuania. Ekon. Zarzadz. 8(4), 7–17 (2016) 11. Wu, X., Shen, J.: A study on Airbnb’s trust mechanism and the effects of cultural values—based on a survey of Chinese consumers. Sustainability 10(9), 3041 (2018) 12. Foldvary, F.E., Klein, D.B.: The half-life of policy rationales: how new technology affects old policy issues. Knowl. Technol. Policy 15(3), 82–92 (2002) 13. Dellarocas, C.: The digitization of word of mouth: promise and challenges of online feedback mechanisms. Manag. Sci. 49 (2003) 14. Resnick, P., Zeckhauser, R., Swanson, J., Lockwood, K.: The value of reputation on eBay: a controlled experiment. Exp. Econ. 9(2), 79–101 (2006) 15. Guttentag, D.: Airbnb: disruptive innovation and the rise of an informal tourism accommodation sector. Curr. Issue Tour. 18(12), 1–26 (2013) 16. You, L., Sikora, R.: Performance of online reputation mechanisms under the influence of different types of biases. Inf. Syst. e-Bus. Manag. 12(3), 417–442 (2013) 17. Ye, T., Alahmad, R., Pierce, C., Robert, L.: Race and rating on sharing economy platforms: the effect of race similarity and reputation on trust and booking intention in Airbnb. In: Proceedings of the 38th International Conference on Information Systems, Seoul, Korea (2017) 18. Doney, P.M., Cannon, J.P.: An examination of the nature of trust in buyer-seller relationships. J. Mark. 61(2), 35–51 (1997) 19. Möhlmann, M.: Digital trust and peer-to-peer collaborative consumption platforms: a mediation analysis. SSRN Electron. J. (2016) 20. Kamal, P., Chen, J.: Trust in sharing economy. In: PACIS 2016 Proceedings (2016) 21. Ranzini, G., Newlands, G., Anselmi, G., Andreotti, A., Eichhorn, T., Etter, M., Hoffmann, C., JJrss, S., Lutz, C.: Millennials and the sharing economy: European perspectives. SSRN Electron. J. (2017) 22. Belk, R.: You are what you can access: sharing and collaborative consumption online. J. Bus. Res. 67(8), 1595–1600 (2014) 23. PwC: Consumer Intelligence Series “The Sharing Economy”. 
https://www.pwc.com/us/en/ technology/publications/assets/pwc-consumer-intelligence-series-the-sharing-economy.pdf (2015). Accessed 7 Sept 2018 24. Matzler, K., Veider, V., Kathan, W.: Adapting to the sharing economy. MIT Sloan Manag. Rev. 56(2), 71–77 (2015)


25. Cordella, A.: Transaction costs and information systems. J. Inf. Technol. 21, 195–202 (2006) 26. Henten, H.A., Windekilde, M.I.: Transaction costs and the sharing economy. INFO 18(1), 1–15 (2016) 27. Furchtgott-Roth, H.: The Myth of “Sharing” in a Sharing Economy. https://www.forbes.com/ sites/haroldfurchtgottroth/2016/06/09/the-myth-of-the-sharing-economy/ (2016). Accessed 17 Oct 2018 28. European Commission: Communication on Online Platforms and the Digital Single Market Opportunities and Challenges for Europe. https://ec.europa.eu/digital-single-market/en/news/ communication-online-platforms-and-digital-single-market-opportunities-and-challengeseurope (2016). Accessed 3 Oct 2019 29. Farronato, C., Fradkin, A.: The Welfare Effects of Peer Entry in the Accommodation Market: The Case of Airbnb. Social Science Research Network, Rochester, NY (2018) 30. CaptureCode: Businesses Can Reduce Costs with Mobile Payment | Mobile Marketing Solutions. https://www.capturecode.com/mobile-payment-a-cost-effective-measure-for-bus inesses/ (2014). Accessed 15 Nov 2020 31. Prime Indexes: Mobile payments industry overview. https://www.primeindexes.com/indexes/ prime-mobile-payments-index/whitepaper.html. Accessed 2 Nov 2018 32. Godelnik, R.: Millennials and the sharing economy: lessons from a ‘buy nothing new, share everything month’ project. Environ. Innov. Soc. Trans. 23, 40–52 (2017) 33. Pastorek, G.: PayPal vs. bank transfer—which can save you more? https://www.finder.com/ paypal-vs-banks-international-transfers (2018). Accessed 1 Dec 2019 34. Hosu, I.: Digital Entrepreneurship and Global Innovation. IGI Global (2016) 35. Cormode, G., Krishnamurthy, B.: Key differences between Web 1.0 and Web 2.0. First Monday 13(6) (2008) 36. Molnár, E., Molnár, R., Kryvinska, N., Greguš, M.: Web intelligence in practice. Soc. Serv. Sci. J. Serv. Sci. Res. 6(1), 149–172 (2014) 37. Techopedia: What is Web 2.0?—Definition from Techopedia. http://www.techopedia.com/def inition/4922/web-20. Accessed 19 Nov 2018 38. John, N.: Sharing and Web 2.0: the emergence of a keyword. New Media Soc. 15(2), 167–182 (2013) 39. We Are Social: Digital in 2018: World’s internet users pass the 4 billion mark. https://weares ocial.com/blog/2018/01/global-digital-report-2018 (2018). Accessed 10 Oct 2018 40. DHL: Sharing economy logistics: rethinking logistics with access over ownership. http:// www.dhl.com/content/dam/downloads/g0/about_us/logistics_insights/DHLTrend_Report_ Sharing_Economy.pdf (2017). Accessed 12 Sept 2018 41. Zapatero, M., Brändle, G., San Roman, R.: Interpersonal communication in the Web 2.0. The relations of young people with strangers. Rev. Lat. Comun. Soc. 68, 436–456 (2013) 42. Simmons, R.O.-B.: Disruptive digital technology services: the case of Uber car ridesharing in Ghana. In: AMCIS 2018 Proceedings (2018) 43. Harvey, J., Smith, A., Golightly, D.: Online technology as a driver of sharing. In: The Rise of the Sharing Economy: Exploring the Challenges and Opportunities of Collaborative Consumption. Praeger (2018) 44. Andreotti, A., Anselmi, G., Eichhorn, T., Hoffmann, C., Micheli, M.: Participation in the sharing economy. In: Ps2Share: Participation, Privacy and Power in the Sharing Economy (2017) 45. Dillahunt, T., Lampinen, A., O’Neill, J., Terveen, L., Kendrick, C.: Does the sharing economy do any good? In: Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion—CSCW ‘16 Companion (2016) 46. 
Wang, Y., Wang, Y., Lin, H., Tang, T.: Determinants of user acceptance of Internet banking: an empirical study. Int. J. Serv. Ind. Manag. 14(5), 501–519 (2003) 47. Campbell, M.: National Study Quantifies the “Sharing Economy” Movement. https://www. prnewswire.com/news-releases/national-study-quantifies-the-sharing-economy-movement138949069.html (2012). Accessed 17 Oct 2018 48. Smith, K., Brower, T.: Longitudinal study of green marketing strategies that influence Millennials. J. Strateg. Mark. 20(6), 535–551 (2012)


49. Hsiao, C.-Y., Moser, C., Schoenebeck, S., Dillahunt, T.: The role of demographics, trust, computer self-efficacy, and ease of use in the sharing economy. In: Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS)—COMPASS ‘18 (2018) 50. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N., Yasar, A.-U.-H.: A real-time service system in the cloud. J. Ambient Intell. Humaniz. Comput. 11, 961–977 (2020). https://doi.org/ 10.1007/s12652-019-01203-7 51. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Computing 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z 52. Leskovec, J., Adamic, L., Huberman, B.: The dynamics of viral marketing. ACM Trans. Web 1 (2005) 53. Keller, E.: Unleashing the power of word of mouth: creating brand advocacy to drive growth. J. Advert. Res. JAR 47(4), 448–452 (2007) 54. NIELSEN: Global Trust in Advertising—2015. https://www.nielsen.com/eu/en/insights/rep ort/2015/global-trust-in-advertising-2015 (2015). Accessed 14 Nov 2019 55. Sinan, M.: On heels of new funding and global expansion, car service Uber launches in D.C. today. https://venturebeat.com/2011/12/15/uber/ (2011). Accessed 23 Aug 2020 56. Key, T.: Domains of digital marketing channels in the sharing economy. J. Mark. Channels 24(1–2), 27–38 (2017) 57. Frenken, K.: Political economies and environmental futures for the sharing economy. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 375, 20160367 (2017) 58. Burcea, S., Ciocoiu, C., Tartiu, V.: Environmental impact of ICT and implications for e-waste management in Romania. Econ. Ser. Manag. 13(2), 348–360 (2010) 59. Ormazabal, M.: Turning the E-waste challenge into an opportunity. https://news.itu.int/turninge-waste-challenge-opportunity/ (2013). Accessed 19 Nov 2018 60. Hilty, L., Aebischer, B.: ICT for sustainability: an emerging research field. In: ICT Innovations for Sustainability. Advances in Intelligent Systems and Computing, vol. 310, Springer International Publishing (2015) 61. Pouri, M.J., Hilty, L.: ICT-enabled sharing economy and environmental sustainability—a resource-oriented approach. In: Advances and New Trends in Environmental Informatics (2018) 62. Chowdhury, A., Veeramani, S.: Information technology: impacts on environment and sustainable development. Pertanika J. Sci. Technol. 23(1), 127–139 (2015)

Sentiment Analysis for Diagnostic Purposes

Urszula Krzeszewska and Joanna Ochelska-Mierzejewska

Abstract Natural language processing has been the subject of numerous studies over the last decade. They have focused on the various stages of text processing, from preparing the text, through vectorization, to final understanding. One of the greatest challenges nowadays is to understand the emotions that are expressed in written text. This is all the more difficult because sometimes even people are unable to recognize the sentiment of a text. This work focuses on the analysis of emotional attitudes in texts that are the written statements of computer science students at the Technical University of Lodz. Due to the free style of expression, the created application is a good basis for automatic analyses from a diagnostic angle, as an aid for psychologists, educators or sociologists. The bag-of-words and n-gram methods were used to vectorize the text, while k-NN and NBC were used for the classification of sentiments.

Keywords Natural language processing · Sentiment classification · Polish natural language · Diagnosis · Attitude

1 Introduction

Automatic analysis of the content of written statements is a dynamically developing field of science. It belongs to the area of data mining, which is a very popular and important field of computer science. Currently, most works and applications focus on English, while analysis in Polish is still an issue that leaves a huge number of open questions and offers wide opportunities for development [1]. The classification of written statements by emotion, or semantic analysis, is intended to assign a particular emotion or emotional attitude (positive or negative)


based on the phrases provided in the text. The input document of the analysis can be a free-form statement, a diary fragment, or a transcription of an interview. This type of document is given a label which specifies the emotion with which it is associated [2, 3]. Natural Language Processing (NLP) is an area that began in the 1950s. From the very beginning it was a combination of artificial intelligence with linguistics [4]. At the beginning, NLP was significantly different from the information retrieval (IR) process, which was based mainly on statistical methods. Nowadays, however, both of these fields have partly merged. This means that IR uses the achievements of NLP, for example in the form of artificial intelligence, and NLP, apart from machine learning, also uses statistical methods of text analysis borrowed from IR [4–6]. Various tools for the automatic analysis of written statements are available on the market. Each of them focuses on a different issue and pays more attention to something different. Some are fully commercial, while others are available to every user. Among the sample applications one can mention:

• Text Inspector—an Internet application that allows the user to analyze the difficulty of a text in English. The main part of the application is a web page containing a field for entering text by copying or typing. It is also possible to add text to the analysis by providing a file with a *.txt extension. Statistics concerning the analyzed text and an analysis of the difficulty of the words used in the text are presented in a separate view in the form of tables and diagrams [7]. The main advantages of this application include a detailed analysis of text documents taking into account the difficulty of the text, as well as ease of use. The main disadvantages, however, are that it analyzes texts in English only and addresses only text difficulty rather than the emotional aspect, the sentiment. Additionally, the application is partially paid.
• Jasnopis—an application allowing the user to assess the language difficulty of a text in Polish (according to the instruction manual available on the website, the text to be assessed should be written in Polish and consist of at least one hundred words, and it should not be an artistic text, with particular emphasis on not being poetry) [8]. The result of the application contains three fields:
  – "Statystyki" ("Statistics" in English), where the scale of difficulty of the text is shown, as well as detailed statistics concerning the analyzed text (after pressing the "expand" button);
  – "Dane" ("Data" in English), where the analyzed text is presented, with the more difficult fragments marked and suggestions for replacing individual fragments with easier ones;
  – "Legenda" ("Legend" in English), where all explanations for the analysis available in the "Dane" field can be seen.
Among the advantages of this application are that it is designed for the Polish language, it is easy to use, and the result of the analysis is presented in a clear way. The main disadvantage, however, is again that it analyzes only the difficulty of the text and provides, compared to Text Inspector, little statistical data (most of it can be found even in simple text editors).


• Morfeusz—a desktop application enabling the morphological analysis of given texts. The application is available for the following operating systems: Linux 64-bit, Linux 32-bit, Windows 64-bit, Windows 32-bit and Mac OS X 64-bit. It is offered in two versions, as a console and as a window application. It is worth adding that this application can also be used as an external library for C++, Java and Python [9]. The application consists of two main tabs: "Analyzer" and "Generator". The "Analyzer" allows for an accurate morphological analysis of the text entered in the large input field. The result of the analysis is presented in the largest area of the window, called "Morphological analysis". There, in the form of a table, all the words contained in the analyzed text are divided into tokens, with their possible lemmas, assigned parts of speech with appropriate forms, etc. The "Generator" allows the user to generate all available forms for a given lemma. Among the main advantages of this application one can mention an analysis in Polish, focused on the characteristics of this language, the provision of both a graphical and a console interface, and the detailed analysis that it offers.

However, none of the applications mentioned above allows the emotional coloration of the text to be analyzed. Such applications or analyses can be found in the literature for the English language [10–12], but this type of approach is not available for the Polish language. The main objective of this work is to analyze the content of written statements in terms of emotions. The design and implementation of an application enabling sentiment analysis is intended to help in the diagnosis of social attitudes, as well as to automate the process of analyzing interviews, documents and content, which will be helpful in the daily work of educators, sociologists and psychologists.

The first chapter contains an introduction to the topic of natural language processing, as well as an overview of the tools that enable automatic text analysis, with their advantages and disadvantages. Additionally, the goal of the work is included. The second chapter describes how the text is represented, together with examples of using specific methods. There is information on how to prepare the text for analysis, what the possible approaches to text vectorization are, and which methods allow texts to be classified. The third chapter introduces social science issues that are helpful for automatic text analysis for diagnostic purposes. It pins down the notion of attitudes and the types of known tools that allow a diagnosis to be made. The fourth chapter describes the architecture and operation scheme of the tool prepared within the framework of this publication, which allows for automatic analysis of the text from the point of view of emotions. The fifth chapter summarizes the results of experiments conducted with the tool. The sixth chapter contains a summary of this work, conclusions, and possible improvements or further development of the prepared tool.


2 Text Representation

The process of machine learning based on text data requires the representation of documents in the form of vectors in a specific feature space. The way of obtaining such a representation can be called a language model [13]. For further analysis it is necessary to define the basic concept of a word. A word is defined as any sequence of letters in a natural-language text, separated by spaces or punctuation marks [5, 13].

2.1 Text Pre-processing

Every document can be transformed into certain groups and subgroups in order to finally obtain a vector of numbers. The basic operations that a text document can be subjected to as part of a shallow text analysis are the following transformations [14]—Fig. 1.

1. Tokenization is one of the most basic transformations of the text. It consists of dividing the text into linguistic units, so-called tokens. Tokens can be single sentences, but also words, numbers, punctuation marks, etc. [14, 15]. The simplest tokenization consists of dividing the text into sentences, using a capital letter and punctuation marks such as the dot, exclamation mark and question mark, and into smaller tokens using spaces. This type of algorithm is sufficient for the Polish language. An example of tokenization is presented below.
   SENTENCE: „Ala ma kota, kot ma Alę.” (which means “Ala has a cat, the cat has Ala.”)
   TOKENS: ‘Ala’, ‘ma’, ‘kota’, ‘,’, ‘kot’, ‘ma’, ‘Alę’, ‘.’
2. The unification of letter capitalization converts all letters in a given word to the same case. This is a very important stage of initial text processing, as it enables further analysis [16]. Examples of unification are presented below; the first one can be a word beginning a new sentence, and the second one is one of the ways of writing words on the Internet.
   Everyone → everyone
   SuNrIsE → sunrise

Fig. 1 Schema of text preparation—shallow text analysis

3. Lemmatization is the reduction of a group of words constituting variants of one expression to a common form [17]. Technically, lemmatization can be applied in two ways:
   • by reduction, which means that during processing all words are converted to their basic form;
   • by extension, which means that all grammatical forms are added to the words being processed, that is, in fact, taking into account the many grammatical forms of the analyzed word.
   What is important is that the whole process of lemmatization does not take into account the context in which a given word is found. The two ways of lemmatization mentioned above are, respectively:
   • Stemming, which extracts from the word the so-called core, or stem of the word, by removing its affixes. This means that one term can represent a whole family of words [16, 17]. An example of stemming (all of the words given can be translated as variants of counting; liczba means number):
     zaliczyć, przeliczyć, podliczenie, wyliczenie, liczba → licz
   This approach is very fast and works well for the English language, for which the number of stemming rules is relatively small and can be placed on one page. For comparison, the number of such rules for the French language is already 60,000. Additionally, a list of exceptions must be defined for each case. As far as the Polish language is concerned, there would be several times more rules than for French, and the same applies to the exceptions. This means that applying this kind of analysis to the Polish language can be very expensive both in terms of time and computation.
   • Dictionary analysis, or inflection analysis, that is, analysis based on the assumption that all possible variants of a word are included in a dictionary, in which, additionally, there are relationships between words from the same word family [15]. An example of such an analysis is presented in Table 1.

4. Part-of-Speech tagging (POS-tagging) is the process of determining what part of speech a word is. As part of this process, it is possible to obtain additional information from the morphological analysis, such as gender, person, whether the word is singular or plural, etc. An example of POS-tagging is shown below.
   przechodził (went through) → verb → masculine → singular → past tense

Table 1 Example of dictionary analysis

Podczas patroszenia dzika… (When gutting a wild boar…): "patroszenia" can be lemmatized as patroszyć (to gut) or patroszenie (gutting)
Większe wątpliwości mają… (Greater doubts have…): "Większe" can be lemmatized as wielki (great) or duży (large)
Ala ma kota. (Ala has a cat.): "ma" can be lemmatized as mieć (to have) or mój (my)
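To make the pre-processing pipeline above concrete, the following is a minimal Python sketch of the shallow analysis steps (tokenization, case unification, and dictionary-based lemmatization by reduction). The tiny lemma dictionary and the example sentence are illustrative assumptions, not part of the tool described later in this chapter.

```python
import re

# Illustrative lemma dictionary (dictionary analysis by reduction);
# a real system would use a full morphological dictionary such as Morfeusz.
LEMMAS = {
    "ma": "mieć",
    "kota": "kot",
    "alę": "ala",
    "psa": "pies",
}

def tokenize(text):
    # Split into word tokens and punctuation tokens, keeping Polish letters.
    return re.findall(r"\w+|[^\w\s]", text)

def preprocess(text):
    tokens = tokenize(text)                       # 1. tokenization
    tokens = [t.lower() for t in tokens]          # 2. unification of capitalization
    tokens = [LEMMAS.get(t, t) for t in tokens]   # 3. lemmatization by reduction
    return [t for t in tokens if t.isalpha()]     # drop punctuation tokens

print(preprocess("Ala ma kota, kot ma Alę."))
# ['ala', 'mieć', 'kot', 'kot', 'mieć', 'ala']
```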

2.2 Vectorization Bag-of-words (BOW) is one of the ways of vector representation of text. In this approach, each document is transformed into a vector whose length is equal to the number of words contained in a certain dictionary. A dictionary, on the other hand, is usually a set of unique words (including therms) that appear in the whole set of documents [18]. Usually, these are not all words, because the so-called stopwords are not infrequently omitted. Stopwords are words that occur relatively often in the examined set of documents and are insignificant, i.e. they are not a carrier of information. There are lists of such words for each language. The list of stopwords for Polish language is presented below. Examples of stop words in English include: “a”, “an”, “the”, “and”, “but”, “if”, “or”, “because” etc. For the Polish language, due to national characters, this list contains words in two versions—with and without national characters, e.g. “az” and “a˙z”. a, aby, ach, acz, aczkolwiek, aj, albo, ale, alez, ale˙ z, ani, az, a˙ z, bardziej, bardzo, beda, bedzie, bez, deda, b˛ ed˛ a, bede, b˛ ed˛ e, b˛ edzie, bo, bowiem, by, byc, by´ c, byl, byla, byli, bylo, byly, był, była, było, były, bynajmniej, cala, cali, caly, cała, cały, ci, cie, ciebie, ci˛ e, co, cokolwiek, cos, co´ s, czasami, czasem, czemu, czy, czyli, daleko, dla, dlaczego, dlatego, do, dobrze, dokad, dok˛ ad, dosc, do´ s´ c, duzo, du˙ zo, dwa, dwaj, dwie, dwoje, dzis, dzisiaj, dzi´ s, gdy, gdyby, gdyz, gdy˙ z, gdzie, gdziekolwiek, gdzies, gdzie´ s, go, i, ich, ile, im, inna, inne, inny, innych, iz, i˙ z, ja, jak, jakas, jaka´ s, jakby, jaki, jakichs, jakich´ s, jakie, jakis, jaki´ s, jakiz, jaki˙ z, jakkolwiek, jako, jakos, jako´ s, j˛ a, je, jeden, jedna, jednak, jednakze, jednak˙ ze, jedno, jego, jej, jemu, jesli, jest, jestem, jeszcze, je´ sli, jezeli, je˙ zeli, juz, ju˙ z, kazdy, ka˙ zdy, kiedy, kilka, kims, kim´ s, kto, ktokolwiek, ktora, ktore, ktorego, ktorej, ktory, ktorych, ktorym, ktorzy, ktos, kto´ s, która, które, którego, której, który, których, którym, którzy, ku, lat, lecz, lub, ma, maj˛ a, mało, mam, mi, miedzy, mi˛ edzy, mimo, mna, mn˛ a, mnie, moga, mog˛ a, moi, moim, moj, moja, moje, moze, mozliwe, mozna, mo˙ ze, mo˙ zliwe, mo˙ zna, mój, mu, musi, my, na, nad, nam, nami, nas, nasi, nasz, nasza, nasze, naszego, naszych, natomiast, natychmiast, nawet, nia, ni˛ a, nic, nich, nie, niech, niego, niej, niemu, nigdy, nim, nimi, niz, ni˙ z, no, o, obok, od, około, on, ona, one, oni,


ono, oraz, oto, owszem, pan, pana, pani, po, pod, podczas, pomimo, ponad, poniewaz, poniewa˙ z, powinien, powinna, powinni, powinno, poza, prawie, przeciez, przecie˙ z, przed, przede, przedtem, przez, przy, roku, rowniez, równie˙ z, sam, sama, s˛ a, sie, si˛ e, skad, sk˛ ad, soba, sob˛ a, sobie, sposob, sposób, swoje, ta, tak, taka, taki, takie, takze, tak˙ ze, tam, te, tego, tej, ten, teraz, te˙ z, to, toba, tob˛ a, tobie, totez, tote˙ z, totob˛ a, trzeba, tu, tutaj, twoi, twoim, twoj, twoja, twoje, twój, twym, ty, tych, tylko, tym, u, w, wam, wami, was, wasz, wasza, wasze, we, według, wiele, wielu, wi˛ ec, wi˛ ecej, wlasnie, wła´ snie, wszyscy, wszystkich, wszystkie, wszystkim, wszystko, wtedy, wy, z, za, zaden, zadna, zadne, zadnych, zapewne, zawsze, ze, zeby, zeznowu, zł, znow, znowu, znów, zostal, został, ˙ zaden, ˙ zadna, ˙ zadne, ˙ zadnych, ˙ ze, ˙ zeby [19]

It is worth noting that the word “nie” (which means “no”) is on this list. Indeed, it occurs very often in Polish, but for the purpose of the applied solution it will be removed from the list of irrelevant words, due to the importance it carries, especially when talking about emotional attitudes and sentiment. The bag-of-words method works well on texts with different themes, where there are no discipline-specific words, which means that it is very good for analyzing text from the point of view of emotions, because each text can touch on different themes. Finally, this means that within bag-of-words each document is represented in a discrete space where each dimension corresponds to a unique word. The individual elements of this vector may indicate [20]:
• the presence of the term in the analyzed document, in which case each element of the vector belongs to the two-element set {0, 1};
• the number of occurrences of the term in the analyzed document, in which case each element of the vector belongs to the set of non-negative integers.
Example: It is assumed that the set contains three documents, as shown in Table 2. In the first step, tokenization and lemmatization of the documents is performed, thus creating a set of terms for each document. The next step is to create a dictionary that will consist of the set of unique words of all documents, with the omission of irrelevant words (in this case, the words „i” (“and”) and „też” (“also”)). For the documents given in the example, the resulting collection of unique elements is given after the tables below.

Table 2 Example of a set of documents and their conversion to terms

Document: Ala ma kota, kot ma Alę. (Ala has a cat, the cat has Ala.)
Terms: Ala, mieć, kot, kot, mieć, Ala (Ala, have, cat, cat, have, Ala)

Document: Tomek ma kota i ma też psa. (Tomek has a cat and also has a dog.)
Terms: Tomek, mieć, kot, i, mieć, też, pies (Tomek, have, cat, and, have, also, dog)

Document: Ala nie ma psa. (Ala has no dog.)
Terms: Ala, nie, mieć, pies (Ala, not, have, dog)


Table 3 Vector representation of sample documents (dictionary order: Ala, kot, mieć, nie, pies, Tomek)

Ala ma kota, kot ma Alę. (Ala has a cat, the cat has Ala.): Ala = 2, kot = 2, mieć = 2, nie = 0, pies = 0, Tomek = 0
Tomek ma kota i ma też psa. (Tomek has a cat and also has a dog.): Ala = 0, kot = 1, mieć = 2, nie = 0, pies = 1, Tomek = 1
Ala nie ma psa. (Ala has no dog.): Ala = 1, kot = 0, mieć = 1, nie = 1, pies = 1, Tomek = 0

Ala ma kota, kot ma Al˛e (Ala has cat, cat has Ala.)

unique elements is: „Ala”, „kot”, „mie´c”, „nie” (as described, we do not treat „nie” (“no”) as a stop word), „pies”, „Tomek”. Therms in the dictionary are set in a specific order, e.g. alphabetically. For such a dictionary it is possible to express each of the documents using vectors, where the ith element of the vector corresponds to the number of occurrences of the ith therm from the dictionary in the analyzed document. Vector representation of sample documents is presented in Table 3. The bag-of-words method is one of the simplest methods of text representation. It is, however, criticized for omitting the occurrence of relationships between individual words. Such omission may have huge consequences for the outcome of the analysis. For example, the phrase „nie najgorzej” (‘not the worst’) is positive, but if the words „nie” (‘not’) and „najgorzej” (‘the worst’) are analysed separately, the result can easily be described as negative [20]. The answer to this problem is to use the n-gram method. Can be said that this method is a kind of development of bag-of-words. Here we count the occurrences of a sequence of adjacent words of a fixed length n. The most common models are [21]: • 1-gram (unigrams), or in fact bag-of-words, is present in two variants, taking into account the occurrence of a given unigram and paying attention to the frequency of occurrence of a given unigram, as described in detail before; • 2-gram (bigrams); • 3-gram (trigrams). For a sequence of adjacent words with a length of 2 or more, the frequency variant is used, which means that the number of occurrences of the sequence in the document is counted. Example: It was assumed that the teaching set contains three documents, as in Table 3. As before, the first step is to tokenize and lemmatize the documents, remove

Sentiment Analysis for Diagnostic Purposes Table 4 Sample documents and corresponding bigrams

163

Document

Therms

Ala ma kota, kot ma Al˛e (Ala has cat, cat has Ala.)

Ala mie´c, mie´c kot, kot kot, kot mie´c, mie´c Ala (Ala has, has cat, cat cat, cat has, has Ala)

Tomek ma kota i ma te˙z psa Tomek mie´c, mie´c kot, kot (Tomek has a cat and also has a mie´c, mie´c pies dog.) (Tomek has, has cat, cat has, has dog) Ala nie ma psa (Ala has no dog.)

Ala nie, nie mie´c, mie´c pies (Ala not, has no, no dog)

irrelevant words, and then create terms from the sequences of adjacent words for the bigram model. The result of these transformations is shown in Table 4. As in the previous example, the next step is to create a dictionary of unique bigrams, which for this example will be: „Ala mieć”, „mieć kot”, „kot kot”, „kot mieć”, „mieć Ala”, „Tomek mieć”, „mieć pies”, „Ala nie”, „nie mieć”. They should be ranked in a certain order and then their occurrences should be counted, as shown in Table 5. Of course, this method can also be used without removing the words considered insignificant, which can further increase the accuracy of the subsequent analysis and thus subject the entire structure of the document to analysis. This makes the algorithm more sensitive to the appearance and shape of the document [22].
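The construction of the unigram (bag-of-words) and bigram count vectors described above can be expressed compactly in code. The following is a minimal Python sketch, assuming the documents have already been tokenized and lemmatized as in Table 2; it reproduces the counts of Tables 3 and 5 for this toy corpus and is not the authors' original implementation.

```python
from collections import Counter

STOPWORDS = {"i", "też"}  # "nie" is deliberately kept, as discussed above

def ngrams(terms, n):
    # All sequences of n adjacent terms, joined into a single string.
    return [" ".join(terms[i:i + n]) for i in range(len(terms) - n + 1)]

def vectorize(documents, n=1):
    # documents: lists of lemmatized terms, with stopwords already removed.
    grams_per_doc = [ngrams(doc, n) for doc in documents]
    vocabulary = sorted({g for grams in grams_per_doc for g in grams})
    counts = [Counter(grams) for grams in grams_per_doc]
    # Each document becomes a vector of n-gram occurrence counts.
    return vocabulary, [[c[v] for v in vocabulary] for c in counts]

docs = [
    ["Ala", "mieć", "kot", "kot", "mieć", "Ala"],
    ["Tomek", "mieć", "kot", "mieć", "pies"],   # "i" and "też" removed
    ["Ala", "nie", "mieć", "pies"],
]
vocab1, bow_vectors = vectorize(docs, n=1)      # unigram counts, cf. Table 3
vocab2, bigram_vectors = vectorize(docs, n=2)   # bigram counts, cf. Table 5
print(vocab1, bow_vectors)
```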

2.3 Supervised Automatic Classification Methods

The k-nearest neighbours (k-NN) method is one of the basic machine learning methods used for classification. It is based on the intuition that similar data have similar labels. The steps of the algorithm are presented below [23, 24].

Learning:
1. Preparation of the data for analysis.
2. Memorizing the whole training set.
Classification:


1. Preparing the test data in the same way as the training data.
2. Calculating the distance between the test vector and all vectors from the training set.
3. Sorting the training vectors by their distance from the test vector, from the smallest to the largest.
4. Selecting the k vectors closest to the test vector.
5. Counting the occurrences of the individual labels among the k nearest vectors and selecting the most frequent one; if several labels have the same, highest number of occurrences, one of them is selected at random.

Table 5 Vector representation of the sample texts in the bigram model

Ala ma kota, kot ma Alę. (Ala has a cat, the cat has Ala.): Ala mieć = 1, mieć kot = 1, kot kot = 1, kot mieć = 1, mieć Ala = 1, Tomek mieć = 0, mieć pies = 0, Ala nie = 0, nie mieć = 0
Tomek ma kota i ma też psa. (Tomek has a cat and also has a dog.): Ala mieć = 0, mieć kot = 1, kot kot = 0, kot mieć = 1, mieć Ala = 0, Tomek mieć = 1, mieć pies = 1, Ala nie = 0, nie mieć = 0
Ala nie ma psa. (Ala has no dog.): Ala mieć = 0, mieć kot = 0, kot kot = 0, kot mieć = 0, mieć Ala = 0, Tomek mieć = 0, mieć pies = 1, Ala nie = 1, nie mieć = 1




6. Assigning the most frequent label as the label of the test vector.

To calculate the distance between vectors, two metrics were used: the Euclidean metric and the taxicab (city-block) metric. In the Euclidean metric, the distance d_e(x, y) between two points x = (x_1, x_2, …, x_n) and y = (y_1, y_2, …, y_n) is the square root of the sum of the squared differences of the coordinates with the same indices, according to Formula (1):

d_e(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}    (1)

In the taxicab (city) metric, the distance d_m(x, y) between two points x = (x_1, x_2, …, x_n) and y = (y_1, y_2, …, y_n) is the sum of the absolute differences of the coordinates of the points x and y, according to Formula (2):

d_m(x, y) = \sum_{i=1}^{n} |x_i - y_i|    (2)
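As an illustration of the procedure described above, the following is a minimal Python sketch of k-NN classification over count vectors, using both distance metrics. The training data, labels and parameter values are illustrative assumptions, not taken from the authors' experiments.

```python
import math
from collections import Counter

def euclidean(x, y):
    # Formula (1)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cityblock(x, y):
    # Formula (2), the taxicab metric
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def knn_classify(test_vec, training, k=3, metric=euclidean):
    # training: list of (vector, label) pairs memorized during learning.
    by_distance = sorted(training, key=lambda item: metric(test_vec, item[0]))
    labels = [label for _, label in by_distance[:k]]
    # Ties are broken arbitrarily here; the algorithm above allows a random choice.
    return Counter(labels).most_common(1)[0][0]

training = [
    ([2, 0, 1, 0], "positive"),
    ([0, 2, 0, 1], "negative"),
    ([1, 0, 2, 0], "positive"),
]
print(knn_classify([1, 0, 1, 0], training, k=1, metric=cityblock))  # "positive"
```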

Naive Bayes classifier. Machine learning often uses probabilistic methods based on Bayes' theorem, presented in Formula (3):

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}    (3)

where P(A|B) is the conditional probability of event A given that B occurs; P(B|A) is the conditional probability of event B given that A occurs; P(A) is the probability of event A; and P(B) is the probability of event B. Based on this theorem, a classification method called the Naive Bayes Classifier (NBC) was created. This classifier assumes the independence of the examined features; in the case of text classification, the independence of single words within texts. It has been shown that even when the relationships between the features are non-trivial, NBC is still effective. Depending on the characteristics of the input vector, it is possible to distinguish several versions of this classifier, based on the following distributions [25]:
• the Bernoulli distribution, for which the feature vector contains only information about the presence or absence of a feature;
• the Gaussian distribution, for which the feature vector contains values from a continuous space (for natural language processing only feature vectors in a discrete space are considered);
• the multinomial distribution, for which the feature vector contains information about the number of occurrences of a given feature in the analyzed set (it has been shown that for the


classification of texts, the Naive Bayes Classifier based on this distribution works best). Regardless of the chosen probability distribution, the result of the text classification is obtained by the following algorithm:

1. Calculation of the probability of the occurrence of the word x_j in the class c_i, given by Formula (4):

   P(x_j | c_i) = \frac{\text{occurrences of the term } x_j \text{ in documents of class } c_i}{\text{occurrences of all terms in documents of class } c_i}    (4)

2. Calculation of the probability of the document d belonging to a given class c_i, as shown by Formula (5):

   P(c_i | d) = P(c_i) \cdot \frac{\prod_{k=1}^{j} P(x_k | c_i)}{P(d)}    (5)

3. Selection of the class for which the calculated probability is highest, according to Formula (6):

   c = \arg\max_{c_i} P(c_i | d) = \arg\max_{c_i} \prod_{k=1}^{j} P(x_k | c_i)    (6)

4. In order to avoid zeroing the probabilities when one of the terms does not occur, Laplace smoothing should be used, i.e. 1 is added to the numerator of Formula (4) and the number of unique terms in all classes is added to the denominator, as presented in Formula (7):

   P(x_j | c_i) = \frac{\text{occurrences of the term } x_j \text{ in documents of class } c_i + 1}{\text{occurrences of all terms in documents of class } c_i + \text{number of unique terms in all classes}}    (7)

Example: The set of documents d 1 , d 2 , d 3 , d 4 and d 5 is given. Documents d 1 , d 2 , d 3 , d 4 belong to the appropriate classes, and all documents have the designated therm, as shown in Table 6. To classify document d 5 , you must calculate the probability of the words „trener”, „parkiet”, „noga” and „rytm” in subsequent documents of sport and dance, according to the Formula (7), as shown in Table 7. All therms of sport class is 10, dance class are 9, and the number of unique therms of all classes is 11. As shown in Table 7, for document d 5 classified on the basis of documents d 1 , d 2 , d 3 , d 4 it is possible to classify the document as a dance.


Table 6 Set of documents with assigned terms and classes

Document | Terms                                                                                | Class
d1       | trener, piłka, rzut, parkiet, noga (trainer, ball, throw, floor, leg)               | sport
d2       | trener, partner, muzyka (trainer, partner, music)                                   | dance
d3       | zespół, mecz, boisko, zawodnik, piłka, noga (team, match, pitch, player, ball, leg) | sport
d4       | parkiet, noga, muzyka, muzyka, parkiet (dance floor, leg, music, music, dance floor)| dance
d5       | trener, parkiet, noga, rytm (trainer, floor, leg, rhythm)                           | ?

Table 7 Calculation of the probabilities for classifying text d5 using NBC

Term             | P(term|sport) | P(term|dance)
trener (trainer) | (1+1)/(10+11) | (1+1)/(9+11)
parkiet (floor)  | (1+1)/(10+11) | (2+1)/(9+11)
noga (leg)       | (2+1)/(10+11) | (1+1)/(9+11)
rytm (rhythm)    | (0+1)/(10+11) | (0+1)/(9+11)
Result           | 0.000059      | 0.000075
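A minimal, self-contained sketch of a multinomial Naive Bayes classification with Laplace smoothing (Formula 7) for this example; it is not the authors' implementation, and the class and variable names are illustrative:

import java.util.*;

// Illustrative sketch of multinomial Naive Bayes with Laplace smoothing; not the Text Analyzer code.
public final class NaiveBayesExample {

    public static void main(String[] args) {
        // Training documents from Table 6, keyed by class.
        Map<String, List<List<String>>> trainingByClass = new HashMap<>();
        trainingByClass.put("sport", Arrays.asList(
                Arrays.asList("trener", "piłka", "rzut", "parkiet", "noga"),
                Arrays.asList("zespół", "mecz", "boisko", "zawodnik", "piłka", "noga")));
        trainingByClass.put("dance", Arrays.asList(
                Arrays.asList("trener", "partner", "muzyka"),
                Arrays.asList("parkiet", "noga", "muzyka", "muzyka", "parkiet")));

        List<String> testDocument = Arrays.asList("trener", "parkiet", "noga", "rytm"); // d5

        // Count term occurrences per class and collect the vocabulary of unique terms.
        Map<String, Map<String, Integer>> termCounts = new HashMap<>();
        Map<String, Integer> totalTermsPerClass = new HashMap<>();
        Set<String> vocabulary = new HashSet<>();
        for (Map.Entry<String, List<List<String>>> entry : trainingByClass.entrySet()) {
            Map<String, Integer> counts = new HashMap<>();
            int total = 0;
            for (List<String> doc : entry.getValue()) {
                for (String term : doc) {
                    counts.merge(term, 1, Integer::sum);
                    vocabulary.add(term);
                    total++;
                }
            }
            termCounts.put(entry.getKey(), counts);
            totalTermsPerClass.put(entry.getKey(), total);
        }

        // Formula (7): P(term|class) = (count + 1) / (terms in class + unique terms overall).
        // Note: the raw counts computed here may differ marginally from the rounded totals
        // quoted in the text, so the printed scores differ slightly from Table 7.
        String bestClass = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String cls : trainingByClass.keySet()) {
            double logScore = 0.0; // log-space avoids numerical underflow for longer texts
            for (String term : testDocument) {
                int count = termCounts.get(cls).getOrDefault(term, 0);
                double p = (count + 1.0) / (totalTermsPerClass.get(cls) + vocabulary.size());
                logScore += Math.log(p);
            }
            System.out.printf("%s: %.6f%n", cls, Math.exp(logScore));
            if (logScore > bestScore) {
                bestScore = logScore;
                bestClass = cls;
            }
        }
        System.out.println("Predicted class: " + bestClass); // dance, as in the example
    }
}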

3 Social Aspects of Text Representation

Natural language processing nowadays requires the researcher, apart from knowledge of statistics and machine learning, to deepen their knowledge of psychology, sociology and pedagogy, because these are the fields it draws upon. When the analysis is to touch upon the notion of attitude, it is necessary to become acquainted with the theory of attitudes and to understand it fully.

3.1 Theory of Attitude

The word "attitude" has many meanings. This ambiguity is mainly due to its use in many completely different fields (e.g. philosophy, sociology, anatomy, psychology). What is additionally important is that within psychology itself, and thus in pedagogy, the term "attitude" has been used in various senses. In classical psychology, the term denoted a kind of mental set. Nowadays, such a set is understood as a state of preparation for a specific reaction, which facilitates that reaction. Alongside the notion of a set, we currently also encounter the notion of attitude.


Social psychologists dealing with personality issues have come to the conclusion that, similarly to the readiness to react to stimuli, there may be a readiness to react to social situations, to an opponent's arguments and views, i.e. a specific readiness to behave towards representatives of other races, nations, views, etc. Following G. W. Allport's definition, an attitude is a mental and nervous state of readiness, organized by experience, which exerts a directing or dynamic influence on the individual's reactions to all objects or situations with which it is related. Importantly, this author also includes traits in the state of readiness (by which he defines the concept of attitude). It can be said that, like a trait, an attitude is a neuropsychic system. However, the two concepts cannot be identified with each other, because there are significant differences between them, and it is by revealing these differences that the concept of attitude itself can be better understood [26–29]. An attitude always has a specific object—it concerns someone or something; one cannot speak of an attitude in itself without referring it to some phenomenon, object or person—whereas a trait does not have such a specific object. Furthermore, attitudes can be specific or general, while traits are only general. Finally, attitudes usually express acceptance or rejection—it can be clearly stated that they are favorable or unfavorable—whereas traits are not so strictly oriented [26].

Concepts of attitude. Among the numerous theories concerning the issue of attitudes, three groups can be distinguished [30]:

1. Theories referring to the behavioral tradition or the psychology of learning, which pay particular attention to human behavior and reactions to objects of the outside world. It is assumed that these behaviors are repetitive and consistent.
2. Theories relating to the sociological concept, which draw attention to the relation of the attitude holder to its object. This relation is described as evaluative or emotional. According to this view, an attitude is understood as the degree of intensity of positive or negative feelings associated with an object.
3. Theories that refer to cognitive concepts in psychology, which treat an attitude not only as a specific behavior or an emotional relation to a given object, but also include the cognitive elements related to it.

All three of these concepts make it possible to state that an attitude consists of three components: cognitive, emotional and behavioral [30]. Due to the specificity of the analysis undertaken here, which is the analysis of the content of a written statement, it is only possible to focus on the emotional component; it is therefore described in more detail in the next part of the paper.

The emotional component. According to the definition presented by Nowak, the emotional-evaluative (or affective) component may have the character of cool, intellectualized assessments. The evaluations are usually verbalised using terms such as "good," "bad," "right" and "wrong," and, when an attitude is accompanied by concrete behavior, also using words such as "should" or "want". This component can also be expressed in the form of emotions accompanying the image of the object of the attitude, such as joy, fear, respect, contempt, etc.


Treating emotions and judgments jointly as a single affective component is justified by the fact that they can have very similar functions: both determining how important an attitude object is, and motivational functions, which determine our behavior towards the attitude object. The judgments and emotions of different people towards an object have a certain direction and intensity. The direction is defined as a positive or negative stance towards the attitude object. The intensity of the attitude is defined by the strength of the emotions evoked by this object [31].

3.2 Diagnostic Elements

The word "diagnosis" comes from the Greek διάγνωσις, meaning recognition, differentiation, judgment. The concept has its origins in ancient works, such as those of Hippocrates. Currently, the term "diagnosis", following Hederson and Henderson, can be described as "a concise description of the body, containing fully important characteristics, distinguishing physiological or pathological conditionings based on their important signs" [32]. According to Okoń, a diagnosis is "the recognition of an object, event, or situation in order to obtain precise information and to prepare remedial activities" [33]. He adds that several types of diagnosis can be distinguished: medical, psychological, pedagogical and educational.

The concept of diagnosis is used primarily in medical science. Originally it concerned only the description of the body's pathological condition; later it came to cover the determination of a person's health, their ability to work, etc. Nowadays, the term "diagnosis" is also used outside medicine, in other fields of knowledge, to describe the condition of the examined object. This means that it identifies not only a pathological state, but also all states of affairs, developmental trends, etc. related to the examined object. Moreover, Ziemski points out the complexity of diagnostic activities and distinguishes many partial diagnoses [34]. According to Jarosz and Wysocka, a diagnostic description is a list of empirical data on the basis of which purposeful action is possible [35]. The description should be exhaustive and clear, presenting a given phenomenon or situation in an ordered way. It is expressed in empirical terms as a selection of facts reported in a selective way.

Research methods, techniques and tools. The research process is a conscious, purposeful and intended action. This means that it should be guided by certain rules. Therefore, one of the most important steps, after defining the general problem and the research hypothesis, is to establish the research methods, techniques and tools. The concept of the research method is widely described in the literature. The definition proposed by Nowak is adopted for the purposes of this paper.


He concludes that: "By scientific method we mean here a specific, repetitive way of solving a problem. The method of empirical research is a specific, repetitive way of obtaining a certain type of information about reality, necessary for solving a specific type of research problem, of searching for an answer to a specific type of question by a broadly understood observation of reality" [36]. With reference to this definition, there are many classifications of pedagogical research methods. Pilch distinguishes the following in his classification [37]:

1. Pedagogical experiment—a way of gathering knowledge about the examined person, consisting in organizing an unusual situation that allows their attitudes and reactions to be revealed.
2. Pedagogical monograph—a method of procedure that leads to the description of educational institutions.
3. Individual case method—a research method consisting in the analysis of the activity of a single person in specific life situations.
4. Diagnostic survey method—a way of collecting knowledge about the functioning of a particular social phenomenon.

Due to the specifics of the proposed solution, this work focuses only on the individual case method. For the chosen research method, it is necessary to select appropriate research techniques. The techniques most frequently used with the individual case method are [37]:

1. Observation—a research activity consisting in gathering information through all forms of watching the examined person. It can be said that observation provides the researcher with the most natural information about the examined person.
2. Interview—a conversation between the researcher and the respondent, conducted according to previously developed instructions or based on a special questionnaire. An interview is used mainly to learn about the facts, opinions and attitudes of a given individual.
3. Analysis of documents—a technique consisting in collecting preliminary, descriptive and quantitative information about the examined individual, learning about their biography and the opinions expressed in documents.
4. Content analysis—a research technique used to describe the manifest content of informational messages objectively, systematically and quantitatively. The analysis of the content of personal documents makes it possible to diagnose the psychological characteristics of people.

To complete the terminological findings, the concept of a research tool still needs to be defined. It is an object used to implement a selected research method. It is worth emphasizing that a research technique has a verbal meaning and denotes an action, while a research tool has a noun meaning and serves the technical collection of data [37].


4 Text Analyzer Application

The designed application (Text Analyzer) is used to classify texts in Polish in terms of the emotional attitude presented in the text. In the current version of the program there are two types of text classification, namely:

• positive,
• negative.

Classification is made on the basis of collected texts. These texts were collected from 4th year computer science students at the Lodz University of Technology in 2019. The task of each student was to write any five positive sentences and five negative sentences, together with a statement of which of the written sentences are positive and which are negative. An example of a positive sentence is: "Wczoraj, gdy wracałem z pracy spotkała mnie bardzo miła sytuacja - dostałem kupon na darmową pizzę." (which means "Yesterday, when I was coming back from work, I had a very nice situation - I got a coupon for a free pizza.") and an example of a negative sentence: "Dzisiaj rano musiałem jechać na zajęcia w zatłoczonym tramwaju z jakimiś idiotami, którzy wpadli na pomysł żeby sprawdzić bilety." (which means "This morning I had to go to class in a crowded tram with some idiots who came up with the idea of checking tickets."). There were also texts whose sentiment was not determined by the students and whose classification was not clear, such as "Najlepiej zmarnowane 3.5 roku mojego życia" (which means "The best wasted 3.5 years of my life"). The sentences studied had an average of 7 words; the longest text consisted of 22 words, while the shortest contained only 2 words. It is worth noting that there were no restrictions on the vocabulary used, hence many negative texts contained numerous vulgarisms. In addition, the subject of the statement was not specified, in order to avoid extracting features from the subject of the statement instead of from the sentiment. In the end, however, most of the texts were focused around university life, public transport and the work that the students probably do in parallel with their studies. 150 students took part in the data collection process, which ultimately resulted in about 750 positive sentences and about 750 negative sentences (as mentioned earlier, not all students marked the sentiment of their texts).

The application is written in Java version 1.8. It uses the external library morfologik-polish, version 1.9.0 [38], which is free software under a 3-clause BSD license. The Text Analyzer application consists of 8 modules divided according to their functions. These modules are assigned to the three-tier application architecture as follows (Fig. 2):

1. Data Access Layer
   • DataEnums—module containing a set of enumerated values representing data.
   • DataReader—module responsible for loading data from files and preparing it for analysis.
   • TextComponents—module containing the object-oriented text representation.


Fig. 2 Text analyzer architecture schema with all modules included

2. Business Layer
   • Classification—module containing all applied methods of text classification.
   • MorphologicalAnalise—module containing the entire morphological analysis.
   • Serialization—module responsible for serialization of the application's results.
   • TextVectorization—module containing all applied methods of text vectorization.

3. Presentation Layer
   • CommandLineInterface—module containing the command line interface.

5 Experiments

In order to check the effectiveness of the created application and the proposed methods, a series of tests was conducted. The collection of texts was randomly divided into a training set (70% of the sentences obtained from students, with as many positive as negative sentences) and a test set (30% of the sentences). The experiment consisted in classifying the texts from the test set into one of the two categories described in Sect. 4—positive or negative. Each text was analyzed separately and after classification it did not become part of the system, so the same test conditions were obtained for all texts—the order did not matter.

Tests were performed for the two classification methods mentioned earlier—the k-nearest neighbors method and the Naive Bayes classifier. For the k-nn method, the effectiveness in two metrics (Euclidean and taxicab) was compared for the bag-of-words, 2-gram and 3-gram vectorization methods (Fig. 3).
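A minimal illustration, not taken from the authors' code, of the balanced 70/30 random split described above (the sentences and the helper class are placeholders):

import java.util.*;

// Illustrative sketch of the balanced 70/30 train/test split; not the Text Analyzer source code.
public final class TrainTestSplit {

    // Shuffles the sentences of one class and cuts them at 70%. Applying this to the
    // positive and negative sentences separately keeps the training set balanced.
    static List<List<String>> split70to30(List<String> sentencesOfOneClass, Random rng) {
        List<String> copy = new ArrayList<>(sentencesOfOneClass);
        Collections.shuffle(copy, rng);
        int cut = (int) Math.round(copy.size() * 0.7);
        return Arrays.asList(copy.subList(0, cut), copy.subList(cut, copy.size()));
    }

    public static void main(String[] args) {
        // Hypothetical tiny corpus standing in for the ~750 positive and ~750 negative sentences.
        List<String> positives = Arrays.asList("zdanie pozytywne 1", "zdanie pozytywne 2", "zdanie pozytywne 3");
        List<String> negatives = Arrays.asList("zdanie negatywne 1", "zdanie negatywne 2", "zdanie negatywne 3");

        Random rng = new Random(42); // a fixed seed makes the split reproducible
        List<List<String>> posSplit = split70to30(positives, rng);
        List<List<String>> negSplit = split70to30(negatives, rng);

        List<String> trainingSet = new ArrayList<>();
        trainingSet.addAll(posSplit.get(0));
        trainingSet.addAll(negSplit.get(0));
        List<String> testSet = new ArrayList<>();
        testSet.addAll(posSplit.get(1));
        testSet.addAll(negSplit.get(1));

        System.out.println("training: " + trainingSet.size() + " sentences, test: " + testSet.size());
    }
}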

Fig. 3 Percentage efficiency of k-nn method for different metrics and vectorization methods


Fig. 4 Percentage efficiency of NBC for different probabilistic distribution and bag-of-words vectorization method

For NBC, the efficiency was measured for the bag-of-words vectorization method for all three mentioned probability distributions (Fig. 4). As can be seen in the graphs, all the methods used achieved an efficiency of about 55% for the given texts. The best effectiveness was shown by the algorithm based on the Naive Bayes Classifier with the multinomial distribution, additionally enriched with morphological analysis (61.6%). It is also worth mentioning that the k-nearest neighbors classification algorithm was more effective when using the Euclidean metric to calculate the distance between texts than when using the city metric. Additionally, the use of n-gram methods also increased the effectiveness of this method (depending on the metric used, either bigrams or trigrams worked better, but the difference in results is so small that it does not determine which of the vectorization methods is better in this scenario). Providing context to the analyzed words, as expected, increases the effectiveness of the classification (at the same time it increases the number of dimensions of the text vector, allowing for a more accurate comparison).


Regarding the running time of the application, a tendency can be observed: the longer the vector to be analyzed, the longer the time needed to calculate the result. This is particularly evident when using the n-gram method. Iterating over all elements of each n-gram causes the processing time to increase roughly n-fold as n increases (due to the need to perform n times more operations to compare all n-grams with each other).

By following the behaviour of the classifiers on the examined texts, certain regularities can be observed:

• None of the classifiers can handle very short texts, e.g. "Lubię placki" (which means "I like cakes") or "Ładna pogoda" (which means "Nice weather"). One can say that both of these texts have a positive overtone, but they are too short for the methods used and the training collections. In this case, very large text vectors are created, but most of their entries are zeros, so the calculated distances or probabilities for classifying them as positive or negative are practically random.
• None of the classifiers can handle the analysis of sarcasm or of sentences containing conflicting information. Sarcasm is a general problem of natural language analysis. When we analyze a short text without knowing the context of the situation and the circumstances of the person who speaks in a particular way, even as people we may have a problem deciding whether a sentence is positive or negative, or merely sarcastic.

It is also worth mentioning that the achieved efficacy is not satisfactory in the context of diagnostics. Even the best of the obtained results is not much better than a random assignment of sentiment to the text. This means that the methods used are insufficient for classification in this respect.

6 Summary

The result of this work is an analysis of an application that allows automatic classification of texts with respect to the general emotional attitude of the examined text, using two simple classification methods (k-nearest neighbors and the Naive Bayes Classifier) based on two methods of text vectorization (bag-of-words and n-grams). The results were obtained on a relatively small data set (1500 texts in total); however, it is sufficient to establish their reliability. It can be assumed that the size of the training and test sets had no significant impact on the results obtained. In addition, it is possible to state that the classification was made in terms of sentiment and not just some common features of the texts, such as the subject matter. The application prepared in this way is a good starting point for creating diagnostic tools that recognize the mood of a statement, the emotions contained in it, and attitudes towards the subject of the statement. It is one of the approaches that can improve the work of educators, psychologists or sociologists in the analysis of documents, interviews or the content of statements.


The most important influence on the effectiveness of the application came from taking into account the context of the statement, both through the n-gram method and through the morphological context. The same sentence classified by the k-nearest neighbours method using the bag-of-words text vectorization method was classified incorrectly, whereas it was classified correctly when the text was vectorized using the 2-gram method. Increasing the weight of adjectives, verbs and adverbs also improved the effectiveness of the methods used. This is in line with expectations—with knowledge of additional factors resulting from the context of the statement, the classification is more effective. In the future, it may be valuable to extend the context to the whole text, e.g. by using methods of text vectorization based on the whole document and not only on the local context of a word, or by using attention mechanisms in the learning process, as in [39].

It is also worth noting that this type of application is language-specific, in this case to Polish. Using the same application for another language does not make sense, because the morphological and dictionary analysis and all the functions are adjusted to the rules governing one language. The only justified use of such an application for another language is when the text is first translated into the application's language. However, with this kind of treatment the cultural and linguistic context is lost, and one can never be sure that the translation will make sense. One could try to generalize the methods by creating mechanisms that manage the application's functions on the basis of grammatical rules, but so far there is no solution suitable for all languages; the current methods of text processing can only be transferred between languages from the same groups.

The created application leaves a large scope for various improvements and extensions. These may be related to, among others:

• creating a method that combines what is known today, or a completely new way of analysis—the text classification algorithms available nowadays do not give satisfactory effectiveness in classifying emotions or attitudes;
• correction of spelling and language errors—at present the application is completely unprotected against errors, especially spelling errors: if a word does not exist in the dictionary, and due to the error it will definitely not be found there, or it will mean something completely different, there is no chance for its proper interpretation, which additionally disturbs the result;
• extension of the morphological analysis and its greater inclusion in the classification—in the current version of the application the classification only takes into account the presence of specific parts of speech that could be of greater importance; in the future it would be worthwhile to additionally take into account specific features of these parts of speech (for example, the degree of an adjective could determine the strength of a given word), or perhaps some parts of speech should not be taken into account at all, because they do not affect the overtones of the statement but only make the classification more difficult;
• extraction of specific emotions—currently the application only determines the emotional attitude (positive or negative), which is sufficient to determine the attitude, but together with the acquisition of more training data the application could be extended to the basic emotions (anger, fear, disgust, surprise, joy and sadness);
• combining the application with speech analysis—such a tool could additionally measure e.g. the speed of speech or the tone of voice, which would help, for example, to overcome the problem of sarcasm in texts;
• more data for the learning process—currently the application contains quite a small training set, which is due to the great difficulty in obtaining data; the texts cannot be literary texts, because the target is to analyze free speech, and the same applies to song lyrics or poems, which despite their wealth of emotional terms are still not the target texts; simply increasing the database of texts could increase the effectiveness.

Future research in this area should primarily focus on:

• testing the previously mentioned more complex methods of representing texts: Continuous Bag of Words (CBOW), SkipGram, or methods based on the attention mechanism;
• comparing classification efficiency for more complex classification methods, such as decision trees or logistic regression; such a comparison can be found in [40, 41];
• trying to detect different emotions (as already mentioned as a possible improvement of the application) or sarcasm in statements; these issues are inseparably connected with the analysis of texts in Polish due to cultural conditions, while so far research in this area has only concerned the English language and the results have not been satisfactory in the case of sarcasm [42, 43];
• adding speech analysis (as mentioned before as a possible improvement of the application); because of the many factors related to how sentences are pronounced, it would be possible to better recognize the individual emotions accompanying a statement.

References

1. Hurwitz, J., Krisch, D.: Machine Learning for Dummies®, IBM Limited Edition. Wiley (2018)
2. Usama, M., Qadir, J., Raza, A., Arif, H., Yau, K.-L.A., Elkhatib, Y., Hussain, A., Al-Fuqaha, A.: Unsupervised machine learning for networking: techniques, applications and research challenges. IEEE Access (2017)
3. Miernik, T.: Wykonanie sztucznej sieci neuronowej do klasyfikacji depesz agencyjnych/Implementation of neural network to classify Reuters telegrams (2012). https://doi.org/10.13140/RG.2.2.23646.02887
4. Nadkarni, P.M., Ohno-Machado, L., Chapman, W.W.: Natural language processing: an introduction. J. Am. Med. Inform. Assoc. (2011)
5. Khurana, D., Koli, A., Khatter, K., Singh, S.: Natural Language Processing: State of the Art, Current Trends and Challenges (2017)
6. Śniegula, A., Poniszewska-Marańda, A., Chomątek, Ł.: Towards the named entity recognition methods in biomedical field. In: Chatzigeorgiou, A., Dondi, R., Herodotou, H., Kapoutsis, C., Manolopoulos, Y., Papadopoulos, G.A., Sikora, F. (eds.) SOFSEM 2020, LNCS, vol. 12011, pp. 375–387. Springer International Publishing (2020). ISSN 0302-9743. ISBN 978-3-3038918-5
7. https://textinspector.com/
8. https://jasnopis.pl/
9. http://morfeusz.sgjp.pl/
10. Aich, S., Choi, K.W., Kim, H.C.: An approach to investigate the impact of political change on the economy of South Korea using twitter sentiment analysis. Adv. Sci. Lett. 10172–10176 (2017)
11. Bollen, J., Huina, M.: Twitter mood as a stock market predictor. Computer 44, 91–94 (2011)
12. Öztürk, N., Ayvaz, S.: Sentiment analysis on Twitter: a text mining approach to the Syrian refugee crisis. Telemat. Inform. 136–147 (2017). https://doi.org/10.1016/j.tele.2017.10.006
13. Bonaccorso, G.: Machine Learning Algorithms. Packt Publishing, Birmingham (2017)
14. Neumann, G., Piskorski, J.: A Shallow Text Processing. German Research Center for Artificial Intelligence GmbH (DFKI), Saarbrücken (2002)
15. Mullen, L., Benoit, K., Keyes, O., Selivanov, D., Arnold, J.: Fast, consistent tokenization of natural language text. J. Open Source Softw. 3, 655 (2018). https://doi.org/10.21105/joss.00655
16. Camacho-Collados, J., Pilehvar, M.T.: On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis, pp. 40–46. Association for Computational Linguistics, Brussels, Belgium (2018). https://www.aclweb.org/anthology/W18-5406. https://doi.org/10.18653/v1/W18-5406
17. Gurusamy, V., Kannan, S.: Preprocessing Techniques for Text Mining (2014)
18. Soumya, G.K., Shibily, J.: Text classification by augmenting bag of words (BOW) representation with co-occurrence feature. IOSR J. Comput. Eng. (IOSR-JCE) (2014)
19. https://github.com/bieli/stopwords/blob/master/polish.stopwords.txt
20. Zhao, J., Yun, Y.: A proximity language model for information retrieval. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 291–298 (2009)
21. Zyglarski, B.: Wykorzystanie sieci neuronowych i algorytmów genetycznych w analizie i kategoryzacji dokumentów naukowych. Wydział Matematyki i Informatyki, Uniwersytet Mikołaja Kopernika, praca doktorska, Warszawa (2010)
22. Cavnar, W.: Using an N-gram-based document representation with a vector processing retrieval model (1994)
23. Dimitrakaki, C.: k nearest neighbours. The dirty secret of machine learning. Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University (2015)
24. Anava, O., Levy, K.Y.: k-Nearest Neighbors: From Global to Local. Cornell University (2017). arXiv:1701.07266v1 [stat.ML]
25. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: Learning for Text Categorization: AAAI Workshop, pp. 41–48 (1998)
26. Mądrzycki, T.: Psychologiczne prawidłowości kształtowania się postaw. Państwowe Zakłady Wydawnictw Szkolnych, Warszawa (1970)
27. Majchrzycka, A., Poniszewska-Marańda, A.: Secure development model for mobile applications. Bull. Pol. Acad. Sci. Tech. Sci. 64(3), 495–503 (2016)
28. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Computing 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z
29. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N.: Use of salesforce platform for building real-time service systems in cloud. In: Proceedings of 14th IEEE International Conference on Services Computing, IEEE SCC 2017, pp. 491–494, 25–30 June 2017, Honolulu, Hawaii
30. Soborski, W.: Postawy, ich badanie i kształtowanie. Wydawnictwo Naukowe Wyższej Szkoły Pedagogicznej, Kraków (1987)
31. Marody, M.: Sens teoretyczny a sens empiryczny pojęcia postawy. Państwowe Wydawnictwo Naukowe, Warszawa (1976)
32. Hederson, I.F., Henderson, W.D.: Grundlagen der ärztlichen Diagnostik. Arzt und Philosophie, Berlin (1961)
33. Okoń, W.: Nowy słownik pedagogiczny. Wydawnictwo Akademickie „Żak", Warszawa (2001)
34. Ziemski, S.: Problemy dobrej diagnozy. Wiedza Powszechna, Warszawa (1973)
35. Jarosz, E., Wysocka, E.: Diagnoza psychopedagogiczna. Wydawnictwo Akademickie „Żak", Warszawa (2006)
36. Nowak, S.: Metodologia badań socjologicznych. Państwowe Wydawnictwo Naukowe, Warszawa (1970)
37. Pilch, T.: Zasady badań pedagogicznych. Wydawnictwo Akademickie „Żak", Warszawa (1998)
38. https://github.com/morfologik
39. Lindén, J., Forsström, S., Zhang, T.: Evaluating combinations of classification algorithms and paragraph vectors for news article classification, pp. 489–495 (2018). https://doi.org/10.15439/2018F110
40. Wang, Y., Zhou, Z., Jin, S., Liu, D., Lu, M.: Comparisons and selections of features and classifiers for short text classification. IOP Conf. Ser. Mater. Sci. Eng. 261, 012018 (2017)
41. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need (2017). arXiv:1706.03762v5 [cs.CL]
42. Khan, A.: Sarcasm Detection (2020). https://doi.org/10.13140/RG.2.2.12940.46721
43. Katyayan, P., Joshi, N.: Sarcasm detection approaches for English language (2019). https://doi.org/10.1007/978-3-030-03131-2_9

SZZ Unleashed-RA-C: An Improved Implementation of the SZZ Algorithm and Empirical Comparison with Existing Open Source Solutions

Jarosław Pokropiński, Jakub Gasiorek, Patryk Kramarczyk, and Lech Madeyski
Wroclaw University of Science and Technology, Wroclaw, Poland

Abstract The SZZ algorithm is one of the most important algorithms in mining software defects, as it makes it possible to create data sets for software defect prediction. Unfortunately, very few open source implementations of this algorithm have been created so far. In recent years, two interesting open source implementations of the SZZ algorithm have appeared: SZZ Unleashed and OpenSZZ. In this paper we compare how well these implementations perform, and we propose an improved implementation named SZZ Unleashed-RA-C. The most important features of the proposed algorithm and implementation include: the ability to identify and handle refactoring changes when tracing bug-introducing changes (the RA functionality), the discarding of comments and files based on a regular expression, and, last but not least, the ability to use GitHub as the issue tracker.

Keywords Software defect prediction · Software defect proneness · Bug inducing changes · Bug fixing commits · SZZ · Open-source tools

1 Introduction

Information about the root cause of a bug and when it was introduced is often missing from issue tracking software. Research in the area of mining software repositories often relies on detailed bug information. Extending the data stored in issue trackers could be of great value both for researchers and for software developers.



One of the best known algorithms used for identifying bug-inducing changes is SZZ, proposed by Śliwerski et al. [9]. Its main purpose is to extend bug report data with the commit that first introduced the bug. The SZZ algorithm consists of two steps:

• Identification of bug-fixing commits, using the version control system (e.g., git) and the issue tracking software (e.g., Jira).
• Identification of the commit causing the bug.

Researchers often rely on the SZZ algorithm to identify bug-introducing changes. Unfortunately, only a few SZZ implementations are publicly available. Two of the most popular ones introduced recently are SZZ Unleashed by Borg et al. [1] and OpenSZZ by Lenarduzzi et al. [4]. It is difficult to say which of these implementations produces better results based on the research papers alone, so our aim is to compare them using a validated Defects4J data set specifically prepared for evaluating SZZ implementations and introduced by Neto et al. [6]. We also attempt to improve one of the available open source SZZ implementations with our own ideas, as well as with ideas from the existing literature. Our contributions in this paper are as follows:

• A literature review of SZZ publications and open source implementations.
• Extraction of a list of improvements to the basic SZZ algorithm on the basis of the literature and our own solutions.
• An attempt to propose a new implementation of the SZZ algorithm combining improvement ideas (our own and those existing in the literature).
• An empirical comparison of the found SZZ implementations, as well as the new one, on the same data set.

The remainder of this paper is structured as follows. Section 2 contains a brief overview of the SZZ algorithm, the existing implementations found in the literature, and an evaluation of two open source SZZ implementations. Section 3 describes how the validation data set preparation and the evaluation of SZZ algorithm performance were conducted. In Sect. 4, we present the results of the experiments. In Sect. 5, we answer the research questions and present threats to validity. Finally, in Sect. 6, we propose further research, while in Sect. 7, we draw conclusions.

2 Literature Review

In this section we introduce the SZZ algorithm, discuss and compare existing SZZ algorithm implementations, and pose research questions.


2.1 The SZZ Algorithm

The SZZ algorithm is the most commonly used algorithm for finding bug-introducing changes. Initially it was developed for the SVN version control system, but it has since been adapted to repositories using git. The algorithm consists of two steps.

In the first step, SZZ tries to find bug-fixing commits, based on references to bug reports or on commit messages containing words like "fix". Modified lines in the source code are then extracted from the bug-fixing commits.

Step two is the identification of bug-inducing changes. The SZZ algorithm uses the blame functionality of the version control system to determine all commits that previously made changes to the same lines of code as the bug-fixing commits. These commits are then labelled as potential bug-introducing commits. SZZ then determines whether these potential bug-introducing commits can be ruled out as bug-introducing or not. Each potential bug-introducing candidate has its commit date compared to the submission date of the corresponding bug report. All candidates that took place before the creation of the report are considered bug-introducing. If the commit time is after the bug report submission time, then the candidate is still a suspect, because it could be bug-introducing: this can happen if the change is a partial fix, or if it introduces another bug.
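As an illustration only (not code from any of the compared implementations, which typically use a library such as JGit and a line mapping graph), the core of the second step can be approximated by asking git to blame the lines touched by a fix at the state just before the fix; the repository path, commit hash and line range below are placeholders:

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the second SZZ step using the plain git command line.
public final class BlameCandidates {

    // Lists candidate bug-introducing commits for one file region changed by a fix:
    // blame the parent of the fixing commit over the modified/deleted line range.
    static List<String> candidateIntroducers(String repoPath, String fixCommit,
                                             String file, int startLine, int endLine) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "git", "blame", "--porcelain",
                "-L", startLine + "," + endLine,
                fixCommit + "^",            // state of the file just before the fix
                "--", file);
        pb.directory(new File(repoPath));
        Process process = pb.start();

        List<String> commits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Porcelain output starts each blamed hunk with "<40-char sha> <origLine> <finalLine> ...".
                if (line.matches("^[0-9a-f]{40} .*") && !commits.contains(line.substring(0, 40))) {
                    commits.add(line.substring(0, 40));
                }
            }
        }
        process.waitFor();
        return commits;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder arguments; point them at a real repository, fixing commit and file.
        System.out.println(candidateIntroducers("/path/to/repo", "abc123def456...",
                "src/main/java/Example.java", 10, 20));
    }
}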

2.2 Existing SZZ Algorithm Implementations

Researchers have developed their own versions of the SZZ algorithm. One of them is the implementation proposed by Neto et al. [5], called refactoring-aware SZZ (RA-SZZ). This implementation introduces the ability to identify and handle refactoring changes when tracing bug-introducing changes. RA-SZZ was then compared with another implementation, meta-change-aware SZZ (MA-SZZ), proposed by da Costa et al. [2], and with the original SZZ by Śliwerski et al. [9]. Refactoring change detection is usually based on one of two tools: RefDiff by Silva et al. [8] and RefactoringMiner by Tsantalis et al. [10]. A comparison of these two tools shows that their precision and recall are similar, with a slight advantage for RefactoringMiner. We will try to incorporate detection of refactoring changes by utilising the more accurate tool, RefactoringMiner. Neto et al. [6] point out that the overall accuracy of the SZZ algorithm increases by 40% if only valid bug-fix lines are used as the input for SZZ.

To the best of our knowledge, the only two open implementations of the SZZ algorithm referenced in the literature are SZZ Unleashed (https://github.com/wogscpar/SZZUnleashed) by Borg et al. [1] and OpenSZZ (https://github.com/clowee/OpenSZZ) by Lenarduzzi et al. [4]. Compared to other repositories, these two open source SZZ implementations have concise readme files and an active community (reflected by the number of GitHub stars and forks, higher for SZZ Unleashed).

2.3 Existing Implementations Comparison

The ultimate goal of our publication was to choose one of the existing implementations of the SZZ algorithm and improve it. As mentioned in Sect. 2.2, there were two particularly promising open source implementations. Both of them were tested and the results were compared.

2.3.1 OpenSZZ Project Evaluation

The OpenSZZ project can be found in two versions: OpenSZZ as a standard Java project, and a cloud-native version with a surrounding Docker infrastructure prepared. The cloud-native version was developed after the simple one and has a few differences. At first, let us focus on the non-functional aspects of the project. The primary difference between the two versions is the architecture: the so-called cloud-native version uses the microservices design pattern and the RabbitMQ message broker for internal communication. One of the biggest issues with it is the log handling of the core SZZ service. By default the logs are forwarded to a few files in the container file system. Unfortunately, for most of the project processing, the file writer used flushes the data only after most of the processing is done. This results in the user not being able to track most of the progress on the processed project. SZZ algorithms are known to take a lot of time, especially for highly developed projects, so having an indicator of whether the application is still working is important, especially in tools used for research purposes. This leads to another problem the OpenSZZ project has. At first, the project was evaluated on the commons-bcel repository (this project is presented as an example by OpenSZZ) and on unomi; no issues occurred while running those. Later the syncope and commons-math projects were run. Unfortunately the runs were unsuccessful, and the very basic logging did not give any indication as to what could be the cause: the application simply dropped its resource usage and kept running. Further investigation revealed that the application had run out of memory. Increasing the heap size allowed commons-math to succeed; 12 GB allowed syncope to process more data, but was not enough to finish processing the project. Apart from the presented issues, the cloud version was easy to set up and work with. OpenSZZ encapsulates the whole SZZ process in a single service, which allows the user to easily start the processing without additional manual steps.

The functional performance of the project was evaluated on the commons-bcel and unomi projects. Initial results show that the algorithm is highly susceptible to major refactoring commits: many of the resulting bug-introducing commits were major commits changing tens of thousands of lines of code.


OpenSZZ also produces pairs that do not have any files changed in common. This can be observed between revisions 45da20f49abafa125ff4f616e8312b89fbd1f139 (bug-introducing commit) and 4d89da4f52f6ae26a4917ba79259e8c89c67eb77 (bug-fixing commit) in the commons-bcel repository: OpenSZZ attributed the bug-introducing change to the file src/main/java/org/apache/bcel/classfile/Attribute.java, which was not changed in the bug-introducing commit. Given the current limitations of the SZZ algorithm, it is highly unlikely for such a situation to be detected correctly. OpenSZZ found 249 bug introducers in this repository in 3 min 30 s. After discarding issues that were major refactoring commits, or any other commits that were unsuccessfully matched to at least ten bug-fixing commits, this number drops to 24.

2.3.2 SZZ Unleashed Project Evaluation

Starting with the non-functional performance of the SZZ Unleashed implementation, it should be noted that it is separated into a few steps, each of which needs to be run manually. The same repositories were processed without any issues on default settings. We noticed that for bigger projects, such as syncope, where the processing took hours, the job partitioning between threads was not perfect: often half of the available threads finished their work within 1–2 h while the others required a few more hours to process all the issues. Running the commons-bcel repository took 6 min and produced 708 bug-introducer and bug-fixer pairs. A huge disadvantage of the SZZ Unleashed project is its output: by default it is a JSON array of arrays containing two strings—commit hashes. Internally the algorithm recognises the file in which the bug fix is placed, which results in a huge number of duplicates. Another interesting issue we noticed is that a single changed line in a change log file (in commons-bcel it is called changes.xml) was attributed both to around 80 bug-fixing commits and to 120 bug-inducing ones. Furthermore, the algorithm was susceptible to huge refactoring commits. Creating a list with the count of bug-introducing commits assigned to each bug-fixing commit shows that there are many issues "fixing" dozens of commits. Such a case is possible, but highly unlikely to occur; it indicates a higher probability of the bug-fixing commit being incorrectly matched. SZZ Unleashed often detected commits containing changes to comments only. This can be noticed in the following pair from the unomi project: 0ffc0814f4ff4288b591407afdb0679358249bc (bug-fixing commit) and 1d075ec19850466a355ecffc1dfed2da049e25c9 (bug-introducing commit). After discarding the most common invalid bug-inducing commits, the count of pairs dropped to 458. The same action for bug-fixing commits resulted in 468 pairs. After discarding both, the count dropped to 220, and after removing duplicated pairs that could not be validated it was equal to 86.

2.3.3 OpenSZZ and SZZ Unleashed Comparison

The major differences between these two projects concern time and memory complexity. OpenSZZ is much faster on bigger repositories, but requires much more memory to process them. OpenSZZ has a friendlier interface to work with, while SZZ Unleashed requires more manual steps. During this process we had an opportunity to get familiar with the internal code bases of both of them, and our subjective opinion is that the code quality was better in SZZ Unleashed. Both implementations have their constraints, and neither of them produces reasonably valid results by default; there is a lot of room for improvement. The performed evaluation resulted in the following improvements that could be implemented:

1. Making SZZ refactoring aware.
2. Considering only specified file extensions (e.g. ".java").
3. Disregarding deleted lines matching a specified pattern.
4. Disregarding fixing and bug-introducing commit pairs if the time between them is greater than 2 years.
5. Adding support for GitHub issues as an issue tracker.

Limiting the algorithm to .java files addresses the problem where configuration files are matched as introducers to fixes that are made in code. Disregarding deleted lines allows us to ignore lines that do not contain the bug. We also wanted to validate the impact of the time between commits on matching bug-introducing commits. As was pointed out by da Costa et al. [2], it is unlikely that bug-introducing changes in a project introduce bugs that take years to be discovered, so we tried disregarding fixing and bug-introducing commit pairs if the time between them is greater than 2 years. Lastly, to bring SZZ to a greater number of projects, we proposed extending the existing implementations by adding support for GitHub issues.

2.4 Research Questions

The aim of this paper is to compare existing SZZ implementations using a validated data set, and to build upon and improve the algorithm which produces better results out of the box. Therefore, we address the following research questions (RQs):

• RQ1: Which of the two open source implementations (OpenSZZ vs. SZZ Unleashed) produces results with higher recall on our data set? We want to compare these two implementations against the validated Defects4J data set by Neto et al. [6]. However, we believe that the original SZZ Unleashed contains an error causing its performance to drop significantly, which is fixed by a pull request on GitHub: https://github.com/wogscpar/SZZUnleashed/pull/32. That is why we added a sub-RQ:
  – RQ1.1: Is the available fix for SZZ Unleashed valid?


• RQ2: How does detecting and discarding refactoring changes influence the overall recall? We analyse how adding RefactoringMiner by Tsantalis et al. [10] affects the performance.
• RQ3: How does adding our own proposed improvements affect the recall of the selected algorithm? We want to investigate the impact of each individual improvement on the previously selected SZZ implementation (from RQ1).

3 Methods and Materials

The data set of bugs, as well as the compared SZZ implementations, are described in this section.

3.1 Bug Data Set

To automate the comparison of SZZ algorithms we needed a data set of bugs. We chose the data set published by Neto et al. [6], as it was created with the evaluation of SZZ algorithms in mind. It contains data such as the bug-fixing commit id, the bug-inducing commit id, the path to the bug fix and more. Since the tested algorithms use git as the version control system, and the given data set also used SVN, we had to modify the data. Two repositories, commons-math and commons-lang, were migrated from SVN to git, so we modified the data set by replacing their SVN revision identifiers with the respective git commit hashes. After that, the only remaining project was jfreechart, which does not use labels in GitHub issues, so it was removed from the data set. The resulting data set consisted of five projects: Apache commons-math, Apache commons-lang, mockito, JodaOrg joda-time and Google closure-compiler, with the corresponding bug-fixing git commit hash, bug-inducing git commit hash, path to the bug fix and additional information. In the resulting data set, all repositories were hosted on GitHub. Two repositories, Apache commons-math and Apache commons-lang, used Jira as the issue tracker, while the rest used GitHub issues.

3.2 SZZ Unleashed and OpenSZZ Comparison

To compare OpenSZZ and SZZ Unleashed, we cloned their repositories and followed the instructions in their README.md files. To make a better comparison, we modified SZZ Unleashed so that its output contained information about the path of the fix. We ran OpenSZZ, and then SZZ Unleashed, on Apache commons-math and Apache commons-lang. After we got the results from both SZZ implementations, we filtered them so that they held results only for fixes contained in our data set.


We then compared the filtered results with the data set. For each repository and implementation, we counted the distinct results consisting of a bug fix, a bug introducer and a bug fix path that appear both in the results and in our data set. With this data we evaluated the performance of the implementations using the recall measure.

3.3 Base SZZ Algorithm Choice

Rodriguez et al. [7] noticed that most researchers studying topics related to the SZZ algorithm tend to create their own implementation from scratch. We agree with this observation and wanted to use an existing solution as the base for our improvements. During the literature review the two most promising candidates were chosen: SZZ Unleashed and OpenSZZ. The process started by running both of these implementations on the commons-bcel repository (which OpenSZZ uses as an example by default) and on unomi. The results were later analysed by a script that compared both of them. The output of this script contained information about the number of bug-introducer and bug-fixer pairs that both implementations have in common, the counts without duplicates, and the number of results for each of them, both with and without duplicates. The next step was to perform an additional manual validation of the results. During this process we noticed that most of the bug introducers were not valid, because the commits were major releases or refactorings with tens of thousands of changed lines that had no connection to the bug fix. As this concerned most of the commits in the results, we decided to redo the previous steps with those commits filtered out, which resulted in a much clearer output and more readable data. The final decision was based both on the steps mentioned above and on those from Sect. 3.2.

3.4 GitHub Issues

The bug data set described in Sect. 3.1 contained only two repositories that used JIRA as an issue tracker. This was the main motivation to extend the chosen SZZ implementation to allow fetching issues from the GitHub issue tracker. An alternative strategy for fetching issues was developed and attached to the scripts handling the first stage of the SZZ algorithm.
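A minimal sketch of such a fetching strategy, not the scripts actually attached to the pipeline, using the public GitHub REST API; the repository name and the bug label are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative sketch of fetching closed bug issues from the GitHub REST API.
public final class GithubIssueFetcher {

    public static void main(String[] args) throws Exception {
        String repo = "apache/commons-bcel";   // placeholder repository
        String label = "bug";                  // placeholder label marking bug reports
        // GitHub REST API v3: list closed issues carrying the given label (first page only here).
        String url = "https://api.github.com/repos/" + repo
                + "/issues?state=closed&labels=" + label + "&per_page=100";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/vnd.github.v3+json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The JSON body contains, among others, the issue number and the closing date,
        // which the first SZZ step needs in order to match issues with bug-fixing commits.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}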

3.5 SZZ Improvements Implementation

After choosing the base SZZ implementation, we implemented our proposed improvements:


1. The first improvement (1) was to make SZZ refactoring aware. We used RefactoringMiner by Tsantalis et al. [10] to mine repositories for refactorings and ignored lines that were refactorings while building the line mapping graph.
2. The second improvement (2) was to run SZZ only on changes in files that contain code, as opposed to configuration files. We implemented it by using a pattern for files that contain code and ignoring all files that did not match it. As we used Java projects, we set the pattern to .*\.java.
3. Another improvement (3) was to ignore line deletions that were deletions of comments while building the line mapping graph. To do this we used the pattern \s*\/\/.*|\s*\*.*|\s*\/\*.* and ignored deletions matching that pattern (a sketch of how these two filters can be applied is shown below).
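A minimal illustration (a hypothetical helper, not the SZZ Unleashed-RA-C source) of applying the two patterns from items 2 and 3:

import java.util.regex.Pattern;

// Illustrative sketch of the two regular-expression filters described above;
// the class and method names are invented for this example.
public final class LineFilters {

    // Improvement (2): only files whose path matches this pattern are analysed.
    private static final Pattern CODE_FILE = Pattern.compile(".*\\.java");

    // Improvement (3): deleted lines that are only comments are ignored
    // while building the line mapping graph.
    private static final Pattern COMMENT_LINE = Pattern.compile("\\s*//.*|\\s*\\*.*|\\s*/\\*.*");

    static boolean isCodeFile(String path) {
        return CODE_FILE.matcher(path).matches();
    }

    static boolean isCommentDeletion(String deletedLine) {
        return COMMENT_LINE.matcher(deletedLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCodeFile("src/main/java/org/apache/bcel/classfile/Attribute.java")); // true
        System.out.println(isCodeFile("pom.xml"));                                                // false
        System.out.println(isCommentDeletion("    // old comment removed by the fix"));           // true
        System.out.println(isCommentDeletion("    int x = compute();"));                          // false
    }
}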

3.6 SZZ Improvements Comparison

This article introduces a few possible improvements to the SZZ algorithm, and the performance of each of them needs to be analysed. For that, we used the data set prepared in Sect. 3.1 and the measure from Sect. 3.2. For the SZZ Unleashed implementation, as the created data set can be considered the only source of truth used for the validation, we decided to limit the input data for bug-introducer detection to the issues it contained. This allowed us to obtain much shorter processing times and clearer results; an additional advantage was improved reproducibility of the end results. Having done that, the retrieved issues were used in the second step of the SZZ algorithm. This step was repeated for each improvement that was developed, and the results were analysed with the measure mentioned above. In addition, the processing time was measured. It should be noted that for SZZ Unleashed it is shorter due to processing a limited number of issues. It is also worth mentioning that we did not test the version that limits SZZ to .java files, because our data set consists only of such files.

3.7 SZZ Unleashed Fix Impact

During development, an open pull request was noticed in the SZZ Unleashed GitHub repository. Interestingly, its title contained the phrase "fatal bug". The fix concerned a variable used as a list iterator. As the first step, the impact of the potential bug was estimated by a manual review of the code it related to. Afterwards, the fixed version was run in the same way as the other versions described in Sect. 3.6. Both of these steps were enough to make a final decision about whether the fix was valid.


Table 1 Closure compiler results

Algorithm version    | Repository       | Time (s) | Matches | Size data | Size results
SZZ Unleashed        | closure-compiler | 0        | 0       | 124       | 0
SZZ Unleashed fixed  | closure-compiler | 0        | 0       | 124       | 0
SZZ Unleashed-RA     | closure-compiler | 0        | 0       | 124       | 0
SZZ Unleashed-C      | closure-compiler | 0        | 0       | 124       | 0
SZZ Unleashed-T      | closure-compiler | 0        | 0       | 124       | 0

Table 2 SZZ Unleashed and OpenSZZ comparison results

Algorithm version | Repository   | Time        | Matches | Size data | Size results
SZZ Unleashed     | commons-lang | 11 min 54 s | 8       | 64        | 150
SZZ Unleashed     | commons-math | 15 min 55 s | 28      | 107       | 238
OpenSZZ           | commons-lang | 9 min 18 s  | 0       | 64        | 7
OpenSZZ           | commons-math | 15 min 42 s | 0       | 107       | 0

4 Results

Firstly, we observed, as seen in Table 1, that all tested implementations produced bad results for closure-compiler, so we omitted them in further experiments. The issue with this repository concerns the lack of issues from the GitHub issue tracker that are present in the used data set. The conducted research focuses mostly on the second step of the SZZ algorithm, and this repository produces no data that could be supplied to it. Using the data in Table 2, we measured the performance of SZZ Unleashed ((8 + 28)/(150 + 238) = 0.0928, i.e., 9.28%) and the performance of OpenSZZ (0%) on these two repositories on the given data set. With these results we decided to use SZZ Unleashed as the base for improvements.

Table 3 SZZ Unleashed fix impact results

Algorithm version   | Repository   | Time      | Matches | Size data | Size results
SZZ Unleashed       | commons-lang | 16 s      | 8       | 64        | 150
SZZ Unleashed fixed | commons-lang | 6 s       | 34      | 64        | 188
SZZ Unleashed       | commons-math | 20 s      | 28      | 107       | 238
SZZ Unleashed fixed | commons-math | 1 min 1 s | 52      | 107       | 357
SZZ Unleashed       | mockito      | 3 s       | 5       | 59        | 82
SZZ Unleashed fixed | mockito      | 2 s       | 10      | 59        | 119
SZZ Unleashed       | joda-time    | 1 s       | 3       | 29        | 67
SZZ Unleashed fixed | joda-time    | 2 s       | 6       | 29        | 54

4.1 SZZ Unleashed Fix Impact Results

To the best of our knowledge, the original SZZ Unleashed implementation contains the bug described in Sect. 3.7. A proper analysis was performed to validate its impact. Using the data in Table 3, we measured that the performance of SZZ Unleashed is 8.19% and the performance of SZZ Unleashed with the fix is 14.21%. That result, together with an analysis of the part of the code containing the described bug, gave us enough confidence to assume that the proposed fix is valid, and further tests were performed on the fixed version. After we performed our research, the proposed fix (https://github.com/wogscpar/SZZUnleashed/pull/32) was merged into the repository, which confirms our assumptions in the context of the results we obtained.

4.2 Proposed Improvements Impact Results

In this section we present the impact of our proposed improvements, which are as follows:
• Making SZZ refactoring aware (RA) (1).
• Considering only specified file extensions (e.g., ".java") (2).
• Disregarding deleted lines matching a specified pattern (3).
• Disregarding fixing and bug-introducing commit pairs if the time between them is greater than 2 years (4).
• Adding GitHub issues support as an issue tracker (5).


Table 4 SZZ Unleashed proposed improvements impact results

Algorithm version | Repository | Time | Matches | Size data | Size results
SZZ Unleashed fixed | commons-lang | 6 s | 34 | 64 | 188
SZZ Unleashed-RA | commons-lang | 40 s | 34 | 64 | 186
SZZ Unleashed-C | commons-lang | 5 s | 33 | 64 | 114
SZZ Unleashed-T | commons-lang | 5 s | 17 | 64 | 81
SZZ Unleashed-RA-C | commons-lang | 33 s | 33 | 64 | 106
SZZ Unleashed fixed | commons-math | 1 min 1 s | 52 | 107 | 357
SZZ Unleashed-RA | commons-math | 20 min 41 s | 52 | 107 | 349
SZZ Unleashed-C | commons-math | 1 min 39 s | 52 | 107 | 308
SZZ Unleashed-T | commons-math | 49 s | 40 | 107 | 266
SZZ Unleashed-RA-C | commons-math | 20 min 59 s | 52 | 107 | 302
SZZ Unleashed fixed | mockito | 2 s | 10 | 59 | 119
SZZ Unleashed-RA | mockito | 45 s | 10 | 59 | 277
SZZ Unleashed-C | mockito | 45 s | 10 | 59 | 265
SZZ Unleashed-T | mockito | 43 s | 5 | 59 | 176
SZZ Unleashed-RA-C | mockito | 44 s | 10 | 59 | 251
SZZ Unleashed fixed | joda-time | 2 s | 6 | 29 | 54
SZZ Unleashed-RA | joda-time | 9 s | 6 | 29 | 54
SZZ Unleashed-C | joda-time | 2 s | 6 | 29 | 47
SZZ Unleashed-T | joda-time | 2 s | 0 | 29 | 30
SZZ Unleashed-RA-C | joda-time | 6 s | 6 | 29 | 47

SZZ Unleashed fixed - SZZ Unleashed with the fix applied
SZZ Unleashed-RA - "Refactoring Aware" version of SZZ Unleashed with RefactoringMiner added
SZZ Unleashed-C - SZZ Unleashed version disregarding deleted lines which are comments; the letter C stands for comments
SZZ Unleashed-T - SZZ Unleashed version disregarding fixing and bug-introducing commit pairs if the time (T) between them is greater than 2 years
SZZ Unleashed-RA-C - a version containing both RefactoringMiner and the disregarding of deleted lines matching the specified pattern


Table 4 shows that the performance of the base version of SZZ Unleashed with the fix is 14.21% on 4 repositories. Our proposed improvements affect the results as follows:
• SZZ Unleashed-RA: 14.41% performance,
• SZZ Unleashed-C: 17.75% performance,
• SZZ Unleashed-T: 13.66% performance,
• SZZ Unleashed-RA-C: 18.20% performance.

5 Discussion

We received no results for any of the SZZ implementations for closure-compiler. That situation seems to originate in the first phase of the SZZ algorithm and might suggest that the GitHub support in this phase should be improved. It is worth mentioning that this result would be useful when improving the first phase of the algorithm. Since we did not implement such improvements, closure-compiler was the only project with this problem, and the other projects using GitHub as the issue tracker did return valid results, we decided that those results bring little to the comparison and we omitted them in further experiments.

RQ1: Which of the two open source implementations (OpenSZZ, SZZ Unleashed) produces better results?

Answer to RQ1: While comparing SZZ Unleashed and OpenSZZ, we observed that for the bug fixes in our data set OpenSZZ generated only 7 results for commons-lang and none for commons-math, and all of those results were incorrect, which led to its performance being evaluated at 0%. The performance of SZZ Unleashed was 9.28%, so it performs better than OpenSZZ. It is worth mentioning that there is still a large space for improvement.

RQ1.1: Is the available fix for SZZ Unleashed valid?

Answer to RQ1.1: The version of SZZ Unleashed with the fix performed about 1.74 times better, which is concrete evidence that the fix is valid.

RQ2: How does detecting and discarding refactoring changes influence the overall performance?

Answer to RQ2: Adding information about refactoring changes caused the performance of the algorithm to increase from 11.65% to 11.79%. However, this comes with the disadvantage that RefactoringMiner is only able to detect refactorings in Java projects, which limits us to this type of projects in the future.


RQ3: How does adding our own proposed improvements affect the performance of the selected algorithm?

Answer to RQ3: The results show that our proposed improvements on their own positively affect the performance, with the sole exception of the SZZ Unleashed-T version, which caused the performance to drop. Overall, the best performing single-improvement version of the algorithm was SZZ Unleashed-C, which discards deleted lines matching the specified pattern; it caused the performance to increase from 11.65% to 13.89%. However, combining the two versions SZZ Unleashed-RA and SZZ Unleashed-C led to an even greater improvement, with the performance of SZZ Unleashed-RA-C increasing to 14.16%.

5.1 Threats to Validity

Neto et al. [6] created their data set using bugs from the Defects4J [3] database. The Defects4J maintainers build this database with strict rules for each bug. Each bug in the database has the following properties:
1. An issue filed in the corresponding issue tracker, with the issue tracker identifier mentioned in the fixing commit message.
2. Fixed in a single commit.
3. Minimised: the Defects4J maintainers manually pruned out irrelevant changes in the commit (e.g., refactorings or feature additions).
4. Fixed by modifying the source code (as opposed to configuration files, documentation, or test files).
5. A triggering test exists that failed before the fix and passes after the fix; the test failure is not random or dependent on the test execution order.
Due to the above mentioned constraints, experiments on this data set may give results that do not reflect real world conditions. Another threat to the validity of this research is the number of projects available in our data set. Since closure-compiler was omitted from the experiments, we ended up with only four repositories, and OpenSZZ was evaluated (in an automated way) only on two projects, as it is only compatible with the JIRA issue tracker. As one of our proposed improvements was to include only specified file extensions in the process of detecting bug-introducing commits, we performed our experiments only on Java projects and "*.java" files. The addition of RefactoringMiner also limits us to Java projects only. It is possible to run the algorithm on other projects by not using RefactoringMiner and running the program without the "-fp" flag, which excludes files based on the specified file pattern, but this has not been tested.


6 Future Research

One of the limitations of the improved version is the GitHub issue fetcher. It is possible that the lack of direct interpretation of the issue id, which is a number, might affect the output of bug-fixing commit detection. The issue id has the same format as the id of a GitHub pull request, and merging a pull request usually creates a commit whose message contains its id and therefore matches the regular expression used for detecting bug-fixing commits by issue id: "#ID". SZZ uses annotation graphs, which (as claimed by Williams and Spacco [11]) are imprecise at tracking lines across large hunks of modified lines. One possible solution to this issue would be to replace them with the line-number mappings proposed by the same authors [12]. To better evaluate the SZZ implementations, we would need a bigger data set of bugs containing code that is not limited to Java only. Also worth exploring is the time between bug-fixing and bug-introducing commits, and finding the time frame in which the results are still valid and no correct pairs are rejected.
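The pull-request ambiguity described above can be illustrated with a short sketch. This is only an assumption about how a "#ID"-style pattern behaves, not the actual fetcher code, and the commit messages are invented examples.

import re

issue_id = 42
# Hypothetical pattern in the spirit of the "#ID" convention described above.
pattern = re.compile(rf"#{issue_id}\b")

messages = [
    "Fix NPE in line mapping (#42)",           # genuine bug-fixing commit
    "Merge pull request #42 from user/branch"  # merge commit of an unrelated pull request
]

for msg in messages:
    print(bool(pattern.search(msg)), msg)
# Both messages match, so a merge commit can be mistaken for a bug-fixing commit.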

7 Conclusions

The publication presents improved versions of the SZZ Unleashed algorithm, including the most promising one, SZZ Unleashed-RA-C. A few of the most important features are: refactoring awareness (the RA functionality), discarding comments and files based on a regular expression, and the possibility of using GitHub as the issue tracker. Those changes result in much better performance compared to the base version. Nevertheless, there is still a large space for improvement. The paper introducing the original SZZ algorithm [9] has received a huge number of citations and was awarded as the most influential paper at the MSR conference in 2015. Hence, we believe that the improved version will provide benefits for many researchers.

8 Appendix: Research Reproduction

This section presents the steps required to reproduce the presented research. Details, code and data are available at https://github.com/pwr-pbrwio/PBR20M1/blob/master/reproduction.md.


8.1 Dependencies Installation

Dependencies include:
• git
• java 8
• python 3

8.1.1 Dependencies Installation on Windows and macOS

Download and install the dependencies:
1. git: https://git-scm.com/
2. java: https://www.oracle.com/java/technologies/javase-jre8-downloads.html
3. python: https://www.python.org/

8.1.2 Dependencies Installation on Linux

1. sudo apt update
2. sudo apt install git
3. sudo apt-get install openjdk-8-jre
4. sudo apt-get install python3

8.2 Steps to Reproduce

Requirements: You will need a GitHub personal access token (https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token). Place it in Scripts/token.txt. It is used for projects using GitHub as the issue tracker.

8.2.1 SZZUnleashed: With and Without Improvements

On Windows replace python3 with python and pip3 with pip.
1. Prepare SZZ:

git clone https://github.com/pwr-pbrwio/PBR20M1
cd PBR20M1
pip3 install -r requirements.txt
cd ..


2. Get the repository from the data set (example of commons-lang):

mkdir commons-lang
cd commons-lang
git clone https://github.com/apache/commons-lang.git

3. Download project issues (filtered with the data set). If you are using Jira as the issue tracker:

python3 ../PBR20M1/Scripts/getNetoIssues.py --owner "apache" --repo "commons-lang" --tag "lang" --repoPath "./commons-lang" --jira "issues.apache.org/jira"

If you are using GitHub as the issue tracker (e.g., for mockito):

python3 ../PBR20M1/Scripts/getNetoIssues.py --owner "mockito" --repo "mockito" --repoPath "./mockito" --fetchStrategy github

4. Run the SZZ algorithm:

java -jar "../PBR20M1/Scripts/unleashed/szz.jar" -i ".temp/issue_list.json" -r "./commons-lang" -d=3 -fix -ra -up -mt -fp

The flags -fix -ra -up -mt -fp are optional:
-fix enables the fix
-ra runs SZZ with refactoring awareness
-up removes comments
-mt limits the time between commits to 2 years
-fp limits SZZ to .java files
5. Get the results:

python3 ../PBR20M1/Scripts/measurePos.py --repoName="commons-lang"

8.2.2 OpenSZZ

Dependency requirements: the following software is required:
• docker
• docker-compose
Usage:
1. Clone the OpenSZZ repo (https://github.com/clowee/OpenSZZ-Cloud-Native).
2. The publication was prepared on version 533b4911710753e76c78c02c02ca10707a74e05b. Make sure the correct version is used.
3. Increase the heap size by adding the JVM_OPTS environment variable to the web service in the docker-compose.yml file. Note that using Docker on Windows or Linux might require increasing the total memory assigned to Docker in the Docker settings. Example:

web:
  build: ./core
  ports:
    - "${PORTRANGE_FROM}-${PORTRANGE_TO}:8080"
  networks:
    - spring-cloud-network
  depends_on:
    - rabbitmq
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  environment:
    - JVM_OPTS=-Xmx12g -Xms12g -XX:MaxPermSize=1024m

4. Follow the OpenSZZ readme file (https://github.com/clowee/OpenSZZ-Cloud-Native) for instructions on starting the application and running repositories.
5. Rename the results to BugInducingCommits.csv.
6. Analyse the results:

python3 ../PBR20M1/Scripts/openSzzAcc.py --repoName="commons-lang"

References

1. Borg, M., Svensson, O., Berg, K., Hansson, D.: SZZ unleashed: an open implementation of the SZZ algorithm featuring example usage in a study of just-in-time bug prediction for the Jenkins project. In: Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation, pp. 7–12 (2019)
2. da Costa, D.A., McIntosh, S., Shang, W., Kulesza, U., Coelho, R., Hassan, A.E.: A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. IEEE Trans. Softw. Eng. 43(7), 641–657 (2017)
3. Just, R., Jalali, D., Ernst, M.D.: Defects4J: a database of existing faults to enable controlled testing studies for Java programs. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis, pp. 437–440 (2014)
4. Lenarduzzi, V., Palomba, F., Taibi, D., Tamburri, D.: OpenSZZ: a free, open-source, web-accessible implementation of the SZZ algorithm (2020)
5. Neto, E.C., da Costa, D.A., Kulesza, U.: The impact of refactoring changes on the SZZ algorithm: an empirical study. In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 380–390 (2018)
6. Neto, E.C., da Costa, D.A., Kulesza, U.: Revisiting and improving SZZ implementations. In: 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 1–12. IEEE (2019)
7. Rodriguez, G., Robles, G., Gonzalez-Barahona, J.: Reproducibility and credibility in empirical software engineering: a case study based on a systematic literature review of the use of the SZZ algorithm. Inf. Softw. Technol. (2018). https://doi.org/10.1016/j.infsof.2018.03.009
8. Silva, D., Silva, J., De Souza Santos, G.J., Terra, R., Valente, M.T.O.: RefDiff 2.0: a multi-language refactoring detection tool. IEEE Trans. Softw. Eng. 1 (2020). https://doi.org/10.1109/TSE.2020.2968072
9. Śliwerski, J., Zimmermann, T., Zeller, A.: When do changes induce fixes? ACM SIGSOFT Softw. Eng. Notes 30(4), 1–5 (2005)


10. Tsantalis, N., Mansouri, M., Eshkevari, L.M., Mazinanian, D., Dig, D.: Accurate and efficient refactoring detection in commit history. In: Proceedings of the 40th International Conference on Software Engineering, ICSE '18, pp. 483–494. ACM, New York, NY, USA (2018). https://doi.org/10.1145/3180155.3180206
11. Williams, C., Spacco, J.: SZZ revisited: verifying when changes induce fixes. In: Proceedings of the 2008 Workshop on Defects in Large Software Systems, DEFECTS '08, pp. 32–36. Association for Computing Machinery, New York, NY, USA (2008a). https://doi.org/10.1145/1390817.1390826
12. Williams, C.C., Spacco, J.W.: Branching and merging in the repository. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pp. 19–22. Association for Computing Machinery, New York, NY, USA (2008b). https://doi.org/10.1145/1370750.1370754

Which Static Code Metrics Can Help to Predict Test Case Effectiveness? New Metrics and Their Empirical Evaluation on Projects Assessed for Industrial Relevance Bartosz Boczar, Michał Pytka, and Lech Madeyski Abstract One of corner stones of software development are test cases, which help in assessment of created production code. As long as they are properly designed, they have a capacity to capture faults. In order to check whether tests are well made, different procedures have been established, like statement coverage or mutation testing, to evaluate their performance. This has an obvious downside of being computationally expensive and as such is not employed on a wide enough scale. Finding solutions to increase efficiency of assessing test cases, could lead to a more widespread adoption and for that reason we investigate one such approach. We tested possibility of predicting test case effectiveness, strictly on a basis of static code metrics of production and test classes. To solve this task we employed three different learning classifiers, to check feasibility of the process and compare their performance. We created our own set of metrics all of which were later assessed for their impact on prediction. Out of seven most impactful predictors, four of them were proposed by us: Number Of test Cases used in Test class (NOCT), Number Of Defined Variables in a class (NODV), Number of New Objects created in a class (NONO), Number Of Assertions used In Test class (NOAIT). Created models yield a promising result, with best of them achieving over 85% for both F-Measure and Precision along with 73% for Matthews Correlation Coefficient. With the fact of well balanced data used in creation of model, it is safe to assume, that they hold some merit. All steps taken to achieve this result are explained in detail. Keywords Mutation testing · Predictive mutation testing · Software metrics · Machine Learning

B. Boczar · M. Pytka · L. Madeyski (B) Wroclaw University of Science and Technology, Wroclaw, Poland e-mail: [email protected] B. Boczar e-mail: [email protected] M. Pytka e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 N. Kryvinska and A. Poniszewska-Mara´nda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_8


1 Introduction In order to implement reliable software, the programmers need to prepare set of tests that could indicate the errors present during the execution of the code. For that reason tests implementation is an important process. One of the best methods of estimating whether the test is good is mutation testing. Unfortunately this is a time consuming process. The studies performed by Grano et al. [2] prove that this process can be simulated using Machine Learning methods, which costs significantly less time. In our paper we present our attempts at reproducing study by Grano et al. [2]. Furthermore, we describe the process of using similar Machine Learning methods to create a model able to estimate effectiveness of the tests. However, our solution does not use coverage, a measure which requires much greater computational power in order to be computed, compared to other measures. Instead we present a solution which uses metrics provided by JavaMetrics tool [10], earlier used by Grodzicka et al. [3], and our new set of proposed metrics. Our studies can be reproduced with the package provided by us. All steps needed to recreate this study are included in the Appendix of this paper and appropriate scripts are available in our repository.1 Our contributions in this paper are as follows: • An attempt to reproduce study by Grano et al. [2] including a detailed list of issues we encountered. • Completely new replication package addressing the issues in the original package by Software Evolution and Architecture Lab [6]. • Extending the study by Grano et al. [2] to different (class and function level) samples and projects assessed for industrial relevance [4]. • Extension of the data sets proposed by Grano et al. [2], by new metrics related to source code, code coverage, and Java features related to testing. • Empirical evaluation of the new metrics as predictors of mutation score indicator.

2 Literature Review Our literature review focused on finding work related to the subject of mutation testing and finding bad code smells. Our main goal was extending the MLCQ data set [4] by new metrics. We wanted to find simple metrics that would identify the usage of new java features in code. That way we could also verify whether those features increase or decrease the chances of bad smells.

1

https://github.com/pwr-pbrwio/PBR20M2.


2.1 Lightweight Assessment of Test-Case Effectiveness Using Source-Code-Quality Indicators Study by Grano et al. [2] is a highly relevant paper that we tried to reproduce. The main goal of Grano et al. [2] was to create a model which uses code quality metrics to estimate the effectiveness of the tests without creating and running mutation tests. The model was trained using good and bad tests of 18 projects. They divided tests into good and bad ones using their mutation score. To compute mutation score, they used PIT, which provides 13 mutation operations. They considered 67 code quality factors and 5 dimensions: Code Coverage, Test Smells, Code Metrics, Code Smells and Readability. In total, Grano et al. [2] used 18 open source Java projects. Their first task was to determine whether there is a relationship between the chosen quality factors and the effectiveness of the tests. The effects of their work indicate that the greater the number of statements executed by test cases tend to improve their effectiveness. Production code metrics also seem to be linked to test effectiveness, which indicates that the code with higher quality is better at finding faults in the production code. Similarly the results show that code smells can cause test cases to be less effective, although test smells do not appear to have an impact on test effectiveness. Then they proceeded to check to what extent their tool can be used to estimate the effectiveness of test cases as compared to mutation score. To answer this question they used different Machine Learning classification algorithms: Random Forest, KNeighbours and Support Vector Machines. This lead to the creation and comparison of 6 different models as each algorithm was used to create dynamic and static one. A dynamic model uses the same data set as static model extended with coverage because this is dynamic feature. Their experiments indicate that Random Forest model achieves better results when it comes to evaluation metrics than K-Neighbours and Support Vector Machines. Their study shows that Random Forest dynamic model could achieve results of 95% in terms of metrics like F-Measure and AUC-ROC, which confirms that Machine Learning models can be used for effective estimation of test cases. It also appears that static model performs a little worse than dynamic one, which has decreased performance by about 9%. Finally they discussed how the created model can be used. One of the most important possible usage would be integrating the model within the code analysis software to let the developers diagnose their code. That would give them desired information about potential effectiveness of test cases. Those information might be used to discard non-effective tests or to study them to understand which operations cause test cases to be non-effective. The model can also be used simply as an alternative to standard mutation testing. This solution could save developers their time as mutation testing is a time consuming method.


2.2 Predictive Mutation Testing Zhang et al. [9] concerns mutation testing approaches. As that technique involves a lot of computationally expensive operations, they proposed a novel classification model, which would be used to predict, whether a mutant is to be killed or not. This would be the main factor increasing the performance of tests, obtaining results for mutants without executing them. Features used in that prediction were gathered with the dynamic technique called PIE (propagation, infection, and execution) in mind [8]. According to PIE, program’s computational behaviour can be estimated based on three characteristics: probability of particular code section execution, probability of that section influencing data state and probability of produced data state influencing the output. With this in mind, selected features for PMT would indicate either the execution of mutated statement, infection of the program state after the mutated statement was executed or propagation of the infected program state leading to different output. With features in place, several algorithms for the model were tested, leading to selection of the best performing one, namely Random Forest. The pipeline of creating a prediction is as follows: Selecting a project for which testing is to be done, taking its previous release as training set, extracting the features from both of gathered versions, teaching the model and finally predicting results. Initial testing comprised of 9 projects, to make initial judgements and later expanded to 154 projects. Obtained results were promising. A test was conducted, in order to check performance of this approach under imbalanced data. This was done by, selecting two groups of projects, one with mutation score lower than 0.2 and second with score higher than 0.8. This yield a result, that the process is feasible despite the selecting inherently imbalanced data. Result analysis with research questions, shows that effectiveness of this method rivals the traditional execution of mutants and it is significantly more efficient time wise. Additional data balancing steps were not impacting results negatively nor positively, which was attributed to the Random Forest and its ability to handle imbalanced data. Features themselves were analysed in the context of their merit in prediction. Execution and propagation features showed a visible impact on performance, while the infection ones made no difference. This was attributed to the fact, that object oriented design limits the propagation of infections, between test outcomes and such features would be more impactful in procedural languages. Finally a consideration on predictability was presented. In most projects, mutants with high predictability were the most numerous group. Additionally mutants with higher amount of executions are more predictable along side mutants with high killability.


2.3 Comparison of Lightweight Assessment and Predictive Mutation Testing As Grano et al. [2] and Zhang et al. [9] are studies from which we base our work, it is important to compare and contrast those two works. • Outcome of classification: Grano et al. [2] classifies tests to one of two classes: good or bad. Zhang et al. [9] classifies whether the mutant will be killed or not and uses this knowledge to calculate mutation score. • Focus of work: Grano et al. [2] focuses on static code-quality features as predictors of test quality. Additionally coverage of test cases is considered as such predictor. Zhang et al. [9] focuses on predicting outcomes of mutants. • Data used in their prediction: Grano et al. [2] analyses code smells, test smells, code coverage and readability. Zhang et al. [9] considers features of generated mutants, along with code quality. • Research questions: Grano et al. [2] enquire about relationship between code quality and test-case effectiveness, see to what extent test case effectiveness can be estimated and distinguish the most important factors in predicting effectiveness. Zhang et al. [9] enquires about how effective is this approach in predicting whether mutants survive or get killed, how different application scenarios (cross-project and cross-version) influence the prediction, applicability of the model, impact of different features on outcome. • Goal of work: Grano et al. [2] focuses on characterising effective test cases based on static features of code, finding impactful factors and informing about them and estimating quality of code based on said factors. Zhang et al. [9] focuses on testing performance of PMT, evaluating the effect that different classification parameters had on it, checking effectiveness under two different application scenarios, checking viability of the proposed approach to evaluating mutants, identifying features that are the most impactful, finding characteristics of mutants that are hard/easy to predict. • Effect of work: Grano et al. [2] aims to give feedback to developers based on easily accessible measures of static code, inform them about problematic parts of code. The model created in their study might be integrated with already existing software analysis tools. Zhang et al. [9] aims to speed up tremendously the process of mutation testing, clarify the connection between mutant features and code quality. They present the new methodology which decreases the time necessary for the process.

3 Methods and Materials Our first task was reproducing the package provided by Grano et al. [2] to verify whether we can use their model in our research and extend it by our own metrics. Then we wanted to prepare our own set of metrics which would be implemented in a


tool Java Metrics by Ziobrowski [10]. The last step was to create a model with those metrics, check the model’s effectiveness and find the most important metrics in this model.

3.1 Study Reproduction Before we reproduced the package by Grano et al. [2], we had to solve all the issues encountered while building all the projects. We had to build each of the projects separately and fix the ones that failed to build. Then we proceeded to generate mutation tests and create the model. Mutation tests using PIT and appropriate generated scripts were completed successfully without significant issues. This step ended with scores for each test case, that had to be aggregated. Scripts responsible for aggregating data, had some bugs, that needed changing in order to complete their described tasks. All of the mentioned bugs were technical in their nature, detailed description of needed changes will be provided in Appendix. As we were analysing the code responsible for creating the model, we noticed that the input data used for training was included in the git repository [6]. The mutation tests generated during previous steps of our reproduction were not used which means that the whole process had no effect on the final outcome. The training data was precalculated using tools that have not been published yet. Therefore we are not able to correctly prepare data of model training and our reproduction of study by Grano et al. [2] is not successful.

3.2 New Metrics Propositions

We decided to propose our own metrics which could be used to estimate test effectiveness. These metrics are supposed to replace coverage, because calculating coverage is a time consuming process. We would like to continue Grano et al.'s [2] idea to create a lightweight solution. For that reason we performed further studies on the same set of tests used by Grano et al. [2]. The list of metrics we proposed is as follows:
• ALU - Assertion Library usage; the metric shows if the class uses an assertion library. The list of considered libraries: Hamcrest, AssertJ, Atrium, Truth, Valid4j, Datasource-Assert;
• NOAIT - Number of assertions used in a test class;
• NOASIT - Number of assumptions used in a test class (JUnit 5.0 feature);
• NOAU - Number of @After annotations used in a test class;
• NOBU - Number of @Before annotations used in a test class;
• NOCT - Number of test cases used in a test class;
• NODT - Number of dynamic tests in a test class (JUnit 5.0 feature);
• NODV - Number of defined variables in a class;
• NOECU - Usage of ErrorCollector in a class (JUnit 4.0 feature);
• NOEET - Number of expected exception tests;
• NOET - Number of extended tests (JUnit 5.0 feature);
• NOFS - Number of for statements in a class;
• NOIS - Number of if statements in a class;
• NONO - Number of new objects created in a class;
• NOPT - Number of parametrized tests (JUnit 5.0 feature);
• NORT - Number of repeated tests (JUnit 5.0 feature);
• NOST - Number of nested tests (JUnit 5.0 feature).

To calculate metrics we decided to use a tool Java Metrics by Ziobrowski [10], which we extended by implementing our metrics. The metrics that were already present in Java Metrics were also used to train the model.
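To give an intuition of what such counts mean, the sketch below naively counts two of the proposed metrics over raw Java source text. The real implementation lives in the Java Metrics tool, so this Python sketch is only an assumption about what the counts represent, not about how they are actually computed.

import re

def noait(test_source: str) -> int:
    # NOAIT: number of assertions used in a test class (here approximated as assert* call sites).
    return len(re.findall(r"\bassert\w*\s*\(", test_source))

def nois(source: str) -> int:
    # NOIS: number of if statements in a class.
    return len(re.findall(r"\bif\s*\(", source))

example = '''
class FooTest {
    @Test void adds() {
        if (Flags.ENABLED) { assertEquals(4, add(2, 2)); }
        assertTrue(add(0, 1) > 0);
    }
}
'''
print(noait(example), nois(example))  # 2 1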

3.3 Model Creation

Our process of creating a model started with selecting 18 projects, which were also used by Grano et al. [2], and gathering them from their GitHub repositories. After that, manual building of those projects had to be done, as this process proved infeasible to complete automatically. With the projects prepared, their static code metrics were computed using Java Metrics and the outputs were stored for later use. Next, mutation testing had to be done in order to compute the mutation score, which was done with the use of scripts from the Software Evolution and Architecture Lab [6] reproduction package. The generated mutation values then had to be gathered. With both metrics and mutation scores ready, preprocessing of the data could be done, where metrics of production classes are merged with metrics of their test classes and the appropriate mutation score. Additionally, the MIN, MAX and MEAN of McCabe complexity are calculated for each class. The final step in preparing the data frame is assigning a classification to all of the computed rows, using the first quartile of mutation scores as Bad Tests and the fourth as Good Tests; see the sketch following Table 1 below. This yielded 1342 rows of data, with an exactly even split of 671 rows for both classifications. As classifier selection has a strong influence on the outcome of classification, we chose to test three different classifiers: Random Forest, K-Neighbours and Support Vector Machines. Our method of training and testing was nested cross-validation. This choice was made to ensure the best possible parameter selection for each of the models. The following parameters were tuned:
• Random Forest
  – ntree: (300:500)
  – mtry: (10:40)
  – nodesize: (5:20)
• K-Neighbours
  – k: (1:20)
  – distance: (0.001:2)
• Support Vector Machines
  – cachesize: (20:150)
  – cost: (30:80)
  – gamma: (0.00001:0.01)

Table 1 Performance results of the classification algorithms

Algorithm | Mcc | Ce | Precision | Fbeta
Random Forest | 0.729 | 0.135 | 0.880 | 0.853
KNN | 0.703 | 0.148 | 0.876 | 0.837
SVM | 0.631 | 0.184 | 0.802 | 0.810

Fbeta is the name of the F-measure in the mlr3 package
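The quartile-based labelling described above can be summarised in a small sketch. The original preprocessing was done in R (preprocessing.R); the snippet below is only a Python/pandas illustration of the same idea, and the column names are assumptions rather than the ones used in the package.

import pandas as pd

# df is assumed to hold one row per production/test class pair with its mutation score.
def label_quartiles(df: pd.DataFrame) -> pd.DataFrame:
    q1 = df["mutation_score"].quantile(0.25)
    q3 = df["mutation_score"].quantile(0.75)
    bad = df[df["mutation_score"] <= q1].assign(label="Bad Test")
    good = df[df["mutation_score"] >= q3].assign(label="Good Test")
    # Only the two outer quartiles are kept, which is what yields the balanced
    # 671 + 671 = 1342 rows mentioned above.
    return pd.concat([bad, good], ignore_index=True)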

4 Results

The model performance parameters obtained after training are shown in Table 1. Interestingly, when judging performance by the Matthews Correlation Coefficient, a different hierarchy of classifiers was obtained than reported by Zhang et al. [9] and Grano et al. [2]. Random Forest still performed the best, but K-Neighbours scored higher than Support Vector Machines. Additionally, we looked at the importance of the parameters generated alongside our Random Forest model, to later evaluate how impactful certain predictors are.

5 Discussion

Some of the key results of our study are the performance figures of the obtained models, as ultimately that was one of the goals. With a precision of 0.880 and a balanced data set, it can be assumed that the model has some merit and could be valuable when put into a real life scenario. When judging classifier performance, we mostly tried to maximise the Matthews Correlation Coefficient, as it takes into account all parts of the confusion matrix. To compare our results, we looked at the final performance values obtained by Grano et al. [2], to see how effective our process is when compared to other contemporary work. The scenario we looked at was the one without Code Coverage, as it resembles our study the most. With values of precision equal to 0.880 and F-measure equal to 0.853, and values of both metrics achieved by Grano et al. [2] equal to 0.864, it can be said that both solutions are similarly capable (Table 1). We would also like to remark on the performance of some of the predictors used in the classification. As shown by Grano et al. [2], Code Coverage was the most important information in all of the predictions and, because of this, the metrics most closely connected to it have the highest importance values, namely the maximal cyclomatic complexity and the mean cyclomatic complexity of the production class. This could lead to an interpretation that a single complex method of a class decreases its quality. Perhaps what is surprising is the fact that out of the seven most impactful predictors, four were proposed by us (NOCT, NODV and NONO for test classes and NOAIT for production classes). Among those, NODV, the number of defined variables in a test class, scores highly; seeing such a correlation between this metric and test case effectiveness was an unexpected finding. A significant portion of the predictors have an importance factor of zero, meaning they have no bearing on the outcome. In further exploration of this topic, those parameters would have to be omitted, due to being of no relevance (Tables 2 and 3).

Table 2 Importance of the metrics

Metric name | Importance | Metric name | Importance
MAX_CYCLO.x | 0.0493 | NOMR_C.x | 0.0009
MEAN_CYCLO.x | 0.0453 | MRD.x | 0.0008
NOCT.y | 0.0429 | LD.x | 0.0006
NODV_C.y | 0.0355 | NOPA.y | 0.0005
NOPA.x | 0.0336 | NOL_C.x | 0.0003
Nono_C.x | 0.0310 | MIN_CYCLO.y | 0.0002
NOAIT_C.y | 0.0293 | LD.y | 0.0002
NPM.x | 0.0267 | NOAU.y | 0.0002
NOAM.x | 0.0257 | NOL_C.y | 0.0001
LOC_C.y | 0.0243 | ALU.y | 0.0001
WMCNAMM.y | 0.0220 | NOAIT_C.x | 0.0001
WOC.y | 0.0191 | ALU.x | 0.0000
LOC_C.x | 0.0191 | NOMM.y | 0.0000
WMC.y | 0.0170 | MIN_CYCLO.x | 0.0000
WMCNAMM.x | 0.0163 | NOASIT_C.x | 0.0000
WMC.x | 0.0155 | NOAU.x | 0.0000
NOM.x | 0.0154 | NOBU.x | 0.0000
NOM.y | 0.0151 | NOCT.x | 0.0000
NPM.y | 0.0146 | NODT_C.x | 0.0000
WOC.x | 0.0143 | NODT_C.y | 0.0000
NODV_C.x | 0.0137 | NOECU_C.x | 0.0000
NOIS_C.x | 0.0136 | NOECU_C.y | 0.0000
Nono_C.y | 0.0128 | NOEET.x | 0.0000
NOPV.x | 0.0117 | NOET.x | 0.0000
NOAM.y | 0.0102 | NOET.y | 0.0000
NOFS_C.y | 0.0076 | NONT.x | 0.0000
MEAN_CYCLO.y | 0.0067 | NONT.y | 0.0000
NOMM.x | 0.0056 | NOPT.x | 0.0000
NOFS_C.x | 0.0047 | NOPT.y | 0.0000
NOBU.y | 0.0041 | NORT.x | 0.0000
NOEET.y | 0.0035 | NORT.y | 0.0000
MAX_CYCLO.y | 0.0029 | NOMR_C.y | 0.0000
NOPV.y | 0.0020 | MRD.y | 0.0000
NOIS_C.y | 0.0012 | NOASIT_C.y | 0.0000

Positions with the *.x suffix reference production class metrics and those with the *.y suffix reference test class metrics

Table 3 Importance of the metrics proposed by us

Metric name | Importance | Metric name | Importance
MAX_CYCLO.x | 0.0493 | NOAIT_C.x | 0.0001
MEAN_CYCLO.x | 0.0453 | ALU.x | 0.0000
NOCT.y | 0.0429 | MIN_CYCLO.x | 0.0000
NODV_C.y | 0.0355 | NOASIT_C.x | 0.0000
Nono_C.x | 0.0310 | NOAU.x | 0.0000
NOAIT_C.y | 0.0293 | NOBU.x | 0.0000
NODV_C.x | 0.0137 | NOCT.x | 0.0000
NOIS_C.x | 0.0136 | NODT_C.x | 0.0000
Nono_C.y | 0.0128 | NODT_C.y | 0.0000
MEAN_CYCLO.y | 0.0067 | NOECU_C.x | 0.0000
NOFS_C.x | 0.0047 | NOECU_C.y | 0.0000
NOFS_C.y | 0.0076 | NOEET.x | 0.0000
NOBU.y | 0.0041 | NOET.x | 0.0000
NOEET.y | 0.0035 | NOET.y | 0.0000
MAX_CYCLO.y | 0.0029 | NOPT.x | 0.0000
NOIS_C.y | 0.0012 | NOPT.y | 0.0000
MIN_CYCLO.y | 0.0002 | NORT.x | 0.0000
NOAU.y | 0.0002 | NORT.y | 0.0000
ALU.y | 0.0001 | NOASIT_C.y | 0.0000

Positions with the *.x suffix reference production class metrics and those with the *.y suffix reference test class metrics

211

using separate process, albeit similar to tested reproduction package, we created a model, which performed similarly well to the one created by Software Evolution and Architecture Lab [6]. After analysing the importance computed by the Random Forest model we discovered that some of those metrics might be useful features in test effectiveness estimation. We have documented the whole process of our studies and provided reproduction package that can be used by anyone who would like to reproduce our studies. In the future we could expand our studies and examine model with only relevant metrics. We could also propose and implement more metrics that might be useful features for test effectiveness estimation.

7 Appendix: Reproducibility of the Presented Research This section presents materials and steps required to reproduce the presented research. Details are available at https://github.com/pwr-pbrwio/PBR20M2/blob/ master/README.md.

7.1 Study Reproduction In the process of recreating the Software Evolution and Architecture Lab [6] we encountered several issues, which needed to be solved in order to yield a valid result. We documented them, to transparently show what was changed.

7.1.1

Fixing the Process of Building Projects

In order to reproduce study by Grano et al. [2], we cloned the git repository2 and attempted to follow instructions presented in the README.md file. The reproduction was performed on Ubuntu machine with the following software versions: • Maven 3.6.03 • Python 3.64 • Java 1.8.5 The main problem was encountered during the execution of the script get_projects.sh. The script clones each project, that was used in the study,

2

https://github.com/sealuzh/lightweight-effectiveness. Miller et al. [5]. 4 Van Rossum and Drake [7]. 5 Arnold et al. [1]. 3

212

B. Boczar et al.

from its git repository and performs maven installation. Unfortunately a few projects failed to build successfully. The first project that could not be built was gson. The project uses flags supported since JDK 1.9. We managed to resolve the issue by using JDK 14 to build this project. We also had to correct the source and target version in the pom file to 1.9. The next project that had to be fixed was cat. While installing the project maven did not manage to download maven-source-plugin specified in pom file. The issue is caused because The Central Repository does not support HTTP communication since January 15, 2020. The repository can be accessed only with HTTPS. We resolved the issue by correcting urls associated with maven repository in pom file to use HTTPS communication. Another problematic project was RXJava. The project uses Gradle6 and the script uses maven to build it. The problem could be resolved by chaining the script to use gradle install instead of mvn install for gradle projects. However this would cause problems during the process of creating mutation tests. For that reason we converted the project to maven by generating a new pom file. In the generated pom file we specified target and source version to be 1.9. The last project that caused us problems was Opengrok. We resolved the issue by installing necessary software present on the git repository. Furthermore, the building process had trouble with generating javadoc. The javadoc comments were not necessary so we changed the pom settings to skip the generation process.

7.1.2

Fixing Python Script Problems Within the Package

In order to complete the reproduction, a handful of python scripts has to be executed in order as presented in the read me in git repository.7 In process of recreating the study, several issues has been found within them. Script calculate_results.py8 was responsible for gathering all mutation scores of all test cases into a csv file. In function of the same name (that is calculate_results), one of operations required a path to a directory with mutation results. String representing the path was not being created in a correct way, leading to finding no scores to gather. This was fixed by changing the string creation. Script aggregate_sources.py9 was responsible for combining data about mutation scores with data on code metrics into a single file and dividing gathered data into sets of good and bad tests. This script was placed in a different directory, than the one mentioned in the read me. The function process_results which combined the data, would in its first step check for the existence of different files and directories. Some 6

https://gradle.org/. https://github.com/sealuzh/lightweight-effectiveness. 8 https://github.com/sealuzh/lightweight-effectiveness/blob/v1.0/effectiveness/mutation/ calculate_results.py. 9 https://github.com/sealuzh/lightweight-effectiveness/blob/v1.0/effectiveness/metrics/ aggregate_sources.py. 7

Which Static Code Metrics Can Help to Predict Test Case …

213

of the files (like test_readability.csv10 or source_readability.csv11 ) were not present in the expected locations, which would lead to immediate halt of the script. This can be solved in two ways, either modify the structure of the package and place all file accordingly, or modify the code to accommodate for the different location. In the process of recreation the first option was picked. Additionally it is important to note, all files were present in the package, but their locations were different. Script plots.py12 was responsible for creating plots after classification was complete. Single problem with this script, was connected with a use of external service called Plotly. In order to use it, in the script one has to provide credentials for their account. Line of code, which authorises the user with given credentials was left with placeholder user information. This would lead to an error and was solved simply by removing the line.

7.1.3

The Issue Unable to Be Solved

The process of generating mutation tests resulted in creating 26 csv files, each file represented the specific mutation operation. Half of these files contained tests classified as good ones, the other half contained tests classified as bad ones. The code responsible for creating the model imported only the 2 of the 26 files which were already present in the repository (one with good tests, the other one with bad tests). Each of the generated files has the size of around 300 kB, while the 2 files used for the training of the model have the size of around 400 kB each. We do now know what causes the difference in the size. The problem with classifier data was two fold. If data provided in the package was using only one mutation operator, then we did not know which one that was and why our data sets had smaller amount of entries. On the other hand, if their data includes all operators, the size of our data set is significantly bigger. For that reason we were forced to stop our attempts at reproducing the study by Grano et al. [2].

7.2 Chosen Environment We decided to use R programming language to prepare our own classification model, because it provides useful Machine Learning libraries. We used mlr3 library to create models and tune them. This library provides numerous objects that let the user perform learning, re-sampling and analysing of Machine Learning models. There are also many additional packages that can further extend functionalities provided by mlr3. 10

https://github.com/sealuzh/lightweight-effectiveness/blob/v1.0/metrics/test_readability.csv. https://github.com/sealuzh/lightweight-effectiveness/blob/v1.0/metrics/source_readability.csv. 12 https://github.com/sealuzh/lightweight-effectiveness/blob/v1.0/effectiveness/classification/ plots.py. 11

214

B. Boczar et al.

The script we implemented creates three, each with different classification algorithm: Random Forest, K-Nearest Neighbours and Support Vector Machine. The models are tuned using Nested Cross Validation with 10 folds. We wanted to compare results of our study to the results achieved by Grano et al. [2]. For that reason we chose the same algorithms to create Machine Learning models.

7.3 Reproduction Instructions As our attempt at reproduction of aforementioned package was unsuccessful, this meant that we could not improve on it either and created a separate one. Our package was created from ground up, but with some use of scripts from Software Evolution and Architecture Lab [6]. All of used scripts still reference the original author. List of needed tools to recreate the process completely is as follows: • • • • •

Ubuntu 18.04 Maven 3.6.0 Python 3.6 with pandas 1.0.4 package installed Java 1.8 R 3.4.4 with pacman package installed.

In order to reproduce our studies you need to clone our git repository. Then you need to clone git repositories of the projects used in our studies into the projects folder. The full list of used projects is listed in the projects.csv in our repository. However, if you prefer you can use your own projects. In order for the package to work with external projects, their names have to be added to the projects.csv as well. In order to prepare the static code metrics, our fork of Ziobrowski [10] has to be used. It is available on this git repository.13 Instructions on how to build and use this tool are present on the repository page. After outputs from selected projects are computed, .csv files with metrics have to be put into javametrics_outputs directory. The next step is to build all the projects. To build the project you can open terminal in the project’s root folder and use the command mvn clean install—DskipTests. Keep in mind that all the projects must be build successfully. You need to resolve any issues Yourselves and try to build the project again if the building process ends with failure. Once the project is built successfully you should run unit tests with the command mvn test—Dmaven.test.failure.ignore=true, which will also ignore failed tests. In order to generate mutation tests and execute them, the same scripts were used, as in Software Evolution and Architecture Lab [6]. All of them, are being executed through our runExternalScipts.R, to stay in single environment. It is important to note, that this script will take a considerable amount of time to complete. Outcome of this script is mutationScoresGathered.csv, which holds mutation scores of all executed tests. Alternatively, if any trouble would be met while executing this script, all of the external scripts could be called individually in the following order: 13

https://github.com/michalpytka-pwr/JavaMetrics.

Which Static Code Metrics Can Help to Predict Test Case …

215

• generate_script.py • run_experiment_ALL.sh • gatherMutations.py. In order to execute python scripts, the PYTHONPATH variable has to be set on the root of the project. Additionally the bash script has to have it’s mode changed to 777, as advised previously in Software Evolution and Architecture Lab [6]. Both python scripts reside in python_scripts and the bash script is an outcome of the generate_script.py. The final step is to create the model. In order to do that you need to run the preprocessing.R script in the r_scripts folder. The script will create the file cleanData.csv containing all the data prepared for machine learning. Finally you should run basePipeline.R, which will train 3 models each with different classification algorithm: K-Neighbours, Support Vector Machines and Random Forest. Created models are saved in the folder saved_models and can be loaded into R environment.

References 1. Arnold, K., Gosling, J., Holmes, D.: The Java Programming Language. Addison Wesley Professional (2005) 2. Grano, G., Palomba, F., Gall, H.C.: Lightweight assessment of test-case effectiveness using source-code-quality indicators. IEEE Trans. Softw. Eng. 1 (2019). https://doi.org/10.1109/ TSE.2019.2903057 3. Grodzicka, H., Ziobrowski, A., Łakomiak, Z., Kawa, M., Madeyski, L.: Code smell prediction employing machine learning meets emerging Java language constructs. In: PoniszewskaMara´nda, A., Kryvinska, N., Jarza˛bek, S., Madeyski, L. (eds.) Data-Centric Business and Applications: Towards Software Development (Volume 4), vol. 40 of book series Lecture Notes on Data Engineering and Communications Technologies, pp. 137–167. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-34706-2_8, https://madeyski.einformatyka.pl/download/GrodzickaEtAl20LNDECT.pdf 4. Madeyski, L., Lewowski, T.: MLCQ: industry-relevant code smell data set. In: Evaluation and Assessment in Software Engineering (EASE2020). ACM, New York, NY (2020). https://doi.org/10.1145/3383219.3383264, http://madeyski.e-informatyka.pl/ download/MadeyskiLewowski20EASE.pdf 5. Miller, F.P., Vandome, A.F., McBrewster, J.: Apache Maven. Alpha Press (2010) 6. Software Evolution and Architecture Lab: Lightweight Effectiveness (2018). https://github. com/sealuzh/lightweight-effectiveness 7. Van Rossum, G., Drake, F.L.: Python 3 Reference Manual. CreateSpace, Scotts Valley, CA (2009) 8. Voas, J.M.: Pie: a dynamic failure-based technique. IEEE Trans. Softw. Eng. 18(8), 717–727 (1992). https://doi.org/10.1109/32.153381 9. Zhang, J., Zhang, L., Harman, M., Hao, D., Jia, Y., Zhang, L.: Predictive mutation testing. IEEE Trans. Softw. Eng. 45(9), 898–918 (2019). https://doi.org/10.1109/TSE.2018.2809496 10. Ziobrowski, A.: Java metrics (2020). https://github.com/LechMadeyski/JavaMetrics

Intelligent Freight Forwarder with Tabu Search Algorithm Mateusz Bujnowicz, Adam Dabrowski, Mateusz Szubannski, ´ Mateusz Wasilewski, and Witold Maranda ´

Abstract Use of Artificial Intelligence and automation contribute to improvements in variety of industries. This study aims to determine which part of Freight Forwarder processes can be enhanced with use of this cutting-edge technology. Initial step towards the goal was to split the whole process into smaller independent parts. This way four layers of the issue were obtained: Data layer, Planning layer, Realtime coordination layer and Dynamic scheduling layer. Each of the layers required unique approach and deep understanding of the knowledge behind it. Obviously for the found problem there were no absolute solutions, hence for every classified case paper depicts various ideas of fixing it.

1 Introduction Efficient shipment of goods is a very difficult task. The VRP (Vehicle Routing Problem [1]) has been studied for decades and was determined to be NP-hard. With the inclusion of cross-docking (which involves consolidating cargo with little or no intermediate storage) the issue becomes even more complicated [2–4]. Furthermore, a complex freight forwarding solution has to take into account a multitude of real-life factors to produce optimal routes such as the telematics data from a fleet of vehicles, the check-in times of drivers, the procedures in place for loading and unloading cargo, real-time traffic, as well as the situation on available transport exchanges.

M. Bujnowicz · A. Dabrowski · M. Szuban´nski · M. Wasilewski Lodz University of Technology, Lodz, Poland W. Mara´nda (B) Department of Microelectronics and Computer Science, Lodz University of Technology, Lodz, Poland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 N. Kryvinska and A. Poniszewska-Mara´nda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_9

217

218

M. Bujnowicz et al.

A streamlined freight forwarding system with cross-docking has the potential to decrease the need for intermediate storage, speed up deliveries, save fuel and improve customer satisfaction—all in all reducing costs and increasing business profitability [5–7]. Although the problem is several decades old, there are not many resources on the VRPCD (Vehicle Routing with Cross-Docking Problem), especially when it comes to integrating the algorithm with all of the factors mentioned above. Cross-docking can be looked at on three different levels:
• strategic level: deals with long-term planning, such as optimal cross-dock locations and the cross-dock layout,
• tactical level: deals with the network of connections between the cross-docks and the product distribution routes,
• operational level: deals with immediate decisions, such as arrival/departure scheduling at the cross-docks, truck ordering (choosing cross-dock slots), and optimal transfer of cargo between the trucks (sequencing forklifts).
We want to create a drop-in solution to handle the logistics needs of a company that already has access to a given cross-dock, therefore we focus on the tactical and operational levels of the Vehicle Routing with Cross-Docking Problem. We decided on a four-layer architecture:
• data layer,
• planning layer,
• dynamic scheduling layer,
• real-time coordination layer.

We settled on the tabu search algorithm as the core of our routing solution and proposed a design of an intelligent freight forwarding system that makes use of this state-of-the-art algorithm, taking into account the real-life factors influencing deliveries listed above.

2 State of the Art

After examining the relevant literature, we identified the three most prominent approaches to the VRPCD [8]: ant colony optimization, simulated annealing and tabu search.

2.1 Ant Colony Optimization

Ant Colony Optimization is a problem-solving method inspired by nature. In biology, ants use pheromones for optimal path-finding. Each ant leaves pheromones behind, which evaporate over time. When another ant has to pick a route, it favours the one with stronger pheromones. Shorter paths take less time for an ant to travel back and forth, leaving less time for the pheromones to evaporate and thus favouring the shorter path. Also, the path with stronger pheromones is chosen by a greater number of ants, enhancing its pheromone level even more. This way the ant colony as a whole chooses the shortest paths. In the VRPCD problem, the Ant Colony Optimization approach can be used for finding the best paths between inbound trucks, cross-docks, and outbound trucks [9]. The pheromone level, in this case, can be calculated based on more sophisticated attributes, such as:
• time,
• number of trucks used,
• delivered products,
• products left at the cross-dock.

2.2 Simulated Annealing

The approach presented in [10] is a meta-heuristic for combinatorial optimization that repeatedly improves an initial solution by making small local changes until no further improvement occurs or an ending condition is satisfied. This SA (simulated annealing) algorithm employs three different neighbourhood search mechanisms to find better route combinations: swap, reversion and insertion. To find a new solution, each neighbourhood search mechanism is used with a 1/3 probability (a minimal sketch of the three moves follows the list):
• swap: two nodes in a solution are randomly selected and their positions are exchanged in the new solution,
• reversion: two nodes in a solution are randomly selected and the nodes between the selected nodes are reversely sorted in the new solution,
• insertion: two nodes in a solution are randomly selected and the node which has the smallest position is inserted into the position just before the other node.
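The following Python sketch illustrates the three neighbourhood moves described above on a route represented as a list of node identifiers; the function names and the route representation are illustrative assumptions rather than the implementation used in [10].

import random

def swap(route):
    # Swap: randomly pick two positions and exchange their nodes.
    new = route[:]
    i, j = random.sample(range(len(new)), 2)
    new[i], new[j] = new[j], new[i]
    return new

def reversion(route):
    # Reversion: reverse the segment between two randomly chosen positions.
    new = route[:]
    i, j = sorted(random.sample(range(len(new)), 2))
    new[i:j + 1] = reversed(new[i:j + 1])
    return new

def insertion(route):
    # Insertion: move the node at the earlier position to just before the other node.
    new = route[:]
    i, j = sorted(random.sample(range(len(new)), 2))
    node = new.pop(i)
    new.insert(j - 1, node)  # index j shifted left by one after the pop
    return new

def random_neighbour(route):
    # Each mechanism is applied with probability 1/3, as in the SA variant above.
    return random.choice([swap, reversion, insertion])(route)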

2.3 Tabu Search

The approach presented in [11] uses a tabu search heuristic embedded within an adaptive memory procedure to solve the problem of transporting products from suppliers to customers using cross-docking. The method assumes a homogeneous fleet of vehicles and requires no intermediate storage. Since the cross-dock allows the transfer of goods between vehicles, the pickup and delivery vehicles are not independent of each other, and the pickup and delivery parts of the problem are correlated. As a result of these interactions, calculating and performing a move can be difficult. A new aggressive skip procedure introduced in the tabu search plays a key role in effectively narrowing down the number of moves to be calculated thoroughly and in reaching high-quality solutions within short computing times. The proposed algorithm was tested on realistic data sets involving up to 200 pairs of nodes. Computational results show that it can provide high-quality solutions (less than 5% away from the optimum) within very short running times.
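To make the general idea concrete, the skeleton below shows a plain tabu search loop in Python; it is a generic sketch (fixed tabu tenure, no aspiration criterion, solutions identified by their tuple form) and not the exact procedure of [11].

def tabu_search(initial, neighbours, cost, iterations=500, tenure=20):
    # Generic tabu search: a short-term memory of recently visited solutions
    # prevents cycling, while the best solution found so far is kept aside.
    # `neighbours(s)` returns candidate solutions, `cost(s)` is minimised.
    best = current = list(initial)
    tabu = []
    for _ in range(iterations):
        candidates = [n for n in neighbours(current) if tuple(n) not in tabu]
        if not candidates:                    # every neighbour is currently tabu
            break
        current = min(candidates, key=cost)   # best admissible neighbour
        tabu.append(tuple(current))
        if len(tabu) > tenure:                # forget the oldest entry (fixed tenure)
            tabu.pop(0)
        if cost(current) < cost(best):
            best = current
    return best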

2.4 Transport Exchange Market

Nowadays there are four transport exchange companies that matter on the market:
1. TimoCom—around 450,000 offers daily and a 4-week free-of-charge testing period; the cost after the trial is calculated for each enterprise individually. It gives access to the TC Truck & Cargo exchange platform and TC eMap (calculation packet), with a security packet included. They offer first-hand cargoes, vehicle GPS systems and information about the current traffic status.
2. Wtransnet—around 28,000 offers per day, a free trial period included, and the cost after testing calculated individually for each enterprise. It gives access to the private exchange platform Cargo Plus and the Doc & Data system for managing documentation. They offer most of their services in Spain and also maintain a white-list system of companies that pay on time.
3. Trans.eu—around 150,000 offers daily and a 30-day free trial period. After the testing phase the cost is around 70–80 euro per month, or 60 euro when billed annually. The offer includes access to the TransLoad exchange and a security packet called TransProtect. Trans.eu offers most of its services in Poland, and a communicator application for contacting contractors is included.
4. Teleroute—around 200,000 offers per day. There is no trial period; only a demo of the whole suite is available for testing. The cost is around 240 euro per 6 months. The offer includes a dedicated system for searching and loading cargoes and a convenient route planner. They offer services mostly in France.
Since all of the solutions above use their own names for the tools they provide, the specific warehouse-docking functionality of these products cannot be determined from the outside. There are multiple ways of achieving the same goal, each with its own pros and cons, and the decision to use any of the methods should be based on the potential business value it can bring. Cross-docks have a direct impact on delivery time and are therefore a potential bottleneck of the overall transportation process. On the other hand, any improvement in the time needed to exchange goods within the cross-dock can save a considerable amount of money.


3 Project Specification and Requirements

We decided to focus on the tactical and operational levels of the VRPCD, disregarding the strategic level. We assumed that there is a single, unchangeable cross-dock location [12] through which we have to route cargo for consolidation. Unfortunately, the majority of the information regarding this topic is only available for the private use of the companies involved in the process. Naturally, this state of affairs should not be considered unusual: such data can be a valuable resource, hence sharing it with the competition would not be a smart decision [13]. The positive side is that, despite the lack of available business data, there is a lot of insightful research in this field.
We decided to use Python because of its rich ecosystem of scientific libraries and its ergonomics; it is a great fit for quick prototyping and all of the team members had previous experience with the language. We used git to manage our source code repository and enable collaboration.
To successfully measure whether the outcome of the project works as expected, we need to state what the outcome of the whole process should be. As the main goal of the project is to create a fully functional intelligent freight forwarder, in the end we should obtain such a solution. We propose implementing testing units that allow us to check whether some simulated data from the trucks, after being transformed by the algorithms and methods presented in this paper, produces a satisfactory outcome. This can be done by creating dedicated unit tests; in Python, the unittest framework can be used, which is modelled on the well-known JUnit framework from the Java programming language (a minimal test sketch is given after the list below). The brief outline of how the tests will take place is as follows:
1. Simulated random data input.
2. Data pre-processing.
3. Planning processing.
4. Dynamic scheduling post-processing.
5. Real-time coordination processing.
6. Sorted data.
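The sketch below shows what such a test could look like with Python's unittest; the simulate_input and run_pipeline functions are stand-ins for the real layer entry points, which are not shown in this paper, so both their names and the sorting-based check are assumptions made purely for illustration.

import unittest

def simulate_input(n_trucks=3):
    # Stand-in for the simulated telematics feed (step 1 of the list above).
    return [{"truck": t, "load": 10 * (t + 1)} for t in range(n_trucks)]

def run_pipeline(data):
    # Stand-in for pre-processing, planning, scheduling and coordination
    # (steps 2-5); here it merely sorts the records so the test has
    # something deterministic to verify (step 6).
    return sorted(data, key=lambda record: record["load"])

class PipelineTest(unittest.TestCase):
    def test_output_is_complete_and_sorted(self):
        data = simulate_input()
        result = run_pipeline(data)
        self.assertEqual(len(result), len(data))        # no truck record is lost
        loads = [record["load"] for record in result]
        self.assertEqual(loads, sorted(loads))          # output is sorted

if __name__ == "__main__":
    unittest.main()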

3.1 Design and Architecture

As illustrated in Fig. 1, the three-layer architecture proposed by Li et al. [14] was used, comprising a planning, a dynamic scheduling and a real-time coordination layer, expanded by a data layer:
• the data layer contains the locations of cross-docks and suppliers (and the goods they are producing), as well as the size of the available fleet,
• the planning layer is tasked with determining what goods should arrive where (at what cross-docks),
• the dynamic scheduling layer answers the question of when the trucks will arrive at the cross-docks,
• the real-time coordination layer determines how the goods should be moved between inbound and outbound trucks.

Fig. 1 Component diagram of intelligent freight forwarder prototype

3.2 Data Layer

The main point of this layer is to introduce a certain amount of abstraction over the problems of real-world cross-docking [15]. The first, obvious step is the division into classes: as is standard in object-oriented programming, complex structures and relations are mimicked with concepts such as inheritance, encapsulation and polymorphism. The initial partitioning of the problem results in the creation of four modules (a small sketch of their base classes follows the list):
1. MapPoints—this module holds classes related to the important locations involved in the process, for instance shops, pick-up centres and cross-dock centres. The base class of this part of the program is MapPoint, which represents the physical location of those places.
2. Products—the name can be considered self-explanatory; it contains abstractions of all the goods transferred in the cycle. The base class is Product, which stores information about the form of the product (liquid, solid), its weight and how much space it takes.
3. Transport—this package represents all the entities actually responsible for delivering products, the prime example being a class that abstracts a Truck. It holds variables such as the maximum load it is able to carry, the calculated average speed on a particular route and the cost. The base class is LoadCarry, which encapsulates information about how much load a particular object can handle.
4. Participants—these classes represent all the business sides of the overall process, such as deals and shop demand.
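A minimal sketch of how these base classes could be laid out in Python is shown below; the class names (MapPoint, Product, LoadCarry, Truck) come from the description above, while the concrete attributes and the use of dataclasses are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class MapPoint:
    # Base class of the MapPoints module: a physical location such as a shop,
    # a pick-up centre or a cross-dock centre.
    name: str
    x: float
    y: float

@dataclass
class Product:
    # Base class of the Products module: form, weight and occupied space.
    name: str
    form: str        # e.g. "liquid" or "solid"
    weight: float
    volume: float

@dataclass
class LoadCarry:
    # Base class of the Transport module: how much load an entity can handle.
    max_load: float

@dataclass
class Truck(LoadCarry):
    # A concrete carrier with route-dependent attributes.
    avg_speed: float = 0.0
    cost: float = 0.0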

3.3 Planning Layer

The planning layer uses the cross-dock, supplier and customer locations provided by the data layer. It makes use of tabu search to find an optimal delivery route with an optimal number of vehicles. Example data in the CSV format can be as follows (Listing 1):

CD 50 50 0 CD
Apple (+3) 25 25 3 Apple (−3)
Apple (−3) 10 10 −3 Apple (+3)
Banana (+3) 24 24 3 Banana (−3)
Banana (−3) 90 90 −3 Banana (+3)
Cherry (+3) 75 75 3 Cherry (−3)
Cherry (−3) 91 91 −3 Cherry (+3)
Date (+4) 74 74 4 Date (−4)
Date (−4) 11 11 −4 Date (+4)
Listing 1. Example data in the CSV format

The solution for the example data given above is the following (Listing 2):

Vehicle 1: CD -> Cherry (−3) -> Banana (−3) -> CD
Vehicle 2: CD -> Cherry (+3) -> Date (+4) -> CD -> Apple (−3) -> Date (−4) -> CD
Vehicle 3: CD -> Banana (+3) -> Apple (+3) -> CD
Listing 2. Solution for the example data of Listing 1

Since the layers were initially developed in isolation, the planning layer also supports the import of CSV-formatted data, such as shown in Listing 1. The first column is the name of a node (any string is valid—CD is reserved for the cross-dock). The second and third columns are the x and y positions, respectively. The next column contains the product amount: positive for suppliers, negative for customers, and 0 if the node is a cross-dock. The last column is the paired node; this links suppliers and customers of a given product. The names of the nodes contain the supplied/consumed amount (in parentheses) only for the reader's convenience.

Fig. 2 Graph of the solution in Listing 2

Figure 2 depicts the generated solution and the delivery routes for the example data. The algorithm chose to dispatch three different trucks. First, two trucks gather supplies. The goods are then consolidated at the cross-dock—one truck fully unloads, another truck only partially unloads and then loads one of the goods again, and lastly a third truck loads all leftover goods.
As depicted in the UML class diagram in Fig. 3, the planning layer makes use of dependency injection both for the data input and for the solving strategy. The data loader has to implement the IDataLoader interface, which returns a graph with the locations of the cross-dock, the suppliers, and the customers.
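The sketch below illustrates what such an injectable loader could look like in Python. The IDataLoader name comes from the text above; the Node record, the CsvDataLoader class and the assumption of comma-separated fields in the column order just described are illustrative and may differ from the prototype's actual code.

import csv
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    x: float
    y: float
    amount: int      # > 0 supplier, < 0 customer, 0 cross-dock
    pair: str        # name of the paired supplier/customer node

class IDataLoader(ABC):
    # Interface the planning layer receives through dependency injection.
    @abstractmethod
    def load(self):
        """Return the list of nodes describing cross-dock, suppliers and customers."""

class CsvDataLoader(IDataLoader):
    # Illustrative implementation assuming comma-separated rows in the
    # column order described above (name, x, y, amount, paired node).
    def __init__(self, path):
        self.path = path

    def load(self):
        with open(self.path, newline="", encoding="utf-8") as handle:
            return [
                Node(name, float(x), float(y), int(amount), pair)
                for name, x, y, amount, pair in csv.reader(handle)
            ]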

3.4 Dynamic Scheduling Layer

The dynamic scheduling layer is responsible for calculating when the trucks will arrive at the cross-docks. The aim of this layer is to obtain precise arrival and departure times of each truck at the docking station. Moreover, it also takes into consideration the data pushed from each truck regarding its contents, so that the unloading time can also be calculated (Fig. 4).

Fig. 3 Class diagram of the planning layer of intelligent freight forwarder prototype


Fig. 4 Dynamic docking simulation [16]

The algorithm used by this layer operates on data obtained from drivers and vehicles using telematics technologies. From these data sets, the arrival, departure and expected unloading times are calculated. The algorithm can be divided into steps that illustrate its behaviour in a cross-docking model (a small greedy sketch of the dock-assignment part, steps 3–6, is given after the list):
1. First, a function sums up the delay time of all outbound trucks and the waiting time of all inbound trucks.
2. Next, the algorithm checks each truck's contents and its needs for goods.
3. Then, a function assigns each inbound truck to exactly one dock.
4. As a dock can be left unused, another function enforces that every dock serves at most one truck in a given time window.
5. Next, the algorithm builds a valid sequence of arrival and departure times for the inbound trucks assigned to the same dock.
6. Another function makes sure that the start and end times of the first truck at each dock are not earlier than the dock's starting available time.
7. The last step is to connect the departure time of an outbound truck to the arrival time of an inbound truck to keep the flow going in the cross-dock.
All of the functions above use data obtained from the telematic devices installed in the trucks.
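The following Python fragment is only a simplified, greedy illustration of steps 3–6 (one dock per truck, one truck per dock at a time, service starting no earlier than the dock becomes available); the dictionary fields are assumptions, and the real algorithm described above additionally minimises delays and links outbound departures to inbound arrivals.

def assign_docks(inbound, docks):
    # `inbound`: list of dicts with "id", "arrival" and "unload_time".
    # `docks`:   list of dicts with "id" and "available_from".
    schedule = []
    dock_free = {dock["id"]: dock["available_from"] for dock in docks}
    for truck in sorted(inbound, key=lambda t: t["arrival"]):
        dock_id = min(dock_free, key=dock_free.get)        # earliest free dock
        start = max(truck["arrival"], dock_free[dock_id])  # wait if the dock is busy
        end = start + truck["unload_time"]
        dock_free[dock_id] = end                           # dock occupied until `end`
        schedule.append({"truck": truck["id"], "dock": dock_id,
                         "start": start, "end": end})
    return schedule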

3.5 Real-Time Coordination Layer

The scope of this layer is the operations performed inside a cross-docking location. A cross-docking location handles the transfer of products between the inbound trucks and the outbound trucks. The aim of our system is to schedule this transfer effectively, which means reducing the usage of temporary storage and thus minimizing the makespan of the product transfer. The system performs this using the effective products transferring algorithm, which we proposed based on the paper "Truck Scheduling in a Cross-Docking Terminal by Using Novel Robust Heuristics" [17]. For a better understanding of this solution, the following definitions are established:
• Outbound truck—an empty truck that needs to be filled with a particular amount of specific products (required products).
• Remaining Required Products (RRP)—the currently remaining products that need to be loaded into a given outbound truck to meet its requirements.
• Inbound truck—a truck with products to be unloaded.
• Temporary storage—during a transfer of products from an inbound truck to an outbound truck, all the products from the inbound truck that are outside the RRP of the outbound truck are sent to temporary storage.
The effective operation of the algorithm is based on two heuristic selection strategies (sketched in code below):
1. Outbound truck selection strategy—take the outbound truck that requires the highest number of products currently available in the temporary storage. If the temporary storage is empty, take the first truck from the list of possible outbound trucks.
2. Inbound truck selection strategy—an inbound truck is selected for a particular outbound truck. For each possible inbound truck, calculate the ratio of the number of its products that are within the RRP of the given outbound truck to the rest of its products (which would be unloaded to the temporary storage). Pick the inbound truck with the maximum value of this ratio.
The effective products transferring algorithm is presented in Fig. 5. The presented solution has a few advantages and one big disadvantage. To start with, tabu search optimization is effective on a wide variety of classical optimization problems, such as the widely known graph colouring and travelling salesman problems; it is therefore also practical for scheduling issues, and our project is a good example of that kind of difficulty. Moreover, tabu search is based on the use of adaptive memory, which allows the algorithm to be more flexible in its search behaviour. On the other hand, we found out that the presented solution is quite hard to integrate into an existing telemetry system. By integration, we mean taking the data from a telematic framework already available on the market and supplying it to our project [18]. Because of that, the project needs to be independently tweaked for the chosen solution, as the data format differs in each of them. This requires proper API availability of the chosen framework and will take some time to merge everything together correctly.
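A compact Python sketch of the two selection strategies is given below; it assumes that each truck is a dict whose product quantities ("rrp" for outbound trucks, "load" for inbound trucks) and the temporary storage are collections.Counter multisets, which is an illustrative choice rather than the prototype's actual data model.

from collections import Counter

def select_outbound(outbound_trucks, temporary_storage):
    # Strategy 1: prefer the outbound truck whose RRP overlaps most with the
    # temporary storage; if the storage is empty, take the first truck.
    if not any(temporary_storage.values()):
        return outbound_trucks[0]
    def overlap(truck):
        return sum(min(qty, temporary_storage[product])
                   for product, qty in truck["rrp"].items())
    return max(outbound_trucks, key=overlap)

def select_inbound(inbound_trucks, outbound_truck):
    # Strategy 2: maximise the ratio of products inside the outbound truck's
    # RRP to products that would end up in temporary storage.
    def ratio(truck):
        useful = sum(min(qty, outbound_truck["rrp"][product])
                     for product, qty in truck["load"].items())
        rest = sum(truck["load"].values()) - useful
        return useful / rest if rest else float("inf")
    return max(inbound_trucks, key=ratio)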


Fig. 5 Effective products transferring algorithm for intelligent freight forwarder

4 Conclusion and Discussion

The final outcome of the project is a proof of concept that can process data from telematic systems and provide a cross-docking solution as output. The data collected from the trucks by telemetry, combined with the cross-dock type, is processed by the implemented algorithms, and as output we get a proper schedule for the cargo so that the drivers know where to dock and where to load and unload their goods.
We did not find any particular solutions that work similarly to our project: based on the market research, digging deeper into the technologies already available would require purchasing paid solutions, and we did not have the funds to cover it. Moreover, we do not know the algorithms underlying cross-docking solutions such as Wtransnet and Teleroute, because all rights are reserved to their authors. None of the companies we researched offers open-source code, so we could not compare our solution to their products [18, 19].
The final product is useful for small and medium companies that want to add a cross-docking solution to their business model and eventually merge it with existing telemetry solutions, so that the processed data are collected by another framework. However, as mentioned earlier, connecting our solution with existing products on the market could be tough and requires a correct API set-up on the side of the third-party solution. We believe that it is possible, but it takes a lot of work to get it done.
The proof of concept for solving the VRPCD at the tactical and operational levels was created. The solution has been developed in four layers:
• the data layer containing the locations of cross-docks and suppliers (and the goods they are producing), as well as the size of the available fleet,
• the planning layer tasked with determining what goods should arrive where (at what cross-docks),
• the dynamic scheduling layer answering the question of when the trucks will arrive at the cross-docks,
• the real-time coordination layer determining how the goods should be moved between inbound and outbound trucks.
The next step is implementing the proof-of-concept solution as a prototype software product. It should be a complementary system that uses all four layers of the proposed solution and supports communication between them. First, the prototype should enable the end user to insert the cross-dock locations, the suppliers and the available fleet information into the data layer. Then, for a given delivery request of the end user, the system should return an optimal truck scheduling solution consisting of:
• truck routes between suppliers, cross-docks and destinations (planning layer),
• timetables for truck departures (dynamic scheduling layer),
• the order of inbound and outbound trucks for product transfer in the cross-docks (real-time coordination layer).
The following steps would involve testing the prototype with artificial data and performing system improvements if needed. After that, if the system proves reliable, it should also be tested and evaluated in real-world scenarios.

References
1. Wen, M., et al.: Vehicle Routing with Cross-Docking (2007)
2. Ertek, G.: Cross-Docking Insights from a Third-Party Logistics Firm in Turkey (2011). ISBN: 9781439867204. https://doi.org/10.1201/b11368-11
3. Ochelska-Mierzejewska, J.: Ant colony optimization algorithm for split delivery vehicle routing problem. In: Advanced Information Networking and Applications, pp. 758–767 (2020). https://doi.org/10.1007/978-3-030-44041-1_67
4. Ochelska-Mierzejewska, J.: Tabu search algorithm for vehicle routing problem with time windows. In: Data-Centric Business and Applications, Towards Software Development, vol. 4, pp. 117–136 (2020). https://doi.org/10.1007/978-3-030-34706-2_7
5. Poniszewska-Maranda, A.: Implementation of access control model for distributed information systems using usage control. In: Bouvry, P., et al. (eds.) SIIS 2011. LNCS, vol. 7053, pp. 54–67. Springer, Heidelberg (2011)
6. Stepien, K., Poniszewska-Maranda, A.: Towards the security measures of the vehicular ad-hoc networks. In: Skulimowski, A.M.J., et al. (eds.) Internet of Vehicles. Technologies and Services Towards Smart City, IOV 2018. LNCS, vol. 11253, pp. 233–248. ISSN 0302-9743. ISBN: 978-3-030-05080-1. Springer-Verlag, Heidelberg (2018)
7. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N.: Use of salesforce platform for building real-time service systems in cloud. In: 2017 IEEE 14th International Conference on Services Computing (IEEE SCC 2017), Honolulu, Hawaii, USA, pp. 491–494 (2017). https://doi.org/10.1109/SCC.2017.72
8. Van Belle, J., Valckenaers, P., Cattrysse, D.: Cross-docking: state of the art. Omega 40 (2012). https://doi.org/10.1016/j.omega.2012.01.005
9. Ting, C.-J.: An ant colony optimization for the multi-dock truck scheduling problem with cross-docking (2016)
10. Birim, S.: Vehicle routing problem with cross docking: a simulated annealing approach. Procedia Soc. Behav. Sci. 235, 149–158 (2016). https://doi.org/10.1016/j.sbspro.2016.11.010
11. Wen, M., et al.: Vehicle routing with cross-docking. J. Oper. Res. Soc. 60, 1708–1718 (2009). https://doi.org/10.1057/jors.2008.108
12. Chen, P., et al.: Multiple crossdocks with inventory and time windows. Comput. Oper. Res. 33, 43–63 (2006). https://doi.org/10.1016/j.cor.2004.06.002
13. Magableh, G., Rossetti, M., Mason, S.: Modeling and Analysis of a Generic Cross-Docking Facility, vol. 2005, 8 pp (2005). ISBN: 0-7803-9519-0. https://doi.org/10.1109/WSC.2005.1574430
14. Li, Z., et al.: A solution for cross-docking operations planning, scheduling and coordination, vol. 05, pp. 2957–2962 (2008). https://doi.org/10.1109/SOLI.2008.4683041
15. Xing, B.: Computational intelligence in cross docking. Int. J. Softw. Innov. 2, 1–8 (2016). https://doi.org/10.4018/ijsi.2014010101
16. Baniamerian, A., Bashiri, M., Zabihi, F.: Two phase genetic algorithm for vehicle routing and scheduling problem with cross-docking and time windows considering customer satisfaction. J. Ind. Eng. Int. 14 (2017). https://doi.org/10.1007/s40092-017-0203-0
17. Tavakkoli-Moghaddam, R., Seyedi, I., Hamedi, M.: Truck scheduling in a cross-docking terminal by using novel robust heuristics (2019)
18. Babics, T.: Cross-docking in the sales supply chain: integration of information and communication (I+C) relationships. Period. Polytech. Transp. Eng. 33 (2005)
19. Kryvinska, N., Poniszewska-Maranda, A., Gregus, M.: An approach towards service system building for road traffic signs detection and recognition. Procedia Comput. Sci. 141, 64–71 (2018). Special Issue on 9th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2018). https://doi.org/10.1016/j.procs.2018.10.150

Comparison of the Genetic Algorithm and Selected Heuristics for the Vehicle Routing Problem with Capacity Limitation
Joanna Ochelska-Mierzejewska and Przemysław Zakrzewski

Abstract The main objective of the research was to compare the operation of the genetic algorithm with selected heuristics (savings heuristic, Dijkstra heuristic, Christofides heuristic) for the routing problem with capacity constraints, for which the following comparison criteria were defined: the time to find a solution, the filling of the fleet, and the accuracy of the solution. The article analyzes five random data sets differing in the location of points (cities) and the size of orders. Such a variety of data made it possible to analyze the effectiveness of the selected heuristics. The results from the genetic algorithm were compared with the other heuristics. The results are presented in appropriate graphs, which facilitates their analysis and comparison.

Keywords Vehicle routing problem · Savings heuristics · Dijkstra heuristic · Christofides heuristics

1 Introduction

The cost of transporting goods and services is an interesting topic in today's society. Routing is one of the most intriguing areas of operations research, where the goal is to find an efficient route for transporting items through a complex network. The network is usually represented by a graph similar to that shown in Fig. 1. Each node represents a location, and a route is a path through a set of nodes. Most real routing problems consist of finding efficient routes for vehicles: cars, trains, airplanes, etc. Routing problems can be divided into two main types—node routing problems and arc routing problems—depending on whether the goal is to visit nodes (locations) or arcs (edges connecting them). An example of each is given below.

J. Ochelska-Mierzejewska (B) · P. Zakrzewski
Institute of Information Technology, Lodz University of Technology, Wolczanska 215 Street, Lodz, Poland
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_10


Fig. 1 Sample network graph

The first example is an arc routing problem that, for instance, Google Maps has to solve every day. When you look at Street View on Google Maps, you might wonder how Google obtains street-level imagery at millions of addresses around the world. The answer is simple: the Google team is constantly driving a fleet of camera-equipped vehicles that automatically take pictures at every address on the roads of the world. Google's problem is to build the shortest route for each vehicle that traverses every street in the designated region. By representing this problem in a graph where arcs represent streets and nodes are street intersections, the arc routing problem consists of finding the shortest path through each arc in the graph.
An example of a node routing problem is vehicle routing, assuming that a company has to deliver packages to different locations using a fleet of vehicles. In the graph of this problem, the nodes represent the locations and the arcs represent the routes between them. Each arc has a weight corresponding to the cost of traveling that route. The problem consists of finding a set of paths in the graph (corresponding to the delivery routes of each vehicle) that covers each destination while minimizing the total cost. This differs from the arc routing problem because the paths do not have to pass through each arc; they just need to include each node.
One variant of the Vehicle Routing Problem (VRP) is the Capacitated Vehicle Routing Problem (CVRP), which is considered in these studies. CVRP is a variation of VRP in which vehicles with a limited payload have to pick up or deliver items at different locations. Each location holds a certain number of items, and each vehicle has a maximum capacity that it can carry. The problem consists of collecting or delivering the items at the lowest possible cost without ever exceeding the capacity of the vehicles [1, 2].
The aim of this paper is to analyze the vehicle routing problem using a genetic algorithm and selected heuristics and to compare how the results change depending on the modification of:
• the number of points;
• the quantity of goods to be collected from the reloading points;
• the number of drivers (fleets);
• the fleet capacity.


The research will help to answer the question of which input parameters give the best result. It will also be examined how efficient the selected heuristics are in relation to the genetic algorithm. The test results are presented in tables and graphs. All studies were performed on the same input parameters for the genetic algorithm and each heuristic, because only then can a comparative analysis be performed.
The work is divided into sections, each of which covers a different topic. After the introduction, Sect. 2 describes the route planning problem. Next, Sect. 3 presents the genetic algorithm and Sect. 4 contains information about the selected heuristics. The most important is Sect. 5, which describes the experiments and discusses the results.

2 Problem of Route Planning

2.1 Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is the problem of finding a seller's route that starts at home, visits a specific set of cities, and returns to the original location so that the total distance traveled is minimal and each city is visited exactly once. While a modern retailer's business trip may not seem too complicated in terms of route planning, TSP in general represents the typical "hard" combinatorial optimization problem.
A formal mathematical definition of TSP is given next. Let G = (V, E) be a graph (directed or undirected) and F be the family of all Hamiltonian cycles (routes) in G. For each edge e ∈ E a cost (weight) c_e is given. The traveling salesman problem is to find a route (Hamiltonian cycle) in G such that the sum of the costs of the route edges is as small as possible. Without loss of generality, it is assumed that G is a complete graph; otherwise one could replace the missing edges with very costly edges. Let V = {1, 2, . . . , n} be the node set. The matrix C = (c_{ij})_{n×n} is called the cost matrix (also called the distance or weight matrix), where the (i, j) entry c_{ij} is the cost associated with the edge connecting node i with node j in G.
TSP can also be viewed as a permutation problem. Generally, there are two varieties of TSP permutation representation, quadratic and linear. Let P_n be the set of all permutations of the set {1, 2, . . . , n}. Then TSP has to find π = (π(1), π(2), . . . , π(n)) in P_n such that c_{π(n)π(1)} + Σ_{i=1}^{n−1} c_{π(i)π(i+1)} is minimized. In this case, (π(1), π(2), . . . , π(n)) gives the order of visiting the cities, starting with city π(1). For example, if n = 5 and π = (2, 1, 5, 3, 4), then the corresponding route is (2, 1, 5, 3, 4, 2). Each cyclic shift of π gives the same route, thus there are n different permutations that represent the same route. This TSP representation is generally used in the context of sequencing problems and results in a quadratic binary programming formulation of TSP.


In another TSP permutation representation, referred to as the linear permutation representation, only special types of permutations of {1, 2, . . . , n}, called cyclic permutations, are considered feasible. Let C_n be the collection of all cyclic permutations of {1, 2, . . . , n}. Then TSP has to find σ = (σ(1), σ(2), . . . , σ(n)) ∈ C_n such that Σ_{i=1}^{n} c_{iσ(i)} is minimized. Under this representation, σ(i) is the successor of city i in the resulting tour for i = 1, 2, . . . , n. For example, if n = 5 and (σ(1), σ(2), . . . , σ(5)) = (2, 4, 5, 3, 1), the resulting route is given by (1, 2, 4, 3, 5, 1). This TSP representation leads to the binary linear programming formulation of TSP, and hence we call it the linear permutation representation.
Depending on the nature of the cost matrix (equivalently, the nature of G), TSP is divided into two classes. If C is symmetric (i.e., G is undirected), then TSP is called the Symmetric Traveling Salesman Problem (STSP). If C is not necessarily symmetric (equivalently, the graph G is directed), then it is called the Asymmetric Traveling Salesman Problem (ATSP). Since any undirected graph can be viewed as a directed graph by duplicating each edge, one copy in the forward and one in the reverse direction, STSP can be considered a special case of ATSP. Interestingly, it is also possible to formulate ATSP as STSP by doubling the number of nodes. Consider ATSP on the graph D = (N, A) with c_{ij} as the cost of arc (i, j). Construct an undirected graph G∗ = (V∗, E∗) with node set V∗ = N ∪ {1∗, 2∗, . . . , n∗}, where N = {1, 2, . . . , n}. For each arc (i, j) ∈ D, create an edge (i, j∗) ∈ G∗ with cost c_{ij}. Also add n neutral edges (i, i∗) of cost −M, where M is a very large number. All other edges have cost M. It can be verified that the ATSP solution on D is equivalent to the STSP solution on G∗ [3].

2.2 Route Planning

The classic vehicle routing problem (VRP) can be defined as follows. Let G = (N, A) be a graph, where N = {0, . . . , n} is a set of vertices corresponding to cities, and A = {(i, j) : i, j ∈ N, i ≠ j} is a set of arcs. Vertex 0 represents the depot of a fleet of m vehicles, and the remaining vertices correspond to customers; m belongs to a certain interval [m, m̄], where 1 ≤ m ≤ m̄ ≤ n. Vehicles may have equal or different capacities; let vehicle v have capacity Q_v. Each vertex i from N \ {0} has a non-negative demand q_i ≤ max_v {Q_v}, and each arc (i, j) has an associated non-negative travel distance or cost c_{ij}. VRP is the determination of a set of vehicle routes with minimum cost [4, 5]. The main assumptions of VRP are:
• journeys start and end at the warehouse;
• each customer is visited once by one vehicle;
• the total demand on any route does not exceed the capacity of the vehicles assigned to the route.
It is well known that VRP is an NP-hard problem [4, 6, 7]. When m = 1 and Q_1 ≥ Σ_{i=1}^{n} q_i, it reduces to the Traveling Salesman Problem. There are many items in the literature on VRP and related problems. For example, Dror and Trudeau [8, 9] investigated a relaxation of VRP in which the second condition above was removed, i.e. a customer's demand can be divided among several vehicles. In this context, it is no longer necessary to assume that q_i ≤ max_v {Q_v}. This variant of VRP is called the Split Delivery Vehicle Routing Problem (SDVRP). Dror and Trudeau [8, 9] proposed a heuristic algorithm for SDVRP and showed that allowing split deliveries can bring significant savings, both in the total distance traveled and in the number of vehicles used in the optimal solution [10]. Unfortunately, SDVRP is still an NP-hard problem [9].

2.3 Mathematical Model

Let x_{ijv} be a binary variable defined for i ≠ j that is equal to 1 if and only if vehicle v drives directly from i to j in the optimal solution. Let y_{iv} be the fraction of customer i's demand served by vehicle v [5, 11]. The problem is then:

Minimize Σ_{i=0}^{n} Σ_{j=0}^{n} Σ_{v=1}^{m} c_{ij} x_{ijv}   (1)

subject to the limitations

Σ_{i=0}^{n} x_{ikv} − Σ_{j=0}^{n} x_{kjv} = 0   (k = 0, . . . , n; v = 1, . . . , m)   (2)

Σ_{v=1}^{m} y_{iv} = 1   (i = 1, . . . , n)   (3)

Σ_{i=0}^{n} q_i y_{iv} ≤ Q_v   (v = 1, . . . , m)   (4)

Σ_{j=0}^{n} x_{ijv} ≥ y_{iv}   (i = 0, . . . , n; v = 1, . . . , m)   (5)

the sub-route elimination and connectivity restrictions, and

x_{ijv} ∈ {0, 1}   (i, j = 0, . . . , n; v = 1, . . . , m)   (6)

0 ≤ y_{iv} ≤ 1   (i = 0, . . . , n; v = 1, . . . , m)   (7)

In this formulation, limitation (2) gives the flow conservation conditions. Limitation (3) states that the demand of every customer is completely met. Constraint (4) ensures that the capacity of a vehicle is never exceeded, and constraint (5) ensures that if a customer i is visited by a vehicle v then the same vehicle also leaves that customer. The sub-route elimination restrictions (6–7) require a more detailed discussion. Note that adding the constraints (5) over all vehicles produces the following connectivity constraints:

Σ_{v=1}^{m} Σ_{j=0}^{n} x_{ijv} ≥ Σ_{v=1}^{m} y_{iv} = 1   (i = 1, . . . , n)   (8)

which means that each client will receive at least one visit [3].

2.3.1 Vehicle Routing Problem with Limited Capacity

CVRP can be defined as the following graph-theoretical problem [12, 13]. Let G = (V, A) be a complete graph, where V = {0, . . . , n} is a set of vertices and A is a set of arcs. The vertices j = 1, . . . , n correspond to clients, each with a known non-negative demand d_j to be delivered, while vertex 0 corresponds to the warehouse (with a dummy demand d_0 = 0). Given a customer set S ⊆ V, let d(S) = Σ_{j∈S} d_j denote the total demand of the set. A non-negative cost c_{ij} is associated with each arc (i, j) ∈ A and represents the travel cost of going from vertex i to vertex j. The use of loop arcs (i, i) is not allowed, which is imposed by defining c_{ii} = +∞ for all i ∈ V. If the cost matrix is asymmetric, A is a set of directed arcs and the corresponding problem is called the Asymmetric Capacitated Vehicle Routing Problem (ACVRP). Otherwise, i.e., when c_{ij} = c_{ji} for all i, j ∈ V, the problem is called the Symmetric Capacitated Vehicle Routing Problem (SCVRP) and the set of arcs A is often replaced by a set of undirected edges E. Below, the undirected edge set of the graph G is denoted by A when the edges are indicated by their endpoints (i, j), i, j ∈ V, and by E when the edges are indicated by a single index e. Given a set of vertices S ⊂ V, let δ(S) and σ(S) denote the set of edges e ∈ E (or arcs (i, j) ∈ A) that have only one endpoint or both endpoints in S, respectively. As usual, for single vertices i ∈ V we write δ(i) instead of δ({i}). It is generally assumed that the graph G is complete (that is, it includes arcs connecting all pairs of vertices, possibly except for loops) because this simplifies the notation; if not, a complete graph can easily be obtained by assigning an infinite cost to the non-existent arcs. In several practical situations, the cost matrix satisfies the triangle inequality, c_{ik} + c_{kj} ≥ c_{ij} for all i, j, k ∈ V. In such a case, it is not convenient to deviate from the direct connection between two vertices i and j. Satisfaction of the triangle inequality is sometimes required by algorithms for CVRP. Then, if the original instance does not satisfy the triangle inequality, an equivalent instance can be obtained immediately by adding a sufficiently large positive value M to the cost of each arc. However, the drastic distortion of the metric caused by this operation can give very poor solutions with respect to the original costs, mainly with respect to the effectiveness of heuristic algorithms. If G is strongly connected but not complete, it is possible to obtain a complete graph in which the cost of each arc (i, j) is defined as the cost of the shortest path from i to j computed on the original graph.


Note that in this case the complete graph satisfies the triangle inequality, so this construction can also be seen as a "triangulation" of the graph. Moreover, in some cases the vertices correspond to points in the plane with given coordinates, and the cost c_{ij}, for all arcs (i, j) ∈ A, is defined as the Euclidean distance between the two points corresponding to vertices i and j. In this case, the cost matrix is symmetric and satisfies the triangle inequality, and the resulting problem is often called the Euclidean CVRP.
A set of K identical vehicles, each with capacity C, is available at the warehouse. Each vehicle can travel only one route, and we assume that K is not smaller than K_min, where K_min is the minimum number of vehicles needed to serve all customers. The value of K_min can be determined by solving the Bin Packing Problem (BPP) associated with the CVRP, which calls for the minimum number of containers, each with capacity C, required to load all n items, each with a non-negative weight d_j, j = 1, . . . , n. Despite the fact that BPP is an NP-hard problem, instances with hundreds of items can be solved optimally very effectively. Below, given a set S ⊆ V \ {0}, we denote by γ(S) the minimum number of vehicles needed to serve all customers in S, i.e. the optimal value of the BPP solution with the item set S. Note that γ(V \ {0}) = K_min. Often γ(S) is replaced by the so-called continuous lower bound for BPP, ⌈d(S)/C⌉. Further, to ensure feasibility, we assume that d_j ≤ C for each j = 1, . . . , n.
CVRP consists of finding a set of K simple circuits (corresponding to vehicle routes) with minimum cost, defined as the sum of the costs of the arcs belonging to the circuits, and such that:
• each circuit visits vertex 0, i.e. the depot (warehouse) vertex;
• each vertex j ∈ V \ {0} is visited by exactly one circuit;
• the sum of the demands of the vertices visited by a circuit does not exceed the vehicle capacity C.
Several variants of the basic version of CVRP have been considered in the literature. First, when the number K of available vehicles is greater than K_min, it may be possible to leave some vehicles unused, which means that at most K circuits are required. In this case, fixed costs are often associated with the use of the vehicles. This can be accounted for in the CVRP by adding a constant value, representing the fixed cost of operating a vehicle, to the cost of the arcs leaving the depot. In practical situations, there is often an additional goal of minimizing the number of circuits (i.e. vehicles) used. Usually, the algorithms proposed in the literature do not explicitly take this goal into account; however, depending on the properties of the algorithm used, there are various ways to incorporate it. When the algorithm allows solutions using a number of circuits smaller than K, this goal can easily be accommodated by adding a large constant value to the cost of the arcs leaving the depot. Thus, the optimal solution first minimizes the number of arcs leaving the warehouse (hence the number of circuits) and then the cost of the other arcs used. If, as is usual, the algorithm only produces solutions that use all the available K vehicles, there are two possibilities. The first is to calculate K_min by solving the BPP related to the CVRP and then to apply the algorithm with K = K_min.


The second possibility is to define an extended instance with a complete graph G' = (V', A') obtained from G by adding K − K_min dummy vertices to V, each with demand d_j = 0. Let W = {n + 1, . . . , n + K − K_min} be the set of these dummy vertices; the cost c'_{ij} of the arcs (i, j) ∈ A' is defined as:

c'_{ij} = c_{ij}   for i, j ∈ V   (9)
c'_{ij} = 0   for i = 0, j ∈ W   (10)
c'_{ij} = 0   for i ∈ W, j = 0   (11)
c'_{ij} = c_{0j}   for i ∈ W, j ∈ V \ {0}   (12)
c'_{ij} = M   for i ∈ V \ {0}, j ∈ W   (13)
c'_{ij} = M   for i ∈ W, j ∈ W   (14)

where M is a very large positive number. An optimal CVRP solution computed on the extended instance may contain "empty" routes formed by single dummy vertices. Note that, by adding the large constant to c'_{0j}, j ∈ W, the number of empty routes is maximized, i.e. the number of vehicles actually used is minimized. It should be noted that, even when the triangle inequality holds, minimizing the number of circuits used does not generally correspond to minimizing the total cost of the circuits. On the other hand, solutions forced to use exactly K routes (with K > K_min) do not generally lead to a minimization of the total circuit cost. CVRP is known to be an NP-hard problem and generalizes the well-known Traveling Salesman Problem, which arises when C ≥ d(V) and K = K_min = 1. Therefore, all relaxations proposed for TSP are valid for CVRP. As mentioned before, CVRP is also associated with the bin packing problem [14].
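A direct way to read rules (9)–(14) is as a recipe for building the cost matrix of the extended instance; the small Python helper below does exactly that, with the matrix representation and the default value of M being assumptions made only for illustration.

def extended_cost_matrix(c, n, extra, M=10**9):
    # `c`     - (n+1) x (n+1) cost matrix of the original instance (vertex 0 = warehouse)
    # `extra` - number of dummy vertices K - Kmin to add (the set W)
    # `M`     - the large positive constant from rules (13)-(14)
    size = n + 1 + extra
    real = range(n + 1)                      # original vertex set V
    dummy = range(n + 1, size)               # dummy vertex set W
    cp = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i in real and j in real:
                cp[i][j] = c[i][j]           # rule (9)
            elif i == 0 and j in dummy:
                cp[i][j] = 0.0               # rule (10)
            elif i in dummy and j == 0:
                cp[i][j] = 0.0               # rule (11)
            elif i in dummy and j in real:
                cp[i][j] = c[0][j]           # rule (12): j in V \ {0}
            elif i in real and j in dummy:
                cp[i][j] = M                 # rule (13): i in V \ {0}
            else:
                cp[i][j] = M                 # rule (14): both vertices are dummies
    return cp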

2.3.2 Vehicle Routing Problem with Time Windows

The vehicle routing problem with time windows (VRPTW) is defined on a network N = (V, A) with node set V and arc set A. As usual, from each customer i ∈ C ⊂ V a certain quantity q_i must be collected by a single vehicle visit. Each client i allows the service to start within the time window [e_i, l_i]. Let K be the set of vehicles; since we assume that each vehicle performs exactly one route within the planning horizon, K is also the set of routes. Each route k ∈ K begins at the starting point o(k) ∈ V, ends at the destination d(k) ∈ V, and visits customers in between. Side constraints may restrict vehicle k to visiting only the customers C^k ⊆ C. Hence the subnetwork N^k = (V^k, A^k) with nodes V^k = C^k ∪ {o(k), d(k)} describes the possible movements of vehicle k in space. For modeling purposes, it is beneficial to formulate the problem with separate nodes, so that O = {o(k) : k ∈ K} and D = {d(k) : k ∈ K} both have cardinality |K|.
The vehicles k ∈ K are characterized by the following data: the sum of the quantities collected by vehicle k may not exceed the vehicle capacity Q^k. The time windows [e_{o(k)}, l_{o(k)}] and [e_{d(k)}, l_{d(k)}] limit the start and end times of route k. The travel times t^{time}_{ij} and costs c_{ij} for (i, j) ∈ A are assumed to be vehicle independent. Note that additional service times at node i can always be included in t^{time}_{ij} without changing the interpretation of the time windows. The VRPTW model is as follows:

min Σ_{k∈K} T^{k,cost}_{d(k)}   (15)

Σ_{k∈K} Σ_{j:(i,j)∈A^k} x^k_{ij} = 1   ∀i ∈ C   (16)

Σ_{j:(o(k),j)∈A^k} x^k_{o(k),j} = Σ_{i:(i,d(k))∈A^k} x^k_{i,d(k)} = 1   ∀k ∈ K   (17)

Σ_{j:(i,j)∈A^k} x^k_{ij} − Σ_{j:(j,i)∈A^k} x^k_{ji} = 0   ∀k ∈ K, i ∈ C^k   (18)

x^k_{ij} ∈ {0, 1}   ∀k ∈ K, (i, j) ∈ A^k   (19)

x^k_{ij} (T^{k,cost}_i + c_{ij} − T^{k,cost}_j) ≤ 0   ∀k ∈ K, (i, j) ∈ A^k   (20)

T^{k,cost}_i ≥ 0   ∀k ∈ K, i ∈ V^k   (21)

x^k_{ij} (T^{k,load}_i + q_j − T^{k,load}_j) ≤ 0   ∀k ∈ K, (i, j) ∈ A^k   (22)

0 ≤ T^{k,load}_i ≤ Q^k   ∀k ∈ K, i ∈ V^k   (23)

x^k_{ij} (T^{k,time}_i + t^{time}_{ij} − T^{k,time}_j) ≤ 0   ∀k ∈ K, (i, j) ∈ A^k   (24)

e_i ≤ T^{k,time}_i ≤ l_i   ∀k ∈ K, i ∈ V^k   (25)

This nonlinear mathematical programming formulation of the VRPTW includes two types of decision variables: first, the flow variables x^k_{ij} for k ∈ K and (i, j) ∈ A^k are 1 if the arc (i, j) is used in route k and 0 otherwise. Second, the resource variables T^{k,r}_i represent the consumption of resource r ∈ R along route k at node i. For the VRPTW, we consider the resources R = {cost, load, time}.
Constraints (16) ensure that each client i ∈ C is assigned to exactly one route k ∈ K. A continuous flow between the starting point o(k) and the destination d(k) in N^k is guaranteed by (17) and (18).


The non-negative resource variables T^{k,cost}_i record the cost of the (partial) route starting at o(k) and ending at the corresponding node i ∈ V^k. The correct update of the route cost is provided by (20): if vehicle k moves directly from i to j, the partial cost T^{k,cost}_j is at least the cost T^{k,cost}_i plus the cost c_{ij} of the arc (i, j). Note that T^{k,cost}_i can always be set to zero if node i is not visited by vehicle k. Therefore, the objective (15) accurately determines the cost of all routes. Operating costs on arcs can be supplemented with fixed costs on the arcs (o(k), i) connecting the source to the first customer. Also, an arc (o(k), d(k)) may exist in A^k to represent an empty route k.
The remaining resources, time and load, are modeled by the resource variables T^{k,time}_i and T^{k,load}_i, which are bounded by (25) and (23), respectively. Their update is given by (24) and (22). The load update (22) is handled identically to the cost update (20). The time update (24), together with (25), guarantees that T^{k,time}_j ≥ max{e_j, T^{k,time}_i + t^{time}_{ij}} holds whenever vehicle k uses an arc (i, j); vehicles arriving before the time window opens must wait. Obviously, the objective and the capacity constraints can also be formulated in a more "classic" way, e.g. minimize Σ_k Σ_{ij} c_{ij} x^k_{ij} and require Σ_{ij} q_j x^k_{ij} ≤ Q^k for all k ∈ K. There are also simple linear reformulations of the time update (24) using the well-known big-M technique. The point, however, is that the above formulation is more general, since all three resources are treated identically: constraints (20), (22), and (24) can be reformulated with Resource Extension Functions (REFs), which is more convenient for a graphical and theoretical description of the problem [11, 15].
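As a small illustration of how such a resource extension works in code, the helper below computes the time update from (24)–(25) for a single arc; it is a generic sketch and not part of any formulation-specific library.

def propagate_time(T_i, t_ij, e_j, l_j):
    # Earliest feasible service start at j after serving i: wait until the
    # window opens if arriving early, and report infeasibility (None) if the
    # window [e_j, l_j] is already closed.
    T_j = max(e_j, T_i + t_ij)
    return T_j if T_j <= l_j else None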

3 Genetic Algorithm

Genetic algorithms (GA) belong to the class of evolutionary algorithms. A GA starts out with several possible solutions to the problem, called the population. Each individual (solution) in the population has an associated cost, and the optimization problem consists of finding a solution that minimizes or maximizes this fitness value. New generations are created by applying crossover and mutation operators to individuals selected from the population. In the selection step, two individuals are selected from the population as parents; two offspring are created from them by crossover and mutation. When the optimal route meets a given threshold or the limit on the number of generations is reached, the termination condition is satisfied and the GA returns the solution with the maximum or minimum fitness value [16].

3.1 Initialization

Initialization is the first step of the GA meta-heuristic. Usually it is random; sometimes solutions produced by a heuristic method are combined with a random population. However, a poor choice of the starting population can trap the search in local maxima or minima.


Fig. 2 Representation of the CVRP example

Some solutions may be so dominant that other individuals (chromosomes) cannot produce better offspring (solutions). Most CVRP solutions follow a greedy capacity-limiting approach after random initialization. It is therefore assumed that the vehicle visits the first customer and serves it according to the customer's demand and the vehicle capacity. If the vehicle has enough capacity for the next customer, it visits it; otherwise, the vehicle returns to the depot. For example, considering the network in Fig. 1, if the order in a possible solution (chromosome) is 1 4 5 6 3 7 2, the customer demands are 3 2 1 3 3 2 4, and the vehicle capacity is 6, the chromosome decodes into 3 separate routes starting and ending at the depot 0, as shown in Fig. 2.
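The greedy capacity split described above can be sketched in a few lines of Python; the demand mapping follows the chromosome order quoted in the example, and reproducing exactly three routes for capacity 6 is the expected outcome, but the function itself is only an illustration of the decoding idea.

def split_into_routes(chromosome, demand, capacity, depot=0):
    # Serve customers in chromosome order and return to the depot whenever
    # the next demand would exceed the remaining capacity.
    routes, current, load = [], [depot], 0
    for customer in chromosome:
        if load + demand[customer] > capacity:
            current.append(depot)
            routes.append(current)
            current, load = [depot], 0
        current.append(customer)
        load += demand[customer]
    current.append(depot)
    routes.append(current)
    return routes

demand = {1: 3, 4: 2, 5: 1, 6: 3, 3: 3, 7: 2, 2: 4}
print(split_into_routes([1, 4, 5, 6, 3, 7, 2], demand, 6))
# -> [[0, 1, 4, 5, 0], [0, 6, 3, 0], [0, 7, 2, 0]]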

3.2 Coding Scheme

In this work, each city is represented by a unique number. Except for the depot, no number is repeated in the chromosome. For example, if we have 10 cities, the individual consists of 10 or more numbers from 1 to 10 (plus the depot 0) in the order of a random permutation. An individual can thus be 3 2 8 0 1 7 6 0 4 5 9 0 1 0, where zero is the depot.

3.3 Fitness Value

The fitness value is calculated whenever new offspring are created. Each individual (chromosome) has a fitness value, which helps in finding the best individual: the individual with the maximum fitness value is the best one in the population. In CVRP, if the total distance of a given solution is minimal, it has the maximum fitness value. Therefore, the fitness value is usually formulated as:


f_i = 100 / TotalDist_i   (26)

where f_i is the fitness function and TotalDist_i is the total route distance of the i-th individual.

3.4 Selection Operator

The selection operator is used to select parents from the current population in order to create new individuals for the next generation by means of the crossover and mutation operators. Some well-known selection operators are tournament selection, roulette-wheel selection and the steady-state approach. Tournament selection finds the best individual among the individuals entered in the tournament; for example, if the tournament size is 2 (a binary tournament), two individuals are selected at random and the one with the better fitness value is the winner and is chosen as a parent. Roulette-wheel selection is proposed in order to select the chromosomes with the higher fitness values: the higher the fitness value, the more likely the selection. The steady-state approach compares the two parents with the two offspring and the best two chromosomes are selected.

3.5 The Crossover Operator

The crossover operator is the GA's main meta-heuristic mechanism for differentiating chromosomes from generation to generation. The two selected parents produce two offspring following a sequence of steps that depends on the type of crossover. There are different types of crossover operators, such as:
• Partially Mapped Crossover (PMX),
• Best Route Better Adjustment Recombination (BRBAX),
• Pereira crossover, etc.
PMX is one of the best-known crossover operators. Two cutting points are selected randomly and the offspring inherit the genes between the two cutting points in the same order. Then a mapping table is created between the two parents. According to the positions in the second parent, if a gene is not between the cutting points, it is placed on the first child's chromosome; however, if the gene has already been copied, it is ignored. The following example explains how PMX works. Suppose the chromosomes of the selected parents are: P1 = (3 5 2 1 4 8 7 5) and P2 = (2 4 5 6 7 3 1 8). Then, two cutting points are randomly generated, shown as black boxes in Fig. 3. The genes exchange information between the cutting points, as shown in Fig. 4. The mapping table is shown in Fig. 5.


Fig. 3 Example of parents for CVRP with cutting points

Fig. 4 An example of temporary children for CVRP

Fig. 5 Mapping table for CVRP
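For readers who prefer code to figures, the function below is a standard one-offspring implementation of PMX (the second child is obtained by swapping the parents); it is a textbook sketch rather than the exact variant used in this study.

import random

def pmx(parent1, parent2, cut1=None, cut2=None):
    # Parents must be permutations of the same elements.
    size = len(parent1)
    if cut1 is None or cut2 is None:
        cut1, cut2 = sorted(random.sample(range(size), 2))
    child = [None] * size
    # 1. Copy the segment between the cut points from the first parent.
    child[cut1:cut2 + 1] = parent1[cut1:cut2 + 1]
    # 2. Fill the remaining positions from the second parent, following the
    #    mapping between the two segments to avoid duplicate genes.
    mapping = {parent1[i]: parent2[i] for i in range(cut1, cut2 + 1)}
    for i in list(range(cut1)) + list(range(cut2 + 1, size)):
        gene = parent2[i]
        while gene in child[cut1:cut2 + 1]:
            gene = mapping[gene]
        child[i] = gene
    return child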

Fig. 6 An example of BRBAX crossover operator

Best Route Better Adjustment Recombination was constructed specifically for this problem. This operator tries to pass on to the new generation the best routes, i.e. those that have a shorter distance and meet the capacity limit. The procedure is as follows. In the first step, one of the parents is selected, its m routes are sorted, and the m/2 routes with the minimum distance are selected and placed as the first genes of the child's chromosome. Then, in a second step, the genes of the other parent are passed on to the child in the order in which they appear in that parent. Figure 6 shows an example of the BRBAX crossover operation; in contrast, Fig. 7 shows the classic crossover for CVRP.
The crossover suggested by Pereira is another operator that can be used in CVRP. First, a sub-route is randomly selected (by choosing a cutting point) from the second parent. Then it is inserted next to the gene (customer) that is closest to the first customer of the sub-route, so the insertion is likely to result in a shorter distance. The following example assumes that customer 6 is closest to customer 9: the sub-route (9, 1, 10) is randomly selected and is inserted next to 6.


Fig. 7 An example of classic crossover operator

3.6 Mutation Operator

The mutation operator is used to escape local optima; when mutation is not used, local optima can be mistaken for the optimal solution. Swap, insertion, displacement, inversion and scramble mutations are well-known types of mutation. Details of these mutation operators are given below:
• Swap: two points are randomly selected from the chromosome and swapped; the genes can belong to the same or to different routes.
• Insertion: a gene is randomly selected, removed from its position, and re-inserted at a new position.
• Displacement: a sub-group of genes is selected and moved to a new position; a route may be extended or a new route may be created.
• Inversion: the genes between two randomly selected positions are inverted.
• Scramble: a sub-group of genes is selected randomly and the genes are shuffled.
All of the above mutations can be local or global, that is, they can operate within a single route or on the whole chromosome.

3.7 Elite

The best individual can be preserved so that it is not altered by crossover and mutation; it is then inherited unchanged by the next generations. This mechanism is the so-called elitism.

4 Heuristics

4.1 Savings Algorithm

The savings algorithm is a heuristic algorithm, therefore it does not guarantee an optimal solution to the problem. However, this method often produces a relatively good solution, i.e. one that does not differ much from the optimal solution.


Fig. 8 Illustration of a savings concept

The basic concept of savings expresses the cost savings obtained by combining two routes into one route, as shown in Fig. 8, where point 0 represents the warehouse. Initially, in Fig. 8a, customers i and j are visited on separate routes. An alternative is to visit the two customers on one route, for example in the sequence i–j, as illustrated in Fig. 8b. Given the transport costs of the routes in Fig. 8a, b, the savings that result from driving the single route in Fig. 8b instead of the two routes in Fig. 8a can be calculated. Denoting the transport cost between two given points i and j by c_{ij}, the total transport cost D_a in Fig. 8a is:

D_a = c_{0i} + c_{i0} + c_{0j} + c_{j0}   (27)

Accordingly, the transport cost D_b in Fig. 8b is:

D_b = c_{0i} + c_{ij} + c_{j0}   (28)

By combining the two routes, the savings S_{ij} are obtained:

S_{ij} = D_a − D_b = c_{i0} + c_{0j} − c_{ij}   (29)

Relatively large values of S_{ij} indicate that it is attractive, in terms of cost, to visit points i and j on the same route, such that point j is visited immediately after point i. There are two versions of the savings algorithm, sequential and parallel. In the sequential version, only one route is built at a time (excluding routes with only one client), while in the parallel version more than one route may be built at a time. In the first step of the savings algorithm, savings are calculated for all customer pairs and all pairs of customer points are sorted in descending order of savings. Then, one pair of points at a time is considered from the top of the sorted list. When considering the pair of points i–j, the two routes visiting i and j are connected (such that j is visited immediately after i on the resulting route), provided this can be done without removing a previously established direct connection between two client points, and provided the total demand on the resulting route does not exceed the vehicle capacity. In the sequential version you have to start from the beginning of the list each time a fresh connection is established between a pair of points (because combinations that were not viable so far may have become profitable), while the parallel version requires only one pass through the list [15].
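A simplified Python sketch of the parallel savings (Clarke–Wright) procedure under the assumptions above; it only merges a route tail with a route head and ignores several refinements, so it illustrates the idea rather than being a production implementation.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def savings_routes(depot, customers, demand, capacity):
    """Simplified parallel Clarke-Wright savings heuristic.

    depot: (x, y); customers: {i: (x, y)}; demand: {i: q}; capacity: Q.
    Starts with one route per customer and merges routes in order of
    decreasing savings S_ij = c_i0 + c_0j - c_ij.
    """
    routes = {i: [i] for i in customers}              # route containing customer i
    savings = sorted(
        ((dist(customers[i], depot) + dist(depot, customers[j])
          - dist(customers[i], customers[j]), i, j)
         for i in customers for j in customers if i < j),
        reverse=True)
    for s, i, j in savings:
        ri, rj = routes[i], routes[j]
        if ri is rj:
            continue                                  # already on the same route
        # merge only when i ends its route, j starts its route,
        # and the joint demand still fits in one vehicle
        if ri[-1] == i and rj[0] == j and \
                sum(demand[k] for k in ri) + sum(demand[k] for k in rj) <= capacity:
            merged = ri + rj
            for k in merged:
                routes[k] = merged
    unique = {id(r): r for r in routes.values()}.values()
    return [[0] + r + [0] for r in unique]            # 0 marks the warehouse
```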

4.2 Dijkstra Algorithm

The Dijkstra algorithm was introduced by the Dutch computer scientist Dijkstra in 1959 [17]. It solves the problem of the shortest distance from a single starting point to the other points of a directed graph and computes the shortest routes once the edge weights have been taken into account. The calculation steps of the Dijkstra algorithm are as follows:
1. Divide all nodes of the graph into two sets, S and U: the set of visited nodes is placed in S, initially empty; the set of unvisited nodes is placed in U, initially containing all distribution sites.
2. Mark the starting point 0 (usually a logistics centre) with a permanent label and move it from U to S. Set the predecessor of the starting point P(0) = NULL, the travel cost to the starting point L(0) = 0, and i = 0. For all other nodes j set the travel cost L(j) = ∞. Here P(i) denotes the predecessor of node i and Γ(i) the set of nodes adjacent to node i.
3. Update all temporarily labelled nodes j in Γ(i): L(j) = min[L(j), L(i) + w(i, j)]; if L(i) + w(i, j) < L(j), then P(j) = i.
4. Select from the set U the node j with the smallest L(j).
5. Mark node j with a permanent label, move it from set U to set S and let i = j.
6. If i = D, the shortest distance from origin 0 to distribution point D has been found and L(D) is the minimum travel cost. If i ≠ D, return to step 3 to continue the calculations.
The advantage of the Dijkstra algorithm is that it does not have to go through all the nodes to find the shortest route. Once the shortest route to a distribution destination has been found, any other route to that point will necessarily cost more than this route, and every sub-route of the shortest route is itself a shortest route. However, even though the Dijkstra algorithm obtains the optimal solution, it wastes a lot of time exploring unnecessary directions when the traffic network is large. For example, when the distribution site lies to the southeast of the starting point, the Dijkstra algorithm will still explore possible routes in all directions, including the northwest [5].
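A compact Python version of these steps, assuming an adjacency-list representation of the graph; the priority queue plays the role of the set U ordered by L(j), and the example graph is hypothetical.

```python
import heapq

def dijkstra(graph, start):
    """Shortest travel costs from `start` to every node.

    graph: {node: [(neighbour, weight), ...]}.
    Returns (L, P): minimum cost L(j) and predecessor P(j) for each node.
    """
    L = {node: float("inf") for node in graph}   # tentative travel costs
    P = {node: None for node in graph}           # predecessor on the shortest route
    L[start] = 0
    queue = [(0, start)]                         # set U, ordered by cost
    visited = set()                              # set S of permanently labelled nodes
    while queue:
        cost, i = heapq.heappop(queue)
        if i in visited:
            continue
        visited.add(i)                           # node i gets a permanent label
        for j, w in graph[i]:
            if cost + w < L[j]:                  # relaxation step (step 3)
                L[j] = cost + w
                P[j] = i
                heapq.heappush(queue, (L[j], j))
    return L, P

# Example: shortest routes from the logistics centre (node 0)
example = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
print(dijkstra(example, 0))   # L = {0: 0, 1: 3, 2: 1, 3: 4}
```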

4.3 Christofides Algorithm

Christofides' algorithm [11] is an approximation algorithm for solving the VRP. To apply this algorithm, we build a complete weighted graph G of the problem. Each city (client) of the VRP is represented as a node of the graph G. The weight of each edge is the Euclidean distance between the edge's vertices. The steps of Christofides' algorithm are as follows:


1. Find a minimum spanning tree of G, named T.
2. Consider the set O of odd-degree vertices of T and find a minimum-weight perfect matching M on the vertices of O.
3. Add the edges of M to T to create a multigraph H.
4. Generate an Euler cycle in H. If parallel edges are encountered while generating the Euler cycle, use only one of them; instead of going back to the previous node, go to the next node. The result is a Hamiltonian cycle.
Now we solve an example problem using Christofides' algorithm. First, we generate the complete graph G. The generated data are presented in Table 1. Next, the minimum spanning tree T is created. The structure of tree T is shown in Fig. 9a. The vertices 1 and 4 are of odd degree and form the set O. Then the edges 1–2 and 4–6 form the perfect matching M of the vertices of O. The resulting pseudo-graph H, obtained by adding the matching edges M to the tree T, is shown in Fig. 9b. Matching edges are marked with a dotted line. The cycle 3–7–5–6–4–2–1–2–3 is the Euler cycle obtained on this graph. The output of the Hamiltonian cycle construction is shown in Fig. 9c. The Hamiltonian cycle is obtained by skipping already visited cities (nodes). The Hamiltonian cycle 1–2–3–7–5–6–4–1 is the solution of the Christofides algorithm. It can be seen that this solution is equal to the optimal solution obtained by mathematical programming techniques. Christofides' algorithm is a good heuristic algorithm, which gives an almost optimal solution [13].

Table 1 Information about the traveling salesman problem

City | Horizontal coordinate | Vertical coordinate
1 | 292 | 495
2 | 293 | 421
3 | 296 | 418
4 | 271 | 401
5 | 283 | 406
6 | 279 | 399
7 | 295 | 402

Fig. 9 Illustration of the Christofides algorithm: a minimum spanning tree T, b pseudo-graph H, c resulting Hamiltonian cycle
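A small Python check of the reported result, assuming Euclidean distances and the coordinates from Table 1; it simply computes the length of the Hamiltonian cycle 1–2–3–7–5–6–4–1.

```python
from math import dist

# Coordinates of cities 1-7 from Table 1
coords = {1: (292, 495), 2: (293, 421), 3: (296, 418), 4: (271, 401),
          5: (283, 406), 6: (279, 399), 7: (295, 402)}

def tour_length(tour):
    """Total Euclidean length of a closed tour given as a list of city numbers."""
    return sum(dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:]))

hamiltonian = [1, 2, 3, 7, 5, 6, 4, 1]   # cycle produced by the Christofides algorithm
print(round(tour_length(hamiltonian), 2))
```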


5 Experiments

This section is devoted to the research; we check how close to the optimum the solutions obtained with particular heuristics are in comparison with the genetic algorithm.

5.1 Testing the Efficiency of the Proposed Solutions of Selected Heuristics in Comparison with the Genetic Algorithm

While conducting the research, it was assumed that the warehouse would be the first of the generated cities. The input parameters for the vehicle routing problem are presented in Table 2. The input parameters for the genetic algorithm are presented in Table 3.

5.2 Experiment 1

Location data, coordinates and the order quantity for each city are presented in Table 4. The results of the tests in terms of the distance covered can be seen in

Table 2 Parameters and values of the vehicle routing problem for a given study

Parameter | Value
Number of cities | 20
Number of drivers | 10
Capacity of driver's truck | 15
Order quantity for each city | 1–10

Table 3 Parameters and values of the genetic algorithm for a given study

Parameter | Value
The number of chromosomes | 3000
Size of population | 1000
Elite coefficient | 0.7
Number of iterations | 10,000
Crossover factor | 1
Mutation rate | 0.1

Table 4 Location data in experiment 1

City number | X coordinate | Y coordinate | Order quantity
1 | 977 | 44 | 0
2 | 671 | 219 | 6
3 | 745 | 206 | 6
4 | 586 | 755 | 6
5 | 362 | 590 | 7
6 | 900 | 792 | 2
7 | 446 | 852 | 9
8 | 762 | 984 | 4
9 | 824 | 383 | 6
10 | 736 | 761 | 2
11 | 731 | 363 | 5
12 | 754 | 145 | 2
13 | 560 | 425 | 1
14 | 643 | 919 | 8
15 | 306 | 747 | 8
16 | 827 | 973 | 7
17 | 616 | 657 | 10
18 | 375 | 375 | 9
19 | 123 | 440 | 4
20 | 472 | 216 | 1

Table 5. The results of the tests in terms of the selected routes and vehicle filling can be seen in Table 6. From the obtained results, it can be observed that some routes for the genetic algorithm, the savings heuristic and the cheapest-arc-path heuristic were selected in a very similar way. The genetic algorithm turned out to be the best in terms of selecting the optimal route, and the Christofides heuristic the worst. When it comes to running time, the Christofides algorithm turned out to be the fastest, while the genetic algorithm was the slowest. The summary of this experiment can be seen in Table 7. The results for experiment 1 are presented in the graphs in Fig. 10. Figure 10 shows that the shortest route was selected by the genetic algorithm, with slightly longer routes selected by the Dijkstra and savings heuristics, and the longest by the Christofides algorithm. As a percentage, the longest route deviates from the shortest by 14.7%, and the remaining two by up to 1%. The genetic algorithm needed the most time to find a solution, and the cheapest-arc-path heuristic the least.


Table 5 Test results in terms of the distance for experiment 1

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | 1429.93 | 1960.92 | 0.00 | 734.11
2 | 1874.24 | 1948.60 | 0.00 | 1933.07
3 | 1960.92 | 2039.63 | 1948.60 | 1960.92
4 | 1993.97 | 1982.09 | 1960.92 | 2674.77
5 | 1948.60 | 1909.05 | 2039.63 | 1971.23
6 | 1964.92 | 1429.93 | 1429.93 | 1905.51
7 | 714.10 | 714.10 | 1982.09 | 1710.32
8 | 0.00 | 0.00 | 1909.05 | 743.85
9 | 0.00 | 0.00 | 0.00 | 0.00
10 | 0.00 | 0.00 | 734.11 | 0.00

Table 6 Test results in terms of vehicle filling for experiment 1

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | [0, 10, 16, 0] – 15 | [0, 4, 14, 0] – 15 | [0] | [0, 11, 2, 1, 0] – 14
2 | [0, 13, 8, 0] – 14 | [0, 3, 6, 0] – 15 | [0] | [0, 17, 3, 0] – 15
3 | [0, 14, 4, 0] – 15 | [0, 7, 13, 9, 0] – 14 | [0, 3, 6, 0] – 15 | [0, 4, 14, 0] – 15
4 | [0, 9, 7, 15, 5, 0] – 15 | [0, 19, 17, 18, 12, 0] – 15 | [0, 4, 14, 0] – 15 | [0, 18, 6, 5, 0] – 15
5 | [0, 3, 6, 0] – 15 | [0, 5, 15, 8, 0] – 15 | [0, 7, 13, 9, 0] – 14 | [0, 15, 7, 0] – 11
6 | [0, 12, 17, 18, 19, 0] – 15 | [0, 10, 16, 0] – 15 | [0, 10, 16, 0] – 15 | [0, 9, 13, 10, 0] – 15
7 | [0, 11, 1, 2, 0] – 14 | [0, 11, 1, 2, 0] – 14 | [0, 19, 17, 18, 12, 0] – 15 | [0, 19, 12, 16, 0] – 12
8 | [0] | [0] | [0, 5, 15, 8, 0] – 15 | [0, 8, 0] – 6
9 | [0] | [0] | [0] | [0]
10 | [0] | [0] | [0, 11, 2, 1, 0] – 14 | [0]


Table 7 Summary for experiment 1

 | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
Total time [ms] | 9547 | 116 | 8 | 9
Total load | 103 | 103 | 103 | 103
Total distance [m] | 11,886.68 | 11,984.32 | 12,004.33 | 13,633.78

Fig. 10 Experiment 1—results of a total distance, b time to find a solution

5.3 Experiment 2

Location data, coordinates and the order quantity are presented in Table 8. This set differs from the previous one in its location coordinates and order quantities.


Table 8 Location data in experiment 2

City number | X coordinate | Y coordinate | Order quantity
1 | 654 | 865 | 0
2 | 609 | 940 | 7
3 | 933 | 447 | 9
4 | 552 | 86 | 9
5 | 487 | 869 | 1
6 | 940 | 969 | 10
7 | 468 | 526 | 1
8 | 847 | 601 | 6
9 | 630 | 578 | 2
10 | 731 | 998 | 4
11 | 32 | 797 | 8
12 | 424 | 596 | 3
13 | 721 | 484 | 1
14 | 290 | 991 | 6
15 | 133 | 942 | 8
16 | 794 | 499 | 3
17 | 511 | 704 | 8
18 | 381 | 794 | 7
19 | 649 | 967 | 4
20 | 359 | 870 | 2

The results of the tests in terms of the distance for experiment 2 can be seen in Table 9. The results of the tests in terms of the selected routes and vehicle filling can be seen in Table 10. The results in Table 10 show that many routes were selected in a similar way. The algorithm that selected the most optimal route turned out to be the genetic algorithm, while the least optimal route was selected by the savings algorithm. The routes were determined fastest by the Christofides heuristic and slowest by the genetic algorithm. These results can be seen in Table 11. The results for experiment 2 can be seen in the graphs in Fig. 11. From Fig. 11 it can be concluded that the most optimal route was chosen by the genetic algorithm, with the Dijkstra heuristic producing a slightly longer route; longer routes were selected by the other two algorithms. The Dijkstra heuristic deviated by 1.9% from the best solution, the Christofides heuristic by 6.3%, and the savings heuristic by 8.1%.


Table 9 Test results in terms of the distance for experiment 2

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | 1684.93 | 1142.32 | 0.00 | 377.07
2 | 1005.97 | 1089.89 | 0.00 | 1257.85
3 | 1539.98 | 1670.70 | 1684.93 | 1005.97
4 | 1076.32 | 1256.80 | 1256.80 | 1670.70
5 | 669.01 | 669.01 | 1089.89 | 1076.32
6 | 655.53 | 893.79 | 1005.97 | 655.53
7 | 237.84 | 705.04 | 669.01 | 1255.81
8 | 0.00 | 0.00 | 0.00 | 0.00
9 | 0.00 | 0.00 | 1057.78 | 0.00
10 | 0.00 | 0.00 | 237.85 | 0.00

Table 10 Test results in terms of vehicle filling for experiment 2

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | [0, 15, 12, 3, 8, 0] – 15 | [0, 12, 2, 15, 0] – 13 | [0] | [0, 1, 18, 9, 0] – 15
2 | [0, 2, 7, 0] – 15 | [0, 4, 13, 14, 0] – 15 | [0] | [0, 5, 15, 12, 0] – 14
3 | [0, 6, 11, 10, 19, 4, 0] – 15 | [0, 8, 3, 6, 11, 0] – 15 | [0, 15, 12, 3, 8, 0] – 15 | [0, 2, 7, 0] – 15
4 | [0, 13, 14, 0] – 14 | [0, 10, 17, 0] – 15 | [0, 17, 10, 0] – 15 | [0, 8, 3, 6, 11, 0] – 15
5 | [0, 9, 5, 0] – 14 | [0, 5, 9, 0] – 14 | [0, 4, 13, 14, 0] – 15 | [0, 14, 13, 0] – 14
6 | [0, 16, 17, 0] – 15 | [0, 7, 16, 0] – 14 | [0, 7, 2, 0] – 15 | [0, 16, 17, 0] – 15
7 | [0, 1, 18, 0] – 11 | [0, 19, 1, 18, 0] – 13 | [0, 9, 5, 0] – 14 | [0, 10, 19, 4, 0] – 11
8 | [0] | [0] | [0] | [0]
9 | [0] | [0] | [0, 19, 11, 6, 16, 0] – 14 | [0]
10 | [0] | [0] | [0, 1, 18, 0] – 11 | [0]


Table 11 Summary for experiment 2

 | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
Total time [ms] | 9583 | 102 | 9 | 7
Total load | 99 | 99 | 99 | 99
Total distance [m] | 6869.58 | 7427.55 | 7002.23 | 7299.24

Fig. 11 Experiment 2—results of a total distance, b time to find a solution

Table 12 Location data in experiment 3

City number | X coordinate | Y coordinate | Order quantity
1 | 359 | 10 | 0
2 | 740 | 330 | 4
3 | 968 | 614 | 5
4 | 991 | 804 | 1
5 | 927 | 574 | 1
6 | 453 | 46 | 3
7 | 434 | 890 | 10
8 | 840 | 190 | 9
9 | 778 | 772 | 7
10 | 178 | 128 | 3
11 | 419 | 969 | 2
12 | 739 | 619 | 4
13 | 929 | 683 | 8
14 | 672 | 200 | 4
15 | 795 | 221 | 9
16 | 174 | 972 | 8
17 | 593 | 94 | 1
18 | 235 | 823 | 3
19 | 629 | 75 | 4
20 | 984 | 470 | 3

5.4 Experiment 3

Location data, coordinates and the order quantity are presented in Table 12. This set differs from the previous one in its location coordinates and order quantities. The results of experiment 3 in terms of the distance traveled can be seen in Table 13. The results of the tests in terms of the selected routes and vehicle filling can be seen in Table 14. The results of this test show fewer similarities in the selection of routes. The most optimal route was selected by the genetic algorithm, while the least optimal route was selected by the Dijkstra heuristic. Calculating the routes took the longest for the genetic algorithm and the shortest for the Dijkstra heuristic. These results are presented in Table 15. The results for experiment 3 are presented in the graphs in Fig. 12. Figure 12 shows that slightly longer routes were determined by the Christofides algorithm and the savings algorithm, by 1.9% and 1.5% respectively, compared with the most optimal solution. The least optimal turned out to be the Dijkstra heuristic, 12.4% longer than the route determined by the genetic algorithm.


Table 13 Test results in terms of the distance for experiment 3

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | 2054.52 | 2067.02 | 0.00 | 556.74
2 | 1818.93 | 2032.37 | 0.00 | 1133.12
3 | 975.31 | 2043.48 | 0.00 | 1048.03
4 | 1031.64 | 1104.02 | 0.00 | 2037.63
5 | 2336.83 | 1048.03 | 1926.82 | 2075.87
6 | 2201.22 | 1924.48 | 2342.03 | 1924.48
7 | 201.32 | 556.74 | 2043.48 | 2043.48
8 | 0.00 | 0.00 | 2718.71 | 0.00
9 | 0.00 | 0.00 | 1848.10 | 0.00
10 | 0.00 | 0.00 | 1055.49 | 0.00

Table 14 Test results in terms of vehicle filling for experiment 3

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | [0, 6, 10, 9, 0] – 15 | [0, 4, 2, 3, 12, 0] – 15 | [0] | [0, 5, 18, 0] – 7
2 | [0, 4, 2, 12, 0] – 14 | [0, 16, 19, 8, 11, 0] – 15 | [0] | [0, 16, 1, 14, 0] – 14
3 | [0, 13, 14, 0] – 13 | [0, 9, 15, 17, 0] – 14 | [0] | [0, 7, 13, 0] – 13
4 | [0, 18, 7, 16, 0] – 14 | [0, 1, 14, 0] – 13 | [0] | [0, 19, 2, 8, 0] – 15
5 | [0, 17, 15, 1, 0] – 15 | [0, 7, 13, 0] – 13 | [0, 8, 12, 0] – 15 | [0, 4, 12, 3, 11, 0] – 14
6 | [0, 19, 3, 8, 11, 0] – 15 | [0, 10, 6, 0] – 12 | [0, 6, 2, 0] – 15 | [0, 10, 6, 0] – 12
7 | [0, 5, 0] – 3 | [0, 5, 18, 0] – 7 | [0, 17, 15, 9, 0] – 14 | [0, 9, 15, 17, 0] – 14
8 | [0] | [0] | [0, 7, 19, 3, 10, 0] – 15 | [0]
9 | [0] | [0] | [0, 16, 11, 4, 14, 0] – 15 | [0]
10 | [0] | [0] | [0, 5, 18, 13, 1, 0] – 15 | [0]


Table 15 Summary for experiment 3

 | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
Total time [ms] | 10,269 | 114 | 7 | 107
Total load | 89 | 89 | 89 | 89
Total distance [m] | 10,619.77 | 10,776.14 | 11,934.62 | 10,819.36

Fig. 12 Experiment 3—results of a total distance, b time to find a solution

5.5 Experiment 4

Location data, coordinates and the order quantity are presented in Table 16. This set differs from the previous one in its location coordinates and order quantities. The results of the tests in terms of the distance traveled can be seen in Table 17. The results of the tests in terms of the selected routes and vehicle filling can be seen in Table 18.


Table 16 Location data in experiment 4

City number | X coordinate | Y coordinate | Order quantity
1 | 749 | 695 | 0
2 | 342 | 939 | 9
3 | 936 | 611 | 7
4 | 141 | 352 | 3
5 | 511 | 415 | 2
6 | 390 | 798 | 1
7 | 74 | 53 | 2
8 | 349 | 869 | 7
9 | 791 | 656 | 3
10 | 377 | 487 | 3
11 | 864 | 860 | 1
12 | 833 | 922 | 1
13 | 998 | 814 | 2
14 | 868 | 180 | 1
15 | 504 | 150 | 4
16 | 584 | 944 | 10
17 | 491 | 898 | 2
18 | 201 | 931 | 3
19 | 925 | 18 | 6
20 | 68 | 393 | 1

Table 17 Test results in terms of the distance for experiment 4

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | 1498.41 | 2195.25 | 0.00 | 414.14
2 | 1228.64 | 1283.82 | 0.00 | 1927.85
3 | 1035.65 | 1498.41 | 0.00 | 2450.98
4 | 2195.24 | 1035.65 | 0.00 | 1035.65
5 | 984.17 | 984.17 | 0.00 | 1283.82
6 | 0.00 | 0.00 | 1228.64 | 0.00
7 | 0.00 | 0.00 | 2537.16 | 0.00
8 | 0.00 | 0.00 | 2195.25 | 0.00
9 | 0.00 | 0.00 | 597.41 | 0.00
10 | 0.00 | 0.00 | 764.19 | 0.00


Table 18 Test results in terms of vehicle filling for experiment 4

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | [0, 13, 18, 2, 0] – 14 | [0, 4, 14, 6, 19, 3, 9, 0] – 15 | [0] | [0, 8, 2, 0] – 10
2 | [0, 16, 1, 17, 5, 0] – 15 | [0, 5, 1, 17, 16, 0] – 15 | [0] | [0, 13, 18, 14, 9, 0] – 14
3 | [0, 15, 11, 10, 12, 0] – 14 | [0, 13, 18, 2, 0] – 14 | [0] | [0, 4, 19, 6, 3, 7, 0] – 15
4 | [0, 4, 14, 6, 19, 3, 9, 0] – 15 | [0, 12, 10, 11, 15, 0] – 14 | [0] | [0, 12, 10, 11, 15, 0] – 14
5 | [0, 7, 8, 0] – 10 | [0, 7, 8, 0] – 10 | [0] | [0, 5, 1, 17, 16, 0] – 15
6 | [0] | [0] | [0, 5, 17, 1, 16, 0] – 15 | [0]
7 | [0] | [0] | [0, 7, 11, 13, 18, 0] – 15 | [0]
8 | [0] | [0] | [0, 9, 3, 19, 6, 14, 4, 0] – 15 | [0]
9 | [0] | [0] | [0, 15, 0] – 10 | [0]
10 | [0] | [0] | [0, 8, 2, 12, 10, 0] – 13 | [0]

Table 19 Summary for experiment 4

 | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
Total time [ms] | 9600 | 107 | 7 | 9
Total load | 68 | 68 | 68 | 68
Total distance [m] | 6942.11 | 6997.28 | 7322.66 | 7112.43

In this experiment, the routes found by the genetic algorithm and the savings heuristic were very similar. The genetic algorithm turned out to be the most optimal in route selection, while the least optimal was the Dijkstra heuristic. The routes were determined slowest by the genetic algorithm and fastest by the Dijkstra heuristic, as can be seen in the results of Table 19. The results for experiment 4 can be seen in the graphs in Fig. 13.


Fig. 13 Experiment 4—results of a total distance, b time to find a solution

From the fourth experiment in Fig. 13, it can be observed that the Dijkstra heuristic turned out to be the least optimal, with a route 5.5% longer than the most optimal one. The remaining routes were longer by 0.8% (savings heuristic) and 2.5% (Christofides heuristic).

5.6 Experiment 5

Location data, coordinates and the order quantity are presented in Table 20. This set differs from the previous one in its location coordinates and order quantities. The results of the tests in terms of the distance traveled can be seen in Table 21. The results of the tests in terms of the selected routes and vehicle filling are presented in Table 22. In this experiment few routes overlapped; the most optimal in terms of route selection turned out to be the genetic algorithm, and the least optimal the Christofides heuristic.

Table 20 Location data in experiment 5

City number | X coordinate | Y coordinate | Order quantity
1 | 677 | 10 | 0
2 | 671 | 240 | 4
3 | 232 | 503 | 6
4 | 133 | 846 | 5
5 | 760 | 911 | 4
6 | 230 | 921 | 2
7 | 136 | 127 | 4
8 | 402 | 452 | 3
9 | 780 | 618 | 2
10 | 464 | 344 | 1
11 | 366 | 230 | 1
12 | 684 | 174 | 1
13 | 887 | 382 | 2
14 | 971 | 192 | 7
15 | 794 | 751 | 6
16 | 504 | 432 | 8
17 | 107 | 241 | 9
18 | 582 | 941 | 2
19 | 133 | 882 | 9
20 | 89 | 65 | 8

Table 21 Test results in terms of the distance for experiment 5

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | 881.01 | 2061.21 | 0.00 | 1311.32
2 | 2103.31 | 2540.96 | 0.00 | 2195.42
3 | 1221.88 | 1585.73 | 0.00 | 2061.22
4 | 1569.45 | 1286.17 | 0.00 | 2893.25
5 | 2465.61 | 1132.60 | 2103.31 | 1269.52
6 | 1643.64 | 1827.44 | 2409.00 | 1682.53
7 | 0.00 | 0.00 | 1569.45 | 0.00
8 | 0.00 | 0.00 | 1221.87 | 0.00
9 | 0.00 | 0.00 | 2011.13 | 0.00
10 | 0.00 | 0.00 | 940.63 | 0.00


Table 22 Test results in terms of vehicle filling for experiment 5

Driver no | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
1 | [0, 11, 1, 13, 0] – 10 | [0, 9, 3, 18, 0] – 15 | [0] | [0, 6, 16, 10, 0] – 14
2 | [0, 18, 3, 10, 0] – 15 | [0, 10, 2, 5, 17, 4, 0] – 15 | [0] | [0, 7, 2, 17, 1, 0] – 15
3 | [0, 19, 6, 0] – 12 | [0, 8, 14, 12, 11, 0] – 11 | [0] | [0, 9, 18, 3, 0] – 15
4 | [0, 2, 16, 0] – 15 | [0, 6, 16, 0] – 13 | [0] | [0, 19, 5, 4, 11, 0] – 15
5 | [0, 14, 4, 17, 5, 9, 0] – 15 | [0, 1, 7, 15, 0] – 15 | [0, 10, 3, 18, 0] – 15 | [0, 15, 12, 0] – 10
6 | [0, 7, 15, 8, 12, 0] – 15 | [0, 13, 19, 0] – 15 | [0, 11, 14, 17, 5, 7, 9, 0] – 15 | [0, 8, 14, 13, 0] – 15
7 | [0] | [0] | [0] | [0]
8 | [0] | [0] | [0, 19, 6, 0] – 12 | [0]
9 | [0] | [0] | [0, 13, 12, 8, 4, 0] – 15 | [0]
10 | [0] | [0] | [0, 1, 15, 0] – 12 | [0]

Table 23 Summary for experiment 5

 | Genetic algorithm | Savings heuristic | Dijkstra heuristic | Christofides heuristic
Total time [ms] | 9363 | 105 | 7 | 8
Total load | 84 | 84 | 84 | 84
Total distance [m] | 9884.9 | 10,434.11 | 10,255.41 | 11,413.25

The Dijkstra heuristic found its solution the fastest, while the genetic algorithm was the slowest. The results are presented in Table 23. The results for this data set are presented in Fig. 14. Figure 14 shows that, after the genetic algorithm, the closest to optimal was the Dijkstra heuristic (3.5% longer), followed by the savings heuristic (5.6%) and the Christofides heuristic (15.5%). The genetic algorithm took the longest time to find the optimal route, and the Dijkstra heuristic the shortest.


Fig. 14 Experiment 5—results of a total distance, b time to find a solution

5.7 Comparison

Across the above experiments, it can be seen how much longer the routes produced by particular heuristics were compared with the genetic algorithm. The results were averaged arithmetically and are presented in Fig. 15a. The genetic algorithm was the algorithm that chose the most optimal route; the second best was the savings heuristic, whose routes were on average 3.4% longer. The third was the Dijkstra heuristic, which chose routes longer by 4.9%, and the Christofides heuristic turned out to be the least optimal, worse by 8.1%. Figure 15b shows that the genetic algorithm needed the most time to determine a solution, followed by the savings heuristic and, third, the Christofides heuristic. The Dijkstra heuristic turned out to be the fastest.


Fig. 15 a Comparison of solutions for the optimal solution, b time to find a solution

6 Summary

After analyzing the research, it can be concluded that the genetic algorithm gave the most optimal route selection. It was followed by the savings heuristic, the Dijkstra heuristic and the Christofides heuristic. The research was conducted with the assumption that no driver can exceed a truck capacity of 15 and that the order quantity for each location is not higher than 10, as shown in Table 2. Additionally, the genetic algorithm had its own parameters, which are listed in Table 3. However, if we compare the routing times, the Dijkstra heuristic turns out to be the fastest, followed by the Christofides heuristic and the savings heuristic. If the main criterion is route optimality, the genetic algorithm should be chosen; if it is the speed of the route calculation, it is worth choosing the Dijkstra heuristic. When analyzing vehicle filling, it is worth noting that the least optimal in this respect is the Christofides heuristic, which in one of the experiments needed more drivers than the other algorithms. In the real world, parameters such as the number of drivers used are important because they introduce additional factors such as equipment (vehicle) operating costs.


Each of the algorithms has its own characteristics, so when planning routes it is worth choosing the algorithm appropriate to the task. Many aspects of the strategic route planning problem have already been explored, but scientific research still puts emphasis on:
• strict client time windows,
• drivers' timetables,
• multiple warehouses,
• multitasking,
• choosing a carrier or outsourcing,
• regulations for drivers,
• collection and delivery,
• the number of partial checks,
• dynamic problems with fast response times.

References
1. https://developers.google.com/optimization/routing
2. https://developers.google.com/optimization/routing/cvrp
3. Gutin, G., Punnen, A.P.: The traveling salesman problem and its variations (2001). https://doi.org/10.1007/b101971
4. Suthikarnnarunai, N.: A Sweep Algorithm for the Mix Fleet Vehicle Routing Problem. IMECS (2008). ISBN: 978-988-17012-1-3
5. Ding, D., Zou, X.: The Optimization of Logistics Distribution Route Based on Dijkstra's Algorithm and C-W Savings Algorithm, pp. 957–958 (2016). https://doi.org/10.2991/mmebc16.2016.200
6. Letchford, A., Lysgaard, J., Eglese, R.: A branch-and-cut algorithm for the capacitated open vehicle routing problem. J. Oper. Res. Soc. 1642–1651 (2007)
7. Chepuri, K., Homem-de-Mello, T.: Solving the vehicle routing problem with stochastic demands using the cross-entropy method. Ann. Oper. Res. 153–181 (2005)
8. Dror, M., Trudeau, P.: Savings by split delivery routing. Transportation Science 23(2), 141–145 (1989)
9. Dror, M., Trudeau, P.: Split delivery routing. Naval Research Logistics 37, 383–402 (1990). https://doi.org/10.1002/nav.3800370304
10. Dror, M., Laporte, G., Trudeau, P.: Vehicle routing with split deliveries (1994). https://doi.org/10.1016/0166-218X(92)00172-I
11. Golden, B.L., Raghavan, S., Wasil, E.: The Vehicle Routing Problem: Latest Advances and New Challenges, pp. 422–426 (2008). https://doi.org/10.1007/978-0-387-77778-8
12. Dror, M.: Arc Routing: Theory, Solutions and Applications. Springer (2000). https://doi.org/10.1007/978-1-4615-4495-1
13. Zanjirani Farahani, R., Miandoabchi, E.: Graph Theory for Operations Research and Management: Applications in Industrial Engineering. IGI Global (2012). ISBN: 978-1466626614
14. Toth, P., Vigo, D.: Models, relaxations and exact approaches for the capacitated vehicle routing problem. Discrete Applied Mathematics 123(1–3), 487–512 (2002). https://doi.org/10.1016/S0166-218X(01)00351-1
15. Clarke, G., Wright, J.V.: Scheduling of vehicles from a central depot to a number of delivery points. Oper. Res. 568–581 (1964)


16. Daglayan, H., Karakaya, M.: The impact of crossover and mutation operators on a GA solution for the capacitated vehicle routing problem. Universal Journal of Engineering Science 4(3), 39–44 (2016). https://doi.org/10.13189/ujes.2016.040301
17. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271 (1959)
18. Nazifa, H., Lee, L.S.: Optimised crossover genetic algorithm for capacitated vehicle routing problem. Applied Mathematical Modelling 36(5), 2110–2117 (2012)
19. Crevier, B., Cordeau, J.F., Laporte, G.: The multi-depot vehicle routing problem with interdepot routes. Eur. J. Oper. Res. 756–773 (2007)
20. Malandraki, C., Daskin, M.: Time dependent vehicle routing problems: formulations, properties and heuristic algorithms. Transp. Sci. 185–200 (1992)
21. Zeimpekis, V., Giaglis, G.: Urban dynamic real-time distribution services: insights from SMEs. J. Enterp. Inf. Manag. 367–388 (2006)
22. Mourgaya, M., Vanderbeck, F.: Column generation based heuristic for tactical planning in multi-period vehicle routing. Eur. J. Oper. Res. 1028–1041 (2007)
23. Brandão, J.: A new tabu search algorithm for the vehicle routing problem with backhauls. Eur. J. Oper. Res. 540–555 (2006)
24. Ropke, S., Pisinger, D.: An adaptive large neighborhood search heuristic for the pickup and delivery problem with time windows. Transp. Sci. 455–472 (2006)

Dynamic Analysis of Website Content Using a Mobile Application

Krzysztof Stepień and Dawid Kossowski

Abstract Nowadays, the use of mobile and internet applications has become extremely popular. It has led to a very large development in these areas. More and more data and information are provided to users, and their processing is a very time-consuming process. The existing solutions that allow for the improvement of the data selection process are usually limited from the user's point of view. The main purpose of this work is to create a mobile application that is used to analyze changes on websites. It allows the user to track all or any user-selected elements on any site that uses HTTPS (Hypertext Transfer Protocol Secure).

Keywords Web scraping · React Native · Multi-platform · NodeJS · GraphQL

1 Introduction

The continuous development of the electronic market makes access to devices and new technologies easier and more common. One device that almost everyone has is a mobile phone. The result of its popularity is the creation of newer and more optimized applications supporting the users. Software producers try to meet the requirements and adjust solutions to the needs of consumers in the best possible way. However, there are still some areas that can be improved to make life easier or save more time. Currently, the telephone market is dominated by two platforms. Android is the first in terms of popularity. It is estimated that the number of its users is around 75% of all people using a mobile phone. iOS is in second place, implemented in iPhones and used by 25% of mobile device users. The functioning of these systems differs significantly, and thus the process of creating applications for each of them requires different tools and skills, and a lot of work. Providing the software for only one selected platform entails the loss of potential users. So, a common problem has become how to quickly and


efficiently ensure the proper operation of an application on both of these dissimilar platforms. Gradually, multi-platform solutions started to appear, which allowed support for both of these systems to be created with a single application. Although today there are many ways to create multi-platform mobile applications, they are still not perfect. Universal access to the Internet and the ease of exchanging information through websites have led to the development of websites in almost every possible area of life. New technologies and systems affect the automation of many everyday user processes, such as, for example, online shopping. Each person who needs any information opens a web browser first in order to find it. The need to use websites has become everyday life. However, such dynamic development in the internet industry means that the amount of new data reaching users of websites and web applications is often impossible for them to process. Websites, especially those related to social media or marketing, provide more and more information in the displayed content, "distracting" the user from his main goal. The displayed data is usually tailored to the needs or interests of the users, which makes it very difficult for them to leave the site. The result is a large amount of time spent on such websites, which can also lead to Internet addiction. Although the application market is trying to provide new solutions that are designed to reduce the time spent by users on performing operations, such as the purchase process, it is rare to find software that supports the analysis and selection of information from pages. Combining a telephone application with the ability to observe information that is valuable in a certain context can become a solution for processing the huge amount of information on the Internet.

2 Web Scraping

Nowadays, when the Internet has become a source of knowledge and information, the process of downloading data from websites is necessary in many cases. Web scraping is the extraction of certain data, important in a specific context, from websites or Internet applications. The method of downloading them and their subsequent selection often depends on the problem they concern (Fig. 1). All web pages and applications are built from HTML documents, which are represented using the Document Object Model (DOM). These documents are processed and converted in an appropriate manner by browsers into an object model, which is independent of the user's operating system or platform. The DOM is an object that represents the entire web page with all information and metadata. Therefore, the user has the possibility to download, add, delete or modify HTML elements and attributes. Although this mechanism is provided by browsers and cannot be used directly outside of them, most web scraping applications are based on it. There are specialized libraries for processing HTML code, called parsers, which can recreate most of the functions provided by the DOM (Fig. 2) [1].


Fig. 2 HTML document schema

2.1 Web Scraping Techniques

Initially, information was analyzed and extracted manually. Data from the pages were copied to local text files. This approach was appropriate for a small research area. The data obtained in this way were very well adapted to the needs of the problem being solved. However, it involved monotonous work, in which it is easy to make a mistake or omit certain elements. As websites developed, this process turned out to be ineffective. Currently, manual data extraction has been almost completely replaced, but there are circumstances in which it is necessary to use it. Some websites block access to downloading content by software, and the only access to the information is manual, directly through the browser [2]. Another approach to analyzing an HTML page is to download it to a local file. Downloaded documents can be automatically transferred into spreadsheets, which allows data extraction from tables. This method was successful when the analyzed data


Fig. 3 Web scraping versus web crawling

related to one website. However, when data were to be analyzed from all sites in a given area, this process required a lot of human intervention and work. When the amount of data became so large that it was impossible to analyze manually, web scraping services and other similar tools were created. These services, like browsers, download the content of the website. They use an appropriate HTTP request for this purpose, and then, using appropriate parsers, read the received content and refer to it as an object. Then the necessary values and metadata are extracted [3]. This form of web scraping is currently the most popular. There are many tools that allow data to be retrieved from many URLs simultaneously. However, the automated method is not always the best. There are situations when the data is presented on the website in such a specific way that the software cannot transform it into a form from which it would be possible to read the necessary information (Fig. 3). Another approach is regular expressions. Regular expressions are patterns that are most often used to define a string of symbols. They are primarily used to describe regular languages. Using them, the user can define a template against which the selected content on the page is matched. Properly built algorithms can determine whether the currently analyzed content matches the defined pattern. They allow errors to be eliminated and information to be filtered, limiting it only to what is necessary. However, if the website changes its tags or data representation, the expressions may stop working properly and the operator needs to change the pattern [4].
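A minimal Python sketch of this approach, assuming the requests and BeautifulSoup libraries; the URL, tag names and price pattern are hypothetical and only illustrate downloading a page over HTTP, parsing it like a DOM and applying a regular expression to the raw content.

```python
import re
import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"             # hypothetical page to scrape
html = requests.get(url, timeout=10).text        # download the page over HTTP

soup = BeautifulSoup(html, "html.parser")        # parser builds a DOM-like object
# Extract values by traversing the parsed tree...
titles = [h.get_text(strip=True) for h in soup.find_all("h2")]
# ...or by matching a regular expression against the raw content
prices = re.findall(r"\d+\.\d{2}", html)

print(titles, prices)
```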


The biggest challenge faced by web crawling and data extraction programs are web applications which, unlike static pages, generate data dynamically. Such applications run on the server, communicating with the user's host via a browser. Modern technologies operate on this principle and data are usually generated only after the user opens a given URL. The content of such websites downloaded by scraping programs will be empty; scrapers can only download the content that is defined statically. One possible way around this limitation is headless browsers. They are web browsers that do not have a graphical user interface. Their operation is similar to standard browsers, but they are executed from the command line. Using this method, it is possible to simulate opening a website from the program level, so that it is filled with data [5, 6].
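For illustration, a short Python sketch of driving a headless browser with Selenium; the URL is hypothetical, and any headless browser (or another automation library) could be used in the same way.

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")               # run the browser without a GUI
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/spa")        # hypothetical dynamic page
    html = driver.page_source                    # content after JavaScript has run
finally:
    driver.quit()
```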

3 Multi-platform Applications

The multi-platform problem concerns all technologies that work on only one specific platform. Software that is available only to a certain group of users limits its potential audience. In the past, the creation and development of new mobile applications with technologies supporting only one platform entailed the need to create two separate programs, which resulted in high costs and long production times. This led to the development of solutions that would ensure the creation of a single executable program for both Android and iOS. Currently, there are new ideas and technologies related to the improvement of this process, and one of the newest tools that has already gained great popularity is React Native. It introduces a completely new approach to the creation of multi-platform telephone software [7]. The multi-platform idea is based on the fact that the hardware platform determines how an application is built and installed on a specific device. It consists of many elements, such as hardware components and the type of software, but also the way memory is managed and accessed. Multi-platform support, also called cross-platform support or platform independence, is a feature of operating systems, programming languages and applications that allows them to work properly on at least two different vendor platforms [8, 9]. One of the most popular examples is the Java Runtime Environment (JRE). It is a run-time environment for programs written in the Java programming language on various operating systems. It consists of two basic elements. The first of these are base classes and helper files, through which it is possible to display the user interface and perform basic operations such as I/O operations. The second component is the Java Virtual Machine (JVM), which is the backbone of this software. The code of the application being developed is compiled into binary code and then executed on this machine with the use of an appropriate interpreter. Providing an additional layer, which is a virtual machine, gives full independence from the platform on which the applications are run. In the case of mobile systems, multi-platform support corresponds to the situation when one application can be launched on both the iOS and Android platforms [6,


Fig. 4 Cross platform application idea

10, 11]. There are several ways to create multi-platform phone programs. A popular approach is to use internet applications, ensuring that they are properly displayed and function correctly on telephones. They are executed by the browser, which ensures full hardware independence. Another well-known approach is hybrid applications, which, instead of in the browser, are executed in a native container that forms a layer of communication between the platform and the code (Fig. 4). Multi-platform support very often introduces an additional intermediate layer between the hardware platform and the software being executed, which usually results in reduced efficiency. In such cases, operations are not performed directly on the device and must communicate with it via this layer, which reduces the speed of application execution. Moreover, it involves the use of additional, specialized tools, such as interfaces that enable proper communication with a given platform, which unfortunately increase the size of the application being created and additionally force programmers to use them. Cross-platform software must also be thoroughly tested on all supported platforms. There are situations where the same error may manifest itself completely differently on each of them [12, 13].

4 Proposed Solution

The utility of all web scraping applications is related to the requirements of the users. The main idea of the application is to relieve users of the overload of data that require a lot of attention and time. LookForChange is an application which supports both the iOS and Android operating systems. All data inside it are encrypted. All operations related to user interaction should take no longer than 300 ms. Longer periods of time may discourage


Fig. 5 Application architecture

the user. In addition, information about changes to the analyzed pages must be provided no later than 120 s from the moment of their occurrence. Moreover, the application supports PUSH notifications.

4.1 Architecture

Communication between the visual layer of the application and the server is performed using the HTTP protocol. GraphQL has been used to handle all the requests. It is a query language for the server-side application interface. It allows only one endpoint to be defined, to which all HTTP requests are directed. Appropriate functions, called resolvers, are implemented on the server to handle all incoming requests. Queries sent from the application must be in an appropriate form. Thanks to this, they can be processed correctly, and the result of the given resolver is returned to the application as a response. The second layer in the architecture of the created application (Fig. 5) is the connection between the server and the database. It is provided by a tool called Prisma, which allows items sent to the database to be read, modified and written using a programming language. In addition, it allows the user to define the database schema, on the basis of which the appropriate tables are generated along with all operations on them. The functions created in this way can be used on the server for data processing and handling. The requests sent from the application to the GraphQL server are divided into three types, which are used to manage operations related to data processing (a client-side sketch follows the list):
• Queries—reading and retrieving certain, specific data, such as the list of the user's watched pages.
• Mutations—modifying, adding or deleting certain data, such as creating a new account during the registration process.
• Subscriptions—allow the server to send data to the clients when a specific event occurs. Their operation is based on the WebSocket tool, which allows a two-way communication channel to be created between the application and the server. Thus, changed data can be sent at any time, for example when a page change is detected.
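As an illustration, a minimal Python sketch of how a client could send one of these operations to the single GraphQL endpoint as a JSON POST request; the endpoint, query and field names are hypothetical (the actual application uses the Apollo client described below).

```python
import requests

ENDPOINT = "https://api.example.com/graphql"     # hypothetical single GraphQL endpoint

# A query (like mutations) is sent as a plain HTTP POST request to the same endpoint.
query = """
query ($userId: ID!) {
  watchedPages(userId: $userId) { name url }
}
"""
response = requests.post(
    ENDPOINT,
    json={"query": query, "variables": {"userId": "42"}},
    timeout=10,
)
print(response.json()["data"]["watchedPages"])
```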


In order to use GraphQL operations in the mobile application, a special client called Apollo was used. It is a tool that allows the user to fill his view with the data retrieved from the GraphQL server. It ensures connection with the server and provides methods that allow for sending appropriate HTTP requests as well as receiving and processing responses. Additionally, it provides the ability to handle errors caused by GraphQL operations.

4.2 Data Layer Prisma allows the user to generate functions to handle all elements in existing databases created with the help of technologies such as MySql or PostgreSQL. For this purpose, it uses appropriate adapters that connect and control a specific base. Apart from that, it also allows the user to generate his own database on the basis of a properly defined schema that contains the names of all tables together with the relations between them and all the fields belonging to them. Then, methods for the generated database tables, allow to read, add, modify and delete elements.

4.3 Logical Layer

The GraphQL server implementation was fully developed using the graphql-yoga tool. It is built on the Express framework, which is an extension of the Node.js environment and allows servers based on the JavaScript language to be run. Therefore, it is possible to use all elements of the Express framework on the server. In addition, graphql-yoga has a built-in GraphQL engine that allows all GraphQL operations to be handled and is compatible with the Apollo client, through which requests are sent from the user's application. As in the case of Prisma, which is based on GraphQL, the server is also defined using an appropriate schema. It contains all the operations that can be used by the applications communicating with it. The types described in its structure are specified as well. The schema therefore describes all the functions of the server. The visualization of the system schema has been divided into two parts: the first one concerns the users and the observed pages and elements (Fig. 6), and the second the history of changes on the observed pages (Fig. 7). When the user is logged in, the main interaction is related to the websites monitored by him. A newly added site must have its own name and a valid URL, i.e. it must lead to an existing HTTPS-secured page. In the case of manual selection of observed elements on the page, they are converted into character strings, which allows them to be saved in the database. The website defined in this way can be added to the list of watched pages. Then the process of analyzing changes in its structure is performed on the server (Fig. 8).


Fig. 6 Server schema visualization

Fig. 7 Server schema history visualization


Fig. 8 Application activity diagram

The process of analyzing pages begins when the server starts. Every 120 s the server iterates over all observed pages stored in the database and, based on their URL links, downloads the HTML of the examined page. The site code is saved in memory and then compared with the content downloaded in the next iteration. For this purpose, the Hiff library was used, which allows information about the differences between two fragments of HTML code to be extracted. If there is any change on the page, it is verified whether the change concerns the elements observed by the user. In the case of a match, a subscription event sends the new detection to the page history in the user's application, and a PUSH notification informing about the change is sent. For PUSH notifications, a module provided by Expo was used. It makes it possible to send an HTTP request to the Expo servers along with an identifier that defines the device and the message that needs to be delivered. Then the expoID is checked and the message is delivered to the specific user's device.
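A rough Python sketch of this monitoring loop, given only for illustration: the real implementation runs on Node.js, compares HTML with the Hiff library and notifies clients through GraphQL subscriptions and Expo PUSH messages, none of which are reproduced here.

```python
import time
import difflib
import requests

def fetch(url):
    """Download the current HTML of an observed page."""
    return requests.get(url, timeout=10).text

def monitor(pages, interval=120):
    """Periodically re-download each observed page and report changes.

    pages: {name: url}. The previous HTML is kept in memory and compared
    with the freshly downloaded content, as the server described above does.
    """
    previous = {name: fetch(url) for name, url in pages.items()}
    while True:
        time.sleep(interval)                     # the server iterates every 120 s
        for name, url in pages.items():
            current = fetch(url)
            diff = list(difflib.unified_diff(
                previous[name].splitlines(), current.splitlines(), lineterm=""))
            if diff:
                # here the real server checks the observed elements,
                # stores the detection and sends a PUSH notification
                print(f"Change detected on {name}: {len(diff)} differing lines")
            previous[name] = current
```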

4.4 Graphical Layer The mobile app view was made in React Native. All graphic elements were created using the components provided by the ReactJS library. Moreover, the NativeBase library is used, which provides a set of ready-made, elementary components, such as buttons, whose properties can be overwritten and used to build full views. In addition, it allows easy management of global styles such as color and text size that can be used throughout the application. This ensures consistency between all elements.


Fig. 9 Webpage list

4.5 User Perspective The logged in user is redirected to the watched pages section (Fig. 9). There is a list of all the elements that have been made to track page changes. Each of them has a title defined by the user. After clicking on the selected element, detailed information about it is expanded (Fig. 10): URL of the website to which the change analysis process relates and the number of elements that are observed on it. When adding a new page (Fig. 11) to the watch list, it is necessary to enter name that will define it and the URL address where it is available. Then the user has an option for observing all elements on the selected site or defining them himself. In case of selecting the second option (SELECT ELEMENTS button), the window containing the WebView with the entered page will be displayed (Fig. 12). The user can navigate on the displayed page to its subsequent sub-pages and perform all interactions similarly to the normal web browser. In case of clicking on any element of the website which is not a hyper-link, a dialogue box will be displayed in which the user should enter the name of the observed element or stop the activity


Fig. 10 Details

(Fig. 13). After confirmation, the selected element is surrounded by a red frame (Fig. 14). The user has the option to select any number of items to be tracked. If any change is made on a page to an element that is being watched by the user, a PUSH notification is sent (Fig. 15). It consists of two elements:
• title—contains the name of the site to which the modification relates,
• description—information about the edited element.
If the user is observing all the objects on a given page, the notification does not provide detailed information about the changed element.

5 Conclusions

Nowadays, the Internet has become a place where there are websites from almost every area of life and it is possible to carry out almost all everyday duties, such as shopping, exchanging information, or reading and acquiring new information


Fig. 11 Adding new web-page sheet

on certain topics. Regardless of whether the user is an employee of a large company, a person who uses a mobile phone in his spare time, or even a programmer, each of them probably faces the problem of a huge amount of data appearing on the Internet, which are often impossible for themselves to process. However, this problem may concern completely different aspects and areas of the Internet, depending on the user. The average person spends more than 50% of his free time browsing websites, where the number of views from mobile devices is the highest. Although large companies, especially marketing ones, have been forced to use the process of extracting and processing data from websites, for example in order to analyze competition, the application market, especially mobile applications, still lacks solutions that allow For effective and efficient monitoring of changes and analyzing the content of websites. The created application enables the implementation of this process in an intuitive and user-controlled manner. The process of extracting information from websites can often pose threats to their owners. The extracted data may be made public on other websites, which may result in loss of credibility. Another case is the analysis of information in order to


Fig. 12 Choosing elements

gain a competitive advantage in the market. This has contributed to the creation of ways to protect pages from being extracted by programs, or of rules that programs navigating the pages must obey. Such rules allow the proper functioning of applications, like the created solution, that do not cause any harm to the owners of the websites concerned. In addition to the Internet, the related hardware and software have developed significantly. The many devices with different operating systems and modes of operation made it necessary to develop tools that would allow one application to operate correctly on many platforms. Despite the fact that there are already many similar solutions, they usually come with certain limitations, most often in terms of efficiency. However, multi-platform applications significantly shorten program development time, and thus reduce the required costs. Their advantages have contributed to their wide use in the application market. The implementation of the presented application is one of the steps intended to save time for users of the Internet. It allows the user to track changes


Fig. 13 Providing desired name

in any elements on websites, and the PUSH notifications sent eliminate the need to constantly check whether a modification has occurred. Therefore, the created application is a multi-platform mobile application that can provide many benefits to its users, and in the future it can be developed and delivered to an even wider group of potential consumers.

Fig. 14 Chosen marked element



Fig. 15 PUSH notification that informs user about changed element on the monitored website

References
1. Brody, H.: The Ultimate Guide to Web Scraping. Lean Publishing (2017)
2. Martinez, E.R.: React: Cross-Platform Application Development with React Native. Packt Publishing (2017)
3. Simpson, K., Schmitt, C.: HTML5 Cookbook. O'Reilly (2011)
4. Boduch, A.: React and React Native: A Complete Hands-on Guide to Modern Web and Mobile Development with React.js, 3rd edn. (2020)
5. Veltri, G.A.: Digital Social Research (2019)
6. Majchrzycka, A., Poniszewska-Marańda, A.: Secure development model for mobile applications. Bull. Polish Acad. Sci. Tech. Sci. 64(3), 495–503 (2016)
7. Eisenman, B.: Learning React Native, 2nd edn. O'Reilly Media (2017)
8. Lewis, S., Dunn, M.: A Cross-Reference for iOS and Android Native Programming. Native Mobile Development (2019)
9. Michalska, A., Poniszewska-Marańda, A.: Security risks and their prevention capabilities in mobile application development. Inform. Syst. Manage. WULS Press 4(2), 123–134 (2015)
10. Seppe, V.B.: Practical Web Scraping for Data Science. APress (2018)
11. Poniszewska-Marańda, A., Majchrzycka, A.: Access control approach in development of mobile applications. In: Younas, M. et al. (eds.) Mobile Web and Intelligent Information Systems, MobiWIS 2016, LNCS 9847, pp. 149–162. Springer, Heidelberg (2016). ISSN 0302-9743, ISBN 978-3-319-44214-3

284

K. Stepie´n and D. Kossowski

12. Schrenk, M.: Webbots, Spiders, and Screen Scrapers, 2nd Edn. No Starch Press (2012) 13. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N.: Use of salesforce platform for building real-time service systems in cloud. In: Proceedings of 14th IEEE International Conference on Services Computing, IEEE SCC 2017, pp. 491–494. Honolulu, Hawaii, USA (2017) 14. Boduch, A.: React and React Native. Packt Publishing

Code Smells Detection Using Artificial Intelligence Techniques: A Business-Driven Systematic Review

Tomasz Lewowski and Lech Madeyski

Abstract Context: Code smells in software systems are indications that usually correspond to deeper problems which can negatively influence software quality characteristics. This review is part of an R&D project aiming to improve the existing codebeat platform, which helps developers avoid code smells and deliver quality code. Objective: This study aims to identify and investigate the current state of the art with respect to: (1) predictors used in prediction models to detect code smells, (2) machine learning/artificial intelligence (ML/AI) methods used in prediction models to detect code smells, and (3) code smells analyzed in the scientific literature. Our secondary objectives were to identify (4) data sets and projects used in research papers to predict code smells, (5) performance measures used to assess prediction models, and (6) improvement ideas with regard to code smell detection using ML/AI. Method: We conducted a systematic review using a database search in Scopus and evaluated it using the quasi-gold standard procedure to identify relevant studies. In the data sheet used to obtain data from publications, we factor the research questions into finer-grained ones, which are then answered on a per-publication basis. These are then merged over the set of publications using an automated script to obtain answers to the posed research questions. Results: We identified 45 primary studies relevant to the primary objectives of this research. The results show the prediction capability of ML/AI techniques for predicting code smells. Conclusion: Only a few smells—Blob, Feature Envy, Long Method and Data Class—have received the vast majority of interest in the research community. The usage of deep learning techniques is increasing. Most researchers still use source code metrics as predictors. Precision, recall and F-measure are the go-to performance metrics. There seems to be a need for modern reference data/project sets that reflect modern constructs of programming languages. We identified various promising paths of research that have the potential to advance the state of the art in the area of code smell prediction.

T. Lewowski (B) · L. Madeyski
Department of Applied Informatics, Wroclaw University of Science and Technology, Wroclaw, Poland
e-mail: [email protected]
L. Madeyski
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_12


Keywords Code smells, Software Engineering, Predictive modelling, Systematic review

1 Introduction
The software industry is a huge business, with worldwide revenue totaling $407.3 billion in 2013 [13]. According to the World Quality Report (2016–2017), on average the industry was spending over 30% of the IT budget on Quality Assurance (QA) and Testing, and the report's study participants predicted an upward move to 40% by 2019 [3]. Hence, precise detection of quality issues in code (issues that make the code hard to maintain and evolve, and thus need to be fixed/refactored) is of great importance. At the end of the previous century, the term "code smells" was coined by Beck and Fowler [12] in the context of identifying quality issues in code that can be refactored. Since then many researchers have investigated the smell metaphor in software engineering, describing a wide range of smells that can be detected, techniques that can be used to predict (detect) smells, as well as metrics that can serve as predictors of bad smells. The aim of this systematic review is to summarize a large body of knowledge in the aforementioned areas. It is worth mentioning that this review was conducted as a preliminary step of a research and development (R&D) project funded by NCBiR (POIR.01.01.01-00-0792/16) and carried out in the code quest software development company (codequest.com). The company develops a platform, called codebeat (codebeat.co), for automated code review for mobile and web, which supports developers in the detection of code smells. As the company wants to know the state of the art in prediction of code smells using artificial intelligence (AI) in general, and machine learning (ML) in particular, this review is in fact a business-driven literature review whose goal is to present the state of the art in code smell detection, with the research questions presented in Sect. 2.1.

1.1 Related Work
This is not the first systematic literature review on code smells. An earlier review by Zhang et al. [30] focused on more meta-research questions: which code smells were researched at the time, what were the aims of studies on code smells, what techniques were used in these studies, and whether there is actual evidence of the usefulness of the code smell concept. Their study focused on the original 22 code smells introduced by Fowler [12], and they discovered that relatively few smells attract most of the research, and that most of the smells were not thoroughly analyzed. Their research included publications published between 2000 and 2009, which means that a lot of recent research is simply not covered.
Singh and Kaur [24] incorporated data up to September 2015. They focused on refactoring of code smells and anti-patterns, but some of their research questions relate strictly to code smell detection. The authors analyzed a wide range of techniques and tools, but their focus is on a brief presentation of all tools and techniques (not only automated, but also semi-automated and manual) used for code smell detection rather than on analyzing and comparing the performance of ML/AI methods.
Five other systematic studies on code smells were published in the last three years: [1, 2, 4, 22, 23]. In the first one, by Sharma and Spinellis [23], the authors cover a broad range of code smell-related issues, such as what smells actually represent, how they get introduced into software systems, what their effect is on processes, artifacts and people, and what their detection methods are. Due to the broad scope, the paper only briefly lists categories of smell detection techniques, without going into specific techniques within each category or the achieved results. The paper covers publications from 1999 to 2016.
A review by Santos et al. [22] covers a slightly different area: it focuses on the "themes" of studies investigating code smells, their experimental settings and the convergence of their findings. It covers a total of 65 publications from 2002 to 2017. A "theme" in the context of that study is a kind of context classification; the themes are "Detection", "Programming", "Human aspects" and "Correlation with development issues". The study found that there is a multitude of inconsistencies among researchers and that, as of now, no known detection and removal techniques are adequate, including human evaluation.
An extensive literature review conducted by Azeem et al. [2] included research questions on the machine learning techniques used, independent variables, machine learning algorithms, data sets, evaluation techniques and the impact of those factors on final model performance. The paper's findings include limited support for code smell detection using machine learning techniques, the observation that current papers focus mostly on code metrics, that the problem of code smell intensity is under-researched, and that the impact of each of these factors cannot be easily determined. The study analyzed papers published between 2000 and 2017.
Caram et al. [4] discussed the code smells addressed in the literature, the machine learning techniques used and the frequency of their usage, as well as the performance of various techniques. The study analyzed 26 papers published between 1999 and 2016. The conclusion was that all machine learning techniques perform comparably, with Decision Tree, Random Forest, Semi-supervised and Nearest Neighbor having a slightly better overall performance. One of the issues raised by the paper is the incomparability of the studies, mostly due to the use of different data sets.
A more recent study in the area, by Al-Shaaby et al. [1], focuses on papers published between 2005 and 2018. 17 studies were deemed relevant and of sufficient quality. The study addresses several research questions: the machine learning techniques used, the code smells that researchers attempt to identify, the performance measures used, data sets, and the tools used for modelling.


None of these studies included any appendix with intermediate results, such as full initial list of considered publications, reasons for rejection for each of them or data elements extracted from each, which would simplify reproduction and improve reviewability.

1.2 Contributions of This Study
In response to the above-mentioned needs, we conducted a systematic review of the literature concerning prediction of code smells using ML/AI methods. As a result, the review makes the following contributions to the field:
1. It presents the state of the art in current code smell detection research, including the predictors and ML/AI methods used in prediction models, as well as the range of code smells analyzed in the scientific literature.
2. It identifies the performance metrics used by researchers today.
3. It identifies the data sets used to create code smell prediction models and the software projects from which they originate (including their size and other characteristics).
4. It identifies research ideas for advancing the domain of code smell prediction, on which other researchers and tool vendors may build, as well as factors that influence the predictive performance of code smell prediction models.
We present the details of our research methods in Sect. 2, the results of our systematic review in Sect. 3, the discussion of the results in Sect. 4, and we conclude with Sect. 5.

2 Methods We performed this systematic review (SR) according to the guidelines by Kitchenham et al. [14]. The processes we adopted are specified in this section.

2.1 Research Questions
The research questions relating to our review are as follows:
RQ1 Which predictors are used in prediction models to detect code smells?
RQ2 Which ML/AI methods are used in prediction models to detect code smells and which methods are considered the best?
RQ3 Which code smells are analyzed in the scientific literature?
RQ4 What data sets and projects, and of what characteristics, are used in research papers to predict code smells?
RQ5 Which performance metrics are most commonly used in the literature?
RQ6 What are the ideas, in the existing research, upon which code smell prediction using machine learning may be built?

2.2 Protocol Development
Initially, a protocol was created to define the procedures we intended to use for the systematic review, including the search process, the primary study selection process, the data extraction process and the data analysis process. It also identified the main tasks of all the co-authors. The protocol was initially drafted by the second author and double-checked by the first author. The following sections are based on the processes defined in the protocol. Any divergences between our actual processes and the planned processes described in the protocol are reported.

2.3 Search Process We intend to search for papers that will help us to answer research questions posed in Sect. 2.1 by following our search strategy described in Sect. 2.3.1.

2.3.1 Search Strategy
Our main search process will be an automated search using Scopus because of its wide coverage. From the point of view of the research project we are involved in and the code quest company, finding all of the relevant papers is not critical, but to be on the safe side we will validate the Scopus search using a quasi-gold standard [14, 29], performing a manual search across a limited set of topic-specific journals and conference proceedings over a restricted time period (the year 2017). The results of this process are reported in subsequent sections.

2.3.2 Search Strings
We derived major terms from the research questions, identified alternative spellings or synonyms for the major terms (combined with OR), used AND to connect the major terms, checked the search terms in relevant publications we already had, and followed the rules for constructing search strings in Scopus (more details can be found at https://service.elsevier.com/app/answers/detail/a_id/11213/supporthub/scopus/#tips and https://service.elsevier.com/app/answers/detail/a_id/11236/kw/all%20fields/supporthub/scopus/). As a result, our initial search string in Scopus was as follows:
TITLE-ABS-KEY ( ( "code smell" OR "bad smell" OR antipattern OR anti-pattern OR "anti pattern" ) AND ( "machine learning" OR predict* ) ) AND PUBYEAR > 1998
which translates into the following URL:
https://www.scopus.com/results/results.uri?sort=plf-f&src=s&sid=a9ac162c765cdd97d420c650614d365f&sot=a&sdt=a&sl=155&s=TITLE-ABS-KEY+%28+%28+%22code+smell%22+OR+%22bad+smell%22+OR+antipattern+OR+anti-pattern+OR+%22anti+pattern%22+%29+AND+%28%22machine+learning%22+OR+predict*+%29+%29+AND+PUBYEAR+%3e+1998&origin=searchadvanced&editSaveSearch=&txGid=272750ded430629599c4d74aae98f0e3
It is worth mentioning that Beck coined the term "code smell" in 1999, in the context of identifying quality issues in code that can be refactored to improve the maintainability of software [12]. Hence, the time period to be covered by the review is limited by PUBYEAR > 1998 in the search string. Madeyski performed the search in Scopus on February 21, 2018. In total, 88 papers were returned from Scopus. All of the results were saved in BibTeX (and CSV) format. After analysis of the aforementioned preliminary set of 88 papers, necessary corrections to our search string were introduced. The final search string was:
TITLE-ABS-KEY ( ( "code smell" OR "bad smell" OR antipattern OR anti-pattern OR "anti pattern" ) AND ( "machine learning" OR predict* OR detect OR detection OR heuristic* ) AND software ) AND PUBYEAR > 1998
while the URL was:
https://www.scopus.com/results/results.uri?sort=plf-f&src=s&sid=9d6009998ae2e22265826addfe46ebd6&sot=a&sdt=a&sl=210&s=TITLE-ABS-KEY+%28+%28+%22code+smell%22+OR+%22bad+smell%22+OR+antipattern+OR+anti-pattern+OR+%22anti+pattern%22+%29+AND+%28+%22machine+learning%22+OR+predict*+OR+%7bdetect%7d+OR+%7bdetection%7d+OR+heuristic*+%29+AND+software+%29+AND+PUBYEAR+%3e+1998&origin=searchadvanced&editSaveSearch=&txGid=70e3ad5814b99ef9e407ff18c23a8c07
Madeyski performed the final search in Scopus on March 21, 2018. In total, 424 papers were returned from Scopus. All of the results were saved in BibTeX (and CSV) format for further analysis. Because work on this paper took a substantial amount of time, the same search was re-run by Lewowski on June 05, 2020. This search yielded a total of 607 papers.

It is worth mentioning that our search string now includes not only the terms "predict" and "machine learning", but also "detect" and "heuristic*". In particular, the word "detect" was sometimes used in the abstracts of interesting papers without the word "predict". Checking the accuracy of the search string, Madeyski found that an important review paper by Sharma and Spinellis was missing from Scopus for an unknown reason (other papers in JSS journal vol. 138 were indexed). Fortunately, the paper was indexed in Scopus later. Our search procedure may be summarized as follows:
• Identify a tentative set of papers via an automated search in Scopus.
• Evaluate the papers for inclusion and exclusion.
• Validate (and perhaps correct) the search strategy.
Initially, we also planned to perform a snowballing procedure using both backward and forward snowballing, as described by Wohlin [28], but the number of found, read and included primary studies was so large that we decided to stick to the already identified and accepted set of papers, provided it passed the validation step described in the next section.

2.3.3 Validating the Search Strategy
Two key criteria for assessing the automated search are recall (also termed sensitivity) and precision [7, 29], which can be calculated as follows:

$$\mathrm{Recall} = \frac{R_{found}}{R_{total}} \qquad (1)$$

$$\mathrm{Precision} = \frac{R_{found}}{N_{total}} \qquad (2)$$

where:
• R_total is the total number of relevant studies,
• N_total is the total number of studies found,
• R_found is the number of relevant studies found.
Unfortunately, the practical problem in calculating recall is that R_total is not known. Hence, our strategy to validate the search process will be the construction of a "quasi-gold standard" (QGS). We will incorporate the QGS concept, which consists of a collection of known studies, and the corresponding "quasi-sensitivity", into the search process for evaluating search performance, as described by Zhang et al. [29]. The quasi-gold standard will be determined by performing a manual search across a limited set of topic-specific journals and conference proceedings over a restricted time period (the year 2017). The approach was originally evaluated through two participant-observer case studies with promising results. The QGS will be applied only to publications relevant to the primary research questions (RQ1–RQ4); publications relevant only to the secondary research questions (RQ5–RQ6) will not be included.
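To make the two measures concrete, the following minimal Python sketch (ours, not part of the original review protocol; function names are illustrative) computes quasi-sensitivity and quasi-precision from the three counts defined above. The example values reuse the Phase 1 figures reported later in this chapter.

```python
def quasi_sensitivity(r_found: int, r_total: int) -> float:
    """Recall: share of the relevant (QGS) studies that the automated search found."""
    return r_found / r_total

def quasi_precision(r_found: int, n_total: int) -> float:
    """Precision: share of all retrieved studies that are relevant."""
    return r_found / n_total

# Illustrative usage with the Phase 1 numbers reported in Sect. 2.4.1:
# 5 of the 5 QGS papers were retrieved, and 164 of 607 retrieved papers were relevant.
print(quasi_sensitivity(5, 5))    # 1.0  -> 100% recall, above the 75% threshold
print(quasi_precision(164, 607))  # ~0.27 -> 27% precision
```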

Our validation of the search strategy will be conducted in the following steps, similar to what was recommended by Zhang et al. [29]:
Step 1A: Determine the initial search string using domain knowledge and experience.
Step 1B: Identify relevant journals and conferences.
Step 2: Establish the quasi-gold standard using a manual search. The review team screen all papers in the selected sources and apply the inclusion and exclusion criteria, defined in advance. Screening can be applied initially to the title and abstract of a paper (in Phase 1) or to the whole paper (in Phase 2).
Step 3*: Revise the search string using the QGS results.
Step 4: Conduct the automated search.
Step 5: Evaluate the search performance. If we do not reach the quasi-sensitivity threshold of 75%, we go back to Step 3 and revise the search strings.

The workflow of the validation of the search strategy is presented in Fig. 1. The workflow diverges from what was proposed by Zhang et al. [29] in that initial search string is not determined using QGS results but using authors’ domain knowledge and experience. There are two causes for this divergence (taking advantage of the fact that there is more than one author of the review): 1. Independence of initial search string formulation and selection of venues for QGS enhances reliability of the validation procedure by removing coupling between the search string and the validation data set. 2. Lack of the aforementioned dependency enables parallel work on Steps 2 and 4, thus making the process shorter and delivering business value earlier. This allows stakeholders to review their expectations (for example by refining/adding/removing research questions or providing additional guidance), which in turn increases business relevance of the study, even at the cost of additional search string revision. The results of the automated search are compared to the results of the manual R ound ), as well as search (the quasi-gold standard) and quasi-sensitivity (Recall = Rftotal quasi-precision (Pr ecision =

R f ound ) Ntotal

can be calculated, as on a basis of:

• R f ound is the number of relevant studies found by the automated search (Step 4) that are published in the venues used in Step 2 (the manual search) during the time period covered by the manual search.

4

Apply only if recall does not reach required threshold. Zhang et al. [29] suggest that a sensitivity (recall) threshold (i.e., a completeness target) of between 70% and 80% might be used to decide whether to go to Step 3 (and to refine the search terms) or whether to proceed to the next stage of the review.

5


Fig. 1 Workflow of the systematic search process (inspired by [29])

• R_total (the total number of relevant studies for the selected venues and time period) is the number of relevant papers found by the manual search (Step 2).
• N_total is the total number of papers found by the automated search (Step 4).
Validation of our search strategy requires (in Step 1B) identifying relevant journals and conferences. We decided to search for papers published in 2017 in the following top software engineering journals:
• IEEE Transactions on Software Engineering (TSE)
• Empirical Software Engineering (EMSE)
• ACM Transactions on Software Engineering and Methodology (TOSEM)
• Information and Software Technology (IST)
• Journal of Systems and Software (JSS)
• Journal of Software: Evolution and Process (JSEP)
and in the proceedings of the following conferences (we focus on the main full-paper research tracks of the conferences, and do not cover collocated conferences or workshops):
• International Conference on Software Engineering (ICSE)
• Mining Software Repositories (MSR)
• IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
• International Conference on Software Maintenance and Evolution (ICSME)
Publications from each venue were extracted from Scopus on April 23, 2018 using the search strings presented in Table 1. The number of extracted publications and the established QGS publications are presented in Table 2. It is worth mentioning that limiting PUBYEAR in Scopus to 2017 does not exclude papers accepted in this year but still waiting to be assigned to a specific issue (so-called articles in press). This means that more papers might have been analyzed (apart from papers published in 2017, we also analyzed some papers to be published in 2018 or even later), but this does not affect the relevance of the procedure. The established QGS contains only a single publication. To verify why, we investigated at which venues the accepted papers were published. Their DOIs and venue names are listed in Table 3.
Search Process Task Allocation: Madeyski prepared an initial search string and performed the initial search in Scopus, which returned 88 results. He verified whether any known papers were missing, refined the search string, and performed a refined search in Scopus, which returned 424 results. He prepared the search evaluation strategy on the basis of the quasi-gold standard (QGS). Lewowski performed the manual search required by the QGS.

2.4 Primary Study Selection Process
Primary studies were filtered using a two-phase approach: in the first phase (screening), the paper's title and abstract were checked against a checklist; if the paper passed this phase, the second phase was data acquisition from the full text. Since screening was based only on the abstract, some papers were rejected during the second phase, for example when the abstract suggested that the paper used machine learning but in fact it did not.

2.4.1 Inclusion and Exclusion Criteria
The inclusion criteria for papers are defined as follows:
• The paper reports the use of ML/AI prediction models.
• The paper is related to code smell detection.
The exclusion criteria are:
• The paper is an editorial, abstract, or presentation slides, is not peer-reviewed, or is not an article or a chapter of a book or conference proceedings.

Table 1 Search strings used to extract publications from Scopus to establish the quasi-gold standard (Venue | Search string)
TOSEM | SRCTITLE ("ACM Transactions on Software Engineering and Methodology") AND PUBYEAR = 2017
TSE | SRCTITLE ("ieee transactions on software engineering") AND PUBYEAR = 2017
EMSE | SRCTITLE ("Empirical Software Engineering") AND PUBYEAR = 2017 AND (LIMIT-TO (EXACTSRCTITLE, "Empirical Software Engineering"))
IST | SRCTITLE ("Information and Software Technology") AND PUBYEAR = 2017
JSS | SRCTITLE ("Journal of Systems and Software") AND PUBYEAR = 2017
JSEP | SRCTITLE ("Journal of Software Evolution and Process") AND PUBYEAR = 2017
ICSE | SRCTITLE ("International Conference on Software Engineering") AND PUBYEAR = 2017 AND (LIMIT-TO (EXACTSRCTITLE, "Proceedings 2017 IEEE ACM 39th International Conference On Software Engineering ICSE 2017") OR LIMIT-TO (EXACTSRCTITLE, "Proceedings International Conference On Software Engineering") OR LIMIT-TO (EXACTSRCTITLE, "Proceedings 2017 IEEE ACM 39th International Conference On Software Engineering Software Engineering In Practice Track ICSE Seip 2017"))
MSR | SRCTITLE ("Mining Software Repositories") AND PUBYEAR = 2017
SANER | SRCTITLE ("IEEE International Conference on Software Analysis, Evolution and Reengineering") AND PUBYEAR = 2017
ICSME | SRCTITLE ("International Conference on Software Maintenance and Evolution") AND PUBYEAR = 2017

• The paper is written in a language other than English.
• The full text of the paper is not available.
• The paper was published before 1999, when Fowler et al. [12] first introduced the concept of code smells.
• The paper did not attempt to use ML/AI for code smell prediction/detection in the context of text-based object-oriented, functional or procedural programming

Table 2 Results of QGS search and selection (Title | Total # of publications | # accepted in Phase 1 | # accepted in Phase 2 | References)
TSE | 108 | 2 | 0 | –
EMSE | 131 | 0 | 0 | –
TOSEM | 12 | 0 | 0 | –
IST | 111 | 3 | 0 | –
JSS | 225 | 4 | 0 | –
JSEP | 68 | 0 | 0 | –
ICSE | 152 | 1 | 0 | –
MSR | 67 | 1 | 0 | –
SANER | 84 | 2 | 0 | –
ICSME | 175 | 5 | 1 | [69]

Table 3 Publications published in 2017 accepted for analysis (DOI | Venue name)
https://doi.org/10.1109/ICSME.2016.26 | ICSME 2016—Proceedings of the 2016 IEEE International Conference on Software Maintenance and Evolution
https://doi.org/10.1109/ASE.2017.8115667 | ASE 2017—Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering
https://doi.org/10.1016/j.knosys.2017.04.014 | Knowledge-Based Systems
https://doi.org/10.1109/MOBILESoft.2017.29 | MOBILESoft 2017—Proceedings of the 2017 IEEE/ACM 4th Int. Conference on Mobile Software Engineering and Systems
https://doi.org/10.1007/s11219-016-9309-7 | Software Quality Journal
https://doi.org/10.5220/0006338804740482 | ICEIS 2017—Proceedings of the 19th International Conference on Enterprise Information Systems

languages, or a language that mixes some or all of these paradigms (e.g., Scala is an object-oriented language but has many features of functional programming languages).
• The same or extended results were already published (only the extended results are then included in the study).
• The same authors published a study with the same title both in conference proceedings (or as a book chapter) and in a journal (only the journal paper, which is typically more thorough, was included).
Phase 1 of the selection process was performed using a checklist containing the inclusion and exclusion criteria, split into simple Yes/No statements that can be answered based only on the publication title and abstract. The checklist is designed in such a


way that answering "No" results in rejection of the publication (i.e., all checks are represented as inclusion criteria). The applied checklist contained the following statements:
1. The entry is a single journal paper, chapter of a book or conference proceedings paper which requires peer review (i.e., it is not an editorial, abstract, technical report etc.).
2. The paper is written in English.
3. The paper was published in 1999 or later.
4. The title or abstract of the paper indicates that it is related to software engineering.
5. The title or abstract of the paper indicates that at least one code smell/anti-pattern plays an important part in the study.
6. The title or abstract of the paper indicates that it might use machine learning techniques.
7. The abstract or full text of the paper indicates that it focuses on code smells/anti-patterns in programming languages.
8. The abstract or full text of the paper indicates that it focuses on detecting code smells/anti-patterns using source code.
9. The paper does not focus on techniques for resolving code smells/anti-patterns.
10. The paper does not focus on using code smells/anti-patterns as predictors of other code or project traits.
11. The paper focuses on detection/prediction of code smells/anti-patterns.
12. If the paper is a chapter of a book or a conference proceedings publication, its authors have not published a study under the same title in a journal (we want to include each paper once, and it may be expected that the journal version includes more details).
13. The full text of the paper is available.
The results after Phase 1 of selection (on the basis of abstracts) are as follows:
• Number of publications: 607.
• Number of relevant publications: 164.
• Precision: 27.0%.
• Number of relevant publications found in QGS, but not found in the automated search: 0.
• Number of relevant publications found in both QGS and the automated search: 5 [5, 10, 18, 20, 69].
• Recall: 100%, i.e., above the assumed threshold of 75%.
The results after Phase 2 of selection (on the basis of full texts) are as follows:
• Number of publications: 164.
• Number of relevant publications: 44.
• Precision: 26.8%.
• Number of relevant publications found in QGS, but not found in the automated search: 0.
• Number of relevant publications found in both QGS and the automated search: 1 [69].


• Recall: 100%, i.e., above the assumed threshold of 75%.
We were aware of one more publication relevant to the study which was not present in the search results; therefore, we manually added the paper by Grodzicka et al. [42] to the initial publication set. It went through the regular checklist and data acquisition phases. The final list of accepted studies is given at the end of this publication.
Task allocation during the selection process:
1. Lewowski applied the inclusion and exclusion criteria to the identified studies.
2. Madeyski checked the application of the inclusion/exclusion criteria on randomly selected papers.
Disagreements were resolved by discussion. The agreement rate for the results of the first phase was 95% (disagreement on [21], final decision: reject), and for the results of the second phase 80% (disagreements on [26], final decision: reject, and [63], final decision: accept).

2.5 Assessing Study Quality
Quality assessment is about determining the extent to which the results of an empirical study are valid and free from bias. We applied a quality checklist inspired by Dybå and Dingsøyr [8] which, as mentioned by Kitchenham et al. [14], has the advantage that it can be used across multiple study types. The same checklist has inspired other researchers performing systematic reviews of machine learning techniques in other areas as well [16, 27]. Each question has only three possible answers: "Yes", "Partly", or "No", and these answers are scored in the following way (inspired by [16, 27]): "Yes" = 1, "Partly" = 0.5, and "No" = 0. The final score is obtained by adding the values assigned to each question. A study could have a maximum score of 11 and a minimum score of 0. The criteria we take into account are as follows:
1. Are the aims of the research clearly defined?
Yes: the goals of the paper are explicitly defined and presented.
Partly: the goals of the paper are briefly mentioned (perhaps as part of the introduction).
No: the paper goes straight to the proposed concept, without discussing its goals.
2. Is there an adequate description of the context in which the research was carried out (analyzed projects, data sets, data collection procedure etc.)?
Yes: the paper contains detailed information on the analyzed data sets (including version tags or VCS revisions) or valid references to data sets, a detailed data collection procedure (or a data collection script) and a reproducible analysis procedure (or a runnable script).
Partly: the paper contains information on the analyzed data sets (without version tags or VCS revisions), an approximate data collection procedure, and stated analysis goals.
No: only rough information on the analyzed data sets (e.g. the number of data sets), and no or only rough information on the data collection procedure.
3. Are the independent variables (predictors) and dependent variable(s) clearly defined?
Yes: the paper contains a detailed description of all predictors and dependent variables. If predictors are obtained by running a tool, the tool version is given (and, if required, the parameters used for running the tool).
Partly: the paper contains a rough description of the predictors used, or a reference to them. If predictors are obtained by running a tool, the tool version and the parameters used are unknown (but the tool is accessible).
No: the paper contains only basic information on the predictors, such as their number and names (if the names are not well known). There is no possibility to acquire information about the predictors from the tool used (for example because the tool is not clearly mentioned or is not accessible).
4. Are the predictive modelling techniques clearly defined?
Yes: the paper either refers explicitly to a specific technique (by citing a paper that describes the details) or describes the modelling technique used in detail (for both the prediction model and the data model, if applicable).
Partly: the paper either refers to a well-known technique (e.g. "genetic algorithm" or "decision tree") or describes the general concepts of the models (both the prediction model and the data model, if applicable).
No: the paper either does not describe the modelling techniques or uses only general terms such as "neural network" or "population-based algorithm".
5. Are the performance measures used to assess the models clearly defined?
Yes: the paper either explicitly refers to papers defining the performance measures or defines them itself.
Partly: the paper uses well-known measures, like precision and recall, but without defining or referencing them.
No: the paper uses non-standard performance measures without defining them, or does not perform performance evaluation.
6. Are the performance measures used to assess the models considered credible?
Yes: the performance measures use all quadrants of the confusion matrix, e.g. MCC.
Partly: the performance measures use some quadrants of the confusion matrix (e.g. precision and recall together use three out of four quadrants) or use entirely different mechanisms (e.g. correlation with defects).
No: no performance measurement was done.
7. Are the limitations or threats to validity of the study specified?
Yes: a detailed analysis of threats is done in the paper.
Partly: only a brief threat analysis is performed.
No: no analysis of threats is given in the paper.
8. Is the research reproducible (is a package including data sets and code or a detailed description available)?
Yes: complete information required to reproduce the study is available.
Partly: some of the information is missing (e.g. code version, data set version, script parameters).
No: the paper lacks most of these elements.
9. Is the proposed method or methods compared with other methods and/or baselines?
Yes: a well-defined baseline is used, with the same performance measures and data sets as the proposed methods.
Partly: a baseline is used, but it is not entirely representative (e.g. results reported on different data sets by other researchers are used as the baseline).
No: no baseline is given.
10. Are the findings of the study clearly stated and supported by the reported results?
Yes: findings are presented and fully and unambiguously supported by the reported results.
Partly: findings are presented and the reported results can be reasonably interpreted as supporting the findings.
No: findings are either not presented or are contradictory to the reported results (or at least not supported by them).
11. Does the study provide convincing arguments about the additional value given to the academic or industry community?
Yes: earlier research is described and new contributions are clearly stated.
Partly: new contributions are stated without, or with limited, reference to earlier research.
No: new contributions are not stated or are not new.
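As an illustration of the scoring scheme described above (the code and the example answers are ours, not an artifact of the study), the total quality score of a paper can be computed by mapping each of the eleven answers to its numeric value and summing:

```python
SCORE = {"Yes": 1.0, "Partly": 0.5, "No": 0.0}

def quality_score(answers: list[str]) -> float:
    """Sum the scores of the 11 checklist answers ('Yes'/'Partly'/'No')."""
    if len(answers) != 11:
        raise ValueError("expected one answer per quality criterion (11 in total)")
    return sum(SCORE[a] for a in answers)

# Hypothetical paper: strong on aims, context and models, weak on reproducibility.
example = ["Yes", "Yes", "Partly", "Yes", "Yes", "Partly",
           "Partly", "No", "No", "Yes", "Partly"]
print(quality_score(example))  # 7.0 out of a maximum of 11
```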

Results of assessing study quality are presented in Table 4 and Fig. 2. Quality scores help us in interpreting the findings of the review, but our observation is that the quality scores do not always correlate with the importance of the research ideas, advancing the domain, on which other researchers and tool vendors may build while developing code smell prediction tools. For example, Palomba et al. [69] scored only 4 points in the quality assessment. However, this was a space-constrained conference paper, and it introduced the concepts of smell detection via history mining and via text similarity analysis. To avoid rejecting such interesting papers, we decided not to reject any paper solely on the basis of its quality assessment score. All of the low-scoring papers are either short conference papers or workshop notes that usually lack details, with limited descriptions of threats to validity, limited reproducibility, and no comparison to a baseline, likely due to paper length limitations.

Table 4 Statistics for quality assessment scores
Max score: 10.5
Average score: 6.23
Min score: 0.0
Total number of publications: 44

Fig. 2 Number of publications that scored a given number of points in the quality assessment rating


2.6 Data Extraction
The data extraction form was prepared in Google Sheets to streamline, as much as possible, the data synthesis steps, as well as progress monitoring. Selected fields of the form are presented below:
• DOI
• Authors
• Title
• Assessing study quality (each question has only three possible answers: "Yes", "Partly", or "No", scored as "Yes" = 1, "Partly" = 0.5, and "No" = 0):
  – Are the aims of the research clearly defined?
  – Is there an adequate description of the context in which the research was carried out (analyzed projects, data sets, data collection procedure etc.)?
  – Are the independent variables (predictors) and dependent variable(s) clearly defined?
  – Are the predictive modelling techniques clearly defined?
  – Are the performance measures used to assess the models clearly defined?
  – Are the performance measures used to assess the models considered credible?
  – Are the limitations or threats to validity of the study specified?
  – Is the research reproducible (is a package including data sets and code or a detailed description available)?
  – Is the proposed method or methods compared with other methods and/or baselines?
  – Are the findings of the study clearly stated and supported by reported results?
  – Does the study provide convincing arguments about the additional value given to the academic or industry community?
• PQ1: Which code smells are analyzed in the paper?
• PQ2: Which predictors are used for each of those code smells?
• PQ3: Which ML/AI methods are used for detection of the smells?
• PQ4: Which data sets were used in the study?
• PQ5: What are the reported performance measures of prediction models?
• PQ6: What novel concept/technique is introduced by the study?

We decided to include references to page numbers in the workbook while extracting every important chunk of information that would be hard to find again later.
Data Extraction Process Task Allocation:
1. Lewowski undertook all the extractions, which were held in Google Sheets.
2. Madeyski independently checked the extraction for randomly selected papers.
3. Disagreements were resolved by discussion. There were two disagreements, which resulted in adjustments to the collected data.
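One way to picture the extraction form is as a record with one field per item listed above. The following sketch is only an illustration of that structure (the class and field names are ours, not taken from the study's spreadsheet):

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    """One row of the data extraction sheet for a single primary study."""
    doi: str
    authors: str
    title: str
    quality_answers: list                              # eleven 'Yes'/'Partly'/'No' answers
    smells: list = field(default_factory=list)         # PQ1: code smells analyzed
    predictors: list = field(default_factory=list)     # PQ2: predictors per smell
    ml_methods: list = field(default_factory=list)     # PQ3: ML/AI methods used
    data_sets: list = field(default_factory=list)      # PQ4: data sets used
    performance: dict = field(default_factory=dict)    # PQ5: reported measures
    novel_concept: str = ""                            # PQ6: novel concept/technique
```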


2.7 Data Synthesis and Aggregation Process
The basic objective when synthesizing data is to accumulate and combine data and figures from the selected primary studies in order to formulate a response to the posed research questions. In order to answer the research questions we used visualization techniques such as bar charts and box plots. We also used tables for summarizing and presenting the results, combined with narrative synthesis. During data preprocessing, smells with similar definitions will be merged into one, details of the execution of machine learning algorithms (such as whether boosting was used or which parameters were configured) will be removed, and project names and versions will be adjusted to a common scheme. If there are multiple similar machine learning techniques (for example C4.5, J48 and generic Decision Tree), they will also be merged into a single category. Preprocessed data will be stored in a separate file in the provided data set, so that access to the raw data is not lost. Preprocessing is performed manually. Data used in the study as well as reproduction scripts are published on Zenodo: https://doi.org/10.5281/zenodo.4783264.
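The merging of near-synonymous smell names and ML techniques described above can be pictured as a simple lookup-based normalization. The sketch below is illustrative only: the mappings are a small excerpt based on the categories used in this chapter, and the function name is ours.

```python
# Map raw technique/smell labels found in primary studies to merged categories.
TECHNIQUE_CATEGORY = {
    "C4.5": "Trees", "C5.0": "Trees", "J48": "Trees", "Decision Tree": "Trees",
    "JRip": "Rules", "Association rules": "Rules",
    "Naive Bayes": "Statistical", "Bayesian Belief Networks": "Statistical",
    "SMO": "SVM", "Support Vector Machine": "SVM",
}

SMELL_CATEGORY = {
    "God Class": "Blob", "Large Class": "Blob", "Big Class": "Blob",
}

def normalize(label: str, mapping: dict) -> str:
    """Return the merged category for a raw label, or the label itself if unmapped."""
    return mapping.get(label, label)

print(normalize("J48", TECHNIQUE_CATEGORY))    # Trees
print(normalize("God Class", SMELL_CATEGORY))  # Blob
```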

3 Results

Figure 3 presents number of publications relevant to this study published in a given year. A rise in interest is visible since 2009, probably due to increased interest in machine learning techniques.

Fig. 3 Number of relevant publications per year

Table 5 Number of publications containing references to given types of predictors
Code metrics: 30
Vectorized source: 3
Textual similarity: 2
Code history: 2
Other: 6
Unknown: 5

3.1 RQ1: Which Predictors Are Used in Prediction Models to Detect Code Smells?
While the concept of using product metrics is shared by most publications, predictor sets are hardly shared between any of them: the most frequently used predictor set is shared by only three publications (not counting raw source code as input), therefore it is not possible to establish whether performance differences are related to the use of different predictors or to other differences between studies (e.g., different data sets). Even if publications claim to use the same predictor set, it is hard to guarantee that they actually do so. Since software metrics are calculated automatically by some tool, even if the description of a metric is given, it is not sufficient for full reproducibility: various tools, or even different versions of the same tool, may have operational-level differences or defects in their calculation routines, which makes it impossible to reproduce the study exactly. While some studies utilize a large number of metrics [39], the same studies report that the best prediction model used only a few of them. Table 5 presents an aggregated view of the types of predictors used in the studies. Some publications [59, 69] use more than one type of predictor, so the numbers in Table 5 do not add up to the number of studies.

3.2 RQ2: Which ML/AI Methods Are Used in Prediction Models to Detect Code Smells?
We started by dividing the used methods into the following categories:
Trees: C4.5, C5.0, J48, unnamed decision trees
Rules: JRip, Association rules
Genetic Programming: Genetic Programming, Multi-Objective Genetic Programming
Neural Networks: MLP, perceptrons, Voted perceptrons
Deep Neural Networks: Autoencoders, Convolutional Networks
Statistical: Bayesian Belief Networks, Naive Bayes
Population: Genetic Algorithms, PSO, BFO, Evolutionary Algorithms, SPOA
Regression: Logistic regression
Random Forests: Random Forest
SVM: Support Vector Machine (with any kernel), SMO
Other: FP-growth, Decision tables, KNN, Clustering, Similarity measures, Tuning machine

Table 6 Number of publications containing references to given ML methods
Trees: 12
Support Vector Machine: 11
Random Forests: 9
Statistical: 9
Population: 8
Deep Neural Networks: 7
Rules: 6
Neural Network: 5
Genetic Programming: 3
Regression: 3
Other: 9

Trees and SVMs (used in 12 and 11 studies respectively) are the most commonly used machine learning techniques, with statistical methods and Random Forests (both in 9 studies) closely following. A recent rise in deep learning techniques, used in 7 studies, is also visible. Numbers of publications are presented in Table 6.

3.3 RQ3: Which Code Smells Are Analyzed in Scientific Literature?
By far the most researched smell is Blob, a class-level smell, present in 32 publications. The Blob category contains several smells (like Large Class, Big Class and God Class) whose boundaries are not defined well enough to justify separation. It is relatively well defined, as a class which is bigger and/or more complex than it should be. Next come Feature Envy, with 26 publications analyzing it, and Long Method, with 20 publications. Of those, Long Method is well defined and fairly well understood, while the understanding of Feature Envy seems to be more vague—sometimes it is

Table 7 Number of publications containing references to given code smells
Blob: 32
Feature Envy: 26
Long Method: 20
Data Class: 14
Spaghetti Code: 8
Functional Decomposition: 8
Shotgun Surgery: 5
Lazy Class: 5
Long Parameter List: 5
Divergent Change: 3
Swiss Army Knife: 3
Misplaced Class: 3
Duplicated Code: 2
Leaking Inner Class: 2
Member Ignoring Method: 2
Parallel Inheritance: 2
Promiscuous Package: 2

attributed to class, sometimes to method. Approaches to detect Feature Envy are also more diverse than those used to detect Long Method. Further researched smells are Data Class (14 publications), Spaghetti Code (8 publications), Functional Decomposition (8 publications), Shotgun Surgery (5 publications), Lazy Class (5 publications) and Long Parameter List (5 publications). In total there were references to 59 different smells. All smells that were researched in more than one publication are listed in Table 7. A substantial number of smells is only referred to by a single publication, probably because they were not formalized earlier. This applies particularly to domain-specific smells, such as Android smells (like UI Overdraw) or Web Service smells (like Chatty Web Service).

3.4 RQ4: What Data Sets and Projects, and of What Sizes, Are Used in Research Papers to Predict Code Smells?
According to the gathered data, there is no single data set that is shared and universally accepted. Most authors provide rough information regarding the projects used—for example their names and versions [59, 73] or their general characteristics (e.g. [66, 72]).
The most common annotated data set is the one published by Fontana et al. [39], and it is used in 8 studies. Another code smell data set, Landfill [19], is used only in a single paper, by Hadj-Kacem and Bouassida [46]. Finally, the data set from the work of Di Nucci et al. [6] is used in a single study, by Guggulothu and Moiz [43], but in a modified version. Two studies give no details on the data sets used. Among projects, the one most often encountered was Xerces, used in 12 of the studies. Of the rest, only a few were used in more than two studies; these include Azureus, ArgoUML, Ant, Gantt Project, Log4j and JFreeChart. All these projects are outdated as of today: the versions used were released over ten years ago (for example, Azureus 2.3.0.6 was released on 4 February 2006, Xerces 2.7.0 on 24 June 2005 and Nutch 1.1 on 7 June 2010). We were able to obtain release dates and basic size statistics for 30 open source projects and present them in Table 8. Those projects contain between 22 and 4844 Java source files, with an average of 724, and between 3374 and 325301 LoC, with an average of 81912 (as calculated by CLOC 1.72, https://github.com/AlDanial/cloc). Release dates range between 17/05/2002 and 25/06/2014, with a median of 04/09/2009. Some studies used the Qualitas Corpus [25] as the source of code smells. It contains a slightly older set of projects, starting from 10/07/2002 and ending on 15/12/2011, but in a similar size range (between single thousands and hundreds of thousands of lines).

3.5 RQ5: Which Performance Metrics Are Most Commonly Used in the Literature?
The two most commonly used performance metrics are precision (in 29 papers) and recall (26 papers). They are often accompanied by the F-measure (17 papers). In earlier papers, accuracy (11 papers) was fairly often used. Seven publications use the area under the receiver operating characteristic curve (AUROC) as the performance metric, while three use the Matthews Correlation Coefficient (MCC). Three papers do not use any performance metric. Only two papers report the full confusion matrix. Several papers use other performance metrics, usually strictly related to the way the research was performed; for example, Kessentini and Ouni in [55] use relevance, understood as the count of algorithm recommendations that were accepted by developers, while Kaur et al. in [53] use the ratio of defects detected (in this research not every smell represents a defect) to defects in the source code (Table 9).
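For reference, the sketch below shows how the most common measures, and MCC, which uses all four quadrants of the confusion matrix, are derived from true/false positives and negatives. It is a generic illustration written by us, not code from any of the reviewed studies, and the counts in the usage example are hypothetical.

```python
from math import sqrt

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient: uses all four quadrants of the confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

# Hypothetical detector output: 40 true positives, 900 true negatives,
# 10 false positives, 50 false negatives.
print(precision(40, 10), recall(40, 50), f_measure(40, 10, 50), mcc(40, 900, 10, 50))
```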


Table 8 Basic information about projects used as sources for data sets (Project name | URL | Release date | # of files | # of Java LoC)
Apache Ant:1.5.2 | https://github.com/apache/ant | 13/01/2016 | 932 | 91132
Apache Ant:1.7.0 | https://github.com/apache/ant | 13/01/2016 | 1194 | 123697
ArgoUML:0.26 | https://argouml-tigris-org.github.io/ | 27/09/2008 | 1752 | 186425
ArgoUML:0.30 | https://argouml-tigris-org.github.io/ | 11/02/2010 | 2210 | 200662
ArgoUML:0.34 | https://argouml-tigris-org.github.io/ | 15/12/2011 | 1922 | 195670
AspectJ:1.5.3 | https://github.com/eclipse/org.aspectj | 22/11/2006 | 4844 | 325301
Class Editor:2.23 | http://classeditor.sourceforge.net | 21/03/2004 | 66 | 10027
DavMail:4.5.1 | https://github.com/mguessan/davmail | 20/06/2014 | 181 | 29696
DirBuster:1.0 | https://sourceforge.net/projects/dirbuster/ | 27/02/2009 | 75 | 12928
FormLayoutMaker:8.2.1rc | https://sourceforge.net/projects/formlayoutmaker/ | 26/03/2006 | 22 | 4239
HSQLDB:2.2.9 | https://sourceforge.net/projects/hsqldb/ | 06/08/2012 | 529 | 164026
Java3D Modeler:1.3.5 | https://sourceforge.net/projects/java3dmodeler/ | 24/07/2012 | 78 | 9105
jEdit:4.5pre1 | https://sourceforge.net/projects/jedit/ | 19/11/2011 | 554 | 110869
JFreeChart:1.0.13 | https://sourceforge.net/projects/jfreechart/ | 20/04/2009 | 989 | 143062
JFreeChart:1.0.14 | https://sourceforge.net/projects/jfreechart/ | 20/11/2011 | 1005 | 146966
JFreeChart:1.0.9 | https://sourceforge.net/projects/jfreechart/ | 04/01/2008 | 920 | 128209
JFtp:1.53 | https://sourceforge.net/projects/j-ftp/ | 07/11/2010 | 133 | 23808
JHotDraw:6.1 | https://sourceforge.net/projects/jhotdraw/ | 07/10/2004 | 484 | 28399
JPropsEdit:1.0.2 | https://sourceforge.net/projects/jpropsedit/ | 22/07/2003 | 47 | 3374
Log4j:1.2.1 | https://github.com/apache/log4j | 17/05/2002 | 283 | 23363
Lucene:1.4.3 | https://github.com/apache/lucene-solr | 26/11/2004 | 244 | 25472
nTorrent:0.5.1 | https://code.google.com/archive/p/ntorrent/ | 28/11/2009 | 377 | 36286
Nutch:1.1 | https://github.com/apache/nutch | 26/06/2010 | 447 | 45357
outliner:1.8.10.6 | https://sourceforge.net/projects/outliner/ | 04/06/2004 | 418 | 35404
PDF Split and Merge:2.2.4 | https://sourceforge.net/projects/pdfsam/ | 25/06/2014 | 303 | 26717
pdfsam:2.2.1 | https://sourceforge.net/projects/pdfsam/ | 24/11/2010 | 299 | 26058
Rhino:1.6 | https://github.com/mozilla/rhino | 23/07/2007 | 175 | 58303
Rhino:1.7R1 | https://github.com/mozilla/rhino | 25/04/2011 | 329 | 78197
Tyrant:0.334 | https://sourceforge.net/projects/tyrant/ | 12/06/2005 | 179 | 41331
Xerces:2.7.0 | https://github.com/apache/xerces2-j | 24/06/2005 | 740 | 123275

Table 9 Number of publications using a given performance metric
Precision: 29
Recall: 26
F-measure: 17
Accuracy: 11
AUROC: 7
MCC: 3

3.6 RQ6: What Are the Ideas, in the Existing Research, Upon Which Code Smell Prediction Using Machine Learning May Be Built?
The following directions of future research and development seem promising in the light of the performed review:
1. Text analysis and process metrics may add new useful predictors, not correlated with the classic ones (e.g., product-based metrics), to the tool. It is unlikely that this information is used by any of the tools on the market. A fusion of the classic metrics and the new ones may lead to interesting results.
2. Search-based software engineering methods (e.g., multi-objective optimization algorithms using genetic programming [34, 54, 55, 63]) may also be combined with classic ML methods to improve the results even further.
3. Some studies, e.g. the one by Hozano et al. [48], analyze the level of agreement between developers on the same set of code smells. The study reports an inter-rater agreement of 0.222 (Feature Envy) to 0.421 (Data Class), measured by κ, a measure of the concordance or agreement among multiple raters described by Fleiss [9], as to whether a given structure is or is not a smell. It is important to take this into account when setting goals for and evaluating code smell prediction tools, despite the fact that some scientific publications reported over 95% accuracy or F-measure in detecting code smells when prediction models were trained on data sets produced by a small group of people with a similar background and experience (e.g., a small group of MSc students attending the same preparation lectures). Tool vendors aiming to serve a wide range of developers with different backgrounds and skill sets should expect low inter-rater agreement. An interesting path for further R&D activities seems to be the customization of code smell prediction models to specific projects.
4. Quite a lot of research (and thus, one may expect, code smell detection tool development) was performed using really old versions of software projects (e.g., webmail-0.7.10, released in 2002 and part of the Qualitas Corpus), often using very old versions of Java (e.g., Java 5, released in 2004), see Table 8. A promising path of future research would be to take into account how far

programming languages like Java have come; they now include, e.g., closures, streams, varargs, type inference for local variables, generics, enumerations, annotations, the foreach loop, static imports and vast changes in the standard libraries (introduction of immutable data types, improved concurrency, database access, IO and a lot of others). Furthermore, very few projects were used in more than three studies (these include Xerces, Gantt Project, Apache Ant, JFreeChart and Azureus). It would be important to create a modern version of reference data/project sets that would reflect modern constructs of programming languages (one such attempt was made by Grodzicka et al. [42]) instead of applying contemporary ML/AI techniques to old projects, which may not fully reflect how software is developed nowadays, and thus how code smells may look when new language constructs are employed. This need for a benchmark data set is also supported by the fact that results vary greatly between publications, which is likely caused by different training data—most of the publications do not publish their data, thus easy replication is not possible.
5. The number of predictors (metrics) used in some papers is large (e.g. in the paper by Fontana et al. [39]), but even in such papers it was possible to extract the most important predictors (e.g., those used to extract rules). In further research, it would make sense to focus on a rather small number of important predictors to avoid overfitting of the models, which would otherwise overemphasize patterns that are not reproducible (a minimal illustration of such predictor ranking follows this list).
6. Another direction of further research could focus on applying ML/AI methods to detect code smells which were not covered by any of the reviewed papers, see the list at the beginning of Sect. 4.
7. Publications generally use precision and recall as performance metrics. High precision is critical from the business point of view, while high recall is only nice-to-have (the cost of a smell detected later is generally lower than the cost of analyzing false positives). That said, valuable performance measures according to which prediction models should be evaluated are measures which take into account all four quadrants of the confusion matrix (e.g., MCC); otherwise the performance measure could be misleading. Hence, an important path of further research and development would be to evaluate models using better performance measures.
8. Data acquisition is generally a resource-consuming task. While it likely cannot be automated (since this would be equivalent to automated code smell detection and no machine learning would be necessary), it may be reasonable to use some kind of advisors. Advisors proposed in the literature range from regular code smell detection tools (as in the work by Fontana et al. [39]) up to verification of the effects of a potential refactoring (as in the work by Liu et al. [61]).
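The following sketch illustrates direction 5 above. It is our own illustration, not code from any reviewed study: the metric names and labels are synthetic, and it simply ranks product metrics by random-forest feature importance and keeps a small subset of them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
metric_names = ["LOC", "WMC", "CBO", "RFC", "LCOM", "NOM"]
X = rng.normal(size=(200, len(metric_names)))     # synthetic metric values per class
y = (X[:, 0] + 0.5 * X[:, 1] > 0.5).astype(int)   # synthetic smelly / not-smelly labels

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(metric_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
top_predictors = [name for name, _ in ranked[:3]]  # keep only a handful of predictors
print(top_predictors)
```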

Combining directions 3, 1 and 2 with the observation that the work by Hozano et al. [48] was inspired by Fontana et al. [39] and thus used product metrics as predictors, we pose the hypothesis that using only product metrics may yield good results for homogeneous groups of developers producing training data (for example, accuracy was over 95% in the results obtained by Fontana et al. [39]), but much worse results for groups of developers with more divergent backgrounds, as described by Hozano
et al. [48] (mean accuracy 43.7–63%). In the subject literature there are the following promising inspirations to embrace that may not yet be well explored by code smell detection tool vendors (a minimal tokenization sketch follows this list):

• process metrics, such as code change history,
• lexical analysis, such as similarities between fragments of code,
• deep learning, which includes analysis of source code as a token stream,
• search-based methods.
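To illustrate the "source code as a token stream" idea from the list above, here is a minimal sketch (our own simplification, not an implementation from any cited paper) that turns a Java snippet into a sequence of integer token ids, the usual input format for the embedding layer of a deep-learning model; the regular expression is deliberately naive.

```python
import re

# Very rough lexer: identifiers/keywords, numbers, and single-character symbols.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\S")

def tokenize(source: str) -> list[str]:
    """Split source code into a flat token stream."""
    return TOKEN_RE.findall(source)

def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to integer ids, growing the vocabulary on the fly."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]

if __name__ == "__main__":
    java_snippet = """
    public int total(List<Integer> xs) {
        int sum = 0;
        for (int x : xs) { sum += x; }
        return sum;
    }
    """
    tokens = tokenize(java_snippet)
    vocab: dict[str, int] = {}
    ids = encode(tokens, vocab)
    print(tokens[:12])   # e.g. ['public', 'int', 'total', '(', 'List', '<', ...]
    print(ids[:12])      # corresponding integer ids for an embedding layer
```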

4 Discussion

In this section we address our research questions and discuss our results and their implications. Table 7 shows that most research that applies ML/AI techniques to code smell detection focuses on the original smells by Fowler et al. [12] (with the exception of Blob, which is present in the original list as "Large Class"). The only smell among the top 5 most often researched that did not appear in the original list is "Spaghetti Code". While other researchers attempt to extend the smell list with new ones (for example, Kessentini and Ouni [55] introduced smells dedicated to mobile development), these attempts have apparently not yet caused a major change in the perception of code smells and have not yet made it into the mainstream of the discussed domain. What we expected but did not find in our review were publications addressing some of the code smells originally defined by Fowler et al. [12]: Data Clumps, Switch Statements, Middle Man, Alternative Classes with Different Interfaces, and Incomplete Library Class. Hence, almost one fourth of the code smells defined by Fowler were not considered in any of the analyzed papers.

4.1 Threats to Validity

It is important to assess the threats to validity (e.g., construct, internal, external), particularly constraints on the search process and deviations from standard practice.

4.1.1 Internal Validity

Internal validity concerns the process of performing the study. We exclude threats related to study reproducibility, as these are explained in detail in Sect. 4.1.4. An important threat arises from the data preprocessing layer—since we merged some of the categories (e.g., stripped parametrization from all methods, merged similar smells), it is possible that we accidentally analyze multiple concepts under the same common name (this would be particularly visible for the Blob smell and the SVM machine learning method). Additionally, we assume that metrics are calculated in a similar manner across publications. While it would seem reasonable for a software metric (e.g., WMC or LoC) to always represent the same value, it is not guaranteed, especially if researchers use different frameworks for calculating metric values (or even different versions of the same framework), that the metrics are actually implemented in exactly the same manner—even if the specification used is the same, there may be defects in implementation and variations between tools or versions of the same tool. The next threat to internal validity arises from the various projects and techniques used as training data for ML/AI algorithms. The range of used projects is remarkably wide, but for the sake of quantitative analysis we analyzed the values of performance measures with respect to code smell, machine learning technique and the used set of predictors, without regard to the characteristics of the specific data sets on which the values were calculated. While we were not able to find research proving that smells are context-sensitive, this does not seem unlikely, which constitutes a threat.

4.1.2 Construct Validity

Construct validity concerns the design of the study and its ability to reflect the actual goal of the research. To avoid threats in study design, we applied a systematic literature review procedure. To ensure that the researched area is relevant to the study goal, we cross-checked the research questions with developers from code quest and adjusted them several times to address the business needs. As always in a literature review, it is possible that some relevant studies were not included in the search. To address this issue, we conducted a verification using a quasi-gold standard procedure based on publications from 10 top venues from the year 2017. However, both the initial study selection and the quasi-gold standard search were performed using the Scopus database, therefore only publications present in Scopus are analyzed. The search term used in this systematic literature review is limited. For example, only papers referring to code smells via "code smell" OR "bad smell" OR antipattern OR anti-pattern OR "anti pattern" form the initial data set. While it is possible that some papers refer to the same concepts under different names, we believe that these terms are established well enough to ensure that a significant majority of relevant papers use them. We decided to focus on precision and treat recall as a slightly less important performance measure—this decision was made because the goal of the whole NCBiR project is to reduce the ratio of false positive errors. False positive errors relate to code snippets that are classified as smells but are not perceived as smells. While we believe that the focus on precision is a reasonable choice, it may be considered a risk to construct validity. It is also important to note that, while the term "code smells" was coined to name the original 22 coding structures described by Fowler et al. [12], it is not strictly restricted to them—on the contrary, this metaphor has been widely adopted to name not only structures in the source code, but also in the process, architecture and many other areas. This study is not restricted to the original set of smells; if other smells are well represented, they are analyzed as well.

4.1.3 External Validity

External validity concerns the possibility of generalizing the study to a broader range of applications. Most of the papers studied smells in the Java programming language, often in old versions (e.g., projects from 2005 in the work of Fontana and Zanoni [11]). While Java as a language is very common, we admit that it is not used in every branch of industry (e.g., iOS applications are generally written in Swift, web interfaces mostly use JavaScript, and Android has recently adopted Kotlin as a standard language competing with Java). Code smells described by Fowler et al. [12] were generally meant for object-oriented languages like Java, Smalltalk or C#—in other languages they may even be recommended solutions. This may be especially true when analyzing languages with a different paradigm—which is relevant, considering the recent increase of interest in functional programming (Scala, Elixir, Clojure) and the adoption of its features even in mainstream languages (e.g., Java). For example, Data Class is considered a code smell in the object-oriented paradigm, but is an absolutely fine pattern in the realm of functional programming. Projects used in most of the studies are relatively old. As a result, they are generally written in old versions of Java—even Java 5 or 6. As of today, the most recent version is Java 15 and the oldest supported LTS version is Java 11. Between these versions, significant changes were made to the core of the language, including shifting the paradigm from strictly object-oriented to object-oriented with minor functional features (like streams or immutable objects). This yields a threat to generalizing the results to newer versions of Java. A big threat to external validity is the technique used by researchers to assess the existence of code smells—in most cases smells were assessed by briefly trained students. This is a problem because the existence of a smell may be linked to a more complex program structure, which cannot be spotted by novices.

4.1.4 Reliability

Reliability is concerned with the possibility to reproduce the research and achieve the same results. To guarantee maximum reproducibility, we describe the research procedure in detail and attach links to the gathered data and processing scripts. However, some steps were performed manually. To further improve auditability, we provided a checklist for the first step of publication filtration. While most of the publication selection and data extraction was done by one person, we performed three levels of cross-checks (after initial screening, after final selection and after data gathering) with a high level of agreement. Another threat is that in our original search we considered all studies present in the database, i.e., we did not constrain the upper bound for the publication date. While this was done on purpose—to include as many recent studies as possible—the effect is that using the same search string will not yield the same results, which may impact study reproducibility.

5 Conclusions

Interest in academia in using machine learning techniques for code smell detection has clearly increased lately, as indicated by the growing number of papers published on the topic and the literature reviews conducted. It is clear that currently the most common predictors of whether a code sample constitutes a code smell are source code metrics. Typical machine learning algorithms are still dominant, with trees, SVMs, Random Forests and statistical methods being the most commonly used techniques. However, with the advent of deep learning, a new trend is visible: on the one hand feature reduction, and on the other automated feature extraction from code using tools like word2vec. Blob, Feature Envy, Data Class and Long Method are the four most commonly researched smells in the literature. It is likely that the existence of an independent data set provided by Fontana et al. [39] has boosted research in these particular areas. On the other hand, there are many smells that are referred to by only a single paper, which may mean that either they are not noticed by the research community, or the research on them is carried out under different labels (for example "anomalies" or "inconsistencies"). Precision, recall and F-measure are the three most commonly reported model performance metrics. The full confusion matrix is reported only in a few cases. The problem of data sets used for machine learning is visible and is actively addressed by researchers. In this review, we found only one data set used by several researchers [39], but new ones have already been published [15, 17]. We hope that this will lead the community to a shared understanding of the concept of each code smell, and to a solution that is relevant to the industry.

Acknowledgements This research was partly financed by Polish National Centre for Research and Development grant POIR.01.01.01-00-0792/16: "Codebeat—wykorzystanie sztucznej inteligencji w statycznej analizie jakości oprogramowania."

References

1. Al-Shaaby, A., Aljamaan, H., Alshayeb, M.: Bad smell detection using machine learning techniques: a systematic literature review. Arabian J. Sci. Eng. 45, 2341–2369 (2020). https://doi.org/10.1007/s13369-019-04311-w
2. Azeem, M.I., Palomba, F., Shi, L., Wang, Q.: Machine learning techniques for code smell detection: a systematic literature review and meta-analysis. Inf. Softw. Technol. 108, 115 – 138 (2019). https://doi.org/10.1016/j.infsof.2018.12.009 3. Buenen, M., Muthukrishnan, G.: World quality report 2016–17. Technical report, Sogeti and Hewlett Packard Enterprise, Capgemini (2016) 4. Caram, F., de Oliveira Rodrigues, B.R., Campanelli, A., Silva Parreiras, F.: Machine learning techniques for code smells detection: a systematic mapping study. Int. J. Softw. Eng. Knowl. Eng. 29, 285–316 (2019). http://orcid.org/10.1142/S021819401950013X 5. Chen, B., Jiang, Z.M.: Characterizing and detecting anti-patterns in the logging code. In: Proceedings—2017 IEEE/ACM 39th International Conference on Software Engineering, ICSE 2017, pp. 71–81 (2017). https://doi.org/10.1109/ICSE.2017.15 6. Di Nucci, D., Palomba, F., Tamburri, D.A., Serebrenik, A., De Lucia, A.: Detecting code smells using machine learning techniques: Are we there yet? In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 612–621 (2018). https://doi.org/10.1109/SANER.2018.8330266 7. Dieste, O., Grimán, A., Juristo, N.: Developing search strategies for detecting relevant experiments. Empirical Softw. Eng. 14(5), 513–539 (2009). http://orcid.org/10.1109/ESEM.2007. 19 8. Dybå, T., Dingsøyr, T.: Empirical studies of agile software development: a systematic review. Inf. Softw. Technol. 50(9–10), 833–859 (2008). http://orcid.org/10.1016/j.infsof.2008.01.006 9. Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378–382 (1971). http://orcid.org/10.1037/h0031619 10. Fontana, F.A., Pigazzini, I., Roveda, R., Zanoni, M.: Automatic detection of instability architectural smells. In: Proceedings—2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, pp. 433–437 (2017). https://doi.org/10.1109/ICSME.2016.33 11. Fontana, F.A., Zanoni, M.: Code smell severity classification using machine learning techniques. Knowl. -Based Syst. 128, 43–58 (2017). http://orcid.org/10.1016/j.knosys.2017.04. 014 12. Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D.: Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston, MA, USA (1999) 13. Gartner: Gartner says worldwide software market grew 4.8 percent in 2013 (2014) 14. Kitchenham, B., Budgen, D., Brereton, P.: Evidence-Based Software Engineering and Systematic Reviews. CRC Press (2016). http://orcid.org/10.1007/11767718_3 15. Madeyski, L., Lewowski, T.: MLCQ: Industry-relevant code smell data set. In: Proceedings of the Evaluation and Assessment in Software Engineering, EASE ’20, pp. 342–347. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3383219. 3383264 16. Malhotra, R.: A systematic review of machine learning techniques for software fault prediction. Appl. Softw. Comput. 27, 504–518 (2015). http://orcid.org/10.1016/j.asoc.2014.11.023 17. Palomba, F., Bavota, G., Di Penta, M., Fasano, F., Oliveto, R., Lucia, A.: On the diffuseness and the impact on maintainability of code smells: a large scale empirical investigation. Empirical Softw. Eng. pp. 1–34 (2017). https://doi.org/10.1007/s10664-017-9535-z 18. Palomba, F., Di Nucci, D., Panichella, A., Zaidman, A., De Lucia, A.: Lightweight detection of android-specific code smells: the adoctor project. 
In: SANER 2017—24th IEEE International Conference on Software Analysis, Evolution, and Reengineering, pp. 487–491 (2017). https:// doi.org/10.1109/SANER.2017.7884659 19. Palomba, F., Di Nucci, D., Tufano, M., Bavota, G., Oliveto, R., Poshyvanyk, D., De Lucia, A.: Landfill: an open dataset of code smells with public evaluation. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp. 482–485 (2015). https://doi.org/ 10.1109/MSR.2015.69 20. Palomba, F., Panichella, A., Zaidman, A., Oliveto, R., De Lucia, A.: The scent of a smell: an extensive comparison between textual and structural smells. IEEE Transa. Softw. Eng. (2017). http://orcid.org/10.1109/TSE.2017.2752171


21. Romano, S., Scanniello, G., Sartiani, C., Risi, M.: A graph-based approach to detect unreachable methods in java software. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing, SAC ’16, p. 1538–1541. Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2851613.2851968 22. Santos, J.A.M., Rocha-Junior, J.B., Prates, L.C.L., do Nascimento, R.S., Freitas, M.F., de Mendonca, M.G.: A systematic review on the code smell effect. J. Syst. Softw. 144, 450 – 477 (2018). https://doi.org/10.1016/j.jss.2018.07.035 23. Sharma, T., Spinellis, D.: A survey on software smells. J. Syst. Softw. 138, 158–173 (2018). https://doi.org/10.1016/j.jss.2017.12.034 24. Singh, S., Kaur, S.: A systematic literature review: Refactoring for disclosing code smells in object oriented software. Ain Shams Eng. J. (2017). https://doi.org/10.1016/j.asej.2017.03.002 25. Tempero, E., Anslow, C., Dietrich, J., Han, T., Li, J., Lumpe, M., Melton, H., Noble, J.: Qualitas corpus: a curated collection of java code for empirical studies. In: 2010 Asia Pacific Software Engineering Conference (APSEC2010), pp. 336–345 (2010). http://dx.doi.org/10. 1109/APSEC.2010.46 26. Wasylkowski, A., Zeller, A., Lindig, C.: Detecting object usage anomalies. In: 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2007, pp. 35–44 (2007). https://doi.org/10. 1145/1287624.1287632 27. Wen, J., Li, S., Lin, Z., Hu, Y., Huang, C.: Systematic literature review of machine learning based software development effort estimation models. Inform. Softw. Technol. 54(1), 41–59 (2012). http://orcid.org/10.1016/j.infsof.2011.09.002 28. Wohlin, C.: Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering EASE’14 (2014). https://doi.org/10.1145/2601248. 2601268 29. Zhang, H., Babar, M.A., Tell, P.: Identifying relevant studies in software engineering. Inf. Softw. Technol. 53(6), 625–637 (2011). http://orcid.org/10.1016/j.infsof.2010.12.010 30. Zhang, M., Hall, T., Baddoo, N.: Code Bad Smells: a review of current knowledge. J. Softw. Mainten. Evolut. Res. Pract. 23(3), 179–202 (2011). http://orcid.org/10.1002/smr.521

Systematic Literature Review References 31. Amorim, L., Costa, E., Antunes, N., Fonseca, B., Ribeiro, M.: Experience report: evaluating the effectiveness of decision trees for detecting code smells. In: 2015 IEEE 26th International Symposium on Software Reliability Engineering, ISSRE 2015, pp. 261–269 (2016). https:// doi.org/10.1109/ISSRE.2015.7381819 32. Barbez, A., Khomh, F., Guéhéneuc, Y.G.: A machine-learning based ensemble method for anti-patterns detection. J. Syst. Softw. 161, (2020). https://doi.org/10.1016/j.jss.2019.110486 33. Barbez, A., Khomh, F., Gueheneuc, Y.G.: Deep learning anti-patterns from code metrics history. In: Proceedings—2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019, pp. 114–124 (2019). https://doi.org/10.1109/ICSME.2019.00021 34. Boussaa, M., Kessentini, W., Kessentini, M., Bechikh, S., Ben Chikha, S.: Competitive coevolutionary code-smells detection. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8084 LNCS, 50–65 (2013). https://doi.org/10.1007/978-3-642-39742-4_6 35. Bryton, S., Brito e Abreu, F., Monteiro, M.: Reducing subjectivity in code smells detection: Experimenting with the long method. In: Proceedings—7th International Conference on the Quality of Information and Communications Technology, QUATIC 2010, pp. 337–342 (2010). https://doi.org/10.1109/QUATIC.2010.60


36. Chen, Z., Chen, L., Ma, W., Zhou, X., Zhou, Y., Xu, B.: Understanding metric-based detectable smells in python software: a comparative study. Inf. Softw. Technol. 94, 14–29 (2018). http:// orcid.org/10.1016/j.infsof.2017.09.011 37. Fakhoury, S., Arnaoudova, V., Noiseux, C., Khomh, F., Antoniol, G.: Keep it simple: Is deep learning good for linguistic smell detection? In: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018—Proceedings, vol. 2018-March, pp. 602–611 (2018). https://doi.org/10.1109/SANER.2018.8330265 38. Fontana, F.A., Zanoni, M., Marino, A., Mäntylä, M.V.: Code smell detection: towards a machine learning-based approach. In: IEEE International Conference on Software Maintenance, ICSM, pp. 396–399 (2013). https://doi.org/10.1109/ICSM.2013.56 39. Fontana, F.A., Mäntylä, M.V., Zanoni, M., Marino, A.: Comparing and experimenting machine learning techniques for code smell detection. Empirical Softw. Eng. 21(3), 1143–1191 (2016). http://orcid.org/10.1007/s10664-015-9378-4 40. Fu, S., Shen, B.: Code bad smell detection through evolutionary data mining. In: International Symposium on Empirical Software Engineering and Measurement, vol. 2015-November, pp. 41–49 (2015). 10.1109/ESEM.2015.7321194 41. Gauthier, F., Merlo, E.: Semantic smells and errors in access control models: a case study in PHP. In: Proceedings—International Conference on Software Engineering, pp. 1169–1172 (2013). https://doi.org/10.1109/ICSE.2013.6606670 42. Grodzicka, H., Ziobrowski, A., Łakomiak, Z., Kawa, M., Madeyski, L.: Code smell prediction employing machine learning meets emerging Java Language constructs. In: PoniszewskaMara´nda, A., Kryvinska, N., Jarza˛bek, S., Madeyski, L. (eds.) Data-Centric Business and Applications: Towards Software Development, vol. 40 of book series Lecture Notes on Data Engineering and Communications Technologies, pp. 137–167. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-34706-2_8 43. Guggulothu, T., Moiz, S.A.: Code smell detection using multi-label classification approach. Softw. Qual. J. (2020). https://doi.org/10.1007/s11219-020-09498-y 44. Guo, X., Shi, C., Jiang, H.: Deep semantic-based feature envy identification. ACM International Conference Proceeding Series (2019). https://doi.org/10.1145/3361242.3361257 45. Hadj-Kacem, M., Bouassida, N.: A hybrid approach to detect code smells using deep learning. In: ENASE 2018—Proceedings of the 13th International Conference on Evaluation of Novel Approaches to Software Engineering, vol. 2018-March, pp. 137–146 (2018). https://doi.org/ 10.5220/0006709801370146 46. Hadj-Kacem, M., Bouassida, N.: Deep representation learning for code smells detection using variational auto-encoder. In: Proceedings of the International Joint Conference on Neural Networks, vol. 2019-July (2019). https://doi.org/10.1109/IJCNN.2019.8851854 47. Hassaine, S., Khomh, F., Guéhéneucy, Y.G., Hamel, S.: IDS: An immune-inspired approach for the detection of software design smells. In: Proceedings—7th International Conference on the Quality of Information and Communications Technology, QUATIC 2010, pp. 343–348 (2010). https://doi.org/10.1109/QUATIC.2010.61 48. Hozano, M., Antunes, N., Fonseca, B., Costa, E.: Evaluating the accuracy of machine learning algorithms on detecting code smells for different developers. In: Proceedings of the 19th International Conference on Enterprise Information Systems, Vol. 2: ICEIS, pp. 474–482. INSTICC, SciTePress (2017). 
https://doi.org/10.5220/0006338804740482 49. James Benedict Felix, S., Vinod, V.: Design and analysis of improvised genetic algorithm with particle swarm optimization for code smell detection. Int. J. Innov. Technol. Explor. Eng. 9(1), 5327–5330 (2019). https://doi.org/10.35940/ijitee.A5328.119119 50. Jesudoss, A., Maneesha, S., Lakshmi Naga Durga, T.: Identification of code smell using machine learning. In: 2019 International Conference on Intelligent Computing and Control Systems, ICCS 2019, pp. 54–58 (2019). https://doi.org/10.1109/ICCS45141.2019.9065317 51. Karaduzovic-Hadziabdic, K., Spahic, R.: Comparison of machine learning methods for code smell detection using reduced features. In: UBMK 2018—3rd International Conference on Computer Science and Engineering, pp. 670–672 (2018). https://doi.org/10.1109/UBMK. 2018.8566561


52. Kaur, A., Jain, S., Goel, S.: A support vector machine based approach for code smell detection. In: Proceedings—2017 International Conference on Machine Learning and Data Science, MLDS 2017, vol. 2018-January, pp. 9–14 (2018). https://doi.org/10.1109/MLDS.2017.8 53. Kaur, A., Jain, S., Goel, S.: SP-J48: a novel optimization and machine-learning-based approach for solving complex problems: special application in software engineering for detecting code smells. Neural Comput. Appl. (2019). http://orcid.org/10.1007/s00521-019-04175-z 54. Kessentini, W., Kessentini, M., Sahraoui, H., Bechikh, S., Ouni, A.: A cooperative parallel search-based software engineering approach for code-smells detection. IEEE Trans. Softw. Eng. 40(9), 841–861 (2014). http://orcid.org/10.1109/TSE.2014.2331057 55. Kessentini, M., Ouni, A.: Detecting android smells using multi-objective genetic programming. In: 2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), pp. 122–132 (2017). https://doi.org/10.1109/MOBILESoft.2017.29 56. Khomh, F., Vaucher, S., Guéehéneuc, Y.G., Sahraoui, H.: A Bayesian approach for the detection of code and design smells. In: Proceedings—International Conference on Quality Software, pp. 305–314 (2009). https://doi.org/10.1109/QSIC.2009.47 57. Kiyak, E.O., Birant, D., Birant, K.U.: Comparison of multi-label classification algorithms for code smell detection. In: 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2019—Proceedings (2019). https://doi.org/10.1109/ISMSIT.2019. 8932855 58. Kreimer, J.: Adaptive detection of design flaws. Electronic Notes in Theoretical Computer Science 141(4 SPEC. ISS.), 117–136 (2005). https://doi.org/10.1016/j.entcs.2005.02.059 59. Liu, H., Jin, J., Xu, Z., Bu, Y., Zou, Y., Zhang, L.: Deep learning based code smell detection. IEEE Trans. Softw. Eng. (2019). http://orcid.org/10.1109/TSE.2019.2936376 60. Liu, H., Liu, Q., Niu, Z., Liu, Y.: Dynamic and automatic feedback-based threshold adaptation for code smell detection. IEEE Trans. Softw. Eng. 42(6), 544–558 (2016). http://orcid.org/10. 1109/TSE.2015.2503740 61. Liu, H., Xu, Z., Zou, Y.: Deep learning based feature envy detection. In: ASE 2018— Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 385–396 (2018). https://doi.org/10.1145/3238147.3238166 62. Maiga, A., Ali, N., Bhattacharya, N., Sabané, A., Guéneuc, Y.G., Aimeur, E.: SMURF: A SVMbased incremental anti-pattern detection approach. In: Proceedings—Working Conference on Reverse Engineering, WCRE, pp. 466–475 (2012). https://doi.org/10.1109/WCRE.2012.56 63. Mansoor, U., Kessentini, M., Maxim, B.R., Deb, K.: Multi-objective code-smells detection using good and bad design examples. Softw. Qual. J. 25(2), 529–552 (2017). http://orcid.org/ 10.1007/s11219-016-9309-7 64. Merzah, B.M.: Software quality prediction using data mining techniques. In: 2019 International Conference on Information and Communications Technology, ICOIACT 2019, pp. 394–397 (2019). https://doi.org/10.1109/ICOIACT46704.2019.8938487 65. Mkaouer, M.W.: Interactive code smells detection: an initial investigation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9962 LNCS, 281–287 (2016). https://doi.org/10.1007/978-3-31947106-8_24 66. Ocariza, F.S., Pattabiraman, K., Mesbah, A.: Detecting unknown inconsistencies in web applications. 
In: ASE 2017—Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 566–577 (2017). https://doi.org/10.1109/ASE.2017.8115667 67. Özkalkan, Z., Aydin, K.S., Tetik, H.Y., Belen Saglam, R.: Automatic detection of feature envy using machine learning techniques. In: CEUR Workshop Proceedings, vol. 2201 (2018). http:// ceur-ws.org/Vol-2201/UYMS_2018_paper_80.pdf 68. Palomba, F., Bavota, G., Di Penta, M., Oliveto, R., De Lucia, A., Poshyvanyk, D.: Detecting bad smells in source code using change history information. In: 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013—Proceedings, pp. 268–278 (2013). https://doi.org/10.1109/ASE.2013.6693086 69. Palomba, F.: Alternative sources of information for code smell detection: postcards from far away. In: Proceedings—2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, pp. 636–640 (2017). https://doi.org/10.1109/ICSME.2016.26


70. Palomba, F.: Textual analysis for code smell detection. Proc. Int. Conf. Softw. Eng. 2, 769–771 (2015). http://orcid.org/10.1109/ICSE.2015.244 71. Pradel, M., Heiniger, S., Gross, T.R.: Static detection of brittle parameter typing. In: Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, p. 265– 275. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10. 1145/2338965.2336785 72. Rubin, J., Henniche, A.N., Moha, N., Bouguessa, M., Bousbia, N.: Sniffing android code smells: an association rules mining-based approach. In: Proceedings—2019 IEEE/ACM 6th International Conference on Mobile Software Engineering and Systems, MOBILESoft 2019, pp. 123–127 (2019). https://doi.org/10.1109/MOBILESoft.2019.00025 73. Sahin, D., Kessentini, M., Bechikh, S., Deb, K.: Code-smell detection as a bilevel problem. ACM Trans. Softw. Eng. Methodol. 24(1) (2014). https://doi.org/10.1145/2675067 74. Sharma, P., Kaur, E.A.: Design of testing framework for code smell detection (OOPS) using BFO algorithm. Int. J. Eng. Technol. (UAE) 7(2.27 Special Issue 27), 161–166 (2018). https:// doi.org/10.14419/ijet.v7i2.27.14635 75. Tummalapalli, S., Kumar, L., Neti, L.B.M.: An empirical framework for web service antipattern prediction using machine learning techniques. In: IEMECON 2019—9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference, pp. 137–143 (2019). https://doi.org/10.1109/IEMECONX.2019.8877008

Risk Management of Procurement of the German Medium-Sized Industrial Companies with the Focus on Security of Supply

Stephanie Burghart and Milan Fekete

Abstract Dealing with the risk management of procurement is a big challenge for German medium-sized industrial companies. Viewed separately, risk management and procurement are very well researched areas of knowledge. The research gap lies in the weak link between the two topics, i.e., in a risk management of procurement that takes into account the particularities of the different economic sectors and company sizes. We developed recommendations for action to strategically secure the supply of goods not produced by the company itself. We consider these recommendations suitable for strategically improving the security of supply of German medium-sized industrial companies. For this purpose, a research approach based on Hans Ulrich's demand for application-oriented research was chosen. No theories are developed or tested by hypotheses; instead, the focus is on advising practice.

Keywords Risk management · Procurement · Medium-sized companies · Security of supply

1 Introduction

1.1 Relevance

Two trends have radically changed the challenges of procurement for industrial companies over the past decades: global sourcing, and the decreasing vertical integration in companies that are continuously focusing on their core competencies [1].


German medium-sized industrial companies (SMEs), too, meet the demand for purchased items less and less from their own production; instead, these items are procured in large quantities worldwide. This changes the tasks of the buyers in the companies. Instead of reacting to existing requirements and ensuring the supply of goods at the right time and in the right quantity, procurement increasingly manages requirements proactively and thus makes its own value contribution. Whereas in the past price was the main criterion for buyers, the focus today is also on risk management and supply security as strategic tasks of procurement. Buyers act as an interface between external and internal contact persons. In this strategic function, they ensure that innovations from existing and new suppliers are utilized in their own companies [1]. In view of advancing digitization, the importance of procurement as an interface will continue to grow. Industry 4.0 and the Internet of Things (IoT) will significantly change the performance of manufacturing companies, with serious effects on the supply chain. Industry 4.0 is not just about extensive automation within individual companies—it is about connecting the suppliers of suppliers with the customers of the customers of industrial companies in a wide variety of sectors. The Internet of Things goes even further: without human intervention, machines communicate with each other and independently order required production goods, just as they order their own spare parts or maintenance services. The automation of routine activities in purchasing cannot be stopped. What may frighten operational purchasers about the future is seen by strategic purchasers as an opportunity to focus even more strongly on activities such as risk management.

1.2 Goals and Objectives

The aim of this work is to support purchasers with recommendations for action for the various sub-steps of risk management. These were derived from findings and conclusions of publications on business management theory, as well as from expert discussions with purchasers and economists [2]. As outlined in Fig. 1, the organization of risk management creates the basis for it. The first step is to set up the organization of the risk strategy as well as the course of action. This is followed by those sub-steps of the risk management process that SMEs can carry out autonomously [4]. This means that no recommendations are given for the handling of risks, which usually takes place after the assessment of risks. There are three reasons for this approach. First, small and medium-sized companies have the same methods available to them as large companies for managing their risks; these range from avoiding, reducing, transferring, insuring, or diversifying risks to consciously taking them. Second, SMEs generally have only limited market power. They have to use the instruments available on the market to manage their risks; as a rule, SMEs cannot determine the conditions of insurance, derivatives, or currency swaps.

Fig. 1 Focus of risk management (Own presentation, adapted from Romeike [3])

The same applies to the relationship with their customers, suppliers, and other business partners. Third, SMEs are active in many different industries, whose market situations differ significantly. In summary, SMEs cannot determine and actively shape these aspects of their risk management themselves. Therefore, no recommendations for action are given here for those sub-steps of risk management that require interaction with external partners. In contrast, identification and evaluation are tasks of risk management that take place without the participation of external partners [5].

2 Theoretical and Conceptual Background

2.1 Procurement and Supply Security

Procurement is the counterpart to sales; both represent the input and output interfaces of each company to its markets [6]. Curt Sandig recognized this as early as 1935 and, in his still relevant contribution [7] "Grundriss der Beschaffung" ("Layout of Procurement"), demanded that business administration dedicate more attention to procurement [8]. Nevertheless, the focus of the economic view remained for a long time on selling [9].
Compared with the abundance of literature on production or sales, the need for research on procurement is thus regarded as insufficiently met [7]. Internally, procurement was perceived more as a "fulfillment aide" of production than as an independent function [10]. Even in the times of Industry 4.0, procurement is not yet recognized everywhere as a cross-functional mediator between the different functions in the enterprise and its suppliers [11]. An empirical study has shown that, due to increasing complexity, cross-functional cooperation in procurement is a prerequisite for corporate success [12]. In order to strategically manage the risks of securing demand, purchasers depend on cross-functional cooperation within the company [13]. Despite its high importance for the success of a company, cross-departmental collaboration of procurement within the supply chain is still an insufficiently researched topic [14]. The call for more research on SME procurement has been around for a very long time [15, 16]. Nevertheless, Stütz comes to the disillusioning conclusion that little can be found on the characteristics of procurement in these enterprises; on the contrary, statements from the literature contradict the results of empirical research or are inconsistent with them [15, 17]. One example is an investigation of the resilience of different sourcing strategies of SMEs in case of supply interruptions. It contradicts the common opinion that SMEs should prefer reactive sourcing strategies in order to ensure the supply of purchased items; the study shows that the decision depends on factors such as the type of risk perception, the risks, and the procurement strategy [18]. Security of supply, i.e., ensuring the supply of all purchased production goods not manufactured by the company itself, is one of the strategic tasks of procurement [19]. According to an empirical study, the supply of material is even more important to small and medium-sized companies (cited by over 80% of respondents) than price or quality targets (cited by 80 and 70%, respectively); securing liquidity was far behind with a good 30% [20]. An empirical survey from 2018 came to a similar conclusion: the majority of participants working for manufacturing companies considered security of supply a goal even more important than reducing purchase prices [12]. A slow change can be seen here, leading away from the predominantly operational goals of procurement in SMEs [20]. The high strategic potential of cooperative instead of competitive collaboration with suppliers is also pursued by the approach of presenting oneself to promising suppliers as a promising customer. This can mean, for example, that little effort is required in the operational process, and that the business relationship is not always only to one's own advantage. This can be worthwhile for SMEs in two ways. Firstly, operationally, because they usually do not have the purchasing power of corporations and have to find other ways to ensure their supply even in critical situations. In addition, SMEs are characterized by their flexibility and their high innovative strength. If, as "preferred customers", they manage to be the first to benefit from their suppliers' innovations, they can gain an edge over their competitors [21].

In order to implement the main objective of supply security, a quantitative procedure, the ABC analysis, is recommended especially for medium-sized companies, often with the advice that they can thereby direct their limited resources to the important A items [22]. The general availability of goods is thus implicitly assumed [23] and the focus is placed on the operational safeguarding of the supply of goods [24]. Medium-sized companies have not yet recognized the potential of strategic procurement and grant their procurers only little room for maneuver [22]. The strategic portion, i.e., influencing the demand before it even arises, receives little consideration, and compared with other enterprise functions procurement is even designated as a "remainder" [9]. This is surprising in view of the fact that the vertical range of manufacturing in industrial companies has been continuously decreasing for decades [10, 25]. The importance of strategic procurement has therefore not increased to the same extent as the share of goods purchased from external sources would have led one to expect.
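For readers unfamiliar with the ABC analysis mentioned above, the following is a minimal sketch with made-up purchase volumes; it only illustrates the basic mechanics of ranking items by annual purchase value and classifying them by cumulative share (here with the common 80%/95% thresholds), and is not a prescription from the cited literature.

```python
def abc_analysis(annual_value, a_threshold=0.80, b_threshold=0.95):
    """Classify purchased items into A/B/C classes by cumulative value share.

    annual_value: dict mapping item name -> annual purchase value.
    Thresholds are the cumulative shares up to which items count as A or B.
    """
    total = sum(annual_value.values())
    ranked = sorted(annual_value.items(), key=lambda kv: kv[1], reverse=True)

    classes, cumulative = {}, 0.0
    for item, value in ranked:
        cumulative += value / total
        if cumulative <= a_threshold:
            classes[item] = "A"
        elif cumulative <= b_threshold:
            classes[item] = "B"
        else:
            classes[item] = "C"
    return classes

if __name__ == "__main__":
    # Hypothetical annual purchase values in EUR.
    volumes = {"steel sheets": 420_000, "bearings": 150_000, "fasteners": 30_000,
               "lubricants": 12_000, "packaging": 8_000}
    print(abc_analysis(volumes))
```

In practice the thresholds and the handling of items that straddle a class boundary are company-specific choices; the point is merely that the ranking requires nothing beyond data that is usually already available in the ERP system.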

2.2 Small and Medium-Sized Industrial Companies in Germany

"A small business is not a little big business" [26]. The key message of this often-quoted sentence is that SMEs differ from large companies not only in their size. Worldwide, most companies are family businesses. It is assumed that risk management is also carried out differently in these companies than in companies run by salaried management. Consequently, there is a demand to examine the special features of family businesses in the context of the supply chain [27]. An empirical study of SMEs of various sizes showed that insufficient business management knowledge stands in the way of applying existing theoretical knowledge in practice. In response, it is recommended that chambers of industry and commerce and universities increasingly offer continuing education measures specifically for SMEs [28]. Such efforts would be justified insofar as the importance of SMEs and the SME sector can hardly be overestimated, worldwide and for the German economy, too. Their innovative power is one of the strengths of small and medium-sized enterprises. They are flexible [29] and pragmatic, and are thus able to respond to changing market requirements and customer needs [30]. As employers, German SMEs find it difficult to compete with large corporations. For German SMEs, too, demographic change and the changing expectations of employees born after 1980 are challenges that need to be taken seriously [31]. It is not a new insight that good employees play a significant role in innovation, productivity and ultimately in competitiveness [32]. This was confirmed by a survey conducted worldwide in 2019, in which the shortage of skilled workers was named as one of the ten most frequently mentioned dangers for companies [33].

Germany is particularly affected by the shortage of skilled workers, as it differs from other industrialized nations in two respects. On the one hand, no other major industrialized country generates such a high share of its gross national product from industrial companies [34]. Another special feature of the German economy is its very high number of hidden champions, most of which are medium-sized manufacturing companies [35]. They have specialized in niches where they lead the world market, and according to a study by the Fraunhofer Institute, their strong innovative power in technology competition clearly distinguishes them from their global competitors [36]. Nevertheless, SMEs have long been recommended to enter into cooperative ventures in procurement [10, 37]. Because of their striving for independence [38], this strategy is only of limited suitability for medium-sized companies and especially for "hidden champions" [36]. As such, they are strongly owner-driven and have a very long planning horizon [39]. Many are skeptical about such cooperation [20]. This appears understandable insofar as it concerns the procurement of goods which flow into products that represent a distinguishing feature or Unique Selling Proposition (USP) for SMEs. The possibility of threatening dependencies and know-how losses is only occasionally pointed out, and procuring success-relevant purchased articles through such cooperation is not recommended [40, 41]. Cooperation may not only entail risks for technological know-how but may also jeopardize competitive advantages generated by procurement [42]. This shows that not all recommendations for procurement can be used by all companies without restriction.

2.3 Procurement and Risk Management in SMEs

There is a lack of practical risk management recommendations for the different types of companies [43]. Processes such as risk management can represent a USP, just like material goods, if they contribute substantially to success and cannot easily be imitated by other companies [44]. For risk management to become a USP for SMEs, it must be integrated into existing business processes and be coherent with the goals and strategy of the company [43]. It is the task of the management of SMEs to establish an effective and efficient risk management system that is specific to the company and derived from the corporate strategy, and to actively develop the risk culture [45]. The risk policy specifies how risks are to be handled, which rules are to apply, and which instruments are to be used [29]. How these tasks are to be implemented and designed in a company-specific manner is not explained. It is therefore not surprising that many companies do not operate a planned risk management system and instead rely on improvisation in the event of damage [46]. This is true not only for SMEs, but also for the risk management of industrial companies of all sizes [47]. The recommendation is that companies should draw up plans in advance to maintain their business operations [33].

However, such a strategic approach is rarely implemented—neither in large nor in small companies. For large companies, a case study showed the paradoxical situation that their risk management in practice also differs from what is presented in theory. The goal of the study was to improve the risk management process in large companies; as a result, enterprise-wide, integrated, cross-functional risk management was recommended in order to ensure supply across the entire supply chain [48]. Small and medium-sized industrial companies have long been aware that their risk management is not at the level of large companies. Most of them do not pursue holistic, systematic risk management [49]. Consequently, it can be assumed that risk management in small companies is not more in line with theory than in large companies. Paradoxically, medium-sized companies seem to rate their risk management better than it probably actually is [29]. This fits in with a large-scale study that found that risk perception in corporations and SMEs is still different [29]. Risk management of procurement in SMEs is mainly concerned with operational tasks. A repeated study shows that the situation has not improved significantly in recent years; the sobering result is that six out of ten of the predominantly manufacturing companies do not actively manage the risks of their procurement [12]. Many SMEs only consider operational risks and act according to the motto that everything has gone well so far and that risks will be taken care of as soon as they occur [43]. If, in contrast to this fatalistic attitude, SMEs want effective risk management, a sound structural and process organization is a prerequisite [49]. Although this has been known for a long time, even a minimum of organizational structure and its documentation is not yet standard everywhere in SMEs [50]. Small and medium-sized enterprises are represented in all three economic sectors: industry, trade, and services. For none of these sectors is there a specific risk management system that is tailored to the needs of small and medium-sized businesses. What has long been common practice in marketing theory and practice [45] is not taken into account in risk management. The significant differences in how these three sectors of the economy generate their output have so far been ignored in the risk management of their procurement [51]. The topic of risk management has also become significantly more important in SMEs as a result of the Basel Capital Accords (Basel II) and the German Act on Control and Transparency in Business (KonTraG), which was passed in 1998 [29]. That is why investors demand and promote risk management especially in SMEs as a decisive success factor [52]. Since both regulations have a financial background, it is not surprising that their focus is on this aspect [53]; at best, procurement plays a subordinate role. Although risk awareness has also arrived in the management of medium-sized manufacturing companies [37], there is a lack of derivation of the risk management strategy from the corporate strategy [54]. This can lead to overreactions to emerging opportunities or risks. Such abrupt changes in strategy seem counterproductive in view of the finding that risk strategies are more successful the more consistently they are pursued [55].

How risk management is carried out and whether it is aligned with the company's objectives depends strongly on the size of the company [56]. There is empirical evidence that risks must be approached in a structured manner despite the heterogeneity of medium-sized companies. This applies not only to the industry and financial resources, but also to the normatively very important risk attitude of the owner [57]. SMEs are recommended to set up a staff position for risk management or, if this is not possible for them, to hire external consultants [58]. This is countered by the fact that SMEs are critical of external consultants. Derived from the results of an empirical study, it is therefore recommended that risk management be managed internally by controlling or accounting, or even by a tax consultant or auditor. The empirical study came to the conclusion that SMEs have a great deficit in the use of formal methods of risk management. Due to their heterogeneous structure, approaches to risk management tailored to SMEs were called for. The reason given for this appeal is that it is not expedient to impose concepts on SMEs that were developed with large companies in mind [28]. Another study confirms that concepts that were developed for the risk management of corporate groups are often given the rating "suitable for SMEs". It came to the conclusion that this undifferentiated approach contributes to the low acceptance of risk management in medium-sized companies. One of the main findings of this empirical study was that German medium-sized companies do not operate adequate risk management despite the self-interest of the managing owners. The long-term survival of the companies is therefore not secured [59].

2.4 Sub-steps of the Risk Management of Procurement

Several dozen methods are recommended for the individual sub-steps of supply chain risk management (identification, analysis, management, and control) in small, medium, and large companies [60]. The recommendations usually do not specify any criteria according to which the instruments should be selected; simple handling, transparency or the use of already existing data could be such decision criteria [61]. The basis of risk management is the identification of risks. Industrial companies are advised to have sales and procurement work together to analyze their risks [62]. For a long time, it has been emphasized that all risks should be completely recorded [63]. How this has to be done is regarded as company-specific and should be adapted to the individual company [64, 65]. In accordance with the contingency approach, no guidance is given on how this can be implemented in practice. Only rarely is it pointed out that this so-called completeness postulate, i.e., the complete recording of all risks, is actually an unsolvable task for risk management [61]; there are no criteria to decide when this goal has been reached.
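As a purely illustrative sketch of the identification and evaluation sub-steps (our own example with invented figures, not a method prescribed by the cited sources), a simple procurement risk register can score each identified risk by estimated probability and financial impact and rank the risks by expected loss:

```python
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    probability: float   # estimated probability of occurrence within a year (0..1)
    impact_eur: float    # estimated financial impact if the risk occurs

    @property
    def expected_loss(self) -> float:
        return self.probability * self.impact_eur

def ranked(register: list[Risk]) -> list[Risk]:
    """Rank identified risks by expected loss, highest first."""
    return sorted(register, key=lambda r: r.expected_loss, reverse=True)

if __name__ == "__main__":
    # Hypothetical procurement risks of a medium-sized manufacturer.
    register = [
        Risk("single-source supplier insolvency", 0.05, 800_000),
        Risk("raw material price spike", 0.30, 120_000),
        Risk("customs delay on imported parts", 0.20, 60_000),
    ]
    for risk in ranked(register):
        print(f"{risk.name}: expected loss {risk.expected_loss:,.0f} EUR")
```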

Instead, companies should find the balance that is appropriate for them, focus on significant risks and manage these risks in a way that is economically appropriate and suitable for them [66]. More than half of all medium-sized companies use only simple, qualitative methods to identify risks. Reasons for this could be low personnel resources [67] as well as a lack of methodological competence. Buyers need methodological competence to select optimal risk management tools for their company [68]. Although the range of quantitative methods for risk identification is increasing, ABC analysis is usually recommended [65]. In parallel, the portfolio analysis is also frequently recommended to procurers for the evaluation of their goods; apart from the quantitative component, this tool also has a subjective, i.e., qualitative, component [69]. Especially for medium-sized industrial companies, the integrated buying risk analysis (IBR analysis) would be better suited for the identification and evaluation of supply risks [70]. A cross-functional working group, which also determines strategic procurement risks, is recommended for the identification of risks [29, 43]. The IBR analysis is based on the input–output analysis commonly used in economics, for whose development Wassily Leontief was awarded the Nobel Prize for Economics in 1973 [71]. Measures to deal with identified risks range from conscious avoidance, insurance and transferring risks to the conscious taking of risks [72]. Many small and medium-sized companies (SMEs) hardly deal with their risks at all, only insuring themselves against natural hazards and ignoring their other risks [52]. If damage then occurs, the existence of the company can be threatened if employees are so upset by the occurrence of the damage that they are unable to take appropriate action. To prevent this, it is recommended that checklists be drawn up on how to proceed in the event of damage [73]. However, this procedure is not uncontroversial for SMEs because of the high effort required to prepare such checklists and the implicit emphasis on already known risks [74]. Risk aggregation, with which the entire risk potential of a company is to be cumulated, takes place in only 6% of all SMEs [75]. In view of the effort and methodological challenge for SMEs, this is not surprising [59]. What is surprising at first sight is that SMEs do not take the opportunity to learn from risk situations that have actually occurred. Only a few companies make use of this very simple measure to learn from mistakes and the lessons drawn from them and thus improve their risk management [29]. In the literature, the post-processing of damages, sometimes referred to as post-mortem analysis, is rarely mentioned, and if it is, then mostly in textbooks and practical guides as "lessons learned" [76]. Correspondingly, an empirical study showed that less than 2% of the German industrial companies surveyed were SMEs that carry out post-processing of damages; among large companies, at least 40% took advantage of this opportunity [49]. The task of controlling is to plan and control the various measures of corporate management and to monitor their success [77]. In many publications, controlling is presented as the last step in the cycle of risk management. Risk controlling can be organized centrally or decentrally according to the separation model or the integration model.
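Risk aggregation, which the sources above describe as methodologically challenging for SMEs, can be illustrated with a small Monte Carlo sketch. It assumes independent risks with fixed impacts and uses invented figures; it is meant only to show the principle of cumulating individual risks into a company-wide loss distribution, not an actual aggregation method from the cited literature.

```python
import random

def aggregate(risks, simulations=100_000, seed=42):
    """Monte Carlo aggregation of independent (probability, impact) risks.

    Returns the mean annual total loss and the 95% quantile ("bad year").
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(simulations):
        # Each risk either occurs or not in the simulated year.
        year_loss = sum(impact for p, impact in risks if rng.random() < p)
        totals.append(year_loss)
    totals.sort()
    mean = sum(totals) / simulations
    q95 = totals[int(0.95 * simulations)]
    return mean, q95

if __name__ == "__main__":
    # Hypothetical (probability, impact in EUR) pairs for procurement risks.
    risks = [(0.05, 800_000), (0.30, 120_000), (0.20, 60_000)]
    mean, q95 = aggregate(risks)
    print(f"mean annual loss: {mean:,.0f} EUR, 95% quantile: {q95:,.0f} EUR")
```

The 95% quantile gives a rough figure that can be compared with the company's risk-bearing capacity; more realistic models would use impact distributions and dependencies between risks.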

The integration model is recommended for SMEs: it is positive that the experts contribute their expert knowledge, but critical that they thereby monitor themselves. Although this would be avoided with the separation model, that model is less suitable for SMEs because of the separate function it requires [78]. In addition, there is the opposite recommendation that the separation model should also be preferred to the integration model by medium-sized companies. In this context, it is pointed out that, unlike company-wide controlling, risk controlling has the problem of proving its effectiveness: if risk management is successful, damages have no or less impact, which makes it difficult to prove its usefulness [73]. This makes the backing of company management for risk management all the more important. By clearly stating its support for the objective, setting the framework for action, and justifying the necessary effort, management counteracts the danger that risk management is only implemented pro forma [74].

3 Organization of Risk Management

Based upon the previous chapters, the following recommendations for action have been developed. They show how effective risk management of procurement should be organized in medium-sized industrial companies and how it can be carried out.

3.1 Organizational Structure

A strategy in the sense of a longer-term plan shows how a target state is to be achieved starting from an actual state [79]. This creates the prerequisites for a uniform risk awareness throughout the company and, ideally, for a consciously developed risk culture. The as-is analysis is intended to record and make transparent, as far as possible, all aspects that influence a secure supply of purchased items. A survey indicates that strategically planned risk management is more likely to realize the intended positive effects on the company's success [80]. In order to achieve this overall objective, the initial and target situations must be defined and communicated to all stakeholders. Due to the flat hierarchy of medium-sized companies, both are the responsibility of the management board. It must define the objectives of risk management, clearly stand by its introduction and implementation, and provide the framework for action. It is also its task to convince employees of the benefits of risk management [54]. The more stable a risk strategy is, the more successful it is; the opposite would be a situation-driven change in the basic strategic direction with improvised risk handling.

Another prerequisite for a successful risk strategy is that it is derived from the corporate objectives and integrated into corporate planning. Detached risk management by individual corporate functions, such as procurement, cannot guarantee this for industrial companies; on the contrary, conflicting goals could undo the efforts of individual departments. In order to prevent opposing efforts, employees must know the risk attitude of the owners. This awareness is the prerequisite for employees being able to assess opportunities and risks in the owners' interests and make appropriate decisions. In addition, what is defined as a risk must be known throughout the company [81]. What sounds so obvious is hardly ever addressed in the literature; Slovic provocatively states that “defining risk is thus an exercise in power” [82]. An example could be a new legal provision restricting the purchase and use of certain raw materials: depending on the management's assessment, this can be perceived as a risk or as an opportunity. Procurement risk management must also define how to work with suppliers and service providers. The type of cooperation can range from partnership and mutual benefit on the one hand to the consistent exploitation of every individual advantage on the other. In SMEs, security of supply depends more than in corporations on the prevailing style of cooperation. In many cases, large companies have greater negotiating power and can benefit from it. Smaller companies must compensate for this disadvantage, for example by pursuing a “preferred customer” strategy whose goal is a win–win situation. In addition to securing supply, it is also important to ensure that one's own company, rather than its competitors, is granted advantages such as innovations from suppliers [83]. Defining what is considered a risk within the company and how to cooperate with external partners is not only important for the organizational structure of risk management; it also has the psychological effect of sharpening the perception of possible risks. The risk awareness of the employees involved increases when they deal with risks that could affect the supply of purchased items. If culture in the broadest sense is defined as something that people create themselves, then every company has its own culture. This culture has either developed latently or, ideally, has been shaped by the management in such a way that it serves the success of the company. The value of a consciously developed risk culture is demonstrated by the fact that it is not sufficient merely to be able to identify and evaluate risks; what is decisive is the will to make risks transparent and to manage them. Both presuppose an appropriate way of dealing with previously made decisions if they turn out, in retrospect, to have been wrong. SMEs can benefit from deliberately sub-optimal risk management not only because of its positive influence on the risk culture, but also because it conserves their resources. With this heuristic approach, German medium-sized industrial companies can use the knowledge available in the company to master complexity within the bounds of their possibilities. In this way, SMEs achieve a useful result instead of giving up in the face of the demand for an optimal one.

SMEs differ from large companies in their flatter hierarchies and lower degree of structuring. This peculiarity makes it easier for them to communicate more directly across hierarchical and departmental boundaries and to cooperate more pragmatically than is possible in corporations. For medium-sized companies, this offers the opportunity for a constructive risk and error culture.

3.2 Process-Oriented Organization

While the previous chapter explained what is essential for the structural set-up of the organization, the question now is which guidelines the risk management process should follow, and not only in large companies but also in those whose size is perceived as manageable. For risk management to be accepted throughout an SME, it must be pragmatically adapted to existing organizational characteristics and implemented with as little effort as possible. Despite this, some fundamental decisions are necessary to show how the strategy should be implemented. The employees responsible for the operational process of risk management need rules and guidelines that define the framework within which they can make decisions independently. The management board does not need to draw up the specifications and rules for the process itself, but it must control their creation and have the results confirmed, documented, and communicated. This can result in a risk management process that is integrated into the company's processes and, ideally, contributes to the company's success as a unique selling point. When the process is developed by the employees responsible for risk management, they contribute their knowledge and experience and also feel more responsible for the process they have created themselves. The company's management board determines which persons are involved in the risk management process. Due to the characteristics of SMEs and the service provision of industrial companies, the staffing and cooperation of this group is of crucial importance for the risk management of procurement. For this reason, the following chapter deals with how the cooperation of the participants should be structured.

3.3 Cooperation

The purchasing department alone cannot fully oversee and handle the complexity of industrial production [2]. To do so, cooperation with, for example, production and R&D is essential. Decision-makers in operational practice who have recognized this are already working together across functions in risk management [84, 85], even though this approach is not yet firmly anchored in risk management theory.

In this way, every specialist can make their own contribution, instead of employees from research and development taking over the role of procurement or procurers trying to do development work together with their suppliers. In order to achieve partnership-based cooperation across departmental boundaries [7], it is the task of the management board to set the course for the risk management process with an organizational framework, because complex tasks are better mastered through interdisciplinary cooperation [86]. This may involve the possibility of substituting a raw material that is either no longer available or only available at higher cost, or the potential to increase sales or profits by adjusting the features of a sales item. An adjustment of purchase quantities can also reduce procurement risks if it can reasonably be implemented in terms of logistics and production. Nevertheless, according to one study, decisions on risk management issues were made by cross-functional teams in only 43% of the companies investigated [87]. Even though this study is limited to corporations, it can be assumed that the rate is even lower in smaller companies. This is indicated by a study conducted in the countries of the Visegrád Group, which clearly showed that responsibility for risk management there depends strongly on company size: the smaller the company, the more likely it was that either nobody or only the owner was responsible for risk management [88]. Psychological factors also speak in favor of forming cross-functional teams, for example when it comes to implementing the management's risk inclination. A study on company-wide risk communication in supply chains showed that employees do not generally align their risk decisions with the direction desired by the management board, even if they are aware of its risk inclination [89]. Cross-functional teams can counteract such deviating behavior of autonomously deciding employees. This is indicated by an experiment which showed that the risk attitude of individuals is either disproportionately risk-averse or disproportionately risk-seeking, depending on their own experience, whereas the risk behavior of a group showed significantly fewer extremes [90]. There are dozens of different risk management models, some with very different sub-steps. The publication by Burghart divides the process into the sub-steps of identification, evaluation, handling, follow-up, and controlling [2]. Recommendations for action are given for each sub-step that German SMEs can carry out internally and in a self-determined manner in order to increase their strategic security of supply. As already explained, because the handling of procurement risks involves cooperation with external partners such as insurance companies, suppliers, customers, or market companions, i.e., potential competitors, no recommendations for action are given for this step. For the sake of completeness, however, it will nevertheless be dealt with in general terms in the following.

3.4 Risk Identification

The vast majority of industrial companies work with bills of materials (parts lists). Unlike trading companies, it is therefore not sufficient for them to consider only the purchase volume of their purchased items, as the ABC analysis frequently recommended for risk identification does [65]. Trading companies sell the products they buy more or less unchanged, whereas industrial companies buy raw, auxiliary, and packaging materials in order to manufacture their sales products from them. This means that what is sold differs from what is purchased, which the procurement departments of German industrial SMEs must take into account when selecting a method for identifying their risks. SMEs have little knowledge of the broad and inconsistently defined range of instruments for the sub-steps of risk management. For this reason, and due to a lack of resources and know-how for their implementation, easy-to-use procedures for identifying risks are called for. Quantitative methods evaluate existing Enterprise Resource Planning (ERP) data. This keeps the effort of data collection low and helps to ensure that the additional effort is accepted by the executing employees, which is why easy-to-use quantitative methods are well suited to SMEs. In contrast, identifying risks with qualitative methods is a highly demanding task. There is a danger of consciously or unconsciously biased or distorted perception: known or recent risks can be overestimated, while unknown risks, or risks that are unpleasant for the responsible employee, can be underestimated or even suppressed [91]. For this reason, and because of the additional effort involved in carrying out two procedures, the statement often made in the literature that the combination of a quantitative and a qualitative method is the ideal way for German SMEs to identify risks [9] is only valid to a limited extent. This is all the more true since the results of quantitative methods are also subject to psychological influences. The fact that the data basis of every quantitative analysis is defined with the help of qualitative criteria is only mentioned indirectly in the literature. Many publications hardly deal with data selection and are limited to indications that “relevant” data should be selected “carefully”. Selecting the data basis is not trivial: consciously or unconsciously, the selection of data influences the results of analyses. An example of this is the evaluation of suppliers. The more important a purchased item is, the more carefully it should be considered whether and how the analysis takes into account whether its supplier is itself a manufacturer or only a dealer of the purchased item. Obviously, supply is better secured by two different manufacturers than by two dealers only, and even more so if those two dealers buy from the same manufacturer. Furthermore, the location of the suppliers can be included in the analysis and evaluation. In the event of a natural disaster, supply would also be endangered if purchases were made from different manufacturers that are all located in the same region. The devastating earthquake and subsequent tsunami in Japan in 2011 had a massive impact on the automotive supply chain [92].
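
A minimal sketch of how such considerations could be operationalized with existing ERP data is shown below; the data structure and field names (item, supplier, role, region) are hypothetical and would have to be mapped to a company's own master data.

from collections import defaultdict

# Hypothetical extract of ERP sourcing data:
# (purchased item, supplier, supplier role, supplier region)
purchase_records = [
    ("resin A",  "supplier 1", "manufacturer", "Asia"),
    ("resin A",  "supplier 2", "dealer",       "Europe"),
    ("screw M4", "supplier 3", "dealer",       "Asia"),
    ("screw M4", "supplier 4", "dealer",       "Asia"),
]

sources_by_item = defaultdict(list)
for item, supplier, role, region in purchase_records:
    sources_by_item[item].append((supplier, role, region))

for item, sources in sources_by_item.items():
    manufacturers = {supplier for supplier, role, _ in sources if role == "manufacturer"}
    regions = {region for _, _, region in sources}
    if len(manufacturers) < 2:
        # dealers may all buy from the same manufacturer, so they do not count as independent sources
        print(f"{item}: supply does not rest on at least two independent manufacturers")
    if len(regions) < 2:
        # a single natural disaster could affect all sources at once
        print(f"{item}: all known sources are located in one region")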

The example shows that the recommended course of action of cross-functional collaboration already contributes to a broader view when data is selected. Instead of assessing the initial situation only from the perspective of one department, a company-wide discussion can be initiated as to which circumstances are assessed as risks and which are not. The cross-functional discussion about which insights should be gained from the master data sharpens the perception of possible risks and promotes risk awareness in the organization. The recommended course of action for German SMEs is therefore to deploy cross-functional teams in order to benefit from their different assessments. The more employees from different functions are involved in the selection process and the more the existing knowledge of cross-functional teams is used for risk identification, the wider the spectrum of approaches. It is true that any risk management is only as good as its basis, the identification of risks. However, the premise often propagated in the literature that all risks must be identified can have the opposite effect on pragmatic and goal-oriented SMEs. They capitulate in the face of the postulate of completeness, insure themselves against natural hazards such as fire or storm damage, and refrain from active and holistic risk management [52]. In addition, the hurdle of the completeness postulate can tempt German SMEs to trust in their flexibility and to try to master an occurring damage case ad hoc. SMEs with this attitude overlook the psychological aspect of shock, which can render people incapable of action in crisis situations. The recommendation for action is therefore that German SMEs should not try to identify all risks. There are several reasons for this. Firstly, there is no guidance on how, and by which criteria, it is to be decided when this goal has been reached; the same applies to the requirement to identify all relevant risks. In this context, we refer once again to the research of Kahneman and others, which shows that people are reluctant to deal with possible losses. Consequently, even the compulsion to decide when all risks have been identified could lead to no risk management being implemented at all. In addition, German SMEs, with their limited resources, are generally critical of effort whose benefit is neither obvious nor promptly profitable. This skepticism also extends to making use of the support of external management consultants [28]. For these reasons, the recommendation is that cross-functional teams in German SMEs should use the available resources to implement the risk management that is feasible and consciously forgo perfection. Pointing out that even an immature, proactive risk management can improve the risk awareness of employees, the company's rating with banks, and the security of supply of purchased items can make it easier for SMEs to get started with holistic risk management. It can therefore be assumed that easy-to-use quantitative methods are advantageous for German SMEs in identifying risks. While portfolio analyses have both a qualitative and a quantitative component, quantitative methods such as the integrated buying risk (IBR) analysis or the ABC analysis rely exclusively on already existing data. Irrespective of this, quantitative forecasts always have qualitative components: the subjective selection of the data to be analyzed and the subjective assessment of the results.

In the literature, there are no uniform guidelines as to who decides which data should be analyzed in the ABC analysis and who evaluates the results. This distinguishes it fundamentally from the IBR analysis [70], which relies on cross-functional teams for the selection and evaluation of data in order to reduce the impact of missing know-how and possible psychological bias. Like the related input–output analysis, it is so far hardly ever used to identify risks [2]. Another significant difference to the ABC analysis is that the IBR analysis compares the analyzed data with the influence that the bought-in goods have on the company's success. This comparison of the profit contribution, i.e., the turnover or profit, with the purchased items that are needed to generate it is the core of the IBR analysis. At the beginning, the cross-functional team determines which data should be analyzed. Then the parts lists or bills of materials of the sales items are exploded. In the next step, the purchased items determined in this way are compared with the profit contribution that depends on them. In contrast to the ABC analysis, in the IBR analysis the procurement volume of the individual purchased items is relevant neither in monetary nor in quantitative terms. In this way, even a dependence on a purchased item that is insignificant in terms of price or quantity, but indispensable for a success-relevant sales item, can be revealed [70]. Bill of materials (BOM) explosion is already being used in practice: companies like BMW AG or Vorwerk & Co. KG explode their BOMs and compare the results with the company's results. With the IBR analysis, companies go one step further. By jointly determining which data is analyzed and by jointly evaluating the results of the analysis, cross-functional teams reduce the risk of bias, leverage the expertise of all parties involved, raise their risk awareness, and level out divergent risk perceptions [2]. The IBR analysis is particularly well suited for SMEs because it is based on existing ERP data and can be performed quickly and easily with standard spreadsheet programs. A study showed that the more deeply an ERP system is integrated, the more it helps to reduce business risks. Especially for large companies, it has been shown that their investment in the ERP system has a positive effect in terms of risk reduction [93]. Since SMEs use comparable ERP systems, this positive effect can be assumed for them as well.
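
As a rough sketch of the comparison described above (not the authors' own tooling), the following shows how a single-level bill of materials could be exploded and each purchased item mapped to the profit contribution of the sales items that depend on it; the figures and item names are invented, and real multi-level BOMs would be resolved recursively.

from collections import defaultdict

# Illustrative profit contribution per sales item
profit_contribution = {"machine X": 500_000, "machine Y": 120_000}

# Illustrative single-level bill of materials: sales item -> purchased items it requires
bom = {
    "machine X": ["special screw", "motor", "housing"],
    "machine Y": ["special screw", "sensor"],
}

# Attribute to each purchased item the profit contribution that depends on it
dependent_profit = defaultdict(int)
for sales_item, purchased_items in bom.items():
    for purchased_item in purchased_items:
        dependent_profit[purchased_item] += profit_contribution[sales_item]

# Rank purchased items by dependent profit contribution, irrespective of their own price or volume
for purchased_item, profit in sorted(dependent_profit.items(), key=lambda x: -x[1]):
    print(f"{purchased_item}: {profit:,} of profit contribution depends on this item")

In such a ranking, an inexpensive item like the screw can appear at the top simply because several profit-relevant sales items depend on it, which is exactly the dependence the IBR analysis is meant to reveal.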

3.5 Risk Assessment

For risk assessment, the second sub-step of risk management, much of what applies to risk identification in German SMEs applies as well. For this reason, factors that have already been dealt with, such as lack of know-how, scarce resources, cross-functional teams, or psychological resistance in dealing with risks, are not discussed in depth again.

The assessment is used to decide whether, and if so to what extent, a negative impact is expected from the identified risks. What sounds banal has a decisive influence on the allocation of resources: this decision determines how much effort and which methods are to be used to handle the issues assessed as risks. A good structural and process organization with rules and definitions can help to make the best possible use of scarce resources [78]. Without such guidelines, their efficient use cannot be guaranteed. For example, an item may be regarded as essential by the sales department but in fact make only a small contribution to the company's result. Without regulations, the sales department could push the purchasers to invest in securing the supply of this article, an effort that would have been better spent on an article with a higher profit contribution. As in the case of identification, the literature contains many, sometimes contradictory, recommendations for the assessment of risks. Due to the characteristics of SMEs and their limited business management knowledge, a heuristic approach is again suitable for assessing the risks of German SMEs. It must be accepted, indeed consciously accepted, that a suboptimal decision is made with incomplete knowledge and without sufficient time. This is particularly useful for those SMEs which, as industrial companies, have to cope with the higher complexity of risk assessment. In contrast to trading or service companies, they have to consider two different perspectives to ensure their supply: the sales items that they produce themselves and the purchased items required for this, such as raw materials or packaging. Sales and product management can evaluate the relevance of sales items, while R&D can assess the substitutability of purchased items and buyers a possible replacement. Risks are often assessed according to their probability of occurrence and the expected extent of damage, and both values are multiplied to give a product. The critical point here is that the product for a risk threatening the company's existence but with a very low probability can be the same as that for a risk with medium scope and medium probability of occurrence. Multiplication alone therefore does not reveal that the first-mentioned risk jeopardizes the existence of the company, albeit with a significantly lower probability [29]. In order to arrive at a consensual risk assessment, German SMEs can instead use the Gaussian sum formula, (n * (n + 1))/2 or, equivalently, (n² + n)/2. This method, which is very easy to use and requires no prior knowledge, has not yet been described in the risk management literature. By allowing the cross-functional team to reach an overall decision through a number of partial decisions, each of which is of minor importance, the inhibition threshold of individual persons against making a single, far-reaching decision is circumvented. The cross-functional approach also allows the effects of risks to be viewed from a broader perspective. An example will explain the process. A decision depends on 12 parameters. Each of these 12 parameters is compared individually with all 11 other parameters. The cross-functional team spontaneously makes a partial decision for each combination by a show of hands. These partial decisions are documented so that it is possible to reconstruct how the decision was reached. In this example, a total of 78 dichotomous partial decisions have to be made. This seemingly high number is put into perspective by the need for discussion that would otherwise arise among a large number of decision-makers on a complex topic with a large number of parameters.
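
For illustration, evaluating the sum formula quoted above for the twelve-parameter example gives the number of partial decisions stated in the text:

1 + 2 + 3 + … + n = (n * (n + 1)) / 2 = (n² + n) / 2
for n = 12: (12 * 13) / 2 = 78 dichotomous partial decisions.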

The recommended course of action for German SMEs is to make decisions in a cross-functional team even without knowing all the facts, in order to obtain results quickly and easily. This heuristic approach, i.e., obtaining not an optimal but a useful result with little effort, is suitable for medium-sized companies. Accordingly, during the risk assessment it should also be consciously accepted that a better decision might have been possible with a more time-consuming discussion, greater expertise, and possibly external consulting. With the Gaussian sum formula, cross-functional teams make decisions quickly and with little effort. All the functions involved evaluate their risks together and on the basis of premises defined specifically for the company. In this way, psychological resistance to making risk assessment decisions can be circumvented [2].

The term handling refers to those tasks that arise after the identification and assessment of risks. Handling seeks to influence either the probability of occurrence or the possible extent of damage of the risks. It is an essential core element of the risk management process and can have both a strategic and an operational dimension. While strategic activities are approached proactively, i.e., without an acute cause, the handling of incidents is an operational activity. Measures for handling risks range from conscious avoidance, insurance, shifting, sharing, and diversification to consciously taking risks. SMEs do not face the same challenges as large companies in managing their risks. It has already been outlined above that medium-sized industrial companies cannot fully identify all risks, even at great expense. At the same time, industrial companies face the challenge that the risks of their purchased items are not directly reflected in the sales items. Trading companies are at an advantage here: for them, it is directly recognizable how much turnover is endangered by an unavailable purchased item. As a rule, SMEs have less market power than large companies and therefore only limited possibilities to choose freely from the portfolio of different risk management instruments. They hardly have the opportunity, for example, to negotiate the terms of their insurance contracts with insurance companies. This is a disadvantage compared to corporations, which can use their procurement volume to actively influence the conditions of their risk measures. To compensate, SMEs could take measures that they can influence themselves. Small and medium-sized industrial companies are considered innovative and flexible. For the handling of identified risks, as well as for the handling of acute cases of damage, they could benefit from the recommendation to have their risk management carried out by cross-functional teams. In doing so, cross-departmental opportunities for solving strategic and acute challenges are explored. The substitution of raw materials by R&D, the change of the production process, or the switch to other sales items are examples showing that decisions on such issues require specific knowledge that purchasers in SMEs do not usually have.

Ideally, the company-wide knowledge of cross-functional teams is already used in the evaluation of the identified risks. In this way, it can be determined even before a concrete case of damage occurs whether and which alternatives are possible. Checklists can be created to determine how to deal with a loss event [73]. The effort to create checklists for articles or product groups is not insignificant. In order to keep this effort to a minimum, the recommended heuristic approach could deliberately accept that checklists are not perfect and may never be used. Nevertheless, it may be useful for German SMEs to create checklists, all the more so if they are used jointly in the identification and assessment of risks. The effort can be worthwhile for SMEs in three ways: first, checklists can make companies independent of individual knowledge carriers; second, they can prevent panic reactions in the event of damage; and third, the proactive consideration of potential risks and approaches to solving them can promote risk awareness and have a positive effect on the risk culture. In the event of a loss, for example, resources could then be concentrated on solving the problem instead of on mutual recriminations.

3.6 Risk Postprocessing

Risk assessments and their derivation are not consistently documented. One of the reasons could be that nobody likes to admit mistakes to themselves or others [29]. This stands in the way of the opportunity to benefit from “lessons learned”, i.e., the knowledge gained in overcoming damage. For this to happen, it must be comprehensible how and why the decisions that did not prevent a damage event had been made. Without documentation, the opportunity to learn from damage that has been overcome is usually missed. The recommended course of action is therefore to form a cross-functional team consisting of the employees who were involved in handling the acute incident. Once the acute incident has been overcome, the team analyzes whether and which lessons can be learned for future risk assessments. At the beginning of the follow-up process, it is advisable to use the newly gained knowledge and the documentation to analyze the previous risk assessment and to examine whether a different decision could have been made based on the information available at the time. The basic principle of follow-up should be to ensure even better precaution in the future, not to look for supposed culprits. To this end, the rule must apply that a decision that did not prevent a case of damage counts as a wrong decision only if the knowledge available at the time would have allowed a different decision. The importance of rules for the operational process, as well as for the general cooperation of the team, has already been discussed. Even very small companies can benefit from such considerations. No special know-how or instruments are required to follow up on overcome cases of damage, and there is no need for a staff position to moderate, coordinate, or document the findings.

3.7 Risk Controlling

Controlling or related terms are often mentioned as the last step of risk management, without being the end point of this ongoing process. Risk controlling, as a specific subarea of controlling, has fluid transitions to risk assessment and to the follow-up of damage cases [2]. SMEs are also given partly contradictory recommendations for the organization and implementation of their risk controlling. It would be fatal if SMEs were to forego controlling the risk management process in view of these contradictions. It is empirically proven that SMEs improve the quality of their risk management through controlling [91]. Because of their complex value creation, industrial companies in particular benefit from this. This advantage is counterbalanced by the effort and expense involved, which is justified on the one hand by concretely averted cases of damage and on the other hand by the fact that a constructive risk culture generates added value in its own right and thus contributes to the long-term survival of the company. The special feature of risk controlling is that the more successful risk management is, the fewer measurable results are obtained. The positive effect on corporate culture in general and risk perception in particular cannot be measured in the usual sense. This limits the possibilities of measuring the effectiveness of controlling within individual companies. However, a meta-study has shown that, overall, the advantages of proactive risk management in securing supply outweigh its drawbacks [91]. An empirical study published in 2019 points in the same direction and shows that the effort for risk management is justified: a high degree of maturity of procurement risk management has a clearly positive impact on the performance of companies [94]. Furthermore, the approach suitable for SMEs is to dispense with risk controlling as a separate sub-step. This is possible because the tasks of planning, managing, and controlling risks are also the tasks of the regular risk assessment. The same applies to risk follow-up, with the difference that a loss event has occurred beforehand. On both occasions, newly gained knowledge and changing assessments are incorporated into risk management. This controlling approach combines the advantages of the integration model and the separation model. The advantage of the integration model is that existing expertise is used and no additional resources of internal or external controllers are required; an already existing controlling department can, however, be involved in order to contribute its specific controlling know-how. The advantage of the separation model, namely that no department monitors itself, comes into play when the various departments contribute their expertise and their different perspectives to the cross-functional team. The effort for risk management is minimized, which contributes to its acceptance in German SMEs [2].

3.8 Interim Conclusion for the Derivation of Recommendations for Action

The previous parts have shown how risk management should be structured and what steps German SMEs can take internally and independently to strategically ensure the security of supply of their purchased items. A key finding is that isolated risk management by procurement in German industrial companies is not as effective and efficient as that of a cross-functional team. In industrial companies, one department alone cannot comprehensively oversee and evaluate the complex challenges in the procurement of their requirements. This is especially true for SMEs, for which scarce resources and limited know-how are considered typical characteristics. Cross-functional teams can compensate for both weaknesses by contributing their expertise to procurement risk management across departments. Through cross-departmental cooperation, the responsibility for all necessary measures can be distributed throughout the company. Decisions taken jointly can prevent mutual recrimination in the event of damage and create a congruent sense of risk and responsibility. A central message for SMEs is that risk management to ensure the security of supply can be implemented with little effort and know-how if it is supported by a broad base of functions. The recommendation is to keep the effort low by doing what is possible with the available resources and know-how. With this approach, which is suitable for medium-sized companies, it is consciously accepted that damage can occur despite risk management. These premises increase the willingness of SMEs to implement risk management. The prerequisite for such heuristic risk management is a constructive risk culture in which loss events are consciously accepted. Furthermore, it was shown that a structural and procedural organization is the indispensable basis for effective and efficient risk management, including for procurement in SMEs. Creating the guidelines for this is the normative task of the management board, even if there is no blueprint for risk management in SMEs [54]. This does not have to be a shortcoming if companies deliberately go their own way, which everyone in the company knows and follows together. Risk management goals and strategies derived from the company's objectives and communicated throughout the company can prevent the efforts of individual employees or departments from running counter to those objectives. With the recommendations for an individual risk management process that can be carried out without external partners, a unique selling point can be created and the competitiveness of German medium-sized industrial companies can thus be increased. A further recommendation for action is that SMEs should consciously use only simple methods for risk management, regardless of the fact that their results will not always be optimal.

Concrete examples of instruments that are easy to apply are the IBR analysis for identifying risks and the Gaussian sum formula for evaluating them in industrial companies. The follow-up of overcome cases of damage can help to avoid future damage, and a combination of the integration and separation models can be used for risk controlling in SMEs. Another finding is that the literature on risk identification practically does not address the fundamental influence of the data basis on the result of the analysis. The recommendation for German SMEs when identifying potential risks is to pay special attention to the selection of the data basis for quantitative analyses, instead of trying to identify every risk that might threaten them, and to refrain from trying to prevent every conceivable risk. Reasons for this include the lack of guidance on when such a goal would be reached, scarce resources, and the deceptive sense of security of having all risks in focus. In the evaluation, the cross-functional team should choose between active and reactive measures with a sense of proportion. To ensure that the available resources are used as purposefully as possible, it must be defined what is considered a risk for the company. Otherwise, employees make decisions at their own discretion that could deviate from the risk attitude of the management board. Once a case of damage has been overcome, German SMEs should take the opportunity to learn lessons for the future. The risk follow-up can easily be implemented by cross-functional teams without much effort. Reflecting on previous assessments can have a positive impact on company-wide risk awareness and risk perception, and it is suitable for all types and sizes of SMEs. Risk controlling is also indispensable for medium-sized industrial companies. Its planning, management, and control segments can be integrated into the assessment and follow-up of risks and can be carried out by a cross-functional team. SMEs do not need to verify the effectiveness of their risk controlling. For all sub-steps, it holds that medium-sized industrial companies cannot completely secure their supply of purchased goods despite risk management. The recommendation here is to consciously refrain from preparing for all eventualities.

3.9 Transferability

In the course of developing the recommendations for action, it became clear that transferability is less a question of different industries [2]. The results of an earlier study, which had identified a strong influence of company size but only a small influence of the industry on the quality of risk management in companies, already pointed in this direction [28]. By contrast, the influence of the different economic sectors, i.e., industry, trade, and services, became apparent due to their very different forms of service provision [95, 96]. These differences in value creation lead to significantly different challenges in procurement and risk management.

The survey tended to show no sector-specific differences in the responses. This trend was supported by the fact that most of the participants' companies work with bills of materials [2]. It can therefore be assumed that industrial companies can benefit from the recommendations for action across all sectors. Accordingly, the recommendations for action could be suitable for industrial companies from different sectors that, like SMEs in Germany, work with bills of materials. For trading or service companies they are only conditionally suitable because of their different way of value creation. This also applies to the IBR analysis recommended for risk identification in industrial companies: here, the bills of materials of the manufactured goods are exploded and their components are compared with the profit contribution to which the individual goods contribute. Trading and service enterprises do not work with bills of materials, so the essential step of comparison with the profit contribution, on which the advantage of this analysis is based, cannot be carried out. German SMEs and hidden champions differ from small and medium-sized industrial companies in other nations. If the conditions are otherwise comparable, however, it can be assumed that companies outside Germany can also benefit from the recommendations for action. Industrial corporations usually have a larger workforce and more know-how than SMEs, but they also face the challenge of complex procurement. The fact that individual corporations such as BMW AG explode their bills of materials as part of their procurement risk management suggests that non-SMEs can also benefit from the recommendations for action presented in this article. This is true if they, like the actual target groups, follow a holistic and heuristic approach to risk management with cross-functional teams and work with bills of materials. The recommendations for action also focus on strategic risk management, but this does not prevent their use in operational risk management. Cross-functional cooperation is also particularly useful for securing existing requirements, for example when a damage event has occurred and a raw material is no longer available. In this case, cross-departmental cooperation between purchasing and R&D, product management, or logistics can be used to work out a broader range of possible countermeasures. The question of transferability can therefore be answered in the affirmative in the majority of cases, provided the circumstances are comparable to those of German SMEs.

4 Discussion

In the following, the insights gained are interpreted, compared with the current state of knowledge, and possible limitations are pointed out. The significance and possible consequences of the results are presented. In addition, considerations are made as to what effects the findings might have, and suggestions are given as to how the existing knowledge could be supplemented by the recommendations for action.

The goal was to provide practitioners with theory-based recommendations for action in order to change the social reality in companies. Consequently, the recommendations for action do not necessarily have to represent new knowledge if they provide practitioners with meaningful design models and rules [97].

4.1 Outline of Recommendations for Action

A key finding is that cross-functional teams can carry out the risk management of strategic procurement in German SMEs with a broader all-round view than purchasing alone. A heuristic approach and the use of easy-to-use tools enable these companies to achieve useful results with little effort. This approach, which is suitable for medium-sized companies, does not contradict the demand for holistic risk management. When selecting master data for risk analyses, the considerable impact of this selection on the results must be taken into account. It was also shown that controlling the risk management process is obligatory and can be carried out both effectively and efficiently by cross-functional teams as a combination of the integration and separation models. German SMEs can do what is possible with the given means if they accept that the outcome of the risk management process will not be perfect. This pragmatic approach contradicts the completeness postulate, which implicitly demands flawless risk management.

4.2 Completeness Postulate

The completeness postulate's claim to perfection in risk identification can have a negative effect on the willingness of SMEs to invest in active risk management. Equally counterproductive would be an expectation in companies, generated by the completeness postulate, that risk management can be used to fully control all risks. The assumption that all risks have been identified and averted could be disappointed in the event of an unforeseen loss. In the face of such a failure, SMEs may no longer consider their risk management efforts justified and may question them in general. There is no doubt that the success of risk management depends on identifying risks as comprehensively as possible; companies can only manage those risks that they have identified as such. Nevertheless, publications that demand that all risks be identified could be supplemented by a heuristic approach. Particularly suitable would be a reference to the recommendation that German SMEs should do what they can with the available resources to strategically secure their needs. This deliberately lower hurdle to entering risk management could encourage German SMEs to adopt a holistic risk management approach.

4.3 ABC Analysis

The ABC analysis is one of the best known, and in the literature most frequently mentioned, methods for identifying risks. It has already been shown during the derivation of the recommendations for action that the process of applying the procedure, and the data selection that must precede it, are not described in more detail. This shortcoming applies even more to the large number of other, less frequently mentioned methods for the various sub-steps of risk management. Most publications limit themselves to the following restrictions of the ABC analysis. One is that it is a purely quantitative approach, a shortcoming that is to be overcome by combining it with a qualitative method so that users gain a more comprehensive view of possible risks. Furthermore, a reference to interdepartmental cooperation in risk identification could help to broaden the perspective. With this approach, a further limitation of the ABC analysis could be addressed, namely that even the absence of goods of low value can lead to a production stop. Publications could be extended by the recommendation to uncover such connections, for example with the IBR analysis. In principle, a clearer statement of the procedure's restrictions could prevent practitioners from regarding the ABC analysis as the definitive means of identifying their risks. Such an expectation, like the completeness postulate, can become dangerous for risk management as a whole, namely when a loss occurs that is triggered by a risk that was not identified in advance. The resulting disillusionment can lead to a general questioning of the sense of risk management. Again, companies could contain this danger with a deliberately chosen heuristic approach to risk management. When publications are supplemented by this aspect, it is important to emphasize that the use of easy-to-use tools does not exempt companies from dealing intensively with the selection of their data in advance.
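
To make the purely quantitative character of the procedure concrete, a minimal ABC classification by cumulative share of purchase volume might look as follows; the figures are invented, and the 80%/95% thresholds are common rules of thumb rather than fixed parts of the method.

# Illustrative annual purchase volumes per purchased item
purchase_volume = {
    "motor": 400_000, "housing": 150_000, "sensor": 60_000,
    "packaging": 8_000, "special screw": 2_000,
}

total = sum(purchase_volume.values())
cumulative = 0
for item, volume in sorted(purchase_volume.items(), key=lambda x: -x[1]):
    cumulative += volume
    share = cumulative / total
    # A-items up to 80 % of cumulative volume, B-items up to 95 %, the rest C
    category = "A" if share <= 0.80 else ("B" if share <= 0.95 else "C")
    print(f"{item}: {volume:,}  cumulative {share:.1%}  class {category}")

In this invented example, the special screw ends up in class C even though, as noted above, an indispensable sales item may depend on it, which is precisely the limitation the text describes.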

4.4 Master Data

The basic idea is that an analysis result cannot be better than its data basis; the English saying “rubbish in, rubbish out” makes this very clear. In addition to merely recommending procedures for data analysis, business administration has the chance, through academic teaching and publications, to sensitize users to the considerable effects of their decisions on the data basis. The following example of identifying possible risks in the context of an IBR analysis illustrates what consequences this can have. Depending on which data is evaluated, it either becomes visible or remains hidden that individual items with a low individual risk may carry a higher overall risk if they are all sourced from the same supplier. If this connection is to be uncovered, it is not enough to consider

whether goods are procured from only one source. It must also be considered who the suppliers are and what else they supply. Alternative sourcing from different distributors can become a risk, as can sourcing from different producers that are all located in the same region. In this respect, the recommendation to carry out an aggregation of risks in addition to the assessment may make sense [59, 75]. Treatises on the identification, evaluation, and possible aggregation of risks could be supplemented by a recommendation to define in advance, on a company-wide basis, what is considered a risk. Furthermore, it could be emphasized even more clearly that rules for the assessment of analogous risks reduce the effort involved, since then not every individual risk has to be assessed separately according to probability and impact. This would be in keeping with a heuristic approach and in accordance with the recommended action for an effectively and efficiently structured risk management. Again, the risk of a possible damaging event would be consciously taken despite the joint efforts, on the one hand to save resources and on the other hand to benefit from developing risk awareness across functional boundaries. While analyses of all kinds are among the classic areas of application for master data, master data is taking on a new key role in the course of industrial digitization. Digitization will radically change the service provision of manufacturing companies. Through the Internet of Things and Industry 4.0, the boundaries between individual companies will blur, as will the classic roles of customer and supplier. The prerequisite for this is closely interlinked cooperation with a smooth exchange of shared data along the entire supply chain. For such deep integration, more must be done than ensuring the consistency of one's own master data within the company: anyone who wants to exchange data automatically needs a set of rules to standardize the data of all partners. The significance of master data for Industry 4.0 and the Internet of Things can be demonstrated using the example of the unit of measure of an item. A special screw is required as an essential component for a machine and is listed in its bill of materials in the ERP system. This single screw can be classified as one piece in very different ways: pc., piece, or otherwise. It becomes even more inconsistent if the item is purchased from different suppliers in different packaging units. Screws usually do not have a high purchase price and are sold in lot sizes from 1 to 10,000 pieces. Suppliers do not classify screws uniformly, sometimes treating them as an independent product and sometimes as a spare part for higher-value goods. This simple example alone reveals the obstacles the Internet of Things faces in implementing self-steering planning, control, and provision of data along the supply chain. In many cases, inconsistent data still hinders the exchange of information from the manufacturer via logistics providers and distributors to the fabricator and its customers. The buzzword to remedy this is data governance or data management; both terms are mostly used synonymously. To put it succinctly, data governance regulates how and which data is managed throughout the company and who is entitled or obliged to maintain it. The main areas of application include Big Data with its variants Enterprise Search, Machine Learning, Data Mining, and Predictive Analytics.
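
As a very small, hypothetical illustration of the kind of rule set that data governance provides, the following sketch normalizes inconsistent unit designations such as those mentioned above; the mapping table and conversion factors are invented for this example and would have to be agreed company-wide, or supply-chain-wide, in practice.

# Hypothetical governance rules: map unit designations to one base unit and convert packaging units
UNIT_MAP = {"pc.": "piece", "pcs": "piece", "piece": "piece", "stk": "piece",
            "box of 100": "piece"}
PACK_SIZE = {"box of 100": 100}  # packaging units expressed in base quantity

def normalize(quantity, unit):
    key = unit.strip().lower()
    if key not in UNIT_MAP:
        raise ValueError(f"unknown unit '{unit}' - extend the governance rule set")
    return quantity * PACK_SIZE.get(key, 1), UNIT_MAP[key]

print(normalize(5, "box of 100"))  # -> (500, 'piece')
print(normalize(10, "Pc."))        # -> (10, 'piece')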

Outside of the IT environment, hardly any literature has been found on the very broad facets of data governance with which practitioners are confronted in terms of organization and practical implementation. Application-oriented research on data management could provide those responsible in companies with recommendations for action to design their data management. Business administration should not leave it to management consulting firms to support organizations in using data governance to create and develop the basis for their digitization. Appelfeller and Feldmann have already taken a step in this direction with their research on digital transformation in companies. At the end of their likewise application-oriented research, they saw a need for further research in specific areas such as the various industries or types of service provision [98].

4.5 Risk Management in Procurement

In the course of the literature search carried out from 2015 onwards, only a weak link between the two subject areas of risk management and procurement was identified as a research gap. Even then, there was no lack of theoretical knowledge on either topic. Significantly less literature presented specific findings for the risk management of small or medium-sized industrial companies and their strategic procurement. In the entire context examined, only in individual cases did authors give application-oriented recommendations for action on how to put the various theories into practice. This was confirmed by a repeat search of academic journals such as the Harvard Business Review or the Journal of Supply Chain Management at the end of 2019, which showed that the situation had not changed significantly since the initial search. While the full-text search still found a large number of articles for the individual keywords “Risk management”, “Procure*”, “Purchase*”, or “SME”, fewer and fewer articles were found when the search terms were combined and restricted to titles or keywords. Of the few articles found, none dealt with the core topics of this work. The result shows that the focus in both research fields, risk management and procurement, remains on the further development of scientific theories. The fundamentals of risk management and procurement can thus be considered theoretically well-investigated subareas of business administration. The challenges of risk management within procurement are still much less well investigated. Likewise, the literature on risk management hardly deals with the subarea of procurement and its specific characteristics. Publications that classify companies in greater depth within these two subject areas are also hard to find. This applies to the different branches of the economy and industries as well as to the size aspect, i.e., whether the companies are small, medium-sized, or large. The few publications that deal with procurement and its risk management in companies of different sizes or in different sectors of the economy are often sources that are only cited to a limited extent in academic circles.

Nonetheless, non-academic publications also have their place if their application-oriented recommendations for action or experience reports encourage practitioners from SMEs to pursue risk management even if it remains imperfect, whether deliberately or due to a lack of know-how.

4.6 Business Management Research and Teaching

Hans Ulrich, one of the co-founders of the St. Gallen Management Model, had already expressed the view in the mid-1980s that practitioners should preferably be enabled to handle their future challenges themselves using design models and solution methods [97]. For academic teaching at German universities, this could mean placing even greater emphasis on communicating the results of application-oriented research. In addition to basic knowledge and methodological skills, more methodological competence could be taught so that future managers can adapt their theoretical knowledge to the realities of practice. Gloger, in his at times quite harsh criticism, writes that the training of prospective business economists misses the reality of business, neglects owner-managed medium-sized companies, does not sufficiently promote independent, deductive thinking, and instead instills in graduates an uncritical faith in numbers [99]. Even if this criticism may be exaggerated, it does highlight the weakness of a lack of practical relevance, which Ulrich had pointed out decades ago. If his demand for the equal standing of application-oriented and basic science were implemented in research and teaching, business economists could themselves close the gap that exists between risk management and procurement in practice. In doing so, company-specific characteristics such as ownership, differences in size, industry, degree of maturity, or the type of service provision could be taken into account. An example of this is the post-processing or follow-up of cases of damage. To date, its use is also hampered by its low profile and by the fact that it is rarely named as a fixed component of the risk management process. Consequently, only a fraction of companies take advantage of the opportunity to learn systematically from possible mistakes. One of the recommendations for action advises regarding this easy-to-use instrument as an integral part of risk management. Companies of all sizes can benefit if research and teaching uniformly treat and communicate the follow-up as a regular step of the risk management process. This would establish an easily applicable component of holistic risk management. If, in parallel, more methodological competence were taught in the course of studies in order to adapt and apply existing instruments to the needs and circumstances of companies, a further recommendation of this paper would be to implement a coordinated sequence of measures instead of just implementing individual measures.

This expectation of a theory-based, coordinated sequence of risk management as a holistic process runs up against the fact that there is no gold standard for risk management. The danger is that implementing only isolated, autonomously running partial steps, such as the identification or handling of risks, can harm companies. Individual measures can even run counter to corporate goals if they are not aligned with those goals and embedded in an organizational framework.

4.7 Organizational Integration

If management wants to counteract such undesirable developments, the goals of the company and of individual functions such as procurement must be known and coordinated. Publications on the risk management of procurement could stress this connection even more clearly and refer to the recommendation of a holistic risk management embedded in the organization of the company. Furthermore, procurers in SMEs could benefit if publications were supplemented by the recommendation that a heuristic risk management process is better than no risk management at all. This finding is significant because the problem of the heterogeneous terminology of risk management, as well as the issues discussed above, cannot be overcome in the short term. In particular, the clear imbalance between theory-building and application-oriented research underlines the call for more methodological competence, which managers need in order to put existing theoretical knowledge into practice. While imparting methodological competence is the task of teaching, practitioners themselves are responsible for their continuous further education. In order to select the most suitable sub-steps and risk management instruments for their companies, they should be able to draw the right conclusions from published empirical studies, which are often based on the contingency approach.

4.8 Sector-Specific Risk Management

SMEs are represented in all sectors of the economy and in the most diverse industries. Low resources and limited know-how are regarded as typical for them. This makes the demand for simple, type- and sector-specific procedures for the risk management of SMEs understandable. There is evidence that industry affiliation has little influence on the nature of risk management in these companies. In contrast, the influence of the different economic sectors on the risk management of procurement has become clear [2]. In principle, manufacturing companies, which include industrial companies and craft businesses, can be distinguished from trading and service companies on the basis of their different way


of value creation. The consequence for risk management would be a specific risk management of procurement for each of the three sectors. Such an approach has been implemented in marketing since the 1970s: a distinction is made between trade marketing, service marketing and the marketing of producing manufacturers, thus taking their different business models into account. This successfully applied differentiation has not yet been adopted by risk management or procurement in general, nor by the risk management of procurement in particular. Nevertheless, many publications on procurement or risk management implicitly address the conditions in manufacturing companies. In practice, this means that when companies are looking for risk management methods, they cannot clearly see whether a method is suitable for a particular sector, and if so, for which one. Two extremes can be the result: either the methods are perceived as unsuitable from the outset and not implemented at all, or they are applied but do not bring the expected added value despite the effort involved. It would also not be productive to implement risk measures only for the sake of form, for example to improve the rating given by banks or other external stakeholders. In both cases, the probability of establishing risk management in SMEs in the long term is not high. Furthermore, the size of the company could also be taken into account if the different branches of the economy are provided with recommendations for action that are tailored to their specific way of providing services.

4.9 Size-Specific Risk Management

Consequently, a correlation between the size of a company and the quality of its risk management can be assumed: the larger a company is, the better the quality of its risk management. Further, the process of risk management in SMEs differs significantly from that in corporations. A look at the organization charts of corporate groups shows that risk management there is often the task of specially assigned employees. In SMEs, the task, or at least the responsibility for it, lies either with the owner or with individual departments such as procurement. It is assumed that the level of business management knowledge among procurement employees in corporate groups is higher than in small and medium-sized companies. Further, it is assumed that small and medium-sized enterprises have smaller procurement volumes. If both assumptions hold, they have a weaker negotiating position vis-à-vis their suppliers. Furthermore, it is assumed that SMEs generally have fewer resources than large companies and therefore shy away from the effort involved in risk management. These obvious points of discussion alone suggest that risk management in SMEs is usually different from that in large companies. In practice, this means that sophisticated procurement or risk management procedures tend to be used less frequently in SMEs. Consequently, SMEs can benefit


from publications on their risk management being supplemented by recommendations for action to consciously use simple methods for the various steps of risk management. A reference to the recommended action of supporting a holistic risk management with a broad base of functions could compensate for the possibly low knowledge of individual participants. The elementary influence of the owner on the handling of risks is another reason for an SME-specific risk management. SME entrepreneurs are considered to be very concerned about their independence. The results of an empirical study on risk management in SMEs are in line with this: it shows how reluctant SMEs are to work with external consultants and how skeptical they are about methods that are apparently tailored more to the circumstances in corporate groups [28]. Conversely, methods or recommendations for action that take into account the different sizes of companies and economic sectors as well as the specific characteristics of SMEs have a better chance of being used.

4.10 Procurement Cooperation

A striking example of the fact that recommendations for action are of limited use if they ignore the characteristics of their addressees is the recommendation that SMEs should enter into procurement cooperations. This method belongs to the partial step of risk handling, which is not treated here. Purchasers are recommended to bundle their requirements with those of other companies and to procure jointly. In this way, SMEs should be able to compensate for their weak negotiating position and low market power, and thus better secure their supply. This recommendation implicitly assumes that the requirements of small and medium-sized industrial enterprises are insignificant. It does not take into account that, especially in Germany, many medium-sized companies are hidden champions, i.e., market or even world market leaders in their field. Furthermore, the striving of their owners for independence and sovereignty, as driving forces of their actions and distinctive characteristics of medium-sized companies, is not taken into account. In general, cooperation can be quite useful for the procurement of standardized products, for instance when bundling quantities yields purchasing advantages. Recommendations of this procedure refer to such advantages, however, without pointing out possible restrictions on its use or working out the further hurdles of joint procurement more clearly. Consequently, SMEs may overlook existing antitrust risks, or they may underestimate the organizational effort of a purchasing cooperation between different companies. In order to conclude joint contracts with suppliers, mutual trust must be built up. This may involve liability for non-payment by cooperation partners or the discreet handling of data exchanged between them.


Apart from this, the disclosure of requirements, conditions or sources of supply necessary for a cooperation can even jeopardize a company's own supply of favorably priced products if the joint requirements exceed the quantity available on the market. The bundled demand then generates a sudden increase in demand, which can have a negative effect on procurement prices. These risks can affect all companies that enter into a procurement cooperation. For hidden champions, a cooperation with potential competitors can even jeopardize their own competitive advantage if goods are procured that are related to the company's USP [40]. So there are a number of situations in which procurement cooperation is not fully appropriate. In order to prevent SMEs from having negative experiences with it, publications could make both academics and practitioners even more aware of possible restrictions. In addition, it could be pointed out which types of enterprises can benefit most from the instrument and under which conditions procurement cooperations deliver their full yield. For procurement in SMEs, this means that purchasers may be disappointed by recommendations such as the one on cooperation with other companies if there is no indication of their limitations, in particular if they only realize those limitations afterwards. Under certain circumstances, they might then also be skeptical about economically sound recommendations for action in the future and forgo the opportunity to use them. For the selection and implementation of purchasing instruments as a whole, this means that users should have sound methodological competence in addition to mere methodological knowledge. Until such considerations are implemented, indications of possible restrictions regarding the size of the companies and their economic sector can be given in order to guide practitioners to the instruments most suitable for their application context.

4.11 Risk Management as a Risk

A paradoxical situation arises, for example, when companies collect large amounts of data with ever more procedures and greater effort and process them in a complex and time-consuming manner. By trying to manage all risks in this way, not only SMEs can easily overlook the fact that complex systems can have the opposite effect. Instead of achieving greater security, possible risks can be overemphasized and opportunities missed. Risk management thus becomes a quasi self-fulfilling prophecy [100]. Even if no one should seriously doubt the rationale of risk management, a treatment of the topic that ignored this discrepancy would be incomplete. To ensure that companies do not suffer any damage from their risk management, the discussion ends with an appeal to all those responsible in the companies to conduct their risk management with a sense of proportion. A second appeal is addressed to the authors of publications on risk management: to have the courage to plead for deliberately leaving gaps [101].


5 Conclusion

5.1 Synopsis

The proposed recommendations for action for risk management in procurement are an efficient means for German medium-sized industrial companies to strategically ensure the supply of goods they do not produce themselves. For this purpose, an application-oriented research approach based on Hans Ulrich was explicitly chosen. It starts from the problems of practice, solves them with the help of business management theories, derives recommendations for action from them, and always strives to advise practice. The recommendations for action are suitable for proactively securing the supply of goods to industrial companies. We think that these recommendations for action can be extended to the entirety of German medium-sized industrial companies. One of the main findings of this study was the fundamental importance of cross-functional teams [2]. In industrial companies, their relevance for procurement risk management can hardly be overestimated. On the other hand, we have to accept that there can be no perfect risk management. With this elementary insight, German medium-sized industrial companies gain the freedom to focus existing knowledge and scarce resources on their innovative strength and flexibility. These particular strengths of SMEs help to ensure their long-term survival.

5.2 Further Research

If researchers want to support SMEs in securing their long-term existence with recommendations on the risk management of procurement, they will find a broad spectrum of future research approaches. The sub-step of managing risks was excluded from the development of recommendations for action because it usually requires cooperation with external partners such as insurance companies or suppliers. It could be an approach for further research to include this step in order to offer SMEs recommendations for action for the entire process of their risk management. Recommendations for action could also be developed for operational risk management, which is likewise excluded here. Operational purchasing deals with already existing needs. The aim of such recommendations for action would be to ensure the supply of SMEs with existing requirements of purchased articles in accordance with the materials management optimum. Another starting point for future research could be the further development of existing instruments for the various sub-steps of risk management. Taking into account the special features of SMEs and the various sectors of the economy offers application-oriented researchers a broad field of activity.


In addition to the field of risk management, master data management offers potential for further research. Digitalization will significantly change value creation in industrial companies. This forecast forces the realization, in research and teaching as well as in practice, that consistent data is the basis for this. Industrial companies can use the IBR analysis to identify potential risks for those purchased items that are essential to a company's performance and profit contribution, regardless of their purchase price or procurement volume. Through cross-departmental cooperation in the identification and evaluation of the IBR analysis results, this tool draws on the entire expertise available throughout the company.

References 1. Van Weele, A.J., Eßig, M.: Strategische Beschaffung. Grundlagen, Planung und Umsetzung eines integrierten Supply Management. Springer Gabler, Wiesbaden (2017). ISBN 9783658084905 2. Burghart, S.: Risikomanagement der Beschaffung deutscher mittelständischer Industrieunternehmen mit Fokus auf Versorgungssicherheit. Dissertation. Comenius University in Bratislava, Faculty of Management (2020) 3. Romeike, F.: Der Prozess des strategischen und operativen Risikomanagements. In: Romeike, F., Finke, R.B. (eds.) Erfolgsfaktor Risiko-Management. Chance für Industrie und Handel Methoden, Beispiele, Checklisten, pp. 147–161. Gabler Verlag, Wiesbaden (2003a) 4. Mugler, J.: Grundlagen der BWL der Klein- und Mittelbetriebe. Facultas.wuv Univ.-Verl. Manual, Wien (2008) 5. Mugler, J.: Betriebswirtschaftslehre der Klein- und Mittelbetriebe. Band 1. 3. Wien. Springer, New York. Springers Kurzlehrbücher der Wirtschaftswissenschaften (1998) 6. Grochla, E.: Der Weg zu einer umfassenden betriebswirtschaftlichen Beschaffungslehre. Die Betriebswirtschaft 37(2), 181–191 (1977) 7. Large, R.O.: Strategisches Beschaffungsmanagement. Eine praxisorientierte Einführung mit Fallstudien. Springer Gabler. Lehrbuch, Wiesbaden (2013) 8. Sandig, C.: Grundriss der Beschaffung. In: Sandig, C., Geist, M. (eds.) Vom Markt des Betriebes zur Betriebswirtschaftspolitik. Bedarf, Beschaffung, Absatz; Festschrift zum 70. Geburtstag, pp. 82–113. Poeschel, Stuttgart (1971) 9. Arnold, U.: Strategische Beschaffungspolitik. Steuerung und Kontrolle strategischer Beschaffungssubsysteme von Unternehmen. Lang. Europäische Hochschulschriften Reihe 5, Volksund Betriebswirtschaft, Frankfurt am Main (1982) 10. Arnold, U.: Beschaffungsmanagement. Schäffer-Poeschel. Sammlung Poeschel, Stuttgart (1997) 11. Kleemann, F.C., Glas, A.: Einkauf 4.0. Digitale Transformation der Beschaffung. Springer Gabler Essentials, Wiesbaden (2017) 12. Droste, M., Grobosch, S.: 13 Prozent Performance-Steigerung durch optimierte Prozesse im Einkauf. Studie: Optimierter Einkauf in der Hochkonjunktur. Gemeinsame Studie von Expense Reduction Analyst. BME und EBS (2018) 13. Paranikas, P., Whiteford, G.P., Tevelson, B., Belz, D.: How to negotiate with powerful suppliers. Harv. Bus. Rev. 93(7/8), 90–96 (2015) 14. Zimmermann, F., Foerstl, K.: A Meta-Analysis of the “purchasing and supply management practice-performance link.” J. Supply Chain Manag. 50(3), 37–54 (2014) 15. Arnold, U.: Größenspezifischen Probleme und Möglichkeiten zu ihrer Lösung. In: Pfohl, H.-C., Arnold, U. (eds.) Betriebswirtschaftslehre der Mittel- und Kleinbetriebe. Größenspezifische Probleme und Möglichkeiten zu ihrer Lösung. Schmidt, Berlin (2006)


16. Gantzel, K.J.: Wesen und Begriff der mittelständischen Unternehmung. Westdt. Verlag für Sozialwissenschaften. Abhandlungen zur Mittelstandsforschung, Köln (1962) 17. Stütz, S.: Kleine und mittlere Industrieunternehmen in der ökonomischen Theorie. In: Meyer, J.A. (ed.) Kleine und mittlere Industrieunternehmen in der ökonomischen Theorie, pp. 1–440. Eul, Lohmar (2011) 18. Namdar, J., Li, X., Sawhney, R., Pradhan, N.: Supply chain resilience for single and multiple sourcing in the presence of disruption risks. Int. J. Prod. Res. 56(6), 2339–2360 (2017). Retrieved from: https://doi.org/10.1080/00207543.2017.1370149. Accessed: 31 Oct 2019 19. Bloech, J.R., Bogaschewsky, U., Buscher, A., Daub, Götze U., Roland, F.: Einführung in die Produktion. Springer Gabler. Springer-Lehrbuch, Berlin (2014) 20. Bär, J.: Strategische Beschaffung in kleinen und mittleren Unternehmen. Diplomica Verlag, Hamburg (2012) 21. Schiele, H., Calvi, R., Gibbert, M.: Customer attractiveness, supplier satisfaction and preferred customer status. Introduction, definitions and an overarching framework. Industr. Market. Manage. 41(8), 1178–1185 (2012) 22. Bogaschewsky, R.: Einkauf im Mittelstand. In: Becker, W., Ulrich, P. (eds.) BWL im Mittelstand. Besonderheiten; Entwicklungen. Kohlhammer Verlag, Grundlagen (2015) 23. Kirsch, T.: Entwicklung eines Modells zur Umsetzung einer ökologisch orientierten Beschaffung in der Ernährungswirtschaft. Zittau. Hochsch.-Inst., Dissertation, 2012. Cuvillier. Schriften zum Supply-Chain-Management, Göttingen (2013) 24. Waser, B.R., Peter, D.: Prozess-und Operations-Management. Strategisches und Operatives Prozessmanagement in Wertschöpfungsnetzwerken. Versus, Zürich (2013) 25. Arnolds, H., Heege, F., Röh, C., Tussing, W.: Materialwirtschaft und Einkauf. Grundlagen– Spezialthemen–Übungen. Springer Fachmedien Wiesbaden, Wiesbaden (2016) 26. Welsh, J.A., White, J.F.: A small business is not a little big business. Harv. Bus. Rev. 59(4), 18–32 (1980) 27. Maloni, M.J., Hiatt, M.S., Astrachan, J.H.: Supply management and family business. a review and call for research. J. Purchas. Suppl. Manage. 23(2), 123–136 (2017) 28. Henschel, T.: Erfolgreiches Risikomanagement im Mittelstand. Strategien zur Unternehmenssicherung. Schmidt, Berlin (2010) 29. Gleißner, W.: Grundlagen des Risikomanagements. Mit fundierten Informationen zu besseren Entscheidungen. Franz Vahlen. Management Competence, München (2017) 30. Wegmann, J.: Betriebswirtschaftslehre mittelständischer Unternehmen. Praktiker-Lehrbuch. Oldenbourg, München (2013) 31. Sauter, R., Sauter, W., Wolfig, R.: Agile Werte- und Kompetenzentwicklung. Wege in eine neue Arbeitswelt. Springer Gabler, Berlin (2018) 32. Mikus, B.: Make-or-buy-Entscheidungen in der Produktion. Führungsprozesse - Risikomanagement - Modellanalysen. Göttingen, Univ., Dissertation, 1997. Dt. Univ.-Verl. GablerEdition Wissenschaft, Wiesbaden (1998) 33. Allianz, S.E.: Allianz Global Corporate & Specialty SE: Allianz Risk Barometer 2019. Top business risks for 2019 (2019). Retrieved from: https://www.agcs.allianz.com/news-and-ins ights/expert-risk-articles/risk-barometer-2019-business-risks.html. Accessed: 12 Nov 2019 34. Blum, M., Kellermann, C.: Bedeutung der Industrie für Deutschland. Daten und Fakten zum Industriestandort Deutschland (2017). Retrieved from: https://www.vci.de/die-branche/zah len-berichte/daten-zur-bedeutung-der-industrie-und-zum-standortprofil-deutschlands.jsp. Accessed: 24 Feb 2018 35. 
Schlepphorst, S., Schlömer-Laufen, N., Holz, M.: Determinants of Hidden Champions— Evidence from Germany, Bonn (2016). Retrieved from: http://www.ifm-bonn.org/filead min/data/redaktion/publikationen/workingpapers/dokumente/workingpaper_03_16.pdf. Accessed: 18 Sept 2017 36. Frietsch, R.: Hidden Champions im Innovationswettbewerb (2010). Retrieved from: http:// www.isi.fraunhofer.de/isi-wAssets/docs/p/de/events/p_workshop_05-2010/Frietsch_Hid den_Champions.pdf. Accessed: 23 Sept 2017


37. Becker, S.B., Neyer, G., Schewe, Wilke, R.: Risikomanagement im Mittelstand: Instrumente des Beschaffungsrisikomanagements. RCRC, Münster (2016) 38. Bundesverband der Deutschen Industrie E.V.: Mittelstand und Familienunternehmen (2015). Retrieved from: https://bdi.eu/media/presse/publikationen/mittelstand-und-familienunterne hmen/Faktencheck_Mittelstand_Familienunternehmen_230915.pdf. Accessed: 19 Apr 2020 39. Fieten, R.: Ein Hoch auf den deutschen Mittelstand. Beschaffung aktuell. 9, 110 (2016) 40. Gabath, C.: Risiko- und Krisenmanagement im Einkauf. Methoden zur aktiven Kostensenkung. Gabler Verlag / Springer Fachmedien Wiesbaden GmbH, Wiesbaden (2010) 41. Schulte in den Bäumen, M.: Einordnung, Systematisierung und Konzeption von Beschaffungskooperationen. Cuvillier, Göttingen. Schriften zum Supply-Chain-Management (2009) 42. Specht, D., Behrens, S., Mieke, C.: Risikomanagement in technologieorientierten Beschaffungsnetzwerken. In: Vahrenkamp, R., Amann, M. (eds.) Risikomanagement in Supply Chains. Gefahren abwehren, Chancen nutzen, Erfolg generieren, pp. 133–148. E. Schmidt, Berlin (2007) 43. Risk Management Association E. V.: Praxisleitfaden Risikomanagement im Mittelstand. Grundsätze - Organisation – Durchführung, p. 1. Schmidt. Risikomanagement-Schriftenreihe der RMA, Berlin (2015). ISBN 978-3-503-16526-1 Retrieved from: https://www.esv.info/ 978-3-503-16526-1. Accessed: 29 Oct 2017 44. Baumberger, B., Schwab, R.: Management der Wettbewerbsfähigkeit in KMU. In: Berndt, R. (ed.) Leadership in turbulenten Zeiten. Springer, Berlin (2003) 45. Becker, J.: Marketing-Konzeption. Grundlagen des ziel-strategischen und operativen Marketing-Managements. München: Franz Vahlen (2009) 46. Brühwiler, B.: Risikomanagement nach ISO 31000 und ONR 49000. Mit 13 Praxisbeispielen. QuickInfo (2012) 47. Allianz, S.E.: Allianz Global Corporate & Specialty SE: Allianz Risk Barometer. Die 10 größten Geschäftsrisiken (2016). Retrieved from: https://www.allianz.com/v_1458302171 000/media/press/document/AllianzRiskBarometer2016_DE.pdf. Accessed: 24 July 2016 48. Sarker, S.: The paradox of risk management: a supply management practice perspective. In: Zsidisin, G.A., Henke, M. (eds.) Revisiting Supply Chain Risk, pp. 421–437. Springer, Berlin (2019) 49. Hölscher, R.: Die Praxis des Risiko- und Versicherungsmanagements in der deutschen Industrie. In: Schierenbeck, H. (ed.) Risk-Controlling in der Praxis. Rechtliche Rahmenbedingungen und geschäftspolitische Konzeptionen in Banken, Versicherungen und Industrie, pp. 413–455. Schäffer-Poeschel, Stuttgart (2000a) 50. Schröer, C.: Risikomanagement in KMU. Grundlagen, Instrumente, Nutzen. VDM Müller, Saarbrücken (2007) 51. Burghart, S.: Risk management in various economic sectors. In: MAGNANIMITAS. International Masaryk Conference for Ph.D. Students and Young Researchers. Reviewed Proceedings ˇ of the International Scientific Conference, pp. 51–61. Hradec Králové, Ceská republika (2018) 52. Janßen, S., Mielke, C.: Risikomanagement—Know-how im Mittelstand. Initiative Finanzstandort Deutschland (IFD), Frankfurt (2009) 53. Metzger, A.: How to live with risks. You can’t get rid of them all. Harvard Bus. Rev. 93(7/8), 20–21 (2015) 54. Henschel, T.: Risikomanagement im Mittelstand – eine empirische Untersuchung (2003). Retrieved from: http://link.springer.com/article/10.1007/BF03254200?LI=true. Accessed: 14 Feb 2016 55. Kahneman, D.: Schnelles Denken, langsames Denken. Penguin Verlag, München (2012) 56. 
Pfohl, H.C., et al.: Betriebswirtschaftslehre der Mittel- und Kleinbetriebe. Größenspezifische Probleme und Möglichkeiten zu ihrer Lösung. Schmidt. Management und Wirtschaft Praxis, Berlin (2013) 57. Montag, P.: Risikomanagement und Compliance im Mittelstand. Dissertation. Berlin: Erich Schmidt Verlag. Management und Wirtschaft Studien. Band 75 (2015) 58. Hölscher, R.: Gestaltungsformen und Instrumente des industriellen Risikomanagements. In: Schierenbeck, H. (ed.) Risk-Controlling in der Praxis. Rechtliche Rahmenbedingungen und

geschäftspolitische Konzeptionen in Banken, Versicherungen und Industrie, pp. 297–363. Schäffer-Poeschel, Stuttgart (2000b)
59. Stroeder, D.: Fundamentale Risiken im deutschen Mittelstand und Modelle zu ihrer Bewältigung. Entwicklung modularer, mittelstandsadäquater Risikobewältigungsstrategien auf Basis einer brachenübergreifenden empirischen Studie unter 421 mittelständischen Unternehmen. SMB Stroeder Süddt. Mittelstandsberatung, Stuttgart (2008)
60. Feser, M.: Entwicklung eines Modells zur situationsadäquaten Implementierung von Supply Chain Risikomanagement. Dissertation. Supply chain, logistics and operations management. Band 21 (2015)
61. Schorcht, H.: Risikomanagement und Risikocontrolling junger Unternehmen in Wachstumsbranchen. Konzeption eines theoriegeleiteten Handlungsrahmens für die praxisinduzierte Unternehmenssteuerung. Ilmenau, Univ., Diss., 2003 u.d.T.: Schorcht, H.: Risikocontrolling junger Technologieunternehmen in Wachstumsbranchen. Logos-Verl. Schriften zum Konvergenzmanagement, Berlin (2010)
62. Schimmelpfeng, K.: Risikomanagement im Industrieunternehmen. In: Götze, U., Henselmann, K., Mikus, B.: Risikomanagement, pp. 277–297. Physica-Verlag, Heidelberg (2001)
63. Mugler, J.: Risk Management in der Unternehmung. Wien, Wirtschaftsuniv., Hab.-Schr. Orac. Unternehmung und Gesellschaft, Wien (1979)
64. Romeike, F.: Risikoidentifikation und Risikokategorien. In: Romeike, F., Finke, R.B. (eds.) Erfolgsfaktor Risiko-Management. Chance für Industrie und Handel Methoden, Beispiele, Checklisten, pp. 165–180. Gabler Verlag, Wiesbaden (2003b)
65. Zawisla, T.: Risikoorientiertes Lieferantenmanagement. Eine empirische Analyse. München, Techn. Univ., Diss., 2006, p. 39. TCW Transfer-Centrum. TCW Wissenschaft und Praxis, München (2008)
66. Diederichs, M.: Risikomanagement und Risikocontrolling. Franz Vahlen, München (2017)
67. Becker, W., Ulrich, P.: BWL im Mittelstand. Grundlagen; Besonderheiten; Entwicklungen. Kohlhammer Verlag (2015)
68. Hartmann, H.: Modernes Einkaufsmanagement. Global Sourcing, Methodenkompetenz, Risikomanagement. Dt. Betriebswirte-Verl. Praxisreihe Einkauf, Materialwirtschaft, Gernsbach (2014)
69. Wildemann, H.: Einkaufspotentialanalyse. Programme zur partnerschaftlichen Erschließung von Rationalisierungspotentialen, p. 22. TCW Transfer-Centrum. TCW, München (2008)
70. Burghart, S.: IBR-Analyse zur Identifikation von Beschaffungsrisiken als Alternative zur ABC-Analyse. In: Hofbauer, G., Oppitz, V.: Wissenschaft und Forschung. Wissenschaftliche Beiträge zur Forschung, pp. 423–439. Uni-Edition, Berlin (2017)
71. Kuhn, A.: Input-Output-Rechnung im Überblick (2010). Retrieved from: https://www.destatis.de/. Accessed: 24 Aug 2018
72. Romeike, F.: Risikomanagement. Springer Gabler. Studienwissen kompakt, Wiesbaden (2018)
73. Hoffmann, J.: Risikomanagement für mittelständische Unternehmen. Risikopotenziale erkennen und erfolgreich bewältigen; mit zahlreichen Praxissituationen. Books on Demand, Norderstedt (2012)
74. Gleißner, W., Lienhard, H., Stroeder, D.H.: Risikomanagement im Mittelstand. Planungssicherheit erhöhen, Rating verbessern, Unternehmen sichern. RKW-Verlag, Eschborn (2004)
75. Lehmeyer, P.: Zur Bedeutung des Risikomanagements im Mittelstand. Eine Untersuchung des Verbreitungsgrades und der verwendeten Instrumente. Diplomica Verlag GmbH, Hamburg (2014)
76. Deutsche Gesellschaft für Qualität, Arbeitsgruppe Risikomanagement: Risikomanagement. Risiken beherrschen - Chancen nutzen, pp. 12–41. Dt. Gesellschaft f. Qualität e.V. DGQ-Band, Frankfurt (2007)
77. Horváth, P.: Finanz-Controlling. Strategische und operative Steuerung der Liquidität. Haufe-Lexware GmbH & Co. KG. Haufe Fachpraxis, München (2011)
78. Wolke, T.: Risikomanagement. De Gruyter Oldenbourg, Berlin (2016)


79. Vahs, D., Schäfer-Kunz, J.: Einführung in die Betriebswirtschaftslehre. Schäffer-Poeschel Verlag, Stuttgart (2015) 80. Sax, J., Andersen, T.J.: Making risk management strategic. Integrating enterprise risk management with strategic planning. Eur. Manage. Rev. 17(3), 1–22 (2018). Retrieved from: https:// doi.org/10.1111/emre.12185. Accessed 5 Nov 2019 81. Heß, G.: Strategischer Einkauf und Supply-Strategie. Schrittweise Entwicklung des strategischen Einkaufs mit der 15M-Architektur 2.0. Springer Gabler, Wiesbaden (2017) 82. Slovic, P.: The perception of risk. Reprinted. Risk, society and policy series. Earthscan, London (2011) 83. Schiele, H., Veldman, J., Hüttinger, L.: Supplier innovativeness and supplier pricing. The role of preferred customer status. Int. J. Innov. Manage. 15(1), 1–27 (2011) 84. Maushake, A., Löffler, J., Burghart, S.: Risikomanagement im Einkauf. 6. BME-Forum. Wiesbaden (2018) 85. Pieringer, M.: Passung geht vor Eignung. Logistik Heute 6(11), 18–19 (2018) 86. Verkuil, A.H., Dey, P.: Forschungsverständnis im Kontext anwendungsorientierter Wissenschaften (F&E). Forschungsbeitrag. Brugg-Windisch (2010). Retrieved from: http://docplayer.org/17863651-Forschungsverstaendnis-im-kontext-anwendungsorientierterwissenschaften-f-e.html. Accessed 29 Oct 2017 87. Tilch, T., Lenz, A., Scheffler, R., Andreas, S., Obersdorf, S., Yilmaz, Y.: Risk-ManagementBenchmarking. Dissertation (2015) 88. Virglerova, Z.: Differences in the concept of risk management in V4 Countries. Int J Entrepreneurial Knowl 6(2), 100–109 (2018). Retrieved from: https://doi.org/10.2478/ijek2018-0017. Accessed: 31 Oct 2019 89. Duhadway, S., Carnovale, S., Kannan, V.R.: Organizational communication and individual behavior: implications for supply chain risk management. J. Supply Chain Manag. 54(4), 3–19 (2018) 90. Brocas, I., Carrillo, J.D., Giga, A., Zapatero, F.: Risk aversion in a dynamic asset allocation experiment. J. Financ. Quantit. Anal. 54(5), 2209–2232 (2019). Retrieved from: https://doi. org/10.1017/S0022109018001151. Accessed: 31 Oct 2019 91. Kirilmaz, O., Erol, S.: A proactive approach to supply chain risk management. Shifting orders among suppliers to mitigate the supply side risks. J. Purchas. Suppl. Manage. 23(1), 54– 65 (2017). Retrieved from: https://doi.org/10.1016/j.pursup.2016.04.002. Accessed: 22 May 2019 92. Wheatley, M., Ramsay, M.: After the Disaster in Japan. Automotive Logistics (2011). Retrieved from: https://www.automotivelogistics.media/after-the-disaster-in-japan/7408.art icle. Accessed 12 Oct 2019 93. Tian, F., Xu, S.X.: How do enterprise resource planning systems affect firm risk? Postimplementation Impact. Manag. Inf. Syst. Q. 39(1), 39–60 (2015) 94. Hoeckel, C., Neuert, J., Schüller, M., Schwamborn, A., Wang, J.: Return on Investment from Supplier/Risk Management. J. Bus. Manage. 25(2), 1–23 (2019) 95. Kaczor, S., Kryvinska, N.: It is all about services—fundamentals, drivers, and business models. Soc. Serv. Sci. J. Serv, Sci. Res. 5(2), 125–154 (2013) 96. Molnár, E., Molnár, R., Kryvinska, N., Greguš, M.: Web Intelligence in practice. Soc. Serv. Sci. J. Serv, Sci. Res. 6(1), 149–172 (2014) 97. Ulrich, H.: Management. Schriftenreihe Unternehmung und Unternehmungsführung. Haupt, Bern (1984) 98. Appelfeller, W., Feldmann, C.: Die Digitale Transformation des Unternehmens. Systematischer Leitfaden mit zehn Elementen zur Strukturierung und Reifegradmessung. Springer Gabler, Berlin (2018) 99. Gloger, A.: Betriebswirtschaftsleere. Wem nützt BWL noch? 
Frankfurter Societäts-Medien GmbH, Frankfurter Allgemeine Buch, Frankfurt am Main (2016) 100. Power, M.: Risikomanagement ist selbst ein Risiko. Harvard Business Manager, 11, 109–115 (2010)


101. Bode, C.: Reaktives Risikomanagement: Mut zur Lücke. Agieren oder Reagieren. BIP Best in Procurement 6(1), 50–51 (2015)

The Documentation in the Project of Software Creation

Adam Szewc, Vincent Karovič, and Peter Veselý

Abstract The presented work describes the documentation process in software development projects that are based on various methodologies. The classic waterfall model of the software development process, the Rational Unified Process and eXtreme Programming were chosen as examples of methodologies. The RUP and XP methodologies are the main examples of two different groups of methodologies: agile and traditional. Although these methodologies represent completely different approaches to the design of a system and the process of its documentation, both have gained great popularity and are currently used in many software companies. The aim of the work is to present various documentation processes to the reader, compare their essential content and demonstrate their impact on the success of the project. Taking into account the advantages and disadvantages of the presented documentation processes, the final result of the work is a universal form of the documentation process in software development projects.

1 Introduction

Creation and updating of system documentation is a condition for the creation, maintenance and operation of complex IT systems. Each software design team follows certain established rules and procedures during the production of the project and its documentation. Typically, for small projects implemented by small programming teams, a relatively small amount of design documentation is created and the most important phases of software development, such as analysis and design, are performed informally.


The creators of small projects rely on the premise that modern code, which is in the vast majority object-oriented and created using highly automated tools, should be sufficiently transparent, so that creating documentation for it is unnecessary. In large projects, by contrast, where communication between many team members is required, the scope of the project demands high specialization of its participants and the assumptions and functions of the project are very extensive, formal documentation is necessary. Such a project is a complex and intricate process of creation. The result is a large number of formal, officially evaluated documents, produced in the successive stages of software development and then presented at various meetings and presentations. These documents not only describe the assumptions of the system, but are also used to plan the entire undertaking, estimate its costs and duration, and identify the resources needed for implementation. They also serve to assess risks, divide responsibilities between the individuals involved in the creation of the project, and define how the progress of work is reported. An important role is also played by the documentation of the generated code, which not only significantly simplifies the work on the project, but also enables its subsequent extension, updating and maintenance. The aim of this work is to present the documentation process of a programming project on the example of selected methodologies and then compare their quality, content and impact on the success of the project. The final result of the work will be a universal organization of the documentation process, taking into account the advantages and disadvantages of the documentation processes in these methodologies. The classic cascade (waterfall) software development process and the RUP and XP methodologies were chosen as examples. These methodologies represent different approaches to the system development process and its documentation. RUP and XP are flagship examples of the two groups of methodologies, namely classical and agile. They are an excellent example of completely opposite approaches to project creation, but both have gained great popularity and are currently used in many software companies [1]. All the presented methodologies have advantages and disadvantages, and this applies equally to the documentation procedures they establish. In this work, an attempt was therefore made to determine a universal approach to the design of documentation. From the presented methodologies, those elements that are decisive and necessary for the project were selected, while unnecessary elements were excluded. Great attention was also paid to the notations whose use brings the most desired effects and at the same time streamlines the entire software production process. The last element of this work is the development of the documentation layout by creating templates for the individual documents and describing their content.


2 Programming Documentation

Creating sufficiently detailed and usable documentation is a demanding and relatively expensive task, but it significantly helps to reduce the risk of project failure because it allows errors to be detected early, in the initial and key phases of the project. Already in the system analysis phase, it is possible to identify specific problems that hinder the successful implementation of the project and that are easy to eliminate by creating appropriate project documentation:

• incorrect or inaccurate understanding of the client's needs,
• software that is difficult to maintain and expand,
• mismatched design modules,
• too late detection of serious project errors,
• unacceptable operation of the software,
• poor software quality,
• insufficient coordination of team members' activities,
• no personal responsibility for individual elements of the system,
• difficulty in responding to changes in customer requirements.

In addition to significantly reducing the risk of project failure, the documentation also enables workloads, costs and deadlines to be planned with a rationally low risk of errors. Creating project documentation also gives the opportunity to check the development cycle for compliance with the schedule and the planned budget, as well as to evaluate possible adjustments in this regard. From the customer's point of view, a very important aspect of correctly prepared documentation is the possibility of changing the supplier after the completion of any phase of the project without the need to repeat the entire development cycle from the beginning.

2.1 Types of Documentation Developed in the Software Development Cycle

During the software development process, various types of project documentation are created. Documentation can be classified in several ways, according to factors such as the addressee, the purpose, the role of the person creating it, or the phase of the system development cycle in which it was created. In general, there are two basic types of project documentation:
• user documentation,
• manufacturer's documentation.
Due to the large problem area, the manufacturer's documentation is usually divided into two modules:
• process documentation,
• technical documentation.

2.1.1 User Documentation

This type of documentation is intended for the end user of the system. The purpose of the user documentation is to present the principles of the application from the perspective of the person using it; it should accurately describe all the possibilities of the application and indicate how to perform the required activities in it. User documentation should be prepared in plain language, consist of simple sentences and not contain technical jargon that is incomprehensible to the recipient. User documentation most often includes a list of actions that can be performed in the system and examples describing the procedure for carrying them out. The arrangement and structure of the text and the accompanying illustrations serve to make it as easy as possible to find information and understand the problem. In many cases, the documentation also includes question and answer files (FAQ) based on problems reported by users when using the product, help files, and general information about the system and how to operate it. Operational documentation also comprises administrator documentation that includes configurations, hardware requirements, and complete system administration specifications. The operational documentation of the system is necessary from the customer's point of view: precisely prepared documentation makes it possible to avoid many misunderstandings that arise during the use of the created final product.

2.1.2 Process Documentation

The purpose of process documentation is to ensure the correct, repeatable and stable implementation of the entire software development process. With its help, the manufacturer defines the method of project implementation and reports on its progress. The process documentation consists of various plans, estimates, reports, schedules, standards and records of communication between team members. From the manufacturer's point of view, the quantitative and qualitative information it contains is particularly important, because it allows the implementation of previous projects to be evaluated and the current project to be planned correctly. Part of the process documentation is created before programming work starts. Special emphasis is placed on describing the processes that drive the project, the project management techniques and the organizational structures. Individual documents present the work schedule, the priorities and the tasks of individual project participants. The method of implementation also includes elements of management, such as planning, monitoring and reporting on the progress of work in routine and exceptional situations. This documentation also specifies the types, rules and plans of system acceptance tests, together with their acceptance criteria and the procedure for carrying them out.


However, most of the process documentation is created during the development work itself and contains documents showing the state of the project at the various stages of production. The documents created include an account of the use of allocated resources, reports on work performed and technical statistics. This type of documentation allows the degree of progress in the design and production process to be estimated and the performance of the team to be controlled. In this way, documents presenting the results, tests and analyses of the created software are produced. The process documentation also includes the documents that result after successful completion of the tests, in particular acceptance and approval protocols and reports.

2.1.3 Technical Documentation

The technical documentation is created during the successive stages of the system development cycle and then updated during system operation and maintenance. Its content strictly depends on the accepted software development methodology. The documentation contains components such as:

• system architecture,
• description of functionality,
• specification of the system structure,
• other documents covering technical and design aspects.

An integral part is also the source code documentation, which is necessary because it facilitates later repair or modification of the code. The primary goal of the technical documentation is a detailed description of the manufactured system, its structure, the algorithms used, the way individual components operate and the implemented functionality. It makes it possible for many participants to create the system in a coherent and uniform way. The aim of this documentation is also to facilitate the subsequent phases of updates and fixes.
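As a minimal illustration of what such code-level documentation can look like, the following Python function carries its description directly in the source; the function, its parameters and the VAT rate are invented for this sketch and do not come from any of the projects discussed here:

```python
def net_price(gross_price: float, vat_rate: float = 0.23) -> float:
    """Return the net price corresponding to a given gross price.

    Args:
        gross_price: Gross price in the invoice currency; must be non-negative.
        vat_rate: VAT rate as a fraction, e.g. 0.23 for 23 %.

    Raises:
        ValueError: If the gross price is negative.
    """
    if gross_price < 0:
        raise ValueError("gross_price must be non-negative")
    return gross_price / (1 + vat_rate)
```

Docstrings of this kind can later be collected by documentation generators, so the technical documentation stays close to the code it describes.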

2.2 Software Development Process

To thoroughly understand and determine the role of documentation in the software development process, it is necessary to define this process and its parts.


Fig. 1 Diagram of the software development process

As a software development process we understand the whole of the activities performed in order to transform the customer's or recipient's requirements into software, in accordance with the adopted system manufacturing methodology. It includes not only the technical implementation of the project, but also aspects related to project management and quality. This process is characterized by the project participants, the tasks assigned to them and the products obtained. Figure 1 shows a scheme of software development. The distribution of individual activities over time plays an important role in the software development process. For this purpose, a system life cycle has been introduced. It covers the entire period of origin and existence of the system, from analysis, design and implementation to its operation and maintenance.
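Purely as an illustration of the elements just mentioned (participants, their tasks and the products obtained), such a process description can be recorded in a simple data structure; the roles, tasks and artifact names below are assumptions made for this sketch, not prescribed by any of the methodologies discussed later:

```python
from dataclasses import dataclass

@dataclass
class Activity:
    """One activity of the software development process."""
    role: str      # who performs the task
    task: str      # what is done
    artifact: str  # the product obtained

process = [
    Activity("analyst", "clarify customer requirements", "requirements specification"),
    Activity("designer", "design the system architecture", "design document"),
    Activity("programmer", "implement and document the modules", "source code with comments"),
    Activity("tester", "run acceptance tests", "test report"),
]

for a in process:
    print(f"{a.role}: {a.task} -> {a.artifact}")
```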

2.2.1 Programming Life Cycle

The software life cycle is a process consisting of a series of coherent stages that enable the complete and efficient creation and use of information systems. This cycle begins when the need for the system's existence is recognized and ends with its decommissioning [2–4]. The main tasks of the software life cycle are to define the individual activities that should be performed during the planning and subsequent construction of the system, to unify project implementation methods and to plan control points for system implementation. Due to the complexity of the processes involved, the software life cycle is divided into phases. Thanks to this division, the problem of producing the system becomes easier to solve, because the individual stages involve a much smaller range of tasks and activities, which greatly facilitates their implementation and supervision. The software life cycle assumes the existence of the following phases:


Fig. 2 System life cycle diagram

• strategic phase: consists of defining strategic goals, as well as project planning and definition,
• requirements formulation phase,
• business analysis and system requirements,
• design phase: conceptual and logical,
• realization (construction),
• test phase,
• preparation of documentation,
• installation,
• preparation, acceptance, training,
• maintenance, care and backup.
Figure 2 shows a graphical diagram of the system life cycle. It presents the sequence of individual phases and their location in time. Many software life cycle models have been developed, including the cascade model, the spiral model, prototyping, or assembly from finished components. The cascade model and the spiral model are considered the basic models of the system development cycle; they are rarely used in pure form, but they form the basis for other, more frequently used models, which develop or refine them.
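To connect these phases with the documentation types from Sect. 2.1, the following sketch keeps an illustrative mapping in a plain dictionary; the assignment of documents to phases is an assumption derived from this chapter's descriptions, not a normative standard:

```python
# Illustrative only: which documentation is typically produced in which phase.
docs_per_phase = {
    "strategic phase":           ["project plan", "cost and schedule estimate"],
    "requirements and analysis": ["requirements specification"],
    "design":                    ["architecture description", "interface specification"],
    "realization":               ["source code documentation"],
    "test phase":                ["test plans", "test reports"],
    "installation and training": ["user documentation", "administrator documentation"],
    "maintenance":               ["updated technical documentation", "change log"],
}

for phase, docs in docs_per_phase.items():
    print(f"{phase}: {', '.join(docs)}")
```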

2.2.2 Cascade Model

The cascade model of the system development cycle was published in 1970 by Winston Royce with the aim of introducing a certain level of formalization into IT projects [5, 6]. The scheme of this model is shown in Fig. 3. In the cascade model, the entire system development cycle is divided into several stages; each of them is shown in the figure by a corresponding block. The model is based on the assumption that it is not possible to move to the next phase until the previous one is completed. If one of the phases returns an unsatisfactory product, we return to it and perform subsequent iterations until a satisfactory result is obtained. In addition, two parts can be distinguished in each of the stages: the first concerns the actual work carried out at that stage and the second concerns its verification and approval. Verification checks that the stage has been completed in accordance with the project's assumptions, while approval means checking and confirming that the product obtained at a given stage is correctly designed and conforms to accepted standards.


Fig. 3 Cascade model of the system creation cycle [5]

In addition to undoubted benefits, such as ordering the activities and managing the quality of the whole project through verification and approval after each stage, this model has very acute shortcomings. It does not distinguish between management control, risk planning and control measures. It also forces the implementers of subsequent phases to wait for the completion of previous phases, and it significantly increases the cost of errors made in earlier phases. The cascade model is designed mainly for processes with a small number of repetitions of work on the product and a relatively small number of changes after completion of work in subsequent stages, so it is applicable only to projects with clear, transparent and precise requirements, because each iteration is time-consuming and associated with additional costs.


This determines the method of creating documentation: the main emphasis on complete and accurate documentation is placed on the documents containing the system specification and the customer requirements. Due to the hermetic nature of the stages, comprehensive design documentation and detailed technical documentation are not considered necessary. Because few changes are expected, code documentation is most often omitted in this approach, as is a description of how the intended functionalities are implemented.
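A minimal sketch of the gating idea described above, in which each stage produces its result and may only be left after verification and approval, is given below; the stage names and the two check functions are placeholders introduced for this illustration and are not part of Royce's original description:

```python
STAGES = ["requirements", "design", "implementation", "testing", "installation"]

def do_work(stage: str) -> str:
    return f"product of {stage}"   # placeholder for the actual work of the stage

def verified(product: str) -> bool:
    return True                    # placeholder: was the stage completed as assumed?

def approved(product: str) -> bool:
    return True                    # placeholder: is the product correct and standard-conforming?

for stage in STAGES:
    while True:                    # iterate within the stage until its result is accepted
        product = do_work(stage)
        if verified(product) and approved(product):
            break                  # only then may the next stage begin
    print(f"{stage}: accepted")
```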

2.2.3 Spiral Model

The spiral model was developed by Barry Boehm. It introduces an iterative, evolutionary approach to building systems. It is mainly used in projects whose requirements are not clearly formulated or understood by users. In this case, it is usually necessary to repeat the same steps in several cycles in order to clarify the requirements and problems and obtain the desired product [5, 7, 8]. Figure 4 shows a generalized diagram of the spiral model. The model is divided into four parts:
• planning: in this area, goals are set and alternatives and limitations are indicated,
• risk analysis: in this phase, alternatives are assessed and risks are identified,
• construction (engineering and implementation): in this phase, the individual elements are created in accordance with the assumptions of the cascade model,

Fig. 4 General diagram of the spiral model [9]


• testing (evaluation): in this phase, tests of the adopted solution are performed and the planning of the next phase or iteration begins.
In the spiral model, the software development cycle begins in the middle of the spiral and unwinds outwards. In the initial phase, the requirements for the product are poorly understood and are clarified only by successive turns of the spiral; the results of tests performed by the customer form the basis for implementing subsequent versions of the product. The undoubted advantages of this software development cycle model are the rapid response to emerging risk factors and the combination of iteration with the classical cascade model. However, when deciding to use the spiral model, it is necessary to take into account the possibility of problems in case of incorrect risk assessment and to keep in mind that the time to reach the target version of the product can be significantly longer, which often makes customers reluctant to adopt this model.
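The iteration logic can be hinted at with the following sketch, assuming a strongly simplified stopping criterion (the customer accepts after a fixed number of cycles); the phase functions are placeholders and do not reproduce Boehm's formal definition:

```python
def plan(cycle: int) -> dict:
    # set goals, alternatives and limitations for this turn of the spiral
    return {"cycle": cycle, "goals": f"goals for cycle {cycle}"}

def analyse_risks(plan_for_cycle: dict) -> list:
    return []  # placeholder: list of identified risks for the assessed alternatives

def engineer(plan_for_cycle: dict) -> str:
    return f"prototype {plan_for_cycle['cycle']}"  # build as in the cascade model

def customer_accepts(prototype: str, cycle: int) -> bool:
    return cycle >= 3  # placeholder for the customer's evaluation of the prototype

cycle, accepted = 1, False
while not accepted:  # one pass per turn of the spiral, unwinding outwards
    current_plan = plan(cycle)
    risks = analyse_risks(current_plan)
    prototype = engineer(current_plan)
    accepted = customer_accepts(prototype, cycle)
    print(f"cycle {cycle}: {prototype}, identified risks: {len(risks)}, accepted: {accepted}")
    cycle += 1
```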

2.3 Software Development Methodologies

Until the end of the 1970s, most IT system construction projects defined user requirements in a narrative way: the systems analyst wrote down his understanding of the user's needs in an extensive document. The functional specifications of the system created in this way had many serious disadvantages [3, 10, 11]:
• monolithic structure: to understand them, the whole document had to be read carefully,
• redundancy: the same information often appeared in several places in the document,
• ambiguity: requirements that the user, analyst, designer and programmer often interpreted differently.
As early as the 1970s, in response to the increasing complexity of the software being developed, there was a need to formulate standardized assumptions and principles regarding the software development process. Large IT projects increasingly suffered from the LOOP syndrome:
• Late: the ordered software is usually shipped late, with an average delay of 6–12 months.
• Over budget: as a rule, the budget is significantly exceeded, on average by 50 to 100%.
• Overtime: developers must work overtime.
• Poor quality: the quality of the resulting software is low.
At the turn of the 1970s and 1980s, the sources of failure in the software development process were believed to be the willfulness of programmers, organizational disorder, the lack of clearly defined rules of cooperation, fuzzy responsibility and the lack of appropriate methods of control and management. Quality assurance in this process is illustrated in Fig. 5. Based on these opinions, software development methodologies were developed as standards that organize the software development process.


Fig. 5 Phases of building an IT system for a company [12]

A methodology is a set of standardized project management principles aimed at carrying out the project while achieving its objectives. It is a set of concepts, notations, models, languages and procedures for analyzing the domain that is the subject of the proposed system, and for its conceptual, logical and physical design. A software development methodology is usually organized around the life cycle of an IT system, which is a structure illustrating the basic activities of production, implementation and use of the system together with the interrelationships between these activities, the data flow between them and their dependencies over time. A project methodology takes the form of a description of how to proceed in the project implementation and contains guidelines for project planning, management, production and monitoring. Using software development methodologies in a project, it is possible to identify the individual phases of the project, the tasks of its participants, the behavior scenarios and models created in each phase, the rules for moving to the next phase, the notation used and the types of documentation created in each phase. Due to differences in the method of information modeling in the project analysis phase, we distinguish two types of methodologies:
• structural,
• object-oriented.
Due to the approach to the formalities of process documentation during the project, project management methodologies are in turn divided into:
• classical methodologies (heavy, hard),
• light methodologies (agile, soft).

2.3.1 Structural Methodologies

Structural methodologies combine the description of static data with the description of a static process. They are characterized by the separation of process modeling and data modeling. Structural methodologies are adapted to the relational data model; only simple data types are used. In structural methodologies, the basic task is to build a so-called basic model, free of implementation conditions, and then to create an implementation model based on it. The basic model contains the complete functional specification of the system, describes the principles of its operation and the content of the data stored in the system and flowing inside it. The basic model includes, but is not limited to:

• context diagram,
• determining the purpose of the system,
• list of events,
• diagrams of relationships between entities,
• data flow diagrams,
• diagrams of state changes,
• data dictionaries,
• decision tables and trees.

This model should not contain any information on how the system is implemented. The implementation model, on the other hand, consists of the graphical user interface and a list of operating restrictions. Structural methodologies include Yourdon's methodology, the Structured Systems Analysis and Design Method (SSADM) and the Structured Analysis and Design Technique (SADT).
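One of the listed elements, the decision table, can be illustrated with a very small sketch; the conditions, actions and business rules below are invented solely for this example:

```python
# A minimal decision table: condition combinations on the left, the resulting action on the right.
decision_table = {
    # (customer registered, order value above limit): action
    (True,  True):  "grant discount and ship",
    (True,  False): "ship without discount",
    (False, True):  "require prepayment",
    (False, False): "standard checkout",
}

def decide(registered: bool, large_order: bool) -> str:
    """Look up the action prescribed by the decision table."""
    return decision_table[(registered, large_order)]

print(decide(registered=True, large_order=False))  # -> ship without discount
```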

2.3.2 Object-Oriented Methodologies

In response to the need to cope with the increasing complexity of information systems, object-oriented methodologies were developed—initially in the field of software design; the object-oriented paradigm was then transferred to the earlier stages of the information systems development cycle. The problem under consideration is treated as a set of related objects corresponding to physical or abstract objects of the modeled fragment of reality. In object-oriented methodologies, data and processes are modeled together, and complex data types are used. These methodologies apply object-oriented concepts to the conceptual modeling, analysis and design of information systems. The main component of object-oriented methodologies is the class diagram, which is an extension of the entity relationship diagram. Class diagrams are complemented by dynamic diagrams, which take into account states and the transitions between them, interaction diagrams, which determine the relationships between method calls, and function diagrams. Use case diagrams, in turn, are used to map the structure of the system from the user's perspective. Examples of methodologies using the object-oriented approach are: Object Modeling Technique (OMT), Object Oriented Analysis and Design (OOAD), Object Oriented System Analysis (OOSA), Object Oriented Analysis/Object Oriented Design (OOA/OOD) and the Rational Unified Process (RUP).

2.3.3 Classical Methodologies

Classical methodologies focus on the standardization of the approach, terminology and documentation; they assume precise control over each stage of work and over the beginning and end of the project. Heavy methodologies are plan-oriented and precisely define the final products of the project. The main premise is to deliver a product that meets the given requirements. Requirements are defined for the delivered project products, and on this basis the project schedule and costs are determined. Classical methodologies are characterized by a considerable amount of obligatory project documentation and by formally defined decision-making and reporting processes. They are most often used in large projects involving large project teams. Traditional project management methods include the following methodologies: Project Management Body of Knowledge (PMBoK), Projects in a Controlled Environment (PRINCE2), the TenStep Project Management Process and RUP.

2.3.4 Light Methodologies

Light methodologies are characterized by a limited number of formalisms related to the software development process, smaller and less formal documentation, and easier, more direct contact between project team members. Agile methodologies were created in response to project teams' fatigue and discouragement with producing overly extensive project documentation, and to the communication difficulties resulting from excessive formalization of communication protocols in the project. Light methodologies place great emphasis on direct communication in the development team and on iterative implementation of the project. Light methods are used mainly in small teams, where direct communication is possible, so there is no need to create a large amount of documentation. This makes it easier to understand the problem and minimizes risk in projects with a relatively short implementation time, but it requires a well-coordinated and stable team. Agile methodologies are based on the value of the delivered products; costs and the time frame are predetermined. Using agile methodologies, a product requirements list is created that specifies changes to an existing prototype of the software product. The role of the project manager is to maximize the value of the delivery by adjusting the scope of features for the current iteration and removing any obstacles faced by the development team. After a specified period, a product with the highest possible commercial value should be created. Light project management methodologies include: Extreme Project Management (XPM), the Adaptive Project Framework (APF), Lean Six Sigma, Scrum, Lean Development (LD), the Eclipse Process Framework Project (EPF), the Microsoft Solution Framework (MSF) and eXtreme Programming (XP).

2.4 System Modeling Tools

Each software development methodology defines a suitable notation, i.e. a set of symbols used in the project documentation. Notation supports human memory and imagination and facilitates communication between project team members and between the project team and the client. As a necessary tool in the analysis and design phases, it allows complex dependencies to be modeled easily and transparently. The following types of notation are distinguished:
• natural language,
• graphical notation,
• technical specifications.
Depending on the software development methodology used, characteristic tools and notations supporting system modeling can be distinguished. For structural methodologies, the most important and most frequently used graphical representations of concepts include the following diagrams:
• DFD (Data Flow Diagram)—a data flow diagram. Used to model system functions, it shows the direction of data flow between functions, data stores and external objects. A DFD consists of the following elements:
  – functions (processes)—pursue specific goals,
  – data stores—permanent or temporary repositories of data, which are arguments for functions,
  – terminators—objects that are not part of the system, but are recipients or sources of data or function arguments,
  – flows—elements indicating the direction of data transmission (for example, bytes, characters or packets).
• ERD (Entity Relationship Diagram)—a diagram of relationships between entities. Used to model data, it maps the real world into sets of entities and the relationships between them. This diagram is often used in database design, especially in analyzing functional dependencies, eliminating data redundancy and organizing the database structure. Entity relationship diagrams are most commonly used to describe data at a high level of abstraction, to ensure the independence of data from the processes acting on them, and to indicate the relationships between data.


• STD (State-Transition Diagram)—a state transition diagram. Used to model the temporal behavior of the system. The diagram shows which sequences of input signals (data) cause the system to enter a given state, and what actions are taken in response to the occurrence of specific input states.
• ELH (Entity Life History) diagram—an entity's life history diagram. Used to display the behavior of an object depending on events, in a hierarchical model.
The growing popularity of the object-oriented approach in computer science has led to the creation of many object-oriented methodologies and notations. Despite many differences in approach and purpose, these methodologies have many elements in common, and the differences between the notations they introduce are negligible. In order to unify them, the UML modeling language was developed which, unlike the methodologies—which, in addition to notation, also define the procedure in the other phases of the project—is only a set of concepts and notations.

2.4.1 UML

UML (Unified Modeling Language) is the result of the joint efforts of three renowned methodologists: Grady Booch, Ivar Jacobson and James Rumbaugh. It is a unified modeling language which is the successor to, and a synthesis of, the notations present in the object-oriented methodologies for the analysis and design of information systems that appeared at the turn of the 1980s and 1990s. UML covers everything that can be done with the existing notations. It is based on object-oriented concepts such as objects, classes, attributes, relationships, aggregation, inheritance and methods [11, 13, 14]. The main role of UML is to mediate between the human understanding of the structure and operation of programs and the program code; it addresses human perception and imagination through graphical symbols. UML allows software specification, design, visualization and documentation. UML diagrams create a direct link between the elements of the conceptual model and executable programs, which makes it possible to cope with the problems of scale that accompany complex, mission-critical systems [15]. Taking into account the many design aspects of the created system, UML introduces various graphical tools for a clear and intuitive presentation of the analyzed level of the system:
• use case diagrams,
• class diagrams,
• diagrams mapping the dynamic properties of the system, including:
  – sequence diagrams,
  – collaboration diagrams,
  – state diagrams,
  – activity diagrams,
• implementation diagrams.

2.4.2 Use Case Diagram

The use case diagram is used to graphically represent use cases, actors and the relationships between them. It provides an overview of the services the system offers to the actors, but does not prescribe specific technical solutions. A use case is a complex action performed by the system in response to a specific user activity; it is a named interaction between the user and the proposed system. A use case maps the system's functions as future users will see them. An actor is a set of roles that users play when interacting with a particular use case. Actors are divided into two groups: personal and impersonal. Personal actors are most often identified with specific functions, persons or organizations, while impersonal actors represent external systems, subsystems, devices, databases or time. An actor may be associated with one or more use cases in the proposed system, and a use case may be used by one or more actors. The interaction of actors with use cases consists in initiating them, providing data, receiving data and using the functions implemented in the use case. Actors should not be equated with users, as one person can play the role of many actors and one actor can correspond to several people. The actor is the driver of the use cases: both the originator of the events triggering the use case and the recipient of the information obtained from it. A relationship is understood as a semantic connection between elements. Each actor in the diagram must be associated with at least one use case, and each use case must be associated with at least one actor.

The use case diagram is mainly used to establish the requirements for the proposed system and during dialogue with future users in order to refine these requirements. It allows a better understanding of the purpose of the system, verification of the correctness and completeness of the design, determination of all functional and non-functional properties of the system, identification of system components, and provides a basis for testing. Figure 6 shows an example of a use case diagram for a business invoicing system.

Fig. 6 Example of use case diagram [13]

In addition to simple associations of actors with use cases, two modifications of the relationship are distinguished:
• «extends»—means that a given use case can be extended by another,
• «includes»—indicates a fragment common to many use cases, which is worth distinguishing because of its conceptual similarity and because multiple implementations of this fragment can later be avoided.
For the use case diagram [13], an appropriate dossier should be created containing the following elements:
• a brief description of the use case,
• the flow of events, described informally,
• a description of the relationships between use cases,
• a description of the participating parties,
• special requirements (e.g. response time, performance),
• UI images,
• interaction diagrams for each actor.

2.4.3 Class Diagram

The central concept in all known object-oriented methodologies is the class diagram, used to model the static aspects of the design perspective. A class diagram is used to represent a fragment or the entire structure of a system in an object model, by illustrating the structure of the classes and the dependencies between them. Usually there are many class diagrams in a system design, divided according to their functionality, and one class can appear in many diagrams. A class diagram is usually an extension of an entity relationship diagram; compared with that type of diagram, it introduces methods assigned to particular classes and new auxiliary notation. Class diagrams are used to record the results of the analysis and to determine design assumptions. They are the basis for analysis in object-oriented designs. They are used in writing a conceptual model, in the formal specification of data and methods, and in the implementation phase as a graphical tool for displaying implementation details. The basic elements of class diagrams are classes which, at the domain modeling level, correspond to the concepts existing in that domain. A class describes a set of objects that share the same attributes, operations, methods and semantics and are connected in a network of dependencies that fall into one of the following categories:
• dependency—often represents a usage relationship; changes made to the specification of one element can affect the dependent element,
• inheritance—illustrates the generalization/specialization relationship between classes,
• association—any relationship between the objects of the domain in question.
Figure 7 shows an example class diagram for a wolf and sheep game project.

Fig. 7 Sample class diagram
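How a class diagram translates into code can be illustrated with a short sketch. The class and attribute names below are hypothetical (they are not taken from the project in Fig. 7); they merely show how generalization and association from a class diagram typically appear in source code:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Piece:
    """Base class: attributes shared by every piece on the board."""
    row: int
    col: int

    def move(self, row: int, col: int) -> None:
        self.row, self.col = row, col


@dataclass
class Wolf(Piece):
    """Specialization of Piece (an inheritance arrow in the diagram)."""
    captures: int = 0


@dataclass
class Sheep(Piece):
    """Another specialization of Piece."""
    alive: bool = True


@dataclass
class Board:
    """Associations: one Board is linked to one Wolf and many Sheep."""
    size: int = 8
    wolf: Wolf = field(default_factory=lambda: Wolf(0, 0))
    sheep: List[Sheep] = field(default_factory=list)


board = Board(sheep=[Sheep(7, c) for c in range(0, 8, 2)])
board.wolf.move(1, 2)
print(len(board.sheep), board.wolf.row, board.wolf.col)
```

Each class corresponds to a box in the diagram, inheritance from Piece to a generalization arrow, and the wolf and sheep attributes of Board to association lines with multiplicities 1 and 0..*.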

2.4.4 Activity Diagram

An activity diagram describes the dynamics of a system. It is used to model the activities and responsibilities of system components or users. Unlike a state diagram, it describes activities associated not with a single object but with many objects, between which communication may take place while the activities are performed. Activity diagrams make it possible to determine how the system achieves its intended objectives: they define what activities are carried out and how they relate to each other.


Fig. 8 Sample activity diagram

Activity diagrams are particularly useful for modeling workflows and for describing behavior with a predominance of concurrent processing. They illustrate the activity without showing the entities that participate in it, so they are most often used as a starting point for the behavioral modeling process. Figure 8 shows a sample activity diagram from the design of a library maintenance system.

2.4.5 Interaction Diagram

Interaction diagrams are used to describe the dependencies involved in sending messages within a particular group of objects. They illustrate the dependencies in the control flow, which helps to understand the relationships and interactions between the various methods that implement this flow. UML introduces four types of interaction diagrams: sequence diagrams, collaboration diagrams, timing diagrams and interaction overview diagrams. Sequence diagrams have two dimensions: a vertical dimension representing time and a horizontal dimension representing the individual objects. In sequence diagrams the order of events is important, but the actual time is not. Individual objects are represented by rectangles, and a line representing the lifetime of the object is drawn from each of them. Each object in the diagram can be in one of two states: active, in which case its lifeline is shown as a narrow, long rectangle, or dormant, in which case the lifeline takes the form of a dashed line. Between the lifelines of individual objects, the moments at which messages are sent between objects are marked with arrows (Fig. 9).


Fig. 9 Sample sequence diagram [13]

Collaboration diagrams are similar to sequence diagrams, but here the time dimension is not mapped directly; instead, the relationships between objects are shown. Cooperation between objects focuses on two aspects: the static structure of the participating objects and the sequence of messages they exchange with each other. The collaboration diagram shows the messages sent between objects in order to achieve the intended purpose. The message flow is indicated by an arrow next to the message name and its parameters (Fig. 10).

Fig. 10 Sample collaboration diagram [13]
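Each message arrow in a sequence or collaboration diagram corresponds to a method call on the receiving object. The minimal sketch below uses hypothetical invoicing objects, chosen only for illustration, to show a chain of three messages that could equally well be drawn as lifelines and arrows:

```python
class InvoiceRepository:
    def save(self, invoice: dict) -> int:
        # message 3: persist the invoice and return its identifier
        print(f"saving {invoice}")
        return 42


class InvoiceController:
    def __init__(self, repository: InvoiceRepository) -> None:
        self.repository = repository

    def create_invoice(self, customer: str, amount: float) -> int:
        # message 2: build the invoice, then delegate persistence
        invoice = {"customer": customer, "amount": amount}
        return self.repository.save(invoice)


# message 1: the actor (user interface) calls the controller
controller = InvoiceController(InvoiceRepository())
invoice_id = controller.create_invoice("ACME", 199.0)
print(invoice_id)
```

Reading the calls from top to bottom reproduces the vertical time axis of the sequence diagram; the object references passed between them correspond to the links of the collaboration diagram.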

2.4.6 State Diagram

The state diagram describes the states of a certain process that are important from the point of view of the conceptual model of this process, and the transitions between them. In this way an object life cycle is created, which becomes the more important in the software development process the more possible states the object has. Over time, the original purpose of the state diagram has changed quite significantly, and it is now also used to show control flow, along with a number of additional syntactic and semantic options (Fig. 11).

Fig. 11 Example of a simplified state diagram [13]
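A state diagram can be implemented almost mechanically as a transition table. The fragment below is a minimal, hypothetical sketch (the states and events are illustrative, not taken from Fig. 11) showing how states and transitions map to code:

```python
# Transition table: (current state, event) -> next state
TRANSITIONS = {
    ("created", "submit"): "submitted",
    ("submitted", "approve"): "approved",
    ("submitted", "reject"): "rejected",
    ("approved", "archive"): "archived",
}


class Document:
    def __init__(self) -> None:
        self.state = "created"

    def handle(self, event: str) -> None:
        # Look up the transition; events not allowed in the current state are errors
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event '{event}' not allowed in state '{self.state}'")
        self.state = TRANSITIONS[key]


doc = Document()
doc.handle("submit")
doc.handle("approve")
print(doc.state)  # prints "approved"
```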

2.5 Summary of Part 2

The methodology used is an essential part of the software development process. Incorporating a methodology into the life cycle of the system makes it possible to carry out large projects with a significantly reduced risk of failure. This is because methodologies enable effective management, leadership and supervision of the entire system development process. Light and classical software development methodologies are characterized by a completely different approach to the process of creating project documentation. The differences are especially visible in the preparation of technical documentation and process documentation. Light methodologies virtually ignore process documentation, while product documentation is limited to the key elements without which it would be impossible to carry out the project.

3 Cascading Software Development Procedure

The cascading software development procedure was created on the basis of the cascade (waterfall) model of the system life cycle. It describes the goals of the individual phases of this model, the methods of their implementation and the documentation created in them.

3.1 Strategic Phase

The strategic phase, also known as the Strategic IT Development Plan (SPRI) or feasibility study, takes place before the manufacturer decides to implement the project. The main elements of this phase are interviews with the client, aimed at defining the individual objectives of the project from the client's point of view and at defining the scope and context of the project. In the strategic phase it is necessary to define the main requirements and to create an overall, schematic design of the system, which will be used to estimate the cost and to assess the risk of project success. It is very common practice to present the client with several options for solving the problem, together with the relevant costs and analysis. In this phase it is also necessary to prepare a preliminary schedule of work, define the standards according to which the project will be implemented, and then present the project to the client. At the strategic stage the manufacturer must make key decisions regarding the software development process, such as:
• selection of the model according to which the project will be implemented,
• selection of the techniques used in the analysis and design phases,
• selection of the implementation environment,
• selection of CASE tools,
• determination of the degree of use of ready-made components,
• deciding on cooperation with other manufacturers or on employing outside experts,
and must consider the following constraints:
• the maximum expenses that may be incurred in the implementation of the project,
• available staff,
• available tools,
• time limits.

The final stage of the strategic phase is the submission of its results to the client, which may be followed by acceptance or rejection of the software developer's offer. The strategic phase is an integral part of the software development cycle and should therefore not be carried out at the expense and risk of the manufacturer.

In the strategic phase of the software life cycle, the creation of clear and transparent documentation is an important element. It should clearly define how the manufacturer will implement the project and give its approximate valuation. The first document, i.e. the request (demand), is prepared by the client. It contains the requirement for the specification of the product and the conditions for its creation by the given manufacturer. Such a request should consist of the following elements:
• formal data (name of the client, contact details, NIP and REGON numbers, account number, current extract from the Commercial Register, designation and signature of the person authorized to act on behalf of the client),
• entrance fee,
• description of the current state of the project,
• product/service requirements,
• description of the scope of the project,
• principles of operation,
• definition of basic constraints (time, access to data),
• the dates of opening and closing negotiations,
• a list of other specifications required outside the offer.
On the basis of the request for quotation, the manufacturer prepares an initial offer, which should include the following elements:
• formal data (name of the client, contact details, NIP and REGON numbers, account number, current extract from the Commercial Register, designation and signature of the person authorized to act on behalf of the applicant),
• characteristics of the task to be performed by the contractor,
• hardware and system requirements,
• a price list of the individual elements of the system and of training, and optionally prices of components from external suppliers,
• annual maintenance costs,
• task completion time,
• a preliminary work schedule, most often in the form of a Gantt chart, taking into account the division of the project into individual tasks and their timing.
Figure 12 shows an exemplary Gantt chart presenting the initial work schedule of an IT system project.

Fig. 12 Gantt chart example [9]

When the customer accepts the initial offer, he requests a detailed order form. The order form is in most cases based on mutual consultations and is the basis for the preparation of the contract. The order form contains information about:
• equipment:
  – proposed configuration,
  – expansion possibilities,
  – the infrastructure required for the system,

  – maintenance and repair conditions,
  – costs,
• system software:
  – operating systems, compilers, database management systems, utilities,
  – license and maintenance costs and payment terms,
• application software:
  – definition of project objectives,
  – description of the scope of the project,
  – a description of the external systems with which the system will cooperate,
  – a general description of the requirements and other functions,
  – the overall model of the system and its architecture,
  – a description of the proposed solution,
  – the programming language used,
  – description of the system documentation,
  – ownership rules,
  – software costs,
• support for the future user:
  – definition of the scope of the supplier's responsibility for individual elements of the system,
  – training offer,
  – training and support costs,
• contract:
  – a model purchase contract (price dependencies, warranty conditions, extent of liability, etc.),

  – the method of accepting the contract and making the offer,
• application tests:
  – responsibility for carrying out the tests,
  – place and time of the tests,
  – the method of handing over the system,
  – description of the documentation of handover and acceptance of the system by the client,
• integration and implementation plans:
  – rules for the coordination of activities and responsibilities,
  – the financial and organizational situation of subcontractors,
  – assignment of suitable roles in the process to project members.
In addition, the manufacturer must provide the following documents:
• a solution evaluation report, which contains information on the solutions considered and the reasons for choosing one of them; the results of the comparison are most often presented in tabular form (Fig. 13),
• definitions of standards,
• a statement of the tenderer's financial and market situation containing, for example, the following information:
  – the legal status of the company and its share capital,
  – analysis of financial reports,
  – a list of implementations to date,
  – expert opinions and references,
  – the method of contacting the persons responsible for concluding the contract.

Fig. 13 Example of criteria for comparing three solutions

3.2 Requirements Setting Phase

The requirements determination phase is a process aimed at precisely specifying the requirements for the software being created. The description of the functionality and behavior of the system produced during this phase should be as accurate as possible, unambiguous for both parties, and easy to adjust if the requirements change in the future. Because the client is usually not able to define these requirements clearly and realistically, communication between the developers, the client and the future users of the system is extremely important at this stage. During interviews and reviews, the requirements defined by the customer are verified and critiqued, and comparisons are often made with existing analogous software and prototypes to determine whether the requirements are achievable. During these meetings, analysts and designers have the opportunity not only to define a complete list of software requirements, but also to define the ways in which these goals can be achieved. The system requirements are divided into two basic groups:
• Functional requirements—requirements that describe all types of functions and operations performed by the system or by external systems. In order to precisely define the functional requirements of the system, it is necessary to identify all types of users who will use it, as well as those necessary for its proper and stable operation, such as administrators or coordinators. For each type of user, the functions of the system and the way the proposed system is to be used should be defined. When determining functional requirements, the external systems used by the system (databases, networks) should also be identified.
• Non-functional requirements—requirements that describe the constraints under which the system should perform its functions. These can concern the product (e.g. keyboard navigation), the process (e.g. compliance with specified standards) or external systems. Factors such as system capabilities, scalability, speed, security, adaptability, standards and communication interfaces should be taken into account when defining non-functional requirements [16–18]. Non-functional requirements should be verifiable, which means that it should be possible to measure or check whether the system actually meets them.
All requirements defined in this phase should be included in the relevant document. This document is the basis for drawing up a detailed contract between the client and the software manufacturer and allows verification of the achievability of the goals set by the client at an earlier stage. The document describing the requirements should be clearly worded and understandable for both users and designers. All requirements should be worded as clear points which can easily be isolated from the context and, if necessary, replaced by new ones. A very important aspect is creating the document in such a way that it can be easily modified in the future and extended with new requirements. A template of such a document was created to standardize the way a description of requirements is produced (Fig. 14). According to the presented standard, the document should be divided into two main parts: the first is used for organizational information and the second for the main content of the document. The order and numbering of the subheadings in the presented structure should be maintained, even if a subheading remains empty. For each requirement described, it is necessary to state the reason for its introduction and to define the objectives that the requirement helps to achieve [9, 19].


Fig. 14 ANSI/IEEE Std 830-1993 "Recommended Practice for Software Requirements Specifications"

In the main part of the document we can distinguish the introduction, the general description and the specification of requirements. The most important element of the introduction is a suitable dictionary of technical terms, both from the field of IT and from the system's application domain, which allows both parties to understand the document. The general description of the system is most often written in natural language and structured natural language. It mainly contains the restrictions imposed on the system, a description of the users and a description of the system's characteristics. The detailed specification of requirements, in turn, contains the definition and description of all the requirements. In creating this section, formal notations such as tables and forms are used, which make it possible to systematize and illustrate the individual requirements and the relationships between them. An example of a form for defining functional requirements is given in Table 1. Graphical notation also plays an important role in specifying requirements: Figure 15 shows an example of a block diagram illustrating the hierarchy of functions and the dependencies between them.


Table 1 Example of a form for defining a functional requirement

Function name: Edit employee income
Description: The function allows editing of the taxpayer's total income obtained in a given year
Input data: Information on the employee's income obtained from various sources: the amount of revenue, the costs of obtaining revenue and the income tax advances paid; information on the documents describing income from individual sources
Source of input data: Documents and information provided by the taxpayer
Result: Data entered by an employee of the tax company
Precondition: Income amount = revenue amount − cost amount (both for each individual source of income and for the taxpayer's total income); the total amounts of revenue, tax-deductible costs and advances paid are the sums of these amounts over the income from individual sources
Final condition: As above
Side effects: Updating of the tax base
Reason: The function helps to speed up customer service and reduce the risk of making mistakes
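The precondition in Table 1 is simple arithmetic over the amounts listed as input data. The following is a minimal sketch, with hypothetical field names chosen only for illustration, of how such a check could be expressed:

```python
def income(revenue: float, costs: float) -> float:
    # Income = revenue minus the costs of obtaining it (the table's precondition)
    return revenue - costs


# Per-source amounts; the totals are sums over the individual sources
sources = [
    {"revenue": 50_000.0, "costs": 8_000.0, "advances": 4_500.0},
    {"revenue": 12_000.0, "costs": 1_000.0, "advances": 1_200.0},
]

total_revenue = sum(s["revenue"] for s in sources)
total_costs = sum(s["costs"] for s in sources)
total_advances = sum(s["advances"] for s in sources)
total_income = income(total_revenue, total_costs)
print(total_income, total_advances)
```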

Fig. 15 Block diagram example—hierarchy of functions [9]

3.3 Analysis Phase

The purpose of the analysis phase is to answer the question: "How is the system supposed to work?" The model produced in this phase is characterized by a hierarchical decomposition of system functions, a high level of abstraction and a notation that follows a specific convention. The most commonly used analytical techniques include [9]:
• compilation of a static class model,
• analysis of functions and use cases,
• verification of classes and objects,
• identification and definition of methods and messages,
• modeling of states and state transitions,
• modeling of processes and data flows,
• modeling of control flow.

During the analysis phase, user requirements are transformed into requirements for the product being created. These may relate to aspects such as system functions, system performance, operations performed, test methods, documentation and portability. Graphical notations play an important role in the analysis phase, just as in the requirements determination phase. They not only form the basis for software implementation and a way of recording technical documentation, but also support communication between members of the programming team. Two types of analysis methodologies are distinguished:
• Object-oriented—the use of object-oriented concepts for the conceptual modeling and analysis of information systems design. An essential part of this type of methodology is the class diagram, which shows classes, their attributes, methods, the relationships between classes, the multiplicities of those relationships and various constraints. The class diagram is complemented by other, dynamic diagrams, such as interaction diagrams, which capture the relationships between method calls, and function diagrams, which are a variation of data flow diagrams.
• Structural—containing passive components (data) and active components (operations performed). They combine a static data description with a static process description. The most commonly used structural analysis techniques are data flow diagrams (DFDs), data dictionaries, decision tables and decision trees, state transition diagrams and data diagrams.
The final products of the analysis phase are:
• a revised document describing the requirements, updated with the requirements for external interfaces, required resources, testing methods, quality, reliability, security and verification methods [20, 21],
• a revised and detailed analytical model of the system,
• a data dictionary with the full model specification,
• a document presenting a detailed description of the analytical model created in this phase; it contains class diagrams, use case diagrams, sequence diagrams and state diagrams, as well as a detailed description of all object classes, their attributes, their interrelationships and their functions in the system,
• a timeline for the design phase,
• a document containing the initial assignment of teams to tasks.


3.4 Design Phase

The aim of the design phase is to develop a detailed description of the way the system will be implemented, with the least possible interference in the structure of the model created in the earlier stages. The end product of this phase is the answer to the question "How should the system be implemented?", and the documents prepared in this phase will later become the technical documentation of the project. The main tasks of the designers include detailing the results of the analysis so that they can become the basis for implementation, optimizing the system, adapting the analytical model to the constraints of the selected implementation environment and determining the physical structure of the system [9, 22]. The refinement of the results of the previous phase is achieved by:
• providing appropriate rules that allow the notation used in the analysis phase to be mapped onto the structures of the programming language,
• a detailed description of the methods, by providing appropriate names, headers and parameters,
• determining how to implement relationships, most often by introducing appropriate attributes, pointers, identifiers or candidate keys into the objects of a related class,
• defining rules for the transformation of object schemas into relational ones,
• a detailed description of the algorithms used,
• definition of the basic data types.
The design created on the basis of such a detailed analytical model is responsible for implementing the system requirements. For the system to be complete, it is also necessary to design other components, such as:

• the user interface,
• persistent data management,
• memory management,
• task management and time sharing.

The design of the user interface should already have been prepared in the requirements definition phase; in the design phase special emphasis is placed on the flow of data between the user and the system and on the way data is entered and presented to the user. The interface must above all be designed to be consistent and standardized. Its mechanism should allow easy error handling and the grouping of related operations, should not overload short-term memory, and should inform the user that a command has been accepted [23]. The most common technique used to design the user interface is structural diagrams. These diagrams are a refinement of flowcharts and allow not only the individual elements, their hierarchy and the relationships between them to be presented, but also the directions of data and information flow. The design of data management includes the choice of the type of non-volatile storage in which the data will be kept (database, file), the form in which it is to be stored (a single relation or file, or separate relations or files for different types of objects) and the way it is written to non-volatile storage (continuously, or at the request of the user).

The correct design of the system should be complete, consistent and in accordance with the relevant notation rules. The completeness of the design means that all classes, fields, methods and data, both complex and basic, have been defined. Consistency, in turn, denotes the semantic consistency of all the information contained in the project documentation [9]. The results of the design phase are the following documents:
• a corrected and updated document describing the requirements,
• an improved analytical model,
• a detailed design specification included in the data dictionary,
• a document describing the created design, consisting of class diagrams, interaction diagrams, state diagrams, and module and configuration diagrams, and containing summaries describing class definitions, attributes, data and methods,
• UI resources,
• the physical design of the system structure,
• the schedule of the implementation phase.

diagrams should be read from left to right and from top to bottom, similar items should be arranged in one line and in the same style, visual symmetry should reflect functional symmetry, avoiding crossing lines and overlapping marks, avoid excessive density of diagrams, Important information should be emphasized.

Figure 16 shows the content of the document with project details.
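To make the relationship-mapping rules listed above concrete, the sketch below shows one common way an association from a class diagram can be carried into code and into a relational schema: an identifier attribute acts as a candidate key, and the dependent class holds it as a foreign-key-like reference. The class and table names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int          # candidate key; also the primary key of the relational table
    name: str


@dataclass
class Invoice:
    invoice_id: int
    amount: float
    customer_id: int          # association implemented as an identifier of the related object


# The same association expressed as relational schemas (object-to-relational transformation):
CUSTOMER_TABLE = "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)"
INVOICE_TABLE = (
    "CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, amount REAL, "
    "customer_id INTEGER REFERENCES customer(customer_id))"
)
```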

3.5 Implementation Phase

The implementation (coding) phase is carried out by programmers in the selected implementation environment. It transforms the design into code in a specific programming language. This phase can be partially automated by using high-level programming languages and ready-made components, CASE tools and code generators. A key element of the implementation phase is the reliability of the software produced, which can be increased by avoiding errors and by tolerating them. Avoiding errors means minimizing the likelihood of their occurrence, which is possible thanks to the following recommendations:
• avoidance of dangerous programming techniques,
• application of the principle of limited access,
• use of languages with strong type checking,
• use of languages with a higher level of abstraction,
• use of ready-made components,
• accurate and consistent specification of the interfaces between modules.
The final results of the implementation phase are:
• a corrected document describing the requirements,
• a revised system design, forming the technical documentation,
• code consisting of tested modules,
• a report describing the test results of the implemented modules,
• a test schedule,
• technical documentation of the code.


3.6 Software Testing, Verification and Validation

The software testing phase is crucial throughout the software development life cycle. It makes it possible not only to detect errors, but also to verify the quality and reliability of the manufactured product. Software testing can be divided into verification, i.e. checking the conformity of the product with its specification, and validation, i.e. checking that the product meets the user's expectations. The primary standard for software testing is IEEE 829-1998 (IEEE Standard for Software Test Documentation). It defines the form of a set of eight documents required at the successive stages of software testing; the result of each stage is one final document. The standard specifies the exact format of the documents, but does not require all of them to be produced, nor does it prescribe in detail what they should contain [24]:
• Test plan—a project management planning document that contains information on how the tests will take place, who will carry them out, what will be tested, how long the whole process will take and what the scope of the tests will be.
• Test design specification—a document that details the test conditions, the expected results and the test pass criteria.
• Test case specification—specifies the test data to be used in implementing the test conditions defined in the test design specification.
• Test procedure specification—provides details on how each test is run, including the preconditions and the steps of the test.
• Test item transmittal report—contains reports on when the tested software fragments passed between phases.
• Test log—contains information about which test cases were used, who used them and in what order, and whether they succeeded.
• Test incident report—contains information about failed tests, about their results, and about the reasons why the test failed.
• Test summary report—contains all relevant information gathered during the completed tests, together with an assessment of the quality of the test process and of the software tested, as well as statistics derived from the incident reports. The report also indicates the types and duration of the tests performed, to facilitate further test planning. The final form of the document is used to verify the correctness of the tested system against the requirements defined by the clients.

3.7 Summary of Part 3

The cascading software development process is one of the first classical methodologies. It was very popular among programming teams for many years. Over time, however, the cascading model of the software life cycle used in it proved inefficient, because it generates huge costs for errors made in the initial stages and requires well-defined client requirements.


With the development of classical methodologies based on a spiral model of the software life cycle and the spread of agile methodologies, the cascading model began to be abandoned. Its basic assumptions regarding project management and documentation are, however, still successfully applied in the latest software development methodologies.

4 RUP Methodology

Many software development companies have concluded that the foundation of success is correct and accurate documentation of the production process. Based on this experience and on an analysis of the latest development practices, the RUP methodology was developed. This section presents the most important information about the RUP methodology: a description of the process structure, its properties and the form of the resulting documentation.

4.1 What Is RUP

RUP (Rational Unified Process) is one of the classical software development methodologies, developed and officially published in 1998 by Rational Software and currently maintained by IBM. It ensures an orderly allocation of tasks, defines the scope of responsibilities within the company and precisely defines the phases of the project in which it is used. Its main goal is to create high-quality software that meets the needs of users within the expected schedule and budget. RUP is also understood as an iterative, use-case-driven approach to software development. The methodology incorporates many current best practices in software development; in particular it includes:
• iterative software development,
• requirements management,
• use of component-based architecture,
• visual software modeling,
• continuous software quality control,
• control of software changes.

Cascading or linear approaches to software development, implemented by most classical software development methodologies, are characterized by the fact that unresolved risks are pushed forward in time, and debugging at later stages is not only costly but also labor-intensive and often difficult to carry out. A key factor influencing the success of a project created in accordance with these processes is the unambiguous presentation and definition of the project requirements and the extensive experience of the people working on the project. This is because the design created in the previous stages cannot be changed. Given the breadth of project subject areas and the difficulty clients have in formulating clear requirements at the beginning of system development, it is practically impossible to meet these conditions in large and medium-sized projects.

Therefore, an iterative approach was adopted in RUP. It is characterized by the fact that the process consists of a series of incremental steps, i.e. iterations. Each iteration covers all phases of the software development process, i.e. the formulation of requirements, analysis, design and implementation. Each iteration has a clearly defined set of goals and leads to a partially functioning implementation of the final system. As a result of subsequent iterations, the system evolves and improves until the final product is obtained. A diagram of such a process is shown in Fig. 17.

Fig. 17 Schematic of the RUP iteration process

This approach allows requirements to be easily modified and completed, and also allows system integration to be analyzed across subsequent iterations, significantly reducing the risk associated with the final integration of the project and neutralizing the biggest threats, which are most often detected during the final integration of product components. An iterative approach also allows for dynamic product changes which can, for example, reduce software development time by reducing the number of features, if needed. It also eliminates a large number of system errors, as errors are detected in early iterations and can be eliminated easily. An important feature of this approach is easier reuse of the software, which results from implementing and designing only part of the system at a time, and the gradual training of team members, who have several opportunities to verify the accuracy and correctness of their work and solutions.


An iterative approach and a focus on system architecture also support building the system from components. Thanks to their gradual identification, analysts and designers have the opportunity to assess whether a specific module should be bought or built.

4.1.1 The Essence of RUP

The RUP methodology was developed on the basis of experience from the successful implementation of a large number of projects. Based on these, a set of principles was formulated which became the core of the methodology. Software can be developed without following these guidelines, but adhering to them significantly increases the chances of success. The principles are as follows:
• attack the main threats as early as possible and consistently, otherwise they will attack you,
• make sure the customer gets a valuable product,
• constantly focus on creating executable software,
• accommodate change early in the development process,
• try to develop an executable architecture as early as possible,
• build the system from components,
• work with colleagues as one team,
• treat quality assurance as a continuous process, not an afterthought.
Attacking the major threats means identifying them in time. The Rational Unified Process provides a structured approach to tackling major risks in a timely manner, reducing overall costs and enabling more timely and realistic estimates of project completion time. This is done by listing the most important threats in each iteration, prioritizing them, and then carefully considering each case in order to address and mitigate the risk. Risk reduction using this approach is shown in Fig. 18.

Fig. 18 Risk reduction curves for cascading and iterative software development


In the RUP methodology, the delivery of a valuable product to the customer is achieved through a use-case-driven approach. Use cases force a focus on the user's perspective and an evaluation of the project in terms of the user's requirements. Because they describe the interaction between the user and the system and are arranged chronologically, they are understandable to both the client and the analyst, making it possible to identify many gaps and shortcomings already in the requirements definition phase. An advantage of this approach is also that it gives all team members direct access to the requirements during design, implementation and testing, as well as when producing the final project documentation.

The development of executable software is the third basic principle of RUP. Thanks to this approach, it is possible to realistically assess the progress of work on the project. This practice also shapes the project documentation that needs to be created, as designers and programmers are able to continuously assess which artifacts are useful in creating a module and which are unnecessary.

The RUP methodology has been designed to minimize the cost of changes while retaining the greatest possible ability to introduce them. This has been achieved by the skillful placement of milestones completing the successive stages of software development, and by requiring an overall vision of the system to be designed by the end of the inception phase. In addition, at the end of the elaboration phase RUP forces the creation of a solid architecture and prevents the introduction of new functions in the final part of the construction phase. This reduces costly changes at key moments of software development. Change in RUP is also facilitated by the promotion of component-based software.

One of the main benefits of using RUP is the ability to start testing the software early. Tests can be run from the moment the first iteration is completed and a prototype is presented. Thanks to such early assessment, it is possible not only to achieve significant time and cost savings, but also to check whether the created architecture works properly. Another beneficial factor is the close interdependence between testing and design, which allows much more efficient automation of testing, as test code can be generated directly from the design models.

4.2 RUP Structure

Figure 19 shows the overall architecture of the Rational Unified Process. The process has two dimensions:
• the horizontal axis—the dynamic structure, also known as the time dimension of the process; it shows how the cycles, phases, iterations and milestones of the process unfold during the project,
• the vertical axis—the static structure, used to present the static properties of the process, described in terms of process components, activities, disciplines, artifacts and roles.


Fig. 19 The two-dimensional structure of Rational Unified Process

4.2.1 Static Structure—Process Description

The Rational Unified Process methodology defines five basic elements describing who should do what, how and when it should be done:
• Roles—a role defines a scope of competencies and responsibilities and the way in which the persons or groups of people working on a given project should carry out their work. A person usually plays one or more roles, and each task can be performed by several people. The responsibilities of each role are usually expressed in terms of the artifacts that the role creates, modifies or oversees. Each role also has a set of skills that the person playing it must have in order to complete the assigned activities. Roles are divided into five main categories:
  – analyst roles,
  – developer roles,
  – tester roles,
  – manager roles,
  – production and auxiliary roles.

• Activities—an activity is a unit of work to be performed by the players of a role, and a specific set of activities defines the work assigned to a given role within a project. Each activity has a clearly defined objective to be achieved upon its completion, a defined duration, and a procedure according to which it should be carried out. Activities consist of steps, which are divided into:
  – thinking steps—analysis and study of the nature of the input artifacts,
  – performing steps—creating or updating certain artifacts,
  – reviewing steps—evaluation of the results against given criteria.


• Artifacts—an artifact is a piece of information that is created, modified or used in the software development process. It is a concrete product that roles use as input to, or produce as a result of, performing certain activities. An artifact can be documented formally, in the form of a model, a model element, a document, source code or executable software, or informally, for example in an e-mail or on a whiteboard. In an iterative development process, the various artifacts are not built, completed or frozen before a particular phase ends, but evolve throughout the product development cycle.
• Workflows—a workflow combines artifacts, roles and activities and describes a sequence of tasks that delivers valuable and measurable results. It also describes the interactions between roles and their sets of activities. There are three types of workflows in RUP:
  – basic—assigned to each discipline,
  – detailed—defining the individual elements of a basic flow,
  – iteration plans—describing the activities in each iteration.
• Disciplines—disciplines are used to group specific activities within the software development process. RUP defines nine basic disciplines: six technical (main) and three auxiliary. They divide all roles and activities according to areas of interest and specialization. The technical disciplines are:
  – business modeling,
  – requirements,
  – analysis and design,
  – implementation,
  – testing,
  – deployment.
The auxiliary disciplines are:
  – project management,
  – configuration and change management,
  – environment.

Roles, activities (organized into workflows) and artifacts form the backbone of RUP's static structure. However, there are several additional elements that make it easier to understand and use the whole process (Fig. 20):
• Tips—briefly and concisely described rules or recommendations that help in performing activities and steps and in producing artifacts. They explain what needs to be done and list specific techniques for creating artifacts or for assessing their quality. Tips can be divided into:
  – work tips—practical advice on activities or groups of activities, such as programming or workshops,
  – artifact tips—describing the creation, evaluation and use of individual artifacts.


Fig. 20 Introducing templates, user guides for tools, and tips

• Templates—models of artifacts, supporting their creation and standardization.
• Tool user guides—describe how the activities of the process are carried out with particular tools, which makes it possible to keep the descriptions of the activities themselves independent of the tools used.
• Terms—key terms and definitions used in the process.
• Wizards—introduce the user to RUP from a given perspective.

4.2.2 Dynamic Structure

The dynamic structure relates to the duration of the project, i.e. its time dimension. RUP takes a structured approach to iterative software development: the whole process is divided into four phases, inception, elaboration, construction and transition, each ending with a milestone. The schematic division into phases is shown in Fig. 21.

Fig. 21 The four stages and milestones of the iteration process

This approach makes it easier to manage and control the progress of work, because after each phase the process can be evaluated and decisions made about the efficiency and accuracy of the tasks performed. The phases form one software development cycle, and their end product is a prototype or a generation of the software. If the system is to be developed further, the cycle is repeated with all its steps and the product evolves.

During the inception phase, a vision of the product is developed and the requirements are set, thanks to which a common understanding of the project's objectives can be achieved among all its participants. During this phase, the critical use cases of the system and the main behavior scenarios are also defined, which make it possible to outline the overall system architecture. In addition, at this stage the costs are estimated and a timetable for the whole project is set. The main artifacts created in this phase are [25]:
• a vision document,
• a use case model (10–15% completed),
• an initial project glossary,
• an initial business case containing:
  – operating conditions,
  – success criteria,
  – a financial forecast,

• initial risk assessment, • project plan with stages and iterations, • domain model. In the development phase, steps are taken to analyze the task area, create a basis for the system architecture, gather the necessary resources and eliminate the elements causing the greatest threats to the project. Most of the work in this phase is focused on creating and checking the system architecture, which will serve as a starting point in the next stages. The results of the development phase are: • • • • • •

use case model (completed at least 80%), description of additional system requirements (non-functional), description of the system architecture, executable architecture prototype, extensive list of threats, a plan for the creation of the whole system together with criteria for the evaluation of individual iterations, • an introductory user guide.


In the construction phase, the product is implemented and integrated with the components used. The main goals pursued during this phase are to minimize costs and optimize quality through skillful resource management. The end product of this phase is a tested, usable version of the system. During the construction phase, the product is also assessed against the criteria defined in the vision. The results of this phase are:
• executable software as a finished product,
• a user manual,
• a description of the current software version.
The last phase of the process is transition. Its main tasks are the final tests of the delivered prototype, the product launch and the training of future users. An important prerequisite for this phase is also the correct evaluation of the product against the vision created earlier and the criteria for its acceptance.

4.3 Artifacts in Individual RUP Disciplines

In the RUP methodology, the collection of artifacts is divided among the disciplines in which they are produced. Each discipline defines a number of characteristic products that should arise as a result of its implementation.

4.3.1 Project Management Discipline

A very important issue in the RUP methodology is the approach to project management. The methodology therefore not only defines ways of handling threats and methods of measuring and monitoring the progress of work, but also specifies how iterative process planning is carried out. The project manager is responsible for project planning. The plan must cover the roles and responsibilities of all team members, and the manager must oversee the progress of work and check its status against the implemented plan. Planning also involves non-human resources such as equipment, finances and contracts; however, RUP does not address these aspects of planning. Due to the length of projects, their extensive structure and the lack of stable requirements, it is often impossible to create an accurate plan of the entire production process of the system. RUP therefore takes a different approach to project planning. During the construction of the system, two types of plans are created:
• a rough plan—created for the needs of the further stages of software development,
• detailed plans—containing the set of activities and work items for each iteration.
In the software development process, only one rough plan is created, covering all general information about the process. It is created during the inception phase and then updated. It includes the dates of the most important milestones, such as the review of the company’s goals, the development of the architecture, the launch of the first functional version of the product and making the product available to users. It also contains information on sub-milestones, including the planned start and end dates of each iteration and their main objectives. The rough plan should also describe the team that will implement the project, such as the employment profile of its members and its size.

A separate detailed plan is created for each iteration in a process carried out in accordance with the RUP methodology. Most often two of these plans are active at the same time: in a particular iteration its detailed plan is executed while the plan for the next iteration is being created. These plans are created using traditional planning methods and tools, such as the Gantt chart, which allow tasks to be defined and assigned to specific team members. They contain the dates relevant to the iteration, such as the delivery date of a particular product, the creation of an artifact, or the construction of a model. In addition to the rough plan and the plans for the individual iterations, the project management discipline defines a number of other documents. They are listed in Table 2.

Table 2 Collection of artifacts in the discipline of project management

Artifact: Business case (The case activities)
Role: Business manager
Description: Purpose of creation: elaboration of the business vision of the system and determination of the profitability of the project. This artifact contains the necessary information from a commercial point of view. The information contained therein relates to the economic vision of the undertaking; its main task is to determine costs and benefits. It also allows the scope and plan of the project to be defined more precisely. Contents: a set of project assumptions, the order amount, the responsibilities of client and producer, a complete list of the costs associated with manufacturing and maintaining the system.

Artifact: Deployment plan (Implementation plan)
Role: Business manager
Description: Purpose of creation: ensuring the delivery of the system to its users. Deployment can cause many changes and problems, so ensuring the smooth release of the product is a key factor in meeting customer requirements. The deployment plan should minimize the costs and risks associated with delivering the system to the customer. This plan is made for implemented systems; it is not designed for prototypes. It describes the set of tasks that must be performed when deploying and testing the product. Contents: a detailed deployment schedule, a list of persons responsible for deployment, the roles of the people involved in deployment.

Artifact: Development plan (Creation plan software)
Role: Business manager
Description: Purpose of creation: systematization of the information contained in the documents created in the elaboration phase and definition of the management method and course of the project. This document contains all the information on how to start the project, which makes it possible to create a work plan, define the required resources and determine the method of controlling the work. The document evolves together with the design and the requirements. Contents: information on the contract, a sketch of the project architecture, a description of the implemented methodology, a description of the tools used to work on the project, product quality requirements, a description of the main assumptions of the project, technical and personnel needs.

Artifact: Problem list (List of threats)
Role: Business manager
Description: Purpose of creation: defining the main risks, problems, exceptions or irregularities in the project. This document describes issues that are related to project management and may affect the success of the project; such problems can be related to the schedule, deadlines or resources. Contents: an informal description of the problems, their priorities, ways to avoid or solve them.

Artifact: Iteration rating (Iterative assessment)
Role: Business manager
Description: Purpose of creation: assessment of the implementation of an iteration. This document defines the extent to which the evaluation criteria for a given iteration have been met and records the conclusions drawn from it. Contents: a description of the activities performed in the iteration, an evaluation of the achievement of the intended objectives, conclusions.

Artifact: Work order (Order)
Role: Business manager
Description: Purpose of creation: definition of the activities to be carried out and the deadlines for their implementation. This document is a kind of contract between the project manager and the task managers. It describes the tasks to be performed in a form understandable to team members and defines their priorities and deadlines. Contents: a list of tasks, the assignment of tasks to individual people.

Artifact: Project measurements (Measurement businesses)
Role: Business manager
Description: Purpose of creation: collecting up-to-date data on the project, resources, processes and product measurements. It is the complete project database; this document provides information on the progress of work, the means used and the performance characteristics. Contents: measurement characteristics, measurements, conclusions, data acquisition methods.

4.3.2 Business Modeling Discipline

The business modeling discipline defines a number of methods for understanding the structure and functioning of the company for which the system is created. The aim of this discipline is also to obtain information about the current problems in the company and possible improvements, and to arrive at a common view of customers, users and developers on the shape of the final product. Another task of this discipline is to gather the basic requirements for the system. The above goals are achieved by creating a vision of the target organization. With this vision, the business model defines processes, tasks and responsibilities. It makes it possible to precisely define the properties of the software and determine its purpose. Thanks to this, the produced software meets the needs not only of the client, but also of the future users. The business modeling process begins with an assessment of the situation in the company in which the system is to be implemented. On this basis, a company evaluation document and a business vision are created. Based on the resulting documents, a suitable business modeling scenario is selected. The next step is domain modeling and domain analysis by examining the business use cases. If the modeled activity is an improvement of an existing one, models of both the current and the new vision of the company are created. Table 3 lists all artifacts arising in the business modeling discipline.

Table 3 Artifacts in the field of business modeling

Artifact: Target-organization assessment (Evaluation of the target organization)
Role: Business process analyst
Description: Document containing a description of the current state of the company in which the system is to be implemented.

Artifact: Business vision (Business Vision)
Role: Business process analyst
Description: A document that defines the purpose of the business activity and its objectives. It is the basic document of the business modeling discipline. It contains a simplified business model, describes all related elements and defines the critical guidelines for the implemented project.

Artifact: Business glossary (Business Glossary)
Role: Business process analyst
Description: A document defining important terms specific to the characteristics of the given company.

Artifact: Business rule (Business Rule)
Role: Business process analyst
Description: Determines the directions of key importance for the company and the conditions that need to be taken into account when creating the business vision.

Artifact: Business use case model (Business Use Case Model)
Role: Business process analyst
Description: A document that contains a model of the expected business functions and is an important source of information about the roles and products delivered within the company. It defines the direction and intentions of the company’s activities. The direction is expressed through business goals resulting from the company’s strategy, while the intentions are formulated on the basis of the company’s interactions with customers. This document is used to improve and expand the services offered to the user and to define the appropriate set of features for the product being created.

Artifact: Business analysis model (Business Analysis Model)
Role: Business process analyst
Description: An object model that describes the realization of business use cases through the interaction of business systems, company employees and business entities. It also identifies the set of external business services that are used in these cases.

Artifact: Business architecture document (Business Architecture Document)
Role: Business process analyst
Description: A document that provides a detailed overview of the important architectural elements of the business from different perspectives. It is most often used to present the characteristics of the company to third parties and when the characteristics of the company need to be changed.

4.3.3 Discipline of Requirements

The requirements discipline describes what needs to be done to achieve the following goals:
• defining a set of functional and non-functional system requirements and their acceptance by the clients,
• assisting team members in an accurate and unambiguous interpretation of the established system requirements,
• definition of the system boundaries,
• creating a basis for planning the technical content of iterations,
• creating a basis for estimating the costs of the manufactured system and the time of its construction,
• design of a prototype user interface.

The requirements discipline also defines how to create a system vision and then translate it into a use case model. It further explains how to use requirement attributes to facilitate the management of scope and of changes to requirements [25]. A key element of this discipline is the collection of all requirements concerning the created system from its stakeholders. They can be gathered through various forms of communication, such as interviews, questionnaires and workshops. These requirements create the set of data needed to understand and clearly define all needs and critical comments on the system being created. They also form the basis for the development of detailed system requirements and features. Features are a distinct, informal type of request that describes services meeting the specific needs of users. When the appropriate attributes are added to them, such as workload, priority and rating, features become specific requirements for the system being created.

The requirements discipline workflow begins with a written formulation of the task to be performed, the identification of stakeholders, and the definition of the boundaries and constraints of the system. Stakeholder requests are then collected and defined through various forms of communication. Based on the obtained data, a list of wishes and a vision document are prepared; the vision document contains a complete picture of the created system and represents the basis for the contract between the manufacturer and the customer. It is written from the perspective of the system user, and its most important part is the set of needs and properties of the system related to the services that must be included in the created system in order to meet the basic requirements of its future users. The selection of the requirements included in the vision document is made on the basis of a cost–risk analysis prepared in the previously created business case document. The next step in the requirements discipline is to define the details of the described requirements and to create a use case model. This model is a key artifact in the RUP methodology and is understandable for both the developers and the client. It not only allows knowledge about the requirements implemented in the system to be exchanged, but also allows the vision of the created system and its scope to be verified. After the use case model has been created, the next step is to create the supplementary specifications. They complement the use case model and cover, in addition to the functional requirements, the non-functional ones. These specifications, together with the use case model, provide a complete definition of the requirements for the software being developed. All requirements discipline artifacts are listed in Table 4.
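Before turning to the artifact overview in Table 4, a minimal, purely illustrative sketch may help to show how requirement attributes such as workload, priority and rating can turn an informal feature into a trackable requirement. The type and field names below are invented for this example and do not come from RUP or from any specific requirements-management tool.

import java.util.Objects;

// Hypothetical representation of a single requirement with the attributes
// mentioned above (workload, priority, rating); illustrative only.
public record RequirementEntry(
        String id,                  // e.g. "REQ-017"
        String featureDescription,  // informal feature wording from a stakeholder request
        int workloadPersonDays,     // estimated effort
        int priority,               // 1 = highest
        String rating,              // current assessment, e.g. "proposed" or "approved"
        String plannedIteration     // iteration in which the feature is planned
) {
    public RequirementEntry {
        Objects.requireNonNull(id); // every tracked requirement needs an identifier
    }
}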

Table 4 Requirements discipline artifacts

Artifact: Requirements management plan (Requirements Management Plan)
Role: Systems analyst
Description: A document describing the format that requirement artifacts should have, the types of requests and their attributes. It defines the method of describing the collected requirements, the methods of their control and evaluation, and the mechanisms for versioning the documents containing information about the system requirements.

Artifact: Software requirement (Software Requirement)
Role: Requirements specifier
Description: The document defines the options that the software must provide to the user and describes the requirements for the software production process itself, such as coding standards, forms of documentation, and standards to be followed.

Artifact: Glossary (Glossary)
Role: Systems analyst
Description: Document containing definitions of the terminology specific to the project.

Artifact: Stakeholder requests (Requests from interested parties)
Role: Systems analyst
Description: The document is a list of stakeholder requirements and system features. It is created during the requirements definition process and is the basis for creating the vision and the use case model.

Artifact: Vision (Vision)
Role: Systems analyst
Description: A document containing a complete picture of the system being created. It contains a description of the stakeholders of the system, the key needs and requirements, and the most important aspects of the production process. The information contained in it is the basis for drawing up the contract.

Artifact: Supplementary specification (Additional specification)
Role: Systems analyst
Description: Document containing the specifications of all non-functional system requirements. It describes the legal requirements and standards that the system should meet. It also defines system attributes such as reliability, performance, and the scope of post-release technical support.

Artifact: Use case model (Use-Case Model)
Role: Systems analyst
Description: The model contains all functional requirements of the system presented in the form of a use case diagram together with their descriptions. It describes all the actors and the relationships that can occur between them and the system. It is a basic document, developed and modified during the entire software development cycle.

Artifact: Storyboard (Storyboard)
Role: User interface designer
Description: A document describing use case scenarios that is the basis for user interface development. It defines what actions the user should be able to perform in a given application context and how they should proceed.

4.3.4 Discipline of Analysis and Design

The main goal of the analysis and design discipline is to translate the system requirements into a description of how the system will be implemented. To achieve this goal, it is necessary to adopt an appropriate implementation strategy and develop a system architecture. The analysis, i.e. the first phase of this discipline, consists in translating the requirements so that they are understandable and unambiguous for the software designer. For this purpose, a set of object classes is defined, as well as a division into subsystems, and their functions are determined. Special emphasis is placed on covering the functional requirements and ensuring their implementation with a minimal risk of conceptual errors. The second phase of this discipline is the design of the system, which consists in adapting the previously obtained products to the constraints resulting from the non-functional requirements, efficiency requirements, the implementation environment and the like [25]. The level of detail of the design depends on the type of system created, the experience of the team and the company’s policy.

The main roles in the analysis and design discipline are the software architect and the designer. People performing the first of these roles are responsible for performing and coordinating technical activities and creating the appropriate artifacts, while designers define the scope of responsibilities, operations, attributes and relationships between classes, and how they are integrated and adapted to the implementation environment. In addition, if the project makes extensive use of a database, the role of database designer is introduced. Some projects also require people to act as architecture and design reviewers; they evaluate the main artifacts created in the analysis and design phase.

The most important artifact created in the analysis and design discipline is the design model. This model is based on the previously developed use case model. It contains a description and structure of object classes grouped into packages and subsystems, whose task is to obtain specific behavior within one mechanism. In the initial phase, the design model is an outline of the system, which defines the set of elements of the system, its mechanisms and the way they are organized. This sketch is then expanded and refined at the design stage: the definitions of the design elements are refined and details of how these elements realize the expected behavior are developed.

The user interface is an important element of the system developed in the analysis and design phase. Its design runs in parallel with the design of the system and should result in the development of a prototype. The artifacts created within the analysis and design discipline are listed in Table 5.

Table 5 Artifacts of the discipline of analysis and design

Artifact: Analysis model (Analysis Model)
Role: Software architect
Description: An abstract, generalized system model based on the design model. It provides an overview of all functions implemented in the system and describes the aspects that are specific to the application. It also contains an analysis of the packages and their hierarchy, the object classes and relationships, and a description of the functions performed by individual packages. Because this document describes how all the features work, it is most often used to quickly introduce new team members. Its level of abstraction also facilitates the implementation of the system in many different environments.

Artifact: Data model (Data Model)
Role: Database designer
Description: The document contains the design of the system database. It describes the structure of the database, identifies the persistent classes in the project, and describes the mechanisms and strategies for storing and retrieving data so that the system performance criteria are met.

Artifact: Deployment model (Deployment Model)
Role: Software architect
Description: This document contains the configuration of client computers, servers, nodes, connections, and protocols. It also describes the set of drivers and external programs needed to start and operate the system.

Artifact: Design model (Design Model)
Role: Designer
Description: The design model is an abstract description of the system. It is a comprehensive description covering all classes, components, packages and subsystems and the relationships between them [25].

Artifact: Navigation map (System map)
Role: Interface designer
Description: Most often, the document contains a tree diagram representing the main routes of the user through the system. These are paths through the system’s successive screens, not necessarily all possible paths. This document is a guide to the user interface.

Artifact: Reference architecture (Reference Architecture)
Role: Software architect
Description: This product consists of finished architectural designs, architectures of mechanisms and structures used so far, and a complete description of existing systems with known characteristics that have proven themselves in use. Its goal is to create a starting point for the development of the architecture and to enable architects to follow ready-made, existing and functional solutions.

Artifact: Software architecture document (Software Architecture Document)
Role: Software architect
Description: This product provides a comprehensive overview of the system architecture using several different architectural views that show different aspects of the system being created. It also serves as a means of communication between software architects and other members of the design team about the important architectural decisions made in the project [25].

Artifact: User-interface prototype (Prototype user interface)
Role: Interface designer
Description: This artifact is an example of the user interface. It describes and presents the designed appearance of the system, which allows the customer to make an initial analysis of the appearance and functions of the individual screens of the system.

4.3.5 Implementation Discipline

The implementation discipline focuses on achieving the following goals [25]:
• presentation of the code organized into subsystems divided into appropriate layers,
• implementation of the defined classes and objects,
• testing of the individual components,
• merging the code created by the programmers.

In the RUP methodology, the above objectives are achieved by iteratively creating products, merging them and creating new prototypes. This approach allows for regular system development and simultaneous testing of the already created modules, which makes it possible to detect many inconsistencies and errors early—even before the test discipline begins. The roles responsible for performing these activities are the implementer, responsible for the assembly of the individual components, their testing and the creation of suitable artifacts, and the system integrator, responsible for producing the builds of the product. The RUP methodology defines the framework of the workflow in the implementation discipline. For each iteration, the following steps are performed [25]:
• establishing an integration plan indicating the subsystems to be implemented in the current iteration and the order in which they will be implemented and merged,
• determining the order of class implementation within the current iteration,
• implementation of classes and objects, customization of existing components, compilation and linking,
• performing unit tests in order to check the implemented changes and correct any errors,
• reviewing the generated code and checking compliance with the adopted guidelines,
• connecting the new and changed components and creating the builds,
• testing the correctness of the merges and integrations.
In the implementation discipline, several documents are prepared; they are listed in Table 6.


Table 6 Artifacts of the implementation discipline

Artifact: Integration build plan (Integration Build Plan)
Role: System integrator
Description: Document defining the order in which the elements and subsystems should be implemented. It also includes a description of the products to be manufactured during system integration.

Artifact: Implementation model (Implementation Model)
Role: Implementer
Description: Document containing the physical structure of the implemented system, in particular its components and subsystems (directories and files, including source code, data and executable files). It also identifies the major integration units that can be independently versioned, deployed, and replaced.

Artifact: Developer test (Test for developers)
Role: Implementer
Description: A document containing a scheme for testing the software produced by the programmers and a definition of the basic requirements that the software must meet.
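To give a concrete picture of what the developer test artifact typically fixes, below is a minimal, hedged sketch of such a unit test in Java with JUnit 5; the class under test (PriceCalculator) and the test values are invented purely for illustration and do not come from the RUP documentation.

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Invented unit under test, included only so that the example is self-contained.
class PriceCalculator {
    private final double vatRate;
    PriceCalculator(double vatRate) { this.vatRate = vatRate; }
    double grossPrice(double netPrice) { return netPrice * (1 + vatRate); }
}

// The developer test itself: it records the basic requirement the unit must meet
// and is executed again whenever the component is merged into a new build.
class PriceCalculatorTest {
    @Test
    void grossPriceAddsVatToNetPrice() {
        assertEquals(123.0, new PriceCalculator(0.23).grossPrice(100.0), 0.001);
    }
}

In an iterative RUP project, such a test would accompany the implemented class as part of the unit-testing step listed in the workflow above.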

4.3.6 Test Discipline

The test discipline is primarily about assessing and evaluating product quality. These assumptions can be achieved through the following tasks [25]:
• finding and documenting bugs, errors and unresolved issues in the executable software,
• evaluating software quality and informing project management about it,
• formulating and supporting concrete evidence for assessing the assumptions made during the design phase and the specification of requirements, and verifying the correctness of these assumptions,
• checking that the system works in accordance with the developed design,
• checking that all the identified requirements have been implemented appropriately.
In the RUP methodology, testing takes place in all phases of the software development process. This approach makes it possible to obtain timely information about product quality, so that steps to improve it can be taken while the product is being designed and created. The artifacts produced in the test discipline are listed in Table 7.

Table 7 Testing discipline artifacts

Artifact: Test plan (Test schedule)
Role: Testing manager
Description: A document that provides information about the role and objectives of testing in the undertaking. It also contains a general description of the testing strategies, their configuration, and a description of the resources needed for their proper implementation.

Artifact: Test strategy (Testing strategy)
Role: Test designer
Description: A document presenting selected variables and their values used in various efficiency tests to simulate or emulate the characteristics of the actors and functions of end-user activities, their load and volumes [25].

Artifact: Workload analysis model (Workload Analysis Model)
Role: Test analyst
Description: A document presenting selected variables and their values used in various efficiency tests to simulate or emulate the characteristics of the actors and functions of end-user activities, their load and volumes [25].

Artifact: Test log (Test Log)
Role: Tester
Description: Document containing the raw data collected during testing [25].

Artifact: Test evaluation summary (Summary of test evaluation)
Role: Testing manager
Description: Document created after all test scenarios have been completed. It shows the overall quality of the system, describes the results of the individual tests and evaluates them as a whole. It allows a decision to be made on whether to deploy the software or to repeat certain iterations of the software development process.

Artifact: Test data (Test Data)
Role: Test designer
Description: This artifact specifies a set of test input values and expected results. They are used in testing for a comparative analysis of the obtained values.

Artifact: Test case (Test situation)
Role: Test designer
Description: The document contains a set of tests to be performed on the given functionality, the conditions of their execution and a link to the test data.

Artifact: Test results (Test Results)
Role: Test analyst
Description: This artifact summarizes the analysis of one or more tests and provides a detailed assessment of the quality of the component and of the state of preparation.

Artifact: Test environment configuration (Test Environment Configuration)
Role: Test designer
Description: This product specifies the hardware, software, and environment configuration required to perform accurate testing.

Artifact: Test automation architecture (Test Automation Architecture)
Role: Test designer
Description: Description of the algorithms, designs and elements enabling test automation. The document describes their properties and their possible use in relation to the current project.

4.3.7 Configuration and Change Management Discipline

The goal of the configuration and change management discipline is to update and maintain the consistency of the artifacts that are modified during the system build process. The purpose of this discipline is also to observe the changes taking place in the product, to map them to the set of artifacts, to manage the versions of documents and to provide team members with information about the changes that have been made in them. The whole process can be divided into the following zones:
• Configuration management (illustrates the product structure)—This area covers the identification of artifacts, their versions, and the change history. Because many artifacts are interrelated, it is important to keep them consistent. Within this zone, responsibilities are also divided so that stakeholders do not interfere with each other.
• Change request management (illustrates the process structure)—This area includes the collection and management of the change requests generated by the project stakeholders. An important element is also the analysis of the possible impact of each of the introduced changes on the operation of the system and the monitoring of its implementation.
• Status assessment and measurement (illustrates the project management structure)—Contains information on the status of the product, its quality and the progress of work. This data is obtained by analyzing the tasks performed and those that still need to be performed, the costs, and the areas in which problems occurred during the work on the project.
The key artifacts that appear in the configuration and change management discipline are summarized in Table 8.

Table 8 Artifacts of the discipline of configuration and change management

Artifact: Configuration management plan (Configuration Management Plan)
Role: Configuration manager
Description: Describes the enterprise policies and configuration management procedures: variants, versions, workspaces, change management procedures, products and releases. It also sets out the rules and responsibilities of the change control board. It is part of the software development plan.

Artifact: Change request (Request change)
Role: Change control manager
Description: Change requests can be caused by errors, configuration changes, or requests for new features. For each request, this document indicates the originator and the reason for the notification. Then, the analysis of the change and its impact on other functionalities is described and a list of the artifacts affected by the change is provided. The costs of implementing the change are also determined, and the dates of implementation and the status of the implementation are recorded according to the progress of work.

Artifact: Configuration audit findings (Configuration Audit Findings)
Role: Configuration manager
Description: A reference document for the analysis of the state of progress. It contains all missing and required artifacts and all incompletely tested or failed requests.
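As a rough illustration of the information that the change request artifact from Table 8 carries, the sketch below models such a record as a plain Java type; all names are invented for this example and do not correspond to any particular change-management tool used with RUP.

import java.time.LocalDate;
import java.util.List;

// Hypothetical change request record mirroring the content described in Table 8:
// originator, reason, impact analysis, affected artifacts, cost, date and status.
public record ChangeRequest(
        String id,                      // e.g. "CR-042"
        String originator,              // stakeholder who submitted the request
        String reason,                  // error, configuration change or new need
        String impactAnalysis,          // described impact on other functionalities
        List<String> affectedArtifacts, // artifacts that have to be updated
        double estimatedCost,
        LocalDate plannedImplementation,
        Status status) {

    public enum Status { SUBMITTED, ANALYSED, APPROVED, IMPLEMENTED, REJECTED }
}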

4.3.8 Environmental Discipline

The goal of the environmental discipline is to provide the software company with the appropriate processes, tools and methods for the developed system. This objective is achieved through [25]:
• selection and purchase of tools,
• tool settings and configuration,
• process configuration,
• improving and updating processes,
• provision of technical services supporting the processes.

The RUP methodology makes it possible to adapt the process to the individual expectations of the manufacturer and the project. Factors such as the current state of the organization or the access to supporting tools should be taken into account for the correct configuration of the methodology. On this basis, an outline of the software development case is prepared, which contains a list of the tools to be used during the system development process and templates of the basic artifacts used during this process. The artifacts of the environmental discipline are described in Table 9.


Table 9 Artifacts of environmental discipline

Artifact: Development infrastructure (Development Infrastructure)
Role: System administrator
Description: The development infrastructure includes the hardware and software, such as the computers and operating systems, that will run the system. This document also describes the hardware and software used to connect computers and individual users.

Artifact: Development organization assessment (Evaluation of the development organization)
Role: Process engineer
Description: This document covers the situation of the software development company. It contains information on the current state of the process, the tools used, the competencies of employees, the attitudes of people, existing customers, competitors, technical trends, problems and areas for improvement.

Artifact: Development process (Development Process)
Role: Process engineer
Description: This document specifies the scope of the process used to create the product. It defines the basic elements of the process, such as roles, activities, and artifacts. It also describes the details of the guidance provided, for example through descriptions of process elements, concepts and literature used for educational purposes. This document also defines the details of the adopted software lifecycle model and the information retrieval mechanisms, which allow team members to quickly find data.

4.3.9 Deployment Discipline

The goal of the deployment discipline is to deliver the created product to the client. This task is performed in several steps. The first stage is testing the delivered system in the client’s target environment. If these tests are successful and the product is to be distributed to a wide range of customers, the software is packaged and then handed over for distribution. The next stage of product deployment is the preparation of the user documentation and the delivery of appropriate training in order to acquaint future users with the capabilities and handling of the supplied software. The basic artifacts created in the deployment discipline are summarized in Table 10.


Table 10 Artifacts of the deployment discipline

Artifact: User support material (User Support Material)
Role: Deployment manager
Description: This document contains information intended for the future user of the system. This information covers the configuration and hardware requirements needed for the supplied software to function properly, as well as instructions that familiarize the user with the available features and capabilities of the product. This document also often describes how to install and update the product.

Artifact: Product (Product)
Role: Deployment manager
Description: This artifact includes not only the finished product, packaged or made available on a website, but also elements such as the inventory and the packaging illustrations that clearly identify the product.

4.4 Summary of Part 4

The documentation process of the whole project is a very important element of the RUP methodology. A significant part of the time and resources allocated to the project is devoted to the creation of subsequent artifacts and to their versioning, merging and updating. As a result of this approach to the software development process, the risk of project failure and of serious conceptual errors is significantly reduced. However, this is achieved at the cost of a significant increase in the effort devoted to the project, which results from the need to create, maintain and update a large number of documents. Based on the analysis of the area of application of the RUP methodology, it can be concluded that it pays off only for large projects in which a number of teams participate. Such projects most often require constant updating and the implementation of new solutions after the original version of the system has been put into use. In this context, the RUP methodology gives excellent results, because it not only allows product consistency to be maintained through detailed documentation and the specification of requirements and configurations, but also greatly facilitates the introduction of new team members.


5 Extreme Programming Methodology

The bureaucracy of traditional software development methodologies, the reluctance of employees to write multi-page documents, and the frequent failures of the created projects initiated the search for a new trend in the software development process. As a result, a new group of methodologies, known as agile methodologies, was developed. The main example of such a methodology is extreme programming. Extreme programming is an innovative concept of high-quality software development that eliminates unnecessary activities which do not bring the team closer to achieving the intended goal. This concept is fluid and constantly evolving, so some of its rules are replaced over time by better ones, created on the basis of the growing experience gathered in this field.

5.1 What Is XP?

XP (Extreme Programming) is a software development methodology that belongs to the family of agile methodologies. It was developed for small and medium-sized teams that create high-risk software, that is, software for which the user requirements are incomplete, changeable, or unclear to team members. The basic assumptions of this methodology were formulated by Kent Beck; since then, they have undergone many modifications before taking their current form [26]:
• XP is a methodology that assumes that only activities directly related to the development of software functions are performed.
• The XP methodology addresses known issues and limitations in software development. However, it does not cover issues such as archiving project documentation, project financing, the organization of human resources, marketing or advertising.
• The XP methodology can be used by teams of various sizes. The basic assumptions of XP remain unchanged regardless of the scale of the problem, but their implementation may vary depending on the needs of the team.
• The XP methodology adapts easily to changing or unclear design requirements.
Extreme programming covers the following areas [26]:
• a programming philosophy based on communication, feedback, simplicity, courage and respect,
• a set of complementary practices of proven utility,
• a set of complementary rules for implementing the XP assumptions in situations where no ready-made procedure is available,
• a development team whose members share the same values and follow similar practices.
Among other agile methodologies, the XP methodology is characterized by very short development cycles, thanks to which continuous feedback can be obtained. It is also characterized by a gradual project planning process, which allows a very general system development plan to be created at the beginning of the work and then gradually updated and adapted to the prevailing conditions and requirements. Based on this approach, it is also possible to flexibly and dynamically adjust the functionality in response to changing customer requirements. An important element of the XP methodology is the use of automated testing as a tool for the early detection of software errors. The methodology also radically minimizes the number of documents created during the entire software development process. Communication between team members is based mainly on the oral transmission of guidelines, information and assumptions. This requires constant communication between team members and the development of software development standards, so that the code and the structure of the project are understandable to all stakeholders.

5.2 XP Components

The XP methodology distinguishes three basic components:
• values—the goals that each team member should pursue,
• principles—a set of guidelines that allow the practices to be applied in accordance with the accepted values,
• practices—activities performed by team members in order to achieve a certain result.

5.2.1 Values

The basic value of the XP methodology is good communication within the team. This is because the methodology is based on the exchange of experiences, comments and ideas between all team members. Thanks to communication, it is possible to maintain the consistency of the created product without the need to produce time-consuming project documentation. The crucial importance of communication also stems from the fact that the problems caused by its lack are often difficult to detect and lead to a loss of trust between team members.

Simplicity is also an important value of the XP methodology. Its aim is to exclude complex solutions and to focus only on those that achieve the intended goals in the simplest possible way.

Feedback is another value of the XP methodology. It is based on continuous product improvement: the product changes, expands and develops during the whole development period. Instead of an immediate effort to obtain a finished product, the XP methodology relies on its continuous improvement through the analysis and observation of the created modules. Often only after several variants of a solution have been implemented is it possible to make the right decision about which one to choose.

Courage is the value responsible for the ability to make effective decisions in the face of threats. It can also manifest itself in other ways, for example in the willingness to act in anticipation of emerging problems, or in calmly waiting until the relevant data needed for a rational and responsible decision have been collected. This value is extremely valuable because it allows responsibilities to be shared between team members.

Mutual respect for the other team members is a very important and at the same time necessary value. Thanks to it, each employee feels like an integral part of a larger whole and at the same time is aware of their contribution and value in the team. This creates a good atmosphere and thus increases the efficiency and quality of work.

5.2.2 Principles

The basic principle of the XP methodology is humanity, according to which each member of the team should be treated primarily as a human being: their safety in the workplace, their possibility of self-realization and of improving their abilities should be ensured, and their needs as an individual should be understood.

Economics is another important principle supported by XP. The methodology supports two economic aspects, which allows software with measurable commercial value to be created. The first aspect is getting the software to customers as quickly as possible so that they can benefit from it. In XP, this is achieved through an incremental software development model, which allows the software to be deployed in the first stages of the project’s lifecycle and then developed further. The second aspect is the flexibility of the created software and the ability to adapt it to the changing needs of users. The XP methodology enables a dynamic change of requirements and thus, if necessary, an easy adaptation of the product to the needs of many customers. A key element is also the possibility of applying systems created with the XP methodology to a wide range of similar problems, which is achieved by emphasizing the universality of the code and of the applied technological and design solutions.

Another principle of the XP methodology concerns the mutual benefit of each activity for all team members. According to this principle, activities that could harm or slow down the work of team members should be avoided unless they are necessary for the further success of the project. An example of such a procedure is forcing detailed documentation of the code, which is unnecessary for its creators and slows down their work, even though it may be useful to future employees. On the other hand, according to this principle, measures should be taken that bring positive results to all involved, even though they require additional work. An example of such behavior is the creation of automated tests, which allow effective work on the code while facilitating its subsequent adaptation or refactoring, removing its excessive complexity and increasing its transparency for third parties [26].

Further basic principles in XP are self-similarity and improvement. They relate directly to the software development process. Self-similarity means reusing proven structures and behaviors in new contexts; the form of the XP methodology itself was developed on the basis of this principle. Proven processes are repeated from the very beginning of the project until its end, in different contexts and on different scales. The principle of improvement, in turn, requires striving for excellence throughout the software lifecycle. It is not about obtaining the perfect product right away, but about constantly developing, updating and expanding the product, the development process, the tools and the designs until the end result is achieved.

The flow principle in XP refers to the constant creation of usable and executable code. The methodology does not distinguish specific phases of software development, but assumes a continuous flow of activities. Thanks to this solution, the problems associated with the compilation, integration or merging of large modules, which occur in classical methodologies, are eliminated. XP focuses on the continuous development of working software and its gradual extension. For this purpose, the small-steps rule is applied, according to which, if a complicated activity can be divided into smaller stages and carried out gradually, this should be done. This not only allows continuous system consistency to be maintained, but also allows quick reaction to errors or changes in the project. This principle also has an economic basis, because in the event of a failure during the preparation of a module, the losses associated with discarding a single stage are much smaller than the losses resulting from the need to remove the entire component.

An important principle that reflects the character of XP is the constant effort to improve product quality. According to the assumptions of the extreme programming methodology, reducing the quality of the produced software in order to meet deadlines or reduce costs is an incorrect and unacceptable procedure. One of the main goals of this methodology is to always create optimal code of the highest quality, which means that despite the higher initial costs, the costs required for the subsequent implementation and maintenance of the system are significantly reduced.

5.2.3 Practices

Practices are a set of activities and procedures that are used daily by team members to implement the values of the extreme programming methodology. The basic practice of XP is to create a common workplace. The assumption is that the more time employees spend together, the more efficient their work is [26]. The purpose of this practice is to facilitate team communication by placing all team members in one room. Creating a transparent and friendly environment is also applied in the workplace. Its main premise is to arrange the space used by the team so that it reflects the state of the project while providing employees with a sense of security and peace. For this purpose, boards with information on completed and pending tasks, as well as clear graphs reflecting the progress of work, are often used. Another common practice in XP is to build self-sufficient teams. It consists in selecting team members so that their knowledge, specialization and experience enable the independent and complete execution of the assigned tasks. This practice allows team members to develop a sense of belonging, responsibility and self-worth, which results in a significant increase in their effectiveness.


An important practice in terms of employee performance is energetic work. In the XP methodology, this is achieved by reducing working hours and increasing the involvement of employees during them. According to the XP rules, overtime significantly reduces employee productivity and, as a result, does not contribute to improving the status of the project. Employees should therefore be mobilized to make a great effort within the set working hours and be rewarded for completing the assigned tasks in less than the expected time.

A fundamental practice of the XP methodology is pair programming. Its assumption is that the entire project code should be developed by pairs of two people working at one computer. This approach has many positive aspects. First of all, it enforces the concentration and continuous mobilization of employees. It also increases product quality, as most bugs and errors are identified and resolved at the implementation level. Working in pairs also increases the productivity of programmers, because all kinds of problems are solved much faster and the resulting solutions are more often optimal and effective. Figure 22 shows an example of a development team workplace organization tailored to the needs of pair programming.

Fig. 22 Exemplary organization of a programming team workplace in pairs [26]

Given that planning is a key element of the success of the project, the practice of creating scenarios has been adopted in the XP methodology. These are the elements of the planning process that define the individual pieces of functionality [26]. Scenarios describe individual functionalities from the user’s point of view and define various patterns of behavior and variants. They are most often written on individual sheets and hung in a visible place in the room so that everyone can read them. An example of a scenario is shown in Fig. 23.

Fig. 23 Sample description of a scenario on a sheet of paper

Another practice is the use of weekly cycles. A meeting is organized at the beginning of each week to discuss the progress made so far and to compare it with the results achieved in recent weeks. At this meeting, the scenarios to be implemented are also decided and divided into tasks assigned to individual teams. Such scheduling makes it possible to respond to changing conditions and to adjust the load in response to the received signals.

Quarterly cycles are a common practice in medium and large projects. Once a quarter, information is collected on the course of the weekly cycles and summaries are prepared of the project status, the team’s disposition and the progress of work. Quarterly meetings are also used to make long-term decisions on process improvements, personnel policy and the baseline scenarios to be implemented in the next cycle.

Prioritized work is another practice used in XP. It is implemented by assigning appropriate priorities to the assigned tasks. Thanks to this, the programmer knows which part of their work is the most important and must be done, and which should be done only if time allows.

Continuous integration is a practice that forces changes to be merged and tested as often as possible. According to the principles of the XP methodology, integration should take place several times a day. Programmers do not start new work while waiting for the integration results; when the integration is complete, the team returns to work or fixes the errors that caused the integration to fail.

Incremental design is one of the key practices of the XP methodology. It is realized through the daily analysis and adjustment of the scenarios and the extension of the system design. Thanks to this approach, the cost of modifying the developed software does not increase with the time that has elapsed since the beginning of the project.


5.3 Roles of Project Participants

A very important aspect of the XP methodology is appropriate staffing. The people involved in the project must be courageous, well educated and, above all, communicative. The ability to work in a team and to take the initiative are also extremely important for achieving the goals of the methodology [27]. In the XP methodology, unlike in RUP, the project participants are not strictly divided into roles. Each member of the team should strive to achieve the team’s goals and perform various tasks, as long as they are able to take responsibility for them and perform them in accordance with the adopted rules. However, as the team develops, the right balance between control and responsibility should be maintained. Each member of the team can propose changes to the project, but should be able to support their proposals with concrete actions [26].

In the extreme programming methodology, testers are most often responsible for defining and creating automated tests before the implementation. Their responsibilities also include informing the programmers about the methods of optimal and effective use of the created tests in practice. The tasks of the interaction designers include creating scenarios, developing ways of using the finished programs and components, and working with the client on the optimal formulation of the tasks to be performed in subsequent iterations [26]. The role of architects in teams using the XP methodology is to plan and implement refactoring operations, create system and stress tests, and implement scenarios. Architects taking on programming tasks are primarily focused on finding opportunities for larger changes that can bring great benefits to the team [26]. Project managers are responsible for contacts with the client, explaining and discussing key aspects of the system. Their task is also to analyze and evaluate the progress of the project. Product managers, in turn, have the task of prioritizing subsequent tasks, dividing the tasks among team members, and preparing the scenarios to be implemented in the current software development cycle. The tasks of the programmers include estimating the costs of the proposed scenarios, dividing the scenarios into tasks, and implementing the functions and tests.

5.4 Documentation in XP Methodology

Extreme programming represents a "new wave" among software development methodologies, often referred to as lightweight or agile methodologies. It is based primarily on communication between team members and on short project cycles. As a result, the project documentation process in the XP methodology is almost completely marginalized. The assumptions of the methodology defined by its author, Kent Beck, do not include any required documents; they are replaced by code, a whiteboard and information cards hung on the walls of the workplace. Figure 24 shows an example of scenarios arranged on a wall.


Fig. 24 Example of arranging scenarios on the wall [26]

The whole software development process is divided into small stages with tasks performed in each of them, and the client's requirements are defined dynamically on a weekly basis. This approach significantly reduces the cost of the project, which in classical methodologies was often generated by the need to create a large number of artifacts and the subsequent need to update and modify them. As a result, however, the XP methodology causes software maintenance problems, because the only artifacts created throughout the process are test cases and code. Since pure XP does not define any documents, several modifications were introduced in order to create basic project documentation. This documentation is intended not only to support the memory of team members, but also to enable effective management and maintenance of the product after its deployment at the customer's site. Clients also often request it as a guarantee that the work performed is consistent with their vision of the system. The modifications introduced in XP were also caused by difficulties in implementing some of the assumptions of this methodology.

5.4.1 Requirements Documentation

The pure XP methodology requires the client to be present in the team during the software development process. Such contact with the client makes it possible to resign from any documentation of user requirements in favor of loose sheets of paper, which only briefly describe the scenarios currently being implemented. In many situations, however, continuous communication with the client is impossible, and therefore some adjustments to the methodology were necessary. They consist of creating an additional element in the form of a requirements document. It contains a set of scenarios developed by the team, together with a record of all agreements between the developers and the client that were made at the stage of its creation. It also contains information about the cost of each scenario, the date of implementation, and the functionalities that may be affected by the change or extension. It is also common practice to include in this document the results of automated tests and comments on their course. Such a document not only enables the consolidation of findings and requirements defined by the client during the software development process, but is also very useful after finishing work on the product, in situations where the client has doubts about the correctness of a functionality [28].

5.4.2 Project Documentation

The XP methodology uses short design cycles that focus on independent scenarios. Therefore, it is not necessary to create design documents containing a vision of the entire system. If a small design of the functionality to be created is necessary, it is most often sketched as UML diagrams drawn on a whiteboard in the programmers' workplace. The disadvantage of such a solution is that after finishing the work on a given scenario, all information and architectural data related to it are lost. This is especially disadvantageous when new team members are introduced or when the created functionality is changed later [28, 29]. The solution to this problem is a modification of the XP methodology consisting in the systematic creation of general project documentation. This is usually a single document containing a brief description of the system and its most important elements, as well as a brief description of its architecture, supplemented by basic diagrams. Another element is the inclusion in the above document of a brief description of the scenarios that have been implemented, together with the UML diagrams created for them. Creating such documentation requires very little time. The documentation is created in parallel with the development of the system, and its main elements reflect and preserve what was created for the given scenarios and would otherwise remain only on perishable information carriers [30].

5.4.3 Code Documentation

The XP methodology is based on clear, transparent and understandable code. It is the only carrier of information about the implementation of scenarios, the structure of the system and the way the problem was solved by the programmer who created the given functionality. Therefore, this methodology places great emphasis on refactoring and code description. According to its assumptions, the code itself is the best documentation. However, third parties sometimes need additional information to understand how the code works and what certain elements are for [31].


Comments are the main carrier of information about the code being created and convey the creator's vision to others. Therefore, the XP methodology is often modified by adding special elements that facilitate and enforce commenting and description of the generated code. Thanks to this procedure, it is possible to use tools such as JavaDoc to generate complete and comprehensible documentation of the product implementation for the people involved in the project. An accurate description of the purpose of each class, object or method also makes the work of other team members more effective. It is also required if the customer wishes to purchase the product's source code from the software company.
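To illustrate the kind of code description meant here, the sketch below shows a small Java method documented with standard JavaDoc tags. The method, its purpose and the parameter names are purely illustrative assumptions made for this example and do not come from the project discussed in this chapter.

```java
/**
 * Calculates the gross value of a single invoice item.
 *
 * The JavaDoc comment records the author's intent, so the standard
 * javadoc tool can turn it into browsable API documentation for the
 * rest of the team and for a customer who buys the source code.
 *
 * @param netPrice net unit price of the item, in PLN
 * @param quantity number of ordered units; must be positive
 * @param vatRate  VAT rate expressed as a fraction, e.g. 0.23 for 23%
 * @return the gross value: netPrice * quantity * (1 + vatRate)
 * @throws IllegalArgumentException if quantity is not positive
 */
public static double grossValue(double netPrice, int quantity, double vatRate) {
    if (quantity <= 0) {
        throw new IllegalArgumentException("quantity must be positive");
    }
    return netPrice * quantity * (1.0 + vatRate);
}
```

Running the javadoc tool over sources documented in this way produces HTML pages that can be handed over together with the purchased source code.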

5.5 User Documentation

The creators of the XP methodology adopted the following assumptions regarding user documentation [28, 32]:
• Anyone who has created software knows that almost no one reads the manual. This is known from the technical support of the product, because users often ask basic questions. It is also known because most of us do not read manuals ourselves.
• Web applications do not have user manuals; some of them have a few help pages or only basic help information on each page.
• Boxed software is increasingly released on CDs with its documentation in electronic form. This solution works—it has been found that more people read such manuals or search them for a phrase.
• Extreme programming is all about creating working software. All other elements are excluded if they are not essential.
• Developed software should be easy to use, so that extensive instructions for its use do not have to be written.

The result of this approach to user documentation is the production of user manuals that are short, concise and informative.

5.6 Summary of Part 5

Extreme Programming is a methodology focused on fast, efficient and inexpensive software development. It is based on excellent team communication and a gradual, incremental development cycle of the system. In addition to many advantages, this approach also has certain disadvantages. These include, for example, the lack of project documentation, unclear responsibilities, difficulties in maintaining and extending the implemented software, and the lack of strictly defined requirements that the system should meet [33]. When analyzing the characteristics of the XP methodology, it seems clear that its effectiveness decreases as the project size increases. As a result, some XP practices are modified or supplemented. For medium-sized projects, modifications consisting in the introduction of several documents into the software development process are sufficient, but for very large projects the XP methodology is sometimes combined with selected elements of the RUP methodology. In this case, the documents turn out to be crucial, as they make it possible to maintain the consistency of the whole process, despite the obvious costs they generate.

6 Analysis and Development of the Form of Documentation

Each of the methodologies mentioned in the previous chapters represents a different approach to the project documentation process: from the formalized and very detailed documents promoted by the RUP methodology, through the large number of forms and natural-language descriptions used in the waterfall approach, to the loose sheets and whiteboards used in the XP methodology. The purpose of this chapter is to develop a universal design of the documentation process that would be suitable especially for medium-sized projects.

6.1 Balance Between Agility and Discipline

The methodologies described so far represent completely different approaches to the software development process and its documentation, but each of them is usable and works well in a different type of programming project. In the case of small projects, where the emphasis is mainly on minimizing costs and implementation time, the XP methodology is used. It assumes the performance of only those activities that are directly related to the development of software functionality—aspects of the software development process such as documentation, planning, project management, deployment and further product maintenance are reduced to the necessary minimum. The RUP methodology works better for complex projects implemented by large project teams. It allows the entire software development process to be accurately planned, documented and coordinated by dividing the project into appropriate phases and assigning roles responsible for their implementation, by the extensive and detailed artifacts created at each stage of the software lifecycle, and by a detailed description of the activities undertaken in each of them. However, the modern market requires more than the above methodologies can offer, especially for medium-sized projects. A limited budget and customer requirements that change during the software development process preclude the use of the RUP methodology. On the other hand, the need to document the basic life cycle processes, the sale of source code, maintenance of the product after its deployment and limited contact with the customer make it impossible to use pure XP. Therefore, it is necessary to create a new methodology, or to modify one of the existing methodologies with elements that allow such targeted and precise customer requirements regarding the system development process to be fulfilled [34]. As Barry Boehm and Richard Turner noted, "any thoughtful effort in a changing world requires agility and discipline" [35]. Following this maxim, a universal methodology should combine, as far as possible, the elements and assumptions of agile and classical methodologies. Later in this chapter, an original concept of a documentation process intended for medium-sized projects is presented. The XP methodology was used as its basis, as it is currently the most popular among software producers. However, this methodology in its pure form does not ensure an appropriate level of documentation and does not define the content of the most important documents. Therefore, it is necessary to introduce certain adjustments to increase the level of discipline in the project. The documentation developed later is aimed at eliminating the basic shortcomings of the XP methodology at the lowest possible cost, through a clever and reasonable selection of project documents. This is especially important for medium-sized projects created in accordance with XP principles, since pure XP does not define any documentation processes.

6.2 Project Planning and Management Documentation

Planning and management are an essential part of any programming project. They make it possible to create a vision of the entire process of creating a system and then to manage its progress. However, the documentation of these elements was practically completely omitted in the XP methodology. Due to the lack of artifacts, the scope of the entire project is unclear, which often causes conflicts with the client. Another disadvantage of this approach is the very general treatment of the estimated costs and duration of the entire project, which is often required by the customer. The XP methodology also assumes the definition of successive scenarios, which are implemented dynamically before each weekly or monthly cycle. This approach is insufficient for larger projects, however, because it does not provide an overall vision of the system development process that would allow the client to define the direction the process should take. For example, some customers may require that an application skeleton be developed first, while others prefer new features to be implemented gradually. The lack of any project documents also causes problems in creating a clear and substantive contract with an estimate of the required staff and equipment resources [36]. From the point of view of project management, a particular disadvantage of the XP methodology is the lack of relevant documentation describing the assumptions and work to be performed in the next phases, as well as the lack of artifacts recording the work process, its evaluation and conclusions. Such documentation is useful not only for the project team, but is often also required by the client. From the team's point of view, it is an element supporting human memory that clearly defines the scope of work and responsibilities and reflects the current state of the project. For the client, on the other hand, it is an element that reflects the degree of progress of the work, the timeliness of its execution and the activities that will be done in the next steps. This is especially important when the client cannot stay with the team and actively participate in its work. Such documentation is also useful during subsequent iterations, as it allows conclusions to be drawn regarding the accuracy of cost estimation, the involvement of individual employees and the assignment of tasks in subsequent phases. To minimize the problems described above, a set of relevant documents should be created. The basic and at the same time essential artifact is the software development plan. This document should be created before the start of the project and then approved by the customer and the supplier. This artifact should contain the following information:

• critical customer requirements for the system,
• the skeleton of the architecture,
• a list with a brief description of the technologies used,
• the programming environment used in the production of the system,
• the human and equipment resources necessary for the implementation of the project,
• deadlines for the implementation of the main modules,
• estimated costs of the implementation of individual modules,
• the expected date of delivery of the finished product to the customer,
• the estimated product deployment date.

A document template was developed on the basis of the following artifacts of the RUP methodology:

• deployment plan,
• software development plan,
• business case,
• order.

Other artifacts that are not included in the XP methodology and that are necessary for the project planning and management process are the Iteration Plan and the Iteration Assessment. The first consolidates the decisions taken at the meetings held before the start of the next cycle. As in the RUP methodology, it must contain information on the tasks to be performed in the next iteration, their completion dates, costs and priorities. To adapt it to the XP methodology, this document should also contain a list of all scenarios to be performed in the course of the iteration and the persons responsible for their implementation. The client should be involved in creating this artifact, because it is primarily up to the client to decide which scenarios should be executed and in what order. A document summarizing the iteration should be created at the meeting held after each completed cycle. It includes information on the progress of work, as well as conclusions arising from problems that occurred along the way. An important element is also the results of the test cycles for the functionality created within a given iteration. They make it possible to decide whether the module can be incorporated into the product or whether it must be returned for corrections [37].


6.3 Documentation of the Requirements Definition Process

According to the assumptions of the XP methodology, the requirements are defined by the client before each iteration in the form of scenarios. The advantage of this approach is the flexibility of the project, as only the near future is planned and the complete set of requirements is created over the whole software development period. Creating scenarios and constant contact with the client also eliminate the problem of misunderstood or incorrectly defined requirements. The XP methodology assumes that the created functionalities are not documented or specified in any way. The planning cards provide only a general description, estimated working hours and risks. All details regarding the operation of a functionality are agreed directly between the developer and the client. This allows the development of functionality that meets all customer expectations already in the implementation phase, even if the customer was not able to define consistent and rational requirements in the design phase. An important aspect of this approach is also a significant reduction in the costs of the entire project, which results from the lack of any documents, the resignation from the requirements determination phase, and the limitation of the analysis and function design processes. The approach to collecting and specifying customer requirements proposed by the creators of the XP methodology has, in addition to its undoubted advantages, many disadvantages that cannot be overlooked. One of the most significant shortcomings is the lack of a set of the requirements to be implemented, and of their documentation, at the time the project starts. This hinders the planning of such aspects of software development as project duration, hardware and personnel requirements, as well as the total costs and benefits generated during project implementation. The lack of requirements documentation makes it impossible to create a contract for the implementation of the system that is clear and unambiguous for both parties. It also causes serious problems in product deployment, maintenance and updating. The lack of description and analysis of requirements, from which the XP methodology resigns in favor of direct cooperation between the client and programmers, results in difficulties in determining the implementation time of a functionality, which generates problems with keeping to the schedule and contributes to inefficient management of time and resources in cases where different iterations, as independent scenarios, cover a common area of code. The effect of applying the pure assumptions of the XP methodology to the process of collecting and documenting requirements is a phenomenon in which the client constantly adjusts and increases the scope to which each scenario should be implemented. This is due to the fact that the client does not define a detailed description of the created functionalities that software developers could then analyze. The scenarios the client presents do not allow a detailed and rational estimation of the costs and scope of work, which causes misunderstandings and conflicts between the client and the manufacturer, and exposes the developers to delays and additional costs.


The main factor causing the problems described above is the lack of a defined process for documenting customer requirements in the XP methodology. Therefore, it is necessary to extend the set of extreme programming artifacts with additional elements. The basic and at the same time necessary documents that analysts and system architects should prepare in close cooperation with the client before starting the project are a glossary and a software requirements specification. The glossary defines the specific terms and expressions used in the domain covered by the project. This document enables accurate and unambiguous communication with the client, eliminating problems such as the use of specialized terms that might otherwise be misunderstood or misinterpreted by either the client or the software manufacturer. The RUP methodology defines a number of artifacts focused on the specification of requirements, their analysis and the definition of the process of their collection, formulation and implementation. However, due to the basic assumptions of agile methodologies, not all of these documents should be used in projects created using XP. When analyzing the characteristics of the process of formulating and implementing requirements in XP, it seems necessary to create a document containing all the requirements to be implemented, the estimated amount of work required to implement them, the total cost of the project, as well as the specific conditions and dependencies that must be met so that the functionality can be implemented. It is also important to explicitly define the priorities for each task and the initial order in which they are to be performed. The document containing the above information is the software requirements specification used in RUP. For the purposes of the XP methodology, the use cases characteristic of RUP were replaced in this document with requirement scenarios, and information on the process of collecting and formulating user needs, which is defined by the assumptions of agile methodologies, was removed. This document should be created before the implementation work begins and, together with the software development plan, should form the basis of the contract.

6.4 Documentation of Analysis and Implementation

A characteristic feature of the XP methodology is the replacement of the design and analysis phases, which are key for classical methodologies, with close cooperation and good communication between team members. An approach based on conversations, the exchange of knowledge and the joint solution of complex problems makes it difficult and time-consuming to create any documentation of the system design and analysis process. This is also due to the very flexible approach to defining requirements: the tasks performed in the current phase are determined dynamically before each iteration, and the scenarios created on sheets of paper hung in the workplace contain only basic information that serves more as a guide for programmers than as a specification of customer requirements [38]. A serious disadvantage of such a solution is not only the lack of any analysis and design documentation, but also the lack of final and unambiguous requirements, which are usually specified only in the implementation phase through constant contact between the developers and the customer. As a result, projects implemented using the XP methodology are very expensive to maintain, update and extend, because each time they require analyzing from scratch how a given functionality, module or configuration works. The problems outlined above would be solved by introducing scenario specifications into the set of XP artifacts. This document would systematize and detail the client's requirements and describe the most important decisions taken during the requirements analysis at the project meetings. It would also define the details of implemented and currently implemented scenarios, such as the title, content, estimated implementation time, priority, status, interface design, implementation start and end dates, limitations, and assumptions. The data in the document should be entered at the beginning of each iteration—after the client selects, from the specification of all system requirements, the scenarios to be implemented in the next phase. Each of the scenarios should be thoroughly discussed, verified and detailed. Figure 25 shows a sample design of the invoice list interface of the electronic invoice payment module, based on customer requirements. Only after the completion of these activities can the scenario be entered in its specification, which would additionally state its estimated costs, the persons responsible for implementation, as well as the limitations and assumptions that should be taken into account during implementation. A scenario defined in this way is unambiguous and contains all the information necessary for its implementation. It is also the basis for verifying the correctness of the implemented functionality.

Fig. 25 Sample user interface design


Another document that should be part of the XP methodology artifact set is the implementation model. Its purpose would be to document the changes made to the system. Such a solution would minimize the costs and risks associated with maintaining and updating the created product while increasing control over the created code and system structure [39]. This document should be based on artifacts of the RUP methodology, such as:

• analysis model,
• data model,
• design model,
• implementation model.

The implementation model will contain information about the scenario being created that is useful for other team members, such as a general description of the implemented functionality, the mechanisms and libraries used, and a specification of added or modified system elements. This document should also contain information on the required data, its format, the configuration, and the hardware and technical requirements necessary for the proper functioning of the system. If diagrams, charts or other auxiliary materials were created while working on the scenario, they should also be attached to the document. Figure 26 shows an example diagram of invoice status changes in the new electronic invoice payment module, which should be included in the implementation model. Including such a diagram makes it easier for programmers who later modify the module to understand how the newly created functionality works without having to analyze the code. Creating an implementation model requires relatively little effort from programmers, because the implemented scenarios are not extensive and the changes most often concern only part of the system. At the same time, all modifications introduced as part of a given functionality are obvious to their authors and can easily be described at the time of implementation. Creating an implementation model brings significant benefits to the entire team, as it allows its members to track current changes in the system and possibly identify errors early, before handing the functionality over to the client. This document also makes it possible to quickly introduce new team members during the project. Another argument in favor of the implementation model is a significant reduction in system maintenance and upgrade costs, because together with the code documentation it forms a set of important information on how all product functionalities are implemented [40].


Fig. 26 An example diagram of changing invoice statuses in the electronic invoice payment module

6.5 Test Documentation and Product Implementation

The XP methodology does not define any artifacts related to product testing and deployment. However, the lack of testing documentation does not have a significant impact on the product deployment process, as issue tracking systems, such as JIRA, have become very popular today. Product defect reporting systems allow the customer to report defects, forward them to the appropriate persons on the manufacturer's side, monitor the progress of work and track the implemented changes. They also have many other features, such as grouping defects and extensions into thematic groups and automatically generating reports and graphs presenting the progress of work at any time. These systems successfully replace the traditional artifacts used in classical methodologies such as RUP. In addition to saving the time required to create extensive documentation, they also allow the current status of the project to be monitored and any incidents and problems that occur during system testing and deployment to be responded to. They also save a complete history of all requests, which helps to improve work on similar issues in the future. These systems are also equipped with a module that allows them to be linked to a code repository, so that the programmer can at any time check which parts of the system were changed while correcting a reported error. A significant problem in the XP methodology is the lack of any deployment documentation, which often causes problems in the maintenance of the finished system, especially if there has been significant staff turnover. A missing or incorrect description of the hardware, software and external components that affect the proper operation of the system often causes problems when updating and modifying it. This is especially important when programmers do not have direct access to the environment in which the system operates—for example, in order to protect customers' personal data. In this case, the lack of information about the exact configuration of the system often causes a significant increase in the costs and risks of implementing system extensions. The solution to this problem is to create a suitable document containing the complete hardware configuration and the requirements for software and external systems. A system implementation specification template is provided in Annex 8.

6.6 Summary of Part 6

The XP methodology is extremely popular and widely used in current programming projects. Its flexibility, dynamism and approach based on creating useful code without unnecessary documentation work well in most small projects. It enables perfect adaptation of the created software to the needs of the client and a significant reduction of costs, which makes it extremely competitive in relation to classic software development methodologies [41]. However, the limitations resulting from the lack of any documentation mean that the XP methodology is not applicable to the implementation of medium and large projects. Such projects often require an initial determination of customer requirements, analysis and cost estimation for the entire project. The clients ordering such projects usually have much higher expectations of the created system. They often require at least illustrative documents representing the current state of the project, a budget statement, a description of the functions created and the technologies used. A common requirement is also the servicing and extension of the product after its deployment at the customer's site, which is extremely inefficient and expensive for systems built in accordance with pure XP. The solution to the above problems can be the creation of a suitable set of artifacts extending the XP methodology documentation. Properly selected and prepared documents will not significantly increase the cost of the process and will allow wider use of the discussed methodology—in larger projects and more advanced systems. The documents discussed in this chapter have been selected and developed to provide the information needed in the software development process, such as the system plan, the scope of requirements and the way functionalities are implemented, in a way that interferes as little as possible with the basic principles of the XP methodology. The main goal in creating the discussed documentation process was to achieve, with the lowest possible costs and effort, a level of project documentation that would not only allow reasonable and consistent management of a long-term project, but would also allow its subsequent expansion and maintenance. This was achieved by selecting the most important documentation elements and assembling them into a comprehensive whole enabling the documentation of the key phases of the software life cycle [42].

7 Summary of the Work

Along with the rapid development of the IT market and the growing demand for increasingly complex systems tailored to the needs of a particular customer, software developers noticed the need to create a set of rules and principles that standardize and organize project implementation. As a result, the first software development methodologies were created, which defined not only the process itself, but also the notation and the methods of building, managing and updating the artifacts created during project implementation. Subsequent methodologies, developed in response to the ever-changing needs of the market, approached the project documentation process differently: starting with very general and informal documents, based mainly on natural language and on the forms characteristic of classical methodologies implementing the waterfall model of the software life cycle, through the artifacts of the RUP methodology, strictly specified in terms of content, the people responsible for their creation and the disciplines in which they arise, and ending with the verbal arrangements and constant contact with the client that eliminate the need to create any documentation in extreme programming. The analysis of the above methodologies and the forms of the documentation process defined by them shows that each methodology is suitable for a certain group of projects. However, there is no universal methodology defining a project documentation process that would work well both for large projects, which are characterized by the need to clearly formulate and store information about the project being created and the work carried out during its implementation, and for small projects, in which the most important features are flexibility and the limitation of the costs generated by labor-intensive documentation. The form of the documentation process developed in this article is based on the basic assumptions of the XP methodology, as it is one of the most popular methodologies and is often used in small programming projects. However, the very small set of artifacts it defines, which usually does not meet the requirements of either software developers or clients, makes it impossible to apply this methodology to more advanced projects. Therefore, it was necessary to analyze the most important elements of classical methodologies, such as RUP, and to extract from them the part of the documentation that complements the XP methodology. An important aspect of this process was the selection and definition of the content of the artifacts in such a way that they interfere as little as possible with the basic assumptions of the XP methodology and do not represent a great burden for the people involved in the project. By examining the basic problems in projects developed in accordance with the assumptions of the XP methodology, it has been shown that they can be eliminated relatively easily. Thanks to adequate documentation of the requirements, the course of the iterations and the tasks performed, the XP methodology becomes more universal—although this adversely affects the flexibility and overall cost of the project—which allows its use in a much wider group of projects.

References

1. Kroll, P., Kruchten, P.: Rational Unified Process od strony praktycznej. WNT (2007)
2. http://www.bryk.pl/teksty/studia/pozostałe/informatyka
3. Rychlicki-Kicior, K.: Java EE 6. Programowanie aplikacji WWW. Helion, Gliwice (2010)
4. Kaczor, S., Kryvinska, N.: It is all about services—fundamentals, drivers, and business models. Soc. Serv. Sci. J. Serv. Sci. Res. 5(2), 125–154 (2013)
5. Cadle, J., Yeates, D.: Zarządzanie procesem tworzenia systemów informacyjnych. WNT (2004)
6. Kryvinska, N.: Building consistent formal specification for the service enterprise agility foundation. Soc. Serv. Sci. J. Serv. Sci. Res. 4(2), 235–269 (2012)
7. Hemrajani, A.: Java. Tworzenie aplikacji sieciowych za pomocą Springa, Hibernate i Eclipse. Helion, Gliwice (2007)
8. Gregus, M., Kryvinska, N.: Service Orientation of Enterprises—Aspects, Dimensions, Technologies. Comenius University in Bratislava (2015). ISBN 9788022339780
9. http://pjwstk.wafel.com/byt/BYT.html
10. Liderman, K., Arciuch, A.: Projektowanie systemów komputerowych. BEL Studio Sp. z o.o. (2001)
11. Kryvinska, N., Gregus, M.: SOA and its Business Value in Requirements, Features, Practices and Methodologies. Comenius University in Bratislava (2014). ISBN 9788022337649
12. http://www.nurt.pl
13. http://www.ipipan.waw.pl/~subieta/artykuly/JezykUML.doc
14. York, R.: Beginning CSS: Cascading Style Sheets for Web Design. Wiley Publishing Inc., Indianapolis, Indiana (2005)
15. http://www-01.ibm.com/software/rational/uml
16. Poniszewska-Marańda, A., Majchrzycka, A.: Access control approach in development of mobile applications. In: Younas, M., et al. (eds.) Mobile Web and Intelligent Information Systems, MobiWIS 2016, LNCS 9847, pp. 149–162. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-319-44215-0_12
17. Poniszewska-Marańda, A.: Security constraints in access control of information system using UML language. In: Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE-2006) (2006)
18. Poniszewska-Marańda, A.: Access control coherence of information systems based on security constraints. In: SafeComp 2006: 25th International Conference on Computer Safety, Security and Reliability, LNCS 4166, pp. 412–425. Springer, Heidelberg (2006)
19. Schildt, H.: Java. Kompendium programisty. Helion, Gliwice (2005)
20. Stępień, K., Poniszewska-Marańda, A.: Towards the security measures of the vehicular ad-hoc networks. In: Skulimowski, A.M.J., et al. (eds.) Internet of Vehicles. Technologies and Services Towards Smart City, IOV 2018, LNCS 11253, pp. 233–248. Springer, Heidelberg (2018). https://doi.org/10.1007/978-3-030-05081-8_17
21. Poniszewska-Marańda, A., Rutkowska, R.: Access control approach in public software as a service cloud. In: Zamojski, W., et al. (eds.) Theory and Engineering of Complex Systems and Dependability, Advances in Intelligent and Soft Computing, vol. 365, pp. 381–390. Springer, Heidelberg (2015)
22. Molnár, E., Molnár, R., Kryvinska, N., Greguš, M.: Web Intelligence in practice. Soc. Serv. Sci. J. Serv. Sci. Res. 6(1), 149–172 (2014)
23. Poniszewska-Marańda, A.: Wykłady z przedmiotu „Inżynieria Oprogramowania II". Instytut Informatyki, Łódź (2009)
24. Andrew, R.: The CSS Anthology: 101 Essential Tips, Tricks & Hacks. SitePoint Pty. Ltd. (2004)
25. Kruchten, P.: Rational Unified Process od strony teoretycznej. WNT (2007)
26. Beck, K., Andres, C.: Wydajne programowanie. Extreme Programming. MIKOM (2005)
27. Hnatkowska, B., Huzar, Z.: Inżynieria oprogramowania—metody wytwarzania i wybrane zastosowania. PWN (2008)
28. http://xprogramming.com/xpmag/expDocumentationInXp
29. Huzar, Z.: Wprowadzenie do języka UML. Materiały z konferencji „Systemy czasu rzeczywistego", Zakopane (1999)
30. Liderman, K.: Formalizacja procesu pozyskiwania informacji dla potrzeb specyfikacji wymagań na projektowany system. Zeszyt 9, WAT, Warszawa (1998)
31. Patkowski, A.E.: Dokumentowanie procesu projektowania. Biuletyn IAiR, str. 57–70, WAT, Warszawa (1998)
32. Yourdon, E.: Marsz ku klęsce. Poradnik dla projektantów systemów. WNT, Warszawa (2000)
33. Brooks, F., Jr.: Eseje o inżynierii oprogramowania. WNT, Warszawa (2000)
34. Jacobson, I., Booch, G.: The Unified Software Development Process. Addison-Wesley, Boston (1999)
35. Boehm, B., Turner, R.: Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley, Boston (2004)
36. Thomas, D., Hunt, A.: The Pragmatic Programmer. Addison-Wesley, Boston (1999)
37. Koszlajda, A.: Zarządzanie projektami IT. Przewodnik po metodykach. Helion (2008)
38. Wrycza, S., Marcinkowski, B.: Język UML 2.0 w modelowaniu systemów informatycznych. Helion (2006)
39. Shore, J., Warden, S.: Agile Development. Filozofia programowania zwinnego. Helion (2008)
40. Duckett, J.: Accessible XHTML and CSS Web Sites: Problem–Design–Solution. Wiley Publishing Inc., Indianapolis, Indiana (2005)
41. Babin, L.: Beginning Ajax with PHP: From Novice to Professional. Apress, New York (2007)
42. http://kis.pwszchelm.pl/publikacje/II/Powaga.pdf

E-Commerce Platform Using SQLite

Michał Kieszek, Vincent Karovič, and Iryna Ivanochko

Abstract The presented work describes the process of developing online store software and an administration panel for managing it, based on an SQLite database. It lists the technologies and tools that were used to create the online store application. In addition, the performance of the database used was compared with that of another popular database management system. Their advantages and disadvantages are presented, as well as example stores implemented with their help. Part of the work is the technical documentation of the created project and the administration panel of the online store. The documentation contains a general description of the application, which explains its structure. At the end of the work, the system administrator documentation and the user documentation of the store are presented.

Keywords Online store · Software development · Administration panel · Electronic commerce

1 Introduction

Trade has accompanied humanity since time immemorial. People found that they often wanted to own more different types of goods than they could acquire or produce themselves. For this reason, barter gained more and more popularity. Marketing existed even then, but over the years its form has changed. Nowadays, money is very valuable because almost everything can be bought for it. Sellers want to sell the goods and products they offer at the highest possible price and by all possible means. They use all available media to reach the widest possible group of people.

M. Kieszek
Lodz University of Technology, Lodz, Poland

V. Karovič (B) · I. Ivanochko
Faculty of Management, Comenius University in Bratislava, Bratislava, Slovakia
e-mail: [email protected]


In recent years, the Internet has become increasingly popular. It is used at home and at work, and access to it is becoming easier and cheaper. Due to the great popularity of this medium, retailers increasingly see it as an opportunity to reach a very wide audience. They can offer their services and goods almost around the clock. In addition, the distance of the seller from the customer's place of residence ceases to be a limitation. This is facilitated by courier companies, which are able to deliver the ordered goods in a very short time.

The number of newly opened online stores is growing very fast. They are very diverse and try to attract customers in different ways. However, it should be borne in mind that comfort of use and visual appearance are very important for a potential customer, as they are the seller's showcase. Simplicity and intuitive design are also very important. The store should be intuitive and easy to navigate, because if it is to reach a large number of customers, it must be remembered that not everyone is technically proficient. The speed of operation is also a very important issue. It is influenced not only by the technology in which the store itself is written, but also by the database it uses.

The aim of this work is to create online store software and an administration panel for managing it, based on an SQLite database. In addition, the study examines whether the selected database is a good solution and compares it with another popular database server.

The second chapter presents the concept of e-commerce and its classification, and presents and discusses its most important applications. In addition, the definition of m-commerce, which has been developing dynamically recently, is explained. The third chapter presents the most important information about e-commerce in Poland: the most important methods of store promotion used to attract new customers, information on the most important payment methods in online stores, as well as information on the software used in stores. The fourth chapter presents the most important types of online store software, i.e. platforms offering ready-made online stores and stores built on commercial and Open Source software. Their advantages and disadvantages are presented, as well as example stores implemented with their help. The fifth chapter describes the technologies and tools that were used to create the online store application. In addition, the performance of the database used was tested against another popular database management system. The sixth chapter contains the technical documentation of the created project and the administration panel of the online store. The documentation contains a general description of the application, which explains its structure. The seventh chapter presents the system administrator's documentation and the user documentation of the store.
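As a rough illustration of how such a database comparison can be carried out, the sketch below times a batch of inserts and a simple query against SQLite through JDBC (using the open-source sqlite-jdbc driver). The file name, table layout and row count are assumptions made only for this example; the same code can be pointed at another database server essentially by changing the JDBC URL and driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbBenchmark {

    public static void main(String[] args) throws Exception {
        // Illustrative database file; only the URL changes for another DBMS.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:shop.db")) {

            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS product"
                        + " (id INTEGER PRIMARY KEY, name TEXT, price REAL)");
            }

            // Time a batch of inserts executed in a single transaction.
            conn.setAutoCommit(false);
            long start = System.nanoTime();
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO product(name, price) VALUES (?, ?)")) {
                for (int i = 0; i < 10_000; i++) {          // illustrative row count
                    ps.setString(1, "product-" + i);
                    ps.setDouble(2, i * 0.01);
                    ps.executeUpdate();
                }
            }
            conn.commit();
            System.out.printf("insert time: %d ms%n",
                    (System.nanoTime() - start) / 1_000_000);

            // Time a simple aggregate query.
            start = System.nanoTime();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT COUNT(*) FROM product WHERE price > 50")) {
                rs.next();
                System.out.printf("query time: %d ms, matching rows: %d%n",
                        (System.nanoTime() - start) / 1_000_000, rs.getInt(1));
            }
        }
    }
}
```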


2 E-Commerce

The term e-commerce means electronic commerce, that is, the promotion and sale of goods and services over the Internet. E-commerce consists of four main business processes: promotion and marketing, ordering, payment and delivery. Only non-digital goods must be delivered in the traditional way. Many services, such as setting up a free e-mail account or financial services (opening a bank account, buying shares), are provided only electronically [1, 2]. The term e-commerce is often confused with e-business. The concept of e-business covers all business activities on the Internet (including, for example, production, inventory management, risk management, finance, data exchange within the company, and the exchange of information between manufacturers, distributors and recipients of products and services). E-commerce, on the other hand, primarily covers the external processes of the company, such as marketing, sales, taking orders and contact with the client. E-commerce is therefore only a subset of e-business [1].

2.1 History

The concept of e-commerce was first introduced more than 40 years ago. Initially, it consisted of electronic document (data) interchange (EDI) and electronic funds transfers (EFT): companies placed orders and sent invoices electronically. In 1979, the English inventor Michael Aldrich invented online shopping, and just two years later Thomson Holidays (an English travel agency) introduced the possibility of booking tours online. In 1982, France Telecom (a French telecommunications operator) introduced the possibility of ordering its services online. A breakthrough in e-commerce was the creation of the first web browser by Tim Berners-Lee in 1990 [2]. In 1995, two of the most popular e-commerce sites in the world appeared: amazon.com and ebay.com. The first was initially an online bookstore selling books over the Internet. Due to the growing popularity of this website, its range constantly expanded, first with films and music recordings, then with electronic devices, furniture and even food. Amazon currently has offices in Canada, England, France, Germany, Japan and even China, among others, although goods can also be ordered from other countries. Ebay.com originally focused on arranging sales between users, acting as a platform offering online auctions. It could be used by individuals and companies, thanks to which it quickly became popular. It currently has more than 247 million registered users worldwide. Both of these companies have been very successful and are still operating [3, 4]. The first Polish online store was totu.com. It was founded on July 16, 1997 in Poznan and sold food, chemicals and cosmetics over the Internet. It gained popularity because the store itself delivered the ordered products within 2 h, originally in four cities in Poland. Over time, totu.com's offering expanded to include other products, such as home appliances and electronics, as in the case of amazon.com [5]. The first auction site operating in Poland is allegro.pl, founded in 1999. Its creators were Arjan Bakker and the programmer Tomasz Dudziak. From the beginning, the platform focused on the sale of goods and services through auctions and at fixed prices—the so-called "buy now". In 2000, the team consisted of 8 people, while the company currently employs almost 300 people. Allegro was the first in Poland to introduce a loyalty program and an online payment system. Allegro.pl is currently the largest trading platform in Poland [6].

2.2 Types of E-Commerce

There are four main types of e-commerce: B2B, B2C, C2C and C2B. B2B (business to business) is a relationship between two companies, which usually consists of the wholesale of goods and services. In addition to trade between two companies, this category also includes sales between branches of the same company. Many companies have special websites to which only their branches and the companies that work with them have access [1, 2]. B2C (business to consumer) is the retail sale of services and goods by businesses to private individuals. An example of such a transaction is retail sale in an online store. The form of electronic commerce between two private persons is C2C (consumer to consumer). Most often, transactions are concluded through auction sites (such as Allegro.pl), which make it easier for the seller to display offers and find customers. Such websites make it possible to buy and sell products safely, because the conclusion of the transaction results in a purchase and sale agreement, which is regulated by law. Such an agreement obliges the seller to transfer ownership of the product to the buyer, who is obliged to pay for and collect the auctioned item. The least popular category of e-commerce is C2B (consumer to business), in which a private person submits a purchase offer to many sellers or manufacturers, stating a description of the product or service sought and the maximum price he or she can pay. An example of such a website in Poland is Ofertaia.pl, where the client can search for suppliers and choose the most advantageous offer [1, 2].

2.3 Examples of Use

E-commerce websites are specially designed websites that allow goods and services to be sold or bought over the Internet. The most popular applications are online stores and auction systems; this is where most Internet users make transactions. Another example are reservation systems (for example, a ticket reservation system for sporting events). E-commerce is not just the sale of goods, and therefore it also includes electronic banking. With online access to your bank account, you can make transfers, deposit funds or even take out a loan.

2.3.1 Online Stores

Online shopping is the most popular use of e-commerce. It allows a company to promote itself, its products and its services, and to sell these products over the Internet. Customers can create an account and place orders. An example of an online store is shown in Fig. 1 [7]. Every online store should have subpages with the offer, i.e. the products or services that can be purchased in it. Products in the store should be accurately described and have a photograph. If there are many articles, they should be divided into categories, grouped by type for easier navigation and greater clarity. The online store must have a basket in which customers put the goods they want to order. It should contain a list of the selected products with their prices; shipping costs are often also shown. After adding the products he wants to buy to the cart, the customer should provide his data in the appropriate form so that the order can be processed. In addition, he should be able to create an account so that he does not have to re-enter the same data for future orders [7].

Fig. 1 Sample online store—www.elektro-ogrod.pl

The most common online stores consist of modules such as the main menu, the list of categories, the cart, promotions, the latest and most frequently purchased products, and contact details. Their task is to make it easier for the customer to navigate the store. An important feature is the search tool, thanks to which a product can easily be found by entering its name, without browsing the categories. It is important that the customer easily finds information about the date and cost of delivery, payment options, returns, warranties, and who owns the store. The above concerns the part of the store that is available to the customer (frontend), but each store must also have an administration panel (backend), which is available only to the administrator. It is used to manage the store and its database. The administrator has the ability to manage products, categories, customers and orders. The panel should also make it possible to manage employees and their tasks in the system. In addition, store subpages, such as information pages, regulations, contact information and shipping costs, can be edited there.
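To show how such store modules can map onto an SQLite database, the sketch below creates a minimal set of tables for categories, products, customers and orders through JDBC. The table and column names are assumptions made for this illustration only and are not taken from the application documented later in this chapter.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateShopSchema {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:shop.db");
             Statement st = conn.createStatement()) {

            // SQLite enforces foreign keys only when this pragma is set.
            st.execute("PRAGMA foreign_keys = ON");

            st.execute("CREATE TABLE IF NOT EXISTS category ("
                    + " id   INTEGER PRIMARY KEY,"
                    + " name TEXT NOT NULL)");

            st.execute("CREATE TABLE IF NOT EXISTS product ("
                    + " id          INTEGER PRIMARY KEY,"
                    + " category_id INTEGER NOT NULL REFERENCES category(id),"
                    + " name        TEXT NOT NULL,"
                    + " description TEXT,"
                    + " price       REAL NOT NULL)");

            st.execute("CREATE TABLE IF NOT EXISTS customer ("
                    + " id      INTEGER PRIMARY KEY,"
                    + " email   TEXT NOT NULL UNIQUE,"
                    + " name    TEXT NOT NULL,"
                    + " address TEXT)");

            // 'order' is a reserved SQL word, hence the shop_order name.
            st.execute("CREATE TABLE IF NOT EXISTS shop_order ("
                    + " id          INTEGER PRIMARY KEY,"
                    + " customer_id INTEGER NOT NULL REFERENCES customer(id),"
                    + " created_at  TEXT DEFAULT CURRENT_TIMESTAMP,"
                    + " status      TEXT NOT NULL DEFAULT 'new')");

            st.execute("CREATE TABLE IF NOT EXISTS order_item ("
                    + " order_id   INTEGER NOT NULL REFERENCES shop_order(id),"
                    + " product_id INTEGER NOT NULL REFERENCES product(id),"
                    + " quantity   INTEGER NOT NULL,"
                    + " PRIMARY KEY (order_id, product_id))");
        }
    }
}
```

An administration panel of the kind described above would then operate on the same tables, for example listing the orders of a given customer or changing an order's status.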

2.3.2 Auction Platforms

Auction sites are trading platforms that mediate transactions between buyers and sellers. Sellers can list items there in two ways: by auction or at a fixed price (the so-called "Buy Now"). In an auction, the winner is the person who has placed the highest bid for the item before the auction ends. It is also possible to set a minimum price below which the seller is not obliged to sell the item. If the seller sets a fixed price, the customer simply pays that price for the goods. It is also possible to run a listing that combines bidding with a fixed price: if the customer does not want to wait for the end of the auction, the item can be bought immediately [6, 7]. The largest Polish auction site is www.allegro.pl, one of whose many subpages is shown in Fig. 2. As in the online store described above, products are listed here in the corresponding categories for easier navigation. In addition, the auction site offers many options for searching and sorting products, such as by price, by location and, within a category, by other item parameters. To learn more about an item, the user goes to its description page. Auction sites are only intermediaries in the transaction, but they make it possible to find out whether the seller treats customers seriously. To this end, a system of comments has been set up: after each completed transaction, the buyer and the seller leave each other a brief rating with a justification. It can be positive, negative or neutral. This allows other users to see how many transactions the seller has made and how well they went. The Allegro.pl auction platform charges the seller a commission for each completed transaction and for listing an item. Buyers use the website at no cost. Thanks to such advantageous conditions, more and more potential customers are looking for articles on this website. More consumers lead to better service quality and lower prices due to competition and a growing number of retailers.


Fig. 2 Website subpage www.allegro.pl

Allegro.pl offers a Buyer Protection Program, thanks to which buyers can feel safe when shopping. Under this program, if the consumer does not receive the ordered goods or the item is faulty, the money is refunded. There is also a special section on Allegro.pl in which users form a website community. They exchange their trading experiences there and have the opportunity to share their ideas and suggestions regarding the website. This allows the site's owners to improve it by introducing new options and solutions. There are also tips that are useful for beginning sellers, on the basis of which they are able to prepare, for example, auction templates [6].

2.4 M-Store

Recently, m-commerce has become increasingly popular. It is the part of e-commerce in which mobile phones and other mobile devices with Internet access play the main role. A website specially prepared for mobile phones can be a good way to complement the offer of an online store. Its interface must be adapted to small screens (usually at a lower resolution), and navigation should be planned so that the page can be conveniently browsed using the keyboard or a stylus on touch-screen devices. Because viewing many thousands of products on a small screen can be tedious and difficult, it is a good idea to show the best-selling or cheapest products. The largest Polish price comparison websites already have mobile versions of their sites. This allows customers to compare the prices in a stationary store or supermarket with the prices of the same products offered by online stores.


In this case, stores offering applications and games for phones, or stores selling music files for legal download, would also work very well. M-commerce has a chance to become even more popular if mobile operators introduce support for mobile payments [2, 8]. The chapter above explained the concept of e-commerce and listed the differences between e-business and e-commerce. Furthermore, the history of e-commerce and its most important types were presented. The chapter also discussed its two most common applications: online stores and auction platforms.

3 Electronic Commerce in Poland

Due to the growing number of Internet users and ever easier access to the Internet, e-commerce is becoming increasingly popular. According to a report by the Central Statistical Office, in 2005 only 30% of households in Poland had Internet access. Within four years their number doubled, reaching 59% in 2009. The report also shows that in 2005 only 7% of Poles aged 16–74 declared shopping online; by 2009 this share had increased to 23% [9]. In order to attract new customers, companies create websites on which they present their offer and look for buyers of goods and services through auction websites or online stores. Due to growing competition, companies try to lower their prices in order to make their offer more attractive to buyers, so goods are usually sold on the web at a lower price than in stationary shops. According to estimates by the Direct Marketing Association (SMB), the turnover of the Polish e-commerce market (B2C and C2C) increased by 22% in 2009 compared to the previous year, reaching PLN 13.43 billion [10]. Figure 3 shows the value of the Polish e-commerce market according to the Direct Marketing Association in 2001–2009, broken down by turnover in online stores and on auction sites.

Fig. 3 Value of the Polish e-commerce market according to SMB (PLN billion)


As can be seen, the value of sales on auction platforms significantly exceeded sales in online stores throughout the period under review. For the market as a whole, the highest growth rates were recorded in 2002 and 2004, reaching 207% and 158% respectively. In online stores, the largest increases were also recorded in 2002 and 2004, at 217% and 199% respectively, while for auction platforms the largest changes took place between 2001 and 2002 (an increase of 200%) and between 2002 and 2003 (an increase of 128%). In 2008 the growth dynamics declined (36.4%), probably due to market stabilisation and the economic slowdown [10].

According to a survey of Polish online stores and consumers carried out by the sklep24.pl website, the industries most frequently represented on the Polish Internet are "home and garden" (18.17%) and "gifts and accessories" (10.7%). This is because these industries offer a very wide range of products which are not expensive, usually costing up to several tens of zlotys. Another reason is that such products do not need to be tested, they rarely break, and to buy them it is usually enough to read their description and look at the photos. Interesting products or gift ideas are often not available near home; people who live in small towns far from larger cities face a similar problem, so it is more advantageous for them to shop online for a wider choice or to save travel time. The smallest groups are stores that specialise in a given, often narrow, industry. Delicacies account for only 2.67%, because people are concerned about the quality of the food, fruit and vegetables offered by online retailers and cannot compare the quality and freshness of the goods before buying. In the case of the "books and multimedia" category, which reached 5.03%, the likely reason is that when buying a book it is not possible to browse it to see whether it is interesting. The situation is somewhat different in the case of "auto and moto", whose share in the total number of stores is 5.22%: as with books and multimedia, customers cannot view the product before purchasing it, but in addition these products are often more expensive and more varied, for example depending on the model or year of manufacture of the car [11].

Table 1 shows the value of sales and the number of orders of online stores in individual industries, for stores with a monthly sales value between PLN 5,000 and PLN 5 million. In total, the number of orders in these stores is almost 39 million, with a value exceeding PLN 9.5 billion, which means that the average order was worth PLN 250.33. The table shows that Polish Internet users most often buy items from the "books and multimedia" sector; as these are lower-value items (the average order is PLN 110.82), the sales of these stores amounted to PLN 1,011.54 million, which gives them the fourth-highest sales. The "photo and RTV-AGD" industry had the highest share of sales in 2009 (27.6%), almost twice as high as "home and garden" (14.3%) and almost three times higher than the "computer" category (10.9%). The weighted average basket value of these three industries with the highest value of orders exceeded the others by several hundred zlotys. This leads to the conclusion that Internet users prefer to buy consumer electronics over the Internet because they are looking for the most attractive price offers [10, 11].


Table 1 Sales volume and number of orders for online stores in individual industries (stores with monthly sales in the range of PLN 5,000–5 million)

Industry | Number of shops in 2009 | Share of all stores (%) | Number of orders in 2009 | Share of all orders (%) | Weighted average basket value (PLN) | Value of sales in 2009 (PLN million) | Share of total sales (%)
Auto and moto | 394 | 5.23 | 1,664,256 | 4.32 | 380.81 | 633.76 | 6.6
Delicacies | 201 | 2.67 | 692,244 | 1.80 | 180.81 | 125.16 | 1.3
House and garden | 1396 | 18.17 | 2,644,908 | 6.87 | 521.70 | 1379.84 | 14.3
Child | 580 | 7.7 | 904,800 | 2.35 | 143.41 | 129.75 | 1.3
Photo and RTV-AGD | 668 | 8.87 | 4,713,408 | 12.24 | 563.38 | 2655.44 | 27.6
Hobby | 428 | 5.68 | 2,239,296 | 5.82 | 127.25 | 352.12 | 3.7
Computer | 661 | 8.77 | 1,824,360 | 4.74 | 578.24 | 1054.91 | 10.9
Books and multimedia | 379 | 5.03 | 9,127,836 | 23.71 | 110.82 | 1011.54 | 10.5
Clothes | 760 | 10.09 | 4,450,560 | 11.56 | 138.71 | 617.33 | 6.4
Gifts and accessories | 806 | 10.7 | 3,936,504 | 10.23 | 131.79 | 518.79 | 5.4
Sports and tourism | 518 | 6.87 | 3,014,760 | 7.83 | 242.18 | 730.11 | 7.6
Health and beauty | 771 | 10.23 | 3,284,460 | 8.53 | 130.37 | 428.19 | 4.4
Sum | 7535 | — | 38,497,392 | — | — | 9637 | —

Source Sklep24.pl, 2009


3.1 General Information

According to the "Report e-commerce 2010" prepared by internetstandart.pl (covering 2009), the largest group of operating online stores are those that have been in business for 2 to 5 years, representing 57.2% of all stores. In 2008 this share was 37.9%, which means that stores opened in the preceding 1–2 years are still operating and developing. There has therefore been considerable growth recently, which is likely to continue. In 2009 most companies operated only one online store (75.18%), while 15.94% of companies had two stores with different assortments.


The report also shows that 30% of stores offer between one and five thousand different products. Only 10% of the surveyed stores offer around 100 products; these are most often run by one person. Among companies with such a small assortment, 69% also sell on auction sites. This is a good complement, because it is an easy and cheap way to reach customers: most people start looking for products there, since websites such as Allegro.pl are already well promoted and known. On the other hand, 65% of them do not have a stationary point of sale and trade only online. The result is lower operating costs, which indirectly contribute to lowering the prices of the products offered [11]. In 2008, 73% of online store owners said they made a net profit from their business, while in 2009 this share increased by 14 percentage points, to 87%. This means that during this time sales increased and more people shopped online. Among the 13% of companies that made a loss in 2009 there was not a single company employing more than three people, while as many as 74% of them were run by one person, which means that they are mostly young companies that are still developing.

3.2 Internet Marketing

The purpose of trading on the web is to sell products or services and attract new customers. Stationary shops located in the city centre or near popular places are promoted by their very location, which encourages potential customers to look at the offer and make purchases. On the Internet, for a customer to visit the store, he must know its Internet address. He can learn it from advertisements, links, banners or promotional materials, but such forms of advertising are expensive and require expertise. The store can also make sure that links (hyperlinks) to it appear on other sites, which usually also involves a fee, although links can be exchanged free of charge with friendly shops or services. Such a store must also take care of positioning its domain for keywords related to its industry and the products it offers. Software is currently available on the market that adds a store's page to thematic and industry catalogues; this service can also be ordered from companies specialising in search engine positioning and online advertising. Only stores with a domain name that matches the product being searched for, such as laptops.pl or tyres.pl, can count on customers finding them and buying by chance [7]. The graph in Fig. 4 shows which forms of Internet marketing were used by shop owners in 2007 and 2009. The most frequently chosen form of promotion in 2009 was search engine positioning (77.4%). It consists of placing a link to the store's website as high as possible in the search results for a given keyword. When looking for a product, most people pay attention only to the first five results, and the first three positions have the highest click-through rate. Whether a page ranks high depends on many factors, such as the keywords used and how many links point to it from other sites. It is also important that the site is added to thematic catalogues: search engines such as Google then see other websites "recommending" the store's website, which makes it more important. The site should also validate correctly and comply with web standards.


Fig. 4 Online marketing forms used in stores. Source Sklepy24.pl, 2009 (n = 616) and gemiusReports, Sklepy24.pl, 2007 (n = 525)

The position of the domain and its "PageRank", a score from 0 to 10 describing the quality and importance of a page, also have an important influence on the position in the Google search engine [7, 11]. In addition to search engine positioning, 71.1% of shop owners choose presence in shopping malls and catalogues as a form of promotion. In this form, the website is listed in a website directory divided into categories, which makes it easier to browse the entries. Most often, an entry contains a short description of the page presenting its offer, a thumbnail of the store's website and a link to it. In the most important directories, entries are moderated and checked by administrators who accept or reject added pages, so that the directories contain only valuable pages. In addition, administrators monitor the websites entered in the directories, keeping them up to date [7]. The third most popular form among online retailers was presence on price comparison websites (59.9%). Comparing the data from 2009 with 2007, the largest increase in store promotion was recorded for advertising on price comparison websites (an increase of 18.4 percentage points), followed by presence in shopping malls and catalogues (an increase of 9.2 points). Interest in search engine positioning decreased over the last two years by 4.7 points [11].

3.3 Transactions and Payments

Customers who buy products over the Internet have many ways to pay for their purchases. Most stores provide the option of paying by bank transfer or cash on delivery; in fact, under Polish law sellers must provide cash on delivery.


Fig. 5 Payment methods available in online stores

The Act of 2 March 2000 on the protection of certain consumer rights and on liability for damage caused by a dangerous product states that "the contract may not oblige the consumer to pay the price or remuneration before receiving the benefit". Unfortunately, according to the "e-Handel Polska 2009" report prepared by the sklep24.pl website, out of the surveyed group of 616 stores only 85.1% comply with this Act [11]. The chart presented in Fig. 5 shows that stores most often offer a transfer to their account, i.e. prepayment (86.2%), as a method of payment for ordered goods. This is because the shop owner is then sure that he will receive payment for the goods sold before shipping them. Although all stores should legally offer cash-on-delivery shipment, as many as 14.9% of them do not. This is because stores are afraid that the customer will order the goods and not collect them, in which case the store bears the shipping costs; in addition, goods that have not been collected cannot be offered in the store again until they are returned. Another example is a store shipping groceries with a short best-before date, such as fruit and vegetables: by not offering cash-on-delivery payment, it ensures that it only sends goods for which customers have already paid. On the other hand, in the case of stores where the customer orders goods prepared to special order or dimensions, for example clothes that the store sews itself, the lack of cash-on-delivery shipping is fully justified [10, 11]. In addition, many stores offer a transfer to their account based on an issued invoice, usually with the option of deferred payment. Most often, this form is addressed to customers that are other businesses, for example intermediaries or companies purchasing goods for resale. Payment on delivery is chosen very often by customers (46.4%), while prepayment by bank transfer to the store's account is chosen by 39.4%. This means that Poles are still distrustful of sellers and courier companies and prefer to pay for the goods upon delivery. Few people in Poland have credit cards, so card payment is still not a popular method and is offered by almost half of the stores. As shown in Fig. 5, only 19% of the surveyed stores offer an instalment plan. It turns out, however, that in online stores that provide such a payment option, the average order value is much higher than in those that do not offer it.


Fig. 6 Average order value depending on available payment methods. Source Sklep24.pl, 2009 (n = 616)

This tendency can be seen in Fig. 6. In the instalment system, almost 60% of orders exceed PLN 300, which means that when buying more expensive goods customers are not afraid to buy on credit online. The figure also shows that the order values for payment by credit card and for payment on delivery do not differ significantly from each other. As already mentioned, the industry in which the store operates has a large influence on the value of the cart: stores selling electronics and those in the "home and garden" sector have the highest order values. Apart from the industry, another important factor related to the value of the order is how long the business has been operating. This can be seen in Fig. 7, which shows the average value of the order depending on the age of the store. In stores operating on the market for a short time (up to 2 years), about 41% of orders are worth up to PLN 100. For stores that have been operating for 2–5 years this share is only 30.9%, and for those selling for more than 5 years it is 23.7%. This is because older sellers gain experience with online sales and, along with these skills, increase the value of the cart by optimising the offer; a store that develops also offers more expensive and more frequently purchased products on which it makes more profit.

Fig. 7 Average order value depending on business experience


Fig. 8 Average value of the order depending on the ownership of the stationary shop. Source Sklep24.pl, 2009 (n = 616)

Figure 7 also shows that stores operating on the market for less than a year declare the largest share of baskets worth more than PLN 1,000 (12.7%), which seems rather strange. They may offer high discounts or use frequent promotions to encourage new customers to buy [11]. Another factor influencing the value of the order is whether the seller also runs a stationary (brick-and-mortar) shop, as shown in Fig. 8. Stores that also have a physical point of sale receive more orders above PLN 200. This may be because customers can personally inspect and test the goods before purchasing them, and can also buy them online and pick them up in person, paying in cash or by card. In addition, higher sales may result from the fact that goods in such stores are more often available, because these sellers usually have warehouses [11].

3.4 Technology

The basis for the functioning of an online store is its software and the server on which it runs. The store application can be purchased in the form of a licence, which involves a one-time expense. It can also be rented in the so-called SaaS model (Software as a Service), with fees in the form of a commission on turnover (or net sales value) or a monthly subscription. Another option is to have the entire store platform written by an external company or developed in-house; such software is then completely tailored to the store's needs, both in terms of functionality and graphics. The third option is to use free software under an Open Source licence, which can be freely adapted to one's needs [7, 11]. As Fig. 9 shows, 35.3% of stores run on proprietary software written independently by store employees (15.7%) or ordered from an external company (19.6%). Ready-made commercial software represents 26.8% of the applications used. It is quite a surprise that as many as 21.3% of stores use free software. The age of an online store has the greatest impact on the type of software used, which Fig. 10 shows very well. Stores that are starting out and have operated for up to two years most often use ready-made commercial software or free software under an Open Source licence. The use of this type of application is justified by the fact that it is the cheapest solution, which most stores can afford at the beginning of their activity.


Fig. 9 A list showing what type of software online stores use. Source Sklepy24.pl, 2009 (n = 616)

Fig. 10 Store software used depending on how long the store has been operating. Source Sklep24.pl, 2009 (n = 616)

Proprietary custom-written software (43.4%), which is the most expensive option, is most often used by stores that have existed for more than 5 years. It allows them to create an application that is perfectly adapted to the nature of the store and its assortment. The smallest changes occur in the case of software provided by platforms that rent out ready-made stores. This is because stores run with their help do not need to employ as many people to deal with the technical side of the business as stores using other solutions, which allows costs to be saved, especially at the start [11] (Fig. 11). In order to work, the store software must be installed on an appropriate server. According to the report "Survey of Polish Internet Stores", virtual servers are the most frequently selected hosting service for store applications (33.20%). In this model, multiple websites are hosted on a single server and each user has access only to his own part. Dedicated servers are used by 28.31% of stores: a separate computer is placed in a professional server room and the client can install the necessary software on it. The client can also freely configure the operating system and install additional software, for example mail or FTP file servers.


Fig. 11 Hosting services used by online stores in 2009. Source “Survey of Polish Internet Stores”, Internet Standard, September 2010

Such a server is connected to the Internet via high-speed links (100 Mbit/s and more), which affects the loading speed of the store's website on the customer's side [7, 11]. The chapter above presented the most important statistics on Polish online stores, including the value of the e-commerce market, the most frequent types of customer purchases and the industries in which the stores operate. The most important ways of promoting stores and gaining customers were also discussed, as well as the most popular payment methods and the software used by stores.

4 Online Store Software

At present, to set up an online store and start selling online it is enough to create an account on a store platform and add products, but can such a store be tailored to one's needs? The following chapter presents the possibilities of platforms offering ready-made shops and of shops created on commercial and Open Source software. For this purpose, the platforms are compared with the most frequently used software packages.

4.1 Open Source Software

Currently, the most popular open source packages for running an online store are Magento and PrestaShop. Mention should also be made of osCommerce, which was one of the first platforms for running a store but which, due to a long-term lack of updates and development, lost many customers to competing products that were being dynamically developed at the time [12, 13].


Fig. 12 Interest over time—Google search

Figure 12 shows search statistics for the terms "magento", "prestashop" and "oscommerce" on Google from January 2006 to November 2010. The values indicate how many searches were performed for a particular term relative to all Google searches over time. The graph shows that until 2009 osCommerce was the most frequently searched for. Its popularity declined with the arrival of PrestaShop in August 2007 and Magento in March 2008; just before the second quarter of 2009, Magento was already the most popular free software for running an online store. Although these are free solutions, setting up an online store on them still involves some expenses: a domain and suitable hosting with support for at least PHP 5.0 and a MySQL database must be purchased, and for stores with a large number of items and a wide assortment a dedicated server is a good solution. Since these stores have a default graphic design after installation, the design should be adapted to the store and the assortment that will be sold in it; paid templates can be used or the graphics can be outsourced to an external company. Additionally, new software releases have to be installed by the store owner. Magento and PrestaShop have communities that exchange views on forums and mailing lists and create modules, fixes, language versions and new features; most problems and issues related to these applications are described there. Both of these programs support most languages (including Polish) and currencies whose exchange rates are regularly updated. Magento has introduced support for mobile devices, but it is not available under the Open Source licence and has to be paid for. In 2010, PrestaShop was recognised for its dynamic development and large number of implementations, winning the Open Source Awards 2010 in the category "e-commerce applications" [12, 13]. Figure 13 shows a web page of an example online store running on PrestaShop. The main page shows the latest products and store information, with various modules on the left and right side. The "tags" module contains keywords that can be used to search for products in the store, and below it there is a list of product categories. Products can also be browsed by manufacturer in the left panel.


Fig. 13 Sample online store with PrestaShop software

On the right side there is a "cart" module, a basket in which the customer can place products before placing an order. The last module shows the latest products in the store. In addition, the customer can search for products in the store and change the currency and language. Each online store also has its own administration panel; the PrestaShop panel is shown in Fig. 14. The figure shows the section in which the store administrator manages the store's customers. At the top there is a menu from which tabs such as products, customers, orders or payments can be selected. In the administration panel, the administrator can also configure store options such as shipping costs, payment settings or the appearance of the customer part of the store.

Fig. 14 View of the administration panel in the PrestaShop software


4.2 Platforms Providing Online Stores

The largest platforms offering ready-made online stores in Poland are IAI-Shop.com and iStore.pl. Starting a business with them is based on renting the software for a certain period of time. These platforms offer ready-made stores with graphics tailored to the requirements and needs of the client, and the price additionally includes the domain, hosting and technical support. The software is created using modern technologies and offers a high standard of security, including SSL encryption between clients and the server [14, 15]. It is updated automatically when a new software version is released. These stores have a very large number of options. They are integrated with other websites offering online sales, such as the auction services Allegro.pl and swistak.pl, price comparison websites and shopping malls. They already include online credit card payment systems and the ability to make a quick transfer from a bank account using Przelewy24.pl, paypal.pl or dotpay.pl. Another advantage of platforms offering online stores is that they are integrated with warehousing and accounting systems, for example with the popular invoicing and accounting program Subiekt GT; all orders placed by customers are automatically posted and product stock levels are updated in the warehouse database [16, 17]. In addition, the IAI-Shop.com platform offers an integrator that connects the store with online wholesalers, allowing data on many thousands of products, together with descriptions and photos, to be downloaded automatically from the warehouse databases to the store. It is possible to set a margin for specific products or groups of products and to add them at the customer's request, as long as they are in stock [16]. The iStore.pl platform, together with the Allegro.pl website, offers its customers a Buyer Protection Programme, which allows customers of shops set up on this platform to secure purchases of up to PLN 10,000 in the event of a failed purchase [17]. Figure 15 shows the website of a demo store working on the iStore.pl platform. This is an example store, one of the many layouts a customer who buys a licence can choose; as can be seen in the picture, the store has a graphic design appropriate, for example, for a shop with cosmetics or body-care products. In the case of shops from ready-made platforms, each has a prepared layout for a specific graphic design, which also determines where the individual modules are placed. In the demo store, there is a panel at the top of the page where the customer can set the currency and language options, as well as register or log in to his user account. The home page of such a store also contains store news added by the administrator and a gallery of the most frequently purchased products. In addition, the panel on the right shows the latest products and promotions.


Fig. 15 Demo page for trading on the iStore platform

4.3 Commercial Software

Commercial software is the most expensive solution, affordable mainly for stores that have long been on the market and have regular customers. Such software is written for each client separately, which allows it to be adapted to his needs in terms of functionality and graphic design. A good example is a store operating in the electronics industry that sells computer parts and complete sets. A ready-made application could be used for such a business, but by choosing commercial software the shop can be equipped with modules that facilitate shopping, for example an intelligent wizard that protects the customer from choosing parts that do not fit together. Another function could be to suggest to the customer which parts to build his computer from, or to propose alternative parts that are cheaper or more expensive. An interesting solution could be comparing the parameters and performance of competing products, such as graphics cards [7]. Commercial software can be installed on the client's own server or in a professional server room. In addition, the owner has more options if he wants to extend the store in the future, for example with new payment methods or integration with price comparison sites.


4.4 Comparison and Summary

Open Source store software is ideal for a store that is just starting up and has a limited budget. Such a shop should be looked after by a person who can handle its installation, minor modifications and maintenance. In the case of platforms offering ready-made shop software, the customer gets a ready-made, functional shop with many features, but has little ability to adapt it to his needs. On the other hand, he does not have to worry about technical issues such as installing and updating the software, and he has technical support available which, in accordance with the agreement, solves all technical problems related to the store's software and prepares new versions with additional functions. Such software is usually used by stores that have been on the market for some time or are just starting to sell online. Commercial software is used mainly by stores that have been operating on the web for a long time, usually more than 5 years. Such vendors usually already know which industries they will specialise in and what products they will offer, so they can afford additional features. This chapter presented three ways of obtaining software for an online store, i.e. platforms offering ready-made online stores and stores built on commercial and Open Source software. Their advantages and disadvantages were presented, as well as example stores realised with their help. The following sections present examples of their use for specific customer requirements.

5 Discussion of Technologies and Tools Used During Project Implementation

The online store application was written in Java EE using JSP, HTML and CSS technologies. The Eclipse IDE for Java EE Developers was used to prepare the project, and Apache Tomcat 6.0 was used as the server.

5.1 Java

Java is an object-oriented language. Its object model is simple and extensible, although simple types (such as integers) are not objects for performance reasons. Java meets the main characteristics of object-oriented programming, i.e. it supports abstraction, encapsulation, inheritance and polymorphism. Its important advantage is its cross-platform nature, which means that programs written in Java can run on all system platforms [18–20]. J2EE offers components for client sessions, presentation (the GUI layer), business logic and application management logic. It also provides transaction and data management services. Applications written using J2EE technology are characterised by portability and a division into a three-tier application architecture.


In such programs, one can distinguish the data presentation layer (i.e. the graphical user interface), the business logic layer and the database abstraction layer [21–23].

5.2 Frameworks

The following Java frameworks were used during the implementation of the e-commerce project: Hibernate, Spring, Struts and Sitemesh. Hibernate is a Java framework developed under an Open Source licence. It provides object-relational mapping between a relational database and Java objects, using Java classes to describe the data structures. In this way data from database tables can be "projected" onto objects used in the object-oriented language. Hibernate can increase the performance of database operations by caching and by minimising the number of queries sent [24–27]. Hibernate provides dialect classes for many databases to ensure that correct and well-optimised SQL is generated for each individual product. In the basic version, Hibernate supports all major databases: DB2, HSQLDB, Microsoft SQL Server, MySQL, Oracle and PostgreSQL. It is also possible to write a dialect for other databases or to use ready-made solutions provided by other companies and communities. Hibernate supports the most important Java and SQL types, which makes mapping objects much easier; furthermore, it can automatically convert between similar types, for example java.lang.String to the corresponding SQL VARCHAR [24, 26, 28–30]. Figure 16 shows a fragment of the application code responsible for configuring Hibernate to connect to the selected database, in this case SQLite. For Hibernate to connect to the chosen database, its address ("connection.url"), the dialect and the appropriate libraries must be attached to the project. For databases secured with a login name and password, these are entered in the "connection.username" and "connection.password" fields. The "show_sql" field controls whether database queries are displayed in the application console. Figure 17 shows an example of an Invoice class mapped to a database table. Each class is assigned its corresponding database table name using the @Table(name = "table name") annotation.

Fig. 16 A fragment of the hibernate.cfg.xml file


Fig. 17 Example of a Java class mapped with Hibernate

Hibernate also requires that each mapped class specify a primary key with an @Id element. The MVC (Model-View-Controller) pattern is a way of dividing tasks on the server side of a web application. Such an application is implemented as a combination of servlets, JSPs, services and plain Java code: each JSP file used to display a subpage has its own Java class, which is responsible for its actions and logic. The Spring Framework project was used as the application framework. The role of the application controller is to coordinate all control-flow activities in the application, handle errors and select the appropriate view to display; in the created project, the Struts framework was used for this purpose, which uses XML files for its configuration [24, 31, 32]. An example of a configuration file fragment is shown in Fig. 18. Sitemesh is a free Java framework for combining application-generated views with a common template. In the project, it was used to insert the same menu on each subpage of the site, which makes it easier to add new menu items.

Fig. 18 An example fragment of the Struts framework configuration
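Since the content of Fig. 17 is not reproduced in the text, the sketch below only illustrates the kind of mapped class described above. It is a minimal example assuming the standard javax.persistence annotations supported by Hibernate; the field names are illustrative and not the project's exact schema.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

// Illustrative mapped entity: attribute names are assumptions, not the chapter's exact schema.
@Entity
@Table(name = "invoices")
public class Invoice {

    // Every class mapped by Hibernate must declare a primary key with @Id.
    @Id
    @GeneratedValue
    @Column(name = "id")
    private Long id;

    // Invoice number, e.g. in an "order number/month/year" format.
    @Column(name = "number")
    private String number;

    // Identifier of the customer who placed the order.
    @Column(name = "customerId")
    private Long customerId;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getNumber() { return number; }
    public void setNumber(String number) { this.number = number; }
    public Long getCustomerId() { return customerId; }
    public void setCustomerId(Long customerId) { this.customerId = customerId; }
}
```

With such a class on the classpath and listed in the Hibernate configuration, instances can be stored and loaded without writing table-specific SQL by hand.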


5.3 Databases

When creating an online store, it is very important to use a database that will store all the data required by the online store application. There are many database solutions on the market for various applications. Choosing the right one is very important, as it can affect the speed of the website and the security of the stored data [15, 33].

5.3.1 SQLite

SQLite is free software (public domain) that provides a relational database management system (RDBMS). Such a system stores records in defined tables and its engine can search for data across multiple tables [34, 35]. Figure 19 compares the traditional database server architecture with the SQLite architecture. Unlike most RDBMSs, SQLite does not have a client/server architecture. Large database management systems usually have a built-in server that acts as the database engine; such a server typically runs multiple processes that manage connections, file I/O, caching and query optimisation, and the databases are stored in many files arranged in a particular directory structure. Software that uses an SQLite database only needs the appropriate library, which replaces the server and handles communication. Because no multitasking or inter-process communication is required, SQLite needs fewer resources and less memory than other RDBMS servers. As a result, it is often used, for example, in mobile phones, game consoles and other mobile devices. It is used not only in Internet applications, but also often in desktop applications; many well-known programs use an SQLite database, for example the popular Mozilla Firefox web browser, which stores user profile data in such databases [34]. The SQLite database is stored in a single binary file that contains the schema of the tables, the indexes and the data. Such a database file is cross-platform and compatible with the major operating systems, so it can be freely created, copied and modified, and can easily be transferred or published. The maximum size of such a file is 2 TB.

Fig. 19 Traditional RDBMS architecture and SQLite server architecture [34]


On disk, the SQLite database is maintained using B-trees; each table and index uses a separate tree [34]. On most platforms that support SQLite, its library is smaller than 700 KB and requires less than 4 MB of memory. By removing some of the library's functions, its size can be reduced to about 300 KB and 256 KB of required memory, which makes it an ideal solution for embedded systems with limited resources [35]. SQLite offers several features that other database management systems do not have. It uses a dynamic type system for table columns. In addition, SQLite can use several databases at the same time: other database files can be attached during a single database connection, which allows tables from different databases to be combined in a single query. The SQLite database can also be kept in temporary RAM; such a database is not persistent and does not fully support transactions, but it is very fast [34]. The "SQLite Manager" add-on available for the Mozilla Firefox web browser can be used to manage an SQLite database; a sample view of this program is shown in Fig. 20. The application allows all database operations to be performed, such as creating tables, adding new records and editing or deleting existing ones. Furthermore, data can easily be cleared from a table, and SQL code can be entered and executed directly. An interesting feature of the program is "Compact Database", which optimises the database. It is also possible to back up the entire database and save it to another file [36].

Fig. 20 SQLite database management program—SQLite manager
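As a side note, the single-file nature of SQLite can be seen in a few lines of Java. The sketch below assumes the freely available xerial sqlite-jdbc driver is on the classpath; the file name store.db and the table are illustrative only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SqliteDemo {
    public static void main(String[] args) throws Exception {
        // Register the SQLite JDBC driver (xerial sqlite-jdbc).
        Class.forName("org.sqlite.JDBC");

        // The whole database lives in the single file store.db, created on first use.
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:store.db");
             Statement st = con.createStatement()) {

            st.executeUpdate("CREATE TABLE IF NOT EXISTS categories (id INTEGER PRIMARY KEY, name TEXT)");
            st.executeUpdate("INSERT INTO categories (name) VALUES ('Books')");

            try (ResultSet rs = st.executeQuery("SELECT id, name FROM categories")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                }
            }
        }
        // Using "jdbc:sqlite::memory:" instead would create the purely in-memory
        // (non-persistent) database mentioned above.
    }
}
```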

5.3.2 MySQL

MySQL is a relational database management system developed under a free software licence, currently maintained by Oracle. It is available for the most important system platforms and, since its source code is available, it can be compiled on any platform. The free phpMyAdmin software can be used to manage a MySQL database. It allows databases to be created and deleted, tables to be added and removed, and their structure and content to be modified (Fig. 21). Using phpMyAdmin, all operations on a MySQL database can be performed in a web browser, in a graphical environment, without the need to work in a text console. The disadvantage of this solution is that phpMyAdmin is written in PHP, so in order to run it must be installed on a server that supports PHP [37].

5.4 Analysis of Application Performance with Different Databases

This section compares the performance of SQLite with the frequently used MySQL database in order to answer the question of whether SQLite is a good solution for online store software. The first test consisted of adding 1, 10, 100 and 1,000 copies of the same client to the database. The time was measured immediately before the command that creates a new object and saves it to the database, and immediately after it. Each operation was performed 10 times on an empty database, and the average results are recorded in Table 2. As can be seen, writing data to an empty database is almost 50 times slower in SQLite than in MySQL; this is a significant difference that can be noticed when working with the online store system. Figure 22 shows a graph comparing the speeds measured in this test.

Fig. 21 PhpMyAdmin tool for managing a MySQL database
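The measurement harness itself is not reproduced in the chapter; the sketch below only illustrates how such a timing could be taken with the Hibernate API used in the project. The Customer entity and its setters are assumptions made for the example.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class InsertBenchmark {
    public static void main(String[] args) {
        // Reads hibernate.cfg.xml, which points either at SQLite or at MySQL.
        SessionFactory factory = new Configuration().configure().buildSessionFactory();

        int copies = 1000;                       // 1, 10, 100 or 1000 copies of the same client
        long start = System.currentTimeMillis(); // time taken before the inserts

        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        for (int i = 0; i < copies; i++) {
            Customer c = new Customer();         // assumed Hibernate-mapped entity
            c.setEmail("test@example.com");
            c.setName("Test");
            session.save(c);
        }
        tx.commit();
        session.close();

        long elapsed = System.currentTimeMillis() - start; // time taken after the save
        System.out.println(copies + " inserts took " + elapsed + " ms");

        factory.close();
    }
}
```

Running the same loop against both hibernate configurations and averaging ten runs would produce figures of the kind reported in Table 2.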


Table 2 Average time to add records to the database (average addition time in ms)

Database | 1 client | 10 clients | 100 clients | 1000 clients
SQLite | 130.7 | 1354 | 15,253 | 167,480.6
MySQL | 2.6 | 39.3 | 234.6 | 3812.7

Fig. 22 Average write time to empty SQLite and MySQL databases

The second test consisted of adding successive batches of 1,000 records to the same databases (SQLite and MySQL) and measuring the write time to a database to which records had already been added. The results are recorded in Table 3, which shows the time needed to add each additional 1,000 customer records to the SQLite and MySQL servers. Figure 23 shows graphs illustrating the addition time (in milliseconds) of each additional 1,000 records based on the table. It can be seen that for SQLite it hardly matters whether the database already contains data or not: there was no significant increase in the time of the operation. In the case of MySQL, there is a noticeable decrease in performance when adding new records to an already populated database: the difference between adding 1,000 clients to an empty database and adding them when the database already contained 4,000 clients was more than 15 s.

Table 3 Time to add new customers to the database

Time of adding new clients to the database [ms]

Database | First 1000 | 1001–2000 | 2001–3000 | 3001–4000 | 4001–5000
SQLite | 160,877 | 157,917 | 169,219 | 163,442 | 163,287
MySQL | 5166 | 8349 | 11,712 | 15,742 | 20,342


Fig. 23 Database performance while adding additional records to the database

Table 4 Average time of searching for a specific client in a database containing 10,000 records

Database | Average search time [ms]
SQLite | 985.7
MySQL | 411.7

On the server where the tests were performed, it was also clear that when adding records to the SQLite database the hard disk was the busiest component, while in the case of MySQL it was the CPU. The last test compared the time needed to retrieve the data of one specific client from a database of 10,000 records. Each test was run 10 times and the average results are shown in Table 4. The test shows that searching for data in SQLite is more than twice as slow as in MySQL. In the case of SQLite, the database file must be read each time, so the result depends on many factors, such as the read speed of the hard disk, the data transfer rate, the access time and even the rotational speed of the platters. The MySQL database has its own server, which runs many processes that manage connections, cache data and optimise queries. As can be seen from the tests performed, the SQLite database is not a good solution for online store software; it is much slower than the competing MySQL product. As mentioned above, SQLite is a good solution for desktop programs, because no separate database server has to be installed for them, and it can also be used in mobile devices and embedded systems, where the amount of memory and limited processor resources matter most. This chapter described the technologies and tools used to create the SQLite online store application. Because the application was written using a three-tier architecture, the database it uses can easily be changed; this was used to test its performance against another frequently chosen database management system, MySQL.
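Because the database sits behind Hibernate's abstraction layer, swapping between SQLite and MySQL only requires different connection properties. The sketch below shows one way of doing this programmatically; the property keys are standard Hibernate settings, the MySQL values are the usual driver and dialect names, while the SQLite dialect class is a hypothetical placeholder, since (as noted earlier) an SQLite dialect is not part of the basic Hibernate distribution and has to come from a third party.

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class SessionFactoryBuilder {

    // Builds a SessionFactory for MySQL; the URL and credentials are examples only.
    public static SessionFactory forMySql() {
        Configuration cfg = new Configuration().configure(); // base settings from hibernate.cfg.xml
        cfg.setProperty("hibernate.connection.driver_class", "com.mysql.jdbc.Driver");
        cfg.setProperty("hibernate.connection.url", "jdbc:mysql://localhost:3306/shop");
        cfg.setProperty("hibernate.connection.username", "shop");
        cfg.setProperty("hibernate.connection.password", "secret");
        cfg.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
        return cfg.buildSessionFactory();
    }

    // Builds a SessionFactory for SQLite; the dialect class name is a hypothetical
    // third-party class, as Hibernate does not ship an SQLite dialect by default.
    public static SessionFactory forSqlite() {
        Configuration cfg = new Configuration().configure();
        cfg.setProperty("hibernate.connection.driver_class", "org.sqlite.JDBC");
        cfg.setProperty("hibernate.connection.url", "jdbc:sqlite:store.db");
        cfg.setProperty("hibernate.dialect", "com.example.SQLiteDialect"); // placeholder
        return cfg.buildSessionFactory();
    }
}
```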


6 Technical Documentation Describing the Created Application

The aim of the project was to create an online store application and an administration panel for managing it, running on an SQLite database.

6.1 Functional and Non-Functional Requirements

The functional requirements of the created system are the following. In the online store, customers can:

• register, log in and log out,
• browse categories and view products,
• order products by adding them to the virtual cart,
• access and edit their personal data,
• view their previous orders and download invoices.

In the administration panel, the administrator can:

• log in and log out,
• add a new customer, edit his data and delete him,
• search for a customer by name, surname and telephone number,
• add a new manufacturer, modify his data and remove existing manufacturers,
• search for manufacturers by name, tax identification number and telephone number,
• add, edit and delete the categories to which products are assigned,
• add, adjust and delete the VAT rates assigned to products,
• add new products, edit their data and delete them,
• search for products by name, category, availability and a specific price and quantity range,
• preview the products ordered by customers and detailed information about them.

Non-functional requirements for the built application are:

• Independence from the operating system—the online store software and administration panel are written in Java EE technology, which allows the application and its server (e.g. Tomcat) to run in any environment.
• Possibility to use different databases—the program was written using a three-tier application architecture with a separate database abstraction layer. For this purpose the Hibernate framework was used, which can work with all the major available databases.
• Scalability of the online store application—many customers can perform independent operations simultaneously.


• Distribution—e-commerce customers can perform operations over the Internet using any web browser, i.e. the application was written using a client–server architecture.
• Increased security of the website and customer data—customers' passwords are encrypted before being stored in the database [38].

6.2 Database Schema

Figure 24 shows the database diagram of the online store software and its administration panel. Although they are two independent applications, they use the same database.

Fig. 24 Scheme of the database of the created online store


The "customers" table stores information about people registered on the website: the e-mail address, which also serves as the login, the password, first name, surname, address data, telephone number and tax identification number. The password is stored in the database as a string encoded with the one-way MD5 hash function. In addition, to provide extra protection against password cracking by dictionary or brute-force methods, the password is combined with a fixed string of characters (a salt), which makes it harder to crack.

The "products" table contains information about the products offered in the online store. Each product has its own unique primary key in the form of an ID attribute, a name, price, available quantity, description, the name of its image file stored on the server, the date it was added to the store, and a flag that determines whether the product is available for sale. In addition, each product stores the ID of its VAT rate, its manufacturer and the category to which it is assigned.

The "ratesVat" table contains the primary key ID and the corresponding VAT rate value. The same applies to the "categories" table, in which the category names are stored. The "manufacturers" table contains details of product manufacturers: a unique identification key, the manufacturer's name, address details, tax identification number, company website address and telephone number. This table is used in the administration panel to see who manufactures an item that is running out of stock, so that it can be reordered quickly.

The "carts" table stores the products that customers have added to their carts. As a result, it is known how many items should be blocked on the website so that other customers cannot buy them at the same time. If the order from the cart is not completed within 30 min of the last change to it, its products are returned to the pool of products available in the store. The data of customers logged in to the store are stored in the "sessions" table, which has a unique key ID, a foreign key with the user ID, the session and the login date; it is used to assign a specific customer to a shopping cart when an order is placed.

When the customer places an order, his customer identification number and the current date are stored in the "orders" table, while the ordered products, their quantity, name, net and gross price and VAT rate are stored in the "ordersProducts" table. Thanks to this, if the price of a product in the store later changes, the order still records the price, name and VAT value of the product as they were at the time of ordering, and the customer can view his past orders with the prices valid when they were placed. Billing information is stored in the "invoices" table, which has its own unique primary key, the identification number of the customer who placed the order, and a unique invoice number in the form "order number/month/year". The "options" table stores the store configuration data, such as the number of list items displayed per page in the administration panel and in the store, contact details, store details, and the number of latest products to display to customers.
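The hashing code itself is not shown in the chapter, so the following is only a sketch of the salted MD5 scheme described above, using the standard java.security.MessageDigest API; the salt value and method names are illustrative. Today a dedicated password-hashing function would normally be preferred over MD5, but the sketch follows the scheme used in the chapter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class PasswordHasher {

    // Fixed application-wide string mixed into every password (the salt); the value is an example.
    private static final String SALT = "example-static-salt";

    // Returns the hex-encoded MD5 digest of salt + password, which is what would be stored.
    public static String hash(String plainPassword) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest((SALT + plainPassword).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```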


6.3 Schemes of Components and Packages

The online store application uses a three-tier client–server architecture in which the user interface, the business logic layer and the data layer are separate modules. A diagram illustrating the three-tier architecture is shown in Fig. 25. The top layer is the presentation layer, whose task is to present the results required by the user; it also allows the user to communicate with the system by calling the available services. Below it is the business logic layer, which coordinates the work of the application, processes user requests and is responsible for data transfer between the presentation layer and the data layer. The lowest layer is the data layer, which organises the persistent data stored in the database and makes it available to the higher layers. Figure 26 shows a diagram of the components of the online store. It is almost the same for the store application and the administration panel; they differ only in the structure of the main module. All other components and interfaces are the same, so for simplicity all components are shown in one general diagram. The most important part is the main store/administration panel component, which is responsible for the operation of the application. It connects via a communication interface to the "services" component, which implements the business logic layer. The business logic layer uses the database through the database service interface using JDBC (Java DataBase Connectivity), which allows applications to communicate with databases in a platform-independent way. The Hibernate framework is responsible for converting data from the relational database into the Java objects used by the application. A data presentation interface was used to link the business logic layer to the presentation layer; for this purpose the project uses the Struts and Sitemesh frameworks. Struts is responsible for mapping the application's web addresses to the class methods that handle requests, while Sitemesh was used to assemble the page templates.

Fig. 25 Schematic illustrating a three-tier architecture


Fig. 26 Schematic of components of the created application

Sitemesh is based on the fact that the same elements are inserted into each subpage on the server side, which greatly facilitates the creation of the views and the graphical user interface. Figure 27 shows the main component of the administration panel's business logic layer in the form of a package diagram. Each of the presented packages is responsible for managing a different part of the system. The Manufacturers package contains classes with methods for adding, editing and removing manufacturers; in addition, it allows manufacturers to be searched for by specific data. The Categories package is responsible for managing the product categories in the application and includes the classes responsible for adding, editing and deleting categories. The VAT rates package is responsible for managing the taxes on products in the store. The classes included in the Products package are used to manage products; for proper operation, this package uses the services offered by the Manufacturers, Categories and VAT rates packages. It is responsible for adding new products, editing them, adding photos to them and searching for them. The Customers package allows customer data to be managed; it contains classes responsible for adding new customers, deleting existing ones, editing their personal data, changing passwords and finding customers. The Orders package contains the classes responsible for displaying orders and their details; it requires the Products and Customers packages to work properly.


Fig. 27 Schema of the business logic layer package of the administration panel

In the Options package, the settings of the administration panel and the online store can be configured. Figure 28 shows a package diagram for the main part of the e-commerce business logic. Each of the packages is responsible for a different part of the website's functionality.

Fig. 28 Scheme of the business logic layer package of the online store


The Login/Registration package contains classes that allow the customer to register on the website, log in to it and log out securely after finishing work. In the User Account Control package, a registered website user can edit his personal information and view the orders associated with his account. The Product Browsing package is responsible for providing the customer with a list of products, displaying them properly on the store pages and dividing them into appropriate categories. The Product Ordering and Cart Management packages contain classes that allow the customer to add products to the cart and order them later.
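To make the layering more concrete, the sketch below shows what a class in the business-logic layer of, for example, the Categories package might look like. The class, method and entity names are assumptions rather than the project's actual code; only the overall flow (view → service → Hibernate session → database) follows the description above.

```java
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Business-logic service called from the presentation layer (Struts actions / JSP views).
public class CategoriesService {

    private final SessionFactory sessionFactory;

    public CategoriesService(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Adds a new category; Category is an assumed Hibernate-mapped entity.
    public void addCategory(String name) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Category category = new Category();
            category.setName(name);
            session.save(category);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    // Returns all categories so the view layer can render the category menu.
    @SuppressWarnings("unchecked")
    public List<Category> listCategories() {
        Session session = sessionFactory.openSession();
        try {
            return session.createQuery("from Category order by name").list();
        } finally {
            session.close();
        }
    }
}
```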

6.4 Schemes of Use

This subsection presents use case diagrams for the online store and the administration panel, which make it possible to understand the functionality of the created software in detail.

6.4.1 Online Shop

Figure 29 is a use case diagram that illustrates the functionality of the online store, in which the following actors are defined:

• user, i.e. any person using the online store,
• guest, i.e. a user who is not logged in,
• a logged-in online store user.

The user has the rights available to every person using the online store. He can view products from selected store categories as well as a specified number of the latest products. In addition, the user can view a detailed description of the products that interest him. Selected products can be added to the virtual cart, in which the products to be ordered are stored. Of course, due to the limited number of products, he cannot add more items to the cart than are in stock in the store. The user can view his order basket, remove products from it and reduce or increase their quantity. The value of the products in the cart, the value of the order, and the net and gross prices are constantly updated, so the customer always knows the price of his purchases.

Each guest has the same options as the user. Additionally, he can register. This process consists of setting up an account on the website using the entered login name, password and personal data needed to complete future orders. He can also log in to his account. If the customer wants to complete an order, he must log in.

The logged-in customer has all the options of the user. In addition, he can place an order at any time if there is at least one product in the cart. The new order is associated with his account and he can view it. The logged-in user can also download images of invoices, which are automatically generated based on his orders. In addition, the registered customer has


Fig. 29 Use case diagram for the online store

the opportunity to verify their personal data, change it and change their password on the website. After completing all activities in the store, the customer has the opportunity to log out so that he can safely end work on the website.

6.4.2 Administration Panel

Figure 30 shows a use case diagram for the online store administration panel. The main actor is the website administrator, who logs in using a login name and password. The diagram shows only part of the administrator's capabilities, namely the use cases of the customer management module. This actor can view a list of all store customers who have registered an account. He can display their details, which is useful, for example, when fulfilling a customer's order. He can also edit customer data and change the password of a customer who has forgotten it. The administrator can delete a customer who wants to cancel an account on the site, but he can only delete the data of those customers who have not made any purchase or order in the online store. He can also search for customers by specific fields, such as first name, last name, or email address.


Fig. 30 Use case diagram for the administration panel—Customers module

The administrator can also manually add a new customer to the database. For this purpose, he must fill in a form similar to the one filled in by a customer registering in the store. This option can be useful when a customer wants to place an order over the phone.

Figure 31 shows a use case diagram for manufacturer management in the online store panel. The main actor is the administrator, who can list manufacturers or search for them according to selected data, for example by name, telephone number or tax identification number. He can view the details of the selected manufacturer and edit them. The administrator also has the option to remove a manufacturer, but only if no products offered in the store refer to that manufacturer.

Figure 32 shows a use case diagram for the system administrator covering the management of the product module available in the online store. The administrator can view a list of all products offered in the online store and search for them. The product finder is extensive: products can be searched for by name, by the category to which they are assigned, and by whether a product is

Fig. 31 Use case diagram for the administration panel—Manufacturers module


Fig. 32 Use case diagram for the administration panel—Products module

currently available in the store. Products can also be found by entering a price range and quantity. The administrator can display information about a product selected from the list. He can add new products to the store and edit them, including uploading a product photo that is stored on the server. The administrator can also remove a product, provided that no customer has previously ordered it.

Figure 33 shows a diagram of the use cases of the administration panel that cover the product category module and the orders module. The actor of this diagram is again the administrator. In the category management module, the administrator can display a list of the store categories to which the products are assigned, so that products can be easily grouped and searched for in the store. After selecting a category, the administrator can edit its name or delete it. Deleting a category is only possible if it is not assigned

Fig. 33 Use case diagram for the administration panel—Categories module and Orders module


Fig. 34 Use case diagram for the administration panel—VAT rate management module and options management module

to any product. If the category cannot be deleted, a list of the products belonging to it is displayed for editing. In the order management module, the administrator can check the latest orders entered in the store and preview them. He can also download an invoice in "pdf" format with the details of the order.

Figure 34 shows a diagram of the use cases of the administration panel for the store options management module and the product VAT rate management module. The actor is again the administrator. In the store options management package, the administrator can view and modify the store configuration. For example, he can change the number of products displayed on individual store subpages and on the main page, the number of the latest products shown in the online store, and he can also add contact information for the store. In the VAT rate management package, the administrator can manage rate values, search for products with a given rate value, and modify the rates. In addition, he can remove a rate, but only if it is not assigned to any product.

6.5 Schemes of Activity

Figure 35 shows an activity diagram describing the user registration process in the online store. The first thing a customer needs to do is enter an email address, which also serves as the account login name and identifies the user in the system. The customer must also enter a password for authentication. To avoid a typing error that would prevent logging in later, the customer must confirm the password. Only after performing these activities can the customer proceed to the


Fig. 35 User registration in the online store

next phase of registration, i.e. provide his personal data, which is necessary to process the customer's orders. After this information has been entered, it is verified. If any fields are filled in incorrectly, a relevant message is displayed to the customer. If the data is entered correctly, the registration process is completed successfully and the customer data is stored in the database.

Figure 36 shows an activity diagram describing the login process and the password change for store customers. The first step is for the user to enter his login name and password on the application's website. The password is verified: a cryptographic hash is generated from it using the MD5 function and compared with the password hash stored in the database. If the login was successful, the user can change the password. For security reasons, he must re-enter his current password in the browser, then enter the new password and confirm it [33, 38]. The password confirmation is required to avoid a mistake that would prevent the user from logging in later. If all the entered data is correct, a 128-bit MD5 hash of the new password is generated and stored in the database. User passwords are stored in the database only as MD5 hashes, which makes them difficult to recover because the hash function used is one-way.
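The hashing step described above can be illustrated with a short Java sketch based on the standard java.security.MessageDigest API; the class and method names are hypothetical and do not come from the project sources.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch of the MD5-based password check described above.
public class PasswordHasher {

    // Returns the 128-bit MD5 digest of the password as a 32-character hex string.
    public static String md5Hex(String password) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
        // Format the digest as a zero-padded hexadecimal string.
        return String.format("%032x", new BigInteger(1, digest));
    }

    // Compares the hash of the entered password with the hash stored in the database.
    public static boolean matches(String enteredPassword, String storedHash)
            throws NoSuchAlgorithmException {
        return md5Hex(enteredPassword).equalsIgnoreCase(storedHash);
    }
}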


Fig. 36 Activity diagram showing the user login and password change

Fig. 37 The process of ordering products in the online store

Figure 37 shows an activity diagram describing the process of ordering products in the online store from the customer's point of view. The customer views the list of products and then adds the selected items to the virtual cart in the store. In this way, he can choose any number of products available in the store. When a customer places an order, he must be logged in on the website so that the order can be associated with his account. If the customer does not have an account on the website, he must go through the registration process and then log in to the


Fig. 38 Diagram of activities showing the process of adding a product to the cart and placing an order

website. After performing these activities, the customer places an order, which is then assigned to his account.

Figure 38 is an activity diagram illustrating the process of adding a product to the customer's cart as handled by the website. First, the customer chooses a product and adds it to the cart. The GUI layer transfers the data of the selected product to the business logic layer. It is then checked whether the required quantity of the ordered product is in stock. The next step is to assign the selected product to the customer's cart and to reduce the number of available items in stock so that other customers can no longer buy them. This restriction prevents a situation in which customers order more products than are currently available in the store; it resolves conflicts over the availability of goods. After the customer has selected the products, the next step is to place the order. The GUI layer provides the business logic layer with the customer data, on the basis of which the order is stored in the database. The products are then removed from the customer's cart. The last step is to generate an invoice, save its data in the database and display the order confirmation to the customer. If the customer does not complete the order within 30 min of the last change to the cart, the products in his cart are added back to the pool of items available in the store so that other customers can order them.
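The stock reservation and the timed release described above can be summarised in a small, self-contained Java sketch; all names are illustrative assumptions rather than the actual classes of the store.

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the cart reservation logic described above.
public class CartReservationSketch {

    static class Product {
        final String name;
        int stock;
        Product(String name, int stock) { this.name = name; this.stock = stock; }
    }

    static final Duration RESERVATION_TIMEOUT = Duration.ofMinutes(30);

    final Map<Product, Integer> cart = new HashMap<>();
    Instant lastChange = Instant.now();

    // Adds up to 'requested' items to the cart, never more than are currently in stock.
    int addToCart(Product p, int requested) {
        int granted = Math.min(requested, p.stock);
        if (granted > 0) {
            p.stock -= granted;                   // reserve items so others cannot buy them
            cart.merge(p, granted, Integer::sum);
            lastChange = Instant.now();
        }
        return granted;                            // number of items actually added
    }

    // Returns reserved items to stock if the cart has been idle for more than 30 minutes.
    void releaseIfExpired() {
        if (Duration.between(lastChange, Instant.now()).compareTo(RESERVATION_TIMEOUT) > 0) {
            cart.forEach((p, qty) -> p.stock += qty);
            cart.clear();
        }
    }

    public static void main(String[] args) {
        CartReservationSketch session = new CartReservationSketch();
        Product keyboard = new Product("Keyboard", 3);
        int added = session.addToCart(keyboard, 5);   // only 3 items can be reserved
        System.out.println("Added " + added + " items, stock left: " + keyboard.stock);
    }
}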


The presented technical documentation contains the most important requirements for the online store. The diagrams show the processes that take place in virtually every online store, both on the administration side and on the customer side. The documentation also contains a general description of the application design, which represents the structure of the entire store.

7 User Documentation

The user documentation has been divided into an administrative section, which is available to the system administrator, and a client section, which is available to all users and customers.

7.1 Management Panel

Figure 39 shows one of the windows of the administration panel of the created online store. The main menu contains the tabs Customers, Categories, Manufacturers, Products, VAT Rates, Orders and Configuration, as well as a logout button. Each of these tabs offers options that allow the administrator to manage the online store. The "log out" button is used to safely finish work in the administration panel. The Configuration tab allows the store settings to be managed. One of the available options is to fill in the online store information and the contact details visible to customers.

Fig. 39 View of the administration panel


After installing the store software on the server, the administrator must additionally add manufacturers, product categories, and the applicable VAT rates.

7.1.1 Login to the Administration Panel

To log in to the administration panel, the appropriate form on the website must be filled in (Fig. 40). After entering the correct login name and password, the administrator is logged in to the website and redirected to the home page shown in Fig. 41. If incorrect data is provided, the user is notified of this fact by an appropriate message.

Fig. 40 Log in to the admin panel

Fig. 41 Correct login to the administration panel


Fig. 42 Main window of manufacturers management

7.1.2 Manufacturer Management

On the panel, the administrator can manage manufacturers. To do this, he selects the Manufacturers tab. The main window displays a list of manufacturers sorted alphabetically by name. If there are more than ten manufacturers (the default setting), they are divided into pages for easier browsing. Figure 42 shows the main manufacturer management window. The administrator can also view and edit the data of a selected manufacturer and delete it. On the Find Manufacturer tab, the administrator can search for manufacturers according to the following criteria: name, tax identification number, and phone number.

7.1.3 Category Management

On the Categories tab, the administrator can add new product categories as well as edit and delete existing ones. Figure 43 shows the main category management view in the administration panel. It displays a list of categories with buttons for each of them, so that an existing category can be edited or deleted. To add a new category, the Categories tab is selected in the main menu, followed by the New Category option. A form then appears in which the name of the new category should be entered and confirmed with the "Save" button.

7.1.4 Management of VAT Rates

On the VAT Rates tab, the administrator can add new VAT rates as well as view, edit and delete existing rates. Only rates to which no products are assigned can be removed.


Fig. 43 Main category management window

Fig. 44 The main window for managing VAT rates

Figure 44 shows the main window for managing the VAT rates of products in the online store. Furthermore, in the Find Rate option, the administrator can search for a particular VAT rate by entering its value.

7.1.5 Product Management

To add new products, you need to add their manufacturers, create the appropriate categories in the store, and add the appropriate VAT rates. Figure 45 shows the


Fig. 45 The main product management window

main product management window offered in the online store. The administrator can select a product to view or edit detailed data, as well as to delete the product from the database. To add a new product, select the New Product option and then fill out the form shown in Fig. 46. In the next step, select the product photo file from your computer disk, and then save the changes. The administrator also has the ability to search for products with specific parameters, such as: product name and category, availability, price range, and the number of items available in stock. Sample search results are shown in Fig. 47.

Fig. 46 Adding a new product


Fig. 47 Product search results

Fig. 48 Management of registered shop customers

7.1.6 Customer Management

On the Customers tab, the administrator can manage the registered customers of the store. Figure 48 shows the main view of this functionality. It contains a list of store customers in which the administrator can view user data, edit it, and delete a selected user's account. In addition, the administrator can change any user's password, which can be useful if a password is lost. On the New Customer tab, new customers can be added to the database, while in the Find Customer item the administrator can search for customers by first name, last name, phone number, and login (email address).

7.1.7 Order Management

The Order Management tab contains a list of orders entered into the store by customers. The administrator can review new orders and preview products ordered


Fig. 49 Sample order placed in the store

in them and customer information. In addition, a “pdf” file with an invoice that can be printed is generated for each order. Figure 49 shows an example of an order placed in an online store.

7.2 Online Shop

Figure 50 shows one of the online store pages. At the top of the page is a menu with hyperlinks to the pages where an account can be created or the user can log in. On the left side there is a panel in which the main categories of the store are displayed, and below them the contact details of the online store. On the right side is the basket to which the products the customer wants to order are added; it displays the number of stored products and the value of the order. Below it, the most recently added products available in the store are shown. In the middle of the page, the products corresponding to the selected category are listed, each with a short description, the name of the manufacturer, the price and a photo. To the right of the product description is the availability bar, which is green if there are many items left, yellow if there is a medium amount, and red if there are fewer than 10 items. Each product is accompanied by buttons with which it can be added to the cart and its details viewed.


Fig. 50 Example of an online store page

7.2.1 User Registration and Login

To create a new user account in the online store, select "registration" from the menu. In the next step, the email address and password must be entered in the form, and the password must be repeated for verification. If any field is filled in incorrectly or an account has already been created for the entered email address, a corresponding message is displayed. The user then moves to the next page, where a form with personal information must be filled out. If it is a business account, a tax identification number must also be provided. After the account has been set up correctly, the user can log in by selecting the Log in option from the menu and filling in the form with the data entered during registration.

7.2.2 Ordering Products

When ordering products, it is not necessary to log in to the online store first. You can add products to the cart and log in before placing an order, without having to add them again. To add a specific product to the cart, select the appropriate category or select it from the list of newest products, if any. If the category is displayed, one product item will be added after clicking the Add to cart button. Click the Show


Fig. 51 Preview of the order cart

Details button to add more. After the page with the detailed product description has loaded, the number of pieces the customer wants to order can be entered. The available stock quantity of each product is listed. If the customer enters more items than are available, the largest possible number of items is added to the cart. After a product has been added to the cart, its contents are displayed; an example of the contents of the basket is shown in Fig. 51. In the basket it is also possible to increase and decrease the quantity of a product, as well as to remove a product from the basket completely. The net price, tax and gross price are calculated for each order. If the customer is logged in, his order is processed after the Execute order button is pressed and the cart is emptied.

7.2.3 User Account Management

On the My Account tab, the user can verify and update his personal information. He can also change the password used to access the website there. On the My Orders tab, the customer can view a list of the orders he has placed and can download an invoice image in "pdf" format. A view of the order panel is shown in Fig. 52. The user documentation describes the most important configuration options of the administration panel and the online store. The documentation consists of two separate parts, the first for the system administrator and the second for the store's customers. The most important options connected with the operation of the shop, which facilitate its use, have been illustrated.


Fig. 52 List of customer orders

8 Summary

The introductory chapters describe how important all types of transactions over the Internet have become in our lives. Nowadays, almost anything can be bought online without leaving home. The ordered goods are delivered by courier companies, sometimes within a few hours of placing the order. It is a very convenient and fast way to buy, often at lower prices than in stationary stores. Because of this, more and more people are attracted to it and use it.

Creating an online store application is not an easy task, as it has to be adapted to the requirements of different users. For customers, it is important to have convenient navigation between store pages and an intuitive, easy-to-use interface. The appearance of the store is also important, as it should encourage potential customers to buy. Every product in the store should be presented in a way that attracts attention; it should also have a photograph and a detailed description. From the point of view of the system administrator, the administration panel must allow all possible store options to be managed and, in addition, its use should be intuitive and convenient. Its clarity and the grouping of options into categories are important. The created store application can be used to open any online store. Its appearance can easily be adapted to the requirements of customers and to the range of the store.

The e-commerce software was written in Java using a three-tier application architecture. This made it possible to separate the database abstraction layer from the application's business logic, which allows the developed software to use any database supported by the Hibernate framework. Speed of operation is important both for the administration panel and for the online store. The implemented store was based on the SQLite database, which is shared by both parts. However, it turned out that this database is definitely slower than another popular database server, namely MySQL; in performance tests, it was even several tens of times slower. This means that with a large number of customers using the store at the same time, this database would not perform well. Customers would notice a longer time to load the store page, which could discourage them from


making further purchases. Due to the small size of the SQLite database and its low hardware requirements, it can be used in desktop applications used by individual users and in embedded systems. The tools used significantly facilitated the implementation of the project. It was very important to prepare detailed documentation and UML diagrams in advance, which made it easier to understand how the online store and the administration panel work, as well as to implement them. The store application can be further developed with new features, but in its current phase it is already possible to run the fully functional online store software and its management panel. If the software were introduced for everyday use, it would be worthwhile to integrate it with online payment systems and price comparison websites, where customers have recently been searching for products more and more often.
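As an illustration of the database independence mentioned above, the following sketch shows how the persistence layer could be pointed at SQLite or MySQL purely through Hibernate configuration properties; the SQLite dialect and driver class names refer to third-party packages and are given only as assumptions, not as the configuration actually used in the project.

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Sketch: swapping the underlying database via Hibernate configuration only.
public class SessionFactoryBuilder {

    public static SessionFactory forSqlite(String dbFile) {
        Configuration cfg = new Configuration();
        cfg.setProperty("hibernate.connection.driver_class", "org.sqlite.JDBC");              // Xerial driver (assumed)
        cfg.setProperty("hibernate.connection.url", "jdbc:sqlite:" + dbFile);
        cfg.setProperty("hibernate.dialect", "org.sqlite.hibernate.dialect.SQLiteDialect");   // third-party dialect (assumed)
        return cfg.buildSessionFactory();
    }

    public static SessionFactory forMysql(String url, String user, String password) {
        Configuration cfg = new Configuration();
        cfg.setProperty("hibernate.connection.driver_class", "com.mysql.cj.jdbc.Driver");
        cfg.setProperty("hibernate.connection.url", url);          // e.g. jdbc:mysql://localhost:3306/shop
        cfg.setProperty("hibernate.connection.username", user);
        cfg.setProperty("hibernate.connection.password", password);
        cfg.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
        return cfg.buildSessionFactory();
    }
}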

References

1. Szpringer, W.: Handel elektroniczny—konkurencja czy regulacja. Difin, Warszawa (2000)
2. Loshin, P., Vacca, J.: Electronic Commerce, 4th edn. Charles River Media, USA (2004)
3. http://www.amazon.com/
4. http://www.ebay.pl/
5. http://www.totu.com/
6. http://allegro.pl/
7. Kyciak, W., Przeliorz, K.: Jak założyć skuteczny i dochodowy sklep internetowy. Helion, Gliwice (2006)
8. Jaślan, M.: Sklepy w komórce będą inne. Twoja Komórka, red. Marcin Kwaśniak, Warszawa 2009, nr 12/142, s. 20–22
9. Polska w liczbach [Online]. Zakład Wydawnictw Statystycznych, Warszawa (2010). http://www.stat.gov.pl/cps/rde/xbcr/gus/PUBL_f_polska_w_liczbach_2010.pdf, s. 20–21
10. Grzegorz, S., et al.: Ecommerce 2010 [online], edn. 4. International Data Group, Poland SA (2010). http://www.internetstandard.pl/whitepapers/1521/Raport.ecommerce.2010.html
11. Piotr, J., et al.: Raport e-Handel Polska 2009 [online]. Dotcom River (2009). https://www.sklepy24.pl/pobierz_raport_e-handel_2009
12. http://www.magentocommerce.com/
13. http://www.prestashop.com/
14. Poniszewska-Marańda, A.: Security constraints in access control of information system using UML language. In: Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE-2006) (2006)
15. Stępień, K., Poniszewska-Marańda, A.: Towards the security measures of the vehicular ad-hoc networks. In: Skulimowski, A.M.J., et al. (eds.) Internet of Vehicles. Technologies and Services Towards Smart City, IOV 2018, LNCS 11253, pp. 233–248. Springer-Verlag, Heidelberg (2018). https://doi.org/10.1007/978-3-030-05081-8_17. ISSN 0302-9743, ISBN 978-3-030-05080-1
16. http://www.iai-shop.com/
17. http://www.istore.pl/
18. Schildt, H.: Java. Kompendium programisty. Helion, Gliwice (2005)
19. Molnár, E., Molnár, R., Kryvinska, N., Greguš, M.: Web intelligence in practice. Soc. Serv. Sci., J. Serv. Sci. Res., Springer 6(1), 149–172 (2014)


20. Liderman, K., Arciuch, A.: Projektowanie systemów komputerowych. BEL Studio Sp. z o.o. (2001)
21. Dai, N., Mandel, L., Ryman, A.: Eclipse Web Tools Platform. Tworzenie aplikacji WWW w języku Java. Helion, Gliwice (2008)
22. Kryvinska, N., Gregus, M.: SOA and its Business Value in Requirements, Features, Practices and Methodologies. Comenius University in Bratislava (2014). ISBN 9788022337649
23. Yourdon, E.: Marsz ku klęsce. Poradnik dla projektantów systemów. WNT, Warszawa (2000)
24. Hemrajani, A.: Java. Tworzenie aplikacji sieciowych za pomocą Springa, Hibernate i Eclipse. Helion, Gliwice (2007)
25. Rychlicki-Kicior, K.: Java EE 6. Programowanie aplikacji WWW. Helion, Gliwice (2010)
26. Patkowski, A.E.: Dokumentowanie procesu projektowania. Biuletyn IAiR, str. 57–70. WAT, Warszawa (1998)
27. Gregus, M., Kryvinska, N.: Service Orientation of Enterprises—Aspects, Dimensions, Technologies. Comenius University in Bratislava (2015). ISBN 9788022339780
28. Kaczor, S., Kryvinska, N.: It is all about services—fundamentals, drivers, and business models. Soc. Serv. Sci., J. Serv. Sci. Res., Springer 5(2), 125–154 (2013)
29. Shore, J., Warden, S.: Agile Development. Filozofia programowania zwinnego. Helion (2008)
30. Poniszewska-Marańda, A.: Access control coherence of information systems based on security constraints. In: SafeComp 2006: 25th International Conference on Computer Safety, Security and Reliability, LNCS 4166, pp. 412–425. Springer-Verlag, Heidelberg (2006)
31. Kryvinska, N.: Building consistent formal specification for the service enterprise agility foundation. Soc. Serv. Sci., J. Serv. Sci. Res., Springer 4(2), 235–269 (2012)
32. Liderman, K.: Formalizacja procesu pozyskiwania informacji dla potrzeb specyfikacji wymagań na projektowany system. Zeszyt 9, WAT, Warszawa (1998)
33. Poniszewska-Marańda, A., Rutkowska, R.: Access control approach in public software as a service cloud. In: Zamojski, W., et al. (eds.) Theory and Engineering of Complex Systems and Dependability, Advances in Intelligent and Soft Computing, vol. 365, pp. 381–390. Springer-Verlag, Heidelberg (2015). ISSN 2194-5357, ISBN 978-3-319-19215-4
34. Kreibich, J.A.: Using SQLite. O'Reilly (2010)
35. http://www.sqlite.org/docs.html
36. https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager
37. http://sourceforge.net/projects/phpmyadmin
38. Poniszewska-Marańda, A.: Role engineering of information system using extended RBAC model. In: Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE-2005), Linköping, Sweden (2005)

How to Prevent Unsafe Behaviour of Employees? Explanatory Models of Insecure Behaviour at the Workplace and Prevention Methods

Valéry Wöll and Rozália Sulíková

Abstract Human error is considered to be the main cause of occupational accidents, accounting for up to 96% of them. Four models are currently cited in the German-speaking world to explain human error in the area of occupational safety. With the exception of the ABC model, which is regarded as the only holistic, scientifically proven and practicable model for explaining the causes of human error in occupational accidents, the other models are controversial or are not considered adequate when used in isolation. A quantitative literature analysis of 56 legal texts, regulations and official notices from the field of occupational safety made it possible to investigate which methods are currently required by law to prevent the causes of occupational accidents in companies. It can be seen that the elements of the qualification method, with a share of approx. 76%, are the measures most frequently required by law.

Keywords Behaviour-based safety · Occupational safety · Behavioural analysis · Preventive measures · Occupational accident · Occupational safety laws

V. Wöll (B) · R. Sulíková
Faculty of Management, Comenius University in Bratislava, Odbojárov 10, P.O. Box 95, 820 05 Bratislava 25, Slovak Republic
e-mail: [email protected]
R. Sulíková
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_16

1 Introduction

Workplaces in Germany are still far from being safe places. In 2019, 806 fatal occupational accidents and over 1,000,000 reportable occupational accidents occurred, representing a 10.41% increase in fatal occupational accidents compared with the previous year [1, p. 14]. In addition to the suffering of the persons affected and their families, the economic damage caused each year by occupational accidents must also be considered. With approximately 75 million working days lost as a result of accidents in 2018 [2, p. 2], the German economy lost around 15.4 billion euros in gross value added. The prevention of occupational accidents and the systematic organisation of


accident prevention measures are not only required by law [3], they are also part of sustainable leadership and management activities in all companies [4, p. 4]. This chapter aims to summarise the current state of research regarding the causes and possible prevention of occupational accidents and to identify research gaps. The following research questions should be answered:

• Question 1: What are the main causes of occupational accidents?
• Question 2: What theoretical models are used in the scientific literature today to explain the main causes of occupational accidents?
• Question 3: What methods, based on the theoretical models, are currently used by companies to eliminate the main causes of occupational accidents?

2 Methods

In order to identify the main causes of occupational accidents and to clarify the question of which theoretical models are used to explain human error in the context of occupational safety, a literature search was conducted according to Webster and Watson [5]. Using the databases JSTOR, Google Scholar and Google, scientific literature was collected, synthesised and analysed in terms of content. The following keywords were used to search for the appropriate literature: occupational accident; risk assessment; hazardous conditions; cause of accident; behaviour; human error; behavioural analysis; reinforcement; behavioural occupational safety. The research focused on texts in German in order to obtain a representative picture of the literature on occupational safety in the German-speaking world. The criteria used for relevance categorisation and synthesis were content consistency with the topic and Google Scholar's relevance ranking. A total of 49 scientific sources and 15 official publications were examined in terms of content. The statements of various authors on models and methods for preventing human error in the field of occupational safety were then evaluated in a structuring qualitative content analysis according to Mayring and Fenzl [6, pp. 543–558].

Since no complete information on the use of specific occupational safety methods in companies could be found in the scientific literature, 56 laws, regulations and official instructions for action on risk assessment were included in a quantitative literature analysis. The OSH laws that were classified as authoritative by the authorities were used [7, p. 24]. In addition, all DGUV (Deutsche Gesetzliche Unfallversicherung) regulations were used that are located below the laws, on the 2nd level of the OSH regulatory hierarchy. Including the 3rd level (DGUV rules, information and principles) alone would have increased the number of documents to be examined to approx. 1000 documents. In addition, the Federal Republic of Germany also has technical rules for workplaces, occupational safety, bio-materials and hazardous substances, as well as standards on technical occupational safety and health from the DIN (Deutsches Institut für Normung), VDE (Verband der Elektrotechnik) and VDI (Verein Deutscher Ingenieure) organisations. Several European regulations and directives at European level, which are implemented as regulations in the Federal


Republic of Germany, also exist. These amount to far more than 1000 additional documents. All documents are also subject to change, and new documents are issued regularly. It was therefore not possible to include all legally binding documents in the area of occupational safety. The number of texts examined was therefore limited to the 56 most relevant. The texts were subjected to a quantitative content analysis according to Kelle [6, pp. 153–165] using the maxQDA software tool. For this, 47 different codes in 4 categories were used to explore how frequently the various terms of the models and methods for preventing human error in the field of occupational safety are mentioned. In this way it should be determined which methods for the prevention of accidents the laws and regulations demand from companies. A conclusive statement on which methods are actually applied by the companies could not be made.

3 Results

3.1 The Definition of Occupational Accidents

In the German-speaking world, accidents at work are colloquially understood as accidents that an employee suffers as a result of a work activity. The legal definition by the German government in Social Code VII [8, p. 17] reads as follows: "Occupational accidents are accidents suffered by insured persons as a result of an activity giving rise to insurance cover under sections 2, 3 or 6 (insured activity). Accidents are events of limited duration that act on the body from outside and lead to damage to health or death." With this, the legislator in the Federal Republic of Germany has on the one hand extended the group of persons beyond the classic employee and on the other hand set several conditions. First of all, the event must be related to the work activity and must not occur as a result of a private act. A fall in the toilet while the employee is going to the toilet, even during working hours, is therefore not an occupational accident in the sense of the legislator and therefore does not entitle the employee to benefits from the accident insurance institutions. Furthermore, the external impact must not extend over a longer period of time but must occur suddenly. In contrast to this, a longer-lasting health-damaging event as a result of an insured activity is referred to as an occupational disease. The complete criteria for the recognition of an occupational disease are also defined in Social Code VII [8, p. 17]. To exclude events of internal cause (cardiovascular failure, embolism), reference is made to an event acting on the body from the outside. Such an event can be, for example, an electric shock or a falling tool. The last criterion aims at the fact that the event must not remain without consequences. The insured person must therefore have been objectively and verifiably harmed in some way. However, a health injury can also be, for example, an emotional trauma (for example, caused by an assault). An occupational accident without the fulfilled condition of damage to health or death is referred to in the literature as a near-accident [9, p. 6] and, according to the occupational health and


Fig. 1 The safety pyramid based on Anderson and Denkl [11]

safety act section 16 [3], must be reported at least to the supervisor even without the fulfilled condition. A near-accident has not become an accident only due to a coincidental event [10, p. 11] (Fig. 1). The safety pyramid based on Anderson and Denkl [11] illustrates the relationship between fatal accidents, serious accidents, minor accidents, near accidents and unsafe acts. Reported accidents are only the visible tip of the iceberg of unsafe acts. The figures are based on the accident figures of the Federal Republic of Germany from 2019 [1, p. 14] and projections according to Bördlein [12, p. 35]. The number of near-accidents far exceeds the number of serious accidents. For example, the safety pyramid according to Anderson and Denkl [11] describes that only a fraction of accidents actually lead to an injury. According to research, for every serious accident there are 29 minor accidents, 300 accidents without injury consequences and an unexplained but much larger number of unsafe behaviours [12, p. 35]. Considering that in the Federal Republic of Germany only occupational accidents with injuries resulting in incapacity for work of more than 3 working days are reportable, and that in 2019 more than 1,000,000 occupational accidents were reported [1, p. 14], one can extrapolate how many non-reportable accidents, how many near-accidents and unsafe acts must have occurred in the same period.

3.2 Researched Causes of Occupational Accidents

In order to explain the causes of occupational accidents, parts of the literature choose the model of the interaction of causes and hazardous conditions. The directly acting factors are counted as causes, while the hazardous conditions enable the causes to act in the first place [12, p. 17]. Accordingly, in the case of an occupational accident involving electricity, the defective power cable on the equipment would be the dangerous condition, while the worker's touching of the cable would be the cause. Other publications [13, p. 7] choose terms such as hazard and hazard source, according to which a hazard is a spatio-temporal encounter of people with a hazard


source (as a synonym for hazardous condition). Hazard factors are then listed by way of example as possible sources of danger. The different hazard factors are published in various publications as a non-exhaustive list [13] and are intended to serve experts in the detection of possible sources of danger. There is no generally valid definition of the different terms. For example, in currently valid standards the hazard factors are referred to as hazards [14, p. 21], or in other legally binding documents they are referred to as stress factors [15, p. 9]. Other literature also cites “favourable conditions”, which are added as a further factor in the cause of accidents [10, p. 15]. This then refers to factors that are not necessarily foreseeable, such as excessive fatigue of an insured person at the time of the event. A reduced ability to react due to alcohol or drug consumption could also be described as such a favourable condition. Nowadays, only human error is cited in the literature as the main cause, in the sense of a directly acting factor for occupational accidents. According to a metaanalysis by Bördlein [12, p. 19], the proportion of human error as the main cause of an occupational accident is 76–96%. However, the cited studies also did not use standardised categories, so that a uniform assessment cannot be assumed. In a study by the Federal Ministry of Transport, for example, 3 different categories were used to assess the causes of road accidents. Here, a distinction was made between “human error”, “general causes of accidents” (road conditions, weather conditions, obstacles, wildlife accidents) and “technical defects” [16]. The study found that in 2019, 91.4% of all road accidents in the Federal Republic of Germany were due to human error. An investigation of fatal occupational accidents from the years 1990 to 1995 in Berlin shows that in approx. 44.8% of all cases, occupational safety regulations were also obviously violated [17, pp. 138–144]. However, the report on occupational accidents by the DGUV [18] never refers to “human error” or behaviour in general as the cause of occupational accidents. Causes given here are, for example, “fall accident” or “trip and fall accident”. The accident report forms of the employers’ liability insurance associations do not provide for accident cause categories either. Here, rather, different types of accidents are differentiated as causes and named as “causes” of the accidents. Fahlbruch and Mayer [19, p. 18] distinguish between the categories “technology”, “organisation” and “person/behaviour” in their guide to investigating occupational accidents. Paffrath [20, p. 14] also distinguishes between the categories of technical (22%), organisational (33%) and personal (45%) causes of accidents. Organisational failure is also ultimately the misconduct of individual persons in their function, so that 78% human misconduct can also be inferred here. In its official investigation report on fatal occupational accidents, the BAUA (Bundesanstalt für Arbeitsschutz und Arbeitsmedizin) [21] distinguishes between the four categories of technology, organisation, personnel and other and thus categorises them similarly to Fahlbruch and Mayer [19, p. 18]. External environmental influences such as weather conditions are named under Other. In the investigation reports, up to 3 interacting different causes for an accident can be defined. The so-called T-O-P principle can be described as a defined model for categorising occupational accident causes [22, p. 7]. 
In the T-O-P principle, technology, organisation and person are distinguished as causes of accidents and the category other is eliminated as negligible. The T-O-P principle can thus be described as an available standard in occupational safety for defining the


causes of occupational accidents. However, this standard is not used by all actors. In conclusion, it can be stated that all the texts examined define human error as the main cause of occupational accidents.

3.3 Human Error in the Context of Occupational Safety

In the literature, there is a widely accepted model of human error according to Reason [23, p. 21] in which 3 categories of human error are distinguished. Accordingly, the errors can be differentiated according to the mode of action and the presence of intent. Hofinger [24, p. 49] introduces a fourth category, as she further divides unintentional errors into errors due to inattention and errors due to forgetfulness. These errors can also be grouped under unconscious and not intentional. The unconscious unintentional errors, also called blunders or oversights by Fahlbruch et al. [23, p. 21], can only be eliminated very poorly through training, as they do not fall into the category of conscious actions. To reduce unconscious errors, attempts are made to eliminate disturbing influences such as overtiredness, noise or poor ventilation. Hofinger [24, p. 49] also refers to these errors as attentional errors. The deliberate unintentional errors are intentional but wrong actions performed in the firm belief that a safe action is being carried out [23, p. 21]. For example, an employee might work on a rung ladder, although this has been forbidden for several years. He is well aware of what he is doing, but lacks the intention to work unsafely and violate rules, because he chooses the wrong action due to ignorance [24, p. 49]. This behaviour can be prevented by conventional training and instruction, which provide the employee with the necessary knowledge. The third group are the conscious and deliberate mistakes that are made because the employee expects positive consequences from his behaviour. For example, the work can be done faster, uncomfortably oppressive PPE (Personal Protective Equipment) does not have to be worn, or the employee saves himself additional work. Hofinger [24, p. 49] also refers to these errors as violations or sabotage, because strictly speaking they are not errors; in this case, the employee wants to perform an unsafe action. Since all the publications examined assume human error, in the sense of a directly acting factor, to be the main cause of occupational accidents without specifying to which category of human error the error belongs, no further distinction between the individual categories is made here. However, from the author's point of view, it must be considered a research gap that the share of the respective error categories from Table 1 in the causes of accidents is not known. According to Hofinger [24, p. 51], no clear result can be determined as to which of the error categories is decisive.


Table 1 Shows the different categories of human error based on Fahlbruch et al. [23] and Hofinger [24] in connection with the causes and possibilities of influencing them

Category | Consciously | Premeditated | Cause | Influence
Unconscious unintentional mistakes | no | no | Poor attention, excessive demands, lack of perception, forgetfulness | Design solutions, assistance, elimination of negative influences
Deliberate unintentional errors | yes | no | Lack of knowledge, inadequate training | Training, instruction
Deliberate intentional errors | yes | yes | Pleasant consequences due to the misconduct | Motivation to change behaviour

3.4 Explanatory Models for Causes of Human Error in Occupational Safety and Health

The literature search revealed the existence of 4 different models to explain the causes of human error in occupational safety (Table 2). In the following, the different models and their evaluation in the literature will be presented. In the personality model, it is assumed that employees with certain personality traits are disproportionately often involved in accidents at work. Influences from environmental conditions are defined as secondary [12, p. 24]. Among the critical voices against the personality model is Bördlein [12, p. 24], who argues that in the model the victim of the occupational accident is in a sense blamed for the accident. According to BG (Berufsgenossenschaft) Verkehr [22, p. 41], studies on personality traits that promote accidents are inconsistent and partly contradictory, so that the personality model is currently no longer considered a suitable model for explaining the cause of accidents, despite partly increased correlations between personality types and accident frequencies. In addition, it is legally problematic to treat an employee differently on the basis of personal characteristics such as age and gender or character traits. An employer would probably even be liable to prosecution here in some cases, as it violates the General Equal Treatment Act [27]. Studies have also shown that if employees with a high accident propensity are transferred out of an area, other employees in that area suffer a similarly high number of accidents [12, p. 22]. However, there are also representatives in the literature who certainly describe the personality model as a suitable approach. Lengwiler [26, p. 358] writes on the condemnation of the personality model by parts of the literature that individualising responsibility-assigning concepts such as the personality model must be rejected by social insurance funds, because as general social institutions they do not have the possibility to use the method of personnel selection. They are forced to treat and support every member equally. Insurance companies, on the other hand, which are


Table 2 Shows the models for causes of human error in occupational safety and links the models to the methods of occupational safety for increasing the safety-conscious behaviour of employees

Model | Cause of human error | Derived method of occupational safety | Actions against human error
Personality model | The cause of the misbehaviour lies in the characteristics of the personality | Personnel selection | Dismissal, non-recruitment, or transfer of employees to other departments
Monitoring model | The cause of the misconduct is due to insufficient supervision | Police method | Increase the likelihood of detection of misconduct and sanction it
Information model | The reason for the misconduct is insufficient knowledge and lack of information availability | Qualification method | Providing all necessary information and increasing the qualification of the staff
ABC model | The cause of the misbehaviour lies in the preceding conditions and in the predominantly positive consequences of the misbehaviour | Behavior Based Safety (BBS) | Motivation for safe behaviour through systematic goal-oriented design of favourable preceding conditions and positive consequences for safe behaviour

The table was developed based on Bördlein [12], Schaper [25] and Lengwiler [26]

able to select personnel, have been using bonus-malus systems for years, thus individualising the risks and the associated costs. Competitive thinking, quick action, hostile perceptions of the environment and external control beliefs, for example, are cited as promoting characteristics for a tendency to have accidents. Various studies have demonstrated correlations between these different personality types and accident frequencies [25, p. 496]. Personal hereditary characteristics such as gender also obviously play a role as a factor in safety-related behaviour. For example, 77.8% of speeding offences and 83.8% of alcohol-related road traffic offences are committed by Men [28]. In some professions, such as that of air traffic controller, defined personality traits such as a high level of attention and concentration are still hiring requirements today. These personality traits are tested in tests before hiring the appropriate personnel. In particular, correlations of low social compatibility and accident proneness are also shown [29, p. 7]. In contrast to earlier research, it is nowadays no longer assumed that most of the personality traits that cause an increased accident propensity are unalterable personal characteristics. It is now assumed in occupational and organisational psychology that it is changeable temporary personality traits that produce an increased accident propensity [26, p. 353]. Many authors [12, 22, 25, 30, 31] currently also cite the improvement of safety culture as a tool to prevent accidents. If one understands culture as the sum of behaviours, attitudes and values


[32, p. 54], this means nothing other than that one wants to change personality traits such as the willingness to take risks, the sense of order or the need for safety of one’s employees in a certain direction. In conclusion, it can be said that there is no unanimous opinion in the scientific literature on the effectiveness of the personality model and personnel selection. It is undisputed that the personnel selection model alone, without further actions, does not lead to success. The monitoring model assumes that misconduct is stopped by detecting and sanctioning it during inspections [12]. The cause of the misconduct is not questioned here and is not the focus of the model. A decrease in human error is achieved by more frequent controls and more drastic penalties for deviations and by increasing the probability of catching deviants. In fact, several studies have shown that increasing surveillance leads to a decrease in deviant behaviour [33, p. 9]. It should be noted, however, that there are several disadvantages associated with the surveillance model. Firstly, the financial cost of constant monitoring and control is high [12, p. 29] and secondly, the permanent monitoring and sanctioning of employees does not contribute to a positive working climate [34, p. 27]. When applying the monitoring model, it must also be considered that employees as a resource are not arbitrarily replaceable and available. Legal hurdles severely limit the sanctioning of employees for misdemeanours and permanent monitoring of employees is prohibited by law [35]. Furthermore, research has shown that even with prison sentences, the recidivism rate is on average over 36% [36, p. 233]. It is important to mention that sanctioning may not be completely omitted, however, as according to section 6 of the Occupational Health and Safety Act, the employer must check the effectiveness of its measures to prevent occupational accidents and must initiate further measures if its previous measures are not effective [3, p. 2]. Thus, if the employer knows that one of his employees repeatedly and intentionally violates OSH (Occupational Safety Health) regulations, he must sanction him so that he is not himself guilty of an administrative offence or even a criminal offence. The method of monitoring employees is also explicitly required in some legal regulations [37, p. 10]; [3, p. 2]. Without sanctions, therefore, legally compliant and effective occupational safety and health is inconceivable. However, due to the disadvantages of the model already mentioned, an entrepreneur will probably not be successful with monitoring and sanctions alone. In conclusion, it can be stated that the literature consistently rejects the exclusive monitoring and sanctioning of employees but regards the police method as indispensable as a supplementary method to other methods. The information model assumes that unsafe behaviour results from a lack of information and that employees act safely as soon as they have sufficient information. Retrospective accident analyses have shown that in approx. 21% of the accidents the corresponding information about emerging hazards was not available to a sufficient extent and in 36% of the accidents risks were not recognised [25, p. 503]. Knowledge of safe behaviour is therefore a necessity for the execution of safe behaviour, but the information alone about which behaviour is safe and desirable does not generate safe behaviour. 
In 2019 alone, over 3 million people in Germany were punished for speeding with their vehicle [28]. It can be assumed that most drivers knew that they were violating the applicable safety rules at that moment and did so in spite of everything.


For this reason, the information model alone cannot be suitable for generating safe behaviour. However, the information model is explicitly applied in various laws, even if it is not described and named as a model in the practice-relevant literature. The obligation to instruct employees on hazards and safety measures is nothing other than the obligation to implement the qualification method on the basis of the information model [38, p. 7]. Many regulations [39, p. 21; 40, p. 11] also require the issuing of information documents such as safety data sheets, operating instructions or instruction manuals, so that the information model is a supporting pillar of preventive occupational safety and health. As a conclusion, it can be stated that a sufficient supply of information to employees is imperative. However, as an isolated model without the support of other models, the information model can neither explain nor prevent human error.

The ABC model from behavioural analysis represents the links between behaviour and environmental conditions. Bördlein [12, p. 393] describes the model as the only holistic, scientifically proven and practicable model for explaining the causes of human error in occupational accidents. The occupational safety authorities of the Federal Republic of Germany also use this model to explain the causes of accidents [22, p. 48]. The ABC model also incorporates the monitoring model, the information model and the personality model, as it includes the elements of monitoring, information and sanctioning as "antecedent conditions" and "consequences" in its overall model. According to Paffrath [20, p. 53], the BBS (Behavior Based Safety) approaches have proven their worth in occupational safety and health. As shown in Table 3, the ABC model is described by all authors as effective in preventing human error in OSH. The information model and the monitoring model are described as partially effective, but disadvantages and weaknesses are also cited by the authors. With these models, the authors do not assume that they can reduce human error to a large extent in isolation as a single model. The personality model must be considered quite controversial, as some authors reject it completely, while other authors evaluate the model positively. Since all the authors examined evaluated the ABC

Personality model

Information model

Monitoring model

ABC model

C1 positive

[26, p. 353]

[20, p. 14]

[20, p. 139]

[12, p. 393] [41, p. 23] [20, p. 53] [22, p. 48]

C2 indiscriminate

[25, p. 496] [29, p. 7]

[12, p. 26] [25, p. 503] [23, p. 36]

[12, p. 29] [34, p. 27]

C3 negative

[22, p. 41] [12, p. 24]

The table shows which models and methods receive full agreement from the various authors (C1), which models and methods receive partial agreement (C2) and which models and methods are rejected (C3)

How to Prevent Unsafe Behaviour of Employees? Explanatory …

509

model and the BBS method extremely positively, the ABC model will be examined more closely.

3.5 The ABC Model of Behavioural Analysis in the Context of Occupational Safety

In the Encyclopaedia of Psychology, Kaiser defines behaviour as “any form of motor activity” [42], referring to overt conscious reactions that are to be distinguished from purely involuntary and neuronally mediated reflexes. In behaviourism and the behavioural analysis based on it, behaviour is divided into overt and covert behaviour, with overt behaviour subsuming all visible activities (laughing, talking, crying) and covert behaviour subsuming the invisible activities that are only self-perceptible (thinking). Behavioural analysis as a science attempts to explain behaviour and relates it to the preceding conditions and to the consequences caused by the behaviour [12, p. 53].

The antecedent conditions (A – Antecedents) are the environmental influences that affect the person before the behaviour. In occupational safety, the following factors could be listed as examples: the safety culture of the company, the availability of PPE, the level of knowledge of the employee, the training of the employee, the amount of time pressure, the noise level in the work area, the mental state of the employee, the presence of the supervisor, and the consequences suspected by the employee. These conditions lead to behaviour (B – Behaviour), which is the employee’s reaction to the preceding conditions and the expected consequences, for example: not putting on the hearing protection; putting on the hearing protection. The consequences (C – Consequences) are the environmental reactions to the employee’s behaviour. Here, for example, the following reactions could be possible: approval by colleagues, a warning by the supervisor, no consequences at all (ignoring), attenuation of the noise level by the hearing protection, or hearing damage caused by noise (Fig. 2).
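To make the structure of the model concrete, the following minimal Python sketch (not part of the original chapter; all names and example values are illustrative only) records one observation in the A-B-C scheme, using the hearing-protection example from the text:

from dataclasses import dataclass
from typing import List

@dataclass
class ABCObservation:
    """One observed behaviour, linked to its antecedents and consequences."""
    antecedents: List[str]   # A - conditions present before the behaviour
    behaviour: str           # B - the employee's reaction
    consequences: List[str]  # C - environmental reactions to the behaviour

# Hearing-protection example from the text (illustrative values only)
obs = ABCObservation(
    antecedents=["noise level in the work area", "hearing protection available",
                 "supervisor not present", "time pressure"],
    behaviour="does not put on the hearing protection",
    consequences=["no reaction from colleagues (ignoring)",
                  "no discomfort from wearing PPE",
                  "long-term risk of hearing damage"],
)
print(obs.behaviour, "->", obs.consequences)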

Fig. 2 The extended ABC model based on Bördlein [12, p. 59] and Messner [30, p. 8] illustrates the connection between preceding conditions, behaviour and consequences, which influence each other to varying degrees

3.6 Pre-existing Conditions of Employee Behaviour

In the context of desired behaviour, prior conditions can be divided into 2 categories:
• absolutely necessary prior conditions
• possible preceding conditions.

Absolutely necessary conditions are conditions that make behaviour possible in the first place and without which it is not possible for the employee to show safe behaviour [12, p. 65]. The following conditions can be listed as examples:
• The employee must have information about what behaviour is desired.
• The employee must be cognitively and physically able to perform the behaviour.
• Required work equipment and personal protective equipment must be available.

The information about the desired behaviour must have been presented beforehand in an understandable way and not, for example, in an incomprehensible text form. If a German employee is to be informed about the correct use of the prescribed PPE when using a hazardous substance, it is not sufficient to hand him a safety data sheet in English in accordance with the REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) Regulation. The employer must provide him with this information in writing, in a clearly understandable manner, as operating instructions in accordance with the Industrial Safety Ordinance [40, p. 11] and must also instruct him practically with the corresponding PPE. Only if the employee has evidently understood this practical instruction may he work with the hazardous substance at all [39, p. 21]. The instruction must also not be older than one year. The employee must then, of course, have the PPE available so that he can actually use it.

Antecedent conditions that are possible but not absolutely necessary can be the safety culture in the company or the training intensity of the employees. Environmental conditions such as the weather, the time of day, lack of sleep or work pressure are also preceding conditions; in short, anything that can influence the employee’s behaviour. The preceding conditions reappear downstream as consequences. The preceding condition “storage location of PPE” later becomes the consequence “fetch PPE from storage location”. If this storage location is 10 m or 500 m away from the workplace, this results in very different consequences for the employee. The preceding conditions therefore become the natural consequences.


3.7 Behaviour and Consequences

People’s behaviour is shaped by consequences. The behaviour shown occurs more frequently after positive consequences and less frequently after negative consequences [12, p. 84]. Nevertheless, an employee does not always behave safely because, firstly, different consequences can follow a behaviour and, secondly, different consequences are evaluated differently by people. Even the exact same consequence for the same behaviour (passing by the office → meeting colleague X) can be assessed positively by colleague A (likes colleague X) and negatively by colleague B (does not like colleague X). Negative consequences can also be accepted because the positive consequences are perceived as more valuable by the person acting. In order to permanently change behaviour, consequences must be linked to it [12, p. 73]. Consequences should therefore be considered in more detail.

According to Linderkamp [43, p. 211], consequences are divided into 4 categories, which also describe their direction and mode of action. Reinforcers reinforce a displayed behaviour by providing the organism with positive feedback from the environment on its behaviour. Positive reinforcement (C+) rewards a behaviour by adding positive stimuli such as money or positive feedback. Negative reinforcement (C−) rewards a behaviour by taking away negative stimuli, such as cutting overtime or not giving admonishments. Punishment works against the behaviour shown and can be applied by adding negative stimuli (positive punishment) such as fines or warnings. Similarly, punishment can be applied by removing pleasant stimuli (negative punishment) such as cancelling a bonus payment or denying an upcoming promotion. The terms positive and negative are not used here in the sense of a valuation, but in the sense of adding (positive) and removing (negative) stimuli. In the following, positive and negative mean the effect of the consequence on the actor, as defined in Behaviour Based Safety [12, p. 17]: it is only relevant whether the consequence positively encourages the actor in his behaviour, i.e. the probability increases that the actor will behave in the same way again, or whether the consequence has negative effects on the actor and he is likely to change his behaviour in the future.

In addition, according to Bördlein [12, p. 75], a distinction can be made in occupational safety between natural and planned consequences. Natural consequences result automatically from the behaviour and the preceding conditions, whereas planned consequences are added in a targeted manner by another party. Inserted consequences such as penalties, bonuses and promotions are therefore planned consequences that only arise through active intervention in the natural processes. Natural consequences are those arising from the environment, such as unpleasant pressure from the PPE worn or reactions from colleagues.

According to Koch and Stahl [44, p. 320], two factors are relevant for linking behaviour with consequences: contiguity (temporal proximity) serves as a prerequisite for linking behaviour and consequence, and contingency (probability of consequence) is crucial for associative learning to work. Kossmann [33, p. 26] describes the general acceptance of the rules, the appropriateness of the sanction, the probability of detection of the behaviour and the immediacy of the sanction as the 4 decisive factors determining the effect of consequences. According to Bördlein [12, p. 80], it is mainly the 3 factors of temporal distance of the consequence from the behaviour, probability of occurrence and severity of the consequence that determine the value of consequences. In the following, the factors mentioned in the literature that influence the subjectively perceived value of consequences are examined in more detail.

The first factor is the time lag between the behaviour and the consequence. The effectiveness of the consequence for a behaviour change decreases with the time interval to the behaviour. In various studies, James Mazur [45, p. 191], among others, showed that the devaluation of the value of a consequence over time can be described with the so-called “hyperbolic decay model”. Even after a short time, a large devaluation of the consequence value takes place, which then slows down more and more with increasing delay. In various experiments it was also shown that in certain groups of people (alcoholics, heroin addicts), the devaluation of consequences takes place over a shorter time interval than in the normal population.

The second factor is the probability of a consequence occurring. The more likely a consequence is to occur, the higher its subjective value for the person acting. It is obvious that the consequence of a warning by a superior can only occur if the superior is on site. If the supervisor is 200 km away at the company site and the employee is alone on the construction site, the value of this consequence is zero. If the supervisor works in the neighbouring room and walks around the construction site at irregular intervals, the consequence becomes more important. In the case of permanent supervision by a consistent supervisor, the probability of the consequence can be assumed to be 100%. This realisation has also already found its way into the regulations on occupational safety, so that permanent supervision of employees is explicitly required for certain activities [37, p. 10].

The third relevant factor is the severity of the consequence. It will be obvious to anyone that a fatal fall accident is a more effective negative consequence than a warning or negative feedback from the foreman. Similarly, a promotion with a salary increase is a more severe positive consequence than one-time positive verbal feedback or a company coffee mug as a gift. The appropriateness of the sanction and the acceptance of the rules cited by Kossmann [33, p. 26] can also be subsumed under this point.

The fourth factor is the human factor itself, which will not be examined in detail here, as an examination of all the individual factors of the human personality would make practical application in occupational safety impossible with the resources available. Bördlein [12, p. 80] states that, strictly speaking, one can only determine with certainty in retrospect whether a particular consequence is relevant for a person. Certain groups of people with different characteristics (age, gender, origin, level of education) also react significantly differently to the same consequences. This is probably the only explanation for the fact that accident frequencies in the same sectors differ according to age and gender [1, p. 22].
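For illustration only (the equation itself is not reproduced in the chapter), the hyperbolic decay model referred to above is commonly written as

V = A / (1 + k · D)

where V is the present subjective value of a consequence, A its undiscounted value, D the delay between behaviour and consequence, and k an individually fitted discounting parameter. Because the denominator grows linearly with the delay, the value drops steeply for short delays and then flattens out, which matches the description above; larger fitted values of k, as reported for the groups of addicts mentioned, correspond to faster devaluation.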
As regards the “human factor”, it should merely be noted that it is an absolutely necessary precondition that the employee has the mental maturity and the professional qualification to be able to assess consequences in their effect. It is not for nothing that the law already sets the preceding conditions in certain areas, so that dangerous work, such as certain activities with hazardous substances, may only be carried out by persons over 18 years of age and after prior practical instruction [46, p. 11].

It should also be noted that reinforcing positive behaviour is generally preferable to punishing negative behaviour. According to Werner and Trunk [34, p. 27], punishment will often lead to a flight or avoidance reaction, and the employee will try to escape from the punisher. Bördlein [12, p. 93] mainly lists the following problems with punishment:
• Punishment alone stops a wrong behaviour without reinforcing the right behaviour.
• Punishment leads to aggression in the punished person.
• The punisher is avoided.

Behaviour is shaped by consequences for the future: behaviour occurs more frequently in the future after positive consequences and less frequently after negative consequences [34, p. 24]; [20, p. 34]. The task of managers in occupational safety is therefore to shape consequences in such a way that employees behave safely in the future. If employees behave safely, the number of dangerous situations and the number of occupational accidents will automatically decrease. Finally, it can be added that the BBS method, which is based on the ABC model, is the most frequently studied and most successful method for changing unsafe behaviour in occupational safety and health [41, p. 23]. According to the current state of the literature, it can therefore be considered the only holistic model that can significantly reduce the main causes of occupational accidents.

3.8 Dissemination of Explanatory Models for Human Error in Occupational Safety and Health

The dissemination of the different theoretical models for explaining human error behaviour in the field of practical occupational safety in German-speaking countries must be regarded as low. Only the scientific publications by Bördlein [12], Lengwiler [26], Messner [30] and BG Verkehr [22] deal directly with different theoretical models for explaining human error behaviour in the context of occupational safety. Overall, the only publications in the German-language literature that specifically mention the ABC model in connection with occupational safety are those by Messner [30], Bördlein [12], Paffrath [20], BG Verkehr [22] and Bördlein and Zeitler [41]. The various models are not mentioned in the authoritative laws and regulations on occupational safety; only individual actions are required, which are not linked to explanatory models and the methods derived from them.


3.9 Use of Methods to Prevent Human Error in Occupational Safety

A survey that examines the use of specific methods to prevent human error in Germany does not exist in the scientific literature. Which companies use the different methods of personnel selection, monitoring, qualification and BBS, and to what extent, is a research gap that needs to be closed. Since the models are not directly mentioned in the literature with regard to the intensity of their use by companies, 47 elements from OSH were assigned as codes to the 4 different model-method constructs. The mention of the different codes of the models and methods in the 56 selected laws and regulations was then examined. The result therefore does not show which models and methods are used by companies, but which methods are required by the OSH authorities.

The data in Table 4 clearly show a strong presence of elements of the information model, and thus of the qualification method, in the laws and regulations on workplace safety in the Federal Republic of Germany. With a share of 76.4% of the codes found, the focus of the OSH required by the authorities is clearly on the information model and the qualification method. Codes of the monitoring model were found with a relative frequency of 11.9% and codes of the ABC model with a relative frequency of 11.5%. The elements of the personality model were found in only 0.3% of the cases. It is also striking that most of the codes were found in an information document of the authorities on risk assessment [63]. Although this document is an official instruction of the OSH authorities, it is not an adopted law or regulation; it is therefore only supplementary and purely informative.

A deeper analysis of the individual registered codes shows how the results are composed (Table 5). It is striking that the proper names of all models and methods developed for the prevention of human error in occupational safety are not mentioned in the legal texts and regulations. The laws and regulations only require individual elements of methods, but never the holistic application of a method or model. In the case of the ABC model, it is noticeable that although the central element of behaviour is mentioned in 53.45% of all documents, the behaviour of employees is not placed in any context with influencing factors. Consequences and motivation are only mentioned in about 5% of the publications, and elements such as praise or reinforcement are not mentioned at all. In contrast, elements of the monitoring model such as supervision (37.93%), control (32.76%) and inspection (3.45%) are mentioned in significantly more publications. The laws and regulations thus reflect that employee behaviour is considered important and causal in the context of accident prevention, yet they do not specify measures based on the application of the ABC model, which according to scientists [12, p. 393]; [20, p. 41] and the opinion of the occupational safety and health authorities [22, p. 48] is the effective model for preventing human error. For the personality model, only the warning is mentioned, and in only one publication [63, p. 67]; there, reference is made only to DGUV Information 206-009 [95] on dealing with addicted employees. It can therefore be stated that the personality model and personnel selection are almost non-existent in the laws and occupational health and safety regulations and are also not required.
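To make the counting procedure concrete, the following minimal Python sketch (not part of the original study; the code list is only a placeholder subset of the 47 codes from Table 7, and the document texts are dummies) illustrates how per-code frequencies, per-model totals and the number of documents naming a code can be derived:

import re
from collections import Counter

# Placeholder subset of the search codes and their model categories (see Table 7)
CODES = {
    "unterweisung": "Information model", "schulung": "Information model",
    "kontrolle": "Monitoring model", "überwachung": "Monitoring model",
    "verhalten": "ABC model", "konsequenzen": "ABC model",
    "abmahnung": "Personality model",
}

def count_codes(text):
    """Count occurrences of every search code in one document (case-insensitive substring match)."""
    text = text.lower()
    return Counter({code: len(re.findall(re.escape(code), text)) for code in CODES})

def analyse(documents):
    """Return per-code totals, per-model totals and per-code document coverage."""
    code_totals, model_totals, docs_naming = Counter(), Counter(), Counter()
    for text in documents.values():
        counts = count_codes(text)
        for code, n in counts.items():
            code_totals[code] += n
            model_totals[CODES[code]] += n
            if n > 0:
                docs_naming[code] += 1   # code named at least once in this document
    return code_totals, model_totals, docs_naming

# Usage with two dummy documents; the real input would be the 56 legal texts
docs = {"doc_a": "Die Unterweisung und die Kontrolle der Beschäftigten ...",
        "doc_b": "Das Verhalten wird durch Unterweisung geprägt ..."}
print(analyse(docs))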

Table 4 Shows which models and methods are required in 56 different laws and regulations on occupational safety and health in the Federal Republic of Germany

No | Name of document | Number of codes | Personality model (personnel selection) | Monitoring model (police method) | Information model (qualification method) | ABC model (Behavior Based Safety)
1 | AfB 2018 TRBS 1111 Gefährdungsbeurteilung [15] | 59 | 0 | 3 | 51 | 5
2 | BG BAU 2017 Gefährdungsbeurteilung [47] | 2 | 0 | 0 | 2 | 0
3 | BG ETEM 2020 Gefährdungsbeurteilung [13] | 60 | 0 | 2 | 55 | 3
4 | BGHM 2007 DGUV V 56 Schussapparate [48] | 14 | 0 | 0 | 12 | 2
5 | BGHM 2007a DGUV V 70 Fahrzeuge [49] | 9 | 0 | 1 | 5 | 3
6 | BGHM 2007b DGUV V 73 Schienenbahnen [50] | 19 | 0 | 0 | 6 | 13
7 | BGHM 2007c DGUV V 79 Flüssiggas [51] | 33 | 0 | 3 | 23 | 7
8 | BGHM 2012 DGUV V 3 Elektrische Anlagen [52] | 31 | 0 | 3 | 27 | 1
9 | BGHM 2013 DGUV V 15 Elektromagn. Felder [53] | 4 | 0 | 1 | 3 | 0
10 | BGHM 2013a DGUV V 52 Krane [54] | 22 | 0 | 3 | 19 | 0
11 | BGHM 2013b DGUV V 54 Winden [55] | 20 | 0 | 5 | 15 | 0
12 | BGHM 2013c DGUV V 68 Flurförderzeuge [56] | 32 | 0 | 0 | 28 | 4
13 | BGHM 2014 DGUV V 62 Maschinenanlagen [57] | 3 | 0 | 3 | 0 | 0
14 | BGHM 2014a DGUV V 64 Schwimmende Geräte [58] | 1 | 0 | 1 | 0 | 0
15 | BGHM 2015 DGUV V 1 Grundsätze [38] | 42 | 0 | 0 | 40 | 2
16 | BGHM 2016 DGUV V 2 Betriebsärzte und F. f. A. [59] | 60 | 0 | 2 | 42 | 16
17 | BGHM 2019 DGUV V 38 Baustellen [60] | 3 | 0 | 0 | 3 | 0
18 | BGHM 2020 GBU Hinweise zur Durchführung [61] | 5 | 0 | 0 | 5 | 0
19 | BGN 2020 Gefährdungsbeurteilung [62] | 11 | 0 | 3 | 4 | 4
20 | BGRCI 2020 Gefährdungsbeurteilung A017 [63] | 271 | 3 | 15 | 231 | 22
21 | Bundesregierung 1996 PSA [64] | 3 | 0 | 0 | 3 | 0
22 | Bundesregierung 2004 Arbeitsstättenverordnung [65] | 6 | 0 | 1 | 5 | 0
23 | Bundesregierung 2013a Arbeitssicherheitsgesetz [66] | 16 | 0 | 0 | 14 | 2
24 | Bundesregierung 2013 Gefahrstoffverordnung [39] | 19 | 0 | 1 | 18 | 0
25 | Bundesregierung 2015 BetrSichV [40] | 32 | 0 | 3 | 29 | 0
26 | Bundesregierung 2017 Baustellenverordnung [67] | 1 | 0 | 1 | 0 | 0
27 | Bundesregierung 2017a Biostoffverordnung [68] | 21 | 0 | 3 | 17 | 1
28 | Bundesregierung 2017b Lärm und VibrSchV [69] | 9 | 0 | 0 | 9 | 0
29 | Bundesregierung 2017c OStrV [70] | 11 | 0 | 1 | 10 | 0
30 | Bundesregierung 2019 ArbMedVV [71] | 2 | 0 | 1 | 1 | 0
31 | Bundesregierung 2019a Straßenverkehrsgesetz [72] | 87 | 0 | 38 | 42 | 7
32 | Bundesregierung 2020 JArbSchG [46] | 14 | 0 | 3 | 10 | 1
33 | Bundesregierung 2020c Lastenhandhabungsv. [73] | 2 | 0 | 0 | 2 | 0
34 | Bundesregierung 2020a SGB VII 1996 [8] | 37 | 0 | 14 | 22 | 1
35 | Bundesregierung 2020b Arbeitsschutzgesetz [3] | 26 | 0 | 13 | 12 | 1
36 | DGUV 1982 DGUV V 66 Sprengkörper Schrott [74] | 2 | 0 | 0 | 2 | 0
37 | DGUV 1996 DGUV V 60 Wasserfahrzeuge [75] | 1 | 0 | 0 | 1 | 0
38 | DGUV 1997 DGUV V 11 Laserstrahlung [76] | 5 | 0 | 1 | 3 | 1
39 | DGUV 1997a DGUV V 13 Organische Peroxide [77] | 9 | 0 | 1 | 5 | 3
40 | DGUV 1997b DGUV V 19 Schausteller [78] | 1 | 0 | 0 | 1 | 0
41 | DGUV 1997c DGUV V 20 Spielhallen [79] | 3 | 0 | 0 | 3 | 0
42 | DGUV 1997d DGUV V 21 Abwassertechnische [80] | 8 | 0 | 0 | 8 | 0
43 | DGUV 1997e DGUV V 23 Sicherungsdienste [81] | 10 | 0 | 1 | 5 | 4
44 | DGUV 1997f DGUV V 32 Kernkraftwerke [82] | 2 | 0 | 1 | 0 | 1
45 | DGUV 1997g DGUV V 42 Zelte u. Tragluftbauten [83] | 1 | 0 | 0 | 1 | 0
46 | DGUV 1997h DGUV V 48 Straßenreinigung [84] | 3 | 0 | 0 | 0 | 3
47 | DGUV 1997i DGUV V 65 Druckluftbehälter [85] | 5 | 0 | 0 | 5 | 0
48 | DGUV 1998 DGUV V 17 Veranstalltungsstätten [86] | 2 | 0 | 0 | 1 | 1
49 | DGUV 1998a DGUV V 25 Kassen [87] | 0 | 0 | 0 | 0 | 0
50 | DGUV 1998b DGUV V 29 Steinbrüche u. Halden [88] | 1 | 0 | 0 | 1 | 0
51 | DGUV 1999 DGUV V 34 Metallhütten [89] | 1 | 0 | 0 | 0 | 1
52 | DGUV 1999a DGUV V 43 Müllbeseitigung [90] | 6 | 0 | 0 | 3 | 3
53 | DGUV 2001 DGUV V 36 Hafenarbeiten [91] | 9 | 0 | 1 | 7 | 1
54 | DGUV 2012 DGUV V 40 Taucherarbeiten [92] | 24 | 0 | 1 | 11 | 12
55 | DGUV 2018 DGUV V 49 Feuerwehren [93] | 11 | 0 | 1 | 8 | 2
56 | VBG 2020 Gefährdungsbeurteilung [94] | 31 | 0 | 2 | 27 | 2
Sum of codes | | 1122 | 3 | 133 | 857 | 129
Share of codes in % | | | 0.3 | 11.9 | 76.4 | 11.5

The table shows the 4 supercategories of the 47 codes used

4 Discussion and Conclusion

The literature research showed that human error is named as the main cause of occupational accidents in all the texts researched, with shares of up to 96%. This human error can be seen in organisational failure on the part of those indirectly responsible, or in direct misconduct on the part of the persons involved at the scene of the accident. A categorisation of accident causes according to the T-O-P principle is the only standard used in the field of occupational safety. From the authors’ point of view, it is a research gap that the share of the respective categories of human error from Table 1 in the causes of accidents is unknown.

Table 5 Shows which models and methods are required in 56 different laws, regulations and instructions on occupational safety and health in Germany. For each of the 47 search codes, the table reports the category by model and method, the frequency of mention, the frequency in %, the rank by frequency, the naming in documents in % and the rank of naming in documents.

Codes registered at least once (21 of the 47): Information model – befähigung, mitteilung, wissen, sachkunde, schulung, arbeitsanweisung, betriebsanleitung, qualifikation, fortbildung, betriebsanweisung, ausbildung, unterweisung, information; Monitoring model – begehung, überwachung, kontrolle; ABC model – motivation, konsequenzen, feedback, verhalten; Personality model – abmahnung.

Codes not registered in any of the examined documents (frequency of mention and naming in documents 0.00): Information model – informationsmodell, qualifikationsmethode; Monitoring model – audit, rundgang, überwachungsmodell, polizeimethode, dupont; ABC model – abc-modell, a-b-c, verhaltensanalyse, bbs, behavior, beobachter, verstärken, loben, behavior based safety; Personality model – unfäller, unfallpersönlichkeit, unfallneigung, persönlichkeit, personalselektion, persönlichkeitsmerkmal, risikoverhalten, risikopersönlichkeit, persönlichkeitseigenschaft, persönlichkeitsmodell.

The table shows the 47 codes used in the 4 supercategories of models and methods

In the field of occupational safety, 4 scientific models for explaining human error are currently discussed in the German-speaking world. Only a few authors in the German-language literature present and examine the models in their entirety. The structured qualitative content analysis showed that the ABC model with its BBS method is the only model for explaining the causes of human error in occupational accidents that is considered suitable and effective by all the authors studied (Table 3). This model has also been adopted more recently by the employers’ liability insurance association as the basic explanatory model in the user-oriented literature [22]. The monitoring model, the information model and the personality model are assessed differently and, according to the literature examined, are not suitable for eliminating human error as the most frequent cause of occupational accidents through their methods alone. However, the three models and their methods are described in large parts of the literature as useful and, in combination with other models and measures, as effective.

Which methods according to Table 2 are currently used by companies in the area of occupational safety, and with what intensity, could not be verified through the literature research, as there are no publications on this question. It was only possible to establish that 54.7% of the companies use the risk assessment method to identify psychological stress at the workplace [96, p. 43]. In the authors’ opinion, answering the question of which methods are used by companies and to what extent is a research gap that still needs to be filled.

A quantitative content analysis of 56 laws, regulations and official guidance documents made it possible to determine the extent to which the use of the 4 methods examined from Table 2 is required by legislators to prevent human error. Here it could be clearly shown that the information model with the associated qualification method strongly dominates the other models, with a frequency of 76.4% (Table 4). In more than 60% of the texts examined, instruction is mentioned as a legal requirement in occupational safety and health. The monitoring model and the police method follow in 2nd place, by a clear margin, with a frequency of 11.9%; concrete measures such as monitoring are called for in approx. 40% of all legal texts. The ABC model and the BBS method rank 3rd with a frequency of 11.5%, but only through the frequent mention of the element of behaviour. A targeted demand for measures of the BBS method for behavioural change is only made in approx. 5% of the publications; here, motivation and consequences are named as elements of the model.

From the results it can be concluded that there is an extreme mismatch between the assessment of the functionality and effectiveness of models and methods for the prevention of human error in the context of occupational safety in the scientific literature and the requirements of the legislator as to which models and methods should be applied. It therefore seems logical to the authors that many industries and large corporations follow occupational health and safety standards such as SCC (Safety Certificate Contractors), SCL (Safety Culture Ladder) or DIN EN ISO 45001, or use methods such as BBS that go well beyond the level required by legislation, to reduce the number of their occupational accidents and lost days. Since the return on investment for


systems such as BBS is up to 10:1 according to [22, p. 48], such systems should prevail in the future, as they offer a clear competitive advantage by reducing unplanned costs due to occupational accidents.

5 Attachment

See Tables 6 and 7.

Table 6 Coding guidelines for the structuring qualitative content analysis according to Mayring and Fenzl (Table 3)

Categories | Definition | Anchor example | Coding rules
C1: Full agreement | The author describes that • the model studied is suitable for explaining human error in the context of occupational safety and health • the prevention method based on the model is effective | “BBS has been shown to be ‘by far the most successful and widely studied behaviour change programme’ (Zimolong et al. 2006) in the field of occupational safety” [41] | The author must make at least one of the statements
C2: Partial agreement | The author describes that • the model studied is of limited use in explaining human error in the context of occupational safety, and lists advantages and disadvantages • the prevention method based on the model can be effective in combination with other methods | “Various analyses of accident statistics show that the accident rate is significantly higher for certain groups of people than for others” [25, p. 496] | The author must make at least one of the statements
C3: Complete rejection | The author describes that • the model studied is unsuitable for explaining human error in the context of occupational safety and health • the prevention method based on the model is ineffective | “The fact that one person is supposed to cause accidents more often than another because of certain traits accommodates the layman’s understanding of psychology” [12, p. 24] | The author must make at least one of the statements


Table 7 Categories according to models and assigned code variables of the quantitative literature analysis (Tables 4 and 5)

Category | Search-Code | Translation
Personality model | Unfäller | no equivalent
Personality model | Unfallpersönlichkeit | Accident personality
Personality model | Unfallneigung | Accident proneness
Personality model | Persönlichkeit | Personality
Personality model | Personalselektion | Personnel selection
Personality model | Persönlichkeitsmerkmal | Personality trait
Personality model | Risikoverhalten | Risk behaviour
Personality model | Risikopersönlichkeit | Risk personality
Personality model | Persönlichkeitseigenschaft | Personality trait
Personality model | Persönlichkeitsmodell | Personality model
Personality model | Abmahnung | Warning
Monitoring model | Audit | Audit
Monitoring model | Begehung | Inspection
Monitoring model | Kontrolle | Check
Monitoring model | Überwachung | Monitoring
Monitoring model | Überwachungsmodell | Monitoring model
Monitoring model | Polizeimethode | Police method
Monitoring model | DuPont | DuPont
Monitoring model | Rundgang | Patrol
Information model | Wissen | Knowledge
Information model | Schulung | Training
Information model | Qualifikation | Qualification
Information model | Ausbildung | Education
Information model | Fortbildung | Further education
Information model | Unterweisung | Instruction
Information model | Betriebsanweisung | Operating instructions
Information model | Arbeitsanweisung | Work instruction
Information model | Mitteilung | Communication
Information model | Informationsmodell | Information model
Information model | Information | Information
Information model | Qualifikationsmethode | Qualification method
Information model | Betriebsanleitung | Manual
Information model | Sachkunde | Expertise
Information model | Befähigung | Enablement
ABC model | Konsequenzen | Consequences
ABC model | Verhalten | Behavior
ABC model | ABC-Modell | ABC model
ABC model | A-B-C | A-B-C
ABC model | Verhaltensanalyse | Behavioural analysis
ABC model | BBS | BBS
ABC model | Behavior | Behavior
ABC model | Beobachter | Observer
ABC model | Verstärken | Amplify
ABC model | Loben | Praise
ABC model | Feedback | Feedback
ABC model | Behavior Based Safety | Behavior Based Safety
ABC model | Motivation | Motivation

References 1. DGUV: Arbeitsunfallgeschehen 2019. DGUV, Berlin, Germany (2020) 2. BAUA: Volkswirtschaftliche Kosten durch Arbeitsunfähigkeit 2018. BAUA, Berlin, Germany (2020) 3. Bundesregierung: Arbeitsschutzgesetz. Bundesregierung, Berlin, Germany (2020) 4. Dellve, L., Eriksson, A.: Health-Promoting Managerial Work: A Theoretical Framework for a Leadership Program. Societies 2017, 7(2), 12. University West, Hjulkvarn, Sweden (2017) 5. Webster, J. & Watson, R.: Analysing the past to prepare for the future: writing a literature review. MIS Quaterly Vol. 26 No. 2, MIS Research Centre, Minneapolis, USA (2002), pp. 13–24 6. Baur, N., Blasius, J. (eds.): Handbuch Methoden der empirischen Sozialforschung, Mixed Methods. Springer Fachmedien, Wiesbaden, Germany (2014) 7. BGHM: DGUV V 1 Grundsätze der Prävention. BGHM, Mainz, Germany (2015) 8. Bundesregierung: Das Siebte Buch Sozialgesetzbuch SGB VII. Bundesregierung, Berlin, Germany (2020) 9. Hammer, W., et al.: Ergonomische Arbeitsplatzgestaltung zur Erhöhung der Arbeitssicherheit. Institut für Betriebstechnik, Braunschweig-Völkenrode, Germany (1990) 10. BAUA: Grundbegriffe des Arbeitsschutzes. BAUA, Berlin, Germany (2013) 11. Anderson, M., Denkl, M.: The Heinrich Triangle. Rio de Janiero, Brazil: SPE International Conference on HSE, Retrieved from: https://doi.org/10.2118/126661-MS (2010) 12. Bördlein, C.: Verhaltensorientierte Arbeitssicherheit – Behavior Based Safety (BBS), 2nd edn. Erich Schmidt Verlag, Berlin (2015) 13. BG ETEM: Gefährdungsbeurteilung – Gefährdungen und Belastungen am Arbeitsplatz. BG ETEM, Köln, Germany (2020) 14. DIN: DIN EN ISO 12100:2010 Sicherheit von Maschinen – Allgemeine Gestaltungsleitsätze – Risikobeurteilung und Risikominderung. DIN, Berlin, Germany (2010) 15. Ausschuss für Betriebssicherheit: TRBS 1111 Gefährdungsbeurteilung. BAUA, Berlin, Germany (2018) 16. BMVI: Die häufigsten Unfallursachen. www.runtervomgas.de. Retrieved from https://www. runtervomgas.de/unfallursachen/artikel/die-haeufigsten-unfallursachen.html. BMVI, Berlin, Germany (2020) 17. Schieche, C., et al.: Tödliche Arbeitsunfälle in Berlin von 1990–1995 ausrechtsmedizinischer Perspektive. Institut für Rechtsmedizin, Humboldt-Universität, Berlin, Germany (2000)


18. DGUV: Arbeitsunfallgeschehen 2018. DGUV, Berlin, Germany (2019) 19. Fahlbruch, B., Mayer, I.: Leitfaden zur Untersuchung von Arbeitsunfällen. BAUA, Dresden (2013) 20. Paffrath, D.: Reduzierung des Unfallrisikos auf Baustellen. Bergische Universität, Wuppertal (2005) 21. BAUA: Untersuchungsbogen für tödliche Arbeitsunfälle. BAUA, Berlin, Germany (2019) 22. BG Verkehr: Psychologie der Arbeitssicherheit. BG Verkehr, Hamburg, Germany (2019) 23. Fahlbruch, B., et al.: Einfluss menschlicher Faktoren auf Unfälle in der verfahrenstechnischen Industrie. Umweltbundesamt, Dessau-Roßlau, Germany (2008) 24. Hofinger, G.: Human Factors, Kap. 3 Fehler und Unfälle. Springer Medizin Verlag, Heidelberg, Germany, pp. 36–54 (2008) 25. Schaper, N., et al.: Arbeits- und Organisationspsychologie, Chapter 27 Psychologie der Arbeitssicherheit. Springer-Verlag, Heidelberg (2014) 26. Lengwiler, M.: Arbeitswissenschaften und Geschlechterverhältnis: Die Geschichte der Unfallpersönlichkeit in zwei institutionellen Anwendungsbereichen. In: Verharrender Wandel: Institutionen und Geschlechterverhältnisse. Sigma, Berlin, Germany (2004) 27. Bundesregierung: Allgemeines Gleichbehandlungsgesetz AGG. Bundesregierung, Berlin, Germany (2006) 28. KBA: Eintragungen von Verkehrsverstößen. Retrieved from: https://www.kba.de/DE/Statis tik/Kraftfahrer/Verkehrsauffaelligkeiten/Zugang%20FAER/Zugang_FAER_archiv/2019/pse udo_va_Zugang_FAER_thema_node.html. KBA, Flensburg, Germany (2020) 29. Wienkamp, H.: Anforderungsmerkmale in Theorie und Praxis. In: Psychologische Anforderungsanalysen in Theorie und Praxis. Springer, Wiesbaden (2020) 30. Messner, T.: Behavior Based Safety Praxisrelevante Aspekte. Fachhochschule Nordwestschweiz, Windisch, Switzerland (2014) 31. Landwehrs, T.: Sicherheit und Gesundheitsschutz bei der Arbeit. Bergische Universität, Wuppertal (2019) 32. Grass, P., Hille, S.: Werte und Kultur als Faktoren für den Unternehmenserfolg, Betriebspraxis und Arbeitsforschung 230. Ifaa, Düsseldorf, Germany (2017) 33. Kossmann, I.: Polizeiliche Verkehrsüberwachung, Mensch und Sicherheit M67. Verlag für neue Wissenschaft, Bremerhaven (1996) 34. Werner und Trunk: Operante Verfahren Techniken der Verhaltenstherapie. Beltz Verlag, Weinheim Basel (2017) 35. Europäisches Parlament: Datenschutzgrundverordnung. Europäisches Parlament, Brüssel, Belgien (2016) 36. Hohmann-Fricke, S.: Strafwirkungen und Rückfall. Georg August Universität, Göttingen (2012) 37. BG BAU: DGUV Regel 100-004 Arbeiten in kontaminierten Bereichen. BG BAU, Berlin, Germany (2006) 38. BGHM: DGUV V1 Grundsätze der Prävention. BGHM, Mainz, Germany (2015) 39. Bundesregierung: Gefahrstoffverordnung GefStoffV. Bundesregierung, Berlin, Germany (2013) 40. Bundesregierung: Betriebssicherheitsverordnung BetrSichV. Bundesregierung, Berlin, Germany (2015) 41. Bördlein, C., Zeitler, L.: Das Verhalten der Mitarbeiter verstehen. Sicherheitsingenieur, 03/2020. Dr. Curt Haefner-Verlag GmbH, Leinefelden, Germany, pp. 22–25 (2020) 42. Kaiser, F.: Dorsch Lexikon der Psychologie. Retrieved from: https://dorsch.hogrefe.com/stichw ort/verhalten#search=4259e896bd488cad73519c5c9380929e&offset=0, 10.12.2020. Hogrefe AG, Bern, Schweiz (2019) 43. Linderkamp, F.: Operante Methoden. In: Lehrbuch der Verhaltenstherapie, p. 211. Springer, Heidelberg (2009) 44. Koch und Stahl: Lernen – Assoziationsbildung, Konditionierung und implizites Lernen. Springer Verlag, Heidelberg, Germany, p. 337 (2017)


45. Mazur, J.E.: Risky choice: selecting between certain and uncertain outcomes. Behav. Anal. Today 5(2), 190–203 (2004). https://doi.org/10.1037/h0100031 46. Bundesregierung: Gesetz zum Schutze der arbeitenden Jugend JArbSchG. Bundesregierung, Berlin, Germany (2020) 47. BG BAU: Gefährdungsbeurteilung. BAUA, Berlin, Germany (2017) 48. BGHM: DGUV V 56 Schussapparate. BGHM, Mainz, Germany (2007) 49. BGHM: DGUV V 70 Fahrzeuge. BGHM, Mainz, Germany (2007) 50. BGHM: DGUV V 73 Schienenbahnen. BGHM, Mainz, Germany (2007) 51. BGHM: DGUV V 79 Verwendung von Flüssiggas. BGHM, Mainz, Germany (2007) 52. BGHM: DGUV V 3 Elektrische Anlagen. BGHM, Mainz, Germany (2012) 53. BGHM: DGUV V 15 Elektromagnetische Felder. BGHM, Mainz, Germany (2013) 54. BGHM: DGUV V 52 Krane. BGHM, Mainz, Germany (2013) 55. BGHM: DGUV V 54 Winden. BGHM, Mainz, Germany (2013) 56. BGHM: DGUV V 68 Flurförderzeuge. BGHM, Mainz, Germany (2013) 57. BGHM: DGUV V 62 Maschinenanlagen auf Wasserfahrzeugen. BGHM, Mainz, Germany (2014) 58. BGHM: DGUV V 64 Schwimmende Geräte. BGHM, Mainz, Germany (2014) 59. BGHM: DGUV V 2 Betriebsärzte und Fachkräfte für Arbeitssicherheit. BGHM, Mainz, Germany (2016) 60. BGHM: DGUV V 38 Baustellen. BGHM, Mainz, Germany (2019) 61. BGHM: GBU Hinweise zur Durchführung. BGHM, Mainz, Germany (2020) 62. BGN: Handlungsanleitung Gefährdungsbeurteilung. BGN, Mannheim, Germany (2020) 63. BGRCI: Gefährdungsbeurteilung A 017. BGRCI, Heidelberg, Germany (2020) 64. Bundesregierung: PSA-Benutzungsverordnung. Bundesregierung, Berlin, Germany (1996) 65. Bundesregierung: Arbeitsstättenverordnung. Bundesregierung, Berlin, Germany (2004) 66. Bundesregierung: Arbeitssicherheitsgesetz. Bundesregierung, Berlin, Germany (2013) 67. Bundesregierung: Baustellenverordnung. Bundesregierung, Berlin, Germany (2017) 68. Bundesregierung: Biostoffverordnung. Bundesregierung, Berlin, Germany (2017) 69. Bundesregierung: Lärm- und Vibrationsschutzverordnung. Bundesregierung, Berlin, Germany (2017) 70. Bundesregierung: Arbeitsschutzverordnung zu künstlicher optischer Strahlung OStrV. Bundesregierung, Berlin, Germany (2017) 71. Bundesregierung: Arbeitsmedizinische Vorsorgeverordnung ArbMedVV. Bundesregierung, Berlin, Germany (2019) 72. Bundesregierung: Straßenverkehrsgesetz. Bundesregierung, Berlin, Germany (2019) 73. Bundesregierung: Lastenhandhabungsverordnung. Bundesregierung, Berlin, Germany (2020) 74. DGUV: DGUV V 66 Sprengkörper und Hohlkörper im Schrott. DGUV, Berlin, Germany (1982) 75. DGUV: DGUV V 60 Wasserfahrzeuge. DGUV, Berlin, Germany (1996) 76. DGUV: DGUV V 11 Laserstrahlung. DGUV, Berlin, Germany (1997) 77. DGUV: DGUV V 13 Organische Peroxide. DGUV, Berlin, Germany (1997) 78. DGUV: DGUV V 19 Schausteller und Zirkusunternehmen. DGUV, Berlin, Germany (1997) 79. DGUV: DGUV V 20 Spielhallen. DGUV, Berlin, Germany (1997) 80. DGUV: DGUV V 21 Abwassertechnische Anlagen. DGUV, Berlin, Germany (1997) 81. DGUV: DGUV V 23 Wach- und Sicherungsdienste. DGUV, Berlin, Germany (1997) 82. DGUV: DGUV V 32 Kernkraftwerke. DGUV, Berlin, Germany (1997) 83. DGUV: DGUV V 42 Zelte und Tragluftbauten. DGUV, Berlin, Germany (1997) 84. DGUV: DGUV V 48 Straßenreinigung. DGUV, Berlin, Germany (1997) 85. DGUV: DGUV V 65 Druckluftbehälter auf Wasserfahrzeugen. DGUV, Berlin, Germany (1997) 86. DGUV: DGUV V 17 Veranstalltungsstätten für szenische Darstellung. DGUV, Berlin, Germany (1998) 87. DGUV: DGUV V 25 Kassen. DGUV, Berlin, Germany (1998) 88. DGUV: DGUV V 29 Steinbrüche und Halden. DGUV, Berlin, Germany (1998) 89. DGUV: DGUV V 34 Metallhütten. 
DGUV, Berlin, Germany (1999) 90. DGUV: DGUV V 43 Müllbeseitigung. DGUV, Berlin, Germany (1999) 91. DGUV: DGUV V 36 Hafenarbeiten. DGUV, Berlin, Germany (2001) 92. DGUV: DGUV V 40 Taucherarbeiten. DGUV, Berlin, Germany (2012) 93. DGUV: DGUV V 49 Feuerwehren. DGUV, Berlin, Germany (2018) 94. VBG: Gefährdungsbeurteilung allgemeiner Katalog. VBG, Hamburg, Germany (2020) 95. DGUV: DGUV Information 206-009 Suchtprävention in der Arbeitswelt. DGUV, Berlin, Germany (2019) 96. IAG: Mitgliederbefragung bei der Unfallkasse NRW 2020. Retrieved from https://www.unfallkasse-nrw.de/fileadmin/server/download/PDF_2020/Mitgliederbefragung_2020.pdf. DGUV, Berlin, Germany (2020)

Privacy and Cost Concerns in Online Advertising—Literature Review and Analysis

Tomas Lego

T. Lego (B)
University of Vienna, Oskar Morgenstern Platz 1, 1090 Vienna, Austria
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_17

Abstract Thanks to technological advancements, improvements in data analysis and the widespread adoption of the Internet, online advertising has become a growing industry worth billions of US dollars. With large numbers of active users and data on their behavior being easily retrievable, the Internet provides a cost-efficient way of targeting individuals. Through its reliance on the behavioral data of individual Internet users, however, it also poses a great threat to users’ privacy. Having conducted a keyword search in six independent digital libraries, this short chapter provides a description of the topic of privacy and cost issues of targeted online advertisement in the form of a literature review. Drawing on a sample of 70 unique journal articles, conference papers and book chapters, it introduces the reader to relevant authors and sources and offers an overview of the most salient keywords used in this context.

Keywords Online advertisement · Privacy · Cost issues · Internet · Targeting · Personalization

1 Introduction

Over decades, centuries even, the world has been getting smaller and more interconnected thanks to the introduction of countless new technologies and inventions. Over centuries, businessmen, entrepreneurs and sellers more generally have been trying to promote and sell their products to customers. Despite continuous technological improvements and advances, this goal has remained intact. However, the way of reaching out and advertising to both existing and potential new customers has been majorly reshaped within the last years. The initial development of the Internet and the World Wide Web was not a sudden game-changer. Actually, it took almost two decades for online advertising returns to overcome outcomes achievable through long-established, “old-fashioned” ways of advertisement [1]. It was the subsequent


evolution of the World Wide Web, as well as great advances in data collection and analysis that provided the ground for this major change [2–5]. Before the development of the Internet and also in the early stages of the World Wide Web’s existence, advertisers lacked a good way of targeting their promotions based on the individual needs of their customers. The idea that targeting advertisement was beneficial and could lead to augmentations of ad-revenues was not new to them, however, as it has been described in the literature much earlier (see e.g. Rossi et al. [6]). Realizing this importance, but lacking better ways of targeting individual customers, retailers tried to offer personalized ads based on the offline behavior of their clients. Already a decade ago, Target as well as other general merchandise retailers managed to predict pregnancies based on buying patterns of their customers and were able to offer their customers tailored promotions. Still, the majority of advertisement campaigns was aimed at large demographic segments through the printed media, billboards, or the television [2]. Only through the recent improvements and developments in machine learning, data collection and processing, have the mass advertisement strategies largely been replaced by micro-targeted (personalized) advertising campaigns. As the Internet offers a great opportunity for collecting behavioral data about its users, it allows a superior level of targeting [7], something often referred to as “one-to-one marketing” (see e.g. [8, 9]). In fact, nowadays, online advertisement is one of the fastest growing industries and worth billions of US dollars globally [4]. Mainly thanks to the emergence of social networks in the second half of the 2000s, but also due to the general expansion of the Internet and advances in accessibility to this global system, billions of active users can be reached online [1, 10]. Indeed, approximately 90% of all young Europeans can be found on social networks [11], and even more (98%) US Americans aged 18–29 are active on the Internet [3]. Combined with how easily data on the online (search) behavior of such individuals, i.e. internet users, can be retrieved, it is essential to realize that the Internet does represent an enormous source of information and opportunities for companies and advertisers in general. With the majority of businesses today being present online and billions of individuals also being active on the Internet, companies do fully realize the potential behind this enormous (potential) customer base. It is a fact that the largest audience can be reliably reached through the World Wide Web or other Internet services and thus the use of the Internet as an advertisement platform is a very popular and a common thing today [1, 2, 4, 10, 12, 13]. Allowing for higher levels of personalization, however, also poses great threats to users’ privacy [7, 14–17]. Customers are often being identified, tracked and targeted by their internet cookies, potentially infringing their online privacy. Past simply collecting and storing such data, its trade has also grown into an industry itself [15], posing additional opportunities for a leakage or a misuse of vulnerable personal information, eventually. This danger does not seem unknown to the end-users either, as for example two thirds of American adults fear a potential misuse of data that was stored about them for future advertisement purposes. 
Large shares of US as well as EU citizens would even be reluctant to allow (even fully trusted) companies access to their personal information, if it were entirely up to them [18]. It shall thus not come as a surprise that numerous publications exist that revolve around the

topic of privacy in online advertisement and focus on ways of banning or avoiding personalized ads (see e.g. [19–21]). Since the effectiveness of targeting in online advertisement, however, is linked to the availability and quality of retrievable user data, the introduction of ad-blocking services or legal regulations that direct the use of personal data1 might pose a threat to this industry, which upon reflection seems not only effective but also more efficient than offline ways of advertising. This is driven by two main factors. First, the costs of targeting are relatively low on the Internet [13]. Thanks to the use of internet cookies, or more generally the tracking of users’ behavior online, but also the ease with which such data can be obtained, profiling of individual users is not difficult. The second reason for the cost-efficiency of online advertising is that there exists a myriad of different pricing mechanisms relevant in this context [22–26]; in the most mundane case, advertisers only pay when someone actually clicks on their ad [13]. Online advertising hence offers an interesting conflict between the interests of the entrepreneur and those of the hard-to-please customer. It shows an end-user who might not be willing to share her own behavioral data—out of fear for her privacy—that would allow for highly personalized ads, but who might not want to be presented with irrelevant ads either. It also shows the entrepreneur: a potential advertiser who wants to advertise cost-efficiently and be able to target the right audience, whilst at the same time (at least seemingly) respecting the customer’s privacy. Nevertheless, for high levels of personalization to work, detailed data on users’ behavior is key. Thanks to today’s wide spread of smart devices and nearly unlimited access to the Internet in many regions, online advertisement is a phenomenon highly relevant for the everyday life of billions of people today. It is an industry turning over billions of US dollars every year and a topic that has sparked tremendous interest in the scientific community. Actually, online advertisement as a research field shows an ever-growing annual number of publications (see Fig. 1).

Fig. 1 Number of total hits on all search terms over years

1 See Sarikakis and Winter [11] for the “Right to be forgotten”; see Bortgolte and Feamster [4] for an example on GDPR.


Not trying to uncover all matters in this context, this chapter does not focus on any specific type of online advertising [3, 13] or any specific Internet service, but focuses on advertising on the Internet more generally. It focuses on the two main issues of personalized online advertisement introduced above; privacy and costs. It does so, as not many existing publications target both aspects at the same time. Some authors touch upon both topics, but only a very limited number of publications seems to cover the aforementioned interplay between privacy and costs in the context of online advertising in depth. Drawing on a random sample of book chapters, journal articles and conference proceedings (or papers), the goal of this chapter is hence to provide a basic overview of scientific publications of the last two decades targeting the issue of privacy or costs (or both) in the context of online advertising. In doing so, this chapter draws on literature analysis methods introduced in Bauer and Strauss [27] and Kryvinska et al. [28] and provides potential future researchers a basic overview of the most relevant authors and trends, but mainly a detailed overview of frequently used keywords in this context. As such, this chapter may provide a foundation for literature search of future works revolving around the topic of costs and privacy concerns in targeted online advertisement on the Internet.

2 Online Advertisement in Literature

The main contribution of this work is a systematic literature review of scientific publications touching upon the topics of privacy and costs in online advertisement. These publications were obtained through a keyword search conducted online in six different digital libraries, seeking to identify relevant literature that directly targets privacy and/or cost issues in the context of advertising online, or revolves around advertising on the Internet more generally, touching upon either of the constructs and providing valuable references in this regard. This chapter first introduces the keywords and the digital libraries used for this purpose. It then delivers a more detailed description of the search process and outcomes, focusing on the search terms used, the way of assessing the relevance of individual publications, as well as the number of search hits obtained during this process.


2.1 Keywords

Six keywords particularly salient in the context of this analysis were selected and used to find relevant publications. The first of the six keywords was “online advertisement”,2 defining the main focus of this chapter. Keywords two (“privacy”) and three (“cost”)3 further steered the search towards those specific topics. As privacy concerns and both the efficiency and effectivity of tailored advertisement on the Internet are linked to how the data on the (online) behavior of individual web users is being used or treated, keywords four and five go hand in hand and were formulated as “personalization”4 and “targeting”. Only through the availability of such data and its sophisticated analysis can ads be tailored to individuals. Finally, making the link to the field of e-commerce even more visible, and abandoning the focus on any specific service (World Wide Web, E-Mail, etc.), the sixth keyword used was “Internet”.

2.2 Search Terms and Digital Libraries

After the relevant keywords had been identified, an initial examination of the selected digital libraries revealed that the search inquiry could be made more specific by combining keywords into search terms with the "AND" operator available in all six libraries. Given the enormous number of hits obtained when searching for individual keywords only, ten search terms were formed in total and used to conduct the search. All individual search terms are listed and assigned numbers in Table 1. The search was then conducted using six respected digital libraries: Springer Link, Wiley Online Library, ACM Digital Library, Emerald Insight, IEEE Xplore, and Sage Journals.

2 The related terms "advertisement" and "advertising", as well as their abbreviated versions "advert" and "ad", were used interchangeably. When examining the publications for the presence of the keyword "advertisement", all related forms of this word were hence accepted.
3 The word "cost" is used in many different ways in the context of online advertising (e.g. cost-efficiency; privacy costs; cost of advertisement; specific pricing mechanisms of advertising such as cost-per-click, cost-per-mile, etc.). An overview of all related keywords and their detailed analysis is provided in Chap. 3.
4 Under "personalization", the British spelling of this word ("personalisation") is also understood. Since many papers were written in different parts of the world, the differentiation between the US-American and the British spelling of this particular term is of enormous relevance, as it heavily influences the number of hits obtainable in most of the selected digital libraries. When a text was examined for the presence of this keyword, both spellings were accepted.
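To make the construction of these search terms concrete, the following sketch shows how the keyword combinations from Table 1 can be expressed as Boolean query strings. It is purely illustrative: the combinations simply mirror Table 1, and the exact advanced-search syntax differs from library to library.

```python
# Illustrative sketch: each search term is a set of quoted keywords joined
# with the "AND" operator offered by all six digital libraries.
SEARCH_TERMS = {
    1: ["privacy", "online advertisement", "personalisation"],
    2: ["cost", "online advertisement", "personalisation"],
    3: ["cost", "online advertisement", "internet"],
    # ... terms 4-9 are formed analogously (see Table 1) ...
    10: ["targeting", "cost", "online advertisement"],
}

def to_query(keywords):
    """Join quoted keywords with AND, e.g. '"privacy" AND "cost" AND ...'."""
    return " AND ".join(f'"{kw}"' for kw in keywords)

for number, keywords in SEARCH_TERMS.items():
    print(f"#{number}: {to_query(keywords)}")
```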

Table 1 Hits on individual search terms in individual digital libraries

#    Search term                                                            Springer   Wiley   ACM   Emerald   IEEE   SAGE   Total
1    "privacy" + "online advertisement" + "personalisation"                      111       2     22        42     15      2     194
2    "cost" + "online advertisement" + "personalisation"                         155       5     34        55      9      7     265
3    "cost" + "online advertisement" + "internet"                                990      64    109       273      8     53    1497
4    "personalization" + "internet" + "privacy" + "online advertisement"         106       6     24        40      0      0     176
5    "privacy" + "online advertisement" + "internet"                             452      25     60       146      5     18     706
6    "privacy" + "cost" + "online advertisement" + "personalisation"              89       1     18        29      0      2     139
7    "privacy" + "cost" + "online advertisement"                                 379      21     54       115      0     17     586
8    "privacy" + "cost" + "personalisation" + "advertisement"                   1903     923    613       404      0    462    4305
9    "targeting" + "online advertisement" + "cost" + "privacy"                   319      15     42        98      0     12     486
10   "targeting" + "cost" + "online advertisement"                              1000      70    137       252      1     59    1519
     Total                                                                      5504    1132   1113      1454     38    632    9873


2.3 Search Results

The search was conducted using the advanced search option of each digital library. It was limited to the period 2000–2020 and to texts written in English, and it yielded a total of 9873 hits over all ten search terms in all six digital libraries (Table 1). This number, however, does not represent the number of unique search hits that had to be evaluated. First, the numbers in Table 1 comprise all publication types, not only the desired conference papers (or proceedings), journal articles and book chapters. Additionally, a large number of duplicates (identical publications found using different search terms) was identified. Notably, duplicates only occurred within individual digital libraries: the same publication (see e.g. Roddick [18]) could, for instance, be found in the Wiley Online Library using several of the ten search terms, but not in any of the other libraries. For these reasons, the number of unique publications found was significantly lower than the total of 9873 hits.
Considering the accumulated number of hits for all search terms in all5 individual libraries in a given year, Fig. 1 reveals a clear upward trend in the number of published works. Together with the ever-present advancements in technology and the rising number of Internet users [29, 30], the scientific interest online advertising sparks might indicate that it will not lose its place as a leading industry worth billions in the coming years or even decades.
After the search hits had been filtered for journal articles, book chapters and conference papers (or publications in conference proceedings), randomly drawn hits were examined for their relevance based on their abstracts. A publication had to clearly revolve around the topic of online advertisement and focus on, tackle, or provide useful references on the areas of privacy and costs in this context. After an initial exclusion of irrelevant publications based on their type, title and abstract, works deemed relevant because they clearly revolve around online advertisement in general and also address either the issue of costs or privacy (ideally both) were evaluated based on their full texts6 as well, and only then included in the final sample of 70 publications. Table 2 offers an overview of all 70 publications included in the final sample, indicating their author(s), year of publication, title, publication type, as well as the library used to find them.
5 IEEE was excluded due to the relatively small number of hits.
6 There is only one publication in the final sample that was assessed as being of indisputable relevance but had to be evaluated based on its abstract only, because its full text was not available.
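As a purely illustrative aid, the screening just described can be expressed as a small filtering step over a raw export of search hits. The file and column names below are hypothetical and are not taken from any particular library's export format.

```python
import pandas as pd

# Hypothetical raw export: one row per search hit.
hits = pd.read_csv("raw_hits.csv")  # columns: library, search_term, title, year, pub_type

screened = (
    hits[hits["pub_type"].isin(["journal article", "conference paper", "book chapter"])]
    .query("year >= 2000 and year <= 2020")
    # Duplicates were only observed within a single library, so dropping
    # duplicate titles per library is sufficient here.
    .drop_duplicates(subset=["library", "title"])
)
print(len(hits), "raw hits ->", len(screened), "unique candidate publications")
```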

3 Analysis

This chapter offers a closer look at the 70 relevant publications presented in Table 2. It provides a basic overview of the author teams and publication types. Against the backdrop of an analysis of the keywords used, it also delivers a review of the currently recurring topics targeted in the context of privacy and cost issues of personalized online advertisement.

Table 2 Sampled publications, by their type and publisher

(Table 2 lists the 70 publications of the final sample alphabetically, from Bashir et al. (2015) to Zhao et al. (2012), indicating for each publication its title, the journal, conference proceedings or book in which it appeared ("Source"), its publication type and the digital library through which it was found. Full references for all 70 sampled publications are given alphabetically in the Appendix.)
a Column "Type" indicates whether the publication is a book chapter ("Book"), a journal article ("Journal") or a conference paper ("Conference")
b Due to the full text not being available, this publication (Van Looy 2016) was evaluated based on its abstract and associated keywords only


Fig. 2 Number of publications by the number of authors


3.1 Authors

The author teams behind the individual publications in the final sample vary in size. With an average team size of 2.78 authors, the majority (54%) of the sampled publications was written by either two or three authors. More specifically, the sample includes 14 single-authored publications, 19 papers written by two authors, 19 publications written by a team of three, and 18 publications written by teams of four or more researchers, with the exact numbers depicted in Fig. 2. The authors differ across most of the selected works: the final sample of 70 publications was written by a total of 195 authors, but comprises 181 unique authors, as eight of them feature in two of the selected publications and an additional two researchers worked on four publications from the final sample.7

7 The authors that are included in the final sample more than once are: Avi Goldfarb from the Rotman School of Management, University of Toronto; Bin Cheng from NEC Laboratories Europe; Blase Ur from the University of Chicago; Catherine Tucker from the MIT Sloan School of Management; Hamed Haddadi from Imperial College London; Jun Wang from University College London; Lorrie Faith Cranor from Carnegie Mellon University; Paul Francis from the Max Planck Institute; Saikat Guha from Microsoft Research India; and S. Muthukrishnan.
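The author counts reported above can be checked with a few lines of arithmetic; the snippet below is merely an illustrative verification of those figures.

```python
# Papers by author-team size as reported above (Fig. 2 gives the exact counts
# for teams of four or more; here they are lumped together as 18 papers).
papers_by_team_size = {1: 14, 2: 19, 3: 19, "4+": 18}
total_papers = sum(papers_by_team_size.values())       # 70

total_author_slots = 195                               # authors counted with repetition
print(round(total_author_slots / total_papers, 1))     # ~2.8 authors per paper on average

# Eight authors appear in two sampled publications (one extra occurrence each)
# and two authors appear in four (three extra occurrences each):
unique_authors = total_author_slots - (8 * 1 + 2 * 3)
print(unique_authors)                                  # 181
```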


Fig. 3 Number of publications by their type

3.2 Publication Types and Publishers

All of the 70 selected papers were published either in the proceedings of a conference, as an article in a scientific journal, or as a chapter in a book. Conference contributions account for the largest share of publication types in this particular sample: as visible from Fig. 3, a full 37 (52.9%) papers appeared in conference proceedings, followed by 29 journal articles and four book chapters. Interestingly, whilst including 37 conference articles, the sample only encompasses 30 unique conferences. This is because three of the relevant contributions stem from three different editions of one conference, and an additional five conferences provided two publications each. The conference relevant for three out of the 70 sampled publications is the ACM "Internet Measurement Conference" (2010, 2011, 2013). The additional five conferences providing two papers each are: the "Annual Hawaii International Conference on System Sciences" (2010, 2012); the "Annual Workshop on Privacy in the Electronic Society" (2010, 2012); the "International Conference on World Wide Web" (2008, 2009); the "International Workshop on Internet and Network Economics" (2007, 2008)8 and the "International Conference on Web Search and Data Mining" (2013, 2016). A very similar phenomenon can be observed with the journal articles. However, with a total of 26 unique journals and 29 journal articles, the frequency of repeated sources is not as high as for the conferences. The three journals that seem particularly relevant in this context and provide two publications each are "Marketing Letters", "Communications of the ACM" and "Business and Information Systems Engineering". All remaining relevant conferences, journals and books can be found in Table 2.
Looking not only at the type but also at the source of the individual selected publications, Fig. 4 shows that all six digital libraries contributed to the final sample.

8 The proceedings of this international workshop are titled "Internet and Network Economics". The workshop itself was titled "International Workshop on Web and Internet Economics" in 2007 and "International Workshop on Internet and Network Economics" in 2008.


Fig. 4 Number of publications by their type and publisher

Whereas all six libraries provided relevant journal articles, they differ strongly in the absolute number of papers they contributed. This is mainly driven by the large differences in the total number of hits achieved in the individual libraries and by the considerable number of irrelevant articles that were filtered out during the sampling process. Hence, without making any claim about the relevance of individual digital libraries, Fig. 4 provides a detailed overview of the constitution of the final sample. Showing that the majority of sampled publications was published by Springer or ACM, it not only indicates how many works were provided by each library, but also shows the distribution of journal articles, conference papers and book chapters over the individual sources.
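In terms of the illustrative screening sketch above, the distributions behind Figs. 3 and 4 amount to a simple cross-tabulation of the final sample by type and library; the data frame and column names below are again hypothetical.

```python
import pandas as pd

# Hypothetical final sample: one row per included publication.
sample = pd.read_csv("final_sample.csv")  # columns: title, pub_type, library, year

# Counts per publication type (Fig. 3) and per type within each library (Fig. 4).
print(sample["pub_type"].value_counts())
print(pd.crosstab(sample["library"], sample["pub_type"], margins=True))
```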

3.3 Keywords and Search Terms

Although it is not meaningful to state which single search term was used to find an individual hit, since a large share of the sampled publications could be found using several different search phrases, the individual search terms presented in Table 1 can be ranked according to their importance, i.e. by how many hits they produced. Table 3 indicates that, across all six libraries combined, search term 8 accounted for the highest number of results, followed by search terms 10 and 3.


Table 3 Search terms by the number of hits in all digital libraries

Search term   Hits achieved   Percent of total (%)   Cumulative (%)
# 8                    4305                   43.6             43.6
# 10                   1519                   15.4             59.0
# 3                    1497                   15.2             74.2
# 5                     706                    7.2             81.3
# 7                     586                    5.9             87.2
# 9                     486                    4.9             92.2
# 2                     265                    2.7             94.8
# 1                     194                    2.0             96.8
# 4                     176                    1.8             98.6
# 6                     139                    1.4            100.0
Total                  9873                  100.0
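The shares and cumulative shares in Table 3 follow directly from the per-term totals in Table 1; the following lines merely sketch that derivation.

```python
# Per-term totals from Table 1, ordered by the number of hits (as in Table 3).
totals = {8: 4305, 10: 1519, 3: 1497, 5: 706, 7: 586,
          9: 486, 2: 265, 1: 194, 4: 176, 6: 139}
grand_total = sum(totals.values())          # 9873

cumulative = 0.0
for term, hits in totals.items():
    share = 100 * hits / grand_total
    cumulative += share
    print(f"#{term}: {hits} hits, {share:.1f}% of total, {cumulative:.1f}% cumulative")
```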

Fig. 5 Number of papers including a given number of keywords

However, the sampled publications do not only vary in terms of which search terms could be used to find them, by whom, in which format and when they were published; they also differ in how many of the six pre-defined keywords they include, i.e. in their focus. Taking the number of keywords included as an indication of how narrow the focus of an individual paper is, the sample provides a highly diverse selection of publications, with an average of 4.56 keywords per paper (see Fig. 5). Whereas no article includes only one keyword, approximately a quarter of them focus on two or three key terms, indicating a relatively narrow focus: such publications either address online advertisement in general, touch upon the monetary background of advertising on the Internet, or tackle the related privacy concerns, but do not establish a link between these two focal points. A further 14 papers include four keywords and 16 include five, while 22 of the selected publications focus on, or at least touch upon, all six expressions.


In accordance with the main focus of this chapter, the keyword "online advertisement" features as a central theme in all papers in the final sample. However, when assessing whether an individual publication includes a given keyword, not only the keyword itself but also its synonyms and related headwords were considered. Thus, a publication was evaluated as targeting "online advertisement" whenever it talked about "Internet advertisement", "advertisement on the web", or "search engine advertisement", for example. Equally, other forms of the word "advertisement" were accepted, including the word "advertising" and the abbreviations "advert" and "ad". Similar related expressions exist for the majority of the other keywords as well. A paper including the word "cost", for example, might be discussing the cost efficiency of online advertisement or different pricing mechanisms and pricing methods of online ads (e.g. cost-per-click or cost-per-mile), but might also focus on the costs of disclosing private information on the Internet. Therefore, a clear differentiation using not only the keyword itself but additional catchwords was desirable. Table 4 provides such an overview of all relevant keywords and the expressions derived from them and thus delivers a detailed indication of the specific focus of individual publications. Restating that "online advertisement" appeared in all 70 publications and focusing on the aggregated counts of individual catchwords, Fig. 6 further indicates that cost issues were a more common topic than privacy concerns. It also shows a frequent use of the buzzword "Internet" and a surprising difference between the frequency of use of the words "targeting" and "personalization", possibly indicating a differing focus on the consumer or the advertiser.
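Such catchword matching can be mimicked mechanically, for instance with a handful of regular expressions. The synonym lists below are deliberately short and purely illustrative; the full catchword scheme used in this chapter is the one documented in Table 4 and was applied by reading the texts.

```python
import re

# Illustrative catchword lists per keyword (heavily abbreviated; see Table 4).
CATCHWORDS = {
    "online advertisement": [r"\bonline advert", r"\binternet advert",
                             r"\bweb advert", r"\bsearch[- ]engine advert"],
    "privacy": [r"\bprivacy\b"],
    "cost": [r"\bcost[- ]per[- ]\w+", r"\bcosts?\b", r"\bpricing\b"],
    "personalization": [r"\bpersonali[sz](ation|ed)\b", r"\btailored\b"],
    "targeting": [r"\btarget(ing|ed)?\b"],
    "internet": [r"\binternet\b"],
}

def keywords_present(full_text):
    """Return the keywords whose catchwords occur anywhere in the text."""
    text = full_text.lower()
    return {kw for kw, patterns in CATCHWORDS.items()
            if any(re.search(p, text) for p in patterns)}

# Example: a sentence mentioning cost-per-click pricing of personalised online ads.
print(keywords_present("We study cost-per-click pricing of personalised online advertising."))
# -> {'online advertisement', 'cost', 'personalization'} (set order may vary)
```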

Table 4 Sampled publications by the keywords and related catchwords they show

(Table 4 is a matrix covering the same 70 sampled publications as Table 2; an "x" marks that a publication contains the respective keyword or a related catchword. The catchword columns are: advertising delivered online/on the web/on the internet; banner/display advertising; Internet advertising; online advertising; (online) behavioral advertising; search (engine) advertising; web advertising; ad blocking/banning/avoidance; social media/smartphone/social network advertising; personalization; personalized advertising; (online) privacy (on the internet) (concerns/issues); privacy cost/cost of disclosing personal information; cost of effectiveness/cost-effectiveness; advertising effectiveness/cost-effectiveness; advertisement efficiency; cost-per-action; cost-per-click; cost-per-impression/cost-per-mile; cost-per-acquisition/cost-per-lead (each also in the "pay-per-…" form); (bidding) price/pricing (of ads); profit/revenue/profitability; return; advertising costs; Internet; target (verb); targeted (ad); and targeting (noun). The final row of the table reports, for each catchword, the number of publications in which it appears.)
a The words "advertisement", "advertising", "advert" and "ad" were used interchangeably in this context
b The word "personalization" is the US-American spelling of the British form "personalisation". Both spellings were treated as equal.
c The word "personalized" in the context of online advertisement is interchangeable with the word "tailored".


Fig. 6 Share of publications with a given keyword

4 Conclusion

Online advertising is an ever-growing industry relevant for billions of Internet users all over the world, with a business volume of billions of US dollars in ad payments and ad returns every year. Because it is a potentially cost-efficient way of advertising but, through its reliance on data about users' online behavior, also a danger to individuals' privacy, the privacy and cost issues of online advertisement are two topics often targeted separately by the extant literature. Taking their interplay more seriously, the goal of this chapter was to provide a topic description in the form of a literature review and to spark interest in further exploring their link. Accordingly, against the backdrop of 70 randomly sampled scientific papers published between the years 2000 and 2020, this chapter provided an overview of the most frequently used keywords and revealed an array of different tendencies in the literature on online advertisement. It found that many papers published in this field tackle different pricing mechanisms of online ads and, considering their cost efficiency, differentiate between a myriad of different Internet services available for online advertisement and a large number of different ad types. It also identified many mathematical models for determining optimal bidding strategies in online advertisement auctions or for predicting ad revenues. It further revealed that other publications move beyond such maximization problems and describe end-users' attitudes towards online advertisement and ad personalization. Such publications inevitably tackle potential privacy issues related to the use of personal (behavioral) data for the purpose of ad targeting. Some of them also provide policy frameworks for online advertisement or introduce methods of banning adverts. Despite numerous papers touching upon aspects of costs and privacy simultaneously, a detailed understanding of their interplay still appears to be lacking. Recognizing the importance of personal behavioral data for the efficiency of online advertisement, but also its potential to harm the privacy of individual end-users, this chapter thus encourages future research to further explore the interplay between privacy and cost factors of targeted advertising online. For such studies, this chapter may provide a stepping stone.

Appendix

Alphabetic References of Examined Publications from Tables 2 and 4

Bashir, M., Hayes, C., Lambert, A.D., Kesan, J.P.: Online Privacy and Informed Consent: The Dilemma of Information Asymmetry. Proceedings of the Association for Information Science and Technology, Vol. 52(1). (2015) 1–10
Baumann, A., Haupt, J., Gebert, F., Lessmann, S.: The Price of Privacy: An Evaluation of the Economic Value of Collecting Clickstream Data. Business & Information Systems Engineering, Vol. 61(4). (2019) 413–431
Bilenko, M., Richardson, M.: Predictive Client-side Profiles for Personalized Advertising. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (2011) 413–421
Borgolte, K., Feamster, N.: Understanding the Performance Costs and Benefits of Privacy-focused Browser Extensions. In: Proceedings of The Web Conference. (2020) 2275–2286
Caruso, F., Giuffrida, G., Zarba, C.: Heuristic Bayesian targeting of banner advertising. Optimization and Engineering, Vol. 16(1). (2015) 247–257
Chang, H.H., Wong, K.H., Chu, T.W.: Online advertorial attributions on consumer responses: materialism as a moderator. Online Information Review, Vol. 42(5). (2018) 697–717
Chellappa, R.K., Shivendu, S.: A model of advertiser—portal contracts: personalization strategies under privacy concerns. Information Technology and Management, Vol. 7(1). (2006) 7–19


Chen, Y., Ghosh, A., McAfee, R.P., Pennock, D.: Sharing online advertising revenue with consumers. In: Internet and Network Economics. (2008) 556–565
Chen, G., Cox, J.H., Uluagac, A.S., Copeland, J.A.: In-depth survey of digital advertising technologies. IEEE Communications Surveys and Tutorials, Vol. 18(3). (2016) 2124–2148
Cholette, S., Özlük, Ö., Parlar, M.: Optimal keyword bids in search-based advertising with stochastic advertisement positions. Journal of Optimization Theory and Applications, Vol. 152(1). (2012) 225–244
Davar Pishva, A.: Online Advertising and its Security and Privacy Concerns. In: 2013 15th International Conference on Advanced Communications Technology. (2013) 372–377
Dulluri, S., Raghavan, N.R.S.: Allocation of advertising space by a web service provider using combinatorial auctions. Sadhana, Vol. 30(2–3). (2005) 213–230
Dunn, B.K., Galletta, D.F.: Digital advertising's human toll: How implied cost-to-user affects Web content platforms (a research proposal). In: 2012 45th Hawaii International Conference on System Sciences. (2012) 3180–3187
Farahat, A.: Privacy preserving frequency capping in internet banner advertising. In: Proceedings of the 18th International Conference on World Wide Web. (2009) 1147–1148
Gill, P., Erramilli, V., Chaintreau, A., Krishnamurthy, B., Papagiannaki, D., Rodriguez, P.: Best Paper—Follow the money: understanding economics of online aggregation and advertising. In: Proceedings of the 2013 Conference on Internet Measurement Conference. (2013) 141–148
Ginosar, A.: Self-Regulation of Online Advertising: A Lesson From a Failure. Policy & Internet, Vol. 6(3). (2014) 296–314
Goldfarb, A., Tucker, C.: Online advertising, behavioral targeting, and privacy. Communications of the ACM, Vol. 54(5). (2011) 25–27
Goldfarb, A.: What is Different About Online Advertising? Review of Industrial Organization, Vol. 44(2). (2014) 115–129
Greengard, S.: Advertising Gets Personal. Communications of the ACM, Vol. 55(8). (2012) 18–20
Guha, S., Cheng, B., Francis, P.: Challenges in Measuring Online Advertising Systems. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. (2010) 81–87
Guha, S., Cheng, B., Francis, P.: Privad: Practical Privacy in Online Advertising. In: 8th USENIX Symposium on Networked Systems Design and Implementation. (2011) 169–182
Haans, H., Raassens, N., van Hout, R.: Search engine advertisements: The impact of advertising statements on click-through and conversion rates. Marketing Letters, Vol. 24(2). (2013) 151–163
Haddadi, H., Guha, S., Francis, P.: Not All Adware Is Badware: Towards Privacy-Aware Advertising. In: Software Services for e-Business and e-Society. (2009) 161–172
Haddadi, H., Hui, P., Henderson, T., Brown, I.: Targeted Advertising on the Handset: Privacy and Security Challenges. In: Müller, J., Alt, F., Michelis, D.: Pervasive Advertising. (2011) 119–137
Hai, L., Zhao, L., Nagurney, A.: An integrated framework for the design of optimal web banners. NETNOMICS: Economic Research and Electronic Networking, Vol. 11(1). (2010) 69–83
Heinemann, G., Schwarzl, C.: Eight Success Factors in New Online Retailing. In: New Online Retailing. (2010) 92–186
Hinduja, S.: Theory and Policy in Online Privacy. Knowledge, Technology & Policy, Vol. 17(1). (2004) 38–58
Hu, Y., Shin, J., Tang, Z.: Pricing of Online Advertising: Cost-per-Click-through vs. Cost-per-Action. In: 2010 43rd Hawaii International Conference on System Sciences. (2010) 1–9
Kazienko, P., Adamski, M.: Personalized Web Advertising Method. In: Adaptive Hypermedia and Adaptive Web-Based Systems. (2004) 146–155
Köster, M., Rüth, M., Hamborg, K.-C., Kaspar, K.: Effects of Personalized Banner Ads on Visual Attention and Recognition Memory. Applied Cognitive Psychology, Vol. 29(2). (2015) 181–192
La Diega, G.N.: Data as Digital Assets. The Case of Targeted Advertising. In: Bakhoum, M., Gallego, B.C., Mackenrodt, M.-O., Surblytė-Namavičienė, G.: Personal Data in Competition, Consumer Protection and Intellectual Property Law. (2018) 445–499
Large, A.: Children, Teenagers, and the Web. Annual Review of Information Science and Technology, Vol. 39(1). (2005) 347–392


Lee, J., Shi, Y., Wang, F., Lee, H., Kim, H.K.: Advertisement clicking prediction by using multiple criteria mathematical programming. World Wide Web, Vol. 19(4). (2016) 707–724
Leon, P.G., Cranshaw, J., Cranor, L.F., Graves, J., Hastak, M., Ur, B., Xu, G.: What Do Online Behavioral Advertising Privacy Disclosures Communicate to Users? In: Proceedings of the 2012 ACM Workshop on Privacy in the Electronic Society. (2012) 19–30
Liu, W., Zhong, S., Chaudhary, M., Kapur, S.: Online Advertisement Campaign Optimization. In: 2007 IEEE International Conference on Service Operations and Logistics, and Informatics. (2007) 1–4
Maggi, L., De Pellegrini, F.: Cooperative Online Native Advertisement: a Game Theoretical Scheme Leveraging on Popularity Dynamics. In: 2014 IEEE Conference on Computer Communications Workshops. (2014) 334–339
Mahdian, M., Tomak, K.: Pay-per-action Model for Online Advertising. In: Internet and Network Economics. (2007)
Malheiros, M., Jennett, C., Patel, S., Brostoff, S., Sasse, M.A.: Too Close for Comfort: A Study of the Effectiveness and Acceptability of Rich-Media Personalized Advertising. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. (2012) 579–588
McDonald, A.M., Cranor, L.F.: Americans' Attitudes About Internet Behavioral Advertising Practices. In: Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society. (2010) 63–72
Mungamuru, B., Garcia-Molina, H.: Managing the Quality of CPC Traffic. In: Proceedings of the 10th ACM Conference on Electronic Commerce. (2009) 215–224
Muthukrishnan, S.: Internet Ad Auctions: Insights and Directions. In: Automata, Languages and Programming. (2008) 14–23
Muthukrishnan, S., Pál, M., Svitkina, Z.: Stochastic Models for Budget Optimization in Search-Based Advertising. Algorithmica, Vol. 58(4). (2010) 1022–1044
Nazerzadeh, H., Saberi, A., Vohra, R.: Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising. In: Proceedings of the 17th International Conference on World Wide Web. (2008) 179–188
Nyheim, P., Xu, S., Zhang, L., Mattila, A.S.: Predictors of avoidance towards personalization of restaurant smartphone advertising: A study from the Millennials' perspective. Journal of Hospitality and Tourism Technology, Vol. 6(2). (2015) 145–159
O'Farrell, H.: Developments in Online, Social Media Marketing in China and the West: An Overview of Different Approaches. Journal of Entrepreneurship and Innovation in Emerging Economies, Vol. 6(2). (2020) 383–403
Park, Y.J., Skoric, M.: Personalized Ad in Your Google Glass? Wearable Technology, Hands-Off Data Collection, and New Policy Imperative. Journal of Business Ethics, Vol. 142(1). (2017) 71–82
Parra-Arnau, J., Achara, J.P., Castelluccia, C.: MyAdChoices: Bringing Transparency and Control to Online Advertising. ACM Transactions on the Web, Vol. 11(1). (2017) 1–47
Pepelyshev, A., Staroselskiy, Y., Zhigljavsky, A.: Adaptive Targeting for Online Advertisement. In: Machine Learning, Optimization, and Big Data. (2015) 240–251
Reddy, P.K.: A Framework to Harvest Page Views of Web for Banner Advertising. In: Big Data Analytics. (2015) 57–68
Reznichenko, A., Guha, S., Francis, P.: Auctions in Do-Not-Track Compliant Internet Advertising. In: Proceedings of the 18th ACM Conference on Computer and Communications Security. (2011) 667–676
Spiekermann, S., Dickinson, I., Günther, O., Reynolds, D.: User Agents in E-commerce Environments: Industry vs. Consumer Perspectives on Data Exchange. In: Advanced Information Systems Engineering. (2003) 696–710
Stange, M., Funk, B.: Real-Time Advertising. Business & Information Systems Engineering, Vol. 6. (2014) 305–308


Stone-Gross, B., Stevens, R., Zarras, A., Kemmerer, R., Kruegel, C., Vigna, G.: Understanding Fraudulent Activities in Online Ad Exchanges. In: Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference. (2011) 279–294
Strzelecki, A., Abramek, E., Sołtysik-Piorunkiewicz, A.: Adblock Usage in Web Advertisement in Poland. In: Advances in Information and Communication Networks. (2018) 13–23
Thomaidou, S., Leymonis, K., Liakopoulos, K., Vazirgiannis, M.: AD-MAD: Automated Development and Optimization of Online Advertising Campaigns. In: 2012 IEEE 12th International Conference on Data Mining Workshops. (2012) 902–905
Tran, T.P., van Solt, M., Zemanek Jr., J.E.: How does personalization affect brand relationship in social commerce? A mediation perspective. Journal of Consumer Marketing, Vol. 37(5). (2020) 473–486
Tucker, C.: Three Findings Regarding Privacy Online. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. (2013) 243–244
Tudoran, A.A.: Why do internet consumers block ads? New evidence from consumer opinion mining and sentiment analysis. Internet Research, Vol. 29(1). (2019) 144–166
Van den Broeck, E., Poels, K., Walrave, M.: How do users evaluate personalized Facebook advertising? An analysis of consumer- and advertiser-controlled factors. Qualitative Market Research, Vol. 23(2). (2020) 309–327
Van Doorn, J., Hoekstra, J.C.: Customization of online advertising: The role of intrusiveness. Marketing Letters, Vol. 24(4). (2013) 339–351
Van Looy, A.: Online Advertising and Viral Campaigns. In: Social Media Management. (2016) 63–85
Wang, C.-J., Chen, H.-H.: Learning to Predict the Cost-Per-Click for Your Ad Words. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. (2012) 2291–2294
Weinshel, B., Wei, M., Mondal, M., Choi, E., Shan, S., Dolin, C., Mazurek, M.L., Ur, B.: Oh, the Places You've Been! User Reactions to Longitudinal Transparency About Third-Party Web Tracking and Inferencing. In: Proceedings of the ACM Conference on Computer and Communications Security. (2019) 149–166
Xiao, G., Gong, Z.: Personalized Delivery of On-Line Search Advertisement Based on User Interests. In: Advances in Data and Web Management. (2009) 198–210
Yang, L., Wang, W., Chen, Y., Zhang, Q.: A Privacy-aware Framework for Online Advertisement Targeting. In: 2013 IEEE Global Communications Conference. (2013) 3145–3150
Yu, J.H., Cude, B.: "Hello, Mrs. Sarah Jones! We recommend this product!" Consumers' perceptions about personalized advertising: Comparisons across advertisements delivered via three different types of media. International Journal of Consumer Studies, Vol. 33(4). (2009) 503–514
Yuan, S., Wang, J., Zhao, X.: Real-time Bidding for Online Advertising: Measurement and Analysis. In: Proceedings of the Seventh International Workshop on Data Mining for Online Advertising. (2013) 1–8
Zhang, E., Yi-Qin, Z.: Online Advertising Channel Choice—Posted Price VS. Auction. In: 2011 International Conference on Management Science & Engineering 18th Annual Conference Proceedings. (2011) 321–328
Zhang, W., Rong, Y., Wang, J., Zhu, T., Wang, X.: Feedback Control of Real-Time Display Advertising. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. (2016) 407–416
Zhao, Q., Zhang, Y., Lita, L.V.: Have Your Cake and Eat It Too! Preserving Privacy while Achieving High Behavioral Targeting Performance. In: Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy. (2012) 1–9

Privacy and Cost Concerns in Online Advertising …

567

References 1. Wu, C.-H., Kan, M.-H., Bayarjargal, U., Wu, C.-C.: Effect of online advertisement types on click behavior in Mongolia: mediating impact of emotion. In: Proceedings of the 4th Multidisciplinary International Social Networks Conference, pp. 1–8 (2017) 2. Greengard, S.: Advertising gets personal. Commun. ACM 55(8), 18–20 (2012) 3. O’Farrell, H.: Developments in online, social media marketing in China and the West: an overview of different approaches. J. Entrepr. Innov. Emerg. Econ. 6(2), 383–403 (2020) 4. Borgolte, K., Feamster, N.: Understanding the performance costs and benefits of privacyfocused browser extensions. Proc. Web Conf. 2020, 2275–2286 (2020) 5. Bauer, W., Kryvinska, N., Dorn, J.: Towards trust in digital services advertisements—buying experts’ opinions on USDL-Trust. Forthcoming (2020) 6. Rossi, P.E., Mcculloch, R.E., Allenby, G.M.: The value of purchase history data in target marketing. Mark. Sci. 15(4), 321–340 (1996) 7. Goldfarb, A., Tucker, C.: Online advertising, behavioral targeting, and privacy. Commun. ACM 54(5), 25–27 (2011) 8. Arora, N., Dreze, X., Ghose, A., Hess, J.D., Iyengar, R., Jing, B., Joshi, Y., Kumar, V., Lurie, N., Neslin, S., Sajeesh, S., Su, M., Syam, N., Thomas, J., Zhang, Z.J.: Putting one-to-one marketing to work: personalization, customization, and choice. Mark. Lett. 19(3), 305–321 (2008) 9. Allen, C., Kania, D., Yaeckel, B.: One-to-one web marketing: build a relationship marketing strategy one customer at a time. Wiley, Hoboken (2001) 10. Tran, T.P.: Personalized ads on facebook: an effective marketing tool for online marketers. J. Retail. Consum. Serv. 39, 230–242 (2017) 11. Sarikakis, K., Winter, L.: Social Media Users’ Legal Consciousness About Privacy, vol. 3, no. 1. Social Media + Society (2017) 12. Evans, D.S.: The online advertising industry: economics, evolution, and privacy. J. Econ. Persp. 23(3), 37–60 (2009) 13. Goldfarb, A.: What is different about online advertising? Rev. Ind. Organ. 44(2), 115–129 (2014) 14. Yang, L., Wang, W., Chen, Y., Zhang, Q.: A Privacy-aware framework for online advertisement targeting. In: 2013 IEEE Global Communications Conference, pp. 3145–3150 (2013) 15. Davar Pishva, A.: Online advertising and its security and privacy concerns. In: 2013 15th International Conference on Advanced Communications Technology, pp. 372–377 (2013) 16. Farahat, A.: Privacy preserving frequency capping in internet banner advertising. In: Proceedings of the 18th International Conference on World Wide Web, pp. 1147–1148 (2009) 17. Baumann, A., Haupt, J., Gebert, F., Lessmann, S.: The price of privacy: an evaluation of the economic value of collecting clickstream data. Bus. Inf. Syst. Eng. 61(4), 413–431 (2019) 18. Roddick, A.: Privacy and customer feedback. In: Peppers, D., Rogers, M. (eds.) Managing Customer Experience and Relationships: A Strategic Framework, pp. 289–320 (2016) 19. Parra-Arnau, J., Achara, J.P., Castelluccia, C.: MyAdChoices: bringing transparency and control to online advertising. ACM Trans. Web 11(1), 1–47 (2017) 20. Tudoran, A.A.: Why do internet consumers block ads? New evidence from consumer opinion mining and sentiment analysis. Internet Res. 29(1), 144–166 (2019) 21. Strzelecki, A., Abramek, E., Sołtysik-Piorunkiewicz, A.: Adblock usage in web advertisement in Poland. In: Advances in Information and Communication Networks, pp. 13–23 (2018) 22. Yuan, S., Wang, J., Zhao, X.: Real-time bidding for online advertising: measurement and analysis. 
In: Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, pp. 1–8 (2013) 23. Mungamuru, B., Garcia-Molina, H.: Managing the quality of CPC Traffic Bobji. In: Proceedings of the 10th ACM Conference on Electronic Commerce, pp. 215–224 (2009) 24. Hu, Y., Shin, J., Tang, Z.: Pricing of online advertising: cost-per-click-through vs. cost-peraction. In: 2010 43rd Hawaii International Conference on System Sciences, pp. 1–9 (2010)

568

T. Lego

25. Nazerzadeh, H., Saberi, A., Vohra, R.: Dynamic cost-per-action mechanisms and applications to online advertising. In: Proceedings of the 17th International Conference on World Wide Web, pp. 179–188 (2008) 26. Wang, C.-J., Chen, H.-H.: Learning to predict the cost-per-click for your ad words. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 2291–2294 (2012) 27. Bauer, C., Strauss, C.: Location-based advertising on mobile devices: a literature review and analysis. Manage. Rev. Q. 66(3), 159–194 (2016) 28. Kryvinska, N., Olexova, R., Dohmen, P., Strauss, C.: The S-D logic phenomenonconceptualization and systematization by reviewing the literature of a decade (2004–2013). J. Serv. Sci. Res. 5(1), 35–94 (2013) 29. Statista: Number of internet users worldwide from 2005 to 2019. Retrieved from: https://www. statista.com/statistics/273018/number-of-internet-users-worldwide/ (2021). Accessed: 26 Feb 2021 30. Roser, M., Ritchie, H., Ortiz-Ospina, E.: Internet. Retrieved from: https://ourworldindata.org/ internet (2021). Accessed 26 Feb 2021

Technological Advancements Within the Canadian Electric Vehicle Industry

Michael Vice and Marián Mikolášik

Abstract The Electric Vehicle (EV) industry is experiencing moderate market adoption rates in Canada, and as of 2019, EVs are even becoming more affordable. Inevitably, the technology available in these vehicles will surpass that of traditional internal combustion engines (ICEs). However, there is much more that business owners can do to elevate consumer perception of the EV and build market attractiveness. The rise of Big Data analysis and IoT promises incredible benefits to both consumers and corporations; however, their significance can be blown out of proportion. For that reason, the specific technologies that can be implemented in the EV are explored further to establish their realities. It is found that the most prominent barrier slowing adoption is consumers' range anxiety, and that the technologies researched can be used strategically to minimize this negative perception while simultaneously providing businesses with valuable insight. The findings show that the opportunity cost is high, but the investment can be excellent for accelerating market share and altering consumer perceptions.

Keywords Electric vehicle · Big data analysis · Sensors · Battery range · IoT · Infrastructure

M. Vice
University of Vienna, Oskar Morgenstern Platz 1, 1090 Vienna, Austria
e-mail: [email protected]

M. Mikolášik (B)
Faculty of Management, Comenius University in Bratislava, Odbojarov 10, 831 04 Bratislava, Slovakia
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_18

1 Introduction

With the rapid acceleration of technologies and the ever-haunting threat of drastic climate change, the Electric Vehicle (EV) has become an attractive alternative for many consumers, and consequently companies, when compared to traditional gas-powered automobiles. Sadly, the Canadian EV industry does not yet have the market share required to minimize the climate impact. Canada is especially interesting because of the massive barriers created by an extremely low population density nationwide—just four people per square kilometer.1 Referring to Fig. 1, the Canadian EV market has been increasing exponentially over the past eight years, yet still, by 2018, annual sales were only a little over forty thousand country-wide.2 Considering that total Canadian automobile sales, including EVs, were over two million in the same year,3 it follows that one would question why this is happening despite the increasing capabilities of the EV.

Fig. 1 Annual EV sales comparison—Canada [1]

1 Canadian Population Density—http://worldpopulationreview.com/countries/canada-population/.
2 Figure 1—https://wattev2buy.com/global-ev-sales/canada-ev-sales/.
3 Total Car Sales 2018—https://www.autonews.com/retail/canada-2-million-new-vehicles-sold2018-even-sales-fell-65-december.

For the sake of this paper, the term "EV" refers only to commercialized passenger automobiles classified as zero-emission vehicles that are battery-powered. Hybrid vehicles that utilize gas and electric power in unison are therefore not included. The focus is not climate change per se, but mitigating this evolving problem is a derivative objective of the EV. The truth is, there is a multitude of reasons for consumers to adopt the EV and replace their gas-powered vehicle: very low charging costs, increasingly accessible charging stations, government tax exemptions and subsidies, virtually silent operation, and, due to the speed of electricity, instant torque with incredible efficiency. This list is by no means exhaustive, but it illustrates the benefits that consumers typically overlook because of their overall perception, range anxiety, and view of the immediate monetary investment. Due to these dominating negative aspects of the EV,

it has been difficult for companies to gain a large market share in the gas-dominant automotive industry. To have a larger impact on this issue, the industry needs to first gain traction, then aim for a future of EV mass adoption, and then, idealistically, become the personal automobile norm for Canadian consumers. Therefore, we aim to investigate the tools and processes necessary to make the first step a reality while combating the barriers put in place by competitors and technological capabilities.

From a business viewpoint, one might be intimidated by the daunting task of changing consumer perceptions of the EV and of the traditional mobility dynamic that we have all come to know and understand: traveling from point A to B while filling the vehicle with gasoline when required. However, technological advancements are progressing exponentially every day, creating a ripple effect across the many disciplines surrounding the EV industry and making this unique convergence of mobility and data an incredible point in time [2]. To mitigate the barriers that EV companies are faced with, companies have created various sensors to elevate the consumer experience. "Cars are a major driving force of IoT with 1 in 5 or over 250 million cars by 2020 having wireless network capability and a diverse array of sensors" [3]. This can include built-in GPS, parking sensors, road condition monitoring, etc. The list goes on and is growing every day with the advancement of technological capabilities.

Interestingly, by placing additional sensors within the vehicle, this interconnectivity is now creating immense amounts of data through the communication of devices within the system. It is estimated that more than 2.5 quintillion bytes of data are created daily, and that 90% of the data that has ever existed was created within the last two years.4 This is mainly caused by a rise in the ability of everyday items to be interconnected: from your phone pairing with your speaker via Bluetooth to automated manufacturers communicating with their suppliers via the internet. This phenomenon is called the Internet of Things (IoT). From a consumer standpoint, the IoT evolution is allowing everyday devices to become seamlessly connected through the constant transfer of data created by each device. From a technological angle, an immense amount of information is being transferred device-to-device every second. This computerization of everyday objects, devices, and specifically automobiles, together with the ability to share data, has tremendous implications which, when harnessed correctly, can create exponential benefits for both the consumer and the business [4].

Now, this immense amount of data is useless unless a system is put in place that is able to rapidly collect, analyze, and interpret the information. This is where a properly integrated and capable big data analysis (BDA) system is essential. Data has three common characteristics: volume, velocity, and variety. These refer, respectively, to the sheer amount of information, the rapidity of its accumulation, and the variation in the types of data being collected [5, 6, 7]. Ecosystems exist that have the capability to handle petabytes of data and, by utilizing sophisticated architectures, can recognize complex data patterns and create consumer profiles based on transactional data. These abilities are not an exhaustive list of what such systems are capable of but are meant to illustrate the significance of their power [8, 9].

The ability to efficiently introduce a BDA system into one's business processes has been understood to realize new profit streams, better predict future market trends, and greatly improve processes for any enterprise. Having complete, accurate, and timely data supports businesses when making ambitious decisions because of the mitigated risk that the extra information adds. "Big data mining can provide a comprehensive means of considering data to discover latent information from the big data" [10]. Thereby, a properly integrated system can allow companies to make supported decisions based on consumer data. This has drastic implications across many industries, but what is common is that a shift from product- to service-oriented business models has begun, which will be discussed further.

The goal of this research is to gain an understanding of the realistic capabilities of integrating a big data system working in unison with a multitude of AI sensors within the car, together with interconnected EVs across Canada, and to investigate the impact this will have on consumers' perception, in hopes of ultimately causing the widespread adoption of the EV. Canada was chosen due to its specific geographic barriers, which may have resulted in heightened range anxiety in consumers, which in turn discourages EV adoption. The convergence of global warming, IoT, EV capabilities, technological advancements, and efficient big data systems makes this topic especially interesting when one realizes that the incredible advancements within the industry have resulted in Canadian EV sales that are little more than disheartening. By making the EV an extension of an IoT mechanism that can communicate with other vehicles and devices, the future of the automobile can become an efficient service rather than a traditional product. In conjunction with the previously stated goal, this research will give an understanding of the capabilities of decision-making, business model innovation, and service-focused EV efficiency, and interpret the findings to decide whether AI and the subsequent data analysis will be the spark that allows the EV to become the norm for new vehicle purchases or not.

This paper aims to produce (i) an in-depth description of the emerging technological advancements surrounding the EV industry and (ii) a scientific publication analysis on emerging topics such as IoT, business model innovation, vehicle interconnectivity, and big data analysis. It aims to progress research in the emerging field by combining these important topics and applying them to the EV industry specifically. The paper is structured as follows: Sect. 2 describes why these topics coming together is especially significant for the EV industry. Section 3 illustrates the systematic review of scientific literature that analyses the significance of AI/BDA/IoT in the slowly growing EV industry. The results are then presented in Sect. 4, where the exact research methods are further discussed. Section 5 then discusses the realities of these advancements specific to the EV industry with regard to Canada. After this, Sect. 6 discusses conclusions and possible future realities of the EV industry.

4 Amount of data per day—https://www.forbes.com/sites/bernardmarr/2018/05/21/how-muchdata-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#4315268560ba.
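Before moving on, the three data characteristics introduced above (volume, velocity, and variety) can be made concrete with a small sketch. The snippet below is purely illustrative and not part of the study: the telemetry records, field names, and aggregation logic are hypothetical, and it only shows how heterogeneous vehicle events might be folded into a simple per-vehicle profile of the kind a BDA system would maintain at far larger scale.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telemetry events of mixed type ("variety"); a real BDA pipeline
# would receive millions of these per second ("volume" and "velocity").
events = [
    {"vehicle": "EV-001", "type": "gps",     "speed_kmh": 62},
    {"vehicle": "EV-001", "type": "battery", "state_of_charge": 0.81},
    {"vehicle": "EV-002", "type": "gps",     "speed_kmh": 45},
    {"vehicle": "EV-001", "type": "charge",  "station": "YYZ-12", "kwh": 7.4},
]

profiles = defaultdict(lambda: {"speeds": [], "last_soc": None, "charges": 0})

for e in events:                       # streaming-style, single pass over the data
    p = profiles[e["vehicle"]]
    if e["type"] == "gps":
        p["speeds"].append(e["speed_kmh"])
    elif e["type"] == "battery":
        p["last_soc"] = e["state_of_charge"]
    elif e["type"] == "charge":
        p["charges"] += 1

for vehicle, p in profiles.items():    # simple per-vehicle "consumer profile"
    avg = mean(p["speeds"]) if p["speeds"] else None
    print(vehicle, {"avg_speed_kmh": avg, "last_soc": p["last_soc"], "charges": p["charges"]})
```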


2 Conceptual Background

The idea for this topic arose from contemplation of the multiple facets surrounding the Electric Vehicle industry. Climate change implications will not be researched beyond the fact that EVs are an excellent way to drastically reduce CO2 emissions when driven, compared to a conventional gas-powered vehicle.5 With barely 2% of total Canadian automobile sales in 2018 being EVs,6 why then, despite the growing need for cleaner transportation, is market share so low?

Transformative technologies need time for consumers to accept the change from the norm and grow to understand the advancement and its significance. Tesla, Inc. is today the world leader in commercial EVs, with product diversification allowing the company to gain market share in different classes, e.g., the Model X in the SUV/crossover class. Tesla has proved the technological capabilities of the EV and begun to change consumers' perceptions of this alternative-energy automobile. However, there are still overarching issues that threaten the mass adoption of EVs: price sensitivity, range anxiety, and overall perception. First, compared to traditional gas-powered vehicles, EVs are typically more expensive, and many people do not have the capital required to invest in these vehicles. Despite the significant savings on running an EV, due to the lack of a gas tank and attractive government subsidies in Canada, the price tag alone is much too high for many families and individual consumers. Secondly, the EV is still relatively close to the introduction and early-adopter stages according to the Diffusion of Innovations theory,7 from which it can be assumed that many are still unaware or nervous of the capabilities of the technology. People are worried that they won't make it to their destination, or worse, not be able to get back. An individual is not attracted to a product when they are unsure of its capabilities and view its purchase as a trade-off against the proven alternative. Typically, consumers are not willing to pay extra for something when a cheaper and more reliable option has stood the test of time and proven itself again and again.

The truth is, there are many reasons why consumers do not want to purchase an EV, but it all boils down to their overall perception of the vehicle. By proving to consumers that a product is much more attractive and capable than the alternative, the market can be stimulated. Therefore, if businesses can change consumer perceptions of what the EV is and what it is able to do, then sales and overall market share will increase. To prove that EVs are not just environmentally friendly, and to show consumers that they need not feel like they are making a sacrifice when purchasing one, businesses need to prove to their customers that an EV is capable of much more than the traditional gas-powered vehicle. Logically, the business needs to create a product that can do this, and this idea is not as far-fetched as one may think.

5 CO2 emissions for EVs—https://www.ucsusa.org/clean-vehicles/electric-vehicles/ev-emissions-tool.
6 See Footnote 2.
7 Diffusion of Innovations—https://pdfs.semanticscholar.org/7d42/c5ee80aaa3d85a9cbce192443d65e13ad324.pdf.


Fig. 2 Tracking solar price declines

With advancements happening within the industry every day, resources are becoming cheaper and cheaper, allowing for a lower price tag on a new EV. Take the cost per watt of solar power, for example (see Fig. 2): in just four years, prices dropped almost 21%, which can save households thousands per year.8 Prices are continually lowering with the advancement of new technologies, allowing EV companies to gradually reduce their costs and slowly increase their market share over time. The abilities of battery technology will, too, progressively become on par with the range of gas-powered vehicles and eventually surpass them. However, businesses can do much more than simply wait patiently for the capabilities to catch up.

Companies have begun to implement a plethora of interconnected devices within vehicles to elevate the consumer experience and perceived value. From parking sensors, auto-pilot, and hands-free calling to the internal computer dashboard, the list of AI and computerized processes is seemingly infinite. The total amount of extras will most likely not be the deciding factor that leads to the mass adoption of the EV, as these advancements may seem like gimmicks to many consumers; however, the more devices that are utilized, the more data that is accessible. It follows that with more data comes more insight, but there must be a capable system in place that can handle immense amounts of data at incredible speeds. It has long been understood that a business that successfully integrates a big data analysis (BDA) system can gain insight on consumers and better predict market trends. However, due to monetary constraints and technical limitations, many companies, other than the likes of Google and Apple, are unable to utilize this technology. The terms Big Data and real-time analysis have been thrown around and unfortunately can be blown out of proportion, making it increasingly difficult to form unbiased views of BDA systems' capabilities, the rise of IoT, and the computerization of cars [11].

What makes the convergence of BDA and business management so interesting is that it does not only benefit a company's bottom line, but also significantly elevates the consumer experience. It aims to provide insight for businesses across a multitude of applications while providing consumers with an integrated ecosystem that drastically changes the perceived value and expectations of what a vehicle can do. As stated before, companies should aim to change consumers' perceived value of the EV. Therefore, an investigation of the advancements in other technologies surrounding the industry (i.e., AI sensors, BDA, cloud computing, IoT, etc.) is required so that one may understand their realities and limitations and decipher whether they can be integrated to accelerate the mass adoption of the EV or not [12]. The purpose of this research is thus to investigate the capabilities of AI and BDA systems working in synergy with EVs to elevate the consumer experience above what the traditional gas-powered vehicle is capable of, and to accurately convey the significance of these advancements as they pertain to the EV industry. To gain academic support and to draw accurate conclusions, scientific research was conducted via a literature review on topics surrounding these emerging techniques and technologies so that the realities can be well supported and thorough [13].

8 Figure 2—https://news.energysage.com/solar-panel-efficiency-cost-over-time/.

3 Methodology

The academic framework for this paper was constructed through the review of publications related to Electric Vehicles, Marketing/Business Management, and Big Data Analysis. The aim was to connect well-researched areas, make inferences, and draw conclusions so that the ideas discussed are specific to the EV industry. In this section, we explore the systematic analysis and the step-by-step procedure followed to create a relevant publication database and produce the strongest academic foundation possible. Table 1 gives an overview of this process.

3.1 Online Query for Relevant Publications

Articles and relevant papers were found using five online databases that are best known for their depth of documents in the information systems field: ACM, EBSCO, Google Scholar, IEEE, and SpringerLink. These databases were chosen so that a bridge could be formed between management information systems (MIS) and the computational engineering that makes progress in big data analysis possible. Although Google Scholar is not technically an MIS- or computer-science-specific database, the sheer accuracy of the search engine and the overall quantity of relevant academic papers found using it were imperative to the success of the research.


Table 1 Overview of the step-by-step process taken for the systematic academic paper review

Step number | Title | Described in | Activities | Results
1 | First online searches for relevant sources | Section 3.1 | Searched in 5 databases with variations of 6 search terms | 877 hits
2 | Loose data cleansing and narrowing publication sample | Section 3.2 | Eliminated duplicates and irrelevant papers based on titles and abstracts | 74 hits
3 | Text review of preselected publications | Section 3.3 | Skim of full texts for accuracy | 61 hits
4 | 2nd draft—full-text review of remaining publications | Section 3.4 | Full-text analysis for information harvesting | 42 hits
5 | Final text review | Section 3.5 | Quick reading of full text: trends, quotes, and commonalities were noted for future analysis | 18 hits

To find a wide variety of relevant papers, the five online databases were searched using variations of six keywords or phrases: 'electric vehicles', 'value creation', 'big data', 'real-time big data analytics', 'AI', and 'marketing'. It was necessary for most searches to add 'electric vehicles' to each combination, as sources that are industry-specific make the interpretations and conclusions drawn much more feasible and relevant. Further research was conducted for extra information for the conclusions and analysis sections; however, those sources are included in a separate references page. This includes searches for climate change, various emissions report cards, the growth of the EV industry for specific countries/companies, etc. Using the previously listed terms, a total of 877 hits were found across the databases: ACM (73), EBSCO (94), Google Scholar (634), IEEE (25), and SpringerLink (51). See Table 2. For each search engine, excluding IEEE, it was necessary to take unique steps to ensure the accuracy of the resources found, which include the following:
1. ACM: required searching for exact words surrounded by quotation marks; otherwise, an infeasible number of resources was presented.
2. EBSCO: the searches excluded hits that were under the subgenres of "News" and "Products", leaving only "Resources" to ensure that only academic findings were presented.
3. Google Scholar: negative statements were required to exclude papers about smart grids and manufacturing, which are essential to the EV industry but not to this specific analysis.
4. SpringerLink: the content type was constrained to "Business and Management" articles, necessary to avoid strictly computer science-related papers with a strong mathematical approach to the research.
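As a side note, the quoted search strings summarized in Table 2 can be assembled programmatically. The short sketch below is a hypothetical reconstruction of that step and not the tooling actually used in the study; only the term combinations themselves are taken from Table 2.

```python
# Hypothetical helper that rebuilds the quoted query strings listed in Table 2.
term_sets = [
    ["electric vehicles", "marketing", "AI"],
    ["electric vehicles", "value creation", "big data"],
    ["electric vehicles", "value creation", "AI"],
    ["electric vehicles", "consumer analytics", "marketing", "AI"],
    ["electric vehicles", "marketing", "real time big data analytics"],
    ["electric vehicles", "big data", "AI", "marketing"],
    ["real time big data analytics", "marketing", "value creation"],
    ["electric vehicles", "big data analytics", "value creation", "AI"],
]

def build_query(terms):
    # Exact-phrase quoting, as required for the ACM search described above.
    return " + ".join(f'"{t}"' for t in terms)

for terms in term_sets:
    print(build_query(terms))
```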


Table 2 Retrieved hits per source and search term

Search terms | ACM | EBSCO | Google Scholar | IEEE | SpringerLink | Totals
"Electric vehicles" + "marketing" + "AI" | 55 | 18 | 483 | 12 | 35 | 603
"Electric vehicles" + "value creation" + "big data" | 2 | 53 | 48 | 5 | 9 | 117
"Electric vehicles" + "value creation" + "AI" | 0 | 23 | 34 | 0 | 4 | 61
"Electric vehicles" + "consumer analytics" + "marketing" + "AI" | 0 | 0 | 2 | 0 | 0 | 2
"Electric vehicles" + "marketing" + "real time big data analytics" | 2 | 0 | 6 | 4 | 1 | 13
"Electric vehicles" + "big data" + "AI" + "marketing" | 14 | 0 | 45 | 0 | 1 | 60
"Real time big data analytics" + "marketing" + "value creation" | 0 | 0 | 6 | 4 | 1 | 11
"Electric vehicles" + "big data analytics" + "value creation" + "AI" | 0 | 0 | 10 | 0 | 0 | 10
Totals | 73 | 94 | 634 | 25 | 51 | 877

3.2 Data Cleansing and Publication Selection

The total of 877 hits then went through a data-cleaning process to narrow the academic database down to a much more precise and manageable number. All online findings were first skimmed by title. Papers that were even slightly intriguing and informative then underwent an abstract evaluation for overall relevance. At the end of this, 74 publications were chosen to become part of the database used for referencing and research. The remaining 803 hits were disregarded, as their overall topics were deemed too distant from the thesis focus.


3.3 Full-Text Review of Preselected Publications

The remaining 74 publications were given a loose review of the full text against specific criteria that needed to be met. They are as follows:
1. The paper's theme remains centered around AI and big data.
2. The paper's theme remains centered around marketing and business models.
3. The paper's theme remains academically supported and discusses the realities of AI and BDA.
4. The paper is an original piece.

After the review, 13 publications were discarded from the sample because they strayed too far from these criteria. Despite focusing on big data in the EV industry, many papers were eliminated because their common theme centered around manufacturing and AI applications for efficiency; although this is important for the widespread adoption of EVs, it strays too far from the thesis topic at hand and from marketing in general. Other papers had to be ignored because their focus was the implementation and importance of AI for the development of smart grids for charging EVs as efficiently as possible; these were not used, as their applications were mostly aimed at the development of smart cities and more efficient EV infrastructure. Although important and seemingly correlated, for marketing and accuracy purposes these papers were avoided. The remaining 61 publications made up the final database, all of which was used in the creation of this analytical paper.

3.4 2nd Draft—Full-Text Review of Remaining Publications

Sixty-one publications were left after the initial database review. These were all given a much more thorough review for crucial information to be used in the core of this paper's analysis. The papers that were discarded met one or more of the following criteria:
1. The contents and genre of the subtopic were deemed irrelevant to the research (i.e., POS systems, supply chain management, etc.).
2. Irrelevant for the EV industry.
3. Much too specific on big data architecture (i.e., JavaScript, server requirements, etc.).
4. Focused on fully automated driving systems (i.e., computerized fleet traffic control).

The four criteria listed were derived through the full-text analysis from the commonalities of the discarded papers, to ensure the paper did not stray too far, which would have led to a vague description of many findings rather than the desired opposite. The 19 papers that were not used in the final analysis did, in fact, describe crucial aspects of the EV industry, including AI integration and BDA, but were ultimately deemed unnecessary or irrelevant for the desired topic.


All 61 documents were evaluated by one individual, allowing for consistency in the fundamental selection process and further ensuring accuracy; however, this limited the ability to add more research papers due to time restrictions.

3.5 Final Text Review

After the full-text analysis conducted in Sect. 3.4, another accelerated skim of the 42 remaining papers was conducted. The purpose was to give the papers a quick second look to find:
1. The commonalities or overarching themes that connect the papers.
2. Papers written with above-average accuracy for the chosen topic, which will be used heavily throughout the analysis section of this paper (see Sect. 5.1).
3. Specific quotes that will be used to support later arguments.

The aim in creating this simple database was to keep a running tab of which papers contain the same themes as others but add very interesting insight. These specific documents will be referenced much more frequently than the others, as they offer compelling insights and/or quotes. A total of 18 of these papers were utilized and will be discussed in Sect. 5.1.
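The five steps above form a simple screening funnel. The sketch below is only a hypothetical illustration of that funnel: the records and boolean flags are invented stand-ins for the manual title, abstract, and full-text judgements described in Sects. 3.2 to 3.5, and only the idea of stage-by-stage filtering is taken from the methodology.

```python
# Illustrative screening funnel; each stage filters the previous stage's output.
def screen(papers, stages):
    kept = papers
    for name, keep in stages:
        kept = [p for p in kept if keep(p)]
        print(f"{name}: {len(kept)} papers kept")
    return kept

# Toy records; in the study the input was the 877 retrieved hits.
papers = [
    {"title_relevant": True,  "meets_criteria": True,  "core": True,  "key": True},
    {"title_relevant": True,  "meets_criteria": True,  "core": False, "key": False},
    {"title_relevant": False, "meets_criteria": False, "core": False, "key": False},
]

stages = [
    ("Title/abstract screening (Sect. 3.2)",  lambda p: p["title_relevant"]),
    ("Loose full-text review (Sect. 3.3)",    lambda p: p["meets_criteria"]),
    ("Thorough full-text review (Sect. 3.4)", lambda p: p["core"]),
    ("Final text review (Sect. 3.5)",         lambda p: p["key"]),
]

final_database = screen(papers, stages)
```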

4 Results

Four analyses were completed, and the results are described in the following subsections: Sect. 4.1 investigates the results of the quantitative analysis, and Sect. 4.2 presents the researched context attributes. First, the dominating disciplines that are leading the research and having the greatest impact on the EV industry are analyzed. This aims to give an understanding of the fields most responsible for solving the issue of slow EV adoption. Next, in Sect. 4.2, a list of important aspects derived from the key findings of the created publication database is presented. These aspects are divided into technological (Sect. 4.2.1) and non-technological (Sect. 4.2.2) aspects, in hopes of understanding what can be realized when effective systems utilize information brought to light by data analysis. This aims to provide a realistic overview of the advancements that are crucial to bringing the end goal, increasing EV market share, to reality (Table 3).


Table 3 Publication classification and challenges No.

Title

Challenge

1

A quantitative research on marketing and sales in the artificial intelligence age [16]

The impacts on the Literary review/case marketing and sales study/data analysis industries with the rise of AI. Talks job loss, ass well as the advancements AI allow marketing purpose data analysis

2

A study of China’s explosive growth in plug-in electric vehicle market [17]

A full case study of Literary review with Market research the Chinese EV a heavy focus on industry mainly case study focused on the EV consumer behavior for the help with Canadian market analysis

3

A survey of deep learning: platforms, applications, and emerging research trends [18]

The impacts of deep Literary review/case learning and the study and survey realistic applications were analyzed to get a better understanding wat the deep learning is and what is capable of

4

A user study on station Electric car-sharing Consumer study based EV car sharing has become popular, in Shanghai [19] and this study aims to shed some light on the successful implementation and following consumer perceptions

Business management

5

Addressing electric vehicle (EV) sales and range anxiety through parking layout, policy and regulation

Market research

Range anxiety is one of the most influential barriers preventing the widespread adoption of EV’s. Overall better infrastructure improvements and their impacts are analyzed

Methodology

Literature review/case study/cause and effect of infrastructure development

Field/discipline Market research

Computer science


6

Analysis of a consumer survey on plug-in hybrid electric vehicles

EV’s are facing Consumer survey battery and pricing analysis barriers preventing the adoption of the technologies, despite consumer interest. The survey aims to gain insight

Methodology

Field/discipline

7

Beyond the hype; big Mainstream data concepts, journalism has caused methods, and analytics the idea of big data to blow out of proportion, this paper analyses what is reality and what fantasy

Literature review/case study

Business management

8

Big data analysis—Hadoop performance analysis

Big data analysis is becoming crucial to harness the ever-increasing amount of data through connected devices. Hadoop is described in great detail to provide realistic expectations of this BDA software’s abilities

Thesis on BDA architecture

Computer science

9

Big data in building energy efficiency: understanding of big data and main challenges

Building efficiency can be greatly increased with the proper analysis of consumer consumption levels

Literature review/case study/mathematical approach

Engineering

10

Big data: issues, challenges, tools and good practices

Data volume is increasing exponentially and with the proper implementation of AI big data analysis system can provide true insights

Literature review/case study/situational implementation

Business management

Market research


11

Big data ushering new The volume of Literary review vistas in market personal data has research caused a shift in marketing research techniques, aiming to boost company profits as well as customer satisfaction and engagement

Challenge

Methodology

Field/discipline Market research

12

Business models for sustainable technologies: exploring business model evolution in the case of electric vehicles

The acceleration of new technologies has caused a shift in business models for successful market penetration despite a complex and constantly evolving market

Literary review/case study

Business management

13

BYTE: understanding and mapping big data

Aims to discuss the impact of AI and big data analysis across various industries

Qualitative research/case study

Business management

14

Cities of tomorrow

Discusses the synergy Qualitative of big data and research/case study energy-efficient cities will provide a major boost in society with regards to productivity and life quality

Engineering

15

Comparative analysis of infrastructure: hydrogen fueling and electric charging of vehicles

Analyzes the logistic of implementing various zero-emission vehicle charging stations and the proceeding benefits

Qualitative research/case study

Engineering

16

Competing and coexisting business models for electric vehicles: lessons from international case studies [20]

The EV industry faces Qualitative unique challenges research/case study requiring innovation and creativity to succeed. Four international cases are studied for better understanding

Business management


17

Data-driven business models in connected cars, mobility services and beyond

With the rise of data-driven business models, this paper aims to investigate innovation during the IoT evolution specific to the automotive industries

Qualitative research/case study

Business management

18

Discovering a new digital business model types a study of technology startups from the mobility sector [21]

Aims to analyze successful startups and their respective business model innovations with regards to technological advancement

Qualitative research/case study

Business management

19

Editorial marketing science and big data

Marketing science is ever-evolving and with the rise of IoT. AI and BDA, the discipline will see exponential growth in the upcoming years

Qualitative research/case study

Market research

20

Forecasting sales in the supply chain: consumer analytics in the big data era

Through the analysis Qualitative of big data, consumer research/case study behavior and supply chain management can be understood even at the point of sale and during the in-store purchase experience

Business management

21

E-mobility services: new economic models for transport in the digital economy

By analyzing the Paris-based car-sharing service, the paper discusses the technological advancements that allow for new business models to emerge

Literature review with focus on case study and its implications

Market research

22

Electric car brand positioning in the automotive industry: recommendations for sustainable and innovative marketing strategies

Using Tesla as a case study, recommendations based on past accomplishments and potential future revenue streams are analyzed

Qualitative research/case study

Business management


23

Everything you wanted to know about smart cities [22]

The rise in greenhouse Qualitative gasses and energy research/case study inefficiency within cities has shown that the need for a perfectly connected city; the benefits of a smart city are immense

Methodology

Engineering

24

Flexible mobility on demand: an environmental scan

The impacts of the rise Qualitative in IoT and AI analysis research/case study as it pertains to mobility (shared or as a service). Specific cases are analyzed

Engineering

25

Holistic methodology to analyze EV business model [23]

A mathematical Mathematics approach was used to gain insight into the emerging business models in the EV industry with the introduction of AI and IoT

Engineering

26

Increasing the competitiveness of E-vehicles in Europe [24]

Using Austria and Qualitative Norway, the paper research/case study analyzes the practicality and limitations of an EV for various consumers and lists their means of compensation to gain widespread market adoption

Market research

27

Intelligent efficiency Aims to investigate technology and market the effectiveness of assessment smart technologies in the energy sector by using four main industries and their experiences thus far

Qualitative research/case study

Field/discipline

Engineering


28

Introduction to the special issue on exploring service science for data-driven service design and innovation

Discusses the capabilities of properly integrating big data into a business services in aims to elevate efficiency and consumer experience

Qualitative research/case study

Engineering

29

ISG provider lens: internet of things, U.S. market-quadrant report [25]

Realizes the potential, Qualitative from a business research/case study perspective, of a properly integrated IoT ecosystem in various industries

Business management

30

Local design and global dreams—emerging business models creating the emergent electric vehicle industry [26]

Discusses business models in the existing automotive industry and ways companies can create value for consumers in the emerging EV market

Literature review

Business management

31

A data-driven approach for characterising the charging demand of electric vehicles: a UK case study

Analysis of charging data has been used to get a deeper understanding of consumer charging behavior and the subsequent demand for a larger infrastructure

Qualitative research/case study

Engineering

32

MapReduce: simplified data analysis of big data

Taking a mathematical Mathematical and computer science analysis approach, the realities of efficiently analyzing extremely large databases are shown to be a growing problem, requiring elaborate solutions

Engineering

33

Methods to identify user needs and decision mechanisms for the adoption of electric vehicles

To explore the Quantitative consumer’s perceived research limitations of the EV a focus group was created to analyze their underlying shortcomings in the automotive industry

Market research


34

Mining big data in real-time

Explores the existing programs and their respective challenges, that allow for BDA to gain instant information to make better business decisions

Qualitative research/case study

Engineering

35

On the distribution of individual daily driving distances [27]

The variation of daily Quantitative kilometers driven for research EVs is much too difficult to analyze. The paper aims to develop a method to correlate daily driving distances to normalized density—uses four different cases for analysis

Engineering

36

On the move towards customer-centric business models in the automotive industry—a conceptual reference framework of shared automotive service systems [28]

The automotive Literature industry is review/case study experiencing unique transformations due to the rise in digitization. The subsequent business model shifts are investigated

Business management

37

Real-time big data analysis: an emerging technique [29]

Architecture for RTBDA is necessary for specific industries to remain as efficient and functional as possible. These applications and their models are explored

Computer science

38

Review of big data and big data mining for adding value to enterprises

Big data analytics is Literature review still a relatively new phenomenon, and without advancements in technology, the business will continue to make biased decisions based on incomplete sets

Qualitative research/case study

Engineering


39

Smart automation, customer experience and customer engagement in electric vehicles

Electric vehicles have Literature review little traction in the North American market, however, with the rise of interconnectivity, the shift of product to service orientation allows for a better consumer experience, thereby increasing market share for EV’s

Methodology

Business management

40

Teaching big data management—an active learning approach for higher education

The importance of BDA is being more and more realized daily creating a need for formal teachings for developing professionals in the emerging field

Qualitative research/case study

Business management

41

Vignettes in two-step arrival of internet of things and its reshaping of marketing management’s service-dominant logic [30]

Rising IoT Qualitative technologies impact research/case study the service dominating logic in marketing. The paper aims to revise this logic due to arising IoT innovations

Business management

42

When big data meets dataveillance: the hidden side of analytics

Big data can be extremely useful for companies who are able to extract it efficiently and make correct inferences, however, there are various privacy issues that need to be understood

Business management

Literature review


4.1 Chosen Publications and Respective Research Disciplines

When working in unison, AI and BDA systems within vehicles can create immense advancements leveraging all disciplines within the business model (marketing, management, etc.). In this section, the fields conducting the most relevant research are analyzed to understand whether the research is mainly technologically dominated or viewed from a business perspective. To understand the dominating disciplines researching these topics, the chosen publications for this paper were classified by field. To realize the full potential of these technologies, it is essential to have many professions investigate the advancements: if only computer engineering research were conducted, the business implications could be much too expensive or infeasible, and vice versa. This is beneficial to the paper as it allows for a wider variety of information so that a better-supported conclusion can be made on the realities of BDA and AI. Interestingly, the engineering and computer science papers are fundamental in this analysis, as these disciplines bring these fantastical ideologies to reality for the EV industry and its businesses (Fig. 3).

With a business-dominated database, the research was more heavily focused on business model innovation. However, the engineering and computer science disciplines were recognized and taken into consideration in the discussion process. This allowed for a well-rounded approach in finding the information required to draw realistic conclusions. Creating a database with research from different academic backgrounds was crucial for further analysis and helped provide a much more thorough and unbiased paper.

Fig. 3 Running total of each publication and its respective discipline


4.2 Researched Context Attributes

Aside from advancing the technology within the vehicles, incentives and innovations surrounding the EV can create an enticing ecosystem that is not only functional but user-friendly [31]. To further bring the realities of these technologies to the surface and to fully understand the significance of these advancements, it is first required to know exactly what each technology is. Section 4.2 therefore provides definitions and the immediate capabilities of each individual technological advancement and/or process. First, in Sect. 4.2.1, the supportive technological advancements are distinguished; then, in Sect. 4.2.2, the incentives and non-technological advancements that benefit the industry, or that evolved because of these advancements, are discussed.

4.2.1 Technological Aspects

This section aims to provide a short introduction to the various findings within the database analysis. An understanding of each individual concept and aspect is necessary for the full impact of Sect. 5 to be realized.

• Big Data—this term is used to categorize datasets of immense size and complexity [32], created in part due to the rise of IoT connectivity and other new technologies. It requires sophisticated architectures to derive value and information by sifting through the data.

• Big Data Mining—"refers to the activity of going through big datasets to look for relevant information" [32]. Due to the rising amounts of data being created in our everyday lives, businesses have realized the potential of collecting all information from their enterprise and consumers to interpret previously unseen patterns and trends. By systematically analyzing this data, future predictions and decisions can be better executed thanks to a deeper understanding of what is happening in real time; the problem is that the systems must deal with the volume, variety, and velocity of the data. These are incredible barriers to overcome; however, notable big data analysis ecosystems have been created that are very efficient in their domain, Hadoop being an excellent resource for BDA.

• Real-Time BDA—a term that refers to the implementation of a perfect BDA mechanism that works in "real time", meaning that the big data mining architecture in place has the ability to store, sort, analyze, and interpret data almost as soon as it is collected. This is looking towards the future and is, as of now, meant as an end goal for BDA; however, the implications for both technology and businesses are immense with regard to efficiency and accuracy.

• Variety—when communication between two devices is conducted, the data created is specific to those participating parties, which in turn creates a massive variation in the types of data being transferred. For example, video, audio, and transactional data will all require their own specific analysis, since the medium is different for each communication. Big Data aims to create systems that allow any data created to be analyzed, regardless of its form [33].


• Volume—with an estimated [34] sextillion bytes of data in existence by 2020 [35], the requirement for an efficient warehouse to handle this quantity of data is ever increasing. Since data can come from a variety of sources, the architecture in place needs to use sophisticated mechanisms to contain this data and later create value from this information. Big Data technologies aim to do exactly this [33].

• Velocity—the rise of IoT has opened the gates for an almost unfathomable amount of data flowing between devices every second. No matter the organization type, there is information to be found in the systematic analysis of all this data; the real challenge, however, is creating a system that can work at incredible speeds to find insights in real time [33].

• Hadoop—Apache Hadoop utilizes programming models to process big datasets efficiently; it is a BDA architecture. Created in 2006, this architecture aims to handle datasets of various sizes and complexities for many applications at low cost and high efficiency. Due to its ability to scale itself depending on the dataset size, it has become the architecture of choice for companies such as LinkedIn and Facebook [33].

• IoT—"is the core connectivity across networks, systems, data, and objects" [36]. Ranging from daily usage by consumers to manufacturers and their suppliers, this interconnectivity of devices is creating an ecosystem with incredible abilities and data. From a business perspective, one can create a more integrated, consumer-friendly experience for one's clients, all while creating data points that, when analyzed, can provide amazing insight. What is most interesting here is that IoT has seemingly become the norm for the modern consumer, as it can elevate their perception of the overall product and its subsequent benefits, and it allows for incredible transformations for businesses [36].

• Connected Cars—with the rise of IoT and the advancement of various sophisticated sensors on the vehicle, in conjunction with a car's ability to be connected to the internet, this term implies that when cars are interconnected with each other, the sharing of data is mutually beneficial for the user and the business. A plethora of information shared with vehicles that have access to the cloud covers driving conditions, weather quality, available parking, etc. The number of applications that can elevate the vehicle's ability to work as efficiently as possible is seemingly endless when crucial information is accessible and analyzed instantly by each car.

• V2V—vehicle-to-vehicle (V2V) interconnectivity is real-time communication "with other vehicles through dynamic wireless data exchange" [37]. This allows cars to have access to information gathered from every connected vehicle, potentially improving safety and increasing driving efficiency. "V2V systems let vehicles exchange information such as tire pressure, speed, and GPS location …" [37], which can allow for collective analysis of real-time traffic reports and has the potential to reduce traffic congestion and emissions.

• Deep Learning—"… have the ability to learn from stochastic data and recognize trends, such that machine learning-based systems have been widely developed for market prediction" [38]. Deep learning refers to the actual information gathered from big data mining, making it arguably the most strived-for of all technological advancements found within the database analysis. The open-ended questions that it can answer and the complex problems it can theoretically solve make this a very attractive technology [32]. With further development of various sensors and integrated computerized processes, the accuracy and importance of deep learning will continue to increase.

• Charging Station Infrastructure—refers to the public charging stations available to all EV owners. To lower overall consumer range anxiety, it is crucial for cities to provide sufficient, strategically placed charging stations to further incentivize consumers to adopt the new automotive technology. Many charging stations are free to use; however, more powerful fast-charging stations have been implemented for a small fee.
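To make the map-reduce programming model behind Hadoop-style processing tangible, the sketch below applies the same map/shuffle/reduce pattern in plain Python to a handful of invented charging-session records. It illustrates only the pattern, not Hadoop itself, and all station names and values are hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Toy charging-session records; a Hadoop job would read these from a
# distributed file system instead of an in-memory list.
sessions = [
    {"station": "Toronto-01",  "kwh": 18.2},
    {"station": "Montreal-03", "kwh": 7.5},
    {"station": "Toronto-01",  "kwh": 11.0},
    {"station": "Calgary-02",  "kwh": 22.4},
]

# Map step: emit (key, value) pairs, one per record.
mapped = [(s["station"], s["kwh"]) for s in sessions]

# Shuffle step: group values by key.
groups = defaultdict(list)
for station, kwh in mapped:
    groups[station].append(kwh)

# Reduce step: aggregate each group independently (on a cluster, in parallel).
totals = {station: reduce(lambda a, b: a + b, kwhs) for station, kwhs in groups.items()}
print(totals)  # e.g. {'Toronto-01': 29.2, 'Montreal-03': 7.5, 'Calgary-02': 22.4}
```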

4.2.2 Non-technological Aspects

There have been multiple developments that either resulted from these technological advancements or simply aid in the adoption of the EV. Below, these innovations and incentives are explored.

• Government Incentives—these include any monetary incentives or social benefits provided by a government to make choosing an EV as a consumer's mobility medium of choice more attractive. "The introduction of a new technology and departures from (internal combustion engines) could require large subsidies and investments as well as a political commitment, a situation more generally found for environmental technology" [39]. Notably, Norway has implemented financial and tax incentives as well as free 24/7 parking for all EVs and bus-lane access for bypassing traffic jams, to further incentivize consumers to adopt the technology.

• Business Model Innovation—refers to the process of businesses utilizing technological advancements to their advantage in new and innovative ways. With new technologies come new opportunities for companies to find revenue streams and emerging markets. As with IoT advancements, AI has allowed data-driven business models to be implemented with success, e.g., Uber. These emerging techniques have allowed companies to strategically shift their services to utilize these technologies to the fullest. There are many business model innovations specific to the EV industry, but it is still unclear what the correct model is or whether it even exists [40].

These two advancements are especially powerful because they act as a connection between engineered technologies and the public eye. Without businesses in place, there would be no competitive market constantly innovating in search of profits, yet businesses in turn can bring the mass adoption of EVs to reality. In conjunction, the non-monetary government incentives are very powerful, as they provide highly visible social benefits that can act as word-of-mouth to convince people to consider investing in an EV.


5 Discussion

Due to the sophisticated nature of the growing EV industry, it is essential to draw connections between the aspects described in Sect. 4.2 to gain a comprehensive grasp of the significance of these advancements. Alone, these technologies can slowly alter or advance any industry that utilizes them; however, they are capable of creating benefits much greater than the sum of their parts when working in synergy. For this reason, Sect. 5.1 will analyze the convergence of certain aspects, and Sect. 5.2 will then apply these advancements specifically to the Canadian EV industry.

5.1 Core Findings from Qualitative Research

The following discussion of core findings was heavily influenced by the analysis of 18 specific papers, as discussed in Sect. 3.5. These publications covered the crucial topics for this paper's research and, as a result, will be referenced heavily in the sections to follow. The remaining 24 publications in the database were not discarded; however, the 18 chosen papers covered the main findings in detail and were executed with an accuracy that was beneficial to this analysis. The aspects and terms previously explored in Sect. 4.2 all work in unison to elevate the overall EV industry; however, without further analysis it is difficult to distinguish reality from fantasy when it comes to their implications. For this purpose, the conjunction of specific previously stated topics that are especially interesting will be explored further.

5.1.1 V2V, Interconnectivity, and Business Management

This pertains to vehicles having access to driving information from other connected vehicles, acting as a fleet. An immense number of sensors have continuously been implemented and have become more and more efficient as technological advancements continue, which, as previously stated, creates insightful data. The idea of interconnected vehicles would allow any EV in the fleet to have access to real-time driving data [41]. For example, if a vehicle is experiencing abnormal stoppages on a route, another vehicle with a similar route can gain this information in real time and automatically adjust to the next most efficient route. Although Google Maps has been able to predict driving patterns and traffic flow, this further advancement would be more accurate and updated seemingly instantaneously. Driver-assistance technologies also have incredible capabilities, including self-driving cars and an interesting technique called “Cooperative Adaptive Cruise Control,” or CACC. These are much further away from mass adoption; however, the implications are compelling. CACC is a V2V system in place for connected vehicles, allowing them to intercommunicate and work as a platoon with a lead car


setting the pace and the others following closely behind at above-average speeds [42]. Interestingly, the more interconnected cars utilize this technology, the more cars a highway can handle at a given time, due to the increased efficiency created. These technologies have been shown to reduce fuel consumption [3]; however, the reality is that widespread use of this technology is still much too far off to be a focus for increasing EV market share. For consumers, interconnectivity means that they can fully utilize their EV's currently limited battery range because the distances driven are maximized. “…vehicles to be electric and intelligently connected to other vehicles and infrastructure (can) reduce traffic accidents, traffic congestions and reduce the discharge of environmental pollutants” [32]. If a vehicle has immediate and perfect knowledge of real-time traffic and road conditions, then the most efficient route will always be chosen, which in turn synthetically boosts the ability of the battery. Therefore, the more vehicles that send and receive information, the more efficient the interconnectivity becomes. Smart systems in place can control traffic through GPS routes, as complete knowledge of the routes chosen allows the V2V interconnectivity to allocate vehicles and control road congestion for those in the fleet, further increasing traffic efficiency. Consumers would be able to reach their destinations as quickly as possible without taking the risk of excessive speeding, further progressing the alteration of consumer perceptions of the EV. This allows businesses to accept the limited ability of current battery technologies while still working to alleviate the consumer's range anxiety. They can now utilize these abilities to communicate this information strategically to consumers in an effort to stimulate the market. There are a multitude of applications for interconnected cars, but they all aim to maximize efficiency and elevate the consumer experience. Conveniently, these are two characteristics especially attractive for businesses trying to increase market share. Interestingly, the data created by these vehicles can help in finding new profit streams as well. For example, driving patterns can be analyzed to find bottlenecks in the current traffic infrastructure. Businesses can then sell this information to city planners and construction companies, allowing them to visualize problems with current techniques and designs and improve their future decision-making. This benefits all parties within the system as it maximizes driving efficiency, creates new revenue streams, and advances future road infrastructure for the consumer, business, and city planning teams, respectively. The applications for V2V are endless, but ultimately it allows companies to tackle the range anxiety held by consumers, justifies the current above-average prices of EVs, and creates an uncompromised, efficiency-boosting ecosystem with the consumer as the focal point.
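As a rough illustration of how shared, real-time observations could let every vehicle in a fleet re-plan its route, the sketch below keeps a common table of observed travel times per road segment and lets each vehicle pick the currently fastest of several candidate routes. The road network, segment names and travel times are invented purely for illustration; an actual V2V system would rely on wireless data exchange between vehicles rather than a shared Python dictionary.

```python
# Shared "fleet knowledge": latest observed travel time (minutes) per road segment.
segment_times = {"A-B": 5.0, "B-D": 7.0, "A-C": 6.0, "C-D": 4.0}

# Candidate routes from A to D, expressed as lists of segments.
routes = {"via B": ["A-B", "B-D"], "via C": ["A-C", "C-D"]}

def route_time(route):
    """Total travel time of a route under the current shared observations."""
    return sum(segment_times[s] for s in route)

def best_route():
    """Pick the route that is currently fastest according to the fleet's data."""
    return min(routes, key=lambda name: route_time(routes[name]))

print(best_route())            # 'via C' (10 min vs 12 min)

# One connected vehicle reports an abnormal stoppage on segment C-D ...
segment_times["C-D"] = 15.0

# ... and every other vehicle immediately re-plans using the shared observation.
print(best_route())            # 'via B'
```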

5.1.2 Charging Station Infrastructure and BDA/Hadoop

Hadoop is a BDA architecture capable of handling petabytes of data with excellent pattern recognition analysis allowing for accurate and efficient information mining within the endless stream of data, which comes because of IoT and the computerization of vehicles. In continuing the theme of changing consumer perceptions, this


technology can be utilized by companies to harness driver information and apply it for regional marketing. Hadoop is capable of setting parameters within the system, which in turn would allow for geolocation analysis, giving one the ability to analyze target markets with much more accuracy than previously possible. With a complete analysis of datasets, it becomes possible to gain information about specific target markets, which would mitigate risk for businesses and shed light upon new revenue streams. Therefore, analyzed big datasets can aid in better decision-making. The goal for companies is to increase market share for the EV, which becomes increasingly difficult the further one travels outside of a city. This is because there is a lack of charging station infrastructure available due to high costs and low population density. With GPS navigation data being accumulated through interconnected cars, Hadoop can step in and analyze the driving data created. Previously, it was stated that this information could be applied to make driving much more efficient in the hope of maximizing the currently limited battery abilities. However, there is another crucial way in which this data can be applied to further change consumer perceptions—creating a maximally efficient charging station infrastructure. The unification of Hadoop and charging station infrastructure is especially important for addressing consumer range anxiety. “…the number one concern with purchasing EVs was the range and the second the ability to charge” [43]. Consumers are worried that they can drive to work but will be unable to return on a single charge, making an efficient charging station infrastructure essential. An analysis of driving patterns can give an understanding of the most popular routes and even determine where the most strategic locations for charging spots are. Therefore, through data, a company could gain a bird's-eye view of specific geolocations that are potentially attractive markets but lack a proper charging station infrastructure. Without an adequate number of charging stations readily available, a consumer will be discouraged from investing in an EV due to heightened range anxiety. To further incentivize consumers to purchase an EV, there need to be tangible driving assets conveniently accessible. The end goal is to lower range anxiety in consumers: it is crucial that they perceive that the infrastructure in place is sufficient for their driving needs. Otherwise, the decision to buy an EV would be unjustified. By making these charging stations visible and placing them in notably high-traffic areas, consumers can begin to alter their ideas of the EV. Interestingly, it is said that 72% of people are more inclined to purchase an EV if charging stations are conveniently located at their place of work or destination [43]. With this understanding of consumer perceptions, it follows that there needs to be a systematic implementation of charging stations at the most strategic places possible, which is exactly what a BDA system is capable of for the EV industry, further allowing for growth in market share.
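One simple way to turn aggregated GPS data into candidate charging-station locations is to cluster frequently visited stop points and use the cluster centres as proposals. The sketch below uses k-means from scikit-learn on made-up coordinates; it only illustrates the kind of analysis a BDA system could perform, and the coordinates, cluster count and library choice are assumptions rather than anything prescribed by this chapter.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (latitude, longitude) stop points harvested from connected vehicles.
rng = np.random.default_rng(0)
downtown = rng.normal([43.65, -79.38], 0.01, size=(200, 2))     # dense urban traffic
commuter_hub = rng.normal([43.80, -79.20], 0.02, size=(80, 2))  # suburban park-and-ride
stops = np.vstack([downtown, commuter_hub])

# Propose k candidate charging-station locations as cluster centres.
k = 2
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(stops)

for centre, count in zip(model.cluster_centers_, np.bincount(model.labels_)):
    print(f"candidate site at {centre.round(3)} backed by {count} observed stops")
```

Sites backed by many observed stops would be the most strategic investments in the sense discussed above: they are visible, heavily trafficked, and reduce the per-consumer cost of the station.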

5.1.3 Business Model Innovation and AI

An AI ecosystem is an interactive consumer interface in which users have access to personal and public information in one location while the platform provides structure to their daily lives. When integrated into a vehicle, it can greatly enhance a consumer's


experience and overall perception, and allow businesses to gain the information needed to justify business model innovation. Take Uber, for example; this company realized an opportunity to streamline the mobility-as-a-service (MaaS) [44] industry by utilizing advancements in GPS and smartphone technology to provide consumers with a more convenient process. By understanding technological improvements surrounding the industry and applying them strategically, a business can greatly innovate current processes and recognize new revenue streams when it embraces and capitalizes on AI advancements. Companies are incentivized to develop creative ways to utilize technology due to the competitive nature of the business world, which, as a result, has caused many companies to strive for business model innovations and unique implementations of technologies to create a consumer-centric service that subsequently benefits their bottom line. EV businesses need to “…know how to build a comprehensive ecosystem encompassing virtually all areas of life” [45, 46] within their product to further shift consumers' perceptions. An implemented ecosystem would utilize smart technologies and features within an EV to greatly elevate a consumer's experience with the product or service [46]. A multitude of corporations has begun research and development within the automotive industry with the aim of creating an encompassing ecosystem with the consumer as the focus; Apple, Google, Microsoft, Alibaba, and Amazon are notable examples [46]. All of these companies are investing in the automotive industry as they believe that the next big platform for them to capitalize upon is the vehicle. Collectively, they understand that interconnectivity and IoT will progress into the mobility sector, which is exactly why Amazon, for instance, has implemented its smart home system ‘Alexa’9 into the vehicle to create “…a seamless transition from home to the car” [46]. Amazon and the other previously listed companies are all chasing the same goal: the creation of an encompassing platform that aids consumers in multiple facets of their daily life and seamlessly interconnects their IoT devices, creating a convenient and invaluable smart system that is integrated harmoniously into their lives. A successful implementation that is accepted by consumers will create unprecedented benefits for both the individual and businesses. In a perfect world, this idealistic ecosystem would be the one-stop platform for all the consumer's needs, and although companies are heading in this direction, it is an incredible feat for businesses to accomplish. Therefore, it is important to acknowledge that the advancements in AI and IoT are simply providing businesses with the capability to shift from product- to service-centric goods, with hopes of creating brand loyalty [47] and realizing new revenue streams through the adoption of these advancements. “…future consumer demand in the automobile industry will be more relationship-based, and the automakers' role will change more to servicing the customers, rather than just selling them vehicles” [32]. To gain EV market share, a business must provide its consumers with a product and/or service whose benefits vastly outweigh those of the traditional gas-powered automobile. By creating an encompassing ecosystem that meshes well with consumer demands, businesses can further incentivize consumers to purchase an EV.

9 Marketing in the Age of Alexa—Harvard Business Review, May–June 2018.


5.2 Application of Advancements to the Canadian EV Industry

The largest barrier for the Canadian EV market is consumer range anxiety, due to the country's vast geographic size and low population density. Even consumers with limited daily driving distances can feel dissonance toward the new technology, as the internal combustion engine (ICE) has withstood the test of time and proven itself reliable. Utilizing the technologies in Sect. 5.1 can have amazing benefits when applied strategically. It has been explained that the charging station infrastructure and traffic flow can be maximized for efficiency; however, it is crucial that there be a systematic implementation of infrastructure. Utilizing these technologies must begin in city centers and slowly branch out. This may seem obvious; however, the logic behind it is often misunderstood. Of course, there is a larger market in cities, as they are economically and structurally superior to rural areas, but it is important to focus on populated areas to create an incentive for consumers who commute into the city to adopt. If these individuals adopt an EV, then product awareness and perceived feasibility will begin to grow in the city's surrounding areas. This is simply due to the sight of the EV in an area where a consumer may previously have assumed owning such a vehicle was impractical. The current stage of EV adoption makes it more than likely that these individuals already exist, meaning that driving data is available. An implemented BDA system could use pattern recognition to pinpoint the exact locations in which charging station investments would be most strategic for both adoption rates and awareness. The idea behind utilizing an efficient BDA system, such as Hadoop, is not immediate monetary benefit but long-term efficiency and strategy. With the adoption of this technology and its proper integration into business models, companies within the EV industry would gain the ability to artificially increase battery range; alter consumer perceptions of EV feasibility and maximize EV awareness; integrate a mutually beneficial AI ecosystem that elevates the consumer experience and provides businesses with exponential returns; uncover new profit streams with lower risk; create an increasingly efficient driving infrastructure; and, through interconnectivity, make driving itself more efficient. From a business standpoint, companies that utilize this technology to its fullest would have access to these benefits and gain the knowledge of how to attract more people to the rising EV market and further alleviate range anxiety. Innovation will allow consumers to change their overall perception of what an electric vehicle can do. With regard to the cost of implementing these systems, it is again a long-term investment. No figures were explicitly stated in any of the research conducted; however, even though implementation may be expensive, the benefits will increasingly lower costs. Take the charging station infrastructure, for example. The more monetarily strategic and efficient the location chosen for a charging station investment, the less it costs a business per consumer. Through the analysis of driving data, a BDA system can provide insight to business owners into high-traffic locations most suitable for a charging station, which in turn alleviates much of the risk involved with


the investment. So, although it may be expensive to implement these systems, it is meant to be integrated into a business much like that of the HR or Marketing department. The benefits of an integrated BDA system do not stop at strategy and decision making, but they also show flaws in current practices and shed light on potential new revenue streams allowing for further company growth. The Canadian business environment is already extremely competitive, and companies are impeding their own development by disregarding the utilization of these technological advancements. Finding innovative ways to utilize technology, gaining a competitive advantage over competitors, and skillfully strategizing based on current market conditions are three characteristics that businesses desperately need for growth, and BDA is an essential key to gaining these benefits.

6 Conclusions

The adoption of various AI sensors and IoT devices can create exponential benefits when their data is harnessed and processed by BDA systems. When utilized, they can provide insight for businesses on new profit streams and techniques to elevate the overall consumer experience. Through analysis, “novel insights on how vehicles are used and the way in which mobility is consumed becomes accessible when the generated and platform-processed data is harvested” [48], making the long-term investment in BDA crucial for the adoption of EVs in the Canadian market. However, Canada is not typically known for its long-term orientation,10 making the initial investment in a BDA system much more difficult for many companies. Although this technology is meant to aid in risk aversion, making it an incredible tool, businesses need to understand the realities of these technologies and their capabilities before adoption can occur. The main goal of identifying these technological advancements was to understand whether they have the capability of altering consumer perceptions of the EV. The technology needs to prove to consumers that the EV is not a trade-off alternative to the traditional gas-powered vehicle, but a reliable and capable medium for personal mobility. In this regard, the research has not shown that the EV will become a better tool for mobility, as the current abilities of batteries are not on par with those of the ICE. However, it has shown that these advancements can elevate the encompassing driving experience by providing incredible efficiency, benefits, and customization opportunities, making each consumer's vehicle their own. As for changing the consumer's perception of the EV, these advancements will not necessarily change perceptions more than they are already changing; however, they can accelerate the process. Consumers need to physically see EVs and their infrastructure to begin to understand the feasibility of owning one. With BDA systems in place, a business can make the most strategic and efficient decisions to maximize awareness. What makes the integration of this technology into products and business models so valuable

10 Hofstede Long Term Orientation—https://www.hofstede-insights.com/country/canada/.


is that it benefits both the business and the customer, by providing incredible insight and by upgrading the user interface and experience, respectively. A technology that can do this will not necessarily shatter perceptions of the old EV and create an idolized idea of the futuristic EV, as many consumers will come to overlook the maximized driving efficiency that it provides. Instead, they may look at monetary benefits or the prestige that comes with being seen driving the vehicle. The rationale for purchasing an EV will vary from consumer to consumer, but a business can use the efficiency and risk-averse benefits that a BDA system realizes to influence this reasoning. Due to the nature of this thesis topic and time constraints, many sophisticated technological advancements could only be explored briefly. To further explore the impacts these technological advancements have on the EV industry, the following need to be individually researched: BDA, IoT, Hadoop, V2V, Business Model Innovation, Computerization of Cars, Big Data Privacy, and EV Battery Technology/Charging Infrastructure. An in-depth understanding of these individual advancements will allow for a more realistic analysis and conclusion to this specific thesis topic.

References
1. Bonges, H.A., Lusk, A.C.: Addressing electric vehicle (EV) sales and range anxiety through parking layout, policy and regulation. Transp. Res. Part A Policy Pract. 83, 63–73 (2016). https://doi.org/10.1016/j.tra.2015.09.011
2. Bohnsack, R., Pinkse, J., Kolk, A.: Business models for sustainable technologies: exploring business model evolution in the case of electric vehicles. Res. Policy 43(2), 284–300 (2014). https://doi.org/10.1016/j.respol.2013.10.014
3. Verhoef, P.: Creating Value with Big Data Analytics: Making Smarter Marketing Decisions, 1st edn. (2016). https://doi.org/10.4324/9781315734750
4. Borangiu, T., Polese, F.: Introduction to the special issue on exploring service science for data-driven service design and innovation. Serv. Sci. 9(4), v–x (2017). https://doi.org/10.1287/serv.2017.0195
5. cities-of-tomorrow_en.pdf (n.d.). Retrieved from https://www.enel.com/content/dam/enelcom/media/document/cities-of-tomorrow_en.pdf
6. Kryvinska, N., Gregus, M.: SOA and its Business Value in Requirements, Features, Practices and Methodologies. Comenius University in Bratislava (2014). ISBN: 9788022337649
7. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Computing 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z
8. Analysis of a consumer survey on plug-in hybrid electric vehicles. Elsevier Enhanced Reader (n.d.). https://doi.org/10.1016/j.tra.2014.02.019
9. Elsevier Enhanced Reader (n.d.). https://doi.org/10.1016/j.ijforecast.2018.09.003
10. Rogers, E.: Intelligent Efficiency Technology 97 (n.d.)
11. Atanassov, A.: Review of big data and big data mining for adding big value to enterprises 9 (n.d.)
12. Beyond the hype: big data concepts, methods, and analytics. Elsevier Enhanced Reader (n.d.). https://doi.org/10.1016/j.ijinfomgt.2014.10.007
13. Chintagunta, P., Hanssens, D.M., Hauser, J.R.: Editorial—Marketing Science and Big Data 3 (n.d.)


14. Koseleva, N., Ropaite, G.: Big data in building energy efficiency: understanding of big data and main challenges. Procedia Eng. 172, 544–549 (2017). https://doi.org/10.1016/j.proeng.2017.02.064
15. Wang, H., You, F., Chu, X., Li, X., Sun, X.: Research on customer marketing acceptance for future automatic driving—a case study in China city. IEEE Access 7, 20938–20949 (2019). https://doi.org/10.1109/ACCESS.2019.2898936
16. Yang, Y., Siau, K.L.: A qualitative research on marketing and sales in the artificial intelligence age. Mach. Learn. 7 (2018)
17. Ou, S., Lin, Z., Wu, Z., Zheng, J., Lyu, R., Przesmitzki, S.V., He, X.: A study of China's explosive growth in the plug-in electric vehicle market (No. ORNL/TM--2016/750, 1341568) (2017). https://doi.org/10.2172/1341568
18. Hatcher, W.G., Yu, W.: A survey of deep learning: platforms, applications and emerging research trends. IEEE Access 6, 24411–24432 (2018). https://doi.org/10.1109/ACCESS.2018.2830661
19. Yu, A., Pettersson, S., Wedlin, J., Jin, Y., Yu, J.: A user study on station-based EV car sharing in Shanghai 10 (n.d.)
20. Weiller, C., Shang, A., Neely, A., Shi, Y.: Competing and Co-existing Business Models for EV: Lessons from International Case Studies 21 (n.d.)
21. Remane, G., Hildebrandt, B., Hanelt, A., Kolbe, L.M.: Discovering New Digital Business Model Types—A Study of Technology Startups from the Mobility Sector 17 (n.d.)
22. Mohanty, S.P., Choppali, U., Kougianos, E.: Everything you wanted to know about smart cities 10 (n.d.)
23. Jun, M., Muro, A.D.: Holistic methodology to analyze EV business models. Int. J. Innov. Manag. Technol. (2013). https://doi.org/10.7763/IJIMT.2013.V4.402
24. Figenbaum, E., Fearnley, N., Pfaffenbichler, P., Hjorthol, R., Kolbenstvedt, M., Jellinek, R., et al.: Increasing the competitiveness of e-vehicles in Europe. Eur. Transp. Res. Rev. 7(3) (2015). https://doi.org/10.1007/s12544-015-0177-1
25. isg-iot-quadrant-report.pdf (n.d.). Retrieved from https://www.cognizant.com.au/Resources/isg-iot-quadrant-report.pdf
26. Rask, M., Andersen, P.H., Linneberg, M.S., Christensen, P.R.: Local Design & Global Dreams—Emerging Business Models creating the Emergent Electric Vehicle Industry 20 (n.d.)
27. Plötz, P., Jakobsson, N., Sprei, F.: On the distribution of individual daily driving distances. Transp. Res. Part B Methodol. 101, 213–227 (2017). https://doi.org/10.1016/j.trb.2017.04.008
28. Isaksson, D., Wennberg, K.: Digitalization and collective value creation 16 (n.d.)
29. Phatak, A.A.: Real time big data analysis: an emerging technique. KHOJ J. Indian Manag. Res. Pract. 48–54 (2016)
30. Woodside, A.G.: Vignettes in the two-step arrival of the internet of things and its reshaping of marketing management's service-dominant logic 15 (n.d.)
31. Ensslen, A., Kuehl, N., Stryja, C., Jochem, P.: Methods to identify user needs and decision mechanisms for the adoption of electric vehicles. World Electr. Veh. J. 8(3), 673–684 (2016). https://doi.org/10.3390/wevj8030673
32. Seiberth, D.G.: Data-Driven Business Models in Connected Cars, Mobility Services & Beyond 58 (n.d.)
33. BYTE-D1.1-FINAL-post-Y1-review.compressed-1.pdf (n.d.). Retrieved from http://byte-project.eu/wp-content/uploads/2016/03/BYTE-D1.1-FINAL-post-Y1-review.compressed-1.pdf
34. Ullah, A., Aimin, W., Ahmed, M.: Smart automation, customer experience and customer engagement in electric vehicles. Sustainability 10(5), 1350 (2018). https://doi.org/10.3390/su10051350
35. Comparative-Analysis-of-Infrastructures-Hydrogen-Fueling-and-Electric-Charging-of-Vehicles.pdf (n.d.). Retrieved from https://www.researchgate.net/profile/Martin_Robinius/publication/322698780_Comparative_Analysis_of_Infrastructures_Hydrogen_Fueling_and_Electric_Charging_of_Vehicles/links/5a69dd8ba6fdccf8849667d3/Comparative-Analysis-of-Infrastructures-Hydrogen-Fueling-and-Electric-Charging-of-Vehicles.pdf


36. On the move towards customer-centric business models in the automotive industry—a conceptual reference framework of shared automotive service systems. SpringerLink (n.d.). Retrieved 3 Apr 2019, from https://link.springer.com/article/10.1007/s12525-018-0321-6
37. Merchant, A.: Big data: ushering new vistas in market research. Big Data 5 (n.d.)
38. Axsen, D.J., Goldberg, S., Melton, N.: Canada's Electric Vehicle Policy Report Card 77 (n.d.)
39. Liyanage, S., Dia, H., Abduljabbar, R., Bagloee, S.: Flexible mobility on-demand: an environmental scan. Sustainability 11(5), 1262 (2019). https://doi.org/10.3390/su11051262
40. Dinter, B., Jaekel, T.: Teaching Big Data Management—An Active Learning Approach for Higher Education. South Korea, 17 (2017)
41. Rambow, N.G., Rambow-Hoeschele, K.: The Connected Vehicle and its Impact on the Development of Electromobility 5 (2018)
42. MapReduce: Simplified Data Analysis of Big Data. Elsevier Enhanced Reader (n.d.). https://doi.org/10.1016/j.procs.2015.07.392
43. Bifet, A.: Mining Big Data in Real Time 6 (n.d.)
44. Katal, A., Wazid, M., Goudar, R.H.: Big data: issues, challenges, tools and good practices. In: 2013 Sixth International Conference on Contemporary Computing (IC3), pp. 404–409 (2013). https://doi.org/10.1109/IC3.2013.6612229
45. Emobility-services-final-double-spread.pdf (n.d.). Retrieved from http://www.nemode.ac.uk/wp-content/uploads/2012/12/Emobility-services-final-double-spread.pdf
46. Esposti, S.D.: When big data meets dataveillance: the hidden side of analytics. Surveill. Soc. 12(2), 209–225 (2014). https://doi.org/10.24908/ss.v12i2.5113
47. EV Brand Positioning—Marketing Innovation (n.d.)
48. Raste_sdsu_0220N_10274.pdf (n.d.). Retrieved from http://sdsu-dspace.calstate.edu/bitstream/handle/10211.3/120375/Raste_sdsu_0220N_10274.pdf?sequence=1

Game Analytics—Business Impact, Methods and Tools

Robert Flunger, Andreas Mladenow, and Christine Strauss

Abstract The gaming business has developed into a prosperous digital business segment with exceptional business prospects during recent years and has evolved into a considerable economic sector. Hence, this contribution outlines the relevance and potential of game analytics in the context of gaming business. We identify and discuss crucial aspects of analytical and predictive models for free to play (F2P) business models. Based on a literature review we analyze several business issues where game analytics may provide major benefit. Besides identifying motivations for small and medium sized game developers to use game analytic tools, we furthermore introduce six studies, which discuss churn prediction models in F2P games, as well as four studies on prediction of customers’ lifetime value. Emphasis is laid on methods, metrics and tools in game analytics, such as player churn prediction and customer lifetime value (CLV) prediction, and their functionalities. Keywords Game analytics · Free to play · Customer lifetime value · Gaming industry

1 Introduction

Markets enabled through information and communication technologies (ICT) have grown at a tremendous pace during the past years [1, 2]. Along with the growth of markets and sectors, many novel business models and opportunities have emerged, enabled through innovations such as high-speed internet and high-quality mobile devices [3–6]. As one of the fastest growing industries in the world, the online


gaming industry probably stands out the most [7, 8]. For researchers, this market is of particular interest, as it is the first truly digital native industry with the inherent potential to disrupt traditional business concepts [9]. The industry is characterized by a high degree of innovation, such as digital distribution, downloadable content, independent game development, early access titles and the free-to-play (F2P) business model [10, 11]. Especially the latter is relevant, as it has become the most successful monetization model in games [12]. The emergence of F2P approaches led to the erosion of traditional pay-to-play (P2P) business models. For example, the successful online role-playing game World of Warcraft, which requires a monthly subscription to play, is reported to have lost about one third of its subscribers from 2010 to 2013, as gamers trended towards free alternatives [13]. Another example is Valve's game Team Fortress 2, which first launched as a retail game in 2007. In 2012 the game was made completely free-to-play, with an in-game shop selling virtual goods. Doing so increased the revenue of the game by a factor of twelve [14]. Overall, the global video game market was estimated to have a revenue of $159.3 billion in 2020, which is a 9.3% increase from the previous year. The market is forecasted to grow further at a rapid pace, crossing the $200 billion mark by the end of 2023. Mobile games, being mostly F2P, accounted for an estimated revenue of $77.2 billion, comprising almost half of the total revenue. The other half is shared by console and PC games. Most of the total global revenue, namely $78.4 billion, or 49%, is produced in the Asia–Pacific region, especially China, Japan and South Korea, followed by North America and Europe, with $40 billion (25%) and $29.6 billion (19%) respectively [15]. Monetization is not upfront anymore but happens continuously through selling virtual goods that enhance the gaming experience. However, the conversion rate from non-paying to paying players is reported to be extremely low, rarely over 5% [16]. In the online gaming industry, F2P games massively increased their market share in a short amount of time, causing more and more businesses to rethink their (subscription-based) models [17]. It is noteworthy, though, that such a switch was rarely successful, indicating that developers do not understand the intricacies of the new business model to its full extent [18]. Accordingly, it seems promising for research and practice to analyze all forms of data that pertain to an F2P business model. In this regard, game analytics has recently emerged as a new field for data mining in the online gaming industry [19–21]. Against this background, this paper investigates game analytics aspects of the F2P business model based on a literature analysis. The paper at hand is based on [1]. Compared to the underlying conference contribution, the extended version provides insights into powerful tools and metrics such as retention and acquisition rate. Hence, Sect. 1 pinpoints game analytics from a business perspective, and Sect. 2 from a methodological and tool implementation point of view. Section 3 forms the core contribution by providing insights into player-centric metrics, key performance indicators, software applications and functionalities. A conclusion summarizes the essence of this contribution.


2 Business Impact and Methodological Aspects of Game Analytics

F2P games are available to play free of charge in their basic form. However, the gameplay is restricted in some ways [1], for example by time constraints or the unavailability of certain areas or actions in the game. These additional parts can be unlocked by paying a fee. Some games also include advertising or offer optional premium subscriptions. The most common model nowadays is in-game purchases in terms of a broad variety of virtual items that enhance the gaming experience [22]. Due to its characteristics, F2P can thus be regarded as a variation of the freemium business model [10]. Because of its nature of selling virtual items for low amounts of money, the F2P model is also referred to as the “microtransactions revenue model” [23, 24].

2.1 Range of Business Performance of Game Analytics

Based on the performed systematic literature review and analysis, Fig. 1 gives an overview of the range of application and business performance of game analytics; it provides a synoptic view of the contribution that game analytics may provide to various stakeholders and tasks, such as designers, marketers or gamers (cf. Sect. 2.2), including the motivation to use it, the prediction of customer lifetime, and the churn rate. In the online channel category, online players can be reached via distribution channels such as the App Store or Google Play, which are particularly popular in the field of mobile gaming. The performance of those channels can be evaluated and compared using key performance indicators [25], such as the proportion of active or new players or the retention rate [26].

Fig. 1 Range of game analytics performance


For the online player category, the gamers' behavior and their specific preferences are analyzed. Here, data on behavior, motivation, experience and satisfaction with the content are examined. Through in-game behavior, gamers' behavior patterns (e.g. frustration) are identified that may lead to a player leaving the game. The customer lifetime value refers to all income during the relationship between company and gamer that can be traced back to one or more customers [27]. With the CLV, those customers who are particularly valuable and profitable, in terms of willingness to pay and their role as influencers or recommenders for the company, can be identified. CLV predictions can be used to estimate a customer's potential value in order to improve planning reliability and enable more accurate budgeting for future business strategies and investments. Since all activities of a player can be traced back to that player through account information, a precise prediction of the customer lifetime value is of increasing importance and necessity. Using the conversion rate, the percentage of players who make in-app purchases can be determined. The F2P market is characterized by low retention rates due to the large number of different games available; improving those low rates is therefore a necessity in order to improve sales. The game development category mainly refers to design and implementation issues and to monitoring of the development process. The sub-areas include: gameplay, interface, system, and the analysis of processes and performance. Gameplay is about the actual behavior of a user as a player. This includes actions to evaluate the game design, such as in-game interactions, swapping items and navigating the map. Interface analytics deals with the interface and menu, which are evaluated, e.g., with the help of variables such as mouse sensitivity or screen brightness. System analytics includes, for example, the artificial intelligence system, automatic events and actions by non-player characters (NPC), which measure the effectiveness of the system design. Process analytics monitors the development process, and performance analytics refers to the game's technical and software infrastructure. This includes, for example, the frame rate, the number of bugs or the quality of a game. From the gaming industry's perspective, effective data analysis during development is helpful for optimizing the game. It creates an improvement in the efficiency of game development and thus reduces the cost of development [28]. The category containing game publisher and marketer refers to analyses for the business areas of acquisition, retention, and revenue. Acquisition analytics deals with the potential for cost reductions in customer acquisition. It provides numerical evidence, for example, on how many new players there are and how many of them complete the tutorial. Retention analytics, by contrast, evaluates game quality. Three main indicators are essential in this matter, i.e. the weekly playing time, the stop rate, and the duration of the playing time. The retention rate is a benchmark for the success of a game, since it can be used to measure quality. In F2P games, sales are generated through in-app purchases or advertising; the prediction of the CLV therefore becomes a major challenge for game analytics in the area of revenue analytics. For online game developers, the biggest challenge is to maximize the number of players in order to improve the retention rate.
In connection with this, the goal is to improve the average lifecycle value of the players, since the costs for the acquisition of


new customers have increased constantly in recent years. Certain functions contribute significantly to the company’s success, such as the possibility of inviting friends or members of the personal network, and the opportunity to ask for help, as well as the activation of new content [28].
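A minimal sketch of how some of the player-centric metrics named above (daily active users, retention rate, conversion rate) could be computed from a raw event log. The event records and field names are hypothetical and only serve to make the definitions concrete; real analytics tools compute these KPIs over far larger telemetry streams.

```python
from datetime import date

# Hypothetical event log: (player_id, day, made_purchase)
events = [
    ("p1", date(2021, 5, 1), False), ("p2", date(2021, 5, 1), False),
    ("p3", date(2021, 5, 1), True),  ("p1", date(2021, 5, 2), False),
    ("p3", date(2021, 5, 2), True),  ("p4", date(2021, 5, 2), False),
]

def daily_active_users(events, day):
    """Number of distinct players active on a given day (DAU)."""
    return len({p for p, d, _ in events if d == day})

def retention_rate(events, day_a, day_b):
    """Share of day-A players who are active again on day B."""
    cohort = {p for p, d, _ in events if d == day_a}
    returned = {p for p, d, _ in events if d == day_b and p in cohort}
    return len(returned) / len(cohort) if cohort else 0.0

def conversion_rate(events):
    """Share of all players who made at least one purchase."""
    players = {p for p, _, _ in events}
    payers = {p for p, _, paid in events if paid}
    return len(payers) / len(players) if players else 0.0

print(daily_active_users(events, date(2021, 5, 2)))                           # 3
print(round(retention_rate(events, date(2021, 5, 1), date(2021, 5, 2)), 2))   # 0.67
print(round(conversion_rate(events), 2))                                      # 0.25
```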

2.2 Methodological Aspects Based on Literature Review

For the methodological part of this paper a comprehensive literature review has been conducted. For the review, five databases were queried to find publications on the topic of the free-to-play model in a business context. These are (1) ACM Digital Library, (2) EBSCOhost, (3) IEEE Xplore, (4) SpringerLink and (5) Wiley Online Library. This paper investigates the key results of eleven scientific papers based on the conducted literature review. Covering aspects of game analytics for the F2P business model, the papers were categorized into the following three categories: Motivation to use Game Analytics, Customer Lifetime Value Prediction, and Churn Prediction; they are depicted in Table 1 in chronological order.

Table 1 Methodological aspects of game analytics

| Sources                          | Method       | Category                          |
|----------------------------------|--------------|-----------------------------------|
| Hadiji et al. [21]               | Quantitative | Churn prediction                  |
| Runge et al. [22]                | Quantitative | Churn prediction                  |
| Hanner and Zarnekow [34]         | Quantitative | CLV prediction                    |
| Koskenvoima and Mäntymäki [9]    | Qualitative  | Motivation to use game analytics  |
| Sifa et al. [16]                 | Quantitative | CLV prediction                    |
| Lee et al. [29]                  | Quantitative | Churn prediction                  |
| Perianez et al. [4]              | Quantitative | Churn prediction                  |
| Voigt and Hinz [35]              | Quantitative | CLV prediction                    |
| Milosevic et al. [14]            | Quantitative | Churn prediction                  |
| Demediuk et al. [33]             | Quantitative | Churn prediction                  |
| Drachen et al. [19]              | Quantitative | CLV prediction                    |


Table 2 Game environments for game analytics

| Category                           | PC | Social network game | Mobile |
|------------------------------------|----|---------------------|--------|
| Motivation to use game analytics   | 1  | 0                   | 1      |
| Customer lifetime value prediction | 1  | 0                   | 4      |
| Churn prediction                   | 1  | 4                   | 5      |

Furthermore, Table 2 displays the frequency distribution of game environments for game analytics in the literature listed in Table 1. In the following, the three categories are discussed in detail.
Motivation to use Game Analytic Tools for SME. A qualitative study based on interviews with game developers revealed that small and medium-sized enterprises (SME) apply game analytics tools for two purposes: (i) analytics as a communication tool, and (ii) analytics as a decision-support tool.

The first purpose is about the necessity of reporting to investors and publishers in a comprehensive way, for example by using key performance indicators. Hereby, retention rate was considered the most important metric [1]. Furthermore, measuring CLV was deemed important as it helps to understand the extent to which the costs related to customer acquisition are covered [29].
Churn Prediction. Churn prediction in a gaming context denotes the process of detecting and defining players of a game who will leave the game for good at a certain future point of time [1]. Those players who leave the service are called “churners”, and the ratio of these over non-churning players represents the so-called “churn rate”. Particularly in an F2P environment the prediction of churners is an important task, as retaining players is generally considered less expensive than recruiting new ones [21]. By being able to predict when a player is about to leave, developers can adjust the gameplay experience on a more individual level and thus prolong the user's lifetime [30]. It is worth mentioning that all studies in this section (except one) refer to casual mobile games. For churn prediction, a system needs to be implemented that is able to differentiate between players in a reliable way. The most common approach is a simple binary classification of players, i.e. churners and returning players [31]. Advanced approaches are based on machine learning algorithms, which can be trained on game datasets. Popular algorithms are neural networks, logistic regression, decision trees, support vector machines, and Markov models. However, some approaches have also been criticized. While a binary classification is intuitive and relatively easy to implement, the results are rather limited: it cannot properly process temporal information and is inflexible in predicting exact churn times and probabilities. Thus, a model based on survival analysis was suggested, as it allows producing utility functions with clear probabilities of player churn at any given point in time [32].
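The binary-classification approach described above can be sketched in a few lines: each player is labelled as churned or retained based on an inactivity window (the studies discussed below use 7, 10 or 14 days), and a standard classifier is trained on behavioural features. The synthetic data, the exact feature set and the use of scikit-learn's logistic regression are assumptions made purely for illustration; the sketch shows the general technique, not a re-implementation of any cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Synthetic per-player features: play sessions, days since sign-up, current absence (days).
sessions = rng.poisson(20, n)
days_since_signup = rng.integers(1, 200, n)
absence_days = rng.exponential(3, n)

# Label: a player counts as churned if the absence exceeds a 7-day window (plus noise).
churned = ((absence_days + rng.normal(0, 1, n)) > 7).astype(int)

X = np.column_stack([sessions, days_since_signup, absence_days])
X_train, X_test, y_train, y_test = train_test_split(X, churned, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"hold-out accuracy: {clf.score(X_test, y_test):.2f}")
print("churn probability of a player absent for 10 days:",
      round(clf.predict_proba([[15, 90, 10]])[0, 1], 2))
```

Any of the other algorithms named above (decision trees, support vector machines, neural networks) could be substituted for the logistic regression without changing the overall workflow.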


Hadiji et al. used binary classification, introducing a 7-day window after which a player was classified as churned [21]. They developed a prediction model and used it on an experimental dataset of four games. Hereby, results showed that the most important indicators to predict player churn were the number of play sessions, the number of days since sign-up, the average time between play sessions and the current absence time. Additionally, Runge et al. analysed in detail so-called high-value players in two games [22]. In their study, high-value players are the top 10% of paying players over the last 90 days before the study took place. Features important to churn prediction were not discussed in this paper. However, the authors tested how churning players may be effectively manipulated. Thus, they sent substantial amounts of in-game currency to players who were predicted to churn or had recently churned. It turned out that this action had no significant impact on the overall churn rate. For this reason, they recommended cross-linking churning players to other games in the developer's portfolio, rather than trying to retain them in the game they are about to leave. Furthermore, Perianez et al. found that high-value players can generally be regarded as churned when they have not played for more than ten consecutive days [4]. As the most important variables to predict churn they listed the amount of the last purchase, the days since the last purchase and the user's level in the game. In addition, Lee et al. found that the number of purchases and the number of times attending the player's guild were the most important predictors for churning [29]. Also, the amount of virtual currency left after the last logout showed to be a relevant indicator. Especially attending the guild is an interesting element, as it implies the relevance of social factors in the player's decision to stay with a game. Another study discussed early churn and proposed a personalized targeting strategy to retain players. Early churn denotes the first day a user starts playing the game, as this period has the highest churn rate overall. After developing the prediction model, the authors sent push notifications to players who were likely to churn soon. The notifications had two different forms: they either explained game features the player used a lot in more detail, or they presented rather unexplored core features. The goal of both activities was to re-attract and boost the player's interest in the game and motivate them to continue to play. It turned out that the first approach did not lead to a higher return rate compared to sending no notifications, whereas the second approach of presenting unexplored features significantly improved retention. This is believed to be the case as players are invited to explore something new to them, while in the first case they may already have decided that they do not like the game enough to play it [14]. Concerning the exact definition of player churn, some variants can be observed. Especially in the case of binary modeling it is important to set a certain cut-off point at which a player is classified as churned. While Lee et al. [29], Runge et al. [22], and Milošević et al. [14] classify players as “churned” when they did not play for 14 days, Hadiji et al. [21] use a 7-day window, while Perianez et al. [4] conclude that 10 days is a viable point of classification [4, 14, 21, 22, 29]. Finally, one study in the sample discussed a different environment, namely the popular desktop F2P game “League of Legends”. In this study, survival analysis was employed to predict when players churn from the game.
It is worth noting that “League of Legends” does not take place in a persistent game world. Instead, it is a competitive player versus player (PvP) game played in separate matches. For instance, the analysis


showed that the time span between matches is the strongest predictor for player churn: the longer the time between matches, the lower the probability that the player will return. In addition, it was found that a longer duration of a single match would lead to a longer time between subsequent matches. This is in line with the findings of the aforementioned studies on casual games, where the time between play sessions was a significant predictor for player churn [33].
Customer Lifetime Value Prediction. Another motive in terms of game analytics was the use of CLV theory to predict and discuss purchase behavior [1]. According to CLV theory, purchase behavior consists of three steps, which are: (i) customer acquisition, (ii) retention, and (iii) expansion.

Acquisition is the period from installing a game until the occurrence of the first purchase. With the conversion to paying customers the retention phase starts, which ultimately leads to an expansion in the form of efficient monetization and thus a high customer value [34]. It was shown that users who do not start playing a game on the first day of installing it have a very low probability of converting to paying users. However, it turned out that retention rate increases with repeated purchases. Especially after the third repeated purchase within the specified time period, the probability of purchasing again was very high. Also, the average amount spent per purchase would generally increase with following purchases. While the mean volume of the first purchase was rather low, it rose significantly from the second purchase onwards. This was explained by users purchasing small packages first to try them out; when gaining trust in the product, they would continue to choose more expensive options. Basically, this means that users who are not attracted by the game at first are likely to never change their perception. Thus, it is important to convince them and make a good impression right from the start. As for retaining users, it was suggested to keep enjoyment levels relatively high at the beginning of their lifetime, to motivate them to repeated purchasing. Also, in terms of marketing it was implied that developers have to be careful about which items are advertised to whom, as advertising relatively expensive packages to new users may turn them off, since they would perceive the purchase as too much of a risk. Thus, a dynamic presentation of potentially appropriate items per stage in the customer's lifetime cycle is suggested [34]. Voigt and Hinz investigated how the time until a user makes her/his initial purchase influences the CLV [35]. It turned out that the correlation is negative, meaning that the longer it takes until the user performs the first purchase, the lower the expected CLV. Additionally, results show that the amount of money spent on the first purchase has a positive correlation with future CLV. This implies that the respective customers are likely to spend a high amount of money on future purchases as well. Interestingly, it was also found that payment methods hold significant value in predicting future CLV. Customers paying with credit card showed a much higher remaining CLV than those using other methods. The authors noted that up to the date


of their study, literature on the effect of payment models in digital businesses had been very scarce [36]. Thus, this might be a research avenue worth exploring in the future. Furthermore, the findings of Hanner and Zarnekow that the average purchase amount increases with subsequent purchases were also confirmed by Voigt and Hinz [34, 35]. The latter showed in their prediction model that with subsequent purchases the average amount of money spent per purchase rises. However, the higher this amount got, the less likely it was to change any further. This is an important finding, as identifying heavy spenders as early as possible makes it possible to target them efficiently with marketing incentives. For example, high-value customers could be served with benefits such as exclusive content and faster response time to service inquiries [35, 48]. Another CLV prediction model was established by Drachen et al. [19]. Based on a casual mobile game, they classified the players in the dataset into premium (paying) and non-premium (non-paying) as well as social and non-social players. A social player was someone who sent at least one friend request to another player, and the category of premium includes all players who made at least one purchase. They found that the factor of current absence time is the strongest CLV predictor. Also, players who played more consistently in the first week were more likely to pay. These results suggest that social players are less likely to convert to premium players as they advance through the game by using their social connections. However, they still create value by leveraging network effects and thus advertising the game. In general, these results run counter to past research that indicated a connection between likeliness to purchase and social interactions. The authors noted that this may be the case due to the nature of the game being a casual game with superficial social interaction mechanics. Thus, their findings may not apply to games with more complex social interactions such as Massively Multiplayer Online Role-Playing Games (MMORPG) [37]. Moreover, Sifa et al. implemented a model for purchase prediction in a mobile game [16, 49]. They used binary classification to identify players as premium or non-spending users and applied machine learning algorithms to an existing dataset. This method is similar to Hadiji et al. and Runge et al. They found that the number of purchases in the past and the amount spent on them are the strongest indicators for future purchases [21, 22]. Other, less significant factors are: (i) the number of in-game interactions, (ii) activity-related features, and (iii) total playtime.

Furthermore, the authors tried to predict the number of future purchases, to allow more precise overall CLV prediction. Hereby, they found that the amount spent is the strongest predictor for future purchase amount. If players spend relatively large amounts early on, most likely they will continue to do so. Additionally, regional clustering showed that the probability to purchase and the amount of money spent differs between countries. The authors noted that the insights from their study can be used by professionals to strengthen their revenue streams by


implementing efficient customer relationship management (CRM) systems. Furthermore, design implications were drawn from the study. For example, it was argued that games should be designed in a way that enforces intense interaction and optimizes for total playtime instead of playtime per session. This indicates that the overall play experience is more important than just single sessions that are perceived as especially positive [16, 38].
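A minimal regression sketch in the spirit of the CLV studies above: early monetization signals (time until first purchase, amount of the first purchase, number of purchases in the first week) are used to predict later spending. All numbers are synthetic and the chosen features merely echo the predictors reported in this section; none of the cited models is reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 500

# Hypothetical early-lifetime features per paying player.
days_to_first_purchase = rng.integers(0, 30, n)
first_purchase_amount = rng.gamma(2.0, 3.0, n)
purchases_first_week = rng.poisson(1.5, n)

# Synthetic "future CLV": higher early spend and faster conversion -> higher value.
future_clv = (5 * first_purchase_amount + 8 * purchases_first_week
              - 0.5 * days_to_first_purchase + rng.normal(0, 5, n)).clip(min=0)

X = np.column_stack([days_to_first_purchase, first_purchase_amount, purchases_first_week])
model = LinearRegression().fit(X, future_clv)

print("R^2 on training data:", round(model.score(X, future_clv), 2))
print("predicted CLV for a fast, high first spender:",
      round(model.predict([[1, 20.0, 3]])[0], 2))
```

In practice such a model would be validated on held-out players and could feed the kind of targeted CRM measures described above, e.g. reserving exclusive content for predicted high-value customers.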

3 Tools and Metrics in Game Analytics

As Luton puts it, the application of game analytics is synonymous with “applying a scientific method”. First, an idea or concept (“hypothesis”) is developed, which is then tested, and from which empirical evidence is gathered [39]. Then, the collected data is interpreted and used to either strengthen or undermine the initially developed concept. By adjusting and reformulating the original idea, the process becomes more effective in the next iteration. Applying this iterative process of alternating theorization and testing to gaming leads to a better understanding of the actors, i.e. players, which in turn supports the improvement of gameplay and game design, and—as a consequence—generates more revenue [39]. However, a common issue with game analytics is that developers often lack the experience and do not know what data to track, what tools to use, and how to interpret the collected data [40].

3.1 Tools and Methods in Game Analytics

Data for game analytics may stem from various sources, of which telemetry can be regarded as a particularly important, application-specific one. Telemetry encompasses remote monitoring of game servers, devices and user behavior, for example the gamers' physical movements or interaction behavior (e.g. with other users, applications, or devices). This refers essentially to data transmitted from a remote source, such as the game client. Telemetry data then needs to be “operationalized” in the form of interpretable metrics, such as the number of daily active users (DAU), distances or speed [41]. Overall, there are three ways of data transmission. First, event-based transmission, where an action is initiated by a user, for example by making an in-game purchase. Second, frequency-based transmission, where data is recorded at specific points of time, e.g. every minute. The third way is specific tracking initiated by the analyst, e.g. recording player behavior when a new patch is deployed [41]. To perform game analytics, a wide array of software applications in different price ranges and with varying functionalities is available to game developers. Especially small and medium-sized development companies tend to outsource game-analytical tasks and processes, as they have limited resources available [42, 43]. Data sampling is another means to reduce the costs of analyses and the computational costs; this method

Game Analytics—Business Impact, Methods and Tools

611

implies the risk of an inherent bias due to the data reduction. Therefore, key metrics have to be carefully planned out beforehand in order to collect "the proper" data and avoid excessive data collection and unnecessarily large datasets [42]. In particular, tools to perform analytics should be able to segment users into cohorts according to their behavior or their desires, define custom key performance indicators (KPIs) and metrics, dig into the data, monitor user acquisition sources, provide game industry metrics, support predictive behavior modeling, engage with users and test changes in the metrics. There is a plethora of such tools available online, e.g. DeltaDNA, Upsight, mixpanel, Flurry, Localytics, Ninjametrics and many more [40]. Developers should stick to using a single tool, as it is beneficial to store data in one place; this makes it easier to analyze, compare and correlate metrics [44]. Employing the attention-based view of the firm, Mäntymäki et al. found that game analytics perform four major roles in small to medium-sized companies:

(i) as a sense-making device,
(ii) as a decision-support system,
(iii) as a communication tool, and
(iv) as a hygiene factor.

The sense-making device role refers to the use of game analytics to better understand gamers' behavior and preferences, for example by looking at where exactly gamers spend most of their time in the game, or when and under what circumstances they tend to quit playing. Decision support revolves around using game analytics to drive decision-making based on certain important metrics; it is generally used to decide which game mechanics to use in new products and to estimate typical points of purchase in the game. Analytics as a communication tool is about reporting KPIs to investors and other shareholders. The hygiene factor role stresses the necessity of employing analytics, as it provides many benefits and reduces the risk of failure. Because of limited resources, smaller and medium-sized developers in particular do not regard analytics as a way to achieve competitive advantage but as a necessity, which shows that the process has become an integral part of developing F2P games [42]. Drachen et al. suggest customer behavior analysis as the key area of game analytics, of which game user research (GUR) is an integral part. GUR encompasses the application of various methodologies from disciplines such as experimental psychology, computational intelligence, machine learning, and human–computer interaction. The goal is to evaluate the quality of the interaction between players and the game and to discover how games are played [38]. A similar method is described as "player tracking" by Luton. This is done by analyzing the flow of new gamers: what they do, in which order, and how they behave in and during the game. These aspects provide a solid basis for suggestions on how to tailor the game experience to new players and how to improve retention as well as conversion [39]. Another common method when performing analytics is so-called A/B testing. This entails the creation of two or more alternative versions of a game, or of a certain game feature or object. Players are then separated into groups, which receive the differing
versions to test which one is perceived as better from the gamers' perspective. This makes it possible to evaluate how design changes may affect dimensions such as player retention and monetization. A/B testing is especially useful for testing custom metrics that are specific to a single game [39].
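As an illustration of how the outcome of such an A/B test might be evaluated, the sketch below compares a retention-style conversion metric between two hypothetical player groups using a standard two-proportion z-test; the group sizes, counts and variant labels are invented for the example and are not taken from the cited sources.

# Hypothetical A/B-test evaluation: compare day-7 retention between two game
# variants with a two-sided two-proportion z-test. All numbers are made up.
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Variant A: original tutorial; Variant B: shortened tutorial (illustrative counts).
z, p = two_proportion_ztest(success_a=420, n_a=5000, success_b=465, n_b=5000)
print(f"retention A = {420/5000:.1%}, retention B = {465/5000:.1%}, z = {z:.2f}, p = {p:.3f}")

A statistically significant difference would suggest, under the usual test assumptions, that the design change affects the measured dimension; game-specific custom metrics can be tested in the same way.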

3.2 Metrics and KPIs in Game Analytics

There are various ways to categorize metrics in F2P games. Drachen et al. define three different types:

(i) user metrics,
(ii) performance metrics, and
(iii) process metrics.

User metrics have two perspectives: first, they are revenue-related, e.g. through metrics such as average revenue per user or churn rate; second, they explain how people interact with a game, for example by showing total and average play time or the number of friends. The majority of game analytics are performed on user metrics. Performance metrics relate to the performance of the technical infrastructure of a game, such as frame rate, server stability or the number of bugs found. These are important to monitor, as bad game performance can significantly lower play rates; if servers shut down repeatedly or many bugs are encountered, gamers may leave the game for good. Process metrics concern the process of game development, e.g. the average turnaround time of new content releases [41]. In another instance, Zenn differentiates between engagement- and revenue-focused KPIs. Engagement KPIs are retention rate, churn rate, number of sessions, session length and daily/monthly active users. Revenue-focused KPIs are average revenue per user, lifetime value, time to first purchase or customer acquisition cost [45]. Other sources take a somewhat similar approach by observing metrics that focus on player behavior [35, 40] and the general gamers' health [40], or by distinguishing between player segmentation and player behaviors [46]. In one case, user-centric metrics are further distinguished into user acquisition and user retention-based metrics [43]. No matter how metrics are categorized, there is general agreement on the importance of user retention rate as the key metric in determining a game's success [39]. Retention rate describes whether and when gamers return for subsequent gaming sessions and is usually measured at certain intervals (e.g., 1 day, 7 days, 14 days and so on). Users are therefore separated into cohorts based on the day they started playing the game [44]. Retention rate is highly useful to evaluate the monetization potential of a game, to enhance first impressions (e.g., the tutorial), and to determine if and how players enjoy the game over time [26]. It is important to predict as early as possible when players might churn, even more so if they do so in the first few days of playing. Churn rate can therefore be regarded as the "opposing" metric to retention rate. Determinant variables of player churn are
the total playtime, average duration of played sessions, current absence time, (total) number of rounds/levels played, and the maximum level reached [43]. By predicting player churn accurately, developers can prioritize their resources for better operation and game management. Also, better marketing strategies to improve user retention can be created. From the perspective of game platforms (platforms offering multiple games to download and play), churn prediction can help fuel recommender systems by determining the right time to recommend new applications to gamers. It is argued that gamers who stop playing a game are more likely to act on a recommendation for another application [47]. In addition to retention and churn rate, there is a variety of other player-centric metrics developers should possibly track. These are average revenue per user (ARPU), conversion rate (from non-paying to paying players), and tutorial funnels, meaning how players experience the tutorial and whether they might quit the game early on [42]. Furthermore, tracking daily new users and daily/monthly active users may provide valuable insights into a game's popularity [39]. Also, total gameplay time can be useful to identify highly engaged players, as those who interact intensely with the game may have a higher willingness to pay [16]. In addition, tracking achievements (e.g., which missions/achievements players complete), player levels and times of peak usage can provide useful information on what players enjoy doing and when they do it. This helps in delivering relevant content and encouraging players to make purchases. In this regard, demographics should also be considered, as factors such as age, language, region, and devices used may give indications about specific player behavior, habits and preferences [39]. Furthermore, as described in the previous section of this paper, CLV-related metrics are of high importance in determining cost per player and customer acquisition. The CLV is hereby described as an aggregate of user acquisition costs and ARPU [42]. Essentially, it expresses the net profit generated from a user before he/she quits playing the game [44]. From a marketing perspective, user acquisition is an important aspect. Here it can be useful to consider metrics such as the number of invites sent from players to their friends, the ratio of users acquired virally, and the so-called "K-factor". The "K-factor" is calculated from the number of invites sent by a customer, multiplied by the conversion rate of each invite. This value determines the effectiveness of a developer's referral growth strategy [44]. Another category is resources and time, tracking how limited resources are created by players in the game environment and how they are utilized. This also covers tracking the creation and purchase of virtual items. Doing so provides insights into what attracts players to use limited resources, which furthermore helps in planning out content and ensuring that purchasable objects are appealing to players [39]. It is also important to point out that several of these metrics place high demands on observation time. Obviously, developers cannot wait until players leave their game, and as players constantly come and go there is a different observation time for each player. This is a phenomenon called "data censoring". Therefore, Viljanen et al. make use of a method called the "mean cumulative function", arguing
that it is possible to estimate an expected number of user sessions, purchases, total playtime and lifetime value per player [26].
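The following minimal sketch illustrates, on invented data, how two of the metrics discussed above might be computed: cohort-based day-N retention and the K-factor, the latter following the definition given in [44] (invites per player multiplied by the conversion rate per invite). The data structures and numbers are purely hypothetical.

# Illustrative computation of day-N retention per install cohort and the K-factor.
# All data below is synthetic.
from datetime import date, timedelta

# player_id -> (install_date, set of dates on which the player was active)
sessions = {
    "p1": (date(2021, 3, 1), {date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 8)}),
    "p2": (date(2021, 3, 1), {date(2021, 3, 1)}),
    "p3": (date(2021, 3, 2), {date(2021, 3, 2), date(2021, 3, 3)}),
}

def retention(players, day_n):
    """Share of players active exactly day_n days after their install date."""
    returned = sum(
        1 for install, active in players.values()
        if install + timedelta(days=day_n) in active
    )
    return returned / len(players)

print("D1 retention:", retention(sessions, 1))   # 2 of 3 players return the next day
print("D7 retention:", retention(sessions, 7))   # 1 of 3 players returns a week later

# K-factor = average invites sent per player * conversion rate per invite [44].
invites_per_player, invite_conversion = 2.5, 0.12
print("K-factor:", invites_per_player * invite_conversion)  # values above 1 would indicate viral growth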

4 Conclusion

Game analytics support game publishers, game designers, game developers, game marketers, and gamers by utilizing advanced quantitative methods implemented in novel, powerful tools. Such game analytics tools provide companies in the gaming business with various key performance indicators, among which customer lifetime value (CLV) prediction and player churn prediction are two of the most important gamer-related key figures. The contribution at hand has made transparent the value that the application of such methods may generate for the actors in the value chain of the gaming business. Furthermore, selected tools and their effectiveness were introduced and discussed. Such tools and the methods implemented in them help to improve strategic design decisions and allow precise and fine-grained player segmentation, which in turn represents a valuable basis for targeting players with appropriate marketing incentives. Together with game analytics metrics, such as retention and acquisition rates, they serve as a common ground for the further development of innovative products and competitive strategies in the gaming industry.

References

1. Flunger, R., Mladenow, A., Strauss, C.: Game analytics on free to play. In: Younas, M., Awan, I., Benbernou, S. (eds.) Big Data Innovations and Applications. Innovate-Data 2019. Communications in Computer and Information Science, vol. 1054, pp. 133–141. Springer (2019)
2. Komorowski, M., Delaere, S.: Online media business models: lessons from the video game sector. Westminster Papers Commun. Cult. 11(1), 103–123 (2016)
3. O'Donnell, C.: Getting played: gamification and the rise of algorithmic surveillance. Surveill. Soc. 12(3), 349–359 (2014)
4. Perianez, A., Saas, A., Guitart, A., Magne, C.: Churn prediction in mobile social games: towards a complete assessment using survival ensembles. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 564–573. IEEE (2016)
5. Oh, G., Ryu, T.: Game design on item-selling based payment model in Korean online games. In: Situated Play, Proceedings of DiGRA 2007 Conference, pp. 650–657. The University of Tokyo, Tokyo. Retrieved from http://www.digra.org/wp-content/uploads/digital-library/07312.20080.pdf. Accessed 10 March 2021
6. Buerger, B., Mladenow, A., Novak, N.M., Strauss, C.: Equity crowdfunding: quality signals for online-platform projects and supporters' motivations. In: Tjoa, A., Raffai, M., Doucek, P., Novak, N. (eds.) Research and Practical Issues of Enterprise Information Systems. CONFENIS 2018. Lecture Notes in Business Information Processing, vol. 327, pp. 109–119. Springer (2018)
7. Aleem, S., Capretz, L.F., Ahmed, F.: Empirical investigation of key business factors for digital game performance. Entertain. Comput. 13, 25–36 (2016)


8. Seufert, E.B.: The freemium business model. In: Freemium Economics, pp. 1–27. Elsevier (2014)
9. Koskenvoima, A., Mäntymäki, M.: Why do small and medium-size freemium game developers use game analytics? IFIP International Federation for Information Processing 2015, pp. 326–337 (2015)
10. Miller, P.: GDC 2012: how Valve made Team Fortress 2 free-to-play. Retrieved from http://gamasutra.com/view/news/164922/GDC_2012_How_Valve_made_Team_Fortress_2_freetoplay.php. Accessed 10 March 2021
11. Seidl, A., Caulkins, J.P., Hartl, R.F., Kort, P.M.: Serious strategy for the makers of fun: analyzing the option to switch from pay-to-play to free-to-play in a two-stage optimal control model with quadratic costs. Eur. J. Oper. Res. 267(2), 700–715 (2018)
12. Marchand, A., Hennig-Thurau, T.: Value creation in the video game industry: industry economics, consumer benefits, and research opportunities. J. Interact. Mark. 27(3), 141–157 (2013)
13. De Prato, G., Feijóo, C., Simon, J.-P.: Innovations in the video game industry: changing global markets. Digiworld Econ. J. (94), 17–38. Retrieved from https://ssrn.com/abstract=2533973. Accessed 10 March 2021
14. Milošević, M., Živić, N., Andjelković, I.: Early churn prediction with personalized targeting in mobile social games. Expert Syst. Appl. 83, 326–332 (2017)
15. Newzoo: Newzoo global games market report 2020 | Light version. Retrieved from https://newzoo.com/insights/trend-reports/newzoo-global-games-market-report-2020-light-version/. Accessed 10 March 2021
16. Sifa, R., Hadiji, F., Runge, J., Drachen, A., Kersting, K., Bauckhage, C.: Predicting purchase decisions in mobile free-to-play games. In: The Eleventh AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-15), pp. 79–85 (2015)
17. Park, B.-W., Lee, K.C.: Exploring the value of purchasing online game items. Comput. Hum. Behav. 27(6), 2178–2185 (2011)
18. Davidovici-Nora, M.: Paid and free digital business models innovations in the video game industry. Digiworld Econ. J. (94), 83–102. Retrieved from https://ssrn.com/abstract=2534022. Accessed 10 March 2021
19. Drachen, A., Pastor, M., Liu, A., Fontaine, D.J., Chang, Y., Runge, J., Sifa, R., Klabjan, D.: To be or not to be…social. In: Abramson, D. (ed.) Proceedings of the Australasian Computer Science Week Multiconference (ACSW '18), pp. 1–10. ACM Press, New York, USA (2018)
20. Drachen, A., Thurau, C., Togelius, J., Yannakakis, G.N., Bauckhage, C.: Game data mining. In: Game Analytics, pp. 205–253. Springer, London (2013)
21. Hadiji, F., Sifa, R., Drachen, A., Thurau, C., Kersting, K., Bauckhage, C.: Predicting player churn in the wild. In: 2014 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1–8 (2014)
22. Runge, J., Gao, P., Garcin, F., Faltings, B.: Churn prediction for high-value players in casual social games. In: 2014 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1–8 (2014)
23. Davidovici-Nora, M.: Innovation in business models in the video game industry: free-to-play or the gaming experience as a service. Comput. Games J. 2(3), 22–51 (2013)
24. Hamari, J., Lehdonvirta, V.: Game design as marketing: how game mechanics create demand for virtual goods. Int. J. Bus. Sci. Appl. Manage. 5(1), 15–29 (2010). Retrieved from https://www.business-and-management.org/library/2010/5_1--14-29Hamari,Lehdonvirta.pdf. Accessed 10 March 2021
25. Flunger, R., Mladenow, A., Strauss, C.: The free-to-play business model. In: Indrawan-Santiago, M., Steinbauer, M., Salvadori, I.L., Khalil, I., Anderst-Kotsis, G. (eds.) Proceedings of the 19th International Conference on Information Integration and Web-based Applications & Services, pp. 373–379. ACM (2017)
26. Viljanen, M., Airola, A., Majanoja, A.-M., Heikkonen, J., Pahikkala, T.: Measuring player retention and monetization using the mean cumulative function. IEEE Trans. Games (2017)


27. Burelli, P.: Predicting customer lifetime value in free-to-play games. Data Anal. Appl. Gaming Entertain. 79 (2019)
28. Su, Y.: Game analytics research: status and trends. In: International Conference on e-Business Engineering, pp. 572–589. Springer, Cham (2019)
29. Lee, S.-K., Hong, S.-J., Yang, S.-I., Lee, H.: Predicting churn in mobile free-to-play games. In: International Conference on ICT Convergence 2016, pp. 1046–1048. IEEE (2016)
30. Sandqvist, U.: The games they are a changing: new business models and transformation within the video game industry. Hum. Soc. Sci. Latvia 23(2), 4–20 (2015). Retrieved from https://www.lu.lv/fileadmin/user_upload/lu_portal/apgads/PDF/Hum_Soc_2015_2_.pdf. Accessed 10 March 2021
31. Macchiarella, P.: Trends in Digital Gaming: Free-to-Play, Social, and Mobile Games. Dallas, Texas. Retrieved from Parks Associates website: http://www.parksassociates.com/bento/shop/whitepapers/files/Parks%20Assoc%20Trends%20in%20Digital%20Gaming%20White%20Paper.pdf. Accessed 10 March 2021
32. Roquilly, C.: Control over virtual worlds by game companies: issues and recommendations. MIS Q. 35(3), 653–671 (2011)
33. Demediuk, S., Murrin, A., Bulger, D., Hitchens, M., Drachen, A., Raffe, W.L., Tamassia, M.: Player retention in League of Legends. In: Abramson, D. (ed.) Proceedings of the Australasian Computer Science Week Multiconference (ACSW '18), pp. 1–9. ACM Press, New York, USA (2018)
34. Hanner, N., Zarnekow, R.: Purchasing behavior in free to play games: concepts and empirical validation. In: 2015 48th Hawaii International Conference on System Sciences, pp. 3326–3335. IEEE (2015)
35. Voigt, S., Hinz, O.: Making digital freemium business models a success: predicting customers' lifetime value via initial purchase information. Bus. Inf. Syst. Eng. 58(2), 107–118 (2016)
36. Beier, N., Mladenow, A., Strauss, C.: Paid Content—Eine empirische Untersuchung zu redaktionellen Sportinhalten. In: Multikonferenz Wirtschaftsinformatik (MKWI) Data driven X—turning data into value, vol. III, pp. 1099–1110 (2018)
37. Devlin, S., Cowling, P.I., Kudenko, D., Goumagias, N., Nucciareli, A., Cabras, I., Fernandes, K.J., Li, F.: Game intelligence. In: 2014 IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2014)
38. Becker, A., Mladenow, A., Kryvinska, N., Strauss, C.: Aggregated survey of sustainable business models for agile mobile service delivery platforms. J. Serv. Sci. Res. 4(1), 97–121 (2012)
39. Luton, W.: Free-to-Play: Making Money from Games You Give Away. New Riders, Indianapolis (2013)
40. GameSauce: A comprehensive analysis of the tools that support mobile game development. Retrieved from http://www.gamesauce.biz/2014/09/10/a-comprehensive-analysis-ofthe-tools-that-support-mobile-game-development-part-2/. Accessed 10 March 2021
41. Drachen, A., El-Nasr, M.S., Canossa, A.: Game analytics—the basics. In: Game Analytics: Maximizing the Value of Player Data, pp. 13–40 (2013)
42. Mäntymäki, M., Hyrynsalmi, S., Koskenvoima, A.: How do small and medium-sized game companies use analytics? An attention-based view of game analytics. Inf. Syst. Front. 22, 1163–1178 (2019)
43. Drachen, A., Lundquist, E.T., Kung, Y.-J., Rao, P.S., Sifa, R., Runge, J., Klabjan, D.: Rapid prediction of player retention in free-to-play mobile games. Proc. AAAI Conf. Artif. Intell. Interact. Digital Entertain. 12(1), 23–29 (2016)
44. Cooladata: The 19 metrics every mobile game needs to track. Retrieved from https://www.cooladata.com/19-metrics-every-mobile-games-needs-track/. Accessed 10 March 2021
45. Zenn, J.: 50+ KPIs to measure your mobile game or app. Retrieved from https://gameanalytics.com/blog/50-kpi-measure-mobile-game-app.html. Accessed 10 March 2021
46. Su, Y., Backlund, P., Engström, H.: Comprehensive review and classification of game analytics. Serv. Orient. Comput. Appl. 1–16 (2020)


47. Liu, X., Xie, M., Wen, X., Chen, R., Ge, Y., Duffield, N., Wang, N.: Micro- and macro-level churn analysis of large-scale mobile games. Knowl. Inf. Syst. 1–32 (2019)
48. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N., Ansar-Ul-Haque, Y.: A real-time service system in the cloud. Springer J. Amb. Intell. Hum. Comput. (2020). https://doi.org/10.1007/s12652-019-01203-7
49. Poniszewska-Maranda, A., Kaczmarek, D., Kryvinska, N., Xhafa, F.: Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Springer J. Comput. 101(11), 1661–1685 (2019). https://doi.org/10.1007/s00607-018-0680-z

Synergistics and Collaboration in Supply Chains: An Integrated Conceptual Framework for Simulation Modeling of Supply Chains

Natalia Lychkina

Abstract In this research, the author explores approaches to the simulation of supply chains' strategic development, specifically focusing on the formation of cooperation strategies between supply chain partners. The objective of this paper is to suggest a conceptual scheme and stratification approaches that enable the creation of a model reflecting a polysystemic representation of the supply chain. The following base levels of supply chain representation are considered: object-based, configuration/network-based, process-based, and logistics coordination levels. In the field of supply chain transformation and strategic development there is a strong need for the concurrent and aligned use of different supply chain representations. This defines the approach to building a generic supply chain representation based on composite simulation models. Depending on the addressed tasks of supply chain analysis and synthesis, process and system dynamics simulation models of different degrees of detail may be used. Agent-based modeling is used to model inter-organizational coordination between supply chain partners.

Keywords Simulation · Supply chain management · Synergetics · Stratification · Ontologies

1 Introduction

Changes in the external environment and the highly dynamic and unpredictable nature of current changes call for defining a strategy and direction of supply chain development that is resistant to market forces and demand fluctuations, adaptable, transformable and responsive to customer needs, and able to survive external turbulence in an environment featuring direct or indirect disturbing influences and multiple risk factors. Long-term sustainable development, changing and developing supply chains
(SC), transforming logistics systems and changing logistics processes become the objects of analysis in strategic terms. Comprehensive strategic modeling of a SC (its network structure, key business processes, and interactions between agents) helps to define the performance and development strategy of a supply chain over the longer term. The issues of coherence and the concurrent usage of different supply chain model representations in transformation and strategic development projects define the approach to building generalized supply chain models based on composite simulation models. The polysystemic representation of a model-based supply chain and the stratification approach use a conceptual scheme with the following base levels of supply chain representation: object-based, configuration/network-based, process-based, and logistics coordination levels. Depending on the addressed tasks of supply chain analysis and synthesis, process and system dynamics simulation models of different degrees of detail may be used. Agent-based modeling and simulation (ABMS) is used to model inter-organizational coordination processes between supply chain partners. The literature [1] and other sources offer different qualitative and quantitative supply chain modeling methods: analytical methods, simulation and modeling (S&M), physical experiments, heuristics, etc. Depending on the goals, various combinations of optimization, computer simulation, heuristics and statistics are used. Referring to the taxonomy of supply chain models, the authors of [2] point out hybrid modeling and the IT driver for the application of different SC planning methods. This paper relies on this general theoretical and instrumental foundation and further expands it with the principles of synergetics and the organizational sciences, attempting to focus not only on the structural characteristics, but also on the dynamic characteristics of supply chains in terms of their strategic development based on hybrid simulation. A number of studies have benchmarked the applicability of different simulation paradigms for supply chain examination. Tako et al. [3] studied the application of discrete event simulation (DES) and system dynamics (SD) as decision-making support systems in the field of logistics and supply chain management, taking account of the nature of the addressed tasks and the level of management. In particular, the authors showed that SD is the leader in bullwhip effect analysis tasks, with the same application frequency as DES in studying information sharing and return flows. DES is used more often than SD for studying SC structures and tactical and operational tasks. The authors share the view that both DES and SD may be actively used to achieve SC strategic planning goals. The status and prospects of supply chain simulation are analyzed in a review [4], which discussed the most popular applications of mono methods (DES, SD, ABMS) in scientific publications and determined the prospects of hybrid simulation modeling. Kersten et al. [5] have reached similar conclusions. The authors examine the application of various simulation methods for the description of core processes in a supply chain (as per the SCOR reference process model) and point out the fragmented nature of the well-known models, since only a few studies examine more than one SC process. This indicates that simulation tools are more often used for the improvement and re-engineering of individual processes rather than for a complex analysis of synergies in the performance of integrated supply chains.
A compensatory combination of SC simulation paradigms was addressed in
a number of studies [3, 6–12, 14, 60]. Hennies et al. [8] offered a mesoscopic approach for modeling supply chains that combines discrete impulse-like flows with piecewise constant flow rates. Palma-Mendoza [15] built a hybrid SC model, using different modeling paradigms at different levels of representation (SD for the aggregated model and DES for the more detailed model). Castilho et al. [6] proposed a hierarchically integrated set of models consisting of a system dynamics model to support strategic decisions, an analytical optimization model to support tactical decisions, and a discrete-event model to support decisions at the operational level. Various combinations of agents and processes in hybrid SC models are discussed in [11, 16–18]. However, such models failed to sufficiently reflect the nature of inter-organizational coordination. This calls for the identification of inter-organizational coordination mechanisms and processes for building a general simulation model of a SC. The paper discusses dynamic aspects of developing supply chains and the potential of the simulation modeling method and its paradigms in achieving the strategic goals of supply chain management, focused on efficient integration, inter-organizational coordination and long-term strategies of stakeholder cooperation, as well as the implementation of modern logistics technologies based on partner cooperation (for example, Vendor Managed Inventory (VMI) and Collaborative Planning, Forecasting and Replenishment (CPFR)). The author explores the genesis of inter-organizational coordination processes and their key mechanisms, which need to be represented in supply chain model descriptions. By benchmarking various simulation paradigms, the author demonstrates that ABMS best fits the purpose of describing inter-organizational coordination processes and cooperation within a supply chain. The agent-based simulation features discussed above are a topic of interest among researchers who study supply chains, specifically when the behavior of individual system entities in relation to each other is an important aspect. Behdani [19] points out the capabilities of ABMS, versus other simulation methods, in studying supply chain properties at the micro level (such as the numerousness and heterogeneity of its elements, local interactions, nestedness, and adaptiveness) and at the macro level of the supply chain (the properties of emergence, self-organization, co-evolution, and path dependency). A review of the vast academic literature [61] dedicated to ABMS applications highlights that most authors have insufficiently specified the properties of individual and social (expressed through interaction) behavior within a supply chain. A review of the literature on collaborative supply chains shows that the cooperation phenomenon based on information sharing and the coordination of the operational processes of SC agents has been studied thoroughly using ABMS.
Nevertheless, the capabilities of the ABMS method remain understudied with respect to several aspects: limited rationality related to conflicts and the establishment of trust between partners; complex cooperation processes and the development of long-term relations between supply chain partners in connection with organizational and technological changes in a supply chain, information and knowledge sharing, and agent adaptation over time; the emergence and dynamic reconfiguration of economic network structures built on the background of such long-term cooperation; and the influence of efficient trust-based cooperation on supply chain business development strategies over the longer strategic term.


The following works have contributed most to this area. Below is a review of the potential of simulating the social processes of coordination, collaboration and cooperation between supply chain agents, using the existing approaches proposed in the academic literature. The authors of [13] examine an agent model of a four-link supply chain to show that the decision-making process should evolve 'from the reductive approach (where the common strategy is an aggregate of individual strategies) to the integrative approach (where global optimization relies on cooperation)'. A common strategy is not shaped by summing up individual strategies (local optimization), but by aligning all individual strategies with the common objectives, namely the competitive ability of all partners and efficient performance throughout the supply chain. The authors show that the bullwhip effect may be mitigated by aligning the individual objectives of partners with the global objectives using different SC coordination mechanisms. The authors of [22] suggest a system dynamics model of collaborative supply chains highlighting such model components as stakeholders, topology, levels of collaboration, enabling technology, business strategy and processes. The list of potential stakeholders includes suppliers, manufacturers, wholesalers, retailers and end consumers, each playing its role and sharing responsibility for performing the key supply chain processes. Depending on stakeholders' decisions, various supply chain topologies are possible. Additional links within a supply chain increase costs on the one hand and, on the other hand, reduce the time required for a product to reach the retailer. The introduction of information technologies and cooperation involves costs, and the effect of cooperation needs time to show up. In the model, the authors also account for the dependence of the supply chain's efficiency on the degree of alignment of individual stakeholders' business strategies with the supply chain's common business strategy. The SC model, however, is created on the basis of system dynamics methods allowing for a cause-and-effect analysis of the examined factors, but it does not reflect the social behavior of partners, the dynamic reconfiguration of structures, or emergent effects in logistics systems. Arvitrida et al. [23] explore, using agent-based modeling, competition and cooperation between SC partners on a market landscape. The model describes a two-link supply chain consisting of suppliers, manufacturers and end consumers, each of which is an agent with its own behavior; it also depicts the limited rationality of agents, their willingness to compromise, and loyalty reflecting the degree of trust. Consumers select manufacturers relying on their preferences and commitment to collaborative work. Similarly, manufacturers select suppliers and compete between themselves for the consumer. Links may or may not be established. The model accounts for such factors as the cost of establishing long-term relations and the duration of cooperation relationships between agents. However, the application of agent-based simulation to the strategic management of supply chains that use logistics technologies of agent integration and cooperation, and to complex studies of the influence of various coordination mechanisms, information sharing and integrated planning, covering the organizational and technological aspects of introducing modern logistics concepts and information technologies, is very limited. The literature does not identify any models that allow assessing the efficiency of
collaborative planning, forecasting and inventory replenishment and the related information sharing based on SC agent coordination and cooperation mechanisms. This is specifically the case in situations involving the creation of partner coalitions, trust-based cooperation strategies and motivations based on individual objectives, and the emergence and resolution of conflicts in supply chain management in a dynamic environment. A comprehensive analysis of how inter-organizational coordination, as well as agent cooperation strategies based on trust and collaborative conflict resolution, influences supply chain performance in strategic terms is the principal area of research of this and future papers.
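To make the coordination argument sketched in this introduction more tangible, the following deliberately simplified simulation (an assumption-laden illustration, not a reconstruction of any model cited here) represents a four-link supply chain in which every link follows an order-up-to rule. Comparing a scenario in which each link sees only its downstream orders with one in which end-customer demand is shared shows how information sharing can dampen upstream order variability, i.e. the bullwhip effect. All parameters are arbitrary illustrative choices.

# Simplified sketch of a four-link supply chain with and without shared demand data.
import random
import statistics

def simulate(share_demand: bool, periods: int = 300, seed: int = 7) -> float:
    """Return the standard deviation of orders placed by the most upstream link."""
    random.seed(seed)
    n_links = 4                                  # retailer, wholesaler, distributor, factory
    lead_factor, alpha, safety = 4.0, 0.3, 10.0  # illustrative policy parameters
    inventory = [50.0] * n_links
    forecast = [10.0] * n_links                  # exponentially smoothed demand estimate
    upstream_orders = []

    for _ in range(periods):
        customer_demand = max(0.0, random.gauss(10.0, 2.0))
        demand_signal = customer_demand
        for i in range(n_links):
            # With information sharing every link observes end-customer demand;
            # otherwise it only sees the orders arriving from its downstream link.
            observed = customer_demand if share_demand else demand_signal
            forecast[i] = alpha * observed + (1 - alpha) * forecast[i]
            inventory[i] -= demand_signal                      # serve downstream demand
            order_up_to = safety + lead_factor * forecast[i]   # base-stock level
            order = max(0.0, order_up_to - inventory[i])
            inventory[i] += order                              # simplification: instant supply
            demand_signal = order                              # becomes demand for the next link
        upstream_orders.append(demand_signal)
    return statistics.stdev(upstream_orders)

print("order variability at the factory, local information only: ",
      round(simulate(share_demand=False), 1))
print("order variability at the factory, shared end-customer demand:",
      round(simulate(share_demand=True), 1))

In this toy setup the factory's order variability is typically noticeably lower when demand information is shared; richer agent-based models add limited rationality, trust and negotiation on top of such ordering logic.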

2 Issues of Strategic Management of Developing Supply Chains

2.1 Integration Paradigm in Supply Chain Management. Supply Chain Synergies

The core of the integration approach in Supply Chain Management (SCM) is the treatment of the logistics process as an integral whole within a supply chain, aimed at the efficient achievement of strategic business objectives and a cost/service balance, with the improved reliability and sustainability of a supply chain as the key characteristic of its performance in terms of logistics integration. Individual firms are viewed as supply chain links, which implies interaction between partners focused on the achievement of shared objectives of efficient performance throughout a supply chain and the resolution of any conflict of interest between supply chain partners/counteragents. In terms of the systemic approach, integration capability is a core challenge in SCM theory. Intuitively, integration is perceived as an action taken to achieve a holistic understanding of a composite system. As the SCM paradigm evolved and became established, different types of integration were studied: from operational logistics integration to the integration of key business processes in a supply chain, the integration of logistics infrastructure assets (developing logistics infrastructure asset management into a single complex), integration within outsourcing, inter-organizational logistics integration, supply chain strategic planning and control, integrated planning and inventory management in a supply chain, and information integration driven by the need to create a shared information space for supply chain counteragents [24]. In practice, integrated systems rest, for example, on stable long-term contract relations/agreements between the product/service manufacturer or owner and other independent entities within a sustainable supply chain. Supply chain integration also means a set of channels and communications, both internally and between supply chain partners. Flexible supply chain network structures rest on partner interaction. Logistics coordination and inter-organizational interaction (coherence, reciprocity) are prerequisites for efficient supply chain management. Strategic interaction and collaboration between firms (supply chain stakeholders), integration of value chain
participants, cooperation, and intra- and inter-organizational integration and coordination of supply chain participants are prerequisites for the efficient coordinated interaction of all elements of a logistics system and the consistent synchronization and optimization of logistics operations throughout a supply chain. Different types of integration ensure a synergetic effect that cannot be achieved through local optimization that disregards system efficiency or addresses only specific tasks in functional areas of logistics. In such conditions, analysis and synthesis tasks and the model description or representation (conceptualization) of a logistics system should be treated as an integral whole by applying a systemic approach. In terms of the analysis and synthesis of integrated supply chains and the system modeling tasks explored in this paper, the supply chain as a modeling object should be viewed from different angles:

• In terms of the object-based approach that views the logistics system as an aggregate of interacting material, financial and information flows
• In terms of the process-based approach that secures the integration and synchronization of key business processes into a single business model throughout the value chain (Value Chain Management), or the holistic optimization of logistics processes (procurement, production, and distribution cycle) throughout a supply chain
• In terms of flexible network structures and interaction (inter-organizational coordination) between supply chain partners
• In terms of information integration and knowledge sharing between participants, which stand out as important parts of adopting modern logistics concepts and of technology integration and collaboration in a supply chain
• In terms of managerial integration, or aligned goal achievement at the different management levels of a supply chain and the alignment of strategic, tactical and operational efficiency solutions in SCM.

Current SCM trends are shifting from operating efficiency improvement tasks toward strategic tasks set by developing, transforming or changing supply chains. Strategic planning and the long-term strategic development of supply chains; the prompt reconfiguration of supply chains; the efficient and sustainable performance of a supply chain in a turbulent external environment and under business transformations; and the introduction of long-term cooperation strategies between supply chain partners and the adoption of modern logistics concepts and technologies relying on partner integration and coordination represent a broad range of strategic challenges currently faced by a business which adheres to the integration paradigm in SCM and seeks strategic and competitive advantages in an uncertain and fast-changing environment. The implementation of a supply chain strategy implies the creation and analysis of multiple dynamic alternative supply chain structures (supply chain configurations) and transformation scenarios that address strategic development tasks and the generally efficient performance of a supply chain measured by a set of performance indicators. Strategic management consists in the systemic development of an object over time, and a study of the long-term effects and scenarios of such development is a priority objective. The impact of strategic initiatives on the entire composite logistics
and production infrastructure of a supply chain, coherence of corporate and other operating strategies, search for efficient supply chain configuration and management scenarios, and supply chain analysis and synthesis (design) make up the key group of strategic-level tasks requiring an appropriate modeling and decision-making tool. This paper focuses on a search for efficient (conceptual) model structures and simulation models to solve strategic tasks of supply chain development.

2.2 Inter-Organizational Coordination. Strategic Partnership Between Supply Chain Participants

Integration and strategic partnerships between supply chain participants are strategically vital for SCM. Let us discuss general issues of inter-organizational coordination in strategic terms. Logistics coordination is expressed in sharing profit, risks and responsibility (decision-making powers) between stakeholders. Supply chain participants cooperate in a competitive environment, although as partners. "Inter-organizational logistics coordination is coordination of actions (including the resolution of conflicts related to logistics parameters) between a supply chain's focal company and its partners to meet the objectives set by the supply chain" [24]. It is related to integrated interaction between partners, which in reality involves the occurrence, prevention and resolution of conflicts arising from contradictions between the local objectives of participants and the global objectives of efficient performance throughout the supply chain. The inter-organizational interaction of supply chain counteragents is analyzed by a large number of complementary scientific approaches, which include certain aspects of sociology, economic analysis, strategic management, and even psychology. Researchers do not share a common approach to defining supply chains (business networks) or a general research methodology. The most popular approaches are managerial and economic. For example, the managerial approach to network study does not rest on the analysis of network structures' economic nature alone, but also on the development of management strategies and the identification of the sources of competitive advantages created by a combination of the key characteristics of supply chain counteragents [25]. Choi and Hong [26] note that supply chain business process management implies the development of a certain inter-organizational coordination mechanism to be used for the coordination of strategies and the adaptation and synchronization of all actions taken by the interrelated parties. The approach includes three views: resource-based, evolutionary, and relational. The resource-based view, represented by J. Barney and R. Rumelt, like the theory of transaction costs in institutional economics (R. Coase), holds that inter-organizational interaction in a network of firms (a supply chain) generally causes higher transaction and operating costs in the short run but brings substantial benefits in the long run. The reason is the rational use of complementary resources, knowledge of
the external environment, experience in building long-term relations between supply chain counteragents, and joint value creation for the end consumer. The relational view [27] suggests that a strategic partnership of supply chain counteragents may become a source of "surplus profits earned jointly through sharing, which cannot be achieved by firms (supply chain participants) individually and in isolation". The theory rests on the idea of interaction between the specific assets of each supply chain participant and on knowledge and experience sharing. In supply chain terms, this phenomenon is usually called a "synergetic effect"; it is supplemented by the idea of the resource-based view, which studies cost optimization. The principal theoretical approaches to inter-organizational interaction within a supply chain are represented by a number of authors: the social network theory (S. Nadel, J. Mitchell), which treats a network as specific multiple links between agents within a certain group, where the characteristics of such links may serve to interpret the social behavior of the involved participants; the network concept of strategic management (R. Miles, C. Snow) and its treatment of a network as a new form of organizational structure securing the concurrent use of the shared assets of several economic agents; and parts of the new institutional economics theory (R. Coase, O. Williamson, I. MacNeil, V. Tambovtsev, G.B. Richardson), which studies contractual relations as forms of institutions, transaction costs, asset specificity, limited rationality and the opportunistic behavior of economic agents, as well as coordination mechanisms in hierarchical organizations using cooperative interactions. The use of a contractual approach as a tool for inter-organizational coordination in a supply chain shows [28] that hybrid relationships between participants are quite frequent due to asymmetries in power. Pursuant to the inter-organizational coordination concept [24], the resolution of SCM issues depends on the relationships between supply chain parties and is impacted by the different objectives and operational priorities of supply chain links, different capacities, capital concentration, financial position, the degree of profit and risk distribution between participants and the conflicts resulting therefrom, and a few other factors. Supply chain performance improvement requires the coordination of partners' interests and operations by "creating the relationships of trust implemented through integration, coordination and cooperation". The dynamics of collaborative supply chains and the behavior of supply chain partners were studied in [29–32]. The "4C" reference model of the maturity of inter-organizational relations was proposed in [31]; it distinguishes levels and models of maturity of the inter-organizational interaction of SC counterparties, namely communication, coordination, collaboration, and cooperation, corresponding to the integration of processes, information exchange, joint decision-making based on trust, and the formation of a community of equal partners demonstrating commitment to common strategic goals. Cooperation takes different forms of network interaction: a certain topology of stable links is developed and coalitions are built, which may vary over time, given a changing external environment and internal transformations. Cooperation at a strategic level is related to decisions that affect the potential areas of cooperation within a supply chain, for example, capital investments in logistics infrastructure
growth and logistics network restructuring through the exit of existing participants or the accession of new ones. Trust and partner relations have a positive effect on forming stable coalitions in the long run. In a dynamically changing external environment, trust-based cooperation within a supply chain affords synergies in creating value for the end consumer. In this situation, trust-based cooperation is highlighted as the principal source of strategic competitive advantages gained through the combined activity of participants. Such cooperation enables partners to coordinate their processes within a supply chain and harmonize their multiple managerial decisions to achieve SCM synergies. A new systemic vision of efficient coordination consists in transforming the suboptimal decisions of individual participants into an overall optimum throughout a supply chain, preserving the interests of all parties and achieving economic compromises among all participants. The divergence of supply chain agents' objectives and the differences in the available and potential resources and logistics capabilities, capital concentration and financial position of each partner are the key drivers and limitations in the search for efficient forms of cooperation. Cooperation also helps to redistribute transaction costs between the parties. Success is delivered by moving to a level of long-term win–win cooperation relationships that are based on harmonized relations and trust between participants, so as to account for the interests (benefits and risks) of all stakeholders and achieve a balance of individual objectives aligned with common interests. Trust is a complex category (a "soft", hard-to-measure factor describing institutional relations and rules) and usually builds on a background of long-term reliable relationships between partners (historical dependency). The art of compromise lies in searching for bottlenecks at the interfaces between supply chain links and gauging the depth of inter-organizational conflicts in a supply chain characterized by conflicts of interest. The efficiency of inter-organizational coordination is measured, among other things, by the degree of interest and satisfaction of all parties and by the cumulative effect securing competitive ability and optimal performance throughout a supply chain in terms of the cost/service balance and other criteria. In addition, the parties' interaction in a supply chain allows using each party's own experience to create shared knowledge (a form of social capital) of the external environment, thus making the supply chain more responsive. Strategic positioning aims to shape a common interaction strategy for all network/coalition participants and to build long-term strategic partnerships relying on the harmonization and balancing of common supply chain objectives and individual partner objectives. The study of inter-organizational coordination, behavioral aspects and the creation of new flexible forms of organizations or networks in the process of interaction and cooperation is a critical aspect of supply chain management and modeling and a factor of its dynamic behavior. Organizational aspects, agent behavior, personal interests and strategies, the trust-based nature and strategy of cooperation, and cooperation motivations and constraints are the dominant factors of the new environment, thus calling for a search for new approaches to the modeling of supply chains as organizational-type systems. Computer-aided multi-agent-based modeling opens up new opportunities for achieving these tasks, as shown below.


2.3 Modern Logistics Concepts/Technologies Based on Participant Integration

Inter-organizational coordination mechanisms in a supply chain are very diverse. Building inter-organizational cooperation includes information and knowledge sharing between the participants, the introduction of integrated planning systems, supply chain business process re-engineering, and other mechanisms: flexible network structures, social norms and trust, contracting and obligations, mutual approval routines and other relationship frameworks providing for the efficient management of inter-organizational interactions and long-term relationships. Key components of inter-organizational coordination include trust between parties and the accumulation of historical knowledge and experience, the coordination of objectives, and benefit, resource and risk sharing, which, ultimately, motivates supply chain participants. The supply chain strategy is based on the introduction of modern logistics concepts/technologies that rest on the integration of supply chain counteragents, such as VMI (Vendor Managed Inventory), CPFR (Collaborative Planning, Forecasting and Replenishment) and their combinations in various transformation projects for given supply chains. The choice of a logistics concept as a practical framework for supply chain organization often serves as a strategic platform for transformation projects. The implementation of such logistics concepts is based on the introduction by firms of various up-to-date information and Internet technologies (information integration), which, in turn, facilitates organizational and technological changes in a supply chain. The logistics concept of VMI consists in sharing demand and inventory balance information with the supplier, who undertakes the customer's inventory management. The logistics concept of CPFR, owing to a shared planning center, affords efficient inventory management across supply chain links. The introduction of VMI and CPFR technologies, which are based on counteragent integration and coordination, improves the overall performance of a supply chain and mitigates the bullwhip effect, i.e. improves supply chain resilience. Their introduction, however, involves certain organizational and technological transformations in a supply chain, implementation expenditures, and the redistribution of responsibilities and transaction costs between the participants. To streamline links within a supply chain, either centralized (for example, 4PL outsourcing providers) or decentralized organizational structures, or their substructures, may be chosen. Attempts to apply model-based approaches to studying the influence of the VMI and CPFR logistics technologies on supply chain efficiency have been made in a number of works [20, 33, 34]. However, they are fragmentary and do not cover the wide range of inter-organizational coordination mechanisms used in collaborative supply chains. The introduction of modern logistics concepts and technologies (VMI, CPFR) calling for cooperation and collaborative planning and forecasting requires, in reality, building mutually beneficial long-term cooperation relationships between partners
and devising efficient strategies of such cooperation that meet both common objectives and individual partner objectives. Organizational integration and coordination based on such logistics technologies call not only for enhanced and modified information sharing, but also for business transformations, logistics process re-engineering, and the modification and improvement of planning methods. In this situation, a proactive assessment of managerial decisions and their influence on supply chain performance is, strategically, a challenging task, which cannot be accomplished without appropriate supply chain modeling.
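As a purely illustrative reading of the VMI mechanism just described, the sketch below shows the kind of data a retailer might share and how a supplier-side order-up-to rule could turn it into a replenishment decision. The message fields, the policy and the parameters are assumptions made for the example; they are not drawn from the cited literature or from any standard message format.

# Hypothetical sketch of the information flow behind Vendor Managed Inventory (VMI):
# the retailer shares point-of-sale demand and current stock, and the supplier (not
# the retailer) decides the replenishment quantity with an order-up-to rule.
from dataclasses import dataclass

@dataclass
class InventoryReport:          # shared by the retailer each period
    retailer_id: str
    on_hand: float              # current stock at the retailer
    pos_demand: float           # point-of-sale demand observed this period

def vmi_replenishment(report: InventoryReport, demand_forecast: float,
                      lead_time: int = 2, safety_stock: float = 15.0) -> float:
    """Supplier-side decision: ship enough to cover forecast demand over the
    replenishment lead time plus a safety buffer (order-up-to policy)."""
    order_up_to = demand_forecast * lead_time + safety_stock
    return max(0.0, order_up_to - report.on_hand)

report = InventoryReport(retailer_id="R-17", on_hand=22.0, pos_demand=11.0)
print("supplier-initiated shipment:", vmi_replenishment(report, demand_forecast=10.0))

# Without VMI, the supplier would only see the retailer's batched, delayed orders;
# with VMI it acts directly on shared demand and stock data, and CPFR extends this
# idea to joint forecasting and planning across the chain.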

3 Theoretical Background

3.1 System Modeling and Stratification of Supply Chains

The theoretical and methodological foundation of supply chain modeling is represented by a group of sciences: general systems theory, cybernetics, applied systems analysis, management theory, synergetics, etc. The exploration of a complex system's synergies and properties, which are manifested through the interactions of its elements, is the key area of study for the systemological sciences. The core procedure of systems analysis is building a generalized model representing all factors and interactions in a real system. Systems analysis serves as the methodological foundation of comprehensive supply chain modeling, and as an analysis and synthesis (design) tool. The supply chain integration described above suggests the treatment of a supply chain as a single and consistent system. Integral supply chain management requires accounting for the multiple interrelations, interactions and interdependencies of all business processes used by a complex logistics system. Today, the theoretical and methodological foundation of the complex systems sciences is to a large extent being developed and replenished based on the principles of synergetics. Compared with the systemological group of sciences that preceded it, the emergence of synergetics and its variations, which study objects of different natures, has shifted dynamic systems analysis towards the exploration of specific structural and dynamic changes in complex systems and of self-organization processes in open nonlinear environments of different natures. System modeling based on the principles of synergetics helps to describe a system's evolution, driven by self-organization, as an open developing system with new properties and structures emerging within it [35]. The large number of interrelated and interacting elements, the multifunctionality and pronounced heterogeneity of elements, the diversity of cause-and-effect relationships, and the presence of nonlinear feedback relations impede supply chain studies. Structured analysis, the design of supply chains, and the systematic configuration of network structures within a supply chain traditionally focus on exploring different supply chain structures and require a coordinated representation of multiple supply chain structures.


SCM theory applies object-based and process-based approaches to supply chain modeling. In both, the focus is on a chain as an interrelated sequence of links used to deliver a product or service to the end consumer, or as a sequence of events/processes organized to achieve a desired business objective. The object-based approach treats a supply chain as a structure consisting of a focal enterprise and its suppliers and consumers, covering all links through which material and financial flows pass. Along with the object-based approach, the theory and practice of logistics and SCM successfully apply a process-based approach, in which a supply chain is explored and designed as a sequence of flows and processes. A supply chain here means a related sequence of flows and processes occurring between different links and integrated into the focal enterprise's strategic planning to meet consumer demand for products or services. The theory and practice of SCM use several options for the process/flow-oriented representation of a supply chain. These include, in particular, the S model, the SCOR model, the GSCF (Global Supply Chain Forum) model and a number of other supply chain models; the SCOR model is the most popular in business practice. In supply chain configuration tasks, the basis for decomposition may be the network description (network models and structures) or the channels or logistics networks (detailed down to the functional areas of logistics: manufacturing, procurement or distribution logistics). The basic standard supply chain structure represents its key participants: suppliers, manufacturers, distributors, retailers, and consumers. A supply chain is also characterized by dynamic complexity, which does not manifest itself in changing structures alone. Time and dynamics are significant factors in studying logistics systems, yet they are often disregarded when optimization problems are formulated. Time is a measure of logistics processes and of the order fulfillment cycle that defines the level of logistics service. The state and movement (turnover) of material and financial flows, the stability of and fluctuations in supply chains, demand dynamics and developments in the external competitive environment, supply chain adaptability as a system's ability to change its behavior and move to a new stable state, and, finally, the development processes themselves, development dynamics and scenarios should all be accounted for in the model-based analysis of dynamic supply chains. Supply chains are also stochastic systems, in which multiple uncertainty and risk factors should be accounted for [36]. The most significant factors, which cannot be disregarded in a proper performance analysis of dynamic and developing supply chains, include: demand uncertainty and fluctuations; external environment uncertainty (stochastic factors of the external environment, incomplete information, behavioral uncertainty (ambiguous objectives, biased decision-making, etc.)); logistics risks (and other stochastic factors) and bottlenecks that interfere with the operation of individual supply chain links; and a number of other factors.
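As a small illustration of the object-based view just described, the sketch below (a hedged example; all class and node names are hypothetical) represents the basic standard supply chain structure as a typed, directed graph of links, with material flows moving downstream and order information moving upstream.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    name: str
    role: str  # e.g. "supplier", "manufacturer", "distributor", "retailer", "consumer"

@dataclass
class Flow:
    source: str
    target: str
    kind: str  # "material", "finance" or "information"

@dataclass
class SupplyChain:
    links: dict = field(default_factory=dict)
    flows: list = field(default_factory=list)

    def add_link(self, link: Link):
        self.links[link.name] = link

    def connect(self, source: str, target: str, kind: str):
        self.flows.append(Flow(source, target, kind))

    def upstream(self, name: str, kind: str = "material"):
        """Return the links that feed the given link with flows of the given kind."""
        return [f.source for f in self.flows if f.target == name and f.kind == kind]

# Basic standard structure: supplier -> manufacturer -> distributor -> retailer -> consumer.
sc = SupplyChain()
for n, r in [("S1", "supplier"), ("M1", "manufacturer"), ("D1", "distributor"),
             ("R1", "retailer"), ("C1", "consumer")]:
    sc.add_link(Link(n, r))
for a, b in [("S1", "M1"), ("M1", "D1"), ("D1", "R1"), ("R1", "C1")]:
    sc.connect(a, b, "material")      # goods move downstream
    sc.connect(b, a, "information")   # orders/demand information moves upstream
print(sc.upstream("R1"))  # -> ['D1']
```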
The unavoidable multi-criteria nature of supply chain performance evaluation stems from the need to measure and compare a set of indicators for the supply chain as a whole and for its individual components, participants or processes, including the following:


revenues, logistics costs, logistics service quality, the accuracy and time of order fulfillment, demand dynamics, the efficiency of supply chain integration and coordination (including trust and the quality of relationships between partners), agent-specific risks, inventory dynamics and turnover in different logistics system nodes, and a number of other indicators helping to identify bottlenecks, constraints, imbalances and conflicts in different supply chain segments. A supply chain is also a complex social organizational system (i.e. one with human involvement). The study of behavioral aspects (more commonly the subject of the social sciences, organizational behavior, behavioral economics or psychology) and of inter-organizational interaction are critical aspects of supply chain management and modeling. In terms of content, coordination in a supply chain aims at aligning the individual objectives and behavioral options of individual participants with the global objective, and it affects the quality of global objective achievement. Participant and partner behavior, individual interests and objectives, motivation and decision-making, the establishment of coalitions, and the building and maintaining of trust-based relationships between supply chain agents, along with other factors attributable to coordination and cooperation, are also substantial dynamic factors in supply chain modeling. Coordination, cooperation and interaction between partners facilitate the creation of new organizational properties of a supply chain. Therefore, the cooperation strategy and the ability to self-organize may significantly affect performance indicators throughout a supply chain. The integrated nature of activity within a supply chain and the polysystemic representation of its issues require the concurrent achievement of a number of goals focused on integration and the systemic representation of a supply chain: the alignment and optimization of key value-adding business processes (a supply chain as a set of interacting processes), the management of interrelated material, financial and information flows, and the inter-organizational coordination, or collaborative interaction and cooperation, of supply chain participants. Describing a supply chain as an integral whole from the viewpoint of the systemic approach calls for studying the aggregate of multiple interrelated structures, flows, processes, participants and coordination mechanisms. The conceptualization, structuring and detailing of objects and processes in a supply chain may be performed from different angles, depending on the tasks addressed. In real tasks of supply chain analysis and synthesis, configuration and development analysis, these representations may overlap and complement each other. In conceptual modeling it is also necessary to address the static and dynamic descriptions of a supply chain, in addition to the structural descriptions. A dynamic description of a modeled supply chain should include a number of factors, processes, cause-and-effect relationships, and internal and external (exogenous) variables that add to and detail such structural descriptions. The model-based study of a complex supply chain faces the challenge of stratifying its structural layers and interpreting the interactions between the strata. This defines some conventional strata in SC model descriptions and approaches to the stratification and modeling of supply chains as objects of research of an interdisciplinary nature with substantial structural and dynamic complexity.


Fig. 1 Conceptual diagram of a general simulation model framework of a supply chain based on composite combination of SD-DES-ABMS paradigms: major representations and strata

The complexity of modern supply chains and of the management tasks to be addressed defines the challenge of comprehensive modeling based on consistent methodological principles and the alignment of the models of different strata and representations in the conceptual description of supply chains. The conceptualization and general model framework of a supply chain (Fig. 1) include a number of interrelated representations:


the object-based description of a logistics system, the network model (network topology), the process description, and the description of coordination functions and mechanisms; it also clarifies the composition of the descriptions, including the dynamic representation and the level of detail, depending on the management tasks addressed. Different strata of a complex system are characterized by different degrees of organization and by the nature of the dynamic processes occurring in them [35]. The stratum representing inter-organizational coordination functions and mechanisms includes the active elements of the modeled system, i.e. agents, and describes the inner structure of the complex system made up of supply chain agents. A supply chain includes multiple agents (suppliers, customers, etc.) with different needs, objectives and decision-making behavior. The other representation levels (network-based, process-based) of the modeled system generally represent the supply chain's production and logistics infrastructure (primarily physical), made up of the respective facilities (production, transportation and warehousing), which aggregates general and functional-area-specific properties and performance indicators. The phenomena of emergent behavior and self-organization describe the interrelation between the system's micro-structures and system-wide behavior. At the system-wide level, a complex system demonstrates its development and evolution trajectory, and the changes in the system and in its state over time. A general, conceptual, multilayered diagram (stratified description) of supply chain system modeling has been proposed; it is based on multilayered representations of the supply chain structures, configurations and processes that describe the strategic planning and development of the supply chain and of its production and logistics infrastructure, and description levels identifying the processes of inter-organizational coordination and strategic cooperation of supply chain agents have been defined.

3.2 SC Modeling Methods and the Analysis of Dynamic SCs

The SCM integration paradigm and improved SC analysis and synthesis methods call for modern computer-aided decision-making and simulation methods in SCM. Simulation supports managers in decision-making at the strategic, tactical and operational levels through visualization, an understanding of complex system behavior, and the analysis and synthesis of supply chain dynamics [37]. Many organizations have successfully introduced simulation models into their supply chain management and optimization practices. Simulation modeling (SM) [12, 38], which uses applied systems analysis as its methodological foundation, helps to overcome the restrictive assumptions of analytical mathematical models and offers more opportunities for examining complex logistics systems, such as:
• a comprehensive understanding of the processes and characteristics of a supply chain through charts and advanced animation


• the possibility of accounting for the stochastic nature and dynamics of multiple factors of the external and internal environment; the user can study the effects of various factors, accidental occurrences or risks and identify their influence on a supply chain
• the possibility of representing supply chain dynamics, reflecting the dynamic character of key logistics processes, behavioral aspects, and multiple time and cause-and-effect relationships
• the application of a multistage design procedure, which helps a decision-maker overcome decision-making difficulties and review a large number of alternatives, potential supply chain transformation scenarios and assessment criteria
• risk minimization owing to the preliminary analysis and modeling of potential scenarios of developments in a supply chain.
The most popular simulation paradigms widely used in supply chain study and modeling are discrete-event (process-based) simulation (DES), system dynamics (SD), and agent-based modeling and simulation (ABMS). Comprehensive SCM challenges require a combined use of different modeling methods. A stratified description of a supply chain, as shown above, should combine representation methods for network structures, processes, flows, cooperation and inter-organizational coordination and many other occurrences or phenomena in the description of dynamic supply chains. Let us discuss which simulation models can form the various strata of the proposed model framework. The selection of an appropriate modeling paradigm is an important step in developing a supply chain model: it limits the application areas in terms of describing the dynamics of logistics processes and phenomena, development processes, and the solution of specific strategic tasks. The ABMS, DES and SD paradigms take essentially different points of view on modeling the structure and dynamics of a supply chain for the different representations. For this purpose, we briefly describe and benchmark the DES, SD and ABMS methods in studying SC properties and cooperation processes.

Today, discrete-event simulation (DES) has become a practical technology of logistics engineering and auditing (so-called material flow models). A discrete-event model represents in detail the network structure of a supply chain and the movement of dynamic objects (goods, vehicles) through the network, and helps to measure the time and cost of core business processes and to analyze bottlenecks. Representing an SC through a discrete-event model allows a detailed description of the configuration and topology of the supply chain, with detailed characteristics and performance rules for particular business processes in the network nodes. This is very helpful in designing an optimal SC topology and configuration and in specifying individual design solutions related to the selection of an SC strategy, its network structure configuration, the location of logistics infrastructure facilities, the determination of logistics capacity needs, inventory management and transportation policies, business process re-engineering and many other matters within a complex solution for SC strategic and tactical planning. The key benefits and potential of DES in SC modeling include:
• describes complex topologies and network structures linked to a map


• accounts for individual characteristics (suppliers, regional demand, product mix, etc.)
• applies the Activity-Based Costing methodology – value analysis linked to the time parameters of the modeled business processes
• describes asynchronous logistics processes; measures the time parameters of logistics processes and dynamically changing routes
• analyzes bottlenecks in network nodes and process alignment; enables a convenient description and selection of material flow management strategies (pull, push)
• accounts for multiple stochastic factors (demand, reliability, supply failures, etc.)
• supports detailed algorithms describing the rules of cargo flow handling, traffic control, etc.
Therefore, DES works best for describing network configurations and core processes in a supply chain. Modeling by core processes (Plan, Source, Make and Deliver) and subprocesses in a supply chain is based on a reference process model following the SCOR recommendations. However, the SCOR model is a static tool that does not itself provide capabilities for dynamic SC analysis or for the active re-engineering of business processes using quantitative methods for analyzing SC performance indicators and dynamic parameters. The integration of simulation modeling and the SCOR reference model of operations provides advantages for forming a common simulation methodology for solving a wide range of supply chain management tasks. Barnett and Miller [39] described the architectural components used to implement a distributed supply chain modeling tool (e-SCOR) and e-SCOR applications that demonstrate how businesses are modeled and analyzed to determine the validity of alternative, virtual business models. Herrmann, Lin and Pundoor [40] described a new supply chain modeling framework that follows the SCOR model. The development and application of the e-SCOR technique are presented in [16, 41–44]. The e-SCOR technique offers and supports a common methodology and hierarchical structure for modeling processes in supply chains, based on the conceptual structure of the SCOR reference model; its building blocks combine standard processes from the SCOR model with simulation models of supply chain processes performed at various levels of detail, most often implemented with the DES process simulation technique (as well as ABMS), and provide not only the improvement but also the synchronization of processes in the SC. When the supply chain is modeled, a quantitative analysis of business process efficiency is carried out, which makes it possible to analyze order lead times, delivery accuracy and delivery speed, and the other indicators defined in the SCOR recommendations, as well as to identify bottlenecks in the processes and problems with process synchronization in the SC.
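To give a flavor of the discrete-event view, the following minimal sketch uses the SimPy package (an assumed tooling choice; the chapter does not name a DES library) to push orders through a single warehouse picking station: queueing at this bottleneck lengthens the measured order lead time. All rates and names are illustrative.

```python
import random
import simpy

LEAD_TIMES = []

def order(env, warehouse):
    arrived = env.now
    with warehouse.request() as slot:          # wait for a free picking slot (the bottleneck)
        yield slot
        yield env.timeout(random.expovariate(1 / 2.0))  # picking/packing time, mean 2 h
    yield env.timeout(8)                        # fixed transport delay to the customer
    LEAD_TIMES.append(env.now - arrived)

def order_source(env, warehouse):
    while True:
        yield env.timeout(random.expovariate(1 / 1.5))  # a new order roughly every 1.5 h
        env.process(order(env, warehouse))

random.seed(42)
env = simpy.Environment()
warehouse = simpy.Resource(env, capacity=1)     # a single picking station
env.process(order_source(env, warehouse))
env.run(until=200)
print(f"orders completed: {len(LEAD_TIMES)}, "
      f"mean lead time: {sum(LEAD_TIMES) / len(LEAD_TIMES):.1f} h")
```

Raising the station capacity or the picking rate and re-running the model is the kind of bottleneck analysis the bullet list above refers to.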


In management consulting, iterative optimization–simulation design procedures developed by applying heuristics are more efficient; as shown in the literature review [59] and in practical work in the field, they are usually applied to network structure optimization and SC logistics process modeling. The key phases of the procedure include a preliminary synthesis of the SC structure through optimization, a detailed simulation of SC processes followed by optimization (narrowing down multiple scenarios with the optimization function built into the simulator), and SC engineering and scenario analysis using heuristics and simulation, taking additional risk factors into account. Tool solutions emerging as domain-specific SC simulation systems (PRODISI, LogicNet Plus, Supply Chain Builder, SimFlex, anyLogistix) usually apply this method and combine the capabilities of analytical optimization methods and process simulation. Useful system methodologies in the management area used to analyze and improve logistics structures and processes include Goldratt's theory of constraints (TOC), VSM (Value Stream Mapping), ABC (Activity-Based Costing) and a few others, which can hardly be applied without the quantitative analysis of business process efficiency and the alignment of SC logistics processes ensured by the DES technique. However, although widely applied in the logistics field, the method has a number of limitations when used for analyzing stability, behavioral aspects, decisions on supplier or customer relationship management, or self-organization processes in supply chains. Dynamic entities in a process model are passive entities operating within a rigid structure (network); they describe logistics infrastructure elements such as transport and warehousing facilities, cargo flows and resources, but they cannot reproduce the activity of supply chain agents who make their own decisions. Nor can the decentralized decision-making and self-organization inherent in the representation of inter-organizational coordination processes be reproduced with DES constructs. And although most researchers assign the network configuration task to the tactical level, reproducing development dynamics, evolution and structural transformation in DES-based modeling is not an easy task.

The concept of system dynamics (SD) proposed by Forrester describes the modeled complex system in terms of interacting flows of different natures and multiple interacting feedback loops. System dynamics allows dynamic processes to be simulated at a high level of aggregation; it rests on the notion of a dynamic system functioning as a set of flows (cash, material, etc.). The general structural diagram of system dynamics models identifies two parts: the flow network and the information network. In system dynamics models, system problems are described at an aggregated level and over the longer term. The first system dynamics model of a supply chain was developed by Forrester [46] for a simple production and sales system that consisted of only two flows, a material flow and an order flow, whose interaction was defined by the order-quantity rules regulating procurement and inventory in an organization. The model also accounted for the organizational relationships and delays occurring in the system. These were the first adaptive supply chain models, later perpetuated in the Beer Game described by Sterman [47]. The model allowed the study of potential fluctuations and imbalances in system behavior caused by random demand changes, which resulted in periodic inventory fluctuations arising from organizational relationships, the management rules of production plants, wholesalers and retailers, and the influence of delays on order and material flows. Logisticians later called such supply chain effects the "Forrester effect" or the "bullwhip effect."
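The bullwhip effect just described can be reproduced with a few lines of aggregate, stock-and-flow style code. The toy sketch below simulates three echelons that each smooth their demand forecast and order to close their inventory gap, and it prints how order variability grows upstream; the rules and constants are invented for the illustration and are not the chapter's model.

```python
import statistics

ECHELONS = 3          # retailer -> wholesaler -> factory
TARGET_STOCK = 100
ALPHA = 0.3           # exponential-smoothing constant for demand forecasts

inventory = [TARGET_STOCK] * ECHELONS
forecast = [10.0] * ECHELONS
orders_placed = [[] for _ in range(ECHELONS)]

# End-customer demand: steady at 10, then a one-off step up to 14.
demand_series = [10] * 10 + [14] * 40

for d in demand_series:
    incoming = d                       # demand seen by the retailer this period
    for i in range(ECHELONS):
        inventory[i] -= incoming       # ship what the downstream stage asked for
        forecast[i] += ALPHA * (incoming - forecast[i])
        # Order = forecast plus a correction of the full inventory gap (an aggressive rule).
        order = max(0.0, forecast[i] + (TARGET_STOCK - inventory[i]))
        orders_placed[i].append(order)
        inventory[i] += order          # replenishment assumed to arrive within the period
        incoming = order               # this order becomes the demand seen upstream

for i, label in enumerate(["retailer", "wholesaler", "factory"]):
    print(f"{label:10s} order std. dev.: {statistics.pstdev(orders_placed[i]):.2f}")
```

The printed standard deviations increase from the retailer to the factory: the small demand step is amplified as it travels upstream, which is precisely the oscillation the early SD supply chain models were built to study.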


The application of system dynamics in studying and engineering supply chains helps to:
• study fluctuations in supply chains and the bullwhip effect
• analyze the time parameters and aggregate costs of supply chain performance
• show the complex interactions involved in managing material, financial and information flows when making managerial decisions
• develop a strategy and perform integrated management of the business processes and resources of partners – studying the systemic functions of logistics
• research the impact of various factors (demand dynamics, competitive environment, market situation and other exogenous factors), and demonstrate how lower logistics system performance leads to the loss of customers and market share (the marketing concept of logistics).
Strategic analysis and dynamic models of organizations are the most popular application areas of system dynamics in management consulting practice globally [47–50]. Consulting companies develop system dynamics models of organizations, use the models for strategic forecasts, offer advice based on experiments with performance improvement models, promote "systemic thinking" among managers, shape their mental models, and hold training and business games in companies to teach managers how to take coordinated decisions. Including SC submodels in dynamic models of organizations helps to develop and analyze a Balanced Scorecard (BSC) system as it evolves over time and in a strategic perspective, and to align logistics and corporate strategies (marketing, innovation, etc.) – cross-functional interaction. A number of studies address adaptive SC modeling using SD methods [22, 51–53]. However, the system dynamics paradigm applies an aggregate, top-down modeling approach without specifying particular agents or entities; heterogeneous objects with different properties are mainly defined in a model through the volume and time characteristics of the examined flows. This limits the method's ability to describe organizational interactions and the emergent behavior effects inherent in representations of inter-organizational interaction processes. Therefore, system dynamics best fits the object-based approach to SC decomposition. SD and DES reproduce (emulate) the actual performance of the logistics infrastructure at various levels of aggregation of SC objects or processes.

Agent-based modeling (ABMS) focuses on identifying the active elements of a system, i.e. agents (individuals or organizations), and the interactions between them and with the environment. Global (system-level) behavior emerges from the interactions of agents and their individual behavior. An agent is an active system element that is, to some extent, independent and can make its own decisions relying on the available information about the environment and other agents' actions. An agent can be intelligent, i.e. learn from its own experience. The academic literature notes such agent properties as autonomy, reactivity, proactivity, social ability, adaptability, etc. System behavior is described at the individual level, while global behavior is treated as the cumulative result of the activities of agents existing in a common environment, each acting according to its own rules.


The behavior of a complex system is an outcome of the interactions between the agents whose behaviors are manifested in it, which enables the observation and study of general patterns and properties inherent to the system. An agent-based simulation model is a bottom-up model: it is created by specifying the individual behavior logic of the agents, while the behavior trends, patterns and characteristics of the entire system emerge as integral behavior characteristics of the aggregate of agents making up the system. The key objective of agent-based models is to obtain knowledge of the global rules, general behavior patterns and trends, and the system's dynamic properties on the basis of individual assumptions, the individual behavior of active objects and their interactions within the system. In such models, an agent is a supply chain link (a company) that acts independently, using the local information available to it to respond to market changes. There is information interaction between supply chain participants (while the system may lack centralized management). The key reasons for applying agent-based modeling in supply chain management are the following:
• agent-based models are suitable for the analysis of interrelated problems when there are multiple agents with scattered (independent) knowledge and certain communication patterns
• a focus on cooperation and collaborative planning strategies
• the complex system of communications between the different links of a supply chain
• the high degree of independence of each supply chain link and the principles of decentralized management.
Social or economic behavior and agent interaction within a supply chain are key factors of the system's dynamic behavior. Multiplicity and heterogeneity, local interactions, bounded rationality, the activity of decision-making agents, adaptability, and knowledge sharing and decision revision across the various subjects in the system and their interrelations are the factors that significantly influence the description of supply chain agents' behavior. ABMS therefore works best for representing the processes and phenomena of inter-organizational coordination within a supply chain. The key benefits of using agent-based modeling in supply chain management are the following: it supports the key task of SC management, i.e. coordination and interaction between different agents; it can align the internal business processes of SC partners with supply chain-wide business processes; a multi-agent model helps to develop common business rules and introduce a system for managing common business processes, thus ensuring efficient information exchange; and agent-based simulation models, similar to business games, enable businesses to develop a trust-based strategy. The model reproduces emergent behavior and new organizational structures on the basis of participant interaction rules, i.e. certain properties of the modeled supply chain as defined by managerial decisions on agent relations.
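As a minimal bottom-up illustration (not the authors' AnyLogic model), the sketch below defines two retailer agents with different local ordering heuristics and a supplier agent that allocates scarce capacity; the system-level pattern of lost sales is not encoded anywhere explicitly but emerges from the agents' individual rules. All names and numbers are hypothetical.

```python
import random

class Retailer:
    def __init__(self, name, rule):
        self.name, self.rule, self.stock, self.lost = name, rule, 40, 0

    def decide_order(self):
        # Local decision rule: only the agent's own stock is visible to it.
        if self.rule == "lot-for-lot":
            return max(0, 50 - self.stock)
        return 30 if self.stock < 20 else 0      # "batch" heuristic

    def serve_demand(self, demand):
        sold = min(self.stock, demand)
        self.stock -= sold
        self.lost += demand - sold

class Supplier:
    def __init__(self, capacity):
        self.capacity = capacity

    def allocate(self, orders):
        # Proportional allocation when total orders exceed the weekly capacity.
        total = sum(orders.values())
        scale = min(1.0, self.capacity / total) if total else 0.0
        return {name: qty * scale for name, qty in orders.items()}

random.seed(1)
retailers = [Retailer("R1", "lot-for-lot"), Retailer("R2", "batch")]
supplier = Supplier(capacity=45)

for week in range(52):
    orders = {r.name: r.decide_order() for r in retailers}
    deliveries = supplier.allocate(orders)
    for r in retailers:
        r.stock += deliveries[r.name]
        r.serve_demand(random.randint(15, 35))

for r in retailers:
    print(f"{r.name} ({r.rule}): lost sales over the year = {r.lost:.0f}")
```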


Let us now examine which modeling constructs are suitable for describing strategic development, inter-organizational coordination and cooperation processes, and emergent effects in supply chains. For this purpose, the model structure of a complex, adaptive and growing supply chain should concurrently reproduce the properties of the different representation levels of the modeled system (Fig. 1). A prerequisite for emergent behavior and self-organization, and consequently for the reproduction of dynamically built structures in a system, is the presence of a hierarchy of levels within that system. SD's aggregated approach does not allow such phenomena to be reproduced in SC models. SD and DES also have difficulties incorporating the evolution of complex adaptive systems, since in both paradigms the system structure is treated as constant rather than as a structure transformed by the cooperation effects created by agent interactions. In addition, the agents within a system change: their composition and decision-making rules change depending on their interactions with other agents and the external environment, and on the knowledge and experience accumulated during cooperation (adaptive agents). Such aspects can only be modeled with ABMS. However, the decisions resulting from coordination and interactions between agents directly affect the logistics infrastructure and logistics processes that determine the efficient performance of a supply chain, and thus define the properties and overall efficiency of the supply chain. This motivates a compensatory (composite) combination of simulation paradigms in high-level SC models based on the multi-layer conceptual diagram of the model framework proposed above. DES and SD are used to describe evolution and development, the SC structure and some properties of adaptive SCs (SD), while the SC structures transformed by agent decisions, and the occurrences and phenomena of dynamically formed structures and inter-organizational coordination, are reproduced with ABMS (Fig. 1). Let us discuss the general approach to building composite simulation SC models describing the different strata. A conceptual model of a developing dynamic supply chain rests on the conceptualization and stratification approaches discussed above: representations of objects and flows, network configurations, logistics processes and inter-organizational coordination (agents and their cooperation). The model structure of a supply chain should align the conceptual level at which individuals make their decisions and act with the level that describes the status, basic structure and development of the supply chain. All model variables change continuously over a long period of time under the influence of external and internal factors, in conditions of transforming system structures and SC properties. Effective modeling constructs for developing supply chains build on the principle of a composite combination of system dynamic, process-based and agent-based simulation models. Composite dynamic SC models operate on a single model and information framework, which enables the arrangement of information-sharing processes and interaction mechanisms between the representations of the modeled system. The upper model layer represents the logistics infrastructure of a supply chain and the business environment in which economic agents manifest their individual behavior, and which predefines the decision-making rules, the agents' experience and knowledge, and the cooperation strategies. In turn, the model layer that describes the behavior of, and interaction between, agents launches the processes of self-organization, cooperation strategies, and new organizational forms that define overall supply chain performance and management.
Such an approach to building multi-model complexes based on composite simulation models makes it possible to study the dynamics and development of a supply chain by using the cyclic interconnection of the model strata in the examined organizational system [35].
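A toy illustration of this cyclic interconnection of strata (again an assumption-laden sketch, not the chapter's model): an agent-level coordination layer decides each quarter whether the partners keep sharing forecasts, based on the cost realized so far, while an aggregate flow-level layer simulates inventory dynamics for the chosen mode and feeds the result back to the agent layer.

```python
import random

def flow_layer(share_forecasts, periods=13):
    """Aggregate stock-and-flow layer: weekly inventory dynamics for one quarter."""
    stock, cost = 20.0, 0.0
    for _ in range(periods):
        demand = random.gauss(50, 10)
        forecast_error = 3 if share_forecasts else 12    # sharing improves forecast accuracy
        replenishment = demand + random.gauss(0, forecast_error)
        stock += replenishment - demand
        cost += max(stock, 0) * 0.5 + max(-stock, 0) * 4.0   # holding vs. shortage cost
    return cost

def coordination_layer(prev_cost, baseline_cost, sharing):
    """Agent layer: keep cooperating only if it has paid off so far."""
    if sharing and prev_cost > baseline_cost:
        return False          # cooperation did not pay off -> stop sharing
    if not sharing and prev_cost > 1.1 * baseline_cost:
        return True           # things got worse without sharing -> try cooperating again
    return sharing

random.seed(7)
sharing, prev_cost = True, None
baseline_cost = flow_layer(share_forecasts=False)        # reference run without cooperation
for quarter in range(1, 9):
    cost = flow_layer(sharing)                            # strata link: cooperation mode -> flows
    if prev_cost is not None:
        sharing = coordination_layer(cost, baseline_cost, sharing)   # flows -> agent decision
    prev_cost = cost
    print(f"Q{quarter}: sharing={sharing}, quarterly cost={cost:,.0f}")
```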


4 Ontological Modeling of the SCM Domain

The availability of multiple languages for model representations of the domain poses the need to align models described in different languages and methodologies and to search for a single language for describing the SCM domain. This requires an integrated computer environment supporting both formal and informal business strategy modeling methods, the SCM strategy, and the work on SC analysis and synthesis at different levels of conceptualization of the SCM domain, so that overlaps and links between the different descriptions can be identified. The structure of a model complex and the semantic description of the domain can be developed on the basis of ontological modeling [7, 54–56]. A model complex of the supply chain management domain is created using modern ontology modeling tools (Protégé, an ontology editor supporting OWL; a small illustrative snippet is given at the end of this section) on the basis of a multilayered semantic network built on a metaontology (the critical categories of strategic management) [55], subject area (SCM) ontologies and applied ontologies (SCOR, etc.), and it represents a knowledge management system in the SCM domain. A high-level meta-model in the form of an ontology can serve as a basis for integrating heterogeneous representations in SC strategic transformation projects. The metaontology is detailed by means of the domain ontologies, whose number is limited only by the practicability of further detail. The ontological models of the SCM subject area and the applied ontologies represent heterogeneous model descriptions of structures, network configurations, business process models and SC coordination mechanisms in different modeling languages. Ontological models ensure a consistent conceptual framework and a glossary for all project participants, provide SC model visualization tools interpretable by a wide range of specialists (managers, system analysts and all stakeholders), ensure the information interaction and logical consistency of the models that form the domain, and serve as the manager's interface for working with a multi-model consisting of simulation and other SC models [21, 45]. Models and methods for the ontological description and transformation of supply chains are created on the basis of the framework conceptual schemes examined in this paper and meet the following requirements:
• a comprehensive description of the SCM domain suited to the integration and analysis of its aspects in SC transformation and strategic development projects;
• the availability of the most general and basic definitions and relationships in the modeled SCM domain, with the possibility of specifying its aspects in detail;
• a description of dynamic SC models with different levels of detail based on the conceptual schemes;
• navigation through different levels of SC description (models), including the strategic level, the link with the corporate strategy, the SC configuration, the object-based and process-based SC representations, and the modeling of inter-organizational coordination mechanisms;
• machine readability, with possible full or partial translation into other modeling languages for integration with existing methodologies and modeling tool environments;


• the availability of SC model visualization tools interpretable by a wide range of specialists;
• a consistent set of definitions and a glossary for all participants in the SC transformation and strategic development process.
The ontological approach to SC modeling offers benefits such as versatility and the ability to describe various SC design aspects, from the set of strategies and objectives to the organizational structure and the set of business processes; the model fits any level of detail, from the top-level description of the basic categories of strategic management down to the level of designing specialized analytical applications; and certain aspects of the framework scheme may be described in other modeling languages using the definitions introduced in the ontological model. A complex of ontological models also makes it possible to establish links between models at different levels of generalization, while visualization simplifies the perception of, and work with, complex conceptual schemes and the large number of objects in the interconnected dimensions of generalization, aggregation and detailing at the various conceptual levels of SC representation. An ontological model integrates the definitions used in the various aspects of strategy and SCM and creates a harmonized set of definitions that is understandable and adjustable by both simulation specialists and logistics specialists; it is machine-readable and translatable into other tool environments. Applied ontological models of SCM are used to create an information base for building a general simulation model of SCM, including the object-based SC representation, its network configuration, the process-based representation, and the description of logistics coordination mechanisms. General simulation models of SCM, as components of the metaontology, allow the future status and dynamics of a developing supply chain to be analyzed as an integral system, and make it possible to devise a supply chain development strategy and align it with the logistics strategy, facilitated by the scenario analysis of strategic options in a dynamic external environment.
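As a small, hedged illustration of what a machine-readable fragment of such an applied SCM ontology can look like, the snippet below uses the rdflib Python library (an assumed tooling choice; the chapter itself mentions Protégé and OWL) to declare two link classes and a supplies-to relation and to serialize them as Turtle; the namespace and all names are invented for the example.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

SCM = Namespace("http://example.org/scm#")   # hypothetical namespace for the example
g = Graph()
g.bind("scm", SCM)

# Classes of the (toy) SCM domain ontology.
for cls in (SCM.SupplyChainLink, SCM.Supplier, SCM.Manufacturer):
    g.add((cls, RDF.type, OWL.Class))
g.add((SCM.Supplier, RDFS.subClassOf, SCM.SupplyChainLink))
g.add((SCM.Manufacturer, RDFS.subClassOf, SCM.SupplyChainLink))

# An object property linking supply chain counterparties.
g.add((SCM.suppliesTo, RDF.type, OWL.ObjectProperty))
g.add((SCM.suppliesTo, RDFS.domain, SCM.Supplier))
g.add((SCM.suppliesTo, RDFS.range, SCM.Manufacturer))

# Two individuals and one relation between them.
g.add((SCM.AlphaComponents, RDF.type, SCM.Supplier))
g.add((SCM.BetaWorks, RDF.type, SCM.Manufacturer))
g.add((SCM.AlphaComponents, SCM.suppliesTo, SCM.BetaWorks))
g.add((SCM.AlphaComponents, RDFS.label, Literal("Alpha Components Ltd.")))

print(g.serialize(format="turtle"))
```

Fragments of this kind are what allow the conceptual framework to remain both human-readable (via visualization tools) and translatable into the tool environments of the simulation models.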

5 Simulation Applications

The applied task was to demonstrate how SC management improvements achieved through inter-organizational coordination and cooperation technologies can be assessed in a multi-agent model that supplements the general object-based or process-based model of a supply chain. The conceptual foundations for building a multi-agent SC model are presented in [32]. A supply chain includes multiple agents with different needs, objectives and decision-making behavior. The activity of SC elements (organizations) is determined by their own objectives and interests, and their behavior is manifested in an open and dynamic business environment.


An agent-based model of a supply chain should include a decomposition of the supply chain (its basic structure and participant composition) together with detailed descriptions of the agents' behaviors, the agents' internal structures, their cooperation processes, interaction methods, the alternative communication methods resulting from cooperation in a layered supply chain structure, the redistribution of powers and responsibilities (the distribution of duties and responsibilities), and the implementation of the planning and coordination processes that link agent decisions. The model contains descriptions of specific coordination processes and of the mechanisms of inter-organizational integration, cooperation and coordination in a supply chain, which identify and explain social interactions on the basis of the ABMS paradigm. In a supply chain, suppliers, manufacturers, retailers and buyers form a social network with multiple formal and informal interactions. Formal interaction between partners is mostly exercised through contracts or agreements, which define the obligations and terms of transactions between the parties. For example, information and knowledge sharing between SC partners may be governed by their formal interaction (i.e. contract terms) or by informal social factors (i.e. trust-based relationships). Motivation issues and conflicts between partners are an important factor for agents when making individual and coordinated decisions. The economic analysis covers not only the performance indicators of the supply chain, but also the benefits and risks of all process participants, in order to measure conflicts and build an aligned scheme of economic compromise between the partners. Relationships between supply chain links based on trust and partnership imply an interrelation between the links and each link's willingness to take a step towards the other. If trust is undermined by conflicts, relationships may be damaged, up to and including dissolution and the creation of new coalitions (variable structures). Trust affects the quality of cooperation, while the history of long-term partner relations (historical dependence, the accumulation of experience) results in strategic partnerships and stable alliances. The model makes it possible to assess the influence of inter-organizational coordination on organizational structures and processes in SCM. Different types of chain interactions may be directly or indirectly dependent on each other. The organization of a supply chain as a dynamic entity is the outcome of actions undertaken by agents; the agents are limited by their organizational structures and available resources. In addition, partner coordination mechanisms affect decision-making, whose outcomes influence the performance and development of the physical (logistics) infrastructure and of the organizational links directly connected with it. Decision-making in a network also depends on, and is limited by, the attributes of its physical components (production facilities, the existing logistics infrastructure), organizational and technological changes and SC transformation decisions, as well as by the quality of information sharing and the applicable methods of integrated (collaborative) planning. Therefore, a general model of a supply chain should describe both the social aspects (cooperation processes) and the physical aspects (production and logistics infrastructure, key business processes), since the coordination function is aimed directly at SC process alignment (the configuration and process levels of the model layer representation).


The scenario study of collaborative supply chains and the definition of strategies for the inter-organizational interaction and cooperation of supply chain participants using ABMS is based on the "4C" reference model of inter-organizational relationship maturity [31], which defines the levels and models of maturity of inter-organizational interaction between SC counterparties: communication, coordination, collaboration and cooperation. Composite simulation models were implemented in the AnyLogic environment and tested for the following scenarios: (1) introduction of the VMI logistics technology with agent cooperation strategies; (2) introduction of the CPFR logistics technology with agent cooperation strategies. The VMI simulation model included the principles of information sharing (information integration) between agents and vendor inventory management rules across the different links of the supply chain. The CPFR simulation model additionally included collaborative planning procedures and appointed a planning agent, authorized by the other planning process participants to perform collaborative planning and inventory replenishment in all SC links. The simulation helped to quantify various strategies and cooperation options in a supply chain. The baseline scenario used a traditional multi-link supply chain operating in a dynamic environment, which, after the introduction of the logistics technologies and certain inter-organizational integration mechanisms, featured changes in the organizational structure, logistics infrastructure, business processes and responsibilities (functions), and in income and risk sharing between the partners. The decision-making function has a built-in procedure for resolving conflicts between partners when conflicts are revealed through regular monitoring. The simulation model helped to assess, over the longer term, the efficiency of coordination used as a basis for creating an efficient scheme of economic compromise that ensures both the stability and quality of (trust-based) relationships between agents and the integral efficiency of SC performance by multiple criteria: incomes (of agents), logistics costs (cumulative and per link), inventory balance and turnover (cumulative and per link), logistics cycle time, service quality, and others (reliability and stability). Performing scenario studies on a multi-model complex makes it possible to analyze SC operating processes under different management strategies and under SC configurations involving different SC participants and their coordination mechanisms, and to assess the prospects of SC strategic development. Assessments of the influence of network node conflicts and imbalances on the performance and development of the supply chain as a whole have been reviewed in [32].
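To hint at how such a trust/conflict mechanism can be encoded in an agent-based model (the rule below is a generic illustration, not the rule used in the authors' AnyLogic models), consider a pairwise trust score that grows slowly in smooth periods, drops sharply on conflicts, and triggers a partner switch, i.e. a variable structure, once it falls below a threshold.

```python
import random

TRUST_GAIN, TRUST_LOSS, DISSOLVE_AT = 0.05, 0.20, 0.30

def update_trust(trust, conflict):
    """Asymmetric trust update: conflicts erode trust faster than smooth periods rebuild it."""
    return max(0.0, min(1.0, trust - TRUST_LOSS if conflict else trust + TRUST_GAIN))

random.seed(3)
suppliers = {"S1": 0.15, "S2": 0.08}    # probability that a period ends in a conflict
current, trust, switches = "S1", 0.6, 0

for period in range(1, 61):
    conflict = random.random() < suppliers[current]
    trust = update_trust(trust, conflict)
    if trust < DISSOLVE_AT:             # relationship dissolves -> the structure changes
        current = "S2" if current == "S1" else "S1"
        trust, switches = 0.5, switches + 1

print(f"final partner: {current}, final trust: {trust:.2f}, partner switches: {switches}")
```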

6 Main Conclusions and Future Research

Efficient system and simulation solutions in supply chain management rest on the following:
• the principles of managerial integration and of balanced strategic, tactical and operational decisions, and the principles of aligning models of different levels and descriptions
• the polysystemic representation and simulation of the logistics and supply chain management domain as a basis for creating a single model framework


• the synergy paradigm and the composite system dynamic and agent-based models of supply chains built on it, which make it possible to describe structural dynamics, the emergence of new organizational structures, the performance of a supply chain and its measurable characteristics, as well as the self-organization processes manifested through the behavior of supply chain agents, their cooperation strategies and logistics technologies based on collaboration and collaborative planning.
The integrated nature of SCM activity and the synergetic effect require the examination of different forms of integration. The authors have studied the key types of integration in a supply chain. The introduction of logistics technologies that use partner coordination and integration (VMI, CPFR, etc.) leads to the transformation of structures and to organizational and technological changes in a supply chain. The synergetic effect in SCM requires a number of aspects to be defined and studied. The authors have proposed a conceptual diagram of a general simulation model of a supply chain and have defined the main strata of its model representation. This paper examines the potential of, and benchmarks, various simulation paradigms for describing dynamic phenomena in supply chains as complex structural and dynamic systems. The paper deals with the genesis, key factors and mechanisms of inter-organizational coordination as objects of representation in dynamic models of developing systems, such as behavior, motivation and the alignment of partner interests, integrated collaborative planning and the related information and knowledge sharing, and the formation of coalitions and flexible network structures. Composite simulation models include descriptions of the evolution and development of transforming supply chains (using SD and DES constructs) and model descriptions representing the inter-organizational coordination processes of supply chain agents on the basis of ABMS, using the cyclic interconnection of the different model strata. Such modeling constructs help to study the structural and dynamic aspects of SCs (adaptive and developing supply chains), solve the tasks of the long-term development and efficient transformation of supply chains, align strategic managerial decisions at the inter-organizational level, and search for efficient inter-organizational coordination and long-term cooperation strategies between supply chain participants. The interrelation between inter-organizational coordination and agent cooperation strategies, which are based on trust and collaborative conflict resolution, and supply chain performance in strategic terms is the principal area of research of this paper and of future studies. In the future, the focus will be on the alignment of strategic, tactical and operational SCM solutions within a single model framework. Future research on creating a multi-model SC complex covers the methods of its creation and of building models in the SCM domain by means of ontological modeling [54, 56, 57]. The authors will develop a suite of ontological models in the SCM and strategic planning domain as a single tool base for aligning models at different management levels and representation strata of the designed supply chains when solving the tasks of the strategic development and performance improvement of supply chains.


Applied ontological models of SCM are used to create an information base for building a general simulation model of SCM, including the object-based representation, the SC network configuration, the process-based representation, and the description of inter-organizational coordination mechanisms. The SC simulation model, acting as a component of the metaontology [55, 58], enables the analysis of the future status and dynamics of a developing supply chain as an integral system and the design of a development strategy, facilitated by the scenario analysis of strategic options in a dynamic external environment.

References

1. Shapiro, J.F.: Modeling the Supply Chain. Wadsworth Group, Pacific Grove, CA (2001)
2. Min, H., Zhou, G.: Supply chain modeling: past, present, and future. Comput. Ind. Eng. 43, 231–249 (2002)
3. Tako, A.A., Robinson, S.: The application of discrete event simulation and system dynamics in the logistics and supply chain context. Decis. Support Syst. 52, 802–815 (2012)
4. Oliveira, J.B., Lima, R.S., Montevechi, J.A.B.: Perspectives and relationships in supply chain simulation: a systematic literature review. Simul. Model. Pract. Theory 62, 166–191 (2016)
5. Kersten, W., Saeed, M.A.: A SCOR based analysis of simulation in supply chain management. In: Proceedings of the 28th European Conference on Modeling and Simulation. Brescia, Italy (2014)
6. Castilho, J.A., Lang, T.E., Peterson, D.K., Volovoi, V.: Quantifying variability impacts upon supply chain performance. In: Proceedings of the 2015 Winter Simulation Conference, 1892–1903 (2015)
7. Fayez, M.S., Rabelo, L., Mollaghasemi, M.: Ontologies for supply chain simulation modeling. In: Proceedings of the 2005 Winter Simulation Conference, 2364–2370. IEEE, Orlando, FL (2005)
8. Hennies, T., Reggelin, T., Tolujew, J., Piccut, P.A.: Mesoscopic supply chain simulation. J. Comput. Sci. 5, 463–470 (2014)
9. Jain, S., Sigurðardóttir, S., Lindskog, E., Andersson, J., Skoogh, A., Johansson, B.: Multi-resolution modeling for supply chain sustainability analysis. In: Proceedings of the 2013 Winter Simulation Conference, 1996–2007 (2013)
10. Kim, W.S.: Effects of a trust mechanism on complex adaptive supply networks: an agent-based social simulation study. J. Artif. Soc. Soc. Simul. 12(4), 2 (2009)
11. Long, Q.: A multi-methodological collaborative simulation for inter-organizational supply chain networks. Knowl.-Based Syst. 96, 84–95 (2016)
12. Lychkina, N.: Simulation of dynamic supply chains. Logist. Supply Chain Manage. 6(89), 137–152 (2018)
13. Ponte, B., Costas, J., Puche, J., de la Fuente, D., Pino, R.: Holism versus reductionism in supply chain management: an economic analysis. Decis. Support Syst. 86, 83–94 (2016)
14. Terlunen, S., Horstkemper, D., Hellingrath, B.: Adaption of the discrete rate-based simulation paradigm for tactical supply chain decisions. In: Proceedings of the 2014 Winter Simulation Conference, 2060–2071 (2014)
15. Palma-Mendoza, J.A.: Hybrid DES/SD simulation conceptual framework for supply chain analysis. Int. J. Data Sci. 2(3), 246–259 (2017)
16. Chatfield, D.C., Harrison, T.P., Hayya, J.C.: SISCO: an object-oriented supply chain simulation system. Decis. Support Syst. 42(1), 422–434 (2006)
17. Krejci, C.: Hybrid simulation modeling for humanitarian relief chain coordination. J. Hum. Logist. Supply Chain Manage. 5(3), 325–347 (2015)
18. Persson, F., Bartoll, C., Ganovic, A., Lidberg, M., Nilsson, M., Wibaeus, J., Winge, F.: Supply chain dynamics in the SCOR model—a simulation modeling approach. In: Proceedings of the 2012 Winter Simulation Conference, 1–12. Berlin (2012)
19. Behdani, B.: Evaluation of paradigms for modeling supply chains as complex socio-technical systems. In: Proceedings of the 2012 Winter Simulation Conference, 3794–3808 (2012)
20. Ramanathan, U.: Performance of supply chain collaboration—a simulation study. Expert Syst. Appl. 41(1), 210–220 (2014)
21. Poniszewska-Maranda, A., Matusiak, R., Kryvinska, N., Yasar, A.-U.-H.: A real-time service system in the cloud. J. Ambient Intell. Hum. Comput. 11, 961–977 (2020)
22. Angerhofer, B.J., Angelides, M.C.: A model and a performance measurement system for collaborative supply chains. Decis. Support Syst. 42, 283–301 (2006)
23. Arvitrida, N.I., Robinson, S., Tako, A.A.: How do competition and collaboration affect supply chain performance? An agent-based modeling approach. In: Proceedings of the 2015 Winter Simulation Conference, 218–229 (2015)
24. Sergeyev, V.: Supply Chain Management: Bachelor and Master Degree. Uright, Moscow (2014)
25. Bek, M., Bek, N., Buzulukova, E., Sheresheva, M.: Research Methodology of Network Organization. Higher School of Economics Publishing, Moscow (2011)
26. Choi, T.Y., Hong, Y.: Unveiling the structure of supply networks: case studies in Honda, Acura and DaimlerChrysler. J. Oper. Manag. 20, 469–493 (2002)
27. Dyer, J.H., Singh, H.: The relational view: cooperative strategy and sources of interorganizational competitive advantage. Acad. Manag. Rev. 23, 660–670 (1998)
28. Radaev, V.: Relational exchange in supply chains and its constitutive elements. J. Econ. Sociol. 16, 81–99 (2015)
29. Barratt, M.: Understanding the meaning of collaboration in the supply chain. Supply Chain Manage. Int. J. 9(1), 30–42 (2004)
30. Fugate, B., Sahin, F., Mentzer, J.T.: Supply chain management coordination mechanisms. J. Bus. Logist. 27(2), 129–161 (2006)
31. Lejeune, M.A., Yakova, N.: On characterizing the 4 C's in supply chain management. J. Oper. Manag. 23(1), 81–100 (2005)
32. Sergeyev, V., Lychkina, N.: Agent-based modelling and simulation of inter-organizational integration and coordination of supply chain participants. In: 2019 IEEE 21st Conference on Business Informatics (CBI), 2, 436–444 (2019)
33. Janamanchi, B., Burns, J.R., Liu, S.: Performance metric optimization advocates CPFR in supply chains: a system dynamics model based study. Cogent Bus. Manage. 3(1) (2016)
34. Sari, K.: On the benefits of CPFR and VMI: a comparative simulation study. Int. J. Prod. Econ. 113(2), 575–586 (2008)
35. Lychkina, N.: Synergistics and development processes in socio-economic systems: search for effective modeling constructs. Bus. Inform. 1, 66–79 (2016)
36. Hoshovska, O., Poplavska, Z., Kryvinska, N., Horbal, N.: Considering random factors in modeling complex microeconomic systems. Mathematics 8(8), 1206 (2020)
37. Lychkina, N.: Innovative paradigms of simulation and their application in management consulting, logistics and strategic management. Logist. Supply Chain Manage. 5, 28–41 (2013)
38. Lychkina, N.: Simulation Modeling of Economic Processes. INFRA-M, Moscow (2014)
39. Barnett, M.W., Miller, C.J.: Analysis of the virtual enterprise using distributed supply chain modeling and simulation: an application of e-SCOR. In: 2000 Winter Simulation Conference (WSC), 1, 352–355 (2000)
40. Herrmann, J.W., Lin, E., Pundoor, G.: Supply chain simulation modeling using the supply chain operations reference model. In: Proceedings of the ASME 2003 Design Engineering Technical Conference, 1–9. Chicago, Illinois, USA (2003)
41. Fredrik, P.: SCOR template—a simulation based dynamic supply chain analysis tool. Int. J. Prod. Econ. 131(1), 288–294 (2011)
42. Ntabe, E.N., LeBel, L., Munson, A.D., Santa-Eulalia, L.A.: A systematic literature review of the supply chain operations reference (SCOR) model application with special attention to environmental issues. Int. J. Prod. Econ. 169, 310–332 (2015)
43. Persson, F.: SCOR template—a simulation based dynamic supply chain analysis tool. Int. J. Prod. Econ. 131(1), 288–294 (2011)
44. Šitova, I., Pečerska, J.: A concept of simulation-based SC performance analysis using SCOR metrics. Inf. Technol. Manage. Sci. 20, 85–90 (2017)
45. Kryvinska, N., Bickel, L.: Scenario-based analysis of IT enterprises servitization as a part of digital transformation of modern economy. J. Appl. Sci. 10(3), 1076 (2020)
46. Forrester, J.: Industrial Dynamics. MIT Press (1961)
47. Sterman, J.D.: Business Dynamics: Systems Thinking and Modeling for a Complex World. McGraw-Hill (2000)
48. Morecroft, J.: Strategic Modelling and Business Dynamics: A Feedback Systems Approach. Wiley (2007)
49. Pidd, M.: Computer Simulation in Management Science. Wiley (1998)
50. Warren, K.: Strategic Management Dynamics. Wiley, USA (2008)
51. Bhattacharjee, S., Cruz, J.: Economic sustainability of closed loop supply chains: a holistic model for decision and policy analysis. Decis. Support Syst. 77, 67–86 (2015)
52. Crowe, J., Mesabbah, M., Arisha, A.: Understanding the dynamic behaviour of three echelon retail supply chain disruptions. In: Proceedings of the 2015 Winter Simulation Conference, 1948–1959 (2015)
53. Langroodi, R.R.P., Amiri, M.: A system dynamics modeling approach for a multi-level, multi-product, multi-region supply chain under demand uncertainty. Expert Syst. Appl. 51, 231–244 (2016)
54. Grubic, T., Fan, I.-S.: Supply chain ontology: review, analysis and synthesis. Comput. Ind. 61, 776–786 (2010)
55. Idiatullin, A.R., Lychkina, N.N.: Instrumental implementation of enterprise architecture models based on ontologies. Bus. Inform. 5, 31–42 (2011)
56. Scheuermann, A., Leukel, J.: Supply chain management ontology from an ontology engineering perspective. Comput. Ind. 65, 913–923 (2014)
57. Laurier, W., Poels, G.: Invariant conditions in value system simulation models. Decis. Support Syst. 56, 275–287 (2013)
58. Lychkina, N.: Strategic development and dynamic models of supply chains: search for effective model constructions. In: Bi, Y., Kapoor, S., Bhatia, R. (eds.) Lecture Notes in Networks and Systems: Proceedings of the SAI Intelligent Systems Conference (IntelliSys) 2016, 2, 175–185. Springer, London (2018)
59. Kumar, S., Nottestad, D.A.: Supply chain analysis methodology—leveraging optimization and simulation software. OR Insight 26, 87–119 (2013)
60. Rabelo, L., Eskandari, H., Shaalan, T., Helal, M.: Value chain analysis using hybrid simulation and AHP. Int. J. Prod. Econ. 105(2), 536–547 (2007)
61. Santa-Eulalia, L.A., Halladjian, G., D'Amours, S., Frayret, J.M.: Integrated methodological frameworks for modeling agent-based advanced supply chain planning systems: a systematic literature review. J. Indust. Eng. Manage. 4, 624–668 (2011)

Time Management and Procrastination

Loretta Pinke, René Pawera, and Oskar Karlík

Abstract The aim of this paper is to introduce the time management tools that can be used to combat procrastination. The work focuses mainly on these two topics: time management and procrastination. The purpose is to present software as a tool to combat procrastination and to point out its effectiveness in organizing time. Organizations that use time management software have a better overview of their activities, so we expect their performance to grow.

Keywords Time management · Procrastination · Scheduling · Planning · Motivation

1 Introduction

Nowadays, procrastination is a big problem that many people face every day. Procrastination means the morbid postponement of tasks. When procrastinating, the person in question prefers to do something different and less important, just to avoid the duties that are important. Subsequently, they regret doing something else rather than the essentials, and after the remorse and frustration the whole process repeats. It is therefore important to realize that we all have 24 h a day and that it is up to us how we handle this time. Time management is important for distributing the available time and for creating a schedule. Thanks to time management, it is easier to identify which activities are more urgent and important, and therefore should be resolved as a matter of priority, and which can be postponed until later. Time management is a very popular topic nowadays, as it turns out that some people either do not want to or do not know how to use it, and so waste valuable time. The use of time management can also help in solving various problems that are not under the control of the person in question; by repeatedly practising time management methods, we become prepared to handle various problem situations. Time management can be one of the key tools in the fight against procrastination. The aim of the paper is to present selected time management software as a tool to combat procrastination. The theoretical part is divided into sections on procrastination and time management. The second section contains the forms and types of procrastination, its causes, possible consequences and solutions for combating it. The third section deals with the development of time management, its influence on procrastination, time management methods and the most common mistakes in time management. The practical part of the paper contains instructions that describe how to work with online software in order to organize time, assign tasks and monitor the performance of various tasks at a specified time. Not only top managers and organizers of various events have problems with procrastination, but also completely ordinary people who are trying to fulfil their daily or life goals. It is always important to find a way to defeat procrastination and be more productive. Time management helps reduce the consequences of procrastination, and the tools presented in this work can increase the performance of every person who decides to use them.

2 Procrastination

Procrastination, from the Latin "pro-crastinus", literally means "belonging to tomorrow", i.e. the morbid postponement of duties and tasks. When procrastinating, we cannot force ourselves to accomplish the tasks we have to do. We often do activities that are not significant so that we do not have to do the things that are important. Insignificant activities include, for example, playing computer games, watching TV shows, cleaning up or spending time on social networks. Usually, remorse, frustration and a sense of helplessness follow, which again cause us not to do the tasks. Procrastination and laziness do not mean the same thing. A procrastinating person is not lazy; he would like to do something, but he is not able to persuade himself to do the task. On the contrary, a lazy person does not want to do anything, does not mind at all and is happy with it. Procrastination does not mean rest either, because while resting we gain new energy, whereas procrastination takes our energy away. With low energy, we are more likely to procrastinate again [1].


2.1 Forms of Procrastination

The following section focuses on the individual forms of procrastination. Four different forms of procrastination are discussed in the following parts.

2.1.1 Situational and Feature Procrastination

Lay divided procrastination into situational procrastination and procrastination based on a specific feature (feature procrastination) [2]. In situational procrastination, a specific individual exhibits passive behaviour only under specific conditions and in a specific situation; it is therefore only occasional procrastination. On the other hand, feature procrastination is procrastination in which the individual procrastinates regularly, at every activity and in any situation. Schouwenburg differentiated three subtypes in the population [3]. The first type are individuals who show reduced situational procrastination due to the circumstances, but high values of feature procrastination. This means that over the past week these individuals did not postpone the activity that was being monitored, although it does not mean that they were not lax in carrying out activities other than those monitored. The second type are individuals who achieve high values of situational procrastination and also high values of feature procrastination; this group has the largest representation in the population. The last subtype are individuals with high values of situational procrastination but low values of feature procrastination. In other words, these individuals experienced a decrease in performance in the monitored activity, but this is not a typical way of working for this subgroup. Situational procrastination can be considered beneficial in two cases. The first case is avoiding sudden and ill-considered decisions, in the sense of active or functional procrastination. In the second case, procrastination is part of the creative process: at the incubation stage, we deliberately put the problem aside in order to wait for the right solution, which suddenly emerges. This process is sometimes referred to as a creative pause [4].

2.1.2 Active and Passive Procrastination

Passive procrastinators are traditional procrastinators who do not want to postpone tasks but procrastinate because they do not know how to make quick decisions and act. The opposite are active procrastinators, who make decisions quickly and on time; however, they deliberately interrupt work on tasks and devote themselves to other important work. The differences between active and passive procrastinators lie in the affective, behavioural and cognitive dimensions [5]. Yan studied whether achievable goals are factors that contribute to the inconsistent relationship between personal (self-oriented) perfectionism and passive procrastination [6]. Another goal of the research was to get a better idea of active procrastination by comparing its similarities and differences with passive procrastination. The study concluded that achievable objectives were not factors contributing to the inconsistent relationship between passive procrastination and personal perfectionism. Passive and active procrastination are completely different. Active procrastination had no relation to personal perfectionism and no connection with any type of achievable goal; in a group with a low focus on perfectionism it had a negative relationship to the avoidance of performance-oriented goals. Passive procrastination had a negative relationship to personal perfectionism, a negative relationship to task-oriented (mastery) goals and was positively related to performance-oriented goals. Therefore, it is necessary to distinguish between active and passive procrastination.

2.1.3 Functional and Dysfunctional Procrastination

Ferrari and Emmons divide procrastination into functional and dysfunctional [7]. Functional procrastination can be described as a deliberate postponement of duties: procrastination is a deliberately chosen strategy when something needs to be done as quickly as possible, and the procrastinator sees it as a benefit that can increase the likelihood of success in a task. Speculators on the financial exchange can be cited as an example. Dysfunctional procrastination is the opposite of functional procrastination and does not bring any desired effect; it is considered an inappropriately chosen strategy at the wrong time. The procrastinator prefers to devote himself to less important activities that are more attractive to him, for example watching TV instead of studying. Dysfunctional procrastination is further divided into arousal procrastination and avoidant procrastination [8]. During arousal procrastination, the individual deliberately delays and postpones tasks so that his stress level increases, believing that his effectiveness increases in this way. However, this individual may not fulfil the task due to a lack of available information or a lack of time, or may be unable to be sufficiently efficient and accurate at the same time. Before the deadline, the individual hastily gathers all materials or is able to perform only certain parts of the task [9]. Avoidant procrastination is associated with fear of failure: an individual tries to protect his self-esteem and prefers not to perform the task so that he does not have to be confronted with his own abilities [10].

2.1.4 Decision-Making Procrastination

Decision-making procrastination is a maladaptive pattern of procrastination. Decisions are most often delayed in stressful situations, such as when we face a choice or a conflict. In decision-making procrastination, individuals need more time to make decisions in order to find and evaluate information on alternative options [11, 12]. According to Hen and Goroshit, decision-making procrastination is a chronic cognitive type of procrastination [12]. If each variant of the solution seems risky or unsatisfactory to the individual, and the individual no longer believes that a better solution could emerge, then he chooses the path of procrastination as an escape from a stressful situation [13]. Research conducted by Ferrari and Dovidio revealed that people with high values of decision-making procrastination seek information strategically and systematically, but look for more information, especially about the alternatives already chosen [11]. This means that the problem of decision-making procrastination is no longer limited to "how long" but extends to "how". A suitable tool for measuring procrastination in decision-making is Mann's Decisional Procrastination Scale [11, 14].

2.2 Types of Procrastination

2.2.1 Optimistic and Pessimistic Procrastination

Lay distinguishes between optimistic and pessimistic procrastination [2], and Milgram, Gehrman and Keinan confirmed this dichotomization [15]. The difference between optimistic and pessimistic procrastination is that an optimistic procrastinator postpones his decisions and tasks but is not concerned about his passivity, whereas a pessimistic procrastinator also postpones his decisions and tasks but is concerned about his inaction [16].

2.2.2 General and Academic Procrastination

General and academic procrastination refer to the daily routine [17]. General procrastination, i.e. the procrastination of tasks and daily activities in the general adult population, is not associated with the academic environment; it relates to the postponed activities of everyday life [18]. Academic procrastination is related to the academic environment. This type of procrastination most often occurs in university students and is the most researched type of procrastination. Academic procrastination is the postponement of assignments and study duties, which include, for example, writing seminar papers and final theses, preparing for exams or various home preparations [18]. Özer, Demir and Ferrari estimated the prevalence of academic procrastination among undergraduates to be between 50 and 70%, twice as high as in the general population [19].


2.3 Causes of Procrastination

2.3.1 Previous Influences

Research conducted by Klingsieck, Grund, Schmid and Fries investigated why students procrastinate [20]. They paid attention mainly to previous influences on procrastination and to students' experience, and distinguished two types of previous influences: personal and situational. Personal previous influences include motivation, the ability to plan, emotions, focus on competencies and personal characteristics. Motivational influences include the circumstances associated with starting the activity and internal motivation. The ability to plan covers questions about the planning and performance phase, the absence of self-discipline, a lack of focus on one main objective and a lack of self-organization. Emotional influences include concerns about the potential negative consequences of failure or a sense of disappointment at failure. Competency orientation covers a lack of the competencies needed to accomplish a task; it was important for participants to be able to estimate the time needed and to master the task procedures well enough to start their intended activities on time. Spontaneity, laziness and decision-making problems were incorporated into personal characteristics. The social environment, the external structure and the natural characteristics of the task belong to the situational previous influences. The category of social influences contains not only aspects concerning the persons themselves but also the social environment and the attitudes of others towards procrastination. The topics that fall within the external structure range from the number of other tasks and distractions by other activities to the degree of external structure. The interviewees claimed that they had been procrastinating through other activities because they had to do other important things or attend other meetings, but they also added that this overload was due to past procrastination. The natural characteristics of a task include task scope, task complexity, attractiveness, task importance and external stimulus. The more complex and challenging the task, the more pronounced the procrastination.

2.3.2 Type of Task and Deadline

Gafni and Geri found differences in procrastination depending on whether the task was compulsory or voluntary and whether individuals had their own deadline or a generic one [21]. The research was carried out over seven semesters; only in the first two semesters was the task voluntary, and in the remaining five semesters it was compulsory, so the students were divided into two groups: compulsory and voluntary. In the first part, students were tasked with finding an up-to-date article (printed or electronic) in a newspaper whose topic was related to the course, reading the article, analysing it and sending the analysis to the course discussion forum during the designated week, each student having his/her own deadline. In the first part of the research, procrastination was measured by counting the days between the due date and the actual date of submission. If the task was submitted on the due date, the value was zero; if it was submitted after the due date, the value was the positive number of days by which the task was submitted late; and a negative value, equal to the number of days left until the due date, was assigned if the task was submitted before the deadline. In the second part, students were supposed to post two comments analysing the work of other peers, in which they could expand the analysis with additional options or explain other ideas; repetition in both parts was prohibited. Each student had to read the previous posts before completing any part of the task. Although students wrote comments on the work of their classmates, the second part was considered a common task because they had to rely on and refer to it. A general deadline was set here, at the end of the semester. In the voluntary group, only one student submitted a comment; the voluntary group took procrastination to the extreme and did not complete the second part at all, so procrastination in the second part was examined only for the mandatory group. Students were not penalized for late submission, and their assignments were judged on quality, implying that the assignments had to have meaningful content. Gafni and Geri found no gender differences in procrastination. Only four students in the voluntary group did not complete the first part [21]. The delay in submission was longer for the voluntary group than for the mandatory one. More students in the mandatory group submitted the assignment on the due date or the following day than in the voluntary group. Students from the mandatory group who submitted their assignment after the deadline submitted it within ten days, but students from the voluntary group who were late submitted it even more than two months later. Almost half of the students submitted their comments in the last two weeks of the semester and a sixth of the students even after the end of the semester, among them those who had submitted their first assignment at the beginning or in mid-term. Nearly a third of the students posted both of their comments on the same day. In addition, the authors found standards of behaviour: if one student from a group did not comment, the other students followed and also did not add their comments; these standards of behaviour were also found the other way round, that is, when some students submitted comments earlier, this influenced the other students in the group, who then did not hand in the task after the deadline. Their study showed that people are more punctual in individual tasks than in general tasks on which they work with someone else. Time management of tasks on which more than one person cooperates requires, in addition to assigning individual deadlines, the division of the work into several individual consecutive tasks that have their own closing times.

2.4 Procrastination Solutions

Ferrari et al., based on clinical experience, found that procrastinating people often experience the following five cognitive distortions (the "Big Five") [18]:

1. Overestimating the time left to complete a task.
2. Underestimating the time it takes to complete a task.
3. Overestimating future motivational states to work on tasks. A typical statement is, "I'll be more in the mood to do it later".
4. Mistaken beliefs about the emotional conformity necessary in order to succeed in the task. A common statement is "People should study only if they feel comfortable doing so".
5. The belief that if a person is not in the mood, the work will not be productive. The distinctive phrase is "It's not a good idea to work when you're not motivated".

If people struggle with procrastination, a significant part of the fight is to persist with the methods for handling this problem. It is important that procrastinators use them and, above all, that they do not put them off. First of all, the procrastinator should think about why he procrastinates. The emotional roots of procrastination include inner feelings, hopes, dreams, memories, doubts, pressures and fear. Procrastinators avoid unpleasant sensations because they cannot even imagine what is going on inside. Various questions arise here: Is the feeling of discomfort related to the procrastinator's past? Why can't they work on the task right away, and what is stopping them? If they understand their own reasons, there is a better chance of determining the right direction to find a solution [22]. There are 12 methods to handle procrastination:
• Identify a behavioural target. The behavioural goal should not be vague or global; it should be concrete, specific and observable, for example, "I want to clean up and organize my garage by September 1st".
• Set realistic goals. Focus on one and only one goal at a time. Think small rather than large. Choose a minimum acceptable goal rather than an ideal goal, such as "One hour a day I'll be reading a book".
• Divide your goal into several smaller and specific objectives. Smaller targets are easier to reach than larger ones, and smaller goals help to achieve a larger goal.
• Be realistic about time. Ask yourself how much time you need for the task and how much time you really have, for example, "It's better if I look in my calendar to see when I can start. It took me more time than I expected last time".
• Just start. Instead of trying to do the whole project at once, divide it into smaller steps. An example might be "What's the first step I can take to get started?".
• Take advantage of every 15 min. Anything can happen in 15 min, and the only way to complete a task is to work on it 15 min at a time. That is, even 15 min is valuable. Ask yourself, "What part of the task can I do in the next 15 min?".
• Expect obstacles and setbacks. Do not give up as soon as you hit the first or second obstacle. An obstacle is just a problem that needs to be solved; it is not a reflection of your value or your competences.
• If possible, delegate the task. Are you really the only person who can do the job? Are you sure the task has to be done in its entirety? Remember, no one can do everything.
• Protect your time. Do not take on unnecessary or extra projects. Learn how to say "no" to others. One can always choose not to respond to what is "urgent" and instead do what is important.
• Watch out for your excuses. Instead of looking for an excuse to procrastinate, use it as a signal to spend 15 min of time on the task. You can also use your excuse as a reward for completing the next step.
• Reward yourself for your progress and focus on the effort, not the result. Avoid all-or-nothing thinking: the glass can be half full as well as half empty. Even a small step is progress.
• Use procrastination as a signal. Rather than asking what message procrastination sends you, focus on how you feel when procrastinating, what it means and what you can learn from it. You always have a choice: you can procrastinate or perform [23].

3 Time Management

3.1 Time Management Definitions

Before digging into time management and its methods, it is important to define the term itself. For centuries, time management has helped people organize their professional lives. However, the literature remains uncertain about how, whether, when and why time management leads to outcomes such as better work performance and well-being. Aspects of time management appear in various disciplines such as psychology, sociology and behavioural economics [24]. Haynes states that time is a unique resource which is either available or in shortage [25]. Time is always in motion; it is not possible to stop it, save it or replace it. Time is a fixed resource: every day has 24 h and every minute has 60 s, and this applies to everyone. These characteristics of time require it to be handled wisely and carefully. How time is used varies with the kind of work people perform, and time is also perceived differently by individuals: some say time passes quickly or is short, others say there is simply not enough of it. All of these statements reflect a comparison between the time available and the amount of work to be done: on the one hand the ability to perform the work, on the other the time available for it [26].


Time has become one of the most important factors in research analysis. Specialists from various disciplines such as economics, management and political science affirm that time passes quickly, has limited duration and is irreversible. So that people stop wasting their time, several time management methods have been developed which increase how efficiently time is used for completing job tasks. One of the conditions of time management is controlling how much time is devoted to particular tasks; in this way not only the quality but also the efficiency of the work increases [27]. Time management applies the managerial phases: planning, organizing, implementing and evaluating. The term time management means using time so that all tasks are completed in a given time period, which requires good knowledge and experience regarding the time needed for each task. The given time is divided into a schedule to ensure that the necessary tasks are successfully executed. Nevertheless, issues and problems may occur which are not under the person's control. The daily practice of time management can be trained by an individual to overcome the above-mentioned problems. Moreover, the learned attitudes, the various alternatives as well as the solutions shall help an individual to face difficulties when needed [26].

3.2 Evolution of Time Management

Time management, or personal time management, has been practised since the era of early Homo sapiens two hundred thousand years ago, although no one used these terms. Our ancestors not only had to run a longer distance than their prey; they also had to cover a longer stretch in the same time as the predator that chased the hunter. In order to save his life, the hunter had to know how to make better use of his time, even when hunters fought among each other. Every civilization that has ever existed could only survive when it knew when to perform which action, when to sow and harvest, and so on. Even building the famous pyramids in Egypt required long-term plans. Before the year 1945, people wrote notes on a piece of paper to know what needed to be done or bought; they used to write these things down in calendars or notebooks [28].

3.2.1 First Generation

The first generation comprises the tools that help us remember what needs to be done. It is necessary to keep in mind what needs to be done and how to divide time between individual activities such as attending a meeting, writing a report or cleaning up. Typical for this generation are simple and well-arranged to-do lists and notes. People who belong to the first generation of time management always carry their notes and to-do lists with them. Throughout the day, completed tasks can be crossed off and new tasks added to the list. People who use first-generation tools tend to be more flexible because they have the ability to respond to the changing environment and to stimuli that come from other people. These people do not have difficulties adapting to new situations or solving a problem that suddenly occurs. They create their own time schedule, which they follow, and they do what they think needs to be done or is urgent. Putting the thoughts from one's head onto a to-do list on paper helps to reduce stress. The weakness of the first generation is the probability of forgetting a meeting or breaking a promise. If people do not set goals and a vision, they will achieve less important results; they consider the most important thing to be whatever is in front of them. The first generation answers the question of what to do [29, 30].

3.2.2 Second Generation

The second generation of time management uses "planning and preparation" tools, such as planning calendars and diaries. This generation emphasizes personal responsibility, efficiency, planning, goal setting and the timing of future activities. People who use these tools write down their responsibilities and deadlines, schedule their meetings and note down their meeting places. People use various types of tools for these activities, including online programs and computers. People who belong to this generation feel a higher level of personal responsibility for the results and the fulfilment of the given tasks. They use time schedules, planning calendars and diaries to remember everything, but also to motivate themselves to prepare better for their meetings and presentations. Planning and setting goals increase not only the performance of an individual but also their results. These people prefer their time schedules, which puts other people into the role of enemies who try to disturb them. That is why they often distance themselves from others, isolate themselves or entrust various matters to others; to them, other people represent a tool for achieving their targets. The group of people who represent this generation often achieve better results than people from the first generation. For the second generation, the most important thing is whether things are happening according to their goals, diary, calendar or schedule. In this generation, the individual tasks are connected to a timeline. Second-generation people think not only about what they are going to do but also about when. Most people use this model; they focus on what they write into the calendar, what they have to do and when [29, 30].

3.2.3 Third Generation

The third generation of time management focuses on "planning, controlling and prioritizing". This generation was created as a solution to the weaknesses of the first and second generations, which cannot handle larger projects, teamwork or the prioritization of individual activities. Because it is logical and well developed, it is also well received and understood. People using third-generation tools clearly set their priorities and values. They often ask themselves, "What do I really want?" This generation is characterized by short-term, medium-term and long-term goals, through which people realize their set priorities and values. Each day it is decided which activities are going to be preferred. The third generation specifies priorities and goals and deals with teamwork and delegation. In this generation, people use a variety of tools, including printed and electronic ones, that offer detailed day planning. The most important benefit of this generation is the ability to connect values with plans and goals. People using third-generation tools are characterized by higher personal productivity, which is based on setting priorities and planning the day ahead. What is most important for them is determined by their goals and values. For some people this generation is the ideal one, and for people using first- or second-generation tools it is the goal they would like to achieve. By clarifying the individual contexts and goals, it specifies in more depth what should be done, specifies when, and examines more closely how things should be done. The main shortcoming of this generation is that people become convinced that they, rather than principles and natural laws, rule things. The established values may not always be in accordance with the governing principles. It may happen that time schedules are placed above people and people are perceived only as things. This generation is well crafted to the point where it may seem inhuman; it can cause excessive programming, guilt and role imbalances. It is very difficult to follow daily, weekly or monthly schedules, which can devalue expensive tools and reduce them to a form of diary, as in the second generation. It offers less spontaneity or flexibility. Skills and abilities are not sufficient for efficiency and leadership; character is necessary. The third generation cannot use the power of vision: the main focus is on today's goals and on current and urgent issues, so prevention and conception are neglected and creativity disappears [29, 30].

3.2.4 Fourth Generation

The fourth generation of time management evolved from the previous generations; its main task is to preserve the strengths of the first, second and third generations while overcoming and resolving their weaknesses. Many people want to follow the principles of the fourth generation because they would prefer people over time schedules. Third-generation tools such as schedules, calendars and other planning tools force us to focus on urgent matters and at the same time limit spontaneity and flexibility; we feel guilty if not all tasks are completed or if we do not follow the schedule. The first three generations support performance, productivity, achieving goals and setting priorities, but over time this is no longer enough. People no longer want or need only to do things faster; they need to do things in the right way. Hence the demand for a new generation, the fourth generation, combining practical tools and theoretical knowledge that support the use of innate qualities to meet basic needs [29].


According to Pacovský, the fourth generation does not focus only on what and how things should be planned but penetrates much deeper into our lives [30]. Therefore, this generation represents a new lifestyle, the essence of which consists of five principles:
• man is more than time,
• the path is more important than the destination,
• there is more on the inside than on the outside,
• slow is more than fast,
• the whole is more than a part.

Man is more than time

The first principle, "Man is more than time", is defined as the harmony between planning, adhering to time schedules and taking care of one's own satisfaction, well-being and fitness. When people are balanced, motivated and satisfied, they are much more efficient. Attention shifts from focusing on time and specific tasks to perceiving our lives as a whole, that is, to relationships, satisfaction, fitness and efficiency. This path is not easy and takes longer, but it increases efficiency and, at the same time, our satisfaction. There are four groups of people. The first group consists of most people, who, although efficient, feel stressed. To the second group belong people who are effective and have more positive emotions. The third group has mostly stressful feelings and is ineffective; they feel guilty but still want to work on themselves. The last group of people is ineffective yet has positive emotions, but its members are blinded and feel no need for change. In the fourth generation, achieving the desired performance is just as important as improving the human factor.

The path is more important than the destination

The third generation forces us to focus on achieving certain results, but this focus on the goal can neglect the value of the days that should lead us to the goal. Each day has 24 essential hours and we should devote ourselves to every one of them, as a goal is met on a single day but it can take several years to reach it. It is better to be than to have, which in fact means that many of us want to have an education, money or a career first, and only then do we strive to become who we would like to be. To achieve this, we must meet two assumptions: internal and external orientation. Inner orientation prefers experience to goals; external orientation prioritizes results over activities.

There is more on the inside than on the outside

Success cannot be achieved by skills, techniques or knowledge alone; the character and the behaviour that flows from it are also needed. Character and behaviour form personal quality. It is not enough just to know how we should behave under stress or in other emotional situations; it is our character and the actions associated with it that should take the lead in these complex situations. Our habits, on the basis of which we make decisions, can only be changed by our own growth. We must first start with ourselves and change our inner selves so that we decide for ourselves instead of being controlled by others. If we change ourselves, then we can focus on our surroundings. Happiness is within us; it does not come from other people.

Slow is better than fast

If a problem occurs, in most cases we will find a solution, but over time it will turn out that nothing has changed at all. The most common reason is that we want to solve the problem as quickly as possible; if the solution lies inside us, speed and forceful starting points will never work. If people work on themselves, the solution will come more easily. We have to realize that everything is constantly moving, nothing remains the same, and we cannot change that. If we focus our attention on ourselves, there are only two ways out: we will either improve or get worse.

The whole is more than a part

There are moments during our lives that we do not want to end, wishing time would no longer pass. The solution is not to do nothing in order to preserve the moment, as that only gradually worsens the condition. There are several important aspects in our lives (relationships, employment and fitness) that need to be gradually and permanently supported. Many people focus only on their work, but private life also needs to be developed [30].

3.3 Time Management and Procrastination

Wolters, Won and Hussain examined whether traditional and active procrastination can be understood through time management within a self-regulatory learning model [31]. In the first step of the regression analysis, they determined that metacognitive strategies and self-efficacy were important indicators of active and traditional procrastination, but in the second step, after including time management in the analyses, they found that time management was a very important predictor of both active and traditional procrastination. In their time management model, they identified three aspects of time management: setting priorities and goals, prioritizing them, and their organization.

3.4 Time Management Methods

Time management methods are well known for the time-efficiency benefits they bring to all areas of daily life.

3.4.1 ABC Method

The ABC method evaluates tasks according to their importance and divides them into three groups. The most important (A) tasks make up 15% of all tasks, and their importance for achieving a goal is 65%. The group of important (B) tasks makes up 20%, and their contribution to achieving the goal is 20%. Less important (C) tasks are 65% of the total number of tasks, and their importance equals 15%. Using this method, you can define the purpose of the analysis, specify the object to be analysed, define the actions to follow from the analysis results and select an analysis parameter [27].
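As a purely illustrative sketch (not part of the original method description), the following Python snippet ranks a hypothetical task list by an assumed importance score and splits it into A, B and C groups using the 15/20/65% shares described above; the task names and scores are invented for the example.

# Illustrative ABC split: tasks are ranked by an assumed importance score and
# divided into groups A (top ~15 %), B (next ~20 %) and C (remaining ~65 %).
def abc_split(tasks):
    """tasks: list of (name, importance) tuples; returns a dict of groups A/B/C."""
    ranked = sorted(tasks, key=lambda t: t[1], reverse=True)
    n = len(ranked)
    a_end = max(1, round(0.15 * n))           # ~15 % of tasks -> group A
    b_end = a_end + max(1, round(0.20 * n))   # next ~20 % -> group B
    return {"A": ranked[:a_end], "B": ranked[a_end:b_end], "C": ranked[b_end:]}

if __name__ == "__main__":
    sample = [("strategy draft", 10), ("prepare client offer", 9), ("weekly report", 7),
              ("sort e-mail", 3), ("book meeting room", 2), ("archive files", 1)]
    for group, items in abc_split(sample).items():
        print(group, [name for name, _ in items])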

3.4.2 Pareto Method

The Pareto method (80/20) goes back to the Italian economist Vilfredo Pareto, who found in 1897 that 80% of all wealth was held in the hands of 20% of the richest people. The Pareto method has found application in other areas as well. The method can be found under many names: Pareto principle, 80/20 rule, Pareto law, imbalance principle or principle of least effort. The 80/20 principle states that some things in a population are more important than others; 80% of results come from 20% of causes, or 20% of causes produce 80% of outputs. The 80/20 ratio is not always fixed: sometimes the principle is rather 70/30, i.e. 70% of the results are based on 30% of the causes, and only rarely is the ratio 50/50 [32]. Pareto's principle can also be used in time management, where it states that 80% of the tasks can be solved in 20% of the time spent, while the remaining 20% of the tasks consume 80% of the time [27].
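The 80/20 idea can be illustrated with a small, hypothetical calculation; the sketch below (our own example, not taken from the cited sources) sorts tasks by their estimated contribution and reports the smallest set of tasks that covers roughly 80% of the total value.

# Illustrative 80/20 analysis: find the smallest set of tasks that covers
# about 80 % of the total estimated value (data are invented for the example).
def pareto_core(tasks, threshold=0.8):
    total = sum(value for _, value in tasks)
    core, cumulative = [], 0.0
    for name, value in sorted(tasks, key=lambda t: t[1], reverse=True):
        core.append(name)
        cumulative += value
        if cumulative >= threshold * total:
            break
    return core

if __name__ == "__main__":
    sample = [("key account calls", 40), ("product demo", 25), ("invoice chasing", 15),
              ("newsletter", 8), ("tidy CRM notes", 7), ("misc admin", 5)]
    print(pareto_core(sample))  # usually only a small fraction of all tasks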

3.4.3 Mind Mapping

According to Buzan, the mind map is a visual tool that serves to represent complete ideas and contributes to improving memory, learning, creativity and the organization of thoughts [23]. The first step of a mind map is to draw the main object of our attention in the centre of the paper. Branches then emerge from this central object. First, the main themes that are closely related to the central shape are drawn; these main branches are further divided into more distant topics. A keyword or image is added to each branch to specify it. The more creative the mind maps, the more effective they are. Our brain perceives images, 3D images and colours very well. Everyone can customize the mind map, and special codes can be appended between branches. Mind maps have unlimited use in our lives.

3.4.4 GTD Method (Getting Things Done)

The GTD or Getting Things Done method was created by Allen [33]. The method is based on the principle of creating an external system of different types of lists that serve as reminders; tasks should be sorted according to the context in which they will be easiest to perform. If there is more than one task to complete, these tasks can be combined into one group and performed according to their importance; at the same time, the brain's capacity stays concentrated on completing the current task. The first step is to collect all the activities that need to be done into predetermined "inboxes". An inbox can be an electronic device that records notes, a voice recording tool, e-mail, a paper tray, or notes on a mobile phone or tablet. The second step is processing the contents of all inboxes: for each task, its implementation is considered. The third step is organizing the processed items. The fourth step is reflection, i.e. assessing whether we are really doing what we should be doing and carrying out a weekly review. The last step is doing: it is important to make the right decision about what we are doing at any given time.
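A minimal sketch of the capture-process-organize flow described above is given below; it is our own simplified illustration, not Allen's specification, and the context labels and notes are hypothetical.

# Simplified GTD-style flow: capture items into an inbox, then process each one
# onto an organized context list (illustration only).
from collections import defaultdict

inbox = []                 # step 1: capture everything here
lists = defaultdict(list)  # step 3: organized reminders, grouped by context

def capture(text):
    inbox.append(text)

def process(decide_context):
    # step 2: decide what each captured item means and where it belongs
    while inbox:
        item = inbox.pop(0)
        lists[decide_context(item)].append(item)

if __name__ == "__main__":
    for note in ["reply to supplier offer", "call Anna about budget", "buy flip-chart paper"]:
        capture(note)
    # a trivial stand-in for the human decision made while processing
    process(lambda item: "@calls" if "call" in item else "@anywhere")
    for context, items in lists.items():  # steps 4-5: review the lists, then do
        print(context, items)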

3.4.5 Pomodoro

The author of the Pomodoro method is Francesco Cirillo, who during his university studies explored how to improve his study process. He wanted to concentrate better and increase his motivation to study when there were many distracting elements all around. He asked himself the simple question of whether he could really study for just ten minutes, but he needed a timer; in his kitchen he found a timer shaped like a tomato ("pomodoro" in Italian). The Pomodoro technique offers a tool that increases the productivity of one person or a whole team [34]. It is based on three basic principles: it offers a different way of perceiving time; it makes better use of our mind, which allows for greater clarity of thought; and the use of simple, easy-to-use tools reduces the complexity of applying the technique, allows us to focus on the activities we want to carry out and promotes continuity. One Pomodoro cycle lasts 30 min, with 25 min reserved for work and 5 min for a break. Each morning, the activities to be performed that day are selected from the list of activities and entered in the "To Do" list. The timer is set to 25 min and the first activity on the "To Do" list is started. A Pomodoro should not be interrupted: it is 25 min of clean working time, and it cannot be divided in half or into other parts. After four Pomodoros the activity is interrupted and followed by a longer break lasting 15 to 30 min. When a task is done, it is crossed off the "To Do" list.
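The cycle described above can be expressed as a very small timer script; the following is a sketch of the 25/5 rhythm with a longer break after every fourth Pomodoro (the 20 min long break is an assumption within the stated 15-30 min range, and the sleeps run in real time).

# Minimal Pomodoro timer sketch: 25 min work, 5 min break,
# and a longer break after every fourth Pomodoro.
import time

WORK, SHORT_BREAK, LONG_BREAK = 25 * 60, 5 * 60, 20 * 60  # durations in seconds

def pomodoro(task, cycles=4):
    for i in range(1, cycles + 1):
        print(f"Pomodoro {i}: work on '{task}' for 25 min")
        time.sleep(WORK)            # uninterrupted working time
        if i % 4 == 0:
            print("Long break (15-30 min)")
            time.sleep(LONG_BREAK)
        else:
            print("Short break (5 min)")
            time.sleep(SHORT_BREAK)

if __name__ == "__main__":
    pomodoro("write project report")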

3.4.6 Eisenhower Matrix

The author of this matrix was the former US President Dwight D. Eisenhower. The Eisenhower Matrix, also known as the Urgent/Important Matrix or Priority Matrix, is a tool that helps in making long-term, medium-term and short-term strategic forecasts, as well as in setting priorities. The matrix consists of four quadrants: the vertical axis shows the degree of importance and the horizontal axis represents the degree of urgency. The first quadrant consists of highly important tasks with high urgency that cannot be postponed; tomorrow may be too late, so these tasks must be performed today. The second quadrant contains highly important tasks with low urgency. The third quadrant includes tasks that are less important but highly urgent. In the fourth quadrant there are tasks that are less important and less urgent, the so-called cannibals of time; activities such as reading magazines, browsing the Internet, shopping and computer games can be included in this group [27, 35].
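The four quadrants can be expressed as a simple classification rule; the sketch below is our own illustration, and the suggested actions for quadrants two and three ("schedule", "delegate") are a common reading of the matrix rather than wording taken from the chapter.

# Illustrative Eisenhower classification of hypothetical tasks
# based on importance and urgency flags.
def quadrant(important, urgent):
    if important and urgent:
        return "Q1: do today"              # highly important and highly urgent
    if important:
        return "Q2: schedule"              # important, but not urgent
    if urgent:
        return "Q3: delegate"              # urgent, but less important
    return "Q4: drop (time cannibal)"      # neither important nor urgent

if __name__ == "__main__":
    tasks = [("fix production outage", True, True),
             ("plan next quarter", True, False),
             ("answer routine e-mail", False, True),
             ("browse social media", False, False)]
    for name, imp, urg in tasks:
        print(f"{name}: {quadrant(imp, urg)}")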

3.4.7 To-Do Today

According to Ludwig, lists only emphasize what needs to be done, their priorities and how the individual tasks follow each other in time [1]. People tend to overload lists, which creates distaste for them and consequently leads to procrastination. Long lists can cause decision paralysis; eventually the lists are put away somewhere, or people prefer not to look at them and stop using them. The most developed part of our brain is the visual cortex, which is why Ludwig invented the To-do today technique. This method does not use any lists; instead, the tasks are represented by a mind map that displays the important information, and this representation is the most natural one for our brain. The To-do today method contains ten principles: deploy tasks, name tasks precisely, divide large and merge small tasks, colour-code priorities, determine the path across the day, estimate times, focus on just one thing, learn to end the task, regenerate the cognitive resource, and make to-do today a habit. First, write down the tasks you want to do during the day in random order. If you name the tasks accurately, it is easier to imagine them and the resistance to fulfilling them is reduced. Each task should take between 30 and 60 min to complete: divide more complex tasks into smaller activities and, on the contrary, combine shorter tasks into one activity. Distinguish the priorities of tasks in colour: circle the tasks with the highest priority in red, those with medium priority in blue, and highlight the tasks with the lowest priority in green. Connect the tasks with arrows so that you can see which tasks you will perform first and which will follow. Plan your path through the day so that it is convenient for you: at the beginning of the day, start with priority tasks, intersperse them with less demanding tasks, and alternate creative activities with systematic activities. Try to assign an exact time to each task, from when to when you will work on it. When you start working on a task, focus only on its fulfilment. Once you have completed the task, black out its box to close the activity. Define short breaks for rest between tasks. Try to prepare your to-do today always for the next day; you will sleep better.

3.4.8 Timeboxing

In the article by Jalote, Palit and Kurien, the basic unit of the Timeboxing model is the time box (window), which has a fixed duration [36]. Each time box contains the activities that will be performed in it. Because the duration is fixed, the requirements selected to be built into a time box are those that fit within it. Time boxes are divided into a series of phases: the output of one phase is the only input to the next phase, the duration of each phase is approximately the same, and each phase has its own specialized team.
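As a simple illustration of the fixed-duration idea (only requirements that fit the time box are selected), the sketch below greedily fills a hypothetical time box; it does not model the phased, specialized-team pipeline described by Jalote, Palit and Kurien, and the backlog data are invented.

# Illustrative time-box filling: only items whose estimates fit into the
# remaining fixed capacity are selected (hypothetical backlog data).
def fill_timebox(items, capacity_hours):
    selected, remaining = [], capacity_hours
    for name, estimate in sorted(items, key=lambda it: it[1]):
        if estimate <= remaining:
            selected.append(name)
            remaining -= estimate
    return selected

if __name__ == "__main__":
    backlog = [("draft agenda", 2), ("venue shortlist", 5), ("budget sheet", 3), ("vendor calls", 8)]
    print(fill_timebox(backlog, capacity_hours=10))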

3.5 Ten Common Mistakes in Time Management

There are ten common mistakes in time management that prevent people from managing their time effectively [37].

3.5.1 Failure to Complete the To-Do List

The first mistake is not following the to-do list. People often forget to do an important task because they do not have an overview of their responsibilities. For effective use of the to-do list, it is important to distinguish the urgency of each task. An A–F coding system can be used, where A is a high-priority item and F is a low-priority item; alternatively, a simplified A–D system or a numbering system can be used. If there are large projects on the list, there is a danger that they may become vague or ineffective: insufficiently specified items can lead to postponement or to essential steps being forgotten. It is therefore appropriate to divide large projects into concrete steps.

3.5.2 Not Setting Personal Goals

Setting one's own goals is key to proper time management, as goals provide a vision of where we want to go. When one knows where one wants to go, one can set the priorities, resources and time needed to get there. Goals also make it possible to find out what is merely a distraction and what is essential. It is recommended that the goals set be SMART.

3.5.3 Not Setting Priorities

Sometimes it is not easy to set priorities, especially when people are overwhelmed with tasks they think are urgent. However, it is crucial to know how to set priorities if one wants to be effective and to manage one's time. The action priority matrix is one of the tools used to prioritize effectively. The matrix determines which tasks have high profitability and high priority and which have low value. If people know which tasks have high profitability and priority and which do not, their time during the day can be organized better.

3.5.4 Improper Distraction Management

A lot of people waste time by being distracted unnecessarily. They are often distracted by various chats, e-mails, or calls from clients or from colleagues in crisis, for example. All these distractions prevent the flow state from being reached. Flow is 100% immersion in a certain activity that brings us joy. Individuals must know how to reduce and minimize distractions so that they can control their time and do their job as well as they can. They should turn off the chat or tell others that they need to concentrate now, and they should also learn how to regain concentration after being distracted.

3.5.5 Take on Too Much on One's Plate

People often take on too many projects and cannot say "no" to others, which can cause stress and lower performance and morale. Another problem arises if the individual is a micro-manager, i.e. someone who prefers to do all the work himself/herself because he/she does not trust anyone else to do the job properly. For an individual who takes on too many things at once, it is a misuse of time and can earn him/her a reputation as a careless person. The solution is to learn to say "yes" to yourself and "no" to tasks. If people learn this, it will help them succeed while maintaining good relationships in the team.

3.5.6 Dependence on the Status "Busy"

Accurately meeting deadlines, stacking up documents to process, or frantically running to a meeting are activities that deliver a certain dose of adrenaline. However, dependence on the state of "being busy" does not always mean that a person is productive; on the contrary, it can cause stress. One should slow down and learn how to manage one's time more efficiently.

3.5.7 Multitasking

Some people chat with clients and write e-mails at the same time, thinking that working on multiple tasks will save time. However, the opposite is true: if people work on multiple tasks at once, it takes 20–40% longer to complete them than if they performed the same number of tasks sequentially. In addition, multitasking increases the likelihood of errors. The quality of work will be higher if people do not do several things at the same time and focus on only one thing.

3.5.8 No Breaks

It is impossible for the human brain to concentrate and do quality work without having time to rest, so it is not a good idea to work 8–10 h without a break, especially not right before a deadline. One should not perceive breaks as a waste of time, because thanks to breaks one can think creatively and be effective. If individuals have difficulty interrupting their work, they can schedule breaks or set reminders for them. During breaks it is advisable to go for a short walk, meditate or have a cup of coffee; it is recommended to take a five-minute break every hour. Not even lunch should be rushed, because a hungry person is less productive.

3.5.9 Inefficient Task Planning

Everyone has a different preference for activity during the day, i.e. a time when the individual is most productive. Some are early birds; others prefer working at night. Everyone should plan high-value work for the time when their productivity is at its peak. Conversely, low-energy work, such as checking e-mails, should be done at a time when productivity is declining. If people overcome these mistakes, they will be more efficient and their stress will also be reduced [37].

3.6 Free-Time Management

Leisure time appears in much of the literature and its importance is still growing, yet research on leisure time remains insufficient. The study by Klerk and Bevan-Dye aimed to find out how Generation Y students use their free time [38]. The research revealed four factors. The first factor, setting goals and evaluation, included: creating a list of things to do in free time, setting goals for free time, setting priorities for free time, using free time constructively, remembering things that were planned for free time and evaluating the use of leisure time. The second factor, values, included the items: free time makes sense, free time is pleasant and free time is important. The third factor, immediate response, included the items: having plans for using free time, having alternative plans and adjusting the ways of using free time. The last factor, technique, included the items: collecting related information about spending free time, organizing leisure activities, organizing free time daily or weekly, avoiding interruptions during free time and specifically assigning free time.


4 Project and Time Management Software

Event planning and management is an intricate and time-sensitive process with many moving parts. It includes logistics such as budgeting, establishing timelines, selecting and reserving event sites, and acquiring permits and equipment. Event planning can also include many creative aspects such as choosing a theme, hiring suitable entertainment, and composing a menu. It is important to stay organized and track everything that needs to be done and everything that has already been done.

4.1 Monday.com

Using monday.com can help a solo planner or even a large event planning company to plan any kind of event. This project management software is a powerful program that can help a company keep track of its workflow and manage projects, as shown in Fig. 1. For event planners juggling several projects at once, it is really important to be able to concentrate everything on their plate into one central location. Monday.com helps achieve this with a high-level board that brings all running projects together in one place. To keep track of all projects, a central high-level board can be used, where each item on the board is a different event and is categorized into groups according to the type of event, as shown below in Fig. 2. This board is form-powered, meaning that items can be added to the board by simply filling out a form (see Fig. 3). Users fill out the form with their event's name, month and exact date, event type, number of invitees, and contact name, phone number and email address. All of these details are transformed into cells in the board's columns. Every time a new item is created through the form, the manager is notified through an automation added to the board. It is possible to customize the automation recipe to send a unique message based on the values in the new item's cells.

Fig. 1 Logo of Monday.com
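To make the structure of such a board concrete, the following sketch models it as a plain data structure; this is a hypothetical illustration of the described workflow, not the monday.com API, and all names and values are invented.

# Hypothetical model of the high-level event board described above:
# groups by event type, items created from a form, and a notification for each new item.
from dataclasses import dataclass, field

@dataclass
class EventItem:
    name: str
    month: str
    date: str
    invitees: int
    contact: str

@dataclass
class Board:
    groups: dict = field(default_factory=dict)  # event type -> list of items

    def add_from_form(self, event_type, item, notify):
        # mimics the form-powered flow: the item lands in its group and the manager is notified
        self.groups.setdefault(event_type, []).append(item)
        notify(f"New {event_type} event: {item.name} on {item.date}")

if __name__ == "__main__":
    board = Board()
    board.add_from_form("Birthday",
                        EventItem("Anna's 30th", "November", "2021-11-20", 45,
                                  "Anna, +421 900 000 000, anna@example.com"),
                        notify=print)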


Fig. 2 Appearance of the planner at Monday.com

Fig. 3 Form of creating an event




Fig. 4 Events displayed by dates and events

One of the important columns on the board is the Status Column, in which the month of the event is selected. It helps to visualize the planning timeline: a Kanban View displays all of the upcoming events as Kanban cards according to the month they will take place in (see Fig. 4).

4.1.1 Plan by Category

The plan-by-category feature is helpful because, while the high-level board concentrates all incoming and running projects, something is still needed to manage each project in its entirety. This is the "project board", and one is used for each event. Suppose a birthday party is coming up in November and the organizer wants to make sure that every aspect is covered. The project board organizes all tasks into groups according to vendor or category. This way, the organizer can easily see the tasks that need to be done regarding food, logistics, entertainment, venue, etc., as shown in Fig. 5. For each of the tasks on the board, columns are used to keep track of the task's status and timeline. Since a lot of event planning requires market research to select the best or most cost-efficient option, a "Decision" Status Column has been added, set to deadline mode together with the decision deadline and the planner responsible for deciding or for helping the client to decide. A Files Column is used to add visual "inspiration" for the project. Here the organizing company can add any images the client sends to help them plan the event the way the client envisions it, or photographs shared with the client to keep them in the loop. All of these images can be viewed at once using the Files View shown below in Fig. 6.

4.1.2 Plan by Timeline

Planning by timeline can be used for each of the upcoming events, for which a low-level project board with a different structure can be created. For a special project, tasks can be organized by the month in which they should be tackled. This way, each group on the board represents a month, and all of the items in each group are the things


Fig. 5 Project planning assign to participants

Fig. 6 Sharing pictures to the project group

that need to be accomplished within that month. To categorize tasks further, a Status Column can be used to label each item with the vendor it is associated with. For example, a special project may involve a lot of outsourcing, so it is useful to keep track of everything by adding a Status Column and a Timeline Column to track the status of each task, together with the contact name, phone number, and email address of the external vendor to be in touch with to get the task ticked off the list, as shown in Fig. 7. To push time-management efficiency even further with this project, we have added the Timeline View to our board. We set it up so that all of our tasks are grouped by the People Column, so that we can clearly see the division of tasks between our team members and what everyone has coming up. We have also chosen to colour all


Fig. 7 Appearance of timeline planning

of our tasks according to their status. This gives us a visual understanding of how our planning progress is keeping up with the planned timeline, see Fig. 8. Adding descriptive charts helps in understanding the data in the boards. A Chart View added to this board gives a breakdown of how many tasks are labelled with each status, making it possible to visualize the breakdown of progress. In this project, 18% of the tasks have already been completed at this planning stage, as can be seen in Fig. 9.

Fig. 8 Timeline view of the team’s activities


Fig. 9 Chart of team’s activity status

4.1.3 Use a Template

If a company plans a lot of specialized events with different timelines, templates can be used. Many events follow a similar planning process in terms of timeline and specific tasks. To save time and to bring structure and routine to the planning process, a specific planning template can be created, as shown in Fig. 10.

Fig. 10 Elevated events template


A template can be created by following the simple steps outlined in this article, so the company can use it every time it begins planning an event with a similar structure. It can then be customized to make it more specific and to add all of the important details and columns if needed.

4.1.4 Take Control of Little Tasks

While a high-level board gives an overview of all projects and a low-level board is a great way to plan one project from start to finish, it is also really helpful to create a separate board to manage smaller upcoming tasks. A "Daily To-Dos" board can be used to concentrate all of the tasks that have to be accomplished in the upcoming week. In Fig. 11 a group was created for each day of the week, to see clearly which tasks have to be done each day. The board can be customised by adding:
• a People Column to assign tasks
• a Dropdown Column to tag which event the task is for
• a Location Column to record where the meeting or event will take place
• a Status Column to track the task's status
• a Numbers Column to understand how much of a time commitment the task is
• a Rating Column to rank the priority of tasks, so the most crucial ones can be tackled first.

Just as we discussed in the previous section, we can create a template with this board structure, so that we can create a new board with the same groups and columns at the start of each week, and fill in the relevant details. We can also duplicate the structure of our board to create a blank board identical to this one! We just have to click the 3-dot icon, select “More actions” and then “Duplicate board” shown in Fig. 12. Then, we’ll be able to select options from the drop-down menu. If we select “Structure only”, we’ll duplicate the groups and columns on our board, but not the

Fig. 11 Daily To-Dos window appearance


Fig. 12 Picture of the action of duplicating a board

Fig. 13 Picture of the action of duplicating a board

items or cells. We can even check the box to keep the subscribers to our board, meaning that all of our team members will be automatically subscribed to the new weekly task board, see Fig. 13.

4.1.5 Integrate Your Favourite Platforms

Integrations allow you to seamlessly connect your favourite external platforms to your monday.com account. You can connect your Eventbrite account to your boards with several customizable recipes. Use it to keep track of your event registration, see planned versus actual, and track the success of the events you organize. You


will be able to dive into each event individually, automatically create items in your monday.com account, and sync all future changes from Eventbrite.

4.1.6 Add a Dashboard

Creating a dashboard gives the team a great visual overview of all running projects, loaded with apps and widgets:
• The Timeline Widget shows the time span of our projects in a comprehensive calendar. The items are colour-coded according to their status, so we know where things stand at a glance, as shown in Fig. 14.
• The To-do List Widget allows us to add tasks to a checklist and tick them off as we complete them.
• The Battery Widget creates a battery-style chart, giving us an overview of how many tasks are completed and how many are in progress.
• The Overview Widget shows us a progress bar for each of the running projects, displayed in Fig. 15.
• The Workload View shows us how tasks and projects are distributed amongst the planners in the company, so that we can see if anyone is overworked or has some extra time on their hands.
• The Chart Widget generates charts based on the data in our boards. The first chart (on the left) shows the breakdown of the Status Column in our boards, giving the big picture of where everything on the entire board stands. The second chart (on the right) shows how our upcoming tasks are divided between the projects we are working on, as shown below in Fig. 16.

Fig. 14 Calendar with timelines


Fig. 15 View of the battery widget

Fig. 16 Charts of status columns and upcoming tasks

These boards are intended to be a starting point to help you translate your event planning workflow into a monday.com workflow. You can use these exact boards in your workflow if they’re right for you, but don’t be afraid to experiment and test out all of the amazing features on monday.com. Make sure to check out other prepared templates and explore the Columns Centre to see all of the ways you can use columns as building blocks to customize your own one-of-a-kind board. You can make your own automations and enable integrations to really connect the dots in your event planning workflow [39–41].


4.2 Asana.com

Staying on top of everything is not easy, especially when you have to manage your team's deadlines in addition to your own. Luckily, Asana can make it a lot easier to manage workflows. Like many companies, you can use Asana to manage your teamwork, even for a cultural event. Yet there are always teams and individuals who are new to the tool. If you need a quick tour of how to benefit from its perks and features, you have come to the right place. Asana's table of contents: (1) Tasks and projects (Tasks, Projects), (2) Team basics, (3) Colour coding projects, (4) Managing project access and notification settings (Adding team members), (5) Tracking tasks using progress view, (6) Using sections to organize tasks, (7) Using Asana-created templates to add new workflows, (8) Navigating calendars, (9) Using team conversations, (10) Managing user permissions (Task permissions, Project permissions, Team permissions), (11) Using workspaces (Creating a workspace), displayed in Fig. 17.

4.2.1 Tasks and Projects

The basic units of action in Asana are tasks. You can create tasks by pressing the "Add Task" button at the top of the main pane, displayed in Fig. 18. You can also use Quick Add and click the "+" button in the top bar, shown in Fig. 19.

Fig. 17 Asana logo

Fig. 18 Adding new task in Asana


Fig. 19 List of tasks

On the top bar, you can assign the task to a team member, set a due date, add a like, add tags, add subtasks, attach a file or perform several other actions, see Fig. 20. Below the top bar, you can view the task's title, add a project description, and add a series of subtasks. In the bottom pane, you can add comments and view the task's followers. Projects consist of a series of tasks. They are basically a larger goal, whereas a task is an action someone on your team needs to take in order to achieve that goal. There are two ways you can create a project: click the "+" button in the top bar and select "Project", or click the "+" button beside "PROJECTS" in the sidebar, as shown below in Figs. 21 and 22.

Fig. 20 Step of planning and assigning tasks


Fig. 21 Creating a project

Fig. 22 Detailed view on creating and planning project


Fig. 23 Creating a team using the + button on the top pane

You can then add your project name and description, as well as set the project’s privacy to private, public, or to a specific team. Once you’ve filled out the necessary fields, select the “Create Project” button.
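Tasks and projects can also be created programmatically through Asana's public REST API, which can be handy for bulk-loading a recurring event plan. The Python sketch below is only an illustration: the personal access token and project gid are placeholders, and any fields beyond the basic ones shown should be checked against the current API documentation.

import requests

ASANA_TOKEN = "YOUR_ASANA_PERSONAL_ACCESS_TOKEN"  # placeholder
PROJECT_GID = "1201234567890"                     # hypothetical project id

# Create one task inside an existing project, with notes and a due date.
payload = {
    "data": {
        "name": "Book the venue",
        "notes": "Compare at least three offers before deciding.",
        "projects": [PROJECT_GID],
        "due_on": "2021-11-14",
    }
}

resp = requests.post(
    "https://app.asana.com/api/1.0/tasks",
    headers={"Authorization": f"Bearer {ASANA_TOKEN}"},
    json=payload,
)
print(resp.json()["data"]["gid"])  # gid of the newly created task

Calling the same endpoint in a loop over a list of task names would reproduce a template-like plan such as a meeting agenda without clicking through the interface.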

4.2.2 Team Basics

A team is a group of people who work together on one or several projects. To create a team, you can hover over the left pane and click the "+" button. You can also click the "+" button on the top pane. Then, type the email address of the team member you wish to add, see Fig. 23. You can invite both members and guests to view projects and tasks. Members are your co-workers. Guests, on the other hand, have limited access: they can view projects and tasks only if these are made public or are shared with them.

4.2.3 Colour Coding Projects

You can easily track projects by colour coding them. Simply click the drop-down menu, hover over “Set Highlight Colour”, and select “Set for Everyone” as shown in Fig. 24.

4.2.4 Managing Project Access and Notification Settings

Adding team members to a project is very easy to do. Simply select the "+" button at the upper rightmost side of the top pane. Next, type the email address of the person you want to invite into the text box labelled "Who has access", displayed below in Fig. 25. At the bottom, you can select "Manage Member Notifications" to choose notification settings for status updates, conversations, and task creations.


Fig. 24 Setting colour to project

Fig. 25 Sharing access to a project


4.2.5 Using Progress View to Track Tasks

In the project progress view, you can track task completion in a project over time and get status updates from Project Owners. To find the progress view, hover over the progress tab, shown in Fig. 26. If you are a project owner, you can update the status of your project by choosing a colour or adding a narrative. There are three colours to choose from: green, yellow, and red. Green indicates that your project is on track. Yellow means that it is on track but there are some risks worth addressing. Red means the project is behind schedule. Below the colour labels, you can update the project's status. In the example below, the status is: "Due to the site redesign, some blog posts will be delayed". You may also opt to get status reminders every Friday. Once you are done, click the "Set Status" button, see Fig. 27. Use the project progress chart to see a timeline of remaining and completed tasks. You can use the gear icon to add or omit sections of the project from the progress chart. This is useful for tasks that will not affect the team's progress, as displayed in Fig. 28.

Fig. 26 Appearance of progress view

Fig. 27 Status update and reminder settings


Fig. 28 Progress chart

4.2.6 Using Sections to Organize Tasks

If you have too many tasks, you can organize them into sections. Sections can be used to create categories, workflow stages, priorities, and more. To convert a task into a section, add a colon (:) at the end of the task's name. You can also convert a section into a task by removing the colon (:) from the section's name, as shown in Fig. 29. Another option is to hover over the "Add Task" button, then select the "Add Section" button. To move a task to another section, simply drag and drop it below the section.

4.2.7 Using Asana-Created Templates to Add New Workflows

Instead of starting from scratch, you can use Asana-created templates to save time when adding tasks. To use an Asana-created template, create a project and hover over the "Templates" tab. Here, you will find templates for onboarding, meeting agendas, company goals and milestones, and event planning. To view a template, simply select


Fig. 29 Using sections to organize tasks

the "Preview" button. Once you have decided on a template, click the "Use Template" button, as shown in Figs. 30 and 31. Next, set a project name and specify the privacy settings. For this example, we have named the project "Content Strategy Meeting" and clicked "Create Project", shown below in Fig. 32. You can edit the tasks and sections in the template to suit your goals and objectives. The template comes with a "Read Me" task with instructions on how to use it.

4.2.8 Navigating Calendars

Calendars are a great way to keep track of your goals and to-dos. To access the calendar, select "Calendar" at the top of the main pane. Here, you can view the tasks you follow and those assigned to you across different teams and projects. Note that only tasks added to projects (not those in the My Tasks section) will appear in the team calendar. The tasks are colour coded, so you can see which ones belong to multiple projects. If you want to move a task's deadline to another date, simply drag and drop the task to your preferred date, as shown in Fig. 33. You can also view team members' tasks using the "Team Calendar" view. Team calendars are great for organizing your team and giving senior managers access to everyone's daily or weekly goals and objectives. The team calendar can be found in the left sidebar. You can click on tasks to view their details, add project descriptions, comment, or attach files. You can also double-click a due date in the calendar to create a new task.


Fig. 30 Asana-created templates

4.2.9 Using Team Conversations

You can start a conversation with your entire team or with those working on a specific project. To start a conversation, click “Team Conversations” on the sidebar. Enter the subject line, provide additional info in the body, @mention people that aren’t part


Fig. 31 Meeting agenda preview

of the project but whom you want to include in the conversation, and click "Post", see Fig. 34.

4.2.10 Managing User Permissions

If you want to create a private task that only you can view, do it in "My Tasks". There will be a message stating "This task is private to you". You can also add team members as followers so they can view the task as well. You can also create private projects and manage project permissions. Click "Create a project", and choose between the three privacy options at the bottom of the window, see Fig. 35.


Fig. 32 Selected template with instructions

Fig. 33 Calendar with assigned tasks

The "Public to Team" option makes the project viewable by everyone on the team. In the "Private to Project" option, only project members can view the project, and in the "Private Only to Me" option only you can see it. Likewise, you can also create private teams and manage team permissions. Click the "+" icon on the top pane and select "Teams", as shown in Fig. 36.


Fig. 34 Insight into project conversation

Fig. 35 New project’s privacy setting

At the bottom of the window, you will find three options: membership by request, hidden, and public to organization. In the "Membership by Request" option, anyone in the organization can find and search for the team; however, they need to request access. Meanwhile, in the "Hidden" option, no one can find the team unless they are


Fig. 36 Privacy settings through + button

invited. Lastly, in "Public to Organization" anyone can find the team and view public projects without requesting access, see Fig. 37. You can change team settings at any time by clicking the gear icon and selecting "Edit Team Settings".

Fig. 37 Extended privacy settings


4.2.11 Using Workspaces

A workspace is a group of people who collaborate on projects and tasks; its members are not required to share the same company email domain. If you sign up for Asana with a personal email address (for example, Gmail), you will automatically be placed in a workspace. Workspaces have two types of members: workspace members and limited access members. Workspace members have full access to all projects, tasks, and conversations. They can also rename workspaces, upgrade or downgrade to a premium plan, become the billing owner, invite or remove people, and convert people to members or limited access members. By contrast, limited access members cannot access or edit project information unless it is shared with them. To make someone a member or a limited access member, hover over their name in the workspace settings. Creating workspaces is very easy. Simply select your profile photo and click "My Profile Settings". Next, select the Account tab and click "Create a New Workspace" [42, 43].

4.3 Trello

Finding good project management software can be tougher than you might think, especially when you already have in mind the attributes your calendar or time-organizer platform should have. Trello is an online tool that provides an easy, flexible, and visually appealing way to manage your projects or personal tasks and to organize almost anything, and it is already used by millions of people from all over the world. This visual organizing software has collected around 4.6 million registered users, with one million monthly actives, who turn their projects, whether household to-do lists or corporate realignment plans, into visual boards using its cloud-based software, see below Fig. 38.

4.3.1 Creating a Board

Boards are the main feature of Trello, and each one that you create serves as the main organizational zone for individual projects, events, ideas or collaborations.

Fig. 38 Trello logo


Fig. 39 Creating board

For example, at home you may have a real-life board for weekly tasks like cleaning, a calendar for remembering an upcoming birthday and planning a party, or a board for your fitness plan. At work you might have a board for your main project and its respective tasks, and a board for ongoing events. A board in Trello is a collection of cards organized into lists. You can create a board by clicking the "+" button next to your name and selecting "Create Board". You will find this in the upper-right corner. You can choose either to set up your own boards or to create team boards where you can collaborate and share your tasks with work colleagues. Both options are easy to set up and give you a good overview of and transparency on tasks. You can enter a title for each board, which is how you will identify the boards in your account, see Fig. 39.

4.3.2 Filling Up a Board

Your board starts blank, without any lists, so you will need to create one. By clicking "Add a list", Trello adds an empty list to the board. Lists are the categories for your board, and entries on those lists are called "cards", which can be added later and moved between them, as shown in Fig. 40. For example, on a board for your home you can make a "To-Do" list and later add cards for everything that needs to be done, an "In-Progress" list for tasks that you are working on, and a "Done" list for tasks that you have already completed. The lists themselves can even be moved and rearranged by clicking on them and


Fig. 40 Creating a list

dragging them to the desired position, until you are satisfied with the structure of your board.

4.3.3 Using Cards

The next step, after the board itself and the lists that organize individual thoughts or plans have been created, is to add the details or phases for those lists. Cards are individual notes added to each list that should capture all the details needed to get a full idea of what has to be done and how. Each card can be a task, idea, recipe, or any other entry that matches your board or list, see Fig. 41 below. You can move cards between lists as necessary: for example, you can start adding cards to an 'Incoming' or 'To Do' list and, once you start working on them, move them at any point to an 'In Progress' list and finally to the 'Done' list. It is helpful to create a separate card for each task; you can then fill out its description with complete details, even attaching images or files for more context. Moreover, every card can be assigned a colour label (more about that in Sect. 4.3.6) or given a checklist, which simplifies the management of project tasks and the coordination between them. You can then assign due dates to each task and use the Calendar to get a bird's-eye view of your week or an overall look at the month's tasks, as shown in Fig. 42.


Fig. 41 Trello cards
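Cards can also be created through Trello's public REST API, which is convenient when tasks originate outside Trello (for example, from a booking form). The Python sketch below is illustrative only; the API key, token, and list id are placeholders, and any optional fields should be checked against the current API reference.

import requests

TRELLO_KEY = "YOUR_API_KEY"        # placeholder
TRELLO_TOKEN = "YOUR_API_TOKEN"    # placeholder
TODO_LIST_ID = "5f9a1b2c3d4e5f60"  # hypothetical id of the "To-Do" list

# Create one card on the "To-Do" list with a description and a due date.
params = {
    "key": TRELLO_KEY,
    "token": TRELLO_TOKEN,
    "idList": TODO_LIST_ID,
    "name": "Order the birthday cake",
    "desc": "Chocolate, 20 portions, pick up on Friday.",
    "due": "2021-11-12T17:00:00.000Z",
}

resp = requests.post("https://api.trello.com/1/cards", params=params)
print(resp.json()["id"])  # id of the newly created card

Moving the card later to the 'In Progress' or 'Done' list is then a matter of updating the list it belongs to, which mirrors the drag-and-drop workflow described above.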

4.3.4 Collaborating

Finally, when your board is filled with various lists and tasks assigned to those lists, you will want to add the team members responsible for these tasks to the cards. Trello allows you to invite as many people as you would like to join your board, so you can share chosen tasks with them. You can invite your family members for household boards or your co-workers for work boards. This can be done either by clicking the "Show Menu" link in the upper-right corner and then the "Add Members" button, or through the "Members" button on the right-hand menu while in any task window. Adding people to your board allows the chosen members to view and edit your Trello board. Members can be invited via e-mail. If the email address is already linked to an active Trello member, he or she will be added to the board immediately. If, on the other hand, the e-mail address is not associated with a Trello member, an invitation to join Trello will be sent. When the addressee signs up, they will immediately join the board you sent the invite for, shown below in Figs. 43 and 44.


Fig. 42 In detail filled out project card

Fig. 43 Adding members to projects


Fig. 44 Adding e-mail address to projects

4.3.5 Getting More from Trello

If you use Trello daily and work with repetitive administrative tasks, the "Butler" feature is very useful. Butler, Trello's obedient automation robot, can perform actions for you once the corresponding commands have been set up in advance. There are four types of Butler commands:
• Buttons, which run an action on a card or across an entire board in a single click.
• Rules, which are triggered instantly by a set action.
• Scheduled Commands, which are performed on specific days of the week, month, or year.
• Due Date Commands, which run in relation to approaching or past due dates.
Butler automatically reacts to your actions on the board, helping to minimize the number of clicks needed to perform different tasks, see Fig. 45. With the command shown there, two days before the card's due date Butler will add the 'red' (high-priority) label to the card and move it to the top of the 'Up Next' list.


Fig. 45 Creating a due date command

4.3.6 Visual Appeal

If you use an organization tool every day, it is also desirable for it to look clear and enjoyable. Trello's colour coding features help you gain visibility over your work and its respective parts. The colourful visual division of tasks is a perfect way to establish the level of priority of tasks or to divide them into groups of your choosing. For this purpose, you can use Labels to visualize the groups or priorities within your Trello board. To create labels for your board, click into any card and select 'Labels'. There you can choose which colours to pair with which group and edit the name of each one. To assign a label to a card, click the "Labels" button and add a label of your choosing to the card. You will find this button on the back of the card, in the "Add" section. A new menu will appear, allowing you to select from several different colours. Afterwards you can give the label a name, which will appear over the selected colour, shown below in Figs. 46 and 47. It is also important to note that a card can have multiple labels if needed, for example 'Low priority' and 'Work-related'. If having every task displayed on the board gets too overwhelming and chaotic, there is an option to filter your board so that only cards with a specific label are displayed. The overall design of Trello also helps with project management: the dashboard clearly outlines all your boards, the lists are clearly defined side by side, and cards can be moved easily. The whole system is very user-friendly and intuitive.

4.3.7 Mobile App

Trello can also be accessed away from your laptop through an app for Android and iOS. You can download it for free from the app store on your device. After downloading and installing it, you can, just as in the desktop version, log into your Trello account or create one from scratch. When you log in, you will be able to


Fig. 46 Appearance of label settings

Fig. 47 Different cards with different labels

see all your boards in progress. Tapping on a specific board will open all its lists and cards. For a clearer view on your mobile phone, each list is displayed individually, and to switch between lists you can use the "swipe left/right" gesture [44, 45], see Fig. 48.


Fig. 48 Appearance of the Trello app on mobile device

4.4 Basecamp

Basecamp is online project management software for businesses of all sizes. With its reasonable pricing of $99 monthly, this tool provides all the basic functions and the transparency needed to keep the project flow going and clear, see Fig. 49. The advantage of Basecamp is that this project management tool is extremely simple and straightforward for all types of businesses and professionals. The system is mostly aimed at collaboration on staff projects that do not require any budget planning or time tracking features. Otherwise, it offers everything you might need to coordinate, oversee or execute your team's project tasks. The platform gives you the option of creating projects with task lists, tracking those tasks, sharing files, and discussing progress, whether in task comment sections, project forums, or on the message board. On top of that, automated project check-ins are an innovative way to avoid regular meetings by moving them online.

Fig. 49 Basecamp logo


4.4.1 Creating Projects

Getting started with Basecamp is very simple. By clicking a button on its main website, in the top right-hand corner, you can start your trial and create your dashboard filled with projects. Once you click the "Try it FREE" button, you can start your 30-day free trial by entering your name and email address, creating a password, perhaps entering the name of your company, and then selecting the type of project you are interested in. Once you have entered all of this information, you are taken straight to the main dashboard, as shown in Figs. 50 and 51. Basecamp's biggest advantage is its simplicity. Switching between different projects and teams is really easy on the home screen. Each project and team is displayed in a clear block, listed in alphabetical order and containing the information you added, see Fig. 52. Basecamp allows you to break up your work into separate projects. Each project contains everything that falls under the scheme of your choosing: all the people you need to share with and assign tasks to, discussions, shared documents, files, scheduled tasks, important dates, etc. When it comes to usability, the limited feature set that Basecamp offers has been turned into its biggest benefit: it cannot get much easier than Basecamp. Every project is clearly labelled with large icons and organized on the main dashboard, so it is very easy to orient yourself and enter the project you wish to work on, as shown in Fig. 53.

Fig. 50 Basecamp main dashboard


Fig. 51 Basecamp home screen

Fig. 52 View of splitting work in different projects

Inside the menu for each project there is a clearly displayed set of tools for managing your tasks and/or co-workers: message board, to-do list, automatic check-ins, schedule, "campfire" discussion feature, and file/document uploads.

4.5 Task Lists

The main management and planning feature for your projects is the task list. Inside each of your created projects you can add tasks with specific names and descriptions. Moreover, you can create to-do lists for all the steps that need to be done, assign them to the relevant team members, and set due dates, shown below in Fig. 54. For better task prioritization and scheduling, every Basecamp project lets you follow all the due dates for to-dos and events for that project in the Schedule.


Fig. 53 Menu of a project

If you have any missed or overdue tasks, Basecamp can smartly remind you of them. Everyone on the project can see the schedule calendar, so they all know which deadlines to follow, as shown in Fig. 55.
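To-dos can also be added from outside the web interface through the Basecamp 3 API, for example when tasks are generated by another internal system. The Python sketch below is a rough illustration only: the account, project ("bucket"), and to-do list ids are placeholders, authentication uses an OAuth 2.0 bearer token, and the exact endpoint paths and payload fields should be verified against Basecamp's current API documentation.

import requests

ACCOUNT_ID = "9999999"              # placeholder Basecamp account id
PROJECT_ID = "1111111"              # placeholder project ("bucket") id
TODOLIST_ID = "2222222"             # placeholder to-do list id
ACCESS_TOKEN = "YOUR_OAUTH_TOKEN"   # placeholder OAuth 2.0 token

url = (f"https://3.basecampapi.com/{ACCOUNT_ID}"
       f"/buckets/{PROJECT_ID}/todolists/{TODOLIST_ID}/todos.json")

# Create one to-do with a due date; Basecamp expects API clients to
# identify themselves with a User-Agent header.
resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "User-Agent": "EventPlanningScript (you@example.com)",
    },
    json={"content": "Send out the agenda", "due_on": "2021-11-12"},
)
print(resp.status_code, resp.json().get("id"))

Tasks created this way appear on the same to-do list and schedule that the team already follows, so the reminders described above apply to them as well.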

4.5.1 File/Document Sharing

Every project includes a document storage tool, a space where all your teammates can upload documents, files, spreadsheets and images related to their tasks in the project, whether from their local computer or from Google Drive, shown in Fig. 56. Since everyone on the project has access to this space and to all previously shared documents, they will know exactly where to find what they need. In contrast to sending regular e-mails with attachments, where there is a high chance the document will get lost in the e-mail chain, Basecamp's document sharing platform makes documents predictable and easy to find.


Fig. 54 List of to-dos

4.5.2 Communication

Communication is also very important when it comes to keeping a remote workplace functioning and to tracking and discussing progress. Luckily, Basecamp offers a great platform for real-time communication, whether through the message board, in task comment sections, or in project forums. Each project is assigned a message board, which can be accessed through the project menu. Message Boards keep the entire conversation about project topics together in a single space. They allow the team to post announcements, pitch ideas, post updates on progress and give feedback on the topic discussed, see Fig. 57. The second communication tool, besides the general message board, is the possibility to comment in the To-Do sections. If you need feedback on an ongoing task, all you


Fig. 55 Appearance of overdue to-dos

Fig. 56 Storage of documents and files


Fig. 57 In-app communication

have to do is upload the ideas you want feedback on into a task and tag the relevant team members to let them know. When someone gets tagged, Basecamp sends those users a notification via e-mail, letting them know they have been reached out to in case they are not currently logged into the app. These two communication tools, the Message Board and the Comment Section, allow team members to discuss work on one platform instead of in chats, emails and meetings. Lastly, when it comes to productive communication, Basecamp's real-time group chat, called Campfire, lets you ask quick questions and get equally quick answers without the need to post them on the dashboard and spam your feed, displayed in Fig. 58.


Fig. 58 Basecamp’s group chat—campfire

4.5.3 Automatic Check-Ins

The usual way to keep everyone motivated and on track with their tasks is to hold weekly meetings on progress status and updates. Basecamp offers a feature that lets everyone stay in the loop without the need for actual meetings. You can replace those routine status meetings with a tool called "Automatic Check-ins", see Figs. 59 and 60. Check-ins let you ask your team members questions on a regular basis through the online software, with all the replies collected into an easy-to-follow thread, shown in Fig. 61.

Fig. 59 Question appearing at 4:30 p.m. on every weekday


Fig. 60 Question appearing at 9 a.m. every Monday

Fig. 61 Group chat conversation


4.5.4 Mobile App

Furthermore, Basecamp also offers a mobile app, so you can stay in touch with your team even when you are away from your desktop. Luckily, the mobile app functions almost exactly like the desktop version, so there is no need to learn a new way of working. The icons look the same, the comment functions are identical, and team tagging and messaging is very intuitive, with notification bubbles at the top of the home screen [44, 46, 47], shown in Fig. 62.

4.6 Evernote

Evernote is another of the time management tools we are going to look at. It is a platform or app that helps its users write down to-do lists or tasks, share

Fig. 62 Appearance of mobile app


Fig. 63 Evernote home page

the content with other people, edit it, add audio-visual features and, last but not least, organize priorities and the time needed to accomplish the tasks. As mentioned before, Evernote comes either as a web page or as an app you can download to your device. Either way, the first thing that welcomes us when we open Evernote is its welcome page, see Fig. 63. As visible in the picture above, the main page welcomes potential new users with a visual insight into Evernote's web desktop and app, showing how things look when working with this time management tool. At the top of the page, three main groups can be found: the first one WHY EVERNOTE, the second one FEATURES, and the last one PLANS.

4.6.1 Why Evernote

The first section promises to tell us more about the advantages of this product. It highlights the opportunities the app offers, mentioning saving every idea, syncing it across multiple devices, and sharing it with the relevant people.

4.6.2 Features

The second option is called Features. This window shows the features the product provides. They can be divided into eight main groups based on their function, as can be seen in Fig. 64.


Fig. 64 Evernote features menu

Sync and Organize This enables the user to access their saved notes and to-do lists from various devices. A project can be started on a computer and finished on a tablet or phone. There is no need to worry about losing important changes made on one device, as syncing keeps everything up to date and on track.

Web Clipper This tool makes it possible to attach web pages, articles, screenshots from magazines, and PDF documents to your notes. This way you can easily get back to your saved links without having to worry about losing them. Moreover, screenshots are easily editable, so you can highlight the important parts to make them easier to find in the future. Besides, if you need to add a comment or some text to a document, screenshot, or PDF, you can easily update it with a text box, as shown in Figs. 65 and 66.

Templates Templates are pre-defined versions of notes for various activities and are fully customizable. In this section, three main collections can be found, divided into groups based on the nature and environment of the target group: for school, for work, for life. Examples include build a plan, habit tracker, save money, get fit, goal tracker, weekly planner, and travel inspiration. These can also be found in the categories part, where they are brought together based on their purpose, e.g. personal well-being, project management, school, travel, party planner, displayed in Figs. 67 and 68.

PDF and Document Search All Evernote plans allow you to attach PDFs and documents to your notes. With this feature, the user can search not just in their notes, but also in the attached


Fig. 65 Web clipper feature

Fig. 66 Highlighting and editing tool in web clipper features


Fig. 67 Templates options

Fig. 68 Template—goals for the year


Fig. 69 Searching in documents

PDFs, scanned documents, and handwritten notes. Moreover, when you allow your device to use your location at least while Evernote is in use, the search can be based on location as well as on date and keyword, as shown in Fig. 69.

Spaces Spaces help to centralize information for large teams working towards the same goal. This feature is mainly used by businesses, where large working groups are brought together to share their project ideas. With this tool, partial tasks are collected, organized, and shared with other teammates. This allows the team's users to see the big picture behind the smaller tasks and helps them stay up to date with the latest changes made in the document across all workers.

Search Handwriting This capability was already partially mentioned with the previous tool, but here we get to understand the search system for handwritten attachments better. It allows us to search not only handwriting itself, but also pictures of handwriting such as post-it notes, to-do lists, or a whiteboard. Evernote can distinguish 28 typewritten and 11 handwritten languages, displayed in Figs. 70 and 71.


Fig. 70 Searching in handwritten documents

Fig. 71 Language options for searching in handwritten documents


Fig. 72 Document scanning

Document Scanning Scanning allows the Evernote user to keep all important documents in one place. Whether it is a bill, healthcare document, invoice, warranty, or insurance paper, everything is saved and easier to look up in the future when needed. Moreover, you can scan a newly received business card from a business partner, and all the information such as name, e-mail address and phone number will be remembered, as shown in Fig. 72.

App Integration The last feature is not only important but also helpful. App integration allows you to connect your Evernote account with other platforms such as Google Drive, Outlook, Slack, Salesforce, and MS Teams. This allows the user to be more creative and productive, as communication through these platforms becomes easier and goals and plans get done.

Plans The third group is PLANS, see Fig. 73. This part enables us to choose the right Evernote package for our needs. Evernote comes in three different categories. The first one is free, with basic features and somewhat limited capacity, but still satisfactory for daily usage. This version comes with


Fig. 73 Types of evernote packages

the possibility to connect Evernote on two devices, has a maximum note size of 25 MB, and a monthly upload limit of 60 MB. The second option is the premium package, which offers broader features. Compared to the basic package, not only is a higher note size available (200 MB), but also a higher monthly upload limit (10 GB). Moreover, the number of devices is unlimited, and searching within PDFs and documents is allowed. The third option is designed for business purposes. Here the Evernote account is shared among several employees. Higher capacity is available: a monthly upload of 20 GB plus 2 GB per user, visible team activity history, and shared spaces for collaboration. After choosing the package that best suits your needs, you are ready to use the Evernote platform with all the information provided above. After creating your account with your e-mail address and password, the following page will appear, as can be seen in Fig. 74. On the left side, in a bright green area, is written "+ New Notes"; that is where the user creates the desired notes for the work that needs to be done, see Fig. 75. A layout appears which is familiar to people making notes on a computer. A pre-written guide appears in each section, which helps the new user navigate to the desired result. You can add a title for your notes, add files, choose pictures or even use one of the templates that suits the purpose of the note. Classical editing tools for typing are available, such as font, size, colour, highlighting, underlining, inserting a link, and alignment. Based on the nature of your document, you customize


Fig. 74 Appearance of evernote

Fig. 75 Creating new task

your notes according to your preferences, in order to be as productive and organized as possible and to accomplish results [48].

4.7 Todoist

Todoist is a platform which helps its users keep track of their duties. Todoist can be used in your web browser, downloaded as an app to your devices, or installed as an add-in in your e-mail client. This tool aims to help more things get done. To work with this time management tool, a user account has to be set up. You can either create a profile with your e-mail address or simplify the sign-up process by using one of your existing accounts on other platforms, such


Fig. 76 Todoist main page

as Apple, Google, or Facebook. After creating a profile, which takes only a few seconds, you are ready to use Todoist and start organizing your life. Having set up the account, you can choose whether to use the web platform or the app. Either way, the first thing that appears on your device is the Todoist basic view, shown in Fig. 76. The advantage of this tool lies in the simplicity of its usage. Naturally, when we want to add something we use the "+" symbol, and it is no different in Todoist. On the left side of the page are the features that help us organize and keep our tasks well arranged. Next, we will get to know these tools better.

4.7.1 Inbox

The first window we can see has the title Inbox. It is a place where quick, short ideas can be saved for future use, until we decide how we want to accomplish those goals and what actions need to take place.

4.7.2 Today

This place gives us the possibility to see the “big picture” for today. With this tool, we can have a good overview of the day ahead of us.

4.7.3 Upcoming

Coming to the third folder, it is quite obvious that it has a similar purpose to the previous one, only for the next, upcoming days, see Fig. 77. Using the "+" button, you add a new task to your to-do list. After clicking on it, a new window appears. Here you can put down your thoughts, even in a short form, so as not to forget about them, shown below in Fig. 78. Start by writing down the intention; you can then simply add the deadline by which the task needs to be fulfilled, or, when it is a periodically occurring event, e.g. a monthly fee


Fig. 77 Creating new task

Fig. 78 Quick add task

for a subscription, it will remind you every month on the specific date you choose, as you can see in Fig. 79. Having unrolled the Today button, an extensive list of possible dates pops up, shown in Fig. 80. In this newly opened window we can set the date for the task: besides easily assigning it to a pre-defined group such as tomorrow, next weekend or next week, there is also the possibility to assign it a specific date and time, see the Inbox button in Fig. 81. This option is helpful when you have not yet decided which group of duties your task belongs to. Having tasks saved in the Inbox makes sure they will not get lost; later, once you know what the future of a task is, you move it along to the desired place. Coming to the toolbar on the right side, four symbols are visible. They are the following, see Fig. 82.

Fig. 79 Today button

Fig. 80 Window of upcoming dates

Fig. 81 Inbox button

Fig. 82 Label button


Fig. 83 List of existing labels

Clicking on the button shown above, you can assign a label to the task. This means that your task will automatically be moved to a cluster of similar tasks, or of tasks related to a specific topic or project, shown in Fig. 83. For example, as visible, tasks can be assigned to a label related to school duties, and those are shown in red. Moreover, free-time activities are marked in blue and family events are marked in pink. The colours of the labels are customizable and can be changed at any time, see Fig. 84. Moving on to the second button, symbolized by a flag, we can choose the priority level of the task. The four alternatives are also distinguishable by colour according to their urgency. Very urgent cases can be marked with Priority 1, which is coloured red. Less urgent tasks are marked with Priority 2, coloured orange. The least urgent cases are marked with either a blue or a transparent flag, giving them priority levels of Priority 3 or 4, as displayed in Fig. 85 below. The third button is designed for setting reminders for tasks; these reminders are useful for reminding the author to check up on them. The last button's purpose is adding quick comments to the task. The illustration for adding quick comments can be seen below in Fig. 86. Having entered all the important details of the task, you simply add it to your Todoist by pressing enter or clicking the Add task button, illustrated below in Fig. 87.

Fig. 84 Categories of priorities


Fig. 85 Setting a reminder for a task

Fig. 86 Adding quick comment to task

Fig. 87 Add task button
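For users who prefer scripting, the same kind of task, complete with a recurring due date, a priority, and a label, can also be created through Todoist's REST API. The Python sketch below is illustrative only: the API token is a placeholder, and the endpoint version and field names should be checked against the current Todoist developer documentation.

import requests

TODOIST_TOKEN = "YOUR_TODOIST_API_TOKEN"  # placeholder

# Create a recurring task with a label and the highest priority.
task = {
    "content": "Pay the monthly subscription fee",
    "due_string": "every month on the 1st",  # natural-language due date
    "priority": 4,        # API value 4 is the highest (shown as Priority 1 in the app)
    "labels": ["MONTHLY"],
}

resp = requests.post(
    "https://api.todoist.com/rest/v2/tasks",
    headers={"Authorization": f"Bearer {TODOIST_TOKEN}"},
    json=task,
)
print(resp.json()["id"])  # id of the newly created task

A task created this way then appears in the Inbox or Upcoming views just as if it had been added through the quick-add window described above.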

Once you have added tasks to your to-do list, many of them will likely concern upcoming events. To get a better overview of your future activities and obligations, click the Upcoming button in the panel on the left side, shown below in Fig. 88. Afterwards, an overview of your saved upcoming events will be grouped and arranged by date. Staying in the panel on the left side, we will next get to know the Projects part, as shown below in Fig. 89.

4.7.4 Projects

By choosing the Projects window, further sub-windows appear. Here you can find the personalized projects you work on. You can edit them to distinguish them from each other, e.g., by choosing different colours for each project and naming them differently. See the example in the picture on the left: PR Campaign is coloured green and another project named Marketing for ABC is shown in yellow. Free-time duties and


Fig. 88 Upcoming button

Fig. 89 Projects displayed in project window

activities can be listed in the project Home—Free time. You can move projects around by dragging and dropping them to the desired place. An important trick in Todoist is the possibility to add sub-tasks to each main task or goal. As we can see in the picture below, the main project is PR Campaign, with several to-dos on the list. The task Search for sponsors has two sub-tasks, i.e. smaller activities needed to reach the goal. Those subtasks are shifted horizontally to the right to indicate that they are subtasks, as can be seen in Fig. 90.


Fig. 90 Project and its subtasks

Fig. 91 Other view on subtasks with the possibility to edit them

Todoist offers another view of the subtasks. In the next picture, we can see that one main to-do task has a subtask presented in a more itemized way, which gives space for adding comments to the subtasks and for seeing the activities and changes performed on the task, displayed in Fig. 91. The platform also offers to share your content with partners, family members, or whoever you want to share your programme with. To keep the to-do list more organized, it offers the possibility to sort the list alphabetically, by due date, or by priority.

4.7.5 Labels

The next step we can use to be more productive is to use Labels. Labels were created to simplify our lives by grouping similar tasks together based on their nature, so it is easy to see a list of tasks with any given label. In the picture below, we can see the following labels: 5, 10, 30, 15 min, BUY, READING, WATCH, MONTHLY, LINKS, WEEKLY. As we have already noticed in Todoist, we can personalize the labels


Fig. 92 Labels

as we want. We can move them around by dragging and dropping them to the desired final place, or add different colours to the labels. With labels, it is easy to search for tasks that take around 15 min when an unplanned 15-minute window appears in our schedule. By clicking on the label 15MIN, we can start right away without losing any precious time, as can be seen in Fig. 92.

4.7.6 Filters

The next feature we will get to know is Filters. Filters group your tasks as well, but differently from Labels: while you manually and consciously assign your tasks to a certain label, filters group them automatically based on shared properties. As in the picture below, there is a filter named No due date; in this filter window, you can find all your tasks that have no due date assigned to them. Moreover, we can see filters which group your tasks based on their priority level (Priority 2, Priority 3, Priority 4). The possibility to organize and personalize filters is available in this tool as well. Through filters, the user gets a different view of their tasks, which helps to boost productivity. Using the Today filter, you will not forget a task scheduled for the day, even when the tasks are assigned to different ongoing projects [49], as can be seen in Fig. 93.


Fig. 93 Filters

5 Conclusion

Time management as a skill, and its tools, make our daily lives not only easier but also more productive. The examples above are just a few of the best tools available on the market. Out of the vast number of possibilities, the ones mentioned are used not only by individual users in their daily lives, but also by huge multinational corporations. This fact only emphasizes the importance and effectiveness of time management tools in the face of an ever-present threat: procrastination. Time management tools are designed to make our daily tasks easier. They are created not only to tick off to-do tasks, but also to keep the workflow and the flow of ideas efficient. With the added features, we can easily save an idea until it becomes a concrete project, or store pictures, documents and web links for further use in the future. Moreover, we can attach scanned files to our documents or continue working on them from different devices, e.g. a tablet. In teamwork projects, the tools mentioned are all easily shareable between teammates, which supports an effective workflow. To make a project more structured, it can be split into smaller parts, i.e. subtasks. To keep the workflow going, some applications provide a built-in platform for chatting and staying in touch with teammates. For the forgetful and busy, reminder features are available, so there is no need to worry that an important matter will be forgotten. To have everything organized down to the last point, you can add labels and group your activities and tasks by priority, due date, or project. Pre-set templates can be used and customized according to your preferences. Time management tools give the user a wide range of possibilities to organize the tool according to their needs and preferences. The aim of this article was to introduce readers to some of the time management tools available on the market, in order to combat procrastination. Concluding


all the information and possibilities, it is important to state that time management tools play an indispensably important role in people's daily lives.

References
1. Ludwig, P.: Konec prokrastinace - Jak přestat odkládat a začít žít naplno. Jan Melvil Publishing (2013)
2. Lay, C.H.: At last, my research article on procrastination. J. Res. Personality 20(4), 474–495 (1986). https://doi.org/10/bpbd9b
3. Schouwenburg, H.C.: Procrastination, motivation, and personality: towards a motivational theory of procrastination. In: Paper presented at the 4th Biennial International Conference of Procrastination (2005)
4. Gabrhelík, R.: Akademická prokrastinace: Ověření sebeposuzovací škály, prevalence a příčiny prokrastinace. Disertační práce, Masarykova univerzita (2008)
5. Chun Chu, A.H., Choi, J.N.: Rethinking procrastination: positive effects of "active" procrastination behavior on attitudes and performance. J. Soc. Psychol. 145(3), 245–264 (2005). https://doi.org/10/dxcw6t
6. Tiansang, Y.: Self-oriented perfectionism, achievement goals and procrastination. Magisterská práca, University of Kansas (2014)
7. Ferrari, J.R., Emmons, R.A.: Procrastination as revenge: do people report using delays as a strategy for vengeance? Personality Individ. Differ. 17(4), 539–544 (1994). https://doi.org/10/cc6wdm
8. Ferrari, J.R., Doroszko, E., Joseph, N.: Exploring procrastination in corporate settings: sex, status, and settings for arousal and avoidance types. Individual Differ. Res. 3(2), 140–149 (2005)
9. Ferrari, J.R.: Christmas and procrastination: explaining lack of diligence at a "real-world" task deadline. Personality Individ. Differ. 14(1), 25–33 (1993). https://doi.org/10/cr2pbw
10. Ferrari, J.R., O'Callaghan, J., Newbegin, I.: Prevalence of procrastination in the United States, United Kingdom, and Australia: arousal and avoidance delays among adults. Am. J. Psychol. 7(1), 1–6 (2005)
11. Ferrari, J.R., Dovidio, J.F.: Examining behavioral processes in indecision: decisional procrastination and decision-making style. J. Res. Personality 34(1), 127–137 (2000). https://doi.org/10/fk6ndw
12. Hen, M., Goroshit, M.: The effects of decisional and academic procrastination on students' feelings toward academic procrastination. Curr. Psychol. 39(2), 556–563 (2020). https://doi.org/10.1007/s12144-017-9777-3
13. Sliviaková, A.: Akademická prokrastinace ve vztahu k perfekcionismu. Diplomová práca, Masarykova univerzita (2007)
14. Mann, L.: Decision-Making Questionnaire (1982)
15. McKenzie, K., Schweitzer, R.: Who succeeds at university? Factors predicting academic performance in first year Australian university students. High. Educ. Res. Dev. 20(1), 21–33. https://doi.org/10/ddrvs2
16. Milgram, N.A., Gehrman, T., Keinan, G.: Procrastination and emotional upset: a typological model. Personality Individ. Differ. 13(12), 1307–1313 (1992). https://doi.org/10/fmm5rr
17. Sliviaková, A.: Prokrastinace v adolescenci a mladé dospělosti. Rigorózna práca, Masarykova univerzita (2011)
18. Ferrari, J.R., Johnson, J.L., McCown, W.G.: Procrastination and Task Avoidance. Springer US, Boston, MA (1995)
19. Özer, B.U., Demir, A., Ferrari, J.R.: Exploring academic procrastination among Turkish students: possible gender differences in prevalence and reasons. J. Soc. Psychol. 149(2), 241–257 (2009). https://doi.org/10/b9jxjp

Time Management and Procrastination

729

20. Klingsieck, K.B., Grund, A., Schmid, S., Fries, S.: Why students procrastinate: a qualitative approach. J. College Student Dev. 54(4), 397–412 (2013). https://doi.org/10/f45j4c 21. Gafni, R., Geri, N.: Time management: procrastination tendency in individual and collaborative tasks. IJIKM 5, 115–125 (2010). https://doi.org/10/gg87b8 22. Burka, J.B., Yuen, L.M.: Procrastination: Why You Do It, What to Do About It Now. Da Capo Lifelong Books (2008) ˇ svou kreativitu, zlepšete svou pamˇeˇt, zmˇenˇ te sv˚uj 23. Buzan, T.: Myšlenkové mapy - Probudte ˇ svou kreativitu, zlepšete svou pamˇeˇt, zmˇenˇ te sv˚uj život. BIZBOOKS (2012) život - Probudte 24. Aeon, B., Aguinis, H.: It’s about time: new perspectives and insights on time management. AMP 31(4), 309–330 (2017). https://doi.org/10/gftp2m 25. Haynes, M.E., Haynes, R.: Time Management (Crisp Fifty-Minute Books), 3rd edn (2006) 26. Sabha, R.A., Al-Assaf, J.A.-F.: Awareness of the faculty members at Al-Balqa’ Applied University to the concept of time management and its relation to some variables. IES 5(5), 116 (2012). https://doi.org/10/gh42rg 27. Kirillov, A.V., Tanatova, D.K., Vinichenko, M.V., Makushkin, S.A.: Theory and practice of time-management in education. ASS 11(19), 193 (2015). https://doi.org/10/gh42qz 28. Gruber, D.: Time management: Prokrastinace. konflikty, porady, vyjednávání, emaily, mobily, angliˇctina. Management Press, Praha (2017) 29. Covey, S.R.: To nejd˚uležitˇejší na první místo. Management Press, Praha (2015) ˇ ek a cˇ as: Time Management IV. generace. Grada (2006) 30. Pacovský, P.: Clovˇ 31. Wolters, C.A., Won, S., Hussain, M.: Examining the relations of time management and procrastination within a model of self-regulated learning. Metacognition Learn. 12(3), 381–399 (2017). https://doi.org/10/gckwws 32. Koch, R.: The 80/20 Principle: The Secret of Achieving More with Less. Nicholas Brealey Publishing (1997) 33. Allen, D.: Getting Things Done: The Art of Stress-Free Productivity. Penguin, New York (2003) 34. Cirillo, F.: The Pomodoro Technique (The Pomodoro), p. 45 (2006) 35. Rafke, H.D., Lestari, Y.D.: Simulating fleet procurement in an Indonesian logistics company. Asian J. Shipping Logistics 33(1), 1–10 (2017). https://doi.org/10.1016/j.ajsl.2017.03.001 36. Jalote, P., Palit, A., Kurien, P., Peethamber, V.T.: Timeboxing: a process model for iterative software development. J. Syst. Softw. 70(1–2), 117–127 (2004). https://doi.org/10/fvbsdg 37. Pratchett, T., Young, G., Brooks, C., Jeskins, L., Monagle, H.: Practical Tips for Developing Your Staff, 1st edn. Facet (2016) 38. de Klerk, N.: Free-time management amongst generation Y students. MJSS (2014). https://doi. org/10/gh42rh 39. Kayla, K.: monday.com for Event Planning (2021). https://support.monday.com/hc/en-us/art icles/360016879799-monday-com-for-Event-Planning40. O’Sullivan, F.: Monday.com Tutorial: A 2021 Beginner’s Guide to Project Management Like a Boss (2020). https://www.cloudwards.net/monday-com-beginners-guide/ 41. Shakhovska, N., Yakovyna, V., Kryvinska, N.: An improved software defect prediction algorithm using self-organizing maps combined with hierarchical clustering and data preprocessing. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds.) Database and Expert Systems Applications, pp. 414–424. Springer International Publishing, Cham 42. Unito Team: The Ultimate Beginner’s Guide to Asana (2018). https://unito.io/blog/the-ult imate-beginners-guide-to-asana/ 43. It’s in Basecamp. https://basecamp.com/features 44. 
Morpus, N.: Basecamp vs. Trello: The Perfect Matchup. https://www.fool.com/the-blueprint/ basecamp-vs-trello/ 45. Whitten, E.: 6 Ways to Use Trello for Effective Self-Management (2020). https://blog.trello. com/use-trello-to-prioritize-and-self-manage 46. Morpus, N.: Basecamp Review. https://www.fool.com/the-blueprint/basecamp-review/ 47. Todoist. https://todoist.com 48. Monday. https://monday.com/

730

L. Pinke et al.

49. Hoshovska, O., Poplavska, Z., Kryvinska, N., Horbal, N.: Considering random factors in modeling complex microeconomic systems. Mathematics 8(8), 1206 (2020). https://doi.org/ 10/gh35dg 50. Adams, R.V., Blair, E.: Impact of time management behaviors on undergraduate engineering students’ performance. SAGE Open 9(1), 215824401882450 (2019). https://doi.org/10/ghn dbw 51. Balamurugan, M.: Structure of student time management scale (STMS). JSCH 8(4), 22–28 (2013). https://doi.org/10/gh42qw 52. Balkis, M., Duru, E., Bulus, M.: Analysis of the relation between academic procrastination, academic rational/irrational beliefs, time preferences to study for exams, and academic achievement: a structural model. Eur. J. Psychol. Educ. 28(3), 825–839 (2013). https://doi.org/10/gg7 9zh 53. Boevé, A.J., Meijer, R.R., Bosker, R.J., Vugteveen, J., Hoekstra, R., Albers, C.J.: Implementing the flipped classroom: an exploration of study behaviour and student performance. High. Educ. 74(6), 1015–1032 (2017). https://doi.org/10/gcj2vp 54. Buzdar, M.A., Mohsin, M.N., Akbar, R., Mohammad, N.: Students’ academic performance and its relationship with their intrinsic and extrinsic motivation. J. Educ. Res. (2017) 55. Cohen, S., Kamarck, T., Mermelstein, R.: A global measure of perceived stress. J. Health Soc. Behav. 24(4), 385 (1983). https://doi.org/10/d2wgms 56. Ellis, A., Knaus, W.J.: Overcoming Procrastination. Signet (1977) 57. Grunová, M.: Akademická prokrastinace a její negativní dopady na vysokoškolské studenty. PEDAGOGIKASK Slovak J. Educ. Sci. (4), 261–280 (2015) 58. Hamdan, N., McKnight, P., McKnight, K., Arfstrom, K.M.: A review of flipped learning: flipped learning network 15 (2013) 59. Hamzah, A.R., Lucky, E.O.-I., Joarder, M.H.R.: Time management, external motivation, and students’ academic performance: evidence from a Malaysian Public University. ASS 10(13), 55 (2014). https://doi.org/10/gh42q2 60. Lay, C.: Layova škála prokrastinace pro studenty. Klinika adiktologie (2019) 61. Lay, C.H.: A modal profile analysis of procrastinators: a search for types. Personality Individ. Differ. 8(5), 705–714 (1987). https://doi.org/10/dctfr4 62. Macan, T.H., Shahani, C., Dipboye, R.L., Phillips, A.P.: College students’ time management: correlations with academic performance and stress. J. Educ. Psychol. 82(4), 760–768 (1990). https://doi.org/10/ds6pn3 63. Mahler, D., Großschedl, J., Harms, U.: Does motivation matter? The relationship between teachers’ self-efficacy and enthusiasm and students’ performance. PLoS ONE 13(11), e0207252 (2018). https://doi.org/10/gfkpmv 64. O’Brien, W.K.: Applying the Transtheoretical Model to Academic Procrastination. Dizertaˇcná práca, University of Houston (2002) 65. Shokeen, A.: Procrastination, stress and academic achievement among the B.Ed. students. Educ. Quest Int. J. Educ. Appl. Soc. Sci. 9(1), 125–129. https://doi.org/10.30954/2230-7311. 2018.04.17 66. Solomon, L.J., Rothblum, E.D.: Academic procrastination: frequency and cognitive-behavioral correlates. J. Couns. Psychol. 31(4), 503–509 (1984). https://doi.org/10/fhb69d 67. Steel, P.: The nature of procrastination: a meta-analytic and theoretical review of quintessential self-regulatory failure. Psychol. Bull. 133(1), 65–94 (2007). https://doi.org/10.1037/00332909.133.1.65 68. Unal, Z., Unal, A.: Comparison of student performance, student perception, and teacher satisfaction with traditional versus flipped classroom models. Int. J. Instr. 10(4), 145–164 (2017). https://doi.org/10/gfdcjm 69. 
Zarick, L.M., Stonebraker, R.: I’ll do it tomorrow: the logic of procrastination. College Teach. 57(4), 211–215 (2009). https://doi.org/10/c6dw2d 70. Evernote. https://evernote.com 71. Asana. https://asana.com

Creating Database Models in Rational Data Architect
Artur Bogusławski, Peter Veselý, Lucia Husenicová, and Ondrej Čupka

Abstract One of the foundations of the Information Age are the databases that store its information, and it is crucial to understand their uses and how they operate. Within this paper, we analyze their various uses, types, and programming languages. As a practical demonstration, and to conduct a better analysis, we also create database models, or more precisely one single database model, in several programs in order to compare their advantages and disadvantages. Additionally, we analyze which databases are best suited to which kinds of business situations. We conclude that there is no single best data modeling software and that the decision on which one to use needs to be made with a prepared list of requirements in mind. While a simple database may be modeled in free software, more complex ones will require paid software, such as Rational Data Architect. The analysis conducted in this paper should aid in making such decisions.

Keywords Database · Database modeling · Key · Object · Attribute · Relationship

1 Introduction
From the smallest company to the largest corporation, from a private company to a public institution, every organization needs a system that will help it properly organize a large amount of information. The development of technology in every area of life helps to improve and speed up the work being done.
A. Bogusławski
Lodz University of Technology, Lodz, Poland
P. Veselý (B) · L. Husenicová · O. Čupka
Faculty of Management, Comenius University in Bratislava, Odbojarov 10, 831 04 Bratislava, Slovakia
e-mail: [email protected]
L. Husenicová
e-mail: [email protected]
O. Čupka
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
N. Kryvinska and A. Poniszewska-Marańda (eds.), Developments in Information & Knowledge Management for Business Applications, Studies in Systems, Decision and Control 377, https://doi.org/10.1007/978-3-030-77916-0_22

Technology delivers quality products while saving time or using fewer resources to complete a task. In today's highly competitive landscape, this may be of utmost importance and decide whether a company succeeds or fails. It is therefore highly important to use electronic databases to tame the large amounts of data used and gathered by a company. The perception of a database can be compared to how we perceive air: everyone knows it is there, yet no one sees it, while at the same time it is one of the most important elements for us. Most application users do not realize that most applications run on a database, which is one of the most important elements of the system. Of course, users know that they work with information, focusing on the activities performed in the program; yet if the data ran out, the program would have significantly limited functionality or could do nothing at all, and it would become useless. The purpose of this paper is to explore the possibilities of Rational Data Architect in terms of database modeling. The scope of work includes analyzing the creation of a database model in Rational Data Architect and in several other selected programs created to support modeling, as well as in cloud-based solutions. The database will be created in each of the selected programs, which will make it easier to compare the functionality of the individual applications and to demonstrate their advantages and disadvantages against the competition. The second section of the paper covers the origins and types of databases and how they are managed and created. The third section describes the applications that were used to perform the analysis, compares their advantages and disadvantages, and discusses the technologies used to create database modeling programs. The fourth section is devoted to the programs selected to create the database model, with a brief description of each application and its capabilities. The next section presents the implementation of the logical model prepared in the previous section; the creation of a physical design model was performed in each of the selected programs.

2 Databases
The term database first appeared at a symposium in the 1960s and became popular only in the 1970s. Since then, data management technologies have grown quite quickly. More and more concepts of how to create databases have been developed in order to meet all market expectations. Since large corporations and companies store the largest amounts of data, they were the most interested in such ventures. The implementation of such systems allowed them to increase productivity, which led to increased profits and reduced the costs of their activities [1]. These days, the offering of database services has become a large market, with several companies making it their primary source of income. The market of Database-as-a-service, or DBaaS, is expected to grow to 320 billion USD by 2025. The market truly started with the large-scale implementation of cloud technologies and with Amazon offering these services in 2009 with its AWS product. This will be analyzed in more depth in Sect. 2.6 [2, 3].

2.1 Types of Databases
Since the 1960s and the creation of the idea of databases, there have been many variations and differing ideas on what the structure of a database should look like. The following are some of the models that have been developed over the years [4].
1. Simple databases:
(a) File databases. A database based on ordinary text files. Each row in the file corresponds to a record in the database. Columns, on the other hand, are created by placing a corresponding character (a separator) between the fields. The data must not contain the separator character, as this would split the column and result in partial data loss [4].
(b) Hierarchical databases. This model is based on a tree structure. The database is structured in multiple levels according to the detail of the data: from the root, that is, the most general data and terms, down to the most detailed level of the structure, that is, the leaves. Such a structure is, for example, used to manage directories in operating systems. Because of the redundant data resulting from the model's constraints and its complex relationships, this model eventually had to be replaced by a new one that corrected and simplified it [4].
2. Compound databases:
(a) Relational databases. Currently the most common database model of all. The data is stored in tables that contain records. Each record is distinguished by a unique field value, called a key, whose task is to uniquely identify the records in the table. This way, unlike with the simple databases described earlier, you do not need to know how the data is arranged in the database in order to search it efficiently [4].
(b) Object databases. A model close to the relational one. The biggest difference is that it does not work with records but with objects. It is a fairly new technology, not fully proven in practice, which makes it risky for companies that want to implement it. It resembles object-oriented programming languages: the objects are operated on using the methods they provide, and they also have other object-oriented characteristics, such as encapsulation, inheritance, or overloading [4].
(c) Object-relational databases. These databases are a combination of the two previous models. This reduces the defects of the relational model while preserving its positives, using technologies proven in relational systems, and at the same time introduces the positive characteristics of object databases [4].

2.2 Database Concepts
All databases are based on a certain data structure. It can contain personal data, data on cars, apartments, etc. To reflect information as it exists in real life, however, the stored data has to consist of many elements. Just as with a car, the data is not only its product name but also its horsepower or the capacity of its tank, which collectively and accurately define the vehicle. Other types of electronic data are created similarly. The basic unit in the database is the field, otherwise known as an attribute. This is where information of different categories, such as the first name, last name, or address, is stored. Many such fields make up a database record, called a tuple. Thanks to this data structure, the data can easily be sorted by a selected category, and all the information on each item can be found. To uniquely distinguish the data, each record has at least one field by which it is identified. This is called the master key, and its value cannot be repeated within a given table [4]. The structure of data stored in records is called a table or entity. There are appropriate relationships between tables (Figs. 1, 2 and 3) to express the relations existing between them; among other things, this layout allows the data to be found faster. Relationships between entities are divided into three types, described below [4]. In a one-to-one relationship, each record in one table is directly related to exactly one record in another table. In other words, the data from these two tables could be placed in a single table (Fig. 1) [4]. The most commonly used relationship is one-to-many. This means that one record in table number 1 is bound to one or more tuples in table number 2. In practice, one supplier can supply one or more products (Fig. 2) [4].

Fig. 1 One-to-one relationship

Fig. 2 One-to-many relationship

Fig. 3 Many-to-many relationship

The last type, the many-to-many relationship (Fig. 3), is rarely used because of the difficulties in implementing it. It can be realized by adding two one-to-many relationships and a third table, a so-called intersection table, that combines tuples from the two tables. In practice, this relationship means that multiple tuples from one table can be bound to one or more tuples from the other table. In the example of suppliers, the same product can be supplied by more than one vendor [4]. The development of database technology has brought not only advances in data management but also improvements in protecting data against loss or corruption. Queries to a database are written in SQL (Structured Query Language), which is described more closely in the next section [4]. Once a database has been created, it is successively supplemented with information. Over time, it will hold such a large amount of data that managing it takes a considerable amount of time. To manage one or more databases, the Database Management System (DBMS) was created (Fig. 4). Each system provides basic functions, divided into three groups: data management, data mining, and data control [4]. Data management involves adding new structures to a database, deleting and modifying existing structures, and entering, updating, and deleting data. Data mining is the "extraction" of data from a database by users directly or through utility programs. Database administration means controlling the data, i.e., creating and monitoring database users, restricting their access to the database, and overseeing the operation of the database [4].

Fig. 4 DBMS features [5]
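To make the intersection table idea more concrete, the following sketch shows one possible SQL layout for the supplier and product example mentioned above; the table and column names are invented for this illustration and are not taken from the model built later in this chapter.

CREATE TABLE suppliers (
    supplier_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

-- The intersection table stores one row per (supplier, product) pair,
-- turning the many-to-many relationship into two one-to-many relationships.
CREATE TABLE supplier_products (
    supplier_id INT NOT NULL REFERENCES suppliers (supplier_id),
    product_id INT NOT NULL REFERENCES products (product_id),
    PRIMARY KEY (supplier_id, product_id)
);

Recording that the same product is supplied by two vendors then simply means inserting two rows into supplier_products, one for each supplier.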

2.3 Advantages and Disadvantages of Using Databases
When creating new technologies and constantly developing their capabilities, there are always some limitations, and the technologies used to create, store, and manage data are no different. The advantages and disadvantages of databases are presented below. The advantages are:
• managing large amounts of data quickly and easily,
• protection of data against corruption or loss,
• the ability to share the same data, which saves space by not having to store the data on each device separately (this, however, raises the problem of simultaneous access to the data by multiple users),
• the ability to analyze relationships between data during common use,
• control over the data, whereby only selected data can be made available to chosen users, for example, information about the place of residence,
• reduced data redundancy.
Disadvantages include:
• any remaining redundancy causes memory resources to be wasted,
• the operation of the database depends strongly on the previously prepared model; since the database is only a physical implementation of the logical model, removing errors caused by poor design usually requires creating a new database,
• damage to the database is highly problematic if no backup copy exists.

2.4 SQL Language
SQL, or Structured Query Language, is a special programming language for databases. It belongs to the group of fourth-generation languages, which are characterized by the fact that they are easier for users to understand. Like any programming language, it has its advantages and disadvantages. Therefore, one of the most important elements for the good functioning of the database is the creation of a model that decreases the negative impact of the language's weaknesses on the database, resulting in faster and more efficient operation [6]. SQL has the functions needed to perform basic data activities, such as sorting, searching, linking, and editing data quickly and easily. SQL can be divided into several parts for functional reasons: DML (Data Manipulation Language), DDL (Data Definition Language), DCL (Data Control Language) and DQL (Data Query Language). The DML group, or data manipulation, includes, for example, the insert, update, and delete queries. The DDL defines the data; this group includes queries such as create or drop. Another group, the DCL, controls the data; the queries that provide this functionality are grant and revoke. The last group, DQL, deals with query formulation. Unlike the previous groups, there is only one command here, namely select, which is sometimes included in the DML group [6]. The unit that stores all the information is the table. Without this data container, we cannot do anything. Therefore, the first command used in SQL is the command to create a table:

CREATE TABLE table_name (
    column_name1 data_type,
    column_name2 data_type,
    ...
);

Once we have created the tables and introduced the first data into them, the database is ready to handle further queries about the information we are interested in. One of the most important functions of a database management system is searching, which is also one of the most commonly used query operations. It is performed using the select command. The query syntax looks like this:

SELECT column_name, ...
FROM table_name
[WHERE condition];

A condition with a where clause is not required; without a condition, all records in the table will be returned. Another feature is data editing. This is done using the update statement:

UPDATE table_name
SET column_name = value, ...
WHERE column_name = value;

A subquery may appear in the condition in order to reach the fields that are required. The next basic operation is deleting data from the database. This is performed using the delete query:

DELETE FROM table_name
WHERE column_name = value;

The insert query introduces new data into the relevant table:

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

The order by clause allows data to be sorted in ascending (ASC) or descending (DESC) order:

SELECT column_names
FROM table_name
ORDER BY column_names ASC | DESC;

In conclusion, SQL is simple and readable for most users. Compared to lower-level languages, learning SQL is much easier. However, while the syntax may be easier for users, there is no full control over how the processing is performed, which is usually associated with a less optimal use of resources. It has to be said, though, that with today's memory standards, processor speeds, and the rapid development of technology, this is a relatively small disadvantage.
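As a small illustration of how these statements work together, the hypothetical sketch below creates a simple cars table, loosely inspired by the dealership example used later in this chapter, and runs each of the discussed operations on it; all names and values are made up for the example.

CREATE TABLE cars (
    body_number VARCHAR(17) PRIMARY KEY, -- unique identifier of the car
    model VARCHAR(50) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

INSERT INTO cars (body_number, model, price)
VALUES ('BN0001', 'Example Model', 21500.00);

-- correct the price of a single car
UPDATE cars SET price = 20990.00 WHERE body_number = 'BN0001';

-- list all cars cheaper than 25 000, most expensive first
SELECT body_number, model, price
FROM cars
WHERE price < 25000
ORDER BY price DESC;

-- remove a car that has been sold
DELETE FROM cars WHERE body_number = 'BN0001';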

2.5 Modeling Process
Before a physical database can be modeled, a logical model must be created. The logical model specifies the basic requirements of the database and the assumptions that must be met. Only after the database has been designed and checked, for example for unnecessary redundancy, can the project of creating a physical database be implemented. If the project is not thought out, users who need the data may have trouble using the database in the future, or they may receive inaccurate information [5]. Modeling requires an analysis of the types of data for which the database will be created. With a good understanding of the data, we are able to design the database so as to prevent redundancy. A well-prepared design also affects subsequent handling; for example, it will be possible to obtain identical data using different queries and to choose the most optimal solution in a given situation [5]. To design the database well, it is necessary to follow several rules established as best practice. The first rule is to check whether the database will be able to handle basic and possibly sudden requests for information. The second is to make sure that each created table describes exactly one topic; each table must have a master key that uniquely identifies its records, and the distribution of information across the tables should contain as little redundant data as possible. The third aspect is to ensure data integrity at three levels of the database structure, namely at the level of the field, the table as a whole, and their relationships. Guaranteeing integrity at each level ensures that the data read from the database is correct each time. At the lowest level, integrity is maintained by the type of data used in the tuple attributes, some of which are shown in Table 1. Data integrity depends on accuracy, correctness, validity, and timeliness. Type names may vary in some programs; for example, in Microsoft Jet, the Int type is recognized as Long Integer and Tiny Int as Integer [5]. The last thing to keep in mind is the future updating of the database structure. At some point, it may be found that the current shape of the database is outdated and needs to be modified. Therefore, when designing, it is required to take future updates into account; base structures should be easy to modify [4]. If the project of creating a database is properly executed, easy operation of the database will be ensured. Changes that are made to certain columns, simply modifying the values in the fields, will not significantly affect the behavior of the rest of the database. Well-arranged tables and well-defined relationships between them can speed up reading information from the database. Additionally, such an optimized database will also be easier to use in utility applications [5]. To create a database, a user has to go through three stages: requirements analysis, data modeling, and normalization. The analysis of requirements depends heavily on the subject for which the database is created. It should be carefully checked which information is needed, so that it is not necessary to change the entire database later because, for example, one important piece of data was not included for the issue worked on by its future users. In the modeling phase, an important part is the graphical representation of the structure of the database. One of the most common methods


Table 1 SQL Server-compatible data types [7]

Logical data type | SQL Server data type | Range of values | Memory usage
Integer (exact) | Int | Integers from −2,147,483,648 to 2,147,483,647 | 4 bytes
Integer (exact) | Small Int | Integers from −32,768 to 32,767 | 2 bytes
Integer (exact) | Tiny Int | Integers from 0 to 255 | 1 byte
Decimal (exact) | Decimal | Integer or fractional numbers from −10^38 to 10^38 | 2–17 bytes
Floating point (approximate) | Float | Approximate numbers from −1.79E308 to 1.79E308 | 8 bytes
Floating point (approximate) | Real | Approximate numbers from −3.40E38 to 3.40E38 | 4 bytes
Character (fixed length) | Char | Up to 8,000 characters | 1 byte per declared character
Character (variable length) | Varchar | Up to 8,000 characters | 1 byte per stored character
Boolean (logical) | Bit | 0 or 1 | 1 byte
is the entity-relationship diagram (entity relationship diagramming). With it the columns, fields and primary and foreign keys are determined [5]. The next stage is normalization. This is a process that aims to remove as much redundant data as possible and is designed to help prevent problems with inserting, minifying, and deleting data from records. This is done by dividing large tables into smaller ones. During this phase, the data structure is checked for compatibility with normal forms and any corrections are made. There are several normal forms, each of which is designed to eliminate a certain type of problem. Normal characters currently in use are the first normal character, the second normal character, the third normal character, the fourth normal character, the fifth normal character, the boycott normal character, and the normal key/domain character [4]. As it can be observed, the process of creating a database is not simple. However, proper preparation of the database plans in the project phase will prevent future difficulties resulting from the poor implementation of the logical database model and, besides, can improve its functioning [4].

740

A. Bogusławski et al.

2.6 Cloud Databases Database-as-a-service or DBaaS is a cloud-based service. This service allows its users to use all benefits a database offers without setting up their own hardware to store data. Meaning, that costs can be saved on the maintenance of servers, installing and maintaining database software, outsourcing this to specialized companies that utilize economies of scale. Companies and individual users alike can therefore focus on their core tasks. Another major advantage is also the flexibility and pay-as-youuse nature of DBaaS, allowing for further optimization. If a company experiences a sudden spike in database resource use, it can very quickly scale up and, depending on the contract signed, at relatively little cost [3, 8, 9]. The main benefits of this service are: • Cost-saving—Companies and individuals save on upfront costs coming from buying expensive hardware and software, hiring staff to maintain them, space where equipment is kept and required security measures [10, 11]. Another important source of cost-saving is the fact, that the use of database resources generally fluctuates. Companies, therefore, tend to buy enough equipment to cover peak demand, which leads to great inefficiencies, with some companies utilizing less than half of capacity on average [12]. • Scalability—As mentioned in the last point, DBaaS allows its users to scale up and scale down as needed in mere minutes. This increases efficiency, further decreasing costs. • Simpler management—the use of DBaaS eliminates the need to manage onpremise databases. Most if not all of the burden coming from the management of databases is done by the provider [8]. • Fast time-to-market—this is mostly relevant to larger companies, where security procedures would require approval and sometimes reconfiguration of the databases to allow the use of new developments, software or updates. If using this service may be able to bring these new developments to market in several minutes time [8]. • High security standards—providers typically provide the highest available security and with that encryption and identity and access management tools [8]. • Reduced risk—services and security are often guaranteed by providers, who are bound to compensate any down-times caused or lower quality of service, reducing financial risks [8, 13, 14]. • Higher quality software—providers offer a selection of specialized, safe and highquality software to manage the resources used [8]. • Lowered barrier for companies to enter the market—another benefit that needs to be mentioned is the effect this has on barriers of entry. Small IT companies were often limited by the aforementioned costs of hardware. The much lower costs of DBaaS have substantially decreased barriers for new companies and startups to enter the market [15].

Creating Database Models in Rational Data Architect

741

Although it needs to be mentioned that DBaaS offers several disadvantages that are often missing in the discussions surrounding this topic, some of them are listed below: • Implementation effort—while this may not be considered by small or new companies, this may be a major point for companies that may want to switch to DBaaS. The costs, time and effort of implementing such a major change to the IT infrastructure may make it a less ideal option [15]. • Dependence—as major parts, if not all data is stored in the servers of a provider, it makes the company in question dependent on it [15]. • Network issues—using cloud technologies makes the company more dependent on the internet and the quality of the connection between it and the provider. This may make the company’s performance subject to both latency and bandwidth issues [15]. • Lack of control—a company using DBaaS services is simultaneously giving up direct control over the hardware and data stored. • Corporate espionage concerns—while corporate espionage has not been proven to exist when using cloud services, it is a concern that needs to be addressed. It is not impossible for a less reputable cloud-solutions provider to misuse their access to stored data for illicit purposes [14, 16]. Security should therefore be taken into account when deciding to use such services. Strong encryption should be used on all data communication between the company and the provider [17].

3 Database Modeling Programs With the development of any field in computer science, whether it is computer graphics, computer networks, or programming, many new tools are created with an increasing range of features, facilitating more complex work in a given field. This is no different from databases. On the Internet we find numerous applications supporting the operation and creation, including modeling databases. However, not all new tools improve on the previous ones. In the field of computer science, these are mainly applications or similar technologies used to deal with certain problems. A selected few are listed and analyzed below.

3.1 Rational Data Architect Rational Data Architect (RDA) was created by IBM. However, it has to be noted that the application name was changed to InfoSphere Data Architect. This application was created to support modeling and data integration. The RDA has built-in options that help a user understand the information and the dependencies between them. Some include information mapping, modeling, and database analysis. The application is not free but offers a 30-day trial version. For novice database users, it may prove

742

A. Bogusławski et al.

Fig. 5 History of changes for a file in CVS

too complicated. The interface is based on the Java-based framework. RAD works not only with IBM products but also with almost all of the most popular database servers that can be connected via the JDBC controller. In addition, the application features an extensive help function for the program, which includes a glossary of the words and a hierarchical view of the elements of the model, which will help in the management of the entire project [18]. The most important features of the program are: • designing logical and physical models for DB2 (relational server Information Management Software family), Oracle, Sybase, Microsoft SQL Server, MySQL, and Teradata, • elements from a logical and physical model can be represented graphically using Information Engineering (IE) or Unified Modeling Language (UML), • the ability to import existing logical or physical models and make changes to the database of the model, • improving data quality and consistency by defining and implementing standards of name, meaning, value, relationship, privilege, and privacy, • allows incremental design, that is, the next addition of functionality or data, in a word consists in expanding the database, • the ability to generate a physical model based on a logical model, or to move from logical model to a physical one, • it has different dictionaries of server data types that are needed to generate a physical model, depending on which server is required for use, • provides a standard group work tool, such as the Concurrent Versions System (Fig. 5).

3.2 MySQL Workbench The MySQL GUI Tools Bundle is a set of popular and useful graphical tools for managing MySQL 5.0 databases. The MySQL GUI Tools Bundle is no longer developed by MySQL and has been replaced by MySQL Workbench. MySQL Workbench

Creating Database Models in Rational Data Architect

743

Fig. 6 SQL editor in MySQL workbench

allows database administrators and data architects to graphically design, model, generate and manage databases. The application contains everything that is needed, such as options for creating Entity-Relationship (ER) models or reverse engineering. The reverse engineering process connects to the selected database, and the program automatically creates an Entity Relationship Diagram (ERD) of the corresponding database or part of it. In the normal process of creating a database, such diagrams are created in the initial phase and only from them is a database created. MySQL Workbench also has a SQL query editor where it is possible to create, execute, and optimize queries [19]. The most important features of the program are as follows: • ability to generate SQL scripts (Fig. 6), • SQL command history and SQL syntax highlighting (Fig. 6), • an overview mode of operation in which the entire base model is presented in a single cross-sectional view, • support for the design of databases at conceptual, logical and physical levels, • export model as Create SQL script, • import and export of DBDesigner4 models, • export models in PNG, SVG, PDF, Postscript formats, • Visual representation of tables, views, built-in procedures and functions, • Automatically arrange tables in a diagram, • view key server information (Fig. 7), • user management (Fig. 7).

744

A. Bogusławski et al.

Fig. 7 User management and server status in MySQL workbench

3.3 DbDesigner DbDesigner is a completely free tool for both home and commercial use. What’s more, the application is developed under an open-source license, which allows legal and free copying, both of the source code and any modifications to the source code. This allows developers to easily add new plug-ins themselves and develop the functionality of the program. Models created in DbDesigner are stored as XML files. Thanks to them, they can be easily modified, not only with DbDesigner [20]. One of the major advantages for DbDesigner users, as previously emphasized, is the fact that the application runs under a freeware license and that it is still being developed. A great advantage is a clear interface that makes working with DbDesigner intuitive, and the application is good for novice users who are just getting acquainted with database modeling. The interface is divided into two parts. The first is design, where most of the necessary elements are presented on the sides of the screen, and between them is left space for the graphic design of the created model. The second part is the query interface (Fig. 8), where you can work on data in tables by building SQL queries after you are in connection with a database. The documentation was made available on the program’s home page [21]. The most important features of the program are: • • • •

a window containing a bird’s-eye view of the diagram (Fig. 9), available objects, such as tables, relationships, labels, regions, plug-in interface, design interface and SQL query interface,

Creating Database Models in Rational Data Architect

745

Fig. 8 SQL query editor in DBDesigner

Fig. 9 Navigation panel

• • • • • • • •

the ability to create queries directly from the diagram (Fig. 10), reverse engineering of MySQL, Oracle, MSSQL and ODBC databases, support for weak entities, support in the creation of documentation, the model design can be kept as a graphic, data types can be defined (Fig. 11), the ability to keep the model in the database, history of executing SQL commands.

The presented programs have all the sub-functionalities needed to model the database. The interfaces in each case were intuitive and similar in operation. These more professional applications also have the ability to connect to the server and create a physical database based on the model. In conclusion, each program has met the expectations for creating a database model.

746

A. Bogusławski et al.

Fig. 10 Create SQL queries from a diagram

Fig. 11 Editor create your own data types

4 Create a Base Model In each of the described in section four, the program carried out the process of creating a table, records, that is, in general, a database model. The following section shows this process and other functionalities that make modeling a database much easier. After that has been done, an evaluation of the programs will be carried out.

Creating Database Models in Rational Data Architect

747

4.1 Logical Model Design As mentioned earlier, the preconditions and build of a logical model must be defined before working with modeling programs. The database is intended to meet a certain purpose according to certain pre-established or forced assumptions, for example, by the industry. For the purposes of this paper, it was assumed that we were commissioned to create a database for a company that sells cars in a car dealership. The industry for which the database will be created is known, so it is possible to specify what functions it should perform. The database schema reflects all the actual attributes and dependencies of objects within the specified subject. The purpose of the designed database is to store information about cars in the showroom, customers and placed contracts and orders. The data that will be stored in the database is: • • • • • • •

Car salon workers, Car salon customers, cars in the showroom, orders of cars by customers, orders for new cars for the showroom, sales invoices for customers, or new cars to the showroom, car repair data.

4.2 Database Preconditions When determining the initial assumptions, a deeper analysis of the operation of the car dealership needs to be conducted. This way, when tables are created, it is known what restrictions need to be applied to tuples. A list of examples of car dealership assumptions and information is as follows: • cars in the showroom are identified by the body number, • basic information about cars, for example, price, engine power, tank capacity, fuel consumption per 100 km, • the car has a 2-year warranty from the date of purchase, • during the warranty, the customer can bring the vehicle for repair (Fig. 12), • salon customers are divided into regular and new customers (the customer becomes a regular customer from the third purchase), • orders for new cars (Fig. 13), • employees place orders on behalf of customers, • invoices after five years from the date of their listing are automatically deleted, • employees can order new cars to the showroom.

748

A. Bogusławski et al.

Fig. 12 DFD—car repair

Fig. 13 DFD—order for new cars

4.3 Defining Functionality The question must be asked what databases to implement. After the requirements are outlined, the functionality must verify that the database has all the information that is needed to meet the user’s requirements. If not, then missing tables, tuples, or relationships need to be added. Nonetheless, once all the scheduled functionalities have been created and functioning properly, the appropriate queries in SQL can be written. The basic requirements for the car dealership model that is being created are as follows:

Creating Database Models in Rational Data Architect

• • • • •

749

the employee places an order for new cars (Fig. 13), the customer buys the car through the employee (places an order), customer search search for cars in the showroom customer’s repair report to the employee, and the employee submits the relevant documents (Fig. 12).

4.4 Data Flow Diagrams The employee who accepts the customer’s report of damage to the car purchased from the dealer is first obliged to check the warranty. Additionally, an employee would check whether the salon service is obliged to make repairs. With a positive result, the employee fills out the relevant documents, and the car is transferred to the workshop (Fig. 12). Another situation that is presented in Fig. 13 is the placing of an order for new cars by an employee. The supplier receives information about the demand of the showroom for specific car models. The supplier fulfills the order and invoices together with cars accepted by the employee in the showroom. After checking that the order is complete, the car dealership database is updated. The selected database design considerations are implemented before starting to model the applications. With a specific outline of data and functionality, the designating can focus on modeling the database and writing SQL queries that will return and perform the tasks needed to run the car dealership.

4.5 Rational Data Architect The program’s home screen is virtually identical to eclipse. People familiar with this environment will easily find themselves in this IBM product. Each of the panes is interactive. They can be minimized to the sidebars, stretched, and moved to the users liking and convenience of work. A help pane appears immediately with a page explaining how to create a logical base model to help in getting started (Fig. 14). Each page can be printed or bookmarked. In the help, it is possible to switch between: • standard search in terms of the issue of interest, • a list of content that is divided by theme, • bookmarks, where the user can save the pages that interest them so that they will find an interesting topic more quickly, • an index containing all the issues relating to databases and the program sorted alphabetically.

750

A. Bogusławski et al.

Fig. 14 Rational data architect home screen

Before starting to model a database, it is required to create a database project. For each project, folders of each category are automatically created, including data diagrams, model mapping, data models that are represented in diagrams, XML schemas, and SQL scripts (Fig. 15, left). In the right frame of the design window (Fig. 15), there are elements used to create a logical model. There the user can find entities, generalization, and all types of entity relationships, such as a one-to-one or many-to-many relationship. Relationships can be split additionally for identifying and non-identifying reasons. In addition to the fact that each of them is presented graphically on the diagram, they also have a different task. An identification union means that a child board cannot be uniquely identified without a parent. This relationship is used to model many-to-many relationships, where the key array that connects the remaining keys are the individual keys from the other two tables. For example, we have three account blocs, a person (who is to have an account) and a third board connecting the client to the account assigned to him: Account (id_konta, nazwa_konta, typ_konta) Konto_osobiste (id_konta, id_osoby, s´rodki_na_koncie) Person (id_osoby, first) The relationship between the Account board Konto_osobiste and between the Person and the Konto_osobiste identifies the row in the Konto_osobiste. No tuple in

Creating Database Models in Rational Data Architect

751

Fig. 15 Start modeling at rational data architect

this cannot exist without a defined account or person. Konto_osobiste cannot exist when there is no person with an account or when there is no account that the customer could have. A non-identifying relationship is when each table can be identified independently through its private key, for example: Account (id_konta, nr_konta, id_typu_konta) Typ_Konta (id_typu_konta, name, description) The relationship between the Account array Typ_Konta is not identifying because the account type can exist without the need to assign it to the account. Using the right panel in the modeling window (Fig. 15), we drag the elements of interest to the field to perform the diagram (box marked with a blue frame, Fig. 16). See the “Cars” entities with the already added key and one regular attribute. The red box contains table properties and where it is possible to add, remove, and modify entity attributes.

752

A. Bogusławski et al.

Fig. 16 Rational data architect’s database modeling process

To the left of the frame is a menu with table properties, such as generalization information or entity relationships. The green frame shows the type change for the tuple. The predefined data types that the user can select. There it can be determined whether an attribute is required or has a default value. In Fig. 17, you can see that the model diagram looks clear. The master key and foreign keys are always at the top and are clearly separated from the usual table attribute. Each of the relationships has an “attached” name to each other, so even when swapping table seats, we will not get lost in relationships between them. You can automatically generate a SQL script from the created model. In this script, you will find the code responsible for creating each table, along with its tuples, attributes, and relationships that appeared in the diagram. This feature in the program saves a lot of time for the designer in implementing the model, especially with a more extensive system. An example of the generated code is shown in Fig. 18. Another functionality that we have at our disposal is the analysis of the made model. Figure 19 shows the options and the sample result of the analysis. On the left, in a red box, two Options Are Shown Analyze Impact and Analyze Model. With this option, analyze impact helps you show the impact of changes to the model before they are actually implemented into it. The result of the action is presented graphically together with the report in text form for transparent and accurate analysis by the designer. The second option, Analyze Model run on a given model, checks

Creating Database Models in Rational Data Architect

Fig. 17 Final model of the created database in rational data architect

Fig. 18 Generating SQL code in rational data architect

753

754

A. Bogusławski et al.

Fig. 19 Analyze model

it for compliance with standards and logically for two of the same fields in a given table, for example. At the bottom of the window, in a green box, all the program has spotted the errors and warnings about the model. When you double-click on an error or note, you are moved to the table, field, or relationship where the problem occurred. After making corrections and performing a successful analysis, we can be sure that the model that was created complies with the standards introduced in the entity-relationship diagrams. The next option is so-called mapping. The purpose of these features is to better understand the completeness and integration of models. A mapping model is a summary of common characteristics between two independent data sources. When we add models of interest to mapping, we have several options to choose from. Figure 20 shows the full model on the left and the model stripped down to several tables on the right. In the context menu, you can see the Find best fit and Find similar options. The second option is less accurate than the first. Selecting Find similar has combined the name fields with the position that came from different tables. The initial option, i.e. Find the best fit, gives a more precise result. After mapping, you notice that this field has been linked with identical names from the same tables, and for example, the client’s table has been completely omitted by the match.

Creating Database Models in Rational Data Architect

755

Fig. 20 Mapping database elements in rational data architect

A database model was executed, and several additional outbound modeling features were provided, albeit helpful in creating the database. The next program is described below, checking to see if I offer similar solutions.

4.6 DbDesigner DbDesigner is much less complex than Rational Data Architect, as seen from the application’s home screen (Fig. 21). The program lacks built-in help or a tutorial. However, the style in which the menu was created is so intuitive that the developers of the program could afford it. Users don’t need to do anything else to start modeling, such as creating a project as in a previously described application. It is possible to start modeling as soon as the program starts. On the left side of the program window, it will place tools to perform the model database that is being created. Arrays, relationships, and regions are available to divide the model into smaller parts. A field navigator is placed to the right of the window to create the model. The tables are illustrated there, so we can organize the model more easily visually. Below, the green box contains the most commonly used data types used in creating field attributes. In the tab next to it there is access to the full list of types, which are divided thematically, numeric types, data and time types or string types. In the last table named Db Model, there are entities with columns and relationships in the form of a drop-down tree. Figure 22 shows how to create a single table. At the very top of the window, starting from the left, we have the name of the table, the table prefix, and the table type. MySQL supports several table types, such as BDB or HEAP. However, the most commonly used type is MyISAM, which is set by default, and InnoDB, which

756

A. Bogusławski et al.

Fig. 21 DBDesigner home screen

supports data compression and row-level locking. As the first tuple, the program automatically creates a key to the table and gives the name based on the name of the array, for example, “idscars” for the array “scars”. Right after the name, the data type is set. It allows to type or select a type from the drop-down list. The next two columns are checkboxes. It indicates whether there may be no data in the tuple, NN (Not Null) and whether there is an automatic addition/change of values, or AI (Auto Increment). Users can then determine whether the values they type with or without a signed character. It is also possible to determine whether the tuple will be zeroed in if no other value is entered by the user. In the penultimate spine of me, we can set the default value of the tuple if no value has been entered. In the end, there is a space for writing any comment, such as the information for the cell to store. Below adding tuples, there are additional options: access to table keys, primary and foreign, if any. In advanced options, users can also set the minimum and maximum number of rows that the table has (Fig. 22). Figure 23 contains a model of the base. In the upper right corner, operators can see the window with the project in miniature. There are seven tables that occupy a small portion of the surface to model. DbDesigner, like most database modeling tools, has the ability to generate a SQL script based on a logical or physical model. Figure 24 provides an example of the generated code for the table creation.

Creating Database Models in Rational Data Architect

757

Fig. 22 Create a table in DBDesigner

Figure 25 shows the options for generating a SQL script. The most conspicuous option is that users don’t have to create a script for all tables. They can select a part of the database by selecting the appropriate views with tables whose list is on the right side of the window. Besides the possibility of entire views, there is an option to choose several selected tables, select the first option, that is, export selected tables only, and as a result, receive SQL queries that create only the selected tables. Another option is to change the order in which the condensate creates tables. By default, arrays are created in alphabetical order. If there are foreign keys in the model, using the appropriate table arrangement option would be recommended, according to order tables by Foreign Keys. This avoids a situation where a relationship-creating PR error occurs when you create an array because one of the tables will not exist yet. Therefore, basic arrays without foreign keys are created first. In the next settings, there are flags for creating master keys (Define Primary Keys), which ensures that keys are set up in the script. Another option is the ability to create point points in the script (Create Indices). The third flag is responsible for creating foreign keys (Define Foreign Key Reference). The last three flags, output table options, Output Standard Inserts, and Output Comments, should be selected when a table contains changes to options or comments that have been added.


Fig. 23 Finished model of the created database in DBDesigner

Fig. 24 Example of a generated script for an array of “invoices”

4.7 MySQL Workbench The last program in which the database model was built is the MySQL Workbench tool (Fig. 26). The interface of the tool is clear. Starting from the left, there are tools to create and manage connections to database servers. Database models are displayed in the middle. Users can open an existing project or create a new one in three different ways. The first is to create a project and build the model by hand. The second


Fig. 25 Model export as SQL script

Fig. 26 MySQL workbench start screen


Fig. 27 Screen after creating a project in MySQL workbench

option allows building the model based on an existing database specified by the user, and the third way is to use an SQL script. In the right part of the interface, server management is at the user's disposal. After connecting to the selected server, the user has access to information such as the number of connections to the server and the state of the system, that is, CPU and memory usage. There is an option to add a new user, change existing permissions, and disable or enable the server. Let us consider the modeling aspect. When choosing to create a new model, the design pane appears (Fig. 27). The tools for creating the model are presented in the form of sections. A separate pane (blue frame) contains all the diagrams that make up the project. Below, in the green box, the tables, views, or groups of the model are shown in the physical section of the database schema. Additionally, it is possible to add database users and their privileges in the Model Privileges section. Below are the sections for SQL scripts and database comments. There are two ways to create tables. The first, of course, is to prepare the model graphically (Fig. 28), and the second is to use the options in the currently discussed window (Fig. 27): the user can choose to add a table (Add Table) and create it without using the diagram environment. The red box lists the data types defined by the user. If a custom data type is needed, it can be created and given any name the user wants. Now let us turn to modeling the database in the ER diagram. The diagram view is shown in Fig. 28. MySQL Workbench has all the basic tools that were also observed in the previous applications. Like them, it has a model-wide overview to help navigate the diagram. The window in Fig. 29 shows all the tables that appear in the sample model as a drop-down tree. In the diagram, where tables and their


Fig. 28 Creating a table in MySQL workbench

Fig. 29 The finished base model in MySQL workbench


relationships are graphically represented, the program also has features for creating views, groups, and new diagram layers that help to group individual tables in a model, for example thematically. The diagram area has been enriched with several additional features: the user can add a text note or a drawing that is directly visible in the diagram. After creating new tables, fields must be created for them. Under the diagram area where the entities are located, there are options for creating table fields. A field is created in a standard procedure: we give the name of the field, select the type of data it will store, and set the checkboxes determining whether it is the primary key, whether the field must have a value, and whether it is auto-incremented. Additionally, each field can be described in more detail in a comment if the field name gives little information about its purpose. After completing the steps needed to create the tables and their relations, another finished model was obtained, which is presented in Fig. 29. The diagram has one more useful feature: when hovering over a table relationship line, all the elements (foreign keys) associated with the connection are highlighted. Being able to inspect the relationships between tables in this way helps greatly in spotting errors such as connecting the wrong table. The above section showed how to create a database model, together with additional options useful for working with databases, servers, and SQL. Each of the presented programs provides similar modeling capabilities, including graphical modeling, a model-wide view, and SQL script generation to create the modeled database.

5 Analysis of Programs As part of the work, database models were created in the selected tools. Below, they are assessed according to specific criteria; for each criterion, a tool can receive from 1 to 5 points. As shown above, the programs are multi-purpose database tools that gather the most important functionality in one place. Therefore, the assessment was divided into two parts. The main goal is to create a database model, so the focus was first on evaluating the following functionalities:

1. the convenience and readability of the interface,
2. database modeling,
3. help in finding errors in the model.

Secondary options, less central to model creation but important in practical terms, were then taken into account:

1. connection to the database server,
2. the possibility of creating a database on the basis of the model,
3. application documentation,
4. additional options.


5.1 Rational Data Architect Rating Let us start with Rational Data Architect (RAD). The first point is the interface. Built on the Eclipse environment, the interface is convenient and clear; each window can be resized or minimized to the sidebars, saving space. The second point is creating a model. RAD has all the tools needed to work on databases quickly. The user can build a logical model in a diagram with a prepared palette of elements, simply dragging the necessary components into the diagram. From the logical model it is possible to generate a physical model, which saves work. Once this step is finished, the model can be analyzed for errors already in the modeling phase, which is a big advantage. RAD can connect to database servers, so users can create a database based on the model. Being an IBM tool, it naturally supports the company's servers best, but it can work with many other servers as well. This versatility is a big advantage, because there is no need to use additional applications or export the model to another program. Another category is program documentation. RAD has extensive documentation built in; it can be searched freely, alphabetically, or thematically. One of the more useful options is the mapping technique, which helps when one gets lost among larger amounts of data and their relationships, for example when working with two databases. In conclusion, RAD contains all the tools for modeling databases and working on them in practice.

5.2 Rating DBDesigner Unlike the previous tool, DbDesigner opens directly in the modeling interface when the application is launched. The application is made in a minimalist way, although it contains everything needed for modeling databases: creating tables and relationships and, by double-clicking on a table, adding fields and changing table settings. One of the drawbacks is the lack of direct resizing of tables or views; to change the size of any element, users must first select the appropriate option and only then drag the component. As for the last category, checking the model for logical errors, DBDesigner unfortunately does not provide it. DBDesigner supports database connections, mainly MySQL, but also Oracle, SQLite, or MSSQL, which makes it possible to implement the created logical model. Unfortunately, the documentation of the program is not available from within the application; information on the use of DBDesigner tools can be found on the manufacturer's website, where the documentation can be viewed online or downloaded as a PDF file. The source materials are extensive: they cover each window of the application and the rules of operation of all options. Overall, DBDesigner does not stand out from other programs in terms of functionality.


5.3 MySQL Workbench Rating The last application to be analyzed is MySQL Workbench. This program has the most transparent interface of all those evaluated. When starting the program, users get an interface divided into three groups: database connections, creating database models, and administering database servers. Each section offers options, mainly for creating connections to a database, a new model, or a new server instance, and each section also has a place where the existing connections, models, and servers are displayed. MySQL Workbench, of course, has all the modeling functionality discussed for the other programs. A distinguishing factor is the ability to highlight a relationship and the fields associated with it, which can help in finding errors. Some cosmetic changes can also be made in the diagram; for example, there is an option to change the notation in which tables and relationships are drawn. MySQL Workbench also does not fall behind the competition in terms of database connections and the ability to create databases based on models. The documentation is included in the application; it not only explains how the individual options work but also includes a tutorial on how to get started with the tool. In addition, MySQL Workbench includes a scripting shell where scripts can be created, for example, in Python.

6 Summary As part of this work, database modeling was carried out and three applications were analyzed. Summing up the Rational Data Architect, DBDesigner, and MySQL Workbench applications, each of them performs its task of modeling databases. There is a certain inconvenience in DBDesigner, although it has all the features needed to create a model. Depending on whether convenience and additional functionalities matter, for example connecting to a server and working on a physical database, one can prefer Rational Data Architect, which supports multiple servers, or MySQL Workbench. Three applications were selected for testing, but there are many other database modeling tools on the market. The choice is wide, so if the goal is not working on very complex databases, it is also possible to choose free tools that will meet expectations in the field of database modeling. In summary, choosing a database modeler is not a difficult decision, given that users can model a project in each of the selected applications. Only when there is a larger project to do, work in a larger group of people, or a need for an additional feature that is not directly related to database modeling, should the choice of application be considered more carefully. When more than just creating a model is needed, for example the ability to create a database on a server, edit data, or manage database users, it is better to choose a more extensive application, such as Rational Data Architect, where creating and running a database is supported by a single program. This way, users do not need any additional tools, which allows them to work on the project more conveniently than constantly switching between multiple applications.


The Dynamic Environment of Pricing in E-Commerce and the Impact on Customer’s Behavior Jozef Sirotnak and Dmitry Ushakov

Abstract With the increased popularity and usage of online shopping, e-retailers are evolving their strategies and modifying the prices of products and services. This leads to different prices being charged for essentially the same good and can cause outrage among customers. The aim of this article is to review relevant sources regarding the problem of fairness and the reactions this pricing strategy can cause in consumers, while also covering examples of this dynamism in the airline and hotel businesses. Results indicate that businesses should aim for a stable long-term relationship with customers to win their loyalty, as customers view small price changes as fair (e.g. roughly around a 5% price difference). Future research can focus on a combination of dynamic pricing and loyalty programs, to investigate whether substantial price differences (e.g. above 25%) could be accepted by loyal customers when additional benefits are included (e.g. a discount bonus for the future). Keywords Blockchain · E-commerce · Pricing

1 Introduction Price is a term known to everyone. It sets the value an individual has to pay for a certain product or service. Price thus has unique strategic attributes, which is also why it is one of the four P's of marketing (i.e., product, price, place, promotion) [1], and its strategic value cannot be overlooked. On the one hand, sellers seek a high selling price to maximize profits; on the other, buyers prefer low costs to get the most of their money's worth. That is why the topic of pricing has been, and continues to be, researched so intensively, with a closer look at customers' reactions to those costs. Following this, a product does not always have a single price; more often than not the price fluctuates, depending on the region in which it is sold or on the development of demand and supply. This strategy of price tailoring is also known as 'dynamic pricing' [2], which essentially means setting the price of goods or services according to customer preferences, data on their purchases, their location, or the availability on the market. The same type of bottle, for example, would have a different cost in a wealthy region with little to no competition than in large multichannel shopping malls [3].

1.1 Relevance Around 1.8 billion people worldwide shopped online in 2018 [4]. Additionally, the sales of e-retailers amounted to $2.8 trillion, with an estimate of surpassing $4.8 trillion by 2021 [5]. With the high popularity of online purchases, the dynamic pricing strategy has become a common practice of price discrimination. Due to the nature of online markets it is very easy to check the prices of other sellers, compare them, and then choose the one that suits the individual. Amazon is probably one of the best-known places to shop online, and its prices rarely stay the same for even 30 minutes: Amazon adjusts product prices 2.5 million times a day, which translates to a price fluctuation roughly every 10 minutes [6]. Another example is the airline business, where the closer the date of the flight, the higher the cost of the ticket (excluding last-minute deals). Those are just two examples; the first demonstrates that e-retailers change prices according to customer data, the latter according to time.

1.2 Goals and Objectives Due to Amazon's pricing strategy, the company had to refund money to customers who bought a particular set of Men in Black DVDs after they discovered a price difference [7]. This case points directly to the problem that the price discrimination strategy brings with it, and it raises questions such as 'Why is it different for someone else, when it is the same thing?' and 'How is this fair?' (Fig. 1). Because fairness is one of the reasons why individuals take certain actions, it is crucial to know how buyers judge price fairness. This is very important, as dynamic pricing, when considered 'unfair', can trigger negative emotions and lead to subsequent negative actions (e.g., legal exposure, switching the seller, spreading negative word-of-mouth) [8]. As fairness is a subjective trait, it is very much the focus of experiments researching the effect of dynamic pricing on individuals, in order to get a better understanding of fairness and to find the balance between a tolerated and a flexible price.


Fig. 1 Conceptual model: perceived fairness of dynamic pricing and its impact on customer satisfaction and behavioral intentions. Source Dai [8]

2 Methodology This paper is a literature review of experiments already conducted on the topic. The initial keywords used in the databases were dynamic pricing, fairness, hotel, and airline. After thorough research, the final set of keywords used for this paper was customer satisfaction, dynamic pricing, fairness, perception, e-commerce, trust, loyalty, hotel, and airline. There are many aspects to consider when covering dynamic pricing in an online environment, whether it is the cause of price dispersion, how firms initiate it, or what the first dynamic price developments were. Even when focusing solely on the approach of dynamic pricing, there are multiple studies [9–12] that examined how to segment customers based on certain attributes, which attributes had a greater impact on price threshold development, and how to develop algorithms to predict price fluctuations [13]. The topic of dynamic pricing in e-commerce is so broad that it could fill multiple literature reviews. This paper mainly focuses on the impact that dynamic pricing has on customers' behavior and satisfaction while maintaining a certain level of trust. The paper starts with a brief overview of the drivers of price dispersion and then continues with a common example of the dynamic pricing strategy (i.e. the hotel and airline business). This is followed by a section covering the perception of price fairness in different purchasing scenarios and different ways of purchasing (e.g. group buying, online auction, etc.). After that comes the part covering customer satisfaction and the maintenance of loyalty by providing a certain transparency in the pricing policy.


3 Literature Review 3.1 Drives of Price Dispersion The concept of adjusting prices is not a new one; it has been adopted sporadically in the past, but in recent years it has taken online business by storm. In layman's terms this strategy, also referred to as demand pricing or time-based pricing, adjusts prices according to the development of supply and demand in real time. It comes with many perks, for instance giving companies greater control over the pricing policy, as they see real-time fluctuations in purchases and current trends. It gives them transparency on the market, with easy access to competitors' price changes, which aids them in adjusting their own prices to maximize profits. With the correct software it allows flexibility while also setting a price limit that reflects the brand's value [14]. But looking past supply and demand, probably the most obvious observation point of dynamic pricing is the wealth of a country. Taking into consideration that supply and demand have an impact on the price, the starting factor is GDP per capita. Mathematically speaking, if GDP per capita rises by EUR 1 thousand, prices deviate from the median by 0.14 percentage points. Gyódi et al. [15] argue in their research that this phenomenon can be seen in the example of Poland and the United Kingdom, whose GDP per capita levels are EUR 11 thousand and EUR 41 thousand, respectively. Based on that, prices in Poland would on average be 4.2 percentage points lower than in the United Kingdom. Their objective was to determine the online price dispersion within the EU using a web-scraping tool (i.e. a technique to examine the drift of prices). In addition to the GDP per capita price variation, a deeper look into a product sample consisting of different kinds of branded items was conducted to observe the interquartile range (i.e. the distance between the upper and lower quartile) of the median dispersion. Those results showed that the magnitude differed between product categories, with household electronic gadgets at approximately 20% and clothing at almost 40%. If the price variation stems from strategic price differentiation on the developers' or dealers' part, then any interference with trade barriers could have unfavorable effects on consumers' and producers' well-being and should be handled very carefully [15]. When researching GDP per capita price dispersion, note that it is not easily bent to certain business models to meet their price expectations. It is certainly one of the reasons why price differences exist in the first place, but supply and demand can be influenced more by the businesses themselves, which is why more studies have focused on that part.
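To make the magnitude of the effect reported by Gyódi et al. [15] concrete, a back-of-the-envelope check, assuming the 0.14 percentage-point effect is linear in GDP per capita, reproduces the Poland versus United Kingdom figure:

\[
\Delta p \;\approx\; 0.14\ \text{pp} \times \frac{\text{EUR } 41{,}000 - \text{EUR } 11{,}000}{\text{EUR } 1{,}000} \;=\; 0.14\ \text{pp} \times 30 \;=\; 4.2\ \text{pp}.
\]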

3.2 The Online Environment of Airline and Hotel Businesses Transitioning into the online markets, how do sellers determine the price of a certain product and its predicted fluctuation in value? In an online market situation Ozer


[16] delved into the importance of maintaining flexibility in the fast-growing online business. Flexibility can, for this specific topic, be dissected into different subcategories (i.e. technological, human resource, operational, marketing, financial, and managerial). The marketing category, which focuses on lowering the unpredictability of the changing nature of online businesses, includes the flexibility of pricing. The problem of finding a price balance that benefits both the customer and the seller remains unresolved for many e-businesses. A very popular form of pricing flexibility is found in the hotel and airline business, where people are charged differently based on their travel habits. Additionally, many airlines implement the strategy of setting a lower price one or two days before the departure date in order to capture otherwise lost profit [16]. Following this, the airline industry is also a good example of involving consumer segmentation in price setting. When people try to book a flight online, they face a decision about which class to choose. That is because airlines take advantage of dividing potential customers into pre-defined groups: casual travelers, who are mostly price sensitive and flexible when it comes to the time and date, and business travelers, who on the other hand treat the time and date as first priority and the price as secondary [17]. Airlines usually practice what is referred to as 'yield management', essentially meaning that consumers face lower prices when they book their reservations months ahead, but have to endure higher prices if they delay the bookings. It is also worth mentioning that airlines offer special deals whose prices are significantly lowered, with the sole purpose of capturing otherwise lost profit [3]. Passengers who have to endure higher prices will perceive the whole travelling experience as disappointing; without any base of trust and fairness judgement developed by the airline, customers will not feel satisfaction, only triggered negative emotions [18]. Regarding hotels and their common practice of dynamic pricing, Melis and Piga [19] looked at two consecutive seasons (2015 and 2016) over a six-month period to establish a potential pattern in the fluctuation of hotel prices. The researched topic was the probability of booking more days at a hotel in four major travel destinations (i.e. Sardinia, Sicily, Corsica and the Balearics), depending on the hotel star classification (Fig. 2). The results indicated that the usage of dynamic pricing is not as widespread as the literature hints, and there was no evidence confirming that hotels constantly adjust the prices of rooms. Highly established hotel chains with favorable qualities (i.e. a four-star or higher rating) engage in price fluctuation more frequently, which leads to the assumption that the star rating might be the strongest differentiation factor. Furthermore, hotel and destination businesses mostly thrive during high-demand periods, such as summer vacation destinations during the summer, and prices rise in those high-demand times, as hotels expect to be booked out during those months.


Fig. 2 Estimated probability of a price increase given that a price variation is observed during the booking period—by star classification and number of days from query to stay. Source Melis and Piga [19]

3.3 The Perception of Price Fairness

3.3.1 Price Mechanisms

When talking about pricing on the Internet, there are three different types of mechanisms. The first type is the 'take it or leave it' deal, where the price can change over time, either periodically or more frequently; additionally, prices can be tailored to customer attributes such as location or purchase history. The second type takes a more interactive shape, where buyer and seller negotiate the price back and forth over the Internet. The third type actually bundles three subclasses: auction, reverse buying, and exchange. An auction is an interactive selling system in which the seller offers a certain product and buyers present the price they are willing to pay; it differs from the first two types in that it brings the buyers together to compete for the winning bid. Reverse buying is a request for proposal from the customer's side for a product, for which the price is established through competitive bidding among available sellers. An exchange is more or less the equivalent of a stock exchange, where multiple sellers and buyers meet each other [3].

3.3.2 Internet Fair Prices in Different Scenarios

Huang et al. [3] argue that consumers expect online prices to be cheaper than in regular brick-and-mortar markets; thus expectation plays a crucial part in determining what is fair. If a price is lower than anticipated, a customer experiences this as a win and perceives it as fair. On the opposite side, when customers face a surprisingly higher price, their view changes to unfairness, as it is experienced as a loss. Purchase choices basically come down to these two attributes, win or lose, and a rational individual would not want to hurt himself by preferring a loss. While conducting a survey to measure the fairness of online prices, specifically the charge for a stay in a hotel, Huang et al. [3] came to the conclusion that respondents expect cost savings from booking online: if the price is the same as in the traditional channels, it is seen as unfair, while roughly 8% of cost savings is considered fair. Furthermore, referring to the different types of price mechanisms mentioned before, it was measured how an auction would fare in terms of fairness. Surprisingly, the higher price was perceived as more acceptable than expected. Because of the nature of an auction, with people actively bidding, attention was diverted from the price to the act of bidding itself, and the price was therefore not judged so harshly. If the general demand is high, it becomes tolerable to pay more, compared to having no choice because the market has the power to raise the price at will. A further surprise in the survey questions about negotiation was that almost the same number of respondents thought this price mechanism is fair as thought it is unfair. This outcome showed that the feeling of fairness also involves the seller, with buyers feeling a certain level of sympathy and genuine discomfort when they sense they are exploiting the seller.

3.3.3 Illusion of Control and Lateral Consumer Relationship

Lee et al. [20] focused their attention on price fairness in an online dynamic context under two of the most common price mechanisms, specifically studying the illusion of control and the lateral consumer relationship, that is, whether these factors affect the perception of fairness when consumers feel they are directly or indirectly involved in the dynamic pricing process. They conducted a laboratory experiment to test their hypotheses under two contexts, the online auction and group buying (i.e. product prices are reduced when a certain number of customers make the purchase) (Fig. 3). Defining price fairness as a customer's feeling towards a price difference between two parties and their judgement of whether or not it is reasonable, valid, or tolerable, how does this definition relate to the illusion of control and the lateral consumer relationship? In their study they predicted that, as the illusion of control and the lateral consumer relationship have approximately the same impact on fairness perception, both would trigger the intention to purchase within consumers. For clarification, the illusion of control was first identified by Ellen Langer in 1975 as a personal feeling of success which is far higher than the objective possibility would permit


Fig. 3 Research model of illusion of control and lateral consumer relationship. Source Lee et al. [20]

[21]. People tend to believe they have more control over an outcome when they perceive a connection between their actions and the outcome, even though the result is based only on chance. This still creates an illusion, but it is only observable when participants are involved in the transactions [20]. An example of the illusion of control is the Monty Hall problem, where a participant can choose one of three doors, two of which have a sheep behind them and only one a brand new car. After the participant chooses a door, the host of the show reveals one of the two remaining doors, behind which is a sheep, and asks the participant whether they wish to switch their chosen door. Even though it would mathematically be beneficial to switch when the option is given (the chance of winning rises from 1/3 to 2/3), people tend not to do so, as they feel they had more control in the first choice and suspect the host wants to trick them into making a bad decision. Regardless of the outcome, whether they won or lost, participants perceive it as fairer when they believed they had control and did not let the host trick them into switching. This example led to the belief that the illusion of control has a positive impact on price fairness perception. The lateral consumer relationship means an exchange of information between customers, namely the price they paid for a certain product or service, and it causes either a perception of fairness or of unfairness. Either an individual finds out he or she paid less than someone else, which is called an advantageous lateral consumer relationship, or the opposite is the case, the situation is seen as unfair, and the reaction is negative, which is called the disadvantageous lateral consumer relationship. This led to the further belief that an advantageous lateral comparison positively impacts the fairness perception. Note that the study by Lee et al. [20] had the limitation that participants could opt to sell items during the auction process if the price ended up under a certain threshold, and that the focus was on purchase intention rather than the actual act of purchasing. Nevertheless, the results showed that the lateral consumer relationship was the most impactful factor. This is an interesting outcome, especially because it was mainly observed in the group buying mechanism. It implies that when a product is being considered for purchase and participants realize the price is lower than what another party paid, it is considered fair and also greatly boosts the intention to purchase. This finding is not that ground-breaking: as mentioned before, an individual would not want to hurt himself by paying a higher price than someone else. Obviously this results in a perception of fairness, even without any given reasoning behind the lower price, and it has a positive impact on the purchase process. Similar evidence was observed by Corbitt et al. [22]


where the results, which were not specifically tied to a certain price mechanism, indicated that potential customers tend to listen more to other customers than to the advertising of the firm. Word of mouth, in comparison to the firm's advertising, affected people's purchase consideration significantly and also affected their level of trust towards the company. If one group of participants finds out the price of a product is higher for them than for others, sellers can decrease the perceived unfairness by placing additional emphasis on services or features of the product; the higher price is perceived as fairer when there is at least a little reasoning behind it or some additional benefit that the other parties did not receive. In addition to the lateral consumer relationship, the illusion of control also had its own impact on the fairness perception in the environment of an online auction. As individuals experienced more control over the price setting, they became susceptible to higher prices, even though this had a positive impact on purchase intention, which contradicts the previous statement that rational individuals would not want to hurt themselves by paying a higher price for essentially the same thing another party bought for less. Bearing this in mind, sellers can potentially exploit this illusion of control and lure customers into buying their products while charging them a higher price.

3.3.4 Disclosure of Information: Level of Transparency

The example of a group buying a product at a higher price, which was perceived as fair when the seller included additional information, leads to another part of the perception study, namely the transparency provided by sellers. By transparency is meant the reasoning behind the rise in prices from the seller's side. Either the seller discloses all relevant information regarding a price, which constitutes true transparency and includes all the relevant information on how, for example, the price of a flight ticket is composed (e.g. the price including all taxes and fees, additional charges for luggage, etc.), or the information is revealed through mainstream channels such as social media, which make it public. When price changes become public through channels other than the company itself, this damages the company's image, because it is perceived as having something to hide and as distancing itself from explaining the reasoning behind the changes [23]. Ferguson and Ellen [23] acknowledged in their paper that past studies found that when a firm provides justifiable reasons behind a price rise, customers may be able to understand it and refrain from negative actions towards that firm. The problematic part for firms is how much they are willing to disclose. Another point was that if firms made their price adjustments public, customers were more willing to buy more expensive items. Ferguson and Ellen [23] confirmed in their paper that disclosure of information from the firm itself has a greater impact on price fairness perception than disclosure from sources outside the firm. Overall, it was shown that small price increases were better justified by short explanations than by long, detailed ones, and similarly that a more substantial price rise was more justifiable with a detailed explanation than with a short reason (e.g. due to transportation costs). When consumers only face small changes in costs, this usually does not affect the purchase choice much, which is why a short


reasoning is good enough, otherwise customers might read too much into the long, detailed explanation, overthink or simply put the truthfulness in question [23].

3.3.5 The Role of Satisfaction and Customer Loyalty

What is the customer's viewpoint when a firm decreases the price of an online product? Based on online reviews, the first price decrease is mostly viewed negatively, while a second price decrease is rather seen as positive [24]. Even though a lower price would be expected to have a positive impact, the study by Lee et al. [24] shows that it is not always welcomed; based on their example of online reviews and star-rating development for Amazon's Kindle product, online merchants should be careful and strategic about their price policy and thoroughly scan consumer data and the competitive environment to adjust prices accordingly. What would happen if customers found out they had been charged a higher price than others? As mentioned before, a significant price difference leads to an attitude of unfairness among customers [7], as well as to negative emotions and a negative impact on trust towards a firm the customer has been loyal to (Fig. 4). A study conducted by Santos and Basso [25] on Brazilian students tested the hypotheses that unfair price perception leads to negative word-of-mouth and to potential seller switching. Their findings concluded that the two mediators have complementary effects, meaning that when the value of one mediator rises the other one decreases, but also that unfairness had a stronger influence on negative word-of-mouth compared to trust (i.e. the correlation factor of negative word-of-mouth was positive while that of trust was negative). Additionally, the findings show that trust affects the switching potential more strongly than negative emotions. This indicates that firms should strive to maintain a stable trust relationship in the long run, as this is a key determinant of the customer's feelings towards the firm. Similar evidence

Fig. 4 Theoretical model of unfairness leading to behavioral intentions. Source Santos and Basso [25]


resulted from the study by Kim et al. [26], which focused on investigating the effect of perceived fairness and risk on trust and satisfaction. Trust affected the whole purchase process (i.e. the pre-purchase phase, the purchase, and the post-purchase phase), influencing not only the intention to purchase but also the customer-seller relationship in the long run. If a customer perceives a situation as risky, the intention to purchase is negatively affected, but by building trust, customers become more inclined to continue purchasing from the firm. Another point linked with satisfaction is expectation. If a customer's expectations are high, they are harder to fulfil, which can lead to negative emotions on the customer's part. With lowered expectations the level of disappointment can be reduced and, if the outcome is positive, re-purchase intentions can arise. Because it is difficult to meet high customer expectations, firms should aim to establish a performance level that customers can trust, that is not difficult to maintain, and at which customers can expect the firm to keep delivering. In that case, long-run trust is secured and the loyal customer retains positive feelings towards the firm. In the study by Dai [8], the results of the experiment testing the magnitude and proximity of price differences and their impact on customers' feelings indicated that price fairness perception was directly connected with self-protection, re-purchase, and the desire for revenge. These three factors depended on each other: if a price was considered fair, one factor, re-purchase, rose, while the other two, self-protection and revenge, were lowered. An additional focus point was the role of customer loyalty, comparing loyal and non-loyal customers. While loyalty plays a significant role in retaining customers, in the study by Dai [8] customer loyalty was connected with price fairness perception and self-protection. In terms of the perception of fairness, loyal customers accepted a minor price difference of roughly 5% as fair, and it did not affect their level of loyalty at all. However, a price difference of around 30% was perceived significantly more negatively by loyal customers than by non-loyal ones, and the customers' trust was mixed with emotions of exploitation and betrayal, leading to the conclusion that firms which have already gained loyal customers should avoid drastic price differences within a short period of time in order to maintain a long-term trustworthy relationship.

4 Conclusions This paper started with an outlook on how price dynamism is driven by location, meaning that the GDP of a country plays a crucial part in price fluctuation, leading to the conclusion that wealthy countries are charged proportionally more for goods than poorer countries. It is worth mentioning that the location of the market, here meaning the brick-and-mortar market, determines the price as well: a market with no competition has greater power to set prices as the business sees fit, while a shop in a multichannel shopping mall faces many competitors and has to adjust its prices according to other businesses. An accessible example of online price dynamism was showcased with the hotel and airline businesses, which tailor


their prices firstly according to their specific class criteria (e.g., for flight tickets, first class, business class, etc.), meaning that more comfort equals higher charges. Prices rise if customers delay their bookings and do not book months in advance. An important point is that hotel and airline businesses do set significantly low prices a couple of days before the date of departure or the hotel stay in order to capture otherwise lost profit (e.g. last-minute deals). Customers tend to be swayed by the scenario of the purchase: for example, in an auction or in the case of the illusion of control, customers were more susceptible and willing to pay more money for a product that would have a lower price in another scenario, such as group buying. The word group is important to focus on, as it was shown that communication between customers significantly boosts the purchase intention, implying that businesses should aim to have an established level of quality that customers can expect. This way a long-term relationship with customers can be achieved, acquiring loyal customers who have a high level of trust, which reduces skepticism about the purchase. The importance of this is that loyal customers do accept small price changes of a few percent, but on the other hand are more repulsed than non-loyal customers by greater price changes of above 30%.

5 Future Recommendations The limitations of this paper are, firstly, that the process by which businesses keep prices up to date and the algorithms they use for price determination were not covered, nor was the way customer profiles are used by firms to adjust prices. Additionally, only the customer's viewpoint of the dynamic pricing strategy was covered, and not how firms respond to negative word of mouth from customers, or whether a negative response from a customer immediately affects the firm's price policy to avoid any additional outrage. These topics could be covered by further investigations.

References
1. Cleverism: Place in the four Ps marketing mix (2014). Retrieved from https://www.cleverism.com/place-four-ps-marketing-mix/
2. Price Intelligently: SaaS Pricing Strategy (2015). Retrieved from https://www.priceintelligently.com/hubfs/Price-Intelligently-SaaS-Pricing-Strategy.pdf
3. Huang, J.-H., Chang, C.-T., Chen, C.Y.-H.: Perceived fairness of pricing on the Internet. J. Econ. Psychol. 26(3), 343–361 (2005)
4. Statista: (2018). Retrieved from https://www.statista.com
5. Oberlo: (2019). Retrieved from https://www.oberlo.com/blog/online-shopping-statistics
6. Business Insider: (2018). Retrieved from https://www.businessinsider.de/international/amazon-price-changes-2018-8-2/?r=US&IR=T


7. Grewal, D., Hardesty, D.M., Iyer, G.R.: The effects of buyer identification and purchase timing on consumers' perceptions of trust, price fairness, and repurchase intentions. J. Interact. Mark. 18(4), 87–100 (2004)
8. Dai, B.: The Impact of Perceived Price Fairness of Dynamic Pricing on Customer Satisfaction and Behavioral Intentions: The Moderating Role of Customer Loyalty (2010)
9. Bauer, J., Jannach, D.: Optimal pricing in e-commerce based on sparse and noisy data. Decis. Support Syst. 106, 53–63 (2018)
10. Burger, B., Fuchs, M.: Dynamic pricing—a future airline business model. J. Revenue Pricing Manag. 4(1), 39–53 (2005)
11. Haddad, R.E.: Exploration of revenue management practices—case of an upscale budget hotel chain. Int. J. Contemp. Hosp. Manag. 27(8), 1791–1813 (2015)
12. Yan, R.: Pricing strategy for companies with mixed online and traditional retailing distribution markets. J. Prod. Brand Manag. 17(1), 48–56 (2008)
13. Vastani, S.F., Monroe, K.B.: Role of customer attributes on absolute price thresholds. J. Serv. Mark. 33(5), 589–601 (2019)
14. Khan: The rise of "big data" on cloud computing: review and open research issues (2015). https://doi.org/10.1016/j.is.2014.07.006
15. Gyódi, K., Sobolewski, M., Ziembiński, M.: What drives price dispersion in the European e-commerce industry? Cent. Eur. Econ. J. 3(50), 53–71 (2017)
16. Ozer, M.: The role of flexibility in online business. Bus. Horiz. 45(1), 61–69 (2002)
17. Raju, C.V.L., Narahari, Y., Ravikumar, K.: Learning dynamic prices in electronic retail markets with customer segmentation. Ann. Oper. Res. 143(1), 59–75 (2006)
18. Chapuis, J.M.: A cross-cultural analysis of passengers reactions to revenue and pricing management. J. Revenue Pricing Manag. 12(1), 16–25 (2012)
19. Melis, G., Piga, C.A.: Are all online hotel prices created dynamic? An empirical assessment. Int. J. Hosp. Manag. 67, 163–173 (2017)
20. Lee, S., Illia, A., Lawson-Body, A.: Perceived price fairness of dynamic pricing. Ind. Manag. Data Syst. 111(4), 531–550 (2011)
21. Interaction-design: Designing contestability: interaction design, machine learning, and mental health (2017). https://doi.org/10.1145/3064663.3064703
22. Corbitt, B.J., Thanasankit, T., Yi, H.: Trust and e-commerce: a study of consumer perceptions. Electron. Commer. Res. Appl. 2(3), 203–215 (2003)
23. Ferguson, J.L., Ellen, P.S.: Transparency in pricing and its effect on perceived price fairness. J. Prod. Brand Manag. 22(5/6), 404–412 (2013)
24. Lee, K.Y., Jin, Y., Rhee, C., Yang, S.-B.: Online consumers' reactions to price decreases: Amazon's Kindle 2 case. Internet Res. 26(4), 1001–1026 (2016)
25. Santos, C.P.D., Basso, K.: Price unfairness: the indirect effect on switching and negative word-of-mouth. J. Prod. Brand Manag. 21(7), 547–557 (2012)
26. Kim, D.J., Ferrin, D.L., Rao, H.R.: Trust and satisfaction, two stepping stones for successful e-commerce relationships: a longitudinal exploration. Inf. Syst. Res. 20(2), 237–257 (2009)

An Investigation of the Complexity of Bitcoin Pricing Philipp Saborosch and Dmitry Ushakov

Abstract This chapter aims to investigate whether it is possible to combine existing research regarding specific attributes of Bitcoin and other cryptocurrencies into one model of Bitcoin price explanation. To do so, an extensive literature review is conducted to explore the publications available. The literature review results in a list of variables used to explore various research areas regarding Bitcoin. The most popular variables (such as the amount of web searches regarding Bitcoin, the gold price or security of blockchain technologies) are selected and combined into a regression model. Even though the coefficient estimates for the Google Trends index, the mean transaction fee, the number of Bitcoin wallets, the security breach dummy variable and the lagged Bitcoin price are reported as significant, statistical testing indicates severe issues with the model. The chapter therefore concludes with the finding that research regarding Bitcoin is not advanced enough and that its pricing mechanisms are too complex to build a sum-of-the-parts model. For future research the exploration of an advanced model with measures implemented to counteract mentioned issues is suggested. Keywords Bitcoin pricing · Blockchain

1 Introduction From the beginning of January 2015 to the end of December 2019 the price of a so-called Bitcoin increased from approximately $315 to approximately $7250, peaking at $18,640.26 on December 18, 2017 [1]. In the process Bitcoin sparked not only the interest of technology enthusiasts but also of the financial services industry (e.g., [2]) and even of mainstream media companies (e.g., [3]). Subsequently, the scientific community started investigating the area of Bitcoin and cryptocurrencies in general as well. But what is a 'Bitcoin'? Bitcoin, as one of many cryptocurrencies, is a solely digital currency which is completely decentralized, therefore lacking an authority such as a central bank [4]. From a technological point of view, Bitcoin is based on peer-to-peer transactions and the blockchain, with the maximum number of Bitcoin being limited by the properties of its 'mining' algorithm [4, 5]. As researchers are particularly determined to explain different, highly specific aspects of the pricing of Bitcoin, it shall be the purpose of this paper to summarize these efforts and to explore whether it is possible to build an efficient regression model to explain Bitcoin pricing by putting these pieces of research together. To do so, a systematic literature review of academic work dedicated to the pricing of Bitcoin and other cryptocurrencies will be conducted, and the results of this literature review will then be used to build a regression model. The publications composing the sample for the literature review were found by systematically searching several scientific databases for pre-determined key word combinations. After the final sample has been gathered, the literature review part, whose structure loosely follows Bauer and Strauss [6], starts with an introduction to the methodology used and then continues with an analysis that investigates the literature of the sample based on publication outlet, year of publication, major research topic and, ultimately, the variables used to analyze the respective major research topic. The second part of this paper then introduces the regression model, which will be based on the aforementioned variables. Once the model is created, its coefficient estimates will be compared to economic intuition and then extensive testing will be conducted to analyze the meaningfulness of the results yielded by the model. Ultimately, this paper explicitly does not aim to construct a highly technical model for Bitcoin price prediction, but to piece together the single parts of research that have been published in the time frame of 2015–2019 to see whether the sum of these parts is able to deliver a meaningful explanation of Bitcoin price development. The application of advanced statistical and econometric techniques to modify and properly apply the results of this paper is reserved for further research which can be based on the findings of this paper.
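Purely as an illustration of what such a sum-of-the-parts specification could look like, a linear regression combining the variables named in the abstract might be written as below. The notation is invented for illustration; the actual model and its exact functional form are developed later in the chapter.

\[
P_t = \beta_0 + \beta_1\, GT_t + \beta_2\, Fee_t + \beta_3\, Wallets_t + \beta_4\, Breach_t + \beta_5\, P_{t-1} + \varepsilon_t,
\]

where \(P_t\) denotes the Bitcoin price, \(GT_t\) the Google Trends index, \(Fee_t\) the mean transaction fee, \(Wallets_t\) the number of Bitcoin wallets, \(Breach_t\) a security breach dummy variable, and \(P_{t-1}\) the lagged Bitcoin price.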

2 Methodology 2.1 Database Search to Find Relevant Literature The process of looking for relevant papers started with searching academic online databases with combinations of keywords. Table 1 shows the number of results returned by the different databases. The key word combinations start with either 'bitcoin' or 'cryptocurrenc*' (the star enabling different endings of the word) to specify the topic of research and continue with a combination of the terms 'price', 'value', 'influences', 'macroeconomic' and 'pricing'. This clarifies the research interest, as the search aims to find papers that discuss price influences on Bitcoin. Finally, all combinations finish with 'regression' + 'model' to determine the desired methodology used.


Table 1 Search results after the first specification of key words Search terms/database

SpringerLink

Wiley online

SAGE

Emerald

Inderscience

Total

‘bitcoin’ + ‘price’ + ‘influences’ + ‘regression’ + ‘model’

14

3

1

5

2

25

‘bitcoin’ + ‘value’ + ‘influences’ + ‘regression’ + ‘model’

16

3

1

5

2

27

‘bitcoin’ + ‘price’ + ‘macroeconomic’ + ‘regression’ + ‘model’

10

1

0

0

1

12

‘bitcoin’ + ‘value’ + ‘macroeconomic’ + ‘regression’ + ‘model’

10

1

0

0

1

12

6

0

0

0

1

7

‘bitcoin’ + ‘pricing’ + ‘influences’ + ‘regression’ + ‘model’

14

3

1

5

2

25

‘bitcoin’ + ‘pricing’ + ‘macroeconomic’ + ‘regression’ + ‘model’

10

1

0

0

1

12

‘cryptocurrenc*’ + ‘price’ + ‘influences’ + ‘regression’ + ‘model’

17

5

0

4

0

26

‘cryptocurrenc*’ + ‘price’ + ‘macroeconomic’ + ‘regression’ + ‘model’

3

2

0

1

0

6

100

19

3

20

10

152

‘bitcoin’ + ‘macroeconomics’ + ‘influences’ + ‘regression’ + ‘model’

Total

784

P. Saborosch and D. Ushakov

As stated in Table 1, the different combinations of search terms yielded a total of 152 results. However, it has to be noted that these results still required manual cleansing, as most combinations retrieved the same papers. It also has to be mentioned that the first term of each combination, either 'bitcoin' or 'cryptocurrenc*', was required to appear not merely in the paper but in its title. This restriction enabled the search engines to deliver only highly relevant results. As can be seen in Table 2, it was possible to narrow down the search terms to four combinations which still included all papers also found in Table 1. These four final combinations mainly consist of the mix of the four key terms of the search, 'bitcoin' and 'cryptocurrenc*' on the one hand and 'influences' and 'macroeconomic' on the other hand. Furthermore, after manually cleansing the second table of papers which were found with two or more combinations of the key words, the search process returned 58 unique papers. A detailed list of those 58 unique findings can be found in the references. It is worth mentioning that, as can be seen in Table 2, the majority of results was found by the SpringerLink search engine (37), followed by Wiley Online and Emerald (nine results each). Marginal contributions can be attributed to the SAGE and Inderscience search engines, which returned one and two results, respectively.

Table 2 Results of narrowed key word specification

Search terms / database                                                 SpringerLink  Wiley Online  SAGE  Emerald  Inderscience  Total
'bitcoin' + 'value' + 'influences' + 'regression' + 'model'                  15            3          1      5          2         26
'bitcoin' + 'price' + 'macroeconomic' + 'regression' + 'model'                4            1          0      0          0          5
'cryptocurrenc*' + 'price' + 'influences' + 'regression' + 'model'           15            5          0      4          0         24
'cryptocurrenc*' + 'price' + 'macroeconomic' + 'regression' + 'model'         3            0          0      0          0          3
Total                                                                        37            9          1      9          2         58
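The deduplication of hits across the four final key word combinations can be illustrated with a short Python sketch. The combinations below are the ones from Table 2; the paper identifiers and per-combination hit sets are purely illustrative placeholders, not the actual database results.

```python
# Illustrative sketch of collecting unique papers across several search queries.
final_combinations = [
    ("bitcoin", "value", "influences", "regression", "model"),
    ("bitcoin", "price", "macroeconomic", "regression", "model"),
    ("cryptocurrenc*", "price", "influences", "regression", "model"),
    ("cryptocurrenc*", "price", "macroeconomic", "regression", "model"),
]

# Hypothetical result sets per combination; the same paper can appear under
# several combinations, which is why a set union is used for deduplication.
hits_per_combination = {
    final_combinations[0]: {"paper_01", "paper_02", "paper_03"},
    final_combinations[1]: {"paper_02", "paper_04"},
    final_combinations[2]: {"paper_01", "paper_05"},
    final_combinations[3]: {"paper_03", "paper_06"},
}

unique_papers = set().union(*hits_per_combination.values())
print(f"{len(unique_papers)} unique papers after removing duplicates")
```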

2.2 Quantitative Analysis of the Sample

Further adopting the literature review approach pursued by Bauer and Strauss [6], this section introduces the quantitative analysis of the relevant sample of 58 papers. Several categorization procedures of the sample were undertaken in this process. The publications were classified based on the following criteria:

1. Type of the publication outlet (journal or conference)
2. Discipline of the publication outlet (business, computer science or others/diverse)
3. Major research topic of the paper (price forecasting, volatility explanation, social influences, technical mechanisms)
4. Date of the publication (in years) and medium of publication.

The first, second and last classifications are based on the citation information of the respective publications provided by the databases, while the third categorization is created based on a content analysis of the papers. The first and second types of classification serve to establish a formal overview of the collected sample. The division between conference contributions and journal articles and the sorting by scientific discipline are supposed to provide an idea of which scientific fields and processes contributed most to the sample. In the third categorization the sample is sorted by the major research topic of each paper. The fourth and final analysis then focuses on sorting the publications by their publishing date. This is particularly important for this sample, as Bitcoin and cryptocurrencies were only introduced during the last decade [7] and have only been included in the general academic discourse very recently (as shown later on in this paper). The results of these classifications can be seen in Sect. 3.1.

2.3 Qualitative Analysis of the Sample

For the qualitative analysis each paper will be assigned to one of seven categories of major research topics. Furthermore, the variables used by each paper to explain its respective major research topic will be listed. The results of this analysis can be seen in Sect. 3.2.


3 Literature Review

3.1 Quantitative Analysis

As described above, the first part of the quantitative analysis divides the sample into journal articles and conference contributions and then proceeds to determine the scientific category of the publication outlet. The second part will focus on the major research topic of each paper.

3.1.1 Analysis of the Publication Outlet and Its Scientific Category

The sample contains 43 refereed journal articles, 13 conference contributions, as well as the special case of two contributions which are neither, being final summaries of specific research projects. Regarding the scientific category of the publication outlet, three possible categories are presented: business (this category also includes contributions to research in finance, economics and other related fields), computer science (containing machine learning, networks et al.) and others/diverse. The latter consists mainly of operations research contributions, making up the 'other' part, and of papers published in outlets that cover a broad area of academic disciplines ('diverse'). Table 3, which is adapted from Bauer and Strauss [6], shows the type of publication outlet (either conference, journal or other) as well as the scientific discipline of the respective outlet (business, computer science or other/diverse) for each publication contained in the sample. As stated above, the sample consists of 13 conference contributions, 43 journal articles and two contributions of the 'other' category. These two are final reports of research or working groups which were then publicized by one of the publishers included in the database search described in Sect. 2.1. As these papers underwent a peer-review process by those publishers, it is deemed appropriate to include them in this sample, especially as they also contribute to its completeness. Regarding the scientific disciplines of the publications, it can be seen that 33 papers were published in outlets from the business category and 15 from the computer science field. Furthermore, ten papers were published in outlets which cannot be ascribed to those fields. It is noteworthy that out of the seven publications which were published in a non-diverse, non-business, non-computer science environment, four can be attributed to the fields of data science, operations research and mathematical optimization, making up more than 50% of this category.

Table 3 Type of publication outlet and scientific discipline per publication outlet

Abraham [8]: Journal of Corporate Accounting and Finance
Abraham et al. [4]: SAGE Open
Ajouz et al. [9]: Thunderbird International Business Review
Alexander et al. [10]: Journal of Futures Markets
Alfieri et al. [11]: The Journal of Risk Finance
Alzaatreh and Sulieman [12]: Empirical Economics
Amaral et al. [13]: IFSA/NAFIPS 2019: Fuzzy Techniques: Theory and Applications
Aste [14]: Digital Finance
Bashir et al. [15]: SocInfo 2016, 8th International Conference
Caporale and Plastun [16]: Journal of Economic Studies
Caporale et al. [17]: Financial Markets and Portfolio Management
Chakravarty et al. [18]: EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing
Chatterjee et al. [19]: Quality and Quantity
Cheong [20]: The Journal of Risk Finance
Ciaian et al. [7]: Information Systems and e-Business Management
De Stefani et al. [21]: ECML PKDD 2018 Workshops
Deng et al. [22]: Blockchain—ICBC
Dorfleitner and Lung [23]: Journal of Asset Management
Figá-Talamanca and Patacca [24]: Economics and Finance
Guo and Li [25]: Applied Quantitative Finance
Haferkorn and Quintana Diaz [26]: FinanceCom 2014, 7th International Workshop
Heine Felix and von Eije [27]: Managerial Finance
Jain et al. [28]: Financial Management
Karalevicius et al. [29]: The Journal of Risk Finance
Koutmos [30]: Annals of Operations Research
Li et al. [31]: Quality and Quantity
Liu and Serletis [32]: Open Economies Review
Ma and Tanizaki [33]: China Finance Review International
Masciandaro [34]: Australian Economic Review
Matta et al. [35]: Knowledge Discovery, Knowledge Engineering and Knowledge Management: 7th International Joint Conference, IC3K 2015
Mendoza-Tello et al. [36]: Information Systems and e-Business Management
Ming Wong [37]: Contemporary Issues in International Political Economy
Moosa [38]: Journal of Industrial and Business Economics
Naimy and Hayek [39]: International Journal of Mathematical Modelling and Numerical Optimisation
Othman et al. [40]: Journal of Financial Economic Policy
Papathanasiou et al. [41]: International Journal of Financial Engineering and Risk Management
Parino et al. [5]: EPJ Data Science
Poyser [42]: Eurasian Economic Review
Priya and Garg [43]: Third International Conference on Smart Computing and Informatics, Volume 1
Ricci [44]: Journal of Industrial and Business Economics
Romanchenko et al. [45]: Digital Science International Conference 2018
Sahoo et al. [46]: International Journal of Managerial Finance
Sánchez et al. [47]: Cloud Computing and Big Data
Shorish [48]: Digital Finance
Shrestha [49]: International Review of Finance
Soloviev and Belinskiy [50]: Information and Communication Technologies in Education, Research, and Industrial Applications 2018
Ullrich et al. [51]: Research in Attacks, Intrusions, and Defenses: RAID 2018
Vardar and Aydogan [52]: EuroMed Journal of Business
Vogiazas and Alexiou [53]: Economic Notes
Wang et al. [54]: Journal of Computer Science and Technology
Wang et al. [55]: Journal of Economic Interaction and Coordination
Wolk [56]: Expert Systems
Yalaman [57]: Digital Business Strategies in Blockchain Ecosystems
Yang et al. [58]: SmartBlock 2019
Zamuda et al. [59]: Information and Communication Technologies in Education, Research, and Industrial Applications 2018
Zhang et al. [60]: Accounting and Finance
Zhou [61]: Empirical Economics
Zhu et al. [62]: Financial Innovation

3.1.2 Analysis of Major Research Topic of Each Paper

For the major research topic categorization, seven distinctive categories are introduced:

• Bitcoin technologies: this category serves for papers that primarily discuss technical aspects of Bitcoin and other cryptocurrencies
• Pattern recognition: for publications concerned with specific patterns in the behavior of the Bitcoin price, mostly connected to possible bubbles and crashes
• Portfolio theory: reserved for contributions that research the diversification benefits of cryptocurrencies and other effects on security portfolios
• Price forecasting and explanation: this area includes mainly statistical models explaining and/or trying to forecast the price of Bitcoin and other cryptocurrencies
• Social acceptance: this category collects papers researching the acceptance of cryptocurrencies in the general public or in specific subgroups
• Social influences: a special subsection dedicated to papers which look into the connection between public development, public perception of cryptocurrencies and the development of the cryptocurrencies
• Volatility forecasting and explanation: same as price forecasting and explanation, but here the volatility of cryptocurrencies is the response variable.

It has to be noted that these distinctions are partly arbitrary, as most papers connect several of the aspects discussed above. In these cases, the classification was done by determining which topic was most related to the overall research goal of the respective paper. Figure 1 shows the distribution of publications over these seven categories.

Fig. 1 Amount of publications per major research topic

It is noteworthy that while some categories displayed a very diverse range of approaches towards conducting research, other categories were more homogeneous in their methodology. Dividing the Bitcoin technology area into a macro-perspective, which analyzes the impact and relationships of Bitcoin on and with other technologies, and a micro-perspective, which looks into the specific technology of Bitcoin and other cryptocurrencies, it can be stated that both areas were covered by the sample. Chatterjee et al. [19] conducted a survey of cryptocurrency technology with special regard to security and privacy issues as well as the operation costs of the network, thereby covering the micro-perspective. The macro-perspective was covered by Ullrich et al. [51], who investigated the effect of cryptocurrency networks on public infrastructure, paying special attention to the fact that a sudden change in electricity demand from cryptocurrency miners could potentially cause trouble for power grids, as they need to constantly balance their power load. The approaches of the papers in the pattern recognition area were more homogeneous, as they mostly focused on statistical and econometric techniques.

For example, Vogiazas and Alexiou [53] performed several versions of the Augmented Dickey-Fuller test [63] to determine whether the pattern of the Bitcoin price in 2017 followed a bubble development. Furthermore, Moosa [38] combined principles of stock valuation with an autoregressive-distributed lag method to investigate whether Bitcoin exhibited a bubble pattern. Publications in the portfolio theory category followed diverse objectives, such as determining the use of cryptocurrencies to hedge the risk of foreign exchange rates [20] or finding that Bitcoin is only weakly correlated with existing asset classes and is therefore a viable diversification tool [11].

Making up nearly half of the sample (26 out of 58 papers, or approximately 45%), the price forecasting and explanation category contains a broad range of approaches as well. Alexander et al. [10] inspected the BitMEX cryptocurrency exchange platform, finding that trading data such as relative trading volumes and bid-ask spreads contribute significantly to the price determination of Bitcoin. Investigating a special case of cryptocurrency pricing, the initial coin offering (ICO) price, Heine Felix and von Eije [27] compared ICOs to initial public offerings (IPOs) of stocks and found an underpricing level of approximately 100% for ICOs, with this number being even higher if one exclusively observes the US market. A broader analysis was conducted by Sánchez et al. [47], who researched the relationship between cryptocurrencies and the global market, concluding that Bitcoin and Ethereum, the two most well-known cryptocurrencies, are closer to the market than other cryptocurrencies.

Social acceptance can be divided into a macro- and a micro-level as well, the former describing factors for Bitcoin adoption on a per-country basis and the latter focusing on factors that contribute to Bitcoin adoption by individuals. In the collected sample, the micro-level is covered by Bashir et al. [15], who research the motivation for people to use Bitcoin by testing the influence of 15 different variables (such as whether the person identifies as libertarian or whether the person is female) on the likeliness of Bitcoin ownership. The macro-level is covered by, e.g., Parino et al. [5], who look for the main factors for countries to adopt Bitcoin. The social influence area has diverse approaches as well: Matta et al. [35], for example, investigate the relationship between Bitcoin trading volume on the one side and web searches on Bitcoin and the amount of Bitcoin-related communication on Twitter on the other side. Othman et al. [40] research a more specific aspect, as they look into the relationship between the variability in bank deposits in predominantly Islamic countries and the market capitalization of cryptocurrencies. Finally, research conducted in the volatility forecasting and explanation category is very homogeneous, mostly relying on statistical modelling to obtain results. For example, De Stefani et al. [21] tried predicting cryptocurrency volatility with a Dynamic Factor Model. Another statistical approach was taken by Liu and Serletis [32], who used a GARCH-in-mean model to inspect relationships between volatility and returns of several cryptocurrencies.



Fig. 2 Number of publications per year and Bitcoin price

3.1.3 Analysis of the Year of Publication and Distribution of Media of Publication

The chart (Fig. 2) shows the development of the number of publications contained in the sample per year as well as a simplified version of the development of the price of Bitcoin in the same years, 2015–2020 (based on [1]). It can clearly be seen that the number of publications skyrocketed after the Bitcoin price experienced a steep increase. The lag between the peak of the Bitcoin price and the number of publications can be attributed to the fact that it took an extended amount of time for the research to be conducted, reviewed, and published. Furthermore, it is to be expected that the number of publications in 2020 (three at the time of writing this paper) will increase drastically as the year progresses. Regarding the distribution of the publications among different publication media, no significant findings can be reported. The journals Digital Finance, Empirical Economics, Information Systems and e-Business Management, Journal of Industrial and Business Economics and Quality & Quantity each contributed two papers to the sample, while The Journal of Risk Finance contributed three papers. Furthermore, two elements of the sample were originally presented at the Information and Communication Technologies in Education, Research, and Industrial Applications conference in 2018. All other publication outlets contributed only one piece of research.


3.2 Qualitative Analysis

This subsection uses the major research topics identified in Sect. 3.1.2 (Bitcoin technologies, pattern recognition, portfolio theory, price forecasting and explanation, social acceptance, social influences, volatility forecasting and explanation) and shows which variables are used by the various authors to explain the behavior of their major research topic. This analysis will later also be used to determine a selection of variables for the regression model on Bitcoin pricing introduced by this paper. Table 4 lists the publications contained in the sample, divided per major research topic, and shows the explanatory variables used by each paper.


Table 4 Explanatory variables used in each paper per major research topic

Bitcoin technologies
  Chatterjee et al. [19]: Cost of operation, security and privacy issues
  Ullrich et al. [51]: Total hash rate, power consumption of miners, release date of miners, world power consumption, mining revenue, acquisition cost of miners, miner lifetime, ratio electricity to acquisition costs, electricity price
  Wang et al. [54]: Costs, efficiency, security, supervision features, implementation, future goals

Pattern recognition
  Li et al. [31]: Prices
  Moosa [38]: Prices, trading volume
  Vogiazas and Alexiou [53]: Prices, volatility

Portfolio theory
  Alfieri et al. [11]: S&P 500, FTSE 100, DAX 30, Nikkei 225, CAC 40, NASDAQ, MSCI World, MSCI Europe, MSCI Asia–Pacific, oil price, gold price, commodity index, bonds, dollar index, foreign exchange of Euro, foreign exchange of Yen, foreign exchange of Yuan
  Cheong [20]: Cryptocurrency prices, cryptocurrency volatility, foreign exchange prices, foreign exchange volatility, gold volatility, gold price
  Dorfleitner and Lung [23]: Prices, volatility, day-of-the-week, MSCI world index, MSCI emerging and frontier market index, iBoxx euro corporations index, government bonds, oil price, gold price, HFRX fixtures global hedge fund index, FTSE index of global real estate investment trusts
  Poyser [42]: USD exchange trade volume, transaction confirmation time, hash rate, Google Trends, S&P 500, CBOE volatility index, AAII investor sentiment survey, foreign exchange USD/EUR, foreign exchange USD/Yuan, gold price
  Vardar and Aydogan [52]: Spillover/correlation effects with stocks, bonds, USD and EUR

Price forecasting and explanation
  Abraham [8]: Prices, prices of other cryptocurrencies
  Alexander et al. [10]: Prices, bid-ask spreads, interexchange spreads, relative trading volumes
  Alzaatreh and Sulieman [12]: Prices
  Amaral et al. [13]: Prices
  Aste [14]: Other cryptocurrencies, investment behavior on Twitter and StockTwits
  Caporale and Plastun [16]: Prices, volatility
  Caporale et al. [17]: Prices, overreactions
  Chakravarty et al. [18]: No detailed insights available yet
  Ciaian et al. [7]: Number of Bitcoin, number of transactions, number of Bitcoin addresses, days destroyed, exchange rate, views on Wikipedia, new members, new posts, Dow Jones, oil price
  Deng et al. [22]: Prices, hash rate, price of miner futures
  Heine Felix and von Eije [27]: Trading volume, ICO issue size, ICO issuer retained ratio, rating, coins sold ratio, sentiment, hot market, pre-ICO, bonus scheme, currency, platform products, finance products, software products, entertainment products, USA (dummy)
  Karalevicius et al. [29]: Number of articles/posts, mood of articles/posts, start/end price
  Koutmos [30]: Volatility, interest rates, implied stock market, foreign exchange market volatilities
  Ma and Tanizaki [33]: Prices, day-of-the-week
  Priya and Garg [43]: No detailed insights available yet
  Romanchenko et al. [45]: Different valuation techniques
  Sahoo et al. [46]: Trading volume, volatility
  Sánchez et al. [47]: Oil price, gold price, Nikkei 225, FTSE 100, Dow Jones industrial average, foreign exchange USD/EUR, foreign exchange USD/JPY
  Shorish [48]: Use of token as a claim (security characteristic), use of token as a grantor of seller-generated services (utility characteristic)
  Shrestha [49]: Prices
  Soloviev and Belinskiy [50]: Prices, economic mass, eigenvalues of correlation matrix
  Wolk [56]: Twitter sentiment analysis, Google Trends
  Yang et al. [58]: Prices, Twitter volume, Twitter sentiment, CNN-LSTM
  Zamuda et al. [59]: Influential users, social media feed, tweets, sentiment analysis
  Zhou [61]: Prices, Bitcoin-related events, Bitcoin regulation, news coverage, financial markets, monetary policy
  Zhu et al. [62]: Consumer price index for all urban consumers, Dow Jones industrial average, US dollar index, effective federal funds rate, gold price

Social acceptance
  Abraham et al. [4]: Hofstede's cultural value orientations; physical/spatial distance, social distance, temporal distance, hypothetical distance
  Ajouz et al. [9]: Adoption, relative advantage, compatibility, anxiety, trialability, observability, facilitating condition, trust, convertibility
  Bashir et al. [15]: Willingness to use Bitcoin, female (dummy), conscientious, vertical individualism, libertarian, conservative, technical skills, discretionary cash, use of credit card for necessities, ownership of Bitcoin, friend who owns Bitcoin, devaluation expectation of Bitcoin, anonymity, borderless finance, virtual money, novelty
  Haferkorn and Quintana Diaz [26]: Quantities of payment in other cryptocurrencies, year, month, weekday
  Masciandaro [34]: Liquidity costs, opportunity costs, privacy costs
  Mendoza-Tello et al. [36]: Perceived trust, perceived ease of use, intention of use, perceived risk, perceived usefulness
  Papathanasiou et al. [41]: Public opinion, expert opinions
  Parino et al. [5]: Internet penetration, population GDP per capita, inflation, human development index, overall freedom of trade, number of IPs in Bitcoin network, number of client downloads in Bitcoin network

Social influences
  Figá-Talamanca and Patacca [24]: Volume of transactions in Bitcoin, amount of internet searches (SVI Google index)
  Matta et al. [35]: Web search, social volumes, Twitter posts
  Ming Wong [37]: Bitcoin as exchange medium, role in monetary system
  Othman et al. [40]: Bank deposits, cryptocurrencies market capitalization
  Ricci [44]: Geographical network of Bitcoin transactions, economic freedom indicators
  Zhang et al. [60]: News from governments and central banks

Volatility forecasting and explanation
  De Stefani et al. [21]: Prices
  Guo and Li [25]: Prices, volatility, wealth distribution of cryptocurrency
  Jain et al. [28]: Trading volume, day-of-the-week, time-of-the-day
  Liu and Serletis [32]: Prices, volatility, spillovers from other cryptocurrencies, S&P 500 spillovers, interest rate spillovers, DAX 30 spillovers, FTSE 100 spillovers, Nikkei 225 spillovers
  Naimy and Hayek [39]: Prices, volatility
  Wang et al. [55]: Trading volume, market share of different currencies in Bitcoin trading
  Yalaman [57]: Prices, Google Trends, Bitcoin futures

As seen above, a broad range of explanatory variables can be derived from the publications of the sample. Bitcoin technologies variables focused on factors determining the quality of technology networks, such as operation costs, security and efficiency (e.g., [19, 54]), and on relationships between technical data of cryptocurrency networks and public infrastructure, using indicators such as the total hash rate, the power consumption of miners and world power consumption [51]. Pattern recognition variables were mostly of a financial nature, as mainly prices, trading volumes and the volatility of cryptocurrencies were used to investigate possible patterns (e.g., [38, 53]).

Variables of the portfolio theory category describe the assets and effects which were investigated regarding whether they displayed correlation with cryptocurrencies. Research in this area focused on macroeconomic factors such as the S&P 500, the FTSE 100 and other stock indices, the oil price, the gold price, bonds and foreign exchange rates (e.g., [11, 23]). As additional indicators for diversification possibilities, the AAII investor sentiment survey, Google Trends [42] and data on other cryptocurrencies were used [20].

Researchers utilized a very heterogeneous group of variables to investigate price forecasting and explanation. Besides financial criteria such as cryptocurrency prices, bid-ask spreads, trading volume and volatility (e.g., [10, 46]), social media data such as the number of articles and posts about Bitcoin [29], Twitter sentiment analysis and Google Trends [56] and social media feeds [59] were analyzed to determine whether this data influences Bitcoin trading. Additionally, some researchers looked at the influence on pricing of technical criteria such as the number of Bitcoins on the market, the number of transactions and the number of Bitcoin addresses [7] as well as the hash rate and the price of miner futures [22]. Macroeconomic variables such as interest rates, the oil price, the gold price, the FTSE 100 and other stock indices and foreign exchange rates were considered by many researchers too (e.g., [30, 47]).

Variables predicting the social acceptance of Bitcoin or other cryptocurrencies are mostly divided into individual traits on the one hand, such as conscientiousness, whether the person is female, whether the person is libertarian, a person's technical skills, whether the person has a friend who uses Bitcoin [15], or perceived trust, ease of use, risk, usefulness and intention to use [36], and societal features on the other hand, such as public opinion [41], internet penetration and the human development index [5]. Social influence variables try to explain the connection between public behavior, cryptocurrency development and the influence on cryptocurrency pricing. Researchers mostly used variables such as news from governments and banks, the amount of internet searches and the amount of Twitter posts [24, 35, 60], indicating a focus on news releases and social media, complemented by macroeconomic variables such as the variability in bank deposits or economic freedom indicators [40, 44]. Lastly, volatility forecasting and explanation researchers mainly used trading data such as prices, volatility and trading volume (e.g., [21, 25, 28, 39]), supplemented by macroeconomic data [32].

4 Discussion and Introduction of Regression Model

4.1 Conclusions from the Theoretical Framework

The literature review started with searching several scientific databases for different key word combinations, resulting in 152 findings which were narrowed down to 58 findings in the final sample after cleansing for duplications. In a further step the methodology of the subsequent quantitative and qualitative analysis was explained. The quantitative analysis then started with investigating the publication outlet and the scientific discipline of the publication outlet of each paper, finding that 43 papers were published in journals, 13 were conference contributions and two elements of the sample were ascribed to the 'other' category. For the scientific categories, 33 papers can be attributed to the business category, 15 papers to the computer science category and the remaining ten papers to the 'other' category, even though it is noteworthy that many of those contributions come from the statistics and data science field. Looking for trends between these two categorizations, it becomes apparent that contributions from the field of business are almost exclusively papers published in journals (making up almost 75% of the overall journal contributions in the sample), with zero conference contributions and two sample elements from the 'other' category. Computer science publications were found mostly in the conference category (making up approximately 70% of conference contributions); however, five computer science articles were also published in journals. The other/diverse scientific category was almost evenly split between journal and conference contributions. These overall trends may be attributed to business papers mostly leaning towards formal models, making a journal article favorable over a conference contribution, while computer science papers predominantly tried to give more intuitive and innovative explanations.

In a next step the sample was divided into seven major research topics, namely Bitcoin technologies, pattern recognition, portfolio theory, price forecasting and explanation, social acceptance, social influences and volatility forecasting and explanation. It was found that close to one half of the papers in the sample investigated price forecasting and explanation, which may be caused by the fact that the price is Bitcoin's most prominent attribute (as it is with any other security).


The last part of the quantitative analysis examined the number of publications on a per-year basis, finding that research on Bitcoin and other cryptocurrencies peaked in 2019 and is expected to remain on a high level. It was proposed that this may be due to the steep increase of the Bitcoin price in 2017. The qualitative analysis then led to a detailed list of variables which the papers investigated in relation to Bitcoin and other cryptocurrencies. The next section takes a deeper look into the selection of variables.

4.2 Findings to Be Included in the Regression Model

The investigation of variables is considered to be the key contribution of the literature review to building the regression model. To take all seven categories into account while constructing the model, the variables listed in Table 5 were selected as regressors, as they are the most often used ones in their respective category. The selection criterion was that the variable is used in at least 50% of the papers of its respective category. An exception was made for social media and web search, which was selected because it was utilized in several categories and because its number of appearances in the price forecasting and explanation category was higher than the number of appearances of any other variable in any other category. Even though 'prices' was the most popular variable in the pattern recognition, the price forecasting and explanation and the volatility forecasting and explanation categories, it can obviously only be included once in the model, leaving a total of nine variables to be considered in the regression model. The process of acquiring data for each of these variables is explained in the next section.

Table 5 Most popular variables

Category                                  Variable                                    Used in …/out of … papers   As %
Bitcoin technology                        Costs                                       2/3                         ~67
                                          Security and privacy                        2/3                         ~67
Pattern recognition                       Prices                                      3/3                         100
Portfolio theory                          MSCI world                                  3/5                         60
                                          Gold                                        4/5                         80
                                          FX rates                                    4/5                         80
Price forecasting and explanation         Prices                                      13/24                       ~54
                                          Social media and web search                 6/24                        25
Social acceptance                         Perceived trade worth (in various ways)     4/8                         50
Social influences                         Connection to banking system                3/6                         50
Volatility forecasting and explanation    Prices                                      5/7                         ~71
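The selection rule can be illustrated with a short, hypothetical Python sketch that applies the 50% threshold to the counts reported in Table 5, with the explicit exception made for social media and web search; the counts are taken from Table 5 and the rest is illustrative only.

```python
# Counts of variable usage per category, as reported in Table 5: (used, total papers).
usage = {
    ("Bitcoin technology", "Costs"): (2, 3),
    ("Bitcoin technology", "Security and privacy"): (2, 3),
    ("Pattern recognition", "Prices"): (3, 3),
    ("Portfolio theory", "MSCI world"): (3, 5),
    ("Portfolio theory", "Gold"): (4, 5),
    ("Portfolio theory", "FX rates"): (4, 5),
    ("Price forecasting and explanation", "Prices"): (13, 24),
    ("Price forecasting and explanation", "Social media and web search"): (6, 24),
    ("Social acceptance", "Perceived trade worth"): (4, 8),
    ("Social influences", "Connection to banking system"): (3, 6),
    ("Volatility forecasting and explanation", "Prices"): (5, 7),
}

# Keep a variable if it appears in at least half of the papers of its category,
# plus the explicit exception for social media and web search.
selected = [
    (category, variable)
    for (category, variable), (used, total) in usage.items()
    if used / total >= 0.5 or variable == "Social media and web search"
]
print(selected)
```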

5 Regression Model

This section presents the regression model based on the variables found in the previous section. Ordinary least squares (OLS) [64] estimation will be utilized to explain Bitcoin pricing. As some of the variables proposed by the literature cannot be quantified directly, proxy variables have to be found for them. Table 6 lists the variables, their respective proxy variables, the data sources from which the data was obtained and the abbreviation used in the final model to describe each variable.

As a proxy for the consideration of (operating) costs, the average daily transaction fee in US dollars was chosen, as it represents the average cost of actively using Bitcoin, whether to trade it as a security or to buy goods or services with it. The aspect of security and privacy is proxied by a dummy variable (a variable whose value is '1' if true and '0' if false in a respective period) declaring whether a security breach on a Bitcoin trading platform occurred on that day ('1') or whether no incidents were reported ('0'). There was no need for proxies for the Bitcoin price, the MSCI World (represented by the MSCI World All-Country Equity Index) and the gold price. FX rates are proxied by the EUR/USD rate. The proxy for social media and web search is Google Trends (which was also utilized by, e.g., [56]); for perceived trade worth it is the number of existing Bitcoin wallets, as with an increasing number of wallets the number of people who accept Bitcoin as a trading good increases as well, and therefore its trade value rises, since there are more exchange possibilities for it. Finally, the proxy variable for the connection of Bitcoin to the banking system is the federal funds rate of the U.S. Federal Reserve, as it is a leading indicator for non-crypto currencies (see also, e.g., [30]).

Table 6 Proxies and data sources of variables used

Variable                        Proxy                                 Data source            Abbreviation
Costs                           Transaction fees                      Coinmetrics.io [65]    FEE
Security and privacy            Hacked platforms                      CryptoSec.info [66]    HACK
MSCI world                                                            Investing.com [67]     MSCIW
Prices                                                                Investing.com [68]     BTC
Gold                                                                  Quandl.com [69]        GOLD
FX rates                        EUR/USD rate                          Investing.com [70]     FX
Social media and web search     Google Trends                         Google Trends [71]     GOOGL
Perceived trade worth           Number of Bitcoin wallets existing    Quandl.com [72]        WAL
Connection to banking system    Federal funds rate                    FED St. Louis [73]     FED

Data was collected for the time period from January 1, 2015 to December 31, 2019, representing a span of five years. This time frame was chosen because it represents the period in which the papers of the sample were published, as can be seen in Sect. 3.1.3. The year 2020 was omitted because this paper was written at the beginning of 2020. The presented time frame accounts for a total of 1826 days. However, as data was not available for every day for some variables (gold and stock exchanges, for example, usually only allow trading on weekdays), the final dataset contains complete data for only 1278 days of the presented period, or approximately 70% of all possible days. This is still a sizeable sample, as weekdays only make up approximately 71.4% of the days of a week.

To avoid the issue of perfect multicollinearity (a perfect linear relationship between two or more regressors in the model), which would render the model inefficient and inconsistent, a correlation matrix was calculated as a first step to check for correlation between the regressors. As can be seen in Table 7, there is no perfect multicollinearity between any two regressors at this point.
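A minimal sketch of how such a daily dataset could be assembled and checked is given below, using pandas and hypothetical CSV exports of the sources in Table 6; the file names, column names and breach dates are illustrative assumptions, not the authors' actual data pipeline.

```python
import pandas as pd

# Calendar of the investigated period: 1826 days from 2015-01-01 to 2019-12-31.
dates = pd.date_range("2015-01-01", "2019-12-31", freq="D")

def load_series(path: str, column: str) -> pd.Series:
    # Each CSV is assumed to have a 'date' column and one value column.
    df = pd.read_csv(path, parse_dates=["date"]).set_index("date")
    return df[column].reindex(dates)

data = pd.DataFrame({
    "BTC":   load_series("btc_price.csv", "close"),
    "GOLD":  load_series("gold_price.csv", "close"),
    "MSCIW": load_series("msci_world.csv", "close"),
    "FX":    load_series("eur_usd.csv", "close"),
    "GOOGL": load_series("google_trends_bitcoin.csv", "interest"),
    "FED":   load_series("effective_fed_funds_rate.csv", "rate"),
    "FEE":   load_series("btc_avg_transaction_fee_usd.csv", "fee"),
    "WAL":   load_series("btc_wallets.csv", "wallets"),
})

# HACK: dummy equal to 1 on days with a reported security breach on a trading platform.
breach_dates = pd.to_datetime(["2015-01-05", "2016-08-02"])  # illustrative dates only
data["HACK"] = data.index.isin(breach_dates).astype(int)

data = data.dropna()          # keeps only the days with complete data (~1278 here)
print(data.corr().round(2))   # correlation matrix analogous to Table 7
```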

Table 7 Correlation matrix of variables

        GOLD   BTC    MSCIW  FX     GOOGL  FED    FEE    WAL    HACK
GOLD    1.00
BTC     0.58   1.00
MSCIW   0.60   0.86   1.00
FX      0.24   0.55   0.48   1.00
GOOGL   0.24   0.72   0.50   0.53   1.00
FED     0.51   0.74   0.84   0.36   0.33   1.00
FEE     0.12   0.55   0.30   0.37   0.76   0.14   1.00
WAL     0.67   0.78   0.87   0.24   0.30   0.93   0.12   1.00
HACK    0.03   0.03   0.04   0.05   0.03   0.03   0.02   0.02   1.00

However, it is pointless to regress the price of Bitcoin on itself to cover the 'prices' variable suggested by the literature; therefore the concept of lagged variables needs to be introduced to the regression model. For example, if one lag is used for the Bitcoin price variable, then in period t the Bitcoin price of period t − 1 is considered (BTC-L1). Additionally, a one-lagged version of the HACK variable is introduced (HACK-L1), as it is intuitive to think that a decentralized market such as the Bitcoin market is not able to react immediately to bad news and cannot grasp the news' full extent in an instant; a delayed reaction on the day after the announcement of a security breach is therefore imaginable. Table 8 shows the correlations between the two lagged variables and the other variables. The correlation between HACK-L1 and BTC-L1 is 0.03. There is still no perfect multicollinearity issue.

Table 8 Correlation of lagged variables

          GOLD    BTC    MSCIW   FX     GOOGL   FED    FEE    WAL    HACK
HACK-L1   −0.03   0.03   0.04    0.04   0.02    0.03   0.03   0.02
BTC-L1            0.86   0.55    0.71   0.74    0.55   0.78   0.03   0.59

The model can therefore now be written as:

$$\mathrm{BTC}_t = \beta_0 + \beta_1\,\mathrm{GOLD}_t + \beta_2\,\mathrm{MSCIW}_t + \beta_3\,\mathrm{FX}_t + \beta_4\,\mathrm{GOOGL}_t + \beta_5\,\mathrm{FED}_t + \beta_6\,\mathrm{FEE}_t + \beta_7\,\mathrm{WAL}_t + \beta_8\,\mathrm{HACK}_t + \beta_9\,\mathrm{HACK}_{t-1} + \beta_{10}\,\mathrm{BTC}_{t-1} + u_t \tag{1}$$

with $u_t$ denoting the error term. Running the regression yields the coefficient estimates shown in Table 9.

Table 9 Coefficient estimates of the regression model

Variable      Estimate       t-statistic   p-value
(Intercept)   −234.70        −0.661        0.50865
GOLD          −0.04114       −0.300        0.76393
MSCIW         0.1633         0.357         0.72143
FX            100.597        0.488         0.62557
GOOGL         9.776          7.492         0.00000 ***
FED           −1621.00       −0.405        0.68527
FEE           −9.257         −3.165        0.00159 **
WAL           0.000006098    1.718         0.08600
HACK          −108.4         −1.723        0.08518
HACK-L1       6.551          0.104         0.91730
BTC-L1        0.9648         129.380       0.00000 ***

According to Table 9, neither the gold price, the MSCI World, the foreign exchange rate, the federal funds rate nor the one-period lag of the security breach dummy has a verifiable impact on Bitcoin pricing. However, the positive impact of GOOGL and of the one-period lag of the Bitcoin price can be verified at a 0.1% significance level, the negative impact of the transaction fee at a 1% significance level, and the positive impact of the number of wallets as well as the negative impact of the security breach dummy at a 10% level. Testing of these results will be conducted in the next section.
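The estimation of Eq. (1) can be reproduced in outline with statsmodels, assuming the daily dataset sketched earlier; this is an illustrative sketch of an OLS fit with one-period lags, not the authors' original code.

```python
import statsmodels.api as sm

# Build the one-period lags used in Eq. (1) from the DataFrame `data` of the previous sketch.
df = data.copy()
df["BTC_L1"] = df["BTC"].shift(1)
df["HACK_L1"] = df["HACK"].shift(1)
df = df.dropna()

y = df["BTC"]
X = sm.add_constant(
    df[["GOLD", "MSCIW", "FX", "GOOGL", "FED", "FEE", "WAL", "HACK", "HACK_L1", "BTC_L1"]]
)

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficient estimates, t-statistics and p-values analogous to Table 9
```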

5.1 Testing

In general terms, for a dynamic time series regression to be consistent and therefore useful, the following five assumptions must be met [74]:

1. The model must be linear in parameters and exhibit stationarity as well as weak dependence:
   $$y_t = \beta_0 + \beta_1 x_{t,1} + \cdots + \beta_k x_{t,k} + u_t \tag{2}$$
   with stationarity defined as the property of a variable $a_t$ that fulfills the following conditions:
   $$E(a_t) = \mu \ \forall t; \quad \mathrm{Var}(a_t) = \sigma_a^2 \ \forall t; \quad \mathrm{Cov}(a_t, a_{t-h}) = C(h) \ \forall t, h \tag{3}$$
   and with $a_t$ defined as weakly dependent if the following holds:
   $$C(h) = E\big[(a_t - \mu)(a_{t-h} - \mu)\big] \to 0 \ \text{as} \ h \to \infty \tag{4}$$
2. There must not be perfect multicollinearity among the regressors:
   $$\big|\mathrm{Corr}(x_i, x_j)\big| \neq 1 \ \forall i \neq j \tag{5}$$
3. The conditional mean expectation of the error term $u$ must be zero:
   $$E(u_t \mid x_t) = 0 \ \forall t \tag{6}$$
4. Errors are homoskedastic:
   $$\mathrm{Var}(u_t \mid x_t) = \sigma^2 \ \forall t \tag{7}$$
5. There is no serial correlation between the errors:
   $$E(u_t u_s \mid x_t, x_s) = 0 \ \forall t \neq s \tag{8}$$

As the absence of perfect multicollinearity has already been shown in the previous section and the linearity of the parameters can be seen in Eq. (1), tests for stationarity as well as for zero conditional mean expectation, homoskedasticity and serial correlation need to be conducted. Table 10 shows the results of the tests applied for each of those assumptions; the intuition behind those results is discussed in the last section.


Table 10 Results of statistical testing

Assumption                          Test applied              Test statistic   P-value   Null hypothesis
Stationarity                        Augmented Dickey-Fuller   −2.4518          0.387     Not rejected
Zero conditional mean expectation   Breusch-Godfrey           0.020592         0.8859    Not rejected
Homoskedasticity                    Breusch-Pagan             355.61           0.0000    Rejected
No serial correlation               Box-Pierce                1268             0.0000    Rejected
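The diagnostic tests of Table 10 are available in statsmodels; the following sketch shows how they could be run on the fitted model, again assuming the objects from the previous sketches (recent statsmodels versions return the Ljung-Box/Box-Pierce results as a DataFrame). It is an illustration of the testing procedure, not the authors' exact implementation.

```python
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_breusch_godfrey, het_breuschpagan, acorr_ljungbox

# Stationarity of the dependent variable (Augmented Dickey-Fuller, null: non-stationarity).
adf_stat, adf_pvalue, *_ = adfuller(y)

# Breusch-Godfrey test on the fitted model (null: no autocorrelation, one lag).
bg_stat, bg_pvalue, _, _ = acorr_breusch_godfrey(model, nlags=1)

# Breusch-Pagan test on the residuals (null: homoskedasticity).
bp_stat, bp_pvalue, _, _ = het_breuschpagan(model.resid, X)

# Box-Pierce statistic, obtained via the Ljung-Box routine with boxpierce=True.
box = acorr_ljungbox(model.resid, lags=[1], boxpierce=True)

print(f"ADF: {adf_stat:.4f} (p = {adf_pvalue:.3f})")
print(f"Breusch-Godfrey: {bg_stat:.4f} (p = {bg_pvalue:.4f})")
print(f"Breusch-Pagan: {bp_stat:.2f} (p = {bp_pvalue:.4f})")
print(box[["bp_stat", "bp_pvalue"]])
```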

6 Conclusion and Limitations

The literature review found that researchers have so far investigated many different attributes and behaviors of Bitcoin and other cryptocurrencies, such as its use as a means of portfolio diversification (e.g., [11, 42]), whether it exhibits traits that can be summarized in patterns (e.g., [31, 38]) or which factors contribute to the acceptance of Bitcoin and other cryptocurrencies in society (e.g., [15, 34]). The most-researched variables of each area of research were then selected to be included in the regression model. In a next step the price of Bitcoin was regressed on the selected variables (including lagged versions of the Bitcoin price and of the dummy variable for security breaches). The findings were that only Google Trends, the lagged Bitcoin price and the number of wallets have a statistically significant positive impact on the Bitcoin price. The influence of those variables goes in the direction one would expect, as an increase in Bitcoin wallets as well as an increase in Google searches regarding Bitcoin signals a heightened interest in, and therefore a stronger demand for, the cryptocurrency. On the other hand, a statistically significant negative influence was found for the dummy indicating whether a security breach was reported on that day and for the average transaction fee in US dollars. These results are in line with expectations as well, as one would assume that trading Bitcoin becomes less attractive if the transaction fee rises, which causes a decrease in demand for Bitcoin (transactions). Furthermore, one would also assume that a reported security breach causes a price decrease, as investors panic and divest their money from Bitcoin.

However, when considering these results the testing conducted in Sect. 5.1 needs to be taken into account. Even though the correlation matrices showed no indication of perfect multicollinearity, issues were already encountered when testing for stationarity. The Augmented Dickey-Fuller test [63] tests the null hypothesis of non-stationarity, and as shown in the previous section this hypothesis cannot be rejected, already rendering the model invalid with regard to the five assumptions. Non-stationarity is an indicator of a spurious regression [75], which may be caused by the fact that both the Bitcoin price and some of the regressors experienced a steady upward trend in the investigated time period. Regarding the zero conditional mean expectation and no serial correlation assumptions, two similar tests were applied, the Breusch-Godfrey test [76, 77] and the Box-Pierce test [78], both under the null hypothesis of no autocorrelation for a one-lagged period.


Interestingly, the tests yield conflicting results: the Breusch-Godfrey test does not reject, implying the absence of autocorrelation, while the Box-Pierce test rejects at a highly significant level, implying autocorrelation. A further investigation of the zero conditional mean expectation assumption would require the application of Hausman tests [79] for variables susceptible to endogeneity; however, as the Augmented Dickey-Fuller test and the Breusch-Godfrey test have already indicated severe issues with the regressors, these tests have been omitted. Nonetheless, a Breusch-Pagan test [80] was conducted to account for possible heteroskedasticity issues. The null hypothesis of homoskedasticity was rejected at a highly significant level, implying a heteroskedasticity issue with the model.

In conclusion, it has to be noted that even though some of the estimates of the model appear to be highly significant and in line with intuition, the testing results indicate that the model is biased, inconsistent and therefore also not efficient [74]. Ultimately, it can be concluded that Bitcoin pricing is far too complex, and research about it not yet advanced enough, to build a model that simply pieces together the single parts of research about its different attributes. One could possibly implement countermeasures for the issues discovered; however, as the goal of this paper was to determine whether solely combining existing research about Bitcoin pricing can lead to an efficient model, the task of finding measures and techniques to adjust, repair and improve the model is reserved for future research.

References

1. CoinDesk.com: CoinDesk—Bitcoin. CoinDesk. https://www.coindesk.com/price/bitcoin (2020)
2. Morgan Stanley Investment Management: EDGE: Blockchain. Morgan Stanley. https://www.morganstanley.com/im/publication/insights/investment-insights/ii_theedgeblockchain_us.pdf (2018)
3. La Monica, P.R.: Bitcoin is back—but can the comeback last? CNN, 15 Nov 2019. https://edition.cnn.com/2019/11/15/investing/bitcoin-prices/index.html
4. Abraham, J., Sutiksno, D.U., Kurniasih, N., Warokka, A.: Acceptance and penetration of bitcoin: the role of psychological distance and national culture. SAGE Open 9(3), 215824401986581 (2019). https://doi.org/10.1177/2158244019865813
5. Parino, F., Beiro, M.G., Gauvin, L.: Analysis of the bitcoin blockchain: socio-economic factors behind the adoption [Physics] (2018). http://arxiv.org/abs/1804.07657
6. Bauer, C., Strauss, C.: Location-based advertising on mobile devices: a literature review and analysis. Manag. Rev. Q. 66(3), 159–194 (2016). https://doi.org/10.1007/s11301-015-0118-z
7. Ciaian, P., Rajcaniova, M., Kancs, D.: The digital agenda of virtual currencies: can BitCoin become a global currency? Int. Syst. E-Bus. Manage. 14(4), 883–919 (2016). https://doi.org/10.1007/s10257-016-0304-0
8. Abraham, M.: Studying the patterns and long-run dynamics in cryptocurrency prices. J. Corp. Account. Finance (2019). https://doi.org/10.1002/jcaf.22427
9. Ajouz, M., Abdullah, A., Kassim, S.: Acceptance of Sharīah-compliant precious metal-backed cryptocurrency as an alternative currency: an empirical validation of adoption of innovation theory. Thunderbird Int. Bus. Rev. 62(2), 171–181 (2020). https://doi.org/10.1002/tie.22106


10. Alexander, C., Choi, J., Park, H., Sohn, S.: BitMEX bitcoin derivatives: price discovery, informational efficiency, and hedging effectiveness. J. Futur. Mark. 40(1), 23–43 (2020). https:// doi.org/10.1002/fut.22050 11. Alfieri, E., Burlacu, R., Enjolras, G.: On the nature and financial performance of bitcoin. J. Risk Finance 20(2), 114–137 (2019). https://doi.org/10.1108/JRF-03-2018-0035 12. Alzaatreh, A., Sulieman, H.: On fitting cryptocurrency log-return exchange rates. Empir. Econ. (2019). https://doi.org/10.1007/s00181-019-01782-6 13. Amaral, V.L., Affonso, E.T.F., Silva, A.M., Moita, G.F., Almeida, P.E.M.: New fuzzy approaches to cryptocurrencies investment recommendation systems. In: Kearfott, R.B., Batyrshin, I., Reformat, M., Ceberio, M., Kreinovich, V. (Eds.), Fuzzy Techniques: Theory and Applications, Vol. 1000, pp. 135–147. Springer International Publishing (2019). https://doi. org/10.1007/978-3-030-21920-8_13 14. Aste, T.: Cryptocurrency market structure: connecting emotions and economics. Digit. Finance 1(1–4), 5–21 (2019). https://doi.org/10.1007/s42521-019-00008-9 15. Bashir, M., et al.: What motivates people to use bitcoin? In: 8th International Conference, SOCINFO 2016, New York, NY (2016) 16. Caporale, G.M., Plastun, O.: Price overreactions in the cryptocurrency market. SSRN Electron. J. (2018). https://doi.org/10.2139/ssrn.3113177 17. Caporale, G.M., Plastun, A., Oliinyk, V.: Bitcoin fluctuations and the frequency of price overreactions. Financ. Mark. Portf. Manag. 33(2), 109–131 (2019). https://doi.org/10.1007/s11408019-00332-5 18. Chakravarty, K., Pandey, M., Routaray, S.: Bitcoin prediction and time series analysis. In: Haldorai, A., Ramu, A., Mohanram, S., Onn, C.C. (Eds.), EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing, pp. 381–391. Springer International Publishing (2020). https://doi.org/10.1007/978-3-030-19562-5_39 19. Chatterjee, J.M., Son, L.H., Ghatak, S., Kumar, R., Khari, M.: BitCoin exclusively informational money: a valuable review from 2010 to 2017. Qual. Quant. 52(5), 2037–2054 (2018). https://doi.org/10.1007/s11135-017-0605-5 20. Cheong, C.W.H.: Cryptocurrencies vs global foreign exchange risk. J. Risk Finance 20(4), 330–351 (2019). https://doi.org/10.1108/JRF-11-2018-0178 21. De Stefani, et al.: A multivariate and multi-step ahead machine learning approach to traditional and cryptocorrencies volatility forecasting. In: Nemesis 2018, Urbreas 2018, Sogood 2018, IWAISE 2018, and Green Data Mining 2018, Dublin, Ireland, 10–14 Sept 2018, Proceedings. New York, NY (2019) 22. Deng, et al.: Research on the pricing strategy of the cryptocurrency miner’s market. In: Chen, S., Wang, H.J., Zhang, L.-J. (Eds.), Blockchain—ICBC 2018: First International Congress, Held as Part of the Services Conference Federation, SCF 2018, Seattle, WA, 25–30 June 2018: Proceedings. Springer (2018) 23. Dorfleitner, G., Lung, C.: Cryptocurrencies from the perspective of euro investors: a reexamination of diversification benefits and a new day-of-the-week effect. J. Asset Manag. 19(7), 472–494 (2018). https://doi.org/10.1057/s41260-018-0093-8 24. Figá-Talamanca, G., Patacca, M.: Does market attention affect bitcoin returns and volatility? Decis. Econ. Finan. 42(1), 135–155 (2019). https://doi.org/10.1007/s10203-019-00258-7 25. Guo, L., Li, X.J.: Risk analysis of cryptocurrency as an alternative asset class. In: Härdle, W.K., Chen, C.Y.-H., Overbeck, L. (Eds.), Applied Quantitative Finance, pp. 309–329. Springer Berlin Heidelberg (2017). 
https://doi.org/10.1007/978-3-662-54486-0_16 26. Haferkorn, M., Quintana Diaz, J.M.: Seasonality and interconnectivity within cryptocurrencies—an analysis on the basis of bitcoin, litecoin and namecoin. In: Lugmayr, A. (Ed.), Enterprise Applications and Services in the Finance Industry, vol. 217, pp. 106–120. Springer International Publishing (2015). https://doi.org/10.1007/978-3-319-28151-3_8 27. Heine Felix, T., von Eije, H.: Underpricing in the cryptocurrency world: evidence from initial coin offerings. Manag. Financ. 45(4), 563–578 (2019). https://doi.org/10.1108/MF-06-20180281


28. Jain, P.K., McInish, T.H., Miller, J.L.: Insights from bitcoin trading. Financ. Manage. 48(4), 1031–1048 (2019). https://doi.org/10.1111/fima.12299 29. Karalevicius, V., Degrande, N., De Weerdt, J.: Using sentiment analysis to predict interday bitcoin price movements. J. Risk Finance 19(1), 56–75 (2018). https://doi.org/10.1108/JRF06-2017-0092 30. Koutmos, D.: Market risk and bitcoin returns. Ann. Oper. Res. (2019). https://doi.org/10.1007/ s10479-019-03255-6 31. Li, Z.-Z., Tao, R., Su, C.-W., Lobon¸t, O.-R.: Does bitcoin bubble burst? Qual. Quant. 53(1), 91–105 (2019). https://doi.org/10.1007/s11135-018-0728-3 32. Liu, J., Serletis, A.: Volatility in the cryptocurrency market. Open Econ. Rev. 30(4), 779–811 (2019). https://doi.org/10.1007/s11079-019-09547-5 33. Ma, D., Tanizaki, H.: On the day-of-the-week effects of bitcoin markets: international evidence. China Finance Rev. Int. 9(4), 455–478 (2019). https://doi.org/10.1108/CFRI-12-2018-0158 34. Masciandaro, D.: Central bank digital cash and cryptocurrencies: insights from a new BaumolFriedman demand for money: central bank digital cash and cryptocurrencies. Aust. Econ. Rev. 51(4), 540–550 (2018). https://doi.org/10.1111/1467-8462.12304 35. Matta, M., Lunesu, I., Marchesi, M.: Is bitcoin’s market predictable? Analysis of web search and social media. In: Fred, A., Dietz, J.L.G., Aveiro, D., Liu, K., Filipe, J. (Eds.), Knowledge Discovery, Knowledge Engineering and Knowledge Management, vol. 631, pp. 155–172. Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-52758-1_10 36. Mendoza-Tello, J.C., Mora, H., Pujol-López, F.A., Lytras, M.D.: Disruptive innovation of cryptocurrencies in consumer acceptance and trust. Int. Syst. E-Bus. Manage. 17(2–4), 195–222 (2019). https://doi.org/10.1007/s10257-019-00415-w 37. Ming Wong, A.K.: The role of bitcoin in the monetary system: its development and the possible future. In: Yu, F.-L.T., Kwan, D.S. (Eds.), Contemporary Issues in International Political Economy, pp. 395–412. Springer Singapore (2019). https://doi.org/10.1007/978-981-13-64624_17 38. Moosa, I.A.: The bitcoin: a sparkling bubble or price discovery? J. Ind. Bus. Econ. (2019). https://doi.org/10.1007/s40812-019-00135-9 39. Naimy, V.Y., Hayek, M.R.: Modelling and predicting the bitcoin volatility using GARCH models. Int. J. Math. Model. Numer. Optim. 8(3), 197 (2018). https://doi.org/10.1504/IJM MNO.2018.088994 40. Othman, A.H.A., Alhabshi, S.M., Kassim, S., Sharofiddin, A.: The impact of cryptocurrencies market development on banks’ deposits variability in the GCC region. J. Financ. Econ. Policy (ahead-of-print) (2019). https://doi.org/10.1108/JFEP-02-2019-0036 41. Papathanasiou, S., Papamatthaiou, N., Balios, D.P.: Bitcoin as an alternative digital currency: exploring the publics’ perception vs. experts. Int. J. Financ. Eng. Risk Manag. 3(2), 146 (2019). https://doi.org/10.1504/IJFERM.2019.101296 42. Poyser, O.: Exploring the dynamics of bitcoin’s price: a Bayesian structural time series approach. Eurasian Econ. Rev. 9(1), 29–60 (2019). https://doi.org/10.1007/s40822-018-0108-2 43. Priya, A., Garg, S.: A comparison of prediction capabilities of Bayesian regularization and Levenberg–Marquardt training algorithms for cryptocurrencies. In: Satapathy, S.C., Bhateja, V., Mohanty, J.R., Udgata, S.K. (Eds.), Smart Intelligent Computing and Applications, vol. 159, pp. 657–664. Springer Singapore. https://doi.org/10.1007/978-981-13-9282-5_62 (2020) 44. 
44. Ricci, P.: How economic freedom reflects on the bitcoin transaction network. J. Ind. Bus. Econ. (2019). https://doi.org/10.1007/s40812-019-00143-9
45. Romanchenko, O., Shemetkova, O., Piatanova, V., Kornienko, D.: Approach of estimation of the fair value of assets on a cryptocurrency market. In: Antipova, T., Rocha, A. (Eds.), Digital Science, vol. 850, pp. 245–253. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-02351-5_29
46. Sahoo, P.K., Sethi, D., Acharya, D.: Is bitcoin a near stock? Linear and non-linear causal evidence from a price–volume relationship. Int. J. Manag. Finance (2019). https://doi.org/10.1108/IJMF-06-2017-0107

47. Sánchez, E., Olivas, J.A., Romero, F.P.: Data analytics for the cryptocurrencies behavior. In: Naiouf, M., Chichizola, F., Rucci, E. (Eds.), Cloud Computing and Big Data, vol. 1050, pp. 86–97. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-27713-0_8
48. Shorish, J.: Hedonic Pricing of Cryptocurrency Tokens [Preprint]. SocArXiv (2018). https://doi.org/10.31235/osf.io/wdg2v
49. Shrestha, K.: Multifractal detrended fluctuation analysis of return on bitcoin. Int. Rev. Financ. (2019). https://doi.org/10.1111/irfi.12256
50. Soloviev, V.N., Belinskiy, A.: Complex systems theory and crashes of cryptocurrency market. In: Ermolayev, V., Suárez-Figueroa, M.C., Yakovyna, V., Mayr, H.C., Nikitchenko, M., Spivakovsky, A. (Eds.), Information and Communication Technologies in Education, Research, and Industrial Applications, vol. 1007, pp. 276–297. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-13929-2_14
51. Ullrich, J., Stifter, N., Judmayer, A., Dabrowski, A., Weippl, E.: Proof-of-blackouts? How proof-of-work cryptocurrencies could affect power grids. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (Eds.), Research in Attacks, Intrusions, and Defenses, vol. 11050, pp. 184–203. Springer International Publishing (2018). https://doi.org/10.1007/978-3-030-00470-5_9
52. Vardar, G., Aydogan, B.: Return and volatility spillovers between bitcoin and other asset classes in Turkey: evidence from VAR-BEKK-GARCH approach. EuroMed J. Bus. 14(3), 209–220 (2019). https://doi.org/10.1108/EMJB-10-2018-0066
53. Vogiazas, S., Alexiou, C.: Bitcoin: the road to hell is paved with good promises. Econ. Notes 48(1), 12119 (2019). https://doi.org/10.1111/ecno.12119
54. Wang, M., Wu, Q., Qin, B., Wang, Q., Liu, J., Guan, Z.: Lightweight and manageable digital evidence preservation system on bitcoin. J. Comput. Sci. Technol. 33(3), 568–586 (2018). https://doi.org/10.1007/s11390-018-1841-4
55. Wang, P., Zhang, W., Li, X., Shen, D.: Trading volume and return volatility of bitcoin market: evidence for the sequential information arrival hypothesis. J. Econ. Interact. Coord. 14(2), 377–418 (2019). https://doi.org/10.1007/s11403-019-00250-9
56. Wolk, K.: Advanced social media sentiment analysis for short-term cryptocurrency price prediction. Expert Syst. (2019)
57. Yalaman, A.: Bitcoin jumps and speculations: empirical evidence from high-frequency data. In: Hacioglu, U. (Ed.), Digital Business Strategies in Blockchain Ecosystems, pp. 617–629. Springer International Publishing (2020). https://doi.org/10.1007/978-3-030-29739-8_29
58. Yang, L., Liu, X.-Y., Li, X., Li, Y.: Price prediction of cryptocurrency: an empirical study. In: Qiu, M. (Ed.), Smart Blockchain, vol. 11911, pp. 130–139. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-34083-4_13
59. Zamuda, A., Crescimanna, V., Burguillo, J.C., Matos Dias, J., Wegrzyn-Wolska, K., Rached, I., González-Vélez, H., Senkerik, R., Pop, C., Cioara, T., Salomie, I., Bracciali, A.: Forecasting cryptocurrency value by sentiment analysis: an HPC-oriented survey of the state-of-the-art in the cloud era. In: Kołodziej, J., González-Vélez, H. (Eds.), High-Performance Modelling and Simulation for Big Data Applications, vol. 11400, pp. 325–349. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-16272-6_12
60. Zhang, S., Zhou, X., Pan, H., Jia, J.: Cryptocurrency, confirmatory bias and news readability—evidence from the largest Chinese cryptocurrency exchange. Account. Finance 58(5), 1445–1468 (2019). https://doi.org/10.1111/acfi.12454
61. Zhou, S.: Exploring the driving forces of the bitcoin currency exchange rate dynamics: an EGARCH approach. Empir. Econ. (2019). https://doi.org/10.1007/s00181-019-01776-4
62. Zhu, Y., Dickinson, D., Li, J.: Analysis on the influence factors of Bitcoin’s price based on VEC model (2017)
63. Fuller, W.A.: Introduction to Statistical Time Series, 2nd edn. Wiley (1996)
64. Goldberger, A.S.: Econometric Theory, 12th printing. Wiley (1980)
65. Coinmetrics.io: Data Files. Coinmetrics. https://coinmetrics.io/data-downloads/ (2020)
66. CryptoSec.info: Documented Timeline of Exchange Hacks. CryptoSec.Info. https://cryptosec.info/exchange-hacks/ (2020)

67. Investing.com: MSCI All-Country World Equity Index. Investing.Com. https://www.investing.com/indices/msci-world-stock-historical-data (2020c)
68. Investing.com: BTC/USD—Bitcoin US Dollar. Investing.Com. https://www.investing.com/crypto/bitcoin/btc-usd-historical-data (2020a)
69. Quandl.com: Gold Prices (Daily)—Currency USD. Quandl.Com. https://www.quandl.com/data/WGC/GOLD_DAILY_USD-Gold-Prices-Daily-Currency-USD (2020b)
70. Investing.com: EUR/USD—Euro US Dollar. Investing.Com. https://www.investing.com/currencies/eur-usd-historical-data (2020b)
71. Google Trends: Search Term Bitcoin—Interest Over Time. Google Trends. https://trends.google.com/trends/explore?date=2014-12-27%202019-12-31&q=bitcoin (2020)
72. Quandl.com: Bitcoin My Wallet Number of Users. Quandl.Com. https://www.quandl.com/data/BCHAIN/MWNUS-Bitcoin-My-Wallet-Number-of-Users (2020a)
73. FED St. Louis: Effective Federal Funds Rate. FED St. Louis. https://fred.stlouisfed.org/series/FEDFUNDS (2020)
74. Wooldridge, J.M.: Introductory Econometrics: A Modern Approach, 6th edn. Cengage Learning (2016)
75. Granger, C.W.J., Newbold, P.: Spurious regressions in econometrics. J. Econom. 2(2), 111–120 (1974). https://doi.org/10.1016/0304-4076(74)90034-7
76. Breusch, T.S.: Testing for autocorrelation in dynamic linear models. Aust. Econ. Pap. 17(31), 334–355 (1978). https://doi.org/10.1111/j.1467-8454.1978.tb00635.x
77. Godfrey, L.G.: Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46(6), 1293 (1978). https://doi.org/10.2307/1913829
78. Box, G.E.P., Pierce, D.A.: Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J. Am. Stat. Assoc. 65(332), 1509–1526 (1970). https://doi.org/10.1080/01621459.1970.10481180
79. Hausman, J.A.: Specification tests in econometrics. Econometrica 46(6), 1251 (1978). https://doi.org/10.2307/1913827
80. Breusch, T.S., Pagan, A.R.: A simple test for heteroscedasticity and random coefficient variation. Econometrica 47(5), 1287 (1979). https://doi.org/10.2307/1911963