Also of Interest

Online Social Network Analysis, vol. 1: Structure and Evolution
Binxing Fang, Yan Jia, 2019
ISBN 978-3-11-059606-9, e-ISBN (PDF) 978-3-11-059937-4, e-ISBN (EPUB) 978-3-11-059807-0

Online Social Network Analysis, vol. 2: Groups and Interaction
Binxing Fang, Yan Jia, 2019
ISBN 978-3-11-059777-6, e-ISBN (PDF) 978-3-11-059941-1, e-ISBN (EPUB) 978-3-11-059792-9

Online Social Network Analysis, vol. 3: Information and Communication
Binxing Fang, Yan Jia, 2019
ISBN 978-3-11-059784-4, e-ISBN (PDF) 978-3-11-059943-5, e-ISBN (EPUB) 978-3-11-059793-6

Intelligent Multimedia Data Analysis
Siddhartha Bhattacharyya, Indrajit Pan, Abhijit Das, Shibakali Gupta, 2019
ISBN 978-3-11-055031-3, e-ISBN (PDF) 978-3-11-055207-2, e-ISBN (EPUB) 978-3-11-055033-7

Context-Aware Computing
Ling Feng, 2017
ISBN 978-3-11-055568-4, e-ISBN (PDF) 978-3-11-055667-4, e-ISBN (EPUB) 978-3-11-055569-1
Personalized Human–Computer Interaction
Edited by Mirjam Augstein, Eelco Herder, and Wolfgang Wörndl
Mathematics Subject Classification 2010: 68U35

Editors

Mirjam Augstein
FH Hagenberg
Softwarepark 11
4232 Hagenberg
Austria
[email protected]

Eelco Herder
Radboud University Nijmegen
Institute for Computing and Information Sciences
Toernooiveld 212
6525 EC Nijmegen
The Netherlands
[email protected]

Wolfgang Wörndl
Technical University of Munich
Department for Informatics
Boltzmannstr. 3
85748 Garching b. München
Germany
[email protected]
ISBN 978-3-11-055247-8
e-ISBN (PDF) 978-3-11-055248-5
e-ISBN (EPUB) 978-3-11-055261-4

Library of Congress Control Number: 2019946022

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2019 Walter de Gruyter GmbH, Berlin/Boston
Cover image: LumineImages / iStock / Getty Images Plus
Typesetting: VTeX UAB, Lithuania
Printing and binding: CPI books GmbH, Leck
www.degruyter.com
Introduction

Decades of research on user modeling, personalization, and recommender systems have led to a solid body of general approaches, principles, algorithms, and tools. Personalization has become a core functionality in search engines, online stores, and social media feeds. In the area of human–computer interaction (HCI), personalization plays a prominent role as well. For instance, interaction with computer-based devices requires users to exhibit a wide range of physical and cognitive abilities, which differ from person to person. Further, most users have their own preferred interaction styles, modalities, devices, and user interfaces, which raises the need for personalization in all aspects of HCI. Even though personalization is a commonly adopted technology, many principles and insights from the research community have not yet been sufficiently applied.

User modeling refers to the process of collecting data about users and inferring a user model, which is then applied to customize and personalize systems. Personalized systems use this model to adapt the interaction with users to their requirements, preferences, background knowledge, restrictions, usage contexts, and/or goals. The adaptation can be carried out in different manners, e. g., by modifying a user interface according to the user’s capabilities or knowledge about the system, or by proposing interesting and relevant items to a user in a recommender system to reduce information overload.

In this book, core researchers present the state-of-the-art research and practice of adaptation and personalization from the perspective of HCI in a wide range of areas. The book chapters were elicited via a public call for chapters. We received 24 abstracts and accepted 11 full chapter submissions after a thorough selection and reviewing process. We have grouped the chapters into the following parts: (i) Foundations of user modeling, (ii) User input and feedback, and (iii) Personalization approaches.
Foundations of user modeling

The first chapter, “Theory-grounded user modeling for personalized HCI,” by Mark P. Graus and Bruce Ferwerda, presents a literature overview of models from psychological theory that can be used in personalization. The motivation is to leverage the theoretical understanding between behavior and user traits that can be used to characterize individual users. They propose a step-by-step approach on how to design personalized systems that take users’ traits into account.

Sarah Theres Völkel, Ramona Schödel, Daniel Buschek, Clemens Stachl, Quay Au, Bernd Bischl, Markus Bühner, and Heinrich Hußmann continue the foundations part in their chapter “Opportunities and challenges of utilizing personality traits for personalization in HCI.” They discuss opportunities and challenges of assessing and utilizing personality traits in personalized interactive systems and services. The chapter includes approaches to using personality traits for recommender systems and several use cases for personality-aware personalization, such as personal communication between users.
User input and feedback

The second part of the book focuses on user input and feedback options in adaptive systems. Mirjam Augstein and Thomas Neumayr showcase personalized interaction in their chapter “Automated personalization of input methods and processes.” They present a software framework that provides a template for a feasible technical infrastructure. Furthermore, the authors explain a specific case study of personalized interaction that was implemented based on the framework, and discuss the evaluation process and results for this use case.

Tobias Moebert, Jan N. Schneider, Dietmar Zoerner, Anna Tscherejkina, and Ulrike Lucke look at cause-and-effect models behind adaptive training systems in their chapter “How to use socio-emotional signals for adaptive training.” They explain mechanisms for implementing the models and also present empirical results from a study on the training of emotion recognition by people with autism as an example. They present two approaches: the first extends the algorithm regarding dimensions of difficulty in social cognition; the second makes use of socio-emotional signals of the learners in order to further adapt the training system.

“Explanations and user control in recommender systems” by Dietmar Jannach, Michael Jugovac, and Ingrid Nunes reviews explanations and feedback mechanisms in recommender systems. Often, these systems are black boxes for users and do not provide information on why items were recommended. In addition, users frequently have very limited means to control the recommendations, which may lead to limited trust and acceptance.
Personalization approaches The third part of the book is about the application of adaptation and personalization in interactive systems in various domains. “Tourist trip recommendations – foundations, state of the art, and challenges” by Daniel Herzog, Linus W. Dietz, and Wolfgang Wörndl surveys the field of Tourist Trip Design Problems (TTDPs). TTDPs deal with the task of supporting tourists in creating personalized trips with sets or sequences of points of interest or other travel-related items. The authors present trip recommender systems with a focus on recommendation techniques, data analysis, and user interfaces.
Continuing in the tourism domain, Wilfried Grossmann, Mete Sertkan, Julia Neidhardt, and Hannes Werthner present their chapter “Pictures as a tool for matching tourist preferences with destinations.” They introduce a so-called Seven-Factor Model for characterizing the preferences of tourists by assigning values in this model with a picture-based approach. For this purpose, users select pictures that represent various personality aspects and destination descriptions. The authors evaluated their profile acquisition method with a study using data from a travel agency. The chapter “Towards personalized virtual reality touring through cross-object user interfaces” by Xiangdong Li, Yunzhan Zhou, Wenqian Chen, Preben Hansen, Weidong Geng, and Lingyun Sun is about real-time adaptation in virtual environments. The authors propose cross-object user interfaces for personalized virtual reality touring. The approach consists of two components: a Deep Learning algorithm-based model to predict the user’s visual attention from the past eye movement patterns in order to determine which virtual objects are likely to be viewed next, and delivery mechanisms that determine what should be displayed and where and when on the user interfaces. Music recommender systems represent a widely adopted application area for personalized systems and interfaces. In their chapter “User awareness in music recommender systems,” Peter Knees, Markus Schedl, Bruce Ferwerda, and Audrey Laplante focus on the listener’s aspects of music recommender systems. The authors review different factors that influence relevance for music recommendation, for example the individual listener’s background and context. This is complemented by a discussion on user-centric evaluation strategies for music recommender systems and a reflection on current barriers as well as on strategies to overcome them. The chapter “Personalizing the user interface for people with disabilities” by Julio Abascal, Olatz Arbelaitz, Xabier Gardeazabal, Javier Muguerza, Juan E. Pérez, Xabier Valencia, and Ainhoa Yera deals with user interface personalization for people with disabilities. The authors present methods and techniques that are applied to research and practice in this important application area for personalized HCI. They outline possible approaches for diverse application fields where personalization is required, for example accessibility to the web using transcoding or personalized eGovernment. The book concludes with the chapter “Adaptive workplace learning assistance” by Miloš Kravčík. He gives a reflective overview of the progress of adaptive workplace learning assistance and discusses three important areas of lifelong and workplace learning which correspond to basic theories of learning. The author highlights several prospective approaches of learning technology that aim to address current issues and lead to better personalization of learning experiences.
Contents

Introduction | V
List of Contributing Authors | XI

Part I: Foundations of user modeling

Mark P. Graus and Bruce Ferwerda
1 Theory-grounded user modeling for personalized HCI | 3

Sarah Theres Völkel, Ramona Schödel, Daniel Buschek, Clemens Stachl, Quay Au, Bernd Bischl, Markus Bühner, and Heinrich Hussmann
2 Opportunities and challenges of utilizing personality traits for personalization in HCI | 31

Part II: User input and feedback

Mirjam Augstein and Thomas Neumayr
3 Automated personalization of input methods and processes | 67

Tobias Moebert, Jan N. Schneider, Dietmar Zoerner, Anna Tscherejkina, and Ulrike Lucke
4 How to use socio-emotional signals for adaptive training | 103

Dietmar Jannach, Michael Jugovac, and Ingrid Nunes
5 Explanations and user control in recommender systems | 133

Part III: Personalization approaches

Daniel Herzog, Linus W. Dietz, and Wolfgang Wörndl
6 Tourist trip recommendations – foundations, state of the art, and challenges | 159

Wilfried Grossmann, Mete Sertkan, Julia Neidhardt, and Hannes Werthner
7 Pictures as a tool for matching tourist preferences with destinations | 183

Xiangdong Li, Yunzhan Zhou, Wenqian Chen, Preben Hansen, Weidong Geng, and Lingyun Sun
8 Towards personalized virtual reality touring through cross-object user interfaces | 201

Peter Knees, Markus Schedl, Bruce Ferwerda, and Audrey Laplante
9 User awareness in music recommender systems | 223

Julio Abascal, Olatz Arbelaitz, Xabier Gardeazabal, Javier Muguerza, J. Eduardo Pérez, Xabier Valencia, and Ainhoa Yera
10 Personalizing the user interface for people with disabilities | 253

Miloš Kravčík
11 Adaptive workplace learning assistance | 283

Index | 303
List of Contributing Authors Julio Abascal Egokituz: Laboratory of Human-Computer Interaction for Special Needs University of the Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain Olatz Arbelaitz ALDAPA: ALgorithms, DAta mining and Parallelism University of the Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain Quay Au Department for Statistics, Computational Statistics Ludwig-Maximilians-Universität München Munich Germany Mirjam Augstein Department of Communication and Knowledge Media University of Applied Sciences Upper Austria Hagenberg Austria Daniel Buschek Institute for Informatics, Media Informatics Ludwig-Maximilians-Universität München Munich Germany Bernd Bischl Department for Statistics, Computational Statistics Ludwig-Maximilians-Universität München Munich Germany Markus Bühner Department of Psychology, Psychological Methods and Assessment Ludwig-Maximilians-Universität München Munich Germany
Wenqian Chen College of Computer Science and Technology Zhejiang University Hangzhou China Linus W. Dietz Department of Informatics Technical University of Munich Garching Germany Bruce Ferwerda Department of Computer Science and Informatics, School of Engineering Jönköping University Jönköping Sweden Xabier Gardeazabal Egokituz: Laboratory of Human-Computer Interaction for Special Needs University of the Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain Weidong Geng College of Computer Science and Technology Zhejiang University Hangzhou China Mark P. Graus Department of Human-Technology Interaction Eindhoven University of Technology The Netherlands Department of Marketing and Supply Chain Management, School of Business and Economics Maastricht University The Netherlands Wilfried Grossmann Faculty of Computer Science University of Vienna Vienna Austria
Preben Hansen College of Computer Science and Technology Zhejiang University Hangzhou China
Ulrike Lucke Department of Computer Science University of Potsdam Potsdam Germany
Daniel Herzog Department of Informatics Technical University of Munich Garching Germany
Tobias Moebert Department of Computer Science University of Potsdam Potsdam Germany
Heinrich Hußmann Institute for Informatics, Media Informatics Ludwig-Maximilians-Universität München Munich Germany
Javier Muguerza ALDAPA: ALgorithms, DAta mining and Parallelism University of the Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain
Dietmar Jannach University of Klagenfurt Klagenfurt Austria Michael Jugovac Technical University of Dortmund Dortmund Germany Peter Knees Institute of Information Systems Engineering, Faculty of Informatics TU Wien Vienna Austria Miloš Kravčík DFKI – Educational Technology Lab Berlin Germany
Julia Neidhardt Research Unit of E-Commerce, Institute of Information Systems Engineering University of Vienna Vienna Austria Thomas Neumayr Department of Communication and Knowledge Media University of Applied Sciences Upper Austria Wels Austria Ingrid Nunes Federal University of Rio Grande do Sul Porto Alegre Brazil
Audrey Laplante École de bibliothéconomie et des sciences de l’information Université de Montréal Quebéc Canada
Juan Eduardo Pérez Egokituz: Laboratory of Human-Computer Interaction for Special Needs University of the Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain
Xiangdong Li College of Computer Science and Technology Zhejiang University Hangzhou China
Markus Schedl Institute of Computational Perception Johannes Kepler University Linz Linz Austria
Jan N. Schneider Berlin School of Mind and Brain Humboldt-Universität zu Berlin Berlin Germany Ramona Schödel Department of Psychology, Psychological Methods and Assessment Ludwig-Maximilians-Universität München Munich Germany Mete Sertkan Research Unit of E-Commerce, Institute of Information Systems Engineering University of Vienna Vienna Austria Clemens Stachl Department of Psychology, Psychological Methods and Assessment Ludwig-Maximilians-Universität München Munich Germany Lingyun Sun College of Computer Science and Technology Zhejiang University Hangzhou China Anna Tscherejkina Department of Computer Science University of Potsdam Potsdam Germany Xabier Valencia Egokituz: Laboratory of Human-Computer Interaction for Special Needs University of the
Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain Sarah Theres Völkel Institute for Informatics, Media Informatics Ludwig-Maximilians-Universität München Munich Germany Hannes Werthner Research Unit of E-Commerce, Institute of Information Systems Engineering University of Vienna Vienna Austria Wolfgang Wörndl Department of Informatics Technical University of Munich Garching Germany Ainhoa Yera ALDAPA: ALgorithms, DAta mining and Parallelism University of the Basque Country/Euskal Herriko Unibertsitatea Informatika Fakultatea San Sebastian Spain Yunzhan Zhou College of Computer Science and Technology Zhejiang University Hangzhou China Dietmar Zoerner Department of Computer Science University of Potsdam Potsdam Germany
Part I: Foundations of user modeling
Mark P. Graus and Bruce Ferwerda
1 Theory-grounded user modeling for personalized HCI

Abstract: Personalized systems are systems that adapt themselves to meet the inferred needs of individual users. The majority of personalized systems mainly rely on data describing how users interacted with these systems. A common approach is to use historical data to predict users’ future needs, preferences, and behavior and to subsequently adapt the system to these predictions. However, this adaptation is often done without leveraging the theoretical understanding between behavior and user traits that can be used to characterize individual users or the relationship between user traits and needs that can be used to adapt the system. Adopting a more theoretical perspective can benefit personalization in two ways: (i) letting systems rely on theory can reduce the need for extensive data-driven analysis, and (ii) interpreting the outcomes of data-driven analysis (such as predictive models) from a theoretical perspective can expand our knowledge about users. However, incorporating theoretical knowledge in personalization brings forth a number of challenges. In this chapter, we review literature that taps into aspects of (i) psychological models from traditional psychological theory that can be used in personalization, (ii) relationships between psychological models and online behavior, (iii) automated inference of psychological models from data, and (iv) how to incorporate psychological models in personalized systems. Finally, we propose a step-by-step approach on how to design personalized systems that take user traits into account.

Keywords: personalization, psychological models, cognitive models, psychology, user modeling, theory-driven
1 Introduction

Personalization is performed by adapting aspects of systems to match individual users’ needs in order to improve efficiency, effectiveness, and satisfaction. Current personalization strategies are mainly data-driven in the sense that they are based on the way users have been and are interacting with a system, after which the system is dynamically adapted to match inferred user needs. The more theory-driven counterparts of personalization are often designed based on general knowledge about how user traits influence user needs, and how these needs influence the requirements of a system. Systems are then adapted to individual users based on a set of rules. Although both strategies are used separately, the knowledge gained from both could be combined to achieve greater personalization possibilities.

To mitigate personalization challenges, current research has primarily focused on using historical data that describe interaction behavior. Using these data, personalization strategies are developed that predict users’ future interactions. The prediction of these future interactions is often done without leveraging the understanding of the relationship between user behavior and user traits. In other words, predictions are made without considering the root cause of the behavior that users are showing. A prominent direction using this approach is the field of recommender systems, in which historical behavioral data are used to alter the order of items in a catalog (from highest predicted relevance to lowest predicted relevance), with the goal of making users consume more items or helping them to find relevant items more easily [69]. By adopting a more theoretical perspective (often based on psychological literature), the root cause of behavior can be identified, which in turn benefits personalization opportunities.

Using a theoretical perspective can benefit personalization in two ways. (i) A large body of theoretical work can be used to inform personalized systems without the need for extensive data-driven analysis. For example, research has shown that it might be beneficial to adapt the way course material is presented to match students’ working memory capacity [42]. And (ii) including theory can help to interpret the results gained from the data-driven perspective and thereby meaningfully expand our knowledge about users. For example, research on music players has demonstrated that different types of people base their decisions on what to listen to on different sources of information [29].

Although adopting a more theoretical perspective by considering the relationship between user behavior and user traits has been shown to benefit personalization, this perspective comes with theoretical and methodological challenges. A first challenge is to identify and measure user traits that play a role in the needs for personalization (e. g., cognitive style [77], personality [14], susceptibility to persuasive strategies [19]) and to capture these traits in a formal user model. A second challenge is to infer the relevant user traits from interaction behavior (e. g., inferring user preferences from historical ratings, or inferring a person’s personality from the content they share on social media). A third challenge is to identify the aspects of a system that can be altered based on these user traits. In certain cases, this is straightforward (e. g., altering the order of a list of items based on predicted relevance), while in other cases the required alterations can be more intricate and require more thought to implement (e. g., altering the way in which information is presented visually to match a user’s cognitive style). While the aforementioned challenges are interconnected, they are often addressed in isolation.

The current chapter provides an overview of work that relied on user traits for several (system) aspects:
– introduction of psychological models that are currently used in personalization;
– psychological models that have been linked to online behavior;
– automatic inference of psychological models from behavioral data;
– incorporating psychological models in personalized systems or systems for personalization.
The literature discussed throughout the chapter can serve as a starting point for theory-grounded personalization in certain applications (e. g., e-learning, recommendations) and content domains (e. g., movies, music). Finally, the chapter concludes with a blueprint for designing personalized systems that take user traits into consideration.
2 Psychological models in personalization

Psychological models serve to explain how aspects of the environment influence human behavior and cognition. Since these models provide information on how people react to their surroundings, they can also be used to anticipate how people will react to aspects of technological systems and can thus provide insight into people’s needs in technological contexts. The proposition to use psychological models for personalization is not a new concept. Rich [89] already proposed the use of psychological stereotypes for personalizing digital systems in 1979. However, the current abundance of available user data has made personalization strategies adopt more data-driven approaches and move away from incorporating theoretical knowledge. While the availability of user data obviously benefits data-driven approaches, there are opportunities for theory-driven approaches as well to exploit the available data (e. g., the implicit acquisition of user traits). In the following section we lay out different models that are currently used in personalization. We then provide an overview of prior research on the relationship between psychological models and online behavior, continue with work that has looked at the automated inference of psychological models, and finally discuss the work on personalized systems based on psychological models.
2.1 Personality

Personality is a long-lasting research subject in psychology [2]. Personality is considered to be reflected in behavior through the coherent patterning of affection, cognition, and desires, and has been shown to be a stable construct over time [63]. Through the construct of personality, research has aimed to capture observable individual behavioral differences [21]. Traditional personality psychology has established numerous associations between personality and concepts such as happiness, physical and psychological health, spirituality, and identity at an individual level; the quality of relationships with peers, family, and romantic others at an interpersonal level; and occupational choice, satisfaction, and performance, as well as community involvement, criminal activity, and political ideology at a social institutional level (for an overview see [84]).

Different models have been developed to describe people’s personality. The most commonly used model is the Five-Factor Model (FFM; mostly used in academic research). The FFM found its roots in the lexical hypothesis, which proposes that personality traits and differences that are most important and relevant to people eventually become a part of their language. Thus the lexical hypothesis relies on the analysis of language to derive personality traits [2]. The notion of the lexical hypothesis was used by Cattell [14] to lay out the foundation of the FFM by identifying 16 distinct factors. Based on the identified 16 factors, Tupes and Christal [107] found recurrences among the factors that resulted in clusters representing the five personality traits that make up the FFM (see Table 1.1). The FFM thus describes personality in five factors (also called the Big Five personality traits): openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.1 Different measurements were created to assess the five personality factors, of which the Big Five Inventory (BFI: 44-item) [63] and the Ten-Item Personality Inventory (TIPI: 10-item) [47] are two commonly used surveys.

Table 1.1: Five-Factor Model adopted from John, Donahue, and Kentle [63].

General dimension | Primary factors
Openness to experience | Artistic, curious, imaginative, insightful, original, wide interest
Conscientiousness | Efficient, organized, planful, reliable, responsible, thorough
Extraversion | Active, assertive, energetic, enthusiastic, outgoing, talkative
Agreeableness | Appreciative, forgiving, generous, kind, sympathetic, trusting
Neuroticism | Anxious, self-pitying, tense, touchy, unstable, worrying

1 Also called the OCEAN model due to the acronym of the five factors.
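To make such trait measurements concrete, the following minimal sketch scores a TIPI response into the five FFM traits. It assumes the standard TIPI key (two 7-point items per trait, one of them reverse-coded); it is an illustration only, not code from any of the works cited in this chapter.

```python
# Minimal sketch: scoring a Ten-Item Personality Inventory (TIPI) response.
# Assumes the standard TIPI key (7-point items, one reverse-coded item per trait);
# illustrative only, not taken from the studies cited in this chapter.

TIPI_KEY = {
    # trait: (item answered in the keyed direction, reverse-coded item)
    "extraversion":           (1, 6),
    "agreeableness":          (7, 2),
    "conscientiousness":      (3, 8),
    "emotional_stability":    (9, 4),   # the inverse of neuroticism
    "openness_to_experience": (5, 10),
}

def score_tipi(responses: dict[int, int]) -> dict[str, float]:
    """Map item number -> answer (1..7) to one score per FFM trait."""
    scores = {}
    for trait, (direct, reversed_item) in TIPI_KEY.items():
        reversed_score = 8 - responses[reversed_item]   # flip the 7-point scale
        scores[trait] = (responses[direct] + reversed_score) / 2
    return scores

if __name__ == "__main__":
    answers = {1: 5, 2: 3, 3: 6, 4: 2, 5: 7, 6: 4, 7: 6, 8: 2, 9: 5, 10: 1}
    print(score_tipi(answers))
```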
2.2 Cognitive styles

Cognitive styles refer to the psychological dimensions that determine individuals’ modes of perceiving, remembering, thinking, and problem-solving [56, 77]. Different cognitive styles have been identified to indicate individual processes, such as analytic-holistic and verbal-visual. In an attempt to make sense of the fuzziness of different kinds of cognitive styles, Miller [78] proposed a hierarchical framework for systematizing cognitive styles (particularly the analytic-holistic dimension) by connecting them to different stages of cognitive processes2 (see Fig. 1.1). To measure general cognitive styles, multiple questionnaires have been developed. The Cognitive Style Index (CSI; Hayes and Allinson [56]) is often used. The CSI consists of 38 items and assigns a score along a single dimension ranging from intuitive to analytical. The items are based on the following cognitive style dimensions: active-passive, analytic-holistic, and intuitive-systematic.

2 We acknowledge the existence of basic, higher, and complex cognitive processes. However, since this chapter focuses on personalization strategies, the focus is put on individual differences (i. e., cognitive styles) rather than on the generic cognitive processes (e. g., perception, memory, thought).
Figure 1.1: A model of cognitive styles and processes adopted from Miller [78].

An alternative measurement to the CSI is the Cognitive Style Analysis (CSA; [90]), which assigns scores to the analytic-holistic and verbal-visual dimensions. Although it is recognized that individual differences exist in general cognitive functioning, their effects are often diluted by overlapping characteristics of humans. Cognitive styles have been shown to be a better predictor for particular situations and tasks than for general functioning [70]. For example, cognitive styles have been shown to be related to academic achievements among students (see Coffield et al. [20] for an overview). Despite the domain-dependent variations of cognitive styles that adhere to their own measurements (e. g., the learning style questionnaire [57] to assess learning styles), studies have shown that correlations between learning styles and cognitive styles exist (see Allinson and Hayes [1] for an overview). In the following sections, we discuss the domain-dependent cognitive styles that are currently used for the purpose of personalization.

2.2.1 Learning styles

The educational field has given much attention to identifying individual differences based on a subset of cognitive styles, namely, learning styles. Messick [77] discussed the merit of using cognitive styles to characterize people in an educational setting. Related to people having different preferences for processing information, as described by cognitive styles, people have different preferences for acquiring knowledge, which are captured in learning styles. In applications whose goal is to assist people in learning, learning styles are a logical candidate to base personalization on. Coffield et al. [20] provided an extensive overview of these learning styles, comprising a selection of 350 papers from over 3000 references, in which they identify 13 key models of learning styles.
Aside from this overview, they provide references to surveys to measure learning styles, together with a list of studies in which these surveys are validated. Two notable models are the Learning Style Inventory (LSI) by Kolb [68] and the Learning Styles Questionnaire (LSQ) by Honey and Mumford [57]. The LSI assesses learning styles through a 100-item self-report questionnaire indicating preferences for environmental (e. g., temperature), emotional (e. g., persistence), sociological (e. g., working alone or with peers), physical (e. g., modality preferences), and psychological factors (e. g., global-analytical). The LSQ uses an 80-item checklist to assess learning styles on four dimensions: activist, reflector, theorist, and pragmatist.

2.2.2 Personal styles

Whereas the previously described FFM of personality (see Section 2.1) is grounded in the lexical hypothesis, the Myers–Briggs model is based on cognitive styles. The Myers–Briggs model of personality is commonly used in the consultancy and training world. People’s scores on the Myers–Briggs model are measured by the Myers–Briggs Type Indicator (MBTI) [83], consisting of 50 questions to measure personality types. The MBTI describes a person’s personality across four dimensions:
1. extraversion-introversion (E vs. I): how a person gets energized;
2. sensing-intuition (S vs. N): how a person takes in information;
3. thinking-feeling (T vs. F): the means a person uses to make decisions;
4. judging-perceiving (J vs. P): the speed with which a person makes decisions.
The combination of these four dimensions results in one of 16 personality types, which are based on Jung’s personality theory from the early 1920s [64] (see Fig. 1.2).

Figure 1.2: The 16 MBTI personality type combinations based on the four personality dimensions. Reprinted from Wikipedia, by Jake Beech, 2014, retrieved from https://commons.wikimedia.org/wiki/File:MyersBriggsTypes.png. Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported license.
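As a minimal illustration of how the four dichotomies combine into one of the 16 type codes, the sketch below maps per-dimension preference scores to a four-letter type; the continuous scores and the zero cut-off are illustrative assumptions, not the actual MBTI scoring procedure.

```python
# Minimal sketch: combining the four MBTI dichotomies into a four-letter type code.
# The continuous scores and the 0.0 cut-off are illustrative assumptions only.

DIMENSIONS = [
    ("E", "I"),  # extraversion-introversion
    ("S", "N"),  # sensing-intuition
    ("T", "F"),  # thinking-feeling
    ("J", "P"),  # judging-perceiving
]

def mbti_type(scores: list[float]) -> str:
    """scores: one value per dimension; a positive value leans to the first pole."""
    return "".join(first if s >= 0 else second
                   for (first, second), s in zip(DIMENSIONS, scores))

print(mbti_type([0.4, -0.2, 0.1, -0.7]))  # -> "ENTP"
```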
2.3 Relationship between models

Psychological models all have their foundation in either behavioral or cognitive assessments of people. As behavior and cognition are so interconnected, the different models are expected to be related as well. Busato et al. [10] found distinct correlations between several personality traits and the type of learning styles people adhere to. Zhang [115] showed relationships between personality traits and cognitive styles and found that creativity-generating and more complex cognitive styles were related to extraversion and openness. Other models from traditional psychology that have not yet been used for personalization purposes have also been shown to correlate with personality traits. For example, Greenberg et al. [53] found correlations between personality traits and people’s “music sophistication” (Gold-MSI; [82]), a measurement to indicate the musical expertise of people. An extensive literature review that summarizes the findings of how cognitive styles correlate with other psychological models is given in Allinson and Hayes [1]. In multiple studies cognitive styles have been shown to correlate with other measures of learning, thinking, or teaching, such as the LSQ [57]. This indicates that the models are related and that user traits according to one model can be indicative of their traits in another model.
3 The relationship between psychological traits and online behavior

Aside from using psychological traits defined by traditional psychology to explain the root cause of online behavior, there is uncharted terrain opening up with the advancement of technologies. Especially as technologies are becoming increasingly ubiquitous and pervasive, new ways of interaction become available between technologies and users. These new ways of interaction may make the relationship between online behavior and traditional psychological traits less straightforward. Hence, it is increasingly important to verify to what extent results from traditional psychology still hold in computer-mediated scenarios before implementation. Aside from that, there is a need for critical reflection on the findings of new relationships between psychological traits and online behavior (e. g., the differentiation between correlation and causation) and on the implications of implementing such findings in personalized systems. In this section we lay out related work that has focused on verifying relationships between behaviors and traits based on results from traditional psychology, and work that has focused on identifying new relationships between psychological traits and online behaviors. Subsequently, Section 5 discusses work that focused on incorporating psychological traits in personalized systems.
3.1 Personality

The way we communicate with others is increasingly mediated through technology in the form of social networking sites (SNSs), such as Facebook, Instagram, and Twitter [24, 108]. Just as personality is found to be related to many behaviors in the social or physical world, the digital footprint that people leave behind on these SNSs can be a reflection of people’s personalities as well. Factors such as images (e. g., profile pictures), expressions of thoughts (e. g., content postings), and content preferences (e. g., reactions to content) are, in general, the information that people leave behind digitally. Similar factors in the real world have already been shown to provide information that people use to form impressions about others [40]. Back et al. [5] showed that the personalities we express online resemble the personalities that we express in the real world. In other words, how people express themselves online appears to be an extension of their personality-based behavior, preferences, and needs in the real world. The notion of extended personality has led to different studies that investigated the digital footprint of users in relation to their personality traits. In particular, Facebook has received a lot of attention in the search for personality-related behaviors. To exemplify the abundance of research on online personalities, we highlight some of the work that has been done. Some of the results indicate a direct mapping of personality characteristics to certain online behaviors. For example, one of the findings of Ross et al. [93] was that extroverts on average belong to more Facebook groups, which is argued to relate to the social nature of extroverts, who leverage Facebook as a social tool. Neurotics (less emotionally stable people) were found to spend more time on Facebook, allegedly in an attempt to make themselves look as attractive as possible [81]. Conscientiousness has been shown to be related to increased usage of Twitter, although this is not the case for Facebook. Hughes et al. [61] attribute this finding to the character limit of a Tweet, which allows conscientious people to partake in social networking without it becoming a temporal distraction.
Although most research on online behavior has been done in the context of SNSs, in which relationships are sought between personality traits and online behavior, research has also been done in other areas, focusing on the transferability of personality judgments. For example, Biel, Aran, and Gatica-Perez [9] found that personality impressions can be transferred via video logs (vlogs). An overview of current research on personality related to online behaviors can be found in Table 1.2.
3.2 Cognitive styles: learning styles

Whereas personality research has mainly focused on relationships in SNSs, the research on cognitive styles has influenced other application areas, such as online learning environments. The research in this area has mainly focused on identifying whether the cognitive styles known from traditional psychology have the same effects in an online setting. The findings of the research on cognitive styles and online learning environments are inconclusive. The divergence of the results underlines the importance of validating the effect of psychological traits in relation to online behavior. For example, Zacharis [114] investigated differences in learning styles between 161 students participating online and offline, but found no differences between the two groups. Huang, Lin, and Huang [60] did find differences between online and offline students; for instance, sensing learners (i. e., those who were patient with details and good at practical work) engaged online more frequently and for a longer duration. Similarly, Wang et al. [110] showed that online achievements are influenced by the learning styles of students. As mentioned previously, the majority of research on cognitive styles has focused on investigating to what extent understanding of offline learning translates to online learning environments. Research has also been conducted on adapting content delivery based on cognitive styles in these environments. Here, too, the results are mainly inconclusive as to whether cognitive styles can explain individual behavior. For example, Graf and Liu [49] identified different navigational strategies based on learning styles, information that can potentially be used to create adaptive interfaces. However, Mitchell, Chen, and Macredie [80] showed that adaptive interfaces based on cognitive styles do not have an advantageous effect on student performance. One of the few studies that investigated differences based on cognitive styles in a domain other than a learning environment is Belk et al. [7]. They found that people with different cognitive styles differ in their preferences with regard to CAPTCHAs (acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”).3

3 Commonly used as a security defense mechanism to determine whether an entity interacting with a system is a human and not, for example, an automated bot.
Table 1.2: An overview of current research on the relationship between personality traits and online behavior.

Study | Domain
Ellison [24] | Facebook
Moore and McElroy [81] | Facebook
Valkenburg and Peter [108] | Facebook
Ross et al. [93] | Facebook
Back et al. [6] | Facebook
Seidman [99] | Facebook
Gosling, Gaddis, and Vazire [46] | Facebook
Gosling et al. [48] | Facebook
Rosenberg and Egbert [92] | Facebook
Bachrach et al. [4] | Facebook
Carpenter [12] | Facebook
Skues, Williams, and Wise [101] | Facebook
Stieger et al. [102] | Facebook
Lee, Ahn, and Kim [73] | Facebook
Ljepava et al. [75] | Facebook
Eftekhar, Fullwood, and Morris [23] | Facebook
Winter et al. [111] | Facebook
Chen and Marcus [17] | Facebook
Jenkins-Guarnieri, Wright, and Hudiburgh [62] | Facebook
Wu, Chang, and Yuan [113] | Facebook
Ryan and Xenos [94] | Facebook
Amichai-Hamburger and Vinitzky [3] | Facebook
Quercia et al. [88] | Facebook
Davenport et al. [22] | Facebook & Twitter
Hughes et al. [61] | Facebook & Twitter
Qiu et al. [87] | Twitter
Lay and Ferwerda [72] | Instagram
Ferwerda and Tkalcic [34] | Instagram
Schrammel, Köffel, and Tscheligi [97] | Online communities
Ferwerda, Tkalcic, and Schedl [35, 36] | Online music listening
Ferwerda, Schedl, and Tkalcic [29] | Online music listening
Ferwerda et al. [37] | Online music listening
Tkalčič et al. [104] | Online music listening
Marcus, Machilek, and Schütz [76] | Personal website
Ferwerda et al. [38] | Recommender system
Chen, Wu, and He [18] | Recommender system
Hu and Pu [58] | Recommender system
Golbeck and Norris [43] | Recommender system
Biel, Aran, and Gatica-Perez [9] | Video logs
They found that the cognitive processing style (i. e., verbal or visual) plays a role in the speed of the CAPTCHA completion: those that possess a more verbal cognitive style showed a faster completion of textual CAPTCHAs (e. g., text recognition: deciphering a scrambled text), whereas those adhering to a visual cognitive style showed a faster completion of visual CAPTCHAs (e. g., image recognition: finding a matching set of pictures).
4 Automated inference of psychological traits or styles

Building on investigations of how psychological knowledge from the offline world transfers to online environments, research has turned to personalization opportunities based on psychological traits and/or styles. The research on personalization has not only focused on implementing psychological traits and styles in systems, but also on how to implicitly infer these traits and styles. By being able to implicitly infer relevant psychological traits and styles, personalization strategies based on them can be implemented without the extensive questionnaires that are normally used to assess psychological models. Although the use of questionnaires has its advantages (e. g., increased validity and reliability), it also has apparent drawbacks (e. g., being time-consuming and interrupting the flow between user and system). Moreover, the data for implicit inference do not necessarily have to come from the system directly. Thus, inference and implementation of psychological traits/styles can be achieved across different (connected) platforms [11].
4.1 Personality

As personality has been shown to be related to online behavior, attempts have been made to infer personality traits from online behavior. Although all kinds of data can be exploited for personality prediction, research has primarily focused on data retrieved from SNSs. Especially Facebook, Twitter, and Instagram have received a lot of attention in attempts to infer personality from users’ information and generated content (see Table 1.3 for an overview). Golbeck, Robles, and Turner [44] looked at how language features expressed in Facebook profiles can be used to infer personality traits. They were able to create a personality predictor with mean absolute errors (MAEs) between 0.099 and 0.138 (on a normalized 0–1 scale) across the five personality traits. Similarly, Celli, Bruni, and Lepri [15] and Segalin et al. [98] showed that compositions of Facebook profiles can be used to infer users’ personalities. However, these approaches rely on content that people share on their Facebook page.
Table 1.3: An overview of current personality predictors.

Study | Domain
Golbeck, Robles, and Turner [44] | Facebook
Celli, Bruni, and Lepri [15] | Facebook
Segalin et al. [98] | Facebook
Ferwerda, Schedl, and Tkalcic [30] | Facebook
Quercia et al. [88] | Twitter
Golbeck et al. [45] | Twitter
Ferwerda, Schedl, and Tkalcic [31, 32] | Instagram
Ferwerda and Tkalcic [33] | Instagram
Skowron et al. [100] | Twitter & Instagram
With extensive privacy options available, users may limit the content they share on their profile. Ferwerda, Schedl, and Tkalcic [30] showed that the decisions users make with regard to which sections of their Facebook profile they disclose are indicative of their personality as well. They were able to create a personality predictor with a root-mean-square error (RMSE) between 0.73 and 0.99 for each personality trait (on a 1–5 scale). Other attempts to infer personality from online behavioral data have been made using Twitter. Quercia et al. [88] looked at traits of Twitter profiles (e. g., number of followers and following) and found that these characteristics could be used to infer personality. Their personality predictor achieved an RMSE between 0.69 and 0.88 for each personality trait (on 1–5 scales), with openness to experience as the trait that could be predicted most accurately and extraversion as the least accurately predictable one. Golbeck et al. [45] analyzed language features (e. g., use of punctuation, sentiment) of Twitter feeds and could predict personality with MAEs ranging from 0.119 to 0.182 (on a normalized 0–1 scale). Ferwerda, Schedl, and Tkalcic [31, 32] analyzed the picture-sharing social network Instagram. More specifically, they investigated how users manipulate the pictures they upload with filters. They found that personality could be inferred from hue, saturation, and value (HSV) properties of the uploaded pictures. Skowron et al. [100] combined information from Twitter and Instagram. By using linguistic and meta-data from Twitter and linguistic and image data from Instagram, personality prediction could be significantly improved. They were able to achieve RMSEs between 0.50 and 0.73 for the personality scores.
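To make the shape of such predictors concrete, the sketch below trains one regressor per trait on behavioral features and reports MAE and RMSE; the ridge regression model, the feature set, and the synthetic data are illustrative assumptions, not the pipelines of the studies cited above.

```python
# Minimal sketch of a trait predictor of the kind surveyed above: one regressor per
# FFM trait, trained on behavioral features (e.g., posting counts, linguistic or HSV
# picture statistics) and evaluated with MAE / RMSE. All data here are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                  # 12 behavioral features per user
traits = ["O", "C", "E", "A", "N"]
y = rng.uniform(1, 5, size=(500, len(traits)))  # self-reported trait scores (1-5)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for i, trait in enumerate(traits):
    model = Ridge(alpha=1.0).fit(X_tr, y_tr[:, i])
    pred = model.predict(X_te)
    mae = mean_absolute_error(y_te[:, i], pred)
    rmse = mean_squared_error(y_te[:, i], pred) ** 0.5
    print(f"{trait}: MAE={mae:.2f}  RMSE={rmse:.2f}")
```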
4.2 Cognitive styles

Although inference of cognitive styles (e. g., learning styles) has mostly been studied within learning environments (see Section 4.2.1), a limited amount of work has focused on general cognitive styles in other domains. One type of application in which this has been studied is digital libraries. Frias-Martinez, Chen, and Liu [39] investigated to what extent behavior in digital libraries can be used to make inferences about users’ cognitive styles and how these inferences can be used to personalize these libraries. They outline the steps to construct a predictive model that infers cognitive styles from clickstreams and show how this can be done successfully. Similarly, Hauser et al. [54] inferred cognitive styles from the way users interacted with an advice tool for mobile phone contracts and showed that incorporating these cognitive styles improved users’ propensity to buy. Belk et al. [8] argued that the cognitive style of users (i. e., the way people organize and perceive information) influences their navigational behavior in Web 2.0 interactive systems. To investigate the effects of cognitive styles, their study consisted of three steps: (i) investigating the relationship between cognitive styles and navigational behavior, (ii) investigating whether clustering techniques can group users based on their cognitive style, and (iii) investigating which navigational metrics can be used to predict users’ cognitive styles.
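As a minimal sketch of step (ii) above, the snippet below groups users on a few navigational metrics with k-means; the metrics, the two-cluster choice, and the random data are illustrative assumptions rather than Belk et al.'s actual setup.

```python
# Minimal sketch of grouping users by navigational behavior (cf. step (ii) above):
# k-means over simple navigation metrics. Metrics and data are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# columns: pages visited per session, mean dwell time (s), back-button rate
nav_metrics = rng.normal(loc=[12, 40, 0.2], scale=[4, 15, 0.1], size=(200, 3))

X = StandardScaler().fit_transform(nav_metrics)
labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
print("cluster sizes:", np.bincount(labels))
```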
4.2.1 Learning styles

Learning styles are thought to be reflected in the way students acquire knowledge. Online learning environments in particular provide students with a variety of ways to learn and offer the possibility to log how students behave within the system. As such, they provide large amounts of data that make it possible to build models that infer learning styles. On average, the algorithms that have been developed are able to achieve a precision of 66 % to 80 %. For example, Sanders and Bergasa-Suso [95] developed a system that allowed teachers to help their students study. They collected information related to how students used this system and subsequently conducted a study to investigate how well this information could be used to infer students’ learning styles, expressed in the Felder–Silverman Learning Styles Model (FSLSM [25]) and measured through surveys. The FSLSM defines learning styles by four dimensions: active/reflective (A/R), sensing/intuitive (S/I), verbal/visual (V/V), and sequential/global (S/G). Sanders and Bergasa-Suso [95] were able to make significantly better predictions than naive best guesses, which indicates that the way students interact with a learning environment can indeed be used to infer their learning styles. Similarly, García et al. [41] used a Bayesian network to infer students’ learning styles as expressed by the FSLSM from how intensively the students interacted with the different elements (e. g., chat, mail, revising exam questions) in the learning system. They found that they could predict students’ learning styles with around 77 % accuracy, provided that the students had prior experience with online learning systems. An overview of other current research on inference of the FSLSM dimensions can be found in Table 1.4.
Table 1.4: An overview of learning style inferences based on the FSLSM by García et al. [41]. Percentages represent reported precision measures for the four dimensions of the FSLSM: active/reflective (A/R), sensing/intuitive (S/I), verbal/visual (V/V), and sequential/global (S/G).

Study | Algorithm | A/R | S/I | V/V | S/G
Cha et al. [16] | Decision tree | 66.70 % | 77.80 % | 100 % | 71.40 %
García et al. [41] | Bayesian network | 58 % | 77 % | N/A | 63 %
Graf and Liu [50] | Rule-based | 79 % | 77 % | 77 % | 73 %
Latham et al. [71] | Rule-based | 86 % | 75 % | 83 % | 72 %
Özpolat and Akar [85] | NB tree classification | 70 % | 73.30 % | 73.30 % | 53.30 %
Villaverde, Godoy, and Amandi [109] | Artificial neural network | 69.30 % | 69.30 % | N/A | 69.30 %
Cha et al. [16] | Hidden Markov model | 66.70 % | 77.80 % | 85.70 % | 85.70 %
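As a minimal illustration of the kind of inference summarized in Table 1.4, the sketch below trains one classifier per FSLSM dimension on interaction-intensity features and reports its precision; the Gaussian naive Bayes model, the feature set, and the synthetic data are assumptions for illustration, not a reimplementation of any of the cited systems.

```python
# Minimal sketch: inferring FSLSM learning-style dimensions from interaction
# intensities (e.g., forum posts, exercise revisions, time on examples) and
# reporting per-dimension precision as in Table 1.4. All data here are synthetic.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(2)
X = rng.poisson(lam=5, size=(300, 6)).astype(float)   # interaction counts per student
dimensions = ["active/reflective", "sensing/intuitive",
              "verbal/visual", "sequential/global"]
y = rng.integers(0, 2, size=(300, len(dimensions)))   # survey-based labels (ground truth)

for i, dim in enumerate(dimensions):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y[:, i], test_size=0.3, random_state=2)
    clf = GaussianNB().fit(X_tr, y_tr)
    prec = precision_score(y_te, clf.predict(X_te), zero_division=0)
    print(f"{dim}: precision = {prec:.2f}")
```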
5 Incorporating psychological models in personalized systems

In Section 3 we discussed work that focused on identifying differences and similarities between offline and online behaviors, as well as work identifying new relationships between psychological traits and behavior in online environments. Section 4 presented prior work on the implicit inference of psychological traits and styles from online behavior. Whereas the majority of previous work has focused on identifying relationships between behavior and psychological traits and on the implicit inference of said psychological traits, limited work has incorporated psychological traits and styles in personalized systems. In this section we illustrate work that has incorporated psychological traits in systems to create personalized experiences for users.
5.1 Personality

Several studies have shown how the incorporation of users’ personalities can improve prediction accuracy in the domain of recommender systems. Hu and Pu [59] demonstrated that personality can be used to overcome the new-user cold-start problem, which arises when no or insufficient information about a user is available to base predictions on. By relying on new users’ personality scores expressed in the FFM, predictions could be made without the need for any additional rating data. Similarly, Fernández-Tobías et al. [27] showed that the incorporation of personality data in a recommender algorithm allowed for more easily recommending across domains. By considering users’ personality scores in conjunction with ratings across domains (i. e., books, movies, and music), they found that they were better able to predict users’ ratings in one domain based on another if they considered personality on top of rating information. Similarly, Ferwerda and Schedl [28] and Ferwerda, Schedl, and Tkalcic
[30] showed how personality information can be integrated and exploited to improve music recommender systems, whereas Tkalcic, Delic, and Felfernig [103] propose to incorporate personality to better serve individual needs in group recommendations by taking into account different personality types.
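The snippet below sketches the cold-start idea: when a new user has no ratings yet, neighbors are found by similarity of FFM profiles and their ratings are aggregated. Cosine similarity, k = 3, and the toy data are illustrative assumptions, not the algorithms of Hu and Pu [59] or Fernández-Tobías et al. [27].

```python
# Minimal sketch of personality-based cold-start recommendation: a new user has no
# ratings, so neighbors are picked by FFM-profile similarity and their ratings are
# aggregated. Similarity measure, k, and data are illustrative assumptions only.
import numpy as np

personalities = np.array([   # existing users, FFM scores on a 1-5 scale (O, C, E, A, N)
    [4.2, 3.1, 2.5, 4.0, 2.2],
    [2.8, 4.5, 3.9, 3.2, 3.0],
    [3.5, 2.9, 4.4, 2.7, 3.8],
    [4.0, 3.3, 2.8, 4.1, 2.5],
])
ratings = np.array([          # rows: users, columns: items (0 = unrated)
    [5, 3, 0, 1],
    [2, 0, 4, 5],
    [1, 4, 5, 0],
    [5, 2, 0, 2],
], dtype=float)

def recommend_for_new_user(new_profile, k=3):
    sims = personalities @ new_profile / (
        np.linalg.norm(personalities, axis=1) * np.linalg.norm(new_profile))
    neighbors = np.argsort(sims)[-k:]                    # k most similar users
    neighbor_ratings = ratings[neighbors]
    rated = neighbor_ratings > 0
    scores = np.where(rated.any(axis=0),
                      neighbor_ratings.sum(axis=0) / np.maximum(rated.sum(axis=0), 1),
                      0.0)
    return np.argsort(scores)[::-1]                      # item indices, best first

print(recommend_for_new_user(np.array([4.1, 3.0, 2.6, 4.2, 2.3])))
```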
5.2 Cognitive styles

Personalization strategies based on cognitive styles have primarily been applied in a learning context (Section 5.2.1 discusses work in a learning context that specifically used learning styles instead of general cognitive styles). Karampiperis et al. [65] adapted the cognitive trait model of Lin [74], focusing on the inductive reasoning ability of participants (next to the basic cognitive trait, working memory), to personalize learning environments in which they dynamically adapted the content presentation based on different learners’ navigational behavior. Triantafillou et al. [105] proposed the Adaptive Educational System based on Cognitive Styles (AES-CS), a system based on the cognitive field dependence and independence style of Witkin et al. [112]. Navigational organization, the amount of control, and navigational support tools were adapted based on this cognitive style. The dynamic adaptation of the interaction elements based on field dependence and independence showed a significant increase in performance compared to a static version of the system. In a similar fashion, Tsianos et al. [106] used Riding’s Cognitive Style Analysis [91] to categorize users as imager/verbal and wholist/analyst. Based on the measured categorization they provided users with adaptive content presentation and navigational organization. An overview of the studies that applied cognitive styles for adaptation can be found in Table 1.5.

Table 1.5: An overview of studies in which cognitive styles were adopted for personalization.

Study | Personalization
Karampiperis et al. [65] | ∙ Adaptive presentation of content
Triantafillou et al. [105] | ∙ Navigation organization ∙ Amount of user control ∙ Navigation support tools
Tsianos et al. [106] | ∙ Adaptive presentation of content ∙ Navigation organization
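Theory-driven adaptations of this kind can often be expressed as a small rule set that maps a measured cognitive style to the interface aspects listed in Table 1.5; the concrete rules in the sketch below are illustrative assumptions, not those of AES-CS or the other cited systems.

```python
# Minimal sketch of rule-based adaptation of the interface aspects in Table 1.5.
# The specific rules are illustrative assumptions only.

ADAPTATION_RULES = {
    "field_dependent":   {"navigation": "guided menu", "user_control": "low",
                          "support_tools": ["concept map", "path indicator"]},
    "field_independent": {"navigation": "free browsing", "user_control": "high",
                          "support_tools": []},
    "imager":            {"content_presentation": "diagram-first"},
    "verbalizer":        {"content_presentation": "text-first"},
}

def adapt_interface(user_styles: list[str]) -> dict:
    """Merge the rules triggered by a user's measured cognitive styles."""
    settings: dict = {}
    for style in user_styles:
        settings.update(ADAPTATION_RULES.get(style, {}))
    return settings

print(adapt_interface(["field_dependent", "verbalizer"]))
```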
5.2.1 Learning styles

Knowing a student’s learning styles can be used to alter online learning environments to provide students with information in line with the way they prefer to process information. This has been shown to lead to improved learning. Although there is debate about the merit of matching a learning environment to learning styles, there are indications that personalizing learning environments improves learning effectiveness. Related to learning styles, working memory capacity has been shown to be a trait that influences what the best learning environment is [42, ch. 4]. Students for whom the instructional style matched their learning style scored higher in tests and expressed lower levels of anxiety. Graf [51] proposed a framework using the FSLSM [25]. The framework consists of adaptation strategies using the sequence as well as the amount of examples and exercises. The same FSLSM [25] was used by Papanikolaou et al. [86], who investigated interaction preferences in two educational systems (i. e., FLexi-OLM and INSPIRE) to provide personalized learner support. Carver, Howard, and Lane [13] used Solomon’s Inventory of Learning Styles [26] to create an interface with adaptive content presentation based on the learning style. Milosevic et al. [79] used the LSI [68], in addition to capturing preferences, knowledge, goals, and navigational histories of users, to adapt the learning environment. An overview of the studies that applied learning styles for adaptation can be found in Table 1.6.

Table 1.6: An overview of studies in which learning styles were adopted for personalization.

Study | Personalization
Carver, Howard, and Lane [13] | ∙ Adaptive presentation of content
Germanakos and Belk [42] | ∙ Instructional style
Graf [51] | ∙ Sequence of content ∙ Amount of content
Milosevic et al. [79] | ∙ Different course material sequencing
Papanikolaou et al. [86] | ∙ Adaptive navigation support (i. e., sequential, global) ∙ Adaptive content presentation (i. e., visual/verbal)
6 Combining trait inference and trait-based personalization

The examples in the previous sections (Sections 4 and 5) each address one of two subproblems related to using psychological traits in personalization. They either aim to infer user traits or styles from online behavioral data, or they aim to use already measured user traits or styles to improve personalization approaches. These two problems, however, can be addressed together. Relevant, domain-dependent psychological traits can be identified from psychological theory, measured through surveys to serve
as ground truth, and incorporated into user models in personalized systems. The current section presents two studies in which this is done and describes the steps of the approach.
6.1 Adapting a comparison tool based on cognitive styles

Hauser et al. [54] hypothesized that cognitive styles are traits that influence how users of an online comparison tool are best helped. They developed and tested a tool to compare cell phone contracts that relied on two subsystems for personalization. The first subsystem was a Bayesian inference loop, which was used to infer users' cognitive styles from the elements of the system they interacted with. The second subsystem was an automatic Gittins loop, which was used to learn how to adapt the content and form of the system to match the inferred cognitive style. Both loops received immediate feedback: the Bayesian inference loop was updated with each click users made, and the Gittins loop was updated each time a user finished using the system. If the user exhibited the desired behavior, the system's predictions were reinforced; if the system did not manage to convince the user to make a purchase, the prediction parameters were adjusted. The study showed that people expressed a higher propensity to buy, which indicates that incorporating cognitive styles for personalization indeed improved the system. Because the system was initially not tested in an actual field study, no conclusions in terms of behavior could be drawn. In a follow-up field study, they found that their approach also improves purchase likelihood in a more natural setting [55].
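The interplay of the two loops can be illustrated with a minimal sketch. The code below is not Hauser et al.'s implementation: it substitutes Thompson sampling for their Gittins-index policy, and the style categories, page elements, click likelihoods, and content variants are hypothetical placeholders.

```python
import numpy as np

STYLES = ["analytic", "holistic"]               # simplified cognitive-style categories
MORPHS = ["detailed_tables", "visual_summary"]  # hypothetical content variants ("morphs")

# Hypothetical P(click on element | style), standing in for priming-study estimates
CLICK_LIKELIHOOD = {
    "spec_sheet":    {"analytic": 0.7, "holistic": 0.3},
    "photo_gallery": {"analytic": 0.2, "holistic": 0.6},
}

class StyleBelief:
    """Bayesian inference loop: update P(style) after every click."""
    def __init__(self):
        self.posterior = {s: 1.0 / len(STYLES) for s in STYLES}

    def update(self, clicked_element):
        likelihood = CLICK_LIKELIHOOD.get(clicked_element)
        if likelihood is None:
            return
        for s in STYLES:
            self.posterior[s] *= likelihood[s]
        z = sum(self.posterior.values())
        for s in STYLES:
            self.posterior[s] /= z

class MorphBandit:
    """Adaptation loop: per-style Bernoulli bandit over morphs, updated once per
    session with the purchase outcome (Thompson sampling in place of Gittins indices)."""
    def __init__(self):
        # Beta(1, 1) prior on the purchase probability of each (style, morph) pair
        self.alpha = {(s, m): 1.0 for s in STYLES for m in MORPHS}
        self.beta = {(s, m): 1.0 for s in STYLES for m in MORPHS}

    def choose_morph(self, style_posterior):
        rng = np.random.default_rng()
        scores = {}
        for m in MORPHS:
            # sampled purchase probability, weighted by the current style belief
            scores[m] = sum(style_posterior[s] *
                            rng.beta(self.alpha[(s, m)], self.beta[(s, m)])
                            for s in STYLES)
        return max(scores, key=scores.get)

    def update(self, style_posterior, morph, purchased):
        for s in STYLES:
            w = style_posterior[s]  # credit the outcome proportionally to the style belief
            if purchased:
                self.alpha[(s, morph)] += w
            else:
                self.beta[(s, morph)] += w

# One simulated session: infer the style per click, morph the page, learn from the outcome.
belief, bandit = StyleBelief(), MorphBandit()
for click in ["spec_sheet", "spec_sheet", "photo_gallery"]:
    belief.update(click)
morph = bandit.choose_morph(belief.posterior)
bandit.update(belief.posterior, morph, purchased=True)
print(belief.posterior, morph)
```

The sketch also illustrates the different time scales of the two loops: the style belief is refined with every click within a session, whereas the adaptation policy is only updated once per session, when the purchase outcome becomes known.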
6.2 Library personalization based on parenting styles

Graus, Willemsen, and Snijders [52] personalized a digital library for new parents, i. e., parents with children younger than 2 years old. When designing personalized systems for new parents, an additional challenge arises: not only may the users be new to the system, they are most likely also new to the domain of parenting. As such, they may not be aware of what type of content is relevant to them, and a mismatch may exist between their interests and their interaction behavior. Information about how to get babies to sleep appears relevant to everyone, but in practice it is more relevant for parents who raise their children according to a schedule than for parents who let their children decide when they go to bed. Parents, however, might not know beforehand whether they want to raise their children according to a schedule or more flexibly, and thus incorrectly judge the relevance of content on getting kids to sleep. Graus, Willemsen, and Snijders [52] compared user experience and behavior in a library personalized based on reading behavior against a library personalized based on survey responses measuring parenting styles. Their study consisted of an initial data collection that served to gather interaction data and survey data on parenting styles. They used these data to create personalized relevance predictions, which were used to re-order the articles in the library for each individual user. In a second session, the same users were re-invited to the now personalized library, and data regarding their behavior and user experience were collected. The data showed that personalizing the order of articles based on the survey responses resulted in a better user experience than relying on reading behavior, despite the fact that the former had lower objective prediction accuracy.
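As a rough illustration of the survey-based variant, the sketch below fits a simple linear model that predicts article relevance from parenting-style survey scores and re-orders a small library accordingly. It is a schematic of the general idea only, not the authors' model; the features, scores, and ratings are invented for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data from an initial data collection:
# each row describes one (user, article) pair by the user's parenting-style
# survey scores and whether the article is about sleep schedules.
X_train = np.array([
    # [structure_score, flexibility_score, article_about_sleep_schedules]
    [0.9, 0.1, 1],
    [0.2, 0.8, 1],
    [0.5, 0.5, 0],
    [0.8, 0.3, 0],
])
y_train = np.array([5, 2, 3, 3])   # observed relevance ratings (invented)

model = Ridge(alpha=1.0).fit(X_train, y_train)

# Re-order the library for a new, structure-oriented parent who has not read anything yet.
articles = {
    "Getting your baby on a sleep schedule": [0.9, 0.2, 1],
    "Reading your baby's cues":              [0.9, 0.2, 0],
}
ranked = sorted(articles, key=lambda a: model.predict([articles[a]])[0], reverse=True)
print(ranked)
```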
7 Conclusion

Over the years, personalization strategies have employed different methods. Whereas in the past personalization took a more theoretical approach (e. g., by developing systems that require explicit authoring [67, 89] or by leveraging different psychological models, as discussed in this chapter), the abundance of behavioral data and computational power nowadays has caused a shift to a more data-driven perspective (e. g., collaborative filtering [69]). Although these two perspectives on personalization are often used in isolation, they could be used together to maximize each other's potential and mitigate each other's limitations. The way users interact with systems can in part be explained through psychological models of the users, and incorporating these same psychological models in personalized systems may improve these systems in terms of effectiveness, efficiency, or user satisfaction over considering interaction behavior alone. The current chapter illustrates the possible advantages of combining psychological theory with more naive, data-driven methods for personalization. Doing so combines the potential of data describing interaction behavior with the benefits of interpretable, meaningful user models. The chapter presents a number of psychological models that are used in personalization and shows how they can complement approaches that rely on behavioral data only. Furthermore, it presents ways in which user traits in terms of these models can be inferred from interaction behavior and ways in which the inferred traits can be used to improve how systems are personalized. The benefit of this approach is illustrated with two studies that built full systems, incorporating both the inference of psychological traits and their use for personalization. The current chapter considers mainly stable user traits, but more dynamic user characteristics can be considered as well. In principle, any latent trait or characteristic that is related to how users interact with systems and to their needs of a system can be used. For example, expertise or experience with a system has been shown to
have an effect on how people prefer interacting with a system [66]. Similarly, in adaptive hypermedia the inferred level of knowledge dictates what information the system presents [67]. These characteristics are more prone to change, even as a result of using the system, which brings additional challenges. As they are related to both the way people interact with a system and what they need from a system, they are logical candidates for being incorporated in user models. In summary, the chapter demonstrates that adopting a more theoretical perspective by incorporating user traits into personalized systems can lead to improvements of existing systems [27, 59] and that this approach can be used to build new systems [52, 54]. The presented findings warrant future research focused on incorporating theoretical knowledge about users in personalized systems, instead of relying solely on behavioral data. Apart from providing directions for future research, the literature can be used to generate a blueprint that captures the idea of combining the theory-driven and data-driven perspectives on personalization (see Section 7.1).
7.1 Theory-driven personalization blueprint

The approach of incorporating psychological traits into personalization can be formulated as a blueprint. The proposed multidisciplinary approach comprises both theoretical and methodological challenges. Designing a theory-driven personalized system involves four steps. The first step (Section 7.1.1) is the identification of the right user traits and the right model to measure them. There is virtually unlimited freedom of choice, and making the right choice can be daunting. A model's suitability depends on the application, the domain, and the users. After identifying the right model, the second step (Section 7.1.2) consists of collecting data regarding the users' traits through surveys or through existing inference methods. After collecting these data, the third step (Section 7.1.3) is to find methods to infer the user traits measured in the previous step from natural interaction behavior with the target system. This third step is optional, as in some cases user traits might readily be available. The fourth step (Section 7.1.4) is incorporating the user traits in the personalized system through formal user models. This section explains these four steps in more detail.
7.1.1 Step 1: identifying the right psychological model

The first step consists of identifying the right user traits that can be used to improve systems through personalization. Two aspects play a role. The first is the level of generality or specificity: the more specific the model, the more likely it is that the user traits can be inferred reliably and the more likely it is that they can be used to improve the system.
The second aspect is the availability of measurement instruments. Regardless of how the user traits will be measured, collecting ground truth to incorporate in the personalization is essential. If validated measurement instruments are available, the chance of success is much higher, as there is no need to develop and validate a new measurement instrument. A drawback of generic models such as personality is that they are not necessarily strongly related to what a user needs from a system. More specific models are more likely to be related to users' needs. If no specific model is available, another possibility is to develop an instrument that measures the relevant user traits, either by designing it from scratch or by combining existing instruments that measure relevant aspects. This, however, requires designing and validating a survey.

7.1.2 Step 2: collecting data regarding individual user traits

After identifying the right model, the second step is collecting data regarding the traits of the individual users of a system. This can be done in two ways. On the one hand, it can be done through surveys as part of the system. Measurement instruments already exist for most psychological models, so collecting data becomes a matter of administering surveys to users of the system. However, surveys can be time consuming and interrupt the user's interaction with the system. Alternatively, user data can be acquired by applying inference methods to external data sources (e. g., through the connectedness of single sign-on mechanisms).4 If traits can be inferred from external data, collecting these data suffices to start personalizing the system without the need to interrupt the interaction flow between the user and the system.

4 Buttons that allow users to register with or log in to a system with accounts from other applications. For example, social networking services: "Login with your Facebook account."

7.1.3 Step 3: inferring individuals' user traits from interaction behavior

The data collected on the user traits in the second step can be used as ground truth to build models that infer user traits from natural behavior with the system. Section 4 describes, for different models, how user traits can be inferred. Hauser et al. [54] performed this step in what they called a priming study, which served to create a baseline model that inferred cognitive styles from clickstream behavior. Later on, they relied on a Bayesian inference loop to relate certain aspects of behavior (e. g., what elements users interacted with) to cognitive styles. Similarly, Frias-Martinez, Chen, and Liu [39] trained a neural network to infer cognitive styles from navigation behavior in a digital library.
As mentioned in Step 2, this inference from interaction behavior is in some cases not needed. When using, for example, single sign-on mechanisms, data from an external system can be used to make inferences, as the interconnectedness can continuously provide information regarding the system's users. A problem that occurs is that not all data from the external system are necessarily readily useful for personalization. By relying on psychological models, these data can be exploited through methods such as those described by Golbeck et al. [45] and Ferwerda, Schedl, and Tkalcic [31], resulting in information useful for personalization. The use of psychological models thus allows for maximum usage of data, as even data that are not directly related to the system of interest can be exploited for the inference. Acquiring data from external sources can mitigate the cold start problem that occurs when users use a system for the first time and no historical interaction behavior is available to base predictions on [96].
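At its core, this step amounts to a supervised model that maps logged behavior to the surveyed ground truth. The sketch below illustrates only that general idea and is not the model of Frias-Martinez, Chen, and Liu [39] or Hauser et al. [54]; the navigation features, the data, and the binary field-dependence labels are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical navigation features logged per user:
# [mean time per page (s), fraction of keyword searches vs. browsing, revisit rate]
X = np.array([
    [45.0, 0.8, 0.1],
    [12.0, 0.2, 0.5],
    [50.0, 0.7, 0.2],
    [15.0, 0.3, 0.6],
    [40.0, 0.9, 0.1],
    [10.0, 0.1, 0.7],
])
# Ground truth from Step 2: field-independent (1) vs. field-dependent (0),
# measured with a validated instrument during the initial data collection.
y = np.array([1, 0, 1, 0, 1, 0])

clf = LogisticRegression()
print("Cross-validated accuracy:", cross_val_score(clf, X, y, cv=3).mean())

# Once trained, the model can label users who never filled in the survey.
clf.fit(X, y)
new_user = np.array([[42.0, 0.75, 0.15]])
print("Inferred style (1 = field-independent):", clf.predict(new_user)[0])
```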
7.1.4 Step 4: incorporating user traits in personalization models

The fourth and final step consists of incorporating these traits in the personalization models. Most straightforwardly, this can be done with business rules (similar to those described in [89]). If we know, for example, that a user has a visual cognitive style, the system might put more emphasis on visual information. In a more data-driven way, this can be done following Hauser et al. [54], who used a Gittins loop to decide which way of presenting content resulted in the desired behavior (purchasing). In this way, the system learned how to adapt the content to the users' cognitive styles. An additional advantage of incorporating user traits is that the cold start problem can be (partially) alleviated. If we rely on external data or surveys to infer user traits, the system can be personalized even for users for whom no interaction behavior is available. Hu and Pu [59] did this by using personality information to calculate predictions during the cold start stage. Personality information made it possible to calculate rating predictions even for users for whom no rating information was available.
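Both flavors of this step can be sketched compactly: a business rule that maps a trait directly to a presentation decision, and a cold-start prediction that weights other users' ratings by personality similarity. The sketch follows the general spirit of the rule-based and personality-based cold-start approaches discussed above, but it is not the method of [89] or of Hu and Pu [59]; the trait values, Big Five scores, and ratings are invented for the example.

```python
import numpy as np

def adapt_presentation(user_traits):
    """Rule-based adaptation: emphasize visual content for users with a visual style."""
    if user_traits.get("style") == "visual":
        return {"image_weight": 0.8, "text_weight": 0.2}
    return {"image_weight": 0.3, "text_weight": 0.7}

# Cold start via personality similarity (hypothetical Big Five scores in [0, 1]).
personalities = {
    "alice": np.array([0.8, 0.4, 0.6, 0.7, 0.3]),
    "bob":   np.array([0.2, 0.9, 0.5, 0.3, 0.6]),
}
ratings = {"alice": {"item1": 5, "item2": 2},
           "bob":   {"item1": 1, "item2": 4}}

def predict_rating(new_user_personality, item):
    """Weight existing users' ratings by cosine similarity of personality profiles,
    so a prediction exists before the new user has produced any interaction data."""
    sims, weighted = [], []
    for user, p in personalities.items():
        if item not in ratings[user]:
            continue
        sim = float(np.dot(new_user_personality, p) /
                    (np.linalg.norm(new_user_personality) * np.linalg.norm(p)))
        sims.append(sim)
        weighted.append(sim * ratings[user][item])
    return sum(weighted) / sum(sims) if sims else None

carol = np.array([0.75, 0.45, 0.55, 0.65, 0.35])   # from a survey at sign-up
print(adapt_presentation({"style": "visual"}))
print("Predicted rating for item1:", predict_rating(carol, "item1"))
```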
References [1] [2] [3] [4] [5]
Christopher Allinson and John Hayes. “The cognitive style index: Technical manual and user guide.” In: Retrieved January 13 (2012), p. 2014. Gordon W Allport and Henry S Odbert. “Trait-names: A psycho-lexical study.” In: Psychological monographs 47.1. 1936, p. i. Yair Amichai-Hamburger and Gideon Vinitzky. “Social network use and personality.” In: Computers in human behavior 26.6 (2010), pp. 1289–1295. Yoram Bachrach et al.“Personality and patterns of Facebook usage.” In: Proceedings of the 4th Annual ACM Web Science Conference. ACM. 2012, pp. 24–32. Mitja D Back et al.“Facebook profiles reflect actual personality, not selfidealization.” In: Psychological science 21.3 (2010), pp. 372–374.
[6]
[7]
[8]
[9] [10]
[11]
[12] [13]
[14] [15]
[16]
[17]
[18]
[19] [20] [21] [22]
[23]
[24]
Mitja D. Back et al.“Facebook Profiles Reflect Actual Personality, Not SelfIdealization.” In: Psychological Science 21.3 (Mar. 2010), pp. 372–374. ISSN: 0956-7976. DOI: 10.1177/0956797609360756. URL: http://journals.sagepub.com/doi/10.1177/ 0956797609360756. Marios Belk et al.“Do human cognitive differences in information processing affect preference and performance of CAPTCHA?” In: International Journal of Human-Computer Studies 84 (2015), pp. 1–18. Marios Belk et al.“Modeling users on the World Wide Web based on cognitive factors, navigation behavior and clustering techniques.” In: Journal of Systems and Software 86.12 (2013), pp. 2995–3012. Joan-Isaac Biel, Oya Aran, and Daniel Gatica-Perez. “You Are Known by How You Vlog: Personality Impressions and Nonverbal Behavior in YouTube.” In: ICWSM. 2011. Vittorio V Busato et al.“The relation between learning styles, the Big Five personality traits and achievement motivation in higher education.” In: Personality and individual differences 26.1 (1998), pp. 129–140. Ivan Cantador, Ignacio Fernandez-Tobfas, and Alejandro Bellogfn. “Relating personality types with user preferences in multiple entertainment domains.” In: Shlomo Berkovsky, editor, CEUR Workshop Proceedings, 2013. Christopher J Carpenter. “Narcissism on Facebook: Self-promotional and anti-social behavior.” In: Personality and individual differences 52.4 (2012), pp. 482–486. Curtis A Carver, Richard A Howard, and William D Lane. “Enhancing student learning through hypermedia courseware and incorporation of student learning styles.” In: IEEE transactions on Education 42.1 (1999), pp. 33–38. Raymond B Cattell. Personality and motivation structure and measurement. 1957. Fabio Celli, Elia Bruni, and Bruno Lepri. “Automatic personality and interaction style recognition from facebook profile pictures.” In: Proceedings of the 22nd ACM international conference on Multimedia. ACM. 2014, pp. 1101–1104. Hyun Jin Cha et al.“Learning styles diagnosis based on user interface behaviors for the customization of learning interfaces in an intelligent tutoring system.” In: International Conference on Intelligent Tutoring Systems. Springer. 2006, pp. 513–524. Baiyun Chen and Justin Marcus. “Students’ self-presentation on Facebook: An examination of personality and self-construal factors.” In: Computers in Human Behavior 28.6 (2012), pp. 2091–2099. Li Chen, Wen Wu, and Liang He. “How personality influences users’ needs for recommendation diversity?” In: CHI’13 Extended Abstracts on Human Factors in Computing Systems. ACM. 2013, pp. 829–834. Robert B Cialdini. Influence, vol. 3. A. Michel Port Harcourt. 1987. Frank Coffield et al.Learning styles and pedagogy in post-16 learning: A systematic and critical review. 2004. Philip J Corr and Gerald Matthews. The Cambridge handbook of personality psychology. Cambridge University Press Cambridge. 2009. Shaun W Davenport et al.“Twitter versus Facebook: Exploring the role of narcissism in the motives and usage of different social media platforms.” In: Computers in Human Behavior 32 (2014), pp. 212–220. Azar Eftekhar, Chris Fullwood, and Neil Morris. “Capturing personality from Facebook photos and photo-related activities: How much exposure do you need?” In: Computers in Human Behavior 37 (2014), pp. 162–170. Nicole B Ellison et al.“Social network sites: Definition, history, and scholarship.” In: Journal of computer-mediated Communication 13.1 (2007), pp. 210–230.
[25] [26] [27]
[28] [29] [30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38] [39]
[40] [41]
Richard M Felder, Linda K Silverman, et al.“Learning and teaching styles in engineering education.” In: Engineering education 78.7 (1988), pp. 674–681. Richard M Felder and B Solomon. “Inventory of learning styles.” In: Retrieved January 8 (1998), p. 1998. Ignacio Fernandez-Tobfas et al.“Alleviating the new user problem in collaborative filtering by exploiting personality information.” In: User Modeling and User-Adapted Interaction 26.2–3 (June 2016), pp. 221–255. ISSN: 0924-1868. DOI: 10.1007/s11257-016-9172-z. URL: http://link.springer.com/10.1007/s11257-016-9172-z. Bruce Ferwerda and Markus Schedl. “Enhancing Music Recommender Systems with Personality Information and Emotional States: A Proposal.” In: UMAP Workshops. 2014. Bruce Ferwerda, Markus Schedl, and Marko Tkalcic. “Personality & Emotional States: Understanding Users’ Music Listening Needs.” In: UMAP Workshops. 2015. Bruce Ferwerda, Markus Schedl, and Marko Tkalcic. “Personality traits and the relationship with (non-) disclosure behavior on facebook.” In: Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee. 2016, pp. 565–568. Bruce Ferwerda, Markus Schedl, and Marko Tkalcic. “Predicting personality traits with instagram pictures.” In: Proceedings of the 3rd Workshop on Emotions and Personality in Personalized Systems 2015. ACM. 2015, pp. 7–10. Bruce Ferwerda, Markus Schedl, and Marko Tkalcic. “Using instagram picture features to predict users’ personality.” In: International Conference on Multimedia Modeling. Springer. 2016, pp. 850–861. Bruce Ferwerda and Marko Tkalcic. “Predicting Users’ Personality from Instagram Pictures: Using Visual and/or Content Features?” In: The 26th Conference on User Modeling, Adaptation and Personalization, Singapore. 2018. Bruce Ferwerda and Marko Tkalcic. “You Are What You Post: What the Content of Instagram Pictures Tells About Users’ Personality.” In: Companion Proceedings of the 23rd International on Intelligent User Interfaces: 2nd Workshop on Theory-Informed User Modeling for Tailoring and Personalizing Interfaces (HUMANIZE). 2018. Bruce Ferwerda, Marko Tkalcic, and Markus Schedl. “Personality Traits and Music Genre Preferences: How Music Taste Varies Over Age Groups.” In: Proceedings of the 1st Workshop on Temporal Reasoning in Recommender Systems (RecTemp) at the 11th ACM Conference on Recommender Systems, Como, August 31, 2017. Bruce Ferwerda, Marko Tkalcic, and Markus Schedl. “Personality Traits and Music Genres: What Do People Prefer to Listen To?” In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM. 2017, pp. 285–288. Bruce Ferwerda et al.“Personality traits predict music taxonomy preferences.” In: Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems. ACM. 2015, pp. 2241–2246. Bruce Ferwerda et al.“The Influence of Users’ Personality Traits on Satisfaction and Attractiveness of Diversified Recommendation Lists.” In: EMPIRE@ RecSys. 2016, pp. 43–47. Enrique Frias-Martinez, Sherry Y. Chen, and Xiaohui Liu. “Automatic cognitive style identification of digital library users for personalization.” In: Journal of the American Society for Information Science and Technology 58.2 (2017), pp. 237–251. DOI: 10.1002/asi.20477. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/asi.20477. URL: https:// onlinelibrary.wiley.com/doi/abs/10.1002/asi.20477. David C Funder. 
Personality judgment: A realistic approach to person perception. Academic Press. 1999. Patricio Garcia et al.“Evaluating Bayesian networks’ precision for detecting students’ learning
[42]
[43]
[44]
[45]
[46] [47]
[48]
[49] [50]
[51]
[52]
[53] [54]
[55]
[56]
[57] [58]
styles.” In: Computers & Education 49.3 (2007), pp. 794–808. Panagiotis Germanakos and Marios Belk. Human-Centred Web Adaptation and Personalization. Human-Computer Interaction Series. Cham: Springer International Publishing. 2016, p. 336. ISBN: 978-3-319-28048-6. DOI: 10.1007/978-3-319-280509. URL: http://link.springer.com/10.1007/978-3-319-28050-9. Jennifer Golbeck and Eric Norris. “Personality, movie preferences, and recommendations.” In: Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on. IEEE. 2013, pp. 1414–1415. Jennifer Golbeck, Cristina Robles, and Karen Turner. “Predicting personality with social media.” In: CHI’11 extended abstracts on human factors in computing systems. ACM. 2011, pp. 253–262. Jennifer Golbeck et al.“Predicting personality from twitter.” In: Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on. IEEE. 2011, pp. 149–156. Samuel D Gosling, Sam Gaddis, Simine Vazire et al.“Personality impressions based on facebook profiles.” In: ICWSM 7. 2007, pp. 1–4. Samuel D Gosling, Peter J Rentfrow, and William B Swann Jr. “A very brief measure of the Big-Five personality domains.” In: Journal of Research in personality 37.6 (2003), pp. 504–528. Samuel D Gosling et al.“Manifestations of personality in online social networks: Self-reported Facebook-related behaviors and observable profile information.” In: Cyberpsychology, Behavior, and Social Networking 14.9 (2011), pp. 483–488. Sabine Graf and T-C Liu. “Analysis of learners’ navigational behaviour and their learning styles in an online course.” In: Journal of Computer Assisted Learning 26.2 (2010), pp. 116–131. Sabine Graf and Tzu-Chien Liu. “Supporting Teachers in Identifying Students’ Learning Styles in Learning Management Systems: An Automatic Student Modelling Approach.” In: Journal of Educational Technology & Society 12.4 (2009). Sabine Graf et al.“Advanced adaptivity in learning management systems by considering learning styles.” In: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology-Volume 03. IEEE Computer Society. 2009, pp. 235–238. Mark P. Graus, Martijn C. Willemsen, and Chris C. P. Snijders. “Personalizing a Parenting App: Parenting-Style Surveys Beat Behavioral Reading-Based Models.” In: Joint Proceedings of the ACM IUI 2018 Workshops. 2018. David M Greenberg et al.“Personality predicts musical sophistication.” In: Journal of Research in Personality 58 (2015), pp. 154–158. J. R. Hauser et al.“Website Morphing.” In: Marketing Science 28.2 (Mar. 2009), pp. 202–223. ISSN: 0732-2399. DOI: 10.1287/mksc.1080.0459. URL: http://mktsci.journal.informs.org/ cgi/doi/10.1287/mksc.1080.0459. JR Hauser, GL Urban, and G Liberali. “Website Morphing 2.0: Technical and Implementation Advances Combined with the First Field Experiment of Website Morphing” (2011). URL: http://web.mit.edu/hauser/www/HauserArticles5.3.12/Hauser_Urban_Liberali_Website_ Morphing_20_September2011.pdf. John Hayes and Christopher W Allinson. “Cognitive style and the theory and practice of individual and collective learning in organizations.” In: Human relations 51.7 (1998), pp. 847–871. Peter Honey and Alan Mumford. The learning styles helper’s guide. Peter Honey Publications Maidenhead. 2000. Rong Hu and Pearl Pu. “Exploring Relations between Personality and User Rating Behaviors.”
[59]
[60]
[61]
[62]
[63] [64] [65] [66]
[67]
[68] [69] [70] [71] [72]
[73] [74] [75] [76]
[77]
In: UMAP Workshops. 2013. Rong Hu and Pearl Pu. “Using Personality Information in Collaborative Filtering for New Users.” In: 2nd ACM RecSys’10 Workshop on Recommender Systems and the Social Web. 2010, pp. 17–24. URL: http://www.dcs.warwick.ac.uk/~ssanand/RSWeb_files/Proceedings_ RSWEB-10.pdf#page=23. Eugenia Y Huang, Sheng Wei Lin, and Travis K Huang. “What type of learning style leads to online participation in the mixed-mode e-learning environment? A study of software usage instruction.” In: Computers & Education 58.1 (2012), pp. 338–349. David John Hughes et al.“A tale of two sites: Twitter vs. Facebook and the personality predictors of social media usage.” In: Computers in Human Behavior 28.2 (2012), pp. 561–569. Michael A Jenkins-Guarnieri, Stephen L Wright, and Lynette M Hudiburgh. “The relationships among attachment style, personality traits, interpersonal competency, and Facebook use.” In: Journal of Applied Developmental Psychology 33.6 (2012), pp. 294–301. Oliver P John, Eileen M Donahue, and Robert L Kentle. The big five inventory—versions 4a and 54. 1991. Carl Jung. Psychological types. Taylor & Francis. 2016. Pythagoras Karampiperis et al.“Adaptive cognitive-based selection of learning objects.” In: Innovations in education and teaching international 43.2 (2006), pp. 121–135. Bart P. Knijnenburg, Niels J. M. Reijmer, and Martijn C. Willemsen. “Each to his own.” In: Proceedings of the fifth ACM conference on Recommender systems – RecSys ’11, New York. New York, USA: ACM Press. 2011, p. 141. ISBN: 9781450306836. DOI: 10.1145/2043932.2043960. URL: http://dl.acm.org/citation.cfm?doid=2043932.2043960. Evgeny Knutov, Paul De Bra, and Mykola Pechenizkiy. “AH 12 years later: a comprehensive survey of adaptive hypermedia methods and techniques.” In: New Review of Hypermedia and Multimedia 15.1 (Apr. 2009), pp. 5–38. ISSN: 1361-4568. DOI: 10.1080/13614560902801608. URL: http://www.tandfonline.com/doi/abs/10.1080/13614560902801608. David A Kolb. Learning-style inventory: Self-scoring inventory and interpretation booklet: Revised scoring. TRG, Hay/McBer. 1993. Yehuda Koren, Robert Bell, and Chris Volinsky. “Matrix Factorization Techniques for Recommender Systems.” In: IEEE Computer (2009), pp. 42–49. Maria Kozhevnikov. “Cognitive styles in the context of modern psychology: Toward an integrated framework of cognitive style.” In: Psychological bulletin 133.3 (2007), p. 464. Annabel Latham et al.“A conversational intelligent tutoring system to automatically predict learning styles.” In: Computers & Education 59.1 (2012), pp. 95–109. Alixe Lay and Bruce Ferwerda. “Predicting users’ personality based on their ’liked’ images on Instagram.” In: Companion Proceedings of the 23rd International on Intelligent User Interfaces: 2nd Workshop on Theory-Informed User Modeling for Tailoring and Personalizing Interfaces (HUMANIZE). 2018. Eunsun Lee, Jungsun Ahn, and Yeo Jung Kim. “Personality traits and selfpresentation at Facebook.” In: Personality and Individual Differences 69 (2014), pp. 162–167. Taiyu Lin. “Cognitive profiling towards formal adaptive technologies in web-based learning communities.” In: International Journal of Web Based Communities 1.1 (2004), pp. 103–108. Nikolina Ljepava et al.“Personality and social characteristics of Facebook non-users and frequent users.” In: Computers in Human Behavior 29.4 (2013), pp. 1602–1607. Bernd Marcus, Franz Machilek, and Astrid Schutz. 
“Personality in cyberspace: Personal web sites as media for personality expressions and impressions.” In: Journal of personality and social psychology 90.6 (2006), p. 1014. Samuel Messick. “The nature of cognitive styles: Problems and promise in educational
[78] [79] [80] [81] [82]
[83] [84] [85] [86]
[87] [88] [89] [90]
[91] [92]
[93] [94]
[95]
[96]
[97]
practice.” In: Educational psychologist 19.2 (1984), pp. 59–74. Alan Miller. “Cognitive styles: An integrated model.” In: Educational psychology 7.4 (1987), pp. 251–268. Danijela Milosevic et al.“Adaptive learning by using scos metadata.” In: Interdisciplinary Journal of E-Learning and Learning Objects 3.1 (2007), pp. 163–174. TJF Mitchell, Sherry Y Chen, and RD Macredie. “Cognitive styles and adaptive web-based learning.” In: Psychology of Education Review 29.1 (2005), pp. 34–42. Kelly Moore and James C McElroy. “The influence of personality on Facebook usage, wall postings, and regret.” In: Computers in Human Behavior 28.1 (2012), pp. 267–274. Daniel Mullensiefen et al. “The Musicality of Non-Musicians: An Index for Assessing Musical Sophistication in the General Population.” In: 9.2 (2014). DOI: 10.1371/journal.pone.0089642. Isabel Briggs Myers. “The Myers-Briggs Type Indicator: Manual (1962).” (1962). Daniel J Ozer and Veronica Benet-Martinez. “Personality and the prediction of consequential outcomes.” In: Annu. Rev. Psychol. 57 (2006), pp. 401–421. Ebru Ozpolat and Gozde B Akar. “Automatic detection of learning styles for an e-learning system.” In: Computers & Education 53.2 (2009), pp. 355–367. Kyparisia A Papanikolaou et al.“Personalizing the Interaction in a Web-based Educational Hypermedia System: the case of INSPIRE.” In: User modeling and user-adapted interaction 13.3 (2003), pp. 213–267. Lin Qiu et al.“You are what you tweet: Personality expression and perception on Twitter.” In: Journal of Research in Personality 46.6 (2012), pp. 710–718. Daniele Quercia et al.“The personality of popular facebook users.” In: Proceedings of the ACM 2012 conference on computer supported cooperative work. ACM. 2012, pp. 955–964. Elaine Rich. “User modeling via stereotypes.” In: Cognitive science 3.4 (1979), pp. 329–354. Richard Riding and Indra Cheema. “Cognitive Styles—an overview and integration.” In: Educational Psychology 11.3–4 (1991), pp. 193–215. DOI: 10.1080/0144341910110301. eprint: https://doi.org/10.1080/0144341910110301. URL: https://doi.org/10.1080/ 0144341910110301. Richard Riding and Indra Cheema. “Cognitive styles—an overview and integration.” In: Educational psychology 11.3–4 (1991), pp. 193–215. Jenny Rosenberg and Nichole Egbert. “Online impression management: Personality traits and concerns for secondary goals as predictors of self-presentation tactics on Facebook.” In: Journal of Computer-Mediated Communication 17.1 (2011), pp. 118. Craig Ross et al.“Personality and motivations associated with Facebook use.” In: Computers in human behavior 25.2 (2009), pp. 578–586. Tracii Ryan and Sophia Xenos. “Who uses Facebook? An investigation into the relationship between the Big Five, shyness, narcissism, loneliness, and Facebook usage.” In: Computers in human behavior 27.5 (2011), pp. 1658–1664. David Adrian Sanders and Jorge Bergasa-Suso. “Inferring learning style from the way students interact with a computer user interface and the WWW.” In: IEEE Transactions on Education 53.4 (2010), pp. 613–620. Andrew I Schein et al.“Methods and metrics for cold-start recommendations.” In: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval SIGIR 02 46.Sigir. 2002. pp. 253–260. ISSN: 01635840. DOI: 10.1145/564376.564421. URL: http://portal.acm.org/citation.cfm?doid=564376.564421. Johann Schrammel, Christina Koffel, and Manfred Tscheligi. 
“Personality traits, usage patterns and information disclosure in online communities.” In: Proceedings of the 23rd British HCI group annual conference on people and computers: Celebrating people and
[98] [99]
[100]
[101]
[102]
[103] [104] [105] [106]
[107] [108] [109]
[110]
[111] [112] [113] [114] [115]
technology. British Computer Society. 2009, pp. 169–174. Cristina Segalin et al.“What your Facebook profile picture reveals about your personality.” In: Proceedings of the 2017 ACM on Multimedia Conference. ACM. 2017, pp. 460–468. Gwendolyn Seidman. “Self-presentation and belonging on Facebook: How personality influences social media use and motivations.” In: Personality and Individual Differences 54.3 (2013), pp. 402–407. Marcin Skowron et al.“Fusing social media cues: personality prediction from twitter and instagram.” In: Proceedings of the 25th international conference companion on world wide web. International World Wide Web Conferences Steering Committee. 2016, pp. 107–108. Jason L Skues, Ben Williams, and Lisa Wise. “The effects of personality traits, selfesteem, loneliness, and narcissism on Facebook use among university students.” In: Computers in Human Behavior 28.6 (2012), pp. 2414–2419. Stefan Stieger et al.“Who commits virtual identity suicide? Differences in privacy concerns, internet addiction, and personality between Facebook users and quitters.” In: Cyberpsychology, Behavior, and Social Networking 16.9 (2013), pp. 629–634. Marko Tkalcic, Amra Delic, and Alexander Felfernig. Personality, Emotions, and Group Dynamics. Springer. 2018. Marko Tkalcic et al.“Personality correlates for digital concert program notes.” In: International Conference on User Modeling, Adaptation, and Personalization. Springer. 2015, pp. 364–369. Evangelos Triantafillou et al.“The value of adaptivity based on cognitive style: an empirical study.” In: British Journal of Educational Technology 35.1 (2004), pp. 95106. Nikos Tsianos et al.“User-Centric Profiling on the Basis of Cognitive and Emotional Characteristics: An Empirical Study.” In: International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems. Springer. 2008, pp. 214–223. Ernest C Tupes and Raymond E Christal. “Recurrent personality factors based on trait ratings.” In: Journal of personality 60.2 (1992), pp. 225–251. Patti M Valkenburg and Jochen Peter. “Social consequences of the Internet for adolescents: A decade of research.” In: Current directions in psychological science 18.1 (2009), pp. 1–5. James E Villaverde, Daniela Godoy, and Analfa Amandi. “Learning styles’ recognition in e-learning environments with feed-forward neural networks.” In: Journal of Computer Assisted Learning 22.3 (2006), pp. 197–206. Kua Hua Wang et al.“Learning styles and formative assessment strategy: enhancing student achievement in Web-based learning.” In: Journal of Computer Assisted Learning 22.3 (2006), pp. 207–217. Stephan Winter et al.“Another brick in the Facebook wall-How personality traits relate to the content of status updates.” In: Computers in Human Behavior 34 (2014), pp. 194–202. Herman A Witkin et al.“Field-dependent and field-independent cognitive styles and their educational implications.” In: Review of educational research 47.1 (1977), pp. 1–64. Yen-Chun Jim Wu, Wei-Hung Chang, and Chih-Hung Yuan. “Do Facebook profile pictures reflect user’s personality?” In: Computers in Human Behavior 51 (2015), pp. 880–889. Nick Z Zacharis. “The effect of learning style on preference for web-based courses and learning outcomes.” In: British Journal of Educational Technology 42.5 (2011), pp. 790–800. Li-Fang Zhang. “Thinking styles and the big five personality traits.” In: Educational psychology 22.1 (2002), pp. 17–31.
Sarah Theres Völkel, Ramona Schödel, Daniel Buschek, Clemens Stachl, Quay Au, Bernd Bischl, Markus Bühner, and Heinrich Hussmann
2 Opportunities and challenges of utilizing personality traits for personalization in HCI

Towards a shared perspective from HCI and psychology

Abstract: This chapter discusses the main opportunities and challenges of assessing and utilizing personality traits in personalized interactive systems and services. This unique perspective arises from our long-term collaboration on research projects involving three groups from human–computer interaction (HCI), psychology, and statistics. Currently, personalization in HCI is often based on past user behavior, preferences, and interaction context. We argue that personality traits provide a promising additional source of information for personalization, which goes beyond context- and device-specific behavior and preferences. We first give an overview of the well-established Big Five personality trait model from psychology. We then present previous findings on the influence of personality in HCI associated with the benefits and challenges of personalization. These findings include the preference for interactive systems, filtering of information to increase personal relevance, communication behavior, and the impact on trust and acceptance. Moreover, we present first approaches to personality-based recommender systems. We then identify several opportunities and use cases for personality-aware personalization: (i) personal communication between users, (ii) recommendations upon first use, (iii) persuasive technology, (iv) trust and comfort in autonomous vehicles, and (v) empathic intelligent systems. Furthermore, we highlight the main challenges. First, we point out technological challenges of personality computing. To benefit from personality awareness, systems need to automatically assess the user's personality. To create empathic intelligent agents (e. g., voice assistants), a consistent personality has to be synthesized. Second, personality-aware personalization raises questions about user concerns and views, particularly privacy and data control. Another challenge is acceptance of and trust in personality-aware systems due to the sensitivity of the data. Moreover, the importance of an accurate mental model for users' trust in a system was recently underlined by the right to explanation in the EU's General Data Protection Regulation. Such considerations seem particularly relevant for systems that assess and utilize personality. Finally, we examine methodological requirements such as the need for large sample sizes and appropriate measurements. We conclude with a summary of opportunities and challenges of personality-aware personalization and discuss future research questions.

Keywords: personalization, personality traits, personality-aware, HCI, psychology
1 Introduction

The positive effects of personalization have been known to business owners since antiquity, when merchants provided different products and services to their customers based on their individual preferences [2]. Nowadays, the rise of web technologies and ubiquitous computing has stimulated a new boost of personalization both in industry and academia [138]. However, in contrast to merchants in antiquity, who knew their customers and their preferences personally, today's digital businesses face significantly bigger customer groups. Thus, they put a lot of effort into building detailed profiles of their users, for example by collecting their preferences, demographics, knowledge, previous behavior, and interests [149]. In this chapter, we argue that personality traits provide a promising additional source of information for personalization, which goes beyond context- and device-specific behavior and preferences. We think that personality traits are especially promising for building user models since they are relatively stable and cross-situational [6]. In the following section, we present the well-established Big Five personality trait model from psychology [31, 50]. First, however, we give a brief overview of personalization and its benefits and challenges in general. To our knowledge, there is no standard definition of personalization [8, 138]. According to Hagen, "Personalization is the ability to provide content and services tailored to individuals based on knowledge about their preferences and behavior" [52]. A more recent definition is given by Asif and Krogstie [8]: "Personalization is a controlled process of adaptation of a service to achieve a particular goal by utilizing the user model and the context of use."
A user model is a "(structured) data record containing user-related information [...] in contexts that are relevant to predicting and influencing future behavior" [140]. This representation of user information is built by using direct and indirect user input [91]. Direct user input refers to a user's profile, including characteristics, abilities, interests, needs, goals, and demographics, as well as preferences and ratings. While these data have the disadvantage of being subjective and getting out of date, most personalization services also rely on indirectly and automatically recognized input, e. g., usage patterns, web logs of usage behavior, and clustering [91, 140, 149]. Personality is assumed to interact with situations [47]. For example, people with certain personalities might selectively choose or avoid certain situations (e. g.,
extraverts choose sociable venues). Different personalities would show different behaviors in the same situations (e. g., emotionally stable people might not panic as easily in stressful situations). Since personality traits are assumed to be relatively stable across time and situations [129], we argue that they can overcome current obstacles with direct user input. Furthermore, in Section 5.1.1, we explain that there are promising approaches to predict personality traits from usage behavior. Apart from the user model, the context of the current situation is usually used for personalization [1]: "Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and application themselves."
Primary context types include location, identity, activity, and time. If a system uses context for personalization, it is called context-aware [1]. Based on the user model and the current context, personalization processes change the system's behavior [96]. Examples include changes to the functionality, interface, information content, distinctiveness of a system, and esthetic appeal [16, 91]. There are several technologies used for personalization, including cookies, pattern matching, rule-based inferencing, data mining, and Machine Learning [77]. The current popularity of personalization [68] is the result of its various benefits for both the user and the business. The most important advantage for the user is the reduction of information overload, which increases the personal relevance of content [8, 16, 96, 131, 149]. For example, personalization allows online newspapers to show sports news only to those users who are actually interested in sports. Hence, individual differences between users and their preferences can be addressed to improve the user experience [8, 16]. When users are looking for a nice restaurant during vacation, online tourism providers could display only specific restaurants because they know the user's preference for food, price, and restaurant style, and the current location [16]. This knowledge about the user could increase the fit of the provided systems and services [8, 24]. The user is delivered a pre-selection of restaurants, increasing the efficiency, effectiveness, and convenience of decision making [24]. In addition, Das et al. [33] pointed out that users often do not search for specific information but want to have their interest actively sparked. For example, online streaming services show trailers, trying to interest the user in using the service. Due to these advantages for the user, the vendor of a system or service also benefits from personalization. Users can be targeted individually on a one-to-one basis to increase user satisfaction and loyalty to the brand [24, 131]. Moreover, personalization can help businesses target the right user groups, which benefit most from their services [8]. As a result, businesses can increase their sales revenue and make more profit [62].
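To make the ingredients above concrete, the following minimal sketch combines a simple user-model record with a context check to filter content, in the spirit of the newspaper example. It is purely illustrative: the fields, rules, and data are hypothetical and not taken from any of the cited systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class UserModel:
    # Direct input: stated interests; indirect input: observed click behavior.
    interests: List[str] = field(default_factory=list)
    clicks_per_topic: Dict[str, int] = field(default_factory=dict)

@dataclass
class Context:
    # Primary context types: location, identity, activity, and time.
    location: str
    activity: str
    hour: int

def select_articles(articles: List[Tuple[str, str, int]],
                    user: UserModel, ctx: Context) -> List[str]:
    """Show sports news only to users interested in sports,
    and prefer short reads while the user is commuting."""
    selected = []
    for title, topic, reading_minutes in articles:
        if topic == "sports" and "sports" not in user.interests:
            continue
        if ctx.activity == "commuting" and reading_minutes > 5:
            continue
        selected.append(title)
    return selected

user = UserModel(interests=["sports", "technology"], clicks_per_topic={"sports": 12})
ctx = Context(location="train", activity="commuting", hour=8)
articles = [("Match report", "sports", 3), ("Long-form politics essay", "politics", 20)]
print(select_articles(articles, user, ctx))   # -> ['Match report']
```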
However, there are also several challenges businesses have to face when employing personalization. First of all, personalization services have to ensure privacy and data control of their users' personal information [8, 24, 77]. When privacy cannot be ensured, users' acceptance of the service and their trust in using it are likely to decrease [8]. Additionally, the importance of privacy, data control, and transparency was recently underscored by the European General Data Protection Regulation [133]. Since users often underestimate how much personal information is used and struggle with building appropriate mental models of personalization algorithms, intelligible interfaces are of crucial importance [58, 135]. Personalization also allows the vendor to manipulate users by showing selected contents, opinions, and products, thereby influencing behavior [49, 131]. Finally, as discussed above, the automatic and successful recognition of user profiles has a great impact on the success of personalization [8, 91]. In this chapter, we discuss the role of personality traits for improving personalization in human–computer interaction (HCI). In a first step, we present the theoretical background of personality. Furthermore, we describe previous findings on the impact of personality traits on behavior and interaction with technology. Based on these findings, we discuss further opportunities and challenges of utilizing personality traits for personalization. Finally, we sum up our results and give suggestions for future work in our conclusion.
2 Theoretical background

One and the same person shows relatively consistent patterns of behavior and experience in acting, thinking, and feeling. The measurement and investigation of systematic variations in human behavior, thinking, and feeling have been documented and tested since 1115 BC [37]. These systematic, psychological patterns can be used to distinguish people from each other and are generally referred to as personality [88].
2.1 History of personality models in psychology

Personality research overall aims to find ways to comprehensively describe and explain the structure of personality. The most productive and still relevant paradigm has been the traditional trait approach, which assumes that traits dispose individuals toward specific ways of behaving and experiencing [6]. The beginnings of the modern definition of personality reach back to two models that have dominated the psychometric research scene for many years: Cattell's 16-factor model and Eysenck's Three-Factor Model [85]. Both models are based on classification systems that reduced vast
amounts of traits represented in the language of folk psychology to fewer, but meaningful dimensions. The 16-factor model provides narrower, so-called primary traits. In contrast, the Three-Factor Model describes personality in a more abstract way by using three higher-order secondary factors (extraversion, neuroticism, and psychoticism), which in turn comprise narrower, correlated traits [6, 85]. Over the years, researchers focused on further systematizing the classification of personality traits [50]. Following the psycholexical approach, it was assumed that important individual traits have entered natural language (e. g., via adjectives) and that the use of a word reflects its importance as a psychological descriptor [35].
2.2 Big Five model

The psycholexical approach revealed the most established personality trait model in psychology and related research areas: the Big Five [31]. The Big Five model has been claimed to be the most useful taxonomy for personality structure [88] and has therefore become a reference model in psychology [85]. The Big Five model postulates five broad and often replicated dimensions: extraversion, emotional stability, conscientiousness, agreeableness, and openness. For the assessment of these five factors, various self-report questionnaires have been developed, which use slightly different factor names [6]. The five global traits all comprise hierarchically organized subfacets, which allow for describing an individual's personality in a more detailed way. As an illustration, according to the Five-Factor Model of Costa and McCrae [31], the trait facets of extraversion are warmth, positive emotions, gregariousness, assertiveness, activity, and excitement seeking. Thus, extraverted people can be described as sociable, experiencing more positive affect, and seeking stimulating activities. In contrast, introverted people are less outgoing, reserved, and shy, and prefer to spend time alone. Emotional instability, also often called neuroticism, is associated with anxiety, hostility, and experiencing negative affect. People high in emotional stability are calm and relaxed, whereas people low in emotional stability feel tense and uncertain. Conscientiousness describes how orderly, reliable, self-disciplined, and achievement striving a person is. Willingness to help, compliance, modesty, and tolerance are, inter alia, important characteristics of agreeableness. Finally, openness describes people's tendency to be creative and to be receptive to feelings, art, ideas, and fantasy [6, 89, 35, 50, 85]. The Five-Factor Model has stimulated intensive research efforts within the last decades. Personality traits were found to be relatively stable, but they can change over a life span [28]. In addition, findings show that traits are stable across highly similar situations [5] and invariant across different observers, which means that self-, peer, and observer ratings converge [89]. Finally, the Five Factors have been postulated to be universal, as they empirically turned out to be valid for different sexes, races, cultures, and age groups [26].
2.3 Further models

Besides the Big Five model, other personality trait models, like the above-mentioned Big Three (extraversion, neuroticism, and psychoticism) by Eysenck or the Alternative Big Five (impulsive sensation seeking, neuroticism-anxiety, aggression-hostility, sociability, activity) by Zuckerman, have been proposed [6, 150]. In contrast to the Big Five, these models have a stronger focus on the psychobiological basis of personality. As an illustration, impulsive sensation seeking is one of Zuckerman's Alternative Big Five factors and describes the tendency to seek and run risks to experience a certain level of arousal. Accordingly, it has been argued that the Big Five model, due to its psycholexical foundation, is a trait concept of social interaction and therefore not able to depict the assumed underlying true biological basis of personality [150]. However, empirical analyses revealed that the identified factors of all three models converge in large part, most of all for extraversion and neuroticism [150]. Although the Big Five model has prevailed in psychological research to date, a new model, the so-called HEXACO, has been proposed recently [7]. This psycholexical-based model assumes six dimensions (honesty-humility, emotionality, extraversion, agreeableness, conscientiousness, and openness to experience) and, according to Ashton and Lee [7], is able to explain personality phenomena (e. g., altruism) that cannot be described within the Five-Factor framework.
3 The role of personality traits

Personality influences people's (social) behavior [31], preferences [21, 48], decision making processes, interests [102, 114], and life outcomes [106]. Hence, the role of personality traits in several domains has been investigated. Examples include work performance and intentions [100], driving behavior [30], well-being [36, 123], relationships [36, 125], job satisfaction [69, 94], and stress-coping strategies [87, 104], as well as medicine [42, 80]. In the introduction, we summed up different benefits and challenges of personalization. In the following, we present previous work regarding the relationship between personality traits and HCI, which could inform personalization and its benefits and challenges. (i) One of the main advantages of personalization is the opportunity to pre-select options for users and tailor them to their needs. Thus, in the first subsection we present previous findings on the role of personality traits in the preference for interactive systems. (ii) Personalization allows systems to reduce information overload. The perception of relevance of information and its link to personality are discussed in the second subsection. (iii) Furthermore, we introduce research on the relationship between personality traits and communication behavior as an opportunity to improve user experience. (iv) In the fourth subsection, we describe
previous results on the impact of personality traits on the perception of trust and acceptance, which are crucial challenges of personalization. (v) Finally, we present first approaches to using personality traits for personality-based recommender systems.
3.1 Preference for interactive systems

With improving technology, the choice between different systems becomes more difficult for users. For example, when two smartphones do not differ regarding technological measures, the user might have difficulty deciding which device to buy. Personalization makes it possible to address individual needs and preferences, which might play a decisive role in the purchase decision. Previous research suggests that personality influences how people make their decisions [102]. Moreover, humans prefer to interact with personalities reflecting their own personality [19]. This preference for congruent personalities can also be transferred to humans' choice of products [127, 144] and brands [60, 95]. For example, products and brands are associated with personality traits, such as activity for sports brands. Hence, we can assume that this preference also holds true for the choice of technologies, especially when intelligent systems representing humans are involved. For example, Ehrenbrink et al. [39] suggested that personality traits influence users' choice of an Intelligent Personal Assistant (IPA). They compared the three IPAs Siri, Cortana, and Google Now (predecessor of Google Assistant), which differ regarding their interaction with users. While Siri and Cortana act as if they have a personality, e. g., by telling jokes and giving emotional replies, Google Now shows rather neutral behavior. In their study, they found first hints that highly conscientious individuals preferred Siri and Google Now. They attributed this effect to a more profound display of information on these devices, in contrast to Cortana, when users asked the IPAs questions. Probably due to a lack of prior exposure to Cortana, individuals with a low score on openness tended to dislike Cortana [39]. Rauschnabel et al. [112] reported that personality traits impact the motivation to buy smart glasses. While extraverted users were interested in smart glasses when they expected social conformity, users with high scores in openness focused on functional benefits. However, emotional instability moderated the perceived benefits, especially when people anticipated a strong effect on their lives [112]. Summing up, previous findings suggest that personality traits can explain a preference for systems [64, 102]. This could help developers and companies create systems tailored to specific personalities in order to stand out from others. However, this relationship has to be examined in more depth to find clearer connections between preferences and personality traits and to determine the underlying reasons for them.
3.2 Providing personalized information

As pointed out in the introduction, a primary goal of personalization is to filter information, reducing information overload and increasing relevance [8]. However, the amount of information perceived as relevant varies between individuals. Thus, in a first step, we present findings regarding different information-seeking types. In the second subsection, we introduce previous results on how personality traits affect the way information should be presented to the user.

3.2.1 Information seeking

Personality traits influence the way people seek information [59]. People low in emotional stability have difficulties evaluating the quality and relevance of a piece of information and thus tend to prefer new information that confirms their prior knowledge. Furthermore, they easily give up on information searches when their query is unsuccessful, especially since they perceive a lack of time to put more effort into the search. People experiencing these anxieties in combination with low levels of conscientiousness are classified as fast surfers, who quickly skim information, avoiding high effort and deep engagement with the topic [59]. Individuals high in extraversion are typically active and energetic, which is also reflected in their information-seeking behavior. Due to their high social abilities, they tend to use their social contacts as information sources. People high in extraversion in combination with openness to experience and low agreeableness (competitiveness) are characterized as broad scanners, who are likely to be exhaustive yet unsystematic information seekers using a wide range of sources [59]. Apart from their preference for wide-ranging queries and willingness to put effort into the search, individuals high in openness to experience tend to prefer thought-provoking new information. They are usually intellectually curious and able to judge and reflect on information critically. In contrast, conservative individuals low in openness often have a desire for confirming and precise information, avoiding conflicting sources [59]. Individuals low in agreeableness are competitive and capable of critically analyzing information. However, due to their impatient character, they tend not to put too much effort into the search. They are also more likely to be broad scanners of information [59]. Finally, it is not surprising that individuals high in conscientiousness are deep divers, working hard and trying to obtain high-quality information. Apart from the immense effort they put into the search, they also pay attention to the quality of the retrieved information and follow a structured, deep analysis approach [59]. They are also distinguished by information competence [128]. On the other hand, individuals low in conscientiousness are easily distracted, hasty, and impulsive and thus try to retrieve information as easily as possible [59].
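The trait combinations behind these information-seeking types lend themselves to a simple rule-based mapping. The following Python sketch is purely illustrative: the type names follow the description above, while the thresholds and the normalized score representation are our own assumptions, not part of the cited work.

    from dataclasses import dataclass

    @dataclass
    class BigFive:
        """Normalized Big Five scores in [0, 1] (illustrative representation)."""
        openness: float
        conscientiousness: float
        extraversion: float
        agreeableness: float
        emotional_stability: float

    def information_seeking_type(p: BigFive, low: float = 0.4, high: float = 0.6) -> str:
        """Assign a coarse information-seeking profile from trait levels.

        The rules paraphrase the trait combinations described in the text;
        the thresholds are arbitrary illustrative values.
        """
        if p.emotional_stability < low and p.conscientiousness < low:
            return "fast surfer"    # skims quickly, avoids effort, prefers confirming information
        if p.extraversion > high and p.openness > high and p.agreeableness < low:
            return "broad scanner"  # exhaustive but unsystematic, uses many sources
        if p.conscientiousness > high:
            return "deep diver"     # structured, quality-oriented search
        return "unclassified"

    # Example: a highly conscientious, introverted user is classified as a deep diver.
    print(information_seeking_type(BigFive(0.5, 0.8, 0.3, 0.5, 0.6)))

Such a mapping could, for instance, decide how much detail a search interface displays by default.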
Tkalčič et al. [137] found a relationship between personality and preference for digital program notes of classical music concerts. Their results indicate that users high in openness, agreeableness, conscientiousness, and extraversion prefer more meta-information about concerts.

3.2.2 Personal visualizations

Apart from the amount and depth of desired information, personality traits also play a role in how this information should be presented. Green and Fisher [51] outlined the impact of personality traits on interaction and performance in expert analytics systems and emphasized the need for real-time interface individualization. Ziemkiewicz and Kosara [148] investigated users' interactions with visual metaphors. They found that individual differences, inter alia personality traits, determine a user's ability to use different visualizations and satisfaction with representations. While they focused on expert visualizations with the primary goal of interpreting information quickly, Schneider et al. [124] investigated the role of personality traits in the everyday use of personal visualization. They compared plain and decorated visualizations for communicating the user's daily water intake. Their results revealed that participants high in extraversion, openness, and agreeableness preferred a decorated visualization of a creature that starts smiling with increased water intake. Since the combination of these personality traits is associated with a high need for affect, the participants might have found a cute creature more engaging than an unemotional representation. Highly conscientious participants, however, disapproved of the creature visualization due to a lack of detailed information. Ferwerda et al. [46] identified distinct music browsing behavior based on users' personality, which could inform the design of user interfaces. For example, their findings revealed that while highly open users preferred to browse music by mood, highly conscientious users favored browsing by activity. In summary, previous findings suggest an impact of personality traits on the preference for depth and visualization of information. We assume that these findings can not only help developers to present personalized content to the user but also inform intelligible explanations of how the underlying personalization algorithms work, improving the system's transparency, which is gaining increasing importance [133]. However, users' preferences have to be analyzed in more detail, and an in-depth understanding of the underlying reasons for the relationship between preferences and personality traits is necessary [124].
3.3 Communication behavior

Several associations between individual personality trait levels and interpersonal communication behavior have been reported in previous research. Most intuitively, the personality trait of extraversion has repeatedly been related to both the frequency and the duration of computer-mediated communication behaviors on smartphones [18, 93, 130]. Furthermore, extraversion has also been associated with linguistic characteristics such as higher abstractness of language [15] and specific voice features, such as higher pitch [122]. More extensive studies reported associations between several personality dimensions and word use in blogs and social networks [108, 145]. Specifically, the trait of openness was associated with higher diversity in word use, both on the categorical and the single-word level [145].
3.4 Trust and acceptance

When humans interact with autonomous machines and Artificial Intelligence, as is often the case in personalized systems, they abandon control and allow the machine to make decisions or contribute to decision making. Hence, humans have to accept and trust the machine to perform the given task [116, 121], posing crucial challenges for personalization. Yet, not all users respond to automation with the same level of trust [63, 79]. Previous work suggests that some personality traits influence humans' trust in machines as well as their interaction with them [11, 53, 116, 121]. Evans and Revelle [43] showed an effect of extraversion and emotional stability on trust development in a robot. Haring et al. [55] also discovered that extraverted individuals reported higher trust in humanoid robots. In contrast, Salem et al. [120] could not detect any relationship between personality traits and robot trust development. Instead, they found that individuals high in extraversion and emotional stability anthropomorphized the robot more and felt close to it. On the other hand, Hancock et al. [54] found little evidence of an impact of human characteristics on trust in human–robot interaction, and Schaefer et al. [121] stressed that the relationship between personality traits and trust development has not been thoroughly explored yet. These differences in perceived trust can also influence users' intention and actual usage of technology. Individuals who score low on emotional stability generally have more negative feelings towards technology and technological advances and are more cautious about using them [12]. Openness was found to positively influence the use of new technologies [146]. Moreover, conscientiousness [12] and agreeableness [126] moderate the relationship between behavioral intent and extent of use. On the other hand, personality traits can also play a role in trusting too easily and hence present a vulnerability to privacy attacks. Halevi et al. [53] suggested that highly neurotic individuals are more susceptible to phishing attacks. Moreover, a link between some personality traits and the willingness to disclose private information online was indicated, although Halevi et al. [53] found openness to be the salient component, while Bansal et al. [11] only found an effect for social components such as agreeableness, extraversion, and emotional instability. First findings from a study collecting the opinions of 5,000 participants on automated driving revealed that individuals low in emotional stability were more anxious about data transmission in autonomous cars, whereas more agreeable respondents felt more comfortable with it [79]. Furthermore, personality traits can also play a role in active security behavior to avoid attacks [75, 142].
3.5 Personality-based recommender systems

Today, people are confronted with seemingly endless possibilities when buying a product in online shops like Amazon, picking tonight's TV show on Netflix, or choosing an activity on TripAdvisor. To support the user in making the best decision, reduce information overload, and engage users, recommender systems provide recommendations based on users' preferences [115]. The role of personality traits in improving recommender systems has been explored before, revealing promising results [25, 41, 44, 45, 64, 65, 71, 114, 118, 136]. An overview of how personality user models can improve recommender systems can be found in [136]. There are several different approaches to implementing recommender systems [103]. Content-based recommender systems only use the user's ratings on items and then recommend items that have similar attributes, e. g., genre or actors, to the user's preferred items [118]. In these recommender systems, the user's personality serves as an additional, psychological attribute of a product [118]. Hu and Pu [65] compared a personality-based recommender system with a typical content-based recommender system using user ratings. Although they could only determine small differences regarding the perceived accuracy of the recommendations, personality-based recommender systems were perceived to be significantly easier to use and preferred by the majority of users [65]. However, it should be noted that the evaluated recommender system's personality quiz is not based on solid psychological foundations. Another possibility for giving recommendations is using the link between personality traits and entertainment preferences, for example preferences for music [114, 21, 45], film and TV show genres [23, 21], as well as book genres [21]. Based on Rentfrow and Gosling's [114] findings, Hu and Pu [66] presented a recommender system that infers users' music preferences based on their personality traits. They used personality quizzes to build profiles for users and their friends and compared their recommender system with a rating-based recommender system but could not determine any significant differences regarding the accuracy of the recommendations. However, users enjoyed using the system to find recommendations [66]. Collaborative filtering is the most popular recommender system technique, recommending items liked by other users with similar interests and preferences [103]. Each user's profile consists of items and ratings as well as previous usage history [118]. For example, user Tom is interested in science fiction books. When other users with similar preferences in the past buy a new science fiction book, this book would also be presented to Tom. Roshchina et al. [118] presented a collaborative filtering recommender system, which automatically induced personality traits from the user's writings. In their system TWIN, they recommended items chosen by people with similar personality profiles (twins). Applying their system to a dataset from the travel application TripAdvisor, they could show that such an approach can in principle produce valuable recommendations, although the accuracy remained quite low. Karumur et al. [71] suggested that recommender models using ratings from users with similar personalities can improve consumption. Elahi et al. [41] developed an active learning approach that utilizes users' personality traits to improve the number and accuracy of their ratings. Fernández-Tobías et al. [44] showed that incorporating users' personality into collaborative filtering improved performance and the novelty of recommended items for new users. Moreover, personality traits are associated with users' preference for diversification of recommendations [25]. In summary, previous research on personality-based recommender systems suggests that recommendations based on personality traits can be accurate and improve user experience. However, several challenges have to be addressed in order to actually deploy these recommender systems in practice. These challenges include improving the accuracy of recommendations, automatically recognizing personality traits, and an in-depth analysis of users' acceptance of these recommender systems. However, personality traits are not only useful for improving recommender systems; there are several other opportunities for HCI as well. In the following section, we present these opportunities.
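As a rough illustration of the twin idea behind such personality-based collaborative filtering, the sketch below scores candidate items by the ratings of the users whose Big Five vectors are most similar to the target user. It is a minimal, hypothetical sketch (cosine similarity over trait vectors, similarity-weighted rating averages) and not a reproduction of the TWIN system or any other cited implementation.

    import numpy as np

    def personality_similarity(a, b):
        """Cosine similarity between two Big Five trait vectors."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def recommend(target_traits, users, k=2, top_n=3):
        """Recommend items rated highly by the k most personality-similar users.

        `users` maps a user id to a tuple (trait vector, {item: rating}).
        """
        neighbors = sorted(
            ((personality_similarity(target_traits, traits), ratings)
             for traits, ratings in users.values()),
            key=lambda pair: pair[0], reverse=True)[:k]
        scores = {}
        for sim, ratings in neighbors:
            for item, rating in ratings.items():
                scores.setdefault(item, []).append(sim * rating)
        return sorted(scores, key=lambda item: np.mean(scores[item]), reverse=True)[:top_n]

    users = {
        "anna": ([0.8, 0.4, 0.9, 0.6, 0.5], {"hotel_a": 5, "hotel_b": 2}),
        "tom":  ([0.3, 0.9, 0.2, 0.5, 0.7], {"hotel_b": 5, "hotel_c": 4}),
    }
    # A new user whose traits resemble Anna's receives Anna's highly rated items first.
    print(recommend([0.75, 0.45, 0.85, 0.6, 0.5], users, k=1))

Because the neighborhood is formed from trait vectors rather than rating histories, the same scheme can also produce suggestions for users who have not rated anything yet.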
4 Opportunities

Several opportunities for considering personality in interactive systems arise from the literature, as reviewed in the previous section. Indeed, we find that information about a user's personality might be used at several stages of the interaction process – from motivation, choice, and a user's first contact with a system to continued use and feedback. In the following, we present diverse application opportunities which cover different parts of this range. In particular, these consider personality for (i) informing context and content of personal communication, (ii) improving recommendations upon first use, (iii) supporting behavior change in persuasive technology, (iv) facilitating comfortable driving in autonomous vehicles, and (v) enabling overall empathic systems.
4.1 Personal communication

One promising use case for personality-aware personalization is personal digital communication. Here, personality could inform both context and content. Regarding context, personality information could help to address questions of when, how, and whom.
For instance, personality could be used by an intelligent contact list to help users decide when to contact others, similar to ContextContacts [105]. For each contact, this list could use both the person's context (e. g., location, time, activity, company) and his or her personality to predict how likely the person would be to respond or to feel disturbed. Similarly, such a contact list application could use personality information to predict how each person would like to be contacted right now (e. g., phone call vs. text message). Finally, personality could support the user in deciding whom to contact, for example when looking for a sports partner (e. g., regarding preference for cooperation vs. competition) or people to complete a team for a job. More concretely, for example, a social network might highlight friends of friends who not only share similar interests but also bring compatible skills and personalities. Regarding communication content, personality information could be used in systems which automatically generate text as reply suggestions (see, e. g., Google's Smart Reply [70]). In particular, the personality of both the sender and the receiver might be considered to generate adequate content, in addition to other factors such as context and type of relationship. For example, if the receiver is highly agreeable, the system could suggest friendly, polite, and harmonious language, even if the sender usually tends to write short and precise texts. Moreover, personality information might also be useful to adapt common nontextual content, such as emojis or (animated) avatars. Here, the user's personality might lead to different visuals or animations, thus (subtly) communicating personal aspects of the user to others. For example, a hooray emoji might express joyful excitement rather differently for an extraverted user compared to a more introverted one (e. g., throwing hands into the air vs. a bright smile). As a result, such digital conversations might be perceived as more personal and intimate, similar to the findings in related work on context-aware messaging (see, e. g., [17, 56]).
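A minimal sketch of how the receiver's personality might steer such generated content: below, a hypothetical helper chooses between two pre-generated reply phrasings based on an agreeableness score. The threshold, the style labels, and the candidate texts are all illustrative assumptions, not part of Smart Reply or any cited system.

    def pick_reply(candidates: dict, receiver_agreeableness: float) -> str:
        """Pick a reply phrasing style based on the receiver's agreeableness.

        Highly agreeable receivers get the warmer, more polite variant;
        the 0.6 threshold is an arbitrary illustrative value.
        """
        style = "warm" if receiver_agreeableness > 0.6 else "concise"
        return candidates[style]

    suggestions = {
        "warm": "Thanks so much for the update, that sounds great! See you tomorrow :)",
        "concise": "Thanks. See you tomorrow.",
    }
    print(pick_reply(suggestions, receiver_agreeableness=0.8))

In a real system, the sender's own writing style would additionally constrain the candidates so that the suggestion still sounds like the sender.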
4.2 Recommendations upon first use

Previously, we presented first approaches from current research for personality-aware recommender systems. In addition, personality traits offer the opportunity to provide a stable foundation for recommendations, both (i) upon first use and (ii) across use cases. First, accurate recommendations for new users are difficult due to the lack of behavior records [64]. A stable construct like personality might support systems in overcoming this cold start problem [136]. Regarding the integration of such user information, related work already utilized demographic information for content-based similarity [64]. Moreover, Elahi et al. [41] showed that personality traits can be used to improve the suggestions of items that users are requested to rate, avoiding the cold start problem. Second, personality might be used across systems. For example, recommendations for movies and music are often given by separate systems. Personality traits could be used to link such different domains [64]. In contrast, specific likes or ratings might be harder to transfer adequately from one (content) type to another.
4.3 Persuasive technology

Several applications try to persuade their users to show a specific behavior, for example continuing to play an online game, spending more time on a website, or buying recently viewed products. Furthermore, users often track their own physical activity or financial expenses in order to reach a specific goal [124]. Due to today's global overweight and obesity problems, mobile well-being apps are gaining increasing popularity. These fitness or nutrition apps are designed to nudge the user's behavior towards a healthier lifestyle, but suitable behavior changes are very individual [13]. Users' personalities, along with their goals, can influence how and when they should be persuaded to improve their well-being. Imagine Anna and Tom, who both want to improve their lifestyles and lose weight. Anna is an extravert; she likes to engage actively and go out with other people. A possibly successful intervention for her could be to ask her to eat together with friends or colleagues who already foster a healthy lifestyle, since she would probably appreciate the company and can easily be persuaded by social contacts. On the other hand, Tom is more introverted and highly conscientious. For him, a suitable intervention strategy could be to provide him with facts and information. For instance, a mobile app could show the calories of a burger compared to a salad for lunch and outline how many calories are left for dinner for both choices. In contrast to Anna, he would not be comfortable with relying on other people's advice. Lepri et al. [83] investigated the role of personality in inducing behavioral change. They found that individuals high in extraversion or neuroticism react positively to social comparison intervention strategies in order to increase daily physical activity. In contrast to emotionally unstable persons, extraverts decrease their physical activity if confronted with a peer pressure social strategy. Apart from developing suitable interventions, the visualization of the user's personal data and behavior is of significant importance to give him or her feedback about his or her behavior without evoking negative feelings [134]. However, these visualizations have to address diverse user needs, and while one type of feedback might work for some users, it could be rejected by others [67, 124]. For instance, conscientious individuals could appreciate honest feedback, while negative feedback could easily discourage neurotic users. Another possibility to provide persuasive feedback is gamification. Gamification research has already utilized previous results from psychological theories of motivation to improve user experience and user feedback in systems [119]. In addition to theories of motivation, feedback design could be further enriched with insights from personality psychology. The most relevant findings for the design of feedback systems relate to the differential sensitivity to rewards and punishments (aversive stimuli) with regard to individual levels of extraversion and emotional stability [34, 111]. Whereas many computer games have mostly focused on the use of visual and auditory rewards (+1, awesome, level-up), research from the field of psychology suggests individual differences in reward dependence [29]. Therefore, the rate, intensity, and content of system feedback could be adjusted to those individual dispositions. For example, the intensity of punishments or negative user feedback could be adjusted to individual levels of emotional stability. Possibly, personalized feedback could then be used in persuasive systems design (e. g., to increase desired or to decrease undesired behavior) to improve the overall user experience.
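The Anna and Tom example above amounts to a small decision rule over trait levels. The following sketch makes that rule explicit and additionally scales the intensity of negative feedback with emotional stability, as suggested in the preceding paragraph; all thresholds and the intensity formula are illustrative assumptions.

    def choose_intervention(extraversion: float, conscientiousness: float,
                            emotional_stability: float) -> dict:
        """Pick a persuasion strategy and a feedback intensity from trait levels.

        Social strategies for extraverts (Anna), fact-based feedback for
        conscientious users (Tom), and softened negative feedback for users
        low in emotional stability. Thresholds are illustrative only.
        """
        if extraversion > 0.6:
            strategy = "social comparison"         # e.g., eat or exercise with friends
        elif conscientiousness > 0.6:
            strategy = "facts and progress stats"  # e.g., calories left for dinner
        else:
            strategy = "small default nudges"
        # Less emotionally stable users receive gentler negative feedback.
        negative_feedback_intensity = round(0.3 + 0.7 * emotional_stability, 2)
        return {"strategy": strategy,
                "negative_feedback_intensity": negative_feedback_intensity}

    print(choose_intervention(extraversion=0.8, conscientiousness=0.4, emotional_stability=0.5))
    print(choose_intervention(extraversion=0.2, conscientiousness=0.9, emotional_stability=0.8))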
4.4 Autonomous vehicles

Using personality traits for personalization could also be a very promising approach for autonomous driving. In this context, trust in the autonomous system is especially important because the driver has to hand over the driving task to the car [116, 121]. Besides improvements in the reliability and functionality of autonomous vehicles, an approach to increase trust in autonomous vehicles could be to carefully explain the actions of the car to the passenger [57]. For example, to support highly neurotic drivers in critical situations, e. g., take-over requests, providing confirming and clear information can reassure the driver and prevent distraction. Individuals high in extraversion and openness to experience, in turn, could benefit from an intelligent assistant that takes over the role of a friend or passenger and provides brief and precise explanations but simultaneously satisfies the need for social interaction and variety. It is likely that conscientious drivers prefer an indicator of the car's confidence in performing the current driving task, allowing them to maintain a sense of control. This indicator could also be useful to agreeable drivers, who might otherwise develop inappropriately high levels of trust in the system [11]. Another use case in the automotive context could refer to non-driving-related activities, which will come into focus with increasing automation [110]. Using Advanced Driver Assistance Systems can be experienced as a reduction of driving enjoyment and fun by some drivers [38], requiring autonomous vehicles to offer alternative activities [110]. These activities are likely to be dependent on the driver's or passenger's personality and associated behavior patterns. For example, we can expect that passengers high in extraversion more often seek active and energetic activities and appreciate possibilities to interact socially [150]. Users' preferences for specific app usage [130] could provide clues for suitable tasks, such as a focus on entertainment and communication applications for extraverted drivers or efficient task scheduling applications for organized, conscientious drivers. Passengers open to new experiences, and therefore frequently pursuing external stimuli, could be provided with information about points of interest on the road or the latest news of the local area.
4.5 Empathic systems

Beyond scientific literature, (popular) culture also envisions personality-aware systems. For example, empathic intelligent robots, which give humans the feeling of being completely understood, have appealed to people since the Greek myths [101]. Due to this fascination, empathic robots appear in recent fiction, e. g., in the movies Her, Electric Dreams, Ex Machina, and A. I. Artificial Intelligence, in the TV show Black Mirror, and in novels such as Origin by Dan Brown. Previous research suggests that – similar to human–human communication – humans automatically and unconsciously attribute a personality to a virtual humanoid character [90, 113]. Hence, equipping an intelligent agent (e. g., chat bots, voice assistants, humanoid robots) with a personality will be an important requirement for successful human–robot interaction [98, 132]. Furthermore, the Similarity Attraction Paradigm indicates that humans feel more attracted to humans with a similar personality [19]. Likewise, adapting an intelligent agent's personality to the human user was found to increase credibility, perceived competence, performance, and compliance [4, 81, 98, 132]. For example, a robot interacting with a conscientious user could provide a large amount of information and always be reliable and trustworthy. Agreeable users could prefer a robot that is highly sociable and friendly and offers support to the user. On the other hand, contradicting findings also suggest a complementary attraction paradigm [4, 82]. In these scenarios, intelligent agents behave more like friends to the user. However, this opportunity also raises several research questions: When do users prefer intelligent agents to behave similarly or complementarily to them, or should agents sometimes just show random behavior to avoid predictability? Do users sometimes need to be pushed out of their comfort zone? For example, should an intelligent agent interacting with an introvert sometimes ask this user to be more active in order to introduce more stimuli? Do users sometimes need the intelligent agent to contradict them, and if so, when and how often do users want intelligent assistants to behave differently from their personality? We will further discuss technological challenges regarding the synthesis of personality in the following section.
5 Challenges

Apart from these opportunities, personality-aware personalization also poses several challenges. With respect to technological barriers, an important technical challenge is the automatic assessment of the user's personality. To create empathic systems, a consistent personality of intelligent agents has to be synthesized. In the first subsection we present different promising approaches for personality computing. With respect to the user, personalization faces several challenges, as presented in the introduction.
We think that these challenges are particularly important when designing for personality. Hence, in the second subsection we discuss effects of personality-aware personalization for users and their possible views and concerns.
5.1 Personality computing

Personality computing has gained increasing interest in the HCI community due to current interaction phenomena. On the one hand, users' personal information and behavior are available on social networking platforms and easily accessible via their smartphone use. On the other hand, current trends concern endowing machines with social and affective intelligence [141]. In their survey, Vinciarelli and Mohammadi [141] identified three main challenges of personality computing, i. e., (i) automatic personality recognition, which refers to ascertaining an individual's true personality from machine-detectable cues, (ii) automatic personality perception, which is concerned with the prediction of the personality others attribute to an individual, and (iii) automatic personality synthesis, which deals with the generation of artificial personality through intelligent agents, such as virtual assistants or embodied robots. Since automatic personality perception is less related to personality-aware systems, we focus on the other two challenges in the following subsections.
5.1.1 Automatic personality recognition

To utilize the aforementioned opportunities of personality awareness, systems must be able to automatically recognize the user's personality. Since personality is a latent construct, personality traits cannot be measured directly but can potentially be inferred from a set of indicators. Psychometric self-report questionnaires are currently considered the gold standard of personality assessment and are used in a wide range of academic and professional settings due to their predictive capabilities for important life outcomes [106]. Unfortunately, questionnaire approaches are also subject to a series of methodological biases, such as response styles, social desirability, and memory [139]. Furthermore, users might not be willing to fill out long questionnaires before interacting with a system. Due to these limitations of existing approaches and due to recent technological advances, research efforts have started to focus on the observation of personality manifestations in the form of digital-footprint data. Digital footprints such as Likes from social media [147], app usage on smartphones [130], or records of language use [108, 145] are becoming increasingly available to researchers. Social media data in particular have been shown to be predictive of individual personality [9]. Additionally, researchers have aimed at the recognition of self-reported personality traits from facial image data [22] and have reported on associations of music preferences and individual personality traits [45, 99]. In order to enable personality-aware personalization, trait levels of personality could be directly predicted from digital footprints [9]. Despite the obvious opportunities of this new approach, it also raises questions regarding the ground truth of personality trait assessment as well as its accuracy. In a best-case scenario, personality assessment from digital footprints (in the current form) could perfectly predict self-reported personality scores, measured with conventional personality questionnaires. However, as mentioned above, self-reported personality measures are subject to a number of biases and do not necessarily constitute perfect measures of latent personality traits themselves. One particular problem that remains with regard to digital footprint-based personality recognition is the accuracy on the level of individuals. Average prediction accuracies of personality predictions have been reported as relatively low (r = 0.34, 95 % CI [0.27–0.34]) [9], allowing for usage in a best-guess fashion only. Still, research suggests that digital footprint-based personality predictions might be good enough for personality-based adaptations when applied to large samples [86]. However, when it comes to precise psychometric testing decisions on an individual level (e. g., does person X or person Y score higher), much higher precision is needed. Ultimately, personality assessment from digital footprint data needs to be validated based on relevant life outcomes in order to benchmark it against existing methodologies. We hypothesize that for a while, the field will need to balance the need for fast, deployable business solutions with mapping out fine-grained manifestations of personality in digital footprint data. Finally, the automatic extraction of personality dimensions from user data also raises a series of privacy and data ownership issues. We will discuss those in Section 5.2.
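Trait prediction from digital footprints is typically framed as a regression problem against self-reported questionnaire scores and evaluated with correlations such as the r = 0.34 reported above. The sketch below illustrates that framing on synthetic data with a regularized linear model from scikit-learn; the feature set and the signal strength are invented for the example and do not reproduce any cited study.

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for footprint features (e.g., app-usage or word counts)
    # and self-reported extraversion scores with a deliberately weak signal.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))          # 500 users, 20 footprint features
    weights = rng.normal(size=20)
    y = 0.3 * (X @ weights) + rng.normal(size=500)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = Ridge(alpha=1.0).fit(X_train, y_train)

    r, _ = pearsonr(model.predict(X_test), y_test)
    print(f"Correlation between predicted and self-reported scores: r = {r:.2f}")

Reporting the correlation on held-out users, rather than on the training data, is what makes such a number comparable to the accuracies cited above.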
5.1.2 Automatic personality synthesis

In Section 4.5 we discussed the benefits of empathic intelligent agents. To achieve a realistic and natural personality, intelligent agents have to show consistent patterns of behavior [3]. For example, an agent acting reserved and shy in one situation, yet extraverted and chatty in another, would be confusing for humans [74]. Hence, to achieve a successful human–agent interaction, the personality of an intelligent agent has to be designed carefully. Automatic personality synthesis refers to the automatic generation of behavioral cues to elicit the perception of intended personality traits [141]. These behavioral cues are perceptible externalizations of the internal and non-perceptible personality [122]. For example, humans assume that an intelligent agent which talks fast and loudly while gesturing expansively is extraverted. Thus, automatic synthesis is supposed to support agent designers in explicitly controlling the traits humans attribute to the intelligent agent [141].
Several researchers have shown that a systematic variation of intelligent agents' synthesized behavior leads to unanimous attribution of personality traits [90]. For example, to elicit extraversion, researchers used different behavioral cues and channels, such as speech rate and pitch [98, 143], gaze [4], gestures [72, 132], and facial expressions [3, 72]. However, the relationship between the Big Five personality traits and perceptible behavior has mainly been researched for extraversion, as it is the most observable trait [122]. Yet, when designing intelligent agents, e. g., for assistive or educational contexts, other traits such as conscientiousness or agreeableness are more important. Hence, further research is necessary to determine how to synthesize personality traits other than extraversion. Moreover, the combination of different personality traits and potentially contradicting behavioral cues still has to be researched.
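In practice, synthesis comes down to mapping a target trait level onto concrete parameters of the agent's behavior channels. The sketch below does this for extraversion using the channels named above (speech rate, pitch, gestures, gaze); the numeric ranges are illustrative defaults for a hypothetical agent, not empirically validated values.

    def extraversion_cues(level: float) -> dict:
        """Map a target extraversion level in [0, 1] to example behavioral cues.

        Higher levels yield faster, higher-pitched speech, larger gestures,
        and more direct gaze; the ranges are illustrative only.
        """
        return {
            "speech_rate_wpm": 120 + 60 * level,
            "pitch_shift_semitones": 2 * level,
            "gesture_amplitude": 0.2 + 0.8 * level,
            "gaze_at_user_ratio": 0.4 + 0.4 * level,
        }

    print(extraversion_cues(0.9))  # chatty, lively agent
    print(extraversion_cues(0.2))  # reserved agent

Keeping all channels driven by the same trait level is one simple way to obtain the behavioral consistency demanded above.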
5.2 User views and concerns

In the introduction, we presented several challenges of personalization with respect to possible user concerns: (i) privacy and data control, (ii) acceptance of the service and trust in using it, (iii) intelligible interfaces, and (iv) the threat of manipulation. In this subsection, we discuss each of these challenges in regard to using personality traits for personalization.
5.2.1 Privacy and data control

Privacy concerns might arise from determining users' traits for offering personality-aware personalization. As discussed before, we assume that the most promising option is to automatically detect personality traits based on natural user data like smartphone logging data [27] or digital footprints left in social media [76, 147]. In this case, it will be indispensable to collect users' data (e. g., habits, context, or usage behavior) constantly in the background of the used device for providing personalized services. As already known from consumption research, privacy concerns depend on the type of disclosed information and are especially high for personal data [14]. Previous findings have already suggested that personality traits are perceived as sensitive data [66]. Hence, future research has to investigate users' attitudes towards personality-aware personalization. Moreover, due to the huge amount of stored data on individual and probably unique behavior, it is likely that individuals can be identified unambiguously by using these records of personality and corresponding behavior, similar to fingerprints or DNA [97]. Hence, privacy regulations to protect the user and address these new possibilities for identification have to be developed. Users might also be worried that other people see their assessed personality profiles, especially on sensitive measurements like emotional stability [66, 109]. Another problem might arise when people use other people's devices, e. g., borrowing a phone to make a quick call. If the system is adapted to the user's personality, is it then possible for others to infer the user's personality by using his or her device?
5.2.2 Acceptance and trust

Interacting with autonomous machines and Artificial Intelligence often requires the user to abandon control and to allow the machine to be involved in decisions. Therefore, humans have to accept and trust the machine to perform the given task [116, 121]. Consequently, a major challenge of personality-aware personalization is the question of whether users accept the use of personality traits as input for personalization. Therefore, one of the most important research questions is to examine users' reactions to personality-aware personalization. Furthermore, the contexts, user goals, and tasks for which users find personality-aware personalization useful should be identified. It will be particularly interesting to investigate the influence of users' self-characterization versus the system's characterization on acceptance. So far, little research has been conducted regarding the acceptance of personality-aware personalization. When comparing a personality-based with a rating-based recommender system, Hu and Pu [65] showed that users subjectively preferred the personality-based system and found it easier to use. However, they stressed the importance of system transparency and user control for user acceptance [65], which is discussed in the following section.
5.2.3 Intelligibility and transparency

The EU's General Data Protection Regulation requires that systems reveal which information they collect about their users and how this information is used, giving users the right to explanation [133]. Hence, systems using personality traits as part of personalization algorithms have to make these procedures transparent and intelligible to their users. Moreover, intelligible explanations of the system's behavior can increase user trust, satisfaction, and efficiency, among other benefits [40, 84, 135]. Yet, this need for transparency poses several challenges to the developer. First of all, the user has to develop an understanding of personality trait models themselves to form an accurate mental model. The accuracy of this mental model is important for users' trust [78]. On the one hand, personality as an explanation for a specific system behavior could be easier to understand for users than complex user behavior algorithms [102]. This kind of explanation refers mostly to everyday knowledge and use of personality traits. On the other hand, providing a more in-depth or scientific explanation of personality traits could prove difficult, particularly when there is little space to provide explanations, such as in mobile applications. Furthermore, while one trait might be easy to explain ("The system did this because it thinks you are an extravert"), describing the interaction between several personality traits seems to be far more difficult. It might be necessary to combine levels of several traits to find new understandable descriptions, for example the information-seeking types broad scanner and deep diver [59]. Another challenge will be to visualize personality traits and corresponding models. Should systems provide a mere textual description or also show graphical explanations? Again, it seems easier to find graphical descriptions of one personality trait, whereas the combination of several characteristics quickly becomes more complicated. It might also prove difficult to provide a neutral or positive description of personality traits to users. Most humans associate negative or positive attributes with specific personality traits. Most users will likely disapprove of explanations like "I just did that because you have such an anxious personality." When transparent systems allow the user to understand the determining algorithms, they must also give their users the opportunity to give feedback to the system and to control the user model [10, 73]. For example, in rating-based recommender systems, the user has a clear understanding of the recommendations. By changing his or her ratings, the user can easily influence the recommendations. In contrast, personality-based recommender systems are less intelligible, and the user might not know how to achieve a different result [65]. To improve the system's scrutability, users should be given the possibility to tell the system that they like or dislike a recommendation despite their personality. User feedback and control might be especially important when the system and the user disagree about the user's personality. If the user thinks he or she is extraverted and conscientious but the system determines different results, the user will probably be dissatisfied with the results and lose trust in the system. However, the question remains what the ground truth of somebody's personality should be. Should the user have the power to tell the system which personality he or she has – or is the system's analysis more accurate than the user's self-assessment and thus should be given higher authority?
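To make the two requirements from this subsection concrete, the sketch below generates a short, positively worded explanation from the dominant trait and records an explicit user override so that a disliked recommendation is not repeated. The phrasings, the trait selection rule, and the user-model structure are illustrative assumptions.

    TRAIT_PHRASES = {
        "extraversion": "you seem to enjoy social, lively content",
        "conscientiousness": "you seem to prefer detailed, well-structured information",
        "openness": "you seem to like novel and varied suggestions",
    }

    def explain(traits: dict, item: str) -> str:
        """Explain a recommendation via the user's dominant (positively framed) trait."""
        dominant = max((t for t in traits if t in TRAIT_PHRASES), key=traits.get)
        return f"We suggested '{item}' because {TRAIT_PHRASES[dominant]}."

    def apply_feedback(user_model: dict, item: str, liked: bool) -> None:
        """Record an explicit override so the user can veto personality-based choices."""
        user_model.setdefault("overrides", {})[item] = liked

    model = {"traits": {"extraversion": 0.8, "conscientiousness": 0.5, "openness": 0.6}}
    print(explain(model["traits"], "group fitness challenge"))
    apply_feedback(model, "group fitness challenge", liked=False)
    print(model["overrides"])

Only traits with a friendly phrasing are ever surfaced, which sidesteps explanations of the "anxious personality" kind criticized above.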
5.2.4 Manipulation concerns

Personality-aware personalization implies the risk of user manipulation and therefore clearly represents a challenge for users' perception and acceptance of this concept. Ever since the international headlines about Cambridge Analytica [20, 117], the fear of unconscious manipulation has emerged. In early 2018, Cambridge Analytica fell into disrepute due to illegally employing Facebook users' data for trait-related personalization of online advertisement and consequently for manipulating voters' decisions in the US election campaign in 2016. This example illustrates that personality-aware personalization could also promote so-called filter bubbles [107]. This term describes the isolation of a person from information that does not correspond to his or her initial point of view, which could result in intellectual restriction. It is conceivable that particularly people low in emotional stability are prone to filter bubbles, as they tend to prefer information confirming their previous knowledge [59]. Due to the close link between personality traits and behavior, filter bubbles could be used to address users individually to influence their opinions, attitudes, and behavior. In summary, using personality traits for personalization in HCI provides tailor-made services and applications for users' needs and preferences, which is both a great advantage and a potential risk. Thus, from a technical and ethical point of view, it will be a major challenge in the future to develop responsible systems that strike the right balance between comfortable and not overly restrictive personalization based on personality traits.
6 Methodological requirements

Utilizing personality traits for personalization also poses several methodological challenges. First of all, investigating the influence of personality traits usually requires large sample sizes to ensure that all personality trait expressions are represented in the sample. In the past, several studies used small sample sizes (e. g., [39]) and hence struggled with non-significant or unclear findings. The sample sizes have to scale even more when considering not only personality traits individually but also their interactions. Until now, researchers often reported the associations of single characteristics. However, the interaction between different personality traits has to be considered, too, especially when associated intended system adaptations might contradict each other. A possible approach could be to define interesting user profiles based on a combination of personality traits, similarly to Heinstrom et al.'s [59] classification of information-seeking types. Furthermore, the samples have to include a representative distribution of personality traits. This requirement might prove particularly difficult since Dahlbäck and Karsvall [32] found a personality bias in volunteer-based user studies, revealing that participants are more extraverted and open than in a representative sample. Another methodological challenge is the measurement of the success of personality-based personalization. Previous researchers reported difficulties in defining a positive evaluation of these systems in contrast to control systems (e. g., [124]). Possibilities for these measurements include accuracy of recommendations, performance, and subjective satisfaction, but they might be highly dependent on the use case. Moreover, these effects might only become apparent in the long term, requiring longitudinal surveys and iterative optimizations. It may also be important not only to determine an effect of personality traits but also to examine the underlying reasons for user preferences and effects. Researchers often reported mixed results regarding the magnitude and significance of different effects (e. g., [61]). Gaining deeper insights into user behavior could help to design more accurate experiments and clarify mixed and contradicting results. Finally, the necessary accuracy of personality-based personalization has to be determined. Since many approaches to automatically recognizing personality are based on Machine Learning algorithms (e. g., [49, 130]), the guaranteed accuracy might still not be sufficient for an actual implementation. Hence, it has to be investigated which accuracy is necessary to improve systems without resulting in user distrust. For example, only cases with a high probability of predicting personality traits could be classified. Another possibility is to present personality-aware adaptations only as initial options to the user, which can easily be changed. Moreover, one must be aware of the caveats that come with using Machine Learning methods. These algorithms are often called black box models because they can offer high predictive accuracy, but they do not give explanations for their predictions. In personality-based systems, however, the interpretability of algorithms is of great importance because of the high level of transparency and intelligibility these systems require. One may have to accept a drop in accuracy in favor of higher interpretability. If the use of a black box model is essential, explanations of predictions could still be achieved by using interpretable Machine Learning methods (e. g., [92]).
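One model-agnostic way to make an otherwise black-box trait predictor partially inspectable is to report which input features drive its predictions. The sketch below uses permutation importance from scikit-learn on synthetic data as one such option; it is not necessarily the method referenced in [92], and the data and model choice are assumptions made for the example.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    # Synthetic footprint features; only the first two actually carry signal.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 6))
    y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400)

    model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=1)

    # Features whose shuffling hurts accuracy most are the ones the model relies on.
    for i in np.argsort(result.importances_mean)[::-1]:
        print(f"feature_{i}: importance {result.importances_mean[i]:.3f}")

Reporting such rankings alongside a recommendation is one way to reconcile predictive models with the transparency requirements discussed in Section 5.2.3.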
7 Summary and conclusion

Due to the many advantages for users and businesses, personalization is an emerging trend of the twenty-first century. The success of personalization highly depends on the user model, a representation of information about the user. In this chapter, we argued that personality traits provide a promising additional source of information for personalization because they are assumed to be relatively stable and cross-situational. First, we introduced the well-established Big Five personality model from psychology. Next, we presented previous findings on the role of personality traits for HCI, which could inform opportunities and challenges of personality-aware personalization. An overview of personality-aware personalization can be found in Fig. 2.1.

Figure 2.1: Personality-aware personalization: overview of presented opportunities and challenges as well as previous findings on the role of personality traits in HCI and characteristics of personality traits.

It was suggested that personality traits influence a preference for (intelligent) interactive systems since users prefer to interact with congruent personalities. This preference could be an opportunity to develop completely empathic intelligent systems, as imagined by popular culture for a long time. Furthermore, personality can be used to provide personalized information by addressing different preferences for the amount, depth, and visualization of information. Persuasive technology could take advantage of this relationship, for example by giving adequate and engaging feedback in health applications. Another use case of personalized information is increasing comfort in autonomous vehicles. Moreover, personality traits could be used to overcome the cold start problem of personalization since they can inform systems before the first use. We described first approaches to developing personality-aware recommender systems. The reflection of personality in communication has been investigated both in face-to-face communication and in smartphone and social media use. On the one hand, this link is an opportunity to design personalized instant-messaging services regarding autocorrection or emojis. On the other hand, the relationship between personality traits and communication behavior can also be used for one of the biggest challenges of personality-aware personalization: the automatic technological assessment of personality traits. Furthermore, to create empathic intelligent agents, further research is necessary to synthesize consistent personalities. We also pointed out that other important challenges for utilizing personality traits for personalization are user views and concerns, particularly trust and acceptance of these sensitive data, as well as transparent systems. In conclusion, personality traits could be a promising source for personalization. However, the impact of personality traits on HCI still remains largely unexplored. In the previous sections, we presented research questions for each opportunity and challenge. We encourage researchers to address these research questions in their work to examine whether and how personality traits can improve personalization.
References

[1] G. D. Abowd, A. K. Dey, P. J. Brown, N. Davies, M. Smith, and P. Steggles. Towards a better understanding of context and context-awareness. In Handheld and Ubiquitous Computing, pages 304–307. Springer, Berlin, Heidelberg, 1999.
[2] G. Adomavicius, Z. Huang, and A. Tuzhilin. Personalization and recommender systems. In State-of-the-art decision making tools in the information-intensive age, pages 55–100, 2008.
[3] E. André and T. Rist. Presenting through performing: on the use of multiple lifelike characters in knowledge-based presentation systems. Knowledge-Based Systems, 14(1):3–13, 2000.
[4] S. Andrist, B. Mutlu, and A. Tapus. Look like me: Matching robot personality via gaze to increase motivation. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, pages 3603–3612. ACM, New York, NY, US, 2015.
[5] J. B. Asendorpf. Personality: Traits and situations. In P. J. Corr and G. Matthews, editors, The Cambridge Handbook of Personality Psychology, pages 43–53. Cambridge University Press, Cambridge, UK, 2009.
[6] J. B. Asendorpf and F. J. Neyer. Psychologie der Persönlichkeit. Springer-Verlag, Berlin, Heidelberg, 2012.
[7] M. C. Ashton and K. Lee. Empirical, theoretical, and practical advantages of the HEXACO model of personality structure. Personality and Social Psychology Review, 11(2):150–166, 2007.
[8] M. Asif and J. Krogstie. Taxonomy of personalization in mobile services. In Proceedings of the 10th IADIS International Conference e-Society, pages 343–350, 2012.
[9] D. Azucar, D. Marengo, and M. Settanni. Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis. Personality and Individual Differences, 124:150–159, 2018.
[10] F. Bakalov, M.-J. Meurs, B. König-Ries, B. Sateli, R. Witte, G. Butler, and A. Tsang. An approach to controlling user models and personalization effects in recommender systems. In Proceedings of the 2013 International Conference on Intelligent User Interfaces, pages 49–56. ACM, New York, NY, US, 2013.
[11] G. Bansal, F. M. Zahedi, and D. Gefen. Do context and personality matter? Trust and privacy concerns in disclosing private information online. Information & Management, 53(1):1–21, 2016.
[12] T. Barnett, A. W. Pearson, R. Pearson, and F. W. Kellermanns. Five-factor model personality traits as predictors of perceived and actual usage of technology. European Journal of Information Systems, 24(4):374–390, 2015.
[13] F. Bentley, K. Tollmar, P. Stephenson, L. Levy, B. Jones, S. Robertson, E. Price, R. Catrambone, and J. Wilson. Health mashups: Presenting statistical patterns between wellbeing data and context in natural language to promote behavior change. ACM Transactions on Computer-Human Interaction (ToCHI), 20(5):30, 2013.
[14] A. Bergström. Online privacy concerns: A broad approach to understanding the concerns of different groups for different uses. Computers in Human Behavior, 53:419–426, 2015.
[15] C. J. Beukeboom, M. Tanis, and I. E. Vermeulen. The language of extraversion: Extraverted people talk more abstractly, introverts are more concrete. Journal of Language and Social Psychology, 32(2):191–201, 2013.
[16] J. Blom. Personalization: a taxonomy. In CHI'00 Extended Abstracts on Human Factors in Computing Systems, pages 313–314. ACM, New York, NY, US, 2000.
[17] D. Buschek, A. De Luca, and F. Alt. There is more to typing than speed: Expressive mobile touch keyboards via dynamic font personalisation. In Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI '15, pages 125–130. ACM, New York, NY, US, 2015.
[18] S. Butt and J. G. Phillips. Personality and self reported mobile phone use. Computers in Human Behavior, 24(2):346–360, 2008.
[19] D. Byrne. Interpersonal attraction and attitude similarity. The Journal of Abnormal and Social Psychology, 62(3):713–715, 1961.
[20] C. Cadwalladr and E. Graham-Harrison. Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach, 2018. https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election, Accessed on: 11-05-2018.
[21] I. Cantador, I. Fernández-Tobías, and A. Bellogín. Relating personality types with user preferences in multiple entertainment domains. In S. Berkovsky, E. Herder, P. Lops, and O. C. Santos, editors, UMAP 2013: Extended Proceedings Late-Breaking Results, Project Papers and Workshop Proceedings of the 21st Conference on User Modeling, Adaptation, and Personalization. CEUR Workshop Proceedings, 2013.
[22] F. Celli, E. Bruni, and B. Lepri. Automatic personality and interaction style recognition from facebook profile pictures. In Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, pages 1101–1104. ACM, New York, NY, US, 2014.
[23] O. Chausson. Who watches what?: Assessing the impact of gender and personality on film preferences. Paper published online on the MyPersonality project website http://mypersonality.org/wiki/doku.php, 2010.
[24] R. K. Chellappa and R. G. Sin. Personalization versus privacy: An empirical examination of the online consumer's dilemma. Information Technology and Management, 6(2-3):181–202, 2005.
[25] L. Chen, W. Wu, and L. He. Personality and recommendation diversity. In M. Tkalčič, B. De Carolis, M. de Gemmis, A. Odić, and A. Košir, editors, Emotions and Personality in Personalized Services: Models, Evaluation and Applications, pages 201–225. Springer International Publishing, Cham, 2016.
[26] C. M. Ching, A. T. Church, M. S. Katigbak, J. A. S. Reyes, J. Tanaka-Matsumi, S. Takaoka, H. Zhang, J. Shen, R. M. Arias, B. C. Rincon, and F. A. Ortiz. The manifestation of traits in everyday behavior and affect: A five-culture study. Journal of Research in Personality, 48:1–16, 2014.
[27] G. Chittaranjan, J. Blom, and D. Gatica-Perez. Mining large-scale smartphone data for personality studies. Personal and Ubiquitous Computing, 17(3):433–450, 2013.
[28] W. J. Chopik and S. Kitayama. Personality change across the lifespan: Insights from a cross-cultural longitudinal study. Journal of Personality, 70(1), 2017.
[29] C. R. Cloninger, D. M. Svrakic, and T. R. Przybeck. A psychobiological model of temperament and character. Archives of General Psychiatry, 50(12):975–990, 1993.
[30] E. Constantinou, G. Panayiotou, N. Konstantinou, A. Loutsiou-Ladd, and A. Kapardis. Risky and aggressive driving in young adults: Personality matters. Accident Analysis & Prevention, 43(4):1323–1331, 2011.
[31] P. T. Costa Jr and R. R. McCrae. Four ways five factors are basic. Personality and Individual Differences, 13(6):653–665, 1992.
[32] N. Dahlbäck and A. Karsvall. Personality bias in volunteer based user studies. In Proceedings of HCI, 2000.
[33] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web, pages 271–280. ACM, New York, NY, US, 2007.
[34] F. De Fruyt, L. Van De Wiele, and C. Van Heeringen. Cloninger's Psychobiological Model of Temperament and Character and the Five-Factor Model of Personality. Personality and Individual Differences, 29(3):441–452, 2000.
[35] B. De Raad. The Big Five Personality Factors: The Psycholexical Approach to Personality. Hogrefe & Huber Publishers, 2000.
[36] K. M. DeNeve and H. Cooper. The happy personality: A meta-analysis of 137 personality traits and subjective well-being. Psychological Bulletin, 124(2):197, 1998.
[37] P. H. DuBois. A test-dominated society: China, 1115 B. C. – 1905 A.D. In Testing Problems in Perspective, pages 29–36, 1966.
[38] K. Eckoldt, M. Knobel, M. Hassenzahl, and J. Schumann. An experiential perspective on advanced driver assistance systems. It-Information Technology Methoden und Innovative Anwendungen der Informatik und Informationstechnik, 54(4):165–171, 2012.
[39] P. Ehrenbrink, S. Osman, and S. Möller. Google Now is for the extraverted, Cortana for the introverted: Investigating the influence of personality on IPA preference. In Proceedings of the 29th Australian Conference on Human-Computer Interaction (OzCHI), pages 1–9. ACM, New York, NY, US, 2017.
[40] M. Eiband, H. Schneider, M. Bilandzic, J. Fazekas-Con, M. Haug, and H. Hussmann. Bringing transparency design into practice. In 23rd International Conference on Intelligent User Interfaces, IUI '18, pages 211–223. ACM, New York, NY, US, 2018.
[41] M. Elahi, M. Braunhofer, F. Ricci, and M. Tkalčič. Personality-based active learning for collaborative filtering recommender systems. In M. Baldoni, C. Baroglio, G. Boella, and R. Micalizio, editors, AI*IA 2013: Advances in Artificial Intelligence, pages 360–371. Springer International Publishing, Cham, 2013.
[42] M. Emilsson, I. Berndtsson, J. Lötvall, E. Millqvist, J. Lundgren, Å. Johansson, and E. Brink. The influence of personality traits and beliefs about medicines on adherence to asthma treatment. Primary Care Respiratory Journal, 20(2):141–147, 2011.
[43] A. M. Evans and W. Revelle. Survey and behavioral measurements of interpersonal trust. Journal of Research in Personality, 42(6):1585–1593, 2008.
[44] I. Fernández-Tobías, M. Braunhofer, M. Elahi, F. Ricci, and I. Cantador. Alleviating the new user problem in collaborative filtering by exploiting personality information. User Modeling and User-Adapted Interaction, 26(2):221–255, 2016.
[45] B. Ferwerda, M. Tkalčič, and M. Schedl. Personality traits and music genres: What do people prefer to listen to? In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, UMAP '17, pages 285–288. ACM, New York, NY, US, 2017.
[46] B. Ferwerda, E. Yang, M. Schedl, and M. Tkalčič. Personality traits predict music taxonomy preferences. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI EA '15, pages 2241–2246. ACM, New York, NY, US, 2015.
[47] D. C. Funder. Persons, behaviors and situations: An agenda for personality psychology in the postwar era. Journal of Research in Personality, 43(2):120–126, 2009.
[48] A. Furnham and J. Walker. The influence of personality traits, previous experience of art, and demographic variables on artistic preference. Personality and Individual Differences, 31(6):997–1017, 2001.
[49] J. Golbeck, C. Robles, and K. Turner. Predicting personality with social media. In CHI'11 Extended Abstracts on Human Factors in Computing Systems, pages 253–262. ACM, New York, NY, US, 2011.
[50] L. R. Goldberg. Language and individual differences: The search for universals in personality lexicons. Review of Personality and Social Psychology, 2(1):141–165, 1981.
[51] T. M. Green and B. Fisher. Towards the personal equation of interaction: The impact of personality factors on visual analytics interface interaction. In Visual Analytics Science and Technology (VAST), pages 203–210. IEEE, 2010.
[52] P. Hagen, H. Manning, and R.
Souza. Smart personalization. Forrester Research, Cambridge, MA, US, 1999. T. Halevi, J. Lewis, and N. Memon. A pilot study of cyber security and privacy related behavior
58 | S. T. Völkel et al.
[54]
[55]
[56]
[57]
[58] [59]
[60]
[61] [62]
[63]
[64]
[65]
[66]
[67]
[68]
[69] [70]
and personality traits. In Proceedings of the 22nd International Conference on World Wide Web, pages 737–744. ACM, New York, NY, USA, 2013. P. A. Hancock, D. R. Billings, K. E. Schaefer, J. Y. Chen, E. J. De Visser, and R. Parasuraman. A meta-analysis of factors affecting trust in human-robot interaction. Human Factors, 53(5):517–527, 2011. K. S. Haring, Y. Matsumoto, and K. Watanabe. How do people perceive and trust a lifelike robot. In Proceedings of the World Congress on Engineering and Computer Science, volume 1, 2013. M. Hassib, D. Buschek, P. W. Wozniak, and F. Alt. HeartChat: Heart rate augmented mobile chat to support empathy and awareness. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI ’17, pages 2239–2251. ACM, New York, NY, US, 2017. R. Häuslschmid, M. von Buelow, B. Pfleging, and A. Butz. Supporting trust in autonomous driving. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, IUI ’17, pages 319–329. ACM, New York, NY, US, 2017. K. Heikkinen, J. Eerola, P. Jäppinen, and J. Porras. Personalized view of personal information. WSEAS Transactions on Information Science and Applications, 2(4), 2004. J. Heinström. Fast surfing, broad scanning and deep diving: The influence of personality and study approach on students’ information-seeking behavior. Journal of Documentation, 61(2):228–247, 2005. J. G. Helgeson and M. Supphellen. A conceptual and measurement comparison of self-congruity and brand personality; the impact of socially desirable responding. International Journal of Market Research, 46(2):205–236, 2004. P. Y. Herzberg. Beyond “accident-proneness”: Using five-factor model prototypes to predict driving behavior. Journal of Research in Personality, 43(6):1096–1100, 2009. S. Y. Ho and D. Bodoff. The effects of web personalization on user attitude and behavior: An integration of the elaboration likelihood model and consumer search theory. MIS Quarterly, 38(2):497–520, 2014. C. Hohenberger, M. Spörrle, and I. M. Welpe. How and why do men and women differ in their willingness to use automated cars? The influence of emotions across different age groups. Transportation Research Part A: Policy and Practice, 94:374–385, 2016. R. Hu. Design and user issues in personality-based recommender systems. In Proceedings of the 4th ACM Conference on Recommender Systems, RecSys ’10, pages 357–360. ACM, New York, NY, US, 2010. R. Hu and P. Pu. Acceptance issues of personality-based recommender systems. In Proceedings of the 3rd ACM Conference on Recommender Systems, RecSys ’09, pages 221–224. ACM, New York, NY, US, 2009. R. Hu and P. Pu. A study on user perception of personality-based recommender systems. In P. De Bra, A. Kobsa, and D. Chin, editors, International Conference on User Modeling, Adaptation, and Personalization, UMAP 2010, pages 291–302. Springer, Berlin, Heidelberg, 2010. D. Huang, M. Tory, B. A. Aseniero, L. Bartram, S. Bateman, S. Carpendale, A. Tang, and R. Woodbury. Personal visualization and personal visual analytics. IEEE Transactions on Visualization and Computer Graphics, 21(3):420–433, 2015. S. Hyken. Recommended just for you: The power of personalization, 2017. https://www. forbes.com/sites/shephyken/2017/05/13/recommended-just-for-you-the-power-ofpersonalization/#61403e3a6087, Accessed on: 18-04-2018. T. A. Judge, D. Heller, and M. K. Mount. Five-factor model of personality and job satisfaction: A meta-analysis. Journal of Applied Psychology, 87(3):530–541, 2002. A. Kannan, K. Kurach, S. Ravi, T. 
Kaufmann, A. Tomkins, B. Miklos, G. Corrado, L. Lukacs, M.
2 Utilizing personality traits for personalization in HCI
[71]
[72]
[73]
[74] [75]
[76]
[77] [78]
[79]
[80] [81]
[82]
[83]
[84]
[85] [86]
| 59
Ganea, P. Young, and V. Ramavajjala. Smart reply: Automated response suggestion for email. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 955–964. ACM, New York, NY, US, 2016. R. P. Karumur, T. T. Nguyen, and J. A. Konstan. Exploring the value of personality in predicting rating behaviors: A study of category preferences on movielens. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, pages 139–142. ACM, New York, NY, US, 2016. Z. Kasap and N. Magnenat-Thalmann. Intelligent virtual humans with autonomy and personality: State-of-the-art. In N. Magnenat-Thalmann, L. C. Jain, and N. Ichalkaranje, editors, New Advances in Virtual Humans: Artificial Intelligence Environment, pages 43–84. Springer, Berlin, Heidelberg, 2008. J. Kay and B. Kummerfeld. Creating personalized systems that people can scrutinize and control: Drivers, principles and experience. ACM Transactions on Interactive Intelligent Systems (TiiS), 2(4):24, 2012. H. H. Kelley. Attribution theory in social psychology. In Nebraska Symposium on Motivation, volume 15. University of Nebraska Press, 1967. M. L. Korzaan and K. T. Boswell. The influence of personality traits and information privacy concerns on behavioral intentions. Journal of Computer Information Systems, 48(4):15–24, 2008. M. Kosinski, D. Stillwell, and T. Graepel. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15):5802–5805, 2013. J. Kramer, S. Noronha, and J. Vergo. A user-centered design approach to personalization. Communications of the ACM, 43(8):44–48, 2000. T. Kulesza, S. Stumpf, M. Burnett, and I. Kwan. Tell me more?: The effects of mental model soundness on personalizing an intelligent agent. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’10, pages 1–10. ACM, New York, NY, US, 2012. M. Kyriakidis, R. Happee, and J. C. de Winter. Public opinion on automated driving: Results of an international questionnaire among 5000 respondents. Transportation Research Part F: Traffic Psychology and Behaviour, 32:127–140, 2015. J. LeBlanc and M. Ducharme. Influence of personality traits on plasma levels of cortisol and cholesterol. Physiology & Behavior, 84(5):677–680, 2005. K. M. Lee and C. Nass. Designing social presence of social actors in human computer interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’03, pages 289–296. ACM, New York, NY, US, 2003. K. M. Lee, W. Peng, S.-A. Jin, and C. Yan. Can robots manifest personality?: An empirical test of personality recognition, social responses, and social presence in human–robot interaction. Journal of Communication, 56(4):754–772, 2006. B. Lepri, J. Staiano, E. Shmueli, F. Pianesi, and A. Pentland. The role of personality in shaping social networks and mediating behavioral change. User Modeling and User-Adapted Interaction, 26(2):143–175, Jun 2016. B. Y. Lim and A. K. Dey. Toolkit to support intelligibility in context-aware applications. In Proceedings of the 12th ACM international conference on Ubiquitous computing, UbiComp ’10, pages 13–22. ACM, New York, NY, US, 2010. G. Matthews, I. J. Deary, and M. C. Whiteman. Personality Traits. Cambridge University Press, Cambridge, UK, 2003. S. C. Matz, M. Kosinski, G. Nave, and D. J. Stillwell. Psychological targeting as an effective approach to digital mass persuasion. 
Proceedings of the National Academy of Sciences of the United States of America, 114(48):12714–12719, 2017.
60 | S. T. Völkel et al.
[87] [88]
[89] [90]
[91] [92] [93]
[94]
[95]
[96] [97] [98] [99]
[100]
[101] [102]
[103] [104]
[105]
[106]
R. R. McCrae and P. T. Costa. Personality, coping, and coping effectiveness in an adult sample. Journal of Personality, 54(2):385–404, 1986. R. R. McCrae and P. T. Costa Jr. The five-factor theory of personality. In O. P. John, R. Robins, and L. A. Pervin, editors, Handbook of Personality: Theory and Research, pages 159–181. Guilford Press, New York, NY, US, 2008. R. R. McCrae and O. P. John. An introduction to the five-factor model and its applications. Journal of Personality, 60(2):175–215, 1992. M. McRorie, I. Sneddon, G. McKeown, E. Bevacqua, E. de Sevin, and C. Pelachaud. Evaluation of four designed virtual agent personalities. IEEE Transactions on Affective Computing, 3(3):311–322, 2012. B. Mobasher, R. Cooley, and J. Srivastava. Automatic personalization based on web usage mining. Communications of the ACM, 43(8):142–151, 2000. C. Molnar. Interpretable Machine Learning. 2018. https://christophm.github.io/ interpretable-ml-book/. C. Montag, K. Błaszkiewicz, B. Lachmann, I. Andone, R. Sariyska, B. Trendafilov, M. Reuter, and A. Markowetz. Correlating personality and actual phone usage: Evidence from psychoinformatics. Journal of Individual Differences, 35(3):158–165, 2014. M. Mount, R. Ilies, and E. Johnson. Relationship of personality traits and counterproductive work behaviors: The mediating effects of job satisfaction. Personnel Psychology, 59(3):591–622, 2006. R. C. Mulyanegara, Y. Tsarenko, and A. Anderson. The big five and brand personality: Investigating the impact of consumer personality on preferences towards particular brand personality. Journal of Brand Management, 16(4):234–247, 2009. E. Mussi. Flexible and context-aware processes in service oriented computing. Dipartimento di Elettronica e Informazione, 2007. A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In IEEE Symposium on Security and Privacy, 2008, SP 2008, pages 111–125. IEEE, 2008. C. Nass and S. Brave. Wired for speech: How voice activates and advances the human-computer relationship. MIT Press, Cambridge, MA, US, 2005. G. Nave, J. Minxha, D. M. Greenberg, M. Kosinski, D. Stillwell, and J. Rentfrow. Musical preferences predict personality: Evidence from active listening and Facebook likes. Psychological Science, 29(7):1145–1158, 2018. J. K. H. Nga and G. Shamuganathan. The influence of personality traits and demographic factors on social entrepreneurship start up intentions. Journal of Business Ethics, 95(2):259–282, 2010. L. Nocks. The Robot: The Life Story of a Technology. Greenwood Press, Westport, CT, US, 2006. M. A. S. Nunes and R. Hu. Personality-based recommender systems: an overview. In Proceedings of the 6th ACM Conference on Recommender Systems, RecSys ’12, pages 5–6. ACM, New York, NY, US, 2012. M. A. S. N. Nunes. Towards to psychological-based recommenders systems: a survey on recommender systems. Scientia Plena, 6(8), 2010. T. B. O’Brien and A. DeLongis. The interactional context of problem-, emotion-, and relationship-focused coping: The role of the Big Five personality factors. Journal of Personality, 64(4):775–813, 1996. A. Oulasvirta, M. Raento, and S. Tiitta. Contextcontacts: Re-designing smartphone’s contact book to support mobile awareness and collaboration. In Proceedings of the 7th International Conference on Human Computer Interaction with Mobile Devices & Services, MobileHCI ’05, pages 167–174. ACM, New York, NY, US, 2005. D. J. Ozer and V. Benet-Martínez. Personality and the Prediction of Consequential Outcomes.
2 Utilizing personality traits for personalization in HCI
| 61
Annual Review of Psychology, 57(1):401–421, 2006. [107] E. Pariser. The Filter Bubble: What the Internet Is Hiding from You. The Penguin Press, London, UK, 2011. [108] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, and M. E. P. Seligman. Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108(6):934–952, 2015. [109] E. Perik, B. De Ruyter, P. Markopoulos, and B. Eggen. The sensitivities of user profile information in music recommender systems. In Proceedings of Private, Security, Trust, pages 137–141, 2004. [110] B. Pfleging, M. Rang, and N. Broy. Investigating user needs for non-driving-related activities during automated driving. In Proceedings of the 15th International Conference on Mobile and Ubiquitous Multimedia, MUM ’16, pages 91–99. ACM, New York, NY, US, 2016. [111] A. D. Pickering, P. J. Corr, and J. A. Gray. Interactions and reinforcement sensitivity theory: A theoretical analysis of Rusting and Larsen (1997). Personality and Individual Differences, 26(2):357–365, 1999. [112] P. A. Rauschnabel, A. Brem, and B. S. Ivens. Who will buy smart glasses? Empirical results of two pre-market-entry studies on the role of personality in individual awareness and intended adoption of Google Glass wearables. Computers in Human Behavior, 49:635–647, 2015. [113] B. Reeves and C. I. Nass. The Media Equation: How People Treat Computers, Television, and New Media like Real People and Places. Cambridge University Press, Cambridge, UK, 1996. [114] P. J. Rentfrow and S. D. Gosling. The do re mi’s of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84(6):1236, 2003. [115] P. Resnick and H. R. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997. [116] C. Rödel, S. Stadler, A. Meschtscherjakov, and M. Tscheligi. Towards autonomous cars: the effect of autonomy levels on acceptance and user experience. In Proceedings of the 6th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutoUI ’14, pages 1–8. ACM, New York, NY, US, 2014. [117] M. Rosenberg and G. J. Dance. ‘You Are the Product’: Targeted by Cambridge Analytica on Facebook, 2018. https://www.nytimes.com/2018/04/08/us/facebook-users-data-harvestedcambridge-analytica.html, Accessed on: 11-05-2018. [118] A. Roshchina, J. Cardiff, and P. Rosso. Twin: personality-based intelligent recommender system. Journal of Intelligent & Fuzzy Systems, 28(5):2059–2071, 2015. [119] M. Sailer, J. U. Hense, S. K. Mayr, and H. Mandl. How gamification motivates: An experimental study of the effects of specific game design elements on psychological need satisfaction. Computers in Human Behavior, 69:371–380, 2017. [120] M. Salem, G. Lakatos, F. Amirabdollahian, and K. Dautenhahn. Would you trust a (faulty) robot?: Effects of error, task type and personality on human-robot cooperation and trust. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI ’15, pages 141–148. ACM, New York, NY, US, 2015. [121] K. E. Schaefer, J. Y. Chen, J. L. Szalma, and P. A. Hancock. A meta-analysis of factors influencing the development of trust in automation: Implications for understanding autonomy in future systems. Human Factors, 58(3):377–400, 2016. [122] K. R. Scherer. Personality markers in speech. In K. R. Scherer and H. Giles, editors, Social Markers in Speech. 
Cambridge University Press, Cambridge, UK, 1979. [123] U. Schimmack, P. Radhakrishnan, S. Oishi, V. Dzokoto, and S. Ahadi. Culture, personality, and subjective well-being: Integrating process models of life satisfaction. Journal of Personality and Social Psychology, 82(4):582, 2002.
62 | S. T. Völkel et al.
[124] H. Schneider, K. Schauer, C. Stachl, and A. Butz. Your data, your vis: Personalizing personal data visualizations. In R. Bernhaupt, G. Dalvi, A. K. Joshi, D. Balkrishan, J. O’Neill, and M. Winckler, editors, Human-Computer Interaction – INTERACT 2017. Lecture Notes in Computer Science, volume 10515, pages 374–392. Springer, Cham, 2017. [125] P. R. Shaver and K. A. Brennan. Attachment styles and the “Big Five” personality traits: Their connections with each other and with romantic relationship outcomes. Personality and Social Psychology Bulletin, 18(5):536–545, 1992. [126] J. Shropshire, M. Warkentin, and S. Sharma. Personality, attitudes, and intentions: Predicting initial adoption of information security behavior. Computers & Security, 49:177–191, 2015. [127] M. J. Sirgy. Using self-congruity and ideal congruity to predict purchase motivation. Journal of Business Research, 13(3):195–206, 1985. [128] H. Song and N. Kwon. The relationship between personality traits and information competency in Korean and American students. Social Behavior and Personality, 40(7):1153–1162, 2012. [129] J. Specht, B. Egloff, and S. C. Schmukle. Stability and change of personality across the life course: the impact of age and major life events on mean-level and rank-order stability of the Big Five. Journal of Personality and Social Psychology, 101(4):862–882, 2011. [130] C. Stachl, S. Hilbert, J.-Q. Au, D. Buschek, A. De Luca, B. Bischl, H. Hussmann, and M. Bühner. Personality traits predict smartphone usage. European Journal of Personality, 31(6):701–722, 2017. [131] K. Y. Tam and S. Y. Ho. Understanding the impact of web personalization on user information processing and decision outcomes. MIS Quarterly, pages 865–890, 2006. [132] A. Tapus and M. J. Mataric. Socially assistive robots: The link between personality, empathy, physiological signals, and task performance. In AAAI Spring Symposium: Emotion, Personality, and Social Behavior, pages 133–140, 2008. [133] The European Parliament and Council. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union, 2016. [134] A. Thieme, R. Comber, J. Miebach, J. Weeden, N. Kraemer, S. Lawson, and P. Olivier. “We’ve bin watching you”: Designing for reflection and social persuasion to promote sustainable lifestyles. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’12, pages 2337–2346. ACM, New York, NY, US, 2012. [135] N. Tintarev and J. Masthoff. A survey of explanations in recommender systems. In 2007 IEEE 23rd International Conference on Data Engineering Workshop, pages 801–810. IEEE, 2007. [136] M. Tkalčič and L. Chen. Personality and recommender systems. In F. Ricci, L. Rokach, and B. Shapira, editors, Recommender Systems Handbook, pages 715–739. Springer US, Boston, MA, US, 2015. [137] M. Tkalčič, B. Ferwerda, D. Hauger, and M. Schedl. Personality correlates for digital concert program notes. In F. Ricci, K. Bontcheva, O. Conlan, and S. Lawless, editors, User Modeling, Adaptation and Personalization, pages 364–369. Springer International Publishing, Cham, 2015. [138] A. Tuzhilin. Personalization: The state of the art and future directions. Business Computing, 3(3), 2009. [139] Y. V. Vaerenbergh and T. D. Thomas. 
Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25(2):195–217, 2013. [140] T. Van der Geest, J. van Dijk, W. Pieterson, W. Ebbers, B. Fennis, N. Loorbach, M. Steehouder, E. Taal, and P. de Vries. Alter Ego: State of the Art on User Profiling: An Overview of the Most
2 Utilizing personality traits for personalization in HCI
[141] [142] [143]
[144] [145] [146]
[147]
[148] [149] [150]
| 63
Relevant Organisational and Behavioural Aspects Regarding User Profiling. Telematica Instituut, Enschede, Netherlands, 2005. A. Vinciarelli and G. Mohammadi. A survey of personality computing. IEEE Transactions on Affective Computing, 5(3):273–291, 2014. M. Warkentin, M. McBride, L. Carter, and A. Johnston. The role of individual characteristics on insider abuse intentions. In AMCIS 2012 Proceedings, 28, 2012. A. Weiss, B. van Dijk, and V. Evers. Knowing me knowing you: Exploring effects of culture and context on perception of robot personality. In Proceedings of the 4th International Conference on Intercultural Collaboration, pages 133–136. ACM, New York, NY, US, 2012. R. Westfall. Psychological factors in predicting product choice. The Journal of Marketing, 26(2):34–40, 1962. T. Yarkoni. Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44(3):363–373, 2010. H. S. Yoon and L. M. B. Steege. Development of a quantitative model of the impact of customers’ personality and perceptions on internet banking use. Computers in Human Behavior, 29(3):1133–1141, 2013. W. Youyou, M. Kosinski, and D. Stillwell. Computer-based personality judgments are more accurate than those made by humans. Proceedings of the National Academy of Sciences, 112(4):1036–1040, 2015. C. Ziemkiewicz and R. Kosara. Preconceptions and individual differences in understanding visual metaphors. Computer Graphics Forum, 28(3):911–918, 2009. A. Zimmermann, M. Specht, and A. Lorenz. Personalization and context management. User Modeling and User-Adapted Interaction, 15(3):275–302, 2005. M. Zuckerman. Sensation Seeking: Beyond the Optimal Level of Arousal. Lawrence Erlbaum Associates, Inc, New York, NY, US, 1979.
Part II: User input and feedback
Mirjam Augstein and Thomas Neumayr
3 Automated personalization of input methods and processes

Abstract: Personalization, which aims at supporting users individually according to their needs and prerequisites, has been discussed in a number of domains, including learning, search, and information retrieval. In the field of human–computer interaction, personalization also bears high potential, as users exhibit varying and strongly individual preferences and abilities related to interaction. For instance, users with certain kinds of motor impairments might not be able to use certain input devices and methods, such as touchscreens and touch-based interaction; at the very least, a substantial amount of time-consuming individual configuration is typically required. Further, interaction preferences also vary among people without known impairments. Thus, personalized interaction, which takes these prerequisites into account, can offer individualized support and solutions to potential problems. Personalized interaction involves the automated selection and configuration of input devices as well as the adaptation of applications and user interfaces. This chapter discusses personalized interaction in general and presents a software framework that provides a template for a feasible technical infrastructure. Further, it describes a specific case study of personalized interaction that was implemented on the basis of the framework and discusses the evaluation process and results for this use case.

Keywords: personalized interaction, interaction modeling, user modeling, layered evaluation
1 Introduction

Personalization pursues the aim of individually supporting users, either by means of configuration (which involves the initiative of the users themselves) or by system-initiated adaptation to the user's needs (see a more detailed distinction in [31]). Throughout the past decades, personalized systems research has mainly focused on adapting content, navigation, and presentation to the user's needs in different domains (e. g., adaptive hypermedia [24, 40], e-learning [12, 14], e-commerce [38, 33], music [39, 10], and movie recommendation [17, 29]). In the area of human–computer interaction (HCI), personalization also plays an important role, as interaction abilities vary drastically across users due to individual preferences but also due to impairments affecting interaction processes. Especially because of the latter, many users cannot use common interaction devices (such as keyboard, mouse, or touchscreen) and user interfaces (UIs), at least not with their default configuration. Further, even if physically
and cognitively capable of using these devices and UIs, users' preferences related to interaction are highly individual. Thus, Personalized Interaction (PI) approaches bear high potential and might (i) reduce barriers and increase the accessibility of UIs, interaction techniques, and devices and (ii) improve User Experience (UX) and acceptance if implemented well.

So far, interaction personalization has been dealt with mainly within or around the field of assistive technology (also see Section 2), where systems are mostly individualized to be usable and to meet the needs of people with impairments. However, the focus has often been on configurable systems, while adaptivity (i. e., automated, system-initiated personalization) played a lesser role, although there is some pioneering work, for instance, on adaptive UIs (see, e. g., [44, 16]).

This chapter summarizes the state of the art around PI (see Section 2) and describes a framework¹ that offers basic functionality to implement concrete PI settings (see Section 3). Further, it describes a concrete PI case study² that has been implemented on the basis of the framework, as well as its evaluation³ (see Sections 4 and 5). Additionally, we briefly describe how the framework can be of use for other interaction analysis processes (see Section 6) and provide a discussion and summary of our work on PI (see Sections 7 and 8).

¹ Parts of the information about the framework have been previously published in an extended abstract at IUI 2017 [5]. Reprinted with permission.
² Parts of the information on the case study have been previously presented at the HAAPIE workshop at UMAP 2017 [6]: Mirjam Augstein, Thomas Neumayr, Werner Kurschl, Daniel Kern, Thomas Burger, and Josef Altmann, "A Personalized Interaction Approach. Motivation and Use Case", UMAP'17 Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pages 221–226, doi:10.1145/3099023.3099043. Reprinted with permission.
³ The information on the evaluation of the use case has been presented at the EvalUMAP workshop at UMAP 2017 [3]: Mirjam Augstein and Thomas Neumayr, "Layered Evaluation of a Personalized Interaction Approach", UMAP'17 Adjunct Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pages 173–178. Reprinted with permission.
2 Personalized interaction

In this section we discuss related work in terms of general foundations of PI (see Section 2.1), examples of existing approaches and frameworks for the personalization of input and output methods (see Section 2.2), and examples of existing approaches and frameworks enabling UI adaptation (see Section 2.3). Section 2.4 then summarizes the most important ideas behind these frameworks, which have partly also inspired the design of our own framework (see Section 3).

Methods and techniques of adaptive hypermedia traditionally fell into one of two categories, adaptive presentation and adaptive navigation support [11]. In the 2010s, a more explicit distinction between content and presentation issues arose, which provided an impetus to revise classification models with a focus on the distinction between three categories of techniques: content adaptation techniques, adaptive presentation techniques, and adaptive navigation techniques [24]. Still, this taxonomy is tailored to the domain of adaptive hypermedia. However, the three techniques partly generalize quite well to a more universal domain like interactive adaptive systems (which may comprise adaptive hypermedia systems but also systems not explicitly built upon hypermedia), particularly those related to presentation and navigation. For instance, an interactive adaptive game application might use adaptive presentation techniques inspired by adaptive hypermedia, like rearrangement of items (adaptive layout), zooming, or scaling.

Given the emerging trend towards blurring boundaries between websites and apps, desktop and web-based applications, and the increasing number of different devices that users utilize to access these websites or apps, we suggest integrating the category of adaptive interaction techniques into a more general taxonomy of adaptation techniques in order to accomplish the concept of PI. By adaptive interaction techniques we understand (i) automated selection of input devices, modalities, and activities, (ii) automated configuration of input devices, modalities, and activities, (iii) automated selection of output devices, modalities, and activities, (iv) automated configuration of output devices, modalities, and activities, and (v) UI adaptation. The output-related techniques mainly relate to the system's feedback as a response to user input. Our remarks in this chapter focus on input-related adaptive interaction techniques and only marginally involve output aspects.
2.1 Foundations of personalized interaction

Representative foundations behind PI, without specific focus on any of the adaptive interaction techniques mentioned before, can be seen in the elaboration on user modeling in HCI by Fischer [15] and the discussion of ability-based design by Wobbrock et al. [45]. Fischer [15] describes the challenges for designers of human–computer systems and states that they "face the formidable task of writing software for millions of users (at design time) while making it work as if it were designed for each individual user (only known at use time)." He further explains that at design time, developers create systems and have to make decisions for users for situational contexts and tasks that they can only anticipate. This is also emphasized by Bauer and Dey [8], who state that context has to be anticipated before run time and argue that user modeling helps to deliver at run time the optimal version of the anticipated features. This again is in line with Fischer's explanation that an important point about user modeling is that use time and design time get blurred: if a system constantly adapts to its users, use time becomes a different kind of design time [19]. This consideration applies
to what we understand by PI in general and, specifically, to the individual adaptive interaction techniques mentioned above.

The discussion of Wobbrock et al. [45] is rooted in the motivation to make technology accessible for people with disabilities but applies to personalized systems in general. They identify seven principles in three categories: stance (ability, accountability), interface (adaptation, transparency), and system (performance, context, commodity). The stance category involves (and notes as a requirement) that designers should respond to user performance by changing systems, not users. From our point of view, this should be the premise behind all PI approaches. The interface category subsumes adaptation and transparency, suggesting that interfaces might be self-adaptive or user-adaptable to provide the best possible match to users' abilities and that interfaces should make users aware of adaptations. In the classification described above, this category refers to UI adaptation (as further discussed in Section 2.3). The system category holds that systems may monitor, measure, and model user performance and encourages the use of low-cost, readily available hardware and software. In our classification this is mainly relevant for the personalization of input and output methods (see Section 2.2), as it involves the selection and configuration of interaction methods and devices based on user models.
2.2 Personalization of input and output methods

In recent years, the number of available input and output methods for a broad spectrum of users has expanded massively (voice UIs or accelerometers in hand-held devices, to name just a few). Thus, the demand for systems that can handle a multitude of input and output methods and techniques has increased. This in turn has implications for the underlying user modeling approaches if these systems are to provide personalization. Related to this, Kaklanis et al. [22] introduce a complex taxonomy of user model variables, many of which relate to user input and system output, e. g., motor parameters (such as gait parameters [step length and step width] or upper body parameters), strength parameters (such as the maximum gripping force of one hand), hearing parameters, visual parameters, or equilibrium (parameters concerning the sense of balance). Although originally published as a recommendation for standardized user models without a particular focus on PI, the taxonomy of [22] can be seen as a solid basis for the personalization of input and output methods.

A notable early example of a system that takes the personalization of input and output methods into consideration is the AVANTI framework introduced by Stephanidis et al. [44]. The framework facilitates the construction of systems supporting adaptability and adaptivity and comprises the following main components: (i) a multimedia database interface, (ii) a user modeling server, (iii) a content model, (iv) a hyperstructure adaptor, and (v) the UI. The AVANTI framework puts a clear focus on hypermedia content (using static elements and alternative hypermedia objects as a basis for
the construction of individual views of this content). Furthermore, the AVANTI framework suggests incorporating support for multiple interaction modalities based on the user profile.

A more recent example of a system that is capable of judging individual input performance in connection with different input methods was described by Biswas and Langdon [9]. They discuss a multimodal adaptation algorithm for mobility-impaired users based on an evaluation of their hand strength. They point out that there is little reported work on the quantitative analysis of the effects of different impairments on pointing performance, and that there is also a lack of work on the objective evaluation of human factors relating them to interaction parameters. They measured different variables for hand strength (e. g., grip strength, radial or ulnar deviation, or static tremor) and performed a user study with mobility-impaired and able-bodied participants. Within the study, Biswas and Langdon analyzed pointing tasks and predicted, for each participant, the average velocity (with different input devices) and the so-called index of performance, which is based on movement time and index of difficulty.

Another discussion of user modeling related to interaction that suggests the introduction of a specific input model is provided by Kurschl et al. [27]. Their approach involves fine-grained modeling of users' abilities related to input, focusing on touch-based and touchless input devices. For example, they measure swipe ability, pan ability, hold time, and the preferred input device for touch-based input, as well as precision and hand coordination for touchless input. They suggest the automated selection of input devices (e. g., a touchscreen instead of hardware switches) as well as the automated configuration of device and input method (e. g., the system's hold and lock time after a touch was detected) adapted to the users' needs. A simplified computation of such pointing metrics is sketched below.
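To make these metrics more concrete, the following sketch shows how Fitts-style pointing metrics could be derived from logged pointing tasks. This is an illustrative simplification: the Shannon formulation of the index of difficulty and the data structure shown here are our assumptions and not necessarily the exact formulations used by Biswas and Langdon [9] or Kurschl et al. [27].

```python
import math
from dataclasses import dataclass

@dataclass
class PointingTask:
    distance_mm: float      # distance from start position to target centre
    target_width_mm: float  # target size along the movement axis
    movement_time_s: float  # measured time to acquire the target

def index_of_difficulty(task: PointingTask) -> float:
    # Shannon formulation of Fitts' index of difficulty (in bits);
    # an assumption here, other variants (e.g. log2(2D/W)) exist.
    return math.log2(task.distance_mm / task.target_width_mm + 1)

def index_of_performance(tasks: list[PointingTask]) -> float:
    # Throughput in bits/s, averaged over all logged pointing tasks.
    return sum(index_of_difficulty(t) / t.movement_time_s for t in tasks) / len(tasks)

# Hypothetical usage with two logged tasks for one user and one input device:
log = [PointingTask(120.0, 12.0, 1.4), PointingTask(80.0, 20.0, 0.9)]
print(round(index_of_performance(log), 2), "bits/s")
```

Such per-device throughput values could then serve as one of several indicators when comparing input settings for an individual user.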
2.3 User interface adaptation

A more common way of accounting for a personalized UX than personalizing the interaction process itself lies in the adaptation of the UI. The most successful commercial systems make extensive use of tailoring their UIs to their users' needs. Again, the aforementioned AVANTI framework is capable of automatically modifying the presentation and behavioral attributes of interactive elements. Its UI component has been designed according to the Unified UI Design (UUID) methodology (a method to achieve the goals of the "User Interfaces for All" approach [41, 43, 42]). AVANTI distinguishes between adaptability (based on user characteristics known prior to the interaction) and adaptivity (applicable at run-time, considering changing user characteristics and situations). It follows a rule-based approach when making adaptation decisions, for both adaptability (using rules like IF novice in computing AND motor impaired THEN Font = Large AND Size = Large) and adaptivity (e. g., IF high error rate OR inability to navigate THEN ScanRate = Slow).
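As a minimal illustration of such rule-based adaptation decisions, the sketch below evaluates IF–THEN rules against a user profile; the profile attributes and UI parameter names are hypothetical and only loosely modeled on the AVANTI examples above, not taken from AVANTI itself.

```python
# Each rule pairs a condition over the user profile with UI parameter settings.
rules = [
    (lambda p: p["computer_experience"] == "novice" and p["motor_impaired"],
     {"font_size": "large", "button_size": "large"}),
    (lambda p: p["error_rate"] > 0.3 or not p["can_navigate"],
     {"scan_rate": "slow"}),
]

def adapt_ui(profile: dict) -> dict:
    ui_settings = {}
    for condition, settings in rules:
        if condition(profile):          # fire every matching rule
            ui_settings.update(settings)
    return ui_settings

# Hypothetical profile combining static (adaptability) and run-time (adaptivity) data:
profile = {"computer_experience": "novice", "motor_impaired": True,
           "error_rate": 0.1, "can_navigate": True}
print(adapt_ui(profile))   # -> {'font_size': 'large', 'button_size': 'large'}
```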
A similar approach building on UUID is discussed by Ringbauer et al. [37]. Their ASK-IT project aims to support mobility-impaired people with information about accessible routes and guidance so that they can travel independently. The UI design approach in ASK-IT is based on automated adaptation to a user's requirements (e. g., blind persons are provided audio content instead of standard graphical UI elements). In order to allow for the provision of individually adapted UIs for the highly diverse target group, Ringbauer et al. use a modular approach and develop UI elements for all modalities that are then combined to form an individual UI fitting the respective user's specific interaction requirements.

Another discussion of adaptive UIs rooted in the motivation to make technology more accessible is provided by Peissner et al. [36], who present the generic MyUI infrastructure. MyUI relies on an open multimodal design patterns repository [30], which is used as a basis for a modular approach to individualized UIs (also see [35]). MyUI uses a three-stage process for generating and adapting UIs, comprising (i) UI parameterization, (ii) UI preparation, and (iii) UI generation and adaptation. The first stage outputs a UI profile that defines general characteristics of the UI (such as the font size of body text or the display mode, which could be text only or graphics only). The system uses the following information sources for this step: a device profile denoting the currently available and used devices, a user profile providing general information about user and environment, and a customization profile. The second stage denotes the selection of the most suitable UI components and elements for the current situation, before the third stage renders the selected UI components to an individual UI.

To support people with impairments, Gajos et al. suggest UIs that automatically design themselves for the users, considering their functional capabilities. Their system SUPPLE++ (also see [16]) generates UIs based on the user's motor and vision capabilities, utilizing a user model that is constructed after a series of (clicking, pointing, dragging, and list selection) tasks. Gajos et al. treat the generation of individual UIs as an optimization problem targeting a user's expected movement time, involving the time to navigate the interface and the time to manipulate an element. They vary variables like the minimum size of a UI element and the distance between UI elements to automatically determine optimal configurations.
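The following sketch illustrates this optimization view in a deliberately reduced form: it chooses one size per UI element so that a user-specific estimate of expected manipulation time is minimized under a screen-space budget. The cost model (a per-user constant plus a term that grows for smaller targets) and all parameter values are our assumptions and not the actual SUPPLE++ formulation.

```python
from itertools import product

# Candidate sizes (in px) per UI element and how often each element is used.
elements = {"ok_button": [24, 36, 48], "menu": [24, 36, 48], "slider": [24, 36, 48]}
usage_frequency = {"ok_button": 0.5, "menu": 0.3, "slider": 0.2}
space_budget_px = 100  # total space available for the three elements

# User-specific motor parameters (assumed to come from a user model).
a, b = 0.2, 12.0  # expected time ~ a + b / size: smaller targets take longer

def expected_time(sizes: dict) -> float:
    return sum(usage_frequency[e] * (a + b / s) for e, s in sizes.items())

best = None
for combo in product(*elements.values()):
    sizes = dict(zip(elements, combo))
    if sum(sizes.values()) <= space_budget_px:
        if best is None or expected_time(sizes) < expected_time(best):
            best = sizes

print(best, round(expected_time(best), 3))
```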
A different path of research on adaptive UIs can be found under the notion of model-driven approaches, which derive their concepts from model-driven engineering (see an exhaustive overview in [13] or [1]). A concrete example is described by Khaddam et al. [23]. Their Adapt-First approach comprises adaptation mechanisms, such as adapting the UI to the context of use, as well as concretization (transformation of the model into a more concrete one). It includes several steps: classification of the adaptation rule (e. g., rules related to color are classified as concrete UI rules), analysis of the adaptation rule (understanding what input is needed to apply a rule, e. g., information about light conditions), establishment of an intermediate meta-model (selecting adaptation rules for the concrete use case), and design of the concretization (generating UIs from a task model using mapping rules).

Recently, Park et al. [34] discussed the real-time adaptation of multi-UIs for co-located collaboration (e. g., a presentation with presenters and attendees bringing multiple devices) and treated the distribution of elements among devices as an assignment problem. Their approach, called AdaM, uses combinatorial optimization, replacing manual assignment (who gets what) with automatic distribution, so that the most important elements are always available. Further, users' personal preferences are taken into account, which, however, involve neither general long-term user characteristics nor implicit characteristics inferred from user behavior. Instead, the system considers "importance" values for UI elements (such as presentation, clock, presenter notes) explicitly stated by the users.

Furthermore, there also exist many approaches that build intelligent UIs using Machine Learning techniques to automatically adapt UIs based on knowledge about the user and the problem domain. For instance, Lieberman et al. [28] present an early example of an intelligent UI to support collaborative web browsing. Their system Let's Browse creates interest profiles on both the individual user and the group level, and then matches these profiles with a set of web pages analyzed via natural language processing techniques. The best matches are then recommended. User and group profiles as well as recommendations are updated regularly as changes occur.
2.4 Summary and motivation

There are several approaches to UI adaptation and to the personalization of input and output processes. These include rule-based, model-based, and optimization-based approaches, as well as intelligent ones using more advanced Machine Learning techniques. The frameworks presented in this section represent different aspects of these approaches, which also inspired the design of our own framework discussed in Section 3. The different approaches involve different opportunities and challenges that we tried to consider in our own work. In general, it was our aim to establish a generic infrastructure for user modeling related to interaction. Generic user modeling systems themselves are not a novel idea and have a long tradition (see the overview provided by Kobsa [25]). In agreement with Wobbrock et al. [45], who consider relying on low-cost, readily available hardware beneficial, our framework should not rely on any specialized or costly hardware. We thus decided that the framework should be able to deal with arbitrary device constellations. Further, several existing frameworks and approaches are based on an architecture that explicitly distinguishes between several components such as a user modeling server, a database and interface, and the UI itself (see, e. g., AVANTI [44]); Biswas and Langdon [9], for instance, maintain an input and display model. We consider such a
distributed infrastructure an advantage, especially if individual components are to be interchangeable (such as input devices). Thus, our framework relies on an independent user modeling component and an analysis application, which can run on different devices, as well as a UI component (see Section 3).

Most existing frameworks have a very specific focus, e. g., on web-based systems in the case of AVANTI, on navigation guidance in the case of ASK-IT, on the selection and configuration of user interface components in the case of MyUI or SUPPLE++, or on collaboration support in the case of AdaM. Few frameworks, however, are explicitly targeted towards the automated personalization of input processes in combination with UI adaptation, which is the focus of our framework. Nevertheless, we took inspiration from several existing approaches regarding the acquisition of information related to users' interaction preferences and capabilities. For instance, we use so-called interaction tests (see Section 3) in order to derive the necessary information, which is similar to the modeling activities described in [16], [27], or [9]. As described in the previous sections, many frameworks rely on rule-based approaches when combining user models based on interaction performance with adaptive UIs. Our own framework takes up this established concept and implements UI adaptation at run-time through the analysis of rules (see Section 3.5). Similar to the development toolkit provided by MyUI, which supports developers in the creation of adaptive applications, our framework also aims at enabling the relatively simple implementation of adaptive behavior for arbitrary device constellations and concrete applications.
3 A framework for personalized interaction

We describe a software framework designed to provide a basis for PI with a focus on input methods and devices, rooted in the considerations discussed in Section 2.4. It can be seen as a proof-of-concept for PI but also as a generic tool that can be utilized to (i) implement personalized selection and configuration of input devices, (ii) offer adaptive system behavior based on these settings, but also (iii) analyze and model users' interaction abilities without the intention of later automating interaction-related processes. A short introduction to the framework has been presented earlier in [5]. The following sections describe the framework's purpose in more depth, provide an overview of its architecture and implementation, and introduce the static and dynamic models it maintains.
3.1 Purpose

The software framework can be understood as a connective link between input devices and methods, applications that provide certain functionalities, and the user.
Among the conceivable functionalities in this regard are self-contained pieces of software such as a volume mixer module of an operating system or a document viewer. The framework's focus lies on modeling users' interaction abilities in order to enable automated personalization of system behavior related to interaction. Framework instantiations are able to support the individual user by (i) recommending the individually best fitting input device and method and (ii) tailoring the behavior of the applications to the user's individual needs. The former refers to the personalization of input and output methods (see Section 2.2); the latter also comprises UI adaptation (see Section 2.3).

The framework was designed to be generic and to allow for the integration of arbitrary (i) devices and ways of interacting with them and (ii) applications using these devices and interaction methods. It strongly relies on the recording of users' behavior, particularly their past interaction with the system, and the formulation of assumptions about the user based on this interaction history (listed, among others, as two frequently found services of user modeling systems [26, 25]). Kobsa [25] further names the following three requirements for user modeling systems as important: Generality (including Domain Independence), Expressiveness, and Strong Inferential Capabilities. Generality means that systems should be usable in as many application and content domains as possible. Expressiveness means that systems should be able to express as many types of assumptions about the user as possible at the same time, and Strong Inferential Capabilities means that the system should be capable of reasoning. The framework described in this section aims at complying with these expectations as well as possible, with a focus on Generality.
3.2 Overview

In order to provide an overview of the framework's basic functionality, we first describe the process a new user undergoes as soon as he or she is registered with the system. When a new user is introduced, automated initial assessments with all available input settings (i. e., combinations of devices and ways of interacting with them, see Section 3.4) are run in order to formalize ("model") the user's interaction abilities related to these settings. The system then uses the recorded data to compute metrics, so-called "features", that later go into the user model (see Section 3.4) and indicate a user's ability to use a certain setting. The system distinguishes between concrete features, related to a certain setting, and common features, which describe the user's ability to work with a setting on a higher level.

After the initial assessments and related modeling activities, the system compares all settings for the individual user and recommends the best fitting ones in a ranked list, from which the user can choose which of the recommended settings to use. Although this step could be automated, it was implemented to involve the user's activity in order to allow for a higher level of user control. All applications registered at a concrete
framework instance can then be operated using the currently selected setting. Further, their behavior might be implemented to adapt to the user's interaction abilities. If the system notices a change in these abilities, the new information is fed back into the user model, the system's behavior adapts to it, and if the change leads to a different optimal setting, a new recommendation is issued.
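As an illustration of this selection step, the sketch below ranks input settings by a weighted combination of feature values from the user model and returns a recommendation list. The feature names, weights, and the linear scoring scheme are hypothetical and only meant to convey the idea, not the framework's actual implementation.

```python
# Feature values per input setting, as they might be stored in the user model
# after the initial assessments (all values normalized to 0..1; hypothetical data).
user_model = {
    "touchscreen":        {"accuracy": 0.55, "speed": 0.70, "error_rate": 0.30},
    "touchless_gestures": {"accuracy": 0.80, "speed": 0.45, "error_rate": 0.10},
    "hardware_switches":  {"accuracy": 0.90, "speed": 0.20, "error_rate": 0.05},
}

# Weights expressing how strongly each feature indicates suitability of a setting.
weights = {"accuracy": 0.5, "speed": 0.3, "error_rate": -0.2}  # errors reduce the score

def score(features: dict) -> float:
    return sum(weights[name] * value for name, value in features.items())

def recommend(model: dict) -> list:
    # Ranked list of settings; the user picks one, keeping control over the decision.
    return sorted(model, key=lambda setting: score(model[setting]), reverse=True)

print(recommend(user_model))  # -> ['touchless_gestures', 'hardware_switches', 'touchscreen']
```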
3.3 Architecture

The framework's functionality is organized in two main layers: an Interaction Recording and Processing Component (IRPC) is responsible for running the initial assessments and for passing the recorded data to the Analysis Component (AC) via a REST web service. The AC then computes metrics and stores them in the user model. Fig. 3.1 shows the basic architecture.

An IRPC has to be implemented for every setting and can thus take different forms depending on the input devices and methods it refers to. The restrictions defined for these IRPCs mainly concern the data format the AC accepts (which is JSON-based and presumes certain meta-information to be present). We describe an exemplary IRPC based on a case study we conducted to prove our concept (see Section 4). The AC, on the other hand, is generic and runs on a user modeling server (the IRPC can run on any external device that is able to transfer data via the web). There is only one instance of the AC, which receives and analyzes the interaction data of all active IRPCs.
Figure 3.1: The basic framework architecture [5].
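The chapter does not reproduce the exact JSON schema exchanged between IRPC and AC, so the payload below is purely illustrative: it shows the kind of meta-information mentioned above (user, device, input setting, timestamps) together with raw interaction events, with all field names being our assumptions.

```python
import json

# Hypothetical interaction data file as an IRPC might send it to the AC.
payload = {
    "meta": {
        "user_id": "u-0042",
        "device": {"name": "Surface Pro", "type": "tablet"},
        "input_setting": "touchscreen",
        "recorded_from": "2017-03-02T10:15:00Z",
        "recorded_to": "2017-03-02T10:21:30Z",
    },
    "events": [
        {"t": 0.00, "type": "touch_down", "x": 512, "y": 384},
        {"t": 0.42, "type": "touch_up",   "x": 514, "y": 380},
    ],
}
print(json.dumps(payload, indent=2))
```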
The AC was implemented in .NET technology, running on a Windows Server 2008. The two main elements of the AC (the analysis application and the web application) can be
further described as follows. The analysis application is the core of our system's "backend" and is responsible for the computation of metrics. It monitors the directory to which interaction data files are sent for new files and triggers the computation of metrics if necessary. It then computes these metrics and writes them to the user model. Before the metrics are computed, a meta-data file is analyzed: information about the user (ID), the device (e. g., device name and type), and input settings (such as the input method used), as well as timestamps, is read and compared to data already existing in the model. In case a user or device has not yet been stored there, the missing entry is created. Metric computation then comprises concrete and common features. As common features may depend on the values of concrete ones, concrete features are computed first. Next, dependencies between features are looked up in the user model, and if a common feature is found that depends on a concrete one that has changed, it is recalculated.

The web application (implemented in ASP.NET) allows web-based administration of parts of the domain and user model (see Section 3.4). More concretely, data not automatically computed from interaction (i. e., users, devices, and features) have to be entered into the database, which can be done relatively comfortably via the web UI. Information that is computed automatically is presented in a clearly structured way so that an administrator can get a good overview without having to browse the database. The web UI can show a list of a certain user's feature values ordered chronologically. Further, it offers means for visualizing a user's progress within a certain time period; Fig. 3.2(b) shows the comparison of a user's progress with two settings. Especially the latter can also be used in order to (i) better understand the system's recommendations and (ii) monitor and analyze a user's progress.

Figure 3.2: Screenshots of the web application showing (a) a list of a user's feature values and (b) a visualization of a user's progress with different settings.
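The following sketch illustrates the described two-step computation: concrete features are derived from the incoming interaction data first, and common features that declare a dependency on changed concrete features are then recalculated. Feature names and formulas are hypothetical; they only mirror the dependency mechanism described above.

```python
# Concrete features are computed per input setting directly from interaction data.
def compute_concrete_features(events: list) -> dict:
    taps = [e for e in events if e["type"] == "tap"]
    return {
        "tap_count": float(len(taps)),
        "mean_tap_duration": (sum(e["duration"] for e in taps) / len(taps)) if taps else 0.0,
    }

# Common features declare which concrete features they depend on.
common_features = {
    "touch_proficiency": {
        "depends_on": {"tap_count", "mean_tap_duration"},
        "compute": lambda f: f["tap_count"] / (1.0 + f["mean_tap_duration"]),
    },
}

def update_user_model(model: dict, events: list) -> dict:
    concrete = compute_concrete_features(events)
    changed = {k for k, v in concrete.items() if model.get(k) != v}
    model.update(concrete)
    # Recalculate only the common features affected by changed concrete features.
    for name, spec in common_features.items():
        if spec["depends_on"] & changed:
            model[name] = spec["compute"](model)
    return model

model = {}
events = [{"type": "tap", "duration": 0.2}, {"type": "tap", "duration": 0.4}]
print(update_user_model(model, events))
```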
3.4 Domain and user model Our work focuses on a user’s interaction abilities and does not consider characteristics such as detailed demographic or contextual information. Based on the considerations of Knutov et al. [24], the framework relies on a domain model that describes how the conceptual representation of the application domain is structured. As shown in a relational overview in Fig. 3.3, it comprises interaction devices, ways of interaction and indicators for successful interaction, and relations between them. The user model then stores concrete information about a user’s interaction with the system and results for the indicators. It is a feature-based, flat overlay structure over the relevant parts of the domain model. The models are organized in a relational database (running on an MS SQL Server 2008) that contains users, devices, and device input types; see Fig. 3.3. The latter are important in case there is more than one way of interacting with a device. For example, there could be different ways of using physical pressure as in-
Figure 3.2: Screenshots of the web application showing (a) a list of a user’s feature values and (b) a visualization of a user’s progress with different settings.
An interaction setting is an interaction scenario that is linked with (a combination of) one or more ways of interaction. As these interaction settings are used in combination with devices, the model additionally stores interaction device settings that link interaction scenarios with the devices that are used (e. g., touch input with a certain phone in combination with a smartwatch).
Figure 3.3: Simplified presentation of the model (based on [5]).
For the concrete instances implemented in our case study below (see Section 4), we use the shorter term input settings instead of the more general interaction device settings. Further, the model contains assessments (called interaction tests), which are more or less self-contained applications that provide tasks to assess users' interaction abilities. In addition to the assessments, a user can play game-like applications (see Section 4.3) or interact with real-world use cases (i. e., everyday interaction scenarios) at any point in time, which are also interpreted as assessments internally. Further, there are concrete and common features along with so-called feature executions. A feature execution is a combination of a feature and a user's result for it. As we store feature executions rather than values for the features themselves, the system provides access to the full history of results which enables longer-term progress monitoring.
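Because results are stored as feature executions rather than as single overwritten values, progress over time can be reconstructed directly from the model. The compact sketch below (entity and property names are our own assumptions) shows how such a history-preserving store might be queried, e.g., for the latest value used by the adaptation model or for the chronological progress view in the web UI.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // One measured result for one feature of one user (names are illustrative).
    public record FeatureExecution(
        string UserId, string InputSetting, string FeatureName,
        double Value, DateTime MeasuredAtUtc);

    public class FeatureHistory
    {
        private readonly List<FeatureExecution> _executions = new();

        public void Add(FeatureExecution execution) => _executions.Add(execution);

        // Latest value, e.g., as input for adaptation decisions.
        public double? Latest(string userId, string setting, string feature) =>
            Query(userId, setting, feature).LastOrDefault()?.Value;

        // Full chronological history, e.g., for the progress visualization in the web UI.
        public IEnumerable<FeatureExecution> Query(string userId, string setting, string feature) =>
            _executions
                .Where(e => e.UserId == userId && e.InputSetting == setting && e.FeatureName == feature)
                .OrderBy(e => e.MeasuredAtUtc);
    }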
3.5 Adaptation model As suggested by Knutov et al. [24], an adaptation model is used to determine how the relations in the domain model affect feature updates. Knutov et al. further mention user navigation as to be indicated by the adaptation model which is, however, mainly relevant in the area of adaptive hypermedia and was not considered in our framework. The PI case study described in this chapter involves adaptive system behavior, such as adapting task difficulty, which is controlled via the adaptation model. The adaptation model is rule-based (e. g., IF FEATURE >= VALUE1 THEN DIFFICULTY = VALUE2) and takes effect at various places in the framework instantiation (e. g., directly in the applications offered, see Section 4.3).
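A minimal sketch of how such IF-THEN rules could be represented and evaluated against the current feature values. The rule structure shown here (a threshold comparison on a single feature setting a single adaptation parameter) mirrors the example above but is otherwise our own simplification of the adaptation model.

    using System.Collections.Generic;

    // IF <feature> >= <threshold> THEN <parameter> = <value>   (simplified rule form)
    public record AdaptationRule(string Feature, double Threshold, string Parameter, double Value);

    public static class AdaptationModel
    {
        // Applies all matching rules to the current feature values; when several
        // rules set the same parameter, the last matching rule wins (an assumption
        // made only for this sketch).
        public static Dictionary<string, double> Decide(
            IReadOnlyDictionary<string, double> featureValues,
            IEnumerable<AdaptationRule> rules)
        {
            var decisions = new Dictionary<string, double>();
            foreach (var rule in rules)
                if (featureValues.TryGetValue(rule.Feature, out var v) && v >= rule.Threshold)
                    decisions[rule.Parameter] = rule.Value;
            return decisions;   // e.g. { "Difficulty" -> 2 }
        }
    }

    // Hypothetical usage: raise the task difficulty once a user can reliably hold
    // at least three intensity levels.
    // var rules = new[] {
    //     new AdaptationRule("DiffLevelsReliable", 1, "Difficulty", 1),
    //     new AdaptationRule("DiffLevelsReliable", 3, "Difficulty", 2)
    // };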
4 Case study This section describes a concrete instantiation of our framework for a specific use case. Our framework was conceptualized to be well generalizable and applicable for a vast number of use cases. The validation of our concepts was planned to take place in an area, where users typically have a high demand for adaptation, which is usually the case when (temporary) impairments are present. Thus, one of the first target groups for a case study of the concrete framework’s instantiation was defined to consist of people with cognitive and/or motor impairments or disabilities. We wanted both (i) to harness the multitude of different interaction needs and requirements and (ii) to inspire the assistive technology community with our creative and sometimes unconventional solutions for input methods. The methodology in this case study was strongly motivated by qualitative approaches, meaning that the quantitative assessments of our framework instance would be accompanied by in-depth observations of and interviews with the individuals taking part in the evaluations. Please find an earlier introduction to this case study in [6].
4.1 Input methods and devices In order to come up with suitable device prototypes to register to our framework instance, we conducted several on-site observations at two facilities where people with disabilities work. These observations helped us to develop a basic understanding of the types of input interactions that could still be performed relatively well by users with a large pool of different impairments. On the one hand, the frequent use of large hardware buttons and the strong limitations of horizontal mobility of the hands led us to experiment with the application of physical pressure to a relatively small (horizontal) area. On the other hand, shaking the arm or hand is sometimes one of the last controlled movements possible for target group users, according to their caretakers. Applying pressure In this approach, pressure is applied to a detecting device itself (see Fig. 3.4(a)) or to an external spring-loaded device (see Fig. 3.4(b)). In the former case the pressure is detected via a vibration-absorption-based mechanism (based on [21], also see [2]). An advantage of this approach is that the user can apply pressure to either the whole device or parts of it, while holding it in the hand or pushing it against a surface. In the latter case, a magnetic-field-manipulation-based mechanism (built upon [20], also see [2]) is used with a common hole puncher augmented with a magnet. In our experience, pressure differences could be reliably measured in a finegrained way; however, the magnetic field can be easily influenced by external factors. Please note that touchscreens in more recent smartphone generations typically employ a mixture of resistive and capacitive technology to sense the applied pressure (e. g., Apple’s Force Touch) that could now be used and registered to our framework.
Figure 3.4: The three input methods based on (a) and (b) application of physical pressure and (c) shaking the hand or arm [6].
Shaking the arm or hand Here, the user wears either a smartwatch or an armband with integrated acceleration sensor (see Fig. 3.4(c)). A user can decide to shake either the hand or the arm and place the wearable device accordingly. The application measures shaking intensity by analyzing position and acceleration data.
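How shaking intensity is derived from the sensor data is not specified in detail in the chapter; one plausible reading, sketched below under that assumption, is to take the mean deviation of the acceleration magnitude from gravity over a short window and normalize it against a calibrated per-user maximum.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ShakeIntensity
    {
        // Estimates shaking intensity in [0, 1] from raw accelerometer samples (m/s^2).
        // The mapping (mean deviation from 1 g, normalized by a calibrated maximum)
        // is an assumption for illustration, not the published algorithm.
        public static double Estimate(
            IReadOnlyList<(double X, double Y, double Z)> window,
            double calibratedMaxDeviation)
        {
            const double g = 9.81;
            double meanDeviation = window
                .Select(a => Math.Abs(Math.Sqrt(a.X * a.X + a.Y * a.Y + a.Z * a.Z) - g))
                .DefaultIfEmpty(0)
                .Average();
            return Math.Clamp(meanDeviation / calibratedMaxDeviation, 0, 1);
        }
    }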
4.2 User model The generic model presented in Section 3.4 that provides the possibility to store information about the domain and the user was filled with a number of performance indicators that we called user model features. For the two input settings described above (applying pressure and shaking the arm or hand) a number of different features are calculated first with the (initial) assessments, and get updated during later assessments, the playing of games or the operation of predefined, real-world use cases. We distinguished between concrete features, which indicate how well a user can perform several detail skills with a certain input setting, and common features, which indicate on a higher level how well a user can perform with a certain input setting. For an overview of the implemented concrete and common features, please refer to Tables 3.1 and 3.2, respectively.
Table 3.1: Concrete features for two input settings (physical pressure and shaking the hand or arm), based on [3].
Feature | Description | Unit
MaxIntensity | maximum intensity level of pressure or shaking | %
HoldTime | hold time of pressure or shaking level | ms
DiffLevelsReliable | different pressure or shaking levels a user can hold reliably | Level
DiffLevelsUnreliable | different levels a user can hold theoretically (but not reliably) | Level
DiffLevelsAvgAttempts | average number of attempts a user needs to pass a level | #
DiffLevelsAvgTime | average time a user needs to pass a level | ms
DiffLevelsEffortHigh | physical effort (strain) for a user when applying high intensity | Average effort
DiffLevelsEffortMedium | physical effort (strain) for a user when applying medium intensity | Average effort
DiffLevelsEffortLow | physical effort (strain) for a user when applying low intensity | Average effort
HeartRateDifference | difference of heart rate between beginning and end of an assessment (heart rate itself is measured in beats per minute) | Absolute difference

Table 3.2: Common features for application of physical pressure and hand or arm shaking [3].
Feature | Description | Unit
Precision | precision percentage (averaged across all interactions with a certain method) | %
Time | time needed for an interaction | ms
EffortInteractionType | average effort for the interaction type | Average effort

4.3 Applications We selected two applications based on an observation of what everyday interaction activities are problematic for people with disabilities. The first is related to scrolling in larger documents, the second to the navigation through the Windows start menu. Both applications can be operated using any of the input settings registered at our framework instance. As, especially in the vibration-based approach, the user can apply pressure to the whole smartphone, we decided not to run the applications on the phone (which is used to capture user input) but on another computer to ensure good visibility of the UI. When starting an application, the user is recommended input settings in a sorted list with a percentage rating, based on the preceding assessments. After selecting a setting, a game is played for training purposes (interaction there is already influenced by the user model). Next, the user can switch to a realistic use case. The user's performance is analyzed for a high score list and to be fed back to the user model for future personalization. For all games and use cases, new common features (needed to make different devices comparable) were introduced.
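The chapter does not spell out how the percentage rating per input setting is obtained; a simple reading, sketched here purely as an assumption, is to aggregate the user's normalized common-feature values per setting (e.g., Precision and an inverted, normalized Time) and present the settings sorted by this score.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class SettingRecommender
    {
        // Returns all registered input settings ordered by a fit percentage.
        // The scoring (equal-weight average of normalized common features in [0, 1])
        // is an illustrative assumption, not the framework's published formula.
        public static IEnumerable<(string Setting, double FitPercent)> Rank(
            IReadOnlyDictionary<string, IReadOnlyDictionary<string, double>> normalizedCommonFeatures)
        {
            return normalizedCommonFeatures
                .Select(s => (Setting: s.Key,
                              FitPercent: Math.Round(100 * s.Value.Values.DefaultIfEmpty(0).Average(), 1)))
                .OrderByDescending(r => r.FitPercent);
        }
    }

    // Hypothetical input:
    // normalizedCommonFeatures["pressure-vibration"] = { "Precision" -> 0.82, "TimeInverted" -> 0.64 }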
4.3.1 Use case “scrolling” The first application related to the scrolling activity contains a game (“Traffic”) where a user takes the perspective of a car driver and has to adjust the velocity to chang-
ing speed limits over a period of time (see Fig. 3.5). When using pressure-based interaction, the current pressure corresponds to the car’s acceleration. A user’s score increases when their own speed is near the target speed. The amount of pressure the system regards as the current user’s maximum is retrieved from the user model. The game was envisioned to match continuous operations like scrolling through content or adjusting volume of audio. Thus, afterwards, the same interaction activities can be used for scrolling through content in a PDF file or adjusting the computer’s audio volume settings. Results of game and real use cases are fed back into the user model in real time. Thus, in case a user seems to have problems with input activities that should be possible according to the user model, the user model is updated accordingly. The same applies for the other case. For example, if a user can handle all input activities exceptionally well, the user model might be updated as well to reflect the change in ability. This allows for instant reaction to problems that occur only after a longer period of interaction (and thus could not be identified during the initial assessments).
Figure 3.5: Screenshot of the “Traffic” game [6].
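A sketch of the core game-loop mapping described above: the currently applied pressure, normalized against the user's maximum from the user model, determines the car's speed, and the score grows while the speed stays close to the current limit. The class and parameter names as well as the tolerance band are assumptions made for illustration.

    using System;

    public class TrafficRound
    {
        private readonly double _userMaxPressure;   // retrieved from the user model
        public double Speed { get; private set; }
        public int Score { get; private set; }

        public TrafficRound(double userMaxPressure) => _userMaxPressure = userMaxPressure;

        // Called once per frame with the currently measured pressure and speed limit.
        public void Update(double currentPressure, double targetSpeed, double maxSpeed,
                           double tolerance = 5.0)
        {
            double normalized = Math.Clamp(currentPressure / _userMaxPressure, 0, 1);
            Speed = normalized * maxSpeed;                   // pressure drives the car
            if (Math.Abs(Speed - targetSpeed) <= tolerance)  // close to the current limit
                Score++;
        }
    }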
4.3.2 Use case “menu navigation” The second application related to menu navigation contains a game (“BarHero”) where users control an airplane while non-cloudy areas approach from the right. The airplane has to be moved up and down to fly through the approaching areas. The currently applied intensity directly corresponds to an action in the system, e. g., when
a user applies no pressure at all, the airplane is moved to (or stays at) the lowest level immediately. When the maximum pressure is applied, the airplane is instantly moved to (or stays at) the highest level. The number of different levels the player can hold is retrieved from the user model and the target areas that are shown to the user adapt to it. Subsequently, the user can work with the recommended input setting in an everyday scenario (navigating through a menu). In this example, the different intensity levels correspond to different actions like proceed to next item, open currently selected item, and open sub-menu. In case the user cannot reliably apply three different intensity levels, the system's behavior changes, e. g., in case the user can only apply one intensity level, the navigation uses scanning where the system selects one item after the other and the user interacts only to select the currently active one. Identical to what was described for the first use case, the user model might be influenced by interaction with the game or real applications.
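The mapping from applied intensity to discrete levels (airplane altitude or menu actions) could look roughly as follows; the number of levels comes from the user model, and a scanning mode is used when fewer than three reliable levels are available. Evenly spaced thresholds and the three-level cutoff for scanning are simplifying assumptions for this sketch.

    using System;

    public static class IntensityMapping
    {
        // Maps a normalized intensity in [0, 1] to one of `levelCount` discrete levels
        // (0 = lowest). `levelCount` would be the user's DiffLevelsReliable value.
        public static int ToLevel(double normalizedIntensity, int levelCount)
        {
            if (levelCount < 1) throw new ArgumentOutOfRangeException(nameof(levelCount));
            int level = (int)Math.Floor(normalizedIntensity * levelCount);
            return Math.Min(level, levelCount - 1);
        }

        // Fallback: with fewer than three reliable levels the menu switches to scanning,
        // i.e., items are highlighted one after the other and any input selects the
        // currently highlighted item (assumed simplification of the behavior above).
        public static bool UseScanning(int reliableLevels) => reliableLevels < 3;
    }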
5 Case study evaluation This section summarizes the results of a study that has been conducted in order to evaluate the PI use case described in Section 4 along different criteria, following the layered evaluation framework proposed by Paramythis et al. [32]. The evaluation has been described earlier in [3], however with a broader focus, including also the input methods themselves, while we concentrate on the aspects of personalization here. The layered evaluation framework divides the evaluation of interactive adaptive systems in five layers: collection of input data (CID), interpretation of the collected data (ID), modeling the current state of the “world” (MW), deciding upon adaptation (DA), and applying adaptation (AA). Paramythis et al. argue that although not all layers can be isolated and evaluated separately in all systems, each of the layers needs to be evaluated explicitly. However, they also state that the relevance of each of the layers might differ for different systems. Table 3.3 provides an overview of the five layers, related example evaluation criteria, and a (non-exhaustive) list of suggested methods.
5.1 Evaluation goals and criteria The evaluation involved all five layers separately, focusing on those aspects of the system on the respective layer we consider most relevant or most critical. This section describes our evaluation goals and criteria for each of the layers. In addition to the questions listed in the following sections, part of the evaluation dealt with the input devices and methods themselves. We do not report this part in detail here as it is not directly related to PI (please refer to [3] for the results).
Table 3.3: The layers proposed in the layered evaluation framework of Paramythis et al., table adapted after [32]. Reprinted with permission from: Springer, User Modeling and User-Adapted Interaction 20 (5), Layered Evaluation of Interactive Adaptive Systems: Framework and Formative Methods, Alexandros Paramythis, Stephan Weibelzahl, and Judith Masthoff, ©2010.
CID – Goal: check quality of raw input data. Evaluation criteria: accuracy, latency, sampling rate. Evaluation methods: data mining, play with layer [32], simulated users, cross-validation.
ID – Goal: check that data are interpreted correctly. Evaluation criteria: validity of interpretations, predictability, scrutability. Evaluation methods: data mining, heuristic evaluation, play with layer, simulated users, cross-validation.
MW – Goal: check that constructed models represent real world. Evaluation criteria: primary criteria: validity of interpretations or inferences, scrutability, predictability; secondary criteria: conciseness, comprehensiveness, precision, sensitivity. Evaluation methods: focus group, user-as-wizard [32], data mining, heuristic evaluation, play with layer, simulated users, cross-validation.
DA – Goal: determine whether the adaptation decisions made are the optimal ones. Evaluation criteria: necessity of adaptation, appropriateness of adaptation, subjective acceptance of adaptation, predictability, scrutability, breadth of experience. Evaluation methods: focus group, user-as-wizard, heuristic evaluation, cognitive walk-through, simulated users, play with layer, user test.
AA – Goal: determine whether the implementation of the adaptation decisions made is optimal. Evaluation criteria: usability criteria, timeliness, unobtrusiveness, controllability, acceptance by user, predictability, breadth of experience. Evaluation methods: focus group, user-as-wizard, heuristic evaluation, cognitive walk-through, user test, play with layer.
All layers – Evaluation criteria: privacy, transparency, controllability. Evaluation methods: focus group, cognitive walk-through, heuristic evaluation, user test.
5.1.1 Layer 1: collection of input data In order to evaluate the quality of raw input data, we derived the following concrete questions related to our system, based on the criteria sampling rate, latency, and accuracy suggested by [32]: – L1Q1: Is the sampling rate sufficient? – L1Q2: Are raw data complete (no empty lines or cells, no missing data for certain timestamps)?
– L1Q3: Are the data points sufficiently precise (e. g., regarding rounding factor)? – L1Q4: Is the amount of data consistent for equal interaction time spans?
5.1.2 Layer 2: interpretation of data In our evaluation, the second layer had a specific focus on interpretations and hypotheses that formed the basis of many of the concrete and common features we introduced (see Tables 3.1 and 3.2). Hereby we focused on the features that were initially grounded on unproven hypotheses. The following questions (based on the criteria validity of interpretations and predictability suggested by [32]) were subject to layer 2 evaluation: – L2Q1: Is the amount of physical pressure the system regards as general maximum (see footnote 4) appropriate? – L2Q2: Is the amount of physical pressure the system regards as general minimum (see footnote 5) appropriate? – L2Q3: Is the shaking intensity the system regards as general maximum appropriate? – L2Q4: Is the shaking intensity the system regards as general minimum appropriate? – L2Q5: Is the assumption that effort (physical strain) can be derived from the moving distance of a hand or arm correct? – L2Q6: Is the distinction between the number of pressure levels (or shaking) that can be held reliably and unreliably valid? – L2Q7: Is the assumption that interaction precision can be derived based on the interaction time, the average attempts, and the number of pressure (or shaking) levels valid? 5.1.3 Layer 3: modeling the current state of the world Our layer 3 evaluation emphasized the user model's concrete features related to the maximum applicable pressure and the number of different pressure levels a user can apply because these features are the ones most influential for later adaptive behavior in the concrete use case. Thus, the IRPC was in the focus of the evaluation on layer 3. Footnote 4: This is particularly relevant for the vibration- and shaking-based approaches of measuring physical pressure. Here, there is no physical limit for the pressure as in the magnetic field manipulation one where the device (here, a hole puncher) provides a physical limit. However, the system still considers a certain amount of pressure as the maximum possible. Footnote 5: This is particularly relevant for the vibration absorption approach; when the smartphone vibrates at maximum intensity, part of the vibration gets absorbed without the user applying pressure to it (e. g., due to friction).
The common feature Time has been included in the layer 3 evaluation without being evaluated on layer 2 (as it does not involve any kind of interpretations) but is of marginal importance here as it fully relies on the timestamps logged in the raw data (the quality of which has been checked on layer 1). The common feature Precision has been involved as well. The following questions (based on the criteria validity of interpretations and predictability [32]) were in the focus of layer 3 evaluation: – L3Q1: Are the values computed by the system and stored in the user model for the features MaxPressure and MaxShakeArm realistic? – L3Q2: Are the values computed by the system and stored in the user model for the features DiffLevelsPressureReliable and DiffLevelsShakeArmReliable realistic? – L3Q3: Are the time spans (stored in the user model as Time) computed on the basis of the timestamps logged for all interactions correct? – L3Q4: Are the values computed for the common feature Precision realistic? 5.1.4 Layer 4: deciding upon adaptation The evaluation on layer 4 focused on the phase after interaction recording and processing, starting with the recommendations regarding the best-fitting input setting for a user (in an ordered list). The main aim was to evaluate whether the system’s adaptation decision was correct, without regard to the actual implementation and application of this adaptation. The following questions (based on the criteria appropriateness, subjective acceptance, and predictability, see [32]) were in the main focus of evaluation on layer 4: – L4Q1: Is the input setting recommended as best fitting one actually the best for the current user? – L4Q2: Is the sequence of recommended settings correct for the current user (i. e., are the order and the percentages computed for the settings appropriate)? – L4Q3: In case two or more settings were ranked as equivalent by the system, are they really equivalent? 5.1.5 Layer 5: applying adaptation The evaluation on layer 5 focused on the actual application of adaptation, from the presentation of the recommendations list to the user-adapted behavior and perceived usability of the games and real application scenarios. Several questions were formed for the evaluation of this layer, which can be grouped in questions related to the input setting recommendations (L5Q1 and L5Q2) and those related to the presentation and behavior of the games (L5Q3 to L5Q6). The questions are based on the criteria controllability/scrutability, breadth of experience, usability, and predictability [32]: – L5Q1: Do users value that not only recommended settings are displayed but all (in a ranked list)?
– L5Q2: Is it perceived as good/helpful that not only the list is displayed but also the percentages (indicating the fit) for all settings? – L5Q3: At the Traffic game – was the responsiveness of the car as expected? – L5Q4: At the Traffic game – how exact could the speed be controlled? – L5Q5: At the BarHero game – was the responsiveness of the plane as expected? – L5Q6: At the BarHero game – how exact could the flight altitude be controlled?
5.2 Procedure and methodology The layered evaluation was split up into two parts (taking place in different phases of development). The first involved a raw data inspection (partly by a human expert, partly automated) to answer L1Q2–L1Q4. L1Q1 and all layer 2 questions required an additional pre-test that took place before the actual user study. With three users, we discussed the behavior of the system in response to user input, using the method play with layer suggested by Paramythis et al. [32], where users could freely explore the system and its in- and output. Within the second part, a user test (with observation and interview) was conducted. It involved layers 3–5 and all layer-independent questions, took place in a controlled lab setting, and was structured as follows. After an introduction by a supervisor, the participants did the initial assessments with the three input settings. After each assessment, they had to answer input setting-related questions and an observer took notes and rated the interaction. The supervisor then showed the system’s recommendations to the participant, who was asked whether (i) the input setting ranked first was the best according to their own impression, (ii) the ranking order was perceived as correct, (iii) in case input settings were ranked equal, these were perceived as equal, (iv) they liked the idea that not only the best setting but all were listed, and (v) they liked that the system did not only order the settings but also showed their rating. Next, the participants played the BarHero or the Traffic game (both games were played equally often). The original study design intended all participants to play both games; however, the pre-tests following this design took more than 60 minutes, which was deemed too long by the pre-test users. The participants could decide what setting to use (most used the one the system recommended). After the game, the participants were asked again to (i) rate the setting and (ii) rate it in comparison to how well it worked at the assessment. Additionally, they were asked game-related questions (e. g., whether the reactivity of the airplane in the BarHero game was as expected, or how well the participant could change the speed of the car in the Traffic game) and an observer rated how well the game was controlled. Next, participants tried the realistic interaction task related to the game they played. They either had to start a specific application listed in the Windows start menu or scroll to a certain page in a PDF document. Again the observer rated how
well the tasks were performed. Additionally, the participants were asked questions related to the setting (also, compared to how well the interaction worked before). Lastly, participants were asked to report basic demographic information (e. g., age and gender) and had an additional opportunity to provide comments. During the whole process, observers took notes to record relevant interaction behavior, errors, and statements of the participants. The tests took about 35 minutes per participant.
5.3 Participants The aforementioned pre-test conducted to answer L1Q1 and all questions related to layer 2 involved three test users (two male, one female, aged 30 to 33) who volunteered for this task. One of them also acted as pre-tester for the study designed for the evaluation of layers 3–5 (the design of which slightly changed after the pre-test). The user tests for layers 3–5 were conducted with 22 participants in total; 18 (10 male and 8 female, aged 13 to 31, M = 20.33, SD = 4.18) were recruited at a university open house. The remaining four participants were recruited at a facility where people with impairments work. Unfortunately it was not possible to recruit more than these four participants due to different reasons (e. g., complexity of the tasks or duration of the tests, which required a relatively high concentration capacity). For the group of people with impairments the number of participants is too small to draw general conclusions. We, however, still aimed at gaining experience with the PI approach on the basis of the individual. The four participants were carefully selected by the head of the facility, who also assisted them in rating the interaction methods and provided further thoughts regarding the potential of the interaction methods for people with impairments. All were male (aged 24 to 32, M = 27.25, SD = 3.59) and had severe motor and moderately severe cognitive impairments.
5.4 Results This section summarizes the results of the layered evaluation, corresponding to the questions raised for the individual layers in Section 5.1.
5.4.1 Layer 1 The evaluation of layer 1 did not reveal any significant problems except for the quality of the raw data of the smartwatch’s heart rate sensor. For all other data, the sampling rate was sufficient, users did not notice any delays, and no missing data points were found. Related to L1Q1, users were asked to freely explore the behavior of the
assessments and games. They reported that they could not notice any delays regarding the behavior of real-time information display (e. g., related to the visualization of the current amount of physical pressure) and reactivity of interactive objects (e. g., the airplane in the BarHero game). On a more technical level, we inspected the data recorded by the framework and computed the time span between two recorded data instances. It was between 1 and 46 milliseconds, which we deemed sufficiently small. Subsequently, we could answer question L1Q1 positively for all sensors and data related to the physical pressure or shaking-based approaches. Regarding the heart rate data we tried to record in order to compute the concrete feature HeartRateDifference, we noted that the sampling rate was not sufficient, thus we excluded this feature from the user study (and evaluation of the other layers). Related to L1Q2, we used an automated routine to search for missing data points and did not identify any, except for those related to technical problems that were noted during the interaction. The third question on layer 1, L1Q3, could be answered positively as the data points as provided by the sensors are extremely precise (using 7 decimal places). Regarding L1Q4, the amount of data was equal for equal interaction time spans (within the deviation reported for L1Q1).
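Such a raw-data check can be automated in a straightforward way; the sketch below computes the gaps between consecutive logged timestamps and flags gaps above a threshold, which covers both the sampling-rate question (L1Q1) and the completeness question (L1Q2). The 50 ms threshold is our own illustrative choice (the 1 to 46 ms intervals reported above would pass it).

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class RawDataChecks
    {
        // Returns the minimum and maximum interval between consecutive samples and
        // whether any gap exceeds the given threshold (i.e., data points may be missing).
        public static (long MinMs, long MaxMs, bool GapsDetected) CheckSamplingIntervals(
            IReadOnlyList<long> timestampsMs, long maxAllowedGapMs = 50)
        {
            var gaps = timestampsMs.Zip(timestampsMs.Skip(1), (a, b) => b - a).ToList();
            if (gaps.Count == 0) return (0, 0, false);
            return (gaps.Min(), gaps.Max(), gaps.Any(g => g > maxAllowedGapMs));
        }
    }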
5.4.2 Layer 2 Questions L2Q1–L2Q7 were answered based on play with layer tests that were conducted with the three pre-test users before the actual user study. These tests were less formal but led to insights and consequently adjustments regarding the interpretations and assumptions that form the basis for feature computation (layer 3). The focus of the layer 2 questions was on the initial assessments as the amount of pressure (or shaking) considered as general maximum and minimum are equal in the game applications. The test users were asked to freely interact with the app and see whether the visualization of the current pressure corresponded to their impression when it indicated maximum pressure. The tests revealed differences between the different settings regarding L2Q1 and L2Q2. For the vibration-based approach, the pressure the system considered as the maximum was not always perceived as such thus the maximum (and also minimum) values could later be calibrated. Regarding the minimum, the tests showed that the surface the smartphone is positioned on influences vibration absorption. Higher friction leads to better absorption which again influences the general minimum pressure. For the magnetic field approach, the tests showed that the magnetic field around the smartphone can be influenced by various interfering factors thus it is not possible to work with the same settings for minimum and maximum in every assessment. The app must be calibrated accordingly before the interaction starts. For the shakingbased approach (L2Q3–L2Q4), the general minimum was perceived as appropriate,
and the maximum was altered after the tests as all users reported that the maximum shaking intensity was too exhausting and hard to reach. We further decided to offer calibration here also as the feeling of exhaustion is individually different. L2Q5 could not be answered positively after the play with layer tests, which showed that especially for all pressure-based approaches, the moving distance was not indicative for effort/physical strain. The feature was thus not included in the later user study. Regarding L2Q6, the test users reported that the system’s computed values for the number of pressure levels they could hold was as expected in all cases. For L2Q7, the test users judged the system’s output for the common feature Precision and found it to be realistic given the past interaction behavior. Summing up, only L2Q6 and L2Q7 led to affirmative results. All other questions could not or only partly be answered positively. This led to a revision or exclusion of several concepts and interpretations. In many cases, we added a calibration possibility that allows for individually adapting the minimum and maximum values to the environmental conditions.
5.4.3 Layer 3 Layers 3, 4, and 5 were evaluated within the user study involving 22 users in two groups (see Section 5.3). We answered L3Q1 and L3Q2 by comparing the values the system computed for the features MaxPressure and MaxShakeArm to the observer’s rating (the observer had to rate the participant’s ability to apply maximum pressure (or shaking intensity) on a five-point ordinal scale for each input setting). In all cases, we noted a close match between the system’s and the observer’s results; in cases the results were different, the tendency was equal. L3Q3 was answered without involving the participants, based on a comparison of the computed time span to the timestamps present in the log data. All computed time spans were correct. For L3Q4, we compared the system’s computed values for Precision to the observer’s impression of how well the participants could work with the input settings. The system’s computed values and the observer’s rating tendency were very similar for all users, although in some cases the observer’s impression was a bit less critical, which we attribute to the human characteristic of rating abilities on a higher level (i. e., considering not only objective measures but also subjective impressions). In total, all computed values were reasonable, thus the results for layer 3 are exclusively positive.
5.4.4 Layer 4 We asked participants whether they agreed with the system’s decision related to the top-ranked input settings (L4Q1). In group A, 16 of 17 participants agreed, and one
92 | M. Augstein and T. Neumayr disagreed (one participant had to be excluded due to technical problems). For group B, the participants’ consultant judged the system’s decision and agreed in all cases. Regarding the order of recommended input settings (L4Q2), the participants (or consultant) judged the system’s decision on a five-point scale between “very appropriate” and “not appropriate at all.” Six of 17 participants used the best possible rating, six the second-best, three rated undecided, and two rated below. Most participants who used an undecided or negative rating said that the shaking-based input setting was rated a bit too positive. For all group B users, the system’s decision was rated as “very appropriate.” Regarding L4Q3, in five cases, the system rated two input settings as equivalent (mostly, the vibration- and shaking-based one). In four cases, the results for the vibration-based input setting were not valid due to technical issues. In the remaining case with a valid result, the user disagreed with the system. Summing up, the results confirmed that the framework was able to recommend the best-fitting setting. However, we also found that users prefer a clear differentiation of the results for the settings (they find it harder to judge the system’s decision in case two settings are rated as almost equal). Additionally, the results indicate that for the magnetic field and vibration-based approaches, users understood the system’s rating better than for the shaking-based approach. Unfortunately we did not gain sufficient insights for L4Q3.
5.4.5 Layer 5 We asked participants of group A (L5Q1) how they liked that the system displayed all input settings in a ranked list (instead of displaying just one or automatically choosing one). They rated the value of this information on a five-point ordinal scale; 16 of 18 used the highest, the remaining two the second-highest rating. We did not include L5Q1 for group B as the care personnel argued the question would be hard or impossible to answer for the participants, which was also true for the layer-independent questions that cannot be answered by a representative. Regarding L5Q2, we asked participants whether they perceived the additional information displayed for each setting in the recommendation list (i. e., the percentage that indicates how well a setting fits for a user) as helpful; 13 used the highest possible rating, three the second-highest, and three rated undecided (they argued that the information was nice but not necessary). Regarding the questions related to the Traffic game, seven of nine users who played it said that the responsiveness of the car (L5Q3) was as expected (four used the best, three the second-best rating). Six participants played this game with the magnetic field, two with the shaking, and one with the vibration-based input setting. To answer L5Q4, we asked how exact the speed could be controlled (seven used the best or second-best rating, two rated undecided). Regarding the questions related to BarHero, all nine participants who played it used the best rating for L5Q5 (responsiveness of the airplane). All worked with the magnetic field input setting and used
the best or second-best rating for the question how exact the flight altitude could be controlled (L5Q6). Summing up, the application of adaptivity has been perceived as successful by most participants. Regarding the input setting recommendations, participants’ opinions were predominantly very positive. The same is true for the two games.
5.5 Evaluation summary The evaluation described in this chapter has mainly been conducted for two reasons. First, it was intended to show whether our concrete instantiation of the framework introduced in Section 3 succeeded in (i) modeling users’ abilities related to interaction and (ii) implementing adaptive system behavior based on the models. Second, it should also reveal exemplary insights about users’ experience with PI (e. g., whether it is possible to retain sufficient user control when offering adaptive behavior related to selection and configuration of input devices). The evaluation is, however, limited to the use case described in Section 4. Thus, only part of the findings reported above are generalizable. However, the evaluation was able to show that it was possible to model user’s abilities related to the interaction settings and applications as accurately as needed in order to later provide personalized system behavior (layer 3). Further, it has also shown that the system could make high-quality decisions related to the selection of suitable input devices (layer 4) and that the actual adaptive system behavior based on these models and decisions was perceived as adequate (layer 5). As a side aspect, the evaluation can also be seen as a proof-of-concept for the layered evaluation approach itself as it actually helped to localize problems on the individual layers.
6 Framework application for interaction analysis in other scenarios In the prior steps of our research, the concepts of the automated analysis of input settings and the corresponding adaptations of the interaction behavior and UI were successfully employed and evaluated positively for 2D input types. For example, the application of higher or lower pressure or the variation of shaking intensity could be handled relatively well and UI adaptations based on the calculated metrics led to improvements for users’ interaction behavior. In the next step, we extended our scope of interest to the 3D space for the reasons described in this section. Further, we briefly describe two device prototypes for 3D input and their implementation and evaluation.
6.1 Potentials and drawbacks of 3D input Apart from the sheer possibility to perform operations in a 3D space (e. g., navigate through 3D content), this should allow our users to perform a broader spectrum of input activities through the wider range of input options. In our previous work, we evaluated the performance of using a readily available 3D input device, the Leap motion controller, in the context of work places of people with disabilities [27]. While the interaction itself looked promising for some tasks, there are also some drawbacks. The Leap motion controller allows for fully touchless input, i. e., there is no physical contact between user and device and 3D input gestures are performed in mid-air (in a fixed, predefined active space above the controller). The device uses infrared sensors to analyze the position of the user’s hand and is able to capture the position of each finger (and wrist) joint individually. As the interacting hand has to be held about 20 cm above the device during the interaction process (and it is problematic to use a common hand rest to support the hand or wrist as this would interfere with the recognition technology), interaction turns out to be relatively exhausting (hand/arm fatigue is a common problem, see [27]). A second common problem is missing feedback, both regarding the interaction activity in general but also regarding the area active to user input; often, users do not notice that they have left the active area and do not immediately understand why their input activity has not led to a system response. Nevertheless, touchless input devices gained considerable popularity with the introduction of Microsoft Kinect or the Leap motion controller, especially for games involving physical activity or therapeutic applications. Besides the Leap motion controller, which was embedded into our framework as a baseline input device, we additionally conceptualized two own hardware prototypes (see [4]) with the aim of reducing the two aforementioned drawbacks of touchless input (fatigue and unknown borders of detection). Their names are loosely based on a popular American animated television series.
6.2 The SpongeBox prototype The prototype called SpongeBox consists of a transparent plastic box, the inner walls of which are covered with sponges (see Fig. 3.6(a)). The user puts the interacting hand inside the box and can then press against four sides of the box (bottom, front, left, right). SpongeBox then measures the position of the hand inside the box by measuring pressure intensities in the different directions. When interacting, the user has to overcome the physical resistance of the sponges which at the same time provide for haptic guidance and feedback. Similar to the pointer concept used by traditional mouses, SpongeBox is used to control a virtual object on the screen; however, this object can be moved in a 3D space instead of a 2D one.
Figure 3.6: The two 3D interaction prototypes, (left) SpongeBox and (right) SquareSense [4]. Reprinted from Studies in Health Technology and Informatics, 242, Mirjam Augstein, Thomas Neumayr, and Thomas Burger, “The Role of Haptics in User Input for People with Motor and Cognitive Impairments”, ©2017, with permission from IOS Press.
6.3 The SquareSense prototype The SquareSense prototype aims at combining aspects of touch-based and touchless interaction enabled by SpongeBox and the Leap motion controller. The hardware setup consists of a wooden box with movable side walls (to fit different hand sizes) of about the same size as SpongeBox (see Fig. 3.6(b)). The user can move the interacting hand in a touchless manner inside the box. In contrast to fully touchless input, however, first, the walls of SquareSense provide a physical border indicating the active area, and second, it is possible to touch and apply pressure to the walls (touch and pressure intensity can be recognized and measured). Another difference compared to touchless input with the Leap motion controller is the provision of a hand rest. During the interaction, the user’s wrist can be placed and rest on a sponge, which makes interaction less exhausting and provides a certain amount of guidance, in addition to the outer walls.
6.4 Implementation and evaluation We registered all three devices (i. e., the Leap motion controller and the newly created prototypes) at our framework instantiation and implemented a number of user model features (e. g., maximum position in each direction) as well as assessment tasks. For example, the first assessment aims at finding out the individual reach with each of our devices. Therefore, a virtual interactive object (a red cube) is positioned in the middle of the virtual 3D space and serves as a cursor. The user is then instructed to move the cube as far as possible in each direction. Similar assessments then lead to the detection of an individual’s regularity (i. e., accuracy), continuous regularity (i. e.,
the continuous pursuit of a target object), and time to reach a target object with the cursor. After taking the assessments, with the computation of said common features, we were able to achieve an automated analysis of individual interaction performance. A user study with five persons with motor and cognitive impairments [4] and a user study with 25 users without known impairments [7] were then conducted. One goal was to evaluate how well our framework could analyze the interaction performance of its users with the newly added devices for simple 3D manipulation tasks. This was possible to a large extent according to our own observations and those of specially trained caretakers in the case of participants with disabilities. Another goal was to find out what role a haptic experience in the input activities plays for both interaction performance and UX. Surprisingly, the results for these two aspects were partly contradictory for participants without known impairments. While the semi-touchless (SquareSense) and touchful (SpongeBox) input techniques both outperformed touchless (Leap) input regarding important interaction performance metrics (and touchless input was not significantly better than any of the other two at any metric), UX ratings tended to be better for touchless input [7]. Reasons contributing to this disparity might lie in the different quality of human perception connected to hedonic or pragmatic aspects of the interaction (for details please refer to [7]). The study with users with motor and cognitive impairments, see [4], revealed a much stronger dependence on haptic guidance, compared to the non-impaired users: almost all participants gained their best interaction performance results for important metrics with SpongeBox. Summing up, it was once again possible to register new input devices and prototypes on our framework, create metrics and assessments for them without extensive effort, and successfully model individual performance of the users.
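For the 3D assessments, a reach-style concrete feature can be derived directly from the logged cursor positions; the sketch below computes the maximum displacement from the start position along each axis, which is one plausible formalization of the "individual reach" described above (the exact feature definitions are not given in the chapter and are assumed here).

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ReachFeature
    {
        // Maximum displacement of the 3D cursor from its start position per axis,
        // taken over one assessment run (an assumed formalization of "reach").
        public static (double X, double Y, double Z) MaxReach(
            IReadOnlyList<(double X, double Y, double Z)> positions)
        {
            if (positions.Count == 0) return (0, 0, 0);
            var start = positions[0];
            return (
                positions.Max(p => Math.Abs(p.X - start.X)),
                positions.Max(p => Math.Abs(p.Y - start.Y)),
                positions.Max(p => Math.Abs(p.Z - start.Z)));
        }
    }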
7 Discussion and limitations The framework presented in this chapter is focused on providing a generic infrastructure to implement PI. As described earlier in Section 3.3, it is mainly composed of one (or several) decentral IRPCs and a central AC. The IRPCs are native applications on arbitrary devices that communicate with the AC, which is responsible for the analysis of the data provided by the IRPC(s), via a REST interface, using a pre-defined data exchange format. First, while the decentral, native IRPCs might be seen as an advantage (it allows for arbitrary devices to be used, with the ability to produce JSON data being the only prerequisite), this could also cause additional effort. A developer needs to be familiar with native development for the respective device (category), also including several different devices might thus require familiarization with different technologies. For instance, a native application for an Android smartphone differs drastically from one on an iOS smartphone or a low-level component such as an Arduino sensor.
Second, a developer of a PI use case using the framework does not only have to be familiar with development for potentially several different platforms, but should also have at least some basic knowledge in the field of user modeling and the input methods themselves in order to define the necessary user model features. This could, however, be done in cooperation with a domain expert as the features have to be registered via the web application (which does not presume any technological knowledge). Once registered, the features’ values need to be extracted from the data provided by the IRPC (at least for concrete features). Third, the technology used for the central AC (currently, a Windows 2008 server infrastructure) will not be supported endlessly, thus the central part of the framework might require a major update in the near future. In general, we can summarize that the framework’s focus on genericity allows for implementation of a broad spectrum of concrete PI use cases but comes at the cost of reduced simplicity for developers. The phase of familiarization with the framework might take time before a new use case can be implemented from scratch.
8 Summary and conclusions In this chapter we have discussed PI as a means of individually supporting users by considering their distinct prerequisites related to interaction and providing automated, mainly system-initiated adaptations based on them. PI has previously been dealt with mainly in or around the field of assistive technology, most probably due to the fact that for the target group of people with impairments, interaction abilities vary drastically. In many cases, particularly motor impairments might lead to the complete exclusion of people regarding the usage of certain interaction devices and methods. However, the exclusion from certain interaction methods and devices is often accompanied by the exclusion from usage of certain pieces of software (e. g., smartphone or tablet apps if a user cannot operate a common touchscreen). In the field of personalized and adaptive systems research, interaction, as direct target area of personalization, has played a minor role, compared to content, presentation, or navigation adaptation. However, interaction preferences might vary also for people without known impairments, thus the potential target audience is broad and might include the entire spectrum of users. Based on these considerations, we have described a concrete framework that can be used as an infrastructure to implement PI (with a focus on input methods and devices). The framework was designed based on previous research conducted in different areas of personalization (e. g., generic user modeling systems [25] or adaptive hypermedia [11, 24]). It aims at (i) being sufficiently generic in order to allow for application with varying target groups, devices, and applications, (ii) offering support for administrators and users, and (iii) allowing for automated personalization while still enabling user control and obtaining a positive UX.
As the framework itself does not constitute a concrete system that can be used to demonstrate a PI use case, we additionally described a concrete case study that has been implemented using the framework. The use case for this case study has been selected based on findings gained during tests with a group of users with motor and cognitive impairments; it is however not targeted to this user group solely. Thus, the evaluation of the use case described later in the chapter has been conducted with two user groups, focusing also on those without known impairments (but with individual interaction preferences). As the framework can be used to implement a wide range of PI scenarios, it is hardly possible to evaluate the framework itself (regarding user acceptance and related factors). Rather than on the basic infrastructure provided by the framework, the success of a concrete implementation relies on the selection of features, the analysis of their underlying log data, and the concrete implementation of adaptation rules. Therefore, we decided to present an evaluation conducted for the concrete use case discussed before. The evaluation described in this chapter followed the layered evaluation approach proposed by [32]. This approach was chosen due to its capability to localize sources of problems in case problems are observed. For the concrete use case, the evaluation results show that (i) problems that were identified could be localized well, (ii) the framework enabled to realistically model users’ abilities related to the given interaction scenarios, (iii) the framework also enabled implementation of adaptive behavior of apps using the resulting models, and (iv) the framework enabled implementation of adaptive behavior while preserving transparency as well as user control. From these findings we also draw more general conclusions related to PI at large, however, due to the nature of the evaluation, limited to users’ general attitude towards PI and to a short-term application of a PI scenario. The descriptions in this chapter do not allow for conclusions on users’ acceptance of longer-term personalization and adaptive behavior. During the coming years we expect the importance of PI to increase for a broader spectrum of users. First, the product range in the area of input devices grows rapidly. Thus, users’ preferences among these input devices can be expected to vary even more drastically. Second, a more general trend towards individualization can be observed that we expect to affect also the selection of input devices and their configuration. The ubiquity era as predicted already in 2008 by [18] also involves that “more people than ever will be using computing devices of one form or other” and that “computers can now be interwoven with almost every aspect of our lives.” This again emphasizes the need for personalization regarding interaction with these devices in order to not exclude certain people or groups of people from these developments that influence not only small parts of humans’ lives but might be fully interwoven with them in the future. The framework discussed in this chapter in combination with a case study using a concrete implementation of the framework is intended to demonstrate an approach
to PI, using an example focused on the needs of a target group with highly diverse requirements. It not only shows exemplarily how PI can be implemented technically but also how the interplay between the general user model, a concrete application, and its UI can be arranged.
References
[1] Pierre Akiki, Arosha Bandara, and Yu Yijun. Adaptive model-driven user interface development systems. ACM Computing Surveys, 47(1), 2014.
[2] Mirjam Augstein, Daniel Kern, Thomas Neumayr, Werner Kurschl, and Josef Altmann. Measuring Physical Pressure in Smart Phone Interaction for People with Impairments. In Mensch und Computer 2015, Workshopband, pages 207–214. Oldenbourg Wissenschaftsverlag, Stuttgart, Germany, 2015.
[3] Mirjam Augstein and Thomas Neumayr. Layered evaluation of a personalized interaction approach. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, Bratislava, Slovakia, 2017.
[4] Mirjam Augstein, Thomas Neumayr, and Thomas Burger. The role of haptics in user input for people with motor and cognitive impairments. Studies in Health Technology and Informatics, 242:183–194, 2017.
[5] Mirjam Augstein, Thomas Neumayr, Daniel Kern, Werner Kurschl, Josef Altmann, and Thomas Burger. An analysis and modeling framework for personalized interaction. In IUI17 Companion: Proceedings of the 22nd International Conference on Intelligent User Interfaces Companion, Limassol, Cyprus, 2017.
[6] Mirjam Augstein, Thomas Neumayr, Werner Kurschl, Daniel Kern, Thomas Burger, and Josef Altmann. A personalized interaction approach: Motivation and use case. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pages 221–226. ACM, 2017.
[7] Mirjam Augstein, Thomas Neumayr, Stephan Vrecer, Werner Kurschl, and Josef Altmann. The role of haptics in user input for simple 3D interaction tasks – an analysis of interaction performance and user experience. In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications – Volume 2: HUCAPP, pages 26–37. INSTICC, SciTePress, 2018.
[8] Christine Bauer and Anind K. Dey. Considering context in the design of intelligent systems: Current practices and suggestions for improvement. The Journal of Systems and Software, 112:26–47, 2016.
[9] Pradipta Biswas and Patrick Langdon. Developing multimodal adaptation algorithm for mobility impaired users by evaluating their hand strength. International Journal of Human-Computer Interaction, 28(9):576–596, 2012.
[10] Dmitry Bogdanov. From Music Similarity to Music Recommendation: Computational Approaches Based in Audio Features and Metadata. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2013.
[11] Peter Brusilovsky. Methods and techniques of adaptive hypermedia. User Modeling and User-Adapted Interaction, 6(2–3):87–129, 1996.
[12] Peter Brusilovsky and Nicola Henze. Open corpus adaptive educational hypermedia. In Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl, editors, The Adaptive Web. 2007.
[13] Joëlle Coutaz. User interface plasticity: Model driven engineering to the limit! In Proceedings of the 2nd ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Berlin, Germany, 2010.
[14] Paul De Bra, David Smits, Kees Van der Sluijs, Alexandra Cristea, Jonathan Foss, Christian Glahn, and Christina Steiner. Grapple: Learning management systems meet adaptive learning environments. In Intelligent and Adaptive Educational Learning Systems. Springer, 2013.
[15] Gerhard Fischer. User modeling in human-computer interaction. User Modeling and User-Adapted Interaction, 11:65–86, 2001.
[16] Krysztof Gajos, Daniel S. Weld, and Jacob O. Wobbrock. Automatically generating personalized user interfaces with Supple. Artificial Intelligence, 174(12–13), 2010.
[17] Carlos Gomez-Uribe and Neil Hunt. The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems, 6(4), 2016.
[18] Richard Harper, Tom Rodden, Yvonne Rogers, and Abigail Sellen. Being human: Human-computer interaction in the year 2020. Technical report, 2008.
[19] Austin Henderson and Morten Kyng. There’s no place like home: Continuing design in use. In Joan Greenbaum and Morten Kyng, editors, Design at Work: Cooperative Design of Computer Systems, pages 219–240. Lawrence Erlbaum Associates, Inc., 1992.
[20] S. Hwang, A. Bianchi, M. Ahn, and K. Y. Wohn. MagPen: Magnetically driven pen interaction on and around conventional smartphones. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services, Munich, Germany, 2013.
[21] S. Hwang, A. Bianchi, and K. Y. Wohn. VibPress: Estimating pressure input using vibration absorption on mobile devices. In Proceedings of the 15th International Conference on Human-Computer Interaction with Mobile Devices and Services, Munich, Germany, 2013.
[22] N. Kaklanis, P. Biswas, Y. Mohamad, M. F. Gonzalez, M. Peissner, P. Langdon, D. Tzovaras, and C. Jung. Towards standardisation of user models for simulation and adaptation purposes. Universal Access in the Information Society, pages 1–28, 2014.
[23] Iyad Khaddam, Nesrine Mezhoudi, and Jean Vanderdonckt. Adapt-first: A MDE transformation approach for supporting user interface adaptation. In Proceedings of the 2nd World Symposium on Web Applications and Networking, 2015.
[24] Evgeny Knutov, Paul De Bra, and Mykola Pechenizkiy. AH 12 years later: A comprehensive survey of adaptive hypermedia methods and techniques. New Review of Hypermedia and Multimedia, 15(1):5–38, 2009.
[25] Alfred Kobsa. Generic user modeling systems. User Modeling and User-Adapted Interaction, 11(1–2):49–63, 2001.
[26] Alfred Kobsa and Wolfgang Pohl. The BGP-MS user modeling system. User Modeling and User-Adapted Interaction, 4(2):59–106, 1995.
[27] Werner Kurschl, Mirjam Augstein, Thomas Burger, and Claudia Pointner. User modeling for people with special needs. International Journal of Pervasive Computing and Communications, 10(3):313–336, 2014.
[28] Henry Lieberman, Neil W. Van Dyke, and Adrian S. Vivacqua. Let’s browse: A collaborative web browsing agent. In Proceedings of the 4th International Conference on Intelligent User Interfaces, pages 65–68, ACM, New York, NY, USA, 1999.
[29] Bradley N. Miller, Istvan Albert, Shyong K. Lam, Joseph A. Konstan, and John Riedl. MovieLens unplugged: Experiences with an occasionally connected recommender system. In IUI ’03: Proceedings of the 8th International Conference on Intelligent User Interfaces, pages 263–266, ACM, New York, NY, USA, 2003.
[30] MyUI. MyUI design patterns repository, 2018. Available online at http://myuipatterns.clevercherry.com/.
[31] Reinhard Oppermann, Rossen Rashev, and Kinshuk. Adaptability and Adaptivity in Learning Systems. Knowledge Transfer, 2:173–179, 1997.
3 Automated personalization of input methods and processes | 101
[32] Alexandros Paramythis, Stephan Weibelzahl, and Judith Masthoff. Layered evaluation of interactive adaptive systems: Framework and formative methods. User Modeling and User-Adapted Interaction, 20(5):383–453, 2010. [33] Dimitris Paraschakis, Bengt Nilsson, and John Holländer. Comparative evaluation of top-n recommenders on e-commerce: an industrial perspective. In Proceedings of the 14th International Conference on Machine Learning and Applications, 2015. [34] Seonwook Park, Christoph Gebhart, Anna Maria Feit, Hana Vrzakova, Niraj Ramesh Dayama, Hui-Shong Yeo, Clemens Klokmose, Aaron Quigley, Antti Oulasvirta, and Otmar Hilliges. Adam: Adapting multi-user interfaces for collaborative environments in real-time. In Proceedings of the 2018 ACM SIGCHI Conference (CHI), 2018. [35] Matthia Peissner, Andreas Schuller, and Dieter Spath. A design patterns approach to adaptive user interfaces for users with special needs. In Proceedings of the 14th International Conference on Human-computer Interaction: Design and Development Approaches, Orlando, Florida, USA, pages 268–277, 2011. [36] Matthias Peissner, Dagmar Häbe, Doris Janssen, and Thomas Sellner. Myui: Generating accessible user interfaces from multimodal design patterns. In Proceedings of the 4th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Copenhagen, Denmark, pages 81–90, 2012. [37] Brigitte Ringbauer, Matthias Peissner, and Maria Gemou. From design for all towards design for one: a modular user interface approach. In Proceedings of the 4th International Conference on Universal Access in Human-Computer Interaction, Beijing, China, 2007. [38] Ben Schafer, Joseph Konstan, and John Riedl. E-commerce recommendation applications. In Ron Kohavi and Foster Provost, editors, Applications of Data Mining to Electronic Commerce, pages 115–153. Springer, 2001. [39] Markus Schedl, Peter Knees, Brian McFee, Dmitry Bogdanov, and Marius Kaminskas. Music Recommender Systems. In Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors, Recommender Systems Handbook, chapter 13, pages 453–492. Springer, 2nd edition, 2015. [40] David Smits and Paul De Bra. Gale: A highly extensible adaptive hypermedia engine. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, Eindhoven, The Netherlands, 2011. [41] Constantine Stephanidis. Towards user interfaces for all: Some critical issues. Advances in Human Factors/Ergonomics, 20:137–142, 1995. [42] Constantine Stephanidis. Adaptive techniques for universal access. User Modeling and User-Adapted Interaction, 11:159–179, 2001. [43] Constantine Stephanidis. User interfaces for all: New perspectives into human-computer interaction. User Interfaces for All-Concepts, Methods, and Tools, 1:3–17, 2001. [44] Constantine Stephanidis, Alexandros Paramythis, Demosthenes Akoumianakis, and Michael Sfyrakis. Self-adapting web-based systems: Towards universal accessibility. In Proceedings of the 4th Workshop on User Interface For All, Stockholm, Sweden, 1998. [45] Jacob O. Wobbrock, Shaun K. Kane, Krysztof Z. Gajos, Susumu Harada, and Jon Froehlich. Ability-based design: Concept, principles and examples. ACM Transactions on Accessible Computing, 3(3), 2011.
Tobias Moebert, Jan N. Schneider, Dietmar Zoerner, Anna Tscherejkina, and Ulrike Lucke
4 How to use socio-emotional signals for adaptive training

Abstract: A closer alignment of mutual expectations between technical systems and their users regarding functionality and interactions is supposed to improve their overall performance. In general, such an alignment is realized by automatically adapting the appearance and the behavior of a system. Adaptation may be based on parameters regarding the task to be fulfilled, the surrounding context, or the user himself. Among the latter, the current emphasis of research is shifting from a user's trails in the system (for instance, to derive his level of expertise) towards transient aspects (like his current mental or emotional state). For educational technology in particular, adapting the presented information and the tasks to be solved to the current personal needs of a learner promises higher motivation and thus a better learning outcome. Tasks that are adequately challenging and motivating can keep users in a state of flow and thus foster enduring engagement. This is of particular importance for difficult topics and/or learners with disabilities. The chapter explains the complex cause-and-effect models behind adaptive training systems, the mechanisms that can be employed to implement them, and empirical results from a clinical study. We exemplify this for the training of emotion recognition by people with autism, although the approach is not limited to this user group. For this purpose, we present two approaches. One is to extend the Elo algorithm with dimensions of difficulty in social cognition. This allows us not only to judge the difficulty of tasks and the skills of users, but also to freely generate well-suited tasks. The second approach is to make use of socio-emotional signals of the learners in order to further adapt the training system. We discuss current possibilities and remaining challenges for these approaches.

Keywords: educational technology, social cognition, emotion, task difficulty, adaptivity
1 Introduction

Most skills can be improved through practice. A human trainer or tutor can provide motivation during training. However, IT-based training systems are often used without the presence of a caregiver. In the design of such a system, the challenge therefore arises to compensate for the lacking support of a human trainer and to sufficiently support the self-motivation of the trainee. Depending on the specific training goals, both continuous training within a defined period and regular training over a longer time span should be aimed for. This can be achieved by adjusting the system's behavior to the current emotional state and mental ability of the users.

Acknowledgement: This work was partly funded by the German Federal Ministry of Education and Research in the joint project Emotisk under contract number 16SV7241. Moreover, we would like to thank our project partners from HU Berlin, namely, I. Dziobek, A. Weigand, and L. Enk, for providing the data from the usability study conducted with the EVA system.
1.1 Challenges of personalization

Adjusting the behavior of a system is not a radically new approach. For instance, a feedback control circuit as known from climate control provides the same functionality: First, parameters of the environment are measured (e. g., using a thermometer). Then, an assessment of the situation is derived from that (e. g., it is slightly too cold). Comparing this to a targeted state leads to a regulation strategy (e. g., to moderately increase the temperature). This is carried out by certain actuators (e. g., turn up the heating by one degree). The same cycle of adaptivity can be applied to education, as depicted in Fig. 4.1, where learning outcomes are the overall goal and teaching provides the methods to achieve them.
Figure 4.1: The adaptivity cycle applied to an educational scenario.
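To make the loop concrete, the following Python sketch mirrors the four stages of the adaptivity cycle (measure, assess, decide, act) for a learning scenario. All names and thresholds are illustrative assumptions, not part of the system described later in this chapter.

```python
def adaptivity_cycle(recent_results, current_difficulty, target_success=0.6, step=0.1):
    """One pass through the adaptivity cycle: measure -> assess -> decide -> act.

    recent_results: list of booleans (True = task solved), the 'measurement'.
    current_difficulty: float in [0, 1], the state of the 'actuator'.
    Returns the adjusted difficulty for the next task.
    """
    # 1. Measure + 2. Assess: estimate the learner's current success rate.
    success_rate = sum(recent_results) / max(len(recent_results), 1)

    # 3. Decide: compare with the targeted state (a regulation strategy).
    if success_rate > target_success:
        adjustment = step      # learner seems under-challenged -> harder tasks
    else:
        adjustment = -step     # learner seems over-challenged -> easier tasks

    # 4. Act: the 'actuator' here is the difficulty of the next task.
    return min(1.0, max(0.0, current_difficulty + adjustment))


# Example: a learner who solved 4 of 5 recent tasks gets a slightly harder task.
print(adaptivity_cycle([True, True, False, True, True], current_difficulty=0.5))
```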
This is independent of the nature of the training: A training system carries out the same steps as a human teacher. However, teachers do not always decide explicitly or consciously, which makes it hard to model and implement such behavior in a technical system. While some parameters for adaptation in educational settings are well understood, like defined learning goals or characteristics of the environment, others are still a subject of research. Here, the challenge is three-fold:
– the classification of the learner based on measured parameters;
– the deduction of appropriate didactical strategies;
– the corresponding modification of the learning environment.
In this chapter, we present our findings in these fields using a particular example, namely, the training of socio-emotional cognition. Deriving emotions from faces or voices helps to understand the intentions and needs of others. Along with recognizing our own emotions, this is a prerequisite for successfully interacting and participating in society. This is of special importance, yet difficult, for people with autism. However, other users can also benefit from such training, like certain age groups or professions. This variety underlines the need for a personalized, adaptive approach. Our basic assumptions for this are that (a) learning outcome may increase with motivation, (b) a condition of flow may preserve motivation, and (c) too simple or too complex tasks may hamper flow. From this perspective, we analyze how the difficulty of tasks and the skill of users can be modeled, how suitable tasks for recognizing emotions can be generated, and how such training can evolve over time.
1.2 Related work

Research on adaptive educational technology is looking for sophisticated models beyond heuristics [45]. A challenge is that existing models from pedagogy and cognitive psychology operate on a rather high level of abstraction, like Adaptive Control of Thought [1] or Motivational Design [37]. Since internal mechanisms of human cognition and learning cannot be directly measured, these models are constructs of thought rather than descriptions of processes within the brain. This makes them hard to operationalize in the algorithms of a technical system [39]. Nevertheless, based on such models a variety of adaptive educational systems has been developed, for instance to select proper learning contents [9] or to provide helpful feedback to the learner [2]. Later, hypermedia technology and the eXtensible Markup Language (XML) provided the basis for dynamically adjusting a given learning arrangement to certain needs [7]. While the selection and adaptation of content were sufficiently described [43], new challenges arise from the complex authoring process for such content [50]. However, this is not specific to education but applies to digital publishing in general. With the rise of mobile technologies, the field of adaptivity is now also labeled as context-aware [63] or pervasive education [44]. Here, the aim is to provide sufficient means to analyze the learning process, to model didactical strategies, and to react to changing conditions of the learner himself as well as of his environment. However, challenges still exist in the translation between pedagogical, psychological, and technological approaches.

To provide an example, consider the field of game-based learning. Games are supposed to promote the motivation of learners [27]. This includes both minigames that offer short-term entertainment when performing a learning task (so-called gamification) as well as mid- or long-term games that embed the educational content into a consistent story (so-called serious games). Besides an intriguing narrative, games may provide explicit rewards for frequent training and can thus (re-)activate passive learners. This may not only help to sustain motivation during a longer period of training, but in the short term may also foster a feeling of flow within the learner. However, this is a very personal mechanism. Shaping it requires insight into the current mindset of the learner as well as general regulative models of the learning process. Besides the achieved learning progress, additional parameters can indicate whether the learner is in a productive state, e. g., the displayed emotion: While a bored learner tends to be underchallenged, a frustrated learner may be overstrained. As a consequence, recognizing the emotional state of the learner and adjusting the difficulty of the task accordingly may help to maintain a condition of flow – which in turn may improve the learning progress. We will explain this mechanism (and others) in more detail throughout this chapter by using the example of an adaptive training system for social cognition.

Regarding the targeted application field of social cognition, there is a well-grounded state of research in psychology. It is known that certain populations, such as the elderly [59] and individuals with autism [67], struggle with the intuitive recognition of facial expressions. Previous research has already shown that computer-based training of the ability to recognize emotions is possible [6]. Existing training tools often rely on concepts of gamification. However, these systems follow strict rules and assumptions about the levels of difficulty to be passed and do not adapt to the individual progress of the learner [40], which is even more harmful given an extremely diverse and sensitive target group. A special challenge is that autistic people usually have no great interest in looking at faces [69] and additionally suffer from a dysfunction of the reward system of the brain [41]. The latter presents as a lack of responsiveness to social but also non-social stimuli, for example money, in autistic individuals. Autistic individuals might therefore not be motivated by the same things, and not as easily, as neurotypical individuals. They are thus a population that, on one hand, has a substantial need for emotion recognition training, but, on the other hand, might not benefit from traditional tutoring approaches. In the worst case, human tutoring could cause them substantial discomfort. Conversely, an adaptive training system could provide a safe environment which fosters self-motivated learning. We will use this as an example throughout this chapter.

In general, various lab experiments have provided clues that the success of learning depends on various personal factors. For game-based learning, these include different levels of gaming experience and gamer types [53]. For education in general, different learning styles and habits have been studied [10], which have to be addressed by a proper educational or game design. In a broader sense, age and gender may also play a role. We will analyze these influencing factors in more detail for the field of social cognition. However, several findings and principles can be transferred to adaptivity for other subjects of training or learning.
1.3 Relevance of socio-emotional signals for adaptive systems

For our targeted field of application, emotions are relevant from two perspectives. First, they are the core subject of training, which means that it may be useful to reflect the ability of the learner not only to recognize, but also to display a certain emotion (mainly basic emotions like joy, fear, or anger) within the training system. Second, the current emotional state of the learner (mainly complex emotions like boredom or frustration) may influence his degree of flow or motivation. This is another reason why detecting the emotional state of the user and processing it within the training system may be useful. We will demonstrate that with current technological means this is feasible for basic emotions, but not trivial (if possible at all) for complex emotions. This chapter is grounded on an elaboration of previously identified relationships between the characteristics of emotions and personal aspects of people displaying or recognizing this emotion, which is presented in Section 2. Knowing what is difficult in recognizing emotions can help to adapt a system to train this ability. Next, we will discuss how adaptive learning arrangements with respect to emotions can be realized. Section 3 is focused on emotions as the subject of the learning process, where we present a novel approach for modeling the difficulty of tasks and the skills of learners. Section 4 is devoted to emotions as a parameter for adaptation. This includes feedback on emotions imitated by the learner as well as adaptation regarding the mental state of the learner. Though at first sight it might look contradictory to have emotions as a subject and as a parameter of learning, we will show that both approaches are based on similar mechanisms and thus can be realized using similar technological means. In Section 5 we will present findings from a study with a clinical sample using our system and a discussion of these results. The chapter concludes with a summary and outlook to further work in Section 6.
2 The concept of emotions and human emotion recognition

Emotions will re-appear as a topic throughout the entire chapter, either as a content of teaching or as a parameter for adaptation. The following section will therefore describe the basic concept of emotions and introduce important findings from the literature.
2.1 Emotions and emotion representation systems

While there is no consensus in the scientific literature on a definition of the term "emotion", most authors agree that emotions are psychological and physiological affective states which are triggered as a reaction to events or situations, which are coupled with bodily processes and sensations, and which affect behavior, decision making, and interaction with the environment [5, 8, 12, 19]. The relationship between the subjective emotional experience of these affective states and their expression is still under debate [20, 36], and there is disagreement on the optimal representation system for emotions, for example in the context of human–computer interfaces [55].

Two of the most widely applied theories of emotion are Ekman's basic emotion theory [19] and Russell's core affect model [60, 61]. Basic emotion theory describes six discrete emotional states (happiness, sadness, anger, fear, surprise, and disgust) that are universal and thus recognized and expressed across cultures. According to Ekman, these states were shaped throughout evolution to provide mammalian organisms with effective and survival-promoting responses to the demands of the environment. While there is evidence for the universality of these emotions, this model is very limited in representing the multitude of emotions outside of the six basic emotions or the degree of emotional activation. However, research has shown that observers readily attribute a mixture of emotional states of varying intensity to emotional expressions [58, 30, 56].

The core affect model is an equally simple yet less restrictive model of emotions that allows for the representation of virtually any emotion in a dimensional sense. Emotions are described using the dimensions valence and arousal. Valence is the level of pleasantness or pleasure of an emotion and is typically measured on a scale ranging from "very positive" to "very negative." Arousal captures the level of activation or excitement of an emotion and is typically measured on a scale ranging from "very calm" to "very aroused." These dimensions are empirically validated in that they frequently appear as the underlying variables in dimension reduction analyses (such as factor analysis or multidimensional scaling) on self-reported or ascribed affect and ordered emotional words [25, 60, 61]. The advantage of this representation lies in the continuous character of the two constructs valence and arousal, which also includes a representation of the intensity of an emotion. Every point in the space spanned by the two dimensions represents an emotional state of a certain intensity. An emotional state is characterized by the relative proportion of valence and arousal, while the intensity is the absolute distance from the neutral middle ground on both dimensions. This representation allows for meaningful distance calculations between emotions, which can be used for adjusting the difficulty of emotion recognition tasks (see Section 3.5). The bigger the distance between two emotions in valence–arousal space, the more different and consequently the easier to tell apart they should be.
2.2 The difficulty of emotion recognition from faces

Facial expressions (FEs) have been a research topic for more than a century [13]. FEs are an essential channel for emotional expression and facilitate human interaction.
The accurate recognition of FEs is dependent on the individual displaying the expression (called actor in the following) and the individual perceiving the expression (called observer in the following). For FEs to function as a communication channel the actor has to produce expressions that will be readily understood by the majority of observers. On the other hand, a potential observer has to possess recognition abilities to understand the majority of expressed FEs. Some FEs seem to be generally easier to recognize than others. For example, happiness shows a higher recognition accuracy across cultures than disgust or fear [21]. A variety of variables are known to influence the perception accuracy of emotional FEs. These are age and sex of the actor and observer on the one hand and valence and arousal properties of the displayed emotion itself on the other hand. For the design of appropriate tasks of a training system it is important to know how these variables contribute to the difficulty of an emotional stimulus. Understanding the size of these contributions will determine which variables are most useful for task generation. Variables that strongly influence the difficulty of an emotional expression will have to be manipulated carefully to ensure the generation of tasks that cover a great range of difficulty, whereas variables with lower influence could be neglected with little consequence. The following sums up the scientific literature about these variables and discusses their usefulness for task generation.

2.2.1 Age and sex of actor

A review by Fölster et al. [24] concluded that emotional expressions were harder to read from old faces than from young ones. This difference might stem from morphological changes of the face, such as folds and wrinkles, interfering with the emotional display, worse intentional muscle control in the elderly affecting posed emotions, and/or negative implicit attitudes towards old faces [26]. Women are more expressive in their facial expressions [23], e. g., women smile more than men [33]. A recent study by McDuff et al. [49] on over 2000 participants from five countries also found that women displayed most investigated facial actions more frequently than men. However, they also found that men show a greater frequency of brow furrowing compared to women, meaning that while women show a higher frequency of displayed expressions overall, there are certain facial actions that are more often displayed by men. However, it is unclear how differences in expressivity between the sexes relate to the emotional decoding difficulty of the observers.

2.2.2 Age and sex of observer

Emotional decoding abilities are not stable across the life span. There is evidence that decoding abilities decline with increasing age in adulthood [59]. By contrast, developmental effects of emotion recognition capabilities can also be observed until the end of adolescence [52].
Previous studies have found a sex difference in recognition abilities only when employing subtle emotional stimuli [35, 51]. A recent meta-analysis concluded that women have a small advantage over men in the recognition of non-verbal displays of emotion (mean Cohen's d = 0.19) [65].

2.2.3 Peer group effects

Peer group effects are based on the attribution of group membership of individuals. Although there seems to be evidence for a "like-me" bias in the attention towards and memory of faces from the same age group, little evidence exists for a noteworthy peer group effect in facial emotion recognition [24]. Interactions of observer and actor sex seem to be restricted to specific emotional expressions. For example, male observers react with higher skin conductance responses [48] to angry male faces, although this does not seem to translate into a perceptual advantage.

2.2.4 Valence and arousal effects

Of the six basic emotions, happiness is recognized with the highest accuracy [21]. Extending this line of research, we found evidence for a strong correlation between participants' happiness and valence judgments (r = 0.88), and we could furthermore show that a moderate quadratic relationship exists between valence and the perceived difficulty of emotional facial expressions. In practice this means that expressions that are judged as being of particularly high or low valence will be easy for observers to decode, whereas expressions of neutral valence will be the most difficult to decode. As a predictor for the perceived difficulty of an FE, its valence is twice as important as the age of the observer or the arousal of the FE. It is three times as important as the age of the actor and more than ten times as important as the sex of the actor or the sex of the observer. Arousal is as important as a predictor for perceived difficulty of an FE as the age of the observer, twice as important as the actor age, and more than five times as important as the sex of the actor or observer. To sum up, the valence and arousal characteristics of an FE are at least as important as, or more important than, all the person-specific characteristics of observer and actor. Following this line of thought, adaptivity mechanisms of a learning system should focus on the valence and arousal of emotional FE stimuli to adjust the difficulty, because these variables have a greater influence on the perceived difficulty for the learner. Furthermore, the dimensional nature of valence and arousal allows for a much finer-grained manipulation than the person-specific characteristics. This is because person-specific characteristics are unchangeable (user age and sex), allow only a binary choice (actor sex), or allow only a choice among a limited range of values (actor age). The combination of valence and arousal, on the other hand, allows for a wide range of possible values.
2.3 Recognizing emotions from other cues

Emotions can not only be found in the face of people, but are also present in their voices and their body language – in every social interaction. In fact, the system we will introduce in the next sections is targeting several aspects of emotion display and recognition. However, this chapter is limited to facial emotion for the sake of simplicity.
3 Adaptivity for the example of the training of emotion recognition

This section describes the general principles of an adaptive training system, which is built on the Elo rating system. We will discuss potential mechanisms which enable an adaptive training system to align with the learner's momentary ability. We illustrate the described concepts with the example of our implementation of an adaptive training system for emotion recognition. This mechanism can be easily transferred to other subjects of training or learning.
3.1 Possible adaptation mechanisms

When designing the difficulty level of tasks in a game-based learning environment, two things are central. First, a way has to be found to measure and map the learner's skills. Second, training tasks must be designed so that their level of difficulty increases and decreases to match the skills of the learner [54].

Training systems often follow the idea of gradual learning progress. In this case, exercises are often created in advance at different degrees of difficulty and are then to be processed consecutively with increasing difficulty. In contrast to traditional curriculum design [64], where knowledge and skills are gradually accumulated, training (especially of socio-emotional skills) is also about form-related performance, which must be reflected in the difficulty of the tasks. Problems arise when the exerciser moves faster or slower than anticipated by the training system: If he is faster, he has to spend a lot of time with tasks that will not challenge him, and if he is slower, the tasks will sooner or later exceed his abilities. This can sometimes lead to frustration because the sense of one's own abilities is no longer consistent with the assessment given by the system.

Another frequently chosen way to increase the level of difficulty is to implicitly introduce new concepts and game elements that make tasks more difficult. However, if this change in the game is not explicitly communicated, the lack of transparency in the parametrization of the level of difficulty can make the trainee feel that his skills are inadequate, even though he only lacks an understanding of the structure of the task. Moreover, additional game elements may end up primarily training other skills than the desired ones, e. g., memory, or the concentration on irrelevant subtleties in those elements.

A third variant is the use of algorithms for estimating a graded degree of difficulty for certain compilations of task components. This estimate can then be used to construct tasks that correspond to the skill level of the trainee. The challenge here is to find a simple and comparable mapping of both the skill level of the trainee and the requirements of the task to be generated. In the following, we present how we used the Elo rating system as the basis for such a comparison, and how we developed an adaptation mechanism for our game-based training system based on it.
3.2 The Elo rating system

The Elo rating system was initially invented by Arpad Elo to improve the rating of chess players [22]. It was developed with the idea of having an easy-to-use comparison tool, which is why the mathematics behind the system are quite simple. Its calculation method allows estimating the relative skill levels of two chess players. The system assumes that the chess performance of each player in each game is a normally distributed variable, and that the mean value of the performances of any given player changes only slowly over time. In most implementations, the score of a player is a number between 0 and about 2500 (although the rating theoretically has no upper limit). The difference in player rating serves as a predictor of the outcome of the match, i. e.,

$$\mathrm{Estimate}_A = \frac{1}{1 + 10^{(\mathrm{Elo}_B - \mathrm{Elo}_A)/400}}, \quad (4.1)$$
where EstimateA is the estimated probability of winning for player A, EloA is the Elo score of player A, EloB is the Elo score of player B, and the number 400 was chosen by Arpad Elo for compatibility reasons. Fig. 4.2 depicts the probability of winning a match for one player (Player A) in relation to the difference between his opponent's Elo score and his own Elo score (Elo score Player B − Elo score Player A). The probability of Player A winning a match increases with the number of points he is ahead of his opponent Player B. For example, a player whose rating is 200 points higher than his or her opponent's score has a 76 % win expectation; a player whose rating is 400 points higher is expected to win 91 % of the matches. The score increases or decreases, depending on whether the player wins or loses the match and whether this result corresponds to the prediction. Thus, in a game with a high chance of victory, winning only scores a few points, whereas defeat in such a game means losing many points. The system therefore adapts if the prediction does not correspond to reality.
Figure 4.2: Winning probability of Player A as a function of the difference in Elo scores of Player B and Player A. The probability follows a sigmoid function centered at a difference of 0 between the scores of the two players, which equals a probability of 50 % for winning and losing. The larger the difference between the players’ scores, the higher the probability for winning or losing, depending on whose score is higher.
The lost points are deducted from the loser and credited to the winner, i. e.,

$$\mathrm{Elo}'_A = \mathrm{Elo}_A + k \cdot (\mathrm{Result}_A - \mathrm{Estimate}_A), \quad (4.2)$$
where Elo′A is the updated Elo score for player A, k adjusts how many points the player will gain or lose, and ResultA is the actual result of the match (1 for a win, 0.5 for a draw, 0 for a defeat). New players are rated with an initial estimated value. The k-value ensures that such a player does not have to play a large number of matches against too strong or too weak opponents before being assessed correctly. Typically, this value is high at the start of the career and decreases with the number of games played, but the k-value adjustment varies from implementation to implementation. However, the Elo rating system is not only applicable to chess games. In fact, it can be used to rate the contestants in any zero-sum game. Therefore, the Elo system or variations of it are often used to rate the players in competitive video games, e. g., Dota 2, League of Legends, or Overwatch.
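The two formulas translate directly into code. The following Python sketch is a generic Elo implementation under the assumptions above (a fixed k per call); it is meant as an illustration, not as code from the training system described below.

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """Estimated probability that player A beats player B (equation (4.1))."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def update_elo(elo_a: float, elo_b: float, result_a: float, k: float = 20.0) -> float:
    """New Elo score of player A after a match (equation (4.2)).

    result_a is 1 for a win, 0.5 for a draw, and 0 for a defeat.
    """
    return elo_a + k * (result_a - expected_score(elo_a, elo_b))


# A rating advantage of 200 points corresponds to roughly a 76 % win expectation,
# and 400 points to roughly 91 %, as stated in the text.
print(round(expected_score(1400, 1200), 2))  # ~0.76
print(round(expected_score(1600, 1200), 2))  # ~0.91
```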
3.3 Modeling of learners' skills

Adaptivity requires user modeling, in this case learner modeling. In principle, this can be achieved with qualitative means, for example learner style, or with quantitative means, for example level of expertise [54]. We use a quantitative approach based on the Elo score, but differentiated by task and skill categories. Since our training system focuses on the training of emotion recognition in three areas, namely, facial expression, voice, and social situations, the skills of the player are modeled along these same three areas. For each of these skills, there are several training modules. The various training modules differ, for example, in whether the emotion to be recognized must be identified implicitly, that is, by intuition, or explicitly, that is, by naming. Figure 4.3 shows one of the modules.
Figure 4.3: The Face Puzzle requires the player to arrange film snippets of emotions. Only the eyes of the target emotion and the mouths of three different emotions are visible.
The Elo rating system has previously been used in competitive video games as well as in IT-based teaching systems. Our contribution is the scientifically grounded transfer of multidimensional aspects of emotion recognition (as explained in Section 2) to the one-dimensional Elo rating system. Also new is the prediction of Elo scores, representing the expected difficulty of a previously unplayed task, based on psychological models.
This allows for the generation of new training tasks that adequately fit the user's current level of skills, even though these tasks have never been played or rated before (which will be explained in the next section). Inspired by the Elo rating system, each player has an Elo value for each training module: the EVA score. This represents the presumed skill level for emotion recognition within this specific training module. The EVA score ranges from 0 to 2500, and each player has separate scores for the different training modules. Originally, there was one EVA score per skill. However, since multiple modules train the same skill, this would have meant that points would be passed between different point pools (see the detailed description in Section 3.5). This is not intended in the Elo rating system and it would have distorted the rating. Whenever a new module is unlocked, the player starts with an EVA score of 1200. In an assessment session, the player is then confronted only with tasks from the currently unlocked module. The level of difficulty of the tasks within this session varies greatly, so that an initial assessment of the skill level can be given. This is achieved by using a high k-value to calculate the degree of change of the scores. Since different training modules can train the same area of emotion recognition, e. g., recognizing emotions in faces, voices, and so on, the player will only see the average EVA score of all modules that train the same skill.
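A minimal sketch of such a learner model is shown below; the module and skill names are illustrative placeholders, not the identifiers used in the EVA system.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LearnerModel:
    # One EVA score per training module, initialized to 1200 when unlocked.
    module_scores: dict = field(default_factory=dict)
    # Which skill (faces, voices, social situations) each module trains.
    module_skill: dict = field(default_factory=dict)

    def unlock(self, module: str, skill: str) -> None:
        self.module_scores[module] = 1200.0
        self.module_skill[module] = skill

    def displayed_score(self, skill: str) -> float:
        """The player only sees the average score of all modules of one skill."""
        scores = [s for m, s in self.module_scores.items()
                  if self.module_skill[m] == skill]
        return mean(scores)


learner = LearnerModel()
learner.unlock("face_puzzle_implicit", "faces")
learner.unlock("face_puzzle_explicit", "faces")
print(learner.displayed_score("faces"))  # 1200.0
```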
3.4 Generation of tasks with predicted level of difficulty

Similar to the players, the training tasks also have an EVA score, which is initially estimated. In order to always provide the player with a fair challenge, tasks should have an EVA score that does not deviate too much from the player's score. Training tasks usually consist of a target, the emotion to be recognized, and distractors that are meant to distract from the correct solution. Based on the number of trainable emotions and available actors in our video database, about 3 million combinations are possible for a task consisting of a single target emotion and two distractors. Storing all task combinations in advance would be impractical. Instead, tasks are generated on-the-fly using estimates to meet the needs of the current player. In terms of content, the individual tasks are structured as simply as possible. For this purpose, exercises of higher difficulty are not generated by a higher number or a more complex representation of the emotions to be recognized, but by the use of different parameters, for example:
– Similarity of emotions: According to the core affect model [60], emotions with high similarity in valence and arousal should be more difficult to distinguish. Incorrect answer choices, i. e., distractors, with low similarity to the target are easier to rule out as a potential answer. Consequently, distractors located close to the target emotion in valence–arousal space make the overall task harder.
– Variability in the expressivity of the actors: Different actors present the same emotion to different degrees, which in turn can complicate or facilitate recognition.
– Complexity of emotions: One property of basic emotions (such as joy, sadness, fear, or anger) is that they are recognized reliably across cultures [19], which is due to the innateness of these emotions. In contrast, non-basic or so-called complex emotions do not have this property, depend much more on learned cultural conventions, and can thus be expected to be more difficult to recognize.
In the following, the estimation of the task difficulty will be demonstrated using the Face Puzzle Implicit Module (see Fig. 4.3). The tasks in this module consist of a target emotion, which should be recognized, and two distractors, which represent wrong answer possibilities. For implicit face puzzle tasks, differentiating emotions is more important than naming them. This is because in these tasks no emotion labels are used; different emotions are only to be distinguished by valence and arousal, e. g., an actor playing the emotion "anger" looks more negative and excited than an actor playing the emotion "relieved". To determine how difficult two emotions are to distinguish, we calculate the distance between the corresponding valence and arousal vectors, normalized by the size of this space (see equation (4.3)). Emotions which are easy to differentiate are further apart, while emotions difficult to distinguish are closer together. Our research suggests that valence has a greater influence on the difficulty of emotion recognition. We expect this effect to also play a role in emotion differentiation, which is why we weigh valence more heavily in the distance calculation. Note that these factors are specific to our scales of difficulty and do not reflect a generalized relation between valence and arousal. We have
$$\operatorname{dist}(e1, e2) = \frac{1}{18}\sqrt{4 \cdot (e1_{\mathrm{valence}} - e2_{\mathrm{valence}})^2 + 2 \cdot (e1_{\mathrm{arousal}} - e2_{\mathrm{arousal}})^2}, \quad (4.3)$$
where dist(e1, e2) is the distance between the two emotions e1 and e2, e1valence, e2valence are the valence values of both emotions, and e1arousal, e2arousal are the arousal values of both emotions. This results in a distance value (dist) between 0 and 1, with 0 being closest together and 1 being furthest away. Reference values for valence and arousal (e1valence, e2valence, e1arousal, e2arousal) are a combination of results from previous studies [32] and our own investigations. The estimated EVA score for differentiating both distractors (EVAdiff) from the target emotion (equation (4.4)) is then calculated from these distance values, i. e.,

$$\mathrm{EVA}_{\mathrm{diff}}(dist1, dist2) = 2500 \cdot \left(1 - \sqrt{dist1 \cdot dist2}\right), \quad (4.4)$$
where EVAdiff is the EVA score for differentiating both distractors from the target emotion and dist1, dist2 are the distances from the target emotion for both distractor emotions. In several studies we have determined for all 40 emotions covered by the training system how difficult they are to recognize on average. From these degrees of difficulty
we have derived corresponding EVA scores between 0 and 2500, where the maximum corresponds to the level of a grand master in the original Elo algorithm for chess rating. For instance, considering the task category of the Implicit Face Puzzle presented above, the overall recognition difficulty (EVAident) of a certain task (equation (4.5)) is calculated from the average EVA scores of the target (EVAtarget) and the two distractors (EVAdistractor1, EVAdistractor2), i. e.,

$$\mathrm{EVA}_{\mathrm{ident}}(\mathrm{EVA}_{\mathrm{target}}, \mathrm{EVA}_{\mathrm{distractor1}}, \mathrm{EVA}_{\mathrm{distractor2}}) = \frac{\mathrm{EVA}_{\mathrm{target}} + \mathrm{EVA}_{\mathrm{distractor1}} + \mathrm{EVA}_{\mathrm{distractor2}}}{3}, \quad (4.5)$$
where EVAident is the EVA score for identifying all the emotions that are part of the task and EVAtarget, EVAdistractor1, EVAdistractor2 are the EVA scores for recognizing the emotions represented in the target and the distractors. As mentioned, identifying emotions plays a minor role in this task type. Therefore, in the final calculation of the task difficulty (equation (4.6)), the EVA score for differentiation is amplified, i. e.,

$$\mathrm{EVA}_{\mathrm{task}}(\mathrm{EVA}_{\mathrm{ident}}, \mathrm{EVA}_{\mathrm{diff}}) = \left(\mathrm{EVA}_{\mathrm{ident}} \cdot \mathrm{EVA}_{\mathrm{diff}}^2\right)^{1/3}, \quad (4.6)$$
where EVAtask is the final EVA score for the task, EVAident is the EVA score for identifying all emotions that are part of the task, and EVAdiff is the EVA score for differentiating both distractors from the target emotion. This estimation formula can now be used to generate a wide variety of such tasks with different difficulty. However, as this is only an initial estimate, the difficulty levels of the tasks have to be constantly adjusted, similar to the player’s skill assessment. This will be explained in the next section.
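A compact Python sketch of this estimation pipeline is given below. The valence/arousal weights follow the reconstruction of equation (4.3) above, and the example emotions, reference values, and recognition difficulties are made up for illustration; they are assumptions, not the constants of the EVA system.

```python
from math import sqrt


def dist(e1, e2):
    """Weighted, normalized distance of two emotions in valence-arousal space
    (equation (4.3)); valence is weighted more heavily than arousal.
    e1 and e2 are (valence, arousal) tuples."""
    dv, da = e1[0] - e2[0], e1[1] - e2[1]
    return sqrt(4 * dv ** 2 + 2 * da ** 2) / 18


def eva_diff(dist1, dist2):
    """EVA score for differentiating both distractors from the target (eq. (4.4))."""
    return 2500 * (1 - sqrt(dist1 * dist2))


def eva_ident(eva_target, eva_d1, eva_d2):
    """Average recognition difficulty of the emotions in the task (eq. (4.5))."""
    return (eva_target + eva_d1 + eva_d2) / 3


def eva_task(ident, diff):
    """Overall task difficulty; differentiation is weighted twice (eq. (4.6))."""
    return (ident * diff ** 2) ** (1 / 3)


# Hypothetical example: target "anger", distractors "relieved" and "content",
# with made-up valence/arousal reference values and recognition difficulties.
anger, relieved, content = (-3.0, 3.0), (2.0, -2.0), (2.5, -1.5)
d1, d2 = dist(anger, relieved), dist(anger, content)
print(round(eva_task(eva_ident(1400, 1100, 1150), eva_diff(d1, d2))))
```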
3.5 Dynamic classification of learner skills and task difficulty

Since both player and task have an EVA score, both can compete against each other, much like two chess players. Any attempt to solve a task is considered a match. The task is treated like a player, and therefore when a task "wins" or "loses" against a player, its score improves or worsens as well. As a result, the assessment of already generated tasks becomes more and more accurate over time. As an example, if a player with an EVA score of 1425 takes on a task with a score of 1480, he has a 42 % chance of winning – similar to the Elo principle of rating player versus player. It is therefore statistically more likely that the user cannot solve the task. The following results are possible (with a k-value of 20; see also the sketch after this list):
– Task solved (unlikely result): The player wins 12 EVA points; the EVA score of the task drops by 12 points. Although it was less likely that the player could solve the task, he still solved it. Probably the player is better than expected and/or the task is easier than expected.
– Task not solved (more likely result): The match ended as expected. The player loses 8 EVA points; after all, it would have been possible to solve the task, and the EVA score of the task increases by 8 points. Nevertheless, the player loses fewer points than he could have won, as winning the match was less likely.
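Using the expected_score and update_elo functions from the illustrative sketch in Section 3.2 (and not the EVA code base itself), this example can be reproduced directly:

```python
player, task, k = 1425.0, 1480.0, 20.0
p_win = expected_score(player, task)      # ~0.42
solved = update_elo(player, task, 1, k)   # player gains ~12 points
failed = update_elo(player, task, 0, k)   # player loses ~8 points
print(round(p_win, 2), round(solved - player), round(failed - player))
# 0.42 12 -8
```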
In our training concept, the player plays a session of several tasks from different modules in one sitting. The tasks within a session are selected or generated based on the player's current EVA score. While playing the session, his EVA score is constantly updated in the background. Only at the end of the session are the new score and its change presented, averaged over all abilities and modules (see Fig. 4.4).
Figure 4.4: The feedback screen reflects the user’s performance (in German language). Left: EVA score. Center: experience level. Right: library of emotions. Bottom: user feedback.
The EVA score is divided into different groups, similar to titles in chess. The naming of the groups was chosen to avoid degrading users with a stigmatizing label like beginner or amateur. We defined a group hierarchy consisting of Bronze, Silver, Gold, Platinum, and Diamond. The updated EVA score of the player is used again when creating the next session to generate or select new tasks. The pool of generated tasks is shared between all players. A player can therefore be given a task in a session that was generated or played by another player. In this way, the estimates for both the players and the training tasks become increasingly accurate over the course of the training sessions, and the generated sessions are tailored more and more to the skills of the players. An additional measure is the experience level (see Fig. 4.4). It expresses nothing about the training progress, but is merely a measure of the invested training time.
Unlike the EVA score, it is not intended to communicate training success but to reward the endurance invested in the training.
4 The emotional state of the learner as a supporting parameter for adaptivity

In Section 3 we showed how theoretical knowledge about emotions was used for the adaptation of training tasks in the context of an emotion recognition training system. Conversely, in Section 4 we will describe how the emotions of the trainee himself can be leveraged to improve the training outcome. The basic assumption is that it should be possible to automatically assess the emotional state of the learner, for example from their facial expressions during the training, since humans have that capability, too. However, this poses several technical and conceptual challenges, as the next sections will describe. In brief, it has to be considered which physiological signals can be captured easily and without substantial discomfort for the user in order to assess their emotions. Furthermore, not all emotional states that can potentially be automatically assessed are of equal importance for determining learning success and the adequacy of the training situation. Finally, if learning-relevant states can be detected, the question is how this information can be used to maintain a positive engagement of the user with the training system. The following section will discuss these aspects in detail and provide an overview of the current knowledge and state of the art. Furthermore, we will provide an example of how automatic emotion recognition technology can be used successfully despite its shortcomings.
4.1 Automatic emotion recognition from facial expressions

Automatic emotion recognition is the classification of a subject's emotional state by computational methods. In the context of adaptive learning systems, the aim of automatic emotion recognition is to assess the current motivation and ability of a user in order to foster further engagement and learning gain. Automatic emotion recognition is conventionally achieved by applying Machine Learning algorithms to labeled emotional expression data. In the training phase such an algorithm computationally extracts the underlying patterns that separate the classes given by the labels of the data. The set of complex rules found through this process is called a classifier and can be applied to unseen data.

Physiological signals, such as skin conductance response or heart rate [38], can be used for emotion classification; however, these methods usually require scientific equipment not available outside of the laboratory. FEs, on the other hand, provide emotional information and can be recorded with conventional webcams, which are widely available. FEs can therefore be used to build emotion classifiers. Available FE datasets typically consist of static pictures of a number of actors displaying each or some of the six basic emotions, for example [42]. Most datasets only feature actors facing the camera, although there are exceptions, for example the Amsterdam Dynamic Facial Expression Set (ADFES) [68], which contains the additional emotions contempt, pride, and embarrassment and which also contains videos with head movement during the display of facial expressions. An issue that also lies within the limitations of available datasets is the use of acted expressions instead of spontaneous displays of emotion. Classifiers built on posed facial expressions can be expected to suffer from a loss of accuracy when applied to spontaneous emotional expressions.

Traditionally, FE classifiers were built with established Machine Learning techniques such as support vector machines or random forests and a combination of various feature extraction methods. Depending on the datasets used for evaluation, accuracy rates greater than 90 % could be reached (for example [72]), which is comparable to human emotion recognition skills. An extensive overview of classification approaches, common datasets of the field, and respective results can be found in [62]. With the recent advent of Deep Learning classification techniques, accuracy rates could be further improved. For example, Lopes et al. [46] presented Deep Learning approaches that beat many of the existing state-of-the-art methods across a variety of typically used datasets.
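The following Python sketch outlines the traditional approach (hand-crafted features plus an SVM classifier). The dataset is synthetic and the feature extraction is only stubbed out; real systems would use facial landmarks or texture descriptors rather than raw pixels, and none of this reflects the specific classifiers cited above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

LABELS = ["happiness", "sadness", "anger", "fear", "surprise", "disgust"]


def extract_features(image: np.ndarray) -> np.ndarray:
    # Placeholder: flatten the (grayscale) face crop into a feature vector.
    return image.reshape(-1)


# Hypothetical balanced dataset: 120 face crops of 48x48 pixels, 20 per emotion.
rng = np.random.default_rng(0)
images = rng.random((120, 48, 48))
labels = np.repeat(np.arange(len(LABELS)), 20)

X = np.stack([extract_features(img) for img in images])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, labels, cv=5).mean())  # chance level on random data
```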
4.2 Relevant emotions for learning environments

While the automatic recognition of basic emotions under ideal conditions could be considered a solved problem, the sensible integration of these methods into learning environments proves to be difficult. This might be because Ekman's basic emotions are rarely triggered during learning and their relationship with learning outcomes remains unclear. More relevant to learning might be emotions such as engagement, confusion, boredom, or frustration. Engagement and also confusion have been shown to be positively correlated with learning gain, while boredom and frustration have been found to correlate negatively with learning gain [28, 3, 11]. D'Mello and Graesser [15] synthesize these findings in a framework that describes confusion as an indicator of cognitive disequilibrium, which can either be resolved, in which case the learner returns to a state of equilibrium and engagement, or, if unresolved, turn into a state of frustration and later boredom. They reason that cognitive disequilibrium is essential for deep learning. In that sense, a fruitful learning experience should entail frequent obstacles that are overcome by the learner by additional
causal reasoning that will lead to a deeper conceptual understanding. Prolonged periods of confusion and frustration should however be avoided because they will lead to boredom and disengagement.
4.3 Challenges of automatic emotion recognition

Outside of the laboratory, automatic FE emotion recognition systems are faced with additional disturbances such as users' head and body movements, varying illumination conditions and camera angles, and partial or full obstruction of the face. This is reflected in drastically reduced accuracy rates on naturalistic test sets compared to what is known from standardized test data. For example, the "Emotions in the Wild" competition contains a challenge on classifying facial expressions cropped out of movies and TV series into basic emotion categories. The winner of this competition could only achieve 62 % accuracy on the test set [14]. These results illustrate that the major challenge of automatic emotion recognition systems might no longer lie in the classification task itself but in dealing with the circumstances of the natural environments in which they are to be employed.

Another non-invasive way to measure emotions, apart from FE recognition, is to analyze mouse movements and keyboard input. In the laboratory, it was shown that both forms of input are suitable for capturing the emotional state of the user [73, 74, 47, 29, 34]. The presented methods show that it is possible, for example while surfing the Internet, to detect the user's emotional state unnoticed via the browser. In the case of negative emotions, the system behavior can then be adapted by means of assistance or explanations.

Another challenge is to translate readings of emotional state into changes in the learning experience. This poses questions on how to best transfer users from undesired emotional states into learning-facilitating states. Even though theoretical models exist which describe the transitions between emotional states during learning (see Section 4.2), little research has been conducted to validate these models. Hence, even less is known about how transitions from one emotional state into another can be successfully guided by adaptive changes of the training system.
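One conceivable way to perform such a translation is a simple policy that maps a detected learner state to an adaptation decision, loosely following the framework discussed in Section 4.2. The state names, thresholds, and actions in this Python sketch are illustrative assumptions, not validated rules.

```python
def choose_adaptation(state: str, seconds_in_state: float) -> str:
    """Map a detected learner state to a hypothetical adaptation action."""
    if state == "engagement":
        return "keep_difficulty"
    if state == "confusion" and seconds_in_state < 30:
        return "keep_difficulty"       # short confusion can be productive
    if state == "confusion":
        return "offer_hint"            # prolonged confusion: scaffold the task
    if state == "frustration":
        return "decrease_difficulty"
    if state == "boredom":
        return "increase_difficulty"
    return "keep_difficulty"


print(choose_adaptation("confusion", 45))  # offer_hint
```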
4.4 Training of facial expression

The technical prerequisites exist for reading out basic emotions, such as anger or happiness, with the use of automatic emotion recognition software, called face reading in the following. However, using such software outside of the laboratory usually proves difficult. Fortunately, in a game-based training context, a player can be motivated to provide appropriate conditions for face reading. This is partly because the face reading can be made explicit for the player. This means that the player can see himself on the display as part of the task during face reading. As a result, he or she can detect faults, such as glaring light, and correct them if necessary. In addition, the player has a self-interest in the face reading working well, since he wants to shine in the game. For the training system presented here, this led to a mimicry module. There are several variants of the module. As an example, in one variant, the player has to imitate an actor-played basic emotion to a certain extent (see Fig. 4.5).
Figure 4.5: The automated rating of similarity in facial mimicry. (left) An actor displays an emotion. (right) The player is requested to imitate it over a certain period of time.
The training success in performing mimicry tasks can only be used for the EVA score algorithm if this success is quantifiable. The challenge is to automatically calculate the similarity between a given expression and the user's imitation in such a way that the value is both meaningful for estimating training success and transparent as feedback to the user.
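One plausible way to quantify this similarity, sketched below in Python, is to compare the per-frame basic-emotion probabilities of the actor clip and of the player's webcam stream (as produced by some face-reading library) and average their cosine similarity over time. This is an assumption for illustration, not the similarity measure used in the EVA system.

```python
import numpy as np


def mimicry_similarity(actor_probs: np.ndarray, player_probs: np.ndarray) -> float:
    """actor_probs, player_probs: arrays of shape (frames, 6); each row holds
    the probabilities of the six basic emotions for one frame."""
    n = min(len(actor_probs), len(player_probs))
    a, p = actor_probs[:n], player_probs[:n]
    cos = np.sum(a * p, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(p, axis=1))
    return float(np.mean(cos))  # 1.0 = identical emotional profile over time


# Toy example with two 3-frame sequences over the six basic emotions.
actor = np.array([[0.8, 0.05, 0.05, 0.05, 0.03, 0.02]] * 3)
player = np.array([[0.6, 0.1, 0.1, 0.1, 0.05, 0.05]] * 3)
print(round(mimicry_similarity(actor, player), 2))
```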
5 Discussion

The Elo rating provides a straightforward mechanism which, as shown, can also be used to assess trainees and training tasks in a game-based training context. However, there are limits. What constitutes its simplicity, namely, that only victories and defeats are evaluated, is also a weakness for the adaptation: Only the outcome is considered for the calculation, but not the cause. For the player, it remains unclear why he was successful or not. Was it just his abilities, or was he unfocused, distracted,
or simply in a bad mood? A possible intervention is to ask simple questions to the player after training, such as: "Was the session instructive for you?" However, a less invasive method of detecting frustration or motivation, such as emotion recognition through face reading, would be a rewarding alternative to better assess player and task performance. Moreover, a player could perform better for some emotions than for others, or for some actors better than for others. An open question is whether the actual task difficulty depends only on the selection of the emotions as target and distractor, or whether other factors (such as the acting performance, gender, or age of the actors) play a role. Although these factors affect the assessment of the tasks, this happens only implicitly over time. Further research and more precise models are needed to integrate this into the calculations. Clinical studies of previous (non-adaptive) versions of such training have demonstrated its effectiveness in terms of learning outcome, i. e., sustainable improvement of social cognition and social behavior [40]. A study examining the usability and flow of our software was conducted by our project partners. In addition, we performed an investigation into the progress of users and the accuracy of EVA score predictions for newly generated tasks. As previously noted, we have used a science-based algorithm to predict the EVA score (difficulty level) of newly generated tasks (see Section 3). From a computer science point of view, one of our core research interests was to find out how well this estimate fits reality. For this purpose, we stored the estimated EVA score for each generated task. After the study, we then measured the difference between the estimated EVA score and the actual EVA score after several runs. The average relative difference in relation to the number of task occurrences is shown in Fig. 4.6 as an example for the Face Puzzle.
Figure 4.6: The estimated EVA scores do not differ much from the actual measurement, even after several runs.
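The EVA score follows the logic of the Elo rating discussed above: the player effectively "plays against" a task, and both ratings are updated from the observed outcome. The sketch below shows the standard Elo update as described by Elo [22] and, for adaptive educational systems, by Pelanek [54]. The starting value of 1200 mirrors the text; the K-factor and function names are illustrative assumptions, and the concrete update used in EVA may differ.

```python
def expected_score(player_rating, task_rating):
    # Standard Elo expectation: probability that the player solves ("wins against") the task.
    return 1.0 / (1.0 + 10 ** ((task_rating - player_rating) / 400.0))

def update_ratings(player_rating, task_rating, solved, k=32.0):
    """Update player and task ratings after one task attempt.

    solved: 1.0 if the task was solved, 0.0 otherwise.
    Returns the new (player_rating, task_rating) pair.
    """
    expected = expected_score(player_rating, task_rating)
    delta = k * (solved - expected)
    # The player gains what the task loses, so task difficulty is re-estimated as well.
    return player_rating + delta, task_rating - delta

# Example: a new user (1200) solves a task currently rated slightly harder (1300).
player, task = update_ratings(1200.0, 1300.0, solved=1.0)
```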
Our measurements have shown that, even after several runs of a single task, the EVA score of a task differs on average by no more than 5 % (relative) between estimation and measurement. This suggests that, while there is still room for improvement, our estimate is already very accurate.

Fig. 4.7 shows the development of EVA scores for multiple users across their training sessions. EVA scores increase continuously for most users, which reflects an increase in the users' emotion recognition abilities. Different learning rates, represented by the slopes of the lines, are also evident: while some users' scores increase rapidly, others show rather slow increases. Only a few users show stagnating scores and therefore little or no increase in their measured emotion recognition abilities over the course of their training sessions. Overall, this exemplifies that the EVA scoring system is able to measure the training success of the users accurately and, furthermore, that most users benefit from the training sessions.
Figure 4.7: Progress of EVA scores for one module over the course of the users’ training sessions. Each line represents a user’s EVA score. All users start with a score of 1200. Most users’ scores increase over time.
Figure 4.8 shows the flow state measured using the Flow Short Scale questionnaire [57], which distinguishes the flow sub-factors fluency of performance (FI) and absorption by activity (FII), as well as the additional factor concern (FIII).
Figure 4.8: Flow for EVA users with Autism Spectrum Conditions (ASC) and neurotypical users (NT) compared to other systems or tasks. Studies on socio-emotional skills are highlighted in color.
The highlighted bars show training tools that train socio-emotional skills (see also [66]). The other bars are comparative values from the work of Rheinberg et al. The group of neurotypical users shows a relatively high level of absorption and fluency and relatively little concern. Overall, the EVA system shows high values for flow experience. In the other studies, the highest values for flow and concern were found in a sample of graffiti sprayers. The high values for concern are consistent with findings from [70], where autistic users reported significantly higher levels of effort than neurotypical users (67 ± 20 versus 39 ± 26) when performing the training with EVA, measured with the NASA Task Load Index [31].

The usability of the EVA app was measured with the System Usability Scale (SUS). Two groups with different characteristics were studied: people with autism spectrum condition (ASC) and neurotypical people (NT). Each group was further divided into a laboratory and a longitudinal testing group. A laboratory session lasted 90 minutes, and the longitudinal setting comprised two weeks of home-based training with the EVA app. The values for the laboratory NT group (M = 81.9; SD = 11) and the longitudinal ASC group (M = 82.5; SD = 6.6) are almost in the "excellent" range (see Fig. 4.9).
Figure 4.9: SUS Scores from our EVA study and original ranking from [4].
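The SUS scores reported above follow the standard scoring scheme of the questionnaire: ten items are answered on a 1 to 5 scale, odd items contribute (response minus 1), even items contribute (5 minus response), and the sum is multiplied by 2.5, yielding a value between 0 and 100. A minimal sketch of this computation (not specific to the EVA study):

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 item responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

def group_mean(all_responses):
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```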
Previous research has provided evidence for the effectiveness of adaptive training only for students with low prior knowledge and only under some conditions [17]. Further research is required to assess which adaptive features (like task difficulty, duration, and composition, or the display of additional information) are effective in general, and which only for specific training setups. This could be measured in terms of impact on learning gain, training time, or user satisfaction. Another option is to assess the smoothness of changes in task difficulty, for instance by determining the user's motivation or intensity of flow throughout the training (either explicitly by using a questionnaire, or implicitly by monitoring his or her behavior). Unfortunately, explicit assessment will most likely disturb the flow effect and the gaming experience in general, while methods for an implicit and unobtrusive measurement still have to be designed.

A general challenge for such research is to shift emotion recognition from clinical labs to real-life conditions. This is not only a technical issue, since there are plenty of mechanisms less prone to disturbances than facial expression detection, as discussed in Section 4.4, and multimodal solutions can provide an additional increase in accuracy [16]. In-situ research becomes even more important since emotions are a very personal affair, and thus expressing and recognizing complex emotions, in particular, may benefit from a familiar environment. Also, socio-emotional training will preferably take place in informal settings. If the robustness of automatic emotion recognition can be improved under real-life conditions, user satisfaction, flow states, and engagement could be assessed dynamically, which would provide additional cues for assessing and adjusting the difficulty of individual exercises.
6 Conclusion
In this chapter, we presented the scientific principles and the results of developing adaptive training systems based on emotions. First, emotions as a subject of training were discussed. We showed that the training can be improved by adapting the difficulty of tasks based on the difference between emotions, which was derived from empirical findings on human emotion recognition. Our evaluation of the adaptivity algorithm in EVA demonstrated that (i) our initial rating of the difficulty of previously unplayed tasks was appropriate, (ii) the adaptation of task difficulty converged once more than five sessions of a task had been played, and (iii) the skill level of users (both autistic and neurotypical) continuously increased over time. Previous studies have shown that social cognition as well as social behavior in general can be sustainably enhanced by such training. This still has to be demonstrated for EVA in a clinical intervention study. Moreover, it remains to be verified whether adapting the difficulty of tasks in EVA helps to keep users in a state of flow or to maintain their motivation for training.

On that basis, we presented mechanisms to leverage emotions as an additional parameter for adaptivity. However, facial emotion detection is still a challenge,
especially when it comes to real-life conditions outside the lab. Adapting the behavior of a system to the emotions of a user, so-called affective computing, is on the rise. A typical field of application is education. However, this brings up several additional issues. From a technical point of view, the demand for affect detection by different applications calls for offering it as a basic service across the computer system (instead of integrating a separate library into every single app, as is common today). This requires standardized interfaces for such services, perhaps as part of the operating system or close to it, which can be invoked by various apps. From a societal point of view, training systems that react to the emotions of the learner are of interest in a much broader scope than considered today. Regarding the target group, there is a current focus on people with autism, while other user groups (in terms of profession or psychological disposition) could also benefit from training their social cognition. Regarding the setting, there is high potential for emotion-sensitive training on various educational levels, from school to university and vocational training. However, even if automated emotion recognition works properly under real-life conditions, the question remains how a training system should translate this information into changes in the learning experience. Currently, there is little experience, and few empirical studies are available, since an interdisciplinary understanding of actually executable models from didactics and psychology is not yet well developed.

As a final remark, there are several ethical questions associated with the use (and potential misuse) of such technology. The consequences for individuals and for society as a whole of having emotion recognition and emotion-sensitive systems fully operational are still to be understood [18]. For instance, using a training system for social cognition as an invisible assessment tool is not intended, but possible, e. g., by employers. Moreover, the mere availability of such training may shift societal norms towards a mainstreamed understanding of how people should behave in order to be accepted (in terms of tolerance) and supported legally (e. g., when granting public aid). Ethical guidelines have to be further shaped for this.
References

[1] Anderson, J. R. 1983. The Architecture of Cognition. Cambridge, MA: Harvard University Press.
[2] Anderson, J. R., Corbett, A. T., Koedinger, K., and Pelletier, R. 1995. Cognitive tutors: Lessons learned. The Journal of Learning Sciences, 4, 167–207.
[3] Baker, R. S., S. K. D'Mello, M. M. T. Rodrigo, and A. C. Graesser. 2010. Better to be frustrated than bored: The incidence, persistence, and impact of learners' cognitive–affective states during interactions with three different computer-based learning environments. International Journal of Human-Computer Studies, 68(4), 223–241.
[4] Bangor, A., Kortum, P., and Miller, J. 2009. Determining what individual SUS scores mean: adding an adjective rating scale. Journal of Usability Studies, 4(3), 114–123.
[5] Barrett, L. F., B. Mesquita, K. N. Ochsner, and J. J. Gross. 2007. The experience of emotion. Annu. Rev. Psychol., 58: 373–403.
[6] Bölte, S., Feineis-Matthews, S., Leber, S., Dierks, T., Hubl, D., and Poustka, F. 2002. The Development and Evaluation of a Computer-Based Program to Test and to Teach the Recognition of Facial Affect. International Journal of Circumpolar Health, 61(02), 61–68.
[7] Brusilovsky, P. 1998. Methods and techniques of adaptive hypermedia. In: Adaptive hypertext and hypermedia. Dordrecht: Springer, pp. 1–43.
[8] Cabanac, M. 2002. What is emotion? Behavioural Processes, 60(2): 69–83.
[9] Chambers, J., and J. Sprecher. 1983. Computer-Assisted Instruction: Its Use in the Classroom. Englewood Cliffs, New Jersey: Prentice-Hall.
[10] Coffield, F. 2012. Learning styles: unreliable, invalid and impractical and yet still widely used. In P. Adey and J. Dillon (eds.), Bad education: debunking myths in education. Maidenhead, UK: Open University Press, pp. 215–230.
[11] Craig, S., A. Graesser, J. Sullins, and B. Gholson. 2004. Affect and learning: an exploratory look into the role of affect in learning with AutoTutor. Journal of Educational Media, 29(3): 241–250.
[12] Damasio, A. R. 1998. Emotion in the perspective of an integrated nervous system. Brain Research Reviews, 26(2–3): 83–86.
[13] Darwin, C. 1872. The expression of emotion in animals and man. London: Murray.
[14] Dhall, A., R. Goecke, J. Joshi, J. Hoey, and T. Gedeon. 2016. Emotiw 2016: Video and group-level emotion recognition challenges. In: Proc. International Conference on Multimodal Interaction. ACM, pp. 427–432.
[15] D'Mello, S., and A. Graesser. 2012. Dynamics of affective states during complex learning. Learning and Instruction, 22(2): 145–157.
[16] D'Mello, S., and J. Kory. 2015. A review and meta-analysis of multimodal affect detection systems. ACM Computing Surveys, 47(3): 43.
[17] D'Mello, S., B. Lehman, J. Sullins, R. Daigle, R. Combs, K. Vogt, L. Perkins, and A. Graesser. 2010. A time for emoting: When affect-sensitivity is and isn't effective at promoting deep learning. In: Proc. International Conference on Intelligent Tutoring Systems. Berlin: Springer, pp. 245–254.
[18] Dziobek, I., U. Lucke, and A. Manzeschke. 2017. Emotions-sensitive Trainingssysteme für Menschen mit Autismus: Ethische Leitlinien. In: Proc. Informatik 2017, LNI P-275. Bonn: Köllen, pp. 369–380.
[19] Ekman, P. 1992. An argument for basic emotions. Cognition & Emotion, 6(3–4): 169–200.
[20] Ekman, P., R. J. Davidson, and W. V. Friesen. 1990. The Duchenne smile: Emotional expression and brain physiology: II. Journal of Personality and Social Psychology, 58(2): 342.
[21] Elfenbein, H. A., and N. Ambady. 2002. On the universality and cultural specificity of emotion recognition: a meta-analysis. Psychological Bulletin, 128(2): 203.
[22] Elo, A. E. 2008. The rating of chessplayers, past and present. Mountain View, CA: Ishi Press.
[23] Fischer, A., and M. LaFrance. 2015. What drives the smile and the tear: Why women are more emotionally expressive than men. Emotion Review, 7(1): 22–29.
[24] Fölster, M., U. Hess, and K. Werheid. 2014. Facial age affects emotional expression decoding. Frontiers in Psychology, 5: 30.
[25] Fontaine, J. R., K. R. Scherer, E. B. Roesch, and P. C. Ellsworth. 2007. The world of emotions is not two-dimensional. Psychological Science, 18(12): 1050–1057.
[26] Freudenberg, M., R. B. Adams Jr, R. E. Kleck, and U. Hess. 2015. Through a glass darkly: facial wrinkles affect our processing of emotion in the elderly. Frontiers in Psychology, 6: 1476.
[27] Garris, R., R. Ahlers, and J. Driskell. 2002. Games, motivation, and learning: A research and practice model. Simulation & Gaming, 33(4): 441–467.
[28] Grafsgaard, J. F., J. B. Wiggins, K. E. Boyer, E. N. Wiebe, and J. C. Lester. 2013. Automatically recognizing facial indicators of frustration: a learning-centric analysis. In: Affective Computing and Intelligent Interaction (ACII). IEEE, pp. 159–165.
[29] Grimes, M., J. L. Jenkins, and J. S. Valacich. 2013. Exploring the Effect of Arousal and Valence on Mouse Interaction. In: Proc. International Conference on Information Systems.
[30] Hall, J. A., and D. Matsumoto. 2004. Gender differences in judgments of multiple emotions from facial expressions. Emotion, 4(2): 201.
[31] Hart, S. G., and L. E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. Advances in Psychology, 52: 139–183.
[32] Hepach, R., D. Kliemann, S. Grüneisen, H. R. Heekeren, and I. Dziobek. 2011. Conceptualizing emotions along the dimensions of valence, arousal, and communicative frequency – implications for social-cognitive tests and training tools. Frontiers in Psychology, 2: 266.
[33] Hess, U., and P. Bourgeois. 2010. You smile–I smile: Emotion expression in social interaction. Biological Psychology, 84(3): 514–520.
[34] Hibbeln, M., J. L. Jenkins, C. Schneider, J. S. Valacich, and M. Weinmann. 2017. How Is Your User Feeling? Inferring Emotion Through Human–Computer Interaction Devices. MIS Quarterly, 41(1): 1–21.
[35] Hoffmann, H., H. Kessler, T. Eppel, S. Rukavina, and H. C. Traue. 2010. Expression intensity, gender and facial emotion recognition: Women recognize only subtle facial emotions better than men. Acta Psychologica, 135(3): 278–283.
[36] Izard, C. E. 1990. Facial expressions and the regulation of emotions. Journal of Personality and Social Psychology, 58(3): 487.
[37] Keller, J. 1983. Motivational design of instruction. In C. Reigeluth (ed.), Instructional design theories and models: An overview of their current studies. Hillsdale, NJ: Erlbaum.
[38] Kim, K. H., S. W. Bang, and S. R. Kim. 2004. Emotion recognition system using short-term monitoring of physiological signals. Medical and Biological Engineering and Computing, 42(3): 419–427.
[39] Kinshuk, and A. Patel. 1997. A Conceptual Framework for Internet Based Intelligent Tutoring Systems. Knowledge Transfer, II, 117–124.
[40] Kirst, S., R. Diehm, S. Wilde-Etzold, M. Ziegler, M. Noterdaeme, L. Poustka, and I. Dziobek. 2017. Fostering socio-emotional competencies in children with autism spectrum condition: Results of a randomized controlled trial using the interactive training app "Zirkus Empathico". In: Proc. International Meeting for Autism Research (IMFAR), pp. 743–744.
[41] Kohls, G., et al. 2018. Altered reward system reactivity for personalized circumscribed interests in autism. Molecular Autism, 9.
[42] Lucey, P., J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews. 2010. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: Proc. Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE CS Press, pp. 94–101.
[43] Lucke, U. 2006. An Algebra for Multidimensional Documents as Abstraction Mechanism for Cross Media Publishing. In: Proc. Automated Production of Cross Media Content for Multi-Channel Distribution (AXMEDIS). IEEE CS Press, pp. 165–172.
[44] Lucke, U., and C. Rensing. 2014. A survey on pervasive education. Pervasive and Mobile Computing, 14: 3–16.
[45] Lucke, U., and M. Specht. 2012. Mobility, Adaptivity and context awareness in e-learning. i-com, 11(01): 26–29.
[46] Lopes, A. T., E. de Aguiar, A. F. De Souza, and T. Oliveira-Santos. 2017. Facial expression recognition with convolutional neural networks: coping with few data and the training sample order. Pattern Recognition, 61: 610–628.
[47] Maehr, W. 2008. eMotion: Estimation of User's Emotional State by Mouse Motions. Saarbrücken: VDM.
[48] Mazurski, E. J., N. W. Bond, D. A. Siddle, and P. F. Lovibond. 1996. Conditioning with facial expressions of emotion: effects of CS sex and age. Psychophysiology, 33(4): 416–425.
[49] McDuff, D., E. Kodra, R. el Kaliouby, and M. LaFrance. 2017. A large-scale analysis of sex differences in facial expressions. PloS one, 12(4): e0173942.
[50] Moebert, T., H. Jank, R. Zender, and U. Lucke. 2014. A Generalized Approach for context-Aware Adaption in Mobile E-Learning Settings. In: Proc. Advanced Learning Technologies (ICALT). IEEE CS Press, pp. 143–145.
[51] Montagne, B., R. P. Kessels, E. Frigerio, E. H. de Haan, and D. I. Perrett. 2005. Sex differences in the perception of affective facial expressions: Do men really lack emotional sensitivity? Cognitive Processing, 6(2): 136–141.
[52] Montirosso, R., M. Peverelli, E. Frigerio, M. Crespi, and R. Borgatti. 2010. The Development of Dynamic Facial Expression Recognition at Different Intensities in 4- to 18-Year-Olds. Social Development, 19(1): 71–92.
[53] Orji, R., R. L. Mandryk, and J. Vassileva. 2017. Improving the Efficacy of Games for Change Using Personalization Models. ACM Transactions on Computer-Human Interaction, 24(5): 32.
[54] Pelanek, R. 2016. Applications of the Elo Rating System in Adaptive Educational Systems. Computers & Education, 98(C): 169–179.
[55] Peter, C., and A. Herbon. 2006. Emotion representation and physiology assignments in digital systems. Interacting with Computers, 18(2): 139–170.
[56] Phillips, L. H., and R. Allen. 2004. Adult aging and the perceived intensity of emotions in faces and stories. Aging Clinical and Experimental Research, 16(3): 190–199.
[57] Rheinberg, F., Vollmeyer, R., and Engeser, S. 2003. Die Erfassung des Flow-Erlebens. In J. Stiensmeier-Pelster and F. Rheinberg (eds.), Diagnostik von Selbstkonzept, Lernmotivation und Selbstregulation. Tests und Trends, vol. 16. Göttingen: Hogrefe, pp. 261–279.
[58] Riediger, M., M. C. Voelkle, N. C. Ebner, and U. Lindenberger. 2011. Beyond "happy, angry, or sad?": Age-of-poser and age-of-rater effects on multi-dimensional emotion perception. Cognition & Emotion, 25(6): 968–982.
[59] Ruffman, T., J. D. Henry, V. Livingstone, and L. H. Phillips. 2008. A meta-analytic review of emotion recognition and aging: Implications for neuropsychological models of aging. Neuroscience & Biobehavioral Reviews, 32(4): 863–881.
[60] Russell, J. A. 2003. Core affect and the psychological construction of emotion. Psychological Review, 110(1): 145.
[61] Russell, J. A., and L. F. Barrett. 1999. Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant. Journal of Personality and Social Psychology, 76(5): 805.
[62] Sariyanidi, E., H. Gunes, and A. Cavallaro. 2015. Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6): 1113–1133.
[63] Sharples, M., I. Arnedillo Sanchez, M. Milrad, and G. Vavoula. 2009. Mobile learning: small devices, big issues. In: Technology Enhanced Learning: Principles and Products. Heidelberg: Springer, pp. 233–249.
[64] Soare, E. 2015. Perspectives on Designing the Competence Based Curriculum. Procedia – Social and Behavioral Sciences, 180: 972–977.
[65] Thompson, A. E., and D. Voyer. 2014. Sex differences in the ability to recognise non-verbal displays of emotion: A meta-analysis. Cognition and Emotion, 28(7): 1164–1195.
[66] Tscherejkina, A., A. Morgiel, and T. Moebert. 2018. Computergestütztes Training von sozio-emotionalen Kompetenzen durch Minispiele. Evaluation von User Experience. In: E-Learning Symposium 2018. Universitätsverlag Potsdam. DOI: 10.25932/publishup-42071.
[67] Uljarevic, M., and A. Hamilton. 2012. Recognition of Emotions in Autism: A Formal Meta-Analysis. Journal of Autism and Developmental Disorders, 43(7): 1517–1526.
[68] Van Der Schalk, J., S. T. Hawk, A. H. Fischer, and B. Doosje. 2011. Moving faces, looking places: Validation of the Amsterdam Dynamic Facial Expression Set (ADFES). Emotion, 11(4): 907.
[69] Wang, C., E. Shimojo, and S. Shimojo. 2015. Don't look at the eyes: Live interaction reveals strong eye avoidance behavior in autism. Journal of Vision, 15(12): 648.
[70] Weigand, A., L. Enk, T. Moebert, D. Zoerner, J. Schneider, U. Lucke, and I. Dziobek. 2019. Introducing E.V.A. – A New Training App for Social Cognition: Design, Development, and First Acceptance and Usability Evaluation for Autistic Users. In: 12th Scientific Meeting for Autism Spectrum Conditions, Augsburg, February 2019.
[71] Zimmermann, P., P. Gomez, B. Danuser, and S. G. Schär. 2006. Extending usability: Putting affect into the user-experience. In: Proc. NordiCHI'06, pp. 27–32.
[72] Zhao, G., and M. Pietikainen. 2007. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6): 915–928.
[73] Zimmermann, P., S. Guttormsen, B. Danuser, and P. Gomez. 2003. Affective computing – a rationale for measuring mood with mouse and keyboard. International Journal of Occupational Safety and Ergonomics, 9(4): 539–551.
[74] Zimmermann, P., P. Gomez, B. Danuser, and S. G. Schär. 2006. Extending usability: Putting affect into the user-experience. In: Proc. NordiCHI'06, pp. 27–32.
Dietmar Jannach, Michael Jugovac, and Ingrid Nunes
5 Explanations and user control in recommender systems
Beyond black box personalization systems
Abstract: Adaptive, personalized recommendations have become a common feature of today's web and mobile app user interfaces. In most modern applications, however, the underlying recommender systems are black boxes for the users, and no detailed information is provided about why certain items were selected for recommendation. Users also often have very limited means to influence (e. g., correct) the provided suggestions and to apply information filters. This can potentially lead to a limited acceptance of the recommendation system. In this chapter, we review explanations and feedback mechanisms as a means of building trustworthy recommender and advice-giving systems that put their users in control of the personalization process, and we outline existing challenges in the area.
Keywords: recommender systems, personalization, explanations, feedback, user control
ACM CCS: Human-centered computing, Collaborative and social computing
1 Introduction
Many of today's user interfaces of web and mobile applications feature system-generated, often personalized, and context-adaptive recommendations for their users regarding, for example, things to buy, music to discover, or people to connect with. To be able to automatically generate such tailored suggestions, the underlying recommender systems maintain a user profile, which serves as a basis to (i) infer individual users' preferences, needs, and current contextual situation and (ii) correspondingly select suitable items for recommendation.

Given the huge potential value of such systems for both consumers and providers [29], a variety of algorithmic approaches have been proposed in the 2000s and 2010s to generate suitable item recommendations for a given user profile. A prominent class of such systems is based on the principle of collaborative filtering, where the user profile consists of a set of recorded explicit or implicit preference statements of individual users, and recommendations are generated by also considering preference or behavioral patterns of a larger user community. Collaborative filtering approaches based on often complex Machine Learning models have been shown to lead to increased business value in practice in various application domains [12, 30, 36]. However, a potential limitation of these systems is that,
from the perspective of the end user, the factors and mechanisms that determine the provided recommendations usually remain a black box. This is in particular the case for popular technical approaches based on matrix factorization and, more recently, complex neural network architectures. In some applications, users may be given an intuition about the underlying recommendation logic, e. g., through a descriptive label like "Customers who bought … also bought …." Often, however, recommendation lists are only labeled with "Recommended for you" or "Based on your profile," and no in-depth explanation is given about how the recommendations were selected. In case no such information is provided, users may have doubts that the recommendations are truly the best choice for them and suspect that the recommendations are mostly designed to optimize the profit of the seller or the platform.

A potentially even more severe problem with such black box approaches can arise when the system's assumptions about the user's preferences are wrong or outdated. A typical example is when users purchase a gift for someone else but the system considers the gift to be part of the users' own interests. Many of today's systems provide no mechanisms for users to give feedback on the recommendations or to correct the system's assumptions [33]. In some cases, users might not even be aware of the fact that the content provided by the system is personalized according to some assumed preferences, as is probably the case in the news feeds of major social networks. In either case, when recommendations are of limited relevance for the users, they will eventually stop relying on the system's suggestions or, in the worst case, abandon the platform as a whole.

In the academic literature, different proposals have been made to deal with the described problems. One main stream of research is devoted to the topic of explanations for recommender systems [23, 50, 61], for example with the goal to make recommender systems more transparent and trustworthy. Explanations for decision support systems have, in fact, been explored for decades. We can, however, observe an increased interest in the field of explanations in the recent past, as more and more decisions are transferred to Machine Learning algorithms and, in many cases, these decisions must be open to scrutiny, e. g., to be able to assess the system's fairness.1

Providing explanations can, however, also serve as a starting point to address the second type of problem, i. e., how to better put users in control of their recommendations. Some e-commerce platforms, like Amazon.com, present explanations in the form "Because you bought" and let their users give feedback on whether this recommendation reasoning should be applied by the system in the future. Providing user control mechanisms in the context of explanations is, however, only one of several approaches proposed in the literature, and we review various approaches in this chapter.

1 This aspect is of increasing importance also due to the European Union's recent General Data Protection Regulation (https://eur-lex.europa.eu/eli/reg/2016/679/oj), which aims to provide more transparency and additional rights for individuals in cases where decision making is done on a solely algorithmic basis.
Generally, both explanations and user control mechanisms represent a potential means to increase the user's trust in a system and to increase the adoption of the recommendations and advice it provides. In this chapter, we provide an overview of approaches from both areas and highlight the particular potential of explanation-based user control mechanisms.
2 Explanations in recommender systems

2.1 Purposes of explanations
There are a number of possible ways in which one can explain the recommendations of a system to a user. When designing an explanation facility for a recommender system, one therefore has to consider what should be achieved by adding an explanation component to an application, i. e., what its purpose(s) should be. For example, in an e-commerce system, a seller might be interested in persuading customers to buy particular products or in increasing their trust in order to promote loyalty. In early medical expert systems, explanations were, for example, often provided in terms of the system's internal inference rules, which allowed users to understand or check the plausibility of the provided diagnosis or advice. But understanding the system's decision was soon recognized not to be the only reason for including explanations. Buchanan and Shortliffe [6] list debugging, education, acceptance, and persuasion as additional potential goals in the context of expert systems. This list was later extended with additional perspectives in [61] and [50], leading to a more comprehensive list, shown in Table 5.1.

Table 5.1: Explanation purposes (based on [6, 61, 50]).

Purpose        | Description                                     | Example works
Transparency   | Explain how the system works                    | [23, 65, 20]
Effectiveness  | Help users make good decisions                  | [19, 3, 16]
Trust          | Increase users' confidence in the system        | [25, 54, 5]
Persuasiveness | Convince users to try or buy                    | [23, 67, 1]
Satisfaction   | Increase the ease of use or enjoyment           | [54, 19, 3]
Education      | Allow users to learn something from the system  | [22, 18, 41]
Scrutability   | Allow users to tell the system it is wrong      | [35, 27, 17]
Efficiency     | Help users make decisions faster                | [54, 2, 43]
Debugging      | Help users identify defects in the system       | [11, 39, 26]
The entries in the table are organized by their importance in the research literature according to the survey presented in [50]. In the majority of the cases, research papers focused mostly on one single purpose, like in the seminal work by Herlocker
et al. [23]. There are, however, also works that investigate multiple dimensions in parallel [14, 26]. In a number of research works on explanations, in particular earlier ones, the authors did not explicitly state for which purpose their explanation facility was designed [10]. In several cases, the purpose cannot even be inferred indirectly, because a surprisingly large fraction of works lacks a systematic evaluation of the explanation component [50]. Explanations are in general one of the natural "entry points" for giving users control over the recommendations, e. g., by displaying the assumptions about the user's preferences for inspection and correction. However, in a review of over 200 papers on the topic of explanations, Nunes and Jannach [50] could identify only seven works that focused on scrutability, i. e., allowing the user to correct the system, which indicates a major research gap in this area.
2.2 Explanation approaches
We can find a variety of ways of explaining the suggestions of a recommender or, more generally, advice-giving system in the literature. The choice of the type of information that is used for explaining and how it is presented to the user depends on different factors. These can, for example, include the availability of certain types of information (e. g., an explicit inference chain) or the specific application domain. With respect to the explanation content, four main content categories – summarized as follows – were identified in [50]. Table 5.2 exemplifies how a system can present these different types of content in the context of an interactive recommender system for mobile phones.
– Preferences and user inputs: Explanations in this category refer to the specific user inputs or inferred preferences that led to the given recommendation. For example, the explanation details to what extent a recommended alternative matches the user's assumed preferences, or presents the predicted user rating.
– Inference process: Historically, inference traces were popular in classical expert systems. With today's complex Machine Learning algorithms, such inference chains are not available. Instead, one approach can be to explain the system's general reasoning strategy, e. g., that it recommends objects that similar users liked.
– Background knowledge and complementary information: Explanation approaches of this type use additional information, for example the popularity of a recommended alternative in the entire community or among users with similar profiles, to generate explanations.
– Alternatives and their features: This type of explanation focuses on certain attributes of the recommended alternative. They, for example, point out the decisive features of an item, show pros and cons of different alternatives, or highlight where one alternative dominates another.
Table 5.2: Examples of explanation content categories.

Content                         | Explanation example
Preferences and user inputs     | "We recommend this phone because you specified that you prefer light-weight models."
Inference process               | "We consider light phones to weigh less than 150 g."
Background knowledge            | "We recommend this phone because it is currently popular in our shop."
Alternatives and their features | "This camera has a removable battery, which other similar models do not have."
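To illustrate how the "preferences and user inputs" and "alternatives and their features" categories could be produced in practice, the following sketch generates explanation sentences from a stated preference profile and the attributes of the recommended item. The data model, wording, and function names are illustrative assumptions, not taken from any particular system.

```python
def explain_recommendation(item, user_prefs, alternatives):
    """Generate simple feature-based explanation sentences for a recommended item."""
    sentences = []
    # Preferences and user inputs: refer back to what the user stated.
    for feature, preferred in user_prefs.items():
        if item.get(feature) == preferred:
            sentences.append(
                f"We recommend this {item['type']} because you specified that "
                f"you prefer {preferred}-{feature} models."
            )
    # Alternatives and their features: point out where the item dominates the others.
    for feature, value in item.items():
        if feature in ("type", "name"):
            continue
        if all(alt.get(feature) != value for alt in alternatives):
            sentences.append(
                f"This {item['type']} has {feature} = {value}, "
                f"which other similar models do not have."
            )
    return sentences

phone = {"type": "phone", "name": "A1", "weight": "light", "battery": "removable"}
others = [{"type": "phone", "name": "B2", "weight": "light", "battery": "fixed"}]
print(explain_recommendation(phone, {"weight": "light"}, others))
```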
With respect to how the explanations are presented to the user, natural language representations (text-based explanations) are dominating the research landscape. In some works, more structured representations, e. g., in the form of lists of relevant features, other users, or past cases, are provided. Finally, different forms of graph-based and alternative visual approaches can be found in the literature as well, e. g., in the form of rating distributions for an item [23] or in the form of tag clouds [13, 14]. Generally, when explanations are used as an entry point for user control, not all forms of explanations seem equally helpful. Presenting the general inference strategy, for example, might be of limited use. Providing information about relevant inputs and features of the recommended items, in contrast, opens more opportunities for control mechanisms for users. When provided such input–output-oriented explanations, users can interactively adapt or correct their preference information as in [5] or give feedback on the recommendations, e. g., in the form of attribute-level critiques [46].
2.3 Challenges of explaining complex models
In traditional expert systems, which in many cases had an explanation component, the content that was presented to the user was often determined by collecting information about how the underlying inference algorithm arrived at its suggestion. In a rule-based system, for example, one could record which of the rules fired, given the user's specific input. Parts or all of this internal reasoning process are then presented in a user-friendly way. As a result, the process of computing the explanations as well as the explanations themselves are tightly related to the underlying recommendation process.

In the field of recommender systems, rule-based or knowledge-based approaches are nowadays only used for certain types of products, e. g., high-involvement goods. In most of the cases, content-based filtering and collaborative filtering, which often rely on various types of Machine Learning models, dominate the research landscape. However, these models make it less straightforward to extract the rationale underlying a recommendation. This led to two groups of approaches to generate explanations: (1) white box approaches, which extract particular kinds of
information from the algorithm and model that were used to produce recommendations, and (2) black box approaches, which do not take into account how recommendations are made, but explain them using different sources of information.

Early approaches focused mostly on white box explanation generation. At the time, collaborative filtering mostly relied on nearest-neighbor techniques, and, correspondingly, a number of approaches were proposed that use information about user neighborhoods to derive appropriate explanations for the users [23, 14]. Herlocker et al. investigated the persuasiveness of visualizing such neighborhoods through a user study [23]. Even more complex visualization approaches, based on 3D or interactive representations, were proposed in [38] and [42]. However, it remains somewhat unclear how such approaches would be perceived by an average user of a recommender system.

Today, with modern collaborative filtering techniques based on matrix factorization and Deep Learning, explaining the computed recommendations is much more difficult. Matrix factorization models consist of vectors of uninterpreted latent factor weights for users and items; deep neural networks train a large number of weights for nodes that have no obvious attached meaning. Such complex models make it very difficult to provide users with information about how the set of recommendations was exactly determined for them, let alone allow them to influence the system's strategy of selecting items.2 In general, because more and more decisions are nowadays made by algorithms, the topics of transparency, fairness, and accountability have become increasingly important in Machine Learning research. A recent survey on approaches to interpreting the outcomes of deep neural networks can, for example, be found in [47].

Given the complexity of this problem, an alternative is to rely on other ways of computing the explanations, leading to black box explanation generation. In such a case, one goal could be to provide plausible justifications to users, which, e. g., describe why a certain recommended item matches their preferences. One could, for example, mine association rules ("Customers who bought …") and then use these rules to explain a given item recommendation, even though the recommended item was selected in a different way [53]. Alternatively, customer reviews can be mined to extract explanations that are in accordance with recommendations made using complex models, such as in [48].

2 There is work in the context of matrix factorization techniques to find interpretations of at least the most important latent factors [44, 55, 8].
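A minimal sketch of the association-rule-based justification strategy just described: regardless of how the recommendation was actually computed, look for a frequently co-purchased item in the user's history and present it as a "Customers who bought … also bought …" style justification. The support threshold, data layout, and function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(transactions, min_support=2):
    """Count co-occurrences of item pairs in past transactions (both directions)."""
    pair_counts = Counter()
    for basket in transactions:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}

def justify(recommended_item, user_history, rules):
    """Return a 'Customers who bought X also bought Y' justification, if one exists."""
    best, best_count = None, 0
    for owned in user_history:
        count = rules.get((owned, recommended_item), 0)
        if count > best_count:
            best, best_count = owned, count
    if best is None:
        return None
    return f"Customers who bought '{best}' also bought '{recommended_item}'."

transactions = [["camera", "tripod"], ["camera", "tripod", "bag"], ["camera", "bag"]]
rules = mine_pair_rules(transactions)
print(justify("tripod", ["camera", "phone"], rules))
```

Note that such a justification is decoupled from the actual selection logic, which is precisely what makes it applicable to otherwise opaque models.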
2.4 A case study of the effects of different explanations
Gedikli et al. [14] reported the results of a laboratory study in which they analyzed the impact of different explanation types on users in several dimensions. The specific
targets of investigation were efficiency, effectiveness, direction of persuasiveness, transparency, and trust. We summarize their experiment and insights here as a case study of the evaluation of different types of explanations. The study also represents an example of a common research methodology that is applied in this context.

2.4.1 Study design
Ten different forms of explaining recommendations from the literature were considered in the study. Some of them were personalized and, for example, showed the ratings of the user's peers for a recommended item. Other explanation types were non-personalized and, for example, simply presented the items' average community ratings as an explanation. The second main differentiating factor was whether the explanation referred to the "content" of the recommended items (e. g., by displaying certain item features) or not. Figs. 5.1 and 5.2 show examples of two explanation types.
Figure 5.1: Histogram explanation type adapted from [23], showing the rating distribution of the neighbors of the current user. The explanation type was considered particularly effective in terms of persuasion in [23].
Figure 5.2: Tag cloud explanation type adapted from [13], which shows the item features that are assumed to be particularly desired and undesired for the user in different colors.
Procedure
The experimental procedure in the study followed the multistep protocol from [4].3 In a first step, the study participants were asked to provide ratings for a number of
movies, which was the domain of the study. Then, the participants were provided matching recommendations based on some underlying algorithm. Instead of the movie details, the participants were only shown the system-generated explanations. Each participant received only one treatment, i. e., was shown one form of explanation. The participants were then asked to rate the recommended items, expressing the probability that they would like to watch the movie. In the next step, the same recommendations were presented to the user again (in randomized order), now showing all the details of the movie and a trailer. The participants were asked to also rate these movies. However, they did not know that these were the exact same movies from the previous step. After the participants had completed the procedure, they were asked to fill out a questionnaire where they could rate their satisfaction with the explanations and could also express how transparent they found the explanations.

3 The protocol is referred to as "explanation exposure delta" in [50], expressing the difference of the users' evaluation of an item when provided with detailed item information in comparison to their explanation-based evaluation of the same item.

Dependent variables and measurement method
Transparency and satisfaction were, as mentioned, measured through a questionnaire, where satisfaction was determined based on ease of use and enjoyment. The efficiency of an explanation type was determined by measuring the time needed by the participants to rate a movie based on the explanations. The effectiveness was approximated by comparing the participant's rating for a movie when only provided with the explanation with the rating when full information was available [4]. A small difference means high effectiveness; large differences, in contrast, indicate that the explanations are persuasive. The effectiveness measure was therefore also used to measure the direction of persuasiveness, i. e., whether the explanation has the effect that the participant over- or underestimates the true suitability of a recommendation based on the explanations.

2.4.2 Observations and implications
The following observations were made based on the responses of 105 participants. In terms of efficiency, it turned out that the participants needed significantly more time when they were provided with content-based explanations, i. e., those based on tag clouds, shown in Fig. 5.2. The tag clouds were, in contrast, among the explanation types with the highest effectiveness. The differences between the explanation-based ratings and the "informed" ratings were close to zero, with a high positive correlation between the values. By definition, the persuasiveness of highly effective explanations is low. There were, however, a number of explanation types that led to a strong overestimation of the preference match of the shown movies. In particular, non-personalized explanations that simply indicated how many other users gave the movie a rating of four or higher led to the highest level of positively oriented persuasion. Overall, the main conclusion resulting from this part of the analysis is that the provision of information related to the features of an item can be key to effectiveness, i. e., to helping users make good decisions.
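The effectiveness and persuasion measures described above can be computed directly from the two rating rounds. A sketch of this "explanation exposure delta" computation, with illustrative variable names:

```python
def explanation_exposure_delta(explanation_ratings, informed_ratings):
    """Per-item difference between explanation-based and fully informed ratings.

    Values near zero indicate effective explanations; a positive mean means the
    explanations made items look better than they turned out to be (persuasion
    towards overestimation), a negative mean the opposite.
    """
    deltas = [e - i for e, i in zip(explanation_ratings, informed_ratings)]
    mean_delta = sum(deltas) / len(deltas)                       # direction of persuasiveness
    mean_abs_delta = sum(abs(d) for d in deltas) / len(deltas)   # (in)effectiveness
    return mean_delta, mean_abs_delta
```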
Looking at the participants' answers regarding the perceived transparency, the different explanation types fell into two groups for which statistically significant differences could be observed. The provision of a rating prediction and a confidence value is an example of an explanation form that led to low perceived transparency. In general, however, the obtained results were not fully conclusive. A personalized version of the tag clouds, for example, led to the highest level of transparency, whereas its non-personalized counterpart was in the group with the lowest transparency.

The personalized tag clouds also led to the highest levels of user satisfaction. However, the difference to several other forms of explanations, including in particular very simple ones, was in most cases not statistically significant. The lowest satisfaction level (based on ease of use and enjoyment) was observed for explanations that involved information about the rating behavior of the peers. Overall, the authors conclude that explanations should be presented in a form that users are already familiar with and that requires limited cognitive effort.

By analyzing the correlations between the variables, two more guidelines were proposed in [14]. The first guideline is to use explanations with high transparency to increase user satisfaction, as was also found in [49]. Second, explanations should not be primarily optimized for efficiency, but rather for, e. g., effectiveness, as users seem to be willing to invest more time to understand the explanations. The resulting set of guidelines obtained in this particular study is summarized in Table 5.3. A summary of outcomes and insights of other studies about different aspects of explanations can be found in [50].

Table 5.3: Guidelines for explanation design [14].

Nr. | Guideline
1   | Use domain-specific content data to boost effectiveness
2   | Use explanation concepts the user is already familiar with, as they require less cognitive effort and are preferred by the users
3   | Increase transparency through explanations to achieve higher user satisfaction
4   | Do not primarily optimize explanation types for efficiency; users take their time for making good decisions and are willing to spend it on analyzing the explanations

2.4.3 Open issues
While the presented study led to a number of insights and design guidelines, some aspects require further research. First, a deeper understanding is needed regarding which factors of an explanation lead to higher transparency. Second, the study also led to inconclusive results about the value of personalization of explanations. Two types of tag clouds were used in the study. In some cases, the personalized method worked best, while in other dimensions it made no difference if the explanations were
personalized. Finally, some of the explanations were based on providing details of the inner workings of the algorithm, e. g., by presenting statistics of the ratings provided by similar users. Given today's more complex Machine Learning-based recommendation algorithms, alternative approaches for explaining the outcomes of such black box algorithms are needed. One can, for example, rely on approaches that disconnect the explanation process from the algorithmic process of determining suitable recommendations and mainly use the features of the recommended items (and possibly the user profile) as a basis to generate the explanations ex-post [51, 62, 59].
3 Putting the user in control
One of the less explored purposes of explanations, as mentioned above, is that of scrutability. While the word "scrutable" can be defined as "capable of being deciphered,"4 Tintarev and Masthoff extend the interpretation of the word in the context of explanations and consider scrutability as allowing "[…] the user to tell the system it is wrong" [61]. The explanations provided by the system should therefore be a part of an iterative process, where the system explains and users can give feedback and correct the system's assumptions if necessary. In that context, explanations can be part of a mechanism that is provided to put users in control of the system, a functionality that is considered a key aspect of effective user interface design [58]. In the case of a recommender application, the system could, for example, explain to a user that a movie is recommended because he or she liked action movies in the past. If provided with an opportunity to give feedback, the user could then correct the system's assumption in case this interest in action movies no longer exists.

In the literature, there are a number of different ways in which users can give feedback and exert control over the system's recommendations. The literature is, however, still scattered. In this section, we provide a review of these mechanisms based on [32] and [34]. Our review covers user control mechanisms in the context of explanations, but also considers other situations in the recommendation process where users can take control. Additionally, we present the results of a survey from [32], which investigates the reasons why the explanation-based control features of Amazon are not widely used.
3.1 Review framework
We base our review of user control mechanisms on the conceptual framework presented in [32], as illustrated in Fig. 5.3.

4 https://www.merriam-webster.com/dictionary/scrutable
Figure 5.3: Research framework for user control in recommender systems (adapted from [32]).
The mechanisms for user control can be classified into the following two categories.
– Users can be put in control during the preference elicitation phase, e. g., when the system is collecting initial information about their individual preferences. We describe these approaches in Section 3.2.
– Another option for recommendation providers is to allow users to control their recommendations in the presentation phase. We review examples of such approaches in Section 3.3.
3.2 User control during preference elicitation
Many online services, including in particular e-commerce shops or media sites, allow users to rate individual items, either in the context of a purchase or independently of any business transaction. These feedback signals, e. g., thumbs up/down ratings, can be used by an underlying recommendation system to build long-term user models and to adapt the recommendations accordingly. In some sense, the provision of additional feedback opportunities can therefore be seen as a mechanism for user control, as the user feedback influences which items a user will see. However, such user inputs are typically not taken into account immediately by the system, and the provision of such feedback might not have a recognizable effect for the user. Furthermore, it is usually not transparent for users what effect individual preference statements have on the recommendations, and the users might not even be aware that this feedback is taken into account at all. In the following sections, we focus on three explicit forms of user control during the preference building phase, namely, preference forms/dialogs, conversational recommender systems, and critiquing.
3.2.1 Preference forms and static dialogs
Static forms are a common approach to let users specify and update their explicit taste profiles. The general idea, in most cases, is to let users choose their favorite category of items, e. g., musical genres or news topics, or to let them express their level of interest
on a numerical scale. Such approaches are easy to understand and are, consequently, used in a number of web applications, such as music and movie streaming sites (e. g., Netflix) or news websites (e. g., Google News). The user model can, in most cases, be updated immediately or filters can be applied to the recommendations so that they reflect the updated preferences instantly. Another way to collect explicit preferences from the user is to use static preference dialogs instead of forms. These dialogs guide users through a series of questions to identify their taste profile. One example is the website of TOPSHOP, where users can take a "quiz" to determine their fashion style profile step by step (see Fig. 5.4). The advantage of such dialogs over a single form is that more information can be gathered without overwhelming the user with too many options at once. In the recommender systems literature, we find static preference forms and dialogs in domains such as music recommendation [24], in-restaurant menu suggestion [64], or the recommendation of energy saving measures [37].
Figure 5.4: Static preference elicitation dialog on the fashion shopping website TOPSHOP.com.
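A static preference form usually maps onto a very simple user model: one weight per category that can be edited at any time and is applied directly when the recommendation list is built. The following is a minimal sketch of such an immediately applied profile; the class and attribute names, the default weight, and the scoring rule are illustrative assumptions.

```python
class PreferenceForm:
    """Explicit taste profile edited through a static form."""

    def __init__(self):
        self.interest = {}  # category -> weight in [0, 1]

    def set_interest(self, category, weight):
        self.interest[category] = max(0.0, min(1.0, weight))

    def rank(self, candidates):
        """Re-rank candidate items so the list reflects the profile instantly."""
        def score(item):
            return item["base_score"] * self.interest.get(item["category"], 0.5)
        return sorted(candidates, key=score, reverse=True)

form = PreferenceForm()
form.set_interest("jazz", 0.9)
form.set_interest("metal", 0.1)
items = [{"title": "A", "category": "metal", "base_score": 0.8},
         {"title": "B", "category": "jazz", "base_score": 0.6}]
print([i["title"] for i in form.rank(items)])  # "B" first: 0.54 > 0.08
```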
However, even though these specific user control modalities are frequently used in the literature and practical applications, some open questions remain regarding their user-friendliness. For example, it is unclear how such systems should deal with user interest drifts, since a major fraction of users will most likely not edit their taste profiles manually and keep them up-to-date on a regular basis. Furthermore, as the name suggests, these forms and dialogs are static, i. e., the same set of questions is presented to all users, which might reduce their usefulness for users with more specific needs.
3.2.2 Conversational recommender systems
A possible solution to the described problems of static dialogs is offered by conversational recommender systems. Such systems typically elicit preferences step by step by asking the user questions, but they usually adapt the dialogs dynamically, e. g., based on previous answers of the user. The preference dialogs can, for example, be varied in terms of the number of questions, their details, or even with respect to the type of interaction itself. For example, if the user is a novice, a natural language-based avatar could be used to interact with the user. Experts, in contrast, might feel more comfortable when they can specify their preferences using a set of detailed forms.

A variety of conversational recommender system approaches were presented in the literature. For example, in the Adaptive Place Advisor system [15], users can construct travel plans in a conversational manner where the degree of presentational detail is adapted based on the users' previous answers. Other systems, such as the Advisor Suite [9, 31], offer additional features such as personalized explanations of the recommendations or recovery suggestions in case the user requirements cannot be fulfilled. Examples of practical applications of such systems exist as well (see, e. g., [28, 31]).

A main challenge in such systems is to find ways to stimulate users to go through a multistep preference elicitation process at least once. Personalizing the dialog according to the user's estimated willingness to answer more questions can be one possible approach in that direction. Generally, conversational systems, also in the more modern form of chatbots, can lead to a more engaging user experience than the provision of a static series of fill-out forms. A main hindrance to the large-scale usage of such systems lies in the fact that they can require substantial effort for the creation and maintenance of the explicit knowledge that is needed to conduct such dialogs. In contrast to recommendation approaches based on collaborative filtering, such knowledge-based approaches have no built-in learning capacity, i. e., the recommendation models have to be continuously maintained. In addition, conversational systems are usually only designed for one-shot recommendations and do not consider long-term user models.
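The defining property of a conversational approach is that the next question depends on earlier answers. A toy sketch of such an adaptive dialog policy follows; the questions, answer keys, and branching rule are invented purely for illustration and do not reflect any of the systems cited above.

```python
def next_question(answers):
    """Pick the next preference question based on what the user answered so far."""
    if "expertise" not in answers:
        return "Are you familiar with this product domain? (novice/expert)"
    if answers["expertise"] == "novice":
        # Novices get short, coarse questions.
        if "budget" not in answers:
            return "Roughly how much do you want to spend? (low/medium/high)"
    else:
        # Experts are asked for detailed, technical constraints.
        if "constraints" not in answers:
            return "Please list the technical requirements the product must fulfil."
    return None  # enough information collected, show recommendations

answers = {"expertise": "novice"}
print(next_question(answers))  # next asks for a coarse budget, not technical details
```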
3.2.3 Critiquing
Similar to some of the user control methods discussed so far, critiquing techniques also allow users to explicitly state their preferences regarding certain item features. However, in contrast to conversational recommender systems or static forms, preferences are expressed in the context of a reference item. For example, if the user is searching for a restaurant to eat at, a critiquing system will typically offer one selected recommendation at the beginning of the process. Users can then study the features of the recommended restaurant and critique it, e. g., using statements like "cheaper" or
"closer to my location." Based on the critiques, the recommender system will come up with better suggestions until the user is satisfied. (In terms of our research framework in Fig. 5.3, critiquing falls into both main categories: it has a preference elicitation facet and at the same time implements a feedback mechanism during result presentation.) Critique-based systems are easy to understand in terms of their interaction scheme and have the advantage of giving users immediate feedback by updating the recommended (reference) item after every critique. Consequently, a number of these systems have been proposed in the recommender systems literature (see, e. g., [7, 63, 64]). However, depending on the user's requirements and the domain, critiquing approaches can result in a higher number of interaction steps than other preference acquisition methods. One solution proposed in the literature (see, e. g., [45, 66]) to tackle this problem is the use of compound critiques, which allow users to critique items in more than one dimension at once; this might, however, also increase the cognitive load of the users.
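As a minimal, self-contained illustration of unit critiquing (not an implementation of any of the cited systems), the following Python sketch filters an invented candidate set by a critique on the current reference item and returns an updated recommendation.

```python
# Minimal sketch of unit critiquing over item features; the restaurant data and
# critique definitions are invented for illustration only.
restaurants = [
    {"name": "Trattoria Roma", "price": 35, "distance_km": 2.1, "rating": 4.5},
    {"name": "Curry Corner",   "price": 18, "distance_km": 0.8, "rating": 4.1},
    {"name": "Bistro Nord",    "price": 55, "distance_km": 3.4, "rating": 4.7},
]

# A critique is interpreted relative to the current reference item.
CRITIQUES = {
    "cheaper": lambda item, ref: item["price"] < ref["price"],
    "closer":  lambda item, ref: item["distance_km"] < ref["distance_km"],
}

def apply_critique(reference, critique, candidates):
    """Return the best remaining candidate that satisfies the critique."""
    remaining = [c for c in candidates if CRITIQUES[critique](c, reference)]
    # Rank remaining items by rating; other ranking schemes are possible.
    return max(remaining, key=lambda c: c["rating"]) if remaining else None

reference = max(restaurants, key=lambda c: c["rating"])   # initial recommendation
print("Start:", reference["name"])
print("After 'cheaper':", apply_critique(reference, "cheaper", restaurants)["name"])
```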
3.3 User control in the recommendation presentation phase Letting users state their preferences in a more explicit or interactive manner is not the only way in which users can be put in control of their recommendations. Once the initial user preferences are collected, recommender systems can also offer a range of user control mechanisms when the recommendations are presented to users. Such control mechanisms can allow users to either (i) manipulate the recommendation lists, in the simplest form by trying different sort orders, or (ii) inspect and, if necessary, correct the presented recommendations based, e. g., on the provided explanations. 3.3.1 Filtering or adjusting the provided recommendations One simple form of user control in the context of result presentation can be achieved by giving users the option to filter, sort, and manipulate the contents of the given recommendation list, e. g., by removing individual items. For example, when given a list of movie recommendations, a filter feature can be provided to enable users to exclude movies of certain genres from the recommendations, as done, e. g., in [56]. Another example is the microblog recommender system presented in [60], where users can sort the Tweets in various ways or vary the importance of different filters. Considering real-world applications, such filters can also be defined by users for Facebook's automated news feed to make sure that posts of favorite users always appear at the top. More sophisticated approaches allow the user to manipulate the recommendations in a more interactive way based, e. g., on the exposure of the algorithm's inner logic. For example, in [5] and [57], recommendations are presented within graph struc-
tures that show the relations of the recommended items to those that were previously rated by the user, friends, or similar users. Users can then take control of the recommendation outcomes by adjusting their item ratings or by assigning custom weights for the influencing friends. These inputs are then taken into account to create an updated list of recommendations. Generally, the provision of additional interaction and feedback instruments can make users more satisfied with the recommendations, as was shown in the study of [5], and many of the more simple forms of user control, such as sorting or filtering, are easy to implement for providers. However, some of the more complex methods assume that users (i) are willing to spend a significant amount of time to improve their recommendations and (ii) can understand the system’s logic, such as the relations between recommendations and similar users. This might, however, not always be the case for average users. 3.3.2 Choosing or influencing the recommendation strategy A quite different form proposed in the literature to put users in control of their recommendations is to allow them to choose or influence the recommendation strategy or algorithm parameters. Such mechanisms in principle offer the maximum amount of user control in some sense, but also the highest risk that the user interfaces and the complexity of the task might overwhelm users. Consequently, these user control measures can primarily be found in the academic literature. For example, in a study in the context of the MovieLens movie recommendation system [21], a widget was added to the user interface for users to change the recommendation algorithm to be used. Selecting one of the four available strategies led to an immediate change of the displayed recommendations. However, the mechanics of the algorithms were not explained to the users, which might make the presented system somewhat less transparent and users dissatisfied if their choices do not lead to the expected effects. A different approach was implemented in the system presented in [52]. The system allows users to fine-tune the weights of different recommendation strategies in a hybrid recommender system. In addition, a Venn diagram was used to illustrate which of the sub-strategies was responsible for the inclusion of individual items. The mentioned works show through user studies that such forms of in-depth user control can have a positive effect on the user experience. However, how such a mechanism can be successfully integrated into real-world systems without overwhelming everyday users remains to some extent unclear. 3.3.3 Interactive explanations and user control As mentioned earlier, explanations represent one possible entry point for users to scrutinize the provided recommendations and to interactively improve them or cor-
rect the system's assumptions. Both in the literature and in real-world systems, there are only a handful of examples of recommender systems that provide such interactive explanations. The previously discussed interactive visualization approaches from [5, 57], which show the relations between rated items and recommendations in a graph structure, can be considered as a form of scrutable explanation. They expose the algorithm's reasoning and allow users to exert control by changing their ratings, which leads to updated recommendations. Other examples are the conversational recommenders from [9, 31], which generate textual explanations that describe which internal rules "fired" due to the user's stated requirements. Users can then, in case they do not agree with the recommendation rules, change the weights of specific rules, which immediately leads to an updated set of recommendations. Finally, in the mobile shopping recommender system Shopr [40], a critiquing-based approach is taken where users are shown recommendations along with feature-based explanations such as "because you currently like blue." Users can then improve the recommendations by either rating them with a thumbs up/down button or by clicking on the presumably preferred features to revise their user model. In the latter case, users can, for example, click on the word "blue." This would lead them to a screen where they can select the colors they are actually interested in at the moment. The additional preference information that is gathered in this way is then used by the underlying active learning algorithm to immediately "refine" the recommendation. There is also a small number of real-world applications that feature user control mechanisms in the context of explanations. On the website of Amazon.com, users are on different occasions provided with explanations as shown in Fig. 5.5. In this case, the system explains the recommendation of a product in terms of other products with which the user has interacted in the past, e. g., clicked on, added to the shopping cart, or purchased. Users can then correct the system by indicating that they either already own the recommended product, are not interested in it, or do not want the item from their profile to be used for recommendations. The latter case can, for example, be helpful when a user purchased an item as a gift. On YouTube, a similar explanation mechanism exists that explains individual video recommendations with other videos from the user's viewing history. In this case, the user can also reject the recommendation or tell the system not to consider a particular video from the profile as a source for generating future recommendations.
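The following minimal Python sketch illustrates the general idea behind such "don't use for recommendations" controls in a simple item-to-item scheme; the similarity table and the user profile are invented for illustration, and this is not the actual algorithm used by Amazon.com or YouTube.

```python
# Minimal sketch: excluding profile items from an item-to-item recommender.
# Similarity values and the user profile are invented for illustration.

ITEM_SIMILARITY = {            # item -> related items with similarity scores
    "hiking boots": {"rain jacket": 0.8, "trekking poles": 0.7},
    "birthday gift": {"gift wrap": 0.9, "greeting cards": 0.85},
}

def recommend(profile, excluded, top_n=3):
    """Score candidates from all profile items except those the user excluded."""
    scores = {}
    for item in profile:
        if item in excluded:              # "Don't use for recommendations"
            continue
        for candidate, sim in ITEM_SIMILARITY.get(item, {}).items():
            scores[candidate] = scores.get(candidate, 0.0) + sim
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [c for c in ranked if c not in profile][:top_n]

profile = ["hiking boots", "birthday gift"]             # past purchases/clicks
print(recommend(profile, excluded=set()))               # uses the gift purchase
print(recommend(profile, excluded={"birthday gift"}))   # gift-related items disappear
```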
3.3.4 Acceptance of Amazon’s explanation-based user control features While the more complex explanation and control mechanisms as proposed in [9, 31] are part of real-world applications, their usefulness was not systematically evaluated. To obtain a deeper understanding of the usability and adoption of the much simpler
Figure 5.5: The interactive explanations of Amazon.com’s recommendation system.
explanation-based user control mechanism of Amazon.com (Fig. 5.5), a questionnaire-based survey was conducted in [32]. In the first part of the study, participants were shown a screen shot of the explanation feature, and they were asked 15 questions, e. g., about whether they knew about the existence of the functionality, whether the provided functionality was clear to them, or whether they thought it was useful. In the second phase, which was conducted at a later time with a different group of participants, the same screen shot was shown, but the emphasis of the questions was on why participants would or would not use the feature, including the opportunity to give free-text answers. The results from the first phase of the study showed that more than 90 % of the participants said they were aware that they could influence their recommendations in some way, but, to some surprise, only 20 % knew about the special page where the explanation and feedback feature can be found. Only about 8 % had actively used the feature before. However, when asked if the feedback feature ("Don't use for recommendation," see Fig. 5.5) is clear, about 75 % of the participants said that it was either very clear or that the meaning could be guessed, indicating that understandability was not the reason why the feature was sparsely used. Interestingly, although participants stated on average that the functionality seemed useful, the average answer as to whether they intended to use the feature in the future was rather low. To find possible reasons as to why participants seemed to think the feature was useful, but in the end did not use it, the second part of the study collected free-text feedback, which was then analyzed manually. As a result, four main reasons were identified for why participants do not use the explanation-based control mechanism, listed as follows.
– About a third of the participants were not interested in recommendations in general.
– About a fourth said that it would be too much effort to use the feature.
– Again, about a fourth mentioned a fear of bad consequences, such as irreversible changes to their user preference profile, if they tried to use the feature to improve the recommendations.
– Finally, 19 % of the participants did not want to use the feature because of privacy concerns.
Overall, even though the functionality is rather simple, it seems that providers need to communicate the effects and benefits of their control features in a more understandable way. Also, adding an "undo" option, which is proposed for highly usable interfaces in general [58], could help to increase the acceptance of the provided functionality.
4 Summary and future directions Overall, a number of works exist in the literature on recommender systems that show that providing explanations can have positive effects, e. g., in terms of the adoption of the recommendations or the development of long-term trust. In real-world applications, the use of elaborate explanation mechanisms, as provided on Amazon.com, is however still very limited. In many cases, recommendation providers only add informative labels like "Similar items" to give the users an intuition of the underlying recommendation logic. In terms of user control, we can observe that major sites such as Google News or Facebook nowadays provide their users with features to fine-tune their profile and to personalize what they will see, e. g., in their news feed. A small number of sites such as Amazon.com or Yahoo! also allow users to give finer-grained feedback on the recommendations. Academic approaches, as discussed in the previous section, usually propose much richer types of visualizations and user interactions than can be found on real-world sites. A main challenge that is shared both by explanation approaches and user control mechanisms is that their usage can easily become too complex for average users. In fact, for many academic approaches, it is not fully clear whether they would overwhelm the majority of users. One possible approach in that context is to personalize the provided explanations and user control mechanisms according to the assumed expertise or IT-literacy level of the individual user. Such a personalization process can be implemented either by dynamically selecting from a pre-defined set of explanation types or by adapting individual explanations, e. g., by leaving out technical details for non-expert users. But even when the provided mechanisms are intuitive and easy to use, users might be reluctant to manually fine-tune their recommendations for different reasons, e. g., because it requires additional effort. Therefore, in addition to the development of appropriate user interface mechanisms, better ways are needed to incentivize users and to convince them of the value of investing these additional efforts to receive better-matching recommendations.
In this way, providers can make sure that their recommender systems do not filter out things that are actually relevant to their users.
References
[1] Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco. Who to follow and why: Link prediction with explanations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 1266–1275, 2014.
[2] Punam Bedi, Sumit Kumar Agarwal, Samarth Sharma, and Harshita Joshi. SAPRS: Situation-Aware Proactive Recommender system with explanations. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics, ICACCI '14, pages 277–283, 2014.
[3] Punam Bedi and Ravish Sharma. Trust based recommender system using ant colony for trust computation. Expert Systems with Applications, 39(1):1183–1190, 2012.
[4] Mustafa Bilgic and Raymond J. Mooney. Explaining recommendations: Satisfaction vs. promotion. In Proceedings of Beyond Personalization 2005: A Workshop on the Next Stage of Recommender Systems Research in Conjunction with the 2005 International Conference on Intelligent User Interfaces, IUI '05, 2005.
[5] Svetlin Bostandjiev, John O'Donovan, and Tobias Höllerer. Tasteweights: A visual interactive hybrid recommender system. In Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys '12, pages 35–42, 2012.
[6] B. G. Buchanan and E. H. Shortliffe. Explanations as a Topic of AI Research. In Rule-based Systems, pages 331–337. Addison-Wesley, Reading, MA, USA, 1984.
[7] Robin D. Burke, Kristian J. Hammond, and Benjamin C. Young. Knowledge-based navigation of complex information spaces. In Proceedings of the 13th National Conference on Artificial Intelligence and 8th Innovative Applications of Artificial Intelligence, AAAI '96, pages 462–468, 1996.
[8] Tim Donkers, Benedikt Loepp, and Jürgen Ziegler. Towards understanding latent factors and user profiles by enhancing matrix factorization with tags. In ACM RecSys 2016 Posters, 2016.
[9] Alexander Felfernig, Gerhard Friedrich, Dietmar Jannach, and Markus Zanker. An integrated environment for the development of knowledge-based recommender applications. International Journal of Electronic Commerce, 11(2):11–34, 2006.
[10] Gerhard Friedrich and Markus Zanker. A taxonomy for generating explanations in recommender systems. AI Magazine, 32(3):90–98, 2011.
[11] Alejandro J. García, Carlos I. Chesñevar, Nicolás D. Rotstein, and Guillermo R. Simari. Formalizing dialectical explanation support for argument-based reasoning in knowledge-based systems. Expert Systems with Applications, 40(8):3233–3247, 2013.
[12] Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin, and Amr Huber. Offline and online evaluation of news recommender systems at swissinfo.ch. In Proceedings of the Eighth ACM Conference on Recommender Systems, RecSys '14, pages 169–176, 2014.
[13] Fatih Gedikli, Mouzhi Ge, and Dietmar Jannach. Understanding recommendations by reading the clouds. In Proceedings of the 12th International Conference on E-Commerce and Web Technologies, EC-Web '11, pages 196–208, 2011.
[14] Fatih Gedikli, Dietmar Jannach, and Mouzhi Ge. How Should I Explain? A Comparison of Different Explanation Types for Recommender Systems. International Journal of Human-Computer Studies, 72(4):367–382, 2014.
[15] Mehmet H. Göker and Cynthia A. Thompson. The Adaptive Place Advisor: A conversational recommendation system. In Proceedings of the 8th German Workshop on Case Based Reasoning, GWCBR ’00, pages 187–198, 2000. [16] M. Sinan Gönül, Dilek Önkal, and Michael Lawrence. The Effects of Structural Characteristics of Explanations on Use of a DSS. Decision Support Systems, 42(3):1481–1493, 2006. [17] K. Gowri, C. Marsh, C. Bedard, and P. Fazio. Knowledge-based assistant for aluminum component design. Computers & Structures, 38(1):9–20, 1991. [18] H. A. Güvenir and N. Emeksiz. An expert system for the differential diagnosis of erythemato-squamous diseases. Expert Systems with Applications, 18(1):43–49, 2000. [19] Ido Guy, Naama Zwerdling, David Carmel, Inbal Ronen, Erel Uziel, Sivan Yogev, and Shila Ofek-Koifman. Personalized recommendation of social software items based on social relations. In Proceedings of the Third ACM Conference on Recommender Systems, RecSys ’09, pages 53–60, 2009. [20] Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. Social media recommendation based on people and tags. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, pages 194–201, 2010. [21] F. Maxwell Harper, Funing Xu, Harmanpreet Kaur, Kyle Condiff, Shuo Chang, and Loren G. Terveen. Putting users in control of their recommendations. In Proceedings of the Ninth Conference on Recommender Systems, RecSys ’15, pages 3–10, 2015. [22] Diane Warner Hasling, William J. Clancey, and Glenn Rennels. Strategic explanations for a diagnostic consultation system. International Journal of Man-Machine Studies, 20(1):3–19, 1984. [23] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, CSCW ’00, pages 241–250, 2000. [24] Yoshinori Hijikata, Yuki Kai, and Shogo Nishida. The relation between user intervention and user satisfaction for information recommendation. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC ’12, pages 2002–2007, 2012. [25] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the Eighth IEEE International Conference on Data Mining, ICDM ’08, pages 263–272, 2008. [26] J. E. Hunt and C. J. Price. Explaining qualitative diagnosis. Engineering Applications of Artificial Intelligence, 1(3):161–169, 1988. [27] Tim Hussein and Sebastian Neuhaus. Explanation of spreading activation based recommendations. In Proceedings of the First International Workshop on Semantic Models for Adaptive Interactive Systems, SEMAIS ’10, pages 24–28, 2010. [28] Dietmar Jannach. ADVISOR SUITE – A knowledge-based sales advisory-system. In Proceedings of the 16th Eureopean Conference on Artificial Intelligence, ECAI ’04, pages 720–724, 2004. [29] Dietmar Jannach and Gedas Adomavicius. Recommendations with a purpose. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, pages 7–10, 2016. [30] Dietmar Jannach and Kolja Hegelich. A case study on the effectiveness of recommendations in the mobile internet. In Proceedings of the Third ACM Conference on Recommender Systems, RecSys ’09, pages 205–208, 2009. [31] Dietmar Jannach and Gerold Kreutler. Personalized user preference elicitation for e-services. 
In Proceedings of the 2005 IEEE International Conference on e-Technology, e-Commerce, and e-Services, EEE ’05, pages 604–611, 2005. [32] Dietmar Jannach, Sidra Naveed, and Michael Jugovac. User control in recommender systems: Overview and interaction challenges. In Proceedings of the 17th International Conference on
Electronic Commerce and Web Technologies, EC-Web ’16, pages 21–33, 2016. [33] Dietmar Jannach, Paul Resnick, Alexander Tuzhilin, and Markus Zanker. Recommender systems — beyond matrix completion. Communications of the ACM, 59(11):94–102, 2016. [34] Michael Jugovac and Dietmar Jannach. Interacting with Recommenders — Overview and Research Directions. Transactions on Interactive Intelligent Systems, 7(3):10:1–10:46, 2017. [35] Lalana Kagal and Joe Pato. Preserving privacy based on semantic policy tools. IEEE Security Privacy, 8(4):25–30, 2010. [36] Evan Kirshenbaum, George Forman, and Michael Dugan. A live comparison of methods for personalized article recommendation at Forbes.com. In Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, ECML/PKDD ’12, pages 51–66, 2012. [37] Bart P. Knijnenburg, Niels J. M. Reijmer, and Martijn C. Willemsen. Each to his own: how different users call for different interaction methods in recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ’11, pages 141–148, 2011. [38] Johannes Kunkel, Benedikt Loepp, and Jürgen Ziegler. A 3d item space visualization for presenting and manipulating user preferences in collaborative filtering. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, IUI ’17, pages 3–15, 2017. [39] Carmen Lacave, Agnieszka Oniśko, and Francisco J. Díez. Use of Elvira’s explanation facility for debugging probabilistic expert systems. Knowledge-Based Systems, 19(8):730–738, 2006. [40] Béatrice Lamche, Ugur Adıgüzel, and Wolfgang Wörndl. Interactive explanations in mobile shopping recommender systems. In Proceedings of the Joint Workshop on Interfaces and Human Decision Making in Recommender Systems, IntRS ’14, 2014. [41] M. Levy, P. Ferrand, and V. Chirat. SESAM-DIABETE, an expert system for insulin-requiring diabetic patient education. Computers and Biomedical Research, 22(5):442–453, 1989. [42] Benedikt Loepp, Katja Herrmanny, and Jürgen Ziegler. Blended recommending: Integrating interactive information filtering and algorithmic recommender techniques. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 975–984, 2015. [43] Paul Marx, Thorsten Hennig-Thurau, and André Marchand. Increasing consumers’ understanding of recommender results: A preference-based hybrid algorithm with strong explanatory power. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, pages 297–300, 2010. [44] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, pages 165–172, 2013. [45] Kevin McCarthy, James Reilly, Lorraine McGinty, and Barry Smyth. On the dynamic generation of compound critiques in conversational recommender systems. In Proceedings of the 3rd International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, AH ’04, pages 176–184, 2004. [46] Kevin McCarthy, James Reilly, Lorraine McGinty, and Barry Smyth. Experiments in dynamic critiquing. In Proceedings of the 10th International Conference on Intelligent User Interfaces, IUI ’05, pages 175–182, 2005. [47] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73:1–15, 2018. [48] Khalil Muhammad, Aonghus Lawlor, Rachael Rafter, and Barry Smyth. 
Great explanations: Opinionated explanations for recommendations. In Eyke Hüllermeier and Mirjam Minor, editors, Case-Based Reasoning Research and Development, pages 244–258. Springer International Publishing, Cham, 2015. [49] Mehrbakhsh Nilashi, Dietmar Jannach, Othman bin Ibrahim, Mohammad Dalvi Esfahani, and
Hossein Ahmadi. Recommendation quality, transparency, and website quality for trust-building in recommendation agents. Electronic Commerce Research and Applications, 19(C):70–84, 2016.
[50] Ingrid Nunes and Dietmar Jannach. A systematic review and taxonomy of explanations in decision support and recommender systems. User Modeling and User-Adapted Interaction, 27(3–5):393–444, 2017.
[51] Sergio Oramas, Luis Espinosa-Anke, Mohamed Sordo, Horacio Saggion, and Xavier Serra. Information extraction for knowledge base construction in the music domain. Data & Knowledge Engineering, 106:70–83, 2016.
[52] Denis Parra, Peter Brusilovsky, and Christoph Trattner. See what you want to see: visual user-driven approach for hybrid recommendation. In Proceedings of the 19th International Conference on Intelligent User Interfaces, IUI '14, pages 235–240, 2014.
[53] Georgina Peake and Jun Wang. Explanation mining: Post hoc interpretability of latent factor models for recommendation systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '18, pages 2060–2069, 2018.
[54] Pearl Pu and Li Chen. Trust-inspiring explanation interfaces for recommender systems. Knowledge-Based Systems, 20(6):542–556, 2007.
[55] Marco Rossetti, Fabio Stella, and Markus Zanker. Towards explaining latent factors with topic models in collaborative recommender systems. In 24th International Workshop on Database and Expert Systems Applications, DEXA 2013, pages 162–167, 2013.
[56] J. Ben Schafer, Joseph A. Konstan, and John Riedl. Meta-recommendation systems: user-controlled integration of diverse recommendations. In Proceedings of the 11th International Conference on Information and Knowledge Management, CIKM '02, pages 43–51, 2002.
[57] James Schaffer, Tobias Höllerer, and John O'Donovan. Hypothetical recommendation: A study of interactive profile manipulation behavior for recommender systems. In Proceedings of the 28th International Florida Artificial Intelligence Research Society, FLAIRS '15, pages 507–512, 2015.
[58] Ben Shneiderman. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 3rd edition, 1997.
[59] Wee-Kek Tan, Chuan-Hoo Tan, and Hock-Hai Teo. Consumer-based decision aid that explains which to buy: Decision confirmation or overconfidence bias? Decision Support Systems, 53(1):127–141, 2012.
[60] Nava Tintarev, Byungkyu Kang, Tobias Höllerer, and John O'Donovan. Inspection mechanisms for community-based content discovery in microblogs. In Proceedings of the 2015 Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, IntRS '15, pages 21–28, 2015.
[61] Nava Tintarev and Judith Masthoff. A survey of explanations in recommender systems. In Proceedings of the 23rd IEEE International Conference on Data Engineering Workshop, ICDE '07, pages 801–810, 2007.
[62] Nava Tintarev and Judith Masthoff. Evaluating the effectiveness of explanations for recommender systems. User Modeling and User-Adapted Interaction, 22(4–5):399–439, 2012.
[63] Shari Trewin. Knowledge-based recommender systems. Encyclopedia of Library and Information Science, 69:180–200, 2000.
[64] Rainer Wasinger, James Wallbank, Luiz Pizzato, Judy Kay, Bob Kummerfeld, Matthias Böhmer, and Antonio Krüger. Scrutable user models and personalised item recommendation in mobile lifestyle applications. In Proceedings of the 21st International Conference on User Modeling, Adaptation, and Personalization, UMAP '13, pages 77–88, 2013.
[65] Youngohc Yoon, Tor Guimaraes, and George Swales. Integrating artificial neural networks with rule-based expert systems. Decision Support Systems, 11(5):497–507, 1994.
[66] Jiyong Zhang and Pearl Pu. A comparative study of compound critique generation in conversational recommender systems. In Proceedings of the 4th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems, AH '06, pages 234–243, 2006.
[67] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '14, pages 83–92, 2014.
Part III: Personalization approaches
Daniel Herzog, Linus W. Dietz, and Wolfgang Wörndl
6 Tourist trip recommendations – foundations, state of the art, and challenges Abstract: Tourist Trip Design Problems (TTDPs) deal with the task of supporting tourists in creating a trip composed of a set or sequence of points of interest (POIs) or other items related to travel. This is a challenging problem for personalized recommender systems (RSs), because it is not only necessary to discover interesting POIs matching the user's preferences and interests, but also to combine these destinations into a practical route. In this chapter, we present the TTDP and show how it can be modeled using different mathematical problems. We present trip RSs with a focus on recommendation techniques, data analysis, and user interfaces. Finally, we summarize important current and future challenges that research in the field of tourist trip recommendations faces today. The chapter concludes with a short summary. Keywords: tourist trip design problem, recommender systems, group recommendation, context-aware recommender systems, mobility patterns, public displays, orienteering problem
1 Introduction Recommender systems (RSs) are software tools and techniques which support users in finding products, services, or information that are useful for them [70]. RSs have been successfully applied in many domains, such as e-commerce, movies, or news, to help users make decisions and to increase sales. Another well-established application field for RSs is tourism. The travel and tourism domain is one of the main contributors to the global economy. In 2017, the sector contributed directly or indirectly 8.3 trillion USD to the global economy and supported 313 million jobs, which is equal to 10.4 % of the world's Gross Domestic Product (GDP) and 1 in 10 of all jobs [89]. RSs can support users in identifying travel destinations and attractions they would like to visit. Furthermore, they can be used for comprehensive travel planning, e. g., by combining travel destinations, transport connections, and activities. In addition, RSs can support the user with proactive recommendations when already traveling, e. g., while exploring a city [73, 88]. While there are some approaches on commercial websites that suggest package tours or try to inspire customers with pre-defined trip proposals, there are limited options for independent travelers to receive personalized trip recommendations. The research focus of RSs in tourism is shifting towards the recommendation of complex items, such as itineraries composed of multiple points of interest (POIs). The latter has become a popular example of a travel-related item that can be recommended
by solving the Tourist Trip Design Problem (TTDP). In its simplest formulation, the TTDP is identical to the Orienteering Problem (OP), an optimization problem which aims to combine as many locations as possible along a route to maximize the value of the route for the user [86]. An abstract model for solving the TTDP comprises two steps: (i) collecting relevant items such as POIs and analyzing travel-related data and (ii) developing algorithms using these data to generate recommendations [92]. There are several variants of the TTDP, most of them being based on the OP. In this work, however, we use a broader definition of the TTDP, which allows us to recommend tourist trips and routes at different granularities, such as trips composed of multiple travel regions which can represent countries. An important aspect when integrating the TTDP into practical tourism applications is the development of specialized user interfaces that display the recommendations to individuals and groups of travelers and also facilitate gathering feedback from users. In this book chapter, we present the TTDP and show how it can be modeled in different travel-related scenarios. We summarize the current state of the art in tourist trip RSs with a focus on recommendation techniques, data analysis, and user interfaces. We highlight important current and future challenges that research in the field of tourist trip recommendations is facing. The chapter concludes with a short summary.
2 The Tourist Trip Design Problem Tourists exploring a city usually want to visit as many interesting POIs as possible. However, visiting all attractions is not an option due to practical constraints, such as time. Hence, algorithms solving the TTDP try to find a route containing some of the POIs that maximizes the route's value for the user without violating the given constraints. According to Vansteenwegen and van Oudheusden, mobile tourist guides need to consider the following information to solve the TTDP: (i) a user profile with the user's travel preferences, (ii) additional user requirements, such as the amount of time and money the user intends to spend, and (iii) information about the POIs that can be visited [86]. Once a set of candidate items has been obtained, scores have to be assigned to the POIs according to the user's preferences and constraints. Then, TTDP algorithms can be used to combine POIs with high scores into enjoyable routes which are suggested to the user. Vansteenwegen and van Oudheusden introduced the TTDP as an extension of the OP. In our work we broaden this definition, allowing us to recommend tourist trips and routes at different granularities, such as trips composed of multiple travel regions which can represent countries. For this purpose, we introduce our extended definition of the TTDP by presenting different travel-related scenarios. We show how the OP and similar graph-theoretic routing problems, namely, the Traveling Salesman Problem
(TSP) with its specializations, but also the Knapsack Problem (KP), can be used to solve variants of the TTDP.
2.1 The Traveling Salesman Problem The TSP is a classic routing problem that optimizes the route of a traveler visiting all nodes in a graph exactly once before returning to the original position. Since the TSP is NP-hard, computing optimal solutions quickly becomes intractable; however, heuristics usually provide sufficiently good approximations [49]. The TSP has many practical applications. In logistics, it can be used to plan routes of vehicles that have to visit a fixed set of locations before they return to the depot. In tourism, this problem is suitable for modeling round trips, where the user wants to visit all locations of a static set and eventually return to the starting point. Visiting all national parks of the US in an optimal route would be a typical example of this [65]. However, given myriad destinations and POIs to visit, travelers usually have to choose which attractions to include in their trip. Hence, the pure formulation is not well suited for most tourist recommenders. Instead, most variants of the TTDP use special cases of the TSP, where not all locations have to be visited and the overall value of the trip is determined by non-binary profits.
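To make the role of heuristics tangible, the following Python sketch applies the classic nearest-neighbor rule to a tiny, invented set of locations; it is meant only as an illustration of a simple TSP heuristic, not as one of the algorithms cited above, and it yields a feasible rather than an optimal tour.

```python
# Nearest-neighbor heuristic for a small symmetric TSP instance (invented data).
import math

coords = {"A": (0, 0), "B": (2, 1), "C": (1, 3), "D": (4, 2)}

def dist(u, v):
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x1 - x2, y1 - y2)

def nearest_neighbor_tour(start="A"):
    unvisited = set(coords) - {start}
    tour, current = [start], start
    while unvisited:
        nxt = min(unvisited, key=lambda v: dist(current, v))  # closest unvisited node
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    tour.append(start)           # return to the starting point
    length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
    return tour, length

print(nearest_neighbor_tour())
```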
2.2 The Orienteering Problem The OP is a special form of the TSP [47], where the origin and destination do not have to be identical. All locations are associated with a profit and may be visited at most once. The OP is also referred to as the Selective TSP [50]. The aim is to maximize the overall profit gained on a single tour limited by constraints, such as time and money [82] (see Fig. 6.1a). Hence, it can be used to model the aforementioned scenarios in which the user expects a route containing some of the most interesting locations without violating constraints. The majority of tourist recommendation literature uses the OP and its variants to model the TTDP [36]. In the following, we present specializations of the OP, which serve as more complex models for the TTDP. Furthermore, we briefly present some algorithms and heuristics solving these problems. The goal of the Team Orienteering Problem (TOP) is to find k routes at the same time maximizing the total profit of all routes [21] (see Fig. 6.1b). The name of this problem is derived from a team in which each team member selects one route in an attempt to avoid overlaps of the locations visited by each team member. In a tourism scenario, a team member is commonly interpreted as one day in a multiday trip. Exact algorithms solving the TOP have been proposed [18], but the main research focus is on developing faster heuristics to make it applicable in real-world scenarios. The
Figure 6.1: Example paths from a start S to a destination D solving (a) the OP and (b) the TOP with k = 3 teams. The numbers denote the locations' scores.
first heuristic, MAXIMP, was introduced by Butt and Cavalier [17]. Souffriau et al. propose the Greedy Randomized Adaptive Search Procedure (GRASP) for the TOP [76] and subsequently improve their work with a Path Relinking extension to the GRASP algorithm [77]. Friggstad et al. present efficient algorithms to solve the TOP, not by optimizing for the overall benefit, but by maximizing the benefit of the worst day, which leads to similar user satisfaction as curated lists by travel experts [30]. In the Orienteering Problem with Time Windows (OPTW), each location can only be visited within a defined time window [46]. These time windows can represent the opening hours of an attraction. Kantor and Rosenwein [46] were the first to solve the OPTW [84]. If the TOP is extended by time windows, it is called the Team Orienteering Problem with Time Windows (TOPTW) [85]. Vansteenwegen et al. developed an iterated local search (ILS) heuristic solving the TOPTW [85]. In a recent approach, a three-component heuristic for the TOPTW is proposed [43]. The Time-Dependent Orienteering Problem (TDOP) assumes that the time needed to travel between two locations depends on the time the traveler leaves the first location [29]. This extension can be used to model different modes of transportation in a tourist trip recommendation. For example, a tourist can leave a location later than planned when a bus connection to the next location is available; hence, the traveling time between the two locations decreases. Fomin and Lingas provide a (2 + ϵ)-approximation algorithm to solve the TDOP [29]. Combining the TDOP with time windows and multiple routes leads to the Time-Dependent Team Orienteering Problem with Time Windows (TDTOPTW) [31]. Garcia et al. developed different heuristics solving the TDTOPTW [31, 32]. The Multi Constrained Team Orienteering Problem with Time Windows (MCTOPTW) introduces additional thresholds besides the time budget, which a path is not supposed to exceed [33]. A common example of such a constraint is money. In this case, the vertices come with a fixed cost and the routing algorithm has to find a path that exceeds neither the financial threshold nor the time budget. Garcia et al. were the first to solve the MCTOPTW [33] using a meta-heuristic based on the ILS heuristic of Vansteenwegen et al. [85]. Souffriau et al. extend the MCTOPTW to the Multi Con-
strained Team Orienteering Problem with Multiple Time Windows (MCTOPMTW) [78], which allows defining different time windows on different days and more than one time window per day. Some problems dynamically assign the reward of visiting a location, depending on certain events or the order in which the attractions are visited. This allows a more realistic modeling of the TTDP. In the Generalized Orienteering Problem (GOP), every location is assigned multiple scores representing different goals of the visitor [37]. Hence, the user's travel purpose can be modeled. The objective function in the GOP is non-linear; thus, it penalizes paths including two similar attractions, such as two restaurants in a row [40]. In the Team Orienteering Problem with Decreasing Profits (DPTOP), the profit of each node decreases with time [2]. Hence, the DPTOP can be used to model a variant of the TTDP where the value of locations for the user is lower the later the traveler arrives there. In the Clustered Orienteering Problem (COP), the score of a node can only be gained if all nodes of a group of nodes are part of the path [4]. The Orienteering Problem with Stochastic Profits (OPSP) assumes that the locations' profits are stochastic with a known distribution and their values are not revealed before the locations are visited [44]. Campbell et al. [19] introduce the Orienteering Problem with Stochastic Travel and Service Times (OPSTS), in which the traveler is penalized if they do not reach a location before a deadline. To date, OP research has mostly focused on the creation of tourist trips for single users. Recent work introduces a variant of the OP which tries to find routes for a group of users [80]. However, an investigation of which recommendation strategies work best to find a consensus among group members is still missing.
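For intuition, the following Python sketch solves a toy OP instance with a greedy insertion rule: it repeatedly inserts the POI with the best score gain per extra minute of travel as long as the time budget permits. The POI scores and travel times are invented, and the heuristic is purely illustrative; practical systems rely on the more elaborate ILS- and GRASP-style methods discussed above.

```python
# Greedy insertion heuristic for a toy Orienteering Problem (invented data).
pois = {"museum": 8, "park": 5, "castle": 9, "market": 4}   # profit per POI
travel_minutes = {   # symmetric travel times, incl. start S and destination D
    ("S", "museum"): 10, ("S", "park"): 5, ("S", "castle"): 20, ("S", "market"): 8,
    ("museum", "park"): 7, ("museum", "castle"): 15, ("museum", "market"): 6,
    ("park", "castle"): 18, ("park", "market"): 9, ("castle", "market"): 12,
    ("museum", "D"): 10, ("park", "D"): 12, ("castle", "D"): 5, ("market", "D"): 11,
    ("S", "D"): 15,
}

def t(a, b):
    return travel_minutes.get((a, b), travel_minutes.get((b, a)))

def route_time(route):
    return sum(t(a, b) for a, b in zip(route, route[1:]))

def greedy_op(budget_minutes):
    route, remaining = ["S", "D"], set(pois)
    while True:
        best = None   # (score per extra minute, insert position, poi)
        for poi in remaining:
            for pos in range(1, len(route)):
                candidate = route[:pos] + [poi] + route[pos:]
                if route_time(candidate) > budget_minutes:
                    continue
                extra = route_time(candidate) - route_time(route)
                ratio = pois[poi] / max(extra, 1)
                if best is None or ratio > best[0]:
                    best = (ratio, pos, poi)
        if best is None:   # no further POI fits into the budget
            return route, sum(pois[p] for p in route if p in pois)
        _, pos, poi = best
        route.insert(pos, poi)
        remaining.remove(poi)

print(greedy_op(budget_minutes=45))
```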
2.3 The Knapsack Problem The KP offers a different perspective on the TTDP. It is useful when the costs of travel between the locations are unknown or negligible. In the traditional 0–1 KP, each travel item can be packed into a bag with limited size, i. e., be part of the final recommendation, exactly once or not at all. Each item comes with a profit and a cost that are independent of the other items and the route to be taken. While the profits are to be maximized, the sum of costs is not allowed to exceed the knapsack's capacity [48]. It should be noted that when the KP is used to select items, the result is an unordered set of items. To obtain a route, the items can be fed into a TSP solver after the KP algorithm has selected the locations that have to be visited. Hence, the problem is not suitable when the routing costs between the items are important. A variant of the KP is the Oregon Trail Knapsack Problem (OTKP) [16]. In this formulation, the value of a region is not only determined by the user query, but also depends on the presence or absence of other regions in the recommended composite trip. This makes it possible to penalize travel items that do not fit together. The OTKP has been
used to improve item diversity by decreasing the values of similar items in the same trip [41].
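As a concrete illustration of the traditional 0–1 KP described above, the following Python sketch applies the textbook dynamic program to a toy set of travel items; the profits and costs are invented, and, as noted in the text, routing costs between the items are ignored.

```python
# Textbook dynamic program for the 0-1 Knapsack Problem with toy travel items.
items = [  # (name, profit, cost in days)
    ("safari", 9, 4), ("city trip", 5, 2), ("beach week", 6, 5), ("museum day", 3, 1),
]

def knapsack(items, capacity):
    # best[c] = (total profit, chosen item names) achievable with a budget of c days
    best = [(0, [])] * (capacity + 1)
    for name, profit, cost in items:
        for c in range(capacity, cost - 1, -1):   # iterate downwards: each item used at most once
            cand = (best[c - cost][0] + profit, best[c - cost][1] + [name])
            if cand[0] > best[c][0]:
                best[c] = cand
    return best[capacity]

print(knapsack(items, capacity=7))   # e.g. a 7-day vacation budget
```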
3 State of the art in tourist trip recommendations Having defined and explained the TTDP as a mathematical problem, we now present the state of the art in tourist trip recommendations. We review several approaches for recommending different types of travel-related items and present practical applications solving the TTDP. Furthermore, we focus on two topics particularly important for the development of tourist trip RSs: item diversity (Section 3.2) and efforts to learn from past trips (Section 3.3). The summary of the state of the art in tourist trip recommendations helps us to identify current and future challenges in the development of next generation tourist trip RSs.
3.1 Recommender systems in tourism Most RSs in tourism recommend ranked lists of single items. We refine the categorization of items recommended by tourism RSs introduced by Borràs et al. [11] and Gavalas et al. [35] as follows: (i) a set of travel items, such as multiple POIs or travel destinations, (ii) a travel plan (also called travel bag or travel bundle) combining coherent travel items, such as destinations, activities, accommodation, and other services, in one recommendation, and (iii) a sequence of items, such as a sequence of POIs along an enjoyable route for a single- or multiday trip. In the following, we present some of the most important examples in each of these categories. 3.1.1 Recommendation of sets of travel items GUIDE is a tourist guide recommending POIs considering different personal and environmental context factors, such as the user’s age or the time of the day [23]. Another example of a context-aware RS for POIs is South Tyrol Suggests (STS), which takes various context factors, such as the weather, into account to recommend POIs in South Tyrol, Italy [12]. Furthermore, a personality questionnaire is used to mitigate the cold start problem. Benouaret and Lenne recently presented an RS for travel packages, where each package is composed of a set of different POIs [9]. Another promising idea to find POIs is presented by Baral and Li [8], but it has not yet been implemented in a practical application. Their approach combines different aspects of check-in information in location-based social networks (LBSNs), such as categorical, temporal, social, and spatial information in one model to predict the most potential check-in locations.
In previous work, we developed a system for combining travel regions to recommend a composite trip [41, 91]. The user is asked to specify their interests, e. g., nature and wildlife, beach, or winter sports, along with potential travel regions and monetary and temporal limitations. The underlying problem for picking the regions is the OTKP [16]. For every additional week the user stays in the same region, the region's score decreases. The destination information is modeled manually based on expert knowledge. Messaoud et al. extend this approach by focusing on the diversity of activities within a composite trip [60]. They use hierarchical clustering to improve the heterogeneity of activities. The underlying dataset is the same as in the original approach [41], but extended by seasonal activities that have been rated with respect to specific regions and traveler types.
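To illustrate the idea of a decreasing score per additional week in the same region, the following short sketch assigns a diminishing profit to each consecutive week; the region names, base scores, and the decay factor are invented for this example and do not reproduce the scoring used in [41].

```python
# Illustration of decreasing marginal value for extra weeks in the same region.
# Region scores and the 0.7 decay factor are invented for this example.
base_score = {"Andalusia": 10.0, "Tuscany": 8.0}

def week_profit(region, week_in_region, decay=0.7):
    """Score contributed by the n-th consecutive week in a region (n starts at 1)."""
    return base_score[region] * decay ** (week_in_region - 1)

plan = ["Andalusia", "Andalusia", "Tuscany"]   # a three-week composite trip
total, counts = 0.0, {}
for region in plan:
    counts[region] = counts.get(region, 0) + 1
    total += week_profit(region, counts[region])
print(round(total, 2))   # 10 + 7 + 8 = 25.0
```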
3.1.2 Recommendation of tourist plans Lenz was first to develop a case-based RS for holiday trips [53]. The case description of CABATA contains features such as the type of holiday, the travel region, or intended means of transportation. The case solutions present recommendations that fulfill all user requirements and others that are at least similar to the user query. CABATA is a prototypically implemented part of an architecture for travel agent systems called IMTAS and was presented to the public in 1994 [54]. Other early examples of case-based travel RSs for tourist plans are DieToRecs [68] and Trip@dvice [87]. DieToRecs allows the recommendation of single items, such as destinations or hotels, and bundling of travel items for a personalized travel plan [68]. The case base of Trip@dvice contains travel plans created by the community [87]. It has been selected by the European Union and by the European Travel Commission as a travel RS in the European tourism destination portal http://visiteurope.com. Ricci presents TripMatcher and VacationCoach [69], two of the early travel RSs using content-based approaches to match the user preferences with potential destinations. VacationCoach explicitly asks the user to choose a suitable traveler type, such as culture creature or beach bum. TripMatcher uses statistics on past user queries and guesses the importance of attributes not explicitly mentioned by the user to come up with recommendations. A conversational RS for travel planning is presented by Mahmood et al. [56].
3.1.3 Recommendation of sequences of travel items The aforementioned context-aware RS GUIDE was also one of the first applications that recommend personalized tourist trips [23]. For this purpose, the user has to choose POIs they would like to visit. Then, the system calculates a route considering contextual information, such as the opening hours of the selected POIs. GUIDE is also able to update the recommended routes dynamically when the user decides to
stay longer than planned at a POI. De Choudhury et al. use photo streams to estimate where users were and how much time they spent at a POI and traveling between POIs [24]. Based on this information, their approach creates a POI graph and recommends tourist trips. Another solution using photos is presented by Brilhante et al. Their application, TripBuilder, uses unsupervised learning for mining common patterns of movements of tourists in a given geographic area [14]. City Trip Planner is a web application that recommends multiday tourist trips [83]. It respects certain limitations, like opening hours, and can also include a lunch break in the trip. Gavalas et al. present Scenic Athens, a context-aware, mobile tourist guide for personalized tourist trip recommendations in Athens, Greece [34]. Compared to similar applications, Scenic Athens can also incorporate scenic routes into the trip recommendations. Quercia et al. introduce a different approach for route recommendation [67]. Instead of recommending shortest paths between two locations or maximizing attraction values of POIs, their trip recommender suggests routes that are perceived as pleasant. The authors collect crowd-sourced ratings to identify pleasant routes. Often users are in need of recommendations when they are already on the go. Thus, several smartphone applications for tourist trip planning have been published in recent years. Google Trips (https://get.google.com/trips/) offers a day plans functionality which suggests thematic tourist trips such as The Museum Mile in New York. In addition, it automatically collects reservations and booking confirmations from the user's Gmail account to collect all travel-related items in the app. In our previous work, we developed TourRec, a mobile trip RS, which uses a multitier web service to recommend tourist trips composed of multiple POIs [38, 52]. It allows the user to rate different categories, such as Food or Outdoors and Recreation, on a scale from 0 to 5. The higher a category's rating is, the more likely POIs of this category are to appear in the recommended trip. Furthermore, the user has the option to overwrite the ratings of subcategories. For instance, users can rate all Food POIs with a 0, but rate cafés with a 5 if they want to avoid restaurants, but not cafés. Then, the user specifies an origin, a destination, the starting time, and the maximum duration of the trip to request a new recommendation. Based on this information, TourRec calculates a route and visualizes it as a list or on the map (see Fig. 6.2). The recommendations are context-aware, which is a particular challenge when sequences instead of single items are recommended [51]. We present this challenge and our first own solutions in Section 4.1. TourRec is publicly available for Android smartphones (https://tourrec.cm.in.tum.de). The examples discussed in this section recommend tourist trips to individuals. However, in practice, users often travel in groups. While approaches to recommend lists of POIs or travel packages to groups of users have been developed, to the best of
Figure 6.2: User interfaces of the TourRec application allowing the user to (a) request a new trip recommendation and (b) display it on a map or in a list view.
our knowledge, there is no tourist trip RS for groups. We explain the importance of RSs for groups and the challenges linked to their research in detail in Section 4.2.
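To make the category-based preference model described above for TourRec a bit more concrete, the following small sketch shows one plausible way to resolve a POI's preference score from category ratings with subcategory overrides; the data, the neutral default, and the lookup logic are simplified assumptions for illustration and not the actual TourRec implementation.

```python
# Simplified sketch of category preferences with subcategory overrides,
# loosely following the TourRec description above; not the actual implementation.
category_rating = {"Food": 0, "Outdoors and Recreation": 5}
subcategory_rating = {"Café": 5}          # overrides the parent category "Food"

def preference(poi_category, poi_subcategory):
    """Subcategory ratings take precedence over the parent category rating."""
    if poi_subcategory in subcategory_rating:
        return subcategory_rating[poi_subcategory]
    return category_rating.get(poi_category, 3)   # assumed neutral default of 3

print(preference("Food", "Restaurant"))                # 0 -> restaurants are avoided
print(preference("Food", "Café"))                      # 5 -> cafés remain attractive
print(preference("Outdoors and Recreation", "Park"))   # 5
```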
3.2 Item diversity One natural requirement for a RS for composite trips is to adjust the diversity of the items to the user’s preferences [20]. In typical tourism RSs, where only n out of k items can be recommended, the algorithm should consider skipping some high-ranked recommendations in favor of variety in the recommended itinerary. For example, even if the user’s preference model shows high interest for one specific activity, such as culinary attractions, it is suboptimal to only recommend numerous restaurants and cafés. Wu et al. present a study [90] in which they assess user satisfaction based on the diversity level of the recommendations. They use the Five-Factor Model for personality [59] to capture the user’s need for diversity. That model is commonly used in RSs to personalize recommendations [62, 81]. Another strategy to improve recommendation diversity is to group items by their features via clustering algorithms. Messaoud et al. propose a variety seeking model using semantic hierarchical clustering to establish diversity in a set of recommended activities [60]. Diversity can also be expressed
as a constraint. Savir et al. measure the diversity level based on attraction types and ensure that the trip diversity level is above a defined threshold [71]. This level of variety needs to be personalized to fit the user's needs. Some travelers want to obtain an exhaustive impression of the travel destination; others who are up for relaxing at the beach do not profit from diverse recommendations. Therefore, the level of expected diversity within the trip must be elicited and incorporated into the user model. Furthermore, the recommendations should not only be diverse, but be calibrated to reflect the user's various interests. If the user has visited 60 % outdoor, 30 % culture, and 10 % entertainment attractions, recommendations should reflect this distribution. To address this problem, Steck has proposed a re-ranking algorithm in the domain of music recommendations [79].
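In the spirit of such calibrated re-ranking, the following Python sketch greedily builds a list whose category distribution stays close to the 60/30/10 target mentioned above; the candidate pool, the absolute-difference distance, and the tie-breaking by relevance are simplified assumptions and do not reproduce the original method of [79], which trades off relevance and calibration explicitly.

```python
# Greedy calibration sketch: keep the category distribution of the selected list
# close to the user's historical interest distribution (invented candidate data).
target = {"outdoor": 0.6, "culture": 0.3, "entertainment": 0.1}
candidates = [  # (POI, category, relevance score)
    ("lake hike", "outdoor", 0.9), ("old town walk", "culture", 0.85),
    ("ridge trail", "outdoor", 0.8), ("opera", "culture", 0.7),
    ("city park", "outdoor", 0.75), ("casino", "entertainment", 0.5),
]

def distribution(selection):
    counts = {c: 0 for c in target}
    for _, cat, _ in selection:
        counts[cat] += 1
    n = max(len(selection), 1)
    return {c: counts[c] / n for c in counts}

def miscalibration(selection):
    dist = distribution(selection)
    return sum(abs(target[c] - dist[c]) for c in target)

def calibrated_top_k(k=5):
    selected, pool = [], list(candidates)
    while len(selected) < k and pool:
        # choose the candidate that minimizes miscalibration; break ties by relevance
        best = min(pool, key=lambda c: (miscalibration(selected + [c]), -c[2]))
        selected.append(best)
        pool.remove(best)
    return [name for name, _, _ in selected]

print(calibrated_top_k())
```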
3.3 Learning from past tourist trips With nowadays’ ubiquity of GPS modules in mobile phones, a vast amount of spatiotemporal data is being collected. Such data become publicly available if users choose to publish them, as often done in LBSNs. The general adoption of LBSNs opened many opportunities for researchers to analyze human mobility in general and in combination with their social activities, such as traveling. Capturing the mobility and social ties of users in mathematical models is the basis to derive features that can then be used to influence the ranking and composition of items. Song et al. develop and evaluate mathematical models for human mobility and its predictability [74, 75]. Further analysis of human mobility in LBSNs reveals that not only geographic and economic constraints affect mobility patterns, but also the individual social status [22]. However, an analysis of another LBSN, Gowalla, shows that the number of check-ins and the number of places a user has visited follow log-normal distributions, while connecting to friends is better described by a double Pareto law [72]. LBSN data have also been used to capture cross-border movement [10]. The authors demonstrate how mobility dynamics of people in a country can be analyzed; however, this study is not about tourists and is limited to one country, Kenya. Noulas et al. analyze activity patterns of Foursquare users in urban areas, like the spatial and temporal distances between two check-ins [63]. They uncover recurring patterns of human mobility that can be used to predict or recommend future locations of users. Data from LBSNs has already been analyzed to improve RSs [7]. This is not surprising since the user’s locations and social graph tells a lot about individual preferences. Spatial co-occurrences have been used to identify similar users and generate implicit ratings for collaborative filtering algorithms [94]. In another approach [6], travelers in a foreign city are matched to local experts based on their respective home behavior to recommend Foursquare venues. Hsieh et al. use past LBSN data to recommend travel paths along POIs in cities [42]. For this purpose, they present solutions to derive the popularity, the proper time of day to visit, the transit time between venues, and the
best order to visit the places. In our previous work, we mined trips from Foursquare check-ins to analyze global travel patterns [27]. These patterns reveal the distribution of the duration of stay per country and which countries are frequently visited together.
4 Current and future challenges

In the following, we present some of the most important challenges in the development of next-generation tourist trip RSs, which we identified from previous work (see Section 3). Furthermore, we present ideas of how to tackle these challenges and first results from our own work.
4.1 Context-aware tourist trip recommendations

Incorporating contextual information into the recommendation process can have a significant effect on the quality of a recommendation [1]. It is particularly important for mobile tourist RSs [35], which have to adapt the recommendations to the current weather, for example. Existing research focuses on context-aware recommendations of single POIs. When recommending itineraries, additional context factors have to be considered. In our previous work, we explored two context factors particularly important for sequences of POIs: the time of day and previously visited POIs [51]. The optimal time to visit a POI influences the order of POIs in a tourist trip. For example, a restaurant receives a higher score when recommended during lunchtime or in the evening than in the early morning. Furthermore, the previously visited POIs influence the perceived quality of a POI in a trip. When a proposed trip contains two restaurant recommendations in a row, the second restaurant may not be appreciated by the user. Our suggested approach is to calculate the influence of a previously visited POI on a candidate POI subject to the number of POIs visited in between. This influence is called item dependence [40]. We conducted an online questionnaire to determine the influence of the context factors on the user's decision to visit a POI and the ratings of the POIs under these conditions [51]. The results of the questionnaire show that the time of day is highly relevant for a music event, while the previously visited POI is less important, for example. Furthermore, shopping POIs receive a very high rating when visited in the afternoon, and music events are not appreciated in the morning. We integrated the context factors for sequences as well as other factors, such as weather, into the trip composition algorithms of our mobile application TourRec. Fig. 6.2b illustrates the context awareness of TourRec. The weather data are directly presented using an icon and the expected temperature range. The previously visited POI and the time of day context factors are considered in the recommended trip.
A museum is recommended during a cold and cloudy morning. Then, a restaurant is suggested for lunch. We compared the quality of the context-aware recommendations to the previous version, which ignores contextual information, in a small user study [51]. The context-aware version outperformed the baseline version, especially in terms of diversity and recommending POIs at suitable times of the day. However, one disadvantage of our initial approach was that two or more extreme contextual conditions could cancel each other out due to the weighted arithmetic mean: an outdoor activity can be recommended even if it rains and storms, as long as the other context factors are very positive for a specific POI. We tackle this issue by defining thresholds for every context factor. If a context factor exceeds its threshold value, the corresponding POI is not considered for a recommendation.
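The following sketch illustrates this combination of a weighted mean with per-factor vetoes. It is a simplified illustration, not the actual TourRec scoring code; the factor names, weights, ratings, and threshold values are assumptions chosen for the example, and the threshold rule is realized here as a minimum acceptable rating per factor.

```python
def context_score(poi, weights, thresholds):
    """Combine per-context-factor ratings of a POI with a weighted mean,
    but veto the POI if any factor falls below its acceptance threshold."""
    ratings = poi["context_ratings"]
    # Hard threshold: one unacceptable factor (e.g., a storm for an
    # outdoor activity) removes the POI from consideration entirely.
    for factor, minimum in thresholds.items():
        if ratings.get(factor, 1.0) < minimum:
            return None
    total_weight = sum(weights.values())
    return sum(weights[f] * ratings.get(f, 0.5) for f in weights) / total_weight

weights = {"weather": 0.4, "time_of_day": 0.3, "item_dependence": 0.3}
thresholds = {"weather": 0.2}  # very bad weather makes a POI unacceptable

beer_garden = {"context_ratings": {"weather": 0.1, "time_of_day": 0.9,
                                   "item_dependence": 0.8}}
museum = {"context_ratings": {"weather": 0.9, "time_of_day": 0.7,
                              "item_dependence": 0.6}}
print(context_score(beer_garden, weights, thresholds))  # None: vetoed by the weather factor
print(context_score(museum, weights, thresholds))       # 0.75: weighted mean of the ratings
```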
4.2 Group recommendations

The vast majority of work in the field of tourist trip RSs focuses on recommendations for single users, but in practice tourists often travel in groups. A group recommender system (GRS) for tourist trips has to consider the preferences and constraints of all group members. GRSs have been applied in various domains such as movies, music, or news [57]. In the tourism domain, applications have been presented that suggest lists of POIs [5] or travel packages tailored for groups [58]. To the best of our knowledge, RSs generating tourist trips composed of multiple POIs along enjoyable routes only exist for single users, but not for groups of users. A GRS for tourist trips on mobile devices can be implemented in various ways that can be differentiated by the way consensus is established [39]. For example, the group can use only one device. In this scenario, one person of the group has to enter the group's preferences before a recommendation can be made. Existing applications can be used by groups without any additional development effort; however, the group must agree on the preferences on their own, which can be a difficult and time-consuming task. Another option is to have one device per group member, allowing the group members to state their preferences separately. On the one hand, hiding preferences from other group members can avoid manipulation and social embarrassment [45]. On the other hand, an open discussion can be impeded, and a strategy to aggregate the individual preferences is required. Finally, a group can also interact with a mutual display, which may facilitate an open discussion between the group members to establish consensus. We present the challenges and opportunities of public displays in RSs in Section 4.5. When preferences are collected separately, a group recommendation can be generated by combining each user's individual recommendations into one single group recommendation or by aggregating the distinct preferences into a recommendation for a so-called virtual user.
Many preference aggregation strategies are inspired by Social Choice Theory [57]. Simple approaches, such as calculating the average of user preferences, are easy to implement but can lead to unhappy group members when one person dislikes an item that is liked by the majority of the group. Other strategies filter out items that are disliked by at least one person or assign different weights to users. In this case, the preferences of an important person, e. g., a child, can be prioritized. Research has shown that there is no perfect way to aggregate the individual preferences. Instead, the group's intrinsic characteristics and the nature of the problem have to be considered [25]. We started to extend the TourRec application with group recommendations. The idea is to allow each user to state their preferences separately on a personal device and then send the group recommendation back to every user. We use Google Nearby Connections (https://developers.google.com/nearby/connections/overview), which relies on Bluetooth and Wi-Fi, to find other devices in the vicinity and to establish a connection between devices running TourRec [38]. We implemented solutions for both group recommendation strategies: aggregating user preferences and merging recommendations. We integrated the Average, Average without Misery, and Most Pleasure preference aggregation strategies into the TourRec application; a small sketch of these strategies follows below. Furthermore, we developed different approaches where individual recommendations are merged into one trip for the group. These approaches first calculate one route for each group member using the individual's travel preferences. Then, a group recommendation is generated using a social choice strategy. Only POIs that are part of at least one of these individual recommendations are candidates for a recommendation. POIs that are part of more than one individual recommendation receive a greater weight, which makes it more likely that they become part of the aggregated route recommended to the group. Another approach selects segments (e. g., three POIs in a row) from every individual recommendation. In addition, we developed an alternative approach that considers splitting the group for some time so that each homogeneous subgroup can pursue its interests before merging back with the other group members. A major drawback of previous research in the field of GRSs is that studies often use synthetic groups, which can lead to falsified results [57]. This is why we plan to conduct a user study evaluating our group recommendation strategies with real groups and to analyze how different social-psychological conditions, such as the group type, influence the choice of the recommendation strategy.
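As a minimal sketch (not the TourRec implementation), the three aggregation strategies can be expressed as functions over the group members' ratings of a candidate POI; the POI names, rating scale, and misery threshold below are illustrative assumptions.

```python
def average(ratings):
    """Average strategy: mean of the group members' ratings."""
    return sum(ratings) / len(ratings)

def average_without_misery(ratings, misery_threshold=2):
    """Average without Misery: items that any member rates below the
    threshold are excluded (signalled here by returning None)."""
    if min(ratings) < misery_threshold:
        return None
    return sum(ratings) / len(ratings)

def most_pleasure(ratings):
    """Most Pleasure: the highest individual rating decides."""
    return max(ratings)

# Illustrative ratings (scale 1-5) of three group members for three POIs.
group_ratings = {"castle": [5, 4, 4], "club": [5, 5, 1], "park": [3, 3, 3]}
for strategy in (average, average_without_misery, most_pleasure):
    scores = {poi: strategy(r) for poi, r in group_ratings.items()}
    ranked = sorted((p for p, s in scores.items() if s is not None),
                    key=lambda p: scores[p], reverse=True)
    print(strategy.__name__, ranked)  # each strategy yields a different group ranking
```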
4.3 Duration of item consumption

Traditionally, RSs came up with an ordered list of recommendations, of which the user would choose one. When it comes to RSs for composite trips, typically several items are recommended in sequence.
In such a scenario, it is worthwhile to suggest a duration of stay for each attraction, instead of proposing equal amounts of time [26]. When recommending POIs in urban areas, service providers like Google already have rich information about the distribution of the durations of stay. Therefore, a first refinement step would be to assign the median visit time of each venue as its duration of stay [30]. In Section 3.1, we discussed several approaches to plan an optimal trip in a city within temporal constraints. However, to the best of our knowledge, none adjusts the durations of stay with respect to the personal fit of the venue. If the distribution of the duration of stay shows a high variance, the recommendations should be personalized so that the duration of stay is prolonged if the venue has a high score, and shortened otherwise. In our travel region RS, we propose to gradually decrease the score of a region by 5–10 % per week, selecting the next region from the result list as soon as its score surpasses the one of the former [41]. While this score adjustment is already a form of personalization of the duration of item consumption, it is indirect and, given the weekly interval, quite coarse-grained. An ideal solution would consider three aspects to calculate the duration of item consumption: first, the typical time needed to visit one location; second, the personalized score of the location; and finally, further context information such as the total trip time and the type of traveler. The first two are already widely used in tourist RSs; however, the third is largely unexplored, mainly because it requires more information about the user. A recent approach investigates typical tourist travel patterns on the country level with a focus on the durations of stay [27]. While the concrete approach lacks generality, such mobility patterns can be used to derive typical durations of stay. Based on this approach, a cluster analysis of the mobility patterns of global trips revealed four groups of travelers [28]. To determine the pace at which a user usually travels, their past trips could be derived from an LBSN they are using and then classified into one of these groups.
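A minimal sketch of the first two aspects could scale a venue's typical (median) visit time by its personalized score; the scaling range used here is an assumption, not a value from the chapter.

```python
def personalized_duration(median_minutes, score, min_factor=0.6, max_factor=1.4):
    """Scale the typical (median) visit duration of a venue by the
    personalized score in [0, 1]: high-scoring venues get more time,
    low-scoring venues less."""
    factor = min_factor + (max_factor - min_factor) * score
    return round(median_minutes * factor)

# A museum with a median visit time of 90 minutes:
print(personalized_duration(90, score=0.9))  # prolonged stay for a good personal fit
print(personalized_duration(90, score=0.2))  # shortened stay for a poor fit
```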
4.4 Deriving tourist mobility patterns

In Section 3.3, we have already discussed how past trips can be used to improve recommendations for new users. We argue that in the future this avenue should be pursued to learn about realistic trips and user preferences. However, the evaluations of many contributions discussed in this chapter are rarely based on field data, and almost none of the suggested algorithms is used within a commercial context. As researchers in the area of tourist trip recommendations we should ask ourselves: "Would we realistically go on a trip as recommended by our approach?" So, if the user requests a city trip of five hours, is it appropriate to recommend 36 items, just because the algorithm computed that in theory somebody could visit those in the given time frame?
Tourist mobility patterns can be derived by analyzing traveler trajectories [27]. Single trips can be aggregated and characterized with metrics, such as the number of visited places in a certain time, the photos taken, and the modes of transportation used. Also, typical routes can be derived to learn how tourists move in a city. This information could be used to verify that the recommended trips lie within a 90 % interval of the analyzed past trips. Data sources can be LBSNs [7, 27], regional smartphone applications [13], but also image meta-data [93]. Furthermore, tourist mobility patterns help to understand the dynamics of the durations of stay, as discussed in Section 4.3.
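Such a plausibility check could be sketched as follows: compute an empirical interval of a trip metric (e.g., POIs visited per hour) from mined past trips and flag recommendations that fall outside of it. The metric, the data values, and the simple percentile handling are illustrative assumptions.

```python
def plausibility_interval(past_values, coverage=0.9):
    """Central interval covering roughly `coverage` of the observed metric
    values (e.g., POIs visited per hour) from mined past trips."""
    values = sorted(past_values)
    tail = (1 - coverage) / 2
    lo = values[int(tail * (len(values) - 1))]
    hi = values[int((1 - tail) * (len(values) - 1))]
    return lo, hi

def is_realistic(recommended_value, past_values):
    lo, hi = plausibility_interval(past_values)
    return lo <= recommended_value <= hi

# Hypothetical mined data: POIs visited per hour on past city trips.
past_pois_per_hour = [0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.5]
print(is_realistic(36 / 5, past_pois_per_hour))  # 7.2 POIs/hour -> False, unrealistic
print(is_realistic(6 / 5, past_pois_per_hour))   # 1.2 POIs/hour -> True
```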
4.5 Tourist trip recommendations on public displays

Public displays are ubiquitous today. They are used to display arrival and departure times of public transport, weather reports, news, or advertisements. While these applications are examples of public displays with an information-only purpose, advances in technology enable interactive and personalized content. For example, the content can be personalized based on the user's stereotypes [64]. Interactive displays in shopping malls can highlight relevant shops the user is interested in, instead of only showing static maps [55]. Tourist trip RSs on personal mobile devices, such as the TourRec application, can be used in conjunction with kiosk systems that are deployed in touristic areas [39]. The integration of public displays is particularly interesting for group travelers since they provide a larger, mutual screen for all group members, facilitating the discussion among the group members and hence finding consensus. The sizes of public displays range from small TV screens for displaying information, such as visitor information in museums, to large multiuser wall displays in public spaces [66]. Besides size, public displays can be differentiated based on their interaction paradigms. Users can either directly interact with the touchscreen or keys attached to the display, or they can use speech or gestures captured by cameras [61]. Two factors may prevent people from interacting with a public display: social embarrassment and privacy. Public display applications have to be designed in a way that attracts users to interact with the display while protecting their information when entering personal data. When designing interactive systems, shoulder-surfing, where passersby can identify sensitive content not meant for their eyes, such as a recommendation for the next POI the user should visit, should be prevented [15]. The user can pair his or her mobile phone with the public display to reduce the effects of shoulder-surfing by keeping the individual travel preferences on the personal device and displaying only the group recommendation on the public screen. It has been shown that using a mobile device to enter personal information is a promising solution to overcome privacy issues [3]. While many RSs for mobile devices have been developed and evaluated, only few examples use public displays or hybrid approaches combining smartphones and public displays.
We started to integrate public displays into the TourRec application [39]. Compared to the smartphone version, the public display application can show all relevant information on one single screen, as shown in Fig. 6.3. We will extend our prototypes integrating public displays to enable group recommendations and will use them to investigate the effects of public displays on group recommendations, since to the best of our knowledge, there is no rigorous evaluation of different user interaction techniques for tourist trip GRSs in the literature.
Figure 6.3: The TourRec application running on a kiosk system.
5 Conclusions

In this chapter, we presented foundations, the current state of the art, and ongoing research challenges concerning the TTDP. Compared to related work, we use a broader definition of the TTDP. Instead of only using the OP and its variants to model the TTDP, we showed how other mathematical problems, namely the TSP and KP, can be used to recommend tourist trips and routes at different granularities, such as inner-city routes, road trips, and trips composed of multiple countries. Furthermore, we provided an overview of the state of the art of RSs in tourism, TTDP algorithms, and aspects of item diversity, and we showed how to analyze past tourist trips to come up with more realistic recommendations. The main goal of our work is to make research on recommending tourist trips more applicable in practical scenarios. For this purpose, we highlighted some of the most important challenges in the development of next-generation tourist trip RSs.
In practical applications, algorithms solving the TTDP should not only connect a set of nodes with fixed scores to come up with the best solution from a purely mathematical point of view. Instead, the nodes have to be perceived as travel items, such as POIs or travel regions, with certain characteristics that have to be taken into account when they are recommended to the user. This is why tourist trip RSs solving the TTDP should become more context-aware and take into account factors such as weather, time of day, and previously visited POIs. They should come up with a recommended sequence of POIs that is not only optimized by an algorithm, but also a pleasure to follow. Therefore, we argue that solutions to the TTDP should be evaluated more from the user's perspective. Nowadays, a large amount of data about tourist mobility is available. We argue that analyzing mobility patterns can provide insights into the preferences of individual travelers and the ideal time and duration to visit a given POI or region. Incorporating such information can potentially improve the quality of recommendations. Finally, tourists often travel in groups. Hence, RSs for tourist trips have to consider the preferences of all group members to come up with well-appreciated compromises. Particularly in view of group recommendations, alternative user interfaces for tourist trip recommendations are an important issue. While smartphones are very popular devices for receiving recommendations everywhere, larger displays, such as public displays, should be integrated into the recommendation process to facilitate the discussion among group members.
References
[1] Gediminas Adomavicius and Alexander Tuzhilin. Context-aware recommender systems. In Francesco Ricci, Lior Rokach, and Bracha Shapira, editors, Recommender Systems Handbook, pages 191–226. Springer US, New York, NY, USA, 2015.
[2] H. Murat Afsar and Nacima Labadie. Team orienteering problem with decreasing profits. Electronic Notes in Discrete Mathematics, 41:285–293, 2013.
[3] Florian Alt, Alireza Sahami Shirazi, Thomas Kubitza, and Albrecht Schmidt. Interaction techniques for creating and exchanging content with public displays. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '13, pages 1709–1718, ACM, New York, NY, USA, 2013.
[4] E. Angelelli, C. Archetti, and M. Vindigni. The clustered orienteering problem. European Journal of Operational Research, 238(2):404–414, 2014.
[5] Liliana Ardissono, Anna Goy, Giovanna Petrone, Marino Segnan, and Pietro Torasso. Tailoring the recommendation of tourist information to heterogeneous user groups. In Revised Papers from the International Workshops OHS-7, SC-3, and AH-3 on Hypermedia: Openness, Structural Awareness, and Adaptivity, pages 280–295, Springer-Verlag, London, UK, 2002.
[6] Jie Bao, Yu Zheng, and Mohamed F. Mokbel. Location-based and preference-aware recommendation using sparse geo-social networking data. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL '12, pages 199–208, ACM, New York, NY, USA, 2012.
[7] Jie Bao, Yu Zheng, David Wilkie, and Mohamed Mokbel. Recommendations in location-based social networks: a survey. GeoInformatica, 19(3):525–565, February 2015.
[8] Ramesh Baral and Tao Li. MAPS: A multi aspect personalized POI recommender system. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, pages 281–284, ACM, New York, NY, USA, 2016.
[9] Idir Benouaret and Dominique Lenne. A package recommendation framework for trip planning activities. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys '16, pages 203–206, ACM, New York, NY, USA, 2016.
[10] Justine I. Blanford, Zhuojie Huang, Alexander Savelyev, and Alan M. MacEachren. Geo-located tweets. Enhancing mobility maps and capturing cross-border movement. PLOS ONE, 10(6):1–16, June 2015.
[11] Joan Borràs, Antonio Moreno, and Aida Valls. Intelligent tourism recommender systems: A survey. Expert Systems with Applications, 41(16):7370–7389, 2014.
[12] Matthias Braunhofer, Mehdi Elahi, Mouzhi Ge, and Francesco Ricci. STS: Design of weather-aware mobile recommender systems in tourism. In Proceedings of the 1st Workshop on AI*HCI: Intelligent User Interfaces (AI*HCI 2013), 2013.
[13] Matthias Braunhofer, Mehdi Elahi, and Francesco Ricci. Usability assessment of a context-aware and personality-based mobile recommender system. In International Conference on Electronic Commerce and Web Technologies, pages 77–88. Springer International Publishing, 2014.
[14] Igo Brilhante, Jose Antonio Macedo, Franco Maria Nardini, Raffaele Perego, and Chiara Renso. Where shall we go today?: Planning touristic tours with TripBuilder. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM '13, pages 757–762, ACM, New York, NY, USA, 2013.
[15] Frederik Brudy, David Ledo, Saul Greenberg, and Andreas Butz. Is anyone looking? Mitigating shoulder surfing on public displays through awareness and protection. In Proceedings of The International Symposium on Pervasive Displays, PerDis '14, pages 1:1–1:6, ACM, New York, NY, USA, 2014.
[16] Jennifer J. Burg, John Ainsworth, Brian Casto, and Sheau-Dong Lang. Experiments with the "Oregon Trail Knapsack Problem". Electronic Notes in Discrete Mathematics, 1:26–35, 1999. CP98, Workshop on Large Scale Combinatorial Optimisation and Constraints.
[17] Steven E. Butt and Tom M. Cavalier. A heuristic for the multiple tour maximum collection problem. Computers & Operations Research, 21(1):101–111, January 1994.
[18] Steven E. Butt and David M. Ryan. An optimal solution procedure for the multiple tour maximum collection problem using column generation. Computers & Operations Research, 26(4):427–441, 1999.
[19] Ann M. Campbell, Michel Gendreau, and Barrett W. Thomas. The orienteering problem with stochastic travel and service times. Annals of Operations Research, 186(1):61–81, 2011.
[20] Pablo Castells, Neil J. Hurley, and Saul Vargas. Novelty and diversity in recommender systems. In Francesco Ricci, Lior Rokach, and Bracha Shapira, editors, Recommender Systems Handbook, pages 881–918. Springer US, New York, NY, USA, 2015.
[21] I-Ming Chao, Bruce L. Golden, and Edward A. Wasil. The team orienteering problem. European Journal of Operational Research, 88(3):464–474, 1996.
[22] Zhiyuan Cheng, James Caverlee, Kyumin Lee, and Daniel Z. Sui. Exploring millions of footprints in location sharing services. In Proceedings of the Fifth International Conference on Weblogs and Social Media, ICWSM '11, pages 81–88. AAAI, Palo Alto, CA, USA, July 2011.
[23] Keith Cheverst, Nigel Davies, Keith Mitchell, Adrian Friday, and Christos Efstratiou. Developing a context-aware electronic tourist guide: Some issues and experiences. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '00, pages 17–24, ACM, New York, NY, USA, 2000.
[24] Munmun De Choudhury, Moran Feldman, Sihem Amer-Yahia, Nadav Golbandi, Ronny Lempel, and Cong Yu. Automatic construction of travel itineraries using social breadcrumbs. In Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, HT '10, pages 35–44, ACM, New York, NY, USA, 2010.
[25] Sérgio R. de M. Queiroz and Francisco de A. T. de Carvalho. Making collaborative group recommendations based on modal symbolic data. In Ana L. C. Bazzan and Sofiane Labidi, editors, Advances in Artificial Intelligence – SBIA 2004: 17th Brazilian Symposium on Artificial Intelligence, Sao Luis, Maranhao, Brazil, September 29–October 1, 2004. Proceedings, pages 307–316. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
[26] Linus W. Dietz. Data-driven destination recommender systems. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, UMAP '18, ACM, New York, NY, USA, July 2018.
[27] Linus W. Dietz, Daniel Herzog, and Wolfgang Wörndl. Deriving tourist mobility patterns from check-in data. In Proceedings of the WSDM 2018 Workshop on Learning from User Interactions, Los Angeles, CA, USA, February 2018.
[28] Linus W. Dietz, Rinita Roy, and Wolfgang Wörndl. Characterization of traveler types using check-in data from location-based social networks. In Proceedings of the 26th ENTER eTourism Conference, 2019.
[29] Fedor V. Fomin and Andrzej Lingas. Approximation algorithms for time-dependent orienteering. Information Processing Letters, 83(2):57–62, 2002.
[30] Zachary Friggstad, Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos, Chaitanya Swamy, and Andrew Tomkins. Orienteering algorithms for generating travel itineraries. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, WSDM '18, pages 180–188, ACM, New York, NY, USA, 2018.
[31] Ander Garcia, Olatz Arbelaitz, Maria Teresa Linaza, Pieter Vansteenwegen, and Wouter Souffriau. Personalized tourist route generation. In Proceedings of the 10th International Conference on Current Trends in Web Engineering, ICWE'10, pages 486–497. Springer-Verlag, Berlin, Heidelberg, 2010.
[32] Ander Garcia, Pieter Vansteenwegen, Olatz Arbelaitz, Wouter Souffriau, and Maria Teresa Linaza. Integrating public transportation in personalised electronic tourist guides. Computers & Operations Research, 40(3):758–774, 2013. Transport Scheduling.
[33] Ander Garcia, Pieter Vansteenwegen, Wouter Souffriau, Olatz Arbelaitz, and Maria Linaza. Solving multi constrained team orienteering problems to generate tourist routes. Tech. rep., Centre for Industrial Management / Traffic & Infrastructure, Katholieke Universiteit Leuven, Leuven, Belgium, 2009.
[34] Damianos Gavalas, Vlasios Kasapakis, Charalampos Konstantopoulos, Grammati Pantziou, and Nikolaos Vathis. Scenic route planning for tourists. Personal and Ubiquitous Computing, 1–19, 2016.
[35] Damianos Gavalas, Charalampos Konstantopoulos, Konstantinos Mastakas, and Grammati Pantziou. Mobile recommender systems in tourism. Journal of Network and Computer Applications, 39:319–333, March 2014.
[36] Damianos Gavalas, Charalampos Konstantopoulos, Konstantinos Mastakas, and Grammati Pantziou. A survey on algorithmic approaches for solving tourist trip design problems. Journal of Heuristics, 20(3):291–328, June 2014.
[37] Zong Woo Geem, Chung-Li Tseng, and Yongjin Park. Harmony search for generalized orienteering problem: best touring in China. In Lipo Wang, Ke Chen, and Yew Soon Ong, editors, Advances in Natural Computation, pages 741–750. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005.
[38] Daniel Herzog, Christopher Laß, and Wolfgang Wörndl. TourRec: A tourist trip recommender system for individuals and groups. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys '18, pages 496–497, ACM, New York, NY, USA, 2018.
[39] Daniel Herzog, Nikolaos Promponas-Kefalas, and Wolfgang Wörndl. Integrating public displays into tourist trip recommender systems. In Proceedings of the 3rd Workshop on Recommenders in Tourism co-located with the 12th ACM Conference on Recommender Systems (RecSys '18), pages 18–22, 2018.
[40] Daniel Herzog and Wolfgang Wörndl. Exploiting item dependencies to improve tourist trip recommendations. In Proceedings of the Workshop on Recommenders in Tourism co-located with the 10th ACM Conference on Recommender Systems (RecSys 2016), Boston, MA, USA, September 15, 2016, pages 55–58, 2016.
[41] Daniel Herzog and Wolfgang Wörndl. A travel recommender system for combining multiple travel regions to a composite trip. In CBRecSys@RecSys, volume 1245 of CEUR Workshop Proceedings, pages 42–48, CEUR-WS.org, Foster City, Silicon Valley, California, USA, 2014.
[42] Hsun-Ping Hsieh, Cheng-Te Li, and Shou-De Lin. Exploiting large-scale check-in data to recommend time-sensitive routes. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, UrbComp '12, pages 55–62, ACM, New York, NY, USA, 2012.
[43] Qian Hu and Andrew Lim. An iterative three-component heuristic for the team orienteering problem with time windows. European Journal of Operational Research, 232(2):276–286, 2014.
[44] Taylan Ilhan, Seyed M. R. Iravani, and Mark S. Daskin. The orienteering problem with stochastic profits. IIE Transactions, 40(4):406–421, 2008.
[45] Anthony Jameson. More than the sum of its members: Challenges for group recommender systems. In Proceedings of the Working Conference on Advanced Visual Interfaces, AVI '04, pages 48–54, ACM, New York, NY, USA, 2004.
[46] Marisa G. Kantor and Moshe B. Rosenwein. The orienteering problem with time windows. Journal of the Operational Research Society, 43(6):629–635, 1992.
[47] Imdat Kara, Papatya Sevgin Bicakci, and Tusan Derya. New formulations for the orienteering problem. Procedia Economics and Finance, 39:849–854, 2016.
[48] Hans Kellerer, Ulrich Pferschy, and David Pisinger. Knapsack Problems. Springer, Berlin, January 2004.
[49] Gilbert Laporte. The traveling salesman problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59(2):231–247, 1992.
[50] Gilbert Laporte and Silvano Martello. The selective travelling salesman problem. Discrete Applied Mathematics, 26:193–207, March 1990.
[51] Christopher Laß, Daniel Herzog, and Wolfgang Wörndl. Context-aware tourist trip recommendations. In Proceedings of the 2nd Workshop on Recommenders in Tourism co-located with the 11th ACM Conference on Recommender Systems (RecSys 2017), Como, Italy, August 27, 2017, pages 18–25, 2017.
[52] Christopher Laß, Wolfgang Wörndl, and Daniel Herzog. A multi-tier web service and mobile client for city trip recommendations. In The 8th EAI International Conference on Mobile Computing, Applications and Services (MobiCASE). ACM, 2016.
[53] Mario Lenz. Cabata: Case-based reasoning for holiday planning. In Proceedings of the International Conference on Information and Communications Technologies in Tourism, pages 126–132. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1994.
[54] Mario Lenz. Imtas: Intelligent multimedia travel agent system. In Stefan Klein, Beat Schmid, A. Min Tjoa, and Hannes Werthner, editors, Information and Communication Technologies in Tourism, pages 11–17, Springer Vienna, Vienna, 1996.
[55] Marvin Levine. You-are-here maps: Psychological considerations. Environment and Behavior, 14(2):221–237, March 1982.
[56] Tariq Mahmood, Francesco Ricci, and Adriano Venturini. Improving recommendation effectiveness: Adapting a dialogue strategy in online travel planning. Information Technology & Tourism, 11(4):285–302, 2009.
[57] Judith Masthoff. Group recommender systems: Aggregation, satisfaction and group attributes. In Francesco Ricci, Lior Rokach, and Bracha Shapira, editors, Recommender Systems Handbook, pages 743–776. Springer US, Boston, MA, 2015.
[58] Kevin McCarthy, Lorraine McGinty, Barry Smyth, and Maria Salamó. The needs of the many: A case-based group recommender system. In Proceedings of the 8th European Conference on Advances in Case-Based Reasoning, ECCBR'06, pages 196–210, Springer-Verlag, Berlin, Heidelberg, 2006.
[59] Robert R. McCrae and Oliver P. John. An introduction to the five-factor model and its applications. Journal of Personality, 60(2):175–215, June 1992.
[60] Montassar Ben Messaoud, Ilyes Jenhani, Eya Garci, and Toon De Pessemier. SemCoTrip: A variety-seeking model for recommending travel activities in a composite trip. In Advances in Artificial Intelligence: From Theory to Practice, Arras, France, pages 345–355. Springer International Publishing, June 2017.
[61] Jörg Müller, Florian Alt, Daniel Michelis, and Albrecht Schmidt. Requirements and design space for interactive public displays. In Proceedings of the 18th ACM International Conference on Multimedia, MM '10, pages 1285–1294, ACM, New York, NY, USA, 2010.
[62] Julia Neidhardt, Leonhard Seyfang, Rainer Schuster, and Hannes Werthner. A picture-based approach to recommender systems. Information Technology & Tourism, 15(1):49–69, March 2015.
[63] Anastasios Noulas, Salvatore Scellato, Cecilia Mascolo, and Massimiliano Pontil. An empirical study of geographic user activity patterns in Foursquare. In Proceedings of the Fifth International Conference on Weblogs and Social Media, ICWSM '11, volume 11, pages 570–573, AAAI, Palo Alto, CA, USA, July 2011.
[64] Sebastian Oehme and Linus W. Dietz. Affective computing and bandits: Capturing context in cold start situations. In Proceedings of the RecSys Joint Workshop on Interfaces and Human Decision Making for Recommender Systems, Vancouver, Canada, 2018.
[65] Randall S. Olson. The optimal U.S. national parks centennial road trip. Online, July 2016. http://www.randalolson.com/2016/07/30/the-optimal-u-s-national-parks-centennial-road-trip.
[66] Peter Peltonen, Esko Kurvinen, Antti Salovaara, Giulio Jacucci, Tommi Ilmonen, John Evans, Antti Oulasvirta, and Petri Saarikko. It's mine, don't touch!: Interactions at a large multi-touch display in a city centre. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '08, pages 1285–1294, ACM, New York, NY, USA, 2008.
[67] Daniele Quercia, Rossano Schifanella, and Luca Maria Aiello. The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT '14, pages 116–125, ACM, New York, NY, USA, 2014.
[68] F. Ricci, D. R. Fesenmaier, N. Mirzadeh, H. Rumetshofer, E. Schaumlechner, A. Venturini, K. W. Wöber, and A. H. Zins. Dietorecs: a case-based travel advisory system. In D. R. Fesenmaier, K. W. Wöber, and H. Werthner, editors, Destination Recommendation Systems: Behavioural Foundations and Applications, pages 227–239. CABI, 2006.
[69] Francesco Ricci. Travel recommender systems. IEEE Intelligent Systems, 55–57, 2002.
[70] Francesco Ricci, Lior Rokach, and Bracha Shapira. Recommender systems: Introduction and challenges. In Francesco Ricci, Lior Rokach, and Bracha Shapira, editors, Recommender Systems Handbook, pages 1–34. Springer US, Boston, MA, 2015.
[71] Amihai Savir, Ronen Brafman, and Guy Shani. Recommending improved configurations for complex objects with an application in travel planning. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys '13, pages 391–394, ACM, New York, NY, USA, 2013.
[72] Salvatore Scellato and Cecilia Mascolo. Measuring user activity on an online location-based social network. In 2011 IEEE Conference on Computer Communications Workshops, pages 918–923. IEEE, April 2011.
[73] Alexander Smirnov, Alexey Kashevnik, Andrew Ponomarev, Nikolay Shilov, and Nikolay Teslya. Proactive recommendation system for m-tourism application. In Björn Johansson, Bo Andersson, and Nicklas Holmberg, editors, Perspectives in Business Informatics Research, pages 113–127. Springer International Publishing, Cham, 2014.
[74] Chaoming Song, Tal Koren, Pu Wang, and Albert-László Barabási. Modelling the scaling properties of human mobility. Nature Physics, 6(10):818–823, September 2010.
[75] Chaoming Song, Zehui Qu, Nicholas Blumm, and Albert-László Barabási. Limits of predictability in human mobility. Science, 327(5968):1018–1021, February 2010.
[76] Wouter Souffriau, Pieter Vansteenwegen, Greet Vanden Berghe, and Dirk Van Oudheusden. A greedy randomised adaptive search procedure for the team orienteering problem. In EU/MEeting, pages 23–24, 2008.
[77] Wouter Souffriau, Pieter Vansteenwegen, Greet Vanden Berghe, and Dirk Van Oudheusden. A path relinking approach for the team orienteering problem. Computers & Operations Research, 37(11):1853–1859, November 2010.
[78] Wouter Souffriau, Pieter Vansteenwegen, Greet Vanden Berghe, and Dirk Van Oudheusden. The multiconstraint team orienteering problem with multiple time windows. Transportation Science, 47(1):53–63, February 2013.
[79] Harald Steck. Calibrated recommendations. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys '18, pages 154–162, ACM, New York, NY, USA, 2018.
[80] Kadri Sylejmani, Jürgen Dorn, and Nysret Musliu. Planning the trip itinerary for tourist groups. Information Technology & Tourism, 17(3):275–314, September 2017.
[81] Marko Tkalcic, Matevž Kunaver, Jurij Tasic, and Andrej Kosir. Personality based user similarity measure for a collaborative recommender system. In Christian Peter, Elizabeth Crane, Lesley Axelrod, Harry Agius, Shazia Afzal, and Madeline Balaam, editors, 5th Workshop on Emotion in Human-Computer Interaction – Real World Challenges, pages 30–37. Fraunhofer, September 2009.
[82] Theodore Tsiligirides. Heuristic methods applied to orienteering. Journal of the Operational Research Society, pages 797–809, 1984.
[83] Pieter Vansteenwegen, Wouter Souffriau, Greet Vanden Berghe, and Dirk Van Oudheusden. The city trip planner. Expert Systems with Applications, 38(6):6540–6546, June 2011.
[84] Pieter Vansteenwegen, Wouter Souffriau, and Dirk Van Oudheusden. The orienteering problem: A survey. European Journal of Operational Research, 209(1):1–10, 2011.
[85] Pieter Vansteenwegen, Wouter Souffriau, Greet Vanden Berghe, and Dirk Van Oudheusden. Iterated local search for the team orienteering problem with time windows. Computers & Operations Research, 36(12):3281–3290, December 2009.
[86] Pieter Vansteenwegen and Dirk Van Oudheusden. The mobile tourist guide: an OR opportunity. OR Insight, 20(3):21–27, 2007.
[87] Adriano Venturini and Francesco Ricci. Applying Trip@dvice recommendation technology to www.visiteurope.com. In Proceedings of the 2006 Conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 – September 1, 2006, Riva Del Garda, Italy, pages 607–611. IOS Press, Amsterdam, The Netherlands, 2006.
[88] Wolfgang Woerndl, Johannes Huebner, Roland Bader, and Daniel Gallego-Vico. A model for proactivity in mobile, context-aware recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys '11. ACM, 2011.
[89] World Travel and Tourism Council. Travel & Tourism Global Economic Impact & Issues 2018, March 2018.
[90] Wen Wu, Li Chen, and Liang He. Using personality to adjust diversity in recommender systems. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, HT '13, pages 225–229, ACM, New York, NY, USA, May 2013.
[91] Wolfgang Wörndl. A web-based application for recommending travel regions. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, UMAP '17, pages 105–106, ACM, New York, NY, USA, 2017.
[92] Wolfgang Wörndl, Alexander Hefele, and Daniel Herzog. Recommending a sequence of interesting places for tourist trips. Information Technology & Tourism, 17(1):31–54, February 2017.
[93] Liu Yang, Lun Wu, Yu Liu, and Chaogui Kang. Quantifying tourist behavior patterns by travel motifs and geo-tagged photos from Flickr. ISPRS International Journal of Geo-Information, 6(11):345, November 2017.
[94] Yu Zheng and Xing Xie. Learning travel recommendations from user-generated GPS traces. ACM Transactions on Intelligent Systems and Technology, 2(1):1–29, January 2011.
Wilfried Grossmann, Mete Sertkan, Julia Neidhardt, and Hannes Werthner
7 Pictures as a tool for matching tourist preferences with destinations

Abstract: Usually descriptions of touristic products comprise information about accommodation, tourist attractions, or leisure activities. Tourist decisions for a product are based on personal characteristics, planned vacation activities, and the specificities of potential touristic products. The decision should guarantee a high level of emotional and physical well-being, considering also some hard constraints like temporal and monetary resources, or travel distance. The starting point for the design of the described recommender system is a unified description of the preferences of the tourist and the opportunities offered by touristic products using the so-called Seven-Factor Model. For the assignment of the values in the Seven-Factor Model a predefined set of pictures is the pivotal instrument. These pictures represent various aspects of the personality and preferences of the tourist as well as general categories for the description of destinations, i. e., certain tourist attractions like landscape, cultural facilities, different leisure activities, or emotional aspects associated with tourism. Based on the picture selection of a customer a so-called Factor Algorithm calculates values for each factor of the Seven-Factor Model. This is a rather fast and intuitive method for acquiring information about personality and preferences. The evaluation of the factors of the products is obtained by mapping descriptive attributes of touristic products onto the predefined pictures and afterwards applying the Factor Algorithm to the pictures characterizing the product. Based on this unified description of tourists and touristic products a recommendation can be defined by measuring the similarity between the user attributes and the product attributes. The approach is evaluated using data from a travel agency. Furthermore, other possible applications are discussed.

Keywords: tourism, Seven-Factor Model, travel behavior, user modeling
1 Introduction

The purpose of a recommender system is usually characterized as providing "suggestions for items that are most likely of interest to a particular user" [25]. Thus, in order to be capable of accurately delivering personalized recommendations, an appropriate user model is required at the core of such a system. As a consequence, various ways to introduce a more comprehensive view of the users, their needs, and their preferences have been explored. In this context, personality-based approaches are increasingly gaining attention. It has been shown that in a number of domains user preferences
can be related to the personality of a user and that recommender systems can successfully exploit these relationships [29], also in the context of tourism [3]. Providing accurate recommendations for travel and tourism is particularly challenging as the touristic product typically possesses a considerably high complexity. Usually the product consists of a bundle of different but interrelated components, e. g., means of transport, accommodations, and attractions and activities at the destination [30]. In the early phase of the travel decision making process, moreover, people are often not capable of phrasing their tourism preferences explicitly, so these might rather be intangible and implicitly given [33]. In addition, traveling is emotional, and this aspect should also be considered within a computational model [31]. In an ideal case the model should capture the user preferences and the product characteristics in a comparable way. The goal of this chapter is to describe in detail how user preferences and touristic products can be represented in a 7D space of latent properties and how this representation can be used for delivering personalized recommendations. A typical application scenario of this approach is the following one: A travel agency offers various products to their customers, for example short-term city trips, round trips, family holiday trips, cruises, or event trips (e. g., attendance of sport or cultural events). For better marketing and customer relationship management (CRM) the agency is interested in personalized offers for already existing customers (based on the existent customer information). In addition to providing a better service to existing customers, the agency is also interested in attracting new customers and providing suitable products for them. Hence the agency is mainly interested in developing a recommender system for the inspiration and planning phase of customers. For achieving this goal the starting point of the approach is the already existing system which was introduced in [18]. The system addresses users at a non-verbal and emotional level where a predefined set of pictures is used to elicit user preferences. These preferences are modeled by combining 17 tourist roles from the literature and the "Big Five" personality traits [19] and result in a 7D representation of tourist preferences. Note that these pictures are not destination-specific but show the various aspects of preferences and personalities in a prototypical way. The novel feature in this chapter is the innovative use of the pictures for the characterization of not only user preferences but also the offered products. These products can be described in different ways, and to distinguish between the different descriptions we use the notations Product Category Profile, Product Picture Profile, and Product Factor Profile, indicating which type of attributes is used for the description. For the description of the customers in the Seven-Factor Model we use the notation User Factor Profile. The approach is based on a three-step procedure. The starting point is a set of descriptors of touristic products. It should be mentioned that these descriptors can encompass not only facts about a destination but also opinions of former tourists about a destination. In the first step the descriptors of the products are collected and mapped onto a unified system of categories. The result is a so-called Product Category Profile
for a product. In the second step the categories are mapped onto the set of pictures which is used for the characterization of the customer preferences. The result is a so-called Product Picture Profile. In the third step a Product Factor Profile is computed using the same algorithm which calculates the User Factor Profile. The chapter is organized as follows: After the presentation of related work in Section 2, we illustrate in Section 3 the key concepts of the picture-based recommender system. In Section 4 the first results of an empirical validation are presented. This evaluation shows that the approach is both valid and well performing in a domain where implicit and emotional factors strongly impact the decision making process. In Section 5 we discuss further work.
2 Related work

The main objective of our work is to provide innovative and enjoyable support in describing user needs and preferences (for details see [18, 19]). Critique-based recommender techniques pursue a similar goal [17], but their focus is on the conversational process, where first results are refined iteratively. Although users do not have to specify all their preferences from the very beginning, some initial input is required, e. g., by answering some questions or with the help of initial examples, e. g., pictures of hotels [26]. For the latter, similarities to our approach exist, but their pictures clearly refer to products, whereas in our case the pictures capture user types and prototypical product characteristics. Our approach is supported by [22], as their work related to the design of preference elicitation interfaces shows that (i) low cognitive effort (e. g., due to pictures) can lead to high user liking, and (ii) affective feedback can increase the willingness to spend more effort. In some sense our approach fits the idea of reciprocal recommender systems, where the preferences of both sides involved are expressed and matched [15] using the same personality-centered measurement instrument. The first steps in this direction have already been introduced in [12, 27, 28]. Glatzer et al. [12] propose a text-mining-based approach, where textual descriptions of hotels are considered for allocating hotels to the Seven Factors. Furthermore, Sertkan et al. [27, 28] show that the Seven-Factor representation of tourism destinations can be determined by considering respective hard facts of destinations. Despite its high complexity, tourism has been an important application domain for recommender systems since the 2000s [24], and it is becoming increasingly relevant as more and more travelers rely on Information and Communication Technology in all phases of their tourist experiences [21, 2, 9]. In [13] it is shown that tourist types can be used to predict activities of travelers during vacations, as these types are distinguishable regarding travel style (e. g., variety seeking), travel motivations (e. g., social contacts), and travel values (e. g., active versus passive). These associations can be exploited when proposing appropriate tourism objects. In [1] a relation is established between tourist types and representative tourism-related pictures, implying that tourist types can be assigned to users based on their selected pictures.
Our work builds upon these results. A web-based application of a picture-based recommender system is provided by Cruneo [4]. On the website of this company, cruises can be compared, and to determine the travel style of a user, 12 pictures are used. However, this application focuses on a very specific segment and it is not clear whether a theoretical framework underlies it. Other systems use pictures for more fine-grained recommendations, like route planning from a more emotional perspective [23] or guides for visiting a city [5].
3 The picture-based approach

In this section we describe the three cornerstones of the picture-based approach:
– The Seven-Factor Model;
– The determination of User Factor Profiles (i. e., the Seven-Factor representation of users);
– The determination of Product Factor Profiles (i. e., the Seven-Factor representation of products), expressed either directly in terms of the Seven Factors or by pictures.
3.1 Seven-Factor Model

Much research has already been conducted in order to develop comprehensive user models of tourists capable of capturing their preferences, needs, and interests. A well-established and known framework in this sense is introduced in [11], namely, the 17 tourist roles. This framework captures the short-term preferences of tourists, i. e., preferences which might change depending on the context (e. g., seasonality such as summer or winter, special occasions, or single/group travel) [18, 19]. On the other hand, personality traits tend to be more stable over time and can, in general, be considered as long-term preferences and behavior [32, 16]. A well-known, widely used, and domain-independent framework in this context is the Five-Factor Model, also known as the "Big Five" personality traits [14]. The Seven-Factor Model [19] was obtained from 997 questionnaires, using existing standardized questions for assessing and measuring the 17 tourist roles and the "Big Five" personality traits of the respondents. Using the collected data a factor analysis was conducted, which reduced the initial 22 dimensions (i. e., 17 tourist roles plus the "Big Five" personality traits) and resulted in seven independent factors, i. e., the Seven-Factor Model, which is briefly summarized in Table 7.1. The resulting Seven Factors are easier to interpret and to process cognitively and computationally compared to the initial 22 dimensions. It has been shown that, based on different demographic characteristics, different user groups can be well distinguished within the Seven-Factor Model [20].
Table 7.1: Seven-Factor Model [18, 19].

Sun & chill-out: a neurotic sun lover, who likes warm weather and sun bathing and does not like cold, rainy, or crowded places
Knowledge & travel: an open-minded, educational, and well-organized mass tourist, who likes traveling in groups and gaining knowledge, rather than being lazy
Independence & history: an independent mass tourist, who is searching for the meaning of life, is interested in history and tradition, and likes to travel independently, rather than organized tours and travels
Culture & indulgence: an extroverted, culture and history loving high-class tourist, who is also a connoisseur of good food and wine
Social & sports: an open-minded sportive traveler, who loves to socialize with locals and does not like areas of intense tourism
Action & fun: a jet setting thrill seeker, who loves action, party, and exclusiveness and avoids quiet and peaceful places
Nature & recreation: a nature and silence lover, who wants to escape from everyday life and avoids crowded places and large cities
3.2 Determination of User Factor Profiles

Fig. 7.1 illustrates both the "traditional" and the picture-based way of eliciting a user's profile, which is, in our case, the representation of a user in the Seven-Factor Model. Note that a user is represented as a mixture of these factors.
Figure 7.1: Traditional versus picture-based approach to user profiling.
In the literature, the most common approaches to obtaining a user's preferences, needs, and personality are critique-based. Thus, to obtain preferences, needs, and personality, users have to communicate with the system or fill out questionnaires (upper path of Fig. 7.1). The Seven Factors of a user were originally obtained in such a conventional way, as outlined in Section 3.1.
Many people have difficulties in explicitly expressing their preferences and needs [33], and usually travel decisions (e. g., where to go, how to travel) are not taken rationally but are rather implicitly given [19]. Thus, by a simple method of picture selection (lower path of Fig. 7.1), the picture-based approach avoids tedious communication with the system and also addresses the implicit and emotional level of the decision making. For the development of the Factor Approach (lower part of Fig. 7.1) several steps were carried out. First, travel-related pictures were pre-selected, and it was evaluated in a workshop whether they fit the Seven Factors. This resulted in a set of 102 pictures. In a second study, 105 people were asked to select and rank a number of pictures out of the 102 travel-related pictures by considering their next hypothetical trip. Furthermore, the participants had to fill out the same questionnaires which were used for the development of the Seven-Factor Model. It turned out that people tend to select between three and seven pictures. The initial set of 102 travel-related pictures, moreover, was reduced by simply omitting the most and least frequently chosen pictures. This resulted in a more concise set of 63 travel-related pictures (i. e., those that were capturing most of the information). With the help of experts, further relations between the Seven Factors and the pictures were established. These relations, moreover, were quantified through multiple regression analysis (ordinary least squares). This regression analysis resulted in seven equations, one for each of the Seven Factors, denoted in Fig. 7.1 by the term Factor Algorithm. Applying these equations to a given User Picture Profile obtained from the Selection Interface yields the User Factor Profile, as shown in the lower part of Fig. 7.1 [18, 19]. Note that the User Factor Profile, whether obtained by the Traditional Approach using a questionnaire or by the Factor Approach using the Selection Interface, is a mixture of the Seven Factors. This corresponds to the well-known fact that people can have a variety of travel preferences simultaneously [13]. Overall, this non-verbal way of obtaining people's preferences and needs through a simple picture selection not only counteracts the mentioned difficulties in explicitly expressing one's preferences and needs, but also gamifies the way of interaction with the system, which is experienced as interesting, exciting, and inspiring by the users.
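To make the Factor Algorithm concrete, the sketch below applies one linear (OLS) equation per factor to a user's picture selection. The regression coefficients published for the actual system are not reproduced here; the intercepts, coefficients, factor keys, and picture identifiers are invented placeholders.

```python
# One linear equation per factor: factor value = intercept + sum of the
# coefficients of the selected pictures.
FACTORS = ["sun_chill_out", "knowledge_travel", "independence_history",
           "culture_indulgence", "social_sports", "action_fun",
           "nature_recreation"]

def user_factor_profile(selected_pictures, intercepts, coefficients):
    """selected_pictures: set of picture ids chosen in the Selection Interface.
    intercepts[f] and coefficients[f][pic] stand in for the regression
    equations (one per factor); unknown pictures contribute nothing."""
    profile = {}
    for factor in FACTORS:
        value = intercepts.get(factor, 0.0)
        for pic in selected_pictures:
            value += coefficients.get(factor, {}).get(pic, 0.0)
        profile[factor] = value
    return profile

# Illustrative call with made-up coefficients for three pictures:
intercepts = {f: 0.1 for f in FACTORS}
coefficients = {"sun_chill_out": {"beach": 0.6, "palm_trees": 0.4},
                "nature_recreation": {"forest": 0.7}}
print(user_factor_profile({"beach", "forest"}, intercepts, coefficients))
```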
3.3 Development of the Product Factor Profile

The computational model for the Product Factor Profile is depicted in Fig. 7.2. The starting point for the development of the Product Factor Profile is a set of known descriptors for the products, which were collected from different sources. The following sources of information were used:
– Classification of products according to the travel agency: The agency has defined a coarse classification of products using terms like "city trip," "round trip," and "event visit."
Figure 7.2: Development of the Product Factor Profile.
– Information from the GIATA database [10]: The GIATA database offers information about accommodations, including not only detailed descriptions of the accommodations but also information about possible touristic activities in their neighborhood, like sports facilities. Furthermore, the database offers information about distances to places of interest like city centers, beaches, or ski lifts.
– Information from customer evaluations: The agency stores product bookings, customer information, and opinions.
In the following we describe the steps of Fig. 7.2 in detail.

Step 1: Determination of a Product Category Profile
The Product Category Profile is defined as a numeric vector of 37 dimensions representing the importance of each category for a product. The categories define a standardized description of a product using the following terminology:
– Topographic categories describing the landscape of the product: Mountains, sea & coast & beach, lakes, cities, and nature & landscape.
– Infrastructure categories describing the touristic infrastructure of the product: Spas & fitness, arts & culture, points of interest (POIs), gastronomy, night-life, history, excursions, markets, and events.
– Activity categories referring to touristic activities related to the product: Winter sports, summer sports, extreme sports, recreational sports, dining, drinking, sightseeing, shopping, entertainment, walking, wellness, observing nature, workshops, and cultural activities.
– Customer needs categories referring to emotional aspects related to the product: Relaxing & recreative, exciting & thrilling, family-friendly, calming, entertaining, exclusive & luxurious, alternative, romantic, and adventurous.
In the following we describe the procedure for obtaining the values in these categories from different sources. First of all, based on the previously mentioned information sources, various tourism product attributes were extracted. This process of attribute generation was carried out depending on the structure of the sources. The following basic methods were used for defining (generating) the product attributes:
– In the case of the textual descriptions of destinations, attributes were defined by keyword extraction. For example, for the term "fly and drive" the attributes "individual product" and "travel by plane" were extracted. This was done mainly interactively using the R text mining environment.
– The travel agency's classifications of products were directly used as attributes.
– Relevant information in the GIATA database was either used directly as attributes or transformed (i. e., numerical into categorical) beforehand. For example, information about the distance d to the beach in meters was translated into weights w according to the following rules: if(d ≤ 400) then w = 3; if(400 < d ≤ 1000) then w = 2; if(1000 < d ≤ 3000) then w = 1; if(d > 3000) then w = 0.
– Based on the information from the user evaluations, attributes were defined by keyword extraction, resulting mainly in attributes which describe the emotional feelings of the tourists. Typical examples are "quiet," "exciting," "child-friendly," or "wellness."
This process resulted in a vector of 400 product attributes, i. e., 58 obtained from the travel agency, 212 from the GIATA database, and 130 from the user evaluations. The entries of such a vector are weights of the attributes. Those weights are either results of the binning process (i. e., categorization of numerical information) in the GIATA database or a consequence of the fact that some of the keywords are used in more than one source. For example, "child-friendly" can occur in the description of the product but also in the evaluations of the users. The higher the value of the attribute, the more important the attribute is for the characterization of the product.

Next, the existing list of attributes was mapped onto the 37 categories. This mapping was done by five tourist experts from the university and the travel agency which provided the data. In most cases the assignment was straightforward by expert knowledge, and the results showed a high degree of consistency. Only for 17 attributes (about 5 %) was there disagreement in a first round, and the results were unified after discussion. The final assignment is represented as a 400 × 37 matrix A indicating the relation between descriptor attributes and categories. Multiplying the row vector p⃗ of descriptor attributes with A results in the Product Category Profile PCP = p⃗A, a 37-dimensional vector where each component gives an integer weight for the importance of the category for the product. In order to avoid overestimation of some categories, a cut-off was applied to the attribute weights; here the cut-off value 5 was used.
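As a concrete illustration, the sketch below implements the quoted beach-distance binning rule and the matrix product PCP = p⃗A in Python. The assignment matrix, the attribute indices, and the interpretation of the cut-off as capping attribute weights at 5 are illustrative assumptions.

```python
import numpy as np

def beach_distance_weight(d):
    """Binning rule quoted above for the GIATA distance-to-beach attribute (d in meters)."""
    if d <= 400:
        return 3
    elif d <= 1000:
        return 2
    elif d <= 3000:
        return 1
    return 0

N_ATTRIBUTES, N_CATEGORIES, CUT_OFF = 400, 37, 5

# Expert assignment matrix A (400 x 37); only one illustrative entry is set here,
# e.g., the beach-distance attribute contributing to the "sea & coast & beach" category.
A = np.zeros((N_ATTRIBUTES, N_CATEGORIES))
A[0, 1] = 1

p = np.zeros(N_ATTRIBUTES)          # attribute weight vector of one product
p[0] = beach_distance_weight(350)   # a beach 350 m away -> weight 3

# Product Category Profile PCP = pA, with attribute weights capped at the cut-off value 5
pcp = np.minimum(p, CUT_OFF) @ A
print(pcp[1])                       # -> 3.0
```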
Note that the first three top-level categories of this profile (topography, infrastructure, and activities) provide an abstract description of the product, while the fourth top-level category captures the emotional aspects and customer needs related to the product. In addition to the categories, five constraint variables were defined for each product, i. e., price, means of transport, travel time, weather, and distance of the product from the customer's home.

Step 2: Determination of a Product Picture Profile
For each product the Product Picture Profile is a vector of dimension 63, where each component of the vector represents the importance of the picture for the characterization of the product. In the following the calculation of the Product Picture Profile is outlined. First of all, pictures are assigned to the 37 categories. The same 63 pictures as in the determination of the User Profile are used. In this assignment, multiple attributions of pictures to different categories are possible. For example, a picture showing a hiking tourist may be assigned to the topographic category nature & landscape, to the activity categories walking and recreational sports, and to the customer needs category relaxing & recreative. Consequently, the picture assignment to the categories can be represented as a 37 × 63 matrix B, indicating the relation between categories and pictures. Note that due to the multiple assignments of pictures to different categories, matrix B has multiple non-zero entries in each row. The assignment was done independently by three tourist experts and by eight test persons. As a result we obtained 11 possible assignment matrices B. Averaging these matrices resulted in the final assignment matrix B̄. The Product Picture Profile is now defined by the vector PPP = p⃗AB̄.

Step 3: Determination of a Product Factor Profile
The Product Factor Profile is a seven-dimensional vector where each component defines the aptitude of the product for the factors defined in Table 7.1. The transformation of the Product Picture Profile into the Product Factor Profile is done by applying the Factor Algorithm used in the computation of the User Factor Profile from pictures. In this step not all pictures assigned to a product are used, but only the pictures with the highest weights; here the seven highest-weighted pictures of each product are used. As a result one obtains a profile, which uses the same Seven Factors as the User Factor Profile, for all touristic products under consideration. Note that the mapping of product characteristics onto the language of the categories and the assignment of the pictures to the categories is done only once. As soon as these results are available, one can compute the Product Factor Profile automatically whenever there is input about the attributes of a touristic product. Moreover, it should be mentioned that a profile can be obtained even from incomplete information about the product.
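A minimal sketch of Steps 2 and 3, continuing the category-profile example above: the averaged assignment matrix B̄, the restriction to the seven highest-weighted pictures, and the Factor Algorithm callback are all illustrative placeholders rather than the study's actual matrices.

```python
import numpy as np

N_CATEGORIES, N_PICTURES, TOP_K = 37, 63, 7

# Averaged picture-assignment matrix B_bar (37 x 63); random values stand in for
# the mean of the 11 expert/test-person assignment matrices.
B_bar = np.random.default_rng(1).random((N_CATEGORIES, N_PICTURES))

def product_factor_profile(pcp, factor_algorithm):
    """Step 2: Product Picture Profile PPP = pcp @ B_bar; Step 3: apply the Factor
    Algorithm of Section 3.2 to the seven highest-weighted pictures only."""
    ppp = pcp @ B_bar                 # Product Picture Profile (length 63)
    top = np.argsort(ppp)[-TOP_K:]    # indices of the seven highest-weighted pictures
    reduced = np.zeros_like(ppp)
    reduced[top] = ppp[top]
    return factor_algorithm(reduced)  # seven-dimensional Product Factor Profile
```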
4 Validity of the picture-based approach The Factor Model can be used in different ways. As mentioned in the introduction, the main usage scenario of interest lies in the development of an enhanced CRM for a travel agency. For this scenario we describe four different types of evaluation. The first one considers the validity of the approach with respect to a ground truth defined by experts of the tourist agency. The second evaluation considers the application of the method for recommending a product to a potential customer with a known User Factor Profile, and the third evaluation refers to the use of the method for recommendations to customers who have already booked a product in the past. Finally, we consider a simple application of the method for recommendation based on product clustering.
4.1 Evaluation 1: validation of the Product Factor Profiles For validation of the algorithm which computes the Product Factor Profiles according to the steps described in Section 3.3, the procedure was applied to 1221 products offered by an Austrian travel agency. The values of the Seven Factors for these products were also assessed by experts from the travel agency according to their knowledge about the products. The experts assessed each of the Seven Factors on a percent scale, resulting in a value between 0 and 100 for each factor of the Product Factor Profile. These assessments correspond to the idea that the factors are orthogonal and allow for an interpretation as a conditional probability for the fit of the product to a factor. For the evaluation of the Product Factor Profile it was assumed that the expert assessment is the ground truth. The question is how far the computed profiles match these expert profiles. For matching, the cosine similarity of the two profiles was used. This measure has the advantage that it can be interpreted as the correlation between the two profile vectors and thus allows for a well-known interpretation. A disadvantage of the measure is that the profiles are normalized in length, so the norm of the profiles is not considered. This means that it is possible that the experts assess the relations between the Seven Factors for two products in a similar way but the level of the assessment is different. The results of the comparison are shown in Table 7.2 and Fig. 7.3. Note that we use the term "similarity" for the measure calculated for each pair (Product Factor Profile due to experts and Product Factor Profile due to the algorithm) and we do not display separate curves for the two profiles.

Table 7.2: Similarity distribution of Product Factor Profiles and expert profiles.

Min | First quartile | Median | Mean | Third quartile | Max
0.04 | 0.53 | 0.70 | 0.65 | 0.80 | 0.97
Figure 7.3: Similarity distribution of Product Factor Profiles and expert profiles.
The summary measures and the skewness of the distribution indicate that for 25 % of all products there is a high or very high correlation between the expert assessment and the outcome of the algorithm and for 25 % of all products the correlation is low.
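The similarity measure used throughout Section 4 is the standard cosine similarity between two seven-dimensional factor profiles; a minimal sketch with hypothetical example profiles is given below.

```python
import numpy as np

def cosine_similarity(profile_a, profile_b):
    """Similarity between two Seven-Factor profiles, as used in Evaluations 1-3."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical expert vs. computed profiles for one product (percent scale).
expert = [80, 10, 5, 30, 20, 5, 60]
computed = [70, 20, 10, 25, 15, 10, 55]
print(round(cosine_similarity(expert, computed), 2))
```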
4.2 Evaluation 2: comparison of Product Factor Profiles and User Factor Profiles A second method for validation of the approach is to compare Product Factor Profiles of purchased products with the User Factor Profiles of the customers obtained from the Selection Interface (see Section 3.2). Overall 81 User Factor Profiles were available. For comparison of the booked profiles and the user profiles the same method was used as in the first evaluation. The results of the comparison are shown in Table 7.3 and Fig. 7.4. The number of cases is rather small but the results lead to similar conclusions as in Section 4.1. The lower values of the similarity can be explained by the fact that customer decisions depend on a number of factors not captured in the profiles. Hence a higher variability can be expected.

Table 7.3: Similarity distribution of Product Factor Profiles and User Factor Profiles.

Min | First quartile | Median | Mean | Third quartile | Max
0.15 | 0.44 | 0.64 | 0.59 | 0.72 | 0.90
4.3 Evaluation 3: comparison of the booked products from users with multiple bookings The third evaluation concerns the question to what extent the Product Factor Profiles of products purchased by the same user are similar and depend on user preferences.
Figure 7.4: Similarity distribution of User Factor Profiles and Product Factor Profiles.
From the database of the tourist agency, 982 customers were filtered who had booked two different products at different times. Again, the similarity of the two bookings was computed by the cosine similarity. The summary measures and the shape of the similarity distribution are shown in Table 7.4 and Fig. 7.5.

Table 7.4: Similarity distribution of Product Factor Profiles for multiple bookings.

Min | First quartile | Median | Mean | Third quartile | Max
0.00 | 0.53 | 0.83 | 0.72 | 0.97 | 1.00
Figure 7.5: Similarity distribution of Product Factor Profiles for multiple bookings.
The results confirm the hypothesis that tourists tend to book products with rather similar Product Factor Profiles. This shows that the picture-based approach can be applied in direct marketing activities: if one proposes to potential customers products which are similar to those previously booked, the conversion rate of such a recommendation will be higher than in the case of proposing an arbitrary product. It should be noted that in this case we use only an implicit determination of the User Factor Profile from previous bookings.
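A direct-marketing use of this observation could look like the sketch below, which ranks catalogue products by cosine similarity to the Product Factor Profile of a customer's previous booking; the function and data names are illustrative, not part of the described system.

```python
import numpy as np

def recommend(previous_profile, product_profiles, top_n=5):
    """Rank catalogue products by cosine similarity of their Product Factor
    Profiles to the profile of a previously booked product."""
    prev = np.asarray(previous_profile, dtype=float)
    scored = []
    for product_id, profile in product_profiles.items():
        p = np.asarray(profile, dtype=float)
        similarity = float(p @ prev / (np.linalg.norm(p) * np.linalg.norm(prev)))
        scored.append((similarity, product_id))
    return [pid for _, pid in sorted(scored, reverse=True)[:top_n]]
```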
4.4 Evaluation 4: evaluation of the product clusters defined by Product Factor Profiles This evaluation involves applying a cluster analysis to the Product Factor Profiles so that the products can be grouped into clusters; representatives of these product clusters are then shown to the potential customer. If the customer decides for one of the products, further recommendations can be made by selecting products from the same cluster. For defining the clusters, different methods of cluster analysis were examined for 1946 products in the database of the travel agency. Fig. 7.6 shows a summary of the results of a hierarchical cluster analysis using the similarity matrix based on the cosine similarity and the Ward method for cluster aggregation. The choice of the number of clusters, the cluster method, and the use of the similarity matrix was the result of a number of experiments with different settings of the parameters.
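A minimal sketch of such a clustering in Python (SciPy) is shown below; the random profiles and the choice of five clusters are placeholders. Note that Ward linkage formally assumes Euclidean distances, so combining it with cosine distances is the kind of pragmatic choice that has to be validated experimentally, as described above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Illustrative Product Factor Profiles (rows = products, columns = the Seven Factors);
# random values stand in for the 1946 real profiles.
profiles = np.random.default_rng(2).random((1946, 7))

distances = pdist(profiles, metric="cosine")               # 1 - cosine similarity
dendrogram = linkage(distances, method="ward")             # Ward aggregation
labels = fcluster(dendrogram, t=5, criterion="maxclust")   # e.g., five clusters
```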
Figure 7.6: Distribution of factors in the clusters.
As one can see, for some of the factors the relation to the clusters is evident. For example, products with high adequacy for customers interested in factor 1 (i. e., Sun & chill-out) occur mainly in cluster 2 (475 products), products for customers interested in properties described by factor 6 (i. e., Action & fun) occur mainly in cluster 1 (72 products), and products with high adequacy for customers interested in factor 7 (i. e., Nature & recreation) can be found mainly in cluster 4 (619 products) and cluster 5 (275 products). In some cases the distinction is not so clear, but recommending a product which belongs to the same cluster as a previously booked product definitely increases the probability of booking compared to a random recommendation.
5 Conclusions and further development In this chapter we presented the further development of the picture-based approach, not only for obtaining user preferences with pictures but also for characterizing touristic products with respect to the Seven-Factor Model using pictures. The empirical results show that this approach is valid, as judged by the evaluation of products by experts, and can be applied in different application scenarios. A demonstration of the potential of the approach by means of an A/B test is planned for the future. A number of further developments of the system are planned. In particular, work on the following improvements and modifications has already started:
– Development of a user-friendly interface for mobile devices: At the moment the system presents all 63 pictures for the selection on one screen. This is not very convenient for the much smaller screen size of a mobile device. A new interface which is based on a sequential presentation of the pictures is under development. At the moment two different scenarios are tested, i. e., presentation of four pictures and presentation of two pictures.
– Extension of the picture set: In the current version of the system the picture set is static and cannot be changed. A first analysis of the coefficients of the pictures in the Factor Algorithm showed that the pictures can be grouped according to their importance for the decision made by the Factor Algorithm. This clustering can be used for augmenting the picture set. In connection with the new interface with sequential picture presentation, the augmentation offers the opportunity to present only a selection of pictures from a larger set. This offers users a higher degree of variety and more entertainment in repeated use of the system.
– Extension of the user interface for customer acquisition: As previously outlined, the system can be used for customer acquisition and CRM without performing the picture test. An extension of the interface for better support of these usage scenarios is planned.
– Application of the model in other domains: The approach is of interest whenever recommendation includes not only hard facts but also users' tastes and emotions. First results of an application in the domain of event marketing are promising.
– Adapting the system for groups: Traveling is an activity which is predominantly experienced by groups of people (e. g., family, friends, colleagues) rather than individuals. It has been shown that the satisfaction of group members with group decisions not only depends on the similarity to their respective individual preferences, but also on the personality of the individual group members. The picture-based approach, and thus the Seven-Factor Model, is a good starting point for capturing and aggregating the preferences, needs, and personality of individual group members. In this context further adaptations, improvements, and evaluations are planned, especially for dealing with group dynamics [7, 6, 8].
– Breaking the data source (data structure) and expert knowledge dependence: Neural networks, in particular convolutional neural networks, can be used to identify concepts in tourism product pictures. In turn, these learned concepts can be exploited for mapping the respective tourism products onto the defined categorization (see Section 3.3, Step 1). In this way we can omit the one-time manual expert allocation of tourism product attributes (given by the used data source) onto the defined categories. Furthermore, this approach would lead to a more generalized solution by easing the dependence on heterogeneous data sources (data structures).
References
[1] Berger, H., Denk, M., Dittenbach, M., Merkl, D., and Pesenhofer, A. (2007). Quo vadis homo turisticus? Towards a picture-based tourist profiler. In Information and Communication Technologies in Tourism 2007, pages 87–96.
[2] Borras, J., Moreno, A., and Valls, A. (2014). Intelligent tourism recommender systems: A survey. Expert Systems with Applications, 41(16):7370–7389.
[3] Braunhofer, M., Elahi, M., Ge, M., and Ricci, F. (2014). Context dependent preference acquisition with personality-based active learning in mobile recommender systems. In International Conference on Learning and Collaboration Technologies, pages 105–116. Springer.
[4] Cruneo GmbH (2017). https://www.cruneo-kreuzfahrtvergleich.de.
[5] De Chouhdury, M., Feldman, M., Amer-Yahia, S., Golbandi, N., Lempel, R., and Yu, C. (2010). Automatic construction of travel itineraries using social breadcrumbs. In HT '10: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, pages 35–44. ACM.
[6] Delic, A., and Neidhardt, J. (2017). A comprehensive approach to group recommendations in the travel and tourism domain. In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pages 11–16. ACM.
[7] Delic, A., Neidhardt, J., Nguyen, T. N., Ricci, F., Rook, L., Werthner, H., and Zanker, M. (2016). Observing group decision making processes. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 147–150. ACM.
[8] Delic, A., Neidhardt, J., Rook, L., Werthner, H., and Zanker, M. (2017). Researching individual satisfaction with group decisions in tourism: experimental evidence. In Information and Communication Technologies in Tourism 2017, pages 73–85. Springer.
[9] Gavalas, D., Konstantopoulos, C., Mastakas, K., and Pantziou, G. (2014). Mobile recommender systems in tourism. Journal of Network and Computer Applications, 39:319–333.
[10] GIATA GmbH (2018). https://www.giata.com/.
[11] Gibson, H., and Yiannakis, A. (2002). Tourist roles: Needs and the lifecourse. Annals of Tourism Research, 29(2):358–383.
[12] Glatzer, L., Neidhardt, J., and Werthner, H. (2018). Automated assignment of hotel descriptions to travel behavioural patterns. In Information and Communication Technologies in Tourism 2018, pages 409–421. Springer.
[13] Gretzel, U., Mitsche, N., Hwang, Y.-H., and Fesenmaier, D. R. (2004). Tell me who you are and I will tell you where to go: use of travel personalities in destination recommendation systems. Information Technology & Tourism, 7(1):3–12.
[14] John, O. P., and Srivastava, S. (1999). The big five trait taxonomy: History, measurement, and theoretical perspectives. Handbook of Personality: Theory and Research, 2:102–138.
[15] Koprinska, I., and Yacef, K. (2015). People-to-people reciprocal recommenders. In Recommender Systems Handbook, pages 545–567. Springer.
[16] Matthews, G., Deary, I. J., and Whiteman, M. C. (2003). Personality Traits. Cambridge University Press.
[17] McGinty, L., and Reilly, J. (2011). On the evolution of critiquing recommenders. In Recommender Systems Handbook, pages 419–453. Springer.
[18] Neidhardt, J., Schuster, R., Seyfang, L., and Werthner, H. (2014). Eliciting the users' unknown preferences. In Proceedings of the 8th ACM Conference on Recommender Systems, pages 309–312. ACM.
[19] Neidhardt, J., Seyfang, L., Schuster, R., and Werthner, H. (2015). A picture-based approach to recommender systems. Information Technology & Tourism, 15(1):49–69.
[20] Neidhardt, J., and Werthner, H. (2017). Travellers and their joint characteristics within the seven-factor model. In Information and Communication Technologies in Tourism 2017, pages 503–515. Springer.
[21] Neidhardt, J., and Werthner, H. (2018). IT and tourism: still a hot topic, but do not forget it. Information Technology & Tourism, 20(1):1–7.
[22] Pommeranz, A., Broekens, J., Wiggers, P., Brinkman, W.-P., and Jonker, C. M. (2012). Designing interfaces for explicit preference elicitation: a user-centered investigation of preference representation and elicitation process. User Modeling and User-Adapted Interaction, 22(4–5):357–397.
[23] Quercia, D., Schifanella, R., and Aiello, L. M. (2014). The shortest path to happiness: Recommending beautiful, quiet and happy routes in the city. In HT '14: Proceedings of the 25th ACM Conference on Hypertext and Social Media, pages 116–125. ACM.
[24] Ricci, F. (2002). Travel recommender systems. IEEE Intelligent Systems, 17(6):55–57.
[25] Ricci, F., Rokach, L., and Shapira, B. (2015). Recommender systems: Introduction and challenges. In Recommender Systems Handbook, pages 1–34. Springer.
[26] Ricci, F., Woeber, K., and Zins, A. (2005). Recommendations by collaborative browsing. In Information and Communication Technologies in Tourism 2005, pages 172–182. Springer.
[27] Sertkan, M., Neidhardt, J., and Werthner, H. (2018). Mapping of tourism destinations to travel behavioural patterns. In Information and Communication Technologies in Tourism 2018, pages 422–434. Springer.
[28] Sertkan, M., Neidhardt, J., and Werthner, H. (2018). What is the "personality" of a tourism destination? Information Technology & Tourism.
[29] Tkalcic, M., and Chen, L. (2015). Personality and recommender systems. In Recommender Systems Handbook, pages 715–739. Springer.
[30] Werthner, H., and Klein, S. (1999). Information Technology and Tourism: A Challenging Relation. Springer-Verlag Wien.
[31] Werthner, H., and Ricci, F. (2004). E-commerce and tourism. Communications of the ACM, 47(12):101–105.
[32] Woszczynski, A. B., Roth, P. L., and Segars, A. H. (2002). Exploring the theoretical foundations of playfulness in computer interactions. Computers in Human Behavior, 18(4):369–388.
[33] Zins, A. (2007). Exploring travel information search behavior beyond common frontiers. Information Technology & Tourism, 9(3-1):149–164.
Xiangdong Li, Yunzhan Zhou, Wenqian Chen, Preben Hansen, Weidong Geng, and Lingyun Sun
8 Towards personalized virtual reality touring through cross-object user interfaces

Abstract: Real-time adaptation is one of the most important unsolved problems in the field of personalized human–computer interaction. For conventional desktop system interactions, user behaviors are acquired to develop models that support context-aware interactions. In virtual reality interactions, however, users operate tools in the physical world but view virtual objects in the virtual world. This dichotomy constrains the use of conventional behavioral models and makes it difficult to personalize interactions in virtual environments. To address this problem, we propose cross-object user interfaces (COUIs) for personalized virtual reality touring. COUIs consist of two components: a Deep Learning-based model using convolutional neural networks (CNNs) to predict the user's visual attention from past eye movement patterns and thus determine which virtual objects are likely to be viewed next, and delivery mechanisms that determine what should be displayed on the user interface, and when and where. In this chapter, we elaborate on the training and testing of the prediction model and evaluate the delivery mechanisms of COUIs through a cognitive walk-through approach. Furthermore, the implications of using COUIs to personalize interactions in virtual reality (and other environments such as augmented reality and mixed reality) are discussed.

Keywords: cross-object user interfaces, personalized interaction, virtual reality, eye tracking, Deep Learning
1 Introduction The rapid modern growth of the body of available information has proved overwhelming to users, making it difficult to dynamically adapt individualized user interfaces and contents to users' personal interests [46]. Nevertheless, personalization is considered an effective solution to the problem because it can help users improve the effectiveness of task operations, change the appearance of user interfaces, and mediate users' interactions based on their emotional and cognitive statuses [2]. Over decades of personalization research, many approaches and models have been developed and widely integrated into applications such as search engines and social networks [59].

Acknowledgement: The authors thank all the editors and reviewers for their comments. This research has been supported by funding from Cloud-based natural interaction devices and tools (2016YFB1001304) and NSFC project 61802341. https://doi.org/10.1515/9783110552485-008
Real-time adaptation is one of the core features of personalization and has attracted extraordinary attention from researchers in both academia and industry [61]. In conventional desktop systems, user behaviors are acquired to develop models that can adaptively support context-aware interaction. In virtual reality interactions, however, the user manipulates tools in the physical world but views objects in the virtual world. In the latter case, users' physical operations are separated from their perceptual activities. This separation constrains the use of conventional behavior models because they often require long-term learning that captures users' knowledge and activities; there are few models dedicated to interactions that occur simultaneously across both physical and virtual spaces. More importantly, this separation adds practical difficulties to model construction. For example, a user can be sitting still in a seat while using eye movements to navigate around virtual reality environments through a headset. In this case, the user's physical behavior is immaterial in developing a robust model for real-time adaptation. Existing modeling approaches rely on various types of physical activities [23]. Since interaction in virtual reality environments involves more virtual activities, e. g., navigation and virtual object manipulation [24], conventional behavior models and adaptation strategies have suffered from a lack of the physical activities that are essential to construct user profiles for personalized interaction. Similar situations also occur in augmented reality and mixed reality systems, which involve a multitude of virtual objects and user interface interactions. A few studies adopted technologies such as Machine Learning [19] to automatically classify user behaviors for personalized interaction, but mostly based on the physical world. As virtual reality, augmented reality, and mixed reality applications grow rapidly, it is important to consider real-time adaptation for personalized interaction in virtual environments.

To address the problem, we propose cross-object user interfaces (COUIs), a concept that combines the features of distributed user interfaces and tangible user interfaces and is specialized for real-time adaptation in virtual reality environments. Put simply, COUIs act as an instance of spatially distributed user interfaces that anchor to different virtual objects and can appear and disappear at runtime. They differ from conventional graphic user interfaces on monitors, as they adopt the analogy of information displayed on different objects' surfaces in the physical world. In this chapter, the COUIs comprise two components: a Deep Learning-based model, which uses convolutional neural networks (CNNs) [19] to predict users' visual attention from past eye movement patterns and infers the next objects to be viewed, and the COUI delivery mechanisms that determine what should be displayed, and when and where. COUIs aim to address the problem of predicting users' personal interests at runtime. We discuss the implications of using COUIs as an approach to personalize interactions not only in virtual reality but also in augmented and mixed realities.

The remainder of this chapter is organized as follows: Section 2 describes related work regarding both conventional studies of personalized human–computer interaction and virtual reality studies. Section 3 presents the methodological details of COUIs
and the related model design and evaluation. Section 4 discusses the findings and related implications. Finally, Section 5 summarizes the conclusions.
2 Related work To understand personalized human–computer interaction in both physical and virtual worlds and to highlight the importance of developing COUIs to address personalized virtual reality touring, we reviewed a multitude of studies on personalized interaction systems and we examined approaches to behavior modeling and prediction in virtual reality. Moreover, we propose the concept of COUI and elaborate related features and requirements for Deep Learning-based visual attention prediction in virtual environments.
2.1 Personalization in human–computer interaction Personalization in human–computer interaction has a long history during which various concepts such as adaptive user interfaces [9], user modeling, and intelligent user interfaces [8] have been developed. Despite the naming variations, these concepts share a common idea: that personalization improves system effectiveness and efficiency and improves the User Experience by representing and acting on models of user behavior [43]. Personalization has become a vital component in applications such as search engines, social media, and online shopping [59]. For example, personalized information is more likely to convince students that material they are asked to learn is relevant to their lives [27]. Personal recommendation systems for scholarly publications minimize the time and effort expended by professional researchers [48]. In particular, web- and mobile-based personalized advertising has been used to provide personalized content to gain subscriber loyalty [10]. Personalized approaches and systems have also been adopted for more critical uses. For example, personalized instructions were implemented to motivate users to improve their fitness [49], and a personalized wearable monitoring system was designed for patients with mental disorders and physicians who managed the diseases [33]. Personalization addresses two main problems: (i) user diversity and individual interaction needs and (ii) related requirements based on users' physical and cognitive abilities [52]. Personalization ensures that a variety of users can have equal access to interactive systems regardless of their experience levels and capabilities [2]. In addition, personalization allows systems to adapt to users' varying interaction requirements. To reach the above goals, personalization efforts often acquire numerous user profiles to model users' experiences and preferences [23]. Following this approach,
many studies have developed models based on collecting abundant user behavior and preference data. For example, such systems are used in online shopping sites, where they recommend products based on users' browsing history [36]. As the user's personal interests change over time, previous studies have introduced adaptive model-driven user interface development systems [1] and have successfully integrated these into mobile applications [45] and brain–robot interfaces [18].
2.2 Personalized interaction in virtual reality In contrast to personalized applications in the physical world, personalization systems are relatively less common in the virtual space. Virtual reality, a concept based on “presence” and “telepresence” [54], refers to the sense of being in an environment generated by mediated means. Most current studies focus on the concept of presence in virtual reality and attempt to improve two dimensions of presence: vividness (or “immersion” [5]) and interactivity. In contrast, little is known about users and their behaviors in virtual reality. Early studies on virtual reality were mostly technology-driven [51]. In the 1990s, researchers developed the CAVE system, an early implementation of virtual reality that used surround-screen projection, to explore the feasibility of projection technology that could improve the quality and usability of the virtual experience [11]. Later, the CAVE2 system added user interaction to an immersive simulation in a hybrid reality environment. Nevertheless, the system retained a strong focus on the technical development of a scalable adaptive graphics environment [15]. More recently, studies have shifted the focus to sensory technologies that can further improve virtual reality immersion and interactivity. For example, to support high levels of interaction, hand gestures were developed to manipulate objects in virtual reality [37], and head tracking was integrated into virtual reality with the Oculus Rift headset to provide deep immersion game control [42]. Additionally, multimodal interactions have been developed that include visual, haptic, and brain–computer interfaces in virtual reality environments [35]. Also, previous studies have built virtual environments to adapt to personalized learning [60, 12]. Previous studies drew insights from the usability of user interfaces in virtual reality [56]. Interaction performances, including multiuser collaboration features [7], were evaluated to understand users’ behavior. The users’ interaction experiences and cognition statuses were examined to reveal to what extent the users’ interaction activities were analogous in the real and virtual spaces [32]. In particular, personalizing the User Experience in virtual reality has become a core topic in recent studies. For example, an individualized sense of embodiment could alter the users’ behavior and emotional states [28]. Although virtual reality allows for highly detailed behavioral observations and measurements as well as systematic environmental manipulations with multisensory
tools [32], it has some shortcomings that hinder the design of personalized interaction. These shortcomings include insufficiently high-quality content, inflexible interaction tools, and, most importantly, problems with user interface registration and the poor usability of the presented information [34]. Virtual reality can replicate vivid scenes and present spatial user interfaces in multiple forms, for example, interfaces that float in the air or are attached to virtual objects [56]. Given their spatial memories, people do not become lost easily when navigating around the objects in the physical world. Several virtual reality-based games have successfully integrated user interfaces with the virtual objects in game scenes to support individual interaction experiences [42]. Nongame applications (e. g., virtual reality museums) have adopted similar approaches to support personalized experiences [46]. Thus far, the existing studies have taken advantage of user behaviors derived from the physical world to develop adaptive interactions in the virtual reality world. However, this approach becomes less effective in situations when the user’s manipulation behavior and perception activities are distributed in two separate spaces. The users do not need to make exaggerated bodily movements to manipulate the virtual objects. For example, users can play games through a virtual reality headset that integrates eye tracking technology [4]. FaceVR, a real-time facial re-enactment and eye gaze control system, presented an approach to personalize interactions in virtual reality [57]. The integration of eye tracking technology and head-mounted displays offers a nondisruptive virtual interaction experience [53]. Meißner et al. [44] argued that such integration provided a unique opportunity to implement individual shopping assistance.
2.3 Predicting user behaviors for personalized interactions Understanding a user’s intent during an interaction is important to personalization. In the context of ubiquitous computing, persuasive technologies were used to change users’ existing behaviors [16]; such designs are intended to influence users’ behaviors [40]. In addition, effective prediction mechanisms were proposed by modeling users’ behaviors via approaches such as the keystroke-level model [14], Markov model [58], technology acceptance model [25], tree representation model [21], and sequential behavior model [17]. These models rely on types of user behavior that need to be explicitly exhibited during the interaction. Understanding user behavior is important for recommending desired interactions. For example, mobile phone users’ sequential movements were collected to develop prediction strategies to improve system performance [58]. Obviously, the behaviors of virtual reality users are more complex than those of traditional systems in the physical world. Cognitive models have been developed to help in understanding both social and psychological behaviors [3]; some of these models were used to predict and modify health-related behaviors [50]. Several cognitive behavioral therapies have used virtual reality as a tool to address social phobias [30]; however,
few of these models were specialized for interactions that occur in virtual reality environments. Rapid advancements in Deep Learning, which is a subset of Machine Learning algorithms [22], have empowered the construction of user models for real-time adaptation in virtual reality. Koulieris et al. [31] trained a classifier to recognize object categories at runtime and used it to predict user eye gaze to perform dynamic stereo manipulation in games. Using a head tracking dataset, previous studies have explored user behavior in virtual reality spherical video streaming [62]. The Deep Learning-based applications involved a variety of signal and information processing tasks as well as speech recognition and computer vision [13]. Recently, Deep Learning has been used for acoustic modeling and visual understanding [22] and has demonstrated promising results in improving the efficacy of personalized teaching in an education context [20]. Nonetheless, previous studies have noted that Deep Learning has some limitations: it requires huge datasets to develop a robust model [62], and tuning the parameters of individual models can require large efforts [13]. Some Machine Learning frameworks such as PyTorch [26] have been developed to address these limitations, and a few pioneering studies have used CNNs to successfully estimate appearance-based gazes [63] and predict eye fixations [38].
2.4 Cross-object user interfaces in virtual reality Virtual reality has a novel “reality” that naturally supports personalized access to the virtual environment and the objects inside the space [29]. By contrast, in the physical world, objects are distributed across the spatial environment, allowing individuals to access these objects in a personalized way. Virtual reality can replicate the “presence” of the physical world and it has a high potential for integrating user interfaces with virtual objects. For example, a user can navigate around the virtual space and view different virtual objects while being given customized information. Integrating user interfaces with virtual objects has multiple benefits. Users are naturally familiar with the types of user interfaces embodied by the objects themselves, and they can take advantage of spatial memory to enhance information acquisition [47]. Furthermore, predicting the next object a user is about to view can provide a stronger sense of personalized interaction – similar to the approach researchers have taken in web personalization: the more accurate user behavior prediction is, the greater the interaction efficiency and the user engagement will be [23]. Given this understanding, we propose the new term “cross-object user interfaces (COUIs)” to describe user interfaces that are distributed across virtual reality and simultaneously integrated with virtual objects. COUIs are derived using the analogy of “spatially distributed user interfaces in the physical world” [41]. In the previous work
[55], three types of COUIs demonstrated effectiveness and a positive influence on the perceived usability of the exhibition system in a virtual reality museum. To some extent, COUIs have commonalities with existing concepts such as distributed user interfaces; however, they differ from these existing concepts in two respects: (i) COUIs are coupled with the virtual objects and can be fully attached or semi-attached to the hosting virtual objects or fully detached from them, and (ii) COUIs are controlled by eye movements: they are displayed on virtual objects that have high visual attention levels.
3 Method To demonstrate our work, we introduce the development of COUIs in terms of the prediction model and the user interface delivery mechanisms. Then, we describe tests of the COUIs prediction model using a virtual reality touring task. Our tests validated the accuracy of the prediction model and reflected the use of the delivery mechanisms.
3.1 Design of COUIs 3.1.1 COUI prediction model First of all, we chose the virtual reality museum as the main study scenario. This is justified for two reasons: (i) the museum is an ideal environment that can incorporate multiple exhibits in various forms, and (ii) the museum is a familiar scenario to most participants and it involves non-competitive interaction tasks. As a result, we constructed a virtual reality museum exhibition room which comprised a set of virtual exhibits of ancient vases and paintings. A user's visual attention is affected by numerous factors, including eye gaze position, speed, and direction, virtual object distance, the user's personality, and individual preferences. Thus far, the influences attributable to these factors have not been comprehensively understood. Therefore, it is impractical to enumerate all these factors and assemble them into a model. Instead, to draw correlations between a user's eye movements and the objects that will be viewed next, we adopted a Deep Learning algorithm, specifically a CNN, to process the user's eye movement patterns and return a prediction concerning the next virtual object a user will view. A CNN was successfully used to build a prediction model for online advertisement clicks [39]. However, rather than simply predicting advertisement clicks on 2D screens, the prediction model in this study was extended to object viewing in virtual 3D spaces. Considering the complexities of user eye movement patterns and the dynamics in virtual object interaction, we proposed a 1D temporal CNN to perform eye gaze prediction. Unlike 2D feature recognition used in image classification approaches, we used
a 1D convolution on a 2D matrix of activations to acquire temporal features from each channel. The 1D convolution in the time dimension was effective for classifying the various situations that a user would experience. The prediction model was constructed as follows. We used a matrix

$s \in \mathbb{R}^{f \times l}$,   (8.1)

as the input instance, where $\mathbb{R}$ is the set of real numbers, f is the selected number of feature channels, and l is the time duration of a time window. The output of the prediction model was the 3D coordinates (in the virtual reality coordinate system) that the user was about to view next. We can infer the next object to be viewed in a virtual reality environment from these coordinates. More specifically, the prediction model's output included the user's estimated eye gaze position, the duration of eye fixation, and the positions of the next objects.

Second, we configured the networks as follows. The input, an f × l-dimensional matrix, is fed into four 1D temporal convolutional layers (Fig. 8.1). Each layer contained different numbers of input and output channels and was followed by batch normalization (BN) layers and rectified linear unit (ReLU) layers. We inserted dropout layers to reduce complex co-adaptations during the network training process, thus reducing overfitting in the networks. In addition, two fully connected layers were appended at the ends of the networks.
Figure 8.1: The network architecture. Input: f × l. The network consists of convolutional layers (red), batch normalization layers (blue), non-linear layers (green), and dropout layers (grey). These feed into two FC layers.
We captured the training data for the prediction model from practical virtual reality system trials. We recruited 10 participants to navigate around the virtual reality exhibition room while wearing an Oculus DK2 headset (Fig. 8.2). The tour was controlled by a program that aimed to simplify the spatial navigation and thus circumvent any unnecessary influence from the participants’ prior spatial navigation experiences or cognition differences. The headset integrated the Pupil-lab’s monocular eye tracking add-on to capture the user’s pupil movements at 120 Hz (Fig. 8.2). The participants
Figure 8.2: The virtual reality headset (left top) with the monocular eye-tracking add-on cup with an IR mirror, IR LEDs, and a 720P camera (left bottom). The touring scene is shown on the right.
were instructed to view the exhibits in the virtual space and interact with the objects and the associated user interfaces. At this stage, all the user interfaces were static. The touring task took each participant approximately 5 min (M = 4.85, SD = 1.74), and there were no time constraints or other cognitive task requirements. The participants’ eye movements and their corresponding activities in the virtual space were recorded to video at 400 × 400 at 120 fps (eye camera) and 1080 × 720 at 30 fps (virtual reality camera), respectively. We divided the video footages into equal 10-second clips, each containing the eye movements and the virtual objects being viewed. In total, we acquired 3,045 video clips. By recording the participants’ eye movements and recording hardware parameters with the virtual reality headset, we manually extracted 44 features that were refined as the input of the prediction model. These arbitrary features included the confidence of pupil detection (ranging from 0–1), timestamps in milliseconds, indices of individual data records, 3D coordinates of eye gazes and orientations, the eye gaze parameters, and the tags of the virtual objects viewed by the participants. The overall dataset of l frames had dimensions of 44 × l, and the data for Parts 1 and 2 in Table 8.1 were merged using the timestamps. Furthermore, we manually compared the similarities of the 44 features and adopted 10 features (marked with asterisks in Table 8.1). We sampled the training data from the 10-second video clips. Each sample contained the 10 features that were encoded into a 78-bit vector. In addition to the samples, to consider the influence of previous eye movements on subsequent eye gazes, we appended the previous 10-second eye movement history with a decay weight. The distributions of the weights of these frames in the history sample are shown in Fig. 8.3. Finally, we constructed the four-layer CNNs with an input size of f × 78, where f is the number of feature channels we selected previously. The network’s output was dependent on the number of objects to predict. A summary of the network layers and related outputs is presented in Table 8.2.
Table 8.1: The features and related sub-channels collected in the touring (the final chosen channels are marked with asterisks).

Part | Feature name | Number of sub-channels
1 | Timestamp | 1
1 | Index of current data record | 1
1 | Eye gaze (x, y, z) | 3*
1 | Eye gaze position (x, y, z) | 3*
1 | Eye gaze orientation (x, y, z) | 3*
2 | Pupil parameters | 2
2 | Gaze parameters | 29
2 | Pupil detection confidence | 1*
3 | Label of virtual objects | 1
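The exact encoding of a training sample is not spelled out above; the sketch below shows one plausible reading, in which the 10 selected channels of the current clip are combined with the decay-weighted previous 10-second history (per Fig. 8.3) into an array of shape f × 78. The decay value and the concatenate-then-crop scheme are assumptions for illustration only.

```python
import numpy as np

SAMPLE_LEN = 78   # fixed temporal length of one input instance (f x 78)

def build_sample(current, history, decay=0.9):
    """current, history: arrays of shape (f, T) holding the 10 selected channels of
    the current and the previous 10-second window (assumed layout and decay)."""
    weights = decay ** np.arange(history.shape[1])[::-1]   # older frames weighted less
    combined = np.concatenate([history * weights, current], axis=1)
    return combined[:, -SAMPLE_LEN:]                        # keep the last 78 time steps
```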
Figure 8.3: The eye gaze historical data and its components' weights.

Table 8.2: The configurations of the networks. 1D Conv followed by BN (batch normalization) and ReLU (rectifier linear unit). Kernel size is lower as the neural network goes deeper. Padding = 5 at the first convolution layer to control the amount of implicit zero-paddings on both sides. L is the number of output labels.

Layer type | Channel | Kernel size | Output size
Input | – | – | f × 78
1D-Conv+BN+ReLU+Dropout | 32 | 6 | 32 × 42
1D-Conv+BN+ReLU+Dropout | 64 | 4 | 64 × 23
1D-Conv+BN+ReLU+Dropout | 128 | 3 | 128 × 13
1D-Conv+BN+ReLU+Dropout | 256 | 3 | 256 × 13
FC | – | – | 1 × 1024
FC | – | – | 1024 × 1024
Output | – | – | 1 × L
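For concreteness, a minimal PyTorch sketch of this architecture is given below. The channel counts, kernel sizes, and the padding of 5 at the first layer follow Table 8.2; the strides and remaining paddings are assumptions chosen so that the intermediate output sizes match the table, and the two fully connected layers follow the description in the text.

```python
import torch
import torch.nn as nn

class COUIPredictionNet(nn.Module):
    """Sketch of the four-layer 1D temporal CNN. Channel counts, kernel sizes, and
    the padding of 5 at the first layer follow Table 8.2; strides and the other
    paddings are assumptions chosen to reproduce the listed output sizes."""

    def __init__(self, feature_channels, num_labels, p_drop=0.1):
        super().__init__()

        def block(c_in, c_out, kernel, stride, padding):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel, stride=stride, padding=padding),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.Dropout(p_drop),
            )

        self.features = nn.Sequential(
            block(feature_channels, 32, kernel=6, stride=2, padding=5),  # f x 78 -> 32 x 42
            block(32, 64, kernel=4, stride=2, padding=3),                # -> 64 x 23
            block(64, 128, kernel=3, stride=2, padding=2),               # -> 128 x 13
            block(128, 256, kernel=3, stride=1, padding=1),              # -> 256 x 13
        )
        self.classifier = nn.Sequential(                                 # the two FC layers
            nn.Flatten(),
            nn.Linear(256 * 13, 1024),
            nn.Linear(1024, num_labels),
        )

    def forward(self, x):  # x: (batch, feature_channels, 78)
        return self.classifier(self.features(x))
```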
We selected 90 % of the overall samples for training and the remaining 10 % for testing. We optimized the networks through SGD (lr = 0.001, momentum = 0.9) for 40 iterations with mini-batches of 64 samples. The learning rate decayed by 0.1 every 15 iterations.
The loss function was based on the Softmax classifier function and defined as follows:

$\mathrm{loss} = -\frac{1}{N}\sum_{i=1}^{N} \log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right)$.   (8.2)
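Equation (8.2) is the standard softmax cross-entropy loss, so the stated optimization setup can be reproduced with off-the-shelf components. The sketch below (PyTorch) uses the network class from the previous listing; the random tensors, the number of feature channels, and the number of output labels are placeholders, and the 40 "iterations" are interpreted here as passes over the training set.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: roughly 90 % of the 3,045 clips, 10 feature channels,
# 78 time steps, and an assumed number of output labels (areas/objects).
f, num_labels = 10, 13
x = torch.randn(2740, f, 78)
y = torch.randint(0, num_labels, (2740,))
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

model = COUIPredictionNet(f, num_labels)        # network class from the sketch above
criterion = nn.CrossEntropyLoss()               # softmax loss of Eq. (8.2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)  # decay by 0.1 every 15

for epoch in range(40):                         # the stated 40 iterations, read as epochs here
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```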
3.1.2 COUI distribution mechanisms Graphic design of COUIs The graphic design for COUIs was the same in terms of layout and content, which ensured that COUIs could be displayed consistently across a range of virtual objects. The user interface graphics consisted of only the wire frames, background colors, and contents. An example of a COUI graphic design is shown in Fig. 8.4.
Figure 8.4: COUI graphic design example.
The sizes of the COUIs varied according to the objects to be viewed next. For example, the user interface was larger when the estimated object was the table in the center of the virtual space, and vice versa. However, the sizes of the user interfaces were fixed relative to the virtual objects. In addition, when the user viewed the object from different angles, the user interfaces automatically adjusted to face towards the user.

COUI delivery mechanisms The prediction model was not responsible for displaying the COUIs. As mentioned, the COUIs were displayed on the surfaces of the objects to be viewed next. Therefore, we had to consider what to display, and when and where, based on the prediction results.

When to display. Real-time adaptation requires immediate responses to user behavior; thus, the COUIs were intended to be presented rapidly when the prediction model indicated the next objects. User interfaces in the physical world are always
available for user interactions because it is technically and economically infeasible to dynamically hide or show them. By contrast, a virtual reality environment can easily control user interface visibility at any time. However, frequently displaying and hiding user interfaces all at once is not the optimal solution because it would clearly increase the users' cognitive loads. Consequently, during the training session, we used static user interfaces to provide the introductory information for the objects; no dynamic user interfaces were used at this stage. Similarly, the samples provided during the testing session contained no dynamic user interfaces. The time required to display COUIs in practical use involved two factors: the start time and the procedure to display. The start time was determined by the prediction model's results: the COUIs were presented in the virtual space immediately after the prediction results were acquired. To display the COUIs, we adopted a 500 ms animation whereby the user interface would "fade in," which reduced the visual distraction to the user. When the prediction model output an incorrect estimation, the user interfaces would be immediately displayed on the objects being viewed.

Where to display. We took the analogy of the user interfaces in the physical world, which were seamlessly integrated with the hosting objects such as the posters on the wall and the texts on the box. Given that the virtual reality space was constructed as a replication of a physical cultural museum, the COUIs that were dynamically loaded during the interaction were supposed to be presented in an analogous manner, i. e., displayed in the space surrounding the objects of the user's personal interest. Following this clue, we added extra constraints to the presence of the COUIs by displaying these user interfaces over the objects, with the bottom edges of the COUIs connected with the objects (see Figs. 8.4 and 8.5). The COUI positions and orientations were adjustable according to the user's view points in the virtual reality space.

What to display. The COUIs can display multimedia content at runtime in virtual reality. Because reading large amounts of text in a virtual reality situation could potentially cause usability problems [56], we instead chose to display pictures and videos related to the objects of the user's interest. In addition, the COUIs only displayed content that complemented the information on the static user interfaces. For example, the static information might show the name and history of a vase, and the COUIs added videos of the craft of vase making. All content for both the static user interfaces and the COUIs was pre-configured before the study was conducted.
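To make the when/where/what logic concrete, the toy sketch below expresses it in Python. The panel structure, the `show` callback, and the function names are illustrative assumptions; only the 500 ms fade-in and the fallback of showing the panel on the actually viewed object after an incorrect prediction come from the description above.

```python
from dataclasses import dataclass

FADE_IN_MS = 500        # the 500 ms fade-in animation described above

@dataclass
class COUIPanel:
    object_id: str
    media: list         # pre-configured pictures/videos complementing the static UI

def on_prediction(predicted_id, panels, show):
    """When to display: immediately after a prediction arrives, with a fade-in.
    Where to display: anchored over the predicted object, facing the user."""
    if predicted_id in panels:
        show(panels[predicted_id], anchor=predicted_id, fade_in_ms=FADE_IN_MS)

def on_incorrect_prediction(viewed_id, panels, show):
    """Fallback: when the prediction turns out to be wrong, immediately display
    the panel of the object actually being viewed."""
    if viewed_id in panels:
        show(panels[viewed_id], anchor=viewed_id, fade_in_ms=0)
```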
3.2 Evaluation of the COUIs 3.2.1 Testing the prediction model To evaluate the accuracy of the prediction model, we tested it using the testing samples by calculating the percentage of correctly detected virtual objects. We compared our
prediction model’s performance with another prediction model from a virtual reality game [31]; the results are summarized in Table 8.3. Table 8.3: Comparison of the evaluation results (DF predictor is the prediction model from [31]). Methods
DF predictor
COUI prediction model
Data source
Current state of game variables
Number of features Sample rate Prediction objects
13 1 Hz Current object category viewed
Apparatus
NVisorTM SX111 HMD + eye tracker – Arrington Research VR games 76.2 %
Previous eye movements & other history states 10 30 Hz Area or objects viewed in the next moment Oculus Rift DK2 + Pupil-lab Add-on
Applications Accuracy
VR museum 86.3 %
These test results indicate that the model can make real-time eye gaze prediction from the user's previous eye movements. The time required to process each frame was less than 0.02 ms (using a workstation with a 3.4 GHz CPU, 64 GB of RAM, and an Nvidia Geforce GTX 1080 with 8 GB of RAM). Furthermore, we tested the prediction model with different batch sizes and drop probabilities (denoted as p). The results are summarized in Table 8.4.

Table 8.4: The testing results with different batch sizes and drop probabilities (prediction accuracy [epoch]).

Drop probability | Batch size 128 | Batch size 256 | Batch size 512
p = 0 | 82.4 % [2] | 86.5 % [4] | 86.5 % [8]
p = 0.1 | 84.7 % [4] | 86.3 % [9] | 85.9 % [13]
p = 0.2 | 85.8 % [2] | 84.3 % [15] | 85.7 % [19]
Additionally, we tested the networks' prediction accuracy under different conditions by changing the parameters of the gaze point (GP) position, the user's viewpoint (PF) position, the orientation of the user's viewpoint (OF), and the eye gaze detection confidence (CE). The influence of the batch normalization layers was also tested. The results are summarized in Table 8.5.

Table 8.5: The prediction accuracy results using different parameters (CE: eye detection confidence, GP: gaze point position, PF: viewpoint position, OF: viewpoint orientations).

Parameter | Prediction accuracy
Conv + BN + ReLU + Dropout | 86.3 % [9]
Conv + BN + ReLU + Dropout – CE | 87.0 % [9]
Conv + ReLU + Dropout | 79.3 % [10]
Feature channel of GP | 84.7 % [5]
Feature channel of PF | 47.6 % [2]
Feature channel of OF | 72.5 % [4]
Feature channel of PF + OF | 85.9 % [21]

The results indicate that the CE made little contribution to the prediction accuracy, while the addition of the batch normalization layers after each convolutional layer improved the prediction accuracy by a minimum of 6 %. The single feature channel of GP and the feature channels of PF plus OF had similar influences compared to the original model; the prediction accuracy was not improved when the parameters were concatenated. The prediction errors were analyzed using the three most frequent prediction results. As Fig. 8.5 shows, Areas 1 and 2 had the highest prediction error rates, i. e., 12.7 % in the test results; Areas 12 and 5 had a prediction error rate of 10.3 %; and Areas 1 and 3 had a prediction error rate of 8.1 %.
Figure 8.5: The prediction errors and related areas (Area 1: fully-detached user interface, Area 2: semi-detached user interface, Area 3: exhibition booth with fully attached user interface, Area 5: wood pillar, Area 12: central ceiling).
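The chapter does not spell out the full network specification beyond the building blocks listed in Table 8.5. Purely as an illustration of the Conv + BN + ReLU + Dropout pattern operating on a short window of gaze and viewpoint features (GP, PF, OF) sampled at 30 Hz, a minimal PyTorch sketch might look as follows; the layer widths, window length, and number of areas are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class GazeAreaPredictor(nn.Module):
    """Sketch of a Conv + BN + ReLU + Dropout network for next-area prediction.

    Input: a window of per-frame features (e.g., gaze point, viewpoint position
    and orientation) with shape (batch, n_features, n_frames). Sizes below are
    illustrative only.
    """

    def __init__(self, n_features=10, n_areas=12, p_drop=0.1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(), nn.Dropout(p_drop),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(p_drop),
            nn.AdaptiveAvgPool1d(1),              # collapse the time dimension
        )
        self.classifier = nn.Linear(64, n_areas)  # one logit per area/object

    def forward(self, x):
        h = self.features(x).squeeze(-1)
        return self.classifier(h)

if __name__ == "__main__":
    model = GazeAreaPredictor(n_features=10, n_areas=12, p_drop=0.1)
    window = torch.randn(8, 10, 30)   # 8 samples, 10 features, ~1 s at 30 Hz
    probs = torch.softmax(model(window), dim=1)
    print(probs.shape)                # torch.Size([8, 12])
```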
3.2.2 Evaluation of the COUI delivery mechanisms

We evaluated the delivery mechanisms of the COUIs using the cognitive walk-through approach because it is particularly suited for early evaluations of applications at a low cost. Additionally, it can provide rapid insights into an application's design before a usable system is fully implemented. The cognitive walk-through was completed
stepwise by five experts recruited from the digital media department, including three experienced digital media designers and two HCI researchers. We defined the tasks the experts needed to carry out and then asked them to examine the tasks based on provided evaluation questions.

First, we set the scenario for the evaluation: "You are about to experience a virtual reality exhibition room. During your tour around the room, you will be presented with personalized user interfaces at runtime based on personal interests captured from your eye movements."

Second, we defined the tasks that the experts needed to carry out: (a) enter the virtual reality environment, (b) start touring the virtual space, (c) gaze at one or more objects, (d) interact with the COUIs that will appear, and (e) navigate around the virtual space. The above tasks were provided to the experts in a list format. Other interactions that were performed in the evaluation but not subject to the expert walk-through were not assessed.

Third, we provided four questions for the experts to answer, following the format of Blackmon et al.'s [6] work. The questions were the following: (a) Will the user feel a sense of presence in the virtual reality environment? Is it clear that the virtual reality environment is required to achieve this type of personalized touring experience? (b) Will the user notice that the dynamically loaded COUIs are available to them? Are the personalization features provided by the COUIs clearly visible? (c) Will the user associate the personalized touring experience with the COUIs? Is it clear to users that the personalized portion of the experience is delivered by the COUIs? (d) If users notice the personalized experience during the tour, will they acquire appropriate information? Will users be aware they have successfully achieved their goal after performing an action?

Finally, the experts read a paper sheet of instructions concerning the virtual reality museum and began to walk through the tasks and answer the questions. Based on the experts' answers during the evaluation, each answer to each question was marked as either a success or a failure. In the latter case, the experts were asked to report which factors they felt might prevent the desired performance, thus allowing solutions to be found and improvements made to the delivery mechanisms of the COUIs. When the process was complete, we collected all the experts' reports and prioritized the advantages and issues as follows.
– The experts agreed that presence is a vital factor that influences user perceptions in a personalized virtual reality touring experience. Users were likely to sense the personalization in the immersive virtual reality environment, especially when the user was interactively exploring the virtual space.
– The distributions of the COUIs, which integrated the user interfaces with the objects users were personally interested in during the virtual reality tour, were easily noticed by the users. The experts agreed that it would be difficult to ignore the
COUIs because they were displayed dynamically on objects in which each user was personally interested. Additionally, the experts added that, when incorrect predictions occurred, the sudden appearance of the COUIs was even more noticeable.
– The nature of the COUIs was analogically derived from the physical world, which made the presentation of the COUIs intuitive to the users. The experts mentioned that the naturalness of this form is self-explaining because humans have been living in such an environment for thousands of years. Additionally, two experts mentioned that if the COUIs were presented in exactly the same way as such information might be presented in the physical world, the sense of personalization might actually be impaired because users are used to similar forms of information delivery.
– Users did not proactively engage with the COUI graphics due to the test system's limited interactivity. However, given the prediction model's results, the COUIs can present information highly relevant to the estimated virtual object. Nevertheless, as indicated by the experts, the information presentation relied heavily upon the prediction results.
– The experts pointed out a potential risk during the delivery of the COUIs – that the sudden appearance of the COUIs during the tour could possibly distract users' visual attention from their current object to the next one. Consequently, such distractions would affect the prediction model's accuracy. In this regard, future empirical studies should investigate the interrelationships between the distraction factor of the COUIs and the prediction model.
4 Discussion and implications

The previous studies adopted several measures to support personalized interaction for individuals, including tangible user interfaces that improved the sense of personalization through haptic feedback and distributed user interfaces that were designed to augment user engagement in the interaction with multiple devices. These studies were grounded in the physical world and often collected considerable amounts of user behavior-pattern data to implement the adaptive models. By contrast, user behavior in a virtual reality environment is much more implicit than that in the physical world. For example, little is known about how a user's eye movements can be exploited to support personalized interactions in a virtual reality setting. The COUIs we proposed in this study address the problem of real-time adaptation during a virtual reality tour. Compared with the existing eye tracking studies and mouse-click prediction algorithms, this study adds the following new findings.

First, the COUIs incorporated Deep Learning algorithms to construct the prediction model and predict the objects in which a user is personally
interested. The model underwent iterative training and (at the time of this writing) achieved a prediction accuracy of 86.3 % for real-time eye movements – outperforming techniques used by other state-of-the-art studies. The prediction models used in previous studies focused on user behavior on a 2D screen. By contrast, the prediction model of the COUIs was specialized to predict eye movements in virtual 3D space, considered the user-object distance in virtual reality, and provided insights into which virtual objects (and even which parts of the objects) the user was about to interact with. These insights are useful in helping designers and researchers understand users' cognitive states in 3D space and, more importantly, in supporting personalized interactions.

Second, the COUIs aimed to personalize the user interfaces instead of the contents. Previous studies argued that context-aware contents are an effective approach to personalization, whereas personalizing the user interfaces was likely to confuse users during the interactions. Moreover, the COUIs supplied a new understanding of personalized interaction by incorporating spatially distributed user interfaces driven by the real-time visual attention prediction model and the delivery mechanisms. The novelty of the COUIs is twofold. The prediction model incorporates Deep Learning algorithms to predict the user's next point of visual attention in the virtual 3D space. The prediction accuracy was higher than in other studies, and the prediction model capitalized on eye tracking to address the lack of users' physical behaviors in virtual reality. Another novelty is the delivery mechanisms, which are derived from user interfaces in the physical world. Distributed user interfaces are normal in the physical world because information (e. g., text, images, and video) is displayed on different object surfaces in the surrounding environments. People have long been accustomed to user interface forms in the physical world; however, this form of user interface is still rare in a virtual reality setting. Interestingly, the potential applications of COUIs are not limited to virtual reality environments, as they can be transferred to other users and different virtual reality museum settings, although additional prediction model training would be needed. The study results show that the COUIs could also form an approach to personalizing human–computer interaction in multiple scenarios. This approach capitalizes on a user's eye movement patterns to predict the next point of visual attention and, thus, the related cognitive state, and it can support both content personalization and user interface personalization without requiring overt user body movements. Thus, this study shows that using eye movement patterns to address the need for individualized interaction is an effective method. As Deep Learning advances, the complexities involved in capturing, interpreting, and predicting the users' visual attention are declining, whereas the prediction efficiency and accuracy are increasing.

Additionally, the COUIs can be used as an automatic evaluation tool for virtual reality systems. Conventional empirical studies were forced to recruit numerous users and measure their eye movements during system trials. By contrast, the prediction
model used by the COUIs can generate considerable numbers of eye movement patterns without involving human participants. Given the accuracy rate (86.3 %) of the prediction model demonstrated in this study, it is reasonable to expect that the prediction model can capture sufficient eye movement patterns to be useful in assessing virtual reality settings. Such an automatic evaluation tool could considerably improve the iteration of application design and evaluation. COUIs are also suitable for both augmented reality and mixed reality environments. More specifically, COUIs are compatible with environments that integrate information with different objects – regardless of whether those environments are virtual or physical. The virtual reality space in this study was derived from a realistic museum scene; therefore, the users' eye movement patterns should be similar to those in both the physical world and an augmented physical world. Given this understanding, COUIs have great potential for personalizing individuals' interactions.

This study had several limitations. First, the sample size was limited. Admittedly, employing more participants could extend the breadth of the training data, which would improve the prediction model. As indicated in the study, however, participant diversity is more important than the number of participants. Given that the prediction model's training results reached an acceptable accuracy rate, any further improvements in accuracy would likely require a substantial number of participants. Nevertheless, this is one of the important limitations that should be addressed in future studies. Second, this study did not include thorough experiments to evaluate the delivery mechanisms' effects on the participants' interactions. This limited the empirical understanding of the COUIs from the user's perspective; instead, the delivery mechanisms were analytically examined against the frameworks of previous studies. These previous studies provided a solid theoretical foundation, indicating that user interfaces presented in ways that are similar to user interfaces in the physical world will naturally be acceptable to users. Future empirical studies are expected to provide a further understanding of the exact influence of the delivery mechanisms on users' interactions. Finally, the influence of the COUIs on the user's cognitive state requires future longitudinal studies; thus far, the prediction model was constructed using sample data of short-term interactions.
5 Conclusions

The chapter proposed the concept of COUIs, which consist of a prediction model and delivery mechanisms that address the problem of real-time adaptation during personalized interactions in a virtual reality environment. The prediction model used a Deep Learning algorithm (CNN) to calculate the probability of the virtual objects with which users were about to interact. The delivery mechanisms considered several factors to determine the time, form, and contents of COUI display. The chapter elaborated on the
training and testing of the prediction model using real participants, and it examined the performance of the delivery mechanisms by comparing them with user interfaces in the physical world. Furthermore, the implications of COUIs as an approach to personalized interaction in virtual reality and other contexts (e. g., augmented reality and mixed reality) were discussed. Finally, the study identified challenges for future work, e. g., constructing an adaptive prediction model for personalized interaction in virtual environments.
References

[1] Akiki, P. A., A. K. Bandara, and Y. Yu (2014). "Adaptive model-driven user interface development systems." ACM Computing Surveys (CSUR) 47(1): 9.
[2] Arazy, O., O. Nov, and N. Kumar (2015). "Personalityzation: UI personalization, theoretical grounding in HCI and design research." AIS Transactions on Human-Computer Interaction 7(2): 43–69.
[3] Armitage, C. J., and M. Conner (2000). "Social cognition models and health behaviour: A structured review." Psychology and health 15(2): 173–189.
[4] Bailenson, J. N., N. Yee, J. Blascovich, A. C. Beall, N. Lundblad, and M. Jin (2008). "The use of immersive virtual reality in the learning sciences: Digital transformations of teachers, students, and social context." The Journal of the Learning Sciences 17(1): 102–141.
[5] Beer, S. (2015). Virtual Museums: an Innovative Kind of Museum Survey. In Proceedings of the 2015 Virtual Reality International Conference. Laval, France, pages 1–6. ACM.
[6] Blackmon, M. H., P. G. Polson, M. Kitajima, and C. Lewis (2002). Cognitive walkthrough for the web. In Proceedings of the SIGCHI conference on human factors in computing systems. ACM.
[7] Brown, C., G. Bhutra, M. Suhail, Q. Xu, and E. D. Ragan (2017). Coordinating attention and cooperation in multi-user virtual reality narratives. In Virtual Reality (VR), 2017 IEEE. IEEE.
[8] Browne, D. (2016). Adaptive user interfaces. Elsevier.
[9] Cerny, T., M. J. Donahoo, and E. Song (2013). Towards effective adaptive user interfaces design. In Proceedings of the 2013 Research in Adaptive and Convergent Systems, pages 373–380. ACM.
[10] Chen, P.-T., and H.-P. Hsieh (2012). "Personalized mobile advertising: Its key attributes, trends, and social impact." Technological Forecasting and Social Change 79(3): 543–557.
[11] Cruz-Neira, C., D. J. Sandin, and T. A. DeFanti (1993). Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques. ACM.
[12] De Troyer, O., F. Kleinermann, and A. Ewais (2010). Enhancing virtual reality learning environments with adaptivity: lessons learned. In Symposium of the Austrian HCI and Usability Engineering Group. Springer.
[13] Deng, L., and D. Yu (2014). "Deep learning: methods and applications." Foundations and Trends® in Signal Processing 7(3–4): 197–387.
[14] El Batran, K., and M. D. Dunlop (2014). Enhancing KLM (keystroke-level model) to fit touch screen mobile devices. In Proceedings of the 16th international conference on Human-computer interaction with mobile devices & services. ACM.
[15] Febretti, A., A. Nishimoto, T. Thigpen, J. Talandis, L. Long, J. Pirtle, T. Peterka, A. Verlo, M. Brown, and D. Plepys (2013). CAVE2: a hybrid reality environment for immersive simulation and information analysis. In The Engineering Reality of Virtual Reality 2013. International Society for Optics and Photonics.
[16] Fogg, B. J. (2003). Computers as persuasive social actors. In Persuasive technology – Using computers to change what we think and do, pages 89–120.
[17] Frias-Martinez, E., and V. Karamcheti (2002). A prediction model for user access sequences. In WEBKDD Workshop: Web Mining for Usage Patterns and User Profiles.
[18] Gandhi, V., G. Prasad, D. Coyle, L. Behera, and T. M. McGinnity (2014). "EEG-based mobile robot control through an adaptive brain–robot interface." IEEE Transactions on Systems, Man, and Cybernetics: Systems 44(9): 1278–1285.
[19] Goodfellow, I., Y. Bengio, A. Courville, and Y. Bengio (2016). Deep learning. MIT Press, Cambridge.
[20] Gordon, C., and R. Debus (2002). "Developing deep learning approaches and personal teaching efficacy within a preservice teacher education context." British Journal of Educational Psychology 72(4): 483–511.
[21] Gündüz, Ş., and M. T. Özsu (2003). A web page prediction model based on click-stream tree representation of user behavior. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM.
[22] Guo, Y., Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew (2016). "Deep learning for visual understanding: A review." Neurocomputing 187: 27–48.
[23] Hawalah, A., and M. Fasli (2015). "Dynamic user profiles for web personalisation." Expert Systems with Applications 42(5): 2547–2569.
[24] Huang, H., N.-C. Lin, L. Barrett, D. Springer, H.-C. Wang, M. Pomplun, and L.-F. Yu (2016). "Analyzing visual attention via virtual environments." 1–2.
[25] Huang, J.-H., Y.-R. Lin, and S.-T. Chuang (2007). "Elucidating user behavior of mobile learning: A perspective of the extended technology acceptance model." The Electronic Library 25(5): 585–598.
[26] Ketkar, N. (2017). Introduction to PyTorch. In Deep Learning with Python, pages 195–208. Springer.
[27] Khataei, A., and A. Arya (2014). Personalized presentation builder. In CHI '14 Extended Abstracts on Human Factors in Computing Systems. Toronto, Ontario, Canada, pages 2293–2298. ACM.
[28] Kilteni, K., R. Groten, and M. Slater (2012). "The sense of embodiment in virtual reality." Presence: Teleoperators and Virtual Environments 21(4): 373–387.
[29] Kiourt, C., A. Koutsoudis, and G. Pavlidis (2016). "DynaMus: A fully dynamic 3D virtual museum framework." Journal of Cultural Heritage 22: 984–991.
[30] Klinger, E., S. Bouchard, P. Légeron, S. Roy, F. Lauer, I. Chemin, and P. Nugues (2005). "Virtual reality therapy versus cognitive behavior therapy for social phobia: A preliminary controlled study." Cyberpsychology & behavior 8(1): 76–88.
[31] Koulieris, G. A., G. Drettakis, D. Cunningham, and K. Mania (2016). Gaze prediction using machine learning for dynamic stereo manipulation in games. In Virtual Reality (VR), 2016 IEEE. IEEE.
[32] Kuliga, S. F., T. Thrash, R. C. Dalton, and C. Hoelscher (2015). "Virtual reality as an empirical research tool—Exploring user experience in a real building and a corresponding virtual model." Computers, Environment and Urban Systems 54: 363–375.
[33] Lanata, A., G. Valenza, M. Nardelli, C. Gentili, and E. P. Scilingo (2015). "Complexity index from a personalized wearable monitoring system for assessing remission in mental health." IEEE Journal of Biomedical and Health Informatics 19(1): 132–139.
[34] Langlotz, T., T. Nguyen, D. Schmalstieg, and R. Grasset (2014). "Next-generation augmented reality browsers: rich, seamless, and adaptive." Proceedings of the IEEE 102(2): 155–169.
[35] Lecuyer, A., L. George, and M. Marchal (2013). “Toward adaptive VR simulators combining visual, haptic, and brain-computer interfaces.” IEEE computer graphics and applications 33(5): 18–23. [36] Li, S. S., and E. Karahanna (2015). “Online recommendation systems in a B2C E-commerce context: a review and future directions.” Journal of the Association for Information Systems 16(2): 72. [37] Lin, W., L. Du, C. Harris-Adamson, A. Barr, and D. Rempel (2017). Design of hand gestures for manipulating objects in virtual reality. In International Conference on Human-Computer Interaction. Springer. [38] Liu, N., J. Han, D. Zhang, S. Wen, and T. Liu (2015). Predicting eye fixations using convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on. IEEE. [39] Liu, Q., F. Yu, S. Wu, and L. Wang (2015). A convolutional click prediction model. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. ACM. [40] Lockton, D., D. Harrison, and N. A. Stanton (2010). “The Design with Intent Method: A design tool for influencing user behaviour.” Applied ergonomics 41(3): 382–392. [41] Luyten, K., and K. Coninx (2005). Distributed user interface elements to support smart interaction spaces. In Multimedia, Seventh IEEE International Symposium on. IEEE. [42] Martel, E., F. Su, J. Gerroir, A. Hassan, A. Girouard, and K. Muldner (2015). Diving Head-First into Virtual Reality: Evaluating HMD Control Schemes for VR Games. FDG. [43] Maybury, M. (1998). Intelligent user interfaces: an introduction. In Proceedings of the 4th international conference on Intelligent user interfaces. ACM. [44] Meißner, M., J. Pfeiffer, T. Pfeiffer, and H. Oppewal (2019). “Combining virtual reality and mobile eye tracking to provide a naturalistic experimental environment for shopper research.” Journal of Business Research 100: 445–458. [45] Nivethika, M., I. Vithiya, S. Anntharshika, and S. Deegalla (2013). Personalized and adaptive user interface framework for mobile application. In Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on. IEEE. [46] Partarakis, N., M. Antona, and C. Stephanidis (2016). Adaptable, Personalizable and Multi User Museum Exhibits. In Curating the Digital, pages 167–179. [47] Partarakis, N., M. Antona, E. Zidianakis, and C. Stephanidis (2016). Adaptation and Content Personalization in the Context of Multi User Museum Exhibits. AVI*CH. [48] Pera, M. S., and Y.-K. Ng (2011). A personalized recommendation system on scholarly publications. In Proceedings of the 20th ACM international conference on Information and knowledge management, Glasgow, Scotland, UK, pages 2133–2136. ACM. [49] Prewitt, S., J. Hannon, G. T. Colquitt, T. A. Brusseau, M. Newton, and J. Shaw (2015). “Implementation of a personal fitness unit using the Personalized System of Instruction (PSI)”. [50] Schwarzer, R. (2008). “Modeling health behavior change: How to predict and modify the adoption and maintenance of health behaviors.” Applied psychology 57(1): 1–29. [51] Sherman, W. R., and A. B. Craig (2002). Understanding virtual reality: Interface, application, and design, Elsevier. [52] Sili, M., M. Garschall, M. Morandell, S. Hanke, and C. Mayer (2016). Personalization in the User Interaction Design. In International Conference on Human-Computer Interaction. Springer. [53] Soler-Dominguez, J. L., J. D. Camba, M. Contero, and M. Alcañiz (2017). 
A Proposal for the Selection of Eye-Tracking Metrics for the Implementation of Adaptive Gameplay in Virtual Reality Based Games. In International Conference on Virtual, Augmented and Mixed Reality. Springer. [54] Steuer, J. (1992). “Defining virtual reality: Dimensions determining telepresence.” Journal of
communication 42(4): 73–93. [55] Sun, L., Y. Zhou, P. Hansen, W. Geng, and X. Li (2018). Cross-objects user interfaces for video interaction in virtual reality museum context. In Multimedia Tools and Applications. [56] Sutcliffe, A. G., and K. D. Kaur (2000). "Evaluating the usability of virtual reality user interfaces." Behaviour & Information Technology 19(6): 415–426. [57] Thies, J., M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner (2016). "Facevr: Real-time facial reenactment and eye gaze control in virtual reality." arXiv preprint arXiv:1610.03151. [58] Tseng, V. S., and K. W. Lin (2006). "Efficient mining and prediction of user behavior patterns in mobile web systems." Information and software technology 48(6): 357–369. [59] Tucker, C. E. (2014). "Social networks, personalized advertising, and privacy controls." Journal of Marketing Research 51(5): 546–562. [60] Verpoorten, D., C. Glahn, M. Kravcik, S. Ternier, and M. Specht (2009). Personalisation of learning in virtual learning environments. In European Conference on Technology Enhanced Learning. Springer. [61] Weber, K., H. Ritschel, F. Lingenfelser, and E. André (2018). Real-Time Adaptation of a Robotic Joke Teller Based on Human Social Signals. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems. [62] Wu, C., Z. Tan, Z. Wang, and S. Yang (2017). A Dataset for Exploring User Behaviors in VR Spherical Video Streaming. In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM. [63] Zhang, X., Y. Sugano, M. Fritz, and A. Bulling (2015). Appearance-based gaze estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Peter Knees, Markus Schedl, Bruce Ferwerda, and Audrey Laplante
9 User awareness in music recommender systems
Abstract: Music recommender systems are a widely adopted application of personalized systems and interfaces. By tracking the listening activity of their users and building preference profiles, a user can be given recommendations based on the preference profiles of all users (collaborative filtering), characteristics of the music listened to (content-based methods), meta-data and relational data (knowledge-based methods; sometimes also considered content-based methods), or a mixture of these with other features (hybrid methods). In this chapter, we focus on the listener’s aspects of music recommender systems. We discuss different factors influencing relevance for recommendation on both the listener’s and the music’s side and categorize existing work. In more detail, we then review aspects of (i) listener background in terms of individual, i. e., personality traits and demographic characteristics, and cultural features, i. e., societal and environmental characteristics, (ii) listener context, in particular modeling dynamic properties and situational listening behavior, and (iii) listener intention, in particular by studying music information behavior, i. e., how people seek, find, and use music information. This is followed by a discussion of user-centric evaluation strategies for music recommender systems. We conclude the chapter with a reflection on current barriers, by pointing out current and longer-term limitations of existing approaches and outlining strategies for overcoming these. Keywords: music recommender systems, personalization, user modeling, user context, user intent
Acknowledgement: We thank Fabien Gouyon for the discussions leading to the updated factor model. Peter Knees acknowledges support by the Austrian Research Promotion Agency (FFG) under the BRIDGE 1 project SmarterJam (858514).

1 Introduction

Music recommender systems are a widely adopted application of personalized systems and interfaces [118]. On a technical level, large-scale music recommender systems became feasible through online music distribution channels and collective platforms that track users' listening events; early examples from the pre-streaming era are peer-to-peer networks and platforms like Last.fm (http://last.fm). By tracking the listening activity of their users and building preference profiles, a user can be given recommendations based on the
pool of preference profiles of all users (collaborative filtering, e. g., [124, 15]), characteristics of the music listened to (content-based methods, e. g., [12, 129]), expert- and user-generated (relational) meta-data (knowledge-based methods, e. g., [134, 70, 103]; sometimes also considered content-based methods), or a mixture of these, potentially extended by other features (hybrid methods, e. g., [24, 94, 60]). The main research area exploring these opportunities, i. e., music information retrieval (MIR), historically, has predominantly followed content-based approaches [71]. This can facilitate music recommendation starting from preferred examples and then following the query-by-example paradigm, which is central to information retrieval tasks. While aspects of user adaptivity and relevance feedback can be addressed, e. g., [104, 72], modeling of the listener was underrepresented in the majority of the work [116]. For developing recommender systems, traditionally, static collections and recorded user interactions have served as offline ground truth. This permits researchers to optimize retrieval and recommendation system performance, e. g., by maximizing precision or minimizing error; cf. [16, 120]. More recently, with the establishment of dedicated online music streaming platforms such as Spotify (http://www.spotify.com) and Pandora (http://www.pandora.com), more dynamic and user-oriented criteria, assessed by means of massive online A/B testing, have driven the industrial development; cf. [126, 1]. However, both offline and online approaches operate on the basis of a system-centric view and therefore neglect user- and usage-centric perspectives on the process of music listening. Such perspectives involve, e. g., factors of listening context such as activity or social setting, listening intent, or the listener's personality, background, and preferences. Incorporating this information can enhance the process of music recommendation in a variety of situations, from mitigating cold-start scenarios, i. e., when usage data of new users are missing, to mood- and situation-tailored suggestions, to adaptive and personalized interfaces that support the listener in his or her music information seeking activities.

In this chapter, we focus on aspects of the listener in music recommendation. In Section 2, we discuss different factors that influence the relevance of recommendations. This covers aspects of both the listener and the musical items to recommend. Additionally, we briefly outline the development from search scenarios to approaches to personalization and user adaptation. Section 3 deals with aspects of listener background and discusses variables that influence differences in music preferences of listeners, divided into individual (i. e., personality traits and demographic characteristics) and cultural features (i. e., societal and environmental characteristics). In Section 4, we focus on the listener context, i. e., contextual and situational listening behavior. To this end, we elaborate on the modeling and description of the listener's
emotions, on the emotions assigned to music items, and on the relationship between these two. We further discuss methods that exploit various sensor data for user modeling. Section 5 then focuses on listener intention, in particular by studying music information behavior, i. e., how people seek, find, and use music information. This includes studies conducted in the information science field on how people discover new music artists or new music genres in everyday life, as well as studies that examine how people use and perceive music recommender systems. To round this chapter off, we give an overview of user-centric evaluation strategies for music recommender systems in Section 6, before concluding with a discussion of current barriers in Section 7, where we point out current and longer-term limitations of existing approaches and outline strategies for overcoming these. Although we present existing technical academic work, we also highlight findings from non-technical disciplines to call attention to currently missing facets of music recommender systems. These identified but not yet technically covered requirements should help the reader in identifying potential new research directions.
2 Relevant factors in music recommendation

In the field of recommender systems research, the interaction of two factors is relevant for making recommendations: the user and the item, i. e., in our case a music entity, such as a track, an artist, or a playlist. In traditional recommender systems, based on previous interactions of users and items, future interactions are predicted, either by identifying similar users or items (memory-based collaborative filtering) or by learning latent representations of users and items by decomposing the matrix of interactions (model-based collaborative filtering). While model-based methods have the advantage of resulting in representations of users and items that permit effective prediction of future interactions, a major drawback of these latent representations is that they are hard to interpret and, while describing the data, typically cannot be connected to actual properties of users or items. In an attempt to connect models to such properties (e. g., rating biases of users, popularity biases of products, or domain-specific properties like preference for the music genre of a track), more factors and degrees of freedom are included to fit the observed data (e. g., [75, 56, 26]). However, particularly in scenarios where no prior interaction data have been observed ("cold start"), such purely data-driven models show their weakness, making explicit modeling of user properties, usage context, etc., and their effects desirable.

While different aspects of music items are well covered by research in MIR, models of different facets of the listener have found little application in recommendation systems. For both user and item, we can identify different categories of these facets that impact recommendations, namely, intrinsic properties, goals, and external aspects.
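To make the model-based variant concrete, the following is a minimal sketch of factorizing a (synthetic) user–item interaction matrix into latent user and item vectors with stochastic gradient descent; it is an illustration of the general technique, not a production recommender, and all data are artificial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic user-item interaction matrix (e.g., ratings), mostly unobserved.
n_users, n_items, k = 50, 80, 8
R = np.zeros((n_users, n_items))
observed = rng.random((n_users, n_items)) < 0.1          # ~10 % of cells observed
R[observed] = rng.integers(1, 6, size=observed.sum())    # ratings 1..5

# Latent factors: each user and item is described by a k-dimensional vector.
P = 0.1 * rng.standard_normal((n_users, k))
Q = 0.1 * rng.standard_normal((n_items, k))

lr, reg = 0.01, 0.05
users, items = np.nonzero(observed)
for epoch in range(30):
    for u, i in zip(users, items):
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])            # SGD update for the user
        Q[i] += lr * (err * P[u] - reg * Q[i])            # SGD update for the item

# The predicted score for any user-item pair is the dot product of their factors.
print("predicted score for user 0, item 3:", P[0] @ Q[3])
```

The rows of P and Q in this sketch are exactly the kind of latent representations whose lack of interpretability is criticized above, which motivates the explicitly modeled user and item facets discussed in the remainder of this section.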
Fig. 9.1 shows these six finer-grained factors underlying the interaction between users and items (this categorization extends the four categories of influences of music similarity perception by Schedl et al. [116] and further integrates aspects of the model of music perception and conveyed emotions by Juslin [63]).
Figure 9.1: User and item factors influencing recommendations. Recommender systems typically model and predict the interaction between users and items. Both users and items themselves can be described by different factors that can be categorized into intrinsic properties, goals, and external factors. While intrinsic factors refer to mostly stable features, goals and external factors are more dynamic.
In terms of item factors, we can distinguish between the following.
Music content refers to everything that is contained in or, more pragmatically, that can be extracted from the audio signal itself, such as aspects of rhythm, timbre, melody, harmony, structure, or even the mood of a piece.
Music purpose refers to the intended usage of the music, which can have a spiritual or political purpose (e. g., an anthem) or can be created for the purpose of playing in the background (e. g., muzak). This also relates to aspects of associative coding in Juslin's theory [63], which conveys connotations between the music and other "arbitrary" events or objects. The role of this facet for recommendation has remained largely unexplored, apart from special treatment of certain events, such as offering special playlists for holiday seasons like Christmas, containing tracks usually filtered out during the remaining time.
Music meta-data (and cultural context) refers to aspects of the music that cannot be inferred directly from the audio signal, such as meta-data like year of recording or country of origin, as well as different types of community meta-data: user-generated content such as reviews or tags; additional multimodal contextual information such as album artwork, liner notes, or music videos; and diverse
outcomes and impacts of marketing strategies. This also captures elements categorized as associative coding (see above).

Correspondingly, in terms of user facets, we can distinguish between the following.
Listener background refers to the listener's personality traits, such as preference and taste, musical knowledge, training, and experience, as well as to demographics and cultural background. Generally, this comprises more static and stable characteristics of the user.
Listener intent refers to the goal of the listener in consuming music. Potential goals span from evocation of certain emotions, i. e., emotional self-regulation, to the desire to demonstrate knowledge, musical sophistication, or musical taste in a social setting.
Listener context refers to the current situation and setting of the user, including location, time, activity, social context, or mood. This generally describes more dynamic aspects of the listener.

In the following, we focus on the listener and review work dealing with these different dimensions. First, considering aspects of listener background, we give an overview of work exploring music preference, personality, and cultural characteristics. Second, we focus on contextual and more dynamic factors, namely, modeling the listener's emotional state, as well as deriving a listener's context from sensor data of personal devices. Finally, we deal with the least explored area in terms of music recommender systems, i. e., the listener's intent, in the context of music information behavior.
3 Correlates of music preferences

Music plays an important part in our lives. We actively use music as a resource to support us in everyday life activities. Hence, music can have different functions (cf. Section 5). Merriam and Merriam [95] defined several functions of music, amongst which are emotion expression, esthetic enjoyment, entertainment, communication, and symbolic representation. Given the different functions of music, the kind of music that is appropriate for a certain function may be a personal matter. What we like to listen to is shaped by our personal tastes and preferences as well as by our cultural preconceptions [100]. Although ample research has been carried out in traditional psychology on individual differences of music preferences, it is important to investigate to what extent these findings still hold in a technologically mediated context, as well as to investigate new relationships that have emerged through the interaction opportunities these technologies facilitate. In this chapter, we discuss work on music preferences in a technological setting that deals with correlations between preferences and individual as well as cultural aspects, and we try to draw parallels with results from traditional psychology.
3.1 Individual aspects

The existence of individual differences in music preferences has been investigated quite extensively in traditional psychological research already (for an overview see [109]). However, with recent technological advances, current online music systems (e. g., online music streaming services) provide their users with an almost unlimited amount of content that is directly at their disposal. This abundance of available music may have deviating effects on our prior knowledge of how people listen to music. For example, users may be prone to try out different content more than they would do in the offline world, and even their preferences may change more often or become more versatile [42].

Prior psychological work argued that age may play an important role in identifying individual differences in music preferences due to varying influences that shape music preferences across the course of life. For example, an individual may develop their music taste through the influence of parents at an early age, but get influenced by the taste of peers later on in life [110]. Recent work investigated the change of music preferences with advancing age by analyzing the music listening histories of an online music streaming service [41]. By being able to trace back the music listening histories, a mapping could be made of the change of music preferences. Although the online music listening behaviors reflected more diversity and versatility, the general trends are in line with prior psychological work [110]. The results showed that over time music preferences became more stable, while in the younger age groups music preferences are more exploratory and scattered across genres. Recent work has shown that, conversely, demographic information of users [76] as well as the musical sophistication of music listeners [30] can also be predicted from online listening behavior.

Aside from the identification of age differences, another way that is often used to segment online music listeners is based on their personality. Personality has been shown to be a stable construct and is often used as a general model to which behavior, preferences, and needs of people can be related [62]. A common way to segment people on personality traits is based on the Five-Factor Model (FFM). The FFM describes personality traits based on five general dimensions: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism (see Table 9.1).

Table 9.1: Five-Factor Model, adopted from [62].

General dimension | Primary factors
Openness to Experience | Artistic, curious, imaginative, insightful, original, wide interest
Conscientiousness | Efficient, organized, planful, reliable, responsible, thorough
Extraversion | Active, assertive, energetic, enthusiastic, outgoing, talkative
Agreeableness | Appreciative, forgiving, generous, kind, sympathetic, trusting
Neuroticism | Anxious, self-pitying, tense, touchy, unstable, worrying
Several works have shown the relationship between personality traits and online music preferences, but mainly investigated the relationship of personality traits with ways of interacting within a system. Although these results do not allow for a comparison with results from traditional psychology, they do provide new insights into user interactions with music systems and how these interactions can be personalized. Ferwerda et al. [38] investigated how personality and emotional states influence the preference for certain kinds of music. Other studies have shown that personality is related to the way we browse (i. e., by genre, mood, or activity) for online music [44]. In their study they simulated an online music streaming service in which they observed how users navigated through the service to find music that they liked to listen to. Tkalčič et al. [128] investigated the relationship between personality traits and digital concert notes. They looked at whether personality influences preferences for the amount of content presented. Others looked at music diversity needs based on personality traits [31] and have proposed ways to incorporate personality traits into music recommender systems to improve music recommendations to users [33, 35].
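As an illustration only — and not the method of the studies cited above — personality information could, for instance, be blended into user-based collaborative filtering by combining rating-profile similarity with similarity of FFM trait vectors when selecting neighbors. All data in the following sketch are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small constant to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def blended_user_similarity(ratings_a, ratings_b, traits_a, traits_b, alpha=0.7):
    """Blend rating-profile similarity with FFM personality similarity.

    alpha weighs the rating-based part; (1 - alpha) weighs the personality part.
    For a new user with an empty rating profile, the personality term still
    provides a usable similarity signal (one possible view of mitigating cold start).
    """
    return alpha * cosine(ratings_a, ratings_b) + (1 - alpha) * cosine(traits_a, traits_b)

# Hypothetical data: genre-level listening profiles and FFM scores in [0, 1]
# (openness, conscientiousness, extraversion, agreeableness, neuroticism).
new_user = dict(ratings=np.zeros(6), traits=np.array([0.9, 0.4, 0.7, 0.6, 0.2]))
other = dict(ratings=np.array([5.0, 0, 3, 0, 1, 4]), traits=np.array([0.8, 0.5, 0.6, 0.7, 0.3]))
print(blended_user_similarity(new_user["ratings"], other["ratings"],
                              new_user["traits"], other["traits"]))
```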
3.2 Cultural aspects

Aside from individual aspects, preference differences can already occur on a cultural level. The environments that we are exposed to have a big influence on how our preferences are shaped [100, 115]. Especially with services being online and widespread, analyses of more global behaviors are possible. For example, artist preference differences have been found based on linguistic distance [91]. A known way to investigate cultures is by relying on Hofstede's cultural dimensions [54]. Although this model originated in 1968, it is still widely used. Hofstede's cultural dimensions are based on data of 97 countries. The data show patterns that result in the following six dimensions: power distance index, individualism, uncertainty avoidance index, masculinity, long-term orientation, and indulgence, as described in the following.
Power distance defines the extent to which an unequal distribution of power is accepted by less powerful members of institutions (e. g., the family). A high power distance indicates that a hierarchy is clearly established and executed in society. A low power distance indicates that authority is questioned and attempts are made to distribute power equally.
Individualism defines the degree of integration of people into societal groups. High individualism is defined by loose social ties – the main emphasis is on the "I" instead of the "we" – while this is the opposite for low individualistic cultures.
Masculinity defines a society's preference for achievement, heroism, assertiveness, and material rewards for success (high score in this dimension). Conversely, low masculinity represents a preference for cooperation, modesty, caring for the weak, and quality of life.
Uncertainty avoidance defines a society's tolerance for ambiguity. High scoring countries in this scale are more inclined to opt for stiff codes of behavior, guidelines, and laws, whereas acceptance of different thoughts and/or ideas is more common for those scoring low in this dimension.
Long-term orientation is associated with the connection of the past with the current and future actions and/or challenges. Inhabitants of lower scoring countries tend to believe that traditions are honored and kept, and values are steadfast. Inhabitants of high scoring countries believe more that adaptation and circumstantial, pragmatic problem solving are necessary.
Indulgence defines in general the happiness of a country. Countries scoring high in this dimension are related to a society that allows relatively free gratification of basic and natural human desires related to enjoying life and having fun (e. g., being in control of one's own life and emotions), whereas low scoring countries show more controlled gratification of needs and regulate it by means of strict social norms.

Studies that looked at Hofstede's cultural dimensions found differences on several aspects, for example, diversity in music listening. Countries scoring high on the power distance tend to show less diversity in the artists and genres they listen to. The individualism dimension was found to negatively correlate with music diversity [43]. Extended analysis can be found in [34]. Others have shown that Hofstede's cultural dimensions and socio-economic factors can be used to predict genre preferences of music listeners [92, 119, 122]. By applying a random forest algorithm, they were able to achieve an improvement of 16.4 % in genre prediction over the baseline [122].

The identification of individual and cultural differences with regard to music contributes to new and deeper understanding of behaviors, preferences, and needs in online music environments. Moreover, the findings also provide insights into how these differences can be exploited for personalizing experiences. For example, a persistent problem for personalized systems is implicit preference elicitation for new users. Relying on identified individual and cultural differences may help mitigate these preference elicitation problems. For example, research has shown that personality can be predicted from behavioral traits on social media (e. g., Facebook [14, 35, 49], Twitter [48, 108], Instagram [36, 37, 39, 40, 83], and a combination of social media sources [123]). The increased implementation of single sign-on (SSO) mechanisms (buttons that allow users to register or log in with accounts of other applications, for example, social networking services: "Log in with your Facebook account") allows users to easily log in and register to an application, but also lets applications import user information from the connected application. Hence, these personalization prediction methods could play an important role in personalization strategies to mitigate preference elicitation for new users. When there are no external information sources available to extract user information from, personalization strategies based
on cultural findings may be the second best option for personalization. Country information often already exists in a standard user profile and is therefore easy to acquire.
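The data of the cited studies are not reproduced here; the following sketch merely shows, on synthetic stand-in data, the general shape of such a pipeline: predicting an (artificial) genre-preference label from Hofstede's six dimensions plus a socio-economic indicator with a random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in data: one row per country with Hofstede's six dimensions
# (power distance, individualism, masculinity, uncertainty avoidance,
# long-term orientation, indulgence) plus one socio-economic indicator.
n_countries = 97
X = rng.uniform(0, 100, size=(n_countries, 7))

# Synthetic label: a "preferred genre" class per country (three classes),
# loosely tied to two dimensions so there is something to learn.
y = (X[:, 1] > 50).astype(int) + (X[:, 5] > 60).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
print("cross-validated accuracy:", scores.mean().round(3))
```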
4 Contextual and situational music listening behavior

The situation or context a person is in when listening to music – or deciding what to listen to – is known to strongly affect the music preferences as well as consumption and interaction behavior [7, 22]. To give an example, a person is likely to listen to different music or create a different playlist when preparing for a romantic dinner than when preparing to go out on a Saturday night [47]. The most frequently considered types of context include location (e. g., listening at the workplace, when commuting, or relaxing at home) [66] and time (typically categorized into, e. g., morning, afternoon, and evening) [13]. In addition, context may also relate to the listener's activity [133], weather [106], listening device, e. g., earplugs on a smartphone versus hi-fi stereo at home [47], and various social aspects [20, 101], just to name a few. Another type of context is interactional context with sequences, which is particularly important for the tasks of session-based recommendation and sequence-aware recommendation. In this case, context refers to the sequence of music pieces a listener decides to consume consecutively. In the music domain, such tasks are often referred to as automatic playlist generation or automatic playlist continuation [10, 17]. Sequence learning and natural language processing techniques applied to playlist names are typically used to infer contextual aspects. A particularly important situational characteristic is that of emotion, both from a user's perspective [113] and from song annotations [137]. In the following, we therefore first introduce in Section 4.1 the most common approaches to model listeners' moods and emotions, emotions perceived while listening to music, and ways to affectively connect listeners and music pieces. In Section 4.2, we subsequently review methods that exploit various sensor data for user modeling in music recommender systems, e. g., from sensors built into smart devices. Such sensor data can be used either to directly learn contextual music preferences, or to infer higher-level context categories such as the target user's activity.
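As a deliberately simple illustration of the session-based setting mentioned above — not of any particular published system — a first-order transition model over listening sessions can already serve as a playlist-continuation baseline. The sessions and track IDs below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical listening sessions: ordered lists of track IDs.
sessions = [
    ["t1", "t2", "t3", "t2", "t4"],
    ["t2", "t4", "t5"],
    ["t1", "t2", "t4"],
]

# Count first-order transitions current_track -> next_track across all sessions.
transitions = defaultdict(Counter)
for session in sessions:
    for current, nxt in zip(session, session[1:]):
        transitions[current][nxt] += 1

def continue_playlist(current_track, k=2):
    """Return the k tracks most frequently played right after current_track."""
    return [track for track, _ in transitions[current_track].most_common(k)]

print(continue_playlist("t2"))   # e.g., ['t4', 't3']
```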
4.1 Emotion and mood: connecting listeners and music

The affective state of the listener has a strong impact on his or her short-time musical preferences [65]. Vice versa, music also strongly influences our affective state. It therefore does not come as a surprise that affect regulation is regarded as one of the main
reasons why people listen to music [93, 113]. As an example, people may listen to completely different musical genres or styles when they are sad in comparison to when they are happy. Indeed, prior research on music psychology discovered that people usually choose the type of music which moderates their affective condition [74]. More recent findings show that music is often chosen for the purpose of augmenting the emotional situation perceived by the listener [99]. Note that in psychology – often in contrast to recommender systems or MIR research, but also everyday use – it is common to distinguish between mood and emotion as two different affective constructs. The most important differences are that a mood is characterized as an experience of longer but less intense duration without a particular stimulus, whereas an emotion is a short experience with an identifiable stimulus event that causes it. In order to build affect-aware music recommenders, it is necessary to (i) infer the emotional state or mood the listener is in, (ii) infer emotional concepts from the music itself, and (iii) understand how these two interrelate. These three tasks are detailed below. In the context of (i), we also introduce the most important ways to describe emotions.
4.1.1 Modeling the listener's emotional state

The emotional state of a human can be obtained explicitly or implicitly. In the former case, the person is typically presented with a questionnaire or user interface that maps the user's explicit input to an emotion representation according to one of the various categorical models or dimensional models. Categorical models describe emotions by distinct words such as happiness, sadness, anger, or fear [139, 53], while dimensional models describe emotions by scores with respect to two or three dimensions. One of the most prominent dimensional models is Russell's 2D circumplex model [111], which represents valence and arousal as orthogonal dimensions; cf. Fig. 9.2 (top). Into this model, categorical models can be integrated, e. g., by mapping emotion terms to certain positions within the continuous emotion space. The exact positions are commonly determined through empirical studies with humans. For a more detailed elaboration on emotion models in the context of music, we refer to [137, 117]. One prominent example is the Geneva emotion wheel (http://www.affective-sciences.org/gew), depicted in Fig. 9.2 (bottom). It is a hybrid model that uses emotion terms as dimensions and describes the intensity of each of these emotions on a continuous scale. Besides explicit emotion elicitation, the implicit acquisition of people's emotional states can be effected, for instance, by analyzing user-generated text [23], speech [29],
Figure 9.2: Emotion models. Top: emotional terms expressed in the valence–arousal space, adopted from [111]. Bottom: Geneva emotion wheel, adopted from [121].
or facial expressions in video [27] as well as a combination of audio and visual cues [67, 98].
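As an illustration of how categorical emotion terms can be embedded in a dimensional model, the following Python sketch places a handful of emotion words at illustrative, not empirically calibrated, positions in the valence–arousal plane and maps a continuous estimate, e. g., obtained from text or speech analysis, to the nearest term. The coordinates and the nearest-neighbor rule are simplifying assumptions made for demonstration purposes.

```python
import math

# Illustrative (not empirically determined) positions of emotion terms in
# Russell's valence-arousal plane; both axes are assumed to range from -1 to 1.
EMOTION_COORDS = {
    "happy":   ( 0.8,  0.5),
    "excited": ( 0.6,  0.8),
    "calm":    ( 0.6, -0.6),
    "sad":     (-0.7, -0.5),
    "angry":   (-0.6,  0.8),
    "bored":   (-0.4, -0.7),
}

def nearest_emotion(valence, arousal):
    """Map a continuous (valence, arousal) estimate to the closest emotion term."""
    return min(
        EMOTION_COORDS,
        key=lambda term: math.dist((valence, arousal), EMOTION_COORDS[term]),
    )

# A dimensional estimate, e.g. inferred implicitly from user-generated text.
print(nearest_emotion(0.7, 0.6))  # -> "happy" with the coordinates above
```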
4.1.2 Modeling the emotion perceived in music

Music can be regarded as emotion-laden content and can hence also be described by emotion words, just like listeners. The task of automatically assigning such emotion terms (in the case of categorical emotion models) or intensities (in the case of dimensional models) to a given music piece is an active research area, typically referred to as music emotion recognition (MER), e. g., [138, 6, 68, 57, 139]. If categorical emotion models are adopted, the MER task is treated as a classification task, whereas it is
considered a regression task in the case of dimensional models. Even though a variety of datasets, feature modeling approaches, and Machine Learning methods have been created, devised, and applied, respectively, integrating the emotion terms or intensities predicted by MER tools into a music recommender system is not an easy task for several reasons. First, early MER research approached the problem from a pure Machine Learning perspective, taking audio features as input to predict labels, which in this case constituted emotion terms. These approaches were agnostic of the actual meaning of the emotion terms as they failed to distinguish between intended emotion, perceived emotion, and induced or felt emotion [120]. Intended emotion refers to the emotion the composer, songwriter, or performer had in mind when creating or performing the music piece; perceived emotion refers to the emotion recognized by a person listening to a piece; induced emotion refers to the emotion felt by the listener. Current MER approaches commonly target perceived or induced emotions. Second, the ways musical characteristics reflected by content descriptors (rhythm, tonality, lyrics, etc.) influence the emotional state of the listener remain highly subjective, even though some general rules have been identified [77]. For instance, a musical piece in a major key is typically perceived as brighter and happier than a piece in a minor key. A fast piece is perceived as more exciting or more tense than a slow one. However, the perception of music emotion also depends on other psychological constructs such as the listener's personality [117, 38]; cf. Section 3.1.
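The following sketch illustrates the regression formulation of MER described above, using scikit-learn and randomly generated stand-ins for audio features and valence–arousal annotations; the feature dimensionality, model choice, and data are hypothetical and serve only to show the shape of the task.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical data: each row is an audio feature vector (e.g. tempo, mode,
# spectral statistics); targets are annotated valence and arousal in [-1, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))          # 200 tracks, 20 audio features
y = rng.uniform(-1, 1, size=(200, 2))   # columns: valence, arousal

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Dimensional MER as multi-output regression; a categorical emotion model
# would instead call for a classifier over emotion labels.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print(model.predict(X_test[:1]))  # predicted (valence, arousal) for one track
```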
4.1.3 Relating human emotions and music emotion annotations

As a result of the previous discussion on the relationship between listeners and emotions (Section 4.1.1) and between music pieces and emotions (Section 4.1.2), we assume that information about user–emotion assignments and item–emotion assignments is available. Towards building emotion-aware music recommender systems, we next have to connect users and items through emotions, in the affectively intended way, which is a challenging endeavor. To this end, knowing about the user's intent is crucial. Three fundamental intents or purposes of music listening have been identified in a study conducted by Schäfer et al. [113]: self-awareness (e. g., stimulating a reflection of people on their identity), social relatedness (e. g., feeling closeness to friends and expressing identity), and arousal and mood regulation (e. g., managing emotions). Several studies found that affect regulation is indeed the most important reason why people listen to music [113, 93, 9]. Nevertheless, modeling music preferences as a function of the listener's mood, listening intent, and the affective impact of listening to a certain emotionally laden music piece is still insufficiently understood. This is the likely reason why, to the best of our knowledge, full-fledged emotion-aware systems still do not exist. Preliminary approaches integrate content and mood-
based filtering [2] or implement cross-modal recommendation, such as matching the mood of an input video with that of music pieces and recommending matching pieces [112]. Other work infers the user's emotional state from sensor data (cf. Section 4.2) and matches it with explicit user-specific preference indications. For instance, Park et al. [105] gather information about temperature, humidity, noise, light level, weather, season, and time of day and subsequently use these features to predict whether the user is depressed, content, exuberant, or anxious. Based on explicit user preference feedback about which type of music he or she prefers in a given emotional state, the proposed system then adapts recommendations. To conclude, without sound psychological listener profiles and a comprehensive understanding of the listener's affective state, listening intent, and the affective impact of a song on the listener, emotion-aware recommender systems are unlikely to produce recommendations that truly satisfy the user. Gaining such insights, developing methods to create respective listener profiles, and subsequently devising approaches to integrate them into systems can therefore be considered open research challenges.
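As a toy illustration of connecting user–emotion and item–emotion assignments, the following sketch ranks tracks by their distance to a target point in the valence–arousal plane; the target either matches the listener's current state or is shifted towards a calmer, more positive state as a deliberately simplistic stand-in for mood regulation. The track annotations, the shift heuristic, and the intent labels are assumptions for illustration and are not taken from the cited works.

```python
import math

# Hypothetical item annotations: track -> (valence, arousal), e.g. from an MER model.
track_emotions = {
    "ballad_a":  (-0.6, -0.4),
    "pop_b":     ( 0.7,  0.5),
    "ambient_c": ( 0.4, -0.7),
    "metal_d":   (-0.2,  0.9),
}

def rank_tracks(user_valence, user_arousal, intent="augment"):
    """Rank tracks by closeness to a target affective state.

    intent == "augment": match the listener's current state;
    intent == "regulate": nudge the target towards a calmer, more positive
    state (a crude stand-in for mood regulation, assumed for illustration).
    """
    if intent == "regulate":
        target = (min(user_valence + 0.5, 1.0), user_arousal * 0.5)
    else:
        target = (user_valence, user_arousal)
    return sorted(
        track_emotions,
        key=lambda track: math.dist(target, track_emotions[track]),
    )

print(rank_tracks(-0.5, -0.3, intent="regulate"))
```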
4.2 Sensor data for context modeling Contextual modeling for music recommender systems can also be achieved by exploiting various sensor data, where we understand sensors in a broad sense, not only as physical or hardware devices, but also including virtual sensors like active apps or running background tasks on a personal device. Today’s smart devices are packed with sensors, ranging from motion to proximity to light sensors. It has therefore become easier than ever to gather large amounts of sensor data and exploit them for various purposes, such as gait recognition [132], human activity classification [78], or personal health assistance [89]. Table 9.2 provides a categorization of some sensor data that can be gathered from smart devices [47]. Data attributes are either of categorical or numerical type. In addition to the frequently used temporal, spatial, and motion signals, the table lists hardware-specific attributes (device and phone data), environmental information about the surrounding of the user (ambience), connectivity information (network), information about used applications (tasks), and application-specific information (in our context, of a music player). Such sensor data have been exploited to some extent to build user models that are subsequently integrated into context- or situation-aware music recommender systems. Most earlier approaches are characterized by the fact they take only a single category of context sensors into account, most often spatial and temporal features. To give a few examples, Lee and Lee [87] exploit weather conditions alongside listening histories. Cebrian et al. [13] use temporal features. Addressing the task of supporting sports and fitness training, a considerable amount of work uses sensors to gauge steps per minute or heart rate to match the music played with the pace of the listener, or
Table 9.2: Categories of common sensor data used in context modeling, adapted from [47]. Letters in parentheses indicate whether the attribute is Categorical or Numerical.

Time: day of week (N), hour of day (N)
Location: provider (C), latitude (C), longitude (C), accuracy (N), altitude (N)
Weather: temperature (N), wind direction (N), wind speed (N), precipitation (N), humidity (N), visibility (N), pressure (N), cloud cover (N), weather code (N)
Device: battery level (N), battery status (N), available internal/external storage (N), volume settings (N), audio output mode (C)
Phone: service state (C), roaming (C), signal strength (N), GSM indicator (N), network type (N)
Task: recently used tasks/apps (C), screen on/off (C), docking mode (C)
Network: mobile network: available (C), connected (C); active network: type (C), sub-type (C), roaming (C); Bluetooth: available (C), enabled (C); Wi-Fi: enabled (C), available (C), connected (C), BSSID (C), SSID (C), IP (N), link speed (N), RSSI (N)
Ambience: light (N), proximity (N), temperature (N), pressure (N), noise (N)
Motion: acceleration force (N), rate of rotation (C), orientation of user (N), orientation of device (C)
Player: repeat mode (C), shuffle mode (C), sound effects: equalizer present (C), equalizer enabled (C), bass boost enabled (C), bass boost strength (N), virtualizer enabled (C), virtualizer strength (N), reverb enabled (C), reverb strength (N)
to stimulate a particular exercising behavior, for example, Biehl et al. [8], Elliott and Tomlinson [28], Dornbush et al. [25], Cunningham et al. [18], de Oliveira and Oliver [21], and Moens et al. [96]. Besides the use case of sports and exercising, there further exists research work that targets other specific tasks, e. g., music recommendation while working [136], driving a car [4], or for multiple activity classes such as running, eating, sleeping, studying, working, or shopping [133, 131]. An approach to identify music to accompany the daily activities of relaxing, studying, and workout is proposed in [135]. More recent research works integrate a larger variety of sensor data into user models. For instance, Okada et al. [102] present a mobile music recommender that exploits sensor data to predict the user’s activity, environment, and location. Activity is inferred from the device’s accelerometer and classified into idle, walking, bicycling, running, etc. The user’s environment is predicted by recording audio on the device and matching it to a database of audio snippets, which are labeled as meeting, office, bus, etc. As for location, latitude and longitude GPS data are clustered into common user locations. Integrating activity, environment, location, and quantized temporal data with respect to time of day and week days, the proposed system learns rules such as “Every Sunday afternoon, the user goes jogging and listens to metal songs,” which are used among other information to effect recommendations.
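To illustrate how heterogeneous context attributes of the kind listed in Table 9.2 can be turned into a flat feature vector for a learning-based recommender, the following sketch uses scikit-learn's DictVectorizer, which one-hot encodes string-valued (categorical) attributes and passes numerical ones through. The attribute names follow the style of Table 9.2, but the snapshots and values are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical context snapshots; categorical attributes carry string values,
# numerical attributes carry numbers.
snapshots = [
    {"hour_of_day": 8,  "day_of_week": 1, "network_type": "wifi",
     "audio_output_mode": "earplugs", "light": 120.0, "screen_on": "yes"},
    {"hour_of_day": 22, "day_of_week": 6, "network_type": "mobile",
     "audio_output_mode": "speaker", "light": 5.0, "screen_on": "no"},
]

# One-hot encode categorical values and pass numerical values through,
# yielding the kind of flat context vector a classifier or recommender consumes.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(snapshots)

print(vectorizer.get_feature_names_out())
print(X)
```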
Wang et al. [133] present a system which records time of day, accelerometer data, and ambient noise. Using these features, the recommender predicts the user’s activity, i. e., running, walking, sleeping, working, or shopping. Activity-aware recommendations are eventually effected by matching music pieces labeled with activity tags with the user’s current activity. Schedl et al. [114] propose a mobile music recommender called Mobile Music Genius, which acquires and monitors various context features during playback. It uses a decision tree classifier to learn to predict music preferences for given contexts. The user preference is modeled at various levels, i. e., genre, artist, track, and mood; the context is modeled as an approximately 100D feature vector, including the attributes listed in Table 9.2. While playing, the user context is continuously monitored and compared to the temporally preceding context vector. If context changes exceed a sensitivity parameter, a track that fits the new context is requested from the classifier and added as next track to the playlist. Hong et al. [55] exploit day of the week, location, weather, and device vibration as features to build a bipartite graph, in which nodes either represent contexts or music pieces. Edges connect the two categories of nodes and the weight of an edge indicates how often a user has listened to a certain piece in a certain context. A random walk with restart algorithm is then used to identify music pieces that fit a given context. This algorithm takes into account the number, lengths, and edge weights of paths between the given context node and the music nodes. In summary, early works that exploit sensor data for context modeling for music recommendation focused on single and typically simple sensors, such as time or weather, while more recent ones consider a variety of sensor data, extending the above by motion, environment, or location information, among others. Context models are then created either based on the entirety of the considered sensor data [114, 55], or based on inferred information such as the user’s activity [102, 133, 131] or mood [105].
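The following minimal sketch illustrates the random-walk-with-restart idea on a small bipartite context–track graph; the adjacency weights, node semantics, and parameter values are hypothetical and not taken from [55].

```python
import numpy as np

# Nodes 0-1 are context nodes, nodes 2-4 are tracks; the (symmetric) weights
# count how often a track was played in a context (hypothetical values).
W = np.array([
    [0, 0, 3, 1, 0],
    [0, 0, 0, 2, 4],
    [3, 0, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 4, 0, 0, 0],
], dtype=float)

# Column-normalize the weights to obtain transition probabilities.
P = W / W.sum(axis=0, keepdims=True)

def random_walk_with_restart(P, start, restart=0.15, iters=100):
    """Iteratively compute visiting probabilities of a walk restarting at `start`."""
    e = np.zeros(P.shape[0])
    e[start] = 1.0
    scores = e.copy()
    for _ in range(iters):
        scores = (1 - restart) * P @ scores + restart * e
    return scores

scores = random_walk_with_restart(P, start=0)  # query context: node 0
print(scores[2:])  # relevance scores of the three tracks for this context
```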
5 Music information behavior To deepen our understanding of the listeners’ perception and uses of music recommender systems, we should also examine their music information behavior in everyday life. The term “information behavior” encompasses a wide range of activities including seeking, finding, using, selecting, avoiding, and stumbling upon information [11]. The information can come from formal sources (e. g., books, magazines, music recordings) or from informal ones (e. g., friends, family members). In other words, information behavior research does not limit its scope to users’ interactions with information systems. It also looks more broadly at the information practices that surround the use (or non-use) of these systems.
In this section, we look at the music-related information behavior of people in everyday life. In the first subsection, we review the literature on how people discover music in their daily life and the role music recommender systems play in it; in the second, we focus more specifically on studies of users' perceptions of music recommender systems.
5.1 Discovering music in everyday life Task-based experiments and transaction logs are useful for identifying usability problems in a system. However, the intent and experience of users can only be inferred from the traces of their interactions with the system. To know about the true users’ needs, we need to take a step back to get a broader perspective on information practices in real life. To understand how music recommender systems can better support their users, we suggest looking at the studies on how people discover music in their daily life. For the most part, these studies have been conducted by researchers in information sciences and employ qualitative interviewing, observations, diaries, and surveys, with the objective of learning about the real-life behaviors of real-life users, oftentimes in real-life settings. 5.1.1 Importance of friends and families Consistently and across age groups, studies show that friends, family, and acquaintances were and remain the main source of music discovery in everyday life [127, 79, 61, 69]. Even as music streaming services are pervasive, people prefer approaching friends and relatives to ask for music suggestions rather than seeking recommendations in online services. In a survey conducted by Lee and her colleagues [84], 82.8 % reported turning to friends or family members when searching for music information. Qualitative studies reveal that people do not only turn to people out of convenience. They appreciate receiving recommendations specifically tailored to their tastes from a source they trust, that is, a close friend, a relative, or an acquaintance they consider as being more knowledgeable in the music domain than they are and whose music tastes they value [79, 61, 80]. Additionally, along with the recommendations, the informant will often willingly provide information about the artist and the music, and convey his or her appreciation and enthusiasm along the way, thus turning the social interaction into a “musical experience in its own right” [69]. 5.1.2 Prevalence of serendipitous encounters Research on music information behavior also uncovered that people discover music primarily by chance. People rarely go to music streaming services with the specific
objective of looking for the perfect gem. They stumble upon it during their daily routine (e. g., music heard on the radio, in a cafe, in a friend’s car) or en route, while looking for something else. Indeed, studies with younger adults show that a majority of music discoveries (63.3 % in [19]) are the result of passive information behavior or serendipitous encounters [19, 79]. Music has become a nearly constant soundtrack in many people’s lives. Opportunities for encountering music are numerous. However, the strong engagement adolescents and young adults have with music might also explain the prevalence of these events. Research suggests that serendipitous encountering of information does not occur completely randomly, “but from circumstances brought about by unconscious motives which lead ultimately to the serendipitous event” [45]. In [79], it was found that many avid music listeners were also “superencounterers,” for they regularly engaged in activities likely to produce serendipity (e. g., wandering in a music festival) and were constantly monitoring their environment for interesting music. 5.1.3 The role of music recommender systems Although all surveys converge to show the extreme popularity of music streaming services, the adoption of the discovery functionalities of these services is somewhat slower. In a survey conducted by Lee et al. [84], 64.6 % of the participants reported using cloud music services to discover music. Liikkanen and Åman [90] get similar results, with 65 % of the respondents reporting using YouTube7 to discover new artists. Interestingly, Lee et al.’s [84] survey also reveals that a much smaller proportion of women (36.4 %) use these functionalities compared with men (77.1 %). Moreover, the interviews conducted in [61, 69] reveal what seems to be a prevalent pattern: people first get introduced to new music in their daily life (e. g., in the media, through friends), and then they use music recommender systems to expand their exploration to other songs and/or artists.
5.2 User studies of music recommender systems

Considering the rapid and widespread adoption of music streaming services, several recent user studies on these systems have been published. In this section, we focus mainly on studies of users' experiences with and perception of the music recommendations provided by these systems. These studies consist mainly of qualitative research conducted by social scientists and User Experience (UX) studies on music streaming services conducted by researchers in information sciences.
7 https://www.youtube.com
5.2.1 Users' perception of music recommendations

Since the major players in the music streaming industry have comparable (and very extensive) catalogs, the branding of these services now lies in the features they offer, including personalized recommendations and curated playlists [97]. As mentioned above, many users make use of the discovery functions of music streaming services (cf. Section 5.1.3). Therefore, it seems logical that the respondents of a survey considered "Exposure to new things/serendipity" as the most important quality for a music service [84]. But how do users perceive these recommendations? Studies do not provide a clear answer to that question, mainly because the perception seems to vary from user to user. Participants in a study comparing YouTube and Spotify had an overall positive perception of the music discovery functionalities of both services [90]. Results from a large-scale study by Avdeeff [3] highlight one interesting advantage of system recommendations from the user's perspective: for younger people who often perceive music genres as being confusing, the suggestions YouTube provides represent a useful alternative for discovery. Another perceived advantage lies in the fact that system recommendations reduce the load for users by assisting them in digital curation tasks. Indeed, among the heavy music service users interviewed by Hagen [50], many considered that the radio function music services offer was a useful way to expand a playlist, which can sometimes lead to further exploration. Other users, however, were not as enthusiastic about the radio function, which had played songs they disliked or made recommendations they did not understand. But the main criticisms targeted the lack of novelty and true exploration, which prompted one of Johansson's participants to say that she felt "stuck in [her] own circles" [61]. Likewise, Kjus found that users "lose interest in the large databases after a period of initial fascination," a symptom he attributes to the inability of recommender systems to lead users to long-tail items [69]. Along the same lines, several studies reveal a general lack of trust towards music recommender systems. Lee and Price [85] found that most users want more transparency: they want the systems to explain the recommendations. For some users, distrust came with the realization that some services infuse their own commercial interests into their algorithms, for instance when Spotify partnered with artists and labels [69, 97]. Privacy issues also contribute to mistrust among some users [85, 61]. In [61], Johansson reports on the strong negative reaction of some participants regarding Spotify sharing its users' activities on Facebook8 as a default setting following a partnership deal with the social networking site.

5.2.2 Users' engagement with music recommender systems

Many studies have focused, voluntarily or not, on avid music listeners. Recent large-scale studies and smaller qualitative studies with more diversified samples have
8 https://www.facebook.com
uncovered a wider array of engagement practices with music streaming services. In [46], the researchers used mixed methods to study music service users. Their analysis resulted in seven personas with various levels of music expertise and involvement with music systems, including the "Guided Listener" who "wants to engage with a music streaming service minimally." Indeed, user studies demonstrate that many users wish to be able to listen to music continuously, without interacting much with the system. Johansson [61] reports on young users who feel "lazy" or "less active" and who do not want to make the effort of creating playlists or browsing to find music to listen to. Hagen [50] also notes that users' engagement varies considerably. This behavior may reflect the abundance (or overabundance) of choices in music services. Having access to such large music collections can seem exhilarating at first. But it can be intimidating to some. Users have expressed feeling "stressed" or "overwhelmed" by the number of items to choose from, or have used the term "drowning" to refer to how they felt [61, 69], a problem known in psychology as choice overload. Strategies for dealing with choice overload include withdrawing/escaping or surrendering the selection to someone else [59]. In this context, for less engaged users, recommender systems become their gateway to music, which is why it seems important to take the needs of these users into consideration in the design of music recommender systems.
6 User-centric evaluation of music recommender systems Most evaluation approaches in current music recommender systems research and recommender systems research in general focus on quantitative measures, both geared towards retrieval accuracy (e. g., precision, recall, F-measure, or NDCG) and qualities beyond pure accuracy that cover further factors of perceived recommendation quality (e. g., spread, coverage, diversity, or novelty) [120]. While such beyond-accuracy measures aim to quantify and objectively assess parameters desired by users, they cannot fully capture actual user satisfaction with a recommender system. For instance, while operational measures have been defined to quantify serendipity and diversity, they can only capture certain criteria. Serendipity and diversity as perceived by the user, however, can differ substantially from these measures, since they are highly subjective concepts [130]. Thus, despite the advantages of facilitating automation of evaluation and reproducibility of results, limiting recommender systems evaluation to such quantitative measures means to forgo essential factors related to UX. Hence, in order to improve user awareness in music recommender systems, it is essential to incorporate evaluation strategies that consider factors of UX. To overcome the tendency to measure aspects such as user satisfaction or user engagement [51, 88] individually, evaluation frameworks for recommender systems aim at providing a
more holistic view. One such framework is the ResQue model by Pu et al. [107], which proposes to evaluate the perceived qualities of recommender systems according to 15 constructs pertaining to four evaluation layers. The first layer deals with perceived system qualities and aims at evaluating aspects of recommendation quality (namely accuracy, novelty, and diversity), interaction adequacy, interface adequacy, information sufficiency, and explicability. The second layer deals with beliefs, evaluating perceived ease of use, control, transparency, and perceived usefulness. Layers three and four deal with attitudes (overall satisfaction and confidence and trust) and behavioral intentions (use intentions and purchase intentions), respectively. These constructs are evaluated using questionnaires consisting of up to 32 questions to be rated on a Likert scale. Another evaluation framework is presented by Knijnenburg et al. [73]. In contrast to the model by Pu et al., which focuses on the outcome experience of recommender systems, Knijnenburg et al. aim at providing insight into the relationships of six constructs that impact UX: objective system aspects (OSA), subjective system aspects (SSA), subjective user experience (EXP), objective interaction (INT), and personal and situational characteristics (PC and SC). This includes considerations of users' domain expertise or concerns for privacy, which are not reflected in the model by Pu et al. In particular, Knijnenburg et al. explicitly link INT, i. e., observable user behavior such as clicks, to OSA, i. e., unbiased factors such as response time or user interface parameters, through several subjective constructs, i. e., SSA (momentary, primary evaluative feelings evoked during interaction with the system [52]) and EXP (the user's attitude towards the system), and argue that EXP is caused by OSA (through SSA) and PC (e. g., user demographics, knowledge, or perceived control) and/or SC (e. g., interaction context or situation-specific trust or privacy concerns). To assess subjective factors, such as perceived qualities of recommendations and system effectiveness, variety, choice satisfaction, or intention to provide feedback, Knijnenburg et al. also propose a questionnaire. Both frameworks can be applied to music recommender systems; however, neither is specifically designed for modeling the processes particular to these systems. Therefore, some of the assumptions do not hold for the requirements of today's music recommender systems. For instance, the ResQue model by Pu et al. has a strong focus on commercial usage intentions, which, in music recommender systems, are far more complex than the mere goal of using the system or intending to purchase (cf. Section 5). In particular, with flat-rate streaming subscriptions being the dominant business model, selecting items does not have the same significance as a purchase. On a higher level, the commercial usage intention could be translated into or interpreted as continued subscription or monthly renewal of the service. The model by Knijnenburg et al. can be adapted more easily, as it was built around multimedia recommender systems, therefore offering more flexibility to incorporate the factors discussed in Section 2 and detailed by example throughout this chapter. For instance, the INT aspects in the model can be adapted to refer to typical observable behavior of the user, like favoring a song or adding it to a playlist, while PC aspects
should reflect psychological factors, including affect and personality (as described in Sections 4.1 and 3, respectively), social influence, musical training and experience, and physiological condition, to mention a few more examples. SC, on the other hand, should be adapted to the particularities of music listening context and situations. To get a better understanding of the various evaluation needs in the specific scenario of music recommendation and tailor new strategies, researchers in music recommender systems should therefore increasingly resort to the findings of user-centric evaluations, in-situ interviews, and ethnographic studies; cf. [5, 19, 32, 58, 64, 81, 82, 84, 86, 125].
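To make the beyond-accuracy measures mentioned at the beginning of this section concrete, the following sketch computes two simple operationalizations, catalog coverage and intra-list (genre) diversity, on hypothetical recommendation lists. As argued above, such operational definitions only approximate the serendipity and diversity actually perceived by users.

```python
from itertools import combinations

# Hypothetical recommendation lists per user, a toy catalog, and genre labels.
recommendations = {"u1": ["t1", "t2", "t3"], "u2": ["t2", "t4", "t5"]}
catalog = {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"}
genres = {
    "t1": {"rock"}, "t2": {"rock", "pop"}, "t3": {"jazz"},
    "t4": {"pop"}, "t5": {"electronic"},
}

def catalog_coverage(recommendations, catalog):
    """Share of the catalog appearing in at least one recommendation list."""
    recommended = set().union(*recommendations.values())
    return len(recommended & catalog) / len(catalog)

def intra_list_diversity(items):
    """Mean pairwise Jaccard distance between the genre sets of a list's items."""
    pairs = list(combinations(items, 2))
    distances = [
        1 - len(genres[a] & genres[b]) / len(genres[a] | genres[b])
        for a, b in pairs
    ]
    return sum(distances) / len(pairs)

print(catalog_coverage(recommendations, catalog))   # 0.625
print(intra_list_diversity(recommendations["u1"]))  # about 0.83
```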
7 Conclusions

In this chapter, we have identified different factors relevant for making music recommendations, namely intrinsic aspects, external factors, and goals of both users and items. Focusing on the listener's aspects, we have highlighted exemplary technical approaches to model these factors and exploit knowledge about them for recommendation scenarios. Regarding intrinsic aspects, i. e., the listener's background, we have reviewed work investigating the connection between a user's demographic information, personality traits, or cultural background on the one hand, and musical preference, interest in diversity, and browsing preference on the other. While correlations between different user characteristics and music consumption behavior could be uncovered, it has not yet been explored, owing to a lack of comprehensive data, whether these findings are consistent across different experiments. Hence, it needs to be investigated whether these or other interactions emerge when evaluating several indicators simultaneously in a larger study and with the same subjects. Exploiting more diverse data sources, such as social media and other online activities, should give a more holistic picture of the user. In practice, the resulting challenge is to connect and match user activities across different platforms. While single sign-on options have facilitated the tracing of individuals across several services for the syndicated platforms, impartial academic research does not have comparable means. It remains, however, at least ethically questionable to what extent such profiling of users is justifiable and necessary on the premise of providing improved UXs, specifically music listening experiences. Similar considerations apply to modeling external factors, i. e., the listener's context. We have reviewed work dealing with estimating the current emotional state of listeners and connecting this to the emotions estimated to be conveyed by a piece, as well as work dealing with estimating the listener's situational context based on sensor data. Both aspects can be highly dynamic, and obtaining a ground truth and a general basis for repeatable experiments is challenging, if not elusive. Explicit assessments of mood and context require introspection and reflection
by the user and might be considered intrusive. On the other hand, adopting a purely data-driven approach and exploring a wealth of accessible logging data to uncover latent patterns without proper means of validation might give rise to false assumptions and models and further raise issues concerning privacy. To gain a better understanding of listeners' intents, we have reviewed work dealing with music information behavior to examine how people discover music in their daily life and how users perceive system recommendations in music streaming services. The key role friends and family play in discovering music suggests that people highly value the trustworthiness of the source of a music recommendation. Indeed, one of the most important criticisms leveled at music recommender systems is their lack of transparency and a breach of trust that comes from the role some services have taken in the promotion of specific artists. This impression contributes to the perception that music recommender systems are not the independent discovery tools they purport to be. In terms of system design, this means that music recommender systems should pay close attention to building and maintaining the trust of their users, for instance by providing explanations to users as to why items are being recommended to them and by clearly identifying promotional recommendations. Furthermore, music information behavior studies reveal the prevalence of passive information behavior and serendipitous encounters in discovering music in daily life. Along the same lines, the review of studies on users' perceptions of and experience with music streaming services shows that users have various expectations regarding how much they want to engage with music recommender systems. Although certain users, especially highly devoted music fans, are willing to spend time actively engaging with a system to keep a high level of control, others feel overwhelmed by the millions of tracks available in a music service and prefer ceding a larger part of the control to the system in order to interact only minimally with it. To cater to all users, user-centered music recommender systems should therefore let the users decide how much control they want to surrender to the system. To conclude, we believe that a deepened understanding of the different factors of both user and music and their interplay is the key to improved music recommendation services and listening experiences. User awareness is therefore an essential aspect for adapting and balancing systems between exploitation and exploration settings, for not only identifying the "right music at the right time" but also helping to discover new artists and styles, deepen knowledge, refine tastes, and broaden horizons, and, more generally, for acting as a catalyst that enables people to enjoy listening to music.
References [1]
X. Amatriain and J. Basilico. Recommender systems in industry: A netflix case study. In F. Ricci, L. Rokach, and B. Shapira, editors, Recommender Systems Handbook, pages 385–419. Springer US, Boston, MA, 2015.
[2]
[3] [4]
[5]
[6]
[7] [8]
[9]
[10] [11] [12]
[13]
[14]
[15] [16]
[17]
[18]
[19]
I. Andjelkovic, D. Parra, and J. O’Donovan. Moodplay: Interactive mood-based music discovery and recommendation. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, UMAP ’16, pages 275–279. ACM, New York, NY, USA, 2016. M. Avdeeff. Technological engagement and musical eclecticism: An examination of contemporary listening practices. 2012. L. Baltrunas, M. Kaminskas, B. Ludwig, O. Moling, F. Ricci, K.-H. Lüke, and R. Schwaiger. InCarMusic: Context-Aware Music Recommendations in a Car. In International Conference on Electronic Commerce and Web Technologies (EC-Web), Toulouse, France, 2011. L. Barrington, R. Oda, and G. Lanckriet. Smarter than genius? human evaluation of music recommender systems. In Proceedings of the 10th International Society for Music Information Retrieval Conference, ISMIR’09, Nara, Japan, 2009. M. Barthet, G. Fazekas, and M. Sandler. Multidisciplinary perspectives on music emotion recognition: Implications for content and context-based models. In Proceedings of International Symposium on Computer Music Modelling and Retrieval, pages 492–507, 2012. C. Bauer and A. Novotny. A consolidated view of context for intelligent systems. Journal of Ambient Intelligence and Smart Environments, 9(4):377–393, 2017. J. T. Biehl, P. D. Adamczyk, and B. P. Bailey. DJogger: A Mobile Dynamic Music Device. In Proceedings of the 24th Annual ACM SIGCHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI EA), Montréal, QC, Canada, 2006. D. Boer and R. Fischer. Towards a holistic model of functions of music listening across cultures: A culturally decentred qualitative approach. Psychology of Music, 40(2):179–200, 2010. G. Bonnin and D. Jannach. Automated generation of music playlists: Survey and experiments. ACM Computing Surveys (CSUR), 47(2):26, 2015. D. O. Case. Looking for information: a survey of research on information seeking needs, and behavior. Emerald Group Publishing, Bingley, UK, 3rd edition, 2012. M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. Content-based music information retrieval: Current directions and future challenges. Proceedings of the IEEE, 96(4):668–696, 2008. T. Cebrián, M. Planagumà, P. Villegas, and X. Amatriain. Music Recommendations with Temporal Context Awareness. In Proceedings of the 4th ACM Conference on Recommender Systems (RecSys), Barcelona, Spain, 2010. F. Celli, E. Bruni, and B. Lepri. Automatic personality and interaction style recognition from facebook profile pictures. In Proceedings of the 22nd ACM international conference on Multimedia, pages 1101–1104. ACM, 2014. Ò. Celma. Music Recommendation and Discovery – The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer, Berlin, Heidelberg, Germany, 2010. Ò. Celma and P. Herrera. A New Approach to Evaluating Novel Recommendations. In Proceedings of the 2nd ACM Conference on Recommender Systems (RecSys), Lausanne, Switzerland, 2008. C.-W. Chen, P. Lamere, M. Schedl, and H. Zamani. RecSys Challenge 2018: Automatic Music Playlist Continuation. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys ’18, pages 527–528. ACM, New York, NY, USA, 2018. S. Cunningham, S. Caulder, and V. Grout. Saturday Night or Fever? Context-Aware Music Playlists. In Proceedings of the 3rd International Audio Mostly Conference: Sound in Motion, Piteå, Sweden, 2008. S. J. Cunningham, D. Bainbridge, and D. Mckay. Finding new music: A diary study of everyday encounters with novel songs. 
In Proceedings of the 8th International Conference on Music Information Retrieval, Vienna, Austria, pages 83–88, September 23–27 2007.
[20]
[21]
[22] [23]
[24]
[25]
[26]
[27]
[28]
[29]
[30] [31]
[32]
[33] [34]
[35]
[36]
S. J. Cunningham and D. M. Nichols. Exploring Social Music Behaviour: An Investigation of Music Selection at Parties. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009), Kobe, Japan, October 2009. R. de Oliveira and N. Oliver. TripleBeat: Enhancing Exercise Performance with Persuasion. In Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services (Mobile CHI), Amsterdam, the Netherlands, 2008. A. K. Dey. Understanding and using context. Personal and Ubiquitous Computing, 5(1):4–7, 2001. L. Dey, M. U. Asad, N. Afroz, and R. P. D. Nath. Emotion extraction from real time chat messenger. In 2014 International Conference on Informatics, Electronics Vision (ICIEV), pages 1–5, May 2014. J. Donaldson. A hybrid social-acoustic recommendation system for popular music. In Proceedings of the ACM Conference on Recommender Systems (RecSys), Minneapolis, MN, USA, 2007. S. Dornbush, J. English, T. Oates, Z. Segall, and A. Joshi. XPod: A Human Activity Aware Learning Mobile Music Player. In 20th International Joint Conference on Artificial Intelligence (IJCAI): Proceedings of the 2nd Workshop on Artificial Intelligence Techniques for Ambient Intelligence, Hyderabad, India, 2007. G. Dror, N. Koenigstein, and Y. Koren. Yahoo! Music Recommendations: Modeling Music Ratings with Temporal Dynamics and Item Taxonomy. In Proceedings of the 5th ACM Conference on Recommender Systems (RecSys), Chicago, IL, USA, 2011. S. Ebrahimi Kahou, V. Michalski, K. Konda, R. Memisevic, and C. Pal. Recurrent neural networks for emotion recognition in video. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI ’15, pages 467–474. ACM, New York, NY, USA, 2015. G. T. Elliott and B. Tomlinson. PersonalSoundtrack: Context-aware playlists that adapt to user pace. In Proceedings of the 24th Annual ACM SIGCHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI EA), Montréal, QC, Canada, 2006. M. Erdal, M. Kächele, and F. Schwenker. Emotion recognition in speech with deep learning architectures. In F. Schwenker, H. M. Abbas, N. El Gayar, and E. Trentin, editors, Proceedings of Artificial Neural Networks in Pattern Recognition: 7th IAPR TC3 Workshop, pages 298–311. Springer International Publishing, 2016. B. Ferwerda and M. Graus. Predicting musical sophistication from music listening behaviors: A preliminary study. arXiv preprint arXiv:1808.07314, 2018. B. Ferwerda, M. Graus, A. Vall, M. Tkalčič, and M. Schedl. The influence of users’ personality traits on satisfaction and attractiveness of diversified recommendation lists. In 4 th Workshop on Emotions and Personality in Personalized Systems (EMPIRE) 2016, page 43, 2016. B. Ferwerda, M. P. Graus, A. Vall, M. Tkalcic, and M. Schedl. How item discovery enabled by diversity leads to increased recommendation list attractiveness. In Proceedings of the Symposium on Applied Computing, pages 1693–1696. ACM, 2017. B. Ferwerda and M. Schedl. Enhancing music recommender systems with personality information and emotional states: A proposal. In UMAP Workshops, 2014. B. Ferwerda and M. Schedl. Investigating the relationship between diversity in music consumption behavior and cultural dimensions: A cross-country analysis. In UMAP (Extended Proceedings), 2016. B. Ferwerda and M. Schedl. Personality-based user modeling for music recommender systems. 
In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 254–257. Springer, 2016. B. Ferwerda, M. Schedl, and M. Tkalcic. Predicting personality traits with Instagram pictures.
[37]
[38]
[39]
[40] [41]
[42]
[43]
[44]
[45] [46]
[47]
[48]
[49] [50] [51]
[52]
[53]
In Proceedings of the 3rd Workshop on Emotions and Personality in Personalized Systems 2015, pages 7–10. ACM, 2015. B. Ferwerda, M. Schedl, and M. Tkalcic. Using Instagram picture features to predict users’ personality. In International Conference on Multimedia Modeling, pages 850–861. Springer, 2016. B. Ferwerda, M. Schedl, and M. Tkalčič. Personality & Emotional States: Understanding Users’ Music Listening Needs. In Extended Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization (UMAP), Dublin, Ireland, June–July 2015. B. Ferwerda and M. Tkalcic. Predicting users’ personality from Instagram pictures: Using visual and/or content features? In The 26th Conference on User Modeling, Adaptation and Personalization, Singapore, 2018. B. Ferwerda and M. Tkalcic. You are what you post: What the content of Instagram pictures tells about users’ personality. In The 23rd International on Intelligent User Interfaces, 2018. B. Ferwerda, M. Tkalčič, and M. Schedl. Personality traits and music genre preferences: How music taste varies over age groups. In Proceedings of the 1st Workshop on Temporal Reasoning in Recommender Systems (RecTemp) at the 11th ACM Conference on Recommender Systems, Como, August 31, 2017. B. Ferwerda, M. Tkalčič, and M. Schedl. Personality traits and music genres: What do people prefer to listen to? In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pages 285–288. ACM, 2017. B. Ferwerda, A. Vall, M. Tkalčič, and M. Schedl. Exploring music diversity needs across countries. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, pages 287–288. ACM, 2016. B. Ferwerda, E. Yang, M. Schedl, and M. Tkalčič. Personality traits predict music taxonomy preferences. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, pages 2241–2246. ACM, 2015. A. Foster. Serendipity and information seeking: an empirical study. Journal of Documentation, 59(3):321–340, 2003. J. Fuller, L. Hubener, Y.-S. Kim, and J. H. Lee. Elucidating user behavior in music services through persona and gender. In Proceedings of the 17th International Society for Music Information Retrieval Conference, pages 626–632, 2016. M. Gillhofer and M. Schedl. Iron Maiden While Jogging, Debussy for Dinner? – An Analysis of Music Listening Behavior in Context. In Proceedings of the 21st International Conference on MultiMedia Modeling (MMM), Sydney, Australia, January 2015. J. Golbeck, C. Robles, M. Edmondson, and K. Turner. Predicting personality from Twitter. In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third Inernational Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, pages 149–156. IEEE, 2011. J. Golbeck, C. Robles, and K. Turner. Predicting personality with social media. In CHI’11 extended abstracts on human factors in computing systems, pages 253–262. ACM, 2011. A. N. Hagen. The playlist experience: Personal playlists in music streaming services. Popular Music and Society, 38(5):625–645, 2015. J. Hart, A. G. Sutcliffe, and A. di Angeli. Evaluating user engagement theory. In CHI Conference on Human Factors in Computing Systems, May 2012. Paper presented in Workshop ‘Theories behind UX Research and How They Are Used in Practice’, 6 May 2012. M. Hassenzahl. The thing and i: Understanding the relationship between user and product. In M. A. Blythe, K. Overbeeke, A. F. Monk, and P. C. 
Wright, editors, Funology: From Usability to Enjoyment, pages 31–42. Springer Netherlands, Dordrecht, 2005. K. Hevner. Expression in Music: A Discussion of Experimental Studies and Theories.
[54] [55]
[56] [57] [58]
[59] [60]
[61] [62] [63] [64]
[65] [66]
[67]
[68]
[69] [70]
[71] [72]
Psychological Review, 42, March 1935. G. Hofstede, G. J. Hofstede, and M. Minkov. Cultures and Organizations: Software of the Mind. McGraw-Hill, New York, NY, United States, 3rd edition, 2010. J. Hong, W.-S. Hwang, J.-H. Kim, and S.-W. Kim. Context-aware music recommendation in mobile smart devices. In Proceedings of the 29th Annual ACM Symposium on Applied Computing, SAC ’14, pages 1463–1468. ACM, New York, NY, USA, 2014. Y. Hu, Y. Koren, and C. Volinsky. Collaborative Filtering for Implicit Feedback Datasets. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), Pisa, Italy, 2008. A. Huq, J. Bello, and R. Rowe. Automated Music Emotion Recognition: A Systematic Evaluation. Journal of New Music Research, 39(3):227–244, November 2010. G. B. Iman Kamehkhosh and Dietmar Jannach. How Automated Recommendations Affect the Playlist Creation Behavior of Users. In Joint Proceedings of the 23rd ACM Conference on Intelligent User Interfaces (ACM IUI 2018) Workshops: Intelligent Music Interfaces for Listening and Creation (MILC), Tokyo, Japan, March 2018. S. S. Iyengar and M. R. Lepper. When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79(6):995–1006, 2000. D. Jannach, L. Lerche, and I. Kamehkhosh. Beyond “hitting the hits”: Generating coherent music playlist continuations with the right tracks. In Proceedings of the 9th ACM Conference on Recommender Systems, RecSys ’15, pages 187–194. ACM, New York, NY, USA, 2015. S. Johansson. Music as part of connectivity culture. Taylor & Francis Group, Milton, United Kingdom, 2018. O. P. John, E. M. Donahue, and R. L. Kentle. The big five inventory—versions 4a and 54, 1991. P. N. Juslin. What Does Music Express? Basic Emotions and Beyond. Frontiers in Psychology, 4(596), 2013. I. Kamehkhosh and D. Jannach. User perception of next-track music recommendations. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, UMAP ’17, pages 113–121. ACM, New York, NY, USA, 2017. M. Kaminskas and F. Ricci. Contextual music information retrieval and recommendation: State of the art and challenges. Computer Science Review, 6(2):89–119, 2012. M. Kaminskas, F. Ricci, and M. Schedl. Location-aware Music Recommendation Using Auto-Tagging and Hybrid Matching. In Proceedings of the 7th ACM Conference on Recommender Systems (RecSys), Hong Kong, China, October 2013. H. Kaya, F. Görpinar, and A. A. Salah. Video-based emotion recognition in the wild using deep transfer learning and score fusion. Image and Vision Computing, 65:66–75, 2017. Multimodal Sentiment Analysis and Mining in the Wild Image and Vision Computing. Y. E. Kim, E. M. Schmidt, R. Migneco, B. G. Morton, P. Richardson, J. Scott, J. Speck, and D. Turnbull. Music emotion recognition: A state of the art review. In Proceedings of the International Society for Music Information Retrieval Conference, 2010. Y. Kjus. Musical exploration via streaming services: The Norwegian experience. Popular Communication, 14(3):127–136, 2016. P. Knees and M. Schedl. A survey of music similarity and recommendation from music context data. ACM Transactions on Multimedia Computing, Communications, and Applications, 10(1):2:1–2:21, Dec. 2013. P. Knees and M. Schedl. Music Similarity and Retrieval – An Introduction to Audio- and Web-based Strategies, volume 36 of The Information Retrieval Series. Springer, 2016. P. Knees and G. Widmer. Searching for music using natural language queries and relevance feedback. In N. Boujemaa, M. 
Detyniecki, and A. Nürnberger, editors, Adaptive Multimedia Retrieval: Retrieval, User, and Semantics, pages 109–121. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
[73]
[74] [75]
[76] [77]
[78] [79] [80]
[81]
[82]
[83] [84]
[85]
[86]
[87]
[88]
[89] [90] [91]
B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, and C. Newell. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22(4–5):441–504, 2012. V. J. Konecni. Social interaction and musical preference. In The psychology of music, pages 497–516, 1982. Y. Koren. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Las Vegas, NV, USA, 2008. T. Krismayer, M. Schedl, P. Knees, and R. Rabiser. Predicting user demographics from music listening information. Multimedia Tools and Applications, 78(3):2897–2920, Feb. 2019. F.-F. Kuo, M.-F. Chiang, M.-K. Shan, and S.-Y. Lee. Emotion-based music recommendation by association discovery from film music. In Proceedings of the 13th annual ACM international conference on Multimedia, pages 507–510. ACM, 2005. J. R. Kwapisz, G. M. Weiss, and S. A. Moore. Activity recognition using cell phone accelerometers. SIGKDD Explor. Newsl., 12(2):74–82, Mar. 2011. A. Laplante. Everyday life music information-seeking behaviour of young adults: An exploratory study. Doctoral dissertation, 2008. A. Laplante. Who influence the music tastes of adolescents? a study on interpersonal influence in social networks. In Proceedings of the 2nd International ACM Workshop on Music information retrieval with user-centered and multimodal strategies (MIRUM), pages 37–42, 2012. A. Laplante. Improving music recommender systems: What we can learn from research on music tastes? In 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan, October 2014. A. Laplante and J. S. Downie. Everyday life music information-seeking behaviour of young adults. In Proceedings of the 7th International Conference on Music Information Retrieval, Victoria (BC), Canada, October 8–12 2006. A. Lay and B. Ferwerda. Predicting users’ personality based on their ‘liked’ images on Instagram. In The 23rd International on Intelligent User Interfaces, 2018. J. H. Lee, H. Cho, and Y.-S. Kim. Users’ music information needs and behaviors: Design implications for music information retrieval systems. Journal of the Association for Information Science and Technology, 67(6):1301–1330, 2016. J. H. Lee and R. Price. Understanding users of commercial music services through personas: Design implications. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), pages 476–482, 2015. J. H. Lee, R. Wishkoski, L. Aase, P. Meas, and C. Hubbles. Understanding users of cloud music services: Selection factors, management and access behavior, and perceptions. Journal of the Association for Information Science and Technology, 68(5):1186–1200, 2017. J. S. Lee and J. C. Lee. Context Awareness by Case-Based Reasoning in a Music Recommendation System. In H. Ichikawa, W.-D. Cho, I. Satoh, and H. Youn, editors, Ubiquitous Computing Systems, volume 4836 of LNCS. Springer, 2007. J. Lehmann, M. Lalmas, E. Yom-Tov, and G. Dupret. Models of user engagement. In Proceedings of the 20th International Conference on User Modeling, Adaptation, and Personalization, UMAP’12, pages 164–175. Springer-Verlag, Berlin, Heidelberg, 2012. H. Li and M. Trocan. Deep learning of smartphone sensor data for personal health assistance. Microelectronics Journal, 2018. L. A. Liikkanen and P. Åman. Shuffling services: Current trends in interacting with digital music. 
Interacting with Computers, 28(3):352–371, 2016. M. Liu, X. Hu, and M. Schedl. Artist preferences and cultural, socio-economic distances across
[92] [93] [94]
[95] [96]
[97] [98] [99] [100] [101]
[102]
[103]
[104]
[105]
[106] [107]
[108]
[109] [110]
countries: A big data perspective. In The 18th International Society for Music Information Retrieval Conference, Suzhou, China, October 23-27, 2017. M. Liu, X. Hu, and M. Schedl. The relation of culture, socio-economics, and friendship to music preferences: A large-scale, cross-country study. PLOS ONE, 13(12):e0208186, 2018. A. J. Lonsdale and A. C. North. Why do we listen to music? A uses and gratifications analysis. British Journal of Psychology, 102(1):108–134, February 2011. B. McFee and G. Lanckriet. Hypergraph Models of Playlist Dialects. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), Porto, Portugal, 2012. A. P. Merriam and V. Merriam. Uses and functions. In The Anthropology of Music, chapter 11. Northwestern University Press, 1964. B. Moens, L. van Noorden, and M. Leman. D-Jogger: Syncing Music with Walking. In Proceedings of the 7th Sound and Music Computing Conference (SMC), Barcelona, Spain, 2010. J. W. Morris and D. Powers. Control, curation and musical experience in streaming music services. Creative Industries Journal, 8(2):106–122, 2015. F. Noroozi, M. Marjanovic, A. Njegus, S. Escalera, and G. Anbarjafari. Audio-visual emotion recognition in video clips. IEEE Transactions on Affective Computing, 10(1):60–75, 2019. A. C. North and D. J. Hargreaves. Situational influences on reported musical preference. Psychomusicology: A Journal of Research in Music Cognition, 15(1–2):30, 1996. A. C. North, D. J. Hargreaves, and J. J. Hargreaves. Uses of music in everyday life. Music Perception: An Interdisciplinary Journal, 22(1):41–77, 2004. K. O’Hara and B. Brown, editors. Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies, volume 35 of Computer Supported Cooperative Work. Springer Netherlands, 2006. K. Okada, B. F. Karlsson, L. Sardinha, and T. Noleto. Contextplayer: Learning contextual music preferences for situational recommendations. In SIGGRAPH Asia 2013 Symposium on Mobile Graphics and Interactive Applications, SA ’13, pages 6:1–6:7. ACM, New York, NY, USA, 2013. S. Oramas, V. C. Ostuni, T. D. Noia, X. Serra, and E. D. Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology, 8(2):21:1–21:21, Oct. 2016. E. Pampalk, T. Pohle, and G. Widmer. Dynamic Playlist Generation Based on Skipping Behavior. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR), London, UK, 2005. H.-S. Park, J.-O. Yoo, and S.-B. Cho. A context-aware music recommendation system using fuzzy Bayesian networks with utility theory. In Proceedings of the 3rd International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Xi’an, China, 2006. T. Pettijohn, G. Williams, and T. Carter. Music for the seasons: Seasonal music preferences in college students. Current Psychology, 1–18, 2010. P. Pu, L. Chen, and R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys ’11, pages 157–164. ACM, New York, NY, USA, 2011. D. Quercia, M. Kosinski, D. Stillwell, and J. Crowcroft. Our Twitter profiles, our selves: Predicting personality with Twitter. In Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third Inernational Conference on Social Computing (SocialCom), 2011 IEEE Third International Conference on, pages 180–185. IEEE, 2011. P. J. Rentfrow. The role of music in everyday life: Current directions in the social psychology of music. 
Social and personality psychology compass, 6(5):402–416, 2012. P. J. Rentfrow and S. D. Gosling. The do re mi’s of everyday life: The structure and personality
[111] [112]
[113] [114]
[115]
[116] [117]
[118]
[119]
[120]
[121] [122]
[123]
[124] [125]
[126]
Julio Abascal, Olatz Arbelaitz, Xabier Gardeazabal, Javier Muguerza, J. Eduardo Pérez, Xabier Valencia, and Ainhoa Yera
10 Personalizing the user interface for people with disabilities
Abstract: Computer applications, and especially the Internet, provide many people with disabilities with unique opportunities for interpersonal communication, social interaction, and active participation (including access to labor and entertainment). Nevertheless, rigid user interfaces often present accessibility barriers to people with physical, sensory, or cognitive impairments. Accordingly, user interface personalization is crucial to overcome these barriers, allowing a considerable section of the population with disabilities to have computer access. Adapting the user interface to people with disabilities requires taking into consideration their physical, sensory, or cognitive abilities and restrictions and then providing alternative access procedures according to their capacities. This chapter presents methods and techniques that are applied to research and practice on user interface personalization for people with disabilities and discusses possible approaches for diverse application fields where personalization is required: accessibility to the web using transcoding, web mining for eGovernment, and human–robot interaction for people with severe motor restrictions.

Keywords: digital accessibility, modeling users with disabilities, data mining and Machine Learning for user modeling, comprehensive user modeling, assistive User-Adapted Interaction
1 Introduction

A great number of common human activities can currently be carried out with computers. People with disabilities, who were previously not able to physically perform these
activities, can now carry them out through digital applications, which provide ways to
overcome specific motor, sensory, or intellectual barriers. Nevertheless, most common interfaces present accessibility barriers that hinder access for numerous users with
disabilities. Users with restrictions would especially benefit from personalized applications which adapt access procedures to the specific characteristics of each person.
Acknowledgement: This work was partially supported by the Ministry of Economy and Competitiveness of the Spanish Government and the European Regional Development Fund (ERDF) (PhysComp project, TIN2017-85409-P). The authors are members of the ADIAN research team, supported by the Basque Government, Department of Education, Universities and Research under grant IT980-16. Xabier Gardeazabal, J. Eduardo Pérez, and Ainhoa Yera hold PhD scholarships from the University of the Basque Country (UPV/EHU) (respective codes: PIF16/212, PIF13/248, PIF15/143). https://doi.org/10.1515/9783110552485-010
In this way personalization would help to overcome accessibility barriers, enabling access to computers for a large population with restrictions.

User-Adapted Interaction is a discipline with a long tradition of interface personalization in which adaptation is primarily supported by User Modeling. User models contain abstract representations of user properties that are relevant for the interaction [60]. These properties include user needs, preferences, knowledge, etc. Physical, cognitive, and behavioral characteristics can also be included. Current values of static and dynamic characteristics, observable through the interaction, can be used to personalize the interface. When it comes to applying these techniques to personalize the interfaces used by people with disabilities, there are two essential requisites:
1. The main characteristics of the initial interface (such as content, presentation, navigation, or behavior) must be adjustable so that they can match user restrictions and needs.
2. The user model must contain information about the specific abilities and restrictions of the user.

Merely adjusting general-purpose personalization systems (extending the user model with fields that record potential disabilities) is not enough. Personalized user interfaces for accessibility have to meet the specific characteristics of specific users, taking advantage of the abilities they do have and overcoming their physical, sensory, or cognitive restrictions.

Personalization requires thorough consideration of the interface parameters and features that can be adapted. Areas of possible interface adaptation to enhance accessibility include presentation (e. g., "place important areas of content near the top of the page"), content (e. g., "incorporate specific scrolling icons on each page"), navigation (e. g., "create a table of contents for the website"), and behavior (e. g., "avoid automatic page updating"), as set out by several seminal works [50, 18, 48].

The wide range of disabilities and the great variability of user characteristics, even among users who have similar disabilities, hinder the application of approaches focused on specific restrictions. For this reason, advanced personalized interaction systems are more oriented to the abilities of the user. These abilities can be similar across diverse disabilities. In order to find common interaction barriers and shared interaction patterns not limited to specific disabilities, data mining and Machine Learning methods can be used. These methods can help to detect the most convenient interaction procedures for each user, avoiding preconceived schemes.

Many people with disabilities use specific devices and applications (known as assistive technology) in order to gain access to computers, whether these are commercial or custom-made [25]. These devices are extremely diverse and can strongly condition the adaptation of the interface. They include (i) special switches, keyboards, and pointing devices, (ii) software such as screen readers and communication programs,
and (iii) devices such as eye gaze and head trackers. Therefore, user interface personalization methods have to devote specific consideration to the assistive technology, as pointed out by [53] and empirically proved by [84, 77].

In addition to analyzing the technical and human requisites for valid and efficient modeling and adaptation, specific areas of interaction are introduced in this chapter in order to discuss how diverse needs can be collected and included in the design to create interfaces that enable users with disabilities to overcome interaction barriers. As illustrative application cases, we focus on three diverse areas of interaction:
1. Transcoding to personalize access to the web. Starting from the specific characteristics of a particular user (obtained from their user model), inaccessible webpages are automatically recoded applying a number of accessibility techniques.
2. Accessing eGovernment sites, which are crucial for helping people with disabilities to exercise their right to interact with the administration. Since access to administration sites is usually anonymous, web mining techniques are applied to extract patterns that allow some degree of personalization.
3. Interfaces for Augmented and Alternative Manipulation. Based on human–robot interaction techniques, these interfaces require very distinctive interaction modes that will be essential in the near future to provide autonomy to people with severe motor restrictions. Personalization in Augmented and Alternative Manipulation is essential because users cannot produce all the details required to control a robot and, therefore, a significant part of the interaction must be deduced from the user, the task, and the context models.

Even if personalization provides great opportunities to users with physical, sensory, or cognitive impairments, it also raises additional challenges relating to privacy, data acquisition, and modeling, which are briefly discussed at the end of the chapter.
2 User modeling and personalized interaction for people with disabilities

Personalization of user interfaces for people with disabilities usually involves both customization and adaptation. According to Norman and Draper [61], a system is adaptable when the user manually changes a number of parameters to customize its appearance, behavior, or functionality. Adaptive systems, however, automatically perform adaptations without direct user intervention. For people with disabilities, customization is a first step to adapting the interface to their needs and preferences, although this task is not always easy and may sometimes require expert support. In addition to permanent impairments, people with disabilities may experience significant variations in their capabilities over short periods of time. For this reason, digital systems which are able to dynamically adapt themselves to the users' changing conditions
without inconveniencing them are especially appealing. These systems decrease the burden on the users by reducing customization demands and are prepared for short-term changes in the users' conditions. In order to perform dynamic adaptations in a transparent way for the users, adaptive systems employ user models that gather and process information about user characteristics, interests, restrictions, abilities, etc. User models can collect information both manually, from the users' answers to specific questions in a registration process, and implicitly, by observing and interpreting their interaction with the system.

User modeling has been frequently used in web design to support people through personalized access. These techniques are reported to be very successful in commercial websites [36], both for customers, who receive personalized treatment which better satisfies their interests, and for online retailers, which can recognize, understand, and ultimately serve their customers more efficiently. Concerning people with disabilities, user modeling requires advanced data structures to enable the management of diverse (and sometimes incoherent) personalization parameters. This is because the individual capabilities of people with special needs are extremely heterogeneous [82], as are their preferences and behaviors when interacting with technology [79]. Quite simple data structures have been used successfully for data management of user models, but in recent years ontologies have been extensively used to build user models due to their notable advantages in this domain, such as advanced automated reasoning with complex data structures and better suitability to deal with expandable domains of knowledge.

Therefore, understanding human diversity when interacting with technology is crucial to improve interaction methods and thus to facilitate access to computers for people with special needs. Nevertheless, the most widely used interaction models for describing human behavior are focused on people without disabilities, and therefore are not always appropriate for modeling people with special needs. This is, for instance, the case of the well-known Fitts' law. This law [19] is a predictive model widely used in human–computer interaction to estimate user performance in interactions involving rapidly aimed movements, such as moving a finger or a cursor to select a target. However, whether Fitts' law applies to individuals with motor impairments [71] or not [38] is still a matter of debate. As a compromise solution, variations of the original model including combinations of Fitts' law-related features have been employed for users with disabilities [32].
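To make the predictive character of Fitts' law concrete, the following is a minimal sketch in the Shannon formulation; the coefficients a and b are hypothetical placeholders that would normally be fitted per user from logged pointing trials (and, as discussed above, may need to be extended or re-fitted for users with motor impairments).

```python
import math

def fitts_movement_time(distance: float, width: float, a: float = 0.1, b: float = 0.15) -> float:
    """Predict pointing time (in seconds) with the Shannon formulation of Fitts' law.

    distance: distance from the start position to the target (e.g., in pixels)
    width:    target size along the movement axis
    a, b:     empirically fitted coefficients; the defaults are illustrative only
    """
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# A small, distant target is predicted to take longer than a large, close one.
print(fitts_movement_time(distance=800, width=16))
print(fitts_movement_time(distance=200, width=64))
```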
3 Evolution and types of interface personalization for people with disabilities

The idea of user interface adaptivity appeared with the early intelligent user interfaces [49]. However, according to Stephanidis [73], the first attempts to bring the adaptivity and accessibility concepts closer together were only initiated in the 1990s, with the European project ACCESS1 being one of the pioneers in focusing on user interface adaptations to support the requirements of people with disabilities and elderly people. Among subsequent efforts, the AVANTI2 project stands out. AVANTI applied adaptability and adaptivity techniques to tailor the interface to the characteristics of people with disabilities in order to provide accessibility in web-based multimedia applications. It took into account the different contexts of use and the changing characteristics of the interaction between the user and the system. In a first phase, AVANTI considered static user characteristics (abilities, skills, requirements, and preferences) stored in the user model. During the interaction, the system observed users' procedures to deduce their dynamic characteristics (such as idle time and error rate). Adaptation was performed by applying rule-based system reasoning [74].
3.1 Modeling users with disabilities

Several user models have been created to address the particular needs of specific types of users. For instance, the PIUMA3 project is focused on personalized support to people with autism spectrum disorder. This project uses a holistic user representation in order to capture different aspects of cognition, affects, and habits, stressing the exploitation of data coming from the real world to build a broad representation of the user. The main goal is to provide support instructions to autism-affected users, according to their preferences and interests and considering their current level of stress and anxiety in specific locations. The model is structured on cognitive status and skills, spatial activities, and habits, likes, interests, and aversions. This representation is used to recommend points of interest according to the current physical context, personalized on the basis of the user's preferences, habits, and current emotional status [67].

A different application of user modeling is presented by Biswas et al. [16]: a simulator devoted to reflecting problems faced by elderly and disabled users while they use a computer, a television, or similar electronic devices. The simulator embodies both the internal state of an application and the perceptual, cognitive, and motor processes of its user. Its purpose is to help interface designers to understand, visualize, and measure the effect of impairment on the interaction with an interface.

1 ACCESS European project (1994–1996) "Development platform for unified ACCESS to enabling environments."
2 AVANTI European project (1995–1998) "Adaptive and adaptable interactions for multimedia telecommunications applications" Report Summary. https://cordis.europa.eu/result/rcn/21761_en.html
3 PIUMA "Personalized Interactive Urban Maps for Autism." http://piuma.di.unito.it/
At the same time, personalization has also been used in other fields related to disability, such as healthcare. Health-oriented user models are very descriptive and accurate in the provision of details about the diverse disabilities, their combinations, and their practical effects. Information systems devoted to generating personalized instructions for medical personnel and to informing patients with disabilities about their condition require detailed user models that can explain the peculiarities of the many existing disabilities. Identifying and modeling all the impairments of each disabled patient to personalize health service operations is a challenging task because patients with disability can be affected by numerous different and unrelated conditions that are not taken into account by generic disability stereotypes [23].

In this sense, the International Classification of Functioning, Disability and Health (ICF) is a source for several user models in the accessibility field. This is the World Health Organization (WHO) international standard focused specifically on disabilities [83]. The ICF is organized in two parts: the first one, devoted to Functioning and Disability, is structured in two areas: (i) Body Functions and Structures and (ii) Activities and Participation. The second part covers Contextual Factors, structured in Environmental Factors and Personal Factors. Despite its evident value, the 1400 categories contained in the ICF may be excessive for user modeling applied to user interface personalization. Nevertheless, it is very useful as a source of relations between disabilities and restrictions for activities and participation, related to environmental and personal factors. This model has influenced several user interface personalization projects.
3.2 Automatic generation of personalized interfaces

A distinctive approach to personalization is taken by the Supple system [32]. Supple automatically generates interfaces for people with motor impairments adapted to their devices, tasks, preferences, and abilities. To this end, the authors formally defined interface generation as an optimization problem and demonstrated that, despite a large solution space, the problem is computationally feasible for a particular class of cost functions. In fact, the time for automatic interface production ranges from three seconds to one minute. Several different design criteria can be expressed in the cost function, allowing different kinds of personalization. This approach enables extensive user- and system-initiated runtime adaptations to the interfaces once they have been generated. For automatic user interface generation, Supple relies on an interface specification, an explicit device model, and a usage model. The last of these is represented in terms of user traces corresponding either to actual or to anticipated usage. They provide interaction frequencies for primitive widgets and frequencies of transitions between different interface elements.

Egoki is another automatic generator of accessible user interfaces designed to allow people with disabilities to access supportive ubiquitous services. Egoki follows
a model-based approach to select suitable interaction resources and modalities depending on users' capabilities. It uses three modules: a knowledge base, a resource selector, and an adaptation engine. The knowledge base is mainly supported by the Egonto ontology, which stores, updates, and maintains the models regarding user abilities, access to device features, and interface adaptations. Automated generation of user-tailored interfaces is based on the transformation of a logical specification of the user interface, described in UIML, into a final functional user interface using the information specified in the models [33].
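As a rough illustration of casting interface generation as cost minimization, in the spirit of the systems above, the sketch below enumerates candidate widgets for two interface elements and picks the cheapest assignment that fits the available screen space. The widget catalogue, cost terms, and ability parameter are all hypothetical; real systems such as Supple use far richer interface specifications and device models.

```python
from itertools import product

# Hypothetical widget catalogue: (widget name, screen size in px, fine-motor demand 0..1).
CANDIDATES = {
    "volume": [("slider", 120, 0.8), ("plus_minus_buttons", 60, 0.3)],
    "preset": [("dropdown", 40, 0.7), ("radio_list", 150, 0.2)],
}

def best_layout(max_size: int, motor_ability: float, size_weight: float = 0.01):
    """Return the widget assignment with the lowest cost that fits on the screen.

    The cost trades off screen real estate against fine-motor demand; motor_ability
    (taken from the user model, 0 < ability <= 1) makes demanding widgets more
    expensive for users with reduced dexterity.
    """
    best, best_cost = None, float("inf")
    for combo in product(*CANDIDATES.values()):
        if sum(px for _, px, _ in combo) > max_size:      # device constraint
            continue
        cost = sum(size_weight * px + demand / motor_ability
                   for _, px, demand in combo)
        if cost < best_cost:
            best, best_cost = combo, cost
    return best

print(best_layout(max_size=300, motor_ability=1.0))   # compact but demanding widgets win
print(best_layout(max_size=300, motor_ability=0.25))  # larger, easier widgets win
```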
3.3 Control of the adaptation by the user

Adaptations can cause usability problems, including disorientation and a feeling of losing control. To tackle this problem, different adaptation patterns were developed in the MyUI4 system to increase the transparency and controllability of runtime adaptations. These patterns help users to optimize the subjective utility of the system's adaptation behavior. A design pattern repository created for MyUI includes different types of patterns which serve distinct functions in the adaptation process. There are patterns for creating and updating a user interface profile, which includes global variables to define general settings throughout the entire user interface (e. g., font size). Other patterns provide alternative user interface components and elements for current interaction situations. Finally, adaptation dialogue patterns describe the transition from one instance of the user interface to another. Four adaptation patterns that give the user control over the adaptation are provided: Automatic Adaptation without Adaptation Dialogue (which is the baseline condition), Automatic Adaptation with Implicit Confirmation, Explicit Confirmation before Adaptation, and Explicit Confirmation after Adaptation [64].
3.4 Use of ontologies for user modeling

Most current personalization systems for people with disabilities use ontologies,5 which are more or less related to the ICF structure and content, to maintain their user models. For instance, the ACCESSIBLE project6 created an ontology which is an extension of two previous ones, developed for mobility-impaired users in the ASK-IT project and for elderly people in the OASIS7 project. The ACCESSIBLE ontology [2] includes characteristics of users with disabilities, devices, applications, and other aspects to be taken into account to develop personalized applications. In addition, it contains accessibility guidelines as checkpoints. It also includes rules for semantic verification that describe the requirements and constraints of users with disabilities, associating them with the accessibility checkpoints. ACCESSIBLE has an engine that answers basic rules coded in SWRL (a Semantic Web Rule Language combining OWL and RuleML). The rules have the following form: "For user X, interacting on device Z, is application Y accessible?"

The AEGIS project8 developed the AEGIS ontology [3] to perform the mapping between accessibility concepts and their application in accessibility scenarios. This ontology shares the definition of personal aspects with the ACCESSIBLE ontology. The AEGIS ontology aims at unambiguously defining accessibility domains, as well as the possible semantic interactions between them. To this end, conceptual information about (i) the characteristics of users with disabilities, functional limitations, and impairments, (ii) technical characteristics of I/O devices, (iii) general and functional characteristics of web, desktop, and mobile applications, and (iv) other assistive technology is formalized. The AEGIS ontology was provided with a Wikipedia-like interface to allow access by users who are not ontology experts [52].

More recently, user modeling tools have advanced towards frameworks which are not focused on a particular application-specific context in order to be more inclusive than application-oriented solutions. For instance, the Global Public Inclusive Infrastructure9 has proposed an ontological framework for addressing universal accessibility in the domain of user interaction with Information and Communication Technologies (ICT). This framework builds its knowledge by reflecting the linkage between user characteristics, interaction requirements (including assistive technologies), personal needs and preferences, and the context of the use of ICT [53].

4 MyUI European project (2010–2012) "Mainstreaming Accessibility through Synergistic User Modelling and Adaptability." http://www.myui.eu/
5 In [72] the authors present a broad review of the use of ontological technologies for user modeling in combination with the semantic web.
6 ACCESSIBLE European Project (2008–2010) "Applications Design and Development." http://www.accessible-eu.org/index.php/project.html
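As a loose illustration of the kind of query such an engine answers ("For user X, interacting on device Z, is application Y accessible?"), the following sketch expresses a few accessibility rules in plain Python rather than SWRL; the user, device, and application attributes are invented for the example and are not taken from any of the ontologies above.

```python
# Hypothetical user, device, and application descriptions, used only to
# illustrate rule-based accessibility checking.
USER = {"vision": "blind", "hearing": "full", "motor": "full"}
DEVICE = {"screen_reader": True, "captions": False}
APPLICATION = {"images_have_alt_text": False, "keyboard_navigable": True}

RULES = [
    # (does the rule apply to this user?, is the requirement satisfied?, failed checkpoint)
    (lambda u: u["vision"] == "blind",
     lambda d, a: a["images_have_alt_text"],
     "non-text content lacks text alternatives (cf. WCAG 1.1.1)"),
    (lambda u: u["vision"] == "blind",
     lambda d, a: d["screen_reader"] and a["keyboard_navigable"],
     "no screen reader support or keyboard navigation"),
    (lambda u: u["hearing"] == "deaf",
     lambda d, a: d["captions"],
     "audio content lacks captions (cf. WCAG 1.2.2)"),
]

def accessible(user, device, application):
    """Return whether the application is accessible for this user/device pair,
    together with the list of failed checkpoints."""
    failures = [checkpoint for applies, satisfied, checkpoint in RULES
                if applies(user) and not satisfied(device, application)]
    return (not failures), failures

ok, failed = accessible(USER, DEVICE, APPLICATION)
print(ok, failed)   # False, because images lack alt text
```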
3.5 Sharing user models

An important challenge in current user modeling is the fragmentation of user model definitions. New definitions of user models for disability are often produced based on the specific focus and objectives of research projects [60]. As a result, re-using or sharing user models across different projects turns out to be very difficult.

7 OASIS European project (2008–2011) "Open architecture for Accessible Services Integration and Standardisation." http://www.oasis-project.eu/
8 AEGIS European project (2008–2012) "Open Accessibility Everywhere: Groundwork, Infrastructure, Standards." http://www.aegis-project.eu/
9 Global Public Inclusive Infrastructure (GPII). https://gpii.net/

Nevertheless,
the advantages of re-using existing models compensate for the difficulties arising from the differences in research objectives and the diversity among user groups.

Different R&D fields would benefit from user model sharing. For instance, Ambient-Assisted Living (AAL) provides technological support for the daily life of people with disabilities, largely based on ubiquitous computing. In-home support can be extended to public spaces where ubiquitous accessible applications allow people with disabilities to access location-dependent services. The objective is to transfer existing knowledge about the users, their common activities, and their environment in order for it to be used in ubiquitous applications outside the home. Sharing models between in-home and out-of-home supportive applications requires model interoperability [5]. Two main approaches to achieving syntactic and semantic interoperability can be outlined, i. e., shared format and conversion. The shared format approach requires a unified user profile that can be based on various standards or Semantic Web languages. The conversion approach uses algorithms to convert the syntax and semantics of user model data used in one system for their use in another system. A third approach is possible, combining the benefits of both approaches to allow flexibility in representing user models and to provide semantic mapping of user data from one system to another. To ensure interoperability, embracing a standard, such as RDF,10 allows the exchange of semantically rich data. Ontology definition languages (e. g., RDF, OWL) can achieve interoperability between models, under both centralized and decentralized architectures [20, 21].

The semantic matching approach addresses the problem of semantic interoperability by transforming data structures into lightweight ontologies and establishing semantic correspondences between them. For example, the Semantic Matching Framework (SMF) handles daily routines at home for people with special needs. SMF uses ontologies to describe the semantics of the user model to represent devices (for example, doors, windows, sensors). The adaptation of the environment to people with special needs is based on the detection of disability-related situations that limit the capabilities of users and it provides adapted processes to personalize the provision of services [45].

The Virtual User Modelling and Simulation Standardisation (VUMS) cluster of European projects11 proposed an interoperable user model, able to describe both able-bodied people and people with various kinds of disabilities. The unified user model is used by all the participant projects and aims to be the basis of a new user model standard based on the VUMS Exchange Format. As a first step towards the standardization of user models, the VUMS cluster defined a Glossary of Terms supporting a common
language and the VUMS User Model. A VUMS Exchange Format was defined in order to store user characteristics in a machine-readable form. This format contains a superset of user variables allowing any user model expressed in any project-specific format to be transformed into a model which follows the VUMS Exchange Format. To this end, a set of converters, able to transform a user profile following the VUMS Exchange Format into specific user models and vice versa, would be developed [46].

10 Resource Description Framework (RDF) is a Semantic Web Standard. https://www.w3.org/RDF/
11 The European VUMS Cluster was a joint initiative of the European Commission and the research projects VERITAS, GUIDE, VICON, and MyUI with the objective of aligning user development research in the projects and to foster common standardization activities.
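To make the shared-format idea tangible, here is a minimal sketch that represents a small user profile as RDF with the rdflib Python library. The um: vocabulary and property names are invented for the example; an actual deployment would reuse a shared schema (such as the VUMS Exchange Format) rather than define its own.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, XSD

# Hypothetical user-model vocabulary, for illustration only.
UM = Namespace("http://example.org/usermodel#")

g = Graph()
user = URIRef("http://example.org/users/anna")
g.add((user, RDF.type, FOAF.Person))
g.add((user, UM.pointingDevice, Literal("trackball")))
g.add((user, UM.minimumFontSize, Literal(18, datatype=XSD.integer)))
g.add((user, UM.usesScreenReader, Literal(False)))

# Serializing to a standard syntax (Turtle here) is what makes the profile
# exchangeable between systems, e.g., in-home AAL services and public
# ubiquitous services that need the same accessibility information.
print(g.serialize(format="turtle"))
```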
3.6 Learning ontologies

Mechanisms to populate user models are very diverse and depend on the application. When ontologies are used to maintain the user model, there exist automatic and semi-automatic methods for ontology acquisition from structured, semi-structured, and unstructured sources [72]. In particular, the outcomes from data mining processes can be used to learn ontologies. In this case, methods range from the manual introduction of the types or stereotypes obtained to the application of reasoning methods provided by the ontological framework to create new items in the ontology or to connect existing ones. In addition, other sources can be used. For instance, in the Semantic Matching Framework project, the population of the model exploited crowdsourcing mechanisms to enrich the representation, allowing people to provide ratings, comments, and reviews about places [67].
4 Data collection and analysis for user modeling

User modeling implies the definition of observable parameters that are relevant for the interaction. In order to model each user, the actual values of these parameters have to be collected. According to Chin [22], the most difficult aspect of applying user modeling to computer systems is the acquisition of the models. In fact, after constructing an initial user model, the acquisition process often continues throughout the entire life of the user modeling system; in his opinion, user model acquisition can thus be viewed as a type of Machine Learning.

When the users are people with disabilities, some data collection choices are more appropriate than others. Customization and changing settings by the user are frequently required before starting the personalization, but they may require advanced knowledge about the system and they may be time consuming. Users with disabilities are motivated to assume this phase for long-term use, but not for sporadic access. The latter is, for example, the case of most eGovernment services.

In order to collect the necessary data to start the personalization process, users can be explicitly questioned in diverse ways which should be appropriate for people
with disabilities. This phase allows an initial user model to be established or a stereotype to be assigned to the user. Part of this data can also be implicitly collected from external sources (databases, previous interactions, social networks, etc.), which is less intrusive for users with disabilities than explicit data gathering. During the interaction, dynamic acquisition allows data to be collected from the interaction process itself. This acquisition phase is very convenient for people with disabilities because these users can undergo significant performance changes during the interaction process (due to fatigue, motivation, learning, etc.). Machine Learning appears to be an appropriate technique for model acquisition. New knowledge derived from data directly collected from the user or indirectly acquired from the interaction process may contain errors or contradictions. Tools for reasoning and making inferences concerning the data collected, for example by means of inference rules, may contribute to enhancing the coherence of the model.

Therefore, a common approach when users have disabilities is to automatically collect information while "observing" their activity, for instance, logging cursor movements, scrolls, and target selections performed by the users. These data can be used to indirectly detect changes in the modeled parameters and to update the model accordingly while the interface is being used. Observing the interaction also allows new users to be classified on the basis of similar behaviors already present in the model in order to allocate specific stereotypes.
4.1 Tools for remote and local data collection

Direct observation of the users in the laboratory provides valuable information about how each participant uses the system, but according to Webb et al. [81], in order to apply Machine Learning for user modeling, substantial labeled datasets are necessary. It is not always easy to recruit the appropriate number of participants (due to the lack of users with suitable characteristics, scheduling difficulties, etc.) to obtain such a quantity of information. Unlike direct collection, however, remote user testing can provide vast quantities of data with less effort.

Remote data collection-based methods are frequently used for user modeling, even though they are not well accepted for formal accessibility or usability evaluations because of the lack of control over the experimental sessions, the impossibility of direct observation of the users, etc. Nevertheless, their evident advantages have increased their use for informal evaluations. Among these advantages, remote data collection allows experimentation "in the wild" with a wider range of participants. It allows unobtrusive data collection, while the users perform the evaluated tasks at home, using their own equipment, whenever they want. These advantages are even more attractive when people with disabilities are part of the study. Remote data collection saves them from having to travel to a specific testing site. In addition, they can use their usual equipment and applications. We
cannot forget that the assistive technology (hardware and software) used by several users with disabilities is tuned and adapted to their specific needs. It is difficult to replicate all these facilities in the laboratory. Even though remotely collected data can provide valuable knowledge, no reliable information can be obtained about the cognitive processes followed by users to perform specific actions, or about how users utilize their assistive technology, unless they are explicitly asked.

For instance, remote data logging has been applied extensively to collect data about cursor displacements across the screen in order to detect users' restrictions. Claypool et al. [24] used remote observation of a set of events to learn about the interests of the users on a page, while other authors [40, 39, 31] collected various user characteristics, such as users' dexterity, from the movements they performed with the onscreen cursor. MacKenzie et al. [57] analyzed cursor trajectories on pointing and clicking tasks and classified them into seven accuracy measures, i. e., target re-entry, task axis crossing, movement direction change, orthogonal changes, movement variability, movement error, and movement offset, which served to determine differences among devices in precision pointing tasks. Keates et al. [47] extended these categories with six new cursor characteristics to capture a variety of features of cursor movement performed by motor-impaired users. Hwang et al. [40] discovered that several measurements from the sub-movement structure of cursor trajectories can help to identify difficulties found by people with motor impairments. These measurements, among others, are used to characterize people with physical impairments on pointing and clicking tasks, as in [84].

With regard to the apparatus used to gather data from web navigation, early tools [70] were only able to log data generated by servers.12 Later, technologies such as JavaScript or Java Applets enabled the monitoring and recording of events generated by the cursor or keyboard, among others,13 providing more valuable information than server logs [28, 63]. These systems were first located at the server side, adding the required code into the webpages in order to gather the data generated. With server-side tools, however, interaction data can only be obtained from the pages hosted on the server in which the tracking tools are deployed. On the other hand, proxy tools [12, 14] and client tools [24, 39, 77] allow data to be gathered from any existing page.

12 Extended Log File Format (2018). https://www.w3.org/TR/WD-logfile
13 UI Events (2018). https://www.w3.org/TR/uievents/

As far as user modeling is concerned, some of these tools were used to observe the behavior of people with disabilities [14, 39, 77]. Even if they were not strictly used to create or update models [77], they allowed accessibility barriers to be detected and interaction methods and adaptations to be evaluated [76]. Hurst et al. [39], for example, used these data to classify users with respect to their input events. Bigham et al.
[14] applied these data to analyze the strategies used by blind people during web navigation. Gajos et al. [32] provided an interesting example of the application of data collection to user modeling and created an ability model that elicits user needs using a test composed of four tasks (pointing, dragging, list selecting, and multiple clicking). Thus, with this system the most suitable user interface configuration for each user can be determined and stored in the model.
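As a minimal sketch of what such trajectory measures look like in code, the following computes two of the measures mentioned above (target re-entries and movement variability) from a logged sequence of cursor positions. The trace, target geometry, and exact normalization are simplified assumptions for illustration.

```python
import math

def target_reentries(points, target_center, target_radius):
    """Count how often the cursor re-enters the target after having left it."""
    entries, inside = 0, False
    for x, y in points:
        now_inside = math.hypot(x - target_center[0], y - target_center[1]) <= target_radius
        if now_inside and not inside:
            entries += 1
        inside = now_inside
    return max(0, entries - 1)   # the first entry is the normal approach

def movement_variability(points, start, end):
    """Standard deviation of the perpendicular distance from the task axis (start -> end)."""
    ax, ay = end[0] - start[0], end[1] - start[1]
    length = math.hypot(ax, ay)
    deviations = [((x - start[0]) * ay - (y - start[1]) * ax) / length for x, y in points]
    mean = sum(deviations) / len(deviations)
    return math.sqrt(sum((d - mean) ** 2 for d in deviations) / len(deviations))

# A toy trace that overshoots the target at (200, 0) and comes back once.
trace = [(0, 0), (40, 6), (90, -4), (150, 12), (198, 2), (185, 9), (201, 1)]
print(target_reentries(trace, target_center=(200, 0), target_radius=10))   # 1 re-entry
print(movement_variability(trace, start=(0, 0), end=(200, 0)))
```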
4.2 Data mining and Machine Learning for user modeling

The data collected through the systems described in the previous section can be automatically processed using Machine Learning techniques. The application of these techniques to user modeling aims to find sets of users with common interaction characteristics by detecting similar behavior patterns or profiles. Machine Learning techniques are mostly applied to interaction data that are collected while users are using the system. Ideally, this information should be collected in a non-invasive way, without disturbing the users, which makes server-side weblog collections the simplest in-use source and one which is applicable to the widest range of users. By contrast, client-side data collection tools such as those described by [12, 77] capture low-level interaction in an unobtrusive way but require explicit hardware or software installation on the user side. Therefore, they limit the size of the data sample that can be monitored in the modeling process. Gathering additional information, such as the content of the interface or the structure of the platform, is often useful for the profiling process [56, 7]. In web environments specifically, Natural Language Processing or Information Retrieval techniques allow information concerning the content of the website to be extracted. In addition, part of the information structure can be extracted from the network of links contained in the accessed webpages.

Machine Learning techniques can be used to build or enrich user models following different approaches [17]. Content-based filtering approaches use the history of each specific user to build a profile that describes their characteristics (e. g., needs and preferences). In this way, the resulting profiles, combined with the information of the webpages, can be used to obtain pre-defined personalization schemes for each user or to enrich other data structures, such as ontology-based models. This approach can only be applied when the data contain user identification information. If the user identification is not present, user and session identification heuristics can be used in a collaborative filtering approach that enables the estimation of the needs and preferences of a user based on the experience acquired from other users with similar characteristics.

The treatment of this information requires a complete data mining process: data acquisition, data pre-processing, selecting the most adequate Machine Learning techniques, and application of these techniques in order to finally obtain the user profiles
[30, 62]. This methodology requires the validation of the achieved results and some feedback that allows the process to be enhanced and repeated when required. Machine Learning algorithms cannot handle raw data in their original format. They require pre-processing steps such as the removal of non-useful information and the generation of aggregate and derived features. Weblog analytic tools, such as Piwik (recently renamed Matomo) [25], can be used to generate pre-processed data with a higher semantic level, which is directly usable by Machine Learning algorithms. This is only possible if the platform was previously designed to include the collection of all the types of required data, which is not a common practice.

There are two main groups of Machine Learning algorithms: supervised and unsupervised learning algorithms.
(a) Supervised learning (also known as classification) [26] is the task of inferring a mapping function from labeled training data allowing the identification of new unlabeled data. These techniques can be applied in user modeling in controlled contexts where explicit user information is known. Input data are labeled as belonging to different types of users in the database used to build the supervised model. Models built by means of supervised techniques are able to differentiate the diverse existing profiles.
(b) Unsupervised learning techniques are designed to work with unlabeled data. Clustering is an unsupervised pattern classification method that partitions the input space into groups or clusters [43]. Its goal is to perform a partition where objects within a cluster are similar and objects in different clusters are dissimilar. Therefore, the purpose of clustering will be to identify natural structures among the users of a system.

The following sections include examples of interaction data modeling in the context of users with disabilities, working with client-side data and server-side data.
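Before turning to those examples, the following is a minimal sketch of the unsupervised route: clustering per-session interaction features with scikit-learn. The feature names and values are invented for illustration; in practice they would come from the pre-processed logs discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per recorded session; hypothetical features:
# [avg. time per target (s), re-entries per target, pause ratio, error rate]
sessions = np.array([
    [0.9, 0.1, 0.05, 0.02],
    [1.0, 0.2, 0.08, 0.03],
    [2.8, 1.4, 0.30, 0.15],
    [3.1, 1.7, 0.35, 0.18],
    [1.1, 0.1, 0.06, 0.02],
])

X = StandardScaler().fit_transform(sessions)   # put features on a comparable scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # sessions with similar interaction behavior end up in the same cluster
```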
4.2.1 Using client-side interaction data

When Machine Learning techniques are applied to user interaction data, the selection of features extracted from the interaction is critical: depending on the extracted features, Machine Learning algorithms will or will not be able to solve the problem. Almanji et al. [6] present a review of features extracted using pointing devices from client-side interaction data of users with upper limb impairments due to cerebral palsy. They propose a model that measures the influence of the Manual Ability Classification System (MACS) level of each user and the characteristics of the analyzed features. Among the analyzed features, time for movement, acceleration–deceleration cycles, and average speed are the most significant.

There are systems that automatically detect cursor pointing performance with the aim of learning how to deploy adaptations at the correct time without prior knowledge of the participant's ability. They use client-side interaction data to build several systems to (i) discriminate pointing behaviors of individuals without problems from individuals with motor impairments, (ii) discriminate pointing behaviors of young people
from people with Parkinson's disease or older adults, and (iii) detect the need for adaptations (such as "Steady Click") designed to minimize pointer slips during a click. All these systems are built over labeled databases including features related to the click action, the movement, pauses, and task-specific features. They use wrapper methods to select the features and C4.5 classifiers [39].

WELFIT is a remote evaluation tool for identifying web usage patterns through client-side interaction data (event streams). It provides insights about differences in the composition of event streams generated by people with and people without disabilities. This tool uses the Sequence Alignment Method (SAM) for measuring the distances between event streams and uses a previously proposed heuristic to "point out" usage incidents. It labels the groups built in the clustering procedure as AT (users using assistive technologies) or non-AT, according to the corresponding majority. While endeavoring to identify web usage patterns within the discovered groups, the authors found significant differences in the distribution of several features between AT and non-AT users [69].

Supervised techniques can be used to model predefined types of users. They are focused on identifying one of the key interaction characteristics: the (assistive) device being used to interact with the computer, which is critical in the selection of the best automatic adaptations [65]. To this end, specifically defined rich web user interaction data are collected by the RemoTest platform in controlled experiments where the device used for interaction is known [77]. A thorough data mining process is carried out in order to build a system able to identify the used device: keyboard, trackball, joystick, or mouse. The classifiers are built with different sets of features: the ones considered most important by the experts, a complete set of features, and features selected automatically by wrapper systems. The performed analysis reveals that not all the features considered to be of the highest priority by accessibility experts are important from the classification point of view, whereas some of the features considered less important are, in fact, more relevant with regard to classification. The resulting system is able to efficiently determine the device used with an accuracy of 93 %.
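The following sketch illustrates this kind of pipeline on synthetic data: a decision tree (CART, standing in for C4.5) combined with wrapper-style forward feature selection to predict the input device from interaction features. The features, their number, and the way the labels influence them are invented for the example and do not reproduce the RemoTest study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for per-task interaction features
# (e.g., movement time, curvature, pause count, key-press latency, ...).
n_samples, n_features = 200, 6
X = rng.normal(size=(n_samples, n_features))
devices = rng.integers(0, 4, size=n_samples)   # 0=mouse, 1=trackball, 2=joystick, 3=keyboard
X[:, 0] += devices                              # make two features genuinely informative
X[:, 3] -= 0.5 * devices
y = devices

tree = DecisionTreeClassifier(random_state=0)   # CART decision tree
selector = SequentialFeatureSelector(tree, n_features_to_select=2,
                                     direction="forward", cv=5)    # wrapper selection
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.get_support()))

X_selected = selector.transform(X)
print("cross-validated accuracy:", cross_val_score(tree, X_selected, y, cv=5).mean())
```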
4.2.2 Using server-side interaction data

An example of the application of Machine Learning techniques to server-side interaction data for determining user profiles can be found in [8], where the authors model the interaction with Discapnet,14 a website aimed mainly at visually disabled people. It is a non-invasive system that first uses clustering to group users with similar navigation patterns into the same segment and then extracts the characteristics which correspond to each group of users. The analysis of the extracted characteristics leads to the identification of anomalous behaviors or of groups which are undergoing navigation difficulties. The system makes it possible to ascertain whether the interaction problem comes from the users themselves, from the characteristics of the website or platform being used, or from both. In the former case, systems personalized to this specific type of users are required to make their navigation activities easier. In the latter, the site should be redesigned or transcoded. Using this system in an experiment involving navigation under supervision, carried out mainly with disabled people, 82.6 % of users with disabilities were automatically flagged as having experienced problems. In comparison, in the original logs used to build the system, in which a more diverse range of people were accessing the site, only 33.5 % of the sessions revealed that problems were being experienced.

14 Discapnet: el Portal de las Personas con Discapacidad (The Disability Portal). https://www.discapnet.es/
5 User interface personalization for specific applications

In this section we present and discuss three challenging application areas where user interface personalization is applied with diverse results. The first one addresses the personalization of regularly used web user interfaces. In this case, a registration process may be required, enabling the interface to be personalized from the available information about the user. The second one deals with the personalization of sporadically – and often anonymously – used web interfaces, such as those in eGovernment portals. In this case, personalization requires observation of the behavior of the user in order to match it with a previously built stereotype. The last one deals with an interface where the input from the user is insufficient to specify a task. Therefore, intelligent transformation of the commands, based on the user context and model, is required. In this case, the task is to control an assistive robot for augmented or alternative manipulation.
5.1 Transcoding for web accessibility personalization

Several organizations that promote web standards and accessibility legislation, among which the W3C Consortium15 stands out, have campaigned for equal access to the Internet for everyone. Nevertheless, numerous websites do not yet meet the required accessibility levels. In addition, even if the accessibility guidelines and standards are met, full accessibility is not guaranteed.

15 Web Content Accessibility Guidelines (WCAG) 2.0, World Wide Web Consortium (W3C). http://www.w3.org/WAI/intro/wcag.php

Diverse factors, including people's
ability to use assistive technology to access the web, can also influence the existence of accessibility barriers [79].

From the early days of web accessibility research, people have embraced the objective of automatically converting a non-accessible page into an accessible one by modifying the code. This approach permits the modification of the presentation (e. g., changing colors, sizes, background) and also the addition of new content (e. g., widgets, JS code) in order to make the site more accessible. Transcoding is a technique that aims to modify "on the fly" the code of a non-accessible webpage to convert it into an accessible webpage by adding, modifying, or tagging its content [11]. In comparison with CSS settings, transcoding allows more thorough adaptations, adding new code or altering the elements to augment the functionalities of a webpage. This technique has been used to improve the accessibility of webpages, providing more accessible variants of pages for any users that might need them. General transcoding methods that are independent of the specific website require sufficient information about the semantics of each element. The rise of the Semantic Web16 and the possibility of semantically tagging the elements of webpages made general transcoding methods possible. This also opened the way to using transcoding as a personalization technique. In this case, personalized transcoding depends on the characteristics, preferences, and likes of the specific user.

The first transcoders were devoted to performing adaptations such as webpage serialization, inserting missing "alt text" fields (alternative texts) into images, or content reordering17 [35], the main objective of which was to make navigation easier for low-vision or blind users. However, in order to perform more thorough or wide-reaching adaptations, a transcoder needs to know the purpose that the designer gave to each element, such as which elements are devoted to providing main content and which provide navigation menus. In addition, it is also important to know the role of the element itself. For example, a DIV element is intended to be a container of elements, but by adding some JavaScript it can act as a button, a link, or even a text area. These different roles cannot be detected without annotations. Therefore, annotations add the necessary semantic information, which allows the content to be transformed more accurately to express the "intention" of page elements. Moreover, annotations can provide better access to the web by offering alternative content, personalized presentation, and navigation support. Only the owner of the website can insert annotations in the HTML code, although anyone can annotate a webpage when annotations are externally stored.

16 Semantic Web (2018). https://www.w3.org/standards/semanticweb/
17 BBC Education Text to Speech Internet Enhancer (Betsie) is a script written for people using text-to-speech systems for web browsing, used on the BBC website and on a number of other sites. http://betsie.sourceforge.net/
Humans performing manual annotations are able to accurately assign a role to each element. However, manual annotation is time consuming [10]. For example, these authors use XPATH expressions to identify elements and assign annotations; in this case the annotator has to annotate the entire website page by page. In order to ease the annotation task, automatic or semi-automatic tools are employed to help the annotators. For instance, Takagi et al. [75] propose a tool to insert annotations based on previously annotated webpages. In addition, most modern Content Management Systems avoid the need to repeat the same annotation for each page of the entire website. Alternatively, Bigham et al. [15] use crowdsourcing techniques to propose annotations.

Transcoding was successfully applied before the advent of the Semantic Web standards and their utilization: heuristics to infer the role of page elements, or knowledge of the "templates" used in particular websites, were applied to this purpose. Nevertheless, the appearance of the Semantic Web boosted its use for web access personalization. Nowadays, advances in the Semantic Web allow transcoders and assistive technologies to perform adaptations more accurately. Screen readers such as NVDA or JAWS use the WAI-ARIA18 annotation language to transform the visual content into speech. A tool presented by Valencia et al. [76] uses the WAI-ARIA annotations to transform the web automatically when the annotation is present, without the intervention of an annotator. The tool also enables the WAI-ARIA annotations to be added to webpages that lack them.

Adaptations made for a specific type of user may have negative effects on other types due to the diverse characteristics of people with disabilities. Some systems are targeted at only one user group, such as the ones proposed by Asakawa and Takagi [10] or Bigham et al. [15], which are devoted to blind and low-vision people. On the other hand, Gajos et al. [32] present a system for people with motor impairments, and Richards and Hanson [68] for elderly people. Personalized transcoding systems devoted to a wider range of the population require user profiling [59] or user modeling [76, 32] to decide which adaptations should be applied to a given user. In this way, personalization can be carried out by performing the transcoding actions that are relevant for the specific characteristics of each user, taking into consideration personal data contained in the user model (an example can be seen in Figure 10.1).

18 WAI-ARIA (2018). https://www.w3.org/TR/wai-aria/
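As a toy illustration of what a transcoder does, the sketch below rewrites a small HTML fragment with BeautifulSoup according to a few fields of a hypothetical user model. The adaptation rules, the user-model fields, and the HTML are invented for the example; real transcoders such as the ones discussed above operate on full pages and far richer models.

```python
from bs4 import BeautifulSoup

HTML = """
<html><body>
  <div onclick="openMenu()">Menu</div>
  <img src="logo.png">
  <a href="a.html">A</a><a href="b.html">B</a>
</body></html>
"""

# Hypothetical user-model fields driving the adaptations.
user_model = {"uses_screen_reader": True, "motor_impairment": True}

soup = BeautifulSoup(HTML, "html.parser")

if user_model["uses_screen_reader"]:
    # Add missing alt text and expose the scripted DIV's real role via WAI-ARIA.
    for img in soup.find_all("img", alt=False):
        img["alt"] = "Image: " + img.get("src", "unknown")
    for div in soup.find_all("div", onclick=True):
        div["role"] = "button"
        div["tabindex"] = "0"

if user_model["motor_impairment"]:
    # Enlarge and space out links to make targets easier to select.
    for a in soup.find_all("a"):
        a["style"] = "display:block; margin:12px 0; font-size:150%;"

print(soup.prettify())
```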
5.2 Personalized universally accessible eGovernment

According to Layne and Lee [55] electronic government or eGovernment is the use of digital technology, especially web-based applications, to enhance access to – and the
Figure 10.1: Left: an original non-accessible webpage. Right: the same page after applying adaptations for people with motor impairments using tablets. The layout, size of elements, space between links, and color contrast were modified in the adapted page to make page sections more visible (split in the navigation menu, banner, main content, and footer), in order to enable comfortable reading and easier target selection. Data mining applied to the logs obtained by the RemoTest tool was used for the transcoding personalization. Taken from [7].
efficient delivery of – government information and services. Eurostat19 stated that in 2017, 49 % of the individuals in Europe used the Internet for interacting with public authorities for tasks such as obtaining information from public websites and downloading or submitting official forms. eGovernment development generates a rising demand for publicly accessible web services. In fact, one of the main benefits of eGovernment is that people can avoid face-to-face interaction with the administration insofar as public services are delivered to citizens at any time (during 24 hours, 7 days a week) and are provided in a personalized way (different languages, adaptations for disabled users, etc.) [34]. Web access to eGovernment services avoids the need to physically attend the administration building. In this way, it provides alternatives to people with disabilities who have difficulties to navigate the physical environment, but it does require the capacity to have unhindered access to the Internet. Therefore, eGovernment plans may be rendered useless if large segments of the target population, including persons with disabilities, are unable to access the system [27]. Thus, universal access is one of the main challenges of this area, together with privacy, confidentiality, and citizenfocused government management [37]. eGovernment services must be accessible and personalized in order to be fully inclusive. Governmental services provided through the Internet have distinctive characteristics: they are sporadically used, and many of them do not require registration, because they are solely devoted to providing information. Therefore, no previous information about the user is available. 19 Eurostat: Individuals using the Internet for interaction with public authorities. Dataset code: TIN00012. http://ec.europa.eu/eurostat/web/products-datasets/-/tin00012
272 | J. Abascal et al. Therefore, the design of accessible personalized eGovernment interfaces presents two main difficulties. First, the characteristics of the users are not known in advance. Even if they have registration procedures, governmental portals do not usually require the disclosure of sensitive information (e. g., about disabilities or limitations), making it difficult to personalize the interface to a particular profile. Second, eGovernment users make sporadic use of eServices and websites and therefore the usage information available is very limited. In any case, eGovernment users are more willing to provide personal information when they have previously performed a customization process [78]. User profiling for the personalization of governmental eServices that do not require registration can be carried out by (i) asking the user to fill in a form prior to the navigation, (ii) importing a user profile shared with other systems [21] or provided in a smartcard by the user themselves, or (iii) mining the user interaction to search for navigation patterns that can relate the user to similar previous users displaying the same behavior (and probably the same accessibility needs). Most approaches to adaptive interfaces for eGovernment use questionnaires and other sources, such as social networks, to obtain information about the preferences and interests of the users [54, 13]. However, questionnaires may be tedious for users and they may try to avoid them, and most social networks do not have enough public information about personal characteristics relevant to accessibility, or this information is not reliable when it is available. Nevertheless, some user profiling can be done using the information collected while the user is navigating the website. In this scenario, web usage mining techniques can be used for modeling eGovernment users by gathering their interaction data unobtrusively from web server logs [1]. Thus, these techniques avoid the use of personal data, implementing a user segmentation process based on the usage information where participants with similar navigation patterns are grouped together. Profiles extracted in this way provide relevant information about the navigation preferences and the user behavior that can be effective for different tasks. In the eGovernment context, web usage mining has been effective to model eServices, enabling the automatic prediction of successful and unsuccessful navigation behaviors based on the access patterns exhibited by the users [85]. Although the type of data currently collected in weblogs does not provide enough information to produce reliable and suitable user models, governments should not lose sight of the wide possibilities that data mining techniques offer to improve accessibility to their electronic services. Even Machine Learning techniques applied to anonymous user logs (see Section 3.2) in Common Log Format (CLF) combined with content and structure information can lead to the development of user profiles from different viewpoints which would provide an opportunity to model and adapt governmental eServices. In this regard, maintaining the navigation logs of eGovernment portals, enabling their exploitation for their usage analysis, and undertaking a commitment to take the findings into account is a reasonably cheap and simple process, when weighed against the sizable benefits that could be obtained.
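As an illustration of the log-based segmentation described above, the following sketch parses Common Log Format entries, builds a per-visitor count of requests to a few site sections, and groups visitors with similar navigation behavior. It is a toy example, assuming scikit-learn for the clustering step and hypothetical section names; it is not the pipeline used in [1] or [85].

```python
# Sketch of anonymous user segmentation from web server logs in Common Log Format.
# Assumptions: scikit-learn is available and the site has the hypothetical sections
# "forms", "info", and "contact". Needs at least k distinct visitors to cluster.
import re
from collections import defaultdict
from sklearn.cluster import KMeans

CLF = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
                 r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+')
SECTIONS = ["forms", "info", "contact"]

def features(log_lines):
    """Count, per anonymous visitor (host), how often each site section was requested."""
    counts = defaultdict(lambda: [0] * len(SECTIONS))
    for line in log_lines:
        m = CLF.match(line)
        if not m:
            continue
        path = m.group("path")
        for i, section in enumerate(SECTIONS):
            if path.startswith("/" + section):
                counts[m.group("host")][i] += 1
    return list(counts.keys()), list(counts.values())

def segment(log_lines, k=3):
    """Group visitors with similar navigation patterns into k coarse usage profiles."""
    hosts, vectors = features(log_lines)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    return dict(zip(hosts, labels))  # host -> cluster id
```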
5.3 Personalization of assistive human–robot interaction Previous examples showed cases of personalization of ordinary human–computer interfaces for accessible web access. Now we will analyze a different personalization environment: human–robot interaction for people with severe physical disabilities. This field requires personalizing not only the user interface (especially including the dialogue), but also the behavior of the robot. From the diverse types of assistive robots we will focus on one particular case: articulated arms used by people with severe mobility restrictions for augmented manipulation. This is quite a recent research field, favored by the availability of collaborative robots that allow safe working areas to be shared with humans. Human–robot interfaces provided for industrial articulated arms cannot be used by people with severe motor restrictions. Industrial robots are controlled using special keyboards and joysticks to “teach” them specific movements and other actions. The use of joysticks for commanding robotic actions is beyond the capabilities of most people with severe motor restrictions as articulated arms possess a great number of degrees of freedom (usually around six) allowing complex 3D movements. Matching the movements of a joystick to the movements of an articulated arm can be difficult and may produce counterintuitive interactions. Textual programming is still possible to produce low-level commands (such as move_to, open_gripper, close_gripper, etc.). Nevertheless, this is not a choice in a domestic environment as, for example, it is unlikely that users with motor disabilities would be able to provide commands with precise spatial coordinates. Therefore, two main issues arise when users with severe physical disabilities attempt to control an articulated arm. The first one is the personalization of the user interface (taking into account that the set of possible interaction modalities is very much reduced; in most cases pointing devices, joysticks, large keyboards, and voice commands are excluded). The second one is the bridging of the conceptual gap (considering the difference between the user and robot goals and procedures, and realworld models).
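The conceptual gap can be made concrete with a small sketch: the user expresses a goal, while the arm only accepts low-level primitives such as move_to, open_gripper, and close_gripper. The object poses and the hand-over position below are hypothetical placeholders for what perception and the user model would have to supply in a real system.

```python
# Illustration of the "conceptual gap": a high-level fetch goal is expanded into the
# primitive command sequence the arm understands. Poses are invented placeholders.
KNOWN_OBJECTS = {"cup": (0.42, 0.10, 0.05), "book": (0.30, -0.20, 0.02)}
HANDOVER_POSE = (0.20, 0.00, 0.30)

def plan_fetch(object_name: str):
    """Expand 'bring me the <object>' into low-level commands."""
    if object_name not in KNOWN_OBJECTS:
        raise ValueError(f"unknown object: {object_name}")
    pose = KNOWN_OBJECTS[object_name]
    return [
        ("open_gripper",),
        ("move_to", pose),
        ("close_gripper",),
        ("move_to", HANDOVER_POSE),
        ("open_gripper",),
    ]

print(plan_fetch("cup"))
```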
5.3.1 Multimodal interaction Multimodal interfaces for human–robot interaction (based on diverse combinations of voice, text, and gestures) are becoming increasingly popular [66]. For instance, robot control through gesture-based Computer Vision interfaces [80] may be an alternative when the user can produce a number of differentiated voluntary gestures, but they are hardly accessible to many people with severe movement restrictions. When the user commands are oral or textual orders, gestures, or signs, they may be incomplete or ambiguous. For instance, saying “give me that book” or pointing to
274 | J. Abascal et al. an object by any means (such as eye gaze) requires disambiguation in order to be interpreted. For this purpose the robot can take into account the user and context model where information about user preferences, restrictions, activities, timetables, etc., are available. In addition, users with severe motor disabilities use assistive technology to access computers very much conditioning the interaction modality. Finding new means for controlling and interacting with this type of robot is one of the main focus areas of robotics nowadays. For instance, physiological data, such as electromyographic signals [9], or brain–machine interfaces would allow for the design of human–robot interfaces for simple instructions.
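A toy sketch of this disambiguation step is given below: an underspecified command is resolved by combining the detected object category with the gaze estimate and, as a tie-breaker, the preferences stored in the user model. All data structures are invented for the example.

```python
# Toy disambiguation of an underspecified command such as "give me that book".
def resolve_referent(command_noun, gaze_point, context_model, user_model):
    # Candidate objects of the requested category currently present in the environment.
    candidates = [o for o in context_model["objects"] if o["category"] == command_noun]
    if not candidates:
        return None

    # Prefer the object closest to where the user is looking (eye gaze, head pointer, ...).
    def gaze_distance(obj):
        ox, oy = obj["position"]
        gx, gy = gaze_point
        return (ox - gx) ** 2 + (oy - gy) ** 2

    candidates.sort(key=gaze_distance)

    # Break remaining ties with preferences stored in the user model.
    best = candidates[0]
    preferred = user_model.get("preferred_items", [])
    for obj in candidates[:2]:
        if obj["id"] in preferred:
            best = obj
    return best

context = {"objects": [
    {"id": "book1", "category": "book", "position": (0.2, 0.5)},
    {"id": "book2", "category": "book", "position": (0.8, 0.1)},
]}
user = {"preferred_items": ["book2"]}
print(resolve_referent("book", gaze_point=(0.75, 0.15),
                       context_model=context, user_model=user))
```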
5.3.2 Personalization and shared control In order to simplify the interaction, two research lines are here explored: shared control and personalization of the human–robot interface. These lines have in common the need for user models: according to Mason and Lopes [58], higher levels of autonomy required by shared control can be achieved through user adaptation. In the case of assistive robotics the adaptation can take place at interface level, in the input and output channels, in dialog management, or in the selection of available services [44], while models are usually based on user preferences, user needs, and user context. Assistive robots partially share the activity area with the user, breaking the safety rules set out for industrial installations. Domestic robots must take into account the proximity of the users and interact with them without causing any damage or harm. To this end, it is desirable that, in addition to following commands from the user, the robot takes autonomous decisions to ensure safety and performance of the commands. Ideally, an assistive robot adapts to the user’s needs and behavior automatically and decides when and how to do so autonomously, while providing the user with an easy means to directly override the decision of the robot [42] using the shared control or joint initiative approach [86, 58, 41]. Designing a robot that shares control with a human is more complex than designing a completely autonomous robot because it requires an interaction meta-level at which each party knows the characteristics, limitations, and abilities of the other party and takes over control only when necessary. Shared control robots are adaptable per se. They require user models that, in addition to the user needs and characteristics, contain relevant contextual parameters and store past interactions. The robot should be able to produce assumptions about the user’s intentions and to behave in consequence, comparing its knowledge about user habits and likes with the current situation. Mason and Lopes [58] present an example of a how a proactive robot applying the shared control paradigm provides a basis for adaptation through repeated interactions that are rewarded by the user. The robot can anticipate the user’s needs by selecting appropriate tasks according to a user profile and a context model, and then plan the
execution of these tasks without an explicit user request. Since this robot uses verbal instructions it does not solve the communication issue. Nevertheless, when the conceptual gap between the two agents (user and robot) is bridged, the substitution of voice commands with inputs coming from some assistive technology is more feasible.
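A minimal sketch of such a shared control loop is shown below, under assumed and much simplified interfaces; it is not the implementation of the cited systems. The robot proposes the task predicted from the user profile and context, a safety check and an explicit user confirmation gate the execution, and the user can always override the proposal.

```python
# Minimal shared-control loop: proposal from the user model, safety check, user override.
def propose_task(user_profile, context):
    # e.g., the profile stores habitual tasks per time of day (hypothetical structure)
    return user_profile["routines"].get(context["time_of_day"])

def shared_control_step(user_profile, context, is_safe, ask_user):
    task = propose_task(user_profile, context)
    if task is None or not is_safe(task, context):
        return None                               # stay idle rather than risk harm
    decision = ask_user(f"Shall I {task}?")       # via the user's assistive input device
    if decision == "override":
        return ask_user("What should I do instead?")
    if decision == "yes":
        return task
    return None

profile = {"routines": {"morning": "fetch the water bottle"}}
print(shared_control_step(profile, {"time_of_day": "morning"},
                          is_safe=lambda task, ctx: True,
                          ask_user=lambda prompt: "yes"))
```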
6 Impact of interface personalization on the privacy of users with disabilities All users employing personalized services are exposed to risks relating to personal harassment and privacy attacks, including identity theft [4]. While most users are able to cope with these attacks, there exist specific populations that are more vulnerable to them, even reaching situations of helplessness. This is the case of some users with disabilities when using digital services. In this sense, systems collecting data about users for personalization purposes can be considered as being in the category of risky applications. In fact, user modeling for User-Adapted Interaction has always raised concerns about its possible impact on privacy and personal security [51]. These concerns are based on the fact that part of the data collected in user models can be sensitive and might be used to the users’ detriment. The danger of misuse increases when models include information about sensory, physical, or cognitive disabilities. For this reason, most people with disabilities are reluctant to provide information about their restrictions. Privacy issues arise if a user accesses the system by revealing their identity rather than remaining anonymous. Data stored in personalized systems include usage records, data supplied by users, and assumptions inferred from users’ data and usage behavior. If users provide information on disabilities and interests, these data are not only person-related but possibly even sensitive [29]. According to Fink et al. [29] confidentiality issues can be found in data transmission and storing (personal information about users is usually contained in the User Model Server). Encryption and authentication methods proved to be adequate to protect data from being stolen. Another issue is misuse of data carried out by the service provider [4]. Most protective applications that collect data about their users state that these data are only used for personalization of the application and would never be disclosed, sold, or lent to any other organizations. However, in general, nondisclosure statements are vague and some companies are not particularly reluctant to break them. Therefore, privacy protection requires measures being taken to meet legal regulations regarding systems that process personal information and to increase user acceptance by making the system transparent. In any case the fact that user data are gathered and processed should be pointed out to the users at the beginning of each
276 | J. Abascal et al. session [29]. In the AVANTI project, the following options were offered by the system in order to accommodate a user’s privacy expectations: – If possible, users should be given the option of accessing the system anonymously (e. g., with a pseudonym) if they do not want to reveal their identities. – In an (optional) initial dialog the user should be able to choose between no user modeling, short-term user modeling (e. g., for the current session only), and longterm modeling using persistent user models that are augmented with information from the current session. – At the end of each session the user should be asked if his or her model is to be deleted or stored for subsequent sessions. Sharing user data among diverse applications is even more critical. Data sharing might mean almost public disclosure of personal data, making it barely acceptable if there is not a transparent and effective permission policy. Smartcards have been proposed as a safer way to share user models. With these, users can provide their personal data only to the applications they have selected. This approach reduces public exposure of data, but again relies on the integrity of the applications. Aïmeur and Tremblay [4] mention Personal Management Information systems (PMIs) as an active area of research for privacy protection. These would be programs running in personal computers acting as gateways which interact with online content providers. They may use diverse privacy protection techniques such as pseudonyms, user anonymization, cryptography, and user data perturbation algorithms. In principle PMIs would provide end users with control over the usage of their data, although a more detailed description of their functionality would be required to judge the validity of this proposal. A different approach comes from the data mining field. Unsupervised learning is able to create user stereotypes from the data collected from anonymous interactions. These models contain data about user stereotypes rather than personal data, thereby ensuring privacy protection. In addition, the use of stereotypes allows the personalization to focus on the users’ abilities (which may be common across disabilities) more than on users’ restrictions. In any case, designers and developers must be aware of the national and international regulations for privacy protection in order to produce safe personalized interaction systems, especially for the most vulnerable users.
7 Conclusions User interface personalization for people with disabilities can demonstrate an important capacity to support digital accessibility and hence to drive social integration and participation. Nevertheless, most efforts in this field seem to be focused on research
projects rather than on commercial products. Interface personalization is currently a well-developed technology that can be applied with sufficiently high levels of security to different areas. For example, it can be applied to eGovernment services in order to ensure equal opportunities for all users. Therefore, academia should increase its efforts to produce practical personalization methods that can be adopted by practitioners. On the other hand, industry and the public administrations should study the possibility of adopting interface personalization in order to guarantee the accessibility of their applications and services. Ontologies are currently the favored technology for supporting user models, mainly because, in addition to their flexibility and expandability, they permit the use of reasoning methods. Ontologies containing models of users with disabilities have proliferated in a number of different research actions. Although research requires diversity and experimentation, there is a consensus concerning the need for user model sharing and re-use. Advances in this line require the definition and adoption of standards at the lexical level and the development of tools for semantic translation. Nevertheless, the most important barrier to achieving progress in this line is the problem of ensuring user control over their own data when these are shared. User data collection is vital to populate user models. Nowadays enormous quantities of data can be mined and processed in order to extract information valid for the user models. Web mining is particularly promising in this sense. There are some types of data which are available but not collected in the logs of most web server applications. Data mining experts must agree with web server administrators to log specific data that are useful for characterizing the interaction. Data about the users collected by user modeling systems, required for most personalization methods, may have an impact on user privacy. This may be a barrier to accepting personalized interaction systems in the minds of possible users. Therefore, in order to enhance their accessibility for users, practical personalized systems must guarantee their commitment to user privacy, at least in the following directions: information collection must be limited to data strictly necessary for the application, and user data must be safely kept, anonymized when possible, and removed when they are no longer useful. In addition, users must have the possibility to access their data and modify or remove them. They must also be warned about the use of data gathering methods when they are using a personalized system. Finally, users must have the possibility of stopping the personalization module without losing the right to use the service.
References
[1] Abascal, J., Arbelaitz, O., Arrue, M., Lojo, A., Muguerza, J., Pérez, J. E., Perona, I., and Valencia, X. 2013. Enhancing Web Accessibility through User Modelling and Adaption Techniques. In Assistive Technology: From Research to Practice: AAATE 2013, vol. 33, pp. 427–432.
[2] ACCESSIBLE Ontology V. 5.0. 2011. http://www.accessible-eu.org/index.php/ontology.html. Accessed October 1, 2018.
[3] AEGIS Ontology. 2012. http://www.aegis-project.eu/index.php?option=com_content&view=article&id=107&Itemid=65. Accessed October 1, 2018.
[4] Aïmeur, E., and Tremblay A. 2018. Me, Myself and I are Looking for a Balance between Personalization and Privacy. In UMAP'18, Adjunct Publication of the 26th Conference, pp. 115–119. ACM.
[5] Aizpurua, A., Cearreta, I., Gamecho, B., Miñón, R., Garay-Vitoria, N., Gardeazabal, L., and Abascal, J. 2013. Extending In-Home User and Context Models to Provide Ubiquitous Adaptive Support Outside the Home. In Martín et al. (eds.), User Modeling and Adaptation for Daily Routines: Providing Assistance to People with Special Needs, pp. 25–59. Springer-Verlag, London.
[6] Almanji, A., Davies, T. C., and Stott, N. S. 2014. Using cursor measures to investigate the effects of impairment severity on cursor control for youths with cerebral palsy. International Journal of Human-Computer Studies 72(3): 349–357.
[7] Arbelaitz, O., Gurrutxaga, I., Lojo, A., Muguerza, J., Pérez, J. M., and Perona, I. 2013. Web usage and content mining to extract knowledge for modelling the users of the Bidasoa Turismo website and to adapt it. Expert Syst. Appl. 40(18): 7478–7491.
[8] Arbelaitz, O., Lojo, A., Muguerza, J., and Perona, I. 2016. Web mining for navigation problem detection and diagnosis in Discapnet: a website aimed at disabled people. JASIST 67(8): 1916–1927.
[9] Artemiadis, P. K., and Kyriakopoulos, K. J. 2010. EMG-Based Control of a Robot Arm Using Low-Dimensional Embeddings. IEEE Transactions on Robotica 26(2): 393–398.
[10] Asakawa, C., and Takagi, H. 2000. Annotation-based transcoding for nonvisual web access. In Procs. of the fourth Int. ACM Conf. on Assistive technologies, pp. 172–179.
[11] Asakawa, C., and Takagi, H. 2008. Transcoding. In Harper S., Yesilada Y. (eds.) Web Accessibility, pp. 231–260. Springer-Verlag, London.
[12] Atterer, R., Wnuk, M., and Schmidt, A. 2006. Knowing the user's every move: user activity tracking for website usability evaluation and implicit interaction. In Procs. of the 15th Int. Conf. on World Wide Web, pp. 203–212.
[13] Ayachi, R., Boukhris, I., Mellouli, S., Amor, N. B., and Elouedi, Z. 2016. Proactive and reactive e-government services recommendation. Universal Access in the Information Society 15(4): 681–697.
[14] Bigham, J. P., Cavender, A. C., Brudvik, J. T., Wobbrock, J. O., and Ladner, R. E. 2007. WebinSitu: a comparative analysis of blind and sighted browsing behavior. In Procs. of the 9th Int. ACM SIGACCESS Conf. on Computers and Accessibility, pp. 51–58.
[15] Bigham, J. P., Kaminsky, R. S., Ladner, R. E., Danielsson, O. M., and Hempton, G. L. 2006. WebInSight: making web images accessible. In Procs. of the 8th Int. ACM SIGACCESS Conf. on Computers and accessibility, pp. 181–188.
[16] Biswas, P., Robinson, P., and Langdon, P. 2012. Designing Inclusive Interfaces Through User Modeling and Simulation. Int. Jour. of Human-Computer Interaction 28(1): 1–33.
[17] Brusilovsky, P., Kobsa, A., and Nejdl, W. 2007. The Adaptive Web: Methods and Strategies of Web Personalization. Springer-Verlag, Berlin, Heidelberg.
[18] Bunt, A., Carenini, G., and Conati, C. 2007. Adaptive content presentation for the web. In The adaptive web, pp. 409–432. Springer, Berlin, Heidelberg.
[19] Card, S. K., English, W. K., and Burr, B. J. 1978. Evaluation of mouse, rate-controlled isometric joystick, step keys, and text keys for text selection on a CRT. Ergonomics 21(8): 601–613.
[20] Carmagnola F. 2009. Handling Semantic Heterogeneity in Interoperable Distributed User Models. In Kuflik, T., Berkovsky, S., Carmagnola, F., Heckmann, D., and Krüger, A. (eds.)
Advances in Ubiquitous User Modelling, pp. 20–36. LNC 5830. Springer, Berlin, Heidelberg. [21] Carmagnola, F., Cena, F., and Gena, C. 2011. User model interoperability: a survey. User Modeling and User-Adapted Interaction 21(3): 285–331. [22] Chin, D. N. 1993. Acquiring user models. Artificial Intelligence Review 7: 185–197. [23] Chittaro, L., Ranon, R., De Marco, L., and Senerchia, A. 2009. User Modeling of Disabled Persons for Generating Instructions to Medical First Responders. In Houben GJ., McCalla G., Pianesi F., Zancanaro M. (eds.) User Modeling, Adaptation, and Personalization. UMAP 2009. LNCS 5535. Springer, Berlin. [24] Claypool, M., Le, P., Wased, M., and Brown, D. 2001. Implicit interest indicators. In Procs. of the 6th Int. Conf. on Intelligent user interfaces, pp. 33–40. [25] Cook, A. M., and Polgar, J. M. 2014. Assistive Technologies: Principles and Practice. Elsevier Health Sciences. [26] Cunningham, P., Cord, M., and Delany, S. 2008. Supervised learning, in Machine Learning Techniques for Multimedia, pp. 21–49. Springer, Berlin, Heidelberg. [27] Esteves, J., and Joseph, R. C. 2008. A comprehensive framework for the assessment of eGovernment projects. Government information quarterly 25: 118–132. [28] Etgen, M., and Cantor, J. 1999. What does getting WET (web event-logging tool) mean for web usability. In Procs. of Fifth Human Factors and the Web Conf. [29] Fink, J., Kobsa, A., and Schreck, J. 2005. Personalized Hypermedia Information Provision through Adaptive and Adaptable System Features: User Modeling, Privacy and Security Issues. In Mullery, A., Besson, M., Campolargo, M., Gobbi, R., and Reed, R. (eds.) Intelligence in Services and Networks: Technology for Cooperative Competition. LNCS 1238, pp. 459–467. Springer, Berlin. [30] Frank, E., Hall, M. A., and Witten, I. H. 2016. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”. Morgan Kaufmann. [31] Gajos, K. Z., Reinecke, K., and Herrmann, C. 2012. Accurate measurements of pointing performance from in situ observations. In Procs. of the SIGCHI Conf. on Human Factors in Computing Systems, pp. 3157–3166. [32] Gajos, K. Z., Weld, D. S., and Wobbrock, J. O. 2010. Automatically generating personalized user interfaces with Supple. Artificial Intelligence. 174(12): 910–950. [33] Gamecho, B., Miñón, R., Aizpurua, A., Cearreta, I., Arrue, M., Garay-Vitoria, N., and Abascal, J. 2015. Automatic generation of tailored accessible user interfaces for ubiquitous services. IEEE Trans. on Human-Machine Systems 45(5): 612–623. [34] Gonzalez, R., Gasco, J., and Llopis, J. 2007. e-Government success: some principles from a Spanish case study. Industrial Management & Data Systems 107: 845–861. [35] Goose, S., Wynblatt, M., and Mollenhauer, H. 1998. 1-800-hypertext: browsing hypertext with a telephone. In Procs. of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space—structure in hypermedia systems: links, objects, time and space—structure in hypermedia systems, pp. 287–288. ACM. [36] Goy, A., Ardissono, L., and Petrone, G. 2007. Personalization in E-Commerce Applications. In The Adaptive Web, vol. 4321, pp. 485–520. Springer, Berlin, Heidelberg. [37] Griffin, D., Trevorrow, P., and Halpin, E. F. 2007. Developments in e-government: a critical analysis, p 13. Ios Press. [38] Gump, A., Legare, M., and Hunt, D. L. 2002. Application of Fitts’ law to individuals with cerebral palsy. Perceptual and motor skills 94(3): 883–895. [39] Hurst, A., Hudson, S. 
E., Mankoff, J., and Trewin, S. 2008. Automatically detecting pointing performance. In Procs. of the 13th international conference on Intelligent user interfaces (IUI ’08), pp. 11–19. ACM, New York, NY, USA. [40] Hwang, F., Keates, S., Langdon, P., and Clarkson, J. 2004. Mouse movements of
motion-impaired users: a submovement analysis. In ACM SIGACCESS Accessibility and Computing, 77–78, pp. 102–109.
[41] Iba, S., Paredis, C. J. J., and Khosla, P. K. 2003. Intention aware interactive multi-modal robot programming. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), 3, pp. 3479–3484.
[42] Inamura, T., Inabe, M., and Inoue, H. 2002. User adaptation of human-robot interaction model based on Bayesian network and introspection of interaction experience. In Procs. Int. Conf. on Intelligent Robots and Systems. IEEE.
[43] Jain, A. K., and Dubes, R. C. 1988. Algorithms for Clustering Data. Prentice-Hall, USA.
[44] John, E. S., Rigo, S. J., and Barbosa, J. 2016. Assistive Robotics: Adaptive Multimodal Interaction Improving People with Communication Disorders. International Federation of Automatic Control–Papers 49(30): 175–180.
[45] Kadouche, R., Abdulrazak, B., Giroux, S., and Mokhtari, M. 2009. Disability centered approach in smart space management. Int J Smart Home 3(2): 13–26.
[46] Kaklanis, N., Biswas, P., Mohamad, Y., Gonzalez, M. F., Peissner, M., Langdon, P., Tzovaras, D., and Jung, C. 2016. Towards standardisation of user models for simulation and adaptation purposes. In Universal Access in the Information Society, pp. 21–48. Springer-Verlag, Berlin.
[47] Keates, S., Hwang, F., Langdon, P., Clarkson, P. J., and Robinson, P. 2002. Cursor measures for motion-impaired computer users. In Procs. of the fifth Int. ACM Conf. on Assistive technologies, pp. 135–142.
[48] Knutov, E., De Bra, P., and Pechenizkiy, M. 2009. AH 12 years later: a comprehensive survey of adaptive hypermedia methods and techniques. New Review of Hypermedia and Multimedia 15(1): 5–38.
[49] Kobsa, A. 1993. User Modeling: Recent Work, Prospects and Hazards. In Schneider-Hufsmidt M., Küme T., Malinowski U. (eds.) Adaptive User Interfaces: Principles and Practise. Elsevier Science, North Holland, Amsterdam.
[50] Kobsa, A. 2001. Generic user modeling systems. User modeling and user-adapted interaction 11(1–2): 49–63.
[51] Kobsa, A. 2007. Privacy-Enhanced Web Personalization. In The Adaptive Web. Springer-Verlag, pp. 628–670.
[52] Korn, P., Bekiaris, E., Gemou, M. 2009. Towards Open Access Accessibility Everywhere: The ÆGIS Concept. In Stephanidis C. (ed.) Universal Access in Human-Computer Interaction. Addressing Diversity. LNCS 5614, pp. 535–543. Springer, Berlin.
[53] Koutkias, V., Kaklanis, N., Votis, K., Tzovaras, D., and Maglaveras, N. 2016. An integrated semantic framework supporting universal accessibility to ICT. Universal Access in the Information Society 15(1): 49–62.
[54] Krishnaraju, V., Mathew, S. K., and Sugumaran, V. 2016. Web personalization for user acceptance of technology: An empirical investigation of E-government services. Information Systems Frontiers 18(3): 579–595.
[55] Layne, K., Lee, J. 2001. Developing fully functional E-government: A four stage model. Government information quarterly 18: 122–136.
[56] Liu, B. 2006. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications). Springer-Verlag. New York, Inc., Secaucus, NJ, USA.
[57] MacKenzie, I. S., Kauppinen, T., and Silfverberg, M. 2001. Accuracy measures for evaluating computer pointing devices. In Procs. of the SIGCHI Conf. on Human factors in computing systems, pp. 9–16.
[58] Mason, M., and Lopes, M. C. 2011. Robot self-initiative and personalization by learning through repeated interactions. In Procs. of the 6th int. Conf. on Human-robot interaction, pp. 433–440.
[59] Mirri, S., Salomoni, P., Prandi, C., and Muratori, L. A. 2012.
GAPforAPE: an augmented browsing
system to improve Web 2.0 accessibility. New Review of Hypermedia and Multimedia 18(3): 205–229.
[60] Mohamad, Y., Kouroupetroglou, C., eds. 2014. Research Report on User Modeling for Accessibility. W3C WAI Research and Development Working Group (RDWG) Notes (available at: http://www.w3.org/WAI/RD/2013/user-modeling/note/ED-UM4A).
[61] Norman, D. A., Draper, S. W. 1986. User Centered System Design; New Perspectives on Human-Computer Interaction. L. Erlbaum Associates, Hillsdale, US.
[62] Pabarskaite, Z., and Raudys, A. 2007. A process of knowledge discovery from web log data: Systematization and critical review. Journal of Intelligent Information Systems 28(1): 79–104.
[63] Paganelli, L., and Paternò, F. 2002. Intelligent analysis of user interactions with web applications. In Procs. of the 7th Int. Conf. on Intelligent user interfaces, pp. 111–118.
[64] Peissner, M., and Edlin-White, R. 2013. User Control in Adaptive User Interfaces for Accessibility. In Kotzé et al. (eds.), Procs. INTERACT 2013, Part I. LNCS 8117, pp. 623–640.
[65] Perona, I., Year, A., Arbelaitz, O., Muguerza, J., Ragkousis, N., Arrue, M., Pérez, J. E., and Valencia, X. 2016. Automatic device detection in web interaction. In Procs. of the XVII Conf. Asociación Española para la Inteligencia Artificial (CAEPIA) – (TAMIDA), pp. 825–834.
[66] Perzanowski, D., Schultz, A. C., Adams, W., Marsh, E., and Bugajska, M. 2001. Building a multimodal human-robot interface. IEEE Intelligent Systems 16(1): 16–21.
[67] Rapp, A., Cena, F., Mattutino, C., Calafiore, A., Schifanella, C., Grassi, E., and Boella, G. 2018. Holistic User Models for Cognitive Disabilities: Personalized Tools for Supporting People with Autism in the City. In UMAP'18 Adjunct: 26th Conf. on User Modeling, Adaptation and Personalization, pp. 109–113. ACM, NY.
[68] Richards, J. T., and Hanson, V. L. 2004. Web accessibility: a broader view. In Procs. of the 13th Int. Conf. on World Wide Web, pp. 72–79.
[69] Santana, V. F., and Calani, M. C. 2015. WELFIT: A remote evaluation tool for identifying Web usage patterns through client-side logging. International Journal of Human-Computer Studies 76: 40–49.
[70] Scholtz, J., Laskowski, S., and Downey, L. 1998. Developing usability tools and techniques for designing and testing web sites. In Procs. of the HFWeb, 98, pp. 1–10.
[71] Smits-Engelsman, B. C. M., Rameckers, E. A. A., and Duysens, J. 2007. Children with congenital spastic hemiplegia obey Fitts' Law in a visually guided tapping task. Experimental Brain Research 177(4): 431–439.
[72] Sosnovsky, S., and Dicheva, D. 2010. Ontological technologies for user modelling. International Journal of Metadata, Semantics and Ontologies 5(1): 32–71.
[73] Stephanidis, C. 2001. Adaptive Techniques for Universal Access. User Modeling and User-Adapted Interaction 11: 159–179.
[74] Stephanidis, C., Paramythis, A., Sfyrakis, M., Stergiou, A., Maou, N., Leventis, A., Paparoulis, G., and Karagiannidis, C. 1998. Adaptable and adaptive user interfaces for disabled users in the AVANTI project. In Trigila S., Mullery A., Campolargo M., Vanderstraeten H., Mampaey M. (eds.) Intelligence in Services and Networks: Technology for Ubiquitous Telecom Services. LNCS 1430. Springer, Berlin.
[75] Takagi, H., Asakawa, C., Fukuda, K., and Maeda, J. 2002. Site-wide annotation: reconstructing existing pages to be accessible. In Procs. of the fifth Int. ACM Conf. on Assistive technologies, pp. 81–88.
[76] Valencia, X., Pérez, J. E., Arrue, M., Abascal, J., Duarte, C., and Moreno, L. 2017. Adapting the Web for People With Upper Body Motor Impairments Using Touch Screen Tablets. Interacting with Computers 29(6): 794–812.
[77] Valencia, X., Pérez, J. E., Muñoz, U., Arrue, M., and Abascal, J. 2015. Assisted interaction data analysis of web-based user studies. In Human-Computer Interaction, pp. 1–19. Springer.
[78] van Velsen, L., van der Geest, T., van de Wijngaert, L., van den Berg, S., and Steehouder, M. 2015. Personalization has a Price, Controllability is the Currency: Predictors for the Intention to use Personalized eGovernment Websites, Jour. of organizational computing and electronic commerce 25(1): 76–97. [79] Vigo, M., and Harper, S. 2013. Coping tactics employed by visually disabled users on the web. Int. Journal of Human-Computer Studies 71(11): 1013–1025. [80] Waldherr, S., Romero, R., and Thrun, S. 2000. A gesture based interface for human-robot interaction. Autonomous Robots 9: 151–173. [81] Webb, G. I., Pazzani, M. J., and Billsus, D. 2001. Machine Learning for User Modeling. User Modeling and User-Adapted Interaction 11: 19–29. [82] WHO 1980. Int. classification of impairments, disabilities, and handicaps: a manual of classification relating to the consequences of disease, World Health Organization published in accordance with resolution WHA29. 35 of the 29th World Health Assembly. May 1976. [83] WHO 2001. World Health Organization. International Classification of Functioning, Disability and Health (ICF) http://www.who.int/classifications/icd/en/index.html. Accessed October 1, 2018. [84] Wobbrock, J. O., and Gajos, K. Z. 2008. Goal crossing with mice and trackballs for people with motor impairments: Performance, submovements, and design directions. ACM Transactions on Accessible Computing (TACCESS) 1(1): 1–37. [85] Yera, A., Perona, I., Arbelaitz, O., Muguerza, J. 2018. Modelling the enrolment eService of a university using machine learning techniques. In Procs. of 16th International Conference e-Society 2018, Lisbon, Portugal, pp. 83–91. [86] Yu, H., Spenko, M., and Dubowsky, S. 2008. An adaptive shared control system for an intelligent mobility aid for the elderly. Autonomous Robots 15(1): 53–66.
Miloš Kravčík
11 Adaptive workplace learning assistance Abstract: Workplace learning has been part of our everyday reality for a long time, but at present it has become more important than ever before. New technological opportunities can radically change not only formal, but also informal (unintentional) learning, which is typical for the workplace. Nowadays companies face a new challenge: the transition towards Industry 4.0. It is a complex process that concerns both executives and employees, so it is important to find solutions that make it easier for both sides. This change is accompanied by numerous re-qualification requirements, which demand a radical improvement of workplace learning and on-the-job training. Recent developments enable a more precise understanding of users' needs, which can lead to better personalization of learning experiences. The effectiveness and efficiency of training and work processes can be improved through wearable technologies and augmented reality. Information technology should support the whole spectrum of educational methodologies, including personalized guidance, collaborative learning, and training of practical skills, as well as meta-cognitive scaffolding. Here we provide a reflective view on the progress made so far in adaptive workplace learning assistance (especially in the European context) and then point out several prospective approaches that aim to address the current issues. These should lead to innovative context-sensitive and intelligent adaptive assistance systems that support learning and training at the workplace. Keywords: professional learning, adaptive workplace assistance, personalized training, Industry 4.0, Internet of Things
1 Introduction Workplace learning was defined [47] as “the integrated use of learning and other interventions for the purpose of improving human performance, and addressing individual and organizational needs. It uses a systematic process of analysing and responding to individual, group, and organizational performance issues. It creates positive, progressive change within organizations by balancing humanistic and ethical considerations.” In this chapter we use also the term professional learning in this sense. An extensive survey of requirements for professional learning [10] showed that “learning needs to be available in a suitable form everywhere, and at the workplace it should be seamlessly integrated into the work processes. Learning objectives should involve the whole spectrum from high-level competencies and skills to concrete pieces of knowledge. E-learning and blended learning are highly demanded by users, especially if taking into account various pedagogical strategies according to the particular https://doi.org/10.1515/9783110552485-011
objectives and context. Finding a suitable business model for professional learning is a crucial issue, which impacts on the availability of learning resources, as well as the quality, accessibility, flexibility, re-usability, and interoperability of learning solutions. Personalisation and adaptation of learning is generally considered as highly important, because learning has to be individualised to become more effective and efficient. This is particularly true for (. . . ) workplace learning." In the past, a roadmap survey [4] on Technology-Enhanced Professional Learning (TEPL) in 2015 indicated that TEPL should support knowledge workers, promoting motivation, performance, collaboration, innovation, and commitment to lifelong learning. According to this vision TEPL would become an effective tool to enhance work performance and promote innovation, creativity, and entrepreneurship among employees. Learning would become a catalyst in increasing employability. The use of knowledge would be democratized to provide equal opportunities for high-quality learning for all. Everyone would be empowered to learn anything at any time and at any place, and the TEPL market would be commoditized to achieve transparency. Moreover, the survey predicted that TEPL would be highly impacted by seamless learning and working environments, two-way interactive collaboration based on ubiquitous internet with high bandwidth, meta-data facilitating management of content objects, and online communities. Among the unpredictable factors were the development of standards and whether the social climate would be driven by trust or suspicion. In the meantime, many of these predictions came true and TEPL became more common. Current intelligent tools can process Big Data and transform work processes, but the related consequences are difficult to predict [70]. In order to be successful, business executives have to consider the complementarities of humans and computers. Upskilling of employees should focus on competences that cannot be replaced by machines. The competitive advantage in small and medium-sized enterprises (SMEs) depends on skilled labor and specialization. The trend of automation and data exchange in manufacturing technologies influences the organizational processes as well as the role of the employee. Companies need to fill their competence gaps efficiently. Employees may want to plan their lifelong professional development. And society should be interested in reducing the unemployment rate and letting people develop their talents. The existing centrally and hierarchically organized structures in production enterprises will be more and more decentralized. The Industry 4.0 paradigm shift [62] from resource-oriented planning to product-oriented planning is based on the networking of intelligent machines and products, called Cyber-Physical Production Systems (CPPSs). With changing customer demands, the product will be able to request the necessary production resources autonomously. From the change management perspective, it is crucial to obtain the support of employees. The organizational and technical changes imply regularly updated and dynamic competence profiles of employees, requiring re-qualification through new learning formats directly at the workplace. This development demands increased communication skills and an increased degree of self-organization, as well as new abstraction and problem-solving competences [56]. Co-
operation with robots will be another important skill in the near future. Consequently, this requires novel education paradigms as well as development of new learning settings and measures for this purpose. There is a strong requirement for re-training and upskilling of the workforce, but the employees should be motivated for this change, possess necessary meta-cognitive competences for professional development, and understand the decisions provided by machines. This is closely related to the control of privacy by each individual. In the following we first outline the history of professional and workplace learning in the last decades, recalling a selection of relevant European projects. Then we recall the important learning theories and related models in this field. These are supported by different types of learning technology, introduced afterwards. We conclude with a vision of the technological support in the Industry 4.0 era and some future prospects.
2 Historical overview Although the development of Technology-Enhanced Learning progressed intensively already since the middle of the 1980s with the dawn of the personal computer and accelerated dramatically in the 1990s with the spread of the Internet and the web, the particular focus on workplace learning came a bit later. In the European context halfway through the 2000s, researchers and developers started to investigate more intensively how to integrate information technology into corporate and industrial settings, in order to support professional development and qualification. Perhaps it is worth recalling briefly this history from the perspective of selected projects in the research and development programs funded by the European Commission. As this progress was to a large extent driven by the available technology, in the global context the situation did not look very differently. PROLEARN (2004–2007) was the Network of Excellence in Professional Learning that gave a strong impulse to the new wave. Among other achievements, it advanced personalized adaptive learning, investigating interoperability of systems and re-usability of learning resources, and later also social software and the Web 2.0 impact in this field. The researchers identified several issues, like missing harmonization of available learning standards and their restriction in the representation of adaptive methods, as well as uneasy authoring of adaptation strategies. An important challenge was represented by open corpus adaptive hypermedia systems. In its roadmap PROLEARN provided also the following vision: every knowledge worker should be able to learn anything anytime and at anyplace. The next phase (2005–2010) aimed at the development of suitable adaptive solutions integrated in normal workplace environments, considering target competences as meaningful educational objectives and learning processes as a crucial means for their achievement. TENCompetence investigated creation and exchange of knowledge
286 | M. Kravčík resources, learning activities, competence development programs, and network data for lifelong competence development in learning networks and communities of practice. Realizing that a method of professional development will only be efficient when it is as adaptive and personalized as possible [23], the consortium implemented an appropriate standards-based and open-source technical infrastructure, integrating and interconnecting the various levels of the conceptual model mentioned above. PROLIX developed an open service-oriented architecture for interlinking business process intelligence tools enabling competence management with flexible learning environments. APOSDLE further advanced process-oriented self-directed learning and supported informal learning activities via their seamless integration in the professional work. Their approach was impacted by the competence-based knowledge space theory and included adaptive systems with context-sensitive recommenders. The next projects (2008–1012) addressed not only interoperability at the level of learning outcomes and adaptation in traditional Learning Management Systems (LMSs), but also knowledge maturing through social learning. ICOPER created a reference model for competence-driven learning, considering the European Qualification Framework (EQF) for harmonization of various national qualifications systems [11]. The focus was on an output-centered approach, i. e., on knowledge, skills, and competences. GRAPPLE aimed to support lifelong adaptive learning, taking into account personal preferences, prior knowledge, skills, and competences, learning goals, and the current personal and social context. The project delivered a generic technical infrastructure, integrated with five different LMSs. MATURE investigated knowledge maturing in organizations, supporting social learning in knowledge networks and facilitating efficient competence development. The knowledge can emerge from informal representations towards more formalized semantic structures and processes. The success of community-driven approaches in the spirit of Web 2.0 showed that the intrinsic motivation of employees is crucial for their engagement in collaborative learning activities. Later on (2009–2014) also cultivation of meta-cognitive skills became an important objective, like self-regulated learning (SRL), in which reflection plays a crucial role. For this purpose, innovative personal learning environments (PLEs) as well as immersive simulated environments were developed. ROLE advanced psychopedagogical theories of adaptive education, especially SRL. It offered adaptivity and personalization not only in terms of content and navigation, but also of the entire learning environment and its functionalities. This novel concept of PLE was evaluated in various educational settings. In the business context it turned out that a pure PLE did not satisfy the requirements of personnel development and a hybrid solution was developed – Personal Learning Management System [64]. It aggregated selected learning resources and applications, facilitating the activities of workplace learners, like planning activities, searching for content and tools, training, and testing, as well as reflecting and evaluating the progress. ImREAL enhanced immersive simulated environments to align such learning experience with daily job practice. It showed
the usefulness of affective meta-cognitive scaffolding in the context of experiential training simulators, having a positive impact on motivation, learning experience, and self-regulation [65]. MIRROR elaborated on reflective learning to facilitate learningon-the-job and experience sharing. It showcased reflection as a means to empower employees and achieve organizational impact, leading to innovative business offerings [45]. The following endeavors (2012–2018) emphasized scalability and focused on particular target groups, taking into account also professional identities. Learning Layers supported workplace practices in SMEs, bridging the gap between scaling and adaptation to personal needs. Building on mobile, contextualized, and social learning achievements, the project developed a common light-weight, distributed infrastructure for fast and flexible deployment in highly distributed and dynamic settings. These technologies were applied in two sectors, i. e., (i) healthcare and (ii) building and construction. BOOST addressed the need to engage small and micro-enterprises in vocational training, helping them to identify their critical business needs and skill gaps and fulfill the critical demands. The project built on the results of ROLE, using its PLE technology to design a customized learning environment both for managers and employees. EmployID supported public employment services and their employees in adapting to the changes in their area by facilitating the development of professional identities. The developed solutions include tools for reflection, e-coaching, creativity, networked learning structuring, and measuring impact. In parallel (2014–2018), several German projects developed assistance and knowledge services for the workplace, preparing for Industry 4.0. APPSist implemented a new generation of such context-sensitive and intelligent services as well as the underlying architecture for settings with cyber-physical systems in the digitally networked factory of the future (“smart production”). DigiLernPro developed a software tool which used various digital media to semi-automatically generate learning scenarios, enabling new forms of learning for Industry 4.0 requirements. ADAPTION also addresses the challenges related to Industry 4.0, focusing on a holistic approach, taking into account also the impact on the organization and the employees, providing them with access to relevant knowledge related to required new skills. Several currently running projects (2015–2019) benefit from newly available technologies and data processing approaches. WEKIT enhances human abilities to acquire procedural knowledge by providing a smart system that directs attention to where it is most needed. New opportunities for skill training are enabled by wearable technologies (WTs) and augmented reality (AR). AFEL advanced informal and collective learning as it surfaces implicitly in online social environments. Relying on real data from a commercially available platform, the aim is to provide and validate the technological grounding and tools for exploiting learning analytics (LA) on such learning activities. MOVING enables users from all societal sectors to improve their information literacy by training how to use, choose, reflect, and evaluate data mining methods in
288 | M. Kravčík connection with their daily research tasks and to become data-savvy information professionals. This short review of the workplace learning history in the European context shows the shift of support from knowledge workers learning new knowledge to industrial employees training new skills, but also lifelong learners cultivating their meta-cognitive competences. The development was essentially driven by the progress in information technology – from static LMS to adaptive systems and flexible PLE, further to social software and Web 2.0, including Recommender Systems (RSs) and LA, later on benefiting from mobile devices and smart phones, and more recently also WT and AR, as well as data science and Artificial Intelligence (AI).
3 Learning theories and models The aim of professional and workplace learning is to acquire new knowledge and skills, as well as the ability to apply these in real settings. The EQF was proposed to make qualifications more readable and understandable across different countries and systems, facilitating lifelong learning [11]. Its core deals with different reference levels describing what a learner knows, understands, and is able to do – learning outcomes. In EQF, the term competence means the proved ability to use knowledge and skills, as well as personal, social, and methodological abilities, in work or study situations during professional and personal development. Three basic theories can be applied in learning, depending on the domain, objective, and target group. Behaviorism aims at a change in external behavior of learners achieved through reinforcement and repetition (e. g., language learning). Cognitivism seeks to explain the process of knowledge acquisition and the subsequent effects on the mental structures within the mind (e. g., concept mapping). Constructivism focuses on how humans make meaning in relation to the interaction between their experiences and their ideas (e. g., problem-based learning). Workplace learning should be naturally integrated in the work process and is usually problem-based and often informal, i. e., without set learning objectives and learner’s intention. The SECI theory of organizational knowledge creation [43] distinguishes two basic types of knowledge. Explicit knowledge can be formally and systematically transmitted across individuals. Tacit knowledge is not easily expressible, but rooted in an individual’s actions. Knowledge is created when tacit and explicit knowledge cyclically interact with each other: 1. socialization: creating new tacit knowledge through shared experiences; 2. externalization: tacit knowledge is made explicit; 3. combination: restructuring explicit knowledge; 4. internalization: reflection and conversion of explicit knowledge into tacit knowledge.
Novel models of informal learning should facilitate the creation of knowledge during the learning process. Therefore, the SECI theory has been elaborated into a framework modeling the knowledge creating learning process as divided in four sub-processes [40]: 1. learning needs analysis: describing the knowledge gap; 2. learning preparation and content development: preparing learning offerings; 3. learning process execution: creating understanding; 4. learning assessment and certification: producing quality certificates. This framework distinguishes between two types of learning processes. Knowledge transmission occurs when the knowledge exists prior to the execution of the learning process; this is typical for formal learning. But in informal learning the knowledge is often created during the execution of the learning process. While in the first case the learners are trying to figure out the right answers, in the second they are trying to figure out the right questions. The SECI process framework can apply the knowledge creation process, connecting it to psychological and social motivators for learning. The traditional methods originating in knowledge transmission focused on guidance and adaptation. Competence-based Knowledge Space Theory (CBKST) is a theoretical framework mainly used for personalizing learning to individual learners’ domain-specific competences [55]. The psychological research on CBKST was extended for adaptive informal technology-enhanced workplace learning [33]. The aim was to support work-integrated learning, closely connected with task performance. Even nowadays, theories of knowledge acquisition (transmission) are still dominant in workplace learning technology research [48]. Nevertheless, theories of participation and knowledge creation deserve more interest, especially social constructivist learning theory mediated by means of artifacts created by the community [35]. Another important aspect of lifelong learning in general are meta-cognitive skills, which belong to the key competences of a successful learner [15]. One of them is selfregulation, which includes monitoring and managing one’s cognitive processes as well as the awareness of and control over one’s emotions, motivations, behavior, and environment as related to learning [41]. Research has shown that the application of SRL increases the effectiveness of education, enhancing learning performance as well as the development of reflective and responsible professionalism. SRL was considered as a cyclic process of meta-cognitive activities consisting of the forethought phase (e. g., goal setting, planning), the performance phase (e. g., self-observation processes), and the self-reflection phase [69]. Reflection attracted special attention in the informal workplace learning context [45]. It means re-examining and re-assessing past experiences and drawing conclusions for further behavior. In this sense reflection was investigated as a means to empower employees and impact organizational success. Each of these approaches has a different meaning and should be applied accordingly. Guidance can be very helpful in knowledge transmission for novice learners,
when the knowledge exists in advance. Collaboration can help to create new knowledge, which is crucial, especially for experts. Self-regulation operates on a meta-level, impacting the effectiveness and efficiency of learning processes. From this perspective we can consider these approaches as complementary. To scale informal learning in complex and dynamic domains, a model was created [34] that provides an integrative view on three informal learning processes at work, i. e., (i) task performance, reflection, and sense making, (ii) help seeking, guidance, and support, and (iii) emergence and maturing of collective knowledge. So we can distinguish three important areas of lifelong and workplace learning, which directly correspond with the abovementioned basic theories of learning. Personalized adaptive learning facilitates acquisition of well-structured knowledge by means of guidance, which relates to cognitivism. Collaborative learning supports cooperative construction of knowledge from one's own experiences, thus practicing constructivism. Self-regulated learning focuses on behavior changes at a meta-cognitive level, which is the aim of behaviorism. In reality, a big challenge for learning designers is to take into account the particular objective and context, in order to find a suitable arrangement between guidance (adaptation of given knowledge structures) and emergence (collaborative creation of emerging knowledge structures), between the freedom of the learner (stimulating motivation) and system control (supporting efficiency of learning), as well as between direct adaptation of learning environments (which may cause confusion in some cases) and their responsiveness (e. g., by means of nudges and alerts, leaving the decision control to the learner).
3.1 Personalized adaptive learning
Personalized adaptive learning is usually a preferred choice for knowledge transmission in well-structured domains, when help seeking and guidance support are needed. The key assumption is that such knowledge can be formalized properly, in order to be suitably presented and acquired by the learner. An Intelligent Tutoring System (ITS) typically consists of four basic components [42], i. e., (i) domain model, (ii) student model, (iii) tutoring model, and (iv) user interface model. They separately represent the knowledge about the subject domain, the learner, the pedagogical instructions, and the presentation opportunities. Similarly, the Adaptive Hypermedia Application Model (AHAM) distinguishes between the (i) domain model, (ii) user model, and (iii) teaching model [6]. Mobile technologies led to the additional recognition of the context model and consideration of alternative adaptation strategies to the adaptation model. The knowledge driving the adaptation process can be represented as five complementary models [1] – the domain model specifies what is to be adapted, the user and context models tell according to what parameters it can be adapted, and the instruction (pedagogical) and adaptation models express how the adaptation should be
performed, distinguishing selection of pedagogical methodologies for the current purpose (learning objective) and adaptation to the current context. The related development and authoring processes can be simplified if interoperability of system modules and re-usability of learning resources is achieved. The technological and conceptual differences between heterogeneous resources and services can be bridged either by means of standards or via approaches based on the Semantic Web (with data processable by machines, e. g., ontologies). As the existing standards cannot realize general interoperability in this area, the Semantic Web can be used as a mediator. Ontologies can help us to achieve a certain kind of consensus and to contribute to the harmonization of the existing standards [24]. Two basic types of knowledge can be distinguished. Declarative knowledge is typical for the description of the subject domain (including learning materials – IMS Content Packaging, IMS Question and Test Interoperability; meta-data – IEEE Learning Object Meta-data; and domain ontologies), the user (IEEE Public and Private Information, IMS Learner Information Package), and the context knowledge. Procedural knowledge is important for designing learning activities from the pedagogical viewpoint (IMS Learning Design) as well as for defining adaptation strategies. There are various approaches to address these issues at different levels of formalization, from freely specified informal scripts, through procedural knowledge encoded directly in a software system, to re-usable elicited procedural knowledge, which ideally follows official standards or formalized re-usable ontologies. So, specification of learning activities and adaptation strategies by separating the content, declarative and procedural knowledge in adaptive courses seems to be quite natural. A suitable solution for the re-usability and adaptivity issues would be the representation of various types of knowledge driving the process of personalized adaptive learning, and their interaction when generating the concrete instances of adaptive learning design dynamically. The CBKST iterative methodology [33] enables modeling and validating the domain model and the user model of an adaptive learning system that provides appropriate learning opportunities. This modeling methodology is based on a formal model of the relationships between tasks and needed competences (task–competence matrix). Creating the two models by starting from the tasks seems to be well suited for a work-integrated approach.
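To make the task–competence relationship more concrete, the following minimal sketch (in Python) shows one possible set-based representation of a task–competence matrix and two typical queries: which competences a learner still lacks for a task, and which tasks a learner is already prepared for. The tasks, competences, and function names are invented for illustration and are not taken from the cited CBKST tooling.

```python
# Illustrative sketch of a task-competence matrix in the spirit of CBKST:
# tasks are mapped to the sets of competences they require, and a learner's
# competence state is a set of already mastered competences. All names and
# data are hypothetical examples, not taken from the cited systems.

TASK_COMPETENCES = {
    "calibrate_sensor":   {"read_datasheet", "use_multimeter"},
    "replace_controller": {"read_datasheet", "esd_handling", "plc_basics"},
    "tune_plc_program":   {"plc_basics", "ladder_logic"},
}

def required_competences(task):
    """Competences needed to perform a given task."""
    return TASK_COMPETENCES[task]

def competence_gap(task, learner_state):
    """Competences the learner still lacks for the task."""
    return TASK_COMPETENCES[task] - learner_state

def ready_tasks(learner_state):
    """Tasks whose required competences are fully covered by the learner."""
    return [t for t, req in TASK_COMPETENCES.items() if req <= learner_state]

if __name__ == "__main__":
    learner = {"read_datasheet", "use_multimeter", "plc_basics"}
    print(ready_tasks(learner))                           # ['calibrate_sensor']
    print(competence_gap("replace_controller", learner))  # {'esd_handling'}
```

Starting from the tasks, as suggested above, such a matrix can then be validated against observed task performance and used to offer learning opportunities for the missing competences.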
3.2 Collaborative learning
Informal learning, especially in ill-structured domains, requires knowledge creation support, facilitating continuous collaborative development and gradual formalization of new knowledge by a community of participants. This process includes sharing of knowledge and experience, exchange of opinions in discussions, as well as creation of new artifacts and their annotations. A critical point in developing an active learning community is the engagement of its members demonstrated by their participation
and contributions. According to social exchange theory, individuals contribute more when there is an intrinsic or extrinsic motivation involved, such as anticipated reciprocity, personal reputation, social altruism, or tangible rewards [20]. Suitable incentive mechanisms can significantly increase both active and passive participation [16]. The issue of trust is related to the applied privacy and security policies. People with a common interest can establish a community of practice (CoP), in order to develop personally and professionally through the process of sharing information and experiences with the group [63]. The structure of a CoP consists of norms and collaborative relationships, shared understanding, and communal resources. Learning is here considered as social participation. Similarly, a learning network is a self-organized community stimulating professional and career development through a better understanding of concepts and events [23]. A participant can specify personal learning goals in the context of competence profiles. After a (self-)assessment, a gap analysis is performed, leading to a personal development plan consisting of learning activities. Also here, sharing, communication, and collaboration are crucial.
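The gap analysis step mentioned above can be illustrated with a small, hypothetical sketch: a target competence profile is compared with a self-assessment, and the resulting gaps are ordered into a simple personal development plan. The competences, levels, and ordering heuristic are assumptions made for this example, not details of the cited learning network services.

```python
# Hypothetical sketch of a competence gap analysis: a target competence
# profile and a self-assessment are compared, and the gaps are ordered into
# a simple personal development plan. Levels and competences are invented.

def competence_gaps(target_profile, self_assessment):
    """Return {competence: missing_levels} for every under-developed competence."""
    gaps = {}
    for competence, required_level in target_profile.items():
        current = self_assessment.get(competence, 0)
        if current < required_level:
            gaps[competence] = required_level - current
    return gaps

def development_plan(gaps):
    """Order the gaps by size so the largest deficits are addressed first."""
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    target = {"welding": 3, "quality_control": 2, "machine_setup": 4}
    assessed = {"welding": 3, "quality_control": 1, "machine_setup": 1}
    for competence, deficit in development_plan(competence_gaps(target, assessed)):
        print(f"improve {competence} by {deficit} level(s)")
    # improve machine_setup by 3 level(s)
    # improve quality_control by 1 level(s)
```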
3.3 Self-regulated learning
Studies have shown that the application of SRL increases the effectiveness of education. Self-regulation is crucial for the development of lifelong learning skills. According to educational psychologists, SRL is guided by meta-cognition, strategic action, and motivation to learn [67]. In this context students are proactive with respect to their learning [69]. Research shows that self-regulatory skills can be trained and can increase students' motivation and achievement [53]. Regarding learning performance, there is evidence that students with intrinsic motivation, initiative, and personal responsibility achieve more academic success [68]. Studies also indicate that in order to improve academic achievements, all three dimensions of SRL in students must be developed: the meta-cognitive, the motivational, and the behavioral ones [67]. Another interesting finding is that SRL can enable accelerated learning while maintaining long-term retention rates [37]. In a meta-analysis of 800 studies [15], it has been shown that applying meta-cognitive learning strategies significantly contributes to learning success. These results provide clear evidence that meta-cognitive skills and in particular SRL abilities belong to the key competences of a successful learner, especially in the context of lifelong learning. Components of SRL are cognition, meta-cognition, motivation, affect, and volition [18]. Six key processes that are essential for self-regulated learning are listed by Dabbagh and Kitsantas [5]. These are goal setting, self-monitoring, self-evaluation, task strategies, help seeking, and time management. A cyclic approach to model SRL has been given by Zimmerman [69], where SRL is seen as a process of meta-cognitive activities consisting of three phases, namely, the forethought phase (e. g., goal setting
or planning), the performance phase (e. g., self-observation processes), and the self-reflection phase. According to this model, learning performance and behavior consist of both cognitive activities and meta-cognitive activities for controlling the learning process. A study investigating SRL in Massive Open Online Courses found that goal setting and strategic planning predicted attainment of personal course goals [19]. Moreover, individual characteristics like demographics and motivation predicted learners' SRL skills.
4 Learning technology
Available learning technologies usually support formal learning in well-structured domains. Nevertheless, various methods and techniques were used to develop workplace learning solutions, facilitating rapid prototyping and re-use of software components [10]. For virtual workplaces an appropriate choice incorporated a distributed architecture, with educational servers providing learning materials and pedagogical agents enabling communication between clients and servers. These agents could use various web services communicating with each other and providing the requested functionality at different levels of the learning process, including representation of the relevant models. The various types of knowledge could be encoded in software components, represented by meta-data, or elicited in formal specifications. The ICOPER Reference Model (IRM) provided a common frame of reference for stakeholders who wish to contribute to the design and development of outcome-oriented teaching and content for re-use [54]. The IRM was designed to improve interoperability of educational systems and applications both at the process level and at the technical level (i. e., data and services). Learning and training services are typically based on information about the user status and the current context. Nowadays, the Internet of Things (IoT) consists of identifiable objects that can communicate and interact [39], using sensors to collect information about their environment and actuators to trigger actions. Although not all technical challenges of IoT have been solved yet, the technology enables further research. In the area of education, the early work with IoT focused on recognizing an object and presenting its information or activities [2], as well as social interaction on objects [66]. So there is still a lot of unexploited potential of IoT in this field, especially beyond the technological perspective. But there are also research fields that consider the pedagogical point of view in a more advanced way. Educational Data Mining deals with automatic extraction of meaning from large learning data repositories. This can be used for guidance (in planning and learning phases) and reflection. Guidance is often facilitated by means of nudges based on RSs. But learning recommendations are highly context-dependent [38], which relates to the characteristics
of the physical and virtual environment, the learning objective, as well as the learner and his or her current task at the workplace. In TEL, user preferences are not the most prominent factor and may not necessarily be in line with learning goals and other stakeholders' interests. Moreover, recommendation goals are complex and require well-conceived recommendation strategies, which need to be adapted to the specifics of a domain [22]. Thus, learning environments with a very small number of users should either draw on a thorough description of the learner and learning content (ontology-based approach) or support the annotation of the relevant learning content (tag recommendation algorithm). Supporting reflection on the learning process in a flexible way can be facilitated by suitable LA tools, visualizing both long-term and short-term behavior of participants. Nevertheless, various degrees of privacy and data security should allow different levels of integration, depending on special preferences of individuals and companies [27]. It is crucial to take into account relevant pedagogical approaches for learning at the workplace, like the one that orchestrates adaptive, social, and semantic technologies that will play a key role in allowing professionals to draw on collective knowledge and to scaffold learning in a networked workplace context [34]. An analysis of LA for professional and workplace learning [48] found that this field is in an early stage of development, with a relatively low occurrence of knowledge creation approaches, but with a big potential in multimodal LA that can help to overcome the problems of scarce data.
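As a rough illustration of the tag-based annotation approach mentioned above, the following sketch ranks annotated learning resources by their tag overlap with the learner's current task. The Jaccard similarity used here is a deliberately simple stand-in for the more elaborate recommendation strategies discussed in [22]; the resources and tags are invented for this example.

```python
# Illustrative tag-based recommendation for a sparse-data learning environment:
# resources are annotated with tags, the current workplace task is described by
# tags, and resources are ranked by tag overlap (Jaccard similarity).

def jaccard(a, b):
    """Similarity between two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(task_tags, resources, top_n=3):
    """Rank annotated learning resources by tag overlap with the current task."""
    scored = [(jaccard(task_tags, tags), name) for name, tags in resources.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_n] if score > 0]

if __name__ == "__main__":
    resources = {
        "video_safety_briefing": {"safety", "assembly_line"},
        "manual_robot_arm":      {"robot_arm", "maintenance", "safety"},
        "quiz_plc_basics":       {"plc", "programming"},
    }
    task = {"robot_arm", "safety"}
    print(recommend(task, resources))
    # ['manual_robot_arm', 'video_safety_briefing']
```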
4.1 Adaptive learning technology
IMS Learning Design was created as a standard for capturing procedural knowledge about learning processes, also enabling adaptation through conditional constrained branching of the control flow in a learning activity, with conditions based on user characteristics [3]. When developing personalized and adaptive learning solutions, a key challenge was how to simplify the authoring process, considering the collaboration of multiple persons. The previously mentioned survey [10] concluded that specification of adaptation strategies by separating the content, declarative, and procedural knowledge would be very helpful from the authoring point of view. This, of course, necessitates suitable orchestration of various representations. In order to develop new competences in the industrial workforce quickly and efficiently, suitable paradigms for continuous training of employees are needed. Various approaches have been investigated. Traditional LMSs were enhanced with adaptive functionality in the GRAPPLE project. More flexibility in learning environment design was enabled by modular PLEs. The ROLE project experimented with a hybrid solution in the form of a Personal LMS and developed an approach based on responsive and open learning environments [44], which was later customized for SMEs [28]. The Learning Layers project addressed the issues of scalability of informal workplace
learning also with adaptive video trials based on semantic annotations [29]. Affordances of augmented reality and wearable technology for capturing the expert's performance in order to support its re-enactment and expertise development have been investigated in the WEKIT project [36]. Personalization of learning experiences deals with issues such as the detection and management of context and personal data of the learner, considering also their emotions [50]. A better understanding of the person's needs can be achieved by including information from various sources (e. g., physiological and context sensors) and related Big Data. Learners' preferences change dynamically; therefore, available sensors can help significantly in their recognition. Collected sensor data can help to infer contextual preferences directly from the individual's behavior [61]. Meta-cognitive skills, like SRL, are crucial for the effectiveness of lifelong learning. Therefore, the employed technologies need to cultivate them, providing an appropriate balance between the learner's freedom and guidance. This should stimulate not only motivation, but also the effectiveness and efficiency of the learning experience [44]. Effective support for SRL includes integration of nudges and reflection facilities in a suitable way [26]. Awareness and reflection services can provide valuable feedback, if they interpret and visualize the collected data meaningfully and in an understandable form. Here knowledge from various fields has to be considered, including psychology, pedagogy, neuroscience, and informatics [30]. Open learner models (OLMs) show the learner model to users to assist their SRL by helping prompt reflection, facilitating planning, and supporting navigation [13]. The usage of adaptation and recommendation services in learning is limited if they are not understandable and scrutable, which is often the case when AI techniques like Deep Learning are employed [7]. Machine-made decisions should be explainable by rules or evidence, to raise the trust of users. Users also need clear and manageable privacy policies, in order to feel they are in control [12].
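The call for scrutable, rule-based decisions can be illustrated with a minimal sketch in which every adaptation is returned together with the rule that triggered it, so that it can be explained to the learner. The rules and user-model attributes below are hypothetical and merely indicate the pattern, not a particular system's behavior.

```python
# Hypothetical sketch of an explainable, rule-based adaptation step: every
# adaptation decision is returned together with a human-readable explanation
# of the rule that triggered it, keeping the system scrutable to the learner.

RULES = [
    # (condition on the user model, adaptation, human-readable explanation)
    (lambda u: u["prior_knowledge"] == "novice",
     "show_guided_tutorial",
     "You are new to this topic, so a guided tutorial is offered first."),
    (lambda u: u["attention"] < 0.3,
     "suggest_short_break",
     "Your recent interaction pattern suggests low attention."),
    (lambda u: u["prefers_video"],
     "prefer_video_resources",
     "You marked video as your preferred format."),
]

def adapt(user_model):
    """Return all triggered adaptations, each paired with its explanation."""
    return [(action, why) for cond, action, why in RULES if cond(user_model)]

if __name__ == "__main__":
    user = {"prior_knowledge": "novice", "attention": 0.8, "prefers_video": True}
    for action, why in adapt(user):
        print(f"{action}: {why}")
```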
4.2 Social software
The emergence of Web 2.0 opened up new opportunities for active participation of users, e. g., by means of blogs or wikis. Consequently, the role of social software attracted a lot of attention, defined as tools and environments that support activities in digital social networks [20]. Learning Network Services (LNSs) are web services designed to help the members of a learning network exchange knowledge and experience in an effective way, to stimulate active and secure participation within the network, to develop and assess the competences of the members, to find relevant peers and experts to support them with specific problems, and to facilitate ubiquitous and mobile access to the learning network [23]. LNSs can stimulate social interaction, recommend navigation, assess competence levels, and provide personalization of learning events.
In the past, there were various approaches to professional learning. Building a technical and organizational infrastructure for lifelong competence development was the aim of the TENCompetence project. Their demand-driven approach [25] was based on the qualification matrix, mapping the relevant tasks onto the required competence profiles and allowing staff to use such a competence map for self-assessment. An analysis of the resulting competence gap enabled prioritization of competence development needs. Expert facilitators were identified and competence networks were established for the required competences. This methodology was supported by the Personal Competence Manager [21], which at the individual level enabled goal setting (specification of the target competence profiles), self-assessment (to identify the knowledge gap), activity advice (selection of personal development plans), and progress monitoring (to support awareness and reflection). Development of mobile technology enabled the Learning Layers project to address the scalability issue, using mobile devices with collaboratively created and shared multimedia artifacts. Their integrative approach orchestrates adaptive, social, and semantic technologies, in order to allow professionals to draw on collective knowledge and to scaffold learning in a networked workplace context [34].
4.3 Industry 4.0
Insufficient qualifications of employees were identified as a major problem for the transition to Industry 4.0, and several dozen important competences were identified as required [52]. Crucial for Industry 4.0 are combinations of professional and IT competences with social and personal skills. A big challenge is to develop novel ways of individualized and informal learning integrated in various settings (including the workplace) and also to cultivate meta-cognitive skills (like motivation and self-regulation). Assistance and knowledge services have been defined as software components that provide specific types of support: assistance services help in solving current issues, while knowledge services support the transfer of knowledge to achieve individual qualification aims [59]. Current service architectures provide functionalities resulting from the interplay of a number of services, each implementing a specific functionality and making it available to other services. A good example is the architecture implemented in the APPsist project, with intelligent assistance and knowledge services at the shop floor [60].
4.4 Vision
From the technological point of view, we can distinguish four layers of relevant services (Table 11.1). At the bottom we find the Data layer, where a fusion of data from IoT sensors takes place.
Table 11.1: Four layers of services.
User Interface: Personalized and adaptive learning / training with WT and AR
Smart Services: Intelligent multimodal assistance and knowledge services
Basic Services: Data analysis
Data: IoT multisensory fusion
Then comes the Basic Services layer, where data analysis is performed. Next is the Smart Services layer, with multimodal assistance and knowledge services. On top, the User Interface layer offers a personalized and adaptive learning and training experience with wearable technologies and augmented reality. The Data layer incorporates IoT, which is decentralized, providing privacy and security. Here blockchain technology [57] plays a crucial role, allowing devices to autonomously execute digital contracts and function in a self-maintaining, self-servicing way. This new paradigm delegates trust to the object level, enabling animation and personalization of the physical world. It will provide novel refined facilities for users to control their privacy and protect their data. Blockchain can disrupt education, replacing the broadcast model with preparation for lifelong learning, cultivating relevant competences like critical thinking, problem solving, collaboration, and communication [58]. At the Basic Services layer there is support for user modeling to harness and manage personal data gathered from IoT [32], which will help IoT application developers to achieve light-weight, flexible, powerful, reactive user modeling that is accountable, transparent, and scrutable [17]. Related approaches address, for instance, elicitation of human cognitive styles [46] and affective states [51], as well as modeling of psychomotor activities [49]. The Smart Services layer provides relevant awareness and reflection indicators [30] as well as guidance like nudges [9]. New useful services will be created, like meta-adaptation, providing adaptation strategies according to learning objectives [29]. Assistance and knowledge services will incorporate various levels of interaction. Based on the user behavior and its analysis, they can simply provide feedback in the form of hints, nudges, and recommendations, letting the user decide which of them to consider and accept. On the other hand, they can conduct an intelligent dialogue with the user, responding to their questions and input.
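As a rough sketch of how such feedback might be produced, the following hypothetical nudging service maps aggregated behavior indicators (as they might be delivered by the Basic Services layer) to non-binding suggestions, leaving the decision with the user. Indicator names and thresholds are assumptions made purely for this example.

```python
# Illustrative sketch of a simple nudging service in the Smart Services layer:
# aggregated behaviour indicators are turned into optional suggestions that
# the learner may accept or ignore. Indicators and thresholds are invented.

def nudges(indicators):
    """Map behaviour indicators to non-binding suggestions."""
    suggestions = []
    if indicators.get("days_since_reflection", 0) > 7:
        suggestions.append("Consider writing a short reflection on last week's tasks.")
    if indicators.get("error_rate", 0.0) > 0.2:
        suggestions.append("A refresher module for this work step is available.")
    if indicators.get("help_requests", 0) >= 3:
        suggestions.append("A colleague with matching expertise could be contacted.")
    return suggestions

if __name__ == "__main__":
    observed = {"days_since_reflection": 10, "error_rate": 0.05, "help_requests": 4}
    for text in nudges(observed):
        print("nudge:", text)
```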
The User Interface layer offers new opportunities for immersive procedural training, like capturing and re-enactment of expert performance, enabling immersive, in-situ, and intuitive learning [14]. Motor skill learning is another area where wearable technology and user modeling can be synergistically combined [8].
5 Conclusion and future prospects
We have observed that in recent decades alternative approaches have been investigated in the area of workplace learning. Transmission and acquisition of well-structured knowledge by means of guidance was a typical objective of personalized adaptive learning systems. Later on, collaborative learning could be facilitated by Web 2.0 and social software, supporting the creation of new knowledge. Moreover, a lot of attention has been given to the cultivation of meta-cognitive skills, like motivation, planning, and reflection, which are part of SRL. These efforts can benefit from the rapid progress in educational data mining, RSs, and LA. The three different types of learning correspond to the basic educational theories of cognitivism, constructivism, and behaviorism. In practice it is crucial to find a suitable orchestration and balance among them, depending on the concrete objectives and circumstances. Industry 4.0 is changing the manufacturing world dramatically, and especially SMEs need and deserve special support in order to be able to benefit from the new conditions [31]. Such a transition is a complex process, which is very difficult to control. It includes change management at the technical, organizational, as well as personal level. A crucial part of these changes is the human factor, with upskilling of the workforce and development of required competences, which calls for a radical improvement of informal learning and training at the workplace, based on novel models that support creation of knowledge in the learning process. Nevertheless, it is important to search for solutions that can make this move easier for both parties involved – the companies themselves and their employees. Each of them needs a good motivation and a clear benefit when new tools and services are to be successfully adopted. To summarize, learning and training offers should take into account not only individual preferences of users, but also the effectiveness and efficiency of the learning experience, including the current context, the learner's emotional status, and attention. Ubiquitous sensors and the IoT open more opportunities for processing of big educational data, which leads to better recognition of learners' objectives, preferences, and context, and consequently to a more precise personalization and adaptation of learning experiences. Their effectiveness and efficiency can be improved by wearable technologies and augmented reality, which open new horizons for innovative training methods, cultivating required competences. Transparency and understandability of machine decisions as well as clear and manageable privacy
rules are crucial to gain the trust of the user. These requirements can be facilitated by blockchain technology, which has the potential to be disruptive.
References
[1] Aroyo, L., Dolog, P., Houben, G-J., Kravčík, M., Naeve, A., Nilsson, M., and Wild, F. 2006. Interoperability in Personalized Adaptive Learning. Educational Technology & Society, 9(2): 4–18.
[2] Broll, G., Rukzio, E., Paolucci, M., Wagner, M., Schmidt, A., and Hussmann, H. 2009. Perci: Pervasive Service Interaction with the Internet of Things. IEEE Internet Computing, 13: 74–81.
[3] Burgos, D., Tattersall, C., and Koper, R. 2006. How to represent adaptation in eLearning with IMS Learning Design. International Journal of Interactive Learning Environments.
[4] Chatti, M. A., Klamma, R., Jarke, M., Kamtsiou, V., Pappa, D., Kravčík, M., and Naeve, A. 2006. Technology Enhanced Professional Learning: Process, Challenges and Requirements. In: WEBIST conference, Setubal, Portugal. INSTICC Press, pp. 268–274.
[5] Dabbagh, N., and Kitsantas, A. 2004. Supporting Self-Regulation in Student-Centered Web-Based Learning Environments. International Journal on e-Learning, 3(1): 40–47.
[6] De Bra, P., Houben, G. J., and Wu, H. 1999. AHAM: A Dexter-based Reference Model for Adaptive Hypermedia. In: Proceedings of the ACM Conference on Hypertext and Hypermedia. ACM, pp. 147–156.
[7] De Bra, P. 2017. After Twenty-Five Years of User Modeling and Adaptation...What Makes us UMAP? In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, p. 1.
[8] Dias Pereira dos Santos, A., Yacef, K., and Martinez-Maldonado, R. 2017. Let's Dance: How to Build a User Model for Dance Students Using Wearable Technology. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, pp. 183–191.
[9] Dimitrova, V., Mitrovic, A., Piotrkowicz, A., Lau, L., and Weerasinghe, A. 2017. Using Learning Analytics to Devise Interactive Personalised Nudges for Active Video Watching. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, pp. 22–31.
[10] Dolog, P., Kravčík, M., Cristea, A., Burgos, D., De Bra, P., Ceri, S., Devedzic, V., Houben, G-J., Libbrecht, P., Matera, M., Melis, E., Nejdl, W., Specht, M., Stewart, C., Smits, D., Stash, N., and Tattersall, C. 2007. Specification, authoring and prototyping of personalised workplace learning solutions. Int. J. Learning Technology, 3(3): 286–308.
[11] EC 2000. The EQF for lifelong learning. Office for the publication of the EC, ISBN 978-92-79-0847-4.
[12] Golbeck, J. 2017. I'll be Watching You: Policing the Line between Personalization and Privacy. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, p. 2.
[13] Guerra-Hollstein, J., Barria-Pineda, J., Schunn, C. D., Bull, S., and Brusilovsky, P. 2017. Fine-Grained Open Learner Models: Complexity Versus Support. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, pp. 183–191.
[14] Guest, W., Wild, F., Vovk, A., Fominykh, M., Limbu, B., Klemke, R., et al. 2017. Affordances for Capturing and Re-enacting Expert Performance with Wearables. In: European Conference on Technology Enhanced Learning. Springer, Cham, pp. 403–409.
[15] Hattie, J. 2008. Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Routledge.
[16] Hummel, H. G., Tattersall, C., Burgos, D., Brouns, F., Kurvers, H., and Koper, R. 2005. Facilitating participation: From the EML web site to the Learning Network for Learning Design. Interactive Learning Environments, 13(1–2): 55–69. [17] Kay, J., and Kummerfeld, B. 2012. Creating personalized systems that people can scrutinize and control: Drivers, principles and experience. Transactions on Interactive Intelligent Systems, 2(4): 24. [18] Kitsantas, A. 2002. Test Preparation and Performance: A Self-Regulatory Analysis. The Journal of Experimental Education. 70(2): 101–113. [19] Kizilcec, R. F., Pérez-Sanagustín, M., and Maldonado, J. J. 2017. Self-regulated learning strategies predict learner behavior and goal attainment in Massive Open Online Courses. Computers & Education, 104: 18–33. [20] Klamma, R., Chatti, M. A., Duval, E., Hummel, H., Hvannberg, E. T., Kravčík, M., Law, E., Naeve, A., and Scott, P. 2007. Social software for life-long learning. Journal of Educational Technology & Society, 10(3). [21] Kluijfhout, E., and Koper, R. 2010. Building the technical and organisational infrastructure for lifelong competence development. [22] Kopeinik, S. 2017. Applying Cognitive Learner Models for Recommender Systems in Sparse Data Learning Environments. Doctoral dissertation, Graz University of Technology. [23] Koper, R. 2009. Learning network services for professional development. Springer, Heidelberg/Berlin. [24] Kravčík, M., and Gašević, D. 2007. Leveraging the semantic web for adaptive education. Journal of Interactive Media in Education, 1. [25] Kravčík, M., Koper, R., and Kluijfhout, E. 2007. TENCompetence Training Approach. In: Proceedings of EDEN 2007 Annual Conference, pp. 105–110. [26] Kravčík, M., and Klamma, R. 2014. Self-Regulated Learning Nudges. In: Proceedings of the First International Workshop on Decision Making and Recommender Systems, vol. 1278. CEUR. [27] Kravčík, M., Neulinger, K., and Klamma, R. 2016. Data analysis of workplace learning with BOOST. In: Proceedings of the Workshop on Learning Analytics for Workplace and Professional Learning, pp. 25–29. [28] Kravčík, M., Neulinger, K., and Klamma, R. 2016. Boosting vocational education and training in small enterprises. In: European Conference on Technology Enhanced Learning. Springer, pp. 600–604. [29] Kravčík, M., Nicolaescu P., Siddiqui A., and Klamma R. 2016. Adaptive Video Techniques for Informal Learning Support in Workplace Environments. In: Emerging Technologies for Education: First International Symposium, SETE. Springer, Cham, pp. 533–543. [30] Kravčík, M., Ullrich, C., and Igel, C. 2017. Supporting Awareness and Reflection in Companies to Move towards Industry 4.0. In: Proceedings of the International Workshop on Awareness and Reflection (ARTEL). Held in Conjunction with the EC-TEL Conference. [31] Kravčík, M., Wang, X., Ullrich, C., and Igel, C. 2018. Towards Competence Development for Industry 4.0. In: International Conference on Artificial Intelligence in Education. Springer, Cham, pp. 442–446. [32] Kummerfeld, B., and Kay, J. 2017. User Modeling for the Internet of Things. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. ACM, pp. 367–368. [33] Ley, T., Kump, B., and Albert, D. 2010. A methodology for eliciting, modelling, and evaluating expert knowledge for an adaptive work-integrated learning system. International Journal of Human-Computer Studies, 68(4): 185–208. 
[34] Ley, T., Cook, J., Dennerlein, S., Kravčík, M., Kunzmann, C., Pata, K., Purma, J., Sandars, J., Santos, P., Schmidt, A., Al-Smadi, M., and Trattner, C. 2014. Scaling informal learning at the workplace: A model and four designs from a large-scale design-based research effort. British
Journal of Educational Technology, 45(6): 1036–1048. [35] Ley, T. 2017. Guidance vs emergence: a reflection on a decade of integration of working and learning in technology-enhanced environments. In: TEL@Work workshop in conjunction with European Conference on Technology Enhanced Learning. [36] Limbu, B., Fominykh, M., Klemke, R., Specht, M., and Wild, F. 2018. Supporting training of expertise with wearable technologies: the WEKIT reference framework. In: Mobile and Ubiquitous Learning, Springer, Singapore, pp. 157–175. [37] Lovett, M., Meyer, O., and Thille, C. 2008. The Open Learning Initiative: Measuring the effectiveness of the OLI statistics course in accelerating student learning. Journal of Interactive Media in Education. http://jime.open.ac.uk/2008/14. [38] Manouselis, N., Drachsler, H., Vuorikari, R., Hummel, H., and Koper, R. 2011. Recommender systems in technology enhanced learning. In: Recommender systems handbook. Springer, pp. 387–415. [39] Miorandi, D.; Sicari, S.; Pellegrini, F. D., and Chlamtac, I. 2012. Internet of things: Vision, applications and research challenges. Ad Hoc Networks, 10: 1497–1516. [40] Naeve, A., Yli-Luoma, P., Kravčík, M., and Lytras, M. D. 2008. A modelling approach to study learning processes with a focus on knowledge creation. International Journal of Technology Enhanced Learning, 1(1–2): 1–34. [41] Nilson, L. 2013. Creating self-regulated learners: Strategies to strengthen students’ self-awareness and learning skills. Stylus Publishing, LLC. [42] Nkambou, R., Mizoguchi, R., and Bourdeau, J. 2010. Advances in intelligent tutoring systems, vol. 308. Springer Science & Business Media. [43] Nonaka, I., and Takeuchi, H. 1995. The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford university press. [44] Nussbaumer, A., Kravčík, M., Renzel, D., Klamma, R., Berthold, M., and Albert, D. 2014. A Framework for Facilitating Self-Regulation in Responsive Open Learning Environments. arXiv preprint arXiv:1407.5891. [45] Pammer, V., Krogstie, B., and Prilla, M. 2017. Let’s talk about reflection at work. International Journal of Technology Enhanced Learning, 9(2–3): 151–168. [46] Raptis, G. E., Katsini, C., Belk, M., Fidas, C., Samaras, G., and Avouris, N. 2017. Using Eye Gaze Data and Visual Activities to Infer Human Cognitive Styles: Method and Feasibility Studies. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 164–173. [47] Rothwell, W., Sanders, E., and Soper, J. 1999. ASTD Models for Workplace Learning and Performance, Alexandria, VA: The American Society for Training and Development. [48] Ruiz-Calleja, A., Prieto, L. P., Ley, T., Rodríguez-Triana, M. J., and Dennerlein, S. 2017. Learning Analytics for Professional and Workplace Learning: A Literature Review. In: European Conference on Technology Enhanced Learning, Springer, Cham, pp. 164–178. [49] Santos, O. C., and Eddy, M. 2017. Modeling Psychomotor Activity: Current Approaches and Open Issues. In: Enhanced Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. [50] Santos, O. C., Kravčík, M., and Boticario, J. G. 2016. Preface to Special Issue on User Modelling to Support Personalization in Enhanced Educational Settings. International Journal of Artificial Intelligence in Education, 26(3): 809–820. [51] Sawyer, R., Smith, A., Rowe, J., Azevedo, R., and Lester, J. 2017. Enhancing Student Models in Game-based Learning with Facial Expression Recognition. 
In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 192–201. [52] Schmid, U. 2017. Kompetenzanforderungen für Industrie 4.0. mmb Institute. [53] Schunk, D. H., and Zimmerman, B. J. 1998. Self-regulated learning: From teaching to
self-reflective practice. Guilford Press, New York. [54] Simon, B., Pulkkinen, M., Totschnig, M., and Kozlov, D. 2011. The ICOPER Reference Model for Outcome-based Higher Education. ICOPER Deliverable 7.3b. [55] Steiner C. M., and Albert D. 2011. Competence-Based Knowledge Space Theory as a Framework for Intelligent Metacognitive Scaffolding. In: Biswas G., Bull S., Kay J., Mitrovic A. (eds) Artificial Intelligence in Education. AIED 2011. Lecture Notes in Computer Science, vol 6738. Springer, Berlin, Heidelberg. [56] Straub, N., Hegmanns, T., Kaczmarek, S. 2014. Betriebliches Kompetenzmanagement für Produktions- und Logistiksysteme der Zukunft. Zeitschrift für wirtschaftlichen Fabrikbetrieb, 109, 415–418. [57] Tapscott, D., & Tapscott, A. 2016. Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World. Penguin. [58] Tapscott, D., & Tapscott, A. 2017. The Blockchain Revolution & Higher Education. Educause Review, 52(2): 11–24. [59] Ullrich, C., Aust, M., Blach, R., Dietrich, M., Igel, C., Kreggenfeld, N., Kahl, D., Prinz, C., Schwantzer, S. 2015. Assistance- and Knowledge-Services for Smart Production. In: Lindstaedt, S., Ley, T., Sack, H. (eds.) Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, ACM, p. 40. [60] Ullrich, C., Aust, M., Dietrich, M., Herbig, N., Igel, C., Kreggenfeld, N., Prinz, C., Raber, F., Schwantzer, S., & Sulzmann, F. 2016. APPsist Statusbericht: Realisierung einer Plattform fur Assistenz-und Wissensdienste fur die Industrie 4.0. In: Proceedings of DeLFI Workshop, pp. 174–180. [61] Unger, M., Shapira, B., Rokach, L., Bar, A. 2017. Inferring Contextual Preferences Using Deep Auto-Encoding. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, ACM, pp. 221–229. [62] Wahlster, W. 2014. Semantic Technologies for Mass Customization. In: Wahlster, W., Grallert, H.-J., Wess, S., Friedrich, H., Widenka, T. (eds.) Towards the internet of services. The THESEUS research program. Springer, Cham, Heidelberg, New York, Dordrecht, London, pp. 3–13. [63] Wenger, E. 1998. Communities of practice: Learning, meaning, and identity. Cambridge university press. [64] Werkle, M., Schmidt, M., Dikke, D., & Schwantzer, S. (2015). Case study 4: Technology enhanced workplace learning. In Responsive Open Learning Environments. Springer, Cham, pp. 159–184. [65] Wesiak, G., Steiner, C. M., Moore, A., Dagger, D., Power, G., Berthold, M., Albert, D. & Conlan, O. (2014). Iterative augmentation of a medical training simulator: Effects of affective metacognitive scaffolding. Computers & Education, 76: 13–29. [66] Yu, Z.; Liang, Y.; Xu, B.; Yang, Y. & Guo, B. (2011) Towards a Smart Campus with Mobile Social Networking. In: 2011 International Conference on Internet of Things and 4th International Conference on Cyber, Physical and Social Computing. IEEE. [67] Zimmerman, B. J. (1990) Self-regulated learning and academic achievement: An overview. Educational Psychologist 25(1): 3–17. [68] Zimmerman, B. J., and Martinez-Pons, M. (1990) Student differences in self-regulated learning: Relating grade, sex, and giftedness to self-efficacy and strategy use. Journal of Educational Psychology 82(1): 51–59. [69] Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory into practice, 41(2): 64–70. [70] Zysman, J., Kenney, M. 2018. The next phase in the digital revolution: intelligent tools, platforms, growth, employment. Commun. ACM 61(2): 54–63.
Index Adaptability 70, 71, 257 Adaptation – automatic 72, 259, 267 – behavior 259 – decision 87 – dialogue 259 – engine 259 – mechanisms 72, 111, 112 – model 79, 290 – process 259, 290 – rules 72, 73, 98 – strategies 285, 291, 294, 297 – techniques 69 – UI 68–70, 73–75, 93 – user 224, 274 Adaptive Hypermedia Application Model (AHAM) 290 Adaptivity 68, 70, 71, 93, 104–106, 111, 114, 119, 257 – ethics 127 – evaluation 123 Aggregating preferences 171, 197 Agreeable users 46 Agreeableness 6, 35, 39, 228 Ambient-Assisted Living (AAL) 261 Analysis Component (AC) 76 Artificial Intelligence (AI) 288 Augmented reality (AR) 287 Automatic personality – perception 47 – recognition 47 – synthesis 47, 48 Batch normalization (BN) 208 Behaviorism 288, 290, 298 Big Five 6, 31, 35, 184, 186 Blueprint 21 Clustered Orienteering Problem (COP) 163 Cognitive disabilities 275 Cognitive styles 6, 11, 14, 17 Cognitivism 288, 290, 298 Collaborative filtering 133, 138, 265 Combination 288 Common Log Format (CLF) 272 Conscientiousness 6, 10, 35, 228
Constructivism 288, 290, 298 Conversational recommender systems 143, 145 Convolutional neural networks (CNN) 202 Critiquing 143, 145 Cross-object user interfaces (COUIs) 202, 206 – distribution mechanisms 211 – prediction model 207 Customer preferences 185 Customer relationship management (CRM) 184 Delivering personalized recommendations 183, 184 Disabilities 70, 81, 96, 253–257, 262, 263, 270, 271, 275 Elo rating system 112–115 Emotion recognition 108, 109, 111, 115, 116, 124, 126 – automatic 119, 121, 126, 127, 267 – facial 110 – human 107, 126 – music 233 Emotional stability 35, 40 Emotional state 106–108, 119, 121, 227, 229, 232, 234 Emotions 107, 108, 111, 115–117, 225 – negative 121 Entertainment preferences 41 European Qualification Framework (EQF) 286 EVA score 115–118, 122, 124 Externalization 288 Extraversion 6, 8, 35, 38–40, 45, 228 Eye tracking 205, 208, 216 Facial expressions (FE) 108, 121 Flow 105, 124 Generalized Orienteering Problem (GOP) 163 Group recommendations 170, 171, 174 Group recommender system (GRS) 170 Hue saturation valence (HSV) 14 Human–computer interaction (HCI) 34, 36, 42, 47, 52, 53, 67, 202, 203, 217 Human–robot interaction 273 Human–robot interface 273, 274
ICOPER Reference Model (IRM) 293 Industry 4.0 284, 285, 287, 296, 298 Intelligent Personal Assistant (IPA) 37 Intelligent Tutoring System (ITS) 290 Interaction Recording and Processing Component (IRPC) 76 Interface adaptations 254, 259 Interface personalization 254, 256, 275, 277 Internalization 288 Internet of Things (IoT) 293 Iterated local search (ILS) 162 Knapsack Problem (KP) 161, 163, 174 Layered evaluation 84, 88 Learner skills 117, 124 Learning – adaptive 285, 294, 297 – collaborative 291, 298 – professional 283, 284 – self-regulated 286, 290, 292 – workplace 283, 288 Learning analytics (LA) 287 Learning Management Systems (LMS) 286 Learning Network Services (LNS) 295 Learning Style Inventory (LSI) 8 Learning Styles Questionnaire (LSQ) 8 Manual Ability Classification System (MACS) 266 Mean absolute errors (MAE) 13 Mobility patterns 168, 172, 173 Music – online preferences 229 – preferences 41, 47, 224, 227, 228, 231, 237 – recommendations 144, 168, 229, 239, 240 – recommender systems 17, 223, 225, 227, 229, 235, 237–244 Music emotion recognition (MER) 233 Music information retrieval (MIR) 224 Navigation preferences 272 Neuroticism 6, 35, 228 Objective system aspects (OSA) 242 Online behavior 9–11, 13 Open learner models (OLM) 295 Openness 6, 8, 14, 35, 38, 39, 45, 228
Oregon Trail Knapsack Problem (OTKP) 163 Orienteering Problem (OP) 160, 161 Personal learning environments (PLE) 286 Personality – assessment 47, 48 – awareness 47 – characteristics 10 – computing 46, 47 – dimensions 40 – impressions 11 – information 14, 23, 42, 43 – judgment 11 – manifestations 47 – models 34 – prediction 13, 14, 48 – predictor 13, 14 – profiles 42 – psychology 5, 44 – quizzes 41 – research 11, 34 – scores 14, 16 – traits 10, 11, 13, 35–38, 40, 41, 50, 52, 53, 186, 227–229 – types 8 Personalization – algorithms 34, 50 – automated 74, 75, 97 – environment 273 – interaction 68 – personality-aware 42, 46, 49–51, 53, 54 – services 32, 34 – strategies 3–5, 13, 17, 20 – systems 204, 259 – user interface 217, 255, 258, 268, 276 – web 206 Personalized – adaptive learning 290, 298 – interaction 68, 69, 74, 202–206, 217–219 – systems 4, 5, 10, 16, 20, 21, 67, 70, 223, 230 – user interfaces 215, 254 Preferences – inferred 136 – music 41, 47, 224, 227, 228, 231, 237 – tourist 184 – user 53, 146, 165, 171, 172, 183, 184, 193, 196, 235, 237, 274 Product – factor profile 185, 186, 188, 191–193, 195
– picture profile 191 Prototype – SpongeBox 94, 96 – SquareSense 95, 96 Psychological models 5, 8, 16, 21, 23 Psychological traits 9–11, 13, 18, 21 Public displays 173, 174 Recommendations – algorithms 142, 147 – lists 134, 146 – music 144, 168, 229, 239, 240 – process 142, 169, 175 – strategies 147, 163, 171 Recommender systems 4, 16, 41, 42, 133–135, 137, 144, 146, 159, 184, 185, 224, 225, 241, 242, 288 – music 17, 223, 227, 229, 235, 237–244 Semantic Matching Framework (SMF) 261 Seven-Factor Model 183, 184, 186–188, 197 Social cognition 106, 126 Social networking sites (SNS) 10 Social software 295 Socialization 288 South Tyrol Suggests (STS) 164 Subjective system aspects (SSA) 242 System Usability Scale (SUS) 125 Task difficulty 105, 116, 117, 126 Team Orienteering Problem (TOP) 161 Tourist – preferences 184 – recommendation literature 161 – trip recommendations 160, 164, 172 Tourist Trip Design Problem (TTDP) 160, 161, 163, 164, 174, 175 – algorithms 160, 174 Travel preferences 160, 171 Traveling Salesman Problem (TSP) 160, 161, 163, 174 Trip recommendations 166 – tourist 160, 164, 172 Twins 42 Usability 125 User – acceptance 98, 275 – adaptation 224, 274
– autistic 125 – awareness 241, 244 – behavior 4, 50, 53, 73, 202–206, 211, 216, 217, 272 – blind 269 – characteristics 71, 243, 254, 256, 260, 262, 264 – context 237, 268, 274 – control 75, 93, 98, 137, 143, 145–147 – measures 147 – mechanisms 135, 142, 146, 148, 150 – data 5, 22, 49, 261, 276, 277 – collection 277 – eGovernment 272 – engagement 206, 216, 241 – evaluations 190 – experience 19, 20, 33, 36, 42, 44, 45, 145, 147, 203, 204, 239 – factor profile 184–188, 191–194 – feedback 44, 45, 51, 143 – groups 33, 98, 186, 261, 270 – identification 265 – information 32, 43, 266 – input 32, 33, 69, 70, 81, 88, 94, 136, 143 – interaction 20, 84, 150, 174, 204, 212, 224, 229, 260, 266, 272 – manipulation 14, 51 – model – acquisition 262 – features 97 – modeling 69, 71, 73, 74, 97, 254–257, 259, 260, 262, 265, 266, 275, 297, 298 – server 70, 73, 76 – systems 73, 75, 97, 277 – tools 260 – neurotypical 125 – preferences 53, 146, 165, 171, 172, 183, 184, 193, 196, 235, 237, 274 – privacy 277 – profile 34, 52, 71, 72, 133, 142, 160, 191, 193, 202, 203, 231, 261, 262, 265, 267, 272, 274 – properties 225, 254 – query 163, 165 – ratings 41, 136 – requirements 145, 165, 172, 275 – satisfaction 20, 33, 126, 141, 162, 167, 241 – study 71, 90, 91, 96, 147, 171, 239 – traits 3, 4, 18, 20–22
User interface (UI) – adaptations 257 – personalization 217, 255, 258, 268, 276 – tangible 202, 216 User interfaces (UI) 67 – adaptations 71 User modeling 114
Virtual reality 202–206 Virtual User Modelling and Simulation Standardisation (VUMS) 261 VUMS exchange format 261, 262 Wearable technologies (WT) 287 World Health Organization (WHO) 258