Multimodal Affective Computing. Technologies and Applications in Learning Environments 9783031325410, 9783031325427



English Pages [211] Year 2023


Table of contents :
Foreword
Preface
Acknowledgments
Contents
Part I Fundamentals
1 Affective Computing
1.1 Introduction
1.2 Theories of Emotions, Sentiments, and Affect
1.3 Theories of Personality and Learning
1.3.1 Main Personality Theories
1.3.2 The Effect of Personality on Learning
1.4 Cognitive Processing and Learning-Oriented Emotions
1.5 Emotions, Sentiment, Personality, and the Machine
1.6 Discussion
References
2 Machine Learning and Pattern Recognition in Affective Computing
2.1 Introduction
2.2 Input Data in Affective Computing
2.3 Machine Learning Variants and Models
2.3.1 Supervised Learning
2.3.2 Unsupervised Learning
2.3.3 Other Learning Variants
2.4 Dimensionality Reduction
2.5 Deep Learning
2.5.1 Neural Networks
2.5.2 Convolutional Neural Networks
2.5.3 Sequential Models
2.5.4 Other DL-Based Models
2.6 Discussion
References
3 Affective Learning Environments
3.1 Introduction
3.2 The Dynamics of Teaching and Learning
3.3 Theoretical Models for the Role of Affect in Learning
3.4 Design of Affective Learning Environments
3.5 Discussion
References
Part II Sentiment Analysis for Learning Environments
4 Building Resources for Sentiment Detection
4.1 Introduction
4.2 Experimental Setup Design
4.3 Data Mining System Design and Implementation
4.4 Data Mining Challenges
4.5 Discussion
References
5 Methods for Data Representation
5.1 Introduction
5.2 Tokenization
5.3 Parsing
5.4 Stemming and Lemmatization
5.5 Word Embeddings
5.6 Discussion
References
6 Designing and Testing the Classification Models
6.1 Introduction
6.2 Lexicon-Based Sentiment Analysis
6.3 Multilayer Perceptron
6.4 Convolutional Neural Networks
6.5 Long Short-Term Memory Neural Networks
6.6 Evaluation Protocols
6.7 Discussion
References
7 Model Integration to a Learning System
7.1 Introduction
7.2 Building Resources
7.3 Dataset Focused on the Programming Language Domain
7.4 Creation of a Dictionary of Emotions Focused on Learning (SentiDICC)
7.5 Model Selection Process
7.6 Evaluation Metrics
7.7 Model Training and Validation
7.8 Affective Learning Environment
7.8.1 Model Implementation
7.8.2 Affective Tutoring Agent
7.9 Discussion
References
Part III Multimodal Recognition of Learning-Oriented Emotions
8 Building Resources for Emotion Detection
8.1 Introduction
8.2 Experimental Setup Design
8.2.1 Selecting Data Modalities
8.2.2 Labeling Process
8.2.3 Work Environment
8.2.4 Emotion Elicitation
8.3 Discussion
References
9 Methods for Data Representation
9.1 Introduction
9.2 Image-Based Data Representation for Facial Expressions
9.3 Spectrogram-Based Data Representation for Speech
9.4 Signal-Based Data Representation for Physiological Data
9.5 Practical Considerations for Choosing Data Representation Methods
9.6 Discussion
References
10 Multimodal Recognition Systems
10.1 Introduction
10.2 Data Fusion Techniques
10.3 Convolutional Neural Networks in Multimodal Emotion Recognition
10.4 Long Short-Term Memory in Multimodal Emotion Recognition
10.5 Evaluation Protocols
10.6 Discussion
References
11 Multimodal Emotion Recognition in Learning Environments
11.1 Introduction
11.2 Enhancing the Student Motivation, Engagement, and Cognitive Processing
11.3 Dataset Creation
11.3.1 Labeling Process
11.3.2 Fusing Different Datasets
11.4 Defining DL Architectures
11.4.1 Convolutional Neural Networks
11.4.2 Long Short-Term Memory
11.5 Evaluation Protocols
11.6 Model Deployment
11.6.1 Data Pipelines
11.6.2 Model Interpretation
11.7 Affective Tutoring Agent
11.8 Discussion
References
Part IV Automatic Personality Recognition
12 Building Resources for Personality Recognition
12.1 Introduction
12.2 Data Structure Design
12.3 Personality Data Annotation
12.4 Applications for Data Collection
12.5 Discussion
References
13 Methods for Data Representation
13.1 Introduction
13.2 Speech Data Representation
13.3 Text Data Representation
13.4 Facial Expressions Data Representation
13.5 Physiological Signals Data Representation
13.6 Differences Between Emotion and Personality Data Representation
13.7 Discussion
References
14 Personality Recognition Models
14.1 Introduction
14.2 Unimodal Architectures
14.3 Multimodal Architectures
14.4 Discussion
References
15 Multimodal Personality Recognition for Affective Computing
15.1 Introduction
15.2 Design of a Data Structure
15.2.1 Collecting a Dataset for APP
15.2.2 Creating a Dataset for APR
15.2.3 Apparent Personality Perception (APP) Versus Automatic Personality Recognition (APR)
15.3 An Application for Data Collection
15.3.1 Architectural Model of the Platform
15.4 Data Recollection Process
15.5 Adapting a Dataset to a Working Environment
15.5.1 Image Preprocessing
15.5.2 Sound Preprocessing
15.6 Personality Recognition Model Design
15.6.1 Image-Based Models
15.6.2 Sound-Based Models
15.6.3 Multimodal Models
15.7 Laboratory Tests
15.8 Models as a Service
15.9 Personality Recognition in Education
15.10 Discussion
References
Glossary

Ramón Zatarain Cabada  Héctor Manuel Cárdenas López  Hugo Jair Escalante

Multimodal Affective Computing Technologies and Applications in Learning Environments


Ramón Zatarain Cabada Instituto Tecnológico de Culiacán Culiacán, Sinaloa, Mexico

Héctor Manuel Cárdenas López Instituto Tecnológico de Culiacán Culiacán, Sinaloa, Mexico

Hugo Jair Escalante Instituto Nacional de Astrofísica, Óptica y Electrónica Puebla, Puebla, Mexico

ISBN 978-3-031-32541-0 ISBN 978-3-031-32542-7 (eBook)
https://doi.org/10.1007/978-3-031-32542-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Zyanya, Ana, Naomi, and Lucy – Ramón Zatarain Cabada

To Tzitzi, Héctor, María, Selene, Karla, and Lizzette – Héctor Manuel Cárdenas López

Foreword

The exploration of new knowledge has been a motor that has accelerated the development of new technological areas and niches of science, which in turn open up opportunities and promote the discovery of more knowledge. Some new knowledge is more useful and more noticeably drives the development of technological and scientific areas that are in search of innovations and new guidelines in order to achieve more significant advances. Among the areas with the greatest and fastest technological advances is Computer Science, where Artificial Intelligence has become a generating branch of new fields of study. These fields open windows that allow us to glimpse huge amounts of knowledge yet to be discovered, supporting the continued progress of society as a whole. Several branches of Artificial Intelligence stand out for their contribution to the rapid development of intelligent systems. Among them is Affective Computing (AC), an emerging field initiated with the noble intention to “humanize” methodologies as much as possible: they take emotions, feelings, and personality into account to improve their functioning, providing capabilities that consider the affective states inherent in the human race. Meanwhile, another independent field was developing on its own, particularly oriented to benefit all types of educational processes. Originally, this field was known as Intelligent Tutoring Systems (ITS), which after adopting new educational trends and tools became what we now know as Intelligent Learning Environments (ILE). The combination of these two methodologies has generated a synergy that in a very short time has demonstrated significant advances and benefits and has allowed the emergence of surprising fields of application not previously contemplated or intuited.
In this book entitled Multimodal Affective Computing, different Artificial Intelligence methodologies have been compiled that allow the implementation of affective states in intelligent learning environments. Within the material provided by the authors, the reader will find a well-organized and detailed overview of the most relevant features of the two main methodologies, from their fundamentals and essential theoretical support up to their fusion and some successful practical applications. Basic concepts of Affective Computing, Machine Learning and Pattern Recognition in Affective Computing, and Affective Learning Environments are written in a comprehensive and easy-to-read manner. In the second part, an overall review of an emerging field called Sentiment Analysis for Learning Environments is introduced, including a systematic descriptive tour through topics such as building resources for sentiment detection, methods for data representation, designing and testing the classification models, and model integration to a learning system. The methodologies corresponding to Multimodal Recognition of Learning-Oriented Emotions are presented in the third part, which covers topics such as building resources for emotion detection, methods for data representation, multimodal recognition systems, and multimodal emotion recognition in learning environments, with all their subtopics. The fourth and last part of the book is devoted to Automatic Personality Recognition, a wide application field of the combination of methodologies, dealing with issues like building resources for personality recognition, methods for data representation, personality recognition models, and multimodal personality recognition for affective computing. Together, these parts complete the set of topics that cover the integral approach of Affective Computing as it can be positively exploited in Intelligent Learning Environments. This book can be very useful not only for beginners intending to enter the practice of these useful emerging methodologies but also for advanced practitioners and experts in the practice and development of the field as a whole. Any researcher, with or without experience, eager to find new knowledge capable of showing attractive and useful horizons for making intelligent learning environments consider human affective states in all their processes, will be satisfied after consulting this book.
They will find the materials and ideas needed to provide any learning environment not only with intelligence that allows customizing the learning process but also with the possibility of making decisions and reacting to the emotional states and personality of the user.

Puebla, Mexico
March 2023

Carlos A. Reyes-García

Preface

The automatic analysis of human behavior has been a problem approached for a while now in the context of affective computing, social signal processing, and computer vision. Significant progress has been reported in the fields of emotion, sentiment, and personality recognition during the last few years. Despite the progress, a compilation providing an end-to-end treatment of these subjects, especially with educational applications, has been lacking, which makes it difficult for researchers and students to get on track with fundamentals, established methodologies, conventional evaluation protocols, and the latest progress on these subjects. This book is a first compilation in that direction. The book is divided into four parts that cover complementary and relevant topics around multimodal affective computing. Part I covers fundamental material as tutorial chapters on affective computing, machine learning, and affective computing in the context of learning environments. These chapters elaborate on general topics required to understand the solutions to the tasks. Parts II, III, and IV elaborate on sentiment analysis, emotion recognition, and personality analysis, respectively, and on their impact in learning environments. These chapters deal with the processes of design and corpus generation, data preprocessing, model learning, and evaluation, all with an emphasis on multimodal learning. To the best of our knowledge, this book is among the first compilations dealing with this particular subject. We hope the reader finds this resource useful. This book would not have been possible without the support of many people involved in the writing and review process. In particular, we would like to thank professor María Lucía Barrón Estrada and Víctor Manuel Bátiz Beltrán for the insightful discussion and their methodical review of the content of the book.

Culiacán, México    Ramón Zatarain Cabada
Culiacán, México    Héctor Manuel Cárdenas López
Puebla, México    Hugo Jair Escalante
March 2023


Acknowledgments

The authors of this book are grateful to the students, teachers, and researchers who contributed and helped in various ways to write this book. We thank our colleagues and students from the laboratory of advanced learning technologies of the Instituto Tecnológico de Culiacán and from the Department of Computer Sciences of INAOE (Instituto Nacional de Astrofísica, Óptica y Electrónica), whose help has been very generous and invaluable to us. We would especially like to mention María Lucía Barrón Estrada, Víctor Manuel Bátiz B., and Nestor Leyva López for their help in reviewing this book, and Tzitzi Guerrero Gallardo for her help in the graphic design of some figures. Finally, we would like to thank the Springer staff, particularly Paul Drougas and Shanthini Kamaraj, for their help and support.

Ramón Zatarain Cabada
Héctor Manuel Cárdenas López
Hugo Jair Escalante

xi

Contents

Part I Fundamentals 1

Affective Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Theories of Emotions, Sentiments, and Affect . . . . . . . . . . . . . . . . . . . . 1.3 Theories of Personality and Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Main Personality Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.2 The Effect of Personality on Learning . . . . . . . . . . . . . . . . . . . 1.4 Cognitive Processing and Learning-Oriented Emotions . . . . . . . . . . 1.5 Emotions, Sentiment, Personality, and the Machine . . . . . . . . . . . . . . 1.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

3 3 4 7 8 10 11 15 18 19

2

Machine Learning and Pattern Recognition in Affective Computing 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Input Data in Affective Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Machine Learning Variants and Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.3 Other Learning Variants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.2 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.3 Sequential Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.4 Other DL-Based Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

21 21 22 23 23 25 26 26 28 29 30 30 32 32 32

3

Affective Learning Environments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 The Dynamics of Teaching and Learning. . . . . . . . . . . . . . . . . . . . . . . . . .

35 35 37 xiii

xiv

Contents

3.3 Theoretical Models for the Role of Affect in Learning . . . . . . . . . . . 3.4 Design of Affective Learning Environments . . . . . . . . . . . . . . . . . . . . . . 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

38 39 40 41

Part II Sentiment Analysis for Learning Environments 4

Building Resources for Sentiment Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Experimental Setup Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Data Mining System Design and Implementation. . . . . . . . . . . . . . . . . 4.4 Data Mining Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

45 45 47 48 51 52 53

5

Methods for Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Tokenization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Stemming and Lemmatization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Word Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

55 55 57 58 59 61 64 64

6

Designing and Testing the Classification Models . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Lexicon-Based Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Multilayer Perceptron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Convolutional Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5 Long Short-Term Memory Neural Networks . . . . . . . . . . . . . . . . . . . . . . 6.6 Evaluation Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

67 67 68 69 71 72 73 74 74

7

Model Integration to a Learning System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 Building Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 Dataset Focused on the Programming Language Domain . . . . . . . . 7.4 Creation of a Dictionary of Emotions Focused on Learning (SentiDICC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5 Model Selection Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.6 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.7 Model Training and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8 Affective Learning Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8.1 Model Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.8.2 Affective Tutoring Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

77 77 78 78 81 84 85 85 86 86 90

Contents

7.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xv

90 91

Part III Multimodal Recognition of Learning-Oriented Emotions 8

Building Resources for Emotion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Experimental Setup Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.1 Selecting Data Modalities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.2 Labeling Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.3 Work Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.4 Emotion Elicitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

95 95 96 98 99 100 101 102 102

9

Methods for Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.2 Image-Based Data Representation for Facial Expressions . . . . . . . . 9.3 Spectrogram-Based Data Representation for Speech . . . . . . . . . . . . . 9.4 Signal-Based Data Representation for Physiological Data . . . . . . . 9.5 Practical Considerations for Choosing Data Representation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

105 105 106 107 109

Multimodal Recognition Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.2 Data Fusion Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.3 Convolutional Neural Networks in Multimodal Emotion Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4 Long Short-Term Memory in Multimodal Emotion Recognition 10.5 Evaluation Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

115 115 116

Multimodal Emotion Recognition in Learning Environments . . . . . . . . 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Enhancing the Student Motivation, Engagement, and Cognitive Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Dataset Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3.1 Labeling Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3.2 Fusing Different Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4 Defining DL Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4.1 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4.2 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.5 Evaluation Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6 Model Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

123 123

10

11

110 112 112

117 118 119 120 121

124 124 128 130 135 135 137 140 141

xvi

Contents

11.6.1 Data Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6.2 Model Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.7 Affective Tutoring Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

142 142 143 146 147

Part IV Automatic Personality Recognition 12

Building Resources for Personality Recognition . . . . . . . . . . . . . . . . . . . . . . . . 12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2 Data Structure Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.3 Personality Data Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.4 Applications for Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

151 151 152 153 155 156 156

13

Methods for Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.2 Speech Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.3 Text Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.4 Facial Expressions Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.5 Physiological Signals Data Representation . . . . . . . . . . . . . . . . . . . . . . . . 13.6 Differences Between Emotion and Personality Data Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

159 159 160 161 162 163

14

Personality Recognition Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.2 Unimodal Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.3 Multimodal Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

167 167 168 169 171 171

15 Multimodal Personality Recognition for Affective Computing
15.1 Introduction
15.2 Design of a Data Structure
15.2.1 Collecting a Dataset for APP
15.2.2 Creating a Dataset for APR
15.2.3 Apparent Personality Perception (APP) Versus Automatic Personality Recognition (APR)
15.3 An Application for Data Collection
15.3.1 Architectural Model of the Platform
15.4 Data Recollection Process
15.5 Adapting a Dataset to a Working Environment
15.5.1 Image Preprocessing
15.5.2 Sound Preprocessing

15.6 Personality Recognition Model Design
15.6.1 Image-Based Models
15.6.2 Sound-Based Models
15.6.3 Multimodal Models
15.7 Laboratory Tests
15.8 Models as a Service
15.9 Personality Recognition in Education
15.10 Discussion
References
Glossary

Part I

Fundamentals

Chapter 1

Affective Computing

Abstract This chapter is dedicated to the study of the processes required to recognize, interpret, and process human affect. The most widely used affective models are studied, including a discussion of how computers can interpret them. The chapter covers theories of emotion (evolutionary, physiological, neurological, and cognitive), universal emotions, valence and arousal, and learning-oriented emotions, as well as the basics of sentiment and affective states as human motivators. In the field of personality, the chapter covers personality theories (Freud, trait, authoritarian), personality factors, and the different scales used in psychology to categorize them. The different educational models related to learning and learning techniques are also presented. In addition, the process of cognition, the dynamics of the cognitive process, and the emotional motivators of that process are discussed. A closing section discusses emotions, sentiment, and personality as analyzed by the machine, how emotions can be used from a system application point of view, and what a machine can learn from emotional states. The aim of this chapter is to give the reader background on personality, emotions, sentiment, and affect from a psychological point of view for application in computation, and on how the different indicators of an existing condition in a user can be measured.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Z. Cabada et al., Multimodal Affective Computing, https://doi.org/10.1007/978-3-031-32542-7_1

1.1 Introduction

Computational systems traditionally capture user interactions to generate a user model and use sophisticated algorithms to choose the appropriate content according to each user’s situation. Recently, new types of user models have begun to emerge that contain affect-related information such as emotions, sentiment, and personality. Initially, these models were conceptually interesting but of little practical value, as emotions and personality were difficult to recognize. However, with recent advances in nonintrusive technologies for detecting emotional and sentimental expressions and personality cues, these models have generated interest among both researchers and practitioners in the field of personalized systems. Derived from this interest, at the end of the last century, a branch of artificial intelligence known as affective computing arose, which refers to the study and design of systems and devices that can recognize, interpret, and process emotions, sentiments, and personality.

Today, personalized systems are applied in many areas of human activity. Of particular interest in this book are personalized learning environments, which include intelligent tutoring systems (ITS). An ITS can be defined as a teaching tool that determines the sequencing and presentation of content based on student performance (Nedungadi & Remya, 2015). When ITSs use affective states (emotions and sentiment), they become more complete and accurate in adapting content and are redefined as affective tutoring systems (ATS) (Mao & Li, 2010). Moreover, in recent years, personality has become a branch of interest in the study of affective computing because of the relationship between the levels of expression of affective states and different personality attributes (Penley & Tomaka, 2002; Principi et al., 2019). In addition, cognition, also known as the process of thinking, is highly influenced by affective states, which in turn are modulated by personality. This relationship between personality, affective states, and cognition is important to study in order to understand the learning process and the complex dynamics around it.

This chapter presents several models of emotions, personality, and cognition. It then discusses how computers have been used to interpret and apply these models, the field known as affective computing. To this end, we explain how emotions can be used from the point of view of computer systems, drawing on the initial works that established affective computing. This chapter aims to provide background information on personality, emotions, and sentiment from a psychological point of view for application in affective computing, answering the question of how the different indicators of an affective state can be measured in a user.
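The distinction drawn above between an ITS, which sequences content from performance alone, and an ATS, which also takes affective states into account, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `LearnerState` fields, the five difficulty levels, the thresholds, and the two emotion rules are hypothetical and are not taken from the systems cited in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LearnerState:
    """Hypothetical learner snapshot; field names and scales are assumptions."""
    success_rate: float            # share of recent exercises solved, 0.0 to 1.0
    emotion: Optional[str] = None  # e.g. "frustration" or "boredom"; None if unknown


def next_difficulty(current: int, state: LearnerState, affective: bool = False) -> int:
    """Return the difficulty level (1 to 5) of the next exercise."""
    step = 0
    # Performance-based sequencing, as in a plain ITS (illustrative thresholds).
    if state.success_rate >= 0.8:
        step = 1
    elif state.success_rate <= 0.4:
        step = -1
    # Affect-aware adjustment, as in an ATS (illustrative rules).
    if affective:
        if state.emotion == "frustration":
            step = min(step, 0)  # never raise the difficulty for a frustrated learner
        elif state.emotion == "boredom":
            step = max(step, 0)  # never lower the difficulty for a bored learner
    # Keep the result inside the assumed 1 to 5 range.
    return max(1, min(5, current + step))


if __name__ == "__main__":
    learner = LearnerState(success_rate=0.9, emotion="frustration")
    print("ITS choice:", next_difficulty(3, learner))
    print("ATS choice:", next_difficulty(3, learner, affective=True))
```

The same learner receives a harder exercise from the performance-only rule and an unchanged difficulty from the affect-aware rule, which is the kind of adaptation the ITS/ATS distinction refers to.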

1.2 Theories of Emotions, Sentiments, and Affect

Darwin is known primarily as a biologist, the father of the theory of evolution by natural selection. Still, a concern for the behavior of species was a constant part of his work. In the book The Expression of the Emotions in Man and Animals (Darwin, 1872), Darwin develops his ideas about emotions and their communication with greater clarity. From the middle of the twentieth century, the book became an obligatory point of reference in the psychology of emotional expression.

In psychology, emotion is usually defined as a complex state of sentiment that causes physical and psychological changes that influence thinking and behavior. Emotionality is associated with several psychological phenomena, such as temperament, personality, mood, and motivation. According to author David G. Myers, human emotion involves:

...physiological arousal, expressive behaviors, and conscious experience. (Myers, 2003)


Fig. 1.1 Categories of emotion theories

The main theories of emotion can be classified into three categories according to their origin: physiological, neurological, and cognitive, as shown in Fig. 1.1.

• Physiological theories suggest that responses within the body are responsible for emotions.
• Neurological theories propose that activity within the brain leads to emotional responses.
• Cognitive theories hold that thoughts and other mental activities play an essential role in forming emotions.

However, to understand these categories, it is important first to analyze Darwin’s evolutionary theory of emotions. According to Darwin, emotions evolved because they were adaptive mechanisms that allowed humans and animals to survive and reproduce. Under this theory, sentiments of love and affection lead people to seek mates and reproduce, while sentiments of fear compel people to fight or flee from the source of danger. Emotions motivate people to respond quickly to environmental stimuli, which improves the chances of success and survival. Understanding the emotions of other people and animals also plays a crucial role in safety and survival. If you meet an animal that growls and scratches, chances are you will quickly realize that the animal is frightened and that you should leave it alone. If you can correctly interpret the emotional displays of other people and animals, you can respond correctly and avoid danger.

Physiological Theories

One of the best-known physiological theories of emotion is the James-Lange theory. Proposed by psychologist William James and physiologist Carl Lange, it suggests that emotions occur as a result of physiological reactions to events (James, 1884). According to this theory, seeing an external stimulus produces a physiological reaction, and the emotional reaction depends on how that physical reaction is interpreted.

Another well-known physiological theory is the Cannon-Bard theory of emotion. Walter Cannon disagreed with the James-Lange theory on several points. First, he suggested that people can experience physiological reactions related to emotions without actually feeling those emotions. For example, one’s heart may beat rapidly because one has exercised, not because of fear (Cannon, 1987). Cannon also suggested that emotional responses occur too quickly to be simply the product of physical states:

When one encounters danger in the environment, one often feels fear before one begins to experience the physical symptoms associated with fear, such as trembling hands, rapid breathing, and racing heart.

Cannon first proposed his theory in the 1920s, and physiologist Philip Bard later extended his work during the 1930s. According to the Cannon-Bard theory of emotion, people feel emotions and experience physiological reactions such as sweating, trembling, and muscle tension simultaneously (Friedman, 2010). More specifically, the theory proposes that emotions occur when the thalamus sends a message to the brain in response to a stimulus, resulting in a physiological reaction. At the same time, the brain also receives signals that trigger the emotional experience. Cannon and Bard’s theory suggests that the physical and psychological experiences of emotion occur simultaneously and that one does not cause the other.

The facial feedback theory of emotion suggests that facial expressions are connected to the experience of emotion. Both Charles Darwin and William James noted early on that physiological responses sometimes have a direct impact on emotion, rather than simply being a consequence of it. Proponents of this theory suggest that changing facial expressions generates a change in emotional experience; that is, we express, and then we feel. For example, people who are forced to smile pleasantly at a social event have a more pleasant experience than if they had frowned or worn a more neutral facial expression (Davis et al., 2009).

Neurological Theories

The Schachter-Singer theory (Schachter & Singer, 1962), also known as the two-factor theory of emotion, is an example of a neurological theory of emotion. This theory suggests that physiological arousal occurs first and that the individual must then identify the reason for this arousal in order to experience it and label it as an emotion. A stimulus leads to a physiological response that is then interpreted and cognitively labeled, resulting in an emotion. Schachter and Singer’s theory draws on both the James-Lange and Cannon-Bard theories. Like the James-Lange theory, the Schachter-Singer theory proposes that people infer emotions based on physiological responses. The critical factor is the situation and the cognitive interpretation people use to label that emotion (Schachter & Singer, 1962). Like the Cannon-Bard theory, the Schachter-Singer theory also suggests that similar physiological responses can produce different emotions.

Cognitive Theories

According to appraisal theories of emotion, thinking must occur before emotion is experienced. Richard Lazarus was a pioneer in this area, and this theory is often referred to as the Lazarus theory of emotion. According to this theory, the sequence of events involves first a stimulus, followed by thought, which then leads to the simultaneous experience of a physiological response and emotion (Folkman, 2013).

Sentiment and Affect Theories

Sentiment and affect theories are two different approaches to understanding human emotion. Sentiment theory is based on cognitive (Folkman, 2013) and neurological theories (Schachter & Singer, 1962), and it proposes that emotions are made up of cognitive appraisals of situations or events and that these appraisals give rise to a specific emotion. In other words, according to this theory, emotions result from the interpretation of situations and events, rather than from the events themselves. Affect theory, on the other hand, is based on physiological theories (Cannon, 1987; Friedman, 2010), and it suggests that emotions are a more basic and universal experience innate to human beings. This theory proposes that emotions result from a physiological response to stimuli and that these responses are shared by all human beings regardless of their cultural background or personal experiences. Sentiment theory emphasizes the cognitive aspect of emotions, while affect theory emphasizes the physiological aspect. Both approaches have contributed to our understanding of human emotion, offering different perspectives on how emotions are experienced and expressed.
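The theories in this section differ mainly in the order they assign to the stimulus, the bodily response, the cognitive step, and the felt emotion. The sketch below encodes those orderings, as described in the text, so that they can be compared side by side. The representation (a list of stages, where events in the same stage are simultaneous) and the event labels are assumptions chosen for illustration, not notation from the cited works.

```python
# Each theory is an ordered list of stages; events in the same stage are simultaneous.
THEORIES = {
    "James-Lange": [
        {"stimulus"}, {"physiological reaction"}, {"interpretation"}, {"emotion"},
    ],
    "Cannon-Bard": [
        {"stimulus"}, {"physiological reaction", "emotion"},
    ],
    "Schachter-Singer": [
        {"stimulus"}, {"physiological reaction"}, {"cognitive label"}, {"emotion"},
    ],
    "Lazarus": [
        {"stimulus"}, {"thought"}, {"physiological reaction", "emotion"},
    ],
}


def stage_of(theory: str, event: str) -> int:
    """Return the index of the stage in which `event` occurs for `theory`."""
    for index, stage in enumerate(THEORIES[theory]):
        if event in stage:
            return index
    raise ValueError(f"{event!r} is not part of the {theory} sequence")


def relation(theory: str, first: str, second: str) -> str:
    """Say whether `first` comes before, after, or together with `second`."""
    a, b = stage_of(theory, first), stage_of(theory, second)
    if a < b:
        return "before"
    if a > b:
        return "after"
    return "simultaneous"


if __name__ == "__main__":
    for name in THEORIES:
        print(name, "->", relation(name, "physiological reaction", "emotion"))
```

Running the loop shows the contrast the text describes: the bodily reaction precedes the emotion in the James-Lange and Schachter-Singer theories, while the two are simultaneous in the Cannon-Bard and Lazarus theories.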

1.3 Theories of Personality and Learning

Personality

Although people frequently discuss personality (“He has a bad personality!” or “His personality is ideal for this job!”), psychologists have not reached a consensus on the precise definition of what constitutes personality. Personality is described as the characteristic patterns of thoughts, sentiments, and behaviors that make a person unique. Simply put, it is what “makes you you.” Researchers have found that although some external factors may influence the expression of certain traits, personality originates within the individual. In addition, although some aspects of personality may change as we age, personality tends to remain fairly constant throughout life. Because personality plays such an important role in human behavior, an entire branch of psychology is devoted to studying this subject. Personality psychologists are interested in the unique characteristics of individuals as well as the similarities between groups of people.

Personality Qualities

To understand the psychology of personality, it is important to learn some of the key features of how it functions (Corr & Matthews, 2009). Personality is organized and consistent. People are inclined to show specific aspects of their personality in different situations, but their responses tend to be stable. While personality is usually stable, it can be influenced by the environment. For instance, while your personality may lead you to be self-conscious in social situations, an emergency may lead you to take a more direct approach and take the initiative (Roberts et al., 2007).


Personality causes behaviors to occur. You react to objects and people in your environment based on your personality. From your choices to your career path, your personality affects every aspect of your life.

There are different techniques used in the study of personality, each with its strengths and weaknesses. Experimental methods are those in which the researcher controls and manipulates the variables of interest and measures the results. This is the most scientific form of research, but experimental research can be complex when studying aspects of personality such as motivation, emotions, and impulses. These ideas are internal and abstract and can be difficult to measure. The experimental method allows researchers to observe cause-and-effect relationships between different variables of interest.

Case studies and self-report methods involve in-depth analysis of an individual and the information the individual provides. Case studies rely heavily on the observer’s interpretations, while self-report methods rely on the memory of the individual of interest. As a result, these methods tend to be very subjective, and it is difficult to generalize the results to a larger population.

Clinical research, on the other hand, is based on information obtained from clinical patients throughout treatment. Many personality theories are based on this type of research. Because research subjects are unique and exhibit abnormal behavior, this research tends to be subjective and difficult to generalize.

1.3.1 Main Personality Theories

Personality psychology is the focus of some of the best-known psychological theories, developed by important figures such as Sigmund Freud and Erik Erikson. Some of these theories attempt to address a specific area of personality, while others attempt to explain personality much more broadly.

Biological Theories

According to biological approaches, personality is primarily determined by genetics; these approaches favor the “nature” side of the nature versus nurture debate. Research on heritability has linked personality traits to genetics (Vukasović & Bratko, 2015). Twin studies are often used to investigate the extent to which traits are influenced by genetics versus environmental variables. By comparing the personalities of twins raised together with those of twins raised apart, researchers can determine which factors have a greater impact. Hans Eysenck, a renowned biological theorist, linked aspects of personality to biological processes. He suggested that personality is influenced by cortisol, a stress hormone. According to his theory, introverts have high cortical arousal and avoid stimulation, while extroverts have low cortical arousal and crave stimulation (Soliemanifar et al., 2018).

Behavioral Theories

B. F. Skinner and John B. Watson are among the prominent figures in the field of behavioral theory. Behavioral theorists argue that personality arises from the interplay between an individual and their environment. They focus on observable and measurable behaviors, rejecting theories that ascribe a role to internal thoughts, moods, and sentiments that cannot be objectively measured (Zeigler-Hill & Shackelford, 2020). In line with the tenets of behavioral theory, conditioning (the process of shaping predictable behavioral responses) occurs through interactions with our environment, ultimately shaping our personalities.

Psychodynamic Theories

Psychodynamic theories of personality are heavily influenced by the work of Sigmund Freud and emphasize the influence of the unconscious mind and childhood experiences on personality. Psychodynamic theories include Sigmund Freud’s theory of psychosexual stages and Erik Erikson’s stages of psychosocial development (Bornstein, 2003). Freud believed that the three components of personality were the id, ego, and superego. The id is responsible for needs and drives, while the superego regulates ideals and morals. The ego moderates the demands of the id, superego, and reality. Freud suggested that children progress through stages in which the id’s energy is focused on different erogenous zones. Erikson also believed that personality progresses through several stages, with specific conflicts arising at each stage. Success at any stage depends on overcoming these conflicts.

Humanistic Theories

Humanistic theories emphasize the importance of free will and individual experience in personality development. Humanistic theorists include Carl Rogers and Abraham Maslow (Wong, 2005). They promote the concept of self-actualization, which is the innate need for personal growth, and examine how personal growth motivates behavior.

Theories of Traits

Trait theory is one of the most prominent approaches within personality psychology. According to this theory, personality is composed of several general traits. A trait is a relatively stable characteristic that causes an individual to behave in a certain way. It is essentially the psychological “blueprint” that informs patterns of behavior. Some of the best-known trait theories are Eysenck’s three-dimensional theory (Eysenck, 1981) and Goldberg’s five-factor theory of personality (Goldberg, 1992). Eysenck used personality questionnaires to collect participant data and then employed a statistical technique known as factor analysis to analyze the results. Eysenck concluded that there were three main dimensions of personality: extraversion, neuroticism, and psychoticism (Taub, 2009). Eysenck believed these dimensions combined in different ways to form an individual’s unique personality. Later research suggested that five major dimensions make up an individual’s personality, an account often called the Big Five theory of personality. The Big Five theory (Goldberg, 1992) suggests that five major personality dimensions can characterize all personalities: openness, conscientiousness, extraversion, agreeableness, and neuroticism, collectively referred to by the acronym OCEAN. This theory is one of the most widely used in affective computing. Figure 1.2 shows the five major personality attributes with the perceptions of their extreme values.

Fig. 1.2 Five-factor model of personality
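Because the five-factor model describes a personality as a position on five dimensions, it maps naturally onto a small fixed-size record, which is one reason it is convenient in computational work. The following is a minimal sketch under stated assumptions: the class name `BigFiveProfile`, the normalization of every score to the range 0 to 1, and the helper methods are hypothetical choices for illustration, not a scale or structure defined in this chapter.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class BigFiveProfile:
    """OCEAN scores, each assumed to be normalized to the range [0, 1]."""
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

    def __post_init__(self) -> None:
        # Reject scores outside the assumed normalized range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name} must be in [0, 1], got {value}")

    def as_vector(self) -> Tuple[float, ...]:
        """Return the scores in O, C, E, A, N order, e.g. as model features."""
        return tuple(getattr(self, f.name) for f in fields(self))

    def strongest_trait(self) -> str:
        """Return the name of the dimension with the highest score."""
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


if __name__ == "__main__":
    profile = BigFiveProfile(0.7, 0.9, 0.3, 0.6, 0.2)
    print(profile.as_vector())
    print(profile.strongest_trait())
```

A fixed field order keeps the vector form consistent with the OCEAN acronym, which helps when profiles are later compared or passed to a classifier.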

1.3.2 The Effect of Personality on Learning

Individual differences in learning have long been an important research topic. This field provides practical insights for developing teaching and learning support and contributes to a deeper understanding of cognitive, emotional, and behavioral mechanisms in learning processes. Broadly speaking, cognitive differences are a significant factor in determining an individual’s abilities and maximum capacity. Intelligence is a cognitive difference that reflects an individual’s cognitive abilities and potential. It captures an individual’s mental capacity for problem-solving, learning, and reasoning tasks and indicates what an individual can do. However, while intelligence is crucial in determining an individual’s capacity, it does not necessarily predict their typical behavior. Personality and motivation are key factors influencing an individual’s typical behavior and actions. Personality traits such as openness, conscientiousness, and neuroticism can influence how individuals approach different situations and their behavioral tendencies. Motivation, in turn, can drive an individual to pursue certain goals and affects the effort they put into different tasks. Therefore, while intelligence indicates an individual’s maximum capacity, personality and motivation are vital in determining how an individual will typically behave and the actions they will take (Chamorro-Premuzic & Furnham, 2009).

Personality traits can affect the educational process by directing, framing, reinforcing, or weakening it, depending on the trait in question and what is being learned. Personality can influence learning indirectly through attitudes and motivation that create particular conceptions of learning, investment in learning, and preferred ways of learning. Personality traits can also be expressed in learning styles, which lead to learning strategies that produce a particular outcome. For example, a deep approach to learning, which reflects intrinsic motivation and often results in a good study outcome, has been linked to personality traits such as openness, conscientiousness, and neuroticism (Diseth, 2003). Personality influences both a student’s behavior in an educational context and learning outcomes. Conscientious students, for example, tend to attend classes, while extroverted students tend to have a higher rate of absence.

1.4 Cognitive Processing and Learning-Oriented Emotions

Cognition is a term that refers to the mental processes involved in knowledge acquisition and understanding. Some of the different cognitive processes are thinking, knowing, remembering, judging, and problem-solving (Neisser, 2014). These are high-level brain functions that encompass a range of activities such as language, imagination, perception, and planning. Cognitive psychology is the field of psychology that investigates how people think and the processes involved in cognition. Some divide cognition into two categories: hot and cold. Hot cognition refers to mental processes involving emotions, such as reward-based learning. In contrast, cold cognition refers to mental processes that do not involve sentiments or emotions, such as working memory.

Cognitive theories of creativity emphasize both the creative process and the person: the process by emphasizing the role of cognitive mechanisms as the basis of creative thinking, and the person by considering individual differences in these mechanisms. Some cognitive theories focus on universal abilities, such as attention or memory; others emphasize individual differences, such as those indexed by divergent thinking tasks; some focus on conscious operations, while others focus on preconscious, implicit, or involuntary processes. A classic cognitive theory by Sarnoff A. Mednick (1962) holds that creative ideas may result from associative processes in memory. According to this view, ideas are linked together, one after another, and the more remote associations tend to be more original. This perspective holds that more creative individuals tend to have flatter hierarchies of associations than less creative individuals; in other words, more creative people have many relatively strong associations for a given concept rather than just a few. This is thought to provide greater scope for the simultaneous activation of distant representations, which many believe to be an important driver of creative thinking.

Similarly, another cognitive theory focuses on how concepts combine to generate novelty (Martindale et al., 1988). Research suggests that conceptual combination (putting two different sets of information together) is often involved in creative ideation, that original ideas are more likely when two disparate features are brought together, and that connections between these concepts can only be seen at a very high level of abstraction. This type of thinking has been called metaphorical logic, with the idea that something like “angry weather” is only understandable in a nonliteral way. This type of process can suggest creative alternatives to stereotypical lines of thought. In cognition, as in all areas of study, there is a process that models the way of thinking. This model is known as the cognitive process.

The Cognitive Process

Cognitive processing is a series of chemical and electrical signals in the brain that allow us to understand our environment and acquire knowledge. Neurons release chemicals that create electrical signals in nearby neurons, creating a mass of signals that are then translated into conscious and unconscious thoughts. Conscious interpretation of the five senses, procedural knowledge, and emotional reactions are examples of outcomes of the cognitive process (Blomberg, 2011). There are many different types of cognitive processes, among them:

• Attention: a cognitive process that allows people to focus on a specific stimulus in the environment.
• Language: a cognitive process that involves the ability to understand and express thoughts through spoken and written words. This allows us to communicate with others and plays an important role in thinking.
• Learning: a cognitive process that involves the assimilation of new things, the synthesis of information, and its integration with previous knowledge.
• Memory: a cognitive process that enables people to encode, store, and retrieve information. It is a critical component of the learning process and allows people to retain knowledge about the world and their personal histories.
• Perception: a cognitive process that allows people to take in information through their senses and use it to respond to and interact with the world.
• Thinking: an essential part of all cognitive processes that enables people to engage in decision-making, problem-solving, and higher reasoning.


The learning process is of central interest to this book, which requires a series of constructs that allow the acquisition of new knowledge. Cognitive psychology has created constructs that allow its analysis to model the learning process. Cognitive Model of Learning A cognitive model is a descriptive account or computational representation of human thinking about a concept, skill, or domain. In this case, the focus is on cognitive knowledge and skills instead of sensorimotor skills and may include declarative, procedural, and strategic knowledge. A cognitive model of learning is, therefore, an explanation of how humans acquire accurate and complete knowledge. This is closely related to metacognitive reasoning (colloquially known as thinking about thinking) and is usually the result of three phases: 1. Review of existing knowledge (e.g., memory) 2. The acquisition and encoding of new knowledge from instruction or experience (e.g., reasoning) 3. Combining existing components to infer and deduce new knowledge A cognitive model of learning explains or simulates mental processes and shows how relatively permanent changes occur in learners’ long-term memory. Impoverished cognitive models, for instance, are models that are inadequate or incomplete in representing the cognitive processes involved in learning. These models often oversimplify complex cognitive processes, leaving out important details that can help explain learner errors and misconceptions. Understanding impoverished cognitive models can be particularly useful in identifying common errors and misconceptions made by learners and designing appropriate instructional interventions to help overcome these challenges. By using cognitive models of learning, educators can gain insight into how learners perceive and process information and develop effective strategies to support and enhance their learning. 
Moreover, using these models can help diagnose and address errors and misconceptions, making learning more efficient and effective for all learners (Lane, 2012). Model of Affective States Psychologists in recent decades have made a significant effort to create models of affective states (emotions). A model of affective states seeks to describe the different emotions in human beings by creating a concept or computational representation that allows them to be measured. During the last century, the two most accepted models in the study of affect are the model resulting from the basic theory of emotion and the dimensional theory of emotions. The difference lies in categorizing emotions as discrete entities or as an independent dimension. The model of the basic emotions defined by Ekman (1992) describes that the basic emotions are those emotions from which all other emotions are derived. Ekman described six emotions (happiness, sadness, fear, disgust, anger, and surprise). A visual representation of these is presented in Fig. 1.3. On the other hand, there is the model defined by James Russell, also known as the dimensional theory model of emotions or the circumflex model of emotions

14

1 Affective Computing

Fig. 1.3 Basic emotions

Fig. 1.4 Russell’s circumflex

as seen in Fig. 1.4 (Russell, 1980), which proposes that all affective states arise from cognitive interpretations of basic neural sensations that are the product of two independent neurophysiological systems. This model suggests that emotions are distributed in a two-dimensional circular space, defined through the dimensions


of activation and valence. Activation represents the vertical axis, and valence represents the horizontal axis, while the center of the circle represents neutral valence and an intermediate level of activation. In this model, emotional states can be represented at any level of activation and valence.

Affective States in the Cognitive Process of Learning

In psychology, cognitive theory suggests that the interpretation of events causes the reaction to events; this, of course, includes emotions. As mentioned above, emotional cognitive theory suggests that affective states play an important role in the cognitive process of learning, either through the perception of the learner’s motivation (Linnenbrink-Garcia et al., 2016) or through how perceived information is rationalized during the acquisition of new knowledge (Calvo et al., 2015; D’Mello & Graesser, 2012). This theory allows the definition of models that relate affective states such as emotions and sentiments to the different parts of the learning process. One of the most widely used models to define this relationship is that of D’Mello and Graesser (2012). Their model (Fig. 1.5) defines the different emotions that occur during the learning process and their relationship with each situation present in the learning process.
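Returning to Russell’s two-dimensional space described above, the following is a minimal sketch of how an affective state could be located in it. It assumes that valence and activation have both been rescaled to [-1, 1], so that (0, 0) is the center of the circle (neutral valence, intermediate activation). The function name, the scaling, and the quadrant wording are illustrative assumptions, not part of Russell’s model.

```python
import math

def circumplex_position(valence: float, activation: float) -> dict:
    """Describe a point in a two-dimensional valence/activation space.

    Assumption of this sketch: both inputs lie in [-1, 1] and (0, 0) is the
    center of the circle (neutral valence, intermediate activation).
    """
    # Distance from the center: how far the state is from neutral.
    intensity = math.hypot(valence, activation)
    # Angle measured counterclockwise from the positive valence (horizontal) axis.
    angle = math.degrees(math.atan2(activation, valence)) % 360

    if intensity == 0:
        quadrant = "neutral"
    elif valence >= 0 and activation >= 0:
        quadrant = "positive valence, high activation"
    elif valence < 0 and activation >= 0:
        quadrant = "negative valence, high activation"
    elif valence < 0:
        quadrant = "negative valence, low activation"
    else:
        quadrant = "positive valence, low activation"

    return {"intensity": intensity, "angle_deg": angle, "quadrant": quadrant}


state = circumplex_position(valence=-0.5, activation=-0.5)
print(state["quadrant"])
```

Here "high" and "low" activation are relative to the intermediate level at the center. A polar description (intensity and angle) is one convenient way to compare states in a circular space, but other encodings are equally possible.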

1.5 Emotions, Sentiment, Personality, and the Machine

Cognitive emotional and personality trait theories have presented a promising outlook in affective computing by creating descriptive models that allow better-established definitions of emotions, sentiments, and personality. In the last decades, psychologists have tried to establish better foundations for the relationship between the different variables involved in cognition, emotional states, and personality. Pedagogues have been working on models that allow the use of these noncognitive values (affective states and/or personality) in learning environments. At the same time, computer scientists have developed different systems that allow the automatic recognition of affective states and personality through computable features. Fundamental theories of emotion helped to define clearer boundaries in the interaction between the characteristics evaluated in the form of physiological responses and the affective state. From these models, computational algorithms are created. In particular, cognitive appraisal theory is one of the founding pillars of affective computing, because most models derived from cognitive appraisal theory, or appraisal theories, are easily transformed into computational code. The following are some fundamental theories:

The Ortony, Clore, and Collins’ Theory

Ortony, Clore, and Collins (Ortony et al., 1988) developed their theoretical approach expressly with the goal of implementing it on a computer:


Fig. 1.5 Emotional dynamics during complex learning (D’Mello & Graesser, 2012)

. . . , we would like to lay the foundation for a computationally tractable model of emotion. In other words, we would like to have an account of emotion that, in principle, could be used in an Artificial Intelligence (AI) system that, for example, would be able to reason about emotion. (Ortony et al., 1988, p. 2)

Ortony, Clore, and Collins’ theory assumes that emotions develop as a consequence of certain cognitions and interpretations; therefore, it focuses exclusively on the cognitive triggers of emotions. The authors postulate that there are three aspects that determine these cognitions: events, agents, and objects. This idea leads to the structure of emotion types shown in Fig. 1.6.

The intensity of an emotional sentiment is predominantly determined by three central intensity variables: desirability, which is related to the reaction to events


Fig. 1.6 Structure of the types of emotions of Ortony, Clore, and Collins

Table 1.1 Local variables of Ortony’s theory

Events                  Agents                          Objects
Desirability            Praise for dignity              Appeal
Convenience for others  Strength of the cognitive unit  Familiarity
Merit                   Deviation from expectation
Liking
Possibility
Effort
Realization

and is evaluated in terms of goals; praiseworthiness, which is related to the reaction to agents’ actions and is evaluated in terms of norms; and attractiveness, which is related to the reaction to objects and is evaluated in terms of attitudes. The authors also define a set of variables of global and local intensity. Sense of reality, proximity, unexpectedness, and arousal are the four global variables that operate on the three categories of emotions. The local variables, to which the central intensity variables mentioned above also belong, are shown in Table 1.1. Each of these variables is assigned a value and a weight in a given case. In addition, there is a threshold value for each emotion, below which an emotion is not subjectively felt. With the help of this formal system, a computer should be able to draw conclusions about the emotional episodes presented to it. The authors limit their goal quite explicitly:

Our interest in emotion in the context of AI is not an interest in questions such as “Can computers feel?” or “Can computers have emotions?”. There are those who think such questions can be answered in the affirmative..., however, our view is that the subjective experience of emotion is central, and we do not consider it possible for computers to experience anything until and unless they are conscious. Our suspicion is that machines are simply not the sort of thing that can be conscious. However, our skepticism about the possibility of machines having emotions does not mean that we think the issue of emotions is irrelevant to AI. There are many AI enterprises in which the ability to understand and reason about emotions or aspects of emotions could be important. (Ortony et al., 1988, p. 182)
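The value, weight, and threshold mechanism described above can be illustrated with a small sketch. The weighted sum used to aggregate the variables, and the rule that the felt intensity is the amount by which this sum exceeds the threshold, are assumptions made for illustration; the text only states that each variable has a value and a weight and that each emotion has a threshold. All names and numbers below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IntensityVariable:
    name: str      # e.g. "desirability" (variable names follow Table 1.1)
    value: float   # assessed value in the current situation
    weight: float  # relative importance in the current situation

def emotion_intensity(variables, threshold):
    """Return the felt intensity of an emotion, or 0.0 if it stays below threshold.

    Assumption of this sketch: the variables are aggregated with a weighted
    sum, and the felt intensity is the part of that sum above the threshold.
    """
    potential = sum(v.value * v.weight for v in variables)
    return max(0.0, potential - threshold)

# Hypothetical appraisal of a single event.
appraisal = [
    IntensityVariable("desirability", value=0.8, weight=1.0),
    IntensityVariable("effort", value=0.5, weight=0.4),
]

print(emotion_intensity(appraisal, threshold=0.6))  # above threshold: emotion is felt
print(emotion_intensity(appraisal, threshold=1.5))  # below threshold: not felt
```

The point of the sketch is only that such a rule system is straightforward to compute, which is what makes the theory attractive for implementation.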

Roseman’s Theory

Roseman’s theory (Roseman, 1979), first presented in 1979, was modified several times in the following years. Only the basic approach to a theory of emotion appraisal remained the same. Roseman developed his first theory by analyzing 200 written reports of emotional experiences. From these documents, he derived his model, in which five cognitive dimensions determine whether and which emotion arises. In total, 48 combinations of Roseman’s dimensions (positive/negative, present/absent, true/untrue, deserved/undeserved, and circumstances/others/self) can be formed. According to Roseman, 13 emotions correspond to these 48 cognitive appraisals. Experimental tests of this approach did not generate the results postulated by Roseman, who decided to modify his model (Roseman, 1984). The second dimension of his original model (present/absent situation) now contained the states “consistent motive” and “inconsistent motive,” so that “consistent motive” always corresponds to the “positive” value of the first dimension and “inconsistent motive” to the “negative” value of the first dimension. Instead of the alternatives “present” and “absent,” the terms “appetitive” and “aversive” were now used. Roseman’s models have a simple structure that can be quickly translated into rules that define exactly which valuations elicit which emotions, which is why they were received very positively in AI circles. Dyer’s BORIS model is based on Roseman’s first model, and Picard writes: “Overall, it shows promise for implementation on a computer, both for reasoning about emotion generation and for generating emotions based on cognitive appraisals” (Picard, 1997).

In the same way, the Big Five theory of personality traits proposed by Goldberg (1992) allowed the creation of computable models that represent personality attributes by extracting characteristics from different modalities and mapping them to each personality trait.
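Returning to Roseman’s first model, the count of 48 cognitive appraisals follows directly from the sizes of the five dimensions (2 × 2 × 2 × 2 × 3). The short sketch below enumerates them. The dimension names used as dictionary keys are labels chosen for this illustration; only the values come from the text.

```python
from itertools import product

# Keys are illustrative labels (an assumption of this sketch);
# the values of each dimension are those listed in the text.
DIMENSIONS = {
    "motive": ("positive", "negative"),
    "situation": ("present", "absent"),
    "certainty": ("true", "untrue"),
    "legitimacy": ("deserved", "undeserved"),
    "agency": ("circumstances", "others", "self"),
}

# Every appraisal is one choice of value per dimension.
appraisals = [
    dict(zip(DIMENSIONS, combination))
    for combination in product(*DIMENSIONS.values())
]

print(len(appraisals))  # 2 * 2 * 2 * 2 * 3 combinations
print(appraisals[0])
```

A rule base in the spirit of Roseman’s model would then map each of these combinations to one of the 13 emotions; that mapping is not given in the text and is therefore not sketched here.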

1.6 Discussion

This chapter has provided an introduction to the affective computing field in the context of learning environments. The main theories of emotion, sentiment, and affect are presented and discussed. These are the foundations on which the affective computing field has been built. Also, an introduction to personality, the main theories, and the OCEAN model of traits are presented.


In particular, the impact of personality on the learning process is discussed. Likewise, the role of cognition and emotion in the context of learning environments is reviewed. Overall, the chapter provides the necessary background for anyone interested in studying affective computing in the context of learning environments.

References Blomberg, O. (2011). Conceptions of cognition for cognitive engineering. The International Journal of Aviation Psychology, 21(1), 85–104. Bornstein, R. F. (2003). Psychodynamic models of personality. Handbook of psychology (pp. 117–134). John Wiley & Sons, Inc. Calvo, R. A., Member, S., Mello, S. D., & Society, I. C. (2015). Affect detection: an interdisciplinary review of models, methods, and their applications. IEEE Transactions on Affective Computing, 1(September), 18–37. Cannon, W. B. (1987). The James-Lange theory of emotions: a critical examination and an alternative theory. The American Journal of Psychology, 100(3-4), 567–586. Chamorro-Premuzic, T., & Furnham, A. (2009). Mainly Openness: The relationship between the Big Five personality traits and learning approaches. Learning and Individual Differences, 19(4), 524–529. Corr, P. J., & Matthews, G. (2009). The Cambridge handbook of personality psychology. Cambridge University Press. Darwin, C. R. (1872). Expression of the emotions in man and animals (No. 1). London. Davis, J. I., Senghas, A., & Ochsner, K. N. (2009). How does facial feedback modulate emotional experience? Journal of Research in Personality, 43(5), 822–829. Diseth, Å. (2003). Personality and approaches to learning as predictors of academic achievement. European Journal of Personality, 17(2), 143–155. https://doi.org/10.1002/per.469. D’Mello, S., & Graesser, A. (2012). Dynamics of affective states during complex learning. Learning and Instruction, 22(2), 145–157. Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3-4), 169–200. Eysenck, H. J. (1981). A model for personality. Berlin, Heidelberg: Springer. Folkman, S. (2013). Stress: appraisal and coping. In Encyclopedia of behavioral medicine (pp. 1913–1915). Springer Publishing. Friedman, B. H. (2010). Feelings and the body: the Jamesian perspective on autonomic specificity of emotion. Biological Psychology, 84(3), 383–393. Goldberg, L. R. (1992). 
The development of markers for the Big-Five factor structure. Psychological Assessment, 4(1), 26–42. James, W. (1884). II.—what is an emotion? Mind, os-IX(34), 188–205. Lane, H. C. (2012). Cognitive models of learning. Encyclopedia of the sciences of learning (pp. 608–610). Linnenbrink-Garcia, L., Patall, E. A., & Pekrun, R. (2016). Adaptive motivation and emotion in education: research and principles for instructional design. Policy Insights from the Behavioral and Brain Sciences, 3(2), 228–236. Mao, X., & Li, Z. (2010). Agent based affective tutoring systems: a pilot study. Computers and Education, 55(1), 202–208. Martindale, C., Moore, K., & West, A. (1988). Relationship of preference judgments to typicality, novelty, and mere exposure. Empirical Studies of the Arts, 6(1), 79–96. Mednick, S. (1962). The associative basis of the creative process. Psychological Review, 69(3), 220. Myers, D. G. (2003). Exploring psychology (5th ed. in modules). Worth Publishers.


Nedungadi, P., & Remya, M. S. (2015). Incorporating forgetting in the personalized, clustered, bayesian knowledge tracing (PC-BKT) model. In Proceedings - 2015 International Conference on Cognitive Computing and Information Processing, CCIP 2015. Institute of Electrical and Electronics Engineers Inc. Neisser, U. (2014). Cognitive psychology: classic edition. Taylor and Francis. Ortony, A., Clore, G. L., & Collins, A. (1988). The cognitive structure of emotions. Cambridge University Press. Penley, J. A., & Tomaka, J. (2002). Associations among the Big Five, emotional responses, and coping with acute stress. Personality and Individual Differences, 32(7), 1215–1228. Picard, R. (1997). Affective computing. MIT Press. Principi, R. D. P., Palmero, C., Junior, J. C., & Escalera, S. (2019). On the effect of observed subject biases in apparent personality analysis from audio-visual signals. IEEE Transactions on Affective Computing, 12(3), 607–621. Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of personality: the comparative validity of personality traits, socioeconomic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 2(4), 313–345. Roseman, I. (1979). Cognitive aspects of emotion and emotional behavior. American Psychological Association. Roseman, I. (1984). Cognitive determinants of emotion: a structural theory. Review of personality & social psychology (pp. 11–36). SAGE Publications. Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69(5), 379–399. Soliemanifar, O., Soleymanifar, A., & Afrisham, R. (2018). Relationship between personality and biological reactivity to stress: a review. 
Psychiatry Investigation, 15(12), 1100–1114. Taub, J. M. (2009). Eysenck’s descriptive and biological theory of personality: a review of construct validity. International Journal of Neuroscience, 94(3-4), 145–197. https://doi.org/10.3109/00207459808986443. Vukasović, T., & Bratko, D. (2015). Heritability of personality: a meta-analysis of behavior genetic studies. Psychological Bulletin, 141(4), 769–785. Wong, P. T. P. (2005). Existential and humanistic theories. In Comprehensive handbook of personality and psychopathology (pp. 192–211). Zeigler-Hill, V., & Shackelford, T. K. (2020). Encyclopedia of personality and individual differences. Springer.

Chapter 2

Machine Learning and Pattern Recognition in Affective Computing

Abstract Machine learning (ML) and pattern recognition are at the core of affective computing, as most tasks can be formulated as machine learning problems (e.g., recognition, clustering, prediction, forecasting, etc.). This chapter provides an introduction to ML. The goal of this chapter is to provide an overview of the field, describing the main techniques that are used within affective computing and outlining current trends, aiming to make the book as self-contained as possible. We first introduce the learning problem and provide an overview of the main data modalities considered in affective computing. Then we describe the main ML variants and provide an overview of traditional techniques. Next, we present a section devoted to dimensionality reduction. Furthermore, we review learning methods based on deep learning. Finally, a brief discussion of the current trends is provided.

2.1 Introduction

Machine learning (ML) is the field that seeks to develop computer programs, algorithms, methods, and techniques that improve their performance with experience (Mitchell, 1997). The term experience refers to the interaction with data. ML has received increased attention in the last decade, mostly due to breakthroughs in a number of fields including computer vision (Krizhevsky et al., 2012), natural language processing (Vaswani et al., 2017), and biology (Jumper et al., 2021), among several others. Affective computing is no exception and has also benefited from ML progress. In fact, most tasks within affective computing have been approached with ML techniques (e.g., recognition-classification, clustering, prediction-regression, forecasting, etc.). The ubiquity of ML across fields is due to the fact that ML methods work on data, regardless of their nature. In this chapter, we provide an overview of ML with emphasis on problems and techniques that are closely related to affective computing. Before introducing variants and techniques, we provide a brief overview of the types of information commonly available in affective computing.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Z. Cabada et al., Multimodal Affective Computing, https://doi.org/10.1007/978-3-031-32542-7_2


2.2 Input Data in Affective Computing

Traditional ML models and techniques are applied to tabular data, that is, objects of interest are codified in vector-based representations that are fed into ML models. The representation of objects is obtained by characterizing them according to descriptive attributes or features. Alternatively, representations can be learned directly from raw data. This is the case, for example, of deep learning-based models. This section elaborates on the information modalities that are commonly available in affective computing applications.

Within affective computing, affective states are represented through physiological reactions of human beings. Data derived from such reactions comprise the main information source for ML modeling. Data can come from different sources, called modalities. The most commonly used modalities to characterize affective states are the following:

• Images. These comprise visual information capturing facial expressions, gestures, and any other relevant aspect of a scene. Still images and sequences of images can be available, and imagery can come from sensors that operate outside the visual spectrum of humans, e.g., infrared and thermal imagery.
• Audio. It is commonly associated with speech signals, with the aim of capturing voice patterns from humans, although audio can include nonspeech signals, which are also relevant for some applications.
• Text. Written or transcribed text is a widely used data modality within affective computing.
• Physiological signals. These comprise one of the most accurate data modalities, as they measure physiological reactions of human beings. Signals like EEG, ECG, and others can be effectively tied to affective states. A problem with this type of data is that the sensors are commonly invasive and cannot be used outside a lab.
• Behavioral-based. This is a broad category that includes data derived from patterns of behavior in the context of an environment or the interaction with other people and certain devices, for instance, cellular phone usage patterns, type of websites visited by a user, etc.

Clearly, depending on the complexity of the task, a single modality by itself is rarely able to capture enough information from the user or environment to build an accurate affective computing model. Therefore, it is common to have a combination of information sources available when dealing with an affective computing problem. ML models are fed with any data modality or combination of them, and after the learning phase, models are able to model affective states. As previously mentioned, classical models (see Sect. 2.3) operate on tabular data; this means that objects of interest described by the previously mentioned modalities must be expressed as vectors in a $d$-dimensional space (i.e., $x \in \mathbb{R}^d$). On the other hand, deep learning-based models (see Sect. 2.5) are able to simultaneously learn a representation and a model from raw data. In the following, we provide an overview of ML variants and models referring to data in general.
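As a minimal illustration of what "expressed as vectors in a d-dimensional space" can mean when several modalities are combined, the sketch below simply concatenates per-modality feature vectors into a single vector. Concatenation is only one possible way to combine modalities, and the feature values and their interpretation are invented for the example.

```python
def fuse_features(*modality_vectors):
    """Concatenate per-modality feature vectors into one d-dimensional vector."""
    fused = []
    for vector in modality_vectors:
        fused.extend(float(v) for v in vector)
    return fused

# Hypothetical, hand-made feature values describing one sample.
face_features = [0.12, 0.80, 0.05]   # e.g. values of a facial descriptor
audio_features = [220.0, 0.3]        # e.g. pitch and energy
text_features = [1, 0, 2, 0]         # e.g. word counts

x = fuse_features(face_features, audio_features, text_features)
print(len(x))  # d = 3 + 2 + 4
```

In practice, features from different modalities usually live on very different scales, so some normalization would typically be applied before feeding such a vector to a classical model.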

2.3 Machine Learning Variants and Models This section presents the main machine learning variants and briefly elaborates on traditional models implementing these variants. The reader is advised to follow the references for detailed treatments of the subjects that are only superficially covered herein.

2.3.1 Supervised Learning

Supervised learning is one of the most studied paradigms within ML. Classical supervised learning tasks include spam filtering, face recognition, and text classification, among others. In the context of affective computing, supervised learning has had a major role, as it is widely used for this sort of application. Consider, for instance, the landmark tasks of emotion recognition from facial imagery or speech signals (Wang et al., 2022), the estimation of personality from text or videos (Júnior et al., 2022), etc. Supervised learning models aim to learn a mapping from inputs to outputs as specified by a dataset of labeled samples. In the case of emotion recognition from, say, facial images, inputs are images, and outputs are the emotion labels (either discrete or continuous) associated with the subject’s face.

More formally, in supervised learning, we have an available dataset $\mathcal{D}$, formed by $N$ pairs of $d$-dimensional samples, $x_i \in \mathbb{R}^d$, and labels¹ $y_i \in \{-1, 1\}$, that is, $\mathcal{D} = \{(x_i, y_i)\}_{i \in 1,\ldots,N}$. The goal of supervised learning is to find a function $f_\theta : \mathbb{R}^d \to \{-1, 1\}$ mapping inputs to outputs, i.e., $y_j = f_\theta(x_j)$, such that it can generalize beyond $\mathcal{D}$. The behavior of the function $f_\theta$ is determined by its parameters $\theta$, and these are dependent on the form of $f$ (see below). Usually, $\mathcal{D}$ is split into training and validation partitions. The former is used to learn the parameters of the function $f_\theta$, and the latter is used to estimate the performance of the learned function. The ultimate goal is learning $f_\theta$ from $\mathcal{D}$ such that label predictions can be made for any other instance sampled from the same underlying distribution as $\mathcal{D}$. More specifically, if we denote by $\mathcal{T}$ the test set, formed by instances coming from the same distribution as $\mathcal{D}$ but that do not appear in that set, we seek a function that maximizes performance on $\mathcal{T}$, but without having access to it. There is a wide variety of ways of defining the function $f_\theta$; consider, for instance:

¹ Please note that labels could also be real, i.e., $y_i \in \mathbb{R}^p$ (for regression tasks), or categorical, i.e., $y_i \in \{C_1, \ldots, C_k\}$; for clarity, we instead describe a binary classification task.


• Linear functions. Data samples $x_i$ are seen as points in a $d$-dimensional space; these methods aim to find a hyperplane (e.g., $f_\theta(x) = w^T \phi(x) + b$) such that (i) it separates samples into (two) classes and (ii) the hyperplane is a linear combination of the features or samples in $\mathcal{D}$. These types of methods include linear regression, perceptrons, support vector machines, and linear discriminant analysis, among others. The parameters $\theta$ of this type of model are commonly the weights $w$ and the intercept term $b$. See Guyon and Stork (2000) for an introduction to linear models for classification.
• Decision trees. Under this type of model, the function $f_\theta$ is codified in terms of rules conditioning the values that features can take (e.g., $x_{i,j} > c$, meaning the value of the $j$th feature of the $i$th instance is greater than a constant $c$). A tree is formed in such a way that informative attributes are put close to the root. Instances in $\mathcal{D}$ are used to build the tree. Each instance should be associated with a path in the tree, where the path is determined by the values that features take. The tree is built in such a way that leaves contain samples of a single class. See Mitchell (1997) for an introduction to decision tree-based classifiers.
• Probabilistic models. These aim to infer the probability of a class or classes given the inputs, $P(y_i = 1 \mid x_i)$. This can be done directly (with discriminative probabilistic models) or indirectly by approximating the joint distribution $P(X, Y)$. Examples of this type of model include logistic regression and Bayesian classifiers, respectively. The function $f_\theta(x_i)$ can then be a threshold on the aforementioned probability estimate; the parameters $\theta$ here comprise the probability distribution $P$. See Bishop (2007) for a detailed treatment of probabilistic models for supervised learning.
• Functions based on similarity and distances. Also called instance-based methods, these model the function $f_\theta$ in terms of a subset of instances in $\mathcal{D}$ that are similar to the example under evaluation ($x_i$). Representative methods include $k$-nearest neighbor ($k$-NN) classifiers, radial basis functions, and learning vector quantization. For $k$-NN, the decision function has the form $f_\theta(x_i) = \arg\max_{C_k \in \mathcal{C}} \sum_{\forall y_j : x_j \in \mathcal{N}} \delta(C_k, y_j)$, where $\mathcal{N} = \{x_1, \ldots, x_K\}$ is the set of the $K$ most similar instances to $x_i$, $\mathcal{C} = \{C_1, \ldots, C_K\}$ is the set of classes, and $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise. Usually, this type of model does not have a learning phase; instead, all of the effort is dedicated to inference. An introduction to instance-based learning is provided in Mitchell (1997).
• Ensembles. These models combine the outputs of multiple functions $f_\theta^j$, where each $f_\theta^j$ is accurate enough and diverse with respect to the rest of the models. The $f_\theta^j$ functions could be any other supervised learning model. Popular and effective ensemble-based models are random forest, AdaBoost, and gradient boosting. A detailed treatment of ensemble learning can be found in Zhou (2012).

We have described some of the most common methodologies for supervised learning, although there are plenty of other variants. For a detailed treatment of these types of models, the reader is referred to Bishop (2007), Mitchell (1997), and Hastie et al. (2009). It is important to emphasize that regardless of the type of function, the goal is to obtain the $f_\theta$ that maximizes performance over $\mathcal{T}$ without having access to any sample in $\mathcal{T}$. In the context of affective computing, supervised learning models are used for recognition, classification, detection, etc.
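To make the supervised setting concrete, the following is a minimal sketch of an instance-based classifier (k-NN with majority voting, as in the decision function given above) evaluated on a held-out validation partition. The toy dataset, the 80/20 split, the use of Euclidean distance, and all function names are assumptions of this example rather than prescriptions from the text.

```python
import math
import random
from collections import Counter

def knn_predict(train, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, held_out, k=3):
    """Fraction of held-out samples whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

# Toy dataset D: two well-separated groups in R^2 with labels in {-1, 1}.
rng = random.Random(0)
D = [((rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)), -1) for _ in range(50)]
D += [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(50)]
rng.shuffle(D)

# Training partition (used for prediction) and validation partition
# (used only to estimate performance).
train, validation = D[:80], D[80:]
print(accuracy(train, validation))
```

Note that there is no explicit training step: the "model" is the training partition itself, and all the work happens at prediction time, which is the behavior described for instance-based methods.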

2.3.2 Unsupervised Learning

Unsupervised learning is another variant of ML, and it differs from supervised learning in that labels for data are not available (or are not used for the modeling process). That is, these methods operate on datasets of the type $\mathcal{D} = \{x_i\}_{i \in 1,\ldots,N}$. The goal of this ML variant is to find structure in data, that is, the way in which objects $x_i$ relate to each other in $\mathcal{D}$. The most common task in unsupervised learning is that of clustering (Aggarwal & Reddy, 2014). The goal of this task is to automatically find groups of data in such a way that elements in the same group are similar, under certain criteria, to each other and dissimilar to elements of other groups. An example is the clustering of facial landmarks to identify prototypical expressions that could be associated with emotions. Clusters can be disjoint or have overlaps. They could be organized in hierarchies and could be exhaustive, and the membership to a cluster could be probabilistic, etc. The similarity measure is normally defined in terms of the proximity between elements in a multidimensional space, and there are a large number of similarity measures used to group elements. Commonly, the similarity or distance among objects is estimated with the Euclidean distance, although this is dependent on the type of data being analyzed. Among the most used clustering techniques, we can find the following (see Aggarwal and Reddy (2014) for a detailed review on clustering and the main clustering techniques):

• k-means. It is the most popular and one of the most effective clustering methods. The clusters are defined according to a set of $k$ centroids, where each data point in $\mathcal{D}$ is associated with its closest centroid. Centroids are randomly initialized; then an iterative process begins in which points in $\mathcal{D}$ are associated with centroids, and then centroids are updated as $c_k = \frac{1}{|\mathcal{D}_k|} \sum_{\forall x_i \in \mathcal{D}_k} x_i$, where $\mathcal{D}_k$ is the set of points currently associated with centroid $c_k$. This process is iterated until convergence or until a predefined stop criterion is met.
• k-medians. This method is similar to $k$-means in that $k$ centers are sought; however, in $k$-medians, a centroid should be tied with an instance in $\mathcal{D}$; thus, this is a problem of choosing $k$ centers from $\mathcal{D}$.
• Gaussian mixture models. It is a probabilistic clustering model that assumes that each data point $x_i \in \mathcal{D}$ is generated by a finite mixture of Gaussian distributions. Each Gaussian $\mathcal{N}_k$ is defined by its parameters $\theta_k = \{\mu_k, \sigma_k\}$. The expectation-maximization algorithm is commonly used to find the parameters of the model.
• DBSCAN. This is a density-based clustering method in which clusters are defined as regions of the $\mathbb{R}^d$ input space that present a higher density of data points.


Other unsupervised ML tasks include data projection, dimensionality reduction (see Sect. 2.4), and density estimation. Unsupervised learning has been broadly used in affective computing, mainly for data analysis and visualization; usually, unsupervised learning is adopted as a preprocessing or data preparation step before supervised learning techniques are implemented.
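The k-means procedure described in this section (assign each point to its closest centroid, then recompute each centroid as the mean of its assigned points) can be sketched as follows. Initializing the centroids by sampling k data points, the iteration cap, and the handling of empty clusters are choices made for this example only.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    for _ in range(iterations):
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On data that is less clearly separated than this toy example, the result can depend on the initialization, which is why k-means is often run several times with different seeds.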

2.3.3 Other Learning Variants

In addition to supervised and unsupervised learning, there are other ML variants that we briefly mention here; the reader is encouraged to follow the corresponding references for further information:

• Semi-supervised learning. It aims to learn from both labeled and unlabeled samples. Commonly, it is assumed that a small set of labeled instances $\mathcal{L} = \{(x_i, y_i)\}_{i \in 1,\ldots,N}$ is available and that a large set of unlabeled instances coming from the same distribution as $\mathcal{L}$ is also available, that is, $\mathcal{U} = \{x_i\}_{i \in 1,\ldots,M}$, with $M \gg N$. Therefore, these methods aim to propagate the information derived from labels in $\mathcal{L}$ to samples in $\mathcal{U}$. This is usually done iteratively. Consider, for example, self-training-based methods (Amini et al., 2022): a model $f_\theta$ is built with the labeled samples $\mathcal{L}$; then it is used to generate pseudo-labels for samples in $\mathcal{U}$; some of these pseudo-labeled examples are then moved into $\mathcal{L}$. The process is repeated until convergence or until $\mathcal{U}$ is empty.
• Self-supervised learning. These methods benefit from learning a supervised task that is artificially generated via unsupervised learning. Commonly, the goal is to initialize a model $f_\theta$ that is then further optimized for a supervised learning task. That is, $f_\theta$ first learns to solve an artificial but related task (called a pretext task), and then it is fine-tuned to the task of interest, which is a supervised learning task (see Ericsson et al. (2021) for an introduction to this variant of ML).
• Reinforcement learning. It deals with models that learn from the interaction with their environment. Under this learning scenario, the model is learned by rounds (episodes) in which an agent (the model) performs actions that modify the state of an environment; these actions are rewarded (or punished), and according to this feedback, the agent learns to solve a task. Robotics problems have been approached with reinforcement learning, but there are many other tasks that have been successfully approached with this formulation (see Sutton and Barto (2018) for an introduction to this subject).
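The self-training loop mentioned under semi-supervised learning can be sketched as follows. The base model here is a nearest-centroid classifier, confidence is measured as the gap between the distances to the two closest class centroids, and a fixed number of the most confident pseudo-labeled samples is moved into the labeled set per round. All three are assumptions chosen to keep the example short; they are not the specific method of Amini et al. (2022).

```python
import math

def fit_centroids(labeled):
    """Nearest-centroid model: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}

def predict_with_margin(centroids, x):
    """Return (label, margin); a larger margin means a more confident prediction."""
    ranked = sorted((math.dist(x, c), y) for y, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def self_train(labeled, unlabeled, batch=2):
    """Repeat: fit on L, pseudo-label U, move the most confident samples into L."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        model = fit_centroids(labeled)
        ranked = sorted(unlabeled,
                        key=lambda x: predict_with_margin(model, x)[1],
                        reverse=True)
        for x in ranked[:batch]:
            labeled.append((x, predict_with_margin(model, x)[0]))
            unlabeled.remove(x)
    return fit_centroids(labeled)

# Hypothetical data: two labeled samples and five unlabeled ones.
L = [((0.0, 0.0), -1), ((4.0, 4.0), 1)]
U = [(0.5, 0.2), (3.8, 4.1), (1.0, 1.2), (3.0, 3.1), (0.1, 0.9)]
model = self_train(L, U)
print(predict_with_margin(model, (0.2, 0.3))[0])
```

This version stops when the unlabeled set is empty; a convergence test, or a minimum confidence below which samples are never pseudo-labeled, are common alternatives.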

2.4 Dimensionality Reduction

In the previous sections, we have presented the main ML variants and described common techniques for supervised and unsupervised learning for tabular data. In this context, we have assumed that the data of interest lie in a $d$-dimensional space, i.e., $x_i \in \mathbb{R}^d$. While the previously described methods do not make assumptions on the value of $d$, in realistic scenarios (including those from affective computing) $d$ can take large values. This means that the associated $d$-dimensional representations can be overcomplete and contain irrelevant, redundant, and noisy information, and/or they can be larger than what a method can process in a reasonable time. Additionally, it is often important to visualize data to draw insights, for example, by looking at the structure of data and at how samples per class are distributed. Clearly, this is not possible if the input data lie in high-dimensional spaces. To alleviate these issues, dimensionality reduction techniques are commonly applied to the feature representation obtained via feature extraction (Escalante & Morales, 2022). Briefly, dimensionality reduction² aims at transforming the original feature representation into a lower-dimensional one such that the information lost is minimized; if possible, dimensionality reduction methods can also identify (and keep) the most informative features for further analysis or to generate a visualization of data. This topic has been studied for a while, and nowadays a vast pool of dimensionality reduction techniques is available, where techniques differ in their assumptions and their underlying goals. In the supervised learning context, dimensionality reduction aims at transforming the input space pursuing the following goals: (1) the dimensionality of the input space is reduced; (2) the new input space keeps/has the most informative dimensions; and (3) irrelevant/redundant features are discarded.³ Additional goals can be considered by dimensionality reduction methods, like sparsity, distance-based restrictions, etc.
Dimensionality reduction can be formulated as the problem of finding a mapping function ., which maps the input space .Rd to another space .Rp such that .p