Opinion Analysis in Interactions: From Data Mining to Human-Agent Interaction 9781119649380, 1119649382


English Pages 167 pages [157] Year 2019

Table of contents :
1. Oral and Written Interaction Corpora. 2. Analyzing User Opinions in Human-human Interactions. 3. Analyzing User Opinions in Human-Agent Interactions. 4. Socio-emotional Interaction Strategies: the Case of Alignment. 5. Generating Socio-emotional Behaviors.





Opinion Analysis in Interactions

Series Editor Patrick Paroubek

Opinion Analysis in Interactions: From Data Mining to Human–Agent Interaction

Chloé Clavel

First published 2019 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com

© ISTE Ltd 2019
The rights of Chloé Clavel to be identified as the author of this work have been asserted by her in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2019940674
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-419-3

Contents

Preface

Introduction

Chapter 1. Oral and Written Interaction Corpora
1.1. Oral H–H corpora: call centers and satisfaction surveys
1.1.1. CallSurf and Vox Factory: call center corpora
1.1.2. Satisfaction surveys: the NPS07-09 corpus
1.2. Written H–H corpora: forums
1.2.1. Forum data: the WebGRC corpus
1.2.2. External opinion analysis corpora used as points of reference
1.3. Oral H–A corpora: virtual assistants and robots
1.3.1. The Semaine corpus: toward a H–A interaction scenario
1.3.2. The WoZ H–A negotiation corpus
1.3.3. The UE-HRI human–robot corpus
1.4. Written H–A corpus: chatbot
1.5. Comparative study of different corpora
1.5.1. Company corpora versus academic corpora in an H–H context
1.5.2. H–A corpora
1.6. Conclusion

Chapter 2. Analyzing User Opinions in Human–human Interactions
2.1. From linguistic modeling to machine learning
2.1.1. Rule-based system and machine learning
2.1.2. Opinion-specific strings and categorization
2.1.3. Extracting strings of words linking topics and opinions
2.2. Learning to account for linguistic specificities
2.2.1. Specific characteristics of written conversation: forums
2.2.2. Specific features of conversational speech
2.3. Conclusion

Chapter 3. Analyzing User Opinions in Human–Agent Interactions
3.1. Choice of phenomena to study in relation to applications
3.1.1. Modeling user likes and dislikes
3.1.2. Characterizing phenomena for problem interactions
3.2. Rule-based system to take into account interaction
3.2.1. Methodology: corpus analysis
3.2.2. Analyzing user utterances
3.2.3. Taking account of the dialogical context
3.2.4. Taking account of the thematic structure of interactions
3.3. Hybrid approach for taking account of interactions
3.3.1. Extraction of linguistic characteristics
3.3.2. HCRF models
3.4. Evaluation for human–agent interactions
3.4.1. Annotation for system evaluation
3.4.2. Evaluation of different analysis levels
3.5. Conclusion

Chapter 4. Socio-emotional Interaction Strategies: the Case of Alignment
4.1. Theoretical models
4.1.1. Selected socio-emotional behaviors
4.1.2. Theoretical bases for conversational analysis
4.2. Qualitative and quantitative corpus analysis
4.2.1. Analyzing the communicative functions of HR
4.2.2. Measurement and quantification of verbal alignment
4.3. Computational model of verbal alignment
4.3.1. Planning the emotional stance of the agent
4.3.2. Other-repetition module
4.4. Method for evaluating an alignment module
4.4.1. Post-interaction questionnaire
4.4.2. Qualitative analysis of recordings
4.5. Conclusion

Chapter 5. Generating Socio-emotional Behaviors
5.1. Generating agent prosody
5.1.1. Methodological choices
5.1.2. Intonation, dialog acts and social attitudes
5.2. Intonation, facial expressions and sequence mining
5.2.1. Automatic extraction of characteristics and symbolization
5.2.2. Parameters for signal temporality
5.2.3. Initial results
5.3. Generation of coverbal gestures for agents
5.3.1. Methodological choices
5.3.2. Image schemas: from text to gesture
5.3.3. Analysis of verbal content
5.3.4. Analysis of prosodic content
5.3.5. Illustration
5.4. Conclusion

Conclusion

References

Index

Preface

This book is dedicated to the analysis of opinions in human–human and human–agent interactions. We shall present methods based on artificial intelligence (through learning models of socio-emotional behaviors, combining symbolic and machine learning methods) and affective computing (analysis and synthesis of socio-emotional signals).

The work presented here is essentially that which was submitted for my Habilitation à diriger des recherches (HDR) on May 29, 2017. It was written from an essentially personal perspective and is not intended to provide exhaustive coverage of the field in question. My vision of the subject evidently bears the marks of my own research career in both academic and industrial settings. Following my doctoral thesis on acoustic recognition of emotions [CLA 07], I took up a research post in the R&D center at Thales Research and Technology. Later, I extended my field of study from acoustic analysis to natural language processing in the context of opinion analysis studies carried out at EDF Lab. Since 2013, I have held a teaching and research post at the LTCI (Laboratoire Traitement et Communication de l’Information, Information and Communications Processing Laboratory) at Telecom-ParisTech.

My current research builds on the work I carried out at EDF Lab on the analysis of opinions and feelings in written and oral interactions, encompassing the treatment of emotions and social attitudes in human–agent interactions. More specifically, my aim is to examine the modes of expression (verbal and prosodic) in the context of interactions between a human individual and an animated conversation agent, and to support the development of socially and emotionally competent agents. This book presents the various studies that I have carried out on these subjects, grouped around two main axes:
– opinion analysis in human–human and human–agent interactions;
– socio-emotional interaction strategies and the generation of socio-emotional behaviors in human–agent interactions.

The research presented here is rooted in the fields of natural language processing, applied machine learning, affective computing and social human–machine/human–robot interactions. It was developed within a multidisciplinary context, drawing on fields such as psychology, sociology and linguistics. This multidisciplinary facet is crucial given the nature of the subject, from the expression of opinions to social attitudes.

The majority of the work presented here was carried out by research interns, doctoral and postdoctoral students working under my supervision (listed chronologically):
– interns: Charlotte Danesi, Camille Dutrey, Rachel Bawden and Jessica Durand;
– doctoral students: Rémi Lavalley, Camille Dutrey, Caroline Langlet, Thomas Janssoone, Irina Maslowski and Valentin Barrière;
– postdoctoral students: Sabrina Campano, Guillaume Dubuisson Duplessis, Brian Ravenet and Atef Ben-Youssef.

This work also draws on the results of collaborative efforts, notably in the context of co-supervision of the doctoral and postdoctoral students named above. The studies of opinion analysis methods for interactions were carried out in collaboration with Anne-Laure Guénet, Delphine Lagarde, Anne Peradotto and Alina Stoica Beck at EDF, and with Patrice Bellot and Marc El-Béze of the Avignon Informatics Laboratory. The work on oral data from call centers, notably the disfluency analysis presented in Chapters 1 and 2, is the fruit of a collaboration with the LIMSI-CNRS, the Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (Informatics Laboratory for Mechanics and Engineering Sciences) at the University of Paris 11 (Sophie Rosset and Ioana Vasilescu) and with the LPP, the Laboratoire de Phonétique et Phonologie (Phonetics and Phonology Laboratory) at Paris 3 (Martine Adda-Decker).

My research on human–agent interactions (presented in Chapters 3, 4 and 5) coincided with my arrival at the LTCI, and benefited hugely from a long and fruitful collaborative partnership with Catherine Pelachaud. The work on conditional random fields presented in Chapter 3 was made possible by expertise provided by the LTCI in the context of a collaboration with Slim Essid; the study of facial expression generation, described in Chapter 5, would not have been possible without the support of Kévin Bailly (ISIR). The prosodic behavior generation and alignment measurement techniques presented in Chapters 4 and 5 owe a good deal to scientific discussions with Frédéric Landragin from Lattice.

Finally, I wish to thank Björn Schuller, Dirk Heylen and Nicolas Sabouret, the reviewers of my HDR assessment, along with Frédéric Bechet, Mohamed Chetouani and Patrick Paroubek, the examiners, for their probing questions and stimulating discussion, which provided inspiration and motivation for my subsequent work.

Chloé CLAVEL
May 2019

Introduction

From Opinion Mining to Human–agent Interactions

The automatic analysis of opinions in interactions is a rapidly expanding domain, encouraged, on the one hand, by the challenges presented by practical applications and, on the other hand, by the growing presence of online platforms for public expression and media. This development offers exciting new possibilities in terms of critical expression and action over the Internet [CAR 13], and the quantity and variety of data available are increasing. In terms of natural language processing, the challenge is to analyze expressions of opinion automatically for the purposes of analyzing social trends. There are many possible applications for these techniques: analyzing citizens’ opinions of candidates at election time, analyzing Internet users’ opinions of a product (or product reputation), identifying target clients for recommendation systems, evaluating the success of an advertising campaign, etc.

Running parallel to this social web phenomenon, social robotics, and human–agent interactions as a whole, offer fertile ground for the analysis of opinions in interactions between humans and virtual agents. For example, companion robots are used to provide assistance to users (helping maintain independence) and for entertainment purposes. In this context, knowledge of the user and their profile is critical in order to build a social link between the person and the robot. Using this profile (particularly user preferences, in this case), a companion robot may, for example, choose subjects to discuss when interacting with the user, or recommend products, music or entertainment that they may enjoy.

The possible applications of the domain are manifold, as are the challenges it represents. In recent years, virtual agents have come into use for managing client relations on websites, and a number of companies have developed their own virtual assistants (such as Alexa (Amazon), Siri (Apple) and Cortana (Microsoft)). Whilst these virtual assistants are already widely used, further work on the social component of interactions is crucial in order to improve the fluidity and natural feel of interactions. A further area in which socio-emotional behaviors in human–agent interactions may be taken into account is that of Serious Games, in which users may be trained to handle different situations in conjunction with a virtual agent. For example, in [YOU 15], users can work on improving their social behaviors in the context of virtual job interviews.

The research on opinion analysis presented in this work covers two different interaction situations:
1) human–human interactions collected online and from company data;
2) human–agent interactions (embodied conversational agents, robots).

Opinion detection and analysis approaches have been welcomed with open arms by the machine learning community, although the fields of natural language processing and affective computing have sometimes been left out. In this work, we shall make use of research carried out in all three fields (affective computing, machine learning and natural language processing). We shall present detection methods including logico-semantic rules and machine learning methods, choosing the most appropriate option for different scientific problems and in response to different levels of maturity. Rule- and knowledge-based methods constitute an essential first step in defining the outlines of new scientific problems. For instance, we defined logico-semantic rules to develop an initial user opinion detection system in a context of human–agent interactions, a subject not previously covered in the literature.

Our work focuses on a number of research questions woven through every chapter. In this introduction, we shall establish the scientific context and specific elements of each of these main questions:
– the first research question relates to theoretical opinion models (Q1, presented in section I.1);
– the second research question relates to computational opinion models (Q2, presented in section I.2);
– the third and fourth questions relate to the creation of socio-affective conversation agents (Q3 and Q4 in section I.3).

Figure I.1. Analysis of opinions in interactions as presented in this book. [Diagram not reproduced. Recoverable labels: “Laura, my virtual consultant”; Chapters 1 and 2: human–human interactions, written (forum) and oral (call center); Chapters 1 and 3: human–agent interactions, written (chatbot) and oral (interactions with ACAs or robots); language specificities: disfluent speech, internet-specific elements (lol, BRB, etc.); computational models (Q2): logico-semantic rules, machine learning; terminological aspects, models derived from psychology and linguistics (Q1); user/client opinions.]

These research questions highlight common themes and articulations between the chapters of this book. They form a bridge between the two areas of research that we have chosen to present:
1) the development of opinion detection systems for interaction analysis purposes (written and oral¹), in the context of both human–human (Chapter 2) and human–agent interactions (Chapter 3), as shown in Figure I.1;
2) the development of virtual agents with the ability to express opinions and, more broadly, to present socio-emotional behaviors (Chapters 4 and 5), as shown in Figure I.2.

¹ i.e. concerning the detection of opinions in written text and in transcriptions of speech.

Figure I.2. Human–agent interactions and socio-emotional behaviors: the three aspects presented in this book, in Chapters 4 and 5, and the associated research questions Q1, Q2, Q3 and Q4. [Diagram not reproduced. Recoverable labels: socio-emotional behaviors (social attitudes, emotions, opinions), with a focus on verbal and prosodic content; analysis and detection, Chapter 3 (Q1 and Q2); interaction strategies, Chapter 4 (Q3), e.g. verbal alignment strategies (USER: “I like this painting”; AGENT: “I like this painting too”); generation, Chapter 5 (Q4), e.g. generation of multi-modal signal sequences.]

I.1. Terminologies and theoretical models of opinions

The opinion detection problem is often reduced to a simple question of positive/negative classification. Nevertheless, the precise definition of the phenomenon and of what differentiates opinions from emotions or feelings is important, and the choice of a specific phenomenon differs in relation to the scientific problem in question. In this section, we shall present work carried out by different communities on the terminological and underlying theoretical aspects of the opinion phenomenon.

I.1.1. Overlapping terminologies used by different communities

The multidisciplinary nature of this domain of research has led, on the one hand, to the use of different terminologies to denote similar phenomena (emotion, opinion, feelings, moods, attitudes, interpersonal stance, personality, affect sensing, judgement, assessment, argument, etc.); on the other hand, the same terminology may be used to denote different phenomena. The opinion mining community tends to use terms such as opinion, feeling and affect to refer to different phenomena. However, existing work rarely gives a precise and in-depth definition of exactly what is meant by each of these terms. The human–agent interaction community draws on psychological theories, used to model emotion-related phenomena, affect sensing and mood, and, more recently, on social interactions. Clavel and Callejas [CLA 16b] review the state of the art of these terminologies and theoretical models, as described below.

Scherer [SCH 05] proposes a distinction between different phenomena. Notably, he establishes a difference between emotions and attitudes. Emotions are defined as phenomena of short duration, including a physiological reaction, following the evaluation of a major stimulus (as in the case of fear, sadness, joy or anger). Attitudes are defined as predispositions toward objects and/or persons (as in the case of preferences). Scherer also defines the interpersonal stance, or social attitudes, as an affective disposition toward another person in the context of an interaction, for example politeness, warmth or distrust.

Specific studies of verbal content have been carried out in the field of linguistics. Like Scherer, Martin and White [MAR 05] prefer the term “attitudes” to those of “feelings” or “opinions”. They define an attitude as something concerned with feelings, including emotional reactions, judgments of behavior and evaluations of things [MAR 05, p. 35].
The authors distinguish three types of attitudes:
– affects (personal reactions relating to an emotional state);
– judgments (the fact of assigning qualities, such as tenacity, to individuals as a function of normative principles);
– appreciation (the evaluation of an object, product or process).

Munezero et al. [MUN 14], working in the context of opinion mining, also propose definitions of feeling and opinion phenomena, backed up by textual examples. The authors define affects as preceding emotions, prior to awareness of the associated feeling. Consequently, according to this view, affects are not expressed in linguistic form. Distinctions are also made between emotions and feelings and between opinions and feelings:
– emotions differ from feelings in terms of duration (emotions have a shorter duration) and by the presence of a target (emotions are not always connected to an object);
– opinions are personal interpretations of information and are not necessarily emotionally charged in the same way as feelings.

The studies presented in this book are based on the concept of attitude defined by Martin and White. This definition encompasses all of the phenomena relating to opinions, providing subcategories that circumscribe phenomena as a function of a scientific context.

I.1.2. Three scientific models from two different communities

The theoretical models underpinning opinion and emotion analysis systems also differ according to the community in question (opinion mining or human–agent interactions) and to the chosen application. In [CLA 16b], we identified three major families of theoretical models used in defining opinion-related phenomena:
– dimensional models;
– categorical models;
– models based on evaluation theory.

I.1.2.1. Dimensional models

The most common tasks applied in the context of opinion mining relate to the detection of polarity (positive vs. negative) and intensity [WOL 13], [OSH 09]. Polarity and intensity are two dimensions used to describe opinions and may be linked to theories based on a dimensional model [RUS 80] of opinions/emotions. This descriptive mode represents socio-emotional phenomena along abstract axes, such as valence/activation. Polarity detection in particular can be used to simplify the opinion analysis problem, segmenting the polarity axis into two or three classes (is the opinion expressed in a text broadly positive, negative or neutral²?) and is used, for example, to analyze opinions concerning a brand (e-reputation) or in analyzing movie reviews. For example, the Deft’07 text mining challenge [GRO 07] concerned the attribution of opinions (positive, negative or neutral, where applicable) to a corpus of reviews of books, shows, video games, scientific articles and parliamentary debates. Opinion polarity or emotion valence analysis is also used in the domain of human–agent interactions to manage negative emotions within the context of interactions [SMI 11].

² According to context, “neutral” may mean that no opinion is expressed in the text, or that both positive and negative opinions are present.

I.1.2.2. Categorical models

Other studies (for example [PER 13]) have drawn on categorical models developed in the field of psychology [EKM 99, IZA 71, PIC 00, PLU 03, WHI 89] concerning the detection of categories of opinion or emotion in textual data. The categorical approach consists of assigning appropriate predefined lexical items, or labels, to socio-emotional phenomena. This approach constitutes the most intuitive means of describing specific phenomena using categories drawn from everyday language [CLA 07]. Categories are defined by tracing hard lines within the perceptive space. Each category corresponds to a prototype [KLE 90] to which other similar manifestations may be linked.

The way in which category lines are drawn is heavily dependent on the data in question. In the case of fully simulated corpora, we look for illustrative examples of a predefined prototype. All emotional manifestations contained in the corpora must converge strongly toward the prototype. In the case of spontaneous corpora, expressions are grouped around an abstract prototype. The diversity of contexts in which emotions emerge within spontaneous speech heightens the complexity of the task. The classes used in opinion analysis are thus highly dependent on the context of application and on the data in question. Examples include:

– detection of agreement or disagreement [GAL 04] in recordings of meetings;
– detection of insulting messages on the Internet [SPE 97];
– detection of subjectivity [TSY 12];
– detection of frustration in drivers [BOR 10], for educational support systems [LIT 06] or in computer games designed for children [YIL 11];
– representation of emotion detected in a text through avatars [NEV 10b, ZHA 08].

I.1.2.3. Models based on evaluation theory

Models based on evaluation theory provide a richer basis for analysis and have been shown to be effective in the context of human–agent interactions, although they have yet to be widely adopted by the opinion mining community. The most popular model of this type within both communities is the Ortony, Clore and Collins (OCC) model, based on the cognitive structure of emotions. It has been used in the context of opinion mining for textual affect sensing [SHA 09], and is particularly popular in the agent community for generating emotional behaviors for agents, classifying events, objects and actions in order to define the emotion which the agent should express [VAL 09].

However, different communities also use their own evaluation theories. In the opinion mining community, work has been carried out on another evaluation theory, providing a definition of attitudes or evaluation through language [MAR 05]. This theory is used to represent an opinion (an attitude) as an evaluation of a target (e.g. a service or a product) by a source (e.g. the person communicating) [BLO 07]. Alternatives to the OCC model have also been used among the agent community, including Scherer’s appraisal model, which breaks the evaluation process down into different steps (such as the evaluation of a new element), or the EMA dynamic appraisal model [MAR 09].

Another theoretical approach used by researchers working in a similar field to opinion mining involves argument models.
A graphic formalization of argument models from the field of philosophy [TOU 03a] was proposed by [CAB 13] for the purposes of argument mining of debates on social networks. This formalization enables the identification of structures connecting opinions, for example by linking opinions corresponding to rebuttals and claims.

A first effort to identify different terminologies and theoretical models was made by the W3C with the development of EmotionML³ (Emotion Markup Language). The aim was to define a common language for annotating emotions. This project has now been extended to describing feelings in linked data sources, remaining within the W3C framework [SÁN 16].

³ https://www.w3.org/TR/emotionml/.

In accordance with our decision to examine opinion phenomena in connection with the concept of attitude, the work presented in this book is based on Martin and White’s theory of evaluation in language, which provides a description of verbal realizations of attitudes; a symbolic formalization of expressions may thus be developed and integrated into a detection model.

I.1.3. Research question and articulations in this work

In [CLA 07], we reflected on the best theoretical model to use in constructing a computational model in a different context, that of acoustic analysis and the detection of fear-type emotions in abnormal situations. The work presented in this book extends our investigation to socio-emotional behaviors, including linguistic phenomena associated with opinions and feelings. We shall consider (Q1) the relevance of different theoretical models for constructing a computational model (linguistic and prosodic) depending on the application (social network analysis, customer relations management, recommendations, human–agent interactions and social robotics). This research question will be addressed throughout the book:
– a first, outline response to this question is presented in Chapter 2, with the definition of the concept of satisfaction in marketing terms based on company data for a customer relations application;

xxii

Opinion Analysis in Interactions

– the theoretical modeling question is considered in greater detail in Chapter 3, using Martin and White’s appraisal theory [MAR 05] to construct: - “like” and “dislike” models for users in human–agent interactions, with the aim of establishing a user proﬁle, - models of customer/user opinions in interaction with a chatbot; – in Chapter 4, we use Martin and White’s approach to model the appreciations of visitors to a museum and to propose an agent with the capacity to adapt to user appreciations. In Chapter 5, our work is extended further to address phenomena relating to social attitudes. I.2. Computational models of opinions The literature for opinion mining and sentiment analysis, i.e. concerning detection from written texts or transcribed speech, covers three broad types of methods: rule-based methods, statistical methods and a hybrid method featuring elements of both. Opinions are rarely expressed in a simple form, e.g. “this product is bad”. Methods must therefore respond to a number of challenges: – treatment of negation and intensiﬁers [MOI 07, NEV 10a, TAB 11] in order to process expressions such as “I wouldn’t really consider this to be a good movie”; – target and source identiﬁcation [BLO 07] to treat expressions such as “I’m satisﬁed with the contact I’ve had with EDF, but not with their services”; – treatment of metaphors – for example, the expression “global warming” has a stronger negative implication than “climate change” [AHM 11, ZHA 09] – and anaphors4 [MOR 12]; – treatment of comparisons (e.g. “this movie was not as good as the last one”) and sarcasm “I strongly recommend you watch this movie – if you need something to help you sleep” – ironic expressions occur frequently and are hard to detect [REY 14];

4 Anaphor resolution and relationship extraction enable us to identify the target of an opinion in cases where this target is indicated by a personal pronoun.

Introduction


– treatment of structural and language specificities, e.g. emoticons and hashtags in tweets [RUS 11];
– treatment of idiosyncratic contexts (different individuals do not express opinions in the same way, personality has a role to play) or the social and political context in which an opinion is expressed.

I.2.1. Rule-based methods

Rule-based methods examine the occurrence of words from opinion lexicons [PEN 01] and make use of linguistic rules or extraction patterns using these lexicons and different levels of textual analysis, from morpho-syntactic analysis (inflected forms, lemmas, grammatical categories, etc.) to other rule outputs [MOI 07, OSH 09]. Methods of this type, featuring formal representations of utterances, can be used in response to the classic opinion mining problems presented in the previous section. For example, in the context of negation and intensifier treatment, researchers [MOI 07, NEV 10a] made use of intercomponent dependency analysis with propagation or inversion rules to manage polarity on different levels of the syntactic structure.

Rule-based methods enable easy integration of resources and knowledge [POR 14], and transform theoretical models derived from psychology into computational approaches. Shaikh et al. [SHA 09], for example, propose an interpretation of the OCC model [ORT 90] using dedicated linguistic rules, while Neviarouskaya et al. [NEV 10a] use a compositional approach to distinguish the affect, judgment and appreciation components of Martin and White’s model [MAR 07]. These methods also seem to be suitable for modeling specific language features and the interaction context. These methods will be discussed further in Chapter 3, applied in modeling the context of human–agent interactions in order to assist in detecting user opinions.

Note, however, that these models are expensive to develop, as they require specialist linguistic expertise and in-depth knowledge of the input data. The developed models are relatively specific and can only be applied to new tasks following modification by an expert.
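The inversion and intensification rules mentioned above can be illustrated with a deliberately simplified sketch. The lexicon entries, the weights and the fixed look-back window are all assumptions made for illustration; the cited systems [MOI 07, NEV 10a] work on syntactic dependency structures, not on a token window.

```python
# Toy resources: every entry and weight below is an illustrative assumption,
# not taken from a published opinion lexicon.
POLARITY = {"good": 1.0, "great": 2.0, "bad": -1.0, "annoying": -1.5}
NEGATIONS = {"not", "never", "no", "wouldn't", "isn't", "don't"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "quite": 1.2, "slightly": 0.5}


def score(sentence, window=3):
    """Sum lexicon polarities, applying modifiers found in the preceding window."""
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        value = POLARITY[token]
        # Look back over at most `window` tokens for modifiers.
        for previous in tokens[max(0, i - window):i]:
            if previous in INTENSIFIERS:
                value *= INTENSIFIERS[previous]  # intensification rule
            elif previous in NEGATIONS:
                value = -value                   # inversion rule
        total += value
    return total
```

With the default window, `score("I wouldn't really consider this to be a good movie")` stays positive because the negation lies more than three tokens before "good". Widening the window fixes this sentence but would wrongly invert polarity in others, which is one reason for preferring rules that follow the syntactic structure.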


I.2.2. Machine learning methods

The second category is that of machine learning methods. These methods can be used to create more robust models and to learn automatically the linguistic characteristics most relevant to opinion classification. The development of supervised machine learning methods is dependent on the availability of a sufficient volume of annotated data. The current trend favors the use of deep learning methods such as recursive neural networks. As we see from [SOC 13], these methods rely on the use of databases featuring fine annotation at different phrase levels (different nodes in the syntactic tree) to provide the structure required for recursive models to operate.

The quality of learned models is highly dependent on the quality and quantity of annotations available. The difficulty resides in defining annotation protocols to harmonize the work of multiple annotators, especially given the subjective nature of opinion phenomena [WIE 05]. The performance of learned models is also evaluated using annotations as a reference, and it can be difficult to make a judgment in cases of disagreement between the system and a human annotator. Moreover, the distribution of different opinion classes is often uneven, dominated by a larger neutral class [CAL 08]. One response to this difficulty in obtaining sufficient quantities of annotated data is to use semi-supervised learning methods [MAR 13b].

The linguistic characteristics used to represent a document play an important role as input for learning models. They may make use of opinion lexicons, modifiers and terms linked to negation [KEN 06]. The emergence of word embedding (word2vec) type characteristics also offers interesting perspectives for improving the performance of systems based on machine learning [POR 15, TAN 14]. Note, too, that these methods are particularly useful for multimodal sentiment analysis, facilitating combined use of linguistic, acoustic and visual cues [WOL 13].

A first step in this direction is presented in Chapter 2, where machine learning methods are used to categorize opinions and to highlight the expressions associated with these opinions. A system for detecting disfluency in oral interactions based on conditional random fields (CRF) is also developed. In Chapter 3, hidden CRFs are used to analyze opinions in human–agent interactions.
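As an illustration of the kind of input characteristics mentioned above (opinion lexicons, modifiers and negation terms), the following sketch turns a text into a small dictionary of counts that a classifier could take as input. The word lists and feature names are hypothetical and are not those used in [KEN 06] or in this book.

```python
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE = {"satisfied", "good", "efficient", "fine"}
NEGATIVE = {"annoying", "bad", "lack", "problem"}
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very", "really", "quite"}


def lexical_features(text):
    """Count lexicon, negation and modifier hits in a text."""
    # Lowercase, split on whitespace and strip surrounding punctuation.
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    counts = Counter(t for t in tokens if t)
    return {
        "n_tokens": sum(counts.values()),
        "n_positive": sum(counts[w] for w in POSITIVE),
        "n_negative": sum(counts[w] for w in NEGATIVE),
        "n_negation": sum(counts[w] for w in NEGATIONS),
        "n_intensifier": sum(counts[w] for w in INTENSIFIERS),
    }
```

Exact string matching is used here, so an inflected form such as "problems" is not counted against the entry "problem"; a real system would lemmatize first.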


I.2.3. Hybrid methods

This third type of method combines elements of the rule-based and machine learning approaches. In hybrid methods, rule-based approaches may be used to improve machine learning models [POR 14], or machine learning methods may be used to extract sequential patterns for use in rules for extracting subjective expressions [CHO 05]. Hybrid methods combine the generalization capacity of machine learning methods with the fine, flexible modeling made possible by semantic rules. These methods require a reasonable quantity of annotated data, and, as such, offer a promising alternative to deep learning type methods.

In Chapter 2, we shall highlight the interest of using linguistic rules to construct input characteristics for machine learning methods in the context of opinion analysis for customer satisfaction studies. We also demonstrate that prosodic and linguistic characteristics can be learned together for disfluency detection using CRF methods.

I.2.4. Computational models and task types

Method choice also depends on the level of granularity used in opinion classification. The classic choice is to work at document level (one review/article/conversation) and to classify the document into opinion categories [TAB 11]. Other studies have looked at sections of the document, working, for example, at sentence level in the case of multi-sentence documents, or at syntagm level (e.g. the macro-, meso- and microlevels defined in [PAR 10]). Once the document has been broken down into subsegments, the methodology used is the same as for document categorization.

Another type of task works at the level of opinion expressions, identifying sequences of words that express an opinion and assigning them a polarity. Different methods may be used to do this, from the use of metrics to extract defining terms [HAM 16] to CRF [CHO 05].

Moving beyond expression identification or categorization, certain methods also extract structures relating to opinions. This concerns relations between an opinion and its target (what is the opinion about?) and source (who is expressing the opinion?). In [YAN 13], the authors propose a source and target detection method for opinions based on CRF and the use of syntactic patterns.

The work presented in Chapter 2 concerns the relation between opinions and targets, and we propose a method using metrics to extract expressions of opinion and different themes from the studied corpora. In Chapter 3, we present a rule-based method using a formalization of agent utterances and the structure of interactions according to different themes, contributing to an analysis of user likes and dislikes.

I.2.5. Computational models and human–agent interactions

A considerable body of work is available concerning opinion and feelings analysis of verbal content on social networks and the Internet; however, very little research has been done on its application to social robotics and to human–agent interactions as a whole. The majority of studies have focused on analyzing non-verbal cues (facial expressions, acoustic cues) to analyze socio-emotional behaviors [SCH 16]; verbal content has only been used to a limited extent. Only two studies have been carried out on a system including a module for detecting sentiments on the basis of verbal content in human–agent interactions [PUL 10, SMI 11]. These exploit a sentiment analysis module [MOI 07] initially designed for the analysis of texts in a non-conversational context. They present a first solution for integrating an opinion detection system based on verbal content in human–agent interactions, highlighting the importance of analyzing these opinions in interactions with an embodied conversational agent. However, these opinion analysis methods have yet to be adapted for use in a conversational context.

The systems proposed in Chapter 3 offer a response to this need. The methods developed make use of the context of interactions in order to support user opinion analysis.


I.2.6. Research question and articulations in this work

The research on opinion analysis methods presented here examines the capacity of different methods to create computational models which are both generic and relevant. The question of genericness and relevance will be examined from two perspectives.

(Q2a) Genericness and adaptability of computational models as a function of application (management of client relations in company data and human–agent interactions). We need to develop opinion detection methods that enable fine modeling of opinion concepts based on complex theoretical models and including specific knowledge (trade knowledge, knowledge of the context of human–agent interactions, etc.). We shall consider ways of modeling the context of human–agent interactions and of adapting opinion analysis methods to the specific challenges of human–agent interactions. This research will be presented from the perspective of application contexts, considering two types of opinion analysis method:

– rule-based methods, used to model the concept of satisfaction including company-specific knowledge for EDF in Chapter 2, and to model user likes and dislikes including thematic knowledge of interaction scenarios and problematic interactions in Chapter 3;
– hybrid methods that permit the use of machine learning methods and the identification of word chains relating to opinions, presented in Chapter 2.

(Q2b) Genericness and adaptability of computational models in relation to data. We also need to develop opinion detection methods that allow us to model both linguistic specificities and the richness of spontaneous language phenomena encountered in “in-the-wild” data. The treatment of “in-the-wild” data is a rapidly expanding area of research [SCH 16]; in this case, the term is used to denote data collected from real applications. This type of textual corpus is also referred to as an “ecological corpus”⁵. The data aspect was studied using rule-based methods to model the language specificities of conversational speech in Chapter 2 and chatbot data in Chapter 3. Corpus analysis is used to characterize language phenomena encountered in written forum interactions and oral interactions from call centers in Chapter 2.

I.3. Human–agent interactions and socio-emotional behaviors

In order to develop the social and emotional capacities of agents used in interactions with humans, research is required in three different areas, as shown in Figure I.2 [CLA 16a]:

1) analysis and recognition of user behavior;
2) development of strategies for interaction;
3) behavior generation.

The first area was introduced in section I.2. In this section, we shall introduce the remaining two areas, focusing on verbal and, more broadly, oral aspects, as before.

I.3.1. Socio-emotional interaction strategies

In human–agent interactions, socio-emotional strategies must be defined so that the agent can react to users’ socio-emotional behaviors in a socially relevant manner. The modulation of agent responses to these behaviors contributes to agent efficiency in fulfilling assigned tasks and establishing richer relationships, increasing its capacity to be perceived in a positive light [CAL 11b] and to maintain a good rapport [BIC 10a] with users.

5 The term “corpus ecologique” is used in [WIS 10] to refer to a collection of textual data produced as spontaneously as possible by the speaker, in contrast with examples constructed by linguists for study purposes, producing methods, tools and theories that are disconnected from the real world.


The concept of engagement also plays a crucial role in human–agent interactions: agents must be engaging, whether they are designed for short, one-off interactions or for applications involving a long-term relationship. The importance of the engagement paradigm has been observed in many areas of human–agent interaction, such as training [DME 13], healthcare [BIC 10b, GRI 14] and museums [CAM 15a, KOP 05]. This subject has been approached from several different angles in the human–agent interaction literature. An interesting distinction may be drawn between attentional engagement and emotional engagement [PET 09], although the divide is not always clear, given that attention is influenced by emotions.

The relevance of agent responses to users clearly has a role to play in user engagement [NOV 10], and the creation of a socio-emotional interaction strategy helps to overcome some of the limitations arising from the limited comprehension capacity of dialog systems. This was the focus of the European Semaine project, using an interaction scenario in which the agent was proactive – asking questions – and was equipped with a range of feedback signals and emotional backchannels. These feedback signals and backchannels are a typical example of a socio-emotional interaction strategy, maintaining user engagement by equipping the agent with listening behaviors [LAM 11]. Similarly, D’Mello and Graesser [DME 13] proposed an agent generating feedback designed to maintain student engagement in interactions and tasks. In [TRU 10], a rule-based model was used to plan agent backchannels as a function of user prosody.

Another example of socio-emotional interaction strategies, focusing more on verbal content, can be found in the politeness strategies used to equip agents with social intelligence [WAN 08], giving the impression of increased engagement in interactions [DEJ 08, GLA 14]. The authors also provide a model centered on verbal content allowing the agent to adjust to the level of politeness and formality expressed by the user. The article makes use of the alignment concept, highlighting the value of this concept in socio-emotional interactions between agents and users.

The alignment concept, as defined in [PIC 04], refers to the tendency of individuals to reproduce the manner of speaking of those with whom they communicate, human or machine [BRA 10]. Alignment is perceived as a sign of empathy and interpersonal skills [PFE 08], and can thus be considered as a means of improving perception of an agent’s social skills. Considering alignment in its broadest sense, different terminologies have been used in the literature to refer to associated notions. These notions differ in the way in which they integrate temporal and dynamic aspects. Mimicry, for example, is defined as direct imitation of what the user produces [BER 13], while synchrony is defined as the reciprocal and dynamic adaptation of temporal structures of behavior between two interacting entities [DEL 12].

Alignment processes have been explored in detail in linguistic studies based on corpus observation [TRU 12]. More recently, there has been an upsurge in interest in the implementation of alignment processes in human–machine interactions. In this context, alignment operates mostly at the lexical and syntactic levels [BUS 09], whereas in face-to-face interactions between users and agents, most implementations have taken a broader approach to alignment – including mimicry [HES 99] and social/emotional resonance [GRA 13] – based on non-verbal content.

Currently, the methods used in developing socio-emotional interaction strategies are essentially rule based, using linguistic patterns (as in the case of the AIML markup language, integrating user emotions [SKO 11]), although learning-based methods are becoming increasingly widespread. Thus, Bui et al. [BUI 10] use Partially Observable Markov Decision Processes (POMDP), using information on the user’s emotional state as input, and Khouzaimi et al. [KHO 15] use reinforcement learning methods to manage turn-taking behaviors.

In Chapter 4, we propose a new type of interaction strategy for alignment with user assessments, parameterizing the verbal content used by the agent as a function of the user.
Given the scarcity of this type of alignment in the literature, we established a methodology based on corpus observation and on conversational analysis studies in order to construct our model.

I.3.2. Social attitude generation

Studies concerning the generation of socio-emotional behaviors in artificial agents (opinion phenomena such as those described in section I.1, emotions, social attitudes, personality, etc.) have essentially focused on visual animation (gestures and facial expressions) [OCH 13, XU 14]. Other authors have worked on the generation of expressive speech using text-to-speech (TTS) systems [CHA 13] and on the generation of verbal content [WAL 14] in the context of behavior generation for Embodied Conversational Agents (ECAs), but such studies are less common.

One way of constructing expression models is to develop rules based on a synthesis of observations found in psychology literature [BRU 15]. A second option is to make use of quantitative analyses obtained through corpus observation. In this case, the constructed models begin to take account of the intrinsic variability of observations. Such approaches are especially relevant in cases where prior psychological research is not available, reducing the need for manual data analysis. Our methodology consisted of analyzing correlations between labels corresponding to socio-emotional behaviors and linguistic, acoustic and visual cues in a corpus of human recordings. Illustrative cases may then be used to study the prosodic characteristics affecting the perception of dominance [TUS 00] or charisma [ROS 09].

The use of machine learning methods also adds an element of variability in behavior generation; systematic behavior generation tends to produce more artificial and less fluid interactions. Authors are now looking to sequence mining techniques to identify series of cues which characterize socio-emotional behaviors, for example generalized sequence patterns [CHO 14, MAR 11]. The challenge is thus to synchronize different modalities for generating expressive behaviors [MAR 13a]. Machine learning methods are designed to “learn” means of generating gestures or facial expressions based on verbal communications, e.g. [DIN 13].
In Chapter 5, we propose two types of methods:

– a new sequence mining method, the SMART (Social Multimodal Association with Timing) processing chain, which allows automatic extraction of temporal association rules between social signals from audiovisual recordings, injecting them into behavior generation models for agents;
– a rule-based method, modeling the cognitive process used by humans to synchronize gestures and words, for use in the agent.

I.3.3. Research questions and articulations

In the previous section, we considered the research question relating to the detection of user behaviors. In this section, we shall consider the behaviors of an embodied conversational agent. The work presented in this book centers on two main aspects.

The first aspect relates to socio-emotional interaction strategies centered on verbal content. Our main aim is to use these strategies in an attempt to influence user engagement factors: as we saw earlier, user engagement is of crucial importance in human–agent interactions. Following on from recent work on the implementation of alignment models in human–agent interactions, our research question here concerns: (Q3) the role of alignment strategies in user engagement, notably at the level of opinions and their expression. The development of a virtual agent with the capacity to express opinions in reaction to those expressed by the user, following an agent-to-user verbal alignment model, is presented in Chapter 4.

The second aspect, discussed in Chapter 5, concerns the generation of socio-emotional behaviors. We have chosen to focus on prosodic content and the way in which it relates to other modalities (facial expressions, gestures, etc.), and on the social attitude type of socio-emotional behaviors. The associated research question concerns: (Q4) parameterizing prosody in a multimodal context for the generation of multimodal behaviors in a virtual agent. Our literature review highlights the lack of work on this question. The methodology used to respond to this question is based on sequence mining techniques, integrating temporal information.


I.4. Outline of the book

Our work begins, in Chapter 1, with a comparative study of the different human–human and human–agent corpora collected, annotated and studied in the following chapters.

Chapter 2 presents research on opinion analysis in human–human interactions. Our first contribution is the creation of in-the-wild corpora (call center and forum data) with a wealth of spontaneous expressions, obtained from company data presented in Chapter 1. Our second contribution concerns the development of opinion models based on grammars of lexicons and linguistic rules, and on hybrid learning models that enable the integration of linguistic knowledge into machine learning algorithms. Our third contribution concerns the characterization of specific structures of expression found in written virtual communications, the particular way in which these communications are written, and the characterization of spontaneous communication phenomena (disfluency). With regard to this final aspect, we propose a combined modeling of acoustic parameters and linguistic markers, in the form of CRF, in order to detect disfluency.

In Chapter 3, we address the context of human–agent interactions. Our contributions center around two aspects. First, we propose a fine modeling of the opinion phenomenon, concentrating on user likes and dislikes and on user opinions of interactions (to detect problematic interactions). Second, the proposed opinion detection method is rooted in the context of interaction, taking account of the dialogical context (adjacency pairs and previous user utterances), the communication modalities of the agent, and the topic structure established by the interaction scenario.

In our work on socio-emotional interaction strategies for agents communicating with human users (Chapter 4), the proposed strategy permits determination of agent alignment based on user appreciations, alongside instantiation of verbal parameters for implementing this alignment. This constitutes a first proposal for an appreciation-based alignment system in the context of human–agent interactions.

In terms of utterance generation for agents, as presented in Chapter 5, our proposed methodology allows automatic extraction of sequences of social cues (prosody and facial expressions) which characterize a social attitude, working directly on recorded corpora with no manual annotation of social cues.

A summary of the research questions discussed in the chapters of this work is shown below:

– (Q1) relevance of different theoretical models for the construction of a computational model (linguistic and prosodic) as a function of the application;
– (Q2a) genericness and adaptability of computational models as a function of the application;
– (Q2b) genericness and adaptability of computational models as a function of the data;
– (Q3) the role of alignment strategies in user engagement;
– (Q4) modeling prosody in a multimodal context in order to generate multimodal behaviors in a virtual agent.

1 Oral and Written Interaction Corpora

Opinion detection for real-world applications using company data raises a number of crucial scientific challenges that are not always evident to the academic community, where these data and cases of application are not always accessible. In the Introduction, we presented two research themes that highlight the specificity of these challenges:

– the relevance of different theoretical models in constructing a computational model in relation to an application (social network analysis, managing client relations, recommendations, etc.) (Q1, see section I.1.3);
– the capacity of different methods to create generic and relevant computational models for different applications and data (Q2a and b, see section I.2.6).

In this chapter, we present the different corpora that we have developed and used in connection with these different research questions, in the context of both human–human (H–H) and human–agent (H–A) interactions. Our objective is to present the linguistic specificities encountered in spontaneous and company corpora. Using company data, we created in-the-wild corpora offering a large number of spontaneous expressions (from customer relation forums, manual and automatic transcriptions of call center recordings, satisfaction surveys, microblog data and customer relation chatbots). We also applied our methods

Opinion Analysis in Interactions: From Data Mining to Human–Agent Interaction, First Edition. Chloé Clavel. © ISTE Ltd 2019. Published by ISTE Ltd and John Wiley & Sons, Inc.


to external corpora, including open-access academic corpora used for evaluation purposes.

This chapter is intended to provide a point of reference for the following chapters, in which the opinion analysis methods developed using these corpora will be described. The structure is as follows. First, in sections 1.1 and 1.2, we shall describe the H–H interaction corpora used in Chapter 2. Next, we shall present the H–A interaction corpora used to develop user opinion analysis methods (Chapter 3) and agent behavior generation models (Chapter 4) in sections 1.3 and 1.4. Finally, in section 1.5, we shall compare the different corpora, highlighting the linguistic specificities associated with different contexts and modes of interaction.

1.1. Oral H–H corpora: call centers and satisfaction surveys

1.1.1. CallSurf and Vox Factory: call center corpora

EDF has carried out two call center data collection campaigns as part of two different projects: CallSurf and Vox Factory. These data cover a wide range of subjects relating to the services offered by the company, including contract subscriptions, billing enquiries and technical issues encountered by customers.

The CallSurf corpus, described in [GAR 08], is made up of 5,755 conversations, totaling 620 hours (h) of conversations between professional clients and agents. The recordings were made at a call center in Montpellier, France, over a period of four months. Following automatic segmentation by speaker, as shown in Figure 1.1, automatic transcription was carried out using the LIMSI-VOCAPIA CTS (Conversational Telephone Speech) system described in [CLA 13], with a word error rate of around 30%. The corpus data were also anonymized in order to protect personal information given by customers to the advisor (personal names, telephone numbers, banking information, information relating to health or personal problems, etc.). Part of CallSurf was transcribed and annotated manually by Vecsys (30 h of detailed transcription and 350 h of rapid transcription).

The Vox Factory corpus, described in [CLA 13], comprises more than 1,000 h of conversations between private customers and agents (recorded in a call center in Aix-en-Provence over a period of three months); 77 conversations (around 14 h) were transcribed manually in detail: these make up the Vox14-fine corpus, used in Chapter 2. Vox14-fine is made up of three subsets, corresponding to emotional annotation of the corpus:

– Vox5-neu-fine containing conversations annotated as “neutral”;
– Vox5-ang-fine containing conversations annotated as expressing “anger”;
– Vox4-joy-fine containing conversations annotated as expressing “joy”.

Figure 1.1. The speech signal is ﬁrst segmented into speakers (agent/client), then transcribed automatically using the LIMSI-VOCAPIA CTS system
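For readers unfamiliar with the measure, a word error rate such as the one reported for the automatic transcription is conventionally computed as the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below follows that standard definition; the function name and the whitespace tokenization are assumptions, and this is not the scoring tool used for these corpora.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] = edit distance between the processed reference prefix and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing from output
            insertion = current[j - 1] + 1  # extra word in output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

Because insertions are counted, the rate can exceed 1.0 when the system output is much longer than the reference.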

Table 1.1 shows a summary of these data and of the available annotation by subcorpus (detailed transcription and disfluency annotation).

Corpus name      Duration   Number of calls   Detailed transcription   Disfluencies
Vox5-neu-fine    5 h        33
Vox5-ang-fine    5 h        27
Vox4-joy-fine    4 h        17

Table 1.1. Vox14-ﬁne, subset of the 1,600 h call center corpus

Notably, part of the Vox14 corpus – known as VoxDISS – was manually annotated for disfluency using a strategy based on the Linguistic Data Consortium (LDC) guide, version 6.2 [STR 04b]. This results in fine annotation of disfluencies (for example the start and end of edit disfluencies, and the annotation of different classes: repetition, self-correction, false starts and combined disfluencies) (see section 2.2.2). A descriptive analysis of the Vox14 corpus and the Vox9 and VoxDISS subsets used in the different studies presented here is provided in [CLA 13] and in Camille Dutrey’s thesis [DUT 14a]. Analysis has shown a particularly high number of disfluencies in this corpus, with a rate substantially higher than that usually found in speech recordings.

Using these collections of recordings, we were able to establish specific opinion grammars for the data (see section 2.1) with the aim of integrating the specific aspects relating to spontaneous speech. Our opinion grammars were constructed using CallSurf¹ as the development corpus. The impact of recognition errors on the extraction system was first measured in a qualitative manner for CallSurf [DAN 10] and in a quantitative manner for Vox14 [CLA 13]: this is presented in section 2.1. We have also studied the way in which disfluency is taken into account in opinion detection, developing an automatic disfluency detection system using the VoxDISS subcorpus (part of Vox14). This is presented in section 2.2.2.

1.1.2. Satisfaction surveys: the NPS07-09 corpus

At EDF, we were able to work on another type of particularly interesting data, collected in the course of satisfaction surveys carried out by telephone. The objective for the company in carrying out these surveys is to assess customer satisfaction and to identify topics that interest them. Notably, we worked on the data obtained during the Net Promoter Score (NPS) survey² of EDF Enterprise clients who had recently been in contact with EDF. The aim of the survey was to measure clients’ willingness to recommend EDF to others (rating on a scale from 1 to 10 in response to a closed question asked by the operator) and to evaluate the reasons for the assigned rating based on a series of open questions (relating to factors explaining their response and to areas where improvement is required in order for the client to recommend the company).

1 The CallSurf corpus was also used in developing a system for identifying the role of speakers [LAV 10b] and a thematic classification system [LAV 10c]. These will not be presented here as their relevance for opinion analysis is limited.
2 The Net Promoter Score is a satisfaction score calculated by subtracting the percentage of negative opinions from the percentage of positive opinions.
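The calculation described in footnote 2 can be written out as a minimal sketch. It assumes that each response has already been assigned one of the labels "positive", "negative" or "neutral"; the function name and label strings are hypothetical, and the mapping from ratings to classes is deliberately left out.

```python
def net_score(labels):
    """Percentage of positive opinions minus percentage of negative opinions."""
    if not labels:
        raise ValueError("at least one response is required")
    positive = sum(1 for label in labels if label == "positive")
    negative = sum(1 for label in labels if label == "negative")
    # Neutral responses count in the denominator only.
    return 100.0 * (positive - negative) / len(labels)
```

The result ranges from -100 (all responses negative) to 100 (all responses positive).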


Responses to the open questions were transcribed by the surveyor during the discussion. Surveyors tended to summarize what was being said by the client. These written responses make up the textual data for the corpus, combining aspects specific to oral communications (as in the case of the call center recordings) and written phenomena such as typographical errors and abbreviations. Table 1.2 shows examples of ratings and responses given by clients to open questions.

Rating   Response to the open question: “Could you tell me why you’ve chosen this rating?”
6        I don’t know if they keep us informed about the best offers, unlike France Telecom who called us every evening during the degrouping period, which was annoying. That doesn’t happen with EDF, but there’s a lack of service offers
8        Overall I’m quite satisfied. No technical problems
10       It’s a company which manages its subscribers well. Efficient service provision. Everything’s fine
9        For reliability. I took off a point for information about pricing options, which could be better
4        During contract subscription (during opening) there was a lack of information. They need to lower their prices

Table 1.2. Examples of responses to open questions in the NPS corpus

The assigned ratings were used to create three classes:
– positive: the company considered client opinions with a recommendation score of >8 to be positive;
– negative: the company considered client opinions with a recommendation score of