LNCS 14015
Hirohiko Mori Yumi Asahi (Eds.)
Human Interface and the Management of Information Thematic Area, HIMI 2023 Held as Part of the 25th HCI International Conference, HCII 2023 Copenhagen, Denmark, July 23–28, 2023 Proceedings, Part I
Lecture Notes in Computer Science

Founding Editors
Gerhard Goos
Juris Hartmanis

Editorial Board Members
Elisa Bertino, Purdue University, West Lafayette, IN, USA
Wen Gao, Peking University, Beijing, China
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Moti Yung, Columbia University, New York, NY, USA
The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R&D community; the series counts many renowned academics among its volume editors and paper authors and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.
Editors
Hirohiko Mori, Tokyo City University, Tokyo, Japan
Yumi Asahi, Tokyo University of Science, Tokyo, Japan
ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-031-35131-0
ISBN 978-3-031-35132-7 (eBook)
https://doi.org/10.1007/978-3-031-35132-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
Human-computer interaction (HCI) is acquiring an ever-increasing scientific and industrial importance, as well as having more impact on people’s everyday lives, as an ever-growing number of human activities are progressively moving from the physical to the digital world. This process, which has been ongoing for some time now, was further accelerated during the acute period of the COVID-19 pandemic. The HCI International (HCII) conference series, held annually, aims to respond to the compelling need to advance the exchange of knowledge and research and development efforts on the human aspects of design and use of computing systems.

The 25th International Conference on Human-Computer Interaction, HCI International 2023 (HCII 2023), was held in the emerging post-pandemic era as a ‘hybrid’ event at the AC Bella Sky Hotel and Bella Center, Copenhagen, Denmark, during July 23–28, 2023. It incorporated the 21 thematic areas and affiliated conferences listed below.

A total of 7472 individuals from academia, research institutes, industry, and government agencies from 85 countries submitted contributions, and 1578 papers and 396 posters were included in the volumes of the proceedings that were published just before the start of the conference; these are listed below. The contributions thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. These papers provide academics, researchers, engineers, scientists, practitioners and students with state-of-the-art information on the most recent advances in HCI.

The HCI International (HCII) conference also offers the option of presenting ‘Late Breaking Work’, which applies to both papers and posters, with corresponding volumes of proceedings that will be published after the conference. Full papers will be included in the ‘HCII 2023 - Late Breaking Work - Papers’ volumes of the proceedings to be published in the Springer LNCS series, while ‘Poster Extended Abstracts’ will be included as short research papers in the ‘HCII 2023 - Late Breaking Work - Posters’ volumes to be published in the Springer CCIS series.

I would like to thank the Program Board Chairs and the members of the Program Boards of all thematic areas and affiliated conferences for their contribution towards the high scientific quality and overall success of the HCI International 2023 conference. Their manifold support in terms of paper reviewing (single-blind review process, with a minimum of two reviews per submission), session organization and their willingness to act as goodwill ambassadors for the conference is most highly appreciated.

This conference would not have been possible without the continuous and unwavering support and advice of Gavriel Salvendy, founder, General Chair Emeritus, and Scientific Advisor. For his outstanding efforts, I would like to express my sincere appreciation to Abbas Moallem, Communications Chair and Editor of HCI International News.

July 2023
Constantine Stephanidis
HCI International 2023 Thematic Areas and Affiliated Conferences
Thematic Areas
• HCI: Human-Computer Interaction
• HIMI: Human Interface and the Management of Information

Affiliated Conferences
• EPCE: 20th International Conference on Engineering Psychology and Cognitive Ergonomics
• AC: 17th International Conference on Augmented Cognition
• UAHCI: 17th International Conference on Universal Access in Human-Computer Interaction
• CCD: 15th International Conference on Cross-Cultural Design
• SCSM: 15th International Conference on Social Computing and Social Media
• VAMR: 15th International Conference on Virtual, Augmented and Mixed Reality
• DHM: 14th International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management
• DUXU: 12th International Conference on Design, User Experience and Usability
• C&C: 11th International Conference on Culture and Computing
• DAPI: 11th International Conference on Distributed, Ambient and Pervasive Interactions
• HCIBGO: 10th International Conference on HCI in Business, Government and Organizations
• LCT: 10th International Conference on Learning and Collaboration Technologies
• ITAP: 9th International Conference on Human Aspects of IT for the Aged Population
• AIS: 5th International Conference on Adaptive Instructional Systems
• HCI-CPT: 5th International Conference on HCI for Cybersecurity, Privacy and Trust
• HCI-Games: 5th International Conference on HCI in Games
• MobiTAS: 5th International Conference on HCI in Mobility, Transport and Automotive Systems
• AI-HCI: 4th International Conference on Artificial Intelligence in HCI
• MOBILE: 4th International Conference on Design, Operation and Evaluation of Mobile Communications
List of Conference Proceedings Volumes Appearing Before the Conference
1. LNCS 14011, Human-Computer Interaction: Part I, edited by Masaaki Kurosu and Ayako Hashizume
2. LNCS 14012, Human-Computer Interaction: Part II, edited by Masaaki Kurosu and Ayako Hashizume
3. LNCS 14013, Human-Computer Interaction: Part III, edited by Masaaki Kurosu and Ayako Hashizume
4. LNCS 14014, Human-Computer Interaction: Part IV, edited by Masaaki Kurosu and Ayako Hashizume
5. LNCS 14015, Human Interface and the Management of Information: Part I, edited by Hirohiko Mori and Yumi Asahi
6. LNCS 14016, Human Interface and the Management of Information: Part II, edited by Hirohiko Mori and Yumi Asahi
7. LNAI 14017, Engineering Psychology and Cognitive Ergonomics: Part I, edited by Don Harris and Wen-Chin Li
8. LNAI 14018, Engineering Psychology and Cognitive Ergonomics: Part II, edited by Don Harris and Wen-Chin Li
9. LNAI 14019, Augmented Cognition, edited by Dylan D. Schmorrow and Cali M. Fidopiastis
10. LNCS 14020, Universal Access in Human-Computer Interaction: Part I, edited by Margherita Antona and Constantine Stephanidis
11. LNCS 14021, Universal Access in Human-Computer Interaction: Part II, edited by Margherita Antona and Constantine Stephanidis
12. LNCS 14022, Cross-Cultural Design: Part I, edited by Pei-Luen Patrick Rau
13. LNCS 14023, Cross-Cultural Design: Part II, edited by Pei-Luen Patrick Rau
14. LNCS 14024, Cross-Cultural Design: Part III, edited by Pei-Luen Patrick Rau
15. LNCS 14025, Social Computing and Social Media: Part I, edited by Adela Coman and Simona Vasilache
16. LNCS 14026, Social Computing and Social Media: Part II, edited by Adela Coman and Simona Vasilache
17. LNCS 14027, Virtual, Augmented and Mixed Reality, edited by Jessie Y. C. Chen and Gino Fragomeni
18. LNCS 14028, Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management: Part I, edited by Vincent G. Duffy
19. LNCS 14029, Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management: Part II, edited by Vincent G. Duffy
20. LNCS 14030, Design, User Experience, and Usability: Part I, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
21. LNCS 14031, Design, User Experience, and Usability: Part II, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
22. LNCS 14032, Design, User Experience, and Usability: Part III, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
23. LNCS 14033, Design, User Experience, and Usability: Part IV, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
24. LNCS 14034, Design, User Experience, and Usability: Part V, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
25. LNCS 14035, Culture and Computing, edited by Matthias Rauterberg
26. LNCS 14036, Distributed, Ambient and Pervasive Interactions: Part I, edited by Norbert Streitz and Shin’ichi Konomi
27. LNCS 14037, Distributed, Ambient and Pervasive Interactions: Part II, edited by Norbert Streitz and Shin’ichi Konomi
28. LNCS 14038, HCI in Business, Government and Organizations: Part I, edited by Fiona Fui-Hoon Nah and Keng Siau
29. LNCS 14039, HCI in Business, Government and Organizations: Part II, edited by Fiona Fui-Hoon Nah and Keng Siau
30. LNCS 14040, Learning and Collaboration Technologies: Part I, edited by Panayiotis Zaphiris and Andri Ioannou
31. LNCS 14041, Learning and Collaboration Technologies: Part II, edited by Panayiotis Zaphiris and Andri Ioannou
32. LNCS 14042, Human Aspects of IT for the Aged Population: Part I, edited by Qin Gao and Jia Zhou
33. LNCS 14043, Human Aspects of IT for the Aged Population: Part II, edited by Qin Gao and Jia Zhou
34. LNCS 14044, Adaptive Instructional Systems, edited by Robert A. Sottilare and Jessica Schwarz
35. LNCS 14045, HCI for Cybersecurity, Privacy and Trust, edited by Abbas Moallem
36. LNCS 14046, HCI in Games: Part I, edited by Xiaowen Fang
37. LNCS 14047, HCI in Games: Part II, edited by Xiaowen Fang
38. LNCS 14048, HCI in Mobility, Transport and Automotive Systems: Part I, edited by Heidi Krömker
39. LNCS 14049, HCI in Mobility, Transport and Automotive Systems: Part II, edited by Heidi Krömker
40. LNAI 14050, Artificial Intelligence in HCI: Part I, edited by Helmut Degen and Stavroula Ntoa
41. LNAI 14051, Artificial Intelligence in HCI: Part II, edited by Helmut Degen and Stavroula Ntoa
42. LNCS 14052, Design, Operation and Evaluation of Mobile Communications, edited by Gavriel Salvendy and June Wei
43. CCIS 1832, HCI International 2023 Posters - Part I, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
44. CCIS 1833, HCI International 2023 Posters - Part II, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
45. CCIS 1834, HCI International 2023 Posters - Part III, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
46. CCIS 1835, HCI International 2023 Posters - Part IV, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
47. CCIS 1836, HCI International 2023 Posters - Part V, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
https://2023.hci.international/proceedings
Preface
Human Interface and the Management of Information (HIMI) is a Thematic Area of the International Conference on Human-Computer Interaction (HCII), addressing topics related to information and data design, retrieval, presentation and visualization, management, and evaluation in human-computer interaction in a variety of application domains, such as learning, work, decision, collaboration, medical support, and service engineering. This area of research is acquiring rapidly increasing importance towards developing new and more effective types of human interfaces addressing new emerging challenges, and evaluating their effectiveness. The ultimate goal is for information to be provided in such a way as to satisfy human needs and enhance quality of life.

The related topics include, but are not limited to, the following:
• Service Engineering: Business Integration; Community Computing; E-commerce; E-learning and E-education; Harmonized Work; IoT and Human Behavior; Knowledge Management; Organizational Design and Management; Service Applications; Service Design; Sustainable Design; User Experience Design
• New HI (Human Interfaces) and Human QOL (Quality of Life): Electronic Instrumentation; Evaluating Information; Health Promotion; E-health and its Application; Human-Centered Organization; Legal Issues in IT; Mobile Networking; Disasters and HCI
• Information in VR, AR, and MR: Application of VR, AR, and MR in Human Activity; Art with New Technology; Digital Museums; Gesture/Movement Studies; New Haptic and Tactile Interaction; Information of Presentation; Multimodal Interaction; Sense of Embodiment (SoE) in VR and HCI
• AI, Human Performance, and Collaboration: Automatic Driving Vehicles; Collaborative Work; Data Visualization and Big Data; Decision Support Systems; Human AI Collaboration; Human-Robot Interaction; Humanization of Work; Intellectual Property; Intelligent Systems; Medical Information Systems and Their Application; Participatory Design

Two volumes of the HCII 2023 proceedings are dedicated to this year’s edition of the HIMI Thematic Area. The first part focuses on topics related to information design and user experience, data visualization and big data, multimodal interaction, and interaction with AI and Intelligent Systems. The second part focuses on topics related to service design, knowledge in e-Learning and e-Education, as well as the support of work and collaboration.

Papers of these volumes are included for publication after a minimum of two single-blind reviews from the members of the HIMI Program Board or, in some cases, from members of the Program Boards of other affiliated conferences. We would like to thank all of them for their invaluable contribution, support, and efforts.

July 2023
Hirohiko Mori
Yumi Asahi
Human Interface and the Management of Information Thematic Area (HIMI 2023)
Program Board Chairs: Hirohiko Mori, Tokyo City University, Japan and Yumi Asahi, Tokyo University of Science, Japan

Program Board:
• Takako Akakura, Tokyo University of Science, Japan
• Shinichi Fukuzumi, RIKEN, Japan
• Michitaka Hirose, University of Tokyo, Japan
• Yasushi Ikei, University of Tokyo, Japan
• Keiko Kasamatsu, Tokyo Metropolitan University, Japan
• Daiji Kobayashi, Chitose Institute of Science and Technology, Japan
• Yusuke Kometani, Kagawa University, Japan
• Ryosuke Saga, Osaka Metropolitan University, Japan
• Katsunori Shimohara, Doshisha University, Japan
• Takahito Tomoto, Tokyo Polytechnic University, Japan
• Kim-Phuong Vu, California State University, USA
• Tomio Watanabe, Okayama Prefectural University, Japan
• Takehiko Yamaguchi, Suwa University of Science, Japan
• Sakae Yamamoto, Tokyo University of Science, Japan
The full list with the Program Board Chairs and the members of the Program Boards of all thematic areas and affiliated conferences of HCII 2023 is available online at:
http://www.hci.international/board-members-2023.php
HCI International 2024 Conference
The 26th International Conference on Human-Computer Interaction, HCI International 2024, will be held jointly with the affiliated conferences at the Washington Hilton Hotel, Washington, DC, USA, June 29 – July 4, 2024. It will cover a broad spectrum of themes related to Human-Computer Interaction, including theoretical issues, methods, tools, processes, and case studies in HCI design, as well as novel interaction techniques, interfaces, and applications. The proceedings will be published by Springer. More information will be made available on the conference website: http://2024.hci.international/.

General Chair
Prof. Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: [email protected]
Contents – Part I
Information Design and User Experience

Cooperation Mode of 2D and 3D Interfaces on Destination Planning Tasks in the Location-Based AR Application
Fangyuan Cheng, Qing Gu, and Xiaohua Sun

Generalized Cohen’s Kappa: A Novel Inter-rater Reliability Metric for Non-mutually Exclusive Categories
Andrea Figueroa, Sourojit Ghosh, and Cecilia Aragon

Knowledge Graph-Based Machining Process Route Generation Method
Jiawei Guo, Jingjing Wu, Jixuan Bian, and Qichang He

How to Share a Color Impression Among Different Observers Using Simplicial Maps
Ryo Kamiyama and Jinhui Chao

Task-Based Open Card Sorting: Towards a New Method to Produce Usable Information Architectures
Christos Katsanos, Vasileios Christoforidis, and Christina Demertzi

Emotive Idea and Concept Generation
Tetsuya Maeshiro, Yuri Ozawa, and Midori Maeshiro

Survey on the Auditory Feelings of Strangeness While Listening to Music
Ryota Matsui, Yutaka Yanagisawa, Yoshinari Takegawa, and Keiji Hirata

Text Reconstructing System of Editorial Text Based on Reader’s Comprehension
Yuki Okaniwa and Tomoko Kojiri

Interfaces for Learning and Connecting Around Recycling
Israel Peña and Jaime Sánchez

Sound Logo to Increase TV Advertising Effectiveness Based on Audio-Visual Features
Kazuki Seto and Yumi Asahi

Research on Visualization Method for Empathetic Design
Miho Suto, Keiko Kasamatsu, and Takeo Ainoya
A Study on HCI of a Collaborated Nurture Game for Sleep Education with Child and Parent
Madoka Takahara and Shun Hattori

Analysis of Resilient Behavior for Interaction Design
Haruka Yoshida, Taiki Ikeda, Daisuke Karikawa, Hisae Aoyama, Taro Kanno, and Takashi Toriizuka

How Information Influences the Way We Perceive Unfamiliar Objects – An Eye Movement Study
Lanyun Zhang, Rongfang Zhou, Jingyi Yang, Zhizhou Shao, and Xuchen Wang

Data Visualization and Big Data

The Nikkei Stock Average Prediction by SVM
Takahide Kaneko and Yumi Asahi

What Causes Fertility Rate Difference Among Municipalities in Japan
Shigeyuki Kurashima and Yumi Asahi

Explore Data Quality Challenges Based on Data Structure of Electronic Health Records
Caihua Liu, Guochao (Alex) Peng, Chaowang Lan, and Shufeng Kong

Feature Analysis of Game Software in Japan Using Topic Model and Structural Equation Modeling for Reviews and Livestreaming Chat
Ryuto Miyake and Ryosuke Saga

Inductive Model Using Abstract Meaning Representation for Text Classification via Graph Neural Networks
Takuro Ogawa and Ryosuke Saga

Enhancing Visual Encodings of Uncertainty Through Aesthetic Depictions in Line Graph Visualisations
Joel Pinney, Fiona Carroll, and Esyin Chew

Satisfaction Analysis of Group/Individual Tutoring Schools and Video Tutoring Schools
Hiroyo Sano and Yumi Asahi

Zebrafish Meets the Ising Model: Statistical Mechanics of Collective Fish Motion
Hirokazu Tanaka
Research on New Design Methods for Corporate Value Provision in a DX (Digital Transformation) Society: Visualization of Value by Lifestyle Derived from Qualitative Analysis
Akio Tomita, Keiko Kasamatsu, Takeo Ainoya, and Kunika Yagi

Evaluating User Experience in Information Visualization Systems: UXIV an Evaluation Questionnaire
Eliane Zambon Victorelli and Julio Cesar dos Reis

Multimodal Interaction

Study of HMI in Automotive ~ Car Design Proposal with Usage by the Elderly ~
Takeo Ainoya and Takumi Ogawa

Pilot Study on Interaction with Wide Area Motion Imagery Comparing Gaze Input and Mouse Input
Jutta Hild, Wolfgang Krüger, Gerrit Holzbach, Michael Voit, and Elisabeth Peinsipp-Byma

Development of a Speech-Driven Communication Support System Using a Smartwatch with Vibratory Nodding Responses
Yutaka Ishii, Kenta Koike, Miwako Kitamura, and Tomio Watanabe

Coordinated Motor Display System of ARM-COMS for Evoking Emotional Projection in Remote Communication
Teruaki Ito and Tomio Watanabe

Fundamental Considerations on Representation Learning for Multimodal Processing
Kenya Jin’no, Masato Izumi, Saki Okamoto, Mizuki Dai, Chisato Takahashi, and Tatsuro Inami

A Fundamental Study on Discrimination of Dominant Hand Based on Motion Analysis of Hand Movements by Image Analysis Using Deep Learning
Takusige Katura

Glasses Encourage Your Choices: A System that Supports Indecisive Choosers by Eye-Tracking
Tatsuya Komatsubara and Satoshi Nakamura

Physiological Measures in VR Experiments: Some Aspects of Plethysmogram and Heart Rate
Shinji Miyake, Chie Kurosaka, and Hiroyuki Kuraoka
Effects of Visual and Personality Impressions on the Voices Matched to Animated Characters
Hiyori Takahashi and Tetsuya Maeshiro

Effects of Gaze on Human Behavior Prediction of Virtual Character for Intention Inference Design
Liheng Yang, Yoshihiro Sejima, and Tomio Watanabe

Interacting with AI and Intelligent Systems

Development of a Light-Emitting Sword Tip Accompanying Thrusts and a Device for Judging Valid Thrusts by Light Spectrum Detection Without an Electric Judge in the Foil Event of Fencing Competitions
Seira Aguni, Tetsuo Nishikawa, Kaito Fujita, Ren Nakanishi, and Yumi Asahi

A Study on Human-Computer Interaction with Text-to/from-Image Game AIs for Diversity Education
Shun Hattori and Madoka Takahara

A Generative Vase Design System Based on Users’ Visual Emotional Vocabulary
Yinghsiu Huang

The Impact of AI Text-to-Image Generator on Product Styling Design
Yu-Hsu Lee and Chun-Yao Chiu

Generating Various 3D Motions by Emergent Imitation Learning
Ryusei Mitsunobu, Chika Oshima, and Koichi Nakayama

Personalized Sleep Stage Estimation Based on Time Series Probability of Estimation for Each Label with Wearable 3-Axis Accelerometer
Iko Nakari, Masahiro Nakashima, and Keiki Takadama

Controllable Features to Create Highly Evaluated Manga
Kotaro Nishizaki and Tetsuya Maeshiro

A Study on Trust Building in AI Systems Through User Commitment
Ryuichi Ogawa, Shigeyoshi Shima, Toshihiko Takemura, and Shin-ichi Fukuzumi

Chatbot to Facilitate Opinion Formation in Web Search
Yuya Okuse and Yusuke Yamamoto
A State-of-Art Review on Intelligent Systems for Drawing Assisting
Juexiao Qin, Xiaohua Sun, and Weijian Xu

Discussion Support Framework Enabling Advice Presentation that Captures Online Discussion Situation
Yuki Shoji, Yuki Hayashi, and Kazuhisa Seta

Triple Supportive Information for Matrix Factorization with Image, Text, and Social Networks
Takuya Tamada and Ryosuke Saga

An Analysis of Factors Associated with Self-confidence in the Japanese
Michiko Tsubaki, Naoki Hemmi, and Yumi Asahi

Detecting Signs of Depression for Using Chatbots – Extraction of the First Person from Japanese
Min Yang and Hirohiko Mori

Author Index
Contents – Part II
Service Design

Method for Assessing the Potential Impact of Changes in Software Requirements of Agile Methodologies Based Projects
Angelo Amaral and Ferrucio de Franco Rosa

Validation of Items of Aspects of Interests in Quality-In-Use -Stakeholder Needs of Each System Domain-
Shin-ichi Fukuzumi

Design Study of Wearable IV Pole: Service Design Perspective
Guizhi Hong and Hong Chen

Extensibility Challenges of Scientific Workflow Management Systems
Muhammad Mainul Hossain, Banani Roy, Chanchal Roy, and Kevin Schneider

The Effect of Color on the Visual Search Efficiency of Mobile Travel Service APP in Night Mode
Junyang Hou, Xiaofan Zhou, and Zhijuan Zhu

Research on Conversational Interaction Design Strategy of Shopping APP Based on Context Awareness
Fusheng Jia, Xinyu Chen, and Yongkang Chen

Influence of Different Language Labels on Perception of Product Value
Yen-Yu Kang and Yu-Dan Pan

Structural Equation Modeling for the Interplay Among Consumer Engagements with Multiple Engagement Objects in Consumer’s Fashion
Masahiro Kuroda, Akira Oyabu, and Ryohei Takahashi

Considerations for Health Care Services Related to the Menstrual Cycle
Mayu Moriya, Suzuka Mori, Momoka Nozawa, Kaito Ofusa, Miho Suto, Ayami Ejiri, Takeo Ainoya, and Keiko Kasamatsu

Dialogue-Based User Needs Extraction for Effective Service Personalization
Takuya Nakata, Sinan Chen, Sachio Saiki, and Masahide Nakamura
The Impact of External Networks on Product Innovation in Social Purpose Organizations: An Empirical Research on Japanese Museums
Shohei Oishi and Akitsu Oe

Does Guaranteeing Anonymity in SNS Use Contribute to Regional Revitalization?
Yurika Shiozu, Soichi Arai, Hiromu Aso, Yuto Ohara, Ichiro Inaba, and Katsunori Shimohara

Effects of Poor Visibility on Riding Anxiety in Riding a Bicycle that Can Be Ridden with Two Infants
Sakurako Toyoshima, Makoto Oka, and Hirohiko Mori

Wayfinding and Navigation in the Outdoors: Quantitative and Data Driven Development of Personas and Requirements for Wayfinding in Nature
Frode Volden and Ole E. Wattne

Knowledge in eLearning and eEducation

Analysis of Classroom Test Results for an Error-Based Problem Presentation System for Mechanics
Nonoka Aikawa, Shintaro Maeda, Tomohiro Mogi, Kento Koike, Takahito Tomoto, Isao Imai, Tomoya Horiguchi, and Tsukasa Hirashima

Using Interactive Flat Panel Display for STEM Education Based on SAMR Model
Yu-Hung Chien, Yu-Jui Chang, Hsunli Huang, Hsiang-Chang Lin, and Jyun-Ting Chien

Analysis of Effects of Raggedy Student CG Characters in Face-to-Face Lectures and Their On-Demand Streaming
Seishiro Hara, Ryoya Fujii, Saizo Aoyagi, and Michiya Yamamoto

Triangle Logic Recomposition Exercise for Three-Clause Argument and Its Experimental Evaluation
Tsukasa Hirashima, Takuya Kitamura, Tomohiro Okinaga, Reo Nagasawa, and Yusuke Hayashi

Proposal for a Semi-subjective Learning Support System with Operation Indices Targeting Vectors
Tomohito Jumonji, Nonoka Aikawa, and Takahito Tomoto

Instructional Design of a VR-Based Empathy Training Program to Primary School Children
Meng-Jung Liu, Chia-Hui Pan, and Le-Yin Ma
Classroom Practice Using a Code-Sharing Platform to Encourage Refinement Activities
Shintaro Maeda, Kento Koike, and Takahito Tomoto

A Learning Support System for Programming that Promotes Understanding of Source Code Function Through Behavior Modeling
Taiki Matsui, Shintaro Maeda, Kento Koike, and Takahito Tomoto

Proposal for Automatic Problem and Feedback Generation for Use in Trace Learning Support Systems
Tomohiro Mogi, Yuichiro Tateiwa, Takahito Tomoto, and Takako Akakura

Improving Educational Outcomes: Developing and Assessing Grading System (ProGrader) for Programming Courses
Fatema Nafa, Lakshmidevi Sreeramareddy, Sriharsha Mallapuram, and Paul Moulema

Development of VR Education System for Media Exchange in Cell Culture Operation
Akihiko Nakajima, Toru Kano, and Takako Akakura

Proposal of Learning Programs: Using the Senseware
Momoka Nozawa, Suzuka Mori, Miho Suto, Kaito Ofusa, Mayu Moriya, Keiko Kasamatsu, and Takeo Ainoya

Investigation of the Relationship Between Map Quality and Higher-Order Thinking in Kit-Build Concept Map
Nurmaya, Aryo Pinandito, Yusuke Hayashi, and Tsukasa Hirashima

Application of the Recomposition Method to Mind Map and Experimental Verification of Learning Effect
Kodai Watanabe, Aryo Pinandito, Nurmaya, Yusuke Hayashi, and Tsukasa Hirashima

Development of a VR Collaboration System to Support Reflection on the Learning Process of Oneself and Others
Yusuke Yagi, Yusuke Kometani, Saerom Lee, Naka Gotoda, Takayuki Kunieda, Masanori Yatagai, Teruhiko Unoki, and Rihito Yaegashi
Supporting Work and Collaboration

Design of an Interview Script Authoring Tool for a Job Interview Training Simulator Using Graph Transformations
Deeksha Adiani, Emily Tam Nguyen, Jessica Urban, Matthew Fadler, Amir Alam, Jonathan Garcia-Alamilla, Nilanjan Sarkar, and Medha Sarkar

Human Factors and Ergonomics Awareness Survey of Professional Personnel in a Large-Scale Company from the Aerospace Industry
Atakan Coşkun, Hacer Güner, and Mehmetcan Fal

Optimization of a Human-Machine Team for Geographic Region Digitization
Steven M. Dennis and Chris J. Michael

Crowdsourced Argumentation Feedback for Persuasive Writing
Hiroki Ihoriya and Yusuke Yamamoto

An Online Opinion-Learning Experiment Simulating Social Interaction on Emerging Technologies: A Case Study of Genome-Edited Crops
Kyoko Ito, Kazune Ezaki, and Tomiko Yamaguchi

Tasks Decomposition Approaches in Crowdsourcing Software Development
Abdullah Khanfor

Comparison of Nature and Office Environments on Creativity - A Field Study -
Ryosuke Konishi, Shinji Miyake, and Daiji Kobayashi

A Conceptual Design of Management Interface for Wireless Sensor Network System
Julia Lee and Lawrence Henschen

What Affects the Success of Programmers in Query Validation Process? An Eye Tracking Study
Deepti Mishra and Yavuz Inal

Comparative Analysis of Manipulation Skills of Experts and Non-experts in Cell Culture Using VR Gloves
Satoru Osada, Toru Kano, and Takako Akakura
Prototyping Process Analyzed from Dialogue and Behavior in Collaborative Design
Fuko Oura, Takeo Ainoya, Ahmad Eibo, and Keiko Kasamatsu

A Study on Visual Communication with Different Conveyance Under MR Remote Collaboration
Keigo Satomi and Hirohiko Mori

Developers Foraging Behavior in Code Hosting Sites: A Gender Perspective
Abim Sedhain, Shahnewaz Leon, Riley Raasch, and Sandeep Kaur Kuttal

Author Index
Information Design and User Experience
Cooperation Mode of 2D and 3D Interfaces on Destination Planning Tasks in the Location-Based AR Application

Fangyuan Cheng, Qing Gu, and Xiaohua Sun (corresponding author)

College of Design and Innovation, Tongji University, Shanghai, China
{2033681,2033666,xsun}@tongji.edu.cn
Abstract. While Augmented Reality (AR) allows users to obtain augmented information about the real world, Location-Based AR (LBAR) promotes further integration of virtual content and architectural landscapes. How to support users to better access architectural or spatial information through the interface is one of the most important directions in LBAR. Previous studies have adopted different strategies to present information in the traditional user interface and AR space. However, few studies have focused on how interfaces in LBAR applications cooperate to better support destination planning tasks in the urban shopping area. We investigated the performance of 6 cooperation modes of the interfaces in LBAR through an empirical study and summarized the findings in terms of presentation cooperation and interaction cooperation. In the end, we propose two new cooperation modes as well as a series of design suggestions to inspire subsequent practice.

Keywords: Location-based augmented reality · Mobile application · Interface Design
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 3–18, 2023. https://doi.org/10.1007/978-3-031-35132-7_1

1 Introduction

Mobile Augmented Reality (AR) applications can project virtual content into the user’s environment, providing people with an engaging and interactive experience at any time and from any location. Location-based AR (LBAR) and marker-based AR are the two main types of AR applications available for mobile devices [13]. LBAR applications use the user’s location and a geographic information system to connect virtual objects and information to the real world [12]. LBAR shares some characteristics with marker-based AR, such as an engaging experience and intuitive perception. The addition of geolocation also gives LBAR some unique features: it makes it easy to connect real-world structures with their related information [4] and to create highly personalized services [13]. It can create perceptual immersion by triggering events that are closely related to the user’s current surroundings [3].

LBAR’s characteristics make it well suited for organic integration with buildings and urban ecological features to create larger-scale augmented experiences. Access to
architectural and regional information in AR is one of the important directions. Some studies have used AR technology to overlay commercial information on buildings and present a futuristic vision of rich, seamless, and adaptive information presentation in LBAR [11]. Researchers have divided the interfaces in AR into two categories [9]. One is the world-stabilized interface, registered to a geographic location in the physical world; the other is the screen-stabilized interface, presented directly to the user. As the quantity and richness of information in the virtual world grow [5, 11], these interfaces will increasingly serve as an important information display medium. Past studies have used different strategies for presenting information on these two interfaces, but no studies have identified how the two kinds of interface can cooperate to help users better access information. There is also a lack of empirical research on the synergistic performance of these interfaces in information presentation and interaction when delivering commercial information on a building.

Our research explored the cooperation mode of 2D and 3D user interfaces on search tasks in LBAR applications in two aspects: 1) presentation cooperation and 2) interaction cooperation. An empirical study was conducted to identify users’ preferred information presentation mode for two media types in 2D and 3D interfaces and to assess how effectively coordination between the two interfaces improves the user’s search experience. The contributions of this paper are mainly in the following aspects:

1. Explore the characteristics of 2D and 3D interfaces in presenting different types of information
2. Learn about the effectiveness of 6 interface cooperation modes in assisting users with information acquisition and route planning
3. Provide a case study of real-world AR for the scenario of shopping destination discovery and planning
4. Make suggestions for future interface and interaction design for LBAR applications.
2 Related Work

2.1 LBAR for Building Information Representation

AR information rendered on urban buildings is one of the most important directions of LBAR applications. Such applications present information about locations or artifacts registered in the real world, such as icons of restaurants, directly on top of a live video stream [11]. They can serve as the equivalent of a desktop or mobile Web browser for the physical world; some refer to this as the AR browser, and in some early studies such applications were also referred to as the Touring Machine [6], the Real-World Wide Web [10], or the Situated Documentaries [9]. In [10], Rob Kooper et al. developed the Real-World Wide Web, which allows Web pages to be registered with real-world locations and presented to the user through AR devices. By gazing at items of interest in the environment, the user can view the details in a pop-up page. In Hollerer’s research, the user can learn about campus events by looking around in AR and noticing virtual flags with textual labels denoting places of interest [9]. Rhyne proposed an AR system that shows, through a head-mounted display, a view of a restaurant’s detailed information. It is a pop-up window with a short description and
a picture of the interior of the restaurant, and links to the restaurant’s menu, web page, and customer reviews [1]. In these studies, LBAR shows its capability for quick and intuitive access to augmented information about buildings in the real world. Research shows that this directness reduces cognitive effort and provides an advantage over conventional interfaces such as maps or lists [11]. However, commercial AR browsers have not yet reached their full potential. Although AR browsers have already undergone several fundamental iterations of hardware and software, they can still be seen as poor in content, not visually seamless, and rather static [8]. To achieve better access to information, xx suggested the need for better information presentation and interaction design.

2.2 Information Display in AR

AR application interfaces enable users to better access, view, and interact with information. Existing LBAR applications and studies have used different design strategies for AR interfaces than for traditional interfaces. Some studies have proposed that the interface in LBAR consists of a screen-stable part and a world-stable part [9]. The menu bar at the top of the screen and the cone-shaped pointer at the bottom are screen-stable and always visible. The world-stabilized interface is visually registered with specific locations in the world. The interface in [10] displays information nodes as simple icons instead of presenting the raw, unfiltered content. More details appear in the upper left corner of the screen after a short gaze. In the AR system proposed by Bell, a pop-up window with detailed information is automatically positioned near the place of interest [1].

In terms of the design of information presentation for AR interfaces, Bell et al. introduced the concept of view management for AR [1]. It focuses on placing the virtual content in the right place on the screen to ensure that it does not obscure its real-world counterpart and does not overlap with other virtual content. Langlotz explored new layout and representation techniques for textual labels, proposing that labels should be geometrically aligned [11]. Some researchers have focused on the readability of information in AR. Fiorentino et al. [7] recommended black text on a white billboard, whereas Debernardis et al. [15] advised against a white billboard because large areas of white could lead to visual fatigue. They also found that white text on a blue billboard resulted in higher readability [14]. A study has identified design principles and then translated them into a federated model to investigate the usability of AR systems [2].

Some studies point out that current AR-based applications are still difficult to use as a result of non-intuitive, poorly designed user interfaces [12]. Although previous studies have explored interface design and information display for AR, there is still a lack of research on the cooperation between these interfaces in terms of information display and interaction. This gap will hinder the further popularization and application of real-world AR. This paper addresses the gap by means of an empirical study and contributes to the study of LBAR interface design and information display.
3 Method

In this study, we investigate the performance and user preference of the cooperation modes of interfaces, and the characteristics of different interfaces when displaying different types of information in LBAR applications. We focus on scenarios in which LBAR is used to view store information from outside a building. In this section, we introduce the design of the cooperation modes, the experiment setting, the recruitment of participants, and the experimental procedure.

3.1 Cooperation Mode

Based on the classification criteria for interfaces in the LBAR application in [9], we define two kinds of interfaces, named 2D and 3D interfaces. The 2D interface is screen-stabilized and more similar to the traditional mobile interface; the 3D interface is world-stabilized and bound to the corresponding location in the real world, such as labels or information overlaid on buildings.

We analyzed existing software that supports people in store information acquisition and destination planning. Users primarily browse store information via a list, including store name, store description, rating, location, and pictures of the store’s interior environment. Although the type of information presented varies from platform to platform, store names and pictures are generally provided. To keep other information from influencing the experiment result, we simplified the types of information presented and chose the store name (text) and store environment (image). Because of the characteristics of LBAR, it can display location information directly in the 3D interface. To control the variables, we present location information in both 2D and 3D interfaces; in the 2D interface it takes the form of text describing the floor where the store is located. To avoid adverse effects on the experimental results due to users’ increased familiarity with the content over multiple trials, we used different stores in each mode. We chose 90 stores in large shopping malls, including clothing stores, restaurants, technology product stores, movie theaters, stationery stores, etc.

We present information about the selected stores on the 2D and 3D interfaces respectively, forming six cooperation modes (Table 1). Each mode includes 15 stores and the corresponding name, environment, and location information. The interface of each mode can be seen in Fig. 1. We set Mode 1 as the control group; it adopts the card list format commonly used in existing mobile applications, and its 3D interface has no content. In Mode 2, the 2D interface is a list in graphic form, and the floor where the store is located is indicated in the 3D interface. In Mode 3, the 2D interface presents a picture of the store environment, and the 3D interface shows the store name and the floor where it is located. In Mode 4, the 2D interface presents textual information, and the 3D interface presents the store environment and the floor. In Mode 5, the 2D interface has no content, and the 3D interface shows the store name, environment picture, and floor. In Mode 6, both the 2D and 3D interfaces show the store name, environment picture, and floor information. We used the Unity engine and the Immersal SDK to prototype these modes in an LBAR application and deployed it to a mobile device (Fig. 2).
Table 1. Types of information on 2D and 3D interfaces for each cooperation mode

Index    2D interface             3D interface
Mode 1   image, text, location    –
Mode 2   image, text, location    location
Mode 3   image, location          text, location
Mode 4   text, location           image, location
Mode 5   –                        image, text, location
Mode 6   image, text, location    image, text, location
Fig. 1. The interface of each mode
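The assignment of information types in Table 1 can also be written down as plain data, which makes it easy to check which modes separate a store's name from its picture. The following is a minimal sketch only: the dictionary layout and the helper names `splits_text_and_image` and `location_in_3d` are illustrative assumptions and are not part of the prototype described in the paper.

```python
# Table 1 encoded as plain data: the information types shown on each interface.
# The structure and helper names are illustrative assumptions.
MODES = {
    1: {"2D": {"image", "text", "location"}, "3D": set()},
    2: {"2D": {"image", "text", "location"}, "3D": {"location"}},
    3: {"2D": {"image", "location"}, "3D": {"text", "location"}},
    4: {"2D": {"text", "location"}, "3D": {"image", "location"}},
    5: {"2D": set(), "3D": {"image", "text", "location"}},
    6: {"2D": {"image", "text", "location"},
        "3D": {"image", "text", "location"}},
}


def splits_text_and_image(mode):
    """True when no single interface shows both the store name and the picture."""
    layout = MODES[mode]
    shown_together = any({"text", "image"} <= shown for shown in layout.values())
    return not shown_together


def location_in_3d(mode):
    """True when the floor is indicated on the world-stabilized (3D) interface."""
    return "location" in MODES[mode]["3D"]


split_modes = [m for m in MODES if splits_text_and_image(m)]
print("Modes that separate text and image:", split_modes)
```

Under this encoding, only Modes 3 and 4 place the text and the image of the same store on different interfaces.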
3.2 Experiment Setting

The experimental site was a semi-open space near a comprehensive commercial center in Yangpu District, Shanghai, physically separated from the pedestrian road to avoid interference from pedestrian flow. We selected one of the buildings on which to set up the AR content. The LBAR application prototype does not follow the design specifications of any particular OS, so that participants’ differing familiarity with particular system ecosystems would not bias the experiment. Two researchers participated in the experiment. One monitored the participants’ actions, confirmed the sequence in which the modes were experienced, and recorded the process. The other guided the participants in completing the scale and conducted the corresponding interviews based on the results.
Fig. 2. A participant is using the application to complete the experimental task
3.3 Participants

A total of nine participants (seven females and two males, aged 20 to 25) were recruited. They shopped at or visited a shopping mall at least three times a month. Five participants had recent AR experience, and two of them had experience designing or developing AR applications. The other five participants had no prior experience with AR applications but were familiar with the fundamental concepts. All the participants consented to video recording and data collection. Each participant was rewarded RMB 40 at the end of the experiment.

3.4 Procedure

Before the experiment, we introduced its background to bring participants quickly into the scenario. Participants were asked to imagine that they were near a shopping mall and had to find three shopping destinations and plan a shopping tour. They used each of the six cooperation modes in a random order. After using each mode, they filled out a five-point Likert scale (Table 2), which asked how they felt about using the current mode to complete the task. After using all six modes, participants completed a questionnaire that covered the same keywords as before but used multiple-choice and ranking questions to compare the six modes. Finally, the researcher interviewed participants to have them explain the reasons for their choices based on the scale results. Each experiment lasted 30–45 min.
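The paper states only that each participant used the six modes in a random order. One way such an assignment could be generated reproducibly is sketched below; the seeding scheme, the function name `mode_order`, and the participant numbering are hypothetical and are not taken from the study.

```python
import random

MODES = [1, 2, 3, 4, 5, 6]


def mode_order(participant_id, seed=2023):
    """Return a reproducible random permutation of the six modes.

    The seed value and the way it is combined with the participant id
    are arbitrary illustrative choices.
    """
    rng = random.Random(seed * 1000 + participant_id)
    order = MODES[:]      # copy so the shared list is never mutated
    rng.shuffle(order)
    return order


# One order per participant (nine participants, numbered 1 to 9 here).
orders = {pid: mode_order(pid) for pid in range(1, 10)}
for pid, order in orders.items():
    print(f"P{pid}: {order}")
```

Fixing the seed lets the experimenter who confirms the sequence of modes look up the same order again later. A counterbalanced design such as a Latin square would be an alternative when order effects are a concern.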
Table 2. The five-point scale for recording the experience, filled out during the process

Number  Keyword             Question
1       Confusion           When using this mode, I felt distinctly confused and found it difficult to understand
2       Efficiency          When using this mode, I can quickly find the information I want
3       Helplessness        When using this browsing mode, I feel the need to ask for help or clarification
4       Comprehensibility   I think this mode is very easy to understand
5       Difficulty          I think it is easy to complete tasks in this mode
6       Willingness to Use  I would like to use a similar browsing tool again in real life
4 Result

The data collected on the five-point scale conformed to a normal distribution. The variances of two items, Efficiency and Difficulty (goal completion), were unequal, so these two items were analyzed using the Brown-Forsythe test, while the other items were analyzed using ANOVA. The descriptive statistics of the scores, along with the significant differences between modes, can be found in Tables 3 and 4.

There was a significant effect of mode on Difficulty at the p < 0.05 level (Brown F = 2.652, p = 0.039). Post hoc comparisons using the LSD method indicated that Mode 2 (M = 4.11, SD = 0.33), Mode 5 (M = 4.11, SD = 0.93), and Mode 6 (M = 4.22, SD = 0.67) were significantly different from Mode 4 (M = 2.89, SD = 1.27). None of the other metrics showed significant differences (all p’s > 0.05).

Taken together, these results suggest that different cooperation modes of interfaces in LBAR applications did affect the participants’ perceived difficulty in completing the search task. The mode that places text information on the 2D interface and images on the 3D interface supports task completion less well than modes that display both together on the same interface. The results also indicate that Modes 2–6 perform similarly to the traditional flat list (Mode 1) in terms of Confusion, Efficiency, Helplessness, Comprehensibility, and Willingness to Use.

Table 3. Results of the ANOVA (M ± SD; n = 9 per mode)

Item                Mode 1       Mode 2       Mode 3       Mode 4       Mode 5       Mode 6       F      p
Confusion           1.78 ± 1.09  2.11 ± 0.60  2.33 ± 1.22  2.89 ± 1.17  2.11 ± 1.45  2.11 ± 1.27  0.917  0.478
Helplessness        1.89 ± 1.05  2.78 ± 0.83  2.67 ± 1.41  2.44 ± 1.01  1.89 ± 1.36  1.89 ± 1.27  1.145  0.350
Comprehensibility   4.11 ± 1.05  4.00 ± 0.71  3.67 ± 1.00  3.33 ± 1.22  4.44 ± 0.53  4.33 ± 0.71  1.932  0.106
Willingness to Use  3.22 ± 1.20  3.22 ± 0.97  3.00 ± 1.22  3.00 ± 1.41  3.78 ± 1.20  3.89 ± 0.93  0.990  0.434
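The F values in Table 3 can be approximately recomputed from the reported means and standard deviations alone. The sketch below does this for the Confusion item. It assumes a one-way ANOVA that treats the six modes as independent groups of equal size (n = 9); the function name `anova_f_from_summary` is illustrative, and because the inputs are rounded to two decimals the result can only be expected to match the table approximately.

```python
def anova_f_from_summary(means, sds, n):
    """One-way ANOVA F for k equally sized groups, from group means and SDs."""
    k = len(means)
    grand_mean = sum(means) / k                 # valid because all groups have size n
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n - 1) * sum(sd ** 2 for sd in sds)
    df_between = k - 1
    df_within = k * (n - 1)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Confusion row of Table 3 (M and SD for Modes 1 to 6, n = 9 each).
confusion_means = [1.78, 2.11, 2.33, 2.89, 2.11, 2.11]
confusion_sds = [1.09, 0.60, 1.22, 1.17, 1.45, 1.27]

f_value, df1, df2 = anova_f_from_summary(confusion_means, confusion_sds, n=9)
print(f"Confusion: F({df1}, {df2}) = {f_value:.3f}")
```

The recomputed value comes out at roughly 0.92, in line with the reported F = 0.917. The p-value needs the F distribution, which the Python standard library does not provide, so it is left out of this sketch.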
Table 4. Results of the Brown-Forsythe test (M ± SD)

Item        Mode 1       Mode 2       Mode 3       Mode 4       Mode 5       Mode 6       Brown F  p
Efficiency  3.67 ± 1.22  3.78 ± 0.67  3.56 ± 1.01  3.33 ± 1.22  4.11 ± 0.33  4.22 ± 1.09  1.062    0.397
Difficulty  3.67 ± 1.12  4.11 ± 0.33  3.67 ± 0.87  2.89 ± 1.27  4.11 ± 0.93  4.22 ± 0.67  2.652    0.039*
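The Brown F statistic for Difficulty can be approximated from the summary statistics in the same way. The sketch below assumes that the Brown-Forsythe test used here is the robust test of equal means (not the Brown-Forsythe test of equal variances) and that each mode has n = 9 observations, as in Table 3. The function name is illustrative, and the Satterthwaite approximation is used for the denominator degrees of freedom.

```python
def brown_forsythe_from_summary(means, sds, ns):
    """Brown-Forsythe robust test of equal means, from group summary statistics."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    numerator = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Each group's variance is weighted by (1 - n_i / N).
    weights = [(1 - n / total_n) * sd ** 2 for n, sd in zip(ns, sds)]
    denominator = sum(weights)
    f_star = numerator / denominator
    df1 = len(means) - 1
    # Satterthwaite approximation for the denominator degrees of freedom.
    df2 = 1 / sum((w / denominator) ** 2 / (n - 1) for w, n in zip(weights, ns))
    return f_star, df1, df2


# Difficulty row of Table 4 (Modes 1 to 6); n = 9 per mode is assumed.
difficulty_means = [3.67, 4.11, 3.67, 2.89, 4.11, 4.22]
difficulty_sds = [1.12, 0.33, 0.87, 1.27, 0.93, 0.67]

f_star, df1, df2 = brown_forsythe_from_summary(
    difficulty_means, difficulty_sds, ns=[9] * 6
)
print(f"Difficulty: Brown F({df1}, {df2:.1f}) = {f_star:.3f}")
```

With equal group sizes the statistic has the same value as the ordinary ANOVA F; the two tests differ in the denominator degrees of freedom, which shrink when the group variances are unequal. The recomputed value is about 2.63, close to the reported Brown F = 2.652 given the rounding of the inputs.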
We performed a statistical analysis of the single-choice results from each participant’s final questionnaire. It showed that Mode 1 and Mode 5 performed equally in adaptability (each chosen by 33.3% of participants), and that Mode 6 and Mode 5 performed similarly in efficiency (each chosen by 33.3% of participants). Mode 4 was considered the most difficult to understand (44.4%) and the mode for which help was most needed (44.4%). At the end of this questionnaire, we asked the participants to rank the modes in order of preference and assigned weights from 6 to 1 to the most to least liked modes. We performed a one-way between-participants ANOVA (Table 5) and created a graph (Fig. 3) of the result. We found that cooperation mode had a significant effect on user preference at the p < 0.01 level (F = 4.862, p = 0.001). Participants found Mode 5 (M = 5.00, SD = 0.93) to be the most preferable, followed by Mode 6 (M = 4.75, SD = 1.28). The participants’ least preferred modes were Mode 3 (M = 2.63, SD = 1.41) and Mode 4 (M = 2.38, SD = 1.19). In addition, we found that participants’ preferred modes were consistent with their evaluation of the performance of each mode.
Fig. 3. Results of the participants’ preference for each mode
Table 5. Results of the ANOVA (M ± SD)

       Mode 1       Mode 2       Mode 3       Mode 4       Mode 5       Mode 6       F      p
Score  3.00 ± 2.14  3.13 ± 1.36  2.63 ± 1.41  2.38 ± 1.19  5.00 ± 0.93  4.75 ± 1.28  4.862  0.001**
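The preference scores in Table 5 come from converting each participant's ranking into weights from 6 (most liked) to 1 (least liked). A minimal sketch of that conversion follows. The three rankings in it are hypothetical placeholders, not data from the study, and the function name `ranking_to_scores` is an assumption.

```python
from statistics import mean, stdev


def ranking_to_scores(ranking):
    """Map a ranking (most liked mode first) to weights k, k-1, ..., 1."""
    k = len(ranking)
    return {mode: k - position for position, mode in enumerate(ranking)}


# Hypothetical rankings for three participants (NOT data from the paper).
rankings = [
    [5, 6, 1, 2, 3, 4],
    [6, 5, 2, 1, 4, 3],
    [5, 6, 2, 3, 1, 4],
]

# Collect every participant's weight for each mode.
scores_by_mode = {mode: [] for mode in range(1, 7)}
for ranking in rankings:
    for mode, score in ranking_to_scores(ranking).items():
        scores_by_mode[mode].append(score)

for mode, scores in sorted(scores_by_mode.items()):
    print(f"Mode {mode}: M = {mean(scores):.2f}, SD = {stdev(scores):.2f}")
```

Because every participant distributes the same weights, the six mode means always sum to 21 (an average of 3.5), so a mode can only score high at the expense of the others.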
5 Discussion

5.1 Task Supportability

In the experiment, participants were asked to choose destinations and plan tour routes. In our observations and follow-up interviews, they showed a consistent decision sequence: 1) view stores; 2) select specific stores; 3) view the corresponding geographic information; and 4) sequence destinations. We divided these four steps into two phases: the Finding Destination (FD) phase (steps 1 and 2) and the Planning Route (PR) phase (steps 3 and 4).

Finding Destination (FD)

In this stage, participants need to determine their interest by browsing store information. We found that the correlation between images and text had an impact on participants’ experience of finding a destination. As P4 said, “When using the Mode 2, the card UI is boring but clear, and Mode 5 and 6 combine text and images so you can see them at a glance and feel at ease”. Participants reported increased difficulty in understanding the store information in Mode 3 and Mode 4, where the images and text of the same store were presented separately in the 2D and 3D interfaces. After experiencing Mode 4, P3 stated, “There is some three-dimensional information on the building wall and some buttons on the screen, looking back and forth makes me feel tired.” According to P5, “Mode 3 made me more confused, because if the information of the same store is separated, I will always subconsciously ignore a certain part.” Additionally, users wanted more information in order to compare different destinations. The introduction of AR raised participants’ expectations of more dynamic information; P4 mentioned that she would like to see information such as real-time foot traffic and empty seats.

Planning Route (PR)

In this stage, users prefer to view the floor information in a comprehensive and clear way. The experiment task requires the participants to describe a spatial movement route, so they need to review the store information and then sequence the destinations of interest. In Mode 1, the floor is included in each card only in textual form, as additional information about the store. While this mode was familiar to users and easy to follow, they showed more pronounced difficulties in recalling their choices and sequencing their actions. As P7 mentioned, “When using Mode 1, if I want to confirm the location of multiple stores, I have to tap the same card repeatedly, otherwise, I forget it in a flash.” P9 also mentioned, “When using Mode 1, I need to remember information while looking at it.” P4 commented after using Mode 1: “Although the floor indication is
obvious, I forget the initial selection after looking at a few more stores.” Mode 5 and Mode 6, on the other hand, received the most positive comments for planning because they place store information directly on the building. P3, P4, and P9 all said that Mode 5 makes information available at a glance, and P5, P7, and P8 appreciated that Mode 6 could provide comprehensive information on both the 2D and 3D interfaces. In addition, P8 mentioned that it was most convenient to determine the touring route in one pass through LBAR, and P9 thought that LBAR provided an intuitive map-like view that helped him remember the route in a more visual way.

5.2 Cooperation of the Interface

In this section, we examine the characteristics of LBAR interfaces in terms of Representation Cooperation (RC) and Interaction Cooperation (IC). RC focuses on how information is organized and distributed on these interfaces, and IC focuses on the interaction paths and the feedback between the two interfaces.

Representation Cooperation (RC)

In the experiment, two types of information (text and image) were presented to participants in different forms of organization. We found that the participants preferred the two types of information to be placed together (Modes 5 and 6), believing that this helped them perceive information more comprehensively and gave them more options for accessing information (P8, P9), particularly when exploring an unfamiliar environment. In Mode 6, there are images and text on both the 2D and 3D interfaces. P4 noted that this eliminates the unfamiliarity of operating the AR by providing a relatively familiar flat list. However, this approach takes up more screen space than the other modes and makes the screen overfilled (P7) and distracting (P6, P8).

Several participants expressed concern that presenting the same amount of information at the same level on the 2D and 3D interfaces would increase cognitive burden, and suggested that emphasizing priorities could address this issue. Participants perceived an increase in the cost of information comprehension when images and text were presented separately, as in Modes 3 and 4. As P6 pointed out, her eyes had to move back and forth between the two interfaces. This format also causes information asymmetry between the two interfaces, as seen in Mode 4, where the floor information implied by the image grouping does not correspond to its 2D interface. However, such a mode has implications for conveying primary and secondary relationships. P4 suggested that in Mode 2, the image’s details can be seen more clearly as supplementary information. P9 proposed that in Mode 3, the text list in the 2D interface could serve as an index that gives a clear overview of the information in the 3D interface.

The store location (floor) was regarded as critical information by the participants. We found that presenting floor information in conjunction with pictures or text in the 3D interface, as in Mode 5 and Mode 6, assisted participants in planning their route. P6 said such modes gave her a three-dimensional feeling and helped her understand the spatial distribution of stores, while P9 said they allowed her to form a mental map of the route, saving her memory costs. However, presenting only the floor information in the 3D interface added complexity because the participant felt she needed to satisfy both
the conditions of viewing in list and scanning the building in order to locate the store, which required more but meaningless attention (P6). Interaction Cooperation (IC) In the experiment, we discovered that the corresponding interaction feedback between 2D and 3D interfaces was crucial. Because of the lack of visual guidance, participants sometimes did not know that they needed to lift their phones to a certain height to see the AR content after clicking on the card in the 2D interface (P2), as P1’s comment on Mode2 that she did not know what information the interface wanted to show her. This means that when switching between 2D and 3D interfaces, clearer hints are required to avoid the user becoming disoriented, such as guiding the user’s sight movement with arrows and lines. Furthermore, prolonged viewing of AR content necessitates holding the device and tilting the head all the time, which causes physiological fatigue. It is necessary to reconsider the timing of switching between 2D and 3D interfaces, or to ensure users’ comfort by designing appropriate interaction techniques. We categorized two groups of participants with different types of information reading habits in the interviews, namely, image readers and text readers. When viewing a card that contains both images and texts, the image reader will prioritize the image or video information, while the text reader will prioritize the text information. We discovered that 80% of image readers would first view the content in the 3D interface, regardless of the type of information, and then click on the content of interest to view more information. Whereas all four text readers would first view the 2D interface for an overview before clicking on the content of interest in the 3D interface to view more information such as floors. We hypothesize that users’ daily reading habits influence the process of viewing information in AR. 
We do not go into detail because this is not the focus of this study, but the observation is instructive for future research and practice on the relationship between human cognitive habits and interface design for AR. 5.3 Information Cognition Through observations and interviews, we found that the ways of presenting information in the 2D and 3D interfaces influence subjects’ cognitive behaviors. We thematized the influences that emerged in the interviews by open coding, resulting in four major themes: Quantity, Layout and Hierarchy, Content Clarity, and Action Cost. Quantity The quantity of information mentioned here is a relative concept, defined by the user’s subjective perception. When participants are cognitively burdened, they perceive the current information as “too much”. This burden may be borne by the 2D and 3D interfaces, or by a combination of both. When there are a lot of stores, the flat list on the 2D interface becomes too long and takes a long time and effort to browse (P7). In addition, participants expressed that presenting too much textual information on either 2D or 3D interfaces can increase cognitive stress, but this can be effectively alleviated by zoning out the images as backgrounds (P4). Some participants found Mode 6 stressful due to
the amount of information that filled the screen, which generated too many portals and left them not knowing where to begin. Layout and Hierarchy The layout and hierarchy of information also had an impact on the participants’ information cognition. Mode 5 was described as clear and intuitive by most participants. We concluded that this was due to the clear floor division, the area separation under the stores’ names, and the highlighting and expansion of the image when clicked. P5 noted that the 3D interface seems well organized by adding frames and colors, and that the information hierarchy seems clearer by highlighting and graying out the selected and unselected contents. Content Clarity People usually choose a store as a destination based on its brand image and interior design, so it is critical to effectively display this information. Several participants described Mode 4 as unclear, cluttered, and difficult to understand. This may be related to our choice of images, but we believe that the main reason is the disadvantage of the 3D interface in terms of rendering images. P3 also pointed out that when the building is covered with pictures rather than text, it is more difficult to identify each store. Furthermore, the perspective problem in the 3D interface weakens content at a distance, which is more obvious for images. We found that participants who could not see the content clearly applied their 2D interaction habits to the 3D interface, such as two-finger zooming and swiping, although the prototype we provided did not support such interactions. Action Cost When viewing information in the 2D interface, participants had to repeatedly swipe up and down to view, compare, and remember stores’ locations, whereas viewing information in the 3D interface required holding the device and tilting the head all the time.
We noticed that as participants became more proficient in using the 3D interface, they were able to find information more efficiently, with less time spent tilting their heads and less costly movements, and their acceptance of the 3D interface increased significantly.
6 Design Strategy Based on the analysis of the experimental results, we summarize the strengths and weaknesses of the six cooperation modes and integrate the participants’ comments and recommendations to propose two recommended cooperation modes for future 2D and 3D interface collaboration in LBAR, as well as more specific design recommendations.
6.1 Cooperation Mode Index Mode In this mode, the 2D interface displays a text or graphic list as an index or overview, while the 3D interface shows complete graphic information and more vivid multimedia information. Users are presented with both 2D and 3D interfaces at the same time. Interaction and interface design are used to establish correspondence between the content on the two interfaces. This mode is applicable to users who have text reader characteristics, and to scenarios in which the area is unfamiliar and contains many new stores. This mode helps users browse a flat list on the more familiar 2D interface and make initial decisions, and then view more information in the 3D interface, which makes it more user-friendly for novice AR users. It is worth noting that when the index is in the form of a graphic list, designers should consider providing the option of retraction to avoid the cognitive burden of having too much information on the screen. Auxiliary Mode The 3D interface contains all of the text and image information in this mode, and the 2D interface is used to supplement the detailed information, such as the description of the store. The user first browses the 3D interface and then interacts with the information to open the 2D interface. This mode is primarily appropriate for users who have image reader characteristics, and for scenarios that involve planning a touring route in familiar shopping areas. This mode helps users to intuitively view the distribution of stores in three dimensions and create a route planning map. It is better suited for users who are used to searching for information with AR. 6.2 Design Advice Based on the experimental analysis, we summarized four themes of LBAR design suggestions to provide insights for future design practices. Control the Amount of Information When viewing the 3D interface on mobile devices, users must tilt their heads.
Despite the fact that each mode’s experience time was only 1–2 min, the subjects indicated that they would feel fatigued. As a result, care should be taken in the design to control the amount of information in the 3D interface so that users can quickly access the information they need. It should be noted that too much textual information is disliked by users in both 2D and 3D interfaces. Additionally, as information becomes more complex, filters will be required to assist users in narrowing the amount of information and choices available. Provide Correspondence When using both 2D and 3D interfaces, the two should reflect the correlation in information presentation, such as the location of a floor in the real world, which should also be reflected in the 2D interface through grouping and other design means. To prevent
users from becoming disoriented when switching between 2D and 3D interfaces, clear visual guidance and feedback should be provided. Improve Readability Image selection is important: avoid images with cluttered content and reduce color noise. Also, consider using highlighting, underlining, separating, and other design tools to improve information readability in the 3D interface. When the 3D interface corresponds to a high floor, consider using interactive methods such as zooming in to ensure that the user can still see the content. More Multimedia and Interactive Methods Users have higher expectations for AR content, such as 3D models and dynamic effects. They also believe that the features of AR encourage them to interact with it more. Richer information forms can be considered in future design to present information that is difficult to present in traditional interfaces, such as the real-time flow of people in the store, the road map between floors, and so on.
7 Limitation and Future Work Task Setting In the experiment, participants were instructed to assume that they were already close to their destination and were interested in the building in front of them but did not know which stores to visit. However, when realistic usage scenarios are considered, the user’s search behavior may occur in a larger spatial context. For example, users may filter destinations and locate them within a 500 m radius. This implies that the amount of information will expand. If this situation is taken into consideration, task settings must be further optimized. Building Exterior The experiment’s target building has a glass facade and seven ground floors. In Shanghai’s mixed-use commercial districts, there are many modern buildings with less informative facades like this one. Because the participants were long-term residents of downtown Shanghai, this type of building was chosen primarily for their cognitive experience. However, it should be noted that the appearance of commercialized buildings varies greatly from district to district. Some buildings, for example, may have colorful signage hung from the facade, whereas others may be larger or smaller in size. Future research should look into the impact of such cityscape differences on users’ information perception and destination planning behavior. Perspective Issues The perspective issue was another stumbling block in the participants’ LBAR experience. This difficulty would be more noticeable if the information presented was more complex or at a greater spatial distance. Furthermore, a commercial area in the real world usually has a concentration of multiple buildings. When users arrive in an unfamiliar area, they
are undecided which building they will enter. As a result, there may be a need to compare store information across multiple buildings. This also implies that the spatial relationship of buildings may have an impact on the interface design of live-action AR. There is a need to clarify how to avoid operational barriers caused by perspective problems in the study of live-action AR design for large scale spaces.
8 Conclusions In this paper, we designed and developed an LBAR prototype with six interface cooperation modes. We conducted an empirical study in which participants used each mode to plan destinations in an urban shopping area. Participants rated each cooperation mode on a scale of 1 to 5 after using it. After using all modes, participants were invited to complete a questionnaire containing single-choice and ranking questions to gauge how they had perceived these modes. We further examined the factors influencing the participants’ selections through interviews. We found that, compared to a 2D interface with text and a 3D interface with images, a 3D-only interface and a coexistence of 3D and 2D interfaces make it significantly easier to complete destination planning tasks. These two modes are also the most preferred by the participants. In addition, we found advantages of the 2D interface in providing content indexing and of the 3D interface in providing route maps. We summarized our findings in terms of presentation cooperation and interaction cooperation and discussed the insights concerning information cognition in LBAR. We then proposed two cooperation modes for future 2D and 3D LBAR applications, as well as design recommendations for information presentation on the interfaces. Finally, we summarized the experiment’s limitations and proposed feasible future research directions.
Generalized Cohen’s Kappa: A Novel Inter-rater Reliability Metric for Non-mutually Exclusive Categories
Andrea Figueroa, Sourojit Ghosh(B), and Cecilia Aragon
University of Washington, Seattle, USA
Abstract. Qualitative coding of large datasets has been a valuable tool for qualitative researchers. In terms of inter-rater reliability, existing metrics have not evolved to fit current approaches, presenting a variety of restrictions. In this paper, we propose Generalized Cohen’s kappa, a novel IRR metric that can be applied in a variety of qualitative coding situations, such as variable number of coders, texts, and non-mutually exclusive categories. We show that under the preconditions for Cohen’s kappa, GCK performs very similarly, thus demonstrating their interchangeability. We then extend GCK to the aforementioned situations and demonstrate it to be stable under different permutations.
Keywords: Inter-rater reliability · Cohen’s kappa · Generalized Cohen’s kappa · Qualitative Coding

1 Introduction
A key component of qualitative research is qualitative coding of data: classifying textual or other, often human-generated, items into nominal categories for future analysis [4]. To account for individual subjectivities, multiple researchers often encode the same dataset. It is thus useful to have a standardized metric for evaluating the overall agreement between coders. This metric is called inter-rater reliability (IRR), which considers agreements and disagreements between coders to produce an overall score, usually between 0 and 1. Though several IRR metrics exist, the most commonly used is Cohen’s kappa [6], which measures agreement between two coders using mutually exclusive categories. For more than two coders, Fleiss’ kappa [10] achieves the same goal, while Krippendorff’s alpha [18] can be used for situations where coders encode unequal amounts of data. However, one limitation of all these metrics is their inability to accommodate taxonomies of non-mutually exclusive categories. This is a major shortfall, since non-mutually exclusive categories can arise fairly commonly in taxonomies [2, 12, 13]. Furthermore, no single IRR metric currently exists that accommodates a variety of qualitative coding situations, such as a variable number of coders, coders encoding unequal amounts of data, and using non-mutually exclusive categories. Since their inceptions, commonly used IRR metrics have not evolved to accommodate such modern qualitative coding situations [1, 9, 14, 23], and there exists a need for a novel IRR metric that is robust enough to do so, yet performs similarly enough to current metrics under their preconditions. We introduce the Generalized Cohen’s Kappa (GCK), which aims to achieve this. Through Monte Carlo (MC) simulations, we first establish that this metric performs similarly to Cohen’s kappa under the preconditions of the latter. We then show how it can be applied to taxonomies of non-mutually exclusive categories, and show how it is robust enough to handle variable numbers of coders encoding unequal amounts of data. In addition to reporting an overall kappa score per combination and overall, GCK also reports on agreement per category and different combinations of coders. We establish GCK as a viable and reliable IRR metric.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 19–34, 2023. https://doi.org/10.1007/978-3-031-35132-7_2
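For readers unfamiliar with the baseline that GCK generalizes, the following is a minimal sketch of classic Cohen’s kappa for two coders and mutually exclusive categories, computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is not the GCK procedure itself, which this excerpt does not specify; the function name, the sample labels, and the handling of the degenerate case p_e = 1 are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Classic Cohen's kappa: two coders, exactly one category per item."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items on which both coders chose the same category.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Assumption: kappa is formally undefined here (0/0); this sketch returns 1.0.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels from two coders for six items.
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this hypothetical example the coders agree on four of six items while chance agreement is 0.5, so the score lands well below the raw agreement rate, which is the correction for chance that the text refers to.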
2 Related Work
2.1 Inter-rater Reliability
Inter-rater reliability (IRR) is a statistical measurement, defined as “the extent to which the variance in the ratings is attributable to differences among the objects rated.” [22] IRR metrics provide an artifact of measuring consensus (or lack thereof) between coders [19], also accommodating for chance agreements [6, 14]. An IRR score of 1 is considered as perfect agreement and a score of 0 is considered agreement by chance, meaning that all raters randomly coded the data. Table 1 shows an interpretation of IRR scores, which also applies to all the IRR metrics presented in this paper.
Table 1. Interpretation of the IRR scores.
IRR score Interpretation
1 and enhanced when γ < 1 but intact when γ = 1. Denote the color coordinate input as $C_{in}$ and output as $C_{out}$; the function is
$$C_{out} = 255\left(\frac{C_{in}}{255}\right)^{\gamma}$$
of which the graph is shown in Fig. 2.
Fig. 2. The gamma curve for RGB
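The gamma function for RGB coordinates can be sketched in a few lines. The snippet below is an illustration only, assuming 8-bit coordinates in [0, 255] and independent exponents per channel; the function names and the sample pixel are hypothetical and not taken from the authors’ implementation.

```python
def gamma_correct(c_in, gamma, c_max=255.0):
    """Apply C_out = c_max * (C_in / c_max) ** gamma to one color coordinate."""
    if not 0.0 <= c_in <= c_max:
        raise ValueError("coordinate must lie in [0, c_max]")
    return c_max * (c_in / c_max) ** gamma


def gamma_correct_rgb(pixel, gammas):
    """Apply a separate gamma to each of the R, G and B coordinates of one pixel."""
    return tuple(gamma_correct(c, g) for c, g in zip(pixel, gammas))


# Hypothetical pixel and exponents (gamma_r, gamma_g, gamma_b).
pixel = (64.0, 128.0, 200.0)
brighter = gamma_correct_rgb(pixel, (0.81, 0.81, 0.81))   # gamma < 1 raises values
unchanged = gamma_correct_rgb(pixel, (1.0, 1.0, 1.0))     # gamma = 1 leaves them intact
darker = gamma_correct_rgb(pixel, (1.23, 1.23, 1.23))     # gamma > 1 lowers values
```

The endpoints 0 and 255 are fixed points of the curve for any positive gamma, so only the mid-range coordinates move.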
5.2 Gamma Correction in LUV Space

Now we replace the RGB space with the LUV space. We propose here a gamma compensation of the color coordinates in the LUV space. Denote the input and output of $L^*, u^*, v^*$ as $L^*_{in}, u^*_{in}, v^*_{in}$ and $L^*_{out}, u^*_{out}, v^*_{out}$. The non-negative L coordinates are compensated the same as RGB coordinates, while the absolute values of the UV coordinates are compensated, so that both positive and negative values are compensated with the same γ functions as follows.
$$L^*_{out} = 100\left(\frac{L^*_{in}}{100}\right)^{\gamma_{L^*}}$$
$$u^*_{out} = \operatorname{sign}(u^*_{in})\,200\left(\frac{|u^*_{in}|}{200}\right)^{\gamma_{u^*}}$$
$$v^*_{out} = \operatorname{sign}(v^*_{in})\,200\left(\frac{|v^*_{in}|}{200}\right)^{\gamma_{v^*}}$$
The graphs of the γ correction functions of the LUV coordinates are shown in Fig. 3.
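The sign-preserving compensation of the u* and v* coordinates can be illustrated as follows. This is a minimal sketch assuming the scale constants 100 and 200 from the formulas above; the helper names and the sample color are hypothetical.

```python
import math


def signed_gamma(value, gamma, scale):
    """Return sign(value) * scale * (|value| / scale) ** gamma."""
    magnitude = scale * (abs(value) / scale) ** gamma
    return math.copysign(magnitude, value)


def gamma_correct_luv(luv, gammas):
    """Compensate one (L*, u*, v*) triple with exponents (gamma_L, gamma_u, gamma_v)."""
    l_in, u_in, v_in = luv
    g_l, g_u, g_v = gammas
    l_out = 100.0 * (l_in / 100.0) ** g_l   # L* is non-negative, as for RGB
    u_out = signed_gamma(u_in, g_u, 200.0)  # magnitude compensated, sign kept
    v_out = signed_gamma(v_in, g_v, 200.0)
    return (l_out, u_out, v_out)


# Hypothetical color with a negative u* and a positive v*.
sample = (50.0, -40.0, 30.0)
identity = gamma_correct_luv(sample, (1.0, 1.0, 1.0))
enhanced = gamma_correct_luv(sample, (1.0, 0.71, 0.71))
```

Working on the absolute value avoids raising a negative number to a fractional power, which would otherwise yield a complex result in Python.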
How to Share Color Impression Among Different Observers
Fig. 3. The gamma curves for LUV
Fig. 4. Paintings used in impression sharing experiments
R. Kamiyama and J. Chao

6 Experiments
Experiments are conducted in a top-open room with walls of neutral N5.5. The ceiling illumination is Hf Exclusive Hf premier fluorescent lights by Panasonic. A ColorEdge CG2730-Z display by EIZO is used at a viewing distance of 80 cm. Images and adjective pairs are shown in random order to prevent adaptation and memory or prediction effects. The screen shows N5.5 gray for 10 s after each image is evaluated. Each session is limited to 30 min to reduce fatigue of the subjects. Two paintings in Fig. 4 are used in the experiments. To indicate the differences between the SD evaluations of two observers, we define an MSE (Mean Square Error) criterion as follows. Denote Alice’s SD scores of $m$ adjective pairs by the vector $s^A = (s^A_1, \ldots, s^A_m)$, and Bob’s by the vector $s^B = (s^B_1, \ldots, s^B_m)$; then
$$\mathrm{MSE}(s^A, s^B) := \frac{1}{m}\sum_{i=1}^{m}\left(s^A_i - s^B_i\right)^2.$$

6.1 Compensation in RGB Space
First we applied the sharing map to two observers: Alex is a color-normal observer and Bob is color-weak of Type D. Therefore, the color impression sharing map from Alex to Bob creates a modified image that gives Bob the same impression as Alex has of the painting. We call this color-weak compensation. In the other direction, the color impression sharing map from Bob to Alex creates a modified image that gives Alex the same impression as Bob has of the painting. We call this color-weak simulation. The painting Fig. 4(a) is used here. The parameters are chosen as 27 points around the no-modification parameter (1, 1, 1) in the $(\gamma_r, \gamma_g, \gamma_b)$ space,
$$\{(\gamma_r, \gamma_g, \gamma_b) \mid \gamma_r, \gamma_g, \gamma_b \in \{0.81, 1.00, 1.23\}\} = \{0.81, 1.00, 1.23\}^3.$$
For SD evaluations, we used 24 adjective pairs in Table 1, partially from Osgood’s original selection in [6] with additions for paintings [2–4].

Table 1. Adjective pairs used for RGB compensations
light – dark              hot – cold                new – old
stable – changeable       beautiful – ugly          poised – unpoised
mature – youthful         interesting – boring      heavy – light
hard – soft               constricted – spacious    healthy – sick
noisy – calm              feminine – masculine      like – dislike
sharp – blunt             clear – murky             sophisticated – naive
strong – weak             dynamic – static          ornate – plain
deep – shallow            good – bad                positive – negative
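The 27 candidate parameter triples are simply the Cartesian cube of the three gamma levels. A minimal sketch of enumerating them, with the function and variable names chosen here for illustration:

```python
from itertools import product


def gamma_grid(levels):
    """Return every (gamma_1, gamma_2, gamma_3) triple drawn from the given levels."""
    return list(product(levels, repeat=3))


# The three levels reported for the RGB experiment; (1.0, 1.0, 1.0) is no modification.
rgb_grid = gamma_grid((0.81, 1.00, 1.23))
```

Each triple in `rgb_grid` corresponds to one modified version of the painting that an observer would evaluate.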
Fig. 5. Painting (b) shares Alex’s impression on painting (a) with Bob
Fig. 6. Comparison of SD scores before and after Bob shared Alex’s impression
Fig. 7. Painting (b) shares Bob’s impression on painting (a) with Alex
Fig. 8. Comparison of SD scores before and after Alex shared Bob’s impression
Figure 5 shows paintings and parameters which are perceptually equivalent with respect to Alex’s impression of the painting Fig. 5(a). Specifically, the painting Fig. 5(b) shown to Bob gives him the same impression as Alex has of the painting Fig. 5(a). Figure 6(a) shows the SD evaluations of both Alex and Bob on the painting Fig. 5(a). Figure 6(b) shows the SD evaluations of Alex on Fig. 5(a) and of Bob on Fig. 5(b). The SD difference (MSE) between Alex and Bob is reduced from 2.79 to 1.63, a 41.63% reduction. In the other direction, Fig. 7 shows paintings and parameters which are perceptually equivalent with respect to Bob’s impression of the painting Fig. 7(a). In particular, the painting Fig. 7(b) shown to Alex gives him the same impression as Bob has of the painting Fig. 7(a). Figure 8(a) shows the SD evaluations of both Alex and Bob on the painting Fig. 7(a). Figure 8(b) shows the SD evaluations of Alex on Fig. 7(b) and of Bob on Fig. 7(a). The SD difference (MSE) between Alex and Bob is reduced from 4.20 to 0.53, a 87.35% reduction.

6.2 Compensation in LUV Space
We apply the proposed method to a larger group of 9 members, including 6 color-normal observers, 1 color-weak observer, and 2 color-normal observers wearing Variantors, the color deficiency simulation glasses of Type P and Type D [1]. In order to reduce the burden of psychological experiments, we use a short list of adjective pairs in Table 2.

Table 2. Adjective pairs used for LUV compensations
good – bad                   beautiful – ugly
light – dark                 hot – cold
optimistic – pessimistic     colorful – colorless
hard – soft                  ornate – plain
unusual – usual              spacious – constricted
The painting Fig. 4(b) is used here. The parameters in the $(\gamma_{L^*}, \gamma_{u^*}, \gamma_{v^*})$ space are the 27 points around the no-modification parameter (1, 1, 1), as below.
$$\{(\gamma_{L^*}, \gamma_{u^*}, \gamma_{v^*}) \mid \gamma_{L^*}, \gamma_{u^*}, \gamma_{v^*} \in \{0.71, 1.00, 1.41\}\} = \{0.71, 1.00, 1.41\}^3$$
In fact, the proposed sharing from Alice to Bob requires that the target impression of Alice be either inside or close to a simplex of Bob in the SD space. If this is not satisfied, more SD evaluations are needed to meet the condition. Hence we exclude the cases which do not meet the above condition. We also do not consider the cases in which the SD evaluations of two observers are already close before sharing. Twenty pairs of impression sharing were conducted, and the proposed method proved effective in all cases. The reduction rates of the SD differences (MSE) between observers range from 50% to 86%.
Fig. 9. Painting (b) shares Cathy’s impression on painting (a) with Dan
Below, we show two examples. In the first, Cathy is a color-normal observer and Dan is color-weak of Type D. Figure 9 shows color impression sharing from Cathy to Dan. In particular, the sharing map creates a new image Fig. 9(b) by modifying Fig. 9(a) to give Dan the same impression as Cathy has of Fig. 9(a). Figure 10(a) shows the SD evaluations of both Cathy and Dan on the painting Fig. 9(a). Figure 10(b) shows the SD evaluations of Cathy on Fig. 9(a) and of Dan on Fig. 9(b). The SD difference (MSE) between Cathy and Dan is reduced from 3.42 to 0.49, a 85.71% reduction.
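The MSE criterion defined at the start of this section, and the reduction percentages quoted for each pair of observers, can be sketched as below. The SD scores are invented for illustration, and the reduction formula (before − after) / before is an assumption that is consistent with, but not stated alongside, the reported figures.

```python
def mse(scores_a, scores_b):
    """Mean squared difference between two observers' SD score vectors."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score vectors must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / len(scores_a)


def reduction_rate(mse_before, mse_after):
    """Percentage by which the MSE shrank (assumed definition)."""
    return 100.0 * (mse_before - mse_after) / mse_before


# Hypothetical SD scores on four adjective pairs.
target = [1, -2, 3, 0]             # observer whose impression is to be shared
before = [3, 0, 1, 2]              # other observer, original image
after = [2, -2, 3, 1]              # other observer, modified image

mse_before = mse(target, before)
mse_after = mse(target, after)
rate = reduction_rate(mse_before, mse_after)
```

A smaller `mse_after` means the second observer’s evaluation of the modified image sits closer to the first observer’s evaluation of the original.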
Fig. 10. Comparison of SD scores before and after Dan shared Cathy’s impression
Fig. 11. Painting (b) shares Dan’s impression on painting (a) with Cathy
Fig. 12. Comparison of SD scores before and after Cathy shared Dan’s impression
In the opposite direction, Fig. 11 shows color impression sharing from Dan to Cathy. In particular, the sharing map creates a new image Fig. 11(b) by modifying Fig. 11(a) to give Cathy the same impression as Dan has of Fig. 11(a). Figure 12(a) shows the SD evaluations of both Cathy and Dan on the painting Fig. 11(a). Figure 12(b) shows the SD evaluations of Cathy on Fig. 11(b) and of Dan on Fig. 11(a). The SD difference (MSE) between Cathy and Dan is reduced from 3.42 to 0.51, a 85.06% reduction. The second example is color impression sharing between two color-normal observers, Emma and Fred.
Fig. 13. Painting (b) shares Emma’s impression on painting (a) with Fred
Fig. 14. Comparison of SD scores before and after Fred shared Emma’s impression
Fig. 15. Painting (b) shares Fred’s impression on painting (a) with Emma
Figure 13 shows color impression sharing from Emma to Fred. In particular, the sharing map creates a new image Fig. 13(b) by modifying Fig. 13(a) to give Fred the same impression as Emma has of Fig. 13(a). Figure 14(a) shows the SD evaluations of both Emma and Fred on the painting Fig. 13(a). Figure 14(b) shows the SD evaluations of Emma on Fig. 13(a) and of Fred on Fig. 13(b). The SD difference (MSE) between Emma and Fred is reduced from 2.70 to 1.57, a 41.98% reduction.
Fig. 16. Comparison of SD scores before and after Emma shared Fred’s impression
In the opposite direction, Fig. 15 shows impression sharing from Fred to Emma. In particular, the sharing map creates a new image Fig. 15(b) by modifying Fig. 15(a) to give Emma the same impression as Fred has of Fig. 15(a). Figure 16(a) shows the SD evaluations of both Emma and Fred on the painting Fig. 15(a). Figure 16(b) shows the SD evaluations of Emma on Fig. 15(b) and of Fred on Fig. 15(a). The SD difference (MSE) between Emma and Fred is reduced from 2.70 to 0.60, a 77.78% reduction.
7 Conclusions and Future Works

We proposed a novel scheme to share a color impression among different observers. The sharing map is defined as a simplicial map between the parameter spaces of the observers, which is computationally efficient and does not need extensive SD evaluations. Experiments showed that the proposed method is effective for sharing color impressions of paintings between color-normal and color-deficient observers, and between color-normal observers as well. Future work, besides applications to other areas, includes trying different compensation models of the images and further reducing the amount of data that must be measured in psychological experiments.
References
1. Itoh Optical Industrial Co., Ltd: Variantor. http://www.variantor.com/en/
2. Kiyoe, C., Masahiro, H.: Scale construction of adjective pairs on the research of impression of paintings. Kurume Univ. Psychol. Res. 12, 81–90 (2013)
3. Kiyoe, C., Masahiro, H.: Scale construction of adjective pairs on the research of impression of paintings II. Kurume Univ. Psychol. Res. 13, 45–53 (2014)
4. Kozaburo, H., Shigeru, E.: Retrieval of paintings based on color distribution and impression words. In: IPSJ SIG Computers and the Humanities 1995(91 (1995CH-027)), pp. 37–44 (1995)
5. Munkres, J.R.: Elements of Algebraic Topology. CRC Press, Boca Raton (1984)
6. Osgood, C.E., Suci, G.J., Tannenbaum, P.H.: The Measurement of Meaning. University of Illinois Press, Urbana (1957)
7. Oshima, S., Mochizuki, R., Lenz, R., Chao, J.: Modeling, measuring, and compensating color weak vision. IEEE Trans. Image Process. 25(6), 2587–2600 (2016). https://doi.org/10.1109/TIP.2016.2539679
8. Pei, Y., Takagi, H.: Research progress survey on interactive evolutionary computation. J. Ambient Intell. Hum. Comput. pp. 1–14 (2018). https://doi.org/10.1007/s12652-018-0861-9
9. Mochizuki, R., Kojima, K., Reiner, L., Jinhui, C.: Color-weak compensation using local affine isometry based on discrimination threshold matching. J. Opt. Soc. Am. A 32(11), 2093–2103 (2015). https://doi.org/10.1364/JOSAA.32.002093
10. Shinto, M., Chao, J.: How to compare and exchange facial expression perceptions between different individuals with Riemann geometry. In: Kurosu, M. (ed.) HCII 2019. LNCS, vol. 11567, pp. 155–167. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22643-5_12
11. Shinto, M., Chao, J.: A new algorithm to find isometric maps for comparison and exchange of facial expression perceptions. In: Kurosu, M. (ed.) HCII 2021. LNCS, vol. 12762, pp. 592–603. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78462-1_46
12. Takagi, H.: Interactive evolutionary computation: fusion of the capabilities of EC optimization and human evaluation. Proc. IEEE 89(9), 1275–1296 (2001). https://doi.org/10.1109/5.949485
Task-Based Open Card Sorting: Towards a New Method to Produce Usable Information Architectures
Christos Katsanos(B), Vasileios Christoforidis, and Christina Demertzi
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece {ckatsanos,chrvaskos,demertzi}@csd.auth.gr
Abstract. Open card sorting is the most widely used HCI technique for designing user-centered Information Architectures (IAs). The method has a straightforward data collection process, but data analysis can be challenging. Open card sorting has been also criticized as an inherently content-centric technique that may lead to unusable IAs when users are attempting tasks. This paper proposes a new variant of open card sorting, the Task-Based Open Card Sorting (TB-OCS), which considers users’ tasks and simplifies data analysis. The proposed method involves two phases. First, small groups of participants perform classic open card sorting. Then, each participant performs findability tasks using each IA produced by the rest participants of the same group and their first-click success is measured. Analysis of the collected data involves simply calculating the first-click success rate per participants’ IA and selecting the one with the highest value. We have also developed a web-based software tool to facilitate the conduction of TB-OCS. A within-subjects user testing study found that open card sorting produced IAs that had significantly higher first-click success rates and perceived usability ratings compared to the IAs produced by TB-OCS. However, this may be due to parameters of the new method that require finetuning, thus further research is required. Keywords: Card Sorting · Information Architecture · IA · Task-Based Open Card Sorting
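The analysis step described in the abstract, computing a first-click success rate for each participant’s IA and selecting the highest, could look roughly like the sketch below. The data layout, identifiers, and tie-breaking behavior are assumptions made for illustration; they do not describe the authors’ web-based tool.

```python
def first_click_success_rate(outcomes):
    """Share of findability tasks whose first click was correct."""
    if not outcomes:
        raise ValueError("at least one task outcome is required")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def select_best_ia(results):
    """Return (best IA id, rates per IA); ties go to the first IA encountered."""
    rates = {ia: first_click_success_rate(o) for ia, o in results.items()}
    best = max(rates, key=rates.get)
    return best, rates


# Hypothetical outcomes: for each participant-produced IA, whether each
# first click made by the other group members on that IA was correct.
results = {
    "IA_P1": [True, False, True, True],
    "IA_P2": [True, True, True, False, True],
    "IA_P3": [False, False, True],
}
best_ia, rates = select_best_ia(results)
```

Because the selection is a single maximum over per-IA rates, no clustering or similarity matrix is needed, which is the simplification of data analysis that the method aims for.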
1 Introduction
1.1 Card Sorting and Information Architecture
Card sorting is an established technique used to discover how participants might arrange and organise information that makes sense to them [1]. Many studies in the literature have used the method in a variety of contexts, such as exploring participants’ mental models for mental wellness [2], programming [3], cybersecurity [4] and haptic devices [5]. Open card sorting has also been used to group HCI design guidelines [6–8] or validate HCI tools that support the design of interactive systems [9–11]. Card sorting is most frequently employed, however, to assist in the creation or assessment of Information Architectures (IAs) [1, 12, 13].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 68–80, 2023. https://doi.org/10.1007/978-3-031-35132-7_5
The IA of an interactive system specifies how its content is structured and labeled [13]. Open card sorting asks participants to organize a set of provided labels that describe content items (the cards), written on paper or presented in a card sorting software tool, using their own groupings and category names [1]. Other variants that have been proposed in the literature include closed card sorting [1], hybrid card sorting [14] and modified Delphi card sorting [15]. In a closed card sort, participants organize a set of provided cards into a set of provided named categories. In a hybrid card sort, participants can place the provided cards into provided categories or make their own categories. In a modified Delphi card sort, the first participant performs an open sort, and each subsequent participant modifies it until consensus is reached.
1.2 Open Card Sorting
Open card sorting is the most widely used variant to design or evaluate the IA of interactive systems [1, 16]. Previous research has explored various questions related to open card sort data collection and analysis. Research has shown that a sample size of 15 to 20 participants is enough for open card sorts [17, 18]. Participants may complete an open card sort in anywhere between 20 and 60 min, depending on the number of cards [14]. There should be no fewer than 30 cards to sort, since it can otherwise be difficult to establish groups, and no more than 100 cards, because participants might become tired or lost [1]. For a large set of cards, a sub-sample sorting approach has been proposed: if each participant sorts 60% of the total set and there are 30–40 participants involved, then the obtained data are highly similar to sorting done on the full set of cards [19]. Recent research provides support for the validity and reliability of open card sorts [20–22], but has also found that the results are significantly affected by participants’ characteristics, such as sense of direction and self-efficacy [23, 24].
Studies [25, 26] have also compared manual card sorting with physical cards against electronic card sorting using software. No differences have been found in the obtained results. However, the participants’ time spent sorting cards using software was significantly longer than their time spent sorting cards physically, especially for those who did not speak English as their first language [26]. Research [27] has also examined the usability of software tools for card sorting. It was found that researchers and participants preferred different card sorting tools. However, more current research is needed in this area since new card sort tools, such as the open-source CardSorter [28], have appeared and most tools in the existing study [27] are no longer supported or have substantially evolved. There is a rather large body of research on analysis of open card sort data, which is the most challenging part [29–31]. Various open card sort data analysis methods have been proposed in the literature, including tabulations [1] and graph visualizations of the data [32], factor analysis [33], general-purpose clustering algorithms (e.g. hierarchical clustering, k-means clustering, multidimensional scaling) [1, 12, 14, 16, 31, 34, 35] and specialized algorithms developed for clustering open card sort data [29, 30, 36]. Righi and colleagues [31] present best practices for card sort analysis. Spencer [1] argues that both qualitative and quantitative analysis should be employed. Nawaz reports that analyzing the same data using different approaches results in varied IAs [35].
1.3 Research Motivation
This paper presents a new variant of open card sorting, the Task-Based Open Card Sorting (TB-OCS). Our motivation for proposing this new variant was twofold. On the one hand, card sorting has been criticized for not considering users’ tasks, which may lead to unusable IAs [37]. Participants may sort the cards without considering what the content is about or how they would use it to complete a task [37]. TB-OCS considers users’ tasks. On the other hand, open card sort data analysis remains a rather challenging task [29–31]. TB-OCS simplifies data analysis, but this comes at the cost of increased complexity in running the card sort.
In the following, we first present the TB-OCS method. Next, we present a two-phase study that compares the proposed method with the open card sorting method. In the first phase, the same participants sort the same cards following both the open card sorting and the TB-OCS approach. In the second phase, a different sample of participants interacts with two functional prototypes, one for the IA created by the open card sort and one for the IA created by the proposed method, and usability metrics are compared. The methodology and results of these two phases are reported in Sects. 3 and 4, and the paper concludes with a discussion of the findings and future research directions.
2 Task-Based Open Card Sorting
2.1 Procedure
The TB-OCS method involves the following steps:
1. Participants are divided into small groups (e.g., 3–5 participants per group). A small number of participants per group is required for practical purposes, so that the total number of tasks to be performed in the final step of the method remains manageable (see step 3).
2. Each participant of the group performs an open card sort. Each such sort is considered the participant’s proposal for the IA (participant’s IA). No data analysis is required for the open card sort data, as each set of participant’s groupings corresponds to one IA candidate.
3. Each participant of the group performs tree testing [38, 39] on each IA created by the other participants of the group. In a tree test, also known as reverse card sorting or card-based classification evaluation [40], participants are presented with an IA and are asked what they would select in order to accomplish a task. For TB-OCS, the study facilitator notes the total number of successfully completed tasks and the total number of tasks attempted per participant’s IA.
2.2 TB-OCS Software Tool
We have developed a web-based software tool to facilitate conducting TB-OCS. This tool is implemented in React and is freely available as an open-source project at https://github.com/chrvaskos/card-sorting.
Figure 1 presents the user interface of the TB-OCS tool while a user performs an open card sort (second step of the TB-OCS method). As shown in the figure, the tool provides a list of the cards to sort (Fig. 1, left), which can be easily inserted as simple text in the corresponding file of the tool. Each card can be dragged and dropped into either a new category (plus symbol) or an existing one (represented as a box). Cards can be dragged and dropped as many times as the participant wants. Naming a category is done by clicking on the top of the category box and editing its title; the default title is “New category”.
Fig. 1. The user interface of the TB-OCS tool while a user performs an open card sort. The cards and categories are in Greek, the native language of our study participants.
The TB-OCS tool includes functionality to also support the third step of the TB-OCS method, which involves tree testing the participants’ IAs. First, there is a down arrow symbol to the right of the category names which can be pressed to hide or unhide the cards placed in the corresponding category. Hiding the content items of the categories is required for tree testing. In addition, there is a camera icon (Fig. 1, top-right) which downloads a screenshot of the tool user interface when pressed. This enables easy creation of an IA screenshot for the needs of tree testing. The tool also provides a text field for optionally adding the participant’s name or id (Fig. 1, top-left) so that an IA screenshot can be easily associated with a specific participant. Furthermore, there is an icon for instructions to the user (Fig. 1, bottom-right), which are displayed as a tooltip. These text instructions can be easily inserted as text in the corresponding file of the tool. Finally, the TB-OCS tool provides an embedded timer, which can be used by the participant in order to measure the session time for the open card sort. This timer provides the typical controls for starting, pausing and stopping it. Such functionality can be particularly useful when the open card sort is performed asynchronously (e.g., participants are instructed to time their session and send a screenshot with the session time value embedded).
It should be noted that the TB-OCS tool can also be used to facilitate the collection of data in a typical open card sort. The collected data can then be exported and analyzed with other tools, such as Casolysis 2.0 [41], which is also what we did for the needs of the study described in the following (see Sect. 3.4).
2.3 Data Analysis
The data produced by TB-OCS is a list with all the IAs produced by participants’ open card sorting and the total number of successful and attempted tasks per participant’s IA. Analysis of the collected data involves simply calculating the first-click success rate per participant’s IA (i.e., the number of successfully completed tasks for the IA divided by the number of tasks attempted). The proposed IA is the one with the highest first-click success rate.
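The analysis step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IAResult` record, the participant identifiers and the tallies are hypothetical and are not data from the study. The paper does not say how ties between IAs should be resolved, so the sketch simply keeps the first IA with the highest rate.

```python
from dataclasses import dataclass


@dataclass
class IAResult:
    """Tree-test tallies noted by the facilitator for one participant's IA."""
    participant_id: str  # hypothetical identifier of the IA's author
    successful: int      # tasks completed with a correct first click
    attempted: int       # tasks attempted on this IA


def success_rate(result: IAResult) -> float:
    """First-click success rate: successful tasks divided by attempted tasks."""
    if result.attempted == 0:
        return 0.0  # assumption: an IA that was never tested scores zero
    return result.successful / result.attempted


def select_ia(results: list[IAResult]) -> IAResult:
    """Return the IA with the highest success rate (first one wins on ties)."""
    return max(results, key=success_rate)


# Made-up tallies for one group of four participants.
results = [
    IAResult("P1", successful=4, attempted=6),
    IAResult("P2", successful=6, attempted=6),
    IAResult("P3", successful=1, attempted=6),
    IAResult("P4", successful=3, attempted=6),
]

best = select_ia(results)
print(best.participant_id, f"{success_rate(best):.0%}")
```

In a real study the tallies would come from the facilitators' spreadsheet, and a tie-breaking rule (for example, preferring the IA with more attempted tasks) would have to be agreed in advance.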
3 Card Sorting Study
The card sorting study employed both the open card sort method and the proposed TB-OCS method on the same group of cards with the same group of participants. In practice, only the TB-OCS method was employed for the data collection part given that open card sorting is a step of the proposed method. However, data analysis was separate per method. In the following, we describe the methodology and results of the card sorting study.
3.1 Participants
A total of 20 participants, 10 females and 10 males, with mean age 28.4 years (SD = 9.7) were involved in the card sorting study. All the participants were native speakers of Greek, the language used in the cards. They were volunteers recruited by the authors and they were not compensated for their participation.
3.2 Cards Selection
The card sort was about an existing eshop retailing various housewares. We selected a website domain that does not require any specialized knowledge so that no screening criteria would be required for participants and their recruitment would be easier. Following Spencer’s [1] recommendations, a total of 46 cards were chosen from the website. All cards were provided in Greek and were items selected from the lowest level of the website menu as it was available at the time of the study preparation. Examples of these cards translated into English are the following: “Cutlery”, “Bathroom curtains”, “Carpets”, “Knobs”, “Hangers”, “Photo frames”, “Candles”, “Pillowcases”, “Mirrors”, “Blankets”.
3.3 Instruments and Procedures
First, the participants were split into five groups. Each group had four participants. Due to the COVID-19 restrictions, the sessions had to be conducted remotely. We used a Discord server to coordinate communication with participants and our custom-built TB-OCS tool to facilitate the card sorting.
Next, each group of participants was invited to connect to a Discord voice channel at a pre-agreed time per group. This voice channel was used for communication between the study facilitators (i.e., two of the authors) and the participants as a group. The participants were welcomed, provided their consent for study participation and then were asked to read the study instructions. These instructions were available in a Discord text channel and included a hyperlink to the TB-OCS tool used to mediate the card sorting. Subsequently, participants were instructed to mute their microphones and perform a typical online open card sort. At any point, the participants could privately talk with the study facilitators using Discord direct messaging or a one-to-one voice call in case they needed technical help or had a question.
After completing the open card sort, each participant used Discord direct messaging to send the study facilitators two screenshots of their groupings. Screenshot1 presented only the categories created by each participant, whereas Screenshot2 showed the full groupings (i.e., categories and cards placed in each category). These two screenshots were easily captured by participants through the functionality provided by the TB-OCS tool (Fig. 1, camera icon) and were used in the final step of the proposed TB-OCS method, as described in the following.
After a brief break of five minutes, each participant received through Discord direct messaging the following: a) four images showing only the categories created by the other members of the group (Screenshot1), b) three task descriptions, each of which asked for locating a specific product on the eshop. For each Screenshot1, participants were asked to select the category that they would click to find each product and send it to the study facilitators using Discord direct messaging. The study facilitators used the corresponding Screenshot2 to decide whether the correct category had been selected. In addition, they entered in a spreadsheet the following data per participant’s IA: a) total number of successful tasks, b) total number of attempted tasks. Administration of the images with the categories and the tasks was counterbalanced to minimize order effects.
3.4 Data Analysis and Results for Open Card Sorting
Open card sorting data were collected from 20 participants who performed individual sortings. According to research [17, 18], open card sorts require at least 15 users to produce reliable data; therefore, our sample was sufficient.
The collected card sorting data were analyzed combining exploratory and statistical analysis [1]. Our analysis was mediated by Casolysis 2.0 [41], a free software tool that supports a variety of methods for analyzing card sort data. First, the open card sort data were exported from the TB-OCS tool and imported into Casolysis 2.0 as a csv file. Next, the visualization produced by Casolysis 2.0 using multidimensional scaling (MDS) was inspected in order to get a first understanding of the data. Subsequently, we explored the dendrogram produced by average-linkage hierarchical clustering. A dendrogram, a tree diagram that shows a hierarchy of groupings based on the dissimilarity of content items, is the main result of hierarchical cluster analysis. Then, we used the tool functionality to define standardized labels for every group. Each standardized label was the one that “had been used by most participants or represented the idea most clearly” [1]. Subsequently, we reinspected the MDS visualization and average-linkage dendrogram. This process was iterated and was greatly facilitated by the Casolysis 2.0 hold functionality, which enables fixing individual card groups so that they are no longer considered in the next processing step. The latter made it possible to flexibly explore the solution space without losing intermediate results.
3.5 Data Analysis and Results for Task-Based Open Card Sorting
The first step in analyzing the data produced by TB-OCS was to calculate the first-click success rate per participant’s IA. To this end, we used the spreadsheet produced by the study facilitators and simply added a column that divided the total number of successful tasks by the total number of attempted tasks for each participant’s IA. The mean IA success rate was 57% (SD = 22%) and ranged from 17% to 100%. According to the TB-OCS method, we selected as the proposed IA for the eshop the one with the highest success rate, which in our case was 100%.
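As background to the hierarchical clustering used in Sect. 3.4, the sketch below shows one common way to turn open card sorts into the pairwise dissimilarities that such clustering operates on: the share of participants who did not place two cards in the same group. This is an assumption made for illustration; the paper does not describe how Casolysis 2.0 computes its dissimilarities, and the three sorts below are invented, reusing a few card names from Sect. 3.2.

```python
from itertools import combinations


def dissimilarity(sorts, cards):
    """Map each card pair to the share of participants who kept the cards apart.

    sorts: one entry per participant, each a list of groups (sets of card names).
    cards: all card names used in the study.
    """
    together = {pair: 0 for pair in combinations(sorted(cards), 2)}
    for groups in sorts:
        for group in groups:
            # Sorting keeps pair keys in the same order as in `together`.
            for pair in combinations(sorted(group), 2):
                together[pair] += 1
    n = len(sorts)
    return {pair: 1 - count / n for pair, count in together.items()}


cards = ["Blankets", "Candles", "Mirrors", "Pillowcases"]

# Invented sorts from three hypothetical participants.
sorts = [
    [{"Blankets", "Pillowcases"}, {"Candles", "Mirrors"}],
    [{"Blankets", "Pillowcases"}, {"Candles"}, {"Mirrors"}],
    [{"Blankets", "Pillowcases", "Candles"}, {"Mirrors"}],
]

d = dissimilarity(sorts, cards)
for pair, value in sorted(d.items()):
    print(pair, round(value, 2))
```

A matrix of this kind could then be passed to an average-linkage clustering routine to obtain a dendrogram; pairs with dissimilarity 0 were grouped together by every participant, and pairs with dissimilarity 1 by none.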
4 User Testing Study
The within-subjects user testing study compared usability metrics for the IA produced by the open card sorting method (hereafter OCS eshop) against the IA produced by the proposed TB-OCS method (hereafter TB-OCS eshop). In the following, we describe the methodology and results of this user testing study.
4.1 Participants
The user testing study involved 30 participants, 14 females and 16 males, with mean age 31.8 years (SD = 13.3). All the participants were native speakers of Greek, the language of the provided prototypes and questionnaires. They were recruited as volunteers by the authors, and they did not receive any payment for taking part.
4.2 Prototypes
Two functional prototypes were created for the eshop. The prototypes shared the same overall look and feel and featured a top navigation menu. They differed only in their IA. One eshop implemented the IA produced by the open card sorting method and the other eshop the IA produced by the proposed TB-OCS method. The prototypes were created using HTML5, CSS3 and JavaScript, and were made available online through a web server.
4.3 Instruments and Procedures
Due to the COVID-19 pandemic, participants were asked to attend a Discord videoconferencing call with the study facilitator (one of the authors). There they were first welcomed and provided their consent for study participation. Next, the participants received the hyperlinks for each eshop functional prototype and performed five tasks in both prototypes. Each task asked participants to find a specific item to buy: a) cookbook, b) mirror, c) stationery, d) fragrant cards, and e) vanity bag. These items were selected because they were categorized differently in the two eshop versions. To reduce order effects, the order of both the eshop versions and tasks was counterbalanced. Participants used screen sharing so that the facilitator could observe their interactions. The facilitator recorded whether they made the right choice with their first click (first-click success) and how long each task required (time on task).
After performing all the tasks in an eshop version, participants received a hyperlink to complete the System Usability Scale (SUS) [42] in Greek (SUS-GR) [43, 44]. SUS is a standardized scale that measures perceived usability. It has 10 questions and yields a final score between 0 and 100; the higher the score, the more usable the system. In agreement with previous studies [43, 44], SUS-GR was found to have high internal reliability in our dataset; Cronbach’s α = 0.811, N = 10 items.
The Google Forms service was used to create and distribute the study questionnaire. IBM SPSS Statistics 27 was used for the statistical analysis of the collected data.
4.4 Data Analysis and Results
Table 1 presents descriptive statistics of the dependent variables measured in the user testing study. In all subsequent statistical analyses, the effect size r was calculated according to the formulas reported in [45].

Table 1. Descriptive statistics of the dependent variables measured in the user testing study.

| Eshop IA version | Variable | Mean | Mdn | SD | 95% CI |
|---|---|---|---|---|---|
| Open card sorting | First-click success (%) | 72.67 | 80.00 | 17.01 | (66.32, 79.02) |
| Task-based open card sorting | First-click success (%) | 52.00 | 60.00 | 19.37 | (44.77, 59.23) |
| Open card sorting | Time on task (sec) | 13.91 | 12.90 | 4.99 | (12.05, 15.77) |
| Task-based open card sorting | Time on task (sec) | 15.48 | 14.63 | 6.41 | (13.09, 17.87) |
| Open card sorting | SUS score (0–100) | 90.00 | 92.50 | 11.03 | (85.88, 94.12) |
| Task-based open card sorting | SUS score (0–100) | 76.42 | 78.75 | 17.81 | (69.77, 83.07) |
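As a quick consistency check on Table 1, the reported 95% confidence intervals appear to match a standard t-based interval for a mean, mean ± t · SD / √n, with n = 30. The sketch below recomputes them from the table's means and standard deviations. The critical value 2.045 (two-tailed, df = 29) is hard-coded from standard t tables and is an assumption; the paper does not state how the intervals were obtained.

```python
import math

N = 30          # participants in the user testing study (Sect. 4.1)
T_CRIT = 2.045  # assumed two-tailed 95% critical t value for df = N - 1 = 29


def ci95(mean, sd, n=N, t_crit=T_CRIT):
    """t-based 95% confidence interval of a mean, rounded to two decimals."""
    half_width = t_crit * sd / math.sqrt(n)
    return (round(mean - half_width, 2), round(mean + half_width, 2))


# (mean, SD) pairs copied from Table 1.
rows = {
    "OCS first-click success (%)": (72.67, 17.01),
    "TB-OCS first-click success (%)": (52.00, 19.37),
    "OCS time on task (sec)": (13.91, 4.99),
    "TB-OCS time on task (sec)": (15.48, 6.41),
    "OCS SUS score": (90.00, 11.03),
    "TB-OCS SUS score": (76.42, 17.81),
}

for label, (mean, sd) in rows.items():
    print(f"{label}: {ci95(mean, sd)}")
```

Under these assumptions, all six recomputed intervals agree with the values printed in Table 1 to two decimals.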
First-Click Success. A Shapiro-Wilk test found that the distribution of the differences in first-click success between the OCS eshop and the TB-OCS eshop did not deviate significantly from a normal distribution; W(30) = 0.933, p = 0.060. Given that the p value was rather close to the 0.05 threshold, the histogram and the skewness and kurtosis values were also studied. They were found to support the Shapiro-Wilk finding. Thus, a parametric test was used to compare participants’ first-click success between the two conditions. A two-tailed dependent t-test found that participants were significantly more successful with their first click in the OCS eshop (M = 72.67%, SD = 17.01%) compared to the TB-OCS eshop (M = 52.00%, SD = 19.37%); t(29) = 4.356, p < 0.001, r = 0.629.
Time on Task. Shapiro-Wilk analysis showed that the assumption of normality was violated for the differences in the task times of the two conditions; W(30) = 0.919, p = 0.026. Thus, a non-parametric test was used to compare participants’ time on task between the OCS eshop and the TB-OCS eshop. A two-tailed Wilcoxon signed-rank test found that the eshop IA version did not significantly affect participants’ time on task; z = 1.512, p = 0.131. Participants’ median time on task in the OCS eshop (Mdn = 12.90 s) and the TB-OCS eshop (Mdn = 14.63 s) was similar.
SUS Score. A Shapiro-Wilk test found that the assumption of normality was violated for the differences in the SUS score of the two conditions; W(30) = 0.879, p = 0.003. Thus, a non-parametric test was used to compare participants’ SUS score between the OCS eshop and the TB-OCS eshop. A two-tailed Wilcoxon signed-rank test found a significant effect of the eshop IA version on participants’ SUS score; z = 4.258, p < 0.001, r = 0.550. Participants provided significantly higher SUS scores for the OCS eshop (Mdn = 92.50) compared to the TB-OCS eshop (Mdn = 78.75).
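The reported effect sizes can be reproduced from the test statistics alone. The sketch below uses the conversions commonly given in Field's textbook [45]: r = √(t² / (t² + df)) for a t-test and r = z / √N for a Wilcoxon signed-rank test. Taking N as the total number of observations (30 participants × 2 conditions = 60) is an assumption; it is the reading that matches the reported value.

```python
import math


def r_from_t(t, df):
    """Effect size r from a t statistic and its degrees of freedom."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def r_from_z(z, n_observations):
    """Effect size r from a z statistic and the total number of observations."""
    return abs(z) / math.sqrt(n_observations)


# First-click success: t(29) = 4.356
r_first_click = r_from_t(4.356, 29)

# SUS score: z = 4.258, assuming 30 participants x 2 conditions = 60 observations
r_sus = r_from_z(4.258, 60)

print(f"first-click success: r = {r_first_click:.3f}")
print(f"SUS score:           r = {r_sus:.3f}")
```

Both values round to the figures in the text (0.629 and 0.550). By Cohen's widely used benchmarks, an r of 0.5 or more is usually read as a large effect.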
5 Discussion and Conclusion
Open card sorting is an important method for HCI research and practice, and thus there are many publications on how to conduct it and analyze its data. However, open card sorting does not consider users’ goal-directed behaviour when interacting with IAs and thus might result in unusable IAs [37]. Additionally, analysis of open card sorting data remains the main challenge of the method [29–31].
This paper proposed TB-OCS, a new card sorting variant that attempts to address these two limitations of open card sorting. A software tool, named TB-OCS tool, has also been developed to facilitate TB-OCS data collection.
A two-phase study investigated whether the new method produces more usable IAs compared to open card sorting. In the first phase, 20 participants were involved in a card sorting study with 46 cards from an eshop. The study was mediated by our TB-OCS tool. This phase produced two IAs for the eshop: one based on analysis of open card sort data, and one based on analysis of TB-OCS data. In the second phase, 30 participants were involved in a within-subjects user testing study that compared two functional prototypes, one per aforementioned IA. Results showed that users interacting with the IA produced by the TB-OCS method made significantly fewer correct first clicks and provided significantly lower perceived usability ratings compared to when interacting with the IA produced by the open card sorting method. No significant difference was found for the time required to find products.
On the one hand, the proposed method greatly simplified the analysis of the collected data, which is the main challenge in open card sorting. It did so while increasing the session time by only 8 min and 10 s on average: a 5 min break between TB-OCS steps 2 and 3, plus 3 min and 10 s on average for TB-OCS step 3. Of course, the new method also increased the complexity of running the sort for the facilitators. However, we believe that this can be alleviated with some experience and/or pilot testing of the study.
On the other hand, the open card sorting method was found to produce more usable IAs compared to the proposed method. We did not expect this finding, and additional studies are required to investigate whether it is generalizable. If it is indeed generalizable, then this would provide evidence against the critique of the open card sort method that it is too content-centric and may lead to unusable IAs. If it is not generalizable, then we need to explore why the new method works well in some cases and not in others. We speculate that it might be related to parameters of the new method that need fine-tuning, such as the number of groups of participants (5 in this study), the number of participants per group (4 in this study), and the number of tree testing tasks (3 in this study). For example, increasing the number of tree testing tasks could provide an IA that has increased overall findability, but this would also increase the session time.
In conclusion, this research found that the classic open card sorting technique leads to more usable information architectures compared to the proposed technique. However, this may be due to parameters of the new technique that were not explored in this paper. Therefore, additional research is required in order to draw firm conclusions about the proposed technique.
Acknowledgments. We would like to thank the anonymous participants who volunteered to participate in our studies and thus made this research possible.
References
1. Spencer, D.: Card Sorting: Designing Usable Categories. Rosenfeld Media, Brooklyn (2009)
2. Kelley, C., Lee, B., Wilcox, L.: Self-tracking for mental wellness: understanding expert perspectives and student experiences. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 629–641. ACM, New York (2017). https://doi.org/10.1145/3025453.3025750
3. Dorn, B., Guzdial, M.: Learning on the job: characterizing the programming knowledge and learning strategies of web designers. In: Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, Georgia, USA, pp. 703–712 (2010). https://doi.org/10.1145/1753326.1753430
4. Jeong, R., Chiasson, S.: “Lime”, “open lock”, and “blocked”: children’s perception of colors, symbols, and words in cybersecurity warnings. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 14 pp. ACM, New York (2020). https://doi.org/10.1145/3313831.3376611
5. Seifi, H., Oppermann, M., Bullard, J., MacLean, K.E., Kuchenbecker, K.J.: Capturing experts’ mental models to organize a collection of haptic devices: affordances outweigh attributes. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 13 pp. ACM, New York (2020). https://doi.org/10.1145/3313831.3376395
6. Adamides, G., Christou, G., Katsanos, C., Xenos, M., Hadzilacos, T.: Usability guidelines for the design of robot teleoperation: a taxonomy. IEEE Trans. Hum. Mach. Syst. 45, 256–262 (2015). https://doi.org/10.1109/THMS.2014.2371048
7. Kappel, K., Tomitsch, M., Költringer, T., Grechenig, T.: Developing user interface guidelines for DVD menus. In: CHI 2006 Extended Abstracts on Human Factors in Computing Systems, pp. 177–182. ACM, New York (2006). https://doi.org/10.1145/1125451.1125490
8. Zaphiris, P., Ghiawadwala, M., Mughal, S.: Age-centered research-based web design guidelines. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1897–1900. ACM, New York (2005). https://doi.org/10.1145/1056808.1057050
9. Katsanos, C., Tselios, N., Avouris, N.: AutoCardSorter: designing the information architecture of a web site using latent semantic analysis. In: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, CHI 2008, pp. 875–878. ACM, Florence (2008). https://doi.org/10.1145/1357054.1357192
10. Katsanos, C., Tselios, N., Avouris, N.: Automated semantic elaboration of web site information architecture. Interact. Comput. 20, 535–544 (2008). https://doi.org/10.1016/j.intcom.2008.08.002
11. Katsanos, C., Tselios, N., Goncalves, J., Juntunen, T., Kostakos, V.: Multipurpose public displays: Can automated grouping of applications and services enhance user experience? Int. J. Hum. Comput. Interact. 30, 237–249 (2014). https://doi.org/10.1080/10447318.2013.849547
12. Albert, W., Tullis, T.S.: Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Morgan Kaufmann (2013)
13. Rosenfeld, L., Morville, P., Arango, J.: Information Architecture: For the Web and Beyond. O’Reilly Media, Sebastopol (2015)
14. Hudson, W.: Card sorting. In: Soegaard, M., Dam, R.F. (eds.) The Encyclopedia of Human-Computer Interaction, 2nd edn. Interaction Design Foundation (2013)
15. Paul, C.: A modified Delphi approach to a new card sorting methodology. J. Usability Stud. 4, 7–30 (2008)
16. Wood, J., Wood, L.: Card sorting: current practices and beyond. J. Usability Stud. 4, 1–6 (2008)
17. Tullis, T., Wood, L.: How many users are enough for a card-sorting study? In: Usability Professionals Association (UPA) 2004 Conference, Minneapolis, MN (2004)
18. Nielsen, J.: Card Sorting: How many users to test. http://www.useit.com/alertbox/20040719.html. Accessed 10 Feb 2023
19. Tullis, T., Wood, L.: How can you do a card-sorting study with LOTS of cards? In: Usability Professionals Association (UPA) 2004 Conference, Minneapolis, MN (2004)
20. Katsanos, C., Tselios, N., Avouris, N., Demetriadis, S., Stamelos, I., Angelis, L.: Cross-study reliability of the open card sorting method. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, pp. LBW2718:1–LBW2718:6. ACM, New York (2019). https://doi.org/10.1145/3290607.3312999
21. Pampoukidou, S., Katsanos, C.: Test-retest reliability of the open card sorting method. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pp. Article330:1–Article330:7. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3411763.3451750
22. Ntouvaleti, M., Katsanos, C.: Validity of the open card sorting method for producing website information structures. In: CHI Conference on Human Factors in Computing Systems Extended Abstracts, pp. Article374:1–Article374:7. Association for Computing Machinery, New York (2022). https://doi.org/10.1145/3491101.3519734
23. Zafeiriou, G., Katsanos, C., Liapis, A.: Effect of sense of direction on open card sorts for websites. In: CHI Greece 2021: 1st International Conference of the ACM Greek SIGCHI Chapter, pp. Article6:1–Article6:8. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3489410.3489416
24. Katsanos, C., Zafeiriou, G., Liapis, A.: Effect of self-efficacy on open card sorts for websites. In: Proceedings of the 24th International Conference on Human-Computer Interaction, HCI International 2022, pp. 75–87. Springer International Publishing, Gothenburg (2022). https://doi.org/10.1007/978-3-031-06424-1_7
25. Harper, M.E., Jentsch, F., Van Duyne, L.R., Smith-Jentsch, K., Sanchez, A.D.: Computerized card sort training tool: is it comparable to manual card sorting? In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, pp. 2049–2053. SAGE Publications Inc. (2002). https://doi.org/10.1177/154193120204602512
26. Petrie, H., Power, C., Cairns, P., Seneler, C.: Using card sorts for understanding website information architectures: technological, methodological and cultural issues. In: Campos, P., Graham, N., Jorge, J., Nunes, N., Palanque, P., Winckler, M. (eds.) INTERACT 2011. LNCS, vol. 6949, pp. 309–322. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23768-3_26
27. Chaparro, B.S., Hinkle, V.D., Riley, S.K.: The usability of computerized card sorting: a comparison of three applications by researchers and end users. J. Usability Stud. 4, 31–48 (2008)
28. Melissourgos, G., Katsanos, C.: CardSorter: Towards an open source tool for online card sorts. In: Proceedings of the 24th Pan-Hellenic Conference on Informatics, pp. 77–81. ACM, New York (2020). https://doi.org/10.1145/3437120.3437279
29. Paea, S., Katsanos, C., Bulivou, G.: Information architecture: Using k-means clustering and the best merge method for open card sorting data analysis. Interact. Comput. 33, 670–689 (2021). https://doi.org/10.1093/iwc/iwac022
30. Paea, S., Katsanos, C., Bulivou, G.: Information architecture: using best merge method, category validity, and multidimensional scaling for open card sort data analysis. Int. J. Hum. Comput. Interact. Forthcoming, 1–21 (2022). https://doi.org/10.1080/10447318.2022.2112077
31. Righi, C., et al.: Card sort analysis best practices. J. Usability Stud. 8, 69–89 (2013)
32. Paul, C.: Analyzing card-sorting data using graph visualization. J. Usability Stud. 9, 87–104 (2014)
33. Capra, M.G.: Factor analysis of card sort data: an alternative to hierarchical cluster analysis. In: Proceedings of the Human Factors and Ergonomics Society 49th Annual Meeting, pp. 691–695. HFES, Santa Monica (2005)
34. Dong, J., Martin, S., Waldo, P.: A user input and analysis tool for information architecture. In: CHI 2001 Extended Abstracts on Human Factors in Computing Systems, pp. 23–24. ACM, Seattle (2001). https://doi.org/10.1145/634067.634085
35. Nawaz, A.: A comparison of card-sorting analysis methods. In: Proceedings of the 10th Asia Pacific Conference on Computer-Human Interaction, APCHI 2012, pp. 583–592. ACM Press (2012)
36. Paea, S., Baird, R.: Information Architecture (IA): Using multidimensional scaling (MDS) and K-Means clustering algorithm for analysis of card sorting data. J. Usability Stud. 13, 138–157 (2018)
37. Usability Body of Knowledge: Card Sorting. https://www.usabilitybok.org/card-sorting. Accessed 21 Oct 2022
38. Whitenton, K.: Tree testing: fast, iterative evaluation of menu labels and categories. https://www.nngroup.com/articles/tree-testing/. Accessed 10 Feb 2023
39. Hanington, B., Martin, B.: Universal Methods of Design, Expanded and Revised: 125 Ways to Research Complex Problems, Develop Innovative Ideas, and Design Effective Solutions. Rockport Publishers, Beverly (2019)
40. Spencer, D.: Card-based classification evaluation. https://boxesandarrows.com/card-based-classification-evaluation/. Accessed 10 Feb 2023
80
C. Katsanos et al.
41. Szwillus, G., Hülsmann, A., Mexin, Y., Wawilow, A.: Casolysis 2.0-Flexible auswertung von card sorting experimenten. In: Mensch und Computer 2015–Usability Professionals, pp. 444–455. De Gruyter Oldenbourg (2015) 42. Brooke, J.: SUS: a “quick and dirty” usability scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., and McClelland, A.L. (eds.) Usability Evaluation in Industry. Taylor and Francis, London (1996) 43. Katsanos, C., Tselios, N., Xenos, M.: Perceived usability evaluation of learning management systems: a first step towards standardization of the System Usability Scale in Greek. In: Proceedings of the16th Panhellenic Conference on Informatics (PCI 2012), pp. 302–307 (2012) 44. Orfanou, K., Tselios, N., Katsanos, C.: Perceived usability evaluation of learning management systems: Empirical evaluation of the System Usability Scale. Int. Rev. Res. Open Distrib. Learn. 16, 227–246 (2015) 45. Field, A.P.: Discovering Statistics Using IBM SPSS Statistics. SAGE, London (2013)
Emotive Idea and Concept Generation
Tetsuya Maeshiro1(B), Yuri Ozawa2, and Midori Maeshiro3
1 Faculty of Library, Information and Media Studies, University of Tsukuba, Tsukuba 305-8550, Japan [email protected]
2 Ozawa Clinic, Tokyo, Japan
3 School of Music, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Abstract. This paper discusses a model of human creative activities under the influence of emotion, using the framework of wisdom science. The two main components of the model are the concept processing and image processing modules, which function in parallel with information exchange links between them. Emotion affects both modules. The relation of the proposed model to tacit and explicit knowledge is also discussed.
1 Introduction
This paper proposes and discusses the necessity of incorporating emotional aspects in order to model and understand the mechanism of human creative activities, particularly the generation of new ideas and concepts. A model and theoretical basis for studying human creative activities is presented using the framework of wisdom science [3], and this paper specifically discusses the influence of emotional aspects on the generation and manipulation of ideas, images and concepts. Idea and concept generation is the most important and fundamental step in research and artistic activities.
Two representative classes of creative activities exist. One is artistic, where the constraints that limit the freedom of creation are weak or practically non-existent. The other is scientific discovery, where the logical and theoretical constraints of the relevant field have a strong influence. Other cases lie between these two classes.
Creative activity is a result of knowledge manipulation. As such, explicit knowledge corresponds to concept processing, and tacit knowledge to image processing. Another interpretation is that concept processing manipulates explicit knowledge, and image processing manipulates tacit knowledge.
We assume that human creative activities are the results of a parallel interaction between language-based processing and image-based processing [4] (Concept Processing and Image Processing in Fig. 1). The former is associated with logic, and the latter with sensibility. We denote the former as concept processing and the latter as image processing. The term “image” does not imply that image processing is abstract or vague. On the contrary, most images are clear: although describing them, usually in natural language, is difficult, their mental manipulation is possible. Such operations would not be possible if images were vague.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 81–94, 2023. https://doi.org/10.1007/978-3-031-35132-7_6
Fig. 1. Parallel processes of concept processing and image processing
The inseparability of reason and “emotion” is widely accepted [1]. However, conventional models of knowledge manipulation, for instance [5], exclude “emotive” or “non-logical” aspects. This is a flaw of conventional models. This paper presents a model of human creative activities that treats the creative process as an image generation and manipulation process, not limited to concepts linked to “knowledge”. Furthermore, this paper incorporates the emotional aspects, which are completely ignored in previous studies.
Wisdom science [3], upon which the present study is based, treats explicit and tacit knowledge as two different facets of a collection of knowledge elements, denoted the knowledge ocean. Explicit knowledge and tacit knowledge have contrasting properties. The former can be explicitly described, is objective, can be communicated through text, is shared by a group of people, and represents a consensus of that group. Tacit knowledge, on the other hand, cannot be described by text, is individual, subjective and internal, and thus not shared among people, and is strongly associated with the body and personal feelings of each person. The basic framework of wisdom science is that the elements of knowledge of a person are activated and manipulated consciously for explicit knowledge and unconsciously for tacit knowledge. Our model of tacit and explicit knowledge is fundamentally different from the conventional frameworks where tacit and explicit knowledge constitute knowledge of a person and the two types of
knowledge are distinct elements of knowledge of a person (Fig. 2). Therefore, our model proposes that a given knowledge element may be activated as explicit knowledge in one instance and as tacit knowledge in another.
The question-and-answer process is an example of knowledge elements being used as tacit or explicit knowledge depending on the situation, represented by the small circles at the bottom of Fig. 4 that are linked to explicit knowledge and tacit knowledge. When a person is asked a question, for instance in an interview, the person sometimes states opinions, thoughts or ideas that he has never expressed before or was not even aware of, and is surprised to recognize how he had been thinking about the issue. Formulating a sentence to speak involves gathering knowledge elements or concepts related to the content of the speech. In (i) this collecting process and (ii) the meaning generated by the formulated sentence, knowledge elements that belong to tacit knowledge are activated. In the conventional interpretation, this process is a tacit concept being transformed into an explicit concept. However, wisdom science models this process as a concept element that was already evoked as tacit knowledge becoming also evoked as explicit knowledge. As stated before, tacit knowledge and explicit knowledge are the representations of knowledge elements linked to the consciousness of the person possessing that knowledge.
Fig. 2. Conventional interpretation of knowledge of a person
Tacit knowledge is contrasted with explicit knowledge primarily in terms of describability by text. Explicit knowledge and tacit knowledge are closely related to memorization skill and imagination skill, respectively. While memorization skill is related to rationality, imagination skill is related to emotions and feelings. However, memorization of images is also linked to emotions and feelings. Few attempts to model the simultaneous use of explicit and tacit knowledge have been reported. For instance, from the facet related to the individuality of
knowledge, Nonaka treated tacit knowledge based on the innovation process, which is a creative activity, of a group of individuals, and proposed the SECI model [5]. The SECI model connects explicit and tacit knowledge by proposing the mechanisms involved in the endless cycle of explicit knowledge ⇒ tacit knowledge ⇒ explicit knowledge ⇒ tacit knowledge · · · . It offers a general framework, but lacks the detailed mechanism of the individual processes described in the model, particularly the mechanism of transitions between tacit and explicit knowledge. In other words, it fits the observed process because the description is abstract and models abstract phenomena, but no concrete mechanism has been proposed, for instance a structure that might enable computer-based implementations.
Tacit knowledge is so denoted because the person cannot describe his own tacit knowledge, particularly in natural language, and it is even more difficult, if not impossible, for others to describe it. However, it is often possible to describe it vaguely using images or feelings. The author agrees with Polanyi’s statement that the inability to describe something does not negate its existence.
A focus of this paper is the integrated treatment of human creative activity under the influence of emotion. Activity using knowledge, which is not limited to “intellectual” activities, is primarily an individual act. Any activity using knowledge involves not only explicit knowledge, but also tacit knowledge. Conventional studies tend to treat explicit knowledge as an external layer of tacit knowledge, or explicit and tacit knowledge as two independent modules, but this paper treats explicit and tacit knowledge as two facets of the knowledge of a person. Wisdom science models knowledge as a pool of knowledge elements, denoted the knowledge ocean, and these elements are recruited when a knowledge element is used to represent a concept, either explicit or tacit (Fig. 4).
The model assumes two distinct kinds of processing, one for each of explicit and tacit knowledge: concept processing and image processing. This is the basic model. This paper proposes that emotion affects the manipulation of knowledge elements. Concept processing and image processing are distinct, but emotion affects both. The integrated model is shown in Fig. 4. The mechanism that changes the emotional state is hidden in the figure. Moreover, the results of concept and image processing may influence the emotional state, but this is also hidden.
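The knowledge-ocean view described in this section can be made concrete with a small data-structure sketch. It is only an illustration under stated assumptions, not the authors' implementation: the class `KnowledgeOcean`, the enum `Facet`, and the example element names are all hypothetical. It shows the one property the text emphasizes, namely that a single knowledge element can be evoked as tacit knowledge, as explicit knowledge, or as both, without being moved or transformed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Facet(Enum):
    """The two facets under which a knowledge element can be activated."""
    EXPLICIT = "explicit"  # consciously activated, describable by text
    TACIT = "tacit"        # unconsciously activated, not describable by text


@dataclass
class KnowledgeOcean:
    """Hypothetical pool of knowledge elements shared by both facets."""
    elements: Set[str]
    # For each element, the set of facets under which it is currently evoked.
    active: Dict[str, Set[Facet]] = field(default_factory=dict)

    def activate(self, element: str, facet: Facet) -> None:
        """Evoke an element under a facet; the element itself is unchanged."""
        if element not in self.elements:
            raise KeyError(f"unknown knowledge element: {element}")
        self.active.setdefault(element, set()).add(facet)

    def facet_view(self, facet: Facet) -> Set[str]:
        """Return the elements currently evoked under the given facet."""
        return {e for e, facets in self.active.items() if facet in facets}


# Example element names are invented for illustration only.
ocean = KnowledgeOcean(elements={"harmony", "airflow", "rhythm"})
ocean.activate("harmony", Facet.TACIT)
# Answering a question: the same element also becomes evoked as explicit.
ocean.activate("harmony", Facet.EXPLICIT)
```

In this sketch nothing is converted from tacit to explicit; the element simply gains a second activation, which mirrors the contrast the text draws with the conventional interpretation.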
2 Concept and Image Fusion
Human creative activities involve the employment of personal knowledge. The proposed model of creative activity focuses on a different aspect of knowledge, as concept processing corresponds to explicit knowledge, and image processing to tacit knowledge. From the viewpoint of describability, tacit and explicit knowledge share the same ocean, that is, the set of knowledge elements (bottom of Fig. 4). On the other hand, the creative activity model focuses on more dynamic aspects, and includes the additional property of describability, which is what differentiates tacit and explicit knowledge.
Fig. 3. Model of functional interactions between parallel processes of concept processing and image processing with emotion. The whole process is affected by emotion.
The concept ocean and the image ocean are represented as distinct entities, in contrast to the ocean of concept elements shared by tacit and explicit knowledge (bottom of Fig. 4). We introduced the concept of image in knowledge representation and manipulation as an element with fundamentally different properties from the concept element. The possible operations are also fundamentally distinct, as images can be fused, partially removed or gradually changed, which is impossible with concepts. Therefore, our creative activity model composed of concept and image processing offers a new facet of knowledge manipulation, describing a different viewpoint from the conventional treatment of tacit and explicit knowledge. The image ocean can be interpreted as the set of knowledge elements belonging to the ocean (Fig. 6) interpreted as image elements. Similarly, the concept ocean consists of knowledge elements focusing on their describable aspects. Such a classification based on concept and image aspects is completely different from the conventional classification into tacit or explicit, which is based on describability by the person himself.
The proposed model focuses on individual activity, and is related to Polanyi’s treatments [6]. The authors assume that creative activity is based on neural activities. Polanyi also used the concept of emergence in the process of scientific discovery, that is, the relationship between the elements (subsidiary particulars) and the image generated as the result of focusing on the whole. The integrated image of the whole disappears if any of the elements is focused on. Our model offers more concrete aspects than conventional studies by introducing two processes that manipulate two distinct objects, the concept ocean and the image ocean. The model of concept and image processing offers a different perspective from the conventional explicit and tacit knowledge model. Both offer possible interpretations of human knowledge.
In our framework, the oceans, which contain the respective functional and descriptive elements, are the shared components among models. We propose that the elements in the oceans correspond to single
Fig. 4. Wisdom science interpretation of knowledge of a person. Explicit knowledge and tacit knowledge as activated from the ocean of knowledge elements. Bottom dotted circle represents the core of knowledge of a person, and small circles in the bottom dotted circle denote “knowledge elements”. Integrated model of functional interactions between parallel processes of concept processing and image processing with emotion
or groups of neurons. The existence of a shared component ensures the compatibility between the two models, and the two models represent different facets of the knowledge. Compared to the model of explicit and tacit knowledge, the model of concept and image processing is function oriented, focusing on the invocation of concepts and images, and the expansion and reduction entities involved in creative activities.
The conventional definition that tacit knowledge is processed at the unconscious level, partially due to its non-describability, does not apply to image processing. In this respect, image processing differs from tacit knowledge because image processing is executed at the conscious level. Undoubtedly, image processing is not executed only at the conscious level, and images operated on in image processing associated with tacit knowledge are manipulated at the unconscious level. The basic features that define tacit knowledge are unconscious and automatic processing and the inability to describe it. Execution at the unconscious level is therefore an essential feature of tacit knowledge but not of image processing, so tacit knowledge and image processing are incompatible and are models based on different viewpoints. Consequently, image processing is distinct from tacit knowledge, and they denote different aspects of human intellectual activities. One of the main similarities between image processing and tacit knowledge is the difficulty to
describe manipulated elements (knowledge or image) using language. Therefore, Figs. 1, 2 and 4 are distinct models of personal knowledge, but they are compatible because they share the basic elements belonging to the oceans of the respective models.
The proposed model, which consists of concept processing and image processing, shares similarities with the conventional classification of explicit and tacit knowledge. The conventional and proposed classifications are based on different viewpoints for comprehending human knowledge. The designation of explicit and tacit knowledge is based on describability by the person who manipulates the knowledge involved. Hence explicit and tacit knowledge are distinct entities. On the other hand, concept processing and image processing are designated based on the type of knowledge elements engaged in the two kinds of processing. The knowledge elements manipulated in concept processing, the concept elements, are describable entities and correspond to explicit knowledge. Image processing, in contrast, operates on image elements, which are not necessarily describable and can also be non-describable. Consequently, image processing coincides with explicit knowledge when describable image elements are manipulated, and with tacit knowledge when the engaged image elements are non-describable.
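The correspondence stated at the end of this section can be written as a small lookup, which may help when comparing the two classifications. This is a minimal sketch, assuming that "describable" can be treated as a yes/no property of the engaged elements; the function name `corresponding_knowledge` and its string labels are hypothetical and do not appear in the paper.

```python
def corresponding_knowledge(processing: str, describable: bool) -> str:
    """Map a processing type and element describability to a knowledge type.

    Follows the correspondence stated in the text: concept elements are
    describable and correspond to explicit knowledge; image elements may be
    describable (explicit) or non-describable (tacit).
    """
    if processing == "concept":
        if not describable:
            # The text characterizes concept elements as describable entities.
            raise ValueError("concept elements are assumed to be describable")
        return "explicit"
    if processing == "image":
        return "explicit" if describable else "tacit"
    raise ValueError(f"unknown processing type: {processing}")


# The three combinations that the text allows.
cases = [
    corresponding_knowledge("concept", True),
    corresponding_knowledge("image", True),
    corresponding_knowledge("image", False),
]
```

The asymmetry is the point of the sketch: image processing spans both knowledge types, whereas concept processing maps to only one.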
3 Sequence of Image and Concept Generation
In the sequence of images, a single image or multiple images are evoked at each instant. The same holds for concepts. Not all of these images and concepts are recognized by the person himself. Whether single or multiple, the “selection” of the next image is controlled by some mechanism. We propose that this selection, although the detailed mechanism cannot be provided, is influenced by emotion (Fig. 5). Thus the next image generated depends on the emotion, as shown in Fig. 5(B). From the image X at time $T_1$, the next image is generated at time $T_2$, but the generated image depends on the emotion of the person at that instant. The “image Y1” is generated if the emotion is “emotion-A”, the “image Y2” if the emotion is “emotion-B”, and the “image Y3” if the emotion is “emotion-C”. The images Y1, Y2 and Y3 are not necessarily completely distinct, and they may be only partially distinct.
The images are probably not independent entities, i.e., there is no “pool of images” from which an image is selected. Instead, it is more likely that an image is generated by composing, combining or integrating “image elements” selected from the image pool (Fig. 6). The figure illustrates that the image Y1 is generated under emotion-A, and the image Y2 under emotion-B. Figure 6 also gives a detailed description of the image sequence at instant $T_2$, where the emotion influences the choice of “image elements” in the process of image formulation. The emotion influences the whole selection and integration process, which is the reason that the target of the arrow from “emotion” is not clearly indicated. Figure 6 is similar to the concept formulation (Fig. 7) and
Fig. 5. Sequence of image generation with emotion
image formulation (Fig. 8) mechanisms we proposed [3], differing only in the global influence of the emotion. We also propose that emotion influences the selection of concepts, not only concept processing and image processing (Fig. 4). Our experience suggests that multiple, locally different emotions probably do not coexist within one person. Therefore, it is plausible to assume that the emotion is analogous to an environment or atmosphere that surrounds the whole concept and image manipulation system (Figs. 6 and 9).
Figure 5 illustrates the sequence of invoked images. At each instant $T_i$, a subset of image elements $E_1 = (e_{11}, e_{12}, \ldots, e_{1N})$ is selected and synthesized as a single image. Then at the next instant $T_{i+1}$, another subset of image elements $E_2 = (e_{21}, e_{22}, \ldots, e_{2N})$ is selected and synthesized. Basically the two subsets $E_1$ and $E_2$ are distinct, but they may contain the same image elements, i.e., $E_1 \cap E_2 \neq \emptyset$ is possible. Different images can be synthesized from an identical set of image elements by changing the size or relative positions of the image elements in the synthesized image, but we ignore this possibility for simplicity of discussion. No restriction exists on the number of image elements invoked at each time $T_i$, and the number may or may not be the same from one instant to the next.
When thinking about some not yet elucidated physical process, for instance, and trying to formulate a theory about the process, it is helpful to think based on images, visualizing the process as interactions among the elements that are involved in the process. More specific examples are the airflow through a jet engine, and the airflow around a car when designing the shape of the car with the smallest drag coefficient. Although amateurs and novices cannot, experienced specialists are able to visualize mentally and imagine how the air flows over the surface of a car or particular components of the car.
When doing this, the person, such as an engineer or designer, is not mentally calculating the velocity of air for each region of the surface of the car using mathematical formulas. Instead, the person is visualizing in his mind, probably picturing the air stream as thin lines with various colors, the color representing the speed, temperature or pressure. Another example is the physicist Feynman, who visualized mathematical formulas of physical processes with symbols representing
Fig. 6. Image generation affected by emotion. Arrows denote image elements invocation.
different variables visualized as symbols with different colors. The point is not that the visualization was described using language, but that the image itself was used to think.
In music composition, the image comes first, and the composer then places musical notes on music sheets. When musicians perform, they create images from the musical piece and translate the images into sound by manipulating musical instruments. The idea or concept generation is not executed using language, but by generating an image of the sound to be played, or a larger image that corresponds to phrases or melodies. When an experienced critic listens, he is able to identify the lack of emotion, such as love, when a person plays a musical instrument. It is interesting that when the musician mainly pursues playing technique, the played music lacks the emotional drive that influences the listeners. Moreover, a happy, angry or sad emotional state of the instrument player influences the generated sound and the played music (Fig. 10).
The effects of emotion on idea and concept generation can be classified into two types: (1) effects on the generation itself, that is, whether or not an idea or concept is generated; (2) effects on the quality or type of the generated idea. The former can be exemplified by scientific idea generation, as no idea emerges when the person is in an emotional state that is unfavorable for idea and concept generation. Note that this state is not necessarily a negative emotional state, as the emotional state that results in a favorable condition is person-dependent. Examples of the latter are music composition and musical performance. Emotion affects the image generated inside
Fig. 7. Concept generation from pool of concept elements. Arrows denote concept elements invocation.
the mind of the performer, which is reflected in the sound, tempo, loudness and phrase pauses, among other factors, which, when combined, form the impression of the performed music. Note that this does not apply to novices, as novices use all their effort and attention to execute exactly the notes written in the music sheets.
Although no formal survey exists, from the experiences of the authors and of persons around us, favorable and damaging conditions are personal, differing for each individual. However, good (accelerating) and bad (damaging) conditions do exist, which implies the necessity of modeling the influences of emotion on idea and concept generation. Good conditions can occur while in a hot spring, taking a shower, walking in the forest, waking from sleep, drinking tea, in a sauna, listening to particular music, during a particular time of day such as early morning or evening, in a quiet or loud place, a light or dark place, or a large or small place. A good condition for one person can be a bad condition for another person. As stated before, these conditions do not directly affect the idea and concept generation process. Instead, the influence is indirect, as these conditions evoke particular emotions that influence the idea and concept generation process.
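The emotion-dependent image sequence described in this section can be illustrated with a short simulation. The paper states that the detailed selection mechanism cannot be provided, so everything below is an assumption made for illustration: the pool contents, the weight table `EMOTION_WEIGHTS`, and the use of weighted random sampling are hypothetical stand-ins. The sketch represents an image as the set of its elements, consistent with the simplification in the text that size and relative position are ignored.

```python
import random

# Hypothetical pool of image elements (names invented for illustration).
IMAGE_POOL = ["line", "curve", "red", "blue", "fast", "slow"]

# Hypothetical emotion-dependent biases; unlisted elements get weight 1.0.
EMOTION_WEIGHTS = {
    "emotion-A": {"curve": 3.0, "blue": 2.0},
    "emotion-B": {"line": 3.0, "red": 2.0},
}


def select_elements(emotion, n, rng):
    """Pick n distinct image elements; the emotion biases the whole selection."""
    bias = EMOTION_WEIGHTS.get(emotion, {})
    remaining = list(IMAGE_POOL)
    chosen = []
    for _ in range(min(n, len(remaining))):
        weights = [bias.get(e, 1.0) for e in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen


def synthesize(elements):
    """Stand-in for image synthesis: an image is the set of its elements."""
    return frozenset(elements)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image_y1 = synthesize(select_elements("emotion-A", 3, rng))
image_y2 = synthesize(select_elements("emotion-B", 3, rng))
shared = image_y1 & image_y2  # may be non-empty: images can share elements
```

Because the emotion enters only through the weights, it acts on the whole selection rather than on any single element, which is one way to read the remark that the arrow from “emotion” in Fig. 6 has no single target.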
Fig. 8. Image generation from pool of image elements. Arrows denote image elements invocation.
4 Knowledge Manipulation
This paper proposes that emotion also influences knowledge manipulation. For instance, in music composition, the composed music will differ for different emotional states of the composer, even if the employed knowledge elements are identical. If the composer is in a happy state, for instance, the composed music might be in a major mode and contain more open and bright passages, while in the opposite emotional state, the composed music might be in a minor mode. Brahms’s two piano concertos are such examples, where his emotional state at the time of composition influenced the musical piece, as Piano Concerto No. 1 is in a minor mode and has small movements of the melody lines, as opposed to the major mode and dynamic chord movements of Piano Concerto No. 2. Other such examples exist among musical pieces. Therefore, we assume that emotion plays an important role in knowledge manipulation, for both explicit and tacit knowledge.
Our viewpoint on the relationship between explicit and tacit knowledge is different from conventional standpoints. Conventionally, explicit and tacit knowledge are characterized as distinct entities. Treating as two different facets of an entity implies that each of explicit knowledge and tacit knowledge can be treated as two independent systems. One viewpoint is to model the knowledge of
Fig. 9. Concept generation affected by emotion. Arrows denote concept elements invocation.
a person as a system of systems. However, wisdom science treats the knowledge of a person as a different system that is the result of the fusion of explicit knowledge and tacit knowledge, and the elements employed by tacit or explicit knowledge are stored in a different entity. Compared to the conventional interpretation of tacit and explicit knowledge as distinct entities (Fig. 2), the model illustrated in Fig. 4 better explains the transitions of knowledge pieces or elements between tacit and explicit knowledge. Wisdom science mainly focuses on phenomena in individuals, although social knowledge involving groups of people is also treated. If tacit and explicit knowledge are distinct entities as conventionally assumed, it is difficult to explain the mechanism that transforms tacit knowledge into explicit knowledge and vice versa without the existence of something that transitions between tacit and explicit knowledge. Wisdom science also studies how knowledge elements are activated as tacit or explicit knowledge, including the temporal development of knowledge elements, their state, and their handling when deployed as tacit or explicit knowledge. The author proposes that tacit and explicit knowledge are two facets of the knowledge of a person, or emerged states of some of the knowledge elements that belong to the core of the knowledge of a person (Fig. 4).
Fig. 10. Image generation in music performance with influence of emotion
Though not discussed in this paper, another related and more fundamental element is ethics. Analogous to emotion, there are good and bad ethics, and they influence decision making. Although science is supposed to be based on logical and objective reasoning processes, this is not the case in reality, and scientific reasoning is a personal process [6]. It is then natural to assume that emotion is connected with the reasoning process and has a strong influence on it.
5 Conclusions
This paper proposes a model of human creative activities using the framework of wisdom science. The two main components of the model are the concept processing and image processing modules, which function side by side. The hypernetwork model is used, which is capable of representing structures that cannot be described by conventional system models. The concept of system science serves as the basic framework for the modeling and analysis. Wisdom science aims to describe knowledge based on neuron-level phenomena, differing from conventional knowledge studies that rely on vague and abstract levels of description. The use of neuron activities permits quantitative investigation of creative activities, which has not been done before. This paper does not provide a detailed mechanism of the influence of emotion on the selection of concept elements and image elements.
A model framework with sufficient descriptive capability is necessary to describe the integrated personal knowledge of tacit and explicit knowledge. Namely, the model should be able to integrate multiple facets, should not require a precise structure as a prerequisite, and should have no fixed boundaries. The three features are interrelated. The hypernetwork model [2] is the model framework used for the description. No other conventional model framework presents the three properties related to multiple facets. No restrictions exist on what an element represents. It can be a concept described with terms, an image, or an abstract or fuzzy entity.
Concept processing and image processing are not mechanisms independent of the conventional treatments of tacit and explicit knowledge, but should be interpreted as a new viewpoint or facet for understanding human creative activities.
Acknowledgments. This research was supported by the JSPS KAKENHI Grant Number 20H04287 (T.M.).
References
1. Damasio, A.: Descartes’ Error: Emotion, Reason, and the Human Brain. Grosset/Putnam (1994)
2. Maeshiro, T.: Framework based on relationship to describe non-hierarchical, boundaryless and multi-perspective phenomena. SICE J. Control Measur. Syst. Integr. 11, 381–389 (2019)
3. Maeshiro, T.: Proposal of wisdom science. In: Yamamoto, S., Mori, H. (eds.) HCII 2021. LNCS, vol. 12766, pp. 406–418. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78361-7_31
4. Maeshiro, T., Ozawa, Y., Maeshiro, M.: Wisdom science of image conceptualization. Human Interface and the Management of Information: Visual and Information Design, pp. 22–34 (2022). https://doi.org/10.1007/978-3-031-06424-1_3
5. Nonaka, I.: A dynamic theory of organizational knowledge creation. Organ. Sci. 5, 14–37 (1994)
6. Polanyi, M.: Genius in science. Encounter 38, 43–50 (1972)
Survey on the Auditory Feelings of Strangeness While Listening to Music
Ryota Matsui1,2(B), Yutaka Yanagisawa1, Yoshinari Takegawa2, and Keiji Hirata2
1 MPLUSPLUS Co., Ltd., Shinagawa, Tokyo, Japan [email protected]
2 Future University Hakodate, Hakodate, Hokkaido, Japan [email protected]
http://www.mplpl.com/
Abstract. In this study, we investigated the “feeling of strangeness” experienced while listening to music. The purpose of the study is to determine which of the three elements of music (“pitch”, “rhythm”, and “volume”) is most strongly related to this feeling. We conducted a music listening experiment with 90 members of the general public and 190 university students who were not music majors. In the experiment, we used sound sources in which either the pitch, rhythm or volume was deliberately varied by a random programme. The results showed that, in music with a fast tempo and a well-known melody, the strongest factor influencing the “feeling of strangeness” was rhythm. Rhythm, volume, and pitch, in that order, had a significant effect on the sense of strangeness, and significant differences were identified between all factors. In addition, pitch, volume and rhythm, in that order, had a strong influence on the “feeling of strangeness” in slow-tempo pieces. These results can serve as a threshold criterion for how humans judge whether music is good or bad, and can be applied to various applications and deep learning in the future.
Keywords: Feeling of strangeness · Music Listening · Individual differences
1 Introduction
1.1 Background
Definition of “Feeling of Strangeness”. Our research group investigates the “feeling of strangeness” that humans feel when there is noise or missing information in the visual and auditory information presented to them.
Supported by MPLUSPLUS Co., Ltd.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 95–106, 2023. https://doi.org/10.1007/978-3-031-35132-7_7
Fig. 1. Differences in appearance due to missing parts in LED displays
In performing arts such as dance and instrumental music, performers must give a one-off performance on stage. Even professional performers who are accustomed to the stage may make mistakes or experience some kind of abnormality during a performance. These may be caused by the performer’s own mistakes or by malfunctions in the stage equipment. On the other hand, if the audience is unaware of an anomaly, the performance can be said to have been successful. The information about the performance obtained by the audience can be broadly divided into visual and auditory information. Examples of stage equipment for visual information include LED displays consisting of multiple LEDs [1,2,6]. When some of the LEDs in such a display fail, part of the displayed image goes missing. How noticeable this is depends on the shape of the missing part and the shape of the display. Depending on the shape of the missing part and the content of the displayed image, it may be difficult to perceive a “feeling of strangeness” even if part of the image is missing. A concrete example is shown in Fig. 1. In the block wiring example in the figure, the audience can easily notice that LEDs are defective. In the spread wiring example, however, even though the same number of LEDs are missing, it is difficult for the audience to notice that the LEDs are faulty. Accordingly, this research defines the “feeling of strangeness” as the feeling that something is wrong even if it is not clear what exactly is missing.
Feeling of Strangeness Regarding Auditory Information. A similar “feeling of strangeness” exists in auditory information. When there is a mistake in a performance, the audience feels that the performance “sounds strange” and thinks that the performance is not good or that the music itself is strange. In recent years, as video-sharing websites have become more active, there are many opportunities to listen to live sound sources without going to the venue.
Because live sound sources are unedited or only minimally edited, mistakes made by the performer are often recorded as they are. In such cases, the
audience may experience a “feeling of strangeness” about the performance. The same feeling may arise not only in concerts but also in musical instrument lessons. Elucidating the causes of the “feeling of strangeness” felt by the audience would make it possible to apply the findings to various entertainment and learning systems. Few studies have been conducted on the “feeling of strangeness” in auditory information. In addition, the elements that constitute a performance have not been clearly defined. Therefore, in this study, pitch, rhythm and volume were defined as the “three elements that compose a performance” and were analysed as the factors that cause a “feeling of strangeness” when listening to a performance. The reasons for this definition are as follows. When a performance is handled as data, it is often handled as MIDI data, which is highly reproducible and easy to modify. The MIDI data parameters pitch, inter-onset interval and velocity correspond to pitch, rhythm and volume, respectively. They are fundamental among the MIDI data parameters and are particularly easy to modify. Therefore, in this experiment, pitch, rhythm and volume, which can be calculated from these parameters, are treated as the “three elements that compose a performance”. In addition, only a piano sound source is used for the performances. It has not been clarified which of the three elements has the strongest influence on the “feeling of strangeness”. This study is therefore a basic investigation of which of the three elements that compose a performance most strongly influences the “feeling of strangeness” when listening to a performance.
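To make the correspondence between MIDI parameters and the three elements concrete, the following Python sketch derives pitch, rhythm and volume sequences from a list of note events. The `Note` record, its field names and the sample values are hypothetical and chosen only for illustration; rhythm is represented here by inter-onset intervals, as in the text above.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int      # MIDI note number (60 = middle C)
    onset: float    # note-on time in seconds (hypothetical choice of unit)
    velocity: int   # MIDI velocity, 1-127


def three_elements(notes):
    """Split note events into pitch, rhythm (inter-onset intervals) and volume."""
    ordered = sorted(notes, key=lambda n: n.onset)
    pitches = [n.pitch for n in ordered]
    # Inter-onset interval: time between the onsets of consecutive notes.
    iois = [b.onset - a.onset for a, b in zip(ordered, ordered[1:])]
    velocities = [n.velocity for n in ordered]
    return pitches, iois, velocities


# Hypothetical three-note fragment.
notes = [Note(60, 0.0, 64), Note(62, 0.5, 70), Note(64, 1.0, 80)]
pitches, iois, velocities = three_elements(notes)
print(pitches, iois, velocities)
```

Keeping the three sequences separate mirrors the experimental design, in which exactly one of them is altered at a time.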
2 Related Research
2.1 Visual Feeling of Strangeness
Although no studies have been conducted on the “feeling of strangeness” during performance listening, several studies have been conducted on the visual “feeling of strangeness”. Takehara et al. [8] investigated the impression of facial “feeling of strangeness” according to the ratio of black eyes to white eyes. Yanagisawa et al. [5] produced a stage costume using LEDs that would not cause a “feeling of strangeness” even in the event of a malfunction. This stage costume is designed in such a way that even if some of the components made up of LEDs break, the audience is unlikely to notice anything strange in the performance. Kubota et al. [4] quantitatively calculated the “feeling of strangeness” in LED panel failure patterns for each feature. Fujimoto et al. [3] assigned the output of sound effects and background music to dance steps, constructing a system in which, when some error occurs, sound effects and background music are played at a timing that matches the rhythm of the dance. This gives the audience the illusion that the performance is progressing without problems, as they are unaware of any errors.
Fig. 2. Pre-questionnaire response results
Fig. 3. Percentage of correct answers per song in the composer quiz
These approaches conform to the definition of “apparent dependability” proposed by Terada et al. [7]. In other words, even if a performance does not go according to plan due to malfunctions or problems on stage, the performance can be said to have been a success if the audience is unaware of this fact. We hypothesise that this also applies to music and performances. By investigating the causes of the “feeling of strangeness” when listening to a performance, we expect to find applications in various situations, such as performance and learning systems.
2.2 Automatic Evaluation of Music
The evaluation of music and instrumental performances is an extremely subjective task, and it is difficult to evaluate “good performances” or the “feeling of strangeness” using objective indices. In research fields such as automatic music generation, it is often necessary to ask experts such as composers and performers to evaluate the generated music. On the other hand, research on music performance evaluation, known as MPA (Music Performance Analysis), has been conducted in the past. MPA is a research field that aims to measure, understand and model the effects of changes in performance expression on human listeners [10]. Research on MPA has included the analysis of data from piano rolls, MIDI performance devices and acoustic signals [9,11–13]. In recent years, research has also been carried out to evaluate music using deep learning [14]. However, these studies have not considered which elements of a performance most strongly affect listeners when they listen to a performance by a human performer. When some kind of abnormality or mistake occurs in a performance, the listener feels that “something is wrong” even if the cause of the mistake is not known. Therefore, this study aims to investigate the “feeling of strangeness” caused by performance errors and to elucidate which factors cause listeners to experience it.
Fig. 4. Example of the point of change and the amount when an error is made in the pitch
Fig. 5. Example of the point of change and the amount when an error is made in the rhythm
Fig. 6. Example of the point of change and the amount when an error is made in the volume
3 Experiment
An experiment was conducted to investigate which of the three elements that make up a performance (pitch, rhythm and volume) is most strongly associated with the “feeling of strangeness” when listening to a piano performance.
3.1 Sound Sources Used
The sound source used in the experiment was MIDI data recorded from a pianist’s actual performance on a piano. The data consist of information about the performance. Control information such as timbre (type of instrument) is also included, but the piano does not change timbre in the middle of a performance. To prevent the playback from sounding mechanical, a YAMAHA trans-acoustic piano (TA2) was used as the sound source when playing back the MIDI data.
3.2 Subjects
The subjects of the experiment were 190 male and female university students aged 20–24 years who were not music majors. A pre-questionnaire survey was conducted to investigate the extent to which they were familiar with music on a daily basis. In the questionnaire, the subjects were asked about their “experience of practising musical instruments” and “playing musical instruments”, and were asked to “guess the composer of piano pieces”. The results of the questionnaire on the experience of practising a musical instrument are shown in Fig. 2. The percentage of correct answers in the quiz on guessing the composer of each piano piece is shown in Fig. 3, and the list of pieces used is shown in Table 1. The Beethoven and Mozart pieces, which had relatively high percentages of correct answers, are often used in TV programmes. Therefore, it can be assumed that the subjects were familiar with these pieces.
Table 1. List of music used in the experiment
Music A  Chopin op.63-1
Music B  Mozart K.545
Music C  Haydn Hob.XVI:11
Music D  Bach BWV.847
Music E  Beethoven op.27-2
Music F  Ravel Pavane pour une infante défunte
3.3 Experimental Procedure
We prepared MIDI data of six piano performances of approximately 20 to 30 s each for the subjects to listen to. In this study, it is assumed that the subjects listen to actual performance data rather than recorded music. In actual performances such as live performances, performance errors occur unintentionally and at unexpected points. For this reason, we used MIDI data, which represent the performance itself and can be freely changed, rather than audio recordings of the performance. For each performance, we created data in which one of the following three parameters was intentionally changed: pitch, rhythm (time of sound) and volume (velocity). The pitch, rhythm and volume of the music and the amount of change were randomly selected by the programme for all pieces. We built a web page for the experiment and asked the subjects to start the experiment individually by accessing the page from their own personal terminals. The subjects accessed the experimental page from either a smartphone or a PC. The subjects listened to two performances, one with an abnormality in “volume” and one with an abnormality in “pitch”, and were asked to answer the question “Please choose the performance that you felt more strange with”. Similarly, they were asked to answer the same question for “rhythm” and “pitch”, and for “volume” and “rhythm”. A list of the piano pieces the subjects listened to, six in all, is shown in Table 1. These pieces ranged from well-known pieces to minor pieces that were unknown to the pianists who performed them. The subjects were instructed to listen to the experimental sound source in a quiet environment. The subjects were equally divided into two groups, and each group was asked to listen to six performances of each piece according to the combinations of performances containing errors shown in Tables 2 and 3.
Table 2. Music used in the experiment and combinations of errors occurring: 1
         Pitch  Rhythm  Volume
Music A
Music B
Music C
Music D
Music E
Music F
Table 3. Music used in the experiment and combinations of errors occurring: 2
         Pitch  Rhythm  Volume
Music A
Music B
Music C
Music D
Music E
Music F
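As a rough illustration of the data preparation described in this section, the following Python sketch applies exactly one alteration to a list of note events, using the change amounts stated in Sect. 3.4. The authors' actual programme is not given in the paper, so everything here is an assumption: the `Note` record and `inject_error` function are hypothetical names, “shifted backwards by 50%” is read as delaying the onset by half of the preceding inter-onset interval, a pitch change of 1 is read as one MIDI note number, and the velocity change is applied to a single note rather than to a range.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    pitch: int      # MIDI note number
    onset: float    # note-on time in seconds
    velocity: int   # MIDI velocity, 1-127


def inject_error(notes, kind, rng):
    """Return a copy of notes with one randomly chosen note altered, and its index."""
    # Index 0 is skipped so that a preceding inter-onset interval always exists.
    i = rng.randrange(1, len(notes))
    note = notes[i]
    if kind == "pitch":
        changed = replace(note, pitch=note.pitch - 1)
    elif kind == "rhythm":
        ioi = note.onset - notes[i - 1].onset
        changed = replace(note, onset=note.onset + 0.5 * ioi)
    elif kind == "volume":
        factor = rng.choice((0.5, 1.5))  # minus or plus 50%
        scaled = round(note.velocity * factor)
        changed = replace(note, velocity=max(1, min(127, scaled)))
    else:
        raise ValueError(f"unknown error kind: {kind}")
    return notes[:i] + [changed] + notes[i + 1:], i


# Hypothetical four-note fragment with a fixed seed for repeatability.
notes = [Note(60, 0.0, 64), Note(62, 0.5, 64), Note(64, 1.0, 64), Note(65, 1.5, 64)]
altered, index = inject_error(notes, "rhythm", random.Random(0))
print(index, altered[index])
```

Passing the random generator in explicitly keeps each stimulus reproducible, which matters when two groups of subjects must hear the same altered performances.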
3.4 Performance and Feeling of Strangeness Listened to by the Subjects
Each performance listened to by the subjects was between 20 and 30 s long, approximately the length of a phrase in a piece of music. Playback started from the first bar of the piece. Each performance contained an abnormality in pitch, rhythm or volume. Only one abnormality, of one type, occurred per performance, and the point of the error was selected at random. The locations and amounts of change when an error is introduced into a piece of music are shown in Figs. 4, 5 and 6. The amount of change for each element is as follows.
– Pitch: The pitch value of the note in question is changed downwards by 1
– Rhythm: The note-on time is shifted backwards by 50%
– Volume: The velocity value of the range is shifted by plus or minus 50%
3.5 Results of Experiment
The results of the experiment are shown in Figs. 7, 8 and 9. In music B, significantly more subjects felt strange with the rhythm when comparing pitch and rhythm (P = 2.24 × 10⁻⁷). Similarly, in music B, significantly more subjects felt strange with the rhythm when comparing rhythm and volume (P = 3.1 × 10⁻⁶).
Fig. 7. Comparative results for pitch and rhythm
Fig. 8. Comparative results for rhythm and volume
Fig. 9. Comparative results for pitch and volume
Table 4. Music used in the additional experiment and combinations of errors occurring
         Pitch  Rhythm  Volume
Music B
Music C
Music F
When comparing pitch and volume, significantly more subjects felt strangeness with pitch in music C (P = 0.032) and music F (P = 0.011).
Additional Experiment. We also carried out an additional experiment with 90 subjects from the general public on the pieces for which these significant differences were found, using combinations other than those in Tables 2 and 3. The comparisons made in the additional experiment are shown in Table 4. Significantly more subjects felt strangeness with the volume of music B. Significantly more subjects felt strangeness with the volume of music F. On the other hand, there was no difference between the pitch and rhythm of music C. Summarising the results of all the experiments, it can be asserted that, in music B, errors in rhythm, volume and pitch, in that order, are more likely to be perceived as a “feeling of strangeness”. Likewise, in music F, errors in pitch, volume and rhythm, in that order, are more likely to be perceived as a “feeling of strangeness”.
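The paper does not state which statistical test produced the P values reported in this section. As one plausible way to analyse a forced choice between two altered performances, the sketch below computes an exact two-sided binomial test against the null hypothesis that both performances are chosen equally often. The function name and the counts in the usage example are hypothetical and are not taken from the experiment.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k choices out of n under a 50/50 null hypothesis."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    threshold = probs[k] * (1 + 1e-9)
    return min(1.0, sum(p for p in probs if p <= threshold))


# Hypothetical counts: 70 of 95 listeners choose the rhythm error over the pitch error.
p_value = binomial_two_sided_p(70, 95)
print(f"p = {p_value:.3g}")
```

An exact test is used here rather than a normal approximation because the group sizes in a listening study of this kind are modest.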
4 Consideration
No correlation was found between the subjects’ sense of strangeness towards the music and their experience of playing an instrument as reported in the pre-questionnaire. Thus, subjects can perceive the strangeness occurring in a piece of music regardless of their experience of playing a musical instrument. The results of the experiment showed clear significant differences, especially for music B and F. Music B and F are both relatively well known in Japan and are often heard on television and as background music. From these commonalities, it can be argued that “well-known pieces are more likely to cause a feeling of strangeness when errors occur in the performance”. The tempo of music B was relatively fast (BPM 140) and that of music F was relatively slow (BPM 60). Summarising these results, the following conclusions can be argued.
– With regard to the “feeling of strangeness”, the influence of rhythm is greater for fast-tempo pieces, while the influence of pitch is smaller.
– With regard to the “feeling of strangeness”, the influence of pitch is greater for slow-tempo pieces, while the influence of rhythm is smaller.
In this experiment, there are no quantitative evaluation criteria for the performance, so it is reasonable to make an evaluation based on the subjectivity of the individual. On the other hand, the relationship between each piece of music and the feeling of strangeness has not yet been analysed in detail or fully verified. It is necessary to examine the differences when the same piece of music is played in different keys, major or minor. After carrying out these analyses, we aim to apply the results to applications that assist humans in listening to music.
5 Conclusion
In this study, we investigated the “feeling of strangeness” experienced while listening to music. The purpose of the study is to determine which of the three elements of music (“pitch”, “rhythm”, and “volume”) is most strongly related to this feeling. We conducted a music listening experiment with 90 members of the general public and 190 university students who were not music majors. In the experiment, we used sound sources in which either the pitch, rhythm or volume was deliberately varied by a random programme. The results showed that, in music with a fast tempo and a well-known melody, the strongest factor influencing the “feeling of strangeness” was rhythm. Rhythm, volume, and pitch, in that order, had a significant effect on the sense of strangeness, and significant differences were identified between all factors. In addition, pitch, volume and rhythm, in that order, had a strong influence on the “feeling of strangeness” in slow-tempo pieces.
Acknowledgment. This work was supported by JST CREST Grant Number JPMJCR18A3, Japan.
References
1. MPLUSPLUS Co., Ltd.: LED VISION FLAG. http://www.mplpl.com/project/534/. Accessed Jan 2023
2. Fujimoto, M., Fujita, N., Terada, T., Tsukamoto, M.: Lighting Choreographer: design and implementation of a wearable LED performance system. J. Trans. Virtual Real. Soc. Japan (in Japanese) 16(3), 517–525 (2011)
3. Fujimoto, M., Fujita, N., Takegawa, Y., Terada, T., Tsukamoto, M.: Design and implementation of a wearable dancing musical instrument. J. Inf. Process (in Japanese) 50(12), 2900–2909 (2009)
4. Kubota, S., Terada, T., Yanagisawa, Y., Tsukamoto, M.: Investigation of quantification of discomfort due to failure in LED performance. Technical Reports of Information Processing Society Japan, Vol. 2019-HCI-183, No. 13, pp. 1–7 (2019)
5. Yanagisawa, Y., Fujimoto, M.: Dependability for LED suits system as a wearable performance device. Technical Reports of Information Processing Society Japan (in Japanese), Vol. 2017-GN-100, No. 18, pp. 1–7 (2017)
6. Yanagisawa, Y., Ono, K., Ueda, K., Izuta, R., Yoshiike, T., Fujimoto, M.: An experiment of LED lighting system in synchronization with live-streaming video. Technical Reports of Information Processing Society Japan (in Japanese), Vol. 2021-GN-112, No. 19, pp. 1–7 (2021)
7. Terada, T.: Apparent dependability: a proposal of a new evaluation axis for wearable and ubiquitous entertainment systems. In: Information Processing Society of Japan Symposium Series Maruichi Media, Multimedia, Distributed, Cooperative, and Mobile Symposium (DICOMO2010), pp. 1962–1967 (2010)
8. Takehara, T., Tanijiri, T.: Interaction effect between the size of iris and the shape of eyelid for facial attractiveness. J. Trans. Japan Soc. Kansei Eng. 14(4), 491–495 (2015)
9. Clarke, E.: Understanding the psychology of performance. In: Rink, J. (ed.) Musical Performance: A Guide to Understanding, pp. 59–72. Cambridge University Press, Cambridge (2002)
10. Lerch, A.: Software-based extraction of objective parameters from music performances. Ph.D. Thesis, Technical University of Berlin (2008)
11. Palmer, C.: Mapping musical thought to musical performance. J. Exp. Psychol. Hum. Percept. Perform. 15(2), 331–346 (1989)
12. Repp, B.H.: Patterns of note onset asynchronies in expressive piano performance. J. Acoust. Soc. Am. 100, 3917–3932 (1996)
13. Dixon, S., Goebl, W.: Pinpointing the beat: Tapping to expressive performances. In: Proceedings of the International Conference on Music Perception and Cognition (ICMPC), pp. 617–620 (2002)
14. Pati, K., Gururani, S., Lerch, A.: Assessment of student music performances using deep neural networks. J. Appl. Sci. 8(4), 507–525 (2018)
Text Reconstructing System of Editorial Text Based on Reader’s Comprehension
Yuki Okaniwa1 and Tomoko Kojiri2(B)
1 Graduate School of Science and Engineering, Kansai University, Suita, Japan
2 Faculty of Engineering Science, Kansai University, Suita, Japan
[email protected]
Abstract. In writing an editorial, an author constructs a logical structure to form an argument. A reader, meanwhile, determines the topics presented in that text and the relationships between them to understand that logical structure and arrive at the argument. However, if the reader cannot correctly read the logical structure, the text that can be reconstructed from the logical structure understood by the reader will differ from the original editorial text. We have developed a system that helps readers recognize their own errors in understanding logical structure by reconstructing the text based on the understood logical structure. In this paper, we provide an overview of the system and discuss the results of an experiment that we performed to evaluate the effectiveness of the system.
Keywords: Reading support system · Visualization · Error recognition support
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 107–121, 2023. https://doi.org/10.1007/978-3-031-35132-7_8
1 Introduction
An editorial is an article that logically expresses the author’s ideas or opinions with the aim of making the reader understand a certain argument. Reading proficiency is needed to understand editorial text. However, according to an international survey on scholastic ability conducted by the Organisation for Economic Co-operation and Development (OECD) in 2018, reading proficiency is declining in a number of countries around the world, including Australia, Korea, and New Zealand [1]. Japan is one of these countries as well, so there is a need to support reading comprehension so that editorial text, which requires reading proficiency, can be understood. This study supports reading-comprehension activities to improve readers’ ability to understand editorial text.
Past research has aimed to support readers in comprehending important points in text. Fukunaga et al. constructed a system that has the learner underline those parts of a text thought to indicate important topics and that compares those choices with correct data to provide feedback [2]. Although extracting important points from an editorial text enables a reader to understand candidates for the author’s argument and its logical elements, it does not enable understanding of logical structure. In this regard, there is research that provides an environment for organizing logical structure to support the reader in understanding it. Mochizuki et al. provide an environment that enables the reader to underline and extract important elements in a text and organize the logical relationships among those text elements in the form of a concept map [3]. That study, however, focuses on having the reader organize a logical structure and does not provide a means of determining whether the concept map organized by the reader is correct. As a result, the reader cannot necessarily understand the actual logical structure.
In research that provides feedback on comprehension, there is an approach that brings an error to the learner’s attention by converting the learner’s understanding into a representation that makes errors noticeable. Horiguchi et al. have proposed a technique called Error-Based Simulation (EBS) targeting mechanics in physics that visualizes the motion of an object based on an equation formulated by the learner to help the learner recognize an error in that formulation [4]. This system analyzes learners’ error patterns and prepares simulations based on those patterns beforehand. In reading comprehension, on the other hand, the granularity at which topics are grasped differs from one person to the next, so the logical structure will differ depending on whether the same text is viewed as consisting of one topic or multiple topics. This makes it impossible to prepare error patterns in logical structure beforehand.
Thus, in this study, instead of defining errors, we dynamically generate and present a visualization that helps readers recognize their errors in logical structure based on their understanding of the logical structure of the text. Since an editorial is something that turns logical structure into text, it is possible to generate editorial text that expresses a logical structure. If the reader incorrectly interprets the logical structure, the text that can be reconstructed from that misunderstood logical structure will differ from the original editorial text.
The existence of errors can therefore be brought to the reader’s attention by reconstructing and presenting text based on the logical structure understood by the reader. At the same time, there are people who cannot correct their errors even if made aware of them. Aikawa et al. defined as an impasse the state in which the learner cannot arrive at the correct answer even when made aware of errors by EBS, and proposed a technique that resolves those errors by presenting simple (auxiliary) problems based on the learner’s error trends and having the learner solve those problems [5]. That study targeted the field of physics, as did the study of Horiguchi et al., and since it could define error trends beforehand, it could prepare problems according to those trends. However, errors in comprehending editorial text can be made in diverse ways, so errors in this case cannot be defined beforehand and problems cannot be prepared in the above way. In our study, we treat the ability to restore the original text as a condition for a correct logical structure and reconstruct text from the reader’s logical structure at various levels of granularity to check whether certain constraints are satisfied. Here, if the sequence of sentences for a selected part of the logical structure differs from the original text, that part is identified as a candidate for an error location. The reader is then supported in correcting the logical structure: from the variety of changes that could be made to the detected error-location candidate, those that would satisfy the constraints are presented as correction candidates.
2 Logical Structure Expression
We here define logical structure as targeted by this study. Minto proposed a pyramid structure for structurally organizing a conclusion and the grounds for that conclusion [6]. A pyramid structure is expressed as a tree structure that organizes logic by placing the conclusion at the apex and arranging the topics that support that conclusion in a pyramid shape. The parent-child relationship of topics in a pyramid structure features two types of relationships: one consisting of cause and effect in a causal relationship (Why-relation) and the other consisting of generalization and specialization (How-relation). In editorial text, these two types of relationships appear in a mixed manner, and each of them must be recognized to understand the logic in that text. In this study, we target understanding of these two types of relationships. A pyramid structure is expressed in terms of a plane, and one pyramid structure can only express one type of relationship. For this reason, we propose a How-Why structure as a technique for expressing an understood logical structure. This technique extends the pyramid structure to three dimensions consisting of two planes, each corresponding to one of the above types of relationships. The How-Why structure is shown in Fig. 1. In the figure, a node represents a topic within editorial text. It may consist of one or more sentences. The tree structure expressed on a horizontal plane shows the relationships among topics expressed as Why-relations, and the tree structure expressed on a vertical plane shows How-relations.
Fig. 1. How-Why structure
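As a minimal sketch of how a How-Why structure might be held in memory, the Python example below gives each topic two separate child lists, one per plane. The `Topic` class, its field names and the sample sentences are hypothetical and are not taken from the system described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A node of the How-Why structure: one topic made of one or more sentences."""
    label: str
    sentences: list[str]
    why_children: list["Topic"] = field(default_factory=list)  # grounds (Why plane)
    how_children: list["Topic"] = field(default_factory=list)  # specializations (How plane)


def relations(root: Topic) -> list[tuple[str, str, str]]:
    """List (parent, relation, child) triples by walking both planes from the root."""
    triples = []
    stack = [root]
    while stack:
        node = stack.pop()
        for kind, children in (("Why", node.why_children), ("How", node.how_children)):
            for child in children:
                triples.append((node.label, kind, child.label))
                stack.append(child)
    return triples


# Hypothetical three-topic example.
conclusion = Topic("A", ["Reading support is needed."])
ground = Topic("B", ["Reading proficiency is declining."])
example = Topic("C", ["Scores fell in one survey."])
conclusion.why_children.append(ground)
ground.how_children.append(example)
print(relations(conclusion))
```

Keeping the two child lists apart is what distinguishes this from a single pyramid structure, which can express only one relation type per tree.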
3 Support System for Understanding Logical Structure
Text in an editorial describes the topics that make up its logic in the order conveyed by the logical structure. There are also times when inserting a conjunction between sentences makes that logical structure easier to understand. Similarly, the logical structure created by the reader can be used to order the individual sentences that make up each topic into a single text, or to insert conjunctions between adjacent sentences to convey that logical structure. If the reader understands the correct logical structure, the text that can be created from the understood logical structure will be the same as the original editorial text. If the text that can be created from the understood logical structure cannot be expressed as the original editorial text, that logical structure can be recognized as erroneous. In this study, we help the reader recognize errors in understanding the logical structure of editorial text by generating and presenting the text that can be generated from the logical structure understood by the reader.
Moreover, some readers cannot make corrections even when the reconstructed text makes them aware of errors in their understanding of the logical structure. These readers need support that specifies the errors and how to correct them. Based on the fact that the correct logical structure is the one for which the generated text is the same as the original editorial, we propose a support system for readers who cannot correct their understood logical structure: it identifies error locations and correction methods and presents them as candidates to aid correction.
The system configuration is shown in Fig. 2. To reconstruct text from the learner’s logical structure and identify error locations and correction methods, an environment is needed for determining the logical structure that the learner understands from the editorial text. The logical-structure input interface displays an editorial selected from those stored beforehand in an editorial database. It also provides an interface that allows the learner to input understood topics and the relationships among them. The topics and links input by the learner are saved in the topic database and link database.
The main interface displays the How-Why structure between links input by the learner in a three-dimensional format while also displaying candidates for error locations and correction methods identified by the error-location recognition support function and error-location correction support function, respectively, on the logical structure. The reconstructed-text display interface, in turn, presents reconstructed text generated by the text reconstruction function based on the logical structure input by the learner.
Fig. 2. System configuration.
4 Text Reconstruction Technique

4.1 Reconstructed Text

The following two features of text express the logical relationship between topics.

• Order of appearance of sentences
• Conjunctions inserted between sentences

For example, let conclusion A and grounds B and C exist in a Why-relation. When creating text from this structure, we can consider the following sequences: A → B → C, which describes the grounds after the conclusion, and B → C → A, which derives the conclusion from the grounds. However, writing B → A → C, which inserts the conclusion between the grounds, is almost never done. On the other hand, certain conjunctions can express relationships, such as “so” inserted between sentences to lead from the grounds to a conclusion, or “for example” inserted between sentences to lead from the general to the specific. In addition, consecutive sentences provide an opportunity to check whether the relationships understood by the reader can validly be expressed with conjunctions. With the above in mind, we perform two types of reconstruction based on the logical structure understood by the reader: rearranging the order of sentences and inserting conjunctions between the original sentences.

4.2 Reconstruction of Sentence Order

The sequence of sentences that can convey a logical relationship depends on that relationship. We therefore define sequences of topics that can express logical structure for both the Why-relation and the How-relation. Then, based on these defined sequences, we present the reader with a rearrangement of topics as the reconstructed result. There may be more than one order of topics that can express a given relationship. In such a case, we assume that reader error is minimal and adopt the sequence closest to that of the actual editorial text. This reconstruction technique can be explained as follows, taking the Why-relation as an example.
Methods for describing a causal relationship include “first describe the conclusion and then provide an explanation or evidence, i.e., grounds for that conclusion,” “first give an explanation or provide evidence and describe the conclusion at the end,” and “describe the conclusion, describe the grounds, and describe the conclusion again.” Based on the above, we define three types of sequences that can express a Why-relation: “conclusion → grounds,” “grounds → conclusion,” and “conclusion → grounds → conclusion.” However, since conclusion appears twice in the sequence “conclusion → grounds → conclusion,” it can be applied only to the case in which there are two or more sentences having the same meaning, i.e., expressing the conclusion.
We here give an example of sentence reconstruction using this definition. For the Why-relation structure shown in Fig. 3, each of the topics shown is assumed to consist of one sentence. At this time, if we take A → C to be the grounds for B, we can consider A → C → B and B → A → C as two possible sequences. If the sequence in the original text happens to be A → B → C, the sequence A → C → B closest to this will be adopted.
Fig. 3. Example of a Why-relation structure.
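The selection among admissible Why-relation sequences can be sketched in Python as follows. The text does not state how "closest" is measured, so this sketch assumes one possible measure: fewest pairwise order inversions relative to the original text, with ties broken in favor of the longer shared opening run. The function names are hypothetical, and grounds are kept in their original relative order.

```python
from itertools import combinations

def inversions(candidate, original):
    """Count sentence pairs whose relative order differs from the original."""
    pos = {s: i for i, s in enumerate(original)}
    return sum(1 for x, y in combinations(candidate, 2) if pos[x] > pos[y])

def common_prefix(candidate, original):
    """Length of the opening run shared by candidate and original."""
    n = 0
    for c, o in zip(candidate, original):
        if c != o:
            break
        n += 1
    return n

def why_candidates(conclusion, grounds, original):
    """Admissible sequences: conclusion -> grounds and grounds -> conclusion."""
    pos = {s: i for i, s in enumerate(original)}
    ordered = sorted(grounds, key=pos.__getitem__)
    return [[conclusion] + ordered, ordered + [conclusion]]

def closest_sequence(conclusion, grounds, original):
    """Pick the admissible sequence closest to the original (assumed measure)."""
    return min(
        why_candidates(conclusion, grounds, original),
        key=lambda c: (inversions(c, original), -common_prefix(c, original)),
    )

# Example from the text: A and C are grounds for B; the original order is A, B, C.
print(closest_sequence("B", ["A", "C"], ["A", "B", "C"]))  # ['A', 'C', 'B']
```

Under this assumed measure both candidates have one inversion, and the shared opening sentence A decides in favor of A → C → B, which agrees with the example. Other distance measures could be substituted in the `key` function.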
In the same way, we define sequences that can express a How-relation and use them as a basis for rearranging the order of sentences in a manner closest to the editorial text.

4.3 Reconstruction by Inserting Conjunctions

The type of relationship between topics determines the conjunction to be used. We therefore define conjunctions that express relationships and insert a defined conjunction whenever a relationship exists in the understood logical structure between adjacent sentences in the original editorial text. These conjunctions are defined as follows.

• Why relationship
  – Conclusion → grounds: “because”
  – Grounds → conclusion: “so”
• How relationship
  – Specific → general: “in this way”
  – General → specific: “for example”

As an example of inserting conjunctions, let’s assume that sentences A, B, and C exist in the editorial text in that order. Now, the reader understands B to be the grounds for A and C to be a specific example of B. In this case, conjunctions are added to the original editorial text in the manner of “A because B, for example, C.”
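The conjunction table above can be applied to adjacent sentences as in the following minimal Python sketch. The encoding of links as `(upper, lower)` pairs, where the upper topic is the conclusion or the general statement, is an assumption made for illustration, as is the function name.

```python
# Conjunctions from the definition above, keyed by (relation, reading direction).
CONJUNCTIONS = {
    ("why", "down"): "because",      # conclusion -> grounds
    ("why", "up"): "so",             # grounds -> conclusion
    ("how", "up"): "in this way",    # specific -> general
    ("how", "down"): "for example",  # general -> specific
}

def insert_conjunctions(sentences, links):
    """Pair each sentence with the conjunction that should precede it, or None.

    links maps (upper, lower) -> "why" or "how"; this encoding is an
    assumption of the sketch, not the system's data model.
    """
    result = []
    previous = None
    for current in sentences:
        conjunction = None
        if previous is not None:
            if (previous, current) in links:
                conjunction = CONJUNCTIONS[(links[(previous, current)], "down")]
            elif (current, previous) in links:
                conjunction = CONJUNCTIONS[(links[(current, previous)], "up")]
        result.append((conjunction, current))
        previous = current
    return result

# Example from the text: B is the grounds for A, and C is a specific example of B.
print(insert_conjunctions(["A", "B", "C"], {("A", "B"): "why", ("B", "C"): "how"}))
# [(None, 'A'), ('because', 'B'), ('for example', 'C')]
```

Only adjacent sentences are examined, so a relationship between non-adjacent sentences produces no conjunction, in line with the definition above.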
5 Logical-Structure Correction Support Technique

Correcting a logical structure requires identifying the location of an error in one’s own logical structure and changing that location to the appropriate structure. However, a reader who cannot correct a logical structure cannot do either of the above. Thus, if the reader does not understand where the errors are located, the system presents the locations that prevent reconstruction of the original editorial text. If the reader understands the error locations but does not know how to correct them, the system presents the topics to which a recognized error location can be connected so that the original editorial text can be reconstructed.
5.1 Logical-Structure Error-Candidate Identification Technique

A structure from which the original text can be reconstructed is called the correct logical structure. This holds even for a portion of the logical structure. To identify errors, it is sufficient to reconstruct text from a small range of the logical structure and identify which range hinders the reconstruction of the original text. We therefore reconstruct sentence order starting from a subtree whose root is a lower node in the How-Why structure understood by the reader and, if the resulting sequence of sentences differs from the original text, treat that subtree as an error-location candidate. The following summarizes the algorithm for identifying error locations.

Step 1. Search for the deepest leaf node.
Step 2. Reconstruct text at the subtree comprising the parent node of the leaf node found in Step 1.
  i. If no inconsistencies are found with the original text, the algorithm groups this subtree into a single node and returns to Step 1.
  ii. If any inconsistencies are found in the sentence order, the algorithm detects this subtree as an error location and stops.
Step 3. Terminate the algorithm once the entire tree becomes a single node with no errors.

This algorithm reconstructs text starting from the smallest topic to check for the presence of errors. It identifies error locations by treating a location with no errors as one topic and repeating the reconstruction process. Furthermore, in the process of writing text, a Why-relation topic involved in logical expansion may be described together with a How-relation topic expressing a specific example, both with respect to the same topic. Consequently, when multiple nodes exist at the same depth, priority in applying the algorithm is given to the tree structure of the How-relation expressing the same topic.
We give an example of applying this error-location identification algorithm. In this example, we assume a logical structure with only a Why-relation expressed by the tree structure shown in Fig. 4. Letters within a node denote sentences encompassed by that topic, and within the editorial text, those sentences are described in alphabetical order. First, in Step 1, the algorithm detects node {D, E} from among those nodes at the deepest level. Next, in Step 2, the algorithm rearranges the parent subtree of that node as C–D–E using the sentence-order reconstruction technique. Since this order is the same as that in the original editorial text, no error is found between these sentences. Next, on returning to Step 1, the algorithm detects node F, and in Step 2, the algorithm rearranges the parent subtree of this node as B–F–G in which B and F appear in succession though they are not in consecutive positions in the original text. Since this outcome is not the same as the original editorial text, it is extracted as an error-location candidate.
Fig. 4. Example of a Why-relation structure having an error.
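The error-location algorithm of Sect. 5.1 can be sketched in Python as follows, restricted to Why-relations as in the example. Several points are assumptions: the tree is rebuilt from the description of the worked example rather than from the figure itself, a subtree is treated as consistent when "parent then children" or "children then parent" forms a run of consecutive sentences in the original text, and the How-relation priority rule is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    sentences: list                      # sentence IDs covered by this topic
    children: list = field(default_factory=list)

def deepest_leaf_parent(node, depth=0):
    """Return (depth, parent) of the first deepest leaf; parent is None for a lone leaf."""
    if not node.children:
        return depth, None
    best_depth, best_parent = -1, None
    for child in node.children:
        d, p = deepest_leaf_parent(child, depth + 1)
        if p is None:                    # child itself is the leaf
            p = node
        if d > best_depth:
            best_depth, best_parent = d, p
    return best_depth, best_parent

def is_run(seq, original):
    """True if seq occupies consecutive, ascending positions in the original."""
    pos = [original.index(s) for s in seq]
    return all(b - a == 1 for a, b in zip(pos, pos[1:]))

def find_error_location(root, original):
    """Steps 1-3: collapse consistent subtrees bottom-up; return the first bad one."""
    while root.children:
        _, parent = deepest_leaf_parent(root)                       # Step 1
        kids = sorted(parent.children, key=lambda c: original.index(c.sentences[0]))
        flat = [s for c in kids for s in c.sentences]
        candidates = [parent.sentences + flat, flat + parent.sentences]
        consistent = next((c for c in candidates if is_run(c, original)), None)
        if consistent is None:                                      # Step 2 ii
            return parent
        parent.sentences, parent.children = consistent, []          # Step 2 i
    return None                                                     # Step 3

original = list("ABCDEFG")
tree = Node(["A"], [
    Node(["C"], [Node(["D", "E"])]),
    Node(["B"], [Node(["F"]), Node(["G"])]),
])
error = find_error_location(tree, original)
print(error.sentences if error else None)  # ['B']
```

As in the worked example, the subtree C–D–E is collapsed first, and the subtree rooted at B is then reported because B and F are not consecutive in the original text. Note that the sketch modifies the tree in place while collapsing nodes.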
5.2 Correction-Candidate Identification Technique

A candidate for the topic that should be connected to a topic at an error location is one that yields a structure restoring the original text upon reconstruction. The system therefore presents, as correction candidates, the topics that should be connected to the topic at an error location specified by the user. The following summarizes the algorithm for identifying correction candidates.

Step 1. Connect the specified node to each node in order, starting from the root node, and reconstruct the subtree comprising the connected nodes.
  i. If no inconsistencies are found with the original text, the connected node is added as a correction candidate.
Step 2. Terminate the algorithm once all nodes have been searched.

This algorithm performs an exhaustive search of all nodes that can be connected to the specified node without inconsistencies. The following gives an example of applying this correction-candidate identification algorithm. In this example, we assume that the user has selected node F in Fig. 4 as an error location. In Step 1, the algorithm connects node F to node A and reconstructs sentence order based on the subtree of connected node A to get A–C–D–E–B–G–F. This is different from the original editorial text, so node A is not a correction candidate. Next, the algorithm connects node F to node C and reconstructs sentence order based on the subtree of connected node C to get C–D–E–F. This is the same order as found in the original editorial text, so node C is detected as a correction candidate for node F. Finally, searching all nodes in the same way identifies nodes C, {D, E}, and G as correction candidates.
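The exhaustive search of Sect. 5.2 can be sketched in Python as follows. The sketch makes the same assumptions as the error-location check: only Why-relations are modeled, and a subtree counts as consistent when "parent then children" or "children then parent" forms a run of consecutive sentences in the original text. The dictionary-based tree encoding, the topic name `DE` for node {D, E}, and all function names are hypothetical.

```python
def is_run(seq, original):
    """True if seq occupies consecutive, ascending positions in the original."""
    pos = [original.index(s) for s in seq]
    return all(b - a == 1 for a, b in zip(pos, pos[1:]))

def reconstruct(name, topics, parent, original):
    """Return the sentence order of name's subtree, or None if it is inconsistent."""
    parts = []
    for child in [c for c, p in parent.items() if p == name]:
        seq = reconstruct(child, topics, parent, original)
        if seq is None:
            return None
        parts.append(seq)
    parts.sort(key=lambda seq: original.index(seq[0]))
    flat = [s for seq in parts for s in seq]
    own = topics[name]
    for candidate in (own + flat, flat + own):
        if is_run(candidate, original):
            return candidate
    return None

def ancestors(name, parent):
    """Yield the chain of parents above name."""
    while name in parent:
        name = parent[name]
        yield name

def correction_candidates(target, topics, parent, original):
    """Try attaching target to every other node; keep those that reconstruct cleanly."""
    found = []
    for name in topics:
        # Skip the target itself and its descendants (attaching there would form a cycle).
        if name == target or target in ancestors(name, parent):
            continue
        trial = dict(parent)
        trial[target] = name
        if reconstruct(name, topics, trial, original) is not None:
            found.append(name)
    return found

original = list("ABCDEFG")
topics = {"A": ["A"], "B": ["B"], "C": ["C"], "DE": ["D", "E"], "F": ["F"], "G": ["G"]}
parent = {"B": "A", "C": "A", "DE": "C", "F": "B", "G": "B"}  # child -> parent
print(correction_candidates("F", topics, parent, original))  # ['C', 'DE', 'G']
```

With the tree inferred from the worked example, the sketch returns C, {D, E}, and G for node F, matching the candidates listed above, while A and the current parent B are rejected.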
6 Prototype System

The main interface is shown in Fig. 5. The How-Why structure display area displays the How-Why structure understood by the reader as a three-dimensional structure. Clicking on a topic selects it and displays the IDs of the sentences encompassed by that topic and a summary of the topic in the topic-details display area. Pushing the topic-creation button and the link-creation button displays the topic-creation interface and the link-creation interface, respectively. Pushing the reconstructed-text display button shows the reconstructed-text display interface. Pushing the error-candidate identification button displays error-location candidates in orange on the How-Why structure, and pushing the correction-candidate identification button while a topic is selected displays correction candidates likewise in orange.
Fig. 5. Main interface.
The topic-creation interface is shown in Fig. 6. In the editorial-text display area, the sentences in the editorial text are displayed with IDs assigned in the order of appearance. A topic can be created by selecting the sentences that belong to the topic, entering the summary sentence of the topic in the topic-sentence input area, and pushing the node-creation button. Created topics are displayed in the form of nodes in the topic-display area. The link-creation interface is shown in Fig. 7. The created topic nodes are displayed in the topic-display area. A link is created by selecting the nodes of the two topics and the type of the relation.
Fig. 6. Topic-creation interface.
Fig. 7. Link-creation interface.
The reconstructed-text display interface is shown in Fig. 8. The original editorial text is displayed in the editorial-text display area. Selecting reconstructed text desired for display in the reconstruction selection area displays that reconstructed text in the reconstructed-text display area. In a reconstruction of sentence order, sentence IDs are
displayed at the beginning of each sentence as shown in Fig. 9 (a) so that the reader can check sentence order by looking at those numerals. In the case of reconstruction by inserting conjunctions, any inserted conjunctions between topics are displayed in blue as shown in Fig. 9 (b) to make it easier for the reader to check.
Fig. 8. Reconstructed-text display interface.
Fig. 9. Example of displaying reconstructed text.
7 Evaluation Experiment

We conducted an experiment to evaluate the effectiveness of helping a reader recognize errors by text reconstruction and of correcting those errors as proposed in this study. We recruited a total of seven undergraduate and graduate students as subjects (denoted as A–G) and had them use the prototype system to assess their understanding of How-Why structures by the following procedure.

(1) Read the editorial text and attempt to understand the How-Why structure
(2) Correct the understood How-Why structure using the text reconstruction function
(3) Correct the How-Why structure using the error-location recognition support function and the error-location correction support function
(4) Respond to the questionnaire

The target text consisted of 52 sentences extracted from chapter 4 of Nihongokyo no Susume (Takao Suzuki) [7], a collection of essays on encouraging the teaching of the Japanese language that has been used as a textbook of contemporary writing for middle-school students. The questionnaire used in step (4) consisted of questions asking about the effectiveness of the system in helping to discover error locations in the reconstructed text and the effectiveness of each of the support functions. Questions and responses are listed in Table 1. Questions 1 and 2 asked about the effectiveness of reconstructed text with each offering three responses to choose from. Subjects who answered with response 2 or 3 to the question about sentence-order reconstruction were asked to respond to question 3 to give reasons why they could not notice errors. Similarly, subjects who answered with response 2 or 3 to the question about conjunction reconstruction were asked to respond to question 4 to give reasons why they could not notice errors. Questions 5 and 6 asked about the effectiveness of support functions with each offering three responses to choose from.
Subjects who answered with response 1 to the question about the error-location recognition support function were asked to respond to question 7 to give reasons why they felt it was effective. Additionally, those who answered with response 2 to the question were asked to describe why they felt it was not effective in question 9. Similarly, subjects who answered with response 1 to the question about the error-location correction support function were asked to respond to question 8 to give reasons why they felt it was effective, and those who answered with response 2 to the question were asked to describe why they felt it was not effective in question 10.

We here present the results of this experiment. On the whole, using this system to comprehend editorial text enabled all subjects A–G to eventually create How-Why structures with no errors. Questionnaire results on the effectiveness of reconstructed text are listed in Table 2. Here, 5 out of 7 subjects replied that “it was easy to notice errors” for either type of reconstruction. In addition, subject C commented “I realized that my sequence was strange when sentence order was reconstructed.” These results suggest that making the reader aware of something strange by presenting reconstructed text is effective in helping the reader notice errors in the understood How-Why structure.
Table 1. Questions and responses in questionnaire

| Questions | Responses |
|---|---|
| 1. Was it easy to notice errors from the reconstruction of sentence order? 2. Was it easy to notice errors from the reconstruction of conjunctions? | 1. It was easy 2. I could notice errors, but it was difficult 3. I could not |
| 3. Reasons why noticing errors from the reconstruction of sentence order was not easy (multiple selections OK) | 1. I didn’t know that numerals indicated the order of original text 2. I didn’t notice that the order of numerals was different 3. I didn’t know which part of the How-Why structure was indicated by a difference in sentence order 4. I didn’t know what to do 5. Other |
| 4. Reasons why noticing errors from the reconstruction of conjunctions was not easy (multiple selections OK) | 1. I didn’t notice any conjunctions 2. I didn’t notice any conjunction errors 3. I didn’t know that conjunctions indicate relationships between topics 4. I didn’t know which relationship was indicated by an erroneous conjunction 5. I didn’t know what to do 6. Other |
| 5. Was the error-location recognition support function helpful in structure correction? 6. Was the error-location correction support function helpful in structure correction? | 1. Yes 2. No 3. I didn’t use the function |
| 7. Reasons why the error-location recognition support function was helpful (multiple selections OK) 8. Reasons why the error-location correction support function was helpful (multiple selections OK) | 1. It was helpful in correcting an error location 2. It was helpful in understanding the erroneous relationship or topic at an error location 3. It provided an opportunity for reviewing the logical structure 4. It provided an opportunity for reviewing the reconstructed text 5. It provided an opportunity for rereading the editorial text 6. Other |
| 9. Reasons why the error-location recognition support function was not helpful 10. Reasons why the error-location correction support function was not helpful | Free description |
Table 2. Questions on effectiveness of reconstructed text

| Subject | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Sentence-order reconstruction | 1 | 1 | 1 | 1 | 1 | 2 | 2 |
Next, questionnaire results on the error-recognition support function and error-correction support function are listed in Table 3. Here, 5 out of the 6 subjects who used both functions replied that they were helpful, and all 5 of them were able to correct all errors in the How-Why structure. The number of responders choosing each reason why the support functions were helpful is listed in Table 4. Many responders replied that the support functions provided an opportunity to review logical structures, editorial text, and reconstructed text. In fact, any subject using either function went on to review logical structures, editorial text, and reconstructed text. Based on the above, it can be seen that presenting error-location candidates and correction candidates encourages readers to check their understanding and that identifying error locations and supporting corrections are possible.

Table 3. Responses on effectiveness of support functions

| Subject | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Error-recognition support function | 1 | 1 | 1 | 1 | 3 | 1 | 2 |
| Error-correction support function | 1 | 1 | 1 | 1 | 3 | 1 | 2 |
Table 4. Number of responders to each choice in questions 7 and 8

| Response choice | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Error-recognition support function | 1 | 3 | 5 | 1 | 0 | 0 |
| Error-correction support function | 1 | 3 | 5 | 1 | 0 | 0 |
8 Conclusion

In this paper, we proposed a technique for presenting reconstructed text based on the reader’s understanding of logical structure to help the reader recognize errors in the understood logical structure. We also proposed a technique for identifying error-location candidates and correction candidates and constructed an editorial-text comprehension-support system incorporating those techniques. The results of an evaluation experiment showed that presenting reconstructed text could indeed help the reader recognize errors in understanding logical structure. They also showed that a correction-support function has the potential to help the reader correct those errors and reach the correct logical structure.

In this study, the objective was not to support the understanding of topics from text; it was only to support the understanding of relationships between topics generated by the user. As a result, if the user cannot correctly understand the topics presented in the text, reconstruction cannot be correctly performed and correction candidates cannot be identified even if some sort of logical structure is constructed. We aim to take up a technique for supporting the understanding of topics in future studies.
References

1. OECD: PISA 2018 Results (Volume I): What Students Know and Can Do. PISA, OECD Publishing, Paris (2019)
2. Fukunaga, Y., Hirashima, T., Takeuchi, A.: Realization of the feedback function to using underlining activities aiming at promotion of the reading comprehension in e-learning instructional material and its learning effect. Jpn J. Educ. Technol. 29(3), 231–238 (2005). (in Japanese)
3. Mochizuki, T., et al.: Development of eJournalPlus that supports practice of critical reading literacy. Jpn J. Educ. Technol. 3(3), 241–254 (2014). (in Japanese)
4. Hirashima, T., Horiguchi, T., Kashihara, A., Toyoda, J.: Error-based simulation for error-visualization and its management. Int. J. Artif. Intell. Educ. 9(1–2), 17–31 (1998)
5. Aikawa, N., Koike, K., Tomoto, T.: Analysis of learning activities with automated auxiliary problem presentation for breaking learner impasses in physics error-based simulations. In: Workshop Proceedings of the International Conference on Computers in Education ICCE 2020, pp. 72–83 (2020)
6. Minto, B.: The Pyramid Principle: Logic in Thinking and Writing. Trans-Atlantic Publications (1976)
7. Suzuki, T.: Nihongokyo no Susume. SHINCHOSA Publishing Co., Ltd. (2009). (in Japanese)
Interfaces for Learning and Connecting Around Recycling

Israel Peña1(B) and Jaime Sánchez2

1 Department of Computer Science, University of Chile, Santiago, Chile [email protected]
2 Center for Advanced Research in Education (CARE) and Department of Education, University of Chile, Santiago, Chile [email protected]
Abstract. In Chile and Latin America, people are interested in recycling, but they do not have enough tools to perform this activity properly. Several recycling applications exist around the world. Building on this experience, we constructed a recycling application that helps people to recycle and to connect with other people around recycling, through comments and other actions that users can perform in the application. The application was developed using user-centered design methodologies, so two usability evaluations were performed, one in the design phase and the other at the end of development, to understand whether the product is truly usable. The results of both evaluations helped to improve the application throughout its development and to arrive at a product that is usable by end-users.

Keywords: Human Computer Interaction · Usability · Recycle · Interfaces for recycling · Connecting around Recycling
1 Introduction
Recycling is a topic of wide and diverse conversation and discussion nowadays. It is framed within the care of the planet, a topic that has been discussed for many decades. Even so, although people know they should recycle, and there are recycling campaigns by companies and places to recycle, there is not enough information or broad education on the subject. People, in general, are interested in the subject and want to start recycling. They look for information about recycling places in several sources, especially on the Internet, ask the local government, ask friends who recycle, and so on. Despite this, there is almost no place that they can consult regularly and that has all the information they need. For this reason, many people who start recycling do not find the motivation to continue and end up not recycling until they are motivated again.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 122–135, 2023. https://doi.org/10.1007/978-3-031-35132-7_9
At the end of 2019, the Sustainable Urban Development Perceptions Survey (EPDUS, for its acronym in Spanish) of the Center for Sustainable Urban Development conducted a study on household recycling and compared it with official 2017 figures from the Ministry of the Environment of Chile, which indicate that household garbage separated for some type of reuse or recycling was 1.8% of the national total. The study delivered specific figures on recycling for two main metropolitan areas: in Concepción, 32% of the households interviewed declared that they recycle, and in Santiago, 41% did [1], so there is a big difference between the declared and the official figures. This difference could be explained in many ways: one is that people say they recycle but do not do it; another is that people encounter a lack of resources from the municipality. By 2019, 45% of the municipalities in Chile did not have a municipal recycling service [1].

According to an article published by the Ministry of Environment (MMA) in January 2020, which provides preliminary information on a survey on Household Waste Management conducted in 2019, the regions of Araucanía (1.3%), Valparaíso (1.3%) and Metropolitana (1%) are the ones that recycle the most, and the municipalities that recycle the most are Vitacura, Las Condes and Providencia, which are also the ones that spend the most on recycling [2]. The survey was carried out by a partnership between CEMPRE Chile, the Ministry of Environment and the Economic Commission for Latin America and the Caribbean (ECLAC).

It was therefore decided to create a mobile application to learn about recycling and help people recycle, bringing the information together in one place that everyone can access. In Chile there is only one digital solution that collects all the information needed to recycle: ReChile [5], which is provided by the Ministry of Environment and exists only as a web version.
Internationally, there are other solutions, such as RecycleNation [3] or CleanSpot [4]. The purpose of this work is to design, implement, and evaluate Reciclator, a mobile application that helps people to understand, learn, and implement the recycling process.
2 Related Work
The focus of this section is to review the work already done on this topic and the impact that such work can have. Around the world there are several applications that help people recycle. One of these is RecycleNation [3], an application from the United States that allows users to search for clean points by the waste to be recycled; the results are shown on a map together with the information for each clean point. There is also an application from Spain called CleanSpot [4], which, given an address and a waste type, returns information about the clean points available for recycling that waste. In Chile, however, there is only one option similar to the previous ones, which is administered by the Ministry of Environment and is called ReChile [5]. This is a platform that shows information on how to recycle certain types of waste, as well as a map where it is possible to find all the clean points, along with their addresses and what can be recycled at them. Also, in Chile there are several applications and services similar to Reciclapp [6], an application that, once its services are hired, goes to people’s homes or workplaces to pick up their waste and then recycle it. As can be seen, most of these applications answer what, how and where to recycle; together with the responses of people in the previously cited articles, this suggests that the new solution should take the same approach.
3 Methodology
For the development of this work, a combination of Rapid Prototyping [7] and a User-Centered Design (UCD) [8] methodology was used. Rapid Prototyping is an iterative process used in product design; the goal is to rapidly improve a prototype design in several short-term cycles while using the least amount of resources [7]. The Rapid Prototyping steps are as follows [7]:

– Prototyping: A prototype is created. This can be of low or high fidelity and may or may not be interactive.
– Feedback: The prototype is shared with other developers or end-users, who review it and provide their comments.
– Improvement: Feedback from the previous stage is used to improve the prototype, leading to a new iteration of the process. It stops when deadlines are met or a final product is obtained.

Prototypes, as mentioned above, can be low-fidelity or high-fidelity. A low-fidelity prototype can be a drawing made on paper or digitally; it lacks detail and is not interactive. A high-fidelity prototype resembles a real product with which the user can interact, so design decisions can be made based on user feedback.

The user-centered design methodology is a development process that places the end-user at the center of product development. The development phases of this methodology are:

– Specify the context of use: The people who will use the product are identified, along with the purpose and the conditions of use.
– Specify requirements: User requirements are identified. This can be done by observing the environment, the users, or by directly obtaining user feedback.
– Create design solutions: The product design is built.
– Evaluate the design: The product is validated through a usability evaluation.

For the third and fourth phases of this methodology, the Rapid Prototyping methodology [7] was applied with a high-fidelity prototype.
The goal of using user-centered design is to capture and address the entire user experience, as the user is involved in the entire design process, from specifying the context, to evaluating the design [9].
Fig. 1. Proof of concept.
The Rapid Prototyping methodology [7] can be used in conjunction with a user-centered methodology and is very convenient since user feedback is required for interfaces to be usable. Also, using this methodology it is possible to quickly get user feedback to make improvements to the design.
4 Reciclator Application
The application developed in this work, called Reciclator, helps people who want to recycle, more specifically to identify what waste can be recycled, and how and where. The application grew out of a proof of concept developed in a computer science degree course. A first prototype of the design was implemented using Figma [10], and Flutter [11] was then used for the implementation; each of the two was subjected to a usability evaluation.

4.1 Requirements
The main requirement of the application is that it must clearly show users what can be recycled, and how and where to recycle it. Where to recycle must be displayed on a dynamic map, and the application must be user-centered. Looking at the recycling applications that already exist in the world, and drawing on experience with recycling, it can be seen that they provide the information necessary to recycle, but people who recycle always lack some information and talk to other people about it: they need to ask the administrators of the clean points or write it down at the clean points themselves. It is therefore necessary that this application addresses communication between users and that users themselves can provide information through the application.

4.2 Prototype 1 Design
As previously stated, this application started with a proof of concept designed in Figma [10]. It had a section for finding out what can be recycled by asking the user questions, a section on how to recycle each kind of waste, and a section showing the clean points on a map (see Fig. 1). The first design stage focused on designing the user's interaction with the application and on improving some aspects of the proof of concept. A Profile section was created, where users could see their saved wastes and their own information, as well as a section for messages between users. Another important addition was that clean points could now be rated, commented on, and shared with other users of the application, and users could report a problem with a container at a clean point (see Fig. 2).
Fig. 2. Design of prototype 1.
4.3 Final Prototype Design
Few changes were made for the final prototype. Two sections were added, Home and Support, and some small changes were made to the main menu to follow the Flutter menu design. All these changes were made after analyzing the usability evaluation of the prototype 1 design, which is discussed in the Usability Evaluation section.
4.4 Functionalities
The main functionalities of Reciclator are searching for a clean point, saving a waste, and commenting on and sharing a clean point, among others. Most of these functionalities work over the internet, so a good internet connection is necessary.
Login. The application needs a user system, so when it starts it opens an interface for logging in with a Google account. When this option is selected, the application uses the Google authentication service to let the user choose an account and log in. After the first login, the application saves the user's information both in the database and locally; on later launches it retrieves this information and logs in without asking again. To log in with another account, or simply to log out, the user has to go to the Profile section, where the menu shows the Logout option.
Consult Nearby Clean Point. To consult a nearby clean point, the user can go directly to the Map section and filter by waste. Alternatively, from the screen of a waste, the user can press the Where to Recycle button, which shows the map with the filter for that waste already applied (see Fig. 3). The map starts at the user's current position, which the application obtains from the device's GPS. The user can navigate the map to the desired clean point, and a button returns the map to the user's current position. Each red mark on the map is a clean point, positioned using its actual geographic coordinates. By clicking on one of these marks, the user can see the information for that clean point, which consists of general information, comments, and reports. The general information includes the owner and administrator of the clean point, the address, the schedule, and the waste that can be recycled. On this screen the user can also share and rate the clean point.
Comment on Clean Point and Waste. There is a comment section on the clean point screen, as well as on the waste screen. Comments work the same way on both screens: a user can see the comments of other users and make
Fig. 3. Clean point map and waste information
Fig. 4. Clean point and waste
their own comment. The user can also like comments and sort them by date or by likes (see Fig. 4). Users can use this functionality to give other users information about clean points and wastes; this may repeat information that already appears elsewhere in the application, or it may be information that the application, by design, does not provide.
Save Waste. A waste can be consulted for the first time in three ways: (1) when a waste is identified, the application sends the user to the screen for that waste; (2) each clean point has a button that leads to the wastes that can be recycled there; and (3) through the My Waste section, as explained below. Going through these steps again every time the user wants to consult a waste can be cumbersome, so the waste screen has an option to save the waste (see Fig. 4). This stores a shortcut to the waste in the My Waste section of the Profile, where it can be consulted much faster. This section also has a button to add a new waste, so the user can save wastes directly without having visited them before and can therefore open a waste for the first time from this section.
Report Clean Point Container. Users can use the Reports section of a clean point to report whether a waste container is full or closed. This lets other users and the administrators of the clean point know its status.
Send Message to Another User. In the Messages section, users can send a text message to another user and view the chat with any user of the application. A link to a clean point or a waste can also be sent as a message from the corresponding screen.
Rating a Clean Point. From the information screen of a clean point, users can see the average rating that users have given it, and by clicking on this rating a user can also rate the clean point from 1 to 5 stars.
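The Consult Nearby Clean Point functionality described above can be illustrated with a minimal sketch. It is not the Reciclator implementation (which was built with Flutter); the `CleanPoint` record, its field names, the function names, and the coordinates are all hypothetical, and the great-circle (haversine) distance is only one plausible way to order clean points around the user's GPS position.

```python
import math
from dataclasses import dataclass


@dataclass
class CleanPoint:
    # Hypothetical record; the field names are illustrative only.
    name: str
    lat: float
    lon: float
    wastes: frozenset


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby_clean_points(points, user_lat, user_lon, waste=None):
    """Return clean points, nearest first, optionally filtered by waste."""
    matching = [p for p in points if waste is None or waste in p.wastes]
    return sorted(
        matching,
        key=lambda p: haversine_km(user_lat, user_lon, p.lat, p.lon),
    )


# Made-up coordinates, for illustration only.
points = [
    CleanPoint("A", -33.45, -70.66, frozenset({"plastic bottle", "glass"})),
    CleanPoint("B", -33.40, -70.60, frozenset({"paper"})),
    CleanPoint("C", -33.50, -70.70, frozenset({"plastic bottle"})),
]
result = nearby_clean_points(points, -33.44, -70.65, waste="plastic bottle")
names = [p.name for p in result]
```

Here the waste filter plays the role of the Where to Recycle button: only clean points that accept the selected waste are kept before they are ordered by distance.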
5 Usability Evaluation
As previously stated, two usability evaluations were performed: the first on the application redesign created in Figma, and a final one on the application implemented for Android devices.
5.1 Prototype 1 Design Evaluation
This evaluation was performed with end-users interacting with the design created in Figma [10], so the presentation mode of this platform was used to perform this evaluation.
Sample. The evaluation was carried out with 9 end-users, who were observed and interviewed and then answered an end-user questionnaire. Of these 9 individuals, 6 were female and 3 were male. Four were between 20 and 30 years old, 3 between 30 and 40 years old, and 2 were 60 years old or older.
Procedure. The evaluations were carried out in two modes, online and face-to-face. Online participants were asked to join a video call from a computer with internet access and, before starting, to open the Figma page with the demonstration of the prototype. Face-to-face participants were asked to open the Figma page while being recorded with a cell phone. In both cases, they were asked for authorization to be recorded. First, they were asked about their personal data and their experience in recycling. The observation then began by asking them to perform a series of tasks. This was followed by an interview, and at the end they were asked to complete an end-user questionnaire.
Observation. For the observation, they were asked to perform a series of tasks, each broken down into mini tasks detailing actions that the user might or might not perform. The tasks were as follows.
1. Task 1: Check Reports Section.
2. Task 2: Identify a Waste, see how it is recycled and share it.
3. Task 3: Check where you can recycle near the site and check where you can recycle a plastic bottle.
4. Task 4: Check Profile Section.
5. Task 5: Check Clean Point and share it.
6. Task 6: Check Messages section.
The objective of this phase was to see whether the users could detect all the elements of the interfaces and whether there were elements whose operation they did not understand.
Interview. The interview was conducted right after the observation. Users were asked the following 7 questions.
1. Is it easy to understand how the application works? Explain.
2. Would you use the application if you need to recycle and why?
3. Did you like the design of the application? Please explain.
4. Do you understand the 5 sections that the application has, and what they are for? Explain.
5. In general, did you like the application and why?
6. Is there anything you did not like about the application and what?
7. Would you add anything to the application, what thing?
End-User Questionnaire. The end-user questionnaire was administered to the same users right after the interview. It is an adaptation of the mobile application usability evaluation guideline provided by Professor Jaime Sánchez [12].
Results. Observation showed that some users did not interact with certain elements of the interface, such as the comment filters and the share and rate buttons. This may have happened because, since it was only a design, some users did not see the need to interact with these elements. Some users found it difficult to understand that the section for identifying a waste had to be entered from the bottom menu. Several users commented that they had trouble understanding the Report and Identify sections. Several people also suggested additions to the application, some of which can be addressed immediately, such as showing the status of a waste or a clean point, and offering some help if they felt lost in the application. Regarding the results of the End-User Questionnaire (Fig. 5), people mostly agreed with most of the statements presented. Only a few statements received a rating of 3, which is neutral, and since there are so few of them, they probably do not point to a serious problem. Only one statement, on whether the interface was pleasant, received a rating of 2; the user who gave it commented that the sections could be renamed, which may explain this response. Users were also asked for an overall rating: 44.4% rated the design Good, the same percentage rated it Excellent, and only 11.1% rated it Neutral. From this result and the interviews, it can be inferred that the users liked the design.
Fig. 5. End-user Questionnaire Results for Prototype 1
5.2 Final Evaluation
Unlike the previous evaluation, this evaluation used the Reciclator application installed on the devices of each of the users, so it was essential that they had an Android device.
Sample. This evaluation used the same methods as the previous one, but the number of users changed: the observation and interview were carried out with 8 users, and the end-user questionnaire was answered by 22 users. Of the 8 users, 5 were women and 3 were men, and their ages were spread evenly between 20 and 65 years. According to Jakob Nielsen [13], more than 3 users are required for observations and 5 or more for interviews, so this part of the sample meets the standards. For the questionnaire, Nielsen says that 30 users are needed, but it was only possible to get 22 users.
Procedure. The evaluation with the 8 users was done in person. They were sent a link to download and install the application, and it was made clear to them that they needed internet access to use it. They were also asked for consent to write down everything they did and the answers they gave to the questions asked. First, they were asked for basic data, such as name, age, and their experience with recycling. Then they were asked to perform 6 tasks, and while they performed them the observer noted whether each task was completed. The interview followed, and at the end they were asked to answer the end-user questionnaire. The other users, who only answered the end-user questionnaire, were asked to download the application, try it, and then answer the questionnaire online.
Observation. The observation had the same objective as in the previous evaluation, and the tasks again had sub-tasks. The tasks were as follows.
1. Task 1: Log in to the application.
2. Task 2: Check a clean point near your location.
3. Task 3: Enter the Identify waste section of the main menu, identify a waste and browse through it.
4. Task 4: Check your profile.
5. Task 5: Check the messages section and send a message to someone.
6. Task 6: Check the Support section.
Interview. The questions asked to the 8 users were as follows.
1. Is it easy to understand how the application works? Explain.
2. Would you use the application if you need to recycle and why?
3. Did you like the design of the application? Please explain.
4. Do you understand the 5 sections that the application has, and what they are for? Explain.
5. What function of the application did you like the most and why?
6. In general, did you like the application and why?
7. Is there anything you did not like about the application and what?
8. Would you add anything to the application, what thing?
The questions were the same as in the previous evaluation, but a question about functionalities was added, since the previous evaluation focused on the design and this one, although it also evaluated the design, focused on the functionalities.
End-User Questionnaire. The end-user questionnaire was answered by the 8 users who took part in the other methods, as well as by another 14 users, who were asked to try the application before answering.
Results. According to the observation, the users had no major problems using the application. Only a few elements were difficult to understand, such as the popups that appeared when they searched for a clean point or after sharing a clean point. As in the previous evaluation, the buttons for sorting comments and reports were not used. On the home page, users did not click on all the buttons, which suggests that they did not see any use for them. On the waste screen, some users did not click on all the buttons or took a long time to do so, so this behavior needs to be studied further. Several users commented that the option of messaging other users directly was not very useful, although one user said that he liked being able to communicate with other users through the application. When asked whether they liked the application, many commented positively and said they would use it when they went to recycle. They liked knowing other people's opinions, and the functionality they found most useful was the information about the clean points: where they are, what their status is, and what can be recycled at them.
As can be seen in the summary of the final questionnaire (Fig. 6), for most of the statements users said they strongly agreed or agreed; fewer were neutral, and very few disagreed. As in the previous evaluation, they were asked for an overall rating: 63.6% rated the application Good, 31.8% Excellent, and only 4.5% Neutral. From this result and the interviews, it can be inferred that the users liked the application and that the response was more positive than in the previous evaluation.
Fig. 6. End-User Questionnaire Results for Final Prototype
6 Discussion
Users were asked which functionality they found most useful, and the responses were clearly related to their recycling experience. Users with little recycling experience found the information on what to recycle, how, and where very useful, especially the information about the clean points and the waste they accept. Users with more recycling experience already knew some of that information, so they found the social aspects of the application more useful, such as the comments on clean points and waste; they also commented that the section on how to recycle could be very useful for people new to recycling. Another recurring topic was the messages section: several users did not find it very useful, and one user said that he would replace it with a new section. These responses should be studied further, and the usability of this section should be evaluated. There were also several comments about the use of colors and the attractiveness of the home page, so it is important to redesign it to make it more visually appealing. There were several difficulties in the development of this work, several of them linked to the COVID-19 pandemic: several evaluations had to be delayed and the expected number of users was not reached. This could be improved in a future usability iteration.
7 Conclusions
The development procedure of this application helped to improve the application and make it usable. The evaluation of the design helped to show that several sections were important, such as the Home and Support sections.
All participants said that the application is useful, so it can be concluded that there is a need for such an application in Chile. This application can give people in Chile a tool that helps them recycle, because it would gather all the information needed to recycle in one place, which does not currently exist in our country. This work can be useful for the implementation of new recycling applications and systems elsewhere in the world, since up-to-date user opinions about clean points or waste can be a great help to other users: recycling is a community action, in which what other people think and the state of the community environment matter. As future work, the messages section should be evaluated, as it was not found very useful and could easily be replaced by the most widely used instant messaging applications. The design should also be improved, using colors and shapes to make the application more attractive. Finally, the section for identifying waste should be better implemented so that it can identify many types of waste and become useful.
Acknowledgements. This work has been developed with the support of the Basal Project FB0003, the Basal Funding for the Centers of Excellence in Research, the Program of Associated Research of ANIED, Chile.
References
1. CiperChile: Reciclaje domiciliario en chile: queremos, pero no nos dejan. https://www.ciperchile.cl/2021/04/09/reciclaje-domiciliario-en-chile-queremos-pero-no-nos-dejan/. Accessed 23 Jan 2023
2. Ministerio del Medio Ambiente: Encuesta sobre gestión de residuos domiciliarios 2019. https://mma.gob.cl/encuesta-sobre-gestion-de-residuos-domiciliarios-2019-araucania-valparaiso-y-metropolitana-son-las-regiones-que-mas-reciclan/. Accessed 23 Jan 2023
3. RecycleNation Homepage. https://recyclenation.com/. Accessed 23 Jan 2023
4. CleanSpot Homepage. https://cleanspotapp.com/. Accessed 23 Jan 2023
5. ReChile Homepage. https://rechile.mma.gob.cl/. Accessed 23 Jan 2023
6. Reciclapp Homepage. http://reciclapp.cl/. Accessed 23 Jan 2023
7. DevSquad and Merrill, M.: What is rapid prototyping and why is it used in development? https://devsquad.com/blog/what-is-rapid-prototyping-and-why-is-it-used-in-development/. Accessed 23 Jan 2023
8. Sánchez, J.: Interacción Humano Computador. Universidad de Chile (2021). Accessed 23 Jan 2023
9. Interaction Design Foundation: User Centered Design. https://www.interaction-design.org/literature/topics/user-centered-design/. Accessed 23 Jan 2023
10. Figma. https://www.figma.com/. Accessed 23 Jan 2023
11. Flutter. https://flutter.dev/. Accessed 23 Jan 2023
12. Sánchez, J.: Evaluación de usabilidad de aplicaciones, cuestionario de usuario final. Universidad de Chile, Departamento de Ciencias de la Computación (2017)
13. Nielsen, J.: Usability Engineering. Academic Press (1993)
Sound Logo to Increase TV Advertising Effectiveness Based on Audio-Visual Features
Kazuki Seto and Yumi Asahi
Graduate School of Management, Department of Management, Tokyo University of Science, 1-11-2, Fujimi, Chiyoda-Ku 102-0071, Tokyo, Japan
Abstract. Internet advertising has been expanding in recent years, so more creative work is needed for TV commercials. A sound logo, one of the sound components of a TV commercial, is expected to leave a stronger impression on consumers. This paper proposes a design policy for sound logos that increases TV advertising effectiveness. To make a useful proposal, we analyze the audio-visual features that affect product recognition and purchase intention for each product category. First, as the basic analysis, we extract actual sound logos that enhance advertising effectiveness in terms of product recognition and purchase intention, and we group them into three product categories: food and drink, services, and daily necessities. Second, as the main analysis, we clarify the features that increase advertising effectiveness for each product category. We define three features of sound logos: the number of sound logos in a TV commercial, visual features, and audio features. We analyze them by correlation analysis, correspondence analysis, and hierarchical cluster analysis, respectively, to reveal the effective features. Finally, we summarize the design policy of sound logos that enhances TV advertising effectiveness, based on the audio-visual features derived from our analysis.
Keywords: Sound logos · Advertising effectiveness · Audio-visual features · Product recognition · Purchase intention
1 Introduction
In recent years, people have been exposed to advertisements through various media such as television and the Internet. Figure 1 shows the ratio of advertising expenditure for each medium in Japan in 2021. A large amount of advertising expenditure is still spent on television. However, Internet advertising is expanding rapidly and has surpassed the advertising expenditure of TV media [1]. More creative work is therefore needed for TV commercials to make a stronger impression on consumers. To make impressive TV commercials with advertising effectiveness, it is necessary to make them memorable. People's auditory memory is much better at retention. In addition, audio advertisements have the added benefit of being easier for the brain to process. Furthermore, Lightwave and iHeartMedia showed that audiences had a much stronger physical and emotional reaction to audio advertisements than to visual advertisements [2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 136–151, 2023. https://doi.org/10.1007/978-3-031-35132-7_10
TV media 27% Internet media 40% Newspaper 6% Magazine 2% Radio 1% Promotion media 24%
Fig. 1. Ratio of advertising expenditure of media in Japan in 2021.
This study emphasizes the sound components of TV commercials as a way to make them memorable. To this end, we focus on sound logos, which are one of the sound components of TV commercials. A sound logo is a piece of copy, such as a product name, a company name, or other information, set to a melody. In TV commercials, it leaves an impression on consumers that leads them to consider purchasing the advertised product [3]. Koike [4] found that a company name had a more positive effect on consumer memory when it was presented as a sound logo in the audio of an advertisement than when it was presented with narration or sound effects. That study indicates that sound logos have a positive effect on consumers' memory. Matsuda [3] also studied the effects of repetition and melody familiarity of sound logos on products. The study showed that melody familiarity and repeated presentation of the sound logo increased the sense of closeness, liking, and purchase intention, which suggests that repetition and melody familiarity increase positive feelings towards the product. Although previous studies have shown that sound logos have some advertising effect, the specific visuals shown on screen with sound logos and the auditory features of sound logos have not been clarified. Moreover, consumers are said to follow different purchasing processes depending on the product category [5]. The DAGMAR model is known for defining the purposes of advertising in order to measure its effectiveness, and it includes product recognition and purchase intention among these purposes [6]. They are therefore important consumer attitudes related to advertising effectiveness. Based on these points, we believe that sound logos must be carefully designed, by product category, to be effective on consumer attitudes.
This paper proposes a design policy for sound logos that have advertising effectiveness on consumer attitudes, namely product recognition and purchase intention, by product category. Our analysis derives the audio-visual features of sound logos that affect product recognition and purchase intention for each product category.
2 Flow of This Study
Figure 2 shows the flow of this study. First, we collect the data for analysis. The data consist of provided data (questionnaire and TV data) and originally collected data (video data). Second, we extract sound logos that produce advertising effectiveness for product recognition and purchase intention, and classify them into three categories: food and drink, services, and daily necessities. Third, we analyze the sound logos with correlation analysis, correspondence analysis, and hierarchical cluster analysis to clarify the effective visual and auditory features. Finally, we propose a design policy for sound logos with advertising effectiveness.
1. Collecting data
• Obtain the provided questionnaire and TV data
• Collect video data of sound logos
2. Basic analysis
• Conduct semiparametric difference-in-differences analysis on video data of sound logos
• Classify effective sound logos into product categories
3. Main analysis
• Conduct correlation analysis on the data of the number of sound logos used in a TV commercial
• Conduct correspondence analysis on visual features of sound logos
• Conduct hierarchical cluster analysis on audio features of sound logos
4. Proposing design policy for sound logos
Fig. 2. Flow of this study.
3 Data Summary
In this paper, video data, questionnaire data, and TV data are used. For the video data, we obtained more than 60 videos of sound logos used in TV commercials from official YouTube™ sites. From these, in the basic analysis, we extract 39 videos of sound logos that have advertising effectiveness. To estimate the advertising effectiveness of sound logos on consumers in the basic analysis, we use questionnaire data and TV data provided by Nomura Research Institute. The number of respondents is 2,500. The questionnaires cover personal attributes (gender, age, marital status, presence or absence of children, consumer values, and frequency of channel use), TV commercial viewing, and consumer attitudes (product recognition and purchase intention) for some products. The TV data consist of the TV program contents with the advertisements and the broadcast timeslots of the advertisements from January 23rd to April 3rd, 2022.
4 Basic Analysis to Extract the Video Data for Advertising Effective Sound Logos
To extract the video data, we estimate the amount of advertising effectiveness of each TV commercial that includes a sound logo. To calculate it, we apply a semiparametric difference-in-differences method with a propensity score [7]. With this method, the amount of advertising effectiveness can be expressed as

$$\text{amount of advertising effectiveness} = \frac{1}{N}\sum_{i=1}^{N}\frac{Y_{i1}-Y_{i0}}{P_w}\cdot\frac{W_i-e(X_i)}{1-e(X_i)}.$$

The variables are shown in Table 1. The propensity score is estimated by logistic regression analysis, using the individual attributes of the 2,500 respondents as explanatory variables. If the amount of advertising effectiveness is positive, the commercial is assumed to be effective; if it is negative, it is assumed to be ineffective.
Table 1. Variables of the formula.
| Variable | Contents of variable |
| --- | --- |
| N | The number of answers (2,500) |
| X_i | Covariates of participant i |
| Y_it | Product recognition (purchase intention) at time t of participant i |
| P_w | Viewing rate of the TV commercial |
| e(X_i) | Propensity score |
| t | t = 0: result of first survey; t = 1: result of second survey |
| W_i | W_i = 0, 1: whether participant i watched the TV commercial or not |
| i | One of the answers |
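As a minimal sketch of how the estimator above could be computed, the following function evaluates the formula directly from per-respondent values. It is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the propensity scores e(X_i) are taken as given rather than fitted by logistic regression, W_i = 1 is assumed to mean that the respondent watched the commercial, and P_w defaults to the share of viewers in the sample. The toy data are invented.

```python
def advertising_effect(y0, y1, w, e, p_w=None):
    """Semiparametric difference-in-differences estimate.

    y0, y1: outcome of each respondent in the first and second survey
    w:      1 if the respondent watched the commercial, else 0 (assumed coding)
    e:      propensity score e(X_i) of each respondent, strictly below 1
    p_w:    viewing rate; defaults to the share of viewers in the sample
    """
    n = len(w)
    if n == 0 or not (len(y0) == len(y1) == len(e) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    if p_w is None:
        p_w = sum(w) / n
    total = 0.0
    for y0_i, y1_i, w_i, e_i in zip(y0, y1, w, e):
        # One term of the sum: outcome change, weighted by viewing and propensity.
        total += (y1_i - y0_i) / p_w * (w_i - e_i) / (1.0 - e_i)
    return total / n


# Toy data (not from the paper): four respondents, constant propensity 0.5.
y0 = [0, 0, 0, 0]
y1 = [1, 1, 0, 1]
w = [1, 1, 0, 0]
e = [0.5, 0.5, 0.5, 0.5]
effect = advertising_effect(y0, y1, w, e)
```

When the propensity score is constant and equal to the viewing rate, as in this toy data, the estimate reduces to the ordinary difference-in-differences of group means: the mean change among viewers minus the mean change among non-viewers.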
4.1 The Result of the Basic Analysis
In the basic analysis, we obtained 18 effective TV commercials for product recognition and 17 effective TV commercials for purchase intention.
To propose the design policy of sound logos for each product category, we define three product categories: food and drink, services, and daily necessities. In terms of product recognition, the number of effective TV commercials was nine for food and drink, six for services, and three for daily necessities. In terms of purchase intention, it was eight for food and drink, five for services, and four for daily necessities.
5 Main Analysis to Clarify the Features of Sound Logos with Advertising Effectiveness
In the main analysis, we clarify the features of sound logos that have advertising effectiveness. We define three features of sound logos and analyze each with a different method that quantifies the relationship between the feature and advertising effectiveness. The first feature is the number of sound logos used in a single TV advertisement, which is analyzed by correlation analysis. The second feature is the visual feature, which represents what is shown on the TV screen and is analyzed by correspondence analysis. The third feature is the auditory feature, which represents the volume and pitch of the background music and is analyzed by hierarchical cluster analysis.
5.1 Correlation Analysis and Result
In the obtained video data, sound logos appear once or twice in a TV commercial. We clarify the relationship between the number of sound logos and advertising effectiveness for product recognition and purchase intention by using correlation analysis. The analysis is conducted for each of the three categories, with advertising effectiveness as the objective variable and the number of times the sound logo is used as the explanatory variable.
Table 2. The result of correlation analysis.
| Category | Product recognition | Purchase intention |
| --- | --- | --- |
| Food and drink | 0.16667 | 0.25 |
| Service | 0.38576 | 0.15811 |
| Daily necessities | −0.40825 | −0.40825 |
The result of the analysis is shown in Table 2. All the correlation coefficients are below 0.5 in absolute value, which indicates that the number of sound logos in a TV commercial is not strongly related to advertising effectiveness.
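The kind of coefficient reported in Table 2 can be sketched as follows. This is only an illustration: the text does not state which correlation coefficient or which coding of advertising effectiveness was used, so the Pearson coefficient and the 0/1 effectiveness coding are assumptions, the function name is hypothetical, and the data are invented rather than taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for one product category (not the paper's data):
# how often the sound logo appears, and whether the commercial was effective.
logo_counts = [1, 1, 2, 2]
effective = [1, 0, 1, 1]
r = pearson(logo_counts, effective)
```

With only a handful of commercials per category, as in the basic analysis, a coefficient computed this way is sensitive to single observations, which is one more reason to read the values in Table 2 cautiously.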
5.2 Correspondence Analysis and Result
In this analysis, we clarify what tends to be shown on screen in sound logos that have advertising effectiveness. Correspondence analysis can be used to illustrate the characteristics of data consisting of two categorical variables and to grasp visually the relationship between them [8]. In this paper, the two variables are the visual features (Variable 1) and the sound logos classified according to product category and advertising effectiveness (Variable 2). We believe that this makes it possible to grasp visually which visual features are strongly linked to advertising effectiveness. The names and contents of Variable 1 are listed in Table 3, and those of Variable 2 in Table 4.
Table 3. A list of variables and the content in Variable 1.
| Variable 1 | Contents |
| --- | --- |
| actress | An actress is shown in the first sound logo |
| actor | An actor is shown in the first sound logo |
| no human | No humans are shown in the first sound logo |
| item | A product name is shown in the first sound logo |
| company | A company name is shown in the first sound logo |
| 2actress | An actress is shown in the second sound logo |
| 2actor | An actor is shown in the second sound logo |
| 2nohuman | No humans are shown in the second sound logo |
| 2item | An item name is shown in the second sound logo |
| 2company | A company name is shown in the second sound logo |
Table 4. A list of variables and the content in Variable 2.
| Variable 2 | Contents |
| --- | --- |
| food and drink_1 | Advertising effective sound logos of food and drink |
| food and drink_0 | Ineffective sound logos of food and drink |
| survice_1 | Advertising effective sound logos of service |
| survice_0 | Ineffective sound logos of service |
| dairy_1 | Advertising effective sound logos of daily necessities |
| dairy_0 | Ineffective sound logos of daily necessities |
Before the analysis, we score each sound logo 0 or 1 according to whether each feature in Variable 1 is shown. We then total the scores for each value of Variable 2 and create a cross-tabulation table of Variable 1 against Variable 2. We make four cross-tabulation tables, one for each combination of the number of sound logos used in a TV commercial (once or twice) and the consumer attitude (product recognition or purchase intention). Table 5 is one example. In Table 5, Variable 2 covers sound logos used once, classified according to advertising effectiveness for product recognition.

Table 5. One of the cross-tabulation tables shown as an example. Rows are Variable 2 and columns are Variable 1.

| Variable 2 | woman | man | no human | Item (product) | company |
|---|---|---|---|---|---|
| dairy_1 | 0 | 0 | 1 | 1 | 1 |
| food and drink_0 | 1 | 0 | 0 | 1 | 0 |
| food and drink_1 | 4 | 0 | 3 | 7 | 2 |
| survice_0 | 4 | 1 | 1 | 0 | 5 |
| survice_1 | 4 | 0 | 0 | 0 | 4 |
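The scoring and tabulation step described before Table 5 can be sketched as follows. Each sound logo is recorded as a Variable 2 label plus the set of Variable 1 features shown on screen, and the 0/1 scores are summed per label. The records, the function name `cross_tabulate`, and the constant `FEATURES` are hypothetical and only illustrate the procedure.

```python
from collections import defaultdict

FEATURES = ["woman", "man", "no human", "item", "company"]

def cross_tabulate(records):
    """Sum 0/1 feature scores per Variable 2 group.

    records: iterable of (group_label, set_of_features_shown) pairs,
    one pair per sound logo.
    """
    table = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for group, shown in records:
        for feature in FEATURES:
            score = 1 if feature in shown else 0  # 0/1 score for this sound logo
            table[group][feature] += score
    return dict(table)

# Hypothetical sound logos (not the study's data).
records = [
    ("food and drink_1", {"woman", "item"}),
    ("food and drink_1", {"item"}),
    ("survice_1", {"woman", "company"}),
]

table = cross_tabulate(records)
for group, row in table.items():
    print(group, row)
```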
The results of the correspondence analysis conducted on each of the cross-tabulation tables are shown as scatter plots in Figs. 3, 4, 5 and 6. These plots show the correspondence between sound logos with advertising effectiveness and their visual features.
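For readers unfamiliar with the method, the following is a minimal sketch of simple correspondence analysis applied to the counts in Table 5, assuming the standard formulation based on a singular value decomposition of standardized residuals. The paper does not say which software produced Figs. 3 to 6, so the function name `correspondence_analysis` and the use of NumPy are assumptions, and the coordinates it returns are not claimed to reproduce the published plots.

```python
import numpy as np

# Counts from Table 5 (rows: Variable 2, columns: Variable 1).
ROW_LABELS = ["dairy_1", "food and drink_0", "food and drink_1", "survice_0", "survice_1"]
COL_LABELS = ["woman", "man", "no human", "item", "company"]
TABLE_5 = [
    [0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0],
    [4, 0, 3, 7, 2],
    [4, 1, 1, 0, 5],
    [4, 0, 0, 0, 4],
]

def correspondence_analysis(table, n_dims=2):
    """Return row and column principal coordinates and the principal inertias."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_dims] * sv[:n_dims]) / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2

rows, cols, inertia = correspondence_analysis(TABLE_5)
for label, (x, y) in zip(ROW_LABELS + COL_LABELS, np.vstack([rows, cols])):
    print(f"{label:18s} {x: .3f} {y: .3f}")
```

Plotting the row and column coordinates on the same axes gives a map of the kind shown in the figures, where categories that lie close together tend to co-occur.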
Fig. 3. The relationship between Variable 1 and Variable 2 (sound logos used once) in Product Recognition.
The scatter plot in Fig. 3 indicates the following. Among sound logos used once that have advertising effectiveness on product recognition, those in the food and drink category tend to show the item logo, those in the service category tend to show an actress, and those in the daily necessities category tend to show no humans. The scatter plot in Fig. 4 indicates the following. Among sound logos used twice that have advertising effectiveness on product recognition, those in the food and drink category tend to show an actor with the first sound logo and item logos with the second. Those in the daily necessities category tend to show company logos with both the first and second sound logos.
Fig. 4. The relationship between Variable 1 and Variable 2 (sound logos used twice) in Product Recognition.
Fig. 5. The relationship between Variable 1 and Variable 2 (sound logos used once) in Purchase Intention.
The scatter plot in Fig. 5 indicates the following. Among sound logos used once that have advertising effectiveness on purchase intention, those in the food and drink category tend to show the item logo, and those in the daily necessities category tend to show no humans. The scatter plot in Fig. 6 indicates the following. Among sound logos used twice that have advertising effectiveness on purchase intention, those in the food and drink category tend to show an actor in the first sound logo and item logos in the second. Those in the service category tend to show no humans in the first and an actress in the second. Those in the daily necessities category tend to show company logos in both the first and second sound logos.

Fig. 6. The relationship between Variable 1 and Variable 2 (sound logos used twice) in Purchase Intention.

Comparing Fig. 3 with Fig. 4, and Fig. 5 with Fig. 6, the visual features that correspond to advertising effectiveness differ. This suggests that visual features that are effective when a sound logo is used once are not always effective when it is used twice. On the other hand, comparing Fig. 3 with Fig. 5, and Fig. 4 with Fig. 6, there is little difference in the features that correspond to effective sound logos. This suggests that the visual features of sound logos that have advertising effectiveness on product recognition can also be expected to have advertising effectiveness on purchase intention.

5.3 Hierarchical Cluster Analysis and Result

In this analysis, we reveal which auditory features of sound logos have advertising effectiveness on product recognition and purchase intention for each product category. To analyze the auditory features, we use hierarchical cluster analysis. This method starts with each sample point as its own cluster and repeatedly merges the clusters that are closest to each other. The result is expressed in a tree diagram called a dendrogram, and the number of clusters is generally determined by inspecting the dendrogram. Ward's method is applied as the cluster generation method. It treats the distance between two clusters as the loss of information caused by merging them, and combines clusters so that the sum of squared deviations within all clusters stays as small as possible. Coherent clusters can be obtained with this method [9].

Each sound logo has various auditory features. The cluster analysis groups sound logos with similar auditory features into clusters. We evaluate the ratio of effective sound logos in each cluster and identify the clusters in which this ratio is high. We conclude that the auditory features of the sound logos in such clusters have advertising effectiveness. In other words, we clarify the auditory features with advertising effectiveness by identifying the auditory features of the clusters that contain many effective sound logos.

The Analysis Conducted on Volume Features: Auditory features can be classified into two kinds, volume and sound pitch. First, the volume features are analyzed. We obtain the volume data of the sound logos with a Python library called “librosa”. The volume data are numerical values that change over time. The analysis is conducted on these data to obtain clusters of sound logos with similar loudness, quietness, and rate of change.
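The paper does not say which librosa function supplied the volume data. One common choice is frame-wise root-mean-square (RMS) amplitude, which librosa exposes as `librosa.feature.rms`. As an assumption-laden illustration, the sketch below computes the same kind of time series in plain Python on a synthetic tone; the function name `frame_rms`, the frame sizes, and the input signal are hypothetical.

```python
import math

def frame_rms(samples, frame_length=1024, hop_length=512):
    """Root-mean-square amplitude of each frame of a mono signal."""
    if not samples:
        return []
    last_start = max(len(samples) - frame_length, 0)
    values = []
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        values.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return values

# Synthetic one-second 440 Hz tone sampled at 8 kHz with amplitude 0.5
# (a stand-in for the audio track of a sound logo).
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

volume = frame_rms(tone)
print(len(volume), round(volume[0], 3))
```

A series like `volume`, computed per sound logo, is the kind of input from which loudness and rate-of-change features could be derived for clustering.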
Fig. 7. Dendrogram of the Analysis on Volume Features.
Table 6. List of volume features of each cluster.

| Cluster | Volume features of each cluster |
|---|---|
| Cluster1 | High volume and volume changes |
| Cluster2 | High volume and volume changes |
| Cluster3 | Low volume at first and then getting higher |
| Cluster4 | Low volume and volume changes |
| Cluster5 | High volume at first and then getting lower |
The dendrogram is shown in Fig. 7. We divide it into five clusters. The list of volume features of sound logos is shown in Table 6. From the list, it can be seen that sound logos are divided into clusters according to the loudness and the way in which the volume changes.
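To make the merging rule concrete, the following is a small pure-Python sketch of agglomerative clustering with Ward's criterion, stopped when a chosen number of clusters remains. At each step it merges the pair of clusters whose union increases the total within-cluster sum of squares the least. The feature vectors and the name `ward_clusters` are hypothetical; in practice a library routine such as `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used, and the paper does not state which implementation the authors relied on.

```python
from itertools import combinations

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Returns a list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = na * nb / (na + nb) * dist2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        rest = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters = rest + [(ma + mb, centroid)]
    return [sorted(members) for members, _ in clusters]

# Hypothetical (mean volume, volume variability) pairs for five sound logos.
points = [(0.80, 0.30), (0.82, 0.28), (0.20, 0.05), (0.22, 0.06), (0.50, 0.40)]
print(sorted(ward_clusters(points, 3)))
```

Choosing `k` corresponds to cutting the dendrogram at a given height, as done for the five clusters in Fig. 7.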
Table 7. Percentage of sound logos with advertising effectiveness for product recognition, by cluster and product category.

| Cluster | Food and drink | Service | Daily necessities |
|---|---|---|---|
| Cluster1 | None | 67% | 100% |
| Cluster2 | 100% | 0% | 100% |
| Cluster3 | None | 75% | None |
| Cluster4 | 100% | 29% | 33% |
| Cluster5 | 88% | 100% | 100% |
Table 8. Percentage of sound logos with advertising effectiveness for purchase intention, by cluster and product category.

| Cluster | Food and drink | Service | Daily necessities |
|---|---|---|---|
| Cluster1 | None | 33% | 100% |
| Cluster2 | 100% | 0% | 100% |
| Cluster3 | None | 50% | None |
| Cluster4 | 100% | 57% | 67% |
| Cluster5 | 75% | 50% | 100% |
Tables 7 and 8 show, for each product category, the percentage of sound logos in each cluster that are effective for product recognition and for purchase intention. For each product category, a cluster was considered effective if effective sound logos made up more than 50% of the cluster. Accordingly, for food and drink, clusters 2, 4, and 5 are effective for both product recognition and purchase intention. For services, clusters 1, 3, and 5 are effective for product recognition and cluster 4 is effective for purchase intention. For daily necessities, clusters 1, 2, and 5 are effective for product recognition and clusters 1, 2, 4, and 5 are effective for purchase intention.

Clusters 1, 3, and 5 all have high volume. This suggests that, in the service category, sound logos that are effective for product recognition are characterized by high volume. In contrast, Cluster 4 has markedly lower loudness and less change in volume than the other clusters. Sound logos in the service category with advertising effectiveness on purchase intention would therefore tend to have low volume. Clusters 1, 2, and 5 have high volume. Effective sound logos for product recognition in the daily necessities category would therefore tend to have high volume.
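The threshold rule can be checked directly against the published tables. The sketch below encodes the service and daily necessities columns of Tables 7 and 8 and applies the "more than 50%" criterion; the name `effective_clusters` is hypothetical, and `None` stands for clusters that contain no sound logos of the category.

```python
# Percentages copied from Tables 7 and 8 (Cluster1 .. Cluster5).
PRODUCT_RECOGNITION = {
    "Service": [67, 0, 75, 29, 100],
    "Daily necessities": [100, 100, None, 33, 100],
}
PURCHASE_INTENTION = {
    "Service": [33, 0, 50, 57, 50],
    "Daily necessities": [100, 100, None, 67, 100],
}

def effective_clusters(percentages, threshold=50):
    """Return the 1-based cluster numbers whose percentage exceeds the threshold."""
    return [
        number
        for number, value in enumerate(percentages, start=1)
        if value is not None and value > threshold
    ]

for category in PRODUCT_RECOGNITION:
    print(category,
          effective_clusters(PRODUCT_RECOGNITION[category]),
          effective_clusters(PURCHASE_INTENTION[category]))
```

Because the comparison is strict, a cluster at exactly 50% is not counted as effective, which matches the reading of the service column of Table 8 given in the text.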
The Analysis Conducted on Sound Pitch Features: Next, we reveal the sound pitch of sound logos with advertising effectiveness on product recognition and purchase intention for each product category. As with the volume features, hierarchical cluster analysis with Ward's method is used. The pitch data are the fundamental frequency (hereafter, f0 value) obtained with the Python library “librosa”. The f0 value is an audio feature that describes the pitch of a sound. A higher f0 value indicates a higher pitch, and a lower value indicates a lower pitch. When dividing the sound logos into clusters, we use the highest pitch, the lowest pitch, and the degree of pitch change as indicators. The maximum, minimum, and standard deviation of the f0 values of each sound logo are therefore used in the analysis. Sound logos for which sufficient f0 values cannot be obtained are excluded from this analysis.
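The three pitch indicators can be derived from an f0 track as sketched below. The sketch assumes the track marks unvoiced frames with NaN, as `librosa.pyin` does by default; whether the authors used that estimator is not stated. The function name `pitch_features`, the `min_voiced` cut-off used to exclude sound logos with too few f0 values, and the example track are all hypothetical.

```python
import math

def pitch_features(f0, min_voiced=2):
    """Maximum, minimum, and standard deviation of the voiced f0 values.

    Returns None when fewer than min_voiced frames have a usable f0 value,
    which corresponds to excluding the sound logo from the analysis.
    """
    voiced = [v for v in f0 if v is not None and not math.isnan(v)]
    if len(voiced) < min_voiced:
        return None
    mean = sum(voiced) / len(voiced)
    std = math.sqrt(sum((v - mean) ** 2 for v in voiced) / len(voiced))
    return {"max": max(voiced), "min": min(voiced), "std": std}

# Hypothetical f0 track in Hz; NaN marks frames with no detectable pitch.
nan = float("nan")
track = [nan, 220.0, 440.0, nan, 220.0, 440.0]
print(pitch_features(track))
print(pitch_features([nan, nan]))
```

The resulting three-value vectors, one per sound logo, are the kind of input that the cluster analysis would operate on.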
Fig. 8. Dendrogram of the Analysis on Sound Pitch Features.
Table 9. List of sound pitch features of each cluster.

| Cluster | Sound pitch features of each cluster |
|---|---|
| Cluster1 | Higher sound pitch and bigger pitch changes |
| Cluster2 | Lower sound pitch and smaller pitch changes |

The dendrogram is shown in Fig. 8. We divide it into two clusters. The list of sound pitch features of sound logos is shown in Table 9. From the list, it can be seen that the sound logos are divided into clusters according to whether the pitch is high or low and how much the pitch changes.
Table 10. Percentage of sound logos with advertising effectiveness for product recognition, by cluster and product category.

| Cluster | Food and drink | Service | Daily necessities |
|---|---|---|---|
| Cluster1 | 86% | 25% | 33% |
| Cluster2 | 72% | 25% | 33% |
Table 11. Percentage of sound logos with advertising effectiveness for purchase intention, by cluster and product category.

| Cluster | Food and drink | Service | Daily necessities |
|---|---|---|---|
| Cluster1 | 100% | 46% | 100% |
| Cluster2 | 100% | 73% | 100% |
Tables 10 and 11 show, for each product category, the percentage of sound logos in each cluster that are effective for product recognition and for purchase intention. Effective clusters were determined for each category in the same way as in the volume analysis. Accordingly, for food and drink, clusters 1 and 2 are effective for both product recognition and purchase intention. For services, no cluster is effective for product recognition and cluster 2 is effective for purchase intention. For daily necessities, both cluster 1 and cluster 2 are effective for purchase intention. Because only Cluster 2 is effective for purchase intention in the service category, a lower pitch with smaller pitch changes would be effective for purchase intention in service sound logos.
6 Proposing of the Design Policy of Sound Logos

Based on the results of the main analysis, we propose a design policy for sound logos with advertising effectiveness on consumer attitudes, namely product recognition and purchase intention, for each product category. The policy is shown in Table 12. For some product categories and consumer attitudes, we were unable to identify effective audio-visual features.
Table 12. Design policy of sound logos by product category and consumer attitude. Blank cells indicate that no effective feature was identified.

| Category | Consumer attitude | Visual features (sound logo used once in a TV commercial) | Visual features (sound logo used twice in a TV commercial) | Audio features |
|---|---|---|---|---|
| Food and drink | Product recognition | Item logos | Actor for the first, item logos for the second | |
| Food and drink | Purchase intention | Item logos | Actor for the first, item logos for the second | |
| Service | Product recognition | Actress | | Increase the volume |
| Service | Purchase intention | | No humans for the first, actress for the second | Lower the volume and pitch |
| Daily necessities | Product recognition | No humans | Company logos for the first and second | Increase the volume |
| Daily necessities | Purchase intention | No humans | Company logos for the first and second | |
7 Discussion

The main analysis and the proposed design policy of sound logos yield the following insights.

First, the correlation analysis shows no clear relationship between the number of sound logos used in a TV commercial and advertising effectiveness. The number of sound logos therefore does not appear to need particular attention when creating effective sound logos.

Second, the specific audio-visual features associated with effectiveness differ by product category. The proposed design policies, which are based on the results of the main analysis, differ for every category. The features of a sound logo therefore need to change according to the product category.

Third, the audio features associated with advertising effectiveness differ between product recognition and purchase intention even within the same category. In the service category, sound logos that are effective for product recognition tend to have high volume, but those that are effective for purchase intention do not. The audio features of sound logos in the daily necessities category also vary with consumer attitude. Based on this insight, we believe that creators need to change the features of a sound logo depending on whether they want to improve product recognition or purchase intention.

This study is a new attempt to analyze sound logos by dividing them into product categories and analyzing them separately for the consumer attitudes of product recognition and purchase intention. The results provide the new insight that the features of sound logos with advertising effectiveness differ across categories and consumer attitudes, which shows the value of this study.

The study can be developed further in the following ways. First, a way to obtain detailed visual data, such as the color of the image, is needed. Only coarse data, such as the presence of an actress, were used in the correspondence analysis. With more detailed data, we believe that more detailed visual features of sound logos with advertising effectiveness can be clarified. Second, in the cluster analysis of this study, a cluster was interpreted as effective if more than 50% of the sound logos it contained in a given category had advertising effectiveness. The validity of this threshold needs to be examined in the future.
8 Conclusion

In this paper, we proposed a design policy for sound logos that have advertising effectiveness on product recognition and purchase intention, by product category. Our analysis clarified the audio-visual features of sound logos with advertising effectiveness in terms of product recognition and purchase intention. We found that the features of effective sound logos differ for each category and for each consumer attitude. We believe that the results of this study will facilitate the production of sound logos that are effective in the appropriate situation.
References

1. 2021 “Advertising Expenditure in Japan” Commentary - Advertising Market Recovers Significantly. Internet advertising expenditure exceeds the total of the four-mass media for the first time. https://dentsu-ho.com/articles/8090. Accessed 30 Dec 2022
2. Why Audio Advertising is Better than Video Advertising. https://backtracks.fm/blog/whyaudio-advertising-is-better-than-video/. Accessed 30 Jan 2022
3. Matsuda, M., Kusumi, T., Yamada, T., Nishi, T.: Effects of repeated presentation of sound logos and melody familiarity on product evaluation. J. Cogn. Psychol. 4(1), 1–13 (2006)
4. Koike, K.: Effects of components of advertising audio on memory. Modeling advertising effectiveness in commercial songs (2016)
5. Assael, H.: Consumer behavior and marketing action. Kent Publishing Company, Boston (1987)
6. What Is DAGMAR? Model, Definition, Approach, Steps, Criticism, Assessment. https://www.geektonight.com/dagmar-approach/. Accessed 6 Feb 2022
7. Hoshino, M.: Statistical science of survey observational data. Causal inference, selection bias and data integration. Iwanami Shoten (2009)
8. Onuma, T., Yamagisi, N., Suzuki, M.: Analysis of review data on educational toys (2019)
9. Shizu, A., Matsuda, S.: Comparison of methods for automatic determination of the number of clusters in cluster analysis. Bulletin of Nanzan University “Academia”, Information Engineering, vol. 11, pp. 17–34 (2011)
10. The effects of audiovisual features of TV commercials on consumer attitude change. https://www.is.nri.co.jp/contest/2021/download/mac2021special.pdf. Accessed 7 Feb 2022
11. The absolute relationship between ’sound’ and brands. Sensory branding in the new era 2. http://media.style.co.jp/2015/10/3305/. Accessed 30 Jan 2022
12. Advertising effectiveness analysis of TV commercials from a sound perspective. https://www.is.nri.co.jp/contest/2019/download/mac2019saiyushu.pdf. Accessed 8 Feb 2022
Research on Visualization Method for Empathetic Design

Miho Suto1(B), Keiko Kasamatsu1, and Takeo Ainoya2,3

1 Tokyo Metropolitan University, Asahigaoka, Hino, Tokyo, Japan
2 Tokyo University of Technology, Nishi-Kamata, Ota, Tokyo, Japan
3 VDS Co., Ltd., Higashiyama, Meguro, Tokyo, Japan

Abstract. “Social adaptation” is conventionally used as a medical term, meaning that a person changes in order to adapt to society. We wondered whether it would be possible to design a society that adapts to people, rather than having people change to adapt to society. In other words, when the concept of social adaptation is applied to design, we believe that a method of “changing society and its mechanisms to suit people” is necessary. This is an important concept that will lead to the realization of a diverse society in which people can live comfortably as they are, without needing to change their individuality and characteristics. We consider this concept important in communicating participatory research to people, including non-participants. We therefore focus on how visualization, as a method for communicating participatory research, affects people’s cognitive processes up to the level of empathy. In this study, taking case studies as examples, we use visualization as a means of enabling non-participants to empathize with and share the contents of the participatory research we have conducted to date, including physical experiences. The purpose of this study is to analyze mental models of the visualized content and to clarify the cognitive process in order to devise a visualization method for empathy.

Keywords: Visualization · Participatory research · Empathetic design

1 Introduction

1.1 Background
The term “social adaptation” is traditionally used as a medical term, often in the sense that people change in order to adapt to society. We wondered whether we could design a society that adapts to people, instead of people changing to adapt to society. In other words, when the concept of social adaptation is applied to design, we believe that a method of “changing society and its mechanisms to suit people” is necessary. This is an important concept that will lead to the realization of a diverse society in which people can live comfortably as they are, without needing to change their individuality and characteristics. We believe that this concept is important in communicating participatory research to other people, including non-participants.

Participatory research hypothesizes and verifies the difficulties and hardships that minorities with various characteristics and circumstances, including disabilities, experience in their lives because of those characteristics and circumstances. It is often conducted together with other people concerned, to confirm that the problems are not merely assumed by the researcher. Matsumoto (2002) stated, “Participatory research by the parties concerned was considered very significant in terms of the possibility of conducting consideration from a warm, dialogical perspective with the parties concerned, the earnestness of the issues, and the researcher’s own exploration of self.” (Matsumoto 2002 p. 97) [1]. It is therefore important to convey to others the thoughts of the various people who are troubled by the problems identified, so that these thoughts can be reflected in society and change it.

Regarding social adaptation in participatory research, previous studies and cases include efforts to support the communication of participatory research. However, the methods of expression proposed in such support focused on the level of difficulty, that is, on whether non-designers who are not good at visualization could express themselves easily. There are not yet any examples that analyze and verify “communicability”, that is, how other people perceive and what they think about expressions made with these proposed methods.

Supported by Tokyo Metropolitan University.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 152–168, 2023. https://doi.org/10.1007/978-3-031-35132-7_11
Therefore, this study focuses on how the methods used to communicate participatory research affect people’s cognitive processes up to the level of empathy. We took case studies of participatory research as examples and used “visualization” as a means of enabling people, including non-participants, to empathize with and share physical experiences. In order to devise a visualization method for empathy, the purpose of this study was to analyze mental models related to content that has already been visualized and to clarify the cognitive process.

1.2 Research Flow

This section describes the flow of the study. The participatory research used as a case study is “A Study of Left-Handed Writing Characteristics.” In that study, a picture book was created to summarize the writing characteristics of left-handers, objectively observed using an eye mark camera, together with the experiences that these characteristics cause, in a way that non-participants could easily understand. We analyze the visuals in this picture book. First, we conducted a survey of visualization methods used in participatory research.
A list of visualization methods was created by selecting methods from the picture book used as the case study and from related research. These were then analyzed by placing them on a four-quadrant matrix diagram with “third person/first person” on the horizontal axis and “structural explanatory/experiential” on the vertical axis. Next, a questionnaire and interviews were conducted to determine which methods were most effective. In the questionnaire, respondents were asked to state, for each page spread, “what they thought was impressive” and “why they thought so.” Finally, they were asked to state “what they thought was impressive overall” and “what they felt was sympathetic or deepened their understanding of the characteristics of left-handed handwriting” across the whole book. Subsequent interviews explored each item in more depth. From the results of these two surveys, we identified effective visualization techniques and why they promote empathy, and created a cognitive process model of how readers arrived at empathy.
2 Related Works

This section reviews previous work on the visualization of participatory research and on supporting such visualization.

2.1 Visualization of Participatory Research
As mentioned in the previous section, several previous studies visualize what was learned from their validation results when communicating the content of participatory research to people for social adaptation. Ichikawa et al. (2015), in work by researchers who have autism spectrum disorder (ASD), used video and audio filters to reproduce and model the auditory and visual characteristics of people with ASD [2]. Kakei (2021) created a “world” of the experiences of dementia based on interviews and visualized it as the “travel sketches” and “travelogues” of travelers in that dementia world. These are presented in multiple media: a web page, a book, and a video [3]. Due-Christensen et al. (2021) used visualization to communicate the experiences and difficulties of people with type 1 diabetes (T1D) and to identify priorities for the care of adults with T1D. Based on this, they developed a communication tool to help support adults in adjusting to life with diabetes [4]. Noguchi (2022) focuses on the “words” uttered by astronaut Noguchi himself during his stay in space. As a researcher, he conducts text mining of his own diary and tweets to examine the psychological changes expressed in language and how they affect individual perspectives [5].

As described above, participatory research is visualized in a variety of formats. In this study, we created a list of visualization methods for participatory research, including the aforementioned cases, and analyzed them by placing them on a four-quadrant matrix diagram with “third person/first person” on the horizontal axis and “structural explanatory/experiential” on the vertical axis.

2.2 Supporting Visualization of Participatory Research
Prior examples include initiatives to support the visualization of research by the parties involved. Tomita et al. (2017) designed and conducted a workshop in which researchers in the sciences illustrated their institution’s research using isometric projection diagrams, in order to identify ways in which experts can help the researchers themselves, or the parties involved, to design in a sustainable manner [6]. Hirose et al. (2019) proposed a visualization method in a role-playing format called “Gokko Design” as a solution to the problems that arise when people with no design experience engage in design by the parties concerned, such as “reducing difficulties in creation,” “supporting free and equal expression,” and “supporting understanding of complex interrelated elements” [7]. Tomita et al. (2017) had designers provide a variety of design support to public officials so that the officials themselves could design efficiently and optimally in a sustainable manner. They then proposed a methodological framework for participatory design, summarized as six steps and a division of roles between the parties concerned and designers [8].

However, the methods of expression proposed in the aforementioned research focused on the level of difficulty, that is, on whether they were easy to use even for the non-designer researchers concerned. There is still no case in which the “ease of conveying”, that is, how other people perceive and what they think about expressions made with these proposed methods, has been analyzed and verified. Therefore, in this research, we focused on how the methods for communicating participatory research affect people’s cognitive processes up to the level of empathy, and decided to create a mental model that would serve as the basis for our proposal for support services.

2.3 Cases of Visualization of Participatory Research Dealt with in this Research
In this research, a picture book created by the author, Suto, in her previous research, entitled “An analysis of the writing Japanese characteristics of left-handed calligraphers -Use as infographics-”, was treated as a case study. The following is an overview of that research.

In Japan, there are more right-handed people than left-handed people. For this reason, most of the things in our daily lives are designed with right-handed people in mind. This has led to many inconveniences that left-handed people experience in their daily lives. Writing is a basic activity that is required from an early age and is performed many times in daily life, yet the inconvenience felt by left-handed calligraphers is much greater than that felt by right-handed calligraphers. In that study, we examined the inconvenience felt by left-handed calligraphers when writing Japanese. We conducted a meta-analysis of these inconveniences to find out which elements of calligraphy are responsible for them. The purpose of that study was to clarify the characteristics of left-handed calligraphers when writing Japanese and to summarize them in a form that can be easily understood by people other than left-handers. Finally, a picture book was created to communicate these results in an easy-to-understand manner to those not involved (see Fig. 1).
Fig. 1. A picture book treated as a visualization case study of participatory research, “A Book on Left-Handed Writing Characteristics.”
3 Investigation of Visualization Methods for Participatory Research
We selected visualization techniques from the picture book and from related works and made a list of them. We then placed them on a four-quadrant matrix diagram with “third person/first person” on the horizontal axis and “structural explanatory/experiential” on the vertical axis, and analyzed the trends in the visualization techniques used in participatory research.
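The four-quadrant classification can be pictured with a small sketch. Each technique is given a position on the two axes, and the quadrant follows from the signs of the two coordinates. The numeric scale, the function name `quadrant`, and the example placements are hypothetical and do not reproduce the placements made in this study.

```python
def quadrant(viewpoint, style):
    """Name the quadrant of a technique on the four-quadrant matrix.

    viewpoint: -1.0 (third person) to +1.0 (first person), horizontal axis.
    style: -1.0 (structural explanatory) to +1.0 (experiential), vertical axis.
    """
    horizontal = "first person" if viewpoint >= 0 else "third person"
    vertical = "experiential" if style >= 0 else "structural explanatory"
    return f"{horizontal} / {vertical}"

# Hypothetical placements for illustration only.
placements = {
    "Photographs from a first-person perspective": (0.9, 0.7),
    "Illustration of the situation from a third-person perspective": (-0.8, 0.2),
    "Graphing": (-0.5, -0.9),
}

for technique, (viewpoint, style) in placements.items():
    print(f"{technique}: {quadrant(viewpoint, style)}")
```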
3.1 Methods Found in Picture Book
First, visualization techniques were selected from the picture book “A Book about Left-Handed Writing Characteristics”. The visualization techniques used in each facing page of the picture book were extracted by writing them down using sticky notes (see Fig. 2).
Fig. 2. Visualization techniques used in each spread of the picture book were written out using sticky notes.
As a result, the following 13 methods were identified.

– Photographs (video screenshots) taken with a vision recorder, including the area where the eye is looking
– Photographs that recreate, from a first-person perspective, the situation experienced by the person involved
– Illustrations in which body parts, such as a person’s hands or feet, are seen from a first-person perspective
– Encouraging the experience of touch
– Illustration of how to move the hand
– Illustration of how to create a place
– Dialogue form, conveying a message as a spoken line
– Human expression
– Illustration of the situation from a third-person perspective
– Changing the shape according to the corresponding number
– Cartoon marks
– Symbols such as “!” and “?”
– Drawing as many of an item as the quantity to be shown

3.2 Methods Found in Related Works and so on
Next, we selected visualization methods from visualizations used in related studies and visualization methods the author, Suto, has used in the past that we judged to be suitable for representing the results of the participatory study. As a result, the following 25 methods were used.
Third-Person Viewpoint Illustration (Reference case: [4])
– From the point of view of others
  How it looks to others
– Illustrate the story
– Illustrations with exaggerated expressions to make some parts stand out
  Illustrations with abbreviations to make some parts stand out
– Thinking is represented by a speech balloon and an illustration inside the balloon
– Metaphorical expressions
  Illustration as it is
Visual Storytelling by Replacing the Senses to Make Them Easier to Understand (Reference case: [3,7])
– Replacing life spent in the sense of being a party to a trip and travel sketches and travel journalization
– Role-playing (Gokko)
Graphic Modeling to Show Structure and Reflection of Learning and Understanding (Reference case: [9])
– Showing the structure in three dimensions
– Express your thoughts and ideas using speech balloons and characters
– The order of the process is indicated by arrows
– Illustrated in the moment as a reproduction of the moment
– Indicate order/priority
Third-Person Photography
– Photographs of the parties in action
– A picture of a certain situation involving the parties involved and the people around them
– Acting out reactions
  Photographs that make it easy to understand
First-Person Photography (Reference case: [2])
– First-person photos with additional computer graphics
  Reproduce the way you see
  • Apply filters (blur, color change, etc.)
  • Insert lines and guides
Visualizing Emotions (Reference case: [10])
– Emoji
– Graphing emotional waves
Visualize Quantities
– Change the color by the corresponding number
– Graphing
Character
– Reproduce the reactions and actions of those around the parties involved
– Have the character imitate the behavior of the party
Visualization of Areas of Interest
– Mark (circle, etc.) the areas that need attention
– Change the color of the area you want to draw attention to
– Change the shape of the area you want to draw attention to
3.3 Analysis of Trends Using a 4-Quadrant Matrix Diagram
These 38 visualization methods were applied to a four-quadrant matrix diagram with “third person/first person” on the horizontal axis and “structural explanatory/experiential” on the vertical axis to analyze trends in visualization methods used in participatory research (see Fig. 3).
Fig. 3. 4-quadrant matrix diagram of visualization techniques for participatory research.
Figure 3 shows that the “first-person experiential” and “third-person structural explanatory” methods tend to be used more frequently in the visualization of participatory research.
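The tallying behind such a matrix can be sketched in a few lines of Python. This is only an illustration: the six methods are taken from the lists in Sects. 3.1 and 3.2, but their quadrant assignments here are assumptions made for the example and are not the placements shown in Fig. 3; the names METHODS and tally_quadrants are hypothetical.

```python
from collections import Counter

# Each entry: (method, perspective, mode).
# The quadrant assignments are illustrative assumptions,
# NOT the placements shown in Fig. 3.
METHODS = [
    ("First-person photograph recreating the situation", "first person", "experiential"),
    ("Encouraging the experience of touch", "first person", "experiential"),
    ("Illustrate the situation from a third-person perspective", "third person", "structural explanatory"),
    ("Showing the structure in three dimensions", "third person", "structural explanatory"),
    ("Graphing emotional waves", "first person", "structural explanatory"),
    ("Role-playing (Gokko)", "third person", "experiential"),
]


def tally_quadrants(methods):
    """Count how many methods fall into each (perspective, mode) quadrant."""
    return Counter((perspective, mode) for _, perspective, mode in methods)


counts = tally_quadrants(METHODS)
for (perspective, mode), n in sorted(counts.items()):
    print(f"{perspective} / {mode}: {n}")
```

With the full list of 38 methods and the authors' own placements, the same count per quadrant would show which combinations of viewpoint and explanatory style dominate.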
4 Questionnaire and Interview Survey
Based on the visualization methods extracted in the previous section, we conducted a questionnaire and interview survey to clarify which methods were the most effective.
In the questionnaire, the respondents were asked to answer “what they thought was impressive” and “why they thought so” for each facing page. Finally, they were asked to answer “what they thought was impressive throughout” and “what they felt was sympathetic or deepened their understanding of left-handed handwriting characteristics throughout”. On the form used for the questionnaire, one page for each spread was placed at the top, and the answer columns for “the part you thought was impressive” and “the reason why you thought so” were placed at the bottom. On the last page, we placed the answer columns for “pages that you thought were impressive throughout” and “pages that you sympathized with or felt deepened your understanding of the characteristics of left-handed handwriting throughout”. Subsequent interviews were conducted to explore each of these items in more depth, and the exchanges were recorded. Three right-handed women, A–C, cooperated with our survey.
4.1 Questionnaire Results
Figures 4, 5, and 6 summarize the responses of the three participants to each of the spreads (1)–(12) in terms of “what they thought was impressive” and “why they thought so.” While answering the questionnaire, Participant A asked, “I didn’t know how to answer the question at first: does the word ‘impressive’ mean impressive in terms of content, clarity of appearance, or being eye-catching (e.g., standing out because it is red)?” The experimenter therefore explained, “The part of the illustration that caught your attention, as well as the text, counts as one part of the visual. Please list the parts of the visuals that you thought were impressive, and then answer how they were impressive in the explanation of why.” Participant A was satisfied with this explanation. The same explanation was given verbally to Participants B and C, who answered the questionnaire afterward, so that all participants answered under the same conditions.
Fig. 4. The figure below summarizes the responses of participants A, B, and C to each of the facing pages (1)–(4) for Q1: “The part you thought was impressive” and Q2: “The reason why you thought so”.
Fig. 5. The figure below summarizes the responses of participants A, B, and C to each of the facing pages (5)–(8) for Q1: “The part you thought was impressive” and Q2: “The reason why you thought so”.
Fig. 6. The figure below summarizes the responses of participants A, B, and C to each of the facing pages (9)–(12) for Q1: “The part you thought was impressive” and Q2: “The reason why you thought so”.
The results of the responses to the questions “Pages that I found impressive throughout” and “Pages that I sympathized with/gained a better understanding of left-handed writing characteristics throughout” are as follows.
Participant A
– Pages that I found impressive throughout: (4)
– Pages that I sympathized with/gained a better understanding of left-handed writing characteristics throughout: (4), (8), (6)
Participant B
– Pages that I found impressive throughout: (5)
– Pages that I sympathized with/gained a better understanding of left-handed writing characteristics throughout: (6)
Participant C
– Pages that I found impressive throughout: (1), (4), (5), (8)
– Pages that I sympathized with/gained a better understanding of left-handed writing characteristics throughout: (4), (5), (6), (7), (8), (10)
4.2 Interview Results
Below are the reasons, given in the interviews, for the selection of “Pages that I found impressive throughout” and of “Pages that I sympathized with/gained a better understanding of left-handed writing characteristics throughout”.
First, we discuss the results of the interviews regarding the reasons for the “pages that I thought were impressive throughout”. Participant A cited (4) because “I could see for myself that ‘I can’t read!’, and I got to experience the horizontal writing of a left-handed person.” Participant B cited (5) because “it was a transformational page and I could experience the page. There were two such pages, (4) and (5), but I felt that the latter was easier to read and more impressive than the former.” Participant C listed (1), (4), (5), and (8). The reasons were as follows. (1): It was easy to understand that the number of people was visualized. (4): It was easy to understand how a left-handed person looks when writing horizontally because of the visual representation. Also, because the shape of the paper on the page is different. (5): Compared to (4), the text is easier to read. (8): Because I felt it was “difficult” when I actually traced it.
Second, we discuss the results of the interviews regarding the reasons for the “pages that I felt I could relate to throughout and that deepened my understanding of left-handed writing characteristics”. Participant A listed (4), (6), and (8). The reason was that “in these three pages, I had a simulated experience by tracing and imitating the page by myself.” Participant B cited (6); the reason was that “it was the most inconvenient. It was not a theory, but I could understand the difficulty of writing visually at a glance.”
Participant C listed (4), (5), (6), (7), (8), and (10). The reasons were as follows. (4): It was easy to visually understand the situation because the paper cut out in the shape of a hand looked like their own hand. By holding only the page of the hand and moving it as if tracing it, the participants could simulate the vision of a left-handed person writing horizontally. (5): Same as (4). By comparing with (4), it was easy to understand the difference between horizontal and vertical writing. (6): Because it was easy to see the hand covering the letter because it was translucent and the illustration of the hand was easy to imitate. (7): Because I am also a left-handed person and I felt a sense of affinity with left-handed people. (8): Because it was easy to understand the contrast with the mirrored letters. (10): Because the illustrations were slightly edited versions of the same thing, so it was easy to understand the differences in the arrangement of the objects on the desk.
Third, we discuss the results of the interviews on the pages where the participants experienced simulated physical movements such as mimicking and tracing. In the course of a deeper investigation of the responses to each of the survey items, it became clear that the participants were engaged in acts of imitation/tracing. The corresponding pages were (4), (5), (6), (8), (10), and (11). Imitations included “tracing with a finger,” “holding and moving part of the page,” “tracing with the pen that was being used at the time of the response,” and “imitating the way the pen was being held at the time of the response with the illustration in the picture book.”
Fourth, we report some comments received regarding the relationship between the third-person perspective and the “left-right” motif. When asked to freely express their concerns during the interview, Participant A and Participant C commented, “From the third-person perspective, it is difficult to determine at a glance whether you want to express left-handedness or right-handedness.”
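The page selections reported in Sect. 4.1 and the imitation/tracing pages reported above can be cross-tabulated with a short script. The following Python sketch is only an illustration; the page numbers are copied from the text, while the variable and function names are hypothetical.

```python
from collections import Counter

# Pages each participant selected for "sympathized with / gained a better
# understanding of left-handed writing characteristics" (Sect. 4.1).
EMPATHY_PAGES = {
    "A": [4, 8, 6],
    "B": [6],
    "C": [4, 5, 6, 7, 8, 10],
}

# Pages on which imitation or tracing was reported in the interviews.
IMITATION_PAGES = {4, 5, 6, 8, 10, 11}


def tally(selections):
    """Count how many participants selected each page."""
    return Counter(page for pages in selections.values() for page in pages)


empathy_counts = tally(EMPATHY_PAGES)

# Rank pages by number of selections, breaking ties by page number.
ranked = sorted(empathy_counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Pages that were both selected for empathy and imitated or traced.
overlap = sorted(set(empathy_counts) & IMITATION_PAGES)

print(ranked)
print(overlap)
```

Under this tally, spread (6) is the only one selected by all three participants, and five of the six selected spreads, (4), (5), (6), (8), and (10), are also spreads on which imitation or tracing was observed.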
5 Discussion
In this section, we discuss the visualization methods for participatory research based on the above survey results. From the four-quadrant matrix diagram created in Sect. 3.3, it was found that the “first-person experiential method” and the “third-person structural explanatory method” tend to be used more frequently in the visualization of participatory research. We believe this is because the former is “intuitively comprehensible information” and the latter is “logically comprehensible information,” and the two can be easily used together. The survey/interview participants also commented that this “first-person experiential method” was “easily memorable” in their evaluation of the picture book handled in the case study. We believe that this is because even non-participants can understand the feelings and problems of the participants as if they were the subjects of the research.
Furthermore, it was found in the interviews that the respondents were “imitating and tracing”. We believe that the reason for this behavior may be that, by moving their own hands, the respondents physically simulated the experience of the people concerned, so that even non-participants could experience what those people felt at that time. Therefore, we think that the “first-person experiential method” is effective in visualizing participatory research to convey it to non-participants, and that by devising ways to encourage physical simulated experiences, everyone can easily recognize and empathize with the content as if it were their own.
6 Future Work
Finally, we will discuss how to develop these results. First, based on the survey results, we will improve the visualization to more effectively communicate the “characteristics of left-handed calligraphers.” Here, the target users will be elementary school students learning Japanese calligraphy and their teachers, and improvements will be made based on usage scenarios of how Japanese calligraphy classes are conducted. Next, we would like to visualize other examples of participatory research we have conducted with reference to the mental model we have considered. Ultimately, we would like to propose a support tool for non-designers and other participatory researchers who are not good at visualization. This proposal is based on the experimental results and focuses on two points: what kind of visualization is needed to improve the ease of communication, and what kind of mechanism should be created to reduce the difficulty of visualization. By proposing this support tool, we would like to contribute to the creation of a co-creation mechanism for thinking, together with various people, about the problems and issues identified in participatory research.
References
1. Matsumoto, M.: Research Article- Significance of participatory research by parties. In: 5th Exploration of Teaching Methods, pp. 93–98. Graduate School of Education, Kyoto University, Department of Educational Methodology, Japan (2002)
2. Ichikawa, I., Nagai, Y., Kanazawa, H., Yonekura, S., Kuniyoshi, Y.: Modeling sensory characteristics through reproduction of subjective experience for auditory characteristics of people with autistic spectrum disorder. In: 34th Annual Conference of the Japanese Society for Artificial Intelligence, pp. 4Rin192-4Rin192. The Japanese Society for Artificial Intelligence, Japan (2015)
3. Kakei, Y.: How to Walk in the World of Dementia. Writes Publishing Inc., Tokyo (2021)
4. Mette Due-Christensen, J., et al.: A co-design study to develop supportive interventions to improve psychological and social adaptation among adults with new-onset type 1 diabetes in Denmark and the UK. In: 11th BMJ open (2021)
5. Noguchi, S.: Localization in Microgravity Space: Astronauts’ Participatory Study. Ph.D. Thesis, The University of Tokyo, Japan (2022)
6. Tomita, M., Ariga, K., Tanaka, K., Takayanagi, W., Kudo, T., Ueda, I.: The study of “design it ourselves” for researchers’ research contents visualization – a case study of isometric drawing workshop. In: 64th Spring Conference of the Japan Society for the Study of Design (p. 372). Japan Institute of Design, Japan (2017)
7. Hirose, K., Okamoto, M.: Proposal and consideration of “Gokko design” to support the participation of the parties in the design -a case of co-creation of a new music experience. In: Spring Conference of Japan Society for the Study of Design (p. 34). The Japan Society for Design Studies, Japan (2019)
8. Tomita, M., Koshio, A.: The study of “design it ourselves” for government workers’ design -policy overview diagram design Project with Cabinet Bureau of personnel affairs-. In: The 64th Spring Conference of Japan Society for the Study of Design (p. 20). The Japan Society for Design Studies, Japan (2017)
9. Misawa, N.: Graphic modeling to co-creation knowledge -Consideration of visual communication in collaborative process-. In: 66th Spring Research and Presentation Conference of the Japan Society for the Study of Design (p. 12). Japan Society for the Study of Design, Japan (2019)
10. Awai, K., Naitou, N.: Study of change of feelings and beliefs of sterile women through their narratives. In: 13th Nursing journal of Kagawa University, pp. 55–65. Kagawa University School of Medicine, Department of Nursing, Japan (2009)
A Study on HCI of a Collaborated Nurture Game for Sleep Education with Child and Parent
Madoka Takahara1(B) and Shun Hattori2
1 Ryukoku University, 1-5 Yokotani, Seta Oe-cho, Otsu 5202194, Shiga, Japan
[email protected]
2 The University of Shiga Prefecture, 2500 Hassaka-cho, Hikone-City 5228533, Shiga, Japan
[email protected]
Abstract. Japan is now said to be a “sleep debt nation,” and this has long been a social problem. In recent years, sleep disorders in children have also been on the rise. To address these problems, it is important to provide sleep education not only to adults but also to children, and to enable children to enjoy managing their own sleep. This paper proposes a game in which parents and children cooperate to raise cats by sleeping well and using the food that is given as a result.
Keywords: Sleep Education · Collaborated Nurture Game · Sleep Trouble · Sleep Debt · Parent and Child
1 Introduction
Japan is now said to be a “sleep debt nation” [1–3] or a “sleep-deprived nation” [4] and has long-unresolved social problems related to sleep, with major economic losses being pointed out. Furthermore, in recent years, sleep disorders in children have also been gradually increasing and are considered a serious problem [5–7]. To solve these problems, it is important to provide sleep education not only to adults (parents) and university students, but also to children, and to enable children themselves to enjoy managing their sleep [8–11]. Tamura et al. [9] studied the effects on sleep and daytime conditions of sleep education in a classroom format followed by one month of practicing target behaviors, and suggested that sleep classes are effective in ensuring sleep and in improving irritability and daytime sleepiness in early and middle elementary school students. In addition, Furuya et al. [11] reported that, after a single sleep education lecture, children in non-attending schools (where the lecture was given to parents only) had more regular waking times, whereas children in schools where both children and parents attended showed more regular waking and sleeping times and a higher percentage of correct answers on sleep knowledge. These results confirm that the students of schools that attended the lecture were more likely than the students of non-attending schools to improve the regularity of their waking and bedtimes and to acquire correct knowledge.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 169–181, 2023. https://doi.org/10.1007/978-3-031-35132-7_12
However, sleep education for children is still in its infancy. According to a survey of the literature on “sleep education” by Ohso [12], “If sleep education programs in Japan are mainly aimed at improving sleep knowledge, large cross-sectional studies are considered sufficient and have already been reported to be successful. On the other hand, there is a lack of empirical evidence showing that certain methods are superior to others in changing sleep behavior, and further study is needed regarding the number and duration of effective interventions.” and “ICT technology may automatically integrate both visual and textual information, which may lead to effective learning in children who are proficient in both visual and textual learning.” It is pointed out that in sleep education for children, whether in the form of a class or a one-shot lecture, the learning is passive, and it is not yet certain whether the children’s own sleep behavior will change or continue afterwards. To continuously improve the sleep behavior of children as well as adults (parents), a proactive learning system utilizing ICT based on a theory that promotes behavioral change, rather than conventional passive learning, is considered necessary.
Therefore, this paper proposes a game (web application) in which parents and children cooperate to raise a cat by sleeping well and using food given as a result of daily sleep data input, so that both parents and children learn sleep education proactively and continue to change their sleep behavior.
Thereafter, Sect. 2 discusses the requirements of the proposed system of sleep education for parents and children. Existing sleep (educational) games are also surveyed and compared with the proposed game. Then, Sect. 3 describes the design of a cooperative training game of sleep education for parents and children, “Neko × Neko: Children who sleep with a cat grow up well,” and shows some demonstration screens. Finally, Sect. 4 summarizes the paper. Although the system is currently developed only at the demonstration level, it will be developed into a prototype system, and demonstration tests will be conducted with parent-child subjects at kindergartens and other locations. These tests will examine whether the sleep deviation [13], the effects of sleep education, and the persistence of changes in sleep behavior differ significantly between (i) single-player play, in which a parent or a child plays alone, and cooperative play, in which parents and children play together; (ii) play that basically takes place only at bedtime or upon waking and play that is always on-demand, including at these times; and (iii) players’ preferences and interests, such as dogs or flowers as well as cats as the characters to be trained.
2 Consideration of Requirements for the Proposed System
This section examines the requirements of the proposed game after surveying existing sleep (education) games and research, focusing on how to realize “initiative” and “continuity” so that both parents and children proactively learn sleep education and continue to change their sleep behavior. It focuses on the following three points for realizing “initiative” and “continuity” in sleep (educational) games.
• On-demand at all times, not only at bedtime or waking triggers
• Cooperative play, not just single player
• End content, not just (multi-)endings
2.1 System Design Policy
The purpose of this study is to encourage parents and children to learn sleep education together proactively and continue to change their sleep behavior. To realize this objective, a system called “Neko × Neko: Children who sleep with cats grow up well”, a cooperative training game for sleep education for parents and children, is proposed and designed in Sect. 3.
• Why a game about sleep (education)? Because it is important for parents and children, especially children themselves, to be able to manage their sleep in a “fun” way, this research utilizes “gamification”. Rather than receiving a sleep “education (class)” from parents, children themselves start the sleep (education) game on their own initiative on a continuous basis, input sleep data, understand it positively, learn how to sleep better, and improve their sleep.
• Why nurturing games? There are a wide variety of game genres, but in order for parents and children to be able to manage their sleep as “happily” as possible, a genre that eliminates elements that may cause negative emotions and can be steered toward positive feedback is better [14]. In addition, children (potentially) grow up well through good sleep, which can naturally be reflected in the characters (cats) in the game and be easily understood as positive feedback, so we adopt “nurturing games” as the genre. The genre of “RPG (Role Playing Game)” can also be linked to children’s growth, but it requires an understanding of various roles, is too complicated, and is not suitable for small children. Since universal design that includes small children as target players is also important, the simplicity (and fun) of the game is also important.
• Why a cat training game? Although “sheep” characters are often used in sleeping games, one theory of the origin of “neko,” the Japanese word for “cat,” is “a child who sleeps well,” so this game character is appropriate for the purpose of this study. Also, in rankings of animals that people would like to keep as pets, dogs and cats are so universally popular, at least in Japan, that they always compete for first place. According to the National Dog and Cat Breeding Survey [15] conducted annually by the Pet Food Association of Japan, dog ownership continues to decline, while cat ownership overtook it in 2017 and has been on a gradual upward trend since then. On the other hand, abandoned cats in particular and the killing of cats have become a social problem [16], which this research decided to incorporate into the storyline of the proposed game.
2.2 Policy System Design Policy
While there are quite a few existing studies [17, 18] on “gaming disorder and sleep,” there are few existing studies [19–21] on “sleep (educational) games.” Sudo [18] found that “the impact of game time on sleep duration and grades is slight but not large for elementary and middle school students, and has no clear effect on high school students.” Also, Wander [20] is an audio game that encourages reduced
smartphone use and provides breathing exercises to improve sleep quality near bedtime based on gamification theory. Pro Sleeper [21], on the other hand, is a mobile audio game that can be played with eyes closed, utilizing meditation and autonomous sensory meridian response (ASMR) to improve players’ sleep quality. However, unlike the proposed sleep education game, it is not intended for parents and children and has only single-player play.
The iOS/Android application “Sleep × Game” [22] is a game in which “when the player goes to bed, the sheep go on a journey to the ‘end of the night’ and the gentle sleep music of nature, such as rain and waves, helps the player fall asleep.” The game also claims that “as players enjoy the game, they will continue to develop good sleeping habits.” Daily sleeping hours are automatically recorded, and various sheep (corresponding to cat food in our sleep education game) are awarded accordingly to grow the flock. Furthermore, players can also receive sheep for meeting unspecified other players and can encourage other players to improve their sleep by pressing the “Good Sleep Button,” thereby encouraging everyone to continue the game. The game has many similarities to our proposed sleep education game, but ours differs in that it is a more closely cooperative nurturing game that focuses on the proactive sleep education of specific parents and their children.
In addition, it is beginning to be said that “Pokémon Sleep,” which was announced by The Pokémon Company at the Pokémon Business Strategy Presentation in May 2019 and was targeted for distribution and release in 2020, may soon be on the way. Pokémon Sleep is a smartphone application that is linked to Pokémon GO Plus+, a palm-sized device with a built-in accelerometer that can measure sleep time when placed under the user’s pillow during sleep, and has the concept of “making getting up in the morning a pleasure” and “making sleep and waking hours an entertainment experience.” However, there was no further news in Pokémon Presents [23] on August 3, 2022. As soon as more details become available, it will be necessary to compare the similarities and differences with our proposed sleep education game.
Several other games have also been proposed that are based not on the player’s sleep but on another of the player’s own actions: walking (number of steps). Among them is the wearable “Tamagotchi Smart,” an evolution of the “Tamagotchi” series of keychain-type electronic games from Bandai, in which the number of steps taken by the player is counted and the reactions of the “Tamagotchi” creature to be trained change. Similar wearable electronic games with pedometer functions include Hudson’s “Teku Teku Angel,” Nintendo’s “Pocket Pikachu,” and the Pokémon pedometer “Pokéwalker,” but there are no cooperative games to be found. Location-based “RPGs” rather than simple “training games” include “Pokémon GO” and “Dragon Quest Walk”.
2.3 Sample On-Demand at Bedtime or upon Waking vs. Always On-Demand
Existing sleep (educational) games, including the application “Sleep × Game” [22], are often simply triggered at bedtime or upon waking up to play the game. Although it is not necessarily a bad thing in the beginning to make it a routine task to keep track of and check sleep data every day, it may become a mere task and easily lead to boredom as a game. The application “Sleep × Game” [22] allows users to set their daily bedtime and
wake-up time, but if they receive a notification just before these times, they may feel forced to sleep regularly. This is not an improvement in proactive sleep behavior in the least. Therefore, we believe that the sleep education game this research proposes should not only be triggered at sleep-related times such as bedtime or waking time, but should also have a mechanism that allows players to play the game at any time they like and enjoy it in some way. For example, the following functions are being considered.
• Players can feed the cat at any time they like. (The cat’s food is newly given when sleep data is input after waking up, but feeding the cat is on-demand.)
• Players can communicate with the cat at any time (click (touch) on the cat and it will purr or respond to the player’s interaction).
2.4 Single Play vs. Cooperative Play
Existing sleep (educational) games are often just single-player games, with no awareness of other players. In our survey, only the application “Sleep × Game” [22] has some kind of interaction with other players, such as pressing the “Good Sleep Button” to encourage other players to improve their sleep. However, it is unclear whether this will lead to initiative and continuity, since the other players are unspecified and, even if players push the button in a casual, workmanlike manner, they do not feel that they are playing together as friends to improve sleep. Therefore, we believe that the sleep education game this paper proposes requires a mechanism for closer cooperative play between parents and children, who are mutually identified as other players. It is hoped that this will create a sense of working together as friends to improve sleep, which will lead to positive encouragement and continuity with each other. For example, the following functions are being considered.
• The system can tell whether or not the paired players have already entered their sleep data.
• When both players input sleep data, additional cat food will be given, and if both sleep well, an additional bonus will be given.
2.5 (Multi)endings vs. End Content
As a cat training game based on the player’s daily sleep data input, a story and an initial easy-to-understand goal are necessary; therefore, a single ending reached when a weakened cat becomes healthy and multiple endings depending on differences in the process of becoming healthy are prepared. However, there is a concern that this may cause problems with continuity afterwards. Adding endless content that will not bore the player is being considered, such as making the character’s dialogue responses smarter in response to further feedings.
3 Design of the Proposed Game
3.1 Overview (Game Flow)
The general flow of the proposed game is as follows.
Step 0: The system administrator gives each player (parent-child pair) an ID/PW as login information.
Step 1: Each player logs in for the first time. The story of protecting a weak cat, fostering it, and helping it recover, as well as the sleeping (play) method shown in Fig. 1, is explained to the players.
Step 2: Each day, each player enters his/her own sleep data, based on which food for the cat is given. The parent and child work together to feed the cat at any time during the day, with the goal of the cat’s full recovery. The special food is not given unless the paired player has also input sleep data, so the players must communicate with each other to encourage sleep data input. The history of each player’s own sleep data can be viewed in the application, but the sleep data of the paired player cannot, so if necessary, the players communicate with each other about sleep and ask each other to show the data. Even without feeding, clicking (touching) the cat elicits a reaction from the cat, and the cat also takes action without the player having to do anything. In the demo version, this action is random.
Step 3: Once the cat is fully recovered, the ending is reached. As many action logs of the players (parent-child pairs) on the application as possible are collected, and the ending changes depending on the differences in the process.
Step 4: Even after the ending, the sleep management of each player (parent-child pair) continues endlessly, and endless content will be added to keep the players (parent-child pairs) engaged, such as making the cat AI’s interactive responses smarter in response to further feedings. On the other hand, if the players cannot continue to get a good night’s sleep, the cat will become weak again.
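The daily reward rule described in Step 2 and in Sect. 2.4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper does not give food amounts or define “sleeping well,” so the quantities, the slept_well flag, and the names DailyEntry and daily_food are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DailyEntry:
    entered: bool      # did the player input sleep data today?
    slept_well: bool   # hypothetical flag; "sleeping well" is not defined in the paper


def daily_food(own: DailyEntry, partner: DailyEntry) -> dict:
    """Return the food one player receives today (all amounts are assumptions)."""
    food = {"normal": 0, "special": 0}
    if not own.entered:
        return food                      # no data input, no food
    food["normal"] += 1                  # base reward for entering own data
    if partner.entered:
        food["special"] += 1             # special food only if the pair also entered data
        if own.slept_well and partner.slept_well:
            food["special"] += 1         # additional bonus when both slept well
    return food


print(daily_food(DailyEntry(True, True), DailyEntry(True, True)))
print(daily_food(DailyEntry(True, True), DailyEntry(False, False)))
```

Making the special food depend on the partner's input is what gives each player a reason to remind the other, which is the cooperative mechanism the design relies on.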
Fig. 1. How to sleep (play) "Cat × Cat."
[Fig. 2 content: class diagram and instances]
class Player: id : String, pw : String, name : String, foodNL : int; login(), sleepDiaryInput(), sleepDiaryBrowse(), feedNL(cat : Cat), touch(cat : Cat). One Player (1) is associated with many (*) SleepDiary entries.
class SleepDiary: timeBedIn, timeBedOut, timeSleepIn, timeSleepOut, timeNapIn, timeNapOut : double; numWakeup, nightmare, sleepy, appetite : int.
class Child: foodSP : int; feedSP(cat : Cat).
class Parent.
class Cat: name : String, bodyStrength : int, positionAtHome : int, imagePNG : String[], callMP3 : String[], spokenLines : String[]; recover(bs : int), move(), call(), speak().
class SystemManager: id : String, pw : String; login(), registerChildParent(), backupAll(), resetAll().
Instances: SystemManager (id = "m4d0k4", pw = "**********"); Child (id = "child01", pw = "**********", foodNL = 5, foodSP = 1); Cat (bodyStrength = 11, positionAtHome = 0); Parent (id = "parent01", pw = "**********", foodNL = 2).
Fig. 2. Class diagram, instances and demo screens of “Cat × Cat”, a cooperative training game for sleep education for parents and children.
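As an illustration only, the player and cat classes in Fig. 2 could be sketched in Python as follows. This is a minimal sketch under several assumptions that the paper does not state: Child and Parent are modelled as subclasses of Player (suggested by the instance attributes in Fig. 2), passwords are omitted, the snake_case names are ours, and the +1 and +3 strength gains follow the feeding rules described in Sect. 3.2.

```python
from dataclasses import dataclass


@dataclass
class Cat:
    name: str
    body_strength: int = 0  # internal score ("bodyStrength" in Fig. 2)

    def recover(self, bs: int) -> None:
        """Raise the cat's strength by bs points."""
        self.body_strength += bs


@dataclass
class Player:
    id: str
    name: str
    food_nl: int = 0  # stock of normal food ("foodNL" in Fig. 2)

    def feed_nl(self, cat: Cat) -> bool:
        """Feed one normal food (+1); return False when none is in stock."""
        if self.food_nl <= 0:
            return False
        self.food_nl -= 1
        cat.recover(1)
        return True


@dataclass
class Parent(Player):
    """A parent holds and feeds normal food only."""


@dataclass
class Child(Player):
    food_sp: int = 0  # stock of special food ("foodSP" in Fig. 2)

    def feed_sp(self, cat: Cat) -> bool:
        """Feed one special food (+3); return False when none is in stock."""
        if self.food_sp <= 0:
            return False
        self.food_sp -= 1
        cat.recover(3)
        return True


# Instance values taken from Fig. 2; the names are placeholders.
cat = Cat(name="cat", body_strength=11)
child = Child(id="child01", name="child", food_nl=5, food_sp=1)
parent = Parent(id="parent01", name="parent", food_nl=2)
child.feed_nl(cat)   # 11 -> 12
child.feed_sp(cat)   # 12 -> 15
parent.feed_nl(cat)  # 15 -> 16
```

Keeping feed_sp on Child alone mirrors the rule that only the child can feed the special food, even though it is granted to the pair.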
3.2 Data Management The actors in the proposed game are the players (parent-child pairs) and the system administrator, and one of the entities is the cat character that the players (parent-child pairs) share and cooperatively develop. As shown in Fig. 2, these four are the basic classes. Information Security Since daily sleep data is extremely sensitive personal information, and since it is planned to incorporate not only sleep data but also daytime activity data to evaluate sleep quality in the future, information security must be strictly enforced. However, the demo version in
this paper uses only simple basic authentication with an ID/PW, which is still problematic and needs to be improved.
Cat Feeding Based on Sleep Data Input
First, as shown in Fig. 3, players (parents and children) input their daily sleep data. Fitbit Sense and other sleep sensors could be used to facilitate sleep data input, but manual input was chosen in order to make the players more aware of their own sleep data. However, more convenient interfaces will need to be explored in the future. In the demo version in this paper, the following questions are asked, although the selection of questions will need to be reconsidered in the future.
Q1. Time of going to bed and time of waking up (required).
Q2. Time of falling asleep and time of awakening (required).
Q3. Nap start time and nap end time (optional).
Q4. How many times did you wake up during the night? (required)
Q5. Did you have scary dreams? (required)
Q6. When you woke up, did you still want to sleep? (required)
Q7. When you woke up, did you have an appetite? (required)
Fig. 3. Daily sleep data entry (bedtime and dreaming).
Next, based on the player's daily sleep data input, cat food is granted as shown in Fig. 4. However, the demo version in this paper uses only the following times.
\[ \text{Sleep duration [h]} = \text{Wake time} - \text{Sleep onset time} \]
In the future, estimating not only the quantity (duration) of sleep but also its quality from the other questionnaire items, and taking this into account when granting cat food, will be considered. First, the "normal food" is calculated independently from the daily sleep data input of the parent and of the child, is granted to each independently, and can be fed to the cat by each independently. The "special food" is granted to the parent-child pair once both the parent and the child have entered their daily sleep data, and only the child can feed it to the cat. The number of normal feeds is calculated by the following equation.
\[ \text{Number of normal feeds [pcs]} = \left\lfloor \max(\text{Sleep time} - \mathrm{MIN},\ 0) \,/\, \mathrm{UNIT} \right\rfloor \]
Fig. 4. Cat food granted to the child based on sleeping hours.
Here, UNIT is a unit of time common to parent and child and is 1.5 [h] in the demo version in this paper; changing it to 1.0 [h], for example, would make food easier to earn. MIN is a parameter set separately for parent and child and is the minimum time that each should sleep. In the demo version in this paper, MIN is set to 5.0 [h] for the child and 3.0 [h] for the parent. Since these are only provisional values, more appropriate settings will be considered in the future by referring to existing studies. Note that $\lfloor \cdot \rfloor$ is the floor function. As a rule of thumb, the child is granted 4 normal feeds per day when sleeping 11.0–12.49 [h], which is the target recommended time [24], and one feed increases the cat's fitness by +1, so the cat's fitness (internal score) is designed to recover to 100 or more in about 25 days of single-player play. On the other hand, the number of special feeds is determined by the following formula.
\[ \text{Number of special feeds [pcs]} = \left\lfloor \frac{\text{Sleep time}_{\text{Child}}}{\mathrm{RCM}_{\text{Child}}} \right\rfloor \cdot \left\lfloor \frac{\text{Sleep time}_{\text{Parent}}}{\mathrm{RCM}_{\text{Parent}}} \right\rfloor \]
Here, RCM is a parameter set separately for parent and child and is the target recommended sleep time. In the demo version in this paper, RCM is set to 11.0 [h] for the child and 6.0 [h] for the parent. Ideally, the parent's RCM would be about 7.5 [h], but this is not realistic for many parents, and the parent's sleep could easily prevent specials from being granted at all, so this research has tentatively lowered the target a little. In this case, it would be better for sleep education to state the higher value explicitly as the next target rather than changing it implicitly inside the system. If the current target still seems difficult to achieve, a mechanism may be needed to lower the target RCM a little further, depending on how often specials are granted. As a guideline, at most one special is granted each day.
If the parent gets 12.0 [h] or more of sleep, two or more specials are possible,
but it is not realistic for the child to get 22.0 [h] or more of sleep, so this depends on the parent's sleep.
Cat Changes Based on Feeding
Based on the daily sleep data input by the players (parent and child), cat food is granted, which the players can freely feed to the cat. The cat changes depending on whether "normal food" or "special food" is fed, as follows.
Normal food: Each player can feed the cat independently by clicking (touching) the "normal food" icon, which increases the cat's strength (internal score) by +1. As shown in Fig. 5, the feeding is displayed in an easy-to-understand manner, and the cat expresses its pleasure by purring with sound and dialogue.
Special food: Although it is granted to the parent-child pair, only the child can feed it by clicking (touching) the icon, and it increases the cat's strength (internal score) by +3. As in Fig. 4, the feeding is displayed in an easy-to-understand manner, and the cat is more pleased than with "normal food," with a happier sound and dialogue.
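The feeding rules above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, the wrap past midnight in sleep_duration is our assumption about how clock times are handled, and the floor is applied to each ratio of the special-feed formula separately, which is our reading of the 12.0 [h] and 22.0 [h] thresholds mentioned in the text. The parameter values are those of the demo version.

```python
import math

UNIT_H = 1.5                            # time unit shared by parent and child [h]
MIN_H = {"child": 5.0, "parent": 3.0}   # minimum sleep per role [h]
RCM_H = {"child": 11.0, "parent": 6.0}  # target recommended sleep per role [h]


def sleep_duration(sleep_onset_h: float, wake_h: float) -> float:
    """Wake time minus sleep onset time; clock hours in [0, 24), wrapping past midnight."""
    return (wake_h - sleep_onset_h) % 24.0


def normal_feeds(role: str, sleep_h: float) -> int:
    """floor(max(sleep time - MIN, 0) / UNIT) for one player."""
    return math.floor(max(sleep_h - MIN_H[role], 0.0) / UNIT_H)


def special_feeds(child_sleep_h: float, parent_sleep_h: float) -> int:
    """Product of the floored sleep-to-target ratios of child and parent."""
    return (math.floor(child_sleep_h / RCM_H["child"])
            * math.floor(parent_sleep_h / RCM_H["parent"]))


def strength_gain(n_normal: int, n_special: int) -> int:
    """Each normal feed adds +1 and each special feed adds +3 to the cat's strength."""
    return n_normal + 3 * n_special


def days_to_recover(feeds_per_day: int, target: int = 100) -> int:
    """Days needed to reach the target strength with normal feeds only."""
    return math.ceil(target / feeds_per_day)


child_sleep = sleep_duration(21.0, 8.0)    # asleep at 21:00, awake at 08:00 -> 11.0 h
print(normal_feeds("child", child_sleep))  # 4 normal feeds
print(special_feeds(child_sleep, 6.0))     # 1 special feed
print(days_to_recover(4))                  # 25 days in single-player play
```

With these parameters a child who sleeps the recommended 11.0 [h] earns four normal feeds per day, which matches the design target of about 25 days to reach a strength of 100.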
Fig. 5. Changes in the cat due to feeding of "normal food."
In addition, to reflect the player's daily good sleep positively in the cat, the character being raised, the cat's appearance (image) changes monotonically with the cat's physical strength (internal score), as shown in Fig. 6. The cat's lines also change, both in normal times when the main screen is simply displayed and during feeding events. The cat's physical strength is designed to increase monotonically until the cat has fully recovered once, but after that the cat can weaken again if it is neglected for too long.
[Fig. 6 panels: the cat's appearance at fitness levels 0, 20, 40, 60, 80, and 100]
Fig. 6. Appearance and lines change according to the cat’s fitness.
4 Conclusion
This paper proposes "Neko x Neko: Kids who Sleep with Cats Grow Up Well," a cooperative cat-raising game (web application) in which parents and children cooperate to raise a cat using food granted as a result of daily sleep data input, so that both parents and children can learn about sleep proactively and keep changing their sleep behavior. The design proposal and some demo screens were also shown. A prototype system will be developed, and demonstration tests with parents and children at kindergartens are planned. This research also considers the following functional enhancements.
4.1 Diversity of Parent-Child Pairings
In the demo version in this paper, child IDs are paired one-to-one with parent IDs. At least one-to-two pairings between a child and both parents are also being considered, as well as support for a wide variety of family structures, such as parents and siblings, grandparents, and so on. However, since the game balance, such as the quality and quantity of food granted and the amount by which the cat recovers from it, is adjusted assuming a one-to-one pairing of parent and child, it will need to be reconsidered when the number of players in a group is increased.
4.2 Visualization of Sleep Data History
In the demo version in this paper, only a list of numerical data is shown, but in addition to existing visualization methods such as bar graphs and line graphs, this research will also explore new visualization methods that are only possible with parent-child pairs. Although each member of a parent-child pair is only shown whether or not the other has already entered sleep data for the day, they are expected to be aware of each other's sleep data entry and to interact with each other in the real world to encourage sleep data entry.
4.3 Interactive Response AI for Cat Characters
This research considers dialogue responses with the cat character as one type of end(less) content after the cat has fully recovered and the ending has been reached. Research on giving a dialogue response AI a personality [25], for example, will be utilized. A proposal to increase the size of the dialogue response database of the dialogue response
AI of the cat character and make it smarter as the cat's physical strength increases could be considered, but if the growth is simply linear, the improvement in smartness may gradually become imperceptible, so exponential growth might be better. On the other hand, it is difficult to keep increasing the size of the dialogue response database exponentially, which is a vexing problem. Therefore, the idea of not only controlling the size of the dialogue response database but also combining it with a forgetting curve and updating it with more recent (topical) dialogue response data is also being considered. In addition, this research will increase communication with the cat character, for example, by personalizing the cat through machine learning of the player's interests and preferences from the content of the dialogue responses, or, conversely, by having the cat forget if the player neglects to feed or talk to it.
Acknowledgement. This work was supported by JSPS KAKENHI Grant Number 20K13787.
References
1. Takami, S., Kadoya, H.: Current status and problems of sleep debt in Japanese. Sleep Med. 12(3), 305–309 (2018)
2. Komada, Y.: Sleep debt and social jet lag issues and responses: a developmental perspective. J. Behav. Med. 26(1), 58–64 (2021)
3. Komada, Y.: Sleep debt and social jet lag among workers. Public Health 86(1), 43–51 (2022)
4. Matsuda, H.: Trends in behavioral medical and psychological support for sleep disorders: cognitive behavioral therapy for insomnia and nightmare disorder. Bull. Edogawa Univ. 22, 51–57 (2012)
5. Okawa, K.: Sleep and brain development in children: effects of sleep deprivation and nocturnal society 15(4), 34–39 (2010)
6. Kamiyama, J.: Sleep debt in children. Sleep Med. 12(3), 325–330 (2018)
7. Komada, Y.: Sleep debt in children: effects and countermeasures. Pediatrics 61(13), 1760–1767 (2020)
8. Eto, T.: How should children's sleep be. Kyoiku to Igaku (Educ. Med.) 55(8), 784–791 (2007)
9. Tamura, N., Takahama, Y., Sasaoka, E., Tanaka, H.: Effects of classroom sleep education on sleep, daytime sleepiness, and irritability in elementary school students. Bull. Cent. Clin. Psychol. Hiroshima Int. Univ. 11, 21–35 (2013)
10. Yoshizawa, K.: Sleep deprived children and their parents. Food Cult. 464, 24–28 (2013)
11. Furuya, M., Ishihara, K., Tanaka, H.: Single-shot sleep education in elementary school students: a comparison by listening style. 学校保健研究 57, 18–28 (2015)
12. Oiso, M.: Current status and issues of sleep education for elementary and junior high school students: a review from domestic and international literature. 人間発達学研究 12, 27–40 (2021)
13. Brain Sleep Inc.: Sleep deviation kids survey results published (2021). https://brain-sleep.com/service/sleepdeviationvalue/research2021kids/. Accessed 27 Nov 2022
14. Takahara, M., Suto, H., Tanev, I., Shimohara, K.: Sleep visualization through indirect biofeedback for patients' behavioral changes and sleep quality. IEEJ Trans. Electron. Inf. Syst. 142(6), 637–642 (2022)
15. Pet Food Association of Japan: 2021 national dog and cat breeding survey. https://petfood.or.jp/data/. Accessed 27 Nov 2022
16. PEDGE: The actual state of dog and cat killing in Japan: current status and advanced solutions. https://pedge.jp/reports/satusyobun/. Accessed 27 Nov 2022
17. Weaver, E., Gradisar, M., Dohnt, H., Lovato, N., Douglas, P.: The effect of presleep videogame playing on adolescent sleep. J. Clin. Sleep Med. 6(2), 184–189 (2010)
18. Sudo, K.: Effects of game time on grades and sleep duration: a panel data analysis of elementary, middle, and high school students in Japan. Res. Bull. Meisei Univ. Fac. Educ. 11, 1–13 (2021)
19. Teel, P.: The Floppy Sleep Game Book: A Proven 4-Week Plan to Get Your Child to Sleep. Perigee Trade (2005)
20. Cai, J., Chen, B., Wang, C., Jia, J.: Wander: a breath-control audio game to support sound sleep. In: Proceedings of the 2021 Annual Symposium on Computer-Human Interaction in Play (CHI PLAY 2021), pp. 17–23 (2021)
21. Si, H.: Pro Sleeper: A Meditative Mobile Game to Improve Sleep Quality. Northeastern University ProQuest Dissertations Publishing (2021)
22. HAappss (Abe, H.): Sleep x Game - Good Sleepers: Improving Sleep Habits in a Fun Way. iOS/Android, 23 May 2022
23. The Official Pokémon YouTube channel: Pokémon Presents, 08 March 2022. https://www.youtube.com/watch?v=ojiBuA97rdc. Accessed 27 Nov 2022
24. Hirshkowitz, M., et al.: National sleep foundation's sleep time duration recommendations: methodology and results summary. Sleep Health 1(1), 40–43 (2015)
25. Mori, K., Hattori, S., Takahara, M., Kudo, K.: Cross-language and rule-based personality removal for reducing the cost of building tsundere dialogue response AI. Summer Workshop 2022 (2022)
Analysis of Resilient Behavior for Interaction Design
Haruka Yoshida1(B), Taiki Ikeda1, Daisuke Karikawa2, Hisae Aoyama3, Taro Kanno4, and Takashi Toriizuka1
1 Nihon University, Narashino-shi, Izumi-cho 1-2-1, Chiba 275-8575, Japan
[email protected]
2 Tohoku University, Sendai-shi, Aoba-ku, Aramakiaza Aoba, 6-6, Miyagi 980-8579, Japan
3 Electronic Navigation Research Institute, Chofu-shi, Jindaiji Higashimachi 7-42-23, Tokyo, Japan
4 The University of Tokyo, Bunkyo-ku, Hongo 7-3-1, Tokyo, Japan
Abstract. In recent years, the resilience of workers has been widely recognized as important for dealing with unexpected system behavior and unknown situations in order to safely operate complex socio-technical systems such as aviation, railroad, medical, and nuclear plants. Resilience, the ability to cope flexibly with novel events and avoid failures or worst-case scenarios, is concretely defined in terms of four resilience potentials (RP): responding, monitoring, learning, and anticipating. In this research, we aim to realize a human-system interaction that supports the improvement of workers' RP. As the first step, we conducted a cognitive experiment to characterize workers' RP when they must respond to a significant change in the situation. In the experiment, we classified participants into two groups: an HS group with high Non-Technical Skill (NTS) and an LS group with low NTS. The results revealed the following: (1) the correlation between NTS and task performance is low under normal conditions, (2) in an emergency, when significant changes in circumstances occur, the HS group may have relatively higher RP than the LS group, and (3) the HS group consistently tends to perform tasks with an awareness of the target values and a detailed understanding of the situation. These characteristics of the HS group may contribute to resilient performance.
Keywords: Resilience Potential · Interaction Design · Cognitive Psychology Experiment
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 182–195, 2023. https://doi.org/10.1007/978-3-031-35132-7_13
1 Introduction
Socio-technical systems that play an essential role as the foundation of society, such as aviation, medical care, and transportation, are strongly required to maintain a high level of safety and reliability. Conventionally, the safety of these socio-technical systems has been defined as "the absence of unacceptable risks." In the field, the most critical aspect of safety management has been carrying out work according to manuals and procedures, on the view that safety should be maintained by eliminating, as far as possible, factors that induce human error, which can cause risk (Safety-I) [1]. As the scale and complexity of tasks related
to these social infrastructures have increased in recent years, the socio-technical systems that support these tasks have become more sophisticated. For example, in aviation, in addition to aircraft autopilot, technologies have been developed to automatically detect aircraft in conflict in air traffic control operations, and these technologies are now in actual operation. Many people expect that the introduction of AI and other advanced automation technologies will contribute to reducing the workload on operators, avoiding the risk of errors, and improving overall operational efficiency. As socio-technical systems that support complex operations become more sophisticated, the systems and the work situations surrounding them change more often. Hollnagel et al. point out that socio-technical systems are not static but constantly changing, and that change is their true nature [1]. In such systems, safety should mean "maintaining functionality even under fluctuating conditions," and the safety of the entire system needs to be enhanced by humans coping flexibly with ever-changing conditions while continuing to adapt to the surrounding environment (Safety-II) [2]. For example, in air traffic control operations in the aviation field, air traffic controllers are required to respond flexibly to ever-changing conditions, such as aircraft and weather conditions, so that each aircraft can fly safely and smoothly and air traffic in the entire airspace can be efficient [3]. In daily clinical operations in the medical field, medical personnel must make appropriate judgments according to the ever-changing conditions of patients and sites in order to treat and save patients [4]. Furthermore, in maintenance work in the nuclear field, the wear and corrosion of equipment progress while the equipment condition changes due to disassembly and reassembly, which requires a flexible response by workers [5].
As these examples show, the key to the safe functioning of socio-technical systems is how to improve resilience potential (RP) [6], which is the ability of workers to respond flexibly to changes in systems and situations and to avoid failures and high-risk situations. In this study, we aim to realize system interactions that support the improvement of workers' RP. We will clarify what information is needed and how to present it on the system to improve workers' RP, implement these findings, and organize them as interaction design requirements for improving RP. As the first step, this paper describes an experiment and its results that clarify the characteristics of RP in situations where workers have to cope with significant changes in the situation.
2 Related Works
Several previous studies have discussed education and training to improve RP. Some of them focused on approaches to soft aspects such as people and organizations. For example, Hollnagel, who proposed RP, lists coping, foreseeing, monitoring, and learning as the four potentials that constitute RP and suggests that education from this perspective can enhance human RP [2]. Kitamura further states that the complementary requirements of the four potentials are adequacy of resource deployment, ability to recognize change, learning from examples of good practice, and proactive behavior [7]. Thus, while there has been much discussion of resilience in the soft aspects of people and organizations, approaches to the hard aspects, such as the physical environment surrounding people and organizations, including equipment and facilities and the information environment, have also been studied, for example, by Nakanishi et al. In a
previous study, our research group [9] focused on information attractiveness in air traffic control operations and proposed a screen design guideline that utilizes visual salience. The proposal can serve as a requirement for screen design to improve the performance of air traffic controllers in air traffic control operations, which require flexible responses to changing conditions. In summary, interaction design to support RP improvement requires various considerations, such as information design, interface design, and other expressive designs. However, few studies have yet clarified the characteristics and factors of RP, which are the basic knowledge needed for such interaction design. For example, in a previous study by Karikawa et al. [11], a firefighting simulation was set as a task for participants to demonstrate RP, and it was found that the high-performance group and the low-performance group behaved differently when a large-scale disaster that they had never experienced occurred. However, the specific characteristics and factors that lead to RP, such as what kinds of differences in behavior, thinking, and learning exist between participants in the different groups, have not been clarified. Therefore, this study aims to clarify, through an experiment, the basic knowledge for designing interactions that support the improvement of RP.
3 Experiment
3.1 Overview
We conducted an experiment to clarify the characteristics and factors of RP as essential findings for the design of interactions that support the improvement of individual RP. A breakfast cooking task [10], which resembles air traffic control work in requiring RP of workers, was set as the experimental task. Participants who scored high in Non-technical Skill (NTS) in the pre-survey were classified into the HS group, and participants who scored low in NTS into the LS group. Assuming a positive correlation between NTS scores and RP, we attempted to extract the behavior patterns and thinking characteristics of participants with high or low RP by analyzing differences in behavior and thinking between the HS and LS groups. Of the four potentials of RP (coping, monitoring, learning, and anticipation), this study focused on coping and learning.
3.2 Participants
We asked 102 undergraduate and graduate students at Nihon University to complete a questionnaire to measure NTS [13]. Sixteen participants, the top 8 (hereafter referred to as the HS group) and the bottom 8 (hereafter referred to as the LS group), took part in the experiment. The participants were men and women in their 20s who were familiar with mouse operation because they regularly used PCs for classes and assignments. The Ethics Review Committee of the College of Industrial Engineering, Nihon University approved the experiment (approval number: S2020-006).
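The grouping step can be sketched as follows. This is only an illustration of selecting the top and bottom eight respondents by NTS score: the function name, the participant IDs, and the scores are hypothetical, and the paper does not describe how ties were handled.

```python
from typing import Dict, List, Tuple


def split_by_nts(scores: Dict[str, float], k: int = 8) -> Tuple[List[str], List[str]]:
    """Return the IDs of the k highest scorers (HS) and the k lowest scorers (LS)."""
    if len(scores) < 2 * k:
        raise ValueError("need at least 2k respondents for disjoint groups")
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return ranked[:k], ranked[-k:]


# Synthetic scores for 102 respondents, for illustration only.
scores = {f"p{i:03d}": float(i) for i in range(102)}
hs_group, ls_group = split_by_nts(scores)
```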
3.3 Experimental Hypothesis
Based on the study by Karikawa et al. [11], we formulated the following experimental hypotheses focusing on individual RP.
Hypothesis 1: There is a positive correlation between NTS and resilience potential.
Hypothesis 2: The group with high resilience potential will behave differently from the group with low resilience potential when there is a change in the situation that does not fit their usual patterns (routines).
Hypothesis 3: The groups with high and low resilience potential will behave differently because they have different ways of understanding, knowing, and learning about the task.
Fig. 1. Flow of the experiment
Fig. 2. Example screenshot of the Breakfast Cooking Challenge application
3.4 Experimental Task
The experimental task was a breakfast task proposed by Craik & Bialystok. In the breakfast task, the participants prepared several breakfast sets according to the number of customers, each consisting of a cup of coffee, a fried egg, and a piece of toast. In addition, the participants had to respond to additional orders for coffee, a fried egg, and a piece of toast. The task is to prepare all ordered items as quickly as possible without making any mistakes. The breakfast cooking task was selected for this experiment because it shares some characteristics with the air traffic control task, in which the resilience potential of the workers is significant. The following are examples of common task characteristics.
• Only the goal to be achieved is given, and it is left to the operator to decide on procedures and strategies.
• The operator monitors the situation and performs multiple tasks in parallel within a specific timeframe.
Participants conducted the experiment using an application [10] for performing this task on a PC. Figure 2 shows an example of the application screen used in the experiment. The application shows additional orders in a dialog as the number of additional servings per item; in some cases it indicates that there are no additional orders. The cooking time varies depending on the item, and coffee is prepared automatically. Participants can toast either one or two pieces at a time. After a long time has elapsed, the toast cools down and is automatically discarded as waste. Similarly, if a fried egg is cooked incorrectly, it is automatically discarded as waste. Two kinds of scenarios were prepared for the experiment: a steady-state scenario for a steady-state situation and an emergency scenario for an emergency. In both kinds, one scenario consisted of three table orders, and one table order consisted of a breakfast set for two or three people plus an additional order.
Participants were instructed to operate one cooking task application in the steady-state scenario but two applications simultaneously on two screens in the emergency scenario. The two kinds of scenarios were set up with different intentions. The steady-state scenario allowed participants to discover and become familiar with the basic procedures, rules, and strategies for performing the task. In the emergency scenario, we intended that the participants would recognize that the situation had changed to the point where their accustomed methods were no longer applicable, and that they would demonstrate their resilience potential to avoid failure by devising alternative methods.
3.5 Procedure
Figure 1 shows the flow of this experiment. The numbers in parentheses indicate the approximate time required. The experiment was conducted only after the experimenter had explained the purpose and task to the participants and obtained their consent. First, participants were asked to perform two practice scenarios (six table orders) to familiarize themselves with the task and operations, followed by five main scenarios. Each scenario lasted approximately 5 min, with 5 min of interview and preparation in between. The
total duration of the experiment was approximately 90 min. The practice scenarios corresponded to the steady-state scenarios, and the participants were free to ask questions about the operation method and task content during the practice sessions. Scenarios 1 through 5 were conducted in the main experiment, with Scenarios 3 and 5 being the emergency scenarios and Scenarios 1, 2, and 4 being the steady-state scenarios. Scenario 3 was designed to investigate RP related to coping with an unfamiliar situation, because participants were expected to notice a change in the situation that made their accustomed methods ineffective. Scenario 5 was designed to investigate RP related to learning, because in it participants respond again to an emergency scenario they have already experienced once.
3.6 Experimental Environment
The experiment was conducted in a quiet room with the door closed, and the illumination level of the room was equivalent to that of a typical office environment. A 23.5-inch monitor was used to display the application, and the viewing distance between the screen and the participants was 550 mm. Figure 3 shows the arrangement of the equipment and people in the experiment and a scene from the experiment. Participants used Screen1 and Mouse1 in all scenarios and additionally Screen2 and Mouse2 in Scenarios 3 and 5.
3.7 Measurement
Performance on the Experimental Task. In this experiment, performance on the experimental task was evaluated using two indices: overtime, which is the difference between the time taken by participants to complete the task (actual time) and the theoretical minimum time (ideal time), and the total number of wastes in the task. Smaller values for both indices indicate better performance.
Gaze Data. Eye gaze data were recorded to measure objectively what information participants were looking at during the execution of the experimental task.
The Tobii Pro Nano eye tracker was used to record the participant's gaze position at 60 Hz during each scenario. The experimental screen was divided into 21 regions for data analysis. Figure 4 shows the definitions of the regions.
Interview Data. Participants were interviewed between scenarios using the questions listed in Table 1.
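As an illustration of how these measures could be computed, the following Python sketch derives overtime from the actual and ideal times and converts 60 Hz gaze samples into dwell time per screen region. It is a minimal sketch under stated assumptions: the rectangular region format, the region names, and the function names are hypothetical (the paper's 21 regions are defined in Fig. 4), and samples outside every region are simply ignored.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

SAMPLE_RATE_HZ = 60.0                     # gaze sampling rate used in the experiment
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def overtime(actual_s: float, ideal_s: float) -> float:
    """Overtime = time actually taken minus the theoretical minimum time."""
    return actual_s - ideal_s


def region_of(x: float, y: float, regions: Dict[str, Rect]) -> Optional[str]:
    """Name of the first region containing the point, or None if outside all regions."""
    for name, (left, top, right, bottom) in regions.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None


def dwell_time_s(samples: Iterable[Tuple[float, float]],
                 regions: Dict[str, Rect]) -> Dict[str, float]:
    """Seconds of gaze per region, counting one sample as 1/60 s."""
    counts = Counter(region_of(x, y, regions) for x, y in samples)
    counts.pop(None, None)  # drop samples that fall outside every region
    return {name: n / SAMPLE_RATE_HZ for name, n in counts.items()}


# Two hypothetical regions and synthetic gaze samples, for illustration only.
regions = {"toaster": (0, 0, 100, 100), "orders": (100, 0, 200, 100)}
samples = [(10, 10)] * 120 + [(150, 50)] * 60 + [(500, 500)] * 30
dwell = dwell_time_s(samples, regions)
```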
Fig. 3. Layout of equipment (left) and scene during the experiment (right).
Fig. 4. Definition of areas in the experiment screen
4 Results
4.1 Performance on the Experimental Task
To evaluate performance on the experimental task, the overtime and the total number of wastes were measured for each scenario. Figure 5 shows the overtime for each experimental scenario, and Fig. 6 shows the total number of wastes for each experimental scenario. Bonferroni's multiple comparisons between scenarios were performed for the overtime and the total number of wastes. For the overtime, there were no significant differences at the 5% level between the steady-state scenarios ($p = 1.00$, $1.00$, and $1.00$ for Scenarios 1–2, 1–4, and 2–4, respectively). On the other hand, there was a significant difference at the 5% level between the steady-state and emergency scenarios ($p = 6.9 \times 10^{-8}$, $1.7 \times 10^{-4}$, $5.2 \times 10^{-9}$, $1.6 \times 10^{-5}$, $2.0 \times 10^{-8}$, and $5.5 \times 10^{-5}$ for Scenarios 1–3, 1–5, 2–3, 2–5, 4–3, and 4–5, respectively). There was also no significant
Table 1. Questions to be asked in the interview
No.  Question
1    What steps did you take to handle the assignment? Why did you decide to do so?
2    Were there any situations where things did not go as planned/expected? How did you handle those situations?
3    What did you do to make the task work? What were the results you obtained?
4    Based on what went well and what did not go well, what are you thinking about doing next time? If you were in a similar situation, what would you do?
5    What information did you use as a clue, and what did you do with it?
Fig. 5. Overtime for each experimental scenario.
Fig. 6. Total number of wastes for each experimental scenario.
difference at the 5% level between the steady-state scenarios for the total number of wastes (p = 1.00, 1.00, and 1.00 for Scenarios 1–2, 1–4, and 2–4, respectively). Between the steady-state and emergency scenarios, there was no significant difference between the steady-state scenarios and Scenario 5 (p = 1.00, 0.73, and 1.00 for Scenarios 1–5, 2–5, and 4–5, respectively), but there was a significant difference
between the steady-state scenarios and Scenario 3 at the 5% level (p = 2.5 × 10⁻³, 5.3 × 10⁻⁴, and 1.0 × 10⁻³ for Scenarios 1–3, 2–3, and 4–3, respectively). The results show that task performance was stable in Scenarios 1, 2, and 4 and worsened in Scenario 3. In Scenario 5, the number of wastes did not differ significantly from Scenarios 1, 2, and 4, although both the overtime and the number of wastes tended to worsen again. These results indicate that Scenarios 1, 2, and 4 functioned as steady-state scenarios in which the participants executed the tasks according to their strategies and maintained stable performance. On the other hand, Scenarios 3 and 5 were emergency scenarios in which the strategies the participants had been practicing were no longer applicable; considering the tendency for task performance to deteriorate, they functioned as opportunities to demonstrate resilient behavior.

Next, we compared the task performance of the HS and LS groups in each scenario. Based on the results in Figs. 5 and 6, Scenarios 1, 2, and 4 are pooled as the steady-state scenario in this analysis. Figure 7 compares the HS and LS groups in terms of overtime. In the steady-state scenario, the LS group (M = 68.89, SD = 24.16) performed significantly better than the HS group (M = 82.48, SD = 20.44) (t(46) = 2.10, p = 0.041). For Scenario 3, an emergency scenario, there was no significant difference between the HS group (M = 218.7, SD = 88.43) and the LS group (M = 196.36, SD = 85.84) (t(14) = 0.512, p = 0.62). For Scenario 5, there was no significant difference between the HS group (M = 170.4, SD = 57.1) and the LS group (M = 170.2, SD = 98.8) (t(14) = 0.0050, p = 0.99). Figure 8 compares the HS and LS groups in terms of the number of wastes.
For the steady-state scenario, there was no significant difference between the HS group (M = 3.42, SD = 2.9) and the LS group (M = 5.42, SD = 7.95) (t(46) = −1.16, p = 0.25). For Scenario 3, an emergency scenario, there was no significant difference between the HS group (M = 13.4, SD = 8.52) and the LS group (M = 21.0, SD = 20.1) (t(14) = −0.99, p = 0.34). For Scenario 5, no significant difference was found between the HS group (M = 7.75, SD = 7.69) and the LS group (M = 11.12, SD = 5.25) (t(14) = −1.03, p = 0.32).
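The reported group comparisons can be checked approximately from the summary statistics alone. The sketch below computes a pooled-variance two-sample t statistic. It assumes a standard Student's t-test, and it assumes 24 observations per group for the steady-state comparison (8 participants × 3 scenarios) and 8 per group for Scenario 3; these group sizes are inferred from the reported degrees of freedom rather than stated in the text.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Overtime, steady-state scenario: HS vs. LS (group sizes are an assumption).
t_steady, df_steady = pooled_t(82.48, 20.44, 24, 68.89, 24.16, 24)

# Overtime, Scenario 3: HS vs. LS (group sizes are an assumption).
t_s3, df_s3 = pooled_t(218.7, 88.43, 8, 196.36, 85.84, 8)

print(round(t_steady, 3), df_steady)
print(round(t_s3, 3), df_s3)
```

Under these assumptions the statistics agree with the reported t(46) = 2.10 and t(14) = 0.512 to the precision given, which supports the inferred group sizes but does not confirm which test variant was actually used.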
Fig. 7. Overtime for HS and LS groups in each scenario
Fig. 8. Total number of wastes for HS and LS groups in each scenario
4.2 Correlation Between NTS Scores and Task Performance

To determine whether the NTS correlates with performance on this task, we analyzed the correlations of the overall NTS score and the scores of its six items with the overtime and the number of wastes. The correlation coefficients are shown in Table 2. The table shows that all of the correlations are low. This suggests that the NTS is unlikely to be related to performance on this task.

Table 2. Correlation coefficients between each NTS score and task performance

NTS item              Overtime   Number of wastes
Total                 −0.028     −0.21
Situation awareness    0.064     −0.25
Decision              −0.019     −0.068
Workload management   −0.037     −0.18
Planning              −0.087      0.12
Summarize              0.052      0.012
Attitude              −0.053     −0.23
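A correlation coefficient of the kind reported in Table 2 can be sketched as follows. The text does not state which coefficient was used; this example assumes Pearson's product-moment correlation, and the score and overtime values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical NTS totals and overtime values for five participants.
nts_total = [52, 61, 47, 70, 58]
overtime_values = [80.0, 75.5, 90.2, 77.0, 83.1]
print(pearson_r(nts_total, overtime_values))
```

A value near zero, as in most cells of Table 2, indicates little linear association between the NTS score and the performance index.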
4.3 Gaze Data

To determine how much participants looked at each area of the application to collect information during the experiment, we divided the application screen into 21 regions according to function, as shown in Fig. 4, and calculated the percentage of accumulated gaze time in each region relative to the total task time. Figure 9 shows the ratio of accumulated gaze time in each region for each group. Region 22 in the graph
Fig. 9. Accumulated gaze time in each area
shows the percentage of time spent looking outside the application area. The graph shows that the percentage of time spent looking at regions 11 and 21 is much larger than that for the other regions. Region 11 is where the fried egg indicator is displayed, and the information in this region must be monitored constantly in order to press the button at the right time for the desired degree of cooking. Region 21 is a message area that displays the history of the orders given and is used to confirm the target value of the current operation. Significant differences between the HS and LS groups were found in regions 11 and 19. Region 19 is the “All done” button, which becomes active when the number of completed items reaches the target value and the work is complete. In this region, the LS group had a significantly higher percentage of accumulated gaze time than the HS group. In region 11 (fried egg indicator), the HS group had a significantly higher percentage of accumulated gaze time than the LS group.
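The accumulated gaze ratio described in this section can be sketched as a simple count over gaze samples. The example assumes that every 60 Hz sample has already been labelled with a region number (1–21, or 22 for gaze outside the application) and that no samples were lost, so the share of samples equals the share of task time. The function name and the sample data are hypothetical.

```python
from collections import Counter

SAMPLE_RATE_HZ = 60   # sampling rate reported for the eye tracker
OUTSIDE_REGION = 22   # label used in Fig. 9 for gaze outside the application


def accumulated_gaze_ratio(region_per_sample):
    """Return {region: share of samples}; at a fixed rate this is the share of task time."""
    total = len(region_per_sample)
    if total == 0:
        return {}
    counts = Counter(region_per_sample)
    return {region: count / total for region, count in counts.items()}


# Hypothetical 10-sample recording: mostly region 11, some region 21, one sample outside.
samples = [11] * 6 + [21] * 3 + [OUTSIDE_REGION]
ratios = accumulated_gaze_ratio(samples)
seconds_on_region_11 = samples.count(11) / SAMPLE_RATE_HZ
print(ratios, seconds_on_region_11)
```

Real recordings contain blinks and tracking losses, so a practical analysis would need an explicit rule for samples with no valid gaze position.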
5 Discussion

5.1 Relationship Between NTS and Resilience Potential

In this experiment, the correlation between NTS and task performance was low (Table 2). Furthermore, the LS group performed significantly better than the HS group in the steady-state scenario in terms of overtime (Fig. 7). On the other hand, in the emergency scenarios (Scenarios 3 and 5), the HS group tended to perform better than the LS group in terms of the number of wastes (Fig. 8). Although these differences were not significant, the relatively higher performance of the HS group in the emergency scenarios suggests that the resilience potential of the HS group may be higher than that of the LS group. These results suggest that the correlation between NTS and task performance is low under normal conditions and that the group with high NTS may exhibit more resilience potential than the group with low NTS under emergency conditions. In the future, it will be necessary to increase the number of participants to confirm the statistical significance of the trends observed in this study, and to clarify the relationship between NTS and resilience potential through a detailed analysis of the relationship between task performance and resilience potential under steady-state and emergency scenarios.
5.2 Analysis of Resilient Behavior Based on Gaze Data

In this experiment, the accumulated gaze time differed significantly between the HS and LS groups for the fried egg indicator (area 11 in Fig. 9) and the “All done” button (area 19 in Fig. 9). Specifically, the HS group looked more at the fried egg indicator, and the LS group looked more at the “All done” button. The fried egg is the only one of the three items that requires constant monitoring of the cooking progress, and the end operation must be performed at the appropriate time. The indicator is an important clue for avoiding the loss of an item through missing the appropriate time to press the end button. Participants in the HS group stated that they used the fried egg indicator not only to check the progress of the fried egg but also to determine when to start cooking other items, and that they always watched the information in this area. Strategies of this kind, devised by the HS group during the experimental task, may have led to the significant difference in accumulated gaze time on the fried egg indicator. On the other hand, the post-experiment interview revealed that the LS group participants judged whether the target value had been achieved based on whether or not the “All done” button was active. It is thought that these participants did not actively keep track of the target number or assess the situation, but simply carried out the cooking task and used the “All done” button to judge whether the task was complete. This means that these participants were hardly aware of the progress of the task. This behavior is the opposite of resilient behavior, which copes adaptively with unexpected events by staying aware of the situation, and the presence of several participants in the LS group who behaved in this way may have significantly increased the accumulated gaze time on the “All done” button.
These results suggest that the HS group was more likely to engage in resilient behavior than the LS group, although the correlation between NTS scores and task performance was low.

5.3 Interaction Design Requirements to Motivate Resilient Behavior

From the results of the previous section, interactions with functions that directly support action decisions, such as the “All done” button that becomes active when a goal is achieved, are not expected to encourage resilient behavior. On the other hand, considering that a smooth and detailed understanding of target values and of the situation leads to resilient behavior, the following requirements for interaction design that promotes resilient behavior can be identified.

• The target value should be easy to grasp intuitively.
• The work situation should always be recognizable.

In this experiment, the number of orders for the cooking items, which is the target value, was presented in a dialog box at the beginning and middle of the cooking process and was continually displayed in the message area of the application. When additional orders modified the target values, participants verbally confirmed the target values for each item and re-memorized them. Therefore, to help users grasp the target values intuitively, it is desirable to use representations that are easy to visualize and memorize, such as numbers and pictures, rather than written messages. Clarifying the effects of such different representations on resilient behavior remains a future issue.
6 Conclusion

This study aims to realize system interaction that supports the improvement of workers’ RP. As a first step, we clarified the characteristics of RP in tasks requiring workers to respond to changing situations. First, the NTS of 102 university students was measured in a preliminary survey. Then, the eight highest and eight lowest scorers were selected as the participants of the experiment and were called the “HS group” and the “LS group”, respectively. Two types of scenario were prepared: a typical scenario in which a stable, steady state was assumed and an emergency scenario in which a significant change in the situation was assumed. For the emergency scenarios, we attempted to extract the characteristics of RP by analyzing the differences in behavior between participants in the HS group and the LS group. As a result, the following was found:

• The correlation between NTS and task performance is low in the typical scenario, which assumes normal operations.
• In an emergency scenario where participants have to cope with a significant change in the situation, the HS group tends to perform better than the LS group, suggesting that the RP of the HS group may be relatively high.
• Participants in the HS group tended to perform the task with the target value always in mind and to grasp the situation in detail, which may have led to their resilient behavior.

Future work includes a detailed analysis of the relationship between qualitative data, such as interviews and operation logs, and quantitative data, such as eye gaze data, to clarify the characteristics of RP-related thinking and behavior, which can contribute to specifying the interaction requirements for promoting resilient behavior.

Acknowledgment. Professor Etsuko Harada and the Harada Laboratory at the University of Tsukuba provided the Breakfast Cooking Challenge application used in this experiment. We sincerely appreciate their kindness.
References

1. Hollnagel, E.: Safety-I & Safety-II. Routledge (2014)
2. Hollnagel, E., Pariès, J., Woods, D.D., Wreathall, J. (eds.): Resilience Engineering Perspectives. Volume 3: Resilience Engineering in Practice. Ashgate, Farnham (2011)
3. Kohno, R.: The reality of human error in air traffic control. J. Hum. Interface Soc. 3(4), 221–228 (2001)
4. Nakajima, K.: The applicability of resilience engineering theory to medical safety. Japan. J. Endourol. 30(1), 54–60 (2017)
5. Kitamura, M.: Design of resilience for enhancement of practical knowledge. J. Soc. Instrum. Control Eng. 54(7), 470–478 (2015)
6. Hollnagel, E.: Safety-II in Practice. Routledge (2017)
7. Kitamura, M.: Progress in methodology for pursuing safety~safety-II and resilience engineering. Hum. Factors Jpn 21(2), 37–48 (2017)
8. Nakanishi, M., Yao, K.: Study on information design for supporting resilient operation: a practical method for safety-II. Hum. Factors Jpn 23(1), 30–51 (2018)
9. Yoshida, H., et al.: A study of design policy for air traffic control radar screen using the color salience model. Japan. J. Ergon. 57(4), 180–193 (2021)
10. Harada, E., et al.: Individual differences in dynamic task management: a new making multiple breakfast task (Diner’s chef’s breakfast task). Proc. Japan. Soc. Cogn. Psychol. O3-1-2 (2014)
11. Karikawa, D., et al.: Experimental study on resilience using simplified simulation environment. Trans. Hum. Interface Soc. 21(2), 155–168 (2019)
12. Klein, G.A., Calderwood, R., MacGregor, D.: Critical decision method for eliciting knowledge. IEEE Trans. Syst. Man Cybern. 19(3), 462–472 (1989)
13. Nishido, M.: Research on the scaling of fundamental competencies for working persons. J. Pool Gakuin Univ. 51, 217–228 (2011)
How Information Influences the Way We Perceive Unfamiliar Objects – An Eye Movement Study

Lanyun Zhang¹(B), Rongfang Zhou¹, Jingyi Yang¹, Zhizhou Shao¹, and Xuchen Wang²

¹ Nanjing University of Aeronautics and Astronautics, 29 Yudao St., Nanjing 210016, People’s Republic of China
[email protected]
² Xi’an Jiaotong-Liverpool University, 111 Ren’ai Road, Suzhou 215123, People’s Republic of China
Abstract. The world around us is filled with unfamiliar objects that we may not know much about. To better understand how we process and perceive these unknown entities, researchers have explored the physiological responses of the brain when it processes different types of information. This study explores how information affects people’s perception of unfamiliar objects, with a specific focus on eye movements. Seventeen participants were recruited for this within-subject experiment, in which they evaluated aviation engines in terms of perceived quality, cohesion, and reliability. Four models of aviation engines were chosen as the unfamiliar objects, with two models being assessed without relevant information and the other two being assessed with relevant information. The results showed a significant difference in the perceived quality of aviation engines before and after relevant information was provided. Additionally, eye movement metrics such as fixation, saccade, and heatmap revealed changes in the way participants perceived the aviation engines. The findings of this study contribute to the understanding of the role of information design and its potential to increase user interest effectively.

Keywords: Unfamiliar Objects · Eye Movement · Perception · Relevant Information
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 196–208, 2023. https://doi.org/10.1007/978-3-031-35132-7_14

1 Introduction

We are constantly surrounded by objects that are unfamiliar or even unknown to us. For instance, we might have seen pictures of a famous painting, but not everyone is familiar with the story behind it; or we might have seen pictures of corals, but not everyone knows that they reproduce by releasing eggs and sperm together in response to environmental cues, such as the lunar cycle and the time of sunset. As we have all experienced, having a bit of information about the unfamiliar or unknown can guide our perception of these objects, enhance our experience when we look at them, affect our level of appreciation when we interact with
them, and potentially trigger our interest to know more about them. However, from a physiological point of view, it is unclear how having a bit of information about unfamiliar objects influences the way we perceive them. Previous research has explored how our body reacts during mental activity, such as the recall of memory [1]. For instance, pupil dilation can help assess the level of interest [2], eye movement trajectory can help evaluate the user experience [3], and eye tracking can estimate the degree of uncertainty during decision-making [4]. In this study, the focus is on how we perceive an unfamiliar object before and after receiving relevant information, rather than on the moment of receiving the information itself.

Previous work has investigated how people process visual stimuli, which typically falls into two categories: bottom-up and top-down. In bottom-up processing, we allow the stimulus itself to shape our perception without any preconceived ideas, while in top-down processing, we use our background knowledge and expectations to interpret what we see. Further, existing work has examined how visual stimuli affect people’s physiological responses in order to understand the underlying cognitive process, using technologies such as eye tracking and electromyography devices. A number of eye movement metrics have been widely explored, such as pupil size, fixation, saccade, and heatmaps.

This study aimed to investigate how information influences the way people perceive unfamiliar objects, with a focus on eye movement. It contributes to our understanding of the implications of information design and of when well-designed information can be most effective in increasing users’ interest. The objectives of this exploratory study are:

• To explore people’s subjective perceptions of an unfamiliar object before and after receiving relevant information.
• To explore people’s objective responses (i.e., eye movement metrics) to an unfamiliar object before and after receiving relevant information.
2 Background and Related Work

This section introduces related work in four parts: (1) visual information processing, (2) visual stimuli and physiological metrics, (3) eye movement metrics, and (4) challenges of this research.

2.1 Visual Information Processing

The study of how we process what we see has a long history in both neuroscience and philosophy. Many studies have tried to explain how physical energy received by our sense organs forms the basis of perceptual experience. Currently, two main processes of visual perception are recognized in the literature: bottom-up and top-down. The aim of this research is not to explore whether people use top-down or bottom-up processing when observing an unfamiliar object with or without relevant information. Instead, this section provides background on related work in visual information processing.

Bottom-up processing, also known as data-driven processing, is based on the idea that visual perception developed through evolution to help species survive and avoid danger [6]. This type of processing largely follows the theory presented by Gibson [5]. In other words, visual perception is forged by evolution and people are not trained to see.
Notably, Gibson does not subscribe to the idea that the perceptual system has a memory. In contrast, top-down processing refers to the use of contextual information in visual perception. Psychologist Richard Gregory [7] argued that perception is a constructive process, in which ambiguous stimuli from our environment require higher cognitive information (either from past experiences or existing knowledge) to make inferences about what we see.

It is challenging to separate bottom-up and top-down processing in visual perception. Some research focuses on one type of visual information processing, while other research aims to combine the two for a better interpretation of the findings. For example, French et al. (2004) found that young infants’ understanding of images of different animals was primarily a bottom-up process [8]. Metz et al. (2017) argued that attention was primarily controlled by top-down processes while driving with visual secondary tasks [9]. Gegenfurtner et al. (2020) believed that more knowledge-driven top-down processing and less bottom-up processing of visual information helped experts quickly scan their environment [10]. Raya et al. (2020) studied the top-down and bottom-up networks of children with dyslexia in contextual and isolated word reading tasks [11]. Sussman et al. (2016) reviewed both the cognitive and the neuroscience literature, showing how factors in top-down and bottom-up interactions may contribute to threat-related perception and attention in anxiety [12].

2.2 Visual Stimuli and Physiological Metrics

Apart from exploring how people process what they see, existing research has investigated how visual stimuli affect people’s physiological responses, using technologies such as eye tracking, electromyography (EMG), and electroencephalography (EEG). These technologies are also widely used in the exploration of user experience, as they provide an alternative view (i.e. objective data of user responses) to the traditional self-reported approach (i.e. subjective data of user responses). Additionally, these technologies are non-invasive and do not interfere with people’s tasks at hand.

Eye tracking records eye movements, such as searching for and fixating on a subject. Tools that analyse eye movement can visualize people’s scanning path and areas of interest, and can help infer people’s psychological changes, such as concerns and difficulties in understanding. One of the challenges when analysing eye movement is that attention is divided into two types: voluntary attention and involuntary attention. That is, people not only shift attention consciously but also unconsciously scan over certain areas. Therefore, eye movement data should be analysed carefully when inferring people’s preferences in user experience studies. Guo et al. [13] explored whether eye movement metrics could reflect the user experience before purchasing and found that: (1) users had large pupillary variation in response to products with poor user experience; (2) pupils dilated when users fixated on goal-related stimuli in a goal-oriented search mode.

Another widely used physiological approach in studying human psychology is to study brain activity using electroencephalography (EEG). To connect human behaviour, psychology, and EEG data, event-related potentials (ERPs) are used to provide a direct, real-time, millisecond-scale measure of neural activity mediated by neurotransmission. Handy et al. [14] used ERP to study the brain mechanism of users’ cognitive processing of icons with varying degrees of preference. Lin et al. [15] investigated how semantic
networks indexed by ERP-N400 components responded to the visual stimuli of different artistic furniture styles. EEG and eye movement are often used together to explore people’s behaviour and intentions. Gino et al. [16] introduced a novel approach to analysing people’s web behaviour and preferences through their click intentions, pupil dilation, and EEG responses. In this study, we took an eye-movement perspective to study how information influences the way people perceive unfamiliar objects.

2.3 Eye Movement Metrics

Eye tracking provides objective data that enable human-computer interaction (HCI) researchers to comprehend the cognitive processes of users, and eye movements can provide essential insights into problem-solving, reasoning, mental imagery, and search strategies [17]. The “eye-mind” hypothesis by Just and Carpenter [18] suggests that what people see with their eyes mirrors their inner thoughts, making eye tracking a valuable tool in HCI research.

Fixation and saccade are the two most essential eye movement indicators in eye tracking studies. Fixation, defined as the period during which the eye remains relatively still, lasting 218 ms on average [17], is considered a time for information processing. Studies have shown positive correlations between fixation and cognitive load, information complexity, and interest level [17, 19, 20]. It is believed that fixation duration reflects the time it takes for users to process information, so longer fixation durations may indicate an increase in information complexity and cognitive load. A more frequent focus on a specific region in a heatmap indicates that this region easily catches people’s attention, but the underlying reasons are unknown. Fu et al. [21] highlighted that eye-tracking data alone do not indicate whether a user was processing the visual information or having trouble processing it. Therefore, the results must be explained with context.
Saccades are rapid eye movements between fixations that typically last 20 to 35 ms [17] and represent an information-gathering process [22]. As a saccade involves little encoding behaviour [23], little about the cognitive process can be derived from it. Saccades are usually related to locating visual targets or areas of interest, which means that longer saccade durations indicate longer searching.

In addition to these fundamental metrics, a number of other indicators are also widely used in analysing eye movements. Gaze, also known as “dwell” or “fixation cluster,” represents the sum of all fixation durations in a given area. It is often used to compare the distribution of attention among specific targets [24]. The scan path describes a complete saccade-fixation-saccade sequence, and a scan path with a longer duration indicates less efficient scanning [25]. The spatial density of fixation points reflects whether the scanning is direct. Blink rate and pupil size are also studied as potential indicators of cognitive workload. Lower blink rates and larger pupil sizes both suggest a greater cognitive load [17]. However, blink rate and pupil size are susceptible to contamination by a variety of factors, such as ambient light levels [25], and are less commonly used in eye-tracking studies.

A heatmap is an effective tool for visualizing the distribution of visual attention. It shows the spatial density of fixation points by using different colours to indicate the intensity of attention (e.g. red, yellow, green, and blue). Examining the heatmap
helps determine the area that catches more attention. Concentrated fixations in a small area indicate a focused and efficient search, while spread fixations imply a widespread and inefficient search [17]. Existing research has found that decision-makers attend to stimuli with higher task relevance and ignore stimuli with little relevance [26].

2.4 Challenges of This Research

There are a few challenges in answering the research question. Firstly, the unfamiliar object in the study must be selected carefully so that it is not too well known to the majority and is intriguing enough to attract people’s attention. Additionally, the presented objects cannot be the same in repeated measures, so variations of the object should be available. Secondly, the task assigned to participants must be related to the characteristics of the object and natural enough for participants to look at it for a while. Hence, the tasks must be carefully designed to avoid a situation where participants are simply staring at the object without any cognitive engagement, so that the recorded eye movement metrics reflect people’s cognitive processes. Thirdly, the selection of eye movement metrics is crucial in answering the research question. We chose fixation, saccade, and heatmap to examine the impact of information intervention on users’ perception of unfamiliar objects, because these metrics provide a good indication of how users search for and process information. Fixation and saccade can help interpret the user’s cognitive processes, while heatmaps can reveal users’ visual patterns. Lastly, interpreting the eye movement metrics to answer the research question is also a challenge. Changes in eye movement metrics can have multiple causes and interpretations, so it is necessary to combine subjective data from participants with the objective data to obtain a holistic view.
3 Method

3.1 Study Design

In this study, four aviation engine models were selected as the unfamiliar objects for participants to observe. Aviation engines were used because they are well known but not familiar to most people and come in various models that share a similar structure. The independent variable in this study was whether or not related information about the aviation engines was presented to the participants. The participants were tasked with scoring each model, based on engine pictures, regarding its perceived quality, cohesion, and reliability. These three perceptions were selected through a review of the literature, as they each describe the characteristics of the engines from a different perspective and are easy to understand for assessment. Table 1 shows the four selected models, with the first two being tested before related information was provided to the participants, and the remaining two being tested after. All engine models were up-to-date, with little difference among them. Participants were required to score all four models, making this a within-subject study. The dependent variables were the subjective scores across the three perceptions and the eye-movement metrics (i.e., heatmap,
fixation duration, and saccade duration). The information about aviation engines was prepared by a student who had taken an aeroengine structure analysis and design module and passed with a 5.0 grade. The information was structured and presented in presentation slides with pictures and text.

Table 1. Models of aviation engines used in this study

Model 1      Model 2      Model 3   Model 4
WS Taihang   Trent 1000   EJ 200    CF 34-10A
3.2 Participants and Procedure

Seventeen participants (aged 20 to 27) were recruited from the university campus for this study. The participants had little knowledge about aviation engines, including their structure, design, and materials. After signing the consent form and receiving a brief introduction to the experiment, each participant was seated in front of an SMI iView X™ RED system, which tracked eye movements at a sampling frequency of 500 Hz. The experiment was conducted in a well-lit and soundproofed room. After configuration, the participants were asked to sit still during the experiment. The participants were asked to score each presented aviation engine model in terms of perceived quality, perceived cohesion, and perceived reliability based on the given picture. A warm-up session was given to each participant to become familiar with the experiment, so that they would not feel disoriented in the main sessions. In the measurement of one engine model, as shown in Fig. 1, each participant was allowed 8 s to look at the picture and then verbalised the score on a scale of 1 (strongly disagree) to 5 (strongly agree). At the end of the study, each participant was interviewed by the researchers to express their thoughts openly.
Fig. 1. Procedure of one repetition of the main body of this study
L. Zhang et al.
3.3 Data Collection and Analysis
Scores of all aviation engine models across the three perceptions were collected through participants’ verbal responses on a 5-point Likert scale, immediately after viewing each engine picture. Because the scores were subjective and non-parametric, descriptive and statistical analyses were conducted using SPSS to explore the impact of providing relevant information on people’s perceptions of unfamiliar objects. Wilcoxon tests were used with the alpha level set to 0.05. Eye movement metrics were collected through SMI Experiment Centre, and heatmaps were derived via SMI BeGaze. In this study, fixation duration and saccade duration were further analysed. Interview data were not thematically analysed, as they were intended as a complementary perspective to the subjective scores and eye movement metrics.
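The paired comparison described above can be sketched in plain Python as follows. This is a minimal illustration, not the SPSS procedure used in the study: it assumes that the Wilcoxon test was the paired signed-rank variant (the text does not say so explicitly), it uses the normal approximation with a tie correction and no continuity correction, and the scores are hypothetical values invented for illustration. The function name `wilcoxon_signed_rank` is likewise an assumption.

```python
import math

def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus, z, p_two_sided) via the normal approximation.

    Zero differences are dropped, and tied absolute differences share
    their average rank. No continuity correction is applied.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values get their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, w_minus, z, p

# Hypothetical 5-point scores for ten participants (illustration only).
scores_without_info = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]
scores_with_info = [4, 4, 3, 4, 4, 3, 3, 4, 5, 4]

w_plus, w_minus, z, p = wilcoxon_signed_rank(scores_without_info, scores_with_info)
print(f"W+ = {w_plus}, W- = {w_minus}, Z = {z:.3f}, p = {p:.4f}")
```

With 5-point scores, ties and zero differences are common, which is why the sketch handles both explicitly; a statistics package may apply different conventions and therefore report slightly different values.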
4 Results
4.1 Subjective Assessment Across Three Perceptions
The scores of the two engine models assessed without relevant information were compared to the scores of the other two engine models assessed with relevant information, across three perceptions: perceived quality, perceived cohesion, and perceived reliability. As shown in Fig. 2, a significant difference (Z = 2.775, p = 0.006) was found in the perceived quality of aviation engines, with the median score increasing from 3 to 4 when relevant information about aviation engines was provided. However, neither perceived cohesion (p = 0.105) nor perceived reliability (p = 0.162) showed a statistically significant difference. According to the descriptive analysis, the scores for perceived cohesion increased from 3 to 4 after participants were presented with the information, while the scores for perceived reliability remained unchanged at 4. Overall, scores across all three perceptions of aviation engines either increased or remained the same after participants received relevant information. This suggests that participants learned to appreciate and enjoy aviation engines more as they gained knowledge about them.
Fig. 2. Scores across three perceptions with and without relevant information. Median values are highlighted in black lines in each bar.
4.2 Eye Movement and Interview Data
This section analyses the eye-movement data, including (1) fixations and saccades, and (2) heatmaps. Fixation and saccade are two metrics widely used in eye-movement research, and they can provide insights into how participants processed the visual stimuli during the tasks (i.e., assessing engine quality, cohesion, and reliability). Heatmaps visualize participants’ fixation intensity in red, yellow, and green, representing high, medium, and low levels of intensity. It has been shown that combining qualitative data with quantitative eye-tracking data raises the explanatory power of eye-tracking analysis [27].
Fixation and Saccade. Since fixation and saccade durations were recorded on a time scale, which is continuous data, mean values were used to analyse the fixation and saccade datasets. Table 2 shows the mean fixation and saccade durations before and after engine-related information was provided, for each of the three perceptions and in total. The results show that the fixation duration decreased after information was presented (mean = 6412.94 ms before, mean = 6257.24 ms after), and the saccade duration increased (mean = 1240.44 ms before, mean = 1304.91 ms after). According to existing work, a fixation indicates a period of information processing, and fixation duration is positively related to cognitive load and difficulty in extracting information [17]. The decrease in fixation duration found in this study might suggest that the cognitive load was reduced after related information was provided. In other words, visual information was extracted and processed more easily when relevant information about aviation engines had been obtained. The longer saccade duration implies that the participants spent more time searching for information that might help complete the scoring task at hand than processing it.

Table 2. Fixation and saccade data for the three perceptions and total

| Perception | Information | FDT [ms] | SDT [ms] |
|---|---|---|---|
| Quality | without | 6341.21 | 1311.85 |
| Quality | with | 6094.04 | 1461.01 |
| Cohesion | without | 6514.36 | 1177.59 |
| Cohesion | with | 6446.56 | 1170.40 |
| Reliability | without | 6386.41 | 1229.91 |
| Reliability | with | 6231.11 | 1283.32 |
| Total | without | 6412.94 | 1240.44 |
| Total | with | 6257.24 | 1304.91 |

FDT: Fixation Duration Total; SDT: Saccade Duration Total
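To make the before/after comparison in Table 2 easier to read, the following short sketch computes the change in each mean duration per perception. The numbers are copied from Table 2; the dictionary layout and the helper name `changes` are assumptions made for this illustration. The differences are purely descriptive, since no significance test on these means is reported.

```python
# Mean durations in ms copied from Table 2: (FDT, SDT) for each condition.
table2 = {
    "Quality":     {"without": (6341.21, 1311.85), "with": (6094.04, 1461.01)},
    "Cohesion":    {"without": (6514.36, 1177.59), "with": (6446.56, 1170.40)},
    "Reliability": {"without": (6386.41, 1229.91), "with": (6231.11, 1283.32)},
    "Total":       {"without": (6412.94, 1240.44), "with": (6257.24, 1304.91)},
}

def changes(row):
    """Return (FDT change, SDT change) as 'with' minus 'without', in ms."""
    (fdt_before, sdt_before), (fdt_after, sdt_after) = row["without"], row["with"]
    return round(fdt_after - fdt_before, 2), round(sdt_after - sdt_before, 2)

deltas = {name: changes(row) for name, row in table2.items()}

for name, (d_fdt, d_sdt) in deltas.items():
    print(f"{name:<12} FDT change: {d_fdt:+8.2f} ms   SDT change: {d_sdt:+8.2f} ms")
```

Printing the changes per perception shows whether each row follows the overall trend reported for the totals.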
Heatmaps. As shown in Fig. 3, participants’ viewing mode demonstrated two different patterns before and after getting access to related information. First, when perceived quality was prompted, the fixation area was more focused before the information was
given, compared to after, see Fig. 3(a). Second, when perceived cohesion and reliability were prompted, the fixation area was more spread-out before the information was given, compared to after, see Fig. 3(b) and (c). In other words, regarding the perceived quality, participants displayed a concentrated viewing pattern before receiving relevant information, while after receiving relevant information, multiple visual highlights were directed at the engines’ components, such as air intake, exhaust cone, and pipelines. This suggests that participants learned to identify and appreciate different aspects of the engine more efficiently after obtaining relevant information. For perceived cohesion and reliability, participants showed a more dispersed viewing pattern before receiving relevant information, while after receiving relevant information, the viewing pattern became more focused. This result indicated that participants learned how to identify the cohesive structure and reliable features of the engine more efficiently after obtaining relevant information.
Fig. 3. Heatmaps formed by participants’ visual data in three dimensions: (a) perceived quality; (b) perceived cohesion; (c) perceived reliability
The eye movement patterns before and after having the relevant information are reported in this section. Several participants made the following comment: “after acquiring information about aviation engine, I seemed to know what I saw. I was thinking about the function of each component when I was scoring”. This reflects the eye-movement finding that the fixation duration decreased after relevant information was given, meaning that people spent less time processing the components they saw. Since the scoring questions assessed different aspects of aviation engines, the participants associated different engine components with the three questions as they saw fit. For example, “the engine’s air intake and exhaust cone were associated with the sense of quality”, “the pipelines were in relation to cohesion and reliability”. Similarly, some participants expressed that, “when considering quality, I looked at the material and colours of the engine as a whole, as well as the appearance of the engine’s components”, “when assessing the cohesion of the engine, I checked its pipelines to determine if they are crowded”, and “my evaluation of the engine’s reliability was influenced by the pipeline arrangement”. We believe that after acquiring the relevant information, people tended to process the visual information with more caution and made more effort to find evidence
for their scores. This finding could be explained by the theory of bottom-up and top-down processing: people with prior knowledge or past experience tend to take a top-down approach, in which they actively search for what they have in mind.
5 Discussion
This section discusses the findings of this study, including theoretical contributions and practical implications. Theoretical contributions mainly focus on comparing the findings of this study with previous studies. Practical implications offer design considerations that might guide information design and future research.
5.1 Theoretical Contributions
In this study, we found a significant difference in the perceived quality of aviation engines before and after providing relevant information to participants. This indicates that relevant information and knowledge can significantly impact an individual’s views in a short period. Regarding eye movement, we observed changes in fixation duration, saccade duration, and heatmap patterns before and after the provision of relevant information. Fixation duration is related to cognitive processes, with a longer duration indicating a more difficult cognitive process [18]. Our results showed that the fixation duration shortened after relevant information was provided, indicating a reduction in cognitive workload. Saccades are a behaviour used to locate parts of visual interest [26]. Our study found that saccade duration increased after relevant information was acquired, suggesting that participants spent more time searching for additional information to help them complete the assessment. A heatmap is a tool used to display people’s visual patterns, and users tend to concentrate on stimuli that are highly relevant to their goals [26]. Our heatmap analysis showed different patterns before and after the provision of information. After receiving relevant information, participants tended to associate engine components with their assessment tasks.
5.2 Practical Implications
The findings of this study also have implications for information design and highlight the importance of effective information presentation.
Firstly, it is widely recognized that the presentation of information is critical in various contexts, such as social media, e-commerce, and human-computer interaction. This study demonstrates how people’s perceptions of an unfamiliar object can be altered with a brief introduction of relevant information. This means that information presentation in certain scenarios, such as museums or exhibitions that display unfamiliar items, should be designed with extra care. It would be interesting to explore the combination of different types of experiences to effectively introduce items to visitors. Secondly, people’s attention changes as their knowledge and experience change, so the design of an experience should consider these differences. For example, online education platforms can
be designed to automatically adjust teaching materials or patterns based on learning difficulty. Finally, the study shows that the intervention of information affects users’ appreciation of aviation engines. This means that the deliberate design of information distribution and presentation can shape not only a temporary user experience but also people’s long-term interests driven by internal knowledge.
6 Conclusions, Limitations and Future Work
6.1 Conclusions
This study aimed to explore the impact of information on people’s perception of unfamiliar objects, with a focus on eye movements. Seventeen participants took part in a within-subject experiment to evaluate aviation engines based on their perceived quality, cohesion, and reliability. Four models of aviation engines were used as unfamiliar objects, with two models being assessed without relevant information and two models being assessed with relevant information. Results showed a significant difference in the perceived quality of aviation engines after providing relevant information. Eye movement metrics (fixation, saccade, and heatmap) also indicated changes in people’s perception of aviation engines. The findings of this study contribute to the understanding and implications of information design and highlight the effectiveness of well-designed information in increasing users’ interest.
6.2 Limitations and Future Work
Firstly, the study chose aviation engines as the unfamiliar object to test the impact of relevant information on people’s perception. However, variations among the engine models used in this study may have influenced the results. Future research could explore the impact of information on a wider range of objects, including unfamiliar concepts. Secondly, the tasks assigned to participants were limited to evaluating perceived quality, cohesion, and reliability. Future research could examine the impact of relevant information on other factors, such as people’s actual performance. Finally, the participants in this study were limited to university students, which may not reflect the diversity of the general population. Further research could recruit a more diverse participant group to better represent the general population.
Acknowledgements. This work is supported by the Shuangchuang Programme of Jiangsu Province under Grant JSSCBS20210190 and the Fund of Prospective Layout of Scientific Research of Nanjing University of Aeronautics and Astronautics.
References
1. Kucewicz, M.T., et al.: Pupil size reflects successful encoding and recall of memory in humans. Sci. Rep. 8(1), 4949 (2018)
2. Johnson, J.A.: Personality psychology: methods. In: International Encyclopedia of the Social & Behavioral Sciences, pp. 11313–11317. Pergamon (2001)
3. Hartson, R., Pyla, P.S.: The UX Book: Process and Guidelines for Ensuring a Quality User Experience. Elsevier (2012)
4. Murphy, P.R., Vandekerckhove, J., Nieuwenhuis, S.: Pupil-linked arousal determines variability in perceptual decision making. PLoS Comput. Biol. 10(9), e1003854 (2014)
5. Gibson, J.J.: The Ecological Approach to Visual Perception: Classic edition. Psychology Press (2014)
6. Adaval, R., Saluja, G., Jiang, Y.: Seeing and thinking in pictures: a review of visual information processing. Consum. Psychol. Rev. 2(1), 50–69 (2019)
7. Gregory, R.L.: Eye and Brain: The Psychology of Seeing. Princeton University Press (2015)
8. French, R.M., Mareschal, D., Mermillod, M., Quinn, P.C.: The role of bottom-up processing in perceptual categorization by 3- to 4-month-old infants: simulations and data. J. Exp. Psychol. Gen. 133(3), 382 (2004)
9. Metz, B., Schoemig, N., Krueger, H.P.: How is driving-related attention in driving with visual secondary tasks controlled? Evidence for top-down attentional control. In: Driver Distraction and Inattention, pp. 83–102. CRC Press (2017)
10. Gegenfurtner, A., Boucheix, J.M., Gruber, H., Hauser, F., Lehtinen, E., Lowe, R.K.: The gaze relational index as a measure of visual expertise. J. Expert. 3 (2020)
11. Meri, R., Farah, R., Horowitz-Kraus, T.: Children with dyslexia utilize both top-down and bottom-up networks equally in contextual and isolated word reading. Neuropsychologia 147, 10757 (2020)
12. Sussman, T.J., Jin, J., Mohanty, A.: Top-down and bottom-up factors in threat-related perception and attention in anxiety. Biol. Psychol. 121, 160–172 (2016)
13. Guo, F., Ding, Y., Liu, W., Liu, C., Zhang, X.: Can eye-tracking data be measured to assess product design?: Visual attention mechanism should be considered. Int. J. Ind. Ergon. 53, 229–235 (2016)
14. Handy, T.C., Smilek, D., Geiger, L., Liu, C., Schooler, J.W.: ERP evidence for rapid hedonic evaluation of logos. J. Cogn. Neurosci. 22(1), 124–138 (2010)
15. Lin, M.H., Wang, C.Y., Cheng, S.K., Cheng, S.H.: An event-related potential study of semantic style-match judgments of artistic furniture. Int. J. Psychophysiol. 82(2), 188–195 (2011)
16. Slanzi, G., Balazs, J.A., Velásquez, J.D.: Combining eye tracking, pupil dilation and EEG analysis for predicting web users click intention. Inf. Fus. 35, 51–57 (2017)
17. Poole, A., Ball, L.J.: Eye tracking in HCI and usability research. In: Encyclopedia of Human Computer Interaction, pp. 211–219. IGI Global (2006)
18. Just, M.A., Carpenter, P.A.: Eye fixations and cognitive processes. Cogn. Psychol. 8(4), 441–480 (1976)
19. Shojaeizadeh, M., Djamasbi, S., Trapp, A.C.: Density of gaze points within a fixation and information processing behavior. In: Antona, M., Stephanidis, C. (eds.) Universal Access in Human-Computer Interaction. Methods, Techniques, and Best Practices. UAHCI 2016. LNCS, vol. 9737, pp. 465–471. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40250-5_44
20. Horstmann, N., Ahlgrimm, A., Glöckner, A.: How distinct are intuition and deliberation? An eye-tracking analysis of instruction-induced decision modes. Judgm. Decis. Mak. 4(5), 335–354 (2009)
21. Fu, B., Noy, N.F., Storey, M.A.: Eye tracking the user experience–an evaluation of ontology visualization techniques. Semantic Web 8(1), 23–41 (2017)
22. Goldberg, J.H., Kotval, X.P.: Computer interface evaluation using eye movements: methods and constructs. Int. J. Ind. Ergon. 24(6), 631–645 (1999)
23. Underwood, G., Radach, R.: Eye guidance and visual information processing: reading, visual search, picture perception and driving. In: Eye Guidance in Reading and Scene Perception, pp. 1–27. Elsevier Science Ltd. (1998)
24. Mello-Thoms, C., Nodine, C.F., Kundel, H.L.: What attracts the eye to the location of missed and reported breast cancers? In: Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, pp. 111–117 (2002)
25. Goldberg, J.H., Wichansky, A.M.: Eye tracking in usability evaluation: a practitioner’s guide. In: The Mind’s Eye, pp. 493–516. North-Holland (2003)
26. Bera, P., Soffer, P., Parsons, J.: Using eye tracking to expose cognitive processes in understanding conceptual models. MIS Q. 43(4), 1105–1126 (2019)
27. Djamasbi, S., Siegel, M., Tullis, T.: Visual hierarchy and viewing behavior: an eye tracking study. In: Jacko, J.A. (ed.) HCI 2011. LNCS, vol. 6761, pp. 331–340. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21602-2_36
Data Visualization and Big Data
The Nikkei Stock Average Prediction by SVM
Takahide Kaneko and Yumi Asahi
Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-Ku, Tokyo, Japan
[email protected]
Abstract. The problem of how to extract structures hidden in large amounts of data is called “data mining”. Using a support vector machine (SVM), which is one of the data mining methods, we predicted the rise and fall of the Nikkei Stock Average one day, one week, and one month later. As explanatory variables, we used the historical rates of change of US stock prices and the Nikkei Stock Average. The analysis showed that the prediction accuracy for the Nikkei Stock Average one day later could be stably improved compared to random prediction. In addition, SHAP was used to analyze whether the explanatory variables were appropriate. As a result, we found that the effect of each explanatory variable on the analysis results differs depending on how the training set and test set are divided. Making stock price predictions using SVMs more concrete and convincing remains a future task.
Keywords: Support Vector Machine · Stock price prediction · SHAP
1 Introduction
In recent years, computers and the Internet have developed, and a large amount of information has been stored on the web. At convenience stores and online shops, customer data such as purchasing history is consolidated every day, and economic data on stock prices and exchange rates can be easily obtained online. These data are gathered for a purpose: customer data is collected to increase sales, and stock price data is collected to predict future price movements. However, the data obtained are too large to serve that purpose by inspecting the raw data, so the necessary information must be extracted from large-scale data. The question of how to extract this hidden structure is called “data mining”, and research on it is ongoing. One data mining method is the support vector machine (SVM) used in this study. SVM is one of the algorithms commonly used in data analysis practice because of its generalization performance and its wide range of applications. Based on the idea of maximizing the margin, it is mainly used for binary classification problems, and it can also be applied to multi-class classification and regression problems.
The purpose of this study is to use SVM to predict the Nikkei Stock Average. There is existing research on predicting stock prices using SVM. For example, research [1] classifies whether a company’s stock price “rises” or “drops” after a news article is distributed. Research [1] analyzed the rise and fall of stock prices a few minutes later. In this study, we focus on the Nikkei Stock Average rather than individual companies, and we predict its rise or fall one day, one week, and one month later, rather than a few minutes later. In research [1], the classification distinguished between cases where the stock price went up and cases where it went down. In this study, we also analyze a classification that adds a case where there is not much change. The reason for adding this class is that, considering transaction costs, it is common to avoid investing if the stock price does not change much. The explanatory variables are the past rates of change of US stock prices and of the Nikkei Stock Average. The US economy has a major impact on Japan. Taking this impact into account, and also focusing on the past price movements of the Nikkei Stock Average itself, we treat these rates of change as explanatory variables for predicting the Nikkei Stock Average. The results obtained by verification clarify the suitability of, and the issues with, prediction by SVM.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 211–221, 2023. https://doi.org/10.1007/978-3-031-35132-7_15
2 Method
Here, we describe the basics of SVM, which is one form of supervised learning, and introduce previous research using SVM in the financial field.
2.1 Support Vector Machine (SVM)
The support vector machine (SVM) is a machine learning algorithm that is often used in data analysis because of its generalization performance and wide range of applications. Based on an idea called margin maximization, it is mainly used for binary classification problems. Applications to multi-class classification and regression problems are also possible. The features of SVM include that it is less prone to overfitting and that it can make highly accurate predictions even with relatively small amounts of data. However, its computational cost is high compared to other machine learning algorithms, making it unsuitable for large datasets.
2.2 Margin Maximization
The (n−1)-dimensional hyperplane that classifies n-dimensional data is called the separating hyperplane, and the distance from the separating hyperplane to the closest data points (the support vectors) is called the margin. Maximizing this margin is the goal of SVM. A margin that assumes linearly separable data is called a hard margin. A soft margin is a margin that allows some misclassification, on the premise that the data are not linearly separable.
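The notion of a margin can be illustrated with a minimal sketch that measures the margin of a given separating hyperplane on toy data. It does not solve the SVM optimization problem; it only compares two hand-picked candidate hyperplanes of the form w·x + b = 0. The points, labels, candidate hyperplanes, and function names are all hypothetical and chosen for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Smallest label-weighted distance to the hyperplane.

    The value is positive only if every point lies on the side of the
    hyperplane that matches its label (+1 or -1).
    """
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Toy, linearly separable 2-D data (hypothetical).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]

# Two candidate separating hyperplanes (hypothetical).
margin_diagonal = margin((1.0, 1.0), -2.0, points, labels)  # x1 + x2 = 2
margin_vertical = margin((1.0, 0.0), -1.0, points, labels)  # x1 = 1

print(f"margin of x1 + x2 = 2: {margin_diagonal:.3f}")
print(f"margin of x1 = 1:      {margin_vertical:.3f}")
```

Both hyperplanes separate the toy data, but they do so with different margins; margin maximization prefers the one whose closest points are farthest away.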
2.3 Kernel Method
In most cases, the data cannot be linearly separated as they are. However, linear separation may become possible after applying a nonlinear transformation that maps the original data to a higher-dimensional space. A machine learning method that applies such a high-dimensional nonlinear transformation to the feature vectors in the training data and then performs linear discrimination in the transformed space is called the kernel method.
2.4 Kernel Tricks and SVM
The kernel method enables linear separation by transforming the data into a higher dimension. However, calculation may become difficult as the dimensionality of the data increases. The kernel trick is used to address this. A high-dimensional feature vector $\Phi(x_n)$ $(n = 1, 2, \ldots, i, \ldots, j, \ldots, N)$ is obtained by nonlinearly transforming the feature vector $x_n$ $(n = 1, 2, \ldots, i, \ldots, j, \ldots, N)$ of the original data. The inner product $x_i \cdot x_j$ of the data is required when solving the quadratic programming problem that arises in SVM. Similarly, a linear SVM in the high-dimensional space requires the inner product $\Phi(x_i) \cdot \Phi(x_j)$ of $\Phi(x_i)$ and $\Phi(x_j)$, which are obtained by transforming the data $x_i$ and $x_j$. By defining this inner product directly through a kernel function $K(x_i, x_j)$, the need to define the concrete form of $\Phi(x_n)$ is eliminated. This property of the kernel function is called the kernel trick, and it prevents the calculation from becoming difficult due to the high dimensionality. Typical kernel functions include (1) the polynomial kernel, (2) the Gaussian kernel, and (3) the sigmoid kernel. In this study, the Gaussian kernel was used for the analysis.
(1) Polynomial kernel
$$\Phi(x_i) \cdot \Phi(x_j) = K(x_i, x_j) = (x_i \cdot x_j + c)^d$$
(2) Gaussian kernel
$$\Phi(x_i) \cdot \Phi(x_j) = K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
(3) Sigmoid kernel
$$\Phi(x_i) \cdot \Phi(x_j) = K(x_i, x_j) = \tanh(c\, x_i \cdot x_j + \theta)$$
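The three kernel functions and the kernel trick can be sketched in plain Python as follows. The default parameter values (c, d, γ, θ) are arbitrary assumptions for illustration, not the values used in this study. The helper `explicit_poly2_features` is a hypothetical name for the explicit feature map of the degree-2 polynomial kernel on 2-D inputs; it is included only to check numerically that the kernel value equals the inner product in the transformed space.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    return (dot(x, y) + c) ** d

def gaussian_kernel(x, y, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, c=1.0, theta=0.0):
    return math.tanh(c * dot(x, y) + theta)

def explicit_poly2_features(x, c=1.0):
    """Explicit map Phi for the polynomial kernel with d = 2 on 2-D input.

    dot(Phi(x), Phi(y)) equals (x.y + c)^2, so the kernel gives the same
    value without ever building these six features.
    """
    x1, x2 = x
    s = math.sqrt(2.0)
    r = math.sqrt(2.0 * c)
    return (x1 * x1, x2 * x2, s * x1 * x2, r * x1, r * x2, c)

xi = (1.0, 2.0)
xj = (3.0, -1.0)

k_poly = polynomial_kernel(xi, xj)
k_explicit = dot(explicit_poly2_features(xi), explicit_poly2_features(xj))

print(f"polynomial kernel:          {k_poly:.6f}")
print(f"explicit feature-space dot: {k_explicit:.6f}")
print(f"Gaussian kernel:            {gaussian_kernel(xi, xj):.6f}")
print(f"sigmoid kernel:             {sigmoid_kernel(xi, xj):.6f}")
```

For the Gaussian kernel the corresponding feature space is infinite-dimensional, so no finite explicit map like the one above can be written down; this is where evaluating the kernel directly becomes essential.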
3 Empirical Research
3.1 Usage Data
In this research, we use past Nikkei 225, NY Dow, and S&P500 prices to predict fluctuations in the Nikkei 225 one day, one week, and one month later. Table 1 below shows the source, type, handling, and period of the data used in the analysis.
Table 1. Summary of usage data

| Item | Content |
|---|---|
| Data source | Fact Set |
| Data type | ➀ Nikkei Stock Average ➁ NY Dow ➂ S&P500 |
| Data handling (forecast after 1 day) | Obtain each daily data series and calculate the rate of change. The rates of change on the previous day and the day before the previous day are used as explanatory variables (feature values). In addition, the daily rate of change of the Nikkei Stock Average is used for class classification. |
| Data handling (forecast after 1 week) | Obtain each weekly data series and calculate the rate of change. The rates of change in the previous week and the week before last are used as explanatory variables (feature values). In addition, the weekly rate of change of the Nikkei Stock Average is used for class classification. |
| Data handling (forecast after 1 month) | Obtain each monthly data series and calculate the rate of change. The rates of change in the previous month and the month before last are used as explanatory variables (feature values). In addition, the monthly rate of change of the Nikkei Stock Average is used for class classification. |
| Data period (forecast after 1 day) | April to June 2022 |
| Data period (forecast after 1 week) | September 2020 to August 2022 |
| Data period (forecast after 1 month) | September 2012 to August 2022 |
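The data handling summarized in Table 1 can be sketched as follows. This is a minimal illustration under several assumptions: the price series are hypothetical, the three series are assumed to be already aligned on the same dates (differences in trading calendars and time zones between Japan and the US are ignored), and each sample is assumed to use the two most recent rates of change of each of the three indices, giving six features. The function names are invented for this sketch.

```python
def rate_of_change(prices):
    """Period-over-period rate of change: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

def build_samples(nikkei, dow, sp500):
    """Return (features, target) pairs.

    features: for each index, the rate of change one period back and
              two periods back (six values in total).
    target:   the Nikkei rate of change in the current period, which is
              later converted into a class label.
    """
    rates = [rate_of_change(series) for series in (nikkei, dow, sp500)]
    samples = []
    for t in range(2, len(rates[0])):
        features = []
        for series in rates:
            features.extend([series[t - 1], series[t - 2]])
        samples.append((features, rates[0][t]))
    return samples

# Hypothetical, already-aligned closing prices (illustration only).
nikkei = [100.0, 102.0, 101.0, 103.0, 104.0]
dow = [200.0, 202.0, 204.0, 203.0, 205.0]
sp500 = [50.0, 51.0, 50.5, 51.5, 52.0]

samples = build_samples(nikkei, dow, sp500)
for features, target in samples:
    print([f"{value:+.4f}" for value in features], f"target {target:+.4f}")
```

Five prices yield four rates of change, and because two lagged rates are needed, only two samples remain; the same construction applies to weekly and monthly series.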
3.2 Analysis Procedure
➀ Classification according to the price movement of the Nikkei Stock Average
Classify using the rate of change of the acquired Nikkei Stock Average. Two methods are used: one classifies according to whether the price increased or decreased, and the other classifies using the mean μ and standard deviation σ of the rate of change over the acquired period.
• Classification method 1
(a) When the stock price rises: the rate of change is 0 or more.
(b) When the stock price falls: the rate of change is less than 0.
• Classification method 2
(a) When the stock price rises: the rate of change exceeds μ + σ.
(b) When the stock price falls: the rate of change is less than μ − σ.
(c) Not much change: the rate of change is at least μ − σ and at most μ + σ.
➁ Perform supervised learning with kernel SVM
• Learning procedure
(1) Divide the dataset into a training set and a test set.
(2) Perform a grid search using leave-one-out cross-validation on the training set to find appropriate parameters.
(3) Perform SVM learning on the training set with the obtained parameters.
(4) Evaluate the performance using the test set.
➂ Perform learning multiple times and evaluate the SVM
The data are randomly split into a training set and a test set, and supervised learning is performed multiple times. The following three indicators are used for each analysis.
(1) Accuracy of training set
(2) Accuracy of test set
(3) Average F value
Of these, the average F value is the macro average over the classes. Taking the macro average prevents the F value from becoming high even when the classification is extremely biased. In classification method 1, if the probability of the stock price going up equals the probability of it going down and the prediction is made randomly, the hit rate and the average F value will both be 0.5, which serves as the baseline for evaluating the accuracy of the SVM analysis. Likewise, in classification method 2, under the same assumption, the hit rate and the average F value are both 0.33, which serves as the baseline for evaluating the accuracy of the SVM analysis.
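The two labelling rules and the macro-averaged F value described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the class names, the function names, and the use of the population standard deviation are assumptions, and the rates of change and predictions are hypothetical.

```python
import statistics

def label_method1(rates):
    """Two classes: 'up' if the rate of change is 0 or more, else 'down'."""
    return ["up" if r >= 0 else "down" for r in rates]

def label_method2(rates):
    """Three classes based on the mean and standard deviation of the rates."""
    mu = statistics.mean(rates)
    sigma = statistics.pstdev(rates)  # population SD (an assumption)
    labels = []
    for r in rates:
        if r > mu + sigma:
            labels.append("up")
        elif r < mu - sigma:
            labels.append("down")
        else:
            labels.append("flat")
    return labels

def macro_f(y_true, y_pred):
    """Unweighted mean of the per-class F values."""
    classes = sorted(set(y_true) | set(y_pred))
    f_values = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_values.append(2 * precision * recall / total if total else 0.0)
    return sum(f_values) / len(f_values)

# Hypothetical rates of change (illustration only).
rates = [-0.03, -0.01, 0.0, 0.01, 0.03]
labels1 = label_method1(rates)
labels2 = label_method2(rates)
print(labels1)
print(labels2)

# A biased classifier that always predicts 'flat' (hypothetical).
y_true = ["up"] * 2 + ["flat"] * 6 + ["down"] * 2
y_pred = ["flat"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
biased_macro_f = macro_f(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}, macro F = {biased_macro_f:.2f}")
```

The final example shows why the macro average is used: a classifier that always predicts the majority class can reach a high accuracy while its macro-averaged F value stays low, because the classes it never predicts contribute an F value of zero.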
4 Result
Based on the analysis procedure shown in Sect. 3.2, learning by SVM was performed 100 times for each classification method. Table 2 below summarizes the averages of the evaluation indices. Table 2 shows that both the accuracy and the average F value for the prediction one day later exceeded the baseline. However, the average F value for the predictions one week and one month later is below the baseline, indicating that the analysis accuracy is poor. Therefore, the proposed method is considered to have a certain degree of predictive power for fluctuations in the Nikkei Stock Average one day later.
Table 2. Analysis accuracy evaluation index results for each classification
Detailed analysis results after 1 day Figures 1 and 2 below show the detailed analytical results of stock price forecasts for the next day. From Figs. 1 (left) and 1 (right), for classification method 1 (2 categories), both the hit rate and the F value are consistently above 0.5, suggesting that the analysis accuracy is stable. On the other hand, from Figs. 2 (left) and 2 (right), for classification method 2 (3 categories), the accuracy rate is consistently above 0.33, but the average F value is often below 0.33, so the analysis accuracy is unstable. Classification method 1 (2 categories)
Legend: Accuracy of training set; Accuracy of test set; Average F value
Fig. 1. (left): Accuracy rate for each data when SVM learning is performed 100 times. (right): Each average F value when learning by SVM is performed 100 times
Classification method 2 (3 categories)
Fig. 2. (left): Accuracy rate for each data when SVM learning is performed 100 times. (right): Each average F value when learning by SVM is performed 100 times
Detailed analysis results after 1 week Figures 3 and 4 below show the detailed analysis results of stock price forecasts for the next week. From Fig. 3, we can see that the hit rate for classification method 2 is consistently above 0.5, but from Fig. 4, we can see that the average F value is low, suggesting that the classification is excessively biased. Classification method 1 (2 categories)
Fig. 3. (left): Accuracy rate for each data when SVM learning is performed 100 times. (right): Each average F value when learning by SVM is performed 100 times
Classification method 2 (3 categories)
Fig. 4. (left): Accuracy rate for each data when SVM learning is performed 100 times. (right): Each average F value when learning by SVM is performed 100 times
Detailed analysis results after 1 month Figures 5 and 6 below show the detailed analysis results of the stock price forecast one month later. From Fig. 5, we can see that the hit rate for classification method 2 is consistently above 0.5, but from Fig. 6, we can see that the average F value is low, suggesting that the classification is excessively biased. Classification method 1 (2 categories)
Fig. 5. (left): Accuracy rate for each data when SVM learning is performed 100 times. (right): Each average F value when learning by SVM is performed 100 times
Classification method 2 (3 categories)
Fig. 6. (left): Accuracy rate for each data when SVM learning is performed 100 times. (right): Each average F value when learning by SVM is performed 100 times
5 Discussion
In this study, we used past Nikkei 225, Dow Jones Industrial Average, and S&P500 prices as explanatory variables to predict fluctuations in the Nikkei Stock Average. To apply this method in practice, it is necessary to understand concretely how much each explanatory variable affects the results. Here, we used Shapley additive explanations (SHAP) as an indicator.
5.1 SHAP
SHAP is an abbreviation for Shapley additive explanations. It is a method of calculating how much each feature contributes to the prediction result of a model, based on the concept of the Shapley value proposed in the field of game theory. It makes it possible to visualize the effect of increasing or decreasing the value of a feature.
5.2 Shapley Value
The Shapley value was proposed in the field of cooperative game theory. Cooperative game theory considers how to distribute rewards according to each player’s degree of contribution in a game in which multiple players cooperate to clear the game. The contribution of each player is obtained as a Shapley value.
5.3 Overview of Calculation of SHAP Value
By regarding each player as a feature and the reward as the predicted value, we can apply the Shapley value concept to machine learning, but there are differences in the details. In SHAP, “how much each feature value affects the predicted value” is measured by how much each feature value raises or lowers the predicted value from the average, whereas with the Shapley value the reward is 0 if no one plays the game. SHAP therefore outputs contributions in both directions. The problem when applying the Shapley value concept to machine learning is how to obtain the predicted value when “some features are missing”. A method called KernelSHAP uses marginalization, which fixes the “present” features and averages the predicted values over the “absent” features.
5.4 Implementation Results of SHAP
We used SHAP to visualize the impact of each feature value (Table 3) on the stock price prediction one day later.
Table 3. Feature value
Feature     Description
Feature0    NY Dow 2 days ago
Feature1    NY Dow 1 day ago
Feature2    SP500 2 days ago
Feature3    SP500 1 day ago
Feature4    Nikkei average 2 days ago
Feature5    Nikkei average 1 day ago
Figure 7 (left) shows the results of the May stock price prediction using the April dataset as the training set, and Fig. 7 (right) shows the results of the June stock price prediction using the May dataset as the training set. The horizontal axis represents the SHAP value. The labels “High” and “Low” on the right side indicate an increase and a decrease in stock prices, respectively. As a result of the analysis using SHAP, we found that the impact of each explanatory variable on the analysis results differs depending on how the training set and test set are divided. It is a future task to make stock price prediction using SVM more concrete and convincing.
Fig. 7. (left): Result of May stock price prediction from April data set. (right): Result of June stock price prediction from May data set
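The Shapley value of Sect. 5.2 and the marginalization idea of Sect. 5.3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it enumerates all feature subsets exactly, whereas KernelSHAP approximates this computation; the function `model`, the `background` rows, and the `instance` are hypothetical stand-ins for the SVM and the stock data.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values for a coalition value function over players 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(x):
    """Hypothetical prediction function (not the paper's SVM)."""
    return 2.0 * x[0] + x[1] * x[2]


# Hypothetical background data used to average over "absent" features.
background = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 1.0, 3.0)]
instance = (1.0, 1.0, 2.0)


def value(present):
    """Fix the present features to the instance; average predictions over the rest."""
    total = 0.0
    for row in background:
        x = tuple(instance[j] if j in present else row[j] for j in range(len(instance)))
        total += model(x)
    return total / len(background)


phi = shapley_values(value, len(instance))
baseline = value(frozenset())  # average prediction when every feature is absent
print("baseline:", baseline)
print("contributions:", phi)
print("baseline + sum:", baseline + sum(phi), "prediction:", model(instance))
```

The contributions can be positive or negative, and together with the baseline (the average prediction) they add up to the prediction for the instance, which matches the description of SHAP as measuring how far each feature moves the prediction from the average.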
6 Conclusion
In this research, based on the past rate of change of US stocks and the Nikkei Stock Average, we used SVM, a supervised learning method, to predict the Nikkei Stock Average one day, one week, and one month later, and we verified its practicality. As a result, we confirmed that it is possible to improve the accuracy of stock price prediction for the next day compared to random prediction. In the two-class classification of “when the stock price rises” and “when the stock price falls”, a high hit rate and average F value were stably obtained. In addition, in the 3-class classification of “when stock prices go up”, “when stock prices go down”, and “not much change”, there were variations in the average F value, but overall, the hit rate and average F value were high. On the other hand, it became clear that the proposed method is not practical for predicting stock prices one week and one month later. A high hit rate was obtained, but the average F value was low. This suggested that the analysis was overly biased. In addition, we conducted an analysis using SHAP to determine whether the explanatory variables were appropriate for the one-day stock price prediction, which had high prediction accuracy. As a result, we found that the effect of each explanatory variable on the analysis results differs depending on how the training data and evaluation data are divided. We leave it as a future task to make stock price predictions using SVMs more concrete and convincing.
References
1. Yusuke, I., Danushka, B., Hitoshi, I.: Using news articles of foreign exchange to predict stock prices by SVMs. SIG-FIN-012-09
2. Bollen, J., Mao, H., Zeng, X.: Twitter mood predicts the stock market. J. Comput. Sci. 2(1), 1–8 (2011)
3. Wen, F., Xiao, J., He, Z., Gong, X.: Stock price prediction based on SSA and SVM. Procedia Comput. Sci. 31, 625–631 (2014)
4. Tanaka, K., Nakagawa, H.: Proposal of SVM method for determining corporate ratings and validation of effectiveness by comparison with sequential logit model. Trans. Oper. Res. Soc. Jpn. 57, 92–111 (2014)
5. Lahmiri, S.: A comparison of PNN and SVM for stock market trend prediction using economic and technical information. Int. J. Comput. Appl. 29(3), 0975–8887 (2011)
6. Akaho, S.: Kaneru tahennryou kaiseki (Kernel multivariate analysis). Iwanami-Shotenn, Japan (2008)
What Causes Fertility Rate Difference Among Municipalities in Japan
Shigeyuki Kurashima and Yumi Asahi(B)
Tokyo University of Science, Kagurazaka, Shinjuku-ku, Tokyo 1628601, Japan
Abstract. The declining birth rate in Japan poses a significant threat to the survival of half of municipalities, leading to a recent surge in studies exploring the causes of this phenomenon using Total Fertility Rate (TFR) as a benchmark for municipal unit data. However, focusing solely on TFR is insufficient when investigating the municipal-level reasons for the declining birthrate. In this paper, we conducted a thorough analysis of fertility decline by considering TFR and women’s migration patterns, using both Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR) to account for regional differences. Our findings reveal that low income and women’s social advancement, commonly regarded as factors contributing to declining fertility, do not significantly impact this phenomenon. Additionally, we observed regional disparities in TFR that cannot be explained by the variables incorporated in this study, with a trend of higher TFR in the west and lower TFR in the east. Moreover, by examining successful population maintenance policies in various regions, we identified common steps that have proven effective in maintaining population growth.
Keywords: birthrate · female moving in and moving out · geographically weighted regression model
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 222–235, 2023. https://doi.org/10.1007/978-3-031-35132-7_16
1 Introduction
In 2005, Japan experienced its lowest Total Fertility Rate (TFR) in modern history, with a rate of 1.26. Japan could experience a decrease of 11 million people within a decade, which represents approximately 10% of the population. Despite variations in TFR by region, TFR is commonly used as a benchmark in studies related to declining birthrates. Sasai (2005) argued that regional fertility rates exhibit diverse levels and patterns of change, and that two major demographic factors, namely marriage trends and couple’s fertility, can explain the TFR to some extent [1]. However, a geographically weighted regression (GWR) analysis conducted by Kamata and Iwasawa (2009) demonstrated that, despite regional variations, most studies tend to use least squares (OLS) regression, and that the factors contributing to TFR are largely the same across regions [2]. Nevertheless, a high TFR in a region does not necessarily guarantee population sustainability in the future. Maeda (2005) suggested that TFR is a ratio of the number of children to the number of women, and that even if the number of children decreases, the TFR remains unchanged as long as the number of women in the denominator also decreases [3]. Therefore, the number of women aged 15–49 is an important factor to consider when examining TFR. The scatter plot below shows TFR on the X-axis and FMI (an indicator of women moving in and out of the region) on the Y-axis, and will be discussed later (Fig. 1).
Fig. 1. Scatter plot of TFR and FMI, and pie plot of clusters
We conducted a cluster analysis based on TFR and FMI, and the resulting clusters are named as shown in the legend on the upper right of Fig. 2. The histogram in Fig. 2 displays the future population for each cluster, with the year 2015 as the base (set to 100). Regions with high TFR and FMI, which are considered ideal for maintaining population, are able to sustain their population. However, regions with high TFR and low FMI, which we named the “High TFR type,” are unable to maintain their population. This highlights the importance of considering not only TFR but also FMI when addressing the issue of declining birthrate and population.
This paper examines the issue of declining birthrate from two perspectives: TFR and the movement of women in and out of regions, taking into account regional differences.
We provide a brief overview of each region in Japan: Hokkaido, Tohoku, Kanto, Chubu, Kinki, Chugoku, and Kyushu. Hokkaido, located at the northernmost tip of Japan, is the largest prefecture and boasts flourishing dairy and fishery industries. Tohoku is the northernmost region of Honshu, the main island of Japan, and is known for its thriving agriculture industry. The Kanto region is situated south of Tohoku and is home to Tokyo and other major urban centers. Chubu is a prosperous region located between Tokyo and Osaka and is home to many large manufacturing companies. Kinki is situated west of Chubu and includes cities such as Osaka, Kyoto, and Kobe. Chugoku, also known for its manufacturing industry, is located west of Kinki. Finally, Kyushu, located at the western end of Japan, has one of the highest birth rates in the country and is one of the fastest growing regions.
Fig. 2. Histogram of future population by clusters
2 Methods
2.1 Data Source and Data Processing
Table 1. All data we used
We obtained statistical data for all cities, towns, and villages in Japan from the Japanese Statistics Bureau, excluding the evacuation zone near the Fukushima nuclear power plant. Geographic data was acquired from the Ministry of Land, Infrastructure, Transport and Tourism, as well as previous research, and university campus location data was obtained from the universities’ websites (Table 1). To calculate the distance in kilometers between regions A and B, assuming the Earth to be a sphere of radius 6371 km, we used the following formula based on latitude (lat) and longitude (lon):
$$\mathrm{Distance} = 6371 \cdot \arccos\bigl(\cos(\mathrm{lat}_A)\,\cos(\mathrm{lat}_B)\,\cos(\mathrm{lon}_B - \mathrm{lon}_A) + \sin(\mathrm{lat}_A)\,\sin(\mathrm{lat}_B)\bigr)$$
The variables that contain the word “expense” in their item names are related to municipal expenditures. For instance, the housing expense ratio measures the proportion of expenditure allocated to the development of residential areas in relation to the total expenditure of the municipality.
2.2 Create an Indicator of Women Moving in and Moving Out
We developed a novel indicator, the Female Migration Indicator (FMI), by dividing the number of women aged 15–49 moving into a region by the number of women aged 15–49 moving out. The purpose of FMI was to capture population flows in and out of a region, and we employed FMI as the response variable in our regression analysis.
2.3 Analysis Procedure
This study explores the relationship between the Female Migration Indicator (FMI) and the Total Fertility Rate (TFR) in Japan. For TFR, we used three regression models: OLS regression with geographic variables for TFR (TOwG), OLS regression without geographic variables for TFR (TOwoG), and GWR for TFR (TG). Similarly, for FMI, we employed three regression models: OLS regression with geographic variables for FMI (FOwG), OLS regression without geographic variables for FMI (FOwoG), and GWR for FMI (FG).
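The spherical distance formula of Sect. 2.1 can be sketched as follows. This is an illustrative implementation of the spherical law of cosines, assuming latitude and longitude are given in degrees; the function name and the clamping step are choices made here, and the coordinates in the usage lines are approximate values for Tokyo and Osaka used only as an example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the formula above


def spherical_distance_km(lat_a, lon_a, lat_b, lon_b):
    """Great-circle distance via the spherical law of cosines; inputs in degrees."""
    la, lb = math.radians(lat_a), math.radians(lat_b)
    dlon = math.radians(lon_b - lon_a)
    cos_angle = (math.cos(la) * math.cos(lb) * math.cos(dlon)
                 + math.sin(la) * math.sin(lb))
    # Clamp to [-1, 1] so floating-point rounding cannot break arccos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


# Approximate coordinates (illustrative): Tokyo and Osaka.
d_tokyo_osaka = spherical_distance_km(35.68, 139.77, 34.69, 135.50)
print(f"Tokyo to Osaka: about {d_tokyo_osaka:.0f} km")
```

A distance function of this kind is what a count such as "university campuses within a 20 km radius" would rest on, by counting campuses whose distance from a municipality is at most 20.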
The best model for each indicator was selected by adding explanatory variables based on increasing values of the partial correlation coefficient and considering R-squared, Adjusted R-squared, and AIC. To avoid multicollinearity, explanatory variables with VIF scores over 4 were excluded. The bandwidth of the kernel function used in GWR was of the fixed type, that is, a constant distance from each regression point. The results of this study provide insights into the relationship between FMI and TFR, as well as the impact of geographic variables on that relationship.
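The idea behind GWR with a fixed bandwidth can be illustrated with a minimal sketch: at each regression point, observations are weighted by a kernel of their distance to that point, and a weighted least squares fit yields a local intercept and slope. The Gaussian kernel, the single explanatory variable, the planar coordinates, and the toy data are all assumptions made here for brevity; the text does not specify the kernel form, and the actual models use many variables.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel with a fixed bandwidth (an assumed kernel form)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(point, coords, x, y, bandwidth):
    """Weighted least squares at one regression point: returns (intercept, slope)."""
    w = [gaussian_weight(math.dist(point, c), bandwidth) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: three western and three eastern locations on a plane.
coords = [(0, 0), (1, 0), (2, 0), (8, 0), (9, 0), (10, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0, 3.0, 2.0, 1.0]  # west: y = 1 + 2x, east: y = 4 - x

_, slope_west = local_fit((1, 0), coords, x, y, bandwidth=1.0)
_, slope_east = local_fit((9, 0), coords, x, y, bandwidth=1.0)
# A very large bandwidth weights all points almost equally, like a global OLS fit.
_, slope_global = local_fit((5, 0), coords, x, y, bandwidth=1e6)
print(slope_west, slope_east, slope_global)
```

In this toy setting the local slopes differ in sign between west and east, while the near-global fit averages them into a single coefficient, which is the kind of regional variation that OLS cannot express.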
3 Result
3.1 TOwG and TOwoG
Table 2 presents a summary of the statistical analysis for both TOwG and TOwoG models, which were found to be statistically significant based on the probability of F-statistic. The R-squared and Adjusted R-squared values indicate that geographic variables play a crucial role in predicting TFR, as evidenced by their substantially different scores (Fig. 3).
Table 2. Summary of result of TOwG and TOwoG
Model    R-squared    Adj. R-squared    F-statistic    Prob (F-statistic)    AICc
TOwG     0.569        0.565             161.5          1.47E−300             −2051.24
TOwoG    0.337        0.332             72.65          1.38E−143             −1311.12
Fig. 3. Coefficients of TOwG and TOwoG
Focusing on the coefficients of the OLS regression analysis with geographic variables (TOwG), indicated by the blue bars, we observe that longitude has a large negative score, suggesting that TFR tends to be lower in the eastern area and higher in the western area. The percentage of university graduates and the number of university campuses within a 20 km radius also have negative scores, while FMI, child welfare expense ratio, and the number of retail stores have positive values. Similarly, the coefficients of the OLS regression analysis without geographic variables (TOwoG), represented by the orange bars, exhibit similar scores to those of TOwG. Additionally, the coefficients of agriculture, forestry, and fisheries expense ratio, and nuclear family household ratio have positive values, while the percentage of female legislators and the average annual salary have negative values. Interestingly, our findings differ from those of previous studies. For example, the coefficient of annual salary being negative implies that as the annual income decreases, the TFR increases, and vice versa, while previous studies suggest that declining incomes contribute to declining birthrates [4], and 40% of unmarried people under the age of 49 do not desire to have children due to economic issues [5].
3.2 FOwG and FOwoG
Table 3 presents the results of FOwG and FOwoG analyses, both of which are statistically significant at a significance level of 5%. In contrast to the TFR regression analysis, there is little difference in R-squared values between the two models.
Table 3. Summary of result of FOwG and FOwoG
Model    R-squared    Adj. R-squared    F-statistic    Prob (F-statistic)    AICc
FOwG     0.544        0.541             204.7          3.24E−284             −2274.085
FOwoG    0.531        0.528             194.5          5.96E−274             −2226.319
The coefficient bar graph below illustrates that FOwG and FOwoG have similar coefficient values. The percentage of university graduates has the largest positive value. The number of university campuses within 20 km also has a positive score, suggesting a positive effect of universities on FMI. Conversely, the number of persons per household has the largest negative value, indicating that regions with many single-person households tend to have high FMI, whereas those with many large families tend to have low FMI. For FOwG (blue bars), latitude has a positive coefficient, suggesting high FMI in the north and low FMI in the south, although the coefficient is relatively small, indicating a weak relationship. For FOwoG (orange bars), the female employment rate has a positive value (Fig. 4).
Fig. 4. Coefficients of FOwG and FOwoG
3.3 TG and FG Result
Table 4 summarizes TG and FG. Judging from R-squared and AICc, these GWR models improve on the OLS regression analyses. The right two columns of Table 4 show the results of Leung’s F test [6]. Leung’s F test has three types. The F1 test confirms whether GWR fits better than OLS, which is TOwoG or FOwoG in this paper. The F2 test verifies whether there is a statistically significant difference between GWR and OLS. The F3 test decides whether the regional differences in coefficients are statistically significant. Table 4 shows that both GWR models, TG and FG, fit better and that the differences from OLS are statistically significant. The result of the F3 test is shown in Table 5. Variables marked with * indicate regional differences, and variables marked in yellow do not indicate regional differences. As a note on TG and FG, Ogasawara village was excluded because the optimized bandwidth was too small to regress its TFR and FMI. In the following sections, the statistically significant coefficients of TG and FG are visualized; Tokyo’s island areas and part of Kagoshima’s island areas are excluded from the visualization because isolated islands tend to be outliers.
Table 4. Summary of Result of TG and FG
Model    R-squared    AICc         Leung et al. (2000) F(1) test    Leung et al. (2000) F(2) test
TG       0.7100158    −2385.717    p-value < 2.2e−16                p-value < 2.2e−16
FG       0.6694341    −2561.379    1.102E−07                        p-value < 2.2e−16
Table 5. The result of F3 test
3.4 TG Coefficients
Figure 5 illustrates the regional variation of coefficients. Most regions exhibit positive values of child welfare expense ratio and FMI, and negative values of the number of university campuses within 20 km. Other variables have both positive and negative values in certain regions. Moving on to Fig. 6, the coefficient of child welfare expense ratio has the highest value in the Kyushu region and the lowest in the southern Hokkaido region, where it is negative. The Northern Tohoku and Kanto regions have small but positive values. In other words, most regions show a positive correlation between TFR and child welfare expense ratio, indicating that the government should consider providing better financial support for children. In Hokkaido and Kanto regions, where the coefficient value is positive but close to zero, raising child welfare expenses should be approached with caution. The coefficient of FMI is also mostly positive across regions, with particularly high values in the Kanto, Chubu, and Kinki regions. The coefficient of the number of university campuses within 20 km has negative values in most regions, consistent with Tsutsumi’s (2020) previous research indicating that an increase in university students, who do not give birth, can lead to a decrease in TFR. On the other hand, the coefficient of the agriculture, forestry, and fisheries expense ratio is high in the Kinki region, with most regions having a positive value, except for the Chubu region. While areas with high agricultural expenses tend to score high TFR, this trend does not exist in the Chubu and Kyushu regions. Additionally, the percentage of female legislators, often used as a measure of women’s social advancement, has a negative value in most areas and is believed to decrease TFR. Even in the area with the highest coefficient, the score is almost 0, indicating that the percentage of female legislators does not significantly affect TFR.
The largest coefficient of the female employment rate is found in Tottori Prefecture, while the lowest is in the Kyushu region. Other regions have small positive values, except for Kyushu. Contrary to common belief, the advancement of women in society does not necessarily accelerate the decline in birthrate, as shown by the results of the percentage of female legislators. The nuclear family household ratio has a slightly positive value in the Kanto, Chubu, and Kansai regions, and values closer to 0 in other regions. In terms of industry factors, the ratio of workers in the secondary industry has a positive coefficient in 80% of the regions, but is negative in parts of the Hokkaido, Miyagi, and Kyushu regions. The tertiary industry has a negative coefficient in 80% of the areas, with positive values only in the Kinki and Chugoku regions. Lastly, the intercept represents the TFR without the influence of explanatory variables. Regional differences in the intercept imply disparities that are not explained by these variables. The left panel of Fig. 7 shows that intercept values tend to be higher in the west and lower in the east, which cannot be fully explained by the variables used. The right panel is a scatter plot of intercept and TFR, with a simple regression analysis showing that 19% of the variance of TFR is explained by the intercept and 50% by explanatory variables in the TG analysis (Fig. 7).
Fig. 5. Coefficients of TG
Fig. 6. Map of Japan with TG coefficients displayed
3.5 FG Coefficients
Figure 8 is a violin plot of the FG coefficients. As can be seen from the figure, the percentage of university graduates has the largest value, and TFR the second largest. The number of persons per household and the ratio of workers in the secondary industry are negative in more than 85% of regions. Focusing on TFR, most regions have positive coefficients, especially in the Kanto region. Although not as high as in Kanto, the Chubu and Kinki regions also have relatively high coefficients. Compared with the previous analysis, there seems to be a correlation between TFR and FMI in the Kanto region. This is because the coefficients of FMI in TG were high in Kanto, and the coefficients of TFR in FG were also high. In fact, there is a correlation in the Tokyo metropolitan area, excluding Tokyo’s 23 wards and remote islands, with a correlation coefficient of 0.47. In Kanagawa, Chiba, and Saitama, the correlation coefficient is 0.57. If we expand the scope slightly, there is also a correlation in areas such as Tokyo’s metropolitan area, excluding Tokyo’s 23 wards, Aichi, Kyoto, Osaka, and Hyogo, with a correlation coefficient of 0.32. In all regions in Japan, the correlation coefficient is 0.11. Based on these results, it can be said that, in urban areas, there is a tendency for women to gather in areas with high TFR or for areas where women gather to have a high TFR.
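The comparison of correlation coefficients across subsets of municipalities can be sketched as below. The rows are hypothetical (region label, TFR, FMI) values invented for illustration, not the study's data; the point is only that a correlation that is clear within one group can weaken or vanish when all regions are pooled, as with the reported drop from 0.57 to 0.11.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rows: (group, TFR, FMI). Values are illustrative only.
rows = [
    ("urban", 1.2, 0.9), ("urban", 1.3, 1.0), ("urban", 1.4, 1.1), ("urban", 1.5, 1.3),
    ("rural", 1.6, 0.9), ("rural", 1.7, 0.8), ("rural", 1.8, 0.7), ("rural", 1.5, 0.6),
]


def correlation_for(group=None):
    """Correlation of TFR and FMI for one group, or for all rows if group is None."""
    subset = [(t, f) for g, t, f in rows if group is None or g == group]
    return pearson([t for t, _ in subset], [f for _, f in subset])


r_urban = correlation_for("urban")
r_all = correlation_for()
print(f"urban: {r_urban:.2f}, all regions: {r_all:.2f}")
```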
Fig. 7. Map of Japan with TG intercept displayed and scatter plot of TFR and intercept
The percentage of university graduates shows the highest coefficient in the Kyushu region, particularly in Fukuoka, and is moderately high in the Hokkaido and Tohoku regions, while it has the lowest coefficient in Honshu, except for the Tohoku region. The strong tendency in Fukuoka is due to the high concentration of people from the Kyushu region and numerous universities. The housing expense ratio is relatively high in the Kanto and Chubu regions. The number of persons per household has a positive value only in the Tohoku region, while it has a negative value in other areas, and is especially low in the Chubu and Kinki regions. Although the OLS regression analysis showed a negative coefficient for the number of persons per household, GWR shows that the coefficient varies by region and is positive in some areas.
Fig. 8. FG Coefficients
The Kyushu region has the lowest child welfare expense ratio, whereas it had the highest ratio in the previous TG analysis. This is due to the lack of correlation between TFR and FMI in Kyushu, where TFR remains around 1.5 to 2 regardless of the value of FMI. The relationship between FMI and female employment rate seems to be weak as the value is close to zero in most areas. The ratio of workers in the secondary industry is high in Kyushu, Chubu, and eastern Tohoku regions, possibly due to the presence of semiconductor factories. The rise of the secondary industry is unlikely to be a factor in attracting women to the area, but rather it may attract men who, in turn, create new economic zones that increase the population of both men and women (Fig. 9).
Fig. 9. Map of Japan with FG coefficients displayed
3.6 Successful Policies for Region Maintaining Population
In this section, we examine policies adopted by regions with high TFR and FMI.
The first region is Nagaizumi Town, Shizuoka Prefecture. Located about 100 km from Tokyo and 300 km from Osaka, the town has a well-developed transportation network and high TFR and FMI. Instead of implementing measures to combat the declining birthrate, the town focused on attracting companies to establish a stable financial base and utilized its transportation convenience to develop the town, leading to an increase in population and birth rates. By keeping companies local, the population has remained steady.
The second region is Tsukuba City, Ibaraki Prefecture. Located about 40 km from Tokyo, the city has a day/night population ratio of 86% and serves as a bedroom community. With 20 university campuses within a 20 km radius, the city is expected to attract many students. The city also focuses on attracting companies and promoting the relocation of factories and research institutes. Childcare support is also a priority.
After examining successful municipalities and their policies to prevent declining birthrates, we found that the common approach was to reduce outflows or increase inflows of population by attracting companies. This approach was often paired with childcare support policies that raise the TFR. By increasing the number of children in the numerator while maintaining the population in the denominator of the TFR, these local governments have succeeded in maintaining the population. On the other hand, according to Maeda (2005), Nagaoka City in Niigata Prefecture and Tohno City in Iwate Prefecture focused on child-rearing support in order to counter the declining birthrate, and although they were able to improve TFR as a result, their future populations declined, indicating that they could not prevent the declining birthrate in the long run [3]. The reason for this is thought to be that the government skipped the first step and started with the third step.
4 Discussion
The result of TG provides a good explanation of TFR, but the factors contributing to TFR vary by region, requiring tailored policies rather than a uniform approach. The west high/east low trend observed in the intercept cannot be fully explained by the explanatory variables. For instance, in the Kanto region, TFR does not rise as high as in west Japan, and policies to promote women’s in-migration could improve TFR, given the highest coefficient was FMI. Economic factors such as child welfare expense and average annual salary had little impact on TFR, indicating other factors such as working hours, opportunities for social interaction, and psychology could be contributing to the decline. The coefficient of the percentage of female legislators was negative, contrary to expectations, and the ratio of workers in the tertiary industry negatively impacted FMI due to longer working hours and less personal time. The coefficient of the percentage of university graduates was high in most regions, indicating increasing employment opportunities for university graduates could increase FMI. However, in the Kanto region, the intercept was extremely high, and the coefficient of TFR was the largest, suggesting that improving TFR could lead to even higher FMI. Ultimately, focusing on the intercept and coefficient can provide insight into raising TFR and FMI and addressing declining birth rates in each region.
5 Conclusions
This study aimed to examine the decline in fertility by focusing on TFR and FMI. To achieve this goal, OLS regression analysis was conducted to identify the essential variables that explain TFR and FMI. We used GWR analysis to vary the intercept and coefficients, leading to a high explanatory power model that could be tailored to different regions. By analyzing successful municipalities, we observed a common trend of enhancing FMI and TFR to improve the birthrate. Future studies should explore variables that can explain the west high/east low trend in TFR and establish a causal relationship instead of relying on correlation. In addition, an analysis accounting for the impact of COVID-19 will be necessary.
References
1. Sasai, T.: Trends in fertility rates by municipality and factors causing changes. J. Popul. Probl. 61–3, 39–49 (2005)
2. Kamata, K., Iwasawa, M.: Spatial variations in fertility: geographically weighted regression analyses for town-and-village-level TFR in Japan. Demogr. Res. 45 (2009)
3. Maeda, M.: Child-rearing support measures by local governments and the reality of declining birthrates and population decline. Hirao Sch. Manag. Rev. 5, 1–16 (2015)
4. Cabinet office. https://www5.cao.go.jp/keizai-shimon/kaigi/special/future/sentaku/s3_1_2.html. Accessed 01 Feb 2023
5. Cabinet office. https://www8.cao.go.jp/shoushi/shoushika/research/h25/taiko/2_1_1.html. Accessed 01 Feb 2023
6. Leung, Y.: Statistical tests for spatial nonstationary based on the geographically weighted regression model. Environ. Plan A 2000(32), 9–32 (2000)
7. Tsutsumi, K.: Impact of social change on total fertility rate. Stat. Data Anal. Compet. (2020)
8. Yuka, S.: Actual Conditions and Issues of Long Working Hours by Industry. Daiwa Institute of Research (2018). https://www.dir.co.jp/report/research/policy-analysis/humansociety/20180330_020030.pdf. Accessed 09 Feb 2022
9. Masumi, Z., Takashi, O., Yuichi, K., Akiko, T., Shiro, K., Masakazu Y.: Relationships between distribution of Japanese residential areas and topography (2005). http://www.csis.u-tokyo.ac.jp/dp/dp68/68.pdf. Accessed 11 Feb 2023
10. Nagaizumi Town, Shizuoka Prefecture: “A Town Where It’s Easy to Give Birth and Raise a Child” - Measures to Support Child Rearing in the Town of Niko Niko. https://www.zck.or.jp/site/forum/1319.html. Accessed 08 Feb 2022
11. Tsukuba Mirai City has the highest population growth rate in the prefecture! What is “Japan’s No. 1” in Tsukuba Mirai City? https://tochiten.com/work/changejob/news/ibaraki-147.html. Accessed 08 Feb 2022
Explore Data Quality Challenges Based on Data Structure of Electronic Health Records
Caihua Liu1, Guochao (Alex) Peng2, Chaowang Lan1(B), and Shufeng Kong2
1 Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
2 Sun Yat-Sen University, Guangzhou 510275, Guangdong, China
Abstract. With the adoption of electronic health records (EHR) in primary care, ensuring that the data used are of high quality is the premise of the quality of decision making and quality of care. Prior literature on EHR data quality has addressed dimensions and methods of data quality assessment for reuse; however, the challenges of data quality in EHR during the process of primary care from the perspective of EHR data structure have received limited attention. Looking at the EHR data structure helps improve the understanding of data quality challenges from the information pathway. Such a study assists in better designing and developing EHR systems and achieving high-quality data when using EHR. This paper thus aims at exploring challenges of data quality from the perspective of EHR data structure. To this end, the present study firstly investigates five main practices of primary care and describes a use case diagram of EHR systems based on these practices. Referring to the EHR systems’ functions described in the use case diagram, the study then conceptualizes the EHR data structure used in primary care, including a conceptual data model, a data flow diagram and a database schema, to better understand the data elements contained in EHR, and analyzes the changes of data elements in EHR when the practices of primary care are carried out and possible challenges of data quality in EHR. Finally, this study proposes several strategies addressing these challenges to help practitioners achieve high-quality data. Future research directions are also discussed.
Keywords: Data quality · Electronic health records · Data structure
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 236–247, 2023. https://doi.org/10.1007/978-3-031-35132-7_17
1 Introduction
Electronic health records (EHR), computer-based systems that integrate routinely collected data, enable practitioners to deliver convenient health services to their patients using integrated clinical information, and have great potential to improve quality of care and cost-effectiveness [1]. However, EHR carry many challenges such as complexity and bias [2]. Essentially, these challenges concern data quality in EHR. Data quality refers to the extent to which the characteristics of data meet requirements for a specific purpose and can be assessed along multiple dimensions (e.g. accuracy and completeness) [19]. Data quality problems could lead to severe consequences in the domain of healthcare, such as loss of revenue and increase in mortality. Hence, data quality in EHR has been one of the research foci in the domain.
Prior literature on data quality in EHR has addressed dimensions and methods of data quality assessment for reuse [3], while limited attention has been paid to the challenges of data quality in EHR from the perspective of the EHR data structure. The health information pathway is dynamic, and data quality problems could emerge at different points in this pathway [4]. Looking at the data structure of EHR helps improve the understanding of data quality challenges by analyzing the changes of data elements in EHR along the information pathway. Such a study helps facilitate better design and development of EHR systems and raises the awareness of healthcare practitioners of the need to achieve high-quality data when using EHR to address the quality of care and patient safety. This study therefore aims to explore data quality challenges from the perspective of the EHR data structure. To this end, the present study first investigates the five main practices of primary care and describes a use case diagram of EHR systems applied in these practices. The study then conceptualizes the EHR data structure used in primary care based on the functions of EHR systems, including a conceptual data model, a data flow diagram and a database schema, in order to investigate the data elements contained in EHR. Finally, this study analyzes the changes of data elements in EHR when the practices of primary care are carried out and discusses possible challenges of data quality in EHR. Accordingly, the present study makes two contributions. First, the study provides an initial explanation of data quality problems from the perspective of the EHR data structure. Second, the study proposes strategies to address these challenges, helping practitioners achieve quality-assured data. The rest of this paper is organized as follows: Sect. 2 gives the concepts related to EHR; Sect. 3 describes general EHR systems in primary care; Sect. 4 analyzes the data structure of EHR; Sect. 5 explores possible challenges to data quality in EHR; and Sect. 6 presents the conclusion of this study.
2 Concepts of Electronic Health Records
This section describes EHR, including their definition and architecture.
2.1 Definition of EHR
The International Organization for Standardization (ISO) gives a generic definition of EHR as “a repository of information regarding the health status of a subject of care, in computer processable form” [4]. Meanwhile, ISO uses alternative terms such as Electronic Medical Record (EMR), Electronic Patient Record (EPR), and Computerized Patient Record (CPR) to describe EHR. In [5], Electronic Health Care Record (EHCR) [6] is also regarded as a synonym of EHR. A systematic review of the different types of EHR can be found in [5].
2.2 Architecture of EHR
The architecture of EHR is a formal description of a system, organizing the components and services that support recording, retrieving and handling information in EHR [7]. EHR are scattered across multiple clinical systems and repositories for patient-centred continuity of care [7]. The National Institutes of Health National Center for Research Resources has provided an overview of EHR [8] that improves the understanding of the EHR architecture. According to [8], in the EHR architecture each system captures a patient’s data in a single encounter at different departments and stores this data in its own file. The EHR network integrates data across systems into a central data repository using a standard nomenclature or structured vocabulary and supports data transmission and aggregation. During the coordination of care, data users access the required data via the EHR network. When disparate data from different data silos is aggregated into integrated displays, the EHR system needs unified standards to deal with vocabulary variations. These standards (e.g., Health Level-7 Standards¹) enable EHR information to be received, processed and presented across clinical systems and repositories for different uses [8]. The standardized architecture of EHR is also applicable to its multiple forms such as EMR, EPR, CPR, and EHCR [9].
3 Use Case Diagram of EHR Systems
This section presents a use case diagram of EHR systems based on the practices in primary care reported in an empirical study [10], to describe the interactions between users and EHR systems and the responses of EHR systems in primary care. Because EHR can increase collaboration among care teams and meet practitioners’ different data needs, which assists in delivering quality-assured care in general practice, EHR implementation has received much attention in primary care² [12]. According to [10], the five main practices in primary care are Check-in, Triage, Diagnosis, Lab and Procedures, and Check-out. They are used to constitute a use case diagram of EHR systems in primary care in Unified Modelling Language, as described below.
Check-in. The triage nurse registers the information about a patient during a clinical encounter by creating, confirming or cancelling an appointment for the patient.
Triage. The triage nurse sets the schedule of the physician and ascertains the availability of the physician within a particular facility and service location at a time slot.
Diagnosis. The physician reviews the patient’s history, performs and documents an examination, orders radiology or lab tests, conducts procedures and medication, makes a diagnosis, decides on a treatment, and records additional information about the patient.
Lab and radiology. The healthcare practitioner performs a lab or radiology test and records the results, which then become available to the physician.
Check-out. After all health services have been completed, the patient is checked out of the physician’s office.
Based on this analysis, a use case diagram of EHR systems in primary care is presented in Fig. 1.
¹ Health Level-7 standards refer to a set of international standards for the exchange, integration, sharing, and retrieval of electronic health information [8].
² When patients have a usual source of care that serves four essential functions, they can be considered to receive primary care. The four functions are “providing first contact care for new health problems, comprehensive care for the majority of health problems, long-term person-focused care and care coordination across providers” [11].
Fig. 1. A use case diagram of EHR systems for primary care analyzed in this study
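The practices and actors described in this section can also be written down as a small lookup structure, which makes it easy to ask which actions an actor performs. The following Python sketch is illustrative only: the class name, field names and action labels are assumptions that paraphrase the text, and the actor for Check-out is marked as not stated because the section does not name one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    practice: str
    actor: str
    actions: tuple


# Practices and actors follow the descriptions in Sect. 3; the action labels
# are paraphrases chosen for illustration, not elements taken from Fig. 1.
USE_CASES = (
    UseCase("Check-in", "triage nurse",
            ("create appointment", "confirm appointment", "cancel appointment")),
    UseCase("Triage", "triage nurse",
            ("set physician schedule", "check physician availability")),
    UseCase("Diagnosis", "physician",
            ("review history", "document examination", "order lab or radiology",
             "make diagnosis", "decide treatment")),
    UseCase("Lab and radiology", "healthcare practitioner",
            ("perform test", "record results")),
    UseCase("Check-out", "not stated", ("check out patient",)),
)


def actions_for(actor):
    """Return (practice, action) pairs that the given actor performs."""
    return [(uc.practice, action)
            for uc in USE_CASES if uc.actor == actor
            for action in uc.actions]


if __name__ == "__main__":
    for practice, action in actions_for("triage nurse"):
        print(f"{practice}: {action}")
```

Keeping the structure flat mirrors the way the section lists one practice after another; a real system would attach permissions and data elements to each action.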
4 EHR Data Structure
This section describes the data structure of EHR used in primary care, including a conceptual data model, a data flow diagram and a database schema.
4.1 Conceptual Data Model of EHR
To better understand an EHR, a conceptual data model is constructed in this study, as such a model describes how data is physically represented in the database [13]. Inspired by [14], typical functions of EHR systems used in primary care can be identified from Fig. 1. They are:
(1) identification and registration of a patient, a healthcare practitioner, a facility and a service location;
(2) record retrieval of a patient;
(3) appointment and scheduling;
(4) documentation of information at an encounter about a patient (who is seen by a healthcare practitioner at a certain service location within a specific facility) and additional patient historical information; and
(5) episode of care (e.g. observations, diagnoses, examinations, treatments, and ordering and recording the results of lab and radiology).
Figure 2 presents a conceptual data model of an EHR in primary care, describing the relationships among the different functions of EHR systems. An episode of primary care contains all health services from a patient’s visit for a health problem or a disease to the completion of this visit. When a patient is seen in general practice, an episode of care arises with a healthcare practitioner at a facility’s service location. An episode of care includes: (1) an appointment for healthcare practitioners, facilities and service locations; (2) documentation of examinations that allows healthcare practitioners to observe and record the patient’s status; (3) an order of a lab or radiology test, with the test results, for consideration of a health problem or a disease, as well as procedures (such as immunization) and medication; (4) a diagnosis involving the determination of the patient’s condition; and (5) a treatment for the patient’s health problems. A patient may have no episode of care or may have many. If a patient cancels the appointment, no care episode takes place. If a patient suffers from a chronic disease, he or she routinely visits a general practitioner’s office to monitor and control the health condition. When a patient visits a clinic, an appointment with healthcare practitioners and a facility within its service location is required. Furthermore, the patient’s personal information, prescription history and additional relevant information should be collected and recorded. When the care is completed, the patient is checked out.
Fig. 2. A conceptual data model of the EHR presented by an entity-relationship diagram in this study
4.2 Data Flow Diagram of EHR
The architecture of the EHR is a multitier architecture comprising web clients, EHR network services and a database for patients’ data. That is, healthcare practitioners can access a patient’s data from any client connected to the centralized EHR systems at healthcare settings. A data flow diagram of the EHR systems in primary care is shown in Fig. 3. In this study, Gane and Sarson symbols [15] are used to draw this data flow diagram.
4.3 Database Schema of EHR
A database schema indicates how the data can be physically represented in a database, supporting data structures and data storage. In this study, a database is required for the management and storage of patients’ information during the course of care. Generally, the database contains a variety of tables with associated relationships. A generic database table is used for recording patients’ data in the EHR for primary care, which gives details of the data entries in the database (i.e. the actual data that is entered into the repository). Thus, a simplified repository database schema, based on the conceptual data model, is presented in Fig. 4.
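To make the idea of tables with associated relationships concrete, the following Python sketch builds a very small in-memory SQLite schema. It is a hedged illustration rather than the schema of Fig. 4: the table names Patient, HealthcarePractitioner, Facility and Appointment follow entities named in this paper, while every column name, type and constraint is an assumption introduced here.

```python
import sqlite3

# Hypothetical, heavily simplified schema; columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE Patient (
    patient_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    date_of_birth TEXT NOT NULL
);
CREATE TABLE HealthcarePractitioner (
    practitioner_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE Facility (
    facility_id      INTEGER PRIMARY KEY,
    service_location TEXT NOT NULL
);
CREATE TABLE Appointment (
    appointment_id  INTEGER PRIMARY KEY,
    patient_id      INTEGER NOT NULL REFERENCES Patient(patient_id),
    practitioner_id INTEGER NOT NULL
        REFERENCES HealthcarePractitioner(practitioner_id),
    facility_id     INTEGER NOT NULL REFERENCES Facility(facility_id),
    time_slot       TEXT NOT NULL,
    status          TEXT NOT NULL
        CHECK (status IN ('created', 'confirmed', 'cancelled'))
);
"""


def build_database():
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.execute("INSERT INTO Patient VALUES (1, 'Example Patient', '1980-01-01')")
    conn.execute("INSERT INTO HealthcarePractitioner VALUES (1, 'Example Physician')")
    conn.execute("INSERT INTO Facility VALUES (1, 'Clinic A, Room 1')")
    conn.execute(
        "INSERT INTO Appointment VALUES (1, 1, 1, 1, '2023-01-01T09:00', 'created')"
    )
    print(conn.execute("SELECT status FROM Appointment").fetchall())
```

The NOT NULL, CHECK and foreign-key constraints show one place where the schema itself can reject incomplete or inconsistent entries at data entry time.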
5 Discussion
This section discusses possible data quality challenges (problems) in EHR. Because the conceptual data model for EHR in primary care contains all required information from patients, complete and accurate data elements at every point of care serve as a good foundation for quality-assured care. Hence, the data elements in a patient’s record should be considered when studying data quality problems in EHR. Common and important clinical data elements involved in primary care can be collected from an existing EHR content standard. The Canadian Institute for Health Information (CIHI) issued a subset of the primary health care (PHC) EMR Content Standard (CS) to improve quality of care and management of health systems [16]. The CIHI indicated that 106 data elements aligned with jurisdictional programs need to be collected at the point of care. Among them, 8 data elements were highlighted with clinician-friendly pick lists for clinical use, as shown in Fig. 5. They are Referral, Reason for Visit, Diagnostic Imaging Test Order (Radiology Order in this study), Health Concern, Social Behaviour, Clinician Assessment (Diagnosis Results in this study), Intervention (Treatment in this study), and Vaccine Administered. Figure 5 integrates these common data elements into the proposed database schema of the EHR. Most data elements in the EHR may change at each encounter, while a patient’s identifier, identifier type, and identifier assigning authority are permanent. The patient’s personal information, such as name and birthday, is relatively stable after the first registration. In this section, the five main practices in primary care presented in Sect. 3 are taken as examples to understand the changes in the EHR data elements. In the process of check-in, for new patients, their basic personal information, as well as health concerns, social behaviour, and allergy history, are recorded in the EHR systems.
Then, triage nurses deal with patients’ appointments and check the availability of healthcare practitioners and of the facility with its service location. At this point, records on Patient, Referral, Appointment, HealthConcern, SocialBehaviour, AllergyHistory, HealthcarePractitioner and Facility are queried and/or edited. The data elements on date of birth, type of health concern, type of social behaviour, and reasons for visit are further used in clinical decisions. Accordingly, the completeness and accuracy of these data elements need to be addressed during data entry. During diagnosis, the physician asks about the patient’s health concerns and checks symptoms. The patient may receive physical examinations (e.g. blood pressure, height, weight and waist circumference). Then, the physician may order lab tests and diagnostic imaging tests to examine the patient’s condition. The results of the physical examination and tests are updated by healthcare practitioners, and the results on Examination, Laboratory and Radiology are displayed to the physician. Note that for a given patient these records may differ from one encounter to another. When the physician obtains critical signs and symptoms, he or she can determine a health problem and enter the results of this clinical assessment. By reviewing the patient’s demographics, historical information, and results of physical examination and tests from the EHR, the physician can decide on a procedure, medication and/or treatment for the patient. New records on the patient’s diagnosis, procedure, treatment and medication can therefore emerge. Because determining a patient’s condition requires knowledge of the diagnosis process and of vital signs and symptoms, different health problems complicate the data elements extracted from multiple clinical systems. In other words, the data elements used to make a diagnostic decision need to be identified based on relevant content standards, and thus the challenges that influence data quality also relate to the variation between diseases [4, 17]. Based on this analysis, potential data quality challenges (problems) in EHR for primary care can be classified into four groups: (1) responsible personnel cannot completely and accurately collect the data during the course of care; (2) errors emerge in data transmission via the EHR network; (3) ineffective data extraction occurs due to poor algorithms [4]; and (4) the intentions of data creation do not match the usage of the data.
Fig. 3. Data flow of the EHR analyzed in this study
Fig. 4. Database schema of the EHR analyzed in this study
Fig. 5. Data elements used in the EHR analyzed in this study
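The first group of challenges, incomplete collection at data entry, lends itself to a simple automated check. The sketch below is a minimal illustration in Python, assuming that a patient record is available as a dictionary; the four required elements are the ones Sect. 5 names as used in clinical decisions, but the key names and the scoring rule (share of filled slots) are assumptions made here, not a measure defined in this paper or in [19].

```python
# Hypothetical key names for the four elements highlighted in Sect. 5.
REQUIRED_ELEMENTS = (
    "date_of_birth",
    "health_concern_type",
    "social_behaviour_type",
    "reason_for_visit",
)


def missing_elements(record, required=REQUIRED_ELEMENTS):
    """Return the required elements that are absent or empty in one record."""
    return [key for key in required if record.get(key) in (None, "")]


def completeness(records, required=REQUIRED_ELEMENTS):
    """Share of required element slots that are filled, between 0 and 1."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0  # nothing to check, so nothing is missing
    missing = sum(len(missing_elements(r, required)) for r in records)
    return 1 - missing / total


if __name__ == "__main__":
    records = [
        {"date_of_birth": "1980-01-01", "health_concern_type": "asthma",
         "social_behaviour_type": "smoker", "reason_for_visit": "cough"},
        {"date_of_birth": "", "health_concern_type": "diabetes",
         "social_behaviour_type": None, "reason_for_visit": "review"},
    ]
    print(missing_elements(records[1]))
    print(completeness(records))
```

A check of this kind covers completeness only; accuracy needs a comparison source, which is exactly what is hard to obtain when errors occur at the point of entry.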
6 Conclusion
The present study uses a conceptual model, a data flow diagram and a database schema to study and understand the EHR data structure, based on the functions of EHR systems used in primary care (see Fig. 1). By analyzing the changes in data elements in the EHR context, four main challenges (problems) of data quality were identified in the process of primary care: (1) documentation of patients’ records when using EHR systems; (2) data transmission via the network; (3) data extraction from EHR to support clinical decision-making; and (4) alignment between the intentions of data creation and its usage. Section 5 provides an initial explanation of these data quality challenges. Future research is encouraged to simulate EHR and capture data quality problems by tracing the changes of data elements in EHR in a case study, comparing the results with those of this study and developing further insights into the phenomena. To address the four data quality problems, this study also presents the following strategies:
(1) As healthcare practitioners can access and edit different parts of a patient’s EHR, it is difficult to recover the data without any comparative sources if errors occur at the point of data entry [4]. This could lead to bad data in the database in the long run. Therefore, healthcare practitioners should be made more aware of the need to record complete and accurate data elements about patients during each encounter and should follow the guidelines and procedures of documentation. Furthermore, an unambiguous user interface design can be viewed as an excellent starting point for addressing data quality, because the user interface guides users’ operations in data entry and affects their understanding of the acquired data for decision-making. At the same time, careful and thorough scrutiny of the accuracy of data entered in EHR is strongly recommended.
(2) The EHR architecture indicates that the data aggregated from multiple systems for coordination of care relies heavily on the network. A stable and effective network infrastructure assists data transmission and provides access to the required data elements. Network delays, failures of data nodes or partial integration of heterogeneous source feeds are cited as causes of data loss that result in incomplete data collection, particularly over a large network and a long period of time [4].
(3) More and more intelligent agents are employed to support clinical decision-making for healthcare practitioners. An ontology provides a means for its users to address semantic interoperability across different clinical systems and repositories and, at the same time, to aggregate quality data from integrated healthcare records for use [18]. Hence, an ontology-based approach helps automatically assess the quality of the data extracted from the EHR and aggregate high-quality data for users.
(4) The data created for operation, decision making, and planning of healthcare services and products should meet data users’ requirements. User involvement in defining data quality in EHR could help align the intention of data creation with its usage, delivering the corresponding data products to improve users’ satisfaction and the quality of decision making.
Acknowledgement. This research was supported and funded by the Humanities and Social Sciences Youth Foundation, Ministry of Education of the People’s Republic of China (Grant No. 21YJC870009).
References
1. Menachemi, N., Collum, T.H.: Benefits and drawbacks of electronic health record systems. Risk Manag. Healthc. Policy 4, 47–55 (2011)
2. Hripcsak, G., Albers, D.J.: Next-generation phenotyping of electronic health records. J. Am. Med. Inform. Assoc. 20(1), 117–121 (2013)
3. Weiskopf, N.G., Weng, C.: Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J. Am. Med. Inform. Assoc. 20(1), 144–151 (2013)
4. Coleman, N., Halas, G., Peeler, W., Casaclang, N., Williamson, T., Katz, A.: From patient care to research: a validation study examining the factors contributing to data quality in a primary care electronic medical record database. BMC Fam. Pract. 16(1), 11 (2015)
5. Häyrinen, K., Saranto, K., Nykänen, P.: Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int. J. Med. Inform. 77(5), 291–304 (2008)
6. Naszlady, A., Naszlady, J.: Patient health record on a smart card. Int. J. Med. Inform. 48(1), 191–194 (1998)
7. ISO. Health informatics - requirements for an electronic health record architecture. https://www.iso.org/obp/ui/#iso:std:iso:18308:ed-1:v1:en. Accessed 21 Sept 2022
8. National Institutes of Health, National Center for Research Resources. Electronic Health Records Overview. National Institutes of Health, Bethesda (2006)
9. Australian Standard: Health Informatics-Requirements for an electronic health record architecture (ISO/TS 18308:2004, MOD). https://www.saiglobal.com/PDFTemp/Previews/OSH/as/as10000/18000/18308-2005.pdf. Accessed 21 Sept 2022
10. Cooper, J.D., Copenhaver, J.D., Copenhaver, C.J.: Workflow in the primary care physician’s office: a study of five practices. Inf. Technol. Pract. Physician, 23–34 (2001)
11. Friedberg, M.W., Hussey, P.S., Schneider, E.C.: Primary care: a critical review of the evidence on quality and costs of health care. Health Aff. 29(5), 766–772 (2010)
12. Terry, A.L., et al.: Implementing electronic health records: key factors in primary care. Can. Fam. Physician 54(5), 730–736 (2008)
13. Connolly, T.M., Begg, C.E.: Database Systems: A Practical Approach to Design, Implementation, and Management. Pearson Education, London (2005)
14. Kotzé, P., Foster, R.: A conceptual data model for a primary health care patient-centric electronic medical record system. In: 2nd Proceedings of IASTED’s International Conference on Health Informatics, Gaborone, Botswana, pp. 245–250. IASTED Press (2014)
15. Gane, C.P., Sarson, T.: Structured Systems Analysis: Tools and Techniques. Prentice Hall Professional Technical Reference, Upper Saddle River (1979)
16. CIHI: Pan-Canadian Primary Health Care Electronic Medical Record Content Standard. https://secure.cihi.ca/free_products/PHC_EMR_Content_Standard_V3.0_Business_View_EN.pdf. Accessed 21 Sept 2022
17. Kadhim-Saleh, A., Green, M., Williamson, T., Hunter, D., Birtwhistle, R.: Validation of the diagnostic algorithms for 5 chronic conditions in the Canadian primary care sentinel surveillance network (CPCSSN): a Kingston practice-based research network (PBRN) report. J. Am. Board Fam. Med. 26(2), 159–167 (2013)
18. Liaw, S.T., et al.: Towards an ontology for data quality in integrated chronic disease management: a realist review of the literature. Int. J. Med. Inform. 82(1), 10–24 (2013)
19. ISO. ISO/IEC 25024:2015 Systems and software engineering - systems and software quality requirements and evaluation (SQuaRE) - measurement of data quality. https://www.iso.org/standard/35749.html. Accessed 21 Sept 2022
Feature Analysis of Game Software in Japan Using Topic Model and Structural Equation Modeling for Reviews and Livestreaming Chat
Ryuto Miyake1 and Ryosuke Saga2(B)
1 Osaka Prefecture University, 1-1 Gakuen-Cho, Naka-Ku, Sakai, Osaka, Japan
2 Osaka Metropolitan University, 1-1 Gakuen-Cho, Naka-Ku, Sakai, Osaka, Japan
[email protected]
Abstract. The popularity of video games has been increasing worldwide, with the global video game market estimated to generate $184.4 billion in revenue in 2022. User review analysis has become an increasingly important area of focus for game developers, as user reviews are a vital source of information that affects the quality and sales of games. Live streaming has also emerged as a new way of enjoying games, and analyzing live streaming chat can provide information about the game that is different from reviews. This paper analyzes the factors that lead to high ratings of games by performing causal analysis on text data such as reviews and live streaming chat. It uses a topic-based approach to extract distinctive topics from the data for causal analysis and performs sentiment analysis to add emotional information as a feature.
Keywords: Game Review Analysis · Livestreaming Chat · Structural Equation Analysis · Time-series Analysis
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 248–257, 2023. https://doi.org/10.1007/978-3-031-35132-7_18
1 Introduction
In recent years, the popularity of video games has been increasing worldwide due to the development of technology and the widespread use of smartphones and tablets. The global video game market is estimated to generate $184.4 billion in revenue in 2022 and is showing a trend towards further market expansion in the coming years [1]. As a result, there is intense competition in the market as game genres and platforms continue to diversify. Developers must deeply understand the content and interests of the games that customers are looking for in order to remain competitive. Therefore, user review analysis of games has become an increasingly important area of focus. User reviews are a vital source of information for game developers and are known to affect the quality and sales of games. User reviews are evaluations of the game from players who have actually experienced it and contain useful information such as opinions and emotions related to the game. User review analysis uses natural language processing techniques to extract the emotions and opinions of players from those reviews and statistically analyzes them to understand the quality of the game and the needs of the players. Therefore, analyzing game reviews plays an important role in supporting strategic decision-making in the game industry. It can also be used for user recommendations to provide interesting products to users. On the other hand, a new way of enjoying games has emerged through live streaming. In the third quarter of 2022, 7.27 billion hours of streaming were viewed on all platforms, and game streaming has become popular content around the world [2]. In game streaming, there is a chat feature where viewers can leave comments alongside the player’s commentary and game footage. The chat includes comments on the game and the sharing of emotions and tension with the player while watching the video, and it provides information about the game that is different from reviews. In this paper, we analyze the factors that lead to high ratings of games by performing causal analysis on text data such as reviews and live streaming chat for Japanese games. We use a topic-based approach to extract distinctive topics from the data for causal analysis and perform sentiment analysis to add emotional information as a feature. By applying the topic and emotional information to structural equation modeling (SEM), we can quantitatively analyze the causal relationships of each characteristic.
2 Related Work
Factor analyses of game sales and analyses of reviews have been conducted before. Kitami et al. analyzed purchasing factors from quantitative numerical data using regression analysis and SEM [3]. They also created a factor model by combining SEM and the KJ method, and conducted purchasing factor analysis in the United States and Japan [4]. In addition, there are previous studies that use topic models and SEM as analysis models for reviews, such as a causal analysis of services using hLDA and SEM [5]. That study conducted a causal analysis of topics generated by hLDA, review evaluations, and emotions using SEM, assuming that evaluation factors that lead to service improvement also affect emotions [6]. In addition, [7, 8] analyze game reviews, suggesting that the previous method may be meaningful as a factor analysis for the game market. Studies outside of Japan have analyzed how playability, such as game design and player preferences, affects player enjoyment [9], and SVM and Random Forest methods have been used to conduct sentiment analysis [10]. However, there are no examples yet of using chat data for game feature analysis. The purpose of this study is to obtain knowledge useful for game development by analyzing what features users desire, based on their evaluations and sentiments, and what features make the software exciting, based on reviews and on the viewer comments that accompany actual game play.
3 Analysis Process and Methods
3.1 Overview
This paper follows the analysis process outlined in Fig. 1. First, data is collected from e-commerce websites and live-streaming videos on the web using crawlers and data collection software. The data is then preprocessed, and a topic model is trained to extract the topics contained in the text. The words contained in those topics form a word distribution using the Bag of Words method. In addition, the emotions of the text are extracted by sentiment analysis and used as observed variables. Next, a path model is constructed from the word distribution and observed variables, and knowledge is gained by applying structural equation modeling (SEM).
Fig. 1. Overview of Analysis
3.2 Data Collection and Preprocessing
This research requires reviews of game software and chat messages from game playthroughs. For reviews, game titles for Nintendo Switch and PS4 on Amazon’s platform were obtained through web scraping. Each review contains a rating ranging from 1 to 5, and for topic extraction using a topic model, each review is treated as a single document. For chat messages, a search for “(game title) stream” was conducted on YouTube, and the top 10 results sorted by views were selected. The game titles were the top 30 titles in terms of playback time from each month’s game releases in the 2022 fiscal year [11]. To ensure a sufficient number of words per document, the chat messages were grouped by topic on an hourly basis and treated as a single document every 30 s. In this analysis, the following processing was applied to these text data to extract characteristic topics related to games:
• Use only non-independent nouns, pronouns, suffixes, numbers, adverbial forms, and adjectival verbs
• Remove words with only one or two characters consisting of hiragana or katakana
• Remove words that appear in more than 50% of all documents in each genre (e.g., action) and, to remove game-specific nouns, words that appear in only 3 games or fewer
• Remove words related to the streamer’s name or associated words
• Remove common phrases used in streams, such as “ナイス (nice),” “いいね (good),” “草 (lol),” and “笑 (laugh)”
3.3 Topic Extraction
A topic model is a technique that aims to clarify the topic structure in a collection of documents, based on the assumption that the document group contains specific topics, by inferring the words contained in those topics. Each document is generated from a probability distribution of topics, and each word is generated from a distribution of words in each topic. Latent Dirichlet Allocation (LDA) [12] is one of the methods for topic modeling and assumes that a document is formed from multiple topics. Since game reviews are often written from multiple perspectives, LDA is considered a natural model. Hierarchical Latent Dirichlet Allocation (hLDA) [13] is an extension of LDA in which topics are hierarchically structured, as shown in Fig. 2. In hLDA, topics are generated automatically, and lower-level topics are nested in higher-level topics. Documents are assigned topics at each level, which makes hLDA a more natural model because documents can have multiple topics. In this study, hLDA is used to extract features for a single game review. LDA is used to analyze data that may consist of few words, such as chat, or data for which a genre-based hierarchy risks being split game by game.
Fig. 2. The structure of hierarchical Latent Dirichlet Allocation
3.4 Sentiment Information Extraction
Sentiment analysis is a technique for analyzing the emotions that individuals express in text data. It can determine whether the target user has a positive or negative emotion, and it can also be used to evaluate data. Applied to reviews, sentiment analysis can reveal minor dissatisfactions, classified as negative, even in high-rated reviews, and it makes it possible to analyze the distinctive contents that are felt strongly positive. In this study, the emotion is embedded into the SEM path model using the calculation method in reference [5], and emotions are assigned to each topic:

$$E_{im} = \frac{1}{|T_i(S_m)|} \sum_{s \in T_i(S_m)} E(s) \tag{1}$$

where $E_{im}$ represents the sentiment of topic $T_i$ in review $R_m$, $S_m$ represents the set of sentences in $R_m$, $T_i(S_m)$ denotes the sentences of $S_m$ assigned to topic $T_i$, and the function $E$ is used to analyze sentiment. For sentiment inference, $E$ is a Japanese BERT model pre-trained on SNS text [15] and fine-tuned on the “WRIME: Subjective and Objective Sentiment Analysis Dataset” [14]. This function assigns sentiment values of −1 (negative), 0 (neutral), or 1 (positive) to the text (Fig. 3).
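As a small worked illustration of Eq. (1), the per-topic sentiment of a review is the mean of the sentence-level sentiment values over the sentences assigned to that topic. The sketch below assumes that the topic assignment and the sentiment classifier are available as plain callables; the toy dictionaries stand in for them and are hypothetical, not the fine-tuned BERT model used in the paper.

```python
def topic_sentiment(sentences, topic_of, sentiment_of, topic):
    """Mean sentiment over the sentences of one review assigned to `topic`.

    Returns None when no sentence of the review belongs to the topic.
    """
    scores = [sentiment_of(s) for s in sentences if topic_of(s) == topic]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical stand-ins for the topic assignment T_i and the classifier E,
# which returns -1 (negative), 0 (neutral), or 1 (positive).
toy_topics = {
    "controls feel great": "difficulty",
    "bosses are too hard": "difficulty",
    "the plot is moving": "story",
}
toy_sentiment = {
    "controls feel great": 1,
    "bosses are too hard": -1,
    "the plot is moving": 1,
}
review = list(toy_topics)

difficulty_score = topic_sentiment(review, toy_topics.get, toy_sentiment.get, "difficulty")
story_score = topic_sentiment(review, toy_topics.get, toy_sentiment.get, "story")
```

Here the two "difficulty" sentences cancel out to 0.0, which is the kind of mixed sentiment a star rating alone would hide.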
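[Figure: a path model in which Latent Variable 1 has paths to observed variables O1, O2, and O3 with coefficients c11, c12, and c13, and Latent Variable 2 has paths to O2, O3, and O4 with coefficients c22, c23, and c24; the simplified expression lists each latent variable with its observed variables and coefficients.]
Fig. 3. Example of path model and simplified expression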
3.5 Create Path Model and Analysis by SEM
SEM (Structural Equation Modeling) [16] is a statistical method for analyzing the causal relationships between observed variables and latent variables (variables that are not directly observable but can be estimated from other variables). SEM uses a graphical representation called a path model or path diagram to show the causal relationships quantitatively and visually. The path model uses rectangles to represent observed variables and ellipses to represent latent variables. The causal relationships between variables are represented by paths. In topic modeling, the words that make up a topic and the emotions derived from its documents can be defined as observed variables, while the topic itself can be defined as a latent variable and incorporated into a path model. The model is constructed by drawing a path from each topic to each observed variable (word). In hLDA, paths are also drawn from upper topics to lower topics to express hierarchical relationships. The evaluation criteria for path models in SEM include the Goodness of Fit Index (GFI), adjusted GFI (AGFI), Root Mean Square Error of Approximation (RMSEA), and Bayesian Information Criterion (BIC) [17, 18]. GFI represents the goodness of fit between the path model and the actual model, while AGFI is a modified version of GFI that penalizes models with many parameters. The two satisfy AGFI ≤ GFI ≤ 1, with values closer to 1 indicating better fit, and models with values exceeding 0.9 are generally considered good. RMSEA represents the difference between the model’s distribution and the true distribution, expressed as a quantity per degree of freedom; values closer to 0 indicate better fit, and models with values below 0.05 are considered good. BIC evaluates the balance between model fit and model complexity and is used for relative comparison between models, with smaller values indicating a better model.
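The rule-of-thumb thresholds above can be applied mechanically, as in the following sketch. The function names are hypothetical, and the RMSEA expression is one common textbook formulation based on the model chi-square, which the paper itself does not spell out; some SEM software uses N instead of N − 1 in the denominator. The example values are the Action row of Table 1.

```python
import math


def rmsea(chi_square, dof, n_samples):
    """Point estimate of RMSEA from the model chi-square (one common form)."""
    return math.sqrt(max(chi_square - dof, 0.0) / (dof * (n_samples - 1)))


def judge_fit(gfi, agfi, rmsea_value):
    """Apply the thresholds quoted in the text to a set of fit indices."""
    return {
        "gfi_ok": gfi > 0.9,
        "agfi_ok": agfi > 0.9,
        "rmsea_ok": rmsea_value < 0.05,
        "ordering_ok": agfi <= gfi <= 1.0,
    }


# Fit indices reported for the Action genre in Table 1.
action_verdict = judge_fit(gfi=0.8458, agfi=0.8231, rmsea_value=0.033)

# Hypothetical chi-square, degrees of freedom, and sample size.
example_rmsea = rmsea(chi_square=150.0, dof=100, n_samples=501)
```

For the Action genre, the sketch reports that GFI and AGFI miss the 0.9 benchmark while RMSEA passes, which matches the reading of Table 1 given in Sect. 4.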
Table 1. Data size and criteria for each genre review

Genre                  # of documents  GFI     AGFI    RMSEA  BIC
Action                 5720            0.8458  0.8231  0.033  311
Arcade                 654             0.8728  0.8432  0.038  289
First-Person Shooter   1751            0.8213  0.7865  0.028  377
Adventure              1359            0.7674  0.6990  0.040  191
Fig. 4. Results of causality analysis for each genre
4 Analysis Results
Figure 4 shows the results of applying LDA and SEM to reviews with high ratings of 4 and 5 for each category. The number of topics in LDA was set to 3, and the top 3 words with high generation probabilities for each topic were selected. The names of the latent variables, which are topics, were estimated by the authors from the extracted words. The numbers on the words indicate the path coefficients from the topic, representing the strength of the causal relationship. As for the content of the models, in the category of action games, the topics “difficulty,” “story,” and “mode” appear. Regarding the topic of “difficulty,” “attack” has a coefficient of 0.6 and “clear” has a coefficient of 0.1, indicating that the influence of combat-related elements is strong. In FPS/shooting games, the emotional satisfaction related to the story is strong, with a coefficient of 1.1 for the topic of “story” and 0.1 for the topic of “beginners”.
Fig. 5. Results of causality analysis for each title
It is speculated that the difficulty level of games can elicit mixed emotions and may seem difficult for novice players. Table 1 shows the number of documents and evaluations used in the analysis model. The GFI and AGFI of the action, arcade, FPS/shooting, and adventure models are around 0.7–0.8, which is below the general benchmark of a good model (GFI > 0.9), but the RMSEA values of all models are below 0.05, indicating a certain level of reliability.
Figure 5 shows the result of hLDA applied to high-rated (4 and 5) reviews for “Animal Crossing: New Horizons”, which had the largest number of documents among the reviews collected in this study. The depth of the hierarchy was set to 3, a total of 6 topics with a high number of assigned documents were extracted from the automatically constructed topics, and a path model was created using the 5 words with the highest generation probability in each. Regarding the evaluation of SEM in Table 2, both GFI and AGFI exceeded 0.9, and RMSEA indicated a relatively good model at 0.045. The words that form the topics include many game-related terms such as “craft” and “my design”, which provide more detailed knowledge about the game. However, because the hierarchy divides the topics into finer levels, information not directly related to the game content, such as ordering and purchasing the game software from e-commerce sites, also appeared frequently.
Finally, Fig. 6 shows the LDA and SEM analyses of viewers’ chat comments during game broadcasts, using four game titles as examples: “Dragon Quest X,” “Kirby and the Forgotten Land,” “Resident Evil Village,” and “Final Fantasy XIV”. In addition to emotions, we incorporated the number of chat comments every 30 s into the path model and analyzed its causal relationship.
As a result, in “Final Fantasy XIV,” we found that the words “characters” and “Lalafell” (characters in the game) were grouped into the topic of “characters,” “story” and “main” into the topic of “story,” and “job” and “ninja” (related to jobs) into the topic of “jobs”. We also found that the chat had features similar to those of the reviews. Regarding the influence of emotions and the number of comments, in “Resident Evil Village,” the topic related to “broadcasting” had the highest impact, with values of 0.72 and 0.44, respectively; this topic is considered to correspond to the most exciting parts of the broadcast. However, the words that appeared concerned the broadcasting situation, such as “New Year” and “looking forward to it,” making it difficult to analyze the game itself. Additionally, while almost all p-values for the words
were 0, many p-values for emotions and the number of comments were above 0.01, making it difficult to demonstrate their significance. Moreover, from the evaluation in Table 3, only “Final Fantasy XIV” had GFI and AGFI values exceeding 0.9, and all titles had RMSEA below 0.05, indicating some reliability.

Table 2. Data size and criteria for each game livestreaming chat

Title                         # of documents  GFI     AGFI    RMSEA  BIC
Dragon Quest X                2425            0.7811  0.7534  0.033  533
Kirby and the Forgotten Land  1978            0.8781  0.8324  0.045  227
Resident Evil Village         3541            0.8151  0.7597  0.028  253
Final Fantasy XIV             708             0.9300  0.9091  0.020  203
[Figure: path models for four titles (Dragon Quest X, Kirby and the Forgotten Land, Final Fantasy XIV, and Resident Evil Village). Each topic lists its top words with path coefficients, together with the coefficients for sentiment and for the number of comments.]
Fig. 6. Results of causality analysis for each title
5 Conclusion and Future Work In this paper, causal analysis was conducted for both reviews and chats regarding video games, and their characteristics were analyzed. By analyzing the game from the perspective of the player’s emotions and the comments from viewers in chats, we were able to
obtain its characteristics. However, the p-values for the influence of emotions and comment count on chats were high, indicating that emotions were not accurately identified due to the abundance of colloquial expressions and abbreviations, and that comments regarding the streamer or other factors unrelated to the game content had a significant impact. Therefore, in chat analysis, it is necessary to consider not only the textual data but also factors such as the gameplay video, audio, and streamer characteristics. The achievement of this paper is that we were able to analyze the characteristics of highly rated games from both reviews and chats and obtain insights. Moving forward, it will be a challenge to analyze the enjoyment of video games in-depth by incorporating features such as images, videos, and audio, in addition to chats.
Acknowledgments. This research is supported by Hayao Nakayama Foundation for Science, Technology and Culture.
References
1. NewZoo: Global games market report (2022)
2. May, E.: Streamlabs and Stream Hatchet Q3 2022 Live Streaming Report. https://streamlabs.com/content-hub/post/streamlabs-and-stream-hatchet-q3-2022-live-streaming-report
3. Kitami, K., Saga, R.: Causality analysis for best seller of software game by regression and structural equation modeling. In: 2010 IEEE International Conference on Systems, Man and Cybernetics, pp. 1495–1500. IEEE (2010)
4. Kitami, K., Saga, R., Matsumoto, K.: Comparison analysis of video game purchase factors between Japanese and American consumers. In: König, A., Dengel, A., Hinkelmann, K., Kise, K., Howlett, R.J., Jain, L.C. (eds.) KES 2011. LNCS (LNAI), vol. 6883, pp. 285–294. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23854-3_30
5. Ogawa, T., Saga, R.: Structural equation modeling with sentiment information and hierarchical topic modeling. Int. J. Adv. Syst. Meas. 13(3 & 4), 26–35 (2020)
6. Singh, U., Saraswat, A., Azad, H.K., Abhishek, K., Shitharth, S.: Towards improving e-commerce customer review analysis for sentiment detection. Sci. Rep. 12, 21983 (2022). https://doi.org/10.1038/s41598-022-26432-3
7. Saga, R., Kunimoto, R.: LDA-based path model construction process for structure equation modeling. Artif. Life Robot. 21(2), 155–159 (2016). https://doi.org/10.1007/s10015-016-0270-0
8. Kunimoto, R., Saga, R.: Causal analysis of user’s game software evaluation using hLDA and SEM. IEEJ 135(6), 602–610 (2015)
9. Li, X., et al.: A data-driven approach for video game playability analysis based on players’ reviews. Information 12, 129 (2021)
10. Britto, L.F.S., Pacífico, L.D.S.: Evaluating video game acceptance in game reviews using sentiment analysis techniques. In: Proceedings of SBGames 2020, pp. 399–402 (2020)
11. Stream Tech Research Inc. https://www.giken.tv/news
12. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
13. Blei, D.M., Griffiths, T.L., Jordan, M.I.: The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57(2), 7:1–7:30 (2010)
14. Suzuki, H., et al.: A Japanese dataset for subjective and objective sentiment polarity classification in micro blog domain. In: Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022), pp. 7022–7028 (2022)
15. hottoSNS-BERT. https://github.com/hottolink/hottoSNS-bert
16. Anderson, J.C., Gerbing, D.W.: Structural equation modeling in practice: a review and recommended two-step approach. Psychol. Bull. 103(3), 411–423 (1988)
17. Schermelleh-Engel, K., Moosbrugger, H., Müller, H.: Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods Psychol. Res. Online 8(2), 23–74 (2003)
18. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
Inductive Model Using Abstract Meaning Representation for Text Classification via Graph Neural Networks
Takuro Ogawa and Ryosuke Saga
Osaka Prefecture University, 1-1 Gakuen-Cho, Naka-Ku, Sakai, Osaka, Japan
Abstract. Text classification is a fundamental task in NLP. Recently, graph neural networks (GNN) have been applied to this field. GNN-based methods capture text structures well and improve the performance of text classification. However, previous work could not accurately capture non-consecutive and long-distance semantics in individual documents. To address this issue, we propose an ensemble model comprising two parts: a model for capturing non-consecutive and long-distance semantics and a model for capturing local word sequence semantics. For the former, we use abstract meaning representation (AMR) graphs, which express the relations between entities; for the latter, we use another set of graphs based on a fixed-size sliding window. Furthermore, we propose a learning method that considers the edge features of AMR graphs. Extensive experiments on benchmark datasets are conducted, and the results illustrate the effectiveness of our proposed methods and AMR graphs.
Keywords: Graph Neural Network · Text Classification · Abstract Meaning Representation
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 258–271, 2023. https://doi.org/10.1007/978-3-031-35132-7_19

1 Introduction
Text classification is a fundamental task in the NLP field. It is widely applied in tasks such as topic classification [1], sentiment analysis [2], and question answering [3]. Traditional methods conduct text classification using sparse features, such as bag-of-words [4] and n-grams [5]. More recently, deep learning models have been applied to learning text representations, including CNN [6] and RNN variants such as LSTM [7] and GRU-based methods [8]. These models can capture local word sequence semantics, but they fail to capture non-consecutive and long-distance semantics [9, 10].
Graph neural network (GNN)-based methods have recently been proposed to address this issue. Yao et al. [10] proposed the text graph convolutional network (TextGCN), which builds a heterogeneous word text graph for a whole dataset and captures global word co-occurrence information. This method employs semi-supervised graph convolutional networks [11]. TextGCN requires the construction of a global graph of the entire corpus, which is costly in terms of memory. Furthermore, unknown documents cannot be processed by TextGCN [12]. In contrast, Huang et al. [13] built a graph for each document with global parameter sharing. Nodes correspond to words in the document, and edges represent the co-occurrences between terms within a fixed-size sliding window [14]. A message passing mechanism is applied to learn the graphs. Because this method builds a graph for each document, it reduces memory consumption, and it is an inductive method that can handle unknown documents. However, as all graphs share the same parameters, the various relationships between words in different documents are ignored. Zhang et al. [15] proposed TextING, which builds an individual graph for each document using a fixed-size sliding window and learns text-level word interactions with a GNN to produce embeddings for new text effectively. This method addresses the neglect of context-aware word relations within each document. Although TextING can capture local consecutive word sequence semantics, it may ignore non-consecutive and long-distance semantics because it builds the graph for each document from adjacent words in the text using a fixed-size sliding window.
Fig. 1. Example of an AMR graph for “the film, despite the gratuitous cinematic distractions impressed upon it, is still good fun.” Graph generated from a movie review
An abstract meaning representation (AMR) [16] graph is a useful structure that captures relations between entities. It represents the meaning of a sentence as a rooted, directed, acyclic graph in which nodes are concepts and edges are semantic relations. This structure often represents long-distance semantics. In Fig. 1, the AMR graph captures the long-distance semantics between ‘fun’ and ‘film’. Recently, AMR has attracted wide attention. Some works generate sentences from AMR [17, 18] or AMR from sentences [19, 20]. Other works apply AMR to NLP tasks such as machine translation [21], text summarization [22, 23], information extraction [24, 25], and question answering [26]. In summary, many studies have exploited the properties of AMR in NLP tasks, but none of them use AMR graphs for text classification. Therefore, in this work, we propose inductive text classification using AMR via edge-featured GNNs based on TextING. In contrast to previous approaches that use fixed-size sliding window structures, our method considers non-consecutive and long-distance semantics by using AMR graphs. Moreover, our method has relatively higher interpretability because an AMR graph only has edges between semantically related
words. The representations of the word nodes in an AMR graph are propagated to their neighbors via the gated GNN (GGNN) [27]. We also consider edge labels and edge features when learning the graph representation. Edge representations are updated by a GRU using edge labels and edge features and are subsequently used for updating the word nodes. Although the AMR graph captures long-distance semantics, it may ignore local consecutive word sequence semantics. Therefore, the model in this work is trained in parallel on AMR graphs and on another set of graphs based on a fixed-size sliding window; we also introduce a voting process for the final prediction, inspired by TextING-M [15], which uses TextGCN but is not inductive. In summary, we construct an inductive model that considers both long-distance semantics and local consecutive word sequence semantics. Figure 2 shows our model. Our method achieves high classification performance and offers high interpretability through edge-level and node-level attention. Our contributions are as follows:
• We propose a new inductive model that considers long-distance semantics and local consecutive word sequence semantics for text classification.
• This work is the first approach to text classification using the AMR representation.
• This work achieves better performance than existing works.
2 Related Work
2.1 Abstract Meaning Representation
The AMR representation abstracts away from surface word strings and syntactic structure, thereby producing a language-neutral representation of meaning. AMR graphs are therefore flexible and not designed for a particular domain. Using this property, Hardy and Vlachos [22] and Liao et al. [23] summarized text. Rao et al. [24] identified molecular events in biomedical text by hypothesizing that an event is a subgraph of the AMR graph. Song et al. [21] used the ability of AMR to capture more relations for machine translation. Zhang et al. [17] generated sentences from AMR using GCN, and Beck et al. converted edges into additional AMR nodes and generated sentences using GGNN [28]. Bevilacqua et al. [19] generated sentences from AMR and AMR from sentences using BART. In summary, the properties of AMR have been widely exploited in various NLP tasks. Despite this potential, AMR graphs have not been applied to text classification tasks.
2.2 Deep Learning for Text Classification
RNN and CNN are used to learn text representations. TextRNN [29] used RNN for text classification, and Yang et al. [8] used GRU-based methods. TextCNN [6] used CNN for text classification. RNN and CNN are widely used and effective deep learning models for text classification because they capture local consecutive word sequence semantics well. However, they may ignore non-consecutive and long-distance semantics.
Fig. 2. Architecture of our method. $h^t$ is the node representation and $el^t$ is the edge label. $e_f^t$ is the edge feature generated from the nodes related to the edge. $e_A^t$ represents the edge attention. predictionA and predictionI are the predictions from the AMR graph and TextING model, respectively.
2.3 GNN for Text Classification
GNN can process rich relational structures; therefore, it is an effective model for classifying texts, which include various relationships among words. GNN-based text classification has been widely explored. For instance, Yao et al. [10] performed text classification using GCN. Moreover, Liu et al. proposed TensorGCN, which extends TextGCN by considering syntactic and sequential contextual information [30]. However, GCN-based methods require a global graph that includes all the documents. In other words, they are transductive models and cannot handle unknown documents. To learn inductively, Huang et al. [13] and Zhang et al. [15] proposed text classification methods using individual graphs for each document. Ding et al. [31] and Xie et al. [12] considered latent semantic structure, such as topics. However, these methods could not accurately capture non-consecutive and long-distance semantics because their graphs were built from adjacent words in the text using a fixed-size sliding window. Yao et al. [10] and Zhang et al. [15] reported the performance of their models with varying window sizes. Although increasing the window size can capture more long-distance semantics, the performance did not improve. This can be attributed to the larger window adding edges between nodes that are not closely related.
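The limitation of fixed-size sliding windows discussed above can be seen in a small sketch. The function below is an illustrative construction, not the implementation of TextING or Text-level GNN, and the token sequence is a shortened, hypothetical version of the sentence in Fig. 1.

```python
def sliding_window_edges(tokens, window_size=3):
    """Undirected co-occurrence edges between words that share a window."""
    edges = set()
    n_windows = max(len(tokens) - window_size + 1, 1)
    for start in range(n_windows):
        window = tokens[start:start + window_size]
        for i, left in enumerate(window):
            for right in window[i + 1:]:
                if left != right:
                    # Store each pair in sorted order so (a, b) == (b, a).
                    edges.add(tuple(sorted((left, right))))
    return edges


tokens = "the film is still good fun".split()
edges = sliding_window_edges(tokens, window_size=3)
wide_edges = sliding_window_edges(tokens, window_size=6)
```

With a window of 3, "film" and "fun" are never connected, whereas a window covering the whole sentence connects them but also links every other pair of words, which is the trade-off that motivates using AMR edges instead.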
3 Method
3.1 Building Graph
We construct a graph based on the AMR representation and a TextING graph based on a fixed-size sliding window. Let $G^A = \{V^A, E^A, EL\}$ and $G^T = \{V^T, E^T\}$ be an AMR graph and a TextING graph, respectively, where $V = \{v_1, \ldots, v_i, \ldots, v_N\}$ is a set of $N$ nodes, $E = \{e_1, \ldots, e_i, \ldots, e_M\}$ is a set of $M$ edges, and $EL = \{el_1, \ldots, el_i, \ldots, el_M\}$ is a set of $M$ AMR edge labels. $v_i$ is a vector initialized with the $d$-dimensional word embedding, and the embeddings of the nodes in a graph are denoted as $h \in \mathbb{R}^{|V| \times d}$. $el_i$ is initialized with the one-hot vector corresponding to the edge label in a graph; the edge label-based one-hot vectors are denoted as $l \in \mathbb{R}^{|E| \times |K|}$, where $K$ is the number of types of edge labels. AMR is a rooted directed acyclic graph, which means that node embedding information is propagated in one direction only. Ideally, information in an AMR graph should propagate in both directions [28], in the same way that RNN-based encoders benefit from right-to-left propagation in a bidirectional LSTM [32]. Therefore, we also add reverse edges to the graph.
3.2 GGNN
GGNN employs a gated recurrent unit (GRU) as its recurrent function. The node representation $h$ is updated from its previous representation and its neighboring information. In this work, GGNN is defined as:

$$h^t = \mathrm{GRU}(h^{t-1}, a^{t-1}), \tag{1}$$

where $h^0 = h$, and $a$ is the information aggregated from adjacent nodes during node updating. The GRU function is defined as:

$$z^t = \sigma(W_z a^t + U_z h^{t-1} + b_z), \tag{2}$$
$$r^t = \sigma(W_r a^t + U_r h^{t-1} + b_r), \tag{3}$$
$$\tilde{h}^t = \tanh(W_h a^t + U_h (r^t \odot h^{t-1}) + b_h), \tag{4}$$
$$h^t = \tilde{h}^t \odot z^t + h^{t-1} \odot (1 - z^t), \tag{5}$$

where $\sigma$ is the sigmoid function, $\odot$ denotes the element-wise product, and all $W$, $U$, and $b$ are trainable weights and biases. $z$ and $r$ are the update gate and reset gate, respectively. We employ these GGNN and GRU functions in node updating and edge updating, respectively.
3.3 Node Updating
A node embedding is updated from $h^{t-1}$ to $h^t$ guided by its adjacent neighbor information. When aggregating adjacent neighbor information in AMR graphs, we employ edge-level attention to consider the importance of each edge (this function is described in Sect. 3.4). Then, the node embedding is updated using its previous representation and the aggregated information. The formulas are expressed as:

$$h^0 = \tanh(h W_n + b_n), \tag{6}$$
$$\hat{A} = A \times e_A, \tag{7}$$
$$a^{t-1} = \hat{A} h^{t-1}, \tag{8}$$
$$h^t = \mathrm{GRU}(h^{t-1}, a^{t-1}), \tag{9}$$
where $A \in \mathbb{R}^{|V| \times |V|}$ is the adjacency matrix, $e_A$ is the edge-level attention, and $\hat{A}$ is the adjacency matrix weighted by the edge-level attention. In Formula (7), $A$ is multiplied by $e_A$ so that our model can consider the importance of each edge. The more times this node-updating operation (Formulas (7), (8), and (9)) is repeated (i.e., the larger the number of GGNN layers), the farther the information propagates to distant nodes. For the TextING model, Formula (7) is not used.
3.4 Edge Updating and Edge-Level Attention
The edges of an AMR graph carry edge-label information. To further enrich the edge information, we create an edge feature from the node information associated with the edge. Inspired by the work of Cai et al. [33], we employ the joint node representation as a temporary edge feature. Then, we use the edge feature and the edge label to update the edge representation:

$$e_f^0 = \tanh(e_{tf} W_f + b_f), \tag{10}$$
$$e_f^t = \mathrm{GRU}(e_f^{t-1}, l), \tag{11}$$

where $e_{tf} \in \mathbb{R}^{|E| \times |d^0 \times 2|}$ ($d^0$ is the dimension of the word embedding of $h^0$) is the temporary edge feature, and $e_f^0 \in \mathbb{R}^{|E| \times |K|}$ represents the edge features used to generate newer edge features. By repeating the edge updating process (Formula (11)), the model can take edge label information into account more strongly. In this study, we repeat the operation twice. After updating the edge information, we obtain the corresponding edge soft attention. The edge attention is defined as follows:

$$e_A = \sigma(f_1(e_f^t)), \tag{12}$$
where $e_A$ is the edge attention, and $f_1$ is a multilayer perceptron that acts as a soft attention. These operations are performed between Formulas (6) and (7) of node updating. We use a different function $f_1$ for each node update operation (each GGNN layer) to explore the attention in each state from which the nodes are updated.
3.5 Readout Function
We employ the method of Zhang et al. [15] as the readout function. Word nodes need to be aggregated into a graph-level representation for text classification. The readout function is defined as:

$$h_v = \sigma(f_2(h_v^t)) \odot \tanh(h_v^t W_r + b_r), \tag{13}$$
$$h_G = \frac{1}{|V|} \sum_{v \in V} h_v + \mathrm{Maxpooling}(h_1 \ldots h_N), \tag{14}$$
where $h_v$ is one row of the matrix $h^t$, and $f_2$ is a multilayer perceptron that acts as a soft attention weight for each node. The readout uses the average of the weighted word features and a max-pooling function to form the graph representation $h_G$. The graph-level representation is then used to predict the label of the text, and the loss function is defined as the cross-entropy loss:

$$\hat{y}_G = \mathrm{softmax}(W h_G + b), \tag{15}$$
$$L = -\sum_i y_{G_i} \log \hat{y}_{G_i}, \tag{16}$$

where $\hat{y}_{G_i}$ is the predicted label of graph $G_i$ and $y_{G_i}$ is its true label. The AMR graphs and the TextING graphs each go through the operations described so far in parallel (the TextING graphs do not include edge updating). Finally, we train the model with both sets of graphs in parallel and make them vote for the final prediction. The formula is defined as:

$$y_G = \mathrm{softmax}((1 - t)\, y_{G^A} + t\, y_{G^T}), \tag{17}$$
where $y_G$ is the final prediction, $y_{G^A}$ is the AMR graph prediction, $y_{G^T}$ is the TextING prediction, and $t$ is a hyperparameter that controls the relative proportion of the two. In this way, we learn an inductive model that considers both long-distance semantics and local consecutive word sequence semantics.
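Two of the steps in this section, the attention-weighted node update (Formulas (7)–(9), using the GRU of Formulas (2)–(5)) and the final vote (Formula (17)), can be sketched in plain Python as follows. This is a toy illustration under several assumptions rather than the authors' implementation: the edge attention is given as a matrix with the same shape as the adjacency matrix, the weights are random instead of trained, and all names (`gru_cell`, `node_update`, `vote`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_cell(h_prev, a, p):
    """One GRU step for a single node, following Formulas (2)-(5)."""
    z = [sigmoid(x + y + b) for x, y, b in
         zip(matvec(p["Wz"], a), matvec(p["Uz"], h_prev), p["bz"])]
    r = [sigmoid(x + y + b) for x, y, b in
         zip(matvec(p["Wr"], a), matvec(p["Ur"], h_prev), p["br"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(x + y + b) for x, y, b in
            zip(matvec(p["Wh"], a), matvec(p["Uh"], gated), p["bh"])]
    return [c * zi + hi * (1.0 - zi) for c, zi, hi in zip(cand, z, h_prev)]


def node_update(H, A, edge_att, p):
    """Formulas (7)-(9): scale edges by attention, aggregate, apply the GRU."""
    n, d = len(H), len(H[0])
    A_att = [[A[i][j] * edge_att[i][j] for j in range(n)] for i in range(n)]
    agg = [[sum(A_att[i][j] * H[j][k] for j in range(n)) for k in range(d)]
           for i in range(n)]
    return [gru_cell(H[i], agg[i], p) for i in range(n)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vote(y_amr, y_texting, t=0.5):
    """Formula (17): blend the two predictions, then apply softmax."""
    return softmax([(1.0 - t) * a + t * b for a, b in zip(y_amr, y_texting)])


# Hypothetical toy graph: 3 nodes, embedding size 2, untrained random weights.
rng = random.Random(0)
d = 2
params = {name: [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)]
          for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
params.update({name: [0.0] * d for name in ("bz", "br", "bh")})
H = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
edge_att = [[0.0, 0.9, 0.0], [0.9, 0.0, 0.2], [0.0, 0.2, 0.0]]

H_next = node_update(H, A, edge_att, params)
final = vote([0.7, 0.3], [0.4, 0.6], t=0.5)
```

Calling `node_update` repeatedly corresponds to stacking GGNN layers, and for the TextING branch the attention matrix would simply be all ones on existing edges. Setting `t` closer to 1 in `vote` gives the TextING prediction more weight.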
4 Experiment
4.1 Preprocessing for AMR Graph
First, we must construct AMR graphs for the text classification datasets. In this experiment, we use a trained SPRING model [19] to construct the AMR graphs. Therefore, we first confirmed that the trained SPRING predicts AMR graphs accurately, using commonly used text classification datasets: Movie Reviews (MR) [34], Stanford Sentiment Treebank (SST1, SST2) [35], subsets of Reuters (R8, R52¹) [10], the TREC question dataset (TREC) [36], and the Subjectivity dataset (Subj) [37]. We input each dataset into the trained SPRING to generate AMR graphs and then generate text back from the AMR graphs. We evaluate the difference between the original text and the generated text with BLEU [38], a common measure for natural language generation. Table 1 shows the BLEU scores for each dataset. To evaluate the effectiveness of our model, we use the five datasets with high BLEU scores (over 25.00): MR, SST1, SST2, TREC, and Subj. Table 2 shows the statistics of the datasets. We construct individual AMR graphs for the five datasets with the trained SPRING.
¹ http://disi.unitn.it/moschitti/corpora.htm
After constructing the AMR graphs, we preprocess them. Idioms are divided into word units, and the word embeddings are averaged to form the idiom embedding. We also treat conceptual elements that do not exist in the document, such as “contrast,” as word nodes. The AMR graph expresses negation as a polarity given by a “-” node. Therefore, we change the polarity nodes expressed as “-” to “not” nodes.
Table 1. Text-to-AMR-to-Text generation results for each benchmark text classification
Table 2. Summary statistics of the datasets
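The two node-level preprocessing steps of Sect. 4.1 can be sketched as follows. The representation is an assumption made for illustration: multi-word concepts are taken to be hyphen-joined strings that may end in a numeric sense tag (for example "give-up-01"), and the embedding table is a toy dictionary rather than GloVe.

```python
def split_concept(concept):
    """Split a hyphen-joined concept into words, dropping numeric sense tags."""
    return [part for part in concept.split("-") if part and not part.isdigit()]


def idiom_embedding(concept, embeddings):
    """Average the embeddings of the words that make up a multi-word concept."""
    vectors = [embeddings[w] for w in split_concept(concept) if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[k] for vec in vectors) / len(vectors) for k in range(dim)]


def normalize_polarity(node_labels):
    """Rewrite AMR polarity nodes written as "-" to the word "not"."""
    return ["not" if label == "-" else label for label in node_labels]


# Hypothetical two-dimensional embeddings for illustration only.
toy_embeddings = {"give": [1.0, 0.0], "up": [0.0, 1.0]}
idiom_vec = idiom_embedding("give-up-01", toy_embeddings)
labels = normalize_polarity(["good", "-", "fun"])
```

Rewriting "-" as "not" lets the polarity node receive an ordinary word embedding instead of being treated as an out-of-vocabulary symbol.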
4.2 Baseline
The baselines fall into three categories: 1) traditional deep learning methods, namely CNN [6] and Bi-LSTM [29]; 2) a graph-based transductive representation learning model, TextGCN [10]; and 3) graph-based inductive representation learning models, namely Text-level GNN [13], HyperGAT [31], T-VGAE² [11], and TextING [15].
² The code has not been released. Therefore, we only cite the accuracy on the MR dataset as reported by Xie et al. [11].
4.3 Implementation Details
In the experiment, we use 300-dimensional GloVe [39] vectors to initialize the embedding vectors, while the embeddings of out-of-vocabulary words are randomly sampled between −0.01 and 0.01 [15]. The number of GGNN layers is 2, and the relative proportion is t = 0.5. For our model and the TextING model, we set the learning rate to 0.00125 with the Adam [40] optimizer, the batch size to 1024, and the dropout rate to 0.5. The window size of the TextING graph is 3. We train our model for a maximum of 200 epochs. Edge updating applies the GRU function twice. For all datasets, we randomly select 10% of the training set as the validation set, following Yao et al. [10]. For the baseline models with pre-trained word embeddings, we use 300-dimensional GloVe vectors.
Table 3. Accuracy on each text classification dataset. We run all models 10 times and report the mean ± standard deviation [10]. Note: Some baseline results are from Yao et al. [10], Ding et al. [31], and Xie et al. [11].
Table 4. Accuracy of TextING when combined with other inductive models
Table 5. Improved accuracy of Bi-LSTM and CNN when combined with AMR graphs
Text Classification Results. Table 3 shows the classification accuracy of the different methods on the five benchmark datasets. Our method outperforms all GNN-based
baselines on four of the datasets, suggesting the effectiveness of our model for text classification using the AMR graph. This result indicates that capturing both long-distance semantics and local word sequence semantics improves text classification performance. However, the performance of our method is lower than that of TextING on TREC. This can be attributed to the TREC dataset having shorter documents than the other datasets; for such short texts, the fixed-size sliding window graph already captures the relationships between words adequately.
Efficiency Comparison Among AMR and Other Methods. Besides combining TextING with AMR graphs, we also combined TextING with other inductive models, namely Bi-LSTM (trained on a recurrent-like AMR graph) and HyperGAT (trained on a graph structure-like AMR graph). Table 4 shows the results. On SST2, the combination of TextING with HyperGAT has the best score. However, only TextING combined with AMR graphs improved the accuracy on all other datasets except TREC. In addition, we investigated whether the accuracy of Bi-LSTM and CNN, which capture the semantics of local continuous word sequences, improves when they are combined with AMR graphs. Table 5 shows the results. For all datasets, the accuracy of Bi-LSTM improves when it is combined with the AMR graph. Furthermore, for all datasets except TREC, the accuracy of CNN improves when it is combined with the AMR graph. These results suggest that AMR graphs can compensate for the shortcomings of models that capture only the semantics of local continuous word sequences (e.g., Bi-LSTM and CNN).
4.4 Investigate Edge-Featured Graph
We also evaluated the effect of considering the edge features of AMR graphs. Table 6 shows the classification accuracies of the edge-featured and non-edge-featured AMR graphs; the former performs better than the latter. This result suggests that our method can sufficiently learn the AMR graph representation when edge features based on edge labels are considered.
4.5 Vote Ratio for Final Prediction
We also investigated the performance of our method while varying t, which controls the relative proportion. Fig. 3 shows the results for MR and Subj. The performance of our method improves between t = 0.1 and 0.6. In addition, our method performs best on MR and Subj when the vote ratio of the AMR graph to the TextING graph is 3:7.
Table 6. Accuracy of edge-featured graph and non-edge-featured graph on each text classification dataset.
Fig. 3. Test accuracy with different values of t. Panels: (1) MR, (2) Subj. Axes: Parameter and Accuracy.
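The role of the vote ratio t in Sect. 4.5 can be illustrated with a small sketch. It rests on assumptions the text does not confirm: that the two models' class probabilities are blended linearly, and that t weights the AMR-graph model while 1 − t weights TextING (under that reading, a 3:7 AMR-to-TextING vote would correspond to t = 0.3). The function name `vote` is hypothetical.

```python
from typing import List


def vote(p_amr: List[float], p_texting: List[float], t: float) -> int:
    """Blend two class-probability vectors and return the winning class index.

    Assumption: t weights the AMR-graph model and (1 - t) weights TextING.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(p_amr) != len(p_texting):
        raise ValueError("both models must score the same number of classes")
    blended = [t * a + (1.0 - t) * b for a, b in zip(p_amr, p_texting)]
    # Index of the largest blended score.
    return max(range(len(blended)), key=blended.__getitem__)
```

With disagreeing models, a small t lets the TextING prediction win and a large t lets the AMR-graph prediction win, which is the trade-off that a sweep over t explores.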
4.6 Attention Visualization
We visualized the attention layers to understand which nodes and edges are important for learning the graph representation. Figure 4 illustrates the important nodes and edges. The upper part of the figure presents the edge attention of the first layer, whereas the bottom part shows the edge attention of the last layer and the node attention in the readout. The important edges differ at each layer, each of which updates the node embeddings. The node attention makes it easy to identify the nodes judged important in the readout.
Fig. 4. Edge and node attention visualization of a positive movie review on MR. Panels: (1) First layer, (2) Last layer. Node attention in readout is illustrated at the bottom part. The original text is “a well paced and satisfying little drama that deserved better than a direct-to-video release.”
5 Conclusion and Future Work
This study proposed a new inductive ensemble text classification model. Individual AMR graphs were built to capture non-consecutive and long-distance semantics, and individual graphs were built with a fixed-size sliding window to capture local word sequence semantics. This work is the first approach to text classification using AMR. Furthermore, we proposed a learning method that considers edge features for AMR graphs. The experimental results showed that our method outperforms all baselines on four datasets, thereby showing the effectiveness of our model for text classification using AMR. In this study, we only used relatively short texts whose AMR graphs could be generated relatively accurately by the trained SPRING model. As the AMR graph can capture non-consecutive and long-distance semantics, it may derive more benefit from long texts. In the future, we intend to apply GNNs with AMR graphs to longer texts.
References
1. Wang, S., Manning, C.D.: Baselines and bigrams: simple, good sentiment and topic classification. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 90–94 (2012)
2. Wang, Y., Huang, M., Zhu, X., Zhao, L.: Attention-based LSTM for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 606–615 (2016)
3. Dhingra, B., Pruthi, D., Rajagopal, D.: Simple and effective semi-supervised question answering. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pp. 582–587 (2018)
4. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0026683
5. Cavnar, W.B., Trenkle, J.M.: N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval (1994)
6. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751 (2014)
7. Zhou, C., Sun, C., Liu, Z., Lau, F.: A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630 (2015)
8. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1480–1489 (2016)
9. Peng, H., et al.: Large-scale hierarchical text classification with recursively regularized deep graph-CNN. In: Proceedings of the 2018 World Wide Web Conference, pp. 1063–1072 (2018)
10. Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp. 7370–7377 (2019)
11. Xie, Q., Huang, J.: Inductive topic variational graph auto-encoder for text classification. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4218–4227 (2021)
12. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: Proceedings of the 2017 International Conference on Learning Representations (2017)
13. Huang, L., Ma, D., Li, S., Zhang, X., Wang, H.: Text level graph neural network for text classification. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 3444–3450 (2019)
14. Rousseau, F., Kiagias, E., Vazirgiannis, M.: Text categorization as a graph classification problem. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1702–1712 (2015)
15. Zhang, Y., Yu, X., Cui, Z., Wu, S., Wen, Z., Wang, L.: Every document owns its structure: inductive text classification via graph neural networks. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 334–339 (2020)
16. Banarescu, L., et al.: Abstract meaning representation for SemBanking. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 178–186 (2013)
17. Zhang, Y., et al.: Lightweight, dynamic graph convolutional networks for AMR-to-text generation. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2162–2172 (2020)
18. Wang, T., Wan, X., Yao, S.: Better AMR-to-text generation with graph structure reconstruction. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp. 3919–3925 (2021)
19. Bevilacqua, M., Blloshmi, R., Navigli, R.: One SPRING to rule them both: symmetric AMR semantic parsing and generation without a complex pipeline. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence, pp. 12564–12573 (2021)
20. Lam, H.T., et al.: Ensembling graph predictions for AMR parsing. In: Proceedings of the 35th Conference on Neural Information Processing Systems (2021)
21. Song, L., Gildea, D., Zhang, Y., Wang, Z., Su, J.: Semantic neural machine translation using AMR. Trans. Assoc. Comput. Linguist. 7, 19–31 (2019)
22. Hardy, H., Vlachos, A.: Guided neural language generation for abstractive summarization using abstract meaning representation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 768–773 (2018)
23. Liao, K., Lebanoff, L., Liu, F.: Abstract meaning representation for multi-document summarization. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1178–1190 (2018)
24. Rao, S., Marcu, D., Knight, K., Daumé III, H.: Biomedical event extraction using abstract meaning representation. In: Proceedings of the BioNLP 2017 Workshop, pp. 126–135 (2017)
25. Zhang, Z., Ji, H.: Abstract meaning representation guided graph encoding and decoding for joint information extraction. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 39–49 (2021)
26. Mitra, A., Baral, C.: Addressing a question answering challenge by combining statistical methods with inductive rule learning and reasoning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 2779–2785 (2016)
27. Li, Y., Tarlow, D., Brockschmidt, M., Zemel, R.: Gated graph sequence neural networks. In: Proceedings of the 2016 International Conference on Learning Representations (2016)
28. Beck, D., Haffari, G., Cohn, T.: Graph-to-sequence learning using gated graph neural networks. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pp. 273–283 (2018)
29. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pp. 2873–2879 (2016)
30. Liu, X., You, X., Zhang, X., Wu, J., Lv, P.: Tensor graph convolutional networks for text classification. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 8409–8416 (2020)
31. Ding, K., Wang, J., Li, J., Li, D., Liu, H.: Be more with less: hypergraph attention networks for inductive text classification. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4927–4936 (2020)
32. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
33. Cai, S., Li, L., Han, X., Zha, Z.-J., Huang, Q.: Edge-featured graph neural architecture search. arXiv preprint arXiv:2109.01356 (2021)
34. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124 (2005)
35. Socher, R., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642 (2013)
36. Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics (Volume 1), pp. 1–7 (2002)
37. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, pp. 271–278 (2004)
38. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
39. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
40. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (2015)
Enhancing Visual Encodings of Uncertainty Through Aesthetic Depictions in Line Graph Visualisations
Joel Pinney (✉), Fiona Carroll, and Esyin Chew
Cardiff School of Technologies, Cardiff Metropolitan University, Llandaff Campus, Western Avenue, Cardiff CF5 2YB, UK
{JPinney2,FCarroll,EChew}@cardiffmet.ac.uk
Abstract. The method of representing uncertainty can drastically influence a user’s interpretation of the visualised data. Whilst the reasons for the scarce adoption of uncertainty visualisations have been extensively researched, the exploration of further intuitive depiction methods has taken a back seat. Currently, most visualisation methods for uncertainty rely on the viewer grasping pre-existing techniques such as confidence intervals and error bars. This assumes that the intended audience is proficient in extracting the relevant information displayed. To help establish an accessible method for the visualisation of uncertainty, we adopt a novel cross-disciplinary approach to further understand and depict the more intuitive/affective dimensions of uncertainty. The field of aesthetics is mostly associated with the discipline of art and design, but it has been applied in this research to evaluate its effectiveness for uncertainty visualisation. In a recent study with 1,142 participants, the authors examined the influence of applying aesthetic dimensions to the visualisation of a line graph. We find that certain aesthetic renderings afford a higher degree of uncertainty and provide an intuitive approach to mapping uncertainty to the data. By analysing the participants’ responses to different aesthetic renderings, we aim to build a picture of how we might encourage the use of uncertainty visualisation for a lay audience.
Keywords: Uncertainty · Visualisation · Aesthetics
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 272–291, 2023. https://doi.org/10.1007/978-3-031-35132-7_20
1 Introduction
Visualisations offer an enhanced experience by presenting large quantities of data in an accessible and comprehensible format. However, when data is displayed in a visual form, users will often interpret it as an exact depiction of what the data shows, whilst in reality the data may take a multitude of directions which are not visualised. The scenario of only depicting one outcome is ubiquitous in data visualisations, with nearly all data sets containing a form of uncertainty [54]. Moreover, if the visualisation is attempting to depict a predictive data set, it is likely that only the highest probability outcome will be displayed. Similarly, data sets of other natures may contain uncertainties stemming from areas including data accuracy, completeness, lineage, credibility and interrelatedness [30]. Uncertainty as a whole is not a new concept, with areas such as psychology investigating how humans deal with the unknown [43]. Likewise, within the field of visualisation, there have been many papers reporting on the depiction of uncertain information in a range of fields [30,32,41,51]. Whilst the topic of depicting uncertainty has become an area of significance within the visualisation community, many still report on why visualisations do not accept and utilise uncertainty as an important, valuable dimension [5,20,48]. A key reason behind not adopting uncertainty visualisation is that the techniques used to represent uncertainty (such as data vagueness) have emerged from “hard science” and lack the expressiveness that is needed to connect to humanistic contexts [10]. In this paper, we investigate the use of intuitive visual depictions of uncertainty and contribute novel methods of portraying uncertain information in visualisations for a wider audience. The work focuses on the line graph (its principles may be applied to any line-based graph) and details the implementation of aesthetic renderings to create an intuitive visual depiction of uncertainty.
2 Related Work
There are two main segments of related work that inform this research: (a) the field of uncertainty visualisation, with a review of its current implementation and methods of depiction; and (b) the knowledge of how aesthetic renderings can afford a deeper and more intuitive comprehension of uncertainty visualisation.
3 Uncertainty Visualisation
The visualisation of uncertainty is not a new concept [1], and for many years researchers in the fields of cartography [34], geographical visualisation, scientific visualisation [15] and information visualisation have provided an analysis of multiple typologies [28]. Whilst the scientific community has long assumed that visualisation techniques enhance the communication of information, these theories have been contested when exploring uncertainty [25]. Therefore, we must consider the application of depicting uncertainty more than solely for scientific purposes and find suitable depiction methods that are accessible to a lay audience. Whilst there is no set method agreed upon among researchers for the optimum method of depicting uncertainty [24], there is a rough outline towards categorising general methods. Currently, research shows there are three main methods of depicting uncertainty in a visualisation: graphical annotations of distributional properties (i.e. error bars, box plots etc.), visual encodings of uncertainty (i.e. size, transparency etc.) and hybrid approaches (combination of
both distributional properties and visual encodings) [38]. Whilst the categories of depiction methods are clearly defined, the challenge remains as to which method best represents the uncertainty in data. Because of these challenges, uncertainty visualisation has gained a fairly negative impression in recent years, and many researchers have explored the reasons why. Early research by Pang et al. [40] acknowledged that the challenges and difficulties faced were simply focused on the attempt to understand uncertainty visualisation. More recently, a work titled “Why Authors Don’t Visualize Uncertainty” by Hullman [20] reviews why these challenges persist despite innovation. The findings detail how many authors are optimistic about depicting uncertainty but are impeded by challenges, including calculating/visualising uncertainty and explaining uncertainty to a viewer. Why depicting uncertainty is not a common practice has become a heavily researched question [5,27,39,48], and the findings show multiple reasons why uncertainty is not commonly expressed within data visualisations. Some have focused on attempting to raise awareness of uncertainty visualisation [53] and claim that the lack of widespread implementation may be one of the issues making it harder to include uncertainty in visualisations. The authors agree that further awareness is needed to educate a wider audience on the importance of uncertainty visualisations. Moreover, they explore and address these issues through the development of an intuitive approach: one that does not require an enhanced understanding of scientific representations or awareness but relies on the natural gut instinct of uncertainty created through the visual.
One of the most common methods of uncertainty visualisation has been presenting error bars to depict the distributional properties [54]. In certain scientific contexts (where the end users have a background understanding), error bars can be very valuable. However, researchers have shown critical pitfalls and misconceptions surrounding error bars, even for experts [3,29]. Furthermore, a paper by Correll and Gleicher [9] highlights how the traditional methods of depicting uncertainty (a bar chart with error bars) can often be misunderstood. Their work takes critical steps to redesign these methods by encoding the semiotics of visual displays of uncertain data in order to enhance the user’s performance. They state that “even if the viewer has no prior background” they would have an improved experience [9, p.2150]. The authors believe that similar measures can be taken to improve the depiction of uncertainty in line graphs through the adoption of art and design methods to enhance the aesthetics of the visual (the line).
This section has highlighted that one of the key challenges with uncertainty visualisation is the lack of clarity and understanding surrounding traditional depiction methods. Whilst researchers address mechanisms to enhance existing methods of error bars, the authors of this paper explore enhancing the visual encodings of uncertainty through the exploration of the aesthetic. That said, they acknowledge that the investigation of intuitive depictions of uncertainty through visual encodings is not a new notion, but they do feel there is room for further enhancement.
3.1 Visual Encoding of Uncertainty
Whilst many depictions of uncertainty revolve around error bars and confidence intervals, some research has begun to explore visual encodings. Traditionally, visual encodings of uncertainty have mainly focused on geospatial data, mapping uncertainty through methods including colour, transparency, texture, focus and contrast [17]. A study conducted by Gschwandtner et al. [17] explored how visual encodings can improve the visualisation of temporal uncertainty. Their study tested various types of plots (gradient plot, violin plot, error bars, etc.) to understand the influence each plot had when displayed with an array of questions. Measurements included the participant’s interpretation, timings, testing probabilities and user preference. Their findings support previous studies highlighting the confusion caused by error bars [3,29], despite participants having prior information on how to interpret them. Interestingly, their findings showed that the gradient plot (utilising blur) was best for statistical probability distributions. However, when participants were asked about their preference regarding the visualisation styles, gradient plots were ranked last [17]. No further indication was provided as to why their participants ranked the gradient plot as the least preferred when error bars had received such a negative review for causing confusion. Overall, their study concluded that ambiguation (using two colours from monochromatic methods) was ranked as the most preferred for uncertainty visualisation.
Researchers exploring intuitive/alternative methods of depicting uncertainty have also displayed a promising use of animation. Hypothetical outcome plots (HOPs) take random draws from distributions and animate them as a series of frames over time to display a moving depiction of the uncertainty [39]. By taking the concepts of error bars (to show a distribution) and mapping them to an animated form, Hullman et al. [21] display an effective technique to map uncertainty. However, it is important to consider that the addition of animation may not always be possible, such as for static visualisations, and therefore other visual encodings will be required. Fundamentally, the idea of using visual encodings to portray meaning relates closely to the concepts of semiotics. Semiotics is concerned with how symbols can be given meanings in order to represent a thing or a concept [35]. Similar principles have been applied by MacEachren et al. [36], who explored uncertainty visualisation through visual semiotics, focusing on components of information (space, time, and attribute). Their findings show how value, size, transparency and fuzziness were ranked as being highly intuitive methods of depiction. The use of visual semiotics continues to provide a visual metaphor for uncertainty through the use of colour saturation, blur and noise [39]. The authors look to further enhance the use of intuitive methods by building on existing research into visual encodings and semiotics and applying aesthetic techniques.
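The idea behind hypothetical outcome plots, as described above, can be sketched as a frame generator: each frame is one random draw of the whole line, and an animation would show the frames in turn. This is a minimal sketch under stated assumptions rather than the method of the cited work: each point is assumed to follow an independent normal distribution, and the name `hop_frames` and its parameters are hypothetical.

```python
import random
from typing import List


def hop_frames(means: List[float], sds: List[float],
               n_frames: int, seed: int = 0) -> List[List[float]]:
    """Return n_frames hypothetical lines, one random draw per data point.

    Assumption: point i is drawn independently from Normal(means[i], sds[i]).
    """
    if len(means) != len(sds):
        raise ValueError("means and sds must have the same length")
    rng = random.Random(seed)  # seeded so the same frames can be regenerated
    return [[rng.gauss(m, s) for m, s in zip(means, sds)]
            for _ in range(n_frames)]
```

Each returned frame can be drawn as an ordinary line graph; points with a larger standard deviation move more from frame to frame, which is what conveys the uncertainty to the viewer.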
4 Aesthetics
The term aesthetics derives from the Greek word for perception, aisthesis, and is often associated with the field of art and design, so much so that aesthetics is seen by some as the purpose of art and design. However, aesthetics is a complex field of study. It is a central topic within the field of philosophy, though, at the same time, it has strongly defined ties to other fields such as art and design [2], psychology [11,23], and now even computing [16]. As Shelley [46, p.1] notes, the term aesthetic “has come to be used to designate, among other things, a kind of object, a kind of judgment, a kind of attitude, a kind of experience, and a kind of value”. As a result, the complex nature of aesthetics has made it difficult to obtain a holistic understanding of the subject [13]. In 1974, Munro [37, p.329] asked the question, “should artists be taught aesthetics?”. More recently, Reid and Miller [42, p.319] emphasised that “A designer not only needs to be able to interpret these perceptions but to be the creator of perceptions - to be able to interpret visual language and develop their own language”.
4.1 The Aesthetic Visual Language
There are no absolute rules for the design of the aesthetic; however, there are tried and tested techniques that designers can use to achieve the required aesthetic effect. It is about understanding the visual language: in particular, the range of visual elements (line, shape, tone, colour, texture and form) and how they can be worked with the design principles (balance, scale, rhythm, movement, unity, emphasis, contrast and repetition) to achieve the desired aesthetic. Visual variables were first described by the French cartographer and professor Jacques Bertin in his 1967 book Semiologie Graphique [4]. Bertin identified seven main categories of visual variables: position, size, shape, value, color, orientation, and texture. Since then, these variables have been modified and further expanded. For example, Roth [44] describes twelve visual variables through which a map or other visualization can be varied to encode information. These include: “(1) location, (2) size, (3) shape, (4) orientation, (5) color hue, (6) color value, (7) texture, (8) color saturation, (9) arrangement, (10) crispness, (11) resolution, and (12) transparency” [44, p.2]. In terms of the digital canvas, the Online Visual Aesthetics Theory Model proposes that the perception of the composition of an online medium is directly influenced by eight web design categories: Graphics, Text, Simplicity, Animation, Layout, Unity, Emphasis, and Balance [33]. “The variance of each of these web design categories may then be explained by the more traditional seven primary design dimensions: Lines, Shapes, Colors, Textures, Forms, Values, and Spaces” [33, p.1]. In detail, to create the desired aesthetic effect, the designer needs to work with the fundamental elements of visual design, arranging them according to the principles of design [47].
Indeed, an intended aesthetic can be achieved in a myriad of ways; however, understanding the basic visual elements and how they can be visually transformed with the design principles is a good starting point. In his book Envisioning Information, Tufte [49, p.88] explores
“how colour’s inherently multidimensional quality can be used to express multidimensional information”. Colour is a design element that can provide visual and psychological information; it can also be used to control and influence viewers’ responses and reactions [14]. The relationship between colour, other design elements and principles can produce many different energy levels and feelings. Often, the more considered the working of design elements and principles, the greater the aesthetic impact. In summary, from a digital perspective, “aesthetics is not an abstract concept but a process in which we examine a number of elements such as lighting and picture composition and our perceptual reactions to them” [56, p.4]. As a premise for attracting users’ attention, visual aesthetics has been identified as playing a crucial role in product design and marketing [18]. Furthermore, Chaouali et al. [7, p.1525] highlight that “design aesthetics positively influence perceived usefulness and trust”. In their paper, they demonstrate how aesthetics positively affects the adoption and recommendation intentions of mobile banking applications [7].
4.2 Aesthetics in Visualisation
Data visualisations are often the interface where complicated problems are presented to decision-makers. It is exactly as Ware [52, p.9] describes: “On the one hand, we have the human visual system, a flexible pattern finder, coupled with an adaptive decision-making mechanism. On the other hand are the computational power and vast information resources of the computer and the World Wide Web”. He highlights that visualisations are the interface between the two and that it is important to improve these interfaces [52]. Studies have shown that aesthetics has the potential to improve these interfaces by contributing to the comprehension of data visualisations [19,22,26,50]. Moreover, research [55] shows that some decision makers arrive at an idea or a decision not by analytically inferring the solution but by sensing the correct solution. This aligns with aesthetics, which can be strategically designed to afford sensory interactions; however, part of the challenge for data visualisation lies in how the aesthetic is defined and understood. In his paper, Chen [8] notes that it is important to understand how insights and aesthetics interact, and how these two goals could sustain insightful and visually appealing information visualisation. The question still lies in how we actually engineer aesthetics into data visualisation (e.g. how do we apply the visual language to create the desired aesthetic effect?). In their paper, Li and Xu [31] discuss a theoretical framework for integrating the aesthetic into data visualisation to enhance users’ emotional engagement and, in doing so, provide guidance for data visualisation design. In terms of visualising uncertainty, Boukhelifa et al. [6] share design considerations on how to deploy visual variables (blur, dashing, grayscale and sketchiness) to effectively depict various levels of uncertainty for line marks.
5 Study
5.1 Introduction
This study aims to evaluate new methods of depicting uncertainty in visualisations for a lay audience. The study involved evaluating the influence of aesthetic dimensions in the context of visualising uncertainty. In order to portray the aesthetic, the authors utilised known elements of visual design (proposed by Bertin [4]) and paired them with design principles; specifically, a pre-selected design element (line) was combined with a design principle (emphasis). The design element line was selected as it is the visual variable most representative of line graphs. The design principle emphasis was selected as it can best intensify the design element and also further the concepts presented in existing research utilising these principles (or similar). The following sections detail the method of distributing the questionnaire, the selection of the participants and the design of the study.
5.2 Distribution
As discussed, the purpose of this study was to collect sufficient data to inform how aesthetics can be used to depict uncertainty for a wider population. Therefore, it was important that the process of distributing the questionnaire involved random sampling, which has the benefit of providing the best representation of a population [45]. To ensure that participant selection was not biased and that a large participant range could be obtained, the authors opted for a third-party distribution approach. The authors worked with Dynata, described as the world’s largest first-party data company, to disseminate the questionnaire throughout the United Kingdom [12]. Using Dynata’s extensive database of appropriate test participants, the questionnaire was released and live to participants from 11 August 2021 to 14 August 2021. The only constraint on participant selection was the requirement of being over the age of 18 years. Keeping this as the only constraint was intended to ensure a diverse and unbiased sample.
5.3 Participants
One thousand one hundred and forty-two participants were involved in the study. Of these, 604 were female, 532 male, 3 non-binary and 3 preferred not to say. Participants were all aged over 18 years (18–24 = 110, 25–34 = 231, 35–44 = 230, 45–54 = 216, 55–64 = 160, 65–74 = 162, and 75+ = 33) and all resided in the United Kingdom (across all four countries). Participants had a range of experience with using data on a regular basis: 484 had previous experience, 510 had no previous experience and 148 were unsure. Participants’ educational backgrounds also varied: 496 participants had a BSc (level 5) degree and/or higher, 591 had GCSE-level 4 qualifications and 55 had no educational background or selected other (i.e., non-accredited awards).
5.4 Method
Fig. 1. Applying emphasis to all four lines (straight, dashed, hand-drawn, and zigzag): Displaying all four stages of the study. (1) Participant selects hand-drawn as most uncertain, (2) selects hand-drawn with 18pt emphasis, (3) changes the decision to the straight line with 18pt emphasis, (4) participant asked “What made you change your decision from line A to line B?”
This study involved a questionnaire containing both qualitative and quantitative questions. The purpose of the questionnaire was to evaluate the influence of the aesthetic dimensions in depicting intuitive visualisations of uncertainty in a line graph. Participants were provided with a scenario to give context to their decisions. The scenario stated, “You are a company director who needs to make a decision to buy a new product for your business. You view a set of data visualisations to help make your decision. The visualisations show the Sales Vs Profit.” Following the information and demographic questions, the questionnaire proceeded in four distinct phases. (Phase 1) The first phase prompted the participant to rank the four types of lines (straight, dashed, hand-drawn and zig-zag) from the most uncertain to the least uncertain, assigning a numerical value to each line where 1 is the most uncertain and 4 is the least uncertain. (Phase 2) Once a participant had selected their most uncertain line (ranked 1st), they were shown that line with various degrees of emphasis applied (shown in separate questions). For example, the participant’s most uncertain ranked line was displayed with the four degrees of emphasis (5pt, 8pt, 11pt, and 18pt). The degrees of emphasis were selected to display a range of intensities from minimal enhancement to maximum enhancement. It was decided that the 18pt emphasis would be at the top end of what
could realistically be replicated in day-to-day visualisations, and the 5pt emphasis was at the lower end of what could be easily visualised (i.e. without being distorted). At this stage, the aim was to understand the type of line and associated level of emphasis that the participant deemed to be the most uncertain. (Phase 3) Participants were shown the same degree of emphasis but across all four lines. For example, if a participant selected the dashed line with 18pt emphasis as the most uncertain, they were then shown all of the lines (straight, dashed, hand-drawn and zig-zag) with the 18pt emphasis applied. This stage was intended to reinforce the participant’s original decision or determine whether the aesthetic (addition of emphasis) influenced it, and to understand whether the other styled lines prompted a change from one line to another. (Phase 4) Once a participant had made their selection, they were presented with one of two questions. Question (4.1), “Why do you feel this line is most uncertain?”, was shown if the participant remained with their original decision. Question (4.2), “What made you change your decision from line A to line B?”, was shown if the participant had changed from their original decision. Overall, the questionnaire contained over two thousand questions but was designed so that a participant’s previous responses determined which questions were asked next. Participants were shown a maximum of 40 questions in total.
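The branching between the four phases can be illustrated with a short sketch. This is not the authors' questionnaire software: the `Session` class and the three function names are hypothetical, and substituting the line names into question 4.2 is an assumption made for readability. Only the line types, the emphasis levels and the question wording come from the text.

```python
from dataclasses import dataclass

LINES = ("straight", "dashed", "hand-drawn", "zig-zag")
EMPHASIS_PT = (5, 8, 11, 18)


@dataclass
class Session:
    """One participant's choices (hypothetical structure)."""
    phase1_line: str      # line ranked 1 (most uncertain) in Phase 1
    phase2_emphasis: int  # emphasis chosen for that line in Phase 2
    phase3_line: str      # line chosen in Phase 3 at that emphasis


def phase2_options(phase1_line):
    """Phase 2: the Phase 1 line shown at every degree of emphasis."""
    return [(phase1_line, pt) for pt in EMPHASIS_PT]


def phase3_options(phase2_emphasis):
    """Phase 3: every line shown at the emphasis chosen in Phase 2."""
    return [(line, phase2_emphasis) for line in LINES]


def phase4_question(session):
    """Phase 4: question 4.1 if the choice held, otherwise question 4.2."""
    if session.phase3_line == session.phase1_line:
        return "Why do you feel this line is most uncertain?"
    return ("What made you change your decision from "
            f"{session.phase1_line} to {session.phase3_line}?")


# The path illustrated in Fig. 1: hand-drawn, 18pt, then a switch to straight.
example = Session("hand-drawn", 18, "straight")
print(phase4_question(example))
```

Because each phase only depends on the answer to the previous one, a participant sees a small fraction of the full question bank, which is consistent with the cap of 40 questions described above.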
6 Results
The findings demonstrate the influence that the aesthetic had on a participant’s decision about which line was the most uncertain. We show how the addition of emphasis to create the aesthetic can be a contributing factor in eliciting further uncertainty. As discussed in the methods section, the questionnaire was structured into four distinct phases of questioning. The results are presented in these four phases: (Phase 1) which line was the most uncertain (no added emphasis); (Phase 2) which line was the most uncertain with varying levels of emphasis; (Phase 3) which line was the most uncertain when all of the lines are presented with the same degree of emphasis as selected in phase 2; and (Phase 4) the qualitative reasons participants gave for remaining with or changing their decision on which line was the most uncertain.
6.1 Phase 1 - Ranking the Lines from Most to Least Uncertain
The findings show that when participants were exposed to all types of lines (straight, dashed, hand-drawn and zig-zag), the zig-zag line was selected as the most uncertain with 40% of the votes. This was followed by the straight line with 26%, while the dashed and hand-drawn lines trailed with 17% each. Table 1 displays the total number of participants selecting each rank (1 most uncertain, 4 least uncertain) for all types of lines.

Table 1. Participants’ decisions on which line was the most uncertain (ranked 1-most uncertain, 4-least uncertain).

| Type of line | Rank 1 | Rank 2 | Rank 3 | Rank 4 |
| --- | --- | --- | --- | --- |
| Straight | 302 | 147 | 209 | 484 |
| Dashed | 195 | 374 | 397 | 176 |
| Hand-Drawn | 189 | 387 | 384 | 182 |
| Zig-Zag | 456 | 234 | 152 | 300 |

Analysing the data presented in Table 2 highlights the level of variation between participants’ decisions. Whilst a sizeable share of participants selected the straight line as the most uncertain (mean 2.65, SD 1.24), similar variance was seen in the zig-zag (mean 2.09, SD 1.16). Interestingly, participants’ responses to the dashed and hand-drawn lines were almost identical: both lines show a mean of 2.34, a standard deviation of 0.86 and a median of 2. The only distinguishing factor is the mode, 2 (hand-drawn) versus 3 (dashed), which indicates that the hand-drawn line was perceived as slightly more uncertain than the dashed.

Table 2. Participants’ ranking of which line is the most uncertain: mean, median, mode and SD.

| Type of line | Mean | Median | Mode | SD |
| --- | --- | --- | --- | --- |
| Straight | 2.65 | 3 | 4 | 1.24 |
| Dashed | 2.34 | 2 | 3 | 0.86 |
| Hand-Drawn | 2.34 | 2 | 2 | 0.86 |
| Zig-Zag | 2.09 | 2 | 1 | 1.16 |
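The Phase 1 percentages quoted in the text can be checked against the Rank 1 counts with a few lines of arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes the zig-zag Rank 1 count is 456, the value consistent with the reported 40% share and with a total of 1,142 participants.

```python
# Rank 1 ("most uncertain") counts per line type, read from Table 1.
# Assumption: the zig-zag count is 456, the value consistent with the
# 40% share reported in the text.
rank1_counts = {
    "Straight": 302,
    "Dashed": 195,
    "Hand-Drawn": 189,
    "Zig-Zag": 456,
}


def rank1_shares(counts):
    """Return each line's share of Rank 1 votes as a whole-number percentage."""
    total = sum(counts.values())
    return {line: round(100 * n / total) for line, n in counts.items()}


shares = rank1_shares(rank1_counts)
print(sum(rank1_counts.values()))  # 1142
print(shares)  # {'Straight': 26, 'Dashed': 17, 'Hand-Drawn': 17, 'Zig-Zag': 40}
```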
Interestingly, of those participants who reported having no experience with data (510 participants), 51% went on to select the zig-zag line as the most uncertain, compared with only 35% of those who did have experience with data. Overall, in every category of experience with data (from definitely no experience to definitely having experience), the zig-zag line scored the highest (being the most uncertain line).
6.2 Phase 2 - Applying the Design Principles
Having established which line elicits the most uncertainty, phase two focuses on applying the design principle of emphasis to the participant’s most uncertain line.
Applying Emphasis to Phase 1 Selected Line. Table 3 shows that when participants were first exposed to various degrees of emphasis (5pt, 8pt, 11pt, and 18pt), the 18pt emphasis was most often selected as the most uncertain: 59% of all participants selected their chosen line (from phase 1) with the 18pt emphasis as the most uncertain. This was followed by 20% of participants selecting the 5pt, 11% the 8pt and 10% the 11pt emphasis with their originally selected line.

Table 3. Number of participants selecting each level of emphasis per type of line and the percentage of participants overall (percentages rounded to the nearest whole number).

| Line | Emphasis | # Participants | % of participants |
| --- | --- | --- | --- |
| Straight | 5pt | 79 | 7% |
| Straight | 8pt | 43 | 4% |
| Straight | 11pt | 23 | 2% |
| Straight | 18pt | 157 | 14% |
| Dashed | 5pt | 43 | 4% |
| Dashed | 8pt | 40 | 3% |
| Dashed | 11pt | 26 | 2% |
| Dashed | 18pt | 86 | 8% |
| Hand-drawn | 5pt | 54 | 5% |
| Hand-drawn | 8pt | 22 | 2% |
| Hand-drawn | 11pt | 16 | 1% |
| Hand-drawn | 18pt | 97 | 8% |
| Zig-zag | 5pt | 50 | 4% |
| Zig-zag | 8pt | 24 | 2% |
| Zig-zag | 11pt | 52 | 5% |
| Zig-zag | 18pt | 330 | 29% |

6.3 Phase 3 - Reinforcing Participants’ Decisions
In phase 3 the authors have captured the line that the participant deemed the most uncertain and the degree of the design principle (shown in Fig. 1) which, when applied, was the most uncertain. Phase 3 details the participants’ responses when they are again exposed to all lines, this time with the same degree of emphasis (that chosen in phase 2), to reinforce their decision. In particular, the authors are interested to see whether participants persist with their original decision from phase 2 or whether the emphasis leads them to change to a new line. Phase 3 therefore documents the changes in participants’ decisions on which line is the most uncertain relative to phase 2.
Applying Same Degrees of Emphasis to all Lines. When exposed to the same degree of emphasis on all the lines, many participants’ decisions on which aesthetically enhanced line was the most uncertain were influenced. As seen in Fig. 2, when participants were exposed to all the lines with the same degree of emphasis, the number of participants selecting the straight line decreased by 45%. The most extensive change was seen among those who had selected the 18pt emphasis, where the number of participants fell from 157 to 47 (a 70% decrease). Whilst the straight line showed a statistically significant drop in the number of participants, the dashed line also decreased by 26% overall. Both the hand-drawn line (+3%) and the zig-zag line (+40%) saw an increase in the number of participants selecting them as the most uncertain. The largest increase was seen in the zig-zag line with the 18pt emphasis, which gained 101 participants, rising from 330 to 431.
Fig. 2. Before and after exposure to all lines with the same degree of emphasis (Straight line)
Fig. 3. Before and after exposure to all lines with the same degree of emphasis (Zig-Zag)

Table 4 displays the overall changes within each category when participants were exposed to all the lines with the same degree of emphasis. As shown, the greatest variation is seen for the straight line. Although the straight line stood out as being uncertain in the initial phases, when exposed to the other lines with the added emphasis participants’ decisions were swayed (i.e., they changed their original decisions). Interestingly, whilst there was a 45% decrease in the total number of participants selecting the straight line, only 68 participants who had originally selected the straight line remained with it; the other 234 participants (77%) selected other lines. An additional 97 participants selected the straight line after their exposure to all lines in phase 3. Of the 234 participants who changed their decision from the straight line, 34 participants (14%) changed to the dashed line, 156 participants (67%) changed to the zig-zag line and 44 participants (19%) changed to the hand-drawn line. As shown, the straight line had the largest variation between phases and spread between degrees of emphasis. By contrast, the hand-drawn line saw little variation, both between phases and between the number of participants who selected each degree of emphasis (t = 0.0378).
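The straight-line figures reported above fit together arithmetically, which the following sketch makes explicit. The variable and function names are hypothetical; the counts are those quoted in the text (302 participants selecting the straight line before phase 3, 68 staying, 97 joining, and 157 falling to 47 at 18pt).

```python
def percent_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return 100 * (after - before) / before


straight_before = 302               # selected the straight line before phase 3
stayed = 68                         # kept the straight line in phase 3
joined = 97                         # moved to the straight line in phase 3
straight_after = stayed + joined    # 165 selections after phase 3
left = straight_before - stayed     # 234 participants moved away

net_change = percent_change(straight_before, straight_after)  # about -45%
left_share = 100 * left / straight_before                     # about 77%
change_18pt = percent_change(157, 47)                         # about -70%

# Destinations of the 234 participants who left the straight line.
destinations = {"dashed": 34, "zig-zag": 156, "hand-drawn": 44}
assert sum(destinations.values()) == left

print(round(net_change), round(left_share), round(change_18pt))  # -45 77 -70
```

The net decrease (45%) is smaller than the share of participants who left the straight line (77%) because the 97 participants who switched to it partly offset the 234 who switched away.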
7 Phase 4 - Reasons for Participants’ Decisions
Phase 4 focuses on the qualitative aspect of the study, where the authors analyse the data to understand the key reasons why participants either remained with their original decision or changed to a different line. They investigate the participants’ rationale for altering their decision to determine whether the inclusion of the aesthetic dimension was a contributing factor. Moreover, the authors aim to understand the feelings of participants when proceeding from phase 2 (selected line with design principle) to phase 3 (all lines with the same degree of design principle) of the study.
7.1 Reasons Why Emphasis was Uncertain
Table 4. Change in participants’ decisions between phase 2 and phase 3 with t-test score when applying emphasis.

| Line | Emphasis | % Increase |
| --- | --- | --- |
| Straight | 5pt | –33% |
| Straight | 8pt | –16% |
| Straight | 11pt | +26% |
| Straight | 18pt | –70% |
| Dashed | 5pt | –7% |
| Dashed | 8pt | –45% |
| Dashed | 11pt | –23% |
| Dashed | 18pt | –28% |
| Hand-drawn | 5pt | –35% |
| Hand-drawn | 8pt | –14% |
| Hand-drawn | 11pt | –31% |
| Hand-drawn | 18pt | +33% |
| Zig-zag | 5pt | +98% |
| Zig-zag | 8pt | +167% |
| Zig-zag | 11pt | +10% |
| Zig-zag | 18pt | +31% |

As shown in phase 3, when participants were exposed to all of the lines with the same degrees of emphasis, the straight line with 18pt emphasis saw a significant decrease in participant numbers. The participants’ reasoning for switching from the straight line with 18pt emphasis to the zig-zag with 18pt emphasis included: “The erratic movements” [P.311], “The line is not uniform” [P.898], “not a steady increase” [P.1075] and “fluctuations” [P.336]. Overall, the main terms used to describe the reasons for changing from the straight to the zig-zag line included words like zig-zag and up-and-down. Curiously, very few participants acknowledged the addition of emphasis; most focused on the uncertainty introduced through the line itself (primarily the design element). In
other words, they were describing the line as opposed to the emphasis on the line. This is like a painting, where the viewer would generally describe the objects in the painting as opposed to what made the object (i.e., the brushstrokes used to put emphasis on the object). What is of real value here is how successfully the emphasis (brushstrokes) presents the line (object) and engages the participant in the uncertainty. Similar results were seen across the other levels of emphasis (5pt, 8pt, and 11pt) when participants changed from the straight line to the zig-zag, with the line itself stated as the primary reason. Whilst ‘up-and-down’ was the term most commonly used to describe the main cause of the uncertainty, a few participants did acknowledge the fuzzy and shaded area that the emphasis added and recognised it as the reason for the uncertainty. In detail, one participant categorised the emphasis (18pt) as the “wider the error bars, we have to assume the shading is the min/max range of error on each point” [P.1010]. Similar results can be seen when analysing participants’ reasons for why the dashed and hand-drawn lines were the most uncertain. Of the participants who selected the dashed line with 18pt emphasis (and remained with this decision in phase 3), the majority stated that the gaps between the lines were the main reason for uncertainty. Likewise, the participants who selected hand-drawn lines, at all levels of emphasis, mainly focused on the lines and their unpredictability. One participant stated that the hand-drawn line “deviates more unnaturally and less consistently compared to the others which have either a single trend or a recurring pattern” [P.70].

Fig. 4. Word cloud of all participants who selected the zig-zag line with 18pt emphasis
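A word cloud such as the one in Fig. 4 is driven by term frequencies in the free-text answers. The sketch below shows one way such counts could be produced, using only the four responses quoted in this section. The tokenisation rule and the stop-word list are assumptions for illustration and are not the authors' procedure.

```python
import re
from collections import Counter

# Responses quoted in the text (participants 311, 898, 1075 and 336).
responses = [
    "The erratic movements",
    "The line is not uniform",
    "not a steady increase",
    "fluctuations",
]

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "is", "a", "not"}


def term_frequencies(texts, stop_words=STOP_WORDS):
    """Count lower-cased word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z\-]+", text.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts


frequencies = term_frequencies(responses)
print(frequencies.most_common())
```

With the full set of phase 4 responses, the most frequent terms would be drawn largest in the cloud.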
8 Discussion
In this study, we have investigated the impact of aesthetic rendering on the depiction of uncertainty. In particular, we have focused on how aesthetics may be used to enhance the visualisation of uncertainty in the line graph. By combining the design element (line) with the design principle (emphasis), the authors explored the influence of different combinations of these on the user’s perception of what is the most uncertain. Our results have shown that a user’s perception of what is most uncertain can be heavily influenced by the inclusion of design principles. For example, despite many participants selecting the straight line as the most uncertain without design principles applied, once emphasis was applied the straight line was no longer seen as the most uncertain; in fact, it ranked 3rd. When exploring phase 4 and the reasons participants selected the zig-zag (18pt emphasis) over the straight line, we were surprised to find very little mention of the emphasis. Participants focused mainly on qualities or traits inherent in the line (i.e., the up-and-down movement of a zig-zag line). However, this raises the question of why participants had not selected that line in the first instance, particularly if the design element line was the contributing factor in their decision. The authors believe that the inclusion of the design principle emphasis was a contributing factor in enhancing the uncertain qualities of the zig-zag line, despite it not being raised as the sole influencing factor in the participant’s decision. In other words, the emphasis increased the feelings of uncertainty associated with the zig-zag line relative to phase 1, where participants were presented with the zig-zag line with no applied emphasis (design principles).
9 Conclusion
This research has shown how the inclusion of design principles (emphasis) can enhance a participant’s feelings towards a design element (line). When participants were asked in phase 1 to select the line that they felt was most uncertain, their decisions were based solely on the uncertainty emanating from the design element. However, when introduced to aesthetic renderings of emphasis (phase 2), the majority of participants were swayed (they changed their decisions). The findings show that, regardless of the initial decision in phase 1, when exposed to all lines with the added design principle (emphasis), many participants’ decisions were influenced. However, when asked why they changed their decisions, nearly all participants alluded to qualities of the lines as the main factor, even though those same qualities were already present when they made their initial decisions in phase 1. From this, the authors conclude that the inclusion of the design principle enhanced the lines in a way that allowed participants to see and sense them in a different light. Moreover, this provoked in them an alternative response and set of feelings which were not present in their initial decisions in phase 1 of the study.
9.1 Future Work and Limitations
This research sets the foundations for future studies exploring more intuitive approaches to the design of uncertainty in visualisations. This research has focused on the design element line paired with the design principle emphasis to render the aesthetic. Moving forward, the authors feel there is potential to explore additional design elements and principles and to evaluate their influence on people’s decisions around data uncertainty. This includes exploring how combinations of elements and principles such as harmony, space, form, balance, and contrast may provide additional dimensions and contributions to the interpretation and feelings of visual encodings for uncertain data in visualisations. The authors also feel that there is scope to further test how aesthetics can be applied to visualisations to help understand data uncertainty more deeply for real-world problems and scenarios, for example by applying those aesthetic depictions that were ranked as the most uncertain and evaluating the user’s performance, decision quality, speed of decision and intuitiveness. Moreover, further studies could probe why the design principle was never raised as a contributing factor to the participant’s decision, despite clearly having an influence as participant numbers increased in phase 3.

This study was conducted as part of a wider research project which evaluates the influence of different design elements and principles on the design of uncertainty for data visualisations. To date, the design elements of line, colour and texture have been paired with emphasis, scale, and movement. The next steps for this research will be the development of a dynamic and interactive uncertainty visualisation framework to support designers and developers in their quest to apply aesthetics to enhance the visualisation of uncertain data. This framework will provide the key components necessary to understand and visualise for the senses/intuition to enhance the visualisation of data uncertainty. The output of all design principles and paired elements will be ranked (most to least uncertain) to provide designers with the optimum depiction for their required levels of uncertainty (i.e., high/low uncertainty).

The authors acknowledge that intuitive depictions of uncertainty may not be suitable for all types of visualisations. For example, when visualising precise data that require rigorous standards (e.g. medical visualisations), the authors would not encourage the sole use of intuitive depictions. For a case like this, the true challenge would lie in aligning the aesthetic and the science to afford a combination of cognitive and intuitive decision-making processes. Additionally, certain line tests may be seen to be influenced by variations in the data (e.g., a zig-zag line showing an increase followed by a decrease). However, the authors felt it was important to include this line as it can show a static presentation of HOPs (Hypothetical Outcome Plots), where the top of the zig-zag line represents the upper quartile and the bottom the lower quartile.

Acknowledgments. Supported by Knowledge Economy Skills Scholarships 2 (KESS2) which is an All Wales higher-level skills initiative led by Bangor University on behalf of the HE sectors in Wales. It is part-funded by the Welsh Government’s European Social Fund (ESF) competitiveness programme for East Wales.
References 1. Argote, L.: Input uncertainty and organizational coordination in hospital emergency units. Adm. Sci. Q. 27(3), 420–434 (1982). http://www.jstor.org/stable/ 2392320 2. Arnheim, R.: Visual Thinking. University of California Press (1969) 3. Belia, S., Fidler, F., Williams, J., Cumming, G.: Researchers misunderstand confidence intervals and standard error bars. Psychol. Methods 10(4), 389–396 (2014) 4. Bertin, J.: Semiology of Graphics: Diagrams, Networks. ESRI Press, Redlands (1967) 5. Bonneau, G.P., et al.: Overview and state-of-the-art of uncertainty visualization. Math. Visual. 37, 3–27 (2014) 6. Boukhelifa, N., Bezerianos, A., Isenberg, T., Fekete, J.D.: Evaluating sketchiness as a visual variable for the depiction of qualitative uncertainty. IEEE Trans. Visual. Comput. Graphics 18, 2769–2778 (2012) 7. Chaouali, W., Yahia, I.B., Lunardo, R., Triki, A.: Reconsidering the ”what is beautiful is good” effect: when and how design aesthetics affect intentions towards mobile banking applications. Int. J. Bank Mark. 37 (2019) 8. Chen, C.: Top 10 unsolved information visualization problems. IEEE Comput. Graphics Appl. 25, 12–6(2005) 9. Correll, M., Gleicher, M.: Error bars considered harmful: exploring alternate encodings for mean and error. IEEE Trans. Vis. Comput. Graph. 20(12), 2142–2151 (2014) 10. Davis, S.B., Vane, O., Kr¨ autli, F., Davis, S.B.: Can I believe what I see?? Data visualization and trust in the humanities the humanities. Interdisc. Sci. Rev. 46(4), 522–546 (2021) 11. Dewey, J.: Art as Experience. Perigee Books, New York (1932)
Enhancing Visual Encodings of Uncertainty
289
12. Dynata: The worlds largest first-party data platform. https://www.dynata.com/l. Accessed 13 Mar 2013 13. Egan, A.: Understanding aesthetics in design education. In: Proceedings of the 23rd International Conference on Engineering and Product Design Education, PDE 2021), VIA Design, VIA University in Herning, Denmark. 9th–10th September 2021 (2021) 14. Feisner, E.A.: Colour. Laurence King Publishing Ltd. London (2000) 15. Fischhoff, B., Davis, A.L.: Communicating scientific uncertainty. In: Proceedings of the National Academy of Sciences of the United States of America, vol. 111, pp. 13664–13671 (September 2014) 16. Fishwick, P.: Aesthetic Computing. MIT Press, Cambridge (2008) 17. Gschwandtnei, T., B¨ ogl, M., Federico, P., Miksch, S.: Visual encodings of temporal uncertainty: a comparative user study. IEEE Trans. Visual. Comput. Graphics 22, 539–548 (2016) 18. Guo, F., Li, M., Hu, M., Li, F., Lin, B.: Distinguishing and quantifying the visual aesthetics of a product: An integrated approach of eye-tracking and EEG. Int. J. Indust. Ergono. 71 (2019) 19. Hohl, M.: From abstract to actual: art and designer-like enquiries into data visualisation. Kybernetes 40 (2011) 20. Hullman, J.: Why authors don’t visualize uncertainty. IEEE Trans. Visual Comput. Graphics 26(1), 130–139 (2019) 21. Hullman, J., Resnick, P., Adar, E.: Hypothetical outcome plots outperform error bars and violin plots for inferences about reliability of variable ordering. PLoS ONE 10(11), 1–25 (2015) 22. Hullman, J.R.: Framing artistic visualization: aesthetic object as evidence. creativity and cognition. In: 2009 Workshop on Understanding the Creative Act (2009) 23. Huston, J.P., Nadal, M., Mora, F., Agnati, L.F., Conde, C.J.C.: Art, Aesthetics, and the Brain. Oxford Scholarship Online, Oxford (2015) 24. Jena, A., Engelke, U., Dwyer, T., Raiamanickam, V., Paris, C.: Uncertainty Visualisation : an Interactive Visual Survey. In: 2020 IEEE Pacific Visualization Symposium (PacificVis), pp. 
201–205 (2020) 25. Joslyn, S., Savelli, S.: Visualizing uncertainty for non-expert end users?: the challenge of the deterministic construal error. Front. Comput. Sci. 2(January), 1–12 (2021) 26. Judelman, G.: Aesthetics and inspiration for visualization design: Bridging the gap between art and science.. In: Proceedings. Eighth International Conference on Information Visualisation, 2004. IV 2004 (2004) 27. Kamal, A., et al.: Recent advances and challenges in uncertainty visualization: a survey. J. Visualization 24(5), 861–890 (2021). https://doi.org/10.1007/s12650021-00755-1 28. Kinkeldey, C., Maceachren, A.M., Schiewe, J., Kinkeldey, C., Maceachren, A.M., Schiewe, J.: How to assess visual communication of uncertainty?? a systematic review of geospatial uncertainty Visualisation User Studies. Cartogr. J. 51(4), 372– 386 (2014) 29. Lanzante, J.: A cautionary note on the use of error bars. J. Clim. 17(17), 3699–3703 (2005) 30. Levontin, P., Walton, J.L., Aufegger, L., Barons, M.J.: Visualising Uncertainty : A Short Introduction. No. January, AU4DM, London (2020) 31. Li, Q., Xu, C.: A new design framework of the aesthetic data visualization (2019)
290
J. Pinney et al.
32. Liu, L., Padilla, L., Creem-Regehr, S.H., House, D.H.: Visualizing uncertain tropical cyclone predictions using representative samples from ensembles of forecast tracks. IEEE Trans. Visual Comput. Graphics 25(1), 882–891 (2019) 33. Longstreet, P., Valacich, J., Wells, J.: Towards an understanding of online visual aesthetics: an instantiation of the composition perspective. Technol. Soc. 65 (2021) 34. MacEachren, A., Robinson, A., Hopper, S., Gardner, S., Murray, R., Gahegan, M., Hetzler, E.: Visualizing geospatial information uncertainty: What we know and what we need to know. Cartog. Geogr. Inf. Sci. 32, 139–160 (2005) 35. Maceachren, A.M.: How Maps Work, 1st edn. The Guildford Press, New York (1995) 36. MacEachren, A.M., Roth, R.E., O’Brien, J., Li, B., Swingley, D., Gahegan, M.: Visual semiotics amp; uncertainty visualization: an empirical study. IEEE Trans. Visual Comput. Graphics 18(12), 2496–2505 (2012) 37. Munro, T.: Aesthetics and the artist”. Leonardo 7 (1974) 38. Padilla, L., Kay, M., Hullman, J.: Uncertainty Visualizations. J. Cogn. Eng. Decis. Mak. 6(1), 30–56 (2020) 39. Padilla, L.M., Powell, M., Kay, M., Hullman, J.: Uncertain about uncertainty: how qualitative expressions of forecaster confidence impact decision-making with uncertainty visualizations. Front. Psychol. 11, 1–23 (2021) 40. Pang, A.T., Wittenbrink, C.M., Lodha, S.K.: Approaches to uncertainty visualization. Visual Comput. 13, 370–390 (1997) 41. Potter, K.C., Gerber, S., Anderson, E.W.: Visualization of uncertainty without a mean. IEEE Comput. Graphics Appl. 33, 75–79 (2013) 42. Reid, A., Miller, M.: Why is aesthetic awareness important for design students. In: Research and Development in Higher Education: Higher Education in a Changing World (2005) 43. Rettie, H., Daniels, J.: Supplemental material for coping and tolerance of uncertainty: predictors and mediators of mental health during the Covid-19 pandemic. Am. Psychol. 76, 427–437 (2021) 44. Roth, R.E.: Visual variables. 
In: Richardson, D., Castree, N., Goodchild, M.F., Kobayashki, A., Liu, W., Marston, R.A. (eds.) The International Encyclopedia of Geography, pp.1–11. Wiley (2017) 45. Sharma, G.: Pros and cons of different sampling techniques. Int. J. Appl. Res. 3(7), 749–752 (2017) 46. Shelley, J.: The Concept of the Aesthetic. Stanford Encyclopedia of Philosophy (2015) 47. Siang, T.Y.: The Building Blocks of Visual Design— Interaction Design Foundation. The MIT Press (2021) 48. Skeels, M., Lee, B., Smith, G., Robertson, G.: Revealing uncertainty for information visualization. In: Proceedings of the Workshop on Advanced Visual Interfaces AVI, vol. 9, pp. 70–81 (2010) 49. Tufte, E.R.: Envisioning Information. Graphics Press, Cheshire (1990) 50. Vi´egas, F.B., Wattenberg, M.: Artistic data visualization: beyond visual analytics. In: Schuler, D. (ed.) OCSC 2007. LNCS, vol. 4564, pp. 182–191. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73257-0 21 51. Vosough, Z., Kammer, D., Keck, M., Groh, R.: Visualization approaches for understanding uncertainty in flow diagrams. J. Comput. Lang. 52(April), 44–54 (2019) 52. Ware, C.: Information Visualization Perception for Design, 2nd edn.The Morgan Kaufmann Series. Morgan Kaufmann (2004) 53. Weiskopf, D.: Uncertainty visualization: Concepts, methods, and applications in biological data visualization. Front. Bioinform. 2, 1–17 (2022)
54. Wilke, C.O.: Fundamentals of Data Visualization, 1st edn. O’Reilly Media, Sebastopol (2019)
55. Zander, T., Öllinger, M., Volz, K.G.: Intuition and insight: two processes that build on each other or fundamentally differ? Front. Psychol. 14 (2016)
56. Zettl, H.: Sight, Sound, Motion: Applied Media Aesthetics, 6th edn. Wadsworth Publishing Company (2014)
Satisfaction Analysis of Group/Individual Tutoring Schools and Video Tutoring Schools

Hiroyo Sano1(B) and Yumi Asahi2

1 Graduate School of Management, Tokyo University of Science, Tokyo, Japan
[email protected]
2 Department of Management, Tokyo University of Science, Tokyo, Japan
[email protected]
Abstract. The situation surrounding the tutoring school industry is changing daily, including changes in the content of studies and entrance examinations for high school students due to reforms in high school education, a decrease in the number of students due to the declining birthrate and aging population, and an increase in competitors due to low barriers to entry. What kind of marketing is necessary for tutoring schools to be chosen by users in the midst of market competition and adaptation to educational reform? This study used text mining and multiple regression analysis to examine the needs of users or potential users of tutoring schools.

Keywords: Tutoring school · Multiple regression analysis · Text mining
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 292–300, 2023. https://doi.org/10.1007/978-3-031-35132-7_21

1 Introduction

High school education in Japan is currently undergoing a period of change. Against the background of rapid internationalization and informatization and the transformation of social structure, the Ministry of Education, Culture, Sports, Science and Technology (MEXT) has proposed the “High School-University Connection Reform” as an integrated reform of high school education, university education, and university entrance examinations. As a part of this effort, the “Common University Entrance Test” was introduced in the 2021 academic year. This test aims to evaluate students’ “three elements of academic ability”, namely (1) knowledge and skills, (2) the ability to think, judge, and express themselves, and (3) an attitude toward learning independently and in collaboration with diverse people, in a multidimensional and comprehensive manner, so the academic ability required of students has been changing.

In addition, high schools are transitioning to the new curriculum guidelines from the 2022 academic year. In mathematics, the new guidelines establish a new subject, “Mathematics C”. The newly introduced “Mathematics C” covers statistical graphs, discrete graphs, and matrices, in addition to “curves on a plane and complex number plane” of “Mathematics III” and “vectors” of “Mathematics B”. In addition, “probability distribution and statistical inference,” which had been left out of the scope of most entrance examinations, is made mandatory in “Mathematics B,” and sample surveys and hypothesis tests are added. Therefore, the contents of future university entrance examinations may change drastically, for example, by including more questions from the field of statistics.

From the above, high school students and teachers who are preparing for university entrance examinations are required to adapt to the changes in upper secondary school education. The same is true for tutoring schools. Tutoring schools, which aim to prepare for and review the contents of school studies, periodic examinations, and entrance examinations, need to adapt to the changes in class contents and entrance examinations under the new curriculum guidelines.

In addition, the market environment surrounding the private tutoring school industry is characterized by two problems: a declining number of students due to the declining birthrate, with a resulting tendency for the industry to shrink, and a large number of competitors due to low entry barriers. To be selected in the fierce competition, tutoring schools need to adapt to changes in learning contents and entrance examinations and to continue to respond to students’ expectations, requests, and needs.
2 Data Summary

Two data sets were used in this study. The first consists of questions posted to Yahoo! Chiebukuro, a service provided by Yahoo Japan Corporation, between April 1, 2017 and March 31, 2020. To focus on tutoring schools, only questions that include the word “tutoring school” were used, which gave a total of 10,433 questions.

The second data set is from a survey on satisfaction with group/individual tutoring schools and video tutoring schools for university entrance examinations, provided by oricon ME, Inc. Since MEXT’s policy of tightening admission capacity may have had an impact from 2018 onward, we used the data from 2017. After removing missing values, 848 responses remained for group and individual tutoring schools and 96 for video tutoring schools. The objective variable is the overall satisfaction with the tutoring school, rated on a scale of 1 to 10, with 10 indicating very satisfied and 1 indicating very dissatisfied. There were 334 candidate explanatory variables for group and individual tutoring schools and 258 for video tutoring schools.
3 Analysis Method

First, we conducted frequency and network analyses on the question data from Yahoo! Chiebukuro, using Text Mining Studio and KH Coder as the analysis software. To focus on tutoring schools, only the questions containing the word “tutoring school” were extracted and analyzed.

Next, we analyzed the data from the survey on satisfaction with group/individual tutoring schools and video tutoring schools for university entrance examinations, provided by oricon ME, Inc. Basic tabulations were performed first to see trends in the data; then the objective variable was set, explanatory variables were selected using a stepwise method, and multiple regression analysis was performed.
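The variable selection and regression step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the paper does not state the selection criterion or software, so the sketch assumes a forward-only stepwise search scored by AIC, and the data are synthetic stand-ins for the survey items. The names `ols_rss`, `aic`, and `forward_stepwise` are hypothetical.

```python
import numpy as np

def ols_rss(X, y):
    """Least-squares fit of y on X (X already holds an intercept column).
    Returns the coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of coefficients."""
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y):
    """Assumed criterion: add the column that lowers AIC most, stop when none does."""
    n, p = X.shape
    ones = np.ones((n, 1))
    selected = []
    _, rss = ols_rss(ones, y)
    best = aic(rss, n, 1)
    while len(selected) < p:
        scores = []
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            _, rss = ols_rss(np.hstack([ones, X[:, cols]]), y)
            scores.append((aic(rss, n, len(cols) + 1), j))
        score, j = min(scores)
        if score >= best:
            break
        best = score
        selected.append(j)
    beta, _ = ols_rss(np.hstack([ones, X[:, selected]]), y)
    return selected, beta  # beta[0] is the intercept

# Synthetic stand-in: satisfaction depends on items 0 and 3 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 7.0 + 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=300)
selected, beta = forward_stepwise(X, y)
```

The coefficients returned for the selected columns play the role of the partial regression coefficients discussed in the results; a bidirectional stepwise search would add a removal pass after each addition.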
4 Results

4.1 Text Mining

First, to clarify what needs and concerns users have and what information they seek, we conducted frequency analysis and network analysis using KH Coder. In the frequency analysis, we extracted the 20 most frequently appearing words. The results are shown in Table 1.

Table 1. Top 20 most frequently appearing words

Rank  Word                    Frequency
1     tutoring school         12,349
2     think                   8,168
3     say                     7,512
4     university              6,418
5     study                   5,024
6     go                      4,778
7     myself                  3,850
8     people                  3,560
9     teacher                 3,438
10    high school             3,307
11    now                     3,102
12    school                  2,963
13    entrance examination    2,821
14    teach                   2,208
15    deviation               2,095
16    parent                  2,080
17    ask                     1,977
18    friend                  1,885
19    consider                1,847
20    time                    1,734
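The two text-mining steps named above, frequency analysis and network analysis, can be sketched in a few lines. This is only an illustration under stated assumptions: the questions are assumed to be already tokenized (the Japanese source text would first need morphological analysis), the Jaccard coefficient is assumed as the association measure because the paper does not say which one was used, and the toy documents and the function names `top_words` and `jaccard_edges` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def top_words(docs, k):
    """docs: list of token lists. Returns the k most frequent tokens with counts."""
    counts = Counter(tok for doc in docs for tok in doc)
    return counts.most_common(k)

def jaccard_edges(docs, min_jaccard):
    """Co-occurrence edges between words, weighted by the Jaccard coefficient
    of the sets of documents in which each word appears."""
    doc_sets = {}
    for idx, doc in enumerate(docs):
        for tok in set(doc):
            doc_sets.setdefault(tok, set()).add(idx)
    edges = {}
    for a, b in combinations(sorted(doc_sets), 2):
        inter = len(doc_sets[a] & doc_sets[b])
        union = len(doc_sets[a] | doc_sets[b])
        weight = inter / union
        if weight >= min_jaccard:
            edges[(a, b)] = weight
    return edges

# Toy stand-in for tokenized questions.
docs = [
    ["tutoring school", "study", "exam"],
    ["tutoring school", "parent", "say"],
    ["tutoring school", "study", "university"],
    ["exam", "high school"],
]
top = top_words(docs, 2)
edges = jaccard_edges(docs, 0.5)
```

Edges that survive the threshold are the links one would draw in a co-occurrence network; raising `min_jaccard` keeps only the strongest associations.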
In addition, a network analysis was conducted to identify words that are strongly associated with the word “juku” (tutoring school). The results are shown in Fig. 1.

Fig. 1. Results of Network Analysis

From Table 1, “university” and “high school” are in the top 10, and “entrance examination” is in 13th place, suggesting that many users are considering attending a tutoring school for entrance examinations rather than for daily study. Words that describe people other than the user, such as “teacher,” “parent,” and “friend,” were also found. “Time” may have appeared in questions about commuting time, opening hours of tutoring schools, and class hours. In addition, Fig. 1 shows that the words “tutoring school - study - exam” are connected with a rather high degree of association, suggesting, as in Table 1, that many users were considering going to a tutoring school to study for an exam. Since “examination - high school” are also somewhat associated, there may have been slightly more questions from users who are about to take high school entrance examinations. Fig. 1 also suggests a connection among the words “tutoring school - go - think - say - parent”. Therefore, the users who asked the questions are more likely to be children considering attending tutoring schools than parents.

4.2 Basic Tabulation

The three companies with the largest number of students were selected for the basic tabulation of overall satisfaction with each school. The results are shown in Figs. 2, 3, and 4. As can be seen from Figs. 2 and 3, many users gave 7 to 8 out of 10 points for group tutoring schools and video tutoring schools, indicating that the distributions of overall satisfaction were similar. On the other hand, the distribution of satisfaction levels varied among the individual tutoring schools (Fig. 4). This result suggests that factors affecting satisfaction may differ among individual tutoring schools.
Fig. 2. Overall satisfaction with group tutoring schools
4.3 Multiple Regression Analysis

Next, for the multiple regression analysis, we selected 14 variables for the group and individual tutoring schools and 9 variables for the video tutoring schools, using the stepwise method. The results of the multiple regression analysis on the selected variables are shown in Figs. 5 and 6. Among the selected variables, those with positive partial regression coefficients were as follows. For the group and individual tutoring schools, they were “satisfaction with tutoring services (Q2_1, Q2_2, Q2_4, Q2_12, Q2_19, Q21_1[6]),” “satisfaction with instructors (Q2_11, Q2_15, Q9_4),” “ease of use (Q2_21),” “overall level of expectation (Q8_1),” and “willingness to repeat (Q10_1_1).” For the video tutoring schools, they were “satisfaction with tutoring services (Q2_1, Q2_9),” “change in grades due to attending tutoring (Q6_1[1]),” “ease of access and safety (Q6_1[27], Q9_7),” “expectations regarding ‘appropriate course fees’ (Q8_3),” and “willingness to repeat (Q10_1_1).”

While some variables were common to both types of tutoring schools, variables related to the “quality of class contents and instructors” were selected for group/individual tutoring schools, and variables related to the “learning environment,” such as accessibility and safety, were selected for video tutoring schools.
Fig. 3. Overall satisfaction with the video tutoring school
Fig. 4. Overall satisfaction with individual tutoring schools
Fig. 5. Multiple regression analysis results (group and individual tutoring schools)
Fig. 6. Multiple Regression Analysis Results (Video Tutoring School)
5 Consideration

First, we discuss the results of the multiple regression analysis for group and individual tutoring schools. Among the variables with positive partial regression coefficients, variables such as “satisfaction with the services provided by the tutoring school” and “satisfaction with the instructors” were frequently selected. This result may be attributed to the fact that, while face-to-face instruction allows instruction to be personalized for each student, the quality of instruction varies from instructor to instructor because the instructors’ attributes vary widely. Therefore, users are believed to place more emphasis on the quality of the classes and the competence of the instructors.

Next, we discuss the results of the multiple regression analysis for the video tutoring schools. Here, variables such as “ease of access and safety” were often selected among the variables with positive partial regression coefficients. Classes at video tutoring schools are taught by dedicated instructors, while students and other part-time workers serve as tutors who answer questions rather than teach classes, which guarantees a certain level of quality in the classes. Therefore, many users are likely attracted to the convenience and calm study environment rather than to the quality of services offered by the tutoring school.
6 Conclusion

The practical contributions of this study are twofold. First, the existing or potential users who asked questions about tutoring schools were conscious of the possibility of taking an entrance examination, were considering attending a tutoring school, and wanted to communicate their willingness to attend to their parents. Second, the results show that the needs differ by type of school: “quality of class contents and instructors” for group/individual tutoring schools and “learning environment,” such as accessibility and safety, for video tutoring schools. Understanding such differences makes it possible to provide and improve services that meet the needs of customers.

Despite these contributions, there are some limitations and challenges. The data used in this study are all from before March 2020. Against the background of the spread of the novel coronavirus infection in 2020, changes are also expected in the form of tutoring school classes, such as the growing “local orientation” of students preparing for entrance examinations and “hybrid tutoring schools” that use both online and face-to-face instruction. An analysis that takes into account changes in the situation before and after 2020 is needed.
References

1. MEXT. https://www.mext.go.jp/kaigisiryo/content/000029825.pdf. Accessed 10 Feb 2023
2. MEXT. https://www.mext.go.jp/content/1407073_05_1_2.pdf. Accessed 10 Feb 2023
3. Learning City Inc. https://daigaku-juken-hacker.net/column/new-math-curriculum-2022. Accessed 10 Feb 2023
4. Yamada Consulting Group Inc. https://www.ycg-advisory.jp/industry/education/cram-school/. Accessed 10 Feb 2023
5. Kitajima, Y., Saito, R., Otake, K., Namatame, T.: https://orsj.org/nc2021f/wp-content/uploads/sites/2/2021/08/2021f-1-C-2.pdf. Accessed 10 Feb 2023
Zebrafish Meets the Ising Model: Statistical Mechanics of Collective Fish Motion

Hirokazu Tanaka(B)

Tokyo City University, Setagaya, Tokyo, Japan
[email protected]
Abstract. The collective motion of animal groups is often spontaneously ordered, although the actions of individual members are erratic and unpredictable. The emergent order in joint motion indicates an asymptotic law as the number of individuals in a group increases. This study explores such statistical laws of zebrafish collective motion by combining machine learning and statistical-mechanical approaches. First, we compute the two-dimensional positions of individuals in the school using a deep-learning method (idtracker.ai). The number of individuals is 10, 60, 80, or 100, and each session spans ten minutes in a circular arena. The zebrafish school swims along the circular wall, either counter-clockwise or clockwise. The school with ten zebrafish does not show consistent rotations, whereas the one with 100 zebrafish moves in a bistable manner, interspersed with sudden transitions of rotation direction. We then model the collective motion by binarizing each zebrafish’s motion into an up state (s = +1) for counter-clockwise or a down state (s = −1) for clockwise rotation. The interaction between two individuals is Ising-like with K nearest neighbors, and a rotational preference bias is included as an external field. We consider two possible interactions: topological interactions independent of distance and metric interactions dependent on distance. A gradient ascent algorithm finds the optimal values of the interaction parameters by maximizing the log-likelihood function of the behavioral data. We discuss how zebrafish interact and determine motion directions under other individuals’ influence.

Keywords: Collective Motion · Fish Swarm
1 Biological Collective Motion as a Physical Phenomenon

A wide range of animal species, including birds, fish, and insects, form groups that travel in a coordinated manner different from the way they move individually. Collective behaviors of animals emerge when group members interact with one another, and they have fascinated people since antiquity (for a general introduction to the schooling of fish, see [1, 2]). Quantitative studies of such collective behaviors, however, have started only recently, based on developments in measurement instruments, automatic tracking, and computational modeling [3]. Just as Tycho Brahe’s meticulous astronomical observations led to Kepler’s laws of planetary motion and, eventually, Newton’s laws of motion, we expect to discover biological laws of motion from quantitative behavioral data of animals [4].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 301–309, 2023. https://doi.org/10.1007/978-3-031-35132-7_22
Previous studies proposed governing principles of collective motion and demonstrated that the resulting models exhibited hallmarks of biological observations. In the pioneering boid model, Reynolds hypothesized three simple laws of collision avoidance, velocity matching, and flock centering and demonstrated that the boid model could reproduce biologically plausible motion of bird flocks [5]. Subsequently, the self-propelled-particle (SPP) model proposed by Vicsek et al. introduced a tradeoff between order (coordination with neighbors) and disorder (random directional fluctuations), resulting in a phase transition between ordered and disordered phases [6]. Whereas these models’ behaviors resemble those of biological systems, it remains unresolved whether the same principles govern biological collective movements.

A complementary approach, data-driven modeling, has attracted attention in the field of computational ethology thanks to the development of automatic animal tracking methods. Quantitative measurements and analyses of animal trajectories have revealed phase transitions between coherent and incoherent motion in fish [7] and in locusts [8], scale-free correlations in starling flocks [9], estimation of interactions in the collective motion of surf-scoter flock [10], topological (not metric) interactions in starling flocks [11], and so on. Therefore, we can now measure, formulate, and model biological motion as Brahe, Kepler, and Newton did for planetary motion in the 16th and 17th centuries.

In this article, we attempt to formulate the collective motion of zebrafish schools employing statistical-mechanics techniques and to estimate inter-individual interactions from trajectory data. Specifically, we treat the rotational directions of zebrafish as spin variables in an Ising-like spin model and estimate inter-individual interactions from trajectory time series. Our method is an extension of previous studies that analyzed the motion of bird flocks using maximum entropy methods [12] and stochastic dynamics [13].
2 Automatic Tracking of Animal Movements: Individuals and Group

The most laborious and expensive part of studying animal motion is measuring and quantifying the motion time series of individual animals. Previous studies employed a specifically designed stereoscopic camera to measure the motion of starling flocks or manually tracked the motion of a fish school, which made large-scale measurement of animal motion impractical. Recent developments in computer vision, machine learning, and deep neural networks have dramatically reduced the cost of quantifying animal motion.

There are two broad classes of automatic tracking of animal motion. One is to track the body parts of one animal (or a few animals), with DeepLabCut as a prime example [14, 15]. DeepLabCut allows users to select body parts of interest in prototypical frames and automatically outputs time series of those body parts. This approach is appropriate when one is interested in detailed movements within an individual animal. The other is to track the centers of mass of a large number of individuals (typically up to one hundred), with idTracker [16] and idTracker.ai [17] as representative examples. In this article, we focus on the latter approach of group tracking and its computational modeling.
As a proof of concept, we employ the publicly available dataset of zebrafish schools consisting of 10, 60, 80, and 100 individuals (downloadable at https://idtrackerai.readthedocs.io/en/latest/data.html). The data for each school size contain three sessions of 10 min and describe the (x, y) positions of all individuals in pixel coordinates at 30 frames per second. Some position values are missing, perhaps due to occlusions or insufficient image contrast, and naive numerical differentiation results in discontinuous velocities. As a preprocessing step, we apply the Kalman smoothing algorithm to interpolate missing values and compute the velocities. Figure 1 illustrates the schematics of computing trajectories of zebrafish schools of 10 and 100 individuals. The idTracker.ai algorithm reads a video file as an input (left) and outputs a labeled video (middle). One can see that zebrafish move along the wall of the circular arena in either clockwise or counter-clockwise directions (right).
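The preprocessing step can be illustrated with a small sketch. The paper does not give the state-space model or noise settings, so everything specific here is an assumption: a constant-velocity model applied to one coordinate at a time, a white-acceleration process noise `q`, a measurement variance `r`, and missing frames encoded as NaN. The function name `kalman_smooth_1d` and the synthetic track are hypothetical.

```python
import numpy as np

def kalman_smooth_1d(z, dt, q, r):
    """Constant-velocity Kalman filter followed by an RTS smoother for one
    coordinate. z holds observed positions with NaN for missing frames.
    Returns smoothed positions and velocities."""
    n = len(z)
    F = np.array([[1.0, dt], [0.0, 1.0]])            # state transition
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])            # process noise
    H = np.array([1.0, 0.0])                          # only position is observed
    x_pred = np.zeros((n, 2))
    P_pred = np.zeros((n, 2, 2))
    x_filt = np.zeros((n, 2))
    P_filt = np.zeros((n, 2, 2))
    x = np.array([z[np.isfinite(z)][0], 0.0])         # start at first valid sample
    P = np.diag([1e3, 1e4])                           # diffuse prior
    for t in range(n):
        if t > 0:                                     # predict
            x, P = F @ x, F @ P @ F.T + Q
        x_pred[t], P_pred[t] = x, P
        if np.isfinite(z[t]):                         # update only when observed
            s = H @ P @ H + r                         # innovation variance
            k = P @ H / s                             # Kalman gain
            x = x + k * (z[t] - H @ x)
            P = P - np.outer(k, H @ P)
        x_filt[t], P_filt[t] = x, P
    x_s = x_filt.copy()
    for t in range(n - 2, -1, -1):                    # backward (RTS) pass
        G = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        x_s[t] = x_filt[t] + G @ (x_s[t + 1] - x_pred[t + 1])
    return x_s[:, 0], x_s[:, 1]

# Synthetic track at 30 frames per second with a ten-frame gap.
rng = np.random.default_rng(0)
dt = 1.0 / 30.0
t = np.arange(300) * dt
truth = 100.0 + 50.0 * t
z = truth + rng.normal(scale=0.5, size=t.size)
z[100:110] = np.nan
pos, vel = kalman_smooth_1d(z, dt, q=1.0, r=0.25)
```

Applying the same function to the x and y columns of each individual gives gap-free positions together with velocities that do not jump at missing frames.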
Fig. 1. Schematics of automatic zebrafish tracking. Original movies of 10 and 100 zebrafish (left), labeled movies processed by idTracker.ai (middle), and trajectories of all individual zebrafish (right).
As a first step in modeling zebrafish school motion, it is appropriate to consider rotational movements. To quantify rotational movements, we compute the angular momentum l with the right-hand convention, positive for a counter-clockwise rotation, namely,

$$ l = \mathbf{r} \times \mathbf{v} \tag{1} $$

where r and v are the position and velocity of each zebrafish, respectively. Figure 2 illustrates the mean angular momenta averaged over the population for schools of 10 and 100 zebrafish. The zebrafish school of size 10 changes its rotation direction frequently, suggesting dynamic fluctuations of collective motion. In contrast, the zebrafish school of size 100 rotates more coherently and switches its rotation direction only intermittently, indicating stable rotational dynamics with sporadic transitions. A similar size-dependent dynamical change was also reported in locust marching [8]. We may ask here: How does the rotational direction of one zebrafish influence other, neighboring zebrafish?
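For planar motion, the angular momentum of Eq. (1) reduces to a single signed number per fish and frame, which can be sketched as follows. The array layout `(T, N, 2)` and the choice of the arena center as origin are assumptions of this sketch, since the text does not state the origin of r; note also that image coordinates with a downward y axis flip the sign of the rotation.

```python
import numpy as np

def angular_momentum(pos, vel, center):
    """z-component of r x v for planar motion.
    pos, vel: arrays of shape (T, N, 2); center: assumed origin, shape (2,)."""
    r = pos - center
    return r[..., 0] * vel[..., 1] - r[..., 1] * vel[..., 0]

# One fish circling counter-clockwise on the unit circle.
theta = np.linspace(0.0, np.pi, 5)
pos = np.stack([np.cos(theta), np.sin(theta)], axis=-1)[:, None, :]
vel = np.stack([-np.sin(theta), np.cos(theta)], axis=-1)[:, None, :]
l = angular_momentum(pos, vel, center=np.zeros(2))
mean_l = l.mean(axis=1)   # population average at each frame
```

Averaging `l` over the individuals at each frame gives a school-level time series of the kind described in the text.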
Fig. 2. Mean angular momenta for zebrafish schools of (top) 10 and (bottom) 100 individuals. The abscissa and ordinate denote the time for 600 s and the angular momenta in pixel coordinates, respectively. Each school has three sessions.
3 Statistical Mechanical Approach to Modelling the Group Motion

Statistical mechanics is a branch of physics that studies the emergent properties of a system composed of many interacting components, such as atoms and molecules. A system in the thermodynamic limit (i.e., a system with many degrees of freedom) discloses properties not inherent to its constituent elements, such as phase transitions and scaling laws among physical observables. It is natural to model the collective motion of animal groups within the framework of statistical mechanics.

The kinematics of zebrafish schools involves a large number of time-dependent variables (100 individuals × 18,000 time points × 2 dimensions for a school of 100 zebrafish), and it is not appropriate to model the school dynamics directly. As observed in the previous section, our interest lies in the rotational direction of each zebrafish and of the school. Therefore, we discretize the rotational direction into clockwise or counter-clockwise. Accordingly, the spin of the i-th zebrafish at time step t is denoted by either $s_i(t) = +1$ (counter-clockwise) or $s_i(t) = -1$ (clockwise). We introduce a Hamiltonian H of spin variables following that of the Ising model as

$$ H(\{s\}) = -\sum_{i=1}^{N} \left( \sum_{k=1}^{K} J_k s_i s_{j_k^{(i)}} + b_i s_i \right) = -\sum_{i=1}^{N} s_i h_i \tag{2} $$

where $J_k$ is the interaction coefficient with the k-th neighbor and $b_i$ is the bias parameter specifying the directional preference of the i-th zebrafish. We here assume that the i-th individual interacts with its K nearest neighbors (denoted by $j_k^{(i)}$, $k = 1, \cdots, K$). The input $h_i$ to the i-th zebrafish is a weighted sum of the K neighboring spins and the preference bias $b_i$:

$$ h_i = \sum_{k=1}^{K} J_k s_{j_k^{(i)}} + b_i. \tag{3} $$

The spin model and proposed interactions are schematically summarized in Fig. 3.
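Equations (2) and (3) can be evaluated directly once the K nearest neighbors of each fish are known. The following is a minimal sketch for a single time step; the function names, the brute-force neighbor search, and the toy positions and parameter values are assumptions made for illustration.

```python
import numpy as np

def nearest_neighbors(pos, K):
    """pos: (N, 2) positions at one time step. Returns (N, K) neighbor indices,
    ordered from nearest to K-th nearest, excluding the individual itself."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :K]

def local_field(s, nbrs, J, b):
    """h_i = sum_k J_k s_{j_k^(i)} + b_i, Eq. (3).
    s: (N,), nbrs: (N, K), J: (K,), b: (N,)."""
    return s[nbrs] @ J + b

def hamiltonian(s, nbrs, J, b):
    """H = -sum_i s_i h_i, Eq. (2)."""
    return -np.sum(s * local_field(s, nbrs, J, b))

# Four fish on a line, two neighbors each, hypothetical parameters.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [4.5, 0.0]])
nbrs = nearest_neighbors(pos, K=2)
J = np.array([0.5, 0.25])
b = np.zeros(4)
aligned = np.array([1, 1, 1, 1])
flipped = np.array([1, 1, 1, -1])
H_aligned = hamiltonian(aligned, nbrs, J, b)
H_flipped = hamiltonian(flipped, nbrs, J, b)
```

With positive coefficients, the aligned configuration has the lower energy, which is the sense in which the spin tends to align with its input h.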
Fig. 3. Schematics of interactions described in the Ising-like model. (Left) the i-th individual has s = +1, which is influenced by the interaction with its neighbors. (Right) The spin is updated according to the input (i.e., a weighted sum of neighboring spins and a preference bias). The spin tends to align its direction to the input h.
The advantage of formulating the interactions using a spin model is its flexibility; the interaction terms can incorporate specific assumptions about biological interactions. One may design a particular interaction according to the biological question of interest and straightforwardly incorporate it into the spin model. We can consider four candidate models (Fig. 4). Model 1 assumes that an individual interacts with a fixed number (K) of nearest neighbors regardless of the distance (sometimes called topological interactions [11]). Model 2 assumes that an individual has an interaction area of fixed radius (R) and interacts with all neighbors in the interaction area. Model 3 assumes distance-dependent interactions that decay as $r^{-\alpha}$. Finally, Model 4 assumes direction-dependent interactions in which an individual interacts more strongly with neighbors in front (J + κ) than with neighbors behind (J − κ). The Hamiltonian formulation in Eq. (2) is based on Model 1. Because of the limited space of this article, we present only the results of Model 1, the fixed-neighbor model.
Fig. 4. Four hypothetical interactions that can be modeled in the spin model. (Model 1) Fixed-neighbor model, (Model 2) fixed-radius model, (Model 3) distance-dependent model, and (Model 4) direction-dependent model.
4 Inference of Pairwise Interactions from Behavioral Data

The statistical model formulated in the previous section describes the coordinated behaviors of an animal group. There are two problems, forward and inverse. The forward problem is to determine the rotational directions {s} when the values of the coupling coefficients {J} are explicitly provided. Physics textbooks discuss the forward problem of what spin configurations are realized when the interactions are physically determined. Conversely, the inverse problem is to infer the coupling coefficients {J} when the rotational directions {s} are given. Algorithms for solving such inverse problems have been developed in machine learning and neural networks. In this section, we tackle the inverse problem of coefficient inference.

We describe how the spins at time step t determine the spins at time step t + 1. The Glauber dynamics give the transition probability of a spin system from the current time step t to the subsequent time step t + 1. The probability that the i-th spin takes the value $s_i(t+1)$ at step t + 1 is determined by the interaction with its neighboring spins as

$$ \Pr(s_i(t+1) \mid s(t)) = \frac{\exp(s_i(t+1) h_i(t))}{\exp(+h_i(t)) + \exp(-h_i(t))}. \tag{4} $$
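As a small illustration of Eq. (4), the probability of the up state simplifies to a logistic function of 2h, and one parallel update of all spins can be sampled from it. The function names and the parallel (all spins at once) update are assumptions of this sketch.

```python
import numpy as np

def prob_up(h):
    """Pr(s_i(t+1) = +1 | s(t)) = exp(h) / (exp(h) + exp(-h)) = 1 / (1 + exp(-2h))."""
    return 1.0 / (1.0 + np.exp(-2.0 * h))

def glauber_step(h, rng):
    """Sample all spins at step t+1 given the local fields h_i(t)."""
    return np.where(rng.random(h.shape) < prob_up(h), 1, -1)

rng = np.random.default_rng(0)
h = np.array([-20.0, 0.0, 20.0])
p = prob_up(h)
s_next = glauber_step(h, rng)
```

A zero field leaves the direction undecided (probability one half), while a strong field makes the spin follow its sign almost surely.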
When all the values of the spin variables are provided for a session of length T, the log-likelihood function of the coupling coefficients and biases is obtained as

$$ \ell(J, b) = \log L(J, b) = \sum_{i=1}^{n} \sum_{t=1}^{T-1} \log \Pr(s_i(t+1) \mid s(t)), \tag{5} $$

where n is the number of individuals in the school. We determine the values of the coupling coefficients by maximizing the log-likelihood function as

$$ (\hat{J}, \hat{b}) = \underset{(J, b)}{\arg\max}\; \ell(J, b). \tag{6} $$
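Specifically, we optimize the parameter values by a gradient ascent method on the log-likelihood $\ell(J, b)$ as

$$ \Delta J_k = +\eta \frac{\partial \ell(J, b)}{\partial J_k}, \qquad \Delta b_i = +\eta \frac{\partial \ell(J, b)}{\partial b_i} \tag{7} $$

with the gradients

$$ \frac{\partial \ell(J, b)}{\partial J_k} = \sum_{i=1}^{n} \sum_{t=1}^{T-1} \{ s_i(t+1) - \langle s_i(t+1) \rangle \}\, s_{j_k^{(i)}}(t) \tag{8} $$

and

$$ \frac{\partial \ell(J, b)}{\partial b_i} = \sum_{t=1}^{T-1} \{ s_i(t+1) - \langle s_i(t+1) \rangle \}, \tag{9} $$

where $\eta$ is the learning rate and $\langle s_i(t+1) \rangle = \tanh h_i(t)$ is the expected value of the spin under Eq. (4). The neighbor spin in Eq. (8) is taken at step t because the input $h_i(t)$ is built from the spins at step t.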
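The maximum-likelihood fit by gradient ascent can be sketched end to end on simulated data. Several choices here are assumptions made only for illustration: the neighbor structure is a fixed ring rather than time-varying nearest neighbors, the gradients are divided by the number of samples so that one step size works for both parameter groups, and the error term uses the neighbor spins at step t together with tanh h_i(t) as the expected spin. The name `fit_ising` and all numerical values are hypothetical.

```python
import numpy as np

def fit_ising(S, nbr_spins, eta=0.3, n_iter=500):
    """Gradient ascent on the log-likelihood of the Glauber transition model.
    S: (T, N) spins; nbr_spins[t, i, k] = spin of the k-th neighbor of i at step t,
    shape (T-1, N, K). Returns estimates of J (K,) and b (N,)."""
    T, N = S.shape
    K = nbr_spins.shape[2]
    J, b = np.zeros(K), np.zeros(N)
    target = S[1:]                                   # s_i(t+1)
    for _ in range(n_iter):
        h = nbr_spins @ J + b                        # h_i(t), shape (T-1, N)
        err = target - np.tanh(h)                    # s_i(t+1) - <s_i(t+1)>
        grad_J = np.einsum("tn,tnk->k", err, nbr_spins)
        grad_b = err.sum(axis=0)
        J += eta * grad_J / ((T - 1) * N)            # normalized step (assumption)
        b += eta * grad_b / (T - 1)
    return J, b

# Simulate parallel Glauber dynamics with known parameters on a ring.
rng = np.random.default_rng(1)
N, K, T = 20, 2, 4000
J_true = np.array([0.6, 0.3])
nbr_idx = (np.arange(N)[:, None] + np.array([1, 2])) % N   # fixed neighbors
S = np.empty((T, N), dtype=int)
S[0] = rng.choice([-1, 1], size=N)
for t in range(T - 1):
    h = S[t][nbr_idx] @ J_true
    S[t + 1] = np.where(rng.random(N) < 1.0 / (1.0 + np.exp(-2.0 * h)), 1, -1)
nbr_spins = S[:-1][:, nbr_idx]                       # shape (T-1, N, K)
J_hat, b_hat = fit_ising(S, nbr_spins)
```

On tracked data, `nbr_spins` would instead be built frame by frame from the K nearest neighbors of each fish, and the recovered `J_hat` would be read as the interaction strength by neighbor rank.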
We here show the results of the K fixed-neighbor model (Fig. 5). First, the interaction coefficients depend only weakly on the rank of the neighbors; that is, the nearest neighbor and the K-th nearest neighbor contribute almost equally (left panel in Fig. 5). Whereas the coefficients decrease weakly for schools of sizes 10 and 60, they are constant for schools of sizes 80 and 100. The constancy of the interactions indicates that zebrafish influence each other topologically. Next, the interaction coefficients become larger as the school size increases (right panel in Fig. 5). The stronger interactions in a larger school are consistent with the coherent rotational directions empirically observed in Fig. 2. Therefore, the spin model and the maximum-likelihood estimation provide a mechanistic explanation of the observed collective dynamics of the zebrafish school.
Fig. 5. Estimated interaction coefficients for schools of 10 (black ◯’s), 60 (red ×’s), 80 (green +’s), and 100 (blue *’s) zebrafish. (Left) Estimated coefficients ordered from the nearest neighbor to the K-th nearest neighbor. Whereas slightly decreasing, those coefficients are relatively constant. (Right) Estimated coefficients according to the school size. The interaction becomes stronger as the size of the school increases.
5 Discussions

The proposed model based on statistical mechanics provides systematic modeling and estimation of interactions among individuals within an animal group. The estimated interaction coefficients suggest the following picture of rotational dynamics (Fig. 6). The relatively weak interactions found in a small school result in unstable, frequently switching rotations. In contrast, the relatively strong interactions in a large school lead to stable rotations with intermittent switching.

Our modeling is deliberately simple, so it has limitations and leaves room for extensions. In a future study, we will perform a statistical comparison among the four possible models (see Fig. 4). We hope to discover more quantitative laws of motion of zebrafish schools from the data. On the other hand, the proposed model assumes symmetric interactions between two individuals and treats all individuals equally. Therefore, the proposed model cannot uncover which individuals are more influential and which are less so. An improved model should include asymmetric relationships designed to expose a hierarchy in a zebrafish school.
Fig. 6. Energy landscapes of collective motion derived from the experimental data. (Left) A small school of zebrafish interacts weakly, so the rotational movement is not stable and shows frequent transitions. (Right) A large school of zebrafish has stronger interactions, stabilizing the rotational movements with sporadic changes.
Acknowledgments. This study is partly supported by JSPS KAKENHI (Grant Numbers JP21K11200 and JP22H05082) and Tokyo City University Prioritized Studies.
References

1. Partridge, B.L.: The structure and function of fish schools. Sci. Am. 246(6), 114–123 (1982)
2. Shaw, E.: Schooling fishes: the school, a truly egalitarian form of organization in which all members of the group are alike in influence, offers substantial benefits to its participants. Am. Sci. 66(2), 166–175 (1978)
3. Datta, S.R., et al.: Computational neuroethology: a call to action. Neuron 104(1), 11–24 (2019)
4. Brown, A.E., De Bivort, B.: Ethology as a physical science. Nat. Phys. 14(7), 653–657 (2018)
5. Reynolds, C.W.: Flocks, herds and schools: a distributed behavioral model. In: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques (1987)
6. Vicsek, T., et al.: Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett. 75(6), 1226 (1995)
7. Becco, C., et al.: Experimental evidences of a structural and dynamical transition in fish school. Phys. A 367, 487–493 (2006)
8. Buhl, J., et al.: From disorder to order in marching locusts. Science 312(5778), 1402–1406 (2006)
9. Cavagna, A., et al.: Scale-free correlations in starling flocks. Proc. Natl. Acad. Sci. 107(26), 11865–11870 (2010)
10. Katz, Y., et al.: Inferring the structure and dynamics of interactions in schooling fish. Proc. Natl. Acad. Sci. 108(46), 18720–18725 (2011)
11. Ballerini, M., et al.: Interaction ruling animal collective behavior depends on topological rather than metric distance: evidence from a field study. Proc. Natl. Acad. Sci. 105(4), 1232–1237 (2008)
12. Bialek, W., et al.: Statistical mechanics for natural flocks of birds. Proc. Natl. Acad. Sci. 109(13), 4786–4791 (2012)
13. Yates, C.A., et al.: Inherent noise can facilitate coherence in collective swarm motion. Proc. Natl. Acad. Sci. 106(14), 5464–5469 (2009)
14. Mathis, A., et al.: DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21(9), 1281–1289 (2018)
15. Nath, T., et al.: Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14(7), 2152–2176 (2019)
16. Pérez-Escudero, A., et al.: idTracker: tracking individuals in a group by automatic identification of unmarked animals. Nat. Methods 11(7), 743–748 (2014)
17. Romero-Ferrero, F., et al.: idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nat. Methods 16(2), 179–182 (2019)
Research on New Design Methods for Corporate Value Provision in a DX (Digital Transformation) Society
Visualization of Value by Lifestyle Derived from Qualitative Analysis

Akio Tomita1(B), Keiko Kasamatsu1, Takeo Ainoya2, and Kunika Yagi3

1 Graduate School of System Design, Tokyo Metropolitan University, Tokyo, Japan
2 Tokyo University of Technology, Tokyo, Japan
3 Misawa Homes Institute of Research and Development Co., Ltd., Tokyo, Japan
Abstract. This study examines the challenges faced by construction companies in their digital transformation, specifically custom home builders. Despite the widespread use of the term, many companies have yet to fully embrace the value of digital technologies and environments in changing their customer relationships and the value they provide. The purpose of the study is to identify the value structural elements of daily life and develop methods to express these values through digitalization. The research methodology used qualitative analysis and agile prototyping to develop a two-dimensional frame to express the value structure. The results showed that the lifestyle-specific value changed depending on the relationship between different categories and that the frame could fit and compare data from the analysis sample. However, challenges remain in terms of understanding and comparing many types and values of lifestyles, and the study aims to further develop the visualization of the frame to promote change in corporate digital transformation. Keywords: Qualitative analysis · framing · value of life · Digital Transformation · Prototyping · Design Method
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 310–319, 2023. https://doi.org/10.1007/978-3-031-35132-7_23

1 Introduction

1.1 Business Environment

In Japan, many companies and organizations are engaged in digital transformation. Construction companies, especially housing companies that build custom homes, are no exception. In addition to existing hands-on locations such as home showrooms, virtual reality (VR) and other virtual spaces, as well as information on social networking services and websites, are becoming part of an omni-channel offering. In addition, the scope of what computers can
recognize has expanded greatly with the evolution of AI. In image recognition, many services, such as security services and automated driving, have reached the practical stage. In the recognition of language and conversation, natural language processing (NLP) has also produced groundbreaking algorithms, and the digital technology infrastructure is evolving significantly [1, 2]. However, although the term “digital transformation” is widely used, many existing companies, especially homebuilders, have only replaced paper with digital documents and have yet to create new experience value and problem-solving value for their customers by utilizing digital technology platforms and environments.

1.2 Corporate Problem

This problem arises because companies built their business schemes on the technologies of a previous era and, without changing those schemes, apply digitalization only in a narrow sense. They do not envision how much they can or will change their relationships with their customers and the value they provide through digitalization. In other words, digitalization remains at the level of partial optimization, merely a means of improving work efficiency or a means of expression, and the possibility of changing the company’s raison d’être is not considered at the level of overall optimization.

1.3 Introduction of the Housing Industry

In Japan, many new houses are being built, and many companies provide them. Some of them are housing brands of major global manufacturers, such as an automobile manufacturer (TOYOTA) and an appliance manufacturer (Panasonic). These companies believe that the industrialization of housing construction will enable them to provide quality housing to a larger number of people and to evolve their lifestyles.
1.4 How a New Custom-Built Home Works

A New Construction Service for Private Residences (NCS), which creates individual living spaces for its clients, needs to deliver the ideal way of living that is latent within each client. An NCS is about understanding the time and space that the client and his/her family consider important in their lifestyle, and creating a new space that can solve the challenges of daily housework and childcare. Just as housing trends differ from country to country, lifestyle values differ depending on the environment in which one was raised, and each individual imagines their ideal lifestyle based on these values.

1.5 The Importance of Understanding Lifestyle Values

It is important to note here again that the “ideal way of life” that each individual has is not a single thing, and it is not simply a matter of reducing the activities of daily living or making them more efficient. In extreme cases, living and housework can be considered skills
for survival. Running a household includes raising children. It does not follow that, because child rearing is a work burden, it should be eliminated entirely. Of course, the trend is for workloads such as housework to be reduced. However, each country, each family, each culture, and each person has different manners and ways of interacting with partners and family members that are important in their lives. In other words, no two people have the same ideals and values for their way of life, even for the household chores that run our lives, and the weight they give to each value differs. Sharing and understanding these differences in value is the key to enriching families and society. Many conflicts, from marital quarrels on a small scale to conflicts between nations (capitalism and socialism) and religious conflicts on a large scale, can be attributed to the inability to share and understand essential values.

There are many reports that, with the advancement of digitalization and AI, simple tasks and communication that used to be performed by humans will no longer be necessary because of the increased cognitive level of computers. The statement “65% of today’s children will end up working” is often attributed to futurist and author Cathy Davidson. Amid such social changes, what kind of value should companies, including globalized companies and homebuilders, provide to their customers?
2 Purpose

With the evolution of these digital technologies, we assume that understanding the manifest needs of individuals will become a solved problem, and that the task will shift to understanding the latent value rooted in each individual’s experience and values, and the level of empathy that can be shared. Corporate evolution requires design methods that respond to this environment. Therefore, we set the following two main objectives.

(i) Clarification of structural elements: identify the value structure elements needed to understand the values of life based on each person’s background.
(ii) Value expression method: develop methods of value expression for the way of life that can be utilized through digitalization, and examine the situations in which they can be utilized.

Objective (i) is based on the need to understand the value structure of daily life, which each person practices unconsciously, and which components of value are recognized as attractive and valuable. The reason for objective (ii) is that, from the customer understanding point of view, it is necessary to understand not only the person and user experience, but also the value of different lifestyles in the product planning and development process.
3 Methods

The research methodology is as follows. For (i), we conducted a qualitative analysis of the elements of the value structure. The target sample consisted of data from published books by people who have a philosophy of life, housework professionals, and others who independently disseminate their lifestyles (Table 1). Books disseminate the author’s
values. Because they are published to influence many people, they are useful as data for gauging the types of lifestyle values that are already prevalent in the world.

Table 1. Summary of book selection policy and target sample

Heading level | Example
Subjects | Books
Selection category | Books that fall under category 59 (Home Economics, Life Science) of the Nippon Decimal Classification (NDC)
Themes | Books that show methods and ideas in daily household life, such as housework, meals, cooking, washing dishes, shopping, and tidying up
Composition of authors | Authored by a single individual
Author attributes | Family members who live together
Sample size | 11 books
Analysis method | Modified version of the Grounded Theory Approach
The analytical methodology was adapted from a modified version of the Grounded Theory Approach [3]. The Grounded Theory Approach, proposed by Glaser and Strauss in 1967, is a methodology for building theories grounded in the real world. It allows researchers to collect and analyze qualitative data and build theories from them, rather than analyzing the data with predefined theories and concepts, and previous studies have reported that the results obtained are more accurate and richer in information than those of other methods [4, 5]. The approach referred to here is the model proposed by Y. Kinoshita as his own modified version of the Grounded Theory Approach (M-GTA), with the aim of building a clearer theory [6].

The method of value expression in (ii) was developed in parallel with the structural element analysis, using agile prototyping to develop frames. The goal was to categorize the elements of the value structure and map them so that the values of people with different lifestyles (e.g., different countries of birth and upbringing) could be organized and compared using the same frame. Frames were chosen as the means of expressing value because previous research has shown that ideas are easier to understand when they are organized into frames [7, 8]. In the field of artificial intelligence, M. Minsky also proposed frame theory as a method of knowledge representation that captures the semantic system of knowledge structurally by setting up a framework [7]. In other words, frames not only make values easier for people to understand and recognize, but are also well suited to digitization.

Agile prototyping is one approach to software development. It aims to develop software through collaboration between customers and developers, who work together closely and gather frequent customer feedback [9].
In this case, the objective is a design approach for corporate value delivery, and the customer is a research developer who studies new value creation at a housing company. In this prototyping, we used an agile approach with feedback from two research developers at the housing company. This approach allows customers and developers to create something they will
actually use early on and then make modifications and improvements. The benefits of this approach include increased customer satisfaction, cost savings, and quicker product releases [10].
4 Result

Structured Analysis and Prototyping. The analysis procedure is to extract the target data that match the purpose of the analysis (in this case, the passages describing the way of life) and to separate the data into chunks of several sentences that are continuous in meaning. Chunks with similar meanings are then grouped into a bundle, and the meaning that represents each bundle is expressed in a single sentence, which is called a concept (Fig. 1). We then grouped similar concepts together and organized how they relate to each other.
Fig. 1. Concept Generation (Conceptualization) from Individual Data
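The concept generation step can be illustrated with a small data-structure sketch. This is only a hedged illustration, not the authors' tooling: the class names `Chunk` and `Concept`, the function `build_concepts`, the sample IDs, and the example passages are all assumptions introduced here. The grouping decision itself remains the analyst's judgment; the code only records which chunks were bundled under which one-sentence concept.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """A passage of several sentences that is continuous in meaning."""
    sample_id: str  # which book the passage came from (hypothetical ID)
    text: str


@dataclass
class Concept:
    """One sentence that represents a bundle of chunks with similar meaning."""
    sentence: str
    chunks: list = field(default_factory=list)


def build_concepts(coded_chunks):
    """Bundle chunks by the concept sentence the analyst assigned to them.

    coded_chunks is a list of (Chunk, concept_sentence) pairs. The grouping
    is the analyst's judgment; this function only records it.
    """
    bundles = defaultdict(list)
    for chunk, sentence in coded_chunks:
        bundles[sentence].append(chunk)
    return [Concept(sentence, chunks) for sentence, chunks in bundles.items()]


# Invented example passages, not taken from the analyzed books.
coded = [
    (Chunk("book-01", "I wash the dishes right after dinner."), "Finish chores immediately"),
    (Chunk("book-01", "Laundry is folded as soon as it is dry."), "Finish chores immediately"),
    (Chunk("book-02", "Cooking together is our family time."), "Chores as shared time"),
]
concepts = build_concepts(coded)
for concept in concepts:
    print(concept.sentence, len(concept.chunks))
```

Keeping the source chunks attached to each concept makes it possible to trace a concept back to the passages that support it when concepts are later grouped into categories.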
We conducted a structured analysis of life value using M-GTA, together with prototyping of value visualization and expression methods, and developed four prototype visualization frames of life value (Figs. 2, 3 and 4). A frame here refers to a layout diagram that consists of several boxes (categories) and in which concepts are mapped within those boxes. In the first prototype, Prototype A, the concepts analyzed for each sample were bundled into what could be interpreted as the same category, and a frame attributable to the objective was created for each sample from the relationship diagram. This is similar, so to speak, to the result diagram of an existing Modified Grounded Theory Approach analysis. With this type of diagram, although each frame shows a value expression, it was difficult to capture the common ideas and the degree of difference when comparing different samples. In addition, not only is
it difficult to analyze and visualize values without an analyst specializing in qualitative analysis, but it is also difficult to implement and generalize the use of these results.

To address these issues with Prototype A, the next prototype, Prototype B, used a common frame to resolve the difficulty of comparing frames that express different values. We defined the common categories in the sample group as “action,” “influence,” and “self-actualization,” and made it possible to tie related concepts to these three categories as the trunk (Fig. 2-1, Fig. 2-2). Because the framework always consists of the same three categories, samples are easier to compare with each other by looking at the differences in each category. At the stage of extracting sample data, it also becomes easier to extract the passages that fit the three categories from a vast amount of information, so the method of extracting and organizing information can be generalized. However, even when concepts were similar in content, it was not possible to determine whether people liked an activity, disliked it, or did it unconsciously, which led to the hypothesis that emotional weighting would make the frame easier to understand.
Fig. 2-1. Prototype B: Frame consisting of Three Categories
Therefore, based on the issues in Prototype B, Prototype C (Fig. 3) was created. In addition to the three categories that formed the trunk of the frame in Prototype B, a fourth category, “emotion,” was added to the frame, and the sample data were reorganized. According to Gene A. Fisher, emotions are closely related to personality, since emotions and personality traits can be described by the same model [11]. Living is itself an act of survival, and emotional behavior is said to be basically a communication process that ensures the survival of the individual and the passing on of genes [11]; accordingly, we found that placing information in the emotion category characterizes various behaviors and facts.
Fig. 2-2. Prototype B: Frames with Three Categories with other Categories Associated with Them
However, many passages extracted from the books contained elements of more than one category, such as emotion and influence, or action and emotion, and in many cases it was difficult to determine in which category to place the information. In addition, many concepts did not clearly express emotion or did not seem to recognize emotion, which indicates the limitations of the book-based sample and the need to be creative in how information is collected for future samples.

Therefore, we created Prototype D (Fig. 4). We decided to encompass “emotion” within the three categories of “action,” “influence,” and “self-actualization” mentioned above. We then created two subcategories, each with its own axis, for two of the three categories, “action” and “influence.” The subcategories make it easier to compare the elements of each category. In the “action” category, we placed the subcategories “Work” and “Tools,” and in the “influence” category, we placed the subcategories “Physical” and “Psychological.” As for the axes, the “action” category has an axis of “more” or “less” quantity, while the “influence” category has an axis of “positive” or “negative” influence, which allows for weighting. The frames composed of these categories can now express the amount and bias of conceptual data among categories, as well as differences within the same category. It is also possible to show the differences in what each category consists of and how each category is structured toward self-actualization, the highest level of need in Maslow’s five-level hierarchy of needs. Specifically, we found that the way the emphasis within one category is connected to the emphasis of another category, and the difference in the connections among the three categories, changes the value by lifestyle. This opens up the possibility of finding
Fig. 3. Prototype C: A Frame with Three Categories Plus an Emotional Category
value and attractiveness in other people’s lives that one has not experienced in one’s own life.

Consideration of the Application Scenarios. The next step was to examine the application scenarios, which we did in parallel with the prototyping. While each prototype has its own merits and demerits, we were able to envision two possible application scenarios. The first is to use the frame in the STP (segmentation, targeting, and positioning) part of the marketing process at the time of strategic planning. It has been pointed out that it is currently difficult to capture the diversified values of each generation as a mass, and that it has become difficult to set STP strategies based on attributes. Narrowing down the targeting process with the elements of the Life Value Visualization Frame will make it easier to estimate demand and make development decisions. The second is to use the frame as additional information for personas when creating UX maps. In addition to attribute information, needs and objectives are set for personas, and related behaviors and emotions are expressed in scenes and other forms. There are many variations of persona creation, but even detailed versions are mainly descriptive and not easy to understand visually.
Fig. 4. Prototype D: Frame with two subcategories for “behavior” and “impact”.
Therefore, this prototype allows us to understand the background that gives rise to the needs of the personas, and is useful as data for creating the value to be provided to solve these needs.
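To suggest how the Prototype D frame might be digitized, the following is a minimal sketch under stated assumptions. The category and subcategory names come from the text; the integer weight scale, the class names `Entry` and `Frame`, the `compare` function, and the two example samples are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field

# Category and subcategory names follow the text; everything else is assumed.
SUBCATEGORIES = {
    "action": ("Work", "Tools"),
    "influence": ("Physical", "Psychological"),
}


@dataclass(frozen=True)
class Entry:
    concept: str      # one-sentence concept from the qualitative analysis
    category: str     # "action", "influence", or "self-actualization"
    subcategory: str  # "" for self-actualization, which has no subcategories
    weight: int       # action: less (-) to more (+); influence: negative (-) to positive (+)


@dataclass
class Frame:
    sample_id: str
    entries: list = field(default_factory=list)

    def add(self, concept, category, subcategory="", weight=0):
        if category in SUBCATEGORIES:
            if subcategory not in SUBCATEGORIES[category]:
                raise ValueError(f"unknown subcategory {subcategory!r} for {category!r}")
        elif category != "self-actualization":
            raise ValueError(f"unknown category {category!r}")
        self.entries.append(Entry(concept, category, subcategory, weight))

    def profile(self):
        """Return {(category, subcategory): (count, summed weight)}."""
        cells = {}
        for e in self.entries:
            count, total = cells.get((e.category, e.subcategory), (0, 0))
            cells[(e.category, e.subcategory)] = (count + 1, total + e.weight)
        return cells


def compare(a, b):
    """Cells where two frames differ, as {cell: (profile_a, profile_b)}."""
    pa, pb = a.profile(), b.profile()
    return {
        cell: (pa.get(cell, (0, 0)), pb.get(cell, (0, 0)))
        for cell in sorted(set(pa) | set(pb))
        if pa.get(cell) != pb.get(cell)
    }


# Invented samples for illustration only.
a = Frame("sample-A")
a.add("Cook every meal from scratch", "action", "Work", weight=2)
a.add("Cooking calms me down", "influence", "Psychological", weight=1)
a.add("Be a host people like to visit", "self-actualization")

b = Frame("sample-B")
b.add("Rely on a dishwasher", "action", "Tools", weight=1)
b.add("Cooking is tiring", "influence", "Psychological", weight=-1)
b.add("Keep time free for hobbies", "self-actualization")

diff = compare(a, b)
for cell, (left, right) in diff.items():
    print(cell, left, right)
```

Because every sample is mapped onto the same cells, a per-cell count and weight of this kind is one possible basis for the segmentation use described above; how the weights would actually be assigned is left open in the text.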
5 Discussion

The research themes of “clarification of value structure elements” and “value expression methods” were addressed through qualitative analysis and prototyping. We consider this an achievement that is useful in an increasingly digitalized society, for the following reasons.

1. The value of the way of life can be visualized and digitized.
2. Differences in value can be utilized in the marketing process (customer targeting).
3. The value can be used for communication among members during product planning and development.
4. Incorporating natural language processing technology into value structure analysis using visualization frames may raise the level of understanding from content understanding to value understanding.

The advancement of digitization removes environmental hurdles such as place and time, whether real or virtual, and accelerates globalization, which makes it easier to
aggregate information based on different values, such as information that had remained local and values that had not spread much beyond groups with the same orientation. The information gathered may express meaning from one perspective but be based on different values with different objectives, and its true meaning may not be well understood by groups with different values. Interpreting those meanings by comparing value structures has the advantage that even those with different values can be exposed to new values, which can lead to the discovery of new issues, new solution ideas, and new demands.

On the other hand, challenges remain. Although the frame we have developed shows the differences, the mapped text must be read and understood every time. In other words, it is difficult to compare many types and values of living, and it takes time to visualize what kind of lifestyle is being imagined. While the frame has the potential to be a tool for planning and development departments, it is still considered a high hurdle for customers to understand. In the future, we plan to improve the accuracy of this frame, apply graphical processing, and consider ways to make the visualization easier to understand. We would like to contribute to the promotion of change through corporate digital transformation by validating the frame and developing a proposal tool for corporate development sites and customers.
References

1. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 2017 (2017)
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, vol. 1 (2019)
3. Kinoshita, Y.: Live Lectures M-GTA. Kobundou (2007)
4. Glaser, B.G.: The constant comparative method of qualitative analysis. Soc. Probl. 12(4), 436–445 (1965). https://doi.org/10.2307/798843
5. Corbin, J.M., Strauss, A.: Grounded theory research: procedures, canons, and evaluative criteria. Qual. Sociol. 13(1) (1990). https://doi.org/10.1007/BF00988593
6. Kinoshita, Y.: The grounded theory approach as a qualitative research method: its characteristics and analytical techniques. J. Commun. Psychol. 5(1), 49–69 (2001)
7. Minsky, M.: Frame of Mind. Industrial Book Co. (1990)
8. Fillmore, C.J.: Frame semantics and the nature of language. Ann. N. Y. Acad. Sci. (1976)
9. Murugaiyan, D.: Waterfall vs V-Model vs Agile: a comparative study on SDLC. Int. J. Inf. Technol. Bus. Manag. 2(1) (2012)
10. Mishra, A., Abdalhamid, S., Mishra, D., Ostrovska, S.: Organizational issues in embracing Agile methods: an empirical assessment. Int. J. Syst. Assur. Eng. Manage. 12(6), 1420–1433 (2021). https://doi.org/10.1007/s13198-021-01350-1
11. Plutchik, R.: The circumplex as a general model of the structure of emotions and personality (1997). https://psycnet.apa.org/record/1997-97129-001
Evaluating User Experience in Information Visualization Systems: UXIV an Evaluation Questionnaire

Eliane Zambon Victorelli(B) and Julio Cesar dos Reis

Institute of Computing and Nucleus of Informatics Applied to Education, University of Campinas, Campinas, Brazil
{eliane.victorelli,jreis}@ic.unicamp.br
Abstract. Understanding how people interact with data and why users prefer one system over another is necessary to develop successful solutions for information visualization. Evaluating the users’ experience with applications based on interaction with data is challenging due to the several parameters involved. It requires evaluating the experience offered by the system, affected data, and the analysis process supported. Existing measurement tools do not take all of these aspects into account. This study investigated the gaps in an existing user experience evaluation tool regarding the interaction with data. A thematic analysis identified the missing themes and grounded our definition of evaluation dimensions. A combination of literature review and designer collaboration resulted in the evaluation statements. Our contribution consists of a questionnaire to support evaluating specific aspects of experience and relevant dimensions for measuring the interaction with data in information visualization systems.

Keywords: User experience · Evaluation · Information visualization · Human-data interaction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 320–338, 2023. https://doi.org/10.1007/978-3-031-35132-7_24

1 Introduction

Visual representations have the potential to facilitate data interpretation. Effective visual interfaces are essential for people to go through data, understand them and find information [25]. The effectiveness of an Information Visualization (IV) system in helping people achieve desired results is mainly influenced by the User Experience (UX) [1]. The design of IV systems must consider human abilities and respect their limitations, allowing for seamless and enjoyable interaction with data [12]. The area of Human-Data Interaction (HDI) investigates how people interact with data as an analogy to how Human-Computer Interaction (HCI) explores the relationship between people and computers [17]. UX can be understood as users’ perceptions and responses to a system, product, or service. The concept of UX emphasizes users’ perception before, during, and after interaction with a technical product [26]. UX is related to the user’s
internal and physical state resulting from the context of use and prior experiences, attitudes, skills, abilities, and personality [19]. Evaluating UX is complex due to the range of the parameters involved. Measuring UX is not trivial because the experience is subjective, context- and product-dependent, and changes over time [13]. The UX in visualization is even more complex [20]. It is essential to achieve a richer understanding regarding all factors that influence the visual representation of data, their development, and their use [18]. IV systems are more exploratory and seek to support more unpredictable tasks than those performed with systems with typical interfaces. Existing methods and instruments are not enough for comprehensively assessing complex data analysis processes [31]. The evaluation of UX in IV systems must have dedicated approaches to assess the effectiveness of the complex processes typical of this type of system. This assessment demands practical and well-defined tools to collect qualitative and quantitative data on UX to allow systematic analysis and interpretation [9].

There are studies about generic UX evaluation instruments [14,26], UX evaluation instruments for specific contexts [22,32,38,39,41], and studies that addressed specific aspects of UX for IV [4,5,16,33]. However, we still lack evaluation instruments considering particular characteristics of this type of system, such as content and presentation of data, information context, and types of interaction with data.

This study aims to design a questionnaire with relevant statements for the UX evaluation of IV systems. The activities included a workshop involving students. Their goal was to decide on a common problem in their life stage. The participants explored a website with information visualizations to support the decision. They assessed the UX offered by the website using a generic UX evaluation questionnaire and answered questions about the instrument used.
They missed relevant aspects to evaluate the UX related to IV. In this investigation, we developed a new questionnaire called UXIV for UX evaluation in IV systems. We extended a configurable instrument for UX evaluation called meCUE [26] addressing the gaps found by participants from our study. We defined a novel module for assessing the experience in data analysis and proposed statements for UX evaluation. The module involves aspects related to principles of HDI [27], categories for user data interaction [42], and guidelines for data interaction design [40]. The remainder of this article is structured as follows. Section 2 presents background describing fundamental concepts and related work; Sect. 3 reports on our methodology for constructing the questionnaire. Section 4 shows the designed questionnaire; Sect. 5 discusses our obtained findings, lessons learned, and directions for future research; Sect. 6 presents our final considerations.
2 Background and Related Work
This section introduces instruments and approaches for evaluating UX. We present specific challenges of evaluating UX in IV and related studies to our investigation.
2.1 UX Evaluation Approaches
Evaluating what has been built and the users’ experience is at the heart of a design process [28]. Evaluation methods capture different aspects of UX and may help to improve the design. Each assessment method or instrument is more appropriate in a given context. Some instruments focus on measuring emotional reactions caused by a system or device using a pictorial assessment technique. For example, the Self-Assessment Manikin (SAM) is an instrument for assessing the emotional component of the experience, capturing information about the quality and intensity of pleasure, arousal, and dominance associated with a person’s affective reaction [6]. In a different approach, there are evaluation methods based on physiological measurements, such as galvanic skin response, cardiovascular measures, respiration rate, electrodermal activity, or electromyography [24].
Questionnaires have been extensively used in HCI research to get user experience feedback [30]. Feedback collected about a user’s experience can be used to understand their interaction with technology or to inform system requirements and improvements [30]. Questionnaires allow an efficient quantitative measurement of product features [21]. A distinction can be made between questionnaires with questions that measure separate variables (e.g., preferences concerning political party) and questionnaires with questions that are aggregated into either a scale or index (e.g., an index of Social Economic Status)¹. Questionnaire-based research can support descriptive or explanatory studies. In descriptive studies, questionnaires that measure separate variables are often used to determine the distribution of people’s responses to specific questions, such as satisfaction with the economy. In explanatory research, on the other hand, the questionnaires are composed of questions aggregated into either a scale or index. They can be used to explore why something occurs or increase the understanding of a given topic, ascertain how or why a particular phenomenon is happening, and predict future occurrences, for example, the reasons for satisfaction with the government [35].
In this work, we build the UXIV questionnaire with statements to evaluate separate variables describing IV systems’ UX. The questionnaire aims to be a practical way to measure UX quickly, simply, and immediately, preferably covering comprehensive aspects of the user’s experience when interacting with IV systems. Our study does not establish or aggregate data collected into a scale or index for UX in IV systems.
2.2 UX Evaluation for Visualizations
The UX evaluation of IV systems has particularities and can be even more challenging than evaluating other systems. The objectives of a visualization system and its supported tasks are more abstract or intangible than those of an information system with a typical user interface. The interfaces and controls for interactions are only one of the relevant issues for users. Data and the quality of representation must be considered in the evaluation. It is necessary to assess all these elements and the interactions provided by them in the context of the complex data analysis process. Furthermore, measuring the development of new data insights in the IV context is difficult. There is a great variety of cognitive reasoning tasks. It is not always possible to trace whether a successful discovery was made through IV. Many factors might have played a role in the discovery, and it can be challenging to test such tasks empirically [8].
One of the challenges of identifying and quantifying UX is to define metrics that can assess its components. Identifying all the aspects that compose good UX when using IV is ambitious. Saket et al. summarized some methodologies and metrics of UX used to evaluate visualization goals [33]. More specifically, the study concentrates on the aspects of memorability, engagement, and enjoyment as follows:
– Memorability is related to maintaining and retrieving information and involves a person’s ability to remember and recall information about the visualization. In the context of IV, memorization involves immediate, short-term, or long-term memory [33].
– The user engagement goal concerns the emotional, cognitive, and behavioral connection between a user and a resource at any point in time and possibly over time [2]. Engagement can be understood as a user’s investment in exploring a visualization [5]. A taxonomy correlates engagement with the cognitive effort the user needs to employ to accomplish a task. The taxonomy has engagement levels ranging from “knowing how to read data” to “making decisions based on evaluations of different hypotheses” [23].
– Enjoyment concerns the feeling that causes the subject to experience pleasure [10]. A model for measuring enjoyment in visualizations deals with six levels: i. challenge, ii. focus, iii. clarity, iv. feedback, v. control, and vi. immersion [34].
The results of Saket et al. [33] do not intend to exhaust all possible objectives related to UX in IV. Despite this, we use their taxonomies to define memorability, engagement, and enjoyment. In our UXIV questionnaire, the user can classify these items according to these taxonomies.
¹ https://en.wikipedia.org/wiki/Questionnaire.
2.3 Related Work
The literature presents questionnaires that seek to understand multiple dimensions of users’ perceptions and experiences for any type of product, such as the AttrakDiff™ [14]. Its authors understand UX as a consequence of a user’s internal state (needs, disposition, expectations, motivation to use the product, and previous experiences with the product), system characteristics (complexity, purpose, usability, aesthetics, functionality), and the context in which the interaction occurs (organizational or social environment, meaning of the activity, voluntary use) [15]. The AttrakDiff™ assesses the pragmatic quality, the hedonic quality, and the attractiveness of an interactive product [14].
Another questionnaire for conducting UX evaluation is the Modular Evaluation of Key Components of User Experience (meCUE) [26,36]. This questionnaire aims to measure the main components of UX in a comprehensive and unified way. The evaluator can configure the questionnaire to meet specific research goals using only the necessary modules [26]. The meCUE consists of four modules (including nine sub-dimensions and a single item), as shown in Fig. 1. The modules refer to instrumental and non-instrumental product perceptions (usefulness, usability, visual aesthetics, status, and commitment), user emotions (positive and negative), consequences of usage (product loyalty and intention to use), and an overall judgment of attractiveness evaluated by a single item [26].
Fig. 1. meCUE questionnaire and its four modules, nine sub-dimensions, and a single item (source: [26]).
Other studies evaluating the experience with visualization systems focused on measuring particular aspects of UX. The aspects investigated include memorability [4,33], engagement [5,33], aesthetics [16], and enjoyment [33]. There are even studies that focused on specific components of IV, such as animated elements [37].
In a distinct approach, research in different domains developed dedicated UX instruments because the existing ones were not specific enough for their context. Specific tools exist to evaluate the UX of services [38], innovation processes [22], audiovisual applications [32], mobile systems for news journalism [39], and games [41]. These instruments highlight the importance of understanding UX’s particularities that depend on the domain and application type. A specific evaluation instrument can be easier to administer and analyze when compared to other dedicated methods that can capture UX particularities of a context, such as interviews or observational techniques.
In summary, we found generic UX evaluation instruments for any system, UX evaluation instruments dedicated to specific contexts, and studies that addressed particular aspects of UX for IV. However, we have not found a unified tool measuring IV systems’ UX. Our study focuses on the development of a questionnaire dedicated to evaluating the UX of this type of system.
3 Developing UXIV: A Questionnaire for the UX Evaluation of Information Visualization Systems
This research proposes the UXIV questionnaire to evaluate the UX of IV systems. The questionnaire is a set of written statements devised to support UX evaluation. A statement is a declaration describing facts or perceptions about a particular aspect of the UX in visualizations. For each statement, the evaluator chooses a level of agreement. In this study, the term UX aspect refers to a central UX component at the conceptual level. One aspect can be materialized in a set of statements called a dimension. The questionnaire is organized into modules that measure one or more dimensions. Therefore, a module refers to various UX-related aspects represented by the respective dimensions.
Figure 2 presents the activities involved in creating the UXIV questionnaire. The activities of the methodology are explained in more detail in Subsects. 3.1, 3.2, 3.3, 3.4, and 3.5. The methodology consisted of the following steps:
1. First, a website with information visualization was evaluated using an existing instrument for UX assessment. The evaluators indicated the UX aspects they missed in the instrument used (cf. Sect. 3.1).
2. Next, we performed a thematic analysis of the aspects the evaluators missed in the instrument used. We organized these points into ten missing themes of UX evaluation (cf. Sect. 3.2).
3. We searched for an alternative UX evaluation instrument and analyzed it regarding the missing themes. Three of the ten missing themes were covered by the instrument (cf. Sect. 3.3).
4. We proposed a configured questionnaire for the UX evaluation of IV systems, creating a new module consisting of three dimensions to address the other missing themes (cf. Sect. 3.4).
5. Finally, we created the statements for evaluating each dimension of UX. We refined the proposed statements and consolidated the questionnaire (cf. Sect. 3.5).
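The terminology introduced in this section (statement, dimension, module) can be read as a small nested data model. The Python sketch below is illustrative only: the class and field names are assumptions and are not part of the UXIV instrument. The dimension names and the sample statements are taken from the HDI module listed in Table 3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """One written declaration that the evaluator rates by level of agreement."""
    id: str
    text: str


@dataclass
class Dimension:
    """A set of statements that materializes one UX aspect."""
    name: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Module:
    """A group of one or more dimensions."""
    name: str
    dimensions: List[Dimension] = field(default_factory=list)

    def statement_count(self) -> int:
        """Total number of statements across all dimensions of the module."""
        return sum(len(d.statements) for d in self.dimensions)


# Two sample statements per dimension, copied from Table 3
# (the complete HDI module has 28 statements).
hdi = Module("HDI", [
    Dimension("Core principles", [
        Statement("G2", "The visualization helped me make a decision."),
        Statement("G5", "I was able to reevaluate decisions."),
    ]),
    Dimension("Interaction scope", [
        Statement("I1", "I could examine different subsets of data."),
        Statement("I6", "I was able to show or hide data."),
    ]),
    Dimension("Interaction design", [
        Statement("P1.3B", "At any time, I could return to a previous state."),
        Statement("P5.2", "I didn't have to memorize information."),
    ]),
])

print(hdi.statement_count())
```

Keeping the statement identifier separate from its text makes it straightforward to attach answers or classifications to a statement later without duplicating the wording.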
3.1 Evaluating a Website with Information Visualizations
A group of students assessed a website with information visualization using an existing UX evaluation instrument, the AttrakDiff™ questionnaire [14]. This step aimed to analyze the feasibility of evaluating UX using a generic instrument in an IV context.
Fig. 2. Methodology for the construction of our UXIV questionnaire
Participants. In this activity, the participants who acted as evaluators were Computer Science students enrolled in Human-Computer Interaction classes offered by the Institute of Computing at the University of Campinas (second semester of 2019)². They had UX and information visualization classes before the evaluation activities. In total, 22 undergraduate students participated in the assessment.
Website with Information Visualization. The participants analyzed Flerlage’s website called “What’s the Happiest Country in the World?”³. The website offers information visualizations about countries in the world. It includes textual explanations and the author’s interpretations. The visualizations show countries’ happiness levels based on the Happiness Index.
Preparation. Initially, the evaluation procedure and the instrument, the AttrakDiff™ questionnaire, were explained to the participants in a 30-min presentation. All participants were instructed to perform these evaluations exclusively as users of the visualization prototypes.
Activities. The participants used the visualizations to choose a country to live in after completing their studies. They explored the website and made their decision. Then, they evaluated the UX offered by the information visualizations using the AttrakDiff™ questionnaire. The assessment questionnaires were created with the support offered by the AttrakDiff website and were made available to students online. After evaluating the UX of the website with visualizations, they assessed the AttrakDiff™ questionnaire itself. Participants took an average of 90 min to perform all the activities.
This step investigated the participants’ perceptions of the questionnaire as an instrument for evaluating the UX of information visualization. The participants answered open questions about the instrument. We postprocessed the free responses using a scale of −2 to +2. The grades meant: −2, totally disagree; −1, disagree; 0, neither disagree nor agree; +1, agree; and +2, totally agree. The first question was general and asked: “Does the instrument used to evaluate the website provide good results?” This question obtained a positive evaluation (average of +0.55). The second question was more specifically concerned with the instrument’s adequacy for the IV context: “Is the UX evaluation tool suitable for the context of information visualizations?” It received a slightly negative evaluation (average of −0.23). The third question was about the adequacy of the adjectives used for the IV context: “Were all pairs of adjectives suitable for evaluating the information visualization?” This question obtained a negative evaluation (average of −0.59). We understand that the participants considered this a generic evaluation instrument whose adjectives are not easily mapped to the characteristics of information visualizations.
Finally, the questions about missing aspects were: “Have you identified relevant aspects of the interaction with data visualizations that were not evaluated by the used instrument? Which ones are they?” Few participants answered “nothing” to these questions. A total of 82% of participants missed some aspects not addressed by the questionnaire.
² The ethical committee of the University of Campinas approved this study (#18927119.9.0000.5404).
³ Available in October 2019 at https://www.kenflerlage.com/2016/08/whats-happiest-country-in-world.html.
3.2 Thematic Analysis of UX Aspects Missed
We analyzed the participants’ responses about the aspects they missed in the instrument used. We grouped the answers by themes. In this activity, we followed the steps of thematic analysis proposed by Braun and Clarke [7], as follows:
Step 1: Familiarization - We got a thorough overview of all the data collected before analyzing individual items. As it was a small and simple set of responses, we read all answers.
Step 2: Coding - We highlighted various phrases describing the idea or feeling expressed and generated a code for them. If the comments of another evaluator concerned the same idea or feeling, we gave them the same code. In this step, thirteen codes were identified on subjects that the evaluators missed during the evaluation, as shown in the first column of Table 1.
Step 3: Generating themes - We scanned the codes created to identify patterns among them. We decided to group the “Responsiveness”, “Intuitiveness”, and “Performance” codes into the “Usability” theme.
Step 4: Reviewing themes - We made sure that all themes were valuable and accurate representations of the ideas expressed by the participants. We could split, combine, discard, or create new themes. The code “visualization-specific” was discarded because we considered it a generic code related to the overall subject of the investigation. This code could be mapped to many of the codes identified, especially “Relevant content/data”, “Graphic representation”, and “Interaction Categories”.
Step 5: Defining and naming themes - This step consists of formulating precisely what each theme means and giving it an appropriate name. In the former steps, we grouped the codes into themes according to the theory that the students had learned about design, UX, and IV. In this step, we did not change the themes’ names.
Step 6: Writing up - Finally, we summarized our thematic analysis of the data in Table 1.
Table 1. The thematic analysis codes, number of comments identified, and themes missed in the questionnaire used.
Codes | #Eval. | Theme
Qualitative questions/Open Questions | 2 | Qualitative evaluation
Relevant content/data | 2 | Information content
Graphics quality/Visualization format | 2 | Graphic representation
Design guidelines/HDI concepts | 3 | HDI guidelines
Interaction Categories | 2 | Interaction Categories
Information context | 2 | Information context
Decision making | 1 | Decision making support
Aesthetics | 1 | Aesthetics
Functionality/Utility | 3 | Functionality/Utility
Responsiveness | 4 | Usability
Intuitiveness | 1 | Usability
Performance | 3 | Usability
Visualization-specific items | 2 | -
3.3 Missing Themes in an Alternative UX Evaluation Questionnaire
We assessed another instrument, looking for the themes not covered by the questionnaire used. Specifically, we evaluated whether meCUE addressed these missing themes [26]. Some of the aspects missed in the questionnaire used were found in the alternative questionnaire: meCUE addressed three of the ten missing themes. The first column of Table 2 lists the themes missing from the questionnaire used. The second column gives the name of the meCUE module that addresses the theme, if one exists (as per the authors’ evaluation). Themes marked as “Absent” in Table 2 are not covered by either questionnaire.
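The grouping of codes into themes (Table 1) and the coverage check against meCUE (Table 2) amount to simple bookkeeping, which the following Python sketch makes explicit. The data are transcribed from Tables 1 and 2; the variable names and the use of None for the discarded code are illustrative assumptions.

```python
from collections import Counter

# (code, value of the "#Eval." column, theme) as listed in Table 1.
# None marks the code that was discarded in Step 4.
CODES = [
    ("Qualitative questions/Open Questions", 2, "Qualitative evaluation"),
    ("Relevant content/data", 2, "Information content"),
    ("Graphics quality/Visualization format", 2, "Graphic representation"),
    ("Design guidelines/HDI concepts", 3, "HDI guidelines"),
    ("Interaction Categories", 2, "Interaction Categories"),
    ("Information context", 2, "Information context"),
    ("Decision making", 1, "Decision making support"),
    ("Aesthetics", 1, "Aesthetics"),
    ("Functionality/Utility", 3, "Functionality/Utility"),
    ("Responsiveness", 4, "Usability"),
    ("Intuitiveness", 1, "Usability"),
    ("Performance", 3, "Usability"),
    ("Visualization-specific items", 2, None),
]

# meCUE module that addresses a theme according to Table 2.
# Themes that do not appear here are the ones marked "Absent".
MECUE_MODULE = {
    "Aesthetics": "Non-Instrumental - Aesthetics",
    "Functionality/Utility": "Instrumental - Usefulness",
    "Usability": "Instrumental - Usability",
}

# Sum the "#Eval." values of all codes that were grouped into the same theme.
count_per_theme = Counter()
for _code, count, theme in CODES:
    if theme is not None:
        count_per_theme[theme] += count

themes = list(count_per_theme)
covered = [t for t in themes if t in MECUE_MODULE]
absent = [t for t in themes if t not in MECUE_MODULE]

print(len(CODES), "codes,", len(themes), "themes")
print(len(covered), "themes covered by meCUE,", len(absent), "absent")
print("Usability total:", count_per_theme["Usability"])
```

Run on the transcribed data, the tally reproduces the figures stated in the text: thirteen codes, ten themes, and three themes addressed by meCUE.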
3.4 New Module for Interaction with Data
The meCUE is constructed to address all key components of UX across its individually validated modules [26]. Due to its modular configuration, the questionnaire can be easily adapted to specific research goals. We decided to adapt meCUE because it was configurable and would require fewer adaptations than the questionnaire used. We configured it by choosing the modules needed and creating a new one.
Table 2. Missing themes identified in the instrument used, mapped to existing modules of the alternative questionnaire and to the new dimensions proposed.
Missed themes | Module meCUE | New Dimension
Qualitative evaluation | Absent | One Open Question
Information content | Absent | Interaction Scope
Graphic representation | Absent | Interaction Scope
HDI guidelines | Absent | Interaction Design
Interaction categories | Absent | Interaction Scope
Information context | Absent | Interaction Design
Decision making support | Absent | Core Principles
Aesthetics | Non-Instrumental - Aesthetics |
Functionality/Utility | Instrumental - Usefulness |
Usability | Instrumental - Usability |
The meCUE questionnaire does not address aspects specific to product types. Therefore, we proposed a new module for the questionnaire focusing on the aspects of UX specific to information visualization systems. Our proposed module, called HDI (Human-Data Interaction), aims to incorporate the aspects of interaction that facilitate the exploration, understanding, and control of data. Figure 3 presents our UXIV questionnaire with the HDI module added to the original modules.
The HDI module encompasses aspects concerned with the interaction with data. Based on the previous analyses, we proposed new dimensions for UX evaluation in the HDI module. The new dimensions aim to evaluate the missing themes and to describe the UX in IV systems. The third column of Table 2 shows how the aspects missed were mapped to the three new dimensions in the configured questionnaire. We created three new dimensions based on elements that can help to understand users’ needs regarding data-driven applications. The new dimensions are concerned with: i. the core principles of HDI [27]; ii. the categories of interaction [42]; and iii. the design guidelines for HDI [40].
In our understanding, memorability, engagement, and enjoyment are essential to interacting with data. However, they are orthogonal to the dimensions added. Therefore, to ensure that they are represented in the questionnaire, the mapping of the statements to them is verified in the final creation step.
3.5 The UXIV Statements Definition
Fig. 3. All modules of the UXIV questionnaire. The original modules of meCUE and their dimensions are presented in green and blue. The new HDI module and its dimensions are presented in orange and yellow. (Color figure online)
Fig. 4. Activities of statements definition for the new dimensions of UXIV
In defining the statements for the new dimensions of UXIV, we followed the initial guidelines suggested in a practical guide to scale development [11]. We involved three experienced designers and mixed individual brainstorming with peer review practices. The steps of statements definition are presented in Fig. 4 and described below:
1. The DeVellis guideline “Generate an item pool” guided our first activities [11]. The initial statements were generated through individual brainstorming by one of the authors. We drew upon the literature related to the new dimensions as a source of information and motivation for the first version of the statement pool. The literature about the core principles of HDI [27], combined with the categories of interaction in IV systems [42] and the design guidelines for HDI [40], inspired the generation of a draft set of statements.
2. The draft statements were classified to help identify redundancies and gaps in the sets. The statements were classified regarding the type of user’s needs and user experience goals to compose a balanced questionnaire. The statements should uniformly cover the requirements involved in UX: functionality, usability, and experience [29]. At the last level, experience, the classification of statements was detailed according to the user experience goals in visualizations described by Saket et al. [33]. The UX goals considered were memorability (immediate, short-term, and long-term memory), engagement (expose, involve, analyze, synthesize, decide), and enjoyment (challenge, focus, clarity, feedback, control, immersion). New statements to address the gaps were proposed if no items were related to a specific classification. The statements that had the same type were analyzed. If they substantially differed from others, they remained. We followed a guideline that recommended keeping the redundancies that expressed the same idea in different ways, as they could be helpful, but eliminating the trivial redundancies [11]. A pool of 52 classified statements was defined.
3. The guideline “Have the initial item pool reviewed by experts” was suggested by DeVellis to overcome gaps and biases that may have arisen due to ideas generated by a single person [11]. Two experienced designers were involved in the peer review of the initial pool. They rated each item according to how relevant they thought it was to what we intended to measure. They suggested breaking down, consolidating, and expressing ideas in alternative ways. The reviewers also classified the items, indicating whether the statements were clear or confusing.
4. The first designer consolidated the set by adding, merging, or removing some statements, resulting in a group of 28 statements. Finally, following the guideline “Determine the format for measurement” [11], we defined a seven-point Likert scale indicating the level of agreement to evaluate each statement.
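The classification in step 2 works as a coverage check: every category of the taxonomy should be addressed by at least one statement, and an uncovered category signals a gap that calls for a new statement. A minimal Python sketch of such a check follows. Only the category names come from the text; the pool contents and the identifiers D1 to D4 are invented for illustration, since the 52 draft statements are not listed in the paper, and the function name is hypothetical.

```python
# Categories named in the text: requirement levels plus the UX goals of Saket et al.
TAXONOMY = {
    "requirement": {"functionality", "usability", "experience"},
    "memorability": {"immediate", "short-term", "long-term"},
    "engagement": {"expose", "involve", "analyze", "synthesize", "decide"},
    "enjoyment": {"challenge", "focus", "clarity", "feedback", "control", "immersion"},
}

# Hypothetical draft pool: statement id -> {facet: set of categories it addresses}.
draft_pool = {
    "D1": {"requirement": {"usability"}, "engagement": {"involve"}, "enjoyment": {"control"}},
    "D2": {"requirement": {"functionality"}, "engagement": {"decide"}, "enjoyment": {"focus"}},
    "D3": {"requirement": {"experience"}, "memorability": {"short-term"}, "enjoyment": {"feedback"}},
    "D4": {"requirement": {"usability"}, "engagement": {"analyze", "synthesize"},
           "enjoyment": {"clarity", "immersion"}},
}


def find_gaps(pool, taxonomy):
    """Return, per facet, the sorted categories that no statement in the pool covers."""
    gaps = {}
    for facet, categories in taxonomy.items():
        covered = set()
        for tags in pool.values():
            covered |= tags.get(facet, set())
        missing = categories - covered
        if missing:
            gaps[facet] = sorted(missing)
    return gaps


for facet, missing in find_gaps(draft_pool, TAXONOMY).items():
    print(facet, "->", ", ".join(missing))
```

In this invented pool, the check would flag the categories for which a new statement should be drafted, while the fully covered requirement levels produce no entry.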
4 UXIV Questionnaire
The HDI module, its dimensions, and the evaluation statements are the original contributions of this study (module V in Fig. 3)⁴,⁵. Table 3 shows the complete HDI module and its evaluation statements. The statements and classifications that supported the questionnaire preparation are presented in detail. The twenty-eight statements resulting from the refinement process compose the three dimensions that help describe the UX for IV systems.
Two general statements about decision-making and fluid interaction were joined with three statements about the core principles of HDI: legibility, agency, and negotiability [27]. Legibility is concerned with processes of understanding data and its processing. Agency is related to the power to act upon the data. Negotiability addresses the dynamic relationships that arise from data [27]. G1 to G5 in Table 3 are the statements of the dimension called core principles (cf. item (a) of Fig. 3). Seven statements were associated with the dimension interaction scope (cf. item (b) of Fig. 3). Statements I1 to I7 of Table 3 concern the categories of interaction [42]. Sixteen statements were related to interaction design (cf. item (c) of Fig. 3). Statements P1.1 to P6.3 of Table 3 were derived from design guidelines for HDI [40].
All statements were classified according to their relationship with utility, usability, memorability, engagement level, and enjoyment type. For the engagement level, we used the taxonomy proposed by Mahyar et al. [23]. For enjoyment, the types are based on the study by Saket et al. [34].
The answers are requested on a seven-point Likert scale indicating the level of agreement with each statement, ranging from “strongly disagree” to “completely agree” [3]. The HDI module scale maintains compatibility with that used in the original modules of the meCUE questionnaire. Figure 5 presents one example of an evaluation statement and the corresponding Likert scale for the level of agreement.
⁴ The HDI module of the UXIV questionnaire is available online at https://forms.gle/3ioXom95ovzng5xn7.
⁵ The other modules used by UXIV are original from meCUE and are available online at http://mecue.de/english/download-en.html.
Fig. 5. Example of UXIV questionnaire statement.
UXIV also evaluates the user’s perception of overall attractiveness in IV systems. We introduced a general question with the same scale used in the global question of the original questionnaire. At the end of the evaluation, the user is asked “What was your experience with the information visualization system as a whole?”. The answer uses a single pair of adjectives, “as bad”/“as good”, on a rating scale that ranges from −5 to +5 in increments of 1. This scale with a pair of adjectives allows evaluators to express their experience without worrying about the labels assigned to intermediate values.
Most statements of the HDI module were written in the first person to make clear to the evaluators that the investigation is about their perception and experience. The exceptions are statements that became clearer when the subject of the sentence was the system or the data, for example, statements I4 and P2A.
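Because the questionnaire measures separate variables rather than an aggregated index, a natural first analysis is the distribution of answers per statement. The Python sketch below assumes a numeric coding of the seven-point agreement scale as 1 to 7 and of the overall question as −5 to +5. The 1 to 7 coding, the function names, and the sample answers are assumptions made for illustration and are not data from the study.

```python
from collections import Counter

LIKERT_MIN, LIKERT_MAX = 1, 7      # assumed coding: 1 = strongly disagree, 7 = completely agree
OVERALL_MIN, OVERALL_MAX = -5, 5   # overall question: "as bad" (-5) to "as good" (+5)


def validate(value, low, high):
    """Return the answer unchanged, or raise if it is not an integer in [low, high]."""
    if not isinstance(value, int) or not low <= value <= high:
        raise ValueError(f"answer {value!r} is outside {low}..{high}")
    return value


def distribution(answers):
    """Count how many evaluators chose each point of the seven-point scale."""
    counts = Counter(validate(a, LIKERT_MIN, LIKERT_MAX) for a in answers)
    return {point: counts.get(point, 0) for point in range(LIKERT_MIN, LIKERT_MAX + 1)}


# Hypothetical answers of six evaluators to statements I4 and P2A.
answers = {"I4": [6, 7, 5, 6, 4, 6], "P2A": [3, 5, 4, 4, 2, 5]}
for statement_id, values in answers.items():
    print(statement_id, distribution(values))

# Hypothetical answers to the overall question, checked against its own range.
overall = [validate(v, OVERALL_MIN, OVERALL_MAX) for v in [3, 4, -1, 2, 5, 0]]
print("overall:", overall)
```

Reporting the full distribution, instead of a single aggregate per dimension, matches the descriptive purpose stated for the instrument.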
5 Discussion
The goal of this study was the construction of a questionnaire for the evaluation of UX in IV systems. This study did not aim to perform a comparative analysis between existing instruments, such as AttrakDiff [14], meCUE [26], and others [22,32,38,41]. Therefore, we cannot evaluate one against the others. The approach was to start from a well-established instrument and add items related to the interaction with data.
Table 3. HDI Module of Questionnaire UXIV
Id | Item | Statement
G1 | General | I interacted with the visual representation/visualization in a fluid way.
G2 | General | The visualization helped me make a decision.
G3 | Legibility | I easily understood how to read the visualization.
G4 | Agency | I was able to control, report and correct data and inferences.
G5 | Negotiability | I was able to reevaluate decisions.
I1 | Explore | I could examine different subsets of data.
I2 | Select | I was able to select and mark something interesting.
I3 | Reconfigure | I was able to see the data from different perspectives.
I4 | Encode | The data were represented in different ways.
I5 | Abstract/Elaborate | I was able to see more or less details about a piece of data.
I6 | Filter | I was able to show or hide data.
I7 | Connect | I was able to see the relationship between the presented items.
P1.1 | Self-evidence in coord. views | I realized the relationships between different views of data.
P1.2 | Consistent in coord. views | I noticed consistency in the way the data was presented in the different views.
P1.3A | Reversible state | I had a clear idea of the state of the visualization while interacting.
P1.3B | Reversible state | At any time, I could return to a previous state.
P2A | Smooth transitions | The system helped me maintain my reasoning for transitions between views.
P2B | Smooth transitions | I easily understood the relationship between the different views.
P3A | Immediate Feedback | The system provided me with feedback as soon as I performed an interaction.
P3B | Immediate Feedback | I immediately saw my progress towards my goals.
P4A | Direct Manipulation | I directly interacted or manipulated the visual representation data.
P4B | Integrated Components | The interface elements were well integrated with the visual representation.
P5.1A | Information context | I managed to get an overview and details at the same time.
P5.1B | Information context | I knew where I was and how to go where I needed to.
P5.1C | Information context | I knew how to navigate looking for information relevant to the decision I wanted to make.
P5.2 | Min. memoriz. | I didn’t have to memorize information.
P6.2 | Semant. enriched feedback | I was able to give my feedback to the system in a natural way for me.
P6.3 | Semant. enriched interact. | I realized that the system understood my reasoning better while interacting with it.
Classification columns of the original table (marks not reproduced here): Utility, Usability, Memorability, Engagement, Enjoyment.
We created a new module for the meCUE questionnaire for two main reasons. First, we identified some aspects addressed by this questionnaire that the previously used instrument did not cover. We considered that AttrakDiff™ focuses more on hedonic qualities and highlights several relevant aspects. However, we believe pragmatic qualities are as crucial as hedonic qualities for UX evaluation in IV. In our view, the meCUE questionnaire addresses instrumental and non-instrumental aspects in a way that seems more appropriate to the IV context. The second motivation was the ease of adapting the meCUE questionnaire. Its creators claim that, because its modules are individually validated, the questionnaire can be easily adapted to specific research goals by simply selecting the modules that are required [26]. We could configure the instrument for our scenario by removing the modules related to aspects that would be less important to the evaluation situation. In addition, it was easy to incorporate a new module related to interaction with data.
For the creation of the statements of this new module, we relied on the literature on IV [42] and HDI [27,40]. General concepts, guidelines, and taxonomies in the literature inspired the proposed statements. The innovation consists of collecting several statements organized in a questionnaire with a unified evaluation scale. Furthermore, we mapped the statements using different classifications such as usefulness, usability, memorability, engagement, and enjoyment [23,34]. The mapping allowed us to verify the coverage of the statements from a view orthogonal to the literature from which the statements were defined.
The background of the participants may have influenced the initial comments that supported the definition of themes for the new module. In the thematic analysis, we identified several comments using terms from IV, UX, and evaluation theory that students had previously learned. Although this background may have influenced the activities, it was essential in carrying out the task. If the participants had no design and evaluation experience, they might have had more difficulty performing the tasks correctly, preventing the identification of missing themes in the questionnaire.
The contribution of this study, in the form of a questionnaire, is a practical and quick way to assess IV. It can be applied with any number of evaluators, simultaneously or not, and in remote or in-person evaluations. We recommend that the questionnaire be used in conjunction with other evaluation methods, mainly qualitative approaches.
We consider that interacting with systems heavily based on data presents additional challenges compared with typical HCI systems. Many of these challenges are present in IV systems. For this reason, the HDI literature guided several methodology stages, whereas the practical activities focused on information visualization systems. Most statements deal with HDI and are not restricted to IV systems. Still, in some cases, they are directed to the representation and visualization of data since this is an essential aspect of HDI. However, we believe that other types of data-based applications may present challenges beyond those addressed in this study. The next steps of our research must analyze, for example, the automated data analysis flow present in Visual Analytics applications, which we consider a relevant context of interaction with data.
Further experimental studies with users should be conducted to evaluate the proposed questionnaire’s reliability. If the same result can be consistently achieved using the questionnaire under the same circumstances, the instrument will be considered reliable, that is, it consistently measures UX. Reliability can be estimated by comparing different versions of the same measure. It is also relevant to carry out additional studies that compare other metrics with the results obtained with specific parts of the questionnaire. The engagement with visualizations, for example, can be evaluated by measures such as “the total amount of time participants spent looking at visualizations” or “willingness to annotate a visualization” [33]. The results of these measures can be compared with the results of specific questionnaire statements related to engagement.
We constructed a questionnaire with questions that measure separate variables. It aims to support the collection of a range of information about users’ perceptions of the UX. The constructed questionnaire can be used in descriptive research on IV UX [35]. Future explanatory studies are needed to identify and validate the core components of UX and define an instrument to aggregate data collected into a scale or index for UX in IV systems.
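The comparison discussed in this section, between an external engagement measure and the answers to engagement-related statements, could for instance be carried out with a correlation coefficient. The Python sketch below uses Pearson's r. The choice of this coefficient, the function name, and all numbers are assumptions made for illustration and are not results of the study. The same function could also be applied to the scores obtained with two versions of the questionnaire.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for five participants.
agreement = [2, 4, 5, 6, 7]               # answer to an engagement-related statement (1..7)
seconds_viewing = [40, 75, 90, 130, 150]  # total time spent looking at the visualization

print(round(pearson_r(agreement, seconds_viewing), 3))
```

A value close to +1 would indicate that participants who agreed more strongly with the statement also spent more time on the visualization. Since Likert answers are ordinal, a rank-based coefficient may be a more cautious choice in practice.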
6 Conclusion
Assessing whether a user has a good experience with IV systems requires practical evaluation tools that consider the specificities of users’ interaction with data. This study aimed to develop a tool to evaluate the relevant aspects of UX in IV systems. We contributed an evaluation questionnaire that can be used in descriptive research on this subject. A module on HDI was added to a UX evaluation questionnaire to cover the interaction with data. Our results identified relevant themes concerning UX in IV systems and proposed statements to measure these aspects. Despite these advancements, various questions related to investigating UX in IV systems remain open. Further studies are needed to assess and validate the proposed questionnaire. We have started to pave the way for research that models the UX components of IV systems and establishes a UX index for this type of application.
Acknowledgements. We want to thank all students for participating in the research and Andressa Santos for her support in conducting the activities. This work was partially supported by the São Paulo Research Foundation (FAPESP) (grants #2013/08293-7, #2015/16528-0, and #2022/15816-5). The opinions expressed here are not necessarily shared by the financial support agency.
References
1. Adagha, O., Levy, R.M., Carpendale, S.: Towards a product design assessment of visual analytics in decision support applications: a systematic review. J. Intell. Manuf. 28(7), 1623–1633 (2017)
2. Attfield, S., Kazai, G., Lalmas, M.: Towards a science of user engagement (position paper). In: WSDM Workshop on User Modelling for Web Applications (2011). http://www.dcs.gla.ac.uk/~mounia/Papers/engagement.pdf
3. Bangor, A., Kortum, P., Miller, J.: Determining what individual SUS scores mean: adding an adjective rating scale. J. Usability Stud. 4(3), 114–123 (2009)
4. Borkin, M.A., et al.: Beyond memorability: visualization recognition and recall. IEEE Trans. Visual Comput. Graphics 22(1), 519–528 (2016). https://doi.org/10.1109/TVCG.2015.2467732
5. Boy, J., Detienne, F., Fekete, J.D.: Storytelling in information visualizations: does it engage users to explore data? In: Conference on Human Factors in Computing Systems - Proceedings, pp. 1449–1458 (2015). https://doi.org/10.1145/2702123.2702452
6. Bradley, M.M., Lang, P.J.: Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Therapy Exp. Psychiatry 25(1), 49–59 (1994). https://doi.org/10.1016/0005-7916(94)90063-9
7. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
8. Carpendale, S.: Evaluating information visualizations. In: Kerren, A., Stasko, J.T., Fekete, J.-D., North, C. (eds.) Information Visualization. LNCS, vol. 4950, pp. 19–45. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70956-5_2
9. Darin, T., Coelho, B., Borges, B.: Which instrument should I use? Supporting decision-making about the evaluation of user experience. In: Marcus, A., Wang, W. (eds.) HCII 2019. LNCS, vol. 11586, pp. 49–67. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23535-2_4
10. Davis, W.A.: A causal theory of enjoyment. Mind 91(362), 240–256 (1982)
11. DeVellis, R.: Scale Development: Theory and Applications. Sage Publications, Thousand Oaks (2017)
12. Elmqvist, N.: Embodied human-data interaction. In: ACM CHI 2011 Workshop Embodied Interaction: Theory and Practice in HCI, vol. 1, pp. 104–107 (2011)
13. Gross, A., Bongartz, S.: Why do I like it? Investigating the product-specificity of user experience. In: NordiCHI 2012: Making Sense Through Design - Proceedings of the 7th Nordic Conference on Human-Computer Interaction, pp. 322–330 (2012). https://doi.org/10.1145/2399016.2399067
14. Hassenzahl, M.: The thing and I: understanding the relationship between user and product. In: Blythe, M., Monk, A. (eds.) Funology 2. HIS, pp. 301–313. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-68213-6_19
15. Hassenzahl, M., Tractinsky, N.: User experience - a research agenda. Behav. Inf. Technol. 25(2), 91–97 (2006). https://doi.org/10.1080/01449290500330331
16. He, T., Isenberg, P., Dachselt, R., Isenberg, T.: Beauvis: a validated scale for measuring the aesthetic pleasure of visual representations. IEEE Trans. Visual Comput. Graphics 29(1), 363–373 (2022)
17. Hornung, H., Pereira, R., Baranauskas, M.C.C., Liu, K.: Challenges for human-data interaction – a semiotic perspective. In: Kurosu, M. (ed.) HCI 2015. LNCS, vol. 9169, pp. 37–48. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20901-2_4
18. Isenberg, P., Zuk, T., Collins, C., Carpendale, S.: Grounded evaluation of information visualizations. In: Proceedings of the 2008 Workshop on BEyond Time and Errors: Novel Evaluation Methods for Information Visualization, pp. 1–8 (2008)
19. ISO 9241-11:2018 ergonomics of human-system interaction–part 11: usability: definitions and concepts. International Organization for Standardization (2018). https://www.iso.org/obp/ui/
20. Lam, H., Bertini, E., Isenberg, P., Plaisant, C., Carpendale, S.: Empirical studies in information visualization: seven scenarios. IEEE Trans. Visual Comput. Graphics 18(9), 1520–1536 (2012)
21. Laugwitz, B., Held, T., Schrepp, M.: Construction and evaluation of a user experience questionnaire. In: Holzinger, A. (ed.) USAB 2008. LNCS, vol. 5298, pp. 63–76. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89350-9_6
22. Lecossier, A., Pallot, M., Crubleau, P., Richir, S.: Construction of an instrument to evaluate the user experience of a group of co-creators in the upstream innovation process. Int. J. Serv. Oper. Inf. 10(1), 17–42 (2019)
23. Mahyar, N., Kim, S.H., Kwon, B.C.: Towards a taxonomy for evaluating user engagement in information visualization. In: Personal Visualization: Exploring Data in Everyday Life: An IEEE VIS 2015 Workshop, pp. 1–4 (2015)
24. Mandryk, R.L., Inkpen, K.M., Calvert, T.W.: Using psychophysiological techniques to measure user experience with entertainment technologies. Behav. Inf. Technol. 25(2), 141–158 (2006). https://doi.org/10.1080/01449290500331156
25. Mazza, R.: Introduction to Information Visualization. Springer, London (2009). https://doi.org/10.1007/978-1-84800-219-7
26. Minge, M., Thüring, M., Wagner, I., Kuhr, C.V.: The meCUE questionnaire: a modular tool for measuring user experience. In: Soares, M., Falcão, C., Ahram, T. (eds.) Advances in Ergonomics Modeling, Usability & Special Populations, pp. 115–128. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-41685-4_11
27. Mortier, R., Haddadi, H., Henderson, T., McAuley, D., Crowcroft, J.: Human-data interaction: the human face of the data-driven society. SSRN Electron. J. 1, 1–14 (2014)
28. Norman, D.: The Design of Everyday Things: Revised and Expanded Edition. Basic Books, Philadelphia (2013)
29. Norman, D.A.: Emotional Design: Why We Love (or Hate) Everyday Things. Basic Civitas Books, New York (2004)
30. Olson, J.S., Kellogg, W.A.: Ways of Knowing in HCI, vol. 2. Springer, New York (2014). https://doi.org/10.1007/978-1-4939-0378-8
31. Plaisant, C.: The challenge of information visualization evaluation. In: Proceedings of the Working Conference on Advanced Visual Interfaces, AVI 2004, pp. 109–116. Association for Computing Machinery, New York (2004). https://doi.org/10.1145/989863.989880
32. Robinson, R.: All the Feels: A Twitch Overlay that Displays Streamers’ Biometrics to Spectators. University of California, Santa Cruz (2018)
33. Saket, B., Endert, A., Stasko, J.: Beyond usability and performance: a review of user experience-focused evaluations in visualization. In: Proceedings of the Sixth Workshop on Beyond Time and Errors on Novel Evaluation Methods for Visualization, pp. 133–142 (2016)
34. Saket, B., Scheidegger, C., Kobourov, S.: Towards understanding enjoyment and flow in information visualization. In: Eurographics Conference on Visualization (EuroVis) (2015)
35. Saris, W.E., Gallhofer, I.N.: Design, Evaluation, and Analysis of Questionnaires for Survey Research. Wiley, Hoboken (2014)
36. Thüring, M., Mahlke, S.: Usability, aesthetics and emotions in human-technology interaction. Int. J. Psychol. 42(4), 253–264 (2007). https://doi.org/10.1080/00207590701396674
37. van Willigen, T.: Measuring the user experience of data visualization. Master’s thesis, University of Twente (2019)
38. Väänänen-Vainio-Mattila, K., Segerståhl, K.: A tool for evaluating service user experience (servux): development of a modular questionnaire. In: Interact 2009 conference, User Experience Evaluation Methods in Product Development (UXEM 2009), Workshop in Interact 2009 Conference, Uppsala, Sweden, 2009. p. 4 (2009)
39. Väätäjä, H., Koponen, T., Roto, V.: Developing practical tools for user experience evaluation: a case from mobile news journalism. In: European Conference on Cognitive Ergonomics: Designing Beyond the Product-Understanding Activity and User Experience in Ubiquitous Environments, pp. 1–8 (2009)
40. Victorelli, E.Z., Reis, J.C.D.: Human-data interaction design guidelines for visualization systems. In: Proceedings of the 19th Brazilian Symposium on Human Factors in Computing Systems. IHC 2020. Association for Computing Machinery (2020). https://doi.org/10.1145/3424953.3426511
41. Vissers, J., De Bot, L., Zaman, B.: Memoline: evaluating long-term UX with children. In: Proceedings of the 12th International Conference on Interaction Design and Children, pp. 285–288 (2013)
42. Yi, J.S., Kang, Y., Stasko, J.T., Jacko, J.A.: Toward a deeper understanding of the role of interaction in information visualization. IEEE Trans. Visual Comput. Graphics 13(6), 1224–1231 (2007). https://doi.org/10.1109/TVCG.2007.70515
Multimodal Interaction
Study of HMI in Automotive ~ Car Design Proposal with Usage by the Elderly ~
Takeo Ainoya and Takumi Ogawa
Tokyo University of Technology, Tokyo, Japan
[email protected]
Abstract. Automobiles that support the mobility of the elderly in the community pose several problems specific to elderly users. Among these, this study focuses on the difficulty of getting in and out of automobiles caused by age-related physical changes, which the current door geometry cannot always accommodate. At present, automobiles present obstacles at the beginning and end of a trip, so that movement itself comes to be regarded as an obstacle. We therefore believe that designing doors that allow people to get in and out without physical burden, as an interface (HMI) between the elderly and automobiles, is important not only for solving the problem of getting in and out, but also for automobile design in the era of autonomous driving. In addition, by targeting rural areas, the proposal is expected to solve mobility problems there and improve the quality of life (QOL) of the elderly. As a result of this study, we proposed a UX based on boarding/exiting and door shape, and established a new iconic design based on door shape. Such an approach is expected to be effective in future automobile design.
Keywords: HMI · Mobility design · Elderly design · HCD · UX · QOL
1 Introduction
1.1 Purpose of the Project
Mobility Design for the Elderly: HMI Considerations for Mobility to be a Life Partner. In mobility design, especially automobile design, the symbolic nature of automobiles and their value as durable consumer goods have become important, and the designer’s professional skills have been applied to their treatment as forms and surfaces [1]. The use of mobility is expected to change significantly with automated driving. This is expected to be of great significance especially for the elderly in Japan, where the “100-year life period” has become the norm. There are many areas where automobiles are indispensable (e.g., where people cannot go shopping without a car), and, with accidents involving elderly people increasing, encouraging people over 70 to return their driver’s licenses has become a social problem.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 341–351, 2023. https://doi.org/10.1007/978-3-031-35132-7_25
1.2 Social Background
With accidents involving elderly people increasing, encouraging people over the age of 70 to return their driver’s licenses has become a social problem. The automobiles that support the mobility of the elderly in rural areas pose multiple problems for elderly users. Among them, this study focuses on the problem of getting in and out of vehicles caused by changes in physical characteristics. Due to such age-related changes, there are situations in which the current door shape cannot cope with the user’s needs. This makes it difficult for elderly people to move around, and they may come to consider moving around impossible [2]. Providing a door that allows easy entry and exit without physical burden is a problem that needs to be solved immediately. The design of the door shape as an interface (HMI) between the elderly and the car not only solves the problem of getting in and out of the car, but is also important for automobile design in the era of autonomous driving.
1.3 Mobility Design for the Elderly
This proposal is expected to improve the quality of life (QOL) of the elderly in rural areas by solving mobility problems and expanding the range of their daily lives. In designing automobiles for the era of automated driving, it is also important to consider who will use the vehicle and how it will be used. In the age of 100-year lives, if we have to return our driver’s license by the age of 70, how should we live for the next 30 years? In response to such questions, we considered the possibility of mobility as a partner that supports our daily lives, just as computers and smartphones do today. In other words, to verify how mobility can be used, we examined and designed the HMI with mobility (Fig. 1).
Fig. 1. Concept sketch of "Make what you can’t do possible"
2 Purpose
2.1 Problem Hypothesis
Problems in Usage by the Elderly: ease of getting in and out of the seat, especially the problem of leg space. To make getting in and out of the car easier, some existing products automatically assist boarding and alighting or provide swiveling seats (Fig. 2). However, interviews with the elderly and the people who support them revealed that the shape of the doors restricts the leg space, causing problems such as the legs getting caught (Fig. 3).
Fig. 2. (right) Devices that support getting in and out of existing automobiles
Fig. 3. (left) Getting in and out of the current rear seat
In automotive design, the door shape is considered to be largely determined by the door structure itself and its relationship with the platform. Therefore, in the normal styling design process, the door shape is not treated as something to be designed; instead, it is often handed down as a given requirement from upstream in the design process. However, when focusing on the ease of entry and exit for the elderly, such design requirements do not adequately answer the needs of the elderly.
2.2 Basic Research
Survey on the Elderly and Mobility. We investigated problems related to the mobility of the elderly in rural areas and the reasons why they cannot easily get in and out of their vehicles. The results showed that mobility in rural areas depends on automobiles, which are a necessity in daily life. The difficulty of getting in and out of the car was found to stem from physical changes caused by aging. On the vehicle side, the survey found that there are no vehicles that take into account the physical characteristics of the elderly and other such users [3, 4].
2.3 Suburban Study
As a model survey of an area for the hypothetical introduction of the mobility proposed in this study, a survey was conducted in the suburbs of an urban area. The target areas were Yokosuka City and Kamakura City (Fig. 4). The results showed that both have many narrow streets and that many residents own compact cars. We also felt that local communities are important there and that it would be good to have a place for communication.
Fig. 4. Photos of Kamakura and Yokosuka, where many small cars are owned and there are many slopes
2.4 Accessibility Analysis for Design Method (Extraction of Design Requirements)
The design method of this study derives the minimum requirements for a door shape with regard to ease of boarding and alighting. It proceeds from the extraction of requirements for boarding and alighting, through the analysis of door characteristics, to verification using 3D CAD, VR, and other digital means. Starting from the door shape, we then developed the design of the entire automobile from this partial design.
3 Method
3.1 Understanding the Current Boarding and Alighting Problems
First, we investigated the current problems of boarding and alighting, analyzed the characteristics of current door shapes, and reviewed precedents. As a result, we clarified the characteristics of the current door shape, the problems faced by the elderly, and how the door opening affects the ease of getting in and out. In addition, based on a survey of the author’s grandmother, we were able to establish design requirements for getting in and out of the car, such as a height and width that place less burden on the user.
3.2 Analysis of Differences in Ease of Boarding and Alighting Depending on Door Geometry
We created 3D models of existing doors and of geometric door shapes, and analyzed the ease of getting in and out through them using VR. As a result, detailed problems of the existing door shape in terms of ease of boarding and alighting were identified (Fig. 5). Next, we analyzed the ease of getting in and out through geometric shapes, as door shapes that do not inherit the problems of the existing door shape. However, we were not able to find the door shape with the minimum requirements, which is the theme of this study (Fig. 6).
Fig. 5. Analysis of current car problems and door shape characteristics using CAD
Fig. 6. Requirements determined: chassis height 200 mm, straddle width 150 mm, and overall height 2000 mm (the maximum height standard for mini cars), and sample CAD/CG of exterior design
3.3 Analysis of Boarding and Alighting Trajectories from Photographs and Videos
Based on the results of the preceding analysis, we analyzed the trajectory of a person’s body when getting in and out through the door, rather than deriving the door shape from geometry alone. The shape of the opening was calculated from these trajectories. The calculated door shape was verified using VR as well as analysis based on composite photographs. As a result, smooth boarding and exiting of the vehicle became possible, and a door shape meeting the minimum requirements for ease of boarding and exiting was realized (Fig. 7).
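The paper does not describe how the opening was calculated from the trajectories. Purely as an illustration, the following Python sketch shows one simple way such a calculation could work: it takes body points traced frame by frame, projected onto the plane of the door opening, and returns the rectangular envelope that encloses them plus a margin. The function name, the sample coordinates, and the 50 mm clearance are assumptions made for this sketch; only the 2000 mm overall height is taken from the requirements in Fig. 6.

```python
OVERALL_HEIGHT_MM = 2000.0  # overall vehicle height given in Fig. 6


def opening_envelope(frames, clearance_mm=50.0):
    """Return (x_min, x_max, z_min, z_max) in mm enclosing all body points.

    frames: list of frames, each a list of (x_mm, z_mm) body points
            projected onto the door plane (x: along the car, z: height).
    clearance_mm: hypothetical safety margin added on every side.
    """
    points = [p for frame in frames for p in frame]
    if not points:
        raise ValueError("no trajectory points given")
    xs = [x for x, _ in points]
    zs = [z for _, z in points]
    return (min(xs) - clearance_mm, max(xs) + clearance_mm,
            min(zs) - clearance_mm, max(zs) + clearance_mm)


# Hypothetical foot, hip, and head positions in three consecutive frames.
frames = [
    [(100, 250), (180, 900), (220, 1500)],
    [(300, 230), (420, 950), (480, 1450)],
    [(520, 210), (600, 980), (650, 1400)],
]

x_min, x_max, z_min, z_max = opening_envelope(frames)
print(f"required opening width: {x_max - x_min:.0f} mm")
print(f"upper edge of opening:  {z_max:.0f} mm")
print("fits below overall height:", z_max <= OVERALL_HEIGHT_MM)
```

A rectangular envelope is the crudest possible approximation. A convex hull or a swept silhouette of the body would follow the trajectory more closely and yield a non-rectangular opening.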
Fig. 7. Body trajectory from a simplified model created based on the requirements, photographed frame by frame at the point of getting on and off the mobility vehicle.
3.4 Packaging and Styling Development Based on Door Shape
Packaging was developed based on the door shape, taking into consideration the target users and their requirements for getting in and out of the car. Based on the packaging, the overall styling was designed to be simple and iconic. The focus in the design of this packaging was the height of the seats. The seat height not only makes it easier to sit down, but also brings the eye point of the occupants closer to that of pedestrians, which makes it easier to talk to them and encourages spontaneous conversations. In addition, to achieve the floor height of 200 mm required for getting in and out of the car, the battery was placed at the front of the car instead of under the floor, which made it possible to lower the floor height to 200 mm (Fig. 8).
Fig. 8. Considering the set targets and requirements for boarding and alighting, packaging prototypes were created on 3D CAD, and the design was developed through verification.
4 Results
4.1 Mobility Design for Grandmothers
The package concept is “flexible space for 1+1 passengers”: a compact EV based on a one-seater layout that provides a spacious interior for one person and comfortable transportation for two. In addition, the roof can be moved up and down to create headroom when height is needed, such as when standing up to talk. This makes it a compact yet adaptable mobility for a variety of situations, a package that is considerate of target users who feel stressed by what they cannot do. The door opening pattern was made to be a coach door so that body movements after getting in and out of the car would be smooth (Figs. 9 and 10).
Fig. 9. Sketch of styling development
Fig. 10. Illustration sketches of packaging details
4.2 Design Verification in VR
The developed packaging and styling were modeled in 3D, and the design and usability were verified using VR. Based on the verification results, we revised the design to expand the interior space and to simplify the styling so that the door shape stands out more (Fig. 11).
Fig. 11. Design verification scene in VR
4.3 Verification of Color Development to Emphasize the Shape Features The design was developed to make the shape more distinctive not only in terms of styling but also in terms of CMF (Color, Material, Finishing). As a result, the door shape not only stands out but also has a natural coloring (Fig. 12).
Fig. 12. Examination of coloring by CG
4.4 Verification and Results The vehicle design for the elderly was verified and confirmed by having the elderly actually experience the ease of getting in and out of the vehicle and other HMI features. As a result, not only the ease of getting in and out of the car, but also the scale and interior space of the car received good evaluations (Fig. 13).
Fig. 13. VR verification scene of the final model
4.5 Conclusion
As a result of this research, we were able to propose a UX based on ease of entry and exit and door shape, and to establish a new iconic design based on door shape. We believe that the door as HMI will create a new mobility value for the elderly. We also believe that this approach may be effective in future automobile design. The proposed vehicle, HALO, has been designed to accommodate a variety of elderly people and their lifestyles. For example, since the vehicle is designed to be fully automated, visibility through the windows and around the pillars is not required for driving. The windows are therefore smoked, with a transparency that can be changed depending on the mood of the day, and interfaces (IFs) are embedded in the A-pillars so that information needed from outside the vehicle, such as whether the door is locked or unlocked, can be seen at a glance. To provide a comfortable and relaxing lifestyle, the rear window latticework creates the sensation of being in the shade of a tree, and a wooden deck in the rear allows for sunbathing and conversation (Fig. 14). A 1/4-scale model was fabricated for final confirmation of the design as a three-dimensional object. The 3D-printed output was surface treated and painted, and the composition of surfaces and the detailed design were repeatedly examined (Fig. 15).
Fig. 14. Introductory visuals for HALO that can provide value other than ease of boarding and alighting
Fig. 15. 1/4 model and production scene
References
1. Macey, S., Wardle, G.: H-Point: The Fundamentals of Car Design & Packaging, 2nd edn. (2014)
2. Has the surrender of driver’s licenses by the elderly increased? (in Japanese). https://www.nli-research.co.jp/report/detail/id=64131?pno=2&site=nli
3. Watanabe, M., et al.: Predictors of house boundedness among elderly persons living autonomously in a rural community. Nihon Ronen Igakkai Zasshi (2007)
4. Yanagihara, T.: The relationship of the choice of the transportation means, the frequency of going out and the functional capacity in elderly people. Infrastruct. Plan. Manage. 32(Special Issue) (2015)
Pilot Study on Interaction with Wide Area Motion Imagery Comparing Gaze Input and Mouse Input
Jutta Hild, Wolfgang Krüger, Gerrit Holzbach, Michael Voit, and Elisabeth Peinsipp-Byma
Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, 76131 Karlsruhe, Germany
[email protected]
Abstract. Recent sensor development allows capturing Wide Area Motion Imagery (WAMI) covering several square kilometers including a vast number of tiny moving vehicles and persons. In this situation, human interactive image exploitation is exhausting and requires support by automated image exploitation like multi-object tracking (MOT). MOT provides object detections, which support finding small moving objects; moreover, MOT provides object tracks, which help when an object has to be identified by its movement behavior. As WAMI and MOT are current research topics, we aim to gain first insights into interaction with both. We introduce an experimental system comprising typical system functions for image exploitation and for interaction with object detections and object tracks. The system provides two input concepts. One utilizes a computer mouse and a keyboard for system input. The other utilizes a remote eye-tracker and a keyboard, since, in prior work, gaze-based selection of moving objects in Full Motion Video (FMV) proved to be an efficient and manually less stressful input alternative to mouse input. We introduce five task types that might occur in practical visual WAMI exploitation. In a pilot study (N = 12; all non-expert image analysts), we compare gaze input and mouse input for those five task types. The results show that both input concepts allow similar user performance concerning error rates, completion time, and perceived workload (NASA-TLX). Most features of user satisfaction (ISO 9241-411 questionnaire) were rated similarly as well, except that general comfort was rated better for gaze input and eye fatigue was rated better for mouse input.
Keywords: Aerial image analysis · Wide Area Motion Imagery · multi-object tracking · automated image analysis · user interface · multimodal gaze input · gaze pointing · pilot study
1 Introduction Many fields of application utilize imaging methods and require some sort of image interpretation. Single image interpretation occurs, for example, in industrial production to support quality assurance, in medicine to support diagnosis, or in various earth science domains that utilize remote sensing. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 352–369, 2023. https://doi.org/10.1007/978-3-031-35132-7_26
Typically, image interpretation is a human visual task. The human expert possesses visual capabilities as well as domain knowledge, both of which are crucial to extract the relevant information from the image. Nevertheless, image interpretation can be a demanding task, particularly if images are blurred or if objects are small. Moreover, recent sensor development increasingly provides the capability to capture Full Motion Video (FMV) and Wide Area Motion Imagery (WAMI) [1]. As a result, much more image material has to be exploited, and motion imagery containing moving objects makes image exploitation even more challenging. However, missing relevant information during image interpretation should be prevented as it could have severe consequences. Therefore, it is useful to support the human expert analyst. Promising approaches utilize automated image exploitation algorithms (cf. Sect. 1.1) or a customized user interface which is appropriate for a visual task (cf. Sect. 1.2).
1.1 Automated Image Exploitation
In order to prevent missing relevant information, many fields of application support the human expert with automated image exploitation algorithms like object detection or change detection. The human expert retains the central role as decision-maker, while the automated algorithms take on the role of an assistant. Present-day automated algorithms work very reliably and, therefore, provide valuable support tools [2]. Image exploitation systems like the ABUL system developed by the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB) offer a variety of useful image processing and image exploitation functions, for example, image optimization, image stabilization, change detection, independent motion detection, object detection or object tracking [3]. All of them relieve human perception and cognition as they guide the user’s attention and assist the user in finding objects or in keeping them in sight.
At the same time, providing automated image exploitation results generates additional visual input as well as additional interaction overhead. Hence, providing a customized user interface would be appropriate.
1.2 Gaze Input for Interaction with Motion Imagery
Another means to prevent missing relevant information is to provide an appropriate user interface. Image exploitation systems like ABUL are typically desktop systems which utilize the traditional mouse and keyboard for system input. A frequently occurring interaction in image exploitation is the selection operation. For example, the human expert analyst often has to annotate relevant objects in the image by putting a frame around them. This way, relevant information can be passed on to other authorities. While mouse input is established for interaction with stationary user interface elements, it is error-prone and cumbersome for interaction with moving objects in motion imagery. Particularly if objects move fast or unpredictably, relocating the mouse cursor on the object might be challenging and manually stressful. Hild et al. showed that gaze input is an effective and easy-to-use alternative for moving object selection [4–8]. As the gaze-based selection method, they used a multimodal
combination of the current gaze position, provided by a remote eye-tracker for pointing, and a key press for actuating the selection. Evaluation results showed that gaze-based selection was fast, intuitive and easy-to-use as well as manually less stressful compared to mouse input. The reason is that gaze position is a natural proxy of user attention. As the user typically fixates the interaction location before performing the selection operation, gaze appears to be a natural pointing »device«. As there is no need to reposition a (mouse) cursor, perceptive, cognitive, and manual load are reduced. The main drawback of gaze-based selection is that remote eye-trackers provide a noisy signal. Due to technical and physiological issues, robust selection of stationary objects is only possible for objects with a size of at least 2° of visual angle. However, as mouse input by no means maintains its pixel-level precision for moving object selection, gaze input is able to compete in this interaction situation. Gaze input in the form of gaze pointing+key press achieved similar effectiveness as mouse input for moving object selection [4, 5] as well as for initializing an automated object tracking algorithm [7] in FMV. In a user study with expert image analysts, selection time was up to 40% shorter with gaze input [4].
1.3 Interaction with Wide Area Motion Imagery and Multi-Object Tracking
In this contribution, we investigate the interaction with Wide Area Motion Imagery (WAMI) together with automated multi-object tracking. In comparison to Full Motion Video (FMV), WAMI data covers much larger scenes with a larger number of objects and provides image data with a frame rate of about 1 Hz. Considering application tasks like traffic control or vehicle tracing, WAMI data allows tracking objects over a longer time. However, the resulting large amount of data makes interactive exploitation much more challenging.
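Returning briefly to the gaze pointing+key press selection described in Sect. 1.2, the 2° size limit can be related to on-screen pixels with a short calculation. The following Python sketch is illustrative only and is not the implementation used in the cited studies: the viewing distance, the pixel pitch, the function names, and the nearest-object rule applied on key press are all assumptions.

```python
from math import hypot, radians, tan


def visual_angle_to_pixels(angle_deg, viewing_distance_mm, pixel_pitch_mm):
    """On-screen size in pixels of a target that spans angle_deg at the eye."""
    size_mm = 2.0 * viewing_distance_mm * tan(radians(angle_deg) / 2.0)
    return size_mm / pixel_pitch_mm


def select_object(gaze_xy, objects, max_dist_px):
    """Return the id of the object nearest to the gaze point, or None.

    objects: dict mapping object id -> (x, y) centre in pixels for the
             current frame; candidates farther than max_dist_px are ignored.
    """
    best_id, best_dist = None, max_dist_px
    for obj_id, (x, y) in objects.items():
        dist = hypot(x - gaze_xy[0], y - gaze_xy[1])
        if dist <= best_dist:
            best_id, best_dist = obj_id, dist
    return best_id


# Assumed setup: 600 mm viewing distance, 0.25 mm pixel pitch.
target_px = visual_angle_to_pixels(2.0, 600.0, 0.25)
print(f"2 degrees of visual angle span about {target_px:.0f} px")

# On key press, pick the object nearest to the current gaze sample,
# accepting candidates within half the 2-degree target size.
objects = {"car_1": (110.0, 100.0), "car_2": (300.0, 300.0)}
print(select_object((100.0, 100.0), objects, max_dist_px=target_px / 2))
```

Under these assumed values, a 2° target spans on the order of 80 pixels, which illustrates why small objects in motion imagery are difficult to select robustly with gaze alone.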
Therefore, accurate and timely exploitation requires support by automated multi-object tracking. Such algorithms try to detect all moving objects in the motion imagery and to track them from frame to frame. Based on this, they are able to extract object tracks [9–12]. Such tracks are helpful, e.g., for identification of an object as target due to its specific trajectory; object detections are helpful for detection of small moving objects. Today, multi-object tracking is still a research topic. Hence, there are no typical interaction use cases or interaction practices. In order to investigate the interaction with multi-object tracking, we realized an experimental system (Python, Windows 10). This system provides interaction functions typical for image exploitation (panning and zooming the image data, selecting target objects); in addition, it provides interaction functions allowing utilizing the information provided by multi-object tracking, i.e., object tracks and object detections. The system reads precomputed detections and tracks (cf. Fig. 1) and is able to play WAMI data at a frame rate of 1 Hz. For system input, we realized two input concepts. One uses mouse and keyboard, in the following referred to as M+K; it represents the traditional way of interaction with a desktop image exploitation system. The other uses gaze pointing provided by an eyetracker and keyboard, in the following referred to as G+K; it was realized as in prior
Pilot Study on Interaction with Wide Area Motion Imagery
Fig. 1. WAMI data (image crop from the WPAFB 2009 data set [13]) with visualization of moving vehicles from WPAFB ground truth annotations. Red dots highlight all vehicle detections, colored lines show object tracks for five selected vehicles.
work, since gaze pointing+key press appeared as an efficient alternative for interaction with moving objects in FMV (cf. Sect. 1.2). Figure 2 shows the realization of the interaction functions for G+K. In total, ten functions are realized. All functions use gaze for pointing, except the on/off key that toggles the visualization of object detections. Key allocation considers several ergonomic aspects:
1. The utilized keys are prominent due to their sizes and/or locations within the keyboard layout; the objective of this choice is to ensure that users are able to press them without looking down at the keyboard, as this would compromise utilizing gaze for pointing.
2. Key allocation uses appropriate key semantics to support easy creation of a mental model of the user interface.
3. Related functions use nearby keys.
4. The functions enter track and delete track utilize the same keys as enter frame and delete frame. The track functions are distinguished by pressing a modifier at the same time. The objective of this realization was to keep the number of required keys as small as possible. Moreover, the key semantics of ENTER and DEL hold true both for the frame and for the track interaction functions.
5. Keys are apportioned between the left and right hand to balance manual stress.
6. As WAMI data contains large images, there are additional "Quick-zoom" options for both zooming in and out. Pressing the MINUS key for 1.5 s zooms out to the original image size in one step. Pressing the PLUS key for 1.5 s zooms in to a zoom level that
presents the objects with a size of 1.5° of visual angle in one step, making objects selectable with minimum effort; this size is configurable. Mouse input is realized as far as possible in the traditional manner, as users are typically familiar with it. Hence, we utilize the mouse wheel for zooming in and out; Quick-zoom in uses a press on the mouse wheel; Quick-zoom out uses a press of the key in front of the mouse wheel (often labeled MINUS). Panning the image is realized by moving the mouse while the left mouse button is pressed. A left mouse click enters an object frame, a right mouse click deletes an object frame. SHIFT and CTRL (labeled STRG on German keyboards) are utilized in the same way as for gaze input: SHIFT serves as on/off key for the visualization of all object detections, and a left/right mouse click while pressing CTRL enters/deletes an object track. In order to gain first insights into the interaction with WAMI data and multi-object tracking results, we conducted a pilot study, which we describe in the next section.
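The key semantics described above (a shared key pair for frame and track functions, distinguished by a modifier, and a 1.5 s hold for Quick-zoom) can be illustrated with a small sketch. It is not the implementation of the experimental system: the function names and action labels are hypothetical, and it assumes that a short press of PLUS or MINUS triggers an ordinary zoom step.

```python
QUICK_ZOOM_HOLD_S = 1.5  # hold duration for Quick-zoom, as stated in the text


def zoom_action(key: str, hold_s: float) -> str:
    """Map a PLUS/MINUS key press and its hold duration to a zoom action.

    Assumption: a press shorter than the hold threshold is a normal zoom step.
    """
    if key not in ("PLUS", "MINUS"):
        raise ValueError(f"not a zoom key: {key}")
    direction = "in" if key == "PLUS" else "out"
    if hold_s >= QUICK_ZOOM_HOLD_S:
        return f"quick_zoom_{direction}"
    return f"zoom_{direction}"


def selection_action(key: str, modifier_held: bool) -> str:
    """Map ENTER/DEL plus the modifier state to a frame or track function."""
    if key not in ("ENTER", "DEL"):
        raise ValueError(f"not a selection key: {key}")
    verb = "enter" if key == "ENTER" else "delete"
    target = "track" if modifier_held else "frame"
    return f"{verb}_{target}"


if __name__ == "__main__":
    print(zoom_action("PLUS", 0.2))          # short press
    print(zoom_action("MINUS", 1.6))         # long press
    print(selection_action("ENTER", True))   # modifier held
```

Keeping the verb on the key and the object on the modifier is what lets four functions share two keys, in line with ergonomic aspect 4.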
Fig. 2. Schematic illustration of the experimental system with key assignments for gaze input.
2 Method
As already mentioned in the introduction, multi-object tracking is a research topic, and to the best of our knowledge there is no research available that systematically reports typical interaction practices. Hence, we designed several experimental tasks that fit the basic definition of image analysis as proposed by Philipson [14]: "Image analysis: the process by which humans and/or machines examine photographic images and/or digital data for the purpose of identifying objects and judging their significance." Accordingly, we designed five specific test tasks (cf. Sect. 2.1 for a detailed description). All include target object search and selection; four include interaction with object tracks; three include observation of moving objects. Two utilize single WAMI data frames; three utilize WAMI data sequences.
The image data material was derived from the WPAFB 2009 data set [13]. It contains about 1,000 frames with a resolution of 30,000 × 23,000 pixels and a Ground Sampling Distance of 25 cm/pixel. When played with a frame rate of 1 Hz, the total duration is about 17 min. For this study, we used precomputed object detections and tracks extracted from WPAFB ground truth annotations (cf. Fig. 1).
The experimental setup used a 24-inch monitor with a resolution of 1920 × 1200 pixels. A car with a length of 4 m corresponds to 16 pixels in the WAMI data; when a full frame is displayed on the screen, it appears with a length of about 0.27 mm, which is approximately the size of one screen pixel. Subjects sat at a distance of 65 cm from the monitor. Gaze data was recorded using a low-cost Tobii 4C remote eye-tracker (90 Hz sampling rate [15]; accuracy ~1° of visual angle [7]), without head stabilization. For user calibration, the standard Tobii calibration provided with the eye-tracker software was used. Key presses were performed using a standard keyboard, and a standard optical mouse was used for mouse input.
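The size estimate for a car can be reproduced with a short calculation. The sketch below uses only the values quoted in the text; it additionally assumes square screen pixels and that the unzoomed frame is scaled to fit the screen width, neither of which is stated explicitly. The variable and function names are illustrative.

```python
import math

# Values quoted in the text.
GSD_M_PER_PX = 0.25            # ground sampling distance: 25 cm/pixel
FRAME_WIDTH_PX = 30_000        # width of one WAMI frame
SCREEN_W_PX, SCREEN_H_PX = 1920, 1200
SCREEN_DIAG_INCH = 24.0
VIEW_DIST_MM = 650.0           # viewing distance: 65 cm
CAR_LENGTH_M = 4.0
QUICK_ZOOM_TARGET_DEG = 1.5    # default object size after Quick-zoom in


def pixel_pitch_mm(w_px: int, h_px: int, diag_inch: float) -> float:
    """Physical size of one (square) screen pixel in millimetres."""
    return diag_inch * 25.4 / math.hypot(w_px, h_px)


def visual_angle_deg(size_mm: float, dist_mm: float) -> float:
    """Visual angle subtended by an object of size_mm seen from dist_mm."""
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * dist_mm)))


def size_for_angle_mm(angle_deg: float, dist_mm: float) -> float:
    """On-screen size that subtends angle_deg at distance dist_mm."""
    return 2.0 * dist_mm * math.tan(math.radians(angle_deg) / 2.0)


car_px_image = CAR_LENGTH_M / GSD_M_PER_PX       # car length in image pixels
fit_scale = SCREEN_W_PX / FRAME_WIDTH_PX         # assumption: fit frame to screen width
pitch_mm = pixel_pitch_mm(SCREEN_W_PX, SCREEN_H_PX, SCREEN_DIAG_INCH)
car_mm = car_px_image * fit_scale * pitch_mm     # car length on screen, unzoomed
car_deg = visual_angle_deg(car_mm, VIEW_DIST_MM)
# Magnification relative to the fit-to-width view needed to reach 1.5 degrees.
magnification = size_for_angle_mm(QUICK_ZOOM_TARGET_DEG, VIEW_DIST_MM) / car_mm

print(f"car in image:       {car_px_image:.0f} px")
print(f"screen pixel pitch: {pitch_mm:.3f} mm")
print(f"car on screen:      {car_mm:.2f} mm ({car_deg:.3f} deg)")
print(f"magnification for {QUICK_ZOOM_TARGET_DEG} deg: {magnification:.0f}x")
```

Under these assumptions the unzoomed car spans roughly one screen pixel, in line with the 0.27 mm stated above, and a magnification on the order of 60 would be needed before it reaches 1.5° of visual angle. This magnification is not necessarily the same quantity as the zoom factor reported in the results.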
2.1 Experimental Tasks
We introduce five task types that might occur in practical visual WAMI exploitation. They build on one another in terms of complexity and interaction difficulty. Hence, they were presented to the subjects in the order from 1 to 5. Experimental task 1 requires image manipulation (zoom, pan) and the selection of stationary objects. Experimental task 2 additionally requires interaction with object detections and object tracks. Experimental task 3 additionally requires object observation during an image sequence; however, the interaction with objects and tracks still involves stationary objects only. Experimental task 4 additionally requires the detection of objects according to their movement behavior as well as interaction with moving objects. Experimental task 5 additionally requires interaction under time pressure (objects might have moved out of the scene before selection) and with moving objects only.
Experimental Task 1 (Single Frame, Multiple Targets, no Object Tracks)
This type of task utilizes one single WAMI frame in its original size of 30,000 × 23,000 pixels. Figure 3 shows a test task example. The task instruction is to select (i.e., highlight with a frame) all vehicles that are parking on roads of all kinds. Figure 4 shows the result
after selection of the targets. Required interaction functions are image zoom, image pan and enter frame (if necessary due to misclicking, delete frame as well).
Experimental Task 2 (Single Frame, Multiple Targets, Object Tracks)
This type of task utilizes a section of 9,000 × 4,000 pixels from one single WAMI frame. Figure 5 shows a test task example. The task instruction specifies vehicles as targets that (1) are parking within the highlighted area and (2) have an object track that originates from the image edge specified by the blue bar. Figure 6 shows the result after selection of the target objects. All interaction functions are required.
Experimental Task 3 (WAMI Sequence, Single Target, Object Tracks)
This type of task utilizes an image sequence containing 1,000 frames with a section of 9,000 × 4,000 pixels from the WAMI data set. Figure 7 shows a test task example. The task focusses on a single, predefined vehicle. The task instruction specifies this vehicle as target if it finally (1) parks within the highlighted area and (2) has an object track that originates from the image edge specified by the blue bar. Figure 8 shows the result after selection of the vehicle as target. Required interaction functions are image zoom, image pan, enter track and enter frame (if necessary due to misclicking, delete track and delete frame as well).
Experimental Task 4 (WAMI Sequence, Single Target, Object Tracks)
This type of task utilizes an image sequence containing 1,000 frames with a section of 9,000 × 4,000 pixels from the WAMI data set. Figure 9 shows a test task example. The task focusses on a single vehicle. The task instruction specifies this vehicle as target if it (1) turns as specified at the junction, if (2) its track originates from the image edge specified by the blue bar and if it (3) finally parks within the highlighted area. Figure 10 shows the result after selection of the vehicle as target. Required interaction functions are image zoom, image pan, enter track and enter frame (if necessary due to misclicking, delete track and delete frame as well).
Experimental Task 5 (WAMI Sequence, Multiple Targets, Object Tracks)
This type of task utilizes an image sequence containing 1,000 frames with a section of 9,000 × 4,000 pixels from the WAMI data set. Figure 11 shows a test task example. The task instruction specifies vehicles as targets that (1) turn as specified at the junction and (2) have an object track that originates from the image edge specified by the blue bar. Figure 12 shows the result after selection of the target objects. Required interaction functions are image zoom, image pan, enter track and enter frame (if necessary due to misclicking, delete track and delete frame as well).
Fig. 3. Experimental task 1. Task instruction: Select all vehicles that are parking within the highlighted areas.
2.2 Procedure
Twelve colleagues of our department with normal or corrected-to-normal vision (two females; ages: five under 30 years, six between 30 and 45 years, one over 45 years) gave informed consent to participate. All were expert mouse input users; one was an expert gaze input user. All performed each task type once with G+K and once with M+K. To control for fatigue and learning effects, one half started with G+K, the other half with M+K (completely balanced within-subjects design). In order to provide different test tasks for the two input conditions, we designed two test task sets A and B. They comprise the same number of tasks, namely one test task of type 1, two test tasks of type 2, four test tasks each of types 3 and 4, and eight test tasks of type 5; in total, each subject performed 19 test tasks with each input condition. The test task sets A and B contain overall a similar number of targets. Set A contained 44 targets and an additional 32 potential targets for which the track had to be checked; set B contained 53 targets and an additional 32 potential targets. Three of the test tasks in each set contained no target, as this situation might occur in practical image analysis as well.
Each session started with a general introduction to the experiment, including giving informed consent and an explanation of the applied questionnaires (cf. below). After this, the subject performed eye-tracker calibration. Then, the subject performed the first input condition. This started with an introduction to the five test task types and included extensive training of all task types (one training task for types 1 and 3, two training tasks for types 2, 4 and 5).
Fig. 4. Cropped image of experimental task 1 after completion with selected target objects.
Fig. 5. Experimental task 2. Task instruction: Select all vehicles that (1) are parking within the highlighted area and (2) have an object track that originates from the right side (blue bar).
After training, eye-tracker calibration was checked. Then, the examiner instructed the subject to perform the test tasks as quickly and as accurately as possible. After that, the examiner left the room and the subject performed the 19 test tasks autonomously. Each test task starts with the presentation of the instruction on screen. After reading it, the subject starts the trial by selecting a software button ("start") placed beneath the instruction.
Fig. 6. Cropped image of experimental task 2 after completion with selected target object. Only one of the vehicles parking in the highlighted area is a target with the required object track origin.
Fig. 7. Experimental task 3. Task instruction: Follow the vehicle at the position highlighted with a violet frame with the eyes during the image sequence; select this vehicle as target after the sequence stopped at the end if it (1) parks within the highlighted area and (2) if its object track originates from the lower side (blue bar).
Then, the subject looks for targets and frames them. On pressing CTRL+S, the trial ends and the next instruction is presented. After finishing the test tasks, the subject rates perceived workload using the NASA-TLX questionnaire [16], followed by rating user satisfaction using the ISO/TS 9241-411 questionnaire [17]. As eye strain can be an issue with gaze input [18], we added eye fatigue as a feature, as introduced by Zhang and MacKenzie [19]. A break of ten minutes followed. After that, the subject completed the second input condition (training, eye-tracker calibration check, test tasks, questionnaires).
Fig. 8. Cropped image of experimental task 3 after completion with vehicle selected as target.
Fig. 9. Experimental task 4. Task instruction: Observe the junction highlighted with a violet frame with the eyes. If a vehicle (1) turns according to the specification (green arrow), it is a potential target. Now, check its track. (2) If the track originates from the image edge specified by the blue bar, follow it with the eyes until the image sequence stops. Select the vehicle as target, if it (3) parks within the specified area.
A session lasted between 2 and 2.5 h. For each input condition, introduction and training lasted between 30 and 45 min, and conducting the test tasks lasted about 30 min.
Fig. 10. Cropped image of experimental task 4 after completion with vehicle selected as target.
Fig. 11. Experimental task 5. Task instruction: Observe the junction highlighted with a violet frame with the eyes. Any vehicle (1) turning according to the specification (green arrow) is a potential target. Select any vehicle as a target if (2) its track originates from the image edge specified by the blue bar.
Fig. 12. Cropped image of experimental task 5 after completion with vehicles selected as target. The vehicle with the lilac-colored object track is no target as the track does not originate from the margin specified by the blue bar.
3 Results
As metrics for comparison of the two input conditions G+K and M+K, we evaluated effectiveness as hit rate, efficiency as time required per target selection (including potential image zooming and panning operations), perceived workload (NASA-TLX with a rating scale from 0: no workload to 100: high workload), and user satisfaction. Table 1 shows the results for hit rates and times required per successful selection. Times are calculated in the following way. For task types 1, 2 and 5, we divide the overall duration of a trial by the number of successful target selections (i.e., hits). For task types 3 and 4, we calculate the time as the difference between the point in time when the image sequence stops (as only then can the decision be made whether the vehicle parks in the specified area) and the point in time when target selection happens.
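The two ways of calculating the time per hit can be written down as a minimal sketch. The `Trial` record and its field names are hypothetical and do not reflect the log format of the experimental system; returning no value for trials without a hit, and computing the hit rate as hits divided by targets, are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """Hypothetical per-trial record; all times in seconds."""
    task_type: int                           # 1..5
    start_s: float                           # trial start
    end_s: float                             # trial end
    hits: int                                # successful target selections
    sequence_stop_s: Optional[float] = None  # task types 3 and 4 only
    selection_s: Optional[float] = None      # task types 3 and 4 only


def time_per_hit(trial: Trial) -> Optional[float]:
    """Time per successful selection, following the two rules in the text."""
    if trial.hits == 0:
        return None  # assumption: no time is reported for a trial without hits
    if trial.task_type in (1, 2, 5):
        # Overall trial duration divided by the number of hits.
        return (trial.end_s - trial.start_s) / trial.hits
    if trial.task_type in (3, 4):
        # Time from the end of the image sequence to the selection.
        if trial.sequence_stop_s is None or trial.selection_s is None:
            return None
        return trial.selection_s - trial.sequence_stop_s
    raise ValueError(f"unknown task type: {trial.task_type}")


def hit_rate_percent(hits: int, targets: int) -> float:
    """Hit rate in percent (assumed definition: hits / targets)."""
    return 100.0 * hits / targets


if __name__ == "__main__":
    single_frame = Trial(task_type=1, start_s=0.0, end_s=30.0, hits=5)
    sequence = Trial(task_type=3, start_s=0.0, end_s=1010.0, hits=1,
                     sequence_stop_s=1000.0, selection_s=1004.5)
    print(time_per_hit(single_frame), time_per_hit(sequence))
```

Note that the two rules measure different things: for task types 1, 2 and 5 the value includes search, zooming and panning, whereas for task types 3 and 4 it covers only the interval after the sequence has stopped.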
Table 1. Results for hit rates and times as means (1 standard deviation).

Task type   Hit rate [%]                 Time/Hit [s]
            G+K           M+K            G+K             M+K
1           87.4 (7.9)    81.7 (16.1)    5.780 (2.153)   5.551 (2.657)
2           100 (0)       95.6 (10.8)    4.296 (0.999)   3.327 (1.538)
3           91.7 (16.3)   95.8 (9.7)     4.822 (3.326)   5.126 (4.769)
4           100 (0)       91.7 (28.9)    6.051 (3.921)   5.922 (4.077)
5           86.2 (6.4)    86.7 (10.5)    5.610 (0.949)   5.619 (1.688)

Table 2. Results for zoom factors and numbers of Quick-zoom actions as means (1 SD).

Task type   Zoom factor           Quick-zoom in           Quick-zoom out
            G+K        M+K        G+K         M+K         G+K         M+K
1           53 (11)    41 (14)    3.6 (2.7)   1.8 (1.9)   3.4 (0.9)   1.8 (1.8)
2           13 (3)     8 (3)      1.6 (0.3)   0.3 (0.7)   1.2 (0.7)   0.3 (0.5)
3           12 (6)     10 (4)     0.8 (0.8)   0.1 (0.2)   0.8 (0.2)   0.2 (0.3)
4           10 (4)     9 (3)      0.7 (0.8)   0.2 (0.6)   0.8 (0.6)   0.3 (0.5)
5           11 (3)     10 (3)     1.4 (1.3)   0.7 (1.2)   1.5 (1.2)   0.7 (1.1)

The hit rates appear slightly better for G+K for task types 1 and 2, which comprise only the selection of stationary objects in single frames; time was slightly shorter for M+K. For task types 3, 4 and 5, both hit rates and times are similar for G+K and M+K. Overall, the results show that subjects performed similarly with G+K and M+K. The overall hit rate was 93 (6) % with G+K and 90 (6) % with M+K. Time required per successful target selection was on average 5.2 s with G+K and 5.1 s with M+K. Table 2 shows the zooming behavior as zoom factors applied for selection operations and as numbers of occurrences of the Quick-zoom mechanism. The zoom factors are slightly larger with G+K than with M+K for task types 1 and 2, which require stationary object selection only. Zoom factors are similar for task types 3, 4 and 5, which require moving object selections. Quick-zoom was used far more often with G+K than with M+K. For task type 1, four Quick-zooms could be expected as there are four image areas to exploit; the result for G+K reflects this quite well.
For task type 2, two Quick-zooms in and one Quick-zoom out could be expected: Quick-zoom in to find potential targets, Quick-zoom out to check their tracks, and Quick-zoom in again to frame the targets; again, the result for G+K reflects this. The much smaller numbers for M+K might be due to the fact that interaction with the mouse wheel appeared fast enough; hence, the subjects stuck with the familiar interaction.
Fig. 13. Results for subjectively perceived workload (0: no workload, 100: high workload).
Figure 13 shows the results for perceived workload. The overall NASA Task Load Index is similar for G+K and M+K. The subscales reveal that mental demand is higher for G+K; this might improve with more training. Frustration is much higher for M+K; possibly, the subjects were more lenient with themselves in cases of interaction obstacles or target misses when interacting with the novel and unfamiliar gaze input. Figure 14 shows the user satisfaction ratings on a 7-point scale (7: very good, 1: very bad). The results are good to very good and similar for both input concepts for most features. However, G+K achieves a better result for operation speed and general comfort. While such results have been reported before [4], they are still astonishing given that subjects are familiar with manual mouse and keyboard interaction but inexperienced with gaze input. Moreover, eye strain appears to be more of an issue for G+K than for M+K.
Fig. 14. Results for user satisfaction on a 7-point scale (7: very good, 1: very bad).
4 Conclusion
Due to the small number of subjects, the results of the present pilot study have to be considered preliminary. Moreover, the results could be different for expert image analysts. The results of our pilot study show that multi-modal gaze input combining gaze pointing and keyboard appears to be an equivalent alternative to the traditional manual interaction with computer mouse and keyboard. Overall, the subjects achieved a slightly higher hit rate with gaze input (93% vs. 90%) and reported similar perceived workload and user satisfaction with the two interaction concepts. The required time per successful target selection is similar for both input concepts if moving object selection is required; however, time is shorter for mouse input for tasks that require only stationary object selection in single WAMI frames. Probably, this is due to the fact that all subjects are expert mouse input users but (except one subject) are novice gaze input users. As moving object selection is not a common interaction
task, subjects are untrained not only with gaze input, but also with mouse input; hence, selection completion times are similar. Comparing the results to those provided by Hild et al. for moving object selection in FMV [4], we again find similar effectiveness for gaze input and mouse input. It is encouraging that the present study confirms this result, as it was now achieved using a low-cost eye-tracker; the high quality of such devices is a prerequisite for gaze-based interaction to become a commonly used input method. However, the large time saving reported for gaze input compared to mouse input cannot be observed in the present pilot study.
References
1. Sommer, L.W., Teutsch, M., Schuchert, T., Beyerer, J.: A survey on moving object detection for wide area motion imagery. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. IEEE (2016)
2. Hoffmann, R.R., Johnson, M., Bradshaw, J.M., Underbrink, A.: Trust in automation. IEEE Intell. Syst. 28(1), 84–88 (2013)
3. Heinze, N., Esswein, M., Krüger, W., Saur, G.: Automatic image exploitation system for small UAVs. In: Airborne Intelligence, Surveillance, Reconnaissance (ISR) Systems and Applications V, vol. 6946, pp. 106–115. International Society for Optics and Optronics (2008)
4. Hild, J., Kühnle, C., Beyerer, J.: Gaze-based moving target acquisition in real-time full motion video. In: Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, pp. 241–244. ACM (2016)
5. Hild, J., et al.: Pilot study on real-time motion detection in UAS video data by human observer and image exploitation algorithm. In: Geospatial Informatics, Fusion, and Motion Video Analytics VII, vol. 10199, p. 1019903. International Society for Optics and Optronics (2017)
6. Hild, J., Saur, G., Petersen, P., Voit, M., Peinsipp-Byma, E., Beyerer, J.: Evaluating user interfaces supporting change detection in aerial images and aerial image sequences. In: Yamamoto, S., Mori, H. (eds.) HIMI 2018. LNCS, vol. 10905, pp. 383–402. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92046-7_33
7. Hild, J., Peinsipp-Byma, E., Voit, M., Beyerer, J.: Suggesting gaze-based selection for surveillance applications. In: 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–8. IEEE (2019)
8. Hild, J., Holzbach, G., Maier, S., van de Camp, F., Voit, M., Peinsipp-Byma, E.: Gaze-enhanced user interface for real-time video surveillance. In: Stephanidis, C., Antona, M., Ntoa, S., Salvendy, G. (eds.) HCI International 2022 – Late Breaking Posters. HCII 2022. Communications in Computer and Information Science, vol. 1654. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19679-9_7
9. Sommer, L., Krüger, W., Teutsch, M.: Appearance and motion based persistent multiple object tracking in wide area motion imagery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3878–3888. IEEE (2021)
10. Motorcu, H., Ates, H.F., Ugurdag, H.F., Gunturk, B.K.: Hm-net: a regression network for object center detection and tracking on wide area motion imagery. IEEE Access 10, 1346–1359 (2021)
11. Al-Shakarji, N.M., Bunyak, F., Seetharaman, G., Palaniappan, K.: Robust multi-object tracking for wide area motion imagery. In: 2018 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), pp. 1–5. IEEE (2018)
12. Hartung, C., Spraul, R., Krüger, W.: Improvement of persistent tracking in wide area motion imagery by CNN-based motion detections. In: Image and Signal Processing for Remote Sensing XXIV, vol. 10789, pp. 249–258. International Society for Optics and Optronics (2018)
13. U.S. Air Force Research Laboratory (AFRL), WPAFB2009 dataset. https://www.sdms.afrl.af.mil/index.php?collection=wpafb2009. Accessed 9 Feb 2023
14. Philipson, W.R.: Manual of photographic interpretation. Asprs Publications (1997)
15. Tobii Homepage. https://help.tobii.com/hc/en-us/articles/213414285-Specifications-for-theTobii-Eye-Tracker-%204C. Accessed 8 Feb 2023
16. Hart, S.G.: NASA-task load index (NASA-TLX); 20 years later. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 50, no. 9, pp. 904–908. Sage Publications, Sage, Los Angeles, CA (2006)
17. ISO, I. S. O.: 9241–411 Ergonomics of human-system interaction–Part 411: Evaluation methods for the design of physical input devices. International Organization for Standardization (2012)
18. Hirzle, T., Cordts, M., Rukzio, E., Bulling, A.: A survey of digital eye strain in gaze-based interactive systems. In: ACM Symposium on Eye Tracking Research and Applications, pp. 1–12. ACM (2020)
19. Xuan Zhang, I., MacKenzie, S.: Evaluating eye tracking with ISO 9241 - Part 9. In: Jacko, J.A. (ed.) HCI 2007. LNCS, vol. 4552, pp. 779–788. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73110-8_85
Development of a Speech-Driven Communication Support System Using a Smartwatch with Vibratory Nodding Responses
Yutaka Ishii1(B), Kenta Koike2, Miwako Kitamura2, and Tomio Watanabe1
1 Okayama Prefectural University, Kuboki 111, Soja, Okayama, Japan
{ishii,watanabe}@cse.oka-pu.ac.jp
2 Graduate School of Okayama, Prefectural University, Kuboki 111, Soja, Okayama, Japan
[email protected]
Abstract. Communication tools are diversifying, and various communication skills are required. The importance of communication skills, including face-to-face conversation, is once again attracting attention. Therefore, there is a high demand for a communication support system for face-to-face situations. Previous studies have attempted to present rhythmic entrainment, which plays an important role in nodding responses, visually and haptically through grip-type vibration devices and LEDs. However, their effectiveness in daily interaction has not been demonstrated. In this study, we developed a communication support system for conversational situations using a smartwatch, which can be worn comfortably in communicative situations. The system presents a nodding response that promotes embodied rhythmic entrainment through tactile sensation. Furthermore, the system was evaluated in daily use in a five-day user test.
Keywords: Communication enhancement · Smartwatch · Vibratory nodding response · Haptic feedback
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 370–378, 2023. https://doi.org/10.1007/978-3-031-35132-7_27
1 Introduction
In recent years, online communication has become popular, and the importance of communication skills has been attracting attention. According to a survey conducted by the Agency for Cultural Affairs in 2017, 96.4% of the respondents answered yes to the question whether communication skills are important [1]. On the other hand, there are many people who feel that they are not good at communication. According to a survey in Japan, 57% of the people answered that they were not good at communication [2]. Therefore, the development of a system that supports communication is highly expected.
In face-to-face communication, nonverbal information such as facial expressions, nods, blinks, gestures, and hand gestures play as important roles as verbal information. It has been reported that nonverbal information accounts for about 65–93% of communication. In embodied communication that includes nonverbal information and biological
information, speakers interact with each other to share their embodied rhythms and communicate smoothly [3]. Nagai et al. developed the speech-driven embodied entrainment pointer system “InterPointer”, a pointing device that provides a visualized response equivalent to a nodding response based on the entrainment of embodied rhythms between a lecturer and the audience. The nodding response of InterPointer was shown to be effective in supporting embodied interaction and communication in presentations. Moreover, “InterVibrator” was developed using a vibration device that provides a vibratory response equivalent to a nodding response, with the same timing as a listener’s reaction to speech input. The effectiveness of the system was demonstrated in a communication experiment by sensory evaluation using paired comparison and a seven-point bipolar rating [4]. However, this attempt assumed use in presentations, and the effect of tactile presentation of nodding responses in daily conversations has not been demonstrated. Therefore, we developed a system that presents nodding responses by touch using a smartwatch that can be worn at all times and used in various communication situations.
2 Vibration Presentation System
2.1 Vibration Patterns of the Device
In this research, a system prototype is developed using an AppleWatch (Series 6, 40 mm) as the vibration presentation device [5]. The application is developed in Swift. AppleWatch provides a Haptic Feedback function for application development. This function is originally implemented as a kind of notification consisting of a combined sound and vibration. In this system, this notification function is used to present the tactile nodding response, and the sound that occurs at the same time is suppressed by setting the mute mode (Fig. 1).
Fig. 1. Smart watch (AppleWatch) with the vibration presentation system.
There are several patterns of Haptic Feedback of AppleWatch, each with different contents of sound and vibration and operation time. In the system developed in this
Y. Ishii et al.
research, patterns are played in succession and combined in order to generate vibrations corresponding to various nodding motions. However, when patterns were played in immediate succession, the vibration sometimes did not work properly. Therefore, we investigated the Haptic Feedback patterns to determine the time interval needed for correct operation. The results are shown in Table 1. In the table, “Feedback pattern name” and “Vibration impact” are the names officially used by Apple [5], while “Number of vibrations”, “Time of one sequence”, and “Time of the next sequence” were added based on what was investigated in this study. “Number of vibrations” is the number of vibrations that occur within one pattern, “Time of one sequence” is the total duration of the pattern, and “Time of the next sequence” is the time required before the next pattern works correctly. Some patterns differ only in sound: the vibration patterns are the same for DirectionUp and DirectionDown, and for Failure and Retry.

Table 1. Haptic Feedback pattern list of AppleWatch [5].
Feedback pattern name | Vibration impact | Number of vibrations | Time of one sequence [msec] | Time of the next sequence [msec]
Notification | Rigid | 2 | 612 | 798
DirectionUp | Heavy | 2 | 102 | 182
DirectionDown | Heavy | 2 | 102 | 182
Success | Rigid | 3 | 236 | 299
Failure | Continuous | 1 | 364 | 429
Retry | Continuous | 1 | 364 | 429
Start | Medium | 1 | 43 | 134
Stop | Medium | 2 | 351 | 419
Click | Light | 1 | 30 | 117
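The spacing constraint recorded in Table 1 can be sketched as a small scheduling check. This is only an illustration, not the authors' implementation (the prototype itself was written in Swift). The function names `earliest_onsets` and `violations` are hypothetical, and the sketch assumes that "Time of the next sequence" is measured from the onset of a pattern.

```python
# Timings from Table 1 in milliseconds:
# (time of one sequence, time of the next sequence).
HAPTIC_TIMINGS_MS = {
    "Notification": (612, 798),
    "DirectionUp": (102, 182),
    "DirectionDown": (102, 182),
    "Success": (236, 299),
    "Failure": (364, 429),
    "Retry": (364, 429),
    "Start": (43, 134),
    "Stop": (351, 419),
    "Click": (30, 117),
}


def earliest_onsets(patterns):
    """Return the earliest onset (ms) of each pattern when played back to back.

    Assumption: the next pattern may start no earlier than the
    "time of the next sequence" after the onset of the previous one.
    """
    onsets = []
    t = 0
    for name in patterns:
        onsets.append(t)
        t += HAPTIC_TIMINGS_MS[name][1]
    return onsets


def violations(schedule):
    """List (index, name, gap_ms, required_ms) for onsets that come too early.

    `schedule` is a list of (onset_ms, pattern_name) sorted by onset.
    """
    problems = []
    for i in range(1, len(schedule)):
        prev_onset, prev_name = schedule[i - 1]
        onset, name = schedule[i]
        required = HAPTIC_TIMINGS_MS[prev_name][1]
        gap = onset - prev_onset
        if gap < required:
            problems.append((i, name, gap, required))
    return problems
```

Under this assumption, three "Start" vibrations spaced 207 ms and 343 ms apart would pass the check, because every gap exceeds the 134 ms listed for "Start".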
2.2 System Prototype with Interaction Model for Auto-Generated Entrained Vibration
An example screen of the system prototype is shown in Fig. 2. Based on the volume measured by the microphone built into the AppleWatch, the system presents a nodding response both visually, through a character displayed on the monitor, and tactilely, through Haptic Feedback. The system uses InterRobot technology (iRT), which automatically generates speaker and listener actions from the speaker’s voice according to the interaction model described below. While no nodding response is presented, the character waits in the state shown on the left side of Fig. 2; while a nodding response is presented, the character changes to the state shown on the right side of Fig. 2 so that the vibration is associated with nodding.
Fig. 2. Example of the system prototype.
2.3 Listener’s Interaction Model
The listener’s interaction model for vibration feedback includes a nodding reaction model [4], which estimates the nodding timing from a speech ON-OFF pattern, and a body reaction model linked to the nodding reaction model. A hierarchical model consisting of two stages, macro and micro (Fig. 3), predicts the timing of nodding. The macro stage estimates whether a nodding response exists in a duration unit that consists of a talkspurt episode T(i) and the following silence episode S(i), with a hangover value of 4/30 s. The estimator M_u(i) is a moving-average (MA) model, expressed as the weighted sum of the unit speech activity R(i) in (1) and (2). When M_u(i) exceeds the threshold value, the nodding M(i), which is also an MA model, is estimated as the weighted sum of the binary speech signal V(i) in (3).

$$M_u(i) = \sum_{j=1}^{J} a(j)\,R(i-j) + u(i) \tag{1}$$

$$R(i) = \frac{T(i)}{T(i) + S(i)} \tag{2}$$

a(j): linear prediction coefficient
T(i): talkspurt duration in the i-th duration unit
S(i): silence duration in the i-th duration unit
u(i): noise

$$M(i) = \sum_{j=1}^{K} b(j)\,V(i-j) + w(i) \tag{3}$$

b(j): linear prediction coefficient
V(i): voice
w(i): noise

Fig. 3. Interaction model for auto-generated motions.

2.4 Nodding Response Presentation by Vibration
The volume threshold for predicting the generation of a nod and the vibration pattern of the nodding response can be changed on the screen. If the threshold is too low, the vibration sound of the AppleWatch itself is picked up as input and causes a malfunction. Therefore, the threshold range can be set so as to avoid this malfunction. As for the vibration pattern, various nodding responses are expressed by combining “Start” and “Click”, which are simple vibrations of Haptic Feedback. Table 2 shows examples of nodding responses. For each type of nodding response, the patterns used are listed in chronological order (“Wait” does nothing). Previous studies using 3D models have shown that different nodding motions give different impressions, so various vibration presentations were prepared in this study as well. This paper presents three example patterns:
• a) Light nod, nod time 1000 ms, one nod: represented by two “Click” vibrations, as shown in the first column.
• b) Medium nod, 500 ms, two nods: a combination of “Start” and “Click”, as shown in the second column.
• c) Deep nod, 250 ms, three nods: represented by six “Start” vibrations, as shown in the third column.
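The listener's interaction model of Sect. 2.3, driven by a volume threshold as described here, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the InterRobot implementation. It assumes the voice is sampled at 30 frames per second so that the 4/30 s hangover equals 4 frames. It omits the noise terms u(i) and w(i). All function names are hypothetical. The coefficients a(j) and b(j) are not given in the text, so any values passed to `ma_estimate` are placeholders.

```python
def binarize(volume, threshold):
    """Binary speech signal V: 1 where the volume exceeds the threshold."""
    return [1 if v > threshold else 0 for v in volume]


def fill_hangover(voice, hangover=4):
    """Treat silent gaps of at most `hangover` frames inside speech as speech."""
    out = list(voice)
    i, n = 0, len(out)
    while i < n:
        if out[i] == 0:
            j = i
            while j < n and out[j] == 0:
                j += 1
            # Fill only short gaps that have speech on both sides.
            if 0 < i and j < n and (j - i) <= hangover:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


def duration_units(voice):
    """Split V into units of a talkspurt T(i) followed by a silence S(i).

    Returns (T, S) frame counts. Leading silence is skipped, and a
    trailing talkspurt with no following silence is dropped.
    """
    units = []
    i, n = 0, len(voice)
    while i < n and voice[i] == 0:
        i += 1
    while i < n:
        t = 0
        while i < n and voice[i] == 1:
            t += 1
            i += 1
        s = 0
        while i < n and voice[i] == 0:
            s += 1
            i += 1
        if s > 0:
            units.append((t, s))
    return units


def speech_ratio(t, s):
    """Unit speech activity R(i) = T(i) / (T(i) + S(i)), as in Eq. (2)."""
    return t / (t + s)


def ma_estimate(coeffs, history):
    """Moving-average estimate: sum over j of coeffs[j-1] * history[-j].

    With history = [R(i-J), ..., R(i-1)] this is Eq. (1) without noise;
    with history = [V(i-K), ..., V(i-1)] it is Eq. (3) without noise.
    """
    return sum(c * x for c, x in zip(coeffs, reversed(history)))
```

In use, the macro stage would compare `ma_estimate(a, recent_R)` with a threshold. Only when that threshold is exceeded would the micro stage evaluate `ma_estimate(b, recent_V)` frame by frame to time the nod.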
Table 2. Examples of vibration patterns as nodding.
a) Light nod, 1000 ms, one time | b) Medium nod, 500 ms, two times | c) Deep nod, 250 ms, three times
Click (0–30 ms) | Start (0–43 ms) | Start (0–43 ms)
Wait (31–969 ms) | Wait (44–469 ms) | Wait (44–206 ms)
Click (970–1000 ms) | Click (470–500 ms) | Start (207–250 ms)
 | Wait (501–799 ms) | Wait (251–549 ms)
 | Start (800–843 ms) | Start (550–593 ms)
 | Wait (844–1269 ms) | Wait (594–756 ms)
 | Click (1270–1300 ms) | Start (757–800 ms)
 | | Wait (801–1099 ms)
 | | Start (1100–1143 ms)
 | | Wait (1144–1306 ms)
 | | Start (1307–1350 ms)

3 Evaluation Experiment in Daily Use
3.1 Experimental Setup
In this study, an experiment was conducted to investigate impressions of the system in daily life. The participants used an AppleWatch with the system installed in their daily lives and were asked about their impressions. The participants were five male students aged 23–25. The experiment lasted 5 days, and the participants were instructed to keep the system running as long as possible. During the experiment, the AppleWatch was set to mute mode and the vibration intensity was set to moderate. Participants were free to use other AppleWatch applications during the experiment, and the vibration pattern and threshold volume could be changed. The system can generate the following six vibration patterns, which differ in the strength and frequency of vibrations:
• Pattern 1: “Click” once
• Pattern 2: “Start” once
• Pattern 3: “DirectionUp” once
• Pattern 4: “Start + Click”
• Pattern 5: “Start + Wait (414 ms) + Start”
• Pattern 6: “Start + Wait (427 ms) + Click”
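The timing of these patterns can be checked against Table 1. With “Start” lasting 43 ms and “Click” lasting 30 ms, Patterns 5 and 6 each add up to 500 ms. The sketch below lays the steps of a pattern end to end. The names `timeline` and `total_ms` are hypothetical, and the spans use half-open start and end times rather than the inclusive ranges printed in Table 2.

```python
# Durations of the two elementary vibrations, taken from Table 1 (ms).
DURATION_MS = {"Start": 43, "Click": 30}


def timeline(steps):
    """Turn a list of steps into (name, start_ms, end_ms) spans.

    A step is either a pattern name ("Start", "Click") or a number
    giving a wait in milliseconds. Spans are laid end to end from 0.
    """
    spans = []
    t = 0
    for step in steps:
        if isinstance(step, str):
            name, length = step, DURATION_MS[step]
        else:
            name, length = "Wait", step
        spans.append((name, t, t + length))
        t += length
    return spans


def total_ms(steps):
    """Total duration of a pattern in milliseconds."""
    spans = timeline(steps)
    return spans[-1][2] if spans else 0


PATTERN_5 = ["Start", 414, "Start"]
PATTERN_6 = ["Start", 427, "Click"]
```

For example, `timeline(PATTERN_6)` places the “Click” at 470–500 ms, which matches the first half of column b) in Table 2.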
Pattern 4 expresses one large vibration by playing two vibrations in immediate succession. Patterns 5 and 6 also consist of two vibrations: the first expresses the motion of lowering the head in a nod, and the second the motion of raising it. The wait time is set so that the total time of one vibration pattern is 500 ms, which is the standard nodding time in previous research [6].
In the experiment, the participants answered questionnaires before, during, and after the experiment. The questionnaire during the experiment was answered at the end of each day, five times in total. The questionnaires contained the following items.
• Pre-experiment questionnaire.
– Age/Gender
– Experience using a smartwatch (Currently using/Used in the past but not currently using/Never used)
– Conversation frequency per day as an average (3 h or more a day/1–3 h a day/Less than 1 h a day/About once a few days)
• Questionnaire during the experiment.
– System usage time
– Impression of the system (5-level evaluation, 1 item)
– Battery charging frequency (5-level evaluation, 1 item)
– Most used vibration pattern of the day
– Reason for choosing vibration pattern (free description)
– Impressions of dialogue scenes while using the system (free description)
– Situations other than dialogue in which the system was used effectively (free description)
• Post-experiment questionnaire.
– Impression of the system (8 items, 7 grades from 1 to 7): Comfort/Ease of speaking/Familiarity/Preference/Enjoyment/Sense of unity with others/Sense of security/Would you like to continue using
– The vibration pattern most commonly used throughout the experiment
– Reason for choosing vibration pattern (free description)
– What you felt about the system (free description)
– Feelings during the entire experiment (free description)

3.2 Experimental Results
Table 3 shows the results of the questionnaire before the experiment, Table 4 shows the results of the questionnaire during the experiment, and Table 5 shows the results of the questionnaire after the experiment. The standard deviation is shown in parentheses for each result in Tables 4 and 5.

Table 3. Results of the pre-experiment questionnaire.
Items of Questionnaire | Results
Age | 23–25
Gender | Male 100%
Experience using a smartwatch | Never: 60%, Currently using: 40%
Conversation frequency | 3 h or more a day: 100%

Table 4. Results of the questionnaire during the experiment.
Items of Questionnaire | Results
System usage time | 10.36 (±1.93) [h]
Impression of the system | 3.20 (±0.57)
Battery charging frequency | 2.88 (±0.77)

In Table 4, “System usage time” is the average over all answers given by the participants in the experiment. “Battery charging frequency” is the average score of all answers, with 5 points for “Very little” and 1 point for “Very often”. For each item in Table 5, the average score of the participants is shown, with 7 points indicating “strongly agree” and 1 point indicating “strongly disagree”.

Table 5. Results of the post-experiment questionnaire.
Items of Questionnaire | Results
Comfort | 4.0 (±1.10)
Ease of speaking | 4.8 (±0.75)
Familiarity | 3.8 (±0.98)
Preference | 3.4 (±1.02)
Enjoyment | 4.0 (±1.41)
Sense of unity with others | 3.4 (±0.80)
Sense of security | 4.2 (±1.17)
Would you like to continue using | 2.2 (±1.17)

Regarding the post-experiment questionnaire, as shown in Table 5, “Ease of speaking” received the highest rating among the eight items (4.8 on the 7-point scale). This suggests that using the system in daily life may make it easier to talk with people. In addition, the free descriptions included opinions about using the system in situations other than conversations, such as “It was fun to respond to soliloquy” and “I was able to organize my remarks by responding to soliloquy”. The system was thus also valued in situations where there is no one to talk to.

4 Conclusion
In this paper, a communication support system that presents nodding responses as vibrations on a smartwatch was developed, and impressions of using the device were investigated in an experiment in daily life. As a result, positive evaluations were obtained regarding the ease of conversation when using the system. In addition, it was shown that the system can also be used in situations where there is no one to talk to.
Acknowledgments. This work was supported by JSPS KAKENHI Grant Number JP20H04232.

References
1. Public Opinion Survey on Japanese Language, Agency for Cultural Affairs, Government of Japan (2017). (in Japanese)
2. JTB: Communication Comprehensive Survey 3rd Report “Awareness of poor communication skills”. https://www.jtbcom.co.jp/article/hr/547.html. Accessed 26 Oct 2022. (in Japanese)
3. Watanabe, T.: Human-entrained embodied interaction and communication technology. In: Fukuda, S. (ed.) Emotional Engineering, pp. 161–177. Springer, London (2011). https://doi.org/10.1007/978-1-84996-423-4_9
4. Nagai, H., Watanabe, T., Yamamoto, M.: InterVibrator: speech-driven embodied entrainment system with a vibratory nodding response. In: Proceedings of the 6th International Workshop on Emergent Synthesis (IWES 2006), pp. 113–118 (2006)
5. Playing haptics: Platform considerations – Apple Developer Documentation. https://developer.apple.com/design/human-interface-guidelines/patterns/playing-haptics/. Accessed 26 Oct 2022
6. Kitamura, M., Kurokawa, S., Ishii, Y., Watanabe, T.: Impression evaluation of various nodding movements by humanoid 3D model. Correspondences on Human Interface, Human Interface Society, vol. 23, no. 6, pp. 1–8 (2021). (in Japanese)

Coordinated Motor Display System of ARM-COMS for Evoking Emotional Projection in Remote Communication
Teruaki Ito and Tomio Watanabe
Faculty of Computer Science and System Engineering, Okayama Prefectural University, 111 Kuboki, Soja, Okayama 719-1197, Japan
[email protected]
Abstract. The authors have proposed a coordinated motor display system called ARM-COMS (ARm-supported eMbodied COmmunication Monitor System), which detects the orientation of a subject’s head by face tracking based on image processing and physically moves a monitor to mimic the subject. The idea is that the monitor acts as an avatar of the remote subject during video communication and interacts with the local subject. In addition, ARM-COMS responds appropriately using voice signals even when the remote subject’s head movements cannot be detected. ARM-COMS also achieves high responsiveness by responding to the local subject’s voice. This paper introduces the basic concepts of ARM-COMS, describes how it was developed, and explains how the basic procedures were implemented in a prototype system. It then presents the results of teleconferencing experiments using ARM-COMS and discusses the findings, including the effect of physical interaction through ARM-COMS, the camera shake problem, and the evocation of emotional projection in remote communication.
Keywords: Embodied communication · Augmented tele-presence robotic arm · Robot operating system · Natural involvement · Emotional projection
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 379–388, 2023. https://doi.org/10.1007/978-3-031-35132-7_28

1 Introduction
Due to the spread of the novel coronavirus infection (COVID-19), people have been forced to adopt a new lifestyle in daily life in order to prevent the spread of infection, such as avoiding the 3 Cs (crowded, close, closed) and maintaining physical distance from others, as stated in the report of the Office for COVID-19 and Other Emerging Infectious Disease Control, Cabinet Secretariat, Government of Japan [4]. Under these circumstances, personal video communication tools have become widely used, not only for personal communication [3] but also for business meetings, conferences, telework, healthcare [19], etc. With DX technology spurred by IoT [17], Society 5.0 [27], and Industry 4.0 [17], these communication tools are expected to develop further using the latest technologies such as AR, VR, or Metaverse tools [5, 15]. However, video communication tools still have some significant problems, such as a lack of telepresence for participants and a lack of a sense of connection during video communication compared with typical face-to-face communication.
This research starts with the idea of a motion-enhanced display that provides a sense of presence through the movement of a physical monitor that corresponds to the virtual content displayed on the monitor screen. For example, it was recognized that physically rotating a monitor displaying a spinning football in accordance with the movement of the football could provide a far more realistic sensation than just displaying the spinning football on a monitor screen. This study applies the idea of the motion-coordinated display to a video communication system in order to enhance the presence of the other party by physically moving the monitor in conjunction with the movement of the other party displayed on the teleconference monitor. The resulting system is called ARM-COMS (ARm-supported eMbodied COmmunication Monitor System) [9–11] and is intended for human-computer interaction in which remote individuals are connected through augmented tele-presence systems. It was confirmed that the ARM-COMS motion caused a physical entrainment motion between the subject and the remote communication partner. The challenge of this idea is to design cyber-physical media using ARM-COMS for remote communication [10, 12] by controlling ARM-COMS using human body motions in remote locations as non-verbal messages [7].
After briefly explaining the background of the research, this paper presents the framework of the system to show the goal of this research. It then introduces a prototype system currently under development, reports the results of operation experiments conducted using this prototype system, and discusses evoking emotional projection in remote communication with ARM-COMS.

2 Research Background
Remote communication support tools that incorporate remote video and voice communication have spread remarkably because of the expansion of telework during the COVID-19 pandemic. Considering that remote conferencing systems would not have spread to this extent without the pandemic, this could be seen as a positive side effect. On the other hand, fundamental problems have been pointed out, such as the inability to convey the presence of the remote person, the inability to share the atmosphere of the place, and the inability to feel a relationship with the remote person.
As for the inability to convey the presence of the remote person, telepresence robots such as Beam+ and Double have been developed to enhance the presence of remote users, and their effectiveness and practicality have been reported. These robots can not only display the face image of the remote person, but also offer basic functions such as remote-controlled robot movement and robot telemanipulation under explicit remote control. Recently, robotic arm-type systems have drawn researchers’ attention [31]. For example, Kubi [14], a non-mobile arm-type robot, allows the remote user to “look around” during video communication by commanding Kubi where to aim the tablet through an intuitive remote control over the Internet. This function, however, is operated by the user’s explicit remote control and does not operate autonomously.
As for the inability to share the atmosphere of the place, it has been reported that some proposed ideas were recognized as effective in conveying the atmosphere of the
place. For example, anthropomorphization [23] is a new idea for showing the telepresence of a remote person in a communication system; it relies on the synergistic effect of the movement of the person projected on the screen and the physical movement of the screen itself. Even where sharing a physical space is not possible, other research has made it possible to share a virtual space. Virtual avatars are a promising approach to sharing a virtual space instead of a physical one. oVice [24] is a communication tool that allows users to talk easily with multiple people in real time in a virtual space on the Internet, called a metaverse.
As for the inability to feel a relationship with the remote person, bridging the lack of connectedness with remote interlocutors is a major challenge in remote video communication [30]. However, technology that provides a place to feel a “relationship” with the communication partner is still under study. The motion control of telepresence robots is premised on explicit operation by remote control, so such a robot can be regarded as a remote-controlled robot that operates as the user’s eyes, hands, and feet. This approach is recognized as effective in fields such as space exploration robots and underfloor inspection robots. However, the situation is different for remote communication support robots. The perspective of the person facing the robot is missing, and this is thought to be the reason why that person does not feel a relationship with the remote person even if the remote person’s presence is felt. The authors therefore focused on body motion as non-verbal information and proposed a method of expressing that motion with a motion-emphasizing display and using it to support remote communication.
In recent years, the measurement of biological responses, such as face/finger authentication, line-of-sight measurement, and electroencephalogram measurement, has attracted attention, and applications in various fields are expected. Face image recognition and voice recognition have been applied extensively in engineering research, and their use in this research is consistent with such trends. By combining the direct body motion signal obtained from head movement with the indirect body motion signal obtained from human biological information (eyes, nose, mouth), the motion of the tablet is physically linked to the movement of the remote communication partner displayed on the tablet. The authors were able to confirm the physical entrainment caused by this physical interaction. In order to mimic head motion using the display [7, 13], ARM-COMS detects the orientation of the face with a face-detection tool [25] based on an image processing technique [9]. However, ARM-COMS does not react appropriately if the remote talker speaks without explicit head motion. To solve this problem, voice signals [16] from the local interaction are used so that ARM-COMS takes an appropriate action [10] even when the remote partner makes no head motion. It has been found, however, that increasing the accuracy of the linkage with body movements does not necessarily lead directly to a sense of involvement. The authors then came across the idea of a human-symbiotic robot discussed by Madeline Gannon [1]: when an object reacts to humans like a living creature, humans project their emotions onto it. Previous studies have attempted to control motor-coordinated displays by combining multimodal information, but no matter how much
the accuracy is improved, they cannot move like living creatures. Therefore, this research is also studying deep generative models based on deep neural networks as a technology for automatically generating flexible robot motions in various environments. A deep generative model regards the various data obtained in the real world as being generated according to a specific probability distribution, and uses this distribution to sample new data and run simulations. By learning a generative model from a group of high-dimensional real data with a complex distribution, realistic and highly accurate pseudo data can be generated from the model. Representative examples of deep generative models include generative adversarial networks (GANs) and variational autoencoders (VAEs), which are being actively studied. This research attempts to apply a deep generative model to the new field of motion generation for motion-coordinated display type telepresence robots, a novel theme that has not been explored before.

3 System Configuration and Basic Function
3.1 Basic Concept and Control Mechanism for ARM-COMS Prototype System
Focusing on head and body movements as non-verbal information used in face-to-face communication, the authors have proposed the idea of a motion-enhanced display that provides a sense of presence through the movement of a physical monitor that corresponds to the virtual content displayed on the monitor screen. To realize this proposal, the authors implemented a prototype remote conference system called ARM-COMS (ARm-supported eMbodied COmmunication Monitor System), which integrates a tablet terminal with a robot arm that physically moves the tablet so that it mimics the movement of the remote partner in remote communication. ARM-COMS is composed of a tablet PC and a desktop robotic arm. The tablet PC is a common ICT device, and the desktop robot arm functions as a manipulator for the tablet. Its position and movement are controlled autonomously based on the motion of the remote user communicating via ARM-COMS. This autonomous operation is controlled by head movements recognized with a common USB camera through the concept of FaceNET [26]. The camera images are processed with the image processing library OpenCV [21] and the face analysis tool OpenFace [2, 22], which uses Conditional Local Neural Fields (CLNF) consisting of a point distribution model, patch experts, and fitting. Face detection is performed by a Haar cascade face detector, which uses brightness differences between rectangles of different sizes. Next, 68 facial landmarks are defined using the dlib library [6], and the subject’s head orientation is estimated by OpenFace. This dynamic head pose data is used to control ARM-COMS.
Figure 1 shows the system configuration of ARM-COMS and the control procedure described above. The ARM-COMS manipulator consists of 6 pairs of servo motors and motor
controllers, a single board computer running Ubuntu [18, 28] as the operating system with ROS [8, 20, 25, 29] as middleware, a speaker and microphone, and a camera.
(Figure 1 labels: prototype system integrated with ARM-COMS; remote site with remote user and network camera; local site with local user, network camera, and ARM-COMS comprising holder, camera, tablet, arm robot, controller, and image processor; Internet and Ethernet connections; AI PC; controller PC.)
Fig. 1. Basic framework of ARM-COMS system

3.2 Algorithm of Voice Signal-Based Control System
ARM-COMS mimics the head movements of remote communication partners during video conversations. In a test evaluation of the image-processing-based response, a response delay of 120 ms occurred in a stand-alone environment, and a longer delay of 210 ms was confirmed in a network environment. According to the experiments, this delay was not large enough to be a concern in a network environment. However, if the speaker’s movements are not pronounced, they are not reflected in the motion of ARM-COMS. It would be possible to switch the sensitivity level each time in order to respond, but the control would easily become too complicated. Therefore, this research adopted a method that also uses voice responses. The voice of the subject on the local side and the voice of the subject on the remote side are used in combination. Doing so ensures a timely response from ARM-COMS and keeps the video conference running smoothly. A speaker is considered to be speaking if the sound pressure level exceeds the threshold, and not speaking if it is below the threshold. Therefore, when the sound pressure level reaches the threshold value or more and then falls below it, a break in the conversation is assumed, and a nodding motion can be generated accurately by matching it to the timing of this break signal. Considering the case where the sound pressure level suddenly falls below the threshold during speech [16], the nodding
motion is generated only when the sound pressure level falls below the threshold within a certain waiting time.
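The break-detection rule described above can be sketched as follows. This is a minimal illustration under one possible reading of the “waiting time”: a nod is triggered only once the level has stayed below the threshold for a fixed number of consecutive frames after speech, so that brief dips during speech are ignored. The function name `nod_triggers` and the frame-based `wait_frames` parameter are assumptions for illustration, not the authors' implementation.

```python
def nod_triggers(levels, threshold, wait_frames):
    """Return the frame indices at which a nod would be triggered.

    levels: sound pressure level per frame.
    threshold: a frame counts as speech when its level is at or above it.
    wait_frames: assumed waiting time, in frames, that the level must
        stay below the threshold after speech before a nod fires.
    """
    triggers = []
    was_speaking = False
    silent_run = 0
    for i, level in enumerate(levels):
        if level >= threshold:
            # Speech frame: remember it and reset the silence counter.
            was_speaking = True
            silent_run = 0
        elif was_speaking:
            # Silence after speech: count how long it lasts.
            silent_run += 1
            if silent_run == wait_frames:
                triggers.append(i)
                was_speaking = False  # wait for new speech before the next nod
    return triggers
```

With a threshold of 3 and a wait of 3 frames, the one-frame dip at index 3 in `[0, 5, 5, 1, 5, 5, 1, 1, 1, 5, 1, 1, 1]` is ignored, and nods are triggered at frames 8 and 12.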

4 Experimental Evaluation with ARM-COMS
Basic motion control to imitate human interaction was confirmed in an ARM-COMS experiment. The motions include “Nodding”, “Sideway-shake”, and “Head-tilting”, each controlled by one actuator, as well as “Lean-forward”, “Vertical-move”, and “Diagonal-upward”, each controlled by two actuators, as shown in Fig. 2. Although all of these basic motions could be used in the interaction experiment, only “Nodding”, the most common reaction, was used in this experiment.
(Figure 2 panels: 1-axis motion: Nodding, Sideway-shake, Head-tilting; 2-axis motion: Lean forward, Vertical move, Diagonal upward.)
Fig. 2. Predefined motion examples of ARM-COMS
Fig. 3. Implemented action of ARM-COMS motion as “Nodding”
ARM-COMS detects the movement of the subject’s head and responds like a remotely placed avatar that follows the movement. Because subjects do not always make explicit head movements during speech, small movements need to be amplified, resulting in noisy ARM-COMS movements. Audio signals can be used for motion control even when no explicit head movements are recognized. Figure 3 shows an example of a video conference interaction based on ARM-COMS motion dynamically generated in this way. In this example, the subject’s head movement and local and remote subject audio signals are used jointly to control the motion of the ARM-COMS [16]. It was confirmed that the interaction with ARM-COMS was very
smooth and corresponded very well to the subjects. However, controlling ARM-COMS proved difficult: weak noise filtering allowed overly sensitive movements, while strong noise filtering slowed down the response.
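The trade-off between sensitivity and response speed can be illustrated with a simple exponential moving average. The filter actually used in ARM-COMS is not specified in the text, so this technique is swapped in purely as an example. The names `smooth` and `frames_to_reach` and the parameter `alpha` are hypothetical.

```python
def smooth(angles, alpha):
    """Exponential moving average of a sequence of angles (degrees).

    alpha in (0, 1]: 1.0 passes the input through unchanged (no
    filtering); smaller values suppress jitter more strongly but
    follow the input more slowly.
    """
    out = []
    y = None
    for x in angles:
        y = x if y is None else alpha * x + (1 - alpha) * y
        out.append(y)
    return out


def frames_to_reach(target_fraction, alpha):
    """Frames needed for a unit step, filtered from 0, to reach the
    given fraction of its final value."""
    if not (0 < alpha <= 1) or not (0 < target_fraction < 1):
        raise ValueError("alpha must be in (0, 1], target_fraction in (0, 1)")
    y, n = 0.0, 0
    while y < target_fraction:
        y = alpha * 1.0 + (1 - alpha) * y
        n += 1
    return n
```

With `alpha = 0.5`, a step input reaches 90% of its final value after 4 frames. With `alpha = 0.1`, it takes 22 frames. Stronger smoothing removes more of the amplified jitter but delays the response by the same mechanism.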

5 Results and Discussions
5.1 Effect of Physical Interaction with Remote Subject Through ARM-COMS
In contrast to ordinary video conferencing via PC, tablet, or smartphone, a physical entrainment motion was observed in teleconferencing via ARM-COMS. The authors have worked to further improve the accuracy of control using video and voice signals. Thanks to the control mechanism based on audio signals, ARM-COMS responded appropriately to voice signals during a call even when the remote subject responded without significant head movement. Furthermore, ARM-COMS achieves high responsiveness by responding to the voice of the local subject. Synchronizing with head and body movements more accurately can create a sense of presence, and the introduction of remote and local voice control has improved responsiveness. However, it was found that simply improving the accuracy of the linkage with head and body movements or voice signals is not enough to create a sense of involvement.
5.2 Camera Shake Problem
The physical response of the ARM-COMS monitor was effective in enhancing the presence of the remote subject. However, it turned out to be impractical because camera shake occurs and the angle of view on the monitor screen changes too much each time a nod is made. As a countermeasure, a camera equipped with an anti-shake function could be used. However, if the image of the remote site displayed on the local ARM-COMS monitor does not move when the local subject nods, the local subject cannot tell whether the remote ARM-COMS monitor is responding. Therefore, this problem cannot be solved simply by using a camera with an anti-shake function, and it remains a research subject that requires further investigation.
5.3 Evoking Emotional Projection in Remote Communication
According to the idea of a human-symbiotic robot discussed by Gannon, humans project their emotions onto an object when the object reacts to humans like a living creature. Previous studies have attempted to control motor-coordinated displays by combining multimodal information, but no matter how much the accuracy is improved, they cannot move like living creatures. Since emotions cannot be projected simply by linking ARM-COMS to human body movements and biometric information, this study introduces a movement generation method using machine learning, as shown in Fig. 4. To this end, a deep generative model using a deep neural network as a technology for automatically generating flexible motions for robots in various
environments is also being studied for application in this research. Training data taught directly through human movements will be linked with characteristic movements obtained from interaction with users to build a movement database that is learned through an associative model. The associative model generates associative information from the visual and auditory information of the network camera, the joint angle information of the angle sensor, and the verbal instructions uttered by the user, and the deep model is trained on this information. The deep generative model then generates joint angles and angular velocities as motion commands for the robot. By making the robot move like a living creature, the aim is to evoke emotions and create a sense of “relationship”.
Fig. 4. ARM-COMS motion generation method by deep generative model

6 Concluding Remarks
Focusing on head and body movements as non-verbal information in face-to-face communication, the authors have reported a method for creating a sense of relationship with a remote person by linking those movements with a motion-enhanced display. The system embodying this proposal is ARM-COMS (ARm-supported eMbodied COmmunication Monitor System), a teleconferencing system that integrates a tablet terminal with a robot arm that gives physical movement to the tablet terminal. The authors were able to confirm that the movement of ARM-COMS caused physical entrainment with the remote person. Efforts have been made to improve the accuracy of the control, but it has been found that improving the accuracy of the linkage with head and body movements alone is not enough to create a sense of involvement. The question of what kind of motion generation method works in order to project the confirmed physical
entrainment effect as “relationship” with the other party is the academic “question” at the core of this research project.
Acknowledgement. This work was partly supported by JSPS KAKENHI Grant Number JP22K12131, the Science and Technology Award 2022 of the Okayama Foundation for Science and Technology, and the Original Research Grant 2022 of Okayama Prefectural University. The authors would like to acknowledge Dr. Takashi Oyama, Mr. Hiroki Kimachi, Mr. Shuto Misawa, Mr. Kengo Sadakane, and Mr. Tetsuo Kasahara for implementing the basic modules, and all members of the Kansei Information Engineering Labs at Okayama Prefectural University for their cooperation in conducting the experiments.
References 1. Anotation. https://atonaton.com/. Accessed 12 Feb 2023 2. Baltrušaitis, T., Robinson, P., Morency, L.-P.: OpenFace: an open source facial behavior analysis toolkit. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, pp. 1–10 (2016). https://doi.org/10.1109/WACV.2016.747 7553 3. Bertrand, C., Bourdeau, L.: Research interviews by Skype: a new data collection method. In: Esteves, J. (ed.) Proceedings from the 9th European Conference on Research Methods, pp 70–79. IE Business School, Spain (2010) 4. Cabinet of Secretariat of Japan. https://corona.go.jp/en/. Accessed 12 Feb 2023 5. Dionisio, J.D.N., Burns III, W.G., Gilbert, R.: 3D virtual worlds and the metaverse: current status and future possibilities. ACM Comput. Surv. 45(3), 1–38 (2013). Article No 34. https:// doi.org/10.1145/2480741.2480751 6. Dlib C++ libraty. http://dlib.net/. Accessed 12 Feb 2023 7. Ekman, P., Friesen, W.V.: The repertoire or nonverbal behavior: categories, origins, usage, and coding. Semiotica 1, 49–98 (1969) 8. Gerkey, B., Smart, W., Quigley, M.: Programming robots with ROS. O’Reilly Media (2015) 9. Ito, T., Watanabe, T.: Motion control algorithm of ARM-COMS for entrainment enhancement. In: Yamamoto, S. (ed.) HIMI 2016. LNCS, vol. 9734, pp. 339–346. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40349-6_32 10. Ito, T., Kimachi, H., Watanabe, T.: Combination of local interaction with remote interaction in ARM-COMS communication. In: Yamamoto, S., Mori, H. (eds.) HCII 2019. LNCS, vol. 11570, pp. 347–356. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22649-7_28 11. Ito, T., Oyama, T., Watanabe, T.: Smart speaker interaction through ARM-COMS for health monitoring platform. In: Yamamoto, S., Mori, H. (eds.) HCII 2021. LNCS, vol. 12766, pp. 396–405. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78361-7_30 12. 
Kimachi, H., Ito, T., Watanabe, T.: Robotic arm control based on MQTT-based remote behavior communication. In: The Proceedings of Design & Systems Conference, vol. 27, Session ID 1206, p. 1206 (2017) 13. Kimachi, H., Ito, T.: Introduction of local interaction to head-motion based robot. In: The Proceedings of Design & Systems Conference. https://doi.org/10.1299/jsmedsd.2018.28. 2204 14. Kubi. https://www.kubiconnect.com/. Accessed 18 Feb 2023 15. Kumar, A., Haider, Y., Kumar, M., et al.: Using WhatsApp as a quick-access personal logbook for maintaining clinical records and follow-up of orthopedic patients. Cureus 13(1), e12900 (2021). https://doi.org/10.7759/cureus.12900
388
T. Ito and T. Watanabe
16. Lee, A., Kawahara, T.: Recent development of open-source speech recognition engine Julius. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2009) 17. Lokshina, I., Lanting, C.: A qualitative evaluation of IoT-driven eHealth: knowledge management, business models and opportunities, deployment and evolution. In: Kryvinska, N., Greguš, M. (eds.) Data-Centric Business and Applications. LNDECT, vol. 20, pp. 23–52. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-94117-2_2 18. Helmke, M., Joseph, E., Rey, J.A.: Official Ubuntu Book, Pearson, Edition 9 (2016) 19. Medical Alert Advice. www.medicalalertadvice.com. Accessed 12 Feb 2023 20. Quigley, M., Gerkey, B., Smart, W.D.: Programming Robots with ROS: A practical introduction to the Robot Operating System. O’Reilly Media (2015) 21. OpenCV. http://opencv.org/. Accessed 18 Feb 2023 22. OpenFace API Documentation. http://cmusatyalab.github.io/openface/. Accessed 18 Feb 2023 23. Osawa, T., Matsuda, Y., Ohmura, R., Imai, M.: Embodiment of an agent by anthropomorphization of a common object. Web Intel. Agent Syst. Int. J. 10, 345–358 (2012) 24. oVice. https://www.ovice.com/. Accessed 12 Feb 2023 25. Rviz. https://carla.readthedocs.io/projects/ros-bridge/en/latest/rviz_plugin/. Accessed 18 Feb 2023 26. Schoff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: IEEE Conference on CVPR 2015, pp. 815–823 (2015) 27. Society 5.0. https://www.japan.go.jp/abenomics/_userdata/abenomics/pdf/society_5.0.pdf. Accessed 12 Feb 2023 28. Ubuntu. https://www.ubuntu.com/. Accessed 18 Feb 2023 29. urdf/XML/Transmission. http://wiki.ros.org/urdf/XML/Transmission. Accessed 12 Feb 2023 30. Watanabe, T.: Human-entrained embodied interaction and communication technology. In: Fukuda, S. (eds.) Emotional Engineering, pp. 161–177. Springer, London (2011). https://doi. org/10.1007/978-1-84996-423-4_9 31. 
Wongphati, M., Matsuda, Y., Osawa, H., Imai, M.: Where do you want to use a robotic arm? And what do you want from the robot? In: International Symposium on Robot and Human Interactive Communication, pp. 322–327 (2012)
Fundamental Considerations on Representation Learning for Multimodal Processing
Kenya Jin’no1(B), Masato Izumi2, Saki Okamoto2, Mizuki Dai2, Chisato Takahashi2, and Tatsuro Inami2
1 Tokyo City University, Tokyo 158-8557, Japan [email protected]
2 Graduate School of Tokyo City University, Tokyo 158-8557, Japan https://www.comm.tcu.ac.jp/nel/
Abstract. In recent years, research in machine learning, particularly on artificial neural networks (ANNs), has been extremely active. This boom was triggered by the very high performance of an image classification system based on convolutional neural networks (CNNs) proposed in 2012. Beyond its high performance, the proposed system used GPGPU to reduce computational cost, the ReLU function to prevent the vanishing gradient phenomenon, and Dropout for regularization. The availability of easy-to-use deep learning frameworks such as TensorFlow and PyTorch also contributed to the rise of this type of research. As a result, a wide variety of systems and applications have been proposed; recent image generation models are a good example. On the other hand, theoretical analysis of such artificial neural networks is far from sufficient, and there is no rigorous explanation of why each system works well. In deep learning to date, emphasis has been placed on improving the quality of output data, and little consideration has been given to the quality of the information represented by the latent variables into which features of the input are extracted. Meanwhile, representation learning has begun to be improved by contrastive learning, in which latent variable representations of similar input data in the same modality are brought closer together. In this study, we assume a multimodal environment and examine the possibility of bringing the latent variable representations of similar input data in different modalities close together.
Keywords: representation learning · latent variables · neural networks
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 389–399, 2023. https://doi.org/10.1007/978-3-031-35132-7_29
1 Introduction
Roughly 30 years ago, our research topic was “Analysis and Synthesis of Artificial Neural Networks” [1]. At that time, we were conducting research on associative
memory with recurrent neural networks [2]. While we were doing these studies, artificial neural networks were very much in the limelight as part of the second research boom.
The first artificial neural network to come into the limelight was the perceptron, proposed by F. Rosenblatt in 1958 [3]. The perceptron was a very simple two-layer network of formal neurons, a neuron model proposed in 1943; it modeled vision and brain function and was able to classify inputs by learning. This was the beginning of machine learning. Because it could classify data by learning, the perceptron was expected to become a machine that could be programmed automatically. However, in 1969 Marvin Minsky and Seymour Papert published a book [4] pointing out that the two-layer perceptron cannot learn linearly inseparable problems; expectations for the perceptron were lost and the first research boom ended.
A multilayer perceptron with three or more layers can classify linearly inseparable input vectors, but it was unclear how to train such a network. The error back-propagation method [5], published in 1986, solved this problem. It replaces the binary output function used in perceptrons with a differentiable, continuous saturating function, so that the chain rule for differentiating composite functions can be applied across multiple layers, thereby enabling multilayer perceptrons to learn. With error back-propagation, the multilayer perceptron can learn to classify even linearly inseparable inputs and can therefore be applied to a wide variety of problems.
On the other hand, in 1982, J. J. Hopfield focused on recurrent neural networks in which neurons are not layered but mutually interconnected.
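The limitation pointed out by Minsky and Papert can be illustrated with a small sketch. It is not taken from the cited works: the function names, the learning rate and the hand-set hidden-layer weights are assumptions chosen for illustration. A single threshold unit trained with the perceptron rule converges on AND, which is linearly separable, but can never classify all four XOR patterns, whereas a network with one hidden layer represents XOR.

```python
# Perceptron learning rule on AND (linearly separable) and XOR (not separable).
# All names and constants here are illustrative assumptions.

def step(x):
    """Binary threshold output used by the classic perceptron."""
    return 1 if x >= 0 else 0


def train_perceptron(samples, epochs=100, lr=1):
    """Train one threshold unit; return (weights, bias, errors in last epoch)."""
    w = [0, 0]
    b = 0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = step(w[0] * x1 + w[1] * x2 + b)
            delta = target - out
            if delta != 0:
                errors += 1
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:  # every sample was classified correctly in this epoch
            break
    return w, b, errors


def xor_two_layer(x1, x2):
    """Hand-set network with one hidden layer that computes XOR."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR but not AND


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_errors = train_perceptron(AND)
_, _, xor_errors = train_perceptron(XOR)
print("AND errors in final epoch:", and_errors)
print("XOR errors in final epoch:", xor_errors)
print("two-layer XOR outputs:", [xor_two_layer(a, b) for (a, b), _ in XOR])
```

The hidden-layer weights are fixed by hand here; error back-propagation is what allows such weights to be found by learning once the threshold is replaced with a differentiable function.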
Hopfield clarified that when a recurrent neural network has symmetric coupling and no self-coupling, the system always acts to reduce the energy defined for it, and the state converges to an energy minimum [6]. By exploiting this property, a method for solving the traveling salesman problem (TSP) with recurrent neural networks has been proposed [7]. The idea is that, by designing the energy so that the shortest tour of the TSP corresponds to the minimum energy, the recurrent neural network searches for the optimal solution of the TSP as its state evolves. Error back-propagation and the Hopfield network led to the second artificial neural network boom.
The Hopfield network can also be applied to associative memory. Associative memory is a system that retrieves complete stored information by “associating” it with given incomplete information. Our proposed associative memory system consists of a recurrent neural network whose activation function is a bipolar piecewise-linear hysteresis element [8–10]. The desired memories are stored as stable fixed points or stable equilibrium points of the recurrent neural network. We therefore proposed a learning method by which an arbitrarily given desired memory becomes a stable fixed point or a stable equilibrium point, and investigated its stability, its domain of convergence, and the number of stable equilibrium points existing in the system. Although our proposed learning method can reliably memorize an arbitrarily given number of desired memories, it is difficult
to suppress the occurrence of spurious stable output memories. It is also difficult to learn additional desired memories. Furthermore, because large networks were difficult to simulate, experiments could only be conducted on small networks. We considered experiments on large-scale systems indispensable for research on artificial neural networks, but they were not feasible with the computational power available at that time.
Among other researchers as well, attention declined because simulating artificial neural networks requires very large computational resources, whereas their performance in applications was not as good as expected. In particular, the support vector machine (SVM) capable of nonlinear classification using the RBF kernel, proposed by B. E. Boser et al. in 1992 [11], is considered to be one of the reasons for the decline of research on artificial neural networks. The SVM can achieve classification accuracy equal to or better than that of artificial neural networks with far less computation. For this reason, many of the tasks previously realized with artificial neural networks were taken over by SVMs, which yielded much higher performance. Thus, the boom in research on artificial neural networks subsided in the late 1990s.
However, in 2012, artificial neural network research once again attracted a great deal of attention when an image classification system based on convolutional neural networks (CNN) demonstrated remarkable classification accuracy [12]. This CNN is called AlexNet after its proposer. AlexNet [12] uses a layered neural network with four or more layers, which had been considered difficult to train because of vanishing gradients and other factors.
AlexNet, however, can train networks four or more layers deep, whereas previously only artificial neural networks with about three layers could be trained. This is due to the introduction of convolutional operations, non-saturating activation functions, dropout regularization, and softmax functions into the system. Such learning of layered networks with four or more layers is called “deep learning” because the networks are deeper than in the past. AlexNet improved recognition accuracy by more than 10% compared with conventional systems using SVMs. As a result, artificial neural networks are once again in the limelight.
It is also important to note that GPGPU has become readily available through frameworks. Simulating neural networks requires a very large number of sum-of-products operations, which are equivalent to those used in image processing. Therefore, methods for high-speed, high-precision computation were developed using GPUs originally designed for image processing. The use of such GPUs for general computing purposes is called General-Purpose computing on Graphics Processing Units (GPGPU). TensorFlow and PyTorch allow users to use GPGPU for computation without being aware of it. This is one of the reasons for the recent rise of artificial neural network research.
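To make the phrase “sum-of-products operations” concrete, the following minimal sketch computes one fully connected layer in plain Python and counts the multiply-accumulate steps. The layer sizes, weights and function name are assumptions for illustration only. Each output is an independent sum of products, which is why the work maps well onto the parallel arithmetic units of a GPU.

```python
def dense_forward(weights, bias, x):
    """Fully connected layer: each output is an independent sum of products."""
    outputs = []
    macs = 0  # number of multiply-accumulate operations performed
    for row, b in zip(weights, bias):
        acc = b
        for w, v in zip(row, x):
            acc += w * v
            macs += 1
        outputs.append(acc)
    return outputs, macs


# Hypothetical 3-input, 2-output layer.
weights = [[1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]
x = [2.0, 4.0, 6.0]

y, macs = dense_forward(weights, bias, x)
print(y, macs)

# A single layer mapping a 784-pixel image to 100 units already needs
# 784 * 100 multiply-accumulate operations per input image.
print(784 * 100)
```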
In the following sections, we outline CNNs, which have been the focus of much attention in recent years, and discuss what needs to be solved in these systems.
2 Convolutional Neural Networks (CNN)
In 1962, Hubel and Wiesel showed that neurons in the visual cortex respond to light stimuli in the form of elongated slits, bars, and edges [13]. Inspired by Hubel and Wiesel’s model, Fukushima proposed the Neocognitron [14,15]. The Neocognitron applies convolutional operations to extract local features so that patterns can be recognized by the similarity of object shapes in input images. The cellular nonlinear/neural network proposed by Chua and Yang in 1988 [16,17] was likewise inspired by the model of Hubel and Wiesel. (This model was also abbreviated as CNN at that time, but as of 2022, CNN more often refers to convolutional neural networks.) Cellular nonlinear/neural networks are composed of analog activation elements and process information by generating spatio-temporal nonlinear dynamics through convolutional operations with surrounding elements. Various nonlinear dynamics are realized by “templates” that store the coefficients of the convolutional operations, and as a result, various types of information processing can be performed in parallel as analog operations [18]. However, since large-scale models have been difficult to implement, discrete-time cellular neural networks, which discretize the analog state, have also been proposed [19,20]. These discrete-time cellular neural networks employ time-varying templates for image feature extraction, halftoning, and so on. In cellular nonlinear/neural networks, the templates that control the convolution operation are basically predefined. In contrast, LeNet, proposed by LeCun et al. in 1989, applies error back-propagation to learn the weights of a Neocognitron-type network [21,22]. LeNet is a convolutional neural network with a repeated structure of convolutional and pooling layers. LeCun et al. showed that LeNet can identify handwritten digit images with extremely high accuracy [21,22]. Later, in 1998, LeCun et al. proposed a seven-layer system named LeNet-5 [23].
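The convolution and pooling operations described above can be sketched in a few lines. This is an illustration under stated assumptions rather than code from any of the cited systems: the 6 × 6 test image, the 3 × 3 vertical-edge template and the function names are hypothetical, and the template is fixed by hand, as in a cellular neural network, rather than learned as in LeNet.

```python
def conv2d(image, template):
    """Valid 2-D convolution in the cross-correlation form used by CNNs."""
    kh, kw = len(template), len(template[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * template[di][dj]
            row.append(s)
        out.append(row)
    return out


def max_pool2x2(fmap):
    """Non-overlapping 2 x 2 max pooling."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 6 x 6 image: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-set template that responds to a dark-to-bright vertical edge.
template = [[-1, 0, 1],
            [-1, 0, 1],
            [-1, 0, 1]]

feature_map = conv2d(image, template)
pooled = max_pool2x2(feature_map)
for row in feature_map:
    print(row)
print(pooled)
```

The feature map is non-zero only where the template straddles the edge, and pooling keeps that response while halving the resolution.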
LeNet-5 became the prototype of current convolutional neural networks. Although LeNet-5 had a very high identification capability for handwritten digit images, at that point it could only handle relatively small 32 × 32 grayscale images as input. AlexNet can be regarded as a version of LeNet-5 that can handle large color images as input. AlexNet was proposed in 2012 by Alex Krizhevsky et al. [12] and won the ILSVRC image classification contest with overwhelming accuracy compared with previous image identification systems [24]. AlexNet consists of five convolutional layers, three max pooling layers, and three fully connected layers. Its authors introduced a non-saturating activation function called ReLU to suppress vanishing gradients during gradient-based learning of multilayer neural networks. As a regularization method, they introduced a technique called Dropout. They also succeeded in significantly reducing
learning time by allowing GPUs developed for image processing to be used for learning. The success of AlexNet brought deep convolutional neural networks (CNN) into the limelight and attracted much attention. Since then, ResNet, proposed by He et al. in 2015 [25], has made it possible to train CNNs with more than 1,000 layers.
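Why a non-saturating activation such as ReLU suppresses vanishing gradients can be seen from a simplified calculation. The sketch below is an assumption-laden illustration: it ignores the weight matrices and follows a single path through ten layers whose pre-activations are all set to a hypothetical value of 0.5. It multiplies the activation derivatives that back-propagation accumulates along that path.

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def backprop_factor(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    factor = 1.0
    for z in pre_activations:
        factor *= grad_fn(z)
    return factor


# Hypothetical path through 10 layers, each with pre-activation 0.5.
pre_activations = [0.5] * 10

sigmoid_factor = backprop_factor(sigmoid_grad, pre_activations)
relu_factor = backprop_factor(relu_grad, pre_activations)
print("sigmoid:", sigmoid_factor)
print("ReLU:   ", relu_factor)
```

With the saturating function the factor shrinks geometrically with depth, while with ReLU it stays at 1 as long as the units on the path are active.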
Fig. 1. Auto Encoder composed of CNN (figure labels: 28 × 28 gray input, 784 dim; CNN; 2 dim; output 28 × 28 gray)
3 Representation Learning
Deep learning models are capable of learning complex problems from large amounts of high-dimensional data. Self-supervised learning can extract features from large amounts of data without labeling each item, and can thus perform representation learning, i.e., learning how to represent data with lower-dimensional latent variables. For example, the Auto Encoder (AE) proposed by G. E. Hinton and R. R. Salakhutdinov in 2006 [26] reduces the dimensionality of the input image in the encoder layers and restores it in the decoder layers. The network is trained to minimize the squared error between input and output, with the intention of making them identical. This produces a low-dimensional latent variable into which the features of the input are extracted. Consider, for example, the AE shown in Fig. 1. The encoder part of this AE reduces the dimensionality of the 784-dimensional (28 × 28 pixels) input image through convolution layers and finally encodes it into a 2-dimensional code. The decoder part then decodes 784-dimensional data from this 2-dimensional code. Figure 2 shows examples from the MNIST data set [27] of handwritten digits consisting of 28 × 28 pixels, together with the 2-D codes produced by the AE of Fig. 1 and the output images decoded from these codes. The “Input” column in Fig. 2 contains the input MNIST handwritten digit images, the two numbers are the 2-D encoded data, and the “Output” column contains the output images decoded from these 2-D encoded data.
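The training principle of the AE, minimizing the squared error between input and reconstruction through a low-dimensional code, can be sketched with a toy model. The sketch below is deliberately not the CNN-based AE of Fig. 1: it is a linear autoencoder in plain Python that compresses hypothetical 4-dimensional inputs to a 2-dimensional code, and the data, initial weights, learning rate and step count are all assumptions chosen so that the run is small and reproducible.

```python
def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mse(enc, dec, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        y = matvec(dec, matvec(enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def train_autoencoder(data, latent_dim=2, lr=0.02, steps=3000):
    """Gradient descent on the reconstruction error of a linear autoencoder."""
    n = len(data[0])
    # Fixed, asymmetric initial weights so the run is reproducible.
    enc = [[0.1 * ((i + j) % 3 + 1) for j in range(n)] for i in range(latent_dim)]
    dec = [[0.1 * ((i + 2 * k) % 3 + 1) for k in range(latent_dim)] for i in range(n)]
    losses = [mse(enc, dec, data)]
    for _ in range(steps):
        g_enc = [[0.0] * n for _ in range(latent_dim)]
        g_dec = [[0.0] * latent_dim for _ in range(n)]
        for x in data:
            z = matvec(enc, x)   # encode to the latent code
            y = matvec(dec, z)   # decode back to input space
            err = [yi - xi for yi, xi in zip(y, x)]
            # Gradient of the squared error with respect to decoder weights.
            for i in range(n):
                for k in range(latent_dim):
                    g_dec[i][k] += 2 * err[i] * z[k]
            # Back-propagate the error to the latent code, then to the encoder.
            dz = [sum(2 * err[i] * dec[i][k] for i in range(n))
                  for k in range(latent_dim)]
            for k in range(latent_dim):
                for j in range(n):
                    g_enc[k][j] += dz[k] * x[j]
        m = len(data)
        for i in range(n):
            for k in range(latent_dim):
                dec[i][k] -= lr * g_dec[i][k] / m
        for k in range(latent_dim):
            for j in range(n):
                enc[k][j] -= lr * g_enc[k][j] / m
        losses.append(mse(enc, dec, data))
    return enc, dec, losses


# Hypothetical 4-dimensional inputs that lie in a 2-dimensional subspace.
data = [[1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0, 1.0]]

enc, dec, losses = train_autoencoder(data)
print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
print("2-D codes:   ", [matvec(enc, x) for x in data])
```

The printed 2-D codes play the role of the two numbers shown for each digit in Fig. 2; no labels are used at any point in the training.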
Fig. 2. MNIST [27] handwritten character data, their 2D encoded data and decoded images (columns: Input, Output)
We visualize this two-dimensional encoded data in Fig. 3. Each number in Fig. 3 corresponds to an input digit, and each color indicates the region occupied by that digit. These results indicate that the AE creates 10 clusters depending on the shape of the input handwritten digits. If the shape features are well extracted in this way, this information can be used to construct a classifier with good performance. On the other hand, if the MNIST handwritten digits are colored, clusters may be formed by color, as shown in Fig. 4. In the experiment that produced Fig. 4, randomly colored MNIST handwritten digit data were input. In these results, clusters are formed for each color, although, depending on the coloring, clusters are also formed according to the shape of the digits. In other words, the distribution shown in Fig. 3 is favorable for the task of identifying the shape of the input, whereas the distribution shown in Fig. 4 is more favorable for tasks that identify the color of the input rather than its shape. To construct a system that can be used for a variety of tasks, it is important to have latent variables that combine these characteristics, but they are difficult to control. The study of how to construct such latent variables can be described as representation learning.
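One simple way to quantify whether a latent distribution is “favorable” for a given task is to check how well a nearest-centroid classifier works on the codes under different labelings. The sketch below uses hand-made 2-D codes, not the data behind Fig. 3 or Fig. 4; the points, labels and function names are assumptions chosen to mimic a latent space that is clustered by color rather than by digit.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest_centroid_accuracy(codes, labels):
    """Fraction of codes whose nearest class centroid carries their own label."""
    groups = {}
    for code, label in zip(codes, labels):
        groups.setdefault(label, []).append(code)
    centroids = {label: centroid(pts) for label, pts in groups.items()}
    correct = 0
    for code, label in zip(codes, labels):
        best = min(
            centroids,
            key=lambda c: (code[0] - centroids[c][0]) ** 2
            + (code[1] - centroids[c][1]) ** 2,
        )
        correct += best == label
    return correct / len(codes)


# Hypothetical 2-D codes that form two tight clusters.
codes = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (0.3, 0.0),
         (5.0, 5.0), (5.2, 5.1), (5.1, 4.9), (4.9, 5.2)]
colors = ["red"] * 4 + ["blue"] * 4   # labels that follow the clusters
digits = [0, 1, 0, 1, 0, 1, 0, 1]     # labels that cut across the clusters

color_accuracy = nearest_centroid_accuracy(codes, colors)
digit_accuracy = nearest_centroid_accuracy(codes, digits)
print("color task:", color_accuracy)
print("digit task:", digit_accuracy)
```

The same codes support the color task well and the digit task poorly, which mirrors the contrast between the two distributions discussed above.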
Fig. 3. Two-dimensional encoded data. Each color corresponds to the shape of an input digit.
Deep learning has been utilized for a variety of input modalities, and its feature extraction can be regarded as representation learning. For example, Sentence-BERT [28] produces a latent variable vector that captures the meaning of the input sentence. We have previously proposed a system, shown in Fig. 5, that generates an image from the latent variables obtained from a sentence [29,30]. As shown in Fig. 6, this system outputs an image of a car in the corresponding color from a sentence describing the “car model name and car color”. Figure 6(a) shows images generated from sentence vectors obtained by inputting into Sentence-BERT a “car model and color” sentence that is included in the training data. Figure 6(b), on the other hand, shows images generated from sentence vectors obtained when a sentence with a car color that does not exist in the training data is input to Sentence-BERT. This result indicates that Sentence-BERT generates a sentence vector, i.e., a latent variable, that represents the color described in the sentence, and that the system is able to generate an image corresponding to this vector; however, it is not clear how the color is represented within the sentence vector. Nevertheless, this result suggests that the features expressed by the latent variables are not modality-dependent but are generated as meta-information common to various modalities.
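The idea of bringing latent representations of similar inputs from different modalities close together can be expressed as a contrastive loss on cosine similarities. The following sketch is only an illustration of that idea, not the method used in [29,30]: the 2-dimensional “text” and “image” vectors are hypothetical stand-ins for encoder outputs, and the InfoNCE-style loss and the temperature value are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_vecs, image_vecs, temperature=0.1):
    """Mean contrastive loss: text i should match image i better than the rest."""
    loss = 0.0
    for i, t in enumerate(text_vecs):
        logits = [cosine(t, img) / temperature for img in image_vecs]
        m = max(logits)  # subtract the maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum - logits[i]
    return loss / len(text_vecs)


# Hypothetical latent vectors for two sentences and two images.
text = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[0.9, 0.1], [0.1, 0.9]]   # image i lies near text i
image_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairs are mismatched

loss_aligned = info_nce(text, image_aligned)
loss_swapped = info_nce(text, image_swapped)
print("aligned pairs:   ", loss_aligned)
print("mismatched pairs:", loss_swapped)
```

Minimizing such a loss would push the encoders of both modalities toward a shared latent space in which matching inputs are neighbors.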
Fig. 4. 2 dimensional latent space
Fig. 5. System for generating images from text describing the image
Fig. 6. The output image of the system as shown in Fig. 5. (a) Trained data: input sentence, original images, generated images. (b) Untrained data: input sentence, generated images
4 Conclusions
In this paper, we surveyed the history of artificial neural network research and introduced representation learning with artificial neural networks. Representation learning is very important in the utilization of artificial neural networks. At present, however, research has concentrated on the many systems that can be realized with artificial neural networks, and analysis of their internal representations is insufficient. In the future, we will analyze what is learned in representation learning.
Acknowledgement. This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (C) Number 20K11978. Part of this work was carried out under the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University. Part of this work was also carried out under the Future Intelligence Research Unit, Tokyo City University Prioritized Studies.
References
1. Jin’no, K., Saito, T.: Analysis and synthesis of continuous-time hysteretic neural networks. IEICE Trans. A J75-A(3), 552–556 (1992)
2. Jin’no, K., Saito, T.: A novel synthesis procedure for a continuous-time hysteretic associative memory. IEICE Trans. D-II J76-D-II(10), 2233–2239 (1993)
3. Rosenblatt, F.: The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386–408 (1958)
4. Minsky, M., Papert, S.: Perceptrons. MIT Press, Cambridge (1969)
5. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
6. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. USA 79(8), 2554–2558 (1982)
7. Hopfield, J.J., Tank, D.W.: “Neural” computation of decisions in optimization problems. Biol. Cybern. 52, 141–152 (1985)
8. Saito, T., Oikawa, M., Jin’no, K.: Dynamics of hysteretic neural networks. In: Proceedings of 1991 IEEE International Symposium on Circuits and Systems (ISCAS 1991 Singapore), pp. 1469–1472 (1991)
9. Jin’no, K., Saito, T.: Analysis and synthesis of a continuous-time hysteresis neural network. In: Proceedings of 1992 IEEE International Symposium on Circuits and Systems (ISCAS 1992 San Diego), pp. 471–475 (1992)
10. Jin’no, K., Saito, T.: Analysis and synthesis of continuous-time hysteretic neural networks. IEICE Trans. A J75-A(3), 552–556 (1992)
11. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of COLT 1992, pp. 144–152 (1992)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS 2012, pp. 1097–1105 (2012)
13. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160(1), 106–154 (1962)
14. Fukushima, K.: Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. IEICE Trans. A J62-A(10), 658–665 (1979)
15. Fukushima, K.: Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980)
16. Chua, L.O., Yang, L.: Cellular neural networks: theory. IEEE Trans. Circuits Syst. 35, 1257–1272 (1988)
17. Chua, L.O., Yang, L.: Cellular neural networks: applications. IEEE Trans. Circuits Syst. 35, 1273–1290 (1988)
18. Chua, L.O., Roska, T.: Cellular Neural Networks and Visual Computing. Cambridge University Press, Cambridge (2002)
19. Harrer, H., Nossek, J.A.: Discrete-time cellular neural networks. Int. J. Circuit Theory Appl. 20(5), 453–467 (1992)
20. Harrer, H.: Multiple layer discrete-time cellular neural networks using time-variant templates. IEEE Trans. Circuits Syst. II 40(3), 191–199 (1993)
21. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
22. LeCun, Y., et al.: Handwritten digit recognition with a back-propagation network. In: Proceedings of NIPS 1989, pp. 396–404 (1989)
23. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
24. Large Scale Visual Recognition Challenge 2012 (ILSVRC 2012). https://www.image-net.org/challenges/LSVRC/2012/results.html
25. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of CVPR 2016 (2016)
26. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
27. Handwritten digit database. http://yann.lecun.com/exdb/mnist/
28. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of 2019 EMNLP-IJCNLP (2019)
29. Izumi, M., Jin’no, K.: Investigation of the influence of datasets on image generation using sentence-BERT. In: 2022 International Conference of Nonlinear Theory and its Applications (NOLTA 2022), pp. 252–255 (2022)
30. Izumi, M., Jin’no, K.: Feature analysis of sentence vectors by an image generation model using sentence-BERT. IEICE NOLTA E14-N(2), 508–519 (2023)
A Fundamental Study on Discrimination of Dominant Hand Based on Motion Analysis of Hand Movements by Image Analysis Using Deep Learning
Takusige Katura(B)
Tokyo City University, Setagaya, Tokyo, Japan [email protected]
Abstract. Recent advances in image analysis technology using deep learning have made it possible to analyze human motion more easily. Traditionally, such analysis has involved attaching physical markers and measuring their positions optically or electromagnetically, or manually labeling joint positions in images. These methods take time to prepare, which makes it difficult to measure without causing psychological stress to the subject, and they do not allow movements in daily-life environments to be analyzed in a simple manner. An image analysis method based on deep learning, by contrast, makes it possible to analyze human behavior with little effort, and is expected to be applied to behavior analysis in which the preparation for measurement has a considerable influence on the results. In this report, as a preliminary study, hand movements of healthy adults were analyzed. Analysis of video recordings of finger tapping with the dominant and nondominant hands showed that the two can be discriminated on the basis of tapping speed, as in previous studies.
Keywords: Handedness · Motor control · MediaPipe
1 Introduction
In everyday life, people perform actions involving joint movements unconsciously. It should therefore be possible to extract information about subconscious cognition from motion capture. Previous research on movement has used motion capture with optical markers and electromagnetic sensors, but there is a concern that the measurement itself may influence the subject’s psychological state and movement. Another method is to manually annotate captured video images, but this is not suitable for analyzing a large quantity of video because of the time required. There is also research that uses devices such as tablets to trace fingertip movements, but this approach cannot capture the movements of joints that are not in contact with the tablet.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 400–409, 2023. https://doi.org/10.1007/978-3-031-35132-7_30
Recent advances in deep learning technology have made it relatively easy to automatically detect joints in marker-free video images and to convert joint movements into coordinate data [1–3]. By utilizing such technology, detailed analysis of hand motions can be performed on video taken in more natural environments. Applications include, for example, large-scale Internet-based surveys in the marketing field [4–6]. Until now, only questionnaire responses and reaction times could be collected, but by analyzing hand motions with a PC camera it becomes possible to quantify hand movements before and after a response and to obtain qualitative information about the response, for example, whether the respondent hesitated or answered immediately. In this study, we obtained video images of multiple subjects tapping their fingers with the dominant and nondominant hands separately, and examined whether the dominant hand could be correctly identified from the automatically analyzed tapping movements. For recording, we used the video function of a smartphone, which is the easiest device to use.
2 Method
2.1 Finger Tapping Recording
Nine volunteers (ages 21–23, 7 males and 1 female, all right-handed) participated in the video recording. The participants were instructed to remain as relaxed as possible and to tap each finger from index to pinky against the thumb, repeatedly and as quickly as possible, for approximately 10 s. An Xperia1 (Sony) was used to record the tapping. Video was captured in full HD resolution (1920 × 1080) at a frame rate of 24 FPS. The palm of the hand was recorded from directly above to minimize errors in the estimation of joint positions during image analysis.
2.2 Finger Motion Analysis
MediaPipe Hands [1] was used to extract finger joints and fingertips from each frame of the video. Although several tools are available for finger image analysis, we chose the most readily available tool for this study. MediaPipe Hands analyzes images according to several parameters. The parameter settings used in this study are shown in Table 1.
Table 1. Analysis parameters of MediaPipe Hands

| Parameter Name | Description | Value |
|---|---|---|
| static_image_mode | Whether to treat the input images as a batch of static and possibly unrelated images, or a video stream | FALSE |
| max_num_hands | Maximum number of hands to detect | 1 |
| min_detection_confidence | Minimum confidence value ([0.0, 1.0]) for hand detection to be considered successful | 0.1 |
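The settings in Table 1 could be passed to the legacy MediaPipe Python `solutions` interface roughly as follows. This is a minimal sketch under stated assumptions, not the author's script: the names `HANDS_SETTINGS`, `fingertips`, and `extract_landmarks`, the use of OpenCV to read frames, and the video path are all illustrative. The fingertip IDs (#4, 8, 12, 16, 20) are the representative points used in the analysis of this study.

```python
# Landmark IDs of the five fingertips used as representative points.
FINGERTIP_IDS = (4, 8, 12, 16, 20)

# Settings taken from Table 1.
HANDS_SETTINGS = {
    "static_image_mode": False,
    "max_num_hands": 1,
    "min_detection_confidence": 0.1,
}


def fingertips(landmarks):
    """Pick the five fingertip points out of 21 (x, y, z) landmarks."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 landmarks")
    return [landmarks[i] for i in FINGERTIP_IDS]


def extract_landmarks(video_path):
    """Return, per frame, a list of 21 (x, y, z) tuples, or None if no hand was found.

    Assumes the opencv-python and mediapipe packages are installed.
    """
    import cv2
    import mediapipe as mp

    frames = []
    capture = cv2.VideoCapture(video_path)
    with mp.solutions.hands.Hands(**HANDS_SETTINGS) as hands:
        while True:
            ok, frame_bgr = capture.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads frames as BGR.
            results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                hand = results.multi_hand_landmarks[0]
                frames.append([(p.x, p.y, p.z) for p in hand.landmark])
            else:
                frames.append(None)
    capture.release()
    return frames
```

In this interface the x and y values are normalized to the image width and height, so they would need to be scaled if pixel coordinates are wanted.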
The analysis was performed on a PC (Windows 11, Core i7-12700KF, RTX3090Ti) using Python 3.9. MediaPipe Hands output the coordinates of 21 key points of the hand, as shown in Fig. 1; the output format is described in Table 2. To simplify the analysis, the coordinates of the five fingertips (#4, 8, 12, 16, and 20) were used as representative points of the tapping movements. As a preprocessing step in the motion analysis, the baseline was corrected by fitting a quadratic function to each time series and subtracting the fitted curve.

Table 2. Detailed explanation of MediaPipe Hands’ output

| Output Content | Description |
|---|---|
| hand_landmarks | A list of the coordinates of the detected hand landmarks. Each hand landmark is represented by a dictionary that has x, y, and z coordinates. index: the ID of this landmark; x: the x-coordinate in the image coordinate system; y: the y-coordinate in the image coordinate system; z: the depth z-coordinate if the landmark is represented in 3D space |
| handedness | A list of dictionaries that contain the handedness (left or right) and confidence score of the detected hands. Each dictionary has two keys, “label” and “score”. label: the handedness of the hand, “Left” or “Right”; score: the confidence score of hand detection, within the range of 0 to 1 |
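The quadratic baseline correction used as preprocessing can be sketched in plain Python as a least-squares fit followed by subtraction. The function names `fit_quadratic` and `subtract_baseline` are hypothetical, and the paper does not say which fitting routine was used; this is one straightforward way to do it.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a + b*t + c*t**2; returns (a, b, c)."""
    # Build the 3x3 normal equations from power sums of t.
    s = [sum(t ** k for t in ts) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(m[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (m[i][3] - known) / m[i][i]
    return tuple(coeffs)


def subtract_baseline(ts, ys):
    """Remove the fitted quadratic trend from the series."""
    a, b, c = fit_quadratic(ts, ys)
    return [y - (a + b * t + c * t * t) for t, y in zip(ts, ys)]
```

Here `ts` would be the frame times (for example, frame index divided by 24 for the 24 FPS recordings) and `ys` one coordinate of one fingertip.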
From these changes in fingertip coordinates, we examined the differences between dominant and nondominant hands. One of the simplest ways to compare the two is to compare tapping speeds [7–9]. Therefore, the tapping speed was estimated by frequency analysis of the changes in fingertip coordinates: the sum of squares of the X and Y coordinates was subjected to frequency analysis, and the peak frequency was extracted.
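The peak-frequency extraction can be illustrated with a small discrete Fourier transform written in plain Python. This is only a sketch: the paper does not specify the spectral estimator, and the names `sum_of_squares` and `peak_frequency` are assumptions. A direct DFT is used here for self-containment; an FFT routine would normally be preferred.

```python
import math


def sum_of_squares(xs, ys):
    """Combine X and Y coordinates into one series, x**2 + y**2 per frame."""
    return [x * x + y * y for x, y in zip(xs, ys)]


def peak_frequency(signal, fps):
    """Return the frequency (Hz) of the largest non-DC spectral peak."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    # Bin k corresponds to k * fps / n Hz.
    return best_k * fps / n
```

With a 10 s recording at 24 FPS (240 frames), the frequency resolution of this estimate is 0.1 Hz.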
Fig. 1. Key points output from MediaPipe Hands. Hand Landmarks Detection Results. The image shows a person’s hand with 21 detected landmarks marked by small blue dots. The landmarks include key points such as the fingertips, the base of the palm, and the knuckles, which can be used for hand tracking and gesture recognition applications.
3 Results
An example of hand motion analysis results is shown in Fig. 2. MediaPipe automatically detects hand regions in the input image, estimates 21 hand landmarks such as joints, and outputs them as X, Y coordinates on the image together with estimated Z coordinates. Figure 2-A shows the input image, 2-B plots the X, Y coordinates of the estimated joints, and 2-C shows the estimated joint positions overlaid on the input image. The results show that the landmarks are estimated at reasonable locations.
Fig. 2. Example of MediaPipe Hands results. (A) Input image, which is one frame of the recorded finger tapping motion. (B) Plot of the X, Y coordinates of the estimated landmarks of joints and fingertips. Black dots indicate the positions of landmarks, red lines show the fingers, and green lines show the palm. (C) Estimated landmark positions overlaid on the input image.
Figure 3 shows the estimation results for the X, Y, and Z coordinates for a frame in which the thumb and ring finger are in contact. The view from the top of the palm (Fig. 3(B)) appears to be correctly estimated. In the front and side views (A, C), however, the thumb and ring finger are far apart, indicating that the estimation in the Z-axis direction is incorrect. The same tendency was seen in all estimation results. For this reason, the Z coordinate was not used in the subsequent analysis, and only the X and Y coordinates were used.
Fig. 3. Example of estimated 3D coordinates. The X and Y axes represent the horizontal and vertical axes of the input image, respectively. The Z axis is the depth direction of the input image. (A) Front view of the estimated result. The horizontal axis represents X and the vertical axis represents Z. The thumb and 3rd finger are not in contact. (B) Top view of the estimated result, drawn on the same axes as the input image. The thumb and 3rd finger are in contact from this viewpoint. (C) Side view of the estimated result. The horizontal axis represents Y and the vertical axis represents Z. The thumb and 3rd finger are not in contact.
Fig. 4. Raw data of all participants. (1)–(9) correspond to participants 1 to 9. The upper row shows the results for the dominant hand and the lower row shows the results for the non-dominant hand. From left to right, the thumb and the first to fourth fingers are shown. Values are baseline-corrected from the raw signal using a second-order polynomial. The thumb repeatedly contacts the first to fourth fingers in turn, so each of the first to fourth fingers moves only one quarter as often as the thumb, as seen in the graphs.
To simplify the analysis of finger tapping movements, we focused only on the coordinates of the five fingertips. The changes in these coordinates for all subjects are shown in Figs. 4(1)–(9). Only the X or the Y coordinate is shown for each recording, whichever makes the finger movements easier to see given the orientation of the hand (vertical or horizontal) during recording. All analysis results were checked visually, and no obvious outliers were found in the XY coordinates. The tapping speed (frequency) of the left and right hands was calculated from the thumb movement, which showed the most stable change in all subjects. As an example, the power spectral density of one participant’s fingertip position changes during tapping is shown in Fig. 5. In this figure, the tapping speed is faster in the dominant hand than in the non-dominant hand. A distinct frequency peak was detected in all subjects and was consistent with the visually confirmed tapping frequency. Figure 6 shows the ratio of the non-dominant hand tapping frequency to the dominant hand tapping frequency.
Fig. 5. Power spectral density of temporal changes in finger position due to tapping for a single participant. There are peaks between 1.0 and 2.0 Hz for both the dominant and the non-dominant hand. These peaks show that the tapping speed is faster in the dominant hand than in the non-dominant hand.
For seven subjects, a slower tapping speed was detected for the non-dominant hand than for the dominant hand. For one subject, the tapping speed was faster for the nondominant hand, and for another subject, there was no difference between the two speeds.
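The comparison summarized above amounts to a ratio of two peak frequencies and a simple decision rule. The sketch below is illustrative only: the function names and the `tolerance` parameter are assumptions, and the rule "the faster hand is the dominant hand" is the working hypothesis being examined, not a validated classifier.

```python
def speed_ratio(dominant_hz, nondominant_hz):
    """Non-dominant tapping frequency divided by dominant tapping frequency."""
    if dominant_hz <= 0:
        raise ValueError("dominant_hz must be positive")
    return nondominant_hz / dominant_hz


def guess_dominant_hand(left_hz, right_hz, tolerance=0.0):
    """Guess the dominant hand as the faster one; 'undetermined' if within tolerance."""
    if abs(left_hz - right_hz) <= tolerance:
        return "undetermined"
    return "right" if right_hz > left_hz else "left"
```

A ratio below 1 corresponds to a slower non-dominant hand. Under this rule, the subject with equal speeds would be left undetermined and the subject with the faster non-dominant hand would be misidentified.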
Fig. 6. Finger tapping speed ratio: the ratio of the non-dominant hand tapping frequency to the dominant hand tapping frequency. For seven subjects, the non-dominant hand tapped more slowly than the dominant hand. For one subject, the tapping speed was faster for the non-dominant hand. For another subject, there was no difference between the two speeds.
4 Discussion
In this study, to explore the possibility of simplifying the study of hand movement, we used MediaPipe Hands, an image analysis technology that automatically detects hand landmarks, to extract 21 landmarks, including joints, from smartphone video of finger tapping, and examined the differences in tapping speed between dominant and non-dominant hands. We believe that the results sufficiently demonstrate the practicality of automatic hand analysis using image analysis technology. When the hand is clearly captured in the video, as in the present study, no obvious detection errors are observed and stable analysis results can be expected. However, in preliminary recordings, obvious errors in joint estimation were observed in video in which some of the fingers were hidden because of an inappropriate shooting angle. In addition, although MediaPipe Hands also estimates the depth position, the estimation results contained obvious errors and were not at a practical level. Although not used in this study, some methods using other estimation tools (****) improve estimation, including in the depth direction, for example by constraining the fingers as 3D objects. The progress of image analysis technology using deep learning has been remarkable, and methods with higher estimation accuracy are expected to be devised soon, which will further expand the potential for application.
Assuming that video data will be collected in daily-life environments in the future, the videos in this study were taken using only a smartphone. The recording conditions were set to Full HD at approximately 24 FPS so that they could be reproduced with a popular smartphone from a few years ago. Even with video of this specification, it was possible to detect the tapping speed. Since smartphone video recording has recently improved, we believe that recording at higher resolutions and frame rates will make it possible to detect detailed differences between conditions.
The detected finger tapping speed was faster for the dominant hand in almost all subjects, consistent with previous studies. The subject who was faster with the non-dominant hand was confirmed in a later interview to be a proficient piano player. Some of the other subjects had learned piano in the past (until elementary school) and some had not, but no clear differences were found between them. It is interesting that such background information about a subject was revealed by the speed of simple finger tapping alone. In the future, we would like to extract more background information about subjects, such as their attributes, personality tendencies, and subconscious states, from a more detailed analysis of finger tapping.
5 Conclusion
In this study, we automatically extracted finger movements during tapping by image analysis and examined whether differences in tapping speed between dominant and non-dominant hands could be discriminated from video captured by a common smartphone. Visual inspection confirmed that hand landmarks in the two-dimensional plane were correctly extracted. The tapping speed could also be extracted from the temporal change in the positions of the extracted landmarks, indicating that the difference in tapping speed between the left and right hands could be detected.
References
1. Zhang, F., et al.: Mediapipe hands: on-device real-time hand tracking. arXiv preprint arXiv:2006.10214 (2020)
2. Cao, Z., et al.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(01), 172–186 (2021)
3. Moon, G., Choi, H., Lee, K.M.: Accurate 3D hand pose estimation for whole-body 3D human mesh estimation. In: Computer Vision and Pattern Recognition Workshop (2022)
4. Hoffman, L., Novak, P.: Marketing in hypermedia computer-mediated environments: conceptual foundations. J. Mark. 60(3), 50–68 (1996)
5. Roztocki, N.: Using internet-based surveys for academic research: opportunities and problems. In: Proceedings of the 2001 American Society for Engineering Management (ASEM) National Conference (2001)
6. Hung, K., Law, R.: An overview of Internet-based surveys in hospitality and tourism journals. Tour. Manage. 32(4), 717–724 (2011)
7. Hammond, G., Bolton, Y., Plant, Y., Manning, J.: Hand asymmetries in inter-response intervals during rapid repetitive finger tapping. J. Mot. Behav. 20, 67–71 (1988)
8. Heuer, H.: Control of the dominant and nondominant hand: exploitation and taming of nonmuscular forces. Exp. Brain Res. 178, 363–373 (2007)
9. Todor, J.I., Kyprie, P.M.: Hand differences in the rate and variability of rapid tapping. J. Mot. Behav. 12, 57–62 (1980)
Glasses Encourage Your Choices: A System that Supports Indecisive Choosers by Eye-Tracking
Tatsuya Komatsubara and Satoshi Nakamura
Meiji University, Tokyo, Japan
[email protected]
Abstract. People often agonize over making a choice. To address this problem, we propose a system that, like a store clerk, recommends products when the chooser needs help. In this study, we focus on the fact that people tend to gaze at what they are interested in. Our system observes eye movement during selection and, when a person is having difficulty choosing, encourages selection by suggesting the product they have looked at most. We developed a prototype that implements the proposed method using a wearable eye-tracking system. To test the system’s usefulness, we conducted two selection experiments: selection from a menu (number of options: N = 20) and selection from a catalog (N = 116). In the selection from a menu, the system reduced the time required for selection and resulted in high selector satisfaction. There was no such tendency in the selection from a catalog. Gaze movements during catalog selection were more complex than during menu selection, suggesting that the timing of the recommendation may need to be considered.
Keywords: Gesture and Eye-gaze Based Interaction · Selection Support
1 Introduction
People make choices in various situations in their daily lives, such as choosing food, scheduling their day, purchasing products, and choosing a route. In situations where we have to choose, we sometimes fall into a state of indecision in which we cannot easily decide. One cause of this is the anxiety of imagining regret after making a choice [1]. Although thinking is essential in choice-making behavior, indecision and pondering may cause the chooser to feel fatigued and less satisfied with the choice [2]. Therefore, for the chooser to make a comfortable choice, the state of indecision during the selection process must be overcome in some way.
Even if they are indecisive, people may be able to make a choice if prompted by a friend or an expert, such as a salesclerk, during the selection process. In other words, prompting by others may help a person overcome the problem of choice. Shimojo et al. [3] showed that when people choose among multiple stimuli, they spend more time gazing at the stimulus they finally select than at other stimuli, which the researchers call the cascade effect of gaze. Furthermore, Patalano et al. [4] compared indecisive and decisive participants in a decision-making information retrieval task. They found that indecisive participants paid more attention to the final choice content before choosing it. These studies suggest that analyzing the eye movements of people making choices is an effective way to read their preferences.
In this study, we propose a system that uses a spectacle-type eye-tracking device worn by the user to assess whether the user is in a state of indecision and which options the user is interested in, and that recommends an option to the user in the form of advice. The system determines whether the user is in a state of indecision based on the thinking time. If so, it encourages the user to choose by recommending the object that the user has gazed at for the longest time. In addition, we conduct experiments in which users make choices and verify whether the proposed system can improve the user’s satisfaction with the choice by encouraging the user to make a choice.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 410–421, 2023. https://doi.org/10.1007/978-3-031-35132-7_31
2 Related Work
Yamamoto et al. [5] proposed an IoT (Internet of Things) awareness system to respond to the various needs of users in online shopping. By analyzing the user’s line of sight, the system identifies what the user is and is not interested in. It can then introduce products that match the user’s interests and set up sales to promote user awareness. Because this is done automatically, it can improve online shopping efficiency without requiring shopkeepers to have IT knowledge. Like Yamamoto et al., we aim to assess users’ interests automatically and provide insight into their choices, but we consider offline shopping by using a wearable eye-tracking device.
Jaiswal et al. [6] proposed a recommendation system that incorporates users’ emotions and interests. By using a webcam to capture the user’s gaze and facial expressions as data, their system can function without the massive amount of data required by conventional methods. Bee et al. [7] developed a system that acquires the eye movements of users answering two-choice questions about products appearing on a screen and uses the data to assess the users’ visual preferences. When tested, the system assessed visual preferences with a high accuracy of 81%. However, because that study was conducted in situations where only two choices were available, the system is valid only in a limited range of choice situations. In this study, we investigate a design that handles many choices simultaneously in order to develop a system suitable for daily life.
3 Proposed System
3.1 Method
In this research, we aim to help users who have difficulty making a choice in everyday situations. To overcome this indecision, we propose a method in which a device worn by the user assesses what is troubling the user and provides support to help the user resolve it. To support the user’s decision, it is necessary to assess which options the user is undecided about. We focus on the user’s eye movement during the selection process because, when there are multiple choices, the user moves his/her gaze among them. By assessing in real time what the user is looking at during the selection process and by recommending an option, the system helps the user make a decision.
The proposed method requires assessing what the user is looking at. Therefore, we use a spectacle-type eye-tracking device worn by the user to capture, in real time, information on what the user is looking at. Based on this information, we assess which options the user is having trouble choosing among and which options the user is interested in, and we make a recommendation to the user, such as “If you are having trouble, why don’t you choose this one?” In this way, the system encourages the user’s selection behavior (Fig. 1).
Fig. 1. Image of the proposed system
3.2 Implementation
Since the proposed method aims at real-time recommendation, the system must recognize what the user is looking at in real time and assess the recommendation target. For this purpose, we use Tobii Pro Glasses 3 [8]. This device captures images of the environment in front of the user through an attached scene camera, and its eye cameras and LEDs measure the wearer’s eye movement.
The assessment process of the proposed method starts from the images of the viewing environment captured by the scene camera at a resolution of 1920 × 1080 px. The eye cameras and LEDs measure the user’s gaze, and only the part of the image at which the user is gazing is cropped out. A machine learning system analyzes the cropped images from the video and identifies the option that lies at the point of gaze. In this system, Teachable Machine [9] is used to identify the options captured by the camera of the eye-tracking device. The system calculates the degree of match for each video image obtained in real time and assesses what the selector is currently looking at. Because of the system’s processing speed, this image recognition is performed four times per second. The system recommends the object that the user looked at most during the selection. That is, based on the recognition results, the system calculates the total gaze dwell time for each object and recommends the one with the longest dwell time. To draw the user’s attention to the recommendation, we use a sound to guide the user’s gaze to the screen and then present the recommended option.
The system execution screen is shown in Fig. 2. Before the experiment starts, the participant enters his/her name in the text box in the center of the screen, and clicking the “start” button begins the measurement of eye movement information. The participant selects the number corresponding to the chosen option from the pull-down list in the lower-left corner of the screen and presses the button labeled “Decide!” next to it to submit the choice, which ends the measurement. At the end of the measurement, the system outputs a CSV file containing the participant’s name, the gaze data, the recognition result of the object analyzed by Teachable Machine, the confidence level of the result (0–1), the measurement time, the object finally selected, and the recommended object. When a certain amount of time has elapsed since the selection started, the system assesses which object the user has gazed at the longest based on the analysis results. The system then sounds an alert and presents a text on the screen recommending that object, as shown in Fig. 3.
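The dwell-time rule described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(label, confidence)` sample format, and the optional `min_confidence` filter are assumptions; only the four-samples-per-second rate, the longest-dwell rule, and the time-based trigger come from the text.

```python
from collections import defaultdict

SAMPLE_INTERVAL_S = 0.25  # recognition runs four times per second


def dwell_times(samples, min_confidence=0.0):
    """Sum dwell time per option from (label, confidence) recognition samples."""
    totals = defaultdict(float)
    for label, confidence in samples:
        # Skip frames where no option was recognized or confidence is too low.
        if label is not None and confidence >= min_confidence:
            totals[label] += SAMPLE_INTERVAL_S
    return dict(totals)


def recommend(samples, elapsed_s, recommend_after_s, min_confidence=0.0):
    """Return the option with the longest dwell time once enough time has passed."""
    if elapsed_s < recommend_after_s:
        return None
    totals = dwell_times(samples, min_confidence)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

For example, with `recommend_after_s=60` nothing is returned during the first minute, after which the option with the largest accumulated dwell time is proposed.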
Fig. 2. An example of the screen of the system.
Fig. 3. An example of a recommendation message displayed by the system.
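The cropping step in Sect. 3.2, in which only the region around the gaze point is passed to the classifier, can be sketched as a clamped bounding box. The function name, the square shape, and any particular crop size are assumptions; the paper does not state the size of the cropped region.

```python
def gaze_crop_box(gaze_x, gaze_y, frame_w, frame_h, crop_size):
    """Return (left, top, right, bottom) of a square crop centred on the gaze point.

    Coordinates are integer pixels. The box is shifted, not shrunk, when the
    gaze point is near a frame edge, so the crop always has the same size.
    """
    if crop_size > min(frame_w, frame_h):
        raise ValueError("crop_size must fit inside the frame")
    half = crop_size // 2
    left = min(max(gaze_x - half, 0), frame_w - crop_size)
    top = min(max(gaze_y - half, 0), frame_h - crop_size)
    return left, top, left + crop_size, top + crop_size
```

Keeping the crop at a fixed size is a design choice that suits image classifiers expecting a constant input shape.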
4 Experiment 1
We hypothesize that the proposed system will help the user to make a decision, shorten the time required to make a decision, and increase satisfaction with the decision. To test this hypothesis, we conducted an experiment in which users made choices while using the system.
4.1 Experimental Design
The purpose of this study is to eliminate indecision and to verify the usefulness of the system. The experimental environment therefore needs to reproduce actual situations in which people are prone to indecision. We prepared background scenarios for three choice categories (tourist attractions, food, and New Year’s greeting card templates) that are realistic but not complicated, as shown in Table 1. In each category, we prepared 20 choices, making sure that the range of choices was wide enough to avoid instantaneous selection based on individual interests. The 20 choices were arranged in a grid of four rows and five columns, as shown in Fig. 4, and printed.
Based on prior experiments, the system made recommendations one minute and three minutes after the user began viewing the menu. The timing of the decision was not specified, so a decision could be made before the first recommendation was displayed. Such trials were treated as no-recommendation trials, and because the system targets indecisive individuals, we classified them as not indecisive.
The experiment began with the participant putting on Tobii Pro Glasses 3 (Fig. 5). We first explained to the participant the situation specified for each category in Table 1. Next, the participant browsed the printed menu for each category while using the system described in Sect. 3.2 and selected one of the 20 options. After each trial, the participant answered a questionnaire about the choice.
Fig. 4. Fictitious restaurant menu used in experiment.
Table 1. Selection categories and conditions for selection

| Categories | Conditions |
|---|---|
| Sight-seeing area | An American of the same generation and gender whom you met on a social networking service is planning to travel to Japan alone after Covid-19 is over. He (she) has never visited Japan and is worried that there are too many places he (she) would like to visit. He (she) has given you 20 suggestions of places he (she) would like to visit and has asked you to recommend the best of these. Which one would you recommend? Also, please think about the reasons for your recommendation |
| Restaurant | You are in your fifth-period class at college. After class, you were invited to dinner by two particularly close classmates who were attending the lecture with you. You decided to have dinner at a family restaurant. The restaurant’s menu contains images of 20 dishes, and you are thinking of ordering one. Which dish would you choose? Also, please think about the reasons for your choice |
| New Year’s Card Design | On New Year’s Day, you received a New Year’s greeting card from a friend from junior high school. It has been several years since you received a New Year’s card from this friend, and the card contains updates on what has been happening over the past few years. You are thinking of sending a New Year’s greeting card in reply. However, you did not have any New Year’s cards at home, so you went to buy some commercially available ones and found 20 different types on sale. Which design of New Year’s greeting card would you choose? Also, please think about the reasons why |
Fig. 5. A participant taking part in experiment 1.
4.2 Results
Twelve university students participated in the experiment. Each made selections for tourist attractions, food, and New Year’s greeting card templates, yielding a total of 36 trials. Table 2 shows, for each category, the number of selections made before and after the recommendation. Table 3 shows, for the 21 trials decided after the system’s recommendation, whether the recommended object matched the one selected by the chooser. The questionnaire responses (Q1-1 to Q1-3) are shown in Fig. 6, Fig. 7, and Fig. 8, respectively. Excluding the trials in which the decision was made before the recommendation, many respondents were interested in the recommended object (Q1-1), and many felt that the recommendation came late (Q1-2). No trials were rated low in terms of satisfaction with the selection itself (Q1-3).

Table 2. Number of trials decided before the recommendation and number of trials decided after the recommendation in each condition.

| Category | Before recommendation | After recommendation |
|---|---|---|
| Sight-seeing area | 6 | 6 |
| Restaurant | 4 | 8 |
| New Year’s card design | 5 | 7 |
| Total | 15 | 21 |
Table 3. Number of trials in which the recommended and selected options matched and did not match.

| | Number of trials |
|---|---|
| Match | 6 |
| Mismatch | 15 |
4.3 Discussion
As shown in Table 3, in six out of 21 trials the participant chose the same option as that recommended by the system. The questionnaire responses included the following: “I was thinking that number 10 was a good choice, and it was suggested and I was encouraged to choose it,” and “Next year’s zodiac sign is the tiger, so I thought that anything with a tiger design printed on it would be fine, and waited for the system to recommend it.” These results suggest that our system can encourage product selection. We examined the 15 mismatch trials and found that in eight of them the system had failed to recognize the option to which the selector was paying attention.
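The counts reported in Table 3 and in the paragraph above can be combined with a few lines of arithmetic. The variable names are illustrative, and the second rate is a simple re-computation from the reported counts, not a figure given by the authors.

```python
# Counts reported in Tables 2 and 3 and in the discussion above.
trials_after_recommendation = 21
matches = 6
recognition_failures = 8  # mismatch trials where gaze target was misrecognized

mismatches = trials_after_recommendation - matches
# Mismatches that remain after setting aside recognition failures.
other_mismatches = mismatches - recognition_failures

# Share of post-recommendation trials in which the recommendation was chosen.
match_rate = matches / trials_after_recommendation
# Same share, restricted to trials without a recognition failure (illustrative).
match_rate_recognized = matches / (trials_after_recommendation - recognition_failures)

print(f"mismatches={mismatches}, other_mismatches={other_mismatches}")
print(f"match rate: {match_rate:.3f}; excluding recognition failures: {match_rate_recognized:.3f}")
```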
Fig. 6. The result of Q1-1: What did you think of the recommended option?
Fig. 7. The result of Q1-2: How did you feel about the timing of recommendations by the system?
Fig. 8. The result of Q1-3: How satisfied are you with the decision?
Next, we classified participants into three groups: participants who chose the recommended item (successful group), participants who did not choose the recommended item (unsuccessful group), and participants whose interest the system failed to recognize (error group). Table 4 shows, for each group, the average selection time for trials that lasted longer than one minute and the satisfaction level in Q1-3. The average selection time of the successful group was eight seconds shorter than that of the unsuccessful group, and the unsuccessful group took longer than the overall average. The satisfaction level of the successful group was the highest. This result indicates that suitable recommendations can increase the selector’s satisfaction.
The answers to the question “Do you consider yourself indecisive?”, asked after the experiment, did not correlate with the level of satisfaction. This suggests that even choosers who have trouble making decisions in everyday life could decide among the choices used in this experiment. To address this, the selection task in the experiment needs to be redesigned.
Table 4. Average selection time and satisfaction in experiment 1.
5 Experiment 2
5.1 Experimental Design
Since the menus used in the previous experiment were created by us, we were concerned that they might give participants a slightly different impression from an actual selection situation. In addition, the evaluation was based on whether the system’s recommendations matched the user’s interests, so we could not discuss the effect of the recommendation compared with a condition without it. In this experiment, we investigate the influence of system support during the selection process by comparing conditions with and without the support of the system described above. We also investigate the system’s usefulness in a more realistic setting by having the participants choose a gift from a catalog containing over 100 options (Fig. 9). To minimize the effect of wearing an eye-tracking device, we asked all participants to wear Tobii Pro Glasses 3, even in the condition where our system was not used. Participants were not told that there would be a recommendation; they were only reminded to look at the PC screen when the system alerted them. Our system recommended an option 90 s after the participant had viewed the last page of the catalog.
Fig. 9. A participant taking part in experiment 2.
5.2 Results
Twenty-nine undergraduate and graduate students participated in the experiment. We excluded four cases because of failures, leaving 14 cases “using our system” and 11 cases “not using our system.” In this experiment, none of the trials with our system resulted in the selection of the recommended option. Figure 10 shows the results of Q2-1, “How helpful was the recommendation in making your choice?” While half of the participants answered that the recommendation was helpful, the other half answered that it was not. Table 5 shows the selection time and satisfaction. The results indicate that participants were not encouraged by the recommendations in this experiment.
Fig. 10. The result of Q2-1: How helpful was the recommendation in making your choice?
Table 5. Average selection time and satisfaction in the second experiment

Condition              Number of cases   Selection time   Satisfaction
Using our system       14                882.28           4.43
Not using our system   11                814.40           4.36
T. Komatsubara and S. Nakamura
5.3 Discussion

These results indicate that the system did not work effectively in this experiment. The first issue was that the timing of the recommendation was poor. Figure 11 shows the browsing behavior of a participant. The horizontal axis represents the elapsed time, and the vertical axis represents the catalog page being viewed. The red line indicates the page number of the recommended option, the green line indicates the page number of the option chosen by the participant, and the black line indicates the timing of the recommendation. The figure indicates that the chooser’s intended browsing behavior was interrupted by the recommendation and that the recommendation failed to serve as encouragement. However, some participants responded in the questionnaire that the recommendation had caused them to think again.

The second issue was that the criterion of recommending the most-viewed object was not well suited to selecting an option from the catalog. When asked in the questionnaire what they thought about the recommended products, some participants answered that they ignored them. This may have been due to the eye-catching pictures and words in the catalog, which influenced the viewing time of each option. Therefore, we believe that adding a time-series component to the interest assessment will allow us to focus more on the selector’s interest than on the appearance factor and to assess interest with less inconvenience. We plan to improve this point in future studies.
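As a rough illustration of the two criteria discussed above, the following sketch contrasts a plain "most viewed" rule with one hypothetical way of adding a time-series component, namely discounting older fixations. The fixation tuple layout, the function names and the 60 s half-life are assumptions made for illustration only; they do not describe the actual implementation of the system.

```python
from collections import defaultdict


def most_viewed(fixations):
    """Return the option with the largest total viewing time.

    fixations: iterable of (start_s, duration_s, option_id) tuples
    (a hypothetical log format).
    """
    total = defaultdict(float)
    for _start, duration, option in fixations:
        total[option] += duration
    return max(total, key=total.get)


def recency_weighted(fixations, now_s, half_life_s=60.0):
    """Like most_viewed, but each fixation counts less the older it is.

    The exponential discount and its half-life are assumptions, not
    values taken from the paper.
    """
    score = defaultdict(float)
    for start, duration, option in fixations:
        age = now_s - (start + duration)  # seconds since the fixation ended
        score[option] += duration * 0.5 ** (age / half_life_s)
    return max(score, key=score.get)


# Option A drew a long early look; option B drew a shorter but recent one.
log = [(0.0, 40.0, "A"), (270.0, 25.0, "B")]
early_favourite = most_viewed(log)
recent_favourite = recency_weighted(log, now_s=300.0)
```

In this toy log the two rules disagree, which is the kind of case where an eye-catching picture viewed early would otherwise dominate the recommendation.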
Fig. 11. Selective action logs for the group with our system.
6 Conclusion

In this study, we developed a system in which an eyeglass-type device assesses the selector’s interest using eye tracking and recommends an option to the selector in order to reduce indecision during the selection process. We then conducted two experiments. The results of experiment 1 suggested that a recommendation during the selection process positively affects the indecisive chooser. However, the results of experiment 2 revealed many concerns that need to be addressed before the system can be implemented in a practical way. Selection is affected by various factors, such as the importance of the choice and the number of options, and people’s selection behavior changes depending on them. Therefore, the proposed system needs to be implemented with these factors taken into account.
In the future, we intend to improve the system so that it encourages the user to choose more effectively. Specifically, we will add a function that lets users ask the system to recommend an option when they have difficulty choosing. It may also be possible to improve the accuracy of the interest assessment at the end of the process, which is considered particularly important, by taking the time series into account in the interest evaluation at the time of the recommendation. We also plan to redesign the experiment to ensure that the system can be used in daily choices.

Acknowledgement. This work was partly supported by JSPS KAKENHI Grant Number JP22K12135.
References

1. Bell, D.E.: Regret in decision making under uncertainty. Oper. Res. 30(5), 961–981 (1982)
2. Rassin, E., Muris, P.: Indecisiveness and the interpretation of ambiguous situations. Personality Individ. Differ. 39(7), 1285–1291 (2005)
3. Shimojo, S., Simion, C., Shimojo, E., Scheier, C.: Gaze bias both reflects and influences preference. Nat. Neurosci. 6, 1317–1322 (2003)
4. Patalano, A.L., Juhasz, B.J., Dicke, J.: The relationship between indecisiveness and eye movement patterns in a decision making informational search task. J. Behav. Decis. Mak. 23(4), 353–368 (2009)
5. Yamamoto, Y., Kawabe, T., Tsuruta, S., Damiani, E., Yoshitaka, A., Mizuno, Y., Knauf, R.: IoT-aware online shopping system enhanced with gaze analysis. In: 2016 World Automation Congress (WAC), Rio Grande, PR, USA, 2016, pp. 1–6 (2016)
6. Jaiswal, S., Virmani, S., Sethi, V., De, K., Roy, P.P.: An intelligent recommendation system using gaze and emotion detection. Multimedia Tools Appl. 78(11), 14231–14250 (2018). https://doi.org/10.1007/s11042-018-6755-1
7. Bee, N., Prendinger, H., Andre, E., Ishizuka, M.: Automatic preference detection by analyzing the gaze ‘Cascade Effect’. In: COGAIN 2006: Gazing into the Future (2006)
8. Tobii Pro Glasses 3. https://www.tobii.com/products/eye-trackers/wearables/tobii-pro-glasses-3. Accessed 09 Feb 2023
9. Teachable Machine. https://teachablemachine.withgoogle.com/. Accessed 09 Feb 2023
Physiological Measures in VR Experiments: Some Aspects of Plethysmogram and Heart Rate

Shinji Miyake1(B), Chie Kurosaka2, and Hiroyuki Kuraoka2

1 Chitose Institute of Science and Technology, Hokkaido 066-8655, Japan
[email protected]
2 University of Occupational and Environmental Health, Japan, Fukuoka 807-8555, Japan
Abstract. An easy and simple method to obtain the photoelectric plethysmogram (PPG) amplitude is proposed. The standard deviation (SD) of the PPG waveform is almost equivalent to the average PPG amplitude measured as the difference between the peak height and the trough level; the correlation coefficient between them is 0.9942. A sensory intake task induces heart rate (HR) deceleration, a response called Pattern 2. Therefore, HR changes in different directions (increase or decrease) in different tasks even if their subjective mental workload scores (NASA-TLX WWL) are identical. In contrast, PPG amplitude (SD) correlates with WWL scores across several mental tasks. When PPG is recorded from the fingertip, hand and/or finger movement distorts its waveform and the amplitude cannot be measured correctly, so hand and finger movement must be restricted. If this restriction does not disturb the experimental procedure, PPG amplitude calculated from the SD is a powerful index for evaluating alpha sympathetic nervous system activity during mental tasks. When interpreting HR changes induced by mental tasks, the Pattern 2 response must be considered.

Keywords: Plethysmogram Amplitude · Heart Rate Change · Mental Workload
1 Introduction

In this conference session, “Virtual Reality Design for Effective and Comfortable Interaction”, some papers use physiological indices to evaluate the effects of VR environments or human responses. The difficult and troublesome aspects of physiological indices are the analysis of the physiological signals and the interpretation of their changes. In this paper, a very easy method to obtain the photoelectric plethysmogram (PPG) amplitude, which is mediated by alpha sympathetic nervous system (SNS) activity, is introduced and its validity is shown. It must also be kept in mind that heart rate (HR) can decrease during a specific mental task compared with the resting period. This is called the Pattern 2 response and is evoked by a sensory intake task such as a mirror tracing (MT) task. Therefore, HR change by itself is not a good index of physiological responses for evaluating mental workload. It should be emphasized that the purpose here is not to show significant differences among the physiological responses evoked by different tasks by statistical tests; rather, the focus is on indicating the usefulness of PPG and the specific HR change in the Pattern 2 response.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 422–430, 2023. https://doi.org/10.1007/978-3-031-35132-7_32
2 Physiological Base

2.1 Photoelectric Plethysmogram

Long-wavelength light (infrared or near infrared) passes through the tissue of a finger or an earlobe. The amount of light received by a photoelectric transducer is related to the amount of blood within the recording region (Fig. 1). The amplitude of the PPG decreases with vasoconstriction.
Fig. 1. Fingertip photoelectric plethysmograph infrared light transmitter (above) and a photoelectric transducer (below).
Wave amplitude is measured by detecting the peak and the trough one by one. The difference between the peak height and the trough level is the amplitude of one pulse wave (Fig. 2).
Fig. 2. Sample waveforms of PPG and ECG. PPG amplitude is measured as the difference between the peak height and the trough level, as shown by the red arrow. (Color figure online)
However, detecting each peak and trough is troublesome work without ready-made software. The average amplitude of a waveform can instead be characterized by its RMS (root mean square) value, which, for a signal expressed as the deviation from its mean, is equal to the standard deviation (SD). Therefore, the SD of the PPG (PPG-SD) may be equivalent to the mean amplitude of the PPG signal (PPG-amp) [1].
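The relation between RMS and SD stated above can be written out explicitly. The block below is a short worked derivation rather than part of the original method; the pure sine wave is an idealized stand-in for a pulse wave (an assumption), and the SD value quoted for it holds over a whole number of periods.

```latex
\mathrm{SD}(x)
  = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^{2}}
  = \mathrm{RMS}\left(x-\bar{x}\right),
\qquad
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i .

% Idealized example: x_i = A\sin(\omega t_i)
\text{peak-to-trough amplitude} = 2A,
\qquad
\mathrm{SD} = \frac{A}{\sqrt{2}},
\qquad
\frac{2A}{\mathrm{SD}} = 2\sqrt{2} \approx 2.83 .
```

For a fixed waveform shape, the SD is therefore proportional to the peak-to-trough amplitude. The constant of proportionality depends on the shape, so the value for a sine wave should not be expected to carry over to real PPG waveforms.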
2.2 Heart Rate Deceleration and Pattern 2 Response

Lacey et al. [2] hypothesized that mental concentration is accompanied by cardiac acceleration and that attention to the environment is accompanied by cardiac deceleration. They called the former an environmental (sensory) rejection stressor (task) and the latter an environmental (sensory) intake stressor (task). Williams [3] referred to the physiological responses during a sensory intake task, in which skeletal muscle vasoconstriction and a heart rate decrease are observed, as Pattern 2 [4].
3 Method

3.1 Participants

Thirty-four male university students (mean age 22.6 ± 1.9 years) were recruited for this study. They were divided into three groups: Group 1 consisted of thirteen participants, Group 2 of ten, and Group 3 of eleven. All provided written informed consent before participating. The study protocol was approved by the Ethical Committee of the University of Occupational and Environmental Health, Japan (H29-139).

3.2 Procedure

The experiment consisted of two 4-block sessions. Each session comprised a 5-min pre-task resting period (REST) followed by three 5-min mental tasks, so eight blocks were performed in total, with a 5-min break between the two sessions. Five mental tasks were used to evoke physiological responses: self-paced and machine-paced mental arithmetic (MA-self and MA-paced) tasks, the Embedded Figures Task (EFT), the Mirror Trace (MT) task, and the Raven Progressive Matrices (RAVEN) task. Session 1 included REST, MA-paced, EFT and MT, and Session 2 included REST, MA-paced, RAVEN and MA-self. The order of Session 1 and Session 2 was reversed in Group 3. After each mental task, participants rated their mental workload using the NASA task load index (NASA-TLX), and the weighted workload (WWL) was calculated.

3.3 Mental Task

Mental Arithmetic Task. The MA tasks are based on the MATH algorithm proposed by Turner et al. [5]. MATH contains addition and subtraction problems at five levels of difficulty: level one comprises 1-digit + 2-digit problems; level two, 2-digit - 1-digit problems; level three, 2-digit ± 2-digit problems; level four, 3-digit + 2-digit problems; and level five, 3-digit - 2-digit problems. These settings differ slightly from the original: every addition produces a carry and every subtraction produces a borrow. A problem appears on the PC screen for 2 s, followed by the word ‘EQUALS’ for 1.5 s. An answer, which may be correct or incorrect, then appears.
Participants are required to press the left mouse button if the presented answer is correct and the right button if it is incorrect (Fig. 3). In the machine-paced task, participants must respond within 1.5 s, and the next problem appears even if they have not responded in time. A correct response raises the difficulty of the next problem by one level, while an incorrect response or no response lowers it by one level. In the self-paced task, the next problem does not appear until the participant clicks.
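The level rule and the problem constraints described above can be sketched as follows. This is a minimal illustration, not the program used in the experiment: the function names, the clamping of the level to the range 1 to 5, the reading of "carry" and "borrow" as occurring in the ones digit, and the requirement that subtraction results be positive are all assumptions.

```python
import random

MIN_LEVEL, MAX_LEVEL = 1, 5

# level -> (digits of first operand, digits of second operand, allowed operators)
LEVEL_SPEC = {
    1: (1, 2, "+"),
    2: (2, 1, "-"),
    3: (2, 2, "+-"),
    4: (3, 2, "+"),
    5: (3, 2, "-"),
}


def next_level(level, responded_correctly):
    """One step up after a correct response, one step down otherwise.

    Clamping at levels 1 and 5 is an assumption.
    """
    step = 1 if responded_correctly else -1
    return max(MIN_LEVEL, min(MAX_LEVEL, level + step))


def make_problem(level, rng):
    """Draw operands until the ones digits force a carry or a borrow."""
    digits_a, digits_b, operators = LEVEL_SPEC[level]
    while True:
        a = rng.randint(10 ** (digits_a - 1), 10 ** digits_a - 1)
        b = rng.randint(10 ** (digits_b - 1), 10 ** digits_b - 1)
        op = rng.choice(operators)
        if op == "+" and a % 10 + b % 10 >= 10:      # carry out of the ones digit
            return a, op, b
        if op == "-" and a > b and a % 10 < b % 10:  # borrow into the ones digit
            return a, op, b


rng = random.Random(0)
level = 3
a, op, b = make_problem(level, rng)
level = next_level(level, responded_correctly=False)  # missed or wrong: one level easier
```

A timeout in the machine-paced task would be passed as `responded_correctly=False`, since the text treats no response like an incorrect one.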
Fig. 3. Screen shot of MA task.
Embedded Figures Task. One of five simple figures is hidden in a complex figure. Participants are asked to find which figure is hidden and to click the corresponding answer figure shown below it (Fig. 4), at their own pace. They may skip a problem they find too difficult by clicking the ‘NEXT’ button. This task was developed by Witkin to evaluate the field-dependent and field-independent cognitive styles [6]. Lacey et al. noted that Obrist found cardiac deceleration when participants searched for hidden figures in a picture [2].
Fig. 4. Screen shot of EFT.
Mirror Trace Task. A zigzag pathway is presented on a PC screen (Fig. 5), and participants are instructed to trace inside it with a mouse, as precisely as possible, without deviating from the path and without hurrying. The instruction emphasizes not rushing in order to avoid self-imposed time pressure. The X-axis and Y-axis mouse controls are interchanged, so that, for example, the cursor on the screen moves upward when participants move the mouse to the right [7]. Strictly speaking, this is not a mirror trace task. Nevertheless, it is similar to one and has the attributes of a sensory intake task.
Fig. 5. Screen shot of MT.
Raven Progressive Matrices Task. Geometric patterns are arranged in a 3 × 3 matrix, but the bottom-right one is missing, as shown in Fig. 6 [8, 9]. Participants are required to judge which of the eight patterns shown below the matrix fits the blank space and to click it. As in the EFT (Fig. 4), they are allowed to skip to the next problem.
Fig. 6. Screen shot of RAVEN.
3.4 Analysis

R waves were detected from the ECG signals by a purpose-made program. The maximum (peak) and minimum (trough) points of the PPG between two successive R waves were identified, as shown in Fig. 2, using Excel worksheet functions. The difference between the peak height and the trough level was taken as the amplitude of one pulse wave, and the average over all pulse waves in each block was calculated as PPG-amp. The standard deviation of the PPG waveform (sampled at 1 kHz) in each block was obtained as PPG-SD [10]. For each participant, the correlation coefficient between PPG-amp and PPG-SD across the eight blocks was calculated. Three participants, and one or two blocks in some other participants, were excluded because of poor waveforms caused by artifacts and/or hand movement. In total, 238 5-min blocks were analyzed.
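The two indices and their per-participant correlation can be sketched in a few lines. The original analysis used a purpose-made program and Excel; the NumPy version below is only an illustrative equivalent. The function names, the synthetic 1 Hz sine standing in for a recorded block, and the assumption that R-wave sample indices are already available are all hypothetical.

```python
import numpy as np


def ppg_amp(ppg, r_idx):
    """Mean peak-to-trough amplitude, taking one pulse wave per R-R interval."""
    amps = [ppg[a:b].max() - ppg[a:b].min()
            for a, b in zip(r_idx[:-1], r_idx[1:])]
    return float(np.mean(amps))


def ppg_sd(ppg):
    """Standard deviation over every sample in the block.

    np.std divides by N; at 1 kHz over 5 min the difference from the
    N-1 version is negligible.
    """
    return float(np.std(ppg))


def synthetic_block(scale, fs=1000, beats=60):
    """Stand-in for one recorded block: a sine pulse with an R wave at each cycle start."""
    t = np.arange(beats * fs) / fs
    ppg = scale * np.sin(2 * np.pi * t)
    r_idx = np.arange(0, beats * fs + 1, fs)
    return ppg, r_idx


# Four blocks whose pulse amplitude differs, as it might between REST and task blocks.
scales = [1.0, 0.8, 0.6, 0.9]
amp = [ppg_amp(*synthetic_block(s)) for s in scales]
sd = [ppg_sd(synthetic_block(s)[0]) for s in scales]
r = float(np.corrcoef(amp, sd)[0, 1])
```

Because the synthetic blocks share one waveform shape, the two indices here are exactly proportional; with recorded data the correlation would also reflect shape changes and artifacts.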
4 Results and Discussion

4.1 Plethysmogram

Correlation coefficients between PPG-amp and PPG-SD in each participant are listed in Table 1. The average correlation coefficient is very high; r = 0.9942. Figure 7 shows average PPG-amp and average PPG-SD in each block. These two indices cannot be distinguished in this figure, meaning that PPG-amp and PPG-SD are almost identical. The PPG-amp/PPG-SD ratio was between 3.21 and 3.74, and the average (SD) was 3.48 (0.119). Therefore, PPG-amp can be obtained by multiplying PPG-SD by 3.48.

Table 1. Correlation coefficients between PPG-amp and PPG-SD.

Participant   Group 1   Group 2   Group 3
1             0.9888    removed   0.9950
2             0.9941    0.9975    0.9957
3             removed   0.9947    0.9988
4             0.9798    0.9945    0.9934
5             0.9983    0.9822    0.9721
6             0.9940    0.9897    0.9912
7             0.9983    0.9747    0.9941
8             0.9974    0.9691    0.9846
9             removed   0.9983    0.9936
10            0.9999    0.9920    0.9976
11            0.9549    -         0.9978
12            0.9966    -         -
13            0.9951    -         -
[Figure: PTG-amp and PTG-SD plotted for each block; vertical axis: standardized score; horizontal axis: block.]
Fig. 7. Changes in PTG-amp and PTG-SD.
4.2 Heart Rate

Figure 8 shows HR, PPG amplitude (PPG-SD) and WWL scores. HR increased during the mental arithmetic (MA) task compared with the pre-task resting level. In contrast, HR decreased significantly during the MT task, although the subjective mental workload scores (WWL) for the two tasks were identical [4]. Therefore, when a task has a sensory intake characteristic, HR does not correlate with the subjective mental workload index. On the other hand, PPG amplitude calculated from the SD as described above showed a high correlation with WWL scores across the three mental tasks, MA, MT and EFT. PPG-SD decreased in MA and MT, while it was almost identical to the resting level in EFT. WWL scores were higher in MA and MT than in EFT, as shown in Fig. 8. These results indicate that PPG amplitude is a physiological measure that is sensitive to workload.
[Figure: HR (n=33) and PPG amplitude (n=31) as standardized scores, and WWL (n=34), for the REST, MA-paced, EFT and MT blocks.]
Fig. 8. HR, PPG amplitude and WWL
However, when the PPG sensor is attached to the fingertip, hand and/or finger movement is highly restricted; that is, participants cannot move their hand during the recording. When the PPG sensor is attached to the earlobe, hand motion is not restricted. However, response sensitivity and/or response patterns differ between the two recording sites, and the fingertip is more sensitive than the earlobe. If PPG peaks are detected in the signal analysis, the peak-to-peak intervals (PPI) are almost equivalent to the RR intervals measured from ECG signals. This means that HR (strictly speaking, pulse rate (PR)) information can be obtained from the PPI without recording the ECG with a bio-amplifier (ECG amplifier) and electrodes. Thus, PPG carries both amplitude information and time-domain information. Furthermore, its waveform contains vascular information. PPG is therefore a very useful and informative physiological signal.

PPG-SD can be calculated with Excel worksheet functions as described above, so an index of alpha SNS activity can be obtained easily by this method. Note, however, that before PPG-SD is calculated, the PPG must either be measured with an AC amplifier or, when it is measured with a DC amplifier, have its baseline fluctuation removed from the waveform. The baseline fluctuation of PPG can be removed with a moving average over about 1–2 s. When the averaging time is T, the cut-off frequency of the moving average, f_c, is about 0.443/T Hz. For example, if the sampling interval of the PPG is 1 ms and the moving average spans 1600 points (1.6 s), f_c is about 0.277 Hz (a period of about 3.6 s). The moving-averaged wave contains only the baseline fluctuation, without the beat components (Fig. 9a, bold line). Therefore, a waveform that contains only the beat components can be obtained by subtracting this baseline fluctuation from the original wave (Fig. 9b). The baseline fluctuation carries information about emotional states such as anxiety [11, 12] and about SNS activity [13]. The SD of the baseline fluctuation provides its amplitude.
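The baseline removal described above can be sketched with NumPy as follows. The text carries out this step in Excel, so the code is only an illustrative equivalent. The function names, the edge padding used to keep the output the same length as the input, and the synthetic test signal (a 1.25 Hz "pulse" on a slow 0.05 Hz drift) are assumptions.

```python
import numpy as np


def moving_average_cutoff_hz(window_s):
    """Approximate -3 dB cut-off frequency of a moving average spanning window_s seconds."""
    return 0.443 / window_s


def remove_baseline(ppg, fs_hz=1000.0, window_s=1.6):
    """Split a PPG trace into beat components and baseline fluctuation.

    The baseline is a centered moving average. The trace is padded with
    its end values so the output has the same length as the input;
    samples within half a window of either end are less reliable.
    """
    n = int(round(window_s * fs_hz))
    left = n // 2
    padded = np.pad(ppg, (left, n - 1 - left), mode="edge")
    baseline = np.convolve(padded, np.ones(n) / n, mode="valid")
    return ppg - baseline, baseline


# Synthetic one-minute trace sampled at 1 kHz.
t = np.arange(60_000) / 1000.0
pulse = np.sin(2 * np.pi * 1.25 * t)       # beat component
drift = 2.0 * np.sin(2 * np.pi * 0.05 * t)  # baseline fluctuation
beats, baseline = remove_baseline(pulse + drift)

ppg_sd = float(np.std(beats[2000:-2000]))        # index of pulse amplitude
baseline_sd = float(np.std(baseline[2000:-2000]))  # amplitude of the fluctuation
```

The window should be long enough to span at least one full beat; otherwise part of the beat component leaks into the baseline and PPG-SD is underestimated.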
[Figure: panels (a) and (b); vertical axis: amplitude [a.u.]; horizontal axis: time [sec].]
Fig. 9. PPG with baseline fluctuation (a), the baseline calculated by the moving average (bold line) and PPG without baseline fluctuation (b).
The removal of baseline fluctuation by the moving average can be carried out in an Excel worksheet, and the SD is also easily calculated in Excel. That is, a substitute index for PPG amplitude can be obtained easily in Excel, and it is almost identical to the amplitude detected in each pulse wave, as shown in Fig. 7. The shape of the PPG waveform contains much information. However, a complex waveform analysis such as the second derivative is required to extract it [14]. In contrast, the SD of PPG may be the simplest index, although a certain length of data is necessary to calculate it. It should be noted that the root mean square (RMS) is equivalent to the amplitude, and that the RMS of a signal expressed as the deviation from its average is the SD.
5 Conclusion

In experiments using VR settings, recording many physiological signals may be difficult because the many electrodes, sensors, transducers and wires disturb participants’ actions. In such cases, PPG is a powerful physiological signal and has the advantage that no complex and cumbersome analysis is necessary to obtain SNS activity, as explained here, although hand and/or finger motion is restricted. Furthermore, more complex and sensitive information on physiological responses can be obtained by combining it with HR or PR.
References

1. Miyake, S., Kurosaka, C., Kuraoka, H.: Plethysmogram amplitude can be simply measured by the standard deviation of its waveform. In: Proceedings of the 2022 Annual Meeting of Japan Human Factors and Ergonomics Society Hokkaido Branch, pp. 23–26 (2022)
2. Lacey, J.I., Kagan, J., Lacey, B.C., Moss, H.A.: The visceral level: situation determinants and behavioral correlates of autonomic response pattern. In: Knapp, P.H. (ed.) Expression of the Emotion in Man, pp. 161–205. International Universities Press, New York (1963)
3. Williams, R.B., Jr.: Patterns of reactivity and stress. In: Matthews, K.A., et al. (eds.) Handbook of Stress, Reactivity, and Cardiovascular Disease, pp. 109–125. Wiley, New York (1986)
4. Miyake, S., Kurosaka, C., Kuraoka, H.: Specific physiological responses induced by mental tasks -Bradycardia induced by a sensory intake task-. In: Proceedings of the 2019 Annual Meeting of Japan Ergonomics and Human Factors Society Hokkaido Branch (2019)
5. Turner, J.R., Hewitt, J.K., Morgan, R.K., Sims, J., Carrol, D., Kelly, K.A.: Graded mental arithmetic as an active psychophysiological challenge. Int. J. Psychophysiol. 3, 307–309 (1986)
6. Witkin, H.A., Goodenough, D.R.: Cognitive Styles: Essence and Origins. Psychological Issues Monograph 51, International Universities Press, Inc. (1981)
7. Sato, N., Kamada, T., Miyake, S., Akatsu, J., Kumashiro, M., Kume, Y.: Subjective mental workload in Type A women. Int. J. Ind. Ergon. 24, 331–336 (1999)
8. Raven, J.C.: Mental Tests Used in Genetic Studies: The performance of related individuals on tests mainly educative and mainly reproductive. MSc thesis, University of London (1936)
9. Raven, J.: The raven progressive matrices tests: their theoretical basis and measurement model. In: John, Raven, J. (eds.) Uses and Abuses of Intelligence. Studies Advancing Spearman and Raven’s Quest for Non-Arbitrary Metrics. Part I, chap. 1. Royal Fireworks Press (2008)
10. Miyake, S.: Biosignal analysis by excel -ECG, PTG and ABP-, pp. 51–62. NTS (2020)
11. Neumann, C., Lhamon, W., Galati, C.: A study of factors (emotional) responsible for changes in the pattern of spontaneous rhythmic fluctuations in the volume of the vascular bed of the finger tip. J. Clin. Investig. 23, 1–9 (1944)
12. Yamazaki, K., Takasawa, N., Ueda, M.: Spectral analysis of finger photoelectric plethysmogram in its relation to emotion: visual display of baseline deflection and pulse wave. Jpn. J. Psychol. 53(2), 102–106 (1982)
13. Cook, M.R.: Psychophysiology of peripheral vascular change. In: Obrist, P.A., Black, A.H., Brener, J., DiCara, L.V. (eds.) Cardiovascular Psychophysiology, pp. 60–84. Aldine Transaction, New Brunswick (2008)
14. Elgendi, M.: On the analysis of fingertip photoplethysmogram signals. Curr. Cardiol. Rev. 8, 14–25 (2012)
Effects of Visual and Personality Impressions on the Voices Matched to Animated Characters

Hiyori Takahashi and Tetsuya Maeshiro(B)

Faculty of Library, Information and Media Studies, University of Tsukuba, Tsukuba 305-8550, Japan
[email protected]

Abstract. This paper analyzes the relationship between the visual aspects of characters and voice properties. Experiments indicate that humans employ different sets of features to evaluate voice impressions when illustrated characters are displayed in different sizes or only partially. Moreover, the range of matching voices also varies. The experiments with animal characters indicate the influence of personality trait impressions.

Keywords: illustrated character · animated character · visual aspects · voice · quantitative · image

1 Introduction

We suppose that there is a range of voices that matches a character’s impression and visual aspects such as shape, color and size. We have analyzed this relationship for illustrated and animated characters and have found that different appearances of the same character imply distinct voice ranges that match the character. Furthermore, the range of matched voice properties depends on the personality traits of the character. The purpose of this study is to clarify how the impression of a character’s personality and voice changes when the displayed range and size of the character’s image are changed. We focus on the quality of the voice that is perceived as a good fit, rather than the way the character speaks or the content of the character’s utterances.

Quantitatively elucidating human perception of visual and voice impressions is important because of the weight these two aspects carry. According to Mehrabian’s rule [6], they are the two factors that account for almost the whole impression in human communication. Their importance implies that discrepancies between visual and voice impressions give viewers and listeners a sense of strangeness. When watching a movie, for instance, a person might be distracted by this strangeness and unable to concentrate on the movie. This should be avoided for the sake of both the audience and the movie creators, as discomfort among the audience might result in a low evaluation of the movie.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 431–444, 2023. https://doi.org/10.1007/978-3-031-35132-7_33
By clarifying the detailed relationship between a character’s appearance and voice, we can use this information in casting anime voice actors and in the metaverse. Animation has become a representative cultural industry in Japan, with many animated characters being created every year. In recent years, the character business market has been expanding, and not only animated characters but also original characters from companies and other organizations are increasing. In addition, the number of voice actors who provide the voices of these characters is increasing every year, and a wide variety of character expressions are being created. Similarly, in the metaverse, character models can be created with a high degree of freedom, and in the future, advances in technology such as voice changers will make it possible to create a wide variety of character expressions. As animation and the metaverse continue to develop, using voices that match a character’s appearance can reduce the viewer’s sense of discomfort. It will also be possible to manipulate impressions by changing the combination of appearance and voice.

Various studies have already been conducted on character impressions. It has been shown that hair color, hair length, eye shape and similar features change the impression a character gives, and that the impression varies with the gender of the person viewing the character. It has also been shown that the impression given by a character changes with the aspect ratio of the character’s appearance. Furthermore, research has been conducted on the stereotypes of voice quality that are highly compatible with the impression words that express a character’s personality. Research has also been conducted on the relationship between a character’s appearance, personality and voice, and it has been shown that the voice quality perceived as a good fit changes as the impression of the character’s personality formed from its face changes.
In human communication, visual and voice aspects account for almost the whole importance [6]. Experiments showed that visual aspects, such as looks, gestures and facial expressions, account for 55%, while voice aspects, such as voice quality, loudness, speaking speed and tone, account for 38%, and linguistic information, such as speech content and meaning, accounts for just 7%. A study indicated that people can estimate visual aspects from voice [5]. This work relies on the assumption that humans unconsciously expect visual and voice aspects to match. However, it is a qualitative study and does not disclose the quantitative properties of voice or, consequently, the quantitative relationships between voice and visual aspects. Another study indicated the human ability to estimate in the opposite direction, estimating voices from visual aspects [4]. Like the previously mentioned study, this work is also limited to qualitative aspects. Limiting the analysis to qualitative aspects strongly constrains the usefulness and applicability of these studies, because generating a matching pair of voice and character is very difficult, if not impossible, when no quantitative features are elucidated. Similar analyses target the personality aspects of characters instead of treating the visual aspects directly [1,2].

However, it has not yet been clarified how the impression of personality and voice changes with the size and range of a character’s presentation. In animated films, characters are displayed in various sizes, ranges and angles, such as the whole body or only the face. Therefore, it is necessary to clarify the effects of changes in presentation range and size. In this study, we conducted a subject experiment using full-body and face images of characters and multiple voice qualities, and clarified how impressions of character and voice change with the size and range of presentation. We tested human and animal characters. Measuring the visual aspects indirectly through personality tests enables a quantitative description of the visual aspects. The present paper quantitatively measures the visual and voice aspects of animated characters and elucidates the quantitative relationships between them. Such quantitative analysis has not been presented in other studies. Furthermore, the present work applies to any character image and is not limited to characters that are animated, human-like, or modeled on existing life forms.
2 Methods

We tested two types of illustrated characters: (1) human and (2) non-human animals. Fifteen human characters and 104 non-human, basically animal characters were used. In both cases the characters faced forward and did not show intense emotions in their facial expressions or poses.

2.1 Human Characters
For human characters, three types of images were used in experiments: (a) WB: whole body, (b) SF: small size face only, with identical size to the face in (a) whole body, and (c) LF: large size face only, with increased size. For each character, experiments were conducted under three conditions: full-body presentation, large face presentation, and small face presentation. All participants were presented with the three types of character images. However, minimum interval of two days was set between the experiments for each participant, to minimize the influences of previous impressions from the same character. The printed images on A4 size paper were displayed to participants in experiments. Each paper had single image. The images were adjusted to be 25 cm high on A4 paper for the whole body images. Large face images were adjusted to be 15 cm high on A4 paper, while small face images were 5 cm high. These three printed images were prepared for all characters used in the experiments. The voices used in the experiment consisted of 45 different voices with different fundamental frequencies and vocal tract lengths, and all 45 voices had the same speech content. Table 1 lists the voice parameters used in experiments. The base frequency and vocal tract length are the voice parameters. Generally, base frequency of male voice is 100 Hz and 140 Hz, and of female voice 200 Hz and 280 Hz. However, since most voices of animated characters in movies and TV shows have higher base frequency, we provided voices with base frequency 340 Hz (300 Hz, 340 Hz and 380 Hz) in our experiments. The vocal tract length between 0.8 (20% shorter) and 1.2 (20% longer) with 0.1 (10%) step was used in
H. Takahashi and T. Maeshiro

Table 1. Parameter values of voice

| Base voice | Base frequency | Vocal tract length |
|---|---|---|
| Male | 100 Hz, 120 Hz, 140 Hz | 0.8–1.2 (0.1 step) |
| Female | 200 Hz, 240 Hz, 280 Hz | 0.8–1.2 (0.1 step) |
| Female voice actor | 300 Hz, 340 Hz, 380 Hz | 0.8–1.2 (0.1 step) |

our experiments. A higher base frequency results in a higher-pitched voice, and a shorter vocal tract length also results in a higher-pitched voice. The sample speech data were generated for all parameter value combinations listed in Table 1 using the speech synthesis system WORLD [7]. The phrase was “good morning” in Japanese (“ohayougozaimasu”), which is gender-neutral and independent of location, environment, and context. The phrase is also short, which makes it convenient for participants to listen to all 45 voice parameter combinations. Participants selected multiple voices that they felt matched the displayed illustrated character; thus we obtained a range of base frequencies and vocal tract lengths, that is, the “range” of matched voices, as the result. A simplified personality trait test derived from the Japanese version of the Big Five test [3] was used to determine the personality impression of the illustrated characters. Although the Big Five test was originally conceived as a self-report test, we modified the wording of the questions to ask about impressions of others, namely the presented images. The personality test consists of twenty questions and yields a score for each of five aspects: extroversion, cooperativeness, sincerity, emotional stability and flexibility. The score of each aspect ranges between 4 and 28. The average of the five aspect scores was used as the personality impression of a given illustrated character. The experiment was conducted with twelve university students and twelve working adults, twenty-four participants in total, six of whom were male. First, a character image was presented to the participants, and they were asked about their impression of the character’s personality based on the character’s visual appearance. The participants rated their impressions of the character’s personality using the personality test on a 7-point scale. They were also asked for the “perceived” gender of the presented illustrated character.
Then, the participants were asked to listen to the 45 voices while looking at the image, and to rate how well they felt each voice matched the character in the image on a six-point scale. The above procedure was repeated until the participants had reported their impressions of the characters and voices for all the images. The order in which the character images were presented was randomized for each participant.
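The 45 voice conditions described above follow from combining the nine base frequencies with the five vocal tract length ratios of Table 1. The following is a minimal sketch of that enumeration only; the names `BASE_FREQUENCIES_HZ`, `VOCAL_TRACT_RATIOS` and `voice_conditions` are hypothetical, and the sketch does not perform the synthesis itself, which the authors carried out with WORLD [7].

```python
from itertools import product

# Base frequencies (Hz) for each base voice, as listed in Table 1.
BASE_FREQUENCIES_HZ = {
    "male": [100, 120, 140],
    "female": [200, 240, 280],
    "female_voice_actor": [300, 340, 380],
}

# Vocal tract length ratios: 0.8 (20% shorter) to 1.2 (20% longer), step 0.1.
VOCAL_TRACT_RATIOS = [0.8, 0.9, 1.0, 1.1, 1.2]


def voice_conditions():
    """Return every (base voice, base frequency, vocal tract ratio) triple."""
    return [
        (voice, f0, ratio)
        for voice, freqs in BASE_FREQUENCIES_HZ.items()
        for f0, ratio in product(freqs, VOCAL_TRACT_RATIOS)
    ]


conditions = voice_conditions()
print(len(conditions))  # 9 base frequencies x 5 ratios = 45
```

Each triple would correspond to one synthesized utterance of the test phrase presented to a participant.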
Effects of Visual and Personality Impressions
Participants judged the degree of matching of each of the 45 voices with each image shown on A4 paper. Participants were free to listen to the voices any number of times, and assigned a degree of matching on a six-point scale to each voice. The presentation order of the character images was randomized for each participant, and the experiments with the three image types (whole body, large face and small face) were held on different dates, with an interval of at least two days, to minimize the influence of previous experiments, particularly memories of the character images. Besides the experiments to evaluate voice matching, participants answered a survey regarding their familiarity with anime and manga, so that we could analyze its influence on voice selection, as those familiar with anime voices might choose higher-pitched (higher-frequency) voices than participants who are unfamiliar with them. The analysis indicated that no such influence exists. Therefore, all participants were treated as having identical properties.
2.2 Animal Characters
Six animal species were selected from the video game Animal Crossing: New Horizons to elucidate the range of matched voices and its relationship with personality trait impressions (Fig. 1). The game contains 413 characters from 35 animal species. This game was selected because it contains many animal species, and multiple characters exist for each species. The characters have been created based on unified procedures and visual design principles, which has the advantage of removing the parameters related to variability in character design. Therefore, we can reduce the number of “invisible” or “hidden” experiment parameters and focus the analyses on the variables directly related to the animal type.
Fig. 1. Correlation between the personality traits and matched voices from animal characters were analyzed
The characters in Animal Crossing: New Horizons have a standardized size independent of animal type, with similar head and body proportions. Therefore, the frog and the elephant characters, for instance, have roughly similar head and body sizes, as well as arm and leg lengths. Whole-body, forward-facing images were prepared and printed on A5 paper. Images were normalized to a width of 9 cm, so that any size information about the original animals that participants might imagine is neutralized. Thus the elephant character and the frog character are shown at the same size.

Table 2. Selected animal species for the experiment. “Males” and “Females” denote the number of characters identified as male or female by the participants. Note that the sum of males and females is not equal to the number of characters, as “Neutral” characters also exist.

| Species | Characters | Males | Females | Size | Cries |
|---|---|---|---|---|---|
| Frogs | 18 | 12 | 5 | small | |
| Squirrels | 19 | 5 | 10 | small | no |
| Cats | 23 | 6 | 9 | medium | |
| Rabbits | 21 | 5 | 10 | medium | no |
| Elephants | 11 | 3 | 6 | large | |
| Horses | 21 | 4 | 6 | large | no |

The six animal species used in the experiment were selected based on size and on the familiarity of the animal’s cry (Table 2). The animal cry was used as a selection criterion, and consequently as one of the experiment variables, because the ease of recalling an animal’s sound or cry might influence the selection of matched voices. Each animal species comprises multiple characters, so unbiased data can be obtained for each species because multiple samples can be studied. Before selecting the matched voices for each animal character, participants also judged the gender of each animal character, as the gender is not provided in the game Animal Crossing: New Horizons. Since the perceived gender influences the matched voices, participants were asked to identify the gender of each animal character as “Male”, “Female” or “Neutral”. Each animal character was independently judged by three participants, and the gender was assigned based on the majority choice. Table 2 shows the number of characters identified as male and female for each animal species used in the experiment. Besides selecting the matched voices for each animal character, participants also reported the personality trait impressions perceived from the image of the animal character. The voices used to select the matched voices were the same 45 voices used in the experiment with human characters. The personality traits questionnaire was also the same as in the human character experiment. The correlation between the personality traits and the matched voice was calculated using the base frequency and vocal tract length of the best-matched voice.
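The text does not state which correlation coefficient was used. As an illustration only, the sketch below assumes a Pearson correlation between one personality trait score and one parameter of the best-matched voice (here the base frequency). The function name `pearson_r` and the sample values are hypothetical and are not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five characters of one species (illustration only):
# a trait score in the 4-28 range and the base frequency of the best-matched voice.
trait_scores = [10, 14, 18, 22, 26]
best_base_frequency_hz = [120, 140, 200, 240, 300]
print(round(pearson_r(trait_scores, best_base_frequency_hz), 3))
```

The same function could be applied to the vocal tract length of the best-matched voice, and repeated per species and per perceived gender.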
3 Results and Discussions
3.1 Human Characters
The results of the experiment showed that the impression of personality did not change with the size of the face, but changed between the full-body image and the face image. The experiments indicated that different factors influence the matched voice of human characters. First, we analyzed the influence of anime and manga familiarity on voice selection. No statistically significant difference was found, so there is no need to segment the participants and analyze them separately. Two reasons are possible. First, the correlation between the character image and the voice is stronger than the effect of familiarity, so the subconscious influence of the “anime voice” is not strong enough to affect the judgment of how well a character image and a voice match. Even if a participant likes the so-called “anime voice”, this preference is set aside when searching for a matched voice. Second, familiarity with anime does not imply a preference for anime voices over other voice types, although this cannot be confirmed since we did not ask participants how much they like anime voices. We calculated the difference in personality impression among the three image types: whole body, large face and small face. The differences in personality impressions between each pair of the three conditions, which yield three values, were calculated for each participant and then averaged. Table 3 shows the values.

Table 3. Differences between different image conditions. WB denotes whole body, LF denotes large face, and SF denotes small face.

| | WB–LF | LF–SF | SF–WB |
|---|---|---|---|
| Average | 2.94 | 2.59 | 2.85 |

A more detailed analysis of each personality aspect indicated that emotional stability in the whole body – large face pair was the only aspect showing a statistically significant difference. Since the large face – small face pair showed no statistically significant difference in any personality aspect, the difference between the whole body, where the face is small, and the large face, where the rest of the body is not shown, suggests that changing the face size together with removing the body has a significant impact. It suggests that not the size but the displayed parts influence the personality impression, as there was no difference between the large face and small face cases. The similarity of matched voices between different characters was calculated as the vector distance (cosine similarity)

$$U_{i,j} = \frac{w_i \cdot w_j}{|w_i|\,|w_j|} \tag{1}$$

(i, j) = (WB, SF), (LF, SF), (SF, WB)

where WB denotes whole body, LF denotes large face and SF denotes small face image conditions. The calculated values were used to execute hierarchical clustering. The clustering results of the three conditions indicate that the matching voice changes with visual aspects, and that different features are used to judge the voice impressions. Figure 2 shows the clustering result of human illustrated characters based on voice impressions. The characters are grouped into male and female, indicating that gender is one of the important factors that influence the decision on matched voice types. This result is straightforward. However, the contribution of this paper lies in how the characters are grouped. The impression distance among illustrated characters changes when the displayed image changes ((a) whole body, (b) small size face, (c) large size face). Figure 3 shows the clustering result of human characters displayed as a large size face. Note that the gender separation becomes more blurred than when the face is displayed in small size. However, the notable fact is that the distance among characters changes. For instance, characters 3 and 4 are very close in Fig. 2, but they are placed far apart in Fig. 3. Other changes were observed, indicating that the obvious gender impression is not a strong factor of the matched voice (Fig. 4).
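Equation (1) is a cosine similarity between two vectors. The sketch below shows the computation under the assumption, not stated in the text, that each w is a vector of matching ratings over the 45 voices; the function names and the short example vectors are hypothetical.

```python
import math


def cosine_similarity(w_i, w_j):
    """U_ij = (w_i . w_j) / (|w_i| |w_j|), as in Eq. (1)."""
    if len(w_i) != len(w_j):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(w_i, w_j))
    norm_i = math.sqrt(sum(a * a for a in w_i))
    norm_j = math.sqrt(sum(b * b for b in w_j))
    if norm_i == 0 or norm_j == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_i * norm_j)


def similarity_matrix(vectors):
    """Pairwise similarities for a dict mapping a label to its rating vector."""
    labels = list(vectors)
    return {(a, b): cosine_similarity(vectors[a], vectors[b])
            for a in labels for b in labels}


# Hypothetical, shortened rating vectors (illustration only).
ratings = {
    "c01": [1, 2, 6, 5],
    "c02": [2, 1, 5, 6],
    "c03": [6, 5, 1, 1],
}
u = similarity_matrix(ratings)
print(u[("c01", "c02")] > u[("c01", "c03")])  # c01 is closer to c02 than to c03
```

A dissimilarity such as 1 − U could then be passed to a hierarchical clustering routine, for example `scipy.cluster.hierarchy.linkage`; the text does not say which linkage method the authors used.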
Fig. 2. Clustering result of human characters based on matched voice. Small face image experiment. Blue circle denotes male, and red circle denotes female. The character 6 is gender neutral. Values below the circles are average of estimated age of characters. (Color figure online)
Fig. 3. Clustering result of human characters based on matched voice. Large face image experiment. Blue circle denotes male, and red circle denotes female. The character 6 is gender neutral. Values below the circles are average of estimated age of characters. (Color figure online)

The gender separation becomes more blurred for the whole body (WB) condition, as characters 2, 5 and 10 are grouped together, while characters 1, 8 and 11 belong to the group of male characters. Estimated age also influences the matching voice impressions, but is not the decisive factor for the voice. Since no specific pose was used in the character images, body posture can be excluded as an influencing factor. In the SF (small face-only) and LF (large face-only) conditions, the information presented to participants is identical, differing only in size. Detailed observation of the illustrated characters used in the experiments suggests that the eye shape influences the voice impression. The SF condition has the clearer gender separation, as the information received by the participants is limited compared to the LF condition. The same face image displayed at a larger size conveys more information to the participants, particularly the details of each facial part, notably the details around the eyes. For instance, characters 2 and 5 are close in the SF condition, but far apart in the LF condition. The closeness of the estimated ages may have had an effect in the SF condition, but the same logic cannot be applied to the LF condition, indicating that other features were involved. Another example is the pair of characters 1 and 4 in the LF condition, which belong to completely different groups in the SF condition. Character 4 is paired with character 3 in the SF condition, with a very similar estimated age. In the LF condition, character 4 is paired with character 1, whose estimated age places it in a different generation, in addition to being of the opposite gender. The clustering result of the whole body (WB) condition indicates that more diverse features affect the voice impressions. The three dendrograms suggest that the number of factors for the voice impressions increases as follows
Fig. 4. Clustering result of human characters based on matched voice. Whole body image experiment. Blue circle denotes male, and red circle denotes female. The character 6 is gender neutral. Values below the circles are average of estimated age of characters. (Color figure online)

$$SF < LF < WB \tag{2}$$

where SF, LF and WB denote the number of factors for the SF, LF and WB conditions. Although the different grouping between the whole body and face-only conditions can be easily interpreted, it is notable that the image display size also has an influence, since the images shown in the SF and LF conditions are identical. The estimated ages are approximately identical in all three conditions, suggesting that participants had no different impressions of the visual aspects related to age. Another result indicating that different conditions lead to distinct matched voices is the number of matched voice types reported by the participants. The number of matched voice types differed among the three conditions for all characters used in the experiment. No particular condition (WB, LF or SF) consistently had the maximum or minimum number of matched voice types, indicating that the effects of these conditions are not uniform (Fig. 5).
3.2 Animal Characters
Each animal species showed a different set of personality traits correlated with the matched voice. Gender is a factor that has a strong influence on the selection of matched voices, as elucidated by our experiments with human characters. Thus the results are shown for the (1) all gender (Table 4), (2) male only (Table 5), and (3) female only (Table 6) cases.
Fig. 5. Number of matched voice types for each character and image display condition. c01 denotes character 1, and so on. WB denotes the whole body, LF the large face-only, and SF the small face-only conditions.

Table 4. Correlation between personality traits and matched voices. All gender case. ✓ indicates that a correlation was detected.

| Species | Extraversion (PT1) | Agreeableness (PT2) | Conscientiousness (PT3) | Neuroticism (PT4) | Openness (PT5) |
|---|---|---|---|---|---|
| All species | – | – | – | – | – |
| Frogs | ✓ | ✓ | – | – | – |
| Squirrels | – | ✓ | – | – | – |
| Cats | – | – | – | ✓ | – |
| Rabbits | – | – | – | ✓ | – |
| Elephants | – | – | ✓ | – | – |
| Horses | – | – | – | – | ✓ |

Of the three gender-related conditions, no relationship between the matched voice and personality traits was found when all animal species were grouped for the analysis in the all gender and male only cases. However, in the female only case, PT2 (agreeableness) was correlated with the matched voice. For individual animal species, the female only condition had more correlated personality traits than the male only condition. In other words, the relationship between voice impressions and personality trait impressions is more evident, or easier to detect, for female characters. Although the number of correlated personality traits differs between male and female characters, the correlation direction, whether positive or negative, was the same. As Table 4 shows, animal species other than frogs have only one correlated personality trait, while frogs have two. The cats and rabbits pair and the frogs and squirrels pair share a correlated personality trait, and the others have unique correlated personality traits. This result suggests that the selection of the six animal species was appropriate, as the differences among animal species resulted in distinct correlated personality traits. Although frogs and squirrels share the same correlated personality trait (PT2: agreeableness), frogs are also correlated with PT1 (extraversion),
Table 5. Correlation between personality traits and matched voices. Male only case. ✓ indicates that a correlation was detected.

| Species | Extraversion (PT1) | Agreeableness (PT2) | Conscientiousness (PT3) | Neuroticism (PT4) | Openness (PT5) |
|---|---|---|---|---|---|
| All species | – | – | – | – | – |
| Frogs | ✓ | – | ✓ | – | – |
| Squirrels | – | ✓ | ✓ | – | – |
| Cats | – | ✓ | ✓ | ✓ | ✓ |
| Rabbits | – | ✓ | – | ✓ | – |
| Elephants | ✓ | – | ✓ | ✓ | ✓ |
| Horses | ✓ | – | – | ✓ | ✓ |

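Table 6. Correlation between personality traits and matched voices. Female only case. ✓ indicates that a correlation was detected.

| Species | Extraversion (PT1) | Agreeableness (PT2) | Conscientiousness (PT3) | Neuroticism (PT4) | Openness (PT5) |
|---|---|---|---|---|---|
| All species | – | ✓ | – | – | – |
| Frogs | ✓ | ✓ | – | ✓ | ✓ |
| Squirrels | ✓ | ✓ | ✓ | – | – |
| Cats | – | ✓ | ✓ | ✓ | ✓ |
| Rabbits | ✓ | ✓ | – | – | ✓ |
| Elephants | ✓ | ✓ | ✓ | ✓ | ✓ |
| Horses | ✓ | ✓ | ✓ | – | ✓ |
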
which is not true for squirrels. Both frogs and squirrels were selected as small animals, where frogs have familiar animal cries and squirrels only a weak image of their cries. For the medium size animals, cats and rabbits, the same personality trait was correlated with the matched voice in the all gender condition: both species showed a relation with PT4 (neuroticism). For the large size animals, elephants and horses, different personality traits were related: PT3 (conscientiousness) for elephants and PT5 (openness) for horses. Under the experimental conditions used in this paper, the familiarity of animal cries, or the ease of imagining them, has no strong influence on the visual impression that correlates with the matched voice. In the male only case, more specific correlations between the personality traits and matched voices existed than in the all gender case. For small animals, frogs had matched voices correlated with extraversion and conscientiousness, and squirrels with agreeableness and conscientiousness. For medium size animals, cats had all personality traits correlated except extraversion, and rabbits had agreeableness and neuroticism correlated with the matched voice. For large size animals, elephants had all personality traits correlated except agreeableness, and horses had extraversion, neuroticism and openness correlated with matched voices. The female only case resulted in the most detailed correlations between the personality traits and matched voice. Frogs had all personality traits correlated except conscientiousness, and squirrels had extraversion, agreeableness and conscientiousness correlated with the matched voice. Cats had all personality
traits except PT1, and rabbits had PT1, PT2 and PT5. Elephants showed all personality traits correlated with the matched voice, and horses all personality traits except PT4. Comparison of the all gender, male only and female only conditions indicates that different combinations of personality traits are correlated with matched voices. The results indicate that species-specific and gender-specific analyses are necessary. Generalizing the conditions by integrating all genders or all animal species does not provide valuable information. These results indicate that personalization is necessary, in this case “animal specification” or “animalization” (analogous to personalization), that is, analyzing each animal species separately, as no general tendency exists. Therefore, no general rule exists for generating and assigning a matched voice to an animal character, and a pre-survey of the relationship between the personality traits and matched voices is necessary. Grouping all animal species nullifies the correlation between personality traits and matched voices that exists for individual animal species, and generates no useful information. Although we have analyzed only six animal species, these six species cover a wide variety of animals (Table 2). Thus the loss of valuable information about matched voices when grouping all animal species is predictable, as the correlation between personality traits and matched voices decreases and is eventually lost for a large number of animal species. Analyses using more animal species would only reinforce the results of this paper. Similar to the human character case, gender is an important factor that influences the evaluation of matched voices. It is interesting that the female only case has more personality traits correlated with matched voices. This suggests two facts. First, female characters have more varied factors.
Second, the generation of matched voices for female animal characters is more difficult, as more personality trait factors are related. In other words, the assignment of matched voices to male animal characters is easier because fewer personality traits have to be considered, which is equivalent to fewer parameters related to the correlation between voice and personality traits.
4 Conclusions
Changes in the size of the presented eyes may have an effect on the impression of the character’s voice. These results indicate that an appropriate voice range matching a character exists and that the range depends on how the character is seen. Except for special scenes in which only the hands or feet of the character are visible, when the character’s face is visible, the impression of the voice from the character’s appearance does not change whether the character’s face is full or only the face is visible. Similarly, in the metaverse, the user is recommended to change the voice depending on the situation. Moreover, since the impression of the voice may change depending on the presentation size of the eyes, it may be possible to reduce viewer discomfort by changing the voice in some situations, such as when the focus is on the eyes. In addition, since this study did not examine how the impression changes with the character’s facial expression or clothing, whether it is necessary to change the voice according to changes in facial expression or clothing is an issue that should be considered in the future. The results of the present paper are applicable to VTuber videos, the metaverse, movie dubbing, and any application that uses speaking characters.
Acknowledgments. This research was supported by the JSPS KAKENHI Grant Number 20H04287 (T.M.).
References
1. Ishii, Ito: Kyaracta onsei no sutereotaipu shikibetu notameno onkyobunseki. Dai 81 senkokutaikai 2019, 695–696 (2019)
2. Kamata: Kyarakuta kara kanjiru insho no kenkyu. Tokyo kougei daigaku kiyo 21, 27–40 (2015)
3. Koshio, Abe, Katronipino: Nihongoban ten item personality inventory (TIPI-J) sakusei no kokoromi. Pasonariti kenkyu 21(1), 40–52 (2012)
4. Koutou, Ogawa: Seishiga wo mochiita kao to koe no matting niokeru seikakutokusei no inshou no yakuwari. Nihon ninchishinri gakkai 2015, 3 (2015)
5. Kuritsu, Asano: Seishiga ni okeru michijinbutsu no onsei kara gaiken no suitei. Nihon shinri gakkai taikai 72, 2PM113 (2008)
6. Mehrabian, A.: Silent Messages: Implicit Communication of Emotions and Attitudes. Wadsworth Publishing Company (1981)
7. Morise, M., Yokomori, F., Ozawa, K.: WORLD: a vocoder-based high-quality speech synthesis system for real-time applications. IEICE Trans. Inf. Syst. E99-D(5), 1877–1884 (2016)
Effects of Gaze on Human Behavior Prediction of Virtual Character for Intention Inference Design Liheng Yang1 , Yoshihiro Sejima2(B) , and Tomio Watanabe3 1 Graduate School of Informatics, Kansai University, 2-1-1 Ryozenji-cho, Takatsuki-shi,
Osaka 569-1095, Japan 2 Faculty of Informatics, Kansai University, 2-1-1 Ryozenji-cho, Takatsuki-shi,
Osaka 569-1095, Japan [email protected] 3 Faculty of Computer Science and Systems Engineering, Okayama Prefectural University, 111 Kuboki, Soja-shi, Okayama, Japan [email protected]
Abstract. In human communication, humans use not only verbal information but also nonverbal information to infer others’ intents. To build interpersonal relationships between humans and agents such as robots or CG characters, such an intention inference process based on nonverbal information is important. However, it is not clear which cues of nonverbal information humans use to infer an agent’s intent. In this paper, we developed a CG character and designed a simple task to investigate the cues humans utilize in intention inference. The intention inference task was as simple as predicting which cup the character would grab. The participants’ gaze points obtained by an eye tracking device were processed to analyze the visual information used in the experiment. The results suggested that humans tended to gaze at the CG character’s eyes or face and to take the gaze of the CG character as a reference for its intention.
Keywords: Non-verbal communication · Gaze · Intention inference · Eye tracking
1 Introduction
In human communication, to build interpersonal relationships, humans observe and predict others’ behavior to further infer their intents [1]. Intention inference helps promote understanding and deepen relationships of trust. In intention inference, nonverbal information is just as important as verbal information [2]. For instance, gaze and gesture have been reported as cues of social attention [3, 4]. In order to build interpersonal relationships between humans and agents such as robots or CG characters, intention inference design based on nonverbal information is important. Intention inference in human-agent interaction not only refers to the agent recognizing the intention of the human, but also requires the human to infer the intention of the agent. Regarding the robot’s intention, an internal state architecture based on intention was proposed [5], and an autonomous conversational android based on intention and desire was developed [6]. However, human perception of agents’ nonverbal behaviors has not been sufficiently discussed. In particular, it is not clear which cues humans utilize to infer an agent’s intent. In this paper, we developed a CG character and designed a simple task to investigate the cues humans use in intention inference. The CG character had an independently movable head, eyes, and right arm. The intention inference task was as simple as predicting which cup the character would grab. The participants’ gaze points obtained by an eye tracking device were processed to analyze the visual information used in the experiment. The results suggested that humans tended to gaze at the CG character’s eyes or face and to take the gaze of the CG character as a reference for its intention.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Mori and Y. Asahi (Eds.): HCII 2023, LNCS 14015, pp. 445–454, 2023. https://doi.org/10.1007/978-3-031-35132-7_34
2 Design of CG Character and Motion
2.1 CG Character
The 3D model of the designed CG character is shown in Fig. 1. The head, eyes and right arm of the CG character can move independently. The right arm can rotate around and translate along the X, Y and Z axes, while the head and eyes can rotate around the X and Y axes. The virtual space was generated with the Microsoft DirectX 9.0 SDK on a laptop PC (CPU: Core i7 1.80 GHz, Memory: 8 GB). The rendering rate was 60 fps.
Fig. 1. 3D model of CG character and the virtual space.
2.2 Motion Design
In front of the CG character, there was a table with a yellow cup and a blue cup on it (Fig. 1). The yellow cup was on the CG character’s right side while the blue one was on the left. The cups were placed equidistant from the CG character. The CG character would grab one of the two cups on the table. Once the action started, the right arm of the CG character took three seconds to reach the cup. Figure 2 shows examples of the actions. As shown in Fig. 2(a), the CG character was reaching for the yellow cup, while its gaze, face, and right arm were all oriented toward the yellow cup. As shown in Fig. 2(b), the CG character faced the yellow cup but gazed at the blue cup while the right arm was reaching for the yellow cup. As shown in Fig. 2(c), at the two-second mark, the CG character’s right arm shifted direction from the yellow cup to the blue cup, while the directions of the face and gaze did not change. Therefore, the right arm’s motion was divided into phase 1 and phase 2. Table 1 lists the 16 conditions of the CG character’s actions, which combine two matched conditions (No. 1 and No. 16 in Table 1, in which the arm, face and eyes are always directed at the same cup) with 14 inconsistent conditions (No. 2 to No. 15 in Table 1). The colors in Table 1 refer to the colors of the cups.
Fig. 2. Conditions of the CG character’s actions. (a) Matched condition. (b) Inconsistent condition without the arm shifted. (c) Inconsistent condition with the arm shifted at the phase 2.
Table 1. Sixteen conditions of the virtual character’s actions.

| Number of Condition | Arm (phase 1) | Arm (phase 2) | Face | Eye |
|---|---|---|---|---|
| 1 | Yellow | Yellow | Yellow | Yellow |
| 2 | Yellow | Yellow | Yellow | Blue |
| 3 | Yellow | Yellow | Blue | Yellow |
| 4 | Yellow | Yellow | Blue | Blue |
| 5 | Yellow | Blue | Yellow | Yellow |
| 6 | Yellow | Blue | Yellow | Blue |
| 7 | Yellow | Blue | Blue | Yellow |
| 8 | Yellow | Blue | Blue | Blue |
| 9 | Blue | Yellow | Yellow | Yellow |
| 10 | Blue | Yellow | Yellow | Blue |
| 11 | Blue | Yellow | Blue | Yellow |
| 12 | Blue | Yellow | Blue | Blue |
| 13 | Blue | Blue | Yellow | Yellow |
| 14 | Blue | Blue | Yellow | Blue |
| 15 | Blue | Blue | Blue | Yellow |
| 16 | Blue | Blue | Blue | Blue |

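The sixteen conditions in Table 1 are all combinations of two cup colors over four cues (arm phase 1, arm phase 2, face, eye). As a minimal sketch, with hypothetical names, they can be enumerated in the same order as the table as follows.

```python
from itertools import product

CUES = ("arm_phase_1", "arm_phase_2", "face", "eye")
COLORS = ("Yellow", "Blue")

# product() varies the last cue fastest, which reproduces the row order
# of Table 1 (condition No. 1 is all Yellow, No. 16 is all Blue).
conditions = list(product(COLORS, repeat=len(CUES)))

# A condition is "matched" when every cue points at the same cup.
matched = [number for number, combo in enumerate(conditions, start=1)
           if len(set(combo)) == 1]

for number, combo in enumerate(conditions, start=1):
    print(number, dict(zip(CUES, combo)))
print("matched conditions:", matched)
```

The remaining fourteen combinations are the inconsistent conditions, in which at least one cue points at the other cup.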
3 Experiment
3.1 Experiment Procedures
The experimental system consisted of a laptop with Windows 11, a display (23.8-inch, 1920 × 1080 px resolution), a keyboard and an eye tracking device (Tobii Pro Nano). The eye tracking device was placed at the bottom of the display and recorded the coordinates of the participant’s gaze point during the experiment. The sampling rate of the eye tracking device was 60 Hz, synchronized with the rendering rate of the CG character. The participants were 13 students. The participants were first asked to calibrate the eye tracking device and were given a brief introduction to the experiment [7]. In the experiment, the participants were asked to observe the motion of the CG character and predict which cup the CG character was going to grab. They were required to press the corresponding key on the keyboard as soon as they had made a prediction. Each of the 16 conditions was executed once, and the order of the conditions was randomized. After completing all conditions, the participants were asked to answer a questionnaire about the cues they used for intention inference during the experiment.
3.2 Gaze Data Analysis Method
To obtain the participants’ visual information in the experiment, the coordinates of their gaze points obtained from the eye tracking device were analyzed. Because of the high loss rate of tracking data for three participants, the data of the remaining ten participants were analyzed. The gaze data analysis method is as follows. First, the obtained gaze point coordinates were transformed into a coordinate system whose origin was the center point between the eyes of the CG character. Then, the transformed coordinates were normalized by 490 px, which is the distance between the origin and the cups. For each normalized coordinate, the Distance (the magnitude of the coordinate) and the Angle (the direction relative to the Y axis) can be calculated. Based on the Distance and Angle, each coordinate can be classified into four categories: Eye, Face, Body, and Other. The area distribution for each category is shown in Fig. 3.
Fig. 3. Area distribution of analysis in gaze data.
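The coordinate transformation described above can be sketched as follows. This is an illustration under stated assumptions: the function name `gaze_to_distance_angle` is hypothetical, the Angle is assumed to be measured from the positive Y axis toward the positive X axis, and the category boundaries of Fig. 3 are not reproduced because their values are not available here.

```python
import math

CUP_DISTANCE_PX = 490.0  # distance between the origin and the cups (from the text)


def gaze_to_distance_angle(gaze_xy, eye_center_xy, scale_px=CUP_DISTANCE_PX):
    """Return (Distance, Angle in degrees) of a gaze point.

    The gaze point is shifted so that the center of the character's eyes
    is the origin, then normalized by the origin-to-cup distance.
    """
    dx = (gaze_xy[0] - eye_center_xy[0]) / scale_px
    dy = (gaze_xy[1] - eye_center_xy[1]) / scale_px
    distance = math.hypot(dx, dy)
    # Assumed convention: 0 degrees along +Y, positive toward +X.
    angle_deg = math.degrees(math.atan2(dx, dy))
    return distance, angle_deg


# Hypothetical screen coordinates (px) for illustration only.
eye_center = (960.0, 300.0)
print(gaze_to_distance_angle((960.0, 790.0), eye_center))
```

With this normalization, a gaze point lying on a cup has a Distance of about 1, so thresholds on Distance and Angle can separate the Eye, Face, Body, and Other regions.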
Face (yellow zone): The head of CG character was rotated, while the size of the face area remained unchanged. Therefore, the area of Face category was defined similarly to the CG character’s face area. (Face: 0< Distance |t|)
Ease of use
0.225
Utility
0.334
0.030
7.618
6.0E−14
***
0.030
11.146