English Pages 470 [471] Year 2021
LNCS 12523
Nuno J. Nunes · Lizhuang Ma · Meili Wang · Nuno Correia · Zhigeng Pan (Eds.)
Entertainment Computing – ICEC 2020 19th IFIP TC 14 International Conference, ICEC 2020 Xi'an, China, November 10–13, 2020 Proceedings
Lecture Notes in Computer Science

Founding Editors
Gerhard Goos, Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis, Cornell University, Ithaca, NY, USA

Editorial Board Members
Elisa Bertino, Purdue University, West Lafayette, IN, USA
Wen Gao, Peking University, Beijing, China
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Gerhard Woeginger, RWTH Aachen, Aachen, Germany
Moti Yung, Columbia University, New York, NY, USA
More information about this subseries at http://www.springer.com/series/7409
Editors
Nuno J. Nunes, University of Lisbon, Lisbon, Portugal
Lizhuang Ma, Shanghai Jiao Tong University, Shanghai, China
Meili Wang, Northwest A&F University, Xianyang, China
Nuno Correia, Universidade Nova de Lisboa, Lisbon, Portugal
Zhigeng Pan, Hangzhou Normal University, Hangzhou, China
ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-65735-2
ISBN 978-3-030-65736-9 (eBook)
https://doi.org/10.1007/978-3-030-65736-9

LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© IFIP International Federation for Information Processing 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This LNCS volume collects all contributions accepted for the 19th edition of the International Conference on Entertainment Computing (IFIP-ICEC 2020). IFIP-ICEC is the longest-running conference on entertainment computing, following a series of successful conferences held in São Paulo, Brazil (2013), Sydney, Australia (2014), Trondheim, Norway (2015), Vienna, Austria (2016), Tsukuba, Japan (2017), Poznan, Poland (2018), and Arequipa, Peru (2019). This year’s event was held online during November 11–13, 2020.

Overall, we received 72 submissions from several countries across Europe and Asia. Each submission underwent a rigorous review process and received at least three reviews by members of the entertainment computing community. Eventually, we accepted 21 full papers, 18 short papers, and 2 posters.

IFIP-ICEC 2020 innovated in several aspects of the conference: an emerging topics paper track was added to provide a forum for emerging new topics; an art exhibition and interactive sessions enriched the conference with more creative types of contributions; the IFIP-ICEC award was introduced to promote the best contributions; and thematic areas were introduced to broaden the thematic coverage of the conference. The areas introduced in 2020 covered entertainment systems and technology, digital games, bioinformatics, digital art, AI/VR/AR, and 3D modeling. Besides these novelties, the conference program was enriched by two keynote speakers: Professor Nadia Magnenat Thalmann from the University of Geneva, Switzerland, and Professor Jian Chang from Bournemouth University, UK. The two speakers contributed their views on the latest developments in entertainment computing.

We would like to thank all Program Committee members for their hard work; all reviews were conducted on time.
We would also like to thank Northwest A&F University, Xi’an University of Technology, and Hangzhou Normal University, China, for organizing the overall IFIP-ICEC 2020 event and for their help and support. We would like to express our gratitude to all the Organizing Committee members, especially our area chairs. Many thanks also go to our sponsor, the International Federation for Information Processing (IFIP), for supporting this year’s conference.

October 2020
Nuno J. Nunes Lizhuang Ma Meili Wang Nuno Correia Zhigeng Pan
Organization
Artur Lugmayr, Curtin University, Australia
Baoping Yan, Nanjing University of the Arts, China
David Cheng, Beijing DeepScience Technology Co., Ltd., China
Erik Vander Spek, Eindhoven University of Technology, The Netherlands
Esteban Clua, Edith Cowan University, Australia
Haiyan Jin, Xi’an University of Technology, China
Helmut Hlavacs, University of Vienna, Austria
Hongming Zhang, Northwest A&F University, China
Jannicke-Baalsrud Hauge, Bremer Institut für Produktion und Logistik, University of Bremen, Germany
Jianzhi Mu, Nanjing Guangzhi Colorful Information Technology Co., Ltd., China
Junichi Hoshino, University of Tsukuba, Japan
Licinio Roque, University of Coimbra, Portugal
Lizhuang Ma, Shanghai Jiao Tong University and East China Normal University, China
Meili Wang, Northwest A&F University, China
Mengbo You, Northwest A&F University, China
Minhua Eunice Ma, Staffordshire University, UK
Naoya Isoyama, Nara Institute of Science and Technology, Japan
Nick Graham, Queen’s University, Canada
Nuno Correia, New University of Lisbon, Portugal
Nuno Nunes, Human-Computer Interaction Institute, Portugal
Rainer Malaka, University of Bremen, Germany
Ryohei Nakatsu, Kyoto University, Japan
Shaojun Hu, Northwest A&F University, China
Teresa Romao, Universidade NOVA de Lisboa, Portugal
Valentina Nisi, Madeira University, Portugal
Wenzhen Yang, Zhejiang Sci-Tech University, China
Zepeng Wang, Northwest A&F University, China
Zhigeng Pan, Hangzhou Normal University, China
Contents
Games

Serious Violence: The Effects of Violent Elements in Serious Games
    Nat Sararit and Rainer Malaka
Enhancing Game-Based Learning Through Infographics in the Context of Smart Home Security
    Mehrdad Bahrini, Nima Zargham, Johannes Pfau, Stella Lemke, Karsten Sohr, and Rainer Malaka
Automatic Generation of Game Levels Based on Controllable Wave Function Collapse Algorithm
    Darui Cheng, Honglei Han, and Guangzheng Fei
VR-DLR: A Serious Game of Somatosensory Driving Applied to Limb Rehabilitation Training
    Tianren Luo, Ning Cai, Zheng Li, Zhigeng Pan, and Qingshu Yuan
A Procedurally Generated World for a Zombie Survival Game
    Nikola Stankic, Bernhard Potuzak, and Helmut Hlavacs
“Let’s Play a Game!” Serious Games for Arabic Children with Dictation Difficulties
    Samaa M. Shohieb, Abd Elghaffar M. Elhady, Abdelrahman Mohsen, Abdelrahman Elbossaty, Eiad Rotab, Hajar Abdelmonem, Naira Elsaeed, Haidy Mostafa, Marwa Atef, Mai Tharwat, Aya Reda, M. Aya, and Shaibou Abdoulai Haji
Provchastic: Understanding and Predicting Game Events Using Provenance
    Troy C. Kohwalter, Leonardo G. P. Murta, and Esteban W. G. Clua
Applying and Facilitating Serious Location-Based Games
    Jannicke Baalsrud Hauge, Heinrich Söbke, Ioana A. Stefan, and Antoniu Stefan
The Braille Typist: A Serious Game Proposal for Braille Typewriter Training
    Kayo C. Santana, Victor T. Sarinho, and Claudia P. Pereira
Murder Mystery Game Setting Research Using Game Refinement Measurement
    Shuo Xiong, Long Zuo, and Hiroyuki Iida
Finding Flow in Training Activities by Exploring Single-Agent Arcade Game Information Dynamics
    Yuexian Gao, Naying Gao, Mohd Nor Akmal Khalid, and Hiroyuki Iida
Players Perception of Loot Boxes
    Albert Sakhapov and Joseph Alexander Brown
Braillestick: A Game Control Proposal for Blind Users Based on the Braille Typewriter
    Kayo C. Santana, Abel R. Galvão, Gabriel S. Azevedo, Victor T. Sarinho, and Claudia P. Pereira

Virtual Reality and Augmented Reality

Conquer Catharsis – A VR Environment for Anxiety Treatment of Children and Adolescents
    Andreas Lenz, Helmut Hlavacs, Oswald Kothgassner, and Anna Felnhofer
Virtual Reality Games for Stroke Rehabilitation: A Feasibility Study
    Supara Grudpan, Sirprapa Wattanakul, Noppon Choosri, Patison Palee, Noppon Wongta, Rainer Malaka, and Jakkrit Klaphajone
Interactive Simulation of DNA Structure for Mobile-Learning
    Feng Jiang, Ding Lin, Liyu Tang, and Xiang Zhou
Augmented Reality Towards Facilitating Abstract Concepts Learning
    Sandra Câmara Olim and Valentina Nisi
Enhancing Whale Watching with Mobile Apps and Streaming Passive Acoustics
    Nuno Jardim Nunes, Marko Radeta, and Valentina Nisi
Tell a Tail: Leveraging XR for a Transmedia on Animal Welfare
    Paulo Bala, Mara Dionisio, Sarah Oliveira, Tânia Andrade, and Valentina Nisi
Survival on Mars - A VR Experience
    Alexander Ramharter and Helmut Hlavacs
Tangible Multi-card Projector-Based Interaction with Physics
    Songxue Wang, Youquan Liu, and Junxiu Guo
Co-sound: An Interactive Medium with WebAR and Spatial Synchronization
    Kazuma Inokuchi, Manabu Tsukada, and Hiroshi Esaki
A Memory Game Proposal for Facial Expressions Recognition in Health Therapies
    Samuel Vitorio Lima and Victor Travassos Sarinho
Augmented Reality Media for Experiencing Worship Culture in Japanese Shrines
    Kei Kobayashi and Junichi Hoshino
Customer Service Training VR Game System Using a Multimodal Conversational Agent
    Tomoya Furuno, Yuji Omi, Satoru Fujita, Wang Donghao, and Junichi Hoshino

Artificial Intelligence

Procedural Creation of Behavior Trees for NPCs
    Robert Fronek, Barbara Göbl, and Helmut Hlavacs
Developing Japanese Ikebana as a Digital Painting Tool via AI
    Cong Hung Mai, Ryohei Nakatsu, and Naoko Tosa
Learning of Art Style Using AI and Its Evaluation Based on Psychological Experiments
    Cong Hung Mai, Ryohei Nakatsu, Naoko Tosa, Takashi Kusumi, and Koji Koyamada
Deep Learning-Based Segmentation of Key Objects of Transmission Lines
    Mingjie Liu, Yongteng Li, Xiao Wang, Renwei Tu, and Zhongjie Zhu
Classification of Chinese and Western Painting Images Based on Brushstrokes Feature
    Liqin Qiao, Xiaoying Guo, and Wenshu Li
Role and Value of Character Design of Social Robots
    Junichi Osada, Keiji Suzuki, and Hitoshi Matsubara

Edutainment and Art

Clas-Maze: An Edutainment Tool Combining Tangible Programming and Living Knowledge
    Qian Xing, Danli Wang, Yanyan Zhao, and Xueyu Wang
To Binge or not to Binge: Viewers’ Moods and Behaviors During the Consumption of Subscribed Video Streaming
    Diogo Cabral, Deborah Castro, Jacob M. Rigby, Harry Vasanth, Mónica S. Cameirão, Sergi Bermúdez i Badia, and Valentina Nisi
Psychological Evaluation for Images/Videos Displayed Using Large LED Display and Projector
    Ryohei Nakatsu, Naoko Tosa, Takashi Kusumi, and Hiroyuki Takada
To Borrow Arrows with Thatched Boats: An Educational Game for Early Years Under the Background of Chinese Three Kingdoms Culture
    Hui Liang, Fanyu Bao, Yusheng Sun, Chao Ge, Fei Liang, and Qian Zhang
João em Foco: A Learning Object About the Dyslexia Disorder
    Washington P. Batista, Kayo C. Santana, Lenington C. Rios, Victor T. Sarinho, and Claudia P. Pereira

3D Modeling

3D Modeling and 3D Materialization of Fluid Art that Occurs in Very Short Time
    Naoko Tosa, Pan Yunian, Ryohei Nakatsu, Akihiro Yamada, Takashi Suzuki, and Kazuya Yamamoto
A 3D Flower Modeling Method Based on a Single Image
    Lin Jiaxian, Ju Ming, Zhu Siyuan, and Wang Meili
Dynamic 3D Scanning Based on Optical Tracking
    Han Jiangtao, Yao Longxing, Yang Long, and Zhang Zhiyi

Animation

Body2Particles: Designing Particle Systems Using Body Gestures
    Haoran Xie, Dazhao Xie, and Kazunori Miyata
Discussion on the Art of Embryonic Form of Computer Animation—the Peepshow
    Cheng Liang Chen and Wen Lan Jiang

Author Index
Games
Serious Violence: The Effects of Violent Elements in Serious Games

Nat Sararit and Rainer Malaka

University of Bremen, Bremen 28359, Germany

Abstract. In serious game development, violent elements are not used as widely as they are in the industry. Very few studies have looked at the effects of violence in interactive educational contexts. This paper explores the effects of violent versus non-violent audiovisual and narrative elements; a better understanding of these effects can lead to a positive and appropriate use of violence in serious games. We present two experiments (n = 30 and n = 38) using a custom-made game for learning human bone anatomy. In the first experiment, we incorporated violent and non-violent audiovisuals without narratives. Small negative effects were observed in the sample group with violent audiovisuals. In the second experiment, we incorporated narratives that aim to negate these negative effects. Our results show significant improvements in short-term memorization in all conditions in both experiments. Player experience evaluation indicates that the non-violent condition results in greater intuitive control, but only in the first experiment.

Keywords: HCI · Serious games · Violence · User study

Supported by Klaus Tschira Foundation.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 3–17, 2020. https://doi.org/10.1007/978-3-030-65736-9_1

1 Introduction

It is a topic of ongoing debate in many modern societies that violence in video games could be harmful and might have negative effects. However, it is also a non-negligible fact that violence in media obviously has a certain appeal. A large proportion of games, movies and novels have violent elements such as crime, murder or war [20,23,24]. It is therefore an interesting question whether violence as a dramatic element of media might also have a positive and motivational effect. In fact, some studies support this and have shown that violence may positively stimulate perception, attention and cognition as well as giving players more self-control in video games [1,2,5]. When we examine video games for educational or other serious purposes, the absence of violent elements may at first seem obvious, given possible ethical concerns. But if audiovisual violence attracts so many players in entertainment games, what cognitive and motivational effects could such stimuli elicit in an educational application?
To answer our research question, namely whether violent audiovisual elements in serious games make users learn any differently from their traditional non-violent counterparts, we first developed an educational serious game on the subject of basic bone anatomy with three different audiovisual elements as stimuli: non-violent, violent, and neutral. The game’s objective is to interact with a talking skeleton that announces each bone part the player has to interact with, enabling players to learn what the bones are called and what they look like in 3D space. We evaluated the game design in two aspects: (1) player experience, measured through the self-reported Player Experience of Need Satisfaction (PENS) questionnaire [19]; and (2) effects on memorization, measured by how differently users perform on a bone anatomy quiz before and after gameplay.

We consider both the supportive and the discouraging arguments in our background research. This leads to our research objective of exploring both negative and positive cognitive effects, as well as effects on player experience, of violent audiovisual elements in a serious game. We compare them with neutral and non-violent (pleasant, constructive) audiovisual elements to find out whether there are any advantages or disadvantages to integrating these different conditions. If there are negative effects of using violent audiovisual elements, we also ask whether integrating a morally disengaging narrative can lessen these effects and improve the experience. A better understanding of the effects and appropriate integration of these violent elements can lead to improved and more versatile serious game design.
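The memorization measure described above can be illustrated with a minimal sketch. It assumes a 37-item quiz (the item count reported in the Material section) taken before and after play, and computes each participant's gain and the mean gain per condition. The function names, the record layout, and the scores are hypothetical placeholders for illustration only; they are not the authors' analysis code and not data from the study.

```python
from statistics import mean

QUIZ_ITEMS = 37  # assumption: one quiz item per bone part in the game


def score_gain(pre_correct: int, post_correct: int) -> int:
    """Return the change in correct answers from pre-test to post-test."""
    for value in (pre_correct, post_correct):
        if not 0 <= value <= QUIZ_ITEMS:
            raise ValueError(f"score must be between 0 and {QUIZ_ITEMS}")
    return post_correct - pre_correct


def mean_gain_by_condition(records):
    """Group (condition, pre, post) records and average the gain per condition."""
    gains = {}
    for condition, pre, post in records:
        gains.setdefault(condition, []).append(score_gain(pre, post))
    return {condition: mean(values) for condition, values in gains.items()}


# Hypothetical placeholder scores, NOT results from the study.
example_records = [
    ("violent", 10, 18),
    ("violent", 12, 16),
    ("non-violent", 9, 19),
    ("neutral", 11, 17),
]

print(mean_gain_by_condition(example_records))
```

A raw gain score is the simplest choice here; a real analysis would likely also test whether the pre/post difference is statistically significant within each condition.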
2 Related Work

Many studies have investigated the positive use of video games, for example how the formal and dramatic elements of game design play important roles [6]. Both can strongly influence the player experience. Game mechanics can be combined with different premises, resulting in quite different games even though the formal elements of the games are the same. Violence in games is often linked less to particular game mechanics than to audiovisual elements and to the appearance of a game and its asset design. Furthermore, from a psychological perspective, human attraction to violence is observed both in real life and in media, including video games. Lastly, we look at investigations of the positive effects of violent, non-violent and other unconventional elements in video games.

2.1 Video Games and Learning
Playing video games often requires the player to monitor and react to stimuli rapidly, which involves quick decision making. Consequently, multiple studies have found evidence of cognitive skill improvements that can be gained from playing video games. These improvements include attentional control and enhanced adaptability [9], information processing [17] and possible working memory enhancement [16]. Serious games share many of these characteristics with entertainment video games. As educational interactive media, serious games can be effective teaching tools. Wouters et al. [25] conducted a meta-analysis of serious games, which indicated that serious games can be more beneficial for learning than conventional analog methods. Further, these benefits can even translate to real life: Malaka et al. [14] demonstrated that serious games can be used for serious applications such as health and fitness. However, many educational serious games still cannot escape the stigma of being dull and boring. Recent work by Maheu-Cadotte et al. [13] on the effectiveness of serious games in terms of engagement and educational outcomes with healthcare professionals and students showed mixed results. The differences in the level of success of educational games can be attributed to differences in the design elements of serious games. This suggests that investigating novel approaches to serious game design can be of significance in gaming research.

2.2 Attraction to Violence
Our research question is based on the assumption that humans are, to some degree, attracted to violence. As a physiological explanation, Strenziok et al. [21] suggested a link between violence and brain reward circuits, indicating that humans can crave violence in the same way as they crave intercourse, food, and narcotics. This implies that we are likely to pay more attention to, and feel pleasure from, performing a task related to any of them. Thus it seems almost natural that media and fiction focus heavily on “sex and crime” and that news, movies, novels and games are flooded with such elements. Popular examples include the recent success of the TV show Game of Thrones, which has been recognized as the most successful TV series ever and has also been criticized for its violent and sexual content [12]. In Goldstein’s collection of expert views on the reasons why people are attracted to media violence [7], Dolf Zillmann summarized his findings, which range from multiple catharsis hypotheses, Jungian and Freudian doctrines, and evolutionary notions to protective vigilance and fear mastery, among others. While there is no universal theory, it becomes clear that a multitude of factors can draw a person toward violence in media.

2.3 Violence in Video Games
When we look specifically at violence in video games, the negative effects and the correlation with real-life aggression have been studied and debated extensively for decades. Numerous studies have suggested that there are links between aggression and violent video games [10]. Yet several studies have also suggested the counterpoint that a link between real-world violence and violent video games does not exist and that there are more factors to consider when discussing the correlation [15,18]. Despite the ongoing debate, an undeniable fact is that violent video games sell. An example from the US market in recent years shows that the two most played video game genres are first-person shooter and action-adventure,¹ both of which generally have violent elements. Furthermore, in another survey, the majority of players also purchased more games with violent elements.²

2.4 Positive Effect of Violent Video Games
Violent video games have been studied not only in computer science but also in developmental, behavioral, and educational psychology. While the concern is focused on the negative effects of violent media content, some psychologists and computer scientists also look into the positive effects that can be mediated by such content. Agina et al. [1] presented the idea of safely turning violent arousal to positive use in children’s learning. The authors believe that it is almost impossible to stop the market from releasing violent video games and to entirely prevent children from being exposed to them. They conducted a study of 100 preschool children with two violent and two non-violent video games. The participants were separated into two groups: one started by playing violent games and the other started by playing non-violent games. The results showed that children who started with violent games were able to regulate themselves better than the other group to understand the benefit of the non-violent learning tools.

2.5 Violence and Short Term Memory Impairment
Despite the positive effects that arise from learning through video games, simulated violent audiovisual elements may also have negative cognitive effects. Bogliacino et al. [3] found that exposure to violence or recalling a violent event can reduce short-term memory and cognitive control. In a similar study focusing on media violence, Bushman et al. [4] investigated the effect of watching TV advertisements that contained neutral, sexually explicit, and violent content on the audience’s memory. The study found that both sexually explicit and violent advertisements impaired the memory of participants of all genders, regardless of whether they liked the programs or not. These examples suggest that, despite the cognitive skills that can be improved by learning through video games, violent and explicit content can negatively impact the learning process, especially memorization.

2.6 Video Game Narratives and Moral Disengagement
It is possible that having players commit virtual violent acts in video games leads to negative emotional effects that undermine the players’ experience of the game. This is especially crucial in an educational game, which benefits from player engagement and from players having a positive experience while playing. Hartmann et al. [11] tested the assumption that moral disengagement cues provided by a violent video game’s narrative, such as framing the situation as “just a game” or “just an experiment” and giving the violent actions a just purpose, can lessen users’ guilt and negative effects, and found support for it in their experiments. By designing a game narrative according to these concepts, we believe it is possible to reduce or prevent negative emotional effects from virtual violence and create a better player experience.

¹ https://www.statista.com/statistics/189592/breakdown-of-us-video-game-sales2009-by-genre/
² https://www.statista.com/statistics/246715/share-of-violent-vs-non-violent-videogames-played-in-the-us/

2.7 Anatomy Learning with Interactive Systems
The requirement to mentally visualize many parts of the human body in detail and to learn human anatomy by heart can prove a challenge for many students. This can lead to a lack of motivation and negatively impact the learning outcome. Yammine et al. [26] found that 3D visualization technology has been used to improve the anatomy learning process to great effect. 3D modelling allows the learner to explore anatomical structures in more dimensions, which leads to improved spatial knowledge acquisition and user satisfaction compared with traditional 2D study aids. While this demonstrates the positive effects of interactive technology for anatomy learning, we see distinctly higher potential in incorporating game elements. In the current entertainment game market, there are many approaches to designing games that focus on the human body, especially where violent elements can be introduced. A video game like Surgeon Simulator [22] tasks players with simulated, non-realistic surgical operations on a human body. The game distinguishes itself by incorporating comical interactions and violent elements such as blood and gore into surgery, a serious situation in real life. Despite being non-realistic, the anatomy aspect of the game is accurate enough for beginners and can be a fun introduction to various medical procedures and human anatomy. In this paper, we take a closer look at violent game elements that are highly popular in entertainment media but rarely used in educational serious games. We investigate their effects on learning, such as memorization, in an anatomy learning game.

3 Prototype Development

This section details our serious game design and development for this study.

3.1 Game Design
We developed a game for Android devices with the main goal of teaching basic bone anatomy to players. We created a talking, full-bodied human skeleton character called “Skelly”, whose 37 bone parts can be tapped by the player. The three game conditions share identical game mechanics. When the player taps a bone, Skelly immediately announces its name in a speech bubble, and the name is also displayed in comic-style text. The game scene can be rotated around the skeleton, letting the player see each bone part in 3D. This allows players to associate the visual representation of the bone parts with their spatial positions on the body and to learn the bone names through gameplay. The game consists of two play modes: (1) Free Play Mode, in which the player can tap any part of the skeleton to reveal its name; and (2) Challenge Mode, in which, upon a prompt naming a certain bone, players have to find and identify it by tapping it correctly in order to move on to the next part. Both modes have a time limit of three minutes.

Fig. 1. Gameplay with the three stimulus conditions (from left to right): non-violent, violent, neutral.

3.2 Audiovisual Stimuli
To determine the effect of violent elements as a stimulus in an educational game setting, we identified three different audiovisual stimuli as the experimental conditions (Fig. 1).

Neutral. In this condition, the bone information is displayed directly when the player taps a body part. The display background and text colors are in a neutral grayscale tone. Skelly makes a neutral mumbling sound when each part is revealed.

Violent. In this condition, each tap on a bone part breaks and destroys the part like a hammer smash, with an added blood-splatter visual effect. Each bone break results in Skelly announcing the bone part’s name in pain, accompanied by a scream and a hurt animation. The background and display text are in black and red to reinforce the violent context.

Non-violent or Pleasant. In this condition, the player has the task of rebuilding Skelly from a semi-transparent state into a solid skeleton. Each tap on a bone part materializes the part and reveals its name, with an added burst of hearts as a visual effect. Skelly also announces the names enthusiastically, accompanied by a happy giggle and an excited nodding animation. The background and display text are in refreshing blue and green.

3.3 Narratives
To create moral disengagement and lessen the negative effects, mainly of the violent stimulus, that could affect players’ performance and experience, we created two narratives, one each for the violent and non-violent conditions of the game, in the form of an in-game cut-scene. Each cut-scene was set to play right after the pre-test section and before the free play mode of the gameplay section in each version. We strictly focused the writing of the narratives on explaining and justifying the action that the player is about to perform on the talking skeleton, and framed the situation as a somewhat comical incident that happens in an experiment. Both narratives were designed to be symmetrical and comparable to each other. The details of the two narratives are as follows.

Violent Narrative. The participant is part of an experimental program that aims to create an artificial talking skeleton for teaching human bone anatomy, but there is a gruesome defect in this particular model: the skeleton keeps killing puppies (Corgis) that live in the lab for no reason. “Doing so just makes him happy”. The player is then tasked with decommissioning the murderous talking skeleton by destroying its parts one by one.

Non-violent Narrative. The participant is part of an experimental program that aims to create an artificial talking skeleton for teaching human bone anatomy, but there is a comical problem with this particular model: the skeleton’s parts keep getting stolen by puppies (Corgis) that live in the lab, because dogs love bones. The player is then tasked with reconstructing the poor talking skeleton from its retrieved parts one by one.

3.4 Classification of Violence in Video Games
According to the ESRB³ and PEGI⁴ rating systems, a “violence” descriptor is assigned to a computer game that contains depictions of blood, graphic violence towards human-looking characters, and dismemberment. Such elements are displayed in the violent condition of our application and fit the descriptor. As such, the mode of the application with the violent stimulus should be considered suitable only for teens above 13 years old. The other two modes, with positive and neutral stimuli, do not contain any such elements and hence should not be assigned a violence descriptor.
³ https://www.esrb.org/ratings/ratings guide.aspx
⁴ https://pegi.info/what-do-the-labels-mean
3.5 Ethical Concerns
The experiment was conducted with consenting, adult volunteers, who understood the scope of the experiment. All participants read the information document on the experiment procedure and signed the consent form beforehand. Participants were clearly informed that they were free to stop the experiment at any time without negative consequences. All steps of the procedure were conducted in accordance with the institution’s ethical regulation and did not need additional ethical approval.
4 Study
We conducted a pilot study to evaluate the application, followed by the main user study. Our experiments are structured as follows:
4.1 Material
The application was deployed on an Android tablet with a 10-inch screen, running Android 8.0. The players’ actions were recorded in a text log via the built-in recorder. All quizzes and questionnaires were built into the application. The recorded data included the correct and incorrect choices, together with timestamps. Except for the pre-study trial, the pre- and post-tests were identical multiple-choice quizzes comprising 37 items (the same as the number of bone parts in the game). Audio recordings of the post-experiment interviews were made on the same tablet with the participants’ consent.
4.2 First Experiment: Audiovisuals
We conducted a between-subjects experiment with n = 30 participants, all volunteering international students of various ages and genders (average age 24; 12 females and 18 males). First, the participants were asked to fill in a demographic questionnaire gathering basic information, including age, education level, gender, how often they play video games, and preferred game genres. In addition, participants filled in the Ten-Item Personality Inventory (TIPI) [8]. Afterward, each took a pre-test evaluating prior knowledge of basic human bone anatomy. Next, they were instructed to play the game in Free Play Mode under the single condition to which they had been assigned. After three minutes of Free Play Mode, the participants were asked to switch to the Challenge Mode under the same condition. Following that, the participants took a post-test, whose result we compared with that of the pre-test. Lastly, we collected participants’ experience data with a PENS questionnaire. All aforementioned questionnaires were built into the application.
After each participant finished with the application, we conducted a post-experiment interview. We asked about the participants’ usual memorization methods and their opinions on our game. We then investigated each participant’s preferences toward the different conditions. These interviews were conducted to determine any correlations between the participants’ preferences and methods and their other results.
To analyze the learning style and preferences of players, we asked questions related to the method of memorization and game elements, as well as specific questions related to audiovisual elements. Regarding the learning aspect, we asked which methods the participants normally use for memorizing when they need to learn a subject. Additionally, we asked for their opinion on using our game prototype to memorize content from the game. We also asked the participants to specify the audiovisual element (audio, animation, graphics or other) that was most impactful to them. Finally, we showed videos of the gameplay of the other two conditions that the participants did not play and asked them to choose one of the three versions of the game (Non-violent, Violent or Neutral). With the latter, we wanted to learn their opinion on the audiovisuals of the game. Additionally, we also asked for their overall opinion on the game.
4.3 Second Experiment: Narratives
The second experiment was conducted with n = 38 volunteering participants, all international students (average age 20; 23 males, 15 females). The participants were randomly assigned to two groups, each of which played a version of the game with either the violent or the non-violent condition. All participants followed the same procedure as in the first experiment, except for the additional viewing of the in-game cut-scene before the game-play started. All data recordings, questionnaires and interviews were conducted identically to the first experiment. The neutral condition was omitted from the second experiment because we determined a neutral narrative to be the same as or similar to having no narrative.
5 Results
In the following, we present and analyze the results from the questionnaires, the participants’ pre- and post-test scores and the post-experiment interviews from both experiments.
5.1 Effect on Short-Term Memorization
We determined the short-term memorization effect by looking at the improvement between the pre- and post-test scores of participants in each group (full score of 37 points). First, we conducted a one-tailed dependent t-test on the scores of participants within each group. In the first experiment, the results from each pre-test (Neutral: M = 9.7, SD = 7.39; Violent: M = 10.4, SD = 8.36; Non-violent: M = 13, SD = 7.8) and post-test (Neutral: M = 22, SD = 10.38; Violent: M = 19.7, SD = 6.36; Non-violent: M = 23.1, SD = 8.1) indicate that the integration of all audiovisual elements resulted in an improvement in short-term memorization, Neutral: t(9) = 5.36, p < .001; Violent: t(9) = 3.58, p = 0.003; Non-violent: t(9) = 7.09, p < .001.
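The within-group comparison described above can be illustrated with a short sketch. This is not the authors' analysis script: it assumes one pre-test and one post-test score per participant, and the function names `paired_t`, `t_pdf` and `t_sf` as well as the toy scores are hypothetical. It computes the dependent (paired) t statistic and a one-tailed p-value by numerically integrating the Student t density, using only the Python standard library.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) of a dependent t-test on the post - pre differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # t is the mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t, df, steps=2000):
    """One-tailed p-value P(T > t), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


# Hypothetical scores for illustration only (not the study's data).
pre = [1, 2, 3]
post = [2, 4, 6]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.3f}, one-tailed p = {t_sf(t, df):.4f}")

# Consistency check against a reported statistic: t(9) = 3.58.
print(f"one-tailed p for t(9) = 3.58: {t_sf(3.58, 9):.4f}")
```

Under these assumptions, the one-tailed p-value for t(9) = 3.58 falls between .0025 and .005, which is in line with the reported p = 0.003 for the violent condition.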
The same improvement in short-term memorization is also present in all conditions of the second experiment. The results from each pre-test (Violent: M = 4.9, SD = 4.33; Non-violent: M = 4.26, SD = 4.64) and post-test (Violent: M = 15.11, SD = 6.61; Non-violent: M = 17.26, SD = 6.02) indicate that the integration of all audiovisual elements with the narratives resulted in an improvement in short-term memorization, Violent: t(18) = 8.469, p < 0.001; Non-violent: t(18) = 7.34, p < .001. All groups show significant improvements in their post-test scores compared to their pre-test scores.
Secondly, we looked at the improvement scores (post-test score minus pre-test score) of each participant to compare between groups. In the first experiment, the lowest improvement score was 2 points and the highest was 23 points. Participants in the neutral condition had the highest mean improvement score. A one-way ANOVA on the scores of the three conditions showed no significant differences. In the second experiment, the non-violent condition with narrative had a higher mean score than its violent counterpart. As in the first experiment, the one-way ANOVA found no significant difference.
5.2 Adverse Effect on Memorization from Violent Elements
In the first experiment, even though there is no significant difference in the data between the three groups, a closer look reveals an interesting insight: only two participants (P09, P11) scored lower in the post-test than in the pre-test. Both were female and played the condition with violent audiovisuals, and the aggression questionnaire scores of both participants were low on all sub-scales. Such reduced scores were no longer found in the second experiment. These different outcomes suggest that by incorporating a narrative, the negative effects of the violent elements in our serious game can be eliminated or lessened.
5.3 Player Experience
We conducted a one-way analysis of variance (ANOVA) mean comparison on the PENS sub-scales of competence, autonomy, presence/immersion, and intuitive controls to compare and evaluate the player experience between the different groups and to measure the effect of the violent elements. In the first experiment, our analysis of the mean scores of the three groups (neutral, violent and non-violent conditions) revealed a significant difference in the intuitive controls sub-scale (F(2, 27) = 4.678, p = 0.019) between the non-violent and the violent conditions: the PENS intuitive controls score of the non-violent group was significantly greater than that of the violent group. In other words, without any narrative in the game, the participants in the non-violent group found the controls significantly easier to understand and use than the participants in the violent group did.
However, in the second experiment, when we incorporated the narratives to lessen the negative effects, the one-way ANOVA mean comparison found no significant difference in player experience between the violent and non-violent groups. The plot of the sub-scales for both experiments is presented in Fig. 2.
Fig. 2. PENS questionnaire scores’ mean comparison per condition from both experiments.
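The between-group comparison reported in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names `one_way_anova` and `f_sf_df1_2` and the toy groups are hypothetical. The p-value uses a closed form of the F distribution that holds only for two numerator degrees of freedom, which is the case when three conditions are compared.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_sf_df1_2(f_stat, df_within):
    """P(F > f_stat) for an F distribution with 2 numerator degrees of freedom.

    Closed form, valid only when df_between == 2 (three groups).
    """
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


# Hypothetical sub-scale scores for three conditions (not the study's data).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}, p = {f_sf_df1_2(f_stat, df_w):.3f}")

# Consistency check against the reported statistic F(2, 27) = 4.678.
print(f"p for F(2, 27) = 4.678: {f_sf_df1_2(4.678, 27):.4f}")
```

Applied to the reported F(2, 27) = 4.678, the closed form gives a p-value of about 0.018, close to the reported p = 0.019. An omnibus F test alone does not say which pair of conditions differs, so a pairwise follow-up comparison would be needed for that.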
5.4 Interviews
To analyze the interview results, we first collected the answers and categorized them as follows: opinion on using this game for memorization, overall opinion of the game, audiovisuals, usual method of memorization, and the most impactful game element. In the following, we present the results along these categories.
All participants responded positively to the approach to memorization used in this game and thought that they would prefer this game, or that it could be used to complement traditional learning methods such as textbooks and lectures.
Overall opinion on the game in both experiments: 33% of the participants had a highly positive opinion of the game, found themselves motivated and found the approach novel. Another 33% found the game to be more like an edutainment or e-learning tool. Less than 2% thought the game was boring. The rest had various other opinions.
Participants’ usual methods of memorization: 25% of the participants preferred some form of writing-down technique, 23% preferred visual aids or visualization, 25% preferred repetition or repeated reading, and the rest preferred other methods. Interestingly, the participants who had the highest improvement between pre- and post-test scores (above the 60th percentile) preferred the methods that can be associated with our game, visualization (15%) and repetition (15%). Only one participant with high improvement preferred another memorizing method, and none came from the writing-down group.
Most impactful element: 26% of the participants found the audio of the game to be the most impactful. Another 25% thought the same of the graphics, and 13% chose the animations. The rest thought the most impactful element was the combination of two or more audiovisual elements.
Participants’ preferences: the neutral condition was preferred by 12 participants in the first experiment and 6 in the second because it had the fewest distractions. The violent condition was preferred by 5 participants in the first and 9 in the second because of the more exciting setup, the greater pressure and its potential to grab more attention. The non-violent condition was the most preferred (n = 13 + 17) because it offered a more exciting experience than the neutral condition while being less disturbing than the violent condition. The majority of participants in the first experiment mentioned the lack of a story or narrative, which might further affect the experience.
No correlation was found between the participants’ opinions and preferences, or their TIPI personality scores and aggression scores, and the differences in player experience or short-term memorization effect in either experiment.
6 Discussion
According to the interviews, the game was received positively overall and was found to be a good tool for memorization. Many participants who preferred visualization and repetition methods of memorizing also performed better in the game. However, the game was still viewed as edutainment and even considered boring by some participants. This means that adding the audiovisual elements alone did not lead our game to be perceived more like an entertainment game. This could be caused by our decision to keep other game mechanics to a minimum and focus on examining the effects of different audiovisual and narrative elements.
Regarding the three audiovisual conditions in the first experiment, the participants thought that the neutral condition had the fewest distractions and allowed for better concentration on memorizing, while opinions on the violent condition differed widely. The positive, non-violent condition was the most preferred in the interviews, which was also backed up by its significantly greater PENS score on intuitive controls. This result indicates that an appropriate use of comfortable, approachable and family-friendly audiovisual effects can improve the experience in serious games.
The player experience evaluation of the first experiment found that the non-violent audiovisual elements had a greater effect on player experience than their violent counterparts. Furthermore, we also observed a negative effect of violent audiovisual elements without narrative: two female participants showed an adverse memorizing effect. This might be explained by the findings of Bogliacino et al. [3] and Bushman and Bonacci [4] on the impairing effects of violence on short-term memory. Even though these data are only observations without statistical proof, they might point to an interesting new hypothesis: there could be a gender bias towards violence, and the aversion towards violence could have a negative effect on the learning outcome.
In summary, our first experiment showed that, without narrative, negative effects can be found when violent audiovisuals are added to the serious game. These effects are more pronounced in the player experience and smaller in memorization. However, both of these negative effects were effectively eliminated in the second experiment, where narratives designed to lessen the negative effects of the violent elements were incorporated into the game. In the second experiment, the player experience and the short-term memorization results were similar in the violent and non-violent conditions. This indicates that an appropriate narrative can be helpful and even necessary when designing a serious game with violent elements.
6.1 Limitation
Some limitations of our research should be noted. First, the sample size of the experiments is small, which possibly affected the significance of the results. A larger-scale experiment should be conducted in the future. Secondly, our sample consisted mainly of international university students with various degrees of prior knowledge of and interest in the subject of human anatomy. Thirdly, the samples also had an unbalanced gender distribution due to the randomized group assignment. Lastly, the amount of research on the positive use or learning effects of violence in video games, serious games or media in general is still very small.
7 Conclusion and Future Work
Our findings suggest that the integration of violent audiovisual elements in an educational serious game can be a double-edged sword that needs to be handled with care. While a strong improvement can be found, just as in the other conditions, the violent condition also demonstrated a negative cognitive effect in some participants. The neutral condition offered the least distracting experience, and the non-violent condition was reliably positive and preferred by the majority of the participants. The pleasant and constructive non-violent audiovisual effects helped to improve the user experience in an educational application of a serious game, despite the contrast with the grim nature of the subject, a human skeleton. This means that, currently, we cannot recommend the use of violent audiovisual elements in an educational serious game without precautions, due to the possible negative effects. One such precaution is incorporating an appropriate narrative that frames the scenario as “being in an experiment or a game” and gives a justifiable reason for the player’s violent action. Our second experiment showed how this can eliminate negative effects on player experience and memorization. The game-play can also be improved by incorporating more game mechanics.
While we are planning a more in-depth study of the other dimensions of violence in serious games, we suggest that future research include different player actions and game mechanics in conjunction with the usage of appropriate violent elements in serious games. Long-term experiments can provide a more in-depth understanding of the effect of violent elements in serious games. A larger number of subjects and the possibility of choosing one’s own condition, to cater to different player preferences, can also be examined.
Acknowledgements. This research is funded by the Klaus Tschira Foundation through the graduate school “Empowering Digital Media”, Digital Media Lab, TZI, University of Bremen.
References
1. Agina, A.M., Tennyson, R.D.: Towards understanding the positive effect of playing violent video games on children’s development. Procedia - Soc. Behav. Sci. 69, 780–789 (2012). https://doi.org/10.1016/j.sbspro.2012.11.473
2. Bediou, B., Adams, D.M., Mayer, R., Tipton, E., Shawn Green, C., Bavelier, D.: Meta-analysis of action video game impact on perceptual, attentional, and cognitive skills. Psychol. Bull. 144, 77 (2017). https://doi.org/10.1037/bul0000130
3. Bogliacino, F., Grimalda, G., Ortoleva, P., Ring, P.: Exposure to and recall of violence reduce short-term memory and cognitive control. Proc. Natl. Acad. Sci. 114(32), 8505–8510 (2017)
4. Bushman, B.J., Bonacci, A.M.: Violence and sex impair memory for television ads. J. Appl. Psychol. 87(3), 557 (2002)
5. Ferguson, C.: The good, the bad and the ugly: a meta-analytic review of positive and negative effects of violent video games. Psychiatr. Q. 78, 309–16 (2008). https://doi.org/10.1007/s11126-007-9056-9
6. Fullerton, T.: Game Design Workshop: A Playcentric Approach to Creating Innovative Games. AK Peters/CRC Press, United States (2018)
7. Goldstein, J.H.E.: Why We Watch: The Attractions of Violent Entertainment. Oxford University, US (1998)
8. Gosling, S.D., Rentfrow, P.J., Swann Jr., W.B.: A very brief measure of the big-five personality domains. J. Res. Pers. 37(6), 504–528 (2003)
9. Green, C.S., Bavelier, D.: Learning, attentional control, and action video games. Current Biol. 22(6), R197–R206 (2012)
10. Greitemeyer, T., Mügge, D.O.: Video games do affect social outcomes: a meta-analytic review of the effects of violent and prosocial video game play. Pers. Soc. Psychol. Bull. 40(5), 578–589 (2014). https://doi.org/10.1177/0146167213520459
11. Hartmann, T., Vorderer, P.: It’s okay to shoot a character: moral disengagement in violent video games. J. Commun. 60(1), 94–119 (2010)
12. Kornhaber, S., Orr, C.S.A.: Game of Thrones: A Pointless Horror and a Ridiculous Fight, May 2015. https://www.theatlantic.com/entertainment/archive/2015/05/game-of-thrones-roundtable-season-5-episode-six-unbowed-unbent-unbroken/393503/
13. Maheu-Cadotte, M.A.: Effectiveness of serious games and impact of design elements on engagement and educational outcomes in healthcare professionals and students: a systematic review and meta-analysis protocol. BMJ Open 8(3), e019871 (2018)
14. Malaka, R.: How computer games can improve your health and fitness. In: Göbel, S., Wiemeyer, J. (eds.) GameDays 2014. LNCS, vol. 8395, pp. 1–7. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05972-3_1
15. Markey, P.M., Markey, C.N., French, J.E.: Violent video games and real-world violence: rhetoric versus data. Psychol. Popular Media Cult. 4(4), 277 (2015)
16. Moisala, M., et al.: Gaming is related to enhanced working memory performance and task-related cortical activity. Brain Res. 1655, 204–215 (2017)
17. Powers, K.L., Brooks, P.J., Aldrich, N.J., Palladino, M.A., Alfieri, L.: Effects of video-game play on information processing: a meta-analytic investigation. Psychon. Bull. Rev. 20(6), 1055–1079 (2013)
18. Przybylski, A.K., Weinstein, N.: Violent video game engagement is not associated with adolescents’ aggressive behaviour: evidence from a registered report. Royal Soc. Open Sci. 6(2), 171474 (2019)
19. Rigby, S., Ryan, R.: The player experience of need satisfaction (PENS) model. Immersyve Inc. 1–22 (2007)
20. Strasburger, V.C., Donnerstein, E.: Children, adolescents, and the media in the 21st century. Adolesc. Med. (Philadelphia, Pa.) 11(1), 51–68 (2000)
21. Strenziok, M., Krueger, F., Deshpande, G., Lenroot, R., Van der meer, E., Grafman, J.: Fronto-parietal regulation of media violence exposure in adolescents: a multi-method study. Soc. Cogn. Affect. Neurosci. 6(5), 537–547 (2011). https://doi.org/10.1093/scan/nsq079
22. Studios, B.: Surgeon Simulator. Game [Windows], April 2013
23. on Violence, A.P.A.C., Youth: Violence & youth: Psychology’s response, vol. 1. American Psychological Association (1993)
24. Worth, K.A., Chambers, J.G., Nassau, D.H., Rakhra, B.K., Sargent, J.D.: Exposure of US adolescents to extremely violent movies. Pediatrics 122(2), 306–312 (2008)
25. Wouters, P., van Nimwegen, C., van Oostendorp, H., Spek van der, E.: A meta-analysis of the cognitive and motivational effects of serious games. J. Educ. Psychol. 105(2), 249–265 (2013). https://doi.org/10.1037/a0031311
26. Yammine, K., Violato, C.: A meta-analysis of the educational effectiveness of three-dimensional visualization technologies in teaching anatomy. Anat. Sci. Educ. 8(6), 525–538 (2015)
Enhancing Game-Based Learning Through Infographics in the Context of Smart Home Security
Mehrdad Bahrini, Nima Zargham, Johannes Pfau, Stella Lemke, Karsten Sohr, and Rainer Malaka
Digital Media Lab, TZI, University of Bremen, Bremen, Germany
{mbahrini,zargham,jpfau,slemke,sohr,malaka}@uni-bremen.de
Abstract. Constant advances in the features of smart home devices require users to continually keep up with safety concerns. While update reports and news articles are common ways to keep them informed, many users struggle to thoroughly understand and apply the available security recommendations. Educational games have proven to be an intuitive way to increase the incentive for awareness, but many of them fall short of conveying the needed supporting knowledge. In an attempt to raise security awareness of smart home devices, we designed an educational game to demonstrate the latest security challenges and solutions. To ascertain users’ attention and motivation, we developed two versions of the game to contrast the integration of text and infographics as supporting knowledge, which takes the form of hints in this case. Our evaluation gives evidence that viewing security-related content with a higher deployment of infographics improves users’ performance significantly, increases users’ interest in the topic, and creates higher levels of confidence in solving security problems and complexities.
Keywords: Usable security · Smart home · Educational games · Supporting knowledge · Infographics
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 18–36, 2020. https://doi.org/10.1007/978-3-030-65736-9_2
1 Introduction
Educational games (edu-games) have shown great potential as a powerful teaching tool, as they can increase engagement, creativity and authentic learning [17,24,39]. Game-based learning allows users to see themselves in simulated real situations where they can learn through experience and solve problems through critical thinking [13]. Furthermore, the motivational power of game-based learning towards specific subjects is widely recognised [32]. Research has shown that edu-games, by harnessing the intrinsically motivating power of games, can be a great tool to promote user engagement and improve positive usage patterns, such as increasing user activity, social interaction, and the quality and productivity of user actions [23,38]. Previous work has shown that edu-games can be useful in raising the knowledge and awareness of users [3,17], but this alone cannot get the best out of the learning experience.
In edu-games, feedback plays a key role in providing the user with the information necessary for the learning experience. In-game feedback is intended to guide learners to improve their performance, and to increase motivation or learning outcomes by providing them with information on the accuracy of their answers in various ways [62]. According to Johnson et al. [36], these feedback messages can be classified into two types. Outcome-oriented feedback delivers information to learners about their progress or the accuracy of their answers (e.g. which is the correct answer and why). Process-oriented feedback provides learning guidance and supporting knowledge on the processes and strategies used to achieve the correct answer or action in the game. Examples of process-oriented feedback are prompts and hints that lead the learners towards the right answer.
In many video games, supporting knowledge is used to inform the players about their objectives and guide them throughout the game. This form of process-oriented feedback could be leveraged to improve the effectiveness of educational games [57]. The supporting knowledge can be given to the users in different forms, such as text, images, audio, and video, to provide explicit guidance to players as they play the game [36].
In this paper, we study the use of infographics as a way to convey information to the players in an edu-game. Infographics are a graphical representation of information or knowledge [34]. They are essentially an effective visual representation that explains information simply and quickly using a combination of text and graphical symbols. Some commercial games such as Metrico+ [25], Mini Metro [16], and Lumino City [58] have implemented infographics as their look-and-feel or even as a game mechanic and have received very positive reviews from users.
Infographics can motivate players and exploit the visual potential to represent and convey knowledge. They aim to increase the amount of information people remember by breaking it into concise, visually attractive chunks of data. This way, learners can remember more, leading to an improvement in their capabilities [8]. Although infographics have been shown to be effective in transferring information, their implementation in edu-games is still under investigation.
Recent innovations in technology and the rise of inter-connectivity between devices enable the development of innovative solutions in the field of smart homes. Along with this rapid development, the security and privacy of users have always been a concern. Making smart home devices more secure may partly address this concern, but users also have a complementary role in protecting their sensitive information. However, users’ understanding of, and ability to adopt and configure, the security of smart home devices is not integrated. As users face a plethora of innovations as well as the ever-expanding spread of security news and journals, it has become increasingly difficult for non-tech-savvy users to understand and apply security guidance. Games have long been recognized as an effective and appealing educational strategy in the field of computer security and privacy [61]. This approach has been used to teach various topics related to security [22,30].
We have designed an edu-game with the aim of helping owners of smart home devices get acquainted with security issues and recent risks. Players are asked to find potential smart home devices in different rooms and answer questions about the respective device, helping a virtual smart home owner to protect his home from attacks. For content-related assistance during the game, players have the opportunity to consult security instructions concerning the respective device. Within our evaluation, this information is presented either textually (analogous to conventional safety reports or updates) or visualized using infographics, as a structured combination of text, images, charts, and icons. Ultimately, infographics aim to enable effective representation of data and explain complex problems in a clear and understandable way [31]. Using a between-subjects design, we investigate the users’ motivation and evaluate the impact of infographics on players to answer the following research question: To what extent can infographics as supporting knowledge improve the learning experience of users and make learning more effective in an educational game in a smart home security context? Our results indicate a significantly higher number of correct answers, as well as an increase in perceived competence, with the introduction of infographics. Harnessing this motivation and illustration potential, this paper augments the area of educational serious games with immediately comprehensible knowledge representation and provides evidence that, with infographics, players are more effective and motivated and spend more time on self-education.
2 Related Work
2.1 Game-Based Learning for Security Topics
Game-based learning uses different techniques to manipulate the behavior of users in the direction of a specific goal within a non-gaming context [28]. For example, it can be utilized as a marketing strategy to promote products or services, or for training and simulating complex environments virtually [70]. Games can facilitate enjoyment and engagement by increasing intrinsic motivation in contexts that are primarily extrinsically motivated.
Game-based learning approaches, especially mobile learning [29], are relatively new in security education. A study comparing the use of text, videos, and games found that mobile learning can raise awareness of security issues and teach users more effectively than traditional text-based and video-based learning materials [1]. Research studies have shown that serious games provide promising ways to change cybersecurity behaviour [20]. Bahrini et al. [6] developed a gamified application that helps users to understand the consequences of granting permissions to applications. Their results showed that playing the gamified application results in a significant increase in player enjoyment and that the game is more informative than the traditional approach of permission administration via the Android system settings. In an attempt to raise interest in and awareness of the topic of privacy and security settings of mobile devices, Zargham et al. developed a humorous decision-making game that helps users to better understand the consequences of applying security changes on a mobile device [68]. They compared their game to two other approaches (a serious animated video and a humorous animated video) and found that the game-based approach is more successful in engaging users and raising awareness. Wen et al. designed and developed a role-playing game to engage users to learn more about phishing threats in an active and entertaining manner [67]. Their study showed that the game raises awareness of the topic and enhances anti-phishing self-efficacy when facing phishing emails. Chen et al. presented a desktop game aiming to change cybersecurity behavior by translating self-efficacy into game design [14]. Their results showed that the game experience could improve users’ confidence in tackling security issues. Many studies have explored the effectiveness of games for increasing cybersecurity awareness; however, most of them have focused primarily on the entertainment or engagement factors of such games, and very little on the learning effect and behavioural change in users [2,33].
2.2 Supporting Information in Game-Based Learning
Edu-games are seen as one of the most promising forms of computer-based education, and multiple studies have shown their highly engaging potential [35,55]. Nonetheless, there is less support for their educational effectiveness [21,44,66]. Much of the existing work did not evaluate the effectiveness of the components used in an edu-game. One element of a game that is particularly easy to adapt and can have a considerable influence on motivation is feedback. Studies have indicated that in computer-based learning environments, feedback can range from a simple confirmation of a correct answer to a detailed explanation or recommendation. Detailed feedback has a greater impact on learning outcomes and motivation than simple feedback, but this depends on the learners’ attention and ability to correct their actions [11,60]. In an attempt to study the effectiveness of hints, O’Rourke et al. gathered data from 50,000 students and compared four different hint designs based on successful hint systems in intelligent tutoring systems and commercial games [53]. Their results showed that all four hint systems negatively impacted performance compared to a baseline condition with no hints. The authors also suggest that traditional hint systems may not translate well into the educational game environment. Appropriate presentation of feedback could have a considerable impact on the effectiveness of the players and can promote deep, meaningful learning [51]. Studies have shown that people learn more deeply when words are presented in spoken form rather than in printed form [26,56]. However, these studies did not suggest that feedback should always be presented as spoken words. In this paper, we compare text and infographics as process-oriented feedback and evaluate their impact on the user experience and game outcome.
M. Bahrini et al.
2.3
Infographics
Information is remembered better when it is supported with pictures [43]. The use of visual information during learning and instructional processes offers many advantages. Studies showed that if a text is followed by illustrations, learners retain information for longer and are more likely to remember it [5,15,19,47,48,54]. Infographics are a powerful way to distill and explain complex information as a visual narrative and constitute an effective way of communicating data to decision makers who need high-quality information in a bite-sized and easily accessible form [42]. Visual embellishments, including recognizable comics and images, make infographics effective and improve data presentation and memorability [7,9]. Studies show that infographics bring various modalities together in the hope that they will be understood by a wider audience, regardless of their ability to learn. Infographics use text and illustrations or images to help readers better remember the information presented [46]. Kay and Terry [40] argued that inclusion could be achieved through the use of iconic symbols, short facts and captions as a means of highlighting relevant important information in complex documents. Similarly, Knijnenburg and Cherry [41] suggested using comics as a more inviting, understandable and engaging medium to improve the communication of privacy notices. In contrast to the effects of infographics in general [42,46], the use of infographics in edu-games has not been studied thoroughly. This paper showcases the potential of using infographics embedded in an educational game. We aim to help users become more familiar with the security concepts of their devices and to motivate them to increase their knowledge of the topic. Our approach focuses on providing efficient process-oriented feedback in the context of security to help with the understanding of security issues in the smart home environment.
3
Approach
We designed an educational game that uses infographics as supporting knowledge in order to raise players’ awareness of, and increase their interest in, smart home security issues. The learners explore the game levels to interact with smart devices and answer a number of security questions (see Fig. 1). The provided supporting information helps the players to answer the questions and gain a deeper understanding of the new security concerns of smart devices. The game was developed for mobile platforms and follows an ordinary-person narrative. At the beginning of the game, the player meets the character “Luca” in front of his home, who is worried about the security of his smart home devices. Luca has little understanding of how to configure the smart devices. He asks the player to help him by searching for devices and answering related questions. The player enters Luca’s home by ringing the doorbell. There are five rooms in the house, each containing two smart devices. Each time the player enters a room, mellow background music is played. The player should tap on each of the
devices to display a question. Each question screen also has a hint button that gives the player supporting knowledge to answer the question. After the player submits an answer, the game evaluates it and displays a notification. At the end of the game, the player is rewarded based on the number of correct answers. During the game, users have to answer ten questions, each of which is aimed at one smart device. A number of factors were assessed for the selection of devices. A router is essential in the home network. Since most devices are connected to the network via an app, we also included a smartphone as a smart device. We further selected 6 devices (Smart TV, IP Camera, Smart Speaker, Smart Thermostat, Smart Lamp, Smart Plug) that most smart home owners are familiar with. To arouse the players’ curiosity, we chose Smart Home Firewall and Smart Mowing Robot as the last two devices.
Fig. 1. The game helps players to get acquainted with security issues of smart home devices. Narrative (left), question (middle), and supporting knowledge screens (right).
3.1
Question Scenarios
The question selected for each device is based on the security and privacy concerns that have been addressed as threat models in research and articles in recent years [59,69]. From these, 10 recommendations have been selected that are close to the daily life of the users. Admittedly, the number of available recommendations is very large. However, all these items must be taken into account in the device settings. The following is an overview of the selected questions:
– Router: Setting up routers can be a tedious task for non-tech-savvy users. Although companies provide manuals, there is not enough information about the security issues caused by incorrect settings. Users have difficulty understanding configuration tasks such as setting a secure admin password, choosing an appropriate protocol to encrypt the connection and utilizing technologies such as Wi-Fi Protected Setup (WPS) [37]. Consequently, the router question concerns which setup helps to secure a router.
– Smartphone: Nowadays, smartphones are very popular and a convenient means of accessing and controlling smart home devices. Applications are being developed and are available for download from app stores. The use of fake, unofficial or outdated applications could lead to security problems for users’ data and also for smart home devices [63]. Hence, we ask the players how an application could cause a security breach for smart devices.
– Smart TV: The new generation of TVs integrates an operating system running multiple applications and an internet connection, allowing them to offer more services to users; however, this might raise security concerns [4]. Webcam hacking, tracking problems and outdated software pose threats to user privacy.¹ In this scenario, users are encouraged to examine their understanding of these security and privacy issues.
– IP Camera: IP cameras allow users to monitor their properties. They are easy to set up and do not require complex configuration. Users can also use an application to access the camera at any time and from anywhere. These functions are interesting for hackers. Various types of security attacks on the internet have become a serious threat to the video stream from IP cameras [18]. Therefore, users are advised to follow a variety of security recommendations, such as setting camera passwords, using up-to-date applications and enabling video encryption, to protect against these threats.² This question investigates whether users understand the basic settings of a secured IP camera.
– Smart Speaker: It is easy to overlook that intelligent assistants are designed to be at the heart of smart home systems. Besides allowing users to surf the Internet, they can communicate with and control other internet-enabled technologies at home. Recently, it was discovered that one type of attack allows hackers to secretly communicate with a device via white noise or YouTube videos, so that they can send text messages or open malicious websites without the owner knowing [12]. Providing users with information about such harmful attacks helps them to protect their voice assistants.
– Smart Thermostat: Controlling the smart thermostat via apps on smartphones allows users to raise or lower the temperature remotely. Smart thermostats could create a gap in the privacy and security of smart home networks, precisely because they learn about users’ habits and behaviour. Hackers could attack a vulnerable thermostat and obtain information about when users are not home, so they know when to break in without worrying about users returning [27]. Users should understand such complex scenarios well in order to protect their information and property from attackers. The aim of this question is to inform users about the risks of someone gaining access to a smart thermostat.
– Smart Lamp: By connecting a smart lamp to the home network, users can control the brightness and sometimes change the color of the light from their smartphone. This provides more advanced features such as connecting the lamp to an alarm clock or flickering the desk lamp when new messages are received. These features are sometimes associated with security problems that could cause health and financial damage [52]. The purpose of this question is to provide users with recommendations that improve their knowledge and help them decide how to purchase a suitable and secure intelligent lamp.
– Smart Plug: Smart plugs with a cloud connection enable users to monitor and control electronic household appliances from anywhere. To manage them over the Internet, users should have a cloud account on the manufacturer’s website or application and register the smart plug devices in the cloud service. However, smart plugs may suffer from insecure communication protocols and a lack of device authentication [45]. With this question, we investigate the player’s knowledge about user profile creation and their understanding of why the authentication and authorization of a smart plug on the cloud server is important.
– Smart Home Firewall: By connecting smart devices to each other and to the Internet, smart home applications automate complex household tasks. Keeping track of the actions performed and controlling data communication could be confusing for inexperienced users. Firewall rules help to protect the home network from malicious attacks and to control security vulnerabilities [65]. In this scenario, we encourage players to become familiar with firewalls and their role in smart home networks.
– Smart Mowing Robot: Mowing robots are becoming increasingly intelligent. They use GPS information to calculate the desired location and have an internet connection that enables them to communicate with cloud services and their applications. This scenario examines the advantages of using a VPN when the user is away from home and wants to access the home network via a public Wi-Fi hotspot to take control of the smart mowing robot [50].

¹ https://us.norton.com/internetsecurity-iot-smart-tvs-and-risk.html
² https://www.consumer.ftc.gov/articles/0382-using-ip-cameras-safely
3.2
Game Procedure
The game consists primarily of the following building blocks:
– Finding devices: Players should find two devices in each room and answer the questions regarding them.
– Request help: During the game, players may lack the background knowledge to answer the questions. Requesting help gives users insight into the context of the smart device and the related security issues.
– Feedback of answers: After the player submits an answer, the game displays the result. If the answer was wrong, the player is shown the correct answer.
After a game session starts, the avatar is displayed, expressing his goal via a textual speech bubble. By tapping on the doorbell, the player moves to the next state of the game (see Fig. 1).
Question: Once the player enters a room, there are two available smart devices. Tapping on one of them shows the question screen. All questions are multiple choice, and the game informs the players of this when they choose the first device. At the top of the question screen, the player finds two buttons: the hint button on the left displays the supporting knowledge about the device’s security, while the avatar icon on the right explains the general game controls (see Fig. 1). The player is directed to proceed to the next room after answering two questions.
Progression: Each play-through consists of 10 questions. Luca’s home becomes more secure in proportion to the number of correct answers. In order to convey this concept to the player, 3 open red locks are displayed at the start of the game. One of these locks turns into a green closed lock for every three correct answers given by the player, so that with 9 correct answers the player obtains 3 green closed locks.
Supporting knowledge: By tapping on the information icon, the player is directed to the supporting knowledge screen. To compare the effect of text and infographics on the player’s motivation and performance, either text or infographics are displayed (see Fig. 2). The content provided as supporting knowledge is exactly the same in both versions. Every question has its own supporting knowledge, separate from that of the other questions. In the infographics, various symbols have been added to convey the concepts to the players and to increase their attention. A caption was selected for each infographic based on the associated device. For every device, we also designed symbols that convey basic concepts about device configuration or physical forms.
To express the concept of being secure and insecure, there is a closed or open lock icon next to the titles or symbols. These concepts were applied to all infographics. The backyard is considered as the last room. By answering the two related questions, the player is directed to the reward interface where the number of correct answers and the corresponding reward are displayed on the screen.
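The progression rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and the rule "one lock closes per three correct answers, capped at three locks" is our reading of the description, not the authors' implementation.

```python
# Illustrative sketch only; names and the exact rule are assumptions
# derived from the textual description of the progression mechanic.

TOTAL_QUESTIONS = 10   # five rooms with two smart devices each
TOTAL_LOCKS = 3        # open red locks shown at the start of the game
ANSWERS_PER_LOCK = 3   # correct answers assumed to close one lock


def green_locks(correct_answers: int) -> int:
    """Return how many locks are closed (green) for a given score."""
    if not 0 <= correct_answers <= TOTAL_QUESTIONS:
        raise ValueError("correct_answers must be between 0 and 10")
    # Integer division gives one lock per three correct answers;
    # min() caps the result at the three locks that exist.
    return min(TOTAL_LOCKS, correct_answers // ANSWERS_PER_LOCK)


def lock_display(correct_answers: int) -> list[str]:
    """Return the lock states in display order, closed locks first."""
    closed = green_locks(correct_answers)
    return ["green-closed"] * closed + ["red-open"] * (TOTAL_LOCKS - closed)
```

Under this reading, a player with nine or ten correct answers sees three green closed locks, while a player with four correct answers sees one green and two red locks.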
4
Evaluation
To answer our research question, we conducted a between-subjects user study with 60 participants. The first group (Text-Group, n = 30) received descriptive textual background information in the supporting knowledge screen. The second group (Infographics-Group, n = 30), which did not overlap with the first, received infographics instead of text in the supporting knowledge screen. We conducted laboratory study sessions on the university campus, with one participant per session and a duration of 30 to 45 min. As a mobile device, we provided a Google Pixel 2 XL with Android 9.0.
Fig. 2. Supporting knowledge screen: Infographics (left) and Text (right).
1. The interviewer gave the player an introduction to the game and to the security problems of smart home devices.
2. The player ran the game, entered the rooms and answered the related questions. Play time was measured.
3. After the game was over, the player answered a number of questionnaires.
   (a) The first questionnaire contained general questions regarding demographic information (e.g. age and gender).
   (b) In order to measure the usability of the game, the second questionnaire consisted of the System Usability Scale (SUS) [10].
   (c) The motivation of the player was measured using the Intrinsic Motivation Inventory (IMI) [49] on a 7-point Likert scale.
   (d) Besides the standard questionnaires, we asked a number of self-designed context questions. The purpose of these questions was to understand the backgrounds of the players and their familiarity with smart devices.
4.1
Participants
A quota sampling approach was used to recruit participants for this study. Participants were recruited via mailing lists, social networks and word-of-mouth, and by looking for users of smart home devices. Participation was voluntary and uncompensated. The first group consisted of 30 participants, of whom 9 had a college degree and 21 had completed high school. Among the subjects, 15 people identified themselves as male and 15 as female. Participants’ ages ranged from 18 to 54 years, with an average age of 28.9 (SD = 10.25). The second group consisted of 30 participants, of whom 14 had a college degree and 16 had completed high school. Among the subjects, 15 people identified themselves as male and 15 as female. Participants’ ages ranged from 21 to 44 years, with an average age of 30.6 (SD = 6.38).
5
Results
Statistical analysis was applied to identify possible differences between the two groups. To determine the impact of infographics on the players, the data from both groups were compared to each other. After playing the game, participants were also asked to select all the smart home devices they own to see which devices are most commonly used amongst them. It turned out that all participants in the Text-Group owned at least one smart device in their homes and all of them had a smartphone. Table 1 shows an overview of the smart devices owned by the participants in the Text-Group.

Table 1. The number of smart devices owned by the participants in both groups

Device               Text-Group   Infographics-Group
Smart TV             25           29
Smart Lamp           12           10
Smart Speaker        9            10
Smart Plug           3            2
IP Camera            2            3
Smart Thermostat     2            1
Smart Mowing Robot   0            0
Smart Firewall       0            0
The mean SUS score for the Text-Group was 89.9 (N = 30, SD = 14.70). The IMI Interest-Enjoyment score was 6.2 (SD = 0.78), the Perceived Competence score was 3.4 (SD = 0.1) and the Effort-Importance score was 5.6 (SD = 0.97). The average number of correct answers was 2.4 (SD = 1.70) and the average play time was 9.27 min (SD = 1.36). In the Infographics-Group, participants were also asked to select all the smart home devices they own. The results showed that all participants in this group also owned at least one smart device in their homes and had smartphones (see Table 1). The mean SUS score for this group was 84.0 (N = 30, SD = 7.32). The IMI Interest-Enjoyment score was 6.0 (SD = 0.65), the Perceived Competence score was 5.8 (SD = 0.39) and the Effort-Importance score was 5.6 (SD = 0.84). The average number of correct answers was 7.3 (SD = 1.15) and the average play time was 14.77 min (SD = 2.89). Independent Student’s t-tests [64] revealed that the participants in the Infographics-Group (M = 7.3, SD = 1.15), who received supporting knowledge in the form of infographics, gave significantly more correct answers on average (t(58) = 11.734, p < .001, Cohen’s d = 3.030) than the Text-Group participants (M = 2.4, SD = 1.70) (see Fig. 3).
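The SUS means reported above are on the questionnaire's 0 to 100 scale. The paper does not spell out the scoring, so the following is a minimal sketch of the standard SUS scoring procedure, assuming ten items rated from 1 to 5 with odd-numbered items worded positively and even-numbered items worded negatively. The function name and the example responses are hypothetical.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten ratings.

    `responses` holds the ratings for items 1..10, each between 1 and 5.
    Odd-numbered items contribute (rating - 1), even-numbered items
    contribute (5 - rating); the sum is scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:      # positively worded item
            total += rating - 1
        else:                         # negatively worded item
            total += 5 - rating
    return total * 2.5


# Hypothetical single participant, not data from the study.
example = [5, 1, 5, 2, 4, 1, 5, 1, 4, 2]
print(sus_score(example))
```

A group mean such as the ones reported would then be the average of these per-participant scores.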
Fig. 3. The number of correct answers (left) and playing time (right).
For the average playing time, the independent t-test indicated that the Infographics-Group participants (M = 14.77, SD = 2.89) played significantly longer (t(58) = 9.441, p < .001, Cohen’s d = 2.438) than the Text-Group participants (M = 9.27, SD = 1.36) (see Fig. 3).
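As a plausibility check, the playing-time comparison can be recomputed from the reported summary statistics alone. The sketch below assumes a pooled-variance (Student's) independent t-test with 30 participants per group and Cohen's d based on the pooled standard deviation; the function name is hypothetical and the raw data are not available here.

```python
import math


def pooled_t_test(m1, sd1, n1, m2, sd2, n2):
    """Student's t statistic, degrees of freedom and Cohen's d
    computed from group means, standard deviations and sizes."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (m1 - m2) / standard_error
    d = (m1 - m2) / pooled_sd
    return t, df, d


# Playing time in minutes: Infographics-Group vs. Text-Group (values from the text).
t, df, d = pooled_t_test(14.77, 2.89, 30, 9.27, 1.36, 30)
print(f"t({df}) = {t:.3f}, Cohen's d = {d:.3f}")
```

With these inputs the sketch gives a t value of about 9.43 on 58 degrees of freedom and a d of about 2.44, close to the reported 9.441 and 2.438; the small gap is presumably due to rounding in the published means and standard deviations. For equal group sizes n, the two quantities are linked by d = t · sqrt(2/n).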
Fig. 4. The score of IMI test (Perceived Competence).
For the IMI Perceived Competence scores, the independent t-test showed that the Infographics-Group (M = 5.8, SD = 0.39) significantly outperformed (t(58) = 12.456, p < .001, Cohen’s d = 3.216) the Text-Group (M = 3.4, SD = 0.1) (see Fig. 4). We did not observe any significant differences between the two groups in the IMI Interest-Enjoyment (t(58) = 1.317, p = .193) or Effort-Importance (t(58) = 0.237, p = .814) scores. Also, no significant difference in the SUS scores (t(58) = 1.364, p = .178) between the two groups could be found.
6
Discussion and Limitations
The purpose of this study was to investigate how a particular style of feedback, in this case infographics, affects the performance of edu-game players in the context of smart home security. Ultimately, the aim of this experiment was to provide answers to the comprehensive question: To what extent can infographics as supporting knowledge improve the learning experience of users and make learning more effective in an educational game in a smart home security context?
Results from the user study indicate that the game has good usability and that players enjoyed playing it, regardless of the form of supporting knowledge. Furthermore, our results showed high engagement with the topic among the people who played the game. Participants in both groups were eager to spend time playing the game. Players in the Infographics-Group answered significantly more questions correctly than those in the Text-Group; that is, users performed better when they received infographics as supporting knowledge. Due to the high complexity of the topic, the questions could be considered difficult for the average user. However, participants in the Infographics-Group performed reasonably well. This could indicate that using infographics as supporting knowledge could improve the performance of players in an edu-game even when the topic is rather difficult for the average user. The resulting IMI Perceived Competence scores indicate that reading and viewing infographics considerably raises the players’ confidence. The IMI Effort-Importance scores also show that the players in both groups made an effort to answer the questions. However, players in the Text-Group were significantly less successful in terms of performance. Even though participants were eager to answer the questions in both versions of the game, the Infographics-Group scored better. This could be evidence that the increase in correct answers stems not merely from a difference in motivation, but from an actual improvement in technical understanding. Although there was a significant difference in Perceived Competence, we did not observe any significant difference in Interest-Enjoyment or Effort-Importance in the IMI results. Nonetheless, both groups gave very high absolute scores for these subscales. This indicates that both versions managed to foster intrinsic motivation and raise players’ interest in and effort towards the topic, regardless of the form of supporting knowledge.
Many of the game questions were selected from security content that is available on web pages and that users may read in their daily life. It should be stressed that how well participants understood the wording of the questions could also affect the results. Based on the performance of the players and their comments after the experiment, we found that the difficulty of the questions was perceived differently by different participants. Therefore, for future work we suggest designing questions and creating levels based on the complexity and difficulty of the topic. Users’ average playing time was significantly higher in the Infographics-Group than in the Text-Group. One could argue that the difference in play time affects the learning experience of the players. Although this might be true, it could also indicate that users spend more time on the information if it is visualized with infographics rather than presented as text, which in turn leads to a better learning experience. For future research, we suggest implementing a fixed time period for all conditions in which the player can access the supporting knowledge, in order to focus more on the evaluation of the provided supporting knowledge and minimize other possible effects on the learning experience.
The game belongs to the simple quiz genre; other game genres should therefore be evaluated to see whether the findings extend to them. Our approach aimed to help users gain more knowledge of how to make specific security decisions and to raise their awareness of smart home security issues. This knowledge can later help players to make more informed decisions while configuring and setting up their smart home environment. One should keep in mind that it is crucial for educational games in the context of privacy and security to be updated regularly so that they provide the latest information on the topic. While these results present some significant steps forward in the investigation of using infographics as supporting knowledge in the context of smart home security, there are still some limitations that should be addressed. This experiment investigated how well a person performed in answering a question in an edu-game environment when receiving one of two different feedback interventions. Although significant differences in performance between the conditions were found, there was no direct measurement of long-term learning after training. Furthermore, individual difference factors such as playing experience, learning type or background knowledge can also lead to differences in players’ performance. Although the question criteria used in this experiment were carefully derived from many research materials, they were limited to 10 items. It is possible that these criteria were still not specific enough. To understand the full impact of different approaches in game-based learning, future research needs to examine the potential effects of alternative types of instructional support, as well as possible differential effects of timing (e.g., near real-time, delayed).
7
Conclusion and Future Work
This paper presents a novel approach to facilitate awareness and motivation and to enhance the learning experience in an educational game by using infographics as supporting knowledge. We present a game that increases the intrinsic motivation of users and gives them more self-confidence regarding smart home security concerns. Our study shows that the adoption of infographics as supporting knowledge helps users to gain a better understanding of the complex context during the game and allows the players to produce a more engaging output. Our game has shown great potential in terms of usability and, according to most players, can be used to educate people about smart home security concerns. The extent to which users can remember the solutions and security recommendations remains a question for future work. Based on the results of this evaluation, we will attempt to assess the learnability of the topic through the game and the knowledge progress of the users by means of pre- and post-questions and additional smart home devices, questions and problems. The impact of the graphical elements used in the infographics for the purpose of privacy and security learning is also a topic for future work.
Acknowledgement. This work was supported by the German Federal Ministry of Education and Research (BMBF) under the grant 16SV8503 (UsableSec@Home project).
References
1. Abawajy, J.: User preference of cyber security awareness delivery methods. Behav. Inf. Technol. 33(3), 237–248 (2014). https://doi.org/10.1080/0144929X.2012.708787
2. Alotaibi, F., Furnell, S., Stengel, I., Papadaki, M.: A review of using gaming technology for cyber-security awareness. Int. J. Inf. Secur. Res. (IJISR) 6(2), 660–666 (2016)
3. Arachchilage, N.A.G., Love, S., Beznosov, K.: Phishing threat avoidance behaviour: an empirical investigation. Comput. Hum. Behav. 60, 185–197 (2016)
4. Bachy, Y., Nicomette, V., Kaâniche, M., Alata, E.: Smart-tv security: risk analysis and experiments on smart-tv communication channels. J. Comput. Virol. Hack. Tech. 15(1), 61–76 (2019)
5. Baddeley, A.D.: Human Memory: Theory and Practice. Psychology Press, Exeter (1997)
6. Bahrini, M., Volkmar, G., Schmutte, J., Wenig, N., Sohr, K., Malaka, R.: Make my phone secure!: using gamification for mobile security settings. In: Proceedings of Mensch Und Computer 2019, MuC 2019, pp. 299–308. ACM, New York (2019). https://doi.org/10.1145/3340764.3340775
7. Bateman, S., Mandryk, R.L., Gutwin, C., Genest, A., McDine, D., Brooks, C.: Useful junk?: the effects of visual embellishment on comprehension and memorability of charts. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2010, pp. 2573–2582. ACM, New York (2010). https://doi.org/10.1145/1753326.1753716
8. Bellato, N.: Infographics: a visual link to learning. ELearn 2013(12), December 2013. https://doi.org/10.1145/2556598.2556269
9. Borkin, M.A., et al.: What makes a visualization memorable? IEEE Trans. Vis. Comput. Graph. 19(12), 2306–2315 (2013). https://doi.org/10.1109/TVCG.2013.234
10. Brooke, J.: SUS: a retrospective. J. Usab. Stud. 8(2), 29–40 (2013)
11. Burgers, C., Eden, A., van Engelenburg, M.D., Buningh, S.: How feedback boosts motivation and play in a brain-training game. Comput. Hum. Behav. 48, 94–103 (2015). https://doi.org/10.1016/j.chb.2015.01.038
12. Carlini, N., et al.: Hidden voice commands. In: 25th USENIX Security Symposium (USENIX Security 2016), pp. 513–530. USENIX Association, Austin, TX, August 2016. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini
13. Chang, C.Y., Hwang, G.J.: Trends in digital game-based learning in the mobile era: a systematic review of journal publications from 2007 to 2016. Int. J. Mob. Learn. Organ. 13(1), 68–90 (2019)
14. Chen, T., Hammer, J., Dabbish, L.: Self-efficacy-based game design to encourage security behavior online. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3290607.3312935
15. Clark, J.M., Paivio, A.: Dual coding theory and education. Educ. Psychol. Rev. 3(3), 149–210 (1991)
16. Club, D.P.: Mini Metro. Game [Windows] (November 2015). Dinosaur Polo Club, Aotearoa, New Zealand
17. Cone, B.D., Irvine, C.E., Thompson, M.F., Nguyen, T.D.: A video game for cyber security training and awareness. Comput. Secur. 26(1), 63–72 (2007). https://doi.org/10.1016/j.cose.2006.10.005
18. Costin, A.: Security of CCTV and video surveillance systems: threats, vulnerabilities, attacks, and mitigations. In: Proceedings of the 6th International Workshop on Trustworthy Embedded Devices, TrustED 2016, pp. 45–54. Association for Computing Machinery, New York (2016). https://doi.org/10.1145/2995289.2995290
19. Cuevas, H.M., Fiore, S.M., Oser, R.L.: Scaffolding cognitive and metacognitive processes in low verbal ability learners: use of diagrams in computer-based training environments. Instr. Sci. 30(6), 433–464 (2002). https://doi.org/10.1023/A:1020516301541
20. Culyba, S.: The Transformational Framework: A Process Tool for the Development of Transformational Games, September 2018. https://doi.org/10.1184/R1/7130594.v1
21. De Castell, S., Jenson, J.: Digital games for education: when meanings play. Intermédialités: Histoire et théorie des arts, des lettres et des techniques/Intermediality: History and Theory of the Arts, Literature and Technologies (9), 113–132 (2007)
22. Denning, T., Lerner, A., Shostack, A., Kohno, T.: Control-alt-hack: the design and evaluation of a card game for computer security awareness and education. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS 2013, pp. 915–928. Association for Computing Machinery, New York (2013). https://doi.org/10.1145/2508859.2516753
23. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness: defining “gamification”. In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, MindTrek 2011, pp. 9–15. ACM, New York (2011). https://doi.org/10.1145/2181037.2181040
24. Dixon, M., Gamagedara Arachchilage, N.A., Nicholson, J.: Engaging users with educational games: The case of phishing. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3290607.3313026
25. Dreams, D.: Metrico+. Game [Windows] (August 2016). Digital Dreams, Utrecht, Netherlands
26. Fiorella, L., Vogel-Walcutt, J., Schatz, S.: Applying the modality principle to realtime feedback and the acquisition of higher-order cognitive skills. Educ. Technol. Res. Dev. 60, 223–238 (2012). https://doi.org/10.1007/s11423-011-9218-1
27. Fu, K., et al.: Safety, security, and privacy threats posed by accelerating trends in the internet of things. Computing Community Consortium (CCC) Technical report 29(3) (2017)
28. Fuchs, M., Fizek, S., Ruffino, P., Schrape, N.: Rethinking Gamification. Meson Press, Lüneburg (2015)
29. Georgiev, T., Georgieva, E., Smrikarov, A.: M-learning: a new stage of e-learning. In: Proceedings of the 5th International Conference on Computer Systems and Technologies, CompSysTech 2004, pp. 1–5. ACM, New York (2004). https://doi.org/10.1145/1050330.1050437
34
M. Bahrini et al.
30. Giannakas, F., Kambourakis, G., Gritzalis, S.: Cyberaware: a mobile game-based app for cybersecurity education and awareness. In: 2015 International Conference on Interactive Mobile Communication Technologies and Learning (IMCL), pp. 54– 58 (2015) 31. de Haan, Y., Kruikemeier, S., Lecheler, S., Smit, G., van der Nat, R.: When does an infographic say more than a thousand words? Journal. Stud. 19(9), 1293–1312 (2018). https://doi.org/10.1080/1461670X.2016.1267592 32. Heintz, S., Law, E.L.C.: Digital educational games: methodologies for evaluating the impact of game type. ACM Trans. Comput. Hum. Interact. 25(2) (2018). https://doi.org/10.1145/3177881 33. Hendrix, M., Al-Sherbaz, A., Bloom, V.: Game based cyber security training: are serious games suitable for cyber security training? Int. J. Ser. Games 3(1), 53–61 (2016) 34. Huang, W., Tan, C.L.: A system for understanding imaged infographics and its applications. In: Proceedings of the 2007 ACM Symposium on Document Engineering, DocEng 2007, pp. 9–18. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1284420.1284427 35. Hwang, G.J., Wu, P.H.: Advancements and trends in digital game-based learning research: a review of publications in selected journals from 2001 to 2010. Br. J. Educ. Technol. 43(1), E6–E10 (2012). https://doi.org/10.1111/j.1467-8535.2011. 01242.x 36. Johnson, C., Bailey, S., Buskirk, W.: Designing Effective Feedback Messages in Serious Games and Simulations: A Research Review, pp. 119–140, November 2017. https://doi.org/10.1007/978-3-319-39298-1 7 37. Kaaz, K.J., Hoffer, A., Saeidi, M., Sarma, A., Bobba, R.B.: Understanding user perceptions of privacy, and configuration challenges in home automation. In: 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pp. 297–301 (2017) 38. Kappen, D.L., Mirza-Babaei, P., Nacke, L.E.: Gamification through the application of motivational affordances for physical activity technology. 
In: Proceedings of the Annual Symposium on Computer-Human Interaction in Play, CHI PLAY 2017, pp. 5–18. ACM, New York (2017). https://doi.org/10.1145/3116595.3116604 39. Karoui, A., Marfisi-Schottman, I., George, S.: A nested design approach for mobile learning games. In: Proceedings of the 16th World Conference on Mobile and Contextual Learning, mLearn 2017. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3136907.3136923 40. Kay, M., Terry, M.: Textured agreements: re-envisioning electronic consent. In: Proceedings of the Sixth Symposium on Usable Privacy and Security, SOUPS 2010. Association for Computing Machinery, New York (2010). https://doi.org/ 10.1145/1837110.1837127 41. Knijnenburg, B., Cherry, D.: Comics as a medium for privacy notices. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016). USENIX Association, Denver, CO, June 2016. https://www.usenix.org/conference/soups2016/workshopprogram/wfpn/presentation/knijnenburg 42. Lankow, J., Ritchie, J., Crooks, R.: Infographics: The Power of Visual Story Telling. Wiley, Hoboken (2012) 43. Levie, W.H., Lentz, R.: Effects of text illustrations: a review of research. ECTJ 30(4), 195–232 (1982). https://doi.org/10.1007/BF02765184
Enhancing Game-Based Learning Through Infographics
35
44. Linehan, C., Kirman, B., Lawson, S., Chan, G.: Practical, appropriate, empiricallyvalidated guidelines for designing educational games. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1979–1988 (2011) 45. Ling, Z., Luo, J., Xu, Y., Gao, C., Wu, K., Fu, X.: Security vulnerabilities of internet of things: a case study of the smart plug system. IEEE Internet Things J. 4(6), 1899–1909 (2017) 46. Lyra, K.T., et al.: Infographics or graphics+text: which material is best for robust learning? In: 2016 IEEE 16th International Conference on Advanced Learning Technologies (ICALT), July 2016. https://doi.org/10.1109/icalt.2016.83 47. Mayer, R., Bove, W., Bryman, A., Mars, R., Tapangco, L.: When less is more: meaningful learning from visual and verbal summaries of science textbook lessons. J. Educ. Psychol. 88, 64–73 (1996). https://doi.org/10.1037/0022-0663.88.1.64 48. Mayer, R.E.: Multimedia Learning, 2 edn. Cambridge University Press, New York (2009). https://doi.org/10.1017/CBO9780511811678 49. McAuley, E., Duncan, T., Tammen, V.V.: Psychometric properties of the intrinsic motivation inventory in a competitive sport setting: a confirmatory factor analysis. Res. Q. Exerc. Sport 60(1), 48–58 (1989) 50. Molina, M.D., Gambino, A., Sundar, S.S.: Online privacy in public places: how do location, terms and conditions and VPN influence disclosure? In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019. Association for Computing Machinery, New York (2019). https:// doi.org/10.1145/3290607.3312932 51. Moreno, R., Mayer, R.E.: Role of guidance, reflection, and interactivity in an agentbased multimedia game. J. Educ. Psychol. 97(1), 117 (2005) 52. Morgner, P., Mattejat, S., Benenson, Z.: All your bulbs are belong to us: investigating the current state of security in connected lighting systems. ArXiv abs/1608.03732 (2016) 53. 
O’Rourke, E., Ballweber, C., Popovi´ı, Z.: Hint systems may negatively impact performance in educational games. In: Proceedings of the First ACM Conference on Learning @ Scale Conference, L@S 2014, pp. 51–60. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2556325.2566248 54. Paivio, A.: Mental Representations: A Dual Coding Approach, vol. 9. Oxford University Press, New York (1990) 55. Papastergiou, M.: Digital game-based learning in high school computer science education: impact on educational effectiveness and student motivation. Comput. Educ. 52(1), 1–12 (2009). https://doi.org/10.1016/j.compedu.2008.06.004 56. Park, B., Flowerday, T., Br¨ uken, R.: Cognitive and affective effects of seductive details in multimedia learning. Comput. Hum. Behav. 44, 267–278 (2015). https:// doi.org/10.1016/j.chb.2014.10.061 57. Plass, J.L.: Handbook of Game-Based Learning. MIT Press, Cambridge (2020) 58. State of Play Games. Lumino City. Game [Windows], December 2014. State of Play Games, London, United Kingdom 59. Schiefer, M.: Smart home definition and security threats. In: 2015 Ninth International Conference on IT Security Incident Management IT Forensics, pp. 114–118 (2015) 60. Serge, S.R., Priest, H.A., Durlach, P.J., Johnson, C.I.: The effects of static and adaptive performance feedback in game-based training. Comput. Hum. Behav. 29(3), 1150–1158 (2013). https://doi.org/10.1016/j.chb.2012.10.007
36
M. Bahrini et al.
61. Sheng, S., et al.: Anti-phishing phil: the design and evaluation of a game that teaches people not to fall for phish. In: Proceedings of the 3rd Symposium on Usable Privacy and Security, SOUPS 2007, pp. 88–99. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1280680.1280692 62. Shute, V.: Focus on formative feedback. Rev. Educ. Res. 78, 153–189 (2008). https://doi.org/10.3102/0034654307313795 63. Sivaraman, V., Chan, D., Earl, D., Boreli, R.: Smart-phones attacking smarthomes. In: Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks, WiSec 2016, pp. 195–200. Association for Computing Machinery, New York (2016). https://doi.org/10.1145/2939918.2939925 64. Student: The probable error of a mean. Biometrika, pp. 1–25 (1908) 65. ur Rehman, S., Gruhn, V.: An approach to secure smart homes in cyber-physical systems/internet-of-things. In: 2018 Fifth International Conference on Software Defined Systems (SDS), pp. 126–129 (2018) 66. Van Eck, R.: Building artificially intelligent learning games. In: Games and Simulations in Online Learning: Research and Development Frameworks, pp. 271–307. IGI Global (2007) 67. Wen, Z.A., Lin, Z., Chen, R., Andersen, E.: What.hack: engaging anti-phishing training through a role-playing phishing simulation game. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/ 3290605.3300338 68. Zargham, N., Bahrini, M., Volkmar, G., Wenig, D., Sohr, K., Malaka, R.: What could go wrong? raising mobile privacy and security awareness through a decisionmaking game. In: Extended Abstracts of the Annual Symposium on ComputerHuman Interaction in Play Companion Extended Abstracts, CHI PLAY 2019, pp. 805–812. Extended Abstracts, Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3341215.3356273 69. 
Zeng, E., Mare, S., Roesner, F.: End user security and privacy concerns with smart homes. In: Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017), pp. 65–80. USENIX Association, Santa Clara, July 2017. https://www.usenix.org/ conference/soups2017/technical-sessions/presentation/zeng 70. Zichermann, G., Cunningham, C.: Gamification by Design: Implementing Game Mechanics in Web and Mobile Apps. O’Reilly Media Inc., Sebastopol (2011)
Automatic Generation of Game Levels Based on Controllable Wave Function Collapse Algorithm

Darui Cheng, Honglei Han(B), and Guangzheng Fei

Communication University of China, Beijing, China
{diary,hanhonglei,gzfei}@cuc.edu.cn
Abstract. Procedural content generation automatically creates game content through methods such as pseudo-random numbers, which saves labor and makes it possible to create games that can be replayed indefinitely. The wave function collapse algorithm is an effective procedural content generation algorithm proposed in recent years, but its rules are complicated to write and it lacks non-local constraints. In this paper, we extend the original wave function collapse algorithm with an automatic rule system that simplifies rule writing. Moreover, we use three mechanisms, namely global constraints, multi-layer generation, and distance constraints, to establish non-local constraints. Experiments comparing our method with the original wave function collapse algorithm show that manual control is enhanced and that the generated levels have a certain degree of similarity with human-designed levels. A real-time dynamic level game demo built with this method indicates that the proposed controllable wave function collapse algorithm has great potential in the field of game level generation.

Keywords: Procedural content generation · Wave function collapse · Level generation · Game design
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 37–50, 2020. https://doi.org/10.1007/978-3-030-65736-9_3

1 Introduction

Procedural Content Generation (PCG) is a significant research field in game development. PCG creates game content automatically using pseudo-random numbers and other methods. Applying PCG in game development helps to save manpower, assists game designers in exercising their creativity, saves storage space, and makes games with infinite replayability possible [1]. PCG algorithms can be divided into the following four categories: constructive methods, search-based methods, constraint-based methods, and machine-learning methods [2]. Among them, the Wave Function Collapse (WFC) algorithm is a newly proposed PCG algorithm based on constraint solving, which has great potential value in the game development field.

However, the WFC algorithm has four problems: complicated rule configuration, lack of global control, difficulty in achieving distance constraints, and inability to generate multi-layer levels. Therefore, we propose a method that adds more constraints to the original WFC algorithm to solve these problems. First, to deal with the complex adjacency rules that must be configured manually in the original WFC algorithm, we propose an automatic rule system that reduces the number of rules that need to be written. Second, we add global control to the original WFC algorithm through global maximum constraints and preset tiles, which gives game designers more ways to control the generated results. Third, distance constraints are integrated into the observation stage of the WFC algorithm. Finally, multi-layer level generation is supported, so that game elements with different semantics, such as NPCs and interactive items, can be placed on different layers, with connections and constraints between the layers.
2 Related Work

2.1 Procedural Content Generation

Previous studies have recognized the importance of PCG in different aspects of video game creation. The current use of PCG technology in game development is mainly limited to specific types of game elements. Because the design ideas for different types of game levels are not the same, it is challenging to generate an entire game level procedurally.

Lawrence Johnson et al. used a constructive PCG algorithm to create infinite cave levels. Their algorithm, based on cellular automata, was evaluated in an infinite cave game and generated playable and well-designed cave levels [3]. Miguel Frade et al. used a genetic algorithm to evolve terrain in order to obtain game levels with sufficient accessibility [4], which is a search-based approach.

PCG via machine learning (PCGML) has also become a research hotspot, and a lot of research has been carried out in this direction. Several studies [5–11] each used different machine learning methods to train on the VGLC [12], a small dataset of Super Mario Bros levels, attempting to generate new levels. Nonetheless, none of them meets the requirements of sufficient playability and controllability. Generative Adversarial Networks (GAN) are a machine learning method suited to generation tasks. Giacomello et al. [13] fed a manually extracted level feature vector to a conditional GAN level generator, trying to generate levels that are visually and structurally similar to human-designed ones. Volz et al. [14] combined GAN with evolutionary search. In a two-stage experiment, they screened the generated results and used them for retraining in order to achieve directed generation. However, neither study achieved fully controllable level generation.

The constructive, search-based, and machine learning methods mentioned above have various drawbacks and are only suitable for specific generation goals. Constructive algorithms use fixed procedures to generate game content in one pass without testing, and may produce results that are beyond human control as well as unusable [1]. A search-based algorithm specifies an evaluation function and uses evolutionary search to obtain a better result through continuous generation and evaluation, so its efficiency is relatively low [15]. All machine learning methods share a common defect: they need a large dataset for training, which is not practical in the field of games.
It is difficult to collect a large game level construction dataset manually. Also, different game types require completely different level datasets, so the dataset has to be rebuilt each time a new game is developed. Currently, all PCG methods based on machine learning can only be trained on very small datasets, so the results generally lack playability and the failure rate is relatively high.

Compared with the above three types of algorithms, constraint solving algorithms have certain advantages in game level generation: they can randomly generate levels that fully satisfy the constraints specified by the designer, and their controllability is higher.

2.2 Constraint Solving Algorithm

A constraint solving algorithm is neither a constructive PCG algorithm nor a strictly search-based PCG method. As an approach from traditional artificial intelligence, constraint solving uses ideas from knowledge representation and search to model continuous and combinatorial search and optimization problems and to solve them with domain-independent algorithms [16]. Generating a game level under given constraints is in fact a constraint satisfaction problem (CSP), which is usually defined in terms of a set of decision variables and their values. PCG based on constraint solving can usually guarantee the properties of the output precisely, but at the cost of an unpredictable total running time [17].

Michael Cerny Green et al. took the lead in applying constraint solving to the generation of dungeons in games. They divided the algorithm into two stages, namely Layout Creator and Game Element Furnisher, and used constraint solving in both of them [18]. However, their method is only applicable to the generation of dungeon rooms or cave levels. It cannot control the adjacency of individual tiles in the map in detail, so it is not suitable for outdoor terrain generation. Moreover, because traditional constraint solving methods iterate step by step, a scan has to be run to determine an appropriate position each time an element is added. When there are too many game elements, this easily leads to confusion and low efficiency.

2.3 Wave Function Collapse Algorithm

The WFC algorithm divides a game level into a grid and initially lets every cell hold all possible values, borrowing the concept of the superposition state from quantum physics. During operation, it determines the value of each cell one by one through "observation", propagates the effect to adjacent cells, and finally arrives at a feasible solution that meets all constraints. Although the algorithm may fail because of conflicts, it is more efficient than traditional search-based constraint solving.

Many researchers have applied the WFC algorithm in different studies. Isaac Karth et al. summarized a data-driven branch of WFC algorithms, namely the overlapping model. This model takes a bitmap file as input, reads the pixel-to-pixel adjacency relationships in the bitmap directly as constraints, and uses them to guide the propagation stage of the WFC algorithm [17]. The overlapping model has remarkable advantages as a texture generator but is not suitable for generating game levels. Werner Gaisbauer et al. tried to generate a virtual city for a specific game theme by adjusting the parameters of the basic WFC algorithm [19]. They only searched for the best parameters for city generation in a specific game and made no algorithm-level improvements. Hugo Scurti et al. applied the WFC algorithm to generate random paths for NPCs in games [20]. Hwanhee Kim et al. extended the WFC algorithm from the existing grid-based system to a graph-based system, breaking through the grid limitation of the original WFC algorithm and making it more widely applicable [21].

Nevertheless, the above studies have ignored the possibility of integrating non-local constraints and multi-layer generation into the WFC algorithm. Our work is motivated by this gap. WFC, as an algorithm based on adjacency constraints, inherently lacks non-local constraints. The method described herein combines the original WFC algorithm proposed by Maxim Gumin [22] with the dungeon room generator presented by Green et al. [18] and, using the idea of solving a CSP, provides a robust control handle for game designers. It addresses the problems of the WFC algorithm mentioned above and helps to obtain level generation results that are more controllable and more similar to human designs.
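The observe-and-propagate loop described in this section can be sketched as follows. This is a minimal illustration rather than the implementation of [22] or of this paper: the three-tile set, its weights, the adjacency table `ALLOWED`, and all function names are assumptions made for the example. The cell to observe is chosen by the Shannon entropy of the tile weights, and a contradiction simply triggers a restart.

```python
import math
import random

# Hypothetical tile set: selection weights and symmetric adjacency rules.
WEIGHTS = {"water": 1.0, "sand": 1.0, "grass": 2.0}
ALLOWED = {
    "water": {"water", "sand"},
    "sand": {"water", "sand", "grass"},
    "grass": {"sand", "grass"},
}

def entropy(options):
    """Shannon entropy of the weight-normalised distribution over options."""
    total = sum(WEIGHTS[t] for t in options)
    return -sum((WEIGHTS[t] / total) * math.log(WEIGHTS[t] / total) for t in options)

def neighbours(x, y, w, h):
    """Yield the 4-connected neighbours of (x, y) inside a w x h grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < w and 0 <= y + dy < h:
            yield x + dx, y + dy

def propagate(grid, start, w, h):
    """Drop neighbour options that no remaining option of a changed cell allows."""
    stack = [start]
    while stack:
        x, y = stack.pop()
        supported = set().union(*(ALLOWED[t] for t in grid[y][x]))
        for nx, ny in neighbours(x, y, w, h):
            kept = grid[ny][nx] & supported
            if not kept:
                return False  # contradiction: this attempt has failed
            if kept != grid[ny][nx]:
                grid[ny][nx] = kept
                stack.append((nx, ny))
    return True

def generate(w, h, rng, max_tries=20):
    """Return a w x h level (list of rows of tile names) satisfying ALLOWED."""
    for _ in range(max_tries):
        grid = [[set(WEIGHTS) for _ in range(w)] for _ in range(h)]
        ok = True
        while ok:
            open_cells = [(x, y) for y in range(h) for x in range(w)
                          if len(grid[y][x]) > 1]
            if not open_cells:
                return [[next(iter(cell)) for cell in row] for row in grid]
            # Observation: lowest-entropy cell, random tie-break, weighted choice.
            x, y = min(open_cells,
                       key=lambda c: (entropy(grid[c[1]][c[0]]), rng.random()))
            options = sorted(grid[y][x])
            choice = rng.choices(options, weights=[WEIGHTS[t] for t in options])[0]
            grid[y][x] = {choice}
            ok = propagate(grid, (x, y), w, h)
    raise RuntimeError("no consistent level found")
```

Calling `generate(6, 5, random.Random(0))` returns a grid of tile names in which water never touches grass. Passing the random generator in explicitly keeps a run reproducible for a given seed.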
3 Method

The WFC algorithm can be divided into the overlapping model and the simple tiled model. The overlapping model is suitable for procedural texture generation [17]. In the simple tiled model, more complex rules can be configured manually to restrict the adjacency between tiles with different semantics, which makes it more suitable for game level generation. Since the problem we need to solve is the generation of game levels, we select the simple tiled model. The method in this paper is shown in Fig. 1 and consists of the following four steps, which contain the key improvements that we introduce to the original WFC algorithm.

Fig. 1. Main program flow.

1. Rule initialization: In the original WFC algorithm, all allowed adjacency rules in the grid are explicitly written into a configuration file. According to the rule library specified in this file, each run of the WFC algorithm produces a solution that meets all the adjacency rules, that is, a game level with a certain playability. To reduce the number of rules that need to be written manually, we integrate an automatic rule system into the rule initialization phase, which expands one manual rule into multiple equivalent rules based on the symmetry of tiles.
2. Data initialization: Before each generation, all data in the grid are cleared and initialized. Before the first observation stage, we insert a global minimum constraint stage that fixes some preset tiles at specific positions to enhance the game designer's control over the result.
3. Observation: Following information entropy theory, we calculate the information entropy H of every cell with Eq. (1), where P_i is the probability of the i-th available tile in the cell. The cell with the lowest information entropy is selected as the observation cell. From all the remaining available tiles in that cell, one is selected randomly according to its weight. After the value of a cell is determined, the global maximum constraint, the inter-layer constraint, and the distance constraint are applied to solve a series of problems of the original WFC algorithm.
$$H = -\sum_{i=1}^{n} P_i \log P_i \tag{1}$$
4. Propagation: The constraint caused by the value determined in the observation stage is propagated to adjacent cells. Then, the undetermined cell with the lowest information entropy is selected for the next observation. This process is repeated until every cell has converged to a single observed value.

3.1 Automatic Rule System

To simplify rule configuration by exploiting tile transformations, we summarize 8 commonly used transformation forms of a tile, as shown in Fig. 2. Each form is marked with an index number, which makes lookup convenient.
Fig. 2. The 8 transformation forms of a tile. The first row is obtained by successively rotating the No. 0 pattern 90° counterclockwise, and the second row contains the left-right mirror images of the first row.
First, we temporarily ignore the symmetry of tiles and expand each rule. As shown in Fig. 3, the No. 0 transformation of the road tile and the No. 1 transformation of the road turn tile form a manually configured adjacency rule. By rotation and mirroring, this rule can be expanded into 8 rules. Since a rule between two tiles is mutual, it is finally stored as 16 rules.
Fig. 3. Get 8 equivalent rules from one left-right rule.
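The expansion of one manual rule into 16 stored rules can be sketched as below. The sketch assumes the numbering of Fig. 2 (forms 0 to 3 are successive 90° counterclockwise rotations, forms 4 to 7 are their left-right mirror images). The direction encoding (0 = right, 1 = up, 2 = left, 3 = down), the rule tuple layout, and the function names are hypothetical choices for this example, not the paper's data format.

```python
def compose(g, k):
    """Index of the form obtained by applying form k first, then form g.

    Writing R for a 90-degree counterclockwise rotation and M for the
    left-right mirror, form j is R^j for j < 4 and M R^(j-4) otherwise.
    The identity R M = M R^-1 gives the two cases below.
    """
    g_rot, g_mir = g % 4, g >= 4
    k_rot, k_mir = k % 4, k >= 4
    if not k_mir:
        rot, mir = (g_rot + k_rot) % 4, g_mir
    else:
        rot, mir = (k_rot - g_rot) % 4, not g_mir
    return rot + (4 if mir else 0)

def move_direction(g, d):
    """Direction d (0=right, 1=up, 2=left, 3=down) after applying form g."""
    d = (d + g % 4) % 4          # rotation turns the direction counterclockwise
    if g >= 4:
        d = (2 - d) % 4          # mirroring swaps left and right
    return d

def expand_rule(tile_a, form_a, tile_b, form_b, direction):
    """Expand 'tile_b (form_b) lies in `direction` of tile_a (form_a)'.

    Returns the 8 transformed rules plus their 8 mutual counterparts as
    tuples (tile, form, neighbour_tile, neighbour_form, direction).
    """
    rules = set()
    for g in range(8):
        a, b = compose(g, form_a), compose(g, form_b)
        d = move_direction(g, direction)
        rules.add((tile_a, a, tile_b, b, d))
        rules.add((tile_b, b, tile_a, a, (d + 2) % 4))  # same rule seen from tile_b
    return rules
```

For the rule of Fig. 3, `expand_rule("road", 0, "road_turn", 1, 0)` yields 16 tuples, including the original rule and its mirrored version `("road", 4, "road_turn", 5, 2)`. Tile symmetry is deliberately ignored here, as in the first expansion step described above.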
Next, we consider the symmetry of the tiles. The method described above removes the need to repeat different transformation forms of the same rule, but there are still too many rules to write. For rules that involve symmetrical tiles, the equivalent rules can be extended using symmetry, which further reduces the number of rules required. Therefore, on the basis of the original WFC algorithm [22] and the WFC algorithm of Arunpreet Sandhu et al. [23], the common symmetry types of tiles need to be extended. To build a more comprehensive system, we summarize the symmetry types of tiles into nine types, as shown in Table 1. Each type is named after a letter (or symbol) that has the same symmetry.

Table 1. Symmetry dictionary.

| Name | Symmetry type | Initial tile's equivalent transformation number | New symmetry type after transformation |
|------|---------------|--------------------------------------------------|----------------------------------------|
| F | No symmetry | 0 | F/F/F/F/F/F/F/F |
| S | Centrosymmetric | 0/2 | S/S/S/S/S/S/S/S |
| T | Vertical axis symmetry | 0/4 | T/B/T/B/T/B/T/B |
| L | Counter-diagonal axis symmetry | 0/5 | L/Q/L/Q/Q/L/Q/L |
| B | Horizontal axis symmetry | 0/6 | B/T/B/T/B/T/B/T |
| Q | Main diagonal axis symmetry | 0/7 | Q/L/Q/L/L/Q/L/Q |
| I | Horizontal and vertical axis symmetry | 0/2/4/6 | I/I/I/I/I/I/I/I |
| / | Double diagonal axis symmetry | 0/2/5/7 | ‘/’/‘/’/‘/’/‘/’/‘/’/‘/’/‘/’/‘/’ |
| X | All 8 transforms are identical | 0/1/2/3/4/5/6/7 | X/X/X/X/X/X/X/X |
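The "equivalent transformation number" column of Table 1 can be cross-checked on concrete tiles. The sketch below is an illustration under assumptions, not the paper's dictionary lookup: tiles are represented as small square character grids (a hypothetical encoding), the 8 forms follow the numbering of Fig. 2, and the equivalent numbers are derived by comparing each transformed grid with the initial one.

```python
def rotate_ccw(tile):
    """Rotate a square grid (tuple of equal-length strings) 90 degrees counterclockwise."""
    n = len(tile)
    return tuple("".join(tile[c][n - 1 - r] for c in range(n)) for r in range(n))

def mirror_lr(tile):
    """Mirror a grid left to right."""
    return tuple(row[::-1] for row in tile)

def transform(tile, k):
    """Form k of a tile: k % 4 counterclockwise rotations, then a mirror if k >= 4."""
    out = tuple(tile)
    for _ in range(k % 4):
        out = rotate_ccw(out)
    return mirror_lr(out) if k >= 4 else out

def equivalent_to_initial(tile):
    """Transformation numbers whose result looks identical to the initial tile."""
    return [k for k in range(8) if transform(tile, k) == tuple(tile)]

def distinct_variants(tile):
    """The set of visually different forms among the 8 transformations."""
    return {transform(tile, k) for k in range(8)}

# Hypothetical 3 x 3 example tiles, one per symmetry type shown.
TILE_F = ("##.", "#..", "#..")   # no symmetry
TILE_T = ("###", ".#.", ".#.")   # vertical axis symmetry
TILE_Q = ("##.", "#..", "...")   # main diagonal axis symmetry
TILE_X = (".#.", "###", ".#.")   # all 8 transforms identical
```

With these grids, `equivalent_to_initial(TILE_T)` gives `[0, 4]` and `equivalent_to_initial(TILE_Q)` gives `[0, 7]`, matching the T and Q rows of Table 1. A tile with 2 equivalent numbers has only 4 distinct variants, which is where the saving in written rules comes from.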
The 8 transformation forms and 9 symmetry types form a complicated mapping relationship. Quickly finding the transformation number of the equivalent tile after a symmetry transformation is therefore the key to the automatic rule system. For this reason, we propose the concept of the symmetry dictionary, which records, for each symmetry type, the transformation numbers equivalent to the initial tile. The symmetry dictionary is shown in Table 1.

The automatic rule system expands manually configured rules into a large number of equivalent rules, which spares the game designer the work of configuring them and significantly simplifies writing the configuration file. For example, if all 4 sides of the 8 transformation forms of 2 tiles need to be configured to allow adjacency, 8 × 8 × 4 × 2 = 512 rules have to be written manually. With the automatic rule system, only 1 to 8 rules need to be written, and the system extends the rest. As a result, the efficiency of rule writing is improved significantly.

3.2 Non-local Constraints

Global Constraint. One of the main problems of the WFC algorithm is the lack of global constraints, which cover two aspects: the global maximum constraint and the global minimum constraint. The global maximum constraint limits the maximum number of a specific kind of tile, while the global minimum constraint requires that a specific type of tile appears at least a given number of times. Without global constraints, the WFC algorithm cannot control the total number of each game element generated in the level.
The effect of applying the global maximum constraint is shown in Fig. 4. No constraint is applied in Fig. 4(a), so the water area is too large. In Fig. 4(b), the global maximum constraint on water tiles is set to 20 units, which produces the expected result.
Fig. 4. Two generated levels without (a) and with (b) global maximum constraint.
The global minimum constraint is equivalent to presetting the minimum number of tiles at specified or random positions in advance. To preset a tile, the value of the specified cell is set to the specified tile after the data initialization stage, and its influence is then propagated. Game designers can also use this function to preset key design elements in advance, and the WFC algorithm automatically generates the remaining parts. The effect of the global minimum constraint is shown in Fig. 5: position (0, 0) is preset to water, and the WFC algorithm generates the rest of the level without violating the preset constraint.
Fig. 5. Global minimum constraint effects.
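One possible way to realise the two global constraints is sketched below. The paper does not spell out its data structures, so the grid-of-option-sets representation and the function names are assumptions for this example: preset tiles are written into the grid after initialization, and once a tile type has reached its limit it is banned from every undecided cell.

```python
from collections import Counter

def place_presets(grid, presets):
    """Global minimum constraint: fix preset tiles, e.g. {(0, 0): "water"}.

    grid[y][x] is the set of tiles still possible in that cell.
    """
    for (x, y), tile in presets.items():
        grid[y][x] = {tile}

def enforce_global_max(grid, limits):
    """Global maximum constraint: ban tiles whose count has reached its limit.

    Returns the coordinates of the cells whose options changed, so that the
    caller can propagate the change to their neighbours.
    """
    placed = Counter(next(iter(cell)) for row in grid for cell in row
                     if len(cell) == 1)
    changed = []
    for tile, limit in limits.items():
        if placed[tile] < limit:
            continue
        for y, row in enumerate(grid):
            for x, cell in enumerate(row):
                if len(cell) > 1 and tile in cell:
                    cell.discard(tile)
                    changed.append((x, y))
    return changed
```

In a full generator, `enforce_global_max` would be called after each observation, followed by a propagation pass over the returned cells. Note that this simple version only stops further placements; it does not undo tiles that propagation may already have forced.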
Multi-layer Generation. A game scene is often composed of multiple layers of game elements with different semantics, and different types of game elements need to be generated on different layers. Because game elements on different layers can overlap each other, adjacency constraint rules usually do not apply to them; they require inter-layer constraints instead, which are difficult to achieve with the original WFC algorithm. Figure 6(a) shows a single-layer level, which generally suits only simple types of game scenes. In Fig. 6(b), enemies and treasure chests are added: an enemy can only be generated on grass, and a treasure chest can only be generated on the road. This result was produced with the method of this study.
Fig. 6. Two generated levels, with single-layer generation (a) and double-layer generation (b).
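An inter-layer rule such as "enemies only on grass, treasure chests only on the road" can be sketched as a lookup from the observed ground tile to the items allowed above it. The table `ALLOWED_ABOVE`, the `"empty"` placeholder item, and the function name are assumptions for this illustration.

```python
# Hypothetical inter-layer rules: items that may sit above each ground tile.
ALLOWED_ABOVE = {
    "grass": {"empty", "tree", "rock", "enemy"},
    "road": {"empty", "chest", "key"},
    "water": {"empty"},
}

def on_ground_observed(upper, x, y, ground_tile):
    """Ban upper-layer options that the observed ground tile does not allow.

    upper[y][x] is the set of items still possible in the upper-layer cell.
    Returns True if the cell still has at least one option left.
    """
    upper[y][x] &= ALLOWED_ABOVE[ground_tile]
    return bool(upper[y][x])
```

The function would be called each time a bottom-layer cell is observed, so the upper layer can afterwards be solved by its own WFC pass on the reduced option sets.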
To achieve multi-layer generation, after each observation of the bottom layer is completed, the tiles that do not meet the inter-layer constraints are banned in the corresponding upper-layer cell. In this way, the definition of inter-layer constraint rules establishes the connection between the two layers. With multi-layer generation, the richness and playability of the game levels are improved, and the automatically generated levels are no longer just a flat one-layer game map.

Distance Constraint. The distance between game elements is closely related to the playability of a game, so it is a significant factor for game designers when they build game scenes. However, the WFC algorithm is built mainly on adjacency constraints and lacks distance constraints, so game designs that rely on distance constraints cannot be achieved. As shown in Fig. 7(a), the distance between the treasure chest and the enemy cannot be constrained effectively, which makes it impossible to meet the design expectation that enemies surround the treasure chest. In view of this, it is necessary to control the maximum and minimum distance between tiles and to combine this control with adjacency constraints to achieve comprehensive distance constraints. In Fig. 7(b), the game designer limits the distance between treasure chests and enemies to less than 10 units and the distance between treasure chests and keys to greater than 10 units. The generation results then meet the design goals: enemies are distributed around the treasure chest, and keys are kept away from it.

Fig. 7. Two generated levels, without (a) and with (b) distance constraints.
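A distance constraint of this kind can be sketched as a ban applied right after an anchor item, such as a treasure chest, has been observed. The paper does not state which distance metric it uses; Manhattan distance on grid cells is an assumption here, as are the function name and the option-set representation.

```python
def apply_distance_rule(upper, anchor, tile, min_dist=0, max_dist=None):
    """Ban `tile` from cells too close to or too far from `anchor`.

    upper[y][x] is the set of items still possible in that cell, and anchor
    is the (x, y) position of the item just observed. Distances are Manhattan
    distances (an assumption). Cells whose only option is `tile` are left
    untouched. Returns the coordinates of the cells that changed.
    """
    ax, ay = anchor
    changed = []
    for y, row in enumerate(upper):
        for x, cell in enumerate(row):
            dist = abs(x - ax) + abs(y - ay)
            too_close = dist < min_dist
            too_far = max_dist is not None and dist > max_dist
            if (too_close or too_far) and tile in cell and len(cell) > 1:
                cell.discard(tile)
                changed.append((x, y))
    return changed
```

For the example of Fig. 7(b), one call with `tile="enemy", max_dist=10` and one with `tile="key", min_dist=10` after each chest placement would keep enemies near the chest and keys away from it. With several chests, a stricter or looser combination rule would have to be chosen by the designer.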
4 Experiments and Analysis All experiments in this study were completed on a computer with a CPU of Intel Core i7-8750H and a graphics card of Intel UHD Graphics 630. 4.1 Comparison with the Original WFC Algorithm First, to verify the improvement of this method after adding more constraints on the original WFC algorithm, an experiment was conducted to compare the results of this method with the original WFC algorithm when generating a birds-eye-view 2D game level. According to a game designer’s requirements, the automatically generated level needs to meet the following three constraint rules: 1. Global Constraint. In the case of a scene size of 20 × 20, the maximum number of water tiles is not more than 20 units. Besides, the bottom left corner (0, 0) floor tile is water, and the center (10, 10) tile is grass. 2. Distance constraint. The minimum distance between a treasure chest and a key is 10 units, and the maximum distance is 20 units. The maximum distance between a treasure chest and an enemy is 10 units. 3. Inter-layer constraint. Trees, rocks, and enemies must be generated on the grass, and treasure chests and keys must be generated on the road. The final result is shown in Fig. 8, the two sub-pictures in (a) use Maxim Gumin’s WFC algorithm level generator [22], and the two sub-pictures in (b) use the method of our study. Due to the original WFC algorithm cannot generate multiple layers, the results of the original algorithm do not generate the second layer of elements. In the original
WFC algorithm, as shown in Fig. 8(a), the global parameters are not controlled, and only a visually reasonable map is generated. In particular, with its large water area and its lack of decorations and interactive items, the map offers little playability. Although a nice visual effect is achieved, the result is not suitable for game level generation. In Fig. 8(b), our method uses the global maximum constraint to limit the water area to no more than 20 units. At the same time, the multi-layer generation method produces two layers, so that more playable elements are generated above the map layer. The two preset tiles are marked with blue squares in Fig. 8(b), and the number of water tiles does not exceed the maximum constraint of 20 units, so the game designer achieves the goal of generating a small pool in the lower-left corner. The distance between the treasure chest and the key is marked with an orange circle in Fig. 8(b), and it can be seen to meet the constraint of 10–20 units. The light-yellow rectangular frame in Fig. 8(b) shows the aggregation effect of the distance constraint, which makes enemies always appear around the treasure chest. Finally, trees, rocks, and monsters are always generated on grass, while treasure chests and keys are always generated on the road, which satisfies the strict inter-layer rules.
Fig. 8. Comparison between the method of [22] (a) and our method (b).
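The global constraint rule of this experiment (a maximum tile count plus preset tiles) can be checked on a finished bottom layer with a few lines of code. The sketch below is illustrative only: it assumes the layer is a list of lists of tile names indexed as `grid[x][y]`, and the function name `check_global` is hypothetical.

```python
from typing import Dict, List, Tuple


def check_global(
    grid: List[List[str]],
    max_counts: Dict[str, int],
    presets: Dict[Tuple[int, int], str],
) -> List[str]:
    """Return human-readable violations of the global constraints (empty if valid)."""
    violations = []
    # Maximum-count constraints, e.g. at most 20 water tiles.
    for tile, limit in max_counts.items():
        count = sum(row.count(tile) for row in grid)
        if count > limit:
            violations.append(f"{tile}: {count} tiles exceeds maximum {limit}")
    # Preset tiles, e.g. water at (0, 0) and grass at (10, 10).
    for (x, y), tile in presets.items():
        if grid[x][y] != tile:
            violations.append(f"({x}, {y}) is {grid[x][y]}, expected {tile}")
    return violations
```

For the experiment above, the call would be `check_global(grid, {"water": 20}, {(0, 0): "water", (10, 10): "grass"})` on a 20 × 20 grid. A generator would normally enforce these limits during collapse rather than check them afterwards; the post-hoc check is simply the easiest form to read.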
4.2 Similarity to Human-Designed Levels
To analyze the difference between the game levels we generate automatically and the levels created manually by game designers, we conducted a user survey. We invited
5 game designers to design 5 levels manually and generated 5 levels automatically with our method under the same constraints, then mixed them and shuffled the order to form a questionnaire. Five game designers were invited so that the survey results would not be affected by the personal design style of any single designer. The questionnaire informed the respondents of the underlying meaning of the game elements and the design intention of these levels. We also informed them that only 5 of the levels were human-designed. The respondents were then asked to choose, from the 10 game levels, the 5 levels that they thought were manually designed. In the end, we received 151 valid responses from students with some game experience. The statistical results are shown in Table 2.
Table 2. The user survey results.
Level number                                  Manual 1  Manual 2  Manual 3  Manual 4  Manual 5  Average
Percentage considered to be human-designed    61.59%    32.45%    70.20%    66.23%    68.87%    59.87%
Level number                                  Auto 1    Auto 2    Auto 3    Auto 4    Auto 5    Average
Percentage considered to be human-designed    57.62%    40.40%    37.75%    33.11%    31.79%    40.13%
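The two averages reported in Table 2 follow directly from the per-level percentages. The short check below reproduces them; the variable names are illustrative.

```python
# Per-level percentages from Table 2 (share of respondents who judged the level human-designed).
manual = [61.59, 32.45, 70.20, 66.23, 68.87]
auto = [57.62, 40.40, 37.75, 33.11, 31.79]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


manual_avg = round(mean(manual), 2)  # 59.87
auto_avg = round(mean(auto), 2)      # 40.13
```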
From Table 2, it can be seen that the game levels generated by the algorithm in this paper are similar to the game levels designed by humans. In the ideal case, 50% of respondents would consider an automatically generated level to be human-designed, and the same would hold for the genuinely human-designed levels, meaning that respondents cannot distinguish the generated levels at all. In the worst case, 0% of respondents would consider an automatically generated level to be human-designed while the human-designed levels would be identified 100% of the time, meaning that respondents can completely distinguish the generated levels. The survey results show that 57.62% of the respondents think that generated level No. 1 is human-designed, which exceeds 50%. On average, 40.13% of respondents considered a generated level to be human-designed, which is close to 50%. This indicates that many participants failed to tell automatically generated levels from manually designed ones.
4.3 Real-Time Dynamic Level Generation
Finally, we used the controllable WFC algorithm proposed in this paper to implement a real-time game demo that dynamically generates game levels at runtime. It is a 3D top-down roguelike shooter with rich terrain that includes plateaus, brooks, and sinuous roads; the goal is to defeat the enemies and collect treasures from chests. In this game, levels are not generated before the game starts but during play. In this way, it is possible to read the player's operation data in real time, modify the constraint configuration accordingly, and thereby dynamically change the difficulty or style of subsequently generated levels depending on the player's performance. For example, when the player defeats a lot of enemies, we can increase the difficulty of subsequent levels; when the player takes a large amount
of damage from enemies, we will generate easier levels later on. The real-time level generation effect is shown in the attached video.
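The feedback loop described above, in which player performance modifies the constraint configuration used for the next level, might look like the following sketch. The paper does not give its configuration format or thresholds, so the `ConstraintConfig` fields, the `PlayerStats` fields and the threshold values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConstraintConfig:
    max_enemies: int           # hypothetical global maximum constraint on enemy tiles
    chest_enemy_max_dist: int  # hypothetical distance constraint, in units


@dataclass(frozen=True)
class PlayerStats:
    enemies_defeated: int
    damage_taken: int


def adapt(
    config: ConstraintConfig,
    stats: PlayerStats,
    kill_threshold: int = 10,
    damage_threshold: int = 50,
) -> ConstraintConfig:
    """Return the constraint configuration to use for the next generated level."""
    if stats.damage_taken >= damage_threshold:
        # The player is struggling: allow fewer enemies, but never fewer than one.
        return replace(config, max_enemies=max(1, config.max_enemies - 2))
    if stats.enemies_defeated >= kill_threshold:
        # The player is doing well: allow more enemies.
        return replace(config, max_enemies=config.max_enemies + 2)
    return config
```

Returning a new immutable configuration instead of mutating the old one keeps the level that is currently being played unaffected by the adjustment.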
5 Conclusion and Future Work
This paper addresses two main problems of the WFC algorithm: complex constraint rules are hard to write, and excessively random game levels that lack control are less playable. We proposed an automatic rule system, global constraints, multi-layer generation, and distance constraints to solve these problems and to provide game designers with a controllable game level generation tool. Experiments show that the method improves on the original WFC algorithm in terms of designer control and generation results. Moreover, the game levels automatically generated by this method achieve a high similarity to human-designed game levels. In addition, the real-time dynamic level generation experiment shows that the method can be applied to practical game generation with satisfactory results. The method is applicable only to game types whose scenes are built on a plane and is hard to apply directly to other game types such as 2D side-scrolling games. In future work, we plan to extend this method to the generation of 2D side-scrolling game levels. Furthermore, we will do more research on real-time level generation with this controllable WFC algorithm.
Acknowledgments. This work was supported by the Fundamental Research Funds for the Central Universities and the National Key R&D Program of China (2018YFB1403900).
References
1. Shaker, N., Togelius, J., Nelson, M.J.: Procedural Content Generation in Games. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-42716-4
2. Summerville, A., et al.: Procedural content generation via machine learning (PCGML). IEEE Trans. Games 10, 257–270 (2018). https://doi.org/10.1109/TG.2018.2846639
3. Johnson, L., Yannakakis, G.N., Togelius, J.: Cellular automata for real-time generation of infinite cave levels. In: Proceedings of the 2010 Workshop on Procedural Content Generation in Games - PCGames 2010, Monterey, California, pp. 1–4. ACM Press (2010). https://doi.org/10.1145/1814256.1814266
4. Frade, M., de Vega, F.F., Cotta, C.: Evolution of artificial terrains for video games based on accessibility. In: Di Chio, C., et al. (eds.) EvoApplications 2010. LNCS, vol. 6024, pp. 90–99. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12239-2_10
5. Dahlskog, S., Togelius, J., Nelson, M.J.: Linear levels through n-grams. In: Proceedings of the 18th International Academic MindTrek Conference on Media Business, Management, Content & Services - AcademicMindTrek 2014, Tampere, Finland, pp. 200–206. ACM Press (2014). https://doi.org/10.1145/2676467.2676506
6. Summerville, A., Philip, S., Mateas, M.: MCMCTS PCG 4 SMB: Monte Carlo tree search to guide platformer level generation (2015)
7. Summerville, A., Mateas, M.: Super Mario as a string: platformer level generation via LSTMs. arXiv:1603.00930 [cs] (2016)
8. Hoover, A.K., Togelius, J., Yannakakis, G.N.: Composing video game levels with music metaphors through functional scaffolding (2015)
9. Snodgrass, S., Ontañón, S.: Learning to generate video game maps using Markov models. IEEE Trans. Comput. Intell. AI Games 9, 410–422 (2017). https://doi.org/10.1109/TCIAIG.2016.2623560
10. Jain, R., Isaksen, A., Holmgård, C., Togelius, J.: Autoencoders for level generation, repair, and recognition (2016)
11. Guzdial, M., Riedl, M.: Learning to blend computer game levels. arXiv:1603.02738 [cs] (2016)
12. Summerville, A.J., Snodgrass, S., Mateas, M., Ontañón, S.: The VGLC: the video game level corpus. arXiv:1606.07487 [cs] (2016)
13. Giacomello, E., Lanzi, P.L., Loiacono, D.: Searching the latent space of a generative adversarial network to generate DOOM levels. In: 2019 IEEE Conference on Games (CoG), London, United Kingdom, pp. 1–8. IEEE (2019). https://doi.org/10.1109/CIG.2019.8848011
14. Volz, V., Schrum, J., Liu, J., Lucas, S.M., Smith, A., Risi, S.: Evolving Mario levels in the latent space of a deep convolutional generative adversarial network. arXiv:1805.00728 [cs] (2018)
15. Togelius, J., Yannakakis, G.N., Stanley, K.O., Browne, C.: Search-based procedural content generation: a taxonomy and survey. IEEE Trans. Comput. Intell. AI Games 3, 172–186 (2011). https://doi.org/10.1109/TCIAIG.2011.2148116
16. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (1995)
17. Karth, I., Smith, A.M.: WaveFunctionCollapse is constraint solving in the wild. In: Proceedings of the International Conference on the Foundations of Digital Games - FDG 2017, Hyannis, Massachusetts, pp. 1–10. ACM Press (2017). https://doi.org/10.1145/3102071.3110566
18. Green, M.C., Khalifa, A., Alsoughayer, A., Surana, D., Liapis, A., Togelius, J.: Two-step constructive approaches for dungeon generation. arXiv:1906.04660 [cs] (2019)
19. Gaisbauer, W., Raffe, W.L., Garcia, J.A., Hlavacs, H.: Procedural generation of video game cities for specific video game genres using WaveFunctionCollapse (WFC). In: Extended Abstracts of the Annual Symposium on Computer-Human Interaction in Play Companion Extended Abstracts - CHI PLAY 2019 Extended Abstracts, Barcelona, Spain, pp. 397–404. ACM Press (2019). https://doi.org/10.1145/3341215.3356255
20. Scurti, H., Verbrugge, C.: Generating paths with WFC. arXiv:1808.04317 [cs] (2018)
21. Kim, H., Lee, S., Lee, H., Hahn, T., Kang, S.: Automatic generation of game content using a graph-based wave function collapse algorithm. In: 2019 IEEE Conference on Games (CoG), London, United Kingdom, pp. 1–4. IEEE (2019). https://doi.org/10.1109/CIG.2019.8848019
22. Gumin, M.: Bitmap & tilemap generation from a single example by collapsing a wave function. https://github.com/mxgmn/WaveFunctionCollapse
23. Sandhu, A., Chen, Z., McCoy, J.: Enhancing wave function collapse with design-level constraints. In: Proceedings of the 14th International Conference on the Foundations of Digital Games - FDG 2019, San Luis Obispo, California, pp. 1–9. ACM Press (2019). https://doi.org/10.1145/3337722.3337752
VR-DLR: A Serious Game of Somatosensory Driving Applied to Limb Rehabilitation Training
Tianren Luo, Ning Cai, Zheng Li, Zhigeng Pan(B), and Qingshu Yuan(B)
Research Institute of Virtual Reality and Intelligent Systems, Hangzhou Normal University, Hangzhou, China
{zgpan,yuanqs}@hznu.edu.cn
Abstract. Hemiplegia is one of the common symptoms of stroke. Motor dysfunction of the upper limb in particular has a great impact on patients' daily life, and it is one of the difficult problems in rehabilitation treatment. However, traditional rehabilitation training is tedious and tends to discourage patients. We use virtual reality technology to design VR-DLR (VR driving for limb rehabilitation), a serious game of somatosensory driving that combines hardware and software and integrates visual, auditory and tactile sensations. We design parameterized steering wheel training actions according to the level of the patient's limb disorder and the rehabilitation physician's treatment plan. The patient's training task is to control the virtual car so that it collides with as many coins and hits as few obstacles as possible. We collect and record the patient's interactive operation data and design an evaluation model to assess the patient's score and rehabilitation effect. VR-DLR supports two modes, fixed speed and free driving, and adapts the difficulty in each mode to the patient's limb disorder. Moreover, to meet the needs of different patients, VR-DLR offers two display modes: a VR headset and a 3D annular projection screen. A user study shows that, compared with traditional training methods, VR-DLR can significantly increase patients' interest, initiative and confidence in rehabilitation training.
Keywords: Rehabilitation training · Serious game · Virtual reality · Driving simulator
1 Introduction
In recent years, stroke patients have become younger and younger. The rehabilitation of limb function after stroke has always been one of the difficult clinical problems. 85% of stroke patients have upper-limb disorders at the early stage of onset [19], and about 30%–36% of stroke patients still have upper-limb dysfunction 6 months after onset [9]. This seriously affects the patients' motor function and daily life. Although the traditional treatment methods based on manual work and physical equipment have a certain effect, the patients' movements cannot be quantified and collected as data, which hinders analysis by the rehabilitation physician. The patients' initiative also suffers from the tedious training process. Therefore, it is of great significance to find active and effective rehabilitation treatment methods that improve the limb function of stroke patients.
Virtual reality technology refers to the use of integrated technology to form a realistic 3D virtual environment. Patients use VR equipment to interact with objects in the virtual world in a natural manner, which produces an immersive experience [16]. Many studies have shown that serious games based on virtual reality technology have a positive effect on rehabilitation training [3,18]. However, most of these serious games use fixed scenes, that is, pre-built scenes and fixed tasks, so the patients' interest may decrease as the number of training days increases. Many virtual serious games do not balance the degree of limb disorder against the difficulty gradient, which may reduce the effectiveness of training. In addition, to improve the patients' confidence in rehabilitation and the fun of the game, a virtual serious game needs more interactive feedback and encouragement mechanisms.
To overcome these shortcomings, we apply VR driving technology with a rich somatosensory interaction experience to the rehabilitation training of patients, so that patients can enjoy gamified rehabilitation training driven by training tasks. The contribution of this article is the design and development of VR-DLR, a serious game of somatosensory interactive VR driving for patients with moderate and mild limb disorders (levels 2–4 in the Muscle Strength Classification [13]). We parameterize the training actions that rehabilitation physicians customize for different patients in order to automatically generate random scenes and training tasks. VR-DLR has two driving modes (free driving mode and fixed speed mode), chosen according to the degree of the patient's disorder. The adaptive difficulty of the game is controlled dynamically through the resistance of the steering wheel, the speed of the car, the number of obstacles, the time limit, etc. According to the patient's preference, VR-DLR provides two presentation methods: a VR head-mounted display (HMD) and an annular projection screen. In addition, the task completion time, operation data, score and historical data are recorded in the back-end; VR-DLR gives a score after each training session and encourages users by showing in detail how their operations have improved compared with the historical data.
N. Cai contributed equally with Tianren Luo.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 51–64, 2020. https://doi.org/10.1007/978-3-030-65736-9_4
2 Related Work
2.1 Rehabilitation Training Based on Traditional Methods
In the traditional rehabilitation training method, the rehabilitation physician mainly performs stretching and other training on the patients’ limbs in a freehand manner. Generally, this rehabilitation training method is operated by the
rehabilitation physician and the patient one-on-one. Because the training depends mainly on the operation of rehabilitation physicians or on physical training equipment, the training intensity and efficiency of traditional methods are not stable, and the training effect is difficult to guarantee. The lack of data collection and feedback on the training process of the patients makes physicians difficult to [11]. Traditional rehabilitation training methods also suffer from limited training venues, a lack of professional rehabilitation physicians, expensive treatment and monotonous training procedures. Many patients do not obtain obvious rehabilitation effects, and losing confidence and interest in rehabilitation places a heavy psychological burden on them [14].
2.2 Rehabilitation Training Based on Robot
The PERCRO laboratory in Italy designed a virtual rehabilitation training robot for the right limb of stroke patients [6]. The external equipment of the system is a five-DOF upper-limb exoskeleton robotic arm, for which they also developed a virtual rehabilitation training system [5]. The University of California and the Chicago Rehabilitation Research Center jointly developed a multi-joint robotic training system called Spring [12], aimed specifically at patients regaining active mobility in the early stages of rehabilitation. Although robots can reduce the manual burden and achieve better training results than traditional rehabilitation training, the robotic method cannot effectively increase the enthusiasm and interest of patients.
2.3 Rehabilitation Training Based on Virtual Serious Games
In recent years, with the development of CG and VR technology, rehabilitation training methods based on virtual serious games have become popular. Laffont et al. [10] studied video games for the treatment of stroke and found that using video games in the first month after a stroke is more effective than traditional methods. Ines et al. [2] studied the impact of serious games on elderly people participating in rehabilitation programs. Their results show that game-based rehabilitation is very helpful for improving the balance of the elderly. João et al. [1] studied the impact of serious games on rehabilitation physicians' actual work. The results show that rehabilitation physicians play an important role in making serious games work properly. Gamito et al. [7] developed a serious VR-based game application for cognitive training. The results show that the patients' attention and memory functions can be significantly improved. Ogun et al. [15] found that immersive VR programs in rehabilitation have a positive effect on the upper-limb function and daily activities of stroke patients. Santos et al. [17] designed medical rehabilitation experiments to explore the effect of a VR rehabilitation program in SCAs patients. Keshner et al. [8] provided decision-making guidance for VR technology systems suitable for clinical intervention therapy. Feng Hao et al. [4] designed a VR system
to help patients with Parkinson's disease recover gait and balance, and it achieved good results. However, most of these serious games have only a single interactive scene, lack scientific and fun training tasks, and lack different game modes and difficulty levels for different degrees of limb disorder.
3 Design and Implementation
Fig. 1. Structure of VR-DLR.
3.1 Structure Design
The structure design of VR-DLR is shown in Fig. 1. In terms of hardware, it consists of a pedal (mode 2 only) and a steering wheel whose resistance can be adjusted according to instructions. VR-DLR is divided into a fixed speed mode (mode 1) and a free driving mode (mode 2). For the front-end display, patients are free to choose, according to their preference, between wearing a VR HMD and wearing 3D glasses to view the annular projection screen. During rehabilitation training, we record key operation data of the patients in a database and establish an evaluation model.
3.2 Function Design
Before a patient undergoes rehabilitation training, the rehabilitation physician determines the patient's rehabilitation stage based on professional knowledge. Based on the training actions and training intensity that the rehabilitation physician designs for the patient, we design training tasks with different difficulty gradients, so that rehabilitation training can be carried out more scientifically. The virtual environment of VR-DLR is configured according to the treatment plan, which is described by parameterized training actions. The functions of VR-DLR are divided into a training task module, a training information collection module and a rehabilitation effect evaluation module. Figure 2 shows the relationship between the modules.
Fig. 2. Function of VR-DLR.
Fig. 3. Module of training tasks. (a) A schematic top view of a set of continuous training actions. (b) Completing training actions guided by colliding with coins and avoiding roadblocks. (c) Arriving at the destination of the designated task within the time limit.
Module of Training Task. Rehabilitation physicians design a set of training actions suitable for the patient. We parameterize the training actions to automatically and randomly generate virtual coins (gold, silver and copper) and roadblocks oriented to those actions. By controlling the virtual car, the patient completes the task of reaching the destination and collides with as many coins as possible along the way to complete the required amount of training. Figure 3(a) shows a schematic diagram of continuous training movements (left → right → left → right → left) obtained by parameterizing some actions in the training task. Guided by gold coins and roadblocks, the patient needs to complete a series of steering wheel movements to avoid roadblocks and collide with coins. Collecting more coins, hitting fewer roadblocks and reaching the destination within the time limit yield a higher score. Figure 3(b) and (c) show screenshots of the training task module. In this module, patients can experience immersive driving. To further improve the interactive experience, the steering wheel dynamically provides feedback, such as vibration on collision with a roadblock and increased resistance at higher vehicle speed. In addition, the difficulty of the automatically initialized scene and tasks differs for patients with different levels of limb disorder. The difficulty is reflected in the resistance of the steering wheel, the speed of the car in fixed speed mode (mode 1), the maximum speed (mode 2), the number of roadblocks, the complexity of the parameterized training actions, the distance between the destination and the car's initial position, etc.
Module of Information Collection. During rehabilitation training, VR-DLR collects the patient's operation signals from the rotation angle measured by the steering wheel sensor and the displacement measured by the throttle and brake sensors on the pedal. It then processes the collected information and maps it to the steering wheel, throttle and brake of the virtual car. The patient's key operation data (shown in the left column of Table 1) are also used by the evaluation model of VR-DLR and help rehabilitation physicians evaluate the patient's rehabilitation.
Evaluation Module of Rehabilitation Training Effect. In rehabilitation assessment, joint mobility, agility of movement and amount of joint movement can be used to assess a patient's joint rehabilitation. During training, the greater the rotation angle of a motion, the greater the range of the related joint motion; the faster the motion, the more agile the patient's movements; and the greater the number of training actions, the greater the patient's amount of movement. Therefore, the action requirements of the training program can be described by the rotation angle, speed and number of actions, which are the influencing factors of the training actions. We combine rehabilitation medical theory, clinical practice and the evaluation standards for functional movement of the shoulder and elbow to construct a reasonable index of rehabilitation
evaluation. In addition, we use the analytic hierarchy process to calculate the weight of each index in the evaluation of the rehabilitation effect, and we fix the evaluation model once the rehabilitation physicians agree with it. Using the evaluation model and the specific operations performed in the game, we quantify the score assessment as shown in Table 1. There, ωt denotes a weight between 0 and 1 that quantifies the time taken to complete a turn of the steering wheel: the greater the amplitude of the turn and the shorter the time, the higher the weight. In addition, patients who complete consecutive training actions (such as turning the steering wheel to the right to avoid an obstacle and then to the left to collide with coins on the left side of the road) receive an additional score. The patient's rehabilitation effect is obtained by calculating the score of the evaluation model and comparing it with historical data. The evaluation details are presented to patients in the form of an electronic report. In traditional rehabilitation, progress is hard to perceive in the short term; the evaluation feedback of VR-DLR lets patients see their progress in the details (Table 1), which improves their courage and confidence in rehabilitation.
Table 1. Quantified evaluation.
Key operating data                                            Score
Turn the steering wheel sharply (more than 120° at a time)    50 × ωt1
Turn the steering wheel moderately (60 to 120°)               30 × ωt2
Turn the steering wheel slightly (less than 60°)              20 × ωt3
Hit the roadblocks                                            −20
Get gold/silver/copper coins                                  15/10/5
Arrival at the destination within the limited time            200
Complete consecutive training actions 2/3/4 times             20/30/40
3.3 Game Modes and Display Methods
Mode 1: Fixed Speed. VR-DLR has two modes. Mode 1 is the fixed speed mode, in which the throttle and brake are not used. The virtual car is pulled forward automatically at a fixed speed, so patients can concentrate on controlling the steering wheel. This mode is mainly used for patients with upper-limb disorders rated 2–3 (moderate) in the Muscle Strength Rating Table [13]. We set virtual routes on every city road. By default, the car moves forward at a fixed low speed (20 km/h), and on a straight road the patient can change to the left or right lane by turning the steering wheel. At an intersection, the patient turns left or right by turning the steering wheel. If the patient does not turn the steering wheel at the intersection, or if the steering angle is too small, the car goes straight by default. At an L-shaped junction, regardless of whether the patient turns the steering wheel, the car
will automatically turn in the direction of the junction to avoid leaving the lane. There is also an anti-deviation system: even if the steering angle is wrong, VR-DLR corrects the direction of the car by automatic interpolation so that it does not deviate from the route.
Mode 2: Free Driving. Mode 2 is the free driving mode. We simulate real driving, and patients can drive the virtual car flexibly by coordinating hands and feet. In this mode, training of the upper limbs is the main training task of VR-DLR, with ankle joint exercise as auxiliary training. This mode is mainly used for patients with limb disorders rated 4 (mild) in the Muscle Strength Rating Table [13]. To give patients a more realistic driving experience and to adjust the driving difficulty dynamically, we simulate some physical properties of the virtual car after those of a real car, including throttle-controlled acceleration, the speed of the tires and the friction between the tires and the ground. The detailed parameter settings for these physical properties are shown in Table 2. In this mode, the user accelerates and decelerates by controlling the throttle. When the car is moving, stepping on the brake slows it down. When the car is stationary, holding the brake puts it in reverse. In addition, this mode has no automatic anti-deviation protection. If the user hits an obstacle outside the street area (such as trees, telephone poles, bus stops or buildings), VR-DLR deducts points and the user needs to hold the brake to reverse.
Table 2. Physics-based simulation.
Physical properties          Parameter settings
Vehicle weight               1.5 t
Torque                       800 N
Down force                   100 N
Maximum speed                50 km/h
Maximum steering angle       30°
Damping rate of wheel        0.25
Minimum forward friction     0.4
Maximum forward friction     1
Minimum lateral friction     0.2
Maximum lateral friction     1
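The Mode 2 pedal behavior and the parameters of Table 2 can be collected into a small configuration sketch. It is written in Python for illustration and does not reproduce the game's Unity code; the class and function names are hypothetical, and the ±450° wheel range is an assumed hardware setting rather than a value given in the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VehicleParams:
    """Physical parameters taken from Table 2 (units as printed there)."""
    weight_t: float = 1.5
    torque: float = 800.0
    down_force_n: float = 100.0
    max_speed_kmh: float = 50.0
    max_steering_deg: float = 30.0
    wheel_damping_rate: float = 0.25
    forward_friction: Tuple[float, float] = (0.4, 1.0)  # (minimum, maximum)
    lateral_friction: Tuple[float, float] = (0.2, 1.0)  # (minimum, maximum)


def pedal_action(speed_kmh: float, throttle: float, brake: float, eps: float = 0.1) -> str:
    """Decide the longitudinal action in Mode 2 from the pedal inputs."""
    if brake > 0:
        # Braking slows a moving car; holding the brake at standstill reverses it.
        return "brake" if speed_kmh > eps else "reverse"
    if throttle > 0:
        return "accelerate"
    return "coast"


def map_steering(
    params: VehicleParams,
    wheel_angle_deg: float,
    wheel_half_range_deg: float = 450.0,  # assumed physical range of the wheel
) -> float:
    """Linearly map the physical wheel rotation to the car's steering angle."""
    ratio = max(-1.0, min(1.0, wheel_angle_deg / wheel_half_range_deg))
    return ratio * params.max_steering_deg
```

The linear mapping is the simplest choice; a real controller might use a nonlinear curve so that small wheel movements produce finer steering corrections.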
Display Methods. In most circumstances, the patient wears a VR HMD for an immersive driving experience, as shown in Fig. 4(a). However, some patients may feel dizzy when wearing a VR HMD, which affects the rehabilitation effect. Therefore, we also offer a method of combining 3D glasses
and an annular projection screen to conduct the rehabilitation training, as shown in Fig. 4(b).
Fig. 4. VR-DLR display mode. (a) VR. (b) Annular projection screen.
3.4 Development Tools
We use a Logitech G29 Driving Force steering wheel and pedals. In terms of software, we use C# and HLSL/Cg to develop the driving logic on the Unity 3D engine and store key operating data in a MySQL database.
4 User Study and Analysis
4.1 Subject of Research
We selected patients from a rehabilitation training center as the subjects of the user study. With the help of rehabilitation physicians, we recruited 16 volunteer patients rated 2+, 3−, 3+, 4− and 4+ on the Muscle Strength Rating Table [13]. There are 10 males and 6 females, and the patients' ages range from 26 to 67. Based on age, muscle strength rating and gender, the patients were divided into group A (training with VR-DLR) and group B (training in the traditional way), with 8 patients in each group. The groups were composed so that they do not differ with respect to these criteria. The average age of patients is 49.5 in group A and 48.625 in group B. Statistical analysis shows no significant difference in age or muscle strength rating between the two groups (P > 0.05).
4.2 Method of Research
Design of Daily Training. Our research on the two groups of patients was carried out by experienced rehabilitation physicians and the patients' families. The study lasted 10 days. Patients in both groups were required to do traditional limb rehabilitation training for 30 min a day. The training includes scapular loosening training of the affected upper limb, active-assisted and active training of the affected upper limb, joint movement training of the affected upper limb, object retrieval training of the affected upper limb, and grasping and opening training of the affected side. The training focuses on the affected side and includes a small amount of training on the healthy side. The training intensity of the affected side is basically the same for all patients. Then, depending on their own situation and wishes, patients carry out the additional training program of their group every day with no time limit. That is, in addition to the 30 min of traditional limb rehabilitation training per day, patients in group B can extend the same rehabilitation training program as they wish. Patients in group A, after receiving the traditional rehabilitation treatment, can train with VR-DLR with the assistance of a rehabilitation physician. Individual patients who are not suited to immersive VR devices can instead wear 3D glasses and train in front of the annular screen. In group A, 2 patients experienced dizziness in the VR HMD mode on the first day and used the annular screen on the following days. We recorded the two groups' feedback on the rehabilitation experience and obtained their rehabilitation treatment data from the rehabilitation physicians, in particular the time that patients spent every day on voluntary training in addition to the fixed traditional training.
Design of Questionnaire. After 10 days of group training, the patients were asked to complete a questionnaire with the assistance of rehabilitation physicians. The questionnaire captures the patients' cognitive feedback on and attitude to rehabilitation training in 4 dimensions. Each question is answered with a score from 0 to 5. The subjective questions are shown in Table 3.
4.3 Analysis of User Data
Statistics of Autonomous Training Time. We record the length of time that the patients in the two groups spend on autonomous training every day (excluding the mandatory traditional training) and use the t-test to determine the difference. Over the 10-day experiment, day 3 is the only day on which there is no significant difference in autonomous training time between the two groups (p > 0.05). On the remaining days, the autonomous training time of group A is significantly higher than that of group B (average of group A > average of group B, p < 0.05). Figure 5 shows the average duration and standard deviation of autonomous training for each group of patients per day.
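The per-day comparison can be illustrated with a minimal sketch. The text does not state which variant of the t-test was used, so the sketch assumes an independent two-sample Student t-test with pooled variance; the training times are invented for illustration and are not data from the study, and the function name `student_t` is hypothetical.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical minutes of autonomous training on one day (NOT data from the study).
group_a = [42, 38, 45, 50, 36, 41, 47, 39]
group_b = [25, 30, 22, 28, 35, 27, 24, 31]

# Two-tailed critical value of the t distribution for alpha = 0.05 and df = 14.
T_CRIT_DF14 = 2.145

t_stat, df = student_t(group_a, group_b)
significant = abs(t_stat) > T_CRIT_DF14  # True means p < 0.05 under these assumptions
```

With 8 patients per group the test has 14 degrees of freedom, so a day counts as significantly different at the 0.05 level when the absolute t statistic exceeds 2.145.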
Table 3. Questionnaire of rehabilitation training.
Question 1: Degree of psychological stress in rehabilitation training. (The more relaxed you feel, the higher the score.)
Question 2: Degree of physical stress in rehabilitation training. (The more relaxed you feel, the higher the score.)
Question 3: Sense of achievement in rehabilitation training. (The higher the sense of achievement, the higher the score.)
Question 4: Degree of interest in the rehabilitation training. (The higher the level of interest, the higher the score.)
Fig. 5. The time of daily training for two groups of patients.
Statistics of Questionnaire. At the end of the 10-day experiment, we collect the patients’ questionnaire responses and analyze them statistically. Figure 6 shows the scores given by the two groups of patients. We use the t-test to analyze the significance of the difference between the two sets of data. The results show that on questions 1, 3 and 4, the scores of group A are significantly higher than those of group B (average of group A > average of group B, p < 0.05). There is no significant difference between the two sets of data on question 2 (p > 0.05).
Fig. 6. Result of questionnaire.
Analysis of Differences. Figure 5 shows that as the number of exercise days increases, the average autonomous training time of both groups A and B shows a decreasing trend. However, the training time of group A declines more slowly than that of group B, which indicates that VR-DLR is more attractive to patients than the traditional way of training. Figure 6 shows that patients using VR-DLR have less psychological pressure, a higher sense of achievement and more interest than patients in traditional rehabilitation training. We have analyzed the causes of these differences based on the patients’ daily feedback and the evaluation of the rehabilitation physicians. In traditional rehabilitation training, manual stretching and physical training are more likely to cause pain than the use of VR-DLR, which makes patients more likely to develop psychological resistance. When using VR-DLR, the patients completely control the movements of their joints and muscles (in traditional training, many actions are controlled by rehabilitation physicians and mechanical equipment), and their attention is focused on the game, which distracts them from the pain and makes them more willing to accept this treatment. In addition, compared with group A, the patients in group B generally lack the confidence to recover and the courage to carry out the rehabilitation training (questions 1 and 3). This is because these patients are treated monotonously for a long time and cannot make progress in a short time. Group A receives triple encouragement: from the rehabilitation physicians, from the process of the game and from the evaluation models of the game. VR-DLR also compares the current results with historical data and reports progress in more detail (such as the key operating data in Table 1). All of this encourages the patients and increases their confidence in their own recovery. On question 2 (physical stress) of the questionnaire, there is no significant difference (P > 0.05) in the scores given by the two groups of patients. However, some patients say that the physical weight of the VR HMD causes a certain amount of pressure. Because of the immersion of VR, this pressure does not significantly affect them. The use of an annular projection screen produces less pressure than a VR HMD, but it reduces the immersion of virtual driving.
5 Conclusion
We have developed a somatosensory VR driving serious game called VR-DLR for the treatment of limb disorders, in order to overcome some shortcomings of traditional rehabilitation training and serious games. The results of the experiment show that, compared with traditional training, VR-DLR is more attractive to patients and gives them a better experience, including less psychological pressure, a higher sense of achievement and more fun. This is also confirmed by the daily autonomous training time.
6 Future Work
This study aims to explore the attractiveness, experience and acceptance of VR-DLR to patients. In the future, we will continue to improve VR-DLR and conduct longer user studies to study the impact of VR-DLR on rehabilitation effects.
References
1. Almeida, J., Nunes, F.: The practical work of ensuring the effective use of serious games in a rehabilitation clinic: qualitative study. JMIR Rehabil. Assistive Technol. 7(1), 154–162 (2020)
2. Ayed, I., Ghazel, A., Jaume-i Capo, A., Moya-Alcover, G., Varona, J., Martinez-Bueso, P.: Feasibility of kinect-based games for balance rehabilitation: a case study. J. Healthc. Eng. 2(4), 325–338 (2018)
3. Cuthbert, J.P., Staniszewski, K., Hays, K., Gerber, D., Natale, A., O’dell, D.: Virtual reality-based therapy for the treatment of balance deficits in patients receiving inpatient rehabilitation for traumatic brain injury. Brain Injury 28(2), 181–188 (2014)
4. Feng, H., et al.: Virtual reality rehabilitation versus conventional physical therapy for improving balance and gait in Parkinson’s disease patients: a randomized controlled trial. Med. Sci. Monit. Int. Med. J. Exp. Clin. Res. 25, 4186–4199 (2019)
5. Frisoli, A., Bergamasco, M., Carboncini, M.C., Rossi, B.: Robotic assisted rehabilitation in virtual reality with the L-EXOS. Stud. Health Technol. Inform. 145, 40–54 (2009)
6. Frisoli, A., Salsedo, F., Bergamasco, M., Rossi, B., Carboncini, M.C.: A force-feedback exoskeleton for upper-limb rehabilitation in virtual reality. Appl. Bion. Biomech. 6(2), 115–126 (2009)
7. Gamito, P., et al.: Cognitive training on stroke patients via virtual reality-based serious games. Disabil. Rehabil. 39(4), 385–388 (2017)
8. Keshner, E.A., Fung, J.: The quest to apply VR technology to rehabilitation: tribulations and treasures. J. Vestib. Res. 27(1), 1–5 (2017)
9. Kwakkel, G., Kollen, B.J., van der Grond, J., Prevo, A.J.: Probability of regaining dexterity in the flaccid upper limb: impact of severity of paresis and time since onset in acute stroke. Stroke 34(9), 2181–2186 (2003)
10. Laffont, I., et al.: Rehabilitation of the upper arm early after stroke: video games versus conventional rehabilitation. A randomized controlled trial. Ann. Phys. Rehabil. Med. 9(7), 102–111 (2019)
11. Lee, S.J., Chun, M.H.: Combination transcranial direct current stimulation and virtual reality therapy for upper extremity training in patients with subacute stroke. Arch. Phys. Med. Rehabil. 95(3), 431–438 (2014)
12. Li, K., et al.: The effect of sensorimotor training performed by carers on home-based rehabilitation in stroke patients. Physiotherapy 101, 866–867 (2015)
13. Lovett, R.W., Martin, E.G.: Certain aspects of infantile paralysis: with a description of a method of muscle testing. J. Am. Med. Assoc. 66(10), 729–733 (1916)
14. Schubert, M., Drachen, A., Mahlmann, T.: Esports analytics through encounter detection. In: Proceedings of the MIT Sloan Sports Analytics Conference, vol. 1, p. 2016 (2016)
15. ÖGÜN, M.N., Kurul, R., YAŞAR, M.F., Turkoglu, S.A., AVCI, Ş., Yildiz, N.: Effect of leap motion-based 3D immersive virtual reality usage on upper extremity function in ischemic stroke patients. Arquivos de neuro-psiquiatria 77(10), 681–688 (2019)
16. Rose, F., Attree, E., Johnson, D.: Virtual reality: an assistive technology in neurological rehabilitation. Curr. Opin. Neurol. 9(6), 461–467 (1996)
17. Santos, G., et al.: Feasibility of virtual reality-based balance rehabilitation in adults with spinocerebellar ataxia: a prospective observational study. Hearing Balance Commun. 15(4), 244–251 (2017)
18. Saposnik, G., Levin, M., Stroke Outcome Research Canada (SORCan) Working Group: Virtual reality in stroke rehabilitation: a meta-analysis and implications for clinicians. Stroke 42(5), 1380–1386 (2011)
19. Saposnik, G., et al.: Effectiveness of virtual reality using Wii gaming technology in stroke rehabilitation: a pilot randomized clinical trial and proof of principle. Stroke 41(7), 1477–1484 (2010)
A Procedurally Generated World for a Zombie Survival Game
Nikola Stankic, Bernhard Potuzak, and Helmut Hlavacs
Entertainment Computing, University of Vienna, Vienna, Austria
[email protected]
Abstract. We present a method for procedurally creating game worlds, in our case levels for playing zombie survival games. The aim is to randomly create new levels that present new challenges to players, are fun, and “work” as game levels, i.e. look like levels that have been hand crafted. We create the topology, paths to follow, random houses, and hordes of zombies to fight against. Players have to reach an end of the level, fight against zombies, and reach the final objective. The paper describes our approach, and presents an overall evaluation of players of the game.
Keywords: Procedural level generation · Zombie · Shooter
1 Introduction
Procedural Content Generation (PCG) in video games has been around for a long time, dating back to the 1980s [1]. One reason for this was hardware restrictions, e.g. limited disc space, which some games overcame by turning to PCG. A good example of this is the game .kkrieger, which is only 96 KB in size [7]. Among the first games to have used procedural content generation are (i) Rogue (1980) [14], which is said to have started the trend of using PCG in video games also for the purpose of replayability [12], and (ii) Elite (1984), which was able to generate hundreds of unique star systems without having to explicitly store the data [1,19].
Nowadays disc space and game size are no longer a problem, but resources like budget, time and team size are. For example, the popular video game World of Warcraft was made over the course of five years, with an estimated budget between $20,000,000 and $150,000,000 [11]. These costs could be greatly reduced by generating at least some of the content, if not all, procedurally. But even when these resources are not a problem, many games today use procedural content generation, or at least some elements of it, in order to achieve a certain level of replayability [18].
Exactly that was the goal of this research: to produce a playable first-person shooter game utilizing procedural content generation, based on pseudo-random number generators, in order to create a small survival world from scratch and achieve at least some level of replayability. Pseudo-random Number Generators (PRNG) are one of the simplest and most common techniques used in procedural content generation [11], with other common techniques being grammars, L-systems, fractals and graphs [1,4]. The game was made in Unity 3D, using assets from the Unity Asset Store. This paper describes our approach, as well as an evaluation of the result with several experimental participants.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 65–76, 2020. https://doi.org/10.1007/978-3-030-65736-9_5
2 Related Work
Almost all types of content for a video game can be procedurally generated: terrains and continents, rivers and roads [4], buildings, characters and vegetation, and even whole virtual cities [4,8,11], worlds and universes. An endless variety of non-playable characters and planets can be created, as in the game Spore [15], or billions of unique weapons, as in Borderlands [9]. Nowadays a large number of games use procedural content generation in one way or another; this section mentions some of the more popular survival-based titles. Popular examples are Minecraft [16] and No Man’s Sky [10]. Minecraft uses procedural content generation to create an endless survival world with different thematic areas called biomes, and No Man’s Sky procedurally generates a large number of planets and populates them with procedurally created creatures and vegetation. Both of these games, however, have endless worlds and a strong focus on exploration, whereas we wanted our game to have a smaller map on which the player can complete the objectives in a short time.
Another game conceptually similar to ours is Left 4 Dead, a zombie survival first-person shooter. The game uses a system called the AI Director, which utilizes procedural content generation to place zombies, consumable objects and weapons around the map based on player performance [6]. It is also able to procedurally change the layout of the level by blocking paths, forcing the players to take different routes in order to increase replayability [5,6]. This approach offers great elements of surprise; however, it still requires a hand-made, predefined map, whereas our project is supposed to generate the whole level randomly.
The most conceptually similar tool is an engine called British Countryside Generator, used by a survival game called “Sir, you are being hunted” [13]. The engine uses the Voronoi diagram space partitioning method to create villages, then places buildings next to a previously generated road [2]. The goal of the engine is to create an open world level which resembles the British countryside. We wanted to implement a more random approach in our project, with more randomly placed buildings and paths.
In our case, the goal was to make a first-person shooter survival game in Unity 3D. The game was supposed to have a map created completely procedurally from scratch, using pseudo-random number generators (PRNG). The player controls and moves a first-person character. They have two weapons to choose from: a faster-shooting but weaker pistol, and a stronger but slower shotgun, which has a smaller magazine and takes longer to reload.
By going into buildings, the player is able to find consumables, like med-kits to heal if they have been hurt, or ammo boxes to reload their weapons. The objectives of the game are the following: the player has to find the book Necronomicon and pick it up; then they have to destroy four pillars which are scattered around the map; and finally, after the previous objectives have been completed, a portal opens, with which the player has to interact in order to destroy the Necronomicon. The player wins and the game is over after completing this final step.
3 Implementation
The first steps are creating a new Terrain Game Object, adding it to the game scene, and attaching a new script to it, which was named “World Generator”. The default Terrain Game Object was used for this. All other Terrain settings were left at their defaults, as anything that had to be changed would be changed at runtime by the script. Initially, a hexagonal grid was also added to the terrain object, in order to use the grid’s cells as points for path and area generation, but we later decided against it in order to allow more placement options.
3.1 Path Generation
The algorithm for generating the path is fairly simple. All it needs are two points, a start point and an end point. The start point is chosen randomly on one side of the map using a PRNG, and the end point is then calculated from the start point so that it lies on the opposite side of the map. Starting from the start point, the algorithm calculates the normalized direction towards the end point and generates a new path point a fixed distance away in this direction, with a possible angular offset within a range of 60°, i.e. up to 30° in both the positive and negative directions, which is also chosen by a PRNG. This process is repeated iteratively and ends when the algorithm comes very close to the end point. All path points are represented as spheres and are saved in a global container. Figure 1 shows one of the possible outcomes of the path generation algorithm. As said before, a hexagonal grid was used at first, with each grid cell representing one path point, but the algorithm was later changed in order to achieve a more “random” path pattern. Now that a path is ready, the algorithm can move on to the other steps, starting with generating different areas around the path.
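The path walk can be sketched as follows. This is a minimal two-dimensional illustration in Python rather than the Unity script used for the game. The function name `generate_path`, the step length, the mirrored end point and the choice of the west edge as start side are assumptions made for the sketch, as is recomputing the direction towards the end point at every step.

```python
import math
import random


def generate_path(start, end, step=5.0, max_offset_deg=30.0, rng=None, max_points=10_000):
    """Walk from start towards end in fixed steps with a random angular offset."""
    rng = rng or random.Random()
    points = [start]
    x, y = start
    # Stop once the walk is within one step of the end point (max_points is a fail-safe).
    while math.dist((x, y), end) > step and len(points) < max_points:
        heading = math.atan2(end[1] - y, end[0] - x)  # direction towards the end point
        heading += math.radians(rng.uniform(-max_offset_deg, max_offset_deg))
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    points.append(end)
    return points


rng = random.Random(42)                  # fixed seed only to make the sketch reproducible
size = 100.0
start = (0.0, rng.uniform(0.0, size))    # assumed: random point on the west edge
end = (size, size - start[1])            # assumed: mirrored point on the east edge
path = generate_path(start, end, rng=rng)
```

Because the offset never exceeds 30° and a step is only taken while the end point is more than one step away, every step reduces the remaining distance, so the walk always terminates.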
3.2 Area Generation
Fig. 1. Unity scene view of a random path possibility generated with the algorithm
Fig. 2. A close-up look of a terrain alpha map offset between ‘grass’ and ‘dirt’
This part of the algorithm places different areas around the path built in the previous step. Each area is indicated by a sphere and is defined by a type, for example “Abandoned village” or “Trees”, which tells the algorithm what kind of objects are supposed to be found in that area, and a size, which indicates the radius of the area. An adaptation of an existing space filling algorithm [3] was used. That algorithm procedurally creates two-dimensional geometry and textures by randomly placing shapes of decreasing size onto a two-dimensional plane. Our algorithm iteratively tries to place areas of decreasing radius onto random points, once again chosen by a PRNG. The key constraint is that the areas overlap neither with each other nor with any of the path points. The areas are placed in order of decreasing size to make sure that smaller areas do not use up all the space and make it impossible for larger areas to be placed. The placement loop for the current size is repeated until either a maximum number of iterations or a maximum number of placed areas of that size has been reached. Both parameters are predefined, with the former being a fail-safe against an endless loop. After each area has been placed, its center is connected to the nearest existing path point, using the same algorithm as described previously. After all areas have been placed, the remaining space, not already occupied by an area, is filled with trees.
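The placement loop described above can be sketched as rejection sampling with circles. This is an illustrative Python sketch under stated assumptions, not the game's Unity code: the names `place_areas` and `nearest_path_point`, the path radius and all numeric parameters are hypothetical.

```python
import math
import random


def place_areas(map_size, path_points, path_radius, radii, max_per_size, max_tries, rng):
    """Place non-overlapping circular areas, largest radius first."""
    areas = []  # list of (x, y, radius)
    for radius in sorted(radii, reverse=True):
        placed = tries = 0
        # max_tries is the fail-safe against an endless loop when space runs out.
        while placed < max_per_size and tries < max_tries:
            tries += 1
            x = rng.uniform(radius, map_size - radius)
            y = rng.uniform(radius, map_size - radius)
            hits_area = any(math.dist((x, y), (ax, ay)) < radius + ar for ax, ay, ar in areas)
            hits_path = any(math.dist((x, y), p) < radius + path_radius for p in path_points)
            if not (hits_area or hits_path):
                areas.append((x, y, radius))
                placed += 1
    return areas


def nearest_path_point(center, path_points):
    """Path point to which a newly placed area (or building) gets connected."""
    return min(path_points, key=lambda p: math.dist(center, p))


rng = random.Random(3)
path_points = [(float(i), float(i)) for i in range(0, 101, 5)]  # stand-in diagonal path
areas = place_areas(100.0, path_points, 2.0, [15.0, 10.0, 5.0], 3, 200, rng)
links = [nearest_path_point((x, y), path_points) for x, y, _ in areas]
```

Connecting an area center to its nearest path point would then reuse the path walk from the previous section, with the area center as start and the returned path point as end.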
Fig. 3. Unity scene view of two possible area placements
Two possible outcomes of area generation can be seen in Fig. 3, with the areas and path points denoted as spheres. Just as for path generation, a hexagonal grid was initially used here as well, but it was later replaced for the same reasons. Now that the areas are ready, the algorithm can start populating them with buildings and objects.
3.3 Object Generation
Object generation is done in a very similar fashion to area generation. For each area generated in the previous step, objects are generated within it using the same random space filling algorithm. Objects, just like areas, are defined by a type, like “building”, “tree” or “prop”, and a size. The type of the object generated depends on the type of the area with which the object is associated. Buildings are placed first, and “trees” and “props” can fill the remaining space. There are currently three thematically different building types that can be generated. The maximum number of objects generated in each area depends on the size of that area. If the type of the area is “village” and the area is large enough to have two or more buildings in it, then each of the generated buildings is also connected to the nearest path point. In this step, however, only the information about what kind of object is to be spawned, and where, is saved. The actual objects are not yet instantiated because there is no information about terrain heights yet.
3.4 Terrain Alphamaps and Heightmaps
After all the path points have been generated, the algorithm is ready to generate alphamaps for the terrain. The terrain uses two textures: a “dirt” texture for the path and a “grass” texture for everything else. Alphamaps in Unity are a three-dimensional float array, with the first two dimensions representing the x and y positions on the terrain, and the third dimension selecting the splatmap texture to be applied (in our case, the “dirt” and “grass” textures). The alphamaps are generated by going through all the terrain points: if a terrain point intersects with a path point sphere, the texture for that terrain point is set to “dirt”, otherwise it is set to “grass”. A randomly generated offset is used when calculating the intersection, in order to achieve a more realistic dirt-grass pattern (Fig. 2). After that, the height of each terrain point is set using Perlin noise, a common pseudo-random noise technique. Heightmaps in Unity are a two-dimensional float array whose dimensions represent the x and y positions on the terrain, and each entry is a float value between 0 and 1 denoting the terrain height at that point.
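Both maps can be sketched with plain nested lists. The following is an illustrative Python sketch, not the Unity implementation: the layer order, the jitter range, the noise scale and the function names are assumptions, the index order follows the description above, and the gradient noise is a compact Perlin-style implementation written for this sketch rather than the engine's built-in noise function.

```python
import math
import random

DIRT, GRASS = 0, 1  # texture layer indices (assumed order)


def build_alphamap(size, path_points, path_radius, jitter, rng):
    """alpha[x][y][layer] is 1.0 for the texture painted at terrain point (x, y)."""
    alpha = [[[0.0, 0.0] for _ in range(size)] for _ in range(size)]
    for x in range(size):
        for y in range(size):
            reach = path_radius + rng.uniform(-jitter, jitter)  # ragged dirt/grass border
            on_path = any(math.dist((x, y), p) <= reach for p in path_points)
            alpha[x][y][DIRT if on_path else GRASS] = 1.0
    return alpha


def make_perlin(rng):
    """Return a 2D gradient-noise function built on a shuffled permutation table."""
    perm = list(range(256))
    rng.shuffle(perm)
    perm += perm

    def grad(ix, iy):
        angle = perm[perm[ix & 255] + (iy & 255)] / 256 * 2 * math.pi
        return math.cos(angle), math.sin(angle)

    def fade(t):
        return t * t * t * (t * (t * 6 - 15) + 10)

    def noise(x, y):
        x0, y0 = math.floor(x), math.floor(y)
        fx, fy = x - x0, y - y0

        def corner(cx, cy):
            # Dot product of the lattice gradient with the offset from that corner.
            gx, gy = grad(x0 + cx, y0 + cy)
            return gx * (fx - cx) + gy * (fy - cy)

        u, v = fade(fx), fade(fy)
        top = corner(0, 0) + u * (corner(1, 0) - corner(0, 0))
        bottom = corner(0, 1) + u * (corner(1, 1) - corner(0, 1))
        return top + v * (bottom - top)

    return noise


def build_heightmap(size, scale, rng):
    """heights[x][y] is a float in [0, 1] derived from gradient noise."""
    noise = make_perlin(rng)
    return [[min(1.0, max(0.0, 0.5 + 0.5 * noise(x * scale, y * scale)))
             for y in range(size)] for x in range(size)]


rng = random.Random(1)
alpha = build_alphamap(16, [(8, 8)], 3.0, 0.5, rng)
heights = build_heightmap(16, 0.1, rng)
```

A smaller `scale` samples the noise more densely and gives gentler hills; the clamp only guards the stated 0 to 1 range, since this noise stays well inside it.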
3.5 Object Instantiation
Now that the information about the height of each terrain point is available, objects are ready to be instantiated. As mentioned before, there are currently three different types of objects: building, tree and prop. Trees and props are placed in the same way: a random game object of that type is selected and instantiated at the position defined in step three, its height (the y-position) is set to the corresponding terrain height, and it is rotated by a random amount around the y-axis. The same goes for buildings, except that they are not rotated by a random amount but so that they face the area center. We were limited by the building assets we could find in the Unity Asset Store, so at first we only had two different types of buildings. We realized that it would be too monotonous to see the same buildings over and over again, so we decided to use some modular assets and combine them with our own script in order to procedurally generate different buildings as well. For this, an approach similar to what is described as the ‘Building Blocks framework’ in [17] was used. The script uses previously hand-made layouts (templates) for the floor, walls, ceiling and roof, and instantiates random corresponding components at the appropriate positions. Inside walls can be instantiated as well; they are chosen at random, and the number of placed walls is also random, up to a predefined maximum, potentially separating the inside of the building into two different rooms in the process. There are two versions of this script. One generates warehouse-like buildings from six different layouts and is also able to change the textures of the inside walls, floors and ceiling. The other version is not as advanced, as it generates a “ruined house” with only one layout and texture.
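The deferred placement of trees, props and buildings can be sketched as follows. This is an illustrative Python sketch, not the Unity script: the `SpawnRecord` and `Placed` types, the `height_at` callback and the convention that a yaw of 0 looks along +z are assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SpawnRecord:
    """What object generation stored earlier (hypothetical structure)."""
    kind: str            # "building", "tree" or "prop"
    x: float
    z: float
    area_center: tuple   # (x, z) of the area the object belongs to


@dataclass
class Placed:
    kind: str
    position: tuple      # (x, y, z), where y is the terrain height
    yaw_deg: float       # rotation around the vertical axis


def instantiate(record, height_at, rng):
    """Resolve the height from the terrain and pick the rotation by object type."""
    y = height_at(record.x, record.z)
    if record.kind == "building":
        # Buildings face the area center; yaw 0 is assumed to look along +z.
        dx = record.area_center[0] - record.x
        dz = record.area_center[1] - record.z
        yaw = math.degrees(math.atan2(dx, dz)) % 360.0
    else:
        # Trees and props get a random rotation around the vertical axis.
        yaw = rng.uniform(0.0, 360.0)
    return Placed(record.kind, (record.x, y, record.z), yaw)


def flat_terrain(x, z):
    return 0.25          # stand-in for a heightmap lookup


rng = random.Random(0)
house = instantiate(SpawnRecord("building", 0.0, 0.0, (10.0, 0.0)), flat_terrain, rng)
tree = instantiate(SpawnRecord("tree", 1.0, 2.0, (0.0, 0.0)), flat_terrain, rng)
```

Keeping only spawn records until the heightmap exists means that no object has to be moved after the terrain heights are set.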
Fig. 4. Some of the warehouse buildings generated with the building generator
Fig. 5. Some of the ruined buildings generated with the building generator
Figures 4 and 5 show some of the possible outcomes of the building generator for both warehouse buildings and ruins. Each building can also contain other objects, like med-kits, zombies or ammo boxes, and these are randomly instantiated in this step as well. The positions of potential items in buildings are also predefined with the same template-like system. After placing each building, the algorithm uses a random number generator to determine whether an object is placed at each position, and if so, which one.
3.6 Finalization
At this point, the map can look like the examples in Figs. 6 and 7. Now that the terrain is done and all the objects are placed, it is time to finalize the map and make the game playable. First, some grass details are added to the terrain. The engine uses a details map for grass, which is a mechanism similar to the alpha map. The algorithm ignores the path on the terrain and paints grass, and potentially also flower, details everywhere else. It does this by going through all the terrain points and checking whether the terrain texture there is set to “dirt” or “grass”, which is how it distinguishes the path from the rest of the terrain.
Fig. 6. Unity scene view of a possible world outcome
Fig. 7. Unity scene view of another possible world outcome
Fig. 8. Unity scene view of a generated village
Fig. 9. Gameplay during daytime & sand storm
The final step includes placing the player at the beginning of the main path, placing the zombies around random path points, and placing the objectives. The pillars are placed along the path in order to be more visible and easier to find.
The Necronomicon is placed in one of the houses in order to encourage the player to explore the buildings. Once everything is in place, the game is ready and playable (see Figs. 8 and 9).
4 Evaluation and Discussion
A total of 14 users were asked to play our game a few times (at least 3 to 4) and then fill out a questionnaire. Before starting the game, they were briefly told what the game was about, how it works, what they had to do, and what the controls for the character are. We did not need any additional information about the users, so the questionnaire was anonymous. The questionnaire is composed of twelve questions, split into two categories, ‘Gameplay’ and ‘World’, with six questions each. The questions were formulated as declarative statements, for example “I enjoyed the game”, and the answers were given on a classical Likert rating scale, allowing the users to rate the statements from 1 to 5, with 1 being “strongly disagree” and 5 being “strongly agree”. This also allowed us to calculate the average of all the answers as an overall rating of the user experience.
The Gameplay category of the questionnaire focused on the player’s ability to understand the game mechanics and the UI, as well as their overall enjoyment. This part included the following statements: “I clearly understood what the objectives were”, “I had no trouble controlling my character”, “The User Interface was easy to understand”, “I used the following game features: both weapons, flashlight, map, med-kits, ammo boxes”, “I had no trouble completing the game”, and “I enjoyed playing the game”.
The World category of the questionnaire focused on the procedurally generated world and buildings, the player’s ability to navigate through the level and the overall replayability of the game. It included the following statements: “I found it easy to navigate through the level”, “The level was different/refreshing each time I played”, “The buildings I encountered were different/refreshing each time I played”, “I tried to explore the map and go into buildings”, “I did not have any trouble finding med-kits or ammo”, and “I would describe the game as ‘replayable’”. Players were also given the option to leave their own comment at the end of the questionnaire.
4.1 Results and Discussion
After all the player sessions were done, we summarized the results of our questionnaire. Figure 10 shows the pie charts of the results for the Gameplay category of the questionnaire, and Fig. 11 shows the pie charts of the results for the World category. The average rating across all answers was 4.21 out of 5. The results of the two parts of the questionnaire are discussed separately, starting with gameplay.
Fig. 10. Questionnaire results for the Gameplay category
Fig. 11. Questionnaire results for the World category
All of the users agreed with “I clearly understood what the objectives were”: 10 out of 14 users gave it a rating of 5, and the remaining 4 gave it a rating of 4. These are very good results, as they tell us that the instructions for the objectives were clear and the users knew from the start what they were supposed to do. Also, none of the users disagreed with the statement about having no trouble controlling the character. This is probably because we used the standard input commands of most first-person games (moving the character with the W/A/S/D keys and using the mouse to turn around and shoot), with which most of the users were familiar. It could also be because the users did not have to use many additional keys, besides the ones for turning on the flashlight, switching weapons, interacting with objects, toggling the map and sprinting. Most users, with the exception of one, also agreed that the User Interface was easy to understand. This was an expected result, as the UI was not too crowded and only displayed things like objectives, current/max ammo and health, and additionally the map when the player decided to toggle it on. We got mixed results from users when asked if they used all of the listed features. This was the case because some users did not bother to explore the buildings much, so they were unable to find any med-kits or ammo boxes. Some users even decided not to interact with the zombies at all, so they did not need any health or ammo pickups, nor did they use both weapons. Almost all users agreed that they were able to complete the game, with the exception of one who gave this question a neutral rating. 11 out of 14 users also agreed when asked whether they enjoyed playing the game, while 2 users replied with “neutral” (rating of 3) and one person disagreed (rating of 2), giving this question an average rating of 4.07. Overall the gameplay part received good results, indicating that our goal for the game to be enjoyable was reached.
The questions from the World category of our questionnaire received some mixed results. When asked if they found it easy to navigate through the level, almost all users agreed that it was, while 2 answers were neutral. This indicates that the players were probably almost always aware of their position and could tell at least which parts of the map they had already visited. The map feature also helped here. 9 out of 14 users agreed that they noticed the level to be different each time they played, with 3 feeling neutral about this statement and only 2 disagreeing. The neutral and negative answers could be because the map was always thematically similar, which some users may have mistaken for the level not being different. The majority of the users, however, thought the buildings did look different each time they played, which probably also helped them navigate through the world by recognizing the places they had already visited. Only 57.1% of the users agreed that they tried to explore the map and the buildings, and 9 out of 14 users did not agree that they had no trouble finding med-kits and ammo boxes. This could be because they did not explore the buildings, but it could also be an indicator that a purely PRNG-based technique for placing consumable objects is not ideal, or simply that the chance of our algorithm generating such objects in those places was too low. When asked if they would describe the game as “replayable”, 10 out of 14 users agreed, 3 were neutral and 1 disagreed, giving this question an average rating of 4.14, which at least suggests that our goal for the game to be replayable was met.
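The per-question averages follow directly from the rating counts. As a small worked example, the sketch below uses the one distribution that the text states in full (ten ratings of 5 and four ratings of 4 for the objectives statement); the helper names `likert_mean` and `agree_share` are hypothetical.

```python
def likert_mean(counts):
    """counts maps a rating (1 to 5) to the number of users who chose it."""
    total = sum(counts.values())
    return sum(rating * n for rating, n in counts.items()) / total


def agree_share(agreeing, total):
    """Percentage of users who rated a statement 4 or 5."""
    return 100.0 * agreeing / total


# "I clearly understood what the objectives were": 10 users rated 5, 4 users rated 4.
objectives = {5: 10, 4: 4}
print(round(likert_mean(objectives), 2))   # 4.71

# 57.1% agreement among 14 users corresponds to 8 users.
print(round(agree_share(8, 14), 1))        # 57.1
```

The same calculation applied to the enjoyment question (eleven agreeing ratings, two ratings of 3 and one rating of 2) yields the reported 4.07 only for six ratings of 4 and five ratings of 5, a split that is inferred here rather than stated in the text.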
5 Conclusion and Future Work
In conclusion, we were able to create a whole survival world procedurally and randomly, and fit it within a playable game. The user tests indicated that the game is enjoyable and replayable. We believe that the game has a lot of potential and could be expanded to achieve even more replayability. Different world themes, like a desert or a mountain village, could be added, along with more potential objectives and game modes. Other gameplay elements could be added, like more weapons, grenades, and even different types of zombies. The script for procedural generation of buildings could also be extended. It could include even more layouts, or be expanded to create more complex buildings. Procedural content generation is a powerful tool, and even with one of its simplest techniques, PRNGs, a lot can be achieved. This research has shown us that a whole world can be created procedurally using this simple yet effective technique. Unfortunately, PCG has a strong limitation when it comes to re-usability. Most algorithms for procedural content generation are made specifically for a certain application, unlike, for example, some AI algorithms, which can be reused in different games. Nevertheless, PCG has a lot of potential, which should without question be explored further.
References
1. Barreto, N., Cardoso, A., Roque, L.: Computational creativity in procedural content generation: a state of the art survey. November 2014. https://doi.org/10.13140/2.1.1477.0882
2. Betts, T., Carey, J.: Procedurally generated content in Sir, You Are Being Hunted, 6 November 2013
3. Bourke, P.: A space filling algorithm for generating procedural geometry and texture. GSTF J. Comput. (JoC) 3(2), 1–5 (2013). https://doi.org/10.7603/s40601-013-0004-2
4. Carli, D.M.D., et al.: A survey of procedural content generation techniques suitable to game development. In: 2011 Brazilian Symposium on Games and Digital Entertainment, pp. 26–35, November 2011. https://doi.org/10.1109/SBGAMES.2011.15
5. Champandard, A.J.: Procedural level geometry from Left 4 Dead 2: spying on the AI Director 2.0, 6 November 2009
6. Fort, T.: Controlling randomness: using procedural generation to influence player uncertainty in video games (2015)
7. Freiknecht, J., Effelsberg, W.: A survey on the procedural generation of virtual worlds. Multimodal Technol. Interact. 1(4), 27 (2017). ISSN: 2414-4088
8. Gaisbauer, W., Hlavacs, H.: Procedural attack! Procedural generation for populated virtual cities: a survey. Int. J. Serious Games 4(2), 19–29 (2017)
9. Gearbox Software: Borderlands, Computer Software (2009)
10. Hello Games: No Man’s Sky (2016)
11. Hendrikx, M., et al.: Procedural content generation for games: a survey. ACM Trans. Multimedia Comput. Commun. Appl. 9(1), 1:1–1:22 (2013). ISSN: 1551-6857
12. Khaled, R., Nelson, M.J., Barr, P.: Design metaphors for procedural content generation in games. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2013, Paris, France, pp. 1509–1518. ACM (2013). ISBN: 978-1-4503-1899-0
13. Manaugh, G.: British countryside generator (2013)
14. Mattheus, D.T.: Rogue - Video Game. Ventana Press Inc., Chapel Hill (2012). ISBN: 6138576403, 9786138576402
15. Maxis: Spore, Computer Software (2008)
16. Mojang: Minecraft (2011)
17. Smith, G.: Understanding procedural content generation: a design-centric analysis of the role of PCG in games. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2014, Toronto, Ontario, Canada, pp. 917–926. ACM (2014)
18. Smith, G., et al.: PCG-based game design: enabling new play experiences through procedural content generation. In: Proceedings of the 2nd International Workshop on Procedural Content Generation in Games, PCGames 2011, Bordeaux, France, pp. 7:1–7:4. ACM (2011)
19. Togelius, J., et al.: Procedural content generation: goals, challenges and actionable steps. In: Lucas, S.M., et al. (eds.) Artificial and Computational Intelligence in Games, vol. 6, pp. 61–75. Dagstuhl Follow-Ups. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2013)
“Let’s Play a Game!” Serious Games for Arabic Children with Dictation Difficulties
Samaa M. Shohieb1(B), Abd Elghaffar M. Elhady2, Abdelrahman Mohsen3, Abdelrahman Elbossaty3, Eiad Rotab3, Hajar Abdelmonem3, Naira Elsaeed3, Haidy Mostafa3, Marwa Atef3, Mai Tharwat3, Aya Reda3, M. Aya3, and Shaibou Abdoulai Haji4
1 Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
2 Deanship of Scientific Research, Umm Al-Qura University, Mecca, Saudi Arabia
3 Students in the Faculty of Computers and Information, Mansoura University, Mansoura, Egypt
4 Institute of Educational Research, Korea University, Seoul, South Korea
Abstract. The purpose of this paper is to present a gaming application (Mega Challenger) dedicated to students between seven and nine years old with dictation difficulties, a type of dysgraphia. The paper presents the user interface of the game as well as its design principles. Experiments were performed to test the effect of the Mega Challenger game on the learning achievement of students with dictation difficulties. A dictation achievement pre-assessment was held for the students based on learning difficulties standards, and all results were recorded for further analysis. After 48 training sessions with the “Mega Challenger” application over 16 weeks, a post-assessment was conducted with the assistance of learning difficulties tutors and experts. The results show a significant difference between the students’ performance in the pre-assessment and in the post-assessment. A usability evaluation of Mega Challenger based on ISO 25062:2006, using post-user-study interviews with the students’ teachers, concluded that Mega Challenger decreases the assessment load of teachers and is a promising tool for students who have dictation difficulties. We conclude that children with dictation difficulties achieve better academic results when they play the game and that they are more motivated by serious games.
Keywords: Game-based learning · Dictation difficulty · Learning disability · Technology in learning · Human-Computer interaction
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 77–89, 2020. https://doi.org/10.1007/978-3-030-65736-9_6
1 Introduction and Motivation
Learning difficulties are neurologically based processing problems. These processing problems can interfere with learning necessary skills such as reading, writing and mathematics. They can also interfere with higher-level skills such as organization, time planning,
long or short-term memory and attention. To handle learning difficulties, students need to practice a combination of skills: information processing and motor skills.
Several researchers have studied the importance of a game-based approach to learning, aiming at increasing engagement and motivation in general [1–8]. More specifically, attempts have been made to support learners with learning difficulties using a game-based rehabilitation approach. Tan et al. [2] presented a prototype of a tablet-based game for writing practice. Its first evaluation, with adults (about 24 years old), reported that participants appreciated the gameplay and perceived it as useful. The researchers in [3] compared a sensorimotor approach with computer-assisted practice for reducing handwriting problems in children with learning difficulties and dictation difficulties. They tested 42 students divided into three groups: a control group, a sensorimotor approach group, and a computer group. The computer group showed more significant improvement than the other groups. Giordano and Maiorana [4] presented a practical web-based software system that can be used on tablets. Their system is based on the Dynamic Time Warping algorithm: the trace drawn by the student, in exercises ranging from merely connecting dots to writing a complete word, is compared with a reference trace produced by an expert. The same authors [5] also built a responsive web-based game to improve the handwriting of children with dysgraphia. In the game, students write words or letters to earn a better score and practice pre-graphism with simple forms such as circles and straight lines, motivated by game elements such as rewards and extra lives that let them advance in the game. The game was tested informally with children under the supervision of the developer; the students were engaged, and the game had an impact on them. Moreover, Borghese et al. [6] designed, in collaboration with clinicians, early-intervention games called “Exergames” for dysgraphia. The games focus on prewriting exercises and were tested on 16 kindergarten children. Berninger et al. [7] likewise used iPad computer-assisted writing instruction to improve the skills of students between the 4th and 9th grades who have dyslexia, dysgraphia, or oral and written language learning disability (OWL LD; impaired syntax composing). They showed that customized computerized systems can be used successfully for these grades. O’Halloran [8] designed an Android-based application, based on a genetic algorithm, that provides the participant with a series of cues (anagram, grapheme and semantic). The post-treatment attempts, after applying the tests with real students, showed that spelling moved closer to the target words.
Most of the available work on supporting students with dysgraphia targets the English language, except [9], which presented and investigated the efficiency of an educational application containing different games that address some learning difficulties in the Arabic language, especially reading difficulty. Therefore, there is a significant shortage of applications that support students with dictation difficulties, a special type of dysgraphia, in the Arabic language.
Consequently, this paper presents the design, implementation, and evaluation of an iOS games application [10] that is dedicated to supporting students with dictation difficulties at the single-word level and uses tablet computers or mobile devices as therapy tools. The implemented game focuses on students with dictation difficulties in the Arabic language. Our application helps students learn how to write Arabic letters and words easily, how to differentiate between pronunciations
of Arabic words, how to compare similar Arabic letters and words, how to tell directions apart, and how to learn numbers more efficiently. We called the gaming application “Mega Challenger” because it helps students challenge their difficulties so that they can achieve better learning performance. The user interface and the design principles of Mega Challenger are presented in Sects. 2 and 3, respectively. The methodology and experimental design are discussed in Sect. 4. Section 5 presents the data analysis and results. Sections 6 and 7 conclude and highlight future work, respectively.
2 Mega Challenger User Interface
Mega Challenger was implemented as ISO/IEC/IEEE 90003:2018-compatible software [11] and evaluated according to ISO 25062:2006 (last reviewed and confirmed in 2019) [12]. It consists of three games, each of which can be played independently of the others. Each game targets a specific problem related to letter recognition and dictation difficulties. The first screen of the game offers three options: pronunciation of characters, learning how to spell words and write numbers, and learning the main directions, shown in that order in Fig. 1. The details of each game are described in the following.
Fig. 1. The first screen of the game is divided into three options, from top to bottom: Pronunciation of Arabic letters, Learn to write words and numbers, and Learning the main directions
2.1 Pronunciation of Arabic Letters
Students are presented with a list of the twenty-eight letters of the Arabic alphabet. When the student touches any letter in the list, he/she hears the pronunciation of the letter. This is considered a training level for the student. It is important for students with dictation difficulties because it connects the senses of hearing and sight with the learning process. Figure 2 shows a screenshot of this game. This was an extra feature added by the authors to increase the efficiency and effectiveness of the game.
Fig. 2. The list of the twenty-eight letters of the Arabic alphabet in the Pronunciation of Arabic Letters game
2.2 Learn How to Spell Words and Write Numbers
The second game is “Learning How to Spell Words”. It helps students with dictation difficulties learn how to spell an Arabic word correctly. It advances through three successive levels. The words presented in the game were chosen and sequenced systematically based on the “Nour Elbian” general guidelines for teaching Arabic writing to children [13]. The first screen shows the word, a picture of the word, and a list of Arabic characters. The student must choose the correct letters of the shown word from the list. A correctly chosen letter starts to shine; otherwise, nothing happens to the letters. After the student has chosen all the letters of the shown word and all of them shine, he/she should write the word correctly, as shown in the reference word (Fig. 3.A). Mega Challenger then checks whether the written word is correct. If it is incorrect, the student must repeat these steps. If it is correct, the student goes to the next level. The second screen shows the same picture as the previous screen, but this time without the word. Instead, there are many choices of the related word
Fig. 3. Learning how to write a specific word, from the easiest level to the hardest (panels A–C)
but with wrong spelling. Only one of these words is spelled correctly, and the student must select it. If the student’s selection is correct, he/she proceeds to the third level, as shown in Fig. 3.B. The third screen has a picture only, and the student writes the word that the picture represents without any help or reference word. If the student writes the word correctly, he/she proceeds to another word. There are many words, and the student must repeat these steps for each word, as shown in Fig. 3.C. Besides learning to write letters, students also learn to write Indian numbers (the numerals used in the Arab world) correctly. There are levels from one to nine, and each screen shows some fruits. The student is asked to count the fruits shown and choose the correct number from the given list. If the selection is correct, the number is put in the space and then disappears. The student repeats this step until he/she chooses the correct number. In this way, the student learns the correct way of writing numbers and the correct direction of numbers, as shown in Fig. 4. This game addresses dictation problems such as changing the spelling of a word, using a wrong letter or a similar-looking character, or omitting some letters.
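The three-level progression of the spelling game described above can be sketched as plain checking logic. This is a minimal illustration under stated assumptions, not the authors' iOS implementation: all function names are hypothetical, the example word is chosen only for illustration, and the rule that a failed attempt keeps the student on the same level is an assumption for levels 2 and 3 (the text states it only for level 1).

```python
def is_correct_letter(word, letter):
    """Level 1: a chosen letter 'shines' only if it occurs in the word."""
    return letter in word


def all_letters_chosen(word, chosen):
    """Level 1: writing starts once every letter of the word shines."""
    return set(word) <= set(chosen)


def is_correct_word(word, attempt):
    """Levels 1-3: compare the written or selected word with the target."""
    return attempt.strip() == word


def advance(level, passed):
    """Return the next level (1-3); 0 means this word is finished.

    ASSUMPTION: a failed attempt keeps the student on the same level.
    """
    if not passed:
        return level
    return level + 1 if level < 3 else 0


target = "باب"  # illustrative word ("door"), not taken from the game
level = 1
level = advance(level, is_correct_word(target, "باب"))  # level 1 passed
level = advance(level, is_correct_word(target, "بات"))  # wrong choice
print(level)  # still on level 2
```

Keeping the checks separate from the screen code makes it easy to reuse them for every word in the sequence.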
Fig. 4. A game to help students recognize and write the Indian numbers (panels A–C)
2.3 Learning Directions
Most students with dictation difficulties are not able to differentiate between directions. This game helps the student learn the name of each direction and match it with the corresponding arrow. The student is asked to choose the suitable arrow for every direction. These levels are shown in Fig. 5.
Fig. 5. A game to help students recognize the directions
3 “Mega Challenger” Game Design Principles
The user interface of Mega Challenger is designed to suit children who are starting to read and write (between 7 and 9 years old). The design principles follow the guidelines of game design [14], which are to:
1. Provide informative and immediate feedback. The feedback is visualized through a comparison with a reference trace, combined with written text.
2. Reduce short-term memory load. This is achieved, for instance, by a clearly visible record and score while playing the game.
3. Base the presentation on a feedforward technique. For example, we used bright colors for objects and light colors for backgrounds, so that the student can easily concentrate on the targeted task.
4. Make the game accessible, free, and easy to reach on the Apple store. It can easily be installed on an iPad or iPhone device.
5. Allow tracking of student performance by storing his/her progress and score.
A set of writing exercises is hidden inside the game dynamics and presented to the children in a way that avoids boredom. The game also motivates the players to write by offering immediate, encouraging audible feedback as a reward for going ahead in the game.
4 Methodology and Experimental Design
This experiment tests the effect of playing the game on the motivation and learning achievement of students with dictation difficulties. The work was accomplished by implementing a mobile application and testing it with a sample of students with dictation difficulties. The null hypothesis states that there is no statistically significant difference in students’ achievement when they receive a learning game. The first hypothesis (H1) claims that students with dictation difficulties achieve better academic results when they play the game. The second hypothesis (H2) claims that students with dictation difficulties are more motivated by serious games.
4.1 Sampling Procedure
Students with dictation difficulties between seven and nine years old in the primary schools of Mansoura city, Egypt, in the academic year 2018–2019 were the target population of our experiments. The sample (42 students from 3 different primary schools) was selected using a volunteer sampling technique after obtaining consent from their parents and the children’s agreement to participate.
Ethical approval was obtained from the ethics committee of the Faculty of Computers and Information, Mansoura University, Egypt. The students differed in their demographic data. Fifteen of them were girls and twenty-seven were boys. Twenty of the students were 7 years old, ten were 8 years old, and eight were 9 years old.
4.2 Clinical Interview
Three psychological and behavioral experts clinically interviewed the selected students. They assessed the students by the criteria of the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-V MD) [15] to detect any mental, visual or auditory disability. In this way, it could be determined whether the dictation difficulties were caused by mental disabilities or by learning disabilities.
4.3 Dictation Achievement Assessment
A dictation test drawn from the first-grade academic books was given to the students to detect those who have dictation disabilities. The test was performed according to standard dictation test regulations; the dictation achievement test consists of 60 words [16]. For greater reliability and accuracy, psychological and behavioral experts observed the students’ performance during the assessment. After the pre-assessment, students without dictation difficulties were coded 0, and students with learning difficulties were coded 1. In addition, students with dictation difficulties were assigned a score between 1 and 10 that records their achievement and, consequently, the severity of their problem based on the dictation pre-assessment.
4.4 Test Conduction and Evaluation (Data Collection)
Mega Challenger was offered to learning difficulties specialists for evaluation, and the software was modified based on their comments. The final version of the software was then used with the students over 16 weeks in 48 sessions (3 sessions per week). Each session lasted about 10–15 min, a duration recommended by the learning difficulties specialists who helped us. Sessions were conducted under the supervision of both the learning difficulties tutors and the researchers. Mega Challenger was presented on touch tablets and touchscreen phones. To maximize the benefit from Mega Challenger, students were asked to play the game in its designed sequence, which follows the “Nour Elbian” general guidelines for teaching Arabic writing to children [13]. Finally, we administered a post-assessment to the students with dictation difficulties and compared the results with the pre-assessment to determine how the computer-based game improves their performance.
We presented a questionnaire to the teachers of the children with dictation difficulties. There were six teachers for the 42 students in the 3 primary schools. The questions, listed in Table 1, were asked after the user study and concerned the teachers’ perceptions of using Mega Challenger. All the questions were created based on usability measurements from digital game development standards [17] and ISO 25062:2006 (last reviewed and confirmed in 2019) [12]. The questions were asked in Arabic and translated into English after the study for readability. The authors conducted quantitative analyses of the collected interview content.
Table 1. Questions asked in the interview after the user study.
Q# | Interview question
1 | Did your frequency of administering teaching sessions to your students increase after using Mega Challenger? Also, describe the time and place of using it most frequently.
2 | Describe any specific issues experienced while using Mega Challenger, and any notes you may have in that regard.
3 | Describe any noteworthy responses shown by your students while using Mega Challenger, and upon which feature/instance that response was given.
4 | Do you feel that Mega Challenger presented instructions and verbal communication clearly, i.e., was Mega Challenger clear enough to teach dictation through its instructions?
5 | Which one do you consider is the most suitable device size to utilize Mega Challenger: tablet or smartphone?
6 | Was your student able to use Mega Challenger alone (without assistance)?
7 | What would you recommend adding to Mega Challenger, and why?
8 | What do you consider the positive and negative aspects of “Mega Challenger” to be, and why?
5 Results and Discussion
5.1 Usability Test Analysis
The usability analysis was performed on the questions that were asked to the teachers after the user study, listed in Table 1. This analysis was based on ISO 25062:2006 [12], which is a standard method for reporting the usability evaluation of an application. The questions investigated the teachers’ experience with, and feedback about, using Mega Challenger. The usability measurements with the response for each metric are shown in Table 2.
5.2 Data Analysis of Students’ Performance in the Pre-assessment and Post-assessment
The results show a significant difference between the students’ performance in the pre-assessment and in the post-assessment. The students also appeared to be greatly motivated by playing the game. Based on the experiments, we rejected the null hypothesis stating that there is no improvement in the students’ performance when they receive a learning game. The results of each student in the pre- and post-assessments are shown in Fig. 6. A paired-samples t-test was conducted to compare students’ academic performance before and after using Mega Challenger. There was a significant difference in the scores of the students before (M = 2.43, SD = 1.06) and after (M = 5.10, SD = 1.32) using Mega Challenger; t(41) = −30.31, p < 0.001. These results suggest that the use of Mega Challenger does help students with dictation difficulties to perform better academically. More generally, our results suggest that incorporating academic games into the teaching and learning process helps students with dictation difficulties.
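For readers who want to see how a paired-samples t statistic of this kind is obtained, the following is a minimal sketch using only the Python standard library. It computes t as the mean of the per-student differences divided by their standard error, with n − 1 degrees of freedom (41 for the 42 students of the study). The function name is hypothetical and the five score pairs are invented for illustration; they are not the study data.

```python
import math
import statistics


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("samples must be paired")
    # Differences taken as before - after, so improvement gives t < 0.
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# HYPOTHETICAL scores for five students (not the study data).
pre = [2, 3, 1, 4, 2]
post = [5, 5, 3, 7, 5]

t, df = paired_t(pre, post)
print(f"t({df}) = {t:.2f}")
```

The p-value additionally requires the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_rel`) returns both values.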
Table 2. Usability objective and subjective measurements.
Usability attribute | Question number and corresponding metric | Number of positive responses
Effectiveness | Q5: Device performance with the game | 6/6
Effectiveness | Q7: Increase student’s concentration | 4/6
Effectiveness | Q8: Lessons go smoothly | 6/6
Efficiency | Q4: Grading the action of the child | 6/6
Efficiency | Q6: Needs a little help from the teachers | 6/6
Efficiency | Q7: Willing to write short words after using Mega Challenger | 6/6
Efficiency | Q7: Willing to make written exercise after Mega Challenger | 5/6
Efficiency | Q7: Willing to be dictated after Mega Challenger | 5/6
User satisfaction | Q1: Ease of teaching | 6/6
User satisfaction | Q2: Ease of calibrating difficulty levels of vocabulary | 6/6
User satisfaction | Q3: Fits the students’ needs | 5/6
User satisfaction | Q8: Willing to continue learning using Mega Challenger | 4/6
Data were collected and analyzed to understand the effectiveness of Mega Challenger. The post-assessment, in combination with the teachers’ reports, revealed that the Mega Challenger application was attractive to students. The students were also engaged by the voiced pronunciation of the words. Teachers additionally mentioned that the application’s sounds provided uniformity in teaching dictation to students, which better accommodated the children’s learning. The majority of the participants gave positive responses to the questions, which indicates a satisfactory user experience with the game. Figure 6 reveals that parents believed the application fits the students’ needs, as they can use the game almost every day at home. Teachers found it easy to calibrate the difficulty levels of teaching materials and to teach dictation via Mega Challenger. Teachers indicated that the application ran well on mobile phones, though they added that they might prefer tablets. They appreciated the accessibility of the application, as it could be used anywhere, how it could be used semi-independently or with simple guidance, and how smoothly the lessons went. Finally, they recommended the inclusion of more vocabulary and the ability to practice writing sentences.
Fig. 6. Results of each student in the pre and post-assessment
6 Conclusions
Mega Challenger is a gaming application built on the teaching strategies applied for children with dictation difficulties. It was designed, implemented, and tested on 42 children who differ in their demographic data but all have dictation difficulties. The experimental results show a significant difference between the students’ performance in the pre-assessment and in the post-assessment. The usability evaluation of Mega Challenger, based on ISO 25062:2006 [12] and on post-user-study interviews with the students’ teachers, concluded that Mega Challenger decreases the assessment load of teachers. Mega Challenger is a promising tool for students who have dictation difficulties. We concluded that children with dictation difficulties achieve better academic results when they play the game and that they are more motivated by serious games.
7 Future Work
7.1 Improving the Interaction
The authors have observed that some children have weak finger muscle coordination or weak arms. For example, when they try to touch the screen of the iPad, their finger sometimes brushes across the surface, so that the application incorrectly recognizes the action as scrolling instead of touching. Therefore, the touch sensitivity of the application should be adjusted. The authors also hope to extend Mega Challenger to cover more learning difficulties, such as dyscalculia and dyslexia.
7.2 Enhancing the Interface
The next goal of this research is to enhance Mega Challenger so that it teaches complete sentences to students with learning difficulties in the Arabic language. We can add action verbs and the other parts of a sentence to Mega Challenger.
7.3 Enhancing the User-Centered Design
Enhancing the user-centered design by recruiting more volunteers as users is one of the most important issues for future work. Mega Challenger needs to be tested on more children with dictation difficulties to gather more feedback and, consequently, improve the game.
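The touch problem noted in Sect. 7.1 can be illustrated with a simple movement tolerance: a touch counts as a tap as long as the finger travels no farther than a chosen distance, and only longer movements count as scrolling. The sketch below is platform-independent Python, not the iOS gesture API and not the authors' implementation; the function name and the default tolerance of 24 points are assumptions made for illustration.

```python
import math


def classify_touch(start, end, slop=24.0):
    """Classify a touch as 'tap' or 'scroll' from its start and end points.

    `slop` is the largest finger travel (in points) still treated as a tap.
    ASSUMPTION: 24.0 is an illustrative value, not one taken from the paper.
    """
    travel = math.hypot(end[0] - start[0], end[1] - start[1])
    return "tap" if travel <= slop else "scroll"


# A slightly unsteady touch is still accepted as a tap ...
print(classify_touch((100, 100), (110, 105)))           # tap
# ... while the same touch with a strict tolerance becomes a scroll.
print(classify_touch((100, 100), (110, 105), slop=10))  # scroll
```

A larger tolerance makes the game more forgiving for children with weak finger coordination, at the cost of requiring a longer drag before a list starts to scroll.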
References
1. Venkatesh, V., Bala, H.: Technology acceptance model 3 and a research agenda on interventions. Decis. Sci. 39(2), 273–315 (2008)
2. Tan, C.T., Huang, J., Pisan, Y.: Initial perceptions of a touch-based tablet handwriting serious game. In: Anacleto, J.C., Clua, E.W.G., da Silva, F.S.C., Fels, S., Yang, H.S. (eds.) ICEC 2013. LNCS, vol. 8215, pp. 172–175. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41106-9_24
3. Shao-Hsia, C., Nan-Ying, Y.: The effect of computer-assisted therapeutic practice for children with handwriting deficit: a comparison with the effect of the traditional sensorimotor approach. Res. Dev. Disabil. 35(7), 1648–1657 (2014)
4. Giordano, D., Maiorana, F.: Addressing dysgraphia with a mobile, web-based software with interactive feedback. In: Proceedings of the International Conference on Biomedical and Health Informatics (BHI) 2014, EMBS, Valencia, pp. 264–268. IEEE (2016)
5. Giordano, D., Maiorana, F.: A mobile web game approach for improving dysgraphia. In: Proceedings of the 7th International Conference on Computer Supported Education 2015, CSEDU, Lisbon, 1, pp. 328–333 (2015)
6. Borghese, N.A., et al.: Assessment of exergames as treatment and prevention of dysgraphia. In: Ibáñez, J., González-Vargas, J., Azorín, J.M., Akay, M., Pons, J.L. (eds.) Converging Clinical and Engineering Research on Neurorehabilitation II. BB, vol. 15, pp. 431–436. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-46669-9_72
7. Berninger, V.W., Nagy, W., Tanimoto, S., Thompson, R., Abbott, R.D.: Computer instruction in handwriting, spelling, and composing for students with specific learning disabilities in grades 4 to 9. Comput. Educ. 81, 154–168 (2015)
8. O’Halloran, C.: Using a tablet computer application to treat acquired dysgraphia and boost word output, University of Limerick Theses & Dissertations (2013). Homepage. http://hdl.handle.net/10344/3528. Accessed 03 June 2020
9. Salah, J., Abdennadher, S., Sabty, C., Abdelrahman, Y.: Super alpha: Arabic alphabet learning serious game for children with learning disabilities. In: Marsh, T., Ma, M., Oliveira, M.F., Baalsrud Hauge, J., Göbel, S. (eds.) JCSG 2016. LNCS, vol. 9894, pp. 104–115. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45841-0_9
10. He, G.F., Park, J.W., Kang, S.K., Jung, S.T.: Development of gesture recognition-based serious games. In: Proceedings of the International Conference on Biomedical and Health Informatics 2012, BH, Hong Kong, pp. 922–925. IEEE (2012)
11. ISO/IEC/IEEE 90003:2018 Software engineering - Guidelines for the application of ISO 9001:2015 to computer software. Homepage. https://www.iso.org/standard/74348.html. Accessed 03 June 2020
12. ISO/IEC 25062:2006 (last reviewed and confirmed in 2019) Software engineering - Software product Quality Requirements and Evaluation (SQuaRE) - Common Industry Format (CIF) for usability test reports. Homepage. https://www.iso.org/obp/ui/#iso:std:iso-iec:25062:ed-1:v2:en. Accessed 03 June 2020
13. Nour Elbian. Homepage. http://etheses.uin-malang.ac.id/id/eprint/5995. Accessed 03 June 2020
14. Moreno-Ger, P., Burgos, D., Martínez-Ortiz, I., Sierra, J.L., Fernández-Manjón, B.: Educational game design for online education. Comput. Hum. Behav. 24(6), 2530–2540 (2008)
15. Kupfer, D.J., First, M.B., Regier, D.A.: A Research Agenda for DSM V. American Psychiatric Pub (2008)
16. Malekian, F., Akhundi, A.: The effects of educational multimedia in dictation improvement of exceptional students with dictation difficulty. J. Mod. Thoughts Educ. 6(1), 145–162 (2010)
17. Digital Game Development Standards. Homepage. http://www.doe.nv.gov/uploadedFiles/ndedoenvgov/content/CTE/Programs/InfoMediaTech/Standards/Digital-Game-DevelopmentSTDS-ADA.pdf. Accessed 03 June 2020
Provchastic: Understanding and Predicting Game Events Using Provenance

Troy C. Kohwalter, Leonardo G. P. Murta, and Esteban W. G. Clua

Fluminense Federal University, Niterói, RJ 24210-346, Brazil
{troy,leomurta,esteban}@ic.uff.br

Abstract. Game analytics became a very popular and strategic tool for business intelligence in the game industry. One of the many aspects of game analytics is predictive analytics, which generates predictive models using statistics derived from game sessions to predict future events. The generation of predictive models is of great interest in the context of game analytics with numerous applications for games, such as predicting player behavior, the sequence of future events, and win probabilities. Recently, a novel approach emerged for capturing and storing data from game sessions using provenance, which encodes cause and effect relationships together with the telemetry data. In this work, we propose a stochastic approach for game analytics based on that novel game provenance information. This approach unifies all gathered provenance data from different game sessions to create a probabilistic graph that determines the sequence of possible events using the commonly known stochastic model of Markov chains and also allows for understanding the reasons to reach a specific state. We integrated our solution with an existing open-source provenance visualization tool and provided a case study using real data for validation. We could observe that it is possible to create probabilistic models using provenance graphs for short and long predictions and to understand how to reach a specific state.

Keywords: Provenance graph · Predictive analytics · Markov chains · Stochastic model

1 Introduction
Game analytics has become a popular and important emerging field for business intelligence in the game industry. It provides a wealth of information for game designers, including feedback about design and gameplay mechanics, player experience, production performance, and even market reaction. Thus, the main goal of game analytics is to support the decision-making process at the operational, tactical, and strategic levels for game development. Moreover, it is the main source of business intelligence for the game industry [5].

Supported by CAPES, CNPq, and Faperj.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 90–103, 2020. https://doi.org/10.1007/978-3-030-65736-9_7
One of the many aspects of game analytics is predictive analytics [6], which builds predictive models from statistics derived from datasets and uses them to produce statistical scores for predicting future events [2]. Predictive analytics has many uses in the game domain. It can be used to predict player behavior, the sequence of future events based on the current game state, win probabilities, and strategies in competitive games, or even to support monetization decisions and increase the game’s revenue [1]. Thus, the generation of predictive models is of great interest in the context of game analytics, and it is not a new field [7,12,13,15–17]. For example, Dereszynski et al. [4] presented a probabilistic framework for RTS games based on hidden Markov models that counts the number of units and buildings to predict build order. Yang et al. [19] proposed a data-driven approach to discover combat strategies of winning teams in MOBA games. Schubert et al. [14] presented a predictive model for Dota 2 that predicts match results based on the encounters within the game session. Cleghern et al. [3] introduced an approach to forecast changes in the hero’s health in the MOBA game Dota 2 by observing past game session data. However, none of these works take into consideration contextual information that might impact the outcome and use game metrics over the course of the game session for predictions.

Recently, Kohwalter et al. [10] proposed a novel approach named PinGU¹ for capturing and storing provenance data from a game session based on the Provenance in Games conceptual framework [9]. The wealth of provenance data collected during a game session is fundamental for understanding the mistakes made as well as for reproducing the same results at a later moment. However, up to this point, Kohwalter et al. focused only on analyzing a single provenance graph at a time, not comparing different game sessions to understand why some paths taken by players lead to failure while others succeed in reaching their goals. This led to an opportunity to further explore the applications of this promising approach for game provenance data due to its wealth of information, resulting in the following research question:

RQ: Does the use of provenance obtained from multiple game sessions support predictions and understanding of events for future game sessions?

Therefore, in this work, we propose a stochastic approach for game analytics based on the tracked provenance data. This approach merges the tracked provenance data from multiple game sessions into a stochastic graph. This stochastic graph, represented as Markov chains [8], determines the sequence of possible events, taking advantage of the graph nature of the provenance data to navigate this stochastic model both for predicting events and for understanding their causes. Our solution is compatible with an existing open source provenance visualization tool, and we provide a case study using real game provenance data. We believe that our stochastic graph supports a variety of uses for game analytics since it is based on the well-known and widely used model of Markov chains. Applications can include AI behavior, human assistance, and a decision process for interactive storytelling, among other possibilities.

This paper is organized as follows: Sect. 2 presents the existing work in the literature. Section 3 presents our proposed approach for generating a stochastic graph based on provenance data. Section 4 presents our case study. Finally, Sect. 5 concludes this work.

¹ https://github.com/gems-uff/ping
2 Related Work
There are many studies related to predicting outcomes or strategies in digital games. Most of these studies focus on competitive games, such as RTS games or MOBAs. For example, Erickson and Buro [7] proposed a model for predicting the winning player in StarCraft using logistic regression on replay data. Stanescu and Čertický [15] proposed a prediction system based on the Answer Set Programming paradigm to predict the number and type of units a StarCraft or WarCraft III player trains in a given amount of time. Rioult et al. [13] proposed an approach to predict wins and losses based on topological information using player locations. Synnaeve and Bessière [17] employed a Bayesian model to predict the likely strategies StarCraft players would employ at the beginning of a game. Summerville et al. [16] used machine learning techniques to predict the selection of heroes in Dota 2. Below we discuss some of the many related works in more detail.

Cleghern et al. [3] introduced an approach to forecast changes in the hero’s health in the MOBA game Dota 2 by observing past game session data (i.e., replay logs) and splitting the data related to health values into two time series: one for small changes in health and another for large changes. They used this splitting approach to predict both types (small and large) of changes using statistical models. The authors used an auto-regressive moving-average model for small changes and a combination of statistical models for large changes. They combined both methods to create a system that forecasts changes in health in a game session. However, their approach considers only health data and not other information from the game session (e.g., events). Furthermore, their prediction model is constrained by the match duration, lacks contextual information, and is focused only on MOBAs.

Yang et al. [19] proposed a data-driven approach to discover combat strategies of winning teams in MOBA games. The authors modelled each combat as a sequence of graphs, where each vertex represents a player associated with a role in the game, resulting in 10 vertices (five for each team). They also created another special vertex to represent the death state. The edges in the graph represent the interactions between players, which can be either doing damage to an adversary or healing a partner, or the death of a player (which connects to the death vertex). The authors use these combat graphs to train a decision tree based on the best features using five graph metrics (in-degree, out-degree, closeness, betweenness, and eigenvector centrality). This tree is then used to mine patterns that are predictive of winning the game. However, their approach generates generic high-level rules that do not consider important contextual information such as heroes, level differences, equipment, abilities used, and positioning. As such, it cannot reveal the dynamics of each combat, only high-level factors that tend to determine the outcome of the game.

Dereszynski et al. [4] presented a probabilistic framework for RTS games based on hidden Markov models. Their approach is focused on predicting the base building order by observing the timing of the current state to predict future states based on probabilistic inference. The authors limit the states to the number of units and buildings. Each state is measured every 30 s of the game. Their strategy, which is similar to our own, makes it possible to capture the likelihood of different choices and the probability of producing particular units in each state based on the observed events. However, their model is not sufficient to capture other aspects of the game StarCraft, such as tactical decisions, and is limited to only the first seven minutes of the game because players tend to play their first minutes in isolation, without reacting to the other player’s tactics and unit composition.

Fig. 1. The Provchastic approach overview.
3 Provchastic
Our proposed approach, named Provchastic (Provenance for generating stochastic models), is a probabilistic approach for game analytics that takes advantage of a recent approach for tracking and storing game provenance data developed by Kohwalter et al. [10]. Our stochastic model was inspired by the work of Lins et al. [11], who proposed the Timed Word Tree visualization, a variation of Word Tree displays [18]. The idea is to explore the structured nature of provenance graphs to generate a stochastic model using the commonly known Markov chains [8]. The stochastic model is derived from a set of provenance graphs from captured game sessions, resulting in a unified provenance graph that contains the probabilities of changing states. Our proposed Provchastic approach is divided into two major phases: (1) provenance unification and (2) stochastic model creation. Figure 1 gives an overview of these phases and how they are related, which we describe in more detail in the following sections.
3.1 Provenance Unification
The first major phase is responsible for the creation of the unified graph, which is used to generate the stochastic graph. The process of creating a unified graph requires four procedures, as illustrated by Fig. 2: (1) a matching heuristic to match vertices from different graphs, (2) the definition of vertex similarity, (3) vertex merge, and (4) a graph merge. The merge process occurs by merging two graphs at a time and, consequently, the matching heuristic uses only two graphs at a time as well.
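Because the merge handles two graphs at a time, unifying many sessions can be read as a left fold of the pairwise merge over the list of graphs. The Python sketch below illustrates only that folding. The function `merge_pair` is a hypothetical placeholder that treats vertices with identical keys as matched, standing in for the matching heuristic and the Vertex Similarity algorithm described in this section, and the session file names are invented for illustration.

```python
from functools import reduce


def merge_pair(unified, graph):
    """Hypothetical stand-in for the pairwise merge.

    Each graph maps a vertex key to the set of graph files it came from.
    Vertices with the same key are treated as matched and their origins
    are accumulated; unmatched vertices are copied over unchanged.
    """
    merged = {key: set(origins) for key, origins in unified.items()}
    for key, origins in graph.items():
        merged.setdefault(key, set()).update(origins)
    return merged


def unify(graphs):
    """Fold the pairwise merge over all session graphs, two at a time."""
    return reduce(merge_pair, graphs)


# Three toy sessions (invented data).
sessions = [
    {"accelerate": {"s1.xml"}, "crash": {"s1.xml"}},
    {"accelerate": {"s2.xml"}, "brake": {"s2.xml"}},
    {"accelerate": {"s3.xml"}, "crash": {"s3.xml"}},
]
unified = unify(sessions)
```

In this toy run the `accelerate` vertex ends up with three origins and `crash` with two, which is the kind of per-vertex origin count that a frequency computation over the unified graph could later rely on.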
Fig. 2. Provenance unification process.
A matching heuristic for vertex selection is used to restrict the search space for vertex matching and to avoid computing a Cartesian product between the vertices of both graphs. One solution for graph matching is graph isomorphism. However, no polynomial-time algorithm is known for this problem (and the related subgraph isomorphism problem is NP-complete), so we look for heuristics that make the problem solvable in a reasonable time. Furthermore, the heuristic decides how the comparison between vertices from different graphs is made. It always chooses two vertices (one from each graph) to pass to the Vertex Similarity algorithm for comparison. Currently, we use a heuristic that employs temporal information to define the order in which to compare vertices from provenance graphs generated from the same game (e.g., different players playing the game or multiple game sessions from the same player). This matching heuristic takes advantage of the chronological nature of provenance graphs to prune the search space. Once a corresponding vertex from the second graph is found, neither of the two matched vertices is ever revisited. Figure 3 provides an overview of the matching process used by our Matching Heuristic.

The Vertex Similarity algorithm, also known as the distance metric function, always compares two vertices (e.g., $v^1_x$ and $v^2_y$) to establish the similarity between them. The similarity value between two vertices ranges from 0 to 1, where 0 represents a total mismatch (0%) and 1 represents a total match (100%). Consider $G^1 = (V^1, E^1)$ as a directed graph where $V^1 = \{v^1_1, v^1_2, \ldots, v^1_n\}$, and consider $G^2 = (V^2, E^2)$ as a directed graph where $V^2 = \{v^2_1, v^2_2, \ldots, v^2_m\}$. The comparison algorithm always compares vertices $v^1_x$ and $v^2_y$ at a time, where $v^1_x \in V^1$, $v^2_y \in V^2$, and $V^1 \neq V^2$. This Vertex Similarity process has three steps: (1) vertex type verification, (2) attribute evaluation, and (3) similarity evaluation.
Fig. 3. Matching Heuristic process.

Furthermore, the comparison algorithm uses user-defined parameters for the merge process that are associated with each attribute. The user needs to provide the value that represents the acceptable error margin for each specific attribute and the weight of that attribute in the similarity calculation.

The first step of the Vertex Similarity algorithm is the vertex type verification. This step receives two vertices ($v^1_x$ and $v^2_y$) from the matching heuristic and checks whether their types match. If they belong to the same vertex type (i.e., Agent, Activity, or Entity), then it proceeds to the second step (attribute evaluation), which evaluates the attributes of both vertices. If the types are mismatched, then the vertices are not considered similar and the comparison is halted, setting a similarity factor of 0%, skipping the second step (attribute evaluation), and going directly to the third step (similarity evaluation).

The second step of the algorithm, the attribute evaluation, tries to match each attribute of one vertex ($v^1_x$) with the same attribute of the other vertex ($v^2_y$), comparing their values. This comparison also looks up the parameters in the merge configuration file to obtain the acceptable error margin for the attribute when dealing with numeric values. Thus, if the difference between the numeric values is lower than the accepted error margin, then the attribute values are considered similar.

The third step, the similarity evaluation, determines whether $v^1_x$ and $v^2_y$ can be considered similar and thus suitable for combining into a single vertex in the unified graph. The similarity factor, which is used for the evaluation, is calculated from the number of attributes that were considered similar in both vertices during the second step. The similarity factor is then compared with the similarity threshold. If the similarity factor is below the accepted similarity threshold, then $v^1_x$ and $v^2_y$ are not considered similar.
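The three Vertex Similarity steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Vertex` and `AttributeConfig` types are hypothetical, the error margin is assumed to be relative to the larger of the two values, the similarity factor is assumed to be the weighted fraction of matching attributes, and an attribute present in only one vertex is counted as a mismatch.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vtype: str                                  # "Agent", "Activity", or "Entity"
    attributes: dict = field(default_factory=dict)


@dataclass
class AttributeConfig:
    error_margin: float = 0.0   # assumed relative margin for numeric values
    weight: float = 1.0         # weight of the attribute in the similarity factor


def attributes_match(a, b, margin):
    """Equal values match; numeric values also match within the error margin."""
    if a == b:
        return True
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b) < margin * max(abs(a), abs(b))
    return False


def similarity(v1, v2, config):
    """Return a similarity factor in [0, 1] for two vertices."""
    # Step 1: vertex type verification; a mismatch yields 0 immediately.
    if v1.vtype != v2.vtype:
        return 0.0
    # Step 2: attribute evaluation, weighting each attribute.
    total = matched = 0.0
    for name in set(v1.attributes) | set(v2.attributes):
        cfg = config.get(name, AttributeConfig())
        total += cfg.weight
        if (name in v1.attributes and name in v2.attributes
                and attributes_match(v1.attributes[name],
                                     v2.attributes[name],
                                     cfg.error_margin)):
            matched += cfg.weight
    # Vertices of the same type without attributes are treated as identical
    # (an assumption of this sketch).
    return matched / total if total else 1.0


def is_similar(v1, v2, config, threshold):
    """Step 3: similarity evaluation against the similarity threshold."""
    return similarity(v1, v2, config) >= threshold


# Example: two braking activities whose speeds differ by less than 20%.
config = {"speed": AttributeConfig(error_margin=0.2)}
a = Vertex("Activity", {"speed": 100.0, "label": "Brake"})
b = Vertex("Activity", {"speed": 110.0, "label": "Brake"})
mergeable = is_similar(a, b, config, threshold=0.95)
```

Under these assumptions, with eight equally weighted attributes and a threshold of 0.95, a single mismatched attribute gives a factor of 7/8 = 0.875, so two vertices only pass the test when all eight attributes match.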
When the similarity factor is equal to or greater than the similarity threshold, both vertices are considered similar. Note that these two vertices are the ones received from the matching heuristic. If two vertices are similar, then the algorithm merges them into a single vertex with all attributes from both vertices and their respective values, plus a new attribute named GraphFile that records the origins of this new merged vertex, that is, the names of the graphs that were used during the merge process.

The Graph Merge process is the last process to create a unified graph. It occurs only after the matching heuristic finishes matching vertices from both graphs. All the resulting merged vertices and the vertices that were not matched are added to the unified graph. Then, after adding all vertices to the unified graph, we add all edges.

Finally, an important step is the distance metric configuration. This step uses the merge configuration file, which contains the parameters used during the merge process, such as error margins, the similarity threshold, and attribute weights for computing the overall similarity of the vertex.

3.2 Stochastic Model Creation
The proposed stochastic approach is based on the stochastic model of Markov chains, which describes a sequence of possible states in which the probability of each state depends on the previous state. We use the unified provenance graph, which was described in the previous section, to derive statistical information about the states that occurred in all of the provenance data and to calculate the probability of jumping from one state to a neighboring state. The generation of our Markov stochastic model is composed of two steps.

The first step is to calculate the frequency metric for each vertex and edge of the unified graph. We define the frequency metric as the number of graphs inside the GraphFile attribute divided by the total number of graphs used in all merges that composed the unified graph. This frequency metric is then used to determine the Markov probability of a state happening based on the previous state.

The second step is to calculate the Markov probability for each edge based on the frequency metric. We calculate two types of Markov probabilities for each edge: one for navigating to the future state (future probability) and another for navigating to the past state (past probability). The probability of reaching a future state is used for predicting the next state. The past probability is useful for determining common states that happened before the current state, allowing the analyst to determine the path with the highest probability of reaching a desired state and to understand how this state was reached in the game. It is important to remember that the edge orientation in provenance graphs always points from the present to the past, by definition. Thus, if we want to navigate the graph to predict future states, then we need to traverse the graph in the direction of the source of the incoming edge. If we want to analyze the past states, then we need to traverse the graph in the direction of the target of the outgoing edge.
Therefore, to determine the probability of the next state, we need to: (1) look at how many incoming edges the vertex has; (2) check the frequency of each incoming edge; and (3) divide each edge’s frequency by the vertex frequency. It is also important to remember that the sum of the frequencies of all incoming edges is always equal to the vertex’s frequency. If there is only one incoming edge, then its frequency is equal to the vertex’s frequency. If there is more than one incoming edge, then their frequencies act as weights when distributing the probabilities of taking each path. Calculating the probability of going to a previous state is analogous; the difference is that the outgoing edges are used instead of the incoming edges. The procedure to calculate those Markov probabilities for each path can be summarized as follows:

for each edge in graph do
    sourceVertex ← edge.source
    targetVertex ← edge.target
    edge.markovFutureProbability ← edge.frequency / targetVertex.frequency
    edge.markovPastProbability ← edge.frequency / sourceVertex.frequency

This procedure embeds in each edge of the graph the Markov probability information needed to navigate both ways in the provenance graph. Thus, given a game state, we know the probability of transitioning to a neighboring state by checking the edge that connects the two states. Figure 4 illustrates an example of an abstract provenance graph with the Markov probabilities for short predictions, considering both ways. The navigation probability to the future is represented by a right arrow (→) and to the past by a left arrow (←).

Fig. 4. Abstract example of a provenance graph embedded with Markov probabilities for short predictions. The vertex frequency is represented inside the vertex.

This Markov embedding is related only to short predictions, that is, the immediate vicinity of the current state. Calculating long predictions (multiple states ahead) is achieved by simply determining a path in the graph that connects the current state with the desired state and multiplying the Markov probabilities of all the edges that compose that path. Figure 5 illustrates an example of the same abstract provenance graph with the Markov probabilities for long predictions originating from the vertex marked as source. Another piece of information that results from the long-prediction calculations is the prediction of a particular outcome or event happening at least once in the near future.
The event type information is embedded in the provenance vertex and is used in parallel when calculating the probability of reaching a specific vertex. For example, Fig. 6 illustrates another (different) graph with embedded long predictions showing the probability of reaching at least one event of each of the existing event types. This type of prediction is achieved by calculating, for each existing path that leads to the desired event type, the probability of reaching the first event of that type, and then summing these path probabilities for each event type.

Fig. 5. Same provenance graph from Fig. 4 embedded with Markov probabilities for long predictions originated from the vertex marked as source. The interpretation for the vertices at the right side of the source (i.e., the future) refers to the probability of reaching each vertex. On the left side of source (i.e., the past), the interpretation is related to common states that led to the source.

Fig. 6. Example for predicting possible outcomes for the source vertex. Outcome types are distinguished by the vertex border color: Red, black, yellow, and purple. (Color figure online)

3.3 Implementation
We implemented an open source module that allows for merging multiple provenance graphs to generate the unified graph. Our module also computes the Markov probabilities and embeds them in the unified graph as described in the previous section. The resulting unified graph is compatible with the open source provenance graph visualization tool Prov Viewer², allowing the user to visually analyze and explore our stochastic graph. Our module also generates short predictions when building the unified graph. Long predictions require the desired source vertex as an input, which is used to compute the probabilities of reaching all other vertices present in the graph and to generate a new unified graph with this information embedded in each vertex. This method allows us to load the generated graph into Prov Viewer in order to generate a visual representation of the predictions.
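The computations described in Sects. 3.2 and 3.3 can be sketched in Python as below. This is an illustrative sketch and not the module mentioned above: the dictionary-based graph representation, all function names, and the toy frequencies are assumptions. It also assumes an acyclic graph and, for a vertex reachable through several paths, sums the path probabilities, mirroring what the text describes for event types.

```python
from collections import defaultdict


def embed_markov_probabilities(vertex_freq, edges):
    """Attach future and past probabilities to each edge.

    Provenance edges point from the present (source) to the past (target),
    so moving to the future means following an edge back to its source.
    """
    for edge in edges:
        edge["future"] = edge["frequency"] / vertex_freq[edge["target"]]
        edge["past"] = edge["frequency"] / vertex_freq[edge["source"]]
    return edges


def future_steps(edges):
    """Map each vertex to its possible next states and their probabilities."""
    steps = defaultdict(list)
    for edge in edges:
        steps[edge["target"]].append((edge["source"], edge["future"]))
    return steps


def reach_probabilities(edges, start):
    """Long predictions: probability of reaching each future vertex.

    Each path contributes the product of its edge probabilities, and the
    contributions of distinct paths to the same vertex are summed.
    """
    steps = future_steps(edges)
    reach = defaultdict(float)

    def walk(vertex, prob):
        for nxt, p in steps.get(vertex, []):
            reach[nxt] += prob * p
            walk(nxt, prob * p)

    walk(start, 1.0)
    return dict(reach)


def event_probability(edges, start, event_of, wanted):
    """Probability of meeting at least one `wanted` event after `start`.

    Sums, over every path, the probability of reaching the first vertex of
    the wanted type; the walk stops there to avoid double counting.
    """
    steps = future_steps(edges)

    def walk(vertex, prob):
        total = 0.0
        for nxt, p in steps.get(vertex, []):
            if event_of.get(nxt) == wanted:
                total += prob * p
            else:
                total += walk(nxt, prob * p)
        return total

    return walk(start, 1.0)


# Toy unified graph with invented frequencies; A is the earliest state.
vertex_freq = {"A": 1.0, "B": 0.75, "C": 0.25, "D": 0.25, "E": 0.75}
edges = embed_markov_probabilities(vertex_freq, [
    {"source": "B", "target": "A", "frequency": 0.75},
    {"source": "C", "target": "A", "frequency": 0.25},
    {"source": "D", "target": "B", "frequency": 0.25},
    {"source": "E", "target": "B", "frequency": 0.5},
    {"source": "E", "target": "C", "frequency": 0.25},
])
events = {"B": "Brake", "C": "Coast", "D": "Crash", "E": "Finish"}
reach = reach_probabilities(edges, "A")
crash = event_probability(edges, "A", events, "Crash")
```

The recursive walks enumerate every path, which is adequate for a small example; for a graph with thousands of vertices, a pass over the vertices in chronological order that accumulates probabilities would likely be preferable.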
4 Case Study
In this section, we validate our proposed approach for provenance graphs through a case study analysis. The following research question guided our study:

RQ: Does the use of provenance obtained from multiple game sessions support predictions and understanding of events for future game sessions?

Our case study is based on an adaptation of the Car Tutorial from the Unity asset store³. This prototype has only one racetrack and focuses on the arcade style of racing game. In addition, there are no opponent cars, only the player’s car, to simulate a practice run. The prototype gathers provenance data related to key events and actions executed during the game session (e.g., crashing the car, pressing the car’s brake, losing car control, accelerating, coasting, etc.), along with their effects on other events, to compose the provenance graph.

We generated a provenance stochastic model for this study using our approach from 75 provenance graphs of the game that were captured from 75 gaming sessions using PinGU [10]. We used a similarity threshold of 95% and error margins for each attribute of approximately 20% (some attributes have slightly different error margins due to the observed domain values). Thus, considering the total number of attributes (eight), a vertex is only considered similar to another vertex if all their attributes are considered similar. This unified graph is composed of 2,302 vertices and 8,201 edges extracted from all game sessions, where each graph represents one complete lap in the track. In contrast, the sum of all vertices and edges from the original 75 graphs is 7,840 and 16,494, respectively. Thus, the unified graph had a 70.63% reduction in the number of vertices and a 50.27% reduction in the number of edges. We then used this generated unified provenance graph for the analysis described in the following paragraphs.

Figure 7 shows an example with two different states that are at almost the same coordinates on the track, but with one difference: the speed of the source vertex of the left graph is around 35 km/h (Slower Vertex), while that of the right graph is at 200 km/h (Faster Vertex). We can see that this difference in speed causes different possible outcomes, as shown in Table 1. Thus, we can observe that, at this point of the track, being slower increases the probability of crashing the car by 13% due to miscalculating the necessary turn from the lack of speed, resulting in a tighter turn and a crash at the side of the road, as illustrated in the zoomed section of the figure marked with a red circle. In contrast, going faster increases the chances of the car losing contact with the ground by 9% due to a slight decline in that section of the track.

² https://github.com/gems-uff/prov-viewer
³ https://assetstore.unity.com/packages/templates/tutorials/car-tutorial-unity-3-xonly-10
Fig. 7. Contrasting the different probabilities from two states with similar coordinates in the game. Vertex color is based on the probability of reaching it.
Figure 8 shows an example of different possible outcomes due to small differences in the game state. The source vertex of the left graph is in the process of decelerating (Decelerating Vertex), while that of the right graph is maintaining speed (Accelerating Vertex). Table 1 shows the differences in probabilities: a 20% chance of crashing while decelerating versus 3% while accelerating at that moment, due to the increased chance of losing control of the car to instability while braking. This can be observed in the figure by analyzing the differences in the marked region inside the yellow rectangle.
Fig. 8. Contrasting the different probabilities from two states with similar coordinates in the game.
Table 1. Contrasting long term predictions

Figure 7
Event                     Probability (Faster Vertex)         Probability (Slower Vertex)
Lost Contact w/ Ground    86.2%                               78.5%
Crash                     33.4%                               46.3%
Lost Control              44.7%                               35.7%

Figure 8
Event                     Probability (Decelerating Vertex)   Probability (Accelerating Vertex)
Lost Contact w/ Ground    1.2%                                17.5%
Crash                     20.1%                               3.3%
Lost Control              9.4%                                3.5%
Scrapped                  17.1%                               11.8%

Figure 9 shows an example of our stochastic model being used to understand the common paths that led to a particular state, or how to reach it. The vertex circled in red is the source vertex and represents a crash. Looking at that state’s past in our stochastic graph, we can see the most common pathways that lead to that outcome. The yellow and blue circles denote the regions with the past events that are the most probable causes, as indicated by the greenish coloration of the vertices. The red path denotes the general path taken that led to that crash. Looking at these vertices and comparing them to the others nearby, we could see two things that influenced the crash: the high speed, which in turn decreases the maximum turn rate of the car, and the car’s position, which, when maintaining a high speed, does not allow the player to take that sharp curve without crashing into the side rails.
Fig. 9. Analysing the most probable causes of a crash. (Color figure online)
Our stochastic model, in the form of a unified provenance graph, makes it possible to determine possible outcomes and probable causes of a particular game state. This answers our research question:

RQ: Does the use of provenance obtained from multiple game sessions support predictions and understanding of events for future game sessions?

Answer: Provenance graphs can indeed be used to create stochastic models based on Markov chains, for example, to predict short- and long-term outcomes by navigating the graph toward the future and to understand how to reach a specific outcome by navigating toward the past.
5 Conclusion
In this paper we presented Provchastic, a novel approach for game analytics that creates stochastic models using provenance data from previous game sessions. Our approach is the first work to unify game provenance data from multiple game sessions for multi-session analysis and to create a stochastic provenance graph that determines the sequence of probable events using the commonly known stochastic model of Markov chains. This stochastic model allows the analyst to find out common outcomes for different game states and how they were reached, and to explore provenance data from multiple game sessions at the same time. Provchastic is compatible with the existing provenance capture framework PinGU and the open source visualization tool named Prov Viewer. We demonstrated its usage in conjunction with Prov Viewer by generating the stochastic model from 75 game sessions that tracked provenance data. We could observe that it is possible to create stochastic models using provenance graphs for short and long predictions and to understand probable causes for certain events.

A limitation of the approach is related to unseen traces. Each query finds a similar state in the existing stochastic model through the similarity algorithm. This can degrade the results if the closest state that was matched is too different from the current state. Moreover, the predictions are only available for previously known traces, which is a limitation of learned systems.

Future work includes finding good patterns in the graphs that reached desirable outcomes in order to improve the chances of reaching the same goal in future iterations. Similarly, another approach could be to detect bad patterns that should be avoided, compare current player performance with previous executions, and point out the decisions that improved or degraded her overall performance in the game session. Another future work is to create a real-time prediction “helper” to aid the player in the decision-making process by using projections of the outcome for each of her decisions. Finally, our approach depends on the definition of the similarity thresholds. A possible future work consists of discretizing the game scene into multiple regions to run Provchastic on a less fine-grained graph. This might result in more precise predictions since the evaluation will be at a coarser grain, while also improving visual legibility due to having fewer vertices in the stochastic graph.

Acknowledgment. The authors would like to thank CAPES, CNPq, and Faperj for the financial support.
Applying and Facilitating Serious Location-Based Games
Jannicke Baalsrud Hauge1,2(B), Heinrich Söbke3, Ioana A. Stefan4, and Antoniu Stefan4
1 BIBA – Bremer Institut für Produktion und Logistik GmbH, Hochschulring 20, 28359 Bremen, Germany [email protected], [email protected]
2 Royal Institute of Technology, Kvarnbergagatan 12, Södertälje, Sweden
3 Bauhaus-Universität Weimar, Bauhaus-Institute for Infrastructure Solutions (B.Is), Coudraystr. 7, 99423 Weimar, Germany [email protected]
4 Advanced Technology Systems, Str. Tineretului Nr 1, 130029 Targoviste, Romania {ioana.stefan,antoniu.stefan}@ats.com.ro
Abstract. The popularity of location-based games continues unabated and is benefiting from the increasing use of mobile end devices and from favorable general conditions, such as the Internet of Things and the Smart City paradigm. This enormous potential for engagement should also be tapped for serious location-based games, i.e. the use of location-based games beyond the purpose of entertainment. The workshop “Applying and Facilitating Serious Location-based Games” aims to contribute to the development of this potential. In this article, the theoretical basis for the workshop is derived and corresponding frameworks are presented.
Keywords: Serious games · Pervasive games · Internet of Things · Smart City · Instructional design
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 104–109, 2020. https://doi.org/10.1007/978-3-030-65736-9_8
1 Introduction and Background
Location-based Augmented Reality (AR) games like Ingress [1] in 2013, Pokémon GO [2] in 2016 and, more recently, Minecraft Earth [3] attract increasing attention both for leisure and for “serious” use. The suitability of location-based games as learning tools is rooted in a few characteristics that make their use highly appealing: location-based games guide learners to real objects on location and augment these objects with information that may trigger a learning process. Zydney & Warner reviewed the educational use of mobile apps, which includes location-based games [4]. They identified several theoretical foundations for the use of mobile apps in an educational setting: situated learning, inquiry-based learning, social-cultural theory, scaffolding, and seamless learning are explicitly referred to. Another identified theoretical foundation is Mayer’s cognitive theory of multimedia learning [5]. Two examples of its principles are the contiguity principle (i.e. the temporal and/or spatial combination of object and information) and the signaling principle (i.e. the insertion of additional clues). Sweller’s cognitive load theory [6] is also considered a potential theoretical basis for the use of mobile apps; for example, [7] uses cognitive load parameters to assess the learning efficiency of a location-based app. The large variety of theoretical foundations used indicates the many possible didactical applications of mobile apps. This, in turn, allows the educational potential to be transferred to location-based games, which already have a long history [8] and are related to pervasive games [9]. Several case studies explore the genre: Coenen et al. showcase the pervasive game MuseUs, which allows the creation of a virtual exhibition in a museum and thereby invites visitors to explore the museum’s objects [10]. The authors have examined the app PlayVisit in a digital scavenger hunt scenario aiming at conveying basic knowledge on water management [11]. Gurjanow et al. describe teaching mathematics using a trail in the environment [12]. There are also apps with purposes other than learning, e.g. the pervasive health game Candy Castle, which encourages the player to measure blood sugar at as many different locations as possible [13]. A literature review of the health effects of Pokémon GO shows the great potential of popular commercial location-based games for serving non-entertainment purposes [14]. These case studies suggest that mobile location-based games are encountering improved general conditions: the spread of mobile end devices is growing, and in several areas it has reached a point where every resident owns at least one mobile device. The availability of mobile internet has also increased accordingly.
Other factors that increase the spread and versatility of location-based games are developments such as the Internet of Things (possible communication with objects via the Internet), Smart City (possible communication with sensors and services within the city) and the ever-increasing number of sensors on mobile devices [15]. However, besides case studies similar to those mentioned, there are hardly any systematic design and application recommendations for serious location-based games that have achieved general recognition and widespread up-take and application. The dynamic development of the technical and organizational general conditions also shows a need for further development of such concepts. Therefore, in this article we would like to outline the conceptual considerations for a workshop that will support the design and use of location-based games on a theoretical and practical level. The workshop is the continuation of a series of workshops on location-based games, including 2019 in Arequipa [16] and 2016 in Vienna [17].
2 Aspects of Serious Location-Based Games
The design and application of serious location-based games is guided by various aspects, which are summarized below and which may serve as a basis for the workshop’s ongoing efforts.
2.1 Game Design
From the aspect of game design, studies on motivation and engagement in general [18] and specifically in location-based games [19–21] are relevant. Further, guidelines for the design of mobile AR games are provided by [22], which groups them into categories such as general, virtual elements, real-world elements, social elements, technology and usability. Additionally, a classification of pervasive games is provided by [23], [24] gives an overview of the design space for location-based games, and [25] offers an introduction to pervasive games. From a design perspective, it is also necessary to look at the sustainability of these apps, i.e. how easily they can be upgraded or adjusted for a different technology. This has become even more apparent during the current situation, with many students not being able to attend school or outdoor activities. The situation has also called for looking at how location-based activities can be designed for both outdoor and indoor use, and how indoor activities can replace activities designed for outdoors. Integrating QR codes and beacons has proven to be a suitable alternative. QR codes are already widely used in the teaching-learning process and can be used in the classroom to create activities such as treasure hunt games within and/or around the classroom, to guide students within laboratories, to integrate media in presentations/projects or for storytelling. The process of creating a game-based lesson plan using QR codes is not complicated, but a few steps must be taken in order to create a successful activity and to motivate the students. Integrating beacons into such activities is somewhat more challenging, since several resources are needed: a beacon management system, sensors (beacons), an installed app to read the beacon data, and a smartphone. An advantage of using beacons is that no internet connection is needed. In both cases, suitable templates can be re-used, which reduces the development effort [28].
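A classroom treasure hunt of the kind described above can be reduced to an ordered list of stations, each identified by the text encoded in its QR code. The sketch below models only this game logic. Scanning is assumed to be done by any QR reader that returns the decoded text, and the class name, station payloads, and hints are invented for illustration.

```python
class TreasureHunt:
    """Tracks progress through an ordered sequence of QR-code stations."""

    def __init__(self, stations):
        # `stations` is a list of (payload, hint_to_next_station) pairs.
        self.stations = stations
        self.position = 0

    @property
    def finished(self):
        return self.position >= len(self.stations)

    def scan(self, payload):
        """Process decoded QR text and return feedback for the student."""
        if self.finished:
            return "Hunt already completed."
        expected, hint = self.stations[self.position]
        if payload != expected:
            return "Wrong station, keep looking."
        self.position += 1
        return "Hunt completed!" if self.finished else hint


hunt = TreasureHunt([
    ("STATION-A", "Next: look near the whiteboard."),
    ("STATION-B", "Next: check the laboratory door."),
    ("STATION-C", ""),
])
print(hunt.scan("STATION-B"))  # scanned out of order
print(hunt.scan("STATION-A"))  # correct first station
```

Keeping the station list as plain data is one way to obtain the kind of reusable template mentioned above: a new lesson only needs a new list of payloads and hints.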
2.2 Didactical Design
Didactical design supporting learning is an example of the purposeful design of location-based apps beyond the purpose of entertainment. Design guidelines for educational location-based games also comprise different categories, such as general game design, engagement, learning aspects, and social aspects [26]. Suárez et al. [27] provide a two-layer classification of mobile activities used in inquiry learning, which could serve as inspiration for a classification in location-based gaming. In [28] we are particularly interested in the activities with regard to location-based apps that are conducive to learning, while in [29] we discuss customizable metagames as tools for serious game design.
2.3 Frameworks for Purposeful Integration of Content
In addition to the purposeful design discussed in the previous section, meeting non-entertainment goals using location-based apps requires integrating the corresponding content into the apps. Several frameworks for adding learning content to serious games are already available [30–32].
2.4 Technical Constraints
Although the technical framework conditions are constantly improving, location-based mobile games are digital artefacts that are closely linked to general technical development and are subject to so-called game aging, i.e. they lose their utility value and functions over time [33]. Strategies must be developed to slow down the aging process or to mitigate its effects.
2.5 Facilitation
In addition to the design of games, the integration of content and the fitting of the technical environment, the application of serious location-based games is the phase in which the games become effective. The term facilitation is used to describe the application of serious games supported by lecturers. Facilitation contexts differ in various parameters, for example whether the game is used in lectures or as homework. In addition to the design of facilitation contexts, the lecturers, here called game facilitators, must have various competencies. Therefore, competence models applicable to serious game facilitation [34, 35] need to be discussed.
3 Summary
Location-based games are facing increasingly better general conditions. The workshop “Applying and Facilitating Serious Location-based Games” is intended to contribute to leveraging this potential for learning and other purposes by elaborating the framework conditions and systematically presenting existing guidelines and practices.
References
1. Niantic Labs: Ingress. http://www.ingress.com/
2. Niantic Inc.: Pokémon Go. http://www.pokemongo.com/
3. Mojang Studios: Minecraft Earth. https://www.minecraft.net/en-us/about-earth/
4. Zydney, J.M., Warner, Z.: Mobile apps for science learning: review of research. Comput. Educ. 94, 1–17 (2016). https://doi.org/10.1016/j.compedu.2015.11.001
5. Mayer, R.E.: Multimedia Learning. Cambridge University Press, New York (2009)
6. Sweller, J., Ayres, P., Kalyuga, S.: Cognitive Load Theory. Springer, New York, USA (2011). https://doi.org/10.1007/978-1-4419-8126-4
7. Brosda, C., Schaal, S., Bartsch, S., Oppermann, L.: On the use of audio in the educational location based game platform MILE. In: Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct 2016, pp. 1049–1054 (2016). https://doi.org/10.1145/2957265.2964198
8. de Souza e Silva, A., Hjorth, L.: Playful urban spaces: a historical approach to mobile games. Simul. Gaming 40, 602–625 (2009). https://doi.org/10.1177/1046878109333723
9. Oppermann, L., Slussareff, M.: Pervasive games. In: Dörner, R., Göbel, S., Kickmeier-Rust, M., Masuch, M., Zweig, K. (eds.) Entertainment Computing and Serious Games. LNCS, vol. 9970, pp. 475–520. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46152-6_18
10. Coenen, T., Mostmans, L., Naessens, K.: MuseUs: case study of a pervasive cultural heritage serious game. J. Comput. Cult. Herit. 6, 1–19 (2013). https://doi.org/10.1145/2460376.2460379
11. Söbke, H., Baalsrud Hauge, J., Stefan, I.A., Stefan, A.: Using a location-based AR game in environmental engineering. In: van der Spek, E., Göbel, S., Do, E.Y.-L., Clua, E., Baalsrud Hauge, J. (eds.) ICEC-JCSG 2019. LNCS, vol. 11863, pp. 466–469. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34644-7_47
12. Gurjanow, I., Oliveira, M., Zender, J., Santos, P.A., Ludwig, M.: Shallow and deep gamification in mathematics trails. In: Gentile, M., Allegra, M., Söbke, H. (eds.) GALA 2018. LNCS, vol. 11385, pp. 364–374. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11548-7_34
13. Stach, C., Schlindwein, L.F.M.: Candy castle-a prototype for pervasive health games. In: IEEE International Conference on Pervasive Computing and Communications Workshops, pp. 501–503 (2012). https://doi.org/10.1109/PerComW.2012.6197547
14. Laato, S., Hyrynsalmi, S., Rauti, S., Sutinen, E.: The effects playing pokémon go has on physical activity-a systematic literature review. In: Proceedings of the 53rd Hawaii International Conference on System Sciences (2020). https://doi.org/10.24251/hicss.2020.417
15. Tregel, T., Leber, F., Göbel, S.: Incentivise me: smartphone-based mobility detection for pervasive games. In: van der Spek, E., Göbel, S., Do, E.Y.-L., Clua, E., Baalsrud Hauge, J. (eds.) ICEC-JCSG 2019. LNCS, vol. 11863, pp. 470–476. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34644-7_48
16. Baalsrud Hauge, J., Söbke, H., Stefan, I.A., Stefan, A.: Designing serious mobile location-based games. In: van der Spek, E., Göbel, S., Do, E.Y.-L., Clua, E., Baalsrud Hauge, J. (eds.) ICEC-JCSG 2019. LNCS, vol. 11863, pp. 479–484. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34644-7_49
17. Baalsrud Hauge, J., Stanescu, I.A., Stefan, A.: Constructing and experimenting pervasive, gamified learning. In: Entertainment Computing – ICEC 2016, 15th IFIP TC 14 International Conference, Vienna, Austria, September 28–30 (2016)
18. Peters, D., Calvo, R.A., Ryan, R.M.: Designing for motivation, engagement and wellbeing in digital experience. Front. Psychol. 9, 797 (2018). https://doi.org/10.3389/fpsyg.2018.00797
19. Rauschnabel, P.A., Rossmann, A., tom Dieck, M.C.: An adoption framework for mobile augmented reality games: the case of Pokémon Go. Comput. Human Behav. 76, 276–286 (2017). https://doi.org/10.1016/j.chb.2017.07.030
20. Hamari, J., Malik, A., Koski, J., Johri, A.: Uses and gratifications of pokémon go: why do people play mobile location-based augmented reality games? Int. J. Hum. Comput. Interact. 00, 1–16 (2018). https://doi.org/10.1080/10447318.2018.1497115
21. Söbke, H., Baalsrud Hauge, J., Stefan, I.A.: Long-term engagement in mobile location-based augmented reality games. In: Geroimenko, V. (ed.) Augmented Reality Games I. LNCS, pp. 129–147. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-15616-9_9
22. Wetzel, R., Blum, L., Broll, W., Oppermann, L.: Designing mobile augmented reality games. In: Furht, B. (ed.) Handbook of Augmented Reality, pp. 513–539. Springer, New York, USA (2011). https://doi.org/10.1007/978-1-4614-0064-6_25
23. Hinske, S., Lampe, M., Magerkurth, C., Röcker, C.: Classifying pervasive games: on pervasive computing and mixed reality. Concepts Technol. Pervasive Games A Read. Pervasive Gaming Res. 1(20), pp. 11–37 (2007)
24. Kiefer, P., Matyas, S., Schlieder, C.: Systematically exploring the design space of location-based games. In: 4th International Conference on Pervasive Computing, pp. 183–190 (2006)
25. Montola, M., Stenros, J., Waern, A.: Pervasive games: theory and design. CRC Press (2009)
26. Ardito, C., Sintoris, C., Raptis, D., Yiannoutsou, N., Avouris, N., Costabile, M.F.: Design guidelines for location-based mobile games for learning. In: International conference on social applications for lifelong learning, pp. 96–100 (2010)
27. Suárez, Á., Specht, M., Prinsen, F., Kalz, M., Ternier, S.: A review of the types of mobile activities in mobile inquiry-based learning. Comput. Educ. 118, 38–55 (2018). https://doi.org/10.1016/j.compedu.2017.11.004
28. Baalsrud Hauge, J., et al.: Exploring context-aware activities to enhance the learning experience. In: Dias, J., Santos, P.A., Veltkamp, R.C. (eds.) GALA 2017. LNCS, vol. 10653, pp. 238–247. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71940-5_22
29. Stefan, I.A., Baalsrud Hauge, J., Gheorge, A.F., Stefan, A.: Improving learning experiences through customizable metagames. In: Gentile, M., Allegra, M., Söbke, H. (eds.) GALA 2018. LNCS, vol. 11385, pp. 418–421. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11548-7_40
30. Söbke, H., Baalsrud Hauge, J., Stefan, I.A.: Prime example ingress reframing the pervasive game design framework (PGDF). Int. J. Serious Games 4, 39–58 (2017). https://doi.org/10.17083/ijsg.v4i2.182
31. Habgood, M.P.J., Ainsworth, S.E.: Motivating children to learn effectively: exploring the value of intrinsic integration in educational games. J. Learn. Sci. 20, 169–206 (2011). https://doi.org/10.1080/10508406.2010.508029
32. Arnab, S., et al.: Mapping learning and game mechanics for serious games analysis. Br. J. Educ. Technol. 46, 391–411 (2015). https://doi.org/10.1111/bjet.12113
33. Söbke, H., Harder, R., Planck-Wiedenbeck, U.: Two decades of traffic system education using the simulation game MOBILITY. In: Göbel, S., et al. (eds.) JCSG 2018. LNCS, vol. 11243, pp. 43–53. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02762-9_6
34. Kortmann, R., Peters, V.: Demystifying the unseen helmsman: towards a competency model for game facilitators. (2017)
35. Stewart, J.-A.: High-performing (and threshold) competencies for group facilitators. J. Chang. Manag. 6, 417–439 (2006). https://doi.org/10.1080/14697010601087115
The Braille Typist: A Serious Game Proposal for Braille Typewriter Training
Kayo C. Santana(B), Victor T. Sarinho, and Claudia P. Pereira
Computer Science Post-Graduate Program - State University of Feira de Santana, Feira de Santana, Bahia, Brazil [email protected], {vsarinho,claudiap}@uefs.br
Abstract. People with disabilities are constantly denied their rights and duties, which underlines the importance of accessibility as a path to social equality and independence for them. This paper presents the Braille Typist proposal, a digital game that uses a Braille typewriter emulator to run individual competitions that train visually impaired people in the use of the Braille typewriter. As expected results, the game will provide a way to publicize the use of Braille and promote a social and digital initiative to include people with visual impairments.
Keywords: Serious game · Assistive technology · Visual impairments
1 Introduction
According to the World Health Organization (WHO), more than one billion people in the world have some kind of disability, with an increasing prevalence over the years [8]. These people are constantly denied their rights and duties, and they have worse life prospects, lower educational achievements, and a higher rate of poverty than people without disabilities [8]. Due to the growing number of people with disabilities, accessibility needs to be studied further, with work on new urban, social and technological inclusion solutions. Accessibility is a path to social equality, giving all people the means to carry out their activities independently [2]. Regarding visual impairment, the WHO states that around 314 million people in the world have some visual impairment, of whom 45 million are blind. To help these people, some technologies can be used in the process of inclusion and development, such as voice recognition, speech synthesizers, screen readers, the Braille alphabet, and others. The Braille typewriter is a faster and better way to write using the Braille system, but its use is often limited because its high price makes it difficult to acquire. In this sense, this paper proposes The Braille Typist, a game that follows a Braille emulator pattern in which the player competes individually to write words correctly according to the Braille alphabet.
Supported by CAPES.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 110–116, 2020. https://doi.org/10.1007/978-3-030-65736-9_9
2 The Braille Emulator
Louis Braille created the Braille system in 1825, in France. It is recognized as a symbol language or communication method used by visually impaired people to write and read. The Braille alphabet is built on combinations of 6 dots (the Braille cell, Fig. 1) that can represent letters, numbers, punctuation marks and other characters [7].
Fig. 1. Braille cell. Adapted from [3].
Fig. 2. Braille system characters. Adapted from [4].
Each Braille cell represents one character, and combining one or more cells creates words, sentences, numbers and other text (Fig. 2). If none of the dots is in relief, a blank space is represented. Otherwise, the dots that are in relief determine which symbol of the Latin alphabet is represented. The “#” character (Fig. 2) indicates that the next word is actually a number, instead of the letters with the same relief (from “a” to “j”). The Braille typewriter has 6 main buttons (1, 2, 3, 4, 5, and 6) that are used to generate a character in Braille (Fig. 3, arrow 1). Depending on which buttons are pressed simultaneously, the corresponding Braille cell is displayed, representing how the Braille dots are arranged (Fig. 3, arrow 2). For the Braille emulator, the keyboard keys (F, D, S, J, K, L) are mapped, respectively, to the Braille typewriter buttons 1 to 6 (Fig. 3, arrow 3), allowing a personal computer keyboard to be used in a similar way to a Braille typewriter. This key mapping among Latin letters, Braille dots and emulator keyboard (Table 1) is interpreted by the Braille emulator, which generates the accepted Braille characters as outputs.
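To make the dot numbering concrete, the sketch below converts a set of raised dots into the corresponding character of the Unicode Braille Patterns block, in which dot n sets bit n−1 of the offset from U+2800. The function name is illustrative, and only the six-dot cell described above is handled.

```python
BRAILLE_BASE = 0x2800  # Unicode code point of the blank Braille cell


def dots_to_cell(dots):
    """Return the Unicode Braille character for raised dots such as "145"."""
    offset = 0
    for ch in dots:
        dot = int(ch)
        if not 1 <= dot <= 6:
            raise ValueError(f"invalid dot number: {ch!r}")
        offset |= 1 << (dot - 1)  # dot n maps to bit n-1
    return chr(BRAILLE_BASE + offset)


print(dots_to_cell("1"))       # cell for the letter "a"
print(dots_to_cell("145"))     # cell for the letter "d"
print(dots_to_cell("3456"))    # cell for the number sign
print(repr(dots_to_cell("")))  # blank cell: no raised dots
```

Because the same cell can stand for a letter or a digit, a full renderer would also have to track whether a number sign precedes the current word, as described above.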
Fig. 3. From Braille typewriter to Braille emulator keyboard mapping. Adapted from [1, 5].
Table 1. Key mapping among Latin letters, Braille dots and emulator keyboard. Adapted from [5].
Letter  Dots    Keys      Letter          Dots    Keys
A       1       F         W               2456    DJKL
B       12      FD        X               1346    FSJL
C       14      FJ        Y               13456   FSJKL
D       145     FJK       Z               1356    FSKL
E       15      FK        Á               12356   FDSKL
F       124     FDJ       É               123456  FDSJKL
G       1245    FDJK      Í               34      SJ
H       125     FDK       Ó               346     SJL
I       24      DJ        Ú               23456   DSJKL
J       245     DJK       Â               16      FL
K       13      FS        Ê               126     FDL
L       123     FDS       Ô               1456    FJKL
M       134     FSJ       Ã               345     SJK
N       1345    FSJK      Õ               246     DJL
O       135     FSK       !               235     DSK
P       1234    FDSJ      ?               26      DL
Q       12345   FDSJK     .               3       S
R       1235    FDSK      ,               2       D
S       234     DSJ       ;               23      DS
T       2345    DSJK      -               36      SL
U       136     FSL       Uppercase sign  46      JL
V       1236    FDSL      Number sign     3456    SJKL
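The mapping in Table 1 can be read as a two-step lookup: the keys held down together are translated into dot numbers, and the resulting dot combination is looked up in the Braille alphabet. The following is a minimal sketch of that idea, not the emulator's actual code. The function names are invented, and only the rows for “a” to “j” and the blank cell are included.

```python
# Keyboard key -> Braille typewriter button, as described in the text.
KEY_TO_DOT = {"F": 1, "D": 2, "S": 3, "J": 4, "K": 5, "L": 6}

# Dot combination -> character; only a few rows of Table 1 are included.
DOTS_TO_CHAR = {
    "": " ",  # no raised dots: blank space
    "1": "a", "12": "b", "14": "c", "145": "d", "15": "e",
    "124": "f", "1245": "g", "125": "h", "24": "i", "245": "j",
}


def chord_to_dots(keys):
    """Translate simultaneously pressed keys into a sorted dot string."""
    pressed = {key.upper() for key in keys}
    dots = sorted(KEY_TO_DOT[key] for key in pressed)
    return "".join(str(dot) for dot in dots)


def decode_chord(keys):
    """Return the character for a chord, or None if it is not in the table."""
    return DOTS_TO_CHAR.get(chord_to_dots(keys))


print(chord_to_dots("FJK"))  # dots for the letter D
print(decode_chord("KJF"))   # the order of the keys does not matter
```

Sorting the dots before the lookup means a chord is identified by which keys are down, not by the order in which they were pressed, which matches how the typewriter buttons are used.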
A JavaScript version of the Braille emulator was also validated in Braillearning [5], a Learning Object app that aims to help people learn to write on the Braille typewriter. According to 20 users with visual impairment, Braillearning works in a similar way to the typewriter, confirming that the proposed mapping can be used successfully by the final user.
3 The Braille Typist Game Proposal
The Braille Typist was designed as an educational game for visually impaired people. The main objective is to type a set of words using the Braille emulator before the time is over. Players hear each word pronounced and have to type it correctly in order to accumulate points. As a single-player game, The Braille Typist starts with the stopwatch set to 1 min. The player receives a set of words to be typed in sequence, and the challenge is to write the words correctly before the time is over. To achieve a high score, the player needs to be as precise and fast as possible. The player earns points when a word is typed correctly. When a word is typed incorrectly, the player loses points and another word is given. The player must decide whether to try to answer the word based on its pronunciation or to skip it and try another word (and quickly recover the lost points). As the given words are completed correctly, more difficult words are drawn in each round. Each new level comes with 1 extra minute and introduces new difficulties in the words to be typed and heard (longer words, accented words, heteronyms). The listening difficulty can also be raised by changing the synthesizer configuration (speech speed and volume). The words are chosen randomly from the system database, which exercises keyboard usage, hearing and spelling, and knowledge of Braille and of the typewriter. Regarding the aesthetic layout of the game, colors will be chosen to give better contrast for users with low vision. A panel showing how the user is typing the word, a progress bar showing the available time, and a display of the player's points will also be included in the game scene (Fig. 4).
Fig. 4. Interface prototype of The Braille Typist game.
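The round logic described above can be sketched as follows. The point values, the skip marker, and all names are assumptions chosen for illustration, since the proposal does not fix them. The sketch also assumes that skipping a word costs nothing. Typed answers are passed in as plain strings together with the time each attempt took, instead of being read from the Braille emulator and a real stopwatch.

```python
ROUND_SECONDS = 60   # the stopwatch starts at 1 min
POINTS_CORRECT = 10  # hypothetical reward for a correct word
POINTS_WRONG = -5    # hypothetical penalty for a wrong word
SKIP = None          # hypothetical marker for a skipped word


def play_round(words, attempts):
    """Score one round.

    `attempts` holds (typed_text_or_SKIP, seconds_spent) pairs, one per
    presented word. Returns (score, words_completed).
    """
    score, completed, elapsed = 0, 0, 0.0
    for word, (typed, seconds) in zip(words, attempts):
        elapsed += seconds
        if elapsed > ROUND_SECONDS:
            break  # time is over; this attempt does not count
        if typed is SKIP:
            continue  # no points; the next word is presented
        if typed == word:
            score += POINTS_CORRECT
            completed += 1
        else:
            score += POINTS_WRONG  # a wrong word costs points
    return score, completed


words = ["casa", "bola", "livro", "escola"]
attempts = [("casa", 12.0), ("bila", 15.0), (SKIP, 3.0), ("escola", 20.0)]
print(play_round(words, attempts))
```

Later levels could reuse the same function with a longer `ROUND_SECONDS` and a harder word list.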
As a general view of the proposed game, a Unified Game Canvas was elaborated based on the model in [6]. It describes the Game Concept and Game Players (Who), Game Play (What), Game Flow (When), Game Core (How), Game Interaction (Where), Game Impact (Why) and Game Business (How Much) of the game (Fig. 5).
Fig. 5. The Braille Typist game canvas. Adapted from [6].
4 Early Results of the Braille Typist
The proposed game is already under development. The first step was to adapt the Braille emulator to work in this proposal. At this point the main flow of the system is already working: it can output a random word stored in a data structure, recognize the user input, and check whether the word was typed correctly. Figure 6 shows a student volunteer testing the game in its current stage to provide feedback on the software's usability and on whether the developed functionalities work properly.
Fig. 6. Evaluating The Braille Typist prototype.
As next steps in the game development, user interface elements such as sound effects and a reward system are expected to be improved, providing a better gameplay experience for the user. After that, the game will be released to its target audience to collect feedback for evaluating its usability, accessibility and available features.
5 Conclusions and Future Work
It is important to popularize the use of Braille and its typewriter as a way of promoting accessibility, since Braille is a means of communication mainly used by visually impaired people. The Braille Typist proposal presents a low-cost gamified solution that gives the player training in Braille typewriting. It provides alternative resources for the educational process of people with visual impairments, improving accessibility to build paths towards social equality and enabling them to carry out their activities independently. As future work, after the game production is finished, The Braille Typist will be validated not only through surveys that evaluate usability acceptance based on user opinions, but also with data collected from the Braille typewriter emulator. It is expected to gather information such as the time to type, the time between letters, and the words completed in a minute, making it possible to see how performance improves during game usage. Furthermore, the creation of a low-cost Braille joystick is also expected, based on open hardware resources able to simulate the Braille typewriter user input, which will be used to play embedded Braille games.
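The measurements listed above could be derived from a simple log of timestamped keystrokes. The sketch below assumes a hypothetical log format (one entry per typed Braille cell, holding a timestamp in seconds and the index of the word being typed) and invented function names. It only illustrates how typing time, time between letters, and words per minute might be computed.

```python
def typing_metrics(log, words_completed):
    """Compute simple performance metrics from a keystroke log.

    `log` is a list of (timestamp_seconds, word_index) tuples sorted by
    time, one per typed Braille cell (hypothetical format).
    """
    if len(log) < 2:
        return {"total_time": 0.0, "mean_gap": 0.0, "words_per_minute": 0.0}

    total_time = log[-1][0] - log[0][0]
    # Gaps between consecutive cells that belong to the same word.
    gaps = [
        later[0] - earlier[0]
        for earlier, later in zip(log, log[1:])
        if earlier[1] == later[1]
    ]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    wpm = words_completed / (total_time / 60) if total_time > 0 else 0.0
    return {
        "total_time": total_time,
        "mean_gap": mean_gap,
        "words_per_minute": wpm,
    }


log = [(0.0, 0), (1.0, 0), (2.0, 0), (5.0, 1), (6.0, 1), (30.0, 1)]
print(typing_metrics(log, words_completed=2))
```

Comparing these values across sessions would give one possible view of how a player's performance changes with game usage.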
References
1. Marcos Laska Velozo: Braille code (2015). https://datamarcos.blogspot.com/2015/06/datamarcos-importancia-do-uso-da.html
2. Almeida, E., Giacomini, L.B., Bortoluzzi, M.G.: Mobilidade e acessibilidade urbana. Seminário Nacional de Construções (2013)
3. Almeida, K.P., Paula, B.R.: O sistema braille e sua utilidade para o deficiente visual: panorama e prática. 4a Semana do Servidor e 5a Semana Acadêmica, p. 9 (2008)
4. High School Math: Braille code (2007). http://highschoolmath.blogspot.com/2007/06/braille.html
5. Santana, K., Pereira, C.P., de Santana, B.S.: Braillearning: software para simular a máquina de escrever em braille. In: Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), vol. 30, p. 1101 (2019). https://doi.org/10.5753/cbie.sbie.2019.1101
6. Sarinho, V.T.: Uma proposta de game design canvas unificado. XVI Simpósio Brasileiro de Jogos e Entretenimento Digital (SBGames) (2017)
7. Sá, E.D., Campos, I.M., Silva, M.B.C.: Atendimento Educacional Especializado. Deficiência Visual. SEESP / SEED / MEC, Brasilia - DF (2007)
8. World Health Organization: World report on disability 2011. World Health Organization (2011)
Murder Mystery Game Setting Research Using Game Refinement Measurement
Shuo Xiong1, Long Zuo2(B), and Hiroyuki Iida3
1 Huazhong University of Science and Technology, Wuhan, China [email protected]
2 Chang’an University, Xi’an, China [email protected]
3 Japan Advanced Institute of Science and Technology, Nomi, Japan [email protected]
Abstract. This paper explores the game sophistication of a party game called Murder Mystery Game (MMG), which is very popular among teenagers and has recently boomed in China and Japan. Players advance the game story by performing the roles assigned to them; in the end, they need to find out who the “murderer” is, while the murderer player has to use his or her best strategy to escape detection. Our research focuses on the playing settings. MMG needs several players and takes a few hours, and the game becomes boring when the number of players is too large or too small. We therefore use game refinement theory to analyze MMG settings so that the game process becomes more balanced and sophisticated. Natural and convergent computer simulations of a simple version of MMG are conducted to collect the data, while the game refinement measure is employed for the assessment. The results indicate several interesting observations, which can help game designers to create a better murder scenario.
Keywords: Murder mystery game · Game refinement theory · Game design · Computer simulation
1 Introduction
Murder Mystery Game is a special type of Live-Action Role-Playing game (LARP for short). The term LARP covers a wide variety of game variants, making them inherently difficult to define; however, LARP is the commonly accepted term in the gaming community [8]. Generally speaking, a LARP is a party game: players do not need any electronic devices such as a PlayStation 4 or a computer, nor do they rely on boards or pieces. In terms of its rules, LARP resembles improvisational theater or modern drama. A live role-playing game is a dramatic and narrative game form that takes place in a physical environment [6]. It is a story-telling system in which players assume character roles that they portray in person, through action and interaction. The game world is an agreed-upon environment located in both space and time, and governed by a set of rules, some of which must be formal and quantifiable [2,8].
Just as films and novels are divided into themes, and detective fiction is one genre within literature, LARP is also divided into several themes. In this paper, we focus on the most popular subject, the Murder Mystery Game (MMG for short) [1]. Detective work and murder puzzles are the core experience of the Murder Mystery Game, and they are the most attractive point distinguishing it from other LARPs. Similar to the Mafia game or the Werewolf game, each player is assigned an identity in the MMG, and one or a few players are assigned the murderer or a shady-deal scenario. During the game, the detective has to uncover the murderer and the shady deals; the innocent players need to clear themselves and accomplish their own missions, and they can also try their best to help the detective find the murderer player; the murderer, of course, must find a scapegoat and attack the innocent. People can easily buy a Murder Mystery Game scenario or play MMG in physical stores, and online scenario-based Murder Mystery Games have recently become very popular among young teenagers in China.
Murder Mystery Games originated from murder mystery fiction such as “Sherlock Holmes”. In 1935, the first boxed murder mystery game hit the market. In that era the scenarios were simple; since then, thanks to talented authors, more and more complicated scenarios have been written, and one can even enjoy the story as a member of the audience.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 117–125, 2020. https://doi.org/10.1007/978-3-030-65736-9_10
2 Game Process of the MMG
Rules vary with the script or scenario. Generally, the entire game can be described by a set of six parts: Preparing, Beginning, Rummage (search), Analysis, Voting, and Ending.
– Preparing: First, the players or the game master choose the scenario, and the players learn the story theme and social background of the scenario. After that, each player receives his/her own script (assigned according to preference or at random) and prepares to read and act the role.
– Beginning: All players read their scripts carefully and confirm the storyline and their tasks. For example, the murderer player should hide his identity and blame others, while detectives and suspects need to find the real murderer; sometimes the suspects also have to accomplish several side tasks, such as hiding their scandals (although they did not “kill” anyone in the scenario). When all players have finished reading the story and confirming their tasks, the performance begins.
– Rummage: After the roles and the story have been introduced, the murder happens, and all players search for the relevant pieces of evidence to prove who the murderer is. Of course, some evidence is a distraction, and the murderer player has to hide the evidence that works to his/her disadvantage.
– Analysis: The fourth step is case analysis. All necessary evidence has been found and shown to every player (except the evidence hidden by the murderer player). Then, around the case and puzzle, players describe their timelines, alibis, relationships, murder motives, etc., exchange evidence and script information, and try to find the real murderer player. In contrast, the murderer player needs to establish a complete alibi, or use existing clues and logic to blame other suspects.
– Voting: Depending on the overall game time setting, when the final 15 min remain, or when everyone thinks the puzzle is solved and there is nothing left to discuss, the host announces that analysis and discussion are finished. All players consider who the real murderer is and then write down the corresponding role name to vote anonymously.
– Ending: When all players have finished voting, the host announces the voting result. The player who gets the most votes is considered the “murderer”, and the host tells everyone whether the voting result is correct or wrong. If the “murderer candidate” is correct, the detective and suspect players win the game; otherwise the murderer player wins. According to the winner, the host tells the corresponding story ending and reveals the truth of the scenario and the murderer’s method. The players thank each other and the game ends.
To the best of our knowledge, few investigations have been made into comfortable settings of the Murder Mystery Game. To tackle this challenge, computer simulations are performed to obtain statistical data on settings such as the number of players, whereas the game refinement measure is employed for the assessment. As a benchmark in this study, we have established a mathematical model of the Murder Mystery Game; the setting can be represented as described in Notation 1.
Notation 1. MMG(N, T, b, r, a, v) denotes a Murder Mystery Game with N players in total and full game time T, which consists of beginning time b and rummaging time r; a denotes the analysis period and v the final inference and voting time.
3 Game Refinement Theory
We review the early work on game refinement theory from [3]. The decision space is the minimal search space without forecasting. It provides a common measure for almost all board games. The dynamics of decision options in the decision space have been investigated, and it has been observed that these dynamics are a key factor for game entertainment. Thus a measure of refinement in games was proposed [4]. The following sketch is drawn from later works [5,7], which expand the model of game refinement, cultivated in the domain of board games, into continuous-movement games such as sports games and video games.
The game progress is twofold. One is game speed, while another one is game information progress with a focus on the game outcome. Game information progress presents the degree of certainty of a game’s result in time or steps. Having full information of the game progress, i.e. after its conclusion, game progress x(t) will be given as a linear function of time t with 0 ≤ t ≤ t_k and 0 ≤ x(t) ≤ x(t_k), as shown in Eq. (1).
$$x(t) = \frac{x(t_k)}{t_k}\, t \tag{1}$$
However, the game information progress given by Eq. (1) is usually unknown during the in-game period. Hence, the game information progress is reasonably assumed to be exponential. This is because the game outcome is uncertain until the very end of game in many games. Hence, a realistic model of game information progress is given by Eq. (2).
$$x(t) = x(t_k)\left(\frac{t}{t_k}\right)^{n} \tag{2}$$
Here n stands for a constant parameter which is given based on the perspective of an observer in the game considered. If one knows the game outcome, for example after the game, or if one can exactly predict in advance the game outcome and its progress, then we have n = 1, where x(t) is a linear function of time t. During the in-game period, various values of the parameter n will be determined for different observers, including players and supporters. For example, some observers might be optimistic with 0 ≤ n < 1. However, when one feels any difficulty in winning or achieving the goal, the parameter would be n > 1. Meanwhile, we reasonably assume that the parameter would be n ≥ 2 in many cases, such as balanced or seesaw games. Thus, the acceleration of game information progress is obtained by differentiating Eq. (2) twice. Solving it at the end of the game, t = T (with T = t_k), the equation becomes
$$x''(T) = \frac{x(t_k)}{(t_k)^{2}}\, n(n-1) \tag{3}$$
It is expected that the larger the value $\frac{x(t_k)}{(t_k)^{2}}$ is, the more exciting the game becomes, due in part to the uncertainty of the game outcome. Thus, we use its square root, $\frac{\sqrt{x(t_k)}}{t_k}$, as a game refinement measure for the game under consideration. We call it the GR value for short, as shown in Eq. (4).
$$GR = \frac{\sqrt{x(t_k)}}{t_k} \tag{4}$$
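As a brief worked step, and assuming nothing beyond the model in Eq. (2), the acceleration in Eq. (3) follows from differentiating twice with respect to t and then evaluating at the end of the game:

```latex
\begin{align*}
x(t)     &= x(t_k)\left(\frac{t}{t_k}\right)^{n}, \\
x'(t)    &= \frac{x(t_k)}{(t_k)^{n}}\; n\, t^{\,n-1}, \\
x''(t)   &= \frac{x(t_k)}{(t_k)^{n}}\; n(n-1)\, t^{\,n-2}, \\
x''(t_k) &= \frac{x(t_k)}{(t_k)^{2}}\; n(n-1).
\end{align*}
```

The factor n(n − 1) depends only on the observer, so the game-dependent part is x(t_k)/(t_k)^2, whose square root is the GR value.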
We show, in Table 1, measures of game refinement for various games [9]. From the results, we conjecture the relation between the measure of game refinement and game sophistication, as stated in Remark 1.
Remark 1. Sophisticated games share a common factor (i.e., the same degree of informational acceleration, say 0.07–0.08) that makes players feel engaged or excited regardless of the type of game.
Table 1. Measures of game refinement for various types of games

| Game | x(t_k) | t_k | GR |
|---|---|---|---|
| Chess | 35 | 80 | 0.074 |
| Shogi | 80 | 115 | 0.078 |
| Go | 250 | 208 | 0.076 |
| Basketball | 36.38 | 82.01 | 0.073 |
| Soccer | 2.64 | 22 | 0.073 |
| Badminton | 46.336 | 79.344 | 0.086 |
| Table tennis | 54.863 | 96.465 | 0.077 |
| DotA ver 6.80 | 68.6 | 106.2 | 0.078 |
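As a quick check of Eq. (4), the GR column of Table 1 can be recomputed from the other two columns. The sketch below is illustrative only: the function name `game_refinement` and the dictionary layout are assumptions made for this example, while the numbers are copied from Table 1.

```python
import math

# (x(t_k), t_k) pairs copied from Table 1; the dictionary name is illustrative.
TABLE_1 = {
    "Chess": (35, 80),
    "Shogi": (80, 115),
    "Go": (250, 208),
    "Basketball": (36.38, 82.01),
    "Soccer": (2.64, 22),
    "Badminton": (46.336, 79.344),
    "Table tennis": (54.863, 96.465),
    "DotA ver 6.80": (68.6, 106.2),
}


def game_refinement(outcome: float, length: float) -> float:
    """GR = sqrt(x(t_k)) / t_k, following Eq. (4)."""
    return math.sqrt(outcome) / length


if __name__ == "__main__":
    for game, (outcome, length) in TABLE_1.items():
        print(f"{game:14s} GR = {game_refinement(outcome, length):.3f}")
```

The recomputed values agree with the GR column of the table to within about 0.001.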
Therefore, we use game refinement theory to analyze the Murder Mystery Game so that its game refinement value can be kept within the zone. Considering the game process, the game speed can be described by two parameters. One is the full game time T, taken as t_k, which consists of the beginning period b, the rummaging period r, the analysis process A and the voting inference segment v. Note that the analysis process A should be divided into small parts, one for each pair of players. For example, if there are 5 players, then $\binom{5}{2} = 10$ role relationships need to be analyzed; the time of one sub-analysis is denoted a. Based on the above, T equals $b + r + \binom{N}{2} a + v$. The other parameter is the effective outcome x(t_k), which in this situation can be described as ((N − 1)(N − 1) + N) v, where N is the total number of players. For N players, the murderer has to consider N possibilities, and each of the other suspects has to consider N − 1 possibilities (excluding themselves); therefore in total we have ((N − 1)(N − 1) + N) effective branching factors, which are multiplied by the unit time v to get the outcome. Therefore, the refinement value in the Murder Mystery Game is as shown in Eq. (5).
$$GR = \frac{\sqrt{\left((N-1)^{2} + N\right) v}}{b + r + \binom{N}{2} a + v} \tag{5}$$
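Eq. (5) is straightforward to evaluate numerically. The following minimal sketch assumes the reading of Eq. (5) given above (square root of the effective outcome divided by the total time); the function names are hypothetical, and the example values are the five-player convergent setting reported in Table 2.

```python
import math


def mmg_total_time(n_players, b, r, a, v):
    """T = b + r + C(N, 2) * a + v, all in minutes."""
    return b + r + math.comb(n_players, 2) * a + v


def mmg_gr_value(n_players, b, r, a, v):
    """GR value of Eq. (5): sqrt(x(t_k)) / T."""
    outcome = ((n_players - 1) ** 2 + n_players) * v  # effective outcome x(t_k)
    return math.sqrt(outcome) / mmg_total_time(n_players, b, r, a, v)


if __name__ == "__main__":
    # Five-player convergent setting from Table 2.
    setting = dict(n_players=5, b=16.20, r=15.49, a=16.00, v=11.49)
    print(f"T  = {mmg_total_time(**setting):.2f} min")
    print(f"GR = {mmg_gr_value(**setting):.4f}")
```

For this setting the total time comes out at roughly 203 min and the GR value falls inside the 0.07–0.08 zone.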
4 Simulation
Unlike chess (fixed, well-defined rules) or soccer (a huge user base and global organization), it is impossible to collect large amounts of Murder Mystery Game data from human players. Therefore, we use a Python program to simulate the game progress thousands of times; the simulation process is shown in Algorithm 1.
Algorithm 1. Murder Mystery Game Simulation Process
Require: n is the quantifiable game round for natural simulation;
natural simulation random function setting for MMG(N, b, r, a, v)
n = 0
for i in range(10000):
    GR value = √D / B
    if 0.07 < GR value < 0.08 and 120 < T < 300, n = n + 1
        record average natural simulation NS(b, r, a, v);
    else pass;
Ensure: m is the quantifiable game round for convergent simulation;
according to NS(b, r, a, v), make the convergent simulation random function setting for MMG′(N′, b′, r′, a′, v′);
m = 0
for i in range(10000):
    GR value = √D / B
    if 0.07 < GR value < 0.08, m = m + 1
        record average convergent simulation CS(b, r, a, v);
    else pass;
return coincidence rate = n/10000 and coincidence rate = m/10000;
return NS(N, T, b, r, a, v) and CS(N, T, b, r, a, v)
The simulation principles are as follows:
– For each setting, the simulation is run 10,000 times, which satisfies the requirement of large-scale statistics without wasting computing resources.
– The unit of time in the game refinement model is 1 min, not a second or an hour. The basic and shortest activity in MMG is role introduction, which takes around 1 min; therefore we use 1 min as the basic unit.
– A simulation result is considered valid and recorded only if it lies within the zone value (0.07–0.08). Also, the entire game time should be between 120 min (2 h) and 300 min (5 h), because too short a game is not interesting and too long a game disturbs normal daily life. For example, nobody plays in the morning, so players get together at 13:00; 5 h later they have to have dinner, and the same holds for the evening. Therefore, 120 < T < 300.
– The first step is to simulate the game process naturally, with settings from normal human experience, to get the coincidence rate shown in Table 2; the average setting NS(N, T, b, r, a, v) is also obtained.
– Second, according to NS(N, T, b, r, a, v), the convergent boundaries are set within a unit range. For example, in Table 2, when N = 4 and b = 16.97, the convergent boundary in the simulation program is set as random.uniform(16.47, 17.47); following this logic, the convergent simulation result CS(N, T, b, r, a, v) is obtained.
According to the algorithm and principles, the complete simulation data are shown in Table 2.
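The two-stage procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' program: the function names, the fixed seeds, and in particular the sampling ranges used for the natural stage are assumptions, since the original ranges are not stated in the text. The GR value is computed as in Eq. (5).

```python
import math
import random

ROUNDS = 10_000  # simulation rounds per setting, as stated in the principles


def gr_and_time(n_players, b, r, a, v):
    """Return (GR value, total time T) for one sampled setting, per Eq. (5)."""
    total = b + r + math.comb(n_players, 2) * a + v
    outcome = ((n_players - 1) ** 2 + n_players) * v
    return math.sqrt(outcome) / total, total


def simulate(n_players, ranges, check_time, seed=0):
    """Sample (b, r, a, v) uniformly from `ranges` and count valid rounds.

    Returns the coincidence rate and the mean of the valid settings
    (None if no round was valid).
    """
    rng = random.Random(seed)
    valid = []
    for _ in range(ROUNDS):
        b, r, a, v = (rng.uniform(lo, hi) for lo, hi in ranges)
        gr, total = gr_and_time(n_players, b, r, a, v)
        if 0.07 < gr < 0.08 and (not check_time or 120 < total < 300):
            valid.append((b, r, a, v))
    if not valid:
        return 0.0, None
    mean = tuple(sum(col) / len(valid) for col in zip(*valid))
    return len(valid) / ROUNDS, mean


if __name__ == "__main__":
    # ASSUMPTION: placeholder ranges in minutes for (b, r, a, v), not the authors' values.
    natural_ranges = [(5, 30), (5, 30), (1, 30), (1, 20)]
    rate_ns, ns = simulate(5, natural_ranges, check_time=True)
    print("natural:", rate_ns, ns)
    if ns is not None:
        # Convergent stage: sample within +/- 0.5 of each natural average.
        convergent_ranges = [(x - 0.5, x + 0.5) for x in ns]
        rate_cs, cs = simulate(5, convergent_ranges, check_time=False, seed=1)
        print("convergent:", rate_cs, cs)
```

The convergent stage drops the time check because Algorithm 1 only tests the GR zone there; the coincidence rates obtained depend on the assumed natural ranges and will not reproduce Table 2 exactly.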
Table 2. Murder Mystery Game simulation setting and result

| Player number | b | r | a | v | A | Total time (T) | Coincidence rate |
|---|---|---|---|---|---|---|---|
| 4-natural | 16.97 | 17.41 | 16.43 | 8.90 | 98.58 | 141.90 | 4.43% |
| 4-convergent | 16.99 | 17.40 | 17.40 | 9.00 | 104.40 | 147.82 | 99.84% |
| 5-natural | 16.25 | 15.44 | 16.03 | 11.53 | 160.30 | 203.63 | 5.86% |
| 5-convergent | 16.20 | 15.49 | 16.00 | 11.49 | 160.00 | 203.20 | 99.99% |
| 6-natural | 15.32 | 15.29 | 13.57 | 11.31 | 203.55 | 245.62 | 4.51% |
| 6-convergent | 15.30 | 15.30 | 13.50 | 11.29 | 202.50 | 244.45 | 98.83% |
| 7-natural | 15.09 | 14.67 | 10.23 | 8.67 | 214.83 | 253.36 | 2.22% |
| 7-convergent | 15.10 | 14.70 | 10.22 | 8.67 | 214.62 | 253.20 | 93.37% |
| 8-natural | 14.92 | 14.79 | 7.93 | 6.70 | 222.04 | 258.62 | 0.93% |
| 8-convergent | 14.90 | 14.81 | 7.93 | 6.67 | 222.04 | 258.68 | 90.63% |
| 9-natural | 15.97 | 15.35 | 6.50 | 5.78 | 234.00 | 271.10 | 0.40% |
| 9-convergent | 15.89 | 15.30 | 6.52 | 5.68 | 234.72 | 271.64 | 84.67% |
| 10-natural | 14.26 | 14.13 | 5.60 | 5.27 | 252.00 | 285.67 | 0.15% |
| 10-convergent | 14.30 | 14.10 | 5.61 | 5.21 | 252.45 | 286.12 | 56.61% |
| 11-natural | 7.00 | 10.25 | 5.00 | 5.00 | 275.00 | 297.25 | 0.04% |
| 11-convergent | 6.99 | 10.29 | 4.92 | 4.72 | 270.60 | 292.94 | 12.87% |
Remark 2. The total number of players should be larger than 3. When N ≤ 3, the game setting should be “Player vs. Environment”; in other words, “Player vs. Player” needs at least N ≥ 4.
Remark 3. If we want to keep the game length within 5 h with a comfortable feeling, the total number of players must not exceed 11, because a program error occurs when a larger N (12 or more) is input. Even so, Table 2 shows that when N = 11 the convergent coincidence rate is only 12.87%; therefore the reasonable maximum N is 10, and in special cases it could be 11.
Remark 4. In both the natural and the convergent simulation, the setting with the highest coincidence rate is N = 5, which means that a 5-role script is the easiest scenario for Murder Mystery Game authors; in addition, 160 min is a suitable social entertainment duration in the afternoon and evening.
Remark 5. From 4 to 6 players, the comfortable time length increases noticeably. However, from 6 to 10 players, the total time does not change much. This means that the average participating time of each player decreases when the number of players is larger than 6. In other words, when N = 6, the average thinking and participation time of players reaches its maximum under the comfortable time setting.
Remark 6. Although the convergent coincidence rate for N = 9 is still high, it is lower than 90%, unlike the settings with N ≤ 8. This means that a high-quality script should consider the game rhythm. On the macro level, the data performance is still passable, but on the micro level, some players may not have any participation in the script at all; they just play a role in the aggregate. When N = 10, the issue is more obvious.
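Remark 5 can be checked roughly against the convergent rows of Table 2. The sketch below assumes that the "average participating time" of a player can be approximated by dividing the total time T by the number of players N; that proxy, and the variable names, are assumptions made for illustration.

```python
# Total time T (minutes) of the convergent rows in Table 2, keyed by player number N.
CONVERGENT_TOTAL_TIME = {
    4: 147.82, 5: 203.20, 6: 244.45, 7: 253.20,
    8: 258.68, 9: 271.64, 10: 286.12, 11: 292.94,
}

# ASSUMPTION: "average participating time" is approximated here as T / N.
per_player = {n: total / n for n, total in CONVERGENT_TOTAL_TIME.items()}
best_n = max(per_player, key=per_player.get)

if __name__ == "__main__":
    for n, minutes in per_player.items():
        print(f"N = {n:2d}: {minutes:5.2f} min per player")
    print("largest share at N =", best_n)
```

Under this proxy the per-player share rises from N = 4 to N = 6 and falls afterwards, which is consistent with the remark.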
5 Conclusion
In this paper, based on game refinement theory, we analyzed the comfortable time setting of the Murder Mystery Game. From the natural and convergent computer simulations, some interesting results were found. The notions of game progress and the game information progress model for Murder Mystery Games were introduced in the development of the game refinement measure, which appears to be a useful tool for analyzing the player and time settings of Murder Mystery Games. A GR value of 0.07–0.08 should be a comfortable zone due to its good balance between skill and chance in game playing. The setting MMG(5, 203.2, 16.2, 15.5, 16, 11.5) is the most manageable scene for the author, and the setting MMG(6, 244.5, 15.3, 15.3, 13.5, 11.3) is the most challenging script for the players. Moreover, in order to satisfy the comfortable setting, the total number of players should be 4 ≤ N ≤ 8; sometimes N could be 9 or 10. The maximum limit is N = 11; however, it is not recommended in terms of comfort and script quality. Our suggestion is that if the script really needs more characters, some pure detective roles, who are not regarded as suspects, can be added so as to ensure that the game refinement value of the script stays within the zone. We believe this research can help Murder Mystery Game authors and players gain a better experience. Nevertheless, the field is still nascent and there is a pressing need for further exploration of a broader range of Murder Mystery Game domains. In the future, we may expand the Murder Mystery Game analysis area and update the game refinement theory model and simulation program to find more interesting conclusions.
Acknowledgement. This paper is supported by the Huazhong University of Science and Technology Special Funds for Development of Humanities and Social Sciences.
References
1. Aylett, R.S., Louchart, S., Dias, J., Paiva, A., Vala, M.: FearNot! – an experiment in emergent narrative. In: Panayiotopoulos, T., Gratch, J., Aylett, R., Ballin, D., Olivier, P., Rist, T. (eds.) IVA 2005. LNCS (LNAI), vol. 3661, pp. 305–316. Springer, Heidelberg (2005). https://doi.org/10.1007/11550617_26
2. Falk, J., Davenport, G.: Live role-playing games: implications for pervasive gaming. In: Rauterberg, M. (ed.) ICEC 2004. LNCS, vol. 3166, pp. 127–138. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28643-1_17
3. Iida, H., Takahara, K., Nagashima, J., Kajihara, Y., Hashimoto, T.: An application of game-refinement theory to Mah Jong. In: Rauterberg, M. (ed.) ICEC 2004. LNCS, vol. 3166, pp. 333–338. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28643-1_41
4. Iida, H., Takeshita, N., Yoshimura, J.: A metric for entertainment of boardgames: its implication for evolution of chess variants. In: Nakatsu, R., Hoshino, J. (eds.) Entertainment Computing. ITIFIP, vol. 112, pp. 65–72. Springer, Boston, MA (2003). https://doi.org/10.1007/978-0-387-35660-0_8
5. Panumate, C., Xiong, S., Iida, H.: An approach to quantifying pokemon’s entertainment impact with focus on battle. In: Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI), 2015 3rd International Conference on, pp. 60–66. IEEE (2015)
6. Schneider, J., Kortuem, G.: How to host a pervasive game-supporting face-to-face interactions in live-action roleplaying. In: Position paper at the Designing Ubiquitous Computing Games Workshop at UbiComp, pp. 1–6 (2001)
7. Sutiono, A.P., Purwarianti, A., Iida, H.: A mathematical model of game refinement. In: Reidsma, D., Choi, I., Bargar, R. (eds.) INTETAIN 2014. LNICST, vol. 136, pp. 148–151. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08189-2_22
8. Tychsen, A., Hitchens, M., Brolund, T., Kavakli, M.: Live action role-playing games: control, communication, storytelling, and MMORPG similarities. Games Cult. 1(3), 252–275 (2006)
9. Zuo, L., Xiong, S., Iida, H.: An analysis of DOTA2 using game refinement measure. In: Munekata, N., Kunita, I., Hoshino, J. (eds.) ICEC 2017. LNCS, vol. 10507, pp. 270–276. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66715-7_29
Finding Flow in Training Activities by Exploring Single-Agent Arcade Game Information Dynamics
Yuexian Gao1,3(B), Naying Gao1, Mohd Nor Akmal Khalid1,2, and Hiroyuki Iida1,2
1 School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1211, Japan {gaoyuexian,gaonaying1,akmal,iida}@jaist.ac.jp
2 Research Center for Entertainment Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1211, Japan
3 Hebei University of Engineering, Handan 056001, China
Abstract. This paper incorporates a discussion of game refinement theory and the flow model to analyze simulation data collected from two types of arcade games. A mathematical model of the arcade game processes is formulated. The essence of arcade games is verified through the game-playing processes of players. In particular, the challenge setup can contribute to addictiveness when the mode is close to the flow channel. The risk frequency ratio is applied to measure the process and to verify the more entertaining mode of training activities.
Keywords: Arcade games · Game refinement theory · Risk frequency ratio · Flow
1 Introduction
Arcade games first referred to entertainment machines installed in public businesses such as restaurants, bars, and amusement centers. Since Apple launched its arcade games project in 2019¹, arcade games have entered the limelight with a new look. The term “arcade game” nowadays refers to action video games designed to play similarly to an arcade game in a game center, with frantic, addictive gameplay. Such games focus on the user’s reflexes and usually feature very little puzzle-solving, complex thinking, or strategy skills². Flow theory introduces a mechanism for getting into flow, a state of deep enjoyment through which people can find an approach to a better life. Playing games is a process of eliminating the uncertainty of the game outcome, which is regarded as a typical flow activity [2].
¹ https://www.apple.com/newsroom/2019/09/apple-arcade-invites-you-to-play-something-extraordinary/
² Arcade Game, http://en.wikipedia.org/wiki/Arcade_game
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 126–133, 2020. https://doi.org/10.1007/978-3-030-65736-9_11
The measure of game refinement (GR) has been used to evaluate the sophistication of various games [5]. This study focuses on analyzing the game process model to identify the reasons for players’ addiction to arcade games and similar functional activities. By applying the GR measure, this study seeks to find what constitutes an enduring training process.
2 Related Works
A great game will “take one minute to learn but a lifetime to master” [7]. Arcade games perfectly illustrate this feature. “Pong” was the first to appear successfully in the public commercial market, in 1972, and gradually led arcade games into industrialization. Over the decades, although arcade games have been greatly improved in both form and content, their core feature is still that they are “simple but fun to play”. Some researchers focus on specific arcade games as objects to re-design the game, test advanced algorithms, or find solutions for human activities. Schrittwieser et al. proposed MuZero, an approach combining high-performance planning with model-free reinforcement learning methods [9]. MuZero does not require any knowledge of game rules or environment dynamics; it was tested on 57 Atari arcade games and achieved “a new state of the art”. Cook et al. demonstrated an approach to generating complete arcade games with the ANGELINA system, which builds them from scratch by combining rulesets, maps, and object layouts that it has designed itself [1]. However, these researchers did not mention how well players accept and enjoy the optimized improvements. More recently, some studies explored the practicalities of using games for learning. Rohlfshagen et al. summarized peer-reviewed research from sociology, psychology, brain-computer interfaces, biology and animals, and education, which focused on games with particular emphasis on the field of computational intelligence [8]. They believed that the potential usefulness of arcade games like Pac-Man for higher education is promising. In essence, scholars have been using arcade games as their research target, mainly focusing on aspects such as (1) using arcade games to assist psychological experiments, (2) improving the efficiency of algorithms, and (3) using arcade games for learning.
Nevertheless, research that discusses the essence of arcade games from the perspective of game dynamics, in order to discover the optimal flow experience in game-playing processes, remains limited.
3 Assessment Methodology
3.1 Game Refinement Theory
Game refinement (GR) theory [6] is used to measure the sophistication (i.e., the balance between luck and skill) of a game. It provides an innovative view of game dynamics by evaluating the process of a game, modeling the uncertainty of the game outcome as a kind of force in mind [10], which corresponds to force in nature under Newton’s second law. From the perspective of players, information about the game outcome is a function of time t (in board games, this would be the number of possible moves). By considering the process of a game as a process of solving the uncertainty of the game outcome (or of obtaining certainty) x(t), (1) is obtained.
$$x'(t) = \frac{n}{t}\, x(t) \tag{1}$$
The parameter n (1 ≤ n ∈ ℕ) is the number of possible options, with x(0) = 0 and x(T) = 1. Here x(T) stands for the normalized amount of solved uncertainty. Note that 0 ≤ t ≤ T and 0 ≤ x(t) ≤ 1. The above equation implies that the rate of increase in the solved information x′(t) is proportional to x(t) and inversely proportional to t. Then, (2) is obtained by solving (1).
$$x(t) = \left(\frac{t}{T}\right)^{n} \tag{2}$$
In most board games and continuous-movement video games, the total length of the game differs significantly for players of different levels, as in Tetris, where theoretically there is no end for a perfect player. We assume that the solved information x(t) is twice differentiable on t ∈ [0, T]. The second derivative indicates the acceleration of the solved uncertainty along with the game progress. After taking the second derivative, the measure of GR can be formulated in its root square form, as given by (3).
$$GR_{\mathrm{classical}}^{2} = \frac{n(n-1)}{T^{n}}\, t^{\,n-2}\Big|_{t=T} = \frac{n(n-1)}{T^{2}} \tag{3}$$
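For reference, the steps from (1) to (2) and on to (3) can be spelled out as follows. This is a sketch that assumes t > 0 and x(t) > 0 when separating variables, together with the normalization x(T) = 1 stated above.

```latex
\begin{align*}
\frac{dx}{dt} = \frac{n}{t}\,x
  \;&\Longrightarrow\; \frac{dx}{x} = n\,\frac{dt}{t}
  \;\Longrightarrow\; \ln x = n \ln t + c
  \;\Longrightarrow\; x(t) = C\,t^{n}, \\
x(T) = 1
  \;&\Longrightarrow\; C = T^{-n}
  \;\Longrightarrow\; x(t) = \left(\frac{t}{T}\right)^{n}, \\
x''(t) &= \frac{n(n-1)}{T^{n}}\,t^{\,n-2},
  \qquad x''(T) = \frac{n(n-1)}{T^{2}}.
\end{align*}
```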
It has been found that sophisticated games have a similar game refinement value located in the zone GR ∈ [0.07, 0.08] using the term in (3) [4,11,12]. The other term, n(n − 1), corresponds to game progress patterns such as one-sided games and seesaw games, or to the difficulty of completing the task. The game progress pattern or its difficulty is important for the player’s engagement.
3.2 Game Refinement Measure for Target Games
Acceleration in Mind for Arcade Games. For arcade games in which players aim to play as long as they can, the measure of game refinement should be reviewed from a more general and macroscopic perspective: the speed at which uncertainty is solved is uniformly accelerated. Thus, the total uncertainty of the game outcome is x(t) = vt. Consequently, the speed of game playing, v, can be counted as the success ratio of choosing the best choice when approaching the end of the game process, namely v = 1/T. Therefore, the risk frequency ratio is defined as m = 1 − v = (T − 1)/T. Note that m is dynamic, since players often feel at risk when their skill is weak during the process, and as they gain skill through practice, m decreases. The gamified experience in single-agent arcade games can be taken as the addiction level of the game. Hence, m can be regarded as the predictor variable of the game challenge. We assume m is the difficulty of moving the game forward. Supposing the game ends at t = T and solving $x(T) = vT = \frac{1}{2} a T^{2}$, we get the acceleration in mind, namely the a-value for single-agent arcade games in this study, as (4).
$$a = \frac{2v}{T} \tag{4}$$
Flappy Bird. Flappy Bird was released in 2013 and developed by the Vietnamese video game artist Dong Nguyen. It was the most downloaded mobile game in the App Store for iOS, with 50 million downloads per month, during which the developer earned 50,000 dollars a day from in-app advertisements. The process of playing Flappy Bird resembles learning something or training to acquire a skill. In our simulated Flappy Bird game, the player gains a point every time the bird successfully passes through a pipe. The total game length T is the total number of counted tries, i.e., the number of pipes; thus the number of successful tries is always one less than the total number. Hence the success ratio v here is Passed Pipes / Total Tries, and the a-value of Flappy Bird is calculated as (5).
$$a_{\mathrm{FlappyBird}} = \frac{2 \times \text{Successful Speed}}{\text{Total Tries}} \tag{5}$$
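The a-value for a single Flappy Bird run can be computed directly from the two counts used in the text. The sketch below is a hedged illustration: the function name `flappy_a_value` and the example run are hypothetical, and the success speed is taken as passed pipes divided by total tries, as described above.

```python
def flappy_a_value(passed_pipes: int, total_tries: int) -> float:
    """a = 2 * v / T with v = passed pipes / total tries, following Eqs. (4)-(5)."""
    if total_tries <= 0:
        raise ValueError("total_tries must be positive")
    success_speed = passed_pipes / total_tries  # v
    return 2 * success_speed / total_tries


if __name__ == "__main__":
    # Hypothetical run: the bird clears 9 pipes and fails at the 10th.
    print(flappy_a_value(passed_pipes=9, total_tries=10))
```

Longer runs give smaller a-values under this formula, because the total number of tries appears in the denominator twice.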
Brick Car Racing. Brick Car Racing is a game for a dedicated handheld game console that was popular in the early 1990s³. It is a simplistic racing game in which the player switches between the left and right lanes to avoid other cars passing through. The player loses if the car hits one of them. The higher the level, the faster the game becomes. In our simulation of the game, the player gains a point whenever the car successfully avoids a collision. Therefore, the total game length is the total number of times the player needs to slide, which is the number of encounters with opposing cars. Unlike Flappy Bird, after every specific number of passes the game levels up, and the difficulty increases as the car speed increases. As the speed suddenly changes, players are expected to feel a sudden sense of weightlessness. Hence, the winning speed (v) should decrease. From the perspective of the risk frequency ratio (m), the player experience of sudden level (speed) changes is considered. For Brick Car Racing, the risk frequency ratio is measured from level to level. For instance, if 100 people participate in level 1 and 70% of them pass, then m at level 1 is 30%. If 35 (50%) of those 70 people pass level 2, then m at level 2 is 50%, not 65%, since players eliminated at previous stages are not included in the calculation (the 30% of players who failed level 1 are excluded from the later risk). Hence, referring to the formulation in Sect. 3.1, the a-value measure of Brick Car Racing is given as (6).
$$a_{\mathrm{BrickCarRacing}} = \frac{2 \times \text{Level-based Success Speed}}{\text{Passed Cars at the Level}} \tag{6}$$
³ https://retroconsoles.fandom.com/wiki/Brick_Game
3.3 Flow Theory and Force in Mind
The concept of flow was originally derived from Csikszentmihalyi's observation of artists, chess players, climbers, and composers. He observed that these people engaged in their work with almost total concentration, often losing track of time and awareness of their surroundings [3]. They engaged in their individual activities out of a common sense of fun. These pleasures are derived from the process of the activity itself, and the external rewards are minimal or non-existent. This flow experience caused by concentration is considered to be the best experience, and it usually happens when the skill people have acquired and the challenge they face are at a comparable level. The formula is shown in (7).

$$\text{Flow} = \frac{\text{Skill}}{\text{Challenge}} \quad \text{(the goal is 1.0)} \tag{7}$$
From the view of game informatics, as m decreases, people would feel bored since the game no longer holds as much uncertainty for the player as it did at the beginning. They would enter the phase corresponding to the boredom zone in flow theory, where skill is high but the challenge is low. In physics, force is any interaction that will change the motion of an object when unopposed. Force can cause an object with mass to change its velocity (which includes beginning to move from a state of rest), i.e., to accelerate (footnote 4). Therefore,

$$F = ma = \frac{2mv}{T} \tag{8}$$

As discussed in Sect. 3.1, the acceleration in mind could be a measure of the ratio of skill to challenge. In order to maintain the movement of a game or activity, force should not decrease; therefore, force in mind could be a measure of the flow state. The bigger the force in mind is, the more likely the game is to remain in the flow state for longer.
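The chain from Eq. (4) to Eq. (8) can be written out in one place. The last step assumes m = 1 − v, which is consistent with the v and m values reported in Table 1 but is an assumption of this sketch rather than something stated in this section.

```latex
\begin{align*}
x(T) &= vT = \tfrac{1}{2} a T^{2}
  &&\Rightarrow\quad a = \frac{2v}{T}, \\
F &= ma = \frac{2mv}{T}
  &&\Rightarrow\quad F = \frac{2v(1-v)}{T}
  \quad \text{(assuming } m = 1 - v \text{)}.
\end{align*}
```

Under that assumption and for a fixed T, the product v(1 − v) is largest at v = m = 1/2, so the force in mind peaks when success and risk are balanced.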
4 Data and Analysis and Discussion
The essence of game attraction is uncertainty [5]. From the information perspective, we could also agree that n = 2 is the most competitive case, in which the win rate and the loss rate are equal, i.e., where the game outcome entropy (footnote 5) reaches its highest value.

Footnote 4: https://en.wikipedia.org/wiki/Force
Footnote 5: $H(X) = -\sum_i P(x_i) \log_b P(x_i)$, information entropy by C. E. Shannon.

In this study, two arcade games were considered for data collection. Open-source code was adopted and the simulation was conducted using PyGame. The games are Flappy Bird, whose challenge setup is constant, and Brick Car Racing, whose difficulty increases. Although the interface and presentation of the two games differ, the core play model is similar: the player moves the character to avoid colliding with obstacles and to play for as long as possible. The two classic games could represent many recently popular arcade games such as Temple Run (as well as other Run games), Shooty Skies, and Jump. Flappy Bird and Brick Car Racing represent two typical basic single-agent arcade games: one stands for games with constant difficulty, the other for games with changing difficulty.

After all the data had been aggregated, the game refinement values of the target games were obtained, as shown in Table 1. The data show that the game refinement measures are similar for the two games, and both are lower than the GR zone. Previous work on serious games reveals that activities in a serious environment are not as entertaining or relaxing as fun games. In such situations, game elements are used to increase motivation and engagement in the learning process, so the GR value will be much lower [5]. Flappy Bird has a value of 0.059 and Brick Car Racing a value of 0.055, both lying between fun games and serious games. However, the force in mind differs significantly. The bigger force of Brick Car Racing implies that the game has more flow force, which can maintain a longer period of flow state.

Table 1. The measure of game refinement of target arcade games

| Game | v | m | n | T | a | F |
|---|---|---|---|---|---|---|
| Flappy bird | 0.9414 | 0.0586 | 1.0689 | 32 | 0.059 | 0.00345 |
| Brick car racing | 0.5506 | 0.4494 | 1.8161 | 20 | 0.055 | 0.02474 |
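The a and F columns of Table 1 can be re-derived from the v, m, and T columns using Eqs. (4) and (8). The snippet below is a small consistency check, not the authors' analysis code. The dictionary name `TABLE_1` and the function names are illustrative, and the inputs are the rounded values printed in the table, so small rounding differences are expected.

```python
# Rounded values copied from Table 1: (v, m, T, reported a, reported F).
TABLE_1 = {
    "Flappy bird": (0.9414, 0.0586, 32, 0.059, 0.00345),
    "Brick car racing": (0.5506, 0.4494, 20, 0.055, 0.02474),
}


def a_value(v: float, T: int) -> float:
    """Acceleration in mind, a = 2v / T (Eq. 4)."""
    return 2.0 * v / T


def force_in_mind(m: float, v: float, T: int) -> float:
    """Force in mind, F = m * a = 2mv / T (Eq. 8)."""
    return m * a_value(v, T)


if __name__ == "__main__":
    for game, (v, m, T, a_reported, f_reported) in TABLE_1.items():
        a = a_value(v, T)
        f = force_in_mind(m, v, T)
        print(f"{game}: a = {a:.4f} (table {a_reported}), "
              f"F = {f:.5f} (table {f_reported})")
```

The two games have nearly the same a-value, while F differs by roughly a factor of seven because m is much larger for Brick Car Racing.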
Comparing Flappy Bird and Brick Car Racing, the game process of Brick Car Racing is closer to the real flow experience. As the difficulty increases, the player's expected risk is effectively reset. When players gradually relax their vigilance and become more comfortable, the game suddenly accelerates, and the players become anxious again. This situation is supported by the game process of Brick Car Racing, which changes dynamically (m value in Fig. 1). As the process repeats, the player experiences a weightlessness similar to that of a roller coaster, which keeps them curious about the outcome; thus, the player becomes highly attracted. Activities whose skill changes follow the trend of the blue line in Fig. 1 are training activities, which are mostly driven by an inner sense of accomplishment. More often than not, such an activity is a steady, repetitive process. For activities that follow the trend of the red line in Fig. 1, the increase in skill makes the training activity entertaining. As the game increases in difficulty, player skill gets better, and as player skill improves, the challenge gets harder, and so on. This kind of activity stems from people's instinctive curiosity about the unknown and is an implicit kind of entertainment. Thus, the relationship between boring training and gamified training is established.
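[Figure 1: line plot of m values (vertical axis, 0 to 0.6) against game level number (horizontal axis, 0 to 16), with series for Flappy Bird and Brick Car Racing and a reference line at m = 1/2.]

Fig. 1. m value change with level increase in Flappy Bird and Brick Car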
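The per-level m values for a levelled game can be computed as in the worked example given earlier for Brick Car Racing (100 players, 70 pass level 1, 35 pass level 2). The sketch below is illustrative only. The function name `level_risk_ratios` and the list-of-survivors input format are assumptions, not part of the authors' simulation.

```python
from typing import List


def level_risk_ratios(survivors: List[int]) -> List[float]:
    """Per-level risk frequency ratio m.

    `survivors[0]` is the number of players who start level 1 and
    `survivors[k]` is the number who pass level k. Each ratio is computed
    only over the players who actually reached that level, so players
    eliminated earlier do not count towards later levels.
    """
    ratios = []
    for entered, passed in zip(survivors, survivors[1:]):
        if entered <= 0:
            raise ValueError("a level needs at least one player")
        ratios.append((entered - passed) / entered)
    return ratios


if __name__ == "__main__":
    # Worked example from the text: 100 players, 70 pass level 1, 35 pass level 2.
    print(level_risk_ratios([100, 70, 35]))
```

For the worked example this gives 0.3 for level 1 and 0.5 for level 2, matching the 30% and 50% stated in the text rather than the 65% that a cumulative count would give.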
The difficulty needs to change dynamically to suit the player's skill level so that player engagement stays within the flow zone. The process of repeatedly playing a single-agent arcade game is boring training. The more attempts players make, the more skillful they become. As they become skillful, they are less likely to keep playing the game or doing the training. If the game difficulty is adjusted, the training process can become more exciting and long-lasting. The game process of an arcade game is metaphorically a continuous training process. Since GR could be seen as the acceleration in mind, the fluctuation of GR in Fig. 1 indicates an information process similar to riding a roller coaster, which can keep the player in the "challenge equivalent to skill" state, as shown in Fig. 1. Therefore, the regulation of single-agent arcade games could serve as a benchmark for understanding the training process and for sustaining persistence in the later phase of skill training.
5 Conclusion
This study uses two typical single-agent arcade games, Flappy Bird and Brick Car Racing, as a benchmark to study the essence of arcade games. The flow experience in theory and in real cases was described, and a logical model of the arcade game process was formulated. The game refinement measure was applied to the two arcade games, for which data were collected and analyzed. The training essence of arcade games is verified. Furthermore, a general process of arcade games was proposed. The results show that when the risk frequency ratio changes irregularly and repetitively, the process is more likely to stay in the flow state longer. The case in which difficulty changes with the level is closer to flow, as verified by the measure of game refinement. This suggests that, to make a training event engaging and long-lasting, the game difficulty needs to be adjusted dynamically based on the player's increase in skill. This finding will be explored further in the future and will contribute to serious game design that helps people learn persistently.
References

1. Cook, M., Colton, S.: Multi-faceted evolution of simple arcade games. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG 2011), pp. 289–296. IEEE (2011)
2. Csikszentmihalyi, M.: Beyond boredom and anxiety (1975)
3. Csikszentmihalyi, M.: Toward a psychology of optimal experience. In: Csikszentmihalyi, M. (ed.) Flow and the Foundations of Positive Psychology, pp. 209–226. Springer, Dordrecht (2014). https://doi.org/10.1007/978-94-017-9088-8_14
4. Gao, Y., Li, W., Xiao, Y., Khalid, M.N.A., Iida, H.: Nature of attractive multiplayer games: case study on china's most popular card game–doudizhu. Information 11(3), 141 (2020)
5. Iida, H.: Serious games discover game refinement measure. In: 2017 International Conference on Electrical Engineering and Computer Science (ICECOS), pp. 1–6. IEEE (2017)
6. Iida, H., Takahara, K., Nagashima, J., Kajihara, Y., Hashimoto, T.: An application of game-refinement theory to Mah Jong. Proc. ICEC Eindhoven 3166, 333–338 (2004)
7. Kunkel, B.: How alex pajitnov was tetris-ized! why tetris' creator got the cultural bends upon his arrival in America. Good Deal Games (2003)
8. Rohlfshagen, P., Liu, J., Perez-Liebana, D., Lucas, S.M.: Pac-man conquers academia: two decades of research using a classic arcade game. IEEE Trans. Games 10(3), 233–256 (2017)
9. Schrittwieser, J., et al.: Mastering atari, go, chess and shogi by planning with a learned model. arXiv preprint arXiv:1911.08265 (2019)
10. Sutiono, A.P., Purwarianti, A., Iida, H.: A Mathematical Model of Game Refinement (2014)
11. Sutiono, A.P., Ramadan, R., Jarukasetporn, P., Takeuchi, J., Purwarianti, A., Iida, H.: A mathematical model of game refinement and its applications to sports games (2015)
12. Xiong, S., Zuo, L., Iida, H.: Possible interpretations for game refinement measure. In: Munekata, N., Kunita, I., Hoshino, J. (eds.) ICEC 2017. LNCS, vol. 10507, pp. 322–334. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66715-7_35
Players Perception of Loot Boxes

Albert Sakhapov and Joseph Alexander Brown (✉)

Artificial Intelligence in Games Development Lab, Innopolis University, Republic of Tatarstan, Russia
{a.sahapov,j.brown}@innopolis.ru

Abstract. Loot boxes are a monetization technique in games where a player pays for a randomized chance at receiving an in-game item, either cosmetic or functional. They have recently been examined as a potential object for gambling, and government regulators are examining the issue of their use in games. To explore the issue and analyze players' perceptions, 53 people were surveyed and played a simple loot box game with different settings. Findings include that showing item drop probabilities directly influences the opinion of players regarding the fairness of loot boxes and can affect the further choice of whether to open them, which supports the need for regulations on loot box use.

Keywords: Loot boxes · Gambling · Gaming · User survey

1 Introduction
The phenomenon called the "loot box" has existed for more than a decade. Originally, loot boxes were created to make the game process easier or more interesting for players in exchange for a certain amount of money. At the next stage, loot boxes became a tool to keep players interested in the game by offering new virtual items such as new characters, guns, or skins. The items vary from one game to another, and an item might be either a customization for a player's character that does not affect the balance of the game or a booster that makes the game process easier and more comfortable for the player [12].

The study presented in this paper discusses the perception of the fairness of loot boxes among the essential people in game development: players. Currently, there is little data on how they perceive loot boxes from the perspective of fairness. Moreover, players' opinions might differ according to certain factors of loot boxes, such as the cost of the box, the odds of particular drops, and the outcome. This study aims to find out and analyze how players perceive loot boxes and what they expect and want developers to adjust. Our goal is to collect data and learn how players' opinions change with these factors.

Behind the term "loot box" there is a concept that Pawel Grabarczyk and Rune Nielsen named "random reward mechanisms" (RRM) [10]. Even though loot boxes appeared about 15 years ago, RRM has existed for decades and first appeared in collectible baseball cards in 1875 in the United States.

© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 134–141, 2020. https://doi.org/10.1007/978-3-030-65736-9_12
One card pack contains several randomly selected cards, ranging from standard to rare. Rare cards can reach prices in the millions of dollars. At that time, children and adolescents were spending hundreds and even thousands of dollars to receive cards of their favourite player, and now they are trying to unlock their favourite weapon or character skin in video games. In video games, RRM arrived with early RPGs like Diablo, where killing an enemy gives the player a random reward. That system was used in many games because it is an easy way to introduce a feeling of novelty and difference [20]. Moreover, Blizzard Entertainment had an idea to sell individual game asset packs, each containing several items of different rarity. It was inspired by the collectible card game Magic: The Gathering, which lets players build their own physical deck from booster packs. However, due to the unavailability of fast Internet technologies and the high cost of distributing these booster packs on physical discs, the idea was not adopted in the final version of the game [2].

A decade later, with the progress of technology, the concept of RRM began to be used in mobile games. Since most of them are free to download and play, companies added "microtransactions", an alternative form of monetization in which developers provide extra items and functions that make the game process more fun and exciting in exchange for some amount of money. Currently, 30 games out of 31 contain in-app purchases [16]. Microtransactions have become an integral part of game marketing and monetization [17]. To increase profits, developers started to add loot boxes even to full-priced video games. This has made it easier for companies to recoup multimillion budgets for games while keeping the initial purchase prices of games relatively the same. The reasons people buy loot boxes were explained by Juho Hamari et al. [11].
They highlighted six main reasons for opening loot boxes and buying in-game content: Performance, Personalization, Obtainment Achievements, "Showing-off", Low Perceived Cost, and Unlocking Content. The behaviour of players is explained by the Hooked Model, which examines how companies keep people's interest [5]. In the autumn of 2017, EA Games went further than other companies by partially locking the content in their upcoming game Battlefront II, so that players had to spend their money or time to unlock it. Players calculated that, to open all the content, a player would have to play at least 4578 h or spend 2100 dollars [7]. This situation led to a huge video game scandal, and since then the governments of many countries have begun to regulate loot boxes [4].

Since loot boxes use a random chance of winning and players pay real-world funds, such games may be considered a form of gambling [13]. As a result of the scandal, the Gaming Commission of Belgium called on game companies to provide a "clear indication of the chances of winning for the various item values" [9]. The same requirement was made by China's Ministry of Culture in December 2016 [3]. Singapore's Parliament passed the Remote Gambling Act, which obliges game companies to obtain a license to operate if their games have random awards and real-money payments [18]. The Australian gambling commission of the state of Victoria suggested restricting games that contain loot boxes for children under eighteen [19]. On the other side of the dispute is the UK Gambling Commission, which concluded that loot boxes do not need regulation because virtual items are "prizes" and cannot be used outside the game [8]. The Entertainment Software Rating Board (ESRB), the organization that provides video game age and content ratings for North America, agreed with the UK Gambling Commission and added that players are guaranteed to receive an in-game item. In May 2019, a bill to ban loot boxes and pay-to-win microtransactions in video games for kids under 18 years was introduced in the US. Senator Josh Hawley explained that game developers should not monetize addiction when a game is designed for kids. A day later, the Entertainment Software Association stated that numerous countries do not consider loot boxes to be gambling, and the ESRB "does not consider loot boxes to be gambling" [14]. They wished to share with the senator the tools "that keeps the control of in-game spending in parents' hands" [15]. Researchers, however, have recommended that game developers avoid giving players real money, predicting that this could lead to problems with gambling laws [6] and associations with real gambling [1]. The study presented here brings this controversial topic, loot boxes, into discussion, namely how players perceive them.
2 Methodology
The experiment started with the development of software that simulates opening a loot box. This system represents a loot box that provides a random item. As the idea was to examine players from Innopolis University, all items can hypothetically be used only within the University, so that students can easily get involved in the system and estimate its fairness without long introductions. To cover all aspects and not miss important factors, we studied the principles of loot box systems in detail. The following loot box factors were applied in the software:

Payment Type. There are three types of payment for loot boxes in video games: fully free, purchased with in-game currency, and bought with real-world currency. Sometimes developers combine all types to cover a wider audience.

Outcome. Game development companies use many different outcomes in their loot boxes. In our software, there are three types of outcome: Innopolis University apparel, which is an analogy to skins in video games; items that make the process more interesting but are optional and do not introduce imbalance; and grades for a hypothetical University course, which are like boosters that directly affect the result in video games.

Rarity. In most video games, items in loot boxes are divided by rarity. It usually varies from very common to very rare, where the higher the rarity, the more valuable and desired the item becomes. Each rarity type is highlighted in its own colour so that players can easily see and identify the rarity.

Odds Distribution. Items of different rarity have different odds. Common items have the highest probability of dropping, whereas rare items have the lowest.

Odds Display. As mentioned in the previous section, some governments require game companies to show loot box odds in video games. Developers added an odds display only in those countries in which it was mandatory, whereas players from other parts of the world still cannot see the probabilities with which items drop. This factor is important because it might become the new standard in loot box systems. Therefore, it is present in the software that was developed and can be enabled or disabled in its settings.

The second part of the study involved gathering information from participants. Subjects were asked questions about their experience and preferences in loot boxes regarding fairness in probabilities, rarity division, and outcome type. We collected this information to make an overall analysis and to see at the end how participants' views had changed during the experiment. Most of the questions in the surveys use a scale with seven choices, where 1 is completely disagree/no, 4 is the neutral answer, and 7 is completely agree/yes. This scale clearly shows the opinions of participants and allows us to assess and compare the results of the experiment easily.

All subjects that participated in the initial data collection were divided into several groups. Each group covers some of the factors of loot boxes. We chose the two most important and widely discussed factors: item drop probabilities and their display. These factors were the subject of government discussion because they shape people's opinion on whether to open loot boxes. Thus, we developed two probability distributions:

Fair. Probabilities: Common 0.600, Uncommon 0.350, Rare 0.050.
Unfair. Probabilities: Common 0.849, Uncommon 0.150, Rare 0.001.

As a result, we examined a full factorial design on the two factors: the probabilities and whether their values are shown. To gather real information and avoid manipulating the results, participants were distributed between groups at random. All participants received their version of the game according to their experiment group. The final round of collection started by asking participants their opinion of loot boxes and whether they see loot boxes as gambling.
These questions had already been asked in the initial data collection, and the answers were compared to the previous responses. Further, the participants were asked to describe their ideal loot boxes, that is, the type they would not mind if removing loot boxes from video games is not an option. Answers to this question might allow players, game companies, and governments to come to a single solution to the loot box problem if the requirements and interests of all three sides are satisfied.
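The loot box simulation described in this section can be sketched in a few lines of Python. This is not the authors' software. The names `FAIR`, `UNFAIR`, `open_loot_box`, and `describe` are hypothetical, and only the two probability distributions and the odds-display switch are taken from the text.

```python
import random
from collections import Counter
from typing import Dict, Optional

# Drop probabilities of the two experimental conditions described above.
FAIR: Dict[str, float] = {"Common": 0.600, "Uncommon": 0.350, "Rare": 0.050}
UNFAIR: Dict[str, float] = {"Common": 0.849, "Uncommon": 0.150, "Rare": 0.001}


def open_loot_box(odds: Dict[str, float], rng: random.Random) -> str:
    """Draw one rarity tier according to `odds`."""
    tiers = list(odds)
    return rng.choices(tiers, weights=[odds[t] for t in tiers], k=1)[0]


def describe(odds: Dict[str, float], show_odds: bool) -> Optional[str]:
    """Return the odds text shown to the player, or None when odds are hidden."""
    if not show_odds:
        return None
    return ", ".join(f"{tier}: {p:.1%}" for tier, p in odds.items())


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    for name, odds in (("fair", FAIR), ("unfair", UNFAIR)):
        drops = Counter(open_loot_box(odds, rng) for _ in range(1000))
        print(name, describe(odds, show_odds=True), dict(drops))
```

Crossing the two distributions with `show_odds` set to true or false gives the four conditions of the full factorial design.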
3 Analysis
The main idea of the study was to understand players' perception of loot boxes, considering many factors of the loot box system. During the experiment, we tried to influence participants' opinions by changing the two loot box factors mentioned in the previous section. As a result of the experiment, we expected to see a change in participants' views. We created a list of questions on which we looked for changes of opinion:

1) What is your opinion about loot boxes?
2) Do you agree with the statement: "Loot boxes are gambling"?
3) Would you like to see the probabilities of item drops?
4) Do you agree with the statement: "Probability distributions in loot boxes are fair"?
5) Would you like to see the history of your previous openings?
6) Do you like that items are divided into different groups of rarity?
7) Which type of items do you like more?
8) Which type of loot box payment do you prefer?

Group 1. In the first group, with fair probabilities and without display, we did not expect changes in opinion, because the odds distribution was hidden from the participants and they did not know the probabilities. The results, however, were unexpected: participants raised their opinion regarding fairness in loot box probabilities.

Group 2. The second group had fair probabilities, and they were displayed. We observed changes in opinion regarding the fairness of probabilities in loot boxes. Three participants in this group changed their opinion from one, which stands for "strongly disagree that probabilities in loot boxes are fair", to seven, which stands for "strongly agree."

Group 3. The third group had unfair probabilities without showing them. Being in this group was challenging for the participants: we received feedback from 10 participants in this group that they did not open even one rare item. Some of them thought that the actual probability of getting a rare item in the game is zero. As a result, 71.4% of the group raised their desire to see probabilities in loot boxes.

Group 4. The fourth group had unfair probabilities that were shown. The respondents found the probability distribution to be unfair, and 72.2% of the participants in the fourth group decreased their trust in the fairness of loot box odds.

The results showed significance for question four: "Do you agree with the statement: "Probability distributions in loot boxes are fair"?". Results were considered significant if p < .05, where a small p value is evidence against the null hypothesis.
For this question, we hypothesized that, with different probability distributions in groups 1 and 2 versus groups 3 and 4, there would be changes in opinion regarding the fairness of odds. For the first and second groups, we expected to see an increasing level of trust, and for the third and fourth groups a decreasing level. Players' perception of the fairness of probability distributions is thus formed when they face and try loot boxes. The number of valuable items dropped and the display of the odds distribution help them decide whether the probability distribution in loot boxes is fair. We observed an increasing level of trust in loot box probabilities in groups one and two, and a decreasing level in groups three and four. The biggest change was in the fourth group, where the participants saw the probabilities; their average change was −1.44, which supports the hypothesis.

For the third question, we obtained a p value of .000974, which makes the results significant. The hypothesis for this question was that participants in groups with shown probabilities would increase their desire for probabilities to be displayed. Our hypothesis was rejected because we instead observed the changes in the groups that did not show probabilities. There was no significant evidence that manipulating probabilities and their display influences whether players consider loot boxes to be gambling.
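The feedback from Group 3, where several participants never saw a rare item, is what the unfair distribution would predict. The following sketch computes the probability of seeing no rare item in a series of independent openings. The function name `prob_no_rare` is illustrative, and the session length of 50 openings is a hypothetical figure, since the text does not report how many boxes each participant opened.

```python
def prob_no_rare(p_rare: float, openings: int) -> float:
    """Probability of seeing no rare item in `openings` independent draws."""
    if not 0.0 <= p_rare <= 1.0:
        raise ValueError("p_rare must be a probability")
    if openings < 0:
        raise ValueError("openings must be non-negative")
    return (1.0 - p_rare) ** openings


if __name__ == "__main__":
    for label, p in (("fair", 0.050), ("unfair", 0.001)):
        # 50 openings is a hypothetical session length, not a figure from the study.
        print(f"{label}: P(no rare in 50 openings) = {prob_no_rare(p, 50):.3f}")
```

With these assumptions, a 50-box session contains no rare item roughly 95% of the time under the unfair odds, against roughly 8% under the fair odds.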
Table 1. Distribution of answers regarding loot box preferences questions.

| Question | Strong No | No | Leaning No | Neutral | Leaning Yes | Yes | Strong Yes |
|---|---|---|---|---|---|---|---|
| Would you like to see probabilities of item drops? | 0 | 5 | 0 | 1 | 7 | 9 | 31 |
| Do you think probability distributions in loot boxes are fair? | 4 | 6 | 13 | 12 | 11 | 3 | 4 |
| Would you like to see the history of your previous openings? | 0 | 1 | 4 | 5 | 5 | 9 | 29 |
| Do you like that items are divided in different groups of rarity? | 3 | 2 | 1 | 3 | 7 | 18 | 19 |

| Question | Only items that affect balance | Prefer items that affect balance | Slightly for items that affect balance | Neutral | Slightly for items that do not affect balance | Prefer items that do not affect balance | Only items that do not affect balance |
|---|---|---|---|---|---|---|---|
| Which type of items do you like more? | 3 | 1 | 2 | 3 | 5 | 8 | 31 |

| Question | Only real-world currency | Prefer real-world currency | Slightly for real-world currency | Neutral | Slightly for in-game currency/free | Prefer in-game currency/free | Only in-game currency/free |
|---|---|---|---|---|---|---|---|
| Which type of loot box payment do you prefer? | 2 | 0 | 0 | 4 | 2 | 18 | 27 |
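The counts in Table 1 can be turned into shares of the 53 respondents. The snippet below is an illustrative sketch. The dictionary keys are shortened labels chosen here, and treating the three options above neutral as "positive" is an assumption of this example, not a definition from the paper.

```python
# Answer counts copied from Table 1, ordered from the most negative
# to the most positive option on the seven-point scale.
TABLE_1 = {
    "see drop probabilities": [0, 5, 0, 1, 7, 9, 31],
    "distributions are fair": [4, 6, 13, 12, 11, 3, 4],
    "see opening history": [0, 1, 4, 5, 5, 9, 29],
    "items divided by rarity": [3, 2, 1, 3, 7, 18, 19],
}


def positive_share(counts):
    """Share of respondents who chose one of the three options above neutral."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    return sum(counts[4:]) / total


if __name__ == "__main__":
    for question, counts in TABLE_1.items():
        print(f"{question}: n = {sum(counts)}, "
              f"positive = {positive_share(counts):.1%}")
```

Each row sums to 53. For the first row, 47 of 53 respondents fall on the positive side, which is close to the 88.6% figure the paper quotes for showing odds.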
It was assumed that players dislike loot boxes because loot boxes deteriorate video games. However, the distribution is evenly split across all opinions. Participants explained their belief that loot boxes are an additional way for developers to monetize video games and earn money. Moreover, if game companies do not overuse loot boxes, players are mostly not against their presence in video games. Some of the participants also noted that the process of opening loot boxes can sometimes be fun and enjoyable if the odds distribution of dropped items is balanced and loot boxes do not affect game balance. The negative aspects mentioned by participants are that loot boxes are very similar to gambling and are present in almost every game. However, even though players are neutral towards loot boxes, an overwhelming majority of the participants (73.6%) agreed that loot boxes are a form of gambling and must be regulated. They mentioned that loot boxes could lead to addiction. Only 15.1% of the interviewed students disagreed or slightly disagreed. Overall satisfaction with the results of loot box openings was also rated highly: only 22.6% of the participants were not happy with the results of their loot box trials, 26.4% were neither happy nor sad about the outcome, and 51% were happy and would open loot boxes in the future. Even though 73.6% of the participants agreed with the statement that loot boxes are a form of gambling, most of the participants still open them from time to time, because they can give a new experience and the opening process might be fun and joyful. Only 25.6% spend money on loot boxes; the others earn them in video games without investing money. As for preferences in loot boxes, all the questions, with the number of people giving each answer, are provided in Table 1.
4 Compromise Loot Boxes
Loot boxes provide additional monetization for game companies so that they can recover the money spent on developing and supporting a game even if it was not very successful. This is why loot boxes appear in almost every modern video game, and game companies have started to overuse them. Sometimes this negatively affects overall video game quality because companies focus on earnings from loot boxes. Such cases lead to an adverse reaction in the gaming community.

Payment Type. 88.6% of the participants said that loot boxes should not only be paid for with real-world money but should also be obtainable during the game.

Outcome. For this factor, the participants chose different outcome types for two types of online games: PvP and PvE. These types are contrasting, so in PvP games the participants suggested using cosmetics that do not affect a player's statistics or game balance as the only outcome in loot boxes. In PvE, game companies can use any of the outcomes, including booster packs and unique items that help to improve the game character, because in PvE all the players act together and such things will not spoil the game experience for others.

Rarity. 90.5% of participants consider the division of items into different rarity groups to be important for loot boxes. If all the items had the same rarity, players would be much less likely to buy loot boxes, because there would be no rare, valuable items that players want to get.

Odds Distribution. Participants wanted the odds distribution to be fair, so that they do not need to open thousands of loot boxes to get one rare item. However, fairness is a subjective concept, so participants were asked to state the number of loot boxes they would agree to open to get a rare item. Participants would like to see one rare item at least every 30 loot boxes.

Odds Display. 88.6% of the participants said that compromise loot boxes should show odds to the players.
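The "one rare item at least every 30 loot boxes" preference can be compared with the two distributions used in the experiment. The sketch below reads that preference as a limit on the average number of openings until the first rare item, which is an interpretation made here; a hard guarantee after 30 boxes would be a different design. The function names are illustrative.

```python
def expected_openings(p_rare: float) -> float:
    """Mean number of boxes opened until the first rare item (geometric distribution)."""
    if not 0.0 < p_rare <= 1.0:
        raise ValueError("p_rare must be in (0, 1]")
    return 1.0 / p_rare


def meets_expectation(p_rare: float, max_boxes: int = 30) -> bool:
    """True when a rare item is expected within `max_boxes` openings on average."""
    return expected_openings(p_rare) <= max_boxes


if __name__ == "__main__":
    for label, p in (("fair", 0.050), ("unfair", 0.001)):
        print(f"{label}: {expected_openings(p):.0f} boxes on average, "
              f"within 30: {meets_expectation(p)}")
```

Under this reading, the fair distribution (about 20 boxes on average) satisfies the participants' stated preference, while the unfair distribution (about 1000 boxes) does not.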
5 Conclusions
In this study, we discussed one of the controversial phenomena in the video game industry: loot boxes. The majority of the participants face loot boxes almost every day in video games. Opinions are divided on whether loot boxes negatively affect the game industry. However, there is one thing that unites those in the survey: the indifference to the game industry. Fifty-three people participated in the study overall, forty-six males and six females. All of the respondents had experience with video games and had encountered loot boxes at least once. The perception of fairness in the study has highlighted many factors that broaden the understanding of people's preferences in loot boxes. The results so far, from the available sample size and the loot box factors covered, suggest that respondents' loot box experience influenced their preferences. Finally, the participants proposed their vision of ideal loot boxes that can satisfy all sides of the conflict, giving a picture of an ideal loot box that would generate income for game companies, meet government regulations, and be accepted by the video game community.
References

1. Brooks, A., Clark, G.L.: Associations between loot box use, problematic gaming and gambling, and gambling-related cognitions. Addict. Behav. 96, 26–34 (2019)
2. Brown, J.A.: Pitching diablo. ICGA J. 40(4), 417–424 (2018)
3. China's Ministry of Culture: Online game regulations. http://www.mcprc.gov.cn/whzx/bnsjdt/whscs/201612/t20161205_464422.html
4. Cross, K.: How the legal battle around loot boxes will change video games forever. https://www.theverge.com/2017/12/19/16783136/loot-boxes-video-games-gambling-legal
5. Eyal, N.: Hooked: How to Build Habit-Forming Products. Portfolio (2014)
6. Fields, T.: Mobile and Social Game Design: Monetization Methods and Design. CRC Press, Boca Raton (2014)
7. Frank, A.: Star wars battlefront 2 content might take years to unlock, but EA won't say. https://www.polygon.com/2017/11/15/16656478/star-wars-battlefront-2-content-unlock-time-cost
8. Gambling Commission of UK: Virtual currencies, esports and social casino gaming - position paper (2017)
9. Gaming Commission of Belgium: Research report on loot boxes (2018)
10. Grabarczyk, P., Nielsen, R.K.L.: Are loot boxes gambling? random reward mechanisms in video games (2018)
11. Hamari, J., Alha, K., Jarvela, S., Kivikangas, J.M., Koivisto, J., Paavilainen, J.: Why do players buy in-game content? an empirical study on concrete purchase motivations. Comput. Hum. Behav. 68, 538–546 (2016)
12. Lawrence, N.: The troubling psychology of pay-to-loot systems. http://www.ign.com/articles/2017/04/24/the-troubling-psychology-of-pay-to-loot-systems
13. Olivetti, J.: The perfect ten: The truth about lockboxes (2012). https://www.engadget.com/2012/05/17/the-perfect-ten-the-truth-about-lockboxes/
14. Schreier, J.: ESRB says it doesn't see 'loot boxes' as gambling. https://kotaku.com/esrb-says-it-doesnt-see-loot-boxes-as-gambling-1819363091
15. Schreier, J.: U.S. senator introduces bill to ban loot boxes and pay-to-win microtransactions. https://kotaku.com/u-s-senator-introduces-bill-to-ban-loot-boxes-and-pay-1834612226
16. Shibuya, A., Teramoto, M., Shoun, A.: Systematic analysis of in-game purchases and social features of mobile social games in japan (2015)
17. Stenros, J., Sotamaa, O.: Commoditization of helping players play: rise of the service paradigm (2009)
18. Times, S.: Remote gambling bill could have 'negative effects' for digital games. https://www.straitstimes.com/singapore/remote-gambling-bill-could-have-negative-effects-for-digital-games
19. Walker, A.: Victoria's gambling regulator: Loot boxes 'constitute gambling'. https://www.kotaku.com.au/2017/11/victorias-gambling-regulator-loot-boxes-constitute-gambling/
20. Wright, S.T.: The evolution of loot boxes. https://www.pcgamer.com/the-evolution-of-loot-boxes/
Braillestick: A Game Control Proposal for Blind Users Based on the Braille Typewriter

Kayo C. Santana1(B), Abel R. Galvão2, Gabriel S. Azevedo2, Victor T. Sarinho1, and Claudia P. Pereira1

1 Computer Science Post-Graduate Program - State University of Feira de Santana, Feira de Santana, Bahia, Brazil
[email protected], {vsarinho,claudiap}@uefs.br
2 Department of Exact Sciences - State University of Feira de Santana, Feira de Santana, Bahia, Brazil
[email protected], [email protected]

Abstract. Nowadays, modern technology is increasingly present in society. However, for visually impaired people, this technological advance is not so inclusive, especially in learning processes. This article discusses the development of Braillestick, a joystick similar to the braille machine but with a low cost and greater portability across different platforms. This joystick aims to be a pathway for the development and use of serious games for audiences with visual problems.

Keywords: Game joystick · Braille device · Assistive technology

1 Introduction
With the growth of technology in the 21st century, the use of the internet in routine activities has become practically indispensable [1]. As a result, activities ranging from the common, like scheduling an event on a smartphone, to the more complex, like developing natural language bots, have become simpler and possible for people in general. However, although some research has focused on the development of learning environments for people with visual impairments [2–4], in most cases they do not have viable access to current Information Technology (IT) solutions. Regarding the development of learning environments for the visually impaired, digital games can help in entertaining ways, with different applications in education, culture, training and practice [4–6]. However, due to the constant evolution of IT, the teaching and learning process through games needs to be constantly updated to find ways that are accessible to all audiences. As a result, it is necessary to develop alternative and inclusive technologies for people with visual impairments that do not deprive the end user of the game experience.

Supported by organization CAPES.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 142–147, 2020. https://doi.org/10.1007/978-3-030-65736-9_13
Seeking an accessible and portable alternative for people with visual impairments to play serious games related to the Braille writing system, this paper presents the Braillestick proposal. It is an adapted game joystick that works like a Braille Typewriter (BT) and allows people to learn and practice Braille writing while playing.
2 Related Work
The Braille System is a valuable tool for people with congenital or acquired blindness, since it allows them to read and write. Braille was created by Louis Braille in 1825, and today it is still the most precious instrument of writing and reading for visually impaired people, allowing direct contact with words [7,8]. Over the years, several research efforts have spread the Braille writing process across different devices, such as smartphones, computers and tablets. As an example, Hamzah and Fadzil [9] proposed the Voice4Blind application, which allows users to type using a keyboard adapted for Braille when connected to mobile devices. In the proposals of Lee et al. [10] and Yang et al. [11], glove-like devices were developed that allow characters to be written through hand movements, where the position and movement of the fingers are similar to those used on the BT. Similarly, Chakraborty et al. [12] proposed FLight, a reading and writing system that uses a low-cost wearable glove device that reads Braille printed texts and a special ruler containing blocks for easy writing. To help visually impaired children, Gadiraju [13] proposed the BrailleBlocks learning system. It consists of a set of wooden blocks and pins representing a Braille cell that can be used by children, together with a computer interface with games for their parents. Google also launched a Braille keyboard for TalkBack [14], a screen reader application for blind Android users. This proposal seeks to work similarly to the standard systems of this format, with 6 buttons distributed on the sides of the Android device [14]. Similar to those proposals, Braillestick offers an input mechanism that can be used in different applications, and also as a game control for serious games related to the Braille writing system. Unlike such proposals, Braillestick is a hardware solution that works like the BT, using its writing mode to convert the Braille writing system to an equivalent in the Latin alphabet.
3 Methodology
The Design Thinking methodology was applied in the development of the game device. It was chosen because of its potential to look not only into the development of the product, in this case the Braillestick, but also into the needs and feedback of the target audience. Design Thinking can be carried out in 5 main steps [15]: Empathize, Define, Ideate, Prototype and Test. In Empathize, the team needs to recognize the target audience of the product in order to create a meaningful innovation for them. After that, in the Define step, users' problems must be thoroughly reviewed so that they can be overcome, leading to different ideas that will be explored in the Ideate step. Finally, the Prototype step is reserved for the design of artifacts that help to get closer to the final solution, which are redesigned according to the users' experiences and opinions observed in the Test step.
Fig. 1. Braillestick design process methodology. Adapted from [15].
Figure 1 illustrates the Design Thinking steps applied in the Braillestick’s development process. As a result, it is possible to visualize at the Ideate step two modeling proposals for the Braillestick prototype (Fig. 2). The suggested fingering position can also be seen in Fig. 2 by the codes (LT, LI, LM, LR, RT, RI, RM, RR, and RP), where the first letter represents the Left or Right hand, and the second the finger name (Thumb, Index, Middle, Ring, and Pinky).
Fig. 2. Braillestick modeling proposals.
These two proposals differ in two main ways for users. The prototype idea on the left is designed to be held in the hands (similar to a smartphone joystick), which allows it to be used in different places, while the one on the right needs to rest on a surface, making it more similar to the BT.
4 Obtained Results
For the development of an initial prototype, the Arduino UNO R3 was chosen. It is an open source hardware platform that is well suited to creating devices that interact with the environment using buttons and sensors as input [16]. Arduino has a bootloader on the board and is programmed from a computer through a development environment derived from Processing, in a language based on C/C++ [17]. Notable attributes of the platform are its ease of use and its affordability. For the input of Braillestick, the BT was initially observed. It has 6 buttons responsible for writing characters in Braille. Each pressed button raises a dot in a different position of the Braille cell, and different characters are represented according to which buttons are pressed simultaneously.
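The relation between simultaneously pressed buttons and a Braille cell can be illustrated with a small sketch. It assumes that button i corresponds to Braille dot i and stores each chord as a 6-bit mask; the function names are hypothetical and are not taken from the Braillestick firmware. The bit order used here is the one defined by the Unicode Braille Patterns block, which makes it easy to display the resulting cell.

```python
# Sketch: represent one Braille cell as a 6-bit mask, one bit per button/dot.
# Assumption: button i (1..6) corresponds to Braille dot i, and bit (i - 1)
# is set while that button is held. This is also the Unicode bit order.

BRAILLE_BLOCK_START = 0x2800  # code point of the blank Braille pattern


def chord_to_mask(pressed_dots):
    """Fold a collection of dot numbers (1..6) into a 6-bit integer."""
    mask = 0
    for dot in pressed_dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"dot number out of range: {dot}")
        mask |= 1 << (dot - 1)
    return mask


def mask_to_cell(mask):
    """Return the Unicode Braille character that displays the given mask."""
    if not 0 <= mask < 64:
        raise ValueError(f"mask must fit in 6 bits: {mask}")
    return chr(BRAILLE_BLOCK_START + mask)


if __name__ == "__main__":
    for dots in ([1], [1, 2], [1, 4]):
        mask = chord_to_mask(dots)
        cell = mask_to_cell(mask)
        # Print the code point rather than the glyph to stay ASCII-safe.
        print(dots, format(mask, "06b"), f"U+{ord(cell):04X}")
```

With six buttons there are 64 possible masks, 63 of which have at least one dot raised.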
Fig. 3. Braillestick’s use scheme.
Figure 3 shows how the Braillestick works. In the first step, the user simultaneously presses a combination of buttons 1 to 6, according to the Braille character they want to write. In the second step, the Arduino checks the states of the buttons and sends a code identifying which buttons were pressed to a device (e.g. a computer) through the serial port (USB). Finally, in the third step, the device receives that code and processes it through a driver, developed in the Python programming language, which reads the Arduino output and matches the received code with the corresponding letter of the Latin alphabet. For each obtained letter, the Braillearning [18] software displays the text and its respective braille representation. Moreover, from an entertainment perspective, The Braille Typist game exercises the acquired braille training and learning by matching letters according to the game logic. At this point, Braillestick's prototype (Fig. 3) is already working, according to tests performed by the development team with the Braillearning software. These tests investigate not only the writing process but also the shape and dimensions of the case, in order to provide an ideal finger position.
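The third step, matching a received code with a Latin letter, can be sketched as a table lookup. This is a minimal illustration and not the authors' driver: it assumes that the Arduino sends one byte per chord with bit (i - 1) set when button i was pressed, and it covers only the 26 basic letters. Reading the bytes from the serial port is left out, and all names are hypothetical.

```python
# Sketch of the PC-side decoding step. Assumptions (not from the paper):
# one byte arrives per chord, and bit (i - 1) of that byte is set when
# button i was pressed. Only the 26 basic letters are covered.

# Dot numbers for the first ten letters (a-j) of the Braille alphabet.
FIRST_DECADE = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5),
}


def dots_to_code(dots):
    """Pack dot numbers (1..6) into the assumed one-byte wire code."""
    code = 0
    for dot in dots:
        code |= 1 << (dot - 1)
    return code


def build_table():
    """Build the code -> letter table from the regular Braille pattern."""
    table = {}
    for index, (letter, dots) in enumerate(FIRST_DECADE.items()):
        table[dots_to_code(dots)] = letter
        # k-t repeat a-j with dot 3 added.
        table[dots_to_code(dots + (3,))] = "klmnopqrst"[index]
    # u, v, x, y, z repeat a-e with dots 3 and 6 added.
    for base, letter in zip("abcde", "uvxyz"):
        table[dots_to_code(FIRST_DECADE[base] + (3, 6))] = letter
    # w is the exception: j with dot 6 added.
    table[dots_to_code(FIRST_DECADE["j"] + (6,))] = "w"
    return table


CODE_TO_LETTER = build_table()


def decode(codes):
    """Translate a sequence of wire codes into text ('?' if unknown)."""
    return "".join(CODE_TO_LETTER.get(code & 0x3F, "?") for code in codes)


if __name__ == "__main__":
    received = bytes([3, 23, 1, 10, 7, 7, 17])  # example chord stream
    print(decode(received))
```

Building the table from the regular structure of the alphabet, rather than listing 26 entries by hand, keeps the sketch short and makes the single irregular letter (w) explicit.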
So far, the tests were carried out by using Braillestick as an input device in the Braillearning software. In this way, we were able to check whether all possible combinations worked correctly according to the Braille alphabet. In addition, we tested accent marks and other symbols, such as those used to indicate numbers and capital letters.
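A test of this kind can be sketched as follows. The sketch enumerates every non-empty button combination and decodes a chord sequence that uses the number sign (dots 3, 4, 5 and 6), after which the cells of the letters a to j are read as the digits 1 to 9 and 0. It is an illustration only: the names are hypothetical, the rule for leaving number mode is simplified, and capital letters and accents are omitted because their indicators depend on the Braille code in use.

```python
from itertools import combinations

# Sketch of an exhaustive chord check plus number-sign handling.
# Assumption: a chord is the set of dot numbers (1..6) pressed together.

NUMBER_SIGN = frozenset({3, 4, 5, 6})  # Braille number indicator

# (letter, digit after the number sign, dots) for the cells of a-j.
DECADE = [
    ("a", "1", frozenset({1})), ("b", "2", frozenset({1, 2})),
    ("c", "3", frozenset({1, 4})), ("d", "4", frozenset({1, 4, 5})),
    ("e", "5", frozenset({1, 5})), ("f", "6", frozenset({1, 2, 4})),
    ("g", "7", frozenset({1, 2, 4, 5})), ("h", "8", frozenset({1, 2, 5})),
    ("i", "9", frozenset({2, 4})), ("j", "0", frozenset({2, 4, 5})),
]
LETTERS = {dots: letter for letter, _, dots in DECADE}
DIGITS = {dots: digit for _, digit, dots in DECADE}


def all_chords():
    """Yield every non-empty combination of the six buttons."""
    for size in range(1, 7):
        for combo in combinations(range(1, 7), size):
            yield frozenset(combo)


def decode(chords):
    """Decode chords, reading a-j cells as digits after the number sign.

    Simplification: number mode ends at the first chord that is not one
    of the ten digit cells. Chords outside this small table become '?'.
    """
    out = []
    number_mode = False
    for chord in chords:
        chord = frozenset(chord)
        if chord == NUMBER_SIGN:
            number_mode = True
            continue
        if number_mode and chord in DIGITS:
            out.append(DIGITS[chord])
            continue
        number_mode = False
        out.append(LETTERS.get(chord, "?"))
    return "".join(out)


if __name__ == "__main__":
    chords = list(all_chords())
    unmapped = [c for c in chords if c not in LETTERS and c != NUMBER_SIGN]
    print(len(chords), "chords,", len(unmapped), "not covered by this table")
    print(decode([NUMBER_SIGN, {1}, {2, 4, 5}]))
```

Listing the chords that a table does not cover is a simple way to see which combinations still need a mapping before a test run.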
5 Conclusions and Future Work
Braillestick's development and research concern not only the production of a game joystick but also the development of a low-cost assistive technology to help the social and digital inclusion of visually impaired people. While a Braille Typewriter machine costs around $800.00, Braillestick's production costs were around $60.00 (Arduino, 3D printing, and other components), representing a price reduction of more than 90%. In addition, Braillestick is lightweight and easy to transport, addressing 3 of the main problems reported by users about the Braille Typewriter. So far, Braillestick's case and its logical part have been developed and are working correctly. The next steps of the research target tests with visually impaired people, using the device with the Braillearning learning object, which is already developed and working, and with the blind game The Braille Typist, which is currently under development. As future work, we will also explore the input and precision rates of visually impaired users in typing games using Braillestick and other devices (a regular joystick and a keyboard, both mapped to work like a braille typewriter) to assess the benefits of the proposed controller when compared with other devices. Furthermore, the development of new games is also expected, in order to disseminate the use of the joystick and the learning and application of Braille.
References
1. Hoffman, D.L., Novak, T.P., Venkatesh, A.: Has the internet become indispensable? Commun. ACM 47(7), 37–42 (2004). https://doi.org/10.1145/1005817.1005818
2. Bhowmick, A., Hazarika, S.M.: An insight into assistive technology for the visually impaired and blind people: state-of-the-art and future trends. J. Multi. User Inter. 11(2), 149–172 (2017). https://doi.org/10.1007/s12193-016-0235-6
3. Ferreira, F., Cavaco, S.: Mathematics for all: a game-based learning environment for visually impaired students. In: 2014 IEEE Frontiers in Education Conference (FIE) Proceedings, pp. 1–8 (2014)
4. Jaramillo-Alcázar, A., Luján-Mora, S.: Mobile serious games: an accessibility assessment for people with visual impairments. In: Proceedings of the 5th International Conference on Technological Ecosystems for Enhancing Multiculturality. TEEM 2017, Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3144826.3145416
5. Leporini, B., Hersh, M.: Games for the rehabilitation of disabled people. In: Proceedings of the 4th Workshop on ICTs for Improving Patients Rehabilitation Research Techniques, pp. 109–112. REHAB 2016, Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/3051488.3051496
6. Santana, K., Pereira, C.P., Fernandes, A., Santos, A.J.O.S., Macedo, R.: Blinds, basic education: jogo digital inclusivo para auxiliar o processo de ensino-aprendizagem das pessoas com deficiência visual. In: Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), vol. 28, p. 877 (2017). https://doi.org/10.5753/cbie.sbie.2017.877
7. Machado, E.V.: A importância do (re)conhecimento do sistema braille para a humanização das políticas públicas de inclusão. Int. Stud. Law Educ. 9, 49–54 (2011)
8. Roth, G.A., Fee, E.: The invention of braille. Am. J. Public Health 101(3), 454–454 (2011). https://doi.org/10.2105/AJPH.2010.200865
9. Hamzah, R., Fadzil, M.I.M.: Voice4blind: the talking braille keyboard to assist the visual impaired users in text messaging. In: 2016 4th International Conference on User Science and Engineering (i-USEr), pp. 265–270 (2016). https://doi.org/10.1109/IUSER.2016.7857972
10. Lee, S., Hong, S.H., Jeon, J.W.: Designing a universal keyboard using chording gloves. In: Proceedings of the 2003 Conference on Universal Usability, pp. 142–147. CUU 2003. Association for Computing Machinery, New York, NY, USA (2002). https://doi.org/10.1145/957205.957230
11. Yang, T.J., Chen, W.A., Chu, Y.L., You, Z.X., Chou, C.H.: Tactile braille learning system to assist visual impaired users to learn taiwanese braille. In: SIGGRAPH Asia 2017 Posters. SA 2017. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3145690.3145710
12. Chakraborty, T., Khan, T.A., Al Islam, A.B.M.A.: Flight: A low-cost reading and writing system for economically less-privileged visually-impaired people exploiting ink-based braille system. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 531–540. CHI 2017. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3025453.3025646
13. Gadiraju, V.: Brailleblocks: braille toys for cross-ability collaboration. In: The 21st International ACM SIGACCESS Conference on Computers and Accessibility, pp. 688–690. ASSETS 2019. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3308561.3356104
14. Silva, C.C.: Google lança teclado braile gratuito integrado ao android (2020). https://www.tecmundo.com.br/software/151961-google-lanca-teclado-brailegratuito-integrado-android.htm
15. Institute of Design at Stanford: An introduction to design thinking: Process guide (2010)
16. Souza, A.R.d., Paixão, A.C., Uzêda, D.D., Dias, M.A., Duarte, S., Amorim, H.S.d.: A placa arduino: uma opção de baixo custo para experiências de física assistidas pelo pc. Revista Brasileira de Ensino de Física 33(1), 01–05 (2011). https://doi.org/10.1590/S1806-11172011000100026
17. Arduino: Arduino reference (2020). https://www.arduino.cc/reference/en?from=Reference.Extended
18. Santana, K., Pereira, C.P., de Santana, B.S.: Braillearning: Software para simular a máquina de escrever em braille. In: Brazilian Symposium on Computers in Education (Simpósio Brasileiro de Informática na Educação-SBIE), vol. 30, p. 1101 (2019). https://doi.org/10.5753/cbie.sbie.2019.1101
Virtual Reality and Augmented Reality
Conquer Catharsis – A VR Environment for Anxiety Treatment of Children and Adolescents

Andreas Lenz1, Helmut Hlavacs1(B), Oswald Kothgassner2, and Anna Felnhofer3

1 University of Vienna, Entertainment Computing, Vienna, Austria
[email protected]
2 Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
3 Department of Pediatrics and Adolescent Medicine, Medical University of Vienna, Vienna, Austria

Abstract. In this paper we present a novel VR-based tool for the treatment of anxiety disorders for children and adolescents. The tool consists of a photo-realistic VR game, together with a biofeedback device, and a second screen for enabling interventions for the practitioner. We describe the tool, how we measure and use biofeedback, and evaluate the sense of felt presence of a group of experimental participants.

Keywords: Virtual reality · Children · Anxiety treatment · Relaxing

1 Introduction
The aim of Conquer Catharsis is to create a biofeedback-therapy simulation game that utilizes virtual reality and gamification to create an immersive and fun environment in which children and adolescents with anxiety disorders can practice relaxation. The main goal is to create a therapy game that makes relaxation training more enjoyable, but that also acts as a research tool. The patients' current average heart rate (HR) is chosen as the main biofeedback metric for relaxation, the main goal of the system being to teach patients to relax and thus reduce their HR at will. As feedback, progress is symbolized by gradually making a forest bloom again as the heart rate is reduced. The technical goals include:

– Create an environment that can be explored in VR for relaxation training.
– Integrate communication between an HR sensor, an external control app for the psychologists, and the VR game.
– Use HR-sensor data to drive gameplay elements.
– Evaluate presence and perceived realism in a user study.

We aim at an environment that completely immerses patients, so that they forget reality and are absorbed by a virtual, friendly world that allows interaction and exploration. The main research question is therefore whether visitors of Catharsis Island are indeed immersed and experience the psychological phenomenon called presence with respect to the physicality of the world.

© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 151–162, 2020. https://doi.org/10.1007/978-3-030-65736-9_14
2 Virtual Reality in Psychology
The usage of VR in experiments has been of great interest to psychologists. While it is celebrated for the sheer endless possibilities of immersing a subject in any environment and of performing virtually any experiment without (physical) harm to the subject, it has also been scrutinized for often creating poorly designed and potentially ecologically invalid environments [8]. In therapeutic settings, VR has been quite popular for both exposure therapy and relaxation therapy, which are discussed in the following.

Exposure therapy is a highly effective method for treating anxiety disorders [7]. The patient is systematically and repeatedly presented with a feared or avoided stimulus (such as spiders or speaking in front of a large audience) [6]. In the real world it can often be difficult to create a scenario in which the patient can be exposed to the triggering cue [6]. Consider for example a room full of spiders or a recreation of a roadside ambush. This is where Virtual Reality can be a solution, since in VR such exposures can be simulated. In this context, in vivo means that the exposure therapy is performed in a real setting, while in virtuo or Virtual Reality Exposure Therapy (VRET) refers to therapy performed in a simulated VR environment. VRET can be more cost effective than in vivo methods for scenarios that are difficult to reproduce in reality over and over again. Even though about 30% of patients turn down exposure therapy once it is explained to them, over 80% express a preference for VRET over in vivo [6]. While VRET is widely celebrated as a tool that solves many problems of in vivo exposure therapy, being presented as a safe, tailor-made experience with greater control for the therapist, [8] shows that there are still many fundamental issues with the approach to creating such scenarios, with concerns regarding their ecological validity. Ramirez argues that VR experiments lacking in perspectival fidelity and context realism often fail to accurately model situation-related features, thus increasing the risk of being ecologically invalid or at least questionable. He proposes an equivalency principle which states that VR applications should be held to the same ethical standards as non-VR experiments [8].

2.1 Relaxation Therapy
Stress generally arises when current challenges exceed available coping potential. For over-challenged children and adolescents, stress can have a detrimental effect on the development of adequate coping strategies [1]. According to [5], three major categories of stressors for children and adolescents can be distinguished.

– Normative stressors are events that arise from the natural development of a child in its social context, such as expectations from family and society and internal expectations of oneself [1,5].
– Critical life events are events that happen suddenly and can change daily routines, requiring readjustment to life circumstances. These can be the death of loved ones, chronic illness, or the parents' divorce. They are generally sudden and cannot be anticipated [1,5].
– Everyday problems and expectations are the small irritations and frustrations dealt with on a daily basis. These can recur daily and are felt as stronger irritations by children, because children often cannot accept them as impossibilities of life as easily as adults can [1,5].

Relaxation therapy involves refocusing attention from stressors to something calming and becoming more mindful of one's body [10]. There are many techniques, including:

– Autogenic relaxation, in which the patient uses visual imagery and body awareness to control their reaction to stress [10].
– Progressive muscle relaxation, in which the patient focuses on tensing each muscle group, which is intended to help them become more aware of physical tension [10].
– Biofeedback, which focuses the patients' attention on their body's reaction to stress via visualization of biosignals [9].
– Visualization, in which patients attempt relaxation by mentally visualizing calming places [10]. VR can be an important tool in this context to create the visual and auditory stimuli for the patients.
3 Conquer Catharsis: An Immersive Relaxation Therapy Environment

3.1 Game Concept
The concept of Conquer Catharsis is a virtual reality game connected to a heart rate (HR) sensor, in which children and adolescents can have biofeedback presented to them in a more interesting and engaging manner than simply reading numbers off a screen while performing exercises. Additionally, the game has to have very high visual fidelity, aiming for a high degree of perspectival fidelity and context realism. Some of the properties were defined a priori, including a natural setting in a forest and the use of HR as the measurement of progress. On this basis, more specific requirements were formulated together with the psychologists, and a design was created which was subsequently iterated on for several months. Below, the design, technical implementation, and structure of the game are presented. Being at its core a gamified relaxation therapy tool, the game has certain requirements it needs to fulfill to properly assist psychologists in their process. The requirements we identified are as follows:

– The game needs to be inviting to children and allow for engaging interactions, as opposed to initial assumptions that a therapeutic tool needs to be serious and controlled. It should exhibit unpredictable and lifelike interactions.
– It should contain a progression system where the player can make progressive changes to the world by completing the therapeutic exercises, later named relaxation events. The play sessions should be short, and it should be possible to pause them and resume them later on. This warrants a saving and loading system with player profiles.
– Biometric data gathered from a POLAR HR-measurement band should be displayed to the therapist, with the option to trigger relaxation events in the game environment automatically or manually (specifically for the case where the HR-measurement device is not available).

3.2 Implementation
The software package as a whole is able to receive data from an HR monitor via Bluetooth, process this data and display it to the player (child) in a user-friendly and immersive way by rendering an adaptive environment. It also allows the supervisor (psychologist) to change simulation parameters in real time and to record the HR data from the play session. Since Conquer Catharsis is a VR game, Unreal Engine 4 (UE4) was chosen as the core authoring program. A custom version was created which integrated the NVidia VR-Works SDK and the NVidia FLEX SDK. For 3D model fix-ups, 3DS Max was used. Gimp 2 was used for texture authoring and editing. TexMaker 2 was used for automatic generation of normal maps or ambient occlusion maps. Quixel's Megascans Bridge was used to import assets from the Megascans Library into UE4. The project was stored in a repository on Bitbucket and managed with Sourcetree.

During development we found that it was virtually impossible to connect the POLAR HR band directly to the gaming laptop used under MS Windows 10. Thus, to mediate the Bluetooth connection between the POLAR HR band and the VR game, a second-screen app was created. The app is deployed together with the packaged game and runs on the Android operating system. Additionally, the game comes with an integrated TCP server that serves as the connection between the app and the game. During the game, a QR code can be scanned by any QR code scanning app running on a smartphone. The code contains a URL that is generated at runtime and directs the smartphone's browser to the address of the game's local TCP server, where the second-screen app can be downloaded. A second QR code can be scanned by the second-screen app and is used to connect the app and the game server. Once the connection is established, the app can connect to the POLAR HR band. It then periodically sends HR data and other control data entered by the supervisors to the game.

3.3 Game Design
Design has many facets depending on the discipline examined. In this context, design is understood as gameplay design, which is the design of specific gameplay mechanics, and level design, which is the design of the play-space. Since most games have some sort of play-space that needs to be defined and some sort of ruleset, gameplay and level design are the fields most prominently perceived by players. Because the game is played under supervision, all the interaction possibilities and traversable locations can be explained to the player by the supervisor (psychologist) if the design fails to communicate something properly (given that making all options easily understandable is a goal of the design). This made it much easier to leave gaps in the design, or to try things that might not work well, as all interactions could be explained by a supervisor during play.

The core gameplay of Conquer Catharsis consists of teleporting and relaxing. Additionally, the player can grab or push certain predefined objects and get navigation help. For each motion controller, a virtual hand is displayed in the game to increase the perspectival fidelity effect. Displaying the motion controllers as 3D models in the virtual world is also a popular method. In this game, though, abstract hand models were chosen, as opposed to realistic hands or 3D models of the motion controllers, in order to disconnect the player from the real world and to avoid an uncanny valley effect if the hands felt too lifelike [11].

The gameplay is split into two types: structured relaxation events and free exploration passages.

– Relaxation events are small areas marked as stone circles, meant to be visually segregated from the rest of the environment. Upon entry the event activates, displaying a wind around the player that directs their attention towards the point of interest (this can be a waterfall that will become visible, or a tree that turns green), and the sounds of the environment are reduced in volume to exclude distractions. The player can then start the event by pushing down on a large stone button in front of them. When an event starts, it periodically queries the HR-Monitor connection component for the current heart rate, baseline, threshold, and threshold time. All the values except for the heart rate can be configured in the companion app and adjusted in real time.
  – Heart rate is the current heart rate of the player as reported by the HR device.
  – Baseline is the base heart rate of the player recorded over a certain interval.
  – Threshold is the target heart rate the player must be under in order to complete the event.
  – Threshold time is the amount of time the heart rate must be under the threshold heart rate in order to complete the event.
  The relaxation event uses these values to drive changes in the world by calculating progress as a linear interpolation of the heart rate between the baseline and the threshold: when the heart rate equals the baseline, progress is 0, and when the heart rate equals the threshold, progress is 1 (see Figs. 1–3). This is then passed to an event handler component that implements case-specific logic for changing the world (see Fig. 4).
– Exploration passages are the less structured free-play aspects of the game. They are designed both as passages to the next event, usually with a path or landmark visible, and as areas where more paidia-like free play can take place [3]. Toys in the form of swords, shields, sticks and FLEX-enabled plants and balls are placed here, as well as obstacles like rocks and tree stumps, to break up the linear structure and encourage the intended free play.

During gameplay, the player can use the game controllers to carry out the following interaction types:
Fig. 1. Forest bloom event at progress 0.
Fig. 2. Forest bloom event at progress 0.5.
Fig. 3. Forest bloom event at progress 1.
Fig. 4. Progress handler.
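The progress calculation of a relaxation event, as described for the baseline, threshold and threshold time values, can be sketched as follows. This is a minimal illustration under stated assumptions and not the game's Unreal Engine implementation: the names are hypothetical, clamping progress to the range 0 to 1 is an assumption, and a heart rate exactly at the threshold is treated as having reached the target.

```python
# Sketch of the relaxation-event logic. Assumptions (not from the paper):
# progress is clamped to [0, 1], and a heart rate equal to the threshold
# counts as having reached the target.


def relaxation_progress(heart_rate, baseline, threshold):
    """Linear interpolation: 0.0 at the baseline, 1.0 at the threshold."""
    if baseline <= threshold:
        raise ValueError("baseline must be higher than threshold")
    raw = (baseline - heart_rate) / (baseline - threshold)
    return max(0.0, min(1.0, raw))


class RelaxationEvent:
    """Tracks whether the heart rate stayed low for long enough."""

    def __init__(self, baseline, threshold, threshold_time):
        self.baseline = baseline
        self.threshold = threshold
        self.threshold_time = threshold_time  # seconds
        self.completed = False
        self._below_since = None  # time at which HR last dropped low

    def update(self, heart_rate, now):
        """Feed one HR sample taken at time `now` (seconds).

        Returns the current progress, which a handler could use to
        drive a visual change such as the forest bloom.
        """
        if heart_rate <= self.threshold:
            if self._below_since is None:
                self._below_since = now
            if now - self._below_since >= self.threshold_time:
                self.completed = True
        else:
            # Leaving the target range restarts the timer.
            self._below_since = None
        return relaxation_progress(heart_rate, self.baseline, self.threshold)


if __name__ == "__main__":
    event = RelaxationEvent(baseline=80, threshold=60, threshold_time=10)
    for second, hr in enumerate([80, 75, 70, 65, 60, 58]):
        print(second, hr, event.update(hr, float(second)), event.completed)
```

Keeping the interpolation separate from the completion timer mirrors the split in the text between computing progress and handing it to an event handler that changes the world.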
– Teleportation locomotion is one of the most important tasks in 3D VRGames [2]. HMD positional tracking within the physical - real world - playspace is managed by the Unreal Engine 4 API. However, once the player is required to travel larger distances than the physical play-space, different locomotion techniques need to be used [2]. Unreal Engine 4s VR template already includes a basic teleportation mechanic that requires the player to aim a beam at the desired arrival location. Using this template as a starting point we built a system that gives feedback as to where navigation is allowed, thus restricting teleportation over larger distances. One of the challenges of creating large spaces in VR - apart from managing performance - is comfortable locomotion. Using a teleportation system that allows the player to advance a few meters per interaction can be tiring when the distance between points of interest is in the dimension of hundreds of meters. Since the structure of the island is built around small points of interest with long exploration paths between them, a larger teleportation distance was chosen. – Grabbing is only needed during the free play aspects and serves the purpose of picking up objects or manipulating FLEX enabled objects, like branches.
– Navigation help supports the supervisor in guiding the player through the tour. It is also meant to feel like fun and magic. When the face button of the left controller is pressed, a visual path is overlaid on the world to show the way towards the nearest incomplete event. Additionally, a ghostly rabbit emerges from the hand and runs along the path. Although the game is played under supervision, we found that this navigation help might be necessary to help the player better understand the supervisor's guidance.
– Relaxation is a gameplay mechanic that is directly driven by the player but cannot be manipulated or mastered by better understanding the implementation of the mechanic. Consider a jump-and-run game where the player moves from platform to platform by timing their jumps correctly. The player can achieve mastery over the gameplay mechanic of jumping by practicing and learning the underlying mechanics that the game designer defined. But with biosignals driving the gameplay, the player's body becomes the focus of mastery.
Level Design. The playable area consists of one large persistent level (or world) with several sub-levels for organizational purposes and streaming levels for optimization purposes. The separation of the levels only partially overlaps with the conceptual grouping of game areas. In each thematic area (e.g. Forest, Cave, Ruin, etc.) there exists at least one relaxation event that controls some part of its environment.
3.4 Catharsis Island
The decision to create an island as the stage for the relaxation adventure was made after several iterations on the planned scenarios and diverged greatly from the original idea. Ultimately it is a visual expression of the game-play and high-level design. Creating an island came with its own set of new challenges that also changed the requirements for the project structure with regard to performance. During brainstorming and assessment with the psychologists from the Medical University of Vienna, we came up with many HR-events of varying complexity and spatial requirements (e.g. one required a large amount of space to move while another only needed the player to stand still). In combination with the requirement that the player have an overview of their goals and successfully completed relaxation events, we designed a progression scheme with vertically arranged tiers and horizontally arranged zones (see Fig. 5 showing them as a level graph), where each stage could overlook all of its zones and HR-events. Naturally the meta design gave way to individual requirements and artistic choices in the execution. During development, zones were removed, changing some of the vertical and horizontal arrangements originally laid out. The choice to execute the concept visually as an island originated from the artistic desire to have some kind of large body of water, combined with the meta design of ascending levels, which was designed as a tree-covered mountain. This process resulted in the following zones:
Fig. 5. Level graph.
Fig. 6. View from the starting point.
Fig. 7. View of the Island from the Beach.
– Zone 1: Beach. Being the starting area means that the first experiences within the game are made here, thus the zone also acts as a kind of tutorial. This results in the requirement that right at the beginning, players need to have a clear overview of where they are and where they have to go. When the tour starts for the first time, the player is placed on the beach oriented towards the forest, assuming that the VR-HMD in the real world is oriented in its default forward direction (see Fig. 6). A path leads towards the forest with several human-made elements such as wooden fences, accentuating the visual cues that try to lead the player into the forest. As the players look straight ahead they can discover the first HR-event. The space around it and on the rocks behind the players defines one of the larger open spaces in the game, with the intention of inviting the players to learn the teleportation mechanic and giving them plenty of physical space for exploration. Behind the players the ocean separates them from the rest of the island. Several larger landmarks are visible, which are intended to guide the players further into the island (see Fig. 7).
– Zone 2: Forest. Apart from a few trees scattered around the island, the forest zone is the only remnant of the originally planned forest that was supposed to be the entire experience. The forest is situated on an elevated plateau above the beach and is reached by a dirt path that leads up the cliffs. Once in the forest the path forks into two directions. One, marked with fences and a clear opening in the undergrowth, leads to a clearing where the “animal petting” relaxation event awaits (see Fig. 8).
Fig. 8. Forest clearing with deer.
Fig. 9. View of the ocean from the cliffs.
– Zone 3: Cliffs. The cliffs are a purely explorational zone. Here the player is challenged to teleport from platform to platform over the water and experience being high up on a windy cliff. Ropes with flags that flutter in the wind are strung between the cliff rocks to support the atmosphere (see Figs. 9 and 10).
Fig. 10. Top view of the cliffs.
Fig. 11. Throne Room Overview
– Zone 5: Waterfall. In its initial state the waterfall is not visible, but by relaxing in the relaxation event the player is able to gradually make it appear and flow again.
– Zone 6: Port. The port is the first area where the player encounters human-made structures. An overgrown ruin with partially intact elements surrounded by trees and water presents itself to the player. From here a teleporter can be taken up to the main ruin on the mountain.
– Zone 7: Ruins. The ruins are designed as an exploration area that tries to be less linear than the rest of the game. There are many more side paths and hidden areas for discovery (see Fig. 11). Thematically the ruins are meant as a solidification of a past human presence on the island, layered on top of the even older “elder ruins”.
– Zone 8: Ruins: Fountain Room. The fountain room can be entered by completing an HR-event that opens the door to the zone.
– Zone 9: Ruins: Tar Bazaar. The basic concept for this zone came from having some form of stress-inducing event that distracted the player from concentrating purely on relaxing. The idea was to have a constant stream of stimuli coming at the player while they tried to relax. During development the spatial and thematic structure of the area where the “clean-up event” was supposed to take place was changed. Visually it became a sunken city square in a tar pit (hence the name “Tar Bazaar”). In this “clean-up event”, the player has to fish garbage out of a tar pit while relaxing. The player has “wind vacuums” attached to their hands which suck the garbage out of the tar at a distance. As the player relaxes, the flow of trash decreases and the tar pit slowly turns into a lake. Upon completion waterfalls start flowing again and the player can proceed towards the cave.
– Zone 10: Cave. The cave is accessed through a door that is opened with a relaxation event and is split into two parts: the cave ruins and the crystal cave. The cave ruins are the first part of the cave, serving as a transition zone to the crystal cave. Thematically it is part of an underground passageway through the mountain, built by the humans who also built the ruins. To give a sense of size beyond the limits of the actual physical space the level inhabits, a sky dome with a projection of a larger cave was used as a background. Objects such as hills and castle ruins were placed at quickly decreasing scales towards the back of the cave to increase the sense of depth.
– Zone 11: Crystal Cave. The almost completely unlit crystal cave forms the rear part of the cave. Here the player lights up glowing crystals mounted on the walls in a relaxation event. The intent was to have the player feel in control of a dark place with limited vision and make an otherwise unwelcoming place seem magical.
– Zone 12: Forest Lake. After exiting the cave the player passes through a patch of conifer trees to the forest lake. Here a relaxation event allows the player to lift a sunken bridge out of the water, allowing the player to pass to a platform with a teleporter in the center of the lake. This is the last relaxation event and is rewarded with the teleporter to the top of the mountain.
– Zone 13: Mountain Top. From the mountain top area the player is able to see the entire island and, through the light beams that each completed relaxation event emits, see all their achievements.
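The gating of zones by relaxation events can be pictured as a small directed graph whose passages open once an event is completed. The sketch below is an illustration in Python, not the authors' Unreal Engine level setup. The event names are hypothetical, and the passage list is only partly taken from the zone descriptions above; connections marked as assumed are not stated in the text.

```python
from collections import deque

# (from_zone, to_zone, gating_event); None means the passage is always open.
PASSAGES = [
    ("Beach", "Forest", None),                       # dirt path up the cliffs
    ("Port", "Ruins", None),                         # teleporter to the main ruin
    ("Ruins", "Fountain Room", "fountain_door"),     # door opened by an HR-event
    ("Ruins", "Tar Bazaar", None),                   # assumed connection
    ("Tar Bazaar", "Cave", "cave_door"),             # door opened by a relaxation event
    ("Cave", "Crystal Cave", None),                  # rear part of the cave
    ("Crystal Cave", "Forest Lake", None),           # assumed: lake follows the cave exit
    ("Forest Lake", "Mountain Top", "lake_bridge"),  # bridge, then teleporter
]


def reachable_zones(start, completed_events):
    """Return the set of zones reachable from start via open passages."""
    open_edges = {}
    for src, dst, gate in PASSAGES:
        if gate is None or gate in completed_events:
            open_edges.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the open passages
        zone = queue.popleft()
        for nxt in open_edges.get(zone, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `reachable_zones("Ruins", set())` leaves the Fountain Room out of the result, while adding the hypothetical `"fountain_door"` event to the completed set includes it.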
4 User Study and Results
To evaluate the sense of presence experienced in the game, a user study was conducted. Twenty people of varying ages, genders and VR experience were given 15 min to play in the forest and were then asked to fill out a questionnaire based on the IGroup Presence Questionnaire (IPQ) [4]. The IPQ evaluates 3 main categories:
– Spatial Presence, which is the sense of being in the virtual environment.
– Involvement, which is how much attention the user devoted to the virtual environment.
– Experienced Realism, which is the subjective realism of the virtual environment.
Additionally there is one item to assess the general “sense of being there”.
Fig. 12. Questionnaire results
After evaluating the questionnaires, the data was plotted in a spider chart (see Fig. 12). We find that users had a high sense of being spatially present, which we attribute in part to the good tracking of the HTC Vive and the high frame rate. Experienced realism achieved a comparatively high rating. Deeper analysis showed that most users did not find the actual setting realistic but experienced the place as realistic. Some items of the questionnaire directly tested whether the user perceived the tour as realistic when compared to their real world. We find this to be problematic for experiences that do not seek to recreate real-world places. Such items would be more suited to experiences such as virtual house viewings than to virtual fantasy places. We conclude that people can experience a fantasy ruin as realistic and feel present in that space, while knowing very well that this space does not exist anywhere else in their real-world experience.
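The aggregation behind a spider chart like Fig. 12 can be sketched as a mean rating per category. The following Python fragment is only an illustration under stated assumptions: the item identifiers and the mapping of items to categories are hypothetical placeholders rather than the actual IPQ item keys, the ratings are made-up toy values and not the study data, and any item recoding the questionnaire requires is assumed to have been applied beforehand.

```python
from statistics import mean


def subscale_means(responses, item_to_subscale):
    """Average all ratings that belong to the same category.

    responses: one dict per participant, mapping item id -> rating.
    item_to_subscale: mapping item id -> category name.
    """
    by_scale = {}
    for answer_sheet in responses:
        for item, rating in answer_sheet.items():
            by_scale.setdefault(item_to_subscale[item], []).append(rating)
    return {scale: mean(values) for scale, values in by_scale.items()}


# Hypothetical item ids and made-up ratings, for illustration only.
ITEM_TO_SUBSCALE = {
    "sp1": "Spatial Presence",
    "sp2": "Spatial Presence",
    "inv1": "Involvement",
    "real1": "Experienced Realism",
    "g1": "General",
}
toy_responses = [
    {"sp1": 5, "sp2": 4, "inv1": 3, "real1": 2, "g1": 5},
    {"sp1": 3, "sp2": 4, "inv1": 5, "real1": 4, "g1": 3},
]
axis_values = subscale_means(toy_responses, ITEM_TO_SUBSCALE)
```

Each entry of `axis_values` would then supply one axis of the spider chart.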
5 Conclusion
Creating a computer game is not a trivial task. In this paper we describe the processes involved in the development of Conquer Catharsis, starting with the design and planning, then how the assets were prepared and what technical decisions were made and also covering some of the ethical considerations made. In a user study we evaluated how present a player feels in the tour. The environment will serve as the basis of further research into the topic of biofeedback relaxation therapy.
References
1. Beyer, A., Lohaus, A.: Konzepte zur Stressentstehung und Stressbewältigung im Kindes- und Jugendalter. In: Stress und Stressbewältigung im Kindes- und Jugendalter, pp. 11–27 (2007)
2. Bozgeyikli, E.: Locomotion in virtual reality video games. In: Lee, N. (ed.) Encyclopedia of Computer Graphics and Games, pp. 1–6. Springer International Publishing, Cham (2018). ISBN: 978-3-319-08234-9. https://doi.org/10.1007/978-3-319-08234-9-186-1
3. Caillois, R., Barash, M.: Man, play and games. University of Illinois Press (2001)
4. igroup presence questionnaire (IPQ) overview. http://www.igroup.org/pq/ipq/index.php
5. McNamara, S.: Stress in Young People: What's New and What To Do. A&C Black (2000)
6. Miloff, A., et al.: Single-session gamified virtual reality exposure therapy for spider phobia vs. traditional exposure therapy: study protocol for a randomized controlled non-inferiority trial. In: Trials 17.1, p. 60, February 2016. ISSN: 1745-6215. https://doi.org/10.1186/s13063-016-1171-1
7. Ougrin, D.: Efficacy of exposure versus cognitive therapy in anxiety disorders: systematic review and meta-analysis. In: BMC Psychiatry 11.1, p. 200, December 2011. ISSN: 1471-244X. https://doi.org/10.1186/1471-244X-11-200
8. Ramirez, E.J.: Ecological and ethical issues in virtual reality research: a call for increased scrutiny. Philosophical Psychol. 32(2), 211–233 (2019). https://doi.org/10.1080/09515089.2018.1532073
9. Runck, B.: What is Biofeedback? https://psychotherapy.com/bio.html
10. Mayo Clinic Staff: Relaxation techniques: Try these steps to reduce stress, April 2017. https://www.mayoclinic.org/healthy-lifestyle/stress-management/in-depth/relaxation-technique/art-20045368
11. Watson, R.: Uncanny Valley – Das Phänomen des unheimlichen Tals. In: 50 Schlüsselideen der Zukunft, pp. 136–139. Springer, Berlin, Heidelberg (2014). ISBN: 978-3-642-40744-4. https://doi.org/10.1007/978-3-642-40744-4-35
Virtual Reality Games for Stroke Rehabilitation: A Feasibility Study
Supara Grudpan1,2, Sirprapa Wattanakul1, Noppon Choosri1(B), Patison Palee1, Noppon Wongta1, Rainer Malaka2, and Jakkrit Klaphajone3
1 College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand {sgrudpan,sipraprapa.wk,noppon.c,patison.p,noppon.w}@cmu.ac.th
2 Digital Media Lab, TZI, University of Bremen, Bremen, Germany {sgrudpan,malaka}@tzi.com
3 Department of Rehabilitation, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand [email protected]
Abstract. The success of stroke rehabilitation therapy is highly associated with patient cooperation. However, the repetitive nature of conventional therapies can frustrate patients and decrease their discipline in carrying out the physical therapy program. Serious games have shown promising outcomes when applied to tasks that require human engagement. This research focuses on sharing experiences and lessons learned from designing serious games using VR technology in cooperation with medical experts, including rehab physicians, occupational therapists and physiotherapists, to identify requirements and to evaluate the game before applying it with stroke patients. The objective of the game is to create an immersive environment that encourages the patient to exercise for recovery from stroke-induced disabilities. It is carefully designed to fit stroke sufferers in Thailand while integrating proper clinical physiotherapeutic patterns based on conventional therapy. Game design challenges for stroke patients and the solutions applied in our games are described. The results of our preliminary field test revealed positive feedback on enjoyment and game features from physicians and physiotherapists. Finally, technical issues and suggestions for improvement were collected to adjust the game for the clinical trial with stroke patients in the next phase.
Keywords: Serious games · Virtual reality · Rehabilitation · Stroke · Participatory design
1 Introduction
Stroke rehabilitation therapies require patients to train muscles for which the control chain downwards from the brain has been damaged. The therapy is challenged by improper and irregular training due to poor patient engagement [7]. An effective recovery process requires patients to cooperate in a training program designed by a medical rehabilitation team [9]. Conventional training practices are often monotonous and can be discouraging, resulting in an ineffective outcome. Serious games and gamification have shown promising motivational effects when applied to such tasks that require human engagement [3,15], especially for personal fitness or exercise applications [10,11]. In order to develop more effective therapeutic tools for clinical settings, we co-investigated the practical requirements with a clinician team. In this way, we avoid the pitfalls of impracticality that affect many existing digital solutions lacking proper integration of medical expertise [13]. We applied several user research approaches and conducted a number of formative tests throughout this feasibility study to answer substantial questions. The questions in focus include: What are the characteristics of successful serious game platforms that are suitable for the target users? How can the conventional practice of physical therapy be transformed into a Natural User Interface (NUI)? And what are the non-functional requirements and limitations to consider?
Supported by organization x.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 163–175, 2020. https://doi.org/10.1007/978-3-030-65736-9_15
2 Related Work
Game-based rehabilitation systems have been used to motivate patients [2,10]. The rehabilitation process for stroke patients requires regular muscle exercise carefully guided by physiotherapists and/or occupational therapists. Moreover, a key factor in stroke recovery is the cooperation of the patient. Frequently, patients lose the motivation and discipline to continue the intervention effectively. Game-based rehabilitation systems are tools that motivate patients with more appealing environments and a diversity of fun game elements [7,10].
Virtual Reality (VR) has been researched as a treatment in neurorehabilitation. Virtual environments in which the patient interacts with assets similar to real-world objects can draw the patient's attention into the virtual activities. Furthermore, interactive responses to their activities via both visual and auditory feedback can raise their motivation to achieve the training objectives, with additional scores and rewards showing and stimulating their progress [20].
VR exergames are a distinct type of game for rehabilitation. Commercial exergame systems such as Nintendo Wii and Xbox Kinect-based systems have been studied with stroke patients, especially in upper extremity motor training. Saposnik [14] compared the effectiveness of Wii games with that of recreational activities such as playing cards, bingo or ball games for patients with mild-to-moderate upper extremity motor impairment. Motor function improvement, measured by the Wolf Motor Function Test (WMFT), showed no significant difference. However, Prachpayont and Teeranet [12] assessed the effects of Wii-hab in hemiplegic stroke patients. They found that the subject should have adequate hand muscle strength to hold the Wii controller. Additionally, the distance between the patient and the monitor screen distracted the patient's attention from the training.
With Kinect technology (e.g. the Xbox VR system), patients can interact with and control the game play without holding a controller. An infrared sensor embedded in the device captures the limb and body motion of the player. The game extends rehabilitation training doses in both duration and repetitions within a session [1]. Trombetta [19] developed 3D VR-based games in 3rd person perspective using the Kinect for stroke rehabilitation. They were designed to engage players with elements such as movement guiding, score, play status, and stimulating sounds. Displaying the game on a television and on a head-mounted display (HMD) affected the subjects differently. HMDs gave a better sensation of immersion with more attentiveness throughout the game. When players wear an HMD, they see the closed virtual world as if from a 1st person view. However, this game was visualized in the 3rd person perspective, so players needed to adjust their sense of movement when wearing the HMD to interact with objects in the game. Moreover, the Kinect sensor cannot detect finger motion well. Hand and finger movements are significant motor functions for performing daily activity tasks and reflect recovery in stroke patients.
We considered the HMD an interesting device for user immersion. The user wears it like goggles that display VR as if the player were in the virtual world. The Oculus Rift technology provides both the HMD and the Oculus Touch controller. The Oculus Touch is a hand controller with wristband support, so it can be used by stroke patients who have little grip strength. The hand controller provides haptic sensation, which is useful for stroke patients to practice their hand and finger muscles in grabbing, holding and touching an object.
The haptic technology in the Oculus Touch stimulates the sense of touch by creating vibration, which gives the user direct feedback on interactions with the game. Controllers can be used in both hands, and the information of each hand can be controlled and recorded separately [4]. To our knowledge, a comprehensive study of a VR game using a commercial haptic controller designed for training the upper limbs of stroke patients is still lacking. We applied the aforementioned approach as a framework to develop a system that mimics and enhances the conventional stroke rehabilitation program for the upper limb motor system of stroke patients. The intervention requires adjustment of the intensity and the specific task under the supervision of a therapist. Both instructors (therapists) and patients were considered users of the system, which was designed to be user-friendly, leading to more engagement.
3 Virtual Reality Based System for Stroke Rehabilitation
3.1 Development Method
We developed a set of games tailored to stroke patients following a participatory design approach. We established a collaboration with a rehabilitation team from the Department of Rehabilitation Medicine, Faculty of Medicine, Chiang Mai University, who provided the core knowledge of stroke rehabilitation and contacts to patients, physiotherapists, occupational therapists, physicians and nurses. In the first step, the rehabilitation team demonstrated sessions of stroke rehabilitation intervention, in which the development team gained insights into patients' movement at various disability levels and into training requirements through contextual observation. The main aim of the game was to encourage patients to exercise their muscles in a similar way to conventional therapy but with more motivation.
Concerning game design and technology, we first chose a gaming platform suitable for an exergame in which a character is controlled through physical interaction with full-body motion (Physical User Interface, PUI) rather than through standard controllers such as a joystick, mouse, or keyboard. The potential platforms were Wii and Kinect. We conducted a field study to generate a Customer Journey Map to visualize the pain points of patients during the cycle of therapy. It was found that patients typically lose interest after a short practice and suffer from boredom. To improve the patient experience, we decided to choose a PUI that can integrate Virtual Reality technology to create an immersive environment. Therefore, we decided to adopt the Oculus Rift and Oculus Touch.
The development of our games was initiated by creating throw-away prototypes for testing basic game mechanics and input devices. The games were implemented in an iterative process over a period of 16 months. A number of formative tests were conducted with a rehabilitation team to fine-tune the pace and levels of difficulty of the games to match patients' capability. The summative evaluation was conducted after the first release. Then we recruited rehabilitation residents and staff as well as physical therapists and other healthcare practitioners (in total 37 participants) to test the game.
During the test session, they were asked to think aloud and to give feedback after finishing the session. We finished the development phase by testing the game with volunteer patients. The first patient we recruited to test the game was a spinal cord injury patient, not a stroke patient, who had no mobility impairment of the upper limbs at all. We conducted this preliminary test to see whether it was practical to set up and apply the game as an intervention in conjunction with conventional therapy at the In-patient Department (IPD). We finally conducted a short test session with stroke patients under the supervision and assistance of the medical rehabilitation team.
3.2 Design Principles
Through the participatory research and experience during the development of our VR games, the following principles were identified. These principles should be taken into account when designing games for stroke rehabilitation focusing on the upper limb.
Instruction and Support System: Abilities and motor skills of stroke patients vary depending on the patient's specific symptoms, state of disease and daily living activities. The level of the game should be suitable for the individual patient's abilities and motor skills. It is important to have instructors who schedule the training program for the patients and monitor them during play sessions. Thus, the important parameters that relate to the patient's movement and the intensity level of the exercises should be adjustable by the instructor. Examples of adjustable parameters are movement speed, interval and number of obstacles (game level). Additionally, in conventional training, when patients feel weary or struggle during the training session, instructors (therapists) persuade them to continue by giving encouragement. They use cheering words or compliments such as “good”, “a bit more” or “nearly finished” to keep patients performing the tasks through the session. The game should also provide features or a support system that allows instructors to conduct the training with a similar approach to encouraging patients.
Appropriate Setting of VR Device and Movements: Using VR devices in exergames for stroke patients requires consideration of the patients' conditions. Thus, the game design should consider the setting of distances in the simulation and in the real world to avoid injury to patients while playing. Feedback on early prototypes showed that the game required players to move their whole body in order to reach the object in the game. This circumstance can impose an injury risk on the patient. Additionally, the way of holding the game controller should mimic the gestures in the virtual scenario of the game. Initially, our game prototype was designed to be controlled with one hand, but there were suggestions to use and train both hands of the patients in parallel. Besides, the movement of the fingers should be detectable via the control device in order to collect data and evaluate the movements.
Familiarity with the Game Content: In order to make patients feel immersed in the game, the game should have a theme with which the patients are familiar. Most stroke patients are elderly people, which poses a challenge in designing video games for this target group [17]. The game content should be related to their experiences or daily life. For example, physiotherapists mentioned that “Most Thai elderly patients could not get accustomed to table tennis game (Wii Sport) because they have never played this kind of sports before”. Because the game content and game scenario are key elements for motivation and for game selection, we designed our game around the theme of a Thai temple fair. Three well-known activities that can be found at any Thai temple fair are used as scenarios in our mini-games.
Encouragement Through Rewards and Positive Feedback: A major challenge of conventional physical and occupational therapies is that patients are likely to give up the training when they feel frustrated by failing to achieve the goal straight away [2,7]. Rewards and positive feedback are effective game mechanics that are used to motivate players [15]. These elements show the progress of players, which can increase their engagement with the games. Game elements such as scoring and showing reward items (e.g. stars) are used to increase the motivation and engagement of players. Additionally, juicy mechanics such as effects and sound also help to increase the aesthetics and fun of the game [6,8].
3.3 Implementation
The virtual reality-based system was intended to be deployed for physicians and physiotherapists at Maharaj Nakorn Chiang Mai Hospital. The games were meant to be an adjunctive tool to improve the effectiveness of upper limb rehabilitation via immersive experience. It was also our goal to develop software that reduces the need to purchase expensive software or devices such as Armeo Therapy. The games were implemented using the Unity 3D game engine and could be executed on a PC. The user interacted with an HMD (Oculus Rift) and a touch controller (Oculus Touch) that supported touch sensitivity and feedback when grabbing game objects. This seemed to be particularly suitable for movements that required stroke patients to practice using their upper limb muscles.
3.4 Games
According to the design principle mentioned above, we developed three minigames which are shown in Fig. 2. The games contained features to encourage the patient by immersing the patient in the Thai temple fair atmosphere where they had to make movements to complete the virtual reality game. The games helped patients to train in three patterns of movement identified by rehabilitation team. The first pattern reinforced practicing a reaching-out gesture, which involved movements by the following muscles: Deltoid, Triceps, Biceps, Pectoralis and Serratus anterior muscles through “Ice-cream selling” game (Fig. 1). The second pattern was arm-sweeping from left and right, which involved the action of the following muscles: Deltoid, Pectoralis, Infraspinatus, Latissimus, Rhomboid and Trapezius through a “Gun shooting” game. The third pattern training is to practice vertical arm movements with the following muscles: Deltoid, Triceps, Trapezius and Serratus anterior through the “Star picking” game. Moreover, in every game, players or patients needed to use their small muscles to trigger picking up virtual objects. The games were designed to support two types of users which are Player (participants or stroke patients) and Instructor (physiotherapists, occupational therapists or physicians). In the main pages of game, participants were able to select to play one of the three games. Every game allowed instructors to adjust difficulties of the games by going to Game setting menu. The information that instructors needed to give in the setting menu was player’s name, time period of game play, number of obstacle items, number of input controller (controlled by one or two hands), and grasping patterns (to hold game controller by middle finger, index finger, and to use both thumb and index finger). Additionally, the games allowed instructors to encourage and support players during play sessions. The instructor had a keyboard used as separate input from
Virtual Reality Games for Stroke Rehabilitation: A Feasibility Study
169
Fig. 1. Examples of player’s movement, Left: vertical movement; Right: horizontal movement
player’s game controller. He/she could send stickers to encourage and use help features for helping players when they were in a difficult situation (Fig. 3). Moreover, the instructor had the authority to pause and quit the game anytime in order to avoid injuring or over training patients. The “Ice-Cream Selling Game” is the first-person game that participants play a role as ice-cream seller. This game encourage reaching out gesture (straighten and pulling movements) in both horizontal and vertical. The player has to make ice-creams by shaking hand ice-cream maker (horizontal) and give the ice-creams to customer (vertical) within the limited time in order to get the score. The sequence of movements that player need to do in play session in order to get the score are as following: 1). To grasp ice-bucket/holding controller, 2). To shake ice-cream maker until ice-cream ready/rotating their arm in horizontal, and 3). To grasp ice-cream from the ice-cream maker to customer/To press button on hand controller and give to customers who show different color of icecream on their face (Fig. 2). The instructor able to adjust the difficulties of the games which depend on frequency of customers and period of time for rotating ice-cream maker. “Gun Shooting Game” was a first-person shooting game where players joined shooting booths of a Thai temple festival scenario. This game motivated players to perform arm-sweeping movements from the left and right and vice versa in order to shoot random targets in the game. The movement in this game required players to sweep the arm horizontally to point the gun with the help of a laser beam (Fig. 2). Different scoring with different target characteristics was another game element. To trigger the gun, a player needed to push the button on the hand controller. The difficulties of the game could be adjusted by changing the frequency of target popup, moving speed of the target and the time for reloading bullets. 
“Star Picking Game” was a first-person game where players attended a booth to draw lots and played a customer’s role, picking ruffled papers that were made in a star shape. This game encouraged arm-lifting movements involving both upward and downward hand motions, mainly in the vertical direction. Players had to move their hand to reach the position of one of the selected stars, which were hung on the tree (with random appearance). After getting the star, players had to keep the button pressed until the halo surrounding that star disappeared and could then release the button (Fig. 2). The instructor could adjust the level of difficulty in this game by changing three parameters: the force that was used for picking stars, the number of stars, and the time for holding stars.
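The three adjustable parameters of the Star Picking Game can be pictured as a small settings record. The following is only a sketch under stated assumptions: the class name, field names, units and value ranges are hypothetical and are not taken from the prototype. It maps a continuous level between 0 and 1 to the three parameters, which is one way an instructor-facing scale could be made fine-grained.

```python
from dataclasses import dataclass


@dataclass
class StarPickingDifficulty:
    """Hypothetical settings record; names, units and ranges are assumptions."""
    pick_force: float    # relative force needed to pick a star (0..1)
    star_count: int      # number of stars hung on the tree
    hold_seconds: float  # time the button must stay pressed


def lerp(low: float, high: float, t: float) -> float:
    """Linear interpolation between low and high for t in [0, 1]."""
    return low + (high - low) * t


def star_picking_preset(level: float) -> StarPickingDifficulty:
    """Map a continuous level in [0, 1] to settings (illustrative ranges only)."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0 and 1")
    return StarPickingDifficulty(
        pick_force=lerp(0.2, 1.0, level),
        star_count=round(lerp(3, 12, level)),
        hold_seconds=lerp(1.0, 4.0, level),
    )
```

A continuous level avoids being limited to a few fixed presets; the end points of each range would have to be chosen together with therapists.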
Fig. 2. Top left: Selling ice-cream; Top right: Shooting; Bottom left: Picking stars game; Bottom right: Game menu
Fig. 3. Left: Screenshot of help feature in Selling ice-cream game, Right: Controller for instructor
4 Field Study
Once we had completed the development of the game prototypes, we set out to verify the feasibility of the games for clinical practice through an assessment by stroke rehabilitation experts. We conducted a 2-day workshop with 14 physiotherapists, 7 occupational therapists, 13 rehab physicians and 2 nurses. The participants had knowledge of and experience in interventions for stroke patients involving activity and movement enhancement. During the field study, data were collected using observation and interview methods to acquire feedback for improving the VR game system.
4.1 Setup and Procedure
On the first day, the participants were trained and had a chance to experience the VR technology for stroke therapy. The study of VR technology in medicine was new in Thailand. Some therapists had experienced commercial game systems such as the Nintendo Wii and the Xbox 360 Kinect. However, commercial software and tailored software differ in the functional and non-functional properties that shape the user experience. Therefore, an introduction session covered Virtual Reality in therapeutic contexts, the principles of gamification, stroke recovery and the tailored program for stroke patients. This session conveyed an understanding of how VR game systems are linked to stroke patients’ muscle movement and neurological recovery factors. Game mechanics such as scoring, sound and real-time feedback had the potential to increase patients’ motivation and concentration. In addition, the collected data on duration, repetition, difficulty, movement shape, etc. could be beneficial for therapeutic analysis. In the last session of the day, there was a workshop demonstrating the use of the developed VR game system, i.e., the VR devices and the VR game application, which the participants tried out.
On the second day, the participants used the developed VR games and we collected evaluation data by way of observation and participant interviews. We provided the devices, including an Oculus headset and an Oculus Touch controller connected to a computer. The game visuals were displayed on both the Oculus headset and a computer screen. Within a 5-minute interval, each participant put on the interaction devices and played one game at 3 levels (easy, medium, hard). During the game, an observer monitored the activities of the participant and the play on the computer screen. After the game play, the participant was asked to evaluate the game play. Each participant played all three games (Ice-cream selling, Gun shooting and Star picking).
4.2 Results and Discussion
Based on the observations and interviews, we collected and analyzed significant points with the medical rehabilitation team. Overall, the game features worked well, and we gained further comments on functional and non-functional issues.
Game Functional: The participants confirmed that the main functions of the game were suitable for training the upper-limb muscles of stroke patients. We list the significant suggestions and limitations of the game according to the feedback as follows.
The game setting scale for the instructor: in order to adjust the game level to suit a particular patient’s condition, the game provided a user interface in the form of a scale with which the instructor could adjust game parameters such as the game pace, the number of target objects, play time, arm-reaching distance and the control of the Oculus Touch controller. However, the range of the setting scale in the prototype was rather limited. Even when some parameters were set to their lowest bound, the game seemed to be too fast for stroke patients. The scale should be finer and wider.
The scoring system: plus and minus scores should be properly balanced to engage players. Minus scores could be discouraging, while scores that were too high could be ignored by players. The minus score was used as a mechanic to give players immediate feedback on how well they were achieving the goal. For example, in the “Ice-cream selling” game, the player not only had to make and pick up the ice-cream but also had to check that the ice-cream color matched the color shown on the customer’s face. The plus score, in contrast, was used to reward a challenging activity that required a skill level which the player had never reached before. This rewarding score was applied to improve the player’s skill [5]. An example of such a reward was found in the “Star picking” game, where a player who picked more stars hung at a greater distance got a higher score than one who picked mostly closer stars.
Game Non-functional: Overall, the participants liked the graphics and theme, especially the visualization and effects. We list the feedback related to those elements as follows:
– Sound: Background music and stimulating sound effects were interesting and made the game fun.
– Visual: Adding visual particles as a feedback effect on an achievement was suggested.
– Touch: Adding vibration on the controller to notify the player of an achievement or to stimulate the player during the game would be useful.
We also received feedback focusing specifically on the features of particular games, as described below.
Ice-Cream Selling: This game seemed to fit well with stroke patients, especially those with isolated joint movements (Brunnstrom motor recovery stage 6). The player used the index finger or the thumb to control the grabbing of the ice-cream stick. However, the majority of patients who had had an acute stroke attack usually had distal muscle paralysis, and an additional game level for patients who could not control their fingers was suggested.
Gun Shooting Game: The participants appreciated that the game design let them choose any combination of left-hand and/or right-hand control, which made the game usable by hemiplegic patients. However, the scoring of each hand should be separated. The game mechanic was also intended to stimulate the use of both hands, with a configurable task assigned to each hand.
Star Picking Game: The game required the arm to reach out mainly in the vertical direction, but extreme reaching-out motions could often cause a risk of falling from the chair. Body strapping or seat adjustment during play was suggested. In addition, the motion sensitivity of the sensor should be adjusted to reduce body movement.
5 Conclusion and Future Work
The objective of our work was to study how to develop games specifically for training stroke patients. The working process required knowledge from multiple disciplines, including medicine, physiotherapy, and game design, and needed an interdisciplinary team to build a prototype that took into account direct feedback from medical experts and the users (patients). Thus, we employed a participatory design approach and worked closely with rehab physicians, physiotherapists, and volunteer patients in order to receive iterative feedback. The field study confirmed the design principles that we identified during the development process. Specifically, the roles of the Instruction and Support system and of Rewarding and Positive Feedback were highlighted. Even though the games provided features for the instructor to adjust their difficulty, they required a finer scale for tuning some parameters, which should be adjustable for individual patients. This issue offers potential for future work, such as adapting games for people with disabilities rather than for able-bodied people. Regarding the game content, most of the participants enjoyed the game content, vitality, and sound. Concerning difficulty adjustment, another issue to be looked at in the future is balancing game difficulty and exercise difficulty in separate models, as proposed in the dual flow model for exergames [16].
Our field study was only a limited and preliminary study to evaluate the game features with medical experts. In the next step, we want to evaluate the game with real patients and to evaluate the instructor interface in more depth with therapists and physicians. Moreover, the system needs to be tested over a long period to evaluate its long-term effects on patients and its usefulness for therapy. So far, only a few studies of exergames have shown evidence of long-term effects [18]. Our user tests during the development process and the field study revealed that participants in general enjoyed this new way of training. Moreover, the physicians who were involved in this project confirmed that these games met the basic requirements for training stroke patients who have dysfunction in the upper limb. However, these games still need further evaluation in a more extensive clinical trial with stroke patients in order to investigate long-term effects and potential negative effects. Additionally, the game should record and analyze player data in a privacy-preserving way. This recording should be done in a non-intrusive manner so that it will not affect the player’s clinical rehabilitation. In the future, we will further work on analyzing and modeling all physiological data on horizontal and vertical movements.
The next phase of this project will focus on the relationship between the game scoring and the results of clinical rehabilitation measurements. The clinical measurements would include (1) the Fugl-Meyer assessment of the upper extremity motor score (FMA), (2) the Wolf motor function test (WMFT), which evaluates hand and arm movements, (3) the Barthel index (BI), which evaluates the rehabilitation process, and (4) Arma-TH, which evaluates the use of hands and arms in doing activities. This evaluation will be implemented in the second phase of our project.
References
1. Aşkın, A., Atar, E., Koçyiğit, H., Tosun, A.: Effects of kinect-based virtual reality game training on upper extremity motor recovery in chronic stroke. Somatosens. Motor Res. 35(1), 25–32 (2018)
2. Assad, O., et al.: Motion-based games for Parkinson’s disease patients. In: Anacleto, J.C., Fels, S., Graham, N., Kapralos, B., Saif El-Nasr, M., Stanley, K. (eds.) ICEC 2011. LNCS, vol. 6972, pp. 47–58. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24500-8_6
3. Deterding, S., Dixon, D., Khaled, R., Nacke, L.: From game design elements to gamefulness: defining “gamification”. In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, pp. 9–15 (2011)
4. Facebook Technologies: Oculus rift features. https://www.oculus.com/rift/
5. Fullerton, T.: Game Design Workshop: A Playcentric Approach to Creating Innovative Games. CRC Press (2014)
6. Hicks, K., Gerling, K., Dickinson, P., Vanden Abeele, V.: Juicy game design: understanding the impact of visual embellishments on player experience. In: Proceedings of the Annual Symposium on Computer-Human Interaction in Play, pp. 185–197 (2019)
7. Hung, Y.X., Huang, P.C., Chen, K.T., Chu, W.C.: What do stroke patients look for in game-based rehabilitation: a survey study. Medicine 95(11) (2016)
8. Hunicke, R., LeBlanc, M., Zubek, R.: MDA: a formal approach to game design and game research. In: Proceedings of the AAAI Workshop on Challenges in Game AI, vol. 4, p. 1722 (2004)
9. Klaphajone, J.: Rehabilitation medicine for general practitioners. Sutin Supplies Limited Partnership (2006)
10. Malaka, R.: How computer games can improve your health and fitness. In: Göbel, S., Wiemeyer, J. (eds.) GameDays 2014. LNCS, vol. 8395, pp. 1–7. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05972-3_1
11. Malaka, R., Herrlich, M., Smeddinck, J.: Anticipation in motion-based games for health. In: Nadin, M. (ed.) Anticipation and Medicine, pp. 351–363. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-45142-8_22
12. Prachpayont, P., Teeranet, G.: Effects of wii-hab training on motor recovery and motor function of upper extremity in subacute stroke patients: a pilot randomized controlled trial. J. Thai Rehabilitat. Med. 23(2), 64–72 (2013)
13. Rosser, B.A., Eccleston, C.: Smartphone applications for pain management. J. Telemed. Telecare 17(6), 308–312 (2011)
14. Saposnik, G., et al.: Efficacy and safety of non-immersive virtual reality exercising in stroke rehabilitation (evrest): a randomised, multicentre, single-blind, controlled trial. Lancet Neurol. 15(10), 1019–1027 (2016)
15. Schell, J.: The Art of Game Design: A Book of Lenses. AK Peters/CRC Press, New York (2019)
16. Sinclair, J., Hingston, P., Masek, M.: Exergame development using the dual flow model. In: Proceedings of the Sixth Australasian Conference on Interactive Entertainment, pp. 1–7 (2009)
17. Smeddinck, J., Herrlich, M., Krause, M., Gerling, K., Malaka, R.: Did they really like the game? Challenges in evaluating exergames with older adults. In: CHI 2012 Workshop on Game User Research: Exploring Methodologies, Austin, TX, USA (2012)
18. Smeddinck, J.D., Herrlich, M., Malaka, R.: Exergames for physiotherapy and rehabilitation: a medium-term situated study of motivational aspects and impact on functional reach. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 4143–4146 (2015)
19. Trombetta, M., Henrique, P.P.B., Brum, M.R., Colussi, E.L., De Marchi, A.C.B., Rieder, R.: Motion rehab AVE 3D: a VR-based exergame for post-stroke rehabilitation. Comput. Meth. Program. Biomed. 151, 15–20 (2017)
20. Weiss, P.L., Kizony, R., Feintuch, U., Katz, N.: Virtual reality in neurorehabilitation. Textb. Neural Repair Rehabilitat. 51(8), 182–197 (2006)
Interactive Simulation of DNA Structure for Mobile-Learning
Feng Jiang1,2, Ding Lin1,2(B), Liyu Tang1,2, and Xiang Zhou1,2
1 Key Laboratory of Spatial Data Mining and Information Sharing of MOE, Fuzhou University, Fuzhou 350108, China
[email protected]
2 National Engineering Research Center of Geospatial Information Technology, Fuzhou University, Fuzhou 350108, China
Abstract. Mobile Augmented Reality (MAR) is considered to be a promising tool in science classes to foster kids’ imagination. In this paper, a parametric simulation method for the three-dimensional structure of DNA is proposed for middle school biology classes. Affine transformations were used to stack and spiral complementary paired nucleotides in three-dimensional space according to the structural parameters of different DNA conformations. The process of vector construction was dynamically simulated using recognition technologies (i.e., image characteristics and collision detection). The prototype system was developed using mobile technologies and combines multiple interactive methods (i.e., viewing, listening, touching, and physical objects) to provide users with an appropriate way of learning and cognition. The system is useful for learners (especially K-12 students) who want to use fragments of spare time outside class to improve their understanding of DNA. In the future, such technology-assisted education could become a popular open form of learning.
Keywords: Mobile Augmented Reality · DNA conformations · Parametrization · Affine transformation · Mobile technologies
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 176–187, 2020. https://doi.org/10.1007/978-3-030-65736-9_16
1 Introduction
It is difficult to imagine the three-dimensional morphology and the spatial layout of the helical, twisted double strands of DNA, especially for teenage students who first learn about it in the classroom. In biology classes, pictures, texts, videos and so on are commonly shown for demonstration, with limited form and effect [1, 2]. Thus, teachers attempt to use physical objects (e.g., paper clips, foam, etc.) or 3D-printed objects as the DNA components, and stitch them together one by one in the classroom to show a three-dimensional schematic model of the DNA molecule and help students understand the knowledge [3]. These activities usually take much time but have poor visual effects, which impairs teaching and learning. Moreover, the limitations of these physical models prevent them from dynamically showing DNA-related microscopic biological processes (i.e., base complementation, vector construction and so on).
With their advantages (i.e., imagination, interactivity and immersion), virtual reality and augmented reality technologies can be used to present phenomena that cannot be observed or manipulated in the real world (e.g., processes that change too fast or too slowly). Integrated with professional knowledge, augmented reality (AR) builds immersive 3D visualization scenes by accurately overlaying virtual scenes on real scenes to enhance the user’s perception in real environments [4]. It is a new method and a supporting technology to promote teaching [5, 6], and has been adopted widely in STEM courses [7, 8], physics classes [9], biochemistry teaching [10, 11] and other areas. With the popularization of mobile learning in daily life, AR has drawn more and more attention and has become a positive trend [12–14]. Mobile AR cognitive tools can overcome the time and space constraints of classrooms or laboratories and enable students to learn autonomously anytime and anywhere, thus triggering deeper thinking and inquiry-based learning [15–17].
There are two types of MAR technology in science education (i.e., location-based and image-based) [18]. Location-based AR is usually combined with geographic locations, requires real-time data and a large space, and is more suitable for scientific inquiry-based learning [18]. Image-based AR (including marker-based and markerless technology) provides real-time interactive experiments to improve students’ spatial ability, practical skills and understanding of concepts [19]. A number of AR tools have promoted the development of biochemistry teaching [20, 21]. Several software packages can simulate the relationship between the structure and function of biological macromolecules, such as BioChemAR [22], Augment [23], ArBioLab [24], Palantir [25] and so on.
However, most existing tools are dedicated to university students and scientific research. They are not suitable for biology teaching in middle school because of their complexity, their lack of dynamic simulation of microscopic processes, and their poor interactive experience. In this paper, the dynamic process of vector construction and the three-dimensional shape of DNA were simulated parametrically, and an interactive real-time AR-based app was implemented. By adopting the interactive technologies of mobile devices, learning about the microscopic phenomena of DNA becomes fun through finger touches, voice control and interaction with real objects, which are recognized and tracked by image-based natural feature detection. Some mobile experience tests were conducted among middle school students.
2 Simulation of Three-Dimensional DNA Structure
2.1 Double Helix Structure of DNA
DNA is a polymer of nucleotides, each composed of a phosphate group, deoxyribose and one of four nitrogen-containing nucleobases (i.e., adenine (A), thymine (T), guanine (G) and cytosine (C)). The three-dimensional (3D) structure of DNA has the following characteristics [26, 27]: (1) Deoxyribose and phosphate groups are alternately connected by phosphodiester bonds to form the two DNA strands, which run antiparallel to each other; (2) The nitrogenous bases of the two separate polynucleotide strands are bound together by hydrogen bonds, according to the base pairing rules (A-T, C-G).
Modeling DNA Strands. Since the two DNA strands are located on the reference cylindrical surface of the double-helix space, a local reference coordinate system is established at the center of the bottom face so that the y-axis is aligned with the central axis. The affine transformation matrix (i.e., M) is obtained from the parameters of the different DNA conformations (i.e., pitch (d), twist angle (φ), diameter and so on). It is used to move the paired nucleotides along the y-axis by a distance (i.e., d) and rotate them around the y-axis by an angle (i.e., rotation angle (θ)). A point (i.e., H_0) on the cylinder repeatedly undergoes this series of transformations to obtain the local reference frames (i.e., H_i) of the phosphate groups, which are connected by a curve to form the two DNA strands. The local reference frame of the deoxyribose on each paired nucleotide can be calculated using orthogonality. The direction of a phosphodiester bond can be calculated from the local reference frames of the phosphate group and the adjacent deoxyribose. Assume that the direction vectors of the local frame on the deoxyribose of the adjacent paired nucleotide in world coordinates (i.e., F, U, R) and the direction vectors of the local frame of the phosphate group (i.e., f, u, r) are as shown in Fig. 1(a). The orientation of the phosphodiester bond in the world coordinate system (i.e., θ_x, θ_y, θ_z) can then be calculated by formula (1). Similarly, the other information about the phosphodiester bond (i.e., spatial position, length, scale factor and so on) can be obtained. Finally, all the above elements are instantiated and rendered at their corresponding spatial positions.
\[
\begin{bmatrix} \theta_x & \theta_y & \theta_z \end{bmatrix}^{T} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos^{-1}\langle f, F \rangle \\ \cos^{-1}\langle u, U \rangle \\ \cos^{-1}\langle r, R \rangle \end{bmatrix}
\tag{1}
\]
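The strand construction described above can be sketched in a few lines. This is a minimal illustration and not the authors’ implementation: the function names are hypothetical, the rise per step and the radius are illustrative values, and formula (1) is read as taking the angle between each pair of corresponding unit axes.

```python
import math


def rotate_y(point, angle_deg):
    """Rotate a point (x, y, z) about the y-axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def backbone_points(h0, rise, twist_deg, n):
    """Repeatedly rotate about y and shift along y, starting from h0 (H_0)."""
    points = [h0]
    for _ in range(n - 1):
        x, y, z = rotate_y(points[-1], twist_deg)
        points.append((x, y + rise, z))
    return points


def axis_angles(local_axes, world_axes):
    """Angle in degrees between each pair of unit axes, as in formula (1)."""
    angles = []
    for a, b in zip(local_axes, world_axes):
        dot = sum(ai * bi for ai, bi in zip(a, b))
        dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(dot)))
    return angles
```

With a twist of 36° per step, eleven points cover exactly one full turn, so the last point lies directly above the first. Connecting the points with a curve gives one strand.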
Fig. 1. Structure of DNA double helix
Stacking Base Pairs. The dynamic stacking of paired nucleotides in the DNA double-helix space is shown in Fig. 1(b). First, the local reference frame (i.e., F_i) of each paired nucleotide, which is located on the central axis of the double-helix column space, is calculated by matrix multiplication. The local reference frame of the phosphate group on each paired nucleotide can be calculated from the three-dimensional structure parameters of DNA (i.e., pitch (d), rotation angle (θ), etc.). The origins of the phosphate groups’ local reference frames on the same side are connected by a curve to form a target region in which the paired nucleotides move freely. Interpolation is performed according to the preset random sequence of moving steps and time. In this way, the dynamic process of gradually stacking paired nucleotides in the double-helix space is realized by affine transformation.
2.2 Common Structure
Linear DNA is a common DNA structure, which mainly includes B-DNA and A-DNA. The parametric interactive modeling method was used to simulate the 3D shape of different DNA conformations. The aim is to improve students’ understanding of the true structure of DNA and to promote their spatial imagination. By summarizing the regularities of the common DNA structures seen from a top view, the parameters of the double-helix DNA structure in the different patterns can be calculated, as shown in Fig. 2. Assuming that the y-axis is the direction of the helix axis, the local coordinate frame of a base pair (e.g., F_i) can be calculated by formula (2) and formula (3).
Fig. 2. Comparison of the structures of DNA molecules
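\[
\begin{bmatrix} x_i & y_i & z_i & 1 \end{bmatrix} = \begin{bmatrix} x & y & z & 1 \end{bmatrix} T
\tag{2}
\]
\[
\begin{bmatrix} \theta_i & 1 \end{bmatrix} = \begin{bmatrix} \theta & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ \sum_{0}^{i-1} \varphi & 1 \end{bmatrix}
\tag{3}
\]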
Here, (x_i, y_i, z_i) is the three-dimensional coordinate of the center point of the i-th paired nucleotide, and (x, y, z) is the three-dimensional coordinate of the center point of the first paired nucleotide at the bottom of the double-helix space. T is the affine transformation matrix (T_B for B-DNA, T_A for A-DNA). θ_i is the rotation angle of the i-th paired nucleotide, and θ is the initial offset angle of the paired nucleotide in the reference double-helix space. φ is the twist angle of the helix (36° for B-DNA, 33.6° for A-DNA).
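As a short worked example, formula (3) can be read as accumulating one twist angle φ per stacked pair. The expansion below assumes that the sum in the matrix has i equal terms; this is an interpretation of the notation, not something the text states explicitly. The number of pairs per turn follows directly from the twist angles given above.

```latex
\[
\theta_i = \theta + \sum_{0}^{i-1} \varphi = \theta + i\,\varphi ,
\qquad
n_{\mathrm{turn}} = \frac{360^\circ}{\varphi}
\]
\[
\text{B-DNA: } n_{\mathrm{turn}} = \frac{360^\circ}{36^\circ} = 10 ,
\qquad
\text{A-DNA: } n_{\mathrm{turn}} = \frac{360^\circ}{33.6^\circ} \approx 10.7
\]
```

Under this reading, a B-DNA pair returns to its initial orientation after 10 stacked pairs, whereas an A-DNA helix needs slightly more than 10 pairs per turn.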
The paired nucleotides of the B-DNA molecular structure are spaced at an interval D_B along the y-axis and rotated by the angle φ (i.e., 36°), which gives T_B as shown in formula (4):
\[
T_B = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & \sum_{0}^{i-1} D_B & 0 & 1
\end{bmatrix}
\tag{4}
\]
In A-DNA, the inclination of the paired nucleotides is 13°, so the base plane is no longer perpendicular to the helix axis. The effect of force pushes the paired nucleotides outward, forming a wider and deeper groove [28]. Seen from the top, A-DNA shows a “hollow” phenomenon, as shown in the shaded part of Fig. 2. The center of each paired nucleotide (B_i) passes through the helix axis and is then pushed a distance (d) to the opposite side (A_i). At the same time, the rotation transformation is performed with the angle φ (i.e., 33.6°). The transformation matrix (T_A) of A-DNA is given by formula (5).
\[
T_A = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
d \cdot \cos(\varphi \cdot i) & \sum_{0}^{i-1} D_A & d \cdot \sin(\varphi \cdot i) & 1
\end{bmatrix}
\tag{5}
\]
Users can interactively select the type of common DNA and then change the structural parameters of the DNA to obtain different three-dimensional DNA conformations. The paired nucleotides do not completely fill the double-helix space along the axis direction, which gives rise to the “major groove” and the “minor groove”.
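The two transformation matrices can be tried out numerically. The sketch below is written under explicit assumptions: it uses the row-vector convention of formula (2), with the translation in the last row; it takes the sums in formulas (4) and (5) to be i times the per-pair interval; and the numeric values for the intervals and the displacement d are illustrative only. The function names are hypothetical.

```python
import math


def translation_row(tx, ty, tz):
    """4x4 affine matrix for row vectors: the translation sits in the last row."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [tx, ty, tz, 1.0],
    ]


def t_b(i, interval_b):
    """B-DNA: pair centers stay on the helix axis and rise along y."""
    return translation_row(0.0, i * interval_b, 0.0)


def t_a(i, interval_a, d, twist_deg):
    """A-DNA: pair centers are pushed off the axis by d while rising along y."""
    a = math.radians(twist_deg * i)
    return translation_row(d * math.cos(a), i * interval_a, d * math.sin(a))


def apply(point, matrix):
    """Compute [x y z 1] * matrix and return the transformed (x, y, z)."""
    row = [point[0], point[1], point[2], 1.0]
    out = [sum(row[k] * matrix[k][j] for k in range(4)) for j in range(4)]
    return (out[0], out[1], out[2])


# Illustrative values in nanometres (assumptions, not taken from the paper).
center_b = apply((0.0, 0.0, 0.0), t_b(10, 0.34))
center_a = apply((0.0, 0.0, 0.0), t_a(5, 0.26, 0.45, 33.6))
```

In this reading the B-DNA centers remain on the axis, while every A-DNA center lies at the same distance d from the axis, which reproduces the hollow core seen from the top.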
3 Interactive Simulation for Mobile-Learning
3.1 Interactive Design for Mobile Device
Mobile-learning differs from traditional teaching in that it emphasizes students as the main participants. It places different requirements on learning content, learning methods, the learning environment and evaluation. With its advantages (i.e., flexibility, convenience, personalization and so on), mobile-learning can better satisfy learners’ need to learn in their spare time. Based on an analysis of the learning needs, the above-mentioned DNA-related content was reorganized in accordance with the design principles (i.e., layout, moderate content and convenient operation). Existing teaching resources (i.e., videos, problems, pictures, text, etc.) were fully used and combined with voice interaction and touch interaction, which can stimulate the learners’ senses and improve the acceptance of knowledge. Virtual buttons in the UI simplify the interactive operation process and control the switching between modules. A number of microscopic phenomena were dynamically reproduced by combining AR technology with DNA. Several portable cards rich in image features were designed as interactive tools in order to remove the dependence on traditional operating tools.
3.2 Marker-Based Interactive Simulation
Vector construction is a microscopic process of changing the gene sequence according to human intentions, as shown in Fig. 3. It requires a restriction enzyme (e.g., EcoRI, whose recognition site is GAATTC/CTTAAG), a ligase and the genes of interest. In this paper, three pictures were designed as markers (i.e., Mark1 is the restriction enzyme, Mark2 is the genes of interest, Mark3 is the ligase). With the help of the mobile device’s camera, students can interactively manipulate the process of vector construction by moving the markers.
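The cutting step that the restriction enzyme marker stands for can be illustrated on a plain sequence string. The sketch below is a simplified stand-in, not part of the described system: it treats the DNA as a linear top strand only, uses the known EcoRI cut position between G and A (G^AATTC), and its function names are hypothetical.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts after the first G: G^AATTC


def find_sites(sequence, site=ECORI_SITE):
    """Return the start index of every occurrence of the recognition site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def digest_linear(sequence, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Cut a linear top strand at every recognition site and return the fragments."""
    sequence = sequence.upper()
    fragments = []
    previous = 0
    for position in find_sites(sequence, site):
        cut = position + cut_offset
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


fragments = digest_linear("AAGAATTCTTGAATTCGG")
```

Each fragment after the first starts with AATT, the single-stranded overhang that a ligase can later join to a gene of interest cut with the same enzyme. A circular plasmid would need the fragments at both ends to be merged, which this sketch leaves out.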
Fig. 3. Principle of vector construction
This paper mainly used image-based AR technology (to detect and track related objects) and collision detection technology (to achieve event response). The interactive simulation of the vector construction process is shown in Fig. 4. The main steps are as follows:
Fig. 4. Implementation process of Enzyme marker detection and tracking
(1) The target picture is preprocessed to obtain feature information.
(2) The current frame of the video captured by the camera is converted to binary data, and its feature information is extracted.
(3) A feature matching method is used to match the feature information of the target image and the captured image. If the matching is successful, the camera’s current pose information is calculated according to formula (6).
\[
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & b_x \\ 0 & f_y & b_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{3\times 3} & T_{3\times 1} \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
= N \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{6}
\]
Here, (x, y) are the two-dimensional coordinates of the point in the image coordinate system, and s is a scale factor. (X, Y, Z) are the three-dimensional coordinates of the point in the world coordinate system. f_x and f_y are the camera’s focal lengths in the x and y directions. (b_x, b_y) is the pixel coordinate vector of the principal point in the image space. r_{3×3} and T_{3×1} are the rotation matrix and the translation vector of the camera’s transformation. N is the homography matrix.
(4) Using the camera’s pose information, virtual objects are rendered at the appropriate position by affine transformation to achieve the effect of combining the virtual and the real.
(5) The marker is moved in order to trigger collision detection between virtual objects that carry hierarchical bounding volumes, so that the OnTriggerEnter() function is called in Unity3D. If the collision detection is successful, the system responds with the effect of cutting the plasmid, inserting the target genes and so on.
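The projection in formula (6) can be checked with a few lines of arithmetic. The sketch below is a plain pinhole-camera illustration, not the code of the tracking SDK: the function name and the intrinsic values are assumptions, and the rotation and translation are passed in directly instead of being estimated from matched features.

```python
def project(point_world, fx, fy, bx, by, rotation, translation):
    """Project a world point to pixel coordinates following formula (6)."""
    # Camera coordinates: r * [X, Y, Z]^T + T
    cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    s = cam[2]  # the scale factor s equals the depth in camera coordinates
    x = (fx * cam[0] + bx * cam[2]) / s
    y = (fy * cam[1] + by * cam[2]) / s
    return (x, y)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Illustrative intrinsics for a 640x480 image (assumed values).
center = project((0.0, 0.0, 2.0), 800.0, 800.0, 320.0, 240.0, IDENTITY, (0.0, 0.0, 0.0))
offset = project((0.5, -0.25, 2.0), 800.0, 800.0, 320.0, 240.0, IDENTITY, (0.0, 0.0, 0.0))
```

A point on the optical axis lands on the principal point, and moving it sideways shifts its pixel position in proportion to the focal length divided by the depth. Pose estimation solves the inverse problem: given matched image and marker points, it recovers the rotation and translation.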
4 Prototype Development
4.1 System Structure and Function
In this study, Unity3D and an augmented reality application development engine (e.g., Vuforia SDK) were integrated in the Visual Studio 2017 development environment. The development process of the DNA mobile AR system is shown in Fig. 5. Autodesk 3ds Max was used to build the models of the DNA components (i.e., deoxynucleotides, enzymes, bases, etc.). Unity3D, with the Vuforia SDK integrated, was used to develop the detection and tracking of AR markers and to dynamically simulate microscopic phenomena (i.e., the three-dimensional structure of DNA, replication and vector construction). A variety of interactions (i.e., single-touch/multi-touch, voice assistance and physical interaction) have been realized. The designed content includes pre-class guidance, in-class demonstration and after-class review to guide and motivate students to study independently. Using fun games to extend the standard knowledge points of the course is suitable for the exploratory learning of students with different knowledge levels.
Fig. 5. Flow of system implementation
The system consisted of three modules (i.e., the knowledge guide module, the experiment operation module and the content review module). The knowledge guide module included video learning, which guides students to learn the relevant knowledge (i.e., the discovery of the DNA double-helix structure, the concept of DNA structure, the DNA replication process and its characteristics, etc.). The experiment operation module included three topics (i.e., the structure of DNA, the replication of DNA and vector construction), which were introduced by video and audio previews. Among them, the simulation of the structure of DNA showed paired nucleotides stacking dynamically in three-dimensional space. Content about replication and vector construction (i.e., plasmid cutting, target gene acquisition, target gene insertion, etc.) was simulated to reproduce the microscopic process. The content review module included knowledge review and exercises from a question bank (i.e., common question types, error-prone question types and so on). After a student submitted an answer, the system explained the question.
4.2 Result
Users can select an experimental module through the user interface, as shown in Fig. 6(a). The dynamic simulation of vector construction helps to raise students’ interest, as shown in Fig. 6(b). Figure 6(c) shows the microscopic phenomena of the DNA replication process, which improves users’ spatial imagination and reduces cognitive difficulties.
F. Jiang et al.
Fig. 6. Screenshots of virtual experiment system
5 Discussion and Conclusions
5.1 Testing and Discussion
Students from junior and senior high schools were chosen to test the mobile experience. During the modeling of the three-dimensional shape of DNA, changes in the students' learning attitude were recorded. At the end of all tests, participants were asked to give feedback, and the collected experimental results were analyzed. To evaluate whether the system was useful for teaching DNA knowledge in middle school, the prototype system was released as an "APK" file and installed on students' mobile phones. The feedback of all users participating in the mobile test is shown in Table 1.
Table 1. Part data contrast of system simulation
Items of test      Good   Average   Poor
Fluency            90%    6%        4%
Simple operation   92%    3%        5%
Extensity          97%    3%        0
Attraction         90%    7%        3%
Fun                95%    3%        2%
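As a quick consistency check on Table 1, the following minimal Python sketch encodes the reported percentages and verifies that each row totals 100%. The names `TABLE_1`, `rows_not_summing_to_100` and `mean_good_rating` are illustrative assumptions and are not part of the described system.

```python
# Feedback percentages transcribed from Table 1 as (Good, Average, Poor).
# All names here are illustrative; they do not come from the original system.
TABLE_1 = {
    "Fluency": (90, 6, 4),
    "Simple operation": (92, 3, 5),
    "Extensity": (97, 3, 0),
    "Attraction": (90, 7, 3),
    "Fun": (95, 3, 2),
}


def rows_not_summing_to_100(table):
    """Return the names of rows whose three percentages do not total 100."""
    return [item for item, shares in table.items() if sum(shares) != 100]


def mean_good_rating(table):
    """Average of the 'Good' column across all test items."""
    return sum(shares[0] for shares in table.values()) / len(table)


if __name__ == "__main__":
    print("Inconsistent rows:", rows_not_summing_to_100(TABLE_1))
    print("Mean 'Good' rating:", mean_good_rating(TABLE_1))
```

A check of this kind only confirms that the transcribed rows are internally consistent; it says nothing about sample size, which the text does not report.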
The experimental results showed that the prototype system realized a variety of interaction modes (e.g., sight, hearing, touch and physical objects), which were well received by middle school students. The method can intuitively represent the three-dimensional structure of DNA, simulate the dynamic process of paired nucleotides stacking into a helix, and display the process of vector construction in real time. Because interactive learning tools are popular among middle school students, the system can effectively improve students' understanding of DNA structure. Students who used AR-assisted learning reported that the system helped them increase their interest in learning and enhance their spatial perception. Moreover, the approach fits the transformation of learning in the new era.
5.2 Conclusion
We have set up hierarchical relationships and a variety of interaction modes in order to show the components of the double helix DNA model clearly. We have released a mobile AR app based on Unity3D for teaching DNA knowledge in middle schools, which can dynamically simulate the process of vector construction. A reasonable combination of touch screen interaction, voice control and physical interaction, built on collision detection and image-based natural feature detection, achieves a better user experience. With the help of this app, students can watch videos about the background story and manipulate the double helix DNA structure interactively in a variety of ways to master key concepts, including the contents and steps of the experiments. Furthermore, they can gain a better understanding of, and think more deeply about, the structure-function spatial pattern by comparing the 3D shapes of common DNA forms (e.g., A-DNA and B-DNA). Students from different middle schools volunteered to join our test and reported their experience of running the mobile AR app. The collected test information was then analyzed. The results show that our tool helps students understand the structure of DNA and cultivates their spatial visualization through realistic simulation. After the tests, these students showed a greater willingness to learn new things independently and effectively, anywhere and anytime, through gamification. AR-based mobile-learning tools can take advantage of the flexibility and convenience of mobile learning to overcome the time and space limitations of traditional learning methods. The simplified interactive operations of the prototype make it easy for students to use and enable learners to study autonomously anytime and anywhere. AR-based mobile-learning tools can expand students' imagination and support their understanding. They would be promising tools for K-12 courses and open learning environments in the future.
Acknowledgements. This work was supported by the National Key Research and Development Program of China (Grant No: 2018YFB1004905). We thank Sir Yifei JIN from Beijing Lebu Education Technology Company for providing the video of DNA.
References 1. Vadakedath, S., Sudhakar, T., Kandi, V.: Assessment of conventional teaching technique in the era of medical education technology: a study of biochemistry learning process among first year medical students using traditional chalk and board teaching. Am. J. Educ. Res. 6(8), 1137–1140 (2018) 2. Vadakedath, S., Kandi, V.: Modified conventional teaching: An assessment of clinical biochemistry learning process among medical undergraduate students using the traditional teaching in combination with group discussion. Cureus 11(8), e5396 (2019). https://doi.org/10. 7759/cureus.5396 3. Cooper, A.K., Oliver-Hoyo, M.T.: Creating 3D physical models to probe student understanding of macromolecular structure. Biochem. Mol. Biol. Educ. 45(6), 491–500 (2017) 4. Gunnar, L., Andrew, M.: Views, alignment and incongruity in indirect augmented reality. In: IEEE International Symposium on Mixed & Augmented Reality-Arts, Media, and Humanities, pp. 23–28. IEEE Computer Society, Adelaide (2013)
186
F. Jiang et al.
5. Abreu, P.A., Carvalho, K.L., Rabelo, V.W.H., et al.: Computational strategy for visualizing structures and teaching biochemistry. Biochem. Mol. Biol. Educ. 47(1), 76–84 (2019) 6. Ardiny, H., Khanmirza, E.: The role of AR and VR technologies in education developments: Opportunities and challenges. In: 2018 6th RSI International Conference on Robotics and Mechatronics (IcRoM), Tehran, Iran, pp. 482–487 (2018) 7. Altmeyer, K., Kapp, S., Thees, M., et al.: The use of augmented reality to foster conceptual knowledge acquisition in STEM laboratory courses—theoretical background and empirical results. Br. J. Educ. Technol. 51, 611–628 (2020) 8. Al-Azawi, R., Albadi, A., Moghaddas, R., Westlake, J.: Exploring the potential of using augmented reality and virtual reality for STEM education. In: Uden, L., Liberona, D., Sanchez, G., Rodríguez-González, S. (eds.) LTEC 2019. CCIS, vol. 1011, pp. 36–44. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20798-4_4 9. Fidan, M., Tuncel, M.: Integrating augmented reality into problem based learning: the effects on learning achievement and attitude in physics education. Comput. Educ. 142, 103635 (2019) 10. Sanii, B.: Creating augmented reality USDZ files to visualize 3D objects on student phones in the classroom. J. Chem. Educ. 97(1), 253–257 (2020) 11. Zheng, M., Waller, M.P.: ChemPreview: an augmented reality-based molecular interface. J. Mol. Graph. Model. 73, 18–23 (2017) 12. Rong-Chi, C., Zeng-Shiang, Y.: Using augmented reality technologies to enhance students’ engagement and achievement in science laboratories. Int. J. Distance Educ. Technol. 16(4), 54–72 (2018) 13. Burden, K., Kearney, M.: Future scenarios for mobile science learning. Res. Sci. Educ. 46(2), 287–308 (2016) 14. Yang, S., Mei, B., Yue, X.: Mobile augmented reality assisted chemical education: insights from elements 4D. J. Chem. Educ. 95(6), 1060–1062 (2018) 15. 
Coffman, T.L., Klinger, M.B.: Mobile technologies for making meaning in education: using augmented reality to connect learning. In: Mobile Technologies in Educational Organizations, pp. 64–84. IGI Global (2019) 16. Lall, P., Rees, R., Law, G.C., et al.: Influences on the implementation of mobile learning for medical and nursing education: qualitative systematic review by the digital health education collaboration. J. Med. Internet Res. 21(2), e12895 (2019) 17. Golenhofen, N., Heindl, F., Grab-Kroll, C., et al.: The use of a mobile learning tool by medical students in undergraduate anatomy and its effects on assessment outcomes. Anatom. Sci. Educ. 13(1), 8–18 (2020) 18. Leahy, S.M., Holland, C., Ward, F.: The digital frontier: Envisioning future technologies impact on the classroom. Futures 113, 102422 (2019) 19. Kun-Hung, C., Chin-Chung, T.: Affordances of augmented reality in science learning: suggestions for future research. J. Sci. Educ. Technol. 22(4), 449–462 (2013) 20. Celik, C., Guven, G., Cakir, N.K., et al.: Integration of mobile augmented reality (MAR) applications into biology laboratory: anatomic structure of the heart. Res. Learn. Technol. 28, 2355 (2020). https://doi.org/10.25304/rlt.v28.2355 21. Habig, S.: Who can benefit from augmented reality in chemistry? Sex differences in solving stereochemistry problems using augmented reality. Br. J. Educ. Technol. 51, 629–644 (2019) 22. Sung, R.J., Wilson, A.T., Lo, S.M., et al.: BiochemAR: an augmented reality educational tool for teaching macromolecular structure and function. J. Chem. Educ. 97(1), 147–153 (2020) 23. Wolle, P., Muller, M.P., Rauh, D., et al.: Augmented reality in scientific publications – taking the visualization of 3D structures to the next level. ACS Chem. Biol. 13(3), 496–499 (2018) 24. Chang, R.C., Yu, Z.S.: Using augmented reality technologies to enhance students’ engagement and achievement in science laboratories. Int. J. Distance Educ. Technol. (IJDET) 16(4), 54–72 (2018)
Interactive Simulation of DNA Structure for Mobile-Learning
187
25. Lee, N.Y., Tuckerkellogg, G.: An accessible, open-source mobile application for macromolecular visualization using augmented reality. Biochem. Mol. Biol. Educ. 48, 297–303 (2020) 26. Poppleton, E., Bohlin, J., Matthies, M., et al.: Design, optimization, and analysis of large DNA and RNA nanostructures through interactive visualization, editing, and molecular simulation. bioRxiv (2020) 27. Li, S., Olson, W.K., Lu, X., et al.: Web 3DNA 2.0 for the analysis, visualization, and modeling of 3D nucleic acid structures. Nucl. Acids Res. 47(W1), W26–W34 (2019) 28. Ghosh, A., Bansal, M.: A glossary of DNA structures from A to Z. Acta Crystallograph. Sect. D Biol. Crystallogr. 59(4), 620–626 (2003)
Augmented Reality Towards Facilitating Abstract Concepts Learning
Sandra Câmara Olim1(B) and Valentina Nisi2
1 FCT, Universidade Nova de Lisboa, Portugal and ITI-LARSys, Lisbon, Portugal [email protected]
2 IST, Universidade de Lisboa, Lisbon, Portugal [email protected]
Abstract. Chemistry is often regarded as a complex and demanding subject for youth to learn, partly because of abstract concepts that are challenging to depict. These areas require spatial reasoning, defined as the ability to retain, generate and manipulate abstract visual images in space, either physically or mentally. By allowing the superimposition of virtual objects in the real world, Augmented Reality (AR) facilitates students’ understanding of difficult concepts through spatial reasoning by making them visible and allowing for multidimensional perspectives. “Periodic Fable” is an AR serious game targeting 8 to 11-year-old children, designed to teach them basic chemistry concepts of the Periodic Table of Elements in non-formal settings. After designing and implementing the game, we conducted an exploratory study with 36 young participants, using a mixed-method approach. A comparative study between pre- and post-intervention questionnaires and observation results shows a positive learning outcome, demonstrating the potential of this tool in a non-formal context.
Keywords: Augmented Reality · Edutainment · Non-formal learning · Periodic Table
1 Introduction
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 188–204, 2020. https://doi.org/10.1007/978-3-030-65736-9_17
The acquisition and development of abstract reasoning is extremely important for children. A great part of abstract reasoning acquisition happens while learning school subjects. These subjects involve inquiry, experimentation, evidence, the evaluation and analysis of ideas, problem-solving, creative thinking, and overall understanding of information by making connections with the real world [44]. Children's understanding of abstract concepts in STEM disciplines has been shown to improve by means of spatial reasoning [34], which enables them to reason about the space around them while manipulating real or imaginary object configurations in space [7].
Playful educational experiences and games have shown positive effects across a wide spectrum of cognitive skills [19,25]. Educational games
increase concentration levels and stimulate learning in children [1], allowing for the exploration of teaching tools such as metaphors, analogies, and the spatial manipulation of 3D objects through technologies like Virtual and Augmented Reality [42]. Virtual and Augmented Reality serious games create compelling, collaborative and participatory experiences to enhance the user’s engagement and learning. These experiences can provide alternatives to a real-world environment for situated learning, allowing the user to experience a sense of “being there” while applying acquired knowledge [11]. Immersive games hold great potential for cognitive and motor development, and represent a powerful tool to facilitate the teaching of school curriculum and an effective way of acquiring knowledge within a non-formal context. Playing games is also considered a crucial activity for human development of socialization, expression, and communication skills [35]. Given these conditions, knowledge is acquired through a participatory process whereby the learner is “transformed through his actions and relations with the world” [6]. As argued by John Dewey, teaching involves “engaging children in a fun and playful environment, imparting educational content, and instigating interest in learning more” [12]. Given the possibility of transporting knowledge to other settings, Augmented Reality enables the connection between content knowledge and user context, thus making the learning more effective. Many Augmented Reality games have proved especially effective for STEM subjects. AR enables users to visualize otherwise inaccessible representations and to experiment in a low-risk and low-cost platform [3]. Augmented Reality has also been shown to be a powerful tool in the development of spatial skills, such as student understanding of structures that can be either invisible to the naked eye or spatially complex [15,47,50].
The Periodic Table is one of the greatest achievements of modern science, relevant not only to chemistry but also to physics, medicine, earth sciences and biology. In 2019, the United Nations celebrated the 150th anniversary of the discovery of the Periodic System by Dmitri Mendeleev. As a result, numerous competitions, events and museum exhibitions have been dedicated to this topic. We took this opportunity to reflect on the Periodic Table and its elements and to introduce them to young children who might not yet have encountered them in their school curricula. In this article, we present our research based on the design and preliminary testing results of Periodic Fable, a game intended to promote non-formal learning of the concepts of the Periodic Table for children aged 11 and younger who have not yet tackled “chemistry” as a school subject. Periodic Fable incorporates image recognition, tangible objects, 3D virtual reconstruction of models and animations, and is part of a growing number of initiatives that seek to use this type of technology to create awareness and engage children in non-formal science learning. The project was designed and produced to complement the celebrations of the anniversary of the Periodic Table.
2 Related Work
2.1 Learning Framework
Situated learning builds upon social learning and development theories, according to which the quality of learning depends on social interaction with the learning context [27,41,49]. Many virtual games and immersive simulations have been designed for training and education, to motivate students by providing a more immersive and engaging learning environment to support situated learning. The seamless integration of virtual content with the real environment can evoke in the users a perceptual state of non-mediation, a sense of presence reinforcing immersion [40]. AR design strategies create virtual immersive experiences and games, which are based on action and symbolic and sensory factors. Research has shown that immersion in virtual environments can enhance education by allowing for multiple perspectives of the learning content, applying situated pedagogy and transfer of knowledge [11]. Even exceptional students in formal educational settings often find it difficult to apply what they learn in class to similar real-world contexts [11]. AR allows users to discern and interact with the real environment while simultaneously receiving additional digital information into their field of perception [5]. The simulation of real-world problems and contexts can be obtained by near transfer. This assimilation of knowledge can then be transferred to other situations, allowing for the construction of more knowledge. Examples of this process can be found in game simulations for flight and surgical training, whereby the user can practice and train certain skills, making mistakes and getting positive feedback when the task is performed in a low-risk environment before transferring this knowledge to real-world situations. Children learn by using their senses, playing and performing activities, assimilating concepts in an intuitive manner [29].
Constructivism theory argues that children learn by interacting with the physical environment, socially, and by responding to external stimuli consciously. A constructivist environment allows children to build on previously acquired knowledge, in a reflexive process directed by a teacher, parent, or colleague [52]. Likewise, learning by doing is a fundamental activity in children. It is successful when the information is comprehended and used, and this, in turn, contributes to a change, hence a transformation, in cognitive structure [21]. The potential of Augmented Reality and educational games is supported by constructivism theories. Learning can also benefit from novelty effects due to new technologies [38], since the excitement and curiosity of the students cause them to be more engaged and motivated. Research shows that students’ performance is affected by different variables of the learning process, whether they engage in an activity individually or collaboratively [2]. Some effects are related to intrinsic motivation (enjoying the activity itself), self-efficacy (belief in one’s capacity to solve a task) and self-determination (the decision to do something). These results have been shown in many AR experiences where the activity is performed individually, but studies have also shown a tendency toward active exchange of knowledge while
students performed an AR activity [26]. However, little research has compared the learning efficacy of the two methods [32].
2.2 Augmented Reality
Augmented Reality (AR) is defined as the perception of reality that has been enhanced through computer-generated inputs and other digital information [3]. AR falls on the spectrum between physical and virtual reality, meaning that AR supplements reality rather than completely replacing it. AR has shown great potential in the area of Edutainment (education and entertainment) [37]. As a pedagogical tool, AR enables critical thinking and develops creativity [51]. The continuous investment in Augmented Reality has led to a proliferation of apps in the educational realm, with an increase in off-the-shelf wearable devices, like Curioscope [9], a marker-based t-shirt that allows children to visualize 3D models of human anatomy. Elements 4D, by DAQRI, is a chemistry app designed to simulate how elements of the periodic table react in real life [10]. Animal Alphabet uses AR Flashcards to teach letters. ZooKazam [30] and Bugs 3D [46] aim to teach about animal species. Arloon Plants [45] allows children to learn about flora. The Star Walk [48] AR app supports learning about the stars, the solar system and the planets. MerchEdu allows students to explore and interact with STEM concepts using a tangible cube [33]. Despite their usefulness, however, many of the commercial AR applications targeted at children have not been evaluated for their effectiveness in learning outcomes. In addition, Augmented Reality’s ability to render objects that are hard to visualize using 3D models (simulation) makes the understanding of abstract concepts more manageable. AR can enhance learning environments and engage, stimulate and motivate students [13,14]. In scientific subjects, complex approaches are required for students to envision concepts like micro- and macro-worlds, and their assimilation remains a difficult and challenging process.
Students make observations, collaborate with others, inquire, ask questions, investigate and interpret data when engaging with the AR system [11]. Chemistry in public schools with large classes suffers from limited resources and time, and, as a result, some students do not have the opportunity to conduct experiments. AR simulation helps to bridge this gap, contributing to the observation of some reactions on a one-to-one basis [16]. In Switzerland, an AR virtual chemistry laboratory was created for students to view and acquire atoms through a virtual drag-and-drop technique. This platform allowed students to construct their own complex molecules, developing active learning skills [20]. Besides allowing inquiry-based learning, using this system also helped students to be active in the learning process [8]. Low-achieving students benefit considerably from interacting with 3D models of micro-particles, better understanding the composition of substances [8]. Teaching molecule formation, covalent bonds and molecular structure using AR proved more effective in improving student performance than traditional methods such as a textbook [36]. StereoChem is a mobile Augmented Reality application used to help students understand stereochemistry, focusing
on the study of stereoisomers (molecules with the same formula that differ in spatial orientation) to help in the perception of the 3D molecules [28]. Augmented Chemistry allows students to obtain and understand information about the spatial relations between molecules by using keyboard input and tangible interaction. Augmented Chemistry represents the molecules in a 3D environment, providing multiple viewpoints and control of the interactions [43].
2.3 Tangible Interfaces
The combination of tangible, physical interfaces with tridimensional virtual imagery allows children to play, engage, and have fun as they learn, easing the cognitive load [4] in a natural and genuine way [8]. Tangible interfaces can ease the communication of abstract concepts. This is the case of Digital MiMs, computer-enhanced building blocks that support learning in mathematics, probabilistic behavior and more. Another example is ProBoNo, a software and tangible hardware prototype developed for children from four to six years of age. ProBoNo allows children to navigate virtual worlds more playfully and engagingly. A study of this system concluded that the tangible interface facilitated the transfer of knowledge acquired in a digital environment to a situation in the real world [22]. However, regarding the benefits of tangible interactions for children, Horn et al. [24] advocate a hybrid approach (a single interactive system that consists of two or more equivalent interfaces), in which teachers and learners can select the most appropriate interaction according to the task and needs of a specific situation [24]. Moreover, Ambient Wood, a large-scale learning activity targeting 11-12-year-old children, aims to teach children about the nature of scientific inquiry by means of tangible discovery, reflection, and experimentation. A study of the system highlighted that, while tangibility per se is not enough to engage children in a task, it is crucial to have the appropriate tangible system [31].
3 Methods
3.1 System Design
Periodic Fable is a serious AR game for Android mobile devices (with gyroscope, accelerometer and compass sensors), designed and produced for 8 to 11-year-old children who come into contact with chemistry for the first time. The game, developed using ARCore, Autodesk Maya, Mudbox and Adobe Photoshop, provides basic concepts about chemical elements, such as their symbols, atomic numbers, atomic weights, reactions with other elements, and products that contain the element. It also allows users to obtain information by scanning markers and to join elements (depicted as creatures) to create reactions. Thus, the learning process becomes fun, engaging the students [11].
Content. The content of the game was selected with the help of a Ph.D. student (in a Chemical biology related area) and two Chemistry teachers. It takes into account the knowledge of chemistry that our target audience has already acquired in science classes. Children from 8 to 11 learn some basic chemistry notions while studying topics like pollution (carbon monoxide), blood components, the atmosphere, photosynthesis (carbon, oxygen), and the water cycle (hydrogen, oxygen), to name a few. It is not until later that chemistry becomes a curricular subject. The elements selected to feature in the Periodic Fable game are the most abundant in the environment and have compelling reactions, which makes the students more curious, inspiring them to speculate about possible combinations. The Periodic Fable design relies upon constructivism as a learning method, allowing the children to explore and interact freely with the content, thus complementing their knowledge about the chemical elements. Moreover, to create a connection between the content and the real world, we decided to include information that could be transferred to the everyday routine of the children.
Technology/Game Mechanics. The AR game uses a marker-based system and tangible cubes as part of the mechanics. The camera of the mobile phone scans a pattern/image, recognizes it, and superimposes digital content on the screen of the device (in our case, 3D models and animations, 2D images, video, text and audio). The equipment was selected on the basis of children’s ease of access to mobile devices, their portability, and their affordability. Autodesk Maya, Adobe Photoshop, Adobe AfterEffects and Unity 3D were used for the development of the game. We designed a tangible cube as a starting point. The child has to explore each facet of the cube to gain access to different information, which allows physical manipulation and exploration of spatial arrangement. This design is the result of several iterations, heuristic evaluation and small-sample testing aimed at making the game easy to play, almost intuitively, without need for supervision. Few instructions are necessary, thus avoiding cognitive overload. The game features one cube per element, and each face of the cube expresses qualities and scientific facts about that element. At the moment we have developed five cubes. Four correspond to oxygen, hydrogen, carbon and nitrogen, chosen because they are the most important elements; their covalent bond combinations are responsible for most of the biological molecules on earth. We added chlorine to the list because, like the other elements, its properties and reactions can be interesting and engaging for the children. In the future, we hope to develop the experience for all the elements of the Periodic Table, allowing users to position each cube in the proper location and to construct their own physical Periodic Table.
Aesthetics. Each tangible cube is dedicated to one chemical element. All the facets of each cube (used as markers) are identified with text and a 2D image of the character holding an icon indicating the type of information that will be prompted (story/book, curiosities/question mark, combination/silhouette of a group of characters, code name, habitat/building, information/letter “i”).
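The marker-based mechanic described above, in which each cube facet carries an icon that selects a type of content, can be sketched as a simple lookup. The following Python sketch is only an illustration under stated assumptions: the `Facet` enum, the `element:facet` marker identifier format, and the helper functions are hypothetical and do not reflect the actual Unity implementation, which the text does not detail.

```python
from enum import Enum


class Facet(Enum):
    """The six kinds of information named on the cube faces (value = icon)."""
    INFORMATION = "letter i"
    HABITAT = "building"
    CURIOSITIES = "question mark"
    STORY = "book"
    COMBINATION = "silhouette of a group of characters"
    CODE_NAME = "code name"


def marker_id(element: str, facet: Facet) -> str:
    """Hypothetical identifier for the marker printed on one cube face."""
    return f"{element}:{facet.name.lower()}"


def build_marker_index(elements):
    """Map every marker identifier to the (element, facet) it should trigger."""
    return {marker_id(e, f): (e, f) for e in elements for f in Facet}


def resolve(index, scanned_marker):
    """Return the (element, facet) for a recognized marker, or None if unknown."""
    return index.get(scanned_marker)


if __name__ == "__main__":
    cubes = ["oxygen", "hydrogen", "carbon", "nitrogen", "chlorine"]
    index = build_marker_index(cubes)
    print(resolve(index, "oxygen:story"))
```

Keeping the mapping in one table means a new cube only adds entries to the index; the recognition step itself (performed by the AR engine) is outside the scope of this sketch.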
The 3D characters were designed taking into consideration the properties of the elements that they represent. For example, oxygen is a highly reactive element, capable of combining with other elements. Because of this, oxygen is depicted as a friendly creature, labeled with the code name of the symbol it represents: “O”. The atomic number reflects the creature’s age, and the atomic mass is indicated as its weight (see Fig. 2 number 1). We created these analogies for each element as a way to facilitate retention of the information. The creature’s aesthetics, from color to animation, are also keyed to properties of the element and went through several iterations to create more empathy with the user. The colors assigned to each character reflect the coloring convention of Robert Corey, Linus Pauling and Walter Koltun, known as CPK coloring, used to distinguish the atoms of chemical elements in molecular models [23]. The color scheme is associated with some properties of the elements. For example, hydrogen, a colorless gas, is white; chlorine is green; carbon is the color of charcoal or graphite; nitrogen is blue, since it makes up most of the atmosphere; and oxygen, associated with combustion and blood, is red. The inspiration for the base shape of our 3D characters is the ball-and-stick molecular model: the body of each character is rounded like the spheres that represent the atoms, and the arms and legs are rods that can connect and create bonds with other elements. We designed glasses for each of the individual characters, along with feet, hands, and other human features, to animate each element and give it a different personality, so that children can associate the creatures with the elements’ properties, empathize and engage with them. For example, hydrogen is animated as a calm Zen creature that floats while meditating, thus portraying its low density and the fact that it is essential for life (see Fig. 1).
Fig. 1. Left to right: hydrogen, nitrogen, oxygen, chlorine and carbon
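The analogy described in the Aesthetics paragraph (symbol as code name, atomic number as age, atomic mass as weight, CPK-style color) can be written down as a small data structure. The sketch below is a hedged illustration: the `ElementCharacter` class and `describe` helper are hypothetical names, the atomic masses are standard rounded values rather than figures quoted in the paper, and the color labels paraphrase the description in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementCharacter:
    code_name: str   # chemical symbol, used as the creature's code name
    age: int         # atomic number, presented as the creature's age
    weight: float    # standard atomic mass (rounded), presented as its weight
    color: str       # CPK-style color as described in the text


# The five elements for which cubes have been developed so far.
CHARACTERS = {
    "hydrogen": ElementCharacter("H", 1, 1.008, "white"),
    "carbon": ElementCharacter("C", 6, 12.011, "charcoal"),
    "nitrogen": ElementCharacter("N", 7, 14.007, "blue"),
    "oxygen": ElementCharacter("O", 8, 15.999, "red"),
    "chlorine": ElementCharacter("Cl", 17, 35.45, "green"),
}


def describe(name: str) -> str:
    """Build the kind of summary the Information facet could display."""
    c = CHARACTERS[name]
    return f"{c.code_name}: age {c.age}, weight {c.weight}, color {c.color}"


if __name__ == "__main__":
    for element in CHARACTERS:
        print(describe(element))
```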
Story. The concept of content-infused story games (CIS Games) builds on the fusion between games and stories to educate children [17]. Building upon this concept, one facet of the tangible cube shows the element’s character holding a book. This facet triggers short, funny animation clips that turn into stories the outcome of joining two or more elements and the reactions this might trigger. For example, one of the stories dedicated to hydrogen presents the characters having fun in a water park. Two hydrogens slide down a water toboggan and get stuck at the end of the ride. The oxygen
that slides next is unable to stop and hits the two hydrogen creatures, and all turn into water. Chlorine is depicted as a superhero that saves his friend from a huge bacterium in a lab. Our aim is to make children learn and remember some of the outcomes and properties of chemical combinations while having fun.
Game-Play. The game starts with a short tutorial that guides the children in understanding the mechanics of the game: how to explore the faces of a tangible cube, gather the elements/characters, and use them to create reactions. The participant selects one of the cubes (this selection can be random) and scans the marker/image on a facet, loading the information linked to that particular pattern.
Facet 1 (Information). Triggers an animation of a 3D character representing a chemical element. It also provides information about the element, including its chemical symbol, atomic mass, chemical name and atomic number (see Fig. 2 number 1).
Facet 2 (Habitat). Loads an image of an apartment building divided according to the blocks of the Periodic Table (ordered by atomic weight and arranged in horizontal rows/periods and vertical columns/groups). We decided to place the different characters in these blocks/apartments to create an analogy (apartment building vs. location of the element in the Periodic Table), so that children will assimilate where the element lodges in the Periodic Table (see Fig. 2 number 2).
Facet 3 (Curiosities). This facet triggers AR scenes that show 3D models of products that contain the element (see Fig. 2 left, number 3). For example, we depict oxygen as a component found in a Pepsi drink or in a cleaning product like Vanish (see Fig. 2 right, number 3). Our design took a situated learning approach, whereby children can create connections between the products in their surroundings and chemical elements.
Facet 4 (Stories). Capturing this facet triggers a short clip (less than one minute) dedicated to that particular element’s reaction with other elements (see Fig. 2 number 4).
Facet 5 (Combinations). Capturing facet 5 allows the children to select and explore the reactions of different elements. We placed icons on the left side of the screen for the user to select and then explore several reactions (water, fertilizer, ammonia, fire, ozone, or creating a diamond) (see Fig. 2 number 5). Once the user clicks on a button, the application indicates what type of creatures/elements are needed, and the user is prompted to scan the corresponding cube with the camera of the mobile device (see Fig. 2 number 6). If this activity is performed correctly, the interface notifies the child with a message congratulating him or her for the reaction just created (see Fig. 3 number 2) and shows the elements that were part of the reaction (Fig. 3 number 3). Then an animation of the reaction just created plays out (see Fig. 2 number 6).
Each element experience ends when the user has explored all the facets of the cube dedicated to that element, and captured its reactions. The experience can be repeated for the other cubes, until all cubes have been explored. Since no dependencies of content have been designed, each child can explore one, two, three, or all the cubes, always achieving full closure with each experience.
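The combination mechanic of Facet 5 (pick a reaction, then scan the cubes it needs) can be sketched as a lookup against a recipe table. This is only an illustrative sketch, not the game's actual implementation: the names `RECIPES`, `missing_elements` and `is_complete` are hypothetical, only the water recipe (two hydrogens and one oxygen) is taken from the text, and the idea that a cube is scanned once per required creature is an assumption.

```python
from collections import Counter

# Hypothetical recipe table. Only the water recipe (two hydrogens, one oxygen)
# is stated in the text; other reactions would be added by the game designers.
RECIPES = {
    "water": Counter({"hydrogen": 2, "oxygen": 1}),
}

def missing_elements(reaction, scanned):
    """Return the elements (with counts) still needed to complete a reaction."""
    needed = RECIPES[reaction]
    have = Counter(scanned)
    # Counter subtraction keeps only the positive counts, i.e. what is missing.
    return needed - have

def is_complete(reaction, scanned):
    """True once every required element has been scanned often enough."""
    return not missing_elements(reaction, scanned)

# Assumed usage: each scan of a cube contributes one creature of that element.
print(missing_elements("water", ["hydrogen"]))
print(is_complete("water", ["hydrogen", "oxygen", "hydrogen"]))
```

Keeping the recipes in a plain table would let new reactions be added without touching the checking logic, which fits the paper's point that the cubes have no content dependencies on one another.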
Fig. 2. Left: 2D representation of the tangible cube; Right: interaction with the faces. 1. Information/animation; 2. Habitat: location of the characters/elements within the Periodic Table; 3. Products that contain the elements; 4. Stories; 5. Combination of the elements; 6. Reactions
Fig. 3. 1. Reaction icons; 2. Congratulation messages; 3. Elements that participated in the reaction
3.2 Participants Demographic and Data Collection
According to school curricula in Portugal, Chemistry becomes available as a subject only in the 7th grade, when students are 12 years old. To test whether the AR-based game facilitated the learning of abstract concepts in early childhood, we decided to evaluate our game with children before they had engaged with the abstract concepts of chemistry as a school curricular activity. We based our study at a Computer Science Club at a public school in Madeira. We engaged 36 participants, 20 females (55.6%) and 16 males (44.4%), ranging from 8 to 11 years old. The study was designed as a between-subjects study, in which 18 participants (50%) experienced Periodic Fable in pairs (Experimental Condition, or Group 2), while 18 (50%) experienced it individually (Control Condition, or Group 1). The study was designed to explore whether the AR game would support the learning of abstract concepts, such as basic chemistry, in children with no previous exposure to the subject, and to explore which interaction condition (single or pairs) would yield the best learning outcome. The study took place during one day, on November 27, 2019.
3.3 User Technology Access
Most of the students (32; 88.9%) had access to a mobile device prior to the study, but four (11.1%) did not. 15 participants (41.7%) had had access to a mobile device for less than a year, 11 (30.6%) for more than a year, and 7 (19.4%) for over 2 years, while only four (8.3%) had no access to a mobile device. 12 participants (33.3%) knew or had heard about AR technologies, while five (13.9%) had already used AR. Only four students (11.1%) had previous knowledge of the Periodic Table.
3.4 User Study Protocol
Recruitment was done through a local school at Santa Cruz, Madeira. The experimenter contacted a teacher from the school Computer Science club, explained the project's objectives and demonstrated the study procedures. Parents signed a Parental Consent form before the intervention, which included information about the date and time of the study. On the intervention day, we briefed the participants and asked them to answer a pre-test questionnaire, which included questions like: What happens when you combine one oxygen and two hydrogen? What is the symbol for nitrogen? What formed the ozone? What products have chlorine in them? What are the elements found in diamonds? After having answered the questionnaire, the participants were assigned randomly to one of the two groups, to avoid any bias. Each group of participants experienced one condition. Group 1 (Control Condition) performed the intervention individually, while Group 2 (Experimental Condition) performed the intervention in pairs (see Fig. 4). In the control condition, individual students were seated at their desks and given a set of 5 cubes and a mobile phone to explore the cubes by themselves; in the experimental condition, students sat in pairs and shared the 5 cubes and the mobile phone. Both groups of participants had to answer individually a post-intervention questionnaire with the same questions as in the pre-intervention questionnaire, but posed in a different order, so as to assess the learning more accurately. The participants were monitored by three researchers during the interactions. Observation notes were taken regarding struggles with the application, the points students seemed to enjoy most, confusion with any of the tasks, and emotional reactions. The whole experience, including pre- and post-questionnaires, took 30 min. The experiment material consisted of an Android smartphone and the five tangible cubes. Data regarding usability and satisfaction on the part of the students were collected using a questionnaire with a Smileyometer [39], a pictorial Likert scale of 5 levels, from "1" for totally disagree to "5" for totally agree (see Table 1).
Table 1. Usability and engagement data results
| Q# | Question | Percentage | Mean/SD |
|---|---|---|---|
| Q1 | I enjoy using Periodic Fable | 88.9% | 4.89/0.319 |
| Q2 | It was easy to use the application | 55.6% | 4.39/0.803 |
| Q3 | I knew what to do during the game | 75% | 4.72/0.513 |
| Q4 | The instructions were easy to follow | 61.1% | 4.53/0.696 |
| Q5 | I always knew what to do during the game | 52.8% | 4.11/1.214 |
| Q6 | The capture of the information with the markers always worked | 44.4% | 4.25/0.770 |
| Q7 | The camera never lost the information of the marker | 55.6% | 4.29/0.926 |
| Q8 | The application always read the correct marker | 77.8% | 4.67/0.717 |
| Q9 | It was a great experience to play the game | 88.9% | 4.83/0.561 |
| Q10 | I was always performing the same action during the game | 58.3% | 4.36/0.961 |
| Q11 | The amount of information was enough | 69.4% | 4.58/0.692 |
| Q12 | The 3D objects always appeared on my mobile screen | 66.7% | 4.56/0.695 |
| Q13 | There was a tutorial | 75% | 4.67/0.676 |
Table 2. Again and Again/Engagement scale
| Q# | Question | Percentage | Mean/SD |
|---|---|---|---|
| Q1 | Do you want to see the animations of the elements/creatures again? | 94.4% | 1.06/0.232 |
| Q2 | Would you like to watch the stories again? | 72.2% | 1.25/0.500 |
| Q3 | Would you like to make more combinations of elements? | 86.1% | 0.66/0.398 |
| Q4 | Would you like to see the habitat of the elements again? | 86.1% | 1.08/0.439 |
| Q5 | Would you like to see more information about the elements? | 80.6% | 1.19/0.401 |
| Q6 | Did you learn any information from this experience? | 88.9% | 1.11/0.319 |
The second measurement, related to enjoyment, was gathered through an Again-Again table. This measure is based on the knowledge that people like to do fun things again [39]. We also analyzed which part of the experience was most enjoyable for the participants through questions related to each side of the tangible cube, coding the answer "Yes" as 1, "No" as 0, and "Maybe" as 2 (see Table 2). To estimate gains in learning outcomes regarding the concepts of the Periodic Table, we used the multiple-choice pre-intervention and post-intervention questionnaire data. We also conducted two comparative analyses using non-parametric tests, within and between groups.
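As a minimal sketch of how the Again-Again coding turns answers into the summary figures of Table 2, the snippet below codes answers as Yes = 1, No = 0, Maybe = 2 and reports the share of "Yes" answers together with the mean and standard deviation of the codes. Several points are assumptions rather than facts from the paper: that the Percentage column is the share of "Yes" answers, that the sample standard deviation (n - 1) was used, and the response pattern itself (34 "Yes" and 2 "Maybe" out of 36), which is simply one pattern consistent with row Q1 and not the authors' raw data.

```python
from statistics import mean, stdev

# Coding scheme described in the text.
CODES = {"No": 0, "Yes": 1, "Maybe": 2}

def summarize(answers):
    """Return (% of 'Yes' answers, mean of codes, sample SD of codes)."""
    coded = [CODES[a] for a in answers]
    pct_yes = 100 * answers.count("Yes") / len(answers)
    return pct_yes, mean(coded), stdev(coded)

# Hypothetical response pattern for Q1 (assumed, not the study's raw data).
answers_q1 = ["Yes"] * 34 + ["Maybe"] * 2

pct_yes, avg, sd = summarize(answers_q1)
print(f"{pct_yes:.1f}% Yes, mean {avg:.2f}, SD {sd:.3f}")
# prints: 94.4% Yes, mean 1.06, SD 0.232
```

Because "Maybe" is coded above "Yes", a mean above 1 reflects undecided answers rather than stronger agreement, so the percentage of "Yes" answers is the easier figure to interpret.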
4 Results
The study aimed to assess whether the use of the AR Serious Game facilitates the learning of concepts related to the Periodic Table. We also wanted to analyze whether interventions in pairs had higher gains than individual interventions. The normality of the sample in terms of results was evaluated, and since the values of Skewness and Kurtosis (p = 0.002) showed that the assumption of normality was violated, we used non-parametric tests. For the analysis within each group, we used the Wilcoxon signed rank test, with the game played individually (Control Condition or Group 1) and in pairs (Experimental Condition or Group 2). The results show that the post-intervention learning outcome is significantly higher than the pre-intervention outcome for participants in the Control Condition (Mdn = 2), T = 78, p = 0.001, r = 0.54 (large effect), as well as in the Experimental Condition (Mdn = 1), T = 36, p = 0.010, r = 0.43 (large effect). In sum, the application had a strong, positive effect on the learning of concepts about the Periodic Table in both groups (see Fig. 4). We conducted a between-group analysis, using the Mann-Whitney test, with the same conditions as above. When measuring the children's knowledge before playing the game (Mdn = .00), U = 110.00, p = 0.10, r = –0.31 (medium effect), the results show that the difference between groups was not significant. However, in the post-intervention test, the children that played individually had significantly higher results than those that did the intervention in pairs (Mdn = 1), U = 98.50, p = 0.04, z = –2.09, r = –0.35 (medium effect). Contrary to the findings of Martín-SanJosé [32], the individual intervention had higher learning gains than the collaborative setting. We attribute these results to the fact that the participants that did not share the mobile device had more freedom to explore the tangible cube, controlling the areas that they wanted to visualize and the time dedicated to each area, and could repeat the tasks ad libitum (see Fig. 4).
We also performed a Mann-Whitney U analysis to evaluate whether there were any gender differences in the learning outcomes. From the data, we can conclude that gender differences were not statistically significant (U = 139, p = .519). The users were pleased with the usability and engagement of the game (see Table 1). Question 6 (Q6) had the lowest score (Mean = 4.25). A few children had difficulties when scanning the cubes' images/markers (see Fig. 4).
Related-Samples Wilcoxon Signed Rank Test Summary (within-groups learning outcome)
| Statistic | Individual intervention | Pair intervention |
|---|---|---|
| Total N | 18 | 18 |
| Test Statistic | 78.000 | 36.000 |
| Standard Error | 12.124 | 6.955 |
| Standardized Test Statistic | 3.217 | 2.588 |
| Asymptotic Sig. (2-sided test) | .001 | .010 |
Independent-Samples Mann-Whitney U Test Summary (between-groups learning outcome)
| Statistic | Pre test | Post test |
|---|---|---|
| Total N | 36 | 36 |
| Mann-Whitney U | 110.000 | 98.500 |
| Wilcoxon W | 281.00 | 269.500 |
| Test Statistic | 110.000 | 98.500 |
| Standard Error | 27.842 | 30.415 |
| Standardized Test Statistic | -1.881 | -2.088 |
| Asymptotic Sig. (2-sided test) | .060 | .037 |
| Exact Sig. (2-sided test) | .104 | 0.44 |
Fig. 4. Top: Results within subject learning outcome - Bottom: Results between subjects learning outcome
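The effect sizes quoted in the Results can be related to the standardized test statistics in Fig. 4 through the common conversion r = z / sqrt(N). The sketch below applies it under one assumption that the paper does not state explicitly: that N = 36 is used as the number of observations in every case (36 participants between groups, 18 participants measured twice within groups). The function name `effect_size_r` is illustrative.

```python
import math

def effect_size_r(z, n_observations):
    """Convert a standardized test statistic z into the effect size r = z / sqrt(N)."""
    return z / math.sqrt(n_observations)

# Standardized test statistics as listed in Fig. 4.
reported_z = {
    "within, individual": 3.217,
    "within, pairs": 2.588,
    "between, pre-test": -1.881,
    "between, post-test": -2.088,
}

N_OBS = 36  # assumption: number of observations used for every test

for label, z in reported_z.items():
    print(f"{label}: r = {effect_size_r(z, N_OBS):.2f}")
```

Under this assumption the four values come out at about 0.54, 0.43, -0.31 and -0.35, so an effect size computed this way always stays between -1 and 1 for these sample sizes.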
The Kruskal-Wallis test results do not demonstrate any difference in the usability and engagement of the activity between the group conditions, χ²(1) = 0.22, p = 0.15.
5 Discussion, Limitations and Future Work
Our results reinforce the positive effects of gaming techniques and AR technologies on the engagement and motivation of students [4]. According to the questionnaire, the AR visualization of the animated characters/elements introducing themselves to the players was the most enjoyable part of the experience; hence we can infer that the captivating characterization of the creatures/elements as funny-looking, intriguingly futuristic characters was a successful factor in engaging the children (see Table 2). The participants' reactions show that they enjoyed combining the elements through the cube facets that triggered the AR scenes. One of the students commented aloud "I just made water!" and another responded, "I just created a diamond!" According to the data, the 2D animated short stories about the elements were the least captivating part of the experience. Overall, the participants acknowledged that they had learned something new about the Periodic Table. The data also show that the application increased the curiosity of the children regarding other chemical elements. The results of the preliminary study indicate that our game has positive learning outcomes regarding basic Periodic Table concepts in non-formal settings. Our results also show that there is potential in combining smartphones and Augmented Reality in non-formal spaces to facilitate the learning of content that is not available through the school curriculum, and that this helps to create intrinsic motivation towards future STEM subjects. Young students who came in contact with chemistry through the Periodic Fable game for the first time were able to learn while engaging in the activity. Exploring all the faces of the tangible cube to gather/visualize information was an intuitive and rewarding process.
Nevertheless, some technical challenges were identified regarding the ease of use of the Augmented Reality application. Technical problems such as tracking loss made the interaction difficult for some. Since our application depended on image-based tracking, the experience stopped when moving away from the marker. This was frustrating to some children who wanted to view the animations without being interrupted. Other technical limitations when using AR marker-based technology are dependency on light quality to read the images/markers; delays in the rendering of data; over-heating of the equipment; battery consumption; and the need for robust equipment with the required sensors (gyroscope, accelerometer, and compass).
The results of the study enabled us to distill insights that could be beneficial when designing an AR serious game for a young audience: 1) find the appropriate pedagogical framework for the content to be delivered [18], as AR benefits situated and constructive methods; 2) pay attention to the adequacy of aesthetics to the age range; 3) design analogies and metaphors carefully, ensuring that they support the mental images and the construction of ideas needed to facilitate the understanding of the concept; 5) balance the amount of information delivered with the usability load to avoid cognitive overload.
6 Conclusion
In this paper we have reported on the design and preliminary study of Periodic Fable, an AR serious game designed to support children's understanding of abstract concepts such as those found in chemistry. Our study demonstrated that the AR game facilitated the understanding of chemistry and the learning of abstract concepts in non-formal settings. Further research is needed to categorize its benefits and limitations more clearly, and to develop clear guidelines and appropriate tools to address these challenges.
Acknowledgments. This work has been supported by LARSyS-FCT Plurianual funding 2020-2023 (UIDB/50009/2020), MITIExcell (M1420-01-0145-FEDER-000002), MADEIRA 14-20 FEDER funded project Beanstalk (2015–2020) and FCT Ph.D. Grant PD/BD/150286/2019. Our gratitude also goes to the students and computer science teacher at Escola Básica de Santa Cruz. We would also like to thank Gonçalo Martins, Michaela Sousa, Rui Trindade and Manuel Fontes for their ongoing support.
References
1. Arcos, C.G., et al.: Playful and interactive environment-based augmented reality to stimulate learning of children. In: 2016 18th Mediterranean Electrotechnical Conference (MELECON), pp. 1–6 (2016)
2. Arvanitis, T.N., et al.: A human factors study of technology acceptance of a prototype mobile augmented reality system for science education. Adv. Sci. Lett. 4(11–12), 3342–3352 (2011)
3. Azuma, R.T.: A survey of augmented reality. Presence Teleoper. Virtual Environ. 6(4), 355–385 (1997)
4. Billinghurst, M., Clark, A., Lee, G., et al.: A survey of augmented reality. Found. Trends Hum. Comput. Interact. 8(2–3), 73–272 (2015)
5. Bower, M., Howe, C., McCredie, N., Robinson, A., Grover, D.: Augmented reality in education-cases, places and potentials. Educ. Media Int. 51(1), 1–15 (2014)
6. Britton, J.: Vygotsky's contribution to pedagogical theory. English Educ. 21(3), 22–26 (1987)
7. Buckley, J., Seery, N., Canty, D.: A heuristic framework of spatial ability: a review and synthesis of spatial factor literature to support its translation into stem education. Educ. Psychol. Rev. 30(3), 947–972 (2018)
8. Cai, S., Wang, X., Chiang, F.K.: A case study of augmented reality simulation system application in a chemistry course. Comput. Hum. Behav. 37, 31–40 (2014)
9. Curiscope: bring learning to life, May 2020. https://www.curiscope.com/
10. DAQRI: elements 4D interactive blocks, January 2014. https://www.kickstarter.com/projects/daqri/elements-4d-interactive-blocks/
11. Dede, C.: Immersive interfaces for engagement and learning. Science 323(5910), 66–69 (2009)
12. Dewey, J.: Liberalism and Social Action, vol. 74. Capricorn Books, New York (1963)
13. Di Serio, Á., Ibáñez, M.B., Kloos, C.D.: Impact of an augmented reality system on students' motivation for a visual art course. Comput. Educ. 68, 586–596 (2013)
14. Duh, H.B., Klopfer, E.: Augmented reality learning: new learning paradigm in co-space. Comput. Educ. 68(1), 534–535 (2013)
15. Dunleavy, M., Dede, C.: Augmented reality teaching and learning. In: Spector, J.M., Merrill, M.D., Elen, J., Bishop, M.J. (eds.) Handbook of Research on Educational Communications and Technology, pp. 735–745. Springer, New York (2014). https://doi.org/10.1007/978-1-4614-3185-5_59
16. Edomwonyi-Otu, L., Avaa, A.: The challenge of effective teaching of chemistry: a case study. Leonardo Electron. J. Practic. Technol. 10(18), 1–8 (2011)
17. Fails, J.A., Druin, A., Guha, M.L., Chipman, G., Simms, S., Churaman, W.: Child's play: a comparison of desktop and physical interactive environments. In: Proceedings of the 2005 Conference on Interaction Design and Children, pp. 48–55 (2005)
18. Fisch, S.M.: Cross-platform learning: on the nature of children's learning from multiple media platforms. New Dir. Child Adolesc. Dev. 2013(139), 59–70 (2013)
19. Fissler, P., Kolassa, I.T., Schrader, C.: Educational games for brain health: revealing their unexplored potential through a neurocognitive approach. Front. Psychol. 6, 1056 (2015)
20. Fjeld, M., Schar, S.G., Signorello, D., Krueger, H.: Alternative tools for tangible interaction: a usability evaluation. In: Proceedings. International Symposium on Mixed and Augmented Reality, pp. 157–318. IEEE (2002)
21. Gardner, H.: Using multiple intelligences to improve negotiation theory and practice. Negotiat. J. 16(4), 321–324 (2000)
22. Göttel, T.: Probono: transferring knowledge of virtual environments to real world situations. In: Proceedings of the 6th International Conference on Interaction Design and Children, pp. 81–88 (2007)
23. Helmenstine, T.: Science notes - learn science do science, August 2019. http://sciencenotes.org/category/chemistry/periodic-table-chemistry/page/2/
24. Horn, M.S., Crouser, R.J., Bers, M.U.: Tangible interaction and learning: the case for a hybrid approach. Pers. Ubiquit. Comput. 16(4), 379–389 (2012)
25. Hosťovecký, M., Salgovic, I., Viragh, R.: Serious game in science education: how we can develop mathematical education. In: 2018 16th International Conference on Emerging eLearning Technologies and Applications (ICETA), pp. 191–196. IEEE (2018)
26. Hung, Y.H., Chen, C.H., Huang, S.W.: Applying augmented reality to enhance learning: a study of different teaching materials. J. Comput. Assist. Learn. 33(3), 252–266 (2017)
27. Johnson, D.W., Johnson, R., Holubec, E.: Cooperation in the College Classroom. Interaction Book Company, Edina (1991)
28. Narasimha Swamy, K.L., Chavan, P.S., Murthy, S.: Stereochem: augmented reality 3D molecular model visualization app for teaching and learning stereochemistry. In: 2018 IEEE 18th International Conference on Advanced Learning Technologies (ICALT), pp. 252–256. IEEE (2018)
29. Kritzenberger, H., Winkler, T., Herczeg, M.: Collaborative and constructive learning of elementary school children in experiental learning spaces along the virtuality continuum. In: Herczeg, M., Prinz, W., Oberquelle, H. (eds.) Mensch & Computer 2002, pp. 115–124. Springer, Heidelberg (2002). https://doi.org/10.1007/978-3-322-89884-5_12
30. AtlantaAR LLC: Zookazam magical animals, January 2020. http://www.zookazam.com/
31. Marshall, P., Price, S., Rogers, Y.: Conceptualising tangibles to support learning. In: Proceedings of the 2003 Conference on Interaction Design and Children, pp. 101–109 (2003)
32. Martín-SanJosé, J.F., Juan, M., Torres, E., Vicent, M.J., et al.: Playful interaction for learning collaboratively and individually. J. Amb. Intell. Smart Environ. 6(3), 295–311 (2014)
33. MergeLabs Inc.: Merge edu (2020). https://mergeedu.com/l/engage/
34. Newcombe, N.S., Frick, A.: Early education for spatial intelligence: why, what, and how. Mind Brain Educ. 4(3), 102–111 (2010)
35. Paredes, J.: Juego, luego soy. Teoría de la actividad lúdica. Sevilla, Wanceulen (2003)
36. Pasaréti, O., Hajdin, H., Matusaka, T., Jámbori, A., Molnár, I., Tucsányi-Szabó, M.: Augmented reality in education. In: INFODIDACT 2011 Informatika Szakmódszertani Konferencia (2011)
37. Pérez-Sanagustín, M., Hernández-Leo, D., Santos, P., Kloos, C.D., Blat, J.: Augmenting reality and formality of informal and non-formal settings to enhance blended learning. IEEE Trans. Learn. Technol. 7(2), 118–131 (2014)
38. Radu, I.: Augmented reality in education: a meta-review and cross-media analysis. Pers. Ubiquit. Comput. 18(6), 1533–1543 (2014)
39. Read, J.C., MacFarlane, S., Casey, C.: Endurability, engagement and expectations: measuring children's fun. In: Interaction Design and Children, vol. 2, pp. 1–23. Shaker Publishing Eindhoven (2002)
40. Riva, G., Waterworth, J., Murray, D.: Interacting with Presence: HCI and the Sense of Presence in Computer-mediated Environments. Walter de Gruyter GmbH & Co KG (2014)
41. Roth, W.M.: Authentic School Science: Knowing and Learning in Open-inquiry Science Laboratories, vol. 1. Springer, Dordrecht (2012). https://doi.org/10.1007/978-94-011-0495-1
42. Shelton, B.E., Hedley, N.R.: Exploring a cognitive basis for learning spatial relationships with augmented reality. Technol. Instr. Cogn. Learn. 1(4), 323 (2004)
43. Singhal, S., Bagga, S., Goyal, P., Saxena, V.: Augmented chemistry: interactive education system. Int. J. Comput. Appl. 49(15) (2012)
44. Squire, K.D., Jan, M.: Mad city mystery: developing scientific argumentation skills with a place-based augmented reality game on handheld computers. J. Sci. Educ. Technol. 16(1), 5–29 (2007)
45. Arloon Trade Mark: Plants, January 2019. http://www.arloon.com/apps/plants/
46. TTPM, P.T.: Bugs 3D Interactive Book, January 2020. https://ttpm.com/p/9759/popar-toys/bugs-3d-interactive-book/
47. Valimont, R.B., Gangadharan, S.N., Vincenzi, D.A., Majoros, A.E.: The effectiveness of augmented reality as a facilitator of information acquisition in aviation maintenance applications. J. Aviat./Aerosp. Educ. Res. 16(2), 9 (2007)
48. Vito Technology Inc.: Star walk - the sky at your fingertips, January 2020. https://starwalk.space/en/
49. Vygotsky, L.S.: Thought and Language (E. Hanfmann & G. Vakar, Trans.) (1962)
50. Wijdenes, P., Borkenhagen, D., Babione, J., Ma, I., Hallihan, G.: Leveraging augmented reality training tool for medical education: a case study in central venous catheterization. In: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–7 (2018)
51. Zünd, F., et al.: Augmented creativity: bridging the real and virtual worlds to enhance creative play. In: SIGGRAPH Asia 2015 Mobile Graphics and Interactive Applications, pp. 1–7 (2015)
52. Zurita, G., Nussbaum, M.: A constructivist mobile learning environment supported by a wireless handheld network. J. Comput. Assist. Learn. 20(4), 235–243 (2004)
Enhancing Whale Watching with Mobile Apps and Streaming Passive Acoustics
Nuno Jardim Nunes1,2, Marko Radeta1,3(B), and Valentina Nisi1,2
1 ITI/LARSyS, Funchal, Portugal
2 Tecnico - University of Lisbon, Lisbon, Portugal {nunojnunes,valentina.nisi}@tecnico.ulisboa.pt
3 University of Madeira, Funchal, Portugal [email protected]
Abstract. Whale watching is a prime example of ecotourism services reaching millions of people worldwide. In this paper, we describe a prototype of a mobile application designed to enhance the whale watching experience. Integrating eco-acoustics and entertainment components, the authors designed an App that enhances customer satisfaction even in the case of distant sightings, which helps preserve animal well-being in this growing ecotourism activity. This paper presents the design and preliminary evaluation of the system, which involves many different components, drawing on established usability and presence scales. We asked 12 participants to conduct several activities, including (i) report a sighting, (ii) classify a cetacean vocal call, and (iii) gather a cetacean photo. Because of the system's novelty, we report on an extensive evaluation framework, including usability and presence scales reported by 12 whale watchers. We discuss the implications of our findings on the design of future applications for ecotourism services.
Keywords: Whale watching · Environmental sustainability · Mobile entertainment · User experience · Ecotourism
1 Introduction
Environmental sustainability is one of the fastest-growing areas of activity in technology-related research [16]. Many research efforts are examining the opportunities to use computing technologies to promote environmental protection and ecological consciousness from various angles, including entertainment [2,10,23,51,60]. This is a responsible reaction to the twin crises of climate and nature, which are supported by overwhelming scientific evidence [43] and are causing increasing public concern [47]. Put simply, people and wildlife are at risk unless we take urgent action to mitigate and hopefully reverse the loss of plants and animals on which our ecosystems depend for food, clean water, and a stable climate [28].
Oceans are a crucial driver of Earth's ecosystems. They cover three-quarters of the Earth's surface, contain 97% of the Earth's water, and absorb 30% of the carbon emissions, acting as a buffer for global warming. Careful management of this essential global resource is a vital feature of a sustainable future, recognized by the United Nations Sustainable Development Goal 14 - Conserve and sustainably use the oceans, seas and marine resources¹. However, we observe a continuous deterioration of coastal waters due to pollution and ocean acidification, compromising ecosystems and biodiversity. The UN SDG 14 goal ambitiously targets to protect 10% of the ocean by 2020, preventing or significantly reducing marine pollution [12]. At the present scale, although 12% of the Earth's land is conserved through arrangements such as national parks, only 5% of the ocean has any protection at all². Marine Protected Areas are a useful management tool towards conservation [17]. However, their size and scope are not always adequate for protecting marine megafauna, like whales, dolphins, and sharks [11]. Conservation measures for these large pelagic animals depend greatly on data acquisition in the open ocean to ascertain their populations' status and detect temporal and spatial patterns in their distribution. Such an endeavor usually relies on dedicated monitoring and vessel-based survey programs, which are typically costly and time-consuming, often leading to temporally sparse or sporadic and spatially patchy data. We can partially tackle these data gaps by taking advantage of the growing interest in, and offer of, recreational whale-watching service providers, as these already spend time at sea tracking cetaceans and other large animals. In addition, whale-watching activities engage people with cetaceans, some of the most charismatic fauna, raising awareness and promoting ocean conservation measures and actions [22].
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 205–222, 2020. https://doi.org/10.1007/978-3-030-65736-9_18
1.1 Digital Entertainment for Nature Conservation
Digital technology is changing nature conservation in increasingly profound ways. Some have called this new movement "Wilderness 2.0" [53] or "Nature 2.0" [9], referring to the articulation of new media, mobile technologies, and locative media with the practice of wilderness recreation [53] and online activities that stimulate and complicate the commodification of biodiversity and nature [9]. Digital technology is thus promoting a new reality (between the virtual and the real) where people connect and recreate a new social construct of nature and the underlying digital economy [53], reimagining ideas, ideals and experiences of nature [9]. In this new context, environmentalists have long expressed concern that increasing the mediation of non-human interactions [19] is contributing to diminishing support for conservation [18]. Consequently, digital entertainment is increasingly promoted by environmental organizations precisely for its potential to inspire more active commitment to environmental causes than the direct experiences most conservationists advocate [18,49]. Using gamification and serious games promotes a similar strategy to engage consumers in pro-environmental behaviors in areas such as nature conservation [42], energy efficiency [37], transport [67], and sustainability [31].
¹ https://sustainabledevelopment.un.org
² https://sustainabledevelopment.un.org/frameworks/ouroceanourfuture
1.2 Enhancing the Whale Watching Experience
This project aims to test how digital entertainment could be used in whale watching to engage participants in this form of ecotourism and enhance their experience while promoting more environmentally friendly behaviors. From Moby Dick to Free Willy, the conceptualization of whales in entertainment and their societal representations and understandings have changed dramatically in the last century [63]. Lawrence and Philips propose that whale watching is one key example of an economic activity emerging from the radical changes that occurred in societal understandings and representations of nature and animals [29]. Humans have been hunting whales since the beginning of history, but it was not until the 1930s that saw the development of international regulations to control the number and type of whales harvested [58]. Notwithstanding, these regulations were developed to manage oversupply and later overexploitation and still conceptualized the animals as “whale stocks” of valuable human resources and not as animals or species [30]. Changes in the regulatory discourse impacted the image of whales in popular culture. Most movies until the 1970s present a negative image of whales as “man-eating and ship destroying monsters” of which Moby Dick (1956) and 20,000 Leagues Under Sea (1954) are the most known examples. The movies produced after 1980 present a very different image of whales and their relationship with people. Perhaps the most emblematic example is Free Willy (1993), which is responsible for an increase in cetacean-related tourism activities like dolphin swimming and whale watching as well as visitations to marine amusement parks [63]. Our project aims at digitally enhancing the whale watching experience, while raising awareness about ocean sustainability concerns, such as stressing and disturbing the animals in their habitats. Whale watching is a USD 2.1 billion global industry, attracting more than 13 million people worldwide [11]. 
However, this activity is usually reasonably expensive for participants who, due to the elusive nature of these animals in the wild, often fail to see their (submerged) full-size body. At the same time, whale watching activities and pressure do affect animal behavior, and noise pollution impacts these animals, as they rely heavily on sound to communicate, navigate, and even locate prey [44]. Our motivation for this research is to provide on-boat activities that enhance the whale watching experience, allowing tourists to have a fulfilling interaction with the cetaceans while minimizing stressful situations for the animals themselves. For this purpose, our prototype enhances the whale watching activity by signaling the presence of cetaceans and bringing the experience of their sounds and location to the tourists, preserving distance from the animals and minimizing the invasion of their territory. In the remainder of this paper, we position our work against the state of the art, describe the prototype design, and report a preliminary evaluation of whale watchers using system usability and presence scales. Our results inform a discussion on the future of applications for nature engagement and sensing.
N. J. Nunes et al.
1.3 Research Questions
This work builds on other reported case studies designing novel interactive systems and services for tourism-related applications [3,41,42,48,56]. In our previous work [46], we designed a ubiquitous mobile system to enrich whale-watching activities with passive acoustics. That study was a proof of concept for real-time machine learning classification of cetacean vocal calls using low-cost sensing and communication. Here, we expand the work, allowing users to actively take part in reporting sightings, classifying sounds, and gathering media while not approaching the animals beyond stress-generating proximity. In this study, we obtain participants’ feedback and overall results on the system’s usability and presence. This research therefore mainly explores how successful the system is in providing whale watchers with a fulfilling and entertaining experience. Using several usability and presence scales, we analyze the feedback collected from the participants and discuss the results in light of the application’s future refinement.
2 Related Work
We situate this research among three main areas of prior work: sensing and engaging nature, digital entertainment and nature, and whale watching as a form of ecotourism. In the following subsections, we address each one in turn.
2.1 Sensing and Engaging Nature
The advent of low-cost, ubiquitous sensing and computing has led to increased use of these technologies in biology, ecology, and environmental sciences, e.g., research on ecoacoustics [54]. These efforts build on empirical evidence that biotic and non-biotic sounds can be used as a reliable proxy for investigating ecological complexity [26], including biodiversity loss and climate change. These technologies enable the identification of several thematic priorities for research programs that take advantage of novel computer science and technologies. One notable example is AudioMoth [21], a low-cost, small-sized, and low-energy acoustic detector. The device is open-source and programmable, with diverse applications for recording animal calls or human activity. In 2019, AudioMoth deployments reached more than 700 projects worldwide [21]. The device also facilitated public engagement, including citizen science projects for bat surveys³, biodiversity distribution assessments⁴, BirdsScience⁵, cetalingua⁶, and engagement of the general public in wildlife monitoring programs⁷.
³ https://www.bats.org.uk.
⁴ https://www.soundscapes2landscapes.org.
⁵ https://www.wesa.fm/post/pitt-researchers-are-eavesdropping-birds-name-science.
⁶ https://www.cetalingua.com/citizen-science/.
⁷ https://www.nocmig.com/audiomoth.
Enhancing Whale Watching
Similarly, the HCI community saw an opportunity to apply the same concept to camera traps. My NatureWatch Camera is an inexpensive wildlife camera designed for people to build themselves as a way to promote environmental engagement and produce digital content [20]. My NatureWatch Camera includes a smartphone interface that allows easy setup and retrieval of images. The authors of the project worked closely with a TV show and used social media outreach to highlight the importance of co-development for research projects to circulate at scale. The users of the device also reported increased awareness about local nature and animals. POSEIDON [46] is a low-cost passive acoustic monitoring system for citizen science in nautical/marine settings, which added whale listening to whale watching. Together with other projects focusing on different taxa, such as insects [45,61] or birds [55], it illustrates the possibilities of applying low-cost IoT technologies to sense nature and engage the public.
2.2 Digital Entertainment and Nature
Entertainment and gaming approaches are increasingly used to raise awareness about serious issues in a wide range of fields, including biodiversity conservation [49]. At the same time, environmentalists express concerns about the increasing technological mediation of interactions between humans and non-humans, which keeps people away from nature and has them interact with digital artifacts instead. This phenomenon could be contributing to a growing “nature-deficit disorder” (NDD) and thereby diminishing support for conservation [18,24]. These concerns are grounded in the premise that direct experiences in “natural” spaces, especially those gained in childhood, will inspire people to “love” non-humans [36] and subsequently act in support of environmental causes [64]. Nevertheless, there are many digital entertainment artifacts that address environmental and animal preservation. Examples include Defend Your Territory⁸, a game inciting players to kill evil animal poachers; Camera Birds⁹, which requires the player to take photos of birds in a virtual forest; Wounded iWhale Rescue¹⁰, which invites players to “save” (i.e., catch) whales; Deep Water Hero¹¹, a digital puzzle game in which players break up an oil slick to save animals; and WilderQuest¹², which targets children aged 5 to 8 and places them in an immersive 360-degree virtual environment, encouraging them to observe and discover the nature around them by holding the iPad like a camera. On the other hand, addressing Milton's and Kareiva's concerns [24,36], some digital games encourage players to go into nature in order to interact with it. For example, with the Wildtime App, children or parents use their mobile phones to discover enjoyable activities in nearby green space¹³. Dionisio et al. developed a rich transmedia entertainment experience to promote the natural capital of a biodiversity-rich tourist destination while physically exploring its real settings [15]. Loureiro et al. propose a mobile game to be played in real parks and green spaces, supporting biodiversity awareness through bioacoustic data validation of local species [34].
In summary, mobile apps designed to support nature-immersed recreation are emerging. They frequently integrate GPS and compass functionality with maps to support navigation, record observations, and send alerts of unusual sightings to other users. The ActInNature Hunting app is an excellent example, which adds a social dimension to the above-described functionalities. A minority of apps associated with visitor attractions (e.g., museums, botanical gardens, and parks) provide augmented reality enhancements of exhibits [35,39,62]. Our App, designed to support users' recreation while immersed in nature, builds on and extends the state of the art by combining acoustic sensing and animal sighting while connecting users in building a local database that is open to scientists and non-scientists alike. We paid particular attention to designing an application that would not stress or endanger animals while enhancing the whale watchers’ experience.
⁸ https://www.geekwire.com/2015/game-about-killing-poachers-vies-for-50000prize-in-microsoft-student-tech-competition/.
⁹ https://appadvice.com/app/camera-birds/520656685.
¹⁰ https://gamefaqs.gamespot.com/android/701348-wounded-iwhale-rescue.
¹¹ https://www.amazon.com/Deep-Water-Hero-HD-Free/dp/B007TXSCV0.
¹² https://wilderquest.nsw.gov.au/.
¹³ https://www.thewildnetwork.com/wild-time-ideas.
2.3 Whale Watching and Ecotourism
Whaling was an essential element of the industrial revolution, providing one of the first commercially viable commodities used as a source of energy. The charismatic nature of these giant mammals played an essential role in supporting the rise of environmentalism and NGOs like Greenpeace [6]. More recently, cetaceans have come to embody one of the prime examples of ecotourism services, with the whale watching industry reaching 13 million people worldwide (conducted in more than 119 countries) and accounting for more than 2 billion USD in revenue [52]. From Moby Dick to Free Willy, the conceptualization of whales and their societal representations and understandings have changed dramatically over time [63]. Lawrence and Phillips [30] propose that whale watching is a key example of an economic activity that emerges from the radical changes that occurred in the societal understanding of nature and animals [6]. Humans have been hunting whales since the beginning of history. However, it was not until the 1930s that international regulations were developed to control the number and type of whales harvested. Notwithstanding, these regulations were developed to manage oversupply and later over-exploitation, and they still position the animals as a valuable resource to humans. The argument provided by Latour [7] is that we can witness in human-whale entanglements a demonstration of a more general transformation of society referred to as “ecologisation” [6] (a hybrid world in which the co-production of humans and non-humans is constantly renegotiated) [7]. Whales are thus prime examples of non-human actors, individualized in their “actorhood” and acquiring “personalities”, which became increasingly plausible [57].
Focusing on the design of sustainable interactions with whales, we need to consider the debate between the “short-term” impacts and the “long-term” consequences of whale watching [13]. Indeed, some authors have argued that whale watching corresponds to a metabolic rift of whale hunting, problematizing the pervasive assumption that whale watching correlates primarily and directly with conservation [38]. Other systems have been proposed to examine the impact of climate change on the distribution and abundance of whales [27]. Lambert and colleagues developed a framework for the resilience of whale watching to these changes. They defined resilience as the degree of change in cetacean occurrence experienced before tourist numbers fall below a critical threshold [27]. Their framework combined the likelihood of observing a cetacean, trip type, and tourist type, which, when quantified, could identify which operators are more likely to experience a change in tourist numbers, given a specific scenario of changing cetacean occurrence. While these studies provided systems and insights into the environmental consequences, Bentz and colleagues studied satisfaction with whale watching tours in relation to expectations and demographic variables and identified tour aspects that contribute to satisfaction [5]. Their study was based on expectancy theory, which suggests that participants engage in recreational activities expecting that this will fulfill their needs and motivations [25]. It then compared importance-performance analysis and a performance-only perspective to measure the satisfaction of whale tour participants. Environmentally friendly conditions were the most important expectation; seeing one whale, seeing lots of whales, the cost of the trip, and the boat type were the most influential factors contributing to satisfaction [5].
3 System Description
In this section, we briefly describe the whole system setup, extending POSEIDON [46] with a custom-made, on-boat mobile application designed to support and enhance whale watching activities. The system provides additional ad hoc scientific information, supporting awareness and inspiring conservation of marine megafauna.
3.1 Sensing Hardware and Mobile Application
The eco-acoustic sensing station is encapsulated in an autonomous, buoyant Styrofoam sphere, which contains the microcomputer with the WiFi connection, is powered by solar panels, and carries the media equipment (Fig. 1). Once at the whale-watching spot, the capsule is thrown into the water and dragged behind the sea vessel with a common cable. The acoustic buoyant system was previously described and discussed in [46]. The novel POSEIDON mobile application (Fig. 2), completing the POSEIDON system, is designed to provide users with three types of information: (i) general information about cetaceans; (ii) a media gallery (showcasing the collected sounds and images); and (iii) three action windows (report a sighting, classify a sound, and gather underwater and surface photos). The Watkins Marine Mammal Dataset [50] inspired the selection of cetaceans used in our application. For our study, the App was pre-installed on the whale-watching tourists' phones and accompanied the users during the whale watching tour.
(a) Microcomputer with the power bank, Wi-Fi dongle for communication with GoPro, audio jack and stereo cable, and a hydrophone. Tripod is used for exhibition. (b) Assembled buoyant capsule including all components.
Fig. 1. POSEIDON hardware system components
The App comprises several features, described in the ordered list below:
1. The App supports the storing and collection of reported marine megafauna sightings, shared by previous users. Once an animal is sighted, its GPS position and the sighted species can be logged by the user on the map interface of the App (Fig. 2a, g).
2. The App reports sound recordings from the underwater sensing of megafauna, portraying the live acoustic feed from the hydrophone hardware. The sound reports can be accessed on a separate screen. A detailed description of this part of the App is provided in [46] (Fig. 2b).
3. The App stores the user’s photos, captured by pointing the mobile device and clicking. This action triggers a simultaneous underwater shot, taken by the buoyant hardware system and shared with the App (Fig. 2c).
4. In order to collect information regarding the App's usability and user satisfaction, the App presents a Survey screen to be filled in by individual users at the end of the experience (Fig. 2d).
5. The App presents the Watkins Marine Mammal Dataset collection [50] to facilitate users' recognition and reporting of specific species. The user can select the animal that they think they have sighted at any given moment during the whale watching trip (Fig. 2e).
6. The App contains screens with scientific information and details of the species that have been sighted, according to the Watkins Marine Mammal Dataset collection [50] (Fig. 2f).
7. The App presents the users with an option of classifying the acoustic sounds collected by the hydrophone. The classification of acoustics is categorized within five given options (click, moan, whistle, engine or nothing) (Fig. 2h).
Fig. 2. Screenshots from the POSEIDON application. From left to right: (a) Sightings main screen, portraying the reported sightings; (b) Acoustics main screen, portraying the live acoustic feed from hydrophone; (c) Photos main screen, depicting collected underwater and surface photos; (d) Survey at the end of the experience; (e) Sighting report screen; (f) Sighting Reporting screen information collection; (g) Sighting reported at the GPS location; (h) Classification of acoustics with given options (click, moan, whistle, engine or nothing).
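The sighting feature in the list above (a species and a GPS position logged on the map) can be illustrated with a minimal Python sketch. It is not the POSEIDON implementation: the names `Sighting`, `SightingLog`, `report`, and `markers` are hypothetical, the in-memory list merely stands in for whatever shared store the App uses, and the timestamp field is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Sighting:
    """One reported sighting: a species plus the GPS position where it was seen."""
    species: str
    latitude: float   # decimal degrees, -90..90
    longitude: float  # decimal degrees, -180..180
    # Assumed field: time of the report, stored in UTC.
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self):
        # Reject reports that cannot be placed on a map.
        if not self.species.strip():
            raise ValueError("species must not be empty")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")


class SightingLog:
    """In-memory stand-in (hypothetical) for the shared store of sightings."""

    def __init__(self):
        self._sightings = []

    def report(self, species, latitude, longitude):
        """Validate and store a new sighting, then return it."""
        sighting = Sighting(species, latitude, longitude)
        self._sightings.append(sighting)
        return sighting

    def markers(self):
        """Return (latitude, longitude, species) tuples for a map screen."""
        return [(s.latitude, s.longitude, s.species) for s in self._sightings]
```

A call such as `SightingLog().report("Short-finned pilot whale", 32.65, -16.91)` would add one marker; the species name and coordinates here are illustrative values only.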
(a) Surface image of the pilot whale sunbathing and taking a rest. (b) Underwater images of the two breastfeeding mothers with one calf.
Fig. 3. POSEIDON images depicting spotted cetaceans during the user study.
3.2 Tasks Performed by Whale Watchers
In summary, the App enables tourist whale watchers to perform a set of additional tasks that enhance the experience without disturbing the animals in the wild:
1. Reporting a cetacean sighting – Whale watchers are presented with the map of possible cetacean sightings (Fig. 2a) and are asked to report if they spot a cetacean. Once a cetacean is spotted, the user selects the REPORT button (Fig. 2e) and is guided to enter additional details with the help of the researchers on board (Fig. 2f). These details include the duration and type of the activity, the group size and number of calves, the Beaufort wind scale value, and the boat type. Once the report has all the information, the user adds the report, which instantly appears as a red spot on the map (Fig. 2g).
2. Classifying a cetacean vocal call – Participants can hear the cetacean vocal call, which is recorded by the hydrophone and has been verified as one of the five-second acoustic samples with a prediction above 0.75 for one of the classes (click, moan, whistle, engine noise and empty water). Once the sound is played through the participant's headphones, the participant classifies the sound by pressing the CLASSIFY button. Participants are then guided to select one of the five given options: click, moan, whistle, engine noise and empty water (Fig. 2h). While listening to the sounds, the participants can see a visualization of the waveform of the acoustic samples (Fig. 2b).
3. Gathering cetacean images – Participants can gather images from below and above the water surface. Participants select the type of camera they want to use and press the COLLECT button to obtain the photo (Fig. 2c). The underwater photo is captured by the action camera installed in the capsule component of the hydrophone, while the surface photo is taken with the mobile phone the participants are using. In the case of the underwater photo, participants need to wait approximately five seconds from when the image is shot to when it reaches the user's mobile phone. We opted for this waiting time in order to prolong the battery life and maintain the highest-resolution images (Fig. 3).
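The gating step in the classification task (only five-second samples with a prediction above 0.75 are played, and the participant then picks one of five labels) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' pipeline: `AcousticSample`, `playable`, and `classify` are hypothetical names, the score is assumed to lie in [0, 1], and the `agrees_with_model` field is an addition made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SoundLabel(Enum):
    """The five options offered to participants."""
    CLICK = "click"
    MOAN = "moan"
    WHISTLE = "whistle"
    ENGINE_NOISE = "engine noise"
    EMPTY_WATER = "empty water"


PREDICTION_THRESHOLD = 0.75  # a sample must score above this to be played


@dataclass(frozen=True)
class AcousticSample:
    """A five-second hydrophone sample with the classifier's best guess."""
    sample_id: str
    predicted: SoundLabel
    score: float  # assumed to be a value in [0, 1]


def playable(samples, threshold=PREDICTION_THRESHOLD):
    """Keep only the samples whose prediction is strictly above the threshold."""
    return [s for s in samples if s.score > threshold]


def classify(sample, user_choice):
    """Record a participant's choice for one sample.

    Raises ValueError if user_choice is not one of the five options.
    """
    label = SoundLabel(user_choice)
    return {
        "sample_id": sample.sample_id,
        "user_label": label,
        # Hypothetical extra field: does the participant agree with the model?
        "agrees_with_model": label is sample.predicted,
    }
```

Storing the participant's label next to the model's prediction keeps both available for later validation; whether the deployed App does this is not stated in the text.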
4 Evaluation
In this section, we describe the evaluation of the App’s first working prototype, designed to enhance the whale watching experience. Given the underlying context and the absence of previous work evaluating systems that combine enhancing user experience with engaging participants in conservation activities in the wild, we used an extensive evaluation framework, which includes assessing the usability of the system (SUS [8]) and the presence (PRE [66]) of whale watchers. The study was conducted in Madeira, a historically touristic island destination in the North East Atlantic Ocean with a significant occurrence of cetaceans [1,14]. Participants were invited to use the App during a whale watching trip to report the sightings. The data collected through sightings included the report details described in Sect. 3.2. Our sample (N = 12) included 6 males and 6 females. Their ages ranged from 16 to 45 (M = 30.08, SD = 10.93), and they came from 6 different countries of origin: Portugal (3), France (3), Germany (2), UK (2), Netherlands (1) and Papua New Guinea (1). All participants described themselves as knowledgeable and experienced in using mobile smartphone applications. The study included mostly tourists who came to the island on vacation and opted to go whale watching on sea vessels. No incentives were used to recruit participants. The study consisted of two separate tasks: (i) reporting a cetacean; (ii) answering the questionnaires. Both tasks were completed using the mobile application, which included electronic forms for the System Usability Scale (SUS) and Presence (PRE) questionnaires and also several study-specific questions.
4.1 Data Collection
To evaluate the App, we collected participants’ data according to two scales and several study-specific questions (listed below), resulting in 40 questions (10 SUS + 24 PRE + 6 study-specific). At the end of the experience, these questions were posed to users through the mobile application, which displayed seven-point Likert-type scales (Fig. 2d). The averages of the participants’ feedback are depicted below (Figs. 4 and 5), while here we briefly describe the scales used.
1. System Usability Scale [SUS] (7-point Likert scale) – the standard 10-item SUS questionnaire, which we converted from a 5-point to a 7-point Likert-type scale for consistency with the other scales. This scale was used to gauge the usability of the application and its interface.
2. Presence Questionnaire [PRE] (7-point Likert scale) – although tailored for virtual reality environments, the scale is equally valid in real environments where all subjects experience the same type of environment [59]. We used the PRE scale to understand users' feeling of presence while engaged with the App, that is, to assess whether they still felt present in the natural setting of the experience.
3. Study-Specific Questions [STD] (7-point Likert scale) – several questions targeting the usage of the App: (i) Were you looking more at the actual whales and dolphins than at the App?; (ii) How important to you was the sound during the whale watching activity?; (iii) How much did you like reporting the sightings?; (iv) How much did you like classifying the cetacean sounds?; (v) How much did you like gathering the cetacean photos?; and (vi) Please rate your previous experience as a citizen scientist.
Fig. 4. Boxplot depicting the results of the three questionnaires (N = 12). Red denotes the PRE scale, green the study-specific questions, and blue the SUS scale. (Color figure online)
4.2 Data Analysis
The results of the three questionnaires are summarized in Fig. 4. Overall, the results from the 12 users are positive in terms of usability and presence. We calculated the overall SUS score according to [33] by determining each item’s score contribution, in the range 0–6, as follows: (i) for positively worded items (1, 3, 5, 7 and 9), the score contribution is the answered value minus 1; (ii) for negatively worded items (2, 4, 6, 8 and 10), it is 7 minus the answered value; (iii) missing values are assigned 4 (the center of the rating scale). To get the overall SUS score, we then multiply the sum of the item score contributions by 1.67 (100/60), thus adjusting the overall SUS score to the range 0–100. The overall score was 60.83, which according to [4] corresponds to a grade C (“good”) score. For the Presence questionnaire, we obtained the overall score by averaging the results of the 24 questions on the 7-point Likert scale. The final result (M = 5.45, SD = 0.40) is above the scale midpoint of 4. We also computed Cronbach’s alpha to estimate the reliability, determine correlations between the questionnaires’ items, and assess each questionnaire’s internal consistency. The overall result was good at α = 0.827 (N = 12).
We further investigated the results in terms of the known SUS [32] and PRE [65] factors. This structure is reported for SUS as usability (all questions except 4 and 10) and learnability (questions 4 and 10) [32]. For PRE, the factor structure includes realism (questions 3, 4, 5, 6, 7, 10 and 13), possibility to act (questions 1, 2, 8 and 9), quality of the interface (questions 14, 17 and 18), possibility to examine (questions 11, 12 and 19), self-evaluation of performance (questions 15 and 16), sounds (questions 20, 21 and 22) and haptic (questions 23 and 24). We included the study-specific questions in these factors as follows: realism (question 1), sounds (question 2), possibility to act (questions 3, 4 and 5) and learnability (question 6). The results are summarized in Fig. 5 and show that the factors with higher average scores are realism (M = 5.65, SD = 1.19), possibility to act (M = 5.60, SD = 1.35) and quality of the interface (M = 5.75, SD = 1.27). Conversely, lower scores are found for haptic (M = 4.83, SD = 1.52) and learnability (M = 4.69, SD = 1.92). The study-specific questions showed that the App was not intruding on the experience (were you looking at the whales or the App?), with a mean of 5.58 (SD = 1.24). Users also reported an above-average perception of the importance of the sound provided by the App via the hydrophone, with a mean of 5.42 (SD = 1.62). Users also reported liking reporting the sightings (M = 5.83, SD = 1.03) and classifying the sounds (M = 5.82, SD = 1.70), both with higher mean values than gathering cetacean photos (M = 5.58, SD = 1.16). Finally, users were almost balanced in terms of their experience with citizen science (M = 5.50, SD = 1.88).
Fig. 5. Boxplot depicting the results of the three questionnaires (N = 12) consolidated in the known SUS and PRE factors.
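The SUS scoring adapted to a 7-point scale and the reliability estimate described in this section can be sketched in Python as follows. This is a minimal sketch, not the authors' analysis script: the function names are hypothetical, a missing answer is represented as `None` and replaced by the midpoint of the 1–7 answer range (this sketch's reading of "the center of the rating scale"), and Cronbach's alpha uses the standard formula with sample variances.

```python
from statistics import variance

POSITIVE_ITEMS = {1, 3, 5, 7, 9}   # positively worded SUS items
N_ITEMS = 10
SCALE_MIN, SCALE_MAX = 1, 7        # 7-point answer range used in the study


def sus_score_7pt(answers):
    """Overall SUS score (0-100) from ten answers on a 1-7 scale.

    A missing answer is passed as None and replaced by the scale midpoint.
    """
    if len(answers) != N_ITEMS:
        raise ValueError("SUS needs exactly 10 answers")
    midpoint = (SCALE_MIN + SCALE_MAX) // 2  # 4 on a 1-7 scale
    total = 0
    for item, answer in enumerate(answers, start=1):
        value = midpoint if answer is None else answer
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"item {item}: answer {value} outside 1-7")
        if item in POSITIVE_ITEMS:
            total += value - SCALE_MIN   # contribution 0..6
        else:
            total += SCALE_MAX - value   # contribution 0..6
    max_total = N_ITEMS * (SCALE_MAX - SCALE_MIN)  # 60
    return total * 100 / max_total


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row of item scores per participant."""
    if len(rows) < 2:
        raise ValueError("need at least two participants")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every row needs the same number (>= 2) of items")
    # Sample variance of each item across participants.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the participants' total scores.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When alpha is computed over a questionnaire that mixes positively and negatively worded items, as SUS does, the negatively worded items are normally reverse-scored first so that all items point in the same direction.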
5 Conclusions
Research examining opportunities to use entertainment technologies to promote environmental protection and ecological consciousness among users is a growing area of interest. In this paper, we expand previous work in trying to augment the experience of whale watching using a combination of sensors and a mobile application. Our App allows whale watchers to actively participate in reporting sightings, classifying sounds, and gathering media while staying at a safe distance from the animals, so as not to induce stress in the cetaceans. Here we report our first attempt to test such a system by combining known usability and presence scales with some study-specific questions to understand how fulfilling this new entertainment experience is. We asked 12 participants to conduct several activities, including (i) reporting a sighting, (ii) classifying a cetacean vocal call, and (iii) gathering a cetacean image. We summarize our results against the known factors of the SUS and PRE scales. We observe that factors such as “quality of the interface”, “possibility to act” and “realism” have higher mean values, while factors such as “haptic” and “learnability” have lower mean values. Although our study included the minimum number of participants required for validity of these scales (12), and our results are valid in terms of reliability and internal consistency, we would like to expand the sample to understand further how to evaluate these complex experiences. Future work includes comparing the results of the overall experience with and without the whale watching App and integrating environmental awareness scales into the evaluation framework. Another potential avenue for research emerging from this work is designing applications that promote offshore whale watching experiences, either using holographic displays or combining sound and video from previous experiences. Ultimately, our goal is to expand the state of the art in novel entertainment experiences that lie within this space of environmental consciousness and sustainability, in what we framed as eco-centric design [40].
Acknowledgements. The authors recognize several grants which supported the study: (i) LARGESCALE, #32474 by FCT and PIDDAC; (ii) LARSyS, #UID/EEA/50009/2019 by FCT; (iii) LARSyS - FCT Plurianual funding 2020-2023, by FCT; (iv) INTERTAGUA, #MAC2/1.1.a/385 by MAC INTERREG 2014-2020; and (v) MITIExcell, #M1420-01-01450FEDER0000002 by RAM.
References
1. Alves, F., Ferreira, R., Fernandes, M., Halicka, Z., Dias, L., Dinis, A.: Analysis of occurrence patterns and biological factors of cetaceans based on long-term and fine-scale data from platforms of opportunity: Madeira island as a case study. Marine Ecol. 39(2), e12499 (2018)
2. Arts, K., van der Wal, R., Adams, W.M.: Digital technology and the conservation of nature. Ambio 44(4), 661–673 (2015). https://doi.org/10.1007/s13280-015-0705-1
3. Awori, K., et al.: Flytalk: social media to meet the needs of air travelers. In: CHI 2012 Extended Abstracts on Human Factors in Computing Systems, CHI EA 2012, pp. 1769–1774. Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2212776.2223707
4. Bangor, A., Kortum, P., Miller, J.: Determining what individual SUS scores mean: adding an adjective rating scale. J. Usab. Stud. 4(3), 114–123 (2009)
5. Bentz, J., Lopes, F., Calado, H., Dearden, P.: Enhancing satisfaction and sustainable management: whale watching in the azores. Tour. Manage. 54, 465–476 (2016)
6. Blok, A.: Actor-networking ceta-sociality, or, what is sociological about contemporary whales? Distinktion Scand. J. Soc. Theor. 8(2), 65–89 (2007)
7. Braun, B., Castree, N.: To modernise or ecologise? that is the question. In: Remaking Reality, pp. 232–253. Routledge (2005)
8. Brooke, J.: SUS: a “quick and dirty” usability scale. In: Usability Evaluation in Industry, p. 189 (1996)
9. Büscher, B.: Nature 2.0: exploring and theorizing the links between new media and nature conservation. New Media Soc. 18(5), 726–743 (2016)
10. Chen, Y.Y., Cheng, A.J., Hsu, W.H.: Travel recommendation by mining people attributes and travel group types from community-contributed photos. IEEE Trans. Multimedia 15(6), 1283–1295 (2013)
11. Cisneros-Montemayor, A.M., Sumaila, U.R., Kaschner, K., Pauly, D.: The global potential for whale watching. Marine Pol. 34(6), 1273–1278 (2010)
12. Claudet, J., et al.: A roadmap for using the un decade of ocean science for sustainable development in support of science, policy, and action. One Earth 2(1), 34–42 (2020)
13. Corkeron, P.J.: Whale watching, iconography, and marine conservation. Conser. Biol. 18(3), 847–849 (2004)
14. Dinis, A., et al.: Spatial and temporal distribution of bottlenose dolphins, tursiops truncatus, in the madeira archipelago, ne atlantic. Arquipélago-Life Marine Sci. 33, 45–54 (2016)
15. Dionisio, M., Nisi, V., Nunes, N., Bala, P.: Transmedia storytelling for exposing natural capital and promoting ecotourism. In: Nack, F., Gordon, A.S. (eds.) ICIDS 2016. LNCS, vol. 10045, pp. 351–362. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48279-8_31
16. Dourish, P.: HCI and environmental sustainability: the politics of design and the design of politics. In: Proceedings of the 8th ACM Conference on Designing Interactive Systems, pp. 1–10 (2010)
17. Edgar, G.J., et al.: Global conservation outcomes depend on marine protected areas with five key features. Nature 506(7487), 216–220 (2014)
18. Fletcher, R.: Gaming conservation: nature 2.0 confronts nature-deficit disorder. Geoforum 79, 153–162 (2017)
19. Forlano, L.: Posthumanism and design. She Ji J. Des. Econ. Innov. 3(1), 16–29 (2017)
20. Gaver, W., et al.: My naturewatch camera: disseminating practice research with a cheap and easy DIY design. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2019)
21. Hill, A.P., Prince, P., Piña Covarrubias, E., Doncaster, C.P., Snaddon, J.L., Rogers, A.: Audiomoth: evaluation of a smart open acoustic device for monitoring biodiversity and the environment. Meth. Ecol. Evol. 9(5), 1199–1211 (2018)
22. Hoyt, E., Hvenegaard, G.T.: A review of whale-watching and whaling with applications for the Caribbean. Coast. Manage. 30(4), 381–399 (2002)
23. Jepson, P., Ladle, R.J.: Nature apps: waiting for the revolution. Ambio 44(8), 827–832 (2015). https://doi.org/10.1007/s13280-015-0712-2
24. Kareiva, P.: Ominous trends in nature recreation. Proc. Natl. Acad. Sci. 105(8), 2757–2758 (2008)
25. Kozak, M.: A critical review of approaches to measure satisfaction with tourist destinations. Tour. Anal. 5(2–3), 191–196 (2000)
26. Krause, B., Farina, A.: Using ecoacoustic methods to survey the impacts of climate change on biodiversity. Biol. Conserv. 195, 245–254 (2016)
27. Lambert, E., Hunter, C., Pierce, G.J., MacLeod, C.D.: Sustainable whale-watching tourism and climate change: towards a framework of resilience. J. Sustain. Tour. 18(3), 409–427 (2010)
28. Lavergne, S., Mouquet, N., Thuiller, W., Ronce, O.: Biodiversity and climate change: integrating evolutionary and ecological responses of species and communities. Ann. Rev. Ecol. Evol. Syst. 41, 321–350 (2010)
29. Lawrence, T.B., Phillips, N.: From moby dick to free willy: macro-cultural discourse and institutional entrepreneurship in emerging institutional fields. Organization 11(5), 689–711 (2004)
30. Lawrence, T.B., Phillips, N., Hardy, C.: Watching whale watching: exploring the discursive foundations of collaborative relationships. J. Appl. Behav. Sci. 35(4), 479–502 (1999)
31. Lee, J.J., Matamoros, E., Kern, R., Marks, J., de Luna, C., Jordan-Cooley, W.: Greenify: fostering sustainable communities via gamification. In: CHI 2013 Extended Abstracts on Human Factors in Computing Systems, pp. 1497–1502. ACM (2013)
32. Lewis, J.R., Sauro, J.: Revisiting the factor structure of the system usability scale. J. Usab. Stud. 12(4), 183–192 (2017)
33. Lewis, J.R., Sauro, J.: The factor structure of the system usability scale. In: Kurosu, M. (ed.) HCD 2009. LNCS, vol. 5619, pp. 94–103. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02806-9_12
34. Loureiro, P., Prandi, C., Nunes, N., Nisi, V.: Citizen science and game with a purpose to foster biodiversity awareness and bioacoustic data validation. In: Brooks, A.L., Brooks, E., Sylla, C. (eds.) ArtsIT/DLI -2018. LNICST, vol. 265, pp. 245–255. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-06134-0_29
35. Mann, C.: A study of the iphone app at kew gardens: improving the visitor experience. In: Electronic Visualisation and the Arts (EVA 2012), pp. 8–14 (2012)
36. Milton, K.: Loving Nature: Towards an Ecology of Emotion. Psychology Press, New York (2002)
37. Morganti, L., Pallavicini, F., Cadel, E., Candelieri, A., Archetti, F., Mantovani, F.: Gaming for earth: serious games and gamification to engage consumers in pro-environmental behaviours for energy efficiency. Energy Res. Soc. Sci. 29, 95–102 (2017)
38. Neves, K.: Cashing in on cetourism: a critical ecological engagement with dominant E-NGO discourses on whaling, cetacean conservation, and whale watching 1. Antipode 42(3), 719–741 (2010)
39. Nisi, V., Cesario, V., Nunes, N.: Augmented reality museum’s gaming for digital natives: haunted encounters in the Carvalhal’s palace. In: van der Spek, E., Göbel, S., Do, E.Y.-L., Clua, E., Baalsrud Hauge, J. (eds.) ICEC-JCSG 2019. LNCS, vol. 11863, pp. 28–41. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34644-7_3
Enhancing Whale Watching
221
40. Nisi, V., Prandi, C., Nunes, N.J.: Towards eco-centric interaction: urban playful interventions in the anthropocene. In: Nijholt, A. (ed.) Making Smart Cities More Playable. GMSE, pp. 235–257. Springer, Singapore (2020). https://doi.org/ 10.1007/978-981-13-9765-3 11 41. Nunes, N., Ribeiro, M., Prandi, C., Nisi, V.: Beanstalk: a community based passive wi-fi tracking system for analysing tourism dynamics. In: Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98. Association for Computing Machinery, New York (2017). https://doi. org/10.1145/3102113.3102142 42. Nunes, N.J., Nisi, V., Rennert, K.: beEco: co-designing a game with children to promote environmental awareness-a case study. In: Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 718–727 (2016) 43. Oreskes, N.: The scientific consensus on climate change. Science 306(5702), 1686– 1686 (2004) 44. Parsons, E.: The negative impacts of whale-watching. J. Marine Biol. 2012 (2012) 45. Potamitis, I.: Classifying insects on the fly. Ecol. Inf. 21, 40–49 (2014) 46. Radeta, M., Nunes, N.J., Vasconcelos, D., Nisi, V.: Poseidon-passive-acoustic ocean sensor for entertainment and interactive data-gathering in opportunistic nauticalactivities. In: Proceedings of the 2018 Designing Interactive Systems Conference, pp. 999–1011 (2018) 47. Ratter, B.M., Philipp, K.H., von Storch, H.: Between hype and decline: recent trends in public perception of climate change. Environ. Sci. Pol. 18, 3–8 (2012) 48. Redin, D., Vilela, D., Nunes, N., Ribeiro, M., Prandi, C.: Vitflow: A platform to visualize tourists flows in a rich interactive map-based interface. In: 2017 Sustainable Internet and ICT for Sustainability (SustainIT). pp. 1–2 (2017) 49. Sandbrook, C., Adams, W.M., Monteferri, B.: Digital games and biodiversity conservation. Conser. Lett. 8(2), 118–124 (2015) 50. 
Sayigh, L., et al.: The watkins marine mammal sound database: an online, freely accessible resource. In: Proceedings of Meetings on Acoustics 4ENAL, vol. 27, p. 040013. Acoustical Society of America (2016) 51. Schaal, S., Schaal, S., Lude, A.: Digital geogames to foster local biodiversity. Int. J. Transf. Res. 2(2), 16–29 (2015) 52. Silva, L.: How ecotourism works at the community-level: the case of whale-watching in the azores. Curr. Issues Tour. 18(3), 196–211 (2015) 53. Stinson, J.: Re-creating wilderness 2.0: or getting back to work in a virtual nature. Geoforum 79, 174–187 (2017) 54. Sueur, J., Farina, A.: Ecoacoustics: the ecological investigation and interpretation of environmental sound. Biosemiotics 8(3), 493–502 (2015) 55. Sullivan, B.L., Wood, C.L., Iliff, M.J., Bonney, R.E., Fink, D., Kelling, S.: eBird: a citizen-based bird observation network in the biological sciences. Biol. Conser. 142(10), 2282–2292 (2009) 56. Teixeira, J., Patr´ıcio, L., Nunes, N.J., N´ obrega, L.: Customer experience modeling: designing interactions for service systems. In: Campos, P., Graham, N., Jorge, J., Nunes, N., Palanque, P., Winckler, M. (eds.) INTERACT 2011. LNCS, vol. 6949, pp. 136–143. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3642-23768-3 11 57. Teubner, G.: Rights of non-humans? electronic agents and animals as new actors in politics and law. J. Law Soc. 33(4), 497–521 (2006) 58. Tønnessen, J.N., Johnsen, A.O.: The History of Modern Whaling. Univ of California Press, Berkeley (1982)
222
N. J. Nunes et al.
59. Usoh, M., Catena, E., Arman, S., Slater, M.: Using presence questionnaires in reality. Presence Teleoper. Virtual Environ. 9(5), 497–503 (2000) 60. Uzunboylu, H., Cavus, N., Ercag, E.: Using mobile learning to increase environmental awareness. Comput. Educ. 52(2), 381–389 (2009) 61. Vasconcelos, D., Nunes, N., Ribeiro, M., Prandi, C., Rogers, A.: Locomobis: a lowcost acoustic-based sensing system to monitor and classify mosquitoes. In: 2019 16th IEEE Annual Consumer Communications & Networking Conference (CCNC), pp. 1–6. IEEE (2019) 62. Waterson, N., Saunders, M.: Delightfully lost: a new kind of wayfinding at kew. In: Museums and the Web (2012) 63. Wearing, S., Buchmann, A., Jobberns, C.: Free willy: the whale-watching legacy. Worldwide Hospit. Tourism Themes 3(2), 127 (2011) 64. Wells, N.M., Lekies, K.S.: Nature and the life course: pathways from childhood nature experiences to adult environmentalism. Children Youth Environ. 16(1), 1–24 (2006) 65. Witmer, B.G., Jerome, C.J., Singer, M.J.: The factor structure of the presence questionnaire. Presence Teleoper. Virtual Environ. 14(3), 298–312 (2005) 66. Witmer, B.G., Singer, M.J.: Measuring presence in virtual environments: a presence questionnaire. Presence 7(3), 225–240 (1998) 67. Yen, B.T., Mulley, C., Burke, M.: Gamification in transport interventions: another way to improve travel behavioural change. Cities 85, 140–149 (2019)
Tell a Tail: Leveraging XR for a Transmedia on Animal Welfare
Paulo Bala1,2, Mara Dionisio1,2, Sarah Oliveira3, Tânia Andrade3, and Valentina Nisi2,4
1 FCT, Universidade Nova de Lisboa, Lisbon, Portugal
{paulo.bala,mara.dionisio,valentina.nisi}@iti.larsys.pt
2 ITI-LARSyS, Funchal, Portugal
3 Universidade da Madeira, Funchal, Portugal
4 IST, Universidade de Lisboa, Lisbon, Portugal
Abstract. Playful interactions and storytelling become a powerful conduit for educational-entertainment experiences aimed at informing and critiquing social issues. In this paper, we describe the design of a transmedia experience, which uses its separate channels to deconstruct a complex social issue, while maintaining a coordinated and consistent story experience. Tell a Tail explores the welfare issues of companion animals through (1) a branching comic book extended with Augmented Reality scenes, and (2) an immersive 360° documentary about the rescue of abandoned companion animals. Our contribution is the rationale and design of an initial working prototype of the transmedia experience. Initial pilot evaluations revealed that the prototypes were well received, with participants praising the content and the medium used to deliver it. Through reflection on findings of the pilot studies and design decisions of the prototypes, we delineate how the affordances of different XR technologies can be used to inform and raise awareness of a social issue.
Keywords: Transmedia storytelling · Interactive storytelling · Animal Welfare · EXtended reality · Augmented reality · Virtual Reality
1 Introduction
In recent years, concepts like “Critical Play” [12] and “Newsgames” [6] have accentuated the importance of meaningful play, by using playful interaction to inform and/or critique political and social issues. Stories are suitable conduits for critiquing the status quo as they help us understand the world [15] and, through narrative persuasion and character involvement, can educate, engage and persuade the audience to take action [16]. Projects in the field of Interactive Narratives [34] and in Transmedia Storytelling (TS) [31] have shown potential to influence social change. In this paper, we narrow our focus on companion Animal Welfare issues by aligning playful interactions, storytelling and emerging XR technologies (eXtended-Reality, encompassing: VR, Virtual Reality; AR, Augmented Reality; CR, Cinematic Reality).
P. Bala and M. Dionisio contributed equally.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 223–239, 2020. https://doi.org/10.1007/978-3-030-65736-9_19
For many years, animals were considered only in terms of the benefits they brought to human survival as tools (e.g. dogs for hunting, shepherding, guarding) or objects (e.g. cows for sustenance) [8]. The domestication of certain animals, their anthropomorphization and inclusion as members of the family, accentuated their nature as sentient beings, having great importance in the discourse on Animal Welfare rights [8], leading to legislation on Animal Welfare standards. Showcasing the complexity of the Animal Welfare issue, regardless of enacted legislation (e.g. criminalization of cruelty and abandonment) and support programs, many problems persist in many communities (e.g. abandonment of companion animals; overpopulation of stray animals; underfunded/overpopulated kennels). In our specific context, Madeira Island, several government-sponsored programs already exist (e.g. sterilization, microchip identification). Nevertheless, recent policy changes enforcing “No Kill” animal shelters have left these shelters overpopulated and unable to take in stray animals. Several NGOs (Non-Governmental Organizations) formed by private citizens help to relieve this burden partially, but abandonment and the existence of stray dogs are still pending effective solutions and are negatively impacting the Island’s main economic activity, Tourism.
Abandonment of companion animals is one of the most urgent targets for intervention worldwide [32]. A common strategy is the (re)education of the public concerning this issue. School visits or programs specifically target (re)education of adolescents.
For example, Kim and Lee [21] describe a socio-scientific program for middle school students, implemented both at school and in the community, aimed at making students responsible and proactive citizens. Kim and Lee’s work [21] is heavily based on community engagement (e.g. visiting local shelters). However, physical tours are demanding logistically both for schools and shelters and are not scalable to larger programs; virtual tours could easily replace these. Furthermore, by using traditional mediums like books and videos, Kim and Lee [21] miss an opportunity to introduce interactive and engaging content to students. A broader strategy to (re)educate the public is through the use of Social Marketing [23] campaigns (intended to inform and influence behaviour around a social issue through marketing principles). For example, a campaign in Portugal [33] used social networks to raise awareness about a new law criminalizing animals’ mistreatment and abandonment. Transmedia Storytelling falls in line with this broader strategy of (re)educating the public. TS projects are distributed through different media (e.g. web, social media, books, etc.) but come together as a coordinated and unified entertainment experience [19]. This story structure holds great potential to explore complex social issues, decomposing the problem into more manageable and understandable components. Inspired by the potential of Transmedia aligned with emerging XR technologies, in this paper we describe the design and development of Tell a Tail, a TS experience aimed at adolescents (10–19 years old), focusing on companion
Animal Welfare issues in Madeira Island. Tell a Tail is composed of two main media elements: (1) Tell a Tail Comic AR, a branching story, realised as an AR-enhanced comic book, aiming to raise awareness about abandonment, pet care and education practices; and (2) Tell a Tail 360°, an interactive and immersive 360° documentary on Animal Welfare, focused on the work of NGOs and kennels. While the two story artefacts share the same story world, they explore different topics and stakeholders’ views. The first is specifically aimed at pre-teens (10–13), while the second targets teens (13–19). Through initial pilot evaluations, we discuss how the union of Transmedia, interactivity and XR technology creates multiple design opportunities, such as the use of interactivity to connect knowledge to actionable solutions or the use of media immersion to make participants part of the action. Hence, the main contribution of this work is the design of a transmedia artefact that explores how to leverage XR technologies to inform and raise awareness about a specific social issue.
2 Related Work
2.1 Transmedia for Social Change
In Transmedia Storytelling, Jenkins argues that each medium uniquely contributes to the story world, creating a unified and coordinated entertainment experience [18]. In our work, we explore the role that TS can assume in raising awareness about social issues. When used as a tool to introduce social change, TS is called “Transmedia For Change” (T4C) [31], being correlated with Transmedia activism and personal growth, meaning a change in society or community. Pratten believes that the story is the main component of the experience, and needs to be told to the right people at the right time [31]. Complementing Pratten’s vision, Weinreich states that for this type of transmedia to provoke change, the narrative must be engaging, not only in the topic it tackles but in the content of the story itself [37]. A good practice is crafting real characters, believable conflict, and a good story arc to make it believable. In the story, the problem should be exposed, and a solution should be presented in order to “put your audience in situations where they need to make decisions related to the actions you want your audience to take, and show the consequences of those decisions” [1]. However, a significant and intrinsic limitation of transmedia is that, because of its complex narrative structure, it requires an audience with advanced media literacy. When considering children as the consumers, creators should take into account their information-processing capabilities, memory capacity and their attention span [30]. Hence, the TS approach raises concerns on how it can be applied to audiences in late childhood and early adolescence, since it requires children to have the cognitive maturity and advanced media literacy skills needed to make connections between narratives from the different media platforms and so obtain a fulfilling experience of the story world [30,40].
Several projects explore the union of storytelling and transmedia engagement, to provide its audience with positive messages, by inspiring better choices and/or providing actionable solutions to the issue [1]. TS projects such as Lizzy Bennett
Diaries [7], Fragments of Laura [10] and Priya’s Shakti [9] use recognizable conventions of TS and borrow elements from other types of media such as digital storytelling, documentary film making and art illustration. In this way, hybrid formats emerge “focused on raising awareness about particular social issues or telling the stories of marginalized groups, who otherwise do not have a voice in the public sphere” [13]. To the best of our knowledge, few projects explore the intersection of TS and Animal Welfare. Bear 71 [26] is a TS project where, by engaging with the world of a female grizzly bear through several channels such as webcams, augmented reality and geolocation tracking, participants are able to explore the effects that human settlements have on wildlife. Susi [4] is a transmedia project about an old elephant who has lived locked up in Barcelona’s zoo since 2002. The project started with an interactive documentary about Susi’s story in an effort to move her into a sanctuary. While Susi remains in the zoo and this goal is yet to be achieved, it raised a citizens’ movement, ZOO XXI [2], which is about to achieve a change of model in Barcelona’s zoo. Another transmedia activism experience about wildlife conservation is Endangered Activism [14]. This project makes use of several platforms but also several locations around the world, raising awareness about Animal Welfare conflicts unique to each location. While all these projects show the potential of TS in raising awareness towards Animal Welfare, none of them address the issue of companion animals.
2.2 XR for Social Change
XR technology has the potential to create engaging and novel experiences, many of which focus on improving the state of the world by targeting individuals or communities. AR experiences, leveraging AI and computer vision, can help users understand the world, for example, helping visually impaired people navigate spaces [20]. AR experiences can also introduce content into the world with a purpose, as seen in Norouzi et al.’s work [28] where a simulated virtual dog is used for emotional support. Finally, AR experiences can also be used to create empathy for social causes. For example, the World Wildlife Fund used AR to allow users to take photos with endangered leopards [39] and the British National Health System used AR-tracked markers to simulate and show the importance of blood donations [27]. AR has also been used in conjunction with traditional mediums like graphic novels to make digitally-augmented comic books [22] aimed at augmenting the story rather than duplicating it. To our knowledge, Priya’s Shakti [9] is one of the few AR comics focusing on social change. The immersive nature of VR and CR experiences strongly bonds users to story and characters. Peña et al.’s [29] work on Immersive Journalism laid the foundations for VR nonfiction experiences [5] through works such as “Hunger in L.A.” where the audience’s experience, placed in line for a food bank, mirrors that of the virtual characters. One of the most significant advantages of VR and CR experiences is the ability to transport users in time and space, making them “live” experiences that would not be possible in reality [24,38]. As such, Thomas et al. [36] use VR to depict the presence of trash in the ocean, while Markowitz et al. [25] used VR to stage ocean field trips in the classroom to present the
effects of ocean acidity. VR and CR have also been used to promote the animal cause and the organizations that support it: iAnimal [11] used 360° video to show the living conditions of farmed animals, while Windor’s Green Dog Rescue [3] used a 360° video to show a puppy adoption event.
3 Tell a Tail Transmedia
After initial research on companion animals on the Island (including community surveys, interviews with NGO members, literature review, visits to local shelters, etc.), we identified a complex social problem with several dimensions: uneducated public (how to train dogs, veterinarian care, microchipping, sterilization), preconceptions (preference for breed dogs, pet shops), challenges in the management of kennels and NGOs (overpopulation, lack of funds, lack of volunteers), among others. Due to the vast number of dimensions identified, we opted for the creation of a transmedia world, where different media channels could address different themes of the complex Animal Welfare issue, targeting different audience age ranges. Bearing in mind the limitations of TS regarding audiences’ media-literacy skills [30,40], we splintered the Tell a Tail transmedia world into two sub-projects targeted at two different but close age groups. Each sub-project is self-contained, employing design strategies appropriate to the target audience while maintaining an overall consistent aesthetic. Additionally, web components (Instagram, YouTube, website) provide the audience with multiple entryways to the transmedia, as well as fulfilling a promotional role. Tell a Tail TS leverages XR to create engaging novel experiences for adolescents: (1) Tell a Tail Comic AR, a branching fictional comic book with AR scenes for pre-teens (10–13), and (2) Tell a Tail 360°, an immersive nonfiction documentary for teens (13–19).
3.1 Tell a Tail Augmented Reality Comic Book
Story. In the comic book, story participants can follow the adventures of Chris and her dog Penny. Chris is a pre-adolescent girl struggling to fit in at school, with some self-esteem problems, and easily influenced by her peers. As the narrative unfolds, we witness Chris growing and grappling with the inherent responsibilities of owning a dog. Through a branching narrative, the audience can explore different issues and misconceptions regarding companion Animal Welfare. The overall story is structured to touch upon several themes collected from the field research, such as common reasons behind animal abandonment, how to train and take care of a pet, and how pets have feelings. These topics are present over the two main branches in which the narrative unfolds: (1) Adoption branch, where Penny is adopted from a kennel; and (2) Shop branch, where Penny is purchased in a pet shop. However, each path then tackles a broad set of specific companion Animal Welfare issues, through the unfolding of Chris and Penny’s adventures. For example, the Adoption branch shows to the audience that animals adopted from a kennel can be happy, healthy and affectionate.
Moreover, the branch exemplifies how raising a pet, in this case, Penny, takes time and patience (proper dog training and feeding habits). As this story branch develops, Chris goes through personal problems and quickly dismisses her dog, portraying how easy it is to ignore and forget that a dog has needs and feelings. The Shop branch, on the other hand, exposes the reality of most pet shops and puppy mills, where dogs are bred for profit without regard or proper care, and shows how important sterilization is to prevent overpopulation. By following Chris and her classmates’ adventures, this story branch touches upon the misconception that breed dogs are “special” and better companion animals than those without pedigree. These two story branches never intertwine along the development of the story. Each branch leads to two different endings.
Fig. 1. Tell a Tail Comic AR (Color figure online)
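The branching structure described above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the node names and the position of the second decision point are hypothetical, since the text specifies only that the story splits into an Adoption and a Shop branch, that the two branches never intertwine, and that each branch leads to two different endings.

```python
# Hypothetical story graph: every node name below is invented for illustration.
STORY = {
    "start": ["adoption_1", "shop_1"],  # first choice: kennel or pet shop
    "adoption_1": ["adoption_end_a", "adoption_end_b"],
    "shop_1": ["shop_end_a", "shop_end_b"],
    "adoption_end_a": [],
    "adoption_end_b": [],
    "shop_end_a": [],
    "shop_end_b": [],
}


def reading_paths(graph, node="start"):
    """Return every complete reading path from `node` to an ending."""
    choices = graph[node]
    if not choices:  # a node without outgoing choices is an ending
        return [[node]]
    return [[node] + rest for nxt in choices for rest in reading_paths(graph, nxt)]


def reachable(graph, node):
    """Return the set of nodes reachable from `node`, including itself."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


paths = reading_paths(STORY)
adoption_nodes = reachable(STORY, "adoption_1")
shop_nodes = reachable(STORY, "shop_1")
# "Never intertwine" is modelled here as the two branches sharing no nodes.
branches_disjoint = adoption_nodes.isdisjoint(shop_nodes)
endings_per_branch = {
    branch: sum(1 for n in nodes if not STORY[n])
    for branch, nodes in (("adoption", adoption_nodes), ("shop", shop_nodes))
}
```

Modelling the comic as a graph of this kind makes the stated properties checkable: the number of complete reading paths, the separation of the two branches, and the two endings per branch can all be verified before pages are laid out.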
Twelve AR scenes extend the comic book story to present the audience with Penny’s specific perspective. In these scenes, Penny reacts to situations and talks in the first person, exposing the readers to the animal’s perspective on the events. The AR scenes are distributed in equal numbers among the two main story branches. The goal of the AR scenes is to create a stronger bond between Penny and the reader, making them aware of how animals are sentient and have a different perspective than ours. Furthermore, the AR scenes complement the comic book’s attempt to educate its audience about correct Animal Welfare. For example, in point AR3 - “Veterinarian” (see Fig. 1c/d) we see Penny’s perspective of going to the vet: while initially scared of the vet and vaccines, Penny is relieved when her owner gives her a treat as a reward. This AR scene highlights the importance of providing vet care and the power of positive reinforcement education for pets. In this way, pets associate things that they do not like with something that they do. User Experience Design. The Tell a Tail Comic AR combines the traditional experience of reading a comic book with the agency and immersion of an AR world. Considering the target audience of pre-teens, the number of decisions (branches) was limited in order to achieve a stronger dramatic impact for each
branch choice. Furthermore, the visual style is minimalist but joyful, privileging clarity over complexity in the attempt to keep the young audience’s interest. In the two initial pages, the comic book instructs the audience on how to make choices about the story branching, see Fig. 1a. Each branch is color coded, and the border of each panel is highlighted according to the branch. For example, in Fig. 1b, the color red corresponds to the Adoption branch and green to the Shop branch. To move forward in the story, the reader turns to the page indicated at the edge. In several panels, the graphical symbol of a smartphone, see Fig. 1c, indicates the availability of an AR scene. These scenes are complementary to the main story, and introduce Penny’s perspective on the events, exploring her feelings and fears. To uncover the AR scene, the reader uses the camera view of the AR application to scan the comic book page. This triggers the placing of a 3D environment into the surrounding real world (see Fig. 1d). The AR scene comprises a mix of a 3D background environment and a 2D model of Penny. The application allows the reader to create a significant connection with Penny through her clear and simple dialogue and by physically exploring her interactions with the environment. The graphic style of the AR scenes matches the one in the comic book, maintaining consistency between the mediums. The prototype was developed in Unity 2019.3.0a11, using the ARCore module for image and motion tracking and the Fungus package for the narrative inside the AR scenes.
3.2 Tell a Tail Immersive Documentary
Story. Based on the initial context research, Tell a Tail 360° follows a documentary narrative approach portraying the fieldwork of a local NGO, the challenges faced by a kennel and how a Veterinary Hospital responds to animal abuse. This narrative was organized into two branches, (1) the Kennel Branch and (2) the Vet Hospital Branch. In the first branch, the audience follows the kennel manager during her daily routine at the shelter, showing the conditions, spaces (cat and dog shelters) and the challenges they face, stressing the importance of sterilization and chip identification in controlling the number of abandoned animals. In the second branch, the audience finds more information about the veterinary services applied to a rescued dog, raising awareness about proper animal health care, and what happens when this is neglected. This branch starts inside the veterinary hospital (see Fig. 2) where a team is performing procedures on a mistreated dog, Chico. The audience then has the freedom to explore around the hospital facilities to learn about several rescued dogs (through the NGO’s Instagram posts), or to learn more about Chico. By doing the latter, the audience follows the fieldwork of the local NGO who rescued Chico, an ill guard dog, left in the blazing sun by his owners, without water or food. User Experience Design. To experience the Tell a Tail 360° prototype, the participant is required to put on a Head Mounted Display (HMD) with a Bluetooth controller. At the beginning of the experience, a two-stage tutorial is presented: the first stage shows instructions on how to adjust the HMD and how
to use the controller (to select, go back and skip); the second stage introduces user interface elements (e.g. what is selectable). After the tutorial, the audience is positioned at the kennel entrance, where its manager introduces herself and hints that the viewer can explore different content paths by clicking on interactive points (IPs). The IPs are outlined (white line at 50% opacity, see Fig. 2) and are responsive to the controller (becoming green lines at 100% opacity when hovered); choosing the kennel manager selects the Kennel branch while choosing the kennel gate selects the Vet Hospital Branch. This initial meeting place functions as a “Homepage” because it is the main departure and arrival setting from and to the narrative branches in the documentary. In both branches, selecting an IP might lead to: 1) 360° videos where the user is prompted to choose their path between multiple interaction points; 2) 360° videos where no action is needed from the user; 3) 360° panorama images where 2D images (representing Instagram posts, etc.) are overlaid. After visiting both branches, the experience restarts. The prototype was developed in Unity 2019.3.0a11 and is compatible with Oculus Go HMDs. Additionally, a non-HMD version was developed where controller actions were replaced with mouse and keyboard actions, and looking in the 360° video was done by clicking and dragging the mouse.
Fig. 2. Veterinary Hospital branch of Tell a Tail 360° with two examples of IPs
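The navigation logic of the documentary can be summarised in a short sketch. This is not the prototype's code (which was built in Unity); it is a minimal Python illustration of the behaviour described above. The identifiers `ip_outline`, `HOME_IPS` and `Session` are hypothetical, and two points are assumptions: that each branch returns the viewer to the "Homepage" when it ends, and that "restarts" means the record of visited branches is cleared.

```python
from dataclasses import dataclass, field


def ip_outline(hovered: bool) -> tuple:
    """Outline of an interactive point (IP): white at 50% opacity when idle,
    green at 100% opacity when hovered with the controller."""
    return ("green", 1.0) if hovered else ("white", 0.5)


# IPs offered at the kennel entrance ("Homepage") and the branch each one selects.
HOME_IPS = {"kennel_manager": "kennel", "kennel_gate": "vet_hospital"}


@dataclass
class Session:
    visited: set = field(default_factory=set)
    restarts: int = 0

    def select(self, ip: str) -> str:
        """Play the branch behind a homepage IP and return its name.

        Assumption: the branch ends back at the homepage, so the next
        call to select() starts from the homepage again.
        """
        branch = HOME_IPS[ip]
        self.visited.add(branch)
        # After both branches have been visited, the experience restarts.
        if self.visited == set(HOME_IPS.values()):
            self.visited.clear()
            self.restarts += 1
        return branch
```

A session in which the viewer first chooses the kennel manager and then the kennel gate would visit both branches once and trigger one restart.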
4 Tell a Tail Comic AR Pilot Evaluation
Tell a Tail Comic AR was invited to be showcased at a public school, in a classroom setup. The goal of this pilot study was to understand to what extent the AR component of the experience would enhance the storytelling and the message of the comic book. Furthermore, it would allow uncovering potential shortcomings in the interaction with the comic book and the AR application. A total of 17 pre-teens participated in this pilot evaluation and were randomly divided into two groups (Comic group and Comic+AR group), each in a different classroom. A short questionnaire was delivered at the beginning of the experience, covering age demographics. Of the participants, 5 were male, 11 were female, and one preferred not to answer. Fifteen participants were 10 years old, one participant was 11 years old, and another was 12 years old. Participants in the Comic group would only experience the comic book, and were therefore only provided with a copy of the comic book. Participants in the Comic+AR group would experience the comic book with the AR application, and each participant was provided with a comic book, a smartphone with the AR application and earphones. Due to a limitation in the number of mobile phones, 11 participants were assigned to the Comic group and 6 participants to the Comic+AR group. Participants were given around 30 min to interact with Tell a Tail Comic AR freely. During this time, one researcher observed and took notes. The observation guidelines for the Comic group were the following: “Did participants understand the concept of branching narratives?”; “Did participants freely explore the comic book?”; “Did participants search for alternate endings?”. In the Comic+AR group, participants were observed with particular attention to these questions: “Did participants have difficulties in scanning the comic book pages and/or placing the AR world?”; “Did participants explore the AR world and/or follow Penny?”; “Did participants seem immersed in the AR world?”. From the observations of the Comic+AR group, it was noted that participants first opened the mobile application, before opening the comic book. As the app was not reacting (without scanning the markers in the book), they proceeded to open the comic book and read the instructions. We observed that 4 of the 6 participants were very eager to discover what the mobile application could do, so they very quickly started flipping through the book pages to find the AR icon. Nothing happened, as they pointed the phone camera directly at the AR icon instead of at the whole page, although this was explained in the instructions. This caused some initial frustration that was quickly overcome as some of the participants explained the correct way to use the application to the others. Moreover, because of their short height, it was hard for participants to scan the page while sitting.
Since the scanning process requires some distance between the phone camera and the comic book page (in order to cover the whole area), participants had to stand up to trigger the AR scene. This led participants to gather into groups, interacting with and observing what others were doing. As soon as the AR scene started, they would call Penny’s name and chase her around the classroom space, very amused. They would say out loud that they had found Penny and try to touch and pet her by putting their hands in front of the phone’s camera. After interacting with the AR scenes, some participants grabbed the comic book to see what was going to happen next, saying out loud: “What is going to happen to Penny?” and “Oh no, poor Penny!”. Participants would often look at each other’s screens and wanted to follow along with what was happening in each other’s stories. We noticed that some participants would just flip through the entire comic book looking for AR scenes. We also observed that participants generally had some difficulty in placing the AR world over the physical world, as they did not understand the required actions. ARCore’s recognition system works properly in well-illuminated areas and on textured surfaces. The classroom where the study was performed was not very well illuminated, and most table surfaces were plain white, which caused some tracking issues with ARCore. At this point, the researcher stepped in and explained these limitations. Soon after, participants got creative and searched the classroom for areas where the AR world could be placed more easily, such as
backpacks, jackets, etc. We observed that some participants were unsure whether an AR scene had finished, either because they did not notice the “Exit” button or because they did not realize that they had to press it to end that AR point. Some participants asked if they could engage with the prototype for longer. From the observations of the Comic group, we noted that participants sat quietly at their desks and focused on reading the comic book while correctly following the branching paths. They quickly started reading and following the paths they had chosen. Participants seemed to be fully immersed and engaged with the story, as they would not look up or stop reading. Only one participant got confused at a certain point and asked where a specific story branch would lead. The rest of the participants did not seem to have any trouble following the branching narrative. When participants reached an ending, some of them would go back a couple of pages, considering a new branch to follow. Some participants asked to keep reading the comic for longer than the allotted time. At the end of the session, some participants expressed their enjoyment of the graphical style of the comic book, particularly the bright colours.
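The observation that seated participants could not fit the whole page in the camera view can be illustrated with a small geometric estimate. The sketch below assumes a simple pinhole camera held centred over and perpendicular to the page; the page size and field of view are hypothetical values chosen for illustration, not measurements from the study or parameters of the AR application.

```python
import math


def min_scan_distance(extent_m: float, fov_deg: float) -> float:
    """Smallest camera-to-page distance (in metres) at which an extent of
    `extent_m` metres fits inside a field of view of `fov_deg` degrees.

    Assumes a pinhole camera centred on and perpendicular to the page.
    """
    if not 0 < fov_deg < 180:
        raise ValueError("field of view must be between 0 and 180 degrees")
    return (extent_m / 2) / math.tan(math.radians(fov_deg) / 2)


# Hypothetical numbers: the long side of an A4 page (0.297 m) and a
# 60 degree field of view along that side.
distance = min_scan_distance(0.297, 60.0)
print(f"{distance:.2f} m")
```

Under these assumed values the phone has to be held roughly a quarter of a metre above the page, which a short child seated at a desk may find hard to reach. This is consistent with, but does not prove, the explanation given above.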
5 Tell a Tail 360° Pilot Evaluation
In order to evaluate the prototype, we organized a pilot study at a public school, in a classroom setting. The goal of this pilot was to understand to what extent the narrative medium would affect the overall experience. A classroom of 18 students (all male, between 15 and 18 years old) was divided into 2 groups of 9 students each. Group HMD experienced the HMD version of the prototype (running on an Oculus Go) and, prior to the experience, was informed on the use of the HMD. Group Non-HMD experienced the non-HMD version of the prototype (running on the school’s laptops) and, prior to the experience, was informed on the interactions available. Both groups were briefed on the study, given consent forms and asked to fill out a questionnaire with demographic information and statements on NGOs (e.g. S1 “I’m motivated to volunteer in an NGO”). Each participant interacted with the prototype for 10 min, during which the researcher took observational notes about the overall reactions and difficulties faced. After this, they were asked to fill out a post-questionnaire including 8 statements in the form of ordinal items with 5 levels (“Strongly Disagree” to “Strongly Agree”), including a repeat statement from the pre-questionnaire (S1). The remaining statements were: S2 “The technology used was important to keep my attention”, S3 “The chosen topics were appropriate”, S4 “The duration of the experience was appropriate”, S5 “I’m interested in visiting a kennel”, S6 “I would repeat the experience”, S7 “I’m motivated to research about the Animal Welfare topic”, S8 “Being able to choose my path made my experience more memorable”. The data was analyzed using SPSS 26, and graphs were made in R using ggplot2. As the data consisted of ordinal items, non-parametric tests were used: Kruskal-Wallis tests were performed to compare groups, but no statistically significant difference was found. Figure 3 shows boxplots for statements S2–S8.
Fig. 3. Boxplots for statements S2–S8 for the HMD and Non-HMD groups (one panel per group; horizontal axis: statements S2–S8; vertical axis: agreement level, from “Strongly disagree” to “Strongly agree”). The overlaid coloured circles represent the count (n) of answers for each agreement level.
Additionally, Wilcoxon signed-rank tests were performed to compare pre and post responses to S1. For the Non-HMD group, we found a statistically significant change in the desire to volunteer (Z = −2.46, p = 0.014), with a median value of 2 (Neutral) before the experience and a value of 3 (Agree) after the experience. For the HMD group, we did not find a statistically significant change in the desire to volunteer (Z = −1.89, p = 0.0591), with a median value of 2 (Neutral) both before and after visualizing the narrative. Figure 4, representing the change of responses to S1, shows that all participants either increased or maintained their agreement with the statement.
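For readers who want to retrace this kind of analysis without SPSS, the following is a minimal sketch in plain Python. It is not the authors' analysis script. It shows (a) how a two-sided p-value follows from a Z statistic under the normal approximation, which can be compared with the Z and p values reported above, and (b) a tie-corrected Kruskal-Wallis test for two groups. All function names are illustrative, and any responses passed in would be made-up data.

```python
import math
from itertools import chain


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def midranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and p-value for two groups (df = 1)."""
    if not a or not b:
        raise ValueError("both groups need at least one observation")
    pooled = list(chain(a, b))
    n = len(pooled)
    ranks = midranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b))
    h -= 3 * (n + 1)
    tie_counts = {v: pooled.count(v) for v in set(pooled)}
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:  # every response identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    p = math.erfc(math.sqrt(h / 2))  # chi-square survival function for df = 1
    return h, p


# Z statistics as reported for S1 in the Non-HMD and HMD groups.
print(round(two_sided_p_from_z(-2.46), 4))
print(round(two_sided_p_from_z(-1.89), 4))
```

The normal approximation gives roughly 0.014 for Z = −2.46 and roughly 0.059 for Z = −1.89, in line with the reported values. The small gap to the reported 0.0591 is plausibly due to rounding of Z, although the source does not say so.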
Fig. 4. Alluvial diagram for S1 (“I’m motivated to volunteer in an NGO”), before and after the experience, for the HMD and Non-HMD groups (horizontal axis: Pre-S1 and Post-S1; vertical axis: count of participants, 0–9; colours: agreement level, from “Strongly disagree” to “Strongly agree”).
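The flows drawn in an alluvial diagram such as Fig. 4 are counts of participants per (pre, post) pair of agreement levels. The sketch below shows one way to tabulate them and to check the “increased or maintained” observation. The function names are illustrative and the responses are made up; they are not the study's data.

```python
from collections import Counter

LEVELS = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]


def alluvial_flows(pre, post):
    """Count participants for each (pre, post) pair of agreement levels."""
    if len(pre) != len(post):
        raise ValueError("pre and post must hold one response per participant")
    return Counter(zip(pre, post))


def never_decreased(pre, post):
    """True if no participant moved to a lower agreement level."""
    return all(LEVELS.index(b) >= LEVELS.index(a) for a, b in zip(pre, post))


# Made-up responses for five participants (not the study's data).
pre = ["Neutral", "Neutral", "Disagree", "Agree", "Neutral"]
post = ["Agree", "Neutral", "Neutral", "Agree", "Agree"]

for (before, after), count in sorted(alluvial_flows(pre, post).items()):
    print(f"{before} -> {after}: {count}")
print(never_decreased(pre, post))
```

Each printed pair corresponds to one band of the diagram, with the count giving its width.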
From the observations, we note that participants in the HMD group showed more enthusiasm during the experience, remarking on the events happening inside the virtual environment, expressing surprise and curiosity, and mentioning how close they were to the animals. The observations made during this evaluation point out some technical issues that can be resolved in future iterations. Some participants in the HMD group reported difficulties using
the remote, while some participants in the Non-HMD group showed difficulties remembering keys for navigation actions (e.g. S for Skip).
6 Discussion
The Tell a Tail pilot evaluations highlight the potential of Transmedia storytelling aligned with emerging XR technologies. We will first address each prototype pilot evaluation separately, before discussing the implications for the overall Transmedia project. Regarding the Tell a Tail Comic AR, generally speaking, participants in both groups of the pilot study appreciated the experience. The pre-teens highly praised the visuals of both the comic book and the AR application, perceiving the combination of the comic book with AR as something rather novel to them. Several differences in terms of the user experience were found between the participants of the two groups. While participants in both groups showed that they were excited about the prototype, the excitement of the Comic+AR group was much higher than that of the Comic group. The excitement in the Comic+AR group came from having to use a mobile device and discovering the AR scenes; hence it was heavily influenced by technology. In the Comic group, participants were excited about the possibility of making choices, having various branches to follow, and seeing how this led to different possible endings; hence their excitement was heavily influenced by the power of interactive storytelling. We noticed that most participants in the Comic+AR group did not pay much attention to the story; they would flip the comic book pages just looking for the AR content. In the Comic group, by contrast, participants were keen to get to know the story and follow all its possible outcomes. These observations indicate that while AR technology can provide users with a sense of excitement, it can at the same time become distracting and detach the users from the story itself, making them focus only on what the technology can do. It remains to be analyzed whether, after the novelty factor brought by the AR technology wears off, participants would read the comic book and appreciate its story content.
A significant positive aspect emerging from the Comic+AR group’s interactions with the prototype was the instant connection that they made with the main character Penny, enabled through the interaction with the AR scenes and observed in the out-loud comments made by the participants. This connection was not as clearly observed in the Comic group, as its participants were not as vocal as those in the Comic+AR group and kept their emotions to themselves. Furthermore, in the Comic+AR group, we noticed that participants had more opportunities for content engagement enabled by the AR scenes. Usability issues were found in the AR application. The visual cues used were not clear enough, and difficulties in surface detection when placing the AR world led to some frustration. The issues encountered accentuate the need for a better introduction to the technology and the interactions available. This can be achieved by redesigning the “onboarding” process (introduction of a technology to new users) and streamlining the transition from comic to AR. The Tell a Tail 360° pilot evaluation shows promising results, in particular regarding the possible inclusion of this prototype in educational settings,
with both the HMD and Non-HMD prototypes being well received by the participants. Overall, as can be seen in Fig. 3, most participants in the HMD group strongly agreed or agreed with all the statements, while participants’ opinions in the Non-HMD group were more diverse. In regards to the statements on “technology used” (S2), “chosen topics” (S3) and “duration of the experience” (S4), both groups expressed similar opinions, tending towards strongly agreeing or agreeing; this is expected since topics and duration are the same for both prototypes, and both prototypes used engaging technology to deliver content. Likewise, both groups wanted to “repeat the experience” (S6); we speculate that this was incentivized by the branching narrative, the novelty factor of the technology, and the content itself. Regarding “visiting a kennel” (S5) and “research about the Animal Welfare topic” (S7), responses from the HMD group were more consistently concentrated on strongly agree or agree. We speculate that the novelty factor of VR had some impact on this, as the immersive nature of the experience might have generated interest in either visiting a kennel or investigating the topics further. Lastly, participants in the Non-HMD group had a lower level of agreement than the HMD group on “choosing my path made my experience more memorable” (S8). One possible reason for this is that interaction performed through a mouse click is analogous to browsing a website and is therefore quite common in their daily lives; again, the novelty factor of HMDs might have made the choices of the experience more memorable for the HMD group. While a statistically significant difference in the desire to volunteer in Animal Welfare related activities was found only for the Non-HMD group, results for both groups are encouraging. As can be seen in the alluvial diagram in Fig. 4, all participants either maintained or increased the desire to volunteer; this is promising since these types of experiences could be used to promote activism in students, “demystifying” the experience of volunteering in animal-related NGOs. Furthermore, these results might indicate that the presentation mode (HMD vs Non-HMD), and the inherent novelty factor, might not be a factor in determining their future commitment. This would be consistent with Steinemann et al.’s [35] study on increasing donation behaviour in a Game for Change, where the presentation mode was not a factor, but interactivity was. As in the Tell a Tail Comic AR pilot, usability issues were identified, in particular in the use of the controller in the HMD group. The solution for this is two-fold: (1) a renewed focus on “onboarding” would help participants with no experience in this type of medium, and (2) reducing the number of actions possible with the controller would reduce task load. Looking at the transmedia experience as a whole, the results of both pilot evaluations make it clear that the participants appreciated both the content and the technology used to deliver it. It is worth noting that our target audience, having grown up in a digital age, deals comfortably with technology. We believe that leveraging this predisposition to embrace technology is a good strategy to communicate companion Animal Welfare issues to such an audience. Their natural engagement with technology can be carried over to the socially engaged message. However, this is a subject for further evaluation in the next iteration of the prototypes.
A common link between the pilot studies is the fact that the use of XR technologies implies a novelty factor, which has both advantages and disadvantages (like “onboarding” or access to technology). While experiencing the XR technologies, participants were engaged with the content, but it is yet to be understood to what extent this translates into being informed or able to critically understand the message regarding companion Animal Welfare. Furthermore, designers should leverage the relationship between XR technology and the message, since some XR technologies might carry the message more effectively than others. The interactivity enabled by an AR game puts the participant in a situation where their actions have immediate repercussions on the world and may lead to a solution of the issue. For example, in Tell a Tail Comic AR, the participant is able to see an immediate effect on Penny when they apply a positive reinforcement (dog treat), see Fig. 1d. Furthermore, the immersive nature of XR and the “feeling of being there” [17] can also be leveraged to put the participant in the situation and create a stronger bond with the story and characters. For example, in Tell a Tail 360°, seeing the poor conditions from which the dog was rescued in a 360° environment, see Fig. 2, might establish a stronger connection with the animal and the cause at large. Hence, XR technology features can and should be exploited for the best delivery of the story message.
7 Future Work and Conclusion
Acknowledging the fact that these were pilot evaluations with several limitations (sample size, gender diversity, evaluation measures, iterating prototypes, etc.), a long-term evaluation needs to be conducted to accurately assess the impact of XR technologies in Transmedia Storytelling and their effectiveness in communicating Animal Welfare issues. In summary, Tell a Tail tackles a current, multi-faceted and complex societal concern: companion Animal Welfare. The intricate nature of this topic makes it suitable for a Transmedia world. In fact, the TS story structure allows for complex worlds that can expand as the issues in contemporary society evolve, by providing new content over different channels, pushing for novel experiences to raise awareness of the matters at stake. Both Tell a Tail Comic AR and Tell a Tail 360° show how XR technology can produce an emotional link with the Animal Welfare topic while providing an entertaining experience. Nevertheless, it is important to note that educating audiences is a complex process, and this project provides an entry point to the topic, reporting on a preliminary exploration of how to use Transmedia Storytelling to sensitize young audiences towards Animal Welfare.
Acknowledgments. This work has been supported by MITIExcell (M1420-01-0145FEDER-000002), MADEIRA 14–20 FEDER funded project Beanstalk (2015–2020), LARSyS-FCT Plurianual funding 2020–2023 (UIDB/50009/2020) and FCT Ph.D. Grants PD/BD/128330/2017 and PD/BD/114142/2015.
References
1. The Immersive Engagement Model: Transmedia Storytelling for Social Change. http://www.social-marketing.com/immersive-engagement.html
2. ZOOXXI—Zoo XXI. http://zooxxi.org/en/
3. ABC7: 360 VIDEO: Green Dog Rescue puppies at ABC7 (2018). https://www.youtube.com/watch?v=YWpG9T3N0EY
4. Aire, A., Zonca, C.: Susi, an elephant in the room. https://www.facebook.com/SusiWebdoc/
5. Bevan, C., et al.: Behind the curtain of the “Ultimate Empathy Machine”: on the composition of virtual reality nonfiction experiences. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI 2019, Glasgow, Scotland UK, pp. 1–12. ACM Press (2019). https://doi.org/10.1145/3290605.3300736
6. Bogost, I., Ferrari, S., Schweizer, B.: NewsGames: journalism at play. MIT Press, Cambridge (2010)
7. Bushman, J.: The Lizzie Bennet Diaries. https://www.jaybushman.com/lizziebennet
8. Carr, N.: Defining Domesticated Animals and Exploring Their Uses by and Relationships with Humans within the Leisure Experience. In: Carr, N. (ed.) Domestic Animals and Leisure. Leisure Studies in a Global Era, pp. 1–13. Palgrave Macmillan UK, London (2015). https://doi.org/10.1057/9781137415547_1
9. Chattopadhyay, D.: Can comic books influence consumer awareness and attitude towards rape victims and perpetrators in India? The case of Priya’s Shakti. J. Graph. Novels Comics 1–19, December 2017. https://doi.org/10.1080/21504857.2017.1412992
10. Dionisio, M., Nisi, V., Nunes, N., Bala, P.: Transmedia storytelling for exposing natural capital and promoting ecotourism. In: Nack, F., Gordon, A.S. (eds.) ICIDS 2016. LNCS, vol. 10045, pp. 351–362. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48279-8_31
11. Equality, A.: iAnimal - a virtual reality experience into the lives of farmed animals (2017). https://ianimal360.com
12. Flanagan, M.: Critical Play: Radical Game Design. MIT Press, Cambridge (2013)
13. Gair, S., Van Luyn, A. (eds.): Sharing Qualitative Research: Showing Lived Experience and Community Narratives. No. 21 in Routledge Advances in Research Methods. Routledge, Taylor & Francis Group, Abingdon (2017)
14. Galpin, S., Galpin, D.: Endangered Activism: A Transmedia Experience. https://endangeredactivism.org/where-in-the-world
15. Green, M.C., Brock, T.C.: The role of transportation in the persuasiveness of public narratives. J. Pers. Soc. Psychol. 79(5), 701–721 (2000). https://doi.org/10.1037/0022-3514.79.5.701
16. Green, M.C., Brock, T.C., Kaufman, G.F.: Understanding Media enjoyment: the role of transportation into narrative worlds. Commun. Theory 14(4), 311–327 (2004). https://doi.org/10.1111/j.1468-2885.2004.tb00317.x
17. Heeter, C.: Being There: The Subjective Experience of Presence. Presence Teleop. Virtual Environ. 1(2), 262–271 (1992). https://doi.org/10.1162/pres.1992.1.2.262
18. Jenkins, H.: Convergence Culture: Where Old and New Media Collide. New York University Press, New York (2006)
19. Jenkins, H., Deuze, M.: Editorial: convergence culture. Converg. Int. J. Res. New Med. Technol. 14(1), 5–12 (2008). https://doi.org/10.1177/1354856507084415
20. Katz, B.F.G., et al.: NAVIG: augmented reality guidance system for the visually impaired. Virt. Real. 16(4), 253–269 (2012). https://doi.org/10.1007/s10055-012-0213-6
21. Kim, G., Lee, H.: A case study of community-based socio scientific issue program: focusing on the abandoned animal issue. J. Biol. Educ. 0(0), 1–15 (2019). https://doi.org/10.1080/00219266.2019.1699150
22. Kljun, M., et al.: Augmentation not duplication: considerations for the design of digitally-augmented comic books. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. CHI 2019, New York, NY, USA, Association for Computing Machinery (2019). https://doi.org/10.1145/3290605.3300333
23. Lee, N., Kotler, P.: Social Marketing: Influencing Behaviors for Good, 4th edn. SAGE Publications, Thousand Oaks (2011)
24. Loon, A.v., Bailenson, J., Zaki, J., Bostick, J., Willer, R.: Virtual reality perspective-taking increases cognitive empathy for specific others. PLOS ONE 13(8), e0202442 (2018). https://doi.org/10.1371/journal.pone.0202442
25. Markowitz, D.M., Laha, R., Perone, B.P., Pea, R.D., Bailenson, J.N.: Immersive virtual reality field trips facilitate learning about climate change. Front. Psychol. 9, 2364 (2018). https://doi.org/10.3389/fpsyg.2018.02364
26. Mendes, J., Allison, L.: Bear 71: Wildlife [Transmedia] Storytelling From NFB Canada, November 2012. http://www.bear71.nfb.ca/#/bear71
27. NHS: NHS Virtual Blood Donation (2017). https://www.youtube.com/watch?v=zNWP4lzrJQ
28. Norouzi, N., et al.: Walking your virtual dog: analysis of awareness and proxemics with simulated support animals in augmented reality. In: 2019 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 157–168. IEEE (2019)
29. de la Peña, N., et al.: Immersive journalism: immersive virtual reality for the first-person experience of news. Presence Teleop. Virtual Environ. 19(4), 291–301 (2010). https://doi.org/10.1162/PRES_a_00005
30. Pietschmann, D.: Limitations of Transmedia Storytelling for Children: A Cognitive Developmental Analysis, p. 24 (2014)
31. Pratten, R.: Getting Started with Transmedia Storytelling: A Practical Guide for Beginners, 2nd edn. (2015)
32. Santos Baquero, O., Akamine, L., Amaku, M., Ferreira, F.: Defining priorities for dog population management through mathematical modeling. Prev. Vet. Med. 123, 121–127 (2016). https://doi.org/10.1016/j.prevetmed.2015.11.009
33. Sousa, B., Soares, D.: Combat to abandonment and mistreatment of animals: a case study applied to the public security police (Portugal). In: Galan-Ladero, M.M., Alves, H.M. (eds.) Case Studies on Social Marketing. MP, pp. 245–252. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04843-3_21
34. Steinemann, S.T., Iten, G.H., Opwis, K., Forde, S.F., Frasseck, L., Mekler, E.D.: Interactive narratives affecting social change: a closer look at the relationship between interactivity and prosocial behavior. J. Med. Psychol. 29(1), 54–66 (2017). https://doi.org/10.1027/1864-1105/a000211
35. Steinemann, S.T., Mekler, E.D., Opwis, K.: Increasing donating behavior through a game for change: the role of interactivity and appreciation. In: Proceedings of the 2015 Annual Symposium on Computer-Human Interaction in Play - CHI PLAY 2015, London, United Kingdom, pp. 319–329. ACM Press (2015). https://doi.org/10.1145/2793107.2793125
36. Thomas, A., et al.: Oceans we make: immersive VR storytelling. In: SIGGRAPH Asia 2018 Virtual & Augmented Reality, SA 2018, Tokyo, Japan, pp. 1–2. Association for Computing Machinery, December 2018. https://doi.org/10.1145/3275495.3275513
37. Weinreich, N.K.: Hands-On Social Marketing: A Step-by-Step Guide to Designing Change for Good. SAGE Publications, Thousand Oaks (2010)
38. Westervelt, A.: Reality Is Too Confining, October 2014. https://www.conservationmagazine.org/2014/10/reality-is-too-confining/
39. WWF: “Take a Photo with the Leopard” environmental information campaign (2016). https://wwf.panda.org/?300852/Take-a-Photo-with-the-Leopardenvironmental-information-campaign-Leopard-has-Never-Been-so-Close
40. Young-Sung, K., Daniel, H.B.: An exploration of the limitations of transmedia storytelling: Focusing on the entertainment and education sectors. J. Med. Commun. Stud. 10(4), 25–33 (2018). https://doi.org/10.5897/JMCS2018.0607
Survival on Mars - A VR Experience
Alexander Ramharter and Helmut Hlavacs
Entertainment Computing, University of Vienna, Vienna, Austria
[email protected]
Abstract. It has been a long time since the colonization of Mars was only a point of discussion in science fiction. Today, with the omnipresent topic of climate change, some people picture Mars as a viable alternative to Earth. This paper describes the major challenges of living on Mars, and explains the implementation of a virtual reality experience on the red planet, built upon those challenges. The interactive simulation confronts people with the potential problems on Mars, and this paper discusses and evaluates the impact on the players’ opinions as well as their reactions. Results show that the VR experience affects people’s perception of Mars; however, the general desire to settle Mars remains.
Keywords: Virtual reality · Survival on Mars · Wrong assumptions
1 Introduction
In recent years, the colonization of Mars has reached popular culture to an extent where even SpaceX CEO Elon Musk has frequently addressed this topic through interviews and his social media channels [4,10]. But building a liveable and enjoyable habitat on the red planet is very difficult. Conditions on Mars are hostile, and humans can only live there with massive restrictions. The aim of this work is to build a virtual reality experience which demonstrates a typical day on Mars in an entertaining but also informative way. The goal is then to confront the player with some of the problems a potential Martian inhabitant would have, and to evaluate the players’ reactions. In particular, we answer the following questions:
1. What are the major problems of living on Mars?
2. How can you implement those problems in an interactive Virtual Reality experience?
3. How do people perceive living on Mars before and after the experience?
2 Related Work
Virtual reality is now in its heyday, as big publishers slowly begin to deliver significant titles. At least since the introduction of virtual reality for the smartphone, it has reached great popularity and become more affordable for the end consumer. Due to the high immersion VR provides, many game developers try to create “worlds” which users cannot explore in real life. Since the surface of Mars is still untouched by humans, it makes for a perfect and exciting setting for developers as well as for players.
One of the more detailed implementations of a VR experience on Mars is “Mars 2030” [5]. It was created by Fusion Media Group in cooperation with NASA and lets you explore terrain accurately mapped from satellite footage. You can discover the surface of Mars not only on foot but also with a rover.
The Virtual Astronaut [14] is an interactive tool, built on the Unity engine [11], which can create a virtual 3D environment. A first prototype was created for the Santa Maria Crater on Mars. Scientists can use this tool, for example, to measure distances on the surface in a very intuitive way.
The mental health of crew members during long-term space missions is very important. In [8], the authors examined the use of virtual reality to help sustain mental health in simulated situations typical of NASA missions. Their results show that, thanks to the capabilities VR provides, astronauts can be well prepared, but more research in this field is still necessary.
Besides the work mentioned above, it is worth mentioning that there are many 360° videos on the internet which also provide impressive immersion. Since nearly every phone nowadays supports 360° video playback, many people can benefit from this feature.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 240–247, 2020. https://doi.org/10.1007/978-3-030-65736-9_20
3 Conditions on Mars
Due to the high costs, it would not be possible to transport all resources from Earth to Mars. This means astronauts have to rely on in situ resource utilization (ISRU). This leads to many complications, as Mars has very different environmental parameters from Earth. To keep the virtual reality experience focused on the essentials and the message simple, it concentrates on six major conditions, or rather problems, as detailed below.
Breathing on Mars: The surface pressure on Mars lies between 5 and 11 hPa, which is about 1% of Earth’s atmospheric pressure [12]. That means the human body has to be protected by a pressure suit; otherwise, bodily fluids would boil away within minutes. Oxygen has to be extracted from the atmospheric carbon dioxide or from perchlorate, a chemical hazard found in Martian soil.
Producing Water: Contrary to common belief, there is plenty of water on Mars, though it mostly exists as ice. Some of it exists as vapor in the atmosphere [2]. Because of the low atmospheric pressure, water on Mars only comes in those two forms. That is why melting the ice would cause the water to vaporize instantly.
Toxic Soil: Due to the high concentration of perchlorate, the Martian soil is toxic [13]. This makes growing crops extremely difficult. A lot of research is taking place in this field [1].
Radiation: In contrast to Earth, Mars unfortunately has neither a magnetic field nor a dense atmosphere that could protect the planet from space radiation. The International Space Station (ISS) is not protected by an atmosphere either, but it is still within range of Earth’s magnetosphere. This is why radiation exposure on Mars is two to three times higher than on the ISS.
Extreme Temperatures: With a median temperature of −60 °C, the surface of the red planet is much colder than Earth’s [12]. This means not only that home bases and greenhouses will need very good insulation and heating, but also that machines and robots have to cope with the cold, just as Curiosity uses plutonium not only to power itself but also to keep its systems warm [6].
Generating Power: One way of producing enough energy for survival is to use solar power, as was done, for example, on the Mars rover “Opportunity”. Solar power is relatively easy to install, but there are several problems [3], including smaller yield and dust storms. Another solution would be to use plutonium-238 to produce electricity, as the Mars rover “Curiosity” does [6].
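The “about 1%” figure quoted under “Breathing on Mars” can be checked with a short calculation. The sketch below compares the 5 to 11 hPa range from the text with the standard sea-level pressure on Earth (1013.25 hPa, a reference value not given in the paper); the function name is illustrative.

```python
EARTH_SEA_LEVEL_HPA = 1013.25  # standard atmosphere, reference value


def fraction_of_earth_pressure(pressure_hpa: float) -> float:
    """Return a pressure as a fraction of Earth's standard sea-level pressure."""
    return pressure_hpa / EARTH_SEA_LEVEL_HPA


# Lower and upper bound of the Martian surface pressure quoted in the text.
for mars_hpa in (5.0, 11.0):
    share = fraction_of_earth_pressure(mars_hpa)
    print(f"{mars_hpa} hPa is {share:.2%} of Earth's sea-level pressure")
```

The range works out to roughly 0.5% to 1.1% of Earth's sea-level pressure, so “about 1%” describes the upper end of the quoted range.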
4 Survival on Mars - A VR Experience
In order to convey the fact that living on Mars is probably very different from what many people think, we implemented the VR experience “Survival on Mars”. As the VR headset we used the Oculus Quest. It comes with two controllers, one for each hand. Each controller provides one joystick, two buttons, one trigger and one “grip” button. The experience was developed with the game engine “Unity”, version 2019.2.13f1 [11]. All sounds were either downloaded from the royalty-free music website soundbible.com [9] or are self-made. In the game there is also a robot voice, which was generated on a website called onlinegenerator.com [7] and then captured from there.
4.1 Main Story Line
To let the player dive into the scenario and to leave an impression, it was necessary to create a story structured like a movie, encompassing an introduction, a confrontation, and a climax. The story begins on the main menu screen, where the player learns that a couple of months have passed since the first humans successfully landed on Mars. They live under the surface to stay safe from radiation. Regularly, though, one of the astronauts has to go up to the surface and check the status there. The astronaut has to analyze the plants in the green house to find out whether they are toxic, collect stone samples that will be analyzed for traces of water, and make sure that the electricity is working. The player slips into the role of one of those astronauts and the game starts. He wakes up in the main station, a base which provides several resources that the player will need later on. The astronaut has to go outside; this is the moment the
player first experiences being on the surface of Mars. He sees and hears a light breeze, and feels the harshness of the red planet. During the whole operation the player has to keep an eye on his left HUD, where he sees his oxygen and radiation status. If the oxygen runs out or the radiation gets too high, the game is over. He is prompted to go to the green house and analyze the plants. The player walks in and sees many different plants. At the other end of the room, there is a “Plant Analyzer”, which allows the astronaut to check whether a plant is toxic or edible. He has to put a plant on it, then the analyzing process starts. After he finds an edible plant, he has to bring it to the main station. In the main station there is a “Radiation Defuser”, which brings down the player's radiation level. After this, the player gets the task to go to the drilling station, where he has to collect multiple stone samples. He does so and again has to return to the main station. Suddenly a voice tells him that he has to get the power back on quickly because the green house is in danger of freezing without heating. He rushes there and manages to get the power back on. A voice then tells him that a sand storm is on the way and that he has to get back to the main station as fast as possible. The player tries to get there. As the sand storm reaches the player's location, he notices a drastic oxygen loss. This is where the game ends.
4.2 Description of Scenes
Title Screen: The first thing the player sees after the “Splash Screen” with the Unity logo is the Title Screen. It features a view of the red planet with the title “Survival On Mars” and a Start button. It is also the first time the player sees his “virtual hands”. To navigate through this menu and press buttons, a ray is attached to the right hand. It points from the hand to the canvas which holds the button and the text. After the player presses Start, a brief introduction to the backstory is shown while the next scene is loaded in the background (cf. Sect. 4.1). Then the player can finally start the main scene.
Main Scene: After the title screen the player wakes up in the Main Base of the Main Scene. Tasks are displayed on his HUD, and he explores the main compound (cf. Fig. 4), which consists of the following sites:
– The Green House (cf. Fig. 1): The green house is a lab where different kinds of plants are cultivated. A big problem of growing food on Mars is that the soil is toxic due to perchlorate [13]. This is why the lab features a “Plant Analyzer”, which analyzes whether plants are toxic or not.
– The Drilling Site (cf. Fig. 2): The drilling site is a place where a big drilling machine mines Martian rocks to look for potential water resources.
– The Solar Panels (cf. Fig. 3): The solar panels are responsible for the power supply of the whole mission.
Fig. 1. Inside the green house
Fig. 2. The drilling site
Fig. 3. The solar panels
Fig. 4. The map of the experience holds the home base, the green house, the drilling site and the solar panels
4.3 Gameplay and Mechanics
To show the player all the information he needs to finish his tasks, and also to add game mechanics, it was necessary to create a Head-Up-Display (HUD) system, in this case displayed on the player's left and right wrists. To display the current oxygen and radiation levels (see below), the left wrist is equipped with two sliders, one for the oxygen and one for the radiation. To show the user what he is supposed to do, the right wrist has a HUD which always shows the current mission or goal (cf. Fig. 5).
Fig. 5. The left and the right HUD
Fig. 6. Some buttons.
To make it easier for the player to fulfill the tasks and find his way, several highlighters are placed on the map. A highlighter is switched on when the mission requires the player to go to the corresponding location, and switched off again when he has arrived there. One of the main interaction methods used in the experience is buttons (cf. Fig. 6). To activate a button, the player does not have to press a button on the controller, but simply extends his index finger and moves it towards the button until he pushes it. Many objects in the experience can be grabbed by the player. For that, the user has to press the “grab button” or simply make a fist. The grabbable object then sticks to the hand and can be moved and rotated in any direction. Picking up objects is used, for example, when analyzing plants in the green house. As a gameplay element, the player always has to keep an eye on his oxygen and radiation levels. If the oxygen runs out, or the radiation level hits the maximum, the game is over.
4.4 Missions and Challenges
The player has to fulfill three missions:
1. Analyzing the plants in the green house. The player has to get to the green house. Inside there is the “Plant Analyzer”, a fictional machine which can determine whether a plant is toxic or not. The user has to grab one of the plants and put it on the analyzer. If the display shows that the plant is edible, he has to take it with him and bring it back to the main base.
2. Collecting stone samples from the drilling site. Here the player has to collect at least 3 stone samples and bring them back to the main base.
3. Repairing the power generator at the solar panels. Since the plants in the green house need a sufficiently warm temperature, they rely on the power supply from the solar panels. The player is told that he must reactivate the power supply to save the plants. A slider indicates how much time the player has left.
Besides these 3 missions, the player always has to check his oxygen reserves and radiation level. If the player is inside and the doors are closed, he does not lose oxygen and does not gain radiation. To reduce the radiation he can use the “Radiation Defuser” in the main base, but he can only use it once in the whole experience. In a very early stage of the project, the only way for the player to change his location was by “teleporting” to fixed positions. Later on, we decided that it would be better if the player could move around freely, since that provides more freedom. Unfortunately, this has the side effect of increasing the occurrence of motion sickness for some people.
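The oxygen and radiation rules described above can be sketched as a small state update. This is a minimal illustration, not the authors' Unity implementation: the rates, the maximum level, and the assumption that the defuser resets radiation to zero are all hypothetical, since the paper gives no numbers.

```python
from dataclasses import dataclass

# Hypothetical tuning values; the paper does not state rates or limits.
OXYGEN_LOSS_PER_S = 0.5
RADIATION_GAIN_PER_S = 0.25
MAX_LEVEL = 100.0


@dataclass
class Vitals:
    oxygen: float = MAX_LEVEL
    radiation: float = 0.0
    defuser_used: bool = False

    def tick(self, dt: float, sheltered: bool) -> None:
        """Advance by dt seconds; sheltered means inside with the doors closed."""
        if sheltered:
            return  # no oxygen loss and no radiation gain indoors
        self.oxygen = max(0.0, self.oxygen - OXYGEN_LOSS_PER_S * dt)
        self.radiation = min(MAX_LEVEL, self.radiation + RADIATION_GAIN_PER_S * dt)

    def use_defuser(self) -> bool:
        """Lower radiation once per playthrough; return False if already used.

        Resetting to zero is an assumption; the paper only says the level
        is brought down.
        """
        if self.defuser_used:
            return False
        self.radiation = 0.0
        self.defuser_used = True
        return True

    @property
    def game_over(self) -> bool:
        return self.oxygen <= 0.0 or self.radiation >= MAX_LEVEL
```

A game loop would call `tick` once per frame with the elapsed time and check `game_over` afterwards; the single-use flag mirrors the rule that the defuser works only once per experience.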
5 Evaluation and Discussion
To see whether the experience would have an effect on people's perception of Mars, several people were asked to do a full playthrough of the game. Overall, 7 people participated: 4 men and 3 women. Their ages ranged from 26 to 64. None of the participants had a scientific background or special previous knowledge of the subject. Beforehand, the participants were asked to answer a questionnaire. It consisted of basic statements about conditions on Mars and the general perception of the red planet, as well as whether they had ever experienced virtual reality before. The participants had to answer ‘yes’ or ‘no’ to state whether their opinion matched the statement:
1. “I support attempts to land astronauts on Mars.”
2. “Mars is a suitable alternative to Earth for humankind.”
3. “It is possible to survive on Mars without a space suit.”
4. “There are water resources on Mars.”
5. “Temperatures on Mars' surface are similar to Earth's surface.”
6. “Just like Earth, Mars has a thick atmosphere.”
7. “I have tried virtual reality before.”
During the actual experience, the participants were introduced to the controls and given a pair of headphones to achieve a higher immersion. If they had problems finishing the tasks, they received a hint. After they had finished the experience, either by losing or by completing the game, they got a second questionnaire with the same questions, but without question no. 7. It turned out that everyone who supported Mars expeditions in the first place did not change their opinion after the experience. Everyone who thought that Mars was a suitable alternative to Earth changed their opinion, although some mentioned that Mars was not the best, but “the only alternative” to Earth. Every participant knew that it is impossible to survive without a space suit on the red planet, and everyone also knew that there is water on Mars. However, none of the participants knew that Mars in fact has a very thin atmosphere instead of a thick one, and they still did not know it after the experience. Only a few were correct about the drastic temperature differences on Mars before the experience; everyone knew about them afterwards. Every participant learned about the toxic soil and the radiation in the experience. Overall the participants did well in fulfilling the prompted tasks. Unfortunately, some of the players suffered from motion sickness during the experience, and one of them even had to stop because of it. Another point of criticism was the lack of action or music.
6 Conclusions and Future Work
The results show that “Survival On Mars” has an effect on people's perception of Mars. For future work, a revision of the existing game with a more
focused approach on entertainment would probably benefit the overall experience and could also lead to a much greater impact on people's opinions. Even though the experience raised awareness of how hostile Mars is, the desire to settle Mars is, like the general wish to explore the universe, still deeply embedded in our society and will probably remain there.
References
1. Wamelink, G.W.W., et al.: Crop growth and viability of seeds on Mars and Moon soil simulants. Open Agric. 4(1), 509 (2019). https://www.degruyter.com/view/j/opag.2019.4.issue-1/opag-2019-0051/opag-2019-0051.xml. https://doi.org/10.1515/opag-2019-0051
2. Jakosky, B.M., Haberle, R.: The seasonal behavior of water on Mars. In: George, M. (ed.) Mars, pp. 969–1016 (1992)
3. Landis, G., et al.: Mars Solar Power, December 2004. https://doi.org/10.2514/6.2004-5555
4. McFall-Johnsen, M., Mosher, D.: Elon Musk says he plans to send 1 million people to Mars by 2050 by launching 3 Starship rockets every day and creating ‘a lot of jobs’ on the red planet. businessinsider.de, 18 January 2020. https://www.businessinsider.de/international/elon-musk-plans-1million-people-to-mars-by-2050-2020-1/?r=US&IR=T
5. NASA. Mars 2030 VR Experience. https://www.nasa.gov/stem-ed-resources/freedownload-for-educators-experience-mars-in-virtual-reality-with-mars-2030.html
6. NASA. Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) - Factsheet. https://mars.nasa.gov/msl/files/mep/MMRTG FactSheet update 10-2-13.pdf. Accessed Jan 2020
7. Onlinetonegenerator. Onlinetonegenerator. http://onlinetonegenerator.com/voicegenerator.html. Accessed Jan 2020
8. Salamon, N., et al.: Application of Virtual Reality for Crew Mental Health in Extended-Duration Space Missions, September 2017
9. Soundbible. Soundbible. http://soundbible.com/. Accessed Jan 2020
10. SpaceX. Missions To Mars. https://www.spacex.com/mars. Accessed Jan 2020
11. Unity. Unity. https://unity.com/. Accessed Jan 2020
12. Verseux, C., et al.: Sustainable life support on Mars - the potential roles of cyanobacteria. Int. J. Astrobiol. 15, 65–92 (2016). https://doi.org/10.1017/S147355041500021X
13. Wadsworth, J., Cockell, C.: Perchlorates on Mars enhance the bacteriocidal effects of UV light. Sci. Rep. 7, July 2017. https://doi.org/10.1038/s41598-017-04910-3
14. Wang, J., Bennett, K., Guinness, E.: Virtual astronaut for scientific visualization - a prototype for Santa Maria crater on Mars. Future Internet 4, December 2012. https://doi.org/10.3390/fi4041049
Tangible Multi-card Projector-Based Interaction with Physics
Songxue Wang, Youquan Liu(B), and Junxiu Guo
School of Information Engineering, Chang’an University, Xi’an, China
{2018124053,youquan,2018124055}@chd.edu.cn
Abstract. We propose a projector-based interaction method and system that fuses virtuality and reality and lets participants interact physically with real objects, such as cards, on a desk or table. An infrared camera is mounted on the micro-projector to detect the hidden markers embedded in the cards, while the projector projects the generated virtual objects onto the surface of the cards. When participants move or rotate the tangible cards, the velocity of the markers is calculated from the pose change of the cards. In particular, both the pose and the velocity are used to control the virtual characters, which enables participants to interact with virtual objects realistically with the help of physics simulation. Meanwhile, by tracking multiple cards and projecting according to their orientation, we design a configurable dynamic storyboard system. A variety of example interactive scenarios are provided, which verify the validity and extensibility of our method.
Keywords: Projector-based interaction · Physical interaction · Fusion of virtuality and reality · Tangible cards
1 Introduction
Many intelligent projectors integrate interaction through fingers, gestures or even tangible objects, such as Sony’s Xperia Touch [8]. Such projector-based interactions are often used in education, entertainment and other fields. In this paper, we propose a novel projection system supporting tangible interaction with cards, in which temporal-spatial information is used to control the virtual characters. The projected content is manipulated by the participants through the cards, which is similar to traditional Chinese shadow play. The difference is that the participants manipulate cards with infrared (IR) hidden markers [6], and the generated virtual content is projected directly onto the tabletop and the surfaces of the tangible cards according to the pose of the target markers. Our contributions are twofold:
– We obtain the pose and velocity information of cards by tracking the IR hidden markers, and combine it with physical simulation to give physical properties to the projected virtual objects.
– We present a configurable dynamic storyboard system based on multi-target tracking and orientation-aware projection onto multiple cards; designing new kinds of cards makes the storyboard expandable.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 248–254, 2020. https://doi.org/10.1007/978-3-030-65736-9_21
2 Related Work
As a pioneering work, DigitalDesk [10] presented a way to combine physical documents with digital ones through projection. Wilson designed a desktop system named PlayAnywhere [12], which used oblique projection to display virtual information, shadows to detect touch, optical flow to estimate motion in order to support translation, scaling and rotation of virtual objects, and a Hough transform on Sobel edges to track pages. However, only images or videos are displayed, and interaction is limited to these transforms. With the popularity of projectors, interactive electronic whiteboards have become widely used. This kind of interaction mainly uses fingers or a pen to manipulate the content, such as Xperia Touch [8], which uses an IR sensor to perceive the movement of fingers on the table, and DreamBoard [3], which is based on handwriting recognition. These systems lack a sense of control over the projected objects. Marco et al. designed a vision-based desktop system named NIKVision [5], in which users manipulate toys with markers attached to the bottom to control the story character. Since a camera must be mounted under the desktop to read the markers, the system limits the user’s operation space. Villar et al. presented a Near Field Communication (NFC) based system named Zanzibar [9], in which toys with NFC labels attached are used to control the story character. A mat with an integrated NFC scanner and capacitive sensing is used to track the toys. Users can manipulate the toys on the mat to create a story, and the story scenario is displayed on a screen. However, users have to divide their attention between operating the tangible objects and watching the screen. Wu et al. proposed MagicPAPER [13], based on multiple sensors such as AirBar, Kinect, LeapMotion and a camera, which uses tangible objects like toothbrushes and wooden blocks to interact on kraft paper.
As paper markers are also tangible objects and work well in real settings, some existing work, such as [1,7,11], explored tangible interaction based on tracking paper markers without many sensors. In particular, Willis et al. [11] proposed mobile interaction with tangible objects and surfaces by detecting IR hidden markers. However, only position information is extracted from the tracked markers to drive the projected content.
3 Method
Our system is composed of 4 parts, as shown in Fig. 1. The OpenCV library is used to calibrate the camera [14] and the projector [2] to obtain their internal parameters, external parameters and distortion coefficients. Our hardware design ensures that the camera-projector system only needs to be calibrated once, and there is no need to re-calibrate when moving the device.
Fig. 1. Overview of our system
The IR hidden markers printed on the tangible cards are tracked by a visual tracking algorithm to obtain the pose of the cards, and the velocity is then calculated from the change in card pose. Participants control a virtual character by manipulating the corresponding card on the desktop. Different cards play different roles in a virtual scenario and produce different physical effects, since not only the pose but also the velocity is used. In the following sections we describe the key points of our system: hardware, tracking and locating, aligned projection, physical simulation and storyboard.
3.1 Hardware
Our customized hardware consists of a projector with a resolution of 1024 × 768, an IR camera that captures 640 × 480 images, and IR light sources. The camera-projector system is mounted on a stand. Since the card serves as both tracking input and projection output, we use IR-absorbing ink to print the markers on the card to avoid interference between the projected content and the marker patterns.
3.2 Tracking and Locating of Cards
To ensure the robustness of tracking and fluency of interaction, fiducial markers are used to locate the cards. Specifically, we perform adaptive threshold segmentation on the original image with OpenCV, and then detect markers using the ARToolKit library [4]. Here, the marker coordinate frame $O_M X_M Y_M Z_M$ and the camera coordinate frame $O_C X_C Y_C Z_C$ are established to describe the marker locating process. The displacement of each marker corner in marker coordinates, $\mathbf{p}_M$, is preset from the marker size, and its displacement in the camera coordinate frame, $\mathbf{p}_C$, is calculated after the marker is detected. Suppose $\tilde{\mathbf{p}}_C = (\mathbf{p}_C, 1)^T$ and $\tilde{\mathbf{p}}_M = (\mathbf{p}_M, 1)^T$; the transformation matrix $M_{MC}$ can then be calculated by

$$\tilde{\mathbf{p}}_C = M_{MC}\,\tilde{\mathbf{p}}_M. \tag{1}$$

Since the marker center is the origin of $O_M X_M Y_M Z_M$, letting $\tilde{\mathbf{p}}_M = (0, 0, 0, 1)^T$ gives the estimated position of the marker.
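As a rough illustration of this step, the sketch below applies a 4 × 4 marker-to-camera matrix to the marker origin, as in Eq. (1), and then estimates a velocity from two successive positions. The finite-difference velocity is an assumption: the paper only states that velocity is calculated from the pose change. All function names and sample values are hypothetical.

```python
from math import sqrt


def transform(m, p):
    """Apply a 4x4 homogeneous matrix m (row-major nested lists) to a 4-vector p."""
    return [sum(m[r][c] * p[c] for c in range(4)) for r in range(4)]


def marker_position(m_mc):
    """Camera-frame position of the marker centre: M_MC applied to (0, 0, 0, 1)^T."""
    x, y, z, w = transform(m_mc, [0.0, 0.0, 0.0, 1.0])
    return (x / w, y / w, z / w)


def velocity(prev_pos, curr_pos, dt):
    """Finite-difference velocity between two positions taken dt seconds apart."""
    return tuple((c - p) / dt for p, c in zip(prev_pos, curr_pos))


def speed(v):
    """Magnitude of a velocity vector."""
    return sqrt(sum(c * c for c in v))


# Two hypothetical poses with identity rotation and different translations.
pose_a = [[1, 0, 0, 0.0], [0, 1, 0, 0.0], [0, 0, 1, 500.0], [0, 0, 0, 1]]
pose_b = [[1, 0, 0, 30.0], [0, 1, 0, 40.0], [0, 0, 1, 500.0], [0, 0, 0, 1]]
v = velocity(marker_position(pose_a), marker_position(pose_b), dt=0.5)
print(v, speed(v))
```

In practice the per-frame estimate would likely need smoothing, since marker jitter is amplified by the division by a small time step.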
3.3 Aligned Projection
In order to make the observed character consistent with the tangible object, we project it directly onto the surface of the card. Therefore, the alignment of the virtual object to the card needs to be considered. From Sect. 3.2, the displacement of the marker in the camera coordinate frame, $\mathbf{p}_C$, is calculated. Suppose its displacement in the world coordinate frame is $\mathbf{p}_W$, and the displacement of the virtual object in the world coordinate frame is also $\mathbf{p}_W$. Let the displacement of the virtual object in the projector coordinate frame be $\mathbf{p}_P$, the displacement in the pixel coordinate frame be $\mathbf{p}_i$, and the displacement in the virtual scene coordinate frame be $\mathbf{p}_v$. Define $\tilde{\mathbf{p}}_W = (\mathbf{p}_W, 1)^T$, which can be obtained from the following equation,

$$\tilde{\mathbf{p}}_W = M_{WC}^{-1}\,\tilde{\mathbf{p}}_C, \tag{2}$$

where $M_{WC}$ is the transformation matrix from $O_W X_W Y_W Z_W$ to $O_C X_C Y_C Z_C$, that is, the external parameter matrix of the camera. Define $\tilde{\mathbf{p}}_P = (\mathbf{p}_P, 1)^T$, which can be calculated by the following equation,

$$\tilde{\mathbf{p}}_P = M_{WP}\,\tilde{\mathbf{p}}_W, \tag{3}$$

where $M_{WP}$ is the transformation matrix from $O_W X_W Y_W Z_W$ to $O_P X_P Y_P Z_P$, that is, the external parameter matrix of the projector. Considering distortion, let $\mathbf{p}_P = (x_P, y_P, z_P)$ and $\tilde{\mathbf{p}}_i = (\mathbf{p}_i, 1)^T$. From the following equation we get $\mathbf{p}_i$,

$$\tilde{\mathbf{p}}_i = M_P \begin{bmatrix} (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\,x + 2 p_1 x y + p_2 (r^2 + 2x^2) \\ (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\,y + p_1 (r^2 + 2y^2) + 2 p_2 x y \\ 1 \end{bmatrix}, \tag{4}$$

with $x = x_P / z_P$, $y = y_P / z_P$, $r^2 = x^2 + y^2$. Here, $M_P$ is the internal parameter matrix of the projector, and $k_1, k_2, k_3, p_1, p_2$ are the distortion coefficients of the projector. Define $\tilde{\mathbf{p}}_v = (\mathbf{p}_v, 1)^T$, which can be calculated by the following equation,

$$\tilde{\mathbf{p}}_v = M_{iv}\,\tilde{\mathbf{p}}_i, \tag{5}$$

where $M_{iv}$ is the transformation matrix from $O_i X_i Y_i$ to $O_v X_v Y_v$, calculated by the following equation,

$$M_{iv} = s \begin{bmatrix} 2/C & 0 & -C/R \\ 0 & 2/R & 1 \\ 0 & 0 & 1 \end{bmatrix}. \tag{6}$$

Here, $C \times R$ is the resolution of the projector, and $s$ is the scale factor of the virtual orthogonal camera.
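The distortion and intrinsic mapping of Eq. (4) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the intrinsic matrix and the sample point are made-up values, and the function names are hypothetical. The distortion terms follow Eq. (4), with radial coefficients $k_1, k_2, k_3$ and tangential coefficients $p_1, p_2$.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Apply the radial and tangential distortion terms of Eq. (4)
    to normalized coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = radial * x + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = radial * y + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def project_to_pixel(p_proj, m_p, coeffs):
    """Map a point in the projector frame to pixel coordinates.

    p_proj is (x_P, y_P, z_P), m_p a 3x3 intrinsic matrix as nested lists,
    coeffs the tuple (k1, k2, k3, p1, p2).
    """
    x_p, y_p, z_p = p_proj
    xd, yd = distort(x_p / z_p, y_p / z_p, *coeffs)
    # Multiply the intrinsic matrix by (xd, yd, 1)^T, then dehomogenize.
    u, v, w = (row[0] * xd + row[1] * yd + row[2] for row in m_p)
    return u / w, v / w


# Hypothetical intrinsics for a 1024 x 768 projector and no distortion.
M_P = [[1000.0, 0.0, 512.0], [0.0, 1000.0, 384.0], [0.0, 0.0, 1.0]]
NO_DISTORTION = (0.0, 0.0, 0.0, 0.0, 0.0)
u, v = project_to_pixel((0.1, 0.05, 1.0), M_P, NO_DISTORTION)
print(u, v)
```

With all coefficients set to zero the mapping reduces to the plain pinhole model, which is a convenient sanity check before real calibration values are plugged in.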
3.4 Physical Simulation
To make the interaction more engaging, the motion state of virtual characters can change from walking to running, and vice versa. The state updates are controlled by the velocity of the tracked target using a motion graph. During interaction, participants control the virtual object by moving the card. At a lower speed the character is in the walking state, and at a higher speed it is in the running state. If the card is static, the character is in the idle state. Meanwhile, there are physical interactions among virtual characters, such as collisions. We use a physics engine to simulate the physical interaction among virtual objects.
3.5 Storyboard
In our work, four types of scene elements are supported from the perspective of interaction: the story background, direct scene elements, indirect scene elements, and virtual characters. The story background is the initial setting. The direct scene elements mainly include the sun, trees and barriers, which are directly driven by the corresponding cards to update their positions. The indirect scene elements are computer-generated effects, such as the ball controlled by the character, or artificial visual effects, which are not controlled by the user but driven by the storyline. A virtual character is directly controlled by a card: its position is bound to the card’s position, while its motion state is driven by the card’s velocity.
4 Experiment
Our experiment platform is equipped with an Intel Core i5 1.6 GHz CPU, 8 GB of memory and the 64-bit Windows 10 operating system; the development environment is Unity 2018.1 and Visual Studio 2013.
Fig. 2. Shooting game
Fig. 3. Passing game
4.1 Shooting Game
To cover more scene elements, we design a shooting game (see Fig. 2), which contains direct scene elements (the barriers) and indirect scene elements (an arrow representing the initial velocity, and a ball). The user moves the cards within a certain time to give the ball an initial velocity for the shot. The trajectory of the ball, which can be changed by barriers controlled by the user, is computed through physical simulation.
4.2 Passing Game
For a multi-user interaction scenario, we design a passing game based on the storyboard (see Fig. 3). Two participants manipulate the cards to make the virtual characters move on the tabletop. A character’s motion state can switch smoothly from idle to walk or from walk to run in accordance with the participant’s manipulation. When the character meets other objects in the scene, the event trigger mechanism responds accordingly. For example, when touching a ball on the ground, the character will pick up the ball, then walk with the ball, and finally pass the ball. The speed of the character determines the force applied to the ball when it is thrown. The player can also control a board to catch the ball. When the card is moved or rotated, the pose of the board changes as well.
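The mapping from card speed to motion state and throwing force can be sketched as follows. This is a deliberately simplified stand-in: the paper uses a motion graph with smooth transitions, whereas the sketch uses plain thresholds. The threshold values, the linear force mapping and all names are hypothetical.

```python
# Hypothetical thresholds; the paper does not state numeric values or units.
WALK_THRESHOLD = 5.0   # below this speed the card counts as static
RUN_THRESHOLD = 60.0   # at or above this speed the character runs


def motion_state(card_speed: float) -> str:
    """Choose idle, walk or run from the speed of the tracked card."""
    if card_speed < WALK_THRESHOLD:
        return "idle"
    if card_speed < RUN_THRESHOLD:
        return "walk"
    return "run"


def throw_force(character_speed: float, gain: float = 0.5) -> float:
    """Assumed linear mapping from character speed to the force on the ball."""
    return gain * character_speed


for s in (0.0, 20.0, 100.0):
    print(s, motion_state(s), throw_force(s))
```

A hysteresis band around each threshold would be a natural refinement, so that tracking noise near a boundary does not make the character flicker between states.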
5 Conclusion
We propose a projector-based interaction method that fuses virtuality and reality and lets participants interact with real objects, such as cards, on a desk or table. Cards serve as the interaction input and carry both pose and velocity information. Compared with existing methods, our work makes full use of the pose and velocity information to give virtual objects physical properties, so that the interaction exhibits more physical effects. The limitation is that a card is hard to detect when it is moved too fast. Therefore, we plan to improve the tracking performance so that users can interact with the cards more naturally.
References
1. Bang, C., Lee, J., Jung, H., Choi, O., Park, J.-I.: Tangible interactive art using marker tracking in front projection environment: the face cube. In: Yang, H.S., Malaka, R., Hoshino, J., Han, J.H. (eds.) ICEC 2010. LNCS, vol. 6243, pp. 397–404. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15399-0_43
2. Falcao, G., Hurtos, N., Massich, J.: Plane-based calibration of a projector-camera system. VIBOT master 9(1), 1–12 (2008)
3. Hatanaka, T., Hayashi, T., Suzuki, K., Sawano, H., Tuchiya, T., Koyanagi, K.: Dream board: a visualization system by handwriting recognition. In: SIGGRAPH Asia 2013 Posters, p. 1 (2013)
4. Kato, H., Billinghurst, M.: Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In: Proceedings 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR 1999), pp. 85–94. IEEE (1999)
5. Marco, J., Cerezo, E., Baldasarri, S., Mazzone, E., Read, J.C.: User-oriented design and tangible interaction for kindergarten children. In: Proceedings of the 8th International Conference on Interaction Design and Children, pp. 190–193 (2009)
6. Park, H., Park, J.I.: Invisible marker tracking for AR. In: Third IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 272–273. IEEE (2004)
7. Rekimoto, J., Ayatsuka, Y.: Cybercode: designing augmented reality environments with visual tags. In: Proceedings of DARE 2000 on Designing Augmented Reality Environments, pp. 1–10 (2000)
8. Sony: Xperia Touch homepage. https://developer.sony.com/develop/xperia-touch/. Accessed 30 Apr 2020
9. Villar, N., et al.: Project Zanzibar: a portable and flexible tangible interaction platform. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–13 (2018)
10. Wellner, P.: Interacting with paper on the DigitalDesk. Commun. ACM 36(7), 87–96 (1993)
11. Willis, K.D., Shiratori, T., Mahler, M.: Hideout: mobile projector interaction with tangible objects and surfaces. In: Proceedings of the 7th International Conference on Tangible, Embedded and Embodied Interaction, pp. 331–338 (2013)
12. Wilson, A.D.: Playanywhere: a compact interactive tabletop projection-vision system. In: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, pp. 83–92 (2005)
13. Wu, Q., Wang, J., Wang, S., Su, T., Yu, C.: Magicpaper: tabletop interactive projection device based on tangible interaction. In: ACM SIGGRAPH 2019 Posters, pp. 1–2 (2019)
14. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000)
Co-sound: An Interactive Medium with WebAR and Spatial Synchronization
Kazuma Inokuchi(B), Manabu Tsukada, and Hiroshi Esaki
The University of Tokyo, Tokyo, Japan
{ino,tsukada,hiroshi}@hongo.wide.ad.jp
Abstract. An Internet-based media service platform can control recording processes and manage video and audio data. Furthermore, the design and implementation of an object-based recording system enables flexible playback of the viewing contents. Augmented Reality (AR) is a three-dimensional video projection technology. However, there are few examples of its use in audio-visual media platforms. In this study, we propose Co-Sound, which is designed as a multimodal interface that dynamically renders object-based AR on a web browser in response to various actions from viewers, sharing AR objects among multiple devices in real time. We confirmed that the system works as an object-based interactive medium with AR, that its general acceptance in a questionnaire survey was very high, and that it achieves low-latency synchronization to accept operations from multiple users in real time.
Keywords: Interactive media · Object-based audio · Augmented reality · Software defined media
1 Introduction
With the spread of high-capacity communication environments, video streaming services have expanded rapidly, and 360° video streaming has also attracted increasing interest. Despite the growing demand for live musical performances and concerts, it is difficult for users to view packaged media and live broadcasts from a free viewpoint because of limitations in the performance and location of recording devices. Few media can accept actions from viewers, as they only record and play back predesigned viewpoints and positional relationships between video and audio.
Sound recording and playback systems can be broadly divided into three categories [3]. Object-based audio (OBA) has the following characteristics [8]: (i) multiple objects that exist in multi-dimensional space, (ii) interactive reproduction personalized to users, and (iii) decoupling of media data from recording devices and delivery in a variety of formats via the Internet. Unlike conventional channel-based audio and scene-based audio, the object-based approach is applied not only to audio but also to other media components, such as videos and position data of instruments. The complete media data can be controlled and managed by abstracting the series of processes from recording to playback, and OBA can interpret and express viewing objects existing in the real world.
In this paper, we present Co-Sound, an interactive audio-visual medium with WebAR. Such an audio-visual media platform is well suited to reproducing software-managed object audio. By measuring the real-time response of the system to multiple users and the QoE (Quality of Experience) of an application built on it, we confirmed that Co-Sound creates new and enhanced user experiences. The main findings of these experiments were that the delay of spatial synchronization with WebRTC was lower than that with WebSocket, and that the accuracy of AR-marker detection and calibration could degrade the QoE even when the WebAR media application was rated highly.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 255–263, 2020. https://doi.org/10.1007/978-3-030-65736-9_22
2 Related Work
Three-dimensional visual interfaces reproduce viewing objects existing in the real world. AR is defined by Azuma as systems that have three characteristics: (i) they combine real and virtual, (ii) they are interactive in real time, and (iii) they are registered in 3D. In recent years, the number of use cases for AR as a medium for viewing exhibits in museums and art galleries has increased. Fenu et al. studied 34 subjects who visited the Svevo Museum autonomously with an AR smartphone app [1]. They analyzed the subjects' behavioral records and found high ratings for overall satisfaction, novelty, aesthetics of the user interface, and degree of interest in the content. Tillion et al. classified visitors’ learning experiences in museums into two types, sensitive and analytical, and investigated the results of AR guides [9]. According to their results, the presentation of appropriate information by the AR guide, such as the materials of paintings and the introduction of other works, may promote analytical activity. In 2014, we established the Software Defined Media (SDM) consortium [10] to target new research areas and markets involving object-based digital media and Internet-by-design audio-visual environments. SDM is an architectural approach to media as a service, based on the virtualization and abstraction of networked media infrastructure. LiVRation [5] was a system for interactive media playback from a free viewpoint using a head-mounted display. Web360² [6] was designed for viewing 3D contents in a browser on tablets, and was deployed as a WebVR application. Both applications accepted interactive manipulation from viewers, and in their subjective evaluations on a seven-point Likert scale, more than half of all responses fell in the top two ratings.
3 Co-Sound
3.1 Design
We propose a platform that enables multiple people to view and manipulate the same content by playing an object-based music event using interactive AR on the web. Co-Sound satisfies the following requirements.
1. Interactive viewing between viewers and contents
2. Bidirectional communication among viewers
3. Object-based structuring of media data
4. Viewing experience regardless of specific devices
Figure 1 shows an overview of the Co-Sound system design and implementation. Co-Sound derives the audio data of the music event from the database based on the SDM ontology, and centrally manages the displayed virtual objects. Viewers input video information and touch actions, and Co-Sound outputs binaural audio together with camera images on which the virtual objects are superimposed. Marker detection on the input video estimates the coordinates of the camera, and the coordinates of each virtual object are determined by referring to the position information of the recorded data. Real-time rendering of AR images and sounds in response to touch actions realizes user interactivity. Moreover, Co-Sound synchronizes the virtual space with other devices by communicating the serialized object data.
Fig. 1. Design and implementation of Co-sound
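The synchronization by serialized object data described in Sect. 3.1 can be illustrated with a minimal sketch. The paper does not specify the serialization format, so the use of JSON and the field names `id`, `pos`, and `on` are assumptions made only for illustration.

```python
import json

def serialize_object(obj_id, position, audio_on):
    """Pack one virtual object's state into compact UTF-8 JSON bytes.

    The field names are hypothetical; the paper does not define them.
    """
    state = {"id": obj_id, "pos": [round(c, 3) for c in position], "on": audio_on}
    return json.dumps(state, separators=(",", ":")).encode("utf-8")

def apply_update(scene, payload):
    """Merge a received payload into the local scene dictionary."""
    state = json.loads(payload.decode("utf-8"))
    scene[state["id"]] = {"pos": state["pos"], "on": state["on"]}
    return scene

# One device moves the (hypothetical) "vocal" object; another applies the update.
payload = serialize_object("vocal", (0.5, 0.0, -1.25), True)
scene = apply_update({}, payload)
```

A payload of this kind is a few tens of bytes, which falls within the range of message sizes (20 B to 4 KiB) examined later in the performance evaluation.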
3.2 Implementation
Co-Sound was implemented using AR.js v1.5.0¹ and aframe.js v0.9.2², which process the marker recognition and camera location estimation. Three.js v0.110.0³ renders AR objects and audio visualization. Three-dimensional audio in the browser was implemented using WebAudio. The nodes are chained on the AudioContext from the BufferSource node, whose data are obtained by an HTTP request, to the Destination node. The ON/OFF operation of the sound was realized by setting the gain value of the GainNode, which is a gain adjustment node, to zero or a constant. Similar to Web360² [6], the visualization of the sound was realized by using the AnalyserNode getByteFrequencyData() method in WebAudio. The system obtained the frequency domain data from the time domain data and represented the effective frequency band by converting it to the length and color of the box objects.
¹ https://github.com/jeromeetienne/AR.js (Accessed on 01/05/2020).
² https://aframe.io/blog/arjs/ (Accessed on 01/05/2020).
³ https://threejs.org/ (Accessed on 01/05/2020).
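As a rough illustration of the visualization step, the following sketch maps byte-valued frequency magnitudes (0 to 255, the value range of byte frequency data in WebAudio) to the length and color of box objects, and models the ON/OFF operation as a gain value. The blue-to-red color ramp, the `max_height` parameter, and all function names are assumptions; the paper does not state the actual mapping.

```python
import colorsys

def bins_to_boxes(byte_bins, max_height=1.0):
    """Map byte magnitudes (0-255) to (height, (r, g, b)) per box."""
    boxes = []
    for value in byte_bins:
        level = max(0, min(255, value)) / 255.0
        # Hue sweeps from blue (quiet) to red (loud); an arbitrary choice.
        hue = (1.0 - level) * (240.0 / 360.0)
        boxes.append((level * max_height, colorsys.hsv_to_rgb(hue, 1.0, 1.0)))
    return boxes

def gain_for(audio_on, level=1.0):
    """ON/OFF as a gain value: zero when muted, a constant otherwise."""
    return level if audio_on else 0.0

boxes = bins_to_boxes([0, 128, 255], max_height=2.0)
```

In a browser implementation, the input list would come from the analyser node each animation frame, and the returned heights and colors would be applied to the box meshes.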
Fig. 2. Co-Sound screenshots
We propose a shared and synchronized digital space built with WebRTC instead of WebSocket. WebRTC is a technology for peer-to-peer (P2P) real-time connections in web browsers. DataChannel, the WebRTC channel type for binary data transport, adopts the Stream Control Transmission Protocol (SCTP) and can ensure reliable, sequential transport of messages with congestion control. Santos-González et al. reported that its packet transmission rate is higher than that of the Real Time Streaming Protocol [7]. We employed SkyWay v2.0.1⁴, a platform as a service (PaaS) designed as a real-time interactive multimedia service. SkyWay provides a signaling server for WebRTC connections, a TURN server for packet relay, and a WebSocket server. These servers are publicly stated to be located in Tokyo. Two types of communication methods and protocols were implemented for the comparison experiments: (1) a mesh-type connection using WebRTC, and (2) a star-type connection using WebSocket. The open-source SkyWay JavaScript software development kit (SDK) implements WebSocket for room-type binary data communication; for this reason, we modified it to build mutual DataChannel connections between peers even in the room type.
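The difference between the two topologies can be made concrete with a small sketch. It is a simplified model that ignores signaling and TURN relaying: in a full mesh every pair of devices holds a direct link, whereas in a star every message passes through one server. The function names are illustrative only.

```python
def mesh_links(peers):
    """Direct peer-to-peer links in a full mesh of `peers` devices."""
    return peers * (peers - 1) // 2

def star_links(peers):
    """Links in a star topology: each device connects to one server."""
    return peers

def hops(topology):
    """Network hops a message takes from one device to another."""
    return 1 if topology == "mesh" else 2

# Device counts in the range used in the experiments (two to five devices).
for n in (2, 3, 5):
    print(n, mesh_links(n), star_links(n))
```

The mesh needs more connections as the room grows, but each message travels a single hop, which is one plausible reason for the lower RTT expected from the WebRTC configuration.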
4 Evaluation and Discussion
4.1 Performance Evaluation
In the following experiments, we evaluated the delay of AR spatial synchronization by measuring the round-trip time (RTT). In the field of online gaming, the QoE is closely related to the response delay [4]; hence, the delay is one of the indicators of the QoE of the spatial synchronization function of Co-Sound. We conducted experiments to measure the performance of Co-Sound spatial synchronization under the following four conditions. One terminal sent a dummy test file, and the other sent it straight back. We define the RTT as the time taken for this series of transmissions. We did not consider the delay fluctuation caused by differences in the packet processing performance of each server. Table 1 shows the devices used in these experiments.
⁴ https://github.com/skyway/skyway-js-sdk (Accessed on 01/05/2020).
Table 1. Co-Sound measurement environments
Device      OS                              CPU                      Memory
Laptop      Windows 10 version 1809         Intel® Core™ i7-8550U    16 GB
Tablet      iOS 12.3.1                      Apple A10X Fusion        4 GB
Smartphone  Android 9, EMUI version 9.1.0   HiSilicon Kirin 960      4 GB
In the first experiment, we selected three types of communication available to web browsers: (1) WebRTC in a LAN (host); (2) WebRTC via a TURN server (relay); and (3) WebSocket. Figure 3a shows that the average RTT was 210 ms with WebSocket and 73 ms with WebRTC (host), which means that WebRTC shortened the RTT by 65.0%. The average RTT with WebRTC (relay) was 107 ms. It also shows that the standard deviations were 116 ms, 47 ms, and 87 ms, which implies that the variation in delay time was also suppressed.
In the second experiment, we measured the RTT when messages of various sizes were transferred: 20 B, 120 B, 220 B, 420 B, 820 B, 1 KiB, 2 KiB, and 4 KiB. The result is shown in Fig. 3b. Message size had little influence on the average RTT and the standard deviation, irrespective of the protocol used. For sizes of 20 to 4096 B, the average RTT was approximately 80 ms for WebRTC and 200 ms for WebSocket, and was constant regardless of the message size.
In the third experiment, we evaluated the RTT while changing the number of connected devices. One to three of the smartphones shown in Table 1 joined the same room in addition to the laptop and the tablet. Figure 3c (compared to Fig. 3a) shows the result of Exp. 3. The average RTT when two and five devices joined was 65 ms and 170 ms, respectively.
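The summary statistics used in Exp. 1 can be reproduced with a short sketch. The individual RTT samples are not published, so the sample list below is hypothetical; only the averages of 210 ms and 73 ms are taken from the text. With these rounded averages the relative reduction evaluates to about 65.2%, so the 65.0% reported above was presumably computed from unrounded measurements.

```python
from statistics import mean, pstdev

def rtt_summary(samples_ms):
    """Return (mean, population standard deviation) of RTT samples in ms."""
    return mean(samples_ms), pstdev(samples_ms)

def reduction_percent(baseline_ms, improved_ms):
    """Relative reduction of `improved_ms` with respect to `baseline_ms`."""
    return (baseline_ms - improved_ms) / baseline_ms * 100.0

# Hypothetical samples, chosen only to demonstrate the calculation.
avg, sd = rtt_summary([70, 76])

# Reported (rounded) averages: WebSocket 210 ms, WebRTC (host) 73 ms.
reduction = reduction_percent(210, 73)
print(round(reduction, 1))
```

Whether the paper used the population or the sample standard deviation is not stated; `pstdev` is used here as an assumption.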
In the fourth experiment, we measured the RTT when two kinds of devices were used: the laptop shown in Table 1 together with either the tablet or the smartphone. Figure 3d (compared to Fig. 3a) shows the result of Exp. 4. In the case of the smartphone, the average RTT was 240 ms for WebRTC and 360 ms for WebSocket. It can be inferred that the performance of the device has a significant impact on the delay, irrespective of the protocol adopted.
From Exp. 1–4, it was concluded that the proposed method employing WebRTC was more appropriate for real-time AR spatial synchronization. Although how to evaluate the QoE of spatially shared AR has not been determined yet, Nishibori's study on delay recognition in music sessions over the Internet reported that a delay is recognized at 30 ms or more, and that performance becomes difficult at 50 ms or more [12]. Vlahovic et al. reported that the player's score and QoE decrease at delays over 100 ms in first-person shooter games in VR [11]. The results of these experiments show that the average delay for WebRTC communication is less than 50 ms, and that P2P communication in the same LAN could reduce the overhead by using SCTP and retain a lower latency than communication using HTTP. Moreover, the transmission delay was independent of the message size and the number of devices within the range measured in the experiments. Even when the payload of the AR spatial data becomes larger because of an increase in the number of AR objects and the complexity of their attributes, Co-Sound can be considered highly scalable with respect to real-time synchronization.
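The comparison with the thresholds cited from [12] and [11] can be sketched as follows. The sketch assumes that the one-way delay is approximated as half of the measured RTT (a symmetric-path assumption that the paper does not state explicitly); under that assumption the 73 ms RTT of WebRTC (host) corresponds to about 36.5 ms, which is consistent with the statement that the delay is below 50 ms.

```python
def one_way_delay_ms(rtt_ms):
    """Approximate one-way delay as half the RTT (symmetric-path assumption)."""
    return rtt_ms / 2.0

def delay_category(delay_ms):
    """Classify a delay against the thresholds cited from [12] and [11]."""
    if delay_ms < 30:
        return "not recognized"
    if delay_ms < 50:
        return "recognized"
    if delay_ms <= 100:
        return "music performance becomes difficult"
    return "VR FPS score and QoE decrease"

# Average RTT of WebRTC (host) in Exp. 1, halved to 36.5 ms.
category = delay_category(one_way_delay_ms(73))
print(category)
```

The category labels are shorthand for the findings quoted in the text and are not terms used by the cited studies.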
Fig. 3. (a)–(d) Results of Exp. 1–4; (e) result of the subjective evaluation (questionnaire survey). The RTT with WebRTC was shorter than that with WebSocket. The RTT did not depend on the message size but on the performance of the devices.
4.2 Subjective Evaluation
We conducted a questionnaire survey to evaluate the QoE of Co-Sound. The survey was carried out from December 6, 2019 to December 17, 2019. Subjects were asked to experience free-viewpoint viewing, turning individual audio on and off, and moving the AR objects, and then answer the questionnaire. Responses were obtained from a total of 25 people, including 24 men and one woman.
Concerning the age composition, 20 people were in their 20s, two in their 30s, one in his 40s, and two in their 50s. An Apple iPad Pro (10.5 in.) with iOS 12.3.1 and Sony WH-1000XM2 headphones served as the viewing device and the headphones, respectively. The questionnaire items were evaluated using a seven-point Likert scale, ranging from 1 to 7 (worst: 1, best: 7), for each of the questions Q1–Q8. The eight questions, which are also shown in Fig. 3e, were as follows.
Q1) Did you hear the sound from the direction of the AR image?
Q2) Did the sound match the distance of the AR image?
Q3) When you moved the AR image, did you feel that the sound moved with it?
Q4) When you changed your viewpoint, did you feel the sound move with it?
Q5) Is it intuitive to turn on/off the audio objects using the audio visualizer?
Q6) Is it intuitive to move the AR image using the controller?
Q7) Was the recognition accuracy of the AR markers sufficient?
Q8) Can you interact with the 3D contents on a web browser?
Q1–Q4 concerned the fundamental three-dimensionality of the audio, Q5–Q6 the user interface, Q7 the accuracy of the marker detection, and Q8 the general QoE of Co-Sound. Figure 3e depicts the results. The vertical axis shows the questions from Q1 to Q8 and the number of valid responses; the horizontal axis shows the ratio of responses for each score of the seven-point evaluation, from 1 to 7, as a stacked bar graph. The middle of the bar segment for score 4, which represents the neutral rating, was placed at the origin. The more ratings of 5, 6, and 7 were given, the more the stacked bar was shifted in the positive direction, and vice versa. For all items except Q7, the total response ratio of scores 6 and 7 was more than 50%, and for Q8 it was 76%. On the other hand, the average rating of Q7 was 4.96, the ratio of the highest score was 16%, and the lowest score, 1, was also present. Q7 was the only question with an average rating of less than 5, and its ratio of score 7 was also the lowest. Although more than half of the responses to Q1–Q4 gave a high rating, the total response ratio of scores 5–7 in Q2 and Q3 was approximately 70%, while that in Q1 and Q4 was more than 85%. Web360² [6] reported that the questionnaire evaluation of the audio was dispersed because the questions were ambiguous; for this reason, we classified the audio three-dimensionality into four types. The result illustrates that the direction tracking of the audio to the AR image was excellent, but the distance tracking of the audio was not satisfactory. We suggest that this is because the binaural algorithm employed by the WebAudio PannerNode is simple and the calibration with real space is inadequate. The results of Q5, Q6, and Q8 show that the user interface of Co-Sound was rated as highly as those of LiVRation and Web360², and that the QoE of an interactive medium with AR was also high.
ARToolkit, which is used in AR.js, adopts a rudimentary algorithm for marker detection and is known for its high false-negative rate [2], which appeared in the result of Q7. It can be asserted that WebAR is not accurate enough to obtain a high rating from users.
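The quantities read from Fig. 3e can be computed as in the following sketch: the combined ratio of scores 6 and 7, and the left edge of a stacked bar whose score-4 segment is centered on the origin. The per-score counts are not published, so the example distribution is invented; it is merely chosen so that the combined ratio equals the 76% reported for Q8 with 25 respondents.

```python
def score_ratios(counts):
    """Convert counts for scores 1..7 into ratios that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def top_two_ratio(counts):
    """Share of responses with score 6 or 7."""
    return (counts[5] + counts[6]) / sum(counts)

def bar_start(counts):
    """Left edge of the stacked bar when the middle of score 4 sits at 0."""
    r = score_ratios(counts)
    return -(r[0] + r[1] + r[2] + r[3] / 2.0)

# Hypothetical counts for scores 1..7 from 25 respondents (not the paper's data).
example = [0, 1, 1, 2, 2, 9, 10]
print(top_two_ratio(example), bar_start(example))
```

A bar that starts closer to zero and extends further to the right indicates a more positive response distribution, which is how the chart in Fig. 3e is meant to be read.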
5 Conclusion
In this study, we proposed Co-Sound, an interactive audio-visual medium using WebAR. By designing a multimodal interface that dynamically renders AR according to object operations from viewers, we presented a digital space with high affinity to the real space and interactive content viewing. Furthermore, the low-latency bidirectional communication among devices enabled users to interact with each other by allowing them to become the senders and receivers of content. In future work, we plan to integrate real space and digital space. The current version of Co-Sound displays a music event on a marker; however, we must incorporate the advantage of AR and the induction from real to digital.
References
1. Fenu, C., Pittarello, F.: Svevo tour: the design and the experimentation of an augmented reality application for engaging visitors of a literary museum. Int. J. Hum.-Comput. Stud. 114, 20–35 (2018)
2. Fiala, M.: ARTag, a fiducial marker system using digital techniques, vol. 2, pp. 590–596, July 2005
3. Rec, I.T.U.R.: ITU-R BS 2051-0 (02/2014): Advanced Sound System for Programme Production. Int. Telecommun. Union, Geneva, Switzerland (2014)
4. Jarschel, M., Schlosser, D., Scheuring, S., Hoßfeld, T.: An evaluation of QoE in cloud gaming based on subjective tests. In: 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, pp. 330–335 (2011)
5. Kasuya, T., et al.: LiVRation: remote VR live platform with interactive 3D audiovisual service. In: IEEE Games Entertainment & Media Conference (IEEE GEM) 2019, pp. 1–7. Yale University, New Haven, CT, U.S. (2019)
6. Kato, S., Ikeda, T., Kawamorita, M., Tsukada, M., Esaki, H.: Web360²: an interactive web application for viewing 3D audio-visual contents. In: 17th Sound and Music Computing Conference (SMC). Torino, Italy (2020)
7. Santos-González, I., Rivero-García, A., González-Barroso, T., Molina-Gil, J., Caballero-Gil, P.: Real-time streaming: a comparative study between RTSP and WebRTC. In: García, C.R., Caballero-Gil, P., Burmester, M., Quesada-Arencibia, A. (eds.) UCAmI/IWAAL/AmIHEALTH 2016. LNCS, vol. 10070, pp. 313–325. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48799-1_36
8. Silzle, A., Sazdov, R., Weitnauer, M.: The EU project ORPHEUS: object-based broadcasting for next generation audio experiences. In: The 29th Tonmeistertagung, VDT International Convention, January 2016
9. Tillon, A.B., Marchal, I., Houlier, P.: Mobile augmented reality in the museum: can a lace-like technology take you closer to works of art? In: 2011 IEEE International Symposium on Mixed and Augmented Reality - Arts, Media, and Humanities, pp. 41–47, October 2011
10. Tsukada, M., et al.: Software defined media: virtualization of audio-visual services. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–7 (2017)
11. Vlahovic, S., Suznjevic, M., Skorin-Kapov, L.: Challenges in assessing network latency impact on QoE and in-game performance in VR first person shooter games. In: 2019 15th International Conference on Telecommunications (ConTEL), pp. 1–8, July 2019
12. Yu, N., Yukio, T., Sone, T.: Study and experiment of recognition of the delay in musical performance with delay. IPSJ SIG Technical Reports 53, pp. 37–42, December 2003
A Memory Game Proposal for Facial Expressions Recognition in Health Therapies
Samuel Vitorio Lima and Victor Travassos Sarinho
Universidade Estadual de Feira de Santana (UEFS), Laboratório de Entretenimento Digital Aplicado (LEnDA), Feira de Santana, Bahia, Brazil
Abstract. The recognition of facial expressions plays an important role in the communication and interaction between human beings. Research attests that it is possible to exercise the skills of recognizing and producing facial expressions with the aid of computational support tools. This article presents Trimemória, a game whose objective is to produce a playful therapeutic environment for children with difficulty in recognizing facial expressions. Trimemória seeks to combine the classic dynamics of memory games with sensors, in an attempt to increase the integration of the real world with the virtual interactions provided by the game.
Keywords: Health game · Facial expressions · RFID
1 Introduction
The recognition of facial expressions is an intrinsic ability of human beings [4] and plays an important role in communication and interaction between humans. However, some children, such as those with Autism Spectrum Disorder (ASD), can show difficulties in the task of recognizing facial expressions [1]. ASD is classified in the group of neurodevelopmental disorders [7], which result in significant losses in communication, interpersonal relationships, imaginative ability, and movement [8]. ASD has no cure, but psychological approaches can help improve the behavior of individuals with ASD, improving their quality of life and preventing their condition from worsening [9]. Studies attest that it is possible to exercise the abilities of individuals with ASD to recognize and produce facial expressions through the support of visual computing tools capable of reinforcing the connection between mental state and facial expression [3,5,11]. In this sense, this article presents the development of Trimemória, a therapeutic support game for children, especially those with ASD. It is a game that extends the popular memory game with facial elements for expression recognition, together with Radio Frequency IDentification (RFID) resources, to produce a real and virtual playful therapeutic environment for child audiences.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 264–269, 2020. https://doi.org/10.1007/978-3-030-65736-9_23
2 Related Work
RFID technology is used for the automated identification of people and objects. It is a sensing technology that extends the integration between the real and the virtual world in smart environments [2]. As a result, RFID allows computer programs or cell phones to track changes in the physical world and reflect them in the virtual world [2]. Some initiatives have applied RFID technology to support educational activities for children with ASD. As an example, in the United Kingdom (U.K.), teachers use the Picture Exchange Communication System (PECS) to support communication with children with ASD. Based on PECS, Herring et al. [6] developed a virtual teacher for child interaction that recognizes images with embedded RFID tags, giving children a certain autonomy to interact and learn with it. Another example of RFID as a support tool was created by Sula and Spaho [10], where cards with RFID tags were used to support assistive education activities, such as learning images, identifying equal objects, co-work, emotional categories, and, lastly, math. In this system, RFID is mainly used by parents, who can index some objects. When the child takes an indexed card and passes it over the sensor, he or she hears the word in the native language and is encouraged to repeat it [10].
3 Methodology
Trimemória uses distinct memory cards with at least three images of the same creature, each with a different facial expression, to be found by the player (Fig. 1). The idea is to allow the identification of distinct expressions in the same creature, expanding the intervention possibilities during therapeutic sessions.
To associate a virtual environment with the game, each Trimemória card has a built-in RFID tag that is read by an integrated RFID service. The idea is to detect which creature the child found and generate an associated stimulus for it. For each detected RFID tag, a message is sent to an MQTT broker, which forwards the detection to the connected clients. The RFID tag is sent by an RFID service supported by an RFID device, and each connected client represents an application that generates a virtual stimulus for the player for each message received.
Another point is the assembly of the RFID service with a low-cost, open-hardware microcontroller able to read RFID sensors. For this, a NodeMCU Lolin 1.0 was used to: collect the data from an RFID sensor; manage the connection to the WiFi service; manage the connection to the provided broker; and send the detected RFID tags to an MQTT topic. The connection between the NodeMCU and the RFID sensor is made through SPI (Serial Peripheral Interface) communication. The information is transmitted via pins D4, D5, D6 and D7, which represent Reset (RST), Serial Clock (SCK), Master Input Slave Output (MISO) and Master Output Slave Input (MOSI), respectively (Fig. 2).
Fig. 1. Some designed creatures with their respective emotions for the Trimemória game.
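The card design and wiring described above can be summarized in a small sketch. The tag IDs, creature names, and expressions are hypothetical, and the matching rule (three cards of one creature with three different expressions) is an assumption inferred from the card description; only the pin assignment is taken from the text.

```python
# Hypothetical tag IDs and card contents; the paper does not list them.
CARDS = {
    "04A1": ("creature_a", "happy"),
    "04A2": ("creature_a", "sad"),
    "04A3": ("creature_a", "angry"),
    "04B1": ("creature_b", "happy"),
}

# SPI wiring between the NodeMCU and the RFID reader as stated in the text.
SPI_PINS = {"RST": "D4", "SCK": "D5", "MISO": "D6", "MOSI": "D7"}

def is_trio(tag_ids):
    """True if three cards show one creature with three different expressions."""
    cards = [CARDS[t] for t in tag_ids]
    creatures = {creature for creature, _ in cards}
    expressions = {expression for _, expression in cards}
    return len(cards) == 3 and len(creatures) == 1 and len(expressions) == 3

print(is_trio(["04A1", "04A2", "04A3"]))
```

Keeping the card table as plain data makes it easy for a therapist-facing tool to add creatures or expressions without changing the detection logic.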
The MQTT server/broker is responsible for managing the messages of the topics, which contain the IDs of the RFID tags detected by the connected RFID service. After receiving a message, the MQTT server forwards it to all clients subscribed to the topic, in this case the outPut topic of the Trimemória game.
Fig. 2. Prototyping the RFID service for the Trimemória game.
A TCP server was also developed, which is activated when the RFID service turns on. It is responsible for collecting the available WiFi information from an external device (a smartphone, for example), which the RFID service then uses to connect to the MQTT broker. After confirming that the network information is available, the RFID service shuts down the TCP server and connects to the chosen WiFi network and, consequently, to the respective active MQTT broker (Fig. 3).
Fig. 3. RFID, MQTT and HTML services for the Trimemória game.
An HTML server was also created to act as a client of the MQTT broker (Fig. 3). It was developed with available JavaScript frameworks able to: interpret the developed MQTT client code (Node.js); manage the information from HTTP requests and provide a REST API (CORS and Express.js); perform the connection with the MQTT broker (MQTT.js); and create dynamic views for the game (Vue.js). As a result, when the MQTT broker sends new information to the MQTT client, it is exposed through the REST API and appears in a connected client interface managed by Vue.js.
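The fan-out behavior of the broker described above can be illustrated with an in-memory stand-in. This is not an MQTT implementation (there is no network, QoS, or retained messages); the class `TinyBroker` and the tag ID are hypothetical, and only the topic name outPut comes from the text.

```python
from collections import defaultdict

class TinyBroker:
    """In-memory stand-in for an MQTT broker (no network, no QoS)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for each message on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver `payload` to every subscriber of `topic`; return the count."""
        for callback in self._subscribers[topic]:
            callback(payload)
        return len(self._subscribers[topic])

broker = TinyBroker()
received_a, received_b = [], []
broker.subscribe("outPut", received_a.append)
broker.subscribe("outPut", received_b.append)
delivered = broker.publish("outPut", "04A1")  # hypothetical tag ID
```

Every client subscribed to the topic receives the same tag ID, which is what allows several stimulus applications to react to one card detection.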
4 Results and Discussion
The initial RFID service prototype (Fig. 4), which was built using low-cost open hardware (US$ 4.23 in total), is able to execute the two main processes required for the game operation. The first establishes a communication channel to receive the settings needed to control the communication with the MQTT broker. The second controls the communication with the MQTT broker, sending the tag IDs detected by the RFID device. In both, the game presented good server performance and a fast device response, without compromising the gameplay as a whole.
Fig. 4. RFID service prototype, game cards with RFID tags, and an HTML client example.
The game cards with associated tag IDs are detected by the RFID device when the player selects one. The RFID service sends a message via the MQTT broker to the HTML server, and the HTML client receives it by continuously checking the HTML server status (Fig. 3). As a final result, for each tag ID received through the MQTT broker, the images and sounds associated with the detected emotion are presented to the player (Fig. 4).
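The continuous verification of the server status by the client can be sketched as a simple polling loop over a list of detections. The classes, the tag IDs, and the stimulus file names are hypothetical stand-ins for the REST API and the Vue.js client; the paper does not describe their actual interfaces.

```python
class ServerState:
    """Stand-in for the HTML server: stores tag IDs in arrival order."""

    def __init__(self):
        self.events = []

    def on_mqtt_message(self, tag_id):
        """Record a tag ID delivered by the broker."""
        self.events.append(tag_id)

    def events_since(self, index):
        """Return events the client has not seen yet (a REST-like query)."""
        return self.events[index:]

class PollingClient:
    """Client that repeatedly asks the server for new detections."""

    def __init__(self, server, stimuli):
        self.server, self.stimuli, self.seen = server, stimuli, 0

    def poll(self):
        """Fetch unseen tag IDs and map each to its stimulus."""
        new = self.server.events_since(self.seen)
        self.seen += len(new)
        return [self.stimuli.get(tag, "unknown card") for tag in new]

server = ServerState()
client = PollingClient(server, {"04A1": "happy.png"})  # hypothetical mapping
server.on_mqtt_message("04A1")
shown = client.poll()
```

Tracking the index of the last seen event keeps repeated polls from presenting the same stimulus twice.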
5 Conclusions and Future Work
This paper presented Trimemória, a game that creates a mixed reality in a memory game and can be used in health therapies to assist patients in the recognition of facial expressions. For that, RFID, MQTT and HTML services were developed to provide: hardware devices, for player interaction on a physical level; software services, for card detection together with internet communication; and client apps, able to present multimedia content associated with the detected facial expression. Regarding the use of open hardware, this paper confirms that it is viable for the creation of an initial low-cost prototype of the proposed game, despite the different types of services that had to be developed and configured for its proper functioning. As an example, the creation of an extra TCP application to establish the communication between the RFID service and the MQTT broker shows that it is necessary to automate and simplify the activation and synchronization process of the hardware/software resources proposed for the game. As future work, we intend to create an approach for direct communication between the RFID device and the virtual game interface, eliminating the different services used in the developed prototype. Other game dynamics, such as the identification of competitors in a match and the awarding of game points, will be developed in the future. Finally, the application of the game in therapies with individuals with ASD, along with the assessment of the gains and losses obtained with its use, will also be carried out in the near future.
References
1. Baron-Cohen, S., Leslie, A.M., Frith, U., et al.: Does the autistic child have a “theory of mind”? Cognition 21(1), 37–46 (1985)
2. Bohn, J.: The smart jigsaw puzzle assistant: using RFID technology for building augmented real-world games. In: Workshop on Gaming Applications in Pervasive Computing Environments at Pervasive, vol. 2004 (2004)
3. Deriso, D.M., et al.: Exploring the facial expression perception-production link using real-time automated facial expression recognition. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) ECCV 2012. LNCS, vol. 7584, pp. 270–279. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33868-7_27
4. Eckman, P., Friesen, W.: Manual for the facial action coding systems, consulting psych. Press, Palo Alto (1977)
5. Gordon, I., Pierce, M.D., Bartlett, M.S., Tanaka, J.W.: Training facial expression production in children on the autism spectrum. J. Autism Dev. Disorders 44(10), 2486–2498 (2014)
6. Herring, P., Sheehy, K., Jones, R., Kear, K.: Designing a virtual teacher for nonverbal children with autism: pedagogical affordances and the influence of teacher voice (2010)
7. Klin, A.: Autismo e síndrome de Asperger: uma visão geral. Brazilian J. Psychiatry 28, s3–s11 (2006)
8. Lô, E.N., Goerl, D.B.: Representação emocional de crianças autistas frente a um programa de intervenção motora aquática. Revista da graduação 3(2) (2010)
9. Pradi, T., Silva, L., Bellon, O.R., Dória, G.M.: Ferramentas de computaçao visual para apoio ao treinamento de expressoes faciais por autistas: Uma revisao de literatura. In: Anais do XLIII Seminário Integrado de Software e Hardware, pp. 140–151. SBC (2020)
10. Sula, A., Spaho, E.: Using assistive technologies in autism care centers to support children develop communication and language skills a case study: Albania. Acad. J. Interdiscip. Stud. 3(1), 203–203 (2014)
11. Tanaka, J.W., et al.: Using computerized games to teach face recognition skills to children with autism spectrum disorder: the let’s face it! program. J. Child Psychol. Psychiatry 51(8), 944–952 (2010)
Augmented Reality Media for Experiencing Worship Culture in Japanese Shrines
Kei Kobayashi and Junichi Hoshino
University of Tsukuba, Ibaraki, Japan
Abstract. With globalization in recent years, multicultural coexistence has been gaining presence. This concept aims to create new relationships among people of different cultures through the acceptance of each other's cultural differences and values. In this study, we clarify a method of realizing a medium that conveys information about visiting shrines, such as the divine messengers, enshrined deities and manners, while emphasizing acceptability to multi-generational users and affinity with the installation space. The requirements were established through a survey, and to meet them, we produced a picture scroll that includes a screen on which images are projected and information that supplements those images. Users can hold their hands over the images projected on the picture scroll or place objects to get information about visiting the shrine. A questionnaire survey was conducted among multigenerational users of the site, from the young to the elderly, to confirm the effectiveness of the experience of visiting the site, as well as their interest in etiquette and knowledge.
Keywords: Culture · Interaction · Media art · Augmented reality
1 Introduction
With globalization in recent years, multicultural coexistence [1] has been gaining presence. This concept aims to create new relationships among people of different cultures through the acceptance of each other's cultural differences and values. Among all aspects of Japanese culture, neighborhood shrines play particularly important roles, including the formation of regional communities and the sharing of intangible cultures in the region, such as annual events like the New Year's visit to a shrine (Hatsumoude). However, in recent years, there have been fewer opportunities to gain knowledge about the intangible cultures and the roles of a shrine that have been incorporated into and shared through our living practices. In this paper, we propose a scroll painting that conveys the intangible cultures concerning worship, such as the divine messengers, enshrined deities and manners. Blending with the scroll painting, we used image projections based on the recognition of objects and body motions. Worshippers and people associated with the shrine experienced the system installed at an actual shrine. The research results obtained through discussions on the acceptability and effects of this system are detailed hereinafter.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 270–276, 2020. https://doi.org/10.1007/978-3-030-65736-9_24
2 Preliminary Surveys
We orally interviewed people from shrines and from the Cultural Properties Department of the city government to hear their opinions about current worship and expression methods.
1) Worship Experience at Actual Site. Based on the interviews with the people concerned, the problem of a decreasing number of people getting involved in shrines due to the declining birthrate was pointed out. In addition, there was a comment that it was not good to complete one's worship on the web. Stances are thought to vary, because some shrines and temples provide a virtual worship service.
2) Compatibility with Space for Installation. The answers included "We have reduced the number of sign boards in our shrine because we respect its view"; "We would like you to give expression in a way that does not change our essential parts"; and "It would be nice if we could check the information along the traffic lines of worshippers through images, videos, etc.".
3) Expression Drawing Attention of Many People. Some held the view that easy-to-understand expression methods such as images would easily attract the attention of multigenerational worshippers, including children and elderly people. Regarding changes in expression, we took home shrines (kamidana) as an example, and a comment such as "Expression methods will change in accordance with today's housing environments, like rental housing in which people are not allowed to make a hole in the wall to install one" indicated that even traditional things change with the times.
3 Media Composition 3.1 Scroll Painting on Which Images Are Projected As listed in design requirements 2, we, considering the compatibility with the space for Installation as well, employed a scroll painting made of Japanese paper (washi) (Fig. 1/Fig. 2). It was thought that there would be the compatibility with shrines since scroll paintings were one of the forms of Japanese paintings. 3.2 Information on Enshrined Deities In this system, we gave expressions using the things relating to the enshrined deities appearing in myths such as A Record of Ancient Matters (Kojiki) and the images comprised of the names of enshrined deities (Fig. 3). We chose, as the move to switch images,
K. Kobayashi and J. Hoshino
Fig. 1. Overall view of work
Fig. 2. Picture scroll
Fig. 3. Objects of divine messengers and picture scroll on which images are projected
Fig. 4. Holding out hand over icon
Fig. 5. Placing object of divine messenger
Augmented Reality Media for Experiencing Worship Culture in Japanese Shrines
the body action of holding out a hand over the icon (Fig. 4), prioritizing ease of operation because our target users span multiple generations. Further, we produced objects in the form of animals that are believed to be the messengers of the enshrined deities. When such an object is placed at the prescribed position, information on the enshrined deity associated with that divine messenger is displayed together with the special effects of clappers (hyoshigi) and drums (Fig. 5).
3.3 Manners for Rinsing Hands and Worship
The manners for rinsing hands aim not to wash one’s hands but to purify them; they are a simplified version of the older practice of immersing oneself in a river, the sea, etc. to expel one’s impurities. Because manners are ultimately body actions, we considered that easy-to-understand animated images could interest viewers without relying on words (Fig. 6). We added a seek bar to the animation describing how to rinse hands properly, for those who want to watch it again and study it. The seek bar also lets those who have no knowledge of the manners recognize when the sequence is complete, as it shows when the animation begins and when it ends.
Fig. 6. Manners for rinsing hands
Fig. 7. Manners for worship
For the manners for worship, we likewise use animated images (Fig. 7) to show the widely known sequence of bowing twice, clapping hands twice and bowing once (nihai-nihakusyu-ichihai). We project these images onto a projection film attached to a transparent acrylic board installed beside the scroll painting. Compared with a normal screen, this does not visually divide the space, makes the images appear to float in the air, and creates a mysterious, magical atmosphere.
3.4 System Implementation Method
To build a system for evaluation (Fig. 8), we used an RGB-D camera to detect the user’s skeleton so that the images projected on the scroll painting (Sects. 3.2 and 3.3) can be operated. We registered the coordinates of each icon projected on the scroll painting by the projector and created a program that switches the projected images and plays special effects as soon as the user’s right or left hand covers those coordinates.
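The hand-over-icon trigger described above can be illustrated with a minimal sketch. It assumes that the hand joints reported by the skeleton tracker have already been mapped into the projector's pixel coordinates; the names `Icon` and `triggered_icon`, as well as the icon layout, are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in projector pixels (assumed coordinate space)


@dataclass(frozen=True)
class Icon:
    """Axis-aligned bounding box of one projected icon (hypothetical layout)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, point: Point) -> bool:
        px, py = point
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def triggered_icon(icons: Iterable[Icon],
                   left_hand: Optional[Point],
                   right_hand: Optional[Point]) -> Optional[Icon]:
    """Return the first icon covered by either tracked hand, or None."""
    for icon in icons:
        for hand in (left_hand, right_hand):
            # A hand may be missing when the tracker loses the joint.
            if hand is not None and icon.contains(hand):
                return icon
    return None


icons = [Icon("deity_info", 100, 50, 80, 80),
         Icon("rinsing_manners", 300, 50, 80, 80)]
hit = triggered_icon(icons, left_hand=(120.0, 60.0), right_hand=None)
print(hit.name if hit else "no icon")
```

In a running installation this check would be evaluated once per camera frame, and a short dwell time could be added so that a hand merely passing over an icon does not switch the image.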
Fig. 8. System configuration
Fig. 9. Equipment used
To recognize the divine messenger objects in the RGB-D camera image, we used a cascade classifier. For the manners for rinsing hands, the seek bar is operated by dividing the vertical axis (ordinate) evenly and changing the start position of the animation accordingly. For the projector, we chose a small mobile projector so as not to obstruct detection of the user’s skeleton, and fixed it on a light stand (Fig. 9).
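One way to read the seek-bar mechanism is as a mapping from the hand's vertical coordinate to one of several evenly sized segments, each corresponding to a start time in the animation. The sketch below follows that reading; the function name `seek_position`, the number of segments and the convention that the top of the bar corresponds to the start of the animation are all assumptions made for illustration.

```python
def seek_position(hand_y: float, bar_top: float, bar_bottom: float,
                  duration_s: float, segments: int = 10) -> float:
    """Map a hand's vertical coordinate on the seek bar to a start time in seconds."""
    if bar_bottom <= bar_top:
        raise ValueError("bar_bottom must be greater than bar_top")
    if segments < 1:
        raise ValueError("segments must be at least 1")
    # Clamp the hand to the bar, then normalise its position to [0, 1].
    clamped = min(max(hand_y, bar_top), bar_bottom)
    fraction = (clamped - bar_top) / (bar_bottom - bar_top)
    # Snap to the start of the evenly sized segment the hand falls into.
    index = min(int(fraction * segments), segments - 1)
    return index * duration_s / segments


# A hand slightly below the middle of a 100-pixel bar, for a 60-second animation.
print(seek_position(hand_y=55.0, bar_top=0.0, bar_bottom=100.0, duration_s=60.0))
```

Snapping to segment boundaries keeps the start position stable when the tracked hand jitters by a few pixels.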
4 User Study in Shrine
To verify the effects of this system, we exhibited it from November 1 to November 7, 2017 at a shrine with an exhibition facility. We selected the exhibition facility because it was on the traffic line of worshippers, because it was indoors so that the system would not change the shrine’s view, and because it could supply electric power for the system and images. To confirm the need for a system dealing with a traditional subject, as well as its acceptability and whether it is attractive enough to interest worshippers, we asked 67 people aged from 2 to over 60 to use the system and interviewed 36 of them (20 males and 16 females). In addition, we received comments and feedback from 2 people from the shrine.
4.1 Effects of Proposed Media
As a conventional baseline, we created a system that shows on a display the same images as those projected on the scroll painting and switches them with a click. Users were asked to operate it and give feedback in comparison with the proposed media. About the proposed system, one user replied, “It is more interesting, rather than just watching the screen and clicking buttons for seeing things. I think this system is interesting because it requires us to produce movements such as holding out our hand and placing a thing, even for seeing the same things (a male in his 40’s)”. About the display, one replied, “This is easier to operate. That one (proposed system) is a kind of difficult to understand for beginners… (a male in his 60’s)”, which shows that the conventional device has the advantage of a familiar, easy-to-understand operation. Meanwhile, regarding the proposed system, one said, “This system has the property that we don’t know what’s going to happen next. That clicking this one (the pressure-sensitive touch trackpad of a notebook computer) to proceed is a usual move, isn’t it? This system is interesting in that we don’t know what’s going to happen by placing a thing (a male in his 40’s)”, which suggests that unfamiliarity with the operation can itself arouse interest.
4.2 Compatibility with Spaces for Installation
Some users commented that materials and designs suited to the subject were effective for compatibility with the space for installation: “I think its design is good, matched with the places like a shrine or equivalents. I believe this is an important point (a male in his 30’s)”; “The screen is so easily viewable, clear and well-made. Its layout is nice as well.
It also produces a good atmosphere (a male in his 60’s)”; and “I guess a live-action video is too much as it would repeatedly play, for example, only this portion in which a person bows twice and claps the hands twice (nihai-nihakusyu), I mean visually. Displaying an actual human being is like…. This system will not make us tired that much. In terms of visual expressions as well, the simpler, the better (a person from the shrine)”. These comments all evaluate positively what we intended. We also received the comment, “I have found it interesting. A scroll painting or the like produces a good atmosphere (a female in her 20’s)”.
4.3 Acceptability of Multigeneration Worshippers
Eighteen users, about half of the interviewees, commented that the system was interesting, mainly because of its design, operation methods and special effects. Although the proposed system relies on equipment such as sensors and a PC, 13 people aged 60 or above also said that it was interesting, and no negative feedback was given about the expression method itself.
5 Summary and Future Development
In this paper, we have discussed the design requirements and implementation methods of media that convey the worship culture of shrines. By installing the scroll painting
in the shrine, we have confirmed, based on feedback from users of multiple generations from young to elderly, several effects of the worship experience at the actual shrine site as well as users’ interest in manners and knowledge. We also found that even elderly people are able to use an interactive experience system and find it interesting. One of our future tasks is to validate the effectiveness of this system for non-Japanese people such as tourists from overseas.
Reference
1. UNESCO: Cultural diversity. https://en.unesco.org/themes/education-sustainable-development/cultural-diversity
Customer Service Training VR Game System Using a Multimodal Conversational Agent Tomoya Furuno, Yuji Omi, Satoru Fujita, Wang Donghao, and Junichi Hoshino(B) University of Tsukuba, Ibaraki, Japan [email protected], [email protected]
Abstract. Hospitality is the primary focus in many service industries. Staff can improve customer satisfaction by grasping the customer’s situation accurately and responding to it appropriately. To cultivate these skills, we propose a VR training system based on multimodal recognition of the customer’s status during service. A preliminary user study of a claim handling situation is also summarized.
Keywords: VR training system · Customer service · Hospitality
1 Introduction
Traditionally, customer service skills are learned over a long time through experience on the job and supervision from senior employees and other relevant people. However, this approach involves experiencing failures due to lack of experience, thereby creating a psychological burden on the staff. Even when on-the-job training is provided, practicing many situations repeatedly is difficult. Muramoto et al. [1] developed customer service training software for newly hired female clerks at women’s fashion stores. The software simulated customer dialog and the interrelationships among the customer, product, and clerk. Hubal and Day [2] developed a character-based dialog training system for informed consent skills and evaluated its effectiveness. The study demonstrated that the subjects who trained with virtual characters achieved better performance in an interview with a real person than the subjects who trained using only written instructions. In the present work, we describe both the concept and current prototype of a customer service VR game system using a multimodal conversational agent with dialog involving speech and gestures. This system enables the trainee to play the role of staff in various situations as many times as needed to learn not only to understand the customers’ problems and anticipate their potential needs but also to provide accurate solutions and appropriate psychological care.
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 277–281, 2020. https://doi.org/10.1007/978-3-030-65736-9_25
2 Overview of Customer Service Training System
High-quality customer service creates a positive impression of the stores and products, leading to an increase in the total number of visitors and repeated purchases. If a clerk
leaves customers unsatisfied, they are unlikely to return even if the quality of the products is good. Skillful hospitality is especially important when customer experience is an integral part of the service, for example, in airports, hotels, and restaurants. The following competencies are considered to be important.
1) Understanding potential needs: estimating problems from the customers’ characteristics and behavior as well as the current situation. For example, if a customer appears confused while standing in front of an airport ticketing machine, he or she may not know how to operate it.
2) Proposing solutions: being able to present an appropriate solution based on available information. For example, if there is a problem with an airplane, the solution must take time constraints and other circumstances into account.
3) Creating a positive experience: social cues of staff such as facial expressions, tone of voice, and gestures are also important for leaving a good impression.
A skilled customer service staff member should be able to
• deal with all situations described in the manual;
• accommodate a large number of customers in a limited amount of time;
• handle claims; and
• improvise appropriate responses to rare and unique situations (Fig. 1).
Fig. 1. Multimodal dialogue process of customer service
For the training to be successful, it is necessary to be able to recreate the details of relevant scenarios, such as time, place, occasion, and customer characteristics. Training based on specific scenarios includes the goal-based scenario model in experiential education, content-based instruction in second language education, task-centered teaching, and simulation education in medical fields. We built our system to conform to the following requirements:
• It should be able to simulate various training scenarios, including those that require listening to the customers and understanding their needs, require business knowledge, and require emotional care.
• The customer agents should be able to proceed appropriately by analyzing the voice and gestures of the trainee and responding to them verbally and through showing emotional reactions.
• It should be able to evaluate whether the trainee has provided convincing responses based on business knowledge.
• It should be able to provide feedback and suggestions for improvement to the trainee (Fig. 2).
Fig. 2. Concept of customer service VR training system using conversational agent
The VR environment and customer character were implemented using Unity version 2019.1. Speech recognition was done using Windows Speech Recognition, and Transitions, a Python package, was used for the finite-state-machine-based dialog. Eye tracking was done using HTC VIVE Pro, and gestures were recognized using bone extraction facilitated by an RGB-D camera (Microsoft Kinect). Different types of bows (slight bow, salute, respectful bow) were distinguished.
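The text does not state how the three bow types were told apart. One plausible approach, sketched here under that assumption, is to threshold the forward lean of the torso computed from two skeleton joints. The joint choice (hip and neck), the y-up coordinate convention and the angle thresholds are hypothetical values chosen for illustration; only the three bow labels come from the text.

```python
import math
from typing import Optional, Sequence, Tuple


def torso_angle_deg(hip: Sequence[float], neck: Sequence[float]) -> float:
    """Angle between the hip-to-neck vector and the vertical (y) axis, in degrees."""
    dx, dy, dz = (n - h for n, h in zip(neck, hip))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0:
        raise ValueError("hip and neck joints coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dy / length))
    return math.degrees(math.acos(cos_angle))


def classify_bow(angle_deg: float,
                 thresholds: Tuple[float, float, float] = (10.0, 25.0, 40.0)
                 ) -> Optional[str]:
    """Map a torso angle to a bow type; thresholds are illustrative assumptions."""
    slight, salute, respectful = thresholds
    if angle_deg >= respectful:
        return "respectful bow"
    if angle_deg >= salute:
        return "salute"
    if angle_deg >= slight:
        return "slight bow"
    return None  # standing upright, no bow detected


# Example joints in camera space (metres): the neck is ahead of and above the hip.
angle = torso_angle_deg(hip=(0.0, 0.0, 0.0), neck=(0.0, 0.5, 0.5))
print(classify_bow(angle))
```

In practice the thresholds would need to be calibrated per sensor placement, and the peak angle over the duration of the gesture is likely a more robust feature than a single frame.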
3 Airport Ground Staff Scenarios
While hospitality is crucial for a variety of services, in the present work we focus on the ground staff of airline companies serving customers at airports. The role of the ground staff is to provide efficient assistance according to the airport and airline operation rules, ensuring the customer has a comfortable trip to their travel destination. The main work areas are around departure gates, check-in counters, and baggage claims. In recent years, areas around automatic boarding registration terminals and automatic baggage gates have become common as well. A sample scenario can be found below.
Customer: Excuse me, I would like to ask you a question.
Staff: Yes, how may I help you?
Customer: Is the flight A223 departing as planned?
Staff: I am sorry, it will be delayed by about two hours due to the typhoon.
Customer: Two hours late? I have plans immediately after arrival. Would it be possible to switch to another flight?
Staff: I am very sorry, other flights are all full. We apologize and provide a meal coupon to all passengers on flight A223, which you can use at this airport.
Customer: Could you arrange for a taxi upon my arrival?
Staff: Yes, would you like me to go ahead and do that?
Customer: Yes, please. Would the airline cover the fare in this case?
Staff: I am sorry, if the delay is due to a typhoon, the taxi fare will not be covered.
Customer: Okay, I understand.
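A dialog of this kind can be driven by a small finite-state machine in which each recognized trainee intent moves the customer agent to its next line. The sketch below is an illustration only: the system described here used the Transitions package, whereas this example uses a plain dictionary to stay dependency-free, and the state names, intent labels and fallback line are hypothetical. It also assumes that speech recognition has already mapped the trainee's utterance to an intent label.

```python
from typing import Dict, Tuple


class DialogFSM:
    """Tiny finite-state machine selecting the customer agent's next line."""

    # (current state, trainee intent) -> (next state, customer reply).
    # All state and intent labels are hypothetical.
    TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
        ("greeting", "offer_help"):
            ("asked_status", "Is the flight A223 departing as planned?"),
        ("asked_status", "announce_delay"):
            ("asked_rebooking", "Would it be possible to switch to another flight?"),
        ("asked_rebooking", "decline_with_coupon"):
            ("asked_taxi", "Could you arrange for a taxi upon my arrival?"),
        ("asked_taxi", "confirm_taxi"):
            ("asked_fare", "Would the airline cover the fare in this case?"),
        ("asked_fare", "explain_fare_policy"):
            ("closed", "Okay, I understand."),
    }
    FALLBACK = "I am sorry, could you say that again?"

    def __init__(self) -> None:
        self.state = "greeting"

    def step(self, intent: str) -> str:
        """Advance on a recognized intent; stay in place if it does not fit."""
        key = (self.state, intent)
        if key not in self.TRANSITIONS:
            return self.FALLBACK
        self.state, reply = self.TRANSITIONS[key]
        return reply


fsm = DialogFSM()
for intent in ["offer_help", "announce_delay", "decline_with_coupon",
               "confirm_taxi", "explain_fare_policy"]:
    print(f"[{intent}] Customer: {fsm.step(intent)}")
```

Scenario variations such as a different reason for the delay could be expressed as alternative transition tables, which keeps the scenario content separate from the state-machine logic.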
4 Preliminary User Study
Four airport service beginners aged 20 to 27 years were asked to go through the training scenarios in the VR environment multiple times, and the changes in each person’s responses with each repetition were analyzed. The automatic check-in machines and baggage drops in airports were reproduced in the VR environment. The customer agent recognized the utterances of the staff in training using speech recognition and responded with speech synthesis and changes in facial expression. The trainees played the role of ground staff, listened to the customer character complaining about the delay of the flight, and responded by offering a refund, providing assistance with transportation, etc. The trainees were asked to go through three sets of scenarios with 30-min breaks in between. Each set comprised four scenarios with different variations, such as the reason for the delay (e.g., weather or aircraft failure), length of the delay, and arrival time. To analyze the trainee behavior, the content of the conversations, their lengths, the time elapsed before each conversation started, and the trainees’ lines of sight were measured, and videos of their performances were taken. We observed a tendency toward shorter delays before starting the conversations, accelerated decision-making, and a smoother manner of speech in all four subjects as the scenarios were repeated. Moreover, it was confirmed that, by using this system repeatedly, the trainees were able to learn the skills required to respond to the customers despite starting without preliminary knowledge. We also observed some voluntary actions such as the use of honorifics, other speech choices, and bowing.
5 Conclusion
We have described our current progress toward a customer service training system using a multimodal conversational agent. Future work will include long-term user studies with various training scenarios in business environments.
Acknowledgement. This work was supported by Council for Science, Technology and Innovation, “Cross-ministerial Strategic Innovation Promotion Program (SIP), Big-data and AI-enabled Cyberspace Technologies” (funding agency: NEDO).
References
1. Muramoto, R., Kaneda, T., Tanabe, J.: A simulation game software for training use of sales-talking in fashion shop. The Special Interest Group Technical Reports of IPSJ, 1994-ICS-099, 59–65 (1994)
2. Hubal, C., Day, S.: Informed consent procedures: an experimental test using a virtual character in a dialog systems training application. J. Biomed. Informat. 39(5), 532–540 (2006)
Artificial Intelligence
Procedural Creation of Behavior Trees for NPCs
Robert Fronek, Barbara Göbl, and Helmut Hlavacs(B)
Entertainment Computing, University of Vienna, Vienna, Austria
[email protected]
Abstract. Based on an emerging need for automated AI generation, we present a machine learning approach to generate behavior trees controlling NPCs in a “Capture the Flag” game. After discussing the game’s mechanics and rule set, we present the implemented logic and how trees are generated. Subsequently, teams of agents controlled by generated trees are matched up against each other, allowing underlying trees to be refined by learning from victorious opponents. Following three program executions, featuring 1600, 8000 and 16000 matches, highest scoring trees are presented and discussed in this paper.
Keywords: Behavior trees · Machine learning · Procedural creation
1 Introduction
Despite its major impact on gaming experience, creating artificial intelligence (AI) for games is increasingly pushed towards late stages of the development process, freeing resources for easily marketable features such as graphics [4]. Thus, one can currently observe a trend towards automatically generated AI in the industry [6]. These approaches allow for procedural generation of opponents that are still able to display different behaviors. The following paper presents a program that procedurally generates behaviour trees (BT) for a simple “Capture the Flag” game. Games have a fixed set of rules and possible interactions, making them ideal candidates to test the capabilities of an AI [8]. The quality of an AI is evaluated by matches between randomly generated BTs. When a BT is defeated, it is discarded and replaced by a new one, which, after a sufficient number of matches, eventually leads to a BT that is superior to a large number of other, randomly generated BTs. The game, despite its turn-based approach, shares various aspects with real-time strategy (RTS) games, such as commanding units, strategic decision making and aiming for both short- and long-term goals [7].
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 285–296, 2020. https://doi.org/10.1007/978-3-030-65736-9_26
2 Related Works
Several previous works discuss applications of automated BTs. Lim et al. [3] managed to defeat the AI of the game “DEFCON” through evolution and combination of BTs. Abiyev used BTs to control soccer robots [1]. Dromey points out
that BTs are a good alternative to automated AI, due to a clear connection of requirements and formal specifications [2]. This allows requirement-based BTs to be generated.
3 Game Mechanics
The game is played on a board comprising a total of 56 (7 × 8) fields. Each field can only be occupied by one agent. If another agent, either ally or enemy, is already placed on a field, an agent cannot move onto it. There are three types of terrain for each field:
– Open: Every agent can move onto this field; there is no bonus or penalty.
– Blocked: No agent can move onto this field. Agents may shoot across the field with no bonus or penalty.
– Cover: Every agent can move onto this field. Additionally, damage to an agent placed on this field is reduced if it is attacked from the front. Damage from melee attacks is not reduced, and neither is damage from attacks originating from behind or from either side.
Figure 1 shows the game’s board. Two fields, symmetrically placed, are blocked (marked “x”). Additionally, there are 4 fields of type “cover” (marked “c”) in the center of the board as well as next to the blocked fields. The flag’s position is marked “F”. Numbers indicate x- and y-coordinates as used by the game’s internal logic.
Fig. 1. Empty board (Color figure online)
Each team starts off with 5 units (agents) positioned on the top row (colored in blue) and the bottom row (colored in red) of the board. A team wins if one of its agents reaches the opposing team’s flag. If no agent has reached the opposing team’s flag within 100 turns, the game results in a draw.
Sweetser et al. [9] define (software) agents as goal-driven entities that are aware of their environment. In this work, agents operate based on a number of parameters as described below. The AI’s success determines whether the chosen actions are suitable to support the game’s goal. Agents initially possess 25 health points (HP) and can perform ranged and melee attacks. If an agent’s HP value decreases to 0 or less, it is defeated and removed from the game. Ranged attacks can be carried out 2 times before ammunition needs to be reloaded. A reload is either initiated by explicitly performing the corresponding action or happens automatically if a ranged attack is attempted without ammunition, thus replacing the attack action. There are 9 possible actions for an agent. Each turn, an active agent (i.e. an agent that has not been defeated yet) performs one of the following actions:
– Move towards an enemy: The agent looks for nearby enemy agents on the field. If none are left, it moves towards the opposing team’s flag instead. Distances are measured in fields and the agent may move one field horizontally or vertically per turn. Vertical movements are prioritized over horizontal movements until the opposing agent is positioned in the same row. If a field is blocked, the agent moves sideways at a 90° angle to the initially calculated direction. If none of these movements is possible, the agent performs no further action in this turn and stays in place.
– Move towards the enemy’s flag: The agent moves one field towards the opposing team’s flag. Similar to the previous action, vertical movement is prioritized if possible and necessary.
– Move towards the own team’s flag: The agent moves one field towards its own team’s flag. Again, vertical movements precede horizontal movements.
– Heal: The agent stays in place and restores a small amount of HP.
– Shoot enemy agent: The agent shoots the nearest enemy agent. The closer the other agent is, the higher the damage. Shot agents take increased damage if attacked from their sides (double damage) or from behind (triple damage). In both cases, “Cover” fields do not reduce damage. If an agent is in cover, frontal damage is reduced by half. If no ammunition is left, the agent reloads instead.
– Perform melee attack: The agent attacks an enemy agent on an adjacent field, doing damage dependent on the attack’s direction. Attacks from the side do double damage, while attacks from behind immediately defeat the attacked agent. If there is no enemy on adjacent fields, the agent moves towards an enemy instead.
– Flank: The agent preferably tries to move onto a field directly behind an opposing agent or, alternatively, to the side of an opposing agent.
– Reload: Ammunition is restored to 2.
– Idle: The agent stays idle.
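The directional damage rules above can be summarised in a short sketch. The base damage values are not given in the text, so `base` is left as a parameter; the `Direction` enum and the function names are hypothetical names introduced for illustration.

```python
from enum import Enum


class Direction(Enum):
    """Side of the target from which an attack arrives."""
    FRONT = "front"
    SIDE = "side"
    BEHIND = "behind"


def ranged_damage(base: float, direction: Direction, target_in_cover: bool) -> float:
    """Ranged damage: x2 from the side, x3 from behind, halved frontally in cover."""
    if direction is Direction.SIDE:
        return base * 2  # cover does not help against side attacks
    if direction is Direction.BEHIND:
        return base * 3  # cover does not help against attacks from behind
    return base * 0.5 if target_in_cover else base


def melee_damage(base: float, direction: Direction, target_hp: float) -> float:
    """Melee damage: x2 from the side, instant defeat from behind, never reduced by cover."""
    if direction is Direction.BEHIND:
        return target_hp  # removes all remaining HP
    if direction is Direction.SIDE:
        return base * 2
    return base


# Illustrative base damage of 4 (an assumption; the text gives no numbers).
print(ranged_damage(4, Direction.FRONT, target_in_cover=True))
print(melee_damage(4, Direction.BEHIND, target_hp=25))
```

Since the text states that ranged damage grows as the target gets closer, `base` would itself be a function of distance in a full implementation.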
4 Logic
Each team generates a BT to decide which actions an agent performs. The BT is a binary tree where all internal nodes are logic nodes and leaf nodes
represent actions. Each logic node represents a comparison with a randomly chosen operator and a threshold. One of the following comparisons may be assigned:
– Distance from the own flag to its closest enemy agent
– Distance from the closest enemy agent to self
– HP of self
– Distance from self to the enemy’s flag
– Remaining ammunition
– Whether self is in cover
– Whether closest enemy agent is in cover
– Number of nearby allied agents (teammates)
– Number of nearby enemy agents
Possible operators are ≤ (less or equal) and ≥ (greater or equal). Thresholds are randomly assigned from predefined ranges. Distances are assigned from a range of 1–15, as 15 is the maximum distance on a 7 × 8 board (based on the 1-norm). Agents’ HP range from 0–25, ammunition ranges from 0–2. Checking whether the agent itself or an enemy agent is positioned on “Cover” needs no threshold, as it is a logical query. Nevertheless, 0.5 is assigned to avoid NULL values. To query for nearby enemy agents, two parameters are needed. A threshold, assigned between 1 and 4, limits the number of agents that are looked for. The second parameter defines the maximum distance in fields for the search; possible values range from 1–6. Trees are generated as follows: 4–12 actions are randomly assigned to leaf nodes, with 3–11 corresponding internal logic nodes holding randomly assigned comparisons and operators. Subsequently, a threshold and, if needed, a second parameter are assigned randomly. Tree traversal starts by evaluating the logical statement in the root node. If true, the left child is visited, otherwise the right child is visited. If the next node is a logic node, traversal continues as described; if it is a leaf node, the corresponding action is performed. Active agents decide which action to perform by traversing the tree each turn.
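The tree structure, random generation and traversal described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the identifier names, the way the tree shape is chosen (a random split of leaves between the two subtrees), the use of real-valued thresholds, and the `observe` callback that supplies game-state values are all hypothetical. The threshold ranges, the 4–12 leaves and the "true goes left" rule are taken from the text.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Union

ACTIONS = ["move_to_enemy", "move_to_enemy_flag", "move_to_own_flag", "heal",
           "shoot", "melee", "flank", "reload", "idle"]

# Threshold range per comparison, following the ranges given in the text.
THRESHOLD_RANGES = {
    "own_flag_enemy_distance": (1, 15),
    "enemy_distance": (1, 15),
    "own_hp": (0, 25),
    "enemy_flag_distance": (1, 15),
    "ammunition": (0, 2),
    "self_in_cover": (0.5, 0.5),    # logical query; 0.5 is a placeholder threshold
    "enemy_in_cover": (0.5, 0.5),
    "nearby_allies": (1, 4),
    "nearby_enemies": (1, 4),
}


@dataclass
class Action:
    name: str


@dataclass
class Logic:
    comparison: str
    operator: str            # "<=" or ">="
    threshold: float
    radius: Optional[int]    # search distance, only used by the "nearby" queries
    left: "Node"             # visited when the comparison holds
    right: "Node"            # visited otherwise


Node = Union[Action, Logic]


def random_tree(rng: random.Random, leaves: int) -> Node:
    """Build a random binary tree with the given number of action leaves."""
    if leaves == 1:
        return Action(rng.choice(ACTIONS))
    left_leaves = rng.randint(1, leaves - 1)  # assumed way of choosing the shape
    comparison = rng.choice(list(THRESHOLD_RANGES))
    low, high = THRESHOLD_RANGES[comparison]
    radius = rng.randint(1, 6) if comparison.startswith("nearby") else None
    return Logic(comparison, rng.choice(["<=", ">="]), rng.uniform(low, high), radius,
                 random_tree(rng, left_leaves), random_tree(rng, leaves - left_leaves))


def decide(node: Node, observe: Callable[[str, Optional[int]], float]) -> str:
    """Traverse from the root until an action leaf is reached."""
    while isinstance(node, Logic):
        value = observe(node.comparison, node.radius)
        holds = (value <= node.threshold if node.operator == "<="
                 else value >= node.threshold)
        node = node.left if holds else node.right
    return node.name


def count_leaves(node: Node) -> int:
    if isinstance(node, Action):
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


rng = random.Random(0)
tree = random_tree(rng, leaves=rng.randint(4, 12))
print(count_leaves(tree), decide(tree, lambda comparison, radius: 1.0))
```

Because every internal node has exactly two children, a tree with n action leaves always has n − 1 logic nodes, which matches the 4–12 leaves and 3–11 internal nodes stated above.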
5 Methodology
Ponsen et al. [5] show that dynamic AIs which adapt based on victorious strategies are superior to static AIs. Thus, defeated AIs should learn from victorious BTs and try to adapt to their strategies. At first, BTs are generated for 32 teams. Teams are initialized with a score of 1 and then paired against each other. Subsequently, each team keeps count of its victories. A victorious team increments its score by 1 plus half of the points (rounded down) of the defeated team, except for victors in the first round, who double their score to 2.
5.1 Generation of a New Tree (No Draw)
The defeated team’s tree is regenerated based on both opposing parties’ trees. The size of the new tree is randomly chosen from 3 options: the size of the victorious team’s tree,
the size of the defeated team’s tree, or a randomly assigned new size. Similarly, each node is randomly inherited from the corresponding node of either tree or randomly generated. If the algorithm tries to base a node on a pre-existing tree that is smaller and holds no corresponding node, it falls back to the other tree’s nodes or, ultimately, to random generation of a new node. Generation of new trees is based on weighted decisions. First, the upper limit $l$ of a range is determined by the victorious team’s score ($S_v$), the defeated team’s score ($S_d$) and a constant of 5, which allows for randomly assigned nodes, as follows: $l = S_v \times 3 + S_d + 5$. A random number $x$ is generated within the range $0 \le x < l$. If $x < S_v \times 3$, the node is inherited from the victor’s tree; if $S_v \times 3 \le x < S_v \times 3 + S_d$, the node is inherited from the defeated tree. If $x \ge S_v \times 3 + S_d$, a randomly generated new node is assigned. This enables defeated teams to “learn” from victors but still allows for improvements based on random adaptations. Example: A team with a score of 7 ($S_v = 7$) defeats a team with a score of 2 ($S_d = 2$). This gives $l = 28$, i.e. a range of 0–27 for the randomly generated number $x$. If $0 \le x < 21$, the node is inherited from the victorious team’s tree. If $21 \le x < 23$, the node is inherited from the defeated team’s tree. Finally, if $x \ge 23$, a new node is generated randomly. These newly generated trees are assigned a score $S$, where $S = 1 + (S_v + S_d)/10$.
5.2 Generation of a New Tree (Draw)
If no team is victorious after 100 turns, the game results in a draw. Typically, a game that ends with a winner lasts about 15–40 turns. Thus, the probability of a draw is higher if a game goes on beyond 40 turns. Given the random generation of trees, draws might, for example, be caused by trees that never move an agent towards the enemy flag. In the case of a draw, both trees are replaced: one tree is generated based on both teams’ trees as described above (one tree is assumed to be the victor), while the other tree is generated from scratch and initialized with a score of 1.
5.3 In-between Matches
After each match, the victorious team’s score is checked against the current high score. If the team has a higher score, the new high score is saved and its underlying tree is stored for output after completion of all matches. Additionally, the previously highest scoring tree is now ranked second and will also be output. Finally, the vector holding all teams is shuffled to create new pairings. After a predefined number of matches, the program terminates and outputs the two highest scoring trees and general statistics (see Table 1).
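The weighted node inheritance from Sect. 5.1 can be sketched as below. The function names are hypothetical, and the sketch assumes integer scores and an upper limit that is exclusive, which is what the worked example in the text (a range of 0–27 for scores 7 and 2) implies.

```python
import random

NEW_NODE_WEIGHT = 5  # constant from the text that leaves room for random nodes


def upper_limit(score_victor: int, score_defeated: int) -> int:
    """Upper limit l = S_v * 3 + S_d + 5 of the random range (exclusive)."""
    return score_victor * 3 + score_defeated + NEW_NODE_WEIGHT


def source_for(x: int, score_victor: int, score_defeated: int) -> str:
    """Decide where a node comes from, given the drawn number x."""
    if x < score_victor * 3:
        return "victor"
    if x < score_victor * 3 + score_defeated:
        return "defeated"
    return "random"


def pick_source(score_victor: int, score_defeated: int, rng: random.Random) -> str:
    """Draw x uniformly from 0..l-1 and map it to a node source."""
    x = rng.randrange(upper_limit(score_victor, score_defeated))
    return source_for(x, score_victor, score_defeated)


# Worked example from the text: a team with score 7 defeats a team with score 2.
rng = random.Random(1)
counts = {"victor": 0, "defeated": 0, "random": 0}
for _ in range(2800):
    counts[pick_source(7, 2, rng)] += 1
print(upper_limit(7, 2), counts)
```

With these scores the victor's tree is chosen with probability 21/28, the defeated tree with 2/28 and a fresh random node with 5/28, so the constant 5 matters most when both scores are still low.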
6 Results
The program was executed 15 times, 5 times each for 100 rounds (1600 matches), 500 rounds (8000 matches) and 1000 rounds (16000 matches). The most
superior trees of each set are discussed below. Table 1 provides further data on program execution and results, such as the number of victories, draws and high scores (HS).

Table 1. General statistics

        100 rounds                500 rounds                1000 rounds
Run     Victories  Draws   HS     Victories  Draws   HS     Victories  Draws   HS
1       1451       149     54     5913       2087    34     13238      2762    50
2       1425       175     43     6887       1113    49     14027      1973    75
3       1427       173     41     7577       423     61     13308      2692    60
4       1362       238     47     6683       1317    69     13004      2996    56
5       1458       142     50     6776       1224    48     14026      1974    53
Avg     1424.6     175.4   47     6767.2     1232.8  52.2   13520.6    2479.4  58.8
%       89.0%      11.0%          84.6%      15.4%          84.5%      15.5%

7 Analysis of Trees
Table 1 provides some first insights. Despite a higher number of matches, the results show a roughly equal rate of draws. In a match between two “competent” AIs, a draw is rather unlikely, due to the first-move advantage. This suggests that a rather constant rate of “inapt” AIs (AIs that do not move towards the opponent’s flag) is generated during program execution. Program executions with higher scores result in fewer matches ending in a draw. It can be safely assumed that this is due to the early generation of a superior AI that subsequently defeats its opponents and serves as a basis for opponents to learn from. The first run featuring 500 rounds illustrates this very well, as a low high score is accompanied by a high number of draws, which suggests that random generation of BTs did not produce a superior AI for some time. The data gathered during program execution also shows that a higher number of rounds does not result in proportionally higher scores. One could naively assume that 5 times as many rounds result in at least 5 times higher scores, or even more, as a superior AI might gather more points from defeated, but previously successful, opponents. A possible explanation might be that defeated AIs learn rather quickly from superior AIs, which eventually leads to equally “competent” opponents.
7.1 Analysis of Superior Trees (100 Rounds)
Tree 1 (Fig. 2) features an “Enemy in Cover” query in its root node (true represented by 1, false by 0). However, the query is more likely to return false, which results in more frequent traversal of the left side of the tree. Please keep in mind that the tree is traversed according to the following rule: if a query returns true, the left child is visited, otherwise the right child. “Enemy Distance to Flag” is always less than 13 as long as there are active opposing agents, leading to traversal of the right children of this node. This results in a mostly defensive behaviour: agents wait close to their own flag for opposing agents to advance and shoot them when in reach. As soon as all opposing agents are defeated, querying “Enemy Distance to Flag” returns a value of 1000, leading the agents to try to flank. As there are no further opponents and the flag cannot be flanked, agents move directly towards the other team’s flag instead. Similar behaviour can be observed when opponents are in cover: agents move towards their own flag or reload ammunition. As there is no cover in the immediate vicinity of the flag, the AI may easily defend its flag.
Fig. 2. Tree 1 - 100 rounds, run 1, score: 54
Tree 2 (Fig. 3) mostly leads to melee attacks: the root node likely returns false, as all allied agents still have to be active and nearby. Agents move towards the enemy if no enemy is nearby and subsequently move towards the flag if no opposing agent is still active - resulting in a simple but efficient tree.
Fig. 3. Tree 2 - 100 rounds, run 2, score: 43
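The traversal rule stated above (a query that returns true leads to the left child, otherwise to the right child) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, the state dictionary and the toy tree are hypothetical, and the toy tree is not one of the evolved trees shown in the figures.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Action:
    name: str  # leaf node: the instruction handed to the agent


@dataclass
class Query:
    test: Callable[[dict], bool]  # predicate over the agent's view of the game
    left: "Node"                  # visited when the query returns true
    right: "Node"                 # visited when the query returns false


Node = Union[Action, Query]


def decide(node: Node, state: dict) -> str:
    """Walk from the root to a leaf and return the selected action name."""
    while isinstance(node, Query):
        node = node.left if node.test(state) else node.right
    return node.name


# Hypothetical two-level tree, loosely inspired by the queries named in the text.
toy_tree = Query(
    test=lambda s: s["enemy_distance_to_flag"] >= 13,
    left=Action("flank"),
    right=Query(
        test=lambda s: s["enemy_in_reach"],
        left=Action("shoot"),
        right=Action("wait"),
    ),
)

print(decide(toy_tree, {"enemy_distance_to_flag": 5, "enemy_in_reach": True}))
```

A branch whose guarding query practically never returns true is never entered, which is one way to read the rarely traversed branches discussed in this section.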
The right side of Tree 3 (Fig. 4) is rarely traversed, as an agent’s HP are likely higher than 3.76. The AI lacks opportunity to use ammunition and will therefore mostly perform melee attacks unless many enemies (>3.28) are nearby and the agent heals itself.
Fig. 4. Tree 3 - 100 rounds, run 3, score: 41
Tree 4 (Fig. 5) is a good example of shirking responsibility: as long as there are nearby allied agents, agents stay idle. As a result, agents on the outer borders of the formation attack enemies first, while the remaining agents, initially positioned in the middle, attack later on.
Fig. 5. Tree 4 - 100 rounds, run 4, score: 47
7.2 Analysis of Superior Trees (500 Rounds)
Similar to some previous trees, tree 6 (Fig. 6) will mostly perform melee attacks. If many allied agents are nearby, they might stay in cover. Nevertheless, as fields providing cover are limited, some agents will continue to attack. In contrast, tree 7 (Fig. 7) displays a new approach: the team will split up and try to reach the enemy’s flag as fast as possible.
Fig. 6. Tree 6 - 500 rounds, run 1, score: 34
Fig. 7. Tree 7 - 500 rounds, run 2, score: 49
Fig. 8. Tree 8 - 500 rounds, run 3, score: 61
Tree 8 (Fig. 8) will likely resort to melee attacks as well. However, if HP are low, ranged attacks are more likely. Tree 9 (Fig. 9) displays a more balanced behavior. Agents perform both melee and ranged attacks, except when many enemies are nearby, which leads to a retreat towards their own flag. This tree displays strategic, team-based behavior and scores highest among the executions with 500 rounds; its high score is topped by only one tree in the 1000-round scenario.
Fig. 9. Tree 9 - 500 rounds, run 4, score: 69
Fig. 10. Tree 11 - 1000 rounds, run 1, score: 50
Fig. 11. Tree 12 - 1000 rounds, run 2, score: 75
7.3 Analysis of Superior Trees (1000 Rounds)
Tree 11 (Fig. 10) and tree 12 (Fig. 11) display well-known patterns resulting in melee attacks. The amount of ammunition never decreases, as agents never shoot, and the distance to the flag will always be less than 14. Tree 13 (Fig. 12) also mostly results in melee attacks. If many enemies are nearby, agents will constantly reload ammunition until other allied agents have dealt with the aforementioned enemies. On first inspection, tree 14 (Fig. 13) might favor “move” and “reload” actions. However, due to the randomized definition of “near”, the leaf node instructing agents to perform a melee attack can be reached and lead to a victory for the team. Tree 15 (Fig. 14) displays a more differentiated behavior. As long as enemies are not nearby, agents will move towards them based on the “Melee Enemy” instruction. Otherwise, agents attack enemies depending on whether they are in cover. If enemies are not in cover, agents shoot them. If enemies are in cover, agents resort to melee attacks, which do more damage to enemies in cover.
Fig. 12. Tree 13 - 1000 rounds, run 3, score: 60
Fig. 13. Tree 14 - 1000 rounds, run 4, score: 56
Fig. 14. Tree 15 - 1000 rounds, run 5, score: 53
8 Conclusion
There is a clear trend towards “simple” trees, which is somewhat obfuscated by hardly traversed branches. Further work might take a closer look at these rarely visited branches and nodes and deliver further insights. Furthermore, trees could be refined by reordering nodes, as suggested by Utgoff et al. [10]. This would help to address the issue of branches that could not be reached previously.
If actions cannot be performed, resorting to more suitable, alternative actions leads to fewer draws. However, this also favors rather simple trees that perform the same action repeatedly. Randomly generated trees also often feature branches that cannot be reached, leading to frequent traversal of branches featuring effortless and simple actions in their leaf nodes. These trees are also more likely to be generated during early stages and thus pass on this behavior to other trees learning from them.
References
1. Abiyev, R.H., Akkaya, N., Aytac, E.: Control of soccer robots using behaviour trees. In: 2013 9th Asian Control Conference (ASCC), pp. 1–6. IEEE (2013)
2. Dromey, R.G.: Using behavior trees to model the autonomous shuttle system. In: 3rd International Workshop on Scenarios and State Machines: Models, Algorithms and Tools (SCESM04), Edinburgh. IET (2004)
3. Lim, C.-U., Baumgarten, R., Colton, S.: Evolving behaviour trees for the commercial game DEFCON. In: Di Chio, C., et al. (eds.) EvoApplications 2010. LNCS, vol. 6024, pp. 100–110. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12239-2_11
4. Nareyek, A.: AI in computer games. Queue 1(10), 58–65 (2004)
5. Ponsen, M., et al.: Automatically generating game tactics through evolutionary learning. AI Mag. 27(3), 75–75 (2006)
6. Riedl, M.O., Zook, A.: AI for game production. In: 2013 IEEE Conference on Computational Intelligence in Games (CIG), pp. 1–8. IEEE (2013)
7. Robertson, G., Watson, I.: A review of real-time strategy game AI. AI Mag. 35(4), 75–104 (2014)
8. Schaeffer, J.: A gamut of games. AI Mag. 22(3), 29–29 (2001)
9. Sweetser, P., Wiles, J.: Current AI in games: a review. Aust. J. Intell. Inf. Process. Syst. 8(1), 24–42 (2002)
10. Utgoff, P.E., Berkman, N.C., Clouse, J.A.: Decision tree induction based on efficient tree restructuring. Mach. Learn. 29(1), 5–44 (1997)
Developing Japanese Ikebana as a Digital Painting Tool via AI
Cong Hung Mai1,4(B), Ryohei Nakatsu3, and Naoko Tosa2
1 Graduate School of Faculty of Science, Kyoto University, Kyoto, Japan
2 Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan
3 Design School, Kyoto University, Kyoto, Japan
4 RIKEN, Osaka, Japan
Abstract. In this research, we have carried out various experiments on mutual transformation between a domain of Ikebana (Japanese traditional flower arrangement) photos and other domains of images (landscapes, animals, portraits) to create new artworks via CycleGAN, a variation of GANs (Generative Adversarial Networks), a recent AI technology that can perform deep learning with less training data. Using the capability of CycleGAN to transform between two image sets, we obtained several interesting results in which Ikebana plays the role of a digital painting tool, owing to the flexibility and minimality of this Japanese cultural form. Our experiments show that Ikebana can be developed as a painting tool in digital art with the help of CycleGAN, which opens a new way to create highly abstract digital artworks by applying AI techniques to elements of traditional culture.
Keywords: GANs · Cycle GAN · Digital art · Ikebana
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 297–307, 2020. https://doi.org/10.1007/978-3-030-65736-9_27
1 Introduction
In recent years, the rapid development of AI, especially “Deep Learning” [1], has reshaped many fields of research. AI has changed not only the natural sciences and engineering but has also had a deep impact on the development of the social sciences and humanities. Among these, the application of deep learning to art is of particular interest. Art has been widely considered one of the most human of activities, and whether AI can learn the way people create artworks has been considered questionable. Recently, however, the power of deep learning has raised the question of whether AI can learn, analyze, and even create art. In this paper, we try to answer some of these questions by investigating the relationship between art and AI. We also propose a method that uses Ikebana (Japanese traditional flower arrangement) as a digital painting tool to create new artworks. Firstly,
in Sect. 2, we will explain Ikebana, focusing on its role in the long history of art, and why Ikebana can work as a digital painting tool with the help of AI. Then, in Sect. 3, we will explain CycleGAN [3], a recent AI technology that serves as the basic method for making Ikebana work as a digital painting tool, including how it can be applied to Ikebana. Here we only briefly explain the basic concept behind the technology.
We begin with the generative model in deep learning. Generative and discriminative models are the two kinds of model used in deep learning. Informally, the name “generative” comes from the fact that a generative model can generate new data instances, while a discriminative model discriminates between different categories of data. In probabilistic terms, given a set of data instances X and a set of labels Y, a generative model learns the joint probability p(X, Y), while a discriminative model learns the conditional probability p(Y|X). Generative models capture the distribution of X and how likely a given example is to belong to that distribution. In art style transfer, a generative model can transfer a photo into a specific style, or can generate from random input new images that are likely to belong to the distribution representing an artist’s style.
Among generative models, GANs (Generative Adversarial Networks) [2] have been studied extensively in recent years because they are effective without requiring a large amount of training data (see [3]). A GAN consists of two networks, a generator network (G) and a discriminator network (D), as shown in Fig. 1. During training, G learns to generate data from random noise, while D tries to determine whether the data it is given are real or generated.
The training process can be interpreted as a zero-sum game between G and D, in the terms of game theory [4]: training G tries to maximize the probability that the generated data lie on the distribution of the target set, while training D tries to minimize it. With this minimax mechanism, the networks can converge even with a relatively small amount of training data. Based on this concept, a large number of GAN variations have been developed by modifying the basic configuration.
Fig. 1. The basic configuration of GANs
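For reference, the zero-sum game between G and D is usually written as the following minimax objective in the GAN literature. The formula is not spelled out in this paper; the symbols $p_{\text{data}}$ (distribution of the real data) and $p_z$ (distribution of the random noise fed to G) are notation assumed here for illustration.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

In this form D is rewarded for assigning a high score to real data and a low score to generated data, while G is rewarded when D scores its outputs as real.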
2 The Art of Ikebana
In this research, we study the use of generative models in deep learning to generate digital art with Ikebana as the main painting tool. Ikebana is the art of flower arrangement in Japanese culture [5, 6]. The word “Ikebana” comes from the Japanese words “Ikeru” (“be living” or “to have a life”) and “Hana” (“flower”). It means that the flowers are given life through the arrangements of the artists. As an important element of Japanese culture under the influence of Zen [7], Ikebana has a strong connection with other Japanese art forms such as the Tea Ceremony [8]. Viewing plants and flowers has been a notable Japanese tradition throughout the long history of the culture and is related to the Japanese aesthetics of harmony with nature. Ikebana is deeply rooted in the Japanese philosophy of art. Ikebana was established in Japan in the Heian period (794-1185), after Buddhist monks brought from China the tradition of arranging flowers for the Buddha. In the early stage, it was just a matter of placing flowers in vases, but over the following centuries it grew into an art form under the influence of Zen Buddhism. Like other Zen-influenced art forms such as Haiku [9] or the Tea Ceremony, Ikebana is not just the beautiful arrangement of flowers; it also offers a path to harmony with nature. The philosophy of Ikebana includes two important properties that are influenced by the aesthetics of Zen. The first is minimality, as Ikebana considers the importance not only of the materials but also of the space surrounding them. An Ikebana artist can represent a complete scene with a lot of empty space, giving a minimal view. Secondly, Ikebana lets natural living materials - flowers, leaves, and branches - grow and change freely. We regard this as flexibility, since the materials can be placed in various shapes and arrangements. The minimal form of Ikebana suggests that it can be considered a minimally condensed form of nature and of the beauty in nature.
This also inspires us to use Ikebana as a source for creating various new natural forms that could work as new artworks. The development of Ikebana has continued until today and has not been limited to traditional art. In the modern digital art scene, one example of Ikebana working as a source of inspiration is “Sound of Ikebana”, created by one of the authors, Naoko Tosa (Fig. 2), which uses fluid dynamics to create Ikebana-like forms from different types of water-based solution [10, 11]. The resemblance between Sound of Ikebana and actual Ikebana also motivated the idea of integrating this traditional art form into the contemporary digital art scene.
Fig. 2. Naoko Tosa’s Sound of Ikebana
3 CycleGAN
Among the variations of GANs, CycleGAN is an elegant method for learning the mutual transformation between two sets of data [3]. We consider the generative models in CycleGAN as the main tool for creating digital Ikebana artworks. The architecture of a CycleGAN network is illustrated in Fig. 3. The generator network G_AB takes data from domain A as input and transforms them into elements of domain B. In comparison to traditional GANs, we add an inverse transformation G_BA of G_AB. We also use two discriminators, D_A and D_B, for the domains A and B, respectively. We measure the difference between A and A’ (the reconstruction of A obtained by applying G_AB and then G_BA) and the error caused by the difference between B and the domain obtained by applying G_AB to A. The training process minimizes the sum of these two errors. The data generated by G_AB and G_BA provide the mutual transformation between the two domains.
Fig. 3. The basic configuration of CycleGAN [12]
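The reconstruction error between A and A’ described above can be illustrated with a small, dependency-free sketch. It is not the authors' training code: images are reduced to short lists of pixel values, the generators are toy functions, the mean absolute difference is assumed as the distance, and the reverse cycle (B to A and back) is included as in the original CycleGAN formulation. All names are hypothetical.

```python
def l1_distance(xs, ys):
    """Mean absolute difference between two equally sized 'images'."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def cycle_loss(g_ab, g_ba, batch_a, batch_b):
    """Reconstruction error after a round trip A -> B -> A plus B -> A -> B."""
    loss_a = sum(l1_distance(a, g_ba(g_ab(a))) for a in batch_a) / len(batch_a)
    loss_b = sum(l1_distance(b, g_ab(g_ba(b))) for b in batch_b) / len(batch_b)
    return loss_a + loss_b


# Toy generators standing in for the trained networks (assumptions for illustration).
g_ab = lambda img: [2.0 * p for p in img]        # "A to B"
g_ba_exact = lambda img: [0.5 * p for p in img]  # exact inverse of g_ab
g_ba_rough = lambda img: [0.4 * p for p in img]  # imperfect inverse

batch_a = [[0.0, 0.5, 1.0]]
batch_b = [[0.2, 0.8, 1.6]]

print(cycle_loss(g_ab, g_ba_exact, batch_a, batch_b))  # zero for an exact inverse
print(cycle_loss(g_ab, g_ba_rough, batch_a, batch_b))  # positive otherwise
```

In actual training this term is combined with the adversarial errors judged by D_A and D_B, so that the generators are pushed to be both invertible and convincing in the target domain.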
The most important difference between GANs and CycleGAN as generative models is that GANs learn to generate data that fit a target set, while CycleGAN learns a set-to-set transformation. Because of this feature, CycleGAN can be used to establish mutual conversion between two groups of images (for example, the art styles of two artists). Figure 4 shows how horses are converted into zebras and vice versa. The development of CycleGAN has opened a new way to create (or, more correctly, transform) artworks that have the specific style of an artist. Moreover, in art, our goal is not limited to copying a style; we also wish to create original ideas. Our assumption is that if we use CycleGAN to perform mutual conversion between two groups of images, with one group behaving like a painting tool, the transformation will result in a new kind of art. In this research, we use Ikebana as the painting tool to create new artworks via CycleGAN.
Fig. 4. Horses-Zebras transfer (Image source [3])
4 Experiments
As described above, CycleGAN can perform the mutual transformation between two sets of images and does not require a one-to-one correspondence between the images belonging to each set. This means that AI can perform highly abstract style transfer, because the generative models in CycleGAN learn the distribution (style or characteristic features) of a domain of images. In the classic examples of CycleGAN, the generator G is used to transfer digital landscape photos into Monet’s style and to perform mutual transformation between horses and zebras or between winter and summer landscape scenes. In other words, these examples work with images that are relatively similar in size, theme, and category. Artworks by Monet and landscape photos represent the same category of subject, as do winter and summer landscapes, and horses and zebras are animals of similar size and shape. The transformations between these domains are imaginable without using AI.
In this research, we consider another viewpoint on mutual transformation by applying CycleGAN to relatively different domains of objects, between which a mutual transformation is difficult to imagine. A mutual transformation between the macro- and micro-sized worlds, or between plants and animals, would be unusual, but unusual (yet harmonious) concepts are a key to creating art. We predict that this “unusual transformation” would give a highly abstract representation of things, and that it could generate artworks when one domain plays the role of a painting tool.
Inspired by the ability of Ikebana to represent nature in a minimal way, we set up a sequence of CycleGAN experiments in which a set of Ikebana images is one of the two domains. To perform the “unusual transformation”, we choose image sets that are relatively different from flowers: landscapes, portraits, and large-sized animals. We use the datasets below and carry out an experiment of mutual transformation via CycleGAN between A and each of B1, B2, and B3.
Dataset A: Ikebana photos from Flickr
Dataset B1: Landscape photos from Kaggle (https://www.kaggle.com/arnaud58/landscape-pictures)
Dataset B2: Portrait photos from Kaggle (partially) (https://www.kaggle.com/laurentmih/aisegmentcom-matting-human-datasets)
Dataset B3: Elephants and horses from the Kaggle Animal-10 dataset (https://www.kaggle.com/alessiocorrado99/animals10)
The classic transformation between Ikebana and landscapes comes from the traditional idea of Ikebana. Our experiment re-investigates the fundamental ability of Ikebana as an art form. The second pair, Ikebana and portraits, is inspired by the comparison between flowers and human faces in many traditional Oriental works of literature and poetry. We consider the last transformation because the transformation between large, moving animals and small, still plants is ideal for our concept of “unusual transformation”.
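Because no one-to-one correspondence between the two domains is required, training examples can be drawn from each dataset independently. The following sketch shows this unpaired sampling; the file names and the function are hypothetical placeholders and do not reflect the authors' actual data pipeline.

```python
import random


def unpaired_batches(domain_a, domain_b, steps, seed=0):
    """Yield (a, b) training examples drawn independently from each domain."""
    rng = random.Random(seed)
    for _ in range(steps):
        # The two draws are unrelated: no image in A is matched to one in B.
        yield rng.choice(domain_a), rng.choice(domain_b)


# Hypothetical file lists standing in for Dataset A and Dataset B1.
ikebana = ["ikebana_001.jpg", "ikebana_002.jpg", "ikebana_003.jpg"]
landscapes = ["landscape_001.jpg", "landscape_002.jpg"]

pairs = list(unpaired_batches(ikebana, landscapes, steps=4))
for a, b in pairs:
    print(a, "<->", b)
```

The same sampler would be reused with the portrait or animal file list in place of the landscapes for the A-B2 and A-B3 experiments.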
5 Results and Discussion
In this section, we present some notable results and a discussion of each experiment.
5.1 A - B1 Transformation
Some interesting results are shown in Fig. 5. In this experiment, we can see that Ikebana can minimize the landscape from different spatial perspectives. The representation gives results quite similar to traditional Ikebana. This is compatible with the assumption that traditional Ikebana was mostly inspired by the natural shapes of landscapes.
5.2 A - B2 Transformation
Some successful transformations from human faces into Ikebana are shown in Fig. 6. We find it interesting that the shapes of the original human faces remain. This indicates that the arrangement of flowers can inspire a new kind of portrait art.
5.3 A - B3 Transformation
Some successful examples are given in Fig. 7. As in the A-B2 transformation, the shapes of the animals are preserved after being transformed into Ikebana. In our opinion, these results could be regarded as a kind of Surrealist art, which reflects an unreal world.
Fig. 5. Some transformation results from landscape photos to Ikebana images.
Fig. 6. Some transformation results from portrait photos to Ikebana images.
5.4 Other Results
We also give some failed examples. The first type, shown in Fig. 8, is over-transformation, where the original shape of the input photo can no longer be seen. We consider that this problem arises because our training data set A includes many photos that are over-transformed. We plan to improve on this by replacing them with some successful results in the next experiments. In the second type, shown in Fig. 9, the network failed to perform a good transformation into Ikebana. We attribute this to the generator failing to transform some highly complex photos. We predict that this problem would also be resolved if we could improve the training set of A by adding successful results to it.
In the inverse direction of the A-B2 transformation, shown in Fig. 10, CycleGAN reconstructs some local parts of the human face instead of a whole face as expected (with some exceptions). This is a limitation of generative models in deep learning: they are vulnerable to noise and perform the transformation locally (in terms of pixel neighborhoods) when one domain has a complex structure. In the reverse direction, the flexibility and minimality of Ikebana avoid that limitation.
Fig. 7. Some transformation results from animal photos to Ikebana images.
Fig. 8. Several failed examples of over transformation.
Fig. 9. Several failed examples of transformation.
Fig. 10. Several failed inverse transformation examples.
6 Conclusion
From the results of our sequence of experiments, we conclude that CycleGAN can help an artist generate original artworks by transforming a target set of photos into their representation in Ikebana. With its inherent flexibility and minimality, Ikebana could be developed into a new kind of digital painting tool. With the help of AI, especially the generative models inspired by GANs, we believe other traditional art forms such as Chinese Sanshui Painting can also be used in modern digital art in the same way. We also note the concept of “unusual transformation” used in this research. Mutual transformation via CycleGAN opens a new way to create art when it is performed between objects of relatively different categories. Other transformations via CycleGAN could be tried in order to investigate the power of AI in creating new art.
References
1. Jorn D.K.: Deep Learning. MIT Press (2019)
2. Creswell, A., et al.: Generative adversarial networks: an overview. IEEE Signal Process. Magaz. 35(1), 53–65 (2018)
3. Jun-Yan, Z., Taesung, P., Phillip, I., Alexei, A.E.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: The IEEE International Conference on Computer Vision (ICCV), pp. 223–2232 (2017)
4. Steven, T.: Game Theory: An Introduction. Princeton University Press (2013)
5. https://en.wikipedia.org/wiki/Ikebana
6. Shozo, S.: Ikebana: The Art of Arranging Flowers. Tuttle Publishing (2013)
7. Suzuki, D.: Zen to Nihonbunka (Zen Buddhism and its Influence on Japanese Culture). Iwanami (1981)
8. Sadler, A.L., Laura, C.M.: The Japanese Tea Ceremony: Cha-no-Yu and the Zen Art of Mindfulness. Tuttle Publishing (2019)
9. Judith, P., Barry, T., Michiko, W.: Haiku: Japanese Art and Poetry. Pomegranate (2010)
10. Naoko, T., Yunian, P., Qin, Y., Ryohei, N.: Pursuit and expression of Japanese beauty using technology. Arts J. MDPI. 8(1), 38 (2019)
11. Tosa, N., Yunian, P., Zhao, L., Nakatsu, R.: Genesis: new media art created as a visualization of fluid dynamics. In: Munekata, N., Kunita, I., Hoshino, J. (eds.) ICEC 2017. LNCS, vol. 10507, pp. 3–13. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66715-7_1
12. https://bellchen.me/research/cs/gan/cyclegan-for-unsupervised-translation-in-anime/
13. Christian, L., et al.: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (2016)
Learning of Art Style Using AI and Its Evaluation Based on Psychological Experiments
Cong Hung Mai1,5, Ryohei Nakatsu2(B), Naoko Tosa3, Takashi Kusumi4, and Koji Koyamada2
1 Graduate School of Faculty of Science, Kyoto University, Kyoto, Japan
2 Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
3 Graduate School of Advanced Integrated Studies in Human Survivability, Kyoto University, Kyoto, Japan
4 Graduate School of Education, Kyoto University, Kyoto, Japan
5 RIKEN, Wako, Japan
Abstract. GANs (Generative Adversarial Networks) are a recent AI technology capable of transforming between two image sets. Using GANs, we have carried out a comparison between several art sets with different art styles. We prepared four image sets: a flower image set in the Impressionist art style, one in the Western abstract art style, one in the Chinese figurative art style, and one in the art style of Naoko Tosa, one of the authors. Using these four sets, we carried out a psychological experiment to evaluate the differences between them. We found that abstract drawings and figurative drawings are judged to be different, that figurative drawings from the West and the East are judged to be similar, and that Naoko Tosa’s artworks are judged to be similar to Western abstract artworks.
Keywords: GANs · Cycle GAN · Art history · Transformation of art style · Impressionism · Abstraction
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 308–316, 2020. https://doi.org/10.1007/978-3-030-65736-9_28
1 Introduction
Recently, a new deep learning technology in AI called GANs (Generative Adversarial Networks) has been proposed [1], and various attempts to create artworks by AI have been carried out. However, many of these methods merely let AI learn the style of a particular painter and output images in the learned style. Is there a different approach to the relationship between AI and art? For example, can AI approach basic questions such as what the beauty that exists at the basis of art is, and what the difference between Oriental and Western perceptions of beauty is? In this paper, a new methodology for approaching the relationship between AI and art is proposed, and the results of verification through psychological experiments are shown.
2 Related Works
Recently, a new learning method called GANs (Generative Adversarial Networks), which can perform deep learning with a relatively small amount of training data, has been proposed [1]. GANs are composed of two networks: a generator network and a discriminator network. By performing learning as a zero-sum game between these two networks, deep learning can converge even with a relatively small amount of training data. By modifying this basic configuration, various GANs have been proposed and interesting results have been obtained. Among them, Cycle GAN is a method that enables mutual conversion between two image sets. Figure 1 shows the basic concept of Cycle GAN [2]. In Cycle GAN, when two image sets (X, Y) are given, a transformation function G and an inverse transformation function F between them are considered. Two types of errors, Dx and Dy, are also considered: Dx is the difference between X and X’, where X’ is obtained from X by applying G and then F, and Dy is the difference between Y and Y’, where Y’ is obtained from Y by applying F and then G. The training is carried out so that the sum of these two types of errors is minimized (Fig. 1).
Fig. 1. Basic concept of Cycle GAN.
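Written out explicitly, the two errors described above can be sketched as follows. The use of the L1 norm and of expectations over the image sets follows the usual CycleGAN formulation and is an assumption here, since the text does not state which distance is used; the full CycleGAN objective additionally contains adversarial terms that are not shown.

```latex
D_x = \mathbb{E}_{x \in X}\bigl[\lVert F(G(x)) - x \rVert_1\bigr], \qquad
D_y = \mathbb{E}_{y \in Y}\bigl[\lVert G(F(y)) - y \rVert_1\bigr], \qquad
\min_{G,\,F}\; D_x + D_y
```

Here $F(G(x))$ corresponds to X’ and $G(F(y))$ to Y’ in the notation of the text.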
The distinguishing feature of Cycle GAN is that the conversion between image sets X and Y is possible even if there is no one-to-one correspondence between them. Using this feature, for example, by learning from a group of landscape photographs and a group of paintings by a specific painter, mutual conversion between these image groups becomes possible. Figure 2 shows how a Monet-style landscape painting is created from a landscape photograph [2].
Fig. 2. Conversion from a landscape photo to a Monet painting using Cycle GAN [3].
3 Fluid Art “Sound of Ikebana”
The behavior of fluids is an important research subject in physics and has been studied as “fluid dynamics.” It is known that fluids create extremely beautiful shapes under various conditions. As beauty is a fundamental element of art, it is natural to consider fluid dynamics as a basic methodology of art creation. Naoko Tosa, one of the authors, has been leading a project to create “fluid art” by shooting the behavior of fluids with a high-speed camera. One of the techniques for creating fluid art is the creation of Ikebana-like shapes by applying sound vibration to paint or other fluids and shooting the phenomenon with a high-speed camera. The detailed process is as follows. A speaker is placed facing upward, a thin rubber film is put on top of it, and a fluid such as paint is placed on the film. When the speaker is vibrated with sound, the paint jumps up and various shapes are created. Based on this method, Naoko Tosa created a video art called “Sound of Ikebana” [3]. In April 2017, she exhibited Sound of Ikebana using more than 60 digital billboards at Times Square in New York. Figure 3 shows the event.
Fig. 3. “Sound of Ikebana” at Times Square in New York.
It was interesting that the video art mentioned above led the authors to consider what beauty is and also what Japanese beauty is. When Tosa exhibited her media art around the world as a Japan Cultural Envoy named by the Agency for Cultural Affairs in Japan, many foreign art-related people indicated, “In Tosa’s media art, which expressed the beauty hidden in physical phenomena, there is the beauty which Westerners did not notice until now, and which might be the condensed consciousness and sensitivity unique to Japan.” Inspired by this, the authors would like to compare Western and Oriental artworks using AI.
4 Framework of This Research As described in Chapter 2, Cycle GAN can be used to carry out the transformation between two image sets, even if there is no one-to-one correspondence between images belonging to each image set. So far, reference [3] merely states that landscape photographs could be converted into Monet-style paintings and vice versa. But the authors consider that Cycle GAN could be applied to research investigating the relationship between art and beauty. This paper is based on this basic thought. Considering that art is an essential feature extracted from real objects or natural phenomena, it is possible
to use Cycle GAN to convert between real objects or natural phenomena and art that extracts their essence.
In the long history of art, painting originally tried to imitate nature. Western realism is an extension of this trend. Later, however, Impressionism was born, which tries to paint light and its transitions as perceived by human eyes rather than nature as it is. At this stage, the form of the depicted objects is still clear. Western painting then moved on to Cubism and Surrealism, followed by more recent abstract painting. Nevertheless, it can be said that artists extracted the essential things they felt in their hearts from the surrounding nature and made them into abstract paintings.
On the other hand, the history of Oriental painting is characterized by the fact that the painted objects have been clear since ancient times. It is further characterized by a minimalist direction that removes color, as in ink painting, and by a way of drawing that emphasizes the characteristics of the object, as in Ukiyo-e in Japan. Compared to the West, it has remained at the level of figurative painting.
In such a situation, how is the “Sound of Ikebana” described earlier positioned in the history of Oriental painting? It depicts neither landscapes nor human life and looks like abstract images and videos. Nevertheless, as mentioned earlier, many people overseas have said, “The artwork has the feeling of Japanese beauty.” Is it possible to use AI style learning and style conversion functions to find out how Sound of Ikebana is positioned compared to Western and Oriental figurative and abstract paintings? In this study, this important and interesting issue is approached using Cycle GAN as follows.
(1) Two types of image sets (image set A and multiple image sets B1, B2, …) are prepared. Image set A is the set to be converted into art. The image sets B1, B2, … consist of art images.
(2) Using Cycle GAN, mutual conversion (Fig. 4) between image set A and each of the image sets B1, B2, … is carried out to obtain the conversion functions (G1, F1), (G2, F2), ….
Fig. 4. Mutual conversion between two types of image sets.
(3) A psychological experiment is performed using the image sets G1(A), G2(A), … obtained by applying the conversions G1, G2, … to image set A. Depending on the purpose of the psychological experiment, the subjects fill in questionnaires with items such as “Can you evaluate it as art?” and “Do you feel beauty?”
By doing this, it is possible to examine, for each art style, what kind of information is extracted as essential from real objects and made into artworks.
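The role of each conversion pair (G, F) in step (2) can be illustrated with the cycle-consistency idea behind Cycle GAN: converting an image with G and back with F should return something close to the original. The following is only a toy sketch under stated assumptions. Images are stand-in lists of pixel values, and `brighten`, `darken`, and `darken_less` are hypothetical hand-written mappings, not trained networks and not the authors' implementation.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in for an image: a flat list of pixel values


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    G: Callable[[Image], Image],
    F: Callable[[Image], Image],
    set_a: List[Image],
    set_b: List[Image],
) -> float:
    """L1 cycle loss: mean |F(G(a)) - a| over A plus mean |G(F(b)) - b| over B."""
    forward = sum(l1_distance(F(G(a)), a) for a in set_a) / len(set_a)
    backward = sum(l1_distance(G(F(b)), b) for b in set_b) / len(set_b)
    return forward + backward


# Hypothetical mappings used only for illustration.
def brighten(img: Image) -> Image:      # plays the role of G: A -> B
    return [v + 0.5 for v in img]


def darken(img: Image) -> Image:        # exact inverse of brighten
    return [v - 0.5 for v in img]


def darken_less(img: Image) -> Image:   # imperfect inverse
    return [v - 0.25 for v in img]


set_a = [[0.0, 0.25, 0.5]]
set_b = [[0.5, 0.75, 1.0]]

perfect = cycle_consistency_loss(brighten, darken, set_a, set_b)
imperfect = cycle_consistency_loss(brighten, darken_less, set_a, set_b)
print(perfect, imperfect)
```

In actual Cycle GAN training this term is combined with adversarial losses, and G and F are neural networks. The sketch only shows why no one-to-one pairing between the two image sets is needed: each image is compared with its own round-trip reconstruction.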
5 Learning of Various Art Styles and Transformation of Art Style
The following image sets were prepared. (The resolution of all images is 256 × 256.)
Image set A: 8069 flower images.
Image set B1: 1072 Monet paintings in the Monet2Photo dataset [2].
Image set B2: 123 Kandinsky abstract paintings from Wiki Art.
Image set B3: 238 Chinese hand-painted flower paintings called “Gongbi” from the Stanford University project “Chinese Painting Generation Using Generative Adversarial Networks.”
Image set B4: 569 images selected from “Sound of Ikebana.”
Image set B1 includes paintings, mainly of flowers, by the Impressionist Monet as a representative example of Western figurative painting. As a representative example of Western abstract painting, Kandinsky paintings were prepared as image set B2. Image set B3 includes Chinese hand-painted flower paintings, called “Gongbi” [5], as a representative example of Oriental figurative painting. Image set B4 is a set of still images taken from the media art “Sound of Ikebana” created by one of the authors, Naoko Tosa. The Sound of Ikebana is a video artwork created by shooting physical phenomena with a high-speed camera. As it is made from a physical phenomenon, it cannot be said to contain “Japanese beauty” from the outset. In this experiment, the art style of the Sound of Ikebana was compared with representative Western and Oriental painting styles. Each of the four image sets B1, B2, B3, and B4 was mutually converted with image set A using Cycle GAN. Figure 5 shows examples of the results of the mutual conversion between the Sound of Ikebana and flower photographs.
Fig. 5. Examples of mutual conversion between flower photos and Sound of Ikebana style images. (a: Flower to “Sound of Ikebana” style, b: “Sound of Ikebana” to flower-like)
The question here is why it is necessary to use AI for the comparative evaluation of art styles. Why were Monet art, Kandinsky art, Chinese Gongbi art, and Sound of Ikebana images not directly compared and evaluated using psychological experiments? There have been some studies that evaluated artworks through psychological experiments [4]. However, when a copy of the original artwork is used, it is relatively easy for subjects to identify the artist. For example, a subject who knows that a painting is by Monet brings preconceptions about the work of Monet, a representative of the Impressionists in Western art history, and this will have a significant effect on evaluation experiments. To avoid this effect, some studies have used lesser-known works [4]. However, works of famous artists and art schools can still be easily identified, which would greatly affect evaluation experiments. On the other hand, a GAN allows the AI to learn an art style and apply that style to other input images. Therefore, bias can be avoided in the evaluation experiment, and this is the benefit of using AI to evaluate artworks.
6 Evaluation of Obtained Results Based on Psychological Experiment
The experiment described in Chapter 5 yielded the results of performing various style conversions on flower images. By having people evaluate these results, is it possible to learn what art is, what beauty lies behind it, and what the culture of beauty is? Is there any suggestion as to how people perceive Japanese beauty and the corresponding Western beauty? That is the goal of this research. Since this is a subjective evaluation, a method used in psychological experiments was adopted: a target image is presented to a subject, a questionnaire survey is conducted, and the results are statistically analyzed.
6.1 Psychological Experiment
Image groups Group1, Group2, Group3, and Group4 were prepared by selecting two images from each of the image sets G1(A), G2(A), G3(A), and G4(A), which are obtained by converting image set A into the styles of image sets B1, B2, B3, and B4. The resolution of each image is 256 × 256. Twenty-three Kyoto University students (12 male and 11 female, all Japanese) served as subjects, so the gender ratio was almost even. Each of the 8 images was printed on A4 high-quality paper, and the eight images were presented to the subjects. The order of the presented images was randomized for each subject. The subjects were asked to perform a seven-step subjective evaluation of the 6 items shown in Table 1. These items were selected to identify the difference between the Oriental and Western art styles.
Table 1. Adjective pairs used for evaluation
Individual – Ordinary
Bold – Careful
Dynamic – Static
Artistic – Non-artistic
Stable – Unstable
Oriental – Western
6.2 Analysis
The subjective evaluations by the 23 subjects for the six items (Individual-Ordinary, Dynamic-Static, Stable-Unstable, Bold-Careful, Artistic-Non artistic, and Oriental-Western) were averaged for each evaluation item, graphed, and t-tested. Figures 6, 7, and 8 show the average value and the standard error for each evaluation item. The results of the t-tests (**: 1%, *: 5%) are also shown in these figures.
Fig. 6. Subjective evaluation results for “Individual-Ordinary” (left) and “Dynamic-Static” (right). (Group1: Monet-style flower images, Group2: Kandinsky-style flower images, Group3: Chinese figurative art-style flower images, Group4: Naoko Tosa art-style flower images.)
Fig. 7. Subjective evaluation results for “Stable-Unstable” (left) and “Bold-Careful” (right). (Group1: Monet-style flower images, Group2: Kandinsky-style flower images, Group3: Chinese figurative art-style flower images, Group4: Naoko Tosa art-style flower images.)
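The quantities reported in the analysis (mean, standard error, and a t statistic for the difference between two groups) can be computed as in the following minimal sketch. The paper does not state which t-test variant was used. A paired test is assumed here because every subject rated every image group. The ratings are made-up numbers for six hypothetical subjects, not the experimental data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(ratings):
    """Return (mean, standard error of the mean) for one group's ratings."""
    return mean(ratings), stdev(ratings) / sqrt(len(ratings))


def paired_t(ratings_x, ratings_y):
    """Paired t statistic and degrees of freedom for two rating lists
    given by the same subjects in the same order (assumed test variant)."""
    diffs = [x - y for x, y in zip(ratings_x, ratings_y)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up ratings on the 1-7 scale from six hypothetical subjects.
group_a = [5, 6, 4, 7, 5, 6]
group_b = [3, 4, 4, 5, 2, 3]

m, sem = mean_and_sem(group_a)
t, df = paired_t(group_a, group_b)
print(f"mean={m:.2f} sem={sem:.3f} t={t:.3f} df={df}")
```

The t value is then compared with the critical value of the t distribution for the given degrees of freedom at the 5% or 1% level to decide whether the difference is marked as significant.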
6.3 Consideration
(1) Group 2 vs Group 4
Group 2 and Group 4 received similar evaluations. This indicates that there is little significant difference between the style of Kandinsky and the style of Sound of Ikebana. Conversely, Group 4 is evaluated as significantly different from Group 1 and Group 3 for all items except “Oriental-Western”. This indicates that the
Fig. 8. Subjective evaluation results for “Artistic-Non-artistic” (left) and “Oriental-Western” (right). (Group1: Monet-style flower images, Group2: Kandinsky-style flower images, Group3: Chinese figurative art-style flower images, Group4: Naoko Tosa art-style flower images.)
Sound of Ikebana is considered to be abstract rather than figurative. If the history of Western painting is a transition from figurative painting to abstract painting, the Sound of Ikebana can be said to be positioned in the corresponding transition from figurative painting to abstract painting in the Orient.
(2) Group 1 vs Group 3
Similarly, Group 1 and Group 3 received similar evaluations. In particular, the hypothesis that there is a significant difference at the 5% level was rejected for the “Dynamic-Static” and “Artistic-Non artistic” items. Group 1 is an image set in the style of Western Impressionism, and Group 3 is an image set with the characteristics of Oriental figurative painting. The original artworks of each can be judged to have their own characteristics, but at a more essential level these styles may have something in common.
(3) Artistic or not
It is interesting to note that Group 1 and Group 3 are around or below the median value of 4 for “Artistic-Non artistic.” Few people would rate Monet’s original paintings as unartistic. Chinese hand-painted paintings have also been highly regarded for their elaborate depiction of nature. However, Groups 2 and 4 are evaluated as more artistic than Groups 1 and 3. This is thought to be due to the young age of the subjects. As the younger generation has more opportunities to see abstract drawings and recent media art, they may have an aesthetic sense that appreciates abstract paintings. This also suggests that our principle of using photos converted into a specific art style instead of the original artworks worked well. Had the original art images been used for evaluation, it would have been relatively easy to identify the artist, or even the specific work itself, and this would presumably have had a significant effect on the evaluation.
(4) Oriental or Western
As shown in Fig. 8, the answers to the question of Oriental or Western are all around the median of 4 except for Group 2.
This indicates that the subjects could not identify whether the artworks in these groups were Oriental or Western, and gave responses near the middle. Initially, the authors expected that the Sound of Ikebana would be evaluated as “Oriental” because of the overseas evaluation that the artwork contains
Japanese beauty. So far, however, such a result has not been obtained. At the same time, Monet-style images and Chinese Gongbi-style images were evaluated as intermediate. This seems to indicate that, at this time, the art style extracted by AI has not yet reached a level that conveys a Western or Oriental impression.
7 Conclusion
In this paper, a new method of handling art with AI was described, in which GANs are used to investigate where the difference in art style lies and what the essential difference in aesthetic sense between Oriental and Western beauty is. With the method proposed in this paper, subjects evaluate the styles of Western and Oriental figurative and abstract paintings without the bias created by identifying specific artists or artworks. As a result, it was shown that the figurative drawings of the Orient and the West are not very different. It was also shown that one of the authors’ works, “Sound of Ikebana,” has no significant difference from Western abstract drawings. However, this study is not sufficient to clarify why the Sound of Ikebana is evaluated by Westerners as Oriental. Clarifying this is future work. It is also necessary to use Western subjects to learn the difference in sensitivity between Oriental and Western people.
References
1. Creswell, A., et al.: Generative adversarial networks: an overview. IEEE Signal Process. Magaz. 35(1), 53–65 (2018)
2. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: The IEEE International Conference on Computer Vision (ICCV), pp. 223–2232 (2017)
3. Tosa, N., Pang, Y., Yang, Q., Nakatsu, R.: Pursuit and expression of Japanese beauty using technology. Arts J. MDPI. 8(1), 38 (2019)
4. Freedman, K.: Judgement of painting abstraction, complexity, and recognition by three adult educational groups. Visual Arts Res. 14, 68–78 (1988)
5. https://en.wikipedia.org/wiki/Gongbi
Deep Learning-Based Segmentation of Key Objects of Transmission Lines
Mingjie Liu1, Yongteng Li1, Xiao Wang1, Renwei Tu2, and Zhongjie Zhu2(B)
1 Ninghai Power Supply Company Limited, State Grid Corporation of Zhejiang, Ningbo 315600, China
2 College of Information and Intelligence Engineering, Zhejiang Wanli University, Ningbo 315100, China
[email protected]
Abstract. UAV (Unmanned Aerial Vehicle) inspection is one of the main ways of inspecting transmission lines and plays an important role in ensuring transmission safety. In view of the disadvantages of existing inspection methods, such as slow detection speed, the heavy computation of the detection model, and the inability to adapt to low-light environments, an improved algorithm based on YOLO (You Only Look Once) v3 is proposed to realize the real-time detection of power towers and insulators. First, a data set of power towers and insulators is established, and its images are inverted and transformed to expand the data volume. Second, the network structure of YOLO v3 is simplified to reduce the computation of the detection model, and res units are added to reuse convolution features. Then, K-means is used to cluster the new data set to obtain more accurate anchor values, which improves the detection accuracy. Experiments show that the accuracy of the proposed scheme for detecting key parts of the transmission line is 4% higher than that of the original YOLO, and the detection speed reaches 33.6 ms/frame.
Keywords: YOLO v3 · Transmission line · Tower · Insulator
1 Introduction
The safe and stable operation of transmission lines is of great significance for regional social and economic development. To eliminate potential safety hazards before a power failure occurs, daily inspection of transmission lines is necessary [1]. The traditional method relies on manual inspection, which carries great safety risks, especially after natural disasters such as earthquakes and typhoons, when the inspection environment becomes worse and more complex. Moreover, manual inspection is time-consuming, inefficient, labor-intensive and costly. UAV (Unmanned Aerial Vehicle) inspection is an inspection method based on machine learning and target detection algorithms that can be carried out without direct contact with the power system. Compared with manual inspection, UAV inspection has the advantages of low risk, high efficiency and low cost [2–4]. Therefore, UAV patrol technology has attracted wide attention in the power industry.
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 317–324, 2020. https://doi.org/10.1007/978-3-030-65736-9_29
Traditional machine-vision recognition and localization methods rely on shallow machine learning algorithms, such as SIFT (Scale Invariant Feature Transform) recognition or HOG (Histogram of Oriented Gradients) combined with SVM (Support Vector Machine), which are mainly based on edge or texture features in the image. These algorithms achieve good results when the background is relatively simple and the edges of the object under test are clear. In image scene recognition, T. Yu and R. Wang put forward an algorithm for scene analysis using image matching and achieved good results on street views. However, the transmission line images captured by UAV contain a lot of environmental background information [5]. In the face of complex backgrounds, traditional algorithms cannot meet the real-time and accuracy requirements of detection. In recent years, deep learning, especially the convolutional neural network model, has achieved remarkable results in image recognition. R. Girshick et al. proposed RCNN [6], which replaced the traditional sliding-window region selection, in which every window is examined once, with a heuristic method (selective search) that proposes candidate regions for detection, reducing information redundancy. However, the fixed image scale required by the CNN deforms objects and degrades detection performance. Later, S. Ren et al. proposed Faster RCNN [8], which realized a truly end-to-end target detection framework and improved detection accuracy and speed. Then, Redmon et al. proposed the YOLO algorithm [9], which greatly improved the speed of target detection. In particular, YOLO v3 is clearly better at detecting small targets, which makes its performance across target sizes more balanced. In the detection of transmission towers and insulators, however, the targets are mainly large and medium-sized, with almost no small targets.
The small-target detection branch therefore only adds to the computation of the whole detection process. In addition, missed and false detections occur in low-light environments. To resolve these problems, this paper improves upon the YOLO v3 framework to realize real-time detection of key parts of the transmission line. The network structure is simplified, and to make up for the loss of features caused by this simplification, res units are added after the backbone network to reuse the convolution features [10]. Since there is no public data set for key parts of transmission lines, a new data set that includes weak-light conditions is established. The data set comes from multi-angle UAV shooting, which simulates the real detection environment. It is also rotated and scaled to expand the data. According to the characteristics of the developed data set, the bounding boxes are re-clustered using the K-means approach and the optimal number of cluster centers is selected to obtain more accurate anchors. The contributions of this work can be summarized as follows:
(1) Establishment of a data set [11].
(2) Simplification of network structure.
(3) Selection of accurate anchors.
In Sect. 2, we introduce the new data set, propose the optimization scheme for the network structure of YOLO v3, and select more accurate anchor values. Section 3 shows the experimental results and analysis. Finally, Sect. 4 draws the conclusion of this paper.
2 Proposed Scheme
Based on the improved YOLO v3, this paper proposes a real-time location detection method for key parts of the transmission line. By building a more targeted data set of power towers and insulators, and selecting more accurate anchor values through K-means clustering, the real-time detection of transmission towers and insulators can be realized. The flow chart of the detection method is shown in Fig. 1. A new data set is established for the two types of detection targets among the key parts of the transmission line: towers and insulators. Lighting, background and shooting angles are fully considered. The two kinds of detection targets in the data set are then labeled with YOLO-Mark, and more accurate anchor values are obtained through K-means clustering. The data set is fed into the optimized convolutional neural network to train the detection model. Finally, results are obtained by inputting the test data into the improved YOLO v3 model.
Fig. 1. Diagram of the proposed scheme. (Blocks in the diagram: establishment of data set; preprocessing; selection of accurate anchor; simplification of network structure; settings of training parameter; training of detection model; improved YOLO v3 model; test data; real-time detection of key parts of the transmission line; result.)
2.1 Establishment of Data Set
Training a neural network usually requires a data set of at least a few thousand images. If there are too few pictures, the accuracy and reliability of detection suffer. Because there is no public data set of the key parts of the transmission line, a new data set is established by collecting online pictures and UAV photos. The pictures are expanded by inversion and scale transformation. These images cover different illumination, different shooting angles of the detection objects, different resolutions, different detection backgrounds and other conditions, which meets the requirement of sample diversity and allows targeted optimization. Balancing the samples is of great significance for improving the robustness of the algorithm. If the illumination factor is not taken into account in the new data set, training on samples with only a single illumination condition will lead to obvious false or missed detections when the trained model is applied to dark pictures or videos. A well-designed data set can achieve better detection results with fewer training samples. Sample images from the data set are shown in Fig. 2.
Fig. 2. Sample images in the developed data set
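When labeled images are expanded by inversion and scaling, the bounding-box annotations have to be transformed together with the pixels. The sketch below shows this for the normalised label format written by YOLO-style annotation tools (class id, box centre x and y, box width and height, all as fractions of the image size). It assumes that "inversion" means mirroring the image. The `YoloLabel` type and the helper names are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Tuple


class YoloLabel(NamedTuple):
    """One annotation: class id plus a box normalised to the range [0, 1]."""
    cls: int
    x: float  # box centre, fraction of image width
    y: float  # box centre, fraction of image height
    w: float  # box width, fraction of image width
    h: float  # box height, fraction of image height


def flip_horizontal(label: YoloLabel) -> YoloLabel:
    """Label that matches the left-right mirrored image."""
    return label._replace(x=1.0 - label.x)


def flip_vertical(label: YoloLabel) -> YoloLabel:
    """Label that matches the top-bottom mirrored image."""
    return label._replace(y=1.0 - label.y)


def to_pixels(label: YoloLabel, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """(left, top, right, bottom) in pixels for an image of the given size."""
    left = (label.x - label.w / 2) * img_w
    top = (label.y - label.h / 2) * img_h
    right = (label.x + label.w / 2) * img_w
    bottom = (label.y + label.h / 2) * img_h
    return left, top, right, bottom


# Hypothetical tower annotation (class 0) in the left half of the image.
tower = YoloLabel(cls=0, x=0.25, y=0.5, w=0.2, h=0.4)
mirrored = flip_horizontal(tower)

# Uniform rescaling leaves the normalised label unchanged;
# only the pixel coordinates grow with the image.
box_416 = to_pixels(tower, 416, 416)
box_832 = to_pixels(tower, 832, 832)
print(mirrored, box_416, box_832)
```

Because the labels are normalised, a uniformly rescaled copy of an image can reuse the same label file, while a mirrored copy needs the centre coordinate reflected.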
2.2 Simplification of Network Structure
YOLO v3 was proposed by Joseph Redmon and Ali Farhadi of the University of Washington in the United States. Its detection speed is very fast and its detection accuracy is high. For basic image feature extraction, YOLO v3 adopts a network structure called Darknet-53 [12, 13], which has 53 convolution layers in total. To extract features better, parts of the structure follow the residual network approach and set up shortcut links between some layers. Since the whole structure has no pooling layer or fully connected layer, the tensor size is changed between convolution layers by changing the stride of the convolution kernel. For example, stride = (2, 2) halves the image side length, making the area 1/4 of the original [14]. Over the whole convolution process, YOLO v3 performs down-sampling 5 times (Fig. 3).
The network structure of YOLO v3 is optimized to improve the detection of small targets. It realizes feature fusion by concatenating convolution layers of different depths, and forms an FPN structure from three sets of feature maps of different sizes. The feature outputs of 13 × 13, 26 × 26 and 52 × 52 are used for the detection of large, medium and small targets, respectively. In this paper, the 52 × 52 feature output for small target detection is eliminated. In addition, res units are added behind the backbone network to enhance features: before the outputs y1 and y2, four DBLs are replaced by two res units, with the size and number of convolution kernels kept unchanged.
2.3 Selection of Accurate Anchor
The K-means algorithm is a typical distance-based clustering algorithm that evaluates similarity according to distance: the closer two objects are, the greater their similarity.
Clusters are made up of objects close to each other, so the final goal is to get compact and independent clusters. In this paper, K-means is used to calculate the anchor values. Because Euclidean distance causes larger errors for large bounding boxes than for small ones, the clustering aims instead at good IoU scores between the anchors and the annotated boxes. IoU is the ratio of the intersection to the union of the prediction box and the annotation box. The larger the ratio, the better the detection performance of the model and the higher the accuracy. In this paper, 1 to 9 clustering centers are set for the data set,
Fig. 3. Comparison of the network structure of the original and improved YOLO v3 frameworks. (a) YOLO v3: 416 × 416 × 3 input; backbone of DBL, res1, res2, res8, res8 and res4 blocks; three output branches built from DBL and conv layers with up-sampling and concatenation, giving y1 (13 × 13 × 21), y2 (26 × 26 × 21) and y3 (52 × 52 × 21). (b) Improved YOLO v3: the same input and backbone, with res2 blocks replacing DBLs before the outputs and only two output branches, y1 (13 × 13 × 21) and y2 (26 × 26 × 21). In both panels, DBL = conv + BN + Leaky ReLU, and a res unit = two DBLs followed by an add.
and the anchor and IoU values obtained from clustering are shown in the following table (Table 1):
Table 1. Results of K-means clustering.
K   Avg IoU   Anchor
1   21.21%    (54, 118)
2   38.31%    (30, 61), (138, 324)
3   43.83%    (26, 47), (63, 182), (173, 364)
4   47.19%    (24, 42), (57, 129), (95, 357), (232, 353)
5   58.53%    (29, 24), (13, 56), (84, 63), (34, 167), (153, 356)
6   59.58%    (12, 48), (33, 27), (30, 145), (92, 67), (91, 344), (226, 359)
7   60.15%    (12, 50), (28, 22), (59, 50), (30, 151), (132, 89), (88, 348), (217, 368)
8   61.06%    (11, 45), (29, 23), (24, 109), (63, 50), (43, 215), (136, 91), (101, 367), (234, 364)
9   61.88%    (11, 44), (27, 22), (22, 90), (58, 44), (34, 182), (116, 84), (69, 328), (133, 374), (267, 349)
The above table shows the anchor values and Avg IoU obtained for different numbers of cluster centers. As the number of cluster centers increases, Avg IoU rises, quickly at first and then slowly, and gradually converges. At k = 6, Avg IoU reaches 59.58% and then becomes stable.
Therefore, in order to reduce the amount of calculation as much as possible, this paper selects the anchor value when k = 6: (12, 48), (33, 27), (30, 145), (92, 67), (91, 344), (226, 359).
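The anchor selection described above can be sketched as K-means over the (width, height) pairs of the annotated boxes, using 1 − IoU as the distance so that large and small boxes are treated alike. This is a minimal illustration, not the authors' code. The deterministic initialisation, the mean-based centre update, and the six toy boxes are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height) of an annotated box, in pixels


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes aligned at the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: List[Box], k: int, iters: int = 100) -> List[Box]:
    """Cluster (w, h) pairs with distance 1 - IoU and return k anchors."""
    # Assumed initialisation: k boxes spread evenly over the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        groups: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            nearest = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            groups[nearest].append(box)
        updated = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers, key=lambda b: b[0] * b[1])


def avg_iou(boxes: List[Box], anchors: List[Box]) -> float:
    """Mean over all boxes of the IoU with their best-matching anchor."""
    return sum(max(iou_wh(b, a) for a in anchors) for b in boxes) / len(boxes)


# Toy boxes: three small and three large (hypothetical values).
boxes = [(10, 20), (12, 22), (11, 21), (100, 200), (104, 196), (96, 204)]
anchors_k1 = kmeans_anchors(boxes, 1)
anchors_k2 = kmeans_anchors(boxes, 2)
print(anchors_k2, avg_iou(boxes, anchors_k1), avg_iou(boxes, anchors_k2))
```

Running the clustering for k = 1 to 9 and tabulating `avg_iou` for each k gives the kind of curve summarised in Table 1, from which the smallest k with a nearly saturated Avg IoU is chosen.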
3 Experiments
The experimental development environment is as follows. CPU: Intel i7. GPU: NVIDIA GeForce GTX1060. RAM: 16G. Deep learning network framework: Darknet-53. To train the model better and get better detection results, the training parameters of YOLO v3 are optimized. There are two kinds of detection targets in this paper, transmission towers and insulators. Each type of detection target is allotted 4000 iterations, giving 8000 iterations in total. The input resolution is 416 × 416, and multi-scale training is enabled. The learning rate determines the speed of weight updating. If it is set too large, the result will overshoot the optimal value, and if it is too small, the descent will be too slow. A dynamic learning rate is therefore set to get a better model: for 0 < iteration < 6400, lr = 0.001; for 6400 < iteration < 7200, lr = 0.0001; and for 7200 < iteration < 8000, lr = 0.00001. The learning rate thus decreases by a factor of 100 over the whole training process (Fig. 4).
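The piecewise learning-rate schedule above can be written as a small step function. This is a sketch of the schedule only. The function name, the default arguments, and the choice to apply each drop exactly at iterations 6400 and 7200 are assumptions, since the text gives strict inequalities on both sides of each boundary.

```python
def learning_rate(iteration: int,
                  base_lr: float = 1e-3,
                  steps=(6400, 7200),
                  scale: float = 0.1) -> float:
    """Step schedule: multiply the rate by `scale` at every step boundary reached."""
    lr = base_lr
    for boundary in steps:
        if iteration >= boundary:
            lr *= scale
    return lr


# Learning rate at a few points of the 8000-iteration run.
schedule = {it: learning_rate(it) for it in (0, 6399, 6400, 7199, 7200, 7999)}
print(schedule)
```

Keeping the boundaries and the scale as parameters makes it easy to try other schedules without touching the training loop.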
Fig. 4. Loss curve (loss, from 0 to 4.5, plotted against training iteration, from 0 to 8000)
The loss curve above is obtained through training. In early training, lr = 0.001 is set to make the loss decrease rapidly. From the 1600th iteration, the loss decreases more slowly and becomes increasingly stable. From the 6400th iteration, it tends to converge. Finally, the loss value converges to 1.3270, which is an ideal training result for a small sample. The reliability of the model is verified by a large number of picture tests. The experimental results are as follows. The detection accuracy of the optimized YOLO v3 for transmission towers and insulators is quite reliable and meets the requirements of real-time detection. Compared with existing inspection methods, the algorithm in this paper not only ensures detection accuracy but also reduces the amount of computation and improves the detection speed. With the new data set, the model retains high detection performance in low-light conditions, so inspection tasks can still be performed in extreme weather or before and after natural disasters (Fig. 5). Six representative experimental results are selected for analysis. (a), (b), (e) and (f) show the results of single-target and multi-target detection against complex backgrounds.
Fig. 5. Partial test results
Because of the diversity of the data set, the detection model is able to detect transmission towers of which only part of the structure is visible, which also reflects the generalization ability of the model. The improvement of the convolutional neural network greatly improves the ability of the trained model to detect large and medium-sized targets. (c) and (d) show detection in a low-light environment. Single targets and multiple targets against different backgrounds are also detected accurately. Table 2 compares the original YOLO v3 with the improved YOLO v3. The recall and average IoU of the improved YOLO v3 are almost the same as those of the original. At the same time, precision is increased by 4 percentage points, Total BFLOPs is reduced by 2.2, and real-time detection speed is maintained.
Table 2. Comparison of test results.
Algorithm          Recall   Average IoU   Precision   Speed           Total BFLOPs
YOLO v3            0.42     62.06%        81%         33.2 ms/frame   65.297
Improved YOLO v3   0.37     62.66%        85%         33.6 ms/frame   63.097
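The figures in Table 2 can be put on a common footing with a few lines of arithmetic: the precision gain in percentage points, the relative change in Total BFLOPs, and the frame rate implied by the per-frame time. The numbers below are copied from Table 2. The helper names are only illustrative.

```python
def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


def relative_change_percent(old: float, new: float) -> float:
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# Values copied from Table 2.
original = {"precision_pct": 81, "ms_per_frame": 33.2, "bflops": 65.297}
improved = {"precision_pct": 85, "ms_per_frame": 33.6, "bflops": 63.097}

precision_gain_points = improved["precision_pct"] - original["precision_pct"]
bflops_change = relative_change_percent(original["bflops"], improved["bflops"])
fps_original = frames_per_second(original["ms_per_frame"])
fps_improved = frames_per_second(improved["ms_per_frame"])

print(f"precision: +{precision_gain_points} points")
print(f"BFLOPs change: {bflops_change:.2f}%")
print(f"speed: {fps_original:.2f} fps -> {fps_improved:.2f} fps")
```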
4 Conclusion
Based on the improved YOLO v3, an algorithm to detect the key parts of transmission lines is proposed. A new data set of transmission towers and insulators is established, and pictures taken in low-light environments are added. According to the detection requirements of large and medium-sized targets, a simplified network structure of YOLO v3 is proposed to reduce the computation of the model. At the same time, res units are added behind the backbone network to reuse features and make up for the loss of features brought by
structure simplification. Furthermore, K-means clustering is used to obtain more accurate anchor values, which further improves the detection accuracy. The experimental results show that the detection accuracy of the improved YOLO v3 model is 85% and the detection speed is 33.6 ms/frame, and that the model retains a certain detection ability under low-light conditions. It will improve the efficiency of UAV inspection in real scenes and has high application value.
Acknowledgment. This work was supported in part by the National Natural Science Foundations of China (No. 61671412); Zhejiang Provincial Natural Science Foundation of China (No. LY19F010002); the Natural Science Foundation of Ningbo, China (2018A610053); Ningbo Municipal Projects for Leading and Top Talents (NBLJ201801006); and Innovation and consulting project from Ninghai Power Supply Company, State Grid Corporation of Zhejiang, China.
References
1. Zormpas, A.: Power transmission lines inspection using properly equipped unmanned aerial vehicle (UAV). In: IEEE Instrumentation and Measurement Society, pp. 1–5 (2018)
2. Miao, X.: Insulator detection in aerial images for transmission line inspection using single shot multibox detector. IEEE Access 7, 9945–9956 (2019)
3. Yang, Q., Yang, Z., Hu, G., Du, W.: A new fusion chemical reaction optimization algorithm based on random molecules for multi-rotor UAV path planning in transmission line inspection. J. Shanghai Jiaotong Univ. (Science) 23(5), 671–677 (2018). https://doi.org/10.1007/s12204-018-1981-2
4. Bhola, R.: Detection of the power lines in UAV remote sensed images using spectral-spatial methods. J. Environ. Manage. 206, 1233–1242 (2018)
5. Wang, Y.: Detection and Recognition for Fault Insulator Based on Deep Learning, pp. 1–6 (2018)
6. Zhao, Z.: Insulator detection method in inspection image based on improved faster R-CNN. Energies 12, 1204 (2019)
7. Sengupta, A., et al.: Going deeper in spiking neural networks: VGG and residual architectures. Front. Neurosci. (2018)
8. Qiao, Y.: Cattle segmentation and contour extraction based on mask R-CNN for precision livestock farming. Comput. Electron. Agricult. 165, 104958 (2019)
9. Lu, J.: A vehicle detection method for aerial image based on YOLO. J. Comput. Commun. 06, 98–107 (2018)
10. Leng, J., Liu, Y.: An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Comput. Appl. 31(10), 6549–6558 (2018). https://doi.org/10.1007/s00521-018-3486-1
11. Zhao, W.: Research on the deep learning of the small sample data based on transfer learning. In: AIP Conference Proceedings, vol. 1864 (2017)
12. Redmon, J., Farhadi, A.: YOLOv3: An Incremental Improvement (2018)
13. Ju, M.: The application of improved YOLO V3 in multi-scale target detection. Appl. Sci. 9, 3775 (2019)
14. Kim, K.: Performance Enhancement of YOLOv3 by Adding Prediction Layers with Spatial Pyramid Pooling for Vehicle Detection, pp. 1–6 (2018)
Classification of Chinese and Western Painting Images Based on Brushstrokes Feature
Liqin Qiao1,3, Xiaoying Guo2,3(B), and Wenshu Li4
1 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
2 School of Automation and Software Engineering, Shanxi University, Taiyuan 030013, Shanxi, China
[email protected]
3 Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, Shanxi, China
4 Department of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China
Abstract. Painting is an important witness to the development of human civilization. Where Chinese and Western culture and art meet and collide, differences in political, geographical, historical and cultural background lead to great differences in how Chinese and Western art is created. The brushstroke, as an expressive form of painting language, truly and accurately reflects the painter's personality and unique psychological activity. The contribution of this article is to distinguish Chinese from Western paintings by making careful use of the brushstroke characteristics of paintings. In particular, we apply edge detection with the Sobel operator to extract brushstroke characteristics, using a 3 * 3 filter to obtain the edge lines of the image. Considering the continuity of painted brushstrokes, we use morphological operations to remove noise and to correct, connect and filter the detected edge lines, with the aim of extracting the brushstroke features of a painting. On this basis, combined with a deep learning model, we propose a new Chinese and Western painting classification framework, which helps to describe the style of painting works and improves the accuracy of Chinese and Western painting classification. On the Chinese and Western painting database constructed in this article, SVM shows clear advantages over four commonly used classifiers. In addition, this paper compares classification based on brushstroke features with classification without them; the results show that accuracy is nearly 10% better with brushstroke features.
Keywords: Brushstroke · Painting images · Classification · Chinese and western painting · Brushstroke features
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 325–337, 2020. https://doi.org/10.1007/978-3-030-65736-9_30
1 Introduction
Painting is one of the most important forms of cultural heritage in the world. Using advanced machine learning algorithms based on big data to study painting images has become a popular research topic in image processing and computer vision. Globalization is now a general trend, and the frequent exchange between Chinese and Western painting makes the differences between these two kinds of art easier to observe. Different people have different aesthetic preferences and therefore choose to view different paintings. To make different paintings easier to enjoy, using a computer to classify Chinese and Western paintings has become an urgent need.
In computer vision, natural images mainly describe real scenes objectively, so their content varies little from reality. Painting images, by contrast, are manual works shaped by the painter's style and artistic genre; their content deviates from reality and reflects the painter's emotion. In addition, viewers tend to appreciate paintings on the basis of their own knowledge and cannot accurately distinguish artist style and artistic genre. This article therefore uses feature extraction and computer image analysis to learn from painting images, extracts their features, and conducts sentiment analysis and style classification.
Traditional research on painting images mainly analyzes the image itself and pays most attention to the content of the painting, such as shapes and key points. However, the artistic style of an image is a high-level semantic concept. To analyze paintings better, researchers have looked into the techniques used in the painting process and classified paintings by color, texture, overall structure, etc. However, this new perspective raises new problems.
Researchers manually extract complex features based on personal experience, which causes loss of detail and hence poor model generalization, making the classification of painting images challenging. Whatever technique is used in Chinese and Western paintings, the shapes, key points, colors and textures serve to depict people and objects vividly, enrich the picture and enhance its artistic appeal. Different brushstrokes are therefore left in the paintings, and they are a feature that can reflect different styles. A brushstroke is the trace left by the painter in the process of painting, and it is also characteristic of the painter. Different brushstrokes convey different feelings. The brushstroke is considered an important reflection of painting technique and a natural expression of the artist's personality, taste and artistic endowment, so it can be used to distinguish painting styles.
In this paper, we propose a new method of extracting brushstrokes. An edge detection method is used to extract the stroke features. Because a stroke may not be sharp or may be broken, morphological operations are used to remove noise and close the edges. After the detected edges are connected and filtered, a CNN model is used to extract deeper features, which are then input into an SVM classifier to classify Chinese and Western paintings. Compared with classification without brushstroke features, the experimental results show that the accuracy of classification based on brushstroke features increases by nearly 10%.
2 Related Works
Scholars internationally have studied the brushstroke features of paintings. In China, Li et al. [1] proposed a new automatic brushstroke extraction system to analyze Vincent van Gogh's unique brushstroke style with scientific arguments. The system achieves automatic brushstroke extraction by combining edge detection [2], clustering and image segmentation. In this system, brushstroke features are divided into interactive features, which depend on the distribution of neighboring brushstrokes, and independent features, which describe the geometric shape of a brushstroke. A large number of automatically extracted brushstroke features are then compared by statistical analysis. Guo et al. [3] summarized the image features used in image complexity evaluation from the perspectives of information theory, image compression theory, image feature analysis and eye movement data, and discussed the problems of classification and regression in image complexity modeling. Fan et al. [4] studied Wu Guanzhong's works and analyzed the effect of brushstroke thickness on visual complexity, and proposed a method to estimate brushstroke thickness by calculating the color change of neighboring pixels. Sun et al. [5] analyzed the length, curvature and density of traditional Chinese painting brushstrokes. They first extracted the brushstrokes with the method described in [1], and then proposed a histogram synthesis of three brushstroke features: boundary length, flatness and average density. Abroad, Ting [6] discussed a specific Van Gogh painting for which some manual operations are required to complete the brushstroke extraction.
To find the brushstroke features, David [7] and others need to input the details of the painting manually, and other methods are needed for works in completely different styles. On classifying painting images by extracted features, work abroad includes Condorovici et al. [8], who proposed a system for automatically identifying different styles of digitized paintings. It combines the theory and framework of [9] to extract brightness and shape features and uses the Gabor energy method of [10] to extract texture and edge information; seven different classifiers are used to classify more than 3400 paintings in six different styles. Among domestic work on painting image classification, Yang [11] analyzed the artistic styles of Western paintings and extracted features of two different artistic styles based on the ratio of color pairs, white space features and light consistency, which are classified by a support vector machine. Bai et al. [12] analyzed the different forms of expression in Chinese and Western painting and the causes of their formation, summed up two research approaches to painting image aesthetics (experimental aesthetics and computational aesthetics), and summarized the development of computational aesthetics for painting images from the perspective of painting image classification. Zou et al. [13] classified Chinese Dunhuang frescoes by describing the appearance and shape features of the paintings; they passed SIFT features through a four-layer deep learning network and coded them using unsupervised deep learning methods. In a comprehensive analysis using the bag-of-words method, they found an average recognition rate of 84.24% when a support vector machine classifier is trained and tested, a significant advantage over the 76.67% obtained with the method of [14]. Jia et al. [15] summarized the development of
Chinese and Western painting feature extraction techniques and classification methods in terms of the stroke, color, shape, texture and blank-space features of painting images. They also summarized common machine learning methods, such as support vector machines, decision trees, artificial neural networks and deep learning, outlined the advantages of each method, and reviewed feature extraction and classification for the emotional analysis of paintings.
3 Characteristics of Chinese and Western Paintings
Chinese and Western paintings have different characteristics, especially in their brushstrokes, as shown in Fig. 1.
Fig. 1. Paintings with different characteristics of brush strokes
In Chinese painting, the main subjects are flowers and birds, figures, and mountains and rivers. The painting styles mainly fall into two categories: ink painting and murals. Representative painters are Qi Baishi, Xu Beihong, Zheng Banqiao and Wu Guanzhong. Chinese painting usually takes the line as its basic modeling means, uses color as an auxiliary element, and pays little attention to light and shade. Whether the subject is landscape, flowers, birds or figures, painters use different lines to draw the outline and then supplement it with color. They favor implicit, concise and euphemistic styles and emphasize expressiveness and freehand brushwork.
Western paintings mainly rely on relatively direct and intuitive painting techniques; most are portraits and religious paintings, focusing on reproduction and realism. Painting styles include Renaissance, Romanticism, Abstraction and Impressionism, and Western painting is usually represented by classical realism. When creating works, painters move the brush with even speed and force and apply the paint evenly to the canvas, so the overall brushstrokes are delicate, smooth and almost traceless. Traditional Chinese painting is therefore quite different from Western painting, and the corresponding brushstroke features also differ.
4 Brushstrokes Feature Extraction Methods
To extract brushstroke features, we first use a method based on edge detection. After the edge lines around a stroke are identified, morphological operations are used to remove noise and close the edge lines. Figure 2 shows a flowchart for extracting stroke features:
Fig. 2. Flowchart of extracting brushstroke features (stages: images, grayscale image, edge detection with a 3 * 3 filter and the Sobel operator, morphological operations with opening before closing)
4.1 Image Preprocessing
Edge detection is a basic method in image processing and computer vision, and it is not affected by changes in overall light intensity. Most such approaches are based on edges, but the edge lines detected by different operators differ in quality. Comparison experiments, as indicated by the boxes in Fig. 3, showed that edge lines detected by the Prewitt operator could be lost or insufficiently clear. We therefore selected the Sobel operator; Fig. 3 shows its edge detection result for one painting in the database. In this stage, the input image is first converted into a grayscale image. A 3 * 3 filter is then convolved with the image to obtain the gradient image.
Fig. 3. Original image and edge maps produced by the Sobel operator and the Prewitt operator
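The preprocessing described in this section (grayscale conversion followed by a 3 * 3 Sobel filter and a gradient magnitude) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the luma weights used for grayscale conversion, the zeroed border pixels, and the function names `to_gray`, `apply_kernel` and `sobel_magnitude` are all choices made here for the example.

```python
import math

# Standard 3 x 3 Sobel kernels for the horizontal (GX) and vertical (GY) gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to grayscale.

    The 0.299/0.587/0.114 luma weights are an assumption; the paper does not
    say which grayscale conversion was used.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def apply_kernel(gray, kernel, row, col):
    """Weighted sum of the 3 x 3 neighbourhood centred on (row, col).

    This is the correlation form; flipping a Sobel kernel only changes the
    sign of the response, so the gradient magnitude is unaffected.
    """
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += kernel[dr + 1][dc + 1] * gray[row + dr][col + dc]
    return total


def sobel_magnitude(gray):
    """Gradient magnitude sqrt(GX^2 + GY^2) for interior pixels.

    Border pixels are left at 0 (an assumption made for simplicity).
    """
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(gray, SOBEL_X, r, c)
            gy = apply_kernel(gray, SOBEL_Y, r, c)
            out[r][c] = math.sqrt(gx * gx + gy * gy)
    return out
```

On an image with a vertical brightness step, the magnitude is large only in the columns next to the step and zero in flat regions, which is the behaviour the edge maps in Fig. 3 rely on.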
The results show that the 3 * 3 filter not only needs less calculation time but also gives slightly higher accuracy, so 3 * 3 is the best filter size. In the edge detection process, the calculation time of the 3 * 3 filter is about 45.02 s, which is 20.12 s less than that of the 5 * 5 filter and 33.08 s less than that of the 7 * 7 filter. In the final classification, the accuracy with the 3 * 3 filter is 3% higher than with 5 * 5 and 2% higher than with 7 * 7. Edge detection emphasizes image contrast, that is, differences in brightness, and so enhances the boundary features in the image. A target boundary is a step change in brightness level, and an edge is the position of that step change. The horizontal and vertical gradient values of each pixel, $G_X$ and $G_Y$, are combined by the following formula to give the gradient magnitude at that point:
\[ G = \sqrt{G_X^2 + G_Y^2} \tag{1} \]
4.2 Morphological Operations
As shown in Fig. 4, an edge line detected in the image may not correspond to a complete stroke. Because edges are not always sharp and complete, breaks can occur, and morphological operations are then applied to close the edges.
Fig. 4. Broken edge line diagram
Morphological operations on digital images use mathematical morphology as a tool to extract image components that are useful for expressing and describing the shape of regions, such as boundaries, skeletons and convex hulls, and they also cover morphological filtering, thinning and pruning as post-processing. Figure 5 shows the image after edge detection and the result after the morphological operation.
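The "open before close" step named in the flowchart can be illustrated on a binary edge map. The sketch below is only an illustration under assumptions: the paper does not state the structuring element, so a horizontal 1 x 3 line (`LINE_3`) is used here, and out-of-bounds neighbours are simply ignored. Opening (erosion then dilation) removes specks shorter than the element, and closing (dilation then erosion) bridges one-pixel breaks in an edge line.

```python
# Horizontal 1 x 3 structuring element, given as (row, column) offsets.
# This choice is an assumption for illustration only.
LINE_3 = [(0, -1), (0, 0), (0, 1)]


def erode(img, offsets):
    """Keep a pixel only if every in-bounds neighbour under the element is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(all(
                img[r + dr][c + dc] == 1
                for dr, dc in offsets
                if 0 <= r + dr < h and 0 <= c + dc < w
            ))
    return out


def dilate(img, offsets):
    """Set a pixel if any in-bounds neighbour under the element is 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                img[r + dr][c + dc] == 1
                for dr, dc in offsets
                if 0 <= r + dr < h and 0 <= c + dc < w
            ))
    return out


def opening(img, offsets):
    """Erosion followed by dilation: removes small isolated noise."""
    return dilate(erode(img, offsets), offsets)


def closing(img, offsets):
    """Dilation followed by erosion: fills small gaps in an edge line."""
    return erode(dilate(img, offsets), offsets)


def open_then_close(img, offsets=LINE_3):
    """Apply opening before closing, as named in the flowchart."""
    return closing(opening(img, offsets), offsets)
```

Note that the element must be no thicker than the edge lines themselves; a 3 x 3 square element, for example, would erase one-pixel-wide edges during the opening step.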
5 The Proposed Method
After extracting the brushstroke features through the edge detection algorithm and morphological operations, this paper uses the brushstroke features of painting images to study the choice of classifier in the classification model of Chinese and Western paintings, and finally selects an appropriate classifier to classify Chinese and Western paintings. The experimental steps are shown in Fig. 6.
Fig. 5. Left: result of edge detection; right: result after the morphological operation
Fig. 6. Experimental flow chart (stages: Chinese and Western painting datasets; feature extraction with Sobel operator brushstrokes, morphological operations and the CNN model; SVM classifier; classification results and evaluation)
5.1 CNN Model
In this study, after morphological operations have been applied to the edge-detected image, the data is fed into a CNN to extract deep learning features. The architecture of our five-layer ConvNet model is shown in Fig. 8. The input data has size 64 × 64. In layer C1, the first convolutional layer filters the input with six kernels of size 5 × 5, producing six maps of size 60 × 60. In the following layer S1, max pooling with a subsampling ratio of 2 reduces each map to 30 × 30. In layer C2, the second convolutional layer filters the data with 12 kernels of size 5 × 5, producing 12 maps of size 26 × 26. Layer S2 then reduces the 12 maps to 13 × 13. In the last layer, a fully connected layer obtains a 2028-dimensional vector from layer S2 (12 × 13 × 13 = 2028). Finally, a 1014-dimensional feature is obtained in the output layer. The CNN's own classifier is not used in this study; instead, the output of the last CNN layer is passed to the final SVM classifier, which treats it as a new feature vector and performs the classification. Given training data of $m$ instances, we define
\[ \{(x_i, y_i)\},\quad x_i \in \mathbb{R}^N,\; y_i \in \{-1, 1\},\; i = 1, 2, 3, \ldots, m, \]
where $x_i$ is from the $N$-dimensional feature space and $y_i$ indicates the class to which the corresponding $x_i$ belongs.
Fig. 7. Chinese and Western paintings in the datasets: (a) Chinese paintings; (b) Western paintings
Fig. 8. CNN model in the proposed method (C1: 5*5, 6@60×60; S1: 2*2, 6@30×30; C2: 5*5, 12@26×26; S2: 2*2, 12@13×13; FC: 2028; output: 1014; followed by the SVM)
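The feature-map sizes quoted for the CNN follow from simple arithmetic. The sketch below traces them, assuming unpadded convolutions with stride 1 and non-overlapping 2 × 2 pooling; the paper does not state padding or stride, so these are assumptions that happen to reproduce the quoted sizes. The function names are hypothetical.

```python
def conv_output_size(size, kernel):
    """Side length after an unpadded convolution with stride 1 (assumed)."""
    return size - kernel + 1


def pool_output_size(size, ratio):
    """Side length after non-overlapping pooling with the given ratio (assumed)."""
    return size // ratio


def trace_shapes(input_size=64):
    """Return (number of maps, side length) per layer and the flattened length."""
    c1 = conv_output_size(input_size, 5)   # six 5 x 5 kernels
    s1 = pool_output_size(c1, 2)           # subsampling ratio 2
    c2 = conv_output_size(s1, 5)           # twelve 5 x 5 kernels
    s2 = pool_output_size(c2, 2)           # subsampling ratio 2
    return {
        "C1": (6, c1),
        "S1": (6, s1),
        "C2": (12, c2),
        "S2": (12, s2),
        "flattened": 12 * s2 * s2,
    }
```

For a 64 × 64 input this gives 60, 30, 26 and 13 as the side lengths and 12 × 13 × 13 = 2028 values after flattening, matching the sizes in Fig. 8.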
The objective function is:
\[ \min_{\omega,\varepsilon,b}\; \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{m}\varepsilon_i \tag{2} \]
where $C$ is the regularization parameter and $\varepsilon_i$ is a slack variable. The above objective function can be reformulated so that the optimization problem is solved using quadratic programming:
\[ \max\; \sum_{i=1}^{m}\lambda_i - \frac{1}{2}\sum_{i,j=1}^{m}\lambda_i\lambda_j y_i y_j\, k(x_i, x_j) \tag{3} \]
\[ 0 \le \lambda_i \le C,\quad i = 1, 2, 3, \ldots, m,\qquad \sum_{i=1}^{m}\lambda_i y_i = 0 \tag{4} \]
where $k(x_i, x_j) = x_i \cdot x_j$ is the kernel function. In the following, we use support vector machine classification.
5.2 Support Vector Machine Classification
The SVM is used to perform the classification in our work since it has been confirmed to be effective. The basic idea of the SVM is to convert a nonlinearly separable problem into a linearly separable one and to search for an optimal hyperplane, namely the one that maximizes the distance of each class from the hyperplane. The SVM was originally proposed for two-class classification problems. It shows unique advantages in nonlinear and high-dimensional pattern recognition on small-sample data sets. As an effective method, the SVM overcomes the traditional curse of dimensionality and has been successfully applied to text classification, speech recognition, image classification, and so on. For binary image classification, the purpose of the SVM is to learn the decision function
\[ f(x) = \sum_{i=1}^{n} w_i\, k(x, x_i) + b \tag{5} \]
The goal of the SVM is to find the optimal hyperplane by maximizing the width of the margin between the two classes. The hyperplane is
\[ f(x) = w^{T}\phi(x) \tag{6} \]
where $\{(x_i, y_i)\}_{i=1}^{n}$ is the training image set and $y_i \in \{-1, 1\}$ is the label of the image. For a test image with feature description $x$, if $f(x) = 1$ it is classified as a positive sample; otherwise, it is a negative sample. In the experiment, a linear support vector machine was selected for classification, with general-purpose default parameter settings. This study addresses a two-class problem: the SVM determines whether a test image is a Chinese painting or a Western painting.
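To make the roles of the regularization parameter C and the slack variables in Eq. (2) concrete, the following sketch evaluates the soft-margin objective of a linear SVM for a given weight vector and bias. It is an illustration only, not the authors' training procedure: it does not solve the quadratic program, it assumes the usual hinge form of the slack, max(0, 1 − y(w·x + b)), and it classifies by the sign of the decision value. All function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (the linear kernel)."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(w, b, x):
    """Linear decision value w . x + b."""
    return dot(w, x) + b


def predict(w, b, x):
    """Return +1 or -1 from the sign of the decision value (assumed convention)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def slack(w, b, x, y):
    """Hinge slack: how far a sample falls short of a margin of 1 (assumed form)."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def primal_objective(w, b, samples, labels, C):
    """Soft-margin objective 0.5 * ||w||^2 + C * sum of slacks."""
    total_slack = sum(slack(w, b, x, y) for x, y in zip(samples, labels))
    return 0.5 * dot(w, w) + C * total_slack
```

A sample well outside the margin contributes zero slack, so only samples near or on the wrong side of the hyperplane are penalized; a larger C penalizes those samples more heavily relative to the margin term.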
6 Experiment and Result Analysis
6.1 Datasets
To evaluate our algorithm objectively, this article establishes a database of Chinese and Western painting images. The database was built by our team and can also be used by other researchers. It contains 1083 Chinese ink paintings, including figures, scenery, flowers, etc., and 1766 Western religious and landscape paintings from different periods, including Baroque, Rococo and Romanticism. We evaluate the proposed model for classifying Chinese and Western paintings based on stroke characteristics on this database. To verify the superiority of the model, we also selected some similar works from the two types of painting and classified them with our model. In the experiment, the data set is divided into two parts: three quarters are used for training and one quarter for testing. Figure 7(a) shows samples of Chinese paintings and Fig. 7(b) shows samples of Western paintings.
6.2 Classification Results
To verify the importance of stroke features for the classification of Chinese and Western paintings, we conducted two classification experiments: classification based on stroke features and classification without stroke features. In the classification based on stroke features, the pre-processed image is input into the CNN model, which automatically learns good features of the image edges. In the experiment without brushstrokes, there is no image preprocessing, edge detection or morphological operation; we directly input 64 × 64 images into the network model shown in Fig. 8. The experimental results are shown in Table 1. The classification accuracy of the proposed model based on stroke features reaches 89.04%, nearly 10 percentage points higher than the 79.21% of the classification model without brushstrokes.
Table 1. Comparison of classification with and without brushstroke features
Classification    No brushstrokes    The proposed method
Accuracy (%)      79.21              89.04
In addition, this paper also compares several commonly used classifiers (ID3, C4.5, KNN, Naive Bayes). ID3 and C4.5 are decision tree algorithms for classification and prediction proposed by Ross Quinlan [17]. The core of ID3 is to construct the decision tree recursively, applying the information gain criterion to select a feature at each node. The information gain of each attribute is calculated, an attribute with high information gain is considered a good attribute, and the attribute with the highest information gain is selected as the splitting criterion at each partition. The C4.5 algorithm follows the same process as ID3, but the feature selection criterion is changed from information gain to information gain
ratio. The KNN algorithm is one of the simplest methods in data mining classification. Following reference [18], we set the parameter K to 5. The Naive Bayesian classifier originates from classical mathematical theory: the joint distribution P(x, y) of the output y and the feature x is found directly, and the posterior P(y | x) is then obtained from it by Bayes' rule. In comparison, the support vector machine classifier reaches 89.04% and obtains better classification accuracy than the other classifiers. The results are shown in Table 2.
Table 2. Classification results of different classifiers
Classifier      ID3      C4.5     KNN      Naive Bayesian    CNN      SVM
Accuracy (%)    81.14    80.57    82.29    82.35             78.85    89.04
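The two splitting criteria mentioned for the decision tree baselines can be written down compactly. The sketch below computes information gain (the ID3 criterion) and gain ratio (the C4.5 criterion) for a single categorical feature; it is a generic textbook formulation for illustration, not the configuration used in the experiments, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(feature_values, labels):
    """ID3 criterion: label entropy minus the weighted entropy after the split."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(feature_values)
    if split_info == 0:
        # A feature with a single value cannot split the data.
        return 0.0
    return information_gain(feature_values, labels) / split_info
```

With two Chinese and two Western samples, a feature that separates the classes perfectly has an information gain of 1 bit, while a feature that mixes them evenly has a gain of 0, so ID3 would split on the former.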
Different network parameters affect the runtime and accuracy of the classification. As shown in Table 3, training for 50 epochs is nearly twice as fast as training for 100 epochs (41.45 min versus 73.62 min), but the accuracy decreases slightly (85.23% versus 89.04%).
Table 3. Comparison of training time and accuracy under different amounts of data
Batch size    Epochs    Training time (min)    Accuracy (%)
50            50        41.45                  85.23
50            100       73.62                  89.04
6.3 Discussion
To verify the classification performance of brushstroke features, we also ran a comparative test without extracting stroke features, and found that extracting brushstroke features is an important factor in the success of the classification system. In summary, brushstrokes are considered one of the important cues for identifying image styles and allow images to be classified accurately. In research on the brushstroke characteristics of paintings, classifiers are often used to make predictions in different application fields. Each classifier has its own characteristics, and different classifiers suit different situations. To evaluate the performance of each classifier, the algorithm extracts features and then feeds them into the classifier. Table 2 shows that the support vector machine has superior classification performance, so this study uses it as the final classifier.
Comparing training time and accuracy for different data sizes, we found that the data size is determined by the subset of original pixels used. Feeding data of different sizes into the CNN can give competitive accuracy at a much higher training speed than using the original pixels. Choosing an appropriate data size can therefore reduce training time and obtain better classification performance.
7 Conclusion
In this paper, an edge detection method is used to extract the stroke features. Because a stroke may not be sharp or may be broken, morphological operations are used to remove noise and close the edges. After the detected edges are connected and filtered, a CNN model is used to extract deep learning features, which are input into an SVM classifier to classify Chinese and Western paintings. Compared with classification without brushstroke features, the experimental results show that the accuracy of classification based on brushstroke characteristics increases by nearly 10%. We demonstrate that the brushstroke is a significant element in classification.
Acknowledgements. This paper is supported by the National Natural Science Foundation of China (Grant No. 61603228); the Research Project Supported by Shanxi Scholarship Council of China (Grant No. HGKY2019001); the Shanxi Province Science Foundation for Youths (Grant No. 201901D211171); and the Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (Grant No. 2020L0036).
References
1. Li, J., Yao, L., Hendriks, E., Wang, J.Z.: Rhythmic brushstrokes distinguish van Gogh from his contemporaries: findings via automated brushstroke extraction. IEEE Trans. Pattern Anal. Mach. Intell. 34(6), 1159–1176 (2012)
2. Meer, P., Georgescu, B.: Edge detection with embedded confidence. IEEE Trans. Pattern Anal. Mach. Intell. 23(12), 1351–1365 (2001)
3. Guo, X.Y., Li, W.S., Qian, Y.H., Bai, R.Y., Jia, C.H.: Computational evaluation methods of visual complexity perception for images. Acta Electron. Sinica 48(4), 819–826 (2020)
4. Fan, Z.B., Li, Y., Yu, J., Zhang, K.: Visual complexity of Chinese ink paintings. In: ACM Symposium on Applied Perception (2017)
5. Sun, M., Zhang, D., Wang, Z., Ren, J., Jin, J.S.: Monte Carlo convex hull model for classification of traditional Chinese paintings. Neurocomputing 171, 788–797 (2016)
6. Yu, T., Eberly, J.H.: Finite-time disentanglement via spontaneous emission. Phys. Rev. Lett. 93(14), 140404–140404 (2004)
7. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
8. Condorovici, R.G., Florea, C., Vrânceanu, R., Vertan, C.: Perceptually-inspired artistic genre identification system in digitized painting collections. In: Kämäräinen, J.-K., Koskela, M. (eds.) SCIA 2013. LNCS, vol. 7944, pp. 687–696. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38886-6_64
9. Gilchrist, A., Kossyfidis, C.: An anchoring theory of lightness perception. Psychol. Rev. 106(4), 795–834 (1999)
10. Grigorescu, S.E., Petkov, N., Kruizinga, P.: Comparison of texture features based on Gabor filters. IEEE Trans. Image Process. 11(10), 1160–1167 (2002)
11. Yang, B.: Research on painting image classification based on aesthetic style. Zhejiang Univ. 2 (2013)
12. Bai, R.Y., Guo, X.Y., Jia, C.H., Geng, H.J.: Overview of research methods of painting aesthetics. J. Image Graph. 24(11), 1860–1881 (2019)
13. Zou, Q., Cao, Y., Li, Q., Huang, C., Wang, S.: Chronological classification of ancient paintings using appearance and shape features. Pattern Recogn. Lett. 49, 146–154
14. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2014)
15. Jia, C.H., Guo, X.Y., Bai, R.Y.: Review of feature extraction method and research on affective analysis for painting. J. Image Graph. 23(7), 0937–0952 (2018)
16. Vedaldi, A., Fulkerson, B.: Vlfeat: an open and portable library of computer vision algorithms. In: ACM Multimedia (2010)
17. Salzberg, S.L.: C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach. Learn. 16(3), 235–240 (1994)
18. Abeywickrama, T., Cheema, M.A., Taniar, D.: K-nearest neighbors on road networks: a journey in experimentation and in-memory implementation. Proc. VLDB Endow. 9(6), 492–503 (2016)
Role and Value of Character Design of Social Robots
Junichi Osada1,2(B), Keiji Suzuki3, and Hitoshi Matsubara3,4
1 Tohoku University of Art and Design, 3-4-5 Kami-Sakurada, Yamagata City, Japan
[email protected]
2 Future University Hakodate Graduate School, 116-2 Kamedanakano-Cho, Hakodate City, Japan
3 Future University Hakodate, 116-2 Kamedanakano-Cho, Hakodate City, Japan
[email protected], [email protected]
4 The University of Tokyo, 7-3-1 Hongou, Bunkyo-Ku, Tokyo, Japan
Abstract. This paper discusses the meaning and value of the “character” of a robot. Through reflection on and analysis of the robot development process and demonstration experiments with a robot designed by the authors, it was found that the completeness of the design of each element of the robot (such as its shape, color, and sound) is not the priority; rather, maintaining the overall balance of those elements should take precedence. We considered that the goal of designing the robot was to avoid giving the user “discomfort.” Accordingly, to reach that goal, we confirmed the relationship through which the character of the robot is perceived by humans.
Keywords: Social robot · Character design · Communication robot · Information design
1 Introduction
In recent years, the technological progress of artificial intelligence (AI) has been remarkable, and opportunities to interact directly with artificial objects, such as speakers equipped with AI, have been increasing. Such “AI devices” draw on a variety of information and provide us with something we want to know or want to happen. In doing so, they give us an intellectual interaction that differs from what we get from conventional devices. In this paper, an object that provides such an intellectual interaction experience is called an “intelligent symbiotic artificial device” (ISAD). For ISADs, character designs such as “agents” on a screen and social robots are sometimes used to personify the device or give it character. Hereinafter, “character” refers to an anthropomorphic character or persona assigned to an ISAD. In this study, we focus on social robots, a kind of ISAD that is physically present in our living environment.
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 338–350, 2020. https://doi.org/10.1007/978-3-030-65736-9_31
2 Background
We experience interaction with intelligent symbiotic artificial devices in various scenarios of everyday life. That interaction ranges from searching for a convenient train transfer or a restaurant of choice on a smartphone to being provided with preferred conditions at home, such as air-conditioning and lighting settings. The next time you start up your computer, you will see ads that reflect the content you just accessed on your smartphone. It is becoming a new social issue that personal information is acquired through these experiences; this acquisition gives us the feeling that someone is always watching our behavior, which leaves a negative impression and makes us uneasy. Such experiences have created a negative image of AI and increased our anxiety that we will lose our work to AI in the future. In other words, an intelligent symbiotic artificial device must be designed so that it can coexist with humans. An effective way to communicate with an ISAD is to give it a “character.” Some robots that communicate directly with humans wear clothes and speak in a strangely familiar manner. It is likely that the developers of such robots found these features necessary through demonstration experiments in social environments; however, cases in which clear intentions and effects are explained accurately and scientifically are few and far between. Moreover, the design method and the acceptance criteria for the character to be assigned to an ISAD depend on the individual designer. If we focus on the character of ISADs, we find that many AI speakers and other AI devices speak politely in the voice of an adult man or woman. Although these ISADs may be presented as having “no character,” a polite adult male or female voice is arguably itself a character, so even “no character” exists as a character.
It is thus urgent to design ISADs so that they can interact with us in a manner that lets them evolve with technology and become deeply involved in our daily lives. Incorporating “character” can be regarded as one effective method to satisfy that requirement; however, it is necessary to clarify its meaning and effects scientifically and to clarify the design method.
3 Peculiarities in Character Design Concerning Intelligent Symbiotic Artificial Devices
Many examples of methods for designing characters exist, such as designing fantasy worlds for animation and role-playing games (RPGs) as well as regional, company, and product mascots, and we can use that knowledge for robot design. However, ISADs differ from these characters in that they exist in and interact directly with the “real” world. Moreover, even if you have a method for designing a character, you still need to determine what characters are suitable for the ISAD that you want to design. In the world of fantasy, such as anime and RPGs, a “world view” is designed first. Then, characters are created on the basis of their affinity with that world, their significance, and their role. For ISADs, this world view is everyday life itself, and it is necessary to design the interacting users themselves as people in that world. That requires “user experience (UX) design,” namely, designing the experience of the user. UX design represents the value of user experience, and it is one of the most important themes in the design field today
[1]. In other words, when designing a character for an ISAD, it is necessary to have knowledge of both the fantasy field (such as animation and RPGs) and “metadesign” theory—integrated from the perspective of UX design.
4 Research Methods
The ultimate goals of this research are twofold: (i) to clarify the necessity of a “character” in an ISAD and (ii) to demonstrate design methods and present design guidelines. To achieve those goals, this research focuses on the design process. In particular, we investigate what role the character of an ISAD plays in its interaction with humans, and we attempt to clarify how to design the ISAD and to identify what matters in that design process.
In the development of social robots, a sociological approach called “ethnomethodology” has been attracting attention: actual interactions of robots with humans in society are observed, and the observations are fed back into development [2]. In such development, one idea of designers is to solve the problems observed in society. That idea seems to derive from ethnography, and by analyzing those interactions, we can expect to discover design processes and methodologies. For that purpose, it is effective to analyze previous cases; however, this is difficult for the character design of social robots because such cases are very scarce, and the design process and the design itself are personal and not externalized.
In light of these circumstances, in this study we attempt to clarify this “personalized character design” by using a technique of art design. According to Sunaga, art is “a world woven as the integration of perception and conception”; namely, it lies between a world filled with perceived events, called “life,” and the “science” that has been proven as a conceptual structure. Art design includes “expression, creation, and synthesis (construction)” as creation and “explanation, criticism, and analysis” as reflection. By contrasting and coordinating these worlds of perception and conception, it creates a code of practice [3].
In other words, the process of developing a social robot is the “creation” part, and analyzing and considering the developed robot is the “reflection” part. Accordingly, in this study, which targets robots designed by the authors from 1997 to 2018, each design development is taken as “creation,” and the research activities to analyze and consider those designs are regarded as “reflection.” By contrasting and coordinating the two, the meaning and effects of “characters” are clarified. Each of the robots developed by the authors
– can engage in intellectual communication,
– can interact directly with people (i.e., it is designed to interact with people),
– has a strong character, and
– has demonstrated a high degree of completeness at various experimental demonstrations (such as Expo 2005 Aichi).
We therefore considered their specifications to be appropriate for this study.
5 Characteristics of the Robots to Be Studied
Next, the robots targeted in this study are described. The personal robot “PaPeRo,” developed between 2001 and 2008, is shown in Fig. 1.
Fig. 1. Personal Robot PaPeRo (2001, 2003, 2005 and 2008 model)
It is fitted with speech-recognition technology to enable verbal communication, and it has CCD cameras built into both eyes so that it can identify individual people from the images it captures. With these communication functions as its basic functions, PaPeRo has been deployed in the field in places such as kindergartens and nursing-care facilities [4]. The robotic agent called PIVO2, fitted in a concept car of Nissan Motors, is shown in Fig. 2. It supports driving while interacting with the driver through voice dialog [5].
Fig. 2. Robotic Agent PIVO2
A robot developed for research on interaction between humans and robots from perspectives such as psychology and sociology is shown in Fig. 3. Until that development, we had used PaPeRo in humanities research; however, we could not deny that the character and size of PaPeRo influenced the experimental results. We therefore made the height of the new robot variable and designed its character to be neutral. The robot was based on PaPeRo and was named “PaPe-Long” because of its increased height. PaPe-Long was demonstrated in tests carried out at local railways and rehabilitation hospitals, and it received the same high evaluations as PaPeRo [6–8].
6 Characteristics of the Robots to Be Studied The design of the robot (PaPeRo) was categorized in terms of each element, and a design specification was structured for each element. First, the design elements are broadly
Fig. 3. PaPe-Long.
classified into elements that the user comes into direct contact with (e.g., shape, color, voice, and dialogue) and contents such as what is uttered and which functions are executed. Hereinafter, the former elements are called “surface” elements, and the latter are called “behavior” elements. The surface elements are further classified into more specific elements, namely, shape, color, motion, timbre, sound, and dialogue (tone of voice) (see Table 1).
Table 1. Structured design elements
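The classification above can be sketched as a small data structure. This is only an illustrative sketch: the class name, field names, and helper method are hypothetical and do not come from the paper, and the sample values merely paraphrase descriptions of the 2001 model given elsewhere in this paper.

```python
from dataclasses import dataclass, field

# The six surface elements named in the text; the identifiers are our own.
SURFACE_ELEMENTS = ("shape", "color", "motion", "timbre", "sound", "dialogue")


@dataclass
class RobotDesign:
    """Hypothetical container for one robot's structured design elements."""
    name: str
    surface: dict = field(default_factory=dict)    # surface element -> description
    behaviors: list = field(default_factory=list)  # user-meaningful functions

    def missing_surface_elements(self):
        """Return the surface elements that have no description yet."""
        return [e for e in SURFACE_ELEMENTS if e not in self.surface]


# Sample entry paraphrasing the paper's description of the 2001 model.
papero_2001 = RobotDesign(
    name="PaPeRo2001",
    surface={
        "color": "vivid colors",
        "timbre": "voice recorded by a voice actor",
        "sound": "comical, American-comic style",
    },
    behaviors=["quizzes", "jokes", "dancing"],
)
```

Listing the elements that are still undescribed, for example with `papero_2001.missing_surface_elements()`, is one simple way to keep all elements in view at once, which matches the emphasis on overall balance rather than on any single element.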
Behaviors are classified as functions such as utterance contents, “messages,” “reading out information on the Internet,” and “singing songs.” Rather than being handled in units of elemental technologies such as voice recognition and face recognition, the functions are handled in units that are meaningful to the user.
6.1 Shape
The element “shape” refers to the shape and size of the robot. For PaPeRo, instead of imitating something that already exists, we aimed to create something new in a functional form that simply realizes the basic functions. This styling concept was inherited by the robots developed after PaPeRo, and although minor changes have been made to the details of each new robot, the subsequent robots are largely the same. The head and face were not changed. PIVO2 basically inherited the concepts used in modeling the vehicle. For research purposes, PaPe-Long has a function that allows its height to be changed. Since our development goal was “neutral design,” the torso of PaPe-Long is a duct hose, shaped as the simplest line that covers the internal mechanisms while keeping the basic function of the design as is; the rest of PaPe-Long is the same as PaPeRo.
6.2 Color
The “color” element refers to the color of the robot, but in this study, the material and finish (matte, gloss, etc.) are also treated as color. The 2001 model of PaPeRo was colored vividly, while the 2003 model was mainly silver. The model exhibited at the expo (“ExpoPaPeRo,” hereafter) and the 2008 model were based on pearl white and used slightly different, brighter colors than the 2001 model; even so, their overall image is inherited from the 2001 model. PIVO2 inherits the color of the car, and PaPe-Long has a simple white, satin-like cloth on its surface. Painting PaPe-Long in the same way as PaPeRo had to be avoided because gloss and reflected light create the image of an industrial robot.
A matte paint was also a candidate, but cloth was used in consideration of its durability in subsequent experiments.
6.3 Motion
The element “motion” refers to the movement of the robot. For PaPeRo, the neck can move, the LEDs forming the mouth can change, and the body can turn and move. Motion is used mainly for staged effects, such as moving the neck when speaking and lighting the LEDs of the mouth. In the case of the expo version, the LEDs of the mouth were designed to correspond to the contents and application status of the robot so that the person demonstrating PaPeRo at the expo venue could check the robot’s status. In 2008, LEDs were added to the chest so that information could also be presented there. Various previous studies on the motion of social robots have been reported, and their findings are presumably reflected to a significant extent in the present development; however, we would like to report on motion in detail as a separate study on another occasion.
6.4 Tone of Voice
The element “tone of voice” refers to the voice of the robot. It excludes the content of utterances and means purely the sound of the voice. PaPeRo2001 has a voice recorded by a voice actor; the rest have synthesized voices. PaPeRo2003 and the expo model had the same synthesized voice, while the synthesized voice of PaPeRo2008 was more child-like. PIVO2 and PaPe-Long had the same voice as PaPeRo2003 and ExpoPaPeRo.
6.5 Sound
The element “sound” refers to the sounds, other than the voice, emitted by the robot. Examples are feedback sounds for interactions, staged fanfares, and background music (BGM) while the robot is dancing. For PaPeRo2001, the sound was designed in a comical, American-comic style to match the colors. PaPeRo2003 and ExpoPaPeRo extensively used machine-like and synthesizer-like electronic sounds to match the color and the speech-synthesized voice. For ExpoPaPeRo and PaPeRo2008, the sound used for the user interface was mainly synthesizer-based, and common sound effects were used for strongly staged scenes, such as a fanfare when the user answered a quiz correctly. PIVO2 adopts a sound based on that of PaPeRo2003, and PaPe-Long’s sound is designed on the basis of the same concept as PaPeRo2008.
6.6 Dialogue (Tone of Voice)
The element “dialogue” is the tone of the robot’s speech. For PaPeRo2001, the tone was characterized by expressions such as “−da yoo” (“That’s right!”) and “nee-nee” (“Guess/You know what!”). With speech synthesis, it was difficult to produce unique expressions in the manner of recordings by voice actors, and utterances such as “−da yoo” and “nee-nee” sounded extremely unnatural. From PaPeRo2003 onward, which adopted speech synthesis, utterances simply ended in “desu” or “masu” (polite sentence endings in Japanese).
6.7 Behavior
Next, the element “behavior” is described.
PaPeRo has a typical “cute” image: it speaks in a friendly manner with phrases such as “Let’s play!”, gives quizzes, does imitations, makes jokes, and dances. On top of that, it has functions that match the work done at the site of each experimental demonstration. For example, in a kindergarten, it reads a picture book or says “Good morning” alongside the head teacher at the school entrance in the morning. ExpoPaPeRo is basically intended for caring for children at the daycare centers of the future, so it gives intellectual quizzes and sings songs. In addition to providing route guidance and driving navigation, PIVO2 also utters phrases like “Your driving has gotten better recently.” Like PaPeRo, PaPe-Long tells jokes and imitates sounds, and it also handles matters related to the site of the experimental demonstration. For example, in an experimental demonstration at a local railway, PaPe-Long imitated sounds of a train, such as a carriage door closing, a horn blowing, and the motor running.
7 Considerations
Based on what was reflected on through the structuring described above, and focusing on the design process, we consider (i) how the character is designed and (ii) how a person perceives that character.
1) How is the character designed?
According to a study by Ito et al., PaPeRo has an image representing “cute,” and that cuteness stayed almost the same from PaPeRo2001 to PaPeRo2008 [9]. Hereafter, we consider this “cute” to be a state in which the robot can be understood and accepted as a communication target, that is, a state in which the following two conditions are satisfied:
– being perceived as one character
– being recognized as an object of interaction
We considered that a character satisfying these conditions is designed by maintaining a certain balance between all the surface elements and behavior elements; that is, it is designed from the viewpoints of both character-design methods in fields such as RPGs and animation and user-interface design in the field of UX design. A “certain balance” means that “discomfort” does not occur. For example, PaPeRo2001 has a voice recorded by a voice actor, but the other PaPeRos have a synthesized voice. The synthesized voices of PaPeRo2003 and ExpoPaPeRo are the same, but PaPeRo2008 has a synthesized voice that is a little more child-like. When the voice was changed from that of an actor to a synthesized voice for the development of PaPeRo2003, we prototyped a robot in which the recorded voice of PaPeRo2001 was replaced by a synthesized one. The voice synthesized at that time was the general speech of an adult woman (e.g., a voice like that of a TV presenter). The tone of speech was the same as that of PaPeRo2001. As a result, the developers and the users who cooperated in the evaluations judged the robot to be “not cute.” This evaluation meant that the two conditions above were not satisfied and that the voice needed to be improved. In addition, when the recorded voice and the synthesized voice were used together, we got comments that the robot gave the impression of being “transformed” or “possessed” at the moment the voice changed.
Moreover, when the voice actor’s manner of speech (tone) was uttered as it was by the speech synthesizer, it was evaluated as “not cute.” Accordingly, with the tone changed to “desu” and “masu,” the speech-synthesized voice was designed to have a higher pitch than that of an adult woman to make it sound like a child’s voice. When the voice was designed in this way, the bright colors no longer matched it, so the color was changed to silver. The sound effects were also changed from the previous American-comic tone to the sound of a mechanical synthesizer. As a result of these designs, although PaPeRo2003 was also perceived as “cute,” its character clearly differed from that of PaPeRo2001. While PaPeRo2001 had a bright and zany character, PaPeRo2003 (with synthesized speech) gave a serious and clever impression. Focusing on the behavior element, after developing PaPeRo2003, we applied various kinds of humor as the robot’s behavior in the comic genius robot “PaPe-Jiro” [10]. With a serious and clever image, PaPe-Jiro worked with human comedians
(Fig. 4). A variety of “humor” was created by using the characteristics of the PaPeRo concept and its functions in live shows at an actual stand-up comedy venue. For example, by taking advantage of the characteristics of speech synthesis, PaPe-Jiro could utter tongue twisters too fast to be intelligible, so that the humor was updated into something like “robotic humor.” We consider that by making humorous utterances in this manner, PaPe-Jiro achieves a balance between the behavior elements and cuteness.
Fig. 4. Zenjiro and PaPe-Jiro on the M-1 Live show
From the considerations described above, regarding how the character is designed, it is assumed that all the surface and behavior elements must be designed, not only specific ones such as shape and voice, and that all of them are designed by maintaining a certain balance. The absence of any sense of discomfort is considered a major design criterion. This is close to the design viewpoint called “tone and manner” and to the previously mentioned “world view” in the field of games and RPGs.
2) Human perception of the character
Next, human perception of character is considered. In this paper, human perception is discussed by reflecting on the cognitive activity of users and developers during the development of the robots themselves and during social experiments using the robots. We considered that humans try to find meaning and character through active cognitive activities in order to understand social robots as objects of communication. In a psychological study conducted with Kashiwabuchi et al. at an exposition, we carried out experiments to identify factors that strongly influence the image of the robot (Fig. 5) [11]. In addition to the normal PaPeRo, we prepared PaPeRos under multiple experimental conditions that restricted the surface and behavior elements, such as a PaPeRo limited to movement only, a PaPeRo with only utterances removed, and a PaPeRo with the communication function removed, and had participants interact with them and evaluate the image of each. In the experiment, in addition to identifying influences, we hypothesized that as the surface elements, behavior elements, and communication
functions became more abundant, the evaluation would be better. However, the experiments revealed no significant differences between the conditions, and showed that it was not necessarily better to have more surface elements, behavior elements, and communication functions. Under all the experimental conditions, even when PaPeRo’s voice-recognition function was restricted (the cannot-communicate specification) or its movement was restricted (it could chat but not move), PaPeRo was still highly evaluated with comments like “Talking with PaPeRo was fun.” Even when PaPeRo did not respond to a user’s question, many users engaged in cognitive behavior that tried to make sense of the failure to communicate, with comments such as “Oh, I don’t like this sort of chat!” This comment presumably attributes the lack of response to PaPeRo’s own interests and preferences, and it can be considered that users were carrying out cognitive activities aimed at understanding the state of communication.
Fig. 5. Psychological observations
In the development of PIVO2, the challenge, unlike with PaPeRo, was designing for a situation in which the robotic agent has a “state of doing nothing.” A “state of doing nothing” means that the robotic agent has no need to work for the user while the vehicle is running normally. During development, we designed each element for each interaction, so PIVO2 did nothing if the user did nothing. However, the user is always driving and the car is always running. It was observed that in that situation, if the agent did nothing for a certain period of time, the user became anxious and asked, “Is it broken?” Therefore, we gave PIVO2 certain “lifelike” behaviors, such as moving its neck a little, even when the user did nothing for a certain period of time [5]. The user’s reaction can be regarded as an active cognitive activity by which the user tries to understand the state of the robot.
PaPe-Long was designed with the goal of not expressing a specific “character,” in line with its research purpose. The design concept was “neutral design.” Based on this concept, the body was designed first, starting from the torso with its contraction function; however, no matter what we did, the design created a “girl-like” or “penguin-like” image (see Fig. 6). So, ultimately, we reduced the design of the body to the minimum needed for the “contraction/elongation” function. As a result, a sense of discomfort arose between the design of the head (designed for PaPeRo) and the body (with minimized design and functions), so we redesigned the head (Fig. 7). When the head and body designs were balanced, arms
Fig. 6. Designing the body
were added because the absence of arms was conspicuous. As a result of proceeding with the design in this way, the “character” came to be called “ponkotsu” (“retro robot images”), which is the exact opposite of our original goal, namely “neutral.” Ironically, in a later demonstration experiment, PaPe-Long was evaluated as being as “cute” as PaPeRo, so it turned out that we had perfected its character [6].
Fig. 7. Designing the head
Looking back on this design development revealed that the design task was to eliminate the “uncomfortable feeling,” and as a result of eliminating that feeling, the “retro robot images” character was designed. Given this, it might be possible for a developer to hold something unconscious as a criterion for “discomfort.” At first the criterion was “neutral,” but “neutral” is itself vague. We suppose that the contraction/elongation function was designed in a vaguely minimal manner, and the “duct hose” adopted as the robot’s appearance seemed to give a sense of discomfort on top of that minimal design. In addition, as for the movement itself, the center of gravity was unstable, and that imbalance produced a kind of “unpredictable” movement. It is conceivable that as a result of proceeding with the design so as not to feel uncomfortable with this “duct hose” and “unpredictable” movement,
the design ended up with the image of “retro robot images.” At this stage, the developers themselves unconsciously found the character “retro robot images” in the “duct hose” and the “unpredictable” movement. So the criterion for “discomfort” might also be derived from that character. If so, it is conceivable that the developers were also performing cognitive activities to find a character as a guideline for designing the target elements. From the above, it is considered that humans actively engage in cognitive activities directed at robots; that is, they try to understand the state of the robot and the robot itself. It can also be considered that there is a prerequisite for that understanding, namely, the character of the robot. That character is perceived from the “surface” and “behavior” elements.
8 Concluding Remarks
In this study, the role of the character of a social robot (that is, an intelligent symbiotic artificial device) was investigated by using an art-design technique. In particular, we tried to clarify how the robot was designed and what was important in designing it. As a result of this investigation, it is concluded that the character of a social robot is important information for understanding the robot as a conversation partner in interaction with humans. It was also considered that the character “always exists” regardless of the developer’s intentions. This is because people actively and cognitively interact with social robots in an effort to understand them and try to perceive their character. If a robot does not show a character, people will look for information in more detail. Developers therefore probably need to treat “character” as essential in the development of robots. The clues to that character are all the surface and behavior elements mentioned in this paper, and it is important to consider the overall balance between those elements rather than the completeness of each element. In doing so, an important design viewpoint is “discomfort,” which can be handled from the viewpoints of “world view” and the user interface. Two main issues remain with regard to the final purpose of this research. One is to confirm, through further development, whether or not the matters reflected on in this study are true. The other concerns “character”; in particular, we must clarify its relationships with the social environment in which the social robot is placed.
References
1. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
2. Yoshinori, K.: Robotics based on ethnomethodology. J. Robot. Soc. Japan 29(1), 27–30 (2011) (In Japanese)
3. Takeshi, S.: Wisdom of Design: From Designing Interaction to Shaping the Society. Film Art Inc. (2019) (In Japanese)
4. Junichi, O.: A consideration of communication robots from the perspective of interaction design. Tama Art Univ. Bull. 27, 103–113 (2012) (In Japanese)
5. Junichi, O., Ken, M.: Interaction design for PIVO2 robotic agent. In: Proceedings of Human Interface Symposium 2008, pp. 133–136 (2008) (In Japanese)
6. Junichi, O., Ryouhei, T.: A development and consideration for the communication robot design. In: Proceedings of Human Interface Symposium 2018 (2018) (In Japanese)
7. Kazutoshi, T., Junichi, O.: Attempt to design medical industry as a service (4) -Consideration about the role and the social value of a communication robot-. In: Proceedings of the Annual Conference of JSSD (2019) (In Japanese)
8. Junichi, O., Ryouhei, T.: Development of the communication robot for human-robot interaction. In: Proceedings of the Annual Conference of JSSD (2018) (In Japanese)
9. Toshiki, I., Junichi, O.: Relations between degree of liking for personal robot “papero” and characteristics of personality. In: Proceedings of the 53rd Annual Meeting of the Japanese Association of Educational Psychology (2011) (In Japanese)
10. Junichi, O.Z.: Humorous interaction with a robot2. In: 9th International Conference on Entertainment Computing (ICEC2010) (2010)
11. Megumi, K., Junichi, O.: Psychological research of the human-robot communication(2). In: Proceedings of the 70th Annual Convention of the Japanese Psychological Association (2006) (In Japanese)
Edutainment and Art
Clas-Maze: An Edutainment Tool Combining Tangible Programming and Living Knowledge
Qian Xing1,2, Danli Wang1,2(B), Yanyan Zhao1, and Xueyu Wang1
1 The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Abstract. With the development of computer technology, children’s programming technology has become increasingly mature. However, the content and form of current children’s programming tools are too uniform and do not integrate well with daily life skills. In an era in which environmental protection is vigorously advocated, it is particularly important to cultivate children’s environmental awareness and ability from an early age. This paper describes a new edutainment tool named Clas-Maze, for children aged 5–9, which combines tangible programming with living knowledge such as garbage classification. Clas-Maze consists of three parts: programming blocks, which are used to construct a program that controls the garbage’s route; an external camera, which collects the information on the programming blocks; and a virtual environment, which is the execution interface of the system on the computer. We wanted to explore whether the system helps children learn garbage classification and programming, and whether single and collaborative learning differ. We therefore conducted a user experiment with 37 children. The results showed that Clas-Maze can help children learn programming and garbage classification. Single and cooperative programming each have their own advantages. To cultivate children’s decision-making, communication, and cooperation abilities, cooperative programming can be chosen.
Keywords: Edutainment tool · Children’s programming · Computational thinking · Garbage classification · Learning effects
1 Introduction
With the rapid development of computer technology, the training of children’s computational thinking has gradually attracted attention in the education community [1]. At present, the international community believes that children’s computational thinking is as important as “reading, writing, and computing” skills. It is also a cognitive skill that everyone should master. Cultivating this computational thinking mode from a young age can greatly improve their logical thinking ability, cognitive ability, creative ability,
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 353–368, 2020. https://doi.org/10.1007/978-3-030-65736-9_32
Fig. 1. Overview of Clas-Maze system
etc. [2, 3]. From the American “programming one hour a day” campaign, to the addition of programming education to the national curriculum in Britain [4, 5], and then to China’s A New Generation of Artificial Intelligence Development Plan, programming education is gradually being extended [6, 23]. Traditional programming languages are based on text or symbols and involve complex grammar and instructions, which makes them difficult for children to learn [7, 12]. Simpler programming methods have therefore been developed for children, such as the graphical programming language Scratch [8] and the tangible programming language Tear [9, 10]. In recent years, with the rapid development of the economy and society, environmental problems have become increasingly serious, and people’s awareness of environmental protection has been constantly enhanced. New regulations on garbage classification, which are more scientific and meticulous, have been strongly promoted in most urban areas of China. Garbage classification means that people put the same kind of garbage into the same dustbin at a designated place. For children, learning garbage classification not only helps them establish environmental awareness but also helps them improve their ability to learn and classify. The Notice on Promoting the Management of Domestic Garbage Classification in Schools [11], issued by the Chinese Ministry of Education on January 16, 2018, proposed widely adopting storytelling, games, knowledge competitions, and other activities to carry out a variety of education activities themed on garbage classification. It also called for making full use of wall maps, blackboard newspapers, publicity windows, campus websites, and other publicity channels to vigorously promote garbage classification. This paper presents a new edutainment tool, Clas-Maze, which combines tangible programming with knowledge of garbage classification (Fig. 1).
The system is designed for children 5–9 years old, so that they can not only cultivate computational thinking, but also improve their ability of garbage classification when they learn to program. This paper
explored three questions through the experiment: 1) Can children learn the knowledge of garbage classification through Clas-Maze, and is there a difference in the learning effect between the single and double groups? 2) Can children learn programming knowledge through Clas-Maze, and is there a difference in the learning effect between the single and double groups? 3) Is there a difference in learning performance between the single and double groups? Our study showed that both abilities improved significantly after the experiment, but there was no significant difference between the two groups. The following sections present the related work, the system description and the user study, followed by the result analysis, limitations and future work.
2 Related Work

2.1 Children’s Programming Education

In recent years, children’s programming education and programming tools have been studied in depth. Research has shown that using interactive programming tools in computer teaching can increase students’ enthusiasm and participation [28]. Numerous human-computer interaction laboratories and research institutions have conducted research on tangible programming, including Northwestern University, Massachusetts Institute of Technology (MIT), University of Colorado, Carnegie Mellon University (CMU), and Google. There are several notable tangible programming tools. Storytelling Alice [13] is a storytelling tool based on the Alice 2.0 programming tool. roBlock [14] lets users build a robot by connecting different blocks, and the robot performs specific actions through logical calculation. TurTan [15] is a programming tool based on computer vision, which uses a camera to capture and identify the location of fingers and objects on the desktop. T-Maze [16] is a tangible programming tool in which children play the role of a character escaping a maze and choose the way forward by placing programming blocks. TanProStory [22] is a programming tool for storytelling, in which children tell a story by arranging programming blocks to control a character. Tern [25], Strawbies [26], CoProStory [29] and Loopo [12] combine a tangible language with a virtual display interface: users connect wooden blocks or electronic slices to form the program control flow and get programming feedback on a screen, which can teach children computer programming knowledge [24]. The above tangible programming tools all use physical interaction technology, but they are presented as games and rarely incorporate practical life skills. This paper proposes a programming system that combines garbage classification knowledge with programming knowledge.
It can not only develop children’s computational thinking and promote their cognitive development, but also help them improve their garbage classification ability.

2.2 Cultivate Children’s Garbage Classification

The cultivation of children’s living ability is an important aspect of all-round development education for children. Today, with the continuous promotion of quality education,
it is very important to improve children’s living ability [17]. Garbage classification is one of the most frequently used living skills. Many countries, such as Sweden, Belgium and Japan, cultivate children’s garbage classification ability from childhood. The most typical is Japan, where kindergartens treat the cultivation of garbage classification behavior as an important part of education. They cultivate children’s garbage classification ability through daily life, on-site visits, related cartoons, games and other means [18]. With the increasing efforts of environmental protection in our country, people are also paying more and more attention to cultivating children’s ability and awareness of garbage classification. We believe it would be very beneficial to carry out garbage classification themed education in kindergartens.

2.3 Summary of Related Work

The analysis of the related work above shows that children’s tangible programming tools and related technologies are relatively mature, and that young children can learn programming through physical operations. However, previous programming tools teach children only programming knowledge and do not incorporate living knowledge. Therefore, we implemented Clas-Maze, which aims to improve children’s abilities in multiple respects.
3 Clas-Maze Description

Clas-Maze contains three parts: tangible programming blocks, an external camera and the virtual environment of garbage classification. The programming blocks are 3 cm wooden cubes, as shown in Fig. 2. Four faces of each block are used to express four distinct semantics, which reduces the number of programming blocks and thus lowers the cost of the system. During programming, images of the tangible programming blocks are captured by the external camera and converted into the programming language through the reacTIVision visual recognition library.
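The conversion from recognized blocks to a program can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the recognition step has already produced a list of detected markers, each with a numeric ID and a position in the camera image, and it uses a hypothetical ID-to-instruction table (`MARKER_TO_COMMAND`) and a hypothetical left-to-right reading order.

```python
from typing import NamedTuple


class Marker(NamedTuple):
    """One detected block face: marker ID plus its position in the image."""
    marker_id: int
    x: float  # horizontal position, 0.0 (left) to 1.0 (right)
    y: float  # vertical position, 0.0 (top) to 1.0 (bottom)


# Hypothetical mapping from marker IDs to instructions (not taken from the paper).
MARKER_TO_COMMAND = {
    0: "START",
    1: "UP",
    2: "DOWN",
    3: "LEFT",
    4: "RIGHT",
    5: "END",
}


def markers_to_program(markers):
    """Order the detected markers left to right and translate them to commands.

    Markers whose ID is not in the mapping are skipped.
    """
    ordered = sorted(markers, key=lambda m: m.x)
    return [
        MARKER_TO_COMMAND[m.marker_id]
        for m in ordered
        if m.marker_id in MARKER_TO_COMMAND
    ]


# Hypothetical detections, deliberately out of order.
detected = [
    Marker(4, 0.55, 0.50),
    Marker(0, 0.10, 0.52),
    Marker(1, 0.32, 0.49),
    Marker(99, 0.70, 0.50),  # unknown marker, ignored
    Marker(5, 0.90, 0.51),
]
program = markers_to_program(detected)
print(program)
```

Sorting by the horizontal coordinate is only one plausible way to recover the block order; a real system might also need to handle rotated or multi-row layouts.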
Fig. 2. Tangible programming blocks
The virtual environment of garbage classification includes garbage, a maze and dustbins, as shown in Fig. 3. There are four kinds of dustbins, namely kitchen waste, recyclable, harmful waste and other waste, as shown in Fig. 4. While playing the game, children first need to judge the type of a piece of garbage and then help it get back “home” by placing programming blocks. To help children better understand the function of the programming blocks, Clas-Maze also provides feedback on programming: if a step is programmed correctly, a forward arrow appears; otherwise, no arrow appears and the route stays at the previous step.
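The feedback rule described above (advance and show an arrow on a correct step, otherwise stay in place) can be sketched as a small grid simulation. The maze layout, the command names and the function names below are all hypothetical and serve only to illustrate the rule; they are not taken from the Clas-Maze implementation.

```python
# Hypothetical maze: "#" is a wall, "." is a walkable cell, "B" is the target dustbin.
MAZE = [
    "#####",
    "#..B#",
    "#.###",
    "#####",
]

# Row and column offsets for each movement command.
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}


def step(position, command):
    """Apply one command and return (new_position, arrow_shown)."""
    d_row, d_col = MOVES[command]
    row, col = position[0] + d_row, position[1] + d_col
    if MAZE[row][col] != "#":
        return (row, col), True  # correct step: advance and show a forward arrow
    return position, False       # incorrect step: no arrow, stay at the previous step


def run(start, commands):
    """Run a whole program and collect the per-step feedback."""
    position = start
    feedback = []
    for command in commands:
        position, arrow = step(position, command)
        feedback.append(arrow)
    reached_bin = MAZE[position[0]][position[1]] == "B"
    return position, feedback, reached_bin


final_position, feedback, reached_bin = run((2, 1), ["UP", "LEFT", "RIGHT", "RIGHT"])
print(final_position, feedback, reached_bin)
```

Because the hypothetical maze is fully enclosed by walls, the index lookup never leaves the grid; an open maze would need an explicit bounds check.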
Fig. 3. The virtual environment of garbage classification
Fig. 4. Four kinds of dustbins
4 User Study

To evaluate children’s learning effect and performance, we conducted a lab-based user experiment with children.

4.1 Goals of Study

In this experiment, by analyzing the children’s performance in the experimental videos and questionnaires, we mainly explored the following three questions:

1. Can children learn the knowledge of garbage classification through Clas-Maze, and is there a difference in the learning effect between the single and double groups?
2. Can children learn programming knowledge through Clas-Maze, and is there a difference in the learning effect between the single and double groups?
3. Is there a difference in learning performance between the single and double groups?

4.2 Participants and Environment

The experiment was conducted on August 6th and 8th, 2019 at Beijing Huacai Education Center. The whole experiment was conducted in a spacious and quiet classroom in Huacai
Education Center, and the temperature and humidity were at a comfortable level. In addition to the necessary tables and chairs, the room contained a video recorder, two notebooks, a camera and a display. A total of 37 healthy children participated in the experiment: 19 boys and 18 girls, with a mean age of 6.38 years (SD = 0.64). The participants were divided into Group G1, in which each child completed the experiment independently, and Group G2, in which two children completed the experiment through cooperative programming. The specific information of all participants is shown in Table 1. To protect the participants’ privacy, the researchers assigned each child a test code before the experiment.

Table 1. The specific information of all participants

Group | Age | Gender | Tangible programming | Garbage classification
G1 (single) | Range = 6–7, Mean = 6.19, SD = 0.24 | 7 girls, 6 boys | 2 had learned, 11 had not | 5 had learned, 8 had not
G2 (double) | Range = 5–8, Mean = 6.48, SD = 0.76 | 12 girls, 12 boys | 2 had learned, 22 had not | 16 had learned, 8 had not
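As a quick consistency check, the overall mean age can be recovered from the two group means in Table 1 by weighting each mean with its group size. The group sizes below are the row totals of the Gender column (7 + 6 and 12 + 12); the variable names are illustrative only.

```python
# Group sizes and mean ages as listed in Table 1.
groups = {
    "G1": {"n": 7 + 6, "mean_age": 6.19},
    "G2": {"n": 12 + 12, "mean_age": 6.48},
}

total_n = sum(g["n"] for g in groups.values())

# Weighted (pooled) mean: each group mean counts in proportion to its size.
overall_mean = sum(g["n"] * g["mean_age"] for g in groups.values()) / total_n

print(total_n, round(overall_mean, 2))
```

The weighted mean rounds to the overall mean age stated in the text.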
4.3 Experimental Design

Tasks Design. Participants in this experiment needed to complete two tasks. During the experiment, the researchers could not provide any help unless the children asked for it or there was a problem with the experimental equipment.

• Task 1: By placing programming blocks, move the first type of garbage, the battery, to the position of the “Harmful Trash can” (simple).
• Task 2: By placing programming blocks, move the second type of garbage, cigarette butts, to the position of the “Other Trash can” (more complicated).

Process. The process of the whole experiment was as follows:

• First, we taught the children knowledge about garbage classification and programming, including the meaning of programming, sequence, garbage classification and classification standards.
• Then the researchers introduced the Clas-Maze system to the children, including each part of the virtual environment, the name of each target garbage item, the type of each dustbin, the rules for programming and designing a route, and the use of the programming blocks. After getting familiar with the system, the children completed a pre-test questionnaire, without any tips or correct answers from the researchers.
• Next was the practice stage. The researchers led the children through a simple exercise task (putting an egg shell in the kitchen waste dustbin). The children then completed a more complex exercise task independently (putting plastic bottles in the recyclable dustbin), during which they could ask the researchers any questions about the system.
• Finally, they needed to complete the two formal tasks described in Tasks Design. During this period, the researchers used the experimental record table to record the participants’ behavior, while a video recorder captured their programming performance and facial expressions. After the experiment, the children completed a post-test questionnaire.

4.4 Data Acquisition

Questionnaire. The questionnaires had two main purposes: 1) to compare children’s ability to deal with programming problems and garbage classification problems in the pre-test and post-test; 2) to investigate children’s subjective feelings about the system. They were therefore divided into a pre-test questionnaire and a post-test questionnaire, which collected children’s personal information and their experience of learning garbage classification and using tangible programming, tested children’s abilities, and gathered children’s perception of using the system. Considering children’s cognitive habits, this paper used the improved Smileyometer rating scale [19] to collect subjective data.

Record Table and Video Recorder. To record children’s performance more carefully and comprehensively, we recorded the children’s data in the experiment using the record table and a video recorder. The researchers used the record table to note each child’s specific conditions, such as requests for help, irritability and interruptions. The video recorder was used to record the operations, behavior and facial expressions of the children during programming.

Table 2. Group G1 behavior coding scheme

No. | Key | Code | Type | Description
1 | P | Programming time | State event | From putting down the first programming block to completing both tasks
2 | T | Thinking time | State event | No discussion and no programming for more than 5 s
3 | H | Happiness | Point event | Cheerful behaviors including applauding, laughing, jumping and so on
4 | C | Confusion | Point event | Being frustrated or overwhelmed, including scratching the head, frowning, sighing and pouting
5 | A | Asking for help | Point event | Children asking researchers for help
6 | E | Trial-and-error of garbage classification | Point event | Trying different kinds of classification
7 | W | Trial-and-error of programming | Point event | Trying a variety of programming blocks

Table 3. Group G2 behavior coding scheme

No. | Key | Code | Type | Description
1 | P | Programming time | State event | From putting down the first programming block to completing both tasks
2 | T | Thinking time, left child | State event | No discussion and no programming for more than 5 s
3 | S | Thinking time, right child | State event | No discussion and no programming for more than 5 s
4 | H | Happiness, left child | Point event | Cheerful behaviors including applauding, laughing, jumping and so on
5 | I | Happiness, right child | Point event | Cheerful behaviors including applauding, laughing, jumping and so on
6 | C | Confusion, left child | Point event | Being frustrated or overwhelmed, including scratching the head, frowning, sighing and pouting
7 | D | Confusion, right child | Point event | Being frustrated or overwhelmed, including scratching the head, frowning, sighing and pouting
8 | A | Asking for help, left child | Point event | Children asking researchers for help
9 | B | Asking for help, right child | Point event | Children asking researchers for help
10 | E | Trial-and-error of garbage classification, left child | Point event | Trying different kinds of classification
11 | F | Trial-and-error of garbage classification, right child | Point event | Trying different kinds of classification
12 | W | Trial-and-error of programming, left child | Point event | Trying a variety of programming blocks
13 | Y | Trial-and-error of programming, right child | Point event | Trying a variety of programming blocks
14 | R | Relevant discussion time | State event | Discussing something about the task
15 | Z | Irrelevant discussion time | State event | Communicating about anything other than the task
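The distinction between the two event types in the coding schemes matters for quantification: point events are only counted, whereas state events also contribute a duration. The sketch below illustrates this with the Group G1 keys. The record format (key, start time, stop time) and the sample annotations are hypothetical and do not reflect the export format of any particular annotation tool.

```python
from collections import defaultdict

# Keys from the Group G1 coding scheme, split by event type.
STATE_EVENTS = {"P", "T"}                  # have a start and a stop time
POINT_EVENTS = {"H", "C", "A", "E", "W"}   # occur at a single time point


def summarize(events):
    """Count every event and sum the duration of state events.

    events: list of (key, start_seconds, stop_seconds), where stop_seconds
    is None for point events.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for key, start, stop in events:
        if key not in STATE_EVENTS and key not in POINT_EVENTS:
            raise ValueError(f"unknown behavior key: {key}")
        counts[key] += 1
        if key in STATE_EVENTS:
            durations[key] += stop - start
    return dict(counts), dict(durations)


# Hypothetical annotations for one child (times in seconds).
annotations = [
    ("P", 0.0, 250.0),
    ("T", 10.0, 18.5),
    ("H", 40.0, None),
    ("T", 60.0, 66.0),
    ("W", 70.0, None),
    ("H", 240.0, None),
]
counts, durations = summarize(annotations)
print(counts, durations)
```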
4.5 Video Data Coding and Quantization

Firstly, we extracted three groups’ video data. In these three videos, the behaviors of the subjects were classified, coded and labeled with BORIS, a software tool for video annotation [20].

The first step was behavioral classification. To ensure the accuracy and reliability of the classification, two researchers watched the video data and recorded the children’s behavior categories in the experiment. After watching the videos several times and discussing the classification results, the kappa coefficient [21] between the two researchers was 0.78. The main behavior categories were relevant discussion, irrelevant discussion, asking for help, programming, trial-and-error of programming, trial-and-error of garbage classification, thinking, confusion, happiness, etc.

The second step was behavior coding. According to the behavior classification scheme, the behavior coding scheme was designed in BORIS, including the behavior definition (Description), the target objects, the shortcut keys (Key) and the annotation event type (Type). Event types were divided into point events and state events. Because the number of children and the behavior categories differed between the two groups, the videos of Group G1 and Group G2 were coded separately. The behavior coding scheme of Group G1 is shown in Table 2. Building on the G1 scheme, iterative coding was carried out to produce the Group G2 coding scheme shown in Table 3.

Finally, according to the behavior coding schemes, we annotated the videos of Group G1 and Group G2 separately using BORIS. We counted the frequency and duration of each kind of behavior and completed the quantitative analysis of all children’s video data in the experiment. In the process, we found that four children in Group G1 had partially defective video data. We discarded their video data, so there were 11 children in Group G1 and 22 children in Group G2.
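For readers unfamiliar with the kappa coefficient mentioned above, the following sketch computes Cohen's kappa for two raters who each assign one category label per observed behavior. The labels are hypothetical sample data, not the study's annotations, and the function name is illustrative.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten video segments.
rater_a = ["thinking", "happiness", "confusion", "thinking", "help",
           "happiness", "thinking", "confusion", "happiness", "thinking"]
rater_b = ["thinking", "happiness", "confusion", "thinking", "help",
           "happiness", "confusion", "confusion", "happiness", "happiness"]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

The sketch assumes the raters do not both use a single identical label for every item, in which case the chance agreement is 1 and kappa is undefined.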
5 Result Analysis

In this section, we analyze the results of the questionnaires, record table and video data, and answer the three questions raised in the section Goals of Study.

• Question 1: Can children learn the knowledge of garbage classification through Clas-Maze, and is there a difference in the learning effect between the single (G1) and double (G2) groups?

The pre-test and post-test questionnaires each contain 5 questions on garbage classification ability; the test results are shown in Fig. 5. In Group G1, the per capita number of correct answers on garbage classification was 3.94 (SD = 0.67) in the pre-test and 4.62 (SD = 0.26) in the post-test. In Group G2, it was 3.92 (SD = 0.86) in the pre-test and 4.58 (SD = 0.60) in the post-test. Comparing the two groups, the improvement in garbage classification ability of Group G1 is 0.02 higher than that of Group G2. We used one-way ANOVA to analyze the data between the two groups, with the grouping as the factor and the number of correct answers as the dependent variable. The results showed a significance level of P = 0.76 between Group G1 and Group G2 in the pre-test, which is
greater than 0.05, so there is no significant difference in garbage classification ability between the two groups in the pre-test. The significance level between the two groups in the post-test is P = 0.91, which is also greater than 0.05, so there is no significant difference between the two groups in the post-test either.
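The between-group comparison above relies on one-way ANOVA. As a rough illustration of what that analysis computes, the sketch below derives the F statistic and its degrees of freedom from scratch. The scores are hypothetical, since the per-child data are not given in the paper, and the final step of converting F to a P value (a lookup in the F distribution, as a statistics package would do) is omitted.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F statistic, df_between, df_within) for the given groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical numbers of correct answers (out of 5) for two small groups.
scores_g1 = [4.0, 5.0, 5.0, 4.0, 5.0]
scores_g2 = [5.0, 4.0, 5.0, 5.0, 4.0, 4.0]

f_stat, df_between, df_within = one_way_anova(scores_g1, scores_g2)
print(round(f_stat, 4), df_between, df_within)
```

A small F value, as in this made-up example, means the group means differ little relative to the spread within each group, which corresponds to a large P value.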
Fig. 5. Comparison of garbage classification ability in pre-test and post-test
The post-test questionnaire includes a subjective question: “Does this game help you to learn garbage classification?” 2 children thought that learning garbage classification would be the same with or without this system, 13 children thought that the system was a bit helpful, and 22 children thought it was very helpful. For the pre-test and post-test data of all participants, we used SPSS to perform a non-parametric test on related samples. The result, P = 3.815E-6, shows that the children’s garbage classification ability differed very significantly between the pre-test and post-test. Combining the garbage classification data with the subjective question, we conclude that children can learn garbage classification knowledge through this programming system. While completing the tasks, children in Group G1 received more help and guidance from the researchers, so the learning effect of Group G1 was slightly better.

• Question 2: Can children learn programming knowledge through Clas-Maze, and is there a difference in the learning effect between the single and double groups?

The pre-test and post-test questionnaires each contain 2 questions on programming ability; the test results are shown in Fig. 6. In Group G1, the per capita number of correct programming answers was 0.62 (SD = 0.42) in the pre-test and 1.39 (SD = 0.59) in the post-test. In Group G2, it was 0.67 (SD = 0.41) in the pre-test and 1.33 (SD = 0.32) in the post-test. The improvement of
Group G1 is 0.10 higher than that of Group G2. We also used one-way ANOVA to analyze the data between the two groups. The significance level between Group G1 and Group G2, P = 0.82, is greater than 0.05; that is, there is no significant difference in programming ability between the two groups in either the pre-test or the post-test. For the pre-test and post-test data of all participants, we performed a non-parametric test. The result, P = 3.815E-6, is far less than 0.05; that is, the children’s programming ability differed very significantly between the pre-test and post-test.
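The paper does not state which non-parametric test for related samples was run in SPSS. As a stand-in, the sketch below uses an exact two-sided sign test, which is an assumption made purely for illustration, to show how paired pre-test and post-test scores can be compared without distributional assumptions. The scores are hypothetical.

```python
from math import comb


def sign_test_p_value(pre, post):
    """Two-sided exact sign test for paired scores; tied pairs are dropped."""
    diffs = [after - before for before, after in zip(pre, post) if after != before]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Probability of k or fewer signs in the rarer direction under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical programming scores (0 to 2 correct answers) for eight children.
pre_scores = [0, 1, 0, 1, 1, 0, 0, 1]
post_scores = [1, 2, 1, 2, 1, 1, 2, 2]

p_value = sign_test_p_value(pre_scores, post_scores)
print(p_value)
```

In this made-up sample, seven of the eight children improve and one is tied, so the test is based on seven non-tied pairs that all point in the same direction.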
Fig. 6. Comparison of programming ability in pre-test and post-test
The post-test questionnaire includes a subjective question: “Does this game help you to learn programming?” 2 children thought that the system was not helpful, 13 children thought that it was a bit helpful, and 22 children thought that it was very helpful. Combining the programming ability data with the subjective question, we found that children can learn programming knowledge through this programming system. Over a short period, children in Group G1 received more instruction, so their learning effect was slightly better.

• Question 3: Is there a difference in learning performance between the single and double groups?

Asking for Help. According to the statistical analysis of the record table data, the per capita number of requests for help was 1.36 in Group G1 and 0.52 in Group G2. Children in Group G1 thus asked the researchers for help clearly more often than those in Group G2.

Trial-and-Error of Programming and Garbage Classification. “Programming trial-and-error” means that when a child does not know which programming block
to put, he or she tries a variety of programming blocks. “Garbage classification trial-and-error” means that the child tries different kinds of classification when he or she does not know what kind of garbage a certain item belongs to. The trial-and-error statistics are shown in Fig. 7. In Group G1, the per capita number of programming trial-and-errors is 0.46, and the per capita number of garbage classification trial-and-errors is 0.36. In Group G2, the per capita number of programming trial-and-errors is 1.35 (child No. 6 alone had 11 trial-and-errors), and the per capita number of garbage classification trial-and-errors is 0.36. Group G2 thus has a higher number of trial-and-errors in programming and garbage classification. The reason is that a few children in Group G2 had a high number of trial-and-errors, which skews the overall average.
Fig. 7. Statistics of the trial-and-error
Programming Time. Programming time refers to the time from when a child puts down the first programming block to the completion of both tasks, so it consists of thinking time, relevant discussion time, irrelevant discussion time and pure programming time. The thinking state is a state in which the child neither discusses nor programs for more than 5 s (we made sure that no child zoned out during the experiment). Relevant discussion refers to discussion among teammates about routes, garbage classification, the use of programming blocks and other things related to the task. Irrelevant discussion refers to any other communication between teammates. From the quantified video data, we obtained the programming time distribution shown in Fig. 8. In Group G1, on average, the total programming time is 287.94 s, of which the thinking time is 173.53 s and the pure programming time is 114.41 s. In Group G2, on average, the total programming time is 324.77 s, of which the thinking time is 105.68 s, the pure programming time is 172.12 s, the relevant discussion time is 38.45 s and the irrelevant discussion time is 8.52 s. In Group G2, three groups had programming times far higher than the average. We made a detailed
analysis of the three groups’ video data and found that the participants had several disputes with their teammates during the programming process, such as scrambling for programming blocks and holding different opinions about programming.
Fig. 8. Distribution of programming time
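The reported average times can be checked against the stated decomposition of programming time into its components. The sketch below sums the components for each group and also derives each component's share of the total; the dictionary keys are illustrative names, while the values are the averages reported in the text.

```python
# Average time components in seconds, as reported for each group.
g1_components = {"thinking": 173.53, "pure_programming": 114.41}
g2_components = {
    "thinking": 105.68,
    "pure_programming": 172.12,
    "relevant_discussion": 38.45,
    "irrelevant_discussion": 8.52,
}


def total_and_shares(components):
    """Return the total time and each component's share of it in percent."""
    total = sum(components.values())
    shares = {name: round(100 * value / total, 1) for name, value in components.items()}
    return round(total, 2), shares


g1_total, g1_shares = total_and_shares(g1_components)
g2_total, g2_shares = total_and_shares(g2_components)
print(g1_total, g1_shares)
print(g2_total, g2_shares)
```

Both sums match the total programming times given in the text for Group G1 and Group G2.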
Furthermore, we used correlation analysis to examine the relationship between total programming time and the other data. The results showed that the correlation between relevant discussion time and total programming time is highly significant (ρ = 0.63, P = 0.002), and the correlation between irrelevant discussion time and total programming time is significant (ρ = 0.523, P = 0.012 < 0.05). Overall, the discussion among teammates led the programming time of G2 to be longer than that of G1.

Children’s Happiness. The state of happiness is described as a set of emotionally cheerful behaviors including applauding, laughing, jumping and so on. From the quantified video data, the per capita number of happiness events is 2.09 (SD = 0.89) in Group G1 and 3.86 (SD = 2.53) in Group G2. The results showed that most children, regardless of group, were in a happy state during the experiment. However, the range of happiness frequency is wider in G2, and its per capita number is higher than that of Group G1. We used SPSS to perform a one-way ANOVA on the two groups’ data and found a significant difference (P = 0.036). The children in Group G2 were happier in cooperative learning.

Children’s Confusion. The state of confusion is described as children being frustrated or overwhelmed during programming, with specific actions including but not limited to scratching the head, frowning, sighing and pouting. The per capita number of confusion events is 3.18 (SD = 2.96) in Group G1 and 0.36 (SD = 2.17) in Group G2. The number of confusion events is thus lower in Group G2, which suggests that children in Group G2 could solve problems independently and in a timely manner, rather than relying on the researchers.
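The correlation between discussion time and total programming time reported above uses the symbol ρ, which suggests a rank correlation, although the paper only says “Correlation Analysis”. Assuming Spearman's rank correlation, a minimal sketch looks as follows; the timing data are hypothetical, and the helper names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-pair times in seconds.
discussion_time = [10, 25, 40, 55, 70]
total_time = [200, 260, 240, 330, 400]

rho = spearman_rho(discussion_time, total_time)
print(round(rho, 3))
```

Working on ranks rather than raw values makes the measure robust to a few pairs with unusually long times.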
Children’s Subjective Preference. To investigate children’s inclination toward cooperation in programming, the second question of the post-test questionnaire was “Do you prefer to program alone or cooperatively with a teammate?” Only 24.32% of the children thought programming alone was better, while 75.68% thought programming with a teammate was better. Children who preferred to program alone were generally older. They said that the game was simple and that cooperating with others was unnecessary. Children who wanted a teammate said that it would be more fun to have a partner to communicate with. We further analyzed the personal data of the children who preferred to program alone and found that they had had disagreements or disputes with their teammates during the cooperation and had given their teammates low scores; that is, the bad cooperative experience led them to want to program alone.

To sum up, children in Group G2 showed a greater ability to solve problems through cooperation. They needed less help from the researchers and were happier during programming learning, but on average they took more time than the children in Group G1. We can therefore judge learning performance from two perspectives. If the criterion is that a child gains more knowledge in a shorter time, the performance of Group G1 is better. If the criterion is that a child not only learns knowledge but also improves decision-making and communication skills in a pleasant learning state, the performance of Group G2 is better.
6 Limitation and Discussion

This experimental design has some limitations. First, during the experiment, the experimental equipment or children’s arms sometimes blocked the camera, so that the camera could not recognize the blocks. Although we could adjust the equipment quickly, this distracted the children and reduced their commitment to learning to some degree. Second, variable control was not strict. Before the test, children in G1 were given one-to-one training, whereas children in G2 were trained one-to-two, which may have led children in G2 to learn the material less thoroughly. In addition, asking a researcher lets children get the right answer more quickly than team discussion, which allowed G1 to finish the tasks in a shorter time. Finally, 35 children were involved in our study. Such a small dataset has inherent limitations, so our results should be regarded as hypotheses and inferences rather than definite conclusions.
7 Conclusion and Future Work

This paper presented a new edutainment tool, Clas-Maze, which combines tangible programming and knowledge of garbage classification. Through a user study, we evaluated how this programming system helps children learn programming and garbage classification knowledge and discussed the differences between Group G1 and Group G2. All the children’s programming and garbage classification abilities improved significantly after the experiment, but there was no significant difference in improvement between the two groups. This paper evaluated the children’s learning performance along six dimensions: trial and error, asking for help, programming time, happiness,
confusion, and children’s subjective preference. The results showed that Group G1 and Group G2 each have their own advantages. If the purpose is to let children obtain more knowledge in a short time, the system is more suitable for single use. If the purpose is not only to let children learn knowledge, but also to improve their decision-making and communication abilities in a pleasant learning environment, the system is more suitable for cooperative use. Based on user feedback and the analysis of the experimental data, we will improve the programming system and the experimental design in the future. First, we will control irrelevant variables more strictly, for example through collective training at the knowledge popularization stage. Second, since Clas-Maze is an edutainment tool, we will integrate more living knowledge into it. Third, to better demonstrate the usefulness of the system, we will add a control group that uses other learning methods, for example teaching by a teacher or watching videos.

Acknowledgments. This research is supported by the National Natural Science Foundation of China under Grant Nos. 61872363 and 61672507, the National Key Research and Development Program under Grant No. 2016YFB0401202, and the Research and Development Fund of Institute of Automation, CAS under Grant No. Y9J2FZ0801.
References

1. Wing, J.M.: Computational thinking. Commun. ACM 49(3), 33–35 (2006)
2. Council, N.: Report of a Workshop on the Scope and Nature of Computational Thinking. National Academies Press (2010)
3. Xiaozhou, D.: Design and implementation of tangible programming system based on distributed cognition. University of Chinese Academy of Sciences, Institute of Automation (2019)
4. Alvarado, C.: CS Ed Week 2013: the hour of code. ACM SIGCSE Bulletin 46(1), 2–4 (2014)
5. Hongyan, W., Yuhe, T.: UK: programming education into the national curriculum. Shanghai Educ. 2016(2), 20–23 (2016)
6. New Generation Artificial Intelligence Development Planning. Guofa No. 35. China’s State Council (2017)
7. Tingting, W., Danli, W., Lu, L., et al.: Children-oriented graphical programming language and tools. J. Comput.-Aided Design Comput. Graph. 4, 154–161 (2013)
8. Mitchel, R., John, M., Monroy-Hernández, A., et al.: Scratch: programming for all. Commun. ACM 52(11), 60–67 (2009)
9. Horn, M., Jacob, R.J.K.: Tangible programming in the classroom with Tern. In: Proceedings of SIGCHI 2007. ACM Press (2007)
10. Revelle, G., Zuckerman, O., Druin, A., Bolas, M.: Tangible user interfaces for children. In: Proceedings of CHI EA 2005, pp. 2051–2052. ACM Press (2005)
11. Notice on promoting the management of domestic garbage classification in schools. General Office of Chinese Ministry of Education (2018)
12. Kelleher, C., Pausch, R.: Lowering the barriers to programming: a taxonomy of programming environments and languages for novice programmers. ACM Comput. Surv. 37(2), 83–137 (2005)
13. Kelleher, C., Pausch, R.: Using storytelling to motivate programming. Commun. ACM 50(7), 58–64 (2007)
14. Wing, J.M.: Computational thinking and thinking about computing. Phil. Trans. Math. Phys. Eng. Sci. 366(1881), 3717–3725 (2008)
15. Gallardo, D., Julià, C.F., Jordà, S.: TurTan: a tangible programming language for creative exploration. In: Proceedings Third Annual IEEE International Workshop on Horizontal Human-Computer Systems, pp. 412–420. IEEE Press (2008)
16. Wang, D.L., Zhang, C., Wang, H.A.: T-Maze: a tangible programming tool for children. In: Proceedings IDC 2011, pp. 127–135. ACM Press (2011)
17. Xiuqin, Z.: How to cultivate children’s self-care ability. Exam. Weekly 87, 193–193 (2016)
18. Miao, Z.: Educational analysis on the development of garbage sorting in kindergartens in Japan. Educ. Guide (Second Half) 10, 87–90 (2015)
19. Qiao, J.: Research and implementation of children’s physical programming system based on augmented reality. University of Chinese Academy of Sciences, Institute of Automation (2019)
20. Friard, O., Gamba, M.: BORIS: Behavioral Observation Research Interactive Software. Italy (2015)
21. Viera, A.J., Garrett, J.M.: Understanding interobserver agreement: the kappa statistic. Fam. Med. 37(5), 360–363 (2005)
22. Qi, Y., Wang, D., Zhang, L., et al.: TanProStory: a tangible programming system for children’s storytelling. In: ACM Conference Extended Abstracts on Human Factors in Computing Systems. ACM (2015)
23. Lihui, S., Danhua, Z.: Current situation and action path of international children programming education research. Open Educ. Res. 25(02), 25–37 (2019)
24. Junnan, Y., Ricarose, R.A.: Survey of computational kits for young children. In: Proceedings of the 17th ACM Conference on Interaction Design and Children (IDC 2018), pp. 289–299. Association for Computing Machinery, New York, USA (2018). https://doi.org/10.1145/3202185.3202738
25. Fails, J.A., Druin, A., Guha, M.L., et al.: Child’s play: a comparison of desktop and tangible interactive environments. In: Proceedings of the 2005 Conference on Interaction Design and Children, pp. 48–55. ACM (2005)
26. Felix, H., Ariel, Z., Horn, M., Judd, F.: Strawbies: explorations in tangible programming. In: Proceedings of the 14th International Conference on Interaction Design and Children (IDC 2015), pp. 410–413. Association for Computing Machinery, New York, USA (2015)
27. Manches, A., O’Malley, C.: Tangibles for learning: a representational analysis of physical manipulation. Pers. Ubiquitous Comput. 16(4), 405–419 (2012)
28. Md, M.R., Monir, S., Roshan, P.: Impact of infusing interactive and collaborative learning in teaching introductory programming in a dynamic class. In: Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE 2020), p. 1315. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3328778.3372608
29. Deng, X., Wang, D., Jin, Q.: CoProStory: a tangible programming tool for children’s collaboration. In: International Conference on Computer Supported Collaborative Learning (CSCL 2019) (2019)
To Binge or not to Binge: Viewers’ Moods and Behaviors During the Consumption of Subscribed Video Streaming
Diogo Cabral1(B), Deborah Castro2, Jacob M. Rigby3, Harry Vasanth4, Mónica S. Cameirão5, Sergi Bermúdez i Badia5, and Valentina Nisi1
1 ITI/LARSyS, IST, University of Lisbon, Lisbon, Portugal
{diogo.n.cabral,valentina.nisi}@tecnico.ulisboa.pt
2 ITI/LARSyS, Erasmus University Rotterdam, Rotterdam, The Netherlands
[email protected]
3 School of Geographical Sciences, University of Bristol, Clifton, Bristol, UK
[email protected]
4 ARDITI, Funchal, Madeira, Portugal
[email protected]
5 NOVA LINCS, University of Madeira & Madeira Interactive Technologies Institute, Funchal, Madeira, Portugal
{monica.cameirao,sergi.bermudez}@m-iti.org
Abstract. The popularity of internet-distributed TV entertainment services, such as Netflix, has transformed TV consumption behavior. The level of control viewers now have over their TV experiences, along with the release of complete seasons at once, are among the factors that stimulate the so-called binge-watching phenomenon (the consumption of several episodes of a program in a single sitting). Most binge-watching studies have focused on viewers’ habits and health effects. This paper presents a study of viewers’ behaviors and moods. It was carried out with 13 young participants at home, watching online content, while physiological, inertial, and self-reported data were collected. We identify and compare binge-watching with non-binge-watching behaviors. Our results suggest that while viewers turn to online serial entertainment in pursuit of leisure-related needs, such as relaxation, relief from boredom, and escapism, the act of binge-watching tends to leave them rather unsatisfied, with no change in Arousal. Nevertheless, after binge-watching the Positive Affect increases while the Negative Affect decreases. Moreover, watching only a single episode tends to result in increased Arousal but not necessarily in increased satisfaction. These preliminary findings can be the starting point for future investigations that unpack the motives and nuances behind this outcome.
Keywords: Binge watching · Television viewing behavior · Online TV · Video-on-demand · Video streaming
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 369–381, 2020. https://doi.org/10.1007/978-3-030-65736-9_33
1 Introduction
Popular press and media organizations have recently popularized the term “binge-watching”, which refers to a viewing behavior enabled by the rise of entertainment in the form of internet TV and TV on demand. Binge-watching viewing modalities are not new and can be traced back to the 1980s and 1990s, when the recording and storage capabilities of VCRs and DVDs made it possible for viewers to consume hours of sequential episodes. Later, internet-distributed TV services facilitated this specific mode of viewing. Despite the different studies on the topic [4, 8, 13], binge-watching still lacks a standardized definition. In recent years, both industry and academia have explored the phenomenon of binge-watching, but from different angles. While industry looks at the number of spectators binging, the frequency of this practice, and the device chosen [5], academics look at its motivations and effects on health [15, 16, 19].
In this work, we performed an in-the-wild study with 13 participants, in which recruited users watched content on Netflix. We collected data on users’ interface actions, self-reported data, and physiological and inertial data, in a non-intrusive manner. Despite the lack of consensus on the definition and connotation of the term, for the purpose of this research we use the term binge-watching and define it as watching two or more complete episodes of the same program immediately after each other, or with a maximum pause of 15 min, to ensure that the flow is not disrupted. This work builds upon previous research [2] by analyzing and comparing the data of binge-watching behaviors with non-binge-watching behaviors. We identified three types of online TV entertainment watching behavior: Binge-watching (as defined above), watching just a single episode, and watching multiple episodes that do not belong to the same series.
Results from the analysis indicate that while binge-watchers mainly look for leisure through relaxation, relief from boredom, and escapism, they reported feeling rather unsatisfied after viewing several episodes, with no change in their excitement levels. At the same time, participants who exhibited a Binge-watching behavior reported an increase in positive emotions and a decrease in negative emotions after the sessions, with Sci-Fi being the genre with the highest positive impact and Comedy the one that most reduces anxiety. Single Episode viewers ended their sessions more excited than before they started, though not necessarily more satisfied. While these results already invite nuanced discussion of the binge phenomenon, further data are needed to clarify the complex reality of the entertainment experience. Our preliminary results point to several avenues for further study and unpacking of the experience.
2 Related Work
Scholars have used traditional psychological assessment scales to quantify audience emotions while watching TV or film content. Some have used PANAS [23], a scale for measuring the positive and negative dimensions of affective states, to study whether watching emotionally arousing films increases pain thresholds and group bonding [7], to study the release of hormones by TV soccer spectators [21], and to analyze the relationship between serialized TV fiction watching and binge-watching [9]. Other studies have used SAM [1], a pictorial scale that assesses arousal, valence, and dominance, in advertising studies [14] and to analyze affective reactions to movies [3].
In in-the-wild studies targeting TV or streaming consumption, the methodologies adopted vary, ranging from online questionnaires after viewing sessions [8] to gathering data from hacked TV boxes [11] or cameras placed in households [18]. However, these approaches face technology and privacy limitations, which are even more challenging when they are used to collect audiences’ emotional data [20, 22]. Moreover, audiences’ emotional changes can be related to physiological reactions [10, 12, 20]. Therefore, combining self-reported data with physiological data can shed new light on the moods, and hence the motivations, of binge-watchers, while innovating in the data collection methods. To enrich existing binge-watching data collection methods, we performed an in-the-wild study focused on Netflix content consumption, collecting data on users’ interface actions, self-reported mood data, and physiological and inertial data, in a non-intrusive manner.
3 In the Wild Study
In this paper, the authors describe a multifaceted exploratory study and analysis of a Netflix-based entertainment experience. Data were collected from 13 users before, during, and after online TV exposure. We collected physiological (heart rate) and inertial (wrist movement) data through a smartwatch, and online actions on the Netflix interface, logged by a custom-made browser extension. Finally, users filled in pre- and post-viewing questionnaires, self-reporting on arousal, valence, and positive and negative affect.
A browser extension (specific to Chrome) was developed to register participants’ interactions with the Netflix interface (e.g., pausing, skipping content) and to synchronize these actions with the smartwatch data. The extension was installed on each participant’s laptop, which was synchronized with the same clock server as the smartwatch (time.google.com). In addition, the browser extension automatically presented the pre- and post-viewing online questionnaires to participants. All the data collected (action logs and questionnaire answers) were automatically stored in a Google Sheet.
The questionnaires included questions before and after a watching session regarding participants’ motivations and mood. Two scales were used: 1) the Self-Assessment Manikin (SAM) [1], a pictorial scale that assesses participants’ arousal and valence (Fig. 1), and 2) the Positive Affect and Negative Affect Schedule (PANAS) [23], a scale that consists of a number of words describing different feelings and emotions (affects).
A custom-made app (Fig. 2) was developed for the Android smartwatch (Moto 360) that participants wore during the study. The app allowed the logging of the required data, which is otherwise restricted on commercial devices that use proprietary software. The app acquires readings from the smartwatch sensors, namely heart rate (HR) and inertial data (accelerometer and gyroscope), once per second (1 Hz). Initial validation tests showed that at 1 Hz the heart rate readings collected through the smartwatch were accurate enough and followed ECG standards. The smartwatch connects via Bluetooth to a smartphone (Moto G4), and all data are stored in a Dropbox account.
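Because the laptop and the smartwatch share a clock server, each logged interface action can be matched to the nearest 1 Hz smartwatch sample by timestamp. The following is a minimal sketch of such an alignment, not the authors' implementation; the record layouts and the names `align_actions` and `max_gap_s` are hypothetical.

```python
from bisect import bisect_left

def align_actions(actions, samples, max_gap_s=1.0):
    """Attach the nearest smartwatch sample to each interface action.

    actions: list of (timestamp_s, action_name), in any order
    samples: list of (timestamp_s, heart_rate_bpm), sorted by timestamp
    max_gap_s: hypothetical tolerance; actions farther than this from
               any sample get None instead of a heart rate.
    """
    times = [t for t, _ in samples]
    aligned = []
    for t_action, name in sorted(actions):
        i = bisect_left(times, t_action)
        # Candidates: the sample just before and just after the action.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            aligned.append((t_action, name, None))
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t_action))
        hr = samples[j][1] if abs(times[j] - t_action) <= max_gap_s else None
        aligned.append((t_action, name, hr))
    return aligned

# Made-up data: four 1 Hz heart rate samples and three logged actions.
samples = [(0.0, 70), (1.0, 72), (2.0, 71), (3.0, 75)]
actions = [(2.4, "pause"), (0.2, "play"), (10.0, "skip_intro")]
result = align_actions(actions, samples)
```

The tolerance keeps actions logged during a gap in the smartwatch data from being paired with a stale reading.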
Fig. 1. Self-assessment manikin (SAM): top - valence; and bottom - arousal.
Fig. 2. Smartwatch app (left) and phone app (right)
The questionnaires also asked participants to classify content with self-reported values of arousal and valence, to help interpret their physiological variations while watching. Asking for this classification recurrently (e.g., every minute) would have been quite demanding for the participants and would have broken their watching flow. Thus, we decided to sample eight short (6-s) video clips, at equal time intervals, from each episode of pre-selected content. Participants were asked to rate the clips after the end of the viewing session. To rate a clip, participants were asked to recall how they felt when they were watching its content and to rate it according to the SAM scale (Fig. 1). Due to the large number of shows available on Netflix, it was not realistic to perform such sampling for all content available on the streaming platform. Therefore, we decided to sample four of the most popular titles from a preliminary survey on online consumption, covering the TV genres of Comedy, Science Fiction (Sci-Fi), and Drama. The chosen TV series were: Master of None, seasons 1 and 2; Sense8, seasons 1 and 2; 13 Reasons Why, season 1; and Stranger Things, season 1.
3.1 Participants and Procedure
We recruited 13 participants (five female and eight male), all from the Millennial Generation (born 1981–1997) as natural consumers of online media content, with an average age of M = 26.85 (SD = 4.41). Of the 13 participants, six were students, five were workers, and two were both workers and students. Based on an online pre-study that allowed us to understand their viewing habits, we characterized all of them as potential binge-watchers. Before the beginning of the study, participants were informed about the study and instructed on how to use the devices and technologies we would be providing (i.e., smartwatch and browser extension). After agreeing to participate, they signed a consent form and installed the browser extension on their laptops. The participants were informed that during the study they could watch any content they wished, however they wanted, but we invited them to watch the four pre-selected series. Participants signed up for a free one-month trial with Netflix. The study took place at participants’ homes for ten days (six workdays and two weekends). At the end of the field study, the participants received a 20-euro gift card as a reward.
3.2 Study Results
We collected a total of 97 watching sessions, where a “session” is any period in which a participant watched Netflix content for more than 2 min without a break longer than 15 min. Of these 97 sessions, we discarded 24 sessions that were missing part of the questionnaire answers, 4 sessions with no single episode watched until the end, and 3 sessions in which participants were not watching a TV series. We considered an episode “watched until the end” when a participant watched its content up to at least the point 10 s before the beginning of the final credits of that episode. The tolerance of 10 s before the start of the credits aimed to accommodate different reaction times (stopping or moving to the next episode) without losing the episode narrative. The remaining 66 sessions were considered valid and were analyzed, categorizing them into three main clusters: A) Binge-watching sessions (60.61%); B) Single Episode sessions (19.70%); and C) Multiple Episodes sessions (19.70%).
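The session rule (more than 2 min of viewing, no break longer than 15 min), the 10-s end-of-episode tolerance, and the A/B/C labels that this section defines can be sketched as follows. This is an illustrative reading rather than the authors' code: the record keys and function names are hypothetical, and the handling of edge cases the text does not cover (for example, a second unfinished episode of the same series) is an assumption.

```python
from collections import Counter

MIN_WATCH_S = 2 * 60       # a session needs more than 2 min of viewing
MAX_BREAK_S = 15 * 60      # a break longer than 15 min ends a session
END_TOLERANCE_S = 10       # tolerance before the start of the final credits

def watched_until_end(last_position_s, credits_start_s):
    """True if playback reached the point 10 s before the final credits."""
    return last_position_s >= credits_start_s - END_TOLERANCE_S

def split_sessions(views):
    """Group viewing intervals into sessions.

    views: list of dicts with hypothetical keys
           start_s, end_s (wall-clock seconds), series, finished (bool)
    """
    sessions, current = [], []
    for view in sorted(views, key=lambda v: v["start_s"]):
        if current and view["start_s"] - current[-1]["end_s"] > MAX_BREAK_S:
            sessions.append(current)
            current = []
        current.append(view)
    if current:
        sessions.append(current)
    # Keep only sessions with more than 2 min of actual viewing.
    return [s for s in sessions
            if sum(v["end_s"] - v["start_s"] for v in s) > MIN_WATCH_S]

def classify(session):
    """Label a session as binge / single / multiple, or None if discarded."""
    finished = Counter(v["series"] for v in session if v["finished"])
    if not finished:
        return None                      # no episode watched until the end
    if max(finished.values()) >= 2:
        return "binge"                   # 2+ finished episodes, same series
    # Assumption: any other item viewed for more than 2 min counts as a
    # further episode; shorter glimpses are ignored.
    substantial = [v for v in session
                   if v["finished"] or v["end_s"] - v["start_s"] > MIN_WATCH_S]
    return "single" if len(substantial) == 1 else "multiple"

# Made-up viewing log covering three candidate sessions.
views = [
    {"start_s": 0,     "end_s": 3000,  "series": "A", "finished": True},
    {"start_s": 3060,  "end_s": 6000,  "series": "A", "finished": True},
    {"start_s": 10000, "end_s": 13000, "series": "B", "finished": True},
    {"start_s": 13100, "end_s": 13160, "series": "C", "finished": False},
    {"start_s": 20000, "end_s": 20060, "series": "D", "finished": False},
]
labels = [classify(s) for s in split_sessions(views)]
```

In this example the last interval lasts only one minute, so it never forms a session; the other two are labeled as a Binge-watching and a Single Episode session.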
These categories identify three different types of viewers’ behavior, defined as:
• A) Binge-watching Behavior: watching at least 2 episodes of the same TV series until the end (40 sessions; duration M = 02:10:40, SD = 01:09:33; number of episodes M = 3.43, SD = 1.69)
• B) Single Episode Behavior: watching just 1 episode of any TV series until the end, with any other episodes watched for less than 2 min (13 sessions; duration M = 00:54:59, SD = 00:25:23)
• C) Multiple Episodes Behavior: watching multiple episodes, but from different TV series. This mode entails that at least 1 episode was watched until the end and other episodes (not of the same series) were watched for more than 2 min or until the end (13 sessions; duration M = 01:28:10, SD = 00:44:34; number of episodes M = 3.00, SD = 1.29)
Motivations
Before starting any online entertainment session on Netflix, participants were asked about their watching motivations; the question allowed multiple answers. “Relaxation” and “Boredom relief” emerged as the main motivations, followed by “Escapism” (Table 1).
Table 1. Online entertainment watching motivation
Motivation | Binge-watching behavior (N = 40) | Single episode behavior (N = 13) | Multiple episodes behavior (N = 13)
Relaxation | 23 | 7 | 10
Boredom relief | 18 | 7 | 6
Escape | 15 | 6 | 3
Learning | 3 | 2 | –
Hedonism | 1 | – | –
Companionship | 1 | – | –
Social interaction | – | 1 | –
None of these | – | 1 | –
Mood Assessment
Because of the ordinal nature of the data (SAM and PANAS), the Wilcoxon Signed Rank (WSR) test was used to test for differences. Regarding SAM (Valence and Arousal), the WSR tests showed significant differences for Valence Before (Mdn = 3) and After (Mdn = 1) Binge-watching, and for Arousal Before (Mdn = 2) and After (Mdn = 3) watching a single episode (Table 2).
Table 2. SAM: median values before and after, for all three categories of behaviors
SAM | Binge-watching behavior (N = 40) | Single episode behavior (N = 13) | Multiple episodes behavior (N = 13)
Valence before | 3** | 3 | 3
Valence after | 1** | 2 | 3
Arousal before | 2 | 2* | 3
Arousal after | 2 | 3* | 3
*p < .05 (Z = −2.232); **p < .0000001 (Z = −5.387)
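The Z values reported with Table 2 come from Wilcoxon Signed Rank tests on paired before/after ratings. As a rough illustration (not the authors' analysis script, and with made-up ratings), the statistic can be computed with the normal approximation as below. Zero differences are dropped and tied absolute differences share an average rank; the function name and the choice of no continuity correction are assumptions.

```python
import math

def wilcoxon_signed_rank(before, after):
    """Return (w_plus, w_minus, z) for paired ordinal ratings.

    Uses the normal approximation with a tie correction and no
    continuity correction; statistical packages may differ slightly.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0
    # Rank absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var) if var > 0 else 0.0
    return w_plus, w_minus, z

# Hypothetical valence ratings for eight sessions (illustrative only).
before = [3, 3, 4, 2, 3, 5, 3, 4]
after = [1, 2, 2, 2, 1, 3, 4, 1]
w_plus, w_minus, z = wilcoxon_signed_rank(before, after)
```

Because Z is computed from the smaller of the two rank sums, it is negative whenever the ranks are unbalanced, which matches the sign convention of the values reported in the tables.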
Analyzing the SAM data by TV genre (Comedy, Sci-Fi, Drama, and Action), WSR tests showed a significant Valence decrease for Binge-watching Comedy (N = 10; Mdn Before = 3; Mdn After = 1; Z = −2.877; p < 0.005), Drama (N = 13; Mdn Before = 3; Mdn After = 1; Z = −2.994; p < 0.005), and Sci-Fi (N = 10; Mdn Before = 3.5; Mdn After = 2; Z = −2.539; p < 0.05). For Single Episode sessions, the WSR test showed a significant increase in Arousal when watching Drama (N = 5; Mdn Before = 2; Mdn After = 4; Z = −2.060; p < 0.05), whereas Comedy and Sci-Fi did not show any significant differences. No significant differences in Valence or Arousal were observed for Multiple Episodes sessions by genre (Comedy, Sci-Fi, and Action).
Regarding the analysis of the PANAS data (Table 3), WSR tests showed significant differences for Positive Affect (PA) and Negative Affect (NA) for the Binge-watching sessions: the PA increases, while the NA decreases. No significant differences were observed for Single Episode and Multiple Episodes sessions. Analyzing PANAS by genre for Binge-watching sessions, WSR tests showed a significant increase in PA for Sci-Fi (N = 10; Mdn Before = 18; Mdn After = 24; Z = −2.807; p = 0.005) and a significant decrease in NA for Comedy (N = 10; Mdn Before = 13; Mdn After = 11; Z = −2.506; p < 0.05). Moreover, within the PANAS scale, the item “guilty” is particularly relevant in the context of Binge-watching, due to its relevance to addiction and addictive behaviors [15]. The level of guilt reported on PANAS did not change Before (Mdn = 1) and After (Mdn = 1) Binge-watching.
Table 3. PANAS: median values for Positive and Negative Affects
PANAS (10–50) | Binge-watching behavior (N = 40) | Single episode behavior (N = 13) | Multiple episodes behavior (N = 13)
Positive before | 18* | 22 | 18
Positive after | 20.5* | 20 | 16
Negative before | 12* | 13 | 12
Negative after | 11.5* | 12 | 14
*p < 0.05 (PA Z = −2.028; NA Z = −2.573)
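Table 3 reports PA and NA on a 10–50 range because each PANAS subscale sums ten items rated from 1 to 5. A minimal scoring sketch follows; the item split is the standard one from the PANAS scales [23], while the function name and the dictionary input format are illustrative assumptions.

```python
PA_ITEMS = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NA_ITEMS = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

def score_panas(ratings):
    """Return (positive_affect, negative_affect), each in the range 10-50.

    ratings: dict mapping each of the 20 PANAS words to an integer 1..5.
    """
    for item in PA_ITEMS + NA_ITEMS:
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"rating for {item!r} must be between 1 and 5")
    pa = sum(ratings[item] for item in PA_ITEMS)
    na = sum(ratings[item] for item in NA_ITEMS)
    return pa, na

# Made-up answers: mostly the lowest rating, with a few higher items.
answers = {item: 1 for item in PA_ITEMS + NA_ITEMS}
answers.update({"interested": 4, "excited": 3, "attentive": 3, "nervous": 2})
pa_score, na_score = score_panas(answers)
```

The single “guilty” item discussed in the text is one of the ten NA items, so it can also be read directly from the same answers.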
Interface Actions
We also compared the different interface actions: the number of pauses per hour, the number of times a new position was set in the timeline (skipping content or going back to re-watch it) per hour, the number of times the full-screen button was pressed per item watched, and the number of times the intro was skipped per item watched (Table 4). Because the data were not normally distributed, as assessed by the Shapiro-Wilk test, we used the Wilcoxon Signed Rank test to compare the different actions. The only significant difference found was in skipping the intro per item, where Binge-watching shows a higher tendency toward this action, as shown in Table 4.
Table 4. Interface actions
Action | Binge-watching (N = 40) M (SD) | Mdn | Single episode (N = 13) M (SD) | Mdn | Multiple episodes (N = 13) M (SD) | Mdn
Pause/hour | 0.65 (0.97) | 0.10 | 1.48 (1.30) | 1.51 | 0.69 (1.08) | 0.00
New position in the timeline/hour | 1.74 (6.66) | 0.00 | 0.08 (0.30) | 0.00 | 1.00 (3.14) | 0.00
Full screen/item | 0.82 (0.90) | 0.50 | 1.85 (2.12) | 1.00 | 1.31 (1.84) | 1.00
Skip intro/item | 0.16 (0.30)* | 0.00 | 0.08 (0.28)* | 0.00 | 0.54 (0.88) | 0.00
* p < .05 (Z = −2.132)
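The measures in Table 4 are rates: pauses and timeline jumps are normalized by session length in hours, while full-screen presses and intro skips are normalized by the number of items watched. A small sketch of that normalization, assuming a hypothetical log format with one action name per logged event:

```python
from collections import Counter

PER_HOUR = ("pause", "seek")              # normalized by session hours
PER_ITEM = ("fullscreen", "skip_intro")   # normalized by items watched

def action_rates(events, duration_s, items_watched):
    """Turn raw action counts into per-hour and per-item rates.

    events: list of action names logged during one session (hypothetical)
    duration_s: session length in seconds
    items_watched: number of episodes or other items played in the session
    """
    counts = Counter(events)
    hours = duration_s / 3600
    rates = {f"{name}/hour": counts[name] / hours for name in PER_HOUR}
    rates.update({f"{name}/item": counts[name] / items_watched
                  for name in PER_ITEM})
    return rates

# Illustrative session: 2 h of viewing, 4 episodes, 5 logged actions.
log = ["pause", "skip_intro", "fullscreen", "skip_intro", "pause"]
rates = action_rates(log, duration_s=7200, items_watched=4)
```

Normalizing in this way keeps long Binge-watching sessions comparable with short Single Episode sessions.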
Physiological Data
Of the 40 Binge-watching sessions, 12 were valid for physiological and inertial data analysis, i.e., had no gaps in the smartwatch data and had all episodes watched until the end. Following the same validation, 8 Single Episode sessions were considered for physiological and inertial data analysis. Multiple Episodes sessions were excluded from this analysis due to the lack of usable data. Figure 3 shows an example of a chart displaying synchronized data for a Single Episode session (Fig. 3a) and a Binge-watching session (Fig. 3b).
Fig. 3. Example of data synchronization through a watching session (time): a) Single Episode session; and b) Binge-watching session. Blue: heart rate; dark red: gyroscope; yellow: accelerometer; red: valence; green: arousal.
The series plotted in Fig. 3 are:
– Diff hr avg: the difference between the raw heart rate value and the average heart rate for that particular session, i.e., heart rate values in BPM but centered on the vertical axis at 0.
– The light blue line is the linear trend line for heart rate variations.
– Gyro_movement: binary value (0 or 1) for wrist rotations.
– Acc_movement: the sum of the absolute differences between the current accelerometer values and the previous values (1 s apart) on the 3 axes. It shows arm movement: low values correspond to small variations, and high values (higher than 5) correspond to big variations.
– Valence and Arousal (SAM) values associated with video clips.
We did not find any clear pattern in the physiological and inertial data when comparing Binge-watching with Single Episode sessions through time, when comparing the same episodes watched in both types of sessions (5 comparisons), or when comparing the same episode watched at the beginning and at the end of Binge-watching sessions (4 comparisons).
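The derived series listed above can be computed directly from the 1 Hz samples. The sketch below is one plausible reading of those definitions rather than the authors' code; the function names and the use of a least-squares fit for the trend line are assumptions made for illustration.

```python
def diff_hr_avg(heart_rates):
    """Heart rate minus the session mean, so values are centered on 0."""
    mean = sum(heart_rates) / len(heart_rates)
    return [hr - mean for hr in heart_rates]

def acc_movement(acc_samples):
    """Sum of absolute per-axis changes between consecutive 1 s samples.

    acc_samples: list of (x, y, z) accelerometer readings at 1 Hz.
    The first sample has no predecessor, so the output is one shorter.
    """
    return [sum(abs(c - p) for c, p in zip(cur, prev))
            for prev, cur in zip(acc_samples, acc_samples[1:])]

def linear_trend_slope(values):
    """Least-squares slope of values against their sample index.

    Needs at least two values; with 1 Hz data the unit is change per second.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

hr = [70, 72, 74, 76]                       # made-up heart rates in BPM
acc = [(0, 0, 9), (1, 0, 9), (1, 3, 7)]     # made-up accelerometer samples
centered = diff_hr_avg(hr)
movement = acc_movement(acc)
slope = linear_trend_slope(centered)
```

Centering the heart rate on the session mean makes sessions from participants with different resting heart rates easier to compare on one chart.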
4 Discussion
From the analysis of the data we can infer that while in the majority of sessions (60.6%) participants adopted a Binge-watching behavior, in many (39.4%) they did not, consuming single or multiple series’ episodes during their online entertainment sessions. Nevertheless, the motivations for the three types of behavior appear to be underpinned by the same quest for leisure: “Relaxation”, “Boredom relief”, and “Escapism”. However, “Relaxation”, closely followed by “Boredom relief”, emerges as the main motivation for the Binge-watching and Multiple Episodes behaviors (i.e., longer sessions). For the Single Episode behavior, the three leisure-seeking motives are quite balanced as top choices. This suggests that participants who look for “Relaxation” tend to spend more time watching online content. “Learning” as a motivation scores low across the three types of behavior, reinforcing the leisure-seeking attitude of all viewing behaviors.
When looking at the moods of the viewers through the SAM results, we observe a striking decrease in Valence after the end of the Binge-watching sessions, suggesting that participants ended the experience in an unpleasant mood, less satisfied or unhappier than when they started it (Fig. 4). This decrease in Valence between the start and the end of the viewing session is less striking in the Single Episode behavior and is not noticeable in the Multiple Episodes behavior. It is interesting to note that all participants reported quite low Arousal values before starting to watch any online TV entertainment. Participants who engaged in the Single Episode behavior were more excited after the session, while no change in Arousal was observed after Binge-watching or watching Multiple Episodes. These results further support the idea that Binge-watching participants’ mood worsened after the session, while their Arousal levels remained the same. On the other hand, the Single Episode watchers’ mood also deteriorated slightly during the session, but their Arousal increased. Multiple Episodes watchers did not report any changes in mood or excitement. The fact that Arousal increased only in Single Episode sessions, and not in Binge-watching and Multiple Episodes sessions, might be explained by “Relaxation” being the top motivation for these longer sessions, as previously mentioned. The curiosity induced by watching serial entertainment, i.e., wanting to know how the story will develop in the next episodes, as well as the short time seated in front of a screen, might explain the increase in Arousal in Single Episode watchers. The analysis of the results by TV genre showed that Drama is the only genre that presented a significant increase in Arousal.
Fig. 4. Circumplex model: arousal vs valence [17]: a) before watching a single episode and binge-watching; b) after watching a single episode; and c) after binge-watching
Analyzing the PANAS data, Binge-watching, Single Episode, and Multiple Episodes sessions show similar values. In the case of the Binge-watching behavior, there was a significant increase in PA and a significant decrease in NA after the session. Nonetheless, these values are still quite low overall, not enough to denote a change in participants’ mood (PA is far from normal values, rated at 25 or higher, and close to the threshold of 18, which is an abnormally low value and might be associated with depression [6]). Such results are in line with the decrease in Valence after Binge-watching, as previously discussed. Looking at the data through the TV genre lens, the increase in PA values is mainly related to Sci-Fi content, while the NA decrease is related to the Comedy genre. It is also interesting to note that Sci-Fi content brought PA values close to normal (25 or higher [6]), and since PA might better reflect participants’ mood [6], such results indicate that Binge-watching Sci-Fi content has a higher positive impact on participants’ mood than Comedy. The NA values are also low across the three types of behavior (far from the anxiety threshold of 29 [6]), which might be related to the second main motivation for watching online content, “Boredom relief”. The significant decrease in NA in Binge-watching is related to watching the Comedy genre, and it might indicate a reduction in participants’ anxiety.
The level of guilt reported through PANAS is constant before and after Binge-watching. These results confirm Perks’s studies, which associate Binge-watching with addiction and addictive behaviors [15].
Regarding data emerging from the users’ interface actions, we observe a tendency to skip intros more often while Binge-watching, compared with watching single or multiple episodes. This can be expected, as viewers avoid repeating the same titles or introductory content several times during a Binge-watching session.
No clear pattern was found for physiological and inertial data. This could derive from the challenges of collecting physiological data in the wild, with little or no control over the experiment. Also, the choice of minimally intrusive sensors (a smartwatch) substantially limited the amount and quality of the physiological data that could be gathered, since these devices do not provide access to raw ECG data.
To conclude, we would like to acknowledge several limitations of our study. A larger sample of sessions and more constraints on the physiological data collection are needed to be able to extract patterns from the data. Therefore, part of the experiment needs a controlled environment to achieve comparable data, e.g., all participants must watch the same series. This may reduce the pool of possible participants but will provide more meaningful data.
5 Conclusions
In summary, in this paper the authors present an exploratory field study that collected physiological, inertial, and self-reported data from 13 young participants while they watched online entertainment content. The authors developed a smartwatch app, which collected physiological and inertial data, and a Chrome extension, which logged participants’ interface actions and opened questionnaires automatically. The study highlighted and compared three distinct kinds of behavior among our participants: 1) Binge-watching (according to the definition given in the first section of this article), 2) Single Episode watching, and 3) Multiple Episodes (from different series) watching. Results from the analysis of the data suggest that while most viewers engage in online TV entertainment mainly looking for leisure, the Binge-watching behavior affects viewers’ mood, resulting in lower levels of Valence and no change in Arousal at the end of the session. Nevertheless, binge-watchers report an average increase in Positive Affect and a decrease in Negative Affect at the end of the sessions, with Sci-Fi being the genre with the highest positive impact and Comedy the one that reduces anxiety. On the other hand, Single Episode consumption can make viewers feel more excited after the session. More studies are needed to understand the reasons behind these intriguing initial findings.
Acknowledgments. We would like to thank Jéssica Franco and Ricardo Pinheiro for their help in cleaning part of the data. This research was funded by FCT/MCTES LARSyS (UID/EEA/50009/2013 (2015-2017)); by LARSyS (UIDB/50009/2020) and by CEECINST/00122/2018 (IST-ID/300/2019).
References 1. Bradley, M.M., Lang, P.J.: Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Ther. Exper. Psychiatry 25(1), 49–59 (1994)
380
D. Cabral et al.
2. Castro, D., Rigby, J.M., Cabral, D., Nisi, V.: The binge-watcher’s journey: investigating motivations, contexts, and affective states surrounding netflix viewing. Convergence (in press), 1354856519890856 (2019) 3. Codispoti, M., Surcinelli, P., Baldaro, B.: Watching emotional movies: affective reactions and gender differences. Int. J. Psychophysiol. 69(2), 90–95 (2008) 4. Conlin, L., Billings, A.C., Auverset, L.: Time-shifting vs appointment viewing: the role of fear of missing out within TV consumption behaviors. Commun. Soc. 29(4), 151–164 (2016) 5. Ericsson Consumer Lab Report 2015 TV & Media 2015. The empowered TV & media consumer’s influence (2015) 6. Crawford, J.R., Henry, J.D.: The positive and negative affect schedule (PANAS): construct validity, measurement properties and normative data in a large non-clinical sample. Br. J. Clin. Psychol. 43(3), 245–265 (2004) 7. Dunbar, R.I.M., et al.: Emotional arousal when watching drama increases painthreshold and social bonding. R. Soc. Open Sci. 3(9), 160288 (2016) 8. Feijter, D.de., Khan, V., Gisbergen, M. van.: Confessions of a’guilty’ couch pota-to understanding and using context to optimize binge-watching behavior. In: Proceedings of ACM TVX 2016. ACM, New York, NY, USA, pp. 59–67 (2016) 9. Flayelle, M., Canale, N., Vögele, C., Karila, L., Maurage, P., Billieux, J.: Assessing bingewatching behaviors: development and validation of the “watching tv series motives” and “binge-watching engagement and symptoms” questionnaires. Comput. Hum. Behav. 90, 26– 36 (2019) 10. Heiselberg, L., Bjørner, T.: How to evaluate emotional experiences in television drama series: improving viewer evaluations by psychophysiological measurements and self-reports. In: Proceedings of the 36th European Conference on Cognitive Ergonomics, pp. 1–4 (2018) 11. Kim, M., Kim, J., Han, S., Lee, J.: A data-driven approach to explore television viewing in the household environment. In: Proceedings of ACM TVX 2018. ACM, New York, NY, USA, pp. 
89–100 (2018) 12. Kreibig, S.D.: Autonomic nervous system activity in emotion: a review. Bio. Psychol. 84(3), 394–421 (2010) 13. Merikivi, J., Bragge, J., Scornavacca, E., Verhagen, T.: Binge-watching serialized video content: a transdisciplinary review. Tele. New Media, 1527476419848578 (2019) 14. Morris, J.D.: Observations: SAM: the self-assessment manikin: an efficient cross-cultural measurement of emotional response. J. Adv. Res. 35(6), 63–68 (1995) 15. Perks, L.G.: Media Marathoning: Immersions in Morality. Lexington Books, Washington DC (2015) 16. Pittman, M., Sheehan, K.: Sprinting a media marathon: uses and gratifications of bingewatching television through netflix. First Monday 20(10), (2015) 17. Posner, J., Russel, J.A., Peterson, B.S.: The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev. Psychopathol. 17(3), 715–734 (2005) 18. Rigby, J.M., Brumby, D.P., Gould, S.J.J., Cox, A.L.: Media multitasking at home: a video observation study of concurrent tv and mobile device usage. In: Proceedings of ACM TVX 2017. ACM, New York, NY, USA, pp. 3–10 (2017) 19. Sung, Y.H., Kang, E.Y., Lee, W.A.: Bad habit for your health? an exploration of psychological factors for binge-watching behavior. In: Proceedings of 65th ICA (2015) 20. Wang, C., Geelhoed, E.N., Senton, P., Cesar, P.: Sensing a live audience. In Proceedings of CHI 2014. ACM, New York, NY, USA, pp. 1909–1912 (2014) 21. van der Meij, L., et al.: Testosterone and cortisol release among Spanish soccerfans watching the 2010 world cup final. PLoS ONE 7(4), e348144 (2012)
22. Vermeulen, J., MacDonald, L., Schöning, J., Beale, R., Carpendale, S.: Heartefacts: augmenting mobile video sharing using wrist-worn heart rate sensors. In: Proceedings of ACM DIS 2016. ACM, New York, NY, USA, pp. 712–723 (2016)
23. Watson, D., Clark, L.A., Tellegen, A.: Development and validation of brief measures of positive and negative affect: the PANAS scales. J. Personal. Soc. Psychol. 54(6), 1063 (1988)
Psychological Evaluation for Images/Videos Displayed Using Large LED Display and Projector
Ryohei Nakatsu1(B), Naoko Tosa1, Takashi Kusumi1, and Hiroyuki Takada2
1 Kyoto University, Yoshida-Honmachi, Sakyo-Ku, Kyoto 606-8501, Japan
[email protected]
2 TELMIC Corp., Akiba East Bldg., 1-28-5 Taito, Taito-Ku, Tokyo 110-0016, Japan
Abstract. Displaying images and videos on a big screen or display, as in digital signage and projection mapping, is becoming popular in entertainment, advertising, and related areas. For such applications it is important to compare the use of an LED display with that of a projector. We prepared an experimental environment with two types of display: a 200-inch LED display, and a 200-inch screen with a projector. We also used two brightness conditions, light-on and light-off, and two types of content, art content and text content. Under the resulting eight conditions, we carried out a psychological experiment with 24 participants. Each participant filled in a questionnaire based on a five-point semantic differential scale. Analysis of the results revealed that the combination of art content, light-off, and LED display gives far better results than the other combinations.
Keywords: LED display · Projector · Art contents · Five scale SD method
1 Introduction
Displaying images and videos at a very large scale is a suitable way to convey visual information to many people at the same time. There are two main methods: showing the content on a large LED display, and projecting it with a projector. Large LED displays are expensive but have excellent visibility even in bright places such as train stations and street corners. A projector, on the other hand, has the disadvantage that the image is difficult to see in bright surroundings, but the advantage that content can be displayed without altering existing walls or buildings. In particular, projection mapping [1], which projects a three-dimensional image using the shape of a building as it is, has been attracting attention as a new way of displaying information and is often used in entertainment. However, as the price of large LED displays has dropped and the brightness of projectors has improved, it is desirable to accumulate data on what kind of information and what kind of environment suit an LED display and a projector. In this area, research has been carried out on how high-resolution and high-brightness displays affect people’s psychology [2, 3], and on screen size and viewing distance [4]. But few experiments have been reported that compare display by a projector with display by an LED display under various conditions. The main reason is that a large-scale comparison of an LED display and a projector is difficult to set up in a university laboratory. Since we were able to realize such an environment within the framework of joint research with a company, we compared large-scale image/video display using an LED display and using a projector. We also varied the brightness of the surrounding environment and the content to be displayed, and performed the evaluation in the form of a psychological experiment. This paper describes the experimental framework, the analysis results, and the considerations based on them.
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 382–390, 2020. https://doi.org/10.1007/978-3-030-65736-9_34
2 Method
2.1 Experimental Conditions
In our experiment, two display methods, two types of content, and two brightness conditions were combined, giving eight experimental conditions.
(1) Display method
We compared two methods: displaying content on a 200-inch LED display and projecting content on a 200-inch screen using an 8000-lumen projector. Figure 1 shows the 200-inch LED display, and Fig. 2 shows the screen and the projector. The LED display, projector, and screen were installed side by side in a laboratory with an area of about 300 square meters.
Fig. 1. 200-inch LED display used for the experiment.
Fig. 2. Screen and projector used for the experiment.
(2) Contents and their creation method
Two types of content were used for the comparison experiment: a text-based image created using PowerPoint and a video art work created by one of the authors, Naoko Tosa. The text-based image is an introduction to a company and includes text and additional images such as a corporate logo and a representative product. The video art used here is “Sound of Ikebana”, created by Naoko Tosa. She found that a fluid such as paint can form flower-like shapes when sound vibration is applied to it and the result is filmed with a high-speed camera. A speaker is placed face-up, a thin rubber film is stretched over it, and a fluid such as paint is put on the film; when the speaker is vibrated with sound, the paint jumps up and various shapes are created. She found that various fluid shapes can be generated by changing the waveform and frequency of the sound, the type of fluid, its viscosity, etc. [5]. The resulting video was then edited according to the colors expressing each of the four Japanese seasons to create a video art work called “Sound of Ikebana” [6]. It is a 4K video art work about 30 min long, and a part of it was used for this research. In 2014, projection mapping of Sound of Ikebana was performed on the outer wall of the ArtScience Museum in Singapore, at a height of about 20 m. Furthermore, in April 2017, it was exhibited on more than 60 large LED displays in Times Square, New York (Fig. 3). The work therefore has a track record of being displayed with both a projector and an LED display, and is considered suitable for this comparative experiment.
Fig. 3. Exhibition of “Sound of Ikebana” at Times Square in New York.
(3) Environmental conditions
To compare a bright environment with a dark one, the experiment was run in our laboratory under two conditions, with and without lighting. The experiment was conducted at night so that complete darkness could be achieved when the lights were turned off. The illuminance of the experimental environment was 40 lx with the lights on and 0 lx with the lights off. During the experiment, the subjects sat directly in front of the LED display or the screen at a distance of 5 m, which is the standard viewing distance for a 200-inch display or screen.
2.2 Experimental Procedure
(1) Participants
The participants were 24 students from Kyoto University (13 males and 11 females). All were in their twenties, and the gender ratio was kept roughly balanced. Informed consent was obtained from all participants.
(2) Image/video presentation procedure
For each subject, the eight conditions described above were presented in random order. One cycle consisted of 30 s rest -> 60 s content display (first time) -> 30 s rest -> 60 s content display (second time) -> 30 s rest, and the cycle was repeated eight times, once per condition.
(3) Measuring method
The semantic differential method was adopted as the subjective evaluation method. In this method, pairs of adjectives with opposite meanings are placed at the two ends of a scale, which is divided into 7 or 5 points. In our experiment, a 5-point scale was used.
(4) Evaluation items
Taking the previous studies [2–4] into consideration, the six subjective evaluation items shown in Table 1 were selected. For large-screen display, evaluation items such as whether a sense of presence is achieved have often been used [8]. In our case, however, we decided to use Kansei (sensitivity) evaluation items such as legibility and overall satisfaction.
The questionnaire was filled out at the end of each cycle. After all cycles were completed, a brief interview was conducted with each subject to gather their overall impressions.
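The design and presentation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the seeding scheme, and the use of a per-participant shuffle are assumptions, since the paper states only that the eight conditions were presented in random order.

```python
import itertools
import random

CONTENTS = ("art", "text")
DISPLAYS = ("LED", "projector")
LIGHTING = ("lighting", "no lighting")

# One cycle as described in the text: rest, show, rest, show, rest (seconds).
CYCLE_SECONDS = (30, 60, 30, 60, 30)


def all_conditions():
    """Return the 2 x 2 x 2 = 8 experimental conditions."""
    return list(itertools.product(CONTENTS, DISPLAYS, LIGHTING))


def presentation_order(participant_id, seed=0):
    """Return the eight conditions in a reproducible random order.

    The seeding scheme is a hypothetical choice made for this sketch.
    """
    rng = random.Random(seed * 1000 + participant_id)
    order = all_conditions()
    rng.shuffle(order)
    return order


def session_seconds():
    """Total presentation time per participant, excluding questionnaires."""
    return len(all_conditions()) * sum(CYCLE_SECONDS)
```

Under these assumptions one cycle lasts 210 s, so the eight cycles take 1680 s (28 min) per participant before questionnaire and interview time is added.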
Table 1. Subjective evaluation items.
Satisfied               Unsatisfied
Easy                    Uneasy
Relaxed                 Unrelaxed
Interested              Uninterested
Elated                  Unelated
Want to watch longer    Don’t want to watch longer
3 Results and Discussion
3.1 Means and Standard Deviations
To compare the eight experimental conditions, the mean and standard deviation of the scores were computed for each subjective evaluation item. The results for the six subjective evaluation items are shown in Figs. 4, 5, 6, 7, 8, and 9.
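As a minimal sketch of the summary step described above, the following Python computes a mean and standard deviation per condition for one evaluation item. The function name and data layout are assumptions, the example ratings are placeholders and not the study's data, and the paper does not say whether the sample or the population standard deviation was used; the sample version is assumed here.

```python
from statistics import mean, stdev


def summarize(ratings_by_condition):
    """Map each condition to (mean, sample standard deviation) of its ratings."""
    summary = {}
    for condition, ratings in ratings_by_condition.items():
        # Ratings come from a 5-point semantic differential scale.
        if any(r < 1 or r > 5 for r in ratings):
            raise ValueError(f"rating outside the 5-point scale in {condition!r}")
        summary[condition] = (mean(ratings), stdev(ratings))
    return summary


# Placeholder ratings for illustration only; these are NOT the study's data.
example = {
    "condition A": [5, 4, 5, 4],
    "condition B": [3, 2, 3, 4],
}
for condition, (m, sd) in summarize(example).items():
    print(f"{condition}: mean={m:.2f}, sd={sd:.2f}")
```

With 24 participants, each list would hold 24 ratings per condition and per evaluation item.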
Fig. 4. Comparison of scores for “As a whole were you satisfied?”
Fig. 5. Comparison of scores for “Was it easy to watch?”
Fig. 6. Comparison of scores for “Were you relaxed while watching?”
Fig. 7. Comparison of scores for “Were you interested in the content?”
Fig. 8. Comparison of scores for “Did you feel elation?”
Fig. 9. Comparison of scores for “Did you want to watch it longer?”
3.2 Consideration on Scores of Each Evaluation Item
(1) Total satisfaction
Regarding overall satisfaction, “art x LED x no lighting” is the only condition with a score of 4 or higher. The art content conditions are rated in the order “art x LED x no lighting” > “art x projector x no lighting” > “art x LED x lighting”, with “art x LED x no lighting” rated particularly highly. In the interviews, many subjects commented that the details of the art content were highly visible and that its power was overwhelming. Only one person replied that under the “art x LED x no lighting” condition she felt tired because of the high contrast. This result suggests that the young generation, who are accustomed to watching high-resolution, high-contrast images and videos on smartphones, do not feel uncomfortable when watching an LED display in a dark environment. Regarding the text content, there is only a slight difference between “LED x lighting” and “projector x no lighting”, and which display method is suitable may depend on the content. As content that mixes text, images, and video has become quite common in PowerPoint presentations, evaluations using such mixed content are also needed.
(2) Other evaluation items
For all other evaluation items, “art x LED x no lighting” received the highest rating, which is consistent with its highest rating in overall satisfaction. In particular, for ease of viewing it obtained a score of 4.5 or higher. For each evaluation item, “art x projector x no lighting” received the second-highest rating.
When displaying art content such as video art, the room is often darkened, with sub-content projected on the wall using a projector while the main content is shown on an LED display. One of the authors, Naoko Tosa, has held such exhibitions at galleries in Japan, Hong Kong, and elsewhere, and they were highly rated, which supports this result. The next highest rating went to “art x LED x lighting”. This is also a normal display method for video art in art galleries, but it received a lower rating than “art x LED x no lighting.” Based on these results, when displaying video art in galleries and similar venues, it is effective to darken the room even if an LED display is used. In contrast, text-based content of the kind commonly used for presentations at academic conferences generally scored lower than art content. This is somewhat unavoidable given the nature of the content itself; in particular, it was rated lower than art content on “Were you interested in the content?”, “Did you feel elation?”, and “Did you want to watch it longer?”. For “Was it easy to watch”, however, text content obtained a score close to 4, comparable to art content. This is consistent with the common practice at academic conferences of darkening the room while displaying content with a projector.
4 Conclusion
When displaying content on a large scale in classrooms, art galleries, conference venues, and entertainment venues, whether to use an LED display or a projector is an important issue. Although related research includes studies on the psychological effects of high-resolution and high-brightness displays [2–5], there have been almost no reports of experiments comparing a projector and an LED display under various conditions. The result is also likely to depend on conditions such as whether the lights are on and whether the displayed content is text-based or art content. In this research, we tackled this problem. We prepared an environment for displaying content at large scale on a 200-inch LED display and on a 200-inch screen with a projector. We set a total of eight conditions by combining the display method (LED display or projector), the lighting (on or off), and the content (text-based or art). We then carried out a psychological experiment in which 24 subjects evaluated each condition using the semantic differential method. Based on the mean scores, “art content x LED x no lighting” obtained a very high score. It was also found that the commonly used combination of “projector x dark environment” is effective for text-based content. We believe that these results provide guidelines for display and exhibition regarding whether to use an LED display or a projector and whether to turn the lighting on or off. Future research should examine in more detail how differences in content affect the results. In addition, since the results are considered to reflect the sensitivity of the younger generation, experiments with subjects from other generations are needed.
References
1. Blokdyk, G.: Projection Mapping: A Complete Guide. 5STARCooks (2018)
2. Sakamoto, K., Sakashita, S., Yamashita, K., Okada, A.: Influence of high-resolution 4K displays on psychological state during content viewing. In: Stephanidis, C. (ed.) HCI 2014. CCIS, vol. 434, pp. 363–367. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07857-1_64
3. Sakamoto, K., Yamashita, K., Okada, A.: Effect of high dynamic range on physiological and psychological state during video content viewing. J. Hum. Interface Soc. 20(2), 123–133 (2018)
4. Gilinsky, A.G.: Perceived size and distance in visual space. Psychol. Rev. 58, 460–482 (1951)
5. Narita, N., Kanazawa, M., Okano, F.: Optimum screen size and viewing distance for viewing ultra high-definition and wide-screen images. Trans. Inst. Image Inf. Telev. Eng. 55(5), 773–780 (2001)
6. Yunian, P., Zhao, L., Nakatsu, R., Tosa, N.: A study on variable control of sound vibration form (SVF) for media art creation. In: Proceedings of 2015 Conference on Culture and Computing. IEEE Press (2015)
7. Tosa, N., Yunian, P., Qin, Y., Nakatsu, R.: Pursuit and expression of Japanese beauty using technology. Arts J. MDPI 8(1), 38 (2019)
8. Heeter, C.: Being there: the subjective experience of presence. MIT Press (1992)
To Borrow Arrows with Thatched Boats: An Educational Game for Early Years Under the Background of Chinese Three Kingdoms Culture
Hui Liang(B), Fanyu Bao, Yusheng Sun, Chao Ge, Fei Liang, and Qian Zhang
Zhengzhou University of Light Industry, Zhengzhou, China
[email protected]
Abstract. In recent years, the significance of traditional Chinese civilization has gradually been recognized. Many studies have shown that serious games are beneficial for learning, but they have not yet realized their potential in the field of traditional culture. To disseminate traditional cultural education, improve cognitive skills, and promote children’s interest in learning, we developed an application using the Unity 3D game engine. In this paper, a shooting game is designed against the background of the story of borrowing arrows with straw boats, in which children collect arrows through gesture movement and condition selection. Finally, the positive test results showed that the children’s mental abilities steadily improved through the game.
Keywords: Chinese culture · Serious game · Ability training · Three-dimensional framework
1 Introduction
Chinese traditional culture, as a source of driving force for social development, embodies the essence of each era over thousands of years, so promoting traditional cultural education for children is vitally important to society as a whole. At present, many children learn traditional culture only at the level of memorization rather than deep understanding. It is hard for them to obtain practice and training at various levels, such as historical knowledge, cognitive ability, and coordination ability. In this setting, new forms and means of pedagogy are needed. The growing prominence of serious games has brought new expectations to traditional Chinese cultural education. Immersing children in virtual reality games offers superior interactive effects for educational games. This paper selects a classic episode of the Three Kingdoms period (Fig. 1 [1]), Borrowing Arrows with Straw Boats, and uses it as the background for an educational game based on natural gesture interaction, To Borrow Arrows with Thatched Boats.
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 391–398, 2020. https://doi.org/10.1007/978-3-030-65736-9_35
Fig. 1. Three kingdoms
2 Related Work
Virtual reality technology has been applied in entertainment [2], the military [3], medicine [4], film and television [5], and other fields, and it has an especially wide range of applications in education. In traditional geography teaching, books, films, and television works are frequently used as media for demonstration to students. An evaluation of a science course demonstrated that students who performed VR design activities achieved better learning outcomes and improved self-efficacy and critical thinking [6]. The study by Tüzün et al. [7] shows that students have higher intrinsic motivation for learning in a virtual reality game environment. With the assistance of virtual reality, art can be experienced in an immersive manner, which helps to overcome its abstract and obscure nature. The 3D Museum is also used as a means of learning more about art-historical facts [8]. There is also software that teaches singing, such as Sing Master [9]. History cannot reappear by itself, but virtual reality technology can bring it back to life. Photo-realistic Byzantine church models generated using VR technology can be used for scientific analysis, preservation, restoration, and conservation purposes [10]. Etruscanning 3D is a gesture-based interface project that tells users the history and importance of archaeological discoveries through storytelling [11].
3 System Design
Our game design focuses on three aspects: the comprehension of history, the training of children’s strategic ability, and the training of their motor coordination ability. Different indicators are used to test each of these three aspects of training. According to this design focus, we proceed from the following aspects:
1. To train comprehension of history:
a. At present, Chinese historical stories are rarely used as a theme in game development; using one can also lower the threshold for learning history.
b. With a historical story as the background of the game, children not only read a synopsis of the story before the game but also deepen their understanding while playing.
2. To train strategic ability:
a. Have a clear thread of the game’s tasks and know how to win the game by manipulation.
b. The player is trained to achieve the objective by choosing the appropriate game objects. There are no unique criteria for these choices; however, different choices lead to different scores.
3. To train motor coordination ability:
a. Complete the game by performing gestures precisely.
b. Children are required to coordinate their hands and brain to avoid obstacles and receive the arrows.
The training targets of our game are shown in Fig. 2.
Fig. 2. Training target of borrowing arrows with straw boats
Virtual environments are used as the medium for the game. The complete VR system is composed of an HTC Vive and a Leap Motion as perception and tracking devices. Children’s coordination ability is examined from three aspects: motor performance, sensory performance, and cognitive performance [12]. Strategic ability is examined from three angles: favourable climate, geographical position, and support of the people. The ability to interpret and analyze causal relationships between things is acquired through the choice of different game objects. This leaves a concept in the child’s mind: every “outcome” is affected by a “choice” made earlier. This part is designed to train children’s tactical and strategic abilities. A fixed amount of energy is set to prevent children from overindulging in the game. When the energy is exhausted, the player must wait for a certain time before continuing. Training in Chinese traditional culture runs through the whole game rather than being confined to a single stage. The story is presented to players as background before the game, and during the game players experience the story details that serve as its framework. The capacity improvement framework is shown in Fig. 3.
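The energy mechanism mentioned above could be implemented along the following lines. This is a hedged sketch in Python rather than the authors' Unity code: the class name `EnergyGate`, the default capacity, and the refill interval are hypothetical, because the paper gives no concrete values.

```python
class EnergyGate:
    """Limits continuous play: each round costs energy, which refills over time."""

    def __init__(self, max_energy=3, refill_seconds=600.0):
        # Both defaults are illustrative assumptions, not values from the paper.
        self.max_energy = max_energy
        self.refill_seconds = refill_seconds
        self.energy = max_energy
        self.last_refill = 0.0

    def _refill(self, now):
        """Add one unit of energy per full refill interval that has elapsed."""
        if self.energy < self.max_energy:
            gained = int((now - self.last_refill) // self.refill_seconds)
            self.energy = min(self.max_energy, self.energy + gained)
            self.last_refill += gained * self.refill_seconds
        if self.energy == self.max_energy:
            # A full bar accumulates no credit; the clock restarts on next use.
            self.last_refill = now

    def try_start_round(self, now):
        """Spend one unit of energy if available; return True if play may start."""
        self._refill(now)
        if self.energy == 0:
            return False
        self.energy -= 1
        return True
```

Passing the current time in as `now` (in seconds) keeps the sketch deterministic and easy to test; a game loop would supply its own clock value.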
Fig. 3. Capacity improvement framework
4 Implementation
Combining the four types of interrelated sets of activities of Klabbers [13] with the four-dimensional framework [14] proposed by de Freitas and Oliver, a new education framework is proposed that comprises three dimensions: interaction, presentation, and schema. The game details are implemented according to the design focus of the game and this new education framework (Fig. 4).
Fig. 4. The education framework
First, teacher-student interaction is one of the important components of the framework. The autonomous-manipulation interactive game (Fig. 5), based on depth-sensing technology, serves as a medium for teacher-student communication. It not only helps teachers present knowledge points in an intuitive, engaging, and operable way, but also presents the Battle of Chibi to students in a novel form.
Fig. 5. Game selection interface
The second dimension concerns the style of presentation, in which learners interact with the game through expressive tools so as to become immersed in learning. The game combines various elements in Unity to build a realistic and interesting game interface (Fig. 6).
Fig. 6. Collect arrows interface
The schema is a combination of design themes and suitable tools. Historical events in Chinese traditional culture cannot be reproduced or touched, so history can normally be observed only through text descriptions. Virtual reality technology has the advantages of being immersive, conceptual, and interactive. By combining virtual reality technology with traditional Chinese cultural education, learners obtain a new teaching environment. Because the lifelike scenes bring an immersive experience, learners can understand the substance of what the game expresses. These three dimensions construct the framework for this game design, and the dimensions are inseparable. They are related to and interact with each other, providing more possibilities for learning from multiple perspectives. Game-based learning designed with this framework takes the perspectives of both students and teachers into account, which reduces the risk of either side losing sight of the other and can stimulate students’ potential.
5 Experiment and Discussion
Twenty students aged 6–12 were recruited to assess the effectiveness of our educational game in improving historical comprehension ability, strategic ability, and motor coordination ability. We set several indicators and, for each round of the game, recorded the indicator data that reflect the impact of the game on the children. The results are shown below using the average data of the five rounds of experiments (Fig. 7) and the scoring criteria (Table 1). In the game, a player wins by collecting 500 arrows. The test results demonstrate that the children improved their abilities in the following areas. Strategic capability development: correctly understanding the characteristics of each object allows an accurate choice to be made in different situations, which is itself an improvement in strategic ability. Coordination capability development: the brain, hands, and eyes need to work together to move the boat as the player intends.
Table 1. The scoring criteria
Choice        Result                                        Score
Climate       Sunny                                         5
              Rainy                                         8
              Foggy                                         10
Battle Site   The Yangtze River                             10
              The ground combat                             5
Equipment     Scarecrows                                    8
              Food                                          6
              Soldiers                                      4
              Weapons                                       3
              War drum                                      3
              Battle flag                                   3
              Scarecrows + (drum, flag, weapons)            10
              Scarecrows + food/soldier                     7
              drum/flag/weapons + (drum, flag, weapons)     3
Time          others                                        5
              70 s                                          4
              fail                                          0
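The scoring criteria in Table 1 can be read as a lookup plus a few combination rules. The Python sketch below is one possible reading, not the authors' implementation: the function names and lowercase keys are hypothetical, equipment is assumed to be chosen as one item or a pair, the three category scores are assumed to be summed, and the Time rows are left out because their thresholds are not fully recoverable from the table.

```python
CLIMATE = {"sunny": 5, "rainy": 8, "foggy": 10}
BATTLE_SITE = {"yangtze river": 10, "ground combat": 5}
EQUIPMENT = {"scarecrows": 8, "food": 6, "soldiers": 4,
             "weapons": 3, "war drum": 3, "battle flag": 3}
PROPS = {"war drum", "battle flag", "weapons"}
WIN_ARROWS = 500  # the text states that collecting 500 arrows wins the game


def equipment_score(items):
    """Score a list of one or two equipment items using the Equipment rows."""
    chosen = list(items)
    if not 1 <= len(chosen) <= 2 or any(i not in EQUIPMENT for i in chosen):
        raise ValueError(f"unsupported equipment choice: {chosen}")
    if len(chosen) == 1:
        return EQUIPMENT[chosen[0]]
    pair = set(chosen)
    if "scarecrows" in pair and len(pair) == 2:
        other = (pair - {"scarecrows"}).pop()
        return 10 if other in PROPS else 7
    if pair <= PROPS:
        return 3
    raise ValueError(f"combination not listed in Table 1: {chosen}")


def strategy_score(climate, site, items):
    """Sum the climate, battle-site, and equipment scores (assumed additive)."""
    return CLIMATE[climate] + BATTLE_SITE[site] + equipment_score(items)


def has_won(arrows_collected):
    """Return True once the player has collected enough arrows to win."""
    return arrows_collected >= WIN_ARROWS
```

For example, `strategy_score("foggy", "yangtze river", ["scarecrows", "war drum"])` combines the highest-scoring row of each category, which matches the choices made in the original story.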
Fig. 7. The average data of strategic capability and coordination capability
In addition to training strategy and coordination ability, the game also helps students deepen their comprehension of the story. We gave the same questionnaire, which relates to the story of borrowing arrows with straw boats, to two groups of ten people each: the training group and the active control group. The training group answered the questionnaire after playing the game, while the control group answered it directly. Figure 8 shows that the educational game markedly improved the ability to comprehend history.
Fig. 8. Two group scores
6 Conclusion
Educational gamification is a common teaching method nowadays. It is no longer limited to the dull teaching mode of books and classrooms, and it brings children a new experience in a fun and vivid way. This game involves coordination, thinking, and cultural knowledge. The test results indicate that children can improve their abilities in several respects while playing. This paper selected only one short story from the Three Kingdoms period as an example. We hope that future research will introduce more traditional culture and thereby contribute to the inheritance of Chinese culture.
References
1. https://www.google.com/imgres?Imgur
2. Fiber, J.: The future of VR & (non-gaming) entertainment. TWICE 31(16) (2016)
3. Gace, I., Jaksic, L., Murati, I., Topolovac, I., Zilak, M., Car, Z.: Virtual reality serious game prototype for presenting military units. In: 15th International Conference on Telecommunications (ConTEL), Graz, Austria, pp. 1–6 (2019)
4. Zhang, D.: Improvement of medical clinical teaching by VR medical-surgical helmet. In: Proceedings of 2019 3rd International Conference on Economics, Management Engineering and Education Technology (ICEMEET 2019). Institute of Management Science and Industrial Engineering (Computer Science and Electronic Technology International Society), pp. 1972–1974 (2019)
5. Hei, X.: An affordable motorized generation system of object VR movie. In: Proceedings of the 2011 International Conference on Informatics, Cybernetics, and Computer Engineering (ICCE 2011 V3). Intelligent Information Technology Application Association, pp. 81–89 (2011)
6. Chang, S.C., Hsu, T.C., Jong, M.S.Y.: Integration of the peer assessment approach with a virtual reality design system for learning earth science. Comput. Educ. 146, 103758 (2020)
7. Tüzün, H., Yılmaz-Soylu, M., Karakuş, T., et al.: The effects of computer games on primary school students’ achievement and motivation in geography learning. Comput. Educ. 52(1), 68–77 (2009)
8. Froschauer, J., Arends, M., Goldfarb, D., Merkl, D.: Towards an online multiplayer serious game providing a joyful experience in learning art history. In: Third International Conference on Games and Virtual Worlds for Serious Applications, Athens, pp. 160–163 (2011)
9. Dzhambazov, G., Goranchev, K.: Sing master: an intelligent mobile game for teaching singing. In: 8th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES), Barcelona, pp. 1–2 (2016)
10. Voutounos, C., Lanitis, A.: On the presentation of byzantine art in virtual environments. In: Third International Conference on Games and Virtual Worlds for Serious Applications, Athens, pp. 176–177 (2011)
11. Pietroni, E., Pagano, A., Rufa, C.: The etruscanning project: gesture-based interaction and user experience in the virtual reconstruction of the regolini-galassi tomb. In: Digital Heritage International Congress (DigitalHeritage), 2, pp. 653–660 (2013)
12. Asadipour, A., Debattista, K., Chalmers, A.: A game-based training approach to enhance human hand motor learning and control abilities. In: 7th International Conference on Games and Virtual Worlds for Serious Applications (VS-Games), Skovde, pp. 1–6 (2015)
13. Pereira, L.L., Roque, L.G.: Design guidelines for learning games: the living forest game design case. In: DiGRA Conference (2009)
14. De Freitas, S., Oliver, M.: How can exploratory learning with games and simulations within the curriculum be most effectively evaluated? Comput. Educ. 46(3), 249–264 (2006)
João em Foco: A Learning Object About the Dyslexia Disorder
Washington P. Batista1(B), Kayo C. Santana1, Lenington C. Rios2, Victor T. Sarinho1, and Claudia P. Pereira1
1 Computer Science Post-Graduate Program - State University of Feira de Santana. Feira de Santana, Bahia, Brazil
[email protected], [email protected], {vsarinho,claudiap}@uefs.br
2 Department of Exact Sciences - State University of Feira de Santana. Feira de Santana, Bahia, Brazil
[email protected]
Abstract. This paper presents João em Foco, a Learning Object (LO) that helps professionals and parents understand dyslexia, a specific learning disorder. The LO provides information about diagnosis, treatment, and better practices to support the development of people with this condition and help them overcome the barriers they face. The validation process was performed with Pedagogy students of the State University of Feira de Santana and through feedback from specialized professionals who work with dyslexic people at the ABCD Institute, a social organization that disseminates projects that have a positive impact on the lives of people with dyslexia.
Keywords: Learning Object · Serious game · Dyslexia

1 Introduction
Dyslexia is a specific learning disorder characterized by difficulty in acquiring and developing reading and writing skills, which are essential to people's daily activities [1]. It affects 15% of the world population [2], who face difficulties in the learning process such as: confusion between letters, syllables or words with small spelling distinctions (e.g. a-o, c-o, e-c); confusion between letters, syllables or words with similar spelling but different orientation in space (e.g. b-d, b-p, b-q, d-b, d-p); partial or total exchange of syllables or words (e.g. me-in, sol-los, som-mos); and replacing words with others of similar spelling, or even creating words [3].

The early and accurate diagnosis of dyslexia is important for overcoming some of the barriers imposed by this learning disorder. Initial evidence of dyslexia is most perceptible during children's literacy acquisition, when they have great difficulty learning to read [2]. However, because parents and educators lack knowledge about dyslexic children [2,4], many children do not receive the follow-up, based on different educational resources and methodological approaches, that would help them in their educational development.

Regarding the current demand of parents and teachers of dyslexic children to be informed about their difficulties and possible treatments, some works [5,6] show the importance of technology in the development of dyslexic people. Technology helps them overcome different challenges by creating a receptive and motivating environment that encourages them without disrespecting their abilities and limits, or labeling them as disinterested, lazy or undisciplined [2,7].

As an attempt to provide a receptive and motivating environment for dyslexic people, this paper presents João em Foco (in English, John in Focus), a Learning Object (LO) whose games and scenarios disseminate information about dyslexia, helping different people (e.g. teachers, siblings and parents) to understand and create a safe environment for a dyslexic person. Several aspects of dyslexia presented in the game scenarios are also reported in this paper, such as: a) the need for a specialized professional; b) the difficulties faced by dyslexic people and ways to overcome them; and c) successful people diagnosed with dyslexia, providing a sense of representation. Finally, some considerations about the importance of this project for the social inclusion of dyslexic people are presented, as well as the future work needed to improve the proposed tool.

© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 399–406, 2020. https://doi.org/10.1007/978-3-030-65736-9_36
2 Dyslexia: Concept, Diagnostics and Treatment
Dyslexia is a specific learning disorder of neurobiological origin, characterized by difficulty in recognizing words, both in decoding and in spelling [8,9]. Children with dyslexia have flaws in the phonological module, which makes it difficult for them to divide words into their underlying sounds and, consequently, to associate the connected sounds present in whole words [9,10]. This problem results in poor performance at school and in the activities of daily life [11].

Several symptoms can characterize dyslexia, such as: delay in acquiring and performing reading and writing tasks; difficulty with the sounds of words; incorrect writing (exchanges, omissions, junctions and agglutination of phonemes); difficulty in associating sounds with symbols, rhymes and alliterations; difficulty in sequential organization (e.g. letters of the alphabet, months of the year, times tables); difficulty with the organization of time, space and direction; difficulty with mental calculations, memorizing phone numbers and messages, taking notes, and similar tasks; and difficulty organizing tasks [12].

According to Costa et al. [13], a teacher's knowledge of what dyslexia is serves as a pre-diagnosis that can lead to referral and intervention. However, even if the teacher is properly prepared, it is not the teacher's role to diagnose a specific learning disorder [14]. It is therefore necessary to work with a multi-professional approach (involving parents, teachers/school and child) to overcome the child's difficulties. Then, after the problem is recognized, the speech therapist's diagnosis must be made through the analysis of the phonological, morphological, syntactic and semantic levels [15–17].

When parents are active in their children's educational process, the chances of success are greater. In addition, the involvement of parents in school, especially in monitoring activities, meetings and notes, generates an important dialogue between the two pillars responsible for the formation of the individual: parents and teachers [18,19]. Due to the lack of dissemination of information about dyslexia, many family members do not know what this disorder is and end up feeling unable to help, or make an incorrect assessment based on their previous experiences [20]. Therefore, it is necessary not only to disclose information about dyslexia, but also to help specialized professionals guide parents on how to deal with their children's specificities, for example by encouraging school activities and not letting them give up activities that they find difficult [21,22]. Attitudes like these can improve the children's skills and, with that, make them more confident when reading and writing.
3 Related Work
Some works have been developed to provide awareness and information games about dyslexia. One of them is Dislexia MT¹, created by the Mato Grosso Dyslexia Association. It aims to promote inclusive actions on dyslexia, providing didactic material and support to families and to dyslexic people themselves. As a civil organization, the Association relies on members, collaborators, partners and donations to develop its activities in the state of Mato Grosso (Brazil). Dislexia MT also provides news about dyslexia from across the country and makes it possible to contact the association for external assistance. Thus, although Dislexia MT does not convey information about dyslexia as a playful game, it shares the goal of this work, which is to clarify issues related to people with dyslexia.
4 Development
João em Foco was designed for teachers, parents and children older than 8 years. It provides a single-player game mode that can be played with distinct profiles on the same device, such as parents and child, allowing them to learn together about dyslexia and the difficulties that the main character of the game goes through in his daily life. The target audience was chosen not only to help dyslexic players know themselves better, but also to help the people close to them in the process of getting to know and helping them. In this sense, teachers are responsible for guiding and motivating them in the classroom and in the learning process, while parents follow their growth and face the challenges of educating and guiding them throughout their lives.

¹ Available at: https://apkpure.com/br/dislexia-mt/br.com.app.gpu1910220. gpu62d19f82a6fcbfb9fbf6c9f648121608 app. Last checked: May 2020.

4.1 João em Foco Operation
When the LO starts, a home screen is displayed (Fig. 1) with the initial options "Play", "About" and "Exit". As the main character of the LO is a child, the scenarios contain many child-oriented elements. The presentation screen is the player's first visual contact with João, so these elements carry important nuances, highlighting the beginning of the learning process in the early stages of school.
Fig. 1. Learning object presentation screen & Children module board screen.
Users start playing by clicking the "Jogar" (in English, "Play") button (Fig. 1), after which they can choose among three modules: teachers, children and parents. These modules are explained in the following sections.

4.1.1 Children Module

The Children module is inspired by a board game in which the player must help João reach the fourth step (tile 4), which corresponds to the end of the board (Fig. 1). Each step addresses different content related to dyslexia:

– 1st step: a school exercise is presented to João in which the mirroring of letters occurs (Fig. 2). Mirroring can occur through the exchange of letters such as d and b or q and p, or through the rotation of the letter's own axis in letters such as a and e [23];
– 2nd step: simulates how a dyslexic person reads a given text, showing that the acquisition of reading and writing is not trivial for them;
– 3rd step: presents fonts that reduce reading difficulty, together with an exercise that uses one of them (Fig. 2);
– 4th step: final feedback is given.
Fig. 2. João's school exercise & Activity screen with OpenDyslexic font.
4.1.2 Teachers and Parents Modules

The Parents and Teachers modules aim to guide and raise awareness of dyslexia among the people who are important in João's teaching and learning process. To guide the parent and teacher player profiles, similar dynamics with different narratives were built for them, as shown below:

– Você Sabia? (in English, "Did you know?"): the player faces situations, presented through characters of the LO, that guide them in learning about dyslexia. This part follows a quiz style: the player chooses the answer that best suits the described situation. After the quiz is answered, feedback on the content is shown and the player is congratulated for the achievement;
– Biblioteca (in English, "Library"): this topic presents an environment in which content about dyslexia can be found. The player can click on books on the screen, which lead to websites, articles and booklets about dyslexia. It is noteworthy that, for each module, the content is specific to the respective audience;
– Histórias (in English, "Stories"): in this topic, a text is shown about people who made history because of their notoriety and talent. Besides presenting these people's contributions to the world, it also points out that all of them were dyslexic, creating a sense of representation. It is important to present this information to children or students with dyslexia, showing them that, like those people, they have potential and are capable of carrying out their activities.

To help spread information about dyslexia, "João em Foco" has an achievement system in the Teachers and Parents modules that shares information on social media (Facebook and Instagram). Thus, when players finish a stage, they can share this achievement on their social media, reaching friends, family and other people, informing them about dyslexia or encouraging them to learn more about the subject.

4.2 Validation Process
To validate this work, two versions of the LO were provided, one for desktop PCs and another for Android. Both versions were tested by 15 users (Pedagogy students of the State University of Feira de Santana), only 18.2% of whom had previous knowledge about dyslexia. The project was also submitted to the Ethics Committee (Presentation Certificate for Ethical Appreciation on Plataforma Brasil: 09091718.6.0000.0053), which required the definition of the data collection instruments to be used in the validation phase. Participants were asked to rate their experience with the LO on six criteria, each on a five-point scale, including: Boring vs Stimulating, Tiring vs Entertaining, Useless vs Useful, Difficult learning vs Easy learning, and Bad didactics vs Good didactics. The evaluation averages can be seen in Fig. 3. Overall, the LO was well evaluated. The participants considered it stimulating, light, didactically sound, useful, organized and conducive to learning about dyslexia, all with an average higher than 4. The item with the lowest average was Boring vs Stimulating, with an average of 4.36.
Fig. 3. João em Foco's user experience graphic.
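The averages reported in Fig. 3 are per-criterion means of five-point ratings. The following is a minimal sketch of that computation. The ratings are hypothetical and chosen only for illustration (the study's raw responses are not reproduced here), and the names `ratings`, `criterion_means` and `lowest_rated` are not from the paper.

```python
from statistics import mean

# Hypothetical five-point ratings (1 = negative pole, 5 = positive pole).
# These values are illustrative only; they are NOT the study's data.
ratings = {
    "Boring vs Stimulating": [4, 5, 4, 5],
    "Useless vs Useful": [5, 5, 4, 5],
}


def criterion_means(responses):
    """Return the mean rating of each criterion, rounded to two decimals."""
    return {name: round(mean(values), 2) for name, values in responses.items()}


def lowest_rated(responses):
    """Return the (criterion, mean) pair with the lowest mean rating."""
    means = criterion_means(responses)
    name = min(means, key=means.get)
    return name, means[name]


if __name__ == "__main__":
    for name, average in criterion_means(ratings).items():
        print(f"{name}: {average}")
    print("Lowest:", lowest_rated(ratings))
```

Reporting the lowest-rated criterion alongside the means, as the text does, shows at a glance whether every item stays above the chosen threshold (here, an average of 4).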
In addition, we used a Likert-scale questionnaire to evaluate the LO and the information it presents about dyslexia, as well as its game characters, interfaces and feedback. According to the results, over 90% of the users agree with the approach taken to present dyslexia information and would recommend the game to others. The results indicate that "João em Foco" provides clear interfaces, dialogues and information that allow people to learn about dyslexia. Users considered sharing information on social media to be a good reward for completing a level, as well as an important way to spread information about dyslexia. Further, when asked whether they would recommend "João em Foco" to other people, all of them agreed, a result that may also be related to the large number of users who have already installed the app.
5 Final Considerations
Research focused on better ways to disseminate information about people with a Specific Learning Disorder is extremely important, as it contributes not only to the development of these people but also to their social inclusion. In this sense, this paper presented "João em Foco", an LO developed to help people increase their knowledge about dyslexia and to raise educators' awareness of alternative methods that help their students overcome different barriers. For parents, "João em Foco" addresses the need to understand their children and to provide them with a safe environment together with specialized follow-up. Finally, from a societal perspective, "João em Foco" raises empathy about the challenges faced by dyslexic people, helping them in the process of social inclusion.

Through a game environment, "João em Foco" provides not only information about dyslexia but also experiences of the learning difficulties that come with it, bringing non-dyslexic people closer to this reality and encouraging dyslexic people to overcome the different barriers. It is also important to emphasize that João em Foco has so far been used not only by the group of educators who helped in the validation process, but also by parents and specialized professionals of the ABCD Institute. In fact, when the ABCD Institute published the game app on its social media and encouraged users to download it, it helped to gather important feedback from parents and professionals who deal directly with dyslexic children. As a result, some of the suggested changes are already available, such as the font size, the use of audio in the game narrative, and some changes in the quiz content. Currently, according to information from the Google Play Store, "João em Foco" has been installed by more than 2700 people and is rated 4.8 out of 5.0 stars.

As future work, besides a validation of "João em Foco" with dyslexic children, new features are expected, such as: a) adding a soundtrack and sound effects; b) improving the textual presentation so that reading the content does not become so tiring; c) adding voice feedback to present information; d) adding a specific section for dyslexic children with suggestions and games to develop their skills; and e) developing an administrator module that allows teachers and specialized professionals to easily create storytelling inside the LO.
References

1. Alves, R.J.R., Lima, R.F., Salgado-Azoni, C.A., Carvalho, M.C., Ciasca, S.M.: Teste para identificação de sinais de dislexia: processo de construção. Estudos de Psicologia (Campinas) 32, 383–393 (2015). https://doi.org/10.1590/0103166X2015000300004
2. Pimenta, D.: Dislexia: um estudo sobre a percepção de professores do ensino fundamental. Centro de Ensino, Pesquisa, Extensão e Atendimento em Educação Especial (Org.), Anais do V Seminário Nacional de Educação Especial, IV Encontro de Pesquisadores em Educação Especial e Inclusão Escolar, pp. 1–15 (2012)
3. Lima, I.: A dislexia e o contexto escolar. Anhanguera educacional 10, 1–15 (2012)
4. Gonçalves, M.A.F.: A dislexia no ensino fundamental. Revista Eletrônica Acervo Científico 3, e648–e648 (2019). https://doi.org/10.25248/reac.e648.2019
5. Cidrim, L., Madeiro, F.: Information and Communication Technology (ICT) applied to dyslexia: literature review. Revista CEFAC 19(1), 99–108 (2017). https://doi.org/10.1590/1982-021620171917916
6. Zikl, P., Bartošová, I.K., Víšková, K.J., Havlíčková, K., Kučírková, A., Navrátilová, J., Zetková, B.: The possibilities of ICT use for compensation of difficulties with reading in pupils with dyslexia. Procedia-Soc. Behav. Sci. 176, 915–922 (2015). https://doi.org/10.1016/j.sbspro.2015.01.558
7. Shaywitz, S.: Entendendo a dislexia: um novo e completo programa para todos os níveis de problemas de leitura. Artmed (2006)
8. Massi, G., Santana, A.P.D.O.: A desconstrução do conceito de dislexia: conflito entre verdades. Paidéia (Ribeirão Preto) 21(50), 403–411 (2011). https://doi.org/10.1590/S0103-863X2011000300013
9. Snowling, M.J., Stackhouse, J.: Dyslexia, Speech and Language: A Practitioner's Handbook. John Wiley & Sons, United States (2013)
10. Caridá, D.A.P., Mendes, M.H.: A importância do estímulo precoce em casos com risco para dislexia: um enfoque psicopedagógico. Revista Psicopedagogia 29(89), 226–235 (2012)
11. American Psychiatric Association, et al.: Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub (2013)
12. Ianhez, M.E., Nico, M.A.N.: Nem sempre é o que parece. Gulf Professional Publishing (2002)
13. Costa, J.P.F., Júnior, A.B.T., Santos, H.M.P., Mateus, Y., Albuquerque, L., Silva, B.B.T.N., Maciel, G.E.S., et al.: Nível de conhecimento dos professores de escolas públicas e particulares sobre dislexia. CEP 52170, 000 (2013)
14. Cancela, A.L.: As implicações da dislexia no processo de aprendizagem na perspectiva dos professores do 1 ciclo do Ensino Básico. Ph.D. thesis, [Universidade Fernando Pessoa] (2014)
15. Capellini, S.A.: Distúrbios de aprendizagem versus dislexia. Tratado de fonoaudiologia. São Paulo: Roca, 862–76 (2004)
16. Capellini, S.A., Silva, A.P.D.C., Silva, C., Pinheiro, F.H.: Avaliação e diagnóstico fonoaudiológico nos distúrbios de aprendizagem e dislexias. Zorzi JL, Capellini SA. Dislexia e outros distúrbios da leitura-escrita. 2a. ed. São José dos Campos: Pulso Editorial, pp. 95–111 (2009)
17. Schulte-Körne, G.: The prevention, diagnosis, and treatment of dyslexia. Deutsches Ärzteblatt Int. 107(41), 718 (2010)
18. Carvalho, M.E.P.D.: Modos de educação, gênero e relações escola-família. Cadernos de pesquisa 34(121), 41–58 (2004). https://doi.org/10.1590/S010015742004000100003
19. Reid, G.: Dyslexia: A Practitioner's Handbook. John Wiley & Sons, United States (2016)
20. Polonia, A.D.C., Dessen, M.A.: Em busca de uma compreensão das relações entre família escola. Psicologia escolar e educacional 9(2), 303–312 (2005). https://doi.org/10.1590/S1413-85572005000200012
21. Reid, G.: Dyslexia and Inclusion: Classroom Approaches for Assessment, Teaching and Learning. Routledge, United Kingdom (2019)
22. Alias, N.A., Dahlan, A.: Enduring difficulties: the challenges of mothers in raising children with dyslexia. Procedia-Soc. Behav. Sci. 202, 107–114 (2015)
23. Roberto, T.M.G.: Reconhecimento das letras: considerações sobre espelhamento e variação topológica em fase inicial de aprendizagem da leitura. Letras de Hoje 48(1), 12–20 (2003)
3D Modeling
3D Modeling and 3D Materialization of Fluid Art that Occurs in Very Short Time

Naoko Tosa¹(✉), Pan Yunian¹, Ryohei Nakatsu¹, Akihiro Yamada², Takashi Suzuki², and Kazuya Yamamoto³

¹ Kyoto University, Yoshida-Honmachi, Sakyo-Ku, Kyoto, Japan
[email protected]
² Toppan Printing Co., Ltd., Tokyo, Japan
³ NAC Image Technology Inc., Tokyo, Japan
Abstract. We have been creating artworks called "liquid art" that utilize liquid dynamics phenomena. One of these artworks is "Sound of Ikebana", which is created by applying sound vibration to color paints and shooting the phenomenon with a high-speed camera, and which has been evaluated as "an artwork that includes Japanese beauty." To investigate further why it is evaluated in this way, and to explore its possible applications in society, we tried to materialize it as 3D objects. As the phenomenon occurs in a very short time of less than one second, we developed a dedicated experimental environment consisting of multiple high-speed cameras surrounding a speaker where the phenomenon occurs. Among the various technologies for reconstructing a 3D model from multiple 2D images, we chose a method called Phase-Only Correlation and developed a 3D mesh model of a snapshot of "Sound of Ikebana." Using a 3D printer, we also successfully obtained a 3D materialized "Sound of Ikebana."

Keywords: Fluid art · High-speed camera · 3D modeling · 3D materialization
1 Introduction

People are surrounded by various natural phenomena, and they have found beauty in many of them. Artists have produced many artworks that started from imitating those natural phenomena. Although what beauty is has been a fundamental question discussed for a long time [1], people have an intuitive understanding of beauty and incorporate it into their surroundings to enrich their lives. At the same time, the beauty of nature is more than what is usually seen. People can find beauty in deep-sea and deep-space images that can be seen using the latest technology. This indicates that there is still hidden beauty in nature that is unknown to people. Taking out the beauty hidden in nature and making it visible is what we have been working on. The basic concept of this paper is how to extract hidden beauty in nature and how to provide it to people in new ways.

We have used a high-speed camera to extract beauty hidden in natural phenomena that are usually invisible to people and to provide it to people in the form of video art. In particular, we have recently been filming various fluid phenomena with a high-speed camera and turning the beauty hidden in these phenomena into video art [2]. Although the natural phenomenon is three-dimensional, we have so far expressed it in the form of two-dimensional images. As a next step, we have tried to showcase the created beauty by materializing it as three-dimensional objects.

This paper has the following structure. First, Sect. 2 explains the fluid art that is to be materialized in three dimensions. We also discuss beauty in fluid art and what Japanese beauty is. Next, Sect. 3 discusses related research on the 3D reconstruction technology required to convert fluid art into 3D. Section 4 describes the method of making fluid art into 3D material. Finally, Sect. 5 summarizes the paper, including future research directions.

© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 409–421, 2020. https://doi.org/10.1007/978-3-030-65736-9_37
2 Fluid Art

2.1 Fluid Dynamics and Fluid Art

The behavior of fluids accounts for a large part of natural phenomena. Typical examples are airflow, water flow and wave behavior. Research on the movement of fluids is an important subject in physics called "fluid dynamics" [3]. Visualization of fluid phenomena makes it possible to understand intuitively how fluids behave under various conditions. Thanks to visualization techniques, people know that fluids create extremely beautiful shapes under various conditions. Beauty is a very basic element of art. Therefore, it is natural to think of using fluid dynamics as a basic methodology for art production. We call art created based on fluid dynamics "fluid art".

2.2 Generation of Fluid Art

2.2.1 Fluid Art Generation System

As a basic technique for creating fluid art, we have developed a method for shooting, with a high-speed camera, the liquid forms created by applying sound vibration. High-speed cameras have been used to shoot various phenomena that occur in a very short time, such as explosions, but mostly in scientific and technological experiments. We, on the other hand, were interested in producing various beautiful organic shapes such as "milk crowns". We then found that a fluid such as paint could be vibrated by sound to create a flower-like shape. Figure 1 shows the fluid art generation environment [4]. We place a speaker facing upwards, put a thin rubber film on top of it, put fluid such as color paints on the rubber film, vibrate the speaker with sound, and shoot the phenomenon with a high-speed camera. A high-speed camera with 2000 frames/sec is used here. A PC connected to the speaker produces various sounds and vibrates the speaker.

Fig. 1. Fluid art generation system.

2.2.2 Fluid Art "Sound of Ikebana"

Using the fluid art generation system, we systematically changed the shape of the sound waveform, the frequency of the sound, the type of fluid, the viscosity of the fluid, etc. Based on these experiments, we found that beautiful forms were generated. One of us, Naoko Tosa, created a video art called "Sound of Ikebana" by editing the video images thus obtained according to the colors of the Japanese seasons [2, 4]. Figure 2 shows a scene of the artwork. Interestingly, when she exhibited the video art in Japan and abroad, many people remarked that "in her artwork, Japanese beauty is expressed" or "Japanese beauty is included in her artwork."
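The setup of Sect. 2.2.1 (a PC driving a speaker while a camera records at 2000 frames/sec) can be illustrated numerically. The following is a minimal sketch, assuming a pure sine tone as the stimulus. The 50 Hz frequency, 0.5 s duration and 44.1 kHz sample rate are hypothetical values chosen for illustration and are not taken from the experiments; the function names are likewise illustrative.

```python
import math


def sine_tone(freq_hz, duration_s, sample_rate=44100, amplitude=1.0):
    """Generate the samples of a sine tone that a PC could send to a speaker."""
    n_samples = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def frames_captured(fps, duration_s):
    """Number of frames a camera records during the given time span."""
    return int(round(fps * duration_s))


def frames_per_cycle(fps, freq_hz):
    """Number of frames that cover one vibration period of the tone."""
    return fps / freq_hz


if __name__ == "__main__":
    tone = sine_tone(freq_hz=50, duration_s=0.5)   # hypothetical stimulus
    print(len(tone), "audio samples")
    print(frames_captured(2000, 0.5), "frames in 0.5 s at 2000 frames/sec")
    print(frames_per_cycle(2000, 50), "frames per vibration cycle")
```

Under these assumed values, half a second of footage at 2000 frames/sec already amounts to 1000 frames, which gives a sense of how many images a single shot produces.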
Fig. 2. A scene from “Sound of Ikebana.”
2.3 What Is Japanese Beauty?

What is the essence of Japanese beauty? As has been pointed out by many people such as Bruno Taut, Japanese artworks and architecture have always emphasized a sense of unity with nature [5]. It can then be said that Japanese beauty is not something created by the Japanese, but the beauty hidden in nature, from which the delicate and sophisticated parts that appeal to the Japanese sensibility are taken out. The methodology of extracting the beauty hidden in natural phenomena using advanced technology, guided by the artist's sensibility, fits this concept of Japanese beauty very well. This seems to be the reason why many people feel Japanese beauty in Naoko Tosa's artworks. At the same time, this is a characteristic not only of her artworks but also of those of other Japanese artists. Many Japanese artists have found beauty in natural phenomena such as the flow of a river and scattering waves, and have turned them into artworks. For example, in Ikebana the basic shape ("type") is an asymmetric triangle [15] (Fig. 3 left). Interestingly, the shape of "Sound of Ikebana" created by Naoko Tosa is similar to the "type" of Ikebana (Fig. 3 right).
Fig. 3. Comparison of Ikebana type (left) and “Sound of Ikebana” (right).
What does the similarity between these artworks or "types" of Japanese beauty and natural and physical phenomena mean? Perhaps Japanese artists used their genius or mind's eye to discover the beauty hidden in nature that the general public cannot see, and this may have come to be regarded as Japanese beauty.

2.4 3D Modeling of Sound of Ikebana and Its Application

When "Sound of Ikebana" was exhibited in Japan and abroad, it was evaluated as "expressing Japanese beauty." At the same time, many people said, "I want to see 'Sound of Ikebana' as a three-dimensional object." "Sound of Ikebana" is based on natural and physical phenomena, and it contains organic shapes that have not been created by conventional human design and modeling. Around us there are three-dimensional objects such as public art, daily necessities such as plates and cups, cars and aircraft, and even buildings. Up until now, artists, designers, engineers, architects and others have created appropriate shapes for these objects. Those shapes were created based on the sensibility and creativity of artists and designers, such as a car design that reduces air resistance or an architectural design that matches the cityscape. All of these are shapes created by humans; they are based on straight lines and simple curves and tend to give an inorganic or artificial impression. However, as culture matures and the aging society arrives, there is a demand to incorporate not only artificial shapes but also the organic and diverse forms of nature into the various products around us. Realizing "Sound of Ikebana" as a 3D model enables the following applications (Fig. 4).
Fig. 4. Application examples of 3D modeling of “Sound of Ikebana” (Left: Art sculpture, Middle: Vase, Right: Future architecture).
(1) Unprecedented organic art sculptures can be created.
(2) New modeling designs can be introduced to daily necessities such as bowls and vases.
(3) New architectural designs with organic shapes can be introduced to current architecture, which is based on straight lines.

Also, through further discussion with CAD/CAM engineers, another possible application is:

(4) 3D modeling with natural shapes can introduce a new design method for cars and aircraft, whose design has so far been based on spline curves and similar tools.

To achieve this, it is necessary to reconstruct a 3D model of the physical phenomenon and to transform the 3D model into a solid object. Section 4 describes the current status of our research on these issues.
3 Related Research on 3D Reconstruction

There are two approaches to measuring the shape of a three-dimensional object and reconstructing it as a 3D model: a passive method, which photographs the object with cameras, and an active method, which projects laser or structured light onto the target object. A typical passive method reconstructs the surface shape of the target object from multiple camera images taken from different viewpoints [6, 7]. In this case, a relatively simple system can be used compared with the active method. Three-dimensional reconstruction from multiple viewpoints usually consists of three steps: image capturing, camera calibration, and precise shape reconstruction.

3D reconstruction from multiple viewpoints is based on shooting the target with multiple cameras fixed around it. With this method, camera calibration can be performed in advance, and the internal and external parameters of the cameras can be obtained with high accuracy and stability. Furthermore, three-dimensional reconstruction of a moving object is possible by synchronizing the cameras when taking images. There is also a method of fixing the camera to a robot arm and moving the arm, or fixing the camera or the target object to a rotary table and rotating the table, to take multi-viewpoint images [6], but with this method it is not possible to shoot a moving object. Recently, free-moving shooting with a monocular camera has also been used as a method with fewer restrictions on shooting [7]. With this method, it is possible to perform 3D reconstruction of a large-scale object such as an entire building or even a city, as well as of a small object such as a table. It is difficult to perform camera calibration in advance with this method, but in recent years feature-based image matching such as SIFT [8] and SURF [9] has enabled stable image matching between multi-view images.

Active three-dimensional shape measurement recovers the shape of a target object from a single image by projecting fixed pattern light onto the target object and performing high-speed projection or high-speed shooting. This method has attracted attention because it can measure the 3D shape of a moving object. One such method is TOF (Time-of-Flight), which obtains three-dimensional information by measuring, for each pixel, the reflection time of pulsed light applied to the target object. A typical example is Kinect [10], which Microsoft sold as a product until recently. Kinect emits pulse-modulated infrared light and measures the distance to the target based on the time delay of the light reflected from the target. Since it is relatively inexpensive, it was widely used not only for applications such as games but also for research on human-computer interaction, as a device that can easily measure human movements.
4 Generation of 3D Model of Fluid Art

4.1 Basic Concept and Method

As mentioned in Sect. 2, the fluid art “Sound of Ikebana” created by one of us, Naoko Tosa, has been evaluated as expressing Japanese beauty. To explore the reasons for this, it is necessary to investigate the essential questions of what beauty is and what Japanese beauty is. As part of this research, one idea is to make “Sound of Ikebana,” which currently exists as video art, into a three-dimensional object. Because its shape can then be observed from various directions, it becomes possible to collect impressions and comments from many people, including art critics. Also, as described in Sect. 2, there will be various application areas for “3D Sound of Ikebana.” Based on this idea, we decided to carry out the 3D reconstruction of “Sound of Ikebana” and its 3D materialization. This section describes the method of 3D reconstruction and the results.

Regarding 3D reconstruction, there are passive methods and active methods, as described in Sect. 3. Since the created form can be seen only by using a high-speed camera at 2000 frames/second, the problem we face is the three-dimensional reconstruction of an extremely fast-moving object. For the three-dimensional reconstruction of a high-speed moving object, an active method that projects structured images using a high-speed projector of 1000 frames/second and captures them with a high-speed video camera has recently been proposed [11]. However, the 1000 frames/second projector is not
commercially available as a mass-produced product at this time, and its resolution is not sufficient. Furthermore, the phenomenon of “Sound of Ikebana” occurs in a small region of approximately 10 cm³, and it is difficult to project accurately into such a small region. On the other hand, the passive method offers two options: using multiple still cameras or using multiple high-speed video cameras. Although synchronous shooting with multiple still cameras allows a shooting system to be built relatively inexpensively, the timing of shooting becomes a problem. The forms in “Sound of Ikebana” can be seen only by shooting with a high-speed camera at 2000 frames/second, and the moments at which beautiful forms suitable as a basis for 3D reconstruction appear are extremely limited. Relevant images must be selected by reviewing a huge number of shot images. Furthermore, when materializing from a 3D model, the usable moments are even more limited, because not only the beauty of the selected image but also the preservation of the resulting 3D object must be considered. It is therefore very difficult to use multiple still cameras and choose an adequate shutter timing by human intuition. In fact, in this research we first set up a system using multiple still cameras and conducted various shooting experiments, but because the result depends on the timing of pressing the shutter, it was extremely difficult to obtain shots good enough for beautiful modeling. Based on the above considerations and preliminary experiments, we decided to adopt a passive method that shoots “Sound of Ikebana” with multiple high-speed cameras to obtain its 3D model.

4.2 Camera Settings

Figure 5 illustrates the setting for shooting, and Fig. 6 shows the actual setting. The system consisting of multiple high-speed cameras is called MEMRECAM and is manufactured by NAC.
The camera unit (Mcam V004) has 2 megapixels and a shooting speed of 2000 frames/second. We placed 6 to 8 cameras around the speaker, on which colored paints are made to jump up by sound vibration. Our final goal is to shoot from all 360 degrees to perform a complete 3D reconstruction, but as a first step we used 6 to 8 high-speed cameras, arranged so that they surrounded the speaker over an arc of about 120 to 180 degrees.

4.3 Three-Dimensional Shape Reconstruction by Phase-Only Correlation Method

The reason for adopting a passive method for 3D shape reconstruction from 2D images was explained in Sect. 4.1. We surveyed passive methods in Sect. 3; here, we adopted the Phase-Only Correlation method [12, 13], which is newer than the conventional methods. The method is briefly described below. 3D shape reconstruction by the passive method consists of (1) estimation of 3D points using image matching, and (2) generation of 3D models from 3D points. Among these processes, the image matching method for estimating 3D points has a significant effect
Fig. 5. Shooting system using eight high-speed cameras.
Fig. 6. Actual setting.
on the accuracy, robustness, and computational cost of the multi-view stereo algorithm, and is an important factor that determines the algorithm's performance. Conventional multi-view stereo algorithms use image matching based on Normalized Cross-Correlation (NCC) [14]. In this approach, image matching is performed repeatedly while the coordinates of the 3D points are changed in discrete steps, and the 3D coordinates with the highest matching score are used as the points for reconstruction. Highly accurate 3D reconstruction requires a very fine step size, which makes the computational cost huge. Phase-Only Correlation (POC) [12, 13], on the other hand, is an image matching method that uses only the phase information of the images. The two images are Fourier transformed, and the POC function is calculated from their respective spectra. If the two images differ only by a translation, an ideal peak model of the POC function can be defined. The height of the peak of the POC function then corresponds to the similarity between the images, and the coordinates of the peak correspond to the translational displacement between the images. When POC is applied to multi-view stereo, k stereo pairs are formed from one reference viewpoint and k neighboring viewpoints, and POC is used for local window matching within each stereo pair. By estimating the coordinates of the 3D points from the peak model of the POC function, the 3D coordinates can be estimated with higher accuracy than with the conventional NCC-based method. This is particularly effective in a camera setting with a relatively narrow baseline, such as the camera setting of our experiment.
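As an illustration of the POC principle described above, the following is a minimal NumPy sketch, not the TORESYS 3D implementation or the algorithm of [12, 13]. It assumes two equally sized grayscale images related by a circular integer translation; the function names `phase_only_correlation` and `estimate_shift` are hypothetical, and the sub-pixel peak-model fitting used in practice is omitted.

```python
import numpy as np

def phase_only_correlation(f, g, eps=1e-12):
    """Return the POC surface of two equally sized grayscale images."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = np.conj(F) * G                  # cross-power spectrum
    R = cross / (np.abs(cross) + eps)       # discard magnitude, keep phase only
    return np.real(np.fft.ifft2(R))

def estimate_shift(f, g):
    """Estimate the integer translation (dy, dx) that maps f onto g."""
    poc = phase_only_correlation(f, g)
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    # Peak indices past the midpoint correspond to negative (wrapped) shifts.
    shift = tuple(int(p) if p <= n // 2 else int(p) - n
                  for p, n in zip(peak, poc.shape))
    return shift, float(poc[peak])

# Synthetic check: a random texture and a circularly shifted copy of it.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
moved = np.roll(image, shift=(5, -3), axis=(0, 1))
shift, peak_height = estimate_shift(image, moved)
```

For a pure translation the POC surface is close to a single sharp peak: its position gives the displacement and its height (near 1 here) indicates the similarity, which matches the interpretation given in the text.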
To perform 3D reconstruction of a very complicated shape such as the one in “Sound of Ikebana,” it is necessary to place the cameras at viewpoints close to each other and shoot with a narrow baseline. However, as the baseline becomes narrower, the influence of image matching errors on the error of the 3D shape becomes larger, so a highly accurate image matching method is required. Therefore, in this experiment, we decided to use a method based on POC. The POC method has already been commercialized under the name “TORESYS 3D™” by Toppan Printing Co., Ltd., and in this experiment we performed 3D reconstruction using this software. A normal object, which is mainly composed of convex surfaces, can be reconstructed into a 3D model with relative ease, but “Sound of Ikebana” has a very organic and complicated shape, and it is difficult to reconstruct its 3D model. Specifically, it was necessary to cope with the following problems.
Problem 1: Restoration accuracy is low when the number of high-speed cameras is small.
Solution 1: It was confirmed that the accuracy improved when the number of cameras was increased from 2 to 4 and then to 8. In the installation layout of the cameras, the restoration width is minimized to increase the restoration accuracy. (This means more cameras are required to restore the entire image.) The processing parameters were fine-tuned to make it easier to restore the 3D shape.
Problem 2: Restoration accuracy cannot be improved unless the fluid is imaged with fine feature points.
Solution 2: When a gold paint is used, the material of the paint appears on the surface like a pattern (roughness), and good restoration results are obtained. Moreover, to reduce
the overexposure of the image, the lighting was covered with tracing paper to diffuse the light.
Problem 3: If the object in the image is blurred, the accuracy of the 3D reconstruction decreases.
Solution 3: The restoration result is good for an image in which the entire surface of the subject is in focus, but it is difficult to obtain good results consistently. Adjustments such as stopping down the camera aperture to increase the depth of field and adjusting the intensity of the illumination need to be considered.

Figure 7 shows a 3D model of “Sound of Ikebana” obtained as a result of various trials considering the above issues. Fig. 8 shows the mesh model.
Fig. 7. 3D model of “Sound of Ikebana.”
Fig. 8. Mesh pattern of “Sound of Ikebana” 3D model.
4.4 The 3D Materialization of Fluid Art

As a next step, we attempted to make “3D Sound of Ikebana” with a 3D printer using the 3D model shown in Fig. 8. We used a commercial 3D printer (3DUJ-553), which is mainly used for the production of figures and miniatures. The obtained 3D object is shown in Fig. 9.
Fig. 9. “Sound of Ikebana” obtained as a 3D object.
It can be seen that the three-dimensional shapes shown in Fig. 7 and Fig. 8 are reproduced with certain accuracy. The shape of “Sound of Ikebana” is extremely organic, and there are several thin and elongated columnar parts. Because of this, it was difficult to reproduce it with a 3D printer at once. Also, we had to consider the issue of safe transport. Therefore, we created the 3D shape of Fig. 8 by dividing the 3D model into several parts, created them as individual parts with a 3D printer, and combined them. This time, the work of dividing it into multiple parts was done manually. How the entire 3D model with a complex shape could be optimally divided for a 3D printer is an important research theme.
5 Conclusion

We have been making artworks called “fluid art” that utilize phenomena based on fluid dynamics. One such artwork, “Sound of Ikebana,” has been evaluated as “expressing Japanese beauty,” and it has also been noted that “if it is made into a 3D object, it may have various applications.” This paper described our attempt to make “3D Sound of Ikebana.” First, we explained fluid art and investigated why it is evaluated as “expressing Japanese beauty.” We proposed the hypothesis that fluid art is created by discovering beauty hidden in nature, which has been the traditional process of art creation in Japan, and that this is why fluid art is evaluated as embodying Japanese beauty. We also explained that if fluid art represented by “Sound of Ikebana” could be made into a 3D object, it has a wide range
of applications: not only can it be exhibited as a new art form in public spaces, but it can also be applied to the design of everyday items such as cups and vases. There is also the possibility of applying it to new organic designs for architecture, cars, and trains. Next, we described how to develop “3D Sound of Ikebana.” For 3D shape reconstruction, there are two methods: a passive method and an active method. Based on our survey of these technologies, we decided to adopt the passive method for reconstructing a 3D model from multiple 2D images. Furthermore, among the various passive methods, we adopted the Phase-Only Correlation method, as it has several advantages over other methods. We carried out experiments using multiple still cameras but found it difficult to shoot the phenomenon at the relevant shutter timing. Therefore, for the actual experiment, we developed a system consisting of multiple high-speed cameras and succeeded in capturing multiple beautiful scenes of “Sound of Ikebana.” Then, using the Phase-Only Correlation method, the 3D model of “Sound of Ikebana” was reconstructed, and it was materialized as a 3D object using a 3D printer. Our research and experiments are still at an early stage, and we will continue to study how to improve the quality of the obtained 3D fluid art. We will also pursue how to apply the 3D materialized fluid art to various areas of our society.
References

1. Steve, M., Cahn, A.M.: Aesthetics: A Comprehensive Anthology. Blackwell Publishing (2007)
2. Naoko, T., Yunian, P., Qin, Y., Ryohei, N.: Pursuit and expression of Japanese beauty using technology. Arts J. MDPI 8(1), 38 (2019)
3. Bruce, R.M., et al.: Fundamentals of Fluid Mechanics. Wiley (2012)
4. Yunian, P., Liang, Z., Ryohei, N., Naoko, T.: A study of variable control of sound vibration form (svf) for media art creation. In: 2017 International Conference on Culture and Computing (2017)
5. Murat, D.: A Study on Bruno Taut’s Way of Thought: Taut’s Philosophy of Architecture. LAP LAMBERT Academic Publishing (2011)
6. Seitz, S.M., Curless, B., Diebel, J., Scharstein, D., Szeliski, R.: A comparison and evaluation of multi-view stereo reconstruction algorithms. In: Proceedings of International Conference Computer Vision and Pattern Recognition, pp. 519–528 (2006)
7. Strecha, C., von Hansen, W., Gool, L.V., Fua, P., Thoennessen, U.: On benchmarking camera calibration and multi-view stereo for high-resolution imagery. In: Proceedings International Conference Computer Vision and Pattern Recognition, pp. 1–8 (2008)
8. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
9. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
10. https://en.wikipedia.org/wiki/Kinect
11. Murayama, S., Torii, I., Ishii, N.: Development of projection mapping with utility of digital signage. In: IIAI 3rd International Conference on Advanced Applied Informatics, pp. 895–900 (2014)
12. Sakai, S., Ito, K., Aoki, T., Masuda, T., Unten, H.: An efficient image matching method for multi-view stereo. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7727, pp. 283–296. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37447-0_22
13. Shuji, S., Koichi, I., Takafumi, A., Takafumi, W., Hiroki, U.: Phase-based window matching with geometric correction for multi-view stereo. IEICE Trans. Inf. Syst. 98(10), 1818–1828 (2015)
14. Jae-Chern, Y., Tae, H.H.: Fast normalized cross-correlation. Circuits Syst. Signal Process. 28, 819 (2009)
15. Shozo, S.: Ikebana: The Art of Arranging Flowers. Tuttle Publishing (2013)
A 3D Flower Modeling Method Based on a Single Image

Lin Jiaxian, Ju Ming, Zhu Siyuan, and Wang Meili(B)

College of Information Engineering, Northwest A and F University, Yangling 712100, China
[email protected], {jm55480,2017050988,wml}@nwsuaf.edu.cn
Abstract. Because the structure of flowers is complex, modeling them is very challenging. This paper collects 3D scenes containing flower models, uses 3dsMax to extract the individual flower models, and constructs a flower dataset. It proposes an encoder-decoder network called MVF3D. The trained MVF3D network predicts the missing view information: from a single RGB image it generates depth maps of the flower from different viewpoints, and these depth maps are then used to reconstruct the flower model. For simple flowers, the average chamfer distance between the reconstructed 3D model and the real model is 0.27. The experimental results show that the proposed method can preserve the true structure of the flower.

Keywords: Flower models · 3D reconstruction · Deep learning
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 422–433, 2020. https://doi.org/10.1007/978-3-030-65736-9_38

1 Introduction

With the development of visualization, virtual reality, and computer vision technology, 3D modeling is increasingly widely used in virtual reality games. Flower models increase the diversity of game scenes. Although modeling tools such as Maya, 3dsMax, and AutoCAD have developed rapidly in recent years, they still carry a learning cost, and researchers cannot readily use them to complete the 3D reconstruction of flowers efficiently. Generating exquisite flower models from simple input data has therefore become meaningful work.

Traditional flower modeling methods simulate plant growth through mathematical modeling but ignore the individual structure of flowers [1]. Interactive hand-drawn approaches [2, 3] not only require complicated interactive operations but also rely too much on the user’s subjective understanding of the flower structure, and the reconstructed flower models lack realism. In daily life, users can easily obtain RGB images of flowers with mobile phones, digital cameras, and other devices, and more and more researchers are studying plant modeling methods based on single or multiple RGB images. Using multiple RGB images and simple interactive operations can generate realistic flower models [4]. Similarly, a
real natural vegetation scene can be constructed from a single RGB image of a tree and the contour of its crown [5]. With the development of deep learning, inferring a 3D structure from a 2D image is no longer difficult [6–9]. However, there are few studies on 3D flower modeling with deep learning, and there is a lack of flower datasets that can be used for deep learning training. In view of these problems, this paper aims to speed up 3D flower modeling. It presents an algorithm for generating RGBD images at different viewing angles from a single flower image, and a method for reconstructing the flower mesh model from RGBD images at multiple viewing angles. The method avoids manual interaction, improves model reuse, and has application value for the production of 3D game scenes.

The structure of this paper is as follows. Section 2 introduces the composition of the flower dataset. Section 3 introduces MVF3D, a deep learning network that uses a single flower image to generate multi-view RGBD images. Section 4 introduces how to use RGBD images from different perspectives to perform 3D reconstruction of the flower model. Section 5 presents conclusions and future work.
2 Flower Dataset

Deep learning learns regularities from a dataset, so constructing flower data that meets the input requirements of a deep neural network is the first goal of this paper. The 520 scene models containing flowers used in this paper come from websites that provide 3D scene models. Separating them manually yields 4106 individual flower models, which are then rendered using Panda3D to obtain a dataset suitable for deep neural networks.

2.1 Flower Model Preprocessing

This paper collected 520 3D scene models containing flowers from the Internet. An example is shown in Fig. 1. To facilitate the 3D reconstruction of the flower model, the flower clusters in each scene model must be segmented so that each segmented model contains only one flower. Current automatic semantic segmentation algorithms based on the skeleton or feature points of a 3D model have the limitation that segmentation quality degrades as model complexity increases. Therefore, this paper used the 3D modeling tool 3dsMax to process the scene models containing flowers.
Fig. 1. Scene models containing flowers collected from the internet
Each extracted flower model is then preprocessed. The preprocessing ensures that the extracted flower models are roughly the same size and that the center of each model is located at the origin of the world coordinates. The flower models after preprocessing are shown in Fig. 2.
Fig. 2. Single flower models segmented by 3dsMax
2.2 Panda3D Rendering Flower Model

When constructing training samples with Panda3D, the camera elevation angle is set to a random angle between −10° and 30° to increase the robustness of the network. The camera rotation angle is advanced by 15° at a time, so each model yields 24 images. Regarding the generation of depth maps, Panda3D uses UVN cameras to render objects, and the depth value is the distance from the camera position in UVN coordinates. In Fig. 3, the left image shows the depth map rendered by Panda3D, and the right image shows the RGB image rendered by Panda3D. It can be seen that the generated RGBD images reflect the shape of the flower from different perspectives.

2.3 Distinction of Flower Dataset

The quality of the RGB images and depth maps generated by deep learning depends on how accurately the 3D information is encoded. To reduce the training difficulty of the deep neural network, this study divides the flower dataset into simple patterns and complex patterns according to morphology. The two pattern types form separate datasets, which are trained separately. A simple flower pattern is defined as one with a clear petal structure, in which the human eye can distinguish the number of petals. The complex flower pattern cannot be
Fig. 3. Panda3D rendering effect of flower model
judged by the human eye in terms of the number and approximate structure of its petals; self-occlusion between the petals is severe, and there is no clear distribution rule. Figure 4 shows examples of simple patterns and complex patterns, with simple patterns on the left and complex patterns on the right. Of the 4106 individual flower models, 1310 are classified as complex models and 2796 as simple models.
Fig. 4. Schematic diagram of pattern differentiation
After each flower model is rendered by Panda3D, 24 RGBD images at different viewing angles are obtained. Choosing two different RGBD images from the 24 gives 276 image pairs. In each pair, the RGB image of one view serves as the input, and the RGB image $y_i$ and depth map $d_i$ of the other view serve as the ground-truth labels. Since either view of a pair can be the input, each model yields 552 samples. The hold-out method is used to divide the samples into a training set and a test set. The division of the flower dataset is shown in Table 1. This completes the construction of a 3D flower dataset that can be used for deep neural network training.
Table 1. Preprocessed flower datasets.

Types of dataset        | Number of training samples | Number of test samples
Simple flower dataset   | 1 389 052                  | 154 340
Complex flower dataset  | 650 808                    | 72 312
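The sample counts above can be checked with a short Python sketch. It is only an illustration of the counting argument, not the authors' Panda3D rendering code; the helper `sample_views` is hypothetical, and drawing a fresh random elevation for every view (rather than once per model) is an assumption.

```python
import random
from itertools import combinations, permutations

def sample_views(seed=0):
    """Return 24 (azimuth, elevation) pairs in degrees for one model."""
    rng = random.Random(seed)
    # Azimuth advances in 15-degree steps; elevation is random in [-10, 30].
    return [(az, rng.uniform(-10.0, 30.0)) for az in range(0, 360, 15)]

views = sample_views()
unordered_pairs = list(combinations(range(len(views)), 2))
# Each ordered pair is one training sample: (input view, target view).
ordered_samples = list(permutations(range(len(views)), 2))

SIMPLE_MODELS, COMPLEX_MODELS = 2796, 1310
simple_total = SIMPLE_MODELS * len(ordered_samples)
complex_total = COMPLEX_MODELS * len(ordered_samples)
print(len(views), len(unordered_pairs), len(ordered_samples))
print(simple_total, complex_total)
```

The two totals equal the sums of the training and test columns in Table 1 for the simple and complex datasets, respectively, and the test share in each row is about one tenth.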
3 Generate Flower RGBD Images Based on MVF3D

3.1 MVF3D Architecture

MV3D (Multi-view 3D Models from Single Images) is a simple and elegant encoder-decoder network [7]. Given any view vector θ, the MV3D network can generate RGB images and depth maps of objects. To adapt MV3D to the flower dataset constructed in this paper, its network structure is modified. The modified network structure is shown in Fig. 5.
Fig. 5. MVF3D algorithm architecture
This paper names the modified network structure MVF3D (Multi-view 3D Flower Models from Single Images). To obtain a more detailed flower model, the MVF3D network fixes the input image resolution at 256 × 256 and adds, after each convolution layer, a further convolution that does not change the size of the feature map. This increases the depth of the network so that MVF3D can extract more abstract information from the RGB image of a flower [10]. After six convolution stages, a 4 × 4 × 256 feature map is obtained. The resulting 4096-dimensional vector is passed through the fully connected layer fc1 (4096 × 1024) to extract a 1024-dimensional feature representation of the flower model. The encoding process also requires the target view θ, which consists of five variables: the distance rad between the UVN camera and the object, the sine sin el and cosine cos el of the camera elevation angle el, and the sine sin az and cosine cos az of the camera rotation angle az. The target view vector θ = (rad, sin el, cos el, sin az, cos az) is processed by three fully connected layers, and the resulting 64-dimensional vector is combined with the 1024-dimensional vector obtained from the flower RGB image. This completes the task of the encoder. Given the resulting 1088-dimensional (1024 + 64) encoding vector, the decoder applies three fully connected layers to it, applies deconvolutions that mirror the structure of the encoder, and finally outputs the predicted RGB image and depth map for the target view. MVF3D uses the same loss function as MV3D:

$$ L = \sum_i \lVert y_i - \hat{y}_i \rVert_2^2 + \lambda \lVert d_i - \hat{d}_i \rVert_1 \qquad (1) $$

where $\hat{y}_i$ and $\hat{d}_i$ are the RGB image and depth map output by the MVF3D network, $y_i$ and $d_i$ are their corresponding labels, and λ is a hyperparameter. Considering the training cost, this study restricts the hyperparameter λ to the range from 0.1 to 1.0
in steps of 0.1. Experiments show that good results are achieved when λ = 0.8. Interested researchers may search for a better value of λ.

3.2 Experimental Results

The RGB images at different views generated by the networks trained on the simple flower models and on the complex flower models are shown in Fig. 6 and Fig. 7, respectively.
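To make the view vector θ and loss (1) from Sect. 3.1 concrete, the following NumPy sketch evaluates them on small arrays. It is an illustration only, not the MVF3D training code: the function names `view_vector` and `rgbd_loss` are hypothetical, and treating the sum over i as a sum over the samples in a batch is an assumption.

```python
import numpy as np

def view_vector(rad, el_deg, az_deg):
    """Build theta = (rad, sin el, cos el, sin az, cos az) from angles in degrees."""
    el, az = np.deg2rad(el_deg), np.deg2rad(az_deg)
    return np.array([rad, np.sin(el), np.cos(el), np.sin(az), np.cos(az)])

def rgbd_loss(y_pred, y_true, d_pred, d_true, lam=0.8):
    """Squared Euclidean loss on RGB plus lam times L1 loss on depth, summed over samples."""
    rgb_term = np.sum((y_true - y_pred) ** 2)
    depth_term = np.sum(np.abs(d_true - d_pred))
    return float(rgb_term + lam * depth_term)

# Candidate values of lambda: 0.1, 0.2, ..., 1.0.
lambda_grid = [round(0.1 * k, 1) for k in range(1, 11)]

theta = view_vector(rad=2.0, el_deg=30.0, az_deg=90.0)

# Toy batch: one 2x2 RGB image and one 2x2 depth map, predictions all zero.
y_true, y_pred = np.ones((1, 2, 2, 3)), np.zeros((1, 2, 2, 3))
d_true, d_pred = np.ones((1, 2, 2)), np.zeros((1, 2, 2))
loss = rgbd_loss(y_pred, y_true, d_pred, d_true, lam=0.8)
```

Encoding each angle by its sine and cosine avoids the discontinuity at 360°, which is one plausible reason for this choice of θ.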
Fig. 6. MVF3D generates multi-view results of simple flower model (the upper left is the input RGB image, the first row shows the real RGB images, and the second row shows the results generated by MVF3D)
Fig. 7. MVF3D generates multi-view results of complex flower model (the upper left is the input RGB image, the first row shows the real RGB images, and the second row shows the results generated by MVF3D)
For the simple flower model, the predicted RGB images are blurry compared with the real RGB images, and some detail is lost. The colors of the generated RGB images also deviate, but the original structure of the flower is roughly restored. The blurring is a consequence of the Euclidean loss function, which guides the network to average over the plausible images and therefore produces blurred results. The color deviation arises because loss function (1) penalizes errors in the shape of the object with both the Euclidean loss and the L1 loss, while color differences are penalized by
only the Euclidean loss. Therefore, MVF3D pays more attention to the geometric structure of flowers than to their colors. For 3D reconstruction, determining the geometric structure of objects is more important than color.

This paper uses mutual information, in its normalized form, to measure the similarity between a generated image and the real image. It is commonly used as a criterion or objective function in image registration; the larger the value, the more similar the two images. It works well when the difference in gray level between the images is not large. As shown in Fig. 8, for flowers with a simple structure, the mutual information between the target depth map and the predicted depth map is 0.6743 on average, which indicates that MVF3D has learned an abstract representation of the flower model and restored most of the depth information of the simple flower model.
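The paper does not state which normalization of mutual information it uses. As a hedged illustration, the sketch below estimates one common variant, NMI = 2 I(A;B) / (H(A) + H(B)), from a joint histogram; this definition, the bin count, and the function name `normalized_mutual_information` are all assumptions, so the values it produces are not directly comparable to those reported here.

```python
import numpy as np

def normalized_mutual_information(a, b, bins=16):
    """Estimate NMI = 2 I(A;B) / (H(A) + H(B)) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()        # joint distribution of intensity bins
    p_a = p_ab.sum(axis=1)            # marginal of the first image
    p_b = p_ab.sum(axis=0)            # marginal of the second image

    def entropy(p):
        p = p[p > 0]                  # skip empty bins (0 * log 0 = 0)
        return -np.sum(p * np.log(p))

    h_a, h_b, h_ab = entropy(p_a), entropy(p_b), entropy(p_ab.ravel())
    if h_a + h_b == 0:
        return 0.0                    # constant images: returned as 0 by convention here
    return float(2.0 * (h_a + h_b - h_ab) / (h_a + h_b))

rng = np.random.default_rng(0)
depth_a = rng.random((128, 128))
depth_b = rng.random((128, 128))      # statistically independent of depth_a
nmi_same = normalized_mutual_information(depth_a, depth_a)
nmi_independent = normalized_mutual_information(depth_a, depth_b)
```

With this definition, identical images score 1 and independent images score close to 0, so a higher value means the predicted depth map shares more structure with the target.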
Fig. 8. MVF3D generates multi-view depth maps of a simple flower model
Compared with the simple flower models, MVF3D performs poorly on the complex flower data, and the generated RGB images are distorted. Besides the more complex patterns and the smaller number of training samples, this is also because complex flowers have many petals and a complicated structure. As a result, MVF3D cannot accurately find the feature space corresponding to complex flowers, so it does not perform as well on complex flowers as on simple ones. As shown in Fig. 9, for flowers with complex shapes, MVF3D does not predict the corresponding 3D structure well. The average mutual information between the target depth map and the predicted depth map of the complex flower models, 0.3396, confirms this.
Fig. 9. MVF3D generates multi-view depth maps of a complex flower model
4 3D Reconstruction Based on RGBD Images of Flowers

To build a complete flower model, the algorithm in this section is divided into four steps. In the first step, the set of all matching feature point pairs $M = \{(m_1, m'_1), (m_2, m'_2), \ldots, (m_k, m'_k)\}$ is extracted from the RGB images. In the second step, a point cloud matching algorithm is applied in the depth map D to the feature points in M to obtain the camera poses at the different perspectives, that is, the rotation matrix R and the translation matrix t. The third step merges the point clouds to obtain the complete point cloud P. The fourth step uses P to generate the mesh data V. The algorithm framework is shown in Fig. 10.
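Steps two and three can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the matched feature points have already been lifted to 3D using the depth maps, and it uses the closed-form SVD (Kabsch) solution for a rigid transform between matched points, which is the alignment step applied inside ICP once correspondences are fixed. The function names are hypothetical.

```python
import numpy as np

def rigid_transform_from_matches(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (Kabsch/SVD solution)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

def merge_clouds(clouds, poses):
    """Transform each cloud into the reference frame and stack the results."""
    return np.vstack([c @ R.T + t for c, (R, t) in zip(clouds, poses)])

# Synthetic check: a view rotated by 15 degrees about the z axis and shifted.
angle = np.deg2rad(15.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
rng = np.random.default_rng(0)
view_points = rng.random((50, 3))                       # points in the view's frame
reference_points = view_points @ R_true.T + t_true      # same points in the reference frame

R_est, t_est = rigid_transform_from_matches(view_points, reference_points)
merged = merge_clouds([reference_points, view_points],
                      [(np.eye(3), np.zeros(3)), (R_est, t_est)])
```

In a full ICP loop the correspondences would be re-estimated and this alignment repeated until convergence; here the matches are given, so a single solve suffices.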
Fig. 10. Algorithm framework for 3D reconstruction of flowers
4.1 Flower RGB Image Feature Extraction and Matching Method

The ORB (Oriented FAST and Rotated BRIEF) algorithm is efficient at extracting and describing feature points [11]. It is fast and has stable rotation invariance. In addition, the flowers in the RGB images generated by MVF3D are essentially the same size, and adjacent RGB images differ in viewing angle by 15°. This paper therefore uses the ORB algorithm to extract features from the flower RGB images. As shown in Fig. 11, the circles in the picture mark the feature points detected by the ORB algorithm.
Fig. 11. ORB extracted flower RGB image features
After obtaining the ORB features of each flower view, this paper uses the Hamming distance in Eq. (2) as the similarity metric between feature points of flower images from different perspectives:

$$ d(x, y) = \sum_{i=1}^{n} x[i] \oplus y[i] \qquad (2) $$

where x and y are the n-dimensional ORB feature vectors extracted from the two flower pictures. Balancing efficiency and accuracy, this paper uses the FLANN algorithm [12] to match the feature points and descriptors of the flower RGB images. The rotation angles of the two flower pictures to be matched differ by 15°. Figure 12 shows the result of matching them with the FLANN algorithm.
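Eq. (2) can be illustrated with a small NumPy sketch. It assumes binary descriptors packed into bytes (ORB descriptors are commonly stored as 32 bytes, that is, 256 bits) and uses an exhaustive nearest-neighbour search as a simple stand-in for FLANN, which is an approximate search; the function names are hypothetical.

```python
import numpy as np

def hamming_distance(x, y):
    """Number of differing bits between two packed binary descriptors (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(x, y)).sum())

def match_descriptors(desc_a, desc_b):
    """Brute-force nearest neighbour; returns a list of (index_a, index_b, distance)."""
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    dist = np.unpackbits(xor, axis=2).sum(axis=2)    # pairwise Hamming distances
    nearest = dist.argmin(axis=1)
    return [(i, int(j), int(dist[i, j])) for i, j in enumerate(nearest)]

# Two single-byte descriptors that differ in exactly two bit positions.
x = np.array([0b10110000], dtype=np.uint8)
y = np.array([0b10011000], dtype=np.uint8)
two_bits = hamming_distance(x, y)

# Synthetic 256-bit descriptors: view B holds the same features in reverse order.
rng = np.random.default_rng(0)
desc_a = rng.integers(0, 256, size=(5, 32), dtype=np.uint8)
desc_b = desc_a[::-1].copy()
matches = match_descriptors(desc_a, desc_b)
```

Each returned triple pairs a feature in the first view with its closest feature in the second view, together with the Hamming distance that later screening can threshold.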
Fig. 12. FLANN algorithm matching result
It can be seen that there are a large number of mismatched points. Based on experience, this paper screens the matching results. The screening criterion keeps the matches whose Hamming distance is less than twice the minimum distance, and the minimum distance should be greater than 30. After screening, the matching results of the FLANN algorithm are shown in Fig. 13. A large number of mismatched point pairs are filtered out, leaving the correctly matched point pairs, which also reduces the burden of the subsequent camera pose estimation.
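The screening rule can be read in more than one way. The sketch below implements one common reading, stated here as an assumption: keep a match if its distance is at most max(2 × minimum distance, 30), so that the value 30 acts as a floor on the threshold when the minimum distance is very small. The function name `screen_matches` and the tuple layout are hypothetical.

```python
def screen_matches(matches, floor=30.0, ratio=2.0):
    """Keep matches whose distance is at most max(ratio * min_distance, floor).

    `matches` is a list of (index_a, index_b, distance) tuples.
    """
    if not matches:
        return []
    min_dist = min(d for _, _, d in matches)
    threshold = max(ratio * min_dist, floor)
    return [m for m in matches if m[2] <= threshold]

# Minimum distance 4: twice the minimum is 8, so the floor of 30 applies.
kept_small_min = screen_matches([(0, 0, 4), (1, 1, 25), (2, 2, 31), (3, 3, 80)])
# Minimum distance 20: the threshold is 2 * 20 = 40.
kept_large_min = screen_matches([(0, 0, 20), (1, 1, 39), (2, 2, 41)])
```

Without a floor, a near-perfect best match would make the threshold so tight that almost every other correct match would be discarded.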
Fig. 13. FLANN algorithm matching screening result
4.2 Flower Point Cloud Registration Method
In this paper, the ICP algorithm [13] is used to solve the rotation matrix R and translation matrix t between the flowers of different views generated by MVF3D. Since there are 24 flower images with different viewing angles, this paper specifies that the first image rendered by Panda3D defines the standard coordinate system, and the remaining view images need to be transformed into the standard coordinate system through rotation and translation operations. In order to facilitate the measurement of the error with the real flower
model, after the point cloud registration, the camera pose of the first picture is used to convert the point cloud into the original world coordinate system and normalize it. ORB extracts feature point pairs poorly when the image rotation angle is too large. Therefore, when solving for the rotation matrix $R_{Si}$ and the translation matrix $t_{Si}$, we first convert the coordinates of perspective image i to those of the intermediate image i−1 (or i+1) that is closer to the standard coordinate system, finding the rotation matrix $R_{(i-1)i}$ or $R_{(i+1)i}$ and the translation matrix $t_{(i-1)i}$ or $t_{(i+1)i}$, and then convert them into the coordinates of the standard image S through the intermediate perspectives. In this way, the error can be evenly distributed over the transformations. Since the total number of points obtained after registration is uncertain, this study uses the chamfer distance to measure the gap between the real flower point cloud and the predicted flower point cloud. The definition of the chamfer distance is shown in Eq. (3).

$$CD(S_1, S_2) = \frac{1}{|S_1|} \sum_{x \in S_1} \min_{y \in S_2} \|x - y\|_2 + \frac{1}{|S_2|} \sum_{y \in S_2} \min_{x \in S_1} \|y - x\|_2 \tag{3}$$
$S_1$ is the target point cloud, and $S_2$ is the predicted point cloud. The chamfer distance calculates the average error between two point clouds. Figure 14 shows the flower point cloud model after point cloud registration using 6 depth maps whose viewing angles differ from the standard coordinates by 0, 60, 105, 165, −135, and −30 degrees. It can be seen that the point cloud is relatively evenly distributed. The average chamfer distance between the real point cloud and the predicted point cloud is 0.2703, which also confirms this view.
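The chamfer distance of Eq. (3) can be written directly from its definition. The sketch below is a brute-force version for small clouds, assuming each cloud is a list of coordinate tuples; the function name is hypothetical, and a practical implementation would normally use a spatial index for the nearest-neighbour search.

```python
import math


def chamfer_distance(s1, s2):
    """Symmetric chamfer distance between two point clouds (Eq. 3)."""
    if not s1 or not s2:
        raise ValueError("both point clouds must be non-empty")

    def mean_nearest(src, dst):
        # Average, over src, of the distance to the closest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(s1, s2) + mean_nearest(s2, s1)
```

Because each term is normalised by the size of its own cloud, the two clouds do not need to contain the same number of points, which is the property the text relies on.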
Fig. 14. Flower point cloud after registration by ICP algorithm
4.3 Flower Meshing Method
For the flower model, the petal surface satisfies the smoothness condition, and the point cloud generated by multi-view flower image registration has no holes. Therefore, this paper uses the greedy projection triangulation algorithm [14] to convert the simple flower point cloud into a mesh model and obtain the final result of the simple flower 3D reconstruction, as shown in Fig. 15. It can be seen that the generated mesh model restores the general shape of the flower model. However, due to the blurry RGBD images output by the MVF3D network and the viewing angle estimation error, the created 3D model has some noise and has lost some
Fig. 15. 3D mesh reconstruction effect of a simple flower
details. Compared with converting a single real RGBD image directly into a flower mesh model, as shown in Fig. 16, the flower model generated in this paper reduces the probability of holes occurring and saves the subsequent hole-filling steps.
Fig. 16. Simple flower mesh model reconstructed from a single RGBD image
5 Conclusions and Future Works
This paper improved the MV3D network structure and constructed a flower model dataset for MVF3D network training. The network can infer the RGB images and depth maps of a simple flower model at different viewing angles from a single input flower image. The depth maps were used to reconstruct the 3D mesh model of the flower. This work required less manual interaction and improved the modeling efficiency. Because MVF3D performs poorly on complex flower models, future work will consider using image segmentation algorithms [15, 16] to segment each part of the flower, and then modeling and combining the parts to form a complete flower. We will also consider collecting more flower samples to improve the performance of the network on complex flower models. For the details of the reconstruction, we will consider training with a generative adversarial network [17] to obtain clearer RGB images. For point cloud registration, a deep neural network may be introduced to improve the registration accuracy of point clouds from different perspectives [18, 19].
References
1. Prusinkiewicz, P.: A look at the visual modeling of plants using L-systems. In: Hofestädt, R., Lengauer, T., Löffler, M., Schomburg, D. (eds.) GCB 1996. LNCS, vol. 1278, pp. 11–29. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0033200
2. Steven, L., Adam, R., Frederic, B., Przemyslaw, P.: TreeSketch: interactive procedural modeling of trees on a tablet. In: Proceedings of the International Symposium on Sketch-Based Interfaces and Modeling, pp. 107–120. Eurographics Association, Goslar (2012)
3. Lintermann, B., Deussen, O.: A modelling method and user interface for creating plants. Computer Graphics Forum, pp. 73–82. Wiley-Blackwell, Malden (1998)
4. Quan, L., Tan, P., Zeng, G., Yuan, L., Wang, J., Kang, S.B.: Image-based plant modeling. ACM Trans. Graph. 25(3), 599–604 (2006)
5. Argudo, O., Chica, A., Andujar, C.: Single-picture reconstruction and rendering of trees for plausible vegetation synthesis. Comput. Graph. 57, 55–67 (2016)
6. Chao, W., Zhang, Y., Li, Z., Fu, Y.: Pixel2mesh++: multi-view 3D mesh generation via deformation. In: ICCV, pp. 1042–1051 (2019)
7. Tatarchenko, M., Dosovitskiy, A., Brox, T.: Multi-view 3D models from single images with a convolutional network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 322–337. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_20
8. Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., Jiang, Y.-G.: Pixel2Mesh: generating 3D mesh models from single RGB images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 55–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_4
9. Zhou, T., Tulsiani, S., Sun, W., Malik, J., Efros, A.: View synthesis by appearance flow. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 286–301. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_18
10. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. ICLR, 1–14 (2015)
11. Ethan, R., Vincent, R., Kurt, K., Gary, B.: ORB: an efficient alternative to SIFT or SURF. In: ICCV, pp. 2564–2571 (2011)
12. Marius, M., David, G.: Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 36(11), 2227–2240 (2014)
13. Besl, P., Mckay, N.: A method for registration of 3-D shapes. Int. Soc. Optic. Eng. 14(3), 239–256 (1992)
14. Liu, J., Bai, D., Chen, L.: 3-D point cloud registration algorithm based on greedy projection triangulation. Appl. Sci. 8(10), 1–10 (2018)
15. Zhang, Z., Duan, C., Lin, T., Zhou, S., Wang, Y., Gao, X.: GVFOM: a novel external force for active contour based image segmentation. Inf. Sci. 506, 1–18 (2020)
16. Wang, W., Wang, Y., Wu, Y., Lin, T., Li, S., Chen, B.: Quantification of left ventricle via deep regression learning with contour guidance. IEEE Access 7, 47918–47928 (2019)
17. Goodfellow, I.J., et al.: Generative adversarial networks. Adv. Neural. Inf. Process. Syst. 3, 2672–2680 (2014)
18. Wang, Y., Xie, Z., Xu, K., Dou, Y., Lei, Y.: An efficient and effective convolutional autoencoder extreme learning machine network for 3D feature learning. Neurocomputing 174, 988–998 (2015)
19. Dou, P., Kakadiaris, I.A.: Multi-view 3d face reconstruction with deep recurrent neural networks. Image Vis. Comput. 80, 80–91 (2018)
Dynamic 3D Scanning Based on Optical Tracking
Han Jiangtao, Yao Longxing, Yang Long, and Zhang Zhiyi(B)
Northwest A and F University, College of Information Engineering, Yangling, Shaanxi, China
[email protected], [email protected]

Abstract. In order to solve the problem that traditional binocular vision 3D scanning requires multiple scans and then registration, a dynamic 3D scanning method based on optical tracking is proposed. The first part is monocular optical scanning, which uses laser stripes as the active light source to achieve the function of 3D scanning. The second part is monocular optical tracking, which realizes the calculation and tracking of the position and posture of the scanning device, and instantly converts the point cloud data obtained by the scanning camera to the tracking camera coordinate system, realizing real-time data registration during the dynamic scanning process. The experimental results show that this method can achieve dynamic scanning to obtain the point cloud information of the object. The accuracy of the scan is about 1 mm in the direction of the XOY coordinate plane of the coordinate system and about 0.08 mm in the direction of the z-axis. At the same time, the scanning time can be reduced by 50% compared with the traditional binocular vision 3D scanning method.

Keywords: Dynamic 3D scanning · Optical tracking · Real-time registration

1 Introduction
3D scanning technology can scan the shape of a physical object to obtain 3D point cloud data of its surface. At present, most 3D scanning equipment that uses structured light or a laser as the active light source needs to perform multiple scans from different viewpoints and then register each part of the data to obtain complete data of the object. Because this process does not solve the problem of dynamic scanning, its application scenarios are extremely limited, and the time consumption and accuracy loss of the subsequent registration affect its ease of use. Since the 1990s, camera calibration methods based on specific targets have made successive breakthroughs. The classic five-point algorithm, six-point algorithm, calibration algorithm [9,11], self-calibration algorithm, and Bundle Adjustment algorithm [2,6] have appeared one after another. In 2003, Andrew Davison proposed Visual SLAM based on a monocular camera, published the groundbreaking MonoSLAM [3], and opened the era of visual positioning with Visual SLAM [1]. Recent research shows that camera calibration using an LCD (Liquid Crystal Display) [4] can reach an accuracy of 0.018 pixels. Zhiyi Zhang and Lin Yuan put forward a method to realize 3D scanning by monocular vision combined with laser stripes [10]. In addition, the best paper of CVPR 2019 [8] starts with Shape-from-X and can obtain millimeter-level accuracy. Among the classical calibration algorithms, Zhang Zhengyou's algorithm needs several images taken at different positions and postures to get accurate results, and it is difficult to obtain position and posture information from only one image. Tsai's two-step calibration algorithm can be used for reference [5,7]. Different from the existing methods, the method proposed in this paper decomposes the traditional 3D scanning system based on laser stripes and binocular vision into two parts: one part is a scanning system based on laser stripes and monocular vision, and the other part is an optical tracking system based on monocular vision.

Supported by "the Fundamental Research Funds for the Central Universities", and "National Natural Science Foundation of China, No.61702422".
© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 434–441, 2020. https://doi.org/10.1007/978-3-030-65736-9_39
2 Methodology
2.1 Overall System Description
The dynamic scanning system designed in this paper is mainly composed of two parts, as shown in Fig. 1. The first part is a monocular optical tracking system, consisting of a tracking camera. It mainly implements the calculation and tracking of the position and posture of the scanning device. First, the position and posture of the scanning camera in the tracking camera coordinate system are calibrated, and then the position and posture between the scanning camera and the tracking target in the tracking camera coordinate system are calibrated. The second part is a monocular optical 3D scanning system, consisting of a scanning camera, a tracking target, and a laser. Using laser stripes as the active light source, the center of the light strip is first extracted to obtain sub-pixel-level light strip information, and then the light plane parameters are calibrated from repeated captures of the calibration board. The light strip information and light plane parameters are then used to calculate the point cloud data of a single image. Finally, using the parameters calibrated by the monocular optical tracking system, the point cloud obtained by the monocular 3D scanning system is instantly converted to the coordinate system of the optical tracking device, realizing data registration during the dynamic scanning process.
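The single-image step, in which stripe pixels and the calibrated light plane give 3D points, is commonly implemented as a ray-plane intersection. The sketch below illustrates that idea only and is not taken from the paper: it assumes an undistorted pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and a light plane written as n · X = d in scanning camera coordinates.

```python
def stripe_pixel_to_point(u, v, fx, fy, cx, cy, plane_normal, plane_offset):
    """Intersect the viewing ray of pixel (u, v) with the laser light plane.

    The plane is given as plane_normal . X = plane_offset in camera coordinates.
    Returns the 3D point (x, y, z) in the scanning camera coordinate system.
    """
    # Direction of the viewing ray through the pixel (pinhole model, z = 1).
    direction = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * c for n, c in zip(plane_normal, direction))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the light plane")
    scale = plane_offset / denom  # ray parameter at the intersection
    return tuple(scale * c for c in direction)
```

Applying the function to every sub-pixel stripe centre in one image yields the point cloud of that image in the scanning camera coordinate system.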
2.2 Monocular Optical Tracking System
In this part, the optical tracking of 3D dynamic scanning is mainly realized. By calculating the position and posture between the tracking target and the
Fig. 1. 3D dynamic scanning model
scanning camera, the point cloud information obtained by the monocular 3D scanning system is instantly converted to the tracking camera coordinate system to complete the dynamic scanning. This part mainly includes the position and posture calibration of the scanning camera and the tracking target, and the conversion from a single image to the tracking camera coordinate system.

Scanning Camera and Tracking Target Position and Posture Calibration. To ensure that the 3D point cloud information obtained from each image during the dynamic scanning process can be converted from the scanning camera coordinate system to the tracking camera coordinate system, the key step is to calibrate the position and posture between the scanning camera coordinate system and the tracking target. In this part, the scanning camera and the tracking camera are used to obtain the position and posture of the plane of the calibration board simultaneously, multiple times, so as to calibrate the position and posture of the scanning camera and the tracking target. The transformation relationship model of position and posture calibration is shown in Fig. 2. It can be seen from Fig. 2 that the tracking camera can simultaneously obtain the calibration point coordinates of the tracking target $P_{tt} = [x_{tt}, y_{tt}, z_{tt}]^T$ and of the calibration target $P_{tc} = [x_{tc}, y_{tc}, z_{tc}]^T$, and the scanning camera can obtain the calibration point coordinates $P_{sc} = [x_{sc}, y_{sc}, z_{sc}]^T$ of the calibration target. Since the entire dynamic scanning system is under the same world coordinate system, the position and posture of the calibration target in the world coordinate
Fig. 2. Scanning camera and tracking target position and posture calibration model
system are determined and unique. The position and posture of the scanning camera coordinate system in the tracking camera coordinate system can therefore be calculated from the coordinates of the calibration points in the scanning camera coordinate system and in the tracking camera coordinate system. First, calculate the translation transformation $T_{tc}$ between $P_{tc}$ and $P_{sc}$, then move $P_{tc}$ and $P_{sc}$ to the origin of the coordinate system, and use the least squares method to calculate the rotation transformation $R_{tc}$ between the two, as shown in Eq. (1).

$$P_{tc} = R_{tc} P_{sc} + T_{tc} \tag{1}$$

The coordinates of the scanning camera in the tracking camera coordinate system can be obtained by using the rotation and translation transformation, as shown in Eq. (2). The calibration board used in this paper is composed of a set of circular calibration points in an ordered sequence. The directions of the x-axis and y-axis of the calibration board can be determined from only one image, so the sequence of characteristic calibration points of the XOZ coordinate plane can be used in place of the position and posture of the coordinate system. It should be noted that, since the rotation and translation transformation from the scanning camera coordinate system to the tracking camera coordinate system was calculated previously, the translation operation needs to be reversed when the position and posture of the scanning camera are recovered from the tracking camera.

$$P_{scaCam} = R_{tc} P_{traCam} + R_{tc}^{-1} T_{tc} \tag{2}$$

where $P_{traCam} = [x_i, 0, z_i]^T$ are the calibration points in the tracking camera coordinate system, and $P_{scaCam} = [x_i, y_i, z_i]^T$ are the calibration points in the scanning camera coordinate system. The position and posture of the tracking target in the tracking camera coordinate system can be directly calculated. Therefore, after determining the position
and posture of the scanning camera in the tracking camera coordinate system, the final calibration of the scanning camera and the tracking target can be performed. Here, it is necessary to calculate the rotation and translation matrix $RT_{ts}$ from the tracking target $P_{tt}$ to the scanning camera $P_{scaCam}$, as shown in Eq. (3).

$$P_{scaCam} = RT_{ts} P_{tt} \tag{3}$$

Conversion from Single Image to Tracking Camera Coordinate System. After determining the position and posture between the scanning camera and the tracking target, the point cloud information obtained by each scan can be converted to the tracking camera coordinate system. First, the tracking camera obtains the 3D coordinates of the tracking target in the tracking camera coordinate system, and uses the inverse operation of Eq. (3) to calculate the corresponding 3D position and posture of the scanning camera coordinate system in the tracking camera coordinate system. Then it is necessary to determine the rotation and translation transformation $RT_{ts}$ from the tracking coordinate system to the scanning coordinate system, as shown in Eq. (4).

$$RT_{ts} = P_{scaCam} P_{traCam}^T \left( P_{traCam} P_{traCam}^T \right)^{-1} \tag{4}$$
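Eq. (4) has the form of the normal-equation solution of a linear least-squares problem, A = Q Pᵀ (P Pᵀ)⁻¹, where the columns of P and Q are corresponding points in homogeneous coordinates. The sketch below illustrates that formula with generic source and destination point sets; it is an assumption-laden illustration rather than the authors' code. The function names are hypothetical, the points are assumed not to be coplanar so that P Pᵀ is invertible, and the fitted matrix is a general affine map that is not forced to be a rigid rotation plus translation.

```python
def _invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (are the points coplanar?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        head = aug[col][col]
        aug[col] = [v / head for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_homogeneous_transform(src_pts, dst_pts):
    """Least-squares 4x4 matrix A such that dst is approximately A * src."""
    src_h = [[float(c) for c in p] + [1.0] for p in src_pts]
    dst_h = [[float(c) for c in q] + [1.0] for q in dst_pts]
    # P P^T and Q P^T, accumulated as sums of outer products over the points.
    ppt = [[sum(p[i] * p[j] for p in src_h) for j in range(4)] for i in range(4)]
    qpt = [[sum(q[i] * p[j] for p, q in zip(src_h, dst_h)) for j in range(4)]
           for i in range(4)]
    inv = _invert(ppt)
    return [[sum(qpt[i][k] * inv[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_transform(a, point):
    """Apply a 4x4 homogeneous transform to one 3D point."""
    h = [float(c) for c in point] + [1.0]
    return tuple(sum(a[i][k] * h[k] for k in range(4)) for i in range(3))
```

Once the matrix has been fitted from calibration correspondences, `apply_transform` plays the role of the per-point conversion between the two camera coordinate systems.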
After applying the calculated rotation and translation transformation to the point cloud $P_{s\_obj}$ obtained in the scanning camera coordinate system, the 3D coordinates $P_{t\_obj}$ of the scanned object's point cloud in the tracking camera coordinate system can be obtained, as shown in Eq. (5).

$$P_{t\_obj} = RT_{ts} P_{s\_obj} \tag{5}$$

3 Experimental Results
In this paper, C++ is used as the programming language, and OpenCV 3.0 is used as the image processing library. Windows 10 with 8 GB of memory is used as the development platform for the system. The tracking camera is a Logitech HD Pro Webcam C920 with a resolution of 1920 × 1080. The scanning camera is a Huagu Power WP-UT530 with a resolution of 2592 × 2048.

3.1 Error Assessment
In this part, a standard model is printed with a 3D printer and then scanned to obtain its point cloud information, so as to analyze the error of the dynamic scanning model. The error assessment model used in this part is a staircase model. As shown in Fig. 3(a), this staircase model has four steps, and the height and width of each step are set to 20 mm. The point cloud data obtained by scanning the front steps of the stairs is shown in Fig. 3(b). For the obtained point cloud data, the front model consists of 8 faces with 4 steps. After accurate measurement with vernier calipers, the distance between
Fig. 3. Error assessment model
the horizontal and vertical planes of the real model of each step and the corresponding plane of the next step can be obtained with an accuracy of 0.01 mm. The measured point cloud data are used to fit the 8 planes, and the distances between corresponding planes of adjacent steps are calculated and compared with the actual distances. The comparison results are shown in Table 1.

Table 1. Distance error measurement

Compare plane          True distance/mm   Scan fitted distance/mm   Distance error/mm
Plane 1 and Plane 3    19.88              18.6261                   −1.2539
Plane 2 and Plane 4    20.02              19.2738                   −0.7462
Plane 3 and Plane 5    19.82              18.6382                   −1.1818
Plane 4 and Plane 6    19.94              19.1493                   −0.7907
Plane 5 and Plane 7    19.82              18.3960                   −1.4240
Plane 6 and Plane 8    19.94              18.9861                   −0.9539
Average                19.90              18.8449                   −1.055
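The comparison in Table 1 relies on fitting planes to the scanned points and measuring the gap between parallel faces. One simple way to do this is sketched below under stated assumptions: each face is fitted by ordinary least squares along its axis of least spread (not by the orthogonal fit that the authors may have used), and the gap is taken as the mean distance from the points of one face to the plane fitted to the other. All function names are hypothetical.

```python
import math


def _det3(a):
    """Determinant of a 3x3 matrix."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def fit_plane(points):
    """Fit a plane to 3D points; return (unit_normal, offset), normal . p = offset."""
    count = len(points)
    mean = [sum(p[i] for p in points) / count for i in range(3)]
    spread = [sum((p[i] - mean[i]) ** 2 for p in points) for i in range(3)]
    k = spread.index(min(spread))            # dependent axis: least spread
    i, j = [axis for axis in range(3) if axis != k]
    # Normal equations for p[k] = a * p[i] + b * p[j] + c.
    rows = [(p[i], p[j], 1.0) for p in points]
    m = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    rhs = [sum(r[a] * p[k] for r, p in zip(rows, points)) for a in range(3)]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("points do not span a plane")
    coeffs = []
    for col in range(3):                     # Cramer's rule
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(_det3(mc) / det)
    a, b, c = coeffs
    normal = [0.0, 0.0, 0.0]
    normal[i], normal[j], normal[k] = -a, -b, 1.0
    length = math.sqrt(sum(v * v for v in normal))
    return [v / length for v in normal], c / length


def plane_gap(points_a, points_b):
    """Mean unsigned distance from the points of face B to the plane fitted to A."""
    normal, offset = fit_plane(points_a)
    return sum(abs(sum(n * c for n, c in zip(normal, p)) - offset)
               for p in points_b) / len(points_b)
```

Choosing the dependent axis by least spread lets the same routine handle both the horizontal and the vertical faces of the staircase.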
3.2 Point Cloud Details
In order to verify the effectiveness and feasibility of the method in this paper, a bronze teapot and a 3D-printed doll were used as measurement models to show the obtained point cloud data and details. As shown in Fig. 4, each model is presented with the original image, the overall point cloud image, and the detailed enlarged image. It can be seen from the images that the dynamic scanning of each scanned object can be realized. However, in addition to the scanning method, the material and color of the scanned object also affect the scanning result. For the matte material, the red laser has a reflection property similar to Lambertian reflection, so the point cloud
Fig. 4. 3D dynamic scanning results
data obtained by scanning is better. Figure 4(b) shows the scan result of the teapot. Due to the age of the bronze teapot model, the surface has corroded to produce patina (basic copper carbonate, CuCO3·Cu(OH)2), and dark spots have formed where the patina has peeled off. When the laser light strip is projected onto these parts, the red laser light is absorbed, and the R channel of the corresponding pixels in the image cannot reach the threshold, so the scanned surface appears mottled. The edge of the teapot lid has a thickness of about 3 mm, and there is a ring with an inner diameter of about 2 mm next to the teapot lid. Clear point cloud data can be obtained for both details. For the doll model in Fig. 4(e), the eyes of the doll in the original picture have a depth of about 1 mm. It can be seen from the enlarged image that the dynamic scanning method proposed in this paper captures the details of the eyes well. The point cloud details show that the dynamic scanning model proposed in this paper can convert the scanned data to the coordinate system of the optical tracking device in real time during the dynamic scanning process, which avoids the later registration of multi-view scanning data required by the traditional method and supports interrupted scanning.
4 Conclusion and Outlook
The experimental results show that the method proposed in this paper can scan the two groups of objects to obtain the point cloud information of the scanned objects, with an accuracy of about 1 mm. The method proposed in this paper
can convert the scanned point cloud into the tracking camera coordinate system in real time by separating scanning and tracking, which not only avoids the registration process of multi-viewpoint scanned point clouds in the traditional method, but also allows intermittent scanning operations. However, it should be pointed out that the scanning range is limited by the field of view of the tracking device and by the accuracy with which the tracking target board is acquired in the tracking camera coordinate system. How to achieve precise control of the optical tracking equipment and how to expand the field of view of the tracking camera are the main topics to be studied next.
References
1. Bloesch, M., Czarnowski, J., Clark, R., Leutenegger, S., Davison, A.J.: Codeslam learning a compact, optimisable representation for dense visual slam. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2560–2568 (2018)
2. Davison: Real-time simultaneous localisation and mapping with a single camera. In: Proceedings Ninth IEEE International Conference on Computer Vision, vol. 2, pp. 1403–1410 (2003)
3. Davison, A.J., Reid, I.D., Molton, N.D., Stasse, O.: Monoslam: real-time single camera slam. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1052–1067 (2007)
4. Jiang, J., Zeng, L., Chen, B., Lu, Y., Xiong, W.: An accurate and flexible technique for camera calibration. Computing 101(4), 1–18 (2019)
5. Tsai, R.: An efficient and accurate camera calibration technique for 3D machine vision. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 364–374 (1986)
6. Tang, Q.H., Zhang, Z.Y.: Camera self-calibration based on multiple view images. Computer Engineering and Science (2017)
7. Tsai, R.: A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf tv cameras and lenses. IEEE J. Robot. Autom. 3(4), 323–344 (1987)
8. Xin, S., Nousias, S., Kutulakos, K.N., Sankaranarayanan, A.C., Narasimhan, S.G., Gkioulekas, I.: A theory of fermat paths for non-line-of-sight shape reconstruction. In: Proceedings of (CVPR) Computer Vision and Pattern Recognition (June 2019)
9. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Analy. Mach. Intell. 22(11), 1330–1334 (2000)
10. Zhang, Z., Yuan, L.: Building a 3D scanner system based on monocular vision. Appl. Opt. 51(11), 1638 (2012)
11. Zhengyou, Z.: Flexible camera calibration by viewing a plane from unknown orientations. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1, pp. 666–673 (1999)
Animation
Body2Particles: Designing Particle Systems Using Body Gestures
Haoran Xie(B), Dazhao Xie, and Kazunori Miyata
Japan Advanced Institute of Science and Technology, Ishikawa, Japan
[email protected]

Abstract. Body2Particles is an interactive design system that allows users to create complex particle systems using body gestures. Even for skilled animators, adjusting the various parameters for the animation of complex physics systems is a tedious task. Common users may feel frustrated in attempting to understand these parameters and their proper usage. To solve these issues, Body2Particles provides an embodied design interface for the animation of complex particle systems. This work focuses especially on the animation of fireworks, as it requires the hierarchical construction of multiple particle systems. The proposed user interface takes the user's body movements as input from a depth sensor and outputs a firework animation. In our preliminary study, we identified the relationships among the animation parameters and skeleton joints. The system proposed in this study adapts these relationships using predefined parameter constraints and estimates the simulation parameters based on the skeleton captured by a depth sensor. User studies are also conducted to verify the effectiveness and potential uses of this system, such as exercise promotion.

Keywords: User interface · Particle system · Firework animation · Body tracking

1 Introduction
Particle systems are used to simulate complex phenomena in virtual worlds, such as fluid, smoke, and fireworks, which are common in computer animations and video games [12]. Although various sophisticated Eulerian and Lagrangian fluid simulation approaches are used in computer graphics nowadays, particle systems are convenient for common users because of their low computation cost and the plugins embedded in popular game engines, such as Unity3D. However, it is not easy for common users to achieve the desired animation results by adjusting various control parameters. Even for professional users, it is a tedious task to control the parameters with traditional graphical user interfaces (GUI). If a user wanted to create a fireworks animation, they would have to study the flow dynamics of actual fireworks and master the appropriate simulation software, which are not interesting tasks for common users.

© IFIP International Federation for Information Processing 2020. Published by Springer Nature Switzerland AG 2020. N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 445–458, 2020. https://doi.org/10.1007/978-3-030-65736-9_40
We have also observed that lack of exercise is a serious social issue: we are so busy with our daily studies and work that we have little motivation to exercise, which leads to a variety of health problems. An embodied design interface for content creation can facilitate bodily self-expression, in contrast to the common design interface that uses a keyboard and mouse as system input. To make users exercise while designing interesting items, we aim to present a gesture-based embodied interface for designing a complex particle system. Through such an interface, common users can design particle systems using body gestures that exercise their bodies. To achieve these goals, we propose Body2Particles, an interactive system that allows any user to easily design complex particle systems using full-body gestures as system input. Body2Particles provides an experience similar to playing games with Microsoft Kinect in the design process for particle systems. In this work, we focus especially on firework animations, which are common and fascinating phenomena in our daily lives that can be represented by particle systems. Different from the conventional design system used for firework animations [5], the proposed system tracks human gestures as input. Body2Particles enables the procedural design of particle systems without requiring professional knowledge. As shown in Fig. 1, we map body posture onto the control parameters of a particle system, so users are not required to understand simulation principles or master any simulation skills. The main contribution of this work is to provide an intuitive and playful user experience in animation design using body gestures. We believe that the proposed system can be applied to general procedural modeling techniques in computer graphics, especially for physical animation designs, such as cloth and smoke animations.
Fig. 1. Embodied design interface for firework animations using body gestures.
2 Related Work
In this section, we review related work on firework animation and design interfaces. We also discuss training systems that support daily exercise.

Firework Animation. Particle systems are usually used to model fuzzy objects, such as clouds, water, and fireworks [12]. It is difficult to use particle systems to design fuzzy objects under animation constraints, such as forming particular shapes. A GPU-based firework simulation pipeline has been proposed to animate shape-constrained fireworks [16]. Recently, a similar approach has been proposed for a virtual-reality environment with a head-mounted display [5]. The authors provided a sketch-based design system for shape-constrained firework animations. All these approaches were proposed for computers with traditional GUIs. In contrast, our work aims to provide an embodied design interface for fireworks animation.

Design Interface. It is challenging to create a design interface for novice users, especially for tasks that require high-level skills, such as aerodynamics [14] and fluid dynamics [17]. The common solutions fall into two categories: sketch-based design and gesture-based design. An interactive sketch system usually adopts a data-driven approach to provide real-time feedback to users. Sketch2Photo was proposed to composite realistic pictures from freehand sketches [4]. Sketch2VF can help novice users design a 2D flow field with conditional generative adversarial networks [6]. Sketch2Domino provided spatial sketch design of block management and guidance for real-world tasks [10]. As a gesture-based design system, BodyAvatar enables the design of freeform 3D avatars using human gestures [15]. This embodied interaction can also be used in 3D furniture design, with body posing and acting as system input [9]. Unlike these gesture-based design systems, we provide an embodied design interface for dynamic animation rather than 3D modeling.

Training System. Training systems usually use depth sensors, such as Microsoft Kinect, to support sports training [2]. A more complex motion capture system has been used to guide tai chi exercise in a virtual environment [7]. A similar 3D virtual coaching system has been proposed to improve golf swing technique [8]. In addition, a VR training system has been proposed for learning basketball free throw gestures with motion capture devices [11]. Owing to the rapid development of deep learning based approaches, the human skeleton [3] and 3D meshes [13] can be constructed interactively to guide users in sports training. In contrast to previous training systems, our work aims to design particle systems and provide a game experience that promotes user exercise.
3 System Configuration
The proposed system framework of Body2Particles is illustrated in Fig. 2. To explore the relationship between body gestures and the control parameters of particle systems, we conducted a preliminary study to collect motion data, in which users were asked to perform gestures freely while watching recorded fireworks animations with different control parameters. In our embodied design interface for firework animation, we explicitly designed body gestures to correspond to different launching angles and heights and different blooming sizes and shapes. The final firework animations are created from the users' gestures in real time.
H. Xie et al.
Fig. 2. System framework with parameter mapping in the preliminary study and the runtime embodied design interface with controlled parameters.
3.1 Body Tracking
In this study, we used an RGB-D camera for body tracking (Kinect V2 with Kinect SDK 2.0) and obtained the coordinates, velocities, and angles of the captured skeleton joints. As shown in Fig. 3(a), we used the data of 21 of the 25 joints captured by the Kinect. The data on the ankles and hand tips were ignored because these joints were not clearly visible in this study. We then defined the distances between the two hands $D_h$, elbows $D_e$, shoulders $D_s$, knees $D_k$, and feet $D_f$ for particle system control. To control blooming shapes, the shape made by the two arms is estimated from the angles between the forearms and upper arms, where $\theta_l$ and $\theta_r$ denote the left and right arm joint angles, as shown in Fig. 3(b).
3.2 Firework Animation
Particle systems were proposed to represent fuzzy objects, such as fireworks, as a collection of many small particles [12]. Each particle system specifies various parameters that control the dynamic behavior of the fireworks, as shown in Fig. 4(a). A particle system normally goes through three stages in a given lifetime: the creation of particles, changes in particle attributes, and disappearance. Fireworks animations are generated by rendering each particle in the particle system.
Fig. 3. Skeleton tracked by the depth camera (a) and the defined metrics (b).
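The metrics defined in Fig. 3(b) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, joints are assumed to be plain (x, y, z) tuples in metres, and the arm joint angle is assumed to be the angle measured at the elbow between the upper arm and the forearm.

```python
import math

# Joint positions are (x, y, z) tuples in metres, as a skeleton tracker might
# report them. Function names are illustrative, not Kinect SDK identifiers.

def joint_distance(a, b):
    """Euclidean distance between two joints, e.g. D_h for the two hands."""
    return math.dist(a, b)

def arm_angle(shoulder, elbow, wrist):
    """Angle in degrees between upper arm and forearm, measured at the elbow.

    Assumption: the arm joint angle is taken to be the elbow angle, so a fully
    stretched arm gives 180 and a right-angle bend gives 90.
    """
    upper = [s - e for s, e in zip(shoulder, elbow)]
    fore = [w - e for w, e in zip(wrist, elbow)]
    dot = sum(u * f for u, f in zip(upper, fore))
    norm = math.hypot(*upper) * math.hypot(*fore)
    if norm == 0.0:
        raise ValueError("degenerate arm: two joints coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

With such helpers, $D_h$ would be `joint_distance(hand_left, hand_right)`, and $\theta_l$ would be `arm_angle` applied to the left shoulder, elbow, and wrist joints.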
In this study, we generate a fireworks animation by dividing the whole life cycle of a firework, from launch to extinction, into two phases. As shown in Fig. 4(b), the firework to be launched is the parent particle, and the firework after blooming is a sub-emitter of the particle system. Different control parameters are defined for the two phases as follows:
– Launching phase: launching height and angle, and number of fireworks;
– Blooming phase: blooming sizes and shapes, and number of particles after blooming, which are specified for the sub-emitter particles from blooming to extinction.
Note that, in terms of the number of fireworks, we developed both single and multiple fireworks modes in our prototype.
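The two-phase life cycle can be illustrated with a small, self-contained simulation. The sketch below is not the Unity3D particle system used in the prototype: it is a simplified 2D model in which all names, speeds, and lifetimes are hypothetical values chosen only for illustration. A parent particle is launched at an angle from the vertical, and when its fuse expires it is replaced by a ring of sub-emitter particles that live until extinction.

```python
import math
from dataclasses import dataclass

GRAVITY = 9.8  # m/s^2, illustrative constant

@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float
    life: float  # remaining lifetime in seconds

def step(p, dt):
    """Advance one particle by dt seconds; returns False once it is extinguished."""
    p.x += p.vx * dt
    p.y += p.vy * dt
    p.vy -= GRAVITY * dt
    p.life -= dt
    return p.life > 0

def launch(angle_deg, speed, fuse):
    """Launching phase: one parent particle tilted angle_deg from the vertical."""
    a = math.radians(angle_deg)
    return Particle(0.0, 0.0, speed * math.sin(a), speed * math.cos(a), fuse)

def bloom(parent, count, speed, life):
    """Blooming phase: the parent spawns `count` sub-emitter particles in a ring."""
    return [Particle(parent.x, parent.y,
                     speed * math.cos(2 * math.pi * i / count),
                     speed * math.sin(2 * math.pi * i / count),
                     life)
            for i in range(count)]

def simulate(angle_deg=15.0, count=8, dt=0.02):
    """Run both phases; returns the parent at bloom time and the bloom frame count."""
    parent = launch(angle_deg, speed=20.0, fuse=1.5)  # hypothetical values
    while step(parent, dt):
        pass
    sparks = bloom(parent, count, speed=5.0, life=1.0)  # hypothetical values
    frames = 0
    while sparks:
        sparks = [s for s in sparks if step(s, dt)]
        frames += 1
    return parent, frames
```

In this sketch, the particle lifetime passed to `bloom` plays the role of the sub-emitter life cycle: a longer lifetime lets the sparks travel further before extinction and therefore produces a larger bloom.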
4 User Interface Design
Body2Particles provides an embodied design interface for particle systems, which takes users' body gestures as input. In this section, we discuss the preliminary study conducted to map body gesture parameters to control parameters, and we define the gesture controls used in our interface to increase system immersion and improve the user experience.
4.1 Preliminary Study
We conducted a preliminary study to explore how common users express fireworks with their bodies while observing fireworks animations. All the fireworks animations were generated by manually modifying the control parameters in Unity3D. In total, we collected 18 basic styles of fireworks animations, as shown in Fig. 5. To cover the feature space of firework design, we adopted the following main parameters of the particle system.
Fig. 4. Control parameters of particle systems (a) and generation phases (b).
Fig. 5. Video clips of fireworks animations used in the preliminary study.
– Launching Angles (a ∼ c): 15°, 30°, and 45° from the vertical direction;
– Launching Heights (d ∼ f): predefined low, medium, and high heights;
– Launching Numbers (g ∼ i): 2, 3, and 6 fireworks launched simultaneously;
– Blooming Numbers (j ∼ l): 8, 20, and 100 particles in the sub-emitter;
– Blooming Sizes (m ∼ o): small, medium, and large sizes of firework explosion;
– Blooming Shapes (p ∼ r): hemispherical, ring, and weeping-willow shapes of fireworks.
Five participants (three males and two females) joined our preliminary study. The experiment setup is shown in Fig. 2. The participants stood in front of a Kinect depth sensor and a display screen that showed video clips of the generated fireworks animations, and they were asked to express the fireworks using body gestures. The distance between the participants and the screen was 3.0 m.
Fig. 6. Motion trajectories tracked in the preliminary study corresponding to the firework animations from (a) to (r).
4.2 Parameter Mapping
Figure 6 shows the motion trajectories tracked during our preliminary study; each subfigure corresponds to one video clip of firework animation. We found that all participants preferred to use hand movements to express the firework dynamics in both the launching and blooming phases. For the different launching angles shown in Figs. 5 and 6 (a ∼ c), the participants were most likely to tilt their bodies to the left or right. In interviews, participants said that it was difficult to express the differences among the blooming numbers of the firework animations. In addition, there was no obvious difference between the movement data of male and female participants. With reference to the results of the preliminary study, we explicitly define the parameter mapping between body gestures and control parameters as follows.
Launching Angles are calculated from the tilting angle of the user's body $\theta_B$:
$$\theta_B = \arccos\left(\frac{y_{\text{neck}} - y_{\text{sb}}}{\sqrt{(x_{\text{neck}} - x_{\text{sb}})^2 + (y_{\text{neck}} - y_{\text{sb}})^2 + (z_{\text{neck}} - z_{\text{sb}})^2}}\right) \qquad (1)$$
where $(x_{\text{neck}}, y_{\text{neck}}, z_{\text{neck}})$ are the three-dimensional coordinates of the body's neck joint, and $(x_{\text{sb}}, y_{\text{sb}}, z_{\text{sb}})$ are the coordinates of the spine base joint.
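Equation (1) is the angle between the vector from the spine base to the neck and the vertical y-axis. A minimal Python sketch of this computation follows; the function name and the tuple-based joint representation are assumptions made for illustration.

```python
import math

def body_tilt_angle(neck, spine_base):
    """Eq. (1): angle in degrees between the spine-base-to-neck vector and the y-axis.

    Both arguments are (x, y, z) joint coordinates in the same units.
    """
    dx, dy, dz = (n - s for n, s in zip(neck, spine_base))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0.0:
        raise ValueError("neck and spine base coincide")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dy / length))))
```

Note that the arccosine yields only the magnitude of the tilt. A sketch that also needs the left or right direction could, for example, inspect the sign of the x-component of the same vector; this is a suggestion, not something stated in Eq. (1).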
Fig. 7. Body gestures for launching angles (a), launching heights (b), blooming shapes (c-e), and launching numbers (f) of fireworks animations.
Launching Heights are defined by the relative positions of the user's head, spine shoulder, and spine middle joints, as shown in Fig. 3. Because it is difficult to distinguish between launching heights with small variations, we defined three levels: high if $y_{\text{head}} \in (y_{\text{spine-shoulder}}, \infty)$, medium if $y_{\text{head}} \in (y_{\text{spine-mid}}, y_{\text{spine-shoulder}})$, and low if $y_{\text{head}} \in (-\infty, y_{\text{spine-mid}})$, where $y_{\text{head}}$, $y_{\text{spine-shoulder}}$, and $y_{\text{spine-mid}}$ denote the heights (y-coordinates) of the user's head, spine shoulder, and spine middle joints.
Launching Numbers of parent particles in the particle system are decided by the acceleration values of the hand joints. The larger the acceleration of the user's hands, the greater the number of launched fireworks.
Blooming Sizes are calculated as the maximum of the distances between the two hands, elbows, shoulders, knees, and feet, i.e., $\max(D_h, D_e, D_s, D_k, D_f)$. We then adjust the life cycle of the sub-emitter particles to modify the blooming size. During the preliminary study, the largest fireworks were generated by the maximum distance between the two hands. We argue that this size mapping supports exercise promotion, because users who intend to achieve larger fireworks must make larger movements.
Blooming Shapes are set to be hemispheres, rings, or weeping-willow shapes in our prototype design, which are typical and common shapes of fireworks. The blooming shape is determined by the hand distance $D_h$ and the arm angles $\theta_l$ and $\theta_r$. If $D_h > \delta$ (where $\delta$ is the threshold value of the shape criterion; $\delta = 40$ cm in this study) and $\theta_{l,r} > 115°$, the blooming shape is set to be hemispherical. If $D_h < \delta$ and $60° < \theta_{l,r} \le 115°$, the blooming shape is set to be a closed shape (ring or weeping-willow shape). We decide the specific closed shape from the position of the hands relative to the body core: it is a ring shape if $y_{\text{hand}} \in (y_{\text{spine-mid}}, \infty)$, and a weeping-willow shape if $y_{\text{hand}} \in (y_{\text{spine-base}}, y_{\text{spine-mid}})$, where $y_{\text{hand}}$ and $y_{\text{spine-base}}$ denote the heights (y-coordinates) of the user's hands and spine base joint.
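The threshold rules above can be written as small decision functions. The following Python sketch is one possible reading, not the authors' code: the function names are hypothetical, lengths are assumed to be in metres and angles in degrees, $\theta_{l,r}$ is assumed to mean that both arm angles must satisfy the condition, and values that fall exactly on an interval boundary (which the open intervals in the text leave unspecified) are assigned to the lower level or left unclassified.

```python
DELTA = 0.40  # shape threshold delta from the text (40 cm), expressed in metres

def launching_height(y_head, y_spine_shoulder, y_spine_mid):
    """Three launching-height levels from the head height.

    Assumption: boundary values fall into the lower level.
    """
    if y_head > y_spine_shoulder:
        return "high"
    if y_head > y_spine_mid:
        return "medium"
    return "low"

def blooming_size(d_h, d_e, d_s, d_k, d_f):
    """Blooming size driver: max(D_h, D_e, D_s, D_k, D_f)."""
    return max(d_h, d_e, d_s, d_k, d_f)

def blooming_shape(d_h, theta_l, theta_r, y_hand, y_spine_mid, y_spine_base):
    """Blooming shape from hand distance, arm angles, and hand height.

    Returns None when no rule in the text applies (for example, very bent arms).
    Assumption: both arm angles must satisfy the angle condition.
    """
    if d_h > DELTA and theta_l > 115 and theta_r > 115:
        return "hemisphere"
    if d_h < DELTA and 60 < theta_l <= 115 and 60 < theta_r <= 115:
        if y_hand > y_spine_mid:
            return "ring"
        if y_spine_base < y_hand < y_spine_mid:
            return "weeping-willow"
    return None
```

Returning `None` for unmatched poses keeps the sketch honest about the gaps between the stated rules; a real system would need a fallback, such as keeping the previously selected shape.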
Fig. 8. Gesture controls for color selection (a1, a2) and transition between single and multiple fireworks modes (b1, b2) in the proposed design user interface.
Figure 7 shows the firework animations generated by different body gestures with our proposed parameter mapping approaches. Please refer to the accompanying video for the details of gesture controls.
4.3 Gesture Control
In addition to the parameter mapping between body gestures and control parameters, Body2Particles provides global settings for firework animations, including initial launching conditions, particle colors, and single or multiple fireworks modes. Firework animations can be designed with different colors and numbers using our proposed embodied design interface, as shown in Fig. 8.
Launching Conditions. To increase the recognition rate of body gestures, the design system launches fireworks only when the initial conditions are met. Based on our preliminary study, we set the initial conditions as follows: the hands must be 20 cm above the spine middle joint, and the velocity of the user's hands must exceed 1.8 m/s.
Color Selection. The user can select the fireworks color using the right hand, as shown in Fig. 8(a1, a2). The height of the right hand position $y_{hr}$ is used to determine the color. The fireworks color is set to orange if $y_{hr} \in (y_{\text{head}}, \infty)$, yellow if $y_{hr} \in (y_{\text{spine-shoulder}}, y_{\text{head}})$, purple if $y_{hr} \in (y_{\text{spine-mid}}, y_{\text{spine-shoulder}})$, blue if $y_{hr} \in (y_{\text{spine-base}}, y_{\text{spine-mid}})$, dark blue if $y_{hr} \in (y_{\text{knee-right}}, y_{\text{spine-base}})$, and green if $y_{hr} \in (0, y_{\text{knee-right}})$.
Mode Selection. The user can select the single or multiple fireworks mode using the knees, as shown in Fig. 8(b1, b2). To diversify the gestures and promote exercise, we set the single fireworks mode if $y_{\text{foot-left}} > 20$ and $(y_{\text{knee-left}} - y_{\text{hip-left}}) > 10$ cm with the user's left leg, and the multiple fireworks mode if $y_{\text{foot-right}} > 20$ and $(y_{\text{knee-right}} - y_{\text{hip-right}}) > 10$ cm with the user's right leg.
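The launching condition and the color bands can be sketched as follows. This is an illustrative reading under stated assumptions: the function names are hypothetical, heights are in metres, "20 cm above" is interpreted as at least 0.20 m above the spine middle joint, and heights exactly on a band boundary are assigned to the lower band.

```python
def can_launch(y_hand, y_spine_mid, hand_speed):
    """Initial launching condition.

    Assumptions: the hand must be at least 0.20 m above the spine middle joint,
    and hand_speed is the hand velocity magnitude in m/s (must exceed 1.8).
    """
    return (y_hand - y_spine_mid) >= 0.20 and hand_speed > 1.8

def select_color(y_hr, y_head, y_spine_shoulder, y_spine_mid,
                 y_spine_base, y_knee_right):
    """Map the right-hand height y_hr to a color band, checked from top to bottom."""
    bands = [
        (y_head, "orange"),
        (y_spine_shoulder, "yellow"),
        (y_spine_mid, "purple"),
        (y_spine_base, "blue"),
        (y_knee_right, "dark blue"),
    ]
    for lower_bound, color in bands:
        if y_hr > lower_bound:
            return color
    return "green"  # at or below the right knee
```

Checking the bands from the highest joint downward means each test only needs the lower bound of its interval, which keeps the mapping compact.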
5 User Study
In our user study, the prototype system of Body2Particles was developed using Unity3D on Windows, with a Kinect V2 as the depth sensor and a HUAWEI Honor Band 5 for calorie consumption measurement. The experiment setup and participants are shown in Fig. 9. To verify the user experience of the proposed system, we conducted a subjective evaluation through a questionnaire and an objective evaluation by measuring activity intensity.
Fig. 9. Experiment setup (a) and participants' performances (b) in our user study.
Twelve graduate students (6 males and 6 females) joined this user study. All participants were asked to experience the proposed system for a total of 20 min, with 10 min in each of the single and multiple fireworks modes. During the user study, the participants wore the wearable device to record calorie expenditure, which was used to calculate the intensity of the 20-minute activity (metabolic rate).
6 Results
In this section, we discuss the subjective and objective evaluation results from our user study.
Fig. 10. Results of the subjective evaluation questionnaire.
Subjective Evaluation. Through the subjective evaluation, we confirmed the effectiveness of the proposed system in three aspects: system usage, system effects, and user experience, as shown in Fig. 10. We adopted a 5-point Likert scale for all questions (5 for strongly agree, 1 for strongly disagree). We received quite positive feedback on all evaluation aspects, with mean values of 4.4 for system usage, 4.5 for system effects, and 4.1 for user experience. For system usage, the results indicate that the proposed system is easy to operate and provides adequate interactive feedback to users. For system effects, the proposed system can encourage the user to exercise, and the participants felt relaxed after using the system and wanted to use it again. For user experience, the participants felt satisfied and interested while using both the single and multiple fireworks modes. In the interviews with participants after they used the proposed system, the following comments were most common: "More types of fireworks shapes", "Allow two users to battle against each other", "Generate fireworks with footprints while walking", and "Design cute shapes for female users". The participants also pointed out that the system sometimes recognized gestures incorrectly. There was a delay between body gestures and fireworks generation because the system had to generate new particle systems after the completion of previous ones.
Fig. 11. Results of objective evaluation based on the calorie consumption (left) and metabolic rates (right) from our user study.
Objective Evaluation. Because our research aimed to combine particle system design with exercise promotion in gesture-based control, we evaluated the effect of using the proposed system on exercise. Figure 11 shows the correlation between participant weight and calorie consumption. In this study, we measured activity intensity using the metabolic rate [1] as follows:
$$\text{METs} = \frac{\text{calorie (kcal)}}{\text{weight (kg)} \times \text{time (h)}} \qquad (2)$$
In our user study, both male and female participants achieved more than 6.0 METs, as shown in the right subfigure of Fig. 11. This shows that using the proposed system had an effect comparable to medium- to high-intensity physical activity.
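As a worked illustration of Eq. (2), the following Python sketch computes the metabolic rate from calorie expenditure, body weight, and activity time. The function name is hypothetical, and the example values in the usage note are invented for illustration only; they are not measurements from the user study.

```python
def mets(calories_kcal, weight_kg, duration_h):
    """Eq. (2): METs = calorie (kcal) / (weight (kg) x time (h))."""
    if weight_kg <= 0 or duration_h <= 0:
        raise ValueError("weight and duration must be positive")
    return calories_kcal / (weight_kg * duration_h)
```

For a hypothetical 60 kg participant who expends 130 kcal during the 20-minute session (1/3 h), `mets(130, 60, 20 / 60)` evaluates to about 6.5, which would exceed the 6.0 METs level reported above.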
7 Conclusion
In this study, we proposed an interactive system, Body2Particles, which designs particle systems using body gestures. To explore the relation between fireworks animations and users' body movements, we conducted a preliminary study to investigate the parameter mapping from body gestures to the control parameters of particle systems. We then analyzed the body movements of the participants and clarified the correspondence between the body gestures and the control parameters, including launching angles and heights and blooming sizes and shapes. In our user study, we verified that the proposed system is intuitive and enjoyable for common users. Furthermore, the proposed system had the clear effect of promoting exercise equivalent to other moderate-intensity physical activities.
The prototype implementation of the proposed system can be improved in multiple ways. Although the current system uses a list of built-in skeletal parameters for the design of particle systems, design flexibility would be improved by providing an interactive authoring tool for visual control of simulation parameters. To provide a continuously stimulating environment for the user, we would improve the current system by placing fewer constraints on control gestures and giving the user more freedom. The system could be made more interesting by increasing the types of fireworks and adding gamification elements, such as scores and awards. The current prototype system supports only a single user at a time, so we plan to make it possible for a large number of people to participate in designing multiple fireworks. In future work, we would like to explore the use of deep learning approaches to improve gesture recognition and automatic parameter mapping.
Acknowledgments. We greatly thank the reviewers for their valuable comments, and all the participants in our user study. This work was supported by Hayao Nakayama Foundation Research Grant A, and JSPS KAKENHI grant JP20K19845, Japan.
References
1. Ainsworth, B., et al.: Compendium of physical activities: an update of activity codes and met intensities. Med. Sci. Sports Exerc. 32, S498–504 (2000). https://doi.org/10.1097/00005768-200009001-00009
2. Alabbasi, H., Gradinaru, A., Moldoveanu, F., Moldoveanu, A.: Human motion tracking evaluation using kinect v2 sensor. In: 2015 E-Health and Bioengineering Conference (EHB), pp. 1–4 (2015)
3. Cannavò, A., Pratticò, F.G., Ministeri, G., Lamberti, F.: A movement analysis system based on immersive virtual reality and wearable technology for sport training. In: Proceedings of the 4th International Conference on Virtual Reality. ICVR 2018, New York, NY, USA, pp. 26–31. Association for Computing Machinery (2018). https://doi.org/10.1145/3198910.3198917
4. Chen, T., Cheng, M.M., Tan, P., Shamir, A., Hu, S.M.: Sketch2photo: internet image montage. ACM Trans. Graph. 28(5), 1–10 (2009). https://doi.org/10.1145/1618452.1618470
5. Cui, X., Cai, R., Tang, X., Deng, Z., Jin, X.: Sketch-based shape-constrained fireworks simulation in head-mounted virtual reality. Comput. Animation Virtual Worlds 31(2), e1920 (2020). https://doi.org/10.1002/cav.1920
6. Hu, Z., Xie, H., Fukusato, T., Sato, T., Igarashi, T.: Sketch2vf: sketch-based flow design with conditional generative adversarial network. Comput. Animation Virtual Worlds 30(3–4), e1889 (2019). https://doi.org/10.1002/cav.1889
7. Iwaanaguchi, T., Shinya, M., Nakajima, S., Shiraishi, M.: Cyber tai chi-g-based video materials for tai chi chuan self-study. In: 2015 International Conference on Cyberworlds (CW), pp. 365–368 (2015)
8. Kelly, P., Healy, A., Moran, K., O'Connor, N.E.: A virtual coaching environment for improving golf swing technique. In: Proceedings of the 2010 ACM Workshop on Surreal Media and Virtual Cloning. SMVC 2010, New York, NY, USA, pp. 51–56. Association for Computing Machinery (2010). https://doi.org/10.1145/1878083.1878098
9. Lee, B., Cho, M., Min, J., Saakes, D.: Posing and acting as input for personalizing furniture. In: Proceedings of the 9th Nordic Conference on Human-Computer Interaction. NordiCHI 2016, New York, NY, USA. Association for Computing Machinery (2016). https://doi.org/10.1145/2971485.2971487
10. Peng, Y., et al.: Sketch2domino: interactive chain reaction design and guidance. In: 2020 Nicograph International (NicoInt), pp. 32–38 (2020)
11. Qiao, S., Wang, Y., Li, J.: Real-time human gesture grading based on openpose. In: 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–6 (2017)
12. Reeves, W.T.: Particle systems–a technique for modeling a class of fuzzy objects. ACM Trans. Graph. 2(2), 91–108 (1983). https://doi.org/10.1145/357318.357320
13. Xie, H., Watatani, A., Miyata, K.: Visual feedback for core training with 3D human shape and pose. In: 2019 Nicograph International (NicoInt), pp. 49–56 (2019)
14. Xie, H., Igarashi, T., Miyata, K.: Precomputed panel solver for aerodynamics simulation. ACM Trans. Graph. 37(2), March 2018. https://doi.org/10.1145/3185767
15. Zhang, Y., et al.: Bodyavatar: creating freeform 3D avatars using first-person body gestures. In: Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology. UIST 2013, New York, NY, USA, pp. 387–396. Association for Computing Machinery (2013). https://doi.org/10.1145/2501988.2502015
16. Zhao, H., Fan, R., Wang, C.C.L., Jin, X., Meng, Y.: Fireworks controller. Comput. Animation Virtual Worlds 20(2–3), 185–194 (2009). https://doi.org/10.1002/cav.287, https://onlinelibrary.wiley.com/doi/abs/10.1002/cav.287
17. Zhu, B., et al.: Sketch-based dynamic illustration of fluid systems. ACM Trans. Graph. 30(6), 1–8 (2011). https://doi.org/10.1145/2070781.2024168
Discussion on the Art of Embryonic Form of Computer Animation – the Peepshow
Cheng Liang Chen1(B) and Wen Lan Jiang2(B)
1 Wuhan University of Technology and Anhui University, Hefei, China
2 Anhui Vocational College of Press and Publishing, Hefei, China
Abstract. The peepshow is the result of the accumulation of multiple cultures. It emerged once art forms had developed to a certain degree, driven by the pressing aesthetic needs of the general public and by the upper class's pressing demand for novel art. As a form of media art, it played a very active role in cultural communication during a certain social period. With the progress of the times, it has gradually evolved into a folk art. Protecting and passing on the peepshow as intangible culture is our present task. The peepshow, the originator of modern computer animation, has a unique artistic charm among the traditional arts. This paper compares the art of the peepshow with modern computer animation.
Keywords: Peepshow · Media and advertisement · Art protection · Computer animation
1 Retrospect of Regional Cultural Exchanges Under Specific Historical Conditions
1.1 The Origin of Peepshow
When and where did the art of the peepshow originate? It is said that the peepshow originated in the Tang Dynasty and that its inventors were Li Chunfeng and Yuan Tiangang. According to the legend, the Western Region sent a beautiful woman to the emperor as a tribute. She was good at singing and dancing and was treasured by the emperor, but she often frowned and rarely laughed. Even the emperor could not change this, so he called in Li Chunfeng and Yuan Tiangang, who knew the Book of Changes and the Eight Diagrams and could foresee the future. They said that the imperial concubine was not sick but homesick, and they worked out a plan for the emperor. The emperor led the concubine to the imperial study, where they drank wine and he let her talk about the customs and beautiful landscapes of her hometown. While the concubine talked about her hometown, her eyes fluttered, her spirits lifted, and she smiled vividly every now and then. Li Chunfeng and Yuan Tiangang, who were hiding in the next room, wrote down her descriptions one by one and then portrayed them on a total of eight pages, which became known as the eight scenes of the Western Region. The pictures were packed in a wooden box equipped with a magnifying glass, gongs, drums, and so on, and an entertainer was assigned to perform with it. When the concubine was invited to watch the scenes, she was so surprised that she seemed to have returned to her hometown; she laughed out loud, and her beauty was stunning. This is the legendary origin of the peepshow. The author did not find any specific historical records of this legend, so it can only be taken as a story. What is certain is that the peepshow is the product of cultural exchange between the East and the West, and that it is an advanced form of art that emerged once art forms had developed to a certain level.
© IFIP International Federation for Information Processing 2020 Published by Springer Nature Switzerland AG 2020 N. J. Nunes et al. (Eds.): ICEC 2020, LNCS 12523, pp. 459–467, 2020. https://doi.org/10.1007/978-3-030-65736-9_41
Fig. 1. John Thomson, west view of Beijing street, 1869, London: Wilcom
1.2 The Peepshow Is the Predecessor of Computer Animation Media
The peepshow is also called a diorama, and the name "diorama" has become a specific cultural term. For example, the movie "The Peep Show", directed by Hu An and starring Xia Yu, Xing Yufei, Liu Peiqi, and Lu Liping, tells a story about the first entry of Western film into old Beijing and the birth of the first Chinese film in the early 20th century, and the documentary "The Peep Show" filmed by CCTV tells the story of East-West communication. The invention of photography and the opening of China to the outside world occurred almost simultaneously. In 1839, the Frenchman Louis Jacques Mandé Daguerre publicly introduced the daguerreotype photographic process. One year later, the Opium War broke out between the Qing Dynasty and the United Kingdom. Photography, a new thing at the time, soon entered China along with Western war photographers, businessmen, journalists, pastors, and soldiers.
Fig. 2. Modern people watching peep show
For the first time, the West saw the real China through photos. There are many more similar works. During the long-term development of Chinese culture, specific two-part allegorical sayings were also formed, such as "A silly woman watches a peep show – spending money for nothing". These sayings are a true portrayal of the diorama as a cultural term. The term was formed through people's repeated recognition, understanding, and re-understanding of the art of the diorama (the peep show). After countless repetitions, it eventually became an established, exclusive term with its own distinct meanings.
2 The Peep Show Is an Art Form of Media and Advertisement
2.1 Concept and Types of Media
"The Analytical Dictionary of Characters", written by Xu Shen in the Eastern Han Dynasty, interprets the Chinese word "Mei" (matchmaker) as "Mei, a planner, who makes a match between two family names." In vernacular words: "Mei means planning, planning to make a match between a man and a woman of two different family names to form a new family." This can be extended to: "Pass one party's information to another." The mass media are the media that spread various kinds of information. Since the birth of oracle bone script, human civilization has transmitted information through such inscriptions as a medium, which gave the media their initial shape.
The first type of media is the most primitive and the simplest. It is mainly based on short-distance transmission; its transmission method, speed, and area are relatively limited, and it is neither vivid nor easy to retain.
The second type of media is more vivid than the first. It is often transmitted interactively on site, but its retention capacity and its privacy are both weak.
The third type of media uses paper as the main carrier and can transmit information over long distances. It uses words and patterns as its form of expression and can be retained for a long time.
The fourth type of media mainly uses audio, video, and the concept of time as its main forms of expression. The content it expresses is more accurate and can be stored for a long time. It spreads quickly and widely, but it lacks synchronization.
The fifth type of media has the fastest communication speed, a wider communication range, stronger interactivity, longer retention, a smaller carrier size, and strong synchronization, so it is the most vivid. "Media integration is not only a change in the media structure, but also a result of the rapid development of emerging media represented by the Internet, which also greatly affects the landscape of cultural production and communication" [1].
2.2 The Peep Show Is the Predecessor of Animation Film Media
Both the peep show and kamishibai, which is discussed later, are predecessors of contemporary movies. They have many similarities: there is a script before the performance; they have pictures and tell stories through the pictures; they are accompanied by background music; they contain elements of performance; fees can be charged for them to obtain profits; they are forms of media art; and they disseminate and record the cultural thoughts, traditional stories, political opinions, and historical events of their time. As a form of media, the peep show is an emerging art into which a variety of arts converged. It was born after various arts and technologies had developed to advanced forms and were then recombined, through new business formats, scene reproduction, and a break with established conventions, into a new art form. Throughout the history of human civilization, the development of culture has been a process from simple to complex, from single to multiple, and from a low level to an advanced level.
The peep show is a carrier of diverse arts. It combines painting, drama, folk art and juggling, music, and other arts in one. At the beginning of its birth, it was full of vitality and vigor and attracted the attention of many audiences. In terms of function, the peep show provides rapping entertainment, spreads cultural stories, inspires thoughts, and records history. These functions are similar to those of movies. Its marketing method is also ticket based: those who do not buy a ticket cannot watch. In terms of basic principle, the performer of a peep show tells the story by pulling through multiple story pictures, while a movie tells the story through the continuous flow of countless pictures that create new images. In general, the two are functionally similar in both production process and marketing method, and both are arts prepared in advance and presented to the audience through a mechanism, so peep shows are known as "old-style movies". With the progress of society and the advancement of science and technology, however, the peep show eventually became a folk event. With the advent of the technological revolution in the world, the development of photography, videography, performance art, contemporary music, and theater, the discovery of the principle of persistence of vision, and the appearance of sound movies, animated movies finally appeared. The art of the peep show already possessed the prototype of animation film art. "The development of digital technology continues to promote the innovation of artistic thinking and the transformation of video presentation" [2]. The peep show laid the foundation for the birth of animation film art, and animation film art is the continuation and inheritance of the art of the peep show. "In the field of digital art, technology and art need to be deeply integrated. But it is needed to make clear that technology serves art, and art content is the fundamental" [3]. Modern computer animation has opened the era of digital media and realized the transition from traditional art to modern technology, that is, the combination of art and science. Compared with the art of the peep show, computer animation has several advantages, which are mainly reflected in the following aspects:
1. Computer animation is produced without paper;
2. The spread range of computer animation has been extended to the extreme;
3. Computer animation spreads faster;
4. Computer animation can be saved and played as Internet multimedia, so its way of transmission has changed;
5. The expression forms of computer animation are more diverse, with advantages such as virtually unlimited creation of visual effects;
6. The profitability of computer animation has been greatly improved. The film "Nezha: Birth of the Demon Child" alone reached a box office value of 5 billion Yuan last year, which cannot be achieved by the traditional art of the peepshow. Although the peepshow is the beginning of animation art, it was restricted by the historical limitations of its time;
7. Computer animation has formed a distinct animation industry, affecting the economic structure of the entire society and the trend of today's regional economy.
2.3 The Toyo-Flavored Peep Show "Kamishibai"
The Japanese kamishibai is mostly installed in a box on the back frame of a bicycle. The box is shaped like a small house with two small doors, which are closed before the performance. With the accompaniment of the knocking board and the narration of the performer, the plot gradually unfolds and the two small doors open slowly. Inside the doors are the prepared plot pictures. The audience, which may be large or small, sits around the box.
The audience consists mainly of children. Its size is not limited, but it is not easy to charge fees for this kind of performance. In the Chinese peep show, by contrast, magnifying glasses are installed around the box for viewing, and unless a ticket has been sold the performer will not let a spectator sit in front of the viewing glasses, so that spectator cannot see the story pictures inside. The Japanese kamishibai uses a relatively small box on the back of a bicycle and is often performed on the move. A story usually takes a few minutes, though some run a little longer. The Chinese peep show box is generally large, with musical instruments such as gongs and drums mounted on its back, so it is not easy to move. It is mainly suited to performance in crowded places such as temple fairs, big fairs or important celebrations. Because the performance is hard to move and the audience can only watch through viewing holes fitted with magnifying glasses, this format actually hinders the spread and development of the art of the peep show. Kamishibai often interacts and communicates with children through a theatrical form of performance, which not only narrows the distance between performer and audience but also brings children theater-like enjoyment and entertainment, improves their attention, understanding and expression skills, and inspires their potential in language, music, vision, performance, etc. Through the changes in language, expression, limbs and sound of the kamishibai performer, as well as the interaction and communication with the
C. L. Chen and W. L. Jiang
audience, the performer carries the story off the stage, into the audience and into the imaginary world of the children, so that performance and imagination merge naturally and performer and audience resonate with each other. This is the magical charm of kamishibai. The Japanese kamishibai has played an active role in promoting and spreading the development of Japanese anime.
As far as the peep show is concerned, it is not totally “foreign”. Over its long development it has merged into the soul of traditional Chinese art and has gradually adopted traditional Chinese art as its main form of expression. Terms such as “the peep shows”, “American ginseng”, “Western Clock”, “Westerners”, “Western Matches”, “Western Painting”, “Western Gun”, “Western Artillery” and “Western Musical Instrument” all record material and cultural exchanges between different regions. The word “foreign” carries a distinctive western (alien) color; it is a word that distinguishes Chinese culture from foreign cultures. “The underlying logic of fusion development is first the fusion of technology, from the traditional technology of the mechanical age, to the classic technology of the industrial age, and then to the high-tech of the digital age. Every technological advancement will bring the integration and innovation of traditional and emerging technologies” [4]. As a form of media, the peep show once prospered and played a positive role in conveying the cultural thought, legends and historical records of its time. As a form of entertainment, it also once won people’s love. With the progress of the times and the rise of emerging media such as digital media, film and television, the peep show has become a folk-activity art form and a form of folk art that urgently needs social protection.
3 Contribution and Innovation
I have long been committed to research on the computer animation industry and its communication. Following the laws of the computer animation industry, I have explored various types of animation development models and studied in depth the mechanism that generates the three cycles of the computer animation industry, the market, and the economy and society. I have written a number of related papers and books. My recent textbook “Flash Animation Short Film Production” has been published by Hebei Fine Arts Publishing House, and my recent book on the computer animation industry, “Animation Art Industry and Communication Research”, will be printed soon. Animation enthusiasts are welcome to read it and offer their opinions. The papers “A Preliminary Study on the Operation Mode of Animation Market” and “Research on China’s Early Animation Market Development” were published in publications such as “Art Market”.
“The development of art is a transformation of art paradigm in a certain sense. In every historical period, there is a dominant art paradigm, which determines the mainstream direction of art development and forms a certain artistic trend of thought. However, we have also seen that this paradigm is gradually changing, which is related to humanities, social sciences, and philosophy. However, science and technology also play an important role and influence this process” [5]. Taking the peepshow studied in this paper as an example, we should draw nourishment from its birth, spread, prosperity, and even decline, and innovate in design. “‘Original design’ must
extend the scope of design activities from ‘model design’ forward to ‘principle research’ and ‘prototype experimental research’; extend backward to ‘product implementation’, ‘commercialization’ and even the possibility research practice of recycling and reuse. Its working process can be expressed as a complete, full-program original design ‘chain’: principle research → application experiment → detailed design → production transformation → achievement promotion” [6].
We can realize a digital game design for peepshow art and improve the peepshow digitally through modern computer technology, that is, use computer animation and virtual reality technology to present the peepshow digitally and combine traditional art with contemporary computer technology. Besides rescuing national arts that are on the verge of extinction, combining new and old ideas in this way makes it easier to enrich people’s entertainment interests and forms of entertainment. “The creation history of Chinese animated films has actually gone through the creation process of borrowing, interruption, imitation and pursuing a unique national style, has experienced different development stages such as imitating foreign industrial operation modes, improving processing technology, and changing market concepts to exploring new creations with national characteristics” [7]. Because the lyrics and tunes of the peepshow clearly have the character of national folk art, we can record the sound and pictures of contemporary peepshow artists in live performance, edit these recordings non-linearly, and finally, with advanced modern digital music synthesis equipment, synthesize them into new digital music tracks for music lovers to listen to. Based on the structure of the peepshow’s wooden box equipment, we can also design a digital peepshow app or device.
The digital peepshow device should be simple, easy to carry and move, and easy for peepshow artists to use, which can to a certain extent slow the extinction of peepshow art. This measure both upholds peepshow art and creates a new art form. “Interactive film, as a narrative carrier, guides the characters in the film to make “choices”, and is a bridge that promotes narrative and performs the two-way communication between the experiencer and the work” [8]. We can also create computer-animated films based on the peepshow’s combined artistic style of rap, performance and painting, to enrich the art form of digital animated films. The peepshow is a traditional visual art with relatively strong interaction, and it shares the unique charm of modern animated films. We should enhance the interactivity of peepshow art and strengthen the development and research of interactive film animation. The ultimate goal of innovation is to maximize the inheritance, dissemination and perfection of peepshow art, promote the combination of traditional art and computer animation, and explore new marketing methods for the computer animation industry.
The animation art industry is governed to a certain extent by the laws of its development model and shows an overall “bucket effect”. Only with an overall understanding, overall planning and overall creation of the animation industry, together with an in-depth understanding of the animation market, can animation creation gain sufficient economic support and, once it has reached a certain scale, lead to the industrialization of the animation market. Both early Chinese animation and the peepshow, each of which took shape as an animation industry, have the basic characteristics of animation. Both are drawn dynamic pictures, and both involve script creation, character portrayal, coexistence of sound and picture, and selling tickets for
seats. However, because they belong to different stages, their specific contents and forms differ, and they eventually formed their own distinct industrial models.
Project funding.
1. Took charge of the 2018 Key Project of the Department of Education of Anhui Province for Humanities and Social Sciences of Regular Colleges and Universities, SK2018A0021: the emerging development model of animated films and the “going out” strategy along the lines of the “One Belt, One Road” Initiative.
2. Took charge of the Project of the Department of Education of Anhui Province, 2017JYXM0064: Study on Correlation between Creative Brands and Classroom Teaching, Take Digital Animation as an Example.
3. Took charge of the Training Targets Project of the Fourth Group of Young Cadres of Anhui University.
“Look inside and see a mass of bobbing heads, it’s heavy snow and yet it’s a cold day; Yang Bailao has sold out tofu and is returning home, holding one kilogram of noodles in his arms; He was preparing to celebrate the New Year, but he didn’t expect that the henchman came to collect the debt before his door; Huang Shiren hounded Yang Bailao to death and snatched Xi’er as his servant girl”.
I remember that when I was strolling around the Shanghai City God Temple a few years ago, an old entertainer grasped my arm and would not let me go. The passing pedestrians were startled. What had happened? There stood a big red wooden box with five big eyes painted on it, and three to five people sat in front of the big eyes, looking inside intently. A large crowd had gathered around the box, and visitors quickly packed tightly around the shouting old entertainer’s pitch in the busy street. The old entertainer was pulling the rope mechanism of the wooden box while singing, and the sound of gongs and drums could be heard from behind the box.
The performance can be said to have talking and singing, gongs and drums, acting and pictures, plots and stories, yet the whole performance is given by one person. This is the folk art of the peepshow (see Fig. 1). The peepshow, also known as “Zoetrope” or “Diorama”, was popular in Beijing, Tianjin, Shanghai, Xi’an and other places, and later became popular in Hunan, Hubei, Shanxi, Henan, etc. Its main contents are landscapes, historical legends, novels and folk stories, as well as recorded history. It is a comprehensive form of folk art. Because it has story plots, sound and a sense of the camera shot, and because the main picture inside the box is moved by pulling, it is a prototype of animation and is known as the old animation. The peepshow is a distinctive form of traditional Chinese folk art, a comprehensive art that combines painting, rapping, performance and screenwriting (see Fig. 2). Whenever the Spring Festival and other important festivals come, it becomes a shining folk event. It is a material carrier of traditional Chinese folk art with rich cultural connotation and aesthetic interest, and it is of great artistic value.
References
1. Zhifeng, H., Changcheng, H.: Innovative development of Chinese culture in the new environment. J. Central Institute of Soc. 02, 94–99 (2019)
2. Lijun, S.: Innovative thinking on the construction of film discipline under the background of new liberal arts. Drama (J. Central Acad. Drama) 03, 33–38 (2020)
3. Xinyuan, H.: Without artistic expression, technology will lose its value: Huang Xinyuan’s talks about digital media art. Design 33(12), 49–51 (2020)
4. Zhifeng, H., Xu, L.: The concept and path of “drama and film and television” under the background of new liberal arts. Drama (J. Central Acad. Drama) 03, 1–8 (2020)
5. Xiaobo, L.: Thinking on the new art form in the intelligent age. Chinese Art 10, 62–65 (2017)
6. Guanzhong, L.: “Industrial chain” innovation of original design and industrial design. J. Fine Arts 01, 11–13 (2009)
7. Jianping, L.: Thinking on the national values of original animated films. Television Res. 11, 9–11 (2009)
8. Xinyuan, H., Zi, J.: Discussion on the ontological features of interactive movies: The fusion, collision and rebirth of movies and games. Contemporary Cinema 01, 167–171 (2020)
Author Index
Abdelmonem, Hajar 77
Andrade, Tânia 223
Atef, Marwa 77
Aya, M. 77
Azevedo, Gabriel S. 142
Baalsrud Hauge, Jannicke 104
Badia, Sergi Bermúdez i 369
Bahrini, Mehrdad 18
Bala, Paulo 223
Bao, Fanyu 391
Batista, Washington P. 399
Brown, Joseph Alexander 134
Cabral, Diogo 369
Cai, Ning 51
Cameirão, Mónica S. 369
Castro, Deborah 369
Chen, Cheng Liang 459
Cheng, Darui 37
Choosri, Noppon 163
Clua, Esteban W. G. 90
Dionisio, Mara 223
Donghao, Wang 277
Elbossaty, Abdelrahman 77
Elhady, Abd Elghaffar M. 77
Elsaeed, Naira 77
Esaki, Hiroshi 255
Fei, Guangzheng 37
Felnhofer, Anna 151
Fronek, Robert 285
Fujita, Satoru 277
Furuno, Tomoya 277
Galvão, Abel R. 142
Gao, Naying 126
Gao, Yuexian 126
Ge, Chao 391
Göbl, Barbara 285
Grudpan, Supara 163
Guo, Junxiu 248
Guo, Xiaoying 325
Haji, Shaibou Abdoulai 77
Han, Honglei 37
Hlavacs, Helmut 65, 151, 240, 285
Hoshino, Junichi 270, 277
Iida, Hiroyuki 117, 126
Inokuchi, Kazuma 255
Jiang, Feng 176
Jiang, Wen Lan 459
Jiangtao, Han 434
Jiaxian, Lin 422
Khalid, Mohd Nor Akmal 126
Klaphajone, Jakkrit 163
Kobayashi, Kei 270
Kohwalter, Troy C. 90
Kothgassner, Oswald 151
Koyamada, Koji 308
Kusumi, Takashi 308, 382
Lemke, Stella 18
Lenz, Andreas 151
Li, Wenshu 325
Li, Yongteng 317
Li, Zheng 51
Liang, Fei 391
Liang, Hui 391
Lima, Samuel Vitorio 264
Lin, Ding 176
Liu, Mingjie 317
Liu, Youquan 248
Long, Yang 434
Longxing, Yao 434
Luo, Tianren 51
Mai, Cong Hung 297, 308
Malaka, Rainer 3, 18, 163
Matsubara, Hitoshi 338
Meili, Wang 422
Ming, Ju 422
Miyata, Kazunori 445
Mohsen, Abdelrahman 77
Mostafa, Haidy 77
Murta, Leonardo G. P. 90
Nakatsu, Ryohei 297, 308, 382, 409
Nisi, Valentina 188, 205, 223, 369
Nunes, Nuno Jardim 205
Olim, Sandra Câmara 188
Oliveira, Sarah 223
Omi, Yuji 277
Osada, Junichi 338
Palee, Patison 163
Pan, Zhigeng 51
Pereira, Claudia P. 110, 142, 399
Pfau, Johannes 18
Potuzak, Bernhard 65
Qiao, Liqin 325
Radeta, Marko 205
Ramharter, Alexander 240
Reda, Aya 77
Rigby, Jacob M. 369
Rios, Lenington C. 399
Rotab, Eiad 77
Sakhapov, Albert 134
Santana, Kayo C. 110, 142, 399
Sararit, Nat 3
Sarinho, Victor T. 110, 142, 399
Sarinho, Victor Travassos 264
Shohieb, Samaa M. 77
Siyuan, Zhu 422
Söbke, Heinrich 104
Sohr, Karsten 18
Stankic, Nikola 65
Stefan, Antoniu 104
Stefan, Ioana A. 104
Sun, Yusheng 391
Suzuki, Keiji 338
Suzuki, Takashi 409
Takada, Hiroyuki 382
Tang, Liyu 176
Tharwat, Mai 77
Tosa, Naoko 297, 308, 382, 409
Tsukada, Manabu 255
Tu, Renwei 317
Vasanth, Harry 369
Wang, Danli 353
Wang, Songxue 248
Wang, Xiao 317
Wang, Xueyu 353
Wattanakul, Sirprapa 163
Wongta, Noppon 163
Xie, Dazhao 445
Xie, Haoran 445
Xing, Qian 353
Xiong, Shuo 117
Yamada, Akihiro 409
Yamamoto, Kazuya 409
Yuan, Qingshu 51
Yunian, Pan 409
Zargham, Nima 18
Zhang, Qian 391
Zhao, Yanyan 353
Zhiyi, Zhang 434
Zhou, Xiang 176
Zhu, Zhongjie 317
Zuo, Long 117