English Pages [255] Year 2023
Smart Innovation, Systems and Technologies 330
Kazumi Nakamatsu · Srikanta Patnaik · Roumen Kountchev · Ruidong Li · Ari Aharari Editors
Advanced Intelligent Virtual Reality Technologies Proceedings of 6th International Conference on Artificial Intelligence and Virtual Reality (AIVR 2022)
Smart Innovation, Systems and Technologies Volume 330
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
Editors
Kazumi Nakamatsu, University of Hyogo, Kobe, Japan
Srikanta Patnaik, SOA University, Bhubaneswar, India
Roumen Kountchev, Technical University of Sofia, Sofia, Bulgaria
Ruidong Li, Kanazawa University, Kanazawa, Japan
Ari Aharari, Sojo University, Kumamoto, Japan
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-19-7741-1 ISBN 978-981-19-7742-8 (eBook) https://doi.org/10.1007/978-981-19-7742-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
AIVR 2022 Organization
Honorary Chair
Prof. Lakhmi C. Jain, KES International, UK

General Co-chairs
Assoc. Prof. Ruidong Li, Kanazawa University, Japan
Prof. Kazumi Nakamatsu, University of Hyogo, Japan

Conference Chair
Assoc. Prof. Ari Aharari, Sojo University, Japan

International Advisory Board
Srikanta Patnaik, SOA University, India
Xiang-Gen Xia, University of Delaware, USA
Shrikanth (Shri) Narayanan, University of Southern California, USA
Hossam Gaber, Ontario Tech University, Canada
Jair Minoro Abe, Paulista University, Brazil
Mario Divan, National University de la Pampa, Argentina
Chip Hong Chang, Nanyang Technological University, Singapore
Aboul Ela Hassanien, Cairo University, Egypt
Ari Aharari, Sojo University, Japan
Program Chairs
Mohd. Zaid Abdullah, Universiti Sains Malaysia, Malaysia
Minghui Li, The University of Glasgow, Singapore
Letian Huang, University of Electronic Science and Technology of China, China
Technical Program Committee
Michael R. M. Jenkin, York University, Canada
Georgios Albanis, Centre for Research and Technology, Greece
Nourddine Bouhmala, Buskerud and Vestfold University College, Norway
Pamela Guevara, University of Concepción, Chile
Joao Manuel R. S. Tavares, University of Porto, Portugal
Punam Bedi, University of Delhi, India
Der-Chyuan Lou, Chang Gung University, Taiwan
Chang Hong Lin, National Taiwan University of Science and Technology, Taiwan
Tsai-Yen Li, National Chengchi University, Taiwan
Zhang Yu, Harbin Institute of Technology, China
Yew Kee Wong, Jiangxi Normal University, China
Lili Nurliyana Abdullah, University Putra Malaysia, Malaysia
S. Nagarani, Sri Ramakrishna Institute of Technology, India
Liu Huaqun, Beijing Institute of Graphic Communication, China
Jun Lin, Nanjing University, China
Hasan Kadhem, American University of Bahrain, USA
Gennaro Vessio, University of Bari, Italy
Romana Rust, ITA Institute of Technology in Architecture, Switzerland
S. Anne Susan Georgena, Sri Ramakrishna Institute of Technology, India
Juan Gutiérrez-Cárdenas, Universidad de Lima, Peru
Devendra Kumar R. N., Sri Ramakrishna Institute of Technology, Coimbatore, India
Shilei Li, Naval University of Engineering, China
Jinglu Liu, The Open University of China, China
Aiman Darwiche, Instructor and Software Developer, USA
Alexander Arntz, University of Applied Sciences Ruhr West, Germany
Mariella Farella, University of Palermo, Italy
Daniele Schicchi, University of Palermo, Italy
Liviu Octavian Mafteiu-Scai, West University of Timisoara, Romania
Shivaram, Tata Consultancy Services, India
Niket Shastri, Sarvajnik College of Engineering and Technology, India
Gbolahan Olasina, University of KwaZulu-Natal, South Africa
Amar Faiz Zainal Abidin, Universiti Teknikal Malaysia Melaka, Malaysia
Muhammad Naufal Bin Mansor, Universiti Malaysia Perlis (UniMAP), Malaysia
Le Nguyen Quoc Khanh, Nanyang Technological University, Singapore
Organizer and Supporting Institutes
Beijing Huaxia Rongzhi Blockchain Technology Institute, China
Sojo University, Japan
Universiti Sains Malaysia, Malaysia
Universiti Teknologi Malaysia, Malaysia
Chang Gung University, China
Preface
The international conference series, Artificial Intelligence and Virtual Reality (AIVR), has been bringing together researchers and scientists, both industrial and academic, developing novel Artificial Intelligence and Virtual Reality outcomes. Research in Virtual Reality (VR) is concerned with computing technologies that allow humans to see, hear, talk, think, learn, and solve problems in virtual and augmented environments. Research in Artificial Intelligence (AI) addresses technologies that allow computing machines to mimic these same human abilities. Although these two fields evolved separately, they share an interest in human senses, skills, and knowledge production. Thus, bringing them together will enable us to create more natural and realistic virtual worlds and develop better, more effective applications. Ultimately, this will lead to a future in which humans and humans, humans and machines, and machines and machines are interacting naturally in virtual worlds, with use cases and benefits we are only just beginning to imagine.

The sixth International Conference on Artificial Intelligence and Virtual Reality (AIVR 2022) was originally scheduled to be held in Kumamoto, Japan, on July 22–24, 2022. However, with the world still fighting the COVID-19 pandemic, the safety and well-being of our participants came first. Considering the health and safety of everyone, we had to make the tough decision to convert AIVR 2022 into a fully online conference. Past AIVR conferences were held in Nagoya (2018) and Singapore (2019) and as virtual conferences (2020 and 2021). As part of the successful AIVR conference series, AIVR 2022 provided an ideal opportunity to reflect on developments over the last two decades and to focus on future developments.
The topics of AIVR 2022 focus on theory, design, development, testing, and evaluation of all Virtual Reality intelligent technologies applicable/applied to various systems and their infrastructures, and the major topics cover system techniques, performance, and implementation; content creation and modeling; cognitive aspects, perception, and user behavior in terms of Virtual Reality; AI technologies; interactions/interactive and responsive environments; and applications and case studies.
We accepted one invited paper and 16 regular papers out of 44 submitted papers from China, Germany, Greece, Japan, Malaysia, Brazil, the UK, etc., at AIVR 2022. This volume is devoted to presenting all the accepted papers of AIVR 2022.

Lastly, we wish to express our sincere appreciation to all participants and to the technical program committee for their review of all the submissions, which was vital to the success of AIVR 2022, and also to the organizers, who dedicated their time and efforts to planning, promoting, organizing, and helping the conference. Special appreciation is extended to our keynote and invited speakers: Prof. Xiang-Gen Xia, University of Delaware, USA; Prof. Shrikanth (Shri) Narayanan, University of Southern California, USA; Prof. Chip Hong Chang, Nanyang Technological University, Singapore; and Prof. Minghui Li, University of Glasgow, UK, who gave very beneficial speeches for the conference audience, and also to Prof. Jair M. Abe, Paulista University, Sao Paulo, Brazil, who kindly contributed an invited paper to AIVR 2022.

Kazumi Nakamatsu, Kobe, Japan
Roumen Kountchev, Sofia, Bulgaria
Srikanta Patnaik, Bhubaneswar, India
Ruidong Li, Kanazawa, Japan
Ari Aharari, Kumamoto, Japan
July 2022
Contents

Part I Invited Paper

1 Paraconsistency and Paracompleteness in AI: Review Paper
Jair Minoro Abe, João I. da Silva Filho, and Kazumi Nakamatsu

Part II Regular Papers

2 Decision Support Multi-agent Modeling and Simulation of Aeronautic Marine Oil Spill Response
Xin Li, Hu Liu, YongLiang Tian, YuanBo Xue, and YiXiong Yu

3 Transferring Dense Object Detection Models To Event-Based Data
Vincenz Mechler and Pavel Rojtberg

4 Diagnosing Parkinson’s Disease Based on Voice Recordings: Comparative Study Using Machine Learning Techniques
Sara Khaled Abdelhakeem, Zeeshan Mohammed Mustafa, and Hasan Kadhem

5 Elements of Continuous Reassessment and Uncertainty Self-awareness: A Narrow Implementation for Face and Facial Expression Recognition
Stanislav Selitskiy

6 Topic-Aware Networks for Answer Selection
Jiaheng Zhang and Kezhi Mao

7 Design and Implementation of Multi-scene Immersive Ancient Style Interaction System Based on Unreal Engine Platform
Sirui Yang, Qing Qing, Xiaoyue Sun, and Huaqun Liu

8 Auxiliary Figure Presentation Associated with Sweating on a Viewer’s Hand in Order to Reduce VR Sickness
Masaki Omata and Mizuki Suzuki

9 Design and Implementation of Immersive Display Interactive System Based on New Virtual Reality Development Platform
Xijie Li, Huaqun Liu, Tong Li, Huimin Yan, and Wei Song

10 360-Degree Virtual Reality Videos in EFL Teaching: Student Experiences
Hui-Wen Huang, Kai Huang, Huilin Liu, and Daniel G. Dusza

11 Medical-Network (Med-Net): A Neural Network for Breast Cancer Segmentation in Ultrasound Image
Yahya Alzahrani and Boubakeur Boufama

12 Auxiliary Squat Training Method Based on Object Tracking
Yunxiang Pang, Haiyang Sun, and Yiqun Pang

13 Study on the Visualization Modeling of Aviation Emergency Rescue System Based on Systems Engineering
Yuanbo Xue, Hu Liu, Yongliang Tian, and Xin Li

14 An AI-Based System Offering Automatic DR-Enhanced AR for Indoor Scenes
Georgios Albanis, Vasileios Gkitsas, Nikolaos Zioulis, Stefanie Onsori-Wechtitsch, Richard Whitehand, Per Ström, and Dimitrios Zarpalas

15 Extending Mirror Therapy into Mixed Reality—Design and Implementation of the Application PhantomAR to Alleviate Phantom Limb Pain in Upper Limb Amputees
Cosima Prahm, Korbinian Eckstein, Michael Bressler, Hideaki Kuzuoka, and Jonas Kolbenschlag

16 An Analysis of Trends and Problems of Information Technology Application Research in China’s Accounting Field Based on CiteSpace
Xiwen Li, Jun Zhang, Ke Nan, and Xiaoye Niu

17 Augmented Reality Framework and Application for Aviation Emergency Rescue Based on Multi-Agent and Service
Siliang Liu, Hu Liu, and Yongliang Tian

Author Index
About the Editors
Kazumi Nakamatsu received the Ms. Eng. and Dr. Sci. from Shizuoka University and Kyushu University, Japan, respectively. His research interests encompass various kinds of logic and their applications to Computer Science, especially paraconsistent annotated logic programs and their applications. He has recently developed paraconsistent annotated logic programs called ALPSN (Annotated Logic Program with Strong Negation), VALPSN (Vector ALPSN), EVALPSN (Extended VALPSN) and bf-EVALPSN (before-after EVALPSN), and has applied them to various intelligent systems, such as a safety-verification-based railway interlocking control system and process order control. He is an author of over 180 papers, 30 book chapters, and 20 edited books published by prominent publishers. Kazumi Nakamatsu has chaired various international conferences, workshops, and invited sessions, and he has been a member of numerous international program committees of workshops and conferences in the area of Computer Science. He has served as the editor-in-chief of the International Journal of Reasoning-based Intelligent Systems (IJRIS); he is now the founding editor of IJRIS and an editorial board member of many international journals. He has contributed numerous invited lectures at international workshops, conferences, and academic organizations. He is also a recipient of numerous research paper awards.

Dr. Srikanta Patnaik is presently working as the director of International Relation and Publication of SOA University. He is a full professor in the Department of Computer Science and Engineering, SOA University, Bhubaneswar, India. He received his Ph.D. (Engineering) on Computational Intelligence from Jadavpur University, India, in 1999. He has supervised more than 25 Ph.D. theses and 60 master's theses in the areas of computational intelligence, machine learning, soft computing applications, and re-engineering. Dr. Patnaik has published around 100 research papers in international journals and conference proceedings. He is the author of two textbooks, 52 edited volumes, and a few invited book chapters, published by leading international publishers such as Springer-Verlag and Kluwer Academic. Dr. Srikanta Patnaik is the editor-in-chief of International Journal of Information and Communication Technology and International Journal of Computational Vision and Robotics, published by Inderscience Publishing House, England, and of International Journal of Computational Intelligence in Control, published by MUK Publication; the editor of Journal of Information and Communication Convergence Engineering; and an associate editor of Journal of Intelligent and Fuzzy Systems (JIFS), all of which are Scopus-indexed journals. He is also the editor-in-chief of the book series “Modeling and Optimization in Science and Technology”, published by Springer, Germany, and of Advances in Computer and Electrical Engineering (ACEE) and Advances in Medical Technologies and Clinical Practice (AMTCP), published by IGI Global, USA. Dr. Patnaik has travelled to more than 20 countries across the globe to deliver invited talks and keynote addresses. He is also a visiting professor at several universities in China, South Korea, and Malaysia.

Prof. Roumen Kountchev, Ph.D., D.Sc., is a professor at the Faculty of Telecommunications, Department of Radio Communications and Video Technologies, Technical University of Sofia, Bulgaria. His areas of interest are digital signal and image processing, image compression, multimedia watermarking, video communications, pattern recognition and neural networks. Prof. Kountchev has 350 papers published in magazines and conference proceedings, 20 books, 47 book chapters, and 21 patents. He has been a principal investigator of 38 research projects. At present, he is a member of the Euro Mediterranean Academy of Arts and Sciences and President of the Bulgarian Association for Pattern Recognition (a member of the International Association for Pattern Recognition).
He is an editorial board member of: International Journal of Reasoning-based Intelligent Systems; International Journal of Broad Research in Artificial Intelligence and Neuroscience; KES Focus Group on Intelligent Decision Technologies; Egyptian Computer Science Journal; International Journal of BioMedical Informatics and e-Health; and International Journal of Intelligent Decision Technologies. He has been a plenary speaker at: WSEAS International Conference on Signal Processing, 2009, Istanbul, Turkey; WSEAS International Conference on Signal Processing, Robotics and Automation, University of Cambridge 2010, UK; WSEAS International Conference on Signal Processing, Computational Geometry and Artificial Vision 2012, Istanbul, Turkey; International Workshop on Bioinformatics, Medical Informatics and e-Health 2013, Ain Shams University, Cairo, Egypt; Workshop SCCIBOV 2015, Djillali Liabes University, Sidi Bel Abbes, Algeria; International Conference on Information Technology 2015 and 2017, Al Zayatoonah University, Amman, Jordan; WSEAS European Conference of Computer Science 2016, Rome, Italy; The 9th International Conference on Circuits, Systems and Signals, London, UK, 2017; IEEE International Conference on High Technology for Sustainable Development 2018 and 2019, Sofia, Bulgaria; and The 8th International Congress of Information and Communication Technology, Xiamen, China, 2018. He was also general chair of the International Workshop New Approaches for Multidimensional Signal Processing, July 2020, Sofia, Bulgaria.

Assoc. Prof. Ruidong Li is an associate professor at Kanazawa University, Japan. Before joining this university, he was a senior researcher at the National Institute of Information and Communications Technology (NICT), Japan. He serves as the secretary of IEEE ComSoc Internet Technical Committee (ITC), is the founder and chair of IEEE SIG on Big Data Intelligent Networking and IEEE SIG on Intelligent Internet Edge, and is the co-chair of young research group for Asia future internet forum. He is an associate editor of IEEE Internet of Things Journal and has also served as a guest editor for a set of prestigious magazines, transactions, and journals, such as IEEE Communications Magazine, IEEE Network Magazine, and IEEE Transactions. He has also served as chair for several conferences and workshops, for example as general co-chair for AIVR2019, IEEE INFOCOM 2019/2020/2021 ICCN workshop, IEEE MSN 2020, BRAINS 2020, IEEE ICDCS 2019/2020 NMIC workshop and IEEE Globecom 2019 ICSTO workshop, and as publicity co-chair for INFOCOM 2021. His research interests include future networks, big data networking, intelligent Internet edge, Internet of things, network security, information-centric network, artificial intelligence, quantum Internet, cyber-physical system, naming and addressing schemes, name resolution systems, and wireless networks. He is a senior member of IEEE and a member of IEICE.

Assoc. Prof. Ari Aharari (Ph.D.) received M.E. and Ph.D. in Industrial Science and Technology Engineering and Robotics from Niigata University and Kyushu Institute of Technology, Japan, in 2004 and 2007, respectively. In 2004, he joined GMD-JAPAN as a research assistant. He was a research scientist and coordinator at FAIS-Robotics Development Support Office from 2004 to 2007. He was a postdoctoral research fellow of the Japan Society for the Promotion of Science (JSPS) at Waseda University, Japan, from 2007 to 2008. He served as a senior researcher of Fukuoka IST involved in the Japan Cluster Project from 2008 to 2010. In 2010, he became an assistant professor at the faculty of Informatics of Nagasaki Institute of Applied Science.
Since 2012, he has been an associate professor at the Department of Computer and Information Science, Sojo University, Japan. His research interests are IoT, robotics, IT agriculture, image processing and data analysis (Big Data) and their applications. He is a member of IEEE (Robotics and Automation Society), RSJ (Robotics Society of Japan), IEICE (Institute of Electronics, Information and Communication Engineers), and IIEEJ (Institute of Image Electronics Engineers of Japan).
Part I
Invited Paper
Chapter 1
Paraconsistency and Paracompleteness in AI: Review Paper
Jair Minoro Abe, João I. da Silva Filho, and Kazumi Nakamatsu
Abstract The authors analyse how the logical treatment of the concepts of inconsistency and paracompleteness contributes to a better understanding of AI’s current state of development. In particular, they consider the relationship between Artificial Intelligence and a new type of logic, called Paraconsistent Annotated Logic, which effectively manipulates these concepts, both computationally and in hardware.
J. M. Abe, Paulista University, São Paulo, Brazil
J. I. da Silva Filho, Santa Cecília University, Santos, Brazil
K. Nakamatsu, University of Hyogo, Hyogo, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_1

1.1 Introduction

1.1.1 Classical and Non-classical Logic

Logic, until very recently, was a single science, which progressed linearly, even after its mathematisation by mathematicians, logicians and philosophers such as Boole, Peano, Frege, Russell and Whitehead. The revolutionary developments of the 1930s, such as those due to Gödel and Tarski, still fall within what we can call classical or traditional logic. Despite all the advances in traditional logic, another, parallel revolution of a very different nature took place in the science created by Aristotle: the institution of non-classical logics. As in the case of non-Euclidean geometry, these logics produced a profound transformation in the scientific sphere, whose philosophical consequences have not yet been investigated systematically and comprehensively.
We call classical or traditional logic the study of the first-order predicate calculus, with or without equality, as well as some of its subsystems, such as the classical propositional calculus, and some of its extensions, for example, traditional higher-order logic (type theory) and the standard systems of set theory (Zermelo–Fraenkel, von Neumann–Bernays–Gödel, Kelley–Morse, NF Quine–Rosser, ML Quine–Wang, etc.). The logic under consideration is based on well-established syntax and semantics; thus, the usual semantics of predicate calculus is based on Tarski’s concept of truth.

Non-classical logic is characterised by amplifying, in some way, traditional logic or by infringing or limiting its core principles or assumptions [1]. Among the first group, called logics complementary to the classical one, we mention the traditional logics of alethic modalities, deontic modalities, epistemic operators and temporal operators. Among the second group, called heterodox logics or rivals of the classical one, we cite paraconsistent, paracomplete and intuitionistic logics without negation (Griss, Gilmore and others).

Logic, we must stress, is much more than the discipline of valid forms of inference. It would be difficult to fit, e.g., model theory in its current form and recursion theory into a logic thus defined. However, for this article, we can identify (deductive) logic as the discipline especially concerned with valid inference (or reasoning). On the other hand, each deductive logic L is usually associated with an inductive logic L', which, under certain conditions, indicates how inferences that are invalid according to L can still be used. The patterns for this to be legitimate are encoded in L'. Inductive logics are perfectly placed among non-classical logics (perhaps as a replacement for the corresponding deductive logics) [1].
Artificial Intelligence (AI) has contributed to the progress of several new logics (non-monotonic logics, default logics, defeasible logics, and paraconsistent logics in general). This is because, particularly in the case of expert systems, we need non-traditional forms of inference. Paraconsistency, for example, is called for because one regularly works with inconsistent (contradictory) sets of information. In this chapter, we outline the great relevance that AI is acquiring for a deep understanding of the meaning of logicity (and, indirectly, for the very understanding of reason, its structure, limits and forms of application). To do so, we focus only on the case of paraconsistent logic and paracomplete logic, which can undoubtedly be seen as among the most heterodox of the heterodox logics, although technically they can be constructed as logics complementary to classical logic.
1.2 Paraconsistent and Paracomplete Logic

A (deductive) theory T, based on a logic L, is said to be consistent if among its theorems there are no two such that one is the negation of the other; otherwise, T is called inconsistent. Theory T is called trivial if all the sentences (closed formulas) of its language are theorems; if not, T is nontrivial. If L is one of the standard logics, such as classical logic or Brouwer–Heyting intuitionistic logic, T is trivial if and only if it is inconsistent. In other words, logics like these do not separate the concepts of inconsistency and triviality.

L is called paraconsistent if it can serve as a foundation for inconsistent but nontrivial theories. (Only in certain specific circumstances does the presence of a contradiction imply trivialisation.) In other words, paraconsistent logic can handle inconsistent information systems without the danger of trivialisation.

The forerunners of paraconsistent logic were the Polish logician J. Łukasiewicz and the Russian philosopher N. A. Vasiliev. Neither of them had, at the time, a broad view of classical logic as we see it today; they treated it more or less through Aristotle’s prism, in keeping with the then dominant trends in the field. Around 1910, simultaneously though independently, they raised the possibility of a paraconsistent logic that would restrict, for example, the principle of contradiction when formulated as follows: given two contradictory propositions, that is, one of which is the negation of the other, one of the propositions is false. Vasiliev even came to articulate a certain paraconsistent logic, which he baptised imaginary, modifying the Aristotelian syllogistic.

The Polish logician S. Jaśkowski, a disciple of Łukasiewicz, was the first logician to structure a paraconsistent propositional calculus. In 1948, he published his ideas on logic and contradiction, showing how one could construct a paraconsistent sentential calculus with convenient motivation. Jaśkowski’s system, which he named discursive logic, was developed further (from 1968 onwards) through the work of authors such as J. Kotas, L. Furmanowski, L. Dubikajtis, N. C. A. da Costa and C. Pinter.
Thus, an actual discursive logic was built, encompassing a first-order predicate calculus and a higher-order logic (there are even discursive set theories, intrinsically linked to the attribute theory, based on Lewis’ S5 calculus) [1].

The initial systems of paraconsistent logic containing all logical levels, thus involving propositional, predicate and description calculi and higher-order logic, are due to N. C. A. da Costa (1954 onwards). This work was carried out independently of the inquiries of the authors mentioned earlier. Today, there are even paraconsistent systems of set theory, strictly stronger than the classical ones, as they contain them as strict subsystems, and there is paraconsistent mathematics. This mathematics is related to fuzzy mathematics, which, from a certain point of view, fits into the list of the former.

As a result of the elaboration of paraconsistent logic, it has been proved that it is possible to manipulate inconsistent and robust information systems without eliminating contradictions and without falling into trivialisation. It is worth mentioning that paraconsistent logic was born out of purely theoretical considerations, both logical-mathematical and philosophical. The former refer, for example, to problems related to the concept of truth, the paradoxes of set theory and the vagueness inherent in natural and scientific languages. The latter are correlated with themes such as the foundations of dialectics, notions of rationality and logic, and the acceptance of scientific theories.
Some of the consequences of structuring paraconsistent logic, which can be classified into two categories, ‘positive’ and ‘negative’, are as follows:

Positive:
(1) better elucidation of some central concepts of logic, such as negation and contradiction, as well as the role of the abstraction scheme in set theory (set theory antinomies);
(2) a deeper understanding of specific philosophical theories, especially dialectics and Meinong’s object theory;
(3) proof of the possibility of strong and inconsistent, though not trivial, theories (common paradoxes can be treated from a new perspective);
(4) organisation of ontological schemes different from traditional ontology.

Negative:
(1) demonstration that specific criticisms of dialectics appear to be unfounded (e.g. Popper’s well-known remarks);
(2) proof that the methodological requirements imposed on scientific theories prove to be too restrictive and deserve to be liberalised;
(3) evidence that the usual conception of truth as correspondence, à la Tarski, does not entail the laws of classical logic without extra assumptions, which are usually kept implicit.

Details on paraconsistent logic can be found in [1, 2].

In general, a paracomplete logic can be conceived as the underlying logic of an incomplete theory in the strong sense, i.e. a theory according to which a proposition and its negation are both false. The motivation for paracomplete systems is that the classical requirement that at least one of a proposition and its negation be true does not always fit our intuitions. For instance, if P is a vague predicate and a is a borderline individual, we may feel that both P(a) and the negation of P(a) are false. In a technical sense, paracomplete logic can be considered to be dual to paraconsistent logic. Examples of paracomplete logic are intuitionistic logic, multivalued logic, annotated logic, etc.
It is worth mentioning that after discovering paracomplete logic, it was found that the notions of paraconsistent logic and paracomplete logic are independent. There are paraconsistent logics that are not paracomplete, and there are paracomplete logics that are not paraconsistent. Furthermore, some logics are paraconsistent and paracomplete simultaneously, such as annotated logics [2].
1.3 AI and Formal Systems
Nowadays, AI needs to manipulate inconsistent information systems, and to process them as such via paraconsistent programming. Trying to transform these systems into consistent ones would be not only impractical but, above all, theoretically pointless. Therefore, AI constitutes a field where paraconsistent logic naturally finds critical applications, and computing in general is closely linked to paraconsistency. From a certain angle, the non-monotonic and 'default' logics are included in the class of paraconsistent logics (broadly construed). For details, the reader can consult, among others, references [1] and [2].
1 Paraconsistency and Paracompleteness in AI: Review Paper
In connection with the preceding exposition, here are some philosophically significant problems: (a) Are non-classical logics really logics? (b) Can there be logics that rival the classical one? (c) Ultimately, would the so-called rival logics not be merely complementary to the classical one? (d) What is the relationship between rationality and logic? (e) Can reason be expressed through different logics that are incompatible with each other? Obviously, within the limits of this article, we cannot address all these questions, not even in a summarised way. However, adopting an 'operational' position, in which a logical system denotes a kind of inference organon, AI leads us inescapably to the conclusion that there are several kinds of logic, classical and non-classical, and among the latter, both complements and rivals of classical logic. Furthermore, AI corroborates the possibility and practical relevance of logics in the paraconsistent category, so far removed from the standards of logicity accepted until recently. This is, without a doubt, surprising for those not used to the latest advances in information technology. It is worth remembering that numerous arguments weaken the position of those who defend the absolute character of classical logic. Here are four such arguments: (1) Any given rational context is compatible with infinitely many logics capable of serving as underlying logics. (2) Fundamental logical concepts, such as negation, have to be seen as 'family resemblance' concepts in Wittgenstein's sense. There is no particular reason for refusing, say, paraconsistent negation the dignity of negation: if one does so, one should also maintain that the lines of non-Euclidean geometries are not, in effect, lines. (3) The usual semantics of, e.g., the restricted predicate calculus is based on set theory. As there are several (classical) set theories, there are numerous possible interpretations of such semantics, not equivalent to each other. Consequently, that calculus is not as well defined as it appears at first sight. (4) There is no sound and complete axiomatisation of traditional second-order (and higher-order) logic. It therefore escapes (recursive) axiomatisation. Thus, the answers to questions (a) and (b) are affirmative. A simple answer to question (c) seems difficult: at bottom, it is primarily a terminological problem. However, in principle, as a result of the previous discussion, nothing prevents us from accepting that there are rival logics that are not among those complementary to the traditional one. Finally, on (d) and (e), we emphasise that we have excellent arguments to show that reason remains reason even when it manifests itself through a non-classical logic (classical logic itself is not a well-defined system). From the above, we believe that the conclusions can be summarised as follows: Science is more a struggle, an advance, than a stage acquired or conquered, and the fundamental scientific categories change over time. As Enriques [3] points out,
science appears imperfect in all its parts, developing through self-correction and self-integration, to which others are gradually added; there is a constant back and forth from the foundations to the most complex theories, correcting errors and eliminating inconsistencies. However, history proves that every scientific theory contains something true: Newtonian mechanics, though surpassed by Einstein's, evidently contains traces of truth; if its field of application is conveniently restricted, it works, predicts and therefore contains a bit of truth. Nevertheless, the real truth is a constant walk towards the truth. This is the teaching of history, beyond any serious doubt. Even more, logic is constituted through history, and it does not seem possible to predict the vicissitudes of its evolution. It is not just about progress in extension; the concept of logicity itself has changed. An expert from the beginning of the century, although familiar with the works of Frege, Russell and Peano, could hardly have foreseen the transformations that would take place in logic in the last forty years. Today, heterodox logics have entered the scene with great impetus: no one can predict where polyvalent, relevant and paraconsistent logics will take us. Perhaps, in the coming years, a new alteration of the idea of logicity is in store, impossible to imagine at the moment [1]. 'Reason, as defined…, is the faculty of conceiving, judging and reasoning. Conceiving and reasoning are the exclusive patrimony of reason, but judging is also a rational activity in the precise sense of the word. Some primitive form of non-rational intuition provides the basis for judgment; it is reason that judges, since it alone manipulates and combines concepts. Most common uses of the word "reason" derive from reason conceptualised as the faculty of conceiving, judging and reasoning. Thus, to discern well and adopt rational norms of life, one must consider reason in the sense defined.
Furthermore, there is a set of rules and principles regulating the use of reason, primarily as it manifests itself in rational contexts. It is also permissible to call this set of rules and principles reason. When we ask whether reason transforms itself or remains invariant, it is undoubtedly more convenient to interpret the question as referring to reason as a set of rules and principles and not as a faculty. So formulated, the problem has an immediate answer: reason has changed over time. For example, the rational categories underlying Aristotelian, Newtonian and modern physics diverge profoundly; ipso facto, the principles that govern these categories vary, from which it can be concluded that reason itself has been transformed.' (da Costa [1]). Consequently, reason does not cease to be reason, even if it is expressed through a different logic. AI is currently one of the pillars on which the considerations just made are based. It therefore has both a practical value, in technological application, and a theoretical value, contributing to a better solution to the problems of logic, reason and culture.
1.4 Paraconsistent Annotated Evidential Logic Eτ
We focus on a particular paraconsistent and paracomplete logic, namely the paraconsistent annotated evidential logic Eτ (logic Eτ, for short). The atomic formulas of the language of Eτ are of the type p(μ, λ), where (μ, λ) ∈ [0, 1]² and [0, 1] is the real unit interval. The symbol p denotes a propositional variable in the usual sense. The pair (μ, λ) is called an annotation constant. On the real unit square [0, 1] × [0, 1], an order relation is defined as follows: (μ1, λ1) ≤ (μ2, λ2) iff μ1 ≤ μ2 and λ2 ≤ λ1. The pair ([0, 1]², ≤) constitutes a lattice, denoted by τ. The formula p(μ, λ) can be intuitively read (among other readings): 'It is assumed that p's favourable evidence degree (or belief, probability, etc.) is μ and its contrary evidence degree (or disbelief, etc.) is λ'. Thus:
• (1.0, 0.0) indicates total favourable evidence,
• (0.0, 1.0) indicates total unfavourable evidence,
• (1.0, 1.0) indicates total inconsistency, and
• (0.0, 0.0) indicates total absence of evidence (absence of information).
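The order relation of τ and the four corner annotations can be illustrated with a minimal Python sketch. The names `Annotation` and `leq` are hypothetical, chosen only for this illustration; they do not come from the source.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Annotation constant (mu, lam) of the lattice tau; both lie in [0, 1]."""
    mu: float   # favourable evidence degree
    lam: float  # contrary evidence degree


def leq(a: Annotation, b: Annotation) -> bool:
    """Order of tau: (mu1, lam1) <= (mu2, lam2) iff mu1 <= mu2 and lam2 <= lam1."""
    return a.mu <= b.mu and b.lam <= a.lam


TRUE = Annotation(1.0, 0.0)          # total favourable evidence
FALSE = Annotation(0.0, 1.0)         # total unfavourable evidence
INCONSISTENT = Annotation(1.0, 1.0)  # total inconsistency
PARACOMPLETE = Annotation(0.0, 0.0)  # total absence of evidence
```

Under this order, `FALSE` lies below every annotation and `TRUE` above every annotation, while `INCONSISTENT` and `PARACOMPLETE` are incomparable with each other.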
The operator ~: |τ| → |τ| defined by ~[(μ, λ)] = (λ, μ) is taken as the 'meaning' of the logical negation of the logic Eτ. The values of the favourable and unfavourable evidence degrees are assigned, for example, by experts who use heuristic knowledge, probability or statistics. We can consider several important concepts (all with 0 ≤ μ, λ ≤ 1):
Segment DB, the perfectly defined segment: μ + λ − 1 = 0.
Segment AC, the perfectly undefined segment: μ − λ = 0.
Uncertainty degree: G_un(μ, λ) = μ + λ − 1.
Certainty degree: G_ce(μ, λ) = μ − λ.
To fix ideas, by using the uncertainty and certainty degrees, we can define the following 12 states: four extreme states (false, true, inconsistent and paracomplete) and eight non-extreme states (see Fig. 1.1 and Table 1.1). Such logical states can be represented in the standard Cartesian system and described by the values of the certainty and uncertainty degrees. In this text, we have chosen a resolution of 12 (the number of regions considered, according to Fig. 1.2). However, the resolution depends entirely on the precision required in the output analysis, and it can be adapted externally according to the application. The limit values, called control values, are as follows:
V_cic = maximum value of uncertainty control = C_3.
V_cve = maximum value of certainty control = C_1.
V_cpa = minimum value of uncertainty control = C_4.
V_cfa = minimum value of certainty control = C_2.
For the discussion in the present text, we use C_1 = C_3 = ½ and C_2 = C_4 = −½.
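As a rough illustration of how the two degrees and the control values partition the unit square, the following Python sketch classifies an annotation into one of the 12 states. The source defines the regions only through Fig. 1.2, so the region boundaries for the non-extreme states and the tie-breaking used here (the degree with the larger magnitude names the state) are assumptions, as are the function names.

```python
def negate(mu: float, lam: float) -> tuple[float, float]:
    """Negation of E-tau: ~(mu, lam) = (lam, mu)."""
    return lam, mu


def degrees(mu: float, lam: float) -> tuple[float, float]:
    """Return (certainty degree G_ce, uncertainty degree G_un)."""
    return mu - lam, mu + lam - 1.0


def state(mu: float, lam: float, c1=0.5, c2=-0.5, c3=0.5, c4=-0.5) -> str:
    """Classify an annotation into one of the 12 states of Table 1.1."""
    g_ce, g_un = degrees(mu, lam)
    # Extreme states, delimited by the control values C1..C4.
    if g_ce >= c1:
        return "V"
    if g_ce <= c2:
        return "F"
    if g_un >= c3:
        return "T"
    if g_un <= c4:
        return "⊥"
    # Non-extreme states (assumed rule): the degree with the larger
    # magnitude names the state, the other one gives the tendency.
    if abs(g_ce) >= abs(g_un):
        quasi = "QV" if g_ce >= 0 else "QF"
        tends = "T" if g_un >= 0 else "⊥"
    else:
        quasi = "QT" if g_un >= 0 else "Q⊥"
        tends = "V" if g_ce >= 0 else "F"
    return f"{quasi} → {tends}"
```

For example, `state(0.75, 0.625)` has G_ce = 0.125 and G_un = 0.375, so under the assumed rule it falls in the quasi-inconsistent region tending to true.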
Fig. 1.1 Representation of the extreme and non-extreme states
Table 1.1 Extreme and non-extreme states
Extreme states | Symbol | Non-extreme states | Symbol
True | V | Quasi-true tending to inconsistent | QV → T
False | F | Quasi-true tending to paracomplete | QV → ⊥
Inconsistent | T | Quasi-false tending to inconsistent | QF → T
Paracomplete | ⊥ | Quasi-false tending to paracomplete | QF → ⊥
 | | Quasi-inconsistent tending to true | QT → V
 | | Quasi-inconsistent tending to false | QT → F
 | | Quasi-paracomplete tending to true | Q⊥ → V
 | | Quasi-paracomplete tending to false | Q⊥ → F
Fig. 1.2 Extreme and non-extreme states
Fig. 1.3 Prototype of the terrestrial mobile robot
With the decision states and the degrees of certainty and uncertainty, we obtain a logic analyser called para-analyser [4]. Such an analyser materialised with electrical circuits gave rise to a logic controller called para-control [4]. Below we describe some applications made with such controllers.
1.5 Application in Robotics
1.5.1 Description of the Prototype
This project builds on earlier applications of paraconsistent logic in predecessor robots [5] and on the development of robotics in autonomous navigation systems [6]. The prototype, implemented with the ATmega 2560 microcontroller, is shown in Fig. 1.3. HC-SR04 ultrasonic sensors were installed on the prototype. At the front of the robot are the traction motors, controlled by pulse width modulation (PWM). At the back is the feature that distinguishes this prototype from its predecessors: a servomotor that controls the robot's direction. Another difference from the previous robots is the use of an LCD to monitor the readings of the ultrasonic sensors and to observe the angle set by the servomotor. All these observations were critical in the robot's movement tests [6].
1.5.2 Development of the Paraconsistent Annotated Logic Algorithm
With the concepts of the paraconsistent annotated evidential logic Eτ and the finished mechatronic prototype, five placements of supposed static obstacles in front of the robot were simulated. The simulation criterion was based on models placed at different positions to the robot's left and right. Next, the readings of the frontal sensors were normalised to the values μ and λ of the lattice, as shown in Eqs. (1.1) and (1.2). The normalisation converts the distances obtained from the sensors to the range from 0 to 1 required by paraconsistent logic [2].
$$\mu = \frac{\text{Left Sensor}}{200} \tag{1.1}$$
$$\lambda = 1 - \frac{\text{Right Sensor}}{200} \tag{1.2}$$
Table 1.2 Simulation of obstacles in different positions (front sensors)
Situation | Left (cm) | μ | Right (cm) | λ | Uncertainty degree | Set point (°)
1 | 10 | 0.2 | 50 | 0 | −0.8 | −75.76
2 | 20 | 0.4 | 40 | 0.2 | −0.4 | −37.88
3 | 30 | 0.6 | 30 | 0.4 | 0 | 0
4 | 40 | 0.8 | 20 | 0.6 | 0.4 | 37.88
5 | 50 | 1 | 10 | 0.8 | 0.8 | 75.76
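The normalisation and the uncertainty degree can be checked against Table 1.2 with a short Python sketch. Two points here are assumptions rather than statements of the source: the rows of Table 1.2 are reproduced with a 50 cm full scale (the value used in the controller listing) rather than the 200 of Eqs. (1.1) and (1.2), so the full scale is left as a parameter; and the factor 94.7 degrees per unit of uncertainty is inferred from the table itself (for example −75.76 / −0.8). All function names are hypothetical.

```python
def normalise(left_cm: float, right_cm: float, full_scale_cm: float = 200.0):
    """Eqs. (1.1) and (1.2): map sensor distances to (mu, lam) in [0, 1]."""
    left = min(left_cm, full_scale_cm)    # clamp readings to the full scale
    right = min(right_cm, full_scale_cm)
    mu = left / full_scale_cm
    lam = 1.0 - right / full_scale_cm
    return mu, lam


def uncertainty_degree(mu: float, lam: float) -> float:
    """G_un = mu + lam - 1."""
    return mu + lam - 1.0


# Rows of Table 1.2 as (left cm, right cm); they match a 50 cm full scale.
rows = [(10, 50), (20, 40), (30, 30), (40, 20), (50, 10)]
DEG_PER_UNIT = 94.7  # inferred from Table 1.2, not stated in the source

table = []
for left, right in rows:
    mu, lam = normalise(left, right, full_scale_cm=50.0)
    g_un = uncertainty_degree(mu, lam)
    table.append((round(mu, 2), round(lam, 2), round(g_un, 2),
                  round(DEG_PER_UNIT * g_un, 2)))
```

With the default full scale of 200 cm, `normalise(150, 150)` gives (0.75, 0.25), which corresponds to the second situation of Table 1.3.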
Using the proposition p, 'The robot's front is free', the concepts of paraconsistent logic were applied. The certainty and uncertainty degrees were calculated from the values μ and λ obtained by Eqs. (1.1) and (1.2). It was noticed that the uncertainty degree produced values that lend themselves to direct use in the set point of the servomotor. Then, with other values, six new tests were performed to define the robot's behaviour when supposed obstacles are placed simultaneously at the same distance from the left and right frontal sensors. Table 1.2 shows the simulations and the results obtained in each case. Table 1.3 shows a gradual change in the certainty degree for these new cases, ranging from 0.99 to −0.70.
Table 1.3 Simulation of obstacles in equal positions simultaneously (front sensors)
Situation | Left (cm) | μ | Right (cm) | λ | Certainty degree
1 | 180 | 1 | 180 | 0.01 | 0.99
2 | 150 | 0.75 | 150 | 0.25 | 0.50
3 | 120 | 0.60 | 120 | 0.40 | 0.20
4 | 90 | 0.45 | 90 | 0.55 | −0.10
5 | 60 | 0.30 | 60 | 0.70 | −0.40
6 | 30 | 0.15 | 30 | 0.85 | −0.70
The control values obtained in the simulations were applied in the paraconsistent algorithm, programmed in C directly in the Arduino Integrated Development Environment (IDE) for the ATmega 2560. These values were used in decision-making for speed control and braking. To facilitate implementation, the program was divided into four main blocks: the frontal sensors block, the paraconsistent logic block, the servomotor control block and the speed and traction control block.
    // Front Sensor Block
    trigpulse_1();                          // triggers the right front sensor
    pulse_1 = pulseIn(echo_1, HIGH);        // echo pulse width in microseconds
    rt_ft_sr = pulse_1 / 58;                // obstacle distance to the right front sensor (cm)
    trigpulse_2();                          // triggers the left front sensor
    pulse_2 = pulseIn(echo_2, HIGH);        // echo pulse width in microseconds
    lt_ft_sr = pulse_2 / 58;                // obstacle distance to the left front sensor (cm)
    if (rt_ft_sr >= 50) { rt_ft_sr = 50; }  // limits the measured distance to 50 cm
    if (lt_ft_sr >= 50) { lt_ft_sr = 50; }  // limits the measured distance to 50 cm
    // Paraconsistent Logic Block
    mi = lt_ft_sr / 50.0;                   // normalisation of the favourable evidence μ
    la = 1 - (rt_ft_sr * 0.02);             // normalisation of the contrary evidence λ
    deg_unc = (mi + la) - 1;                // degree of uncertainty
    deg_cer = mi - la;                      // degree of certainty
    // Servomotor Control Block
    sv_set_pt = 538.42 * deg_unc + 551.5;       // set point of the servomotor
    ser_pos = map(sv_set_pt, 0, 1023, 0, 180);  // maps the set point to a servo angle (0 to 180°)
    // Speed and Traction Control Block
    pwm_set_mt = deg_cer * 105 + 150;       // PWM value of the traction motors
    analogWrite(rt_trc_mt, pwm_set_mt);     // controls right motor traction
    analogWrite(lt_trc_mt, pwm_set_mt);     // controls left motor traction
    if (deg_cer > -0.9) {
      digitalWrite(in1_mot_dir, HIGH);      // traction motors drive forward
      digitalWrite(in2_mot_dir, LOW);
      digitalWrite(in3_mot_esq, HIGH);
      digitalWrite(in4_mot_esq, LOW);
    } else if (deg_cer …
… TT ∧ yt < TT)
0, (lt > TT ∧ yt > TT) ∨ (lt < TT ∧ yt < TT)    (5.1)
S. Selitskiy
The input of the meta-learning supervisor ANN was built from the softmax activations of the ensemble of the underlying CNN models. The algorithm for building the "uncertainty shape descriptor" (USD) can be summarized as follows: sort the softmax activations inside each model vector, order the model vectors by their highest softmax activation, flatten the list of vectors, and rearrange the activations in each vector to follow the order of activations in the vector with the highest softmax activation. Examples of the descriptor for M = 7 CNN models in the underlying face recognition (FR) or facial expression recognition (FER) ensemble (M is the number of models in the ensemble), for the cases when none of the models, 4 models and 6 models detected the face correctly, are presented in Fig. 5.2. It can be seen that the shapes of the distributions of the softmax activations are quite distinct and can therefore be the subject of the pattern recognition task performed by the meta-learning supervisor ANN. However, unlike in the publication mentioned above, for simplicity the supervisor ANN does not categorize the predicted number of correct members of the underlying ensemble but instead performs a regression task. At a high level (ANN layer details are given in Sect. 5.4), the transformation can be written as Eq. 5.2, where n = |C| · M is the dimensionality of every USD ∈ X, |C| is the cardinality of the set of FR or FER categories (subjects or emotions) and M is the size of the CNN ensemble (Fig. 5.1).
$$\mathrm{reg}: X \subset \mathbb{R}^n \to Y \subset \mathbb{R} \tag{5.2}$$
where ∀x ∈ X, x ∈ (0 … 1)^n and ∀y ∈ Y, E(y) ∈ [0 … M].
Fig. 5.1 Meta-learning supervisor ANN over underlying CNN ensemble
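The construction of the descriptor can be sketched in Python under one possible reading of the description above: the model vectors are ordered by their highest activation, and every vector is then rearranged to the class order that sorts the most confident model's activations in descending order. This reading, and the function name, are assumptions made for illustration; the author's own implementation may differ.

```python
def uncertainty_shape_descriptor(softmax_vectors):
    """Build a flat USD from M softmax vectors of |C| activations each."""
    # Order the model vectors by their highest activation, most confident first.
    models = sorted(softmax_vectors, key=max, reverse=True)
    # Class order that sorts the most confident model's activations, descending.
    top = models[0]
    class_order = sorted(range(len(top)), key=lambda c: top[c], reverse=True)
    # Rearrange every vector to that class order, then flatten.
    return [vec[c] for vec in models for c in class_order]


# Toy ensemble with M = 2 models and |C| = 3 categories.
usd = uncertainty_shape_descriptor([[0.6, 0.3, 0.1], [0.1, 0.7, 0.2]])
```

The resulting list has length n = |C| · M, which matches the input dimensionality stated for Eq. 5.2.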
5 Elements of Continuous Reassessment and Uncertainty Self-awareness …
Fig. 5.2 Examples of the uncertainty shape descriptors (from left to right) for 0, 4, and 6 correct FER predictions by the 7-model CNN ensemble
The loss function used for y is the usual one for regression tasks, the sum of squared errors
$$SSE_y = \sum_{t=1}^{N_{mb}} (y_t - e_t)^2,$$
where e is the label (the actual number of members of the CNN ensemble with a correct prediction) and N_mb is the mini-batch size. From the point of view of trustworthiness categorization and the ensemble vote, the high-level transformation of the combined CNN ensemble together with the meta-learning supervisor ANN can be represented as Eq. 5.3:
$$\mathrm{cat}: I \subset \mathbb{I}^l \to C \times B \subset \mathbb{C} \times \mathbb{B} \tag{5.3}$$
where i are images, l is the image size, c are classifications and b are binary trustworthiness flags, such that ∀i ∈ I, i ∈ (0 … 255)^l; ∀c ∈ C, c ∈ {c_1, …, c_|C|}; ∀b ∈ B, b ∈ {1, 0}.
$$b_i = \begin{cases} 1, & y_i > TT_t \\ 0, & y_i < TT_t \end{cases} \tag{5.4}$$
where i is the index of the image at the moment t of the state of the loss function memory.
$$c_i = \operatorname{argmin}\left(|y_i - e_i(c_i)|\right) \tag{5.5}$$
The equations above describe the ensemble vote that chooses the category c_i whose number of votes e_i is closest to the predicted regression number y_i.
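The vote of Eqs. 5.4 and 5.5 can be sketched as follows. This is a minimal illustration, not the author's code: the function name is hypothetical, only categories that received at least one vote are considered, and the case y = TT, which Eq. 5.4 leaves open, is treated here as not trusted.

```python
from collections import Counter


def ensemble_vote(model_predictions, y_pred, trust_threshold):
    """Return (category, trusted flag) following Eqs. 5.4 and 5.5."""
    votes = Counter(model_predictions)  # e(c): number of models voting for c
    # Eq. 5.5: category whose vote count is closest to the predicted number y.
    category = min(votes, key=lambda c: abs(y_pred - votes[c]))
    # Eq. 5.4: trusted when the prediction exceeds the threshold TT.
    trusted = 1 if y_pred > trust_threshold else 0
    return category, trusted


# Seven models: four vote "anger", two "fear", one "joy".
predictions = ["anger"] * 4 + ["fear"] * 2 + ["joy"]
```

With these votes, a supervisor prediction of y = 4.1 selects "anger", whereas y = 2.2 selects "fear", because two votes are closest to 2.2.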
5.3 Data Set
The BookClub artistic makeup data set contains images of E = |C| = 21 subjects. Each subject's data may contain photo-session series of photos with no makeup, with various makeup, and with other obstacles for facial recognition, such as wigs, glasses, jewellery, face masks, or various headdresses. The data set features 37 photo sessions without makeup or occlusions, 40 makeup sessions, and 17 sessions with occlusions. Each photo session contains circa 168 JPEG images of 1072 × 712 resolution, covering six basic emotional expressions (sadness, happiness, surprise, fear, anger, and disgust), a neutral expression, and closed eyes, taken with seven head rotations at three exposure times on an off-white background. The subjects' ages vary from the twenties to the sixties. The subjects are predominantly Caucasian, with some Asian. Gender is approximately evenly split between sessions. The photos were taken over two months, and several subjects posed at multiple sessions over several weeks in various clothing and with changed hairstyles. The data set is downloadable from https://data.mendeley.com/datasets/yfx9h649wz/3. All subjects gave written consent to the use of their anonymous images in public scientific research.
5.4 Experiments
The experiments were run on the Linux (Ubuntu 20.04.3 LTS) operating system with two dual Tesla K80 GPUs (with 2 × 12 GB GDDR5 memory each) and one QuadroPro K6000 (also with 12 GB GDDR5 memory), an X299 chipset motherboard, 256 GB DDR4 RAM, and an i9-10900X CPU. The experiments were run using MATLAB 2022a with Deep Learning Toolbox. For the FR and FER experiments, the Inception v.3 CNN model was used. Compared with the other SOTA models applied to FR and FER tasks on the BookClub data set (AlexNet, GoogLeNet, ResNet50, and Inception-ResNet v.2), Inception v.3 demonstrated overall the best results on accuracy metrics such as trusted accuracy, precision, and recall [14, 15]. Therefore, the Inception v.3 model, which contains 315 elementary layers, was used as the underlying CNN. Its last two layers were resized to match the number of classes in the BookClub data set (21) and re-trained using the "Adam" learning algorithm with a 0.001 initial learning coefficient, a "piecewise" learning rate drop schedule with a drop interval of 5 iterations and a 0.9 drop coefficient, a mini-batch size of 128, and 10 epochs, to ensure at least 95% learning accuracy. The Inception v.3 CNN models were used as part of an ensemble of M = 7 models trained in parallel. The meta-learning supervisor ANN models were trained using the "Adam" learning algorithm with a 0.01 initial learning coefficient, a mini-batch size of 64, and 200 epochs. For the online learning experiments, naturally, the batch size was set to 1, as each consecutive prediction was used to update the meta-learning model parameters. The length of the memory buffer, which collects statistics about previous training iterations, was set to K = 8192. The reg meta-learning supervisor ANN transformation represented in Eq. 5.2 is implemented with two hidden layers, with n + 1 and 2n + 1 neurons in the first and second hidden layer respectively, and the ReLU activation function.
All source code and detailed results are publicly available on GitHub (https://github.com/Selitskiy/StatLoss).
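To make the layer sizes of the supervisor network concrete, the following Python sketch computes them for the FR case (|C| = 21 subjects, M = 7 models). The fully connected layout and the single regression output neuron are assumptions made for this illustration; the function names are hypothetical.

```python
def supervisor_layer_sizes(num_classes: int, num_models: int):
    """Layer widths of the reg supervisor: input n, hidden n+1 and 2n+1, output 1."""
    n = num_classes * num_models      # USD dimensionality, n = |C| * M
    return [n, n + 1, 2 * n + 1, 1]   # single regression output is an assumption


def parameter_count(sizes):
    """Weights plus biases of a fully connected network with these widths."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


fr_sizes = supervisor_layer_sizes(num_classes=21, num_models=7)
fr_params = parameter_count(fr_sizes)
```

Under these assumptions the FR supervisor has an input of 147 activations and hidden layers of 148 and 295 neurons.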
5.4.1 Trusted Accuracy Metrics
Suppose only the classification verdict is used as the final result of the ANN model. In that case, the accuracy of the target CNN model can be calculated only as the ratio of the number of test images correctly identified by the CNN model to the number of all test images:
$$\mathrm{Accuracy} = \frac{N_{correct}}{N_{all}} \tag{5.6}$$
When an additional dimension is used in classification, for example the amending verdict of the meta-learning supervisor ANN (see Eq. 5.3), and cat(i) = c × b, where ∀i ∈ I, ∀c × b ∈ C × B = {(c_1, b_1), …, (c_|C|, b_|C|)}, B = {True, False}, then the trusted accuracy and other trusted quality metrics can be calculated as:
$$\mathrm{Accuracy}_t = \frac{N_{correct:f=T} + N_{wrong:f \neq T}}{N_{all}} \tag{5.7}$$
In more usual notation, N_correct:f=T is the True Positive (TP) number, N_wrong:f≠T the True Negative (TN), N_wrong:f=T the False Positive (FP), and N_correct:f≠T the False Negative (FN). Analogously to the trusted accuracy, metrics such as precision, recall, specificity, and F1 score were used for the models' evaluation.
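The trusted metrics can be computed from the four counts with a small Python sketch. The standard definitions of precision, recall, specificity and F1 score are used; the function name and the example counts are hypothetical, and the sketch does not guard against empty denominators.

```python
def trusted_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """tp: correct and trusted, tn: wrong and untrusted,
    fp: wrong but trusted, fn: correct but untrusted."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,  # trusted accuracy, Eq. 5.7
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts, for illustration only.
m = trusted_metrics(tp=40, tn=30, fp=10, fn=20)
```

Note that a wrong verdict flagged as untrusted counts in favour of the trusted accuracy, which is why it can exceed the plain accuracy of Eq. 5.6.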
5.5 Results
Results of the FER experiments are presented in Table 5.1 (FR results are similar, but with a smaller difference between untrusted and trusted metrics). The first column holds the accuracy metrics obtained with the ensemble's maximum vote. The second column uses the ensemble vote closest to the meta-learning supervisor ANN prediction and the trustworthiness threshold learned only on the test set (see Eqs. 5.4 and 5.5). The next two columns contain the results of the online learning experiments: the first of them for online learning on the randomized test data, and the last for online learning on the images grouped by photo session, i.e. groups of the same person and same makeup or occlusion, but with different lighting, head position, and emotion expression (see also Fig. 5.3). Figure 5.4 shows the relationship between the average session trusted threshold and the session-specific trusted recognition accuracy for the FR and FER cases of the grouped test sessions.
Table 5.1 Accuracy metrics for FER task
Metric | Maximal | Predicted | Pred. online | Pred. grouped
Untrusted accuracy | 0.39425 | 0.35380 | 0.35380 | 0.35380
Trusted accuracy | 0.68827 | 0.73339 | 0.64791 | 0.73303
Trusted precision | 0.62136 | 0.63510 | 0.35043 | 0.66043
Trusted recall | 0.53580 | 0.57927 | 0.64294 | 0.75818
Trusted F1 score | 0.57542 | 0.60590 | 0.45362 | 0.70594
Trusted specificity | 0.78751 | 0.81778 | 0.64937 | 0.71462
Columns: maximal ensemble vote, meta-learning predicted vote, meta-learning with random online re-training vote, and meta-learning with session-grouped online re-training vote
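As a consistency check, the F1 scores in Table 5.1 can be recomputed from the reported trusted precision and recall with the standard harmonic-mean formula. The sketch below only re-uses the numbers of the table; the variable names are hypothetical.

```python
# (trusted precision, trusted recall, reported trusted F1) per column of Table 5.1
table_5_1 = {
    "Maximal": (0.62136, 0.53580, 0.57542),
    "Predicted": (0.63510, 0.57927, 0.60590),
    "Pred. online": (0.35043, 0.64294, 0.45362),
    "Pred. grouped": (0.66043, 0.75818, 0.70594),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest absolute difference between recomputed and reported F1 scores.
max_gap = max(abs(f1(p, r) - reported) for p, r, reported in table_5_1.values())
```

The recomputed values agree with the reported ones to within the rounding of the five-decimal inputs.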
Fig. 5.3 Trusted threshold learned during the training phase (blue, dashed line), online learning changes for grouped test images (green), and shuffled test images (red). FR—left and FER—right
5.6 Discussion, Conclusions, and Future Work
Computational experiments were performed with a CNN ensemble based on the Inception v.3 architecture and a data set with significant out-of-training data distribution in the form of makeup and occlusions. A meta-learning supervisor ANN was used as an instrument of the model's self-awareness about the uncertainty and trustworthiness of its predictions. The results demonstrate a noticeable increase in the accuracy metrics for the FR task (by tens of per cent) and a significant one (a doubling) for the FER task. The proposed novel "loss layer with memory" architecture, without online re-training, increases key accuracy metrics by up to an additional 5 percentage points. The trustworthiness threshold learned using the "loss layer with memory" explains why the prediction for a given image was categorized as trusted or non-trusted. However, prima facie, online re-training of the meta-learning supervisor ANN after each tested image (while the underlying CNN stays unchanged) demonstrates poorer performance on most accuracy metrics except recall. Improving the online learning algorithms will therefore be part of future work. Still, what is fascinating is that the dynamically adjusted trustworthiness threshold informs the model not only about its uncertainty but also about the quality of the test session. For example, in Fig. 5.5 it can be seen that the low-threshold session has a poorly performing subject who struggles to play the anger expression, whereas in the high-threshold session the facial expression is much more apparent.
Fig. 5.4 Trusted accuracy against trusted threshold for grouped test images. FR: left; FER: right
Fig. 5.5 Examples of images for FER (anger expression) with a low trusted threshold (bad acting), left, and a high trusted threshold (better acting), right
References
1. Post|LinkedIn: https://www.linkedin.com/posts/yann-lecun_i-think-the-phrase-agi-should-beretired-activity-6889610518529613824-gl2F/?utm_source=linkedin_share&utm_medium=member_desktop_web (Online; accessed 11 Apr 2022)
2. Abu-Mostafa, Y.S.: Learning from hints in neural networks. J. Complex. 6(2), 192–198 (1990)
3. Andrychowicz, M., Denil, M., Colmenarejo, S.G., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B., de Freitas, N.: Learning to learn by gradient descent by gradient descent. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 3988–3996. NIPS'16, Curran Associates Inc., Red Hook, NY, USA (2016)
4. Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 24. Curran Associates, Inc. (2011), https://proceedings.neurips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf
5. Cacioppo, J.T., Berntson, G.G., Larsen, J.T., Poehlmann, K.M., Ito, T.A., et al.: The psychophysiology of emotion. Handbook Emot. 2(01), 2000 (2000)
6. Chomsky, N.: Powers and Prospects: Reflections on Human Nature and the Social Order. South End Press (1996)
7. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17(2), 124 (1971)
8. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 1126–1135. PMLR (06–11 Aug 2017), http://proceedings.mlr.press/v70/finn17a.html
9. Lake, B.M., Ullman, T.D., Tenenbaum, J.B., Gershman, S.J.: Building machines that learn and think like people. Behav. Brain Sci. 40, e253 (2017). https://doi.org/10.1017/S0140525X16001837
10. Liu, X., Wang, X., Matwin, S.: Interpretable deep convolutional neural networks via meta-learning. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–9 (2018). https://doi.org/10.1109/IJCNN.2018.8489172
11. McCarthy, J., Minsky, M.L., Rochester, N., Shannon, C.E.: A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Mag. 27(4), 12–12 (2006)
12. Ram, R., Müller, S., Pfreundt, F., Gauger, N., Keuper, J.: Scalable hyperparameter optimization with lazy Gaussian processes. In: 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), pp. 56–65 (2019)
13. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1842–1850. PMLR, New York, New York, USA (20–22 Jun 2016)
14. Selitskiy, S., Christou, N., Selitskaya, N.: Isolating Uncertainty of the Face Expression Recognition with the Meta-Learning Supervisor Neural Network, pp. 104–112. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3480433.3480447
15. Selitskiy, S., Christou, N., Selitskaya, N.: Using statistical and artificial neural networks meta-learning approaches for uncertainty isolation in face recognition by the established convolutional models. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Jansen, G., Pardalos, P.M., Giuffrida, G., Umeton, R. (eds.) Machine Learning, Optimization, and Data Science, pp. 338–352. Springer International Publishing, Cham (2022)
16. Thrun, S.: Is learning the n-th thing any easier than learning the first? Adv. Neural Inf. Process. Syst. 8 (1995)
17. Thrun, S., Mitchell, T.M.: Lifelong robot learning. Robot. Auton. Syst. 15(1–2), 25–46 (1995)
18. Thrun, S., Pratt, L.: Learning To Learn. Springer, Boston, MA (1998). https://doi.org/10.1007/978-1-4615-5529-2
19. Turing, A.M.: I.—Computing machinery and intelligence. Mind LIX(236), 433–460 (1950). https://doi.org/10.1093/mind/LIX.236.433
Chapter 6
Topic-Aware Networks for Answer Selection
Jiaheng Zhang and Kezhi Mao
Abstract Answer selection is an essential task in natural language processing and is involved in many applications, such as dialogue systems and reading comprehension. It is the task of selecting the correct answer to a given question from a set of candidates. One of the challenges of this task is that traditional deep learning models for answer selection lack real-world background knowledge, which is crucial for answering questions in real-world applications. In this paper, we propose a set of deep learning networks that enhance traditional answer selection models with topic modeling, so that topic models serve as external knowledge for the baseline models and improve their performance. Our topic-aware networks (TANs) are specially designed for the answer selection task. We propose a novel method to generate topic embeddings for questions and answers separately. We design two kinds of TAN models and evaluate them on two commonly used answer selection datasets. The results verify the advantages of TAN in improving the performance of traditional answer selection deep learning models.
J. Zhang · K. Mao (B)
School of Electrical and Electronic, Nanyang Technological University, Singapore 639798, Singapore
e-mail: [email protected]
J. Zhang
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_6

6.1 Introduction

Answer selection is an essential part of modern dialogue systems. Dialogue systems, also known as chat systems or chatbots, are widely used in our daily life. Some famous mobile applications, such as Apple’s Siri and Google Assistant, are dialogue systems. People receive answers or responses from dialogue systems when they ask a question. To answer people’s questions correctly, especially those with relatively standard answers, some dialogue systems first use algorithms to select a set of possible and relevant answers from numerous and varied resources, then use answer selection algorithms to sort the candidate answers, and finally send the most likely answer to the user. Thus, improving the performance of answer selection is crucial for dialogue systems. As shown in Table 6.1, given a question and a set of candidate answers, the goal of answer selection is to find the correct and best answer to that question among those candidates.

Table 6.1 Sample of answer selection task

| Answer selection sample | Label |
| --- | --- |
| How are glacier caves formed? | – |
| A partly submerged glacier cave on Perito Moreno glacier | 0 |
| The ice facade is approximately 60 m high | 0 |
| Ice formations in the Titlis glacier cave | 0 |
| A glacier cave is a cave formed within the ice of a glacier | 1 |

Various frameworks and methods have been proposed for answer selection. Examples include rule-based systems, which focus on feature engineering with human-summarized rules, and deep learning models, which have been more widely adopted recently and use networks such as convolutional neural networks, recurrent neural networks, or attention-based networks to extract matching features automatically. However, traditional deep learning models are purely data driven and feature driven: they not only face overfitting problems but also lack real-world background information and information beyond the features in the local context. To address these issues, knowledge-based methods have been proposed, which use external knowledge to compensate for this weakness of traditional deep learning models. In this chapter, we propose specially designed topic-aware networks that enhance the performance of traditional deep learning models by using topic embeddings as external knowledge references. Topic modeling is a traditional machine learning technique used to model the generation process of a set of documents. With topic modeling, each word in a document can be assigned a latent topic. Topic modeling is a good tool for understanding the nature of the documents. For each text in the document, the latent topic tags of this text are a kind of external knowledge from a document-level point of view.
Fig. 6.1 Topic embedding generation process

Topic embeddings, which are numerical representations of latent topic tags, are proposed to make topic modeling convenient to use in deep learning models. As shown in Fig. 6.1, we use the Skip-gram technique to generate topic embeddings. Each word in a text is assigned two tokens: the word token w_i and the topic tag token z_i. These tokens are the basic inputs to our proposed frameworks. Our work is inspired by the following considerations. Firstly, we think intuitively that the correct answer to a question should normally fall under the same topic as the question. For example, if the question asks for time-related information, it may begin with “When…” or “What time…,” and the answer may contain text relating to time, such as “in the morning” or “at 8 am.” The questions
and the correct answers normally contain related words. By taking the latent topics of the answers and questions into consideration, we can restrict the selection of answers and thereby further improve the generalization of the model. Secondly, topic models are based on document-level information, which reveals the latent topics underlying the target documents, whereas traditional deep learning models normally focus on discovering local features that can be used to classify texts. Topic models can help us understand how the texts are generated. The output of a topic model, which is a set of topics that form the documents and lists of words that describe each topic, is in effect a knowledge base for the dataset. Motivated by these considerations, we propose topic-aware networks for answer selection (TNAS), which integrate topic models into answer selection architectures by using topic embeddings as external knowledge for baseline deep learning models. As shown in Fig. 6.2, compared with traditional deep learning models for answer selection, TNAS has an additional topic embedding module during the training stage. The topic-aware module generates topic embeddings for both questions and answers. This topic embedding layer helps determine the similarity between the question and the answer from the topic point of view. Finally, we generate topic-aware vector representations, concatenate them with the baseline deep learning text representations for both questions and answers, and use their cosine distance as the scoring function for calculating the probability that the answer is a correct candidate. To evaluate our model, we conduct experiments on two popular answer selection datasets in natural language processing. The results show that our model improves the performance of the baseline deep learning models.
Fig. 6.2 (a) Traditional deep learning framework for answer selection; (b) our model

The main contributions of our work are summarized in four parts as follows:
• We propose an efficient way to generate topic embeddings that can be easily integrated into the architectures of baseline deep learning models.
• We propose to incorporate topic embeddings as external knowledge into baseline deep learning models for answer selection tasks by applying the LDA algorithm to both questions and answers.
• We propose two networks specially designed for answer selection tasks that incorporate topic information into baseline deep learning models to automatically match the topics of questions and answers.
• We propose to use external databases with similar contexts when training topic embeddings for our topic-aware networks, to further improve the performance of our network.
6.2 Related Works

6.2.1 Topic Embedding Generation

To better extract semantic knowledge from texts for downstream NLP tasks, various topic models have been introduced for generating topic embeddings. One influential and classic work is latent semantic indexing (LSI) [1], which uses singular value decomposition (SVD), a linear algebra method, to map latent topics. Subsequently, various methods for generating topic embeddings have been proposed on top of LSI. Among them is latent Dirichlet allocation (LDA) [2], a Bayesian probabilistic model that generates document-topic and word-topic distributions using Dirichlet priors [3]. Compared with earlier approaches such as LSI, LDA is more effective thanks to its ability to capture the hidden semantic structure within a given text through correlated words [4]. LDA uses Dirichlet priors to estimate the document-topic density and the topic-word density, which improves its efficacy in topic embedding generation. Thanks to its superior performance, LDA has become one of the most commonly used approaches for topic embedding generation. In this work, we adopt LDA to generate topic embeddings as an external knowledge base, which brings significant improvement to the results of answer selection.
6.2.2 Answer Selection

Answer selection has received increasing research attention thanks to its applications in areas such as dialogue systems. A typical answer selection model requires an understanding of both the question and the candidate answer texts [5]. Earlier answer selection models typically relied on human-summarized rules with linguistic tools, feature engineering, and external resources. Specifically, Wang and Manning [6] apply a tree-edit operation mechanism to dependency parse trees; Severyn and Moschitti [7] employ an SVM [8] with tree kernels to fuse feature engineering over parse trees for feature extraction; and Yih et al. [10] use lexical semantic features obtained from WordNet [9] to further improve answer selection. More recently, deep networks such as CNNs [11, 12] and RNNs [11, 13, 14] have brought significant performance gains in various NLP tasks [15]. Deep learning-based approaches have also become predominant in answer selection thanks to their better performance. Among them, Yu et al. [16] transform the answer selection task into a binary classification problem [17], in which candidate sentences are ranked based on the cross-entropy loss of the question-candidate pairs, and construct a Siamese-structured bag-of-words model. Subsequently, QA-LSTM [18] was proposed, which employs a bidirectional LSTM [19, 20] network to construct sentence representations of questions and candidate answers independently, while a CNN is used in [21] as the backbone to generate sentence representations. Further, HyperQA [22] models the relationship between the question and candidate answers in hyperbolic space [23] instead of Euclidean space. More recently, following the success of the transformer [24] in a variety of NLP tasks [25, 26], it has also been introduced to the answer selection task.
More specifically, TANDA [27] transfers a pre-trained model into a model specialized for answer selection through fine-tuning on a large, high-quality dataset, improving the stability of transformers for answer selection, while Matsubara et al. [28] improve the efficiency of transformers by reducing the number of candidate sentences through neural re-rankers. Despite the impressive progress made by deep learning-based approaches to answer selection, these methods neglect the importance of topics. In this work, we propose to incorporate topic embeddings as external knowledge into baseline deep learning models for answer selection and demonstrate the effectiveness of doing so.
6.3 Methodology

We present the detailed implementation of our model in this section. The overall architecture of our proposed model is shown in Fig. 6.2. Our model is a multichannel deep learning model that is trained in stages. Firstly, we use word embedding techniques to generate topic embeddings as our external
knowledge base for the next stage. Secondly, we set up our topic-aware network for answer selection. We propose two main topic-aware network architectures based on traditional answer selection architectures. Lastly, we use the triplet loss as the objective function in the final training stage of our model.
6.3.1 Topic Embedding

To generate topic embeddings as external knowledge, we first need to train a topic model on the target documents. Then, we use the topic model to label the input texts and obtain topic token sequences. As shown in Fig. 6.1, we use the Skip-gram algorithm to train our topic embeddings. Training word embeddings requires word token sequences, and the same holds for topic embeddings. To get topic tokens, we first apply the latent Dirichlet allocation (LDA) algorithm to obtain a topic model using the Gensim training tools, and then use the results of LDA, which are a set of topics and a set of words under each topic, to assign each word w_i a latent topic z_i. The objective function of the training process for these topic tokens is shown below.

$$L(D) = \frac{1}{M} \sum_{i=1}^{M} \sum_{-k \le c \le k,\; c \ne 0} \log \Pr(w_{i+c}, z_{i+c} \mid z_i) \tag{6.1}$$
where Pr(·) is the probability computed with the softmax function, M is the number of tokens in the training sequence, and k is the size of the context window. The idea behind this objective is to use each topic token as a pseudo-word token to predict the words and topics around it. We aim to encode not only the word information but also the topic information into the topic embeddings.
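The double sum in Eq. (6.1) runs over every position i and every context offset c within the window. A minimal sketch of how the corresponding training pairs could be enumerated is given below. The function name `topic_skipgram_pairs` and the toy topic ids are hypothetical and only illustrate the indexing; this is not the authors' Gensim pipeline.

```python
def topic_skipgram_pairs(words, topics, k):
    """Enumerate (z_i, (w_{i+c}, z_{i+c})) pairs for -k <= c <= k, c != 0."""
    if len(words) != len(topics):
        raise ValueError("each word needs exactly one topic tag")
    pairs = []
    for i, center_topic in enumerate(topics):
        for c in range(-k, k + 1):
            j = i + c
            if c == 0 or j < 0 or j >= len(words):
                continue  # skip the center itself and out-of-range positions
            pairs.append((center_topic, (words[j], topics[j])))
    return pairs


# Toy input: the topic ids are made up for illustration.
words = ["glacier", "cave", "ice"]
topics = [3, 3, 7]
pairs = topic_skipgram_pairs(words, topics, k=1)
```

Each pair holds a center topic token and one (word, topic) context that the Skip-gram model would be asked to predict from it.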
6.3.2 Topic-Aware Networks

After we generate the topic embeddings, we can use them as external knowledge for our deep learning architectures. We propose two main kinds of architecture, which give four network designs for topic-aware answer selection. The first kind uses shared encoder weights: the encoders for questions and answers are trained together, and the weights are shared. The second kind uses non-shared encoder weights: the encoders are trained separately for questions and answers. Applying either choice to the text channel and to the topic channel yields the four designs TAN1 to TAN4, shown in Figs. 6.3, 6.4, 6.5, and 6.6. Each input text sequence is first separated into two sequences, one of the original text tokens and the other of topic tokens.

Fig. 6.3 TAN3: Non-shared encoders for text and shared encoders for topic

Fig. 6.4 TAN4: Shared encoders for text and non-shared encoders for topic

TAN1: Non-shared encoders for both text and topic tokens. As shown in Fig. 6.5, question texts and answer texts, which are transformed into text tokens and topic tokens, are used as the inputs to the word embedding layers and the topic embedding layers. After getting the numerical representations for the input tokens,
the outputs of each embedding layer are then processed with non-shared encoders, so that each encoder is trained separately with its own weights.

TAN2: Shared encoders for both text and topic tokens. As shown in Fig. 6.6, in contrast to TAN1, the encoders of both the text channel and the topic channel are shared in TAN2.

TAN3: Non-shared encoders for text and shared encoders for topic. As shown in Fig. 6.3, in contrast to TAN1 and TAN2, this architecture uses non-shared encoders for the text token embeddings and shared encoders for the topic token embeddings.
Fig. 6.5 TAN1: Non-shared encoders for both text and topic tokens

Fig. 6.6 TAN2: Shared encoders for both text and topic tokens

TAN4: Shared encoders for text and non-shared encoders for topic. As shown in Fig. 6.4, similar to TAN3, this is a mixed architecture that uses shared encoders for the text token embeddings and non-shared encoders for the topic token embeddings.
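The four designs differ only in whether the question and answer sides reuse the same encoder in each channel. The sketch below makes that difference concrete under simplifying assumptions: `ToyEncoder` is a hypothetical stand-in for the CNN encoder, and `TAN_VARIANTS`, `build_encoders`, and `represent` are illustrative names that do not come from the chapter. Sharing is modeled by handing the same encoder object to both sides.

```python
import random


class ToyEncoder:
    """Stand-in for a CNN encoder: one random weight per input dimension."""

    def __init__(self, dim, seed):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def encode(self, vector):
        # Element-wise scaling; a real encoder would be a trained network.
        return [w * x for w, x in zip(self.weights, vector)]


# (text encoders shared?, topic encoders shared?) for each design
TAN_VARIANTS = {
    "TAN1": (False, False),
    "TAN2": (True, True),
    "TAN3": (False, True),
    "TAN4": (True, False),
}


def build_encoders(variant, dim):
    """Return the encoders for the question/answer text and topic channels."""
    share_text, share_topic = TAN_VARIANTS[variant]
    q_text = ToyEncoder(dim, seed=1)
    a_text = q_text if share_text else ToyEncoder(dim, seed=2)
    q_topic = ToyEncoder(dim, seed=3)
    a_topic = q_topic if share_topic else ToyEncoder(dim, seed=4)
    return {"q_text": q_text, "a_text": a_text,
            "q_topic": q_topic, "a_topic": a_topic}


def represent(text_vec, topic_vec, text_enc, topic_enc):
    """Concatenate the text-channel and topic-channel representations."""
    return text_enc.encode(text_vec) + topic_enc.encode(topic_vec)


enc = build_encoders("TAN3", dim=2)
question_repr = represent([1.0, 2.0], [0.5, 0.5], enc["q_text"], enc["q_topic"])
```

With shared encoders, questions and answers are mapped by the same weights, so the number of encoder parameters in that channel is halved.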
Table 6.2 Statistics of the questions

| Dataset | Train | Dev | Test | Total |
| --- | --- | --- | --- | --- |
| WikiQA | 2118 | 296 | 633 | 3047 |
| TrecQA | 1229 | 82 | 100 | 1411 |

6.3.3 Triplet Loss

For all the networks we propose, we adopt the same training and testing mechanism. We use the triplet loss in our model. During the training stage, for each question text Q, besides its ground-truth answer A+, we randomly pair a negative answer A− with it. Therefore, each input is a triplet (Q, A+, A−). Our goal is to minimize the triplet loss for the answer selection task:

$$L(Q, A^{+}, A^{-}) = \max\bigl(0,\; m + d(Q, A^{-}) - d(Q, A^{+})\bigr), \tag{6.2}$$

where m is the margin, and d(Q, A−) and d(Q, A+) are the Euclidean distances between the vector representation of the question text and those of the respective answer texts.
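The hinge objective in Eq. (6.2) can be illustrated with a few lines of code. The sketch below uses the common distance-based form, margin + d(Q, A+) − d(Q, A−), so that the loss vanishes once the correct answer is closer to the question than the negative one by at least the margin. This sign convention is an assumption: Eq. (6.2) as printed orders the two terms the other way round, which is the form normally used when d is a similarity score rather than a distance. The function names and toy vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(q, a_pos, a_neg, margin):
    """Hinge loss that is zero once a_pos is closer to q than a_neg by `margin`."""
    return max(0.0, margin + euclidean(q, a_pos) - euclidean(q, a_neg))


q = [0.0, 0.0]
a_pos = [0.0, 1.0]  # distance 1 from q
a_neg = [3.0, 4.0]  # distance 5 from q
easy = triplet_loss(q, a_pos, a_neg, margin=1.0)  # 1 + 1 - 5 < 0, clipped to 0
hard = triplet_loss(q, a_neg, a_pos, margin=1.0)  # roles swapped: 1 + 5 - 1 = 5
```

Only triplets with a positive loss produce a gradient, so training focuses on the negatives that the model still confuses with the correct answer.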
6.4 Experiments

In this section, we present the experiments and results for our proposed model. All the network architectures in this paper are implemented using Keras. We evaluate our model on two widely used answer selection datasets.
6.4.1 Dataset

The statistics of the datasets used in this paper are shown in Table 6.2. The task for both datasets is to rank the candidate answers based on their relatedness to the question. Brief descriptions of the datasets are as follows:
1. WikiQA: This is a benchmark for open-domain answer selection that was created from actual Bing and Wikipedia searches. We only use questions with at least one correct answer.
2. TrecQA: This is another answer selection dataset, which comes from the Text REtrieval Conference (TREC) QA track data.
6.4.2 Experiment Settings

To evaluate the model, we implement baseline systems for comparison. The baseline models adopt CNNs as the encoders, and their architectures are the same as TAN1 and TAN4 but without the topic-aware module. The CNN used in TAN1 to TAN4 is the same as in the baseline models. The other key settings of our models are as follows:
1. Embeddings: We use 300-dimensional GloVe vectors to initialize our word embedding layer. For the topic embeddings, we use the Gensim package to train the LDA model and use its Word2Vec implementation to generate the topic embeddings. We generate topic embeddings for questions and answers separately.
2. CNN as encoders: We use 1200 CNN filters and set all the inner dense layers to 300 dimensions. We use Keras to set up the training and testing process, with Adam as the optimizer.
3. TAN1 without Topic: The first baseline model is a traditional answer selection architecture that uses non-shared encoders for question and answer tokens.
4. TAN4 without Topic: The second baseline model uses shared encoders for question and answer tokens.
5. Evaluation Metrics: Our task is to rank the candidate answers by their correctness with respect to the question; thus, we adopt measures widely used in information retrieval and answer selection, namely mean average precision (MAP) and mean reciprocal rank (MRR), to evaluate the performance of our model.
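For readers unfamiliar with the two ranking metrics, the following sketch computes MAP and MRR from 0/1 relevance labels that are already sorted by model score. The function names and the two toy rankings are hypothetical and are not taken from the experiments.

```python
def average_precision(ranked_labels):
    """AP for one question; ranked_labels are 0/1 labels sorted by model score."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def reciprocal_rank(ranked_labels):
    """1 / rank of the first correct answer, or 0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def mean_over_questions(metric, all_ranked_labels):
    """Average a per-question metric over all questions."""
    return sum(metric(labels) for labels in all_ranked_labels) / len(all_ranked_labels)


# Two toy questions; labels are listed in the order the model ranked the candidates.
rankings = [[0, 1, 0, 1], [1, 0, 0]]
map_score = mean_over_questions(average_precision, rankings)
mrr_score = mean_over_questions(reciprocal_rank, rankings)
```

MRR rewards only the position of the first correct answer, while MAP also accounts for where the remaining correct answers appear.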
6.4.3 Results and Discussion

Table 6.3 shows the results of our models. From the results, we have the following findings. Firstly, among the baseline models, TAN4 without topic outperforms TAN1 without topic on both WikiQA and TrecQA. This indicates that shared encoders may be more suitable for answer selection tasks. This is reasonable because with shared encoders the model can compare the representations of questions and answers in the same context, whereas with non-shared encoders the model has to learn twice as many parameters to compare the representations, which is harder with limited samples. Secondly, compared with the baseline models, our models perform better in almost every setting: TAN1 beats TAN1 without topic on all four scores, and TAN4 beats TAN4 without topic on three of the four (its TrecQA MAP is 0.72 against 0.73). Among all the networks, TAN2, which adopts shared encoders for both text and topic tokens, outperforms all the other networks. TAN1 has similar performance to the best baseline model. TAN3 and TAN4 have similar performance. These findings show that, for both the baseline and our proposed models, shared encoders are more effective in pairing the right answer with the question. Overall, topic-aware modules improve the performance of the baseline models. TAN2 is the best architecture among all the architectures we have proposed.

Table 6.3 Model performance of topic-aware networks for answer selection task

| Model | WikiQA MAP | WikiQA MRR | TrecQA MAP | TrecQA MRR |
| --- | --- | --- | --- | --- |
| TAN1 without topic | 0.65 | 0.66 | 0.71 | 0.75 |
| TAN4 without topic | 0.67 | 0.68 | 0.73 | 0.77 |
| TAN1 | 0.66 | 0.67 | 0.72 | 0.79 |
| TAN2 | 0.69 | 0.70 | 0.79 | 0.80 |
| TAN3 | 0.67 | 0.68 | 0.74 | 0.76 |
| TAN4 | 0.68 | 0.69 | 0.72 | 0.78 |
6.5 Conclusion and Future Work

In this paper, we studied incorporating external knowledge into traditional deep learning models for answer selection by using specially designed networks. The proposed network is an automatic tool that extracts useful information from topic models and can be used with any deep learning baseline model. We designed the representation of external knowledge as topic embeddings. The results show that our model can improve the performance of the baseline deep learning models. Moreover, we identified the best architecture among our designed networks. For future work, we plan two improvements. First, during the training stage of topic modeling, we fixed the number of topics for the topic models; we will explore ways to decide the number of topics automatically. Second, given that there are many question type classification datasets such as TREC, we will investigate the use of transfer learning to obtain pre-trained topic embeddings from publicly available datasets and fine-tune the embeddings on the training data.
References
1. Lai, T., Bui, T., Li, S.: A review on deep learning techniques applied to answer selection. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 2132–2144 (2018)
2. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)
3. Kumari, R., Srivastava, S.K.: Machine learning: a review on binary classification. Int. J. Comput. Appl. 160(7) (2017)
4. Yih, S.W., Chang, M.-W., Meek, C., Pastusiak, A.: Question answering using enhanced lexical semantic models. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (2013)
5. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
6. Fenchel, W.: Elementary geometry in hyperbolic space. In: Elementary Geometry in Hyperbolic Space. de Gruyter (2011)
7. Tan, M., dos Santos, C., Xiang, B., Zhou, B.: LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108 (2015)
8. Yu, L., Hermann, K.M., Blunsom, P., Pulman, S.: Deep learning for answer sentence selection. arXiv preprint arXiv:1412.1632 (2014)
9. Yogatama, D., Dyer, C., Ling, W., Blunsom, P.: Generative and discriminative text classification with recurrent neural networks. arXiv preprint arXiv:1703.01898 (2017)
10. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
11. Noble, W.S.: What is a support vector machine? Nature Biotechnol. 24(12), 1565–1567 (2006)
12. Matsubara, Y., Vu, T., Moschitti, A.: Reranking for efficient transformer-based answer selection. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1577–1580 (2020)
13. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Adv. Neural Inf. Process. Syst. 28 (2015)
14. Naseem, U., Razzak, I., Musial, K., Imran, M.: Transformer based deep intelligent contextual embedding for Twitter sentiment analysis. Futur. Gener. Comput. Syst. 113, 58–69 (2020)
15. Keskar, N.S., McCann, B., Varshney, L.R., Xiong, C., Socher, R.: CTRL: a conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858 (2019)
16. Garg, S., Vu, T., Moschitti, A.: TANDA: transfer and adapt pre-trained transformer models for answer sentence selection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7780–7788 (2020)
17. Severyn, A., Moschitti, A.: Automatic feature engineering for answer selection and extraction. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 458–467 (2013)
18. Melamud, O., Goldberger, J., Dagan, I.: Context2vec: learning generic context embedding with bidirectional LSTM. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61 (2016)
19. Likhitha, S., Harish, B.S., Keerthi Kumar, H.M.: A detailed survey on topic modeling for document and short text data. Int. J. Comput. Appl. 178(39), 1–9 (2019)
20. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101 (2016)
21. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
22. Severyn, A., Moschitti, A.: Learning to rank short text pairs with convolutional deep neural networks. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 373–382 (2015)
23. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(1), 993–1022 (2003)
24. Tay, Y., Tuan, L.A., Hui, S.C.: Hyperbolic representation learning for fast and efficient neural question answering. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 583–591 (2018)
25. Wang, M., Manning, C.D.: Probabilistic tree-edit models with structured latent variables for textual entailment and question answering. In: Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 1164–1172 (2010)
26. Yin, W., Kann, K., Yu, M., Schütze, H.: Comparative study of CNN and RNN for natural language processing. arXiv preprint arXiv:1702.01923 (2017)
27. Sethuraman, J.: A constructive definition of Dirichlet priors. Statistica Sinica, pp. 639–650 (1994)
28. Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: Twenty-ninth AAAI Conference on Artificial Intelligence (2015)
Chapter 7
Design and Implementation of Multi-scene Immersive Ancient Style Interaction System Based on Unreal Engine Platform

Sirui Yang, Qing Qing, Xiaoyue Sun, and Huaqun Liu

Abstract Built around the storyline of flying and searching for Kongming lanterns, this project combines a novel and vivid interactive system with a traditional roaming system to create a multi-scene, immersive virtual world. The project uses the UE4 engine and the 3ds Max modeling software to build the museum scene and the virtual ancient scene. After modeling is completed in 3ds Max, the models are imported into UE4, where colliders and plants are added. The scene is continually optimized with reference to the street layout of the Tang Dynasty and then tested so that the interaction and the scene match and merge well. Sequencer is used to record cutscenes, Blueprints connect animation with character operation, and particle systems are interspersed to realize scene roaming and the interaction with the Kongming lanterns. The project combines various technologies in Unreal Engine, breaks away from the monotonous experience of traditional roaming systems, and reproduces the magnificent ancient city, inviting users to keep an appointment with thousands of lights.
S. Yang · Q. Qing · X. Sun · H. Liu (B)
Beijing Institute of Graphic Communication, Beijing 102600, China
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_7

7.1 Introduction

With the rapid development of science and technology, virtual reality has penetrated into all aspects of people’s lives as it has gradually moved from theory to industrialization. It is loved by users for its immersion, interaction, and imagination. Virtual reality, as its name implies, is the combination of the virtual and the real. In theory, virtual reality (VR) technology is a kind of computer simulation system with which a virtual world can be created and experienced. It uses the computer to generate a simulated environment and immerses the user in it. With the help of 3D modeling, realistic real-time rendering, collision detection, and other key virtual reality technologies, the expressiveness and atmosphere of the picture can be improved for the user.
In the virtual reality world, the most important features are the senses of “realism” and “interaction.” Participants feel as if they are in a real environment: the surroundings and figures appear as they would in reality, and the various objects and phenomena interact with one another. Objects and their characteristics develop and change according to natural laws, and people in the environment have sensations such as vision, hearing, touch, motion, taste, and smell. Virtual reality technology can create all kinds of fabulous artificial environments that are vivid and immersive, and users can interact with the virtual environment to the point where the virtual is hard to tell from the real. Based on these characteristics and theoretical foundations of virtual reality technology, this project designs a multi-scene, strongly immersive, ancient-style interactive system. Through multi-scene interaction, the project presents a beautiful story of travel between ancient and modern times. A tourist in a museum is immersed in the artistic atmosphere; looking at the ancient paintings, he imagines the charm of the Tang Dynasty. In a trance, he seems to be in the Tang Dynasty, and suddenly he is pulled into a crack in space–time. When he opens his eyes, he has become an armored general of the Tang Dynasty. The story is designed to show that ancient cultures can still move people. People’s living conditions and ideas are constantly changing, but the culture and architectural art that move people endure.
7.2 Technical Basis

Physically based rendering has been widely used in the film and game industries since Disney introduced the Disney Principled BRDF at SIGGRAPH 2012, owing to its ease of use and convenient workflow. Physically based rendering (PBR) refers to rendering with a shading/lighting model based on physical principles and microfacet theory, together with surface parameters measured from reality, to accurately represent real-world materials. Unreal Engine is widely used because of its excellent rendering technology. Next, we describe the theoretical and technical basis on which Unreal Engine renders scenes.
7.2.1 Lighting Model

Currently, the physically based specular BRDF model in use is the Microfacet Cook-Torrance BRDF, which is based on microfacet theory. Microfacet theory derives from the idea that microgeometry is modeled as a set of microfacets, and it is generally used to describe reflection from surfaces that are not optically flat.

Fig. 7.1 When light interacts with a non-optically flat surface, the surface behaves like a large collection of tiny optically flat surfaces

The basic assumption of microfacet theory is the existence of microscopic geometry (microgeometry). The scale of the microgeometry is smaller than the observation scale (such as the shading resolution), but it is larger than that of visible
wavelengths (so geometrical optics applies and wave effects such as diffraction can be ignored). Up to 2013, microfacet theory was used only to derive expressions for single-bounce surface reflection. In recent years, with the development of the field, there have been some discussions of multiple-bounce surface reflection using microfacet theory. Each surface point can be considered optically flat because the microgeometry scale is assumed to be significantly larger than the wavelength of visible light. As mentioned above, an optically flat surface splits light into two directions: reflection and refraction. Each surface point reflects light from a given incoming direction into a single outgoing direction, which depends on the direction of the microgeometry normal m. When calculating the BRDF term, the light direction l and the view direction v are specified. This means that, of all surface points, only those microfacets that happen to be oriented exactly so as to reflect l into v can contribute to the BRDF value (the contributions from other orientations, positive and negative, cancel out after integration). In Fig. 7.1, we can see that the surface normal m of these “correctly oriented” surface points lies exactly halfway between l and v. The vector halfway between l and v is called the half-vector or half-angle vector, and we denote it by h. Only surface points with m = h reflect light from l into the view direction v; other surface points do not contribute to the BRDF. Not all surface points with m = h actively contribute to the reflection; some are blocked by other surface areas from the l direction (shadowing), from the v direction (masking), or from both. Microfacet theory assumes that all shadowed light disappears from the specular reflection. In reality, some of it will eventually become visible through multiple surface reflections, but this is generally not considered in current microfacet theory. In Fig. 7.2, on the left, we see that some surface points are blocked from the direction of l, so they are shadowed and do not receive light (and therefore cannot reflect any). In the middle, we see that some surface points are not visible from the view direction v, so of course we do not see any light reflected from them. In both cases, these surface points do not contribute to the BRDF. In fact, although shadowed areas do not receive any direct light from l, they do receive (and therefore reflect) light from other surface areas (as shown in the right image). Microfacet theory ignores these interactions.
Fig. 7.2 Various types of light surface interactions
7.2.2 From Physical Phenomena to BRDF
Using these assumptions (a locally optically flat surface without interreflection), it is easy to derive a general form of specular BRDF called the microfacet Cook-Torrance BRDF. This specular BRDF takes the following form:
$$ f(l, v) = \frac{D(h)\, F(v, h)\, G(l, v, h)}{4\,(n \cdot l)(n \cdot v)} \tag{7.1} $$
where:
• D(h): the normal distribution function, which describes the statistical distribution of the microfacet normals, i.e., the concentration, relative to the surface area, of surface points that are oriented correctly (m = h) to reflect light from l to v.
• F(v, h): the Fresnel equation, which describes the proportion of light reflected by a surface at different angles of incidence.
• G(l, v, h): the geometry function, which describes the self-shadowing properties of the microfacets, i.e., the fraction of surface points with m = h that are neither shadowed nor masked.
• Denominator 4(n · l)(n · v): a correction factor that accounts for quantities being transformed between the local space of the microgeometry and the local space of the overall macrosurface.
With regard to the Cook-Torrance BRDF, two considerations need to be highlighted. First, for the dot products in the denominator, avoiding negative values is not enough; zero values must also be avoided. This is usually done by adding a very small positive value after a regular clamp or absolute-value operation. Second, although the microfacet Cook-Torrance BRDF is the most widely used model in practice, it is in fact the simplest microfacet model one can think of. It models only single scattering on a single microsurface under geometrical optics, without considering multiple scattering, layered materials, or diffraction. Microfacet models still have a long way to go.
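As an illustration of Eq. (7.1), the following minimal Python sketch evaluates the specular term for a single light and view direction. The chapter does not say which D, F, and G are used, so the GGX distribution, the Schlick Fresnel approximation, the Schlick-style Smith geometry term with k = α/2, and the remapping α = roughness² are all assumptions made here for illustration, as are the function names. The small `eps` added to the denominator follows the clamp-then-offset practice described above.

```python
import math

def normalize(a):
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def d_ggx(n_dot_h, alpha):
    # Assumed choice of D(h): the GGX normal distribution (requires alpha > 0).
    a2 = alpha * alpha
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)

def f_schlick(v_dot_h, f0):
    # Schlick approximation of the Fresnel term F(v, h).
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def g_smith_schlick(n_dot_l, n_dot_v, alpha):
    # Assumed choice of G(l, v, h): separable Smith form with a Schlick-style G1.
    k = alpha / 2.0
    def g1(x):
        return x / (x * (1.0 - k) + k)
    return g1(n_dot_l) * g1(n_dot_v)

def cook_torrance(n, l, v, roughness, f0, eps=1e-4):
    """Evaluate Eq. (7.1) for unit-length or unnormalized n, l, v."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    n_dot_l = max(dot(n, l), 0.0)   # clamp negative values
    n_dot_v = max(dot(n, v), 0.0)
    if n_dot_l == 0.0 or n_dot_v == 0.0:
        return 0.0                  # light or viewer at or below the horizon
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half-vector
    alpha = roughness * roughness   # assumed roughness remapping
    d = d_ggx(max(dot(n, h), 0.0), alpha)
    f = f_schlick(max(dot(v, h), 0.0), f0)
    g = g_smith_schlick(n_dot_l, n_dot_v, alpha)
    # eps keeps the denominator away from zero at grazing angles.
    return d * f * g / (4.0 * n_dot_l * n_dot_v + eps)
```

For example, `cook_torrance((0, 0, 1), (0.6, 0, 0.8), (-0.6, 0, 0.8), roughness=0.5, f0=0.04)` evaluates the mirror configuration in which h coincides with n.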
7.2.3 Physically Based Environment Lighting
Lighting is the most critical part of a scene, and physically based ambient lighting is generally used. A common technical solution for ambient lighting is image-based lighting (IBL). For example, the diffuse part of the ambient lighting generally uses the irradiance environment mapping technique from traditional IBL, and IBL is also commonly used in the industry for physically based specular ambient lighting. To use a physically based BRDF model with IBL, the radiance integral needs to be solved, and importance sampling is usually used to solve it. Importance sampling is a strategy that, based on some known conditions (distribution functions), concentrates samples in the regions where the integrand is likely to be large (the important regions) and thereby obtains an accurate estimate efficiently. The two terms that result are briefly summarized below.
Split Sum Approximation. Using importance sampling, the Monte Carlo estimator is substituted into the rendering equation:
$$ \int L_i(l)\, f(l, v) \cos\theta_l \, dl \approx \frac{1}{N} \sum_{k=1}^{N} \frac{L_i(l_k)\, f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)} \tag{7.2} $$
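To make the estimator in Eq. (7.2) concrete, the sketch below compares uniform hemisphere sampling with cosine-weighted importance sampling on a toy integrand. Everything specific is an assumption chosen so that the exact answer is known: a Lambertian BRDF with albedo 0.6 and incident radiance L_i = 1 + cos θ, for which the integral equals exactly 1. The function names are hypothetical.

```python
import math
import random

ALBEDO = 0.6  # hypothetical Lambertian albedo, chosen so the exact integral is 1.0

def radiance(cos_theta):
    # Toy incident radiance L_i that depends only on the elevation angle.
    return 1.0 + cos_theta

def brdf():
    # Lambertian BRDF f = albedo / pi (a stand-in for f(l, v)).
    return ALBEDO / math.pi

def sample_uniform(rng):
    # Uniform hemisphere sampling: cos(theta) is uniform, pdf = 1 / (2 pi).
    cos_theta = 1.0 - rng.random()          # in (0, 1]
    return cos_theta, 1.0 / (2.0 * math.pi)

def sample_cosine(rng):
    # Cosine-weighted sampling: pdf = cos(theta) / pi follows the cosine factor.
    cos_theta = math.sqrt(1.0 - rng.random())
    return cos_theta, cos_theta / math.pi

def estimate(sampler, n_samples, seed=0):
    """Monte Carlo estimate of the reflectance integral, as in Eq. (7.2).

    Returns the sample mean and the sample variance of the per-sample terms.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        cos_theta, pdf = sampler(rng)
        values.append(radiance(cos_theta) * brdf() * cos_theta / pdf)
    mean = sum(values) / n_samples
    variance = sum((x - mean) ** 2 for x in values) / (n_samples - 1)
    return mean, variance

if __name__ == "__main__":
    for name, sampler in (("uniform", sample_uniform), ("cosine", sample_cosine)):
        mean, variance = estimate(sampler, 20000)
        print(f"{name:8s} mean={mean:.4f} variance={variance:.4f}")
```

Both estimators converge to the same value, but the cosine-weighted one has a much smaller per-sample variance because its density follows part of the integrand, which is the motivation for importance sampling given above.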
Evaluating the expression above directly is expensive and impractical for real-time rendering. The current mainstream practice in the game industry is to split the sum $\frac{1}{N}\sum_{k=1}^{N} \frac{L_i(l_k)\, f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)}$ into two terms: the average radiance $\frac{1}{N}\sum_{k=1}^{N} L_i(l_k)$ and the environment BRDF $\frac{1}{N}\sum_{k=1}^{N} \frac{f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)}$. Namely:
$$ \frac{1}{N} \sum_{k=1}^{N} \frac{L_i(l_k)\, f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)} \approx \left( \frac{1}{N} \sum_{k=1}^{N} L_i(l_k) \right) \left( \frac{1}{N} \sum_{k=1}^{N} \frac{f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)} \right) \tag{7.3} $$
After splitting, the two terms are precomputed offline so as to match the results of an offline-rendered reference. At run time, the two precomputed terms of the split sum approximation are looked up and combined to give the result of the real-time IBL environment lighting.
The First Term: Pre-Filtered Environment Map. The first term is $\frac{1}{N}\sum_{k=1}^{N} L_i(l_k)$, which can be understood as the mean value of the radiance $L_i(l_k)$. Under the assumption n = v = r, it depends only on the surface roughness and the reflection vector. For this term, industry practice is relatively uniform (including UE4 and COD: Black Ops 2). The main scheme is to pre-filter the environment texture and to store the increasingly blurred environment highlights in the levels of a mipmap chain:
$$ \frac{1}{N} \sum_{k=1}^{N} L_i(l_k) \approx \mathrm{Cubemap.sample}(r, \mathrm{mip}) \tag{7.4} $$
That is to say, the first term is obtained directly by sampling the cubemap at the appropriate mip level.
The Second Term: Environment BRDF. The second term, $\frac{1}{N}\sum_{k=1}^{N} \frac{f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)}$, is the hemispherical-directional reflectance of the specular term, which can be interpreted as the environment BRDF. It depends on the elevation θ, the roughness α, and the Fresnel term F. The Schlick approximation is often used for F. It is parameterized by the single value $F_0$, so that $R_{spec}$ becomes a function of three parameters: the elevation θ (n · v), the roughness α, and $F_0$. UE4 proposed in [Real Shading in Unreal Engine 4, 2013] that, in the second term, $F_0$ can be factored out of the integral after applying the Schlick approximation:
$$ \int f(l, v) \cos\theta_l \, dl = F_0 \int \frac{f(l, v)}{F(v, h)} \left( 1 - (1 - v \cdot h)^5 \right) \cos\theta_l \, dl + \int \frac{f(l, v)}{F(v, h)} (1 - v \cdot h)^5 \cos\theta_l \, dl \tag{7.5} $$
This leaves two inputs (roughness and cos θ_v) and two outputs (a scale and a bias applied to $F_0$), all of which are conveniently in the range [0, 1]. The result of this function is precomputed and stored in a 2D look-up texture (LUT). Figure 7.3 shows this fixed mapping from roughness and cos θ to the specular reflection intensity of the environment BRDF, which can be precomputed offline. Specifically, the second term is evaluated as:
$$ \frac{1}{N} \sum_{k=1}^{N} \frac{f(l_k, v) \cos\theta_{l_k}}{p(l_k, v)} = \mathrm{LUT}.r \cdot F_0 + \mathrm{LUT}.g \tag{7.6} $$
Fig. 7.3 Red-green map that takes roughness and cos θ as inputs and outputs the specular reflection intensity of the environment BRDF
That is, UE4 factors F0 out of the Fresnel term so that the result has the form F0 · scale + bias, stores the scale and bias in a 2D LUT, and looks them up by roughness and n · v.
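The precomputation of the scale and bias can be sketched in Python as follows. This is a CPU-side illustration in the spirit of the procedure described in the UE4 course notes cited above, not the engine's shader code: GGX importance sampling, a Hammersley point set, and the Schlick-style geometry term with k = α/2 are assumed, and all function names are hypothetical. In a real pipeline `integrate_env_brdf` would be evaluated once per LUT texel offline rather than per shading call.

```python
import math

def radical_inverse_base2(i):
    # Van der Corput sequence: mirror the bits of i around the binary point.
    result, f = 0.0, 0.5
    while i > 0:
        if i & 1:
            result += f
        i >>= 1
        f *= 0.5
    return result

def importance_sample_ggx(u1, u2, alpha):
    # Half-vector in tangent space (n = +z), distributed according to GGX.
    phi = 2.0 * math.pi * u1
    cos_theta = math.sqrt((1.0 - u2) / (1.0 + (alpha * alpha - 1.0) * u2))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)

def g_smith_schlick(n_dot_l, n_dot_v, alpha):
    k = alpha / 2.0  # assumed remapping for image-based lighting
    def g1(x):
        return x / (x * (1.0 - k) + k)
    return g1(n_dot_l) * g1(n_dot_v)

def integrate_env_brdf(roughness, n_dot_v, n_samples=1024):
    """Return (scale, bias) for one LUT texel; requires 0 < n_dot_v <= 1."""
    alpha = roughness * roughness
    v = (math.sqrt(1.0 - n_dot_v * n_dot_v), 0.0, n_dot_v)
    scale = bias = 0.0
    for i in range(n_samples):
        h = importance_sample_ggx(i / n_samples, radical_inverse_base2(i), alpha)
        v_dot_h = sum(a * b for a, b in zip(v, h))
        l = tuple(2.0 * v_dot_h * hc - vc for hc, vc in zip(h, v))  # reflect v about h
        n_dot_l, n_dot_h = l[2], h[2]
        if n_dot_l > 0.0 and n_dot_h > 0.0 and v_dot_h > 0.0:
            # f * cos / pdf with the Fresnel term divided out, as in Eq. (7.5).
            g_vis = g_smith_schlick(n_dot_l, n_dot_v, alpha) * v_dot_h / (n_dot_h * n_dot_v)
            fc = (1.0 - v_dot_h) ** 5
            scale += (1.0 - fc) * g_vis
            bias += fc * g_vis
    return scale / n_samples, bias / n_samples

def specular_ibl(prefiltered_radiance, f0, roughness, n_dot_v):
    # Split sum: first term (pre-filtered radiance) times second term (F0 * scale + bias).
    scale, bias = integrate_env_brdf(roughness, n_dot_v)
    return prefiltered_radiance * (f0 * scale + bias)
```

Here `prefiltered_radiance` stands for the value that would be fetched from the pre-filtered cubemap in Eq. (7.4), and the product corresponds to combining the two terms at run time.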
7.3 Conceive Preparation
7.3.1 Research on the Current Situation of UE4 Used for Developing This Work
Widely regarded as one of the most open and advanced real-time 3D creation tools in the world, Unreal Engine has been widely used in games, architecture, broadcasting, film and television, automotive and transportation, simulation training, virtual production, human-machine interfaces, etc. Over the past decade, Unity 3D (U3D) has been very popular, with over 2 million games developed on it, but in recent years UE4 has caught up and even surpassed it. Other virtual reality development platforms include VRP, CryEngine, ApertusVR, Amazon Sumerian, etc. Compared with them, the excellent picture quality, good lighting and physics effects, and simple and clear visual programming of UE4 make it the preferred development platform for this project. Many of the instructional videos and documents posted on the UE4 website are also very friendly to beginners.
7.3.2 Storyboard
The storyboard for this project is divided into six parts. The first two parts show the player suddenly traveling through time after visiting the exhibition. The third, fourth, and fifth parts show the player roaming around the city after traveling back to the Tang Dynasty. When the player lights a lantern, the lanterns of the whole city also take flight. The sixth part shows the player chasing the Kongming lanterns and finally running up the mountain as it gets dark to see the beautiful scene of a thousand lanterns rising in the valley (Fig. 7.4).
Fig. 7.4 Storyboard for this project
7.3.3 Design Process
First, relevant information is collected to clarify the design scheme. For the pavilion scene, the ground, architecture, and interior are built in Unreal Engine, a TV screen is then added to play videos, and finally the relevant materials are optimized. The ancient prosperous city scene and the mountain night scene are modeled in 3ds Max. After adjustment, the models are imported into Unreal Engine, where the architectural layout is adjusted and particle effects are added. Lighting effects and collision bodies are then added, the interactive functions are designed, and finally the project is tested and output (Fig. 7.5).
7.4 The Main Scene Building Part
7.4.1 The Idea of Building a Scene of the Tang Dynasty
The Tang Dynasty scene of this project depicts the impression of the Tang Dynasty that players have formed from accumulated historical knowledge and from ancient paintings. The city has prosperous streets and lively markets. Its magnificent architecture is the most dazzling scenery in the city and reflects the rich life of the people. Vendors' stalls displaying a full range of goods line the streets, while the sacred palace is more mysterious and majestic. The architecture of this scene strives to recreate a fantastic and magnificent age of prosperity. To bring the scene closer to a real environment, the grass and trees sway dynamically in the wind, and the rivers are rendered with realistic effects. In addition, fluttering butterflies, flocks of white pigeons flying out of the palace, and lovely lambs have been added to the scene.
Fig. 7.5 Design flowchart
7.4.2 Collider
The scene contains a variety of buildings and other elements that are placed relatively close to one another. During scene construction, colliders must be set for the scene elements one by one to ensure that no clipping (model penetration) occurs. For example, when a building made in 3ds Max is imported in FBX format and placed in UE4, the character is able to pass through the model. To avoid this, after importing the model we can double-click the static mesh and add a simplified collision. When the urban ground built in 3ds Max is imported, the character likewise clips through it and cannot stand on the ground. We therefore use a Landscape as the main ground of the scene so that the characters can stand on it.
7.4.3 Light Effect of Dusk
The whole scene simulates the light of dusk, and the gorgeous orange glow of the sun adds beauty to the magnificent ancient city. To achieve the desired effect, we set up a dark pink sky box. "Exponential Height Fog" is used to simulate fog, with "Max Opacity" and "Start Distance" adjusted to make the fog lighter and "Volumetric Fog" enabled. A comparison is shown in Fig. 7.6. Sunlight is simulated with a "Directional Light" set to an appropriate color and intensity. A "Sky Light" is used to produce the reflected sunlight, and finally a "Lightmass Importance Volume" and a "Post Process Volume" are added to further optimize the lighting.
Fig. 7.6 Lighting effect comparison map
7.4.4 Particle Effects
To enhance the visual impact of the transitions, the project uses particle effects when switching between scenes and when traveling to the ancient city. The Niagara effects system was enabled in this project. Niagara builds particle effects from a modular visual-effects stack that is easy to reuse and inherit, combined with a timeline. Niagara also supports data interfaces to many kinds of data in Unreal Engine, for example obtaining skeletal mesh data through a data interface.
7.5 Functional Design
7.5.1 Sequence Animation in Scene
Character Animation. The character's actions in the animation are realized by binding animations to the character skeleton. The character's movement route is determined by adding location key frames, and the character's climb up the stairs is realized by opening the z-axis curve and adding and adjusting key points. The closing of the door is realized by adding rotation key frames. Weight key frames are added where necessary to achieve smooth transitions between character actions.
Camera Animation Implementation. The camera in the animation is a "Cine Camera Actor." "Enable Look at Tracking" is checked and the character skeleton is selected as the "Actor to Track" so that the camera stays aimed at the character. The "Relative Offset" is changed to obtain the desired perspective.
Transition Between Level Sequences. Key frames are added to the focus settings and aperture in the sequence to achieve a zoom effect. Fade-in and fade-out transitions are also added: fade tracks are added to the animation and keyed to complete the fade-in and fade-out effect. A master sequence is created, and fade-track transitions are used to integrate the individual level sequences.
7.5.2 Release Kongming Lanterns
A Kongming lantern blueprint class is used as the interactive object, and the lantern is lit during flight through a Set Intensity node. A box collision is used as the interaction detection range. When the player enters the range, a HUD prompt to release the lantern is triggered. When the lantern is released, its world location is read in the Tick event and the Z value is increased automatically to make the lantern rise. After the character releases the Kongming lantern, a cutscene is played and the function that launches a large number of Kongming lanterns is triggered (Figs. 7.7 and 7.8).
Fig. 7.7 Release Kongming lantern blueprint
Fig. 7.8 Game player release lanterns
7.5.3 Large Amounts of Kongming Lanterns Take off
In this function, in order to pursue a better visual effect, the Kongming lanterns are larger and are destroyed after 30 s.
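The per-tick logic described in Sects. 7.5.2 and 7.5.3 is implemented in the project as a UE4 blueprint. The following Python sketch only mirrors that logic outside the engine for illustration. The `Lantern` class, the rise speed, and the fixed time step are hypothetical, while the 30 s lifetime is the value stated for the mass-released lanterns.

```python
from dataclasses import dataclass

@dataclass
class Lantern:
    z: float = 0.0             # height (world Z)
    rise_speed: float = 50.0   # hypothetical rise speed, units per second
    lifetime: float = 30.0     # mass-released lanterns are destroyed after 30 s
    age: float = 0.0
    destroyed: bool = False

    def tick(self, dt):
        """Called once per frame: raise the lantern, then check its lifetime."""
        if self.destroyed:
            return
        self.z += self.rise_speed * dt   # "add the Z value" on every tick
        self.age += dt
        if self.age >= self.lifetime:
            self.destroyed = True

def simulate(lanterns, duration, dt=0.5):
    """Advance all lanterns by `duration` seconds and return those still alive."""
    steps = int(round(duration / dt))
    for _ in range(steps):
        for lantern in lanterns:
            lantern.tick(dt)
    return [lantern for lantern in lanterns if not lantern.destroyed]
```

Scaling the increment by the frame time `dt`, rather than adding a constant per tick, keeps the rise speed independent of the frame rate. Whether the project's blueprint does this is not stated in the text.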
Fig. 7.9 Project framework diagram
Fig. 7.10 Museum scene display screenshots
7.6 Artwork Display and Rendering Resource Analysis
7.6.1 Project Framework Diagram and Interactive Display Screenshots
Figure 7.9 shows the framework of the project, and Figs. 7.10, 7.11 and 7.12 show screenshots of the interactive display.
7.6.2 Project Performance Analysis
Performance is an omnipresent topic in making real-time games. In order to create the illusion of moving images, we need a frame rate of at least 15 frames per second. Depending on the platform and game, 30, 60, or even more frames per second may be the target (Fig. 7.13). For large projects in particular, it is very important to know how rendering resources are being used. As can be seen from Fig. 7.13, the water body, which is not a key part of the roaming route, takes up too much rendering resource and memory, and the polygon counts of the palaces and trees are high. The polygon counts of the palace windows and of the branches and trunks should be optimized, whereas the buildings on the street have low polygon counts and their textures are insufficiently realistic. During the construction of the main scene, the river effect did not meet expectations. In future work, the authors will try to reproduce the lighting and ripple effects of the river so as to achieve better results with a more resource-efficient scheme.
Fig. 7.11 Ancient city display screenshots
Fig. 7.12 Screenshots of lanterns lift off and mountain night scene display
Fig. 7.13 Shader complexity and Quads
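The frame-rate targets mentioned at the start of this subsection translate directly into a time budget per frame, which is the quantity that rendering cost must fit into. A small Python sketch of that arithmetic (the function name is hypothetical):

```python
def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps

if __name__ == "__main__":
    for fps in (15, 30, 60):
        print(f"{fps:>3} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```

At 15, 30, and 60 frames per second the budget is about 66.67, 33.33, and 16.67 ms, respectively, so doubling the target frame rate halves the time available for expensive elements such as the water body.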
7.6.3 Project Innovation
The player explores the palace, lights the palace lantern, and triggers the flight of a thousand lanterns, and then runs after the lights. The run toward the thousand-lantern gathering itself is omitted in the project: the player is transported directly to the mountain night scene, which implies that the player has chased the Kongming lanterns all the way and that time has passed from evening to night on the mountain. The multi-scene interaction exemplifies Unreal Engine's power and detail in achieving dreamlike scenes. To achieve better 3D effects, the Niagara particle effects plug-in was enabled to produce many realistic particle effects. Using the timeline in Niagara, we visually control the rhythm of the particle effects and set keyframes to refine them. This approach is modeled on film and television editing and makes the virtual world more detailed and realistic. To simulate the dusk lighting in the Tang city scene, the project combines a directional light and a sky light, which together closely approximate real sunlight and its reflection. Together with the Lightmass Importance Volume, which controls the accuracy of the precomputation so that more indirect lighting cache points are generated inside its volume, well-controlled precomputed lighting makes the scene beautiful while reducing the cost of dynamic lighting.
7.7 Conclusion
Relying on Unreal Engine, the project, through the construction, integration, and optimization of the ancient city scene, matches models with the environment's light and shadow in the interactive experience and achieves a harmonious unity of architecture, plants, terrain, fog, lighting, and particle effects. Its various interactive functions also greatly enhance the visual appeal of the roaming process and create a real-time, high-quality 3D fantasy world. As a more dynamic and intuitive form of expression, virtual reality has unparalleled advantages. The project combines Unreal Engine visual programming, level sequences, Niagara particle effects, and other technologies to achieve not only better visual effects but also a more immersive feeling.
Acknowledgements This work was supported by grants from the "Undergraduate Teaching Reform and Innovation" Project of Beijing Higher Education (GM 2109022005); the Beijing College Students' Innovation and Entrepreneurship Training Program (Item No: 22150222040, 22150222044); the Key Project of Ideological and Political Course Teaching Reform of Beijing Institute of Graphics Communication (Item No: 22150222063); and the Scientific Research Plan of Beijing Municipal Education Commission (Item No: 20190222014).
Chapter 8
Auxiliary Figure Presentation Associated with Sweating on a Viewer's Hand in Order to Reduce VR Sickness
Masaki Omata and Mizuki Suzuki
Abstract We have proposed a system that superimposes auxiliary figures on a VR scene according to a viewer's physiological signals that respond to VR sickness, in order to reduce VR sickness without interfering with the viewer's sense of immersion. We conducted an experiment to find a type of physiological signal that correlates strongly with VR sickness. The results showed that sweating on a viewer's hand was strongly correlated and that the amount of sweating tended to increase as VR sickness worsened. We then designed and developed the proposed system, which controls the degree of alpha blending of the color of Dots, an auxiliary figure that reduces VR sickness, according to the amount of sweating, and conducted an experiment to evaluate it. The results showed that the system was effective in reducing VR sickness for participants with less VR experience, although there was no statistically significant difference among the experimental conditions. Additionally, the results showed that the controlled presentation of the system was less distracting from immersion in a VR scene than the constant presentation used in a previous system.
M. Omata (✉), Graduate Faculty of Interdisciplinary Research, Faculty of Engineering, University of Yamanashi, Kofu, Japan
M. Suzuki, Department of Computer Science and Engineering, University of Yamanashi, Kofu, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_8
8.1 Introduction
One of the factors stalling the spread of virtual reality (VR) content is VR sickness, which refers to the deterioration of physical condition caused by viewing VR content. The symptoms of VR sickness are similar to those of general motion sickness, such as vomiting, cold or numb limbs, sweating, and headache [1, 2]. When these symptoms appear, the user cannot enjoy the VR content and may stop viewing or avoid viewing in the first place. This may hinder the future development of the VR market and field, so it is essential to elucidate the cause of VR sickness and take preventive measures. VR sickness is sometimes called visually induced motion sickness (VIMS) and is considered a form of motion sickness. Motion sickness is a deterioration of physical condition caused by staying for a long time in a moving environment, such as in a car or on a ship. Although the cause of motion sickness is not completely understood, the sensory conflict theory is the most widely accepted explanation. The theory states that motion sickness occurs, during the process of adapting to the situation, when the pattern of vestibular, visual, and somatosensory information expected from experience is incompatible with the pattern of sensory information in the actual motion environment [3]. The same conflict is thought to occur in VR sickness: the visual system perceives motion, while the vestibular system does not. Broadly speaking, two kinds of methods for reducing VR sickness have been studied: one provides the user with an actual sensation of motion from outside the virtual world, and the other applies some effect to the user's field-of-view in the virtual environment. Examples of the former include applying wind to the user while viewing VR images [4] and providing a pseudo-motion sensation by applying electrical stimulation to the vestibular system [5, 6]. However, these methods have disadvantages such as the need for large-scale equipment and high cost. Examples of the latter include displaying gazing points on VR images [7] and switching from a first-person view to a third-person view in situations where the user is prone to sickness. These methods have the advantage that they can be implemented within an HMD and are less costly than the former methods because they require only image processing.
The latter approach is more realistic in terms of the spread of VR content. However, there are concerns that superimposed images may not match the world view of a VR environment, or that they may distract the user and make it difficult to concentrate on a VR game, thereby diminishing the sense of immersion. Therefore, we propose a system that belongs to the latter approach but, instead of constantly displaying the superimposed figures, continuously detects signs of VR sickness from physiological signals and controls the display of the superimposed figures in real time according to the detection results. We aim to reduce VR sickness without lowering the sense of immersion.
8.2 Related Work
As a method of changing a user's field-of-view to reduce motion sickness, Bos et al. hypothesized that appropriate visual information on self-motion is beneficial in a naval setting and conducted an experiment in a ship's bridge motion simulator with three visual conditions: an Earth-fixed outside view, an inside view that moved with the subjects, and a blindfolded condition [8]. Sickness was highest in the inside viewing condition, and no effect of sickness on task performance was observed. Norouzi et al. investigated the use of vignetting to reduce VR sickness when using amplified head rotations instead of controller-based input, and whether the induced VR sickness results from the user's head acceleration or velocity, by introducing two modes of vignetting, one triggered by acceleration and the other by velocity [9]. The results generally indicated that the vignetting methods did not succeed in reducing VR sickness for most of the participants and instead led to a significant increase. Duh et al. suggested that an independent visual background (IVB) might reduce the disturbance caused by conflicting visual and inertial cues [10]. They examined three levels of IVB with two levels of roll oscillation frequency and found statistically significant effects of IVB and a significant interaction between IVB and frequency. Sargunam et al. compared three common joystick rotation techniques: traditional continuous rotation, continuous rotation with reduced field-of-view, and discrete rotation with fixed intervals for turning [11]. Their goal was to investigate whether there are trade-offs among joystick rotation techniques in terms of sickness and preferences in a 3D environment. The results showed no evidence of differences in orientation, but the sickness ratings showed discrete rotation to be significantly better than field-of-view reduction. Fernandes et al. explored the effect of dynamically, yet subtly, changing a physically stationary person's field-of-view in response to visually perceived motion in a virtual environment [12]. They were able to reduce the degree of VR sickness perceived by participants without decreasing their subjective level of presence and while minimizing their awareness of the intervention. Budhiraja et al. proposed rotation blurring, which uniformly blurs the screen during rotational movements, to reduce cybersickness caused by character movements in a first-person shooter game in a virtual environment [13]. The results showed that the blurring technique led to an overall reduction in the participants' sickness levels and delayed its onset.
On the other hand, as a method of adding a figure to the user's field-of-view, Whittinghill et al. placed a three-dimensional model of a virtual human nose in the center of the display's field of view to examine whether a fixed visual reference object within the user's field-of-view reduces simulator sickness [14]. Users in the nose group were able, on average, to operate the VR applications longer and with fewer stop requests than users in the no-nose control group. However, in the roller coaster game with intense movements, the average play time was only about 2 s longer. Cao et al. designed a see-through metal net surrounding users above and below as a rest frame to reduce motion sickness in an HMD [15]. They showed that subjects felt more comfortable and tolerated the experience better with the net than without a rest frame. Buhler et al. proposed and evaluated two novel visual effects that can reduce VR sickness with head-mounted displays [16]. In the circle effect, the peripheral vision shows the point of view of a different camera, and the border between the outer peripheral vision and the inner vision is visible as a circle. The dot effect adds artificial motion in the peripheral vision that counteracts the virtual motion. The results showed lower mean sickness with the two effects; however, the difference was not statistically significant across all users.
In many studies, the entire view is changed or figures are conspicuously superimposed. Some superimposed figures, such as those that imitate the user's nose, are less obvious, but they are not effective in some situations or can be used only for a first-person view. Therefore, Omata et al. designed a more discreet static figure in virtual space and a scene-independent figure connecting the virtual world and the real world [7]. The results showed that VR sickness tended to be reduced by superimposing Dots on the four corners of the field-of-view. At the same time, however, they also showed that superimposing auxiliary figures reduced the sense of immersion. Based on the results of Omata et al.'s study, we investigated a method to reduce VR sickness without unnecessarily lowering the sense of immersion by displaying the Dots only when a symptom of sickness appears. In addition, since hand sweating was used as a physiological index of the degree of VR sickness in the study by Omata et al., and nasal surface temperature was used as such an index in the study by Ishihara et al. [17], we have also proposed using these physiological signals as indexes to quantify the degree of VR sickness. Additionally, we have identified an appropriate type of physiological signal for this purpose and have proposed using it as a parameter to emphasize the presentation of an auxiliary figure when a user feels VR sickness.
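The idea of using a physiological signal as a parameter that emphasizes an auxiliary figure can be sketched as follows. The actual control law of the proposed system is not given in this part of the chapter, so the linear mapping, the baseline and full-scale values, and the function names below are purely hypothetical. Only the use of alpha blending driven by the amount of sweating comes from the text.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))

def alpha_from_skin_conductance(sc, baseline, full_scale_rise):
    """Map a skin conductance reading to an opacity in [0, 1].

    Assumed linear mapping: the Dots are invisible at or below `baseline`
    and fully opaque once the reading has risen by `full_scale_rise`.
    """
    if full_scale_rise <= 0:
        raise ValueError("full_scale_rise must be positive")
    return clamp01((sc - baseline) / full_scale_rise)

def blend(dot_rgb, scene_rgb, alpha):
    """Standard alpha blending of the Dot color over the scene color."""
    return tuple(alpha * d + (1.0 - alpha) * s for d, s in zip(dot_rgb, scene_rgb))
```

With such a mapping the auxiliary figure fades in only as the sickness indicator rises, which is the behavior the approach aims for in order to preserve immersion while the viewer feels well.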
8.3 Experiment to Select Physiological Signal that Correlates with VR Sickness
In this experiment, the nasal surface temperature, hand blood volume pulse (BVP), and hand sweating of participants watching a VR scene were measured and analyzed in order to find a type of physiological signal that correlates strongly with VR sickness. A magnitude estimation method was used to measure the participants' subjective degree of VR sickness [18]. The participants were instructed that their degree of discomfort in their normal condition, wearing the HMD with no images presented, was 100, and they were asked to report verbally, at 20 s intervals while viewing a VR scene, their degree of discomfort relative to this value of 100. As an experimental task to encourage head movement, participants were asked to look for animals hidden in the VR scene.
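One plausible way to relate a continuously sampled signal to ratings given every 20 s is to average the signal over each 20 s window and compute a correlation coefficient against the ratings. The sketch below does this with the Pearson coefficient. The analysis method actually used by the authors is not described in this part of the chapter, so the choice of Pearson's r, the windowing, and the sample rate are assumptions for illustration.

```python
import math

def window_means(samples, sample_rate_hz, window_s=20.0):
    """Average a signal over consecutive windows (one mean per verbal rating)."""
    per_window = int(sample_rate_hz * window_s)
    if per_window <= 0:
        raise ValueError("window must contain at least one sample")
    n_windows = len(samples) // per_window
    return [
        sum(samples[i * per_window:(i + 1) * per_window]) / per_window
        for i in range(n_windows)
    ]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value of r close to +1 for a given signal would indicate that the signal rises together with the reported discomfort, which is the kind of relationship the experiment looks for.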
8.3.1 Physiological Signals
Based on physiological and psychological findings, we selected nasal surface temperature, BVP, and sweating as candidate physiological responses to VR sickness. The ProComp INFINITI system from Thought Technology [19] was used as the encoder for the sensors. The surface temperature sensor was a Temperature-Flex/Pro from Thought Technology and was attached to the tip of the participant's nose, as shown in Fig. 8.1. In this study, temperature is expressed in degrees Celsius. The sensor can measure skin surface temperatures between 10 and 45 °C.
Fig. 8.1 Temperature sensor attached to a participant's nose
The BVP sensor was a BVP-Flex/Pro from Thought Technology and was attached to the index finger of the participant's dominant hand, as shown in Fig. 8.2. The sensor bounces infrared light off the skin surface and measures the amount of reflected light in order to measure heart rate and BVP amplitude. In this study, the BVP value is averaged over each inter-beat interval (IBI).
Fig. 8.2 Blood volume pulse (BVP) sensor attached on a participant's fingertip
The sweating sensor was an SC-Flex/Pro from Thought Technology, a skin conductance (SC) sensor that measures the conductance across the skin between two electrodes on the fingers. It was attached to the index and ring fingers of the participant's non-dominant hand, as shown in Fig. 8.3. The skin conductance value is the inverse of the electrical resistance between the fingers.
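The relation between skin conductance and resistance stated above can be written out as follows. The resistance value in the example is hypothetical and serves only to show the conversion into microsiemens, the unit in which skin conductance is usually reported.

$$ G = \frac{1}{R}, \qquad \text{e.g. } R = 500\ \mathrm{k\Omega} \;\Rightarrow\; G = \frac{1}{5 \times 10^{5}\ \Omega} = 2 \times 10^{-6}\ \mathrm{S} = 2\ \mu\mathrm{S} $$

More sweating lowers the resistance between the two electrodes, so the conductance value rises.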
8.3.2 Experiment Environment and Task
Virtual Scene. A three-dimensional amusement park with a Ferris wheel and a roller coaster was presented as a VR scene. We used assets available on the Unity Asset
M. Omata and M. Suzuki
Fig. 8.3 Skin conductance (SC) sensor attached to a participant’s hand
Store [20] to create the amusement park. “Terrain Toolkit 2017” was used to generate the land, “Animated Steel Coaster Plus” was used for the roller coaster, and “Ferris Wheel” was used for the Ferris wheel. To generate and present the scene, we used a computer (Intel® Core™ i5-8400 2.80 GHz CPU, GeForce GTX 1050 Ti), Unity, an HMD (Acer AH101), and in-ear earphones (Apple, MD827FE).
Task. All movement of the avatar in the virtual space was automatic, but the angle of view changed according to the orientation of the participant’s head. The first scene was a 45 s walk through the forest, followed by a 75 s ride on the Ferris wheel, a second walk of 30 s through the grassland, and finally a 150 s ride on the roller coaster, for a total of 300 s. The participants wore the HMD to view the movement in the scene and looked for animals in the view linked to their head movements (there were, however, no animals in the scene). This kind of scene and task created a situation that was more likely to induce VR sickness. The expected degree of VR sickness in the VR scene was small for walking in the forest, medium for riding the Ferris wheel, and large for riding the roller coaster.
Procedure. The participants were asked to refrain from drinking alcohol the day before the experiment, in order to avoid sickness caused by factors other than the task. The experiment was explained to the participants beforehand, and their informed consent to participate was obtained. The participants were asked to spend 10 min before the task acclimatizing to the room temperature of 21 °C in our laboratory, and then they put on the HMD, the skin temperature sensor, the BVP sensor, and the SC sensor. Figure 8.4 shows a participant wearing the devices. After performing the experimental task, they answered a questionnaire about their VR experience.
After the experiment, the participants were asked to stay in the laboratory until they recovered from VR sickness. The participants were ten undergraduate or graduate students (six males and four females) between the ages of 21 and 25 with no visual or vestibular sensory problems.
Fig. 8.4 Experimental apparatus and a participant during an experimental task
8.3.3 Result
Figures 8.5, 8.6, and 8.7 show the relationship between each of the three types of physiological signals and the participants’ verbal responses of discomfort. The measured physiological values are converted to common logarithms in order to make it easier to check the correlation between physical measurements and human psychological responses. The line segments on the graphs represent exponential trend lines, and each color represents an individual participant from A to J. Table 8.1 shows the mean and standard deviation, over the participants, of the coefficients of determination in the correlation analysis between each type of physiological signal and discomfort.
Fig. 8.5 Scatter plots and regression lines between nasal surface temperature and discomfort
Fig. 8.6 Scatter plots and regression lines between BVP and discomfort
Fig. 8.7 Scatter plots and regression lines between sweating and discomfort
Table 8.1 Coefficient of determination in correlation between type of physiological signal and discomfort
Type of physiological signal    Mean     S.D.
Nasal surface temperature       0.606    0.207
BVP                             0.338    0.225
Sweating                        0.723    0.148
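As an illustration of how per-participant coefficients of determination such as those summarized in Table 8.1 could be computed and then aggregated, the following Python sketch may help. It assumes a simple linear fit of the discomfort ratings against the common logarithm of the physiological signal and a sample standard deviation (n - 1 denominator); the chapter does not state the exact fitting procedure, so both are assumptions, and all function names are hypothetical.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination of a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def r_squared_log_signal(signal, discomfort):
    """R^2 between log10 of a physiological signal and discomfort ratings."""
    return r_squared([math.log10(v) for v in signal], discomfort)


def mean_and_sd(values):
    """Mean and sample standard deviation (assumed n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)
```

In this sketch, `r_squared_log_signal` would be applied once per participant and signal type, and `mean_and_sd` would then summarize the ten resulting values.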
8.3.4 Analyses and Discussions
Determination Coefficient. From Fig. 8.7 and Table 8.1, we found a strong correlation between hand sweating and discomfort due to VR sickness. The figure shows that, for all participants, sweating increased as discomfort increased, and the rate of increase followed the same trend across participants.
Additional Confirmatory Experiment. The determination coefficient of the nasal surface temperature also indicates a fairly strong correlation, and the graph shows that the temperature tended to increase as the discomfort increased. However, this tendency differs from the result of Ishihara et al., who found that the temperature decreases with the
increase of discomfort [17]. Therefore, we conducted an additional experiment in which we measured the nasal surface temperature of two participants during a 5-min period while they wore the HMD with no images shown. The temperatures of both participants increased. A possible reason is that heat was trapped between the HMD and their noses. Therefore, even if there is a relationship between the nasal surface temperature and VR sickness, we think that, for this reason, it is difficult to actually use the temperature as an indicator of VR sickness.
Time Difference between VR Sickness and Sweating Response. Since there was a strong correlation between VR sickness and sweating, we analyzed the time difference between the onset of VR sickness and the sweating response. Figure 8.8 shows the time series variation of the subjective evaluation of discomfort for each participant from A to J, and Fig. 8.9 shows the time series variation of sweating for each participant from A to J. In Fig. 8.8, because the responses are based on the magnitude estimation method, some participants gave responses that varied widely. We therefore compared the time series of subjective discomfort with the time series of sweating for the two participants whose responses varied most, and found almost no time difference within the resolution of the 20 s sampling interval used in the experimental task.
Limit of Discomfort. In the post-task questionnaire, we asked, “Did you feel so uncomfortable that you wanted to stop watching the VR video? If so, what was the degree of discomfort?” Five of the ten participants answered that they had been in this situation, and the average discomfort level of these five participants at that time was 196 ± 26.5.
Based on the result above, which showed that there was no large time difference between the onset of discomfort and the sweating response, we examined the discomfort degree of the five participants 20 s prior to that time, which was 162 ± 11.7. Moreover, according to Stevens’ power law [21], the mean of the power indexes (exponents) of the five participants was 1.13 ± 0.15.
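A power index of this kind is commonly estimated as the slope of a straight-line fit in log-log coordinates. The following Python sketch shows that standard technique under stated assumptions: the chapter does not describe how the authors fitted the exponent or which variable they treated as stimulus and which as response, so the function name, argument roles, and the example data are all hypothetical.

```python
import math


def power_law_fit(stimulus, response):
    """Least-squares fit of response = k * stimulus**n in log-log space.

    Returns (k, n). Both inputs must contain only positive values.
    """
    xs = [math.log(s) for s in stimulus]
    ys = [math.log(r) for r in response]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope = power index
    k = math.exp(mean_y - n * mean_x)  # intercept = log of the coefficient
    return k, n


# Synthetic, noise-free data generated with n = 1.13 (illustration only).
stimulus = [1.0, 2.0, 4.0, 8.0]
response = [100.0 * s ** 1.13 for s in stimulus]
k_hat, n_hat = power_law_fit(stimulus, response)
```

Fitting each participant separately and then averaging the exponents would give a mean and spread in the same form as the 1.13 ± 0.15 reported above.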
Fig. 8.8 Time series changes of discomfort
Fig. 8.9 Time series changes of sweating
8.4 Design of Auxiliary Figure Control System
From the experiment in the previous section, we found the relationship between the amount of sweating and the degree of VR sickness, as well as its limit and power index. Based on these results, this section describes the design of a system that controls the degree of alpha blending of the auxiliary figures that reduce VR sickness according to the amount of sweating on the hand of a VR viewer. Assuming that Stevens’ law, Eq. (8.1), holds between a psychological measure of the discomfort of VR sickness and a physical measure of sweating, we constructed an equation to derive the alpha value of the alpha blending as a percentage.

$$R = k S^{n} \tag{8.1}$$

where R is physical quantity, S is psychological quantity, n is power index, and k is coefficient. The power index n is 1.13, which was obtained in the previous section. The equation was derived so that the alpha value would be 0% when the discomfort value was 100, which was the standard value of the experiment in the previous section, and 100% when the discomfort value was 162, which was slightly lower than the limit value in the previous section. With this equation, as the amount of sweating increases with the discomfort value, the alpha percentage of the superimposed auxiliary figure becomes larger, and the auxiliary figure becomes more clearly visible. The derived equation is Eq. (8.2).

$$\alpha = 161.3 \left( \left( \frac{x}{z} \right)^{1.13} - 1 \right) \tag{8.2}$$

where α is the alpha percentage for alpha blending, z is the normal sweating volume (µS) of a viewer, x is the real-time sweating volume (µS) at each sample time when the viewer watches a VR scene, and the power index is 1.13.
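To make the behavior of Eq. (8.2) concrete, here is a minimal Python sketch of the mapping from sweating to alpha percentage. The function name and the clamping of the result to the range 0 to 100% are our assumptions, since the chapter does not say how values outside that range are handled. The constant 161.3 is consistent with 100 / (1.62 - 1), that is, with alpha reaching 100% when the term (x / z)^1.13 reaches 1.62; reading that term as the estimated discomfort relative to the baseline of 100 is our interpretation.

```python
def alpha_percent(x_us, z_us, exponent=1.13, gain=161.3):
    """Alpha percentage from Eq. (8.2), clamped to [0, 100] (assumption).

    x_us: real-time skin conductance in microsiemens.
    z_us: the viewer's normal (baseline) skin conductance in microsiemens.
    """
    if z_us <= 0:
        raise ValueError("baseline conductance must be positive")
    raw = gain * ((x_us / z_us) ** exponent - 1.0)
    return max(0.0, min(100.0, raw))
```

With this sketch, a reading equal to the baseline gives 0%, a reading 30% above the baseline gives roughly 56%, and readings below the baseline are clamped to 0% rather than producing a negative alpha.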
Fig. 8.10 Dots auxiliary figure being alpha blended onto a VR scene
In a previous study, Omata et al. evaluated four types of auxiliary figures (Gazing point, Dots, User’s horizontal line, and Real-world’s horizontal line) intended to reduce VR sickness, and found that Dots reduced VR sickness the most [7]. Therefore, we adopt the Dots design as the auxiliary figure in this research. Dots consists of four dots, as shown in Fig. 8.10, superimposed on the four corners of the view on the screen of an HMD. The dots do not interfere with content viewing, and they are thought to limit the loss of the sense of immersion. In this study, we made the dots slightly larger than those of Omata et al. and changed their color from white to peach. The larger size makes the dots easier to perceive as foreground than in the system of Omata et al. The color was changed so that the dots do not blend in with the white roads and clouds in the VR scene of our experiment.
The overall flow of the auxiliary figure presentation system is as follows. First, the skin conductance value is continuously measured by the ProComp INFINITI, which is a biological amplifier. The value is then continuously imported into Unity, which presents the VR scene, and is reflected in the alpha percentage of the color of the Dots on the HMD. The normal sweating value for each viewer is the average of the viewer’s skin conductance acquired during the 3 s after the system starts. The alpha value is set to be updated once every two seconds so that the figure does not blink during drawing.
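The flow described above (a 3 s baseline, then an alpha update every two seconds) can be sketched as a small state machine. This is a language-agnostic illustration written in Python, not the authors' Unity implementation; the class name, method name, and the clamping to 0 to 100% are hypothetical choices made for the sketch.

```python
class DotsAlphaController:
    """Hypothetical controller: baseline over the first 3 s, update every 2 s."""

    def __init__(self, baseline_window_s=3.0, update_period_s=2.0,
                 exponent=1.13, gain=161.3):
        self.baseline_window_s = baseline_window_s
        self.update_period_s = update_period_s
        self.exponent = exponent
        self.gain = gain
        self._baseline_samples = []
        self._last_update_s = None
        self.baseline = None
        self.alpha = 0.0

    def feed(self, t_s, conductance_us):
        """Feed one sample taken t_s seconds after start; return current alpha."""
        if t_s < self.baseline_window_s:
            # Still collecting the viewer's normal sweating value.
            self._baseline_samples.append(conductance_us)
            return self.alpha
        if self.baseline is None:
            if not self._baseline_samples:
                raise ValueError("no samples received during baseline window")
            self.baseline = sum(self._baseline_samples) / len(self._baseline_samples)
        due = (self._last_update_s is None
               or t_s - self._last_update_s >= self.update_period_s)
        if due:
            # Eq. (8.2), clamped to a valid percentage (assumption).
            raw = self.gain * ((conductance_us / self.baseline) ** self.exponent - 1.0)
            self.alpha = max(0.0, min(100.0, raw))
            self._last_update_s = t_s
        return self.alpha
```

Holding the alpha value between updates, rather than recomputing it on every sample, is what keeps the figure from flickering when the conductance signal fluctuates.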
8.5 Evaluation Experiment
We hypothesized that controlling the auxiliary figure according to sweating on a hand would reduce VR sickness without losing a sense of immersion. Therefore, we conducted an evaluation experiment to examine the differences among three conditions: the condition where the alpha value of the auxiliary figure is controlled according to the volume of sweating, as described in the previous section (hereinafter
called “controlled presentation”), the condition where the auxiliary figure is always displayed without alpha blending (hereinafter called “constant presentation”), and the condition where the auxiliary figure is not displayed (hereinafter called “no auxiliary figure”). The specific hypotheses were that the controlled presentation would be significantly less likely to cause VR sickness than no auxiliary figure and would have the same sickness reduction effect as the constant presentation, and that the controlled presentation would interfere less with the sense of immersion than the constant presentation and would give the same sense of immersion as no auxiliary figure. The general flow of this experiment was the same as in Sect. 8.3, but instead of giving oral responses, the participants answered the Simulator Sickness Questionnaire (SSQ) [22] before and after viewing a VR scene, and the Game Experience Questionnaire (GEQ) [23] after viewing it. The SSQ is an index of VR sickness from which three sub-scores (oculomotor-related, nausea-related, and disorientation-related) and a total score are derived by rating 16 symptoms considered to be caused by VR sickness on a four-point scale from 1 to 4. Since each sub-score is calculated in a different way, it is difficult to compare them and to judge how large each sub-score is. Therefore, in this experiment, each sub-score is expressed as a percentage of its maximum possible value. In addition, to measure the worsening of VR sickness between before and after an experimental task, the SSQ was also rated on a four-point scale from 0 to 3 before the task. The GEQ is a questionnaire that measures the user’s experience after gameplay. Originally, 14 factors can be derived, but in this experiment, we focused on positive and negative affect, tension, and immersion, and asked the participants to rate 21 questions on a five-point scale from 1 to 5.
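The normalization of SSQ sub-scores to a percentage of their maximum can be sketched as follows. Because each SSQ sub-score is a weighted sum of item ratings, the weight cancels when the score is divided by its maximum, so the sketch works directly on the raw ratings of the items belonging to one sub-scale. The 0 to 3 rating range for both the pre-task and post-task questionnaires, and the function names, are assumptions for illustration; which items belong to which sub-scale must be taken from the SSQ definition [22] and is not reproduced here.

```python
def percent_of_max(ratings, max_rating=3):
    """Sub-score as a percentage of its maximum, from the raw item ratings.

    ratings: ratings (0..max_rating) of the items in one sub-scale.
    """
    if not ratings:
        raise ValueError("at least one item rating is required")
    return 100.0 * sum(ratings) / (max_rating * len(ratings))


def worsening(pre_ratings, post_ratings, max_rating=3):
    """Post-task minus pre-task percentage for the same sub-scale."""
    return (percent_of_max(post_ratings, max_rating)
            - percent_of_max(pre_ratings, max_rating))
```

A positive value from `worsening` means the symptoms in that sub-scale became worse over the task, which matches how the difference scores are read in the results below.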
8.5.1 Procedure and Task
The experimental environment was the same as in Sect. 8.3. The task was slightly different. In Sect. 8.3, the participants were asked to search for animals in a space where none were actually placed. In this experiment, however, the participants were asked to search for animals in a space where animals were actually placed so that they would not get bored. The participants were asked to refrain from drinking alcohol the day before to avoid sickness caused by factors other than VR. The room temperature was adjusted to a constant 21 °C using an air conditioner. Informed consent was obtained from all participants before the experiment. In order to counterbalance the order of the three conditions, the participants were divided into three groups and were asked to leave at least one day between conditions to reduce the effect of habituation. Before watching the VR scene in each of the three conditions, a participant answered the SSQ to check his or her physical condition before the start. After that, the participant wore the HMD and skin conductance sensor and then watched the VR scene for 300 s, as in the task in Sect. 8.3. After watching it, the participant answered the SSQ, GEQ, and questions about the condition. This
procedure was carried out as a within-subjects design, with three conditions for each participant, with at least one day in between. The participants were twelve undergraduate or graduate students (six males and six females) between the ages of 21 and 25 with no visual or vestibular sensory problems.
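One common way to counterbalance three conditions across three groups is a cyclic Latin square, in which each condition appears once in each position. The chapter does not state which orders were actually used, so the following Python sketch is only an illustration of that technique; the participant labels and function names are hypothetical.

```python
CONDITIONS = [
    "controlled presentation",
    "constant presentation",
    "no auxiliary figure",
]


def latin_square_orders(conditions):
    """Cyclic Latin square: row r is the condition list rotated by r."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


def assign_orders(participants, conditions):
    """Assign participants to the Latin-square orders in round-robin fashion."""
    orders = latin_square_orders(conditions)
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


# Twelve hypothetical participant labels, giving four participants per order.
schedule = assign_orders([f"P{i + 1}" for i in range(12)], CONDITIONS)
```

With twelve participants and three orders, each order is used by exactly four participants, so each condition is experienced first, second, and third equally often.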
8.5.2 Results
SSQ. Figure 8.11 shows the SSQ results for the differences in VR sickness among the three conditions. Each score shown here is the participant’s post-task SSQ score minus his or her pre-task SSQ score. Therefore, the higher the value, the more the degree of VR sickness worsened due to the task. The error bars indicate the standard errors. The scores of all evaluation items were lower in the controlled and constant presentation conditions than in the no auxiliary figure condition, but the Wilcoxon signed-rank test (5% level of significance) showed no significant difference among them. In order to analyze the effect of the number of times the participants had experienced VR in the past, we divided the participants into two groups: one with more than five VR experiences (four participants) and the other with fewer than five VR experiences (eight participants). Figure 8.12 shows the SSQ scores of the group with less experience, and Fig. 8.13 shows the SSQ scores of the group with more experience. In the group with less experience, although the Wilcoxon signed-rank test (5% level of significance) showed no significant difference, the controlled condition and the constant condition tended to reduce VR sickness compared to the no auxiliary figure condition. On the other hand, no such trend was observed in the group with more VR experience.
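For readers unfamiliar with the test used here, the following Python sketch implements an exact two-sided Wilcoxon signed-rank test for one pair of conditions using only the standard library. It is a generic illustration of the test, not the authors' analysis script: dropping zero differences, averaging tied ranks, and enumerating all sign assignments are conventional choices that we assume, and the enumeration is only practical for small samples such as the twelve participants in this experiment.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped; tied ranks are averaged.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Exact null distribution: every rank is + or - with probability 1/2.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= w:
            extreme += 1
    return w, extreme / 2 ** n
```

With five pairs that all differ in the same direction, the smallest achievable two-sided p-value is 2/32 = 0.0625, which illustrates why small subgroups such as the four participants with more VR experience cannot reach the 5% level with this test.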
Fig. 8.11 Results of SSQ for all participants
Fig. 8.12 Results of SSQ of the group with less experience
Fig. 8.13 Results of SSQ of the group with more experience
GEQ. Figure 8.14 shows the GEQ results for the differences among the three conditions. The larger the value, the more intense the experience was. The error bars again indicate the standard errors. Since the tension scores were low regardless of the condition, we consider that sweating due to tension did not occur. The positive affect, negative affect, and immersion items showed little difference among the conditions. The Wilcoxon signed-rank test (5% level of significance) showed no significant differences among the conditions for any item.
Impressions of Auxiliary Figure. In response to the question “What did you think about the display of Dots?” regarding the auxiliary figure in the controlled and constant presentation conditions, 3 out of 12 participants answered “I didn’t mind” in the constant condition, while 7 out of 12 participants answered “I didn’t mind” in the controlled condition.
Fig. 8.14 Results of GEQ for all participants
8.5.3 Discussions
VR Sickness. We divided the participants into groups according to the number of times they had experienced VR and analyzed the SSQ responses for each group. The results showed that there was no statistically significant difference between the conditions for the group with less VR experience. However, from the graph in Fig. 8.11, we infer that the SSQ scores were lower with the auxiliary figure than without it. Therefore, we assumed that the difference in SSQ scores depended on the number of VR experiences and summarized the relationship between the number of VR experiences and SSQ scores in a scatter diagram (Fig. 8.15). From Fig. 8.15, we found that the difference in SSQ scores decreased with the number of VR experiences. In other words, this suggests that the auxiliary figure has a negative impact on users with a large number of experiences, and therefore, it is appropriate to present the auxiliary figure to users with little VR experience.
Fig. 8.15 Correlation between the number of VR experiences and the difference of SSQ scores
From another point of view, regarding the difficulty of reducing VR sickness in a scene with intense motion reported for the “Virtual nose” by Whittinghill et al. [14], we infer that our proposed system was able to reduce VR sickness, because the increase in sweating was reduced even in the scene with intense motion in the latter half of our experimental VR scene.
Sense of Immersion. The results of the GEQ did not reveal an overall trend in the participants’ sense of immersion, because the scores varied widely and the differences were not statistically significant. This suggests that the superimposition of the auxiliary figure does not have a significant negative impact on the sense of immersion. On the other hand, it also indicates that there is no significant difference in the sense of immersion between the proposed controlled presentation and the constant presentation. Therefore, in the future, we consider it necessary to use or create more specific and detailed indices to evaluate the sense of immersion. More than half of the participants (7 out of 12) answered that they were not bothered by the controlled presentation of the auxiliary figure. We attribute this to the fact that the auxiliary figure appeared gradually through alpha blending in the controlled presentation condition, and thus blended into the VR image without drawing the viewer’s gaze strongly.
8.6 Conclusions
In this paper, we proposed a system that superimposes auxiliary figures on a VR scene according to the viewer’s physiological signals that respond to VR sickness, in order to reduce VR sickness without interfering with the viewer’s sense of immersion. For this purpose, we conducted a physiological signal survey experiment and an evaluation experiment of the proposed system. In the first experiment, which aimed to find the type of physiological signal that correlates strongly with VR sickness, we found that, among nasal surface temperature, fingertip BVP, and hand sweating, sweating was strongly correlated with VR sickness, and that the amount of sweating tended to increase as the degree of VR sickness became stronger. Based on this result, we developed an auxiliary figure presentation control system that controls the saliency of the auxiliary figure that reduces VR sickness by varying the alpha value of alpha blending according to the amount of sweating of the viewer’s hand while viewing a VR scene. In the evaluation experiment of the system, we found that the controlled presentation tended to reduce VR sickness, although there was no significant difference in SSQ between the controlled presentation and the constant presentation. The effect of the controlled presentation in reducing VR sickness was observed for the participants with less VR experience. In addition, the responses of all the participants indicated that the controlled presentation was less distracting than the constant presentation. Since it was stated in the previous research that there are individual differences in the degree and tendency of reduction of VR sickness, in the future of our research,
we plan to analyze the effect of the proposed system in detail from the viewpoint of differences between men and women, differences in SSQ scores at the beginning of the task, and differences in eye movements during task execution, in addition to the differences in VR experience shown in this paper. Then, based on the results of such detailed analysis, we plan to develop a learning HMD that switches, in real time, the method used to reduce VR sickness and its parameters based on the user’s time of VR experience, the time spent experiencing the same content, and the variation of various physiological reactions resulting from VR sickness.
References
1. Jerald, J.: The VR book: human-centered design for virtual reality. In: Association for Computing Machinery and Morgan & Claypool (2015)
2. Brainard, A., Gresham, C.: Prevention and treatment of motion sickness. Am. Fam. Phys. 90(1), 41–46 (2014)
3. Kariya, A., Wada, T., Tsukamoto, K.: Study on VR sickness by virtual reality snowboard. Trans. Virtual Reality Soc. Japan 11(2), 331–338 (2006)
4. Hashilus Co, Ltd.: Business description. https://hashilus.com/business/. Last accessed 2022/01/20
5. Aoyama, K., Higuchi, D., Sakurai, K., Maeda, T., Ando, H.: GVS RIDE: providing a novel experience using a head mounted display and four-pole galvanic vestibular stimulation. In: ACM SIGGRAPH 2017 Emerging Technologies (SIGGRAPH’17), Article 9, pp. 1–2. Association for Computing Machinery, New York, NY, USA (2017)
6. Sra, M., Jain, A., Maes, P.: Adding proprioceptive feedback to virtual reality experiences using galvanic vestibular stimulation. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI’19), Paper 675, pp. 1–14. Association for Computing Machinery, New York, NY, USA (2019)
7. Omata, M., Shimizu, A.: A proposal for discreet auxiliary figures for reducing VR sickness and for not obstructing FOV. In: Proceedings of the 18th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2021, Sequence number 7. Springer International Publishing (2021)
8. Bos, J.E., MacKinnon, S.N., Patterson, A.: Motion sickness symptoms in a ship motion simulator: effects of inside, outside, and no view. Aviat. Space Environ. Med. 76(12), 1111–1118 (2005)
9. Norouzi, N., Bruder, G., Welch, G.: Assessing vignetting as a means to reduce VR sickness during amplified head rotations. In: Proceedings of the 15th ACM Symposium on Applied Perception (SAP’18), Article 19, pp. 1–8. Association for Computing Machinery, New York, NY, USA (2018)
10. Duh, H.B., Parker, D.E., Furness, T.A.: An “independent visual background” reduced balance disturbance envoked by visual scene motion: implication for alleviating simulator sickness. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’01), pp. 85–89. Association for Computing Machinery, New York, NY, USA (2001)
11. Sargunam, S.P., Ragan, E.D.: Evaluating joystick control for view rotation in virtual reality with continuous turning, discrete turning, and field-of-view reduction. In: Proceedings of the 3rd International Workshop on Interactive and Spatial Computing (IWISC’18), pp. 74–79. Association for Computing Machinery, New York, NY, USA (2018)
12. Fernandes, A.S., Feiner, S.K.: Combating VR sickness through subtle dynamic field-of-view modification. In: 2016 IEEE Symposium on 3D User Interfaces (3DUI), pp. 201–210 (2016)
13. Budhiraja, P., Miller, M.R., Modi, A.K., Forsyth, D.: Rotation blurring: use of artificial blurring to reduce cybersickness in virtual reality first person shooters. arXiv:1710.02599[cs.HC] (2017)
14. Whittinghill, D.M., Ziegler, B., Moore, J., Case, T.: Nasum Virtualis: a simple technique for reducing simulator sickness. In: Proceedings of Games Developers Conference (GDC), 74 (2015)
15. Cao, Z., Jerald, J., Kopper, R.: Visually-induced motion sickness reduction via static and dynamic rest frames. In: IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 105–112 (2018)
16. Buhler, H., Misztal, S., Schild, J.: Reducing VR sickness through peripheral visual effects. In: IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 517–519 (2018)
17. Ishihara, N., Yanaka, S., Kosaka, T.: Proposal of detection device of motion sickness using nose surface temperature. In: Proceedings of the Entertainment Computing Symposium 2015, pp. 274–277. Information Processing Society of Japan (2015)
18. Narens, L.: A theory of ratio magnitude estimation. J. Math. Psychol. 40(2), 109–129 (1996)
19. Thought Technology Ltd.: ProComp infiniti system. https://thoughttechnology.com/procompinfiniti-system-w-biograph-infiniti-software-t7500m. Last accessed 2022/01/20
20. Unity Asset Store: https://assetstore.unity.com/. Last accessed 2022/01/20
21. Stevens, S.S.: On the psychophysical law. Psychol. Rev. 64(3), 153–181 (1957)
22. Kennedy, R.S., Lane, N.E., Berbaum, K.S., Lilienthal, M.G.: Simulator sickness questionnaire: an enhanced method for quantifying simulator sickness. Int. J. Aviat. Psychol. 3(3), 203–220 (1993)
23. IJsselsteijn, W.A., de Kort, Y.A.W., Poels, K.: The game experience questionnaire. Technische Universiteit Eindhoven, Eindhoven (2013)
Chapter 9
Design and Implementation of Immersive Display Interactive System Based on New Virtual Reality Development Platform
Xijie Li, Huaqun Liu, Tong Li, Huimin Yan, and Wei Song
Abstract To overcome the single form of presentation of traditional museums, reduce the constraints of space and time on ice and snow culture, and expand the influence and dissemination of ice and snow culture, we developed an immersive Winter Olympics virtual museum based on Unreal Engine 4. We used 3ds Max to build virtual venues, imported them into Unreal Engine 4 through Datasmith, and explored the impact of lighting, materials, sound, and interaction on virtual museums. This work gives users an immersive experience by increasing the realism of the space and provides a reference for the development of virtual museums.
X. Li · H. Liu (B) · H. Yan
School of New Media, Beijing Institute of Graphic Communication, Digital Art and Innovative Design (National) Experimental Demonstration Center, Beijing, China
T. Li · W. Song
School of Information Engineering, Beijing Institute of Graphic Communication, Beijing, China
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_9

9.1 Introduction
With the rapid development of society and the improvement of people’s living standards, virtual reality technology has been widely used in medical, entertainment, aerospace, education, tourism, museums and many other fields. The digitization of museums is an important trend in the development of museums in recent years, and the application of virtual reality technology in the construction of digital museums
is an important topic. The Louvre Museum is the first museum to move its collections from the exhibition hall to the Internet. The Palace Museum has also started the process of VR digitalization. The V Palace Museum display platform connects online and offline display content, breaking with convention and improving user stickiness, in the spirit of “Bringing the Forbidden City Cultural Relics Home.” The successful hosting of the 2022 Beijing-Zhangjiakou Winter Olympics has made Beijing a city that has hosted both the Summer Olympics and the Winter Olympics. However, the spread of ice and snow culture has certain limitations and is affected by many factors, such as the epidemic situation and region. Using virtual reality technology to build a virtual Winter Olympics museum can break the limitations of physical museums, extend the space of the museum, and expand the functions of the museum, and it is an effective way to meet the multi-level and multi-faceted needs of the public. This is one of the directions for the future development of digital museums and has broad prospects for development.
9.2 System Methodology
9.2.1 Demand Analysis
In the context of informatization and modernization, exhibition halls such as museums, as important media for the accumulation and dissemination of modern culture, have a relatively important position in education, research, and social development. At the same time, since the successful bid for the 2022 Beijing-Zhangjiakou Winter Olympics, the construction and publicity of ice and snow Nagano have been strengthened throughout the country. The traditional way of visiting cannot meet the needs of modern audiences, and the virtual museum is particularly important given the constraints of region and the epidemic. From the perspective of user experience, this paper presents the display information to users in a multi-sensory, multi-layered, and three-dimensional way, so that users feel as if they are inside the virtual museum. The virtual museum can break the limitations of traditional memorial halls in terms of time, space, region, and visiting form, so as to promote Beijing, the Winter Olympics, and the Olympic spirit.
9.2.2 Analysis of Our Developing Platform
In recent years, Unreal Engine 4 has been widely used with the rise of virtual reality technology. Unreal Engine 4 has advantages over other engines: it offers convenient operating conventions and realistic light and shadow, as well as flexible and free interface design and a minimalist implementation of interaction design [1].
UE4 brings a new way of developing programs. Its visual Blueprint scripting makes development more convenient by presenting code visually, provides more possibilities for implementing functions, and makes Blueprints easier to edit. Blueprint scripts are very easy to read, which not only improves development efficiency but also lets developers follow the running process more intuitively through the connected nodes, making it convenient to solve the problems that arise [2].
9.2.3 Development Processing
The project used the 3ds Max modeling tool to build virtual scenes and optimize geometry. Adobe Photoshop was used to texture the models, and Adobe After Effects and Adobe Audition were used to process video and audio materials. We exported the 3D model to the Datasmith format and imported it into Unreal Engine for project scene construction, reprocessed the materials and textures of some models, and completed the key steps, such as creating the model materials and lighting and preparing the Blueprint interaction events, to finish the production of the project (Fig. 9.1).
Fig. 9.1 System flowchart
X. Li et al.
9.3 Analysis of Key Technologies
9.3.1 Model Construction and Import
The quality of the models has a great impact on the final system, because the models are the carriers of its functions. First, we drew a CAD base map to fix the positioning of the models. Then, we built the scene model in 3ds Max from the drawn plan, paying special attention to units, axis orientation and model scale. While modeling, we removed redundant faces, which improves texture utilization, reduces the face count of the whole scene and speeds up the interactive scene [3], as shown in the example (Fig. 9.2). After the scene was modeled, we used a file-based workflow to bring the design into Unreal. Datasmith is a collection of tools and plugins that bring entire pre-constructed scenes and complex assets created in a variety of industry-standard design applications into Unreal Engine, so the design data can be transferred quickly and easily. First, we installed the Datasmith plugin in 3ds Max and used it to export files with the .udatasmith extension. Then, we used the Datasmith Importer to bring the exported file into the current Unreal Engine project (Fig. 9.3). The Datasmith workflow restores the scene one-to-one: it creates a single Unreal asset for all instanced objects, preserves the original positions and orientations, keeps the layers viewable and converts the texture maps automatically, which narrows the gap between the design and the final product as experienced.
9.3.2 Real Shading in Unreal Engine 4
The first goal in virtual reality is to imitate what the eyes see, in order to achieve a realistic sense of space and immersion. Unreal Engine’s rendering system is key to its image quality and immersive experience. Real-time rendering is an important part of computer graphics research [4]. Its purpose here is to give users an immersive feeling by producing, from the shapes, materials and light source distribution of the real scene, visual effects that are almost indistinguishable from that scene. Visual experience is dominant for people in a virtual environment, so the way the models are presented greatly affects the user experience in this project. After the scene has been constructed, the models and rendering therefore need further refinement.
Fig. 9.2 CAD floor plan and scene building in 3ds Max
Illumination
In Unreal Engine 4, a few key properties have the greatest impact on lighting in the world. Simulating the way light behaves in 3D worlds is handled in one of two ways: with real-time lighting methods that support moving lights and the interaction of dynamic lights, or with precomputed (baked) lighting information that is stored in textures applied to geometric surfaces [5]. Unreal Engine provides both ways of lighting scenes. They are not mutually exclusive and can be blended seamlessly.
Fig. 9.3 Datasmith workflow
Physically Based Rendering (PBR) refers to rendering with a shading/lighting model based on physical principles and microfacet theory, using surface parameters measured from reality to represent real-world materials accurately. PBR shading is mainly divided into two parts: a diffuse BRDF and a microfacet specular BRDF. The BRDF (bidirectional reflectance distribution function) describes the reflectance of a surface for a given combination of incoming and outgoing light directions. In other words, it determines how much light is reflected in a given direction when a certain amount of light is incident from another direction, depending on the properties of the surface. Note that the BRDF does not distinguish between direct and indirect incoming light, so it can be used to calculate the contribution of both virtual lights placed in the scene (local illumination) and indirect light reflected one or more times from other surfaces (global illumination). This also means that the BRDF is independent of the implementation of lights, which can be developed and authored separately (the BRDF only needs to know the direction of the incident light and its intensity at the shaded point) [6].
Lambert model BRDF, where c_diff is the diffuse albedo of the material:
\[ f(\mathbf{l}, \mathbf{v}) = \frac{c_{\mathrm{diff}}}{\pi} \tag{9.1} \]
This BRDF value states that the intensity of the reflected light is proportional to the intensity of the incident light, regardless of the angle of reflection. So, no matter what angle the material is viewed from, the final light intensity reflected into the camera is the same.
Cook-Torrance model BRDF:
\[ f(\mathbf{l}, \mathbf{v}) = \frac{D(\mathbf{h})\, F(\mathbf{v}, \mathbf{h})\, G(\mathbf{l}, \mathbf{v})}{4\,(\mathbf{n} \cdot \mathbf{l})\,(\mathbf{n} \cdot \mathbf{v})} \tag{9.2} \]
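As an illustration of Eqs. (9.1) and (9.2), the following Python sketch evaluates both BRDFs for one light direction and one view direction. Equation (9.2) does not fix particular D, F and G terms, so the sketch assumes the GGX (Trowbridge-Reitz) distribution, Schlick's Fresnel approximation and a Schlick-style Smith geometry term, which are choices of the kind discussed in [7]. The function names, the roughness remapping and the sample inputs are illustrative assumptions, not the project's actual shader code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Assumes v is not the zero vector (l and v must not be exactly opposite).
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def lambert_brdf(c_diff):
    """Eq. (9.1): constant diffuse BRDF, independent of l and v."""
    return c_diff / math.pi

def d_ggx(n_dot_h, roughness):
    """Assumed D term: GGX normal distribution with alpha = roughness^2."""
    alpha2 = (roughness * roughness) ** 2
    denom = n_dot_h * n_dot_h * (alpha2 - 1.0) + 1.0
    return alpha2 / (math.pi * denom * denom)

def f_schlick(v_dot_h, f0):
    """Assumed F term: Schlick's approximation, f0 = reflectance at normal incidence."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def g_schlick_smith(n_dot_l, n_dot_v, roughness):
    """Assumed G term: Smith form with Schlick's approximation, k = (roughness + 1)^2 / 8."""
    k = (roughness + 1.0) ** 2 / 8.0

    def g1(x):
        return x / (x * (1.0 - k) + k)

    return g1(n_dot_l) * g1(n_dot_v)

def cook_torrance_brdf(n, l, v, roughness, f0):
    """Eq. (9.2): D * F * G / (4 (n.l)(n.v)), with h the half vector of l and v."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize(tuple(a + b for a, b in zip(l, v)))
    # Clamp the cosines to avoid dividing by zero at grazing angles.
    n_dot_l = max(dot(n, l), 1e-4)
    n_dot_v = max(dot(n, v), 1e-4)
    n_dot_h = max(dot(n, h), 0.0)
    v_dot_h = max(dot(v, h), 0.0)
    d = d_ggx(n_dot_h, roughness)
    f = f_schlick(v_dot_h, f0)
    g = g_schlick_smith(n_dot_l, n_dot_v, roughness)
    return d * f * g / (4.0 * n_dot_l * n_dot_v)

if __name__ == "__main__":
    normal = (0.0, 0.0, 1.0)
    light = (0.3, 0.0, 1.0)
    view = (-0.3, 0.0, 1.0)
    print("diffuse :", lambert_brdf(0.8))
    print("specular:", cook_torrance_brdf(normal, light, view, roughness=0.5, f0=0.04))
```

The diffuse value depends only on the albedo, while the specular value changes with the light and view directions and with roughness, which matches the qualitative description of the two models.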
• D: normal distribution function (NDF)
• F: Fresnel equation
• G: geometry function
• n is the surface normal of the material, and h is the half vector, the unit vector that bisects the light direction l and the view direction v.
The farther a light source is from the target object, the weaker its lighting effect on the object. Improving light falloff in the lighting model was relatively straightforward: a physically accurate inverse square falloff was adopted, together with the photometric brightness unit of lumens. An inverse square function never reaches zero, however, and for efficiency the influence of each light must be limited to a finite radius. The inverse square function is therefore windowed in such a way that the majority of the light’s influence remains relatively unaffected, while still providing a soft transition to zero. This has the nice property that modifying a light’s radius does not change its effective brightness, which can be important when lighting has been locked artistically but a light’s extent still needs to be adjusted for performance reasons [7].
\[ \mathrm{falloff} = \frac{\operatorname{saturate}\left(1 - (\mathrm{distance}/\mathrm{lightRadius})^{4}\right)^{2}}{\mathrm{distance}^{2} + 1} \tag{9.3} \]
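The windowed falloff of Eq. (9.3) can be sketched in a few lines of Python. This is a minimal illustration under the assumption that distance and lightRadius use the same length unit; the function names and sample values are hypothetical and are not taken from the project.

```python
def saturate(x):
    """Clamp x to the range [0, 1]."""
    return min(max(x, 0.0), 1.0)

def light_falloff(distance, light_radius):
    """Eq. (9.3): inverse square falloff with a smooth window that reaches zero at light_radius."""
    window = saturate(1.0 - (distance / light_radius) ** 4) ** 2
    return window / (distance * distance + 1.0)

if __name__ == "__main__":
    for radius in (10.0, 100.0):
        samples = [round(light_falloff(d, radius), 4) for d in (0.0, 1.0, 5.0, 10.0)]
        print(radius, samples)
```

Close to the light the two radii give almost the same values, and the falloff is exactly zero at and beyond the radius, which is the property described in the text.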
The 1 in the denominator prevents the function from exploding at distances close to the light source. It can be exposed as an artist-controllable parameter for cases where physical correctness is not desired. The quality difference this simple change makes, particularly in scenes with many local light sources, is large for its cost, so it is worth exploring in depth [8] (Fig. 9.4).
Materials
Materials control the appearance of surfaces in the world using shaders. A material is an asset that can be applied to a mesh to control the visual look of the scene. In more technical terms, when light from the scene hits a surface, the material is used to calculate how that light interacts with the surface. These calculations use data that is input to the material from a variety of images (textures) and math expressions, as well as from property settings inherent to the material itself. The base material model includes base color, metallic, specular and roughness. To convey snow sports, the project uses snow scenes, with Blueprints generating snow on the ground and on objects, as shown in Fig. 9.5. The snow materials use subsurface scattering, the lighting phenomenon in which light scatters as it passes through a translucent or semi-translucent surface [9]. Specifically, the snow material uses a subsurface profile: for realistic rendering, Unreal Engine 4 offers a shading method called Subsurface Profile. While the Subsurface Profile shading model has properties similar to those of the Subsurface shading model, its key difference is in how it renders (Fig. 9.6).
Fig. 9.4 Lighting effect map
9.3.3 Making Interactive Experiences
In addition to real shading, Unreal Engine supports a variety of interaction methods. Most current virtual reality products implement interaction through controller buttons, because accurate gesture recognition and eye-movement recognition are not yet fully mature. Based on the demand analysis of the virtual museum, the main interaction is roaming: users enter the Winter Olympics virtual museum and walk through it [10]. To give users more viewpoints and a better understanding of the museum, two roaming modes with different angles and forms are provided: first-person roaming and third-person roaming. In Unreal Engine 4, both modes are controlled by character Blueprints. We create a new Pawn Blueprint class, add it to the scene, define camera switch events and add the regular game-view controls to the pawnCamera to switch between first-person and third-person perspectives (Fig. 9.7).
Box trigger interaction: Blueprints turn videos on and off by adding box triggers and defining OnActorBeginOverlap and OnActorEndOverlap events. When a character touches a box trigger, the video starts playing. When the character leaves the range of the box trigger, playback stops. This design strengthens the user’s sense of presence and realism. The results are shown in Fig. 9.8.
In a virtual museum, sound can be layered onto the virtual scene in real time, producing a mixed effect of sight and hearing that compensates for the limitations of vision alone. When the virtual scene changes, the
Fig. 9.5 Material of the snow on the ground
sound the user hears changes accordingly, which makes the experience immersive. Audio volumes are used to add reverb effects to the virtual museum, which increases spatial realism, and adjusting the dry/wet mix of the level sound gives a more realistic sense of distance in space. For added realism, sound sources usually move rather than remaining static [11] (Fig. 9.9).
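The box trigger interaction described earlier in this section was built with Blueprints. As an engine-independent illustration of the same begin-overlap and end-overlap logic, the Python sketch below starts a video when a character position enters an axis-aligned box and stops it when the position leaves. All class names, the box size and the walking path are hypothetical, and the code does not use the Unreal Engine API.

```python
from dataclasses import dataclass

@dataclass
class BoxTrigger:
    center: tuple        # (x, y, z) of the box centre
    half_extent: tuple   # half of the box size along each axis

    def contains(self, point):
        return all(abs(p - c) <= h
                   for p, c, h in zip(point, self.center, self.half_extent))

class VideoPanel:
    """Stand-in for a media player; it only records play and stop calls."""
    def __init__(self):
        self.playing = False
        self.log = []

    def play(self):
        self.playing = True
        self.log.append("play")

    def stop(self):
        self.playing = False
        self.log.append("stop")

class TriggeredVideo:
    """Fires play on entering the box and stop on leaving it."""
    def __init__(self, trigger, video):
        self.trigger = trigger
        self.video = video
        self._inside = False

    def update(self, character_position):
        inside_now = self.trigger.contains(character_position)
        if inside_now and not self._inside:
            self.video.play()    # corresponds to a begin-overlap event
        elif not inside_now and self._inside:
            self.video.stop()    # corresponds to an end-overlap event
        self._inside = inside_now

def simulate(path):
    """Walk a character along path and return the recorded video events."""
    video = VideoPanel()
    link = TriggeredVideo(BoxTrigger((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)), video)
    for position in path:
        link.update(position)
    return video.log

if __name__ == "__main__":
    walk = [(-3.0, 0.0, 0.0), (-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(simulate(walk))
```

Tracking the previous inside/outside state means each event fires once per crossing instead of on every frame the character spends inside the box.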
Fig. 9.6 Comparison chart using a subsurface profile
Fig. 9.7 Camera switch blueprint
Fig. 9.8 Box touch renderings
Fig. 9.9 Cone audio attenuation
9.4 Conclusion
Virtual museums have changed traditional concepts and broken the constraints of time and space. A virtual museum increases the enthusiasm of visitors and enriches the museum’s forms of display. Diverse interaction design will be the future direction of immersive virtual museums [12]. The Winter Olympics virtual museum integrates technology and sports, so that more people can learn about Beijing and the 2022 Winter Olympics through new forms. While disseminating Chinese culture, it also enhances China’s soft power and international influence. In the future, as science and technology continue to mature, we will develop virtual museums of a higher standard and greater significance.
Acknowledgements This work was supported by grants from the “Undergraduate teaching Reform and Innovation” Project of Beijing Higher Education (GM 2109022005); the Beijing College Students’ innovation and entrepreneurship training program (Item No: 22150222040, 22150222044); the Key project of Ideological and Political course Teaching reform of Beijing Institute of Graphics Communication (Item No: 22150222063); and the Scientific research plan of Beijing Municipal Education Commission (Item No: 20190222014).
References
1. Kersten, T.P.: The Imperial Cathedral in Königslutter (Germany) as an immersive experience in virtual reality with integrated 360° panoramic photography. Appl. Sci. 10 (2020)
2. Libing, H.: Immersive somatosensory interaction system based on VR technology: design and realization of “going to Nanyang” story. Software 42(12), 7 (2021)
3. Santos, B., Rodrigues, N., Costa, P., et al.: Integration of CAD models into game engines. In: VISIGRAPP (1: GRAPP), pp. 153–160 (2021)
4. Burley, B., Studios, W.D.A.: Physically-based shading at Disney. In: ACM SIGGRAPH, pp. 1–7 (2012)
5. Wojciechowski, K., Czajkowski, T., Artur, B.K., et al.: Modeling and rendering of volumetric clouds in real-time with unreal engine 4. Springer, Cham (2018)
6. Boksansky, J.: Crash course in BRDF implementation (2021)
7. Karis, B., Games, E.: Real shading in unreal engine 4. Proc. Phys. Shading Theory Pract. 4(3), 1 (2013)
8. Natephra, W., Motamedi, A., Fukuda, T., et al.: Integrating building information modeling and virtual reality development engines for building indoor lighting design. Visualization Eng. 5(1), 1–21 (2017)
9. Yan, H., Liu, H., Lu, Y., et al.: “Dawn of south lake”: design and implementation of immersive interactive system based on virtual reality technology (2021)
10. Jianhua, L.: Interactive design of virtual dinosaur museum based on VR technology. Comput. Knowl. Technol. 16(13), 257–259 (2020). https://doi.org/10.14004/j.cnki.ckt.2020.1698
11. Group R.: Shadow Volumes in Unreal Engine 4 (2017)
12. Wang, S., Liu, H., Zhang, X.: Design and implementation of realistic rendering and immersive experience system based on unreal engine4. In: AIVR2020: 2020 4th International Conference on Artificial Intelligence and Virtual Reality (2020)
Chapter 10
360-Degree Virtual Reality Videos in EFL Teaching: Student Experiences
Hui-Wen Huang, Kai Huang, Huilin Liu, and Daniel G. Dusza
Abstract In technology-enhanced language learning environments, virtual reality (VR) has become an effective tool to support contextualized learning and promote immersive experiences with learning materials. This qualitative case study explored students’ attitudes towards and perceptions of VR language learning. Twenty-eight Chinese sophomores took part in this VR project in a travel English unit. During the 6-week project, they wore VR headsets to watch 360 VR videos of famous tourist attractions in four countries. Data were collected from final reflections and interviews. The final reflections indicated that students gave positive feedback on this new method of teaching English. In addition, the interview data present the advantages and disadvantages of implementing immersive VR learning in EFL classrooms.
H.-W. Huang (corresponding author), Shaoguan University, Guangdong, China
K. Huang · H. Liu, Fujian University of Technology, Fujian, China
D. G. Dusza, Hosei University, Tokyo, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_10
10.1 Introduction
Integrating technology into language classrooms has drawn numerous scholars’ attention [1–3]. It can support teachers’ teaching efficacy, promote student learning engagement, and supplement traditional textbooks [4, 5]. Among the emerging technologies applied in education, virtual reality (VR) is one of the most valuable learning
tools because of its immersive experience [6, 7]. VR immersion leads to engagement or flow, which helps students connect with the learning content, produces deep positive emotional value and consequently enhances learning outcomes [8–10]. The nature of VR immersion enhances students’ active learning with embodiment [11], long-term retention [12], and enjoyment [13]. Students experience contextualized learning when they actively engage with 360 VR video content. This learning experience differs from traditional classrooms with one-way or didactic lectures that focus on shared content, in which students tend to learn passively [12].
VR technology has been employed in a variety of educational contexts. Studies that gave students immersive engagement while learning a foreign language [6, 14], plant cells [13] and English public speaking [9] demonstrate that VR technologies facilitate promising learning results. Previous scholars indicated that applying VR in education can increase students’ authentic experience in contextualized learning, which consequently empowers students to develop autonomy [4, 11, 15].
The impetus of this study is to provide Chinese EFL learners with an innovative teaching method that promotes learner motivation in English learning. Although previous studies have recorded positive feedback on VR integration in foreign language classrooms [6, 9, 14], research exploring EFL learners’ experiences after watching 360 VR videos in tourism English learning is scarce. Therefore, this study used free online 360 VR videos, which students watched through VR headsets to experience foreign tourist attractions. Specifically, whether Chinese EFL learners prefer this innovative learning tool for experiencing immersive and authentic foreign contexts to learn English has not yet been investigated.
Hence, the purpose of this case study was to apply the VR tool in an English course, focusing on travel English units.
10.2 Literature Review
10.2.1 Different VR Types
Parmaxi [2] classified VR simulations into three types. First, non-immersive VR allows users to go through a 3D simulation on a desktop computer system, such as Second Life (https://secondlife.com). Second, semi-immersive VR has a gesture recognition system that tracks users’ body movements and thereby enables human–computer interaction, like Microsoft Kinect. Finally, fully immersive VR applies a “head-mounted system where users’ vision is fully enveloped, creating a sense of full immersion” in simulation (p. 5). The main difference between semi-immersive and fully immersive VR is the degree to which the user is immersed in the experience. Kaplan-Rakowski and Gruber [16] indicated that high-immersion VR offers a greater sense of presence
and authenticity than low-immersion VR because users wear a head-mounted device (HMD). In this paper, we focus on high-immersion VR with HMDs. We first outline the learning theory behind VR learning and give a brief overview of integrating VR in education. Next, we present the research method and the results collected from EFL learners in China. Finally, we conclude with suggestions for future work.
10.2.2 VR in Higher Education
Teaching methods in twenty-first century classrooms face the shift from the Information Age to the Experience Age [17]. Young learners live in an experience age full of multimedia and digital technologies, and they prefer technology-mediated learning environments for obtaining and sharing new knowledge. VR as a learning tool suits their learning styles better for constructing new knowledge [13], and it guides them to experience more engaging and exciting learning materials [4, 15]. The integration of VR has become popular in higher education over the last two decades [4, 10, 18, 19]. These studies indicate that students in VR learning contexts show dynamic engagement and participation in classroom discussion and grasp abstract concepts easily because they are more receptive to real-life contexts. For example, Dalgarno and Lee [4] reviewed the affordances of 3D virtual learning environments from the 1990s to the 2000s for enriching student motivation and engagement. The 3D virtual environments create “greater opportunities for experiential learning, increased motivation/engagement, improved contextualization of learning, and richer/more effective collaborative learning” [4]. For students, learning moments with VR technologies are new and unique in educational settings. Their curiosity and motivation can increase while they are involved in immersive VR learning contexts. This active participation can be an opportunity for igniting students’ interest and involvement in learning and maximizing the potential of VR learning experiences [13]. These unique VR learning benefits make subject contents come alive because students feel a strong sense of presence in a multi-sensory setting that creates immersive and engaging learning [11].
For example, Allcoat and von Mühlenen [13] found that VR provided increased positive emotions and engagement compared with textbook and video groups learning about plant cells. Furthermore, Parmaxi [2] indicated that the VR contextual learning helped students who particularly struggle to stay focused on learning materials in English learning. These positive results showed that VR contexts give a better sense of being present in a place for connecting learning contents.
10.2.3 VR Embedded Language Learning
Integrating VR technologies in language courses allows students to immerse themselves deeply in learning materials [2, 14]. From the perspective of Vygotsky’s sociocultural theory [20], learning occurs when individuals interact meaningfully with their contexts. The 360 VR videos, coupled with appropriate learning design, provide students with learner-object interactions that enhance learning quality in a sociocultural context. Previous studies have revealed that students gave positive feedback after experiencing VR language learning. Students indicated that they found VR language learning more enjoyable than conventional English learning [21]. Other scholars reported that their participants had increased motivation and experienced high levels of engagement and excitement when learning a foreign language in immersive VR environments [6, 14]. Alongside these benefits, VR learning faces some challenges. First, students feel dizzy if they wear VR HMDs for more than 3 min. Berti et al. [15] suggest wearing VR HMDs for no more than 2 min. Second, the quality of 360 VR videos is an issue when implementing VR in education. Although previous studies indicated that students are more engaged in VR learning, there is a need to explore how the use of VR can be maximized in English learning. This case study contributes to the literature by exploring students’ learning experiences and attitudes in a travel English unit in an EFL classroom in China. To explore EFL learners’ attitudes and learning experiences towards the use of 360 VR videos in an English course, the authors addressed the research questions below.
R.Q. 1: What were the overall perceptions of the students’ VR language learning experience?
R.Q. 2: What were students’ suggestions after experiencing the VR project?
R.Q. 3: What advantages and disadvantages of the VR learning project did the students express in the interviews?
10.3 Research Methods
This study explored students’ perceptions of experiencing immersive VR environments by virtually “visiting” the main tourism cities of four different countries. To experience a fully immersive VR environment, the students wore HMDs to watch 360 videos. They were “teleported” into a 3D view of different pre-recorded authentic contexts and viewed the virtual locales in 360 degrees by moving their heads. This study used a qualitative research approach, collecting data from students’ final reflections and interview transcripts to explore EFL learners’ perceptions of 360 VR video for English learning. There was no control group in this study, as the class teacher was expected to investigate students’ views of this pilot study. The rationale of such a design was that
the university hoped to understand students’ learning experience in this innovative teaching method.
10.3.1 Participants
The participants were English major sophomores (n = 28; 24 females and 4 males) enrolled in a four-year public university in southern China, all aged between 20 and 21 years old. The participants were randomly divided into six groups of 4–5 students. This study utilized a convenience sample because all participants were enrolled in a cohort programme. They were native Chinese speakers, had studied English as a foreign language for at least six years in the Chinese school education system, and were assessed to be at an intermediate English proficiency level based on the Chinese National College Entrance Examination (equivalent to the B1 level of the Common European Framework of Reference for Languages (CEFR)) [22]. None of the students had previous experience with VR learning. All the participants volunteered to take part in this study and had the right to withdraw at any time.
10.3.2 Instruments
The data collected included final reflections written by 28 students and focus-group interviews with six participants. Students’ final reflections were collected to explore their views about the VR learning experience and their suggestions regarding how VR can be used effectively in future learning. Six volunteers (five females and one male) attended the focus-group interview, which was held to understand their thoughts about the advantages and challenges of using VR for language learning.
10.3.3 Procedures
The classroom teacher (the first author) announced the goals of the VR learning programme to all participants before conducting the research. All participants in this case study had similar opportunities to use the VR HMDs and participate in the learning tasks. In particular, this study used advanced VR HMDs that are suitable for myopia of less than 600 degrees. This is important because the majority of Chinese students wear glasses and the VR HMDs offer the best VR experience under these conditions. The research lasted for six weeks, and students had 100 min of class time each week. The theme of the VR project was “Travel English”. The class teacher selected four countries for immersive VR learning: Turkey, Spain, New Zealand, and Colombia.
Fig. 10.1 Screenshots of 360 VR videos from Spain, Colombia, New Zealand, and Turkey
The course design adopted the structure of the flipped learning approach. Before each class, all students watched a 2D introductory video to learn basic concepts related to the country. In the classroom, they held group discussions and answered the teacher’s questions related to the 2D video. Each student then wore a VR HMD to experience a 360 VR video of the country scheduled for that week and answered embedded questions (see Fig. 10.1). Afterwards, students practised English conversation by sharing what they saw in the 360 video with group members. Students’ final reflections were collected to explore their views of the VR learning experience and analyse their suggestions regarding how VR can be used more effectively in future teaching. Afterwards, six volunteers participated in semi-structured interviews. The research assistants (RAs) interviewed the volunteers using the interview questions written by the first author. The interviews were conducted in the students’ native language, Chinese, so that the students could express their views without being constrained by second language limitations. To make students feel comfortable while answering questions, the first author did not participate in the interview process, and the RAs opened with a welcome talk to put the interviewees at ease. Finally, the RAs conducted the subsequent data analysis.
10.3.4 Data Analysis
The RAs conducted a content analysis of the students’ final reflections and categorized them into different themes to answer R.Q. 1 and R.Q. 2. The content analysis steps originated from Braun and Clarke [23]. We conducted six steps to categorize the students’ final reflections: (a) familiarizing yourself with your data, (b) generating
initial codes, (c) searching for themes, (d) reviewing themes, (e) defining and naming themes, and (f) producing the report (p. 87). Steps a, b, c, and e were performed by the RAs. To improve accuracy, we reviewed themes (step d) and produced the report (step f) together with the RAs, and we reviewed the final report once more at the end. For the focus-group interviews, all interview data were collected as audio recordings and transcribed into text for corpus analysis. The accuracy of the transcriptions was verified against the audio files, and the transcripts were then analysed as a Chinese corpus. Corpus analysis was conducted using WEICIYUN (http://www.weiciyun.com), an online tool that supports basic Chinese corpus analysis and generates visualizations of word occurrences based on the input text. Visualizing word occurrence frequencies enabled us to identify key information in the interview data.
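The word-occurrence counts that underlie such a visualization can be illustrated with a short Python sketch. It is not the WEICIYUN tool and does not reproduce the study’s data: it assumes the transcript has already been segmented into tokens, and the token list, the stopword set and the function name are hypothetical.

```python
from collections import Counter

def word_frequencies(tokens, stopwords=frozenset(), top_n=10):
    """Count token occurrences, skipping stopwords, and return the top_n most frequent."""
    counts = Counter(t for t in tokens if t and t not in stopwords)
    return counts.most_common(top_n)

if __name__ == "__main__":
    # Made-up tokens standing in for a segmented interview transcript.
    sample = ["immersive", "dizzy", "immersive", "the",
              "realistic", "immersive", "dizzy"]
    print(word_frequencies(sample, stopwords={"the"}, top_n=2))
```

A word cloud then scales each word by its count, so the most frequent terms in the interviews stand out visually.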
10.4 Results
The results are presented in the order of the research questions.
R.Q. 1: What were the overall perceptions of the students’ VR language learning experience?
The final reflections were used to explore student perceptions of VR language learning. Their responses to this question were grouped into two categories: real-like contexts and immersive learning with technology. Regarding the category of real-like contexts, the data indicated that the students felt that wearing VR HMDs enhanced the realism of the virtual scenes and that this experience helped learning retention. In general, students’ experience of VR language learning was positive and full of novelty. Over 85% (i.e. 24 out of 28) of the students stated that they enjoyed this new learning method of experiencing foreign tourist attractions through 360 VR videos. Some examples are detailed below.
S5: … Dr. Huang used virtual reality to immerse me in the beautiful scenery of other countries, watching and appreciating the beautiful scenery while learning the knowledge of various countries. The knowledge I learned has been applied to my composition.
S16: … VR teaching gave us an interesting and realistic learning experience. We learned about other countries in a more realistic atmosphere, and this gave us a deeper memory.
S23: … VR learning allows me to experience more engagement in the sense of virtually being there, which broke my traditional English learning mindset. This unique learning opportunity is not offered in other courses.
The second category was immersive learning with technology. The data showed that the students enjoyed the immersive VR language learning, which helped them realize how technology supports English learning. Moreover, the students stated that they felt immersed in the virtual “real-world” environments after putting on the VR HMDs because they could “fly” to other countries without paying travel expenses. Some responses are presented as follows:
S9: I immersed myself in learning about some foreign cultures rather than the presentation of photos. The application of modern technology in the field of education is of great benefit.
S17: VR learning, combining fun and high-tech, gave me a new definition of English learning, which not only improved my English learning motivation but also enabled me to have a sense of immersion while watching 360 videos.
S26: Technology has brought us new ways and opportunities to learn. We can have virtual field trips to learn new knowledge and see tourism spots in other countries. This is not possible in traditional English classrooms.
In summary, students’ final reflections indicated that VR technologies provide learners with engaging learning opportunities and reform language learning experiences in EFL classrooms. Students had VR tours in different countries, which is an improvement on the traditional textbook or the 2D pictures experienced in other English learning settings. Additionally, the VR tours allowed students to immerse themselves in foreign tourism attractions, inspiring them to have deeper connection with the learning materials.
R.Q. 2: What were students’ suggestions after experiencing the VR project?
The suggestions from all 28 students about the VR learning project were used to answer research question two. The RAs grouped the responses by similarity into three categories (see Fig. 10.2): (1) VR equipment and technology, (2) VR content, and (3) others. In the first category, VR equipment and technology, three students commented on the VR device itself. Their suggestions were to (1) use more advanced VR display devices and (2) wear headphones to experience panoramic sound.
Fig. 10.2 Overview of the classified suggestions
10 360-Degree Virtual Reality Videos in EFL Teaching: Student Experiences
Table 10.1 Selected students’ suggestions about the VR learning project
A. VR content
• I hope next time VR experience can introduce other countries’ traditional clothing
• In the next VR immersive language learning class, we can enjoy different conditions. For example, we can drive a car and enjoy the view of the city
• I hope the next VR experience can go into the streets to feel the local atmosphere
• It can include some interactive games or questions in VR
B. VR equipment and technology
• My suggestion is to remind the whole class to wear headphones. Wearing headphones allows us to have a more immersive experience
• It is better to use more advanced VR equipment to improve immersion
C. Others
• I wish I could have buffered time in the video. It is easy to miss the beginning part because I needed to set up the device with my smartphone
• I hope to increase the length of the video and introduce the history and culture of the country in more aspects
• I hope the teacher can provide more time for us
The second category concerns the VR content. Eighteen students (64%) made suggestions about the viewing content, which could be sorted into: (1) enriching video types and content, especially more lifelike videos such as food, clothing, and street culture; (2) improving the clarity of content; and (3) increasing interaction. The third category comprises other suggestions that did not fit the previous two categories. These suggestions included (1) extending the use time and (2) providing buffer time at the beginning to set up the device. Table 10.1 presents the students’ suggestions categorized by the RAs.
R.Q. 3: What advantages and disadvantages of the VR learning project did the students express in the interviews?
The volunteers were divided into two groups for the interviews. They expressed positive attitudes towards the VR learning project. They indicated that VR learning has many advantages, such as allowing students to focus more on the content, developing the ability to learn actively, and exploring knowledge by themselves. However, some students also described disadvantages of VR learning. For example, the equipment had various problems and was not easy to control. Additionally, using VR reduced teacher-student communication, and wearing the HMD caused dizziness. The interviewees’ responses were transcribed into Chinese and then visualized through a word cloud to present their perceptions of the advantages and disadvantages of the VR learning project (see Fig. 10.3). The larger words in the word cloud indicate more frequent use. The illustration also includes translations into English.
Fig. 10.3 Word cloud results of student interview responses: (a) advantages; (b) disadvantages
10.4.1 Results of Two Interview Questions
Q: What benefits does VR learning have as a whole?
All six respondents agreed that VR learning has many benefits and can help students learn more. Students believed that VR learning can reduce the impact of the COVID-19 pandemic, provide an immersive learning environment, and enhance learning efficiency (see Fig. 10.3a). Below are translated example responses to this interview question from three students.
Because the pandemic has brought us a lot of inconvenience in classroom learning, VR learning project is not limited by time and space, which improves our learning efficiency. (Student D, Group 2)
The 360 VR videos can help me engage in an immersive environment, which is beneficial for me to apply my imagination in this kind of digitalized learning materials. (Student B, Group 1)
VR allows us to participate in class activities with interesting scenarios or dialogues, which can raise our interest, make us more motivated, and learn new knowledge quickly. (Student A, Group 1)
Q: Were there any disadvantages in VR learning? If yes, what are they?
Some students described several disadvantages of VR learning and hoped they would be addressed in the future. Students mentioned that it was difficult to experience high-quality VR content and that blurred VR content led to lower interaction. The VR teaching mode reduced interaction between teachers and students. Some students also experienced VR vertigo symptoms (see Fig. 10.3b). Below are two example responses to this interview question.
In addition to the technical issue of VR equipment, the main thing is probably that VR is not very popular now and VR resources may be relatively scarce. After students put on the VR headsets, the teacher cannot monitor the students’ viewing, which may lead to absent-minded learning. (Student E, Group 2)
I felt dizzy and the recovery time may vary depending on each individual’s physical condition. (Student C, Group 1)
10.5 Discussion
This case study explored student experiences of immersive VR learning in an EFL course in China. Several major findings emerged. First, students experienced immersive learning in an almost lifelike context. When students wore VR HMDs, they immediately became involved and immersed in the virtual surroundings they were visiting. They paid full attention to the 360 video content to learn new knowledge and experienced the feeling of being present in the scene before their eyes. These findings are consistent with previous studies [2, 6, 14], indicating that VR learning is a highly immersive and engaging learning experience. Second, the suggestions students wrote in their final reflections centered on two main themes: VR equipment and VR content. The majority of the students mentioned that they hoped to visit local street views in different countries and to see the world. The results are similar to the conclusions of Berti et al. [15], who reported that providing students with VR HMDs can teleport them to virtually visit any place in the world. Students can “visit” a famous travel spot that they have longed for and interact with the virtual context by walking around there, which replicates a field trip. These affordances help students experience a new learning method that goes beyond conventional role plays and physical field trips. Additionally, teachers can search for 360 VR videos uploaded to the Internet and download them as class teaching materials. Afterwards, teachers can take students on virtual field trips with VR HMDs and headphones. Depending on the teaching objectives, the 360 VR videos can show a museum, a famous foreign tourist attraction, or even outer space. Finally, students observed the advantages of VR language learning. They noticed that the immersive VR learning experience helps their imagination to build new knowledge.
Using VR in EFL classrooms can enhance twenty-first century learners’ engagement and curiosity beyond classroom walls. The word cloud findings are in line with previous studies stating that VR technologies can be creative and powerful vehicles to stimulate student imagination and increase engagement [4, 6]. When VR becomes an educational tool in schools, students experience a feeling of “being there” in a virtual environment, rather than merely perusing pictures or reading materials. Owing to this sense of virtual physical immersion, English learners benefit from deeper engagement in language learning tasks. As for the disadvantages of VR learning, some students felt dizzy while watching 360 videos. This could be due to individual differences, because all the videos were kept within two minutes, as suggested by Berti et al. [15].
10.6 Conclusions
Virtual reality has become a popular theme in educational contexts, and the low cost of implementing 360 VR videos in EFL classrooms makes it a more attractive learning tool. Although there were some technical issues related to video quality and motion sickness, most students expressed excitement about and engagement in immersive VR learning in an EFL course. Additionally, the responses supported previous research showing that it is difficult for teachers to monitor students’ selection of and attention to content in the HMDs. However, asking students to prepare for the lesson and use that preparation to guide the VR experience should improve attention to content. Furthermore, asking students to complete a survey, participate in an interview, and submit a final reflection encourages attention during the task and enhances retention. Finally, asking students to provide suggestions for future enhancements gives them motivation to contribute to further learning. While this study did not include any longitudinal data or quantitative language learning results, the overall impression is that VR English learning increases participation, improves attention, and motivates students to be critical about their learning and the learning material. Future work should evaluate additional variables quantitatively, including students’ speaking performance under the VR course design and changes in their emotions.
References
1. Godwin-Jones, R.: Augmented reality and language learning: from annotated vocabulary to place-based mobile games. Language Learn. Technol. 20(3), 9–19 (2016). https://www.scopus.com/inward/record.uri?eid=2-s2.0-84994627515&partnerID=40&md5=6d3aec75cd7321c12aa0d2acef7c8ad9
2. Parmaxi, A.: Virtual reality in language learning: a systematic review and implications for research and practice. Interact. Learn. Environ. (2020)
3. Warschauer, M.: Comparing face-to-face and electronic discussion in the second language classroom. CALICO J. 13(2–3), 7–26 (1995)
4. Dalgarno, B., Lee, M.: What are the learning affordances of 3-D virtual environments? Br. J. Edu. Technol. 41, 10–32 (2010)
5. Huang, H.W.: Effects of smartphone-based collaborative vlog projects on EFL learners’ speaking performance and learning engagement. Australas. J. Educ. Technol. 37(6), 18–40 (2021)
6. Berti, M.: Italian open education: virtual reality immersions for the language classroom. In: Comas-Quinn, A., Beaven, A., Sawhill, B. (eds.), New Case Studies of Openness in and Beyond the Language Classroom, pp. 37–47 (2019)
7. Makransky, G., Lilleholt, L.: A structural equation modeling investigation of the emotional value of immersive virtual reality in education. Educ. Tech. Res. Dev. 66(5), 1141–1164 (2018)
8. Chien, S.Y., Hwang, G.J., Jong, M.S.Y.: Effects of peer assessment within the context of spherical video-based virtual reality on EFL students’ English-Speaking performance and learning perceptions. Comput. Educ. 146 (2020)
9. Gruber, A., Kaplan-Rakowski, R.: The impact of high-immersion virtual reality on foreign language anxiety when speaking in public. SSRN Electron. J. (2022)
10. Riva, G., Mantovani, F., Capideville, C., Preziosa, A., Morganti, F., Villani, D., Gaggioli, A., Botella, C., Alcañiz Raya, M.: Affective interactions using virtual reality: the link between presence and emotions. Cyberpsychol. Behav. 10, 45–56 (2007)
11. Hu-Au, E., Lee, J.: Virtual reality in education: a tool for learning in the experience age. Int. J. Innov. Educ. 4 (2017)
12. Qiu, X.-Y., Chiu, C.-K., Zhao, L.-L., Sun, C.-F., Chen, S.-J.: Trends in VR/AR technology-supporting language learning from 2008 to 2019: a research perspective. Interact. Learn. Environ. (2021)
13. Allcoat, D., von Mühlenen, A.: Learning in virtual reality: effects on performance, emotion and engagement. Res. Learn. Technol. 26 (2018)
14. Lin, V., Barrett, N., Liu, G.-Z., Chen, N.-S., Morris, Jong, S.-Y.: Supporting dyadic learning of English for tourism purposes with scenery-based virtual reality. Comput. Assisted Language Learn. (2021)
15. Berti, M., Maranzana, S., Monzingo, J.: Fostering cultural understanding with virtual reality: a look at students’ stereotypes and beliefs. Int. J. Comput. Assisted Language Learn. Teach. 10, 47–59 (2020)
16. Kaplan-Rakowski, R., Gruber, A.: Low-immersion versus high-immersion virtual reality: definitions, classification, and examples with a foreign language focus. In: Proceedings of the Innovation in Language Learning International Conference 2019, pp. 552–555. Pixel (2019)
17. Wadhera, M.: The information age is over; welcome to the experience age. Tech Crunch (2016, May 9). https://techcrunch.com/2016/05/09/the-information-age-is-over-welcome-to-the-experience-age/
18. Hagge, P.: Student perceptions of semester-long in-class virtual reality: effectively using “google earth VR” in a higher education classroom. J. Geogr. High. Educ. 45, 1–19 (2020)
19. Lau, K., Lee, P.Y.: The use of virtual reality for creating unusual environmental stimulation to motivate students to explore creative ideas. Interact. Learn. Environ. 23, 3–18 (2012)
20. Vygotsky, L.: Mind in society: the development of higher psychological processes (1978)
21. Kaplan-Rakowski, R., Wojdynski, T.: Students’ attitudes toward high-immersion virtual reality assisted language learning. In: Taalas, P., Jalkanen, J., Bradley, L., Thouësny, S. (eds.), Future-Proof CALL: Language Learning as Exploration and Encounters – Short Papers from EUROCALL 2018, pp. 124–129 (2018)
22. European Union and Council of Europe. Common European Framework of Reference for Languages: Learning, Teaching, Assessment (2004). https://europa.eu/europass/system/files/2020-05/CEFR%20self-assessment%20grid%20EN.pdf
23. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101 (2006)
Chapter 11
Medical-Network (Med-Net): A Neural Network for Breast Cancer Segmentation in Ultrasound Image
Yahya Alzahrani and Boubakeur Boufama
Abstract Breast tumor segmentation is an important image processing technique for cancer diagnosis and treatment. Recently, deep learning models have shown significant advances toward computer-aided diagnosis (CAD) systems. We propose a novel neural network based on attention modules to segment tumors from breast ultrasound (BUS) images. Inspired by the way the human brain interprets a scene, the network focuses only on the salient areas of the image while suppressing other details. It is built on a residual encoder and a dense-block decoder. The attention modules generate feature maps that comprise spatial as well as channel details; fusing these maps produces a more meaningful feature map with better discriminative characteristics. The results show that the proposed model outperformed several recent models and has potential for clinical practice.
Keywords Convolutional neural network · BUS images · Breast tumor segmentation · Deep learning
Y. Alzahrani (B) · B. Boufama, University of Windsor, 401 Sunset Ave, Windsor N9B 3P4, ON, Canada. e-mail: [email protected]; [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_12
11.1 Introduction
Breast cancer is by far the most common breast mass among women [1]. Clinical diagnosis in primary care clinics is a crucial factor in decreasing the risk of breast cancer and providing earlier treatment for more positive outcomes for patients. Although the mammogram is a well-known and reliable image modality that is used in breast cancer diagnosis, it can be costly and comes with radiation risks from the use of X-rays. Mammograms also tend to produce a high number of false-positive results. In contrast, ultrasound (US) is an appropriate alternative for early stage cancer detection. A mammogram or magnetic resonance imaging (MRI) can be used in conjunction with US to provide additional evidence. Various medical image segmentation techniques have emerged in the last decade. However, recent studies have further
developed existing computer-aided methods, as they are often helpful in combination with machine learning and deep learning (DL) approaches. Edges distinguish separate regions within an image, and their characteristics can reveal the presence of a cancerous tumor [2]. However, a common challenge in US medical images is that the high level of noise can obscure noticeable edges, making successful boundary detection difficult. Automatic segmentation is the ultimate goal of many methods; some algorithms perform well when combined with prior knowledge or human interaction, such as active contour [3] and region growing (RG) [4]. In the former, the initial contour is defined by the user, and the contour usually evolves toward homogeneous regions. Similarly, in the latter, the initial seed of the RG algorithm is chosen, and the neighboring parts are integrated into the predefined region in an iterative process. However, this process is prone to errors, and the initialization is subjective, meaning that the results are initialization dependent. Recently, machine learning-based algorithms have attracted the interest of many researchers [5–10] due to the availability of graphics processing units (GPUs) and appropriate data and their ability to provide strong results. The human breast, like other organs of the body, varies in shape, appearance, and size, which means that any diagnosis tool based on human prior knowledge must also display considerable flexibility. Learning-based methods have shown their superiority over other segmentation methods. However, the number of data samples available is a significant factor for any DL model. The lack of data in the medical field is usually a major challenge, which raises the need for models that are capable of generalizing well even from a small dataset.
Moreover, convolutional operations, which are widely used in state-of-the-art computer vision networks, cannot escape the locality imposed by the nature of convolutional filters themselves: each filter can only examine a local part of an image and thus misses long-range dependencies. A possible solution to this issue is a good attention mechanism. Attention concepts in computer vision are inspired by the way humans process visual information. The human eye focuses on certain parts of a scene rather than on the entire scene as a whole, allowing accurate recognition of objects even when the differences between the classes are subtle and the classes themselves contain diverse samples. In this article, we propose a novel attention-based convolutional neural network (CNN) for breast US segmentation that comprises both channel and spatial information. We also preserve important features using residual learning. The rest of this paper is organized as follows: Sect. 11.2 provides a literature review of the existing methods for breast US segmentation. Section 11.3 describes the proposed segmentation model. Section 11.4 discusses the implementation and evaluation of the model. Finally, Sect. 11.5 concludes this article.
11.2 Related Work
In recent years, advances in deep learning and neural networks have contributed toward achieving fully automated US image segmentation and other relevant tasks by overcoming several persistent challenges that many previous methods could not effectively handle. Various deep neural network architectures have been proposed to perform efficient segmentation and the detection of abnormalities. For example, several types of convolutional neural networks have been used for fully automated medical image segmentation: patch-based neural networks, which are trained on image patches; fully convolutional networks, which perform pixel-wise prediction to form the final segmentation; and U-nets [11]. Boundary ambiguity is one of the major issues when using fully convolutional networks (FCNs) for automatic US image segmentation, resulting in the need for more refined deep learning architectures. In this light, one study [12] proposed the use of cascaded FCNs to perform multiscale feature extraction. Moreover, spatial consistency can be enhanced by adding an auto-context scheme to the main architecture. U-nets are one of the most popular and well-established deep learning architectures for image segmentation. They deliver high performance even with a limited amount of training data. They are primarily CNNs that consist of a downsampling path, which reduces the image size by performing convolutional and pooling operations on the input image and extracts contextual features, and an upsampling path, which reconstructs the image to recover the image size and various details [13]. However, the maxpooling in the U-net encoder tends to lose some localization information. Therefore, many studies show significant improvement when the encoder is replaced by more sophisticated architectures, such as the visual geometry group network (VGG) [14]. V-nets [15] are a similar architecture applied to 3D US images. They also face the limitation of inadequate training data.
They consist of a compression and a decompression path, in a manner similar to U-nets for 2D images. The incorporation of a 3D supervision mechanism facilitates accurate segmentation by exploiting a hybrid loss function that has shown fast convergence. Transfer learning [16] has gained the attention of practitioners for various tasks. This approach has succeeded in many applications and is one of the popular current approaches. It is a convenient solution for any limited-data task, as these models are usually trained on relatively large datasets, such as Google’s Open Images, ImageNet, and CIFAR10. In the U-net base model, the use of skip connections between the two paths, although effective, has some drawbacks, such as suboptimal feature reusability and a consequently increased need for computational resources. Other versions [5, 6] have incorporated attention mechanisms into U-net architectures to improve performance, especially for detection tasks. The addition of a spatial attention gate (SAG) and a channel attention gate (CAG) to a U-net helps in locating the region of interest (ROI) and explaining the feature representation, respectively. This type of technique is utilized in numerous applications, such as machine translation and natural language processing (NLP). Non-local means [17] and its extended version, the non-local neural network [18], as well as machine translation [19], can be optimized through a back propagation process in the training iterations and therefore are considered
soft attention modules. These soft attention mechanisms are very efficient and can be plugged into CNNs. In contrast, hard attention, whose operations are non-differentiable, is not commonly used with CNNs. Attention mechanisms have proven successful in sequence modeling as they allow the effective transmission of past information, an advancement that was not possible with older architectures based on recurrent neural networks. Therefore, self-attention can substitute for convolutional operations to improve the performance of neural networks on visual tasks. The best performance, however, has been reported when attention and convolutions are combined [20].
11.3 Proposed Breast Ultrasound Segmentation Model (Med-Net)
The most significant obstacle in breast tumor US image segmentation is shape variation: the size of a tumor can vary, and the border of a tumor is normally very close to the surrounding tissue. Frequently, the majority of data points belong to the background. Therefore, small tumors are quite difficult to identify. This gives rise to what is called the class imbalance problem. One popular way to address this problem is to place more weight on the minority class in an imbalanced dataset, which can be achieved using a weighted objective function. Preprocessing may also be exploited to manipulate the image data in a way that helps reduce the problem. For example, scaling the image by shifting the width and the height may help to improve accuracy. In a similar classification task, oversampling the minority class balances the data and helps to tackle the imbalance problem. The way the human brain interprets visual scenes has inspired deep learning researchers to adapt the same concepts to recognizing objects in CNNs and related tasks. Many contributions in the literature have applied this concept in applications such as classification [21], detection [22], and segmentation [23].
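The idea of a weighted objective function for imbalanced masks can be illustrated with a minimal sketch. It is not the loss used to train Med-Net (the chapter uses the Dice loss); the inverse-frequency weighting scheme and the names `class_weights` and `weighted_bce` are assumptions introduced here for illustration only.

```python
import math

def class_weights(mask):
    """Inverse-frequency weights for a flat binary mask (1 = tumor, 0 = background).

    Hypothetical scheme: each class receives weight n / (2 * class_count),
    so both classes contribute equally to the total loss.
    """
    n = len(mask)
    n_pos = sum(mask)
    n_neg = n - n_pos
    w_pos = n / (2 * n_pos) if n_pos else 0.0
    w_neg = n / (2 * n_neg) if n_neg else 0.0
    return w_pos, w_neg

def weighted_bce(pred, mask, eps=1e-7):
    """Binary cross-entropy in which tumor and background pixels are weighted differently."""
    w_pos, w_neg = class_weights(mask)
    total = 0.0
    for p, q in zip(pred, mask):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(w_pos * q * math.log(p) + w_neg * (1 - q) * math.log(1 - p))
    return total / len(mask)

# Toy mask: nine background pixels and one tumor pixel.
mask = [0] * 9 + [1]
w_pos, w_neg = class_weights(mask)
missed_tumor = weighted_bce([0.1] * 10, mask)
false_alarm = weighted_bce([0.9] + [0.1] * 8 + [0.9], mask)
print(w_pos, w_neg, missed_tumor, false_alarm)
```

With this weighting, missing the single tumor pixel costs more than raising a false alarm on one background pixel, which is the behavior a weighted objective is meant to produce.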
11.3.1 Encoder Architecture
In this article, we present a neural network for breast ultrasound image segmentation, as shown in Fig. 11.1. Our solution is a general-purpose model and can be applied to similar vision tasks. When the network downsamples the spatial dimensions of the data, some meaningful details may be lost. Although pooling is essential in CNNs, we employed residual blocks across the encoder of our network to keep track of the previous activations of each layer and sum up the feature maps before fusion. This appears to be a good way to address the issue. When encoding the data, one of the keys is to maintain the dimension reductions and to exploit the high-level information that carries spatial information while extracting the feature vector.
Fig. 11.1 Proposed neural network architecture
To further enhance our network, we employed an attention module as described in the next subsection. This is similar to [21] but with a more meaningful feature map. However, the localization information can be preserved in a U-net-like architecture as in our proposal using residual blocks that add raw representations to the refined feature map produced by each layer. The encoder’s residual block is shown in Fig. 11.2. Each block in our encoder can be represented as follows:
\[ \tilde{x}_l = A_l\big(f_n\big(C_{(k,n)}\big(C_{(k,n)}(x_l)\big)\big)\big) + x_l \tag{11.1} \]
where $\tilde{x}_l$ is the output of the $l$th layer, $x_l$ is the input to the residual block, and $C_{(k,n)}$ is a convolution layer with a filter size of $k$ and $n$ filters ($n$ = 32, 64, 128, 256, 512, and 1024, respectively). $A$ denotes an attention unit; $k$ in $l_1$ and $l_2$ is of size 1 × 7 and 7 × 1, respectively, and of a symmetric size of five and three, respectively, in the subsequent layers in both residual blocks and attention units. However, the last residual block utilizes a square filter with $k = 1$.
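To see why 1 × 7 and 7 × 1 filters are attractive in the first two layers, the following sketch counts convolution weights and the resulting receptive field. It is a back-of-the-envelope illustration, not the authors' code: the helper names and the choice of 32 input and output channels are assumptions, and the comparison against a single 7 × 7 filter is added here only for reference.

```python
def conv_weights(kernel_h, kernel_w, c_in, c_out, bias=True):
    """Number of trainable weights in a standard 2D convolution layer."""
    return kernel_h * kernel_w * c_in * c_out + (c_out if bias else 0)

def stacked_receptive_field(kernels):
    """Receptive field (height, width) of stride-1 convolutions applied in sequence."""
    h = w = 1
    for kernel_h, kernel_w in kernels:
        h += kernel_h - 1
        w += kernel_w - 1
    return h, w

channels = 32  # hypothetical: same number of channels in and out

# A 1 x 7 layer followed by a 7 x 1 layer, versus one 7 x 7 layer.
factorized = conv_weights(1, 7, channels, channels) + conv_weights(7, 1, channels, channels)
square = conv_weights(7, 7, channels, channels)

print("1x7 then 7x1:", factorized, "weights, receptive field",
      stacked_receptive_field([(1, 7), (7, 1)]))
print("single 7x7:  ", square, "weights, receptive field",
      stacked_receptive_field([(7, 7)]))
```

Under these assumptions, the two rectangular layers cover the same 7 × 7 neighborhood with well under a third of the weights of one square filter.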
11.3.2 Attention Module
Our attention module is inspired by the work of Hu et al. [24], one of the early models to propose channel attention for vision tasks, in which a squeeze block employs a channel-wise module that learns dependencies across the channels. However, in detection tasks, that approach may lack the localization information needed. Similarly, the work in [21] adds spatial information so that the salience map comprises both channel and spatial details. A squeeze
Fig. 11.2 Residual blocks utilized in our proposed encoder to downsample the spatial dimensions
operator incorporates global feature information in an aggregated feature set across the spatial axis using average pooling, followed by an excitation operator, which assigns per-channel weights given the respective feature channel. However, in this work, we propose an attention unit based on residual learning instead of global pooling, which is meant to add more importance to the spatial features incorporated with the relevant channel features and to boost the performance of the network. Inspired by the two previously mentioned attention models, we propose an attention unit with two branches, channel and spatial, to improve the discriminative characteristics of the incorporated feature maps. The channel-wise pooling path employs a global maxpooling layer followed by a convolution operation. The feature vector is shrunk around the channel axis, and the following 1 × 1 convolution further emphasizes what has been captured. The other branch utilizes a residual block, as shown in Fig. 11.3, to add more spatial contextual representation. Early representations of the previous layer are used to produce a final feature map. Both branches are combined using element-wise summation and then multiplied by the attention input.
Fig. 11.3 Proposed attention module
Let $M \in \mathbb{R}^{H \times W \times C}$ be the input feature map to the attention unit. We first downsize the input feature maps using a maxpooling layer so that the spatial details are grouped and represented by the matrix $F_{\max} \in \mathbb{R}^{H \times W \times C}$. It is then squeezed around the channel axis to produce $F_{\max} \in \mathbb{R}^{1 \times 1 \times C}$, which is convolved by 1 × 1 filters. We also employed a residual block as another branch to preserve the localization details that may be lost from the first branch. This block is followed by a 1 × 1 convolution for more refined feature maps. Let $\tilde{F} \in \mathbb{R}^{H \times W \times C}$ denote the output of this branch; then, it can be written as:
\[ \tilde{F} = \sigma\big(C_{(k,n)}(R \otimes M)\big) \tag{11.2} \]
where $k$ is a square filter of size 7 × 7; $n$ is the number of channels, which is equal to the number of input channels; $C$ is a convolution operation; $R$ represents the residual block; and $\sigma$ denotes the sigmoid function. The two feature descriptors from both branches are added together using element-wise summation and then multiplied by the input as follows:
\[ \tilde{M} = \sigma(F_{\max} \oplus \tilde{F}) \otimes M \tag{11.3} \]
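The fusion step in Eq. (11.3) can be made concrete with a small pure-Python sketch on nested lists. It only illustrates the broadcasting of the per-channel descriptor over the spatial map and the final gating; the learned parts (the 1 × 1 convolution on the channel branch and the residual block with its convolution on the spatial branch) are replaced by hypothetical stand-ins, and the function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_max_pool(m):
    """Squeeze an H x W x C feature map (nested lists) to C per-channel maxima."""
    n_channels = len(m[0][0])
    return [max(pixel[c] for row in m for pixel in row) for c in range(n_channels)]

def fuse_attention(m, f_spatial):
    """Gate the input as in Eq. (11.3): sigmoid(F_max + F~) * M.

    F_max has one value per channel and is broadcast over height and width.
    The 1 x 1 convolution on the channel branch is omitted (assumption).
    """
    f_max = global_max_pool(m)
    return [
        [
            [sigmoid(f_max[c] + f_spatial[i][j][c]) * m[i][j][c]
             for c in range(len(f_max))]
            for j in range(len(m[i]))
        ]
        for i in range(len(m))
    ]

# Toy 2 x 2 x 2 input feature map.
m = [[[1.0, 0.0], [2.0, -1.0]],
     [[0.5, 3.0], [0.0, 1.0]]]
# Hypothetical stand-in for the spatial branch output F~ (constant 0.5 everywhere).
f_spatial = [[[0.5, 0.5] for _ in row] for row in m]

gated = fuse_attention(m, f_spatial)
print(global_max_pool(m))
print(gated)
```

Because the sigmoid output lies strictly between 0 and 1, each gated value keeps the sign of the input and never exceeds it in magnitude, so the unit can only attenuate or pass features.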
11.3.3 Decoder Architecture
In this work, we utilize four upsampling layers based on a dense block. The low-level features are first concatenated with the encoder’s residual and attention units, which pass them to the dense block. To take advantage of the large-size feature maps concatenated from early layers, we employ the dense block in the decoder, which includes two convolutional layers with 12 filters, each preceded by a batch normalization layer and a rectified linear unit (ReLU) activation layer to add non-linearity. The dense block [25] was utilized to feed forward the input as well as the output of each layer to the subsequent layers. The decoder path in our model consists of a 2 × 2 upsampling operation, batch normalization, ReLU activation, and 3 × 3 convolution. This can be written as:
\[ U_l = f_n\big(\delta\big(D_{(k,n)}(T_{l-1} \,\Vert\, A_l \,\Vert\, R_l)\big)\big) \tag{11.4} \]
where $U_l$ is the output of layer $l$, $T_{l-1}$ is the output of the $(l-1)$th transposed layer, $f$ is a fully connected layer, $D$ denotes the dense block with $n = 12$ kernels of size $k = 7 \times 7$, $\delta$ denotes the ReLU function, $A$ denotes an attention unit, $\Vert$ represents a concatenation, and $R$ is the output of the $l$th encoder’s block.
11.3.4 Implementation Details
In this study, the experiments were conducted using Keras with a TensorFlow 2.3.1 backend and Python 3.6 on Windows. The computer was equipped with an NVIDIA GeForce 1080 Ti with 11 GB of GPU memory. We performed five-fold cross-validation to evaluate our model. In each fold, the data were split randomly into a training set (80%) and a validation set (20%). It is worth mentioning that the model was trained on the two datasets separately. First, the images were resized to 256 × 256 pixels, and a preprocessing step was applied to further enhance performance. It involved several transformations: horizontal flip (p = 0.5); random brightness and contrast (p = 0.2); random gamma [gamma_limit = (80, 120)]; adaptive histogram equalization (p = 1.0, contrast-limiting threshold = 2.0); grid distortion (p = 0.4); and shift, scale, and rotate (shift_limit = 0.0625, scale_limit = 0.1, rotate_limit = 15). Here, p is the probability of applying a transformation. These transformations were applied in all the experiments, including those with the models used for comparison.
Our proposed model has 16 million trainable parameters and was optimized using the adaptive moment estimation (Adam) optimizer [26]. We set the adaptive learning rate initially to 0.0001 with a minimum rate of 0.000001 and used a batch size of 4 to train our model. To prevent overfitting, all the experiments were set to terminate training if no improvement was recorded within 10 epochs. Because of its robustness to class imbalance, we used the Dice loss function to train the model, given by:

$$\text{Loss} = 1 - \frac{2\sum_{i}^{N} p_i q_i}{\sum_{i}^{N} p_i + \sum_{i}^{N} q_i} \tag{11.5}$$
This loss function takes values in [0, 1], where p_i is the predicted value at pixel i and q_i is the corresponding value in the true mask.
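The Dice loss of Eq. (11.5) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' Keras implementation: the function name `dice_loss` and the small constant `eps` (added to avoid division by zero on empty masks) are assumptions that do not appear in the chapter.

```python
def dice_loss(pred, target, eps=1e-7):
    """Dice loss over flattened pixel values, following Eq. (11.5).

    pred   -- predicted probabilities p_i in [0, 1]
    target -- ground-truth mask values q_i (0 or 1)
    eps    -- assumed smoothing term, not part of Eq. (11.5)
    """
    overlap = sum(p * q for p, q in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * overlap + eps) / (total + eps)


# A perfect prediction gives a loss near 0; a disjoint one gives a loss near 1.
perfect = dice_loss([1, 0, 1, 0], [1, 0, 1, 0])
disjoint = dice_loss([1, 0], [0, 1])
uncertain = dice_loss([0.5, 0.5], [1, 0])
```

Because only the overlap and the two mask sums enter the formula, the large number of correctly predicted background pixels does not inflate the score.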
11.3.5 Dataset
Two BUS datasets, UDIAT [11] and BUSIS [27–30], were used for training and validating the model. UDIAT is the smaller of the two, with 163 images and their labels. This dataset was collected by the Parc Taulí Corporation Diagnostic Center, Sabadell (Spain), in 2012. BUSIS has 562 images of benign and malignant tumors along with the ground truth. It was collected by different institutions using different scanners, which makes it a very reliable data source. Both datasets present a single tumor in each image. Most images in these datasets present small tumors, so the background represents the majority class. This introduces the so-called class imbalance problem, which needs to be handled carefully.
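The effect of this imbalance can be illustrated with a small sketch. It assumes a hypothetical 256 × 256 image whose tumor covers 20 × 20 pixels (an illustrative figure, not a statistic from either dataset) and compares pixel accuracy with the Dice score for a degenerate prediction that labels every pixel as background.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the mask."""
    return sum(int(p == q) for p, q in zip(pred, target)) / len(target)


def dice_score(pred, target):
    """Dice coefficient for binary masks; defined as 1.0 if both are empty."""
    overlap = sum(p * q for p, q in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * overlap / total if total else 1.0


n_pixels = 256 * 256      # image size used in the chapter
tumor_pixels = 20 * 20    # hypothetical tumor area, for illustration only
target = [1] * tumor_pixels + [0] * (n_pixels - tumor_pixels)
all_background = [0] * n_pixels  # predicts "no tumor" everywhere

acc = pixel_accuracy(all_background, target)
dsc = dice_score(all_background, target)
```

Under these assumptions the degenerate prediction is more than 99% accurate per pixel yet has a Dice score of zero, which is why an overlap-based loss is the more informative training signal here.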
11 Medical-Network (Med-Net): A Neural Network …
11.4 Discussion
Tumor tissues in breast US images have different shapes and can appear in different locations. However, most tumors occupy only a small area of pixels. Therefore, in the early layers, small kernels can capture local discrepancies, but a large receptive field is also needed to cover more pixels and to consider the semantic correlations. This helps to preserve the location information before fusing the feature maps in the subsequent layers. Moreover, the divergence of intensities between vertically neighboring pixels is very small. However, a kernel with a large receptive field creates overhead in memory resources. To overcome this challenge, we used kernels of dimensions 1 × 7 and 7 × 1 in the first two layers, respectively. The size was then narrowed down in the following layers as the dimensions of the feature maps increased. This approach preserves long-range dependencies and significantly improves the produced features, thus providing better feature map representations.
In this article, we introduced a novel breast US image segmentation model that can be utilized and extended for any segmentation task. As seen from the results, the model is robust to imbalanced class data. The model was evaluated using several metrics: Dice coefficient (DSC), Jaccard index (JI), true-positive ratio (TPR), and false-positive ratio (FPR). We evaluated our proposed model quantitatively and qualitatively, and it proved to be stable and robust for breast US image segmentation. We compared our proposed model with four others. Two of them were recent successful models for medical image segmentation: M-Net [31] and Squeeze U-Net [32]. These two models were implemented and trained from scratch. We also included selective kernel U-Net (SK-U-Net) [33], STAN [34], and U-Net-SA [35], which were originally trained on UDIAT and BUSIS, for comparison only, as these models were designed for breast US images.
Therefore, the scores were taken as reported in their articles. The evaluation metrics are given by the following equations:

$$\text{DSC} = \frac{2TP}{2TP + FP + FN} \tag{11.6}$$

$$\text{JI} = \frac{TP}{TP + FN + FP} \tag{11.7}$$
Table 11.1 Evaluation metrics for all models given by the average score of five-fold cross-validation on the UDIAT dataset
Model                 Dataset   DSC     JI      TPR     FPR
Proposed model        UDIAT     0.794   0.673   0.777   0.007
Squeeze U-Net [32]    UDIAT     0.721   0.585   0.701   0.008
M-Net [31]            UDIAT     0.748   0.615   0.740   0.007
STAN [34]             UDIAT     0.782   0.695   0.801   0.266
SK-U-Net [33]         UDIAT     0.791   –       –       –
Table 11.2 Comparison and evaluation metrics for the models given by the average score of five-fold cross-validation on the BUSIS dataset
Model                 Dataset   DSC     JI      TPR     FPR
Proposed model        BUSIS     0.920   0.854   0.906   0.007
Squeeze U-Net         BUSIS     0.912   0.841   0.910   0.009
M-Net                 BUSIS     0.909   0.836   0.915   0.009
STAN                  BUSIS     0.912   0.847   0.917   0.093
U-Net-SA [35]         BUSIS     0.905   0.838   0.910   0.089

Fig. 11.4 Training curves using five-fold cross-validation on UDIAT dataset
$$\text{TPR} = \frac{TP}{TP + FN} \tag{11.8}$$

$$\text{FPR} = \frac{FP}{FP + TN} \tag{11.9}$$
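The four metrics of Eqs. (11.6)–(11.9) can be computed from pixel-wise confusion counts as in the sketch below. The function name `segmentation_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration, and the six-pixel masks are toy data rather than results from the study.

```python
def segmentation_metrics(pred, target):
    """Return DSC, JI, TPR and FPR for two flattened binary masks."""
    tp = sum(1 for p, q in zip(pred, target) if p == 1 and q == 1)
    fp = sum(1 for p, q in zip(pred, target) if p == 1 and q == 0)
    fn = sum(1 for p, q in zip(pred, target) if p == 0 and q == 1)
    tn = sum(1 for p, q in zip(pred, target) if p == 0 and q == 0)

    def ratio(num, den):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),  # Eq. (11.6)
        "JI": ratio(tp, tp + fn + fp),           # Eq. (11.7)
        "TPR": ratio(tp, tp + fn),               # Eq. (11.8)
        "FPR": ratio(fp, fp + tn),               # Eq. (11.9)
    }


# Toy masks: TP = 1, FP = 1, FN = 1, TN = 3.
scores = segmentation_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
```

For the toy masks this yields DSC = 0.5, JI = 1/3, TPR = 0.5, and FPR = 0.25. Since TN appears only in the FPR, a large background keeps the FPR small even when a model marks a noticeable number of background pixels as tumor.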
The results showed that our model outperformed the others that were examined. Tables 11.1 and 11.2 show the obtained results, and Fig. 11.6 shows the performance of the model on the BUSIS dataset. On the UDIAT dataset, which has only a few samples, the model proved to be very effective for a small dataset, achieving a Dice score of 0.79 and a JI score of 0.67. The selective kernel U-Net (SK-U-Net) obtained a very close score, with the second highest Dice score on this dataset. However, it was trained on a relatively large private dataset, whereas our model was trained on only 163 samples. Moreover, STAN achieved a JI score of 0.69 and had the highest TPR and FPR scores, which indicates that it may identify some background pixels as tumor. In contrast, our model and M-Net scored the lowest FPR, and this
Fig. 11.5 Segmentation sample cases produced by different models used in this study and our proposed network (Med-Net) using UDIAT dataset
Fig. 11.6 Performance curves using five-fold cross-validation on BUSIS dataset
can be seen in Fig. 11.5, which shows very little positive area outside of the tumor boundaries. Fig. 11.4 shows the performance of the model on the UDIAT dataset. The other models that were implemented and trained in this study were M-Net and Squeeze U-Net. M-Net has few parameters and showed decent performance on both datasets, achieving Dice and JI scores of 0.74 and 0.61 on UDIAT, respectively. Squeeze U-Net, a modified version of U-Net [36] equipped with a squeeze module [37], achieved Dice and JI scores of 0.72 and 0.58, respectively. Our model also scored the lowest FPR on the BUSIS dataset, and this can be seen in Fig. 11.7, which shows very little positive area outside of the tumor boundaries. Our proposed model also achieved the highest performance on the BUSIS dataset
Fig. 11.7 Segmentation sample cases produced by different models used in this study and our proposed network (Med-Net) using BUSIS dataset
with Dice and JI scores of 0.92 and 0.85, respectively. Our model clearly has the best FPR of all the models. STAN obtained the highest TPR score and the second best JI. All the models performed adequately on this dataset. This is because the dataset was collected and annotated by experts from different institutions. It has a large number of samples and was produced by different devices, which makes it suitable for evaluating and justifying segmentation tasks. Overall, our model proved its superiority over the other models in this study when all the results are considered. Our model could be computationally expensive with very high-scale image data. The Med-Net model can be further extended in the future by adding more data and examining different types of images, such as computerized tomography (CT), MRI, and X-ray images of different organs.
11.5 Conclusion
In this article, we presented a novel U-Net-like CNN for breast US image segmentation. The model was equipped with visual attention modules to focus only on the
salient features and suppress irrelevant details. The proposed network was able to extract the important features while considering spatial and channel-wise information. Dense blocks were used along the construction path to provide full connectivity between the layers within the blocks. The model was validated on two breast US image datasets and showed promising results and enhanced performance. Although the model was meant for breast US images, it can be utilized for any computer vision segmentation task with some modifications.
References
1. Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F.: Global cancer statistics 2020: globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71(3), 209–249 (2021)
2. Nugroho, H., Khusna, D.A., Frannita, E.L.: Detection and classification of breast nodule on ultrasound images using edge feature (2019)
3. Lotfollahi, M., Gity, M., Ye, J., Far, A.: Segmentation of breast ultrasound images based on active contours using neutrosophic theory. J. Medical Ultrasonics 45, 1–8 (2017)
4. Kwak, J.I., Kim, S.H., Kim, N.C.: Rd-based seeded region growing for extraction of breast tumor in an ultrasound volume. Comput. Intel. Secur. 799–808 (2005)
5. Khanh, T., Duy Phuong, D., Ho, N.H., Yang, H.J., Baek, E.T., Lee, G., Kim, S., Yoo, S.: Enhancing u-net with spatial-channel attention gate for abnormal tissue segmentation in medical imaging. Appl. Sci. 10 (2020)
6. Schlemper, J., Oktay, O., Chen, L., Matthew, J., Knight, C., Kainz, B., Glocker, B., Rueckert, D.: Attention-gated networks for improving ultrasound scan plane detection (2018)
7. Suchindran, P., Vanithamani, R., Justin, J.: Computer aided breast cancer detection using ultrasound images. Mat. Today Proc. 33 (2020)
8. Suchindran, P., Vanithamani, R., Justin, J.: Computer aided breast cancer detection using ultrasound images. Mat. Today Proc. 33 (2020)
9. Nithya, A., Appathurai, A., Venkatadri, N., Ramji, D., Anna Palagan, C.: Kidney disease detection and segmentation using artificial neural network and multi-kernel k-means clustering for ultrasound images. Measurement 149, 106952 (2020). https://www.sciencedirect.com/science/article/pii/S0263224119308188
10. Alzahrani, Y., Boufama, B.: Biomedical image segmentation: a survey. SN Comput. Sci. 2(4), 1–22 (2021)
11. Yap, M.H., Pons, G., Martí, J., Ganau, S., Sentís, M., Zwiggelaar, R., Davison, A.K., Martí, R.: Automated breast ultrasound lesions detection using convolutional neural networks. IEEE J. Biomed. Health Inform. 22(4), 1218–1226 (2018)
12. Wu, L., Xin, Y., Li, S., Wang, T., Heng, P., Ni, D.: Cascaded fully convolutional networks for automatic prenatal ultrasound image segmentation, pp. 663–666 (2017)
13. Almajalid, R., Shan, J., Du, Y., Zhang, M.: Development of a deep-learning-based method for breast ultrasound image segmentation, pp. 1103–1108 (2018)
14. Nair, A.A., Washington, K.N., Tran, T.D., Reiter, A., Lediju Bell, M.A.: Deep learning to obtain simultaneous image and segmentation outputs from a single input of raw ultrasound channel data. IEEE Trans. Ultrasonics Ferroelectrics Freq. Control 67(12), 2493–2509 (2020)
15. Lei, Y., Tian, S., He, X., Wang, T., Wang, B., Patel, P., Jani, A., Mao, H., Curran, W., Liu, T., Yang, X.: Ultrasound prostate segmentation based on multi directional deeply supervised v net. Med. Phys. 46 (2019)
16. Liao, W.X., He, P., Hao, J., Wang, X.Y., Yang, R.L., An, D., Cui, L.G.: Automatic identification of breast ultrasound image based on supervised block-based region segmentation algorithm and features combination migration deep learning model. IEEE J. Biomed. Health Inform. 1 (2019)
17. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
18. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
19. Zhang, B., Xiong, D., Su, J.: Neural machine translation with deep attention. IEEE Trans. Pattern Anal. Mach. Intel. 42(1), 154–163 (2020)
20. Bello, I., Zoph, B., Vaswani, A., Shlens, J., Le, Q.V.: Attention augmented convolutional networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3286–3295 (2019)
21. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: Cbam: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
22. Ba, J., Mnih, V., Kavukcuoglu, K.: Multiple object recognition with visual attention (2014). arXiv:1412.7755
23. Valanarasu, J.M.J., Oza, P., Hacihaliloglu, I., Patel, V.M.: Medical transformer: gated axial-attention for medical image segmentation (2021). arXiv:2102.10662
24. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
25. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
26. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). arXiv:1412.6980
27. Xian, M., Zhang, Y., Cheng, H.D., Xu, F., Huang, K., Zhang, B., Ding, J., Ning, C., Wang, Y.: A benchmark for breast ultrasound image segmentation (BUSIS). Infinite Study (2018)
28. Xian, M., Zhang, Y., Cheng, H.D.: Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains. Pattern Recogn. 48(2), 485–497 (2015)
29. Cheng, H.D., Shan, J., Ju, W., Guo, Y., Zhang, L.: Automated breast cancer detection and classification using ultrasound images: a survey. Pattern Recogn. 43(1), 299–317 (2010)
30. Xian, M., Zhang, Y., Cheng, H.D., Xu, F., Zhang, B., Ding, J.: Automatic breast ultrasound image segmentation: a survey. Pattern Recogn. 79, 340–355 (2018)
31. Mehta, R., Sivaswamy, J.: M-net: A convolutional neural network for deep brain structure segmentation. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 437–440 (2017)
32. Beheshti, N., Johnsson, L.: Squeeze u-net: A memory and energy efficient image segmentation network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 364–365 (2020)
33. Byra, M., Jarosik, P., Szubert, A., Galperin, M., Ojeda-Fournier, H., Olson, L., O’Boyle, M., Comstock, C., Andre, M.: Breast mass segmentation in ultrasound with selective kernel u-net convolutional neural network. Biomed. Signal Process. Control 61, 102027 (2020)
34. Shareef, B., Xian, M., Vakanski, A.: Stan: small tumor-aware network for breast ultrasound image segmentation. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), pp. 1–5 (2020)
35. Vakanski, A., Xian, M., Freer, P.E.: Attention-enriched deep learning model for breast tumor segmentation in ultrasound images. Ultrasound Med. Biol. 46(10), 2819–2833 (2020)
36. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
37. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: Squeezenet: Alexnet-level accuracy with 50x fewer parameters and

5 × 5 cm) were required in order to be recognized and the recognition of moving tags worked very unreliably.
In addition, the use of this function resulted in a noticeable performance loss. Potential for improvement certainly lies in the control of the virtual hand, which was rated with an average of 3.5. Literature pertaining to neuroplastic hypotheses for alleviating PLP highlights the relevance of prioritizing anthropomorphic visual feedback [11, 39]. The concept of stochastic entanglement as hypothesized by Ortiz-Catalan, however, predicts that pain reduction would be independent of the level of anthropomorphic visual representation presented to the patient [10].
C. Prahm et al.
Fig. 15.6 Showing the 3 subscales of the prosthesis embodiment scale for both patients. The agency subscale was rated highest, indicating a feeling of congruent control during movement of their own prosthetic hand
Using a tentacle for a hand was a concept which was new to both patients, but they embraced the idea and stated that it did not necessarily need to be their hand, or a hand at all. In fact, they thought it was fun to explore in the game; however, in real life, they preferred an anthropomorphic prosthesis to a marine animal. One patient was certain that PLP was lower during the mixed reality experience, while the other patient described a lessening of pain during active play but reported later that pain was increased during play. Both patients agreed that PLP was lower after using the application. Of course, this one-time proof of concept cannot provide a statement about the alleviation of PLP. Increasing the sample size could therefore provide more insight not only into PLP but also into embodiment and their progress over time. One of the challenges with AR glasses is the restrictive field of view, which might lead to reduced immersion when not operating in the center of vision. The Thalmic Myo armbands accumulated a tracking error that required recalibration after 5–10 min of using the system to avoid a horizontal drift. As the Myo armbands use a 9-axis IMU containing a magnetometer, it should be possible to avoid this drift without additional hardware. Other groups have shown that the Thalmic Myo IMU data (without post-processing) has no drift [40]. The latency between the movement of the real arm and the visual representation of the corresponding virtual arm was not directly measured, but for usual arm movements,
15 Extending Mirror Therapy into Mixed Reality—Design …
there is no noticeable lag. The latency is assumed to be below 50 ms, as the data is received from the Thalmic Myo armbands with a latency of around 25 ms [40] and translated to the virtual arm position within the next frame. This is a comparably low latency that has not yet been reported in other studies, where the latency was 500–800 ms when controlling a virtual arm using custom IMU sensors [27]. The finger tap for periodic recalibration can be unintrusively integrated as a game element, requiring the user to perform a task with the augmented arm stretched out while tapping the Myo armband with the other arm. PhantomAR was not designed to be goal-oriented but curiosity-driven. There is no intended or evaluated task transfer from a virtual hand to an actual myoelectric prosthesis. Such a transfer might occur; however, the idea of PhantomAR is simply to use the hands, or hand-like representations, while moving through the room and exploring the environment. Intrinsic motivation to find out what one might discover should be the primary drive. It was important not only to provide applications for research but also to transfer them to the clinic. They should be as easy to use as possible, with separate user interfaces for the clinician and the patient. The patient therefore only has to put on the devices and can start interacting. In addition, the whole system is portable and completely wireless and can thus be used anywhere in the clinic or even at home. The system automatically detects the room; therefore, there are no special requirements for the room in which it is used.
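The latency estimate given above can be reproduced with a short calculation. The 25 ms sensor latency comes from the text; the 60 Hz display refresh rate is an assumption made for this sketch, since the chapter states only that the arm position is updated "within the next frame".

```python
def latency_budget_ms(sensor_latency_ms, frame_rate_hz):
    """Worst-case motion-to-display latency: sensor delay plus one full frame."""
    frame_time_ms = 1000.0 / frame_rate_hz
    return sensor_latency_ms + frame_time_ms


SENSOR_LATENCY_MS = 25.0       # Myo armband latency reported in the text [40]
ASSUMED_FRAME_RATE_HZ = 60.0   # assumption: not stated in the chapter

total_ms = latency_budget_ms(SENSOR_LATENCY_MS, ASSUMED_FRAME_RATE_HZ)

# Lowest frame rate at which the total still stays within 50 ms:
# 25 ms + 1000 / f <= 50 ms  =>  f >= 40 Hz.
min_frame_rate_hz = 1000.0 / (50.0 - SENSOR_LATENCY_MS)
```

Under these assumptions the total is roughly 42 ms, which is consistent with the stated bound of 50 ms. Any rendering rate of 40 Hz or more would keep the estimate within that bound.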
15.6 Conclusion
In this paper, we explored how conventional mirror therapy can be reflected and extended in a mixed reality approach using the HoloLens 2. From a technical perspective, immersion could be increased by creating a spatially coherent experience in which the virtual and real worlds interact responsively with each other, supported by haptic feedback. The virtual and the real hand could act independently of each other or together. Players could move around freely and explore their surroundings actively and safely, driven by intrinsic motivation and curiosity. Addressing complex health-related issues that affect quality of life, such as PLP, through novel technology requires interdisciplinary teamwork among therapists, engineers, and researchers. To gain further insight into the impact of XR mirror therapy, we plan to conduct a four-week intervention study using the application four days per week to compare the intensity, frequency, and quality of PLP and embodiment. Currently, PhantomAR is exclusively available for transradial (forearm) amputees, but in the future, we plan to extend it to transhumeral (upper arm) amputees as well.
References
1. Trojan, J. et al.: An augmented reality home-training system based on the mirror training and imagery approach. Behav. Res. Methods. (2014)
2. Mayer, Á., Kudar, K., Bretz, K., Tihanyi, J.: Body schema and body awareness of amputees. Prosthetics Orthot. Int. 32(3), 363–382 (2008)
3. Clark, R.L., Bowling, F.L., Jepson, F., Rajbhandari, S.: Phantom limb pain after amputation in diabetic patients does not differ from that after amputation in nondiabetic patients. Pain 154(5), 729–732 (2013)
4. Flor, H.: Phantom-limb pain: characteristics, causes, and treatment. Lancet Neurol. 1(3), 182–189 (2002)
5. Rothgangel, A., Braun, S., Smeets, R., Beurskens, A.: Feasibility of a traditional and teletreatment approach to mirror therapy in patients with phantom limb pain: a process evaluation performed alongside a randomized controlled trial. Clin. Rehabil. 33(10), 1649–1660 (2019)
6. Richardson, C., Crawford, K., Milnes, K., Bouch, E., Kulkarni, J.: A clinical evaluation of postamputation phenomena including phantom limb pain after lower limb amputation in dysvascular patients. Pain. Manag. Nurs. 16(4), 561–569 (2015)
7. Perry, B.N. et al.: Clinical trial of the virtual integration environment to treat phantom limb pain with upper extremity amputation. Front. Neurol. 9(9) (Sept 2018)
8. Rothgangel, A., Bekrater-Bodmann, R.: Mirror therapy versus augmented/virtual reality applications: towards a tailored mechanism-based treatment for phantom limb pain. Pain Manag. 9(2), 151–159 (March 2019)
9. Foell, J., Bekrater-Bodmann, R., Diers, M., Flor, H.: Mirror therapy for phantom limb pain: brain changes and the role of body representation. Eur. J. Pain 18(5), 729–739 (2014)
10. Tsao, J., Ossipov, M.H., Andoh, J., Ortiz-Catalan, M.: The stochastic entanglement and phantom motor execution hypotheses: a theoretical framework for the origin and treatment of phantom limb pain. Front. Neurol. 9, 748 (2018). www.frontiersin.org
11. Moseley, L.G., Gallace, A., Spence, C.: Is mirror therapy all it is cracked up to be? Current evidence and future directions. Pain 138(1), 7–10 (2008)
12. Dunn, J., Yeo, E., Moghaddampour, P., Chau, B., Humbert, S.: Virtual and augmented reality in the treatment of phantom limb pain: a literature review. NeuroRehabilitation 40(4), 595–601 (2017)
13. Thøgersen, M., Andoh, J., Milde, C., Graven-Nielsen, T., Flor, H., Petrini, L.: Individualized augmented reality training reduces phantom pain and cortical reorganization in amputees: a proof of concept study. J. Pain 21(11–12), 1257–1269 (2020)
14. Boschmann, A., Neuhaus, D., Vogt, S., Kaltschmidt, C., Platzner, M., Dosen, S.: Immersive augmented reality system for the training of pattern classification control with a myoelectric prosthesis. J. Neuroeng. Rehabil. 18(1), 1–15 (2021)
15. Andrews, C., Southworth, M.K., Silva, J.N.A., Silva, J.R.: Extended reality in medical practice. Curr. Treat. Options Cardio. Med. 21, 18 (1936)
16. Ortiz-Catalan, M., et al.: Phantom motor execution facilitated by machine learning and augmented reality as treatment for phantom limb pain: a single group, clinical trial in patients with chronic intractable phantom limb pain. Lancet 388(10062), 2885–2894 (2016)
17. Lendaro, E., Middleton, A., Brown, S., Ortiz-Catalan, M.: Out of the clinic, into the home: the in-home use of phantom motor execution aided by machine learning and augmented reality for the treatment of phantom limb pain. J. Pain Res. 13, 195–209 (2020)
18. Bach, F., et al.: Using Interactive Immersive VR/AR for the Therapy of Phantom Limb Pain. Hc’10 Jan, pp. 183–187 (2010)
19. Ambron, E., Miller, A., Kuchenbecker, K.J., Buxbaum, L.J., Coslett, H.B.: Immersive low-cost virtual reality treatment for phantom limb pain: evidence from two cases. Front. Neurol. 9, 67 (2018)
20. Markovic, M., Karnal, H., Graimann, B., Farina, D., Dosen, S.: GLIMPSE: Google glass interface for sensory feedback in myoelectric hand prostheses. J. Neural. Eng. 14(3) (2017)
21. Tepper, O.M., et al.: Mixed reality with hololens: where virtual reality meets augmented reality in the operating room. Plast. Reconstr. Surg. 140(5), 1066–1070 (2017)
22. Saito, K., Miyaki, T., Rekimoto, J.: The method of reducing phantom limb pain using optical see-through head mounted display. In: 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 1560–1562 (2019)
23. Lin, G., Panigrahi, T., Womack, J., Ponda, D.J., Kotipalli, P., Starner, T.: Comparing order picking guidance with microsoft hololens, magic leap, google glass XE and paper. In: Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications, vol. 7, pp. 133–139 (2021)
24. Gorisse, G., Christmann, O., Amato, E.A., Richir, S.: First- and third-person perspectives in immersive virtual environments: presence and performance analysis of embodied users. Front. Robot. AI 4, 33 (2017)
25. Nishino, W., Yamanoi, Y., Sakuma, Y., Kato, R.: Development of a myoelectric prosthesis simulator using augmented reality. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1046–1051 (2017)
26. Ortiz-Catalan, M., Sander, N., Kristoffersen, M.B., Håkansson, B., Brånemark, R.: Treatment of phantom limb pain (PLP) based on augmented reality and gaming controlled by myoelectric pattern recognition: a case study of a chronic PLP patient. Front. Neurosci. 8(8), 1–7 (Feb 2014)
27. Sharma, A., Niu, W., Hunt, C.L., Levay, G., Kaliki, R., Thakor, N.V.: Augmented reality prosthesis training setup for motor skill enhancement (March 2019)
28. Tatla, S.K., et al.: Therapists’ perceptions of social media and video game technologies in upper limb rehabilitation. JMIR Serious Games 3(1), e2 (2015)
29. Lohse, K., Shirzad, N., Verster, A., Hodges, N.: Video games and rehabilitation: using design principles to enhance engagement in physical therapy, pp. 166–175 (2013)
30. Arya, K.N., Pandian, S., Verma, R., Garg, R.K.: Movement therapy induced neural reorganization and motor recovery in stroke: a review. J. Bodyw. Mov. Ther. (2011)
31. Primack, B.A., et al.: Role of video games in improving health-related outcomes: a systematic review. Am. J. Prev. Med. (2012)
32. Kato, P.M.: Video games in health care: closing the gap. Rev. Gen. Psychol. (2010)
33. Gamberini, L., Barresi, G., Majer, A., Scarpetta, F.: A game a day keeps the doctor away: a short review of computer games in mental healthcare. J. Cyber Ther. Rehabil. (2008)
34. Gentles, S.J., Lokker, C., McKibbon, K.A.: Health information technology to facilitate communication involving health care providers, caregivers, and pediatric patients: a scoping review. J. Med. Internet Res. (2010)
35. Johnson, D., Deterding, S., Kuhn, K.A., Staneva, A., Stoyanov, S., Hides, L.: Gamification for health and wellbeing: a systematic review of the literature. Internet Interv. (2016)
36. Ijsselsteijn, W.A., Kort, Y.A.W.D., Poels, K.: The game experience questionnaire. In: Johnson, M.J., VanderLoos, H.F.M., Burgar, C.G., Shor, P., Leifer, L.J. (eds) Eindhoven, vol. 2005, no. 2013, pp. 1–47 (2013)
37. Bekrater-Bodmann, R.: Perceptual correlates of successful body–prosthesis interaction in lower limb amputees: psychometric characterisation and development of the prosthesis embodiment scale. Sci. Rep. 10(1), (Dec 2020)
38. Prahm, C., Schulz, A., Paaßen, B., Aszmann, O., Hammer, B., Dorffner, G.: Echo state networks as novel approach for low-cost myoelectric control. In: Artificial Intelligence in Medicine: 16th Conference on Artificial Intelligence in Medicine, AIME 2017, June 21–24, 2017, Proceedings, no. Exc 277, Vienna (pp. 338–342). Austria, Springer (2017)
39. Harris, A.J.: Cortical origin of pathological pain. Lancet 354(9188), 1464–1466 (1999)
40. Nyomen, K., Romarheim Haugen, M., Jensenius, A.R.: MuMYO—evaluating and exploring the MYO armband for musical interaction. Proceedings International Conference New Interfaces Musical Expression (2015)
Chapter 16
An Analysis of Trends and Problems of Information Technology Application Research in China’s Accounting Field Based on CiteSpace
Xiwen Li, Jun Zhang, Ke Nan, and Xiaoye Niu

Abstract Using CiteSpace software, with the number of publications, the main authors and institutions, the research topics, and the research fronts as indexes, we conduct a text mining and visual analysis of the existing literature in the domestic CNKI database from 2000 to 2020. In step with the development of practice, the volume of research literature on the application of information technology in accounting has increased year by year, but its quality has not improved. Big data, management accounting, financial sharing, cloud accounting, and blockchain technology have been in the spotlight of recent research, and the Ministry of Finance, the National Accounting Institute, and financial support have played an important role. However, challenges remain, such as a lack of cross-institutional and cross-regional cooperation among scholars, limited research on the accounting informatization of SMEs, and inadequate literature on accounting education. Strengthening guidance and support and promoting cooperation and exchanges can continuously advance both theoretical research and practical innovation in the application of information technology in the field of accounting.
16.1 Questions Posed
As science and technology have rapidly developed, accounting is no longer purely manual. As a result of the application of information technology in accounting, scholars are investigating the subject of accounting modernization, from the initial research on accounting computerization to the current trend of accounting informatization. By integrating theoretical research and practical application, technology
X. Li · J. Zhang · K. Nan (B) · X. Niu
School of Accounting, Hebei University of Economics and Business, Hebei, China
J. Zhang
Hebei Zhongmei Xuyang Energy Co. LTD, Hebei, China
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_17
X. Li et al.
for accounting is being applied more effectively. With the development of information technology, the accounting environment is undergoing tremendous change, and practice places higher requirements on accountants in the new era. In particular, the continuous improvement of big data technology brings new challenges to accounting work. How scholars should integrate information technology and accounting practice in their research has become a topic of common concern in the academic and practical communities. In this study, CiteSpace software is used to conduct text mining and visual analysis of manuscripts from domestic core journals and the CSSCI database from 2000 to 2020 that contain research on the application of information technology in the accounting field, in order to determine the context and dynamics of current research.
16.2 Text Mining and Visual Analysis of Research Trends in Information Technology in the Field of Accounting in China
16.2.1 Research Tools
CiteSpace, a statistical and information visualization software program developed by Professor Chen Chaomei of the College of Computing and Informatics, Drexel University, USA, is used for the analysis [1]. By analyzing the number of articles, keywords, authors, institutions, and time distribution, the study reveals the current status, hotspots, and directions of the information technology literature in the field of accounting, explores existing problems, and suggests potential future directions.
16.2.2 Source of Data
The literature selected for this paper comes from the CNKI database. To ensure the representativeness and authority of the selected data, the literature source is restricted to Peking University core journals and the CSSCI database through the advanced search function. The collected content includes keywords, authors, institutions, article titles, publication time, publications, and abstracts. The search themes include financial sharing, big data accounting, Internet accounting, accounting computerization, accounting informatization, accounting cloud computing, accounting intelligence, blockchain,
and artificial intelligence. The retrieval date is February 1, 2021, and the time span covered is 2000–2020. In total, 5501 documents were retrieved and imported into the software. After duplicates were removed through the data module, 4136 valid documents remained, cited 45,658 times in total, with an average citation frequency of 11.04.
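The deduplication and averaging steps above can be illustrated with a minimal Python sketch. It is not the data module of CiteSpace: the function names, the record layout, and the matching rule (same normalized title and year) are assumptions made only for illustration. The last line reproduces the average citation frequency from the totals reported in the text.

```python
def deduplicate(records):
    """Keep the first record for each (normalized title, year) pair.

    The matching rule is an assumption for illustration; the actual
    rule used by the software's data module is not stated in the paper.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (" ".join(rec["title"].lower().split()), rec["year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def average_citations(total_citations, n_documents):
    """Average citation frequency, rounded to two decimals."""
    return round(total_citations / n_documents, 2)


# Hypothetical toy records, used only to show the deduplication step.
sample = [
    {"title": "Cloud Accounting for SMEs", "year": 2015},
    {"title": "cloud  accounting for smes", "year": 2015},  # duplicate entry
    {"title": "Big Data Audit", "year": 2018},
]
print(len(deduplicate(sample)))        # 2
print(average_citations(45658, 4136))  # 11.04
```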
16.2.3 Text Mining and Visual Analysis of Related Research Based on Classification
Visual Analysis of the Publication Volume. As shown in Fig. 16.1, the number of applied studies on information technology in accounting has increased rapidly year by year, indicating that, with the development of information technology, related theoretical research is also receiving attention from academic circles. The total number of publications has increased significantly, and non-core publications follow roughly the same growth trend. However, the number of papers published in Peking University core and CSSCI journals has not changed significantly, which suggests that the quality of research is not improving markedly. This may be related to the fact that empirical research is more prevalent in core journals. At the same time, related research topics have generated new branches; especially after 2013, topics such as financial sharing, cloud accounting, big data, and intelligent finance have emerged. For these branches, too, the number of articles published in core journals shows no significant change. Compared with accounting computerization and accounting informatization, the number and quality of articles on the various branch topics are insufficient at this stage. In conclusion, from 2000 to 2020, the number of applied research papers on information technology in the accounting field has increased year by year, but the quality of the research should be improved.
Fig. 16.1 Statistics of the number of articles issued (vertical axis: number of articles issued; horizontal axis: year, 2000–2020; series label recovered: All Journals)
Visual Analysis Based on Research Themes. It should be noted that the threshold algorithm is set in the initial processing parameters of CiteSpace, and c (minimum citations), cc (co-citations in this slice), and ccv (normalized co-citations)
are, respectively, (4, 4, 20), (6, 5, 20), (6, 5, 20), and the time slice is one year. Analysis of the keywords in the literature data yields 114 nodes (N = 114), 199 links (E = 199), and a network density of 0.0309. Restricting the display to keywords with a frequency of at least 30 gives the co-occurrence network (Fig. 16.2) and the time zone view (Fig. 16.3) of the main keywords. In the keyword co-occurrence network map, each node represents a keyword, and the size of the node reflects the frequency of that keyword. The color of the node reflects the publication date of the document in which the keyword appears, with a darker color representing an earlier publication date.
Analysis of Co-occurrence of Keywords. Figures 16.2 and 16.3 show that, over the past 20 years, accounting computerization and accounting informatization have occupied the top two positions in China's research, with centralities of 0.32 and 0.18, respectively. Broken down by period, the two themes were at the core of research in 2001–2010, after which the research focus shifted toward big data accounting. Relevant research has moved from the study of accounting software to accounting information systems and internal control, and then to the combined application of information technology and management accounting. This has promoted the rapid transformation of accounting from traditional bookkeeping functions to management and service functions. Management accounting is closely related to informatization, and the management accounting boom is to some extent driven by national policies. In 2013, the Ministry of Finance designated management accounting as an important direction of accounting reform, and in 2014 it issued the “Guiding Opinions on Comprehensively Promoting the Construction of Management Accounting System,” which ushered in a period of rapid development for management accounting [2].
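The keyword co-occurrence network described above (nodes are keywords, node size is keyword frequency, and a link joins two keywords that appear in the same article) can be sketched in a few lines of Python. This is an illustrative reconstruction, not CiteSpace's implementation: the function name, the `min_freq` parameter, and the toy keyword lists are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists, min_freq=1):
    """Build a keyword co-occurrence network.

    Returns (nodes, edges): nodes maps keyword -> number of articles that
    use it; edges maps an alphabetically ordered keyword pair -> number of
    articles in which both keywords appear. Keywords used in fewer than
    min_freq articles are dropped before edges are counted.
    """
    freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, n in freq.items() if n >= min_freq}
    edges = Counter()
    for kws in keyword_lists:
        present = sorted(set(kws) & kept)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    nodes = {kw: freq[kw] for kw in kept}
    return nodes, edges


# Hypothetical keyword lists, one list per article.
papers = [
    ["accounting informatization", "internal control"],
    ["accounting informatization", "big data", "management accounting"],
    ["big data", "management accounting"],
    ["accounting computerization"],
]
nodes, edges = cooccurrence(papers, min_freq=2)
print(nodes)
print(dict(edges))
```

With `min_freq=2`, only the three keywords used in at least two toy articles remain as nodes, which mirrors the frequency threshold applied before drawing the maps.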
Before 2012, the focus of management accounting research was cost control and theoretical exploration. After 2012, with the rise of big data and artificial intelligence, the content of management accounting research was gradually enriched, and in 2014 management accounting became a national strategy [3]. The powerful data mining, processing, and analysis capabilities of big data technology have expanded the information sources of management accounting, enabling it to unearth potential value from data, enhance the competitiveness of enterprises, and maximize their benefits [4, 5]. Based on the above analysis, big data and management accounting are at the core of research at this stage, echoing the development of practice.
Timeline Chart Analysis. For the first stage (2001–2010), the research themes of papers in domestic core journals and CSSCI were examined. The initial processing parameters of CiteSpace were set to k = 25, a time period of 2001–2010, and a time slice of one year; the timeline mode was chosen in the visualization options to draw a timeline of keyword co-occurrence clusters. According to Fig. 16.4, the graph has 487 nodes divided into eight clusters, with a modularity value of 0.4321 and a silhouette value of 0.7434. Clusters 1, 3, 5, and 8 cover computerized accounting topics. The main keywords are computerized accounting, computerized auditing, computerized accounting system, accounting data, accounting software, accounting center, office automation, commercialized accounting software, accounting reform, computerized
Fig. 16.2 Co-occurrence network of keywords
Fig. 16.3 Keyword time zone network
Fig. 16.4 2001–2010 timeline for keyword clusters
accounting, accounting computerization major, comprehensive professional ability, practical teaching, accounting computerization teaching, and applied talents. Among these keywords, accounting computerization, accounting computerization system, accounting data, accounting software, and accounting computerization major have strong centrality. In Fig. 16.4, each node represents a keyword, and the size of the node indicates its frequency. The color of a node indicates the date when the keyword appeared in an article, with a darker color indicating an earlier date. The lines between nodes represent co-occurrences of keywords in the same article. Years run from left to right on the timeline, and the position of a node indicates the date when the keyword first appeared. Cluster 0 is accounting informatization. The main topics are accounting informatization, information system, network technology, business process reorganization, information society, internal control, information technology, accounting information system, accounting system, accounting information resources, value chain accounting management, information system reconstruction, intelligent agent, intelligent financial software, and data mining technology, among which accounting informatization, internal control, accounting information system, and information technology have strong centrality. At this stage, accounting computerization and accounting informatization are at the core of research. China’s accounting computerization originated in the 1980s and completed the transition from “accounting” to “management” at the beginning of the twenty-first century; ERP and other management accounting software then pushed accounting computerization into a new stage.
According to Liu Qin [6], “the sign of accounting informatization lies in the widespread use of ERP systems.” With the rapid development of “big data, intelligence, mobile Internet, and cloud computing,” research has gradually evolved from accounting computerization to accounting informatization and then to accounting intelligence. For the second stage (2011–2020), the research themes of papers in domestic core journals and CSSCI were examined. CiteSpace was configured with the g-index algorithm and k = 25, a time interval of 2011–2020, a time slice of one year, and the timeline mode in the visualization options, in order to build a timeline of keyword co-occurrence clusters and analyze the evolution of research themes and the interdependence between them. According to Fig. 16.5, the graph covers 499 nodes, which are classified into seven major clusters with a modularity value of 0.4617 and a silhouette value of 0.7545. In Cluster 0, which is the largest, the keywords include accounting informatization, cloud computing, SaaS, IaaS, PaaS, cloud accounting, financial sharing, Internet era, digital financial transformation, Internet+, teaching model, big data accounting, and big data intelligence, among which accounting informatization, cloud accounting, and big data accounting have greater centrality. Big data technology is the basis for developing intelligent finance, but the level of data storage and data processing is not yet high, and a large research area remains to be explored [7]. Under the influence of big data technology, accountants should become experts in data mining and visualization, and future analysis and display methods will consist not only of financial statements and texts but also of more intuitive
Fig. 16.5 2011–2020 timeline for keyword clusters
data graphs; simple and repetitive positions will be replaced by accounting systems, and accountants will be transformed into data analysts [8]. Cluster 1 is the CPA industry, which focuses on the development of the accounting profession, the training of accounting talents, and the qualifications of accountants and related policies. The main keywords include accounting information system, accounting management work, accounting firm, accounting service market, international financial reporting standards, certified public accountant industry, the accounting industry, accounting information standard system, small and medium accounting firms, and non-auditing businesses. The application of information technology in the accounting field has stimulated the development of the certified public accountant industry, and research on auditing technology and methods is also a hot topic in the new era. For instance, Xu Chao classified auditing into three stages: computer-assisted audit, network audit, and big data audit [9]. Chen Wei et al. applied text mining and visual analysis based on big data technology to the area of auditing, opening up a new field of research [10]. Cluster 2 focuses on computerized accounting. Its keywords are computerized accounting, reporting system, teaching video, vocational college, accounting major, teaching reform, and open education. The number of studies on computerized accounting has now fallen significantly, and the finance function has developed from bookkeeping toward service and is moving toward digitization and artificial intelligence [11]. Big data, financial sharing, and artificial intelligence are gradually being applied to the accounting field, and centralization is gradually being achieved. Cluster 3 is dedicated to shared services.
The main keywords are shared service center, shared service model, national audit quality, audit quality, transaction rules, and risk management and control, among which shared service center, shared service model, and risk management have higher betweenness centrality. Financial sharing has been a hot topic in recent years, although Haier Group began to explore a strategy of financial information sharing as early as 1998 [12]. At that time, owing to technical limitations, development was not yet mature, and related research did not attract national attention. In the new stage, accounting informatization is becoming increasingly mature, and financial sharing is no longer a simple online analysis tool. Its value in optimizing organizational structure, streamlining processes, and reducing costs has been recognized. Currently, financial sharing is widely used by large group companies and state-owned enterprises, and there is still room for other technologies to be embedded within it. For example, combining RPA technology and OCR scanning technology with financial sharing can dramatically enhance the automation of corporate finance work, reducing both the human error rate and the operating costs of the enterprise [13]. Cluster 5 is blockchain, and the main keywords are smart contract, consensus mechanism, expendable biological assets, database technology, blockchain technology, business and finance integration, data mining, and surplus manipulation, among which those with higher betweenness centrality are smart contract, blockchain technology,
and consensus mechanism. Although blockchain has been widely applied to improve information quality, most of the research focuses on audit investigation, and its value for enterprise management has yet to be explored. The application of blockchain in financial sharing can promote the financial intelligence of enterprises, the globalization of management and control, shared services, and the integration of business and finance [14]. In the second stage, accounting informatization research has evolved from the study of concepts and systems to the application of information technology. In the accounting field, the number of applied studies on big data, financial sharing, blockchain, robotic process automation, and other technologies has increased dramatically, and their content and results have improved as well. In management accounting research, the application of different information technologies combined with management has enriched the work of finance staff and promoted their transformation from simple bookkeeping to enterprise management [15], which has a far-reaching impact on accounting research and practice.
Analysis of the Emergence of Research Frontiers. Emergent (burst) keywords are commonly used to analyze the frontier or trends of a research field. As seen in Table 16.1, among the 25 keywords extracted in this paper, seven have an emergence degree greater than 20; in descending order, they are accounting computerization (110.37), big data (62.5), management accounting (37.74), blockchain (37.51), cloud accounting (33.77), financial sharing (24.38), and industry-financial integration (20.23). During 2001–2008, computerized accounting was the core theme of research, with an emergence degree of 110.37, followed by accounting software and accounting information systems. The CPA profession became a hot topic in 2009 and remained one until 2016.
With the rapid application of information technology to the accounting industry, cloud accounting in 2013, big data and management accounting in 2014, and blockchain, financial sharing, and industry-finance integration in 2017 became hotspots in turn, and their emergence degree remained at a high level as of 2020.
Visual Analysis Based on the Lead Authors and Institutions. CiteSpace was set to use the g-index algorithm with k = 25 in the initial processing parameters, with a time slice of one year. Authors and institutions in the literature were analyzed together, and the initial results show 967 nodes (N = 967), 861 links (E = 861), and a network density of 0.0018. Restricting the display to nodes with a frequency of at least 10 gives the co-occurrence network of the main authors and institutions. In the co-occurrence diagram of authors and institutions (Fig. 16.6), each node represents an author or a research institution. The size of the node reflects the number of published articles; the color of the node reflects the time of publication, with a darker color indicating earlier publication; the links between nodes reflect cooperation among authors, between authors and institutions, and among institutions; and the thickness of a link reflects the closeness of the cooperation.
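The density values reported for the keyword network and for the author and institution network follow directly from their node and link counts. The sketch below assumes the standard density of an undirected simple graph, 2E / (N(N − 1)); the function name is ours, and the code is a consistency check on the reported figures rather than CiteSpace's own routine.

```python
def network_density(n_nodes, n_links):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_links / (n_nodes * (n_nodes - 1))


# Keyword co-occurrence network: N = 114, E = 199.
print(round(network_density(114, 199), 4))  # 0.0309
# Author and institution network: N = 967, E = 861.
print(round(network_density(967, 861), 4))  # 0.0018
```

Both results match the densities stated in the text, which supports reading E as the number of links in each network.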
Table 16.1 Research frontier keyword emergence degree (period covered: 2001–2020)
Keywords                                       Prominence   Start   End
Accounting computerization                     110.37       2001    2008
Computerized accounting                        18.96        2001    2008
Computerization                                18.63        2001    2008
Accounting computerization system              18.23        2001    2007
Accounting software                            13.98        2001    2004
Accounting information system                  14.64        2005    2006
CPA industry                                   13.25        2009    2016
Accounting information technology              11.69        2010    2012
XBRL                                           11.21        2010    2016
Shared service center                          17.51        2012    2020
Cloud accounting                               33.77        2013    2020
Cloud computing                                18.82        2013    2018
Big data                                       62.5         2014    2020
Management accounting                          37.74        2014    2020
Big data era                                   11.52        2014    2020
Financial shared service center                19.82        2015    2020
Management accounting information technology   11.34        2015    2020
Financial shared services                      18.46        2016    2020
Internet+                                      12.39        2016    2020
Shared services                                11.11        2016    2020
Blockchain                                     37.51        2017    2020
Financial sharing                              24.38        2017    2020
Industry and finance integration               20.23        2017    2020
Blockchain technology                          15.15        2017    2020
Artificial intelligence                        19.37        2018    2020
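The selection of the strongest frontier keywords from Table 16.1 can be reproduced with a short filter. The sketch below transcribes the table into a Python list (the variable and function names are ours) and keeps the keywords whose prominence exceeds a threshold, sorted in descending order.

```python
# Rows of Table 16.1: (keyword, prominence, start year, end year).
BURSTS = [
    ("Accounting computerization", 110.37, 2001, 2008),
    ("Computerized accounting", 18.96, 2001, 2008),
    ("Computerization", 18.63, 2001, 2008),
    ("Accounting computerization system", 18.23, 2001, 2007),
    ("Accounting software", 13.98, 2001, 2004),
    ("Accounting information system", 14.64, 2005, 2006),
    ("CPA industry", 13.25, 2009, 2016),
    ("Accounting information technology", 11.69, 2010, 2012),
    ("XBRL", 11.21, 2010, 2016),
    ("Shared service center", 17.51, 2012, 2020),
    ("Cloud accounting", 33.77, 2013, 2020),
    ("Cloud computing", 18.82, 2013, 2018),
    ("Big data", 62.5, 2014, 2020),
    ("Management accounting", 37.74, 2014, 2020),
    ("Big data era", 11.52, 2014, 2020),
    ("Financial shared service center", 19.82, 2015, 2020),
    ("Management accounting information technology", 11.34, 2015, 2020),
    ("Financial shared services", 18.46, 2016, 2020),
    ("Internet+", 12.39, 2016, 2020),
    ("Shared services", 11.11, 2016, 2020),
    ("Blockchain", 37.51, 2017, 2020),
    ("Financial sharing", 24.38, 2017, 2020),
    ("Industry and finance integration", 20.23, 2017, 2020),
    ("Blockchain technology", 15.15, 2017, 2020),
    ("Artificial intelligence", 19.37, 2018, 2020),
]


def strongest(bursts, threshold):
    """Keywords with prominence above threshold, strongest first."""
    selected = [row for row in bursts if row[1] > threshold]
    return sorted(selected, key=lambda row: row[1], reverse=True)


for keyword, prominence, start, end in strongest(BURSTS, 20):
    print(f"{keyword}: {prominence} ({start}-{end})")
```

Applied to the 25 rows with a threshold of 20, the filter returns seven keywords, led by accounting computerization and big data.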
In Fig. 16.6, the author with the most papers is Professor Cheng Ping, with his team from the School of Accounting at the Chongqing University of Technology, whose research direction is the application of big data technology to accounting [16]. He is followed by Zhang Qinglong from Beijing National Accounting Institute [17], whose research direction is financial sharing, and Liu Yuting from the Ministry of Finance, whose research focuses on accounting reform in China [18], and then by Wang Jun, Yang Jie, Huang Changyong, Ding Shuqin, Ying Limeng, and Liu Qin. Among the four major research groups in the field of accounting informatization, the School of Accounting of the Chongqing University of Technology is the most active. The Accounting Department of the Ministry of Finance, Beijing National Accounting Institute, and Shanghai National Accounting Institute are also important research camps. Figure 16.7 shows that, apart from the National Natural Science Foundation of China and the National Social Science Foundation of China, the Chongqing Municipal Education Commission funds far more of the literature than other funding bodies, indicating that it has paid considerable attention to applying technology to accounting. In summary, the Ministry of Finance, the National Accounting Institutes, and funding support have played a major role in advancing this research. However, the cooperation network of Chinese accounting scholars remains primarily internal, and the lack of cross-institutional and cross-regional cooperation has had an adverse effect on its progress.
Fig. 16.6 Co-presence network of major authors and institutions
Fig. 16.7 Distribution of funds supporting research (bar chart by funding source; axis: number of literatures (articles))
16.3 A Review of Major Issues Discovered by Research and Analysis Regarding the Application of Information Technology in Accounting
The previous analysis found that the number of relevant studies is broadly consistent with the trend of practice, but the quality of research has not kept up, and the lack of cooperation across institutions and regions has become a weakness of current research. Further study revealed two other prominent problems.
16.3.1 Few Studies Have Been Conducted on the Application of Information Technology to Accounting in SMEs
The level of informatization in small and medium-sized enterprises (SMEs) is low, and few scholars have studied these enterprises in depth. Keyword analysis found that SMEs appeared 51 times, accounting for only 1.18% of the total number of documents. The studies differ little in content and conclusions, focusing mainly on low capital investment, backward software and hardware, lack of talent, and insufficient attention from managers [19]. How to ensure that SMEs have enough funds for modernization, and how to attract interdisciplinary talent to design and develop financial informatization systems that match the development of SMEs, are therefore topics that need urgent attention and research. SMEs account for 99% of the enterprises in China and are a driving force behind the continued positive development of the national economy. The report of the 19th Party Congress clearly calls to “deepen the reform of the science and technology system, establish a technology innovation system with enterprises as the main body, market-oriented, and deep integration of industry, academia, and research, strengthen the support for SMEs’ innovation, and promote the transformation of scientific and technological achievements.” Limited by financing difficulties, simple organizational structures, lack of talent, and other factors, SMEs in China remain relatively backward in applying information technology to accounting; in particular, the level of application of management accounting informatization is low and its role is limited [20].
16.3.2 The Number and Quality of Information-Based Accounting Education Research Is Low and Declining
The construction and application of accounting information technology require high-quality personnel training, and higher education plays an important role in training informatization talent. Combining 22 keywords such as “accounting education,” “accounting teaching,” “practical teaching,” and “training students” into one theme, “accounting education,” yielded a total of 176 articles, roughly 4% of the 4136 valid documents. This small share suggests that information technology in accounting education does not receive enough attention. Moreover, over the past 20 years of exploring information-based accounting education, most of the content has concerned accounting computerization and ERP curriculum design. Flipped classrooms and MOOCs have been proposed many times [21], but innovative education models have rarely been explored. Figure 16.8 shows that the number of studies on accounting education is low and decreasing. In addition, when the core-journal articles were counted, 13 of the 176 accounting education articles turned out to be conference reviews or book reviews, so the quality of this research still needs to be improved. This is broadly consistent with the finding of Nian Yan [22] that “the number of core journals on accounting talent training is also decreasing and the quality of papers is declining.”
Fig. 16.8 Statistics on the number of accounting education articles issued (axes: number of articles issued, by year, 2006–2020; series: accounting education, total number of articles)
16.4 Conclusions and Recommendations of the Research on the Application of Information Technology in Accounting
16.4.1 Conclusions of the Research
First, the volume of literature on the application of information technology in the field of accounting is rising, but the number of papers published in Peking University core journals and CSSCI has not changed significantly, and the quality of research on emerging technologies in accounting is lower than that of research from the computerized accounting period. The quality of relevant research therefore needs to be further improved.
Second, the research themes and hotspots of information technology application in the accounting field change visibly with the development of information technology. During 2001–2011, accounting computerization and accounting informatization were at the core of research on information technology in accounting, and their literature quantity, centrality, and prominence were much higher than those of other topics. As information technology matured, big data accounting became the hottest topic in 2013–2020, followed by financial sharing, cloud accounting, and blockchain. Overall, big data and management accounting are at the core of research at this stage: big data has opened up new paths for management accounting research, and management accounting innovation has become a hot spot for current and future research.
Third, the Ministry of Finance, the National Accounting Institutes, and fund-supported literature have contributed significantly to research on the application of information technology in the field of accounting, which demonstrates the importance and leadership of the state in promoting this application. However, the research is mostly confined within individual institutions, and the lack of cross-institutional and cross-regional cooperation limits its breadth and depth.
Fourth, research on the application of information technology in accounting for SMEs, and on the combination of information technology and accounting education, is clearly insufficient in quantity and generally low in quality, and urgently needs guidance and attention.
16.4.2 Research Recommendations for Advancing the Application of Information Technology in Accounting
The 14th Five-Year Plan for Accounting Reform and Development has identified “the application of new information technology to basic accounting work, managerial accounting practice, financial accounting work, and the construction of unit financial accounting information systems” as the main subject of research. To better promote research on information technology application and enhance the integration of theoretical research and practical innovation, government departments, application entities, and research institutions must work together.
First, government departments should continue to lead research on applying information technology in accounting. They should increase fund support, pay particular attention to improving the quality of research results, and give more attention to the application of information technology in accounting for small and medium-sized enterprises, as well as to the combination of information technology and accounting education. At the same time, government departments should attach great importance to improving the soft power of enterprises for sustainable development by enhancing management accounting systems and internal control mechanisms. The application of information technology in the field of accounting should not be limited to a particular enterprise, unit, or industry. Only by raising it to the theoretical level through systematic research, and forming a scientific theoretical system for effectively combining information technology and accounting, can the height and depth of accounting informatization construction be truly advanced and the positive role of accounting in enterprise management and even economic construction be brought into full play.
Second, accounting scholars should actively expand the scope of cooperation, strengthen cooperation with government departments and enterprises, and make full use of cross-institutional and cross-discipline collaboration to solve practical and difficult problems in applying information technology to accounting. In this way, they can move beyond their own narrow vision and develop a new pattern in which the application of accounting information technology and theoretical innovation advance together.
Finally, government departments should also give greater weight to research on accounting education informatization and to the transformation of its results, and should continue to improve the collaborative education mechanism between industry, academia, and research. Both the supply and demand sides of accounting informatization talent training should raise awareness and strengthen communication. The development of the digital economy has steadily raised the requirements for the training of accounting professionals. These requirements include improving the comprehensive ability of the teaching staff to apply information technology to accounting teaching and promoting the transformation of the training model. Meeting these requirements calls for scientifically promoting collaboration and exchanges between businesses, schools, and research institutions, and for enhancing the ability to develop the theoretical and applied
integration of talent. The above measures are conducive to cultivating higher-quality accounting informatization talent for society and to consolidating the human resources foundation with which accounting can support the development of the information economy. The impact of information technology application in the accounting field is far-reaching, and accounting theory research and practice innovation are equally important. Through visual analysis, this paper sorts out the development of the research, summarizes and refines the characteristics of current research and some outstanding problems, and puts forward corresponding suggestions, in the hope of attracting the attention of the academic community. Only through the joint efforts of government departments, accounting scholars, and practitioners will the future use of information technology in the field of accounting become more in-depth and productive.
References
1. Yue, C., Chaomei, C., Zeyuan, L., et al.: Methodological functions of CiteSpace knowledge graphs. Stud. Sci. Sci. 33(2), 242–253 (2015)
2. Man, W., Xiaoyu, C., Haoyang, Y.: Reflections and outlook on the construction of management accounting system in China. Finan. Acc. (22), 4–7 (2019)
3. Zhanbiao, L., Jun, B.: Bibliometric analysis of management accounting research in China (2009–2018)-based on core journals of Nanjing university. Finan. Acc. Commun. 7, 12–18 (2020)
4. Maohua, J., Jiao, W., Jingxin, Z., Lan, Y.: Forty years of management accounting: a visual analysis of research themes, methods, and theoretical applications. J. Shanghai Univ. Fin. Econ. 22(01), 51–65 (2020)
5. Ting, W., Yinghua, Q.: Exploring the professional capacity building of management accounting in the era of big data. Friends Account. 19, 38–42 (2017)
6. Qin, L., Yin, Y.: Accounting informatization in China in the forty years of reform and opening up: review and prospect. Account. Res. 02, 26–34 (2019)
7. Qinglong, Z.: Next-generation finance: digitalization and intelligence. Financ. Account. Mon. 878(10), 3–7 (2020)
8. Weiguo, L., Guangjun, L., Shaobing, P.: The impact of data mining technology on accounting and response. Financ. Account. Mon. 07, 68–74 (2020)
9. Chao, X., et al.: Research on auditing technology based on big data. J. Electron. 48(05), 1003–1017 (2020)
10. Chen, W., et al.: Research on audit trail feature mining method based on big data visualization technology. Audit Res. 201(1), 16–21 (2018)
11. Shangyong, P.: On the development and basic characteristics of modern finance. Financ. Account. Mon. 881(13), 22–27 (2020)
12. Zhijun, W.: Practice and exploration of financial information sharing in Haier group. Financ. Account. Newslett. 1, 30–33 (2006)
13. Ping, C., Wenyi, W.: Research on the optimization of expense reimbursement based on RPA in financial shared service centers. Friends Account. 589(13), 146–151 (2018)
14. Runhui, Y.: Application of blockchain technology in the field of financial sharing. Financ. Account. Mon. 09, 35–40 (2020)
15. Gang, S.: Innovation of management accounting personnel training mechanism driven by big data and financial integration. Financ. Account. Mon. 02, 88–93 (2021)
16. Ping, C., Jinglan, Z.: Performance management of financial sharing center based on cloud accounting in the era of big data. Friends Account. 04, 130–133 (2017)
17. Qinglong, Z.: Financial sharing center of Chinese enterprise group: case inspiration and countermeasure thinking. Friends Account. 22, 2–7 (2015)
18. Yuting, L.: Eight major areas of accounting reform in China are fully promoted. Financ. Account. 01, 4–10 (2011)
19. Yumei, J.: Discussion on the construction of cloud computing accounting information technology for small and medium-sized enterprises. Financ. Account. Commun. 07, 106–109 (2018)
20. Xiaoyi, L.: Research on the application of management accounting informatization in small and medium-sized enterprises in China. Econ. Res. Ref. 59, 64–66 (2016)
21. Weibing, Z., Hongjin, Z.: Exploration of the design and implementation of flipped classrooms based on effective teaching theory. Financ. Account. 04, 85–86 (2020)
22. Yan, N., Chunling, S.: Visual analysis of accounting talent training research: based on the data of CNKI from 2009–2018. Financ. Account. Commun. 15, 172–176 (2020)
Chapter 17
Augmented Reality Framework and Application for Aviation Emergency Rescue Based on Multi-Agent and Service

Siliang Liu, Hu Liu, and Yongliang Tian
Abstract Aviation emergency rescue is an efficient way to rescue and transport people and to deliver supplies. Dispatching multiple aircraft for air rescue over a large area requires systematic planning. Given the complexity of such a system, a framework is proposed for building an augmented reality system that presents the situation and assists decision-making. To apply the framework, an augmented reality simulation and monitoring system for aviation emergency rescue based on multi-agent and service architectures is implemented.
S. Liu · H. Liu · Y. Tian (B)
Beihang University, Xueyuan Road 37, Beijing, China
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8_18

17.1 Introduction

Aircraft, including fixed-wing aircraft and helicopters, offer rapid mobility, multi-type loading capability, and fewer terrain restrictions. They are increasingly used in emergency rescue. Missions such as aviation firefighting [1], aeromedical rescue [2], aviation search and rescue [3], and aviation transport can collectively be called aviation emergency rescue. With its vast land and sea areas, China has a great need for aviation emergency rescue when disasters strike. Yet maintaining a large aircraft fleet in every city is economically infeasible, so deploying and dispatching aircraft within a given area becomes a problem. Wang et al. [4] studied the deployment and dispatch of aviation emergency rescue. While methods to deploy and dispatch aircraft have been discussed, an intuitive way of displaying and commanding the process remains an open problem.

Augmented reality provides an efficient and intuitive way to present a virtual environment in the physical world. It has been applied in the aeronautic field for training and maintenance instruction [5], and models of large-scale terrain [6] and agent-based models [7] have been developed in augmented reality. Augmented reality device providers such as Microsoft and Magic Leap, and game engines such as Unity3D and Unreal, offer tools and environments that let developers build augmented reality programs with little concern for hardware adaptation and focus on the program itself.

In this paper, the advantages of augmented reality are applied to the display and command of aviation emergency rescue. A framework for large-scale aviation emergency rescue based on augmented reality is proposed. To verify the framework, a system instance was developed in Unity3D with MRTK and deployed on Microsoft’s Hololens2.
17.2 Framework of Augmented Reality System

17.2.1 System Framework

Considering the demands of large-scale aviation emergency rescue, the framework of the augmented reality system consists of two main parts: the service part and the multi-agent part. Besides the augmented reality system itself, the framework also contains a development environment part and a hardware part. The development environment includes a toolkit for developing augmented reality services, 3D modeling software for building models of aircraft and cities, and a game engine for visualizing the system. The whole system is installed on augmented reality devices through which users can view and interact with it. The framework is shown in Fig. 17.1. The service part contains services that provide functions on demand and can be called one or more times while the system is running. The multi-agent part contains the two main types of entities that are visualized in the system and instantiated in the application that applies the framework.
17.2.2 Service-Oriented Architecture

The service-oriented architecture contains services of two kinds: system basic services and scenario services. The system basic services start when the system initializes and keep running in the background. They obtain input from the user, exchange data with other systems, and keep the system’s hologram anchored to a specific place in space. The scenario services, on the other hand, are closely tied to aviation emergency rescue. Services contained in the scenario service are called only once after the system is initialized. Those
Fig. 17.1 Augmented reality system framework for aviation emergency rescue
services will function as the first step in visualizing the system or the last step when shutting it down.

System Basic Service

The system basic service contains three main services: the spatial anchor and share service, the gesture recognition service, and the network transfer service.

The spatial anchor and share service is the key service for anchoring the hologram in the real world. Since users may walk around the hologram to get a better view or walk away to attend to other matters, it is vital to maintain the hologram and keep it anchored in space. The service identifies the point or plane in space that the user specifies and holds it until a change is required.

The gesture recognition service gives the user a way to interact with the augmented reality system. It runs for as long as the system is running and acts as a port for receiving the user’s input. Users can use one or both hands to click, select, rotate, and zoom in or out on the hologram. The service monitors the positions and gestures of the user’s hands; once a gesture matches a specific pattern, the service translates it into an instruction and invokes the relevant function.

The network transfer service transfers data over the network to and from other computers to synchronize the state of the system. Its function is divided into two parts: sending data and receiving data. Receiving data can be called when the system is used to demonstrate the state
of aircraft carrying out emergency rescue tasks. Sending data can be called when the user gives instructions about where the aircraft will go and what task it will carry out.

Scenario Service

The scenario service is closely tied to the system’s specific task scenarios. It contains three services, which run before, during, and after the scene.

The mission generation service is called before the scene starts, when the system is functioning as a simulation or training system for users. It creates missions for the cities in the scene for aircraft to fulfill. The main event display service is called during the scene and acts as a notepad that records the user’s commands or hints at which mission is being accomplished. The event log service is called after the scene, when all the missions are accomplished. It records every arrangement of the aircraft, including the ID of the aircraft, the mission it carries out, and the time at which the arrangement is made. The event log service saves the log as a file, which can be used to evaluate aircraft efficiency and other indicators.
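The mission generation service can be illustrated with a small sketch. Section 17.3.2 describes it in two steps (random allocation of missions to cities, then allocation of the minimum covering resources to cities that do not need them). The Python code below is only an illustration under stated assumptions, not the Unity implementation of the system: the function name `generate_missions`, the parameters `n_missions`, `max_demand`, and `seed`, the city and resource names, and the even split among supplying cities are all hypothetical.

```python
import random


def generate_missions(cities, resource_types, n_missions, max_demand=10, seed=None):
    """Two-step mission generation; data layout and parameters are hypothetical."""
    if not 0 < n_missions < len(cities):
        raise ValueError("n_missions must leave at least one city without a mission")
    rng = random.Random(seed)
    demand = {city: {} for city in cities}
    supply = {city: {} for city in cities}

    # Step 1: allocate missions (resource demands) to randomly chosen cities.
    for city in rng.sample(cities, n_missions):
        resource = rng.choice(resource_types)
        demand[city][resource] = rng.randint(1, max_demand)

    # Step 2: the minimum supply that fulfils the demand equals the total
    # demand; spread it over the cities that do not need that resource.
    for resource in resource_types:
        total = sum(d.get(resource, 0) for d in demand.values())
        if total == 0:
            continue
        donors = [c for c in cities if resource not in demand[c]]
        share, remainder = divmod(total, len(donors))
        for i, donor in enumerate(donors):
            amount = share + (1 if i < remainder else 0)
            if amount:
                supply[donor][resource] = amount

    return demand, supply


# Usage with made-up city and resource names.
demand, supply = generate_missions(
    ["City A", "City B", "City C", "City D"], ["food", "medicine"],
    n_missions=2, seed=7,
)
```

Requiring at least one city without a mission guarantees that every demanded resource has a possible supplier, which keeps the sketch free of unsatisfiable scenarios.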
17.2.3 Multi-Agent Architecture

The multi-agent architecture defines two main types of agents in the system: aircraft agents and city agents. Each type of agent has its own attributes and functions, and the agents interact with each other to update their attributes.

Aircraft Agent

The aircraft agent is the base class of all aircraft objects in the system. Its attributes and functions represent how fixed-wing aircraft and helicopters work in the system.

The aircraft agent class has the attributes appearance, aircraft type, fuel load, and loading ability. The appearance attribute holds the aircraft model used when visualizing the system; the hologram of the appearance indicates the position and rotation of the aircraft. The aircraft type is one of the basic attributes of an aircraft and is used for the main event display and event logging. Fuel load represents the quantity of fuel that the aircraft is carrying and is taken into account when users decide which mission the aircraft will accomplish. Loading ability measures how many people, what weight of supplies, and what kind of equipment the aircraft can carry. It is another factor to consider when making decisions.

The functions of the aircraft agent class are planning a route, flying to a destination, executing a task, and updating the load. Planning a route generates the waypoints toward the destination, depending on the aircraft type and the terrain.
Flying to the destination can be called after the route is created. This function controls the aircraft’s position and rotation so that they are consistent with the real situation. Executing a task is called after the aircraft reaches the destination and works together with the updating load function: the two functions accomplish the task and update what the aircraft carries.

City Agent

The city agent is the base class of all city objects in the system. Its attributes and functions correspond to those of the aircraft agent class.

The city agent class has the attributes location, airport capacity, resource, and resource demand. Location includes a location in the real world and a location in the system, and each can be converted into the other when needed. Location is required when an aircraft plans a route. Airport capacity measures how many aircraft can land and execute missions in the city at the same time, and it influences which mission an aircraft will take. Resource measures what kinds of resources the city can offer, and how much of each, so that aircraft can transport them to another city in need. Resource demand, on the other hand, measures what kinds of resources the city needs and how much of each.

The functions of the city agent are accepting aircraft, offering resources, and updating demand. Accepting aircraft updates the number of aircraft in the airport and decides whether the city can accept more aircraft. Offering resources is called when an aircraft arrives in the city and loads supplies; it decreases the city’s resources by the amount the aircraft loads. Updating demand is called after an aircraft delivers supplies to the city and decreases the resource demand accordingly.
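The two agent classes and their interaction can be sketched as plain data classes. This is a minimal Python illustration, not the Unity classes of the system: the field names, the single-number simplification of loading ability, the fuel unit, and the helper methods `load_from` and `deliver_to` are assumptions made for the example, and the appearance attribute is omitted because it is purely visual.

```python
from dataclasses import dataclass, field


@dataclass
class CityAgent:
    name: str
    location: tuple           # assumed (longitude, latitude) in the real world
    airport_capacity: int     # aircraft that can be on the ground at once
    resource: dict = field(default_factory=dict)          # what the city can offer
    resource_demand: dict = field(default_factory=dict)   # what the city needs
    aircraft_on_ground: int = 0

    def accept_aircraft(self) -> bool:
        """Admit one aircraft if the airport still has room."""
        if self.aircraft_on_ground >= self.airport_capacity:
            return False
        self.aircraft_on_ground += 1
        return True

    def offer_resource(self, kind: str, amount: int) -> int:
        """Hand over up to `amount` units and reduce the city's stock."""
        given = min(amount, self.resource.get(kind, 0))
        self.resource[kind] = self.resource.get(kind, 0) - given
        return given

    def update_demand(self, kind: str, delivered: int) -> None:
        """Reduce the outstanding demand by what was delivered."""
        remaining = self.resource_demand.get(kind, 0) - delivered
        self.resource_demand[kind] = max(0, remaining)


@dataclass
class AircraftAgent:
    aircraft_id: str
    aircraft_type: str
    fuel_load: float          # unit is an assumption, e.g. kilograms
    loading_ability: int      # simplified here to a single unit count
    load: dict = field(default_factory=dict)

    def load_from(self, city: CityAgent, kind: str) -> int:
        """Fill the free capacity with one resource offered by the city."""
        free = self.loading_ability - sum(self.load.values())
        taken = city.offer_resource(kind, free)
        self.load[kind] = self.load.get(kind, 0) + taken
        return taken

    def deliver_to(self, city: CityAgent, kind: str) -> int:
        """Unload only as much as the city still needs."""
        needed = city.resource_demand.get(kind, 0)
        delivered = min(self.load.get(kind, 0), needed)
        self.load[kind] = self.load.get(kind, 0) - delivered
        city.update_demand(kind, delivered)
        return delivered
```

Keeping the stock and demand bookkeeping inside the city class mirrors the description above, in which the aircraft triggers the exchange and the city updates its own attributes.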
17.3 Construction of Aviation Emergency Rescue Augmented System

An augmented reality system for aviation emergency rescue is constructed under the framework in Sect. 17.2. The development platform is the Unity3D engine, which provides a user-friendly environment for developing and visualizing the system. MRTK, the Mixed Reality Toolkit, is a package of tools that assists in developing augmented reality programs in Unity3D. The augmented reality device on which the system is deployed is Microsoft’s Hololens2, which provides the hardware and software to support the services designed in the system.

The development machine runs Windows 10 Professional with an Intel(R) Core(TM) i7-9700KF CPU @ 3.60 GHz and 16 GB of memory. The Unity3D version used to construct the system is 2019.4 (LTS), and the MRTK version is 2.7.2.0.

The map and terrain in the system are based on Zhejiang Province, China, which has a total area of 105,500 km². Fourteen types of aircraft are chosen as instances of the aircraft agent class, and fifteen cities are chosen as instances of the city agent class.
Fig. 17.2 Construction of augmented reality system
17.3.1 Construction of System

The construction of the augmented reality system involves three parts: the development environment, services, and entities. MRTK and Unity3D together offer a basic environment for developing a program that targets the Universal Windows Platform and can be released on Hololens2. Services are built in the development environment, and some of them use functions provided by MRTK or Unity3D. Entities are objects in the scene and are visualized by Unity3D. When data or instructions arrive from outside systems through the network transfer service in the basic services, the services can change the states of aircraft and cities. The construction of the augmented reality system is shown in Fig. 17.2.
17.3.2 Services Development

System basic services are adapted from functions that exist in MRTK or Windows. Some services can be realized in more than one way; this paper chooses one and briefly introduces the others.

The spatial anchor and share service is based on Unity3D’s built-in XR SDK. In Unity3D, a component named “World Anchor” can be added to an object, which links the object to the Hololens2’s understanding of an exact point in the physical world. Unity also provides a function named “World anchor transfer batch” for transferring world anchors between devices. The flowchart of the spatial anchor and share service is in Fig. 17.3. Other options include using the services provided by World Locking Tools, which is available in higher versions of Unity3D, or using image recognition. World Locking Tools is similar to World Anchor and is based on the Hololens2’s understanding of the real world. Image recognition uses a picture placed in advance in the physical world to locate the device and initiate the system; the user’s observation position in the system is then based on data from 6 DoF sensors.
Fig. 17.3 Procedure of spatial anchor and share service
Gesture recognition in the system uses MRTK’s input services. In the MRTK configuration profile, the gestures part of the input section can be edited to change the gesture settings. In this paper, the default gestures profile provided by MRTK is used. In addition, in the articulated hand tracking part, the hand mesh visualization is set to “everything” so that users can confirm that their hands are tracked by the device, and the teleport system is disabled in the teleport section since the system needs no teleportation.

The network transfer service uses a socket based on the UDP protocol. After the local IP address and port are bound, a new thread is started to listen to the local area network. This listening thread runs in parallel with the main thread so that it does not block the system’s main logic. When the system is shut down, the listening thread is interrupted and aborted. Data transferred between systems is a byte array encoded from a struct, JSON, or a string. The flowchart of the network transfer service is in Fig. 17.4. The flow of the other systems in Fig. 17.4 is simplified: only the data transfer logic is retained, shown to the right of the network transfer service’s workflow.
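The network transfer service described above (bind a UDP socket, listen on a separate thread, exchange encoded byte arrays) can be sketched with the Python standard library. This is an illustration of the pattern rather than the C# code running in Unity. The class name `NetworkTransferService`, the JSON message layout, and the use of a stop flag with a receive timeout, in place of interrupting and aborting the thread, are assumptions of this sketch.

```python
import json
import queue
import socket
import threading


class NetworkTransferService:
    """UDP sender/receiver with a background listening thread (sketch)."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((host, port))     # port 0 lets the OS pick a free port
        self.sock.settimeout(0.2)        # lets the loop notice the stop flag
        self.address = self.sock.getsockname()
        self.inbox = queue.Queue()       # decoded messages for the main thread
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._listen, daemon=True)
        self._thread.start()

    def _listen(self):
        # Runs in parallel with the main thread so it never blocks main logic.
        while not self._stop.is_set():
            try:
                data, _sender = self.sock.recvfrom(65535)
            except socket.timeout:
                continue
            except OSError:
                break                    # socket closed during shutdown
            try:
                self.inbox.put(json.loads(data.decode("utf-8")))
            except ValueError:
                continue                 # ignore datagrams that are not JSON

    def send(self, message, address):
        """Encode a JSON-serialisable message to bytes and send it."""
        self.sock.sendto(json.dumps(message).encode("utf-8"), address)

    def shutdown(self):
        """Stop the listening thread, then release the socket."""
        self._stop.set()
        self._thread.join()
        self.sock.close()


# Usage on one machine: a command sent from one endpoint to another.
receiver = NetworkTransferService()
sender = NetworkTransferService()
sender.send({"aircraft_id": "AC-01", "destination": "City B"}, receiver.address)
command = receiver.inbox.get(timeout=2)
sender.shutdown()
receiver.shutdown()
```

Passing received messages through a queue keeps all state changes on the main thread, which matters in engines such as Unity where scene objects may only be touched from the main thread.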
Fig. 17.4 Procedure of network transfer service
The mission generation service contains two steps. Step one allocates missions to cities randomly. Step two calculates the minimum resources that can fulfill the demand and allocates them to cities that do not need such resources.

The main event display service uses C# events, which follow the publisher–subscriber model. When aircraft and cities complete a certain event, they publish it, and the display board, which subscribed to these events when the system started, responds to the publication and displays the news.

The event log service uses the OS’s IO functions. A text writing stream is instantiated after the system starts. Each time an event is published, the service writes it into the cache through the stream, and when the system is shut down, the service turns the cache into a text file.
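The publisher–subscriber pattern behind the main event display service and the event log service can be sketched as follows. The system itself uses C# events, and this Python version only mirrors the idea. The class names `EventBus`, `DisplayBoard`, and `EventLog`, the tab-separated log format, and the in-memory buffer standing in for the cache are assumptions of the sketch.

```python
import io
from datetime import datetime


class EventBus:
    """Publisher side: agents publish, subscribers are called in order."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, aircraft_id, mission, when=None):
        event = {
            "time": when or datetime.now(),
            "aircraft_id": aircraft_id,
            "mission": mission,
        }
        for handler in self._subscribers:
            handler(event)


class DisplayBoard:
    """Subscriber that keeps the lines shown to the user."""

    def __init__(self):
        self.lines = []

    def on_event(self, event):
        self.lines.append(f"{event['aircraft_id']}: {event['mission']}")


class EventLog:
    """Subscriber that buffers events and writes a text file on shutdown."""

    def __init__(self):
        self._buffer = io.StringIO()

    def on_event(self, event):
        self._buffer.write(
            f"{event['time'].isoformat()}\t{event['aircraft_id']}\t{event['mission']}\n"
        )

    def save(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(self._buffer.getvalue())


# Usage: both services subscribe once at start-up, then agents publish.
bus = EventBus()
board = DisplayBoard()
log = EventLog()
bus.subscribe(board.on_event)
bus.subscribe(log.on_event)
bus.publish("AC-01", "transport supplies to City B", when=datetime(2022, 7, 1, 9, 30))
```

Because the publisher knows nothing about its subscribers, the display board and the log can be added or removed without touching the aircraft and city classes.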
Fig. 17.5 Map model in Unity3D (left) and real map model (right)
17.3.3 Entity Development

Entities in the aviation emergency rescue system include the map, aircraft, and cities. 3D models of these entities are created in 3ds Max, and the models of aircraft and cities are visualized by Unity3D.

The map model is developed from a digital elevation map and a satellite map. The digital elevation map provides the height of the terrain, and the satellite map provides its texture. Since the map covers a large area, the map model has a large file size and consumes a lot of the rendering resources of the Hololens2’s GPU. So instead of visualizing the map model in augmented reality, this paper chose to print a real map and base the hologram on it. Figure 17.5 shows the map model in Unity3D’s scene view and the real map model made of resin. The real map model retains the other parts of the land and paints them green.

The management of the aircraft and city models relies on the Unity3D package “Addressables.” Aircraft models and city models are loaded asynchronously by label after the system starts and services such as the spatial anchor and share service and the network transfer service are initiated. City models are instantiated after the loading process finishes. Aircraft models are instantiated only when an aircraft takes off from a city, and the model is set inactive after the aircraft arrives. Other objects in the system are also managed by the “Addressables” package (Fig. 17.6).

The functions of aircraft and cities are called in the following sequence. When the user chooses a particular aircraft and city so that the aircraft flies to the city and accomplishes the mission, the aircraft calls the “plan route” function. After the waypoints are calculated, the “fly to destination” method is called to instantiate the model and change the position and rotation of the aircraft.
Once the aircraft’s position is close to the city, the city calls the “accept aircraft” function to report the event, add the aircraft to the city’s current aircraft fleet, and destroy the model of the aircraft. Then the aircraft calls the “execute task” function to transport resources
Fig. 17.6 Procedure of models’ management using Addressables package
between aircraft and city. The “update load” and “update demand” functions are called last, after which the aircraft is ready for another arrangement from the user. The procedure of calling functions is shown in Fig. 17.7.
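The calling sequence between an aircraft and a city can be traced with a compact sketch. The Python code below is an illustration under simplifying assumptions and not the system’s implementation: the route is a straight line with evenly spaced waypoints (terrain and aircraft type are ignored), agents are plain dictionaries, and the names `plan_route`, `heading_deg`, and `dispatch` are hypothetical.

```python
import math


def plan_route(start, destination, n_waypoints=4):
    """Evenly spaced straight-line waypoints; terrain is ignored here."""
    (x0, y0), (x1, y1) = start, destination
    return [
        (x0 + (x1 - x0) * i / n_waypoints, y0 + (y1 - y0) * i / n_waypoints)
        for i in range(1, n_waypoints + 1)
    ]


def heading_deg(a, b):
    """Heading from a to b in degrees, counter-clockwise from the +x axis."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


def dispatch(aircraft, city, kind, trace):
    """Run one arrangement in the order given in the text."""
    route = plan_route(aircraft["position"], city["position"])
    trace.append("plan route")

    for waypoint in route:  # move along the route, updating the rotation
        aircraft["heading"] = heading_deg(aircraft["position"], waypoint)
        aircraft["position"] = waypoint
    trace.append("fly to destination")

    city["aircraft"].append(aircraft["id"])  # city takes the aircraft in
    trace.append("accept aircraft")

    # Deliver no more than the aircraft carries or the city still needs.
    delivered = min(aircraft["load"].get(kind, 0), city["demand"].get(kind, 0))
    trace.append("execute task")

    aircraft["load"][kind] = aircraft["load"].get(kind, 0) - delivered
    trace.append("update load")

    city["demand"][kind] = city["demand"].get(kind, 0) - delivered
    trace.append("update demand")
    return delivered


# Usage with made-up values in scene coordinates.
aircraft = {"id": "AC-01", "position": (0.0, 0.0), "heading": 0.0, "load": {"food": 4}}
city = {"position": (4.0, 4.0), "aircraft": [], "demand": {"food": 10}}
trace = []
delivered = dispatch(aircraft, city, "food", trace)
```

Recording the order of calls in `trace` makes it easy to compare the sketch against the sequence described in the text.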
17.4 Simulation of Aviation Emergency Rescue

The system simulates aviation emergency rescue based on a task scenario in which Zhejiang Province, China, is struck by a flood. In this scenario, affected people need to be transported to resettlement sites, and supplies and large machinery need to be transported to cities in need. The simulation runs in a single-device environment, while the data interfaces remain open for data exchange. The simulation state from the user’s perspective is shown in Fig. 17.8. The map and the rest of the background are in the physical world, while the models of aircraft and cities, the message board, and the mesh on the hands are rendered by Hololens2.
Fig. 17.7 Procedure of calling functions between aircraft and city
Fig. 17.8 System running state of simulation of aviation emergency rescue
17.5 Conclusion

In this paper, a framework for large-scale aviation emergency rescue based on augmented reality is proposed. The framework consists of two main parts:
service-oriented architecture and multi-agent architecture. Services in the service-oriented architecture constitute the bottom functional layer, while agent classes in the multi-agent architecture realize the logic and visualization of the system. Based on the framework, an augmented reality system instance was developed. The system used the Unity3D engine and MRTK as its development foundation and implemented the functions of the service-oriented architecture and of the aircraft and city classes. Finally, a scenario was simulated to examine the framework and the augmented reality system for aviation emergency rescue.
References

1. Goraj, Z., et al.: Aerodynamic, dynamic and conceptual design of a fire-fighting aircraft. Proc. Inst. Mech. Eng., Part G: J. Aerosp. Eng. 215(3), 125–146 (2001)
2. Moeschler, O., et al.: Difficult aeromedical rescue situations: experience of a Swiss pre-alpine helicopter base. J. Trauma 33(5), 754–759 (1992)
3. Grissom, C.K., Thomas, F., James, B.: Medical helicopters in wilderness search and rescue operations. Air Med. J. 25(1), 18–25 (2006)
4. Wang, X., et al.: Study on the deployment and dispatching of aeronautic emergency rescue transport based on virtual simulation. In: 2021 5th International Conference on Artificial Intelligence and Virtual Reality (AIVR), pp. 29–35. Association for Computing Machinery (2021)
5. Brown, C., et al.: The use of augmented reality and virtual reality in ergonomic applications for education, aviation, and maintenance. Ergon. Des. 10648046211003469 (2021)
6. Tan, S., et al.: Study on augmented reality electronic sand table and key technique. J. Syst. Simul. 20 (2007)
7. Guest, A., Bernardes, S., Howard, A.: Integration of an Agent-Based Model and Augmented Reality for Immersive Modeling Exploration, p. 13. Earth and Space Science Open Archive (2021)
Author Index

A
Abdelhakeem, Sara Khaled, 49
Abe, Jair Minoro, 3
Albanis, Georgios, 187
Alzahrani, Yahya, 145

B
Boufama, Boubakeur, 145
Bressler, Michael, 201

D
da Silva Filho I., João, 3
Dusza, Daniel G., 131

E
Eckstein, Korbinian, 201

G
Gkitsas, Vasileios, 187

H
Huang, Hui-Wen, 131
Huang, Kai, 131

K
Kadhem, Hasan, 49
Kolbenschlag, Jonas, 201
Kuzuoka, Hideaki, 201

L
Li, Tong, 119
Liu, Hu, 19, 173, 237
Liu, Huaqun, 85, 119
Liu, Huilin, 131
Liu, Siliang, 237
Li, Xijie, 119
Li, Xin, 19, 173
Li, Xiwen, 217

M
Mao, Kezhi, 73
Mechler, Vincenz, 35
Mustafa, Zeeshan Mohammed, 49

N
Nakamatsu, Kazumi, 3
Nan, Ke, 217
Niu, Xiaoye, 217

O
Omata, Masaki, 101
Onsori-Wechtitsch, Stefanie, 187

P
Pang, Yiqun, 161
Pang, Yunxiang, 161
Prahm, Cosima, 201

Q
Qing, Qing, 85

R
Rojtberg, Pavel, 35

S
Selitskiy, Stanislav, 61
Song, Wei, 119
Ström, Per, 187
Sun, Haiyang, 161
Sun, Xiaoyue, 85
Suzuki, Mizuki, 101

T
Tian, Yongliang, 19, 173, 237

W
Whitehand, Richard, 187

X
Xue, Yuanbo, 19, 173

Y
Yang, Sirui, 85
Yan, Huimin, 119
Yu, YiXiong, 19

Z
Zarpalas, Dimitrios, 187
Zhang, Jiaheng, 73
Zhang, Jun, 217
Zioulis, Nikolaos, 187

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
K. Nakamatsu et al. (eds.), Advanced Intelligent Virtual Reality Technologies, Smart Innovation, Systems and Technologies 330, https://doi.org/10.1007/978-981-19-7742-8