English Pages [335] Year 2022
LNCS 13705
Agma Traina · Hua Wang · Yong Zhang · Siuly Siuly · Rui Zhou · Lu Chen (Eds.)
Health Information Science 11th International Conference, HIS 2022 Virtual Event, October 28–30, 2022 Proceedings
Lecture Notes in Computer Science Founding Editors Gerhard Goos Karlsruhe Institute of Technology, Karlsruhe, Germany Juris Hartmanis Cornell University, Ithaca, NY, USA
Editorial Board Members Elisa Bertino Purdue University, West Lafayette, IN, USA Wen Gao Peking University, Beijing, China Bernhard Steffen TU Dortmund University, Dortmund, Germany Moti Yung Columbia University, New York, NY, USA
More information about this series at https://link.springer.com/bookseries/558
Editors Agma Traina University of São Paulo São Carlos, Brazil
Hua Wang Victoria University Melbourne, VIC, Australia
Yong Zhang Tsinghua University Beijing, China
Siuly Siuly Victoria University Footscray, VIC, Australia
Rui Zhou Swinburne University of Technology Hawthorn, VIC, Australia
Lu Chen Swinburne University of Technology Hawthorn, VIC, Australia
ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-031-20626-9 ISBN 978-3-031-20627-6 (eBook) https://doi.org/10.1007/978-3-031-20627-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The International Conference Series on Health Information Science (HIS) provides a forum for disseminating and exchanging multidisciplinary research results in computer science/information technology and health science and services. It covers all aspects of health information sciences and systems that support health information management and health service delivery. The 11th International Conference on Health Information Science (HIS 2022) was held in Biarritz, France, during October 28–30, 2022. Founded in April 2012 as the International Conference on Health Information Science and Their Applications, the conference continues to grow to include an ever broader scope of activities. The main goal of these events is to provide international scientific forums for researchers to exchange new ideas in a number of fields that interact in-depth through discussions with their peers from around the world. The scope of the conference includes (1) medical/health/biomedicine information resources, such as patient medical records, devices and equipment, software, and tools to capture, store, retrieve, process, analyze, and optimize the use of information in the health domain; (2) data management, data mining, and knowledge discovery, all of which play a crucial role in decision-making, management of public health, examination of standards, privacy and security issues; (3) computer visualization and artificial intelligence for computer-aided diagnosis; and (4) development of new architectures and applications for health information systems. The conference solicited and gathered technical research submissions related to all aspects of the conference scope. All the submitted papers were peer-reviewed in a single-blind manner by at least three international experts drawn from the Program Committee.
After the rigorous peer-review process, a total of 29 papers among the 54 submissions were selected on the basis of originality, significance, and clarity and were accepted for publication in the proceedings. The authors were from Australia, Bangladesh, Brazil, China, Iran, Iraq, Italy, the Netherlands, Switzerland, the UK, the USA, and Vietnam. The high quality of the program – guaranteed by the presence of an unparalleled number of internationally recognized top experts – is reflected in the content of the proceedings. The conference was therefore a unique event where attendees were able to appreciate the latest results in their field of expertise and to acquire additional knowledge in other fields. The program was structured to favor interactions among attendees coming from many different areas, scientifically and geographically, from academia and from industry.
Finally, we acknowledge all those who contributed to the success of HIS 2022 but whose names are not listed here. October 2022
Agma Traina Hua Wang Yong Zhang Siuly Siuly Rui Zhou Lu Chen
Organization
General Co-chairs
Yannis Manolopoulos, Open University of Cyprus, Cyprus
Yanchun Zhang, Victoria University, Australia

Program Co-chairs
Agma Traina, University of Sao Paulo, Brazil
Hua Wang, Victoria University, Australia
Yong Zhang, Tsinghua University, China

Coordination Chair
Siuly Siuly, Victoria University, Australia

Publicity Co-chairs
Enamul Kabir, University of Southern Queensland, Australia
Mirela Cazzolato, University of Sao Paulo, Brazil
Varun Bajaj, Indian Institute of Information Technology, India

Publication Co-chairs
Rui Zhou, Swinburne University of Technology, Australia
Lu Chen, Swinburne University of Technology, Australia

Website Co-chairs
Yong-Feng Ge, La Trobe University, Australia
Mingshan You, Victoria University, Australia

HIS Steering Committee Representative
Fernando Martin-Sanchez, Institute of Health Carlos III, Spain
Program Committee
Venkat Achalam, CHRIST (Deemed to be University), India
Ashik Mostafa Alvi, Victoria University, Australia
Ömer Faruk Alçin, Bingol Üniversitesi, Turkey
Mirela Teixeira Cazzolato, University of São Paulo, Brazil
Lu Chen, Swinburne University of Technology, Australia
Soon Chun, City University of New York, USA
Thomas M. Deserno, TU Braunschweig, Germany
Mohammed Diykh, University of Southern Queensland, Australia
Yong-Feng Ge, La Trobe University, Australia
Zhisheng Huang, Vrije Universiteit Amsterdam, The Netherlands
Md Rafiqul Islam, University of Technology Sydney, Australia
Enamul Kabir, University of Southern Queensland, Australia
Smith Khare, Shri Ramdeobaba College of Engineering and Management, India
Fátima L. S. Nunes, University of São Paulo, Brazil
Shaofu Lin, Beijing University of Technology, China
Gang Luo, University of Washington, USA
Jiangang Ma, Federation University Australia, Australia
Muhammad Tariq Sadiq, University of Lahore, Pakistan
Siuly Siuly, Victoria University, Australia
Paolo Soda, Università Campus Bio-Medico di Roma, Italy
Le Sun, Nanjing University of Information Science and Technology, China
Lili Sun, University of Southern Queensland, Australia
Weiqing Sun, University of Toledo, USA
Supriya Supriya, Victoria University, Australia
Md. Nurul Ahad Tawhid, Victoria University, Australia
Agma Traina, University of Sao Paulo, Brazil
Hua Wang, Victoria University, Australia
Kate Wang, RMIT University, Australia
Jiao Yin, Victoria University, Australia
Mingshan You, Victoria University, Australia
Guoqiang Zhang, UTHealth, USA
Rui Zhou, Swinburne University of Technology, Australia

Additional Reviewer
Deserno Thomas
Contents
Applications of Health and Medical Data

Evidence Extraction to Validate Medical Claims in Fake News Detection
  Pritam Deka, Anna Jurek-Loughrey, and Deepak P
Detection of Obsessive-Compulsive Disorder in Australian Children and Adolescents Using Machine Learning Methods
  Umme Marzia Haque, Enamul Kabir, and Rasheda Khanam
An Anomaly Detection Framework Based on Data Lake for Medical Multivariate Time Series
  Peng Ren, Zhiyuan Tian, Zeyu Wang, Xin Li, Xia Wang, Tao Zhao, and Ming Sheng
Anomaly Detection on Health Data
  Durgesh Samariya and Jiangang Ma
DRAM-Net: A Deep Residual Alzheimer’s Diseases and Mild Cognitive Impairment Detection Network Using EEG Data
  Ashik Mostafa Alvi, Siuly Siuly, Maria Cristina De Cola, and Hua Wang
An Intelligence Approach for Blood Pressure Estimation from Photoplethysmography Signal
  Shahab Abdulla, Mohammed Diykh, Sarmad K. D. AlKhafaji, Atheer Y. Oudah, Haydar Abdulameer Marhoon, and Rand Ameen Azeez
Tailored Nutrition Service to Reduce the Risk of Chronic Diseases
  Jitao Yang
Combining Process Mining and Time Series Forecasting to Predict Hospital Bed Occupancy
  Annelore Jellemijn Pieters and Stefan Schlobach
HGCL: Heterogeneous Graph Contrastive Learning for Traditional Chinese Medicine Prescription Generation
  Zecheng Yin, Yingpei Wu, and Yanchun Zhang
Fractional Fourier Transform Aided Computerized Framework for Alcoholism Identification in EEG
  Muhammad Tariq Sadiq, Hesam Akbari, Siuly Siuly, Yan Li, and Paul Wen
Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space
  Zeyu Wang, Huiying Zhao, Peng Ren, Yuxi Zhou, and Ming Sheng

Health and Medical Data Processing

MHDML: Construction of a Medical Lakehouse for Multi-source Heterogeneous Data
  Qi Xiao, Wenkui Zheng, Chenyu Mao, Wei Hou, Hao Lan, Daojun Han, Yang Duan, Peng Ren, and Ming Sheng
A Hybrid Medical Causal Inference Platform Based on Data Lake
  Peng Ren, Xingyue Liu, Shuxin Zheng, Lijun Liao, Xin Li, Ligong Lu, Xia Wang, Ruoyu Wang, and Ming Sheng
Data Exploration Optimization for Medical Big Data
  Shuang Ding, Chenyu Mao, Wenkui Zheng, Qi Xiao, and Yitao Wu
Improving Data Analytic Performance in Health Information System with Big Data Technology
  Hung Ba Ngo and Nguyen Chi Tran
HoloCleanX: A Multi-source Heterogeneous Data Cleaning Solution Based on Lakehouse
  Qin Cui, Wenkui Zheng, Wei Hou, Ming Sheng, Peng Ren, Wang Chang, and XiangYang Li
The Construction and Validation of an Automatic Crisis Balance Analysis Model
  Long Yin Guo, Lin Xia, Xin Yi Huang, Yu Xin Fu, Xin Yi Li, Si Chen Zhou, Chao Zhao, and Bing Xiang Yang
Assessing the Utilisation of TELedentistry from Perspectives of Early Career Dental Practitioners - Development of the UTEL Questionnaire
  Joshua Lee, Joon Soo Park, Hua Wang, Boxi Feng, and Kate N. Wang
Genetic Algorithm for Patient Assignment Optimization in Cloud Healthcare System
  Xinyu Pang, Yong-Feng Ge, and Kate Wang
Research on the Construction of Psychological Crisis Intervention Strategy Service System
  Yahong Yao, Shaofu Lin, Zhisheng Huang, Shushi Li, Chaohui Guo, and Ying Wu
Towards a Perspective to Analyze Emergent Systems in Health Domain
  Sandro Luís Freire de Castro Silva, Bruno Elias Penteado, Rodrigo Pereira dos Santos, and Marcelo Fornazin

Health and Medical Data Mining via Graph-based Approaches

Food Recommendation for Mental Health by Using Knowledge Graph Approach
  Chengcheng Fu, Zhisheng Huang, Frank van Harmelen, Tingting He, and Xingpeng Jiang
Medical Knowledge Graph Construction Based on Traceable Conversion
  Wei Hou, Wenkui Zheng, Ming Sheng, Peng Ren, Baifu Zuo, Zhentao Hu, Xianxing Liu, and Yang Duan
Automated Knowledge Graph Construction for Healthcare Domain
  Markian Jaworsky, Xiaohui Tao, Jianming Yong, Lei Pan, Ji Zhang, and Shiva Pokhrel
Alcoholic EEG Data Classification Using Weighted Graph-Based Technique
  Supriya Supriya, Tony Jan, Nandini Sidnal, and Scott Thompson-Whiteside

Health and Medical Data Classification

Optical Coherence Tomography Classification Based on Transfer Learning and RA-Attention
  Xiaoyi Lian, Lina Chen, Xiayan Ji, Fangyao Shen, Hongjie Guo, and Hong Gao
Intelligent Interpretation and Classification of Multivariate Medical Time Series Based on Convolutional Neural Networks
  Tianbo Xu, Le Sun, Sudha Subramani, and Yilin Wang
ECG Signals Classification Model Based on Frequency Domain Features Coupled with Least Square Support Vector Machine (LS-SVM)
  Rand Ameen Azeez, Sarmad K. D. Alkhafaji, Mohammed Diyk, and Shahab Abdulla
Cluster Analysis of Low-Dimensional Medical Concept Representations from Electronic Health Records
  Fernando Jaume-Santero, Boya Zhang, Dimitrios Proios, Anthony Yazdani, Racha Gouareb, Mina Bjelogrlic, and Douglas Teodoro

Author Index
Applications of Health and Medical Data
Evidence Extraction to Validate Medical Claims in Fake News Detection
Pritam Deka, Anna Jurek-Loughrey, and Deepak P
Queen’s University, Belfast, UK
{pdeka01,a.jurek}@qub.ac.uk, [email protected]
Abstract. Fact-checking of online health information has become necessary due to the increasing use of the internet by people searching for medical advice. There is a plethora of false information available to the public, which can put people in harm’s way. To aid the fact-checking process, recent research has leveraged advances in NLP and deep learning techniques. The majority of existing technology relies on labelled data, which is very limited. In this work we explore an unsupervised approach to identifying evidence sentences, which is the key task in the claim verification process. We show, through experiments on a publicly available dataset, that our method achieves performance comparable to that of state-of-the-art supervised techniques. We also show how our proposed method can be adapted to cases where labelled data is available.
Keywords: Health misinformation · Fact-checking · Unsupervised learning · Evidence retrieval

1 Introduction
Fake news circulating online is not a new phenomenon and has been around for quite some time. However, with the increasing use of the internet, the rise of fake news has become more apparent in recent times. It gained particular prominence during the 2016 US presidential election, which saw a surge in the circulation of online fake news [1]. It has become a serious global problem, capable of jeopardising democracy, curtailing freedom of expression in journalism, and toppling market economies [43]. To tackle the problem of fake news, recent advancements in machine learning and NLP have been used. However, the primary focus of most studies has been political news. There are different subtasks within the broader area of fake news detection using AI. One of these is fact-checking of information. There has been a surge in research related to fact-checking with the public release of datasets such as Politifact [37], RumourEval [11,16], Claimbuster [18], FEVER [35], and FEVEROUS [2]. However, these datasets mostly cater to general fact-checking, which also includes political news fact-checking. This is markedly different from the fact-checking of health information, which has been garnering more attention since the Covid-19 pandemic [15,22]. Fact-checking of online health information requires domain expertise, unlike general fact-checking. It is also a time-consuming process, because experts need to assess health-related claims against existing medical information available via medical databases such as PubMed. Recent advancements in NLP and machine learning have helped boost research on this problem. However, the shortage of appropriate publicly available datasets limits the possibility of applying machine learning to the health-related fact-checking task. To address this challenge, we propose an unsupervised learning approach to retrieve evidence sentences for such claims from PubMed abstracts. This task is designed to be part of a pipeline for predicting the veracity of medical claims, in which it follows the retrieval of PubMed abstracts as evidence for such claims. We evaluate our method on a publicly available dataset and show that it is comparable with state-of-the-art supervised methods.

Task Definition. The task is defined as follows: for a list of claims and a set of relevant PubMed abstracts for each claim, identify the evidence sentences within those abstracts. Given a claim $c$ and a set of relevant abstracts $A = [a_1, \dots, a_n]$, the goal is to find the evidence sentences for the claim within those abstracts.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. A. Traina et al. (Eds.): HIS 2022, LNCS 13705, pp. 3–15, 2022. https://doi.org/10.1007/978-3-031-20627-6_1
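The task definition above can be written slightly more formally. The following is only a sketch in notation chosen here for illustration, not the authors' own: $S(a_i)$ denotes the sentences of abstract $a_i$, $f$ is a hypothetical binary selection function, and $E(c)$ is the resulting evidence set for claim $c$.

```latex
\[
  E(c) \;=\; \Bigl\{\, s \in \bigcup_{i=1}^{n} S(a_i) \;:\; f(c, s) = 1 \,\Bigr\},
  \qquad f(c, s) \in \{0, 1\}.
\]
```

Under this reading, any method for the task amounts to a choice of $f$, whether learned from labelled data or defined without supervision.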
2 Related Work
In this section we discuss recent research on medical evidence retrieval, which is predominantly based on transformer neural networks [36]. The authors of [38] released SCIFACT, a publicly available dataset for health/medical fact-checking. They proposed a method that uses a RoBERTa-large [27] encoder for evidence retrieval, where abstracts for each claim are first extracted from a corpus using TF-IDF similarity. The RoBERTa-large encoder, fine-tuned on SCIFACT, is then used for sentence selection.

VerT5erini [29] used a T5 (Text-To-Text Transfer Transformer) [30] model for abstract retrieval and sentence selection. For identifying abstracts, the authors proposed a one-stage as well as a two-stage model, where the first stage in both approaches used the BM25 algorithm to extract a list of k abstracts. For the two-stage method, they used a T5 re-ranker to re-rank the list of k abstracts. For sentence selection, they again used a T5 model fine-tuned on SCIFACT. Their method outperformed the approach of [38] in both abstract retrieval and sentence selection.

Another model, PARAGRAPH-JOINT [25], used BioSentVec [9] for retrieving the abstracts instead of TF-IDF similarity. A sentence embedding is computed for each claim and each abstract using BioSentVec, and cosine similarity is used to find the top k most similar abstracts for each claim. For the sentence selection task they used a RoBERTa-large encoder fine-tuned on SCIFACT.

ARSJOINT [42] also used BioSentVec [9] to first select the top k similar abstracts for further processing. This was done to minimize the total number of candidate abstracts and remove non-relevant abstracts. They experimented with RoBERTa-large and BioBERT-large [23], with both models fine-tuned on the SCIFACT dataset. Their method jointly learns all three tasks of abstract retrieval, sentence selection, and label prediction. They also proposed a regularization based on the divergence between the abstract retrieval sentence attention and the sentence selection outputs. Their experimental results showed that the method performed better than the chosen baseline models.

The above works use a three-step approach of abstract retrieval, evidence sentence selection, and label prediction using the selected evidence sentences. They differ in how the evidence sentences are selected. VerT5erini selects the evidence sentences by using each sentence independently, whereas PARAGRAPH-JOINT and ARSJOINT use the abstracts for evidence sentence selection through pooling and self-attention.

Unlike previous approaches, MULTIVERS [39] uses a weakly-supervised approach where training is done using available scientific documents. Instead of encoding claims and abstracts separately, their method encodes both claims and abstracts using a Longformer [5] architecture, which allows longer documents to be encoded without any information loss. For choosing candidate abstracts, they follow the two-stage methodology of VerT5erini. Once the candidate abstracts are selected, sentence selection is performed as a binary classification task and label prediction as a 3-way classification task. Their method performed better than the baseline methods on both the abstract and sentence selection tasks evaluated on SCIFACT.
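Several of the systems above begin by ranking candidate abstracts against the claim and keeping the top k, using TF-IDF, BM25, or embedding similarity. The sketch below illustrates only the general idea with a plain TF-IDF and cosine-similarity ranking in standard-library Python. It is not the implementation used by any of the cited systems; the tokenizer, the smoothed IDF formula, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:()[]"


def tokenize(text):
    """Lowercase whitespace tokenizer (assumed; real systems use proper tokenizers)."""
    return [tok.strip(PUNCT).lower() for tok in text.split() if tok.strip(PUNCT)]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an illustrative choice) so very common terms keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_abstracts(claim, abstracts, k=3):
    """Return indices of the k abstracts most similar to the claim."""
    # The claim is vectorised together with the abstracts so they share one vocabulary.
    vectors = tfidf_vectors([claim] + abstracts)
    claim_vec, abstract_vecs = vectors[0], vectors[1:]
    scored = [(cosine(claim_vec, vec), idx) for idx, vec in enumerate(abstract_vecs)]
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]
```

Swapping the TF-IDF vectors for dense sentence embeddings would leave `cosine` and `top_k_abstracts` unchanged in spirit, which is roughly the difference between the TF-IDF and embedding-based retrieval steps described above.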
In our work, we have focused on the task of evidence sentence selection from candidate abstracts, which are retrieved using a dense, fine-tuned S-BERT [31] model. Our approach differs from these works in that we select the evidence sentences from the candidate abstracts using evidence-based medicine (EBM) concepts combined with knowledge from BERT-based models. The method can be used in an unsupervised way, without the need for data labels relevant to the evidence sentence extraction task; however, it can also be adapted to situations where labelled data is available.
3 Proposed Method
The proposed method is a two-stage approach based on the assumption that evidence statements are usually sentences that report the outcomes of a randomized controlled trial (RCT), including findings and results. RCTs belong to the highest level of EBM practice in terms of gathering evidence in a systematic manner [7]. The first stage of the method selects the sentences from the abstract that are most similar to the claim. The second stage uses a classifier to label each sentence from the first stage as an evidence or non-evidence sentence. The proposed method is shown in Fig. 1.
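The two stages described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-overlap `jaccard_similarity` merely stands in for the SBERT similarity model, the keyword-based `cue_classifier` stands in for the BioBERT classifier, and the sentence splitter, cue list, and `top_n` default are all hypothetical choices.

```python
from typing import Callable, List


def split_sentences(abstract: str) -> List[str]:
    """Naive splitter on '. ' (assumed only to keep the sketch dependency-free)."""
    return [s.strip().rstrip(".") for s in abstract.split(". ") if s.strip()]


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in for a sentence similarity model: token-set overlap in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical cues for sentences that report trial outcomes.
RESULT_CUES = ("significantly", "reduced", "increased", "compared with", "no difference")


def cue_classifier(sentence: str) -> bool:
    """Stand-in for a trained classifier: flags sentences that look like results."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in RESULT_CUES)


def select_evidence(
    claim: str,
    abstracts: List[str],
    similarity: Callable[[str, str], float] = jaccard_similarity,
    classifier: Callable[[str], bool] = cue_classifier,
    top_n: int = 3,
) -> List[str]:
    """Stage 1: keep the top_n sentences most similar to the claim.
    Stage 2: keep only those the classifier labels as evidence."""
    candidates = [s for a in abstracts for s in split_sentences(a)]
    ranked = sorted(candidates, key=lambda s: similarity(claim, s), reverse=True)
    shortlisted = ranked[:top_n]
    return [s for s in shortlisted if classifier(s)]
```

Because both stages are passed in as callables, a real similarity model and a real classifier could be substituted without changing `select_evidence`.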
[Figure 1 (diagram; labels recovered from the original): Claim; sentences s1, s2, ..., sk from the relevant abstracts; SBERT sentence similarity model (similarity); sentences s'1, ..., s'n; BioBERT classifier; evidence sentences.]
Top n sentences ranked according to similarity (n0.70,