English Pages 237 [230] Year 2023
Intelligent Systems Reference Library 244
Halina Kwaśnicka · Nikhil Jain · Urszula Markowska-Kaczmar · Chee Peng Lim · Lakhmi C. Jain Editors
Advances in Smart Healthcare Paradigms and Applications Outstanding Women in Healthcare— Volume 1
Intelligent Systems Reference Library Volume 244
Series Editors Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included. The list of topics spans all the areas of modern intelligent systems such as: Ambient intelligence, Computational intelligence, Social intelligence, Computational neuroscience, Artificial life, Virtual society, Cognitive systems, DNA and immunity-based systems, e-Learning and teaching, Human-centred computing and Machine ethics, Intelligent control, Intelligent data analysis, Knowledge-based paradigms, Knowledge management, Intelligent agents, Intelligent decision making, Intelligent network security, Interactive entertainment, Learning paradigms, Recommender systems, Robotics and Mechatronics including human-machine teaming, Self-organizing and adaptive systems, Soft computing including Neural systems, Fuzzy systems, Evolutionary computing and the Fusion of these paradigms, Perception and Vision, Web intelligence and Multimedia. Indexed by SCOPUS, DBLP, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
Editors
Halina Kwaśnicka, Department of Artificial Intelligence, Wroclaw University of Science and Technology, Wrocław, Poland
Nikhil Jain, The Permanente Medical Group, Inc., California, CA, USA
Urszula Markowska-Kaczmar, Department of Artificial Intelligence, Wroclaw University of Science and Technology, Wrocław, Poland
Chee Peng Lim, Deakin University, Victoria, VIC, Australia
Lakhmi C. Jain, KES International, Selby, UK
ISSN 1868-4394 ISSN 1868-4408 (electronic) Intelligent Systems Reference Library ISBN 978-3-031-37305-3 ISBN 978-3-031-37306-0 (eBook) https://doi.org/10.1007/978-3-031-37306-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
2023 International Women’s Day
Preface
The aim of this book is to recognise the contributions of outstanding women in STEM (science, technology, engineering, and mathematics), focusing on the healthcare sector. The book presents a small collection of recent advances in theory and application of smart healthcare systems created and implemented by women scientists, researchers, and practitioners. The reported studies include advances in artificial intelligence (AI) and machine learning for medical image processing and analysis, smart health monitoring and recommendation systems, intelligent methodologies for healthcare software development, as well as effective continuous education and training strategies for healthcare professionals and stakeholders. A total of 10 chapters are included in this edited book. A summary of each chapter is given in the following section. Favorskaya and Kutuzova analysed microscopic images in the medical domain with deep neural networks presented in Chap. 1. Both Generative Adversarial Network (GAN)-based and non-GAN-based models are used for microscopic image reconstruction. To preserve the contour and texture in a high-quality image representation, the deep wavelet super-resolution network is employed. The results from empirical evaluation with two publicly available medical data sets, namely “blood cell images” and “Malaria bounding boxes” from Kaggle, indicate the effectiveness of a modified GAN-based model for processing and analysing microscopic blood smear images. Montani et al. devised two explainable trace classification strategies in medical processes with deep learning models in Chap. 2. The first strategy leverages a trace saliency map and a counter-map to highlight the most important activities and the activity positions for justifying the classification output of a convolutional neural network. 
The second strategy exploits a string alignment method to identify frequently conversed activities for classification with the k-nearest neighbour-like model, providing a post-hoc explanation for the classification output. Case studies in stroke patient management demonstrate the usefulness of the proposed strategies. In Chap. 3, Ignat and Gaina studied the automatic classification of pneumonia based on chest X-ray images. Ensemble classification models are devised with voting, probabilistic, and machine learning methods. The base classifiers consist of several
deep learning algorithms, including ResNet variants, VGG variants, MobileNet, DenseNet, Inception, and ShuffleNet. A data set from the Radiological Society of North America Detection Challenge is used for performance evaluation. The efficacy of different classifier combinations to form ensemble models for pneumonia classification with chest X-ray images is analysed and discussed. Chapter 4 by Oliboni et al. presents an Intuitive Context-Aware Recommender with Explanations (ICARE) framework for healthcare applications. Given a temporal data set, the Aged Look Back A priori algorithm is used to infer sequential rules in ICARE for providing contextual recommendations. In a case study on sleep quality analysis, data samples collected from wearable devices are coupled with contextual information to discover rules for recommending actions that help users enjoy good sleep quality. The ICARE app is able to provide personalised evaluation and recommendation of sleep quality by considering an individual’s physical activities and other factors. Szolomicka and Markowska-Kaczmar present a survey on different few-shot learning methods for analysis of histopathological images in cancer diagnosis in Chap. 5. The general concept of few-shot learning and the characteristics of histopathological images are described. The examined few-shot learning paradigms are grouped into two main categories: data augmentation-based and meta-learning-based methods. A comprehensive analysis of the existing methods in the literature is presented. Publicly available histopathological images are reviewed. In addition, future research directions in few-shot learning in the histopathology domain are elucidated. In Chap. 6, Bianchini et al. presented a technique for skin lesion recognition and diagnosis with an AI-driven approach. A processing pipeline to analyse skin lesion images and distinguish nevi from melanomas, along with patient clinical information, is devised.
Dermoscopic images are processed to derive their segmented counterparts, while saliency maps are used to highlight the lesion regions. A series of evaluations using the International Skin Imaging Collaboration (ISIC) data set and ISIC Weak Segmentation Maps data set indicate the efficacy of the proposed AI-driven pipeline for skin lesion classification and segmentation. Belciug and Iliescu examined the use of AI-based methods in the obstetrics domain in Chap. 7. The benefits of using AI in obstetrics are explained, which include assessment of fetal morphology scans and forecast of delivery mode (e.g., vaginal versus caesarean). Differential evolution and evolutionary computational methods to optimise deep learning models for analysing fetal ultrasound images are described. Furthermore, statistical and AI methods to determine predictive variables, which include maternal characteristics and fetal-related measurements from ultrasound images, for determining a suitable baby delivery mode are discussed. In Chap. 8, Pedell et al. investigated a co-design procedure to generate knowledge for developing smart healthcare software. Motivational modelling is exploited to allow stakeholders to voice their opinions in designing smart healthcare apps. The driving principles of co-design, e.g., value-driven, collaborative, creative, involving, and shared, are discussed. Two use cases are presented, i.e., a person-centred care setting in hearing rehabilitation and a personal emergency alarm system for
the elderly. The outcomes illustrate the usefulness of the co-design procedure for capturing the emotional goals of stakeholders in software design and development. Angulo-Sherman and Salazar-Varas in Chap. 9 highlighted the contributions of women in brain–computer interface (BCI) for healthcare applications. A number of recent advances made by women in utilising electroencephalography and related technologies for BCI to tackle healthcare-related problems are presented. These include health management, mobility assistive technology, as well as disease prevention, diagnosis, and treatment. The current limitations and prospects of BCI technology for smart healthcare are discussed, with a particular focus on the contributions of women in this domain. The Final Chapter by Oikawa et al. reports the application of simulation-based education (SBE) as a pragmatic strategy to facilitate continuous learning of healthcare faculty members. Initiatives for transforming faculty development programmes with SBE during and after the COVID-19 pandemic are discussed. Best practices for conducting remote development programmes are shared, e.g., in terms of technologies, participants, facilitators, and contents. The barriers and challenges in conducting online or hybrid programmes for SBE are explained. Lessons learned and possible solutions to overcome challenges in advancing SBE for faculty development are elucidated. The editors are grateful to all authors for contributing their work, to all reviewers for improving the article quality, and to all Springer editorial team members for assisting in the compilation and publication of this book. The chapters presented in this edition constitute only a tiny fraction of contributions made by prominent women in healthcare, aiming to encourage and inspire women scientists, researchers, and practitioners to proactively uptake STEM education and further design, develop, and deploy smart technologies for advancing the healthcare sector. 
Halina Kwaśnicka, Wrocław, Poland
Nikhil Jain, Davis, USA
Urszula Markowska-Kaczmar, Wrocław, Poland
Chee-Peng Lim, Waurn Ponds, Australia
Lakhmi C. Jain, Selby, UK
Contents
1 Medical Microscopic Single Image Super-Resolution Based on Deep Neural Networks
Margarita N. Favorskaya and Marina O. Kutuzova
2 Making Process Trace Classification More Explainable: Approaches and Experiences in the Medical Field
Stefania Montani, Giorgio Leonardi, and Manuel Striani
3 Improving Diagnostics of Pneumonia by Combining Individual Hypotheses on Chest X-Ray Images
Anca Ignat and Robert-Adrian Găină
4 ICARE: An Intuitive Context-Aware Recommender with Explanations
Barbara Oliboni, Anna Dalla Vecchia, Niccolò Marastoni, and Elisa Quintarelli
5 An Overview of Few-Shot Learning Methods in Analysis of Histopathological Images
Joanna Szołomicka and Urszula Markowska-Kaczmar
6 From Pixels to Diagnosis: AI-Driven Skin Lesion Recognition
Monica Bianchini, Paolo Andreini, and Simone Bonechi
7 Artificial Intelligence in Obstetrics
Smaranda Belciug and Dominic Gabriel Iliescu
8 A Co-design Approach for Developing and Implementing Smart Health Technologies and Services
Sonja Pedell, Leon Sterling, Nicole Aimers, and Diego Muñoz
9 Recent Applications of BCIs in Healthcare
I. N. Angulo-Sherman and R. Salazar-Varas
10 Remote Faculty Development Programs for Simulation Educators-Tips to Overcome Barriers
Sayaka Oikawa, Maki Someya, Machiko Yagi, and Benjamin W. Berg
Chapter 1
Medical Microscopic Single Image Super-Resolution Based on Deep Neural Networks
Margarita N. Favorskaya and Marina O. Kutuzova
Abstract Hematological diseases affecting the blood and hematopoietic organs are detected using digital microscopy systems in clinical laboratories. However, the quality of such microscopic images often remains poor because coloring components are expensive. Many methods, including traditional image processing and segmentation methods, are now being implemented using deep neural networks. One promising way is to apply the super-resolution (SR) approach, which has evolved from interpolation methods to methods based on deep neural networks. In this chapter, we study two fundamentally different approaches to microscopic image reconstruction: the non-GAN-based SR model and the GAN-based SR model. The GAN-based SR model was significantly modified during experimental research. Because contours and textures must be preserved in a high-quality representation, the outputs were fed to and processed by a deep wavelet SR network. We provide extensive experimental results for microscopic image reconstruction on two public datasets, “Blood cell images” and “Malaria bounding boxes” from the Kaggle cloud storage, using different deep SR models.
Keywords Medical image · Microscopic image · Blood cell analysis · Super-resolution · Deep learning
Nomenclature
CMYK  Cyan, magenta, yellow, key (color space)
CNN  Convolutional neural network
CUDA  Compute unified device architecture
DWSR  Deep wavelet super-resolution
GAN  Generative adversarial network
GUI  Graphical user interface
HF  High-frequency
HR  High resolution
HSV  Hue, saturation, value (color space)
JPEG  Joint photographic experts group
LF  Low-frequency
LR  Low resolution
LSTM  Long short-term memory
MAE  Mean absolute error
MSE  Mean square error
MRC-Net  Multi-scale refined context network
PSO  Particle swarm optimization
RCF  Refined context fusion
ReLU  Rectified linear unit
PReLU  Parametric rectified linear unit
RBC  Red blood cell
RGB  Red green blue (color space)
PSNR  Peak signal-to-noise ratio
SISR  Single image super-resolution
SR  Super-resolution
SRGAN  Super-resolution generative adversarial network
SSIM  Structural similarity index measure
YUV  Brightness (referred to as luminance), color components (the chroma)
WBC  White blood cell

M. N. Favorskaya (B) · M. O. Kutuzova
Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, 31, Krasnoyarsky Rabochy Ave, Krasnoyarsk 660037, Russian Federation
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_1
1.1 Introduction
Blood cell analysis is a common practice for diagnosing a wide range of important hematological pathologies such as infectious diseases, leukemia, and some types of cancer. Because blood components have many different morphological features, manual classification of such cells is a time-consuming process with inevitable human errors. Classification results are highly dependent on the skill and experience of the operator, who calculates the percentage of occurrence of each cell type in the region of interest of the smear [1]. Therefore, classification and automated differential counting with a minimum of errors are highly desirable. Two main techniques, based on flow cytometry and on image processing, are used for counting cells [2]. Many laser-based cytometers provide automatic cell counting but cannot automatically classify white blood cells or separate abnormal cells from normal cells with acceptable accuracy. In addition, they are expensive devices, require precise hardware calibration, and completely destroy the blood sample after analysis. In contrast, image processing is cost-effective, automated, and can be implemented remotely.
The advantage of image processing is that images can be saved and used for further verification in some abnormal cases. Segmentation of blood cells in microscopic images is often used as a preliminary step in diagnostic software tools for hematology and infectious disease prognosis. The next steps are the classification and counting of certain cells. As is well known, blood includes several structures: leukocytes (white blood cells), erythrocytes (red blood cells), platelets, and plasma. However, colorized blood images can suffer from noise and blur, making post-processing difficult. High-quality microscopic image segmentation provides better classification results when white blood cells need to be divided into two main categories depending on the structure of the nuclei: three types of granular cells (neutrophils, basophils, and eosinophils) and two types of nongranular cells (lymphocytes and monocytes) [3]. White blood cells are colorless, but they can be stained with chemicals to make them visible under a microscope. However, the main disadvantage of staining the studied blood smears is the lack of standards. One of the ways to increase the quality of segmentation is to improve the blood smear images using single image super-resolution (SISR) methods. The SISR methods have recently been significantly enhanced by deep learning models. It should be noted that SISR is still a challenging ill-posed problem, since multiple high-resolution (HR) images can be generated from the same low-resolution (LR) image. Although most deep neural network architectures for SISR problems have been developed for natural images, their application to enhancing medical images is also worthwhile. In this chapter, we study the quality of microscopic image reconstruction using non-generative adversarial network (non-GAN) and GAN-based SR methods.
As a non-GAN-based method, we implement an image-fusion interpolation method based on a CNN model that can balance distortion and visual quality. As a GAN-based method, we use a one-to-one SR approach, in which a single HR image is generated from a single LR image. The advantages of this approach include simple implementation, lower time consumption, and lower complexity. We chose the real-time SR implementation due to its lower hardware requirements and higher speed. Currently, GAN methods are considered to provide better performance in the SISR domain because GAN-based SISR algorithms use isotropic and anisotropic Gaussian kernels to eliminate distortion, while CNN-based networks can only use isotropic Gaussian kernels. In addition, we experimentally investigate the possibility of preserving contours with a deep wavelet SR (DWSR) network. The rest of this chapter is structured as follows. Section 1.2 provides a brief overview of SISR methods applied to medical images. Section 1.3 describes deep learning models as non-GAN and GAN-based SISR algorithms. Setups and validation experiments are presented in Sect. 1.4. Finally, conclusions are drawn in Sect. 1.5.
1.2 Related Work
Like many computer vision problems, the practical task of automatically counting elements in microscope blood smear images can be solved using traditional image processing techniques or deep learning paradigms. First, we offer a brief overview of these topics. Second, we pay attention to the reconstruction of medical images suitable for classification and counting using deep neural networks. White blood cells (WBCs) or leukocytes, red blood cells (RBCs) or erythrocytes, and platelets have their own functions in the human blood system. The WBCs help build a strong antibody system against diseases, viruses and bacteria, while the RBCs help transport oxygen from the lungs throughout the body. The normal WBC count is in the range of 4500–10,000 per microliter, while values outside this range are considered abnormal and indicate the corresponding type of disease. The normal RBC count is 4.2–5.9 million cells per microliter. Lower or higher values are associated with anemia or with diseases of the heart, lung, or kidney, respectively. The normal platelet count ranges from 150,000 to 450,000 platelets per microliter. Platelets promote blood clotting. Generally speaking, any abnormal deviation in the value of a blood smear indicates an infection or disease [4]. Segmentation is the most important step in automatic cell counting, and its failure can lead to inaccuracies in subsequent steps. Segmentation based on traditional image processing used thresholding methods, pattern recognition methods, deformable methods, metaheuristic methods, and saliency methods [5]. The greatest difficulty is the segmentation of the WBC region, because WBCs consist of two parts: the nucleus and the cytoplasm. This is done by thresholding when analyzing the RGB, CMYK and HSV color spaces or a combination of color components [6].
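As a rough illustration of the color-space thresholding mentioned above, the following sketch computes the HSV saturation channel of an RGB image and thresholds it. It is a minimal example under assumptions that are not taken from [6]: stained nuclei are assumed to appear as strongly saturated pixels on a pale background, and the function names and the threshold value `s_thresh=0.5` are hypothetical.

```python
import numpy as np


def saturation_channel(rgb):
    """Return the HSV saturation of an RGB image with values in [0, 1]."""
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    # Saturation is (max - min) / max; it is defined as 0 for black pixels.
    return np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-12), 0.0)


def threshold_mask(rgb, s_thresh=0.5):
    """Binary mask of pixels whose saturation exceeds s_thresh (assumed value)."""
    return saturation_channel(rgb) > s_thresh


if __name__ == "__main__":
    # Synthetic 8 x 8 "smear": pale background with a 3 x 3 purple blob.
    image = np.full((8, 8, 3), (0.9, 0.85, 0.9))
    image[2:5, 2:5] = (0.4, 0.1, 0.6)
    mask = threshold_mask(image)
    print("pixels above threshold:", int(mask.sum()))
```

In practice the threshold would have to be tuned per staining protocol, which is one reason the lack of staining standards complicates this kind of segmentation.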
However, the main segmentation and classification problems are the presence of noise (caused by improper handling of tissue prior to fixation, tissue processing, sectioning, and staining), overlap of cells in the clump region, detection of irregular circles (normal RBCs and WBCs are approximately circular, but not all cells are), and detection of small particles. To solve them, several original methods have been proposed, such as median filtering and mathematical morphology, the circle Hough transform, the Gabor wavelet filter, connected component labeling, etc. Thus, the Hough transform and its modifications have often been used to detect circular RBCs and WBCs in microscopic images. Because cell types differ, segmentation is performed for WBCs and RBCs separately based on identification. In addition, it is necessary to recognize the types of WBCs (analysis of the nucleus and cytoplasm regions), which requires accurate segmentation results. Some interesting segmentation results can be found in the literature [7–9]. Feed-forward back-propagation neural networks were applied to WBC classification in 2010–2017, showing a precision of about 90%. Currently, many machine learning and deep learning models are being successfully applied to cell and nucleus segmentation [10] as well as WBC classification with higher average precision [11]. In [12], a CNN was used to classify erythrocytes based on morphology with a correct classification rate of 90.6% for 10 morphological classes. A new
fuzzy Lomax-Gumbel membership function with the fuzzy divergence algorithm was proposed in [13] for blood cell image segmentation. Davamani et al. used adaptive fuzzy C-means cell segmentation and hybrid learning-based cell classification to improve the efficiency of blood cell image identification [14]. In practice, blood smear images can be of poor quality, resulting in poor segmentation and classification. Color image enhancement, denoising, and color normalization are the main pre-processing steps, which can be performed using traditional image processing as well as deep learning techniques. Some original approaches can be found in the literature. Thus, Harun et al. [15] proposed a combination of the particle swarm optimization (PSO) algorithm and contrast stretching. The PSO algorithm optimized the fitness criterion to improve the details in the microscopic image, while a contrast stretching procedure was applied to enhance the range of pixel intensities. Shirazi et al. [16] used a Wiener filter for noise and blur removal. The finest details were obtained using the direct discrete curvelet transform. All manipulations were performed in the YUV color space. Color variations (color normalization) were studied in [17], whose authors proposed a geometry-inspired, chemical-invariant and tissue-invariant stain normalization method that is robust to illumination variation, stain color vector variation, and stain quantity variation through the use of singular value decomposition and non-negative matrix factorization to align the color basis. One of the promising ways is to use the SISR approach. Image SR methods can be divided into three categories: interpolation-based, statistics-based and learning-based methods. In previous years, the first two types dominated, but they were limited to small upscaling factors. Currently, methods based on deep learning have the highest efficiency in various fields, including the improvement of medical images [18].
SR image reconstruction is an ill-posed inverse problem, since many different HR images correspond to one LR image. This fact may not be critical for natural images, but it is critical for medical images. In traditional reconstruction methods, additional information can be obtained from several pairs of LR-HR images. Deep learning based SR reconstruction methods directly learn the end-to-end mapping function from the LR image to the HR image through a neural network with an appropriate architecture. The first deep SISR model was introduced in 2015 as a super-resolution convolutional neural network [19]. Further, different modifications were developed, for example, the fast super-resolution CNN [20] and the sub-pixel CNN [21], among others. Ledig et al. [22] proposed the first super-resolution generative adversarial network (SRGAN). Shortcomings such as edge blurring, image artifacts, and a complex network model have prompted attempts to improve SRGAN. Thus, Wasserstein GAN [23] used a gradient penalty to solve the problems of neural network training. In [24], a GAN based on depth-wise separable convolution dense blocks was proposed in order to increase computational efficiency by reducing the number of network parameters. A significant role is played by the high-frequency and low-frequency components of the medical image, which preserve the texture details and global topology information, respectively. The idea of using wavelet-based SR reconstruction emerged in the 2010s [25] and has evolved in the era of deep learning for many applications [26]. The main assumption is that an HR image with abundant textural details and global
topology information can be reconstructed from an LR image, provided that the corresponding wavelet coefficients are accurately predicted. The prediction of the HF wavelet coefficients helps to restore texture details, while constraints on the reconstruction of the LF wavelet coefficients ensure the consistency of the global topology information. Although many wavelet-based methods have already been proposed for solving the SR problem, the SISR problem is discussed in a limited number of publications. The multi-level wavelet CNN [27] model based on the U-Net architecture performed SR in the wavelet domain, so that both the frequency information and the location information of feature maps help to preserve detailed texture. This deep model demonstrated its efficiency in image denoising, SISR, and JPEG artifact removal. Thus, the application of deep learning-based SISR methods to improve medical images is attractive to many researchers due to significant practical results. New promising deep SISR models have recently been developed for medical image reconstruction, but the field of blood cell classification, detection and counting remains at the level of the first deep SISR models. At the same time, enhanced deep super-resolution networks that require large computing resources are unacceptable for medical diagnostics. Finding a trade-off between efficiency and computational costs is an area of further promising research.
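The split into LF and HF wavelet coefficients discussed above can be illustrated with a one-level 2D transform. The sketch below is only an illustration: it assumes the orthonormal Haar wavelet, which is the simplest choice and is not stated to be the wavelet used in [25–27], and the function names are hypothetical. The LF subband carries the coarse structure, the three HF subbands carry the local differences, and the inverse transform restores the image exactly.

```python
import numpy as np


def haar_dwt2(img):
    """One-level 2D Haar transform of an image with even height and width."""
    a = img[0::2, 0::2]  # top-left pixel of every 2 x 2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # LF subband: scaled local average
    lh = (a + b - c - d) / 2.0  # HF: difference between upper and lower rows
    hl = (a - b + c - d) / 2.0  # HF: difference between left and right columns
    hh = (a - b - c + d) / 2.0  # HF: diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; reconstructs the full-resolution image."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))
    subbands = haar_dwt2(image)
    restored = haar_idwt2(*subbands)
    print("max reconstruction error:", float(np.abs(restored - image).max()))
```

A wavelet-domain SR network predicts the subbands of the HR image instead of its pixels, so an error in an HF subband affects texture while an error in the LF subband affects global structure.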
1.3 Models
The SISR problem is formulated as follows. Let $I_{LR} \in X$ be a low-resolution image obtained, for example, by applying a Gaussian filter to a high-resolution image $I_{HR} \in Y$ followed by downsampling with a factor $r$. The SR forward problem is written as Eq. 1.1, where $O: Y \to X$ is the operator accounting for noise, blurring, downsampling distortions, and other artifacts.

$$I_{LR} = O(I_{HR}) \qquad (1.1)$$

The training dataset of pairs $(I_{LR}, I_{HR})$ is used to learn an approximate inverse $O^{-1}$ that predicts values $\hat{I}_{HR} = O^{-1}(I_{LR})$ so that $\hat{I}_{HR} \approx I_{HR}$. A parametric approximate inverse operator $O_\theta: X \to Y$ is learned by solving

$$\arg\min_{\theta \in \Theta} \sum_i L_{con}\left(I_{HR_i}, O_\theta(I_{LR_i})\right) + L_{reg}(\theta), \qquad (1.2)$$

where $L_{con}$ is a content loss function that optimizes the network parameters, $L_{reg}(\theta)$ is a regularization function, and $\Theta$ is the set of parameters. We categorize SISR implementations as non-GAN and GAN-based approaches. In the case of non-GAN-based methods, reconstruction evaluation metrics are used for the model optimization. The peak signal-to-noise ratio (PSNR), which is calculated using the mean square error (MSE) ($L_2$ norm in terms of deep learning), is the most
widely used metric to assess reconstruction quality for a super-resolution problem. The structural similarity index measure (SSIM) is also often used as a quality index adapted to the human visual system. Structural similarity between images is evaluated based on luminance, contrast and structure. In the case of GAN-based methods, additional adversarial losses based on cross entropy [21] should be taken into account. Thus, the adversarial min–max problem takes the form of Eq. 1.3, where $D_{\theta_D}$ is a discriminator network that is trained to distinguish SR images, $G_{\theta_G}$ is a generative network that is trained to fool the discriminator network, $L_{HR}$ and $L_{LR}$ are the HR and LR losses, respectively, and $\Theta_G$ and $\Theta_D$ are the sets of generator and discriminator parameters, respectively [28].

$$\min_{\theta_G \in \Theta_G} \max_{\theta_D \in \Theta_D} \sum_i L_{HR}\big(\log D_{\theta_D}(I_{HR_i})\big) + \sum_i L_{LR}\big(\log(1 - D_{\theta_D}(G_{\theta_G}(I_{LR_i})))\big) \tag{1.3}$$
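The role of the two terms in Eq. 1.3 can be illustrated numerically. The following is a minimal sketch, assuming that $L_{HR}$ and $L_{LR}$ are identity functions and that the discriminator outputs are probabilities in (0, 1); the function name `discriminator_objective` and the sample values are hypothetical and serve illustration only.

```python
import math


def discriminator_objective(d_real, d_fake, eps=1e-12):
    """Value of the objective in Eq. 1.3 for one batch.

    Assumption: L_HR and L_LR are identity functions.
    d_real: discriminator outputs D(I_HR) for real HR images, in (0, 1)
    d_fake: discriminator outputs D(G(I_LR)) for generated SR images, in (0, 1)
    """
    # eps guards against log(0) when an output saturates at 0 or 1.
    real_term = sum(math.log(max(p, eps)) for p in d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake)
    return real_term + fake_term


# A discriminator that separates HR from SR images well keeps the value near 0,
# which is the maximum it is trained to reach.
confident = discriminator_objective([0.99, 0.98], [0.01, 0.02])

# A fooled discriminator (high scores on generated images) gives a much lower
# value, which is what the generator is trained to achieve.
fooled = discriminator_objective([0.60, 0.50], [0.90, 0.95])
```

The discriminator pushes the value up and the generator pushes it down, which is the min–max structure of the problem.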
Regardless of the deep architecture used, three steps are performed sequentially: feature extraction from the LR image, non-linear mapping, and super-resolution image reconstruction. We tested several non-GAN and GAN-based deep SISR models and selected two of them: a multi-scale refined context network (MRC-Net) with refined context fusion (RCF) [29] and Swift-SRGAN [30].
The main idea of MRC-Net is to extract both global structures at a large scale and local details at a small scale and then fuse them with the RCF unit. We used only this part of the network proposed in [29], since the diagnostics framework is beyond the scope of our study. The MRC-Net architecture includes three modules: feature extraction, nonlinear mapping, and image reconstruction. The RCF unit performs nonlinear mapping as a bilinear downsampling/upsampling to capture an efficient global context and iteratively integrate the significant features of the two paths. Image reconstruction at the targeted resolution is performed by a pixel shuffle layer [21].
In [30], an efficient real-time solution based on depth-wise separable convolutions was proposed. Depth-wise separable convolutions [31] consist of a depth-wise convolution and a point-wise convolution and, unlike standard convolutions, require very few training parameters. The perceptual loss function, which is crucial for the generator performance, is formed as the sum of content losses and adversarial losses. The content losses are calculated as MSE losses, and the adversarial losses are computed through a cross-entropy function.
Another research topic is the type of input data fed into the deep networks. Typical inputs for non-GAN (such as CNNs) and GAN-based models are images. Both edges and texture are inherent components of microscopic medical images and require high-quality reconstruction. For this purpose, we use two types of input data representation: in the spatial and frequency domains.
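The parameter savings of depth-wise separable convolutions can be checked with simple arithmetic. The sketch below counts weights only (biases are ignored, which is an assumption) for a layer with 64 input channels, 64 output channels and a 3 × 3 kernel; the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depth-wise separable convolution (biases ignored)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


std = standard_conv_params(64, 64, 3)        # 64 * 64 * 9 = 36864
sep = depthwise_separable_params(64, 64, 3)  # 64 * 9 + 64 * 64 = 4672
ratio = std / sep                            # roughly 7.9 times fewer weights
```

For this configuration the separable variant needs roughly one eighth of the weights of the standard convolution.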
First, the LR image is upscaled to the SR image. Second, the PSNR and SSIM metrics evaluate the reconstructed SR image, and a decision is made on further improvement. If necessary, the reconstructed SR image $I_{SR} \in \mathbb{R}^{h \times w}$, where $h$ and $w$ are the height and width of the image, respectively, is transformed into a direct 2-level wavelet decomposition (obtaining
four coefficients), in other words, into high-frequency (HF) texture details and low-frequency (LF) edge information. Then, the HF and LF components are explicitly separated into four channels, $I_W \in \mathbb{R}^{h/2 \times w/2 \times 4}$. It should be noted that since diverse samples are stored in four channels, the complexity of training such a model is significantly reduced. The mean absolute error (MAE) ($L_1$ norm) between the predicted and the ground-truth wavelet coefficients is adopted as the wavelet loss function $L_W$ given by Eq. 1.4, where $P_{SR,LR}$ stands for the joint distribution of LR and HR pairs, $sr$ and $lr$ are the current pixels of the SR and LR images, $K$ is the number of wavelet sub-scales, $k$ stands for one of the three types H, V and D of detail coefficients (Horizontal, Vertical and Diagonal, respectively), and $c_j^k(sr)$ and $\hat{c}_j^k(lr)$ represent the ground-truth wavelet detail sub-bands and the corresponding predicted wavelet sub-bands at the $j$th scale, respectively.

$$L_W = \mathbb{E}_{sr,lr \sim P_{SR,LR}} \left[ \sum_{j=1}^{K} \sum_{k \in \{H,V,D\}} \left\| c_j^k(sr) - \hat{c}_j^k(lr) \right\|_1 \right] \tag{1.4}$$
Finally, the inverse wavelet transform is applied to the output data (without information loss), and re-evaluation is performed.
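The wavelet step can be sketched with the simplest wavelet, the Haar wavelet. The code below is an illustrative assumption: the chapter does not state which wavelet family or implementation is used, and the function names are hypothetical. It performs a one-level 2D decomposition into an approximation band and H, V, D detail bands, inverts it without information loss, and evaluates an $L_1$ loss over the detail bands at a single scale.

```python
def haar_dwt2(img):
    """One-level 2D Haar transform of an image with even height and width.

    Returns (LL, H, V, D), each of half the height and width.
    The naming of the detail bands follows one common convention.
    """
    h2, w2 = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * w2 for _ in range(h2)]
    ch = [[0.0] * w2 for _ in range(h2)]
    cv = [[0.0] * w2 for _ in range(h2)]
    cd = [[0.0] * w2 for _ in range(h2)]
    for i in range(h2):
        for j in range(w2):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # low-frequency approximation
            ch[i][j] = (a + b - c - d) / 2  # difference between rows
            cv[i][j] = (a - b + c - d) / 2  # difference between columns
            cd[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, ch, cv, cd


def haar_idwt2(ll, ch, cv, cd):
    """Inverse of haar_dwt2; reconstructs the full-resolution image."""
    h2, w2 = len(ll), len(ll[0])
    img = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, h, v, d = ll[i][j], ch[i][j], cv[i][j], cd[i][j]
            img[2 * i][2 * j] = (s + h + v + d) / 2
            img[2 * i][2 * j + 1] = (s + h - v - d) / 2
            img[2 * i + 1][2 * j] = (s - h + v - d) / 2
            img[2 * i + 1][2 * j + 1] = (s - h - v + d) / 2
    return img


def wavelet_l1_loss(bands_true, bands_pred):
    """Mean absolute error over the H, V, D detail bands of one scale."""
    total, count = 0.0, 0
    for band_t, band_p in zip(bands_true[1:], bands_pred[1:]):
        for row_t, row_p in zip(band_t, band_p):
            for x, y in zip(row_t, row_p):
                total += abs(x - y)
                count += 1
    return total / count


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

The four bands together have the shape h/2 × w/2 × 4, matching the four-channel representation described above.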
1.4 Setups and Validation Experiments
In this section, we sequentially consider the setups and experimental results obtained on the two public datasets using deep models in different configurations. Datasets are described in Sect. 1.4.1. Implementation details of deep network architectures are discussed in Sect. 1.4.2. Section 1.4.3 presents the main evaluation metrics used to evaluate reconstructed images. Details of the experimental setups are shown in Sect. 1.4.4. Section 1.4.5 describes the comparative results of the validation experiments.
1.4.1 Datasets
The experiments used two open datasets from Kaggle cloud storage, "Blood cell images" [32] and "Malaria bounding boxes" [33]. The total number of training images obtained from these two datasets was 1,890 three-channel images. The images in the datasets have different sizes and file formats, so preprocessing was required: a script scaled all images down to a single resolution of 256 × 256 × 3. Reconstructing SR microscopic images requires two training datasets whose pixel counts differ by a fixed ratio: the HR and LR microscopic images with resolutions of 256 × 256 × 3 and 64 × 64 × 3, respectively. The LR microscopic images were obtained using down-sampling, blur, and noise addition operators. Figure 1.1 shows examples of paired images from the two training datasets.

Fig. 1.1 Examples of paired images: a the original HR images, b the corresponding LR images

The main problem with microscopic images of blood smears is the lack of a staining standard. Blood smear image datasets have different color intensities because they were collected in different countries and institutions with unknown stain parameters. Thus, three datasets were formed: blood smears from an anonymous medical institution (Set1) [32], blood smears obtained from the Leonid and Maria Dan Institute (Set2), and blood smears obtained from the National Singapore University (Set3) [33]. Figure 1.2 depicts examples from these subsets. Each dataset was divided into training, validation and test subsets with 500, 100 and 30 images, respectively.
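The generation of an LR image from an HR image can be sketched as follows. This is a simplified illustration and not the authors' script: it assumes a single-channel image stored as a nested list, uses block averaging as a combined blur and down-sampling step with factor 4 (256 → 64), and optionally adds Gaussian noise. The function name `degrade` and its parameters are hypothetical.

```python
import random


def degrade(hr, r=4, noise_sigma=0.0, seed=0):
    """Produce an LR image from a single-channel HR image.

    Each r x r block is averaged (a box blur followed by down-sampling),
    then optional Gaussian noise is added and values are clipped to [0, 255].
    """
    rng = random.Random(seed)
    height, width = len(hr) // r, len(hr[0]) // r
    lr = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [hr[i * r + di][j * r + dj]
                     for di in range(r) for dj in range(r)]
            value = sum(block) / (r * r)
            if noise_sigma > 0:
                value += rng.gauss(0.0, noise_sigma)
            row.append(min(255.0, max(0.0, value)))
        lr.append(row)
    return lr


# A synthetic 256 x 256 gradient image stands in for an HR blood smear channel.
hr_image = [[(x + y) % 256 for x in range(256)] for y in range(256)]
lr_image = degrade(hr_image, r=4, noise_sigma=2.0)
```

Applying the same function to each of the three color channels gives an HR/LR pair with the 256 × 256 and 64 × 64 resolutions used in the experiments.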
Fig. 1.2 Examples from datasets: a Set1, b Set2, c Set3

1.4.2 Implementation Details
Let us consider in detail the architectures of the two deep neural networks used to solve the SR problem. The MRC-Net network is a typical CNN architecture without the last (classifying) layers; it involves an LSTM (long short-term memory) layer, an MRC module, and a shuffle layer [29]. MRC-Net combines global structures on a large scale and local details on a small scale, for which the RCF unit is used. First, the recursive MRC-Net is trained using the LSTM layer, which takes long-term dependencies into account. Second, the obtained features are fed into the unidirectional part of MRC-Net, which preserves performance while reducing computational costs. The MRC-Net extracts global and local features in two parallel branches using the RCF unit.
The MRC-Net architecture is presented in Table 1.1. The stride of convolutional kernels is equal to 1. The architecture includes two convolutional layers Conv2D with 3 × 3 kernels followed by two residual blocks that convert the input LR image into feature maps. A convolutional LSTM unit (ConvLSTM) is then adopted to exploit and store the previous feature maps. An LSTM unit consists of an input gate, a forget gate, an output gate, and a cell state. The gates and states are updated over time using the convolution operation and the Hadamard product. The MRC module includes two parallel paths: a local path with input feature maps and a global path with the corresponding down-sampled data. Each path contains alternating groups of residual blocks and RCF units. Residual blocks extract features at different scales, while RCF units use an efficient global context to refine and iteratively integrate significant features from the two paths. As a result, refined feature maps are created. Since the ConvLSTM and the MRC module have input and output vectors of the same dimension, they can interact freely. The processed features from the MRC module are returned to the ConvLSTM a certain number of times, removing potential errors made in previous iterations.
Table 1.1 MRC-Net architecture

| Layer | Input vector | Output vector | Convolutional kernel size | Number of layers |
| --- | --- | --- | --- | --- |
| Conv2D | 3 × 64 × 64 | 64 × 64 × 64 | 3 | 1 |
| Conv2D | 64 × 64 × 64 | 64 × 64 × 64 | 3 | 1 |
| Residual Block | 64 × 64 × 64 | 128 × 64 × 64 | – | 2 |
| ConvLSTM (convolutional kernel 3 × 3, sigmoid recurrent activation function, tanh activation function) | 128 × 64 × 64 | 128 × 64 × 64 | 3 | 1 |
| MRC module | 128 × 64 × 64 | 128 × 64 × 64 | | 1 |
| Conv2D | 128 × 64 × 64 | 256 × 64 × 64 | | 1 |
| ReLU | 256 × 64 × 64 | 256 × 64 × 64 | | 1 |
| Pixel shuffle | 64 × 256 × 256 | 64 × 256 × 256 | | 1 |
| Upsample block | 64 × 256 × 256 | 3 × 256 × 256 | – | 1 |
| Conv2D | 3 × 256 × 256 | 3 × 256 × 256 | 3 | 1 |
Swift-SRGAN has a GAN architecture consisting of a generator and a discriminator. It uses depth-wise separable convolutions for feature extraction, which enables real-time operation and requires less memory [30]. The generator consists of residual blocks with Depth-wise Separable Convolutions (which significantly reduce inference time), followed by Batch Normalization and a parametric ReLU (PReLU) layer. An element-wise sum combines the output of the previous block with the current output. The up-sample block contains Depth-wise Separable Convolutions followed by two Pixel Shuffle layers and a PReLU activation function. The discriminator consists of 8 Depth-wise Separable Convolution blocks followed by Batch Normalization and a LeakyReLU activation function. The output of the adaptive average pooling layer is passed to a linear layer of 1024 neurons. The main objective of the discriminator is to classify SR images as fake and HR images as real. We have modified the generator and discriminator as follows:
• Adding 4 Residual Block and Depth-wise Conv layers to the generator, which increases the number of Residual Block and Depth-wise Conv layers to 20.
• Using 2 additional layers (Depth-wise Conv + Batch Normalization + Leaky ReLU).
The modified architecture of Swift-SRGAN allowed the extraction of more unique feature maps, which helps improve the accuracy of reconstructed microscopic images of blood smears. In addition, the modification did not affect the speed of the neural network. The architecture of the modified Swift-SRGAN generator is presented in Table 1.2, while Tables 1.3 and 1.4 show the architecture of the Residual Block and the architecture of the Swift-SRGAN discriminator. The stride of convolutional kernels is equal to 1.
The SR images output by MRC-Net and Swift-SRGAN were fed to the DWSR module to improve contour information. The DWSR module is based on the DWSR network proposed in [34]. Figure 1.3 shows an example of a 2D wavelet transform of a microscopic blood smear image.
The application for creating the SR microscopic images of blood smears contains the following modules:
• GUI module.
• Image processing module.
• Neural network training module.
• MRC-Net module.
• Swift-SRGAN module.
• DWSR module.
• Image reconstruction module.
• Module for saving results.

Table 1.2 Modified Swift-SRGAN architecture

| Layer | Input vector | Output vector | Convolutional kernel size | Number of layers |
| --- | --- | --- | --- | --- |
| Depth-wise Conv | 3 × 64 × 64 | 64 × 64 × 64 | 9 | 1 |
| PReLU | 64 × 64 × 64 | 64 × 64 × 64 | – | 1 |
| Residual Block | 64 × 64 × 64 | 64 × 64 × 64 | 3 | 20 |
| Depth-wise Conv | 64 × 64 × 64 | 64 × 64 × 64 | 3 | 20 |
| Batch normalization | 64 × 64 × 64 | 64 × 64 × 64 | – | 1 |
| Element-wise sum | 64 × 64 × 64 | 256 × 64 × 64 | – | 1 |
| Depth-wise Conv | 256 × 64 × 64 | 256 × 64 × 64 | 9 | 1 |
| Pixel shuffle | 256 × 64 × 64 | 64 × 128 × 128 | – | 1 |
| PReLU | 64 × 128 × 128 | 64 × 128 × 128 | – | 1 |
| Upsample block | 64 × 128 × 128 | 256 × 128 × 128 | – | 1 |
| Depth-wise Conv | 256 × 128 × 128 | 256 × 128 × 128 | 9 | 1 |
| Pixel shuffle | 256 × 128 × 128 | 64 × 256 × 256 | – | 1 |
| PReLU | 64 × 256 × 256 | 64 × 256 × 256 | – | 1 |
| Upsample block | 64 × 256 × 256 | 3 × 256 × 256 | – | 1 |
| Depth-wise Conv | 3 × 256 × 256 | 3 × 256 × 256 | 9 | 1 |

Table 1.3 Architecture of Residual Block

| Layer | Input vector | Output vector | Convolutional kernel size | Number of layers |
| --- | --- | --- | --- | --- |
| Depth-wise Conv | 64 × 64 × 64 | 64 × 64 × 64 | 3 | 1 |
| Batch normalization | 64 × 64 × 64 | 64 × 64 × 64 | – | 1 |
| PReLU | 64 × 64 × 64 | 64 × 64 × 64 | – | 1 |
| Depth-wise Conv | 64 × 64 × 64 | 64 × 64 × 64 | 3 | 1 |
| Batch normalization | 64 × 64 × 64 | 64 × 64 × 64 | – | 1 |
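The pixel shuffle rows of the generator table trade channels for spatial resolution: an input of 256 × 64 × 64 with an upscaling factor of 2 becomes 64 × 128 × 128. The following is a minimal pure-Python sketch of this rearrangement on nested lists, written to mirror the sub-pixel convolution idea of [21]; the function name is hypothetical and the code is meant for illustration, not for use inside a network.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C * r * r, H, W) nested list into (C, H * r, W * r).

    Output pixel (c, i, j) is taken from input channel
    c * r * r + (i % r) * r + (j % r) at position (i // r, j // r).
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


# Four 1 x 1 channels become a single 2 x 2 channel for r = 2.
tiny = [[[0]], [[1]], [[2]], [[3]]]
shuffled = pixel_shuffle(tiny, 2)

# Shape check on a small tensor: 8 x 3 x 5 -> 2 x 6 x 10.
small = [[[0] * 5 for _ in range(3)] for _ in range(8)]
up = pixel_shuffle(small, 2)
up_shape = (len(up), len(up[0]), len(up[0][0]))
```

Two such steps with r = 2 bring 64 × 64 feature maps to the target 256 × 256 resolution.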
Table 1.4 Architecture of Swift-SRGAN discriminator (values separated by "/" refer to the sub-layers of a block in order)

| Layer | Input vector | Output vector | Convolutional kernel size | Number of layers |
| --- | --- | --- | --- | --- |
| Depth-wise Conv | 3 × 256 × 256 | 64 × 256 × 256 | 3 | 1 |
| Leaky ReLU | 64 × 256 × 256 | 64 × 256 × 256 | – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 64 × 256 × 256 / 64 × 128 × 128 / 64 × 128 × 128 | 64 × 128 × 128 / 64 × 128 × 128 / 64 × 128 × 128 | 3 / – / – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 64 × 128 × 128 / 128 × 128 × 128 / 128 × 128 × 128 | 128 × 128 × 128 / 128 × 128 × 128 / 128 × 128 × 128 | 3 / – / – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 128 × 128 × 128 / 128 × 64 × 64 / 128 × 64 × 64 | 128 × 64 × 64 / 128 × 64 × 64 / 128 × 64 × 64 | 3 / – / – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 128 × 64 × 64 / 256 × 64 × 64 / 256 × 64 × 64 | 256 × 64 × 64 / 256 × 64 × 64 / 256 × 64 × 64 | 3 / – / – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 256 × 64 × 64 / 256 × 32 × 32 / 256 × 32 × 32 | 256 × 32 × 32 / 256 × 32 × 32 / 256 × 32 × 32 | 3 / – / – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 256 × 32 × 32 / 512 × 32 × 32 / 512 × 32 × 32 | 512 × 32 × 32 / 512 × 32 × 32 / 512 × 32 × 32 | 3 / – / – | 1 |
| Depth-wise Conv + Batch normalization + Leaky ReLU | 512 × 32 × 32 / 512 × 16 × 16 / 512 × 16 × 16 | 512 × 16 × 16 / 512 × 16 × 16 / 512 × 16 × 16 | 3 / – / – | 1 |
| Adaptive pooling | 512 × 16 × 16 | 512 × 6 × 6 | – | 1 |
| Linear | 512 × 16 × 16 | 1024 | – | 1 |
| Leaky ReLU | 1024 | 1024 | – | 1 |
| Linear | 1024 | 1024 | – | 1 |
| Sigmoid | 1 | 1 | – | 1 |

Fig. 1.3 Example of a 2D wavelet transform of a microscopic blood smear image

1.4.3 Evaluation Criteria
Two main metrics are commonly used to evaluate reconstructed SR images. The peak signal-to-noise ratio (PSNR) is the ratio of the maximum image power $MAX_i$ to the noise power, calculated by Eq. 1.5.

$$PSNR = 10 \cdot \log_{10} \frac{MAX_i^2}{MSE} \tag{1.5}$$

Here, the mean square error (MSE) is given by Eq. 1.6, where $I_i$ is the intensity of the high-resolution image and $\hat{I}_i$ is the intensity of the reconstructed SR image at point $i$. The higher the PSNR, the better the SR image quality.

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( I_i - \hat{I}_i \right)^2 \tag{1.6}$$

The structural similarity index (SSIM) reflects the perceptual representation of an image in terms of structure, brightness, and contrast. The SSIM values range from 0.0 to 1.0, where 1.0 means a perfect copy of the high-resolution image. The SSIM is calculated using the mean, variance and correlation of images $I$ and $\hat{I}$, denoted $\mu_I$, $\sigma_I$ and $\sigma_{I\hat{I}}$, respectively:

$$SSIM = \frac{\left( 2\mu_I \mu_{\hat{I}} + \alpha \right)\left( 2\sigma_{I\hat{I}} + \beta \right)}{\left( \mu_I^2 + \mu_{\hat{I}}^2 + \alpha \right)\left( \sigma_I^2 + \sigma_{\hat{I}}^2 + \beta \right)}, \tag{1.7}$$

where $\alpha$ and $\beta$ are constants chosen to be small to avoid a zero denominator. The SSIM value is usually calculated over a small window, after which the mean SSIM value is computed for the entire image.
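The three metrics can be sketched in a few lines of plain Python. This is a minimal illustration on flat lists of pixel intensities and not the evaluation code used in the experiments. It rests on several assumptions: 8-bit images (maximum value 255), a single global SSIM window instead of a sliding one, the standard covariance form of the SSIM numerator, and the commonly used constants (0.01 · 255)² and (0.03 · 255)² for α and β. All function names are hypothetical.

```python
import math


def mse(ref, rec):
    """Mean square error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(ref, rec)) / len(ref)


def psnr(ref, rec, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, rec)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


def ssim_global(ref, rec, alpha=(0.01 * 255) ** 2, beta=(0.03 * 255) ** 2):
    """SSIM computed over the whole image as one window (a simplification)."""
    n = len(ref)
    mu_x = sum(ref) / n
    mu_y = sum(rec) / n
    var_x = sum((x - mu_x) ** 2 for x in ref) / n
    var_y = sum((y - mu_y) ** 2 for y in rec) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(ref, rec)) / n
    numerator = (2 * mu_x * mu_y + alpha) * (2 * cov + beta)
    denominator = (mu_x ** 2 + mu_y ** 2 + alpha) * (var_x + var_y + beta)
    return numerator / denominator


reference = [10, 20, 30, 40]
reconstructed = [12, 18, 33, 41]
error = mse(reference, reconstructed)        # (4 + 4 + 9 + 1) / 4 = 4.5
quality_db = psnr(reference, reconstructed)
similarity = ssim_global(reference, reconstructed)
```

A windowed SSIM, as described above, would apply `ssim_global` to each local patch and average the results.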
1.4.4 Experimental Setups
PyCharm Community Edition, an integrated development environment for Python, was chosen as the development environment for the experiments. It includes the tools needed to create Python applications quickly and easily. One of the main factors in choosing this environment was the ability to use additional libraries, which improves programming efficiency. PyTorch was chosen as the main framework for working with deep learning models. The main advantage of PyTorch is the construction of dynamic computational graphs, as opposed to the static computational graphs presented in TensorFlow and Keras. It should be noted that PyTorch is widely used in developments by Facebook, Twitter, NVIDIA and other companies. A large number of libraries were used during the implementation; the main ones are cv2, PyQt5, and torch. The program interface was implemented using the PyQt5 library and the Qt Designer auxiliary program. Trained neural networks are handled with the PyTorch framework through the torch library. In addition, the developed software works with images and therefore needs an image processing tool. The OpenCV platform and PIL were chosen for this purpose as frameworks for developing applications in the field of computer vision and artificial intelligence. OpenCV and PIL make it possible to display images on the screen and to perform preprocessing. A complete list of the libraries used is given in Table 1.5, where GUI means graphical user interface.
Table 1.5 List of used libraries

| Part of application | Title | Version | Part of application | Title | Version |
| --- | --- | --- | --- | --- | --- |
| Image preprocessing | Numpy | 1.16.3 | GUI | PyQt5 | 5.15.4 |
| Image preprocessing | Matplotlib | 3.4.3 | GUI | PyQt5-plugins | 5.15.4.2.2 |
| Image preprocessing | Pillow | 8.4.0 | GUI | PyQt5-Qt5 | 5.15.2 |
| Image preprocessing | Numpy | 1.18.5 | GUI | PyQt5-sip | 12.9.0 |
| Image preprocessing | Scipy | 1.4.1 | GUI | PyQt5-tools | 5.15.4.3.2 |
| Image processing | Opencv-python | 4.5.4.58 | GUI | PySide2 | 5.15.2 |
| Machine learning | Torch | 1.7.0 | GUI | PySimpleGUI | 4.55.1 |
| Machine learning | Torchvision | 0.8.1 | GUI | Qt5-applications | 5.15.2.2.2 |
| Console output | TQDM | 4.41.0 | GUI | Qt5-tools | 5.15.2.1.2 |
System requirements are as follows:
• Processor with a clock frequency of 1.5 GHz or higher.
• 4 GB of free hard disk space.
• 4 GB of RAM.
The software is compatible with Windows 10 and Linux. To speed up the software, an NVIDIA graphics card is recommended, since its CUDA cores can be used by a convolutional neural network to accelerate the creation of SR microscopic images.
1.4.5 Validation Experiments
The graphs of loss functions for the MRC-Net and Swift-SRGAN architectures are depicted in Fig. 1.4. Figure 1.5 shows the obtained estimates of the PSNR and SSIM metrics for MRC-Net and Swift-SRGAN on the three training datasets Set1, Set2 and Set3. Table 1.6 presents experimental PSNR and SSIM estimates for the reconstruction of LR images by MRC-Net and Swift-SRGAN on the three training datasets Set1, Set2 and Set3. Figures 1.6, 1.7 and 1.8 depict the visual results of MRC-Net and Swift-SRGAN for the three training datasets Set1, Set2 and Set3, respectively.

Fig. 1.4 Graphs of loss functions depending on training epochs for: a MRC-Net, b Swift-SRGAN

Fig. 1.5 Objective estimations of PSNR and SSIM of reconstructing LR image depending on training epochs: a MRC-Net, b Swift-SRGAN

Table 1.6 Experimental estimates

| Neural network model | PSNR, dB | SSIM |
| --- | --- | --- |
| MRC-Net (Set1) | 28.67 | 0.79 |
| MRC-Net (Set2) | 29.49 | 0.89 |
| MRC-Net (Set3) | 30.10 | 0.91 |
| Swift-SRGAN (Set1) | 26.90 | 0.75 |
| Swift-SRGAN (Set2) | 28.77 | 0.83 |
| Swift-SRGAN (Set3) | 28.84 | 0.87 |
| Swift-SRGAN (modified) (Set1) | 27.89 | 0.79 |
| Swift-SRGAN (modified) (Set2) | 29.56 | 0.89 |
| Swift-SRGAN (modified) (Set3) | 30.08 | 0.91 |

Fig. 1.6 Comparative results for image reconstruction from Set1: a original HR images, b LR images, c bicubic reconstruction, d MRC-Net, e MRC-Net + DWSR, f Swift-SRGAN, g Swift-SRGAN + DWSR, h modified Swift-SRGAN, j modified Swift-SRGAN + DWSR

The experimental results demonstrate the effectiveness of the obtained models. The MRC-Net model achieved its best results on Set3, with a PSNR value of 30.1 dB and an SSIM value of 0.91. The high values can be explained by the fact that the medical images of blood smears in this subsample have less pronounced contours. In turn, the Swift-SRGAN model showed the best performance during training. Thus, the results of reconstruction of blood smear images significantly depend on the quality of the original datasets. Also, the modified Swift-SRGAN model performed better than the original Swift-SRGAN model. This is explained by the fact that the additional convolutional layers made it possible to extract more unique features and to train the neural network better. The wavelet transform helps to obtain sharper contours and additionally removes background artifacts. This latter, unexpected property makes blood smear images clearer and more suitable for subsequent cell recognition.
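The gain of the modified Swift-SRGAN over the original model can be read directly from the reported estimates. The short sketch below only re-uses the PSNR and SSIM values from the table of experimental estimates and computes the per-dataset differences; the variable names are hypothetical.

```python
# (PSNR in dB, SSIM) per dataset, copied from the reported estimates.
original = {"Set1": (26.90, 0.75), "Set2": (28.77, 0.83), "Set3": (28.84, 0.87)}
modified = {"Set1": (27.89, 0.79), "Set2": (29.56, 0.89), "Set3": (30.08, 0.91)}

gains = {}
for name in original:
    psnr_gain = round(modified[name][0] - original[name][0], 2)
    ssim_gain = round(modified[name][1] - original[name][1], 2)
    gains[name] = (psnr_gain, ssim_gain)

for name, (psnr_gain, ssim_gain) in gains.items():
    print(f"{name}: PSNR +{psnr_gain:.2f} dB, SSIM +{ssim_gain:.2f}")
```

The modification improves PSNR by about 0.8 to 1.2 dB and SSIM by 0.04 to 0.06 on all three datasets.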
Fig. 1.7 Comparative results for image reconstruction from Set2: a original HR images, b LR images, c bicubic reconstruction, d MRC-Net, e MRC-Net + DWSR, f Swift-SRGAN, g Swift-SRGAN + DWSR, h modified Swift-SRGAN, j modified Swift-SRGAN + DWSR
Fig. 1.8 Comparative results for image reconstruction from Set3: a original HR images, b LR images, c bicubic reconstruction, d MRC-Net, e MRC-Net + DWSR, f Swift-SRGAN, g Swift-SRGAN + DWSR, h modified Swift-SRGAN, j modified Swift-SRGAN + DWSR

1.5 Conclusions
In practice, many microscopic blood smear images do not have sufficient resolution. Such images need to be enhanced so that the texture and contours of the reconstructed SR image are preserved in good quality. Traditional methods, such as bicubic interpolation, did not provide sufficient results, and only in recent years have deep SISR methods made a great contribution to this field. The current research examines a recent non-GAN-based SR model (MRC-Net) and a GAN-based SR model (Swift-SRGAN), including their modification and the use of the additional DWSR network. The study includes rich experimental material obtained from open datasets. The best achieved results are objective estimates of a PSNR value of 30.1 dB and an SSIM value of 0.91. We see three directions for future research: preliminary analysis of color microscopic images, incorporation of wavelet processing into non-GAN-based and GAN-based SR models, and implementation of a real-time modification to meet practical requirements.
References
1. Prasad, N.K.T.K., Singh, B.M.K.: Analysis of red blood cells from peripheral blood smear images for anemia detection: a methodological review. Med. Biol. Eng. Comput. 60, 2445–2462 (2022)
2. Su, M.-C., Cheng, C.-Y., Wang, P.-C.: A neural-network-based approach to white blood cell classification. Sci. World J. 2014, 796371.1–796371.9 (2014)
3. Dong, N., Zhai, M.-D., Chang, J.-F., Wu, C.-H.: A self-adaptive approach for white blood cell classification towards point-of-care testing. Appl. Soft Comput. 111, 107709.1–107709.13 (2021)
4. Alomari, Y.M., Abdullah, S.N., Azma, R.Z., Omar, K.: Automatic detection and quantification of WBCs and RBCs using iterative structured circle detection algorithm. Comput. Math. Methods Med. 2014, 979302.1–979302.17 (2014)
5. Saraswat, M., Arya, K.V.: Automated microscopic image analysis for leukocytes identification: a survey. Micron 65, 20–33 (2014)
6. Safuan, S.N.M., Tomari, M.R.M., Zakaria, W.N.W.: White blood cell (WBC) counting analysis in blood smear images using various color segmentation methods. Measurement 116, 543–555 (2018)
7. Zhang, C., Xiao, X., Li, X., Chen, Y.-J., Zhen, W., Chang, J., Zheng, C., Liu, Z.: White blood cell segmentation by color-space-based K-mean clustering. Sensors 14, 16128–16147 (2014)
8. López-Puigdollers, D., Traver, V.J., Pla, F.: Recognizing white blood cells with local image descriptors. Expert Syst. Appl. 115, 695–708 (2019)
9. AL-Dulaimi, K., Tomeo-Reyes, I., Banks, J., Chandran, V.: Evaluation and benchmarking of level set-based three forces via geometric active contours for segmentation of white blood cell nuclei shape. Comput. Biol. Med. 116, 103568.1–103568.15 (2020)
10. Ahmed, I., Balestrieri, E., Tudosa, I., Lamonaca, F.: Segmentation techniques for morphometric measurements of blood cells: overview and research challenges. Meas.: Sens. 24, 100430.1–100430.12 (2022)
11. Chen, H., Liu, J., Hua, C., Feng, J., Pang, B., Cao, D., Li, C.: Accurate classification of white blood cells by coupling pre-trained ResNet and DenseNet with SCAM mechanism. BMC Bioinform. 23, 282.1–282.20 (2022)
12. Durant, T.J., Olson, E.M., Schulz, W.L., Torres, R.: Very deep convolutional neural networks for morphologic classification of erythrocytes. Clin. Chem. 63(12), 1847–1855 (2017)
13. Nayak, D.R., Padhy, N., Swain, B.K.: Blood cell image segmentation using modified fuzzy divergence with morphological transforms. Mater. Today: Proc. 37, 2708–2718 (2021)
14. Davamani, K.A., Robin, R.C.R., Robin, D.D., Anbarasi, J.L.: Adaptive blood cell segmentation and hybrid learning-based blood cell classification: a meta-heuristic-based model. Biomed. Signal Process. Control 75, 103570.1–103570.16 (2022)
15. Harun, N.H., Bakar, J.A., Abd Wahab, Z., Osman, M.K., Harun, H.: Color image enhancement of acute leukemia cells in blood microscopic image for leukemia detection sample. In: The 2020 IEEE 10th Symposium on Computer Applications & Industrial Electronics (ISCAIE), pp. 24–29. IEEE, Malaysia (2020)
16. Shirazi, S.H., Umar, A.I., Naz, S., Razzak, M.I.: Efficient leukocyte segmentation and recognition in peripheral blood image. Technol. Health Care 24(3), 335–347 (2016)
17. Gupta, A., Duggal, R., Gehlot, S., Gupta, R., Mangal, A., Kumar, L., Thakkar, N., Satpathy, D.: GCTI-SN: geometry-inspired chemical and tissue invariant stain normalization of microscopic medical images. Med. Image Anal. 65, 101788.1–101788.18 (2020)
18. Li, Y., Sixou, B., Peyrin, F.: A review of the deep learning methods for medical images super resolution problems. IRBM 42, 120–133 (2021)
19. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2015)
20. Dong, C., Loy, C.C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016. LNCS, vol. 9906, pp. 391–407. Springer, Cham (2016)
21. Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: The IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883. IEEE, Las Vegas, NV, USA (2016)
22. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. In: The IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690. IEEE, Honolulu, HI, USA (2017)
23. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN, pp. 1–32 (2017). arXiv:1701.07875v3
24. Jiang, Z., Huang, Y., Hu, L.: Single image super-resolution: depthwise separable convolution super-resolution generative adversarial network. Appl. Sci. 10, 375–375.10 (2020)
25. Ji, H., Fermüller, C.: Robust wavelet-based super-resolution reconstruction: theory and algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 31(4), 649–660 (2009)
26. Huang, H., He, R., Sun, Z., Tan, T.: Wavelet-SRNet: a wavelet-based CNN for multi-scale face super resolution. In: 2017 IEEE International Conference on Computer Vision, pp. 1698–1706. IEEE, Honolulu, HI, USA (2017)
27. Liu, P., Zhang, H., Zhang, K., Lin, L., Zuo, W.: Multi-level wavelet-CNN for image restoration. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 773–782. Computer Vision Foundation/IEEE Computer Society, Salt Lake City, Utah, USA (2018)
28. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems (NIPS), pp. 2672–2680, Montreal, Quebec, Canada (2014)
29. Chen, Z., Guo, X., Woo, P.Y.M., Yuan, Y.: Super-resolution enhanced medical image diagnosis with sample affinity interaction. IEEE Trans. Med. Imaging 40(5), 1377–1389 (2021)
30. Krishnan, K.S., Krishnan, K.S.: SwiftSRGAN: Rethinking super-resolution for efficient and real-time inference. In: International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA 2021), Virtual Conference, vol. 1, pp. 46–51 (2021)
31. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1251–1258. IEEE, Las Vegas, NV, USA (2016)
32. Blood cell images. https://www.kaggle.com/datasets/paultimothymooney/blood-cells. Accessed 20 Dec. 2022
33. Malaria bounding boxes. https://www.kaggle.com/datasets/kmader/malaria-bounding-boxes. Accessed 20 Dec. 2022
34. Guo, T., Mousavi, H.S., Vu, T.H., Monga, V.: Deep wavelet prediction for image super-resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 104–113. IEEE, Honolulu, HI, USA (2017)
Dr. Sci. Margarita N. Favorskaya is a Professor and Head of the Department of Informatics and Computer Techniques at Reshetnev Siberian State University of Science and Technology, Russian Federation. Professor Favorskaya has been a member of the KES organization since 2010 and has served as an IPC member and Chair of invited sessions at over 30 international conferences. She serves as a Reviewer for several international journals, an Associate Editor for the Intelligent Decision Technologies Journal, the International Journal of Knowledge-Based and Intelligent Engineering Systems and the International Journal of Reasoning-based Intelligent Systems, an Honorary Editor for the International Journal of Knowledge Engineering and Soft Data Paradigms, and a Reviewer, Guest Editor, and Book Editor (Springer). She is the author or co-author of 260 publications and 20 educational manuals in computer science. She has co-authored or co-edited around 30 books and conference proceedings for Springer in the last 10 years. Professor Favorskaya has supervised nine Ph.D. candidates and is presently supervising three Ph.D. students. Her main research interests are digital image and video processing, machine learning, deep learning, remote sensing, pattern recognition, artificial intelligence, and information technologies.
Chapter 2
Making Process Trace Classification More Explainable: Approaches and Experiences in the Medical Field
Stefania Montani, Giorgio Leonardi, and Manuel Striani
Abstract
Medical process traces store the sequence of activities performed on a patient during the implementation of a diagnostic or treatment procedure. Medical process trace classification can be used to verify whether single traces meet some expected criteria, or to make predictions about the future of a running trace, thus supporting resource planning and quality assessment. State-of-the-art process trace classification resorts to deep learning, which is proving powerful in several application domains; however, deep learning classification results are typically not explainable, an issue which is particularly relevant in medicine. In our recent work we are tackling this problem by proposing different approaches. On the one hand, we have defined trace saliency maps, a novel tool able to visually highlight which trace activities are particularly significant for the classification task, resorting to artificial perturbations of the trace at hand that are classified in the same class as the original one. Trace saliency maps can also be paired with the corresponding countermaps, built from artificial trace perturbations classified in a different class, which refine the map information. On the other hand, we are adopting a string alignment strategy to verify which activities are conserved in like-nearest-neighbours (i.e., the nearest traces classified in the same class as the trace under examination). Specific activities, identified in similar trace positions, can in fact justify the classification outcome, which was based on non-explainable latent features extracted by the deep learning technique. In the chapter, we will describe the two approaches and provide some experimental results in the field of stroke patient management.
Keywords Medical process trace classification · Deep learning · Explainability · Stroke
S. Montani (B) · G. Leonardi · M. Striani
DISIT, Computer Science Institute, Università del Piemonte Orientale, Alessandria, Italy
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_2
S. Montani et al.
2.1 Introduction Modern hospital information systems log a huge quantity of patient data, including process traces, i.e., the sequences of diagnostic or therapeutic activities executed on patients in order to cure or prevent diseases. The analysis of medical process traces enables physicians and administrators to better understand the actual processes implemented at their organization, thus identifying issues or bottlenecks, and allowing them to work towards improving the quality of care. In the area of process trace analysis, trace classification [2] is gaining particular attention. This task consists in exploiting the logged activity sequences to classify traces on the basis of some categorical or numerical performance properties, which can later be used to check whether a trace is compliant with some expected criteria (in a quality assessment perspective), or to make predictions about the temporal, human or instrumental resource consumption needed to complete a running trace (in a resource optimization perspective). State-of-the-art approaches to trace classification resort to deep learning techniques [9]. Deep learning architectures operate by creating a hierarchy of increasingly abstract latent features, which are exploited to provide class assignment. Deep learning has proved very powerful for classification in many applications, including medical ones. However, the meaning of latent features and their correlation to the original input data are typically difficult to understand, making the output of the classification tool not explainable. This is a very critical issue and an open problem, particularly relevant in medicine, where classification results might be used by physicians to reach final conclusions about diagnostic or therapeutic choices.
In our previous work [10], we addressed this problem by introducing trace saliency maps, a novel tool able to graphically highlight which trace activities are particularly significant for the classification task. A trace saliency map is built by generating artificial perturbations of the trace at hand that are classified in the same class as the original one, called examples. The idea is that the artificial traces that share the same class as the original one also share elements (i.e., activities and their position in the sequence) that are relevant for the classification process. These elements are highlighted in the trace saliency map. We are now extending this approach by considering counterexamples as well, i.e., artificial perturbations that are classified in a different class with respect to the original trace. Counterexamples can be used to build a countermap, able to refine the corresponding trace saliency map information. Moreover, we are investigating a second direction to improve trace classification explainability. In particular, while in the previously mentioned approach (based on trace saliency maps and countermaps) we resort to a deep learning architecture to complete the classification task, in this case we resort to deep learning just to extract latent features, and to k-Nearest Neighbour (k-NN) techniques to identify the class. We then adopt a string alignment strategy to verify which activities are conserved in the nearest traces classified in the same class as the trace under examination (called like-nearest-neighbours). Specific activities, identified in similar trace positions, can in fact justify the classification outcome. In this chapter we describe both approaches to trace classification explainability, and provide some experimental results in the field of stroke patient management. The chapter is organized as follows. Section 2.2 illustrates the first proposed approach, which exploits the definition of trace saliency maps and countermaps, providing both the methodological and the experimental work details; analogously, Sect. 2.3 presents the second approach, based on string alignment; in Sect. 2.4 we compare our work with related work. Finally, Sect. 2.5 provides our concluding remarks. A table of acronyms (Table 2.1) is provided below as a quick reference.

Table 2.1 Table of acronyms
Acronym   Definition
k-NN      k-nearest neighbour
CNN       Convolutional neural network
AAE       Adversarial auto encoder
CT        Computerized tomography
ECG       ElectroCardioGram
LIME      Local interpretable model-agnostic explanations
LORE      Local rule-based explanations
2.2 Explainable Trace Classification by Trace Saliency Maps and Countermaps In this section, we first illustrate our methodological approach to trace classification explainability by means of trace saliency maps and countermaps; we then move to the presentation of some experimental results.
2.2.1 Methodology Our first approach to explainable trace classification exploits a Convolutional Neural Network (CNN) [1] (see Fig. 2.1; the interested reader is referred to [10] for further technical details). The CNN takes as input a process trace, properly converted into a numeric vector, where each activity is encoded as an integer (activity-code henceforth), and outputs the class. In order to make the result explainable, the trace under examination is also provided as an input to an Adversarial AutoEncoder (AAE) [13] (see Fig. 2.2; the interested reader is referred to [10] for further technical details), which generates a neighbourhood of artificial traces, as perturbations of the original one. Artificial traces are then classified by the same CNN architecture described above, and are thus divided into examples (if classified in the same class as the original trace) and counterexamples (if classified in a different class; our current application deals with binary classification, but the approach could be generalized to more than two classes). Given the examples in the neighbourhood of a real trace $x$ of length $m$, the trace saliency map is built. For each activity-code $ac_k$ in trace $x$, we calculate the average of all the corresponding activity-codes of the examples, obtaining the values $\mu_k$ with $k \in [1, m]$. We then calculate the absolute value of the difference between $ac_k$ and the average value $\mu_k$. The trace saliency map of $x$ is the sequence of these differences ($|ac_k - \mu_k|$ with $k \in [1, m]$). A low difference means that the corresponding activity provides a great contribution to classification. On the other hand, when the difference is high, the activity is unimportant, since classification did not change even if the activity was not conserved. In our interface, we color in red the activities where the greatest differences were calculated, while we color in green the activities with minimal differences, which correspond to the most conserved and important ones. On the other hand, we adopt counterexamples to build the countermap. For each activity-code $ac_k$ in trace $x$, we calculate the average of all the corresponding activity-codes of the counterexamples, obtaining the average values $\nu_k$ with $k \in [1, m]$. We compute the absolute value of the difference between $ac_k$ and the average value $\nu_k$. The countermap of $x$ is the sequence of these differences ($|ac_k - \nu_k|$ with $k \in [1, m]$). This time, activities associated with a high difference are those that provide the greatest contribution to classification: indeed, these activities changed with respect to the original trace, and this change led to a change in the classification output as well. In our interface, they are highlighted in green. Instead, activities associated with a small difference in the countermap are not very important, because classification changed even if they were conserved in the artificial traces: they are highlighted in red. Figure 2.3 shows the visualization of a trace saliency map (left), and of the corresponding countermap (right), as provided by our tool. In both plots, activities associated with a medium difference value are highlighted in yellow.
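The map construction described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `difference_map` and `colour_map`, the toy activity-codes, and in particular the two thresholds `low` and `high` that separate the green, yellow and red bands are hypothetical, since the text does not specify how the colour bands are chosen.

```python
import numpy as np

def difference_map(trace, neighbours):
    """Return |ac_k - mean_k| for every position k of the trace.

    `neighbours` holds the examples (for a trace saliency map) or the
    counterexamples (for a countermap), one artificial trace per row.
    """
    trace = np.asarray(trace, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    return np.abs(trace - neighbours.mean(axis=0))

def colour_map(diffs, low, high, is_countermap=False):
    """Turn differences into colours (thresholds are assumed, not from the source).

    Saliency map: small difference -> green (important), large -> red.
    Countermap:   large difference -> green (important), small -> red.
    """
    colours = []
    for d in diffs:
        if d <= low:
            colours.append("red" if is_countermap else "green")
        elif d >= high:
            colours.append("green" if is_countermap else "red")
        else:
            colours.append("yellow")
    return colours

# Toy data: a trace of four activity-codes and its artificial neighbourhood.
x = [3, 7, 2, 5]
examples = [[3, 7, 4, 5], [3, 7, 6, 5], [3, 6, 2, 5]]   # same class as x
counterexamples = [[3, 1, 2, 5], [3, 3, 2, 4]]          # different class

saliency = difference_map(x, examples)
counter = difference_map(x, counterexamples)
saliency_colours = colour_map(saliency, low=0.5, high=1.5)
counter_colours = colour_map(counter, low=0.5, high=1.5, is_countermap=True)

print(saliency, saliency_colours)
print(counter, counter_colours)
```

In this toy case the second activity is green in both maps: it is nearly conserved in the examples and strongly changed in the counterexamples, which is the situation in which the two maps agree on its importance.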
Fig. 2.1 CNN architecture for trace classification
Countermaps can be exploited to reinforce or better characterize the output of trace saliency maps. Indeed, if an activity is in green both in the trace saliency map and in the countermap, it is certainly important for classification, while if it is in red in both maps, it is certainly not relevant. Therefore, identical colors reinforce the role of a specific activity in determining the classification output. On the other hand, when the colors of an activity do not match in the two maps, the countermap can be used as an instrument to reduce or increase the importance of such an activity in determining the classification output, thus better characterizing the situation and improving the overall explainability.
Fig. 2.2 AAE architecture for artificial trace generation
Fig. 2.3 A trace saliency map (left) and the corresponding countermap (right)
2.2.2 Results We are testing explainable process trace classification on stroke management. Stroke is a severe medical condition, where insufficient blood flow to the brain can lead to cell death and thus to serious adverse events threatening patient health and survival. In this domain, it is necessary to distinguish between simple and complex patients. Complex patients are those affected by co-morbidities, or experiencing a rare stroke type. These patients are supposed to follow more articulated guideline recommendations (including additional tests or specialist consultations), thus originating different traces. Binary classification can thus be adopted to verify the quality and correctness of the provided patient care. If simple and complex patient traces cannot be distinguished, a warning, e.g., about the need for better human or resource scheduling, might be triggered. In our experiments we could resort to a log of 2629 traces, composed of a number of activities ranging from 10 to 25 (16 on average). 905 traces were defined as complex by medical experts, while 1724 were judged as simple. First, we classified all traces resorting to the CNN black box architecture described above, reaching an accuracy of 82% (detailed classification results can be found in [10]). We then generated the trace saliency map and the countermap for each of the 2629 traces, and calculated the error percentage, defined as follows:

$$\mathrm{error} = \frac{\sum_{x \in Log} \frac{|discord_x|}{|total_x|}}{T} \cdot 100 \qquad (2.1)$$
where $|discord_x|$ is the number of activities in trace $x$ that are highlighted in green in the trace saliency map and in red in the corresponding countermap, or vice versa, while $|total_x|$ is the number of activities in trace $x$; $T$ is the number of traces in the log. In our experimental work, the error percentage reached a value of 31.94% on simple patients, and a value of 27.61% on complex ones, demonstrating that the countermap information reinforces the trace saliency map information in over two thirds of the cases. In addition to this quantitative measure, we also conducted a qualitative analysis of the map/countermap pairs, verifying that, in most cases, the findings they highlight are coherent with medical knowledge, and analysing how countermap information can be of help even when it does not reinforce the trace saliency map. In the following, we discuss one case in detail. Figure 2.3 compares a trace saliency map (left), obtained on a complex patient trace, to the corresponding countermap (right), activity by activity. As the figure shows, a set of activities have proved relevant for classification: indeed, they appear in green in the map (since they were conserved in the examples), and they appear in green in the countermap as well (since they were not conserved in the counterexamples, thus determining the change of class). In particular, we can mention intravenous thrombolysis therapy and antiplatelet agents provision. This result is coherent with medical knowledge: as a matter of fact, the patient at hand is particularly complex, and requires a thrombolytic and anti-aggregant therapy, which is uncommon, already in the initial treatment phase (emergency phase). Then, the countermap makes the complexity of the patient's situation even clearer: some activities that appear in yellow in the trace saliency map on the left become green in the countermap. This is the case for coagulation screening, chest x-rays and transthoracic ECG. The countermap therefore allows the user to better understand the high positioning of such activities on the scale of importance: the patient at hand undergoes an atypical set of monitoring actions, which testify to the need to carefully follow the evolution of his/her complex situation. Equally important is the identification of venous sinus thrombosis, which passes from yellow (in the trace saliency map) to green (in the countermap), and indeed represents a particularly serious condition that must be taken into great consideration. In the opposite direction, we find ECG performed in emergency and rehabilitation therapy, in yellow in the trace saliency map, but in red (i.e., not important) in the countermap. This is coherent with medical knowledge as well: in fact, although it is not certain that all patients receive the ECG on arrival or start rehabilitation quickly, these are two desiderata for all patients, and cannot be used to classify the patient at hand as a complex one. The countermap information can thus further characterize the patient's specific situation, providing an even more explainable classification output to the end user.
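Returning to the error percentage of Eq. (2.1), the following Python sketch shows one way it could be computed once every trace has been assigned its two colour sequences. The function name `error_percentage`, the data layout (a list of colour-sequence pairs, one pair per trace) and the toy log are assumptions made for illustration; they do not come from the original tool.

```python
def error_percentage(colour_pairs):
    """Average share of discordant activities per trace, as a percentage.

    `colour_pairs` contains one (saliency_colours, countermap_colours) pair
    per trace; both sequences have one colour per activity of that trace.
    An activity is discordant when it is green in one map and red in the other.
    """
    ratios = []
    for saliency_colours, countermap_colours in colour_pairs:
        discord = sum(
            1
            for s, c in zip(saliency_colours, countermap_colours)
            if {s, c} == {"green", "red"}
        )
        ratios.append(discord / len(saliency_colours))
    return 100.0 * sum(ratios) / len(ratios)

# Toy log with two traces (hypothetical colours).
toy_log = [
    (["green", "green", "red", "yellow"], ["green", "red", "red", "green"]),
    (["green", "red"], ["green", "red"]),
]
error = error_percentage(toy_log)
print(error)
```

Note that, under this reading of the definition, a yellow activity never counts as discordant, whatever its colour in the other map.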
2.3 Explainable Trace Classification by String Alignment In this section, we first illustrate our methodological approach to trace classification explainability by means of string alignment; we then present some experimental results.
2.3.1 Methodology As a second approach to explainable process trace classification, we have adopted a deep learning architecture to describe traces in terms of their latent features. Feature extraction is typically used in time series analysis in order to reduce dimensionality, and to provide an input to classical machine learning algorithms, such as k-Nearest Neighbour (k-NN) classification. When the input data are process traces, hand-crafted feature engineering may be adopted to extract features. However, this method is time consuming, and not always possible. Therefore, we propose to resort to latent feature extraction to summarize and characterize the trace information. We then use the latent features as an input to a k-NN classifier using the Euclidean distance as a (dis)similarity measure, as provided by the open source tool Weka [8]. The architecture we propose for latent feature extraction is obtained from the one we already described, and shown in Fig. 2.1, by simply removing the classification layer. Notably, different architectures may be used as well, and will be tested in our future work. In order to make the classification results explainable, we then exploit a string alignment strategy to verify which activities are conserved in the nearest traces classified in the same class as the trace under examination (like-NNs henceforth). Specifically, given a trace x, we first classify it through k-NN classification. We then identify the like-NNs among the K retrieved traces: since k-NN operates by majority voting among the K retrieved cases, the number p of like-NNs might be lower than K. On each one of the p like-NNs we perform string alignment to x, by exploiting the Levenshtein distance [11]. Alignments with a displacement of a single position are considered correct. Frequently aligned activities, identified in similar trace positions, can justify the classification outcome.
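The pipeline described above (k-NN on latent features, selection of the like-NNs, Levenshtein alignment) can be sketched as follows. This is a minimal illustration under explicit assumptions: the k-NN step is a plain NumPy stand-in rather than the Weka classifier used by the authors, the latent vectors are made-up two-dimensional points, all function names are hypothetical, and "displacement of a single position" is interpreted here as a matched pair whose indices in the two traces differ by at most one.

```python
from collections import Counter
import numpy as np

def knn_like_neighbours(query, features, labels, k):
    """Majority-vote k-NN with Euclidean distance; also return the like-NN indices."""
    diffs = np.asarray(features, dtype=float) - np.asarray(query, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    predicted = Counter(labels[i] for i in nearest).most_common(1)[0][0]
    like = [int(i) for i in nearest if labels[i] == predicted]
    return predicted, like

def levenshtein_alignment(a, b):
    """Return the Levenshtein distance and one optimal list of edit operations."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Backtrace from the bottom-right corner, preferring diagonal moves.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            kind = "match" if a[i - 1] == b[j - 1] else "substitution"
            ops.append((kind, i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("deletion", i - 1, None))
            i -= 1
        else:
            ops.append(("insertion", None, j - 1))
            j -= 1
    ops.reverse()
    return d[n][m], ops

def conserved_activities(trace, neighbour, max_shift=1):
    """Activities matched by the alignment with a positional shift of at most max_shift."""
    _, ops = levenshtein_alignment(trace, neighbour)
    return [trace[i] for kind, i, j in ops if kind == "match" and abs(i - j) <= max_shift]

# Toy latent features (hypothetical) and their class labels.
features = [[0, 0], [0, 1], [5, 5], [6, 5], [0.5, 0.5]]
labels = ["simple", "simple", "complex", "complex", "complex"]
predicted, like = knn_like_neighbours([0.2, 0.2], features, labels, k=3)

# Toy traces: the neighbour lacks the first activity of the input trace.
x = ["CT", "blood test", "ECG test", "neurological consultation"]
nn = ["blood test", "ECG test", "neurological consultation"]
distance, ops = levenshtein_alignment(x, nn)
conserved = conserved_activities(x, nn)
print(predicted, like, distance, conserved)
```

In the toy k-NN call only two of the three retrieved neighbours share the predicted class, which illustrates why the number p of like-NNs can be lower than K. Counting how often each activity appears in `conserved` over all like-NNs of all traces of a class would give a frequency ranking of aligned activities.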
2.3.2 Results In the stroke domain, some activities are prescribed by the clinical guideline, and must be executed on all patients, both simple and complex. These activities will therefore be extremely frequent in the alignments. We are more interested in slightly less frequent, but still relevant, ones, which, according to medical knowledge, are indicative of a specific class. Table 2.2 shows the most frequently aligned activities in the overall class of simple patients (left column) and in the class of complex ones (right column). As can be observed, the first three activities are in common. This is not surprising, since, as commented above, a set of activities are prescribed by the guideline in the management of all patient types. The top three activities in the table (blood test, computerized tomography (CT), and electrocardiogram (ECG) test) in fact belong to this set. On the other hand, the next two activities in the two columns are different: in simple patients, we find coagulation screening and antiaggregant therapy administration, which indeed represent the steps of the default therapy management in simpler situations. Such an approach, on the other hand, cannot be adopted in complex patients, where additional specialist evaluations must be completed before the therapy is administered, in order to correctly identify the presence and severity of comorbidities and complications that could impact the therapy choice. Indeed, in complex patients a neurological and a diabetological consultation are identified as the most common activities. Such results are thus fully coherent with medical knowledge, and provide a clear justification of the classification output, by highlighting the specificities of the two classes in terms of the most frequently executed activities.

Table 2.2 Top frequently aligned activities
Simple patients class     Complex patients class
Blood test                Blood test
CT                        CT
ECG test                  ECG test
Coagulation screening     Neurological consultation
Antiaggregants            Diabetological consultation

Besides the global alignment results presented above, it is worth noting that our interface allows the user to visualize the alignment results of a specific trace, highlighting in different colors the conserved activities with respect to one of the like-NN traces. Figure 2.4 provides an example: the correctly aligned activities are in green, the activities displaced by just one position in the trace are in orange, deletions are in red and substitutions are in blue. As can be observed, the medical expert can easily check the coherence of the provided classification result on the basis of domain knowledge. In the specific case, the input trace (on the left) is classified as complex: indeed, besides some activities that should be executed on all patients (CT and blood test, which are in green as they are correctly aligned with the like-NN trace on the right, and ECG test, in orange as it is displaced by just one position), it contains two specialist consultations (with the diabetologist and with the cardiologist) and a specific diagnostic test (supra-trunk doppler), all highlighted in green by the interface, which are typical of a problematic patient. In the future, we plan to conduct a more extensive evaluation, by considering the alignment of patterns of activities (instead of single activities), and by differently penalizing the possible different displacements (higher than one position) of otherwise identical patterns.
2.4 Related Work Post-hoc explainability consists in providing an explanation of the output of an Artificial Intelligence black box model using some after-the-fact rationale for the output itself [12]; post-hoc explainability is part of the wider research area known as eXplainable Artificial Intelligence. A simple but effective solution to post-hoc explainability consists in providing outcome explanation [7] by returning the “importance” of the features used by the black box model. Saliency maps [7] can be exploited to this end. Saliency maps can provide an explanation in image classification, but have also been proposed for time series and text data. They provide a visual interpretation of feature importance, by highlighting in a graphical way, typically by using different colors, the input elements that have been judged as key ones by the classification tool. An extended literature review about saliency maps in deep learning is provided in our previous work [10]. Here, we just mention the Local Interpretable Model-agnostic Explanations (LIME) tool [15] and the LOcal Rule-based Explanations (LORE) tool [6]. LIME is a model-independent tool that derives explanations from local perturbations of
Fig. 2.4 Activity alignment between a complex patient trace and its like-NN
the input instance under examination. LIME uses linear models as comprehensible models, returning the importance of the features as explanation. LORE relies on local perturbations of the instance to be explained as well, but constructs a decision rule explaining the reasons for the decision, and a set of counterfactual rules which highlight what feature changes would lead to a different classification result. The contribution in [5], which is applied to image data, uses LORE, and resorts to an Adversarial Autoencoder to generate the synthetic neighbourhood. A significant number of works have been published in the area of process trace classification/prediction resorting to deep learning techniques, such as recurrent networks (see, e.g., [4, 18]), or convolutional networks (see, e.g., [3]). In their experiments, the authors in [3] obtained better results in predicting the next activity than with recurrent architectures, which motivates our adoption of convolutional networks in the field of process trace classification. However, it is worth noting that we may
easily switch to a different black box classifier, and then rely on trace saliency maps and countermaps for explanation, since our map construction procedure does not depend on the classifier itself. Despite the large number of deep learning approaches to trace classification, there are only very few, recent approaches to eXplainable Artificial Intelligence in the process mining domain. One is reported in [17]: here, the authors use LIME [15] to generate explanations for predictions. On the other hand, the work in [14] exploits decision trees as surrogate models to locally explain the output of a deep learner, in the field of the identification of deviations from the default business process. Moreover, even if process mining is gaining more and more attention in healthcare applications (see, e.g., the survey in [16]), to the best of our knowledge the specific issues we are dealing with in our work have not been addressed yet.
2.5 Conclusions Our recent research is addressing the problem of explainability in medical process trace classification. In this work, we have illustrated two strategies towards deep learning classification explainability, which we tested in the domain of stroke. The first strategy has been applied to a scenario where deep learning is in charge of completing the classification task. It combines the use of trace saliency maps (which we introduced in [10]) and countermaps, to visually highlight the most important activities and activity positions which can justify the output of the classification task. Countermaps, in particular, have proved useful to reinforce and refine the information carried by trace saliency maps. The second strategy has been applied to a situation where deep learning is used to extract process trace features, thus reducing input dimensionality; such features are then used for k-NN classification. In this case, we have adopted a string alignment approach to identify which activities are frequently conserved in like-NNs, thus providing a post-hoc explanation of the classification output. Tests in the field of stroke management have given very positive results, as discussed with our medical collaborators. In the future, we wish to carry out further experiments resorting to different deep learning classifiers, in both scenarios. Moreover, we plan to conduct further experiments in the hospital setting, collecting feedback on the utility of the tools and also on the usability of their interfaces.
References
1. Alom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.S., Hasan, M., Van Essen, B.C., Awwal, A.A.S., Asari, V.K.: A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3)
2. Breuker, D., Matzner, M., Delfmann, P., Becker, J.: Comprehensible predictive models for business processes. MIS Q. 40, 1009–1034 (2016)
3. Di Mauro, N., Appice, A., Basile, T.M.A.: Activity prediction of business process instances with inception CNN models. In: Alviano, M., Greco, G., Scarcello, F. (eds.) AI*IA 2019 – Advances in Artificial Intelligence – XVIIIth International Conference of the Italian Association for Artificial Intelligence, Rende, Italy, Proceedings, volume 11946 of Lecture Notes in Computer Science, pp. 348–361. Springer (2019)
4. Evermann, J., Rehse, J.R., Fettke, P.: Predicting process behaviour using deep learning. Decis. Support Syst. 100, 129–140 (2017)
5. Guidotti, R., Monreale, A., Matwin, S., Pedreschi, D.: Black box explanation by learning image exemplars in the latent feature space. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A.J., Maathuis, M.H., Robardet, C. (eds.) Machine Learning and Knowledge Discovery in Databases – European Conference, ECML PKDD 2019, Würzburg, Germany, September 16–20, 2019, Proceedings, Part I, volume 11906 of Lecture Notes in Computer Science, pp. 189–205. Springer (2019)
6. Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., Giannotti, F.: Local rule-based explanations of black box decision systems (2018). arxiv:1805.10820
7. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2019)
8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
9. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521(7553), 436–444 (2015)
10. Leonardi, G., Montani, S., Striani, M.: Explainable process trace classification: an application to stroke. J. Biomed. Inform. 126, 103981 (2022)
11. Levenshtein, A.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Phys. Doklady 10, 707–710 (1966)
12. Lipton, Z.C.: The mythos of model interpretability. Queue 16(3), 30 (2018)
13. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders (2016)
14. Mehdiyev, N., Fettke, P.: Explainable artificial intelligence for process mining: a general overview and application of a novel local explanation approach for predictive process monitoring (2020). arxiv:2009.02098
15. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier. In: Krishnapuram, B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R. (eds.) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, pp. 1135–1144. ACM (2016)
16. Rojas, E., Munoz-Gama, J., Sepulveda, M., Capurro, D.: Process mining in healthcare: a literature review. J. Biomed. Inform. 61, 224–236 (2016)
17. Sindhgatta, R., Ouyang, C., Moreira, C., Liao, Y.: Interpreting predictive process monitoring benchmarks (2019). arxiv:1912.10558
18. Tax, N., Teinemaa, I., van Zelst, S.J.: An interdisciplinary comparison of sequence modeling methods for next-element prediction (2018). arxiv:1811.00062
Stefania Montani received her Ph.D. in Bioengineering and Medical Informatics from the University of Pavia, Italy, in 2001. She is currently a Full Professor in Computer Science at the University of Piemonte Orientale, in Alessandria, Italy, and the vice-dean of the Department of Science and Technological Innovation (DISIT). Her main research interests focus on case-based reasoning, decision support systems, business process management, machine learning and deep learning, with a particular interest in medical applications. She has authored more than 200 papers in international journals and international refereed conferences in Artificial Intelligence and Medical Informatics. She is a member of the Editorial Board of 6 international journals, Associate Editor of another international journal, and member of several technical program committees of international conferences, including different editions of the International Joint Conference on Artificial Intelligence (IJCAI) and of the European Conference on Artificial Intelligence (ECAI). She has organized workshops co-located with international conferences, special issues in international journals and edited books, related to her main research topics.
Chapter 3
Improving Diagnostics of Pneumonia by Combining Individual Hypotheses on Chest X-Ray Images Anca Ignat and Robert-Adrian Găină
Abstract According to the World Health Organization, pneumonia is the cause of death for 14% of children under 5 years old. One of the main tests a physician requires for diagnosing pneumonia is a radiography of the lungs. In this work, we study the problem of automatic classification of pneumonia using chest X-ray images. There are many deep-learning techniques that can achieve this goal. We can interpret these classification results as annotations made by a specialist. Several human physicians can annotate the same chest X-ray image, and the same thing happens when the annotation is performed by machine learning algorithms. We combine these automatically generated classification results in order to improve the accuracy of the labelling process. Different well-known Convolutional Neural Networks (CNN), Support Vector Machines (SVM), Random Forest (RF) and k-Nearest Neighbour (kNN) are the base classifiers that are combined. We combine these classifiers using voting procedures, probabilistic techniques and a machine-learning approach. The numerical tests were performed using the RSNA (Radiological Society of North America) dataset, obtaining results that are better than those obtained with the individual classifiers.

Keywords Pneumonia classification · Convolutional neural networks · Support vector machines · Random forest · k-nearest neighbour · Ensemble learning
A. Ignat (B) · R.-A. Găină
Faculty of Computer Science, University "Alexandru Ioan Cuza" of Iași, str. Berthelot 16, Iași, Romania
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_3

3.1 Introduction Healthcare data is dominated by medical images such as CT, MRI, and X-ray, which provide non-invasive insight into the internal anatomy of patients. These images are used to enhance medical treatment. One of the most common radiological examinations used in the diagnosis of lung diseases or respiratory symptoms is the chest X-ray (CXR). This method of examination is popular due to its affordability, low radiation exposure, and the ease with which pathologies can be identified. Chest X-rays can be of three types, depending on the position of the patient relative to the X-ray source: posteroanterior (PA), anteroposterior (AP) and lateral. The PA projection is the standard frontal chest projection where the X-ray beam traverses the patient from posterior to anterior. This type of imaging is used in order to examine the lungs, the thoracic cavity and the great vessels. For the AP projection, the beam goes from the front to the back of the patient; it is mostly used when a patient is unable to sit up straight for the PA projection. A common disadvantage of this type of examination is that the heart and the frontal chest section are further from the detector, thus making those regions less visible. The lateral projection is acquired in combination with a PA image for better 3D localization of the suspected chest pathology and for displaying the retrosternal and retrocardiac spaces, which are harder to expose with the other methods described. It projects the patient from one lateral side to the other. Because CXRs are in high demand when dealing with pulmonary diseases, multiple datasets have been gathered over the years. These are labelled either manually by radiologists or automatically. Among these, we mention:
• ChestX-ray14 (C), which consists of 112,120 automatically labelled CXRs from 30,805 patients collected at the (US) National Institute of Health, involving 14 types of abnormality.
• CheXpert (X), consisting of 224,316 automatically labelled CXRs from 65,240 patients collected at the Stanford Hospital, indicating the presence, absence or undecided result for 12 abnormalities.
• Ped-Pneumonia (PP), a dataset containing 5,856 pediatric CXRs collected from Guangzhou Women and Children Medical Center, China. The labels include pneumonia detection and normal images along with the type of the disease, either bacterial or viral.
Most of the datasets used suffer from a common problem: class imbalance. This means that the images are not evenly distributed over all the possible labels, with some categories having more samples than others. This can lower the accuracy of a model on the under-represented classes (for example, the Kermany DB includes 5,856 images, of which 4,273 show pneumonia and the rest show no anomaly). To overcome this drawback, the smallest classes are oversampled while the largest classes are undersampled.

Pneumonia is a prevalent lung disease caused by viruses, bacteria, or fungi. It results in the inflammation of the lungs and the accumulation of fluids, leading to coughing and difficulty in breathing. Patients with pneumonia often have reduced gas exchange between carbon dioxide and oxygen at the alveolar membrane level. The treatment of pneumonia depends on the pathogen responsible for the infection, with antibiotics used for bacterial pneumonia, antivirals for viral pneumonia, and antifungal drugs for fungal pneumonia. Although anyone can develop this disease,
3 Improving Diagnostics of Pneumonia …
it is more common in children and in people over 65, groups in which the chances of survival are reduced. Radiologists use CXRs to identify patients with pneumonia by looking for white hazy areas in the lungs that are absent in healthy individuals. However, due to the high demand for these examinations and the shortage of qualified experts, there is a direct correlation between the incidence of pneumonia and the availability of medical infrastructure in a given region. Consequently, recent studies have focused on developing classification models that can recognize pneumonia based on CXRs.

The main contributions of this work are:
1. Our study uses the RSNA dataset [1] for pneumonia classification, a dataset that is usually employed for semantic segmentation of the lungs and seldom for pneumonia classification. We provide individual classification results with well-known CNN architectures along with improved ensemble learning results.
2. The present work is one of the very few studies that use ensemble learning for pneumonia classification on the RSNA dataset.
3. We analyse four different methods for combining individual hypotheses: voting, probabilistic, dynamical selection and machine learning algorithms.

The present work is organized into seven sections. The second section is dedicated to the research carried out on the classification of CXR images. The third section presents the dataset that was employed in the experiments. Section 3.4 is dedicated to the CNNs employed in our computations. The individual classifiers and the methods employed for combining them are described in Sect. 3.5. The results are presented in Sect. 3.6. The last section is dedicated to conclusions and future research directions.
3.2 Related Work

Deep learning models have shown promising results in CXR classification for detecting pneumonia. In recent years, numerous studies have explored the use of various deep learning architectures to classify CXRs for the detection of pneumonia. Some of the popular deep learning architectures used in CXR classification include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and residual neural networks (ResNets). These models have demonstrated remarkable success in accurately classifying CXRs for pneumonia detection.

The research in [2] provides a comprehensive review of the state-of-the-art methods for pneumonia identification from chest X-rays (CXRs). The authors conducted a systematic review of the literature published between 2018 and 2020, focusing on deep learning-based methods for pneumonia identification. The review identified several deep learning models, including convolutional neural networks (CNNs) and their variants, that achieved high accuracy in identifying pneumonia from CXRs. The authors also analyzed the dataset sizes, augmentation techniques, and pre-processing methods used in the reviewed studies. Overall, the paper provides a valuable summary
of the current state-of-the-art methods for pneumonia identification from CXRs, and highlights the key research challenges and opportunities in this area. Another valuable survey is provided in [3]. The authors present in this paper a comprehensive survey of deep learning methods for chest X-ray (CXR) analysis. The state-of-the-art techniques for various tasks related to CXR analysis, including image classification, object detection, segmentation, and disease diagnosis are presented in this paper. The survey covers various deep learning architectures and techniques, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), attention mechanisms, transfer learning, and ensembling. The paper concludes with a discussion of the future directions and opportunities for research in this area, including the development of more robust and interpretable deep learning models, the integration of clinical data with CXR analysis, and the exploration of multimodal approaches combining CXRs with other medical imaging modalities. In [4], the authors specifically focus on the use of deep learning for the automated detection of pneumonia cases from chest X-ray (CXR) images, with a particular emphasis on COVID-19-related pneumonia cases. The authors discuss the fact that CXR is one of the primary diagnostic tools for COVID-19 pneumonia, and that the demand for automated diagnosis and triage of CXR images has increased due to the COVID-19 pandemic. They also note that the presence of COVID-19 pneumonia can lead to a more severe course of illness and higher mortality rates compared to other types of pneumonia. The paper [5] provides a systematic review and meta-analysis of the accuracy of deep learning models for the automated detection of pneumonia using chest X-ray (CXR) images. The authors reviewed a total of 15 studies that evaluated deep learning models for pneumonia detection using CXR images. 
They found that the reported diagnostic accuracy of these models varied widely depending on the methods and databases used. The meta-analysis also revealed that models based on transfer learning from pre-trained models achieve higher accuracy than those trained from scratch.

In [6], the authors present a deep learning approach for the automated diagnosis of pneumonia from chest X-ray (CXR) images. They compared VGG16 and Xception on the task of classifying CXR images into two classes: normal and pneumonia. The authors evaluated the performance of their model using the publicly available RSNA Pneumonia Detection Challenge dataset and achieved an overall accuracy of 91% for the binary classification.

In [7], a deep learning approach for the automated detection of pneumonia from chest X-ray (CXR) images is presented. The authors used a convolutional neural network architecture with and without dropout for feature extraction and classification of CXR images into two classes: normal and pneumonia. The dataset used in the study included a subsample from the RSNA database (5863 CXR images) with labels for normal and pneumonia cases. The study highlights the fact that the use of image augmentation and layer dropout returns the best results (90.68% accuracy) for the CNN model tested in comparison with other variations.

The authors of [8] propose a method for detecting pneumonia in chest X-ray (CXR) images using an ensemble of deep learning models. They suggest that combining multiple models can help improve the accuracy of pneumonia detection. The authors trained three different deep learning models (GoogLeNet, ResNet-18 and DenseNet-121) and applied a weighted average ensemble technique to obtain the final result. The performance of the proposed ensemble approach was evaluated on the RSNA Pneumonia Detection Challenge dataset and the Kermany dataset, demonstrating the effectiveness of their approach. They also compared their method with several state-of-the-art deep learning models, and their ensemble approach outperformed these models in terms of accuracy (86.85% for the RSNA dataset).

In [9] the authors propose a neural network-based approach for pneumonia detection in chest X-ray images. The proposed method uses a CNN and transfer learning for models like Xception, InceptionV3/V4 and EfficientNetB7. Some statistical analysis is performed over the entries from the RSNA dataset; the data distribution is shown based on various metadata classes, such as age and sex, and different correlations are extracted.

In [10], the authors present a recent study that explores the use of radiomic features and contrastive learning for detecting pneumonia in chest X-ray images. Radiomic features are quantitative features extracted from medical images that can be used to characterize the disease. Contrastive learning is a deep learning technique that can be used to learn representations from data by contrasting similar and dissimilar pairs of images. The results showed that the combination of radiomic features and contrastive learning achieved an accuracy of 88.6% on the RSNA dataset for the classification task.

A novel deep learning architecture called Detail-Oriented Capsule Networks (DECAPS) is introduced in [11]. This new architecture aims to improve the performance of image classification tasks by capturing both local and global features of the input images.
DECAPS are built on top of capsule networks, a type of neural network that uses capsules instead of neurons as the basic building blocks. Capsules are groups of neurons that represent different properties of an object, such as its pose, deformation, and texture. By using capsules, capsule networks can better handle the spatial relationships between objects in an image, leading to improved performance in object recognition tasks. In the context of the RSNA Pneumonia Detection Challenge, the authors applied their DECAPS architecture to classify chest X-ray images into pneumonia or normal categories. They used a modified version of the Inceptionv3 network as the backbone architecture of DECAPS, which extracts feature maps of different resolutions and scales. Then, they added a set of DECAPS modules on top of the backbone network to capture the detailed information in the feature maps. The authors evaluated their approach and achieved state-of-the-art performance with an accuracy of 94.02%.
3.3 Dataset

Chest X-ray (CXR) classification today is a vital task for detecting and diagnosing multiple thoracic diseases, including pneumonia. The RSNA (Radiological Society of North America) Pneumonia Detection Challenge dataset [1] used in our experiments is one of the most used datasets for evaluating the performance of deep learning models for pneumonia classification, since it is centred around this pathology. Many other CXR datasets are composed of multiple disease categories, leading to imbalanced class distributions and making it challenging to evaluate the accuracy of a model for pneumonia classification.
3.3.1 RSNA Pneumonia Detection Challenge

The RSNA Pneumonia Detection Challenge is a competition organized by the Radiological Society of North America (RSNA) in partnership with Kaggle, a data science community and platform. The challenge was launched in 2018 with the aim of developing a model that could automatically detect the presence of pneumonia using one or multiple bounding boxes and classify a set of chest X-ray images. The challenge dataset was provided by the RSNA, which consisted of over 30,000 chest X-ray images with accompanying labels indicating the presence or absence of pneumonia. The images were collected from various hospitals and clinics around the world and were labelled by a panel of experienced radiologists.

Participants in the challenge were required to develop a deep learning model that could accurately identify and classify pneumonia from the provided chest X-ray images. The performance of the models was evaluated based on their ability to accurately predict the presence or absence of pneumonia in a separate set of test images that were not used during model training. The challenge was open to participants from around the world, including data scientists, machine learning experts, and healthcare professionals. The top-performing models were awarded cash prizes, with the best model receiving a prize of $30,000.

The RSNA Pneumonia Detection Challenge was a significant event in the field of medical image analysis and machine learning, as it provided a platform for researchers and practitioners to develop and showcase their skills and expertise in using deep learning to solve real-world problems in healthcare. The results of the challenge demonstrated the potential of deep learning models to accurately detect and classify pneumonia from chest X-ray images. The top-performing models achieved high levels of accuracy, indicating that deep learning algorithms could be a valuable tool for improving the efficiency and accuracy of pneumonia diagnosis, especially in resource-constrained settings.
3.3.2 Pre-processing of the RSNA Images and Data Augmentation

The images from RSNA (see Fig. 3.1) are part of a wider chest X-ray dataset made publicly available by the National Institutes of Health, and have more labels than we used in this paper. The RSNA selected images have three annotations: “normal”, “with opacity” (pneumonia-affected lungs) and “no opacity, not normal”. We used the dataset employed in the first stage of the competition because it has a training set and a test set, which are both annotated. The training set was annotated by radiologists who are not specialists in thoracic examinations. The test images were labelled by specialists in thoracic radiology. The images are in DICOM format. In the training set, there are 5659 images of lungs with opacity, 11500 images labelled “no opacity, not normal” and 8525 images of normal lungs. The test set contains 353 images of lungs with opacity, 321 images labelled “no opacity, not normal” and 326 images of normal lungs.

Fig. 3.1 Examples of images from RSNA dataset: a normal lungs, b lungs with no opacity but not normal, c lungs with pneumonia

In our computations, we merged the images labelled “normal” and “no opacity, not normal” into one class; the second class was formed from the images labelled “with opacity”. Thus, we obtained an unbalanced dataset (5659 images in one class and 20025 images in the other). To address this problem, we augmented the opacity class by upscaling-downscaling and downscaling-upscaling the images (with different scale factors) using bicubic interpolation, thus obtaining 16977 images. From each image we select the central part, keeping 75% of the original image and discarding the border, which does not contain useful information for the classification process. We then apply CLAHE (contrast limited adaptive histogram equalization) with a small value for the contrast-related parameter. More details about pre-processing techniques and their effect on the accuracy results can be found in [12].
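The cropping and balancing arithmetic above can be illustrated with a minimal Python sketch. It is not the pipeline used in the chapter (the computations were done in MATLAB): the function names are hypothetical, the 75% is assumed to apply to each side of the image rather than to its area, and the factor of three assumes that every opacity image contributes itself plus two rescaled copies, which is consistent with 16977 = 3 × 5659 but is not stated explicitly in the text.

```python
def center_crop_bounds(height, width, keep=0.75):
    """Return (top, bottom, left, right) of a centred window.

    Assumption: `keep` is the fraction of each side that is retained.
    """
    new_h = round(height * keep)
    new_w = round(width * keep)
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    return top, top + new_h, left, left + new_w


def center_crop(image, keep=0.75):
    """Crop a 2-D image stored as a list of rows (pure Python, no libraries)."""
    top, bottom, left, right = center_crop_bounds(len(image), len(image[0]), keep)
    return [row[left:right] for row in image[top:bottom]]


# Class sizes reported in the text for the merged two-class training set.
n_opacity, n_no_opacity = 5659, 20025

# Assumption: original image + upscale-downscale copy + downscale-upscale copy.
n_opacity_augmented = 3 * n_opacity
n_training_total = n_opacity_augmented + n_no_opacity

# Toy 8 x 8 "image" whose pixel value encodes its position.
toy_image = [[r * 8 + c for c in range(8)] for r in range(8)]
toy_cropped = center_crop(toy_image)
```

With these assumptions the augmented opacity class has 16977 images and the training total is 37002, matching the figures quoted in Sect. 3.6.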
3.4 Networks

For feature extraction and classification, we used well-known deep learning networks. These networks are presented below.
3.4.1 Residual Networks

Residual Networks [13] (ResNet) is a variant of the HighwayNet, one of the first deep feedforward neural networks with an increased number of layers. In comparison with that initial model, in which some layers could be closed for learning in order to skip them, ResNet allows for n-layer skips based on the activation from the current layer. This helps solve two common problems of deep neural network architectures: vanishing gradients and degradation.

The vanishing gradient problem occurs when training neural networks with gradient-based methods and backpropagation. With these methods, each weight of the model receives an update proportional to the partial derivative of the error function with respect to that weight. When the number of layers and weights is increased, the proportion of each is lowered, thus decreasing the gradient as we propagate back to the initial layers. Therefore, in some cases, the networks fail to learn after a number of iterations.

Degradation is a less intuitive drawback of big neural networks. It has been shown that increasing the network depth might saturate the accuracy, but it might also leave room for propagating an error through the network. For instance, if we have a function that can be learned efficiently with 9 layers and we then extend the network to 20 layers whilst keeping the initial 9 layers as the first ones, the remaining layers will have to learn the identity function f(x) = x. It would be more efficient to use an 11-layer skip in order to go directly from the 9th layer to the output without training the last layers.
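The role of the skip connection in the identity example can be made concrete with a small sketch. It assumes the commonly used residual form y = ReLU(F(x) + x) with a two-layer F, written in plain Python on small vectors; the helper names are illustrative and this is not the ResNet18/ResNet50 implementation used in the experiments.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v, w1, w2):
    """Assumed residual form: y = relu(F(v) + v), F(v) = W2 * relu(W1 * v)."""
    f = linear(w2, relu(linear(w1, v)))
    return relu([fi + xi for fi, xi in zip(f, v)])


# With all weights at zero, F(v) = 0 and the block reduces to the identity
# (for non-negative inputs), so extra blocks need not "learn" f(x) = x.
zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.5, 0.25]
identity_out = residual_block(x, zeros, zeros)
```

The design point is that the identity mapping corresponds to the trivial solution F = 0, which is much easier to reach by gradient descent than fitting the identity with a stack of nonlinear layers.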
3.4.2 Visual Geometry Group

Visual Geometry Group [14] (VGG) is a standard deep convolutional neural network (CNN) architecture with 16 or 19 layers and is one of the best-performing models for the object recognition problem in image processing. VGG is a variation of AlexNet that obtained better results on ILSVRC and has the following structure:
• The input consists of a 224 × 224 pixel image.
• The convolutional layers of the VGG use a 3 × 3 receptive field, the smallest size able to capture all 2D directions.
• The hidden layers use ReLU as the activation function and do not use Local Response Normalization, as it is believed to increase both the memory consumption and the training time without improving the model accuracy.
• There are three fully connected layers, the first two having 4096 channels each and the last one having 1000 channels (one for each class expected for the ImageNet competition).
3.4.3 Inception

An Inception network [15] is a deep neural network that is based upon repeating components referred to as Inception modules. The main idea behind this architecture is that CNNs learn best by extracting features at different levels. Therefore, the same input can be analyzed at different scales in order to extract different features depending on the focus zone. To apply this principle, each module is composed of 1 × 1, 3 × 3 and 5 × 5 convolutions along with a pooling path, and multiple modules are stacked on top of each other, with the possibility of returning a result after every few modules. Since modules are rather sparse, an Inception network has high efficiency and a high performance rate in comparison with other heavy architectures that scale poorly with the problem dimension. The different convolution sizes allow for better feature extraction, making it easier to differentiate samples that might be similar from only one perspective.
3.4.4 Densenet

When using standard CNNs, the input goes through multiple convolution levels in order to obtain high-level features. As presented in the Residual Networks section, skipping layers can be used in order to improve the accuracy of the model. Densenets [16] combine both features, such that a layer receives inputs from all the previous layers while passing its results to all of the next ones. Because the number of weights grows quickly as the number of layers increases, a pooling operation should be applied every few blocks in order to reduce the sample size for the following blocks. This avoids the vanishing gradient problem and allows for better error corrections in each block of layers. The main advantages of using this type of architecture are the scalability of the system with size, the sample diversity, and better decisions over small training data, since features can be reused by multiple blocks.
3.4.5 Mobilenets

MobileNet [17] is a CNN architecture that is much smaller and much faster than the others previously presented; it is based on the principle of depthwise separable convolution. The name comes from the portability of the model, which can be implemented on mobile and embedded devices due to the reduced resources required. The main difference between regular convolution and depthwise convolution is that in the former each filter is applied jointly over the input channels, while in the latter each channel is processed separately. Therefore, an RGB image can be split into its 3 main channels, operations can be applied over each of them, and the output is obtained by combining the channel results. Because of this separation, the number of weights and the training time required are reduced significantly. However, there is a trade-off between efficiency and accuracy, so models trained with MobileNet tend to perform worse than heavier models.
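The saving in weights can be checked with a short calculation. The sketch below counts the weights (ignoring biases) of a standard k × k convolution and of a depthwise separable one, understood here as a per-channel k × k filter followed by a 1 × 1 pointwise convolution that combines the channel results. The function names and the example layer sizes are illustrative assumptions, not values from the chapter.

```python
def standard_conv_weights(k, c_in, c_out):
    """Each of the c_out filters spans all c_in channels with a k x k kernel."""
    return k * k * c_in * c_out


def depthwise_separable_weights(k, c_in, c_out):
    """Depthwise step plus pointwise (1 x 1) step."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernels, 32 input channels, 64 output channels.
standard = standard_conv_weights(3, 32, 64)
separable = depthwise_separable_weights(3, 32, 64)
reduction = standard / separable
```

For this hypothetical layer the standard convolution needs 18432 weights and the separable one 2336, roughly an eightfold reduction.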
3.4.6 Shufflenet

ShuffleNet [18] is a deep neural network architecture designed for efficient image classification tasks on mobile and embedded devices. It was introduced as a response to the need for lighter and faster networks that could run on these devices with limited computational resources. The main idea behind ShuffleNet is to improve inter-layer communication and reduce computation through channel shuffling, a technique that allows for exchanging information between different feature maps and channel groups. This helps to better propagate the information throughout the network and capture more complex features. ShuffleNet is also designed to be computationally efficient and has a small number of parameters compared to other networks. It uses a combination of 1 × 1 and 3 × 3 convolutions to reduce the computational load while still extracting useful features. Additionally, ShuffleNet is able to maintain high accuracy on image classification tasks while using significantly less computation compared to other networks, making it a popular choice for deployment on mobile and embedded devices.
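Channel shuffling can be sketched in a few lines. The version below assumes the usual formulation (view the channels as a groups × channels-per-group grid, transpose it, and flatten it again) and operates on a plain list of channel identifiers; the function name is hypothetical and no deep learning library is involved.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so that each new group mixes all old groups.

    Equivalent to reshaping to (groups, n // groups), transposing,
    and flattening back to a single list.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```

After the shuffle the channel order is 0, 3, 1, 4, 2, 5, so the next grouped convolution sees channels that originate from both groups.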
3.5 Combinations of Classification Results

In this section, we present how the individual classifiers were obtained and the methods employed for combining these classifiers. We coded the two classes as 0 and 1, where 0 is assigned to the images of lungs with opacity, and 1 to the normal lungs and to those that are not normal but have no opacity. We denote the first class by $l_1 = 0$ and the second class by $l_2 = 1$.
3.5.1 Base Classifiers

In our computations, we used eight well-known Convolutional Neural Networks: Resnet18, Resnet50 [13], VGG16, VGG19 [14], Densenet201 [16], Mobilenet v2 [17], Inception v3 [15], and Shufflenet [18]. These networks were trained on the training set of the RSNA collection of CXR images. We then classified all the test images. These are the first eight classifiers, $c_1, c_2, \ldots, c_8$.

Using the same networks, one can extract features for all the images in the dataset by stopping the network before the classification step. Thus each image has an associated feature vector. Using the eight networks considered above, we obtain eight different sets of features. For each set of features, the Random Forest (RF), Support Vector Machines (SVM) with an order 2 polynomial kernel, and k-Nearest-Neighbour (kNN) machine learning algorithms were applied for classification [19].

In order to establish the best parameters for the above-mentioned machine learning techniques, we used a validation set built by randomly extracting 25% of the training set. We trained the models with the remaining 75% of the images of the training set. We selected the set of parameters that provided the best results on the validation set. For the Random Forest we tested various numbers of decision trees: 500, 1000, 2000, 4000, 5000, 60000. In the case of the SVM we tested three different types of kernels: linear, and polynomial with degrees 2 and 3. For kNN the tuned parameter was k, the size of the neighbourhood that provides the label for a new image, with $k \in \{1, 3, 21, 101, 151, 201, 301, 501\}$. After selecting these details, we used the entire training set for training these methods and then classified the images from the test set. Thus, one obtains 24 more classification results, each providing possible labels for the test samples.

Using all these classification methods, one obtains a 32-dimensional vector of possible labels for each test image. We consider that for each test image, one has a pool of $L \le 32$ classifiers, $C = \{c_1, c_2, \ldots, c_L\}$. These classifiers allow us to associate to a test image $T$ a vector with binary elements, $r = (r_1, r_2, \ldots, r_L) \in \{0, 1\}^L$, where $r_i \in \{0, 1\}$ is the result of the classification of the test image $T$ by classifier $c_i \in C$.
3.5.2 Voting and Weighted Voting Combinations [20]

We used the 32 classification results in the following way: we selected all possible combinations of 3, 5, 7, and 9 classifiers, and registered the combination that provided the best classification result. We applied the voting procedure in the following way. One first computes the mean value of the vector with labels $r$:

\[
\bar{r} = \frac{1}{L} \sum_{i=1}^{L} r_i . \tag{3.1}
\]

Depending on the value of $\bar{r}$, one chooses the new class for the test image $T$:

if $\bar{r} < 0.5$ then new_class = 0; else new_class = 1.
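The voting rule (3.1) and the search over classifier subsets can be sketched as follows. This is an illustrative Python version under stated assumptions, not the MATLAB code behind the reported results: the function names, the dictionary layout of the classifier outputs and the toy data are all hypothetical.

```python
from itertools import combinations


def majority_vote(labels):
    """Rule (3.1): class 0 if the mean label is below 0.5, else class 1."""
    mean = sum(labels) / len(labels)
    return 0 if mean < 0.5 else 1


def accuracy(predictions, truth):
    """Fraction of test images whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def best_voting_subset(results, truth, p):
    """Try every subset of p classifiers and keep the most accurate one.

    results: dict mapping a classifier name to its list of 0/1 labels,
             one label per test image (hypothetical layout).
    """
    best_acc, best_names = -1.0, None
    for names in combinations(sorted(results), p):
        preds = [majority_vote([results[n][t] for n in names])
                 for t in range(len(truth))]
        acc = accuracy(preds, truth)
        if acc > best_acc:
            best_acc, best_names = acc, names
    return best_acc, best_names


# Toy data: four hypothetical classifiers on four test images.
truth = [0, 1, 1, 0]
results = {
    "a": [0, 1, 0, 0],
    "b": [0, 0, 1, 0],
    "c": [1, 1, 1, 0],
    "d": [1, 1, 1, 1],
}
best_acc, best_names = best_voting_subset(results, truth, 3)
```

Using an odd number of classifiers avoids ties; with an even number, a mean of exactly 0.5 falls into class 1 under the rule above.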
We also considered a weighted voting procedure, by mixing the classification results from multiple classifiers. One computes the sum:

\[
S = \sum_{i=1}^{L} a_i r_i , \tag{3.2}
\]

where the values of the constants $a_i$ satisfy the relation:

\[
a_i \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\} \;\; \forall i, \qquad \sum_{i=1}^{L} a_i = 1 . \tag{3.3}
\]

The new class for the test image is established using the same rule as above:

if $S < 0.5$ then new_class = 0; else new_class = 1.
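A possible way to enumerate the admissible weight vectors of (3.3) and to apply rule (3.2) is sketched below. The function names are hypothetical, and working in integer tenths is an implementation choice made here to avoid floating-point rounding at the 0.5 threshold; the chapter does not say how the weights were enumerated.

```python
from itertools import product


def weight_vectors(p):
    """Yield all weight tuples with entries in {0.1, ..., 0.9} that sum to 1.

    The search is done in integer tenths (1..9 summing to 10).
    """
    for tenths in product(range(1, 10), repeat=p):
        if sum(tenths) == 10:
            yield tuple(t / 10 for t in tenths)


def weighted_vote(labels, weights):
    """Rule (3.2): class 0 if S = sum(a_i * r_i) < 0.5, else class 1."""
    s_tenths = sum(round(10 * a) * r for a, r in zip(weights, labels))
    return 0 if s_tenths < 5 else 1


n_weightings_p3 = len(list(weight_vectors(3)))
example = weighted_vote([0, 1, 1], (0.4, 0.2, 0.4))
```

For p = 3 there are 36 admissible weight vectors, so the weighted search multiplies the cost of the simple voting search by the number of weightings per subset.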
3.5.3 Bayes Rule Combination of Classifiers [21]

This approach assumes conditional independence of the classification results obtained with the individual classifiers. Denote by $P(r_j)$ the probability that classifier $c_j$ labels the test image as belonging to class $r_j$. The label of the test image can be either $l_1 = 0$ or $l_2 = 1$. Using the conditional independence property, the following relation holds:

\[
P(r \mid l_k) = P(r_1, r_2, \ldots, r_L \mid l_k) = \prod_{i=1}^{L} P(r_i \mid l_k) . \tag{3.4}
\]

For labelling the test image we need the a posteriori probabilities $P(l_k \mid r)$, which can be computed using the Bayes Theorem:

\[
P(l_k \mid r) = \frac{P(r \mid l_k)\, P(l_k)}{P(r)} , \quad k = 1, 2. \tag{3.5}
\]

The test image $T$ receives the label $l_k$ with the maximum a posteriori probability $P(l_k \mid r)$:

\[
\operatorname{argmax}\{P(l_1 \mid r),\, P(l_2 \mid r)\} . \tag{3.6}
\]

As the denominator in relation (3.5) does not depend on the label index, one computes the maximum value:

\[
\operatorname{argmax}\Big\{ P(l_1) \prod_{i=1}^{L} P(r_i \mid l_1),\; P(l_2) \prod_{i=1}^{L} P(r_i \mid l_2) \Big\} \tag{3.7}
\]
to establish the new class for test image $T$. Assume that the training set has $n_1$ images from class $l_1$ and $n_2$ images in class $l_2$. One estimates:

\[
P(l_1) = \frac{n_1}{n_1 + n_2} , \qquad P(l_2) = \frac{n_2}{n_1 + n_2} . \tag{3.8}
\]

For estimating the probabilities $P(r_i \mid l_k)$, one uses the confusion matrices associated with each classifier, $CM^{i} = (cm^{i}_{kq})$, where $cm^{i}_{kq}$ is the number of images that have true label $l_k$ and that classifier $c_i$ labelled as belonging to class $l_q$. In our situation, the confusion matrices are 2 × 2 matrices. We have the estimation:

\[
P(r_i \mid l_k) = \frac{cm^{i}_{k(r_i+1)}}{n_k} . \tag{3.9}
\]
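Relations (3.7) to (3.9) translate into a short routine. The sketch below is illustrative only: the function name, the nested-list layout of the confusion matrices and the toy counts are assumptions, indices are zero-based (so entry `[k][r_i]` plays the role of $cm^{i}_{k(r_i+1)}$), $n_k$ is taken as the row sum of each confusion matrix, and ties are resolved in favour of $l_1$, a choice the text does not specify.

```python
def bayes_combine(r, confusion, priors):
    """Return the label (0 or 1) that maximizes expression (3.7).

    r:         list of 0/1 labels, one per classifier
    confusion: confusion[i][k][q] = number of images with true class index k
               that classifier i labelled as class index q (zero-based)
    priors:    [P(l_1), P(l_2)] as in (3.8)
    """
    scores = []
    for k in range(2):
        score = priors[k]
        for i, r_i in enumerate(r):
            n_k = sum(confusion[i][k])            # images with true class k
            score *= confusion[i][k][r_i] / n_k   # estimate (3.9)
        scores.append(score)
    # Assumption: a tie is resolved in favour of l_1 = 0.
    return 0 if scores[0] >= scores[1] else 1


# Toy confusion matrices for two hypothetical classifiers.
toy_confusion = [
    [[80, 20], [10, 90]],
    [[70, 30], [20, 80]],
]
toy_priors = [0.5, 0.5]
label_when_disagreeing = bayes_combine([0, 1], toy_confusion, toy_priors)
```

In the toy case the two classifiers disagree, and the combined label follows the first one because its vote for class 0 is more reliable (0.5 × 0.8 × 0.3 = 0.12 against 0.5 × 0.1 × 0.8 = 0.04). For long products, summing logarithms instead of multiplying probabilities would avoid numerical underflow.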
3.5.4 Probability-Based Dynamically Selected Classifiers [22, 23]

In the approaches that dynamically select the best classifier (in a certain sense, to be specified) from an ensemble, a significant role in labelling the test image is played by a set of $K$ nearest neighbours from the training set, $Q = \{Q_1, Q_2, \ldots, Q_K\}$. The method we used in our computations (which we named the A Priori Algorithm) selects a single classifier from a pool of classifiers by analyzing the accuracy of these classifiers in the selected kNN region, $Q$. Using this set, one computes the probability that a certain classifier $c_j$ assigns the correct label to the test image $T$. The measure of correctness for an individual classifier is estimated as the class posterior probability of classifier $c_j$ over $Q$, in the following way:

\[
p(\text{classifier } c_j \text{ is correct}) = p(correct_j) = \frac{\sum_{i=1}^{K} p_j(l_k \mid Q_i \in l_k)\, \delta_i}{\sum_{i=1}^{K} \delta_i} , \qquad \delta_i = \frac{1}{\| Q_i - T \|_2} , \tag{3.10}
\]

where $\| \cdot \|_2$ is the Euclidean norm, so that $\| Q_i - T \|_2$ is the Euclidean distance between $Q_i$ and $T$. The A Priori Algorithm has as input an ensemble of classifiers $C$, the test image $T$, the training set and $K$, the size of the selected neighbourhood. The technique outputs the classifier that assigned the most probable label.
The algorithm has the following steps:
1. Compute $Q = \{Q_1, Q_2, \ldots, Q_K\}$, the subset of the $K$ training images closest to the test image $T$.
2. For each individual classifier $c_j$ compute its correctness probability $p_j(correct)$ with formula (3.10).
3. Form $CS \subseteq C$, $CS = \{c_j \,;\, p_j(correct) \ge 0.5\}$, the subset of classifiers with correctness of at least 0.5.
4. Find the classifier that has the maximum correctness, i.e., $m = \operatorname{argmax}\{p_j(correct) \,;\, c_j \in CS\}$.
5. selected = true;
6. For each classifier $c_j \in CS$:
   • Compute $d = p_m(correct) - p_j(correct)$.
   • If $j \ne m$ and $d < thr$, set selected = false.
7. If selected = true, then the classifier with the most probable result for labelling test image $T$ is $c_m$, and new_class = $r_m$.
8. If selected = false, randomly select a classifier in $CS$ that has $d < thr$ to provide the new class for test image $T$.

The best results were obtained with $thr = 0.1$. If there are other classifiers that are close to the best one from the point of view of correctness, the label for the test image can be provided by any of these classifiers.
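Steps 2 to 8 can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the function names and input layout are hypothetical, the neighbours and their distances to $T$ are assumed to be already computed (step 1) and strictly positive, and returning `None` when no classifier reaches a correctness of 0.5 is an assumption, since the text does not cover that case.

```python
import random


def correctness(true_class_probs, distances):
    """Formula (3.10): distance-weighted mean of the probability that one
    classifier assigns to the true class of each neighbour Q_i."""
    deltas = [1.0 / d for d in distances]   # assumes every distance > 0
    weighted = sum(p * w for p, w in zip(true_class_probs, deltas))
    return weighted / sum(deltas)


def a_priori_select(probs, distances, thr=0.1, rng=random):
    """Return the index of the selected classifier, or None if CS is empty.

    probs[j][i]: probability that classifier j gives to the true class of Q_i
    distances[i]: Euclidean distance between Q_i and the test image T
    """
    scores = [correctness(row, distances) for row in probs]
    cs = [j for j, s in enumerate(scores) if s >= 0.5]          # step 3
    if not cs:
        return None   # assumption: this case is not described in the text
    m = max(cs, key=lambda j: scores[j])                        # step 4
    near = [j for j in cs if scores[m] - scores[j] < thr]       # steps 5-6
    if len(near) == 1:                                          # step 7
        return m
    return rng.choice(near)                                     # step 8


# Toy example: three hypothetical classifiers, two neighbours.
toy_probs = [[0.9, 0.8], [0.6, 0.7], [0.2, 0.3]]
toy_distances = [1.0, 2.0]
chosen = a_priori_select(toy_probs, toy_distances, thr=0.1)
```

In the toy example the first classifier has correctness 1.3/1.5 ≈ 0.87 and no other candidate lies within 0.1 of it, so it is selected deterministically; the label of the test image is then the label that this classifier assigned to $T$.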
3.5.5 Combining Classifiers with Machine Learning Algorithms [20]

For each image in the dataset, we build the vector of labels, $r$, with the results of the individual classifiers. This vector can be interpreted as a feature vector associated with each image. Using these feature vectors, we apply classification algorithms, such as SVM, RF and kNN, to deduce a new label for the test images.
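As an illustration of this idea, the sketch below applies a small kNN classifier directly to the binary label vectors. It is only one possible instance: the function names and toy data are hypothetical, the Hamming distance is an assumption (for 0/1 vectors it equals the squared Euclidean distance), and the chapter's own experiments used MATLAB implementations of SVM, RF and kNN.

```python
from collections import Counter


def hamming(u, v):
    """Number of positions in which two label vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def knn_combine(train_vectors, train_labels, r, k=3):
    """Label the vector r by majority among its k nearest training vectors."""
    order = sorted(range(len(train_vectors)),
                   key=lambda i: hamming(train_vectors[i], r))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: label vectors produced by three hypothetical base classifiers.
train_vectors = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0), (0, 1, 1)]
train_labels = [0, 0, 1, 1, 1]
new_label = knn_combine(train_vectors, train_labels, (1, 1, 1), k=3)
```

Unlike voting, a learned combiner of this kind can discover that some base classifiers are systematically more reliable than others, at the cost of needing labelled vectors to train on.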
3.6 Results

The computations were performed using MATLAB [24]. The networks were trained only once, using a standard set of parameters: 10 epochs, a batch size of 128, the stochastic gradient descent optimizer and a learning rate of $10^{-4}$. These networks were trained on 37002 images (20025 CXR images with no opacity and 16977 images with lung opacity). All the images were resized according to the requirements of the network and transformed to RGB format by placing the grayscale image in each colour component. The accuracy results presented in this section are obtained by classifying the 1000 images from the test set (353 images with lung opacity and 647 with no opacity). For tuning the hyperparameters we established a 70–30% split for validation purposes.

We use the following abbreviations for the networks: R18, R50 for the Resnet networks, V16, V19 for the VGG networks, Mbl for Mobilenet, D for Densenet201, Inc for Inception and Sh for Shufflenet.

In Table 3.1, the DL column presents results obtained using the 8 before-mentioned CNNs for classification. The SVM, 201-NN and RF columns use the same networks for feature extraction and Support Vector Machines, 201-Nearest-Neighbour and Random Forest, respectively, for the classification of test images.

First, we monitored the individual classification results obtained with 8 CNNs and 4 classification methods. The accuracy results are in Table 3.1. We note that the overall best result was obtained using 201-NN with VGG16 features. On average, the best results were provided by the 201-NN classification method, followed by Random Forest and Deep Learning methods, and the worst average results were computed with SVM (because of the accuracy obtained with Shufflenet features). Analysing the average results from the point of view of the networks used for feature extraction, the best results (78.63%) were obtained with VGG16 (due to the best result computed with 201-NN) followed by Resnet50 with an average of 78.15%.

Table 3.1 Accuracy for the individual/base classifiers on the test set (in %)

Networks     DL (%)   RF (%)   201-NN (%)   SVM (%)
Resnet18     78.5     79.2     79.0         70.6
Resnet50     78.0     79.3     79.3         76.0
VGG16        78.4     78.4     81.2         76.5
VGG19        77.0     77.3     79.1         75.9
Mobilenet    75.9     77.4     79.8         73.8
Densenet     78.1     79.0     79.0         75.3
Inception    76.0     76.8     77.3         74.8
Shufflenet   77.1     79.8     77.8         57.9

In the following computations, we considered all possible combinations of p classifiers from the total of 32 classification results, with p ∈ {3, 4, 5, 6, 7}. We did not consider higher values for p because, for example, for p = 8 there would be 10,518,300 combinations to take into consideration.
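The averages quoted above and the size of the search space can be recomputed directly from Table 3.1. The short Python check below is given only as a reading aid; the variable names are illustrative, and the numbers are copied from the table.

```python
from math import comb
from statistics import mean

methods = ("DL", "RF", "201-NN", "SVM")

# Accuracy values (in %) copied from Table 3.1, in the column order above.
table_3_1 = {
    "Resnet18":   (78.5, 79.2, 79.0, 70.6),
    "Resnet50":   (78.0, 79.3, 79.3, 76.0),
    "VGG16":      (78.4, 78.4, 81.2, 76.5),
    "VGG19":      (77.0, 77.3, 79.1, 75.9),
    "Mobilenet":  (75.9, 77.4, 79.8, 73.8),
    "Densenet":   (78.1, 79.0, 79.0, 75.3),
    "Inception":  (76.0, 76.8, 77.3, 74.8),
    "Shufflenet": (77.1, 79.8, 77.8, 57.9),
}

# Average accuracy per classification method (column means).
method_means = {
    name: mean([row[i] for row in table_3_1.values()])
    for i, name in enumerate(methods)
}
ranking = sorted(method_means, key=method_means.get, reverse=True)

# Average accuracy per feature-extraction network (row means).
network_means = {net: mean(list(row)) for net, row in table_3_1.items()}

# Number of subsets of p classifiers out of 32, for p = 3, ..., 8.
subset_counts = {p: comb(32, p) for p in range(3, 9)}
```

The column means are approximately 79.06% (201-NN), 78.40% (RF), 77.38% (DL) and 72.60% (SVM), which reproduces the ordering stated in the text, and `comb(32, 8)` gives the 10,518,300 combinations mentioned for p = 8.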
Tables 3.2 and 3.3 present the results for the voting combinations: simple voting in Table 3.2 and weighted voting in Table 3.3. We note that the accuracy increases with p. For the simple voting procedure, for p = 3 there is only one combination that provides the best result, for p = 5 there are three such combinations and for p = 7 there are seven combinations. For the weighted voting algorithm, for p = 3 there are three combinations that produce the best results but, in fact, these combinations are formed by the same
A. Ignat and R.-A. Găină
Table 3.2 Best accuracy for simple voting combinations
No. of combinations | Combination | Accuracy (%)
p = 3 | R50(DL) + R18(kNN) + V16(kNN) | 82.8
p = 5 | R50(DL) + V16(DL) + D(DL) + R18(kNN) + D(kNN) | 83.3
p = 5 | R50(DL) + V16(DL) + R18(kNN) + V16(kNN) + D(kNN) | 83.3
p = 5 | R50(DL) + D(DL) + V16(RF) + R18(kNN) + D(kNN) | 83.3
p = 7 | R50(DL) + V16(DL) + D(DL) + V16(RF) + R18(kNN) + R50(kNN) + D(kNN) | 83.4
p = 7 | R50(DL) + V16(DL) + D(DL) + R18(kNN) + R50(kNN) + V16(kNN) + D(kNN) | 83.4
p = 7 | R50(DL) + V16(DL) + Sh(RF) + R18(kNN) + V16(kNN) + D(kNN) + Inc(SVM) | 83.4
p = 7 | R50(DL) + V16(DL) + R18(kNN) + R50(kNN) + V16(kNN) + D(kNN) + R50(SVM) | 83.4
p = 7 | R50(DL) + V16(DL) + R18(kNN) + R50(kNN) + V16(kNN) + D(kNN) + V16(SVM) | 83.4
Table 3.3 Best accuracy for weighted voting combinations
No. of combinations | Combination | Accuracy (%)
p = 3 | 0.4 R50(DL) + 0.2 R18(kNN) + 0.4 V16(kNN) | 82.8
p = 3 | 0.3 R50(DL) + 0.3 R18(kNN) + 0.4 V16(kNN) | 82.8
p = 3 | 0.4 R50(DL) + 0.3 R18(kNN) + 0.3 V16(kNN) | 82.8
p = 3 | 0.2 R50(DL) + 0.4 R18(kNN) + 0.4 V16(kNN) | 82.8
p = 3 | 0.3 R50(DL) + 0.4 R18(kNN) + 0.3 V16(kNN) | 82.8
p = 3 | 0.4 R50(DL) + 0.4 R18(kNN) + 0.2 V16(kNN) | 82.8
p = 5 | 0.2 R50(DL) + 0.2 R18(kNN) + 0.4 V16(kNN) + 0.1 D(kNN) + 0.1 Mbl(SVM) | 83.7
p = 5 | 0.1 R50(DL) + 0.3 R18(kNN) + 0.4 V16(kNN) + 0.1 D(kNN) + 0.1 Mbl(SVM) | 83.7
p = 5 | 0.1 R50(DL) + 0.3 V16(DL) + 0.1 D(DL) + 0.3 R18(kNN) + 0.2 D(kNN) | 83.7
p = 7 | 0.2 R50(DL) + 0.2 V16(DL) + 0.1 D(DL) + 0.2 R18(kNN) + 0.1 R50(kNN) + 0.1 V16(kNN) + 0.1 D(kNN) | 84.0
p = 7 | 0.2 R50(DL) + 0.2 V16(DL) + 0.1 D(DL) + 0.2 R18(kNN) + 0.1 V16(kNN) + 0.1 D(kNN) + 0.1 Sh(kNN) | 84.0
p = 7 | 0.2 R50(DL) + 0.1 V16(DL) + 0.1 D(DL) + 0.3 R18(kNN) + 0.1 V16(kNN) + 0.1 D(kNN) + 0.1 V16(SVM) | 84.0
p = 7 | 0.2 R50(DL) + 0.1 D(DL) + 0.1 V16(RF) + 0.3 R18(kNN) + 0.1 V16(kNN) + 0.1 D(kNN) + 0.1 V16(SVM) | 84.0
p = 7 | 0.3 R50(DL) + 0.1 V16(DL) + 0.1 D(DL) + 0.1 V16(RF) + 0.2 R18(kNN) + 0.1 V16(kNN) + 0.1 D(kNN) | 84.0
results with different coefficients. For p = 5 there are also three best combinations, two of which contain the same classifiers with different weights. For p = 7 there are five combinations that reach 84% accuracy.
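The two voting rules can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: labels are binary (1 = lung opacity, 0 = no opacity), ties are resolved towards 0, and the function names `simple_vote` and `weighted_vote` as well as the example predictions are hypothetical.

```python
from typing import Sequence


def simple_vote(labels: Sequence[int]) -> int:
    """Majority vote over binary labels; a tie yields 0 (assumption)."""
    return int(sum(labels) * 2 > len(labels))


def weighted_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Return 1 when the classifiers voting 1 hold more than half of the total weight."""
    weight_for_one = sum(w for label, w in zip(labels, weights) if label == 1)
    return int(weight_for_one > sum(weights) / 2)


# Hypothetical predictions of three base classifiers for a single image
labels = [1, 0, 0]
majority = simple_vote(labels)
with_table_weights = weighted_vote(labels, [0.4, 0.2, 0.4])  # weights as in Table 3.3, p = 3
with_dominant_weight = weighted_vote(labels, [0.6, 0.2, 0.2])  # hypothetical dominant weight
print(majority, with_table_weights, with_dominant_weight)
```

Under this assumed rule, no single weight in the p = 3 rows of Table 3.3 reaches half of the total, so the weighted vote reduces to a simple majority there, which would be consistent with the identical 82.8% accuracy reported for p = 3 in Tables 3.2 and 3.3.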
3 Improving Diagnostics of Pneumonia …
Table 3.4 Best accuracy for Bayes combinations
No. of combinations | Combination | Accuracy (%)
p = 3 | R50(DL), R18(kNN), V16(kNN) | 82.8
p = 4 | R50(DL), V16(DL), D(DL), R18(kNN) | 83.1
p = 4 | R50(DL), R18(kNN), V16(kNN), D(kNN) | 83.1
p = 5 | R50(DL), V16(DL), D(DL), R18(kNN), D(kNN) | 83.3
p = 5 | R50(DL), V16(DL), R18(kNN), V16(kNN), D(kNN) | 83.3
p = 5 | R50(DL), D(DL), V16(RF), R18(kNN), D(kNN) | 83.3
p = 6 | R50(DL), V16(DL), R18(kNN), R50(kNN), V16(kNN), D(kNN) | 83.6
p = 7 | R50(DL), V16(DL), R18(kNN), R50(kNN), V16(kNN), D(kNN), R18(SVM) | 83.7
We note that the presented combinations contain the results computed by the Resnet50 network and by 201-NN with Resnet18 and VGG16 features. In the great majority of the combinations, the Deep Learning and the 201-NN classifiers are present. The features that appear most often in these combinations are those computed using the Resnet and VGG families of networks. Although 201-NN with VGG16 features provided the best individual result, in the weighted combinations it usually has the smallest possible weight; the highest weight usually goes to the Resnet50 results. In the best combinations with seven classifiers, Mobilenet-related results are absent, and Inception and Shufflenet results appear only in 2 or 3 combinations.
Table 3.4 presents the results for the naive Bayes method for combining classification results. Again, as p increases, the corresponding accuracy values increase. The same classification results provide the best combined results: Resnet50, and 201-NN with Resnet18 and VGG16 features. For p < 7 the best results are computed only with Deep Learning and 201-NN results. Interestingly, for p = 7 the only combination that produced 83.7% accuracy contains, besides Deep Learning and 201-NN classifiers, the individual classifier computed using SVM with Resnet18 features. This SVM classifier provides the second worst result among the individual SVM results. For this type of combination, fewer combinations produce the best result for each p than with the voting algorithms.
The algorithm based on the dynamic selection procedure is more time-consuming, which is why we considered combining only 24 classification results, namely those provided by the CNN, Random Forest and 201-NN methods. The accuracy results are in Table 3.5. For p = 3, the three combined classifiers are those that provided the best individual results.
For p = 7, the individual results that enter the combination are not necessarily the best ones computed. All the combinations contain the 201-NN with VGG16 features.
The feature vectors formed by concatenating the labels produced by the individual results are binary vectors. We used them with SVM, 1-NN and Random Forest to produce a new label. The results are in Table 3.6. We first concatenated all the 32
Table 3.5 Best accuracy for probabilistic dynamically selected combinations (only deep learning, random forest and 201-NN combinations)
No. of combinations | Combination | Accuracy (%)
p = 3 | Sh(RF), V16(kNN), Mbl(kNN) | 81.8
p = 4 | V16(kNN), V19(kNN), D(kNN), Sh(kNN) | 82.3
p = 5 | R50(DL), Sh(DL), Sh(RF), V16(kNN), V19(kNN) | 82.2
p = 5 | Mbl(RF), R18(kNN), V16(kNN), V19(kNN), Sh(kNN) | 82.2
p = 6 | Mbl(DL), Inc(DL), Sh(DL), R50(kNN), V16(kNN), Mbl(kNN) | 82.8
p = 7 | R50(DL), Sh(DL), R18(RF), D(RF), V16(kNN), V19(kNN), Mbl(kNN) | 83.2
Table 3.6 Accuracy of machine learning algorithms applied with individual results as feature vectors
Features | SVM (%) | 1-NN (%) | RF (%)
DL + RF + 201NN + SVM | 81.4 | 79.8 | 79.4
DL + RF + 201NN | 81.1 | 79.6 | 80.3
DL + RF + SVM | 81.4 | 79.5 | 80.1
DL + 201NN + SVM | 78.8 | 76.8 | 77.3
RF + 201NN + SVM | 81.3 | 79.5 | 80.0
results (the accuracy results are in the first row of Table 3.6), and then we considered all combinations of 3 types of classifiers (the results are in rows 2–5 of Table 3.6). Note that in only 3 cases do the results improve on the best individual result, and the improvement is very small (0.1–0.2%).
In Tables 3.2, 3.3, 3.4, 3.5 and 3.6 we presented only the best results among all possible combinations. For p = 7 (the situation in which we get the best results), we also computed the worst result produced with these combinations and the average accuracy of all the considered combinations. The results are in Table 3.7. We note that although the best overall result was produced with the weighted voting procedure, the best average result was obtained with the simple voting algorithm. All these procedures for combining classifiers improved the best result by 2–2.8% and the average by 2.1–3.6%.
We also performed some computations for values of p higher than 7. We expect that, starting from a certain value of p, the combined classifiers do not provide better accuracy. For example, for the simple voting procedure with p = 9 the best accuracy is 83.8%, and for p = 11 the best accuracy is 83.7%. For the weighted voting combining rule, the best combinations of p = 8 classifiers have 83.9% accuracy (there are three such combinations). Also for p = 8, the Bayes combination rule gives a best accuracy of 83.5% (for two combinations).
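The label-concatenation scheme behind Table 3.6 can be sketched with the simplest of the three learners, 1-NN. The code below is only an illustration under stated assumptions: the Hamming distance, the tie-breaking rule, the helper names and the toy data are ours, and the way the combiner's training vectors are obtained is not specified here, so they are treated as given.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two binary label vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def one_nn_predict(train_vectors: List[List[int]],
                   train_labels: List[int],
                   query: Sequence[int]) -> int:
    """Label of the training vector closest to `query` (first one wins on ties)."""
    nearest = min(range(len(train_vectors)),
                  key=lambda i: hamming(train_vectors[i], query))
    return train_labels[nearest]


# Hypothetical data: each row concatenates the labels that 4 base classifiers
# assigned to one image; train_labels holds the true class of that image.
train_vectors = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0]]
train_labels = [1, 0, 1, 0]

new_label = one_nn_predict(train_vectors, train_labels, [0, 1, 0, 1])
print(new_label)
```

In the chapter the vectors have 32 (or 24) components, one per individual classification result, and SVM and Random Forest are used alongside 1-NN.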
Table 3.7 Statistics of best combinations
Type of combinations | Average (%) | Min (%) | Max (%)
Individual | 76.86 | 57.9 | 81.2
Simple voting (p = 7) | 80.5 | 76.7 | 83.4
Weighted voting (p = 7) | 79.62 | 73.8 | 84.0
Bayesian (p = 7) | 80.36 | 76.6 | 83.7
Dynamically selected (p = 7) | 79.0 | 74.2 | 83.2
We also computed the confusion matrices for the classification results obtained with the best individual classifier, and with the weighted voting rule and the Bayesian combination that produced the best results (p = 7). Figure 3.2 shows these confusion matrices. Note that for the five combinations that provided the best accuracy in the weighted voting procedure there are three different confusion matrices, although the accuracy is the same. The first two combinations have the same confusion matrix; a possible explanation is that they have the same weights and differ by only one individual classifier. The same happens for the following two weighted
Fig. 3.2 Confusion matrices for best results: (a) 201-NN with VGG16 features, (b) weighted voting rule with p = 7, the first two combinations from Table 3.3, (c) weighted voting rule with p = 7, the third and the fourth combinations from Table 3.3, (d) weighted voting rule with p = 7, the last combination from Table 3.3, (e) Bayes combination rule for p = 7
combinations that also have the same confusion matrix. The last combination has a different confusion matrix, probably due to the 0.3 weight for the Resnet50 classifier. We note that the main improvement in accuracy is due to an increase in the number of true negatives.
We cannot compare our results with those obtained in other studies that use the same dataset, because the training/test split is different and the content of the training and test sets is not always clear; often only a percentage is provided. For example, in [8], the computations are performed in a 5-fold cross-validation format.
3.7 Conclusions
In this paper, we presented how combinations of base classifiers work on chest X-ray images for pneumonia classification. We combined CNN-based results with SVM, Random Forest and kNN methods trained on features extracted with the same networks. Voting techniques, probabilistic algorithms, dynamically selected classifiers and a machine-learning method were employed to combine the individual results. We tested all possible combinations of 3, 4, 5, 6, and 7 classifiers. We obtained better results; for some combination techniques the improvement was higher than 2%.
We intend to apply more complex pre-processing techniques and other combination methods, and to test them on different datasets. If a segmentation step is applied to identify the lung region and the image is then cropped to this area, we expect to get better classification results. In our computations, we used only two classes (with lung opacity and no lung opacity). The “no lung opacity” chest X-ray images have annotations that allow us to split this set in two: normal lungs, and with no opacity but not normal. Thus, we obtain a collection of images with 3 possible labels. We want to apply the same ideas to a multi-class dataset. We also want to find a way of combining the results provided by the combinations we computed, without introducing information about the test set into this second combination step.
References
1. Shih, G., Wu, C., Halabi, S., Kohli, M., Prevedello, L., Cook, T., Sharma, A., Amorosa, J., Arteaga, V., Galperin-Aizenberg, M., Gill, R., Godoy, M., Hobbs, S., Jeudy, J., Laroia, A., Shah, P., Vummidi, D., Yaddanapudi, K., Stein, A.: Augmenting the national institutes of health chest radiograph dataset with expert annotations of possible pneumonia. Radiol. Artif. Intell. 1 (2019)
2. Khan, W., Zaki, N., Ali, L.: Intelligent pneumonia identification from chest x-rays: a systematic literature review. IEEE Access 9, 51747–51771 (2021)
3. Çalli, E., Sogancioglu, E., van Ginneken, B., van Leeuwen, K.G., Murphy, K.: Deep learning for chest x-ray analysis: a survey. Med. Image Anal. 72, 102125 (2021)
4. Hammoudi, K., Benhabiles, H., Melkemi, M., Dornaika, F., Arganda-Carreras, I., Collard, D., Scherpereel, A.: Deep learning on chest x-ray images to detect and evaluate pneumonia cases at the era of COVID-19. J. Med. Syst. 45(7), 75 (2021)
5. Li, Y., Zhang, Z., Dai, C., Dong, Q., Badrigilan, S.: Accuracy of deep learning for automated detection of pneumonia using chest x-ray images: a systematic review and meta-analysis. Comput. Biol. Med. 123, 103898 (2020)
6. Ayan, E., Ünver, H.M.: Diagnosis of pneumonia from chest x-ray images using deep learning. In: 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), pp. 1–5 (2019)
7. Sharma, H., Jain, J.S., Bansal, P., Gupta, S.: Feature extraction and classification of chest x-ray images using CNN to detect pneumonia. In: 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 227–231 (2020)
8. Kundu, R., Das, R., Geem, Z.W., Han, G.-T., Sarkar, R.: Pneumonia detection in chest x-ray images using an ensemble of deep learning models. PLoS ONE 16, 1–29 (2021)
9. Darapaneni, N., Ranjan, A., Bright, D., Trivedi, D., Kumar, K., Kumar, V., Paduri, A.R.: Pneumonia detection in chest x-rays using neural networks (2022). arXiv:2204.03618
10. Han, Y., Chen, C., Tewfik, A.H., Ding, Y., Peng, Y.: Pneumonia detection on chest x-ray using radiomic features and contrastive learning. In: 18th IEEE International Symposium on Biomedical Imaging, ISBI 2021, Nice, France, pp. 247–251. IEEE (2021)
11. Mobiny, A., Yuan, P., Cicalese, P.A., Van Nguyen, H.: Decaps: detail-oriented capsule networks. In: Martel, A.L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M.A., Zhou, S.K., Racoceanu, D., Joskowicz, L. (eds.) Medical Image Computing and Computer Assisted Intervention—MICCAI 2020, pp. 148–158. Springer International Publishing, Cham (2020)
12. Caseneuve, G., Valova, I., LeBlanc, N., Thibodeau, M.: Chest x-ray image preprocessing for disease classification. In: Watróbski, J., Salabun, W., Toro, C., Zanni-Merk, C., Howlett, R.J., Jain, L.C. (eds.) Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 25th International Conference KES-2021, Virtual Event/Szczecin, Poland, vol. 192 of Procedia Computer Science, pp. 658–665. Elsevier (2021)
13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, pp. 770–778. IEEE Computer Society (2016)
14. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556
15. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision (2016)
16. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017)
17. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (2018). arXiv:1801.04381
18. Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: an extremely efficient convolutional neural network for mobile devices (2017)
19. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA (2001)
20. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004)
21. Mohandes, M.A., Deriche, M.A., Aliyu, S.O.: Classifiers combination techniques: a comprehensive review. IEEE Access 6, 19626–19639 (2018)
22. Britto Jr., A.S., Sabourin, R., Oliveira, L.E.: Dynamic selection of classifiers—A comprehensive review. Pattern Recognit. 47(11), 3665–3680 (2014)
23. Cruz, R.M.O., Sabourin, R., Cavalcanti, G.D.C.: Dynamic classifier selection: recent advances and perspectives. Inf. Fusion 41, 195–216 (2018)
24. MATLAB R2021a: The Mathworks, Inc., Natick, Massachusetts, MATLAB version 9.10.0.1613233 (R2021a) (2021)
Anca Ignat graduated from the Faculty of Computer Science, University “Alexandru Ioan Cuza” of Iași, Romania. She obtained a master’s degree in Numerical Analysis at the University Paris XI Orsay, France. She got a PhD in Mathematics in 1999, from the University “Alexandru Ioan Cuza” of Iași, Romania. Currently, Anca Ignat is Associate Professor at the Faculty of Computer Science, University “Alexandru Ioan Cuza” of Iași, Romania, where she teaches classes on Numerical Calculus, Computer Vision and Machine Learning. Her research interests include Computer Vision, Biometry (iris and palmprint characterization and recognition), and numerical approximations with applications in image processing. She is the author or co-author of more than 45 papers in scientific journals and national and international conferences.
Chapter 4
ICARE: An Intuitive Context-Aware Recommender with Explanations Barbara Oliboni, Anna Dalla Vecchia, Niccolò Marastoni, and Elisa Quintarelli
Abstract The chapter presents a framework, called Intuitive Context-Aware Recommender with Explanations (ICARE), that can provide contextual recommendations, together with their explanations, useful to achieve a specific and predefined goal. We apply ICARE in the healthcare scenario to infer personalized recommendations related to the activities (fitness and rest periods) a specific user should follow or avoid in order to obtain a high value for the sleep quality score, also on the basis of their current context and the physical activities performed during the past days. We leverage data mining techniques to extract frequent and context-aware sequential rules that can be used both to provide positive and negative recommendations and to explain them.
Keywords Context-aware recommendation systems · Health recommendation systems · Explainable recommendations · Data mining · Sequential rules
Acronyms
RS: Recommendation Systems
HRS: Health Recommendation System
CARS: Context-Aware Recommendation Systems
ICARE: Intuitive Context-Aware Recommender with Explanations
ALBA: Aged LookBackApriori

B. Oliboni (✉) · A. Dalla Vecchia · N. Marastoni · E. Quintarelli
Department of Computer Science, University of Verona, Strada Le Grazie, 15, 37134 Verona, Italy
e-mail: [email protected]
A. Dalla Vecchia e-mail: [email protected]
N. Marastoni e-mail: [email protected]
E. Quintarelli e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_4
4.1 Introduction
Recommendation Systems (RS) provide suggestions related to a given decision-making process and, in particular, to which item is relevant for a user on the basis of his/her profile or past habits. They have been applied in different application domains to suggest, for example, what book to read next or what film to watch, what product to choose in an online shop, the best job recruiting announcements to consider, or what restaurant to book during a trip. In the healthcare scenario, a Health Recommendation System (HRS) provides recommendations in different contexts, such as medical treatment suggestions, nutrition plans, or physical activities to perform in order to reach and maintain a healthy lifestyle [24].
Collecting data related to people’s behaviours and well-being has become easier thanks to wearable devices; indeed, many people own them and can use simple apps to monitor their movements and record their sleep quality and heartbeat, while also being surrounded by IoT devices embedded in common appliances in homes, offices and means of transport. This situation ensures a constant stream of new data that can be integrated, also with external data sources, and offers increasing opportunities for data analytics in the healthcare domain to produce useful insights. Moreover, collecting data can also help people to gain more awareness of their physical health and improve their lifestyle. An example of a use case in this domain is the analysis of hyperglycemic episodes in diabetic patients together with their physical activity [7, 20]. Further examples are the analysis of wearable data to evaluate factors affecting the physical activity habits of college students [25], or to study situations affecting loneliness and social isolation. Thanks to wearable devices, the participants of health-related studies do not need to provide data manually, and thus avoid the fatigue that might lead them to stop providing such data [26].
However, useful insights in these huge and semantically rich data sets are not always explicit. A very important aspect of sensor data is their intrinsically temporal nature, as sensors collect information about events that happen in succession. Each event is labeled with a timestamp reporting the exact time at which it is collected, thus the temporal feature is one of the contextual dimensions that can be leveraged to obtain useful knowledge. Indeed, temporal analysis allows one to discover, for example, whether events occur in a periodic way, or if they are correlated to any additional contextual feature, such as weather conditions or personal circumstances. Detecting
the periodicity of events, or their temporal correlation, can aid in the prediction of future situations that are related to past trends.
In the state of the art, the impact of contextual information has been investigated and considered relevant for providing personalized suggestions, since Context-Aware Recommendation Systems (CARS) are able to provide more accurate recommendations by adapting them to the specific context the user is acting in [1]. Here, the notion of context can include, but is not limited to, geographical, temporal, social and categorical information. For example, the ability to discover that a particular user sleeps better after some days of heavy fitness activity, and that the same user prefers to train during sunny days, can be useful to provide personalized suggestions suited to the weather conditions of the next few days.
In recent years, the importance of explainable and contextual recommendation systems has been emphasized, since clear explanations can help users fulfill their needs more easily and intuitively, and also help them accept the suggestions [6]. This is especially true in the healthcare domain: people using wearable devices do not have clinical expertise, so the ability to provide clear and quick recommendations, together with intuitive and understandable explanations, can be very important. In particular, we aim at providing explainable recommendations [30], i.e., suitable recommendations together with intuitive and understandable explanations, in order to guarantee transparency and interpretability. This way, users are supported in comprehending the suggested recommendations, and helped in understanding why those recommendations are given and why they are relevant to a specific goal.
In this chapter, we describe ICARE (Intuitive Context-Aware Recommender with Explanations), a framework able to (i) collect data from wearable devices (e.g., Fitbit), (ii) enrich data with external contextual information, (iii) analyse enriched data to discover sequential rules correlating sequences of past events with a specified future goal, and (iv) provide explainable recommendations by means of an intuitive application. To better describe our proposal, we consider Fitbit data as a general use case: our algorithm analyzes past data and learns which sequences of activities, along with their intensity, historically lead to a pre-defined goal, in our case “sleeping well”. Then, it suggests which set of actions is best to take next to have a good night’s sleep, considering the user’s current context, as well as the actions that should be avoided. An aging function is introduced to keep the analysis up to date and consequently adapt the recommendations when frequent behaviours change; for example, fitness activities correlated with sleep quality might change during the year due to different factors, such as weather conditions or available free time.
The chapter is structured as follows: Sect. 4.2 introduces the proposed algorithm, while Sect. 4.3 describes our proposal to adopt previously mined sequential rules to provide both positive and negative suggestions towards a specific goal. Section 4.4 presents some experimental results. Section 4.5 describes the implemented mobile App, and Sect. 4.6 summarizes the state of the art. Finally, Sect. 4.7 concludes the chapter and outlines future work.
4.2 The ALBA Algorithm
In this section, we describe the overall approach of ICARE and the ALBA (Aged LookBackApriori) algorithm used to infer sequential rules that are then used to provide contextual recommendations. Figure 4.1 shows an intuitive workflow of our approach. ICARE needs as input a temporal dataset D (in our scenario, the log of physical activity levels collected by Fitbit), enriched with contextual information. For our scenario we leverage the temporal dimension to capture the user’s habits on specific days, i.e., we distinguish whether each day is a weekday or part of the weekend. Whenever possible, we also use weather conditions to establish whether the user’s preferences are related to specific situations. However, the approach is general and can be extended to consider additional contextual features.
The dataset D is fed into ALBA, which considers a temporal window $\tau_w$ to construct an augmented dataset $D_{\tau_w}$. This augmented dataset is then fed to a version of the Apriori algorithm [2] that calls an aging mechanism (described in Sect. 4.2.2) at each iteration to calculate the support of items. This process outputs a set of frequent sequences S, used to generate a set of totally ordered sequential rules R, formalized as implications X ⇒ Y, where X and Y are two sets of ordered data items such that X ∩ Y = ∅, according to specific thresholds for confidence and support [31]. Support is the frequency of the set X ∪ Y in the dataset, while confidence is the conditional probability of finding Y having found X, and is given by sup(X ∪ Y)/sup(X). The recommender system orders R with the criteria introduced in Sect. 4.3.1 to produce a set of totally ordered sequential rules that can be queried to extract a positive recommendation $r^+$ and a negative recommendation $r^-$. This last step is explained in Sect. 4.3.2.
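The two measures can be made concrete with a small sketch. It is an illustration only: it treats each day as a plain itemset and ignores both the temporal offsets and the aging mechanism of ALBA, and the toy log and function names are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """sup(X ∪ Y) / sup(X); defined as 0.0 when X never occurs."""
    sup_x = support(x, transactions)
    return support(x | y, transactions) / sup_x if sup_x else 0.0


# Hypothetical 4-day log (HA/LA = heavy/light activity, SL = sleep score)
log = [
    frozenset({"HA:3", "LA:3", "SL:3"}),
    frozenset({"HA:3", "LA:2", "SL:2"}),
    frozenset({"HA:3", "LA:3", "SL:3"}),
    frozenset({"HA:1", "LA:3", "SL:2"}),
]

x = frozenset({"HA:3", "LA:3"})
y = frozenset({"SL:3"})
print(support(x | y, log), confidence(x, y, log))
```

Here X ∪ Y occurs on 2 of the 4 days and X on the same 2 days, so the rule X ⇒ Y has support 0.5 and confidence 1.0; with X = {HA:3} alone the confidence drops to 2/3.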
4.2.1 LookBack Apriori
Given a transactional dataset $D_0$ (see Fig. 4.2), where each transaction has a timestamp i, $D_1$ extends $D_0$ by looking back 1 time unit, i.e., each transaction in $D_1$ is obtained as the concatenation of $t_{i-1}$ in $D_0$, if any, and $t_i$ in $D_0$, and so on (as shown in Fig. 4.2). This way, we iteratively build a dataset $D_j$, where each transaction is related to a time window of length j + 1. We can iteratively apply Apriori on the dataset $D_j$ (looking back over j time units), in which case our algorithm analyzes a dataset with a variable time window, or we can fix the length of the temporal window, e.g., $\tau_w$, and apply Apriori on the dataset $D_{\tau_w - 1}$. We then set a threshold for the support (minsupp) and extract all the itemsets whose support is greater than or equal to minsupp; these are the frequent itemsets. After this, we set a confidence threshold minconf and discover totally ordered sequential rules r with confidence greater than or equal to minconf, such that the consequent is the itemset related to the target function at time unit 0 (i.e., the current day).
Fig. 4.1 ICARE workflow
Fig. 4.2 Example of Fitbit dataset and the extended dataset
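The look-back construction can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: tagging each item with its day offset (`@0` for the current day, `@-1` for the previous one, and so on) is our own encoding, and `look_back` is a hypothetical helper name.

```python
from typing import List, Set


def look_back(d0: List[Set[str]], window: int) -> List[Set[str]]:
    """Build the dataset looking back over window - 1 time units.

    Transaction i merges the days i-(window-1) .. i of d0; every item is
    tagged with its offset from the current day. Early days simply have a
    shorter history (the "if any" case in the text).
    """
    extended = []
    for i in range(len(d0)):
        merged: Set[str] = set()
        for back in range(window):
            j = i - back
            if j < 0:
                break  # no earlier day available
            merged |= {f"{item}@{-back}" for item in d0[j]}
        extended.append(merged)
    return extended


# Hypothetical 3-day log
d0 = [{"HA:3"}, {"LA:2"}, {"SL:3"}]
d1 = look_back(d0, window=2)
for row in d1:
    print(sorted(row))
```

Once the offsets are part of the item names, a standard Apriori implementation can be run on the extended transactions, and rules whose consequent carries offset 0 correspond to predictions for the current day.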
4.2.2 Aging Mechanism
In order to enhance the importance of recent frequent patterns, the LookBackApriori algorithm has been extended with an aging mechanism. Normally, the support of a sequence is calculated at each iteration of the algorithm, in order to decide whether a particular combination of items is frequent. This is done by counting the occurrences of the combination over all timestamps and dividing the count by the number of timestamps. Without an aging mechanism, a sequence appearing frequently in recent times and another sequence that appeared the same number of times a while ago would have the same support. For this reason, we modify the support of each sequence to account for its recency. In more detail, in order to penalize older sequences, we multiply each row of our dataset by an aging factor that guarantees that the items in the temporal window will still be represented, while older items slowly decay but never quite disappear.
In Fig. 4.2 we show an example of a Fitbit dataset D0 and the dataset D2 built to consider a temporal window of length 3; in D1 the frequent sequences “HA:3, LA:3” and “HA:3, LA:2” appear 2 times each. The sequence “HA:3, LA:3” appears more often in recent times, compared to “HA:3, LA:2”, which appears more at the beginning of the dataset. By introducing the aging factor before computing the support of each item we penalize the older sequence, and thus the older behaviors, while more recent sequences will have a higher support and therefore will be prioritized. In the literature there are different ways to compute the decayed weight of a tuple based on its age, measured backward from the current time. Exponential decay has been used for many applications since it can be easily computed [11], but in our scenario it led to worse accuracy results as we show in Sect. 4.4.
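The effect of aging on the support can be illustrated with a sketch. The aging function is not specified in this section, so the one below is purely hypothetical: rows inside the temporal window keep weight 1, and older rows decay hyperbolically, so their weight shrinks slowly but never reaches zero. Normalising by the total weight is also our assumption.

```python
from typing import FrozenSet, List


def age_weight(age: int, window: int, rate: float = 0.1) -> float:
    """Hypothetical aging factor: 1 inside the window, slow decay beyond it."""
    if age < window:
        return 1.0
    return 1.0 / (1.0 + rate * (age - window + 1))


def aged_support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]],
                 window: int, rate: float = 0.1) -> float:
    """Weighted fraction of rows containing `itemset`; the last row is the newest."""
    n = len(transactions)
    weights = [age_weight(n - 1 - i, window, rate) for i in range(n)]
    hit = sum(w for w, t in zip(weights, transactions) if itemset <= t)
    return hit / sum(weights)


# Hypothetical log: the first pattern occurs twice early on, the second twice recently
old = frozenset({"HA:3", "LA:2"})
recent = frozenset({"HA:3", "LA:3"})
rows = [old, old, frozenset({"R:1"}), frozenset({"R:1"}), recent, recent]

print(aged_support(old, rows, window=3), aged_support(recent, rows, window=3))
```

Both patterns occur twice, so their plain supports are equal, but the aged support of the recent pattern is higher, which is the prioritisation described above.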
4.3 The Recommender System
In our use case scenario, the transactional dataset D is a log of physical activity levels and sleep quality extracted from Fitbit, where for each day $t_i$ the user’s activities, such as heavy physical activities (HA), light physical activities (LA), steps (ST), rest periods (R) and sleeping score (SL), are stored together with their intensities. Each day is enriched with the value WD, if $t_i$ falls on the weekend, and with the average weather conditions (S = sunny, C = cloudy, R = rainy), if the location of the user during $t_i$ is available/declared. Weather conditions are extracted through the API of OpenWeather [15]. For each user, the time interval of each (daily) activity has been discretized into 3 possible uniform values (1: Low, 2: Medium, 3: Intense). Each day is an itemset, as the activities themselves are not ordered due to the aggregated nature of Fitbit data. In this dataset, the timestamp i is related to an interval including the fitness activities and the subsequent sleeping period. For simplicity we will call this time unit a day. After the frequent itemsets have been mined, non-contextual sequential rules are generated; in our scenario they are of the form:
\[ r_i : I^f_{-(\tau_w - 1)} \wedge \dots \wedge I^f_{-2} \wedge I^f_{-1} \rightarrow I^s_0 \quad [s_i, c_i] \]
This shows, with support $s_i$ and confidence $c_i$, the correlation between the sleep quality (i.e., our target function) for the current day 0 (see the itemset $I^s_0$ in the consequent, related to the sleeping activity) and the fitness activities performed over at most $\tau_w$ days, where $\tau_w$ is the considered temporal window. We remind the reader that the sequence of itemsets in the antecedent does not need to be complete. For example, a mined rule stating that after a day with medium heavy activity (HA:2) and a long stretch of rest (R:3), and a subsequent day with low light activity (LA:1), the predicted sleeping quality for the current day will likely be medium (SL:2), has the following form:
\[ \{\text{HA:2}, \text{R:3}\}_{-2} \wedge \{\text{LA:1}\}_{-1} \rightarrow \{\text{SL:2}\}_{0} \; [s_r, c_r] \]

When the Fitbit log is enriched with contextual information, ICARE mines sequential rules of the form:

\[ r_i : CI^f_{-(\tau_w - 1)} \wedge \dots \wedge CI^f_{-2} \wedge CI^f_{-1} \rightarrow I^s_0 \; [s_i, c_i] \]

where each itemset in the antecedent of the rule contains, besides the information about frequent fitness activities, also the related contextual conditions, if available. For example, a mined rule stating that after a cloudy day with medium heavy activity (HA:2) and a long stretch of rest (R:3) and the subsequent rainy day, during a weekend, with low light activity (LA:1), the predicted sleeping quality for the current day will likely be medium (SL:2), has the following form:

\[ \{\text{C}, \text{HA:2}, \text{R:3}\}_{-2} \wedge \{\text{R}, \text{WD}, \text{LA:1}\}_{-1} \rightarrow \{\text{SL:2}\}_{0} \; [s_r, c_r] \]

To discover the best rule $r$ for predicting the answer to the query “How well will I sleep tonight?”, we need to match the mined rules to a portion of the user Fitbit log

\[ L = \langle I_{-(\tau_w - 1)}, \dots, I_{-2}, I_{-1} \rangle \]

related to the established temporal window. This partial log will hereafter be called the query. Note that if the user constantly wears the smartwatch, the query will contain information for each day, always enriched with contextual information related to the day of the week (WD or not), and sometimes with the weather conditions, when available. For example, during the previous 3 days the user log/query may be:

\[ L = \{\text{LA:1}, \text{ST:3}, \text{SL:2}\}_{-3}, \{\text{HA:3}, \text{ST:2}, \text{SL:3}\}_{-2}, \{\text{HA:3}, \text{LA:3}, \text{SL:3}\}_{-1} \]
Note that L refers to weekdays (the item WD related to the weekend is not present) and the weather information is not available (i.e., the user did not declare his/her current location). The algorithm will then need to sift through all the mined rules to find one that matches the query in the antecedent and the goal in the consequent.
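One way to hold rules and queries in memory is sketched below. The `Rule` class, its field names, and the `query_from_log` helper are hypothetical and only illustrate the structure described above. The support and confidence values are made up, since the text gives them only symbolically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A mined sequential rule (field names are illustrative)."""
    antecedent: tuple   # itemsets for days -(tau_w - 1) .. -1, oldest first
    consequent: str     # sleep item predicted for day 0, e.g. "SL:2"
    support: float
    confidence: float

def query_from_log(log, temporal_window):
    """Return the itemsets of the last (temporal_window - 1) days, oldest first."""
    past_days = temporal_window - 1
    if past_days <= 0:
        return ()
    return tuple(frozenset(day) for day in log[-past_days:])

# The contextual rule from the text; support and confidence are made-up values.
rule = Rule(
    antecedent=(frozenset({"C", "HA:2", "R:3"}), frozenset({"R", "WD", "LA:1"})),
    consequent="SL:2",
    support=0.05,
    confidence=0.85,
)

# The three-day example log from the text (weekdays, no weather information).
log = [
    {"LA:1", "ST:3", "SL:2"},
    {"HA:3", "ST:2", "SL:3"},
    {"HA:3", "LA:3", "SL:3"},
]
query = query_from_log(log, temporal_window=4)
print(len(query), rule.consequent)
```

Frozen sets keep the items of a day unordered, matching the remark that activities within a day are not ordered, while the tuple preserves the order of the days.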
4.3.1 Rule Ordering

The set of rules is ordered using the following criteria: (1) confidence, (2) completeness, (3) support, and (4) size. Ordering by confidence and support is straightforward, as they are simple float values. A rule with higher support is prioritized over a rule with lower support, and the same goes for confidence. On the other hand, ordering by completeness means that the rules that have at least one type of activity per timestamp in the considered temporal window will be prioritized over those that
lack information in specific days. Let us define the subset of empty itemsets in a rule $r$ as $I^{r}_{\emptyset} = \{I_i \in r \mid I_i = \emptyset\}$. We can define the completeness order between two rules $r_1, r_2$ as:

\[ r_1 >_c r_2 \rightarrow |I^{r_1}_{\emptyset}| < |I^{r_2}_{\emptyset}| \]
\[ r_1 =_c r_2 \rightarrow |I^{r_1}_{\emptyset}| = |I^{r_2}_{\emptyset}| \]

If two rules $r_1, r_2$ have the same support, the same confidence, and $r_1 =_c r_2$, then the rule with more itemsets will be prioritized:

\[ r_1 >_c r_2 : |r_1| > |r_2| \]
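The four criteria can be combined into a single sort key, as in the sketch below. The dictionary layout and the example rules are invented for illustration, and taking the size of a rule to be the number of itemsets in its antecedent is an assumption.

```python
def ordering_key(rule):
    """Sort key for: confidence, completeness, support, size (in that order)."""
    empty_itemsets = sum(1 for itemset in rule["antecedent"] if not itemset)
    return (
        -rule["confidence"],        # (1) higher confidence first
        empty_itemsets,             # (2) fewer empty itemsets first
        -rule["support"],           # (3) higher support first
        -len(rule["antecedent"]),   # (4) more itemsets first (assumed size)
    )

# Invented rules; an empty set marks a day with no information.
rules = [
    {"name": "a", "antecedent": [{"HA:2"}, set(), {"LA:1"}],
     "support": 0.10, "confidence": 0.90},
    {"name": "b", "antecedent": [{"HA:2"}, {"R:3"}, {"LA:1"}],
     "support": 0.05, "confidence": 0.90},
    {"name": "c", "antecedent": [{"LA:1"}],
     "support": 0.04, "confidence": 0.95},
    {"name": "d", "antecedent": [{"HA:3"}, {"R:1"}, {"LA:2"}],
     "support": 0.08, "confidence": 0.90},
]

ordered = sorted(rules, key=ordering_key)
print([r["name"] for r in ordered])
```

Rule `c` comes first because of its confidence. Among the rules with confidence 0.90, `d` and `b` precede `a` because they have no empty itemset, and `d` precedes `b` because of its higher support.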
4.3.2 Rule Search

The recommender system is designed to be goal-oriented, so the rules are first filtered (according to their consequent) into two different sets $R^+(r)$ and $R^-(r)$, that will contain the recommendations on the fitness activities that may lead to better sleep quality and to worse sleep quality, respectively. The two sets of rules are then searched separately to produce a positive recommendation and its negative counterpart.

Before defining how rules are matched with queries, we need to define the similarity of itemsets and items. Every item is either a contextual item or a physical activity represented by its intensity and duration, thus two items are considered similar if they have the same type, e.g., LA:3 is similar to LA:2 but not to MA:3. Note that a contextual value can match only with itself.

Given an ordered set of rules, obtained through the above criteria, the algorithm needs to find the best rule that matches the query. The query is a list of activities and contextual information, if any, in sequential time slots, and it is matched to the antecedent of a rule through the following criteria:

(1) EXACT MATCH → the query is exactly the antecedent of the rule:
Query: $\{\text{LA:3}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{R:3}\}_{-1}$
Match: $\{\text{LA:3}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{R:3}\}_{-1}$

(2) MATCH → all of the items in the matching rule appear in the right time slot in the query:
Query: $\{\text{MA:2}, \text{LA:3}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{WD}, \text{LA:1}, \text{R:3}\}_{-1}$
Match: $\{\text{LA:3}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{WD}, \text{R:3}\}_{-1}$
(3) PARTIAL MATCH → some of the items in the query appear in the right time slot in the matching rule, while others have “similar” counterparts in the right time slot:
Query: $\{\text{R:2}, \text{LA:2}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{L:2}, \text{R:2}\}_{-1}$
Match: $\{\text{LA:3}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{R:3}\}_{-1}$

(4) SIMILAR MATCH → every time slot in the query contains one item that is “similar” to one item in the corresponding time slot of the matching rule:
Query: $\{\text{R:3}, \text{LA:2}\}_{-3} \wedge \{\text{MA:3}\}_{-2} \wedge \{\text{L:2}, \text{R:1}\}_{-1}$
Match: $\{\text{LA:3}\}_{-3} \wedge \{\text{MA:2}\}_{-2} \wedge \{\text{R:3}\}_{-1}$

The search algorithm then iterates over the ordered rules and returns the best match (or no match at all) according to the criteria above. If an exact match is encountered, the algorithm immediately returns the corresponding rule $r$, as it is the best possible outcome. Otherwise, the iteration proceeds until the end and collects the other types of matching rules in their respective lists. At the end, the best rule $r$ is the first rule in the first non-empty list in the order established above and is of the form:

\[ r : I^f_{-(\tau_w - 1)} \wedge \dots \wedge I^f_{-2} \wedge I^f_{-1} \rightarrow I^s_0 \]

If all lists are empty, the algorithm returns NULL. This process is executed for both $R^+(r)$ and $R^-(r)$, thus returning the best possible positive recommendation and the best negative recommendation. The two sets are of the form:

\[ R^+(r) = \{ I^f_{-(\tau_w - 1)} \wedge \dots \wedge I^f_{-2} \wedge I^f_{-1} \wedge I^f_0 \rightarrow I^{*s}_0 \ \text{with} \ I^{*s}_0 > I^s_0 \} \]

\[ R^-(r) = \{ I^f_{-(\tau_w - 1)} \wedge \dots \wedge I^f_{-2} \wedge I^f_{-1} \wedge I^f_0 \rightarrow I^{*s}_0 \ \text{with} \ I^{*s}_0 < I^s_0 \} \]

The set $R^+$ is composed of the rules with the same past activities as $r$, but with a suggestion of fitness activities for the current day (i.e., $I^f_0$) and with better sleeping quality in the consequent. On the contrary, the rules in $R^-$ are those with the same past activities as $r$, but with a suggestion of fitness activities for the current day (i.e., $I^f_0$) that may lead to worse sleeping quality in the consequent. The order relation > depends on the function we want to optimize. Note that the antecedent of a sequential rule is the explanation of the current suggestion, i.e., it is the recent behaviour of the user that leads to a certain sleep quality.

It is important to highlight that, when using sequential rules to provide recommendations, the sequential relationship between the itemsets matters. Indeed, the user performs activities on particular days and under precise contextual conditions, thus the order is important. This is the reason why we have developed our approach without starting from well-known algorithms, like GSP [14].
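The matching criteria and the search loop can be sketched as follows. This is one possible reading of the four criteria, not the authors' implementation: the function names, the rule representation, and the decision to classify a rule by checking each of its items against the corresponding query slot are assumptions.

```python
def item_type(item):
    """'LA:3' -> 'LA'; contextual items such as 'WD' carry no intensity."""
    return item.split(":")[0] if ":" in item else None

def similar(a, b):
    """Same activity type, any intensity; contextual items match only themselves."""
    type_a = item_type(a)
    return type_a is not None and type_a == item_type(b)

def classify(query, antecedent):
    """Return 'exact', 'match', 'partial', 'similar' or None."""
    if len(query) != len(antecedent):
        return None
    if all(q == a for q, a in zip(query, antecedent)):
        return "exact"
    exact = approx = 0
    for q_slot, r_slot in zip(query, antecedent):
        for item in r_slot:
            if item in q_slot:
                exact += 1
            elif any(similar(item, q) for q in q_slot):
                approx += 1
            else:
                return None          # a rule item has no counterpart at all
    if exact + approx == 0:
        return None                  # antecedent carries no items
    if approx == 0:
        return "match"
    return "partial" if exact > 0 else "similar"

LEVELS = ("match", "partial", "similar")

def search(query, ordered_rules):
    """Return the best rule for `query`, or None if nothing matches."""
    found = {level: [] for level in LEVELS}
    for rule in ordered_rules:
        kind = classify(query, rule["antecedent"])
        if kind == "exact":
            return rule              # best possible outcome, stop here
        if kind is not None:
            found[kind].append(rule)
    for level in LEVELS:
        if found[level]:
            return found[level][0]   # first rule of the first non-empty list
    return None

# The example antecedent used in the text for criteria (1), (3) and (4).
antecedent = [{"LA:3"}, {"MA:2"}, {"R:3"}]
partial_query = [{"R:2", "LA:2"}, {"MA:2"}, {"L:2", "R:2"}]
similar_query = [{"R:3", "LA:2"}, {"MA:3"}, {"L:2", "R:1"}]
print(classify(partial_query, antecedent), classify(similar_query, antecedent))
```

Because the rules are visited in their sorted order, the first rule stored in each list is also the best-ranked rule of that match type.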
4.4 Experimental Results

To validate ALBA, we show how the aging mechanism allows the algorithm to extract better rules than the original LookBackApriori [16] on data taken from the PMData dataset [23] and on a custom dataset that has been collected specifically for this paper.
4.4.1 Datasets

The first dataset on which we evaluate our approach is PMData [23], which contains Fitbit logs from 16 users (p01–p16), 13 male and 3 female, with ages ranging from 23 to 60. The data was collected over a span of around 149 days between November 2019 and March 2020 and contains detailed information about the users’ eating and drinking habits, weight, and demographics, on top of all the data collected by Fitbit.

The second dataset was collected between August 2021 and September 2022 specifically for this paper. The participants of this study are two male and two female Fitbit users with ages ranging from 16 to 55.

For this study we focus primarily on the sleep data and on the daily physical activity of the users, which reports, for each day, the number of minutes that the user has spent in a specific activity level (be it light, moderate, heavy activity or rest). Two of the 16 original users of PMData have been removed from the experiments: user p12 was missing a file fundamental for our experiments and user p13 only had 47 days of recorded activity out of the 149 days of the study.
4.4.2 Discretizations

Since the algorithm is based on Apriori, it requires discrete items, so non-categorical values must be discretized before frequent itemsets can be mined. For this reason, the first step of the experiments is data discretization, which transforms the numerical data (minutes spent in an activity level) into labels that can then be used as input to ALBA. Every activity duration is grouped into 3 possible buckets, according to thresholds that distribute the data evenly into the buckets. For example, the first bucket for the light activity of the user p10 in the dataset PMData holds all the values in the interval (66.999, 249.333], while the second and third buckets are delimited by the intervals (249.333, 293.333] and (293.333, 475.0]. Looking at Fig. 4.3 it is easy to see why the middle bucket represents the shortest range, as the data points tend to concentrate in that region. Similarly, the smallest bucket for the heavy activity of the same user is given by the interval (−0.1, 12.0], as it represents the biggest cluster of data points (see Fig. 4.4).
Fig. 4.3 Distribution of light activity
Fig. 4.4 Distribution of heavy activity
Every data point is then encoded with the type of activity it represents and the bucket in which it falls. For example, on November 8th, 2019, user p10 spent 313 min doing light activity. Looking at the above buckets we can encode this data point as LA:3, meaning a high level of light activity. This encoding is done for each user separately, accounting for the wildly different distributions of values in the dataset.
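The encoding step can be sketched in a few lines. The helper names are hypothetical, and computing the thresholds with `statistics.quantiles` is an assumption, since the chapter does not say which tool produced them. The two cut points for user p10 are the ones reported in the text.

```python
from bisect import bisect_left
from statistics import quantiles

def equal_frequency_cuts(values, buckets=3):
    """Cut points that place roughly the same number of values in each bucket."""
    return quantiles(values, n=buckets, method="inclusive")

def encode(activity, minutes, cuts):
    """Label a duration with its activity type and 1-based bucket.

    Buckets are right-closed, so a value equal to a cut point stays in the
    lower bucket, as in the interval (66.999, 249.333].
    """
    return f"{activity}:{bisect_left(cuts, minutes) + 1}"

# Thresholds reported in the text for the light activity of user p10.
p10_light_cuts = [249.333, 293.333]
print(encode("LA", 313, p10_light_cuts))   # LA:3

# Thresholds are computed per user, here on a made-up list of durations.
toy_cuts = equal_frequency_cuts([10, 20, 30, 40, 50, 60, 70])
print(toy_cuts)
```

Computing the cut points separately for every user and every activity type is what makes LA:3 mean "high for this user" rather than a fixed number of minutes.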
Since every user has different habits when it comes to physical activity, a “high” level of light physical activity (LA:3) differs from one user to another. For completeness, we have run many experiments with different discretization methods and parameters, and we have found that the best way to encode the data is in 3 equally-distributed buckets.

Different number of buckets. Increasing the number of buckets decreases the accuracy of the results considerably, as the relevance of the mined rules depends on the frequency of the sequences, which decreases as the number of buckets increases. Decreasing the number of buckets is not feasible, as 1 bucket would lose all information and 2 buckets, while technically performing well, are not informative enough. Subdividing the data into 3 buckets also allows an easy semantic mapping between bucket 1 and “low levels” of a specific activity, bucket 2 and “average levels”, and bucket 3 and “high levels”. For these reasons we find that 3 buckets is the best configuration for the experiments.

Different discretizations. Using buckets that are not equally distributed also causes problems in the rule mining process. The advantage of this technique is that it preserves the original distribution of the data, but at the same time it inflates the number of data points that are encoded with the label representing the highest concentration. For example, using equi-spaced buckets on the light activity shown in Fig. 4.3 we would have every data point between 180 and 360 encoded as LA:2, meaning that around 90% of the data points would have one label, while the remaining 10% would be split between the two remaining buckets. This causes an obvious problem in a data mining algorithm that favors frequent itemsets, as the results would always focus on only one label and thus not be informative at all.
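The label imbalance produced by equi-spaced buckets can be reproduced on synthetic data. The durations below are invented to mimic a distribution concentrated in the middle of its range. They are not taken from PMData, and the helper names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

def equal_width_cuts(values, buckets=3):
    """Cut points that split the value range into intervals of equal length."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / buckets
    return [lo + step * k for k in range(1, buckets)]

def label_shares(values, cuts):
    """Fraction of values that falls into each 1-based bucket."""
    counts = Counter(bisect_left(cuts, v) + 1 for v in values)
    return {label: counts[label] / len(values) for label in sorted(counts)}

# Synthetic minutes of light activity, concentrated between 200 and 300.
minutes = [70, 100] + list(range(200, 300, 2)) + [400, 470]

width_shares = label_shares(minutes, equal_width_cuts(minutes))
freq_shares = label_shares(minutes, quantiles(minutes, n=3, method="inclusive"))
print(width_shares)
print(freq_shares)
```

On this toy sample the equi-spaced buckets assign almost nine values out of ten to the middle label, while the equally-distributed buckets give each label one third of the data.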
4.4.3 Methodology

In order to verify the validity of our approach we have extensively tested the rules mined with ALBA and compared them with the ones extracted by LookBackApriori. Since the idea of our approach is to leverage historical Fitbit data to provide suggestions that will improve the sleep quality of the users, we check how accurately the mined rules can predict the sleep quality.

Train-test split. As a first step, we reserve the first 80% of the logs of each user as a training set to generate the rules and then use the remaining 20% as a test set. For these experiments, the construction of the training and test sets cannot be randomized, as the temporal information is vital to our approach.

Training. We then run ALBA and the original LookBackApriori on the training set and generate a set of rules for each user in the dataset with no specified goal (the consequent can contain any sleep score value). The experiments are run on a Windows laptop with an 11th generation i7 and 32 GB of RAM, which were often put to the test due to the exponential nature of Apriori. For example, a configuration
with temporal window set to 5 achieved an average accuracy 4 percentage points higher than the one with temporal window set to 4, but the latter returned its results in around 11 s per user, while the former took 4 min. In Fig. 4.5 we show the execution times of our algorithm and the number of rules generated for 4 randomly-selected users in PMData at different support values. The small improvement in accuracy was deemed not worth the computational cost, thus we selected the following parameters, which achieve the best average accuracy among all users without causing unnecessary slowdown: (1) temporal_window = 4, (2) min_support = 0.04, (3) min_confidence = 0.8. The rules generated at this step contain physical activity labels in the antecedent and only the sleep quality score in the consequent.

The test set is then split with a sliding window into sequences of days of length 4 (same as the temporal window), where the first 3 days only contain physical activity data and the fourth day also contains the actual sleep quality, mirroring the structure of the mined rules. The physical activity data from the 4 days is then used as a query to find the most relevant rule among the ones generated by the training phase. The criteria discussed in Sect. 4.3 are used to find the best rule, which will have the predicted sleep quality in the consequent. As a last step, we compare the predicted sleep quality of the rule extracted from the training set with the actual sleep quality of the test set and save the result. After all sequences of the test set have been checked against the rules extracted by ALBA, we tally the correct ones and calculate the accuracy for every user.

Fig. 4.5 Number of rules generated and running time for ALBA and LookBackApriori with different support values
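The evaluation protocol can be outlined as below. The day layout (a dictionary with `activities` and `sleep` keys) and the `predict` callable standing in for the rule search are hypothetical; only the chronological 80/20 split, the window length of 4, and the tally of correct predictions come from the text.

```python
def chronological_split(days, train_fraction=0.8):
    """Keep the first part of the log for training and the rest for testing."""
    cut = int(len(days) * train_fraction)
    return days[:cut], days[cut:]

def sliding_windows(days, length=4):
    """All runs of `length` consecutive days, in order."""
    return [days[i:i + length] for i in range(len(days) - length + 1)]

def accuracy(test_days, predict, length=4):
    """Share of windows whose predicted sleep label equals the recorded one.

    `predict` is a hypothetical callable that maps the activity itemsets of a
    window to a sleep label such as 'SL:2', or to None when no rule matches.
    """
    windows = sliding_windows(test_days, length)
    if not windows:
        return 0.0
    correct = 0
    for window in windows:
        activities = [day["activities"] for day in window]
        actual = window[-1]["sleep"]   # sleep quality of the last day
        if predict(activities) == actual:
            correct += 1
    return correct / len(windows)

# Made-up test log of five days, giving two windows of length 4.
test_days = [
    {"activities": {"LA:1"}, "sleep": "SL:1"},
    {"activities": {"HA:3"}, "sleep": "SL:3"},
    {"activities": {"LA:2"}, "sleep": "SL:2"},
    {"activities": {"HA:2"}, "sleep": "SL:2"},
    {"activities": {"HA:3"}, "sleep": "SL:3"},
]
score = accuracy(test_days, lambda activities: "SL:2")
print(score)
```

Slicing instead of shuffling keeps the temporal order intact, which the text identifies as essential for this approach.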
4.4.4 Results

The results of our experiments on PMData can be observed in Fig. 4.6. ALBA outperforms LookBackApriori for 12 of the 14 users (86% of the dataset), while LookBackApriori comes out on top for only one of the users. The improvement of ALBA over LookBackApriori can also be observed by looking at the average accuracy achieved by the two algorithms, depicted in Fig. 4.6 with a dashed line.
Fig. 4.6 Accuracy of LBA and ALBA on PMdata users
The experiments on our custom dataset confirm that ALBA is the better algorithm, as it either matches LookBackApriori exactly (on u04) or outperforms it. The results can be seen in Fig. 4.7. We ran many experiments while collecting the data for the custom dataset, and gathered some interesting insights. For example, running ALBA at the end of the summer on the activity log of our 16-year-old user revealed that the best sleep is achieved after 3 days of HA:3, so the best positive suggestion after 2 days of HA:3 is another HA:3.
Fig. 4.7 Accuracy of LBA and ALBA on custom users
Fig. 4.8 Accuracy of linear aging and exponential aging on PMdata users
The opposite was true in the experiments run during the school year, as 3 straight days of HA:3 led to very bad sleep, making HA:3 the best negative suggestion after 2 days of HA:3 for that period. Running LookBackApriori on the same log during the two periods produced the same rule (3 days of HA:3 leads to good sleep) for both, since the sequences mined during the school months were not penalized. This highlights how the aging mechanism can help our algorithm discover behaviours that might be dependent on different temporal contexts. To confirm that our choice of aging mechanism was correct we ran the experiments with the usual parameters and an exponential aging function. The results depicted in Fig. 4.8 show that indeed our aging mechanism works better than the exponential one on average, but also that the algorithm with exponential aging works better than with no aging at all.
4.5 ICARE App

We implemented an Android app with a Python backend which recommends activities to perform during the current day, whether a weekday or a weekend day, in order to increase the sleeping quality w.r.t. the predicted value, as well as the activities to avoid. The app is written in Ionic, an open-source framework used to create hybrid mobile apps. Its main features resemble those of many existing health-related apps, collecting data from the user such as their weight, their sleep quality and activity levels. The latter are obtained directly from a Fitbit device worn by the user and are used mainly in the sleep quality prediction and the activity recommendation phases, as shown in
Figs. 4.9 and 4.10. The main view and the sleep quality prediction page, together with its explanation, can be seen in Fig. 4.9, while positive and negative recommendations are shown in Fig. 4.10.

The ICARE app provides a personalized prediction by giving a sleep quality evaluation for the following night, with a visual representation of the related confidence, describing the expected likelihood that the prediction will turn out to be true. Moreover, we provide the user with an explanation highlighting why that prediction was proposed. In particular, the physical activities performed in the previous days, and considered as the reason for the prediction, are summarized. This way users are given an insight into the motivations that led to a prediction, and thus they can better understand how activities and the desired goal are intertwined.

Starting from the description of the present situation and the related prediction, ICARE provides users with personalized suggestions that differ from user to user. The app shows the goal to reach together with positive and negative recommendations describing activities to perform and to avoid, respectively, in order to achieve the given goal. Proposed suggestions change over time and are strictly related to the user context.

Fig. 4.9 ICARE prediction and explanation
Fig. 4.10 ICARE positive and negative recommendations
4.6 Related Work

In this section, we first explore some proposals that have applied data mining techniques to sensor data to gather new insights useful for providing recommendations, together with the main proposals on temporal association rules. Then, we introduce possible ways to use Recommendation Systems in the healthcare scenario.

Existing fitness apps typically provide the user with a lot of information that is incomplete, not very relevant, and difficult to interpret. In [5], the authors try to overcome these limitations with a middleware app that proposes personalized lifestyle improvement recommendations by leveraging community data. In our proposal we collect data from the wearable devices of each user individually and analyze them in order to provide personal suggestions on (i) the future actions required and (ii) those to avoid in order to reach a given goal. We do not compare user data to community data, thus our recommendations are personalized with regard to user activities and goals, since we deem that personalized suggestions do not depend only on features such as sex, age, and lifestyle. Indeed, we have performed extensive experimental sessions
and verified that each user has different habits, which can be correlated to various contextual data as well (e.g., free time, period of the year, weather conditions).

The Apriori algorithm [2] has been successfully used to infer which types of activities are associated with different levels of loneliness [13]. In particular, passive data collected from both smartphones and Fitbit devices is analyzed to capture activity and mobility levels, communications and general phone usage, along with sleep behaviors. Various analytical methods have been developed to predict sleep quality: deep learning models have been successfully used to link daily activity rates with sleep quality [21], although some studies have reported that the correlation between physical activity and sleep quality is very weak, and that they might be more independent than originally thought [5]. In general, our proposal does not aim to establish this correlation from a medical point of view: we consider the sleep score as a target function that could be changed in the future on the basis of the available collected data.

Data Mining algorithms considering the temporal dimension have been proposed in the literature and mainly infer sequences of events. In [22], the authors deal with the problem of mining sequential patterns introduced in [3] and generalize it by (i) adding time constraints for specifying a minimum and/or a maximum time period between adjacent elements in a pattern or the antecedent; (ii) relaxing the restriction related to the source of items, thus allowing the items to come from different transactions; (iii) introducing a user-defined taxonomy on items for enabling sequential patterns to include items across levels of the taxonomy. They present GSP (Generalized Sequential Patterns), an algorithm for discovering sequential patterns in transactional datasets.
In [14] the authors propose an algorithm that searches for periodic occurrences of events of interest that regularly occur prior to other events of interest, by setting temporal windows and the time lag between the antecedent and the consequent of a rule. The introduction of time constraints restricts the time gap between sets of transactions containing consecutive elements of the sequence. In our proposal, the minimum time gap is set by the granularity of the data, which are aggregated daily in Fitbit. The maximum time gap is flexible, because it is set by the temporal window and can consequently be enlarged. However, we need to maintain the relative (w.r.t. the current day) temporal information of each itemset, since it is used to provide recommendations: the antecedent of the mined rules has to match the real log storing what a user has done daily in the past few days.

Starting from [22], a lot of research work has been done to improve the efficiency of association rule mining algorithms and to extend the definition of association rules to other types of data, e.g., time sequence data [28]. In our case we are also interested in the relative temporal information of each itemset w.r.t. the current time unit, and the only parameter we can decide to set is the temporal window. In addition, we deem it important to differentiate recently generated sensor data from obsolete information, which may no longer be useful in some application scenarios, or may be invalid at present because some contextual conditions have changed. In the same way, other algorithms like SPADE [29] and PrefixSpan [17] extract sequential rules efficiently, but they do not attempt to store the temporal distance among event sequences. The temporal dimension of data has been considered in [19] where
the authors propose an algorithm for discovering how association rules vary over time and introduce calendric association rules to describe complex correlations. The temporal dimension has been adopted in [4] to introduce the notion of temporal support and reduce the number of mined rules; for this purpose, a time interval is introduced as an obsolescence factor: the frequent itemsets are only those included in that interval.

The proposal whose aim is most closely aligned with ours is [32]: the authors propose a commercial recommender system that leverages a temporal rule mining algorithm to extract sequential patterns containing not only the recommended items but also the time gaps among them; with this additional information they can provide suggestions at the right time. As an example, they discover frequent patterns describing that a given product (e.g., pizza) has been bought, and after two weeks another product (e.g., sushi) is bought. The main difference with our approach is that the authors focus on the time interval between different purchasing transactions, with a time notion that is relative w.r.t. the first purchase, to detect frequent purchasing patterns and their time gap. In our case, we want to infer how the history of activities influences the value/intensity of an item that belongs to the current transaction. Translating our scenario into the same commercial recommendation problem, we can think of a user that buys beer and pizza for six consecutive days. Our approach can focus on learning that on the seventh day the user will eventually get tired of alcohol and buy non-alcoholic beer. Beer is still in the last transaction, but the type of beer has been influenced by the history of purchasing activities.

To the best of our knowledge, the only work that considers an aging function, through a decay unit, for mining recent frequent itemsets is [8].
However, the authors consider data streams, so they need to compute on-line supports, and they do not consider the possibility of mining sequences of frequent and recent patterns.

Recommendations provided by Recommendation Systems must be relevant for users and are usually based on previous user decisions. In the healthcare scenario, a Health Recommendation System (HRS) makes recommendations available in different contexts, such as medical treatment suggestions, nutrition plans, or physical activities to perform in order to reach and follow a healthy lifestyle [24]. In [12], the authors provide a systematic review of Health Recommender Systems by classifying the recommended items into four main categories: lifestyle, nutrition, general health care information and specific health conditions.

Modern lifestyles can increase health risks, often due to poor decisions. This motivates the need for small guided lifestyle changes such as eating healthily, increasing physical activity, and sleeping well. In the modern era people can be supported in improving their lifestyle by means of wearable devices and smartphones. Mobile health feedback systems are personalized systems providing health recommendations based on physical activities and dietary information acquired by wearable devices. Suggested actions are similar to existing behaviors, and thus are familiar to people [18]. In this context, mobile phones can be used to collect feedback that can be used to improve and better personalize future recommendations. Our proposal provides recommendations for improving the sleep quality of users wearing a Fitbit smartwatch, but it can also be applied in other similar contexts
related to wellness to provide suggestions related to nutrition plans, exercises, and mindfulness.

The concept of explainability is considered in [10]. In the context of Artificial Intelligence (AI) in Medicine, explainable AI (XAI) is important for clinicians and patients. Clinicians and medical staff deal with patient safety, and patients, having limited background knowledge, need to understand the system conclusions. In our proposal, we try to provide the user with an explanation answering the question: “Why does the recommender system arrive at a decision about a suggestion?”. Recommendation Systems are often based on prediction procedures that are black boxes and therefore struggle to earn the trust of clinicians and patients. Explainability can overcome this limit, since it can provide support in monitoring (drug) recommendation systems, allowing the explanation to be evaluated by clinicians in order to prevent false predictions and reduce drug misuse [27].

Explanations may be evaluated from different perspectives. In [9], the authors provide a systematic survey on the evaluation of explainable recommendation. Among the evaluation perspectives they considered, we focus on “effectiveness”, since we aim to propose an explanation that improves the utility of the recommendation in accordance with the user experience.
4.7 Conclusions

The chapter introduces an Intuitive Context-Aware Recommender with Explanations (ICARE), which is a framework for collecting and enriching wearable devices data to produce sequential rules. Sequential rules correlating sequences of past events with a specified future goal are discovered by analysing temporal data. Finally, ICARE provides explainable recommendations by means of an intuitive application. ICARE is a Health Recommendation System (HRS), since it provides recommendations aimed at improving the user’s sleeping quality based on the user’s historical frequent behaviors, combined with their relative context, while suggesting activities to undertake to achieve a better sleep quality and thus to reach a healthy lifestyle.

As for future work, we plan to extend the ICARE app in order to collect the user’s opinion on the received predictions and recommendations, and to infer not just the frequent activity patterns, but also the average duration of each activity correlated with additional external information (e.g., the daily schedule). Indeed, it could be useful to suggest the next best activities on the basis of the user’s free time, while factoring in the current time, the available remaining time interval, and the average duration of the activities to be recommended.

Acknowledgements We thank the editorial team and the reviewers for their expertise and time.
References

1. Adomavicius, G., Mobasher, B., Ricci, F., Tuzhilin, A.: Context-aware recommender systems. AI Mag. 32(3), 67–80 (2011). https://doi.org/10.1609/aimag.v32i3.2364
2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Bocca, J.B., Jarke, M., Zaniolo, C. (eds.) Proceedings of VLDB’94, pp. 487–499. Morgan Kaufmann (1994). http://www.vldb.org/conf/1994/P487.PDF
3. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Philip, S.Y., Chen A.L.P. (eds.) Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE Computer Society, Taipei, Taiwan (1995). https://doi.org/10.1109/ICDE.1995.380415
4. Ale, J.M., Rossi, G.H.: An approach to discovering temporal association rules. In: Proceedings of the 2000 ACM Symposium on Applied Computing, vol. 1, pp. 294–300 (2000)
5. Angelides, M.C., Wilson, L.A.C., Echeverría, P.L.B.: Wearable data analysis, visualisation and recommendations on the go using android middleware. Multimed. Tools Appl. 77(20), 26397–26448 (2018)
6. Balog, K., Radlinski, F., Arakelyan, S.: Transparent, scrutable and explainable user models for personalized recommendation. ACM SIGIR 2019, 265–274 (2019)
7. Bosoni, P., Meccariello, M., Calcaterra, V., Larizza, C., Sacchi, L., Bellazzi, R.: Deep learning applied to blood glucose prediction from flash glucose monitoring and fitbit data. In: Proceedings of AIME 2020, pp. 59–63. Springer (2020)
8. Chang, J.H., Lee, W.S.: Finding recently frequent itemsets adaptively over online transactional data streams. Inf. Syst. 31(8), 849–869 (2006). https://doi.org/10.1016/j.is.2005.04.001
9. Chen, X., Zhang, Y., Wen, J.-R.: Measuring “Why” in recommender systems: a comprehensive survey on the evaluation of explainable recommendation (2022). arXiv:2202.06466
10. Combi, C., Amico, B., Bellazzi, R., Holzinger, A., Moore, J.H., Zitnik, M., Holmes, J.H.: A manifesto on explainability for artificial intelligence in medicine. Artif. Intell. Med. 133, 102423 (2022). ISSN:0933-3657
11. Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: 2009 IEEE 25th International Conference on Data Engineering, pp. 138–149 (2009). https://doi.org/10.1109/ICDE.2009.65
12. De Croon, R., Van Houdt, L., Htun, N.N., Štiglic, G., Vanden Abeele, V., Verbert, K.: Health recommender systems: systematic review. J. Med. Internet Res. 23(6), e18035 (2021)
13. Doryab, A., Villalba, D.K., Chikersal, P., Dutcher, J.M., Tumminia, M., Liu, X., Cohen, S., Creswell, K., Mankoff, J., Creswell, J.D., et al.: Identifying behavioral phenotypes of loneliness and social isolation with passive sensing: statistical analysis, data mining and machine learning of smartphone and fitbit data. JMIR mHealth uHealth 7(7), e13209 (2019)
14. Harms, S.K., Deogun, J.S.: Sequential association rule mining with time lags. J. Intell. Inf. Syst. 22(1), 7–22 (2004). https://doi.org/10.1023/A:1025824629047
15. OpenWeather API. https://openweathermap.org/api
16. Marastoni, N., Oliboni, B., Quintarelli, E.: Explainable recommendations for wearable sensor data. In: The 24th International Conference on Big Data Analytics and Knowledge Discovery (DaWak 2022) (2022)
17. Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)
18. Rabbi, M., Hane Aung, M., Choudhury, T.: Towards health recommendation systems: an approach for providing automated personalized health feedback from mobile data. In: Rehg, J., Murphy, S., Kumar, S. (eds.) Mobile Health. Springer, Cham (2017)
19. Ramaswamy, S., Mahajan, S., Silberschatz, A.: On the discovery of interesting patterns in association rules. In: VLDB 98, pp. 368–379 (1998)
20. Salvi, E., Bosoni, P., Tibollo, V., Kruijver, L., Calcaterra, V., Sacchi, L., Bellazzi, R., Larizza, C.: Patient-generated health data integration and advanced analytics for diabetes management: the AID-GM platform. Sensors 20(1), 128 (2020)
86
B. Oliboni et al.
21. Sathyanarayana, A., Joty, S., Fernandez-Luque, L., Ofli, F., Srivastava, J., Elmagarmid, A., Arora, T., Taheri, S.: Sleep quality prediction from wearable data using deep learning. JMIR mHealth uHealth 4(4), e125 (2016) 22. Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) Advances in database technology—EDBT’96 (Lecture Notes in Computer Science, vol. 1057), pp. 3–17. Springer (1996). https://doi.org/10.1007/BFb0014140 23. Thambawita, V., Hicks, S.A., Borgli, H., Stensland, H.K., Jha, D., Svensen, M.K., Pettersen, S.A., Johansen, D., Johansen, H.D., Pettersen, S.D., et al.: Pmdata: a sports logging dataset. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 231–236 (2020) 24. Valdez, A.C., Ziefle, M., Verbert, K., Felfernig, A., Holzinger, A.: Recommender systems for health informatics: state-of-the-art and future perspectives. In: Lecture Notes in Computer Science, p. 9605 (2016) 25. Wang, C., Lizardo, O., Hachen, D.S.: Using fitbit data to examine factors that affect daily activity levels of college students. Plos One 16(1), e0244747 (2021) 26. Wendt, T., Knaup-Gregori, P., Winter, A.: Decision support in medicine: a survey of problems of user acceptance. In: Medical Infobahn for Europe, pp. 852–856. IOS Press (2000) 27. Xi, J., Wang, D., Yang, X., Zhang, W., Huang, Q.: Cancer omic data based explainable AI drug recommendation inference: a traceability perspective for explainability. Biomed. Signal Process. Control 79(Part 2), 104144 (2023). ISSN:1746-8094 28. Yu, P.S., Chi, Y.: Association Rule Mining on Streams, pp. 177–181. Springer New York, New York, NY (2018). https://doi.org/10.1007/978-1-4614-8265-9_25 29. Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1), 31–60 (2001) 30. 
Zhang, Y., Lai, G., Zhang, M., Zhang, Y., Liu, Y., Ma, S.: Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In: Geva, S., Trotman, A., Bruza, P., Clarke, C.L.A., Ja¨rvelin, K. (eds.) The 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’14, Gold Coast , QLD, p. 83–92. ACM (2014) 31. Zhang, C., Lyu, M., Gan, W., Yu, P.S.: Totally-ordered sequential rules for utility maximization (2022). 2209.13501 32. Zhou, H., Hirasawa, K.: Evolving temporal association rules in recommender system. Neural Comput. Appl. 31(7), 2605–2619 (2019). https://doi.org/10.1007/s00521-017-3217-z
Barbara Oliboni is an associate professor of Computer Science at the Department of Computer Science of the University of Verona (Italy). In 2003, she received a Ph.D. degree in Computer Engineering from the Politecnico of Milan (Italy). Her main research interests are in the database and information systems field, with an emphasis on clinical information management, business process management, semistructured data, and temporal information. She serves on the program committees of international conferences and reviews for international journals. Since 2016, she has been a member of the Board of the Artificial Intelligence in MEdicine Society (AIME).
Chapter 5
An Overview of Few-Shot Learning Methods in Analysis of Histopathological Images Joanna Szołomicka and Urszula Markowska-Kaczmar
Abstract Analysis of histopathological images allows doctors to diagnose diseases such as cancer, which causes nearly one in six deaths worldwide. Classification of such images is one of the most critical topics in biomedical computing. Deep learning models achieve high prediction quality but require a large amount of annotated data for training. The data must be labeled by domain experts, which is time-consuming and expensive. Few-shot methods classify data using only a few training samples; they are therefore an increasingly popular alternative to collecting a large dataset for supervised learning. This chapter surveys few-shot learning techniques for the classification of histopathological images of various types of cancer. The methods discussed are based on contrastive learning, meta-learning, and data augmentation. We collect and review publicly available datasets of histopathological images. We also outline future research directions for few-shot learning in the histopathology domain.
Keywords Few-shot learning · Histopathological image analysis · Contrastive learning · Meta-learning
J. Szołomicka · U. Markowska-Kaczmar (B)
Department of Artificial Intelligence, Wroclaw University of Science and Technology, Wyb. Wyspianskiego 27, Wroclaw 50-370, Poland
e-mail: [email protected]
J. Szołomicka e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_5
5.1 Introduction
Computer-assisted diagnosis systems are becoming increasingly useful tools for histopathologists thanks to the rapid development of machine learning methods and increased computational power [1]. Current deep learning (DL) models offer far more possibilities for computer vision analysis, and for application in the histopathology domain, than earlier approaches. The main obstacle to using deep learning models trained in a supervised way is the need for large training datasets. Annotation usually requires several experts, who must agree on the final label, which makes data annotation even more expensive. Sometimes experts are unavailable, which precludes training the model altogether. For some conditions, such as sporadic illnesses, only very few samples exist. The lack of annotated samples makes few-shot learning a desirable solution in medical applications because the technique extracts the features needed to discriminate between classes from just a few samples. This is possible because few-shot models learn how to differentiate classes, whereas supervised models learn the specific features that characterize a given class.
The chapter surveys existing few-shot learning solutions for the analysis of histopathological images. We review the few-shot methods applied and the augmentation methods useful in histopathological image analysis. We also collect information about public datasets of histopathological images, hoping that the knowledge gathered in this chapter will contribute to the further development of these methods and their wider use in the recognition of histopathological images.
The chapter consists of five sections. Section 5.2 presents existing approaches to few-shot learning and provides the background for understanding their use on histopathological images. Section 5.3 describes characteristics of histopathological images that should be considered when processing them. Section 5.4 reviews few-shot learning in histopathology: it surveys various methods and gives information about publicly available datasets. The conclusion summarizes our research.
5.2 Few-Shot Learning
This section presents the general concept of few-shot learning, which provides the background for understanding how few-shot learning can be applied in histopathology.
5.2.1 Problem definition
As mentioned above, supervised learning of DL models needs a large training dataset. An alternative is few-shot learning, where only a few samples are required to adapt a pretrained network to new categories (classes). Usually, the network is pretrained on a large dataset (for instance, ImageNet), which saves a lot of computational power. There are different approaches to few-shot learning, which we discuss later in this section. Here we focus on meta-learning, a name that comes from the idea that the methods learn how to learn.
In meta-learning, we have $N(T)$ training tasks $T_{train_i}$, and the model is evaluated on a set of testing tasks $T_{test_j}$. Each task has a corresponding dataset, $D_{train_i}$ or $D_{test_j}$ (Fig. 5.1). A dataset is divided into meta-training and meta-testing sets, each split into support and query sets. The support set consists of a few labeled samples per new data category. The query set consists of samples from the new and old classes, to which the model should generalize the knowledge gained from the support set. Each task in the N-way-K-shot scenario includes N classes with K examples each. The classes can differ from task to task; the classes from one task may never appear in the others.
Fig. 5.1 Tasks used in training and testing phases in the few-shot learning process
The literature distinguishes different types of few-shot learning depending on how many samples the model is trained on: few-shot learning, one-shot learning, and zero-shot learning. A "shot" is a single training sample. The example in Fig. 5.1 shows two-shot learning, and each task contains three different classes of images. Our goal is to find the function $f$ given in Eq. (5.1), parametrized by $\theta$.
$$y \approx f(x, \theta), \quad \text{where } (x, y) \in D_{test} \tag{5.1}$$
The parameters $\theta$ are found by the optimization procedure described by Eq. (5.2).
$$\theta = \arg\min_{\theta} \sum_{(x,y) \in D_{train}} L(f(x, \theta), y) \tag{5.2}$$
In Eq. (5.2), $L$ is a loss function that represents the error between the prediction $f(x, \theta)$ and the true label $y$.
During meta-testing, the model is trained on the meta-testing support set with a previously unknown task and tested on the meta-testing query set. As a result, the model learns how to learn a new task having only a few training samples.
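The construction of a single N-way-K-shot task can be illustrated with a minimal sketch. The function name `sample_episode`, the toy label list, and the choice to draw query samples from the same classes as the support set are assumptions made for illustration only; they are not taken from any of the surveyed papers.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one N-way-K-shot task from a list of class labels.

    Returns (support, query): lists of (sample_index, episode_label) pairs,
    where episode_label is in 0..n_way-1.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Only classes with enough samples for both sets can be drawn.
    eligible = sorted(c for c, idx in by_class.items() if len(idx) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        chosen = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in chosen[:k_shot]]
        query += [(i, episode_label) for i in chosen[k_shot:]]
    return support, query


# Toy dataset: 5 classes with 10 samples each (labels only; images omitted).
labels = [c for c in "ABCDE" for _ in range(10)]
support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=4, rng=random.Random(0))
print(len(support), len(query))  # 6 12
```

With `n_way=3` and `k_shot=2`, the sketch reproduces the three-class, two-shot setting of Fig. 5.1; repeated calls yield tasks with different class subsets.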
5.2.2 Methods
The taxonomy of few-shot learning methods differs slightly between survey papers [2–4]. In this chapter, which focuses on histopathological images, we cover all the techniques shown in the taxonomy in Fig. 5.2. All these methods rely on learning in a way that resembles human learning. We describe two branches: data augmentation-based methods and meta-learning, or learning to learn. The latter approach focuses on learning priors from previous experience that allow efficient learning of new tasks. Few-shot learning methods applied in histopathological image recognition mainly belong to meta-learning. This group comprises parameter-based (optimization-based), metric-based, and model-based approaches. Parameter-based methods optimize model parameters so that they can be tuned easily or perform well on small datasets. Metric-based algorithms compare image embeddings to measure their similarity. Model-based methods use models with external memory, a buffer to which neural network models can write and from which they can retrieve information. We were unable to find any histopathological studies using model-based methods. Other techniques can also aid learning in a limited-data regime; transfer learning and various autoencoders can be mentioned here [2]. Some attractive augmentation-based solutions, which synthesize missing data using augmentation methods or generative adversarial networks (GANs), are also discussed later.
Fig. 5.2 Taxonomy of the few-shot learning methods overviewed in this survey
Still, we will mainly refer to meta-learning methods. In what follows, we briefly present these methods to give the background for understanding their use in histopathological image analysis.
Optimization based methods Optimization-based meta-learning for few-shot learning, sometimes called parameter-based, uses an optimization procedure to find initial model parameters $\theta$ that work for each task $T_i$ with limited training examples. Model-agnostic meta-learning and its improvements are the methods mainly adopted in histopathology.
Model-agnostic meta-learning Model-Agnostic Meta-Learning (MAML) [5] is one of the most common meta-learning methods. The model $f_\theta$ with parameters $\theta$ is trained on a distribution of tasks $p(T)$ so that, given a new task $T_j \sim p(T)$, the parameters can be adjusted in only a few gradient steps. Equation 5.3 describes the gradient update,
$$\theta_i' = \theta - \alpha \nabla_\theta L_{T_i}(f_\theta), \tag{5.3}$$
where $\theta_i'$ are the model's parameters after adapting to a new task $T_i$, $L_{T_i}$ is a loss function, and $\alpha$ is the learning rate. Parameters can be updated using one or multiple gradient steps. Equation 5.4 gives the objective during training.
$$\min_\theta \sum_{T_i \sim p(T)} L_{T_i}(f_{\theta_i'}) = \sum_{T_i \sim p(T)} L_{T_i}\left(f_{\theta - \alpha \nabla_\theta L_{T_i}(f_\theta)}\right) \tag{5.4}$$
The model parameters $\theta$ are optimized over a sum of loss functions across multiple tasks $T_i$, each computed using the updated parameters $\theta_i'$ (Fig. 5.3). Optimization uses stochastic gradient descent (Eq. 5.5), where $\beta$ is the learning rate.
$$\theta = \theta - \beta \left( \nabla_\theta \sum_{T_i \sim p(T)} L_{T_i}\left(f_{\theta - \alpha \nabla_\theta L_{T_i}(f_\theta)}\right) \right) \tag{5.5}$$
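The inner and outer updates of Eqs. 5.3 and 5.5 can be traced on a deliberately tiny problem. The sketch below is an illustration under stated assumptions, not the implementation of [5]: the model is a one-parameter linear function $f_\theta(x) = \theta x$ with a mean squared error loss, the task family (`make_task`) and all hyperparameter values are hypothetical, the outer loss is evaluated on a separate query set, and full-batch gradient steps replace stochastic ones. For comparison, it also includes a Reptile-style update (Reptile is introduced in the following paragraphs), with a hypothetical interpolation rate `epsilon`.

```python
import random


def mse_grad(theta, xs, ys):
    """Gradient w.r.t. theta of the mean squared error of f_theta(x) = theta * x."""
    return sum(2.0 * x * (theta * x - y) for x, y in zip(xs, ys)) / len(xs)


def mse_hessian(xs):
    """Second derivative of that loss w.r.t. theta (it does not depend on theta)."""
    return sum(2.0 * x * x for x in xs) / len(xs)


def make_task(rng, n=5):
    """Hypothetical task family: y = a * x with a task-specific slope a in [1, 3]."""
    a = rng.uniform(1.0, 3.0)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [a * x for x in xs]

    return draw(), draw()  # (support set, query set)


def maml_step(theta, tasks, alpha, beta):
    """One meta-update in the spirit of Eq. 5.5, with exact second-order terms."""
    meta_grad = 0.0
    for (xs_s, ys_s), (xs_q, ys_q) in tasks:
        theta_i = theta - alpha * mse_grad(theta, xs_s, ys_s)  # inner step, Eq. 5.3
        # Chain rule through the inner step: d(theta_i)/d(theta) = 1 - alpha * H.
        meta_grad += mse_grad(theta_i, xs_q, ys_q) * (1.0 - alpha * mse_hessian(xs_s))
    return theta - beta * meta_grad


def reptile_step(theta, tasks, alpha, epsilon, k):
    """Reptile-style update: k plain gradient steps per task, then interpolate."""
    for (xs_s, ys_s), _ in tasks:
        phi = theta
        for _ in range(k):
            phi -= alpha * mse_grad(phi, xs_s, ys_s)
        theta += epsilon * (phi - theta)  # move the initialization toward phi
    return theta


rng = random.Random(0)
theta_maml = theta_reptile = 0.0
for _ in range(200):
    tasks = [make_task(rng) for _ in range(4)]
    theta_maml = maml_step(theta_maml, tasks, alpha=0.1, beta=0.05)
    theta_reptile = reptile_step(theta_reptile, tasks, alpha=0.1, epsilon=0.5, k=5)
print(theta_maml, theta_reptile)
```

In this toy setting both procedures pull the initialization into the range of task slopes, from which a few inner steps reach any individual task. The Reptile variant needs no second derivative, which is the simplification the method is known for.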
A model-agnostic algorithm can be used for many problems, such as classification, regression, and reinforcement learning. It is worth mentioning that the authors of [6] tested MAML with different parameter configurations on a standard image classification task. They obtained similar accuracy on the training (89–93%) and testing (86–92%) sets, where testing was carried out as one-shot learning.
Fig. 5.3 Optimization process of the MAML algorithm. Model parameters $\theta$ can be adapted to a new task $T_i$, drawn from a distribution similar to that of the training tasks $\{T_1, T_2, T_3\}$, in only a few steps. Each task $T_i$ is composed of images from a different class
Reptile Reptile is an improvement of MAML [7]. It performs stochastic gradient descent (SGD) for k iterations on a given task and then gradually moves the initialization weights toward the weights learned on that task. It repeats this procedure for each task. As a result, the parameters are optimized towards the manifolds of the tasks. The algorithm does not outperform MAML in learning performance, but it is much simpler to implement because it uses vanilla SGD.
Metric learning While optimization-based methods search for initial weights that are a good starting point for all tasks, metric-based methods aim to learn a distance measure between pairs of samples that is small for semantically similar patterns and large for dissimilar ones. The metric is learned from a dataset. Query images of a novel task are classified by computing their distances to the novel support images under the learned measure and then applying a distance-based classifier. Metric learning methods are more resistant to overfitting because they do not require learning additional parameters once the metric is learned. Many few-shot metric learning methods compare query samples with class representations such as prototypes and subspaces. These methods can be divided into the following groups: learning feature embeddings, learning class representations, and learning distance or similarity measures. An early example of a method that learns feature embeddings is the Siamese network [8]. The Matching Network [9] is also worth mentioning. It encodes support and query images using different networks in the context of the entire support set. We do not describe this model in detail because we have not found any example of its use on histopathological images. The last model in this group is the prototype network, which attracts more attention from researchers in histopathological image analysis.
Siamese networks The Siamese network consists of two branches formed by networks with the same architecture and weights. For image processing, convolutional networks are mainly used [8]. Two inputs are fed to the two branches, and each branch computes an embedding for its input. The output of a Siamese neural network can be interpreted as the semantic similarity between the projected representations of the two inputs. For few-shot learning, this approach provides a way to find embeddings for the images in various tasks, so that patterns useful for recognizing unseen classes can be found.
The loss function takes the form of a contrastive loss [10] or a triplet loss, which allows a model to learn about the data even without labels. The triplet loss is given in Eq. 5.6.
$$L_{triplet} = \sum_{i=1}^{N} \max\left(0, \|f_i^a - f_i^p\|_2^2 - \|f_i^a - f_i^n\|_2^2 + \alpha\right) \tag{5.6}$$
Here, $N$ is the number of triplets and $\alpha$ is the margin.
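A small worked example may help in reading the triplet loss of Eq. 5.6. It is a sketch on hand-picked two-dimensional embeddings; in practice the embeddings would come from the two branches of the network, and the function names and the margin value are illustrative assumptions.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchors, positives, negatives, margin):
    """Sum over triplets of max(0, ||f_a - f_p||^2 - ||f_a - f_n||^2 + margin)."""
    return sum(
        max(0.0, squared_l2(a, p) - squared_l2(a, n) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    )


anchors = [[0.0, 0.0], [1.0, 1.0]]
positives = [[0.0, 1.0], [1.0, 2.0]]
negatives = [[3.0, 0.0], [1.0, 1.5]]
loss = triplet_loss(anchors, positives, negatives, margin=0.5)
print(loss)  # 1.25
```

The first triplet contributes nothing because its negative is already far from the anchor (1 - 9 + 0.5 < 0). The second contributes 1 - 0.25 + 0.5 = 1.25 because its negative lies closer to the anchor than the positive does.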
The model is trained on a dataset $D_s$ so that the embedding $f^a$ of the query image (sometimes called the anchor) is as close as possible to the embeddings $f^p$ of positive examples and as far as possible from the embeddings $f^n$ of negative samples. The model learns general features of the dataset by learning which types of images are similar and which are different. This forces the model to represent images so that similar images have similar representations, which allows it to distinguish between them.
Prototype networks Prototype networks [11] are a few-shot classification algorithm that maps D-dimensional images to M-dimensional vector representations in the embedding space using a neural network $f_\phi : \mathbb{R}^D \to \mathbb{R}^M$. The model $f_\phi$ is a convolutional network trained on the support set. Each class $k$ has a prototype $c_k$, a point in the embedding space calculated as the centroid of the embeddings from the support set (Eq. 5.7),
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i) \tag{5.7}$$
where $S_k$ is the subset of the support set containing images with label $k$. When a sample from the query set is classified, the prediction is the class whose prototype is nearest to the sample embedding. Figure 5.4 shows two clusters with prototypes $c_1$ and $c_2$, representing Adenosis and Tubular adenoma. The query sample marked with a question mark is classified according to its distance to each prototype: it is assigned the class of the nearest prototype. The distance between embeddings is the Euclidean distance.
Data generation based learning This approach relies on synthesizing new patterns so that the model is trained on more data. Increasing the diversity of the distribution inside each few-shot task effectively alleviates overfitting and helps the few-shot model learn more robust feature representations [13]. In such cases, using a GAN to generate new images is helpful [14]. A generative adversarial network (GAN) [15] comprises a generator G and a discriminator D. A training set is composed of images processed to create embeddings delivered to the generator. In other words, the generator's input is a fixed-length vector $z$ sampled from a distribution $p_z(z)$. The generator learns to generate data such that the discriminator cannot tell whether a sample is generated or comes from the training set. D is trained to maximize the probability of correctly predicting whether a sample is real or fake. The generator and discriminator simultaneously play a minimax game with the value function given by Eq. 5.8.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{5.8}$$
Fig. 5.4 Classification of a sample from the query set of breast cancer images [12] using a prototype network. $c_1$ is the prototype of class Adenosis and $c_2$ is the prototype of class Tubular adenoma. The embedding $e$ of the unknown sample is closer to the prototype $c_2$, because the Euclidean distance $d_2 < d_1$, so its class is Tubular adenoma
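The nearest-prototype rule illustrated in Fig. 5.4 can be sketched in a few lines. The two-dimensional embeddings below are invented for illustration; in a prototype network they would be produced by the embedding network $f_\phi$, which is omitted here. The function names are likewise hypothetical.

```python
import math


def prototypes(support_embeddings, support_labels):
    """Class prototypes: the mean embedding of each class in the support set (Eq. 5.7)."""
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for d, value in enumerate(emb):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(query_embedding, protos):
    """Return the label of the prototype nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query_embedding, protos[label]))


support_embeddings = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["adenosis", "adenosis", "tubular_adenoma", "tubular_adenoma"]
protos = prototypes(support_embeddings, support_labels)

print(classify([4.0, 3.0], protos))  # tubular_adenoma
```

The prototypes are (1, 0) for Adenosis and (5, 4) for Tubular adenoma. The query embedding (4, 3) lies about 1.41 from the second prototype and about 4.24 from the first, so it receives the Tubular adenoma label, mirroring the $d_2 < d_1$ case in the figure.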
Conditional GAN [16] utilizes image labels during training. The model can generate objects of a specific class. The input to the generator consists of a vector z drawn from the distribution pz (z) and a class label. The discriminator checks whether the sample is fake or real and if it is from a given class.
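To make the value function of Eq. 5.8 more concrete, the sketch below estimates $V(D, G)$ from a handful of discriminator outputs. The probabilities are made-up numbers standing in for $D(x)$ on real samples and $D(G(z))$ on generated samples; no networks are trained, and the function name is an assumption of this illustration.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs in (0, 1)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# An undecided discriminator outputs 0.5 everywhere: V = 2 * log(0.5).
undecided = gan_value([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates real from generated samples scores higher.
confident = gan_value([0.9, 0.8], [0.1, 0.2])
print(undecided, confident)
```

The discriminator tries to push this value up, while the generator tries to push it down by producing samples for which $D(G(z))$ is close to the outputs on real data.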
5.3 Histopathology Images
The main source of information in cancer diagnosis is histological slides examined by a pathologist under a microscope. In the case of cancer, the pathologist's report contains the tissue diagnosis. Differentiating benign from malignant lesions is essential for determining the appropriate treatment. When a cancer is removed, the pathologist checks whether the surgical margin is clear. The slides are prepared from parts of the tissue taken during a biopsy or a surgical operation, which are immersed in substances that penetrate and harden the tissue; paraffin is routinely used. This process enables the preparation of suitably thin sections fixed onto glass slides. Then, whole slide imaging (WSI) scanners capture sequential images, converting the entire tissue on a glass slide into a high-resolution virtual slide [17]. WSI is the most recent imaging modality employed by pathology departments worldwide [18].
Different cellular components of the tissue become visible thanks to dyeing with stains. The standard stain, used for over a hundred years, is Hematoxylin-Eosin (H&E). Hematoxylin stains the cell nuclei in shades of blue or even purple, and Eosin stains the cytoplasm and connective tissue in shades of red or pink [19]. Several other staining techniques are also commonly used in the medical diagnosis of tumors [20]. In recent years, the use of antibodies to stain particular proteins, lipids, and carbohydrates has increased the ability to identify categories of cells precisely [21]. Even with the same staining, slides differ depending on the scanner used. Most automatic methods of WSI analysis focus on one tissue type, such as breast, lung, or prostate tissue. A WSI has such a high resolution that processing it directly is almost impossible. Therefore, in standard analysis, a WSI is split into multiple tiles. The malignant regions are small compared to the whole WSI, so the parts of the WSI with a high probability of being non-benign are chosen for analysis. Such a part is called a Unit of Interest (UoI) [22]. Analysis of the UoI allows observation of the tumor growth and diffusion rate. Doctors evaluate the cell morphology and tissue structure to determine whether the tumor is malignant or benign. The hallmark of cancer is uncontrolled proliferation [23]. Thus, proliferation in cancers is assessed either by counting the mitotic index (MI), which shows the number of dividing cells, or by immunohistochemical analysis of proliferation-associated antigens such as Ki-67 (proliferation index, PI). Determining these indexes is a prerequisite for the effectiveness of most potential treatments. Computer-based methods that evaluate the Ki-67 index or the MI must segment cells in the pathological image to calculate them.
Tumor classification based on pathological images is another common task; it allows pathologists to distinguish the subtypes and grades of cancer [24], which are related to different components or degrees of canceration in the same tissue. Computer analysis can be helpful in this case. Classical computer vision methods require some pathological image features to be defined manually. Deep learning (DL) models can find these features automatically, but they need a vast labeled dataset of pathological images. This is a critical challenge for using DL models in pathology image analysis.
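The tiling step mentioned in this section, in which a WSI is split into multiple tiles, can be sketched as a simple coordinate computation. The slide dimensions, the tile size, and the decision to skip incomplete border tiles are assumptions made for illustration; reading the pixels themselves would require a slide-reading library and is left out.

```python
def tile_grid(width, height, tile_size):
    """Yield (left, top, right, bottom) boxes of non-overlapping square tiles.

    Incomplete tiles at the right and bottom borders are skipped.
    """
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            yield (left, top, left + tile_size, top + tile_size)


# Hypothetical slide of 100000 x 80000 pixels cut into 512-pixel tiles.
tiles = list(tile_grid(100_000, 80_000, 512))
print(len(tiles))  # 195 columns * 156 rows = 30420 tiles
```

Even this moderate example yields tens of thousands of tiles per slide, which is why only regions likely to be informative are usually passed on to the model.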
5.4 Few-Shot Learning Approaches in Histopathology
This section is the core of our survey. First, it describes few-shot learning methods in histopathology analysis. Then, we review the existing datasets of histopathological images that are useful for further study. We also refer to the types of problems solved by few-shot learning in the automatic analysis of histopathological images.
5.4.1 Few-Shot Methods in Histopathological Image Analysis
In this section, we present various applications of few-shot learning to histopathological image analysis, following the taxonomy presented in Sect. 5.2.2.
Optimization based learning Chao and Belanger [25] present an implementation of MAML to classify whole genome doubling (WGD) across 17 types of cancer in histopathology images. WGD is a cancer phenotype that, regardless of the cancer type, is associated with a poor prognosis for the patient and determines the course of treatment. Classical methods of WGD identification are expensive and time-consuming, so the authors of [25] proposed an automatic classification method that works across multiple cancer types. The data come from TCGA, a public database of clinical and genomic data samples covering 33 cancer types from over 20000 patients (Table 5.1). The dataset comprises images of histopathology slides stained with hematoxylin and eosin, each with an associated WGD label. The slides were collected using different platforms and technologies over the years. Histopathology images differ depending on the cancer type, so the problem was treated as multitask classification in which the classification of each cancer type is a separate task. The dataset $D$ with $Z$ cancer types is composed of $N_z$ slide images $s_{z,i}$ with binary labels $y_{z,i}$, $i \in \{1, \ldots, N_z\}$. Each slide image $s_{z,i}$ is divided into $T_{z,i}$ non-overlapping tiles $x_{z,i,t}$, $t \in \{1, \ldots, T_{z,i}\}$. The label $y_{z,i}$ is described by Eq. 5.9.
$$y_{z,i} = \begin{cases} 1, & \text{if WGD is detected} \\ 0, & \text{otherwise.} \end{cases} \tag{5.9}$$
The proposed algorithm uses MAML in two use cases: (1) to classify the same label $y_{z,i}$ across different cancer types, with one cancer type per task, and (2) to classify the same label $y_{z,i}$ across different batches, with one batch per task. A batch, in this sense, is a group of samples. Batches commonly differ from each other because of technical variations during measurements.
In the second scenario, the authors applied transformations to images from the meta-test set, changing their resolution and brightness to imitate natural conditions. The model consists of a ResNet18 pretrained on the ImageNet dataset [26] followed by two fully connected layers with dropout and a tanh activation function. Because each slide consists of several tiles, LogSumExp pooling was applied to enable learning across multiple tiles. It is a smooth approximation of max pooling and gives the final prediction for a slide. The pretrained ResNet18 is additionally trained on the images of the meta-train set to adapt it to histopathology images. The meta-train set contains images from 8 cancer types (2056 images), the meta-validation set from 2 cancer types (458 images), and the meta-test set from 7 cancer types (953 images). The dataset is augmented with random vertical or horizontal flips and color jitter. The entire model is then meta-trained on the meta-train set according to the MAML algorithm. Before testing, the model is trained on 8 slides from the meta-test set for each unknown cancer type. The results of the MAML model are compared to those of a baseline CNN model, which has the same architecture as the MAML model but is trained on the meta-train set in a supervised manner. The results show that the MAML model outperforms the CNN in a few-shot learning setup, both for unchanged images and for images with changed brightness and resolution.
The research described in [27] is another example of applying MAML to classify histopathology images from different microscopic scanners. Images scanned by different scanners differ in staining characteristics. MAML was trained with each WSI slide as a separate task, resulting in an unknown number of classes in each task. The authors proposed a system for pathologists that requires the user's involvement. After prediction on a WSI slide starts, the system asks the user to correct several random patches to create a support set. The model is adapted to the few patches in the support set and then predicts the remaining patches in the slide. Experiments were conducted on three datasets: in-house HER2-stained slides from two different medical centers, the CAMELYON17 dataset [28] with 50 lymph slides with tumors from five medical centers, and the AidaOvary dataset [29] (Table 5.1). The slides from AidaOvary were divided into a support set containing the six most significant ovary tumor types and a query set with the remaining types. The support set for the HER2 dataset comprised images from one medical center, and that for the CAMELYON17 dataset comprised images from one scanner. The remaining images were in the query sets. The model architecture was a 4-layer CNN with a fully connected layer as the last layer. The model was trained separately for each dataset on 16 slides with 50 patches from each slide. During training, the data were augmented with random horizontal/vertical flips, 180° rotation, rescaling, and color jitter. The models were evaluated on 5000 tasks with support sets of 20 samples per task.
The models trained on AidaOvary and CAMELYON17 datasets obtained average patch accuracy per task of more than 94%, and the model trained on the HER2 dataset acquired an accuracy of 84.46%. The models were compared with a baseline model, which had the same architecture as the MAML model but was trained in a supervised manner on the support sets. The accuracy of the baseline models for each query set was lower than that of MAML models by 8.37–74.61%. Worth mentioning is the MetaHistoSeg framework [30]. The authors use MAML on the histopathology image segmentation meta-dataset and compare it with instancebased transfer learning as a baseline. Both approaches in their experiments gave comparable results. They compared the results of models with the U-Net backbone trained using transfer-learning and MAML algorithms. Experiments were performed on 5 datasets with different cancer types; each dataset is a single task. The models were trained and tested on each task. In the segmentation problem, MAML and transfer learning approaches gave similar results. Zhang et al. [31] applied MAML to the few-shot, unbalanced, medical image datasets, such as VQA-RAD [32] (radiology images), and PathVQA [33] (pathology images) using dice loss function instead of the cross-entropy loss function. Authors showed that MAML obtained the best accuracy among several few-shot algorithms in 1-shot, 3-shot, and 5-shot learning on both datasets. In their research, authors of [7] focused on Reptile—an enhancement of the MAML method. Reptile algorithm with advanced augmentation methods classified images from three medical datasets, including histopathology dataset BreakHis [12],
with breast tumor tissues. The dataset comprises 7909 histopathology images with 8 types of breast tumors from 82 patients. Slides are stained with hematoxylin and eosin. The authors compared the results of the Reptile algorithm with the results of the transfer learning method. The models have the same 4-layer CNN architecture and differ only in the learning method. The authors proposed three types of data augmentation: MixUp [34], CutOut [35], and CutMix [36]. Experiments using the Reptile algorithm were performed for each augmentation strategy. MixUp and CutMix allow new images and classes to be generated, preventing overfitting as the model is meta-trained both on images and classes. MixUp augmentation interpolates two samples $(x_i, y_i)$ and $(x_j, y_j)$ that are randomly selected from the same batch (Eq. 5.10).

$$
\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j, \qquad \lambda \sim \mathrm{Beta}(\zeta, \zeta) \tag{5.10}
$$
where Beta(ζ, ζ) denotes the beta distribution with both parameters equal to ζ. CutOut augmentation masks a random rectangular region of an image with black pixels. It removes all features from the image in the masked area and is a better alternative to dropout, which randomly removes features from only some feature maps. CutMix is a combination of the two previous augmentations. A new sample $(\tilde{x}, \tilde{y})$ is created by cutting a rectangle from one image and pasting it into another (Eq. 5.11).

$$
\tilde{x} = F \odot x_i + (1 - F) \odot x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j, \qquad \lambda \sim \mathrm{Beta}(\zeta, \zeta) \tag{5.11}
$$
where F is a random, binary, rectangular mask, ⊙ is element-wise multiplication, and Beta(ζ, ζ) is a beta distribution with parameters ζ. The mask is filled with ones in a rectangular area and zeros in the rest. The resulting image $\tilde{x}$ contains pixels from $x_i$ in the region where the mask is filled with ones and from $x_j$ where the mask has zeros. The meta-train set contains images from 5 classes (1705 images) and the meta-test set contains images from 3 classes (376 images). The models were tested on 3-shot, 5-shot, and 10-shot tasks for 2-way and 3-way classification. The meta-learning models (one without augmentation and one for each of the three augmentation types) made predictions with higher confidence than the transfer learning model. A classification model for a medical problem must have high confidence. Furthermore, the transfer learning model gave a worse accuracy score than the Reptile models, which obtained around 80% on a 5-shot task. The selected augmentation methods improved the accuracy score on the BreakHis dataset and reduced the problem of overfitting. We end the survey of this group with the solution described in [37]. Its novelty is the use of Double-Opponent (DO) neurons to model the texture patterns of histopathology images. The authors propose an LSTM-based meta-learning framework.
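The three augmentations of Eqs. 5.10 and 5.11 can be illustrated with a minimal NumPy sketch. It is not the implementation used in [7]: the image layout (H × W × C), the one-hot label vectors, the helper `random_box_mask`, and the choice to recompute the label weight from the actual mask area are all assumptions made for illustration.

```python
import numpy as np


def mixup(x_i, y_i, x_j, y_j, zeta=1.0, rng=None):
    """MixUp (Eq. 5.10): convex combination of two images and their labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(zeta, zeta)
    x_new = lam * x_i + (1.0 - lam) * x_j
    y_new = lam * y_i + (1.0 - lam) * y_j
    return x_new, y_new, lam


def random_box_mask(height, width, area_fraction, rng):
    """Hypothetical helper: binary H x W mask with ones inside a random
    rectangle that covers roughly `area_fraction` of the image."""
    box_h = int(round(height * np.sqrt(area_fraction)))
    box_w = int(round(width * np.sqrt(area_fraction)))
    top = rng.integers(0, height - box_h + 1)
    left = rng.integers(0, width - box_w + 1)
    mask = np.zeros((height, width), dtype=np.float64)
    mask[top:top + box_h, left:left + box_w] = 1.0
    return mask


def cutout(x, frac=0.25, rng=None):
    """CutOut: set a random rectangle of the image to zero (black)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = random_box_mask(x.shape[0], x.shape[1], frac, rng)
    return x * (1.0 - mask)[..., None]


def cutmix(x_i, y_i, x_j, y_j, zeta=1.0, rng=None):
    """CutMix (Eq. 5.11): pixels of x_i where the mask F is one, x_j elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(zeta, zeta)
    mask = random_box_mask(x_i.shape[0], x_i.shape[1], lam, rng)
    f = mask[..., None]  # broadcast the mask over the channel axis
    x_new = f * x_i + (1.0 - f) * x_j
    # Assumption: use the realised mask area as the label weight so that the
    # mixed label matches the pixels exactly after rounding the box size.
    lam_eff = mask.mean()
    y_new = lam_eff * y_i + (1.0 - lam_eff) * y_j
    return x_new, y_new
```

Because MixUp and CutMix return soft labels, a classifier trained on their output needs a loss that accepts label distributions rather than integer class indices.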
Fig. 5.5 The three-headed siamese model (three VGG16 branches). Slides $x_p$ and $x_a$ show lobular carcinoma and $x_n$ is an example of ductal carcinoma [12]
Metric learning methods
Another few-shot learning approach is contrastive learning. In [38], the authors applied a deep siamese network to transfer knowledge from a specific domain to a more generic one in a classification problem of histopathology images. The model, Histology Siamese Network, is first trained on a dataset $D_s$ from the source domain containing colon tissues and then adapted on a few examples from a dataset $D_t$ from the target domain with colon, breast, and lung tissues. Dataset $D_s$ is the CRC [39] dataset with colorectal tissues. Dataset $D_t$ is private and composed of healthy and tumoral tissues of the colon, breast, and lung collected at five hospitals and annotated by 10 pathologists. It contains 1755 images in total. The model is a three-headed siamese network (sometimes called a triplet net) with a modified VGG16 as a base. The fully connected layers with 4096 neurons were replaced by layers with 128 neurons to obtain a vector representation $f_i$ for an image (Fig. 5.5). The loss function is the lossless triplet loss given by Eq. 5.12:

$$
L_{lossless\text{-}triplet} = \sum_{i=1}^{N} \left[ -\ln\left( -\frac{\lVert f_i^a - f_i^p \rVert_2^2}{\beta} + 1 + \epsilon \right) - \ln\left( -\frac{D - \lVert f_i^a - f_i^n \rVert_2^2}{\beta} + 1 + \epsilon \right) \right] \tag{5.12}
$$

where $f^a$, $f^p$, $f^n$ are the embeddings of the anchor (query), positive, and negative samples, β and D are parameters equal to the dimension of the embedding vector (128), and ε is a small number. The lossless version is more efficient than the triplet loss (Eq. 5.6) because it keeps a gradient for negative loss values, which the standard triplet loss clips to zero. The triplets comprise the images that were misclassified during the classification of the $D_s$ dataset. To obtain this information, a confusion matrix was calculated and a triplet sampling probability function was computed from it. The embeddings $f_i$ are input to a Support Vector Machine (SVM) classifier trained on N images per class from a training subset of $D_t$, randomly augmented at each training step. The augmentations are spatial transformations, Gaussian blur, and color transformations. The few-shot learning model was compared to a baseline with the same modified VGG16 architecture. The baseline is trained on the $D_s$ dataset and, after its last layer is replaced, fine-tuned on the N samples from the $D_t$ training set. The siamese model obtained around 80% accuracy for N = 1 and more than 90% for N = 20, whereas the baseline achieved 15% lower accuracy even with N = 130. A similar
method, but without the few-shot scenario, was proposed in [40], where the authors introduced a triplet generation method based on spatial correlations. The a and p tiles were selected within a certain distance on the same WSI. The n tile was taken from a different WSI than a, either of the same cancer type, of the same organ, or of a different organ. The n tile could also be a patch from the same WSI as a, but then the distance between n and a must be considerable. A comparison of models trained with different loss functions to obtain a histology image embedding is the subject of [41]. The compared loss functions are the triplet loss, the multiclass-N-pair loss, and the proposed constellation loss. The triplet loss aims to minimize the distance between an anchor image $x_a$ and a positive sample $x_p$ from the same class and, at the same time, maximize the distance between $x_a$ and a negative sample $x_n$ with a different label. The objective of the multiclass-N-pair loss [42] is analogous to the triplet loss, but it considers the distances to N − 1 negative classes instead of just one negative class. The constellation loss, described further in the text, is a generalization of the multiclass-N-pair loss. The model architecture is a siamese network. The dataset [39] comprises hematoxylin and eosin stained histopathology images of eight tissue types from low-grade and high-grade colon cancer. The number of images per class is 625. The training set comprises 20 samples. The proposed constellation loss (Eq. 5.13) outperforms the other methods in terms of higher accuracy and more compact clusters of image representations. The constellation loss is similar to the triplet loss function but is computed on multiple triplets.

$$
L_{constellation} = \frac{1}{N} \sum_{i=1}^{N} \log\left( 1 + \sum_{j=1}^{K} e^{\left( f_i^{a\top} f_j^{n} - f_i^{a\top} f_j^{p} \right)} \right) \tag{5.13}
$$
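The two embedding losses discussed above, the lossless triplet loss (Eq. 5.12) and the constellation loss (Eq. 5.13), can be sketched in NumPy as follows. This is an illustrative sketch rather than the code of [38] or [41]. It assumes that the embeddings for Eq. 5.12 lie in [0, 1] per dimension (for example after a sigmoid), so that squared distances never exceed D, and that the positives for Eq. 5.13 are passed with shape (N, K, D), or (N, 1, D) when one positive is shared by all K negatives.

```python
import numpy as np


def lossless_triplet_loss(f_a, f_p, f_n, beta=None, eps=1e-8):
    """Eq. 5.12 for anchor, positive and negative embeddings of shape (N, D).

    Assumption: every embedding component lies in [0, 1], so each squared
    distance is at most D and the logarithm arguments stay positive.
    """
    f_a, f_p, f_n = (np.asarray(v, dtype=np.float64) for v in (f_a, f_p, f_n))
    dim = f_a.shape[1]
    beta = float(dim) if beta is None else beta
    pos = np.sum((f_a - f_p) ** 2, axis=1)  # squared anchor-positive distance
    neg = np.sum((f_a - f_n) ** 2, axis=1)  # squared anchor-negative distance
    pos_term = -np.log(-pos / beta + 1.0 + eps)
    neg_term = -np.log(-(dim - neg) / beta + 1.0 + eps)
    return float(np.sum(pos_term + neg_term))


def constellation_loss(f_a, f_p, f_n):
    """Eq. 5.13 with f_a of shape (N, D) and f_n of shape (N, K, D).

    f_p has shape (N, K, D), or (N, 1, D) to reuse a single positive.
    """
    f_a = np.asarray(f_a, dtype=np.float64)
    pos_sim = np.einsum("nd,nkd->nk", f_a, np.asarray(f_p, dtype=np.float64))
    neg_sim = np.einsum("nd,nkd->nk", f_a, np.asarray(f_n, dtype=np.float64))
    per_anchor = np.log1p(np.sum(np.exp(neg_sim - pos_sim), axis=1))
    return float(np.mean(per_anchor))
```

Both functions return a scalar that is small when anchors are close to their positives and far from their negatives; in a training loop they would be written in an automatic differentiation framework instead of NumPy.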
At each training step, the loss is computed for triplets with N negative classes and K negative samples for each class. In Eq. (5.13), $f^a$, $f^n$, $f^p$ are the embeddings of the anchor (query), negative, and positive samples, and $f^{a\top}$ is the transposed $f^a$ vector. The backbone of the network is ResNet-50, and it is trained on 20 samples per class. The shallow classifier of the embeddings is the k-nearest neighbor classifier. A t-SNE visualization of the embeddings acquired using the constellation loss was analyzed. The clusters of each class are very compact, and their proximity matches the similarity between cancer types. The models with the listed loss functions obtained comparable accuracy, with the highest score equal to 85% for the model with the constellation loss. The authors of [43] apply a metric-learning method to show that knowledge acquired from learning on a weakly labeled dataset can improve the performance of a model trained on a few examples of labeled data. The weakly labeled data comes from the KimiaPath24 dataset, consisting of 24 classes of histopathology images of different body parts. Its 23916 images were grouped by non-experts according to visual similarity. The target datasets are the CRC dataset [39] with 7 types of colorectal cancer (4375 images in total) and the PCam dataset [44] with 327680 images of healthy and tumoral breast tissues. The model is ProxyNCA (Proxy-Neighborhood Component Analysis) [45], a modification of siamese networks in which samples are compared to class proxies, which reduces computational complexity. A ResNet34 backbone, which was trained on the weakly labeled
dataset, was further trained with a new embedding layer on a few samples from the target dataset. The results show that, on both target datasets, the model trained on weakly labeled data from the histopathology domain outperforms the model trained on natural images from the ImageNet dataset. Contrastive learning with latent augmentation for few-shot classification is the subject of [46]. The classifier was first pre-trained on a base dataset and then trained on the meta-test support set using latent augmentation. The contrastive method MoCo-v3 [47] comprises a feature extraction backbone $f_\theta$, a projection head $f_g$, and a prediction head $g_q$. During training, the model maximizes the similarity between two randomly augmented views of the same image and minimizes the similarity between a pair of different images. Only the extraction backbone $f_\theta$ is used further, with frozen weights. The authors utilize the knowledge from the learned data embeddings. The representations $z = f_\theta(x)$ of the unlabeled dataset are clustered by the k-means algorithm, and $c_i$, $i \in \{1, \ldots, C\}$, is the i-th cluster prototype, where C is the number of clusters. The base dataset is $B = \{(c_i, \Sigma_i)\}_{i=1}^{C}$, where $\Sigma_i$ is an intra-cluster covariance matrix. During meta-testing, the classifier is trained on the image embeddings z and the augmented embeddings $\tilde{z} = z + \delta$, $\delta \sim \mathcal{N}(0, \Sigma_{i^*})$, where $i^*$ indicates the cluster centroid $c_i$ that is nearest to z in the latent space. Latent augmentation may resemble the mutation of healthy cells into tumorous ones and may create better representations of new classes. Experiments were conducted on three public datasets: NCT-CRC-HE-100K [48] with colon tissues, LC25000 [49] with lung and colon tissues, and PAIP19 [50] with liver tissues. The authors compared their method in three setups. In the first one, new and base classes are from different parts of the body (new classes are out-domain). In the second, data is from the same dataset, and new and base classes refer to the same organ (new classes are near-domain).
The last one contains the remaining cases (new classes are middle-domain). Classifiers with latent augmentation outperformed other methods in 1-shot, 5-shot, and 10-shot setups. A model trained using contrastive learning obtained a higher F1-score than other few-shot methods for classes unseen during training. Prototypical networks are another example of a metric-learning approach. In [51], Shaikh et al. applied a few-shot learning prototypical network to classify artifacts on histopathology images stained with hematoxylin and eosin (H&E) or immunohistochemistry (IHC). Artifacts are created at different stages of acquiring histopathology images, which makes them highly diverse. The proposed solution combines transfer learning and few-shot learning using prototypical networks that generate embeddings for artifact classes. The first dataset comprised 50 IHC-stained slides with tonsil, breast, and colon cancers. Images were stained with 3,3'-Diaminobenzidine (DAB) for seven different biomarkers (BCL2, HER2, CD10, CD22, CD23, CD68, and Ki67), which indicate different types of cancer. Six classes of artifacts were designated: nuclear stain, membrane/cytoplasm stain, tissue fold, background stain, watermarks, and non-stain. There were 65 samples per class divided into training and validation sets (120 samples) and test sets (25 samples per class). Experiments were also performed on a larger IHC Ki67 dataset with two classes of tissues (unstained and nuclear stained) and artifacts (ink stain, tissue cuts, and blur) not included in the training set. The second dataset combines 122 slides from a private collection and a
public dataset, TCGA, stained with H&E. Slides show tissues with lymph node, lung, breast, colon, and liver cancers. 8840 images were divided into the train, validation, and test sets with 100, 20, and 1000 samples per class, respectively. The dataset contains 8 classes: 6 artifact types (anthracosis, blur, tissue crash, hemorrhage, pen mark, and tissue fold), a non-stain class, and a background class. The neural network used for feature extraction was ResNet18, pre-trained on the ImageNet dataset [26]. All layers except the last fully connected one were frozen during training to avoid overfitting. Data was randomly augmented with rotation and horizontal/vertical flips. The model trained using prototypical networks was compared to a classifier with the same architecture trained on the same training sets with standard transfer learning. One model was trained on each dataset. The support sets contained 15 samples per unseen class from the H&E dataset (anthracosis and hemorrhage were not in the training set) and 20 samples with unseen artifacts from the IHC Ki67 dataset. Experiments showed that the optimal number of samples in the support set is between 10 and 30. The prototypical networks outperformed the standard transfer learning method on images with both types of staining. Moreover, they generalized well to the unseen classes of artifacts. An interesting k-means extension of prototypical networks that allows the prediction of unseen classes of whole slide images (WSIs) based on a few examples is proposed in [52]. The method is also robust to changes in image color and resolution resulting from different microscopic scanners and clinic-dependent preprocessing steps. The proposed Multi-ProtoNets use multiple prototypes for each class to learn more disparate embeddings. The model is tested on six datasets with images from different scanners and varying scanning procedures.
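Nearest-prototype classification with one or several prototypes per class, the idea behind prototypical networks and their k-means extension, can be sketched as follows. This is a minimal NumPy illustration, not the code of [51] or [52]. It assumes that the embeddings have already been produced by a frozen backbone, and the functions `kmeans`, `fit_prototypes`, and `predict` are hypothetical helpers; the small Lloyd's k-means is included only to keep the example self-contained.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids of shape (k, D)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(np.float64)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = points[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids


def fit_prototypes(embeddings, labels, k=1):
    """Compute up to k prototypes per class from support-set embeddings."""
    prototypes, proto_labels = [], []
    for cls in np.unique(labels):
        cls_emb = embeddings[labels == cls]
        n_proto = min(k, len(cls_emb))
        prototypes.append(kmeans(cls_emb, n_proto))
        proto_labels.extend([cls] * n_proto)
    return np.vstack(prototypes), np.array(proto_labels)


def predict(queries, prototypes, proto_labels):
    """Assign each query embedding the class of its nearest prototype."""
    dists = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=2)
    return proto_labels[dists.argmin(axis=1)]


# Toy 2-D embeddings: class 0 forms two separate groups, class 1 lies between.
support = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
                    [5, 5], [5, 6], [6, 5]], dtype=np.float64)
support_labels = np.array([0] * 6 + [1] * 3)
protos, proto_cls = fit_prototypes(support, support_labels, k=2)
queries = np.array([[0.5, 0.5], [10.5, 10.5], [5.5, 5.5]])
predictions = predict(queries, protos, proto_cls)
```

In this toy example a single prototype for class 0 would fall in the middle of the class 1 samples, which illustrates why several prototypes per class can separate multi-modal classes better.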
The dataset comprises 356 WSIs from 161 H&E-stained colon tissues with adenocarcinoma, with more than 10 million image patches scanned using the 3DHistech MIDI scanner. Tissues are divided into seven classes: tumor, muscle tissue, connective combined with adipose tissue, mucosa, mucus, inflammation, and necrosis. They are additionally scanned by four different scanners and one manual microscope. Multiple prototypes are calculated using k-means clustering in the embedding space, obtaining k clusters for each class. The cluster centroids become the class prototypes. Six models were trained: a baseline trained with standard transfer learning and five Multi-ProtoNets models with k ∈ {1, . . . , 5}. ResNet34 was used as the feature extraction backbone. The models are fine-tuned on images scanned by the MIDI scanner. During testing, the model was adapted on 9 slides per scanner unseen during training and then tested on 30 slides for each scanner, including MIDI. The authors also compared the models with augmentation consisting of changes in hue and saturation and with H&E augmentation [53]. For unseen scanners, the model with multiple prototypes per class obtained about 10 percentage points higher accuracy than the model with a single prototype (k = 1) or transfer learning. The model achieved the best score with k = 3 on images from all scanners. The authors also compared the number of examples used for model adaptation, which varied from 0 to 9. The larger the number of adaptation slides, the better the performance of the model, but the differences in the scores decreased. Data augmentation improved the results by up to 25.1 percentage points in accuracy. The accuracy of the best models was between 86.9 and 88.2% for
unseen scanners, which is similar to the 87.3% obtained on the MIDI scanner used in the training set. The model with multiple prototypes allows data classification from new scanners without loss of accuracy.

Cell counting with a few-shot learning method is the focus of [54]. Each cell in the image is annotated with a bounding box. The network comprises a ResNet-50 backbone that extracts features, an attention module that correlates the extracted features and bounding boxes, and a regression module that predicts the final density of cells. The attention and regression modules are first fine-tuned on a non-medical dataset, FSC-147 [55], with 147 categories. The backbone's weights are frozen during this first step. Next, the network is adapted on a few histopathological images from a private colorectal dataset [56] and two datasets from other medical fields. During testing, the model is adapted to each test image using additional information about a few bounding boxes on the image. The model does not learn new features during testing. It is trained with 10-shot learning and tested on 90 images. The proposed method outperforms other state-of-the-art methods for few-shot object detection and object detection.

Data generation-based approaches
Data generation-based approaches produce new data using Generative Adversarial Networks (GANs) [15] or augmentation techniques that transform the available data, thereby increasing the amount of data and decreasing the risk of overfitting. Common augmentations are random horizontal/vertical flips, color transformations, blurring, spatial transformations, etc. More advanced techniques such as MixUp, CutOut, and CutMix were described in Sect. 5.4.1. In [57], a ScoreMix augmentation technique for histopathology images was introduced.
It mixes two samples $(x_s, y_s)$ and $(x_t, y_t)$, with $x_s, x_t \in \mathbb{R}^{C \times H \times W}$, into a new training example $(x_m, y_m)$ using the distribution P of the semantic image regions. The distributions are calculated using the self-attention of the [CLS] token learned by a vision transformer (ViT) [58]. ViT is based on the transformer architecture [59], which uses attention mechanisms. An image is represented as a sequence of fixed-size patches. Based on the distributions P from both images, the locations of the cutting and pasting regions are calculated.

A new H&E stain augmentation was introduced in [53]. Recall that hematoxylin stains cell nuclei blue and eosin stains other parts of the image in different shades of pink. The method stochastically modifies the H&E color channels, using color deconvolution to convert the RGB color space to H&E. The color intensity of the stains depends on the lab conditions, so the augmentation emulates these different factors. Tests were conducted on two datasets: the private Radboudumc dataset, similar to the training set, and TUPAC16, unseen during training. Augmentation improved the F1-score by 1.6 percentage points (p.p.) on Radboudumc and by 45.7 p.p. on TUPAC16 compared to no augmentation.

Another kind of data augmentation is proposed by the authors of [60]. It uses generative adversarial networks (GANs) based on a vision transformer (ViT) [58] for histology image synthesis. The generator consists of layers that progressively upsample the latent noise vector z and increase the feature map resolution. The discriminator uses a multi-scale technique to learn global and local structures. The output of the discriminator is an image class and a prediction of whether the image is real or fake. The model is trained using a multi-loss weighting function that combines the classification cross-entropy loss and the Wasserstein [61] loss for GANs. Selective data augmentation was applied to improve generation performance. Images are generated using z sampled from a normal distribution truncated at 0.7 [62]. Moreover, only the generated images with class prediction confidence higher than 0.6 are used. Experiments were performed on the PCam dataset [44], consisting of 327680 images of healthy and tumorous tissues. The authors compared different classification models with and without the proposed data synthesis augmentation. The augmentation significantly improved the performance of all tested models. The best results (accuracy equal to 94.5%) were achieved by the classification head used during data generation, trained with the proposed augmentation.
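The selective augmentation step described above, truncated latent sampling followed by confidence filtering, can be sketched as below. This is an illustration under stated assumptions and not the pipeline of [60]. "Truncated at 0.7" is interpreted here as resampling every latent component whose absolute value exceeds 0.7, which is one common form of the truncation trick. The array `class_probs` stands in for the class probabilities that a classification head would assign to generated images.

```python
import numpy as np


def sample_truncated_latents(n, dim, truncation=0.7, rng=None):
    """Draw standard-normal latents and resample components outside
    [-truncation, truncation] until all of them fall inside."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((n, dim))
    outside = np.abs(z) > truncation
    while outside.any():
        z[outside] = rng.standard_normal(int(outside.sum()))
        outside = np.abs(z) > truncation
    return z


def select_confident(images, class_probs, threshold=0.6):
    """Keep generated images whose top class probability exceeds the
    threshold and return them with their predicted labels."""
    confidence = class_probs.max(axis=1)
    keep = confidence > threshold
    return images[keep], class_probs[keep].argmax(axis=1)


# Usage with placeholder data: three "generated images" and their class
# probabilities (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
z = sample_truncated_latents(4, 16, truncation=0.7, rng=rng)
fake_images = np.arange(3 * 2 * 2).reshape(3, 2, 2).astype(np.float64)
probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.3, 0.7]])
kept_images, kept_labels = select_confident(fake_images, probs)
```

A lower truncation value trades sample diversity for fidelity, and the confidence threshold removes generated images that the classifier cannot assign clearly to one class.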
5.4.2 Machine Learning Tasks Performed on Histopathological Images
Analysis of histopathological images can be done in different ways. The most common task is classification, where a WSI is divided into patches, each with an assigned class. The classification can be binary [25], between healthy and tumorous tissues, or multiclass, with different cancer types [27, 38, 41, 43, 46, 51]. Cancer types can be specified by a fixed label or a given metric score, such as the Gleason grading system. Segmentation is another task performed on histopathology images. The models can be trained using few-shot [63, 64] or conventional [65, 66] methods. In the segmentation task, each pixel in the image has a separate label, which makes it possible to find regions with certain categories of nuclei. Pixels can be annotated with a score indicating cancer progression or with a class label, e.g., inside nuclei, outside nuclei, and the boundary of nuclei. Another task that helps to diagnose cancer is counting the number of cells, which can be solved using regression in a few-shot setup [54] or with a classic approach [56, 67].
5.4.3 Datasets
There are several publicly available datasets with histopathology images of different types of cancer. All public datasets contain images stained with hematoxylin and eosin, but they are scanned with various scanners, which can cause color changes. In datasets for classification, each image patch is assigned to one class. In the segmentation problem, each pixel is assigned to a class. Table 5.1 presents a collection of public histopathology datasets. They contain microscopic images of hematoxylin and eosin stained tissues with different types of cancer from various body parts.

Table 5.1 Histopathological image datasets. Problem type: c means classification and s means segmentation

| Name | Cancer | Number of classes | Classes | Number of samples | Patch size [px] | Problem type |
|---|---|---|---|---|---|---|
| CRC (available on request) [39] | Colorectal | 8 | Tumour epithelium, simple stroma, complex stroma, immune cells, debris, normal mucosal glands, adipose tissue and no tissue (background) | 625 patches per class | 150×150 | c |
| NCT-CRC-HE-100K [48] | Colorectal | 9 | Adipose, no tissue (background), debris, lymphocytes, mucus, smooth muscle, normal colon mucosa, cancer-associated stroma and colorectal adenocarcinoma epithelium | 100000 patches | 224×224 | c |
| LC25000 [49] | Colon and lung | 5 | Colon adenocarcinoma, benign colonic tissue, lung adenocarcinoma, lung squamous cell carcinoma and benign lung tissue | 5000 patches per class | 768×768 | c |
| DigestPath [75] | Colorectal | 2 | Benign, malignant | 872 WSI | 5000×5000 | s |
| DigestPath [75] | Signet-ring cell from gastric mucosa and intestine | 2 | Tumorous, non-tumorous | 682 WSI | 2000×2000 | s |
| MoNuSeg [76] | Breast, kidney, liver, prostate, bladder, colon and stomach | 3 | Background, inside nucleus, boundary of nuclei | 30 WSI | 1000×1000 | s |
| BreakHis [12] | Breast | 8 | Benign tumors: adenosis, fibroadenoma, phyllodes tumor, tubular adenoma; malignant tumors: ductal carcinoma, lobular carcinoma, mucinous carcinoma, papillary carcinoma | 7909 images (2480 benign, 5429 malignant) | 700×460 | c |
| BreastPathQ | Breast | – | Tumor cellularity score | 2579 patches | – | c |
| TUPAC16 [77] | Breast | – | Mitotic score and proliferation score | 821 WSI | – | c |
| MITOS-ATYPIA-14 | Breast | – | Nuclear atypia score | 2 · 23000 frames | 1539×1376 and 1663×1485 | c |
| AidaOvary | Ovary | 8 | High grade serous carcinoma, low grade serous carcinoma, clear cell carcinoma, endometrioid adenocarcinoma, metastatic serous carcinoma, metastatic other, serous borderline tumor, mucinous borderline tumor | 174 WSI (16 benign, 158 malignant) | – | c |
| Gleason2019 | Prostate | 5 | Gleason grade scores | 244 patches | 5120×5120 | s |
| PAIP19 | Liver | 3 | Whole tumor area, viable tumor area, viable tumor burden | 100 WSI | – | s |
| PCam [78, 79] | Lymph nodes | 2 | Metastatic tissue, no metastatic tissue | 327680 patches | 96×96 | c |
| CAMELYON17 [80] | Breast | 2 | Breast cancer metastases, no breast cancer metastases | 1399 WSI | – | c |
| TCGA | 33 cancer types | – | – | – | – | c |
| KimiaPath24 [81] | Different body parts | – | Weak labels based on image similarity | 24 WSI | 1000×1000 | c |
| Warwick-QU [82, 83] | Colon | 5 | Healthy, adenomatous, moderately differentiated, moderately-to-poorly differentiated, and poorly differentiated | 165 patches | 520×775 and 430×575 | s |

One of the most common cancer types is colorectal cancer. Changes in colorectal tumor structure are directly related to patient prognosis. The most important part of the colorectum is the colon, and colorectal cancer that starts in the colon is called colon cancer. There are five public datasets containing histopathology images of colorectal or colon cancer. Another prevalent type is breast cancer; breast tumors can take two forms: benign and malignant. Both forms can have different subtypes that result in various types of treatment and patient prognosis. Five available datasets comprise images with different subtypes of breast cancer. Two datasets contain tissues from the whole body. Other datasets contain information about tumors in the lung, kidney, prostate, bladder, stomach, liver, ovary, lymph nodes, etc. In some articles, the experiments were conducted on private collections of histopathology images. Shaikh et al. [51] describe a private dataset with immunohistochemistry (IHC) stained images. In [56], a dataset for detecting nuclei was gathered. Other private datasets were presented in [27, 38, 52]. In [68], the authors suggested a benchmark for few-shot classification of histopathology images collected from the datasets CRC, NCT-CRC-HE-100K, LC25000, and BreakHis.
5.4.4 Future of Few-Shot Learning in Histopathology
Our survey shows that histopathological image analysis benefits from classical few-shot learning methods developed for the computer vision domain. The presented research relies on relatively old few-shot methods. As a future research direction, more recent few-shot methods can be implemented. Most of the existing few-shot learning methods in histopathology address the problem of cell classification. An interesting direction for future work is exploring few-shot learning solutions for cell segmentation or counting, leading to a new benchmark. One of the recently published papers [69] on non-medical image segmentation used hybrid masking, an enhanced feature masking method that considers the fine-grained details in the image. Another recent technique [70] for integrative few-shot segmentation and classification outperformed other methods on several non-medical datasets. Current few-shot classification methods that can be evaluated on histopathology datasets include meta-learning model distillation [70], vision self-attention mechanisms [71, 72], and improving generalization to unseen classes using a global prototype [73] or dropout [74]. Moreover, there is a need for more one-shot methods in histopathology image analysis. Their impact on histological image analysis can also be researched further. Another topic of future research is the acquisition of new data for few-shot learning, namely a public dataset with IHC-stained images, as most of the existing public datasets contain images stained with H&E. There is no IHC-stained dataset that includes images from different scanners, although scanner variability is often the reason for using few-shot methods, since images from different devices can vary significantly.
5.5 Conclusion
Few-shot learning is an attractive option for histopathological image recognition, mainly because supervised training needs a vast amount of annotated data, which is difficult to obtain for various reasons. The annotation task is tedious and must be done by someone with high expertise, which is sometimes difficult to arrange. It is also costly; therefore, the interest in few-shot methods is high. Especially in the last two years, many papers have addressed this area. Few-shot learning methods (optimization-based, metric-based, and data generation-based) proposed a few years ago for non-medical image analysis were applied to histopathological images, obtaining scores high enough to help in medical diagnoses. The most common task solved by few-shot learning methods in the histopathology domain is classification. As a future direction for research in histopathological image recognition, we propose applying new few-shot classification methods developed for non-medical image recognition, such as meta-learning model distillation [70], vision self-attention mechanisms [71, 72], global prototypes [73], or dropout [74]. Few papers refer to segmentation and detection. As a future research direction, hybrid masking [69] can be applied in this case. It is an enhanced feature masking
method that considers the fine-grained details in the image. Another recent technique for integrative few-shot segmentation and classification is proposed by Wu et al. in [70].

Training a machine learning model requires a dataset labeled by a pathologist, which can be hard to obtain. More publicly available datasets appear each year with different types of cancer from various body parts. The limitation of these datasets is that the images are stained with only one type of histological stain: hematoxylin and eosin. Because the stain and the microscopic scanner determine the characteristics of the samples, some authors decide to collect their own private datasets, but public datasets would be desirable for further research.
References
1. Gurcan, M., Boucheron, L., Can, A., Madabhushi, A., Rajpoot, N., Yener, B.: Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 147–171 (2009)
2. Parnami, A., Lee, M.: Learning from few examples: a summary of approaches to few-shot learning (2022). arxiv:2203.04291
3. Song, Y., Wang, T., Mondal, S.K., Sahoo, J.P.: A comprehensive survey of few-shot learning: evolution, applications, challenges, and opportunities (2022). arxiv:2205.06743
4. Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: a survey on few-shot learning. ACM Comput. Surv. 53(3). https://doi.org/10.1145/3386252
5. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks (2017). arxiv:1703.03400
6. So, C.: Exploring meta learning: parameterizing the learning-to-learn process for image classification. In: International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 199–202 (2021)
7. Singh, R., Bharti, V., Purohit, V., Kumar, A., Singh, A.K., Singh, S.K.: Metamed: few-shot medical image classification using gradient-based meta-learning. Pattern Recogn. 120, 108111 (2021). https://www.sciencedirect.com/science/article/pii/S0031320321002983
8. Koch, G.R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop (2015)
9. Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper/2016/file/90e1357833654983612fb05e3ec9148c-Paper.pdf
10. Wang, F., Liu, H.: Understanding the behaviour of contrastive loss. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2495–2504 (2020)
11. Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning (2017). arxiv:1703.05175
12. Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.: A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462 (2016)
13. Qin, T., Li, W., Shi, Y., Gao, Y.: Diversity helps: unsupervised few-shot learning via distribution shift-based data augmentation (2020). arxiv:2004.05805
14. Hariharan, B., Girshick, R.: Low-shot visual recognition by shrinking and hallucinating features (2016). arxiv:1606.02819
15. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 27. Curran Associates, Inc. (2014). https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
16. Mirza, M., Osindero, S.: Conditional generative adversarial nets (2014). arxiv:1411.1784
17. Kumar, N., Gupta, S., Gupta, R.: Whole slide imaging (wsi) in pathology: current perspectives and future directions. J. Digit. Imaging 4, 1034–1040 (2020)
18. Amin, S., Mori, T., Itoh, T.: A validation study of whole slide imaging for primary diagnosis of lymphoma. Pathol. Int. 69(6), 341–349 (2019). https://onlinelibrary.wiley.com/doi/abs/10.1111/pin.12808
19. Fox, H.: Is h&e morphology coming to an end? J. Clin. Pathol. 1, 38–40 (2000)
20. Alturkistani, H., Tashkandi, F., Mohammedsaleh, Z.: Histological stains: a literature review and case study. Glob. J. Health Sci. 3, 72–9 (2015)
21. Libard, D., Cerjan, S., Alafuzoff, I.: Characteristics of the tissue section that influence the staining outcome in immunohistochemistry. Histochem. Cell Biol. 151, 91–96 (2019)
22. Chen, P., Liang, Y., Shi, X., Yang, L., Gader, P.: Automatic whole slide pathology image diagnosis framework via unit stochastic selection and attention fusion. Neurocomputing 312–325 (2021)
23. Kriegsmann, M., Warth, A.: What is better/reliable, mitosis counting or ki67/mib1 staining? Transl. Lung Cancer Res. 5, 543–546 (2016)
24. Wenbin, H., Ting, L., Yongjie, H., Wuyi, M., Jinguang, D., Yinxia, L., Yuan, Y., Leijie, W., Zhiwen, J., Yongqiang, W., Jie, Y., Chen, C.: A review: the detection of cancer cells in histopathology based on machine vision. Comput. Biol. Med. (2022)
25. Chao, S., Belanger, D.: Generalizing few-shot classification of whole-genome doubling across cancer types. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 3382–3392 (2021)
26. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2009, pp. 248–255
27. Fagerblom, F., Stacke, K., Molin, J.: Combatting out-of-distribution errors using model-agnostic meta-learning for digital pathology. In: Medical imaging (2021)
28. Litjens, G.J.S., Bándi, P., Bejnordi, B.E., Geessink, O.G.F., Balkenhol, M.C.A., Bult, P., Halilovic, A., Hermsen, M., van de Loo, R., Vogels, R., Manson, Q.F., Stathonikos, N., Baidoshvili, A., van Diest, P., Wauters, C.A., van Dijk, M., van der Laak, J.: 1399 h&e-stained sentinel lymph node sections of breast cancer patients: the camelyon dataset. GigaScience 7 (2018)
29. Lindman, K., Rose, J.F., Lindvall, M., Stadler, C.B.: Ovary data from the visual Sweden project droid (2019). https://datahub.aida.scilifelab.se/10.23698/aida/drov
30. Yuan, Z., Esteva, A., Xu, R.: Metahistoseg: a python framework for meta learning in histopathology image segmentation. In: Deep Generative Models, and Data Augmentation, Labelling, and Imperfections: first Workshop, DGM4MICCAI 2021, and First Workshop, DALI 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, Proceedings. Springer, Berlin, Heidelberg, pp. 268–275 (2021). https://doi.org/10.1007/978-3-030-88210-5_27
31. Zhang, C., Cui, Q., Ren, S.: Few-shot medical image classification with MAML based on dice loss. In: 2022 IEEE 2nd International Conference on Data Science and Computer Application (ICDSCA), pp. 348–351 (2022)
32. Lau, J.J., Gayen, S., Ben Abacha, A., Demner-Fushman, D.: A dataset of clinically generated visual questions and answers about radiology images. Sci. Data 5, 180251 (2018)
33. He, X., Zhang, Y., Mou, L., Xing, E., Xie, P.: Pathvqa: 30000+ questions for medical visual question answering (2020). arxiv:2003.10286
34. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: Mixup: beyond empirical risk minimization (2017). arxiv:1710.09412
35. DeVries, T., Taylor, G.W.: Improved regularization of convolutional neural networks with cutout (2017). arxiv:1708.04552
36. Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision
37. Wen, Q., Yan, J., Liu, B., Meng, D., Li, S.: A meta-learning method for histopathology image classification based on LSTM-model. In: Tenth international conference on graphic and image processing (ICGIP 2018) (2019)
38. Medela, A., Picon, A., Saratxaga, C.L., Belar, O., Cabezón, V., Cicchi, R., Bilbao, R., Glover, B.: Few shot learning in histopathological images: reducing the need of labeled data on biological datasets. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 1860–1864 (2019)
39. Kather, J., Weis, C.-A., Bianconi, F., Melchers, S., Schad, L., Gaiser, T., Marx, A., Zöllner, F.: Multi-class texture analysis in colorectal cancer histology. Sci. Rep. 6, 27988 (2016)
40. Sikaroudi, M., Safarpoor, A., Ghojogh, B., Shafiei, S., Crowley, M., Tizhoosh, H.: Supervision and source domain impact on representation learning: a histopathology case study. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 1400–1403 (2020)
41. Medela, A., Picon, A.: Constellation loss: improving the efficiency of deep metric learning loss functions for optimal embedding (2019). arxiv:1905.10675
42. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper/2016/file/6b180037abbebea991d8b1232f8a8ca9-Paper.pdf
43. Teh, E.W., Taylor, G.W.: Learning with less data via weakly labeled patch classification in digital pathology (2019). arxiv:1911.12425
44. Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant cnns for digital pathology (2018). arxiv:1806.03962
45. Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies (2017). arxiv:1703.07464
46. Yang, J., Chen, H., Yan, J., Chen, X., Yao, J.: Towards better understanding and better generalization of few-shot classification in histology images with contrastive learning (2022). arxiv:2202.09059
47. Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers (2021). arxiv:2104.02057
48. Kather, J.N., Halama, N., Marx, A.: 100,000 histological images of human colorectal cancer and healthy tissue (2018). https://doi.org/10.5281/zenodo.1214456
49. Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson, C.P., DeLand, L.A., Mastorides, S.M.: Lung and colon cancer histopathological image dataset (lc25000) (2019). arxiv:1912.12142
50. Kim, Y.J., Jang, H., Lee, K., Park, S., Min, S.-G., Hong, C., Park, J.H., Lee, K., Kim, J., Hong, W., Jung, H., Liu, Y., Rajkumar, H., Khened, M., Krishnamurthi, G., Yang, S., Wang, X., Han, C.H., Kwak, J.T., Ma, J., Tang, Z., Marami, B., Zeineh, J., Zhao, Z., Heng, P.-A., Schmitz, R., Madesta, F., Rösch, T., Werner, R., Tian, J., Puybareau, E., Bovio, M., Zhang, X., Zhu, Y., Chun, S.Y., Jeong, W.-K., Park, P., Choi, J.: Paip 2019: liver cancer segmentation challenge. Med. Image Anal. 67, 101854 (2021). https://www.sciencedirect.com/science/article/pii/S1361841520302188
51. Shaikh, N.N., Wasag, K., Nie, Y.: Artifact identification in digital histopathology images using few-shot learning. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), pp. 1–4 (2022)
52. Deuschel, J., Firmbach, D., Geppert, C.I., Eckstein, M., Hartmann, A., Bruns, V., Kuritcyn, P., Dexl, J., Hartmann, D., Perrin, D., Wittenberg, T., Benz, M.: Multi-prototype Few-Shot Learning in Histopathology, IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), vol. 2021, pp. 620–628 (2021)
53. Balkenhol, M., Karssemeijer, N., Litjens, G., van der Laak, J., Ciompi, F., Tellez, D.: H&e stain augmentation improves generalization of convolutional networks for histopathological mitosis detection. In: Medical Imaging 2018: digital Pathology, p. 34 (2018)
54. Li, M., Zhao, K., Peng, C., Hobson, P., Jennings, T., Lovell, B.C.: Deep adaptive few example learning for microscopy image cell counting. In: Digital Image Computing: techniques and Applications (DICTA), vol. 2021, pp. 1–7 (2021)
55. Ranjan, V., Sharma, U., Nguyen, T., Hoai, M.: Learning to count everything (2021). arxiv:2104.08391
56. Sirinukunwattana, K., Raza, S.E.A., Tsang, Y.-W., Snead, D.R.J., Cree, I.A., Rajpoot, N.M.: Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans. Med. Imaging 35(5), 1196–1206 (2016)
57. Stegmüller, T., Bozorgtabar, B., Spahr, A., Thiran, J.-P.: Scorenet: learning non-uniform attention and augmentation for transformer-based histopathological image classification. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), vol. 2023, pp. 6159–6168 (2023)
58. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16 × 16 words: transformers for image recognition at scale (2020). arxiv:2010.11929
59. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need (2017). arxiv:1706.03762
60. Li, M., Li, C., Peng, C., Lovell, B.: Conditioned generative transformers for histopathology image synthetic augmentation (2022). arxiv:2212.09977
61. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017). arxiv:1701.07875
62. Marchesi, M.: Megapixel size image creation using generative adversarial networks (2017). arxiv:1706.00082
63. Yuan, Z., Esteva, A., Xu, R.: Metahistoseg: a python framework for meta learning in histopathology image segmentation (2021). arxiv:2109.14754
64. Saha, S., Choi, O., Whitaker, R.: Few-shot segmentation of microscopy images using gaussian process. In: Huo, Y., Millis, B.A., Zhou, Y., Wang, X., Harrison, A.P., Xu, Z. (eds.) Medical Optical Imaging and Virtual Microscopy Image Analysis, pp. 94–104. Springer Nature Switzerland, Cham (2022)
65. Kurmi, Y., Chaurasia, V., Kapoor, N.: Histopathology image segmentation and classification for cancer revelation. Signal Image Video Process 15, 09 (2021)
66. Kim, H., Yoon, H., Thakur, N., Hwang, G., Lee, E., Kim, C., Chong, Y.: Deep learning-based histopathological segmentation for whole slide images of colorectal cancer in a compressed domain. Sci. Rep. (1) (2021)
67. He, S., Minn, K.T., Solnica-Krezel, L., Anastasio, M.A., Li, H.: Deeply-supervised density regression for automatic cell counting in microscopy images (2020). arxiv:2011.03683
68. Shakeri, F., Boudiaf, M., Mohammadi, S., Sheth, I., Havaei, M., Ayed, I.B., Kahou, S.E.: Fhist: a benchmark for few-shot classification of histological images (2022). arxiv:2206.00092
69. Moon, S., Sohn, S.S., Zhou, H., Yoon, S., Pavlovic, V., Khan, M.H., Kapadia, M.: HM: hybrid masking for few-shot segmentation. In: Computer Vision-ECCV: 17th European Conference, Tel Aviv, Israel. Proceedings. Part XX, vol. 2022. Springer, pp. 506–523 (2022)
70. Wu, Y., Chanda, S., Hosseinzadeh, M., Liu, Z., Wang, Y.: Few-shot learning of compact models via task-specific meta distillation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6265–6274 (2023)
71. Li, Z., Hu, Z., Luo, W., Hu, X.: Sabernet: self-attention based effective relation network for few-shot learning. Pattern Recogn. 133, 109024 (2023). https://www.sciencedirect.com/science/article/pii/S0031320322005040
72. Peng, Y., Liu, Y., Tu, B., Zhang, Y.: Convolutional transformer-based few-shot learning for cross-domain hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 16, 1335–1349 (2023)
73. Liu, Y., Shi, D., Lin, H.: Few-shot learning with representative global prototype (2023). https://openreview.net/forum?id=vT2OIobt3pQ
74. Lin, S., Zeng, X., Zhao, R.: Explore the power of dropout on few-shot learning (2023)
75. Da, Q., Huang, X., Li, Z., Zuo, Y., Zhang, C., Liu, J., Chen, W., Li, J., Xu, D., Hu, Z., Yi, H., Guo, Y., Wang, Z., Chen, L., Zhang, L., He, X., Zhang, X., Mei, K., Zhu, C., Lu, W., Shen, L., Shi, J., Li, J., Krishnamurthi, S.S.G., Yang, J., Lin, T., Song, Q., Liu, X., Graham, S., Bashir, R.M.S., Yang, C., Qin, S., Tian, X., Yin, B., Zhao, J., Metaxas, D.N., Li, H., Wang, C., Zhang, S.: Digestpath: a benchmark dataset with challenge review for the pathological detection and segmentation of digestive-system. Med. Image Anal. 80, 102485 (2022). https://www.sciencedirect.com/science/article/pii/S1361841522001323
76. Kumar, N., Verma, R., Sharma, S., Bhargava, S., Vahadane, A., Sethi, A.: A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Trans. Med. Imaging 36(7), 1550–1560 (2017)
77. Veta, M., Heng, Y.J., Stathonikos, N., Bejnordi, B.E., Beca, F., Wollmann, T., Rohr, K., Shah, M.A., Wang, D., Rousson, M., Hedlund, M., Tellez, D., Ciompi, F., Zerhouni, E., Lanyi, D., Viana, M., Kovalev, V., Liauchuk, V., Phoulady, H.A., Qaiser, T., Graham, S., Rajpoot, N., Sjöblom, E., Molin, J., Paeng, K., Hwang, S., Park, S., Jia, Z., Chang, E.I.-C., Xu, Y., Beck, A.H., van Diest, P.J., Pluim, J.P.: Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge. Med. Image Anal. 54, 111–121 (2019)
78. Veeling, B.S., Linmans, J., Winkens, J., Cohen, T., Welling, M.: Rotation equivariant CNNs for digital pathology (2018)
79. Bejnordi, B.E., Veta, M., Van Diest, P.J., Van Ginneken, B., Karssemeijer, N., Litjens, G., Van Der Laak, J.A., Hermsen, M., Manson, Q.F., Balkenhol, M., Geessink, O.: Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210 (2017). https://doi.org/10.1001/jama.2017.14585
80. Bándi, P., Geessink, O., Manson, Q., Van Dijk, M., Balkenhol, M., Hermsen, M., Ehteshami Bejnordi, B., Lee, B., Paeng, K., Zhong, A., Li, Q., Zanjani, F.G., Zinger, S., Fukuta, K., Komura, D., Ovtcharov, V., Cheng, S., Zeng, S., Thagaard, J., Dahl, A.B., Lin, H., Chen, H., Jacobsson, L., Hedlund, M., Çetin, M., Halici, E., Jackson, H., Chen, R., Both, F., Franke, J., Küsters-Vandevelde, H., Vreuls, W., Bult, P., van Ginneken, B., van der Laak, J., Litjens, G.: From detection of individual metastases to classification of lymph node status at the patient level: the camelyon17 challenge. IEEE Trans. Med. Imaging 38(2), 550–560 (2019)
81. Babaie, M., Kalra, S., Sriram, A., Mitcheltree, C., Zhu, S., Khatami, S.A., Rahnamayan, S., Tizhoosh, H.R.: Classification and retrieval of digital pathology scans: a new dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 760–768 (2017)
82. Sirinukunwattana, K., Pluim, J.P.W., Chen, H., Qi, X., Heng, P.-A., Guo, Y.B., Wang, L.Y., Matuszewski, B.J., Bruni, E., Sanchez, U., Böhm, A., Ronneberger, O., Cheikh, B.B., Racoceanu, D., Kainz, P., Pfeiffer, M., Urschler, M., Snead, D.R.J., Rajpoot, N.M.: Gland segmentation in colon histology images: the GLAS challenge contest (2016)
83. Sirinukunwattana, K., Snead, D., Rajpoot, N.: A stochastic polygons model for glandular structures in colon histology images. IEEE Trans. Med. Imaging 34, 05 (2015)
Author Biographies

Urszula Markowska-Kaczmar is a full professor in the Artificial Intelligence Department at the Wroclaw University of Science and Technology. Since the beginning of her scientific career, she has been interested in research on computational intelligence methods. Her initial papers concerned classical neural networks and the possibilities of explaining their operation. From there, her research moved through pulsed neural networks to deep models and their applications. She is the author of over 150 publications on neural networks and machine learning in scientific journals and conferences. She is a PC member of top conferences in the field of neural networks and artificial intelligence and a reviewer for renowned journals in this domain. In 2022, she was nominated to the Top 100 list of Polish women implementing the success of technology in the field of Artificial Intelligence.

Joanna Szołomicka is a Ph.D. student in the Artificial Intelligence Department at the Wroclaw University of Science and Technology. During her master’s studies in the field of artificial intelligence, she co-authored publications on deep learning methods. She is fascinated by the applications of artificial intelligence in medicine. She was involved in developing a system for automatically diagnosing masticatory system dysfunctions. Currently, she is working on detecting HER2-positive cancers using few-shot learning.
Chapter 6
From Pixels to Diagnosis: AI-Driven Skin Lesion Recognition

Monica Bianchini, Paolo Andreini, and Simone Bonechi
Abstract Skin cancer is a serious public health problem with a sharply increasing incidence in recent years, which has a major impact on quality of life and can be disfiguring or even fatal. Deep learning techniques can be used to analyze dermoscopic images, resulting in automated systems that can improve the clinical confidence of the diagnosis (also avoiding unnecessary surgery), help clinicians objectively communicate its outcome, reduce errors related to human fatigue, and cut costs affecting the health system. In this chapter, we present an entire pipeline to analyze skin lesion images in order to distinguish nevi from melanomas, also integrating patient clinical data to reach a diagnosis. Furthermore, to make our artificial intelligence tool explainable for both clinicians and patients, dermoscopic images are further processed to obtain their segmented counterparts, where the lesion contour is easily observable, and saliency maps, highlighting the areas of the lesion that prompted the classifier to make its decision. Experimental results are promising and have been positively evaluated by human experts.

Keywords Deep learning · Dermoscopic images · Image segmentation · Nevus/melanoma classification · Saliency maps · Explainability
M. Bianchini (B) · P. Andreini · S. Bonechi
Department of Information Engineering and Mathematics, University of Siena, Siena, Italy

S. Bonechi
Department of Social, Political and Cognitive Science, University of Siena, Siena, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_6
6.1 Introduction

Artificial Intelligence (AI) is certainly the defining technology of the last decade, and probably of the next as well. In recent years, AI has woven itself into our daily lives and has become so pervasive that many people remain unaware of both its impact and their dependence on it. Throughout our day, AI technology drives much of what we do. As soon as we get up in the morning, we take out the cell phone or turn on the laptop to start our activities. Doing so has become completely automatic and natural, and we see no other way to make decisions, plan our work or find information.

AI is the intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals and humans. The human brain comprises approximately 100 billion neurons connected by approximately 100 trillion synapses and, to date, neuroscience has not yet fully disclosed the mechanism by which it acquires knowledge, which determines its intellectual abilities. Nonetheless, in order to efficiently mimic the nervous system, it is necessary to have an idea of the nature of the processes that take place in the brain. Every cognitive process evolves from an initial situation, associated with a stimulus, up to a terminal one, in which there is a response that constitutes the result of the intellectual process. It is intuitive that, in this evolution, there is a conservative transfer of information. Since the brain structure and the electromagnetic activity of individual biological cells are known, inferences can be made on the collective behavior of neurons and inspiration can be drawn for the construction of machines capable of replicating intellectual tasks. The human being is constantly evolving and learning; this mirrors how AI works at its core. Human intelligence, creativity, knowledge, experience and innovation are the drivers for the expansion of current and future AI technologies.
Over time, artificial intelligence has increasingly identified itself with Machine Learning (ML) – though machine learning is simply one possible way to achieve artificial intelligence – and, recently, with Deep Learning (DL). Machine learning predicts results based on incoming data without being explicitly programmed to do so. To this aim, the machine needs to find relevant patterns that it identifies by analyzing the variety of data it receives. Through this process, the machine learns and creates its knowledge base, which automatically improves through experience. The real challenge for AI has always been solving tasks that are easy for people to perform but difficult to formally describe: problems that humans solve intuitively, but which have a strong perceptual component. In fact, the perceptive abilities, developed during an evolutionary process of hundreds of thousands of years, are difficult to replicate not only using symbolic computational models, but also classical machine learning. Deep learning, instead, allows “computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined through its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all the knowledge that the computer needs. The hierarchy of concepts enables the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph
showing how these concepts are built on top of each other, the graph is deep, with many layers” [16].

Convolutional Neural Networks (CNNs) are a class of deep architectures inspired in particular by the organization of the visual cortex. The development and subsequent massive use of CNNs was an epochal event in image processing, making it possible to reach performance that was unimaginable until a few years ago in image classification [21, 24], semantic segmentation [11, 26, 38], object detection [20, 30] and action recognition [12], to name just a few applications. CNNs are also becoming very popular in the medical field [25], where many decision support systems have been developed for the automatic reporting of medical tests [2, 9], for the segmentation of retinal fundus images [3] and for the analysis of radiological, magnetic resonance, PET (Positron Emission Tomography) or CT (Computerized Axial Tomography) images [7, 32].

Indeed, AI can affect several processes in diagnostic imaging. It can be employed to optimize technological and human resources, through greater efficiency of workflows and patient management. AI algorithms can analyze and interpret medical images, and can also help harmonize the protocols used in the execution of such exams. They can help in image reconstruction, since an intelligent system is able, for instance, to reduce artifacts due to patient movement and to guarantee the same image quality while reducing the time required to perform an exam. CNNs can be trained to understand images and detect anomalies in particular systems or organs. They can be trained to characterize lesions: for example, to recognize whether an abnormality shown on a mammogram is benign or malignant, or whether a pulmonary nodule shown on CT should be removed or can simply be monitored. In this process, the information obtained from the images can be correlated with patient data, such as lifestyle habits or blood test results.
Correlating the data coming from different methods of analysis with the clinical history of the patients allows them to be classified into homogeneous risk groups, so that personalized treatment protocols and plans can be established.

Deep learning techniques have also been successfully applied in the dermatological field to segment and classify nevi and melanomas (see, for instance, [4, 8, 15, 19, 35, 36]). A comprehensive survey of the latest advances in deep learning methods for skin lesion diagnosis based on dermoscopic images can be found in [27]. These techniques prove to be quite effective, especially in recognizing atypical nevi, also called dysplastic nevi [36]. This type of nevus is difficult to distinguish from melanoma even for experienced dermatologists. In fact, they are unusual-looking nevi that have irregular characteristics under the microscope (asymmetrical, jagged edges, and irregular coloring) and can occur anywhere on the body. Although benign, they differ slightly from common moles both clinically and histologically because of their disorganized architecture and melanocytic atypia, which make them visually similar to melanomas.

Very rare before puberty, melanoma predominantly affects people aged between 30 and 60. Considered a rare neoplasm until recently, today it shows a constantly growing incidence all over the world. Globally, it is estimated that, in the last decade, skin melanoma has reached 100,000 new cases a year: an increase of about 15% compared to the previous decade. Cutaneous melanoma is, in particular, dozens of times more frequent in subjects of European
countries (Caucasians) than in other ethnic groups. The highest incidence rates are in fact found in very sunny areas inhabited by populations with particularly light skin (like in Northern Europe). In Italy, the estimate of melanomas, and of the deaths attributed to them, is still approximate: it is around 7,000 cases per year. Although cutaneous melanoma comprises less than 5% of all skin tumor cases, it causes the majority (75%) of skin cancer deaths. Regular clinical screenings and head-to-toe self-examinations are recommended to detect melanoma in its earlier stages, when the lesion is smaller than 2 mm and can be easily removed with surgery. If melanoma is diagnosed at a more advanced stage, in which the cancer has already spread to lymph nodes, excision is insufficient. To treat these cases, surgery must be combined with radiotherapy, immunotherapy, or targeted therapy [14]. The development of cutaneous melanoma is a complex phenomenon, due to interactions between environmental and endogenous factors, including phototype, number of nevi, presence of atypical nevi, genetic alterations, and UV exposure, which is recognized to be the major risk factor.

In this context, deep learning techniques can be employed to develop decision support systems that speed up the early diagnosis of melanoma and drastically increase the chances of a positive outcome of the disease. Furthermore, unnecessary removal operations, which are painful for the patient and costly for the healthcare system, can be avoided. Deep learning models need large supervised datasets to be trained. For the analysis of skin lesions, the public ISIC (International Skin Imaging Collaboration) Archive [22] is available, which collects more than 60,000 images. All images come with classification labels, while only a small subset is also labeled at the pixel level for segmentation.
Indeed, only 2,694 images along with their segmentation supervision were released for the 2018 ISIC segmentation challenge [13, 37], due to the prohibitive cost of manual pixel-by-pixel labeling, especially for medical imaging. However, using the lesion segmentation as additional information for the classifier has proved to be quite useful in improving its accuracy [19, 35], which is not surprising given that a dermatologist's diagnosis depends heavily on the lesion boundaries. From a different point of view, the one followed in our approach, segmented images can be provided directly to the human expert as a distilled form of visualization that highlights the contours of the lesion and allows their irregularity or smoothness to be evaluated.

In this chapter, we present a whole pipeline to develop an intelligent tool for the diagnosis of nevi and melanomas (Fig. 6.1). First of all, deep learning techniques are used to classify skin lesions, making use of information fusion techniques to take into account not only dermoscopic images but also clinical features (including the age and gender of the patient and the position of the lesion). A modular architecture is set up that processes images and patient data separately and in parallel, with a dedicated module for each. The partial results are then combined (by a fusion module) in order to reach a diagnosis. Since explainability is one of the hottest topics when it comes to the application of AI in healthcare, we then extract saliency maps from the images, highlighting the areas on which the network focuses to make its decision. Indeed, while it is clear, in practice, that AI-powered systems outperform humans at certain analytic tasks, the lack of explainability continues to draw criticism and leaves medical experts mistrustful of decisions whose motivations are unknown. Saliency maps cannot explain why a decision has been taken, but certainly help in specifying where useful information can be found within an image. Finally, a segmentation network is trained to extract the lesion boundaries. To overcome the lack of pixel-level annotation in the ISIC dataset, we employed the supervisions provided in the publicly available ISIC Weak Segmentation Maps (ISIC_WSM) dataset, obtained with the weakly supervised approach described in [5]. The supervision achieved with this approach may not ensure the same level of accuracy as manual annotation, but it is still highly effective (as proved by our experiments), and considerably reduces the cost and time needed to create labeled data. Indeed, by visualizing the lesion boundaries, the human expert can observe the shape and contour of the lesion, which can provide additional clues for the diagnosis, since melanomas tend to have irregular and asymmetrical boundaries, while benign lesions usually have smooth and symmetrical contours. In the proposed pipeline, the segmented images, combined with the corresponding saliency maps, are provided to the dermatologist to help understand the network diagnosis.

Fig. 6.1 Proposed pipeline for skin lesion analysis

The remainder of this chapter is organized as follows. In the next section, we briefly describe the datasets used in our experiments, namely the publicly available ISIC Archive and ISIC_WSM, which are respectively used to address the tasks of classification and segmentation of dermoscopic images. Also in Sect. 6.2, the neural network architectures used for both classification and segmentation are introduced. Section 6.3 presents the entire pipeline used for the analysis of skin lesions (which includes the binary classification of nevi/melanomas, based on both dermoscopic images and clinical data, image segmentation and saliency map extraction) and the experimental results we obtained. Section 6.4 introduces the issues related to the reliability and explainability of AI techniques used in the medical field, illustrating how both saliency maps and segmented images can help the doctor understand how the automatic diagnosis was achieved. Finally, Sect. 6.5 collects some conclusions and future perspectives. For the sake of clarity, a list of abbreviations and acronyms used in this chapter is reported in Table 6.1.
M. Bianchini et al.
Table 6.1 List of abbreviations and acronyms used in this chapter

Abbreviation   Definition
AI             Artificial Intelligence
CNN            Convolutional Neural Network
CT             Computerized (Axial) Tomography
DL             Deep Learning
GDPR           General Data Protection Regulation
Grad-CAM       Gradient-weighted Class Activation Mapping
ISIC           International Skin Imaging Collaboration
ISIC_GT        ISIC Ground Truth
ISIC_WSM       ISIC Weak Segmentation Maps
ML             Machine Learning
MLP            MultiLayer Perceptron
PET            Positron Emission Tomography
PSPNet         Pyramid Scene Parsing Network
ResNet         Residual Network
SMANet         Segmentation Multiscale Attention Network
UV             Ultra Violet
WSM            Weakly Segmentation Maps
YOLO           You Only Look Once
6.2 Materials and Methods

The ISIC and ISIC_WSM datasets used in this work are presented in Sect. 6.2.1, while the classification model and the segmentation networks are described in Sect. 6.2.2 and Sect. 6.2.3, respectively.
6.2.1 Datasets

ISIC Archive

The International Skin Imaging Collaboration (ISIC) [22] is a joint project between academia and industry aimed at reducing melanoma mortality through the application of digital skin imaging. The ISIC dataset is an open-source repository of skin images designed to facilitate and encourage the development of automated diagnostic systems in this field. The archive contains over 60,000 images of skin lesions from various sources, along with the diagnosis and some anamnestic data of the patients. In 2018, ISIC launched a segmentation challenge [13, 37], releasing segmentation supervision for 2,694 images (2,594 for training and 100 for validation). These label maps were generated using different approaches, including manual and semi-automatic methods, and were reviewed and curated by dermatologists specialized in dermoscopy. An additional test set of 1,000 images was also released, though the segmentation supervision for these images has not been made publicly available. Models can still be evaluated on these images using the ISIC evaluation server.¹

ISIC_WSM

The ISIC Weak Segmentation Maps (ISIC_WSM) dataset² provides segmentation supervision for approximately 43,000 images from the ISIC database. The annotations were created using the weakly supervised approach described in [5]. This approach uses a YOLO detector network [30] to extract bounding boxes around the lesions, which are then used to crop the images. A segmentation network is then trained to recognize the background and foreground (lesion) within the crop. Finally, the probability prediction of the segmentation network on the image crop is used to create the label map for the entire image. Using the probability prediction makes it possible to label not only the background and lesion classes but also an uncertain region, typically comprising the pixels on the edges of the lesion. This is particularly useful because lesion contours are often indistinct and difficult to define precisely.
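The conversion from a probability prediction to a three-class label map can be sketched as follows. This is only an illustration under stated assumptions: the thresholds `low` and `high`, the label codes, and the function name are hypothetical and are not taken from [5].

```python
# Hypothetical label codes; the actual ISIC_WSM encoding may differ.
BACKGROUND, LESION, UNCERTAIN = 0, 1, 255

def probabilities_to_label_map(prob_map, low=0.3, high=0.7):
    """Turn per-pixel lesion probabilities into background/lesion/uncertain labels.

    `low` and `high` are assumed thresholds chosen for illustration only.
    """
    label_map = []
    for row in prob_map:
        label_row = []
        for p in row:
            if p >= high:
                label_row.append(LESION)      # confidently lesion
            elif p <= low:
                label_row.append(BACKGROUND)  # confidently background
            else:
                label_row.append(UNCERTAIN)   # typically pixels near the border
        label_map.append(label_row)
    return label_map

probs = [[0.05, 0.40, 0.90],
         [0.10, 0.65, 0.95]]
labels = probabilities_to_label_map(probs)
```

Pixels marked as uncertain could then be ignored by the loss during training, so that ambiguous border pixels do not penalize the network; whether and how this is done in [5] is not stated here.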
6.2.2 Classification Model: ResNet

Residual Networks (ResNets) [21] are an exceptionally successful family of deep neural network architectures that are widely employed in computer vision. The key innovation of ResNet is the use of residual connections (also known as skip connections) to simplify the network optimization. In particular, the network layers in a ResNet are reformulated to learn residuals, instead of directly learning the desired underlying mapping. This enables deeper network architectures to be trained successfully. ResNet architectures have provided state-of-the-art performance across many computer vision tasks and domains.
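The idea of learning a residual can be illustrated with a toy block that computes y = ReLU(F(x) + x). The sketch below is a simplification: it uses small fully-connected layers in plain Python, whereas ResNets use convolutional layers with batch normalization, and all function names are illustrative.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def linear(v, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """y = relu(F(x) + x), where F is two dense layers with a ReLU in between."""
    f = linear(relu(linear(x, w1, b1)), w2, b2)   # the residual F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # skip connection adds x back

x = [1.0, -2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
# With all-zero weights F(x) = 0, so the block reduces to relu(x).
y = residual_block(x, zero_w, zero_b, zero_w, zero_b)
```

The example shows why the reformulation helps optimization: a block whose weights are driven to zero still passes its input through, so adding layers does not force the network to relearn the identity mapping.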
6.2.3 Segmentation Networks

SMANet

The Segmentation Multiscale Attention Network (SMANet) [10] is a modified version of the Pyramid Scene Parsing Network (PSPNet) [38], which uses a ResNet [21] encoder with dilated convolutions (i.e., atrous convolutions [29]) in place of pooling layers to increase the receptive field. In the PSPNet, a pyramid of pooling layers with different kernel sizes was used to gather context information, and the desired per-pixel prediction was obtained by directly upsampling the network output to the original image resolution. In the SMANet, an additional multiscale attention mechanism was added to the pooling module to focus on relevant objects in the image and better handle the presence of objects at different scales.

SegNeXt

The SegNeXt model [18] is a state-of-the-art encoder-decoder architecture for semantic segmentation that rethinks the convolutional attention design. Rather than using standard convolutional blocks, SegNeXt replaces them with multiscale convolutional layers, which perform spatial attention for each stage of the encoder using a simple element-wise multiplication at the end of each block. This spatial awareness method has been shown to be more efficient than normal convolutions and self-attention. SegNeXt consists of a convolutional encoder and a decoder that includes a decomposition-based Hamburger module for global information extraction. In the decoder, information from several stages is combined, and the Hamburger module is used to extract the global context, allowing for the aggregation of multiscale context information from different levels. As a result, SegNeXt has proven to be more accurate than previous segmentation approaches, including those based on transformers.

¹ https://challenge.isic-archive.com/task/49/
² Publicly available at https://simonebonechi.github.io/downloads/isic_wsm
6.3 Skin Lesion Analysis

The proposed pipeline to perform skin lesion analysis (Fig. 6.1) comprises three main steps: classification, segmentation, and saliency map extraction. The main goal is to extract valuable information from the image to help dermatologists diagnose melanomas. The approach presented in this chapter uses a classification model, described in Sect. 6.3.1, which allows the analysis of dermoscopic images and patient data to suggest a diagnosis. However, to give clinicians a real and useful decision support system, it is necessary to integrate the proposed diagnosis with additional information that could help them to understand the network's decision. As a result, our pipeline incorporates two additional models: the segmentation model, described in Sect. 6.3.2, which extracts the border of the lesion, a key diagnostic feature typically used by doctors, and the saliency map estimator, presented in Sect. 6.3.3, which identifies the pixels of the image that contribute most to the classification. By incorporating these models, our pipeline provides doctors with a more complete and interpretable analysis of skin lesions, enhancing the accuracy and confidence of their diagnoses.
6.3.1 Skin Lesion Classification

Our skin lesion classifier employs deep learning models to predict the probability of having a nevus or a melanoma. However, what sets our approach apart is the use of information fusion techniques to take into account not only dermoscopic images but also clinical features such as age, gender, and the position of the lesion. A modular architecture is used to process images and patient data separately and in parallel, with a dedicated module for each one. The partial results are then combined by a fusion module to reach a diagnosis. Specifically, we trained the LesionNet, a deep convolutional neural network, to analyze the dermoscopic images, and the MetaNet, a fully-connected MultiLayer Perceptron (MLP), to process the clinical features. The features extracted from the inner layer of these networks are then concatenated and fed into the MergedNet, a network specifically trained to classify the lesions by combining the information coming from the image and from anamnestic data. By using these networks in combination, we are able to leverage the strengths of each approach and achieve a more accurate and reliable classification of skin lesions. Figure 6.2 provides an overview of our approach, which comprises three main components: the LesionNet, the MetaNet, and the MergedNet. In the following, we provide a detailed description of each of these components.

Fig. 6.2 The lesion classification scheme: LesionNet and MetaNet are used to analyze lesion images and anamnestic data, respectively, producing encoded feature representations. MergedNet combines these representations to determine the probabilities of a nevus or melanoma

LesionNet

To implement the LesionNet we used a pre-trained CNN. In particular, we chose ResNet50, which is known for its good performance on various image classification tasks while balancing computational costs. To ensure that the input was compatible with ResNet50, we resized all the images to a fixed dimension of 224×224 pixels. We employed ResNet50 as a feature extractor module to process the dermoscopic images through multiple consecutive layers of convolutions, obtaining a set of relevant high-level features. During training, we employed standard data augmentation techniques, including vertical and horizontal flips and rotations of 90°, 180°, and 270°, to artificially enlarge the sample size, thus reducing the risk of overfitting. Moreover, we normalized the images to ensure that their values ranged from 0 to 1.
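The augmentation and normalization steps described for the LesionNet can be sketched in a few lines. This is a minimal illustration on a single-channel image stored as nested lists; it assumes 8-bit pixel values (so normalization divides by 255), which the chapter does not state, and the function names are illustrative.

```python
def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror the image top-to-bottom."""
    return [list(row) for row in img[::-1]]

def rot90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def normalize(img):
    """Scale pixel values to [0, 1], assuming 8-bit input."""
    return [[p / 255.0 for p in row] for row in img]

def augment(img):
    """Return the original image plus its flipped and rotated variants."""
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return [img, hflip(img), vflip(img), r90, r180, r270]

image = [[0, 255],
         [51, 102]]
variants = [normalize(v) for v in augment(image)]
```

Because dermoscopic images have no canonical orientation, flips and right-angle rotations produce equally plausible training samples without interpolation artifacts.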
MetaNet

The MetaNet is a fully-connected neural network with three hidden layers, specifically designed to process the vector of clinical features associated with each skin lesion. The feature vector elements include age, lesion location, gender, and melanocytic status, each of which is processed and encoded in a particular way. As regards age, we discretized the values into 5-year intervals ranging from 0 to 95 (plus an additional class for ages greater than 95 years) and applied one-hot encoding to these age groups. The lesion location is indicated by the body part where the lesion occurs, and we assigned a different code to each location, using one-hot encoding. The body parts include the anterior torso, upper extremity, lower extremity, lateral torso, palms/soles, and head/neck. If this information is missing, a null value is used. Gender is encoded using a two-bit code. Finally, for melanocytic status, we used a two-bit code to indicate the presence or absence of melanocytes in the lesion. The MetaNet extracts meaningful features from the encoded anamnestic data that can be used in combination with the information extracted from the image.

MergedNet

To perform the final classification, MergedNet analyzes a concatenation of the last hidden layer outputs from LesionNet and MetaNet. However, the feature vectors of the two networks have different dimensions: to balance their contributions, a small fully-connected network is applied to the LesionNet feature vector, reducing its size. The MergedNet architecture includes three hidden layers and a softmax output layer with two units that produce a binary classification of the lesion.

Experimental Setup

The ISIC dataset was split using 85% of the data for training, 5% for validation, and the remaining 10% for testing. Furthermore, each set contains an equal number of benign and malignant cases to prevent any imbalance within the subsets. The LesionNet and the MetaNet are first trained independently. Their outputs are then combined to form the input to a single architecture (MergedNet). The entire MergedNet architecture is trained at once, with error backpropagation occurring from the output layer of the MergedNet to the layers of both the LesionNet and MetaNet.
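The encoding of the clinical features fed to the MetaNet, described above, can be sketched as follows. Several details are assumptions made for illustration: an age of exactly 95 is placed in the last class, a missing location is encoded as an all-zero vector, the bit patterns of the two-bit codes are arbitrary, and all names are hypothetical.

```python
LOCATIONS = ["anterior torso", "upper extremity", "lower extremity",
             "lateral torso", "palms/soles", "head/neck"]
N_AGE_CLASSES = 20  # 19 five-year bins covering [0, 95) plus one class for older ages

def one_hot(index, size):
    """One-hot vector of the given size; all zeros when index is None."""
    vec = [0] * size
    if index is not None:
        vec[index] = 1
    return vec

def encode_age(age):
    """Five-year bins; ages of 95 and above share the last class (assumption)."""
    return one_hot(min(int(age) // 5, N_AGE_CLASSES - 1), N_AGE_CLASSES)

def encode_location(location):
    """One-hot body part; a missing location gives the null (all-zero) vector."""
    index = LOCATIONS.index(location) if location in LOCATIONS else None
    return one_hot(index, len(LOCATIONS))

def encode_flag(value):
    """Two-bit code for a binary attribute (bit order is an assumption)."""
    return [1, 0] if value else [0, 1]

def encode_patient(age, location, gender, melanocytic):
    """Concatenate the encoded fields into a single feature vector."""
    return (encode_age(age) + encode_location(location)
            + encode_flag(gender == "female") + encode_flag(melanocytic))

features = encode_patient(42, "head/neck", "female", True)
```

Under these assumptions the MetaNet input has 20 + 6 + 2 + 2 = 30 entries; the actual input size used in the chapter is not reported.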
6.3.1.1 Classification Results
We evaluated the classification performance using the experimental setups described in the previous section. The results obtained on the validation set and on the test set of the ISIC dataset are reported in Table 6.2. The results clearly demonstrate the significant improvement in performance achieved by incorporating clinical data in the analysis. Indeed, classifying the images alone (using the LesionNet) yields a test accuracy about 5 percentage points lower than our combined approach (MergedNet). It is worth noting that using the anamnestic data of the patient together with the visual inspection of the skin lesion is the standard procedure in dermatological diagnostics. In our setup, we show that it is fundamental even for the automatic analysis of dermoscopic images based on CNNs.

Table 6.2 Validation and test accuracies of the three network architectures. LesionNet is trained on only images, MetaNet uses only anamnestic data, MergedNet uses the MetaNet and the LesionNet as feature extractors

Model       Validation accuracy (%)   Test accuracy (%)
LesionNet   87.00                     83.44
MetaNet     83.00                     80.6
MergedNet   91.00                     88.34
6.3.2 Skin Lesion Segmentation

Model Training

Deep segmentation models require a considerable amount of training data to be effective. Although the ISIC Archive provides a large number of images for skin lesion classification, only a small subset of them (2,694) have pixel-level labels for segmentation. To address the limited availability of pixel-level labeled images we employed the ISIC_WSM dataset. The supervision of this dataset, obtained with a weakly supervised approach inspired by [6], could be less accurate than manual annotation. However, as we experimentally prove, it is still highly effective if included in the network training. All the experiments were carried out with the two segmentation models previously described (SMANet [10] and SegNeXt [18]), based on the following setups:

• ISIC_GT (ISIC Ground Truth): The networks were trained with only the 2,594 images of the ISIC 2018 segmentation challenge;
• WSM (Weakly Segmentation Maps): Only the generated label maps of the ISIC_WSM dataset were employed in the training phase;
• WSM + ISIC_GT: The segmentation networks were first trained on the weakly supervised label maps and then fine-tuned on the original ground truth supervisions, provided for the 2018 segmentation challenge.

To standardize the variation in image resolution, the largest side of the image was scaled to 1024 pixels while preserving the aspect ratio. Moreover, to augment the dataset, the networks were trained on random crops of 377×377 pixels with random flipping (horizontal and vertical) and rotation (90°, 180°, and 270°). Additionally, random zooms with factors of 0.4, 0.6, 0.8, and 1 were applied to the image patches. The Adam optimizer has been employed, with a learning rate of 10⁻⁴ and a minibatch of 15 examples. To stop the network training, a validation set of 100 images from the ISIC database was used in both the ISIC_GT and WSM+ISIC_GT setups.
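The geometric part of this training setup (rescaling the largest side to 1024 pixels, then sampling a random 377×377 crop with flips, rotation, and zoom) can be sketched as a parameter sampler. This is only a sketch: the order in which the transformations are applied, the flip probability of 0.5, and the inclusion of "no rotation" are assumptions, and the function names are hypothetical.

```python
import random

ZOOM_FACTORS = (0.4, 0.6, 0.8, 1.0)
ROTATIONS = (0, 90, 180, 270)  # 0 means no rotation (assumed to be allowed)

def scaled_size(width, height, largest_side=1024):
    """New (width, height) with the largest side set to `largest_side`."""
    scale = largest_side / max(width, height)
    return round(width * scale), round(height * scale)

def sample_augmentation(width, height, crop=377, rng=random):
    """Sample one random crop box plus flip, rotation, and zoom parameters."""
    left = rng.randint(0, max(0, width - crop))
    top = rng.randint(0, max(0, height - crop))
    return {
        "box": (left, top, left + crop, top + crop),
        "hflip": rng.random() < 0.5,   # assumed flip probability
        "vflip": rng.random() < 0.5,
        "rotation": rng.choice(ROTATIONS),
        "zoom": rng.choice(ZOOM_FACTORS),
    }

width, height = scaled_size(6000, 4000)
params = sample_augmentation(width, height, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps the sampling reproducible, which is convenient when comparing the three training setups.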
Table 6.3 Results on the 100 images of the validation set of the ISIC 2018 segmentation challenge, obtained with the SMANet and the SegNeXt, based on the three experimental setups

Setups        SMANet (Mean IoU) (%)   SegNeXt (Mean IoU) (%)
ISIC_GT       78.7                    87.6
WSM           84.5                    87.4
WSM+ISIC_GT   87.3                    88.2

Table 6.4 Results on the 1000 images of the test set of the ISIC 2018 segmentation challenge, obtained with the SMANet and the SegNeXt, based on the three experimental setups

Setups        SMANet (Mean IoU) (%)   SegNeXt (Mean IoU) (%)
ISIC_GT       59.9                    76.3
WSM           70.3                    77.6
WSM+ISIC_GT   74.1                    78.7
For the WSM setup, a subset of about 100 randomly selected images was extracted from the generated label maps and used as the validation set. In the evaluation phase, the largest side of the image was scaled to a fixed size (1024 pixels), as in training, and a sliding window approach was applied, using the same window size as in training, with an overlap of 50% between adjacent windows. The predicted label map was then interpolated using a nearest-neighbor approach to recover the original size. Both SMANet and SegNeXt were evaluated using the same experimental setup. All experiments were conducted in a Linux environment; the SMANet was trained on an NVIDIA RTX 2080 Ti, while the SegNeXt training was carried out on an NVIDIA TITAN RTX.
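The sliding-window evaluation can be illustrated by computing the window positions for a rescaled image. The sketch below assumes a 377-pixel window, a stride of half the window (rounded down), and an extra final window aligned with the image border; how overlapping predictions are merged is not specified in the text and is left out. Function names are illustrative.

```python
def window_starts(length, window=377, overlap=0.5):
    """Start offsets of windows along one axis with the given overlap."""
    stride = max(1, int(window * (1 - overlap)))
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # last window flush with the border
    return starts

def sliding_windows(width, height, window=377, overlap=0.5):
    """All (left, top, right, bottom) boxes covering the image."""
    return [(x, y, x + window, y + window)
            for y in window_starts(height, window, overlap)
            for x in window_starts(width, window, overlap)]

boxes = sliding_windows(1024, 683)
```

For a 1024×683 image this yields 5 horizontal and 3 vertical positions, so the network is run on 15 overlapping windows.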
6.3.2.1 Segmentation Results
We conducted an evaluation of the two network architectures on the basis of the three experimental setups described in the previous section. The evaluation was first performed on the official validation set provided for the ISIC 2018 segmentation challenge, and the results are presented in Table 6.3. As we can observe, with both models, the best Mean Intersection-over-Union (MIoU) is obtained when the generated supervision is employed to train the network and the original ground truth is used to fine-tune it. However, it is important to note that these results are not a true indicator of the model’s performance, as the validation set was used to early stop the network training in the ISIC_GT and WSM+ISIC_GT setups. To perform a proper model performance evaluation, it is necessary to conduct the evaluation on the test set, submitting the results obtained by the two models to the ISIC evaluation server. The results of our models on the test set can be found in Table 6.4. The effectiveness of including the ISIC_WSM dataset in the training process is evident also in this case. Using only the ISIC_WSM data (WSM configuration)
results in an increase in the MIoU of about 10 percentage points with SMANet and 1.3 with SegNeXt, compared to using only the 2,594 images provided for the ISIC segmentation challenge. Results further improve when the model trained in the WSM setup is fine-tuned on the ISIC training set (WSM+ISIC_GT), resulting in a total MIoU increase of 14.2 percentage points with SMANet and 2.4 with SegNeXt. In addition, Figs. 6.3 and 6.4 present a qualitative assessment of the segmentation results obtained from the models trained in the three proposed setups on the validation and test sets, respectively. As shown in the figures, including ISIC_WSM in the network training results in a highly accurate segmentation of the lesions and an improved ability to detect their boundaries. Segmentation can assist doctors in visualizing the contour and shape of a lesion, which are crucial features considered for diagnosis. Indeed, asymmetrical and irregular boundaries are commonly observed in melanomas, whereas benign lesions tend to have symmetrical and smooth boundaries. Therefore, retrieving lesion boundaries through segmentation can provide valuable information to doctors, enabling them to evaluate the consistency of the diagnoses.
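For reference, the Mean Intersection-over-Union used in Tables 6.3 and 6.4 can be sketched for a binary label map as follows. This shows the generic definition averaged over the background and lesion classes of one image; the exact scoring protocol of the ISIC evaluation server is not described in the chapter and may differ.

```python
def mean_iou(pred, target, classes=(0, 1)):
    """Average IoU over the classes that appear in pred or target."""
    scores = []
    for cls in classes:
        inter = union = 0
        for prow, trow in zip(pred, target):
            for p, t in zip(prow, trow):
                inter += int(p == cls and t == cls)  # pixel counted by both
                union += int(p == cls or t == cls)   # pixel counted by either
        if union:  # skip classes absent from both maps
            scores.append(inter / union)
    return sum(scores) / len(scores)

pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 0, 0]]
score = mean_iou(pred, target)  # lesion IoU 1/2, background IoU 4/5
```

In this toy example a single extra lesion pixel halves the lesion IoU, which shows how sensitive the metric is to boundary errors on small lesions.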
6.3.3 Saliency Map Extraction

Saliency map extraction helps identify the regions of an input image that contribute the most to a network prediction. It thus offers insight into how the network processes a given image and makes its predictions, making the model decision easier to interpret. Various techniques can be used to extract saliency maps, such as Gradient-weighted Class Activation Mapping (Grad-CAM) [33] or Guided Backpropagation [34]. Grad-CAM calculates the gradients of the output of the network with respect to the feature maps of the last convolutional layer and then weights the feature maps by the gradients to obtain a localization map. Guided Backpropagation, instead, uses the gradients of the output with respect to the input to highlight the pixels that have the strongest influence on the prediction. In this work, we employed the Guided Backpropagation technique using the Python library iNNvestigate Neural Networks [1] to extract the saliency map from the classification model presented in Sect. 6.3.1. Some examples of the extracted saliency maps are shown in Fig. 6.5. The first lesion shown in Fig. 6.5 corresponds to a melanoma, correctly predicted by the network. The corresponding saliency map shows that the network focuses its attention on the irregular border and on some parts of the lesion that have a variable color. These are all characteristics that are actually linked to melanomas and could help the doctor to understand and trust the prediction. The other two lesions are also melanomas, but the network classified them incorrectly. As can be seen from the saliency maps, the network does not find any part of the lesion with high salience. In this case, the dermatologist could quickly understand that the decision made by the network is not reliable.
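The Grad-CAM weighting described above can be sketched in plain Python. Note that this illustrates Grad-CAM, not the Guided Backpropagation procedure actually used in this work through iNNvestigate. Following the standard formulation in [33], each feature map is weighted by the spatial average of its gradient and a ReLU is applied to the weighted sum; the feature maps and gradients are assumed to be given as nested lists, and the function name is illustrative.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM localization map from K feature maps and their gradients.

    Both arguments are lists of K maps, each a list of rows (H x W).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # One weight per feature map: the spatial mean of its gradient.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    return [[max(0.0, sum(w * fmap[i][j]
                          for w, fmap in zip(weights, feature_maps)))
             for j in range(width)]
            for i in range(height)]

maps = [[[1.0, 0.0], [0.0, 0.0]],   # responds in the top-left corner
        [[0.0, 0.0], [0.0, 1.0]]]   # responds in the bottom-right corner
grads = [[[1.0, 1.0], [1.0, 1.0]],      # supports the predicted class
         [[-1.0, -1.0], [-1.0, -1.0]]]  # opposes the predicted class
cam = grad_cam(maps, grads)
```

In the toy example only the top-left location is highlighted, because the second feature map has a negative weight and its contribution is removed by the ReLU.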
Fig. 6.3 Results on the validation set with both network architectures in the three experimental setups (panels: Original Images, SMANet ISIC, SMANet WSM, SMANet WSM+ISIC, SegNeXt ISIC, SegNeXt WSM, SegNeXt WSM+ISIC, Ground Truth)

Fig. 6.4 Results on the test set with both network architectures in the three experimental setups (panels: Original Images, SMANet ISIC, SMANet WSM, SMANet WSM+ISIC, SegNeXt ISIC, SegNeXt WSM, SegNeXt WSM+ISIC)
Fig. 6.5 Some examples of saliency maps extracted with the iNNvestigate neural network library: (a) original image, (b) saliency map, (c) saliency map overlapped with original images

6.4 Explainable AI in Medical Imaging

The field of learning trustworthily is rapidly evolving. It focuses on incorporating the human-relevant requirements of fairness, robustness, privacy, and interpretability into AI and, particularly, machine and deep learning. Concerns about fairness were raised by society as AI began to exhibit human bias. Robustness requires that DL models are not misled by carefully crafted malicious data or induced into unexpected errors by poisoned training data. Privacy preservation addresses the problem of keeping individual data private while still being able to employ them to obtain useful information about a population. Finally, explainability aims at providing model/outcome explanations for naturally black-box DL approaches, engendering trust in their users [28]. It is well established that DL algorithms used for image-based medical diagnosis, risk prediction, and informing triage decisions underperform for disadvantaged groups, such as women, or racial and ethnic minorities [17]. Moreover, DL models for medical image analysis are also vulnerable to adversarial attacks. Adversarial attacks against healthcare systems could potentially cause misdiagnosis and affect therapeutic judgments by imperceptibly altering medical images used as input to DL tools [31]. Large-scale collection, aggregation, and transmission of patient data can raise both legal and ethical issues. Furthermore, it is a fundamental patient right to have control of the storage, transmission, and usage of personal health data. Centralized data sharing practically eliminates this control, leading to a loss of sovereignty. Moreover, anonymized data, once transmitted, cannot easily be retrospectively updated or augmented, for example by introducing additional clinical information that becomes available. Federated learning aims to advance medical research without compromising health data privacy [23]. It allows an AI algorithm to learn on different datasets without removing the data from where they
were stored. Hospitals and research centers keep control over data governance and GDPR³ compliance. Additional privacy-preserving measures such as differential privacy and secure aggregation allow for new ways to protect data processed by federated learning. Finally, in all deep learning applications to healthcare, it is essential to reinforce transparency and explainability, in order to allow users to make decisions that can be understood, shared, and explained not only to other doctors or healthcare professionals but also to patients. In other words, a diagnostic system needs to be transparent, understandable, and explainable to gain the trust of both physicians and patients. Ideally, it should be able to explain the complete logic behind a certain diagnosis to all the parties involved. In particular, the human-in-the-loop approach places people's knowledge and experience at the center of machine learning processes and, in this specific case, involves the human expert who, with their experience, can push the AI tool toward their own way of judging, gradually acquiring trust in its behavior. For this purpose, after the classification phase, the dermatologist can take advantage of both the saliency maps, which highlight the areas of the image on which the CNN has based its decision, and the corresponding segmented images, which exhibit the characteristics of the lesion contour. In particular, the doctor can visually examine the lesion border, while an objective smoothness measure can also be obtained based, for instance, on a spline approximation of the contour. Figure 6.6 illustrates an example of our software pipeline's output for a given input lesion. The output includes the classification, the saliency map displayed alone and overlapped with the original image, as well as the highlighted border of the lesion on the original image.
³ General Data Protection Regulation (GDPR): a uniform data protection law for all EU member states, applicable since May 2018, aimed at protecting the privacy of European citizens on digital infrastructures around the world.

Fig. 6.6 Example of the output given by our overall software pipeline

6.5 Conclusions

The automatic diagnosis of the different forms of skin tumors, which include benign tumors in addition to the previously described types, is extremely important. It can prove life-saving in the case of nodular melanoma which, if mistaken for a benign pathology, can lead in a short time to metastatic carcinoma, still difficult to cure today. Moreover, teledermatology can be supported by artificial intelligence, although a total body skin examination is difficult to perform virtually, so clinically significant lesions can be missed. Besides, the implementation of teledermatology involves various technological costs (for example, equipment costs, technological skills, and personnel training), but also costs related to the need to protect the large quantities of sensitive data being transmitted. Teledermatology may also not be available to people who do not have access to a high-quality Internet connection or to devices with high-quality cameras.
Beyond these considerations, there is no doubt that AI-based dermatology has diagnostic potential that makes it superior to that of the individual healthcare professional, above all for its ability to identify suspected cases early. This, however, does not mean that the automatic AI tool should be treated as absolute, to the point of thinking that it can replace the specialist. Instead, it can be a valid support to the activity of doctors, both general practitioners and specialists in dermatology. Artificial intelligence algorithms can help improve early diagnosis, identifying cancers at an early stage or diagnosing diseases that doctors miss. This approach could also reduce the burden on wards by allowing healthcare professionals to see more patients and, in places where there is a severe shortage of specialists, could still deliver quality service in a shorter time by reducing waiting lists. What is reported in this chapter constitutes only a small contribution to the scenario just described, but it demonstrates how artificial intelligence can actually be used for the early diagnosis of skin lesions, constituting a powerful and sufficiently transparent support tool for the human expert.
References

1. Alber, M., Lapuschkin, S., Seegerer, P., Hägele, M., Schütt, K.T., Montavon, G., Samek, W., Müller, K.R., Dähne, S., Kindermans, P.J.: iNNvestigate neural networks! J. Mach. Learn. Res. 20(93), 1–8 (2019)
2. Andreini, P., Bonechi, S., Bianchini, M., Mecocci, A., Di Massa, V.: Automatic image classification for the urinoculture screening. In: Intelligent Decision Technologies: Proceedings of the 7th KES International Conference on Intelligent Decision Technologies (KES-IDT 2015), pp. 31–42. Springer (2015)
3. Andreini, P., Ciano, G., Bonechi, S., Graziani, C., Lachi, V., Mecocci, A., Sodi, A., Scarselli, F., Bianchini, M.: A two-stage GAN for high-resolution retinal image generation and segmentation. Electronics 11(1), 60 (2021)
4. Bonechi, S.: A weakly supervised approach to skin lesion segmentation. In: ESANN 2022 Proceedings European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2022)
5. Bonechi, S.: ISIC_WSM: generating weak segmentation maps for the ISIC archive. Neurocomputing 523, 69–80 (2023)
6. Bonechi, S., Andreini, P., Bianchini, M., Scarselli, F.: Generating bounding box supervision for semantic segmentation with deep learning. In: IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pp. 190–200. Springer (2018)
7. Bonechi, S., Andreini, P., Mecocci, A., Giannelli, N., Scarselli, F., Neri, E., Bianchini, M., Dimitri, G.M.: Segmentation of aorta 3D CT images based on 2D convolutional neural networks. Electronics 10(20), 2559 (2021)
8. Bonechi, S., Bianchini, M., Bongini, P., Ciano, G., Giacomini, G., Rosai, R., Tognetti, L., Rossi, A., Andreini, P.: Fusion of visual and anamnestic data for the classification of skin lesions with deep learning. In: International Conference on Image Analysis and Processing, pp. 211–219. Springer (2019)
9. Bonechi, S., Bianchini, M., Mecocci, A., Scarselli, F., Andreini, P.: Segmentation of Petri plate images for automatic reporting of urine culture tests. In: Handbook of Artificial Intelligence in Healthcare, pp. 127–151. Springer (2022)
10. Bonechi, S., Bianchini, M., Scarselli, F., Andreini, P.: Weak supervision for generating pixel-level annotations in scene text segmentation. Pattern Recogn. Lett. 138, 1–7 (2020)
11. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
12. Chéron, G., Laptev, I., Schmid, C.: P-CNN: pose-based CNN features for action recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3218–3226 (2015)
13. Codella, N., Rotemberg, V., Tschandl, P., Celebi, M.E., Dusza, S., Gutman, D., Helba, B., Kalloo, A., Liopyris, K., Marchetti, M., et al.: Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC) (2019). arXiv:1902.03368
14. Domingues, B., Lopes, J.M., Soares, P., Pópulo, H.: Melanoma treatment in review. ImmunoTargets Therapy 7, 35–49 (2018)
15. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–118 (2017)
16. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press Ltd. (2016)
17. Grote, T., Keeling, G.: Enabling fairness in healthcare through machine learning. Ethics Inf. Technol. 24(39) (2022)
18. Guo, M.H., Lu, C.Z., Hou, Q., Liu, Z., Cheng, M.M., Hu, S.M.: SegNeXt: rethinking convolutional attention design for semantic segmentation (2022). arXiv:2209.08575
19. Hasan, M.K., Elahi, M.T.E., Alam, M.A., Jawad, M.T., Martí, R.: DermoExpert: skin lesion classification using a hybrid convolutional neural network through segmentation, transfer learning, and augmentation. Inform. Med. Unlock. 100819 (2022)
20. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
22. ISIC: SIIM–ISIC 2020 challenge dataset (2020). https://challenge2020.isic-archive.com/
23. Kaissis, G.A., Makowski, M.R., Rückert, D., Braren, R.F.: Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2, 305–311 (2020)
24. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
25. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
26. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
27. Nie, Y., Sommella, P., Carratù, M., Ferro, M., O'Nils, M., Lundgren, J.: Recent advances in diagnosis of skin lesions using dermoscopic images based on deep learning. IEEE Access 10, 95716–95747 (2022)
28. Oneto, L., Navarin, N., Biggio, B., Errica, F., Micheli, A., Scarselli, F., Bianchini, M., Demetrio, L., Bongini, P., Tacchella, A., Sperduti, A.: Towards learning trustworthily, automatically, and with guarantees on graphs: an overview. Neurocomputing 493, 217–243 (2022)
29. Papandreou, G., Kokkinos, I., Savalle, P.A.: Untangling local and global deformations in deep convolutional networks for image classification and sliding window detection (2014). arXiv:1412.0296
30. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018). arXiv:1804.02767
31. Rodriguez, D., Nayak, T., Chen, Y., Krishnan, R., Huang, Y.: On the role of deep learning model complexity in adversarial robustness for medical images. BMC Med. Inform. Decis. Making 22(Suppl 2)(160) (2022)
32. Rossi, A., Vannuccini, G., Andreini, P., Bonechi, S., Giacomini, G., Scarselli, F., Bianchini, M.: Analysis of brain NMR images for age estimation with deep learning. Procedia Comput. Sci. 159, 981–989 (2019)
33. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
34. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net (2014). arXiv:1412.6806
35. Thapar, P., Rakhra, M., Cazzato, G., Hossain, M.S.: A novel hybrid deep learning approach for skin lesion segmentation and classification. J. Healthc. Eng. 2022 (2022)
36. Tognetti, L., Bonechi, S., Andreini, P., Bianchini, M., Scarselli, F., Cevenini, G., Moscarella, E., Farnetani, F., Longo, C., Lallas, A., et al.: A new deep learning approach integrated with clinical data for the dermoscopic differentiation of early melanomas from atypical nevi. J. Dermatol. Sci. 101(2), 115–122 (2021)
37. Tschandl, P., Rosendahl, C., Kittler, H.: The HAM10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions. Sci. Data 5(1), 1–9 (2018)
38. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
6 From Pixels to Diagnosis: AI-Driven Skin Lesion Recognition
135
Author Biography
Monica Bianchini received the Laurea degree cum laude in Applied Mathematics in 1989 and the Ph.D. degree in Computer Science and Control Systems in 1995 from the University of Florence, Italy. She is currently an Associate Professor at the Department of Information Engineering and Mathematics of the University of Siena. Her main research interests are in the field of machine learning, with emphasis on neural networks for structured data and deep learning, approximation theory, bioinformatics, and image processing. She has served or serves as an Associate Editor for IEEE Transactions on Neural Networks, Neurocomputing, Int. J. of Knowledge-Based and Intelligent Engineering Systems, Int. J. of Computers in Healthcare, and Frontiers in Genetics, and has been the editor of numerous books and special issues in international journals on neural networks/structural pattern recognition. She is a permanent member of the editorial board of IJCNN, ICANN, ICPR, ESANN, ANNPR, and KES.
Chapter 7
Artificial Intelligence in Obstetrics
Smaranda Belciug and Dominic Gabriel Iliescu
Abstract The aim of this chapter is to present theoretical and practical aspects of how Artificial Intelligence can be applied in the obstetrics field. We discuss the pivotal role that artificial intelligence plays in fetal morphology scans, how artificial intelligence helps healthcare providers determine which type of birth should be used (vaginal vs. cesarean), and, last but not least, how we can monitor the success of training programs that teach doctors and midwives to determine the fetal head position and weight.
Keywords Deep learning · Evolutionary computation · Differential evolution · Logistic regression · Statistics · Learning curves
Acronyms
AI    Artificial Intelligence
CNN   Convolutional Neural Networks
DE    Differential Evolution
DL    Deep Learning
EC    Evolutionary Computation
EU    European Union
FB    Fetal Brain
FP    Fetal Plane
NN    Neural Networks
ReLU  Rectified Linear Unit

S. Belciug (B) · D. G. Iliescu
Department of Computer Science, Faculty of Sciences, University of Craiova, Craiova, Romania
e-mail: [email protected]
D. G. Iliescu
e-mail: [email protected]
D. G. Iliescu
Department of Obstetrics and Gynecology, University of Medicine and Pharmacy of Craiova, Craiova, Romania
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_7
7.1 Introduction
Practicing medicine today is not what it used to be. The advances made in the last decade are remarkable and, at times, overwhelming. Doctors need to keep up with these developments, learning and embracing them; otherwise, chances are that sooner rather than later they will be left behind. These days, Artificial Intelligence (AI) is a game changer. The obstetrics field has been "touched" by this new trend, whether we are referring to fetal morphology assessment or to predicting the delivery mode. Some healthcare providers are still reluctant to recognize the need for AI in obstetrics. Fortunately, the doctors who are open-minded are the ones helping data scientists take AI further into healthcare than it has ever gone before.
Before we proceed, let us see why we would need AI in obstetrics. We start with the fetal morphology assessment. During a fetal morphology ultrasound scan, the doctor can assess whether or not the fetus has congenital anomalies. Congenital anomalies are the most frequently encountered cause of fetal death and of infant mortality and morbidity [1]. Within the first 28 days after birth, approximately 295,000 newborns die of conditions caused by congenital anomalies. The authors' country, Romania, has one of the highest death rates due to congenital anomalies in the EU [2]. The numbers are sobering: each year up to 104,000 (2.5%) births within the EU have congenital anomalies, while worldwide this number reaches 7.9 million babies (6%) [3, 4]. The good news is that some congenital anomalies can be treated, or at least kept under control. The bad news is that, even so, 3.2 million children will be disabled for life. Managing the disease, providing life-saving treatments, stopping the progression of disabilities, and undertaking other actions become possible if these congenital anomalies are detected early in pregnancy.
Congenital anomalies can be diagnosed through a morphology ultrasound scan. Through this procedure, we are able to evaluate the structure and functionality of the fetus' organs. Only a skilled sonographer can perform a morphology scan correctly. The detection rate of an experienced doctor (more than 2000 examinations) is 52%. For an inexperienced doctor (fewer than 2000 examinations), the picture is far more worrying: only 32.5% [5]. Looking at the discrepancies between the prenatal and postnatal diagnosis of congenital anomalies, the reported numbers are 29% in live births, of which 7% had a long-term impact, and 23% in fetal autopsies [6]. Some of the reasons why a sonographer misreads a morphology scan are a lack of the necessary sonography knowledge, time pressure, fatigue, involuntary fetal movement, and the mother's physical characteristics. For instance, obesity is associated with up to 50% misread ultrasound scans in women with a body mass index over 30 kg/m² compared with women of normal weight [7].
The second area where AI could lend a helping hand is in forecasting the type of delivery. If only things were as easy as online shopping: you choose what you want, enter your shipping details, pay, and wait for your package to arrive. If you are anxious, you can check the tracking number to see where your package is and who delivers it. Unfortunately, when it comes to babies, the delivery date and mode are not that predictable. Even if the woman is prepared for a vaginal birth, things can change along the way, and a caesarean section might be needed. Modern obstetrics wants to predict the delivery mode before the onset of labor. Why? Because emergency c-sections are associated with high fetal and maternal morbidity and mortality. C-sections have been performed for thousands of years, but quite differently from what we know these days. For instance, in Ancient Roman times a c-section meant delivering a baby after the mother had died during childbirth. If we read Maimonides or the Talmud, we see that in ancient Jewish literature the mother survives the surgery [8]. Fast-forwarding to today, we find ourselves trying to predict the delivery mode with a certain accuracy. Different clinical and ultrasound parameters have been discussed in the literature regarding their role in forecasting whether a c-section is needed. Even so, there are few studies that try to find means to predict the type of birth before the onset of uterine contractions or the rupture of membranes.
The aim of this chapter is two-fold. First, we show different novel AI methods that can distinguish between different view planes of the fetal abdomen and fetal brain. This is the first step toward congenital anomaly detection. Second, we present a statistical framework that can be used to predict the type of birth.
7.2 Neuroevolution for Fetal Brain Plane Recognition
Neuroevolution makes use of evolutionary computation approaches to design a neural network's architecture. In this respect, we present and explore two approaches that optimize the architecture of a deep learning (DL) neural network (NN) without manual tuning: differential evolution and evolutionary computation.
7.2.1 Differential Evolution + Deep Learning Mix
Differential evolution (DE) is flexible, versatile, and easy to implement and understand. It appeared in 1997 [9, 10]. DE mimics the biological process of evolution by creating a temporary individual whose starting point is the differences between members of the population. DE converges globally and is also robust. It has been applied successfully in unsupervised image classification [11], image segmentation [12], linear arrays [13], neural networks [14], global optimization problems [15], and other areas [16–19].
By using a population-based global search strategy, the algorithm is able to reduce the complexity of the mutation operation through a one-to-one competition. Thus, the method adapts the candidate solutions and explores the search space in parallel. At each generation $G$, the population has $N$ candidates, written $X_{j,G} = (x^{1}_{j,G}, x^{2}_{j,G}, \ldots, x^{M}_{j,G})$, $j = 1, 2, \ldots, N$, where $M$ is the number of features. We randomly generate the initial population using the upper and lower bounds of each feature's search domain:

$$X^{n}_{i} = X^{n}_{i,L} + \mathrm{rand}() \cdot (X^{n}_{i,U} - X^{n}_{i,L}), \quad i = 1, 2, \ldots, M, \; n = 1, 2, \ldots, N, \tag{7.1}$$

where $X^{n}_{i,L}$ is the lower bound of the variable $X^{n}_{i}$, and $X^{n}_{i,U}$ is the upper bound of the variable $X^{n}_{i}$. To apply the mutation operator, we select three randomly chosen vectors $X_{r_1,G}$, $X_{r_2,G}$, $X_{r_3,G}$, and apply the following formula:

$$V^{n}_{G+1} = X^{n}_{r_1,G} + F \cdot (X^{n}_{r_2,G} - X^{n}_{r_3,G}), \tag{7.2}$$

where $V^{n}_{G+1}$ is the donor vector and $F \in [0, 1]$ is the variation factor that regulates the amplification degree of the differential variable $X^{n}_{r_2,G} - X^{n}_{r_3,G}$.

For the recombination operator, we create a trial vector $U^{n}_{i,G+1}$ from the target vector $X^{n}_{i,G}$ and the donor vector $V^{n}_{G+1}$, using the following formula:

$$U^{n}_{i,G+1} = \begin{cases} V^{n}_{i,G+1}, & \text{if } \mathrm{rand}() \le C_p \text{ or } i = I_{rand}, \\ X^{n}_{i,G}, & \text{if } \mathrm{rand}() > C_p \text{ and } i \ne I_{rand}, \end{cases} \tag{7.3}$$

where $i = 1, 2, \ldots, M$, $n = 1, 2, \ldots, N$, $I_{rand} \in [1, M]$ is a random integer, and $C_p$ is the recombination probability. Through this operation, the old and the new candidate solutions can interchange parts of their code and thus produce a new offspring.

The selection operator compares the target vector $X^{n}_{i,G}$ with the trial vector $U^{n}_{i,G+1}$. The vector that minimizes the fitness function value is selected to form the next generation:

$$X^{n}_{i,G+1} = \begin{cases} U^{n}_{i,G+1}, & \text{if } f(U^{n}_{i,G+1}) < f(X^{n}_{i,G}), \\ X^{n}_{i,G}, & \text{otherwise}, \end{cases} \tag{7.4}$$

where $i = 1, 2, \ldots, M$, and $n = 1, 2, \ldots, N$.

The steps of the DE algorithm are:
1. Initialize candidate population.
2. Repeat:
2.1. Mutation operation.
2.2. Recombination operation.
2.3. Selection operation.
Until the stopping criterion is met.

In [20], the authors represent the candidate solutions as fixed-length integer arrays. A convolutional neural network is composed of an input layer, $\lambda$ convolutional hidden layers, $\pi$ pooling layers, and an output layer. The number of hidden neurons is denoted by $n_H$. Each filter has a width, $f_w$, and a height, $f_h$. The hyperparameters are the recombination probability, $C_p$, and the mutation variation factor, $F$. Therefore, they defined a candidate solution as an integer array $x_i = (\lambda, n_H, f_w, f_h, C_p, F)$, $i = 1, \ldots, q$, where $q$ is the number of candidate solutions in the population. Because the solutions have a fixed length, every convolutional layer of a candidate has the same number of hidden units. After each convolutional layer, they added a max pooling layer. The architecture ends with a dense layer. The non-linear activation function for the convolutional layers is the ReLU:

$$f(x) = \begin{cases} 0, & \text{for } x < 0, \\ x, & \text{for } x \ge 0, \end{cases} \tag{7.5}$$

while the activation function between the dense layer and the output is the softmax function:

$$p = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix}, \quad \text{where } p_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}. \tag{7.6}$$

The pool size was (2, 2). The proposed DE/CNN algorithm has the following steps:
1. Input: the dataset $D$, the number of generations $G$, the number of candidate solutions in each generation $N$, and the candidate solutions $X_i = (\lambda_i, n_{H_i}, f_{w_i}, f_{h_i}, C_{p_i}, F_i)$, $i = 1, 2, \ldots, N$.
2. Initialization: randomly generate a set of candidate solutions $X_{i,1}$, $i = 1, 2, \ldots, N$, and build $N$ CNNs having $\lambda_i$ convolutional and pooling layers, $n_{H_i}$ hidden units per convolutional layer, the filter size $(f_{w_i}, f_{h_i})$, the recombination probability $C_{p_i}$, and the mutation variation factor $F_i$. Train the CNNs and record their accuracies and losses over the validation dataset. Each CNN's loss represents the candidate solution's fitness value.
3. Repeat:
3.1. Mutation: for each individual, perform mutation using the variation factor $F_i$;
3.2. Recombination: for each pair of individuals, perform recombination with $C_{p_i}$;
3.3. Selection: select the individuals that will form the next generation based on their validation loss;
until the stopping criterion is met (the number $G$ of generations is reached).
4. Output: the best candidate solution, which represents the network's architecture.

The authors apply the DE/CNN algorithm to two maternal-fetal ultrasound datasets, FP and FB (https://zenodo.org/record/3904280#.YfjeTPVBzL9), that were collected from two different hospitals. The FP dataset consists of 6 classes: four regard the fetal anatomical planes, namely abdomen (711 cases), brain (3092 cases), femur (1040 cases), and thorax (1718 cases); the fifth regards the mother's cervix (1626 cases); and the last one groups the other, less common image planes (4213 cases). The second set, FB, contains images of the brain planes split into 3 classes: trans-thalamic (1638 cases), trans-cerebellum (714 cases), and trans-ventricular (597 cases). The first set has 12 400 images, while the second contains 2949. Figure 7.1 presents sample images from FP, while Fig. 7.2 presents sample images from FB.
An initial population of 20 candidates with 50 generations was used. The recombination probability was generated from the interval [0.5, 0.7], so that new architectures are generated at a faster pace, while the mutation variation factor was generated from the interval [0.3, 0.6]. The intervals [1, 6] and [20, 300] were used to initialize the candidate solutions, and the interval [2, 5] was used for the kernel. The algorithm ran for 10 training epochs with a batch size of 64. The authors obtained a 96.29% accuracy on the FP dataset, and a 78.73% accuracy on the FB dataset.
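The DE operators of Eqs. (7.1)–(7.4) can be illustrated with a short sketch. This is not the code of [20]: it works on real-valued vectors, clips donors to the search bounds, and uses a toy `sphere` function as a stand-in for the validation loss of a trained CNN. All function and parameter names are assumptions made for illustration.

```python
import random


def differential_evolution(fitness, lower, upper, n_candidates=20,
                           n_generations=50, F=0.5, Cp=0.6, seed=0):
    """Minimise `fitness` over the box [lower, upper] with a basic DE loop."""
    rng = random.Random(seed)
    M = len(lower)  # number of features per candidate

    # Eq. (7.1): uniform random initialisation inside the bounds.
    population = [
        [lower[i] + rng.random() * (upper[i] - lower[i]) for i in range(M)]
        for _ in range(n_candidates)
    ]
    scores = [fitness(x) for x in population]

    for _ in range(n_generations):
        next_population, next_scores = list(population), list(scores)
        for n in range(n_candidates):
            # Eq. (7.2): donor built from three distinct candidates other than n.
            others = [k for k in range(n_candidates) if k != n]
            r1, r2, r3 = rng.sample(others, 3)
            donor = [
                population[r1][i] + F * (population[r2][i] - population[r3][i])
                for i in range(M)
            ]
            # Assumption: clip the donor back into the search domain.
            donor = [min(max(v, lower[i]), upper[i]) for i, v in enumerate(donor)]

            # Eq. (7.3): recombination; index i_rand always takes the donor gene.
            i_rand = rng.randrange(M)
            trial = [
                donor[i] if (rng.random() <= Cp or i == i_rand) else population[n][i]
                for i in range(M)
            ]

            # Eq. (7.4): one-to-one selection, keep the lower fitness value.
            trial_score = fitness(trial)
            if trial_score < scores[n]:
                next_population[n], next_scores[n] = trial, trial_score
        population, scores = next_population, next_scores

    best = min(range(n_candidates), key=lambda k: scores[k])
    return population[best], scores[best]


def sphere(x):
    """Toy fitness: squared distance from the origin (stand-in for a validation loss)."""
    return sum(v * v for v in x)


best_x, best_score = differential_evolution(sphere, [-5.0] * 3, [5.0] * 3)
```

In a neuroevolution setting, each candidate would be rounded to integers, decoded into a CNN, trained for a few epochs, and scored by its validation loss, which makes every fitness evaluation far more expensive than in this toy run.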
Fig. 7.1 a Fetal abdomen, b fetal brain, c maternal cervix, d fetal femur, e fetal thorax, f other (https://zenodo.org/record/3904280#.YfjeTPVBzL9)
Fig. 7.2 a Trans-ventricular, b trans-thalamic, c trans-cerebellar (https://zenodo.org/record/3904280#.YfjeTPVBzL9)
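To make the candidate encoding of this subsection more concrete, the following sketch decodes one candidate $(\lambda, n_H, f_w, f_h, C_p, F)$ into a layer plan and implements the ReLU and softmax functions of Eqs. (7.5) and (7.6). The `Candidate` class, the `layer_plan` helper, and the tuple layout of the plan are hypothetical names chosen for illustration; they are not taken from [20], and no network is actually trained here.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    n_conv_layers: int   # lambda: number of convolutional layers
    n_hidden: int        # n_H: hidden units shared by every convolutional layer
    filter_width: int    # f_w
    filter_height: int   # f_h
    cp: float            # C_p: recombination probability
    f: float             # F: mutation variation factor


def relu(x):
    """Eq. (7.5): zero for negative inputs, identity otherwise."""
    return x if x >= 0 else 0.0


def softmax(logits):
    """Eq. (7.6), computed after subtracting the maximum for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def layer_plan(candidate, n_classes):
    """List the layers implied by a candidate: (conv + 2x2 max pool) * lambda, then dense."""
    plan = []
    for _ in range(candidate.n_conv_layers):
        plan.append(("conv", candidate.n_hidden,
                     (candidate.filter_width, candidate.filter_height), "relu"))
        plan.append(("maxpool", (2, 2)))
    plan.append(("dense", n_classes, "softmax"))
    return plan


candidate = Candidate(n_conv_layers=3, n_hidden=64, filter_width=3,
                      filter_height=3, cp=0.6, f=0.4)
plan = layer_plan(candidate, n_classes=6)   # 6 classes, as in the FP dataset
probs = softmax([2.0, 1.0, 0.1])
```

Keeping the plan as plain tuples separates the search over architectures from the deep learning framework that would eventually build and train each network.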
7.2.2 Evolutionary Computation + Deep Learning Mix
Another way to optimize the architecture of a DL neural network through neuroevolution is by using evolutionary computation (EC). These types of algorithms mimic the biological processes that we encounter in real life. We start with a population of chromosomes that are subjected to different selection mechanisms based on a certain fitness function, and to crossover and mutation operators. Each chromosome encodes some information about a potential solution of the problem. The best chromosomes are selected at each iteration to form the future population of candidate solutions. When a predefined stopping criterion is reached, the algorithm stops and returns the best solution. Through the fitness function, we measure the ability of each chromosome to compete against the others in the specific environment, $f(x_i)$, $i = 1, 2, \ldots, N$. Each chromosome contains $M$ genes/features and is mathematically written as $X^{C}_{i} = (x^{C}_{i1}, x^{C}_{i2}, \ldots, x^{C}_{iM})$, where $i = 1, 2, \ldots, N$. We initialize the population randomly from the upper ($X^{U}_{i,n}$) and lower ($X^{L}_{i,n}$) bounds of the search interval for each gene of the chromosome:

$$X_{i,n} = X^{L}_{i,n} + \mathrm{rand}() \cdot (X^{U}_{i,n} - X^{L}_{i,n}), \quad i = 1, 2, \ldots, M, \; n = 1, 2, \ldots, N. \tag{7.7}$$

The general scheme for an EC algorithm is [21, 22]:
1. Randomly initialize the chromosome population.
2. Obtain the fitness score of each chromosome.
3. Repeat the following steps until the stopping criterion is reached:
3.1. Apply the selection operator to choose the parents.
3.2. Apply the crossover operator to produce the offspring.
3.3. Apply the mutation operator on the offspring to produce variety.
3.4. Apply the fitness function on the population.
3.5. Select, according to their fitness, the chromosomes that will form the next generation.

Using EC, a DL's architecture can be optimized [23]. The authors used a max pooling layer of 2 by 2, and a kernel depth of 3 (red, green, blue). The recombination probability was 0.6 and the mutation probability 0.25. A potential solution is an integer vector of the following form: $x_i = (\gamma, n_{H_j}, j = 1, \ldots, \gamma, \text{activation function}, \text{optimizer})$, $i = 1, 2, \ldots, N$. All the architectures ended with a dense layer. The EC/DL algorithm is presented below:
1. Input: the image dataset, the number of generations $nG$, the number of chromosomes in a generation, $N$, and the chromosomes $x_i = (\gamma, n_{H_j}, j = 1, \ldots, \gamma, \text{activation function}, \text{optimizer})$, $i = 1, 2, \ldots, N$.
2. Create the initial population: randomly generate the chromosomes. Use each chromosome $i$ to build a DL having $\gamma_i$ convolutional and pooling layers, $n_{H_j}$ hidden units in layer $j$, one randomly chosen activation function out of three candidates, and one optimizer. Train the DLs and record each chromosome's fitness score as the accuracy obtained on the validation dataset by the DL having the respective architecture.
3. Selection. Select the parents for the recombination process.
4. Recombination. Apply the recombination operator.
5. Mutation. Apply the mutation operator.
6. Selection. Select the chromosomes that will form the next generation.
7. Repeat steps 3–6 until the stopping criterion is met (the predetermined number of generations $nG$ has been reached).
Output. Return the best performing chromosome from the last generation. This chromosome represents the best DL architecture.

The EC/DL algorithm is applied to a dataset from a prospective cohort study conducted in a tertiary maternity hospital, the Emergency County Hospital Craiova, Romania, which involves ultrasound movies and images from second- and third-trimester morphology scans (PARADISE). The dataset contained 970 images from 100 participants.
The dataset regards the abdominal plane and has 10 classes: 3 vessels plus bladder (113 images), gallbladder (70 images), sagittal cord insertion (47 images), transverse cord insertion (103 images), anterior abdominal wall (20 images), anteroposterior kidney plane (96 images), biometry plane (205 images), intestinal sagittal plane (83 images), kidney sagittal plane (207 images), and bladder plane (22 images). Figure 7.3 presents a sample image from every decision class. The dataset is unbalanced, a common situation in clinical scenarios. To obtain a larger dataset, the authors applied augmentation: flips, rotations, translations, and Gaussian noise. After this process, the dataset contained 4815 images.

Fig. 7.3 Fetal morphology abdomen: a 3 vessels plus bladder; b gallbladder; c sagittal cord insertion; d transverse cord insertion; e anterior abdominal wall; f anteroposterior kidney plane; g biometry plane; h intestinal sagittal plane; i kidney sagittal plane; j bladder plane

The EC algorithm had to optimize the CNN by choosing between different activation functions and optimizers. The choices for the activation function were the sigmoid, the hyperbolic tangent, and ReLU; the choices for the optimizer were Adam, AdamX, and NAdam. The operators used were the uniform crossover and uniform mutation, adapted for the integer representation. The initial population contained 30 candidate solutions. The number of generations was 50. The chromosomes were generated from the intervals [1, 5] and [1, 256]. The stride was set to 1. The algorithm ran for 10 training epochs with a batch size of 64. The accuracy obtained on the PARADISE dataset was 74.63%.
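The EC/DL loop described in this subsection can be sketched with integer chromosomes, uniform crossover, and uniform mutation. The sketch below is only an illustration under several assumptions: chromosomes are padded to a fixed length of five layer genes, parents are chosen by binary tournament, the mutation value 0.25 is treated as a per-gene rate, survivors are the best of parents plus offspring, and `toy_fitness` replaces the validation accuracy that [23] obtains by training each network. None of these choices is stated in the source.

```python
import random

ACTIVATIONS = ["sigmoid", "tanh", "relu"]
OPTIMIZERS = ["Adam", "AdamX", "NAdam"]
MAX_LAYERS, MAX_UNITS = 5, 256

# Gene bounds: [gamma, nH_1 .. nH_5, activation index, optimizer index].
BOUNDS = [(1, MAX_LAYERS)] + [(1, MAX_UNITS)] * MAX_LAYERS + [(0, 2), (0, 2)]


def random_chromosome(rng):
    return [rng.randint(lo, hi) for lo, hi in BOUNDS]


def uniform_crossover(a, b, rng):
    """Each gene of the child comes from either parent with probability 0.5."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def uniform_mutation(chrom, rate, rng):
    """Each gene is redrawn uniformly from its interval with probability `rate`."""
    return [rng.randint(lo, hi) if rng.random() < rate else g
            for g, (lo, hi) in zip(chrom, BOUNDS)]


def decode(chrom):
    """Only the first gamma layer genes are active; the rest are padding."""
    gamma = chrom[0]
    return {"layers": chrom[1:1 + gamma],
            "activation": ACTIVATIONS[chrom[-2]],
            "optimizer": OPTIMIZERS[chrom[-1]]}


def toy_fitness(chrom):
    """Placeholder for validation accuracy; a real run would train the decoded CNN."""
    arch = decode(chrom)
    bonus = 10 if arch["activation"] == "relu" else 0
    return -abs(sum(arch["layers"]) - 300) + bonus


def evolve(fitness, pop_size=30, n_generations=50, p_cross=0.6, p_mut=0.25, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(n_generations):
        offspring = []
        for _ in range(pop_size):
            # Parent selection by binary tournament (an assumed choice).
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            child = uniform_crossover(p1, p2, rng) if rng.random() < p_cross else list(p1)
            offspring.append(uniform_mutation(child, p_mut, rng))
        # Survivor selection: keep the best pop_size of parents plus offspring.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return population[0]


best = evolve(toy_fitness)
best_arch = decode(best)
```

Because survivors are drawn from parents and offspring together, the best fitness never decreases from one generation to the next, which is convenient when each evaluation stands for a full training run.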
7.3 Prediction of Labor Outcome with Artificial Intelligence
As mentioned before, an important problem that modern obstetrics needs to resolve is forecasting the delivery mode. One way to predict it is by using AI and statistics. In [24], the authors carried out a longitudinal assessment of fetomaternal features at term by recording clinical parameters, such as maternal characteristics and Bishop score, together with weekly ultrasound measurements that included fetal head descent, occiput posterior, estimated fetal weight, and cervical length. Through their analysis they tried to establish which parameters are strongly and significantly associated with the birth outcome, which also enabled a statistical analysis of prediction from early to full-term pregnancy. The study was conducted on nulliparous women at term. The principal AI method used in this research was logistic regression.
The dataset recorded the above-mentioned parameters of primiparous pregnant women who had been admitted to the Prenatal Unit for the third-trimester well-being scan at ≥37 gestational weeks. Patients with indications for elective cesarean delivery, multiple pregnancies, noncephalic presentation, previous cesarean delivery, preeclampsia, fetal growth restriction, and/or diabetes mellitus were excluded from the study. The remaining 276 patients had a series of weekly scans and clinical examinations at term.
The study was three-fold. First, the authors wanted to determine which variables are correlated with the outcome in each week; to this end, they applied logistic regression to determine the predictive variables. The second task was to see whether there are significant differences between the delivery modes in terms of these variables. For this, the authors applied Friedman analysis of variance and Kendall's concordance coefficient.
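As a rough illustration of the first task, the sketch below fits a logistic regression by batch gradient descent and inspects which coefficient dominates. It is not the analysis of [24]: the data are synthetic and standardised, feature 0 is constructed to drive the binary outcome while feature 1 is pure noise, and the helper names (`fit_logistic`, `predict_proba`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Synthetic data only: feature 0 drives the outcome, feature 1 is noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if 2.0 * x0 + rng.gauss(0, 0.5) > 0 else 0 for x0, _ in X]

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
accuracy = sum((p >= 0.5) == bool(t) for p, t in zip(probs, y)) / len(y)
```

On standardised inputs, the magnitude of each fitted coefficient gives a first indication of how strongly the corresponding variable is associated with the outcome; a clinical study would additionally report significance tests and confidence intervals.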
The third task was to identify, for each of these variables and each week, a reliable and valid cutoff point that differentiates between the birth types. The receiver operating characteristic curves with G-mean (the geometric mean of sensitivity and specificity) and Youden's J statistic (sensitivity + specificity − 1), together with precision-recall curves and the F1-score, were used to establish the threshold for each significant variable per week. The reported results showed that the variables significantly correlated with the delivery mode were the body mass index in all term evaluations, the progression distance for weeks 37 and 38, maternal age for week 39, and Bishop score, estimated fetal weight, and occiput posterior for week 40. For the week before delivery, the most significant attributes were the body mass index, estimated fetal weight, and the progression distance.
Using AI, healthcare professionals can be better prepared for what will happen in the delivery room. Time is of the essence there. The doctors need to estimate different parameters so that they avoid dystocia. Dystocia is a major cause of maternal-fetal mortality and morbidity. It is usually caused by fetal-pelvic disproportion. Fetal weight and fetal head position are two of the most important parameters that need to be assessed. The error rate of digital examination in determining the fetal head position during labor, compared with ultrasound, ranges between 23 and 53% [25–29]. In [30], the authors present a review of how learning curves are used to establish how much theoretical and practical knowledge needs to be taught so that future midwives or doctors perform better in the delivery room. Learning curves were developed by Theodore Paul Wright in 1936. The theory behind them is that the more often an individual performs a task, the more her/his skills improve. If the skills improve, then the costs drop, while the output gets better and better. We can look at learning curves as tracking progress milestones. They are represented through a graph that shows the relationship between time and someone's experience and/or know-how. To plot this graph, multiple functions can be used. The most used ones are:
• The sigmoid: the knowledge rises slowly, followed by a fast increase, and ends with a plateau (Fig. 7.4).
• The exponential growth: the knowledge increases without limit (Fig. 7.5).
• The exponential rise: the knowledge increases to a limit, after which it does not improve much with experience anymore (Fig. 7.6).
• The power law: used for metrics that need to decrease, such as cost (Fig. 7.7).
Fig. 7.4 Sigmoid function plot
Fig. 7.5 Exponential growth graph
Fig. 7.6 Exponential rise graph
Fig. 7.7 Power law graph
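The four curve shapes listed above can be written down as simple functions of the number of repetitions or the elapsed time t. The parameterisations below are common textbook forms chosen for illustration; the parameter names and default values are assumptions (for instance, a power-law exponent of 0.32 corresponds roughly to an 80% learning rate in Wright's cost model) and are not taken from [30].

```python
import math


def sigmoid_curve(t, plateau=1.0, rate=1.0, midpoint=5.0):
    """Slow start, rapid rise around `midpoint`, then a plateau."""
    return plateau / (1.0 + math.exp(-rate * (t - midpoint)))


def exponential_growth(t, start=0.1, rate=0.3):
    """Keeps increasing without an upper limit."""
    return start * math.exp(rate * t)


def exponential_rise(t, plateau=1.0, rate=0.5):
    """Approaches `plateau` with diminishing returns."""
    return plateau * (1.0 - math.exp(-rate * t))


def power_law(t, first_value=100.0, exponent=0.32):
    """Decreasing metric, such as cost per repetition (defined for t >= 1)."""
    return first_value * t ** (-exponent)


# Tabulate the four curves over the first ten repetitions.
table = [
    (t, sigmoid_curve(t), exponential_growth(t), exponential_rise(t), power_law(t))
    for t in range(1, 11)
]
```

Fitting one of these forms to a trainee's recorded error rates, for example by least squares, gives a compact summary of how quickly a skill is being acquired and where it levels off.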
Learning curves can be used to compare transabdominal sonography with digital vaginal examination. In [31], a study was conducted on a student midwife who was a complete novice. She had no prior knowledge of performing digital vaginal examination or ultrasound scans. She was taught how to perform both examinations, before and after each examination. Using learning curves, the researchers discovered that the skill of determining the fetal head position is gained more rapidly through ultrasonography than through vaginal examination. Learning curves thus disprove the myth that vaginal examination should be the gold standard. Regarding the ultrasonographic estimation of fetal weight and learning curves, we refer the reader to the study of Predanic et al. [32]. The reported results showed that the greatest error was found in the first month of the training program. As the training progresses, the performance increases and the error decreases. Interestingly, there was no single point at which a significant improvement could be spotted; the improvement was progressive over time.
7.4 Conclusions
In this chapter, we have explored how we can use AI to assess the view planes in fetal morphology scans, how we can use statistics and AI methods to predict the type of birth long before the onset of labor, and how to assess the progress of training programs when it comes to fetal head position and weight estimation. These areas and many others can be improved through AI. If you are curious to find out more about how we can use AI in obstetrics, we invite you to continue reading the book Pregnancy with Artificial Intelligence: A 9.5 Months Journey from Preconception to Birth [22].
Acknowledgements This work was supported by a grant of the Ministry of Research Innovation and Digitization, CNCS-UEFISCDI, project number PN-III-P4-PCE-2021-0057, within PNCDI III.
References
1. Boyle, B., et al.: Estimating global burden of disease due to congenital anomaly: an analysis of European data. Arch. Disease Childhood Fetal Neonatal Edition 103, 22–28 (2018)
2. https://data.unicef.org/country/rou
3. Kinsner-Ovaskainen, A., et al.: European monitoring of congenital anomalies: JRC EUROCAT, report on statistical monitoring of congenital anomalies (2008–2017). EUR 30158 EN, Publication Office of the European Union, Luxembourg (2020). https://doi.org/10.2760/65886
4. Lobo, I., Zhaurova, K.: Birth defects: causes and statistics. Nat. Educ. 1(1), 18 (2008)
5. Tegnander, E., Eik-Nes, S.H.: The examiner’s ultrasound experience has a significant impact on the detection rate of congenital heart defect at the second trimester fetal examination. Ultrasound Obstet. Gynecol. 28, 8–14 (2006)
6. Bensamlali, M., et al.: Discordances between pre-natal and postnatal diagnosis of congenital heart diseases and impact on care strategies. J. Am. Coll. Cardiol. 68, 921–930 (2016)
7. Paladini, D.: Sonography in obese and overweight pregnant women: clinical, medicolegal and technical issues. Ultrasound Obstet. Gynecol. 33(6), 720–729 (2009)
8. Boss, J.: The antiquity of caesarean section with maternal survival: the Jewish tradition. Med. Hist. 5, 17–31 (1961)
9. Storn, R., Price, K.: Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997)
10. Storn, R., Price, K.: Differential evolution for multi-objective optimization. Evol. Comput. 4, 8–12 (2003)
11. Omran, M.G.H., Engelbrecht, A.P.: Self-adaptive differential evolution methods for unsupervised image classification. In: Proceedings of the IEEE Conference on Cybernetics and Intelligent Systems, pp. 1–6 (2006)
12. Aslantas, V., Tunckanat, M.: Differential evolution algorithm for segmentation of wounded images. In: Proceedings of the IEEE International Symposium on Intelligent Signal Processing, WISP (2007)
13. Yang, S., Gan, Y.B., Qing, A.: Sideband suppression in time-modulated linear arrays by the differential evolution algorithm. IEEE Trans. Antenn. Propag. Lett. 1(1), 173–175 (2002)
14. Dhahri, H., Alimi, A.M.: The modified differential evolution and the RBF (MDE-RBF) neural network for time series prediction. In: Proceedings of the International Joint Conference on Neural Networks, pp. 2938–2943 (2006)
15. Kim, H.K., Chong, J.K., Park, K.Y., Lowther, D.A.: Differential evolution strategy for constrained global optimization and application to practical engineering problems. IEEE Trans. Magn. 43(4), 1565–1568 (2007)
16. Massa, A., Pastorino, M., Randazzo, A.: Optimization of the directivity of a monopulse antenna with a subarray weighting by a hybrid differential evolution method. IEEE Trans. Antenn. Propag. Lett. 5(1), 155–158 (2006)
17. Su, C.T., Lee, C.S.: Network reconfiguration of distribution systems using improved mixed integer hybrid differential evolution. IEEE Trans. Power Deliv. 18(3), 1022–1027 (2003)
18. Tasgetiren, M.F., Suganthan, P.N., Chua, T.J., Al-Hajri, A.: Differential evolution algorithms for the generalized assignment problem. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC’09), pp. 2606–2613 (2009)
19. Sum-Im, T., Taylor, M.R., et al.: A differential evolution algorithm for multistage transmission planning. In: Proceedings of the 42nd International Universities Power Engineering Conference (UPEC’07), pp. 357–364 (2007)
20. Belciug, S.: Learning deep neural networks’ architectures using differential evolution. Case study: medical imaging processing. Comput. Biol. Med. 105623 (2022). https://doi.org/10.1016/j.compbiomed.2022.105623
21. Belciug, S.: Artificial Intelligence in Cancer: Diagnostic to Tailored Treatment. Elsevier (2020)
22. Belciug, S., Iliescu, D.G.: Pregnancy with Artificial Intelligence. A 9.5 Months Journey from Preconception to Birth. Springer Nature (2023). https://doi.org/10.1007/978-3-031-18154-2
23. Ivanescu, R., Belciug, S., Nascu, A., Serbanescu, M.S., Iliescu, D.G.: Evolutionary computation paradigm to determine deep neural networks architecture. Int. J. Comput. Commun. Control 17(5), 4866 (2022)
24. Iliescu, D.G., Belciug, S., Ivanescu, R.C., Dragusin, Cara, M.L., Dira, L.: Prediction of labor outcome pilot study: evaluation of primiparous women at term. Am. J. Obstet. Gynecol. MFM 4(6), 100711 (2022)
25. Sherer, D.M., Miodovnik, M., Bradley, K.S., Langer, O.: Intrapartum fetal head position I: comparison between transvaginal digital examination and transabdominal ultrasound assessment during the active stage of labor. Ultrasound Obstet. Gynecol. 19, 258–263 (2002)
26. Akmal, S., Kametas, N., Tsoi, E., Hargreaves, C., Nicolaides, K.H.: Comparison of transvaginal digital examination with intrapartum sonography to determine fetal head position before instrumental delivery. Ultrasound Obstet. Gynecol. 21, 437–440 (2003)
27. Chou, M.R., Kreiser, D., Taslimi, M., Druzin, M.L., El-Sayed, Y.Y.: Vaginal versus ultrasound examination of fetal occiput position during the second stage of labor. Am. J. Obstet. Gynecol. 191, 521–524 (2004)
28. Dupuis, O., Ruimark, S., Corinne, D., Simone, T., Andre, D., Rene-Charles, R.: Fetal head position during the second stage of labor: comparison of digital vaginal examination and transabdominal ultrasonographic examination. Eur. J. Obstet. Gynecol. Reprod. Biol. 123, 193–197 (2005)
29. Zahalka, N., Sadan, O., Malinger, G., Liberati, M., Boaz, M., Glezerman, M., Rotmensch, S.: Comparison of transvaginal sonography with digital examination and transabdominal sonography for determination of fetal head position in the second stage of labor. Am. J. Obstet. Gynecol. 193, 381–386 (2005)
30. Iliescu, D.G., Belciug, S., Gheonea, I.: Practical guide to simulation in delivery room emergencies. In: Cinnella, G., Beck, R., Malvasi, A. (eds.). Springer (2023), in press
31. Rozenberg, P., Porcher, R., Salomon, L.J., Boirot, F., Morin, C., Ville, Y.: Comparison of the learning curves of digital examination and transabdominal sonography for the determination of fetal head position during labor. Ultrasound Obstet. Gynecol. 31, 332–337 (2008)
32. Predanic, M., Cho, A., Ingrid, F., Pellettieri, J.: Ultrasonographic estimation of fetal weight: acquiring accuracy in residency. J. Ultrasound Med. 21(5), 495–500 (2002)
Smaranda Belciug is an Associate Professor in Computer Science at the University of Craiova, Romania, associate member of the Royal Society of Medicine, Data Scientist at the Molecular Tumor Board – Multidisciplinary Commission for Personalized Therapeutic Indication based on a Comprehensive Molecular (Genetic) Assessment, and an expert for the European Commission Horizon Europe. She obtained her Ph.D. in Artificial Intelligence in Healthcare from the University of Pitesti, Romania, in 2010. Her research interests include Artificial Intelligence in the Healthcare system and Statistics. She is the author of three monographs: “Artificial Intelligence in Cancer: diagnostic to tailored treatment”, Elsevier; “Pregnancy with Artificial Intelligence. A 9,5 months journey from preconception to birth”; and “Intelligent Decision Support Systems: a journey to smarter healthcare”, Springer Nature. In 2022 she was awarded the Romanian Academy Prize. She is an enthusiastic advocate of the multidisciplinary approach in scientific studies, and all her research is driven by it. This has been recognized at multiple levels, from the wide variety of journals in which she has published to the variety of journals and conferences for which she reviews.
Chapter 8
A Co-design Approach for Developing and Implementing Smart Health Technologies and Services

Sonja Pedell, Leon Sterling, Nicole Aimers, and Diego Muñoz
Abstract In this chapter we demonstrate how co-design can generate the knowledge needed to develop smart technologies and services in participatory ways. Even though co-design has many benefits and has been successfully used in an array of domains, it is argued that easy-to-apply methods are needed to guide the co-design process. One useful method for this purpose is Motivational Modelling, which, importantly, does not limit any aspect of participation and gives all stakeholders a strong voice in the design outcome. We report on two case studies using Motivational Modelling in the co-design process to create solutions for health and wellbeing with a focus on older adults. We argue that, through shared goal definition with attention to users’ emotions, we better understand the demands placed on smart solutions if they are to create benefits for older adults and other stakeholders in complex and individual everyday contexts, and hence increase acceptance and adoption.

Keywords Co-design · Motivational modelling · Goals · Co-design principles · Smart technology · Older adults · Health technologies
S. Pedell (corresponding author) · L. Sterling · N. Aimers · D. Muñoz
Swinburne Living Lab, Swinburne University of Technology, Centre for Design Innovation, 1 John Street, Hawthorn, Victoria 3122, Australia
e-mail: [email protected]
L. Sterling e-mail: [email protected]
N. Aimers e-mail: [email protected]
D. Muñoz e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_8
8.1 Introduction

There is a growing body of literature promoting the need for intelligent and smart technologies in healthcare (e.g. [1]). There is also an expectation that all population groups should benefit from such technologies, which can increase quality of life when they are better tailored to people’s diverse needs. This objective becomes difficult when groups are not experienced with technology use, such as many older adults, who are increasingly faced with modern technologies intended to improve their health and wellbeing. While adoption of digital technology can support everyday activities, the inability to use technologies often results in feelings of alienation, exclusion and being marginalised [2]. Once inexperienced users have had negative experiences, they do not feel confident using technologies, and feel even less confident about being involved in technology development or in service design involving the implementation of technology. Hence, we need methods to involve all user groups early on in technology development, implementation and adoption, methods which empower inexperienced technology users to contribute confidently.

Our previous research established that motivational models can help define high-level goals around technology use, with emotions as an important part [3, 4]. Research has also shown that understanding the emotional needs of future users of technologies for increasing health and wellbeing leads to better uptake [5, 6]. Here we argue that Motivational Modelling embedded in a co-design approach will lead to the development of smarter health and wellbeing technologies.

A motivational model is a high-level diagram describing an overall socio-technical system, a software product or a company. In this chapter, models are described in terms of a system, with awareness of their other uses. The intent underlying the creation of a motivational model is to capture a shared understanding of the system, its goals and stakeholders. The diagram captures what the system-to-be is designed to do. Motivational Modelling is the process of developing a motivational model. This work presents two case studies where this process was used, and discusses how Motivational Modelling aligns with co-design principles and how this approach benefits the development and implementation of smart health technologies and services.
8.2 Related Work

8.2.1 Co-design in the Context of Health

In the design landscape, ‘co-design’ has become increasingly popular. Co-design is a creative process which describes an array of activities used in the design of products, services and systems which involve key stakeholders collaborating on a shared problem or goal [7]. Co-design does not discriminate; regardless of skill and expertise, everyone involved in each stage of the co-design process works together and is considered to be an equal contributor.
Co-design has been used in numerous studies investigating a diverse range of design outcomes associated with health and wellbeing. For example, researchers have explored how co-design can support the development of playful technology to help deaf children learn to speak [8], and how co-design can create digital activities to initiate more meaningful social interaction between residents living with advanced dementia and their relatives [9]. Technologies are not the only focus when future users are involved in co-design. Research is also looking into how participatory methods can create a shared understanding about positive ways of ageing [10], how student nurses conceptualise human dignity in the care of patients in an effort to better prepare them for professional practice [11], and how co-design of person-centred care experiences can lead to better outcomes in the domain of hearing rehabilitation [12]. These examples show that co-design processes are highly versatile and can be helpful in many research and development contexts to gain a shared understanding about health and wellbeing goals and, consequently, the kind of technologies and services needed. In the context of smart health, Papoutsi et al. [13] have found that co-design can be useful in healthcare environments. The co-design should involve several stakeholders, including the service users and healthcare staff as professionals. There are multiple frameworks that attempt to address the need to involve service users in the design process, which enables co-design to help identify the needs that best fit the context of the users [14]. Some notable examples of the use of co-design for smart health technologies are the design of an ecosystem of Ambient Assisted Living technologies [15] and the exploration of how older adults understand data for the design of health visualisations in South America [16].
The documented benefits of using co-design as a research technique are far-reaching and can include more efficient decision making, improved knowledge of customer or user needs and quicker validation of ideas or concepts [17]. However, even though these benefits are achieved by involving all stakeholders and asking them to work together as equals, what methods are used to guide the co-design process? It is acknowledged that many co-design tools (e.g., probes, toolkits and prototypes) are used to achieve a key ingredient of design research, which is ‘making’ [18]. However, the concept of ‘making’ is not simply defined as an act or process of forming but must also include the “construction and transformation of meaning” for what is to be formed [18]. While these co-design tools exist to help create design outcomes, co-design methods to identify and construct meaning for the design opportunity at the outset are lacking. It is argued that motivational models can address this potential shortcoming of co-design research. Here we suggest that co-design should be a standard element in creating smart health technologies. We argue that constructing meaning with the people who will be the users of the final product builds the basis for these technologies to be smart. Without involving end users as experts of their own lives, technologies cannot take on a role that provides added benefit. In our research we focus particularly on older adults as users of health technologies, designing for their goals and emotional needs.
8.2.2 The Relevance of Context and Mutual Learning for Older Technology Users

From a participatory perspective, we recognise that we need to provide ways for future users of smart health technologies to feel enabled to share their experiential knowledge about what they want for their life [19], which is grounded in their knowledge about the context they are immersed in [20]. However, users, and in particular older adults, often have no expertise or experience in supporting technology development processes, or may even have little experience with the use of modern technology itself. Therefore, we need a language that enables them to talk confidently about their goals for technology use. Older users’ goals for technology use are also strongly related to their everyday context. This means that their interactions with technologies are embedded or ‘situated’ in their actions [21]. Understanding and designing for the situation can lead older users to engage better with technologies and helps them to feel in control and at the centre of attention [22]. Hence, we need to know about the skills, experiences and goals of older users before developing and implementing technologies. Suchman’s concept of situated action [21] leads us to think about technologies not as ‘smart’ per se but as located in a social and material context. Hence it is crucial to understand the context and goals of users when developing and implementing smart technologies used to support people’s individual health and wellbeing. As such, our suggested approach of Motivational Modelling for developing shared goals is closely related to the concept of “mutual learning”: “That is, designers learn from the participants about their experiences, practices and situations, and participants learn from the designers about potential technological options and how these can be provided. Everyone involved learns more about technology design.” [23].
Future users often do not know about technology development, while researchers and developers do not know about the life experiences and context the future users are immersed in, or how the technologies would fit with their everyday activities. We therefore need an approach to bridge these gaps over time. Motivational models can take on this role of a shared language or “information vessels” [24] that allow the experiences and life goals to permeate the discussions of developers responsible for programming health technologies. In our case studies we show how the direct participation of older adults and other stakeholders occurred via creating shared goals for technology development that carry meaning for the key stakeholders involved. We see Motivational Modelling as a uniquely suited approach that aligns with the concept of situated action [21] and promotes older users as experts of their everyday life context. Our approach follows the principles of co-design, providing users with a tool to share their expertise and goals for everyday life with research and development teams [25, 26].
8.3 Motivational Modelling as a Co-design Method

8.3.1 Motivational Modelling

Motivational models can help articulate the ambiguous nature of social concepts [27]. Emerging from fifteen years of research into agent-oriented software engineering, motivational models extend models used for socio-technical systems in the business domain. Being concerned with economic output, such socio-technical models predominantly focus on the functionality of concepts, which, according to Miller and colleagues [27], neglects the social context in which the concept is being used. By also modelling the quality goals and the emotional goals [28] of the designed solution, the needs of end users can be identified, which will enable technologies, systems and services to better support people in their everyday lives.

Motivational models are developed in a two-part process. The first part uses the “do/be/feel” elicitation method, which ultimately leads to the generation of four lists of words by stakeholders: do, be and feel goals, and who the stakeholders are. According to Sterling et al. [4], these lists reflect the intended goals and roles of an intended product, system or service from multiple perspectives. The “do” list refers to the functional goals of the concept (i.e., what should it do), the “be” list refers to the quality goals of the concept (i.e., what should it be), the “feel” list refers to the emotional goals of the concept (i.e., how should it feel) and, finally, the “who” list refers to who achieves these goals (i.e., identifies the stakeholders). Using a whiteboard to generate these word lists is convenient, and the exercise typically takes 30 min to complete [4]. Figure 8.1 shows the word lists for a research project that aimed to develop an application to help with lower back pain. The goals are represented by symbols (parallelograms, clouds and hearts) which are easy to understand across different disciplines [29].
Once a draft has been constructed, the model is shown to stakeholders for feedback, and any revisions to the model can be made. This is done so that the stakeholders take ownership of their contribution because, often, stakeholders will not refer to the model unless they own it. The second part of the process uses the do/be/feel and ‘who’ word lists to populate the motivational model (see Fig. 8.2), typically with a software tool. However, before the lists are entered, it is important to ensure that they are reviewed by all key stakeholders involved, so that they represent multiple perspectives and even contradictory goals. As long as the overall goal and underlying main values are shared, Motivational Modelling provides a solid basis to guide development.

Fig. 8.1 Do/be/feel and who word lists on white board for a lower back pain project

Fig. 8.2 Motivational model for developing a system to support people with lower back pain

Figure 8.2 depicts a motivational model for an app to support people with lower back pain. The primary stakeholders are patients and clinicians, indicated by the label under the person figure in the model. Other stakeholders include health funds, government, employers and families, to mention four of the stakeholders depicted under the person figure on the left of Fig. 8.2. The overall goal was to ‘Support people with lower back pain’. This was divided into four goals: (i) informing people about what is known about back pain, (ii) observing the person with back pain both by collecting data and by ongoing monitoring, (iii) making an assessment based on the observations, and reporting on what was found by visualising the data collected, and (iv) producing outcome measures that would be useful for patients and clinicians. Overall, the app and the overall support system needed to be affordable, integrated, lightweight and scalable, as indicated in the cloud at the top of the diagram. The observation had to be non-intrusive, easy-to-use, accurate and reliable, as indicated in the cloud at the bottom left of Fig. 8.2. The report should be adaptable and intuitive. The intended emotional outcome is that the patient feels empowered, freed up, confident and hopeful, to mention four of the words in the heart at the top of the diagram. The concerns to be overcome are frustration, being too hard to use, and a lack of motivation, as listed in the spade figure at the right of the diagram. This figure was arrived at after several iterations with the project team consisting of the key stakeholders. The model helped ensure that the team had a shared understanding of the project. The model guided the collection of user input, which was reported in detail in the paper by Merolli et al. [30].

Even though co-design has many benefits and has been successfully used in an array of domains, it is argued that methods are needed to guide the co-design process. We found the described Motivational Modelling approach to be useful as it does not limit any aspect of participation and gives all stakeholders a strong voice in the design outcome. However, before we can discuss the benefits of using Motivational Modelling as a co-design method, the rationale for why we believe it truly is a co-design method needs to be addressed.
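For readers who want to handle such a model programmatically, the do/be/feel and who lists of the lower back pain example can be sketched as a small goal hierarchy. This is only an illustrative sketch, not the software tool used in the project: the class names `Goal` and `MotivationalModel`, the `collect` helper and the shortened goal labels are hypothetical, and attaching the emotional goals and concerns to the top-level goal is an assumption based on the description of Fig. 8.2.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A functional ('do') goal with attached quality and emotional goals."""
    name: str
    be: list[str] = field(default_factory=list)        # quality goals (clouds)
    feel: list[str] = field(default_factory=list)      # emotional goals (hearts)
    concerns: list[str] = field(default_factory=list)  # concerns to overcome (spades)
    subgoals: list["Goal"] = field(default_factory=list)

    def walk(self):
        """Yield this goal and all of its descendants, depth first."""
        yield self
        for sub in self.subgoals:
            yield from sub.walk()


@dataclass
class MotivationalModel:
    """Hypothetical container: a goal hierarchy plus the 'who' list."""
    root: Goal
    who: list[str] = field(default_factory=list)

    def collect(self, attr: str) -> list[str]:
        """Flatten one list ('be', 'feel' or 'concerns') over all goals, without duplicates."""
        seen: list[str] = []
        for goal in self.root.walk():
            for item in getattr(goal, attr):
                if item not in seen:
                    seen.append(item)
        return seen


# Values taken from the description of Fig. 8.2; goal labels are abbreviated.
back_pain = MotivationalModel(
    root=Goal(
        name="Support people with lower back pain",
        be=["affordable", "integrated", "lightweight", "scalable"],
        feel=["empowered", "freed up", "confident", "hopeful"],
        concerns=["frustration", "too hard to use", "lack of motivation"],
        subgoals=[
            Goal("Inform people about back pain"),
            Goal("Observe the person with back pain",
                 be=["non-intrusive", "easy-to-use", "accurate", "reliable"]),
            Goal("Assess and report", be=["adaptable", "intuitive"]),
            Goal("Produce outcome measures"),
        ],
    ),
    who=["patients", "clinicians", "health funds", "government",
         "employers", "families"],
)

print(back_pain.collect("be"))
```

Keeping the quality and emotional goals attached to the functional goal they qualify mirrors the way the diagram places clouds and hearts next to parallelograms, so a development team could, for instance, list all qualities that a given sub-goal must satisfy.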
8.3.2 Motivational Modelling as Driver of Co-design Principles

It is argued that Motivational Modelling is a co-design method because the same driving principles of co-design are achieved when utilising Motivational Modelling to develop technologies. To illustrate this point, key co-design principles are displayed in Table 8.1. Two instances in which Motivational Modelling was used are then presented. We discuss how the co-design principles were applied in these Motivational Modelling examples to better understand what needed to be achieved through the respective technologies. As can be seen in Table 8.1, the key co-design principles are: value driven, collaborative, creative, involving, and shared. While this list is not exhaustive, these principles articulate the backbone of what co-design constitutes and what the process hopes to achieve, as outlined in the literature. Furthermore, because co-design in practice can mean different things to different people, these principles can also offer consistency [34]. Here, it is argued that one way in which co-design (and these principles) can be achieved is by using Motivational Modelling, as it will be shown that these principles are inherent in this method.
Table 8.1 Driving principles of co-design and their application in co-design research

Co-design principle | Application in co-design
Value driven | Giving individuals an opportunity to incorporate their values and objectives into the project is important. This allows the space for genuine participation and identification of the solution among all individuals in the co-design team [31]
Collaborative | Co-design works with people to discover new and relevant information across user groups, knowledge domains or disciplines. All key stakeholders collaborate on a shared overall goal or problem and are considered equals in the design process at all times [7]
Creative | Co-design creates services, artefacts, or ideas that are made available to the public or specific user groups. The outcome of the co-design project can differ to what was expected as more details emerge throughout the process and time [18]
Involving | Involving people who are representative of the target audience is important and will help keep the focus on the relevant stakeholders at all times. When involving people in the project, it is important that information is shared to maintain a common understanding of the problem [32]
Shared | Knowledge about the problem, project goals and information about the project must be openly shared within the team. This will maintain participation and interest [33]
8.4 Two Case Studies Using Motivational Modelling

8.4.1 Case Study One: Exploring and Evaluating Person-Centred Care in Audiology Settings

This case study reports on the Motivational Modelling part of a project to develop a holistic and person-centred care approach in hearing rehabilitation settings [12]. To assist with a better understanding of what person-centred care meant for clients of a hearing rehabilitation provider, two co-design workshops were conducted: one with ten audiologists and the second with six people who were living with hearing difficulties. It is argued that only with a shared understanding of what different stakeholders mean by person-centred care is it possible to provide appropriately tailored services to clients, tune their hearing devices and measure success. First, we needed to understand what the different key stakeholders (persons with hearing loss and audiologists) meant by care, what services they would like delivered and how, and what outcomes they would like to achieve from the service. The do/be/feel method is designed to elicit what care should do, how care should be and how care should feel. Completing the do/be/feel elicitation method (the first phase) revealed the goals shown in Table 8.2. Table 8.2 shows the main goals shared between the two key stakeholder groups that were generated from the workshops. It is important to note that audiologists could more easily comment on how care should be and what it should do, while the clients could more easily comment on how it should feel.

The main goal of these workshops was to better understand the shared goals and scope of person-centred care in audiology settings. As can be seen in Table 8.2, both groups identified that person-centred care can only be achieved when several functional goals are combined (what it should do). The main high-level goals concerned the support of the whole person beyond just selling suitable hearing aids. The goals related to social connection and support, understanding the social impact that the device can have on its users, and addressing negative aspects of hearing loss such as stigma and challenges in everyday life. Each of these seven main functional goals for person-centred care in audiology settings was elaborated on further by the two workshop groups, giving examples and addressing how these could be achieved within client consultations. Audiologists reported that ‘ensuring mutual understanding’ could be achieved by spending time listening to the client and repeating what they heard the client wanted to achieve. A challenge was seen in managing expectations and also ensuring that clients understood that a hearing aid would not simply restore their hearing to what it was before they needed a device. Workshop participants with hearing loss wished for contact with other clients (users) with the same or similar devices to exchange experiences and build some peer-to-peer support. ‘Investigating full social impact’ referred to the importance of identifying the scenarios and situations in which clients wished to improve their hearing and where they had difficulties socialising due to their hearing loss. This also included talking about how to address their hearing needs in public and how to talk about them.
Table 8.2 The do, be, feel goals of person-centred care in hearing rehabilitation

Do | Be | Feel | Who
Ensure mutual understanding | Learn from clients | Understood | Clients with hearing loss
Avoid stigma | Transparent | Humanising | Audiologists
Help to cope with hearing loss | Empathetic | Compassionate | Hearing aid provider; Peak body
Link to clients with similar devices | Personalised | Part of community | Family members of clients
Provide service beyond selling | Jargon free | Confident | Wider community
Investigate full social impact | Holistic | Whole of person addressed |
Link to full services | Respectful | Respected (experiences) |

The overarching functional goals were then discussed in terms of the associated qualities (how person-centred care should be).
Key quality goals concerned the role the audiologist should take on while interacting with the client: being transparent and providing a transparent service, using clear, jargon-free language, and being respectful, empathetic and truly aiming to learn from the client as the expert of their own life. Equipped with this knowledge, the audiologists saw themselves, or were seen, as being in a better position to personalise services and address the needs of their clients in a more holistic manner. Next, the two workshop groups discussed the emotional goals for person-centred care (how it should feel). In particular, the participants with hearing loss were quick to come up with ways they wanted to feel when interacting with audiologists. They emphasised that the interactions needed to make them feel understood and respected. They wanted the audiologist to be compassionate, to address their whole person and not just their declining hearing capability, and to increase their confidence in dealing with their hearing loss. They also saw the impact in terms of feeling able to remain integrated in the community in the long term, all in order to achieve ‘providing person-centred care’, which is the overarching functional goal.
8.4.2 Case Study Two: Development of a Personal Emergency Alarm

In our second case study we used Motivational Modelling to develop an innovative personal alarm system with older adults. Acknowledging the lack of research exploring the suitability and acceptance of emergency alarms worn by older adults, Miller et al. [35] conducted a study to better understand the emotions, challenges, benefits and opportunities that are encountered when older adults use emergency alarms, and what they wish for instead in such products. The study sample consisted of twelve participants: eight older adults and four relatives of emergency alarm users, who were included because of their different perspective. The older adults, who received the service at no cost, needed to: (1) have had an assessment, (2) be identified as eligible for low care services, (3) be living in their own home (i.e. not in residential care) and (4) be living on their own. Semi-structured interviews were conducted, with interview questions derived from an initial motivational model of older adults’ emergency alarm use (see Fig. 8.3). This model was generated based on the assumptions of the researchers and also served as a baseline to which the final model could be compared when the interviews were complete. Interview questions included: “How was the decision made to sign up for the emergency alarm service?”, “How do you feel about using this system?”, “Could you describe what actions the emergency alarm system requires?” and “What other technologies are you using?”. Based on an ethnographic content analysis of the interview data, key themes relating to the functional, quality and emotional goals of the two stakeholder groups were extracted and then modelled (see Fig. 8.4). These themes were then used to guide prototype development of a new emergency alarm in the final phase of the study. Both motivational models (Figs. 8.3 and 8.4) served as valuable communication and reference tools for the team and allowed software engineers to evaluate their progress in the development of a prototype by referring to the stakeholder goals defined in the models. The models were also useful for comparing and evaluating other market solutions.

Fig. 8.3 Initial goal model for emergency alarm use

Fig. 8.4 Updated goal model for emergency alarm use based on study data

As can be seen in Figs. 8.3 and 8.4, there are several notable differences in the functional, quality and emotional goals between the contrasting models. For example, in the initial model, created prior to conducting the interviews and involving the end users in the goal definition (Fig. 8.3), it was assumed that older adults wanted to feel accessible, unified and natural when using emergency alarms. Again, this assumption was based on existing literature and anecdotal stories that the researchers had heard [35]. However, these emotional goals were not mentioned by participants in the interviews and were subsequently replaced with more general emotional goals, such as feeling integrated and in control, as these had real impact on participants’ lives (see Fig. 8.4).
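The comparison between the initial and the updated model can be illustrated with a simple set difference. The sketch below is an illustration under stated assumptions, not the procedure used in the study: the function name `compare_goals` is hypothetical, and the two sets contain only the emotional goals named in the paragraph above rather than the full content of Figs. 8.3 and 8.4.

```python
def compare_goals(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Report which goals were dropped, added or kept between two model versions."""
    return {
        "dropped": sorted(before - after),
        "added": sorted(after - before),
        "kept": sorted(before & after),
    }


# Emotional ('feel') goals as named in the text; the figures contain more.
initial_feel = {"accessible", "unified", "natural"}   # researchers' assumptions (Fig. 8.3)
updated_feel = {"integrated", "in control"}           # derived from interviews (Fig. 8.4)

diff = compare_goals(initial_feel, updated_feel)
print(diff["dropped"])  # goals assumed by researchers but not mentioned by participants
print(diff["added"])    # goals that emerged from the interviews
```

Applied to the functional and quality goals as well, the same comparison would give a team a compact record of how far its initial assumptions diverged from what participants reported.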
8.5 Discussion 8.5.1 Application of Co-design Principles All of the co-design principles displayed in Table 8.1 were achieved in the motivational modelling examples described above. In relation to value driven, the do/be/ feel method used for person-centred care in audiology services (example one) and the development of a personal emergency alarm (example two), elicited the goals and needs of the stakeholders which were embedded into the design outcome. Older adult’s goals are based on their values such as the importance of an independent life and not being a burden to others. The do/be/feel method also guided collaboration amongst the stakeholders as they worked together on a shared problem and were considered equals during the process. By eliciting the functional, quality and emotional goals of the stakeholders, a new understanding of fitting hearing aids and providing a more holistic service (example one) and an innovative emergency alarm product (example two) could be created which aligned with the needs of stakeholders involved. Any preconceived ideas of the technology design or service could
8 A Co-design Approach for Developing and Implementing Smart Health …
be challenged as more domain-specific details emerged throughout the participatory process. This meant that expectations of the designed outcome were more realistic and more detailed for all stakeholders involved. This was highlighted in the contrasting figures above (refer to Figs. 8.3 and 8.4), which show notable differences in functional, quality and emotional goals before and after the interviews in which study participants talked about their goals. The undertaking of both examples above involved stakeholders who were representative of the target audience: audiologists and clients with hearing loss (example one) and older adults and relatives of emergency alarm users (example two). The do, be, feel goals maintained the focus on the user group(s) at all times and highlighted the differences in functional, quality and emotional goals of each stakeholder group. For example, as shown in Fig. 8.4, the older adult emergency alarm users wanted to feel in touch, loved, safe and independent when using the personal alarm, whereas their relatives/carers wanted to feel assured. They felt that achieving such feelings would also lower the stigma about this kind of technology and that overall they would be more positive about personal alarm use. Stigma was also a large issue for the users of hearing aids, showing that, besides the actual products and services, the broader context and implications of the wider community need to be considered and given space in such discussions. Lastly, the outcome of the do/be/feel method is a motivational goal model (refer to Figs. 8.3 and 8.4), which is shared with project stakeholders and visually represents their functional, quality and emotional goals for person-centred services in audiology settings (example one) and the envisioned emergency alarm product (example two).
Our experience indicates that the driving principles of co-design (value driven, collaborative, creative, involving and shared) are achieved when utilising Motivational Modelling in research and development. To allow for a direct comparison, the application of these principles in Motivational Modelling has been added to the content displayed in Table 8.1 and can be seen in Table 8.3. Furthermore, similar to co-design, the Motivational Modelling method can also be used for many different applications, including but not limited to the design of person-centred solutions and the development of personal emergency alarms, with a variety of stakeholder groups.
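The comparison between the initial and the updated goal model can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions: the `GoalModel` class and the `diff_goals` helper are hypothetical names introduced here and are not part of the motivational modelling software. Only the emotional goals named in the text are filled in; the functional and quality lists are left empty because the text does not enumerate them.

```python
from dataclasses import dataclass, field


@dataclass
class GoalModel:
    """Hypothetical container for the three do/be/feel goal lists."""
    do: set[str] = field(default_factory=set)    # functional goals
    be: set[str] = field(default_factory=set)    # quality goals
    feel: set[str] = field(default_factory=set)  # emotional goals


def diff_goals(before: set[str], after: set[str]) -> dict[str, set[str]]:
    """Report which goals were dropped, added or kept between two model versions."""
    return {
        "dropped": before - after,
        "added": after - before,
        "kept": before & after,
    }


# Emotional goals as named in the text: assumed before the interviews,
# and the more general goals that replaced them afterwards.
initial = GoalModel(feel={"accessible", "unified", "natural"})
updated = GoalModel(feel={"integrated", "in control"})

changes = diff_goals(initial.feel, updated.feel)
print(sorted(changes["dropped"]))  # ['accessible', 'natural', 'unified']
print(sorted(changes["added"]))    # ['in control', 'integrated']
```

Keeping each goal type as a set makes the before/after comparison a plain set difference, which mirrors how the two figures are contrasted in the discussion.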
8.5.2 Motivational Modelling as a Co-design Method: Benefits and Implications
Understanding how end-users feel when using a product, system or service is vital to the success of use and uptake [36]. In many disciplines such as software engineering, there is a tendency to focus on functionality, with the emotional needs of the end-user often neglected [37]. In fact, it could also be argued that even in co-design research, many stakeholders are not specifically asked how they want to feel when using a product, system or service. As such, their emotional needs fail to be at the forefront of their minds throughout the co-design process, resulting in this vital
Table 8.3 Driving principles of co-design and their application in co-design and motivational modelling research and development
Co-design principle | Application in co-design | Application in motivational modelling
Value driven | Giving individuals an opportunity to incorporate their values and objectives into the project is important. This allows the space for genuine participation and identification of the solution among all individuals in the co-design team [31] | Older adults' goals are based on their values. The whole approach is about eliciting the goals and needs of stakeholders. This is so they can be embedded into the design and align with people's personal desires as well as values
Collaborative | Co-design works with people to discover new and relevant information across user groups, knowledge domains or disciplines. All key stakeholders collaborate on a shared overall goal or problem and are considered equals in the design process at all times [7] | The method is designed to guide collaboration to elicit domain-specific knowledge, e.g., how the innovative emergency alarm needs to be designed to make older adults feel positive
Creative | Co-design creates services, artefacts, or ideas that are made available to the public or specific user groups. The outcome of the co-design project can differ from what was expected as more details emerge throughout the process and over time [18] | The method is used to elicit emotional, functional and quality goals that can be used to guide the design process and to create new experiences that are aligned with individual needs. The formulation of the shared goals is a creative act in itself
Involving | Involving people who are representative of the target audience is important and will help keep the focus on the user at all times. When involving people in the project, it is important that information is shared to maintain a common searching and understanding of the problem [32] | The do/be/feel method can be applied to the same problem among different sets of stakeholder groups to identify relevant and new information. The three prompts (do, be, feel) help keep the focus on the user throughout the goal-setting phases and help users with no technology experience formulate their needs
Shared | Knowledge about the problem, project goals and information about the project must be openly shared within the team. This will maintain participation and interest [33] | The outcome of the do/be/feel method is a motivational goal model. This acts as a one-page blueprint that can be shared and discussed among the team, capturing key user emotions, product qualities and functions
information being overlooked unless specifically asked for by the facilitator. By using Motivational Modelling to guide the co-design process, stakeholders will be asked to think of the design outcome more holistically. That is, not only will the functional and quality goals be focused upon, but the emotional goals will be too, leading to technology solutions that align with user needs. Motivational Modelling helps build a shared understanding of the emotional goals of the stakeholders,
guiding the development and implementation of smart solutions that are more likely to be accepted and adopted by the users. Another benefit of using Motivational Modelling as a co-design method is that it allows transparency. In relation to the do, be, feel elicitation phase, each stakeholder may find each goal type (i.e., functional, quality and/or emotional) equally important, but their priorities can become evident from the descriptor words assigned to each goal type. For example, a study conducted by Wachtler et al. [38] investigated the uptake of an app designed to improve treatment allocation for depression. Motivational Modelling was used with two stakeholder groups, patients and clinicians, and identified that patients wanted the app to make them feel emotionally supported and confident that the information presented to them was relevant. In contrast, the clinicians focused on the validity, utility and professional appearance of the app. Hence, the findings of Wachtler and colleagues [38] revealed that patients prioritised emotional goals whereas clinicians prioritised the functional and quality goals of the app. These differences in priorities are important as they need to be accommodated throughout the design process; otherwise, stakeholder engagement with the app would be compromised. Motivational Modelling can also enhance the co-design process itself in three ways. First, the initial phase of Motivational Modelling, the do/be/feel elicitation phase, takes only half an hour to complete (depending on circumstance), so it is efficient in comparison to other co-design workshops and studies, which can take hours to complete. Secondly, from observations made when conducting numerous motivational modelling workshops, the do/be/feel phase creates a positive collaborative environment amongst stakeholders which, in turn, encourages participation.
Thirdly, the do/be/feel phase encourages the stakeholders to tackle the design activity from a high-level, rather than a low-level, view. These three factors encourage collaboration between the stakeholders and help them understand how they would benefit from the use of a smart system or service. In relation to the second phase, the model construction phase, the resulting model can be used throughout the entire co-design process. Importantly, this enables stakeholders to stay on track by referring to a one-page, easy-to-understand image that explains what the project is trying to achieve. The model can act as a reminder of the goals of the users throughout the design and implementation process of the smart solutions. Furthermore, the software used to transpose the do/be/feel word lists into the model is easy to use, and the transposition can be done quickly by the workshop facilitator. The flexibility of the software allows the model to be easily refined as needed (i.e., some goals deleted, others added) with the insights that are learned throughout the co-design process. This refinement can help to understand, and bring nuance to, goals of the stakeholders that may not have been voiced in the do/be/feel elicitation phase, e.g., by making it possible to understand in more detail the desired emotions of older adults for smart health technologies based on their everyday experiences and needs.
8.6 Conclusions
Despite co-design being widely used in a broad range of research and in the development of new technologies, it is argued that methods are needed to guide the co-design process. One such method, presented here, is Motivational Modelling, as the driving principles of co-design are achieved when conducting the method in research and development. In this chapter, two examples were presented which highlighted this in different application domains, namely tele-audiology and the development of an innovative personal emergency alarm addressing older adults' health needs. It was shown that by conducting motivational modelling, the principles of value driven, collaborative, creative, involving and shared were inherently followed, giving participants a strong voice in the design outcome. Hence, true co-design can be achieved through the use of Motivational Modelling, which provides some structure and transparency to a process that is often interpreted differently by different people. Importantly, Motivational Modelling considers the emotional goals of stakeholders, which have previously been neglected in many fields such as software engineering and also co-design. Considering the emotions of end-users (and other relevant stakeholders) when using a product, system, or service is shown to be paramount to uptake and high adoption rates. Smart health technologies need to be built so that they convey meaning that is understood by end users. After all, they are the experts of their lives and in a position to evaluate the fitness, and hence the smartness, of the technology provided.
References
1. Lim, C.-P., Chen, Y.-W., Vaidya, A., et al.: Handbook of Artificial Intelligence in Healthcare: vol 2: practicalities and Prospects, 1st edn., p. 20. Springer International Publishing, Cham
2. Pirhonen, J., Lolich, L., Tuominen, K., et al.: “These devices have not been made for older people’s needs”—Older adults’ perceptions of digital technologies in Finland and Ireland. Technol. Soc. 62, 101287 (2020). https://doi.org/10.1016/j.techsoc.2020.101287
3. Muñoz, D., Pedell, S., Sterling, L.: Evaluating engagement in technology-supported social interaction by people living with dementia in residential care. ACM Trans. Comput. Interact. 29, 1–31 (2022). https://doi.org/10.1145/3514497
4. Sterling, L., Pedell, S., Oates, G.: Using motivational modelling with an app designed to increase student performance and retention. In: Glick, D., Cohen, A., Chang, C. (eds.) Early Warning Systems and Targeted Interventions for Student Success in Online Courses, pp. 161–176. IGI Global, Hershey, PA, USA (2020)
5. Burrows, R., Mendoza, A., Pedell, S., et al.: Technology for societal change: evaluating a mobile app addressing the emotional needs of people experiencing homelessness. Health Inform. J. 28, 14604582221146720 (2022). https://doi.org/10.1177/14604582221146720
6. Sterling, L., Burrows, R., Barnet, B., et al.: Emotional Factors for teleaudiology. In: TeleAudiology and the Optimization of Hearing Healthcare Delivery, pp. 1–18. IGI Global (2019)
7. Détienne, F., Baker, M., Burkhardt, J.-M.: Quality of collaboration in design meetings: methodological reflexions. CoDesign 8, 247–261 (2012). https://doi.org/10.1080/15710882.2012.729063
8. Taffe, S., Tjung, C., Paulovich, B., Pedell, S.: Playful technology to help deaf children to speak: a case study using co-design method for designers to learn from speech therapists and parents. In: Proceedings of the 5th Conference on Design4Health. Sheffield, UK (2018)
9. Pedell, S., Favilla, S., Murphy, A., et al.: Promoting personhood for people with dementia through shared social touchscreen interactions. In: Woodcock, A., Moody, L., McDonagh, D., et al. (eds.) Design of Assistive Technology for Ageing Populations, pp. 335–361. Springer International Publishing, Cham (2020)
10. Taffe, S., Pedell, S., Wilkinson, A.: Reimagining ageing: insights from teaching co-design methods with designers, seniors and industry partners. Des. Health 2, 107–116 (2018). https://doi.org/10.1080/24735132.2018.1450945
11. Munoz, S.-A., Macaden, L., Kyle, R., Webster, E.: Revealing student nurses’ perceptions of human dignity through curriculum co-design. Soc. Sci. Med. 174, 1–8. https://doi.org/10.1016/j.socscimed.2016.12.011
12. Priday, G., Pedell, S., Vitkovic, J., Story, L.: Tracking person-centred care experiences alongside other success measures in hearing rehabilitation. In: Lim, C.-P., Vaidya, A., Chen, Y.-W., et al. (eds.) Artificial Intelligence and Machine Learning for Healthcare: vol. 1: image and Data Analytics, pp. 185–210. Springer International Publishing, Cham
13. Papoutsi, C., Wherton, J., Shaw, S., et al.: Putting the social back into sociotechnical: case studies of co-design in digital health. J. Am. Med. Inform. Assoc. 28, 284–293 (2021). https://doi.org/10.1093/jamia/ocaa197
14. Greenhalgh, T., Hinton, L., Finlay, T., et al.: Frameworks for supporting patient and public involvement in research: systematic review and co-design pilot. Heal. Expect 22, 785–801 (2019). https://doi.org/10.1111/hex.12888
15. Martínez, R., Moreno-Muro, F.J., Melero-Muñoz, F.J., et al.: Co-design and engineering of user requirements for a novel ICT Healthcare Solution in Murcia, Spain. In: Pires, I.M., Spinsante, S., Zdravevski, E., Lameski, P. (eds.) Smart Objects and Technologies for Social Good. GOODTECHS, pp. 279–292. Springer International Publishing, Cham (2021)
16. Cajamarca, G., Herskovic, V., Lucero, A., Aldunate, A.: A co-design approach to explore health data representation for older adults in Chile and Ecuador. In: Designing Interactive Systems Conference. Association for Computing Machinery, pp. 1802–1817, New York, NY, USA (2022)
17. Roser, T., Samson, A., Humphreys, P., Cruz-Valdivieso, E.: New pathways to value: co-creating products by collaborating with customers, London (2009)
18. Sanders, E.B.-N., Stappers, P.J.: Probes, toolkits and prototypes: three approaches to making in codesigning. Co-Design 10, 5–14 (2014). https://doi.org/10.1080/15710882.2014.888183
19. Robertson, T., Simonsen, J.: Participatory design: an introduction. In: Routledge International Handbook of Participatory Design, pp. 21–38. Routledge, London, England (2012)
20. Visser, F.S., Stappers, P.J., van der Lugt, R., Sanders, E.B.-N.: Contextmapping: experiences from practice. Co-Design 1, 119–149 (2005). https://doi.org/10.1080/15710880500135987
21. Suchman, L.: Human-Machine Reconfigurations: Plans and Situated Actions, 2nd edn. Cambridge University Press, Cambridge, United Kingdom (2006)
22. Pedell, S., Constantin, K., Muñoz, D., Sterling, L.: Designing meaningful, beneficial and positive human robot interactions with older adults for increased wellbeing during care activities. In: Lim, C.-P., Chen, Y.-W., Vaidya, A., et al. (eds.) Handbook of Artificial Intelligence in Healthcare: vol 2: practicalities and Prospects, pp. 85–108. Springer International Publishing, Cham (2022)
23. Robertson, T., Leong, T.W., Durick, J., Koreshoff, T.: Mutual learning as a resource for research design. In: Proceedings of the 13th participatory design conference: short papers, industry cases, workshop descriptions, Doctoral Consortium Papers, and Keynote Abstracts, vol. 2, pp. 25–28. ACM, New York, NY, USA (2014)
24. Paay, J., Sterling, L., Vetere, F., et al.: Engineering the social: the role of shared artifacts. Int. J. Hum. Comput. Stud. 67, 437–454 (2009). https://doi.org/10.1016/j.ijhcs.2008.12.002
25. Sanders, E.B.-N., Stappers, P.J.: Co-creation and the new landscapes of design. Co-Design 4, 5–18 (2008). https://doi.org/10.1080/15710880701875068
26. Brandt, E., Binder, T., Sanders, E.B.-N.: Ways to engage telling, making and enacting. In: Routledge International Handbook of Participatory Design, pp. 145–181. Routledge, New York. Routledge, London, England (2012)
27. Miller, T., Pedell, S., Sterling, L., et al.: Understanding socially oriented roles and goals through motivational modelling. J. Syst. Softw. 85, 2160–2170 (2012). https://doi.org/10.1016/j.jss.2012.04.049
28. Marshall, J.: Agent-based modelling of emotional goals in Digital Media Design Projects. In: S. Goschnick (ed.) Innovative Methods, user-friendly tools, coding, and design approaches in people-oriented programming, pp. 262–284. (2018). https://doi.org/10.4018/978-1-52255969-6.ch008
29. Sterling, L., Taveter, K.: The Art of Agent-Oriented Modeling. MIT Press (2009)
30. Merolli, M., Marshall, C.J., Pranata, A., et al.: User-centered value specifications for technologies supporting chronic low-back pain management. Stud. Health Technol. Inform. 264, 1288–1292 (2019). https://doi.org/10.3233/shti190434
31. Palmås, K., von Busch, O.: Quasi-Quisling: co-design and the assembly of collaborateurs. Co-Design 11, 236–249 (2015). https://doi.org/10.1080/15710882.2015.1081247
32. Pedersen, J.: War and peace in codesign. Co-Design 12, 171–184 (2016). https://doi.org/10.1080/15710882.2015.1112813
33. Jin, Y., Chusilp, P.: Study of mental iteration in different design situations. Des. Stud. 27, 25–55 (2006). https://doi.org/10.1016/j.destud.2005.06.003
34. Burkett, I.: An Introduction to Co-Design. Knode, Sydney, Australia (2012)
35. Miller, T., Pedell, S., Lopez-Lorca, A.A., et al.: Emotion-led modelling for people-oriented requirements engineering: the case study of emergency systems. J. Syst. Softw. 105, 54–71 (2015). https://doi.org/10.1016/j.jss.2015.03.044
36. Au, N., Ngai, E.W.T., Cheng, T.C.E.: Extending the understanding of end user information systems satisfaction formation: an equitable needs fulfillment model approach. MIS Q 32, 43–66 (2008). https://doi.org/10.2307/25148828
37. Lopez-Lorca, A.A., Miller, T., Pedell, S., et al.: One size doesn’t fit all: diversifying “the User” using personas and emotional scenarios. In: Proceedings of the 6th International Workshop on Social Software Engineering. Association for Computing Machinery, pp. 25–32, New York, NY, USA (2014)
38. Wachtler, C., Coe, A., Davidson, S., et al.: Development of a mobile clinical prediction tool to estimate future depression severity and guide treatment in primary care: user-centered design. JMIR Mhealth Uhealth 6, e95 (2018). https://doi.org/10.2196/mhealth.9502
Professor Sonja Pedell has 17 years of experience in evidence-based research, advocacy and participatory design processes. She carries out research working with many different user groups including older adults, culturally diverse groups, homeless people, people with low income and people living with chronic illnesses or dementia. Her research expertise includes human-centred design methods, co-design, and scenario-based design, the design of engaging technology, robotics, and the theoretical modelling of motivational goals for design development. Sonja has been located in the Department of Communication Design/School of Design and Architecture at Swinburne University of Technology (Melbourne, Australia) since January 2012, and her research contributes extensive knowledge of human–computer interaction (HCI), design and research methods to the teaching of digital media, experience and communication design. Prior to taking up this role at Swinburne, she completed a Master of Psychology at the Technical University of Berlin and was employed as an Interaction Designer, Usability Consultant and Product Manager in industry for several years. Sonja is Director of the Swinburne Living Lab, which is now the oldest accredited active Australian member of the European Network of Living Labs (EnoLL). The Living Lab has developed core development capabilities in the area of innovative socio-technical systems and design solutions for “wellbeing and enjoyment for whole of life”. Sonja achieves this through co-designing solutions for enriching the everyday living of marginalised groups. Via co-design she brings those with technical expertise and those with lived experience together to design meaningful products, services and experiences in their environments and to develop innovative solutions that people readily adopt, as they fit with their actual needs and goals. Sonja evaluates the effectiveness of these products, services and experiences with a focus on how they benefit people’s lives.
Since August 2019 she has been leading the Assistive Robots for Healthcare Program in the Future Manufacturing Institute at Swinburne, emphasising the importance of context and ethical consideration when introducing robots into people’s lives. Since March 2020 she has been Theme Leader for Designing with Users in the Smart Cities Research Institute, bringing people’s everyday needs to city planning.
Chapter 9
Recent Applications of BCIs in Healthcare
I. N. Angulo-Sherman and R. Salazar-Varas
Abstract Brain-computer interfaces (BCIs) are systems that translate brain activity, which is typically measured using electroencephalography (EEG), into commands for controlling a device without relying on signals from the peripheral nerves or muscles. BCIs have shown high potential in a wide variety of healthcare applications. For instance, they have been implemented in mobility or assistive tools, vision restoration devices and cognitive or motor rehabilitation systems. This chapter discusses the recent advances of BCIs in healthcare, focusing on the work carried out by female researchers, as well as some of the current limitations and future perspectives.
Keywords BCI · EEG · Women · Healthcare · Smart health
List of Abbreviations and Symbols
ALS: Amyotrophic lateral sclerosis
AR: Autoregressive
BCI: Brain-computer interface
cBCI: Collaborative brain-computer interface
CFC: Cross-frequency coupling
CNN: Convolutional neural network
ECoG: Electrocorticography
EEG: Electroencephalography
EOG: Electrooculogram
ERP: Event related potential
fMRI: Functional magnetic resonance image
ICA: Independent Component Analysis
LDA: Linear discriminant analysis
LIS: Locked-in syndrome
M1: Primary motor cortex
MEG: Magnetoencephalography
MRCPs: Movement-related cortical potentials
PDC: Partial directed coherence
PM: Premotor
RF: Random Forest
S1: Primary somatosensory cortex
SCI: Spinal cord Injury
SCP: Slow cortical potential
SMR: Sensorimotor rhythms
SSVEP: Steady-state visual evoked potentials
I. N. Angulo-Sherman, Universidad de Monterrey, Av. Ignacio Morones Prieto 4500 Pte., 66238 San Pedro Garza García, Nuevo León, México. e-mail: [email protected]
R. Salazar-Varas (B), Universidad de las Américas Puebla, Ex Hacienda Sta. Catarina Mártir S/N., San Andrés Cholula, Puebla. C.P. 72810, México. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_9
9.1 Introduction
With the development of information technology, smart healthcare has arisen as the incorporation of new technology, such as artificial intelligence, microelectronics, mixed reality, mobile Internet and big data, into different parts of healthcare in order to redirect the healthcare approach towards patient-centered care and management and thereby satisfy individual needs [85]. Based on a recent review of smart health, the applications of smart healthcare can be divided, depending on such needs, into implementations for assisting diagnosis and treatment, health management, disease prevention and risk monitoring, virtual assistance, assisting drug research and smart hospitals [85]. One of the emerging technologies that can be implemented in smart health applications is the brain-computer interface (BCI). Further details are given in Sect. 9.2. For instance, BCIs have been used to assist diagnosis, disease monitoring and treatment [19, 65], for example in pathologies such as hyperactivity, depression and dementia, as well as in vision restoration devices and cognitive or motor rehabilitation systems, among others [5, 20, 47]. These systems have also been used for supporting patients in their own homes [3], as well as in health management approaches for improving the process of making complex decisions [28]. Hence, the use of brain-computer interfaces has become fundamental in healthcare [64]. Since the emergence of BCIs at the end of the 20th century, most of the work in this line of research has been carried out by male researchers, with reduced visibility and participation of female researchers. The latter is not due to a lack of interest, but to different social factors [77]. However, it is important to mention that female participation has been increasing. Therefore, it is interesting to review
the work carried out by women in the field of BCIs, particularly focused on healthcare [83]. For this reason, different proposals for the use of BCIs in the field of healthcare are reviewed in this chapter, focusing on the contributions of female researchers. The current situation of women in this area of knowledge is also presented, as well as future perspectives. A similar work was reported previously, in 2022, in which the Women in Human Neuroscience series presented a research topic dedicated to the work of Women in Brain-Computer Interfaces with the aim of showing the scientific contributions of women to the understanding and development of approaches for enhancing and assisting the human neural systems through user-technology interaction [24]. This initiative was led by Caterina Cinel, Camille Jeunet, Zulay Lugo, Floriana Pichiorri, Angela Riccio and Selina C. Wriessnegger, and it resulted in five research articles up to January of 2023 [10, 25, 45, 69, 80]. For ease of reading, the information presented in this chapter is organized as follows: Sect. 9.2 gives a background of BCIs, describes the parts of an EEG-based BCI, provides an overview of the different kinds of BCIs and briefly describes the challenges in this field. Section 9.3 describes the current role of women in the advances of BCI research, considering different contexts, such as diagnosis and treatment, assistive technology and health management, among others. It is important to note that only studies based on EEG signals are considered in this chapter, due to their extended use in BCI research and their potential use in smart health. Finally, Sect. 9.4 discusses the current situation faced by women in BCI research and the perspectives for them in this field.
9.2 Brain-Computer Interfaces
Brain-computer interfaces (BCIs) are systems that allow a device to be controlled or manipulated through the use of brain activity and without the participation of peripheral nerves or muscles [57, 95]. Therefore, it is necessary to decode the brain activity (i.e., to infer mental processes from patterns of brain activity [76]) and to associate it with a control command. To achieve this goal, different stages are required: brain activity acquisition, processing and application. Figure 9.1 shows the typical block diagram of a BCI.
Fig. 9.1 Typical sequence of data acquisition and processing for BCI applications (blocks: sensors, brain activity signal acquisition, processing [preprocessing, feature extraction, classification, postprocessing], application)
As can be seen in Fig. 9.1, the first stage consists of signal acquisition. The brain activity can be recorded through the use of different techniques such as magnetoencephalography (MEG), functional magnetic resonance image (fMRI), electrocorticography (ECoG) and electroencephalography (EEG), among others [1]. Due to its low complexity and cost, EEG is the most widely used technique for BCI systems [6]. EEG is a noninvasive approach that measures the electrical activity of the brain using electrodes placed on the scalp. The measurements obtained with this method are the sum of the electrical activity of millions of neurons in the cortex of the brain [57]. EEG electrodes are typically placed according to the 10–20 system, which specifies the location of 19 electrodes relative to two landmarks of the skull, the inion and the nasion. It is also common to use extensions of this sensor array with 32, 64, 128 or 256 electrodes [73]. Figure 9.2 shows the placement proposed by the 10–20 system. Based on the anatomical landmarks inion and nasion shown in the figure, the positions are marked to divide the head into intervals of either 10 or 20% of the inion-nasion distance [81]. The names of the electrodes indicate the regions of the head in which the sensors are located: occipital (O), parietal (P), temporal (T), central (C), and frontal (F). The spatial resolution of EEG is limited, in part because only a finite number of electrodes can be positioned on the scalp [57]. However, EEG has excellent temporal resolution, which allows measurements on the order of milliseconds, as well as good portability and low cost. These characteristics make this technique attractive for BCI applications. Once the brain activity has been recorded, the next stage is the processing, which is composed of the preprocessing, feature extraction and classification phases. In the preprocessing step, the signal is prepared for its processing (feature extraction and classification) by filtering and rescaling.
This step is crucial, since the EEG signal is very vulnerable to noise because of its small amplitude, on the order of microvolts. Noise can originate from natural sources, i.e., from the subject (e.g., breathing, heart signal or movement), or from artificial sources in the surrounding environment (e.g., line
Fig. 9.2 Electrode positions of the 10–20 system (landmarks: nasion and inion; electrodes shown: Fp1, Fp2, F7, F3, Fz, F4, F8, T7, C3, Cz, C4, T8, P7, P3, Pz, P4, P8, O1, O2; spacing marked at 10% and 20%)
power interference). A wide variety of techniques can be applied at this stage, such as spatial filters, temporal filters and adaptive filters [46]. The preprocessing phase is followed by the feature extraction stage, in which a feature vector is built that stores a set of values (i.e., features) that describe the relevant information of the brain signals and allow further interpretation of brain activity [49]. Thus, the feature vector is a simplified representation of such signals and it is the input to the classification stage. The features can be extracted in the spatial, frequency or time domain. Several features in the time or frequency domain have been proposed according to the phenomenon under study [29, 42, 91]. Finally, in the classification stage a decision is made based on the information contained in the recorded brain signal via the evaluation of the features. It is at this point that brain activity is associated with a control command to manipulate an application or external device. A detailed guide to designing and building a BCI can be found in [36]. The choice of features varies according to the BCI paradigm that is used. A BCI paradigm is a type of BCI that is designed to use a particular type of neural activity feature [84]. Thus, BCI systems are classified based on the brain activity patterns, induced by a specific event, that the system seeks to detect. Considering these patterns, BCIs can be divided into systems based on the detection of sensorimotor rhythms (SMRs), steady-state visual evoked potentials (SSVEPs), the P300 component of event related potentials (ERPs), and slow cortical potentials (SCPs) [6]. However, most BCI systems are based on SMRs, P300s, or SSVEPs. A brief description of these three approaches is presented next.
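Before turning to the individual paradigms, the processing stages described above (preprocessing, feature extraction, classification, and the mapping to a control command) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a reference implementation: the signals are synthetic, the sampling rate, the moving-average filter, the two features, the nearest-centroid classifier and the `COMMANDS` mapping are all choices made here for illustration, and every function name is hypothetical.

```python
import math
import random

FS = 128  # sampling rate in Hz (assumed for this sketch)


def preprocess(epoch, window=3):
    """Remove the DC offset, then smooth with a short moving average (a basic temporal filter)."""
    mean = sum(epoch) / len(epoch)
    centered = [x - mean for x in epoch]
    half = window // 2
    smoothed = []
    for i in range(len(centered)):
        chunk = centered[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def extract_features(epoch):
    """Build a two-element feature vector: log of the mean-square power and zero-crossing rate."""
    power = sum(x * x for x in epoch) / len(epoch)
    crossings = sum(1 for a, b in zip(epoch, epoch[1:]) if a * b < 0)
    return [math.log(power + 1e-12), crossings / len(epoch)]


def train_centroids(features, labels):
    """Average the feature vectors of each class (nearest-centroid classifier)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [total / counts[label] for total in acc] for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def synthetic_epoch(amplitude, rng, seconds=1.0):
    """Simulated single-channel epoch: a 10 Hz oscillation plus Gaussian noise."""
    n = int(FS * seconds)
    return [amplitude * math.sin(2 * math.pi * 10 * t / FS) + rng.gauss(0, 0.5)
            for t in range(n)]


rng = random.Random(0)
COMMANDS = {"low": "stop", "high": "move"}  # hypothetical mapping from class to command

# Training phase: collect labelled feature vectors and compute one centroid per class.
train_x, train_y = [], []
for label, amplitude in (("low", 1.0), ("high", 4.0)):
    for _ in range(20):
        train_x.append(extract_features(preprocess(synthetic_epoch(amplitude, rng))))
        train_y.append(label)
centroids = train_centroids(train_x, train_y)

# Application phase: a new epoch is processed with the same chain and mapped to a command.
new_epoch = synthetic_epoch(4.0, rng)
predicted = classify(extract_features(preprocess(new_epoch)), centroids)
print(COMMANDS[predicted])
```

The same three functions are applied during training and during use, which reflects the point that the feature vector, not the raw signal, is the input to the classification stage.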
9.2.1 Sensorimotor Rhythms
BCIs that use sensorimotor rhythms (SMRs) to provide control outputs have sensors located over the motor cortex in order to detect, in the brain activity, the mental practice of movement (i.e., motor intention or mental imagery). The mental practice of motor tasks activates some parts of the sensory and motor neural network that are also involved in actual motor performance [23]. This activation of the sensorimotor cortex produces changes in the SMRs that can be observed as an attenuation of the EEG signals in the μ (8–12 Hz), α (9–12 Hz) and β (13–30 Hz) frequencies [30, 62], as well as an increase in the γ frequencies (>30 Hz), compared to when the user is not performing any motor task [94]. These changes follow a specific distribution over the primary motor cortex (M1) and the primary somatosensory cortex (S1) that depends on the part of the body involved in the motor task. This spatial distribution is schematized in Fig. 9.3 for both brain hemispheres. A more detailed description of this topological distribution can be found in [41]. These BCIs have the potential to restore or enhance motor functions, or to create an additional mechanism of communication, for people with motor impairments or healthy individuals [94]. Through the mental practice of the task that has to
I. N. Angulo-Sherman and R. Salazar-Varas
Fig. 9.3 Topological distribution of the parts of the body in the sensorimotor cortex for both brain hemispheres
[Figure labels recoverable from the extracted text: Foot, M1 and S1 (shown for each hemisphere), and Central sulcus]
be enhanced or restored, a BCI can be used not only in rehabilitation processes but also for training in sports, which can also be combined with virtual reality for more controlled training procedures [39]. Because the SMRs used by these BCIs are modulated at will and are not evoked by an external stimulus, users may require a period of training to produce detectable changes in their brain activity patterns [40].
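The attenuation of the μ rhythm described in this section is often summarized as a relative change in band power between a rest period and a motor imagery period. The following minimal sketch illustrates that idea under stated assumptions: the signals are synthetic 10 Hz sinusoids rather than EEG, the sampling rate and amplitudes are hypothetical, and the relative band-power measure (task power minus rest power, divided by rest power) is a commonly used convention that is not taken from this chapter.

```python
import math

FS = 100  # assumed sampling rate in Hz (hypothetical value)


def band_power(signal, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes over the bins that fall in [f_lo, f_hi] Hz."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            power += re * re + im * im
    return power


def relative_power_change(rest, task, fs, band=(8.0, 12.0)):
    """Percentage change of band power during the task with respect to rest.

    Negative values indicate attenuation (desynchronization) in the band.
    """
    p_rest = band_power(rest, fs, *band)
    p_task = band_power(task, fs, *band)
    return 100.0 * (p_task - p_rest) / p_rest


def synthetic_mu(amplitude, seconds=2, freq=10.0):
    """Synthetic stand-in for a mu rhythm: a pure sinusoid, not real EEG."""
    return [amplitude * math.sin(2 * math.pi * freq * t / FS) for t in range(FS * seconds)]


rest = synthetic_mu(2.0)  # stronger rhythm while the user is at rest
task = synthetic_mu(1.0)  # attenuated rhythm during motor imagery
change = relative_power_change(rest, task, FS)
print(round(change, 1))  # negative: the mu-band power dropped during the task
```

Halving the amplitude divides the band power by four, so the sketch reports a drop of about 75%. With real recordings the value would be estimated over many trials and would depend on the electrode, the band limits and the subject.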
9.2.2 Steady-State Visual Evoked Potentials
SSVEP-based BCIs rely on the resonance effect that is observed in the brain activity of the occipital and parietal lobes when a person observes a stimulus that flickers at a given frequency [33]. SSVEP BCIs permit selecting diverse commands by paying attention to a particular visual stimulus with specific characteristics, such as its frequency or phase, where each stimulus is associated with a specific control command [96]. As an example, Fig. 9.4 shows different stimuli on a screen that represent different commands and that flicker at different frequencies. This focus of attention elicits brain activity in the visual cortex with the same properties as the attended stimulus (i.e., its frequency). Then, the differences in the modulation of the brain activity at the frequencies that are known to be related to each stimulus are used to detect the command that the user wants to select. SSVEP BCIs are useful for applications that do not require selecting among several command options [48]. For example, they can be used to control a prosthesis or to activate a cleaning robot and other devices in a smart home [60, 82].
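The detection step described above, comparing the brain activity at the frequencies associated with each stimulus, can be sketched as follows. This is only an illustration under stated assumptions: the four frequencies are those shown in Fig. 9.4, but the command names, the sampling rate, the window length and the synthetic single-channel "EEG" (a 13 Hz sinusoid plus Gaussian noise) are hypothetical, and practical systems typically use more elaborate multi-channel methods.

```python
import math
import random

# Flicker frequencies from Fig. 9.4; the command names are hypothetical.
STIMULUS_HZ = {"up": 8.57, "right": 11.0, "down": 13.0, "left": 14.0}


def power_at(signal, fs, freq):
    """Normalized power of the signal at a single frequency (one-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2


def detect_command(signal, fs, stimuli=STIMULUS_HZ):
    """Return the command whose flicker frequency carries the most power."""
    scores = {cmd: power_at(signal, fs, f) for cmd, f in stimuli.items()}
    return max(scores, key=scores.get), scores


fs = 250                 # assumed sampling rate in Hz
rng = random.Random(0)   # fixed seed so the example is repeatable
# Synthetic occipital signal: the user attends the 13 Hz stimulus for 4 s.
eeg = [math.sin(2 * math.pi * 13.0 * t / fs) + rng.gauss(0.0, 1.0) for t in range(4 * fs)]

command, scores = detect_command(eeg, fs)
print(command)
```

Because each command is tied to one known frequency, the decision reduces to picking the largest of four scores, which is one reason this paradigm needs little or no user training.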
Fig. 9.4 Representation of a four-stimulus pattern on a screen for an SSVEP BCI
[Figure: four flickering stimuli labelled 8.57 Hz, 11 Hz, 13 Hz and 14 Hz]
9.2.3 P300 Component of Event Related Potentials
The P300 is a component of the event-related potential, which is a phase-locked response to a sensory, motor or cognitive stimulus that produces a fluctuation of the EEG [22]. In particular, the fluctuation that corresponds to the P300 is also known as the P3 wave. It is a positive peak that occurs about 300 ms after the presentation of a stimulus that is unpredictable or not fully predictable and that, therefore, attracts voluntary or involuntary attention once the event of interest is noticed. In BCI designs the P300 is elicited using an oddball paradigm. This paradigm consists of providing a random sequence of visual, auditory or vibro-tactile events that can be classified into two types of stimuli, one of which is delivered frequently (i.e., the standard or non-target) while the other is presented infrequently (i.e., the oddball or target) [28, 72]. The user is asked to ignore the non-target stimuli and to perform some mental response to the target stimuli, so the P300 wave is expected for the oddball stimuli. For visual P300 BCIs, a screen is used to display a set of elements that are spatially fixed. Commonly, this set is displayed as a matrix like the one shown in Fig. 9.5. Then, a classifier that detects the presence of the P300 is constructed based on the comparison between the brain activity that is elicited when target and non-target stimuli are presented. This makes it possible to detect when a subject is paying attention to a stimulus of interest. For example, in the case of Fig. 9.5 for a speller application, the stimulus that is considered the target (e.g., ‘C’) is indicated to the subject. One row or column flashes at a time and the subject should notice whether the target is highlighted. For this purpose, the user in some cases can be asked to perform a task associated with the target, like
Fig. 9.5 Example of a set of stimuli in a matrix array for a P300 BCI paradigm
A B C D E F
G H I J K L
M N O P Q R
S T U V W X
Y Z 1 2 3 4
5 6 7 8 9 0
silently counting how many times it flashes during a specific series of flashes [26]. This procedure is repeated with different targets in order to construct the classifier. After this calibration, users can select a stimulus of interest to provide a command based on the P300 detection. Among the most common applications of P300 BCIs are the control of home settings for people with disabilities, spellers that enable communication, and the control of wheelchairs by using the oddball paradigm to select commands in a control panel [22, 26].
BCIs are inherently related to smart healthcare, as both can involve wearable devices, mobile connectivity and artificial intelligence [85]. For instance, there are several wearable EEG devices that make it possible to monitor brain activity in a BCI [63]. Also, in order to interpret the EEG recordings the BCI must detect patterns in the brain activity, which can be done using machine learning or deep learning algorithms [2]. Even though BCIs offer several advantages in healthcare, they still have several limitations, both in their design and when they are taken to applications in daily life. One of the major limitations is the availability of data: EEG recording is time-consuming and it is desirable to carry out more than one session. A second limitation is fatigue during recording, since most BCIs require a lot of concentration from the subject to perform the mental task required to issue the control command. Another issue is inter- and intra-subject variability. It has been shown that different factors such as gender, mood and age, among others, influence the characteristics of the EEG signal, so it is necessary to consider this variability and identify the inherent characteristics of EEG signals that are related to the phenomenon being studied.
Additionally, further research is needed to test the efficacy of BCI systems for people with disabilities, considering the source of the disability and its severity within the design and the testing of the systems [69]. It is considered that this would allow the identification of the characteristics of the user, of the BCI design and of the way the BCI interacts with the user that positively or negatively affect the results obtained with the BCI and, thus, would help to standardize the implementation of BCIs so that they can be used broadly by their intended population. There are also ethical and legal limitations linked to the reliability of BCIs and to the privacy and confidentiality of the user’s information. Also, BCIs have usability issues during their implementation in the healthcare field for educational purposes, restoration and monitoring of physical functions, rehabilitation of neurological disorders and the control of environments or devices for mobility or communication [64].
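Returning to the P300 calibration described in Sect. 9.2.3, the comparison between the activity elicited by target and non-target stimuli can be illustrated by averaging epochs, which attenuates the background activity that is not phase-locked to the stimulus. The sketch below rests on several assumptions: the epochs are synthetic (Gaussian noise plus, for targets, a positive deflection around 300 ms), and the sampling rate, amplitudes, trial counts and analysis window are hypothetical values chosen only for illustration.

```python
import random

FS = 100            # assumed sampling rate in Hz
EPOCH_SAMPLES = 60  # 600 ms of signal after each stimulus


def synthetic_epoch(rng, is_target):
    """Hypothetical single-trial epoch: noise, plus a positive bump for targets."""
    epoch = [rng.gauss(0.0, 2.0) for _ in range(EPOCH_SAMPLES)]
    if is_target:
        for i in range(25, 36):  # roughly 250-350 ms after the stimulus
            epoch[i] += 5.0
    return epoch


def average_epochs(epochs):
    """Point-by-point mean across trials (the averaged ERP)."""
    n = len(epochs)
    return [sum(e[i] for e in epochs) / n for i in range(len(epochs[0]))]


def window_mean(erp, start_ms, end_ms, fs=FS):
    """Mean amplitude of the averaged ERP inside a latency window."""
    a, b = int(start_ms * fs / 1000), int(end_ms * fs / 1000)
    segment = erp[a:b]
    return sum(segment) / len(segment)


rng = random.Random(1)  # fixed seed so the example is repeatable
targets = [synthetic_epoch(rng, True) for _ in range(30)]        # infrequent stimuli
non_targets = [synthetic_epoch(rng, False) for _ in range(150)]  # frequent stimuli

erp_target = average_epochs(targets)
erp_non_target = average_epochs(non_targets)
difference = window_mean(erp_target, 250, 350) - window_mean(erp_non_target, 250, 350)
print(difference > 2.5)  # the averaged target response stands out in the window
```

In a real system the feature fed to the classifier would be richer than a single window mean, but the same target versus non-target contrast underlies the calibration.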
9.3 Role of Women in the Advances of Brain-Computer Interfaces Research
Although in recent years a great effort has been made to promote the participation of women in Science, Technology, Engineering and Mathematics (STEM), only a third
of the population in this field are women. Research in the BCI field is no exception. In order to increase the visibility and recognition of women in the BCI field and to promote the participation of more women, there is a need to share their contributions. For this reason, current contributions of women regarding BCIs within the context of smart health are described below. For easier reading, these contributions are classified by topics related to the applications of smart health described in [85]. In particular, the topics include assisting diagnosis and treatment, health management, virtual assistance, disease prevention and risk monitoring, and assistive technology. Other applications of smart health, such as assisting drug research and smart hospitals, were omitted due to the lack of enough studies or the overlap with other topics. Only EEG-based research work from the last ten years was considered, and priority was given to presenting projects that had women as first or last authors. It should be mentioned that, due to the richness of the contributions of women, only a brief overview is presented; it does not entirely represent the great extent of the work of women in BCI research related to smart health, especially in the topics that have a larger group of female researchers working on them.
9.3.1 Health Management
Making individual or group decisions in complex environments where decisions can affect the well-being of others, as in the case of a hospital, can result in adverse outcomes such as loss of life [11]. The likelihood of making a correct decision based on the context of the situation and prior knowledge is called confidence, and it is related to self-reports of confidence and to physiological measures, such as the slow ERPs of the EEG (e.g., P300) from the prefrontal and parietal brain regions [86]. Thus, it is thought that BCIs could help to detect and improve the level of confidence. However, these EEG correlations appear to be found only when using large data sets of decisions that, in addition, show high inter-subject and intra-subject variability. In order to avoid the need for extensive BCI calibration or re-calibration procedures due to the inherent variability of EEG, meta-learning transfer techniques have been used to allow a zero-training approach, in which the BCI is trained using past data from different subjects and a relatively small amount of data from the target user to allow personalization. In [86], participants attended three sessions on different days to perform a complex decision task while their electrooculogram (EOG) and the EEG from the frontal and parietal regions were acquired and they reported their confidence. This information was used to fit a meta-learning framework aimed at predicting the participants’ confidence based on the physiological data. Linear regression was used to obtain a mean regression model that was later fine-tuned using an optimized amount of data from the participant’s data set. The performance of this approach was compared with that of training the BCI individually and with that of a baseline algorithm that relied on an average value of random samples from the participant. Results showed that the meta-learning BCI had a significantly better performance and required smaller data sets.
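The idea of a mean regression model that is later fine-tuned with a small amount of data from the target user can be sketched as follows. This is not the procedure of [86]; it is a minimal illustration under stated assumptions: a single hypothetical feature, noiseless synthetic subjects, a mean model obtained by averaging per-subject least-squares fits, and fine-tuning by a few gradient descent steps (the fine-tuning rule, learning rate and number of steps are assumptions).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return w, mean_y - w * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (w, b) model on a data set."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fine_tune(model, xs, ys, lr=0.1, steps=200):
    """Personalize a model with gradient descent on a small calibration set."""
    w, b = model
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Hypothetical past subjects: each maps a feature to confidence with its own line.
feature_values = [0.0, 0.25, 0.5, 0.75, 1.0]
past_subjects = [(1.0, 0.0), (1.2, 0.1), (0.8, -0.1)]
fits = [fit_line(feature_values, [w * x + b for x in feature_values])
        for w, b in past_subjects]
mean_model = (sum(f[0] for f in fits) / len(fits), sum(f[1] for f in fits) / len(fits))

# Hypothetical target user, seen only through three calibration samples.
true_w, true_b = 1.5, 0.2
calib_x = [0.0, 0.5, 1.0]
calib_y = [true_w * x + true_b for x in calib_x]
test_x = [0.25, 0.75]
test_y = [true_w * x + true_b for x in test_x]

personal_model = fine_tune(mean_model, calib_x, calib_y)
error_before = mse(mean_model, test_x, test_y)
error_after = mse(personal_model, test_x, test_y)
print(error_after < error_before)
```

Starting from the mean model rather than from scratch is what allows the calibration set to stay small; with noisy physiological data the amount of fine-tuning data would itself need to be chosen carefully, as the text notes.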
Even though confidence tends to be correlated with the accuracy of decisions, there are situations where individuals can show low accuracy and low metacognitive accuracy, i.e., the degree to which confidence is accurate. In these cases, better decisions can be made in groups, but achieving an accurate consensus can still be challenging, considering that time can be critical and that the available information can be limited or affected by noise or biases. In [11], a hybrid collaborative BCI (cBCI) was proposed to increase the speed and accuracy of group decision making. cBCIs make decisions according to the outputs of personal classifiers that decode the individual decisions, such that the outputs that differ more from a threshold have a greater weight in the decision. This hybrid BCI uses not only EEG (ERPs) to estimate the confidence of the members of the group for each decision and to weight it to provide a group decision, but also other physiological and behavioural measurements, such as the self-reported confidence, the response time, and the eye movements and eye blinks preceding the decision. This system has been tested over several years, and recently it has been tested on complex realistic visual tasks such as searching for animals in a landscape, looking for planes in aerial pictures and detecting the presence of characters during a military patrol task. These activities are difficult for individual decision makers due to conditions such as poor lighting, the similarity between the object to be detected and the environment, the fast and random occurrence of the events that have to be detected, and pressure [11, 56, 90]. Results showed that the response time and the detected confidence were improved by this approach in the case of group decisions. Collaborative BCIs may still need to be tested in hospital environments and to continue to be perfected. 
However, their potential to accelerate and improve the outcome of critical decisions is clear, which makes them attractive for possible use in healthcare. Another main focus of research that can impact decision making in healthcare is the heterogeneity in the nomenclature regarding the condition of the patients and the characteristics of the studies. For example, patients who remain unresponsive after an accident that affects the brain can still have residual cognitive processes or awareness and, depending on their level of consciousness alteration, they could benefit from the use of BCIs for enabling communication. Hence, an assessment of the patient’s condition is required [51]. Recently, Schnakers et al. have emphasized the need to redefine the nomenclature used to specify the condition of the patient within studies regarding the state of consciousness in order to direct the effort of future studies, improve the clinical management of patients according to their conditions, and avoid providing confusing interpretations of studies and unrealistic expectations and risks to caregivers and stakeholders [80].
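The weighting rule of the collaborative BCI described earlier in this section, in which classifier outputs that differ more from a threshold have a greater weight in the group decision, can be illustrated with a small sketch. The scores, the threshold and the comparison with a simple majority vote are hypothetical; the actual system in [11] also combines other physiological and behavioural measurements that are not modelled here.

```python
def individual_decision(score, threshold=0.0):
    """Decision of one member: +1 if the classifier output exceeds the threshold."""
    return 1 if score > threshold else -1


def majority_decision(scores, threshold=0.0):
    """Unweighted vote: every member counts the same."""
    votes = sum(individual_decision(s, threshold) for s in scores)
    return 1 if votes > 0 else -1


def weighted_group_decision(scores, threshold=0.0):
    """Confidence-weighted vote: distance from the threshold acts as the weight."""
    total = sum(abs(s - threshold) * individual_decision(s, threshold) for s in scores)
    return 1 if total > 0 else -1


# Hypothetical outputs of three personal classifiers for the same decision:
# one confident "yes" and two hesitant "no" answers.
scores = [0.9, -0.2, -0.3]
print(majority_decision(scores), weighted_group_decision(scores))
```

In this example the majority vote follows the two hesitant members, whereas the weighted vote follows the single confident member, which is the behaviour that the weighting is meant to produce when confidence is informative.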
9.3.2 Virtual Assistance
In the case of the technologies that serve as virtual assistants, there is debate as to whether the technology in question should have a “brain” that makes it aware of the mood of the user [52]. Also, the interactions between the virtual assistants and
the users are expected to occur in a natural and conversational manner. This seems to be a major challenge for BCI systems. Furthermore, Peters et al. performed a survey with novice and expert BCI users who have disabilities to obtain feedback about BCI development [68]. The results revealed that there are still areas of opportunity for improving the reliability, ease of use, utility and support of BCIs. As mentioned above, there are still some issues with the robustness of BCIs that limit their potential for many applications [50, 64]. Also, increasing the complexity of the selections that a user can make with these systems can affect the accuracy of the system [59]. Nonetheless, there have been some advancements regarding this kind of function in the BCI field. A biometric authentication project that relied on the use of brain activity for the detection of the state of mind and emotions (love, pride, contempt, disgust, fear, sadness, pleasure, worry, affection, hope and indifference) achieved their detection in a stable manner. The related features of such patterns were subject dependent and, thus, authentication could be performed. After that, the project was continued to fit a model with inference algebra that made it possible to describe abstract emotions using different representations of the same emotion, e.g., when an emotion is displayed differently in a person during pathological and healthy conditions [87]. Then, a smart home system was proposed for people with severe impairments due to age or paralysis; it was based on a BCI system and a set of mobile sensors (agents) related to the vital signs of the person. As future work, this project will be continued and the sensors are intended to learn from the human user in order to understand the user’s needs and intentions with an approach similar to human reasoning, and to be provided with the capacity to communicate with the user. 
For this purpose, the model has been extended with representations of knowledge (using epistemic logic, logic of trust and belief logic) so that, in the future, the BCI can be used to communicate with and supervise the agents, establishing routines that adapt to personal needs and allowing flexibility in the prioritization of tasks based on what is learned from the subject [88, 89].
9.3.3 Disease Prevention and Risk Monitoring
The BCI studies related to this topic explore, through reviews and surveys, the relevant information needed to guide research and improve the reporting of research outcomes. As an example, possible biomarkers based on physiological measurements such as EEG have been reviewed in order to find those that could be integrated into a machine learning algorithm to support not only the diagnosis in brain injury cases, but also decision making for treatment and the prediction of the outcome of the intervention, which could improve clinical care and reduce long-term symptoms and risks [92]. However, this topic does not seem to be as developed as others.
9.3.4 Assisting Diagnosis and Treatment
The use of BCIs in healthcare has helped to improve the diagnosis and the effect of the treatment of different pathologies and impairments. However, motor rehabilitation is possibly the field that has been most widely explored. Mental practice of movement through motor imagery has been used to potentiate motor recovery after stroke [70]. The BCI can provide both the patient and the therapist with a way to train motor imagery and provide feedback. To evaluate the effect of motor imagery training on the brain in subacute post-stroke patients, Pichiorri et al. analyzed the correlation between clinical improvement and the effective brain connectivity estimated from EEG data for two cases: undergoing motor imagery training with a BCI for one month and going through the motor imagery training without a BCI [71]. Effective brain connectivity was estimated using the partial directed coherence (PDC), a method that describes the relationships of the EEG between each pair of channels [9], computed from 5-minute EEG recordings taken with the person at rest before and after the training. Clinical improvement was measured for the upper limb of the BCI group using the Fugl-Meyer scale. Results showed that motor imagery BCI training enhances the interhemispheric interactions of the sensorimotor rhythms, so the PDC could be useful for evaluating BCI training efficacy. In addition, a comparison of pre-BCI and post-BCI clinical and EEG conditions showed that the BCI group had a significantly greater probability of improving the Fugl-Meyer Assessment score and a greater involvement of the alpha and beta bands from sensorimotor regions of the ipsilesional hemisphere of the trained hand [70]. This means that it is plausible that the training led to a greater involvement of the damaged hemisphere in the desynchronization of the sensorimotor rhythms. 
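To make the notion of partial directed coherence more concrete, the sketch below evaluates it for a small two-channel autoregressive model. It follows the commonly cited definition, in which the PDC from channel j to channel i at a given frequency is the magnitude of entry (i, j) of the matrix I − Σ A_r e^(−i2πfr/fs), normalized by the norm of column j; this formula is not given in the chapter and is stated here as an assumption. The coefficients are hypothetical and hand-picked rather than estimated from EEG, which in practice would require fitting the model to the recordings first.

```python
import cmath
import math


def partial_directed_coherence(ar_coeffs, freq, fs):
    """PDC matrix of a multivariate AR model at one frequency.

    ar_coeffs: list of lag matrices A_1..A_p (nested lists, n x n) of the model
               x[t] = sum_r A_r x[t - r] + noise.
    Returns P where P[i][j] is the PDC from channel j to channel i.
    """
    n = len(ar_coeffs[0])
    # A_bar(f) = I - sum_r A_r * exp(-i * 2 * pi * f * r / fs)
    a_bar = [[(1.0 if i == j else 0.0) + 0j for j in range(n)] for i in range(n)]
    for lag, a_r in enumerate(ar_coeffs, start=1):
        phase = cmath.exp(-2j * math.pi * freq * lag / fs)
        for i in range(n):
            for j in range(n):
                a_bar[i][j] -= a_r[i][j] * phase
    pdc = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_norm = math.sqrt(sum(abs(a_bar[k][j]) ** 2 for k in range(n)))
        for i in range(n):
            pdc[i][j] = abs(a_bar[i][j]) / column_norm
    return pdc


# Hypothetical model of order 1: channel 0 influences channel 1, but not the reverse.
A1 = [[0.5, 0.0],
      [0.4, 0.5]]
pdc = partial_directed_coherence([A1], freq=10.0, fs=100.0)
print(round(pdc[1][0], 3), pdc[0][1])  # flow from 0 to 1 is non-zero, from 1 to 0 is zero
```

The asymmetry of the result is what makes the measure directional: swapping the roles of the two channels changes the value, unlike ordinary coherence.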
Similarly, Kostoglou and Müller-Putz explored the patterns of cortical connectivity in subjects with spinal cord injury (SCI) when attempting hand and arm movements, using multivariate autoregressive (AR) models and a method named directed coherence [44]. AR models represent the value of a signal based on past observations, and they provide a means to observe the degree of similarity between signals at different frequencies across time through coherence analysis, which can be used to estimate functional cortical connectivity. The results suggest that there are brain connectivity changes that occur before and after the movement and that such changes depend on the type of movement. Kostoglou et al. also developed a method to estimate cross-frequency coupling (CFC) that is based on linear parameter varying AR models [43]. The method was validated with synthetic and real EEG signals, and it can provide information that can be used both for classification and for a better understanding of coupling in the brain and its dynamics. In the future, this knowledge can be used to explore common CFC mechanisms between healthy people and people with SCI. Along the same line of supporting the treatment of patients, other works have evaluated the efficacy of BCIs applied to the motor rehabilitation of SCI or stroke patients. Among recent works is the analysis of cortical activation patterns through the intervention of a BCI during both motor intention and feedback (passive
movement), carried out by stroke patients [15]. Based on the results of 10 patients, activation in the region of the somatosensory cortex was observed during both conditions, with significant differences in α and β activations. Likewise, Dr. Cantillo has followed the evolution of patients in particular cases and has reported the positive effect of BCIs in rehabilitation therapy. This is the case of a man who had a stroke the same day he was diagnosed with COVID-19, whose upper limb recovery was followed up while using a BCI [13]. The observed results showed that the patient presented a notable improvement in the motor function of the upper limb. Another example is the case of a woman who presented postoperative right hemiparesis, dysphagia and speech difficulties after several tumor resections over twelve years. The patient participated in a BCI intervention for five weeks (30 sessions were performed). Before the BCI intervention, bilateral activation in the sensorimotor cortex was observed by fMRI when the patient performed motor imagery of her paralyzed hand; these activations became ipsilateral throughout the sessions. Based on the results obtained, it could be said that BCIs are complementary to conventional therapies to promote neurological recovery [14].
The use of BCIs has also been aimed at evaluating the effect of therapies other than motor rehabilitation, such as the study of the effect of acoustic therapy on tinnitus. Based on the state of the art of EEG signal analysis in the evaluation of tinnitus therapy [32], Alonso-Valerdi et al. established a method to select an appropriate therapy based on psycho-neurological effects [4]. With the participation of 71 Mexican volunteers (60 tinnitus sufferers and 11 controls), five therapies commonly used for the treatment of tinnitus were evaluated. One of the five therapies was applied to each participant for two months. 
To analyze the effect of therapy, six characteristics were defined: hearing loss, anxiety level, stress level, total amount of neural electrical response to acoustic therapy, assigned group, and EEG. The maximum neural response was obtained from the latter and, finally, a cluster analysis was undertaken. The authors suggest that the findings can serve as a guideline to select the appropriate therapy.
One of the major contributions of BCI research to patients may be that the analysis of EEG with BCIs can be useful for the assessment of non-responsive patients, who can have severe language and motor deficits but still have good cognitive abilities [53]. In particular, it has been reported that ERPs can be used for diagnosis and prognosis, but sometimes this may result in underdiagnosis of the level of consciousness. Hence, detecting consciousness remains a difficult task. Lugo et al. illustrated this with a case report of a patient with locked-in syndrome (LIS) [53], i.e., a person who is awake and conscious but cannot speak or produce limb or facial movements, although some residual movements can be present depending on the variety of LIS, and who can sometimes blink or move the eyes [28, 54]. The assessment of the patient followed a hierarchical evaluation using six auditory paradigms to elicit ERPs (MMN, N400 and P300) while EEG was acquired [53]. The recordings were preprocessed and an Independent Component Analysis (ICA) was carried out to separate the EEG into underlying components, which allowed the ocular activity and other artifacts to be removed. Then, the mean of the ERPs over the trials was obtained, and the results were assessed by visual inspection and statistical analysis. Results showed that the MMN was
not found, nor were the passive N400 and P3 (i.e., those elicited by just listening to the auditory stimuli), although the active P3 was observed (i.e., elicited by counting certain tones). Lugo et al. argue that these findings indicate that it is important to perform a complete assessment of non-responsive patients. This is relevant because, even though there are clinical criteria for the diagnosis of LIS, misdiagnosis still occurs [28]. To deal with this situation, BCIs could supplement the limited behavioral assessment scales for patients with LIS, considering that these tools rely on the use of motor abilities. Also, BCIs could help to establish a channel of communication or control of the environment. Heilinger et al. compared the accuracy of a vibrotactile P300 BCI between patients with LIS who were diagnosed with either stroke or amyotrophic lateral sclerosis (ALS) as the etiology of the LIS condition [28]. Two vibrotactile schemes were tested on the patients: one with one tactor on each wrist (VT2) and another in which an additional tactor, which acted as a distractor, was placed on the upper part of the back or neck (VT3). The subjects had to count the oddball stimuli that were presented on one wrist to elicit P300s during a run that included 480 stimuli. VT3 was also used for communication by counting the stimuli on one hand to answer “yes” and on the other hand to answer “no”. Each participant took part in one VT3 communication run that consisted of 10 questions and in at least one assessment run with VT2 and VT3. It was considered that a patient could communicate reliably if at least 7 out of 10 questions could be answered. Patients who had ALS performed better than the patients who had a stroke in both the VT2 (ALS: median accuracy of 98%, stroke: median accuracy of 32%) and VT3 (ALS: 8/9 patients could communicate, stroke: 0/6 patients could communicate) schemes. 
This shows that there are different pathological mechanisms among the patients, and such differences may allow the use of a BCI to support the diagnosis. Regarding the brain activity, it was observed that all patients showed a P300 complex during VT2, but none of the stroke patients showed it during VT3. Heilinger et al. hypothesized that the stroke patients could have increased fatigability, cognitive deficits or reduced tactile sensitivity due to a lesion of the sensory pathways. The differences between patients should be elucidated for a better understanding and assessment of LIS patients. This information could also be useful to adapt the BCI system to the needs of this kind of patient.
Regarding the adaptation to user needs, some BCIs, such as SSVEP systems, require users to focus their gaze on the stimuli that are presented on a screen, but some patients, such as those with ALS, may have concomitant visual or ocular impairments. Peters et al. compared the accuracy of a traditional SSVEP speller, a commercially available gaze tracker speller and a Shuffle SSVEP Speller with a modified gaze tracker for the BCI system [67], which was intended to enable adaptation to the user’s visual skills and deficits in order to make the system more comfortable and easy to use. Results from two participants showed that the Shuffle SSVEP Speller with the modified gaze tracker provided the highest accuracy. These results show the impact that considering individual characteristics can have.
Although all the applications reported above are very interesting, it is important not to neglect the need for basic research to improve knowledge of the neurorehabilitation
process and the diagnostic process. Garro et al. report a review of the state of the art of biomarkers used in neurorehabilitation [27]. Since translational rehabilitation research faces the challenge of developing feasible strategies to obtain an accurate evaluation, the proposal is to use robot-assisted interventions, which allow an objective evaluation of performance using different quantifiable parameters. The parameters are related to the kinematics, but they are also associated with the brain plasticity that is fostered by the therapy, so it is relevant to evaluate the different biomarkers related to robotic neurorehabilitation. The work points out the trends and perspectives on the use of different biomarkers. For example, one application could be to develop an integrated treatment of stroke-induced motor, cognitive and affect-related deficits, and another could be to create computational models of neurorehabilitation therapies. This type of research will provide insights to improve our understanding of prognosis and the recovery process, as well as potentially allow for improvements in therapies.
It should be considered that there is no universal BCI that works for all users, so there is interest in understanding the relation between personal traits, cognitive states and BCIs to ensure the successful use of these systems. This could improve technology acceptance, which is affected by factors such as the sense of agency, i.e., feeling in control when performing a task, which is also crucial for implementations with virtual reality. To improve these conditions, EEG could be used to monitor this kind of factor [34]. Recently, Jeunet et al. constructed a model of BCI technology acceptance for stroke rehabilitation procedures that considers factors such as the characteristics of the system, facilitating conditions and individual differences. 
It is expected that this model will help to improve BCI acceptance by adapting the systems to each patient profile [37]. As previously mentioned, the characteristics of the system, including the design of the feedback, can affect BCI outcomes. The user can be provided with feedback during the training stage with a motor imagery based BCI. This feedback consists of stimuli (e.g., visual, auditory or vibrotactile) that provide information about the cognitive task through the outcome of the BCI system. The most common feedback modality is visual, although its use may not be practical in all circumstances. Feedback can vary among systems and there is still discussion on the way it should be delivered to promote the best performance, since results seem to indicate that the feedback that provides the best performance varies among subjects, and different kinds of feedback also appear to affect differently the brain connectivity changes that occur after BCI use [7, 8].
A full review of the influence of neurofeedback on the self-regulation of the sensorimotor rhythm in training procedures for motor skill enhancement is presented in [38]. In this work, a synthesis of the background related to neurophysiology and neuroplasticity is made. This background results in the SMR being considered a reliable EEG target for improving motor skills through the application of neurofeedback. A review of the findings regarding SMR neurofeedback in patients and healthy individuals is also reported. Although SMR neurofeedback shows promise in improving motor skills, a clear relationship between neurofeedback therapies and motor skill improvement has not been defined due to a number of limitations.
9.3.5 Assistive Technology

Assistive technologies that can be controlled via a BCI include smart homes and neuroprostheses. The EU BackHome project aimed to develop BCIs as assistive technologies implemented in the homes of people with limited mobility, using user-centred design to provide health telemonitoring and home support. The system consists of two stations: a user station and a therapist station [58]. In the user station, a wireless EEG device and environmental home sensors are used as the BCI system, together with a screen and easy-to-use software to access different services related to daily living, including smart home control, web access and cognitive stimulation through cognitive rehabilitation tasks or Brain Painting [61]. The BCI system is controlled using a P300 paradigm in which a matrix of symbols is presented on the screen and the rows and columns of the matrix are highlighted in random order. The end-user attends to the symbol they want to select and counts each time it is highlighted, which elicits a P300 response that is identified by the system. The selection is associated with the control of services and actuators. The user station is connected to the therapist station, so the system provides remote feedback and information about the users and their nearby sensors to the therapist or another professional for follow-up and rehabilitation plans, the personalization of remote assistance and the initiation of support actions. The BCI of the BackHome project has been evaluated with a group of end-users and with therapists. Seven out of nine end-users could operate the system and were more or less satisfied with it [16, 58], although they recommended improving the software stability and the speed of selection. The therapists reported that the system would help in their practice, but that the interface still needs to be simplified and more cognitive rehabilitation tasks included. 
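The row/column selection logic of a P300 matrix paradigm, as described above, can be sketched as follows. This is only a minimal illustration under stated assumptions and not the BackHome implementation: the 3×3 matrix, the function names (`flash_sequence`, `simulate_scores`, `select_symbol`) and the simulated detector scores are all hypothetical. In a real system the scores would come from a classifier applied to the EEG epochs that follow each highlight.

```python
import random
from statistics import mean

# Hypothetical 3x3 symbol matrix (assumption for illustration only).
MATRIX = [
    ["A", "B", "C"],
    ["D", "E", "F"],
    ["G", "H", "I"],
]


def flash_sequence(n_rows, n_cols, rng):
    """Return every row and every column exactly once, in random order."""
    flashes = [("row", r) for r in range(n_rows)]
    flashes += [("col", c) for c in range(n_cols)]
    rng.shuffle(flashes)
    return flashes


def simulate_scores(target, n_rows, n_cols, n_repeats, rng, noise=0.5):
    """Toy stand-in for a P300 detector (an assumption, not real EEG).

    Highlights that contain the attended symbol receive a higher
    expected score than the others.
    """
    target_row, target_col = target
    scores = {}
    for _ in range(n_repeats):
        for kind, index in flash_sequence(n_rows, n_cols, rng):
            attended = (kind == "row" and index == target_row) or (
                kind == "col" and index == target_col
            )
            value = rng.gauss(1.0 if attended else 0.0, noise)
            scores.setdefault((kind, index), []).append(value)
    return scores


def select_symbol(matrix, scores):
    """Pick the symbol where the best-scoring row and column intersect."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    best_row = max(range(n_rows), key=lambda r: mean(scores[("row", r)]))
    best_col = max(range(n_cols), key=lambda c: mean(scores[("col", c)]))
    return matrix[best_row][best_col]


if __name__ == "__main__":
    rng = random.Random(0)
    # The simulated user attends to the symbol in row 1, column 2.
    scores = simulate_scores((1, 2), 3, 3, n_repeats=10, rng=rng)
    print(select_symbol(MATRIX, scores))
```

Averaging the scores over several repetitions of the highlight sequence is what makes the selection robust to single-trial noise, at the cost of a slower selection, which is consistent with the end-users' request for faster selection.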
Neuroprostheses that are activated with the help of a BCI should detect the user's intentions, take actions and provide feedback in a timely and accurate manner, so that the use of such devices feels natural to the user and promotes recovery. Movement-related cortical potentials (MRCPs) are slow fluctuations within the delta band that are associated with motor tasks, along with the typical mu band changes. They have short latencies with respect to the movement intention, and their magnitude and slope are related to movement parameters (e.g., speed, direction and trajectory), so their use could improve BCI detection. Pereira et al. investigated the differences between the EEG during goal-directed (reach-and-touch movement) and no-goal (aimless movement) tasks that shared the same kinematics [66]. They showed that the main differences between both kinds of tasks were found not only in the premotor (PM) and M1 areas, but also in the parietal cortex. They then included these differences in a classifier to improve the detection of movement intention. In order to control systems like the ones mentioned above, it is important to select an appropriate classifier that can detect the patterns in the brain activity that the BCI uses to generate a control command. This choice can also determine the amount of time that the user requires to adapt to the system. Wriessnegger and Barradas indicate that deep learning provides techniques for finding relevant signal features and models
that relate those features based solely on the brain activity data. This allows accurate results to be obtained with no training, but one disadvantage is the difficulty of interpreting the resulting models [31]. In some of Wriessnegger's recent research, deep learning and typical machine learning models were compared in the classification of fine hand movements [12]. In particular, MRCPs were used to build a convolutional neural network (CNN) model for the classification of self-paced fine hand movements (touch, grasp, palmar and lateral grasp), and its accuracy was compared with that of the shrinkage linear discriminant analysis (LDA) and random forest (RF) machine learning models. MRCPs are low-frequency EEG modulations within 0.3–3 Hz that are associated with the planning and execution of movement; they encode properties of the movement, such as the type of grasp, force level and speed, and they appear during actual movement execution, motor attempt and motor imagery. In discriminating among three classes (two movements and the rest condition), the CNN achieved a similar or better accuracy (up to 70 ± 11%) than the other models, where accuracy is the number of correct classifications relative to the total number of events to classify. The authors reported that these results, along with the relative simplicity of implementing the CNN compared with the machine learning models that were evaluated, make this deep learning model suitable for online BCI applications. Regarding basic research aimed at gaining knowledge and improving the development of BCIs, some studies assess the exploration of characteristics and the selection of appropriate parameters during the pre-processing stage in order to improve the classification of EEG signals. 
For example, different cutoff frequencies were evaluated to find those that improved the classification of EEG signals related to motor imagery [78]. This information could be useful for customizing the pre-processing stage for each subject. In the same study, different feature extraction techniques were also evaluated to compare their performance in the classification stage. In [79], the use of the fractal dimension and different cutoff frequencies was proposed to address the problem of intra- and inter-subject variability. This proposal is interesting because subject variability limits the generalization of BCI applications.
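The idea of customizing the pre-processing stage for each subject can be illustrated with a small search over candidate band-pass cutoffs, keeping the pair that yields the highest classification accuracy. The sketch below is an assumption-laden illustration and not the procedure of [78]: the helper names (`accuracy`, `candidate_bands`, `select_band`), the candidate cutoff values and the toy scoring function are hypothetical. In practice, `evaluate` would filter the subject's EEG with the given cutoffs, extract features and return a cross-validated accuracy.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of events classified correctly (correct / total)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def candidate_bands(low_cutoffs, high_cutoffs):
    """All (low, high) cutoff pairs in Hz with low < high."""
    return [(lo, hi) for lo, hi in product(low_cutoffs, high_cutoffs) if lo < hi]


def select_band(bands, evaluate):
    """Return the best band and the accuracy obtained for every band.

    `evaluate(low, high)` is a user-supplied callable (hypothetical here)
    that returns the classification accuracy for one subject.
    """
    results = {band: evaluate(*band) for band in bands}
    best = max(results, key=results.get)
    return best, results


def toy_evaluate(low, high):
    """Placeholder score that peaks at 8-30 Hz; NOT measured data."""
    return 0.9 - 0.01 * abs(low - 8) - 0.005 * abs(high - 30)


if __name__ == "__main__":
    bands = candidate_bands([4, 8, 12], [20, 30, 40])
    best, results = select_band(bands, toy_evaluate)
    print("selected cutoffs (Hz):", best)
    print("three-class example accuracy:",
          accuracy(["left", "right", "rest", "left"],
                   ["left", "right", "left", "left"]))
```

Running the search separately for each subject gives one cutoff pair per user, which is one simple way to account for inter-subject variability. To avoid optimistic estimates, the accuracy used for selection should come from data that are not reused for the final evaluation.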
9.4 Current Status and Perspectives for Women in BCIs in Smart Health

Broadly addressing the situation of women in research and innovation, it is important to say that although most universities, research institutions and funding bodies support gender parity, women make up only about one quarter of senior academics and people in decision-making positions [77]. This is not because women lack ambition, but is due to an overestimation of female representation in key working positions, as well as a workplace culture in which there are no role models for
women and they are treated differently from male colleagues. In the neuroscience field, there has been substantial progress in increasing the recognition of women, but the cultural shift still needs to continue, providing equal opportunities for sharing the intellectual accomplishments of women and supporting them in securing the resources to give continuity to their research [93]. To improve gender parity in the workplace, some actions can be taken [75]. Firstly, there should be programs that raise conscious awareness of gender bias, as this has been observed to improve equity in science, along with institutional policies against misconduct and sexual harassment, which appears to be the main reason why women leave science. Also, national funding agencies can discontinue their support for institutions that do not address complaints satisfactorily. This requires allocating resources to monitor the effectiveness of interventions and the progress over time, using large samples that are representative of the population under analysis. Secondly, training the peer reviewers who participate in the evaluation of grant funding has been found to be beneficial for the careers of women in the long term, as grant funding is a factor that is evaluated in hiring and promotion processes. In addition to the previous measures, it should be considered that the use of technology can discriminate against some groups of people, as in the case of digital job advertisements that are presented differently according to gender [17]. Moreover, conscious awareness should be promoted not only once people enter the workplace, but from the time they are young students, as both faculty members and students tend to underestimate the skills of their female counterparts [75]. For this purpose, anonymous evaluation, which hides the gender, may be a solution to achieve fairer evaluation. 
Finally, female members should be included in the committees in charge of hiring, promotion and tenure, to ensure that the documentation used to evaluate candidates is similar in content and length, that evaluation criteria are applied uniformly across candidates, and that inclusion and diversity are sought. Turning to the particularities of BCI research, there is a need for personalized healthcare and for understanding subject variability in relation to the designs and their outcomes. Female researchers could also help to identify and satisfy the healthcare needs of women that are currently unmet in both research and its implementations. Even though some real-life applications have been released, they are still under continuous improvement. Healthcare technologies are typically designed for a universal user that does not always meet the needs of women [55]. However, addressing these needs may require a wider use of the systems, since this kind of information seems to be gathered mainly through surveys. This will be particularly relevant in the future since, as the population ages, it is expected to become feminised and to stay mostly at home, creating a niche for the development of assistive and monitoring healthcare technology that involves end-users and their families, friends and caregivers, for sensory, cognitive and physical difficulties. In this sense, increasing the participation of women could enrich knowledge and increase scientific quality [21].
On the other hand, to exploit the different applications of BCIs, multidisciplinary participation will be important in future investigations to evaluate and address the different aspects involved in the design of a BCI. For example, users must practice to be able to use some BCIs, and BCI training does not necessarily take psychology recommendations into account. Also, the studies consist of only one or two training sessions. At present and in the near future, it is possible that women will contribute more to interdisciplinary research. A study of the networking of brokerage researchers reported that women assume more diverse work positions and that their working networks show a greater variety in terms of the professional affiliations of their collaborators [18] and their homophily (i.e., exchange occurs mainly through ties with the same sex), since women can benefit from inclusion in their networks. This results in access to non-redundant knowledge and richer perspectives due to the interaction with a wider range of organisational environments and professional communities. Also, this behavior can be related to gender expectations, such as the idea that women are better at interacting with people, while men stand out at working with things and, therefore, excel in the STEM fields. These expectations can lead to the integration of women in the available niche of interdisciplinary work, where they can disrupt popular practice and develop new opportunities. It should be noted that network diversity is also relevant because it seems to be crucial for promotion and funding achievement, as well as for gaining visibility. This shows that gender disparities can lead to positive effects that can help to overcome the difficulties. Although the previous observations can serve as a basis for expectations for women in the BCI area, further analysis in this research context should be performed. 
However, it should be mentioned that BCIs have emerged in an interdisciplinary field where computer science, electrical engineering, neuroscience, psychology and their applications converge [74]. Thus, the characteristics of the area could be beneficial for the development of female researchers in the near future. A good example of the effort to achieve interdisciplinary research is the project carried out by Camille Jeunet. She has worked on an initiative to gather an interdisciplinary consortium of BCI researchers from Europe and Japan to collect data from several participants over multiple training sessions [35]. This project is expected to strengthen the BCI community and to allow a coordinated effort to improve BCI innovations and the training procedures for these systems. Other female researchers participating in this project are Selina C. Wriessnegger, Silvia Kober, Raphaëlle N. Roy, Cornelia Kranczioch, Sonja Kleih, Andrea Kübler, Donatella Mattia, Patricia Figueiredo, Stefanie Enriquez-Geppert and Aleksandra Vuckovic. All these women are affiliated with European organizations in Austria, France, Germany, Italy, Portugal, the Netherlands and the United Kingdom. In the future, it is to be expected that other collaborative BCI research projects like the one led by Camille Jeunet [35] will be proposed and that other female researchers will be added, including from organizations in non-European countries. As can be seen, the participation of women in STEM, and particularly in BCI research, is not null; however, their participation should be encouraged so that, together with male researchers, knowledge is enriched and greater development and achievement are attained for the benefit of society.
References

1. Abiri, R., Borhani, S., Sellers, E.W., Jiang, Y., Zhao, X.: A comprehensive review of EEG-based brain-computer interface paradigms. J. Neural Eng. 16(1), 011001 (2019)
2. Aggarwal, S., Chugh, N.: Review of machine learning techniques for EEG based brain computer interface. Arch. Comput. Methods Eng. 1–20 (2022)
3. Alonso-Valerdi, L.M., Sepulveda, F.: Development of a simulated living-environment platform: design of BCI assistive software and modeling of a virtual dwelling place. Comput.-Aided Des. 54, 39–50 (2014)
4. Alonso-Valerdi, L.M., Torres-Torres, A.S., Corona-González, C.E., Ibarra-Zárate, D.I.: Clustering approach based on psychometrics and auditory event-related potentials to evaluate acoustic therapy effects. Biomed. Signal Process. Control 76, 103719 (2022)
5. Amaral, C., Mouga, S., Simões, M., Pereira, H.C., Bernardino, I., Quental, H., Playle, R., McNamara, R., Oliveira, G., Castelo-Branco, M.: A feasibility clinical trial to improve social attention in autistic spectrum disorder (ASD) using a brain computer interface. Front. Neurosci. 12(July) (2018)
6. Amiri, S., Fazel-Rezai, R., Asadpour, V.: A review of hybrid brain-computer interface systems. Adv. Hum.-Comput. Interact. 2013 (2013)
7. Angulo-Sherman, I.N., Gutiérrez, D.: Effect of different feedback modalities in the performance of brain-computer interfaces. In: 2014 International Conference on Electronics, Communications and Computers (CONIELECOMP), pp. 14–21. IEEE (2014)
8. Angulo-Sherman, I.N., Gutiérrez, D.: A link between the increase in electroencephalographic coherence and performance improvement in operating a brain-computer interface. Comput. Intell. Neurosci. 67–67, 2015 (2015)
9. Baccalá, L.A., Sameshima, K.: Partial directed coherence: a new concept in neural structure determination. Biol. Cybern. 84(6), 463–474 (2001)
10. Berger, L.M., Wood, G., Kober, S.E.: Effects of VR-based feedback on NF training performance—A sham-controlled study. Front. Hum. Neurosci. 523 (2022)
11. Bhattacharyya, S., Valeriani, D., Cinel, C., Citi, L., Poli, R.: Anytime collaborative brain-computer interfaces for enhancing perceptual group decision-making. Sci. Rep. 11(1), 17008 (2021)
12. Bressan, G., Cisotto, G., Müller-Putz, G.R., Wriessnegger, S.C.: Deep learning-based classification of fine hand movements from low frequency EEG. Fut. Internet 13(5), 103 (2021)
13. Carino-Escobar, R.I., Rodríguez-García, M.E., Ramirez-Nava, A.G., Quinzaños-Fresnedo, J., Ortega-Robles, E., Arias-Carrion, O., Valdés-Cristerna, R., Cantillo-Negrete, J.: A case report: upper limb recovery from stroke related to SARS-CoV-2 infection during an intervention with a brain-computer interface. Front. Neurol. 13 (2022)
14. Carino-Escobar, R.I., Rodriguez-Barragan, M.A., Carrillo-Mora, P., Cantillo-Negrete, J.: Brain-computer interface as complementary therapy for hemiparesis in an astrocytoma patient. Neurol. Sci. 43 (2022)
15. Castro-Aparicio, J.C., Carino-Escobar, R.I., Cantillo-Negrete, J.: Movement-related electroencephalography in stroke patients across a brain-computer interface-based intervention. In: Ribeiro, P.R.d.A., Cota, V.R., Barone, D.A.C., de Oliveira, A.C.M. (eds.) Computational Neuroscience, pp. 215–224. Springer International Publishing, Cham (2022)
16. Daly, J., Armstrong, E., Wriessnegger, S.C., Müller-Putz, G.R., Hintermüller, C., Thomson, E., Martin, S.: The evaluation of a brain computer interface system with acquired brain injury end users. In: 6th International Brain Computer Interface Conference, pp. 73–76. Graz University of Technology, Graz, Austria (2014)
17. Datta, A., Tschantz, M.C., Datta, A.: Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination (2014). arXiv:1408.6491
18. Díaz-Faes, A.A., Otero-Hermida, P., Ozman, M., D’este, P.: Do women in science form more diverse research networks than men? An analysis of Spanish biomedical scientists. PloS One 15(8), e0238229 (2020)
19. Dobkin, B.H.: Brain-computer interface technology as a tool to augment plasticity and outcomes for neurological rehabilitation. J. Physiol. 579(3), 637–642 (2007)
20. Enriquez-Geppert, S., Smit, D., Pimenta, M.G., Arns, M.: Neurofeedback as a treatment intervention in ADHD: current evidence and practice. Curr. Psychiatry Rep. 21(6) (2019)
21. Fatourou, P., Papageorgiou, Y., Petousi, V.: Women are needed in STEM: European policies and incentives. Commun. ACM 62(4), 52 (2019)
22. Fazel-Rezai, R., Allison, B.Z., Guger, C., Sellers, E.W., Kleih, S.C., Kübler, A.: P300 brain computer interface: current challenges and emerging trends. Front. Neuroeng. 14 (2012)
23. Foerster, Á., Rocha, S., Wiesiolek, C., Chagas, A.P., Machado, G., Silva, E., Fregni, F., Monte-Silva, K.: Site-specific effects of mental practice combined with transcranial direct current stimulation on motor learning. Euro. J. Neurosci. 37(5), 786–794 (2013)
24. Frontiers. Women in brain-computer interfaces
25. Galiotta, V., Quattrociocchi, I., D’Ippolito, M., Schettini, F., Aricò, P., Sdoia, S., Formisano, R., Cincotti, F., Mattia, D., Riccio, A.: EEG-based brain-computer interfaces for people with disorders of consciousness: features and applications. A systematic review. Front. Hum. Neurosci. 16 (2022)
26. Ganin, I.P., Shishkin, S.L., Kaplan, A.Y.: A P300-based brain-computer interface with stimuli on moving objects: four-session single-trial and triple-trial tests with a game-like task design. PloS One 8(10), e77755 (2013)
27. Garro, F., Chiappalone, M., Buccelli, S., De Michieli, L., Semprini, M.: Neuromechanical biomarkers for robotic neurorehabilitation. Front. Neurorobot. 15 (2021)
28. Heilinger, A., Ortner, R., La Bella, V., Lugo, Z.R., Chatelle, C., Laureys, S., Spataro, R., Guger, C.: Performance differences using a vibro-tactile P300 BCI in LIS-patients diagnosed with stroke and ALS. Front. Neurosci. 12, 514 (2018)
29. Hengzhi, L., Dong, W., Zhenhao, W., Yanhong, Z.: Advances in the extraction and classification of EEG dynamic features in patients with mild cognitive impairment. Chin. J. Biomed. Eng. 38(3), 348–354 (2019)
30. Holz, E.M., Höhne, J., Staiger-Sälzer, P., Tangermann, M., Kübler, A.: Brain-computer interface controlled gaming: evaluation of usability by severely motor restricted end-users. Artif. Intell. Med. 59(2), 111–120 (2013)
31. Iakovidis, D.K., Ooi, M., Kuang, Y.C., Demidenko, S., Shestakov, A., Sinitsin, V., Henry, M., Sciacchitano, A., Discetti, S., Donati, S., et al.: Roadmap on signal processing for next generation measurement systems. Meas. Sci. Technol. 33(1), 012002 (2021)
32. Ibarra-Zarate, D., Alonso-Valerdi, L.M.: Acoustic therapies for tinnitus: the basis and the electroencephalographic evaluation. Biomed. Signal Process. Control 59, 101900 (2020)
33. İşcan, Z., Nikulin, V.V.: Steady state visual evoked potential (SSVEP) based brain-computer interface (BCI) performance under different perturbations. PloS One 13(1), e0191673 (2018)
34. Jeunet, C., Albert, L., Argelaguet, F., Lécuyer, A.: “Do you feel in control?”: towards novel approaches to characterise, manipulate and measure the sense of agency in virtual environments. IEEE Trans. Vis. Comput. Graph. 24(4), 1486–1495 (2018)
35. Jeunet, C., Bernaroch, C., Cabestaing, F., Chavarriaga, R., Colamarino, E., Corsi, M.C., Coyle, D., Fallani, F.D.V., Enriquez-Geppert, S., Figueirédo, P., et al.: A user-centred approach to unlock the potential of non-invasive BCIs: an unprecedented international translational effort. In: CHIST-ERA conference 2020 (2020)
36. Jeunet, C., Debener, S., Lotte, F., Mattout, J., Scherer, R., Zich, C.: Mind the Traps! Design Guidelines for Rigorous BCI Experiments, pp. 613. CRC Press Taylor & Francis Group (2018)
37. Jeunet, C., Forge, K., Grevet, E., Amadieu, F., Py, J., Gasq, D.: Modelling the acceptance of BCI-based stroke rehabilitation procedures: Heading for efficiently personalised therapies. In: BCI meeting 2021 (2021)
38. Jeunet, C., Glize, B., McGonigal, A., Batail, J.-M., Micoulaud-Franchi, J.-A.: Using EEG-based brain computer interface and neurofeedback targeting sensorimotor rhythms to improve motor skills: theoretical background, applications and prospects. Neurophysiol. Clin. 49(2), 125–136 (2019) (Neurophysiol. Move.: From Prepar. Act.)
39. Jeunet, C., Hauw, D., Millán, J.d.R.: Sport psychology: technologies ahead. Front. Sports Act. Living 2, 10 (2020)
40. Jeunet, C., Jahanpour, E., Lotte, F.: Why standard brain-computer interface (BCI) training protocols should be changed: an experimental study. J. Neural Eng. 13(3), 036024 (2016)
41. Kabat-Zinn, J.: Homunculus. Mindfulness 9(6), 1974–1978 (2018)
42. Kong, A.H., Lai, M.M., Finnigan, S., Ware, R.S., Boyd, R.N., Colditz, P.B.: Background EEG features and prediction of cognitive outcomes in very preterm infants: a systematic review. Early Hum. Dev. 127, 74–84 (2018)
43. Kostoglou, K., Müller-Putz, G.R.: Using linear parameter varying autoregressive models to measure cross frequency couplings in EEG signals. Front. Hum. Neurosci. 16 (2022)
44. Kostoglou, K., Müller-Putz, G.: Directed connectivity analysis in people with spinal cord injury during attempted arm and hand movements. In: Müller-Putz, G., Baumgartner, C. (eds.) Proceedings Annual Meeting of the Austrian Society for Biomedical Engineering 2021, pp. 75–78. Verlag der Technischen Universität Graz (2021); Annual Meeting of the Austrian Society of the Biomedical Engineering 2021: ÖGBMT 2021, ÖGBMT 2021; Conference date: 30-09-2021 Through 01-10-2021
45. Kostoglou, K., Müller-Putz, G.R.: Using linear parameter varying autoregressive models to measure cross frequency couplings in EEG signals. Front. Hum. Neurosci. 663 (2022)
46. Kotte, S., Dabbakuti, J.R.K.K.: Methods for removal of artifacts from EEG signal: a review. J. Phys.: Conf. Ser. 1706(1), 012093 (2020)
47. Lim, C.G., Poh, X.W.W., Fung, S.S.D., Guan, C., Bautista, D., Cheung, Y.B., Zhang, H., Yeo, S.N., Krishnan, R., Lee, T.S.: A randomized controlled trial of a brain-computer interface based attention training program for ADHD. PloS One 14(5) (2019)
48. Liu, Y.-H., Wang, S.-H., Hu, M.-R.: A self-paced P300 healthcare brain-computer interface system with SSVEP-based switching control and kernel FDA+ SVM-based detector. Appl. Sci. 6(5), 142 (2016)
49. Lotte, F., Congedo, M.: EEG feature extraction. In: Brain–Computer Interfaces 1: Foundations and Methods, pp. 127–143 (2016)
50. Lotte, F., Larrue, F., Mühl, C.: Flaws in current human training protocols for spontaneous brain-computer interfaces: lessons learned from instructional design. Front. Hum. Neurosci. 7, 568 (2013)
51. Luauté, J., Morlet, D., Mattout, J.: BCI in patients with disorders of consciousness: clinical perspectives. Ann. Phys. Rehabil. Med. 58(1), 29–34 (2015)
52. Lugano, G.: Virtual assistants and self-driving cars. In: 2017 15th International Conference on ITS Telecommunications (ITST), pp. 1–5. IEEE (2017)
53. Lugo, Z., Lesenfants, D., Lehembre, R., Laureys, S., Noirhomme, Q.: Role of active ERP paradigms in awareness detection in non responsive patients. In: First International DECODER Workshop (2012)
54. Lulé, D., Zickler, C., Häcker, S., Bruno, M.-A., Demertzi, A., Pellas, F., Laureys, S., Kübler, A.: Life can be worth living in locked-in syndrome. Progr. Brain Res. 177, 339–351 (2009)
55. Lupton, D., Maslen, S.: How women use digital technologies for health: qualitative interview and focus group study. J. Med. Internet Res. 21(1), e11481 (2019)
56. Matran-Fernandez, A., Poli, R., Cinel, C.: Collaborative brain-computer interfaces for the automatic classification of images. In: 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 1096–1099. IEEE (2013)
57. Min, B.-K., Marzelli, M.J., Yoo, S.-S.: Neuroimaging-based approaches in the brain-computer interface. Trends Biotechnol. 28(11), 552–560 (2010)
58. Miralles, F., Vargiu, E., Dauwalder, S., Solà, M., Müller-Putz, G., Wriessnegger, S.C., Pinegger, A., Kübler, A., Halder, S., Käthner, I., et al.: Brain computer interface on track to home. Sci. World J. 2015 (2015)
59. Müller-Putz, G., Scherer, R., Brunner, C., Leeb, R., Pfurtscheller, G.: Better than random: a closer look on BCI results. Int. J. Bioelectromagn. 10(ARTICLE), 52–55 (2008)
60. Müller-Putz, G.R., Pfurtscheller, G.: Control of an electrical prosthesis with an SSVEP-based BCI. IEEE Trans. Biomed. Eng. 55(1), 361–364 (2007)
61. Münßinger, J.I., Halder, S., Kleih, S.C., Furdea, A., Raco, V., Hösle, A., Kübler, A.: Brain painting: first evaluation of a new brain-computer interface application with ALS-patients and healthy volunteers. Front. Neurosci. 4, 182 (2010)
62. Neuper, C., Scherer, R., Reiner, M., Pfurtscheller, G.: Imagery of motor actions: differential effects of kinesthetic and visual-motor mode of imagery in single-trial EEG. Cogn. Brain Res. 25(3), 668–677 (2005)
63. Nijboer, F., Van De Laar, B., Gerritsen, S., Nijholt, A., Poel, M.: Usability of three electroencephalogram headsets for brain-computer interfaces: a within subject comparison. Interact. Comput. 27(5), 500–511 (2015)
64. Olaronke, I., Rhoda, I., Gambo, I., Oluwaseun, O., Janet, O.: Prospects and problems of brain computer interface in healthcare. Curr. J. Appl. Sci. Technol. 23, 1–7 (2018)
65. Pan, J., Xiao, J., Wang, J., Wang, F., Li, J., Qiu, L., Di, H., Li, Y.: Brain-computer interfaces for awareness detection, auxiliary diagnosis, prognosis, and rehabilitation in patients with disorders of consciousness. Semin. Neurol. 42(03), 363–374 (2022)
66. Pereira, J., Ofner, P., Schwarz, A., Sburlea, A.I., Müller-Putz, G.R.: EEG neural correlates of goal-directed movement intention. Neuroimage 149, 129–140 (2017)
67. Peters, B., Bedrick, S., Dudy, S., Eddy, B., Higger, M., Kinsella, M., McLaughlin, D., Memmott, T., Oken, B., Quivira, F., et al.: SSVEP BCI and eye tracking use by individuals with late-stage ALS and visual impairments. Front. Hum. Neurosci. 14, 595890 (2020)
68. Peters, B., Bieker, G., Heckman, S.M., Huggins, J.E., Wolf, C., Zeitlin, D., Fried-Oken, M.: Brain-computer interface users speak up: the virtual users’ forum at the 2013 international brain-computer interface meeting. Arch. Phys. Med. Rehabil. 96(3), S33–S37 (2015)
69. Peters, B., Eddy, B., Galvin-McLaughlin, D., Betz, G., Oken, B., Fried-Oken, M.: A systematic review of research on augmentative and alternative communication brain-computer interface systems for individuals with disabilities. Front. Hum. Neurosci. 16 (2022)
70. Pichiorri, F., Morone, G., Petti, M., Toppi, J., Pisotta, I., Molinari, M., Paolucci, S., Inghilleri, M., Astolfi, L., Cincotti, F., et al.: Brain-computer interface boosts motor imagery practice during stroke recovery. Ann. Neurol. 77(5), 851–865 (2015)
71. Pichiorri, F., Petti, M., Morone, G., Molinari, M., Astolfi, L., Cincotti, F., Inghilleri, M., Mattia, D.: 9. Brain network modulation following motor imagery BCI-assisted training after stroke. Clin. Neurophysiol. 126(1), e3 (2015)
72. Polich, J., Margala, C.: P300 and probability: comparison of oddball and single-stimulus paradigms. Int. J. Psychophysiol. 25(2), 169–176 (1997)
73. Ramadan, R.A., Refat, S., Elshahed, M.A., Ali, R.A.: Basics of brain computer interface: a review. In: Hassanien, A.E., Azar, A.T. (eds.) Brain–Computer Interfaces, Intelligent Systems Reference Library, pp. 3–30. Springer Cham (2015)
74. Rodriguez, J.D.: Simplification of EEG Signal Extraction, Processing, and Classification Using a Consumer-Grade Headset to Facilitate Student Engagement in BCI Research. The University of Texas Rio Grande Valley, Ann Arbor (2018)
75. Roper, R.L.: Does gender bias still affect women in science? Microbiol. Mol. Biol. Rev. 83(3), e00018-19 (2019)
76. Rubin, T.N., Koyejo, O., Gorgolewski, K.J., Jones, M.N., Poldrack, R.A., Yarkoni, T.: Decoding brain activity using a large-scale probabilistic functional-anatomical atlas of human cognition. Plos Comput. Biol. 13(10), e1005649 (2017)
77. Ryan, M.: To advance equality for women, use the evidence. Nature 604, 403 (2022)
78. Salazar-Varas, R., Vazquez, R.A.: Evaluating the effect of the cutoff frequencies during the pre-processing stage of motor imagery EEG signals classification. Biomed. Signal Process. Control 54, 101592 (2019)
79. Salazar-Varas, R., Vazquez, R.A.: Facing high EEG signals variability during classification using fractal dimension and different cutoff frequencies. Comput. Intell. Neurosci. 2019 (2019)
80. Schnakers, C., Bauer, C., Formisano, R., Noé, E., Llorens, R., Lejeune, N., Farisco, M., Teixeira, L., Morrissey, A.-M., De Marco, S., et al.: What names for covert awareness? A systematic review. Front. Hum. Neurosci. 16 (2022)
81. Seeck, M., Koessler, L., Bast, T., Leijten, F., Michel, C., Baumgartner, C., He, B., Beniczky, S.: The standardized EEG electrode array of the IFCN. Clin. Neurophysiol. 128(10), 2070–2077 (2017)
82. Shao, L., Zhang, L., Belkacem, A.N., Zhang, Y., Chen, X., Li, J., Liu, H., et al.: EEG-controlled wall-crawling cleaning robot using SSVEP-based brain-computer interface. J. Healthc. Eng. 2020 (2020)
83. Sniderman, D.: Girls coming to tech: bix reaches out with tales of the history of women and technology. IEEE Women Eng. Mag. 9(1), 18–20 (2015)
84. Somai, R., Riahi, M., Moussa, F.: ALS recommendation system for BCI user experience evaluation. In: 17th International Conference on Modeling Decisions for Artificial Intelligence (2020)
85. Tian, S., Yang, W., Le Grange, J.M., Wang, P., Huang, W., Ye, Z.: Smart healthcare: making medical care more intelligent. Glob. Health J. 3(3), 62–65 (2019)
86. Tremmel, C., Fernandez-Vargas, J., Stamos, D., Cinel, C., Pontil, M., Citi, L., Poli, R.: A meta-learning BCI for estimating decision confidence. J. Neural Eng. 19(4), 046009 (2022)
87. Tulceanu, V.: The emotion of action: Where logic, algebra and BCI meet. In: Cognitive Sciences—An Interdisciplinary Approach, p. 259 (2015)
88. Tulceanu, V.: A matter of trust: smart home system relying on logic, BCI, and sensor agents. In: 2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 177–180. IEEE (2015)
89. Tulceanu, V., Luca, M.: Brain-computer interfacing for interaction in ad-hoc heterogeneous sensor agents groups. In: JJAP Conference Proceedings 14th International Conference on Global Research and Education, Inter-Academia 2015, pp. 011608–011608. The Japan Society of Applied Physics (2016)
90. Valeriani, D., Poli, R., Cinel, C.: A collaborative brain-computer interface for improving group detection of visual targets in complex natural environments. In: 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 25–28. IEEE (2015)
91. Wang, J., Wang, M.: Review of the emotional feature extraction and classification using EEG signals. Cogn. Robot. 1, 29–40 (2021)
92. Wilde, E.A., Wanner, I.-B., Kenney, K., Gill, J., Stone, J.R., Disner, S., Schnakers, C., Meyer, R., Prager, E.M., Haas, M., et al.: A framework to advance biomarker development in the diagnosis, outcome prediction, and treatment of traumatic brain injury. J. Neurotrauma 39(7–8), 436–457 (2022)
93. Yevoo, P.E., Maffei, A.: Women in neuroscience: four women’s contributions to science and society. Front. Integr. Neurosci. 15, 68 (2022)
94. Yuan, H., He, B.: Brain-computer interfaces using sensorimotor rhythms: current state and future perspectives. IEEE Trans. Biomed. Eng. 61(5), 1425–1435 (2014)
95. Zhao, Q.B., Zhang, L.Q., Cichocki, A.: EEG-based asynchronous BCI control of a car in 3D virtual reality environments. Chin. Sci. Bull. 54(1), 78–87 (2009)
96. Zhu, D., Bieger, J., Garcia Molina, G., Aarts, R.M.: A survey of stimulation methods used in SSVEP-based BCIs. Comput. Intell. Neurosci. 2010 (2010)
Irma Nayeli Angulo-Sherman received the B.Sc. degree in biomedical engineering from the Universidad de Guadalajara, Mexico, in 2011, the M.Sc. and Ph.D. degrees in biomedical engineering and physics from Centro de Investigación y Estudios Avanzados (CINVESTAV) Monterrey, Mexico, in 2013 and 2017, respectively, and the Ph.D. degree in industrial technologies and telecommunications from the Universidad Miguel Hernández de Elche, Spain, in 2018. In 2017, she joined the Universidad de Monterrey, Mexico, where she is an Assistant Professor at the department of Ciencias Aliadas de la Salud. Her current research interests include biomedical signal analysis and brain-computer interfaces. Rocio Salazar-Varas received the B.Sc. degree in bionic engineering from the Interdisciplinary Professional Unit, Engineering and Advanced Technologies, National Polytechnic Institute, Mexico City, Mexico, in 2009, the M.Sc. and Ph.D. degrees in biomedical engineering and physics from the Center for Research and Advanced Studies (Cinvestav) Monterrey, Apodaca, Mexico, in 2011 and 2015, respectively, and the Ph.D. degree in industrial technologies and telecommunications from Miguel Hernández University, Elche, Spain, through a dual doctoral agreement with Cinvestav Monterrey. She spent a year abroad as a visiting student with the Brain-Machine Interface Systems Laboratory, Miguel Hernández University. From 2015 to 2016, she held a Postdoctoral position with the Biomedical Signal Processing Laboratory, Cinvestav Monterrey. In 2017, she joined the University of the Americas Puebla, Cholula, Mexico, where she is an Associate Professor at the Department of Computing, Electronics, and Mechatronics. Her current research interests include biomedical signal processing and feature extraction and classification of EEG signals oriented to brain-computer interface applications. Dr. 
Salazar-Varas was appointed National Researcher by the Mexican Council for Science and Technology (Conacyt) in 2017, and she currently holds its level-I fellowship.
Chapter 10
Remote Faculty Development Programs for Simulation Educators: Tips to Overcome Barriers
Sayaka Oikawa, Maki Someya, Machiko Yagi, and Benjamin W. Berg
Abstract Simulation-based education (SBE) is a practical teaching strategy that is widely used across the healthcare professions. This methodology, underpinned by educational theories including experiential learning, adult learning theory, and reflective practice, has emerged as a major teaching format in healthcare professional education in conjunction with society’s increasing focus on patient safety. Because the pedagogical characteristics of SBE differ from those of traditional didactic methods, SBE requires specific facilitator skills. Faculty development offerings provide training opportunities for simulation facilitators and incorporate basic principles of learning in simulation: developing learning objectives, designing simulation scenarios, illustrating plans for facilitation and debriefing, and creating assessment processes. Because SBE is based on experiential learning, it is desirable for participants in faculty development programs to learn through hands-on experience and to deepen their understanding in a problem-solving manner through peer discussion. After the start of the COVID-19 pandemic, conducting faculty development in a face-to-face, hands-on manner became difficult. At the same time, an increasing need for SBE among healthcare providers and students emerged and was emphasized by the pandemic. To meet this increasing need, multiple remote faculty development programs for simulation educators have been employed using digital technologies, such as e-learning, as well as synchronous and asynchronous individual and group learning. In this chapter, we report our domestic and international practices, focusing on barriers to conducting remote simulation faculty development programs, as well as considerations regarding technology, participants, facilitators, and content for successful remote programs.

Keywords Simulation-based education (SBE) · Faculty development · Tele simulation · Experiential learning

S. Oikawa (B) Department of Innovative and Digitalized Medical Education, Akita University Graduate School of Medicine, 1-1-1 Hondo, Akita 0108543, Japan e-mail: [email protected]
M. Someya Integrated Clinical Education Center/Clinical Simulation Center, Kyoto University Hospital, 54 Kawaharacho, Shogoin Sakyo-Ku, Kyoto 6068507, Japan
M. Yagi School of Nursing, Jichi Medical University, 3311-159 Yakushiji, Shimotsuke, Tochigi 3290498, Japan
B. W. Berg John A. Burns School of Medicine, SimTiki Simulation Center, University of Hawaii at Manoa, 651 Ilalo St, MEB 212, Honolulu, HI 96813, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 H. Kwaśnicka et al. (eds.), Advances in Smart Healthcare Paradigms and Applications, Intelligent Systems Reference Library 244, https://doi.org/10.1007/978-3-031-37306-0_10
10.1 Introduction

Simulation-based education (SBE) is a practical learning method in which learners practice simulated clinical scenarios within a realistic environment and learn target clinical skills through guided reflection, or “debriefing” [1] (Fig. 10.1). SBE is an effective educational format underpinned by various educational theories: experiential learning theory [2], adult learning theory [3], reflective practice [4], and situated cognition [5]. SBE is a unique educational method that allows learners to practice simulation scenarios repeatedly, leading to thorough mastery of the target clinical skills [6]. This teaching methodology has expanded in undergraduate, postgraduate, and continuing medical education as a result of its established effectiveness for both technical and non-technical skills of healthcare professionals [7, 8].

Coupled with the increasing awareness of patient safety, opportunities for students in healthcare professions to practice in real-life clinical settings have been limited and avoided due to risk and ethical issues [9]. In addition, a patient questionnaire showed that patients were more accepting of medical students performing clinical procedures if the students had mastered the skill in simulation [10]. In line with this change in the mindset of those who receive medical care, a larger number of healthcare profession faculty are urged to adopt this methodology, to help students acquire basic clinical skills through training using SBE, and to prepare them before bedside learning. Thus, the need to understand the basic principles of SBE has increased, and more teaching staff are expected to be capable of employing this teaching method. There is evidence to support embedding SBE in the existing curriculum, in addition to the existing educational formats, including didactic lectures, community-based clinical experiences, and problem-based learning [6].
However, the skills required of SBE facilitators are different from those of traditional didactic lecturers. Facilitators engaged in SBE, a practical learner-centered teaching method, should have a well-developed competency in operating scenarios, allowing learners to reflect on their simulation experiences via probing with appropriate questions, and via guiding focused and structured debriefing discussions. Additional required SBE facilitator skills to be mastered include: designing scenarios that allow learners to achieve learning objectives; selecting teaching materials and simulation modalities that support learner immersion in scenarios; developing and sustaining a psychologically safe learning environment; planning strategies for effective feedback and
debriefing; and developing assessment tools for the learners [11]. Thus, as SBE facilitator skills have become better delineated and defined, the need for faculty development for facilitators of SBE has grown.

Fig. 10.1 Image of simulation-based education (SBE). The learners make decisions by judging the physical condition of the manikin, and perform the appropriate clinical maneuvers. The facilitator runs the scenario and controls the manikin. The actor is a teaching-team member who joins the simulation scenario acting in a specific role which is necessary for enhancing learning and supporting the flow of the scenario design. (In the scenario in the image, the actor is playing the role of a senior physician who was called by the learners, who are playing the roles of resident doctors.)
10.2 Faculty Development for Facilitators of Simulation-Based Education (SBE)

Faculty development for facilitators of SBE aims to improve the teaching and assessment skills of those who engage in SBE. It occurs in different types of programs: longitudinal simulation fellowships, advanced degree programs in healthcare professions education, short-term workshops, and informal sessions in local settings [12–15]. In simulation fellowship programs, participants spend variable amounts of time teaching with the use of simulation, either as a stand-alone fellowship or as an integrated element of their own clinical training program. Ahmed et al. described six core domains that are taught in many fellowship programs: simulation curriculum development, simulation operations, research, debriefing, educational theory, and administration [12].

Academic societies publish guidelines regarding the specific skills and knowledge required of facilitators of SBE [16]. One example is the Certified Healthcare Simulation Educator® (CHSE) certification blueprint by the Society for Simulation in Healthcare (SSH) [17]. SSH offers official certifications for simulation facilitators to increase the delivery of high-quality SBE in a growing number of simulation centers. The broad areas covered by the exam for certified simulation facilitators are: professional values and capabilities; knowledge of simulation principles, practice, and methodologies; educating and assessing learners; managing simulation resources; and engaging in scholarly activities as they pertain to simulation education [18]. The core competencies required of simulation facilitators, including these elements, were mostly developed by experienced simulation facilitators through their own lenses. Because of this, it is believed that the content of faculty development programs should be designed by instructors who are experienced simulation facilitators, in order to enhance program outcomes.
However, a recent study emphasized that the topics for faculty development of SBE should reflect learners’ perceptions as well as those of instructors [19]. Learners expect simulation facilitators to provide deeper contexts for simulation tasks, and to set up a positive learning environment where learners feel comfortable enough to ask questions and make mistakes in front of their peers during simulations. To achieve these goals, faculty development for SBE should not only enhance the core competencies of simulation facilitators, but also provide basic knowledge of medical education, including communication with and feedback to their learners. Thus, instructors of faculty development programs are encouraged to design content from a broader perspective. In this chapter, people related to SBE and faculty development for SBE are referred to by the roles shown in Table 10.1.

Table 10.1 Description of roles

| Role | Description |
| --- | --- |
| Simulation facilitator | Healthcare professional who engages in simulation-based education (SBE) |
| Learner | All levels of healthcare professional (e.g. medical student, resident, and nurse) who are taught by simulation facilitators |
| Participant | Simulation facilitator who is learning in a faculty development program for SBE |
| Instructor | Expert simulation facilitator who is instructing in a faculty development program for SBE |
| Remote participant/instructor | Participant/instructor who joins faculty development for SBE via a video conference system |
| On-site participant/instructor | Participant/instructor who joins faculty development for SBE in the venue where the session is conducted |

In general, outcomes of faculty development programs for medical educators reported by participants include several benefits of training: improved career satisfaction, an easier path to academic promotion, and improved quality of medical education [8]. Similarly, in faculty development programs for SBE, participants reported that they have established interprofessional collaborations effectively through the
creation of simulation scenarios [20], and have deepened their understanding of the concept of patient safety [21] as beneficial outcomes of the programs. Several characteristics have been acknowledged as effective elements of faculty development for SBE: participation of well-experienced instructors who are capable of providing participants with appropriate orientation [22]; opportunities for participants to give and receive peer feedback, cultivating self-reflection [23]; and practical learning with scenarios created with frameworks underpinned by educational theories [24]. In order to provide effective programs, instructors of faculty development should develop curricula with these elements in mind.

Regardless of participant discipline, in-person collegial exchange of ideas among participants and instructors is known to lead to deep learning [25, 26]. Guided practice with hands-on application enables participants to engage with content and enhances learning, as well as helping them to prepare for independent practice [27]. For these reasons, in-person faculty development for SBE, in which participants and instructors can exchange ideas and experience hands-on skill application in a safe learning environment, is optimal. Since simulation facilitator skill requirements are wide-ranging, and the level of experience of individual participants is not uniform, faculty development should include multiple teaching elements utilizing an individualized, tailored approach. Interactive learning experience, expert feedback, and hands-on practice are emphasized as important teaching elements, alongside didactic lectures and observation [28]. This suggests that effective learning depends on participants’ active engagement during faculty development programs. Hands-on practice is a style of learning that transforms participants’ unconscious incompetence to conscious incompetence [29].
This principle is adequately implemented in faculty development for SBE as a form of “simulated teaching” in which participants demonstrate SBE to each other. In this simulated teaching, participants play roles of simulation facilitators or learners.
By demonstrating teaching, participants discover deficits in their teaching skills and ways to improve them. Likewise, by demonstrating learning, participants come to understand the stress experienced by learners who are observed by peers during SBE. For these reasons, demonstrating teaching in faculty development courses is essential to cultivate participants’ self-awareness. Furthermore, discussion and reflection with all participants and instructors following simulated teaching supports multifaceted and significant self-reflection [12, 14, 30, 31].
10.3 Transforming Faculty Development for SBE During and After the COVID-19 Pandemic

The COVID-19 pandemic complicated and diminished opportunities for face-to-face delivery of professional development. Imposed restrictions resulted in modified formats for faculty development, including online and/or hybrid designs. In online/hybrid delivery, some or all of the participants and instructors attend the sessions remotely via a video conference system (e.g. Zoom®, Cisco Webex®, Google Meet®). In this chapter, we define the different types of faculty development formats as shown in Table 10.2.

Table 10.2 Formats of faculty development sessions for SBE

| Session format | Description |
| --- | --- |
| Face-to-face format | All participants and instructors engage in faculty development sessions for SBE at a single venue that does not rely on remote participant connectivity. Simulation devices and technologies (e.g. manikins, skill trainers, and medical equipment) are usually deployed at the venue |
| Hybrid format | Some participants and/or instructors join faculty development sessions for SBE face-to-face at the venue, while other participants and instructors join remotely. Simulation devices and technologies (e.g. manikins, skill trainers, and medical equipment) are usually deployed at the venue, but not deployed at remote participant locations |
| Online format | All participants and instructors join faculty development sessions for SBE individually, from unique locations |
| Synchronous | Sessions occur live through a video conference system at pre-specified dates and times |
| Asynchronous | Sessions are available on-demand. Digital content includes video, audio, lecture, flat-screen interactive formats, etc. |

Interactive collaboration with peers is paramount in faculty development for SBE. Novel approaches using digital technology include reported remote debriefing training with virtual or augmented reality [15], and an intensive debriefing workshop in an online format [13]. An accelerated shift to online/hybrid formats of faculty
development for SBE is therefore evident, and instructors have been faced with practical challenges for creating interactive spaces for remote participants. Previous studies have reported several unfavorable effects of online/hybrid faculty development, such as an increased sense of participant isolation, and extended time requirements for solving technical problems compared to the face-to-face format. Furthermore, the online/hybrid format does not allow remote participants to engage in discussion, and puts a significant amount of stress on participants with less knowledge of, or experience with, digital technology [32]. The online/hybrid format limits the core interactive practical curriculum components, which are specifically key for SBE faculty development. For example, participants’ creation of simulation scenarios is limited due to environmental constraints (e.g. participants have never created simulation scenarios for remote learners), or, participants’ facilitation of remote participants in demonstrating “simulated teaching” is limited due to technical factors (e.g. remote participants are unable to understand or hear other participants’ voices clearly). Despite these difficulties, online/hybrid faculty development for SBE is expected to continue to expand due to convenient scheduling and accessibility for participants located in rural areas or at a distance from an academic center where a faculty development program is offered in the face-to-face format. As the pandemic subsides, a transition or reversion from online/hybrid to face-to-face format is inevitable, and a revised format style with solutions for the noted remote faculty development format challenges may emerge. In accordance with uncertain societal change, unexpected disaster, and unstable global situations, instructors should be prepared for several format options to flexibly manage unanticipated schedule changes and other contingencies in a fluid fashion. 
In addition, further consideration is required for managing unfavorable effects associated with online or hybrid faculty development programs for SBE. In our experience, both online and hybrid formats introduce multiple barriers compared to the face-to-face format; however, careful and detailed reflection on each barrier leads to possible solutions. For these reasons, in this chapter we present our reflections on our experiences with novel online/hybrid faculty development for SBE, citing evidence-supported solutions.
10.4 Online/Hybrid Faculty Development for SBE: Barriers and Challenges

In our collective experience with the transition from an established face-to-face faculty development program for SBE to an online/hybrid format, we encountered multiple barriers and challenges. Four principal categories emerged: technology, participants, facilitators, and content, each with multiple sub-categories (Table 10.3). We introduce each of them by describing details of the two faculty development programs in which the authors are involved, along with evidence from previous research.
Table 10.3 Barriers in online/hybrid faculty development for SBE

| Category | Sub-category | Example of problems | Example of solutions |
| --- | --- | --- | --- |
| Technology | Audio visual problems | Remote participants have difficulty in listening to the discussion of on-site participants | Use a conference microphone in the venue |
| Technology | Individual knowledge of technology | Participants with less knowledge of technology have disadvantages in participating | A technician joins the session and manages the participant’s technical troubles remotely |
| Technology | Trouble shooting | Unclear who is responsible in case of problems | Discuss the risks of technical failure, and reach a consensus among instructors on planning for failure |
| Technology | Time management | Program takes longer due to technical troubles | Plan a schedule with enough spare time for technology failure |
| Participants | Collegiality | Participants are unable to join the on-site discussion | Allow remote participants to use the chat function of the system, and have an instructor pay attention to the questions during the discussion |
| Participants | Team dynamics | Participants have difficulty in getting along with other team members | Provide a pre-course web-based format (e.g. learning management system: LMS) for peer introduction so that they can get to know each other |
| Participants | Interactions with instructors | Participants hesitate to ask questions during the session | Instructors provide several options for the participants to ask questions (e.g. LMS, chat function, and personal communication via e-mail) |
| Instructors | Communication within instructors | Remote instructors have difficulty in communicating during the session | Conduct pre-session meetings and preplan possible troubleshooting, and use additional communication tools as a back-up during the session |
| Instructors | Feedback to the participants | Instructors are unable to give appropriate feedback to the participants’ scenario demonstration due to limited audio visual accessibility | Implement the co-facilitation technique |
| Instructors | Experience of remote facilitation | Instructors who have less experience at remote facilitation are unable to manage the remote group discussions | Assign more than one instructor to the group work if the instructor is less experienced |
| Content | Group work | Group work requires longer time for remote discussion | Spend more time on group work than in a face-to-face faculty development session |
| Content | Scenario development | Participants have limitations in the scenario options they create | Instructors prepare several options of remote simulation scenarios to share with the participants |
| Content | Scenario demonstration | Flow of scenario demonstration is not smooth due to disorganized collaboration of remote participants | Give the participants more than one chance to demonstrate their scenario if it is necessary |
| Content | Whole discussion | The communication tends to be done between instructor and participant rather than between participants | Use the chat function concurrently and create another communication space for the participants |
10.5 Practice #1: An International Hybrid Faculty Development Program for SBE

10.5.1 Program Description

Fundamental Simulation Instructional Methods (FunSim) is an introductory-level, two-day train-the-trainer course designed for simulation facilitators from all disciplines who wish to effectively apply sound, up-to-date instructional design principles of simulation-based healthcare education with active facilitation and debriefing techniques. This program was developed by the SimTiki Simulation Center at the John A. Burns School of Medicine. The original creator and simulation experts in Japan translated all content into Japanese, and the Japanese version of this program (FunSim-J) was established in 2011. This program has been held in both the U.S. and Japan as a Japanese-language, face-to-face course supported by international instructors. In 2020, a hybrid e-learning format was implemented due to the COVID-19 pandemic. The program agenda was reconfigured to incorporate both synchronous and asynchronous remote educational formats, and was delivered over multiple days (Fig. 10.2). A total of 105 participants were enrolled in this program between 2020 and 2022.

The instructors are Certified Healthcare Simulation Educators (CHSE®) and program directors of the International Simulation Education Fellowship at the University
of Hawaii, SimTiki Simulation Center, and Japanese healthcare professionals who graduated from this program. Fellowship graduates served as faculty and as a bridge between instructors and participants, as translators, and as cultural mediators [30, 33] for Japanese curriculum localization. Beginning in 2020, the instructors developed the FunSim-J On-line (FSJ-O) hybrid synchronous and asynchronous e-learning curriculum. From the simulation center at the University of Hawaii, they served as on-site co-located instructors, with 3–4 remotely located instructors joining via video conference from Japan. Participants joined the program remotely from multiple locations of their choosing throughout Japan. An integrated learning management system (Moodle®) was coordinated and managed from the venue in Hawaii by an on-site instructor, who supported technical engagement by remote instructors and participants. Instructors used an internet-based text messaging service (WhatsApp®) to communicate and manage real-time program coordination and technical issues during synchronous events.

On Day #1, on-site instructors demonstrated a model anaphylaxis simulation scenario. Participants were divided into 2–3 workgroups of 4–6 people per group, and synchronous collaborative group work was conducted via the breakout room function of Zoom® on Day #2 and Day #3. For each workgroup, 1–2 instructors were assigned to observe and support. Participants were assigned to modify a scenario with a total length of 13 min: 3 min for orientation, 5 min for facilitation, and 5 min for debriefing. On Day #4, participants from each workgroup demonstrated their modified scenario as “simulated teaching”, with on-site instructors at the simulation center serving in “simulated learner” roles. Members of each workgroup joined remotely via Zoom® and conducted a demonstration scenario during the session, including orientation, facilitation, and debriefing of the simulated learners located at the venue. Participants from all workgroups observed each demonstration scenario. After each workgroup demonstrated a modified scenario, participants and instructors participated in a “meta-debriefing” and discussed the good points of the scenario design and the elements that required modification.

Fig. 10.2 FunSIM-J on-line: agenda
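For readers who plan a similar session, the timing and grouping figures in the program description can be checked with a short script. The following is only an illustrative sketch, not part of the FSJ-O program: the phase durations (3, 5, and 5 min) and the 13 min total come from the description above, while the function names, the round-robin grouping rule, and the cohort of 14 placeholder participants are assumptions introduced for the example.

```python
# Durations are taken from the program description; all names are hypothetical.
PHASES_MIN = {"orientation": 3, "facilitation": 5, "debriefing": 5}
SCENARIO_LIMIT_MIN = 13


def total_minutes(phases):
    """Return the summed duration of all scenario phases, in minutes."""
    return sum(phases.values())


def split_into_workgroups(participants, n_groups):
    """Distribute participants round-robin so group sizes differ by at most one."""
    groups = [[] for _ in range(n_groups)]
    for index, person in enumerate(participants):
        groups[index % n_groups].append(person)
    return groups


if __name__ == "__main__":
    total = total_minutes(PHASES_MIN)
    print(f"Scenario length: {total} min (limit {SCENARIO_LIMIT_MIN} min)")

    # Illustrative cohort of 14 placeholder participants, not real enrollment data.
    cohort = [f"participant_{i}" for i in range(1, 15)]
    for number, group in enumerate(split_into_workgroups(cohort, 3), start=1):
        print(f"Workgroup {number}: {len(group)} people")
```

A round-robin split is only one possible assumption; in practice, instructors may prefer to balance workgroups by discipline or prior simulation experience.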
10.5.2 How Did We Manage the Technology Issues? Collaborating with a Digital Technology Technician

In a hybrid session in which some participants and/or instructors join at the venue while others join remotely, we anticipated that remote participants and instructors would be isolated by video limitations, such as lacking a complete view of the entire simulation demonstration, and/or by audio limitations in hearing conversation between participants at the venue. Overcoming these issues requires suitable technology, e.g. a high-quality conference microphone or a built-in omnidirectional microphone in the simulation space. A digital technology technician on-site at SimTiki, the venue of the program, was critical to preparing, testing, and assuring adequate audio/visual capabilities. We strongly recommend including a knowledgeable on-site technician for real-time intervention in technology-related issues that may impact the flow of the planned educational design. When organizing an online/hybrid program, the instructors are required not only to design and facilitate education, but also to pay attention to additional factors, including technical facets, participant reactions, messages from the remote instructors, awareness of their personal visibility in live camera views, and time management in live sessions. These requirements pose a risk to the control of the sessions. Therefore, the instructors should anticipate technical difficulties and establish a back-up plan, including adequate personnel resources to deploy back-up strategies with minimal impact on the learning experience.
10.5.3 How Did We Encourage Remote Participant Engagement in Distance Learning? Implementing a Learning Management System (LMS)

When re-designing session content from the original face-to-face format to a hybrid format, a choice had to be made as to whether to maintain core content that had been delivered face-to-face as a live/synchronous session or to replace it with on-demand asynchronous content. Furthermore, we anticipated that remote participants would find it challenging to collaborate as naturally as participants in face-to-face faculty development programs. We approached this problem with the effective use of technology. We developed an e-learning system for participants using an established state-of-the-art learning management system (LMS), Moodle®, for asynchronous elements of
FSJ-O. This LMS enables a range of functions, such as sharing learning materials including on-demand video lectures, providing pre-course learning materials that foster the participant’s learning experience at an individualized pace, communicating between instructors and participants using a chat/forum function before, during, and after each session, and enabling and coordinating synchronous and asynchronous interactions among participants and instructors. Moodle® is an open-source LMS which uses plug-ins to enhance learners’ self-study. Moodle® can be used to collect detailed learning logs [34] to support program evaluation.

Smooth implementation of distance learning requires specific technical responses, management of learning in relation to the continuation of distance learning, maintenance of the motivation to learn, and communication skills specific to a distance learning environment [35]. To avoid technical problems, course designers are required to become familiar with the information and communication technologies (ICT) used by integrated computer systems and with LMS technical specifications. In addition, participants’ ability to manage their learning, maintain motivation to learn, and improve metacognitive skills requires adapting and devising self-regulated learning strategies. A Moodle® basic instruction manual was created and shared with FSJ-O participants. A contact e-mail address was displayed on-screen during live sessions so that remote participants could consult and contact the FSJ-O instructor team for additional real-time learning support, as recommended by Benson [36]. In consideration of time constraints for participants who study while balancing work and other commitments, a pre-course self-learning period of one week was made available prior to the main sessions, and synchronous sessions were conducted via Zoom® for a duration of 1.5–3.5 h.
A self-introduction session was conducted online using the asynchronous forum/chat function prior to the initial synchronous sessions to facilitate mutual understanding among participants, because learning acquisition occurs more frequently among learners who adapt to online learning and respond to posts than among learners who do not [37]. Previous research on the usefulness of self-introduction and LMS-based dialogue among learners [38], and on the effective use of LMS forums in distance learning by Japanese nurses [39], prompted the design of the pre-course socialization forum in our program. In an attempt to promote participants’ motivation to learn, interactive content was embedded in pre-course learning materials. As an example, we placed self-assessment/knowledge check quizzes after on-demand video lectures, so that participants could confirm their understanding of the content and identify gaps (Fig. 10.3). Table 10.4 summarizes the distance implementation of FunSim-J using the Substitution, Augmentation, Modification, and Redefinition (SAMR) model [40, 41], a framework for the acceptance and integration of digital technologies. The SAMR model divides the use of technology in teaching and learning activities into four categories, namely: direct use of technology without change (Substitution), use of technology with functional changes (Augmentation), significant use of technology with re-design (Modification), and use of technology to create new tasks that previously did not exist (Redefinition). As shown in Table 10.4, the use of the LMS in
Fig. 10.3 On-demand video lecture (upper), and knowledge check in the LMS (lower)
Table 10.4 Summary of LMS tools used in FunSim-J with corresponding SAMR model
| | Face-to-face | Hybrid | Corresponding SAMR model |
| Distribution of learning materials | Distributed in hard copy | Available for download in the LMS | A: Augmentation |
| Communication between participants and instructors | E-mail via the secretariat | E-mail via the secretariat; online LMS forum | R: Redefinition |
| Communication among participants | Face-to-face communication; exchange of individual contact information is optional | Remote communication via Zoom® or chat function; messages and document sharing via LMS online forum | R: Redefinition |
| Pre-course self-learning | None | 15–20 min on-demand LMS lecture videos and quizzes | R: Redefinition |
| Confirmation of learning progress | None | Learning progress of the participants can be checked from the LMS’s log function | R: Redefinition |
FSJ-O not only replaced a simple educational tool, but also enabled new practices that had not been possible in the original format. In a post-course questionnaire for FSJ-O, 80% of participants indicated that distance learning was suitable for them. However, 40% of the respondents experienced technical difficulties with distance learning. In terms of preferred formats, more than 70% of the participants indicated that they were interested in taking an advanced faculty development program in online/hybrid formats. These results indicate that the hybrid faculty development program for SBE successfully promoted acceptance of distance learning among participants.
10.5.4 How Did We Overcome Barriers to Remote Demonstration of Simulation Scenarios Designed by Videoconference Collaborating Learner Workgroups?—Modifying Scenario Demonstration Methods
Role-playing is known as an important factor for learners’ immersion in medical training settings [42]. In the face-to-face FunSim-J program, participants role-play scenarios with each other in the roles of “learners” and “simulation facilitators”. Because technical trouble was likely and participants lacked experience with remote simulation demonstration, the delivery method had to be modified.
In terms of educational outcomes, it was imperative to give participants the opportunity to experience the role of learners, since this illuminates the stress and other unique factors associated with being taught and with active learning in front of peers. Furthermore, participants in learner roles could highlight flawed elements of the scenario design from a learner’s perspective and provide feedback to participants in the role of simulation facilitators. For those reasons, experiencing the role of learners enabled participants to gain insights during scenario demonstration and meta-debriefing. However, since each remote participant joined the session independently from a single location, usually without any simulation capability, it was difficult for them to demonstrate or play the roles of learners, and the scenarios used in the demonstration were necessarily limited in their variation. FSJ-O instructors preferred to let participants design demonstration scenarios entirely according to workgroup preference. Constraints on participants’ ability to role-play simulated learners required modification in the delivery of the scenario demonstration, a core component of the faculty development program for SBE. The modified scenario development and demonstration engaged on-site instructors at the SimTiki simulation center venue in simulated learner roles, and assigned a core scenario framework and topic (anaphylaxis) for workgroup demonstrations in simulation facilitator and debriefer roles. Working from a pre-formatted core scenario requires less effort and imposes less cognitive load in the scenario creation process during faculty development programs. In this way, we overcame barriers to remote demonstration of simulation scenarios (Fig. 10.4).
Fig. 10.4 Remote participant in the simulation facilitator role (lower right) facilitated the onsite instructors in simulated learner roles (lower left). Other remote participants and instructors observed the demonstration scenario via Zoom® (upper). To enhance effectiveness of demonstration scenarios, videos of both learner and facilitator roles were pinned, allowing all participants to focus on the active scenarios
Based on our collective experiences, several tips emerged for successful remote demonstration of simulation scenarios during faculty development sessions. First, vivid and reliable audiovisual systems and simulators which work as intended were critical. In simulation rooms at the University of Hawaii, pan-tilt-zoom (PTZ) digital cameras and noise-cancelling omnidirectional microphones with wide area coverage were ceiling mounted. Instructors carefully pre-checked the audiovisual systems and simulators, and confirmed functionality with remote instructors in advance on the day of each session. This pre-session technical double-check enabled participant immersion in the teaching environment, regardless of geographic distance. Second, timely interaction between the participants in learner roles and simulation facilitator roles was required for smooth demonstration. To this end, the on-site instructors in scripted simulated learner roles intentionally spoke loudly and verbalized every single action they took during the scenario. As a result, the remote participants were able to observe and understand all simulated learner actions at the venue without missing important information (e.g. the patient’s vital signs), and remote participants in simulation facilitator roles could respond to the learners’ questions without guessing at or misinterpreting behavior during scenarios. Third, direct real-time sharing of course management information between on-site and remote instructors helped achieve educational outcomes. Since participants in simulation facilitator roles sometimes struggled to conduct a well-organized orientation that gave the instructors acting as simulated learners a pre-scenario explanation, scenario demonstrations were sometimes incomplete.
This incomplete performance could be a discussion trigger to enhance participant learning; however, in situations where time was limited, orientation miscommunication could lead to poor educational outcomes. In our experience, sharing each workgroup’s simulation design agenda with all instructors, who could then anticipate and adjust their simulated learner behavior to highlight key elements of each scenario design, led to successful scenario demonstrations. These tips are consistent with some of the aspects noted by Hagiwara et al. [43] as enhancing learners’ immersion during SBE (Table 10.5). Decreasing participants’ uncertainty about the demonstration in the post-scenario discussion was crucial. After the scenario demonstration, all participants and instructors engaged in a facilitated discussion, where participants in simulation facilitator roles reflected on their performances in response to inquiries from instructors. In this process, instructors should allow participants to describe what they experienced during the scenario. This avoids ambiguity in the participants’ minds. When unintended things happened in the scenario, participants analyzed and rationalized the reasons, relating them to their behavior in the role of facilitator or scenario designer, or to simulation environment factors. Participants remained uncertain about the conclusions of the reflection unless an explicit description was given. The key person able to guide reflections and suggest ideas for analysis was the instructor who primarily guided the discussion. This instructor should ask participants about their reflections as much as possible rather than simply providing his or her own analysis or opinion. Here, instructors needed the skill of asking the right questions at the right time.
Table 10.5 Triggers which reduce or enhance the participants’ immersion during simulation scenarios [43]
When uncertainty among participants regarding analysis of unanticipated events during scenario demonstration could not be cleared, providing an opportunity for the participants to modify and repeat the demonstration following discussion was effective. Iterative learning is a core strategy for experiential learning, and in our model of faculty development for SBE, it worked well for engaging a group of participants who were at different levels of experience. We found that the sequence of providing a core scenario, facilitating participant modification and customization of the scenario, and conducting remote alpha-testing [31] of the scenario allowed participants to engage in and observe the entire sequence and range of fundamental simulation instructional methods. Our experience revealed opportunities to revise and optimize the online/hybrid SBE faculty development program, enabling participants’ active engagement by addressing multiple barriers.
10.6 Practice #2 A Domestic Online Faculty Development Program for SBE
10.6.1 Program Description
Foundation Course for Medical Education (FCME) is a one-year certificate program for physicians in Japan. This program was established in 2015, and enrolls 12 participants every year [44]. The program includes a 9-hour SBE session in which participants demonstrate scenarios they created and give feedback to each other (Fig. 10.5). In this session, the 12 participants were divided into 3 groups of 4 people each, and each group was assigned to create a simulation scenario for a group demonstration. Each scenario was allotted 20 min in total: 5 min for orientation, 5 min for facilitation, and 10 min for debriefing.
Fig. 10.5 Program agenda of the simulation session of FCME (online version)
From 2015 to 2019, all instructors and participants attended the program face-to-face at the venue; however, in 2020 and 2021, the sessions were conducted online due to the COVID-19 pandemic. In the online program, the 5–6 instructors joined the session remotely. This program utilized the LMS operated by Kyoto University, and all pre-course learning materials were uploaded to the LMS. Participants submitted pre/post-course assignments via the LMS, and the instructors scored them and entered the scores in the same system. Prior to the day of the session, the participants were required to review, in a flipped classroom format, on-demand video lectures covering pre-course material on designing simulation-based education, debriefing, gamification, and organizing simulation courses. Furthermore, the participants needed to submit a pre-course collaborative assignment with their learner group members. The pre-course assignment was a scenario blueprint for demonstration. Each group was assigned 1–2 instructors as supporters of the group discussion.
10.6.2 How Did We Support Participants’ Synchronous Group Work for Creating a Simulation Scenario to Demonstrate?—Providing Pre-course Materials for Remote Simulation Scenarios
Regarding participant factors, we identified barriers related to individual experience with remote simulation. For example, creating and facilitating simulation scenarios which are applicable to remote learners is challenging for participants who may have limited experience in real-life educational settings, and are new to the concepts
presented in the faculty development program for SBE. In such situations, instructors should offer a laddered opportunity for participants to participate at a comfortable level matched to their experience. Specifically, we created pre-course reference material, provided via the LMS, to support the design of remote simulation scenarios based on previous reports and educational resources. This reference material introduced outlines and templates for creating scenarios and supported participants’ pre-course group work. Creating a simulation scenario with group members who have never collaborated before is difficult even in the face-to-face format. Therefore, we provided these pre-course resource materials two weeks before the scenario demonstration sessions, allowing participants adequate time for preparation.
10.6.3 How Did We Provide a Concrete Participant Experience Despite Geographical Barriers?—Creating an Isolated Space for Simulation Demonstration in the Virtual Learning Environment
On the day of the online session, group work was conducted using the Zoom® Breakout Room function, and the remote instructor joined and supported the group discussion if needed. Although the overall agenda and time schedule of the session were modified from the face-to-face sessions, the time allocation for group demonstrations was unchanged, since this is a core part of faculty development for SBE. In the scenario demonstration, each participant group had a specific role: simulation facilitator, learner, or observer. During the scenario demonstration, participants in simulation facilitator and learner roles remained on-camera (video turned on), while the observers and all instructors remained off-camera (video turned off). At the beginning of the discussion following the scenario demonstration, everyone was on-camera. In addition, during the scenario demonstration, the observers and instructors were free to write comments or opinions in the chat box, and the participants in simulation facilitator and learner roles were able to read them concurrently with the scenario demonstration (Fig. 10.6). Since participants and instructors joined the session from multiple different locations without simulation facilities, providing participants with a concrete experience of simulation education was challenging. We overcame this barrier by increasing the presence of participants in each role. The literature on virtual educational settings describes three types of presence (physical presence, co-presence, and social presence) and notes that “the meaning learners derive from a learning activity reflects the quality of interactions among learners and facilitators” [45].
Referring to these findings, we established a rule for the online faculty development program for SBE: video was turned on or off depending on the participant’s role. During the scenario demonstration, participants in simulation facilitator and learner roles turned on their videos and communicated with each other by audio via computer microphones, while observers
Fig. 10.6 Screenshot of scenario demonstration in an online session. A participant in the learner role (large view) performed a physical examination on a puppet serving as the simulated patient, with facilitation by a participant in a simulation facilitator role (small view at the upper right of the large view). Observers and instructors wrote comments in the chat box (right)
and instructors turned off their cameras and communicated via the chat function. In this way, we intentionally created an isolated space in the virtual environment where the participants in each role could devote themselves to the scenario demonstration. This rule was employed in order to provide a practical learning environment for participants. In the experiential learning theory by Kolb [2], learning occurs through a cycle of: (i) having a concrete experience (concrete experience); (ii) reflecting on the experience from various perspectives (reflective observation); (iii) conceptualizing the experience so that it can be applied not only in a single context but also in other situations (abstract conceptualization); and (iv) attempting new tasks based on these concepts (active practice). Therefore, we strongly believe that, in order to generate effective learning in a faculty development program for SBE, it is essential for participants to have actual experiences of simulation which leave them with concrete ideas. However, it was difficult to overcome the geographic barriers which prevent remote participants from demonstrating actual simulation scenarios collaboratively. We attempted to overcome this barrier by creating a relatively isolated environment for scenario demonstration by controlling participants’ video appearance.
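The video rule described above can be written down as a small lookup, which may help a course team check that every role has an unambiguous setting before a session. The following is only an illustrative sketch in Python, not part of the FCME program materials: the names `Role`, `Phase`, `Setting`, and `setting_for` are hypothetical, and the assumption that everyone communicates by audio during the post-demonstration discussion is ours (the text states only that everyone was on-camera at that point).

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FACILITATOR = "simulation facilitator"
    LEARNER = "learner"
    OBSERVER = "observer"
    INSTRUCTOR = "instructor"


class Phase(Enum):
    DEMONSTRATION = "scenario demonstration"
    DISCUSSION = "post-demonstration discussion"


@dataclass(frozen=True)
class Setting:
    video_on: bool
    channel: str  # "audio" or "chat"


def setting_for(role: Role, phase: Phase) -> Setting:
    """Return the video/communication setting for a role in a given phase."""
    if phase is Phase.DISCUSSION:
        # Everyone is on-camera for the discussion; audio is our assumption.
        return Setting(video_on=True, channel="audio")
    if role in (Role.FACILITATOR, Role.LEARNER):
        # Active roles stay visible and talk to each other during the scenario.
        return Setting(video_on=True, channel="audio")
    # Observers and instructors stay off-camera and use the chat box.
    return Setting(video_on=False, channel="chat")


if __name__ == "__main__":
    for role in Role:
        print(role.value, setting_for(role, Phase.DEMONSTRATION))
```

Keeping the rule in one function makes the two-group split explicit: only the facilitator and learner roles occupy the shared audiovisual space during the demonstration, and all other attendees are moved to the text channel.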
10.6.4 How Did We Manage to Increase Participants’ Reflective Observation?—Unexpected Learning Effects Found in the Online Faculty Development Program
As Kolb indicated, reflection on experiences is one of the key components of experiential learning [2]. Schön categorized reflection into two types: reflection that takes place while a person is involved in the situation (reflection-in-action), and reflection that happens at some time after the situation has occurred (reflection-on-action) [4]. In the face-to-face faculty development sessions for SBE before 2019, participants in simulation facilitator roles demonstrated simulation scenarios by observing the reactions, conversation, and behaviors of the participants in learner roles, who were physically present in front of them (Fig. 10.7). The audience, consisting of participants in observer roles and instructors, observed the demonstration in the venue, usually without any conversation. In this traditional scenario demonstration, participants in simulation facilitator roles modified their cueing or actions based on the performance of participants in learner roles. We considered that this modification of behavior by the participants in simulation facilitator roles occurred through reflection-in-action. Following the demonstration, we conducted a facilitated discussion with all participants and instructors through a plenary review. In this discussion, the participants in simulation facilitator roles, guided by the instructor, reflected on their performances from memory, which we consider a process of reflection-on-action. Since many fruitful reflective dialogues emerged during this discussion, we recognized the importance of reflection-on-action through face-to-face faculty development sessions. Based on these experiences, we conducted a facilitated discussion following the scenario demonstration in the online faculty development session as well.
In the discussion, adequate time was allocated to allow participant reflections, and the instructor posed probing questions as needed to prompt participant discussion and reflective analysis of the demonstration scenario from an educator’s viewpoint. With these solutions, the post-demonstration discussion was successful, even in the remote/virtual learning environment. Furthermore, in the discussion we noticed that concurrent self-reflection of participants in simulation facilitator roles, termed “reflection-in-action” by Schön, was triggered by multiple channels, more than in face-to-face sessions (Fig. 10.8). In traditional face-to-face sessions, even if participants in simulation facilitator roles engaged in reflection-in-action during the simulation demonstration, it was triggered primarily by the reaction of the participants in learner roles or the collaborative atmosphere of the situation. In online sessions, however, the triggers for reflection-in-action changed and became more complex due to online-specific features. In face-to-face sessions, it was difficult to view one’s own demonstration objectively in real time unless the setting was adapted, for example by using a system that projects images of the demonstration concurrently, or by demonstrating
Fig. 10.7 The scene of scenario demonstration in the face-to-face faculty development program for SBE
scenarios in a mirrored room. Conversely, in online sessions, participants in simulation facilitator roles could observe their own image and the chat box comments while the simulation demonstration was in progress; thus reflection-in-action was facilitated by multiple triggers. The theory-based approach of switching the video on or off gave instructors unexpected and meaningful findings about the participants’ reflections. Someya et al. reported changes in the construction of learners’ reflections by comparing online and face-to-face sessions [14]. In a standard face-to-face faculty development session, a video debriefing technique was used to encourage reflection-on-action, in which participants in each role watched and reflected on a recording of the demonstration. It was suggested that video debriefing after simulation was effective in terms of enabling participant self-reflection and synchronous feedback from the instructor to the participant. However, the impact on learners of self-reflection while viewing themselves during simulation demonstrations is not clear. We will seek to further clarify this point, in order to advance knowledge about how to conduct effective faculty development for SBE.
| Component of experiential learning | Face-to-face session | Online session |
| Concrete experience (Scenario demonstration) | Scenario solely with onsite participants ↓↑ Observing performance of participants in learner roles (Reflection-in-action) | Scenario with only remote participants ↓↑ Observing performance of the participants in learner roles + Reading discussion of observers and instructors on the chat + Viewing of self-view on the computer screen (Reflection-in-action) |
| Reflective observation and abstract conceptualization (Discussion following scenario demonstration) | Answering instructor probing questions and reflecting on the experience (Reflection-on-action) | Answering instructor probing questions remotely and reflecting on the experience (Reflection-on-action) |
| Active practice (After the session) | Practicing what they learned during the scenario demonstration in the participants’ actual clinical contexts | Practicing what they learned during the scenario demonstration in the participants’ actual clinical contexts |
Fig. 10.8 Correspondence of each component of experiential learning theory with the behavior of participants in simulation facilitator roles during both face-to-face and online simulation faculty development
10.6.5 How Did We Collaborate with Remote Instructors?—Utilizing Communication Tools and Co-facilitation Techniques Effectively in the Online Faculty Development Program
In addition to participant factors, instructor considerations for faculty development at a distance revealed challenges. For example, remote instructors experienced technical trouble, especially with direct instructor-to-instructor communication using a video-conference system during live sessions with participants. In such a situation, preparing an additional, alternate real-time communication channel for instructor-to-instructor and technician-to-technician exchanges (e.g. an internet-based text messaging service) was effective in resolving challenging situations. Furthermore, we utilized an established co-facilitation technique for both faculty development programs [46]. Co-facilitation techniques enable instructors to address challenges collaboratively, especially in online/hybrid faculty development, where a single instructor may be required to manage multiple additional tasks compared to face-to-face programs. For example, one of the instructors was assigned to monitor synchronous written communication so that technical, timing, or other challenges could be noted immediately during scenario demonstrations. In this co-facilitation arrangement, other instructors observed the demonstration closely to ensure that an instructor was available to guide the post-demonstration discussion. In terms of effective educational outcomes, the co-debriefing technique, in which multiple simulation facilitators provide feedback to learners in SBE, is known to have beneficial outcomes [47]. We implemented this concept in the design of the post-demonstration discussion framework, and found that this dynamic technique provided a larger pool of expertise with diverse viewpoints, allowing us to manage various learner expectations and needs.
10.7 Conclusions
In this chapter, we described the barriers and challenges in hybrid/online faculty development for SBE, and proposed several solutions and lessons learned based on both our experiences and the results of previous studies. Due to the COVID-19 pandemic, instructors faced many difficulties in conducting optimal faculty development sessions. Most attempted to provide program content equivalent in quality to face-to-face sessions. However, we discovered that the new paradigm of distance learning for faculty development can be explored while taking on, and adjusting to, anticipated and unanticipated challenges. We uncovered immense possibilities for development of a “new” learning style. Iterative trial and error experiences with careful analysis and reflection will take us on the journey of learning, and sometimes show us hidden potential of faculty development for SBE far beyond our understanding.
Acknowledgements The authors would like to express our deepest gratitude to Dr. Gen Ouchi, Dr. Yuki Moritoki, Dr. Yuka Eto, Dr. Eri Sato, Dr. Kentaro Ono, Ms. Kris Hara, Mr. Michael von Platen, and Ms. Eileen Beamis Maeda for FunSim-J online, and Dr. Hiroshi Nishigori, Dr. Takeshi Kimura, Mr. Fumitaka Tanemura, Dr. Shoko Tani, Dr. Tsunetoshi Mogi, Dr. Junichi Tanaka, Dr. Kenichi Tetsuhara, Dr. Akira Yamamoto, and Ms. Kureha Kasahara for Foundation Course for Medical Education (FCME). We would also like to thank all the participants of the faculty development courses.
References
1. Cook, D.A.: How much evidence does it take? A cumulative meta-analysis of outcomes of simulation-based education. Med. Educ. 48(8), 750–760 (2014). https://doi.org/10.1111/medu.12473
2. Kolb, D.A.: Experiential Learning: Experience as the Source of Learning and Development, 1st edn. Prentice-Hall, Inc. (1984)
3. Knowles, M.S.: The Modern Practice of Adult Education: Andragogy Versus Pedagogy. Association Press, New York (1970)
4. Schön, D.A.: The Reflective Practitioner: How Professionals Think in Action. Basic Books (1983)
5. Lave, J., Wenger, E.: Situated Learning: Legitimate Peripheral Participation. Cambridge University Press (1991)
6. Motola, I., Devine, L.A., Chung, H.S., Sullivan, J.E., Issenberg, S.B.: Simulation in healthcare education: a best evidence practical guide. AMEE Guide No. 82. Med. Teach. 35(10), 142–159 (2013). https://doi.org/10.3109/0142159X.2013.818632
7. Bradley, P.: The history of simulation in medical education and possible future directions. Med. Educ. 40(3), 254–262 (2006). https://doi.org/10.1111/j.1365-2929.2006.02394.x
8. Hughes, P., Brito, J.C., Ahmed, R.: Training the trainers: a survey of simulation fellowship graduates. Can. Med. Educ. J. 8(3), e81–89 (2017). https://doi.org/10.36834/cmej.36865
9. Chernikova, O., Heitzmann, N., Stadler, M., Holzberger, D., Seidel, T., Fischer, F.: Simulation-based learning in higher education: a meta-analysis. Rev. Educ. Res. 90(4), 499–541 (2020). https://doi.org/10.3102/0034654320933544
10. Graber, M.A., Wyatt, C., Kasparek, L., Xu, Y.: Does simulator training for medical students change patient opinions and attitudes toward medical student procedures in the emergency department? Acad. Emerg. Med. 12(7), 635–639 (2005). https://doi.org/10.1197/j.aem.2005.01.009
11. McKimm, J., Forrest, K.: The roles of faculty and simulated patients in simulation. In: McKimm, J., Forrest, K., Edgar, S. (eds.) Essential Simulation in Clinical Education, pp. 87–110. Wiley-Blackwell (2013)
12. Ahmed, R.A., Frey, J.A., Hughes, P.G., Tekian, A.: Simulation fellowship programs in graduate medical education. Acad. Med.: J. Assoc. Am. Med. Colleges 92(8), 1214 (2017). https://doi.org/10.1097/ACM.0000000000001780
13. Gantt, L.T., Robey, W.C., Langston, T., Bliley, L.: Simulation faculty and staff development: an interprofessional, online approach. J. Interprofessional Educ. Pract. 19, 100310 (2019). https://doi.org/10.1016/j.xjep.2019.100310
14. Someya, M., Oikawa, S., Mogi, T., Tanaka, J., Tetsuhara, K.: A new style of reflection from online simulation faculty development practice. Igaku Kyoiku/Med. Educ. (Japan) 52(5), 405–410 (2021)
15. Wong, N.L., Peng, C., Park, C.W., Pérez, J., Vashi, A., Robinson, J., Okuda, Y.: DebriefLive: a pilot study of a virtual faculty development tool for debriefing. Simul. Healthc. 15(5), 363–369 (2020). https://doi.org/10.1097/SIH.0000000000000436
16. The Certified Healthcare Simulation Educator® (n.d.). https://www.ssih.org/Credentialing/Certification/CHSE. Accessed 9 Feb. 2023
17. INACSL Standards Committee: INACSL standards of best practice: simulation℠ facilitation. Clin. Simul. Nurs. 12, S16–S20 (2016). https://doi.org/10.1016/j.ecns.2016.09.007
18. Wittmann-Price, R.A.: The certification examination test plan. In: Wilson, L., Wittmann-Price, R.A. (eds.) Review Manual for the Certified Healthcare Simulation Educator (CHSE) Exam, pp. 7–12. Springer Pub Co (2015)
19. Pylman, S.E., Emery, M.T.: Student perceptions of effective simulation instructor teaching. Simul. Healthc.: J. Soc. Simul. Healthc. 18(1), 51–57 (2022). https://doi.org/10.1097/sih.0000000000000640
20. Plack, M.M., Goldman, E.F., Wesner, M., Manikoth, N., Haywood, Y.: How learning transfers: a study of how graduates of a faculty education fellowship influenced the behaviors and practices of their peers and organizations. Acad. Med. 90(3), 372–378 (2015). https://doi.org/10.1097/ACM.0000000000000440
21. Natal, B., Szyld, D., Pasichow, S., Bismilla, Z., Pirie, J., Cheng, A.: Simulation fellowship programs: an international survey of program directors. Acad. Med. 92(8), 1204–1211 (2017). https://doi.org/10.1097/ACM.0000000000001668
22. Anderson, M., Bond, M.L., Holmes, T.L., Cason, C.L.: Acquisition of simulation skills: survey of users. Clin. Simul. Nurs. 8(2), e59–e65 (2012). https://doi.org/10.1016/j.ecns.2010.07.002
23. Fey, M.K., Auerbach, M., Szyld, D.: Implementing faculty development programs: moving from theory to practice. Simul. Healthc. 15(1), 5–6 (2020). https://doi.org/10.1097/SIH.0000000000000429
24. Jeffries, P.R., Dreifuerst, K.T., Kardong-Edgren, S., Hayden, J.: Faculty development when initiating simulation programs: lessons learned from the national simulation study. J. Nurs. Regul. 5(4), 17–23 (2015). https://doi.org/10.1016/S2155-8256(15)30037-5
25. Cate, O.T., Mann, K., McCrorie, P., Ponzer, S., Snell, L., Steinert, Y.: Faculty development through international exchange: the IMEX initiative. Med. Teach. 36(7), 591–595 (2014). https://doi.org/10.3109/0142159X.2014.899685
26. de Grave, W., Zanting, A., Mansvelder-Longayroux, D.D., Molenaar, W.M.: Workshops and seminars: enhancing effectiveness. In: Steinert, Y. (ed.) Faculty Development in the Health Professions: A Focus on Research and Practice, pp. 181–195. Springer, Dordrecht (2014)
27. Lemoine, J.B., Chauvin, S.W., Broussard, L., Oberleitner, M.G.: Statewide interprofessional faculty development in simulation-based education for health professions. Clin. Simul. Nurs. 11(3), 153–162 (2015). https://doi.org/10.1016/j.ecns.2014.12.002
28. Peterson, D.T., Watts, P.I., Epps, C.A., White, M.L.: Simulation faculty development. Simul. Healthc.: J. Soc. Simul. Healthc. 12(4), 254–259 (2017). https://doi.org/10.1097/SIH.0000000000000225
29. Morell, V.W., Sharp, P.C., Crandall, S.J.: Creating student awareness to improve cultural competence: creating the critical incident. Med. Teach. 24(5), 532–534 (2002). https://doi.org/10.1080/0142159021000012577
30. Akamine, Y., Berg, B.W., Nowicki, M., Ouchi, G., Abe, Y.: International faculty development in fundamental simulation methods for Japanese healthcare educators. Igaku Kyoiku/Med. Educ. (Japan) 46(5), 409–418 (2015). https://doi.org/10.11307/mededjapan.46.5_409
31. Lee-Jayaram, J.J., Berg, B.W., Sy, A., Hara, K.M.: Emergent themes for instructional design: alpha and beta testing during a faculty development course. Simul. Healthc. 14(1), 43–50 (2019). https://doi.org/10.1097/SIH.0000000000000329
32. Cook, D.A.: Faculty development online. In: Steinert, Y. (ed.) Faculty Development in the Health Professions: A Focus on Research and Practice, pp. 217–241. Springer, Dordrecht (2014)
33. Oikawa, S., Berg, B.W., Lee-Jayaram, J.: An international, culturally adaptive faculty development fellowship for simulation educators. Med. Teach. 43(8), 914–915 (2021). https://doi.org/10.1080/0142159X.2021.1929905
34. Asada, Y., Yagi, M.: Moodle for learning analytics and institutional research: exporting data via SQLs and plugins. Int. J. Inst. Res. Manag. 4(2), 30–43 (2020). https://doi.org/10.52731/ijirm.v4.i2.587
35. Beaudoin, M., Jung, I., Suzuki, K., Kurtz, G., Grabowski, B.: Online Learner Competencies: Knowledge, Skills, and Attitudes for Successful Learning in Online and Blended Settings. Information Age Publishing (2013)
36. Benson, E.P.: Online learning: a means to enhance professional development. Crit. Care Nurse 24(1), 60–63 (2004). https://doi.org/10.4037/ccn2004.24.1.60
37. Shaw, R.-S.: A study of the relationships among learning styles, participation types, and performance in programming language learning supported by online forums. Comput. Educ. 58(1), 111–120 (2012). https://doi.org/10.1016/j.compedu.2011.08.013
38. Pelz, B.: (MY) Three principles of effective online pedagogy. J. Asynchronous Learn. Netw. 14(1), 103–116 (2010). https://doi.org/10.24059/olj.v8i3.1819
39. Yagi, M., Murakami, R., Suzuki, M., Mishina, S., Sekiyama, Y., Sasaki, M., Nakano, M., Kawakami, M., Kitada, S., Otsuka, K., Makamura, M., Narita, S., Haruyama, S.: How to success using online discussion forum for Nurses pertaining to specified medical acts. Jpn. J. Rural Remote Area Nursing 1–8 (2017)
40. Saha, S., Beach, M.C., Cooper, L.A.: Patient centeredness, cultural competence and healthcare quality. J. Natl. Med. Assoc. 100(11), 1275–1285 (2008). https://doi.org/10.1016/S0027-9684(15)31505-4
41. Puentedura, R.R.: SAMR: moving from enhancement to transformation (2013). http://www.hippasus.com/rrpweblog/archives/000095.html. Accessed 24 Feb. 2023
42. Dieckmann, P., Gaba, D., Rall, M.: Deepening the theoretical foundations of patient simulation as social practice. Simul. Healthc. 2(3), 183–193 (2007). https://doi.org/10.1097/SIH.0b013e3180f637f5
43. Hagiwara, M.A., Backlund, P., Söderholm, H.M., Lundberg, L., Lebram, M., Engström, H.: Measuring participants’ immersion in healthcare simulation: the development of an instrument. Adv. Simul. 1(1), 1–9 (2016). https://doi.org/10.1186/s41077-016-0018-x
44. Nishigori, H., Oikawa, S., Shoko, T., Takeshi, K., Fumitaka, T.: Foundation program in medical education organized by Kyoto University. Igaku Kyoiku/Med. Educ. (Japan) 52(6), 515–523 (2021)
45. Duch Christensen, M., Oestergaard, D., Dieckmann, P., Watterson, L.: Learners’ perceptions during simulation-based training. Simul. Healthc.: J. Soc. Simul. Healthc. 13(5), 306–315 (2018). https://doi.org/10.1097/SIH.0000000000000300
Dr. Sayaka Oikawa is a Project Professor at the Department of Innovative and Digitalized Medical Education, Akita University Graduate School of Medicine, Japan, and an Adjunct Assistant Professor at the SimTiki Medical Education Simulation Center, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, Hawaii, USA. She is a certified Emergency Medicine physician and Internal Medicine physician and holds a Master of Science degree in Health Professions Education from Maastricht University. She has received research awards from the Pan Asia Simulation Society in Healthcare (PASSH) and the Japan Society for Medical Education (JSME). As a member of the JSME Internationalization Committee and a research secretary of PASSH, she has contributed to the promotion of simulation-based education and related medical education research and has engaged in international collaboration and development in the field of medical education. Her current research interests are faculty development, cultural awareness, and simulation-based education.