English Pages 586 [571] Year 2023
Sudeep Pasricha Muhammad Shafique Editors
Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing Use Cases and Emerging Challenges
Editors Sudeep Pasricha Colorado State University Fort Collins, CO, USA
Muhammad Shafique New York University Abu Dhabi Abu Dhabi, Abu Dhabi United Arab Emirates
ISBN 978-3-031-40676-8 ISBN 978-3-031-40677-5 (eBook) https://doi.org/10.1007/978-3-031-40677-5
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.
Preface
Machine Learning (ML) has emerged as a prominent approach for achieving state-of-the-art accuracy for many data analytic applications, ranging from computer vision (e.g., classification, segmentation, and object detection in images and video), speech recognition, language translation, healthcare diagnostics, robotics, and autonomous vehicles to business and financial analysis. The driving force behind this success is the advent of Neural Network (NN) algorithms, such as Deep Neural Networks (DNNs)/Deep Learning (DL) and Spiking Neural Networks (SNNs), with support from today’s evolving computing landscape to better exploit data and thread-level parallelism with ML accelerators. Current trends show an immense interest in attaining the powerful abilities of NN algorithms for solving ML tasks using embedded systems with limited compute and memory resources, i.e., so-called Embedded ML. One of the main reasons is that embedded ML systems may enable a wide range of applications, especially the ones with tight memory and power/energy constraints, such as mobile systems, Internet of Things (IoT), edge computing, and cyber-physical applications. Furthermore, embedded ML systems can also improve the quality of service (e.g., personalized systems) and privacy as compared to centralized ML systems (e.g., based on cloud computing). However, state-of-the-art NN-based ML algorithms are costly in terms of memory sizes and power/energy consumption, thereby making it difficult to enable embedded ML systems. This book consists of three volumes and explores and identifies the most challenging issues that hinder the implementation of embedded ML systems.
These issues arise from the fact that, to achieve better accuracy, the development of NN algorithms has led to state-of-the-art models with higher complexity with respect to model sizes and operations, the implications of which are discussed below:
• Massive Model Sizes: Larger NN models usually obtain higher accuracy than smaller ones because they have a larger number of NN parameters that can better learn the features from the training dataset. However, a huge number of parameters may not be fully stored on-chip, hence requiring large off-chip memory to store them and intensive off-chip memory accesses at run time. Furthermore, these intensive off-chip accesses are significantly more expensive in terms of latency and energy than on-chip operations, hence increasing the overall system energy consumption.
• Complex and Intensive Operations: The complexity of operations in NN algorithms depends on the computational model and the network architecture. For instance, DNNs and SNNs have different complexity of operations since DNNs typically employ Multiply-and-Accumulate (MAC) while SNNs employ more bio-plausible operations like Leaky-Integrate-and-Fire (LIF). Besides, more complex neural architectures (e.g., residual networks) may require additional operations to accommodate the architectural variations. These complex architectures with a huge number of parameters also lead to intensive neural operations (e.g., a large number of MAC operations in DNNs), thereby requiring high computing power/energy during model execution.
In summary, achieving acceptable accuracy for the given ML applications while meeting the latency, memory, and power/energy constraints of the embedded ML systems is not a trivial task. To address these challenges, this book discusses potential solutions from multiple design aspects, presents multiple applications that can benefit from embedded ML systems, and discusses the security, privacy, and robustness aspects of embedded ML systems. To provide comprehensive coverage of all these different topics, which are crucial for designing and deploying embedded ML for real-world applications, this book is partitioned into three volumes. The first volume covers the Hardware Architectures, the second volume covers Software Optimizations and Hardware/Software Codesign, and the third volume presents different Use Cases and Emerging Challenges. The brief outline of the third volume of this Embedded ML book targeting Use Cases and Emerging Challenges along with the section structure is as follows.
Part I – Mobile, IoT, and Edge Applications: The first part of Volume 3 of this book elucidates various applications that benefit from embedded ML systems, including applications for mobile, IoT, and edge computing.
• Chapter 1 explains how to employ CNNs for efficient indoor navigation on battery-powered smartphones while leveraging Wi-Fi signatures.
• Chapter 2 discusses a framework for performing end-to-end neural architecture search (NAS) and model compression for healthcare applications on embedded systems.
• Chapter 3 describes the challenges and opportunities of robust ML for power-constrained wearable device applications, such as health monitoring, rehabilitation, and fitness.
• Chapter 4 highlights techniques to design the vision of Unmanned Aerial Vehicles (UAVs) for aerial visual understanding, including data selection, NN design, and model optimization.
• Chapter 5 presents optimization techniques for multi-modal ML-based healthcare applications, including an exploration of accuracy-performance trade-offs.
• Chapter 6 provides a comprehensive survey of embedded ML systems for enabling smart and sustainable healthcare applications.
• Chapter 7 proposes a middleware framework that uses reinforcement learning to decide whether processing should be performed locally or offloaded.
Part II – Cyber-Physical Applications: The second part of Volume 3 of this book presents examples of cyber-physical applications that benefit from embedded ML systems.
• Chapter 8 discusses an adaptive context-aware anomaly detection method for fog computing by employing Long Short-Term Memory (LSTM)-based NNs and a Gaussian estimator.
• Chapter 9 explores different ML algorithms to perform various tasks in Autonomous Cyber-Physical Systems, such as robotic vision and robotic planning.
• Chapter 10 presents a framework for efficient ML-based perception with Advanced Driver Assistance Systems (ADAS) for automotive cyber-physical systems.
• Chapter 11 describes a DL-based anomaly detection framework in automotive cyber-physical systems by utilizing a Gated Recurrent Unit (GRU)-based recurrent autoencoder network.
• Chapter 12 discusses an embedded system architecture for infrastructure inspection using UAVs based on ML algorithms.
Part III – Security, Privacy, and Robustness of Embedded ML: Embedded ML systems should be trustworthy to produce correct outputs without any privacy leaks. Otherwise, their processing may lead to wrong outputs, undesired behavior, and data leakage. To address this, the third part of Volume 3 of this book presents techniques for mitigating security and privacy threats and improving the robustness of embedded ML systems.
• Chapter 13 discusses the vulnerability of deep reinforcement learning against backdoor attacks in autonomous vehicles.
• Chapter 14 analyzes the vulnerability of CNN-based indoor localization on embedded devices against access point attacks, then proposes a methodology for mitigating the attacks.
• Chapter 15 studies the impact of noise in the data input on the DNN accuracy, then provides a suitable framework for analyzing the impact of noise on DNN properties.
• Chapter 16 proposes techniques for mitigating backdoor attacks on DNNs by employing two off-line novelty detection models to collect samples that are potentially poisoned.
• Chapter 17 highlights the robustness of DNN acceleration on analog crossbar-based in-memory computing (IMC) against adversarial attacks and discusses energy-efficient attack mitigation techniques.
• Chapter 18 provides an overview of adversarial attacks and security threats on ML algorithms for edge computing, including DNNs, CapsNets, and SNNs.
• Chapter 19 investigates different challenges for achieving trustworthy embedded ML systems, including robustness to errors, security against attacks, and privacy protection.
• Chapter 20 presents a systematic evaluation of backdoor attacks on DL-based systems in various scenarios, such as image, sound, text, and graph analytics domains.
• Chapter 21 discusses different error-resilience characteristics of DNN models and leverages these intrinsic characteristics for mitigating reliability threats in DL-based systems.
We hope this book provides a comprehensive review and useful information on the recent advances in embedded machine learning for cyber-physical, IoT, and edge computing applications.
Sudeep Pasricha, Fort Collins, CO, USA
Muhammad Shafique, Abu Dhabi, UAE
September 1, 2023
Acknowledgments
This book would not be possible without the contributions of many researchers and experts in the fields of embedded systems, machine learning, IoT, edge platforms, and cyber-physical systems. We would like to gratefully acknowledge the contributions of Rachmad Putra (Technische Universität Wien), Muhammad Abdullah Hanif (New York University, Abu Dhabi), Febin Sunny (Colorado State University), Asif Mirza (Colorado State University), Mahdi Nikdast (Colorado State University), Ishan Thakkar (University of Kentucky), Maarten Molendijk (Eindhoven University of Technology), Floran de Putter (Eindhoven University of Technology), Henk Corporaal (Eindhoven University of Technology), Salim Ullah (Technische Universität Dresden), Siva Satyendra Sahoo (Technische Universität Dresden), Akash Kumar (Technische Universität Dresden), Arnab Raha (Intel), Raymond Sung (Intel), Soumendu Ghosh (Purdue University), Praveen Kumar Gupta (Intel), Deepak Mathaikutty (Intel), Umer I. Cheema (Intel), Kevin Hyland (Intel), Cormac Brick (Intel), Vijay Raghunathan (Purdue University), Gokul Krishnan (Arizona State University), Sumit K. Mandal (Arizona State University), Chaitali Chakrabarti (Arizona State University), Jae-sun Seo (Arizona State University), Yu Cao (Arizona State University), Umit Y. Ogras (University of Wisconsin, Madison), Ahmet Inci (University of Texas, Austin), Mehmet Meric Isgenc (University of Texas, Austin), Diana Marculescu (University of Texas, Austin), Rehan Ahmed (National University of Sciences and Technology, Islamabad), Muhammad Zuhaib Akbar (National University of Sciences and Technology, Islamabad), Lois Orosa (ETH Zürich), Skanda Koppula (ETH Zürich), Konstantinos Kanellopoulos (ETH Zürich), A.
Giray Yağlıkçı (ETH Zürich), Onur Mutlu (ETH Zürich), Saideep Tiku (Colorado State University), Liping Wang (Colorado State University), Xiaofan Zhang (University of Illinois Urbana-Champaign), Yao Chen (University of Illinois Urbana-Champaign), Cong Hao (University of Illinois Urbana-Champaign), Sitao Huang (University of Illinois Urbana-Champaign), Yuhong Li (University of Illinois Urbana-Champaign), Deming Chen (University of Illinois Urbana-Champaign), Alexander Wendt (Technische Universität Wien), Horst Possegger (Technische Universität Graz), Matthias Bittner (Technische Universität Wien), Daniel Schnoell (Technische Universität Wien), Matthias Wess (Technische Universität Wien),
Dušan Malić (Technische Universität Graz), Horst Bischof (Technische Universität Graz), Axel Jantsch (Technische Universität Wien), Floran de Putter (Eindhoven University of Technology), Alberto Marchisio (Technische Universität Wien), Fan Chen (Indiana University Bloomington), Lakshmi Varshika Mirtinti (Drexel University), Anup Das (Drexel University), Supreeth Mysore Shivanandamurthy (University of Kentucky), Sayed Ahmad Salehi (University of Kentucky), Biresh Kumar Joardar (University of Houston), Janardhan Rao Doppa (Washington State University), Partha Pratim Pande (Washington State University), Georgios Zervakis (Karlsruhe Institute of Technology), Mehdi B. Tahoori (Karlsruhe Institute of Technology), Jörg Henkel (Karlsruhe Institute of Technology), Zheyu Yan (University of Notre Dame), Qing Lu (University of Notre Dame), Weiwen Jiang (George Mason University), Lei Yang (University of New Mexico), X. Sharon Hu (University of Notre Dame), Jingtong Hu (University of Pittsburgh), Yiyu Shi (University of Notre Dame), Beatrice Bussolino (Politecnico di Torino), Alessio Colucci (Technische Universität Wien), Vojtech Mrazek (Brno University of Technology), Maurizio Martina (Politecnico di Torino), Guido Masera (Politecnico di Torino), Ji Lin (Massachusetts Institute of Technology), Wei-Ming Chen (Massachusetts Institute of Technology), Song Han (Massachusetts Institute of Technology), Yawen Wu (University of Pittsburgh), Yue Tang (University of Pittsburgh), Dewen Zeng (University of Notre Dame), Xinyi Zhang (University of Pittsburgh), Peipei Zhou (University of Pittsburgh), Ehsan Aghapour (University of Amsterdam), Yujie Zhang (National University of Singapore), Anuj Pathania (University of Amsterdam), Tulika Mitra (National University of Singapore), Hiroki Matsutani (Keio University), Keisuke Sugiura (Keio University), Soonhoi Ha (Seoul National University), Donghyun Kang (Seoul National University), Ayush Mittal (Colorado State University), Bharath Srinivas
Prabakaran (Technische Universität Wien), Ganapati Bhat (Washington State University), Dina Hussein (Washington State University), Nuzhat Yamin (Washington State University), Rafael Makrigiorgis (University of Cyprus), Shahid Siddiqui (University of Cyprus), Christos Kyrkou (University of Cyprus), Panayiotis Kolios (University of Cyprus), Theocharis Theocharides (University of Cyprus), Anil Kanduri (University of Turku), Sina Shahhosseini (University of California, Irvine), Emad Kasaeyan Naeini (University of California, Irvine), Hamidreza Alikhani (University of California, Irvine), Pasi Liljeberg (University of Turku), Nikil Dutt (University of California, Irvine), Amir M. Rahmani (University of California, Irvine), Sizhe An (University of Wisconsin-Madison), Yigit Tuncel (University of Wisconsin-Madison), Toygun Basaklar (University of Wisconsin-Madison), Aditya Khune (Colorado State University), Rozhin Yasaei (University of California, Irvine), Mohammad Abdullah Al Faruque (University of California, Irvine), Kruttidipta Samal (University of Nebraska, Lincoln), Marilyn Wolf (University of Nebraska, Lincoln), Joydeep Dey (Colorado State University), Vipin Kumar Kukkala (Colorado State University), Sooryaa Vignesh Thiruloga (Colorado State University), Marios Pafitis (University of Cyprus), Antonis Savva (University of Cyprus), Yue Wang (New York University), Esha Sarkar (New York University), Saif Eddin Jabari (New York University Abu Dhabi), Michail Maniatakos (New York University Abu Dhabi), Mahum Naseer (Technische Universität Wien), Iram
Tariq Bhatti (National University of Sciences and Technology, Islamabad), Osman Hasan (National University of Sciences and Technology, Islamabad), Hao Fu (New York University), Alireza Sarmadi (New York University), Prashanth Krishnamurthy (New York University), Siddharth Garg (New York University), Farshad Khorrami (New York University), Priyadarshini Panda (Yale University), Abhiroop Bhattacharjee (Yale University), Abhishek Moitra (Yale University), Ihsen Alouani (Queen’s University Belfast), Stefanos Koffas (Delft University of Technology), Behrad Tajalli (Radboud University), Jing Xu (Delft University of Technology), Mauro Conti (University of Padua), and Stjepan Picek (Radboud University). This work was partially supported by the National Science Foundation (NSF) grants CCF-1302693, CCF-1813370, and CNS-2132385; by the NYUAD Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001, Center for Cyber Security (CCS), funded by Tamkeen under the NYUAD Research Institute Award G1104, and Center for Artificial Intelligence and Robotics (CAIR), funded by Tamkeen under the NYUAD Research Institute Award CG010; and by the project “eDLAuto: An Automated Framework for Energy-Efficient Embedded Deep Learning in Autonomous Systems,” funded by the NYUAD Research Enhancement Fund (REF). The opinions, findings, conclusions, or recommendations presented in this book are those of the authors and do not necessarily reflect the views of the National Science Foundation and other funding agencies.
Contents
Part I Mobile, IoT, and Edge Application Use-Cases for Embedded Machine Learning
Convolutional Neural Networks for Efficient Indoor Navigation with Smartphones
  Saideep Tiku, Ayush Mittal, and Sudeep Pasricha
An End-to-End Embedded Neural Architecture Search and Model Compression Framework for Healthcare Applications and Use-Cases
  Bharath Srinivas Prabakaran and Muhammad Shafique
Robust Machine Learning for Low-Power Wearable Devices: Challenges and Opportunities
  Ganapati Bhat, Dina Hussein, and Nuzhat Yamin
Efficient Deep Vision for Aerial Visual Understanding
  Rafael Makrigiorgis, Shahid Siddiqui, Christos Kyrkou, Panayiotis Kolios, and Theocharis Theocharides
Edge-Centric Optimization of Multi-modal ML-Driven eHealth Applications
  Anil Kanduri, Sina Shahhosseini, Emad Kasaeyan Naeini, Hamidreza Alikhani, Pasi Liljeberg, Nikil Dutt, and Amir M. Rahmani
A Survey of Embedded Machine Learning for Smart and Sustainable Healthcare Applications
  Sizhe An, Yigit Tuncel, Toygun Basaklar, and Umit Y. Ogras
Reinforcement Learning for Energy-Efficient Cloud Offloading of Mobile Embedded Applications
  Aditya Khune and Sudeep Pasricha
Part II Cyber-Physical Application Use-Cases for Embedded Machine Learning
Context-Aware Adaptive Anomaly Detection in IoT Systems
  Rozhin Yasaei and Mohammad Abdullah Al Faruque
Machine Learning Components for Autonomous Navigation Systems
  Kruttidipta Samal and Marilyn Wolf
Machine Learning for Efficient Perception in Automotive Cyber-Physical Systems
  Joydeep Dey and Sudeep Pasricha
Machine Learning for Anomaly Detection in Automotive Cyber-Physical Systems
  Vipin Kumar Kukkala, Sooryaa Vignesh Thiruloga, and Sudeep Pasricha
MELETI: A Machine-Learning-Based Embedded System Architecture for Infrastructure Inspection with UAVs
  Marios Pafitis, Antonis Savva, Christos Kyrkou, Panayiotis Kolios, and Theocharis Theocharides
Part III Security, Privacy and Robustness for Embedded Machine Learning
On the Vulnerability of Deep Reinforcement Learning to Backdoor Attacks in Autonomous Vehicles
  Yue Wang, Esha Sarkar, Saif Eddin Jabari, and Michail Maniatakos
Secure Indoor Localization on Embedded Devices with Machine Learning
  Saideep Tiku and Sudeep Pasricha
Considering the Impact of Noise on Machine Learning Accuracy
  Mahum Naseer, Iram Tariq Bhatti, Osman Hasan, and Muhammad Shafique
Mitigating Backdoor Attacks on Deep Neural Networks
  Hao Fu, Alireza Sarmadi, Prashanth Krishnamurthy, Siddharth Garg, and Farshad Khorrami
Robustness for Embedded Machine Learning Using In-Memory Computing
  Priyadarshini Panda, Abhiroop Bhattacharjee, and Abhishek Moitra
Adversarial ML for DNNs, CapsNets, and SNNs at the Edge
  Alberto Marchisio, Muhammad Abdullah Hanif, and Muhammad Shafique
On the Challenge of Hardware Errors, Adversarial Attacks and Privacy Leakage for Embedded Machine Learning
  Ihsen Alouani
A Systematic Evaluation of Backdoor Attacks in Various Domains
  Stefanos Koffas, Behrad Tajalli, Jing Xu, Mauro Conti, and Stjepan Picek
Deep Learning Reliability: Towards Mitigating Reliability Threats in Deep Learning Systems by Exploiting Intrinsic Characteristics of DNNs
  Muhammad Abdullah Hanif and Muhammad Shafique
Index
Part I
Mobile, IoT, and Edge Application Use-Cases for Embedded Machine Learning
Convolutional Neural Networks for Efficient Indoor Navigation with Smartphones
Saideep Tiku, Ayush Mittal, and Sudeep Pasricha
1 Introduction
Contemporary outdoor location-based services have transformed how people navigate, travel, and interact with their surroundings. Emerging indoor localization techniques have the potential to extend this outdoor experience across indoor locales. Beyond academia, many privately funded providers in the industry are focusing on indoor location-based services to improve customer experience. For instance, Google can suggest products to its users through targeted indoor location-based advertisements [1]. Stores such as Target in the United States are beginning to provide indoor localization solutions to help customers locate products in a store and find their way to these products [2]. Services provided by these companies combine GPS, cell towers, and Wi-Fi data to estimate the user’s location. Unfortunately, in indoor environments, where GPS signals cannot penetrate building walls, the accuracy of these geo-location services can be in the range of tens of meters, which is insufficient in many cases [3].
Radio signals such as Bluetooth, ultra-wideband (UWB) [4], and radio frequency identification (RFID) [5, 6] are commonly employed for the purpose of indoor localization. The key idea is to use qualitative characteristics of radio signals (e.g., signal strength or triangulation) to estimate user location relative to a radio beacon (wireless access point). These approaches suffer from multipath effects, signal attenuation, and noise-induced interference [8]. Also, as these techniques require specialized wireless radio beacons to be installed in indoor locales, they are costly and thus lack scalability for wide-scale deployment.
S. Tiku · A. Mittal · S. Pasricha, Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. S. Pasricha, M. Shafique (eds.), Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, https://doi.org/10.1007/978-3-031-40677-5_1
Wi-Fi-based fingerprinting is perhaps the most popular radio-signal-based indoor localization technique being explored today. Wi-Fi is an ideal radio signal source for indoor localization as most public or private buildings are pre-equipped with Wi-Fi access points (APs). Lightweight middleware-based fingerprinting frameworks have been shown to run in the background to deliver location-based updates on smartphones [29, 37]. Fingerprinting with Wi-Fi works by first recording the strength of Wi-Fi radio signals in an indoor environment at different locations. Then, a user with a smartphone can capture Wi-Fi received signal strength indication (RSSI) data in real time and compare it to previously recorded (stored) values to estimate their location in that environment. Fingerprinting techniques can deliver an accuracy of 6–8 m [28], with accuracy improving as the density of APs increases. However, in many indoor environments, noise and interference in the wireless spectrum (e.g., due to other electronic equipment, movement of people, and operating machinery) can reduce this accuracy. Combining fingerprinting-based frameworks with dead reckoning can improve this accuracy somewhat [8]. Dead reckoning refers to a class of techniques where inertial sensor data (e.g., from accelerometer and gyroscope) is used along with the previously known position data to determine the current location. Unfortunately, dead reckoning is notorious for accumulating error (in the inertial sensors) over time. Also, these techniques are not effective for people using wheelchairs or moving walkways.
The intelligent application of machine learning (ML) techniques can help to overcome noise and uncertainty during fingerprinting-based localization [8]. While traditional ML techniques work well at approximating simpler input–output functions, computationally intensive deep learning models are capable of dealing with more complex input–output mappings and can deliver better accuracy.
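The two fingerprinting phases described above (record RSSI values at known locations, then compare a live scan against the stored values) can be illustrated with a minimal sketch. The database contents, the three-AP layout, and the helper names `rssi_distance` and `estimate_location` are hypothetical, and the k-nearest-neighbor averaging is only one simple matching rule, not the method developed in this chapter.

```python
import math

# Hypothetical offline database: (x, y) position in meters -> RSSI in dBm,
# one entry per access point. The values are illustrative, not measured data.
FINGERPRINT_DB = {
    (0.0, 0.0): [-40, -70, -85],
    (5.0, 0.0): [-55, -60, -80],
    (10.0, 0.0): [-70, -50, -65],
    (15.0, 0.0): [-85, -45, -50],
}

def rssi_distance(a, b):
    """Euclidean distance between two RSSI vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def estimate_location(observed, db=FINGERPRINT_DB, k=2):
    """Average the coordinates of the k stored fingerprints closest to `observed`."""
    ranked = sorted(db.items(), key=lambda item: rssi_distance(observed, item[1]))
    nearest = [location for location, _ in ranked[:k]]
    x = sum(p[0] for p in nearest) / len(nearest)
    y = sum(p[1] for p in nearest) / len(nearest)
    return (x, y)

# Online phase: a live scan is matched against the stored fingerprints.
print(estimate_location([-60, -58, -78]))
```

Because the estimate depends directly on how close a noisy scan lands to the stored vectors, any interference that shifts RSSI values also shifts the predicted position, which is the weakness the text attributes to plain fingerprinting.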
Middleware-based offloading [30] and energy enhancement frameworks [31, 32, 37] may be a route to explore for computation- and energy-intensive indoor localization services on smartphones. Furthermore, with the increase in the available computational power on mobile devices, it is now possible to deploy deep learning techniques such as convolutional neural networks (CNNs) on smartphones. CNNs are a special form of deep neural network (DNN) purposely designed for image-based input data. CNNs are well known to automatically identify high-level features in the input images that have the heaviest impact on the final output. This process is known as feature learning. Prior to deep learning, feature learning was an expensive and time-intensive process that had to be conducted manually. CNNs have been extremely successful in complex image classification problems and are finding applications in many emerging domains, e.g., self-driving cars [27].
In this chapter, we discuss an efficient framework that uses CNN-based Wi-Fi fingerprinting to deliver a superior level of indoor localization accuracy to a user with a smartphone. Our approach utilizes widely available Wi-Fi APs without requiring any customized/expensive infrastructure deployments. The framework works on a user’s smartphone, within the computational capabilities of the device, and utilizes the radio interfaces for efficient fingerprinting-based localization. The main novel contributions of this chapter can be summarized as follows:
• We discuss a newly developed technique to extract images out of location fingerprints, which are then used to train a CNN that is designed to improve indoor localization robustness and accuracy.
• We implemented a hierarchical architecture to scale the CNN, so that our framework can be used in the real world where buildings can have large numbers of floors and corridors.
• We performed extensive testing of our algorithms with the state of the art across different buildings and indoor paths, to demonstrate the effectiveness of our proposed framework.
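To make the first contribution more concrete, the sketch below shows one simple way an RSSI fingerprint could be laid out as a small grayscale image suitable for a CNN input. This is an illustrative assumption only: the min-max scaling, the -100 dBm to 0 dBm range, the zero padding, and the function name `fingerprint_to_image` are hypothetical and are not taken from the technique this chapter develops.

```python
import math

def fingerprint_to_image(rssi, floor_dbm=-100, ceil_dbm=0):
    """Map an RSSI vector to a square grayscale image (list of rows, values 0-255).

    Assumption: RSSI values are clipped to [floor_dbm, ceil_dbm] and scaled
    linearly; unused pixels are zero-padded to complete the square.
    """
    side = math.ceil(math.sqrt(len(rssi)))
    pixels = []
    for value in rssi:
        clipped = min(max(value, floor_dbm), ceil_dbm)
        pixels.append(round(255 * (clipped - floor_dbm) / (ceil_dbm - floor_dbm)))
    pixels += [0] * (side * side - len(pixels))  # pad to a full square
    return [pixels[row * side:(row + 1) * side] for row in range(side)]

# Three visible APs become a 2x2 image with one padded pixel.
print(fingerprint_to_image([-100, 0, -60]))
```

A fixed pixel order per AP matters in any such layout, since a CNN learns spatial patterns and would see a different image if the AP ordering changed between training and deployment.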
2 Related Works
Various efforts have been made to overcome the limitations associated with indoor localization. In this section, we summarize a few crucial efforts in this direction.
Numerous RFID-based [5, 6] indoor localization solutions that use proximity-based estimation techniques have been proposed. But the hardware expenses of these efforts increase dramatically with increasing accuracy requirements. Also, these approaches cannot be used with smartphones and require the use of specialized hardware. Indoor localization systems that use UWB [4] and ultrasound [10] have similar requirements for additional (costly) infrastructure and a lack of compatibility for use with commodity smartphones. Triangulation-based methods, such as [11], use multiple antennas to locate a person or object. But these techniques require several antennas and regular upkeep of the associated hardware. Most techniques therefore favor using the more lightweight fingerprinting approach, often with Wi-Fi signals. UJIIndoorLoc [7] describes a technique to create a Wi-Fi fingerprint database and employs a k-nearest neighbor (KNN)-based model to predict location. Their average accuracy using KNN is 7.9 m.
Given the current position (using fingerprinting) of a user walking in the indoor environment, pedestrian dead reckoning can be used to track the user’s movement using a combination of microelectromechanical systems (MEMS)-based motion sensors ubiquitously found within contemporary smartphones and other wearable electronics. Dead reckoning techniques use the accelerometer to estimate the number of steps, a gyroscope for orientation, and a magnetometer to determine the heading direction. Such techniques have been employed in [12, 26] but have been shown to deliver poor localization accuracy when used alone.
Radar [12] and IndoorAtlas [26] proposed using hybrid indoor localization techniques.
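The pedestrian dead reckoning update described above can be sketched as follows. This is a minimal illustration under stated assumptions: each detected step has a fixed length (0.7 m by default, a hypothetical value), headings are given in degrees clockwise from north, and the function names `dead_reckoning_step` and `track` are made up for this example.

```python
import math

def dead_reckoning_step(position, heading_deg, step_length_m=0.7):
    """Advance an (x, y) position by one detected step along the given heading.

    Assumption: x points east, y points north, heading is clockwise from north.
    """
    heading = math.radians(heading_deg)
    x, y = position
    return (x + step_length_m * math.sin(heading),
            y + step_length_m * math.cos(heading))

def track(start, headings_deg, step_length_m=0.7):
    """Integrate one heading reading per detected step, starting from a known fix."""
    position = start
    path = [position]
    for heading in headings_deg:
        position = dead_reckoning_step(position, heading, step_length_m)
        path.append(position)
    return path

# Two steps north, then one step east, starting from a fingerprinting fix at (0, 0).
print(track((0.0, 0.0), [0, 0, 90], step_length_m=1.0)[-1])
```

Since every new position is computed from the previous one, a small bias in heading or step length is carried into all later positions, which is why such tracking drifts when it is used without periodic corrections.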
Radar [12] combines inertial sensors (dead reckoning) with Wi-Fi signal propagation models, whereas Indoor Atlas [26] combines information from several sensors such as magnetic, inertial, and camera sensors, for localization. LearnLoc [8] shallow feed-forward neural network models, dead reckoning techniques, and Wi-Fi fingerprinting to trade-off indoor localization accuracy and energy efficiency during localization on smartphones. Similar to LearnLoc, more recent works focus on optimizing and adapting light-weight machine learning techniques for the
purpose of fingerprinting-based indoor localization [28, 34–36]. However, all such techniques are limited by their ability to identify and match complex patterns within RSSI fingerprints. Additionally, a considerable amount of effort needs to be put into the preprocessing, feature selection, and tuning of the underlying model. Given these challenges, there is a need for robust methodologies and algorithms for fingerprinting-based indoor localization. A few efforts have started to consider deep learning to assist with indoor localization. The work in [13] presents an approach that uses DNNs with Wi-Fi fingerprinting. The accuracy of the DNN is improved by using a hidden Markov model (HMM). The HMM takes temporal coherence into account and maintains a smooth transition between adjacent locations. But our analysis shows that the fine location prediction with the HMM fails in cases such as when moving back on the same path or taking a sharp turn. HMM predictions are also based on the previous position acquired through the DNN and, hence, can be prone to error accumulation. DeepFi [14] and ConFi [15] propose approaches that use the channel state information (CSI) of Wi-Fi signals to create fingerprints. But the CSI in these approaches was obtained through the use of specialized hardware attached to a laptop. None of the mobile devices available today have the ability to capture CSI data. Due to this limitation, it is not feasible to implement these techniques on smartphones. Deep belief networks (DBNs) [16] have also been used for indoor localization, but the technology is based on custom UWB beacons that lead to very high implementation cost. In summary, most works discussed so far either have specialized hardware requirements or are not designed to work on smartphones.
Also, our real-world implementation and analysis concluded that the abovementioned frameworks slow down as they become resource intensive when being scaled to cover large buildings with multiple floors and corridors. The framework discussed in this chapter, CNNLOC, overcomes the shortcomings of these state-of-the-art indoor localization approaches and was first presented in [33]. CNNLOC creates input images by using RSSI of Wi-Fi signals that are then used to train a CNN model, without requiring any specialized hardware/infrastructure. CNNLOC is easily deployable on current smartphones. The proposed framework also integrates a hierarchical scheme to enable scalability for large buildings with multiple floors and corridors/aisles.
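The pedestrian dead reckoning idea summarized earlier in this section can be illustrated with a short sketch. It is not the method of [12] or [26]: the function name `dead_reckon`, the fixed step length of 0.7 m, and the convention that headings are measured clockwise from north are all assumptions made for this example.

```python
import math

def dead_reckon(start_xy, headings_deg, step_length_m=0.7):
    """Advance a position by one fixed-length step per detected step.

    headings_deg holds one heading per step, in degrees clockwise from
    north (the +y axis). The 0.7 m step length is an assumed constant.
    """
    x, y = start_xy
    track = [(x, y)]
    for heading in headings_deg:
        theta = math.radians(heading)
        x += step_length_m * math.sin(theta)  # east component
        y += step_length_m * math.cos(theta)  # north component
        track.append((x, y))
    return track

# Four steps north, then four steps east, starting from a fingerprinted fix.
path = dead_reckon((0.0, 0.0), [0, 0, 0, 0, 90, 90, 90, 90])
print(path[-1])
```

Because every step adds its own heading and step-length error, the estimate drifts as the walk gets longer, which is consistent with the poor stand-alone accuracy noted above and motivates pairing such tracking with fingerprint-based fixes.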
3 Convolutional Neural Networks

Convolutional neural networks (CNNs) are a specialized form of neural networks (NNs) that are designed for the explicit purpose of image classification [9]. They are highly resilient to noise in the input data and have been shown to deliver excellent results for complex image classification tasks. The smallest unit of any neural network is
a perceptron and is inspired by the biological neuron present in the human brain. A perceptron is defined by the following equation:

$$ y = \sum_{i=1}^{n} w_i x_i + w_0 \tag{1} $$
Here, $y$ is the output, which is a weighted sum of the inputs $x_i$ plus a bias ($w_0$). NNs have interconnected layers, and in each layer, there are several perceptrons, each with its own tunable weights and biases. Each layer receives some input, executes a dot product, and passes the result to the output layer or the hidden layer in front of it [17]. An activation function is applied to the output $y$, which limits the range of values that it can take and establishes an input–output mapping defined by logistic regression. The most common activation functions used are the sigmoid and tanh functions. The goal of an NN is to approximate a functional relationship between a set of inputs and outputs (training phase). The resulting NN then represents the approximated function that is used to make predictions for any given input (testing phase). While an NN often contains a small number of hidden layers sandwiched between the input and output layers, a deep neural network (DNN) has a very large number of hidden layers. DNNs have a much higher computational complexity but in turn are also able to deliver very high accuracy. CNNs are a type of DNN that includes several specialized NN layers, where each layer may serve a unique function. CNN classifiers are used to map input data to a finite set of output classes. For instance, given different animal pictures, a CNN model can be trained to categorize them into different classes such as cats and dogs. CNNs also make use of rectified linear units (ReLUs) as their activation function, which allows them to handle nonlinearity in the data. In the training phase, our CNN model uses a stochastic gradient descent (SGD) algorithm. Adam [18] is an optimized variant of SGD and is used to optimize the learning process. The algorithm is designed to take advantage of two well-known techniques: RMSprop [19] and AdaGrad [20]. SGD maintains a constant learning rate for every weight update in the network.
In contrast, Adam employs an adaptive learning rate for each network weight, with the learning rate being adapted as the training progresses. RMSprop uses a moving average of past squared gradients (the uncentered second-order moment) and adjusts the weights based on how fast the gradient changes. Adam additionally uses a moving average of the past gradients themselves (the first-order moment, or mean) and adjusts the weights using both estimates. The structure of the CNN in CNNLOC is inspired by the well-known CNN architectures LeNet [21] and AlexNet [22]. Our CNN architecture is shown in Fig. 1. For the initial set of layers, our model has 2-D convolutional layers, followed by dense layers, and culminates in an output layer. A 2-D convolutional layer works by convolving a specific region of the input image at a time. The window of weights applied to this region is known as a filter. The filter is shown by a rectangle (red-dotted lines). Each layer performs a convolution of a small region of the input image with the filter and feeds the result to
Fig. 1 CNN architecture
the ReLU activation function. Therefore, we refer to each layer as [Conv2D-ReLU]. To capture more details from the input image, we can use a larger number of filters. For each filter, we get a feature map. For the first layer of [Conv2D-ReLU], we used 32 filters to create a set of 32 feature maps. We used five hidden layers of [Conv2D-ReLU], but only two are shown for brevity. The number of filters and layers is derived through empirical analysis as discussed in Sect. 4.4. A “stride” parameter determines the number of pixels that a filter will shift to arrive at a new region of the input image to process. The stride and other “hyperparameters” of our CNN are further discussed in Sect. 4.4. In the end, a fully connected layer helps in identifying the individual class scores (in our case, each class is a unique location). The class with the highest score is selected as the output. In this layer, all the neurons are connected to the neurons in the previous layer (green-dotted lines). In a conventional CNN, a pooling layer is used to down-sample the image when the size of the input image is too big. In our case, the input image is small, and therefore, we do not need this step. We want our CNN to learn all the features from the entire image.
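To make the [Conv2D-ReLU] building block concrete, the following is a minimal pure-Python sketch of a single filter sliding over a small image. It is an illustration rather than the CNNLOC implementation: the function name `conv2d_relu`, the 3 × 3 input, and the filter weights are made up, the image and filter are assumed to be square, no padding is applied, and the sum is computed in the cross-correlation form commonly used by deep learning libraries.

```python
def conv2d_relu(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and apply ReLU to each sum."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(max(0.0, total))  # ReLU: negative sums become 0
        feature_map.append(row)
    return feature_map

# Made-up 3 x 3 "image" and 2 x 2 filter, stride 1.
image = [
    [0.0, 0.2, 0.4],
    [0.6, 0.8, 1.0],
    [0.1, 0.3, 0.5],
]
kernel = [
    [1.0, 0.0],
    [0.0, -1.0],
]
fmap = conv2d_relu(image, kernel)
print(fmap)  # a 2 x 2 feature map; the top row is clipped to 0 by ReLU
```

Each filter yields one feature map, so a layer with 32 filters would repeat this computation with 32 different sets of weights.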
4 CNNLOC Framework: Overview

4.1 Overview

An overview of our CNNLOC indoor localization framework is shown in Fig. 2. In the framework, we utilize the available Wi-Fi access points (APs) in an indoor environment to create an RSSI fingerprint database. Our framework is divided into two phases. The first phase, or the offline phase, involves RSSI data collection, cleaning, and preprocessing. This preprocessed data is used to create a database of images. Each image represents a Wi-Fi RSSI-based signature that is unique to a location label. Each location label is further associated with an x-y coordinate. This database of images is used to train a CNN model. The trained model is deployed onto a smartphone. In the second phase, or the online phase, real-time AP data is converted
into an image and then fed to the trained CNN model to predict the location of the user. The CNN model predicts the closest block that was sampled as the user’s location. The preprocessing is described in detail in the next section.
4.2 Preprocessing of RSSI Data

The process of image database creation begins with the collection of RSSI fingerprints, as shown in the top half of Fig. 2. The RSSI values for various APs are captured along with the corresponding location labels and x-y coordinates. Each AP is uniquely identified using its media access control (MAC) address. We only maintain information for known Wi-Fi APs and hence clean the captured data. This ensures that our trained model is not polluted by unstable Wi-Fi APs. On the RSSI scale, values typically range between −99 dBm (lowest) and 0 dBm (highest). A value of −100 is used to indicate that a specific AP is unreachable, i.e., no signal is received from it. We normalize the RSSI values to a scale from 0 to 1, where 0 represents no signal and 1 represents the strongest signal. Assume that while fingerprinting an indoor location, a total of $K$ APs are discovered at $N$ unique locations. These combine to form a two-dimensional matrix of size $N \times K$. Then, the normalized RSSI fingerprint at the $N$th location, denoted as $l_N$, is given by a row vector $[r_1, r_2, \ldots, r_K]$, denoted by $R_N$. Therefore, each column vector $[w_1, w_2, \ldots, w_N]$, denoted by $W_K$, would represent the normalized RSSI values of the $K$th AP at all $N$ locations. We calculate the Pearson correlation coefficient (PCC) [23] between each column vector $W_K$ and the location vector $[l_1, l_2, \ldots, l_N]$. The result is a vector of correlation values denoted as $C$. PCC is useful in identifying the most significant APs in the database that impact localization
Fig. 2 An overview of the CNNLOC framework
accuracy. The coefficient values range from −1 to +1. A value of −1 represents a strong negative relationship, +1 represents a strong positive relationship, and 0 implies that the input and output have no relationship. We only consider the magnitude of the correlation, as we are only concerned with the strength of the relationship. APs with very low correlation with the output coordinates are not useful for the purpose of indoor localization. Therefore, we remove APs whose correlation to the output coordinates is below a certain threshold (|PCC| < 0.3). This removes inconsequential APs from the collected Wi-Fi data and helps reduce the computational workload of the framework. The normalized RSSI data from the remaining high-correlation APs is used to create an RSSI image database, as explained in the next section.
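The normalization and AP filtering steps described in this section can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: the linear mapping `(rssi + 100) / 100` is one formula consistent with the description (the exact formula is not stated), the function names and the sample data are invented, and treating a constant column as having zero correlation is an assumption.

```python
import math

def normalize_rssi(rssi_dbm):
    """Map RSSI in [-100, 0] to [0, 1]; -100 (AP unreachable) becomes 0."""
    return (rssi_dbm + 100) / 100

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column carries no information
    return cov / math.sqrt(var_x * var_y)

def select_aps(fingerprints, labels, threshold=0.3):
    """Return (indices of kept AP columns, |PCC| of every column)."""
    num_aps = len(fingerprints[0])
    pcc = [abs(pearson([row[k] for row in fingerprints], labels))
           for k in range(num_aps)]
    kept = [k for k, c in enumerate(pcc) if c >= threshold]
    return kept, pcc

# Made-up data: N = 4 locations (rows), K = 3 APs (columns), values in dBm.
raw = [
    [-40, -70, -100],
    [-50, -72, -100],
    [-60, -69, -100],
    [-70, -71, -100],
]
labels = [1, 2, 3, 4]
normalized = [[normalize_rssi(v) for v in row] for row in raw]
kept, pcc = select_aps(normalized, labels)
print(kept)  # only the first AP varies consistently with location
```

In this toy data, the second AP fluctuates without any trend along the path and the third is never heard, so both fall below the 0.3 threshold and are dropped.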
4.3 RSSI Image Database

In this section, we present our approach to convert RSSI data for a given location into a grayscale image. A collection of these images for all fingerprinted locations forms the RSSI image database. To form grayscale images, a Hadamard product (HP) [24] is calculated for each $R$ and $C$. HP is defined as an element-wise multiplication of two arrays or vectors:

$$ \mathrm{HP}_i = R_i \circ C, \qquad i = 1, \ldots, N \tag{2} $$
The dimension of each HP is $1 \times K$. Then, each HP is reshaped into a $p \times p$ matrix, which represents a 2-D image as shown in Fig. 3. The HP is padded with zeros in the case that $K$ is less than $p^2$. Therefore, we now have a set of $N$ images of size $p \times p$ in our database. These images are used to train the CNNs. Figure 3 shows two images of size 8 × 8 created for two unique fingerprints (signatures) associated with two different locations. Each pixel value lies in the range 0–1. The patterns in each of these images will be unique to a location and change slightly as we move along an indoor path. In Eq. (2), the product of the PCC and the normalized RSSI value for each AP is used to form a matrix. Its purpose is to promote the impact of the APs that are highly correlated to fingerprinted locations. Even though there may be attenuation of Wi-Fi signals due to multipath fading effects, the image may fade but will likely still retain the pattern information. These patterns that are unique to every location can be easily learned by a CNN. The hyperparameters and their use in CNNLOC are discussed next.
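The conversion of one fingerprint into an image can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `rssi_to_image` is hypothetical, the values are made up, $p$ is chosen as the smallest integer with $p^2 \geq K$ (the chapter uses 8 × 8 images but does not state how $p$ is selected), and pixels are filled in row-major order.

```python
import math

def rssi_to_image(fingerprint, pcc):
    """Weight a normalized fingerprint by |PCC| and reshape it to p x p."""
    weighted = [r * c for r, c in zip(fingerprint, pcc)]  # Hadamard product
    k = len(weighted)
    p = math.ceil(math.sqrt(k))              # smallest p with p * p >= K
    padded = weighted + [0.0] * (p * p - k)  # zero padding up to p * p
    return [padded[i * p:(i + 1) * p] for i in range(p)]

# Made-up example with K = 5 APs, which gives a 3 x 3 image.
fingerprint = [0.6, 0.3, 0.0, 0.9, 0.5]   # normalized RSSI values
pcc = [1.0, 0.5, 0.4, 0.8, 0.6]           # |PCC| per AP
image = rssi_to_image(fingerprint, pcc)
for row in image:
    print(row)
```

An AP that is strongly received but weakly correlated with location (the second value here) ends up as a dim pixel, which is the weighting effect described above.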
Fig. 3 Snapshot of CNNLOC’s offline phase application contrasting the images created for two unique locations. The green icons represent locations that are fingerprinted along an indoor path. The two locations shown are 10 m apart
4.4 Hyperparameters

The accuracy of the CNN model depends on the optimization of the hyperparameters that control its architecture, which is the most important factor in the
performance of the CNN. A smaller network may not perform well, and a larger network may be slow and prone to overfitting. There are no defined rules in deep learning that help in estimating the appropriate hyperparameters; they therefore need to be found empirically through an iterative process. The estimated hyperparameters are also highly dependent on the input dataset. For the sake of repeatability, we discuss some of the key hyperparameters of our CNN model below:

• Number of hidden layers: A large number of hidden layers leads to longer execution times and to the challenges associated with vanishing gradients; conversely, fewer hidden layers may produce inaccurate results. We found that five layers of [Conv2D-ReLU] worked best for our purposes.
• Size of filter: This defines the image area that the filter considers at a time, before moving to the next region of the image. A large filter size might aggregate a large chunk of information in one pass. The optimum filter size in our case was found to be 2 × 2.
• Stride size: The number of pixels a filter moves by is dictated by the stride size. We set it to 1 because the size of our image is very small, and we do not wish to lose any information.
• Number of filters: Each filter extracts a distinct set of features from the input to construct different feature maps. Each feature map holds unique information about the input image. The best results were obtained when we started with a lower number of filters and increased them in the successive layers to capture greater uniqueness in the patterns. There were 32 filters in the first layer, and the number was doubled for each subsequent layer up to 256 filters, such that both the fourth and fifth layers had 256 filters.
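The hyperparameters listed above determine how the feature maps shrink from layer to layer. The arithmetic can be sketched as below for an 8 × 8 input image (the size shown in Fig. 3). One assumption is made loudly here: the convolutions are taken to use no padding, which the chapter does not state; with padding, the spatial size would stay at 8 × 8.

```python
def conv_output_size(input_size, filter_size, stride):
    """Spatial size after one convolution without padding (assumed)."""
    return (input_size - filter_size) // stride + 1

filters_per_layer = [32, 64, 128, 256, 256]  # 32, doubled up to 256
size = 8                                     # 8 x 8 RSSI image
shapes = []
for num_filters in filters_per_layer:
    size = conv_output_size(size, filter_size=2, stride=1)
    shapes.append((size, size, num_filters))

for layer, shape in enumerate(shapes, start=1):
    print(f"Conv2D-ReLU {layer}: {shape}")
# Conv2D-ReLU 1: (7, 7, 32)
# Conv2D-ReLU 2: (6, 6, 64)
# Conv2D-ReLU 3: (5, 5, 128)
# Conv2D-ReLU 4: (4, 4, 256)
# Conv2D-ReLU 5: (3, 3, 256)
```

Under this assumption, a 2 × 2 filter with stride 1 removes only one row and one column per layer, which is in line with the stated goal of not losing information from a very small image.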
4.5 Integrating Hierarchy for Scalability

We architect our CNNLOC framework to scale up to larger problem sizes than those handled by most prior efforts. Toward this, we enhanced our framework by integrating a hierarchical classifier. The resulting hierarchical classifier employs a combination of smaller CNN modules, which work together to deliver a location prediction. Figure 4 shows the hierarchical classification structure of the framework. Each CNN model has a label that starts with C. The C1 model classifies the floor number, and then in the next layer, C20 or C21 identifies the corridor on that floor. Once the corridor is located, one of the CNNs from the third layer (C30–C35) predicts the fine-grain location of the user. This hierarchical approach can further be extended across buildings.
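The routing logic of such a hierarchical classifier can be sketched as follows. The trained CNNs are replaced by stand-in functions that return fixed labels, and the names `predict_location`, `corridor_models`, and `location_models` are hypothetical; only the floor, corridor, location ordering and the model counts (two floors, three corridors per floor) come from the text.

```python
def predict_location(image, floor_model, corridor_models, location_models):
    """Route one RSSI image through floor -> corridor -> location models."""
    floor = floor_model(image)                             # C1
    corridor = corridor_models[floor](image)               # C20 or C21
    location = location_models[(floor, corridor)](image)   # one of C30-C35
    return floor, corridor, location

# Stand-in "models": plain functions instead of trained CNNs.
def floor_model(image):
    return 1

corridor_models = {
    0: lambda image: 2,
    1: lambda image: 0,
}
location_models = {
    (f, c): (lambda image, f=f, c=c: 10 * f + c)
    for f in range(2) for c in range(3)
}

result = predict_location([[0.0]], floor_model, corridor_models, location_models)
print(result)  # (1, 0, 10)
```

Only three small models are evaluated per prediction, one per level, even though nine models exist in total (1 + 2 + 6).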
Fig. 4 A general architecture for the hierarchical classifier

Table 1 Indoor paths used in experiments

Path name    Length (m)    Shape
Library      30            U shape
Clark A      35            Semi-octagonal
Physics      28            Square shape
5 Experiments

5.1 Experimental Setup

The following sections describe the CNNLOC implementation and the experiments that were conducted on three independent indoor paths, as described in Table 1. The overall floor plan of the path is divided into a grid, and tiles of interest are labeled sequentially from 1 to N. For the purposes of this work, each square in the grid has an area of 1 m². Based on our analysis (not presented here), having grid tiles smaller than 1 m² did not lead to any improvements. Each of these labeled tiles is then treated as a “class.” This allows us to formulate indoor localization as a classification problem. Figure 5 shows an example of a path covered in the library building floor plan with labeled squares. Each label further translates into an x-y coordinate. Five Wi-Fi scans were conducted at each square during the fingerprinting (training) phase.
5.2 Smartphone Implementation

An Android application was built to collect Wi-Fi fingerprints (i.e., RSSI samples from multiple APs at each location) and for testing. The application is compatible with Android 6.0 and was tested on a Samsung Galaxy S6. After fingerprint data collection, the data was preprocessed as described in the previous section for the CNN model. The entire dataset is split into training and testing samples, so we can
Fig. 5 Library building path divided into a grid, with squares along the path labeled sequentially from 1 to 30
check how well our models perform. We used one-fifth of the total samples for testing, and four-fifths of the samples were used for training.
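A one-fifth/four-fifths split of the fingerprint samples can be sketched as below. The sample counts follow the Library path (30 labeled squares, five scans each), but the random shuffling, the fixed seed, and the function name `split_samples` are assumptions; the chapter does not state how the test samples were chosen.

```python
import random

def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split off the test portion."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    num_test = round(len(shuffled) * test_fraction)
    return shuffled[num_test:], shuffled[:num_test]

# 30 labeled squares x 5 scans each; each sample is (location label, scan id).
samples = [(location, scan) for location in range(1, 31) for scan in range(5)]
train, test = split_samples(samples)
print(len(train), len(test))  # 120 30
```

With only five scans per square, a purely random split may leave some locations with no test sample, so a per-location (stratified) split would be a reasonable alternative.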
5.3 Experimental Results

We compared our CNNLOC indoor localization framework with three other indoor localization frameworks from prior work. The first work we implemented is based on the approach in [25] and employs support vector regression (SVR). The approach forms one or more hyperplanes in a multidimensional space segregating similar data points, which are then used for regression. The second work is based on the KNN technique from [8], which is a nonparametric approach that is based on the idea that similar inputs will have similar outputs. Lastly, we compare our work against a DNN-based approach [13] that improves upon conventional NNs by incorporating a large number of hidden layers. All of these techniques supplement the Wi-Fi fingerprinting approach with a machine learning model to provide robustness against noise and interference effects. Our experiments in the rest of this section first discuss the localization accuracy results for the techniques. Subsequently, we also discuss results for the scalability of our framework using a hierarchical classification enhancement approach. Lastly, we contrast the accuracy of our framework with that reported by other indoor localization techniques.
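The idea behind the KNN baseline, that similar fingerprints should map to similar locations, can be sketched as a simple nearest-neighbor regression. This is not the implementation from [8]: the function name `knn_locate`, the value of k, the tiny database, and the plain averaging of neighbor coordinates are all assumptions made for illustration.

```python
import math

def knn_locate(database, query, k=2):
    """Average the coordinates of the k fingerprints closest to `query`.

    database: list of (fingerprint, (x, y)) pairs; distance is Euclidean
    in RSSI space.
    """
    nearest = sorted(database, key=lambda entry: math.dist(entry[0], query))[:k]
    x = sum(xy[0] for _, xy in nearest) / len(nearest)
    y = sum(xy[1] for _, xy in nearest) / len(nearest)
    return x, y

# Made-up database: normalized RSSI from 3 APs at 4 known positions (meters).
database = [
    ([0.9, 0.1, 0.0], (0.0, 0.0)),
    ([0.7, 0.3, 0.1], (1.0, 0.0)),
    ([0.2, 0.8, 0.4], (5.0, 0.0)),
    ([0.1, 0.6, 0.9], (5.0, 3.0)),
]
estimate = knn_locate(database, [0.8, 0.2, 0.0], k=2)
print(estimate)  # (0.5, 0.0)
```

The estimate is pulled toward whichever stored fingerprints happen to be closest in RSSI space, regardless of whether those positions are adjacent on the floor plan.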
Fig. 6 Path traced using different techniques at the Clark building path. Green and red traces indicate actual and predicted paths respectively
5.3.1 Indoor Localization Accuracy Comparison
The overall indoor localization quality as experienced by a user is heavily impacted by the stability of the predicted path that is traced over an indoor localization session. To evaluate this, we compare the paths traced by the prior indoor localization frameworks with the path traced by the proposed CNNLOC framework. Figure 6 shows the paths predicted by the four techniques for the indoor path in the Clark building. The green dots along the path represent the points where Wi-Fi RSSI fingerprint samples were collected to create the training fingerprint dataset. The distance between adjacent green dots is 1 m. In the offline phase, the RSSI fingerprint at each green dot is converted into an image. The online phase consists of the user walking along this path, and the red lines in Fig. 6 represent the paths predicted by the four techniques. It is observed that KNN [8] and SVR [25] stray off the actual path the most, whereas DNN and CNNLOC perform much better. This is likely because KNN and SVR are both regression-based techniques where
Fig. 7 Comparison of indoor localization techniques
the prediction is impacted by neighboring data points in the RSSI Euclidean space. Two locations that have RSSI fingerprints that are very close to each other in the Euclidean space might not be close to each other on the actual floor plan. This leads to large localization errors, especially when utilizing regression-based approaches. The transition from one location to another is smoother for the CNN, as it is able to distinguish between closely spaced sampling locations due to our RSSI-to-image conversion technique. The convolutional model is able to identify patterns within individual RSSI images and classify them as locations. From Fig. 6, it is evident that our CNNLOC framework produces stable predictions for the Clark path. Figure 7 shows a bar graph that summarizes the average location estimation error in meters for the various frameworks on the three different indoor floor plans considered. We found that the KNN approach is the least reliable among all techniques, with a mean error of 5.5 m and large variations across the paths. The SVR-based approach has a similar mean error to the KNN approach. The DNN-based approach shows lower error across all of the paths. But it does not perform consistently across all of the paths, and the mean error is always higher than that for CNNLOC. This may be due to the fact that the filters in the CNN are set up to focus on the image with a much finer granularity than the DNN approach is capable of. We also observe that all techniques perform the worst in the Physics department. This is due to the fact that the path in the Physics department is near the entrance of the building and has a lower density of Wi-Fi APs as compared with the other paths. The Library and Clark paths have a higher density of Wi-Fi APs present; hence, better accuracy can be achieved. Our proposed CNNLOC framework is the most reliable framework, with the lowest mean error of less than 2 m.
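An average location estimation error of the kind summarized in Fig. 7 can be computed as sketched below. The chapter does not spell out the metric, so this example assumes it is the mean Euclidean distance, in meters, between the true and predicted x-y coordinates; the coordinates are made up and `mean_error_m` is a hypothetical helper name.

```python
import math

def mean_error_m(actual_xy, predicted_xy):
    """Mean Euclidean distance (meters) between true and predicted points."""
    errors = [math.dist(a, p) for a, p in zip(actual_xy, predicted_xy)]
    return sum(errors) / len(errors)

# Made-up walk: four true positions and the positions a model predicted.
actual = [(0, 0), (1, 0), (2, 0), (3, 0)]
predicted = [(0, 0), (1, 1), (2, 0), (6, 4)]
print(mean_error_m(actual, predicted))  # 1.5
```

A single large miss (5 m at the last point) dominates the mean here, which is why a path that occasionally strays far from the ground truth shows a high average error even if most predictions are exact.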
5.3.2 CNNLOC Scalability Analysis
The size and complexity of a deep learning model is directly correlated to the number of classes and the associated dataset in use. The baseline formulation of our proposed framework does not account for the increasing area of the floor plan that needs to be covered. To overcome this, we proposed a hierarchical approach for CNNLOC (Sect. 4.5). We consider a scenario in which CNNLOC is required to predict a location inside a building with two floors and with three corridors on each floor. The length of each corridor is approximately 30 m. We combined several small CNNs (in our case, 9 small CNNs), such that a smaller number of weights are associated with each layer in the network than if a single larger CNN was used. We first evaluated the accuracy of predictions for CNNLOC with and without the hierarchical classifier. For the first and second layers of the hierarchical classifier (shown in Fig. 4), the accuracy is determined by the number of times the system predicts the correct floor and corridor. We found that floors and corridors were accurately predicted 99.67% and 98.36% of the time, respectively. For the final layer, we found that there was no difference in accuracy between the hierarchical approach and the nonhierarchical approach. This is because in the last level, both approaches use the same model. Figure 8 shows the benefits in terms of time taken to generate a prediction with the hierarchical versus the nonhierarchical CNNLOC framework. We performed our experiment for four walking scenarios (“runs”) in the indoor environment (building with two floors and with three corridors on each floor). We found that the hierarchical CNNLOC model only takes 2.42 ms to make a prediction on average, whereas the nonhierarchical CNNLOC takes longer (3.4 ms). Thus, the proposed hierarchical classifier represents a promising approach to reduce prediction time due to the smaller number of weights in the CNN layers in the hierarchical approach, which leads to fewer computations in real time.
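The reported timings and model counts can be put in relative terms with a little arithmetic. The snippet below only restates numbers given above (2.42 ms, 3.4 ms, two floors, three corridors per floor); the variable names are arbitrary.

```python
hierarchical_ms = 2.42   # average prediction time, hierarchical CNNLOC
flat_ms = 3.4            # average prediction time, nonhierarchical CNNLOC

speedup = flat_ms / hierarchical_ms
reduction = (flat_ms - hierarchical_ms) / flat_ms

floors, corridors_per_floor = 2, 3
num_cnns = 1 + floors + floors * corridors_per_floor  # C1, C20-C21, C30-C35

print(f"{speedup:.2f}x faster, {reduction:.1%} less time, {num_cnns} CNNs")
# 1.40x faster, 28.8% less time, 9 CNNs
```

In other words, the hierarchical version needs roughly 29% less time per prediction on average, while the count of 1 + 2 + 6 models matches the nine small CNNs mentioned above.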
5.3.3 Accuracy Analysis with Other Approaches
Our experimental results in the previous sections have shown that CNNLOC delivers better localization accuracy than the KNN [8], DNN [13], and SVR [25] frameworks. The UJIIndoorLoc [7] framework is reported to have an accuracy of 4–7 m. Our average accuracy is also almost twice that of RADAR [12]. If we consider frameworks that used CSI (DeepFi [14] and ConFi [15]), our accuracy is very close to both at just under 2 m. However, [14, 15] use special equipment to capture CSI and cannot be used with mobile devices. In contrast, our proposed CNNLOC framework is easy to deploy on today’s smartphones, does not require any specialized infrastructure (e.g., custom beacons), and can be used in buildings wherever Wi-Fi infrastructure preexists.
Fig. 8 A comparison of execution times for hierarchical and nonhierarchical versions of the CNNLOC framework
6 Conclusion

In this chapter, we discuss the CNNLOC framework [33] that uses Wi-Fi fingerprints and convolutional neural networks (CNNs) for accurate and robust indoor localization. We compared our work against three different state-of-the-art indoor localization frameworks from prior work. Our framework outperforms these approaches and delivers localization accuracy under 2 m. CNNLOC has the advantage of being easily implemented without the overhead of expensive infrastructure and is smartphone compatible. We also demonstrated how a hierarchical classifier can improve the scalability of this framework. CNNLOC represents a promising framework that can deliver reliable and accurate indoor localization for smartphone users.

Acknowledgments This work was supported by the National Science Foundation (NSF), through grant CNS-2132385.
References

1. How Google Maps Makes Money. (2022) [Online] https://www.investopedia.com/articles/investing/061115/how-does-google-maps-makes-money.asp. Accessed 1 Apr 2022
2. Target Rolls Out Bluetooth Beacon Technology in Stores to Power New Indoor Maps in its App. (2017) [Online] https://techcrunch.com/2017/09/20/target-rolls-out-bluetooth-beacon-technology-in-stores-to-power-new-indoor-maps-in-its-app/. Accessed 1 Apr 2022
3. Case Study: Accuracy & Precision of Google Analytics Geolocation. (2017) [Online] Available at: https://radical-analytics.com/case-study-accuracy-precision-of-google-analytics-geolocation-4264510612c0. Accessed 1 Dec 2017
4. Ubisense Research Network. [Online] Available at: http://www.ubisense.net/. Accessed 1 Dec 2017
5. Jin, G., Lu, X., Park, M.: An indoor localization mechanism using active RFID tag. In: IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC). IEEE (2006)
6. Chen, Z., Wang, C.: Modeling RFID signal distribution based on neural network combined with continuous ant colony optimization. Neurocomputing. 123, 354–361 (2014)
7. Torres-Sospedra, J., et al.: UJIIndoorLoc: a new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems. In: IEEE Indoor Positioning and Indoor Navigation (IPIN). IEEE (2014)
8. Pasricha, S., Ugave, V., Han, Q., Anderson, C.: LearnLoc: a framework for smart indoor localization with embedded mobile devices. In: ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). IEEE (2015)
9. Pouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Reyes, M.P., Shyu, M., Chen, S., Iyengar, S.S.: A survey on deep learning: algorithms, techniques, and applications. ACM Comput. Surv. 51(5), 1–36 (2019)
10. Borriello, G., Liu, A., Offer, T., Palistrant, C., Sharp, R.: WALRUS: wireless acoustic location with room-level resolution using ultrasound. In: Mobile systems, applications, and services (MobiSys). ACM (2005)
11. Yang, C., Shao, H.R.: WiFi-based indoor positioning. IEEE Commun. Mag. 53(3), 150–157 (2015)
12. Bahl, P., Padmanabhan, V.: RADAR: an in-building RF-based user location and tracking system. In: IEEE International Conference on Computer Communications (INFOCOM). IEEE (2000)
13. Zhang, W., Liu, K., Zhang, W., Zhang, Y., Gu, J.: Deep neural networks for wireless localization in indoor and outdoor environments. Neurocomputing. 194, 279–287 (2016)
14. Wang, X., Gao, L., Mao, S., Pandey, S.: DeepFi: deep learning for indoor fingerprinting using channel state information. In: IEEE Wireless Communications and Networking Conference (WCNC). IEEE (2015)
15. Chen, H., Zhang, Y., Li, W., Tao, X., Zhang, P.: ConFi: convolutional neural networks based indoor WiFi localization using channel state information. IEEE Access. 5, 18066–18074 (2017)
16. Hua, Y., Guo, J., Zhao, H.: Deep belief networks and deep learning. In: IEEE International Conference on Intelligent Computing and Internet of Things (ICIT). IEEE (2015)
17. Stanford CNN Tutorial. [Online] Available at: http://cs231n.github.io/convolutional-networks. Accessed 1 Apr 2022
18. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
19. RMSProp. [Online] https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf. Accessed 1 Apr 2022
20. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. ACM J. Mach. Learn. Res. 12, 2121–2159 (2011)
21. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE. 86(11), 2278–2324 (1998)
22. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. Neural IPS 2012 (2012)
23. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson correlation coefficient. In: Topics in Signal Processing, vol. 2, pp. 1–4. Springer (2009)
24. Styan, G.: Hadamard products and multivariate statistical analysis. In: Linear Algebra and its Applications, vol. 6, pp. 217–240. Elsevier (1973)
25. Cheng, Y.K., Chou, H.J., Chang, R.Y.: Machine-learning indoor localization with access point selection and signal strength reconstruction. In: IEEE Vehicular Technology Conference (VTC). IEEE (2016)
26. IndoorAtlas. [Online] http://www.indooratlas.com/. Accessed 1 Apr 2022
27. Rausch, V., Hansen, A., Solowjow, E., Liu, C., Kreuzer, E., Hedrick, J.K.: Learning a deep neural net policy for end-to-end control of autonomous vehicles. In: IEEE American Control Conference (ACC). IEEE (2017)
28. Langlois, C., Tiku, S., Pasricha, S.: Indoor localization with smartphones: harnessing the sensor suite in your pocket. IEEE Consum. Electron. 6(4), 70–80 (2017)
29. Tiku, S., Pasricha, S.: Energy-efficient and robust middleware prototyping for smart mobile computing. In: IEEE International Symposium on Rapid System Prototyping (RSP). IEEE (2017)
30. Khune, A., Pasricha, S.: Mobile network-aware middleware framework for energy-efficient cloud offloading of smartphone applications. In: IEEE Consumer Electronics. IEEE (2017)
31. Donohoo, B., Ohlsen, C., Pasricha, S.: A middleware framework for application-aware and user-specific energy optimization in smart Mobile devices. Journal Pervasive Mob. Comput. 20, 47–63 (2015)
32. Donohoo, B., Ohlsen, C., Pasricha, S., Anderson, C., Xiang, Y.: Context-aware energy enhancements for smart mobile devices. IEEE Trans. Mob. Comput. 13(8), 1720–1732 (2014)
33. Mittal, A., Tiku, S., Pasricha, S.: Adapting convolutional neural networks for indoor localization with smart mobile devices. In: ACM Great Lakes Symposium on VLSI (GLSVLSI). ACM (2018)
34. Tiku, S., Pasricha, S., Notaros, B., Han, Q.: SHERPA: a lightweight smartphone heterogeneity resilient portable indoor localization framework. In: IEEE International Conference on Embedded Software and Systems (ICESS). IEEE (2019)
35. Tiku, S., Pasricha, S.: PortLoc: a portable data-driven indoor localization framework for smartphones. IEEE Des. Test. 36(5), 18–26 (2019)
36. Tiku, S., Pasricha, S., Notaros, B., Han, Q.: A hidden Markov model based smartphone heterogeneity resilient portable indoor localization framework. J. Syst. Archit. 108, 101806 (2020)
37. Pasricha, S., Ayoub, R., Kishinevsky, M., Mandal, S.K., Ogras, U.Y.: A survey on energy management for mobile and IoT devices. IEEE Des. Test. 37(5), 7–24 (2020)
An End-to-End Embedded Neural Architecture Search and Model Compression Framework for Healthcare Applications and Use-Cases

Bharath Srinivas Prabakaran and Muhammad Shafique
B. S. Prabakaran, Institute of Computer Engineering, Technische Universität Wien (TU Wien), Vienna, Austria
M. Shafique, Engineering Division, New York University Abu Dhabi, Abu Dhabi, UAE
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. S. Pasricha, M. Shafique (eds.), Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, https://doi.org/10.1007/978-3-031-40677-5_2

1 Introduction

As discussed in chapter “Massively Parallel Neural Processing Array (MPNA): A CNN Accelerator for Embedded Systems”, deep learning has revolutionized domains worldwide by improving machine understanding and has been used to develop state-of-the-art techniques in fields like computer vision [15], speech recognition and natural language processing [21], healthcare [9], medicine [49], bioinformatics [20], etc. These developments are primarily driven by the rising computational capabilities of modern processing platforms and the availability of massive new annotated datasets that enable the model to learn the necessary information. Fields like medicine and healthcare generate massive amounts of data, on the order of hundreds of exabytes in the USA alone, which can be leveraged by deep learning technologies to significantly improve a user’s quality of life and yield substantial benefits. Furthermore, healthcare is one of the largest revenue-generating industries in the world, requiring contributions upward of 10% of a country’s Gross Domestic Product (GDP) annually [4]. Countries like the United States routinely spend up to 17.8% of their GDP on healthcare [35]. The global health industry is expected to generate over $10 trillion in revenue annually by 2022, which is a highly conservative estimate as it does not consider the increasing global elderly population percentages [47]. The rising global average life expectancy is another byproduct of the substantial technological advancements in medicine and healthcare [34].

The Internet of Things (IoT) phenomenon serves as an ideal opportunity that can be exploited by investigating its applicability in the healthcare sector to offer more efficient and user-friendly services that can be used to improve quality of life. The Internet of Medical Things (IoMT) market is expected to grow exponentially to achieve an annual economic impact of $1.1–$2.5 trillion by 2025, which constitutes 41% of the impact of the complete IoT sector [29]. This includes applications like personalized health monitoring, disease diagnostics, patient care, and the monitoring and analysis of physiological signals (bio-signals) to recommend person-specific lifestyle changes or health recommendations [2, 19], especially by investing heavily in the capabilities and advancements of deep learning. A breakdown of the estimated economic impact of IoT applications by the year 2025 and an overview of human bio-signals, which can be monitored and analyzed, are presented in Fig. 1. An overview of a few healthcare use-cases and applications is discussed next, before moving on to the framework that can be used for exploring the deep learning models that can be deployed for a given use-case, given its output quality requirements and the hardware constraints of the target execution platform.

Fig. 1 Breakdown of the estimated economic impact of Internet of Things applications by 2025; an overview of the human bio-signals that can be monitored and analyzed for patient-specific care. [Chart sectors: Healthcare, Urban Infrastructure, Agriculture, Industry, Security, Vehicles, Electricity, Resource Extraction, Retail; second panel: Personalized Health-CARE]
1.1 Deep Learning in Healthcare: Potential Use-Cases and Applications

Medical Imaging. Deep learning has been largely investigated as a solution to address research challenges in the computer vision and imaging domains due to the availability of massive labeled and annotated datasets. Therefore, the primary healthcare domain suitable for investigating the applicability of deep learning is medical imaging. Since technologies like X-rays, CT (Computed Tomography) and MRI (Magnetic Resonance Imaging) scans, ultrasound, etc. are regularly used by clinicians and doctors to help patients, deep learning models can be highly beneficial in such scenarios, where they can be deployed as clinical assistants that aid in diagnostics and radiology.

Electronic Health Record Analysis. An Electronic Health Record (EHR) is a collection of health data related to a patient across time, including their medical history, current and past medications, allergies, immunization information, lab test results, radiology imaging studies, age, weight, etc. The EHRs of several patients are combined into a large pool, based on their demographics, to mine and extract relevant information that can be used to devise new treatment strategies and improve the health status of the patients. Deep learning techniques have successfully demonstrated the ability to combine all this information to extract vital health statistics, including the ability to predict a patient’s mortality.

Drug Discovery. The processing capabilities of deep learning models can also be leveraged on massive genomic, clinical, and population-level data to identify potential drugs or compounds that can, “by-design,” explore associations with existing cell signaling pathways and pharmaceutical and environmental interactions, to identify cures for known health problems. For instance, the protein folding problem, which had puzzled the community for more than five decades, was recently solved by the AlphaFold deep learning model, proposed by researchers from DeepMind [20]. This enables researchers to predict the structure of a protein complex, at atomic granularity, using just its amino acid sequence, which can further enable scientists to identify compounds that can prevent the formation of lethal proteins in hereditary medical conditions like Alzheimer’s or Parkinson’s.

Precision Medicine. Genomic analysis, in combination with drug discovery approaches, might be the key to developing the next generation of precise targeted medical treatments, which improve the user’s quality of life. Understanding the genetic capability of the underlying condition, such as the type of cancer, its ability to reproduce, and the way it propagates, can enable scientists to better develop user-specific treatment options. However, processing such large amounts of data can take anywhere from weeks to months, which deep learning models can reduce to the order of hours, making such explorations practical.

Similarly, there are plenty of other healthcare applications, like real-time monitoring and processing of bio-signals, sleep apnea detection, gait pattern detection, genomic analysis, artificial intelligence-based chatbots and health assistants, and many more, that can benefit from investigating the applicability of deep learning (Table 1). We delve into the field of deep learning for healthcare next by presenting a comprehensive embedded neural architecture search and model compression framework for healthcare applications and illustrating its benefits by evaluating its efficacy on a bio-signal processing use-case.

Table 1 A summary of key state-of-the-art techniques in deep learning for healthcare

Deep learning in healthcare              References
Medical imaging                          [1, 10, 15, 25, 27, 43]
Electronic health record analysis        [18, 39, 45, 46, 50]
Drug discovery and precision medicine    [7, 20, 24, 36, 38, 40, 42]
Others                                   [17, 23, 37]
2 Embedded Neural Architecture Search and Model Compression Framework for Healthcare Applications

Figure 2 illustrates an overview of the deep neural network (DNN) model search and compression framework for healthcare, which is composed of six key stages. The framework takes two inputs: (1) the user specifications and quality requirements, such as the required prediction labels, output classes, and expected accuracy or precision of the model, and (2) the hardware constraints of the target execution platform, such as the available on-chip memory (in MB) and the maximum number of floating-point operations (FLOPs) that can be executed per second. Using these inputs, it constructs the required dataset from the existing labeled annotations (more details are provided in Sect. 2.3) and explores the design space of DNN models that can be useful for the application.
2.1 User Specifications and Requirements

The framework enables dynamic model exploration by restricting the output classes of the DNN based on the user requirements; besides the normal and anomalous classes, the user might require an output class specific to the target use-case. For instance, a hospital might require the model to classify X-rays or CT scans so that cases of lung infection caused by the novel SARS-CoV-2 coronavirus are explicitly detected as a separate class. The framework can include such classes when generating and exploring DNN models for the use-case, provided that the corresponding annotated data is already present in the dataset used for construction. The framework currently considers four metrics for evaluating the quality of a model (Q), namely, Accuracy (A), Precision (P), Recall (R), and F1-score (F). Accuracy is the ratio of correct classifications to the total number of classifications; Precision is the fraction of items assigned to a class that actually belong to it; Recall is the fraction of items belonging to a class that are assigned to it; and the F1-score combines Precision and Recall as their harmonic mean. The system designer can specify a constraint using any of these metrics while exploring the DNN models for the application, which ensures that the models obtained after exploration satisfy the required quality constraint. Evaluation with other use-case-specific metrics is orthogonal to these standard metrics and can be easily incorporated into the framework. These metrics are estimated as follows:
$$A = \frac{TP + TN}{\#\,\text{Classifications}}, \quad P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F = \frac{2 \cdot P \cdot R}{P + R}$$

where TP stands for the number of true positive predictions, whereas TN, FP, and FN depict the number of true negative, false positive, and false negative predictions, respectively.

Fig. 2 An overview of the key stages in the deep learning model exploration and compression framework for healthcare applications (adapted from [37]). [Stages shown: (1) User Specifications/Requirements: prediction labels/classes and quality metrics (Accuracy, Precision, Recall, F1-score); (2) Platform Constraints (MB, FLOPs), e.g., wearables/mobiles with 4 MB and 0.1 GFLOPs or a cluster/GPU with 256 MB and 10 TFLOPs; (3) Dataset Construction: new labels from existing labels, adding samples through data interpolation; (4) Deep Learning Model Generation: neural architecture parameters #ResNet Blocks, #Filters, #LSTM Cells; (5) Deep Learning Model Training and Evaluation: exhaustive exploration, or selective exploration with genetic algorithms (Tournament, NSGA-II, SPEA-2, Roulette Wheel), yielding Pareto-optimal networks and optimal model selection for goals such as accuracy only, recall only, memory only, or weighted accuracy and memory; (6) Model Compression: pruning, quantization, and their combined exploration]
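The four quality metrics can be computed directly from the confusion counts. The following is a minimal sketch rather than the framework's own code; the function name `quality_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def quality_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Accuracy, Precision, Recall, and F1-score from confusion counts."""
    total = tp + tn + fp + fn  # total number of classifications
    accuracy = (tp + tn) / total
    # Assumption: report 0.0 instead of failing when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    print(quality_metrics(tp=40, tn=40, fp=10, fn=10))
```

Any of the returned values can then serve as Q when a quality constraint is specified.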
2.2 Platform Constraints

Similarly, to ensure that the explored networks do not require computational resources beyond those available on the target execution platform, a hardware constraint, which the user can define as per their requirements, is enforced before the evaluation stage. Currently, the system designer can specify the hardware constraints either in terms of the memory overhead (B), i.e., the maximum size of the model that can be accommodated on the platform, or in terms of the execution time, expressed as the maximum number of floating-point operations that the platform can execute for a single inference (FP). As with the quality metrics, incorporating additional platform-specific hardware constraints is orthogonal to our current approach and can easily be added to the framework. Explicitly specifying hardware constraints requires the framework to identify the models that offer the best quality under the given constraints, which enables the exploration of a trade-off between the output quality and the hardware requirements of the model, two goals that are typically at odds: improving one tends to worsen the other.
2.3 Dataset Construction

To ensure that the model developed is application-driven, a custom dataset is constructed by fusing labels of an existing healthcare dataset in order to create the required output classes. Note that each label in the existing healthcare dataset needs to correspond to exactly one of the labels in the custom dataset to ensure coherence. For instance, with respect to the COVID-19 classifier application discussed earlier, the lung X-rays or CT scans present in the dataset could carry varying diagnoses, including pneumonia, pleural effusion, cystic fibrosis, or lung cancer, all of which are ultimately labeled as “anomaly” in the constructed dataset, given that the sole focus of the application is to classify among “normal,” “anomaly,” and “COVID-19.” A similar methodology can be used to construct custom datasets for a given healthcare application, as discussed with the help of a use-case in Sect. 3.
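The label-fusion step can be illustrated with a small sketch. The label strings, the `COVID_LABEL_MAP` dictionary, and the `fuse_labels` helper below are hypothetical and only mirror the COVID-19 example from the text; they are not taken from any particular dataset.

```python
# Hypothetical mapping from source-dataset diagnoses to the three target classes.
COVID_LABEL_MAP = {
    "normal": "normal",
    "covid-19": "COVID-19",
    "pneumonia": "anomaly",
    "pleural effusion": "anomaly",
    "cystic fibrosis": "anomaly",
    "lung cancer": "anomaly",
}


def fuse_labels(source_labels, label_map):
    """Map every source label to its fused class; fail loudly on unmapped labels."""
    fused = []
    for label in source_labels:
        if label not in label_map:
            # An unmapped label would break the coherence of the constructed dataset.
            raise KeyError(f"no target class defined for source label {label!r}")
        fused.append(label_map[label])
    return fused


if __name__ == "__main__":
    print(fuse_labels(["normal", "pneumonia", "covid-19", "lung cancer"], COVID_LABEL_MAP))
```

Raising an error on an unmapped label, instead of silently dropping the sample, is one way to enforce the coherence requirement stated above.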
2.4 Deep Learning Model Generation

With the necessary information regarding the user specifications, platform constraints, and the constructed dataset, we generate the set of possible DNN models (ψ) by varying the key neural architecture parameters. Since our approach considers a relevant state-of-the-art model as a baseline, we extract the key architectural parameters from the baseline model and vary them to generate different models that can achieve (near) state-of-the-art accuracy with reduced hardware requirements. For instance, the use-case discussed in Sect. 3 explores three DNN model parameters, namely, (1) the number of Residual Network Blocks (#ResNet Blocks), (2) the number of Filters (#Filters), and (3) the number of LSTM Cells (#LSTM Cells). Each of these can, theoretically, take any positive integer value, leading to an explosion of the designs that need to be explored under an unbounded design space. By considering the state-of-the-art model as the upper bound, we restrict the number of designs to be explored, thereby ensuring that the algorithm converges in finite time. Furthermore, since the exploration of the models is heavily dependent on the state of the art, any modification to the block-level structure of the baseline model, including changes to the block itself, will affect the design space of the models (ψ) to be explored.
2.5 Deep Learning Model Training and Evaluation

The DNN models generated earlier need to be trained and evaluated on the constructed dataset, individually, before their real-world deployment. However, since the training and evaluation of each individual DNN in the design space is a compute-intensive and time-consuming task, we first need to reduce the number of models generated, which we ensure by constraining the hardware requirements of the model (as discussed earlier in Sect. 2.2), and second, we need to explore the design space of DNNs quickly to reduce the overall duration of the task. Exploration of the design space, in our framework, can be conducted either exhaustively or selectively, as discussed below:

(1) Exhaustive Exploration. It requires each individual model of the design space to be trained and evaluated on the constructed dataset in order to determine the set of Pareto-optimal DNN models, which essentially trade off output quality against hardware requirements. The hardware constraints imposed by the execution platform, combined with the upper bound imposed by the state of the art, enable the framework to exhaustively explore the design space of DNN models in tens of GPU hours, as opposed to hundreds or thousands of hours in the case of unconstrained exploration. However, as the complexity of the model, its parameter format, the number of weights and biases, and the variation in the hyper-parameters increase, it is recommended to selectively explore the design space to circumvent the exponential rise in the number of design space models. Exhaustive exploration is primarily included as a functionality to illustrate the efficacy of the selective exploration technique that is discussed next.

(2) Selective Exploration. It involves the effective selection, training, and evaluation of a small subset of the models in ψ in order to reduce the exploration time to a couple of GPU hours. Genetic algorithms utilize a cost function, which defines the optimization goal, to effectively obtain near-optimal solutions while reducing the exploration time for a wide range of real-world optimization problems [44]. The framework uses genetic algorithms that rely on the concepts of reproduction and evolution to select the models that need to be trained in each generation to create a new generation of models that have the potential to further optimize the cost function. Other meta-heuristic approaches for exploring the design space, such as ant colony optimization [8] and simulated annealing [48], are orthogonal to our use of genetic algorithms and can be incorporated into the framework, if essential. In the selective exploration process, we start with an initial population of 30 random DNN models present in the design space, referred to as individuals, based on the recommendation of previous works [41] in order to obtain the best results. The genetic algorithms require the presence of a “chromosome,” which encodes all the key neural architecture parameters (“genes”) that can be varied to obtain the complete design space of DNN models. All the genes are stitched together to generate the chromosome string, which, when decoded, constructs a DNN model, or an individual, in the design space. Each individual is subsequently trained on the constructed dataset to evaluate its viability in terms of a fitness value, which enables it to compete with other individuals in the design space.
The fitness value is given by the cost function when the decoded DNN model (M) exists in the design space (ψ); otherwise, it is set to NULL and the individual is discarded from the search. Next, on the basis of their fitness values, two individuals are selected to pass on their genes to the next generation while undergoing the processes of crossover and mutation, which are essential reproduction principles. We ensure an ordered crossover probability of 0.4 for a mating parent pair, with a random crossover location in the pair’s chromosomes. The next generation of the population, i.e., the offspring, has its parents’ chromosomes exchanged from the start until the crossover point and is considered for exploration based on its fitness value. The offspring also have a mutation probability of 0.11 to enable a bit-flip in the chromosome, thereby ensuring a diverse population and enabling a comprehensive exploration of DNN models. The experiments maintain a population of 30 individuals in each generation, selected based on their fitness values, and create 5 consecutive generations of offspring that are trained and evaluated to determine the set of best-fit individuals (see Fig. 3).

Fig. 3 Flow chart illustrating the selective design space exploration technique (adapted from [37]). [Blocks shown: Start; Initial Population: 30 DNN Models; Training and Fitness Value Evaluation; Mating Pair Selection; Crossover; Mutation; Offspring Generation; decision “Generations == 5?” (YES/NO); DNN Offspring; Stop]

By default, the framework includes the ability to explore the design space using the following recognized genetic algorithms: NSGA-II [6], Roulette Wheel [11], Tournament Selection [31], and SPEA-2 [51]. Likewise, other algorithms and heuristics can be incorporated into the framework, as discussed earlier. The time complexity of each algorithm determines the order of execution time required for exploring the design space ψ. If the size of the design space is considered to be N, the time complexity of these algorithms is $O(N^2)$, $O(N \log N)$, $O(N)$, and $O(N^2 \log N)$, respectively. The efficacy of these algorithms, illustrated by the varying subset of individuals selected and evaluated, is discussed in Sect. 3 with the use-case.

The genetic algorithms used by the framework require a cost function (φ), specified by the system designer, which is optimized to obtain the set of near-optimal network models (ω) for the explored design space. The weighted cost function used in this framework is

$$\phi = \alpha \cdot Q + \beta \cdot \left(1 - \frac{H}{H_{max}}\right)$$

where α, β ∈ [0, 1] depict the weights for the output quality (Q) and the hardware requirements (H) of the model, respectively, and $H_{max}$ denotes the hardware requirements of the state-of-the-art baseline model. As discussed earlier, Q can be evaluated as Accuracy, Precision, Recall, or F1-score, whereas H can be estimated as the memory overhead or the number of floating-point operations for an inference. Other application-specific quality metrics or additional hardware requirements, such as the power consumption of the model or its energy requirements on the target platform, can also be included in the framework. The weights α and β depict the importance of the quality and hardware metrics, respectively, during the algorithm’s exploration of the design space.

Algorithm 1 presents the pseudo-code for the weighted DNN model exploration technique deployed in the framework. Given (1) the inputs (the design space (ψ), the weights of the cost function (α, β), and the hardware requirement of the state-of-the-art model ($H_{max}$)) and (2) the quality and hardware constraints ($Q_{Const}$, $H_{Const}$), the weighted DNN model exploration algorithm generates a set of DNN models ω that satisfies the quality and hardware constraints of the application. The ExplorationAlgorithm function call in Line 10 can invoke any of the selective exploration algorithms (genetic algorithms) or the exhaustive exploration technique discussed earlier. Table 2 provides an overview of the symbols and denotations used in this chapter.
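The weighted cost function and the two reproduction operators can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the framework's implementation: chromosomes are assumed to be binary strings, crossover is assumed to be a one-point exchange of the prefix up to a random location, and the mutation probability is assumed to apply independently to each bit. The function names are hypothetical.

```python
import random


def weighted_cost(quality, hardware, h_max, alpha=0.5, beta=0.5):
    """phi = alpha * Q + beta * (1 - H / H_max); larger values are better."""
    return alpha * quality + beta * (1.0 - hardware / h_max)


def crossover(parent_a, parent_b, rng, p_cross=0.4):
    """With probability p_cross, exchange the parents' bits from the start
    of the chromosome up to a random crossover point."""
    if rng.random() >= p_cross:
        return parent_a, parent_b  # no crossover for this pair
    point = rng.randrange(1, len(parent_a))  # 1 .. len - 1
    child_a = parent_b[:point] + parent_a[point:]
    child_b = parent_a[:point] + parent_b[point:]
    return child_a, child_b


def mutate(chromosome, rng, p_mut=0.11):
    """Flip each bit independently with probability p_mut (an assumption)."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < p_mut else b for b in chromosome)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    a, b = crossover("000000000", "111111111", rng, p_cross=1.0)
    print(a, b, mutate(a, rng))
    print(weighted_cost(quality=0.9, hardware=50.0, h_max=100.0))
```

With α = β = 0.5, a model that reaches a quality of 0.9 using half of the baseline's hardware budget scores roughly 0.7, so a smaller model can outrank a slightly more accurate but much larger one.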
Algorithm 1 Weighted DNN model exploration
Input: ψ, α, β, H_max
Constraints: Q_Const, H_Const
Output: ω
 1: H = [];
 2: for M in ψ do
 3:   if HardwareRequirements(M) ≤ H_Const then
 4:     H.append(HardwareRequirements(M));
 5:   else
 6:     ψ.remove(M);
 7:   end if
 8: end for
 9: φ = α ∗ Q + β ∗ (1 − H / H_max)
10: ω = ExplorationAlgorithm(φ, ψ);
11: for DNN in ω do
12:   if Q(DNN) < Q_Const then
13:     ω.remove(DNN);
14:   end if
15: end for

Table 2 Overview of the symbols used in this work along with their denotations [37]

Symbol     Denotation
Q          Model quality
Q_Const    User quality constraint
A          Accuracy of the model
P          Precision of the model
R          Recall of the model
F          F1-score of the model
B          Memory overhead of the model
FP         No. of floating-point operations required by the model
ψ          Design space of DNN models
N          Size of design space (ψ)
φ          Cost function to be optimized
ω          Output set of near-optimal DNN models
α          Weight for output quality Q
β          Weight for hardware requirement H
M          DNN model in ψ
H_Const    Platform’s hardware constraint
H_max      Hardware requirements of the state-of-the-art model
H          Hardware requirements of the model
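Algorithm 1 can be sketched in Python as follows. This is a hedged illustration: `weighted_exploration`, `top_k_exhaustive`, and the toy model tuples are hypothetical names and made-up values, and the quality of each model is assumed to be known already (in practice it is only available after training). The sketch builds new lists instead of removing elements from ψ and ω while iterating over them, which would skip elements in Python.

```python
def weighted_exploration(psi, alpha, beta, h_max, q_const, h_const,
                         hardware_of, quality_of, explore):
    """Return the models that satisfy the hardware and quality constraints."""
    # Lines 1-8: keep only the models that fit the platform's hardware constraint.
    feasible = [m for m in psi if hardware_of(m) <= h_const]

    # Line 9: weighted cost function phi = alpha * Q + beta * (1 - H / H_max).
    def phi(model):
        return alpha * quality_of(model) + beta * (1.0 - hardware_of(model) / h_max)

    # Line 10: any exploration strategy (exhaustive or genetic) can be plugged in.
    omega = explore(phi, feasible)

    # Lines 11-15: drop the models that miss the user's quality constraint.
    return [m for m in omega if quality_of(m) >= q_const]


def top_k_exhaustive(k):
    """Hypothetical stand-in for ExplorationAlgorithm: score all models, keep the best k."""
    def explore(phi, models):
        return sorted(models, key=phi, reverse=True)[:k]
    return explore


if __name__ == "__main__":
    # Toy design space: (name, accuracy, memory in MB); all values are made up.
    models = [("baseline", 0.97, 40.0), ("medium", 0.95, 10.0),
              ("small", 0.90, 2.0), ("tiny", 0.70, 0.5)]
    omega = weighted_exploration(
        models, alpha=0.5, beta=0.5, h_max=40.0, q_const=0.85, h_const=16.0,
        hardware_of=lambda m: m[2], quality_of=lambda m: m[1],
        explore=top_k_exhaustive(3))
    print([m[0] for m in omega])
```

In this toy run the baseline is rejected by the 16 MB hardware constraint and the smallest model by the 0.85 quality constraint, leaving the two intermediate models.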
2.6 Model Compression

The framework also includes the capability of further reducing the model’s hardware requirements by means of compression techniques like pruning and quantization. Besides neural architecture search approaches, model compression techniques have proven to be highly successful in reducing the hardware requirements of the model while retaining output quality [12].
2.6.1 Pruning

As the name conveys, the core concept of this approach involves identifying less-important parameters of the model, such as the weights, kernels, biases, or even neurons or layers, and eliminating them to further reduce the hardware requirements of the DNN model, increasing its deployability on edge platforms. Eliminating model parameters reduces many of the model’s requirements, such as its memory overhead and the number of floating-point operations required for an inference, which tends to further improve performance and reduce energy consumption on the target platform during inference. The pruned model is subsequently retrained on the constructed dataset to ensure that it achieves an output quality similar to that of the original unpruned model obtained from the design space. The framework integrates the pruning techniques presented in [3, 12, 26, 28, 30] to provide the system designer with a range of options that can be implemented in order to meet the application requirements based on the DNN model’s capabilities. For example, the technique proposed in [12] determines the lowest x% of weights, based on their absolute magnitude, in each individual layer of the model and eliminates them, followed by a retraining stage, as discussed earlier, to achieve an accuracy similar to that of the original model. In contrast, the technique presented in [30] sorts the complete set of weights in the model and eliminates the lowest x% of the overall weights in each iteration, regardless of the layer, followed by model retraining to recover the original model accuracy. Section 3 provides an overview of the benefits of pruning DNN models obtained using this approach. Other pruning techniques can be easily incorporated into the framework as long as the new technique complies with the original interfaces of standard pruning techniques.
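The difference between the layer-wise and the global magnitude-based strategies described above can be sketched with NumPy. This is a simplified illustration and not the implementation of [12] or [30]: it performs a single pruning pass, omits retraining, and represents each layer as a plain array; the function names are hypothetical.

```python
import numpy as np


def prune_layerwise(layers, x):
    """Zero the lowest x% of weights (by absolute value) within each layer separately."""
    pruned = []
    for w in layers:
        threshold = np.percentile(np.abs(w), x)  # per-layer magnitude threshold
        pruned.append(np.where(np.abs(w) < threshold, 0.0, w))
    return pruned


def prune_global(layers, x):
    """Zero the lowest x% of weights across the whole model, regardless of layer."""
    magnitudes = np.concatenate([np.abs(w).ravel() for w in layers])
    threshold = np.percentile(magnitudes, x)  # one threshold for all layers
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in layers]


if __name__ == "__main__":
    # Two toy "layers" with very different weight scales (made-up values).
    layers = [np.array([0.1, -0.2, 0.3, -0.4]), np.array([1.0, -2.0, 3.0, -4.0])]
    print([int(np.count_nonzero(w)) for w in prune_layerwise(layers, 50)])
    print([int(np.count_nonzero(w)) for w in prune_global(layers, 50)])
```

The toy example highlights the design choice: layer-wise pruning removes the same share from every layer, whereas global pruning can empty a layer whose weights are uniformly small, which is why retraining after each pass matters.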
2.6.2 Quantization

The model parameters are usually stored in a 32-bit floating-point format, leading to a large memory overhead on the execution platform. Accessing each floating-point parameter from memory incurs a higher access latency and energy consumption than accessing traditional 8-bit or 16-bit integers. Likewise, a high-precision floating-point addition requires nearly an order of magnitude more energy than a 32-bit integer ADD operation [13]. Hence, reducing the precision from 32 bits to 16 or 8 bits, through the process of quantization, can substantially reduce the hardware requirements of the model. Quantization techniques can also be used to reduce the precision of the DNN model to fewer than 8 bits, by analyzing the trade-off with output quality for the target application. The process involves the construction of $2^p$ clusters, where p stands for the number of quantized bits, using the k-means algorithm, which evaluates the parameters in each layer of the DNN model. Once the clusters are determined, equally spaced values ranging from the minimum to the maximum weight are allocated to the clusters, with the all-zeros code corresponding to the minimum value and the all-ones code to the maximum value. For simplicity, all layers in the DNN model are quantized with the same number of bits. Similar to pruning, other quantization techniques can be incorporated into the framework as long as the new technique complies with the original interfaces of standard quantization.
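The mapping of weights to $2^p$ equally spaced values can be sketched as follows. Note the simplification: instead of running k-means, this sketch assigns each weight directly to the nearest of $2^p$ equally spaced levels between the layer's minimum and maximum weight, so it should be read as an approximation of the procedure described above. The function names are hypothetical.

```python
import numpy as np


def quantize_layer(weights, p):
    """Assign every weight to the nearest of 2**p equally spaced levels.

    Returns the p-bit codes (stored as uint8) and the level table; code 0
    (all zeros) is the minimum weight and code 2**p - 1 (all ones) the maximum.
    """
    if not 1 <= p <= 8:
        raise ValueError("this sketch supports 1 to 8 bits")
    levels = np.linspace(weights.min(), weights.max(), 2 ** p)
    # Distance of every weight to every level; argmin picks the nearest one.
    codes = np.abs(weights[..., None] - levels).argmin(axis=-1)
    return codes.astype(np.uint8), levels


def dequantize(codes, levels):
    """Recover the approximate weights from their codes."""
    return levels[codes]


if __name__ == "__main__":
    w = np.array([[-1.0, -0.4], [0.4, 1.0]])  # made-up layer weights
    codes, levels = quantize_layer(w, p=2)
    print(codes)
    print(dequantize(codes, levels))
```

Only the p-bit codes and the small level table need to be stored per layer, which is where the memory saving over 32-bit floating-point parameters comes from.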
Based on recommendations from the studies presented in [12] and on exhaustive experimentation, the most effective approach for minimizing the hardware requirements of the model is to first prune the selected DNN model obtained from the design space, which eliminates the redundant parameters, and then quantize it, which reduces the precision of the remaining parameters.
3 Case Study: Bio-signal Anomaly Detection

We demonstrate the efficacy of the framework by deploying it to generate, explore, and compress a wide range of DNN models for our use-case: ECG bio-signal processing. We explore 5 different sub-cases as a part of this study:

• UC1: Binary Classification: [Normal, Anomaly]
• UC2: Multi-class Classification: [Normal, Premature Ventricular Contraction, Other Anomaly]
• UC3: Multi-class Classification: [Normal, Bundle Branch Block, Other Anomaly]
• UC4: Multi-class Classification: [Normal, Atrial Anomaly, Ventricular Anomaly, Other Anomaly]
• UC5: Multi-class Classification: [Normal, Ventricular Fibrillation, Other Anomaly]

Hannun et al. [14] proposed a deep neural network model architecture that can differentiate between 12 classes of ECG signals, evaluated on their private dataset. This model is considered to be the current state of the art in ECG signal classification and is the baseline model of this use-case. The primary block used in [14] is adopted in this use-case to generate the design space of DNN models for each of the 5 sub-cases discussed above. The input and output layers have been modified to process the data from the open-source ECG dataset adopted in this case study and categorize it into the required output classes. The default model is modified to include LSTM cells at the end, enabling accuracy improvements in cases where the number of feature extraction layers is substantially reduced during neural architecture search.
3.1 Experimental Setup Dataset Construction For this bio-signal processing case study, the MIT-BIH dataset [32] is used to construct the required datasets by collecting a 256-sample window, which is subsequently assigned a label corresponding to the original labels of the parent dataset. The 41 different annotations of the parent dataset are categorized as one of the labels for each sub-case to ensure coherence in the dataset. To construct an enriched dataset that can provide the relevant information to the DNN model and enable it to learn effectively across labels like ventricular tachy-
An End-to-End Embedded Neural Architecture Search and Model Compression. . .
33
cardia and ventricular fibrillation, the framework also includes the CU Ventricular dataset [33] during the construction of the custom datasets. The constructed datasets are split in the ratio of .7:1:2 to generate the training, validation, and testing datasets, respectively. Neural Architecture Parameters An overview of the modified DNN architecture used in this case study is presented in Fig. 4. Therefore, the three primary neural architecture parameters that can be varied to generate the DNN model design space are (1) #ResNet Blocks, (2) #Filters, and (3) #LSTM Cells. The ResNet blocks are made of 1D convolutional layers, batch normalization, ReLU activation blocks, and dropout layers, as illustrated in Fig. 4, and can vary between 0 and 15. The number of filters, of size 16, in each convolution layer is determined as a function of z– [32 × 2z ]—where z starts from the value of 0 and is increased by 1 after every y ResNet blocks (y varies from 1 to 4 in increments of 1, i.e., y ∈ {1, 2, 3, 4}). The number of LSTM cells is varied as 2x , where x ∈ {4, 5, 6, 7, 8}. By varying these parameters, we can generate up to 320 different DNN models as part of a given application’s design space. However, due to the hardware limitation imposed by the state-of-the-art model, the framework reduces the number of models explored to 135, thereby drastically reducing the exploration time. Selective Exploration Figure 5 presents the composition of the chromosome used by the genetic algorithms in this case study. The chromosome is a binary string of size 9, which encodes the key neural architecture parameters discussed
Fig. 4 Modified state-of-the-art DNN architecture used in the case study (adapted from [37]). Blocks shown: 256-sample ECG signal input; convolution, batch normalization, and ReLU; ResNet block (batch normalization, ReLU, convolution, dropout, max pool) repeated 1 to 15 times; batch normalization, ReLU, LSTM, dense, and softmax; output.
Fig. 5 The composition of the chromosome used by the genetic algorithms in this case study; example of chromosomal crossover and mutation (adapted from [37]). Genes shown: #ResNet Blocks (16 values), #Filters (4 values), and #LSTM Cells (5 values); a parent chromosome pair undergoes crossover followed by mutation.
Table 3 Optimal hyper-parameter values used for training the DNN models [37]
Hyper-parameter          Optimal value
Weights initialization   He et al. [16]
Adam optimizer [22]      β1 = 0.9, β2 = 0.999
Learning rate            0.001
Batch size               128
Dropout                  0.2

Table 4 The optimal values of the constants used by the genetic algorithms [37]
Constant                 Optimal value
Population size          30
Chromosome length        9
Generation size          5
Mutation probability     0.11
Crossover probability    0.4
above as genes. The chromosome can therefore construct $2^{10} - 1$, or 1023, DNN models in the design space for each of the sub-cases. However, since only 5 of the 7 possible #LSTM cell values lead to valid DNN model architectures, we can directly eliminate invalid configurations not present in ψ. Once a parent chromosome pair is selected for generating offspring, based on the fitness values of the parents, the pair undergoes crossover to exchange genes and potential mutation to introduce diversity, as illustrated in Fig. 5.

Tool Flow
The DNN models are implemented in Python on the TensorFlow platform with the help of the Keras package. The models are trained over multiple iterations with varying hyper-parameter values to determine the values that offer maximum accuracy. Table 3 presents the optimal values of these hyper-parameters. The DEAP library [5] in Python contains implementations of the four genetic algorithms that are used in the case study. Table 4 presents the constants, and their optimal values, used by the genetic algorithms during selective exploration of the design space. The exploration stage is executed on a GPU server composed of four i9 CPUs and eight Nvidia RTX 2080 GPUs, with the early stopping mechanism enabled. The selected models are then trained using the custom dataset to evaluate their quality and study the trade-off with their hardware requirements.
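The chromosome encoding, crossover, and mutation described under Selective Exploration can be illustrated with a minimal sketch in plain Python. It is not the DEAP-based implementation used in the case study. The 4 + 2 + 3 bit gene layout, the field order, the use of one-point crossover, and all function names (`decode`, `crossover`, `mutate`) are assumptions made for illustration. Under this assumed layout, 320 of the 512 possible bit strings decode to a valid architecture.

```python
import random

# Assumed gene layout, 9 bits in total. The field order and bit widths are
# illustrative assumptions, not taken from the chapter.
RESNET_BITS, FILTER_BITS, LSTM_BITS = 4, 2, 3
CHROMOSOME_LENGTH = RESNET_BITS + FILTER_BITS + LSTM_BITS
LSTM_EXPONENTS = (4, 5, 6, 7, 8)  # x in the text; #LSTM cells = 2**x


def bits_to_int(bits):
    """Read a list of 0/1 values as an unsigned integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def decode(chromosome):
    """Return the architecture parameters for a chromosome, or None if it is invalid."""
    if len(chromosome) != CHROMOSOME_LENGTH:
        raise ValueError("expected a 9-bit chromosome")
    split = RESNET_BITS + FILTER_BITS
    lstm_code = bits_to_int(chromosome[split:])
    if lstm_code >= len(LSTM_EXPONENTS):
        return None  # this code has no matching #LSTM cells value
    return {
        "resnet_blocks": bits_to_int(chromosome[:RESNET_BITS]),  # 0..15
        "y": bits_to_int(chromosome[RESNET_BITS:split]) + 1,     # 1..4
        "lstm_cells": 2 ** LSTM_EXPONENTS[lstm_code],            # 16..256
    }


def crossover(parent_a, parent_b, point):
    """One-point crossover (assumed variant): swap the tails after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, probability, rng=random):
    """Flip each bit independently with the given probability."""
    return [1 - bit if rng.random() < probability else bit for bit in chromosome]


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
    b = [rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
    child, _ = crossover(a, b, point=4)
    child = mutate(child, probability=0.11, rng=rng)  # 0.11 as in Table 4
    print(decode(child))  # None means the offspring is an invalid configuration
```

Returning `None` for bit patterns without a matching #LSTM cells value is one simple way to discard invalid configurations before any training time is spent on them.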
3.2 Exhaustive Exploration

Figure 6 illustrates the trade-offs between quality and memory obtained by exhaustively exploring the models in the UC3 design space. The Pareto-frontier of the complete design space, which connects all the Pareto-optimal DNN models and offers the best trade-off between quality and memory, is marked by label A. Label B,
Fig. 6 Analysis of exhaustive exploration on the UC3 design space (adapted from [37])
Fig. 7 Analyzing the time benefits of the selective exploration approach (adapted from [37]). Axes: exploration time [Hrs] for use-cases UC1 to UC5; weight settings shown: α = 0.8, β = 0.2 and α = 0.2, β = 0.8; annotated reductions in exploration time: ~9.52× and ~8.37×.
on the other hand, depicts the pseudo-Pareto-frontier constructed using the set of optimal designs obtained by random exploration (i.e., the baseline). The large number of inter-dependent parameters in DNN models leads to a situation where the designs depicted in A and B are very similar to each other. Exhaustive exploration of the design space has led to the successful identification of a DNN model that reduces the memory overhead by ~30 MB for a quality loss of less than 0.5%. However, given the time required for such exhaustive exploration, it might be more suitable to obtain a near-optimal point that offers similar trade-offs using selective exploration in much less time. The variance in the Precision, Recall, and F1-score of the model for the specialized bundle branch class indicates that the framework is also suited to imposing a quality constraint on these metrics.
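The notion of a Pareto-frontier over quality and memory can be made concrete with a short sketch. The function name `pareto_front` and the example models (names, memory footprints, and quality values) are hypothetical and are not results from the case study. A model is kept if no other model is at least as good in both memory and quality and strictly better in one of them.

```python
def pareto_front(models):
    """Return the non-dominated models, sorted by memory.

    Each model is a tuple (name, memory_mb, quality_percent); lower memory
    and higher quality are both preferred.
    """
    front = []
    for name, memory, quality in models:
        dominated = any(
            other_memory <= memory
            and other_quality >= quality
            and (other_memory < memory or other_quality > quality)
            for _, other_memory, other_quality in models
        )
        if not dominated:
            front.append((name, memory, quality))
    return sorted(front, key=lambda model: model[1])


if __name__ == "__main__":
    # Hypothetical design points, for illustration only.
    candidates = [
        ("m1", 40.0, 98.0),
        ("m2", 10.0, 97.6),
        ("m3", 12.0, 97.0),
        ("m4", 5.0, 95.0),
        ("m5", 20.0, 97.6),
    ]
    for name, memory, quality in pareto_front(candidates):
        print(f"{name}: {memory} MB, {quality}%")
```

In this toy example, m2 gives up 0.4 percentage points of quality relative to m1 in exchange for a 30 MB smaller footprint, which is the kind of trade-off the frontier exposes to the designer.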
3.3 Selective Exploration: Time Benefits

The primary benefit of the selective exploration process is the reduction in the time required to search the design space of DNN models through the use of genetic algorithms. Figure 7 illustrates the reduction in time for the five different use-cases when explored using the genetic algorithms, as opposed to exhaustive exploration. We have also varied the weights used by the cost function (α, β) to emphasize that changing the weights does not drastically modify the time needed for exploring the design space. Randomly selecting and evaluating 10% of the DNN models
Fig. 8 Evaluation of the quality and memory trade-offs for the models obtained using wheel roulette search and tournament search on the UC1 design space (adapted from [37]). Panels: Anomaly Class, UC1 wheel roulette search; Anomaly Class, UC1 tournament search. Axes: Precision/Recall/F1-Score [%] versus Memory [MB].
in the design space acts as the baseline for comparison with the selective exploration strategies discussed in this chapter, with practically no algorithmic overhead. The selective exploration strategies achieve a 9× reduction in exploration time, on average, compared to the bounded exhaustive exploration strategy. The use of genetic algorithms for exploring the search space is highly beneficial in scenarios where the application requires highly complex deep neural networks with tens of millions of parameters. Exhaustively training and evaluating each model in the design space, in such instances, would lead to exploration time overheads of hundreds of GPU hours, which might not be feasible for the system designer.
3.4 Selective Exploration: Efficacy and Analysis

The primary benefit of using genetic algorithms, the reduction in exploration time, was discussed in the previous subsection. In this subsection, we focus on the capability of the genetic algorithms to explore the design space and analyze their efficacy. The results of these experiments for the UC1 and UC5 design spaces, with α and β set to 0.5, are illustrated in Figs. 8 and 9, respectively. The transparent points in these results depict the models obtained from the design space using exhaustive exploration, enabling us to determine the efficacy of the genetic algorithms. The genetic algorithms are highly successful at identifying a set of near-optimal DNN models without traversing the complete design space, especially in cases where the accuracy improvements or the hardware memory reductions are minimal when compared to the Pareto-optimal design. The NSGA-II and SPEA-2 algorithms evaluate fewer models than their counterparts. Note that a significant number of models in the UC5 design space exhibit 0% quality due to
Fig. 9 Evaluation of the quality and memory trade-offs for the models obtained using NSGA-II search and SPEA-2 search on the UC5 design space (adapted from [37]). Panels: Ventricular Fibrillation Class, UC5 NSGA-II search; Ventricular Fibrillation Class, UC5 SPEA-2 search. Axes: Precision/Recall/F1-Score [%] versus Memory [MB].
Fig. 10 Evaluation of the weighted exploration technique on the UC2 design space using tournament search with two different weight values for the cost function (adapted from [37]). Panels: Premature Ventricular Contraction Class, UC2 tournament search with α = 0.2 and β = 0.8; the same search with α = 0.5 and β = 0.5. Axes: Precision/Recall/F1-Score [%] versus Memory [MB].
the inherent differences in the number of samples of class Ventricular Fibrillation, leading to a bias against it.
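To see how class imbalance can produce 0% quality for a rare class, consider the following minimal sketch. The function name `precision_recall_f1`, the labels `"N"` and `"VF"`, and the sample counts are hypothetical. The sketch also assumes the common convention of reporting 0 when a metric is undefined because its denominator is zero.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1-score for one class.

    Undefined ratios (zero denominator) are reported as 0.0, which is an
    assumed convention for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical test set: 98 normal windows and 2 ventricular fibrillation windows.
    y_true = ["N"] * 98 + ["VF"] * 2
    # A biased model that never predicts the rare class.
    y_pred = ["N"] * 100
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("VF precision/recall/F1:", precision_recall_f1(y_true, y_pred, "VF"))
```

A model of this kind still reaches high overall accuracy while scoring zero on all three per-class metrics, which is why per-class Precision, Recall, and F1-score are more informative than accuracy for under-represented classes.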
3.5 Selective Exploration: Weighted Exploration

Next, we discuss a subset of the results obtained when exploring the design space using different weights for the cost function (φ), which is used by the genetic algorithms. Figure 10 illustrates the results when the algorithm focuses on optimizing (1) memory alone (α = 0.2, β = 0.8) or (2) memory and accuracy (α = 0.5, β = 0.5), for a model in the design space of UC2. Similarly, Fig. 11 presents the results when the algorithm optimizes for (1) memory alone (α = 0.2, β = 0.8) or (2) accuracy alone (α = 0.8, β = 0.2), in the design space of
Fig. 11 Evaluation of the weighted exploration technique on the UC4 design space using wheel roulette search with two different weight values for the cost function (adapted from [37]). Panels: Ventricular Anomaly Class, UC4 wheel roulette search with α = 0.2 and β = 0.8; the same search with α = 0.8 and β = 0.2. Axes: Precision/Recall/F1-Score [%] versus Memory [MB].
UC4. The weighted parameterization of the cost function is highly beneficial in guiding the genetic algorithms to optimize for the required parameter, i.e., memory, quality, or both. For example, as illustrated by label A in Fig. 11, the algorithm focuses on optimizing memory, thereby selecting a large number of points with minimal memory overhead. Similarly, when the optimization goal is either accuracy only (see Fig. 11) or memory and accuracy (see Fig. 10), the algorithm selects appropriate models for evaluation.
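The effect of the weights can be illustrated with a small sketch. The exact form of φ is not restated in this section, so the code assumes a simple weighted sum of quality and normalized memory saving, with α weighting quality and β weighting memory. This form, the function names `cost` and `best_model`, the normalization constant `MAX_MEMORY_MB`, and the candidate models are all assumptions made for illustration and may differ from the cost function used by the framework.

```python
MAX_MEMORY_MB = 40.0  # assumed normalization constant for this sketch


def cost(quality, memory_mb, alpha, beta):
    """Assumed fitness: alpha * quality + beta * normalized memory saving.

    quality is in [0, 1]; memory saving is 1 - memory / MAX_MEMORY_MB.
    Higher values are better.
    """
    memory_saving = 1.0 - memory_mb / MAX_MEMORY_MB
    return alpha * quality + beta * memory_saving


def best_model(candidates, alpha, beta):
    """Return the name of the candidate with the highest assumed fitness."""
    return max(candidates, key=lambda c: cost(c[1], c[2], alpha, beta))[0]


if __name__ == "__main__":
    # Hypothetical candidates: (name, quality, memory in MB).
    candidates = [("A", 0.99, 30.0), ("B", 0.90, 25.0), ("C", 0.70, 2.0)]
    for alpha, beta in [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]:
        print(f"alpha={alpha}, beta={beta}: best = {best_model(candidates, alpha, beta)}")
```

Under these assumptions, a quality-heavy weighting favors the largest and most accurate candidate, while a memory-heavy weighting favors the smallest one, mirroring the behavior described for Figs. 10 and 11.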
3.6 Pruning and Quantization: Compression Efficacy and Receiver Operating Characteristics

Next, we select three near-optimal models obtained from the UC4 design space to evaluate the efficacy of our pruning and quantization techniques. Without loss of generality and for the purpose of illustration, the three models, Z1, Z2, and Z3, focus on accuracy alone, the trade-off between accuracy and memory, and memory alone, respectively. Pruning alone is quite effective, reducing the memory by nearly 40% while increasing accuracy by roughly 0.15%. This counterintuitive improvement in accuracy can be attributed to the over-parameterization of the network model, which pruning eliminates. For similar reasons, model Z1 can tolerate pruning of a significant percentage of its parameters before exhibiting accuracy losses, as opposed to the other models, which are not as over-parameterized. Likewise, quantization can drastically reduce the memory requirements of the network by lowering the precision of the parameters storing the weights and biases. This process can further reduce the memory requirements by up to 5×, as opposed to FP32 precision, for .