Lecture Notes in Electrical Engineering 738
Sergio Saponara Alessandro De Gloria Editors
Applications in Electronics Pervading Industry, Environment and Society APPLEPIES 2020
Lecture Notes in Electrical Engineering Volume 738
Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Università di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Stanford University, Stanford, CA, USA Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering - quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and applications areas of electrical engineering. The series covers classical and emerging topics concerning:

• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please contact leontina. [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country: China Jasmine Dou, Editor ([email protected]) India, Japan, Rest of Asia Swati Meherishi, Editorial Director ([email protected]) Southeast Asia, Australia, New Zealand Ramesh Nath Premnath, Editor ([email protected]) USA, Canada: Michael Luby, Senior Editor ([email protected]) All other Countries: Leontina Di Cecco, Senior Editor ([email protected]) ** This series is indexed by EI Compendex and Scopus databases. **
More information about this series at http://www.springer.com/series/7818
Sergio Saponara • Alessandro De Gloria
Editors
Applications in Electronics Pervading Industry, Environment and Society APPLEPIES 2020
Editors Sergio Saponara DII University of Pisa Pisa, Italy
Alessandro De Gloria DITEN University of Genoa Genoa, Italy
ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-3-030-66728-3 ISBN 978-3-030-66729-0 (eBook) https://doi.org/10.1007/978-3-030-66729-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The 2020 edition of the Conference on “Applications in Electronics Pervading Industry, Environment and Society” was exceptionally held fully online during November 19 and 20, 2020. During the 2 days, 87 registered participants, from 27 different entities (20 universities and seven industries), discussed electronic applications in several domains, demonstrating how electronics has become pervasive and ever more embedded in everyday objects and processes. The conference had the technical and/or financial support of the University of Pisa, the University of Genoa, SIE (Italian Association for Electronics), Giakova and of the H2020 European Processor Initiative. After a strict blind-review selection process, 12 short presentations and 24 lectures were accepted (with co-authors from 14 different nations) in 11 sessions focused on circuits and electronic systems and their relevant applications in the following fields: wireless and IoT, health care, vehicles and robots (electrified and autonomous), power electronics and energy storage, cybersecurity, AI and data engineering. In more detail, the short presentation sessions involved contributions on SS1 mechatronics, energies and Industry 4.0 and SS2 IoT, AI and ICT applications, while the full oral sessions involved contributions on S1 AI and ML techniques, S2 environmental monitoring and E-health, S3 electronics for health and assisted living, S4 digital techniques for mechatronics, energy and critical systems and S5 photonic circuits and IoT for communications. There were also two scientific keynotes, given by Cecilia Metra (IEEE Computer Society Past President) and by John David Davies (Barcelona Supercomputing Center), and three industrial keynotes, by Carlo Cavazzoni (Leonardo Spa), Paolo Gai (Huawei) and Luca Poli (Giakova Spa). The articles featured in this book, together with the talks and round tables of the special events, prove that the capabilities of today's electronic systems, in terms of computing, storage and networking, are able to support a plethora of application domains, such as mobility, health care, connectivity, energy management, smart
production, ambient intelligence, smart living, safety and security, education, entertainment, tourism and cultural heritage. In order to exploit such capabilities, multidisciplinary knowledge and expertise are needed to support a virtuous iterative cycle from user needs to the design, prototyping and testing of new products and services that are more and more characterized by a digital core. The design and testing cycles go through the whole system engineering process, which includes analysis of user requirements, specification definition, verification plan definition, software and hardware co-design, laboratory and user testing and verification, maintenance management and life cycle management of electronics applications. The design of electronics-enabled systems should be characterized by innovation, high performance, real-time operations and budget compliance (in terms of time, cost, device size, weight, power consumption, etc.). Design methodologies and tools have emerged in order to support teams dealing with such complexity. All these challenging aspects call for the importance of the role of Academia as a place where new generations of designers can learn and practice with cutting-edge technological tools, and where new solutions are studied, starting from challenges coming from a variety of application domains. This approach is sustained by industries that understand the role of a high-level educational system, able to nurture new generations of designers and developers. The APPLEPIES conference has reached, in 2020, its eighth edition, confirming its role as a reference point for a growing research community in the field of electronic systems design, with a particular focus on applications.

Sergio Saponara
General Chair

Alessandro De Gloria
Honorary Chair
Contents
AI and ML Techniques

Implementation of Particle Image Velocimetry for Silo Discharge and Food Industry Seeds . . . 3
Romina Molina, Valeria Gonzalez, Jesica Benito, Stefano Marsi, Giovanni Ramponi, and Ricardo Petrino

Analysis and Design of a Yolo like DNN for Smoke/Fire Detection for Low-cost Embedded Systems . . . 12
Alessio Gagliardi, Marco Villella, Luca Picciolini, and Sergio Saponara

Video Grasping Classification Enhanced with Automatic Annotations . . . 23
Edoardo Ragusa, Christian Gianoglio, Filippo Dalmonte, and Paolo Gastaldo

Enabling YOLOv2 Models to Monitor Fire and Smoke Detection Remotely in Smart Infrastructures . . . 30
Sergio Saponara, Abdussalam Elhanashi, and Alessio Gagliardi

Exploring Unsupervised Learning on STM32 F4 Microcontroller . . . 39
Francesco Bellotti, Riccardo Berta, Alessandro De Gloria, Joseph Doyle, and Fouad Sakr
Environmental Monitoring and E-health

Unobtrusive Accelerometer-Based Heart Rate Detection . . . 49
Yurii Shkilniuk, Maksym Gaiduk, and Ralf Seepold

A Lightweight SiPM-Based Gamma-Ray Spectrometer for Environmental Monitoring with Drones . . . 55
Marco Carminati, Davide Di Vita, Luca Buonanno, Giovanni L. Montagnani, and Carlo Fiorini
Winter: A Novel Low Power Modular Platform for Wearable and IoT Applications . . . 62
Patrick Locatelli, Asad Hussain, Andrea Pedrana, Matteo Pezzoli, Gianluca Traversi, and Valerio Re

Hardware-Oriented Data Recovery Algorithms for Compressed Sensing-Based Vibration Diagnostics . . . 69
Federica Zonzini, Matteo Zauli, Antonio Carbone, Francesca Romano, Nicola Testoni, and Luca De Marchi
Electronics for Health and Assisted Living

Automatic Generation of 3D Printable Tactile Paintings for the Visually Impaired . . . 79
Francesco de Gioia, Massimiliano Donati, and Luca Fanucci

Validation of Soft Real-Time in Remote ECG Analysis . . . 90
Miltos D. Grammatikakis, Anastasios Koumarelis, and Efstratios Ntallaris

Software Architecture of a User-Level GNU/Linux Driver for a Complex E-Health Biosensor . . . 97
Miltos D. Grammatikakis, Anastasios Koumarelis, and Angelos Mouzakitis
Enabling Smart Home Voice Control for Italian People with Dysarthria: Preliminary Analysis of Frame Rate Effect on Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 Marco Marini, Gabriele Meoni, Davide Mulfari, Nicola Vanello, and Luca Fanucci Brain-Actuated Pick-Up and Delivery Service for Personal Care Robots: Implementation and Case Study . . . . . . . . . . . . . . . . . . . . . . . . 111 Giovanni Mezzina and Daniela De Venuto Digital Techniques for Mechatronics, Energy and Critical Systems Creation of a Digital Twin Model, Redesign of Plant Structure and New Fuzzy Logic Controller for the Cooling System of a Railway Locomotive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Marica Poce, Giovanni Casiello, Lorenzo Ferrari, Lorenzo Flaccomio Nardi Dei, and Sergio Saponara HDL Code Generation from SIMULINK Environment for Li-Ion Cells State of Charge and Parameter Estimation . . . . . . . . . . . . . . . . . . 136 Mattia Stighezza, Valentina Bianchi, and Ilaria De Munari
Performance Comparison of Imputation Methods in Building Energy Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 Hariom Dhungana, Francesco Bellotti, Riccardo Berta, and Alessandro De Gloria Design and Validation of a FPGA-Based HIL Simulator for Minimum Losses Control of a PMSM . . . . . . . . . . . . . . . . . . . . . . . 152 Giuseppe Galioto, Antonino Sferlazza, and Giuseppe Costantino Giaconia x86 System Management Mode (SMM) Evaluation for Mixed Critical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Nikos Mouzakitis, Michele Paolino, Miltos D. Grammatikakis, and Daniel Raho Photonic Circuits and IoT for Communications A Novel Pulse Compression Scheme in Coherent OTDR Using Direct Digital Synthesis and Nonlinear Frequency Modulation . . . . . . . 173 Yonas Muanenda, Stefano Faralli, Philippe Velha, Claudio Oton, and Fabrizio Di Pasquale Design and Analysis of RF/High-Speed SERDES in 28 nm CMOS Technology for Aerospace Applications . . . . . . . . . . . . . . . . . . . . . . . . . 182 Francesco Cosimi, Gabriele Ciarpi, and Sergio Saponara Enabling Transiently-Powered Communication via Backscattering Energy State Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 Alessandro Torrisi, Kasım Sinan Yıldırım, and Davide Brunelli Analysis and Design of Integrated VCO in 28 nm CMOS Technology for Aerospace Applications . . . . . . . . . . . . . . . . . . . . . . . . . 202 Paolo Prosperi, Gabriele Ciarpi, and Sergio Saponara vrLab: A Virtual and Remote Low Cost Electronics Lab Platform . . . . 213 Massimo Ruo Roch and Maurizio Martina Mechatronics, Energies and Industry 4.0 Mechatronic Design Optimization of an Electrical Drilling Machine for Trenchless Operations in Urban Environment . . . . . . . . . . . . . . . . . 223 Valerio Vita, Luca Pugi, Lorenzo Berzi, Francesco Grasso, Raffaele Savi, Massimo Delogu, and Enrico Boni Analysis and Design of a Non-linear MPC Algorithm for Vehicle Trajectory Tracking and Obstacle Avoidance . . . . . . . . . . . . . . . . . . . . 229 Francesco Cosimi, Pierpaolo Dini, Sandro Giannetti, Matteo Petrelli, and Sergio Saponara
Impact of Combined Roto-Linear Drives on the Design of Packaging Systems: Some Applications . . . . . . . . . . . . . . . . . . . . . . . 235 Marco Ducci, Alessandro Peruzzi, and Luca Pugi Preliminary Study of a Novel Lithium-Ion Low-Cost Battery Maintenance system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 Andrea Carloni, Federico Baronti, Roberto Di Rienzo, Roberto Roncella, and Roberto Saletti Low Cost and Flexible Battery Framework for Micro-grid Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 Roberto Di Rienzo, Federico Baronti, Daniele Bellucci, Andrea Carloni, Roberto Roncella, Marco Zeni, and Roberto Saletti Survey of Positioning Technologies for In-Tunnel Railway Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 Luca Fronda, Francesco Bellotti, Riccardo Berta, Alessandro De Gloria, and Paolo Cesario IoT, AI and ICT Applications Edgine, A Runtime System for IoT Edge Applications . . . . . . . . . . . . . . 261 Riccardo Berta, Andrea Mazzara, Francesco Bellotti, Alessandro De Gloria, and Luca Lazzaroni An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Marco Re, and Sergio Spanò Porting Rulex Machine Learning Software to the Raspberry Pi as an Edge Computing Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 Ali Walid Daher, Ali Rizik, Marco Muselli, Hussein Chible, and Daniele D. Caviglia High Voltage Isolated Bidirectional Network Interface for SoC-FPGA Based Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280 Luis Guillermo García, Maria Liz Crespo, Sergio Carrato, Andres Cicuttin, Werner Florian, Romina Molina, Bruno Valinoti, and Stefano Levorato A Comparison of Objective and Subjective Sleep Quality Measurement in a Group of Elderly Persons in a Home Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286 Maksym Gaiduk, Ralf Seepold, Natividad Martínez Madrid, Juan Antonio Ortega, Massimo Conti, Simone Orcioni, Thomas Penzel, Wilhelm Daniel Scherz, Juan José Perea, Ángel Serrano Alarcón, and Gerald Weiss
A Preliminary Study on Aerosol Jet-Printed Stretchable Dry Electrode for Electromyography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 M. Borghetti, Tiziano Fapanni, N. F. Lopomo, E. Sardini, and M. Serpelloni Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
AI and ML Techniques
Implementation of Particle Image Velocimetry for Silo Discharge and Food Industry Seeds

Romina Molina1,2,4(B), Valeria Gonzalez2, Jesica Benito3, Stefano Marsi1, Giovanni Ramponi1, and Ricardo Petrino2

1 Università Degli Studi di Trieste—IPL, Dipartimento di Ingegneria e Architettura, Piazzale Europa 1, 34127 Trieste, TS, Italy
[email protected]
2 National University of San Luis—LEIS, Department of Electronic, Av. Ejército de los Andes 950, D5700 BPB San Luis, Argentina
3 National University of San Luis—INFAP, CONICET, Department of Physics, Av. Ejército de los Andes 950, D5700 BPB San Luis, Argentina
4 The Abdus Salam International Centre for Theoretical Physics—MLAB, Strada Costiera 11, 34151 Trieste, TS, Italy
Abstract. This work focuses on determining the velocity profile of a granular flow at the outlet of a silo, using artificial vision techniques. The developed algorithm performs a frame enhancement through neural networks, and particle image velocimetry detects seed motion in the hopper. We process 50, 100, 150 and 200 frames of a discharge video for three different grains using: a CPU and a PYNQ-Z1 implementation with simple image processing at the pre-processing level, and a CPU implementation using a neural network. Execution times are measured and the differences between the involved technologies are discussed.

Keywords: PIV · Image processing · SoC

1 Introduction
The growth of artificial vision techniques for image processing, recognition and classification permits expanding the capabilities of systems to solve problems that would otherwise be much more difficult or impossible, in different fields such as security, industry and autonomous driving, among others [1–3]. In this work, we present the design of an artificial vision application for the calculation of the velocity field of a granular medium at the outlet of a silo. The study of granular flows within silos is a topic of great interest due to its influence on different industrial processes (emptying, mixing, grinding and transfer of material) in industries such as cement, pharmaceutical, food and mining, among others. In the food industry, there are countless examples where silos intervene in production processes, presenting problems associated with the geometry and characteristics of the silo and grains [4–6]. When the silos are not designed properly,
serious difficulties can arise in the discharge flow, leading to non-homogeneity of mixing or blockages in hoppers. Therefore, it is highly relevant to know the characteristics of the granular flow. This work focuses on determining the velocity profile at the outlet of a silo using artificial vision techniques: (a) video enhancement and (b) image processing in the frequency domain using the Fast Fourier Transform.
2 Experimental Setting Under Study
The device used in the experiments is shown in Fig. 1 [6]. It consists of a quasi-2D silo, with acrylic walls and mobile rods that allow varying the silo outlet opening and the angle of inclination of the hopper. The transparent walls allow visualizing the flow during discharge. The granular materials used are seeds typically employed in the food industry: black sesame, millet and canary seeds (Fig. 1). The shape and color of the seeds present a hard challenge due to the difficulty of identifying them during the rapid discharge close to the outlet. Also, in these experiments, the hopper angle is 90° (measured from the vertical), known as a flat silo. Furthermore, the outlet opening is large enough to avoid blockages. This configuration presents a rapid discharge zone in the center of the silo and stagnation zones near the walls.
Fig. 1. (a) Experimental device. (b) Grains (from top to bottom): millet, canary seeds and sesame.
Particle image velocimetry (PIV) determines the displacement of particles in a certain flow using two images captured within a known time interval. Thus, a video or image sequence of the silo discharge to be analyzed is required. In our experiments the video is captured with a digital camera IDS UI-3160CP-C-HQ Rev.2.1, with a resolution of 900 × 400 pixels at 100 fps for millet seed and 940 × 328 pixels at 142 fps for black sesame. Input frame examples can be seen in Fig. 2.
Fig. 2. Input frame - Millet (top) and black sesame (bottom)
Through PIV processing, the displacement (magnitude and direction) of the particles, and therefore their velocity, can be determined [7]. To implement this technique, each frame of the video is divided into a certain number of areas distributed over the image (interrogation windows).
3 Particle Image Velocimetry Algorithm: Design and Implementation

The proposed algorithm employs artificial vision techniques and has three main stages: (1) frame enhancement, (2) a particle image velocimetry (PIV) algorithm to detect seed motion and (3) motion vector debugging.

3.1 Stage 1: Frame Enhancement Through Neural Networks
This stage improves the quality and appearance of each frame affected by experimental conditions (such as non-homogeneity in lighting, low brightness, noise,
color distortion), in order to improve the subsequent tasks. We employ two techniques: (a) one based on simple image processing, which involves conversion of the input frame to the HSV color space, followed by a per-element bit-wise conjunction with a predefined mask, and (b) one based on a neural network using the WESPE [8] architecture, an image-to-image Generative Adversarial Network-based architecture with openly available models and code. Both techniques were included in the video-processing pipeline in separate experiments. Regarding the neural network implementation, the training was performed using strong supervision, with the DPED dataset introduced in [9], with some modifications to the original architecture: (i) the weights for each loss were modified: w_content (reconstruction): 0.2, w_color (GAN color): 25, w_texture (GAN texture): 3, w_tv (total variation): 1/600; (ii) for the content loss, the relu_2_2 layer from VGG19 was used. The training parameters were configured as follows: learning rate: 0.0001, batch size: 32, train iterations: 20000. It should be observed that implementing a unique pre-processing technique makes the deployment of Stage 1 independent of the input video, i.e. it permits processing different types of seeds with the same algorithm, avoiding specific techniques for each seed and generating a robust long-term processing technique. Our experiments showed that, on the contrary, traditional enhancement methods without neural networks needed to be modified and tuned differently for the different cases. Once the enhancement is performed, each frame is converted to gray scale to be used as input in the next stage. Figure 3 shows the input frame in gray scale (left) and the output frame of this stage using the neural network (right).
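As a rough illustration of pre-processing technique (a), the following OpenCV sketch (ours, not the authors' code; the mask is assumed to be provided externally, e.g. to suppress regions outside the hopper) performs the HSV conversion and the per-element bit-wise conjunction described above:

```python
# Hedged sketch of technique (a): HSV conversion, masking, and gray-scale output.
import cv2
import numpy as np

def enhance_simple(frame_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """frame_bgr: input camera frame; mask: single-channel 8-bit mask (assumed given)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    masked = cv2.bitwise_and(hsv, hsv, mask=mask)   # keep only the masked-in pixels
    bgr = cv2.cvtColor(masked, cv2.COLOR_HSV2BGR)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)    # gray-scale input for the PIV stage
```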
3.2 Stage 2: Particle Image Velocimetry Algorithm
In this stage we determine the displacement of the particles within each interrogation window. An optimal window size has to be used in order to obtain high accuracy without generating invalid vectors. Here, the image is divided into 18 × 8 (millet) and 18 × 6 (sesame) windows. By decreasing the window size, the number of resulting speed vectors increases and thus it is possible to estimate the direction and speed of the seeds with better accuracy. However, this is not always optimal: if the window size is smaller than the size of the seed, the real movement of the seeds may not be detected, generating invalid displacement vectors, and if the size of the windows is very large, a loss of information occurs. We then calculate the Fast Fourier Transform (FFT) of each interrogation window, and the spectra of corresponding interrogation windows in subsequent frames are multiplied. Finally, we calculate the Inverse Fourier Transform to obtain the position having the maximum correlation value. This information results in the displacement vector of the grains within that window.
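As a minimal sketch of this stage (ours, not the authors' code; the grid size, mean removal and sign conventions are assumptions), the displacement of each interrogation window between two consecutive gray-scale frames can be obtained through FFT-based cross-correlation:

```python
# Hedged sketch: FFT cross-correlation of interrogation windows between two frames.
import numpy as np

def window_displacement(win_a: np.ndarray, win_b: np.ndarray) -> np.ndarray:
    """Return the (dy, dx) shift, in pixels, of win_b with respect to win_a."""
    a = win_a.astype(float) - win_a.mean()                      # remove DC component
    b = win_b.astype(float) - win_b.mean()
    corr = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    corr = np.fft.fftshift(corr)                                # zero displacement at the center
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    return peak - np.array(corr.shape) // 2

def piv_field(frame_a: np.ndarray, frame_b: np.ndarray, rows: int = 8, cols: int = 18) -> np.ndarray:
    """Displacement field over a rows x cols grid of interrogation windows."""
    h, w = frame_a.shape
    wh, ww = h // rows, w // cols
    field = np.zeros((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            ys, xs = slice(r * wh, (r + 1) * wh), slice(c * ww, (c + 1) * ww)
            field[r, c] = window_displacement(frame_a[ys, xs], frame_b[ys, xs])
    return field   # dividing by the inter-frame time gives the velocity field
```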
Fig. 3. Input frame in gray scale (top). Output of the pre-processing stage using neural network (bottom)
3.3 Stage 3: Debugging the Motion Vectors
Incorrect vectors are inevitable in the processing due to several factors: the size of the windows, stagnant or almost immobile seeds at the sides of the silo, and very fast motion of the seeds between two subsequent frames, among others. During debugging, vectors undergo reduction, validation and replacement. This is done by comparing each resulting vector with those obtained in the neighboring windows. If there are inconsistencies, the vector is eliminated and replaced by the average of all neighboring windows. Finally, with the calculated displacement and the time interval between frames, we determine the velocity field along the hopper.
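A possible sketch of this debugging step (ours, not the authors' code; the deviation threshold is an assumption) compares each vector with the average of its 8-neighborhood and replaces outliers:

```python
# Hedged sketch: validation and replacement of displacement vectors via neighbor comparison.
import numpy as np

def debug_vectors(field: np.ndarray, max_dev: float = 2.0) -> np.ndarray:
    """field: (rows, cols, 2) displacement vectors; max_dev: assumed outlier threshold in pixels."""
    rows, cols, _ = field.shape
    cleaned = field.copy()
    for r in range(rows):
        for c in range(cols):
            neighbors = [field[i, j]
                         for i in range(max(0, r - 1), min(rows, r + 2))
                         for j in range(max(0, c - 1), min(cols, c + 2))
                         if (i, j) != (r, c)]
            mean = np.mean(neighbors, axis=0)
            if np.linalg.norm(field[r, c] - mean) > max_dev:
                cleaned[r, c] = mean   # replace inconsistent vector with the neighborhood average
    return cleaned
```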
4 Embedded Implementation Using System on Chip
The implementation of increasingly complex systems is possible due to the development of modern technologies that allow on-chip systems to include the Processing system (PS) and Programmable logic (PL) in a single integrated system. These technologies permit a Hardware/Software (H/S) co-design for a reduction in processing time. In this context, with PYNQ (Python + Zynq) framework [10] we can create applications with SoC and MPSoC devices, using Python through
Jupyter Notebook, at the PS level, and certain available hardware configurations of the PL, through the so-called overlays. The main algorithm implementing the PIV technique was developed and tested on a CPU and, after verifying its correct functionality, it was ported to the System on Chip using a Jupyter Notebook running on the processing system over an Ethernet connection. The measurement of the execution times is the first step towards the H/S co-design planned for future works, aiming at a final embedded implementation of the complete system with real-time processing on a portable device.
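For reference, execution times of this kind can be collected with a simple harness like the following sketch (ours, not the authors' code; the processing hook and file name are placeholders), run identically on the CPU and on the PYNQ-Z1 processing system:

```python
# Hedged sketch: timing the per-frame processing chain over N frames of a discharge video.
import time
import cv2

def time_pipeline(video_path: str, n_frames: int, process_pair) -> float:
    """process_pair(prev, curr) stands for the enhancement + PIV + debugging chain."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    start = time.perf_counter()
    for _ in range(n_frames):
        ok, curr = cap.read()
        if not ok:
            break
        process_pair(prev, curr)
        prev = curr
    elapsed = time.perf_counter() - start
    cap.release()
    return elapsed
```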
5 Results
Experimental setup: the algorithm was implemented and executed on a CPU Core i7 3.4 GHz, 64 GB RAM, GeForce GTX 1070, using the Python 3.6.7, TensorFlow 1.12 and OpenCV 3.4.1 libraries. For the embedded implementation, the PYNQ-Z1 board from Xilinx was used. The input videos were captured with a digital camera IDS UI-3160CP-C-HQ Rev.2.1, with a resolution of 900 × 400 pixels at 100 fps for millet seed and 940 × 328 pixels at 142 fps for black sesame. Figure 4 (a) and (b) show the results (velocity field) for the millet and sesame seeds. It can be noted that this technique predicts quite well the motion of the grains in the different zones of the hopper. Possible errors can be related to: stagnation zones (grains move very slowly, or short displacement every many
Fig. 4. Velocity field of the grains: (a) Millet (b) Black sesame.
frames), or very fast motion (central region with fluctuations of the flow). It is also important to note that the results for sesame seeds are very good despite the fact that these grains have a very different geometry and color compared to the millet. Regarding the execution times, the experiments were carried out by processing 50, 100, 150 and 200 frames of the discharge on different technologies and with 3 types of grain as input: a CPU implementation and a PYNQ-Z1 implementation, both with simple image processing at the pre-processing level, and a CPU implementation using the neural network. The results are presented in Table 1. As we can observe, the rise in execution time is directly proportional to the increase in processed frames. Also, the differences between the involved technologies are related to the processing system: CPU with an i7 and PYNQ-Z1 with a dual-core Cortex-A9. Despite the run time differences, the algorithm was fully implemented in the embedded system, enabling the next stage of H/S co-design.

Table 1. Execution times in seconds: (A) CPU implementation with simple image processing, (B) CPU implementation using neural network, (C) PYNQ-Z1 implementation with simple image processing.

(A) CPU [sec]
Frames   Millet   Black sesame   Canary seed
50       7.69     4.69           9.76
100      10.80    6.93           13.13
150      13.96    9.29           16.56
200      20.04    11.57          20.07

(B) CPU NN [sec]
Frames   Millet   Black sesame   Canary seed
50       26.95    21.58          28.29
100      50.5     38.8           50.17
150      74.24    55.19          71.88
200      97.08    72.8           93.75

(C) PYNQ-Z1 [sec]
Frames   Millet   Black sesame   Canary seed
50       102.88   79.84          154.15
100      221.01   136.35         234.57
150      300.84   192.67         318.81
200      415.08   254.49         405.06
The speed of the grains at the final line of the outlet vs. position is shown in Fig. 5. With 200 processed frames, better results are obtained (fewer fluctuations), even if the execution time is larger than the one obtained with 50 frames. The behavior of the speed observed at the exit of the silo (Fig. 5) is the expected one for a granular discharge [6], that is, the speed in the center of the hopper is higher and, as the grains approach the edges of the outlet opening, it tends to zero. Besides, the curves have the shape of an arc (the typical arc formed by the particles at the outlet, where they describe a free fall). It can also be observed that black sesame grains present higher velocities than the millet ones. This may be due to the fact that the seeds have different characteristic sizes. Nevertheless, in the analyzed cases the width of the outlet opening is different (to avoid blockages); thus, a more in-depth study should be carried out to analyze the dependence of the speed at the outlet on the size of the seeds and the outlet width. When comparing the displacements of the seeds found in the different interrogation windows through PIV with those determined visually, the differences are of the order of 10%. Also, inside the hopper, these differences are more noticeable closer to the walls of the silo. This may be due to the transition of the behavior from the fast discharge zone to the stagnant zone. In this transition,
the displacements of the grains are quite small and some of them do not even move. On the other hand, the enhancement stage improves the subsequent processing, and a trade-off between the pre-processing stage and its impact on the final velocity field must be taken into account. With the neural network, run times are longer than with the simple processing for the enhancement task, but using WESPE we can obtain more vectors in the final velocity field along the hopper.
Fig. 5. Speed at the final line in pix/sec: Millet (top) and black sesame (bottom)
6 Conclusions and Future Work
In this work we developed an algorithm for the analysis of the motion of different granular materials within a silo hopper. Different stages were carried out in order to improve the motion detection. This goal was successfully achieved for all the tested cases and, in particular, the number of frames used in the process proves relevant to reduce fluctuations in the obtained velocity. The algorithm was executed using different technologies (CPU and PYNQ-Z1) and different pre-processing methods in stage 1 (image processing and neural network). As expected, when comparing the execution times, they were lower for the case of
the CPU implementation. Nevertheless, with these results and the new innovations in neural networks and FPGAs, in future work we can improve the execution times to obtain a good compromise between processing time and a solid pre-processing stage, in order to generate a robust and precise velocity field. Also, future efforts will be dedicated to testing more types of seeds, to processing the velocity field in the entire silo (not only the area of the outlet) and to integrating the video camera into the embedded system.
References
1. Wang, Y., Zhang, J., Cao, Y., Wang, Z.: A deep CNN method for underwater image enhancement. In: IEEE International Conference on Image Processing (ICIP), Beijing, pp. 1382–1386 (2017)
2. Kadar, M., Onita, D.: A deep CNN for image analytics in automated manufacturing process control. In: 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Pitesti, Romania, pp. 1–5 (2019)
3. Ko, S., Yu, S., Kang, W., Park, C., Lee, S., Paik, J.: Artifact-free low-light video enhancement using temporal similarity and guide map. IEEE Trans. Ind. Electron. 64(8), 6392–6401 (2017)
4. Job, N., Dardenne, A., Pirard, J.P.: Silo flow-pattern diagnosis using the tracer method. J. Food Eng. 91(1), 118–125 (2009)
5. Eurocode 1, Basis of design and actions on structures – 4: Actions in silos and tanks (1998)
6. Villagrán, C.: Efecto de los parámetros de forma de los granos y del ángulo de inclinación de la tolva en el flujo de semillas en silos. Trabajo Final de Licenciatura en Física – FCFMyN – UNSL, and references within (2018)
7. Westerweel, J.: Fundamentals of digital particle image velocimetry. Meas. Sci. Technol. 8(12), 1379–1392 (1997)
8. Ignatov, A., Kobyshev, N., Timofte, R., Vanhoey, K., Van Gool, L.: WESPE: weakly supervised photo enhancer for digital cameras. In: IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2018)
9. Ignatov, A., Kobyshev, N., Timofte, R., Vanhoey, K., Van Gool, L.: DSLR-quality photos on mobile devices with deep convolutional networks (2017)
10. PYNQ – http://www.pynq.io/. Accessed Aug 2020
Analysis and Design of a Yolo like DNN for Smoke/Fire Detection for Low-cost Embedded Systems

Alessio Gagliardi(B), Marco Villella, Luca Picciolini, and Sergio Saponara

Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy
[email protected]
Abstract. This paper proposes a video-based fire and smoke detection technique to be implemented as an antifire surveillance system on low-cost and low-power single board computers (SBC). The algorithm is inspired by YOLO (You Only Look Once), a real-time state-of-the-art object detector able to classify and localize several objects within a single camera frame. Our architecture is based on three main segments: Bounding Box Generator, Support Classifier and Alarm Generator. The custom Yolo network was trained using datasets already available in the literature and tested against classical and DL (Deep Learning) algorithms, achieving the best performance in terms of accuracy, F1, recall and precision. The proposed technique has been implemented on four low-cost embedded platforms, which are compared with respect to the frame rate they can achieve in real time.
1 Introduction

Video surveillance involves the action of observing a scene and looking for specific incorrect behavior, emergency situations or dangerous conditions. For that purpose, we develop a video surveillance system for fire and smoke detection by designing a video camera-based algorithm. Similar methods have already been proposed in the literature, e.g. those that use classic techniques such as background subtraction using Visual Background Estimation (ViBe) [1], fuzzy logic [2] and Gaussian Mixed Model (GMM) [3] for recognition. Other approaches exploit the Kalman filter [4] for the estimation of the moving smoke blobs, while other authors propose modern Neural Network object detection, retraining existing models or creating new ones from scratch [5–8]. While traditional video-based smoke detectors use hand-designed features such as colour and shape characteristics of smoke, the recent Deep Learning algorithms allow for automatic data-driven feature extraction and classification from raw image streams [6, 16, 17]. In [6] the authors propose an image classification method to detect fire and smoke instances in various videos using a simple Deep Neural Network (DNN) classifier. In [16] and [17] the authors fine-tuned two similar You Only Look Once (YOLO) [9] networks to detect fires in indoor scenarios. However, the main issues of these studies involve the implementation on an embedded system or the dataset used for the training phase. In fact, both studies [16] and [17] lack an implementation on embedded systems and use a limited dataset. In [16] 1720 images were used, while in [17]
only 60 were used. It is clear that these two networks are able to cover only indoor scenarios, according to their datasets, and are therefore difficult to apply in real-world scenarios. The work in [6] instead proposes a lightweight Convolutional Neural Network (CNN) and an implementation on a Raspberry Pi 3. However, such a CNN is not able to identify the objects inside each camera frame, as YOLO does, but works as an image classifier. This means that the frame must be completely covered by flames to achieve good performance, and therefore such a method is not applicable to fire prevention systems. The goal of the project is to build a solution that combines the robustness and accuracy provided by a Neural Network (NN) model with the portability needed for an embedded implementation. Hence, a Neural Network inspired by YOLO [9] was designed from scratch. Subsequently, we added an alarm generator algorithm, based on the persistence of anomalies in the image, to increase robustness. This solution has finally been deployed on four different embedded systems with different price and capabilities. The main contributions of this work are as follows:
• We introduce a YOLO-like neural network for fire and smoke detection, able to consistently detect smoke and fire without generating false alarms, compared to state-of-the-art algorithms.
• We also introduce a new heterogeneous fire and smoke training dataset combining images from different indoor and outdoor scenarios. The entire dataset is finally composed of 90 danger videos, 1200 danger images, 28 neutral videos, and 600 neutral images.
• We also present implementations on four low-cost embedded systems, comparing them with respect to their real-time processing performance and power consumption. This comparison is necessary to verify which specification fits well for an IoT application.
Hereafter, the paper is organized as follows: Sect. 2 deals with the description of the global Neural Network. Section 3 describes the configuration parameters used in the training phase and the dataset. Section 4 reports a short comparison with respect to state-of-the-art algorithms and a comparison of real-time processing on different boards. Conclusions are reported in Sect. 5.
2 The Deep Neural Network Architecture

YOLO [9] is one of the fastest object detection methods, where a single CNN simultaneously predicts multiple bounding boxes and class probabilities for those boxes. This method has become popular because it achieves high accuracy while also being able to run in real time. The algorithm “only looks once” at the image, in the sense that it requires only one forward propagation pass through the neural network to make predictions and then output multiple bounding boxes. The YOLO method divides the input image into an S × S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell produces a prediction of bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and also how accurate it thinks the predicted box is. The authors of YOLO define the
confidence as Pr(Object) · IOU, where IOU is the Intersection over Union between the predicted box and the ground truth. The confidence is equal to zero if no object exists in that cell; otherwise, the confidence score equals the IOU between the predicted box and the ground truth. After that, the class-specific confidence score for each grid cell is defined as:

Pr(Class_i | Object) · Pr(Object) · IOU = Pr(Class_i) · IOU

The bounding box with the highest probability value is then selected, which is used as a separator of one object from another, as shown in Fig. 1.
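As a small illustration (ours, not from the paper), the IOU used in these scores can be computed from two boxes in the (x_center, y_center, width, height) encoding adopted by the network:

```python
# Hedged sketch: Intersection over Union (IoU) of two boxes given as (x, y, w, h).
def iou(box_a, box_b):
    def to_corners(b):
        x, y, w, h = b
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)

    # intersection rectangle (may be empty)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```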
Fig. 1. Detection of multiple objects using the YOLO method.
The original YOLO method uses a 7 × 7 grid and 24 convolutional layers with two fully connected layers, for a total model size of almost 200 MB. Hence, the idea is to design a custom neural network that keeps the same efficiency and speed as the YOLO method while saving memory space, obtaining a model smaller than 200 MB and with fewer than 24 convolutional layers, so as to fit in an embedded system.
Fig. 2. Picture of the full deep network architecture. Blue layers represent convolutional blocks; red layers represent fully connected blocks. On top, the Bounding Box Generator subnet; on the bottom, the Support Classifier.
Inspired by YOLO, we developed a Deep Neural Network architecture composed of two different CNNs. The whole architecture receives as input the RGB frames of a video source, performs fire and smoke detection within each input frame and, eventually, triggers a pre-alarm signal. The first CNN is a Bounding Box Generator that has the task of collecting each fire and smoke instance within a frame into a single bounding box. The second CNN is a Support Classifier that has the purpose of rejecting false positives coming from the first sub-network. Furthermore, such a “danger classifier” decides whether to trigger the pre-alarm state by inspecting the last few frames' detections, determining consistency in terms of position, frequency and label. Finally, the Alarm Generator Algorithm checks the persistence of the anomaly and eventually triggers a fire alarm. Figure 2 shows a representation of the full model's architecture: the connection between the first subnet's output and the second subnet's input is handled by cropping the region of the image corresponding to each predicted bounding box. Once cropped, such regions are processed and resized before feeding them to the Support Classifier, which predicts their validity again. Filtered outputs are then stored within a list and checked by the alarm generator algorithm, which represents the final judge on the need to trigger the pre-alarm state.

2.1 The Bounding Box Generator Subnetwork

The Bounding Box Regressor subnet can be further split into two logical portions: the convolutional portion, or Feature Extractor, and the fully connected portion, or Feature Regressor. The first consists of a fully convolutional network that has the role of extracting the desired features from the input 96 × 96 × 3 RGB image; the second is a fully connected segment that stores most of the first subnet's capacity and uses the extracted features to assign bounding boxes. A Global Max-Pooling (GMP) layer interconnects the two portions. This layer yields a much lower-dimensional reduction of the parameters than the one usually implemented with Flatten layers. Each convolutional and fully connected block stores several simpler layers, as depicted in Figs. 2 and 3.
Fig. 3. Convolutional block composition (left) and fully connected block composition (right).
Each Convolutional block consists of a Convolutional layer, ReLU and Batch Normalization. The convolutional portion handles the heavy image processing computations; its layers have been designed with a filter size of 3 × 3, unitary stride, dimension-preserving zero-padding and a depth that ranges from 16 kernels to 512 throughout the whole architecture. The Fully Connected block consists instead of a Dense layer, Leaky ReLU and Batch Normalization. Such layers represent the majority of the model capacity, being the main source of the network's trainable weights. We decided to adopt a deeper stack of smaller layers so that we could keep the network memory efficient by reducing the parameter count and enhance layer expressivity by using a higher number of activation nonlinearities. The Leaky ReLU is selected as activation function because it achieves the best inference computation time while being more robust than the traditional ReLU unit, which can instead cause many dead connections. Max-Pooling layers are implemented with a kernel size of 2 × 2 and a stride of 2. Moreover, an ℓ2-norm regularization with a 0.01 regularizing force has been applied to all trainable layers to discourage skewed large-weight distributions from forming during training. Batch Normalization layers and Dropout layers were used to further reduce over-fitting related issues and enforce the maximum possible regularity in the weight distribution. The Bounding Box Generator subnet returns as output a 3 × 3 × 6 tensor (Fig. 4). The tensor represents a 3 × 3 grid on the image and, for each of the nine cells, there are 6 main data: Detection Probability, X coordinate of the bounding box's center, Y coordinate of the bounding box's center, Width of the bounding box, Height of the bounding box and Class Score (0 if smoke, 1 if fire).
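A possible rendering of these blocks, following the composition of Fig. 3, is sketched below; Keras/TensorFlow is an assumed framework (the paper does not state one), and the block counts and widths in the commented example are assumptions:

```python
# Hedged sketch: convolutional and fully connected building blocks as described above.
from tensorflow.keras import layers, regularizers

def conv_block(x, filters):
    # Conv 3x3, stride 1, dimension-preserving zero-padding, L2 regularization 0.01
    x = layers.Conv2D(filters, (3, 3), strides=1, padding="same",
                      kernel_regularizer=regularizers.l2(0.01))(x)
    x = layers.ReLU()(x)
    x = layers.BatchNormalization()(x)
    return x

def fc_block(x, units):
    x = layers.Dense(units, kernel_regularizer=regularizers.l2(0.01))(x)
    x = layers.LeakyReLU()(x)
    x = layers.BatchNormalization()(x)
    return x

# Example chaining (illustrative only):
# x = conv_block(inputs, 16); x = layers.MaxPooling2D(2, strides=2)(x); ...
# x = layers.GlobalMaxPooling2D()(x); x = fc_block(x, 256); out = layers.Dense(3 * 3 * 6)(x)
```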
Fig. 4. The 3 × 3 grid for detection encoding applied to a frame.
2.2 The Support Classifier Subnetwork

The input of the support classifier is an RGB image of 24 × 24 × 3, while the output is the danger score, a 2-dimensional array where the first number is the probability that the input is classified as a danger, and the second is the probability that the input is classified as neutral. We selected a typical Convolutional block architecture with a 2 × 2 pooling between blocks to halve the image size. At the end of the Convolutional block series, there is a Global Max-Pooling 2D layer, which is an ordinary max pooling layer with a pool size
equal to the size of the input, which is useful because it reduces the tensor dimensions and the overall number of trainable weights. Then there are the Fully Connected blocks, each followed by a dropout to prevent overfitting. At the end of the architecture there is the Softmax layer, which is a generalization of binary Logistic Regression classifiers to multiple classes. There are three Convolutional blocks, each composed of a Convolutional layer, an Activation layer and a Batch Normalization layer in series with another Convolutional layer, Activation layer and Batch Normalization layer of the same dimensions. The convolutional and fully connected blocks of this subnetwork share the same architecture already shown in Fig. 3.

2.3 The Alarm Generator Algorithm

The final goal of this work is to develop an efficient video surveillance system, so we must avoid any false positive, i.e. an alarm generated without real danger. To do so, the algorithm checks, over an observation window of one second, every prediction of the two Neural Networks. If at least 60% of the frames in the window are marked as “danger” by both the bounding box generator and the support classifier, then an alarm is triggered. Figure 5 shows the real-time detection on some test videos applied as input to our algorithm. Fire is enclosed in a red bounding box while smoke is enclosed in a blue bounding box. The alarm is generated when the red circle appears on the top left of the image; otherwise the circle displayed is green.
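A minimal sketch of this pre-alarm logic (ours, not the authors' code; the window length in frames is derived from the camera fps, which is an assumption) could look as follows:

```python
# Hedged sketch: alarm decision over a one-second observation window with a 60% threshold.
from collections import deque

class AlarmGenerator:
    def __init__(self, fps: int, threshold: float = 0.6):
        self.window = deque(maxlen=fps)    # per-frame decisions over the last second
        self.threshold = threshold

    def update(self, bbox_danger: bool, classifier_danger: bool) -> bool:
        """Push the decision for the current frame; return True if the alarm fires."""
        self.window.append(bbox_danger and classifier_danger)
        if len(self.window) < self.window.maxlen:
            return False                    # not enough history yet
        return sum(self.window) / len(self.window) >= self.threshold
```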
Fig. 5. Example of fire and smoke detection in tests videos.
3 The Dataset and the Training Procedure

To train these CNNs we use a dataset composed of both images and videos of fire, smoke and neutral scenarios. Some of these videos and images are taken from the “Firesense” dataset available online [10], others from Foggia's [11] and Sharma's [12] datasets, and most of the images from Yuan's dataset [13]. Other fire videos were provided by Trenitalia, which showed interest in equipping the cameras already installed inside the train wagons with intelligent fire detection algorithms. The entire dataset is finally composed of 90 danger videos, 1200 danger images, 28 neutral videos, and 600 neutral images. A total number of 60 danger videos and 22 neutral videos have been selected as test dataset, while 30 danger videos, 6 neutral videos, and all 1800 images
were used for training and validation. We used data augmentation techniques such as shift, rotation and flip to improve the training by increasing the number of images from 2400 to 4800. We labeled the input data of the two neural subnets in different ways. The labeling for the Bounding Box Generator consists in drawing a bounding box by hand around any target (smoke or fire) present in the images or video frames and then reporting the 6 values previously discussed. The labeling for the Support Classifier consists in sorting the cropped images into two different folders, one containing the danger images and the other containing the neutral images. For both CNNs, the chosen learning-rate method is Adam, a stochastic gradient descent method based on adaptive estimation of first-order and second-order moments. A batch size of 64 was configured for the Bounding Box Generator, while a batch size of 32 was chosen for the Support Classifier. While the Bounding Box Generator network was trained for 10000 epochs, 2500 epochs were enough for the Support Classifier to reach convergence. We used a custom loss function for training the first neural network, designed specifically to correctly predict the presence/absence of a target within a cell and, in case of presence of smoke/fire, to learn its bounding box parameters. This was done to reduce training time and to avoid wasting the network's capacity on learning useless parameters, such as the zeros in the depth vector of cells containing no danger. The Support Classifier is, as already reported, a Softmax classifier, and so it uses a binary cross-entropy loss of the form:

L(y, ŷ) = −(1/N) Σ_{i=0}^{N} [ y log ŷ_i + (1 − y) log(1 − ŷ_i) ]   (1)
where y is the ground truth value, ŷ is the prediction and N is the number of samples per batch. The Softmax classifier hence minimizes the cross-entropy between the estimated class probabilities and the “true” distribution. For the training phase we considered three initial learning rate values, from 10^-2 to 10^-5. The value was finally set to 10^-4, achieving the best model after 7363 epochs for the first network and around 2100 epochs for the second, as shown in Fig. 6 with the red marks.
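For illustration only, a training configuration with the hyper-parameters reported above could be set up as in the following sketch (not the authors' code; Keras is an assumed framework and all arguments are placeholders for the models and datasets described in the text):

```python
# Hedged sketch: training both subnets with Adam, learning rate 1e-4, batch sizes 64 and 32.
from tensorflow.keras.optimizers import Adam

def train_subnets(bbox_model, clf_model, bbox_data, clf_data, bbox_loss):
    """bbox_data/clf_data are (x_train, y_train, x_val, y_val) tuples (placeholders)."""
    xb, yb, xbv, ybv = bbox_data
    xc, yc, xcv, ycv = clf_data

    bbox_model.compile(optimizer=Adam(1e-4), loss=bbox_loss)  # custom loss, not reproduced here
    bbox_model.fit(xb, yb, batch_size=64, epochs=10000, validation_data=(xbv, ybv))

    clf_model.compile(optimizer=Adam(1e-4), loss="binary_crossentropy", metrics=["accuracy"])
    clf_model.fit(xc, yc, batch_size=32, epochs=2500, validation_data=(xcv, ycv))
```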
Fig. 6. Training loss and validation loss at a learning rate of 10^-4 for the Bounding Box Generator network (left) and for the Support Classifier network (right).
4 Results of the Proposed DNN Technique

To assess the performance of the proposed technique, the first comparison was performed with respect to non-AI-based algorithms. We therefore tested the performance in terms of accuracy, F1, precision and recall metrics. These tests were conducted on the set of videos presented in [4] and the results are shown in Fig. 7. We can clearly see that our solution, referred to as Custom Yolo, outperforms both [4] and [14, 15] in terms of all four main metrics. We performed another comparison against other AI-based solutions available in the state of the art, using the same dataset. In this case the test compared only accuracy and recall metrics, since those were the only ones available in most of the research literature. Looking at the results in Table 1, it is possible to see how our solution achieves the highest accuracy value and the second-highest recall value, being outperformed only by an R-CNN-based solution. However, R-CNN runs about 25× slower, in terms of frames per second (fps), than our solution on the tested embedded systems. Hence, we tested the algorithm on four different embedded systems with different price and capabilities: a Raspberry Pi 3, a Raspberry Pi 4 with 4 GB of RAM, an Nvidia Jetson Nano and an Nvidia Jetson AGX Xavier. We performed the test by measuring the maximum frames per second that each can achieve at the different power consumption settings, when available. Results are reported in Table 2.
Fig. 7. Performance Comparison with Non-AI-Based Methods [4, 14, 15].
The Nvidia Jetson Xavier obtains the best results in the MAXN configuration, reaching about 80 frames per second. In such configuration, the Xavier board is about 4× faster than the Nvidia Jetson Nano and about 16× faster than the Raspberry Pi 3, which reaches only 5 fps. The Nvidia Jetson Nano seems to be the ideal compromise between performance and power consumption. In fact, the Raspberry Pi 4 does not exceed 10 fps even though it consumes 5 W on average, while the Jetson Nano board reaches about 20 fps in the 10 W consumption configuration and so can be considered as the final platform for an IoT antifire system.
Table 1. Accuracy and Recall of our Custom CNN vs State of the Art AI-based methods.

Reference                     Accuracy   Recall
Custom Yolo                   0.973      0.978
Saponara et al. [18] R-CNN    0.936      1
Jadon et al. [6]              0.939      0.94
Filonenko et al. [7]          0.85       0.96
Yuan et al. [8]               0.86       0.53
Celik et al. [2]              0.83       0.6
Table 2. Average fps values achieved across all the four platforms.

Platform                     Fps
Raspberry Pi 3 Model B       5
Raspberry Pi 4 (4 GB RAM)    10
Nvidia Jetson Nano 5 W       12
Nvidia Jetson Nano MAXN      20
Nvidia Jetson AGX 10 W       26
Nvidia Jetson AGX 15 W       33
Nvidia Jetson AGX MAXN       81
5 Conclusion

In this paper we presented the design, implementation, and a short comparison of a DNN algorithm for smoke and fire detection inspired by the YOLO technique. We have shown that, through a dedicated design procedure, it is possible to build DNN-based object detection models that preserve and improve the flexibility and performance of traditional state-of-the-art solutions. We also achieved better performance than similar algorithms based both on classic image processing methods and on modern AI techniques, implementing the technique on different embedded systems. As future work, the authors intend to continue the investigation with respect to the state-of-the-art algorithms by comparing the detection delay time, CPU, GPU, and memory usage, and power consumption on the already presented boards. The authors are further investigating a possible implementation of the described methodology on FPGA. Acknowledgments. Work partially supported by the Dipartimenti di Eccellenza Crosslab Project by MIUR.
References 1. Vijayalakshmi, S.R., Muruganand, S.: Smoke detection in video images using background subtraction method for early fire alarm system. In: 2017 2nd International Conference on Communication and Electronics Systems (ICCES), pp. 167–171 (2017) 2. Çelik, T., Özkaramanlı, H., Demirel, H.: Fire and smoke detection without sensors: image processing-based approach. In: IEEE 15th European Signal Processing Conference, pp. 1794– 1798 (2007) 3. Zhang, Q., et al.: Dissipation function and ViBe based smoke detection in video. In: 2017 2nd International Conference on Multimedia and Image Processing (ICMIP). IEEE, pp. 325–329 (2017) 4. Gagliardi, A., Saponara, S.: AdViSED: advanced video smoke detection for real-time measurements in antifire indoor and outdoor systems. Energies, 13(8), 2020 5. Ehlnashi, A., Gagliardi, A., Saponara, S.: Exploiting R-CNN for video smoke/fire sensing in antifire surveillance indoor and outdoor systems for smart cities. In: IEEE 6th International Conference on Smart Computing (SSC), Bologna, Italy, September 2020 6. Jadon, A., et al.: Firenet: a specialized lightweight fire & smoke detection model for real-time iot applications. arXiv preprint arXiv:1905.11922 (2019) 7. Filonenko, A., Hernández, D.C., JO, K.-H.: Fast smoke detection for video surveillance using CUDA. IEEE Trans. Ind. Inform. 14(2), 725–733 (2017) 8. Yuan, F., Fang, Z., Wu, S., Yang, Y., Fang, Y.: Real-time image smoke detection using staircase searching-based dual threshold AdaBoost and dynamic analysis. IET Image Process. 9(10), 849–856 (2015) 9. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788 (2016) 10. Dimitropoulos, K., Barmpoutis, P., Grammalidis, N.: Spatio-temporal flame modeling and dynamic texture analysis for automatic video-based fire detection. IEEE Trans. Circ. Syst. Video Technol. 25(2), 339–351 (2014) 11. Foggia, P., Saggese, A., Vento, M.: Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape, and motion. IEEE Trans. Circ. Syst. Video Technol. 25(9), 1545–1556 (2015) 12. Sharma, J., Granmo, O.C., Goodwin, M., Fidje, J.T.: Deep convolutional neural networks for fire detection in images. In: International Conference on Engineering Applications of Neural Networks, pp. 183–193. Springer, Cham August 2017 13. Yuan, F., Shi, J., Xia, X., Fang, Y., Fang, Z., Mei, T.: High-order local ternary patterns with locality preserving projection for smoke detection and image classification. Inf. Sci. 372, 225–240 (2016) 14. Saponara, S., Pilato, L., Fanucci, L.: Early video smoke detection system to improve fire protection in rolling stocks. In: Real-Time Image and Video Processing 2014, vol. 9139, p. 913903. International Society for Optics and Photonics, May 2014 15. Saponara, S., Pilato, L., Fanucci, L.: Exploiting CCTV camera system for advanced passenger services on-board trains. In: 2016 IEEE International Smart Cities Conference (ISC2), pp. 1–6. IEEE, September 2016 16. Shen, D., Chen, X., Nguyen, M., Yan, W.Q.: Flame detection using deep learning. In: 2018 4th International Conference on Control, Automation and Robotics (ICCAR), pp. 416–420. IEEE April 2018
17. Lestari, D.P., Kosasih, R., Handhika, T., Sari, I., Fahrurozi, A.: Fire hotspots detection system on CCTV videos using you only look once (YOLO) method and tiny YOLO model for high buildings evacuation. In 2019 2nd International Conference of Computer and Informatics Engineering (IC2IE), pp. 87–92. IEEE September 2019 18. Saponara, S., Elhanashi, A., Gagliardi, A.: Real-time video fire/smoke detection based on CNN in antifire surveillance systems. J. Real-Time Image Proc., 1–12 (2020)
Video Grasping Classification Enhanced with Automatic Annotations Edoardo Ragusa(B) , Christian Gianoglio, Filippo Dalmonte, and Paolo Gastaldo Department of Electrical, Electronic, Telecommunication Engineering and Naval Architecture DITEN, University of Genoa, Genova, Italy {edoardo.ragusa,christian.gianoglio}@edu.unige.it, [email protected], [email protected]
Abstract. Video-based grasp classification can enhance robotics and prosthetics. However, its accuracy is low when compared to e-skin based systems. This paper improves video-based grasp classification systems by including an automatic annotation of the frames that highlights the joints of the hand. Experiments on real-world data prove that the proposed system obtains higher accuracy with respect to the previous solutions. In addition, the framework is implemented on a NVIDIA Jetson TX2, achieving real-time performances. Keywords: Grasping classification systems
· CNNs · Prosthetics · Embedded systems

1 Introduction
Automatic inference of grasping actions can boost fields like prosthetics and rehabilitation. Artificial intelligence techniques in combination with electronic skin proved effective for this task [11,21]. However, tactile sensors are costly and limit grasping and manipulation actions [5,7]. Besides, the sensing system might be annoying for patients. Video-based models for hand grasp analysis can represent a non-invasive solution [6,15]. These approaches are possible thanks to deep learning. In fact, computer vision can address complex tasks like medical image analysis [1], sentiment analysis [16] and sports application [10] thanks to automatic features learning capabilities. However, the deployment of computer vision on embedded devices is tricky and requires ad-hoc solutions [18]. In practice, the most accurate solutions can be rarely deployed in real-time on embedded systems. Video grasp detection is a challenging task composed of two sub-tasks: the system locates the hand into the image, then, it classifies the grasp action. The literature provides three works that addressed both tasks. In [24] the authors proposed a solution based on convolutional neural networks (CNNs) that identifies hand grasping by filtering patches from an image. The CNN architecture was composed of 5 convolutional layers. Patches identification used an c The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 S. Saponara and A. De Gloria (Eds.): ApplePies 2020, LNEE 738, pp. 23–29, 2021. https://doi.org/10.1007/978-3-030-66729-0_3
ensemble of three classifiers proposed in [14]. Similarly, [4] approached the two tasks by employing a heterogeneous set of computer vision techniques for both hand detection and feature extraction. [17] introduced a novel framework for video-based grasping classification. Deep Learning (DL) supported the two-level schema thanks to an automatic refinement of the existing databases. In addition to the closely related solutions, many works offer interesting insights: [15] estimated contact force using visual tracking between users' hands and objects. In [12] the video analysis models the manipulation of deformable objects. In [8] deep learning (DL) methods discriminate right and left hands inside an image. [9,13] enriched the processing system of a prosthetic hand with a video system. Finally, hand gesture recognition techniques [22] offer valuable insight for grasp classification. Despite the interesting results shown in [4,14,17], accuracy is still a primary concern in video-based solutions. In practice, inference systems detect hand positions with high accuracy, but they struggle in recognizing the grasp action. This is mostly due to intrinsic problems like self-occlusion and dataset limitations. In fact, labeling operations are time-consuming and prevent large datasets. In addition, during grasp actions, parts of the hand are occluded. This paper explicitly tackles these issues using an automatic annotation of the images. The proposed solution uses the approach proposed in [20], where the authors trained a CNN to recognize the joints of the hand using a dataset composed of multiple views of the hand. In practice, thanks to a projection strategy, the network was trained to detect also the joints that were self-occluded. This network was included in the prediction pipeline proposed in [17]. The information extracted about joint positions is superimposed on RGB images using colored segments that connect the joints of the hand. Accordingly, these annotations add valuable information for the grasping action recognition. Notably, using the superimposed annotations avoids the development of a custom dataset containing multiple views of hands with the label of the grasping action. This annotation is expected to simplify the classification task, reducing the need for large datasets and alleviating the self-occlusion problem, leading to better generalization performance. Experiments on real-life videos prove that the framework overcomes recent solutions in terms of accuracy. In addition, experiments on the Nvidia Jetson TX2 confirm that the proposed solution still has real-time performance on high-performance edge devices suitable for embedded implementations.
2 Proposal
This paper extends the solution proposed in [17] by including a novel block in the processing flow. In the following, in accordance with previous works, the right hand is the target of the analysis. Figure 1 shows the novel solution: the Hand detection network locates all the hands inside an input image (the red boxes in the figure "Detections"). The Right hand heuristic block selects the right hand, using a well-known heuristic. The selected patch feeds the new
Fig. 1. Outline of the grasping classification pipeline
Annotation network, marked in green. This network superimposes a skeleton over the hand under analysis, as shown in the "Annotated Patch" sub-image. Finally, the Grasping Classification Network outputs a grasping label based on the "Annotated Patch". This work considers a bi-class output: grasp versus pinch. The new block simplifies the classification task by highlighting the parts of the hand. Accordingly, the training procedure is expected to converge more easily and the inference process should become more accurate. In the following subsections, the paper describes the Annotation Network and the training procedure.

2.1 Annotation Network
The annotation network superimposes a skeleton on the image of the hand. In this work, the model proposed in [20] is the core of the annotation network. This solution relies on a deep CNN called Convolutional Pose Machine [23]. The outputs of the network are 21 heatmaps, with values corresponding to the probability of each pixel being part of one of the joints of the hand. Figure 2 exemplifies the working scheme: sub-figure (a) shows the 21 joints of the hand, sub-figure (b) depicts the role of the probability distribution, i.e. the outputs of the network. The Convolutional Pose Machine proves effective in this hard task thanks to the custom training procedure proposed in [20]. In practice, the dataset always contains multiple versions of a picture collected simultaneously from different, known angulations: in other words, multiple projections of the hands are available. Using an iterative procedure, a detector produces labels for all the images. The predictions from multiple views of an object are triangulated. When triangulation confirms that the predictions were consistent, the images are added to the training set. Finally, a simple software component superimposes the skeleton on the image given the coordinates of the joints, drawing segments that connect the predictions, as per Fig. 2(a).
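As an illustration of this last step, the following sketch extracts one (x, y) location per joint from the 21 heatmaps and draws the connecting segments on the patch; the heatmap array, the joint-pair list, and the use of OpenCV are assumptions for the example, not details taken from the original implementation.

import numpy as np
import cv2

def annotate_patch(patch_bgr, heatmaps, joint_pairs, color=(128, 128, 128)):
    # patch_bgr: H x W x 3 image of the cropped right hand
    # heatmaps: 21 x H x W array, one per-joint probability map
    # joint_pairs: list of (i, j) joint indices to connect (assumed)
    # color: BGR triple used for the segments
    joints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)  # most likely pixel per joint
        joints.append((int(x), int(y)))
    out = patch_bgr.copy()
    for i, j in joint_pairs:
        cv2.line(out, joints[i], joints[j], color, 2)
    return out

The color passed to the drawing call is precisely the design knob explored later in Sect. 3.1 (multicolor, single base colors, or grey).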
Fig. 2. Representation of the hand’s annotation outputs (Color figure online)
2.2 Training Procedure
The grasping classification is a challenging task. Accordingly, the training process needs large datasets [3,19]. The existing datasets are noisy, lack the hands' position inside the frame, and contain a modest amount of images. To overcome these issues this paper improves the multi-step learning approach proposed in [17]. A deep network pre-trained on the task of object detection is fine-tuned on the hand detection problem using a small dataset of labeled images, i.e. images containing annotations of the hands' position. In practice, only hundreds of images are sufficient to reach training convergence. Thus, the small annotated training set puts the basis for the development of the whole system for hand grasping classification. Afterwards, the trained Hand detection network infers the position of the hands in all the frames of an existing grasping dataset, producing a large labeled dataset, i.e., a dataset in which the position of the grasping hand inside a frame is known. Accordingly, the Right hand heuristic can be utilized to identify the right hand in all the frames and to extract the corresponding patches. Then, the Annotation Network annotates all the patches containing the hands that perform the grasping action. Finally, the annotated patches form the training set for the Grasping classification network, which is therefore expected to converge to better solutions. A sketch of this dataset-building loop is shown below.
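The following sketch summarizes the multi-step procedure; the function parameters (detect_hands, select_right_hand, crop, annotate) are hypothetical placeholders for the blocks of Fig. 1, not the actual API of the original implementation.

def build_grasping_training_set(frames, detect_hands, select_right_hand, crop, annotate):
    # frames: iterable of (frame, grasp_label) pairs from an existing grasping dataset
    annotated_patches = []
    for frame, grasp_label in frames:          # grasp_label: pinch or grasp
        boxes = detect_hands(frame)            # step 1: fine-tuned hand detector
        right = select_right_hand(boxes)       # step 2: right-hand heuristic
        if right is None:
            continue
        patch = annotate(crop(frame, right))   # step 3: superimpose the skeleton
        annotated_patches.append((patch, grasp_label))
    return annotated_patches                   # step 4: training set for the classifier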
3 Experimental Results
The experiments assessed both the accuracy and the computational cost of the proposal. SSD-MobileNetV1, Convolutional Pose Machine, and MobileNetV1 architectures support blocks 1, 3, and 4 of Fig. 1, respectively. The hand detection network was trained using Oxford Hands Dataset [14] and Egohands Dataset [2]. The Annotation Network exploits the implementation of the original paper [20]. The classification network was trained on a subset of the Yale Human Grasping Dataset [3]. The remaining part of the dataset was employed as a test set. 500 frames were excluded from the training phase and classified by the complete system. The classification problem was pinch vs
grasp problem, following the setup proposed in [17]. This setup considers only the bi-class classification task because EMG control of a prosthesis allows only a limited set of actions, for example grasp, pinch, and wrist rotation. However, the proposed approach can be extended to fine-grained classification without changes in the setup. The experiments consider only genuine RGB solutions as baseline comparison, because other methods would be out of the scope of this paper. In fact, electronic skin leads to better performance in terms of accuracy, but it is more annoying for a patient.

3.1 Generalization Capability
The experiment compared the proposed solution with the previous version of the system. In addition, the experiments assess the impact of the annotation colors: multicolor assigns a color to each finger, as per Fig. 2; Blue, Red, and Green use the base colors of RGB, so the annotation affects only one channel of the original image; finally, grey uses a constant value for the annotation over all three RGB channels. Table 1 shows the results: the first row describes the previous solution, the following rows report the different versions of the proposal. Four standard metrics measure the performance of the model, one for each column. The best results are marked in bold.

Table 1. Performance of grasping predictors

Model           Accuracy  Precision  Recall  F1
Original [17]   0.81      0.84       0.84    0.84
Multicolor      0.78      0.77       0.86    0.81
Blue            0.78      0.82       0.80    0.81
Red             0.79      0.84       0.81    0.82
Green           0.80      0.81       0.84    0.83
Grey            0.85      0.86       0.87    0.87
Results confirm that the color of the annotation is critical. In fact, only the configuration with grey annotations improved over the original solution; the grey configuration, however, overcomes the previous solution consistently. It is reasonable to assume that the grey annotations yielded the best performance because they affect all three RGB channels equally. Accordingly, the annotations affect the convolution kernels more consistently.

3.2 Implementation
The latter phase concerned the deployment of the predictor on an NVIDIA Jetson TX2. The test used floating-point arithmetic and a Python implementation. Accordingly, the generalization performance is identical to the offline measures.
The proposed solution deployed on the embedded device detects the possible presence of a hand and the actual grasping action with a maximal latency of 200 ms. This performance meets the real-time constraint of 400 ms [11]. Compared to the previous version of the system, the latency increases by 70 ms in the worst case; however, this additional cost is justified by the enhanced generalization performance. When considering continuous acquisitions, the overall power consumption ranges from 5 to 11 W, of which about 7 W are consumed by the GPU. As expected, power consumption on the Jetson TX2 is high. Considering two Li-ion batteries with 3.6 V output and 2900 mAh capacity, used in series, the continuous prediction time ranges from 2 to 4 h. However, a real system should trigger prediction only when needed, greatly increasing battery duration.
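A quick back-of-the-envelope check of the 2 to 4 h figure, under the stated battery assumptions (two 3.6 V, 2900 mAh cells in series) and the measured 5 to 11 W power range:

# Two Li-ion cells in series: voltage adds, capacity stays the same
energy_wh = 2 * 3.6 * 2.9        # about 20.9 Wh available
hours_best = energy_wh / 5.0     # about 4.2 h at the 5 W lower bound
hours_worst = energy_wh / 11.0   # about 1.9 h at the 11 W upper bound
print(round(hours_best, 1), round(hours_worst, 1))  # 4.2 1.9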
4 Conclusion
The paper presented an enhanced video-based solution for grasping classification. An annotation network, included in the classification pipeline, simplifies the classification problem. Overall, the experiments confirmed the improved performances of the new solution. In addition, experiments on a Jetson TX2 module revealed that the proposed setup can process a frame in 200 ms confirming the feasibility of the proposed method for embedded devices.
References 1. Anwar, S.M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., Khan, M.K.: Medical image analysis using convolutional neural networks: a review. J. Med. Syst. 42(11), 226 (2018) 2. Bambach, S., Lee, S., Crandall, D.J., Yu, C.: Lending a hand: detecting hands and recognizing activities in complex egocentric interactions. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1949–1957 (2015) 3. Bullock, I.M., Feix, T., Dollar, A.M.: The Yale human grasping dataset: grasp, object, and task data in household and machine shop environments. Int. J. Robot. Res. 34(3), 251–255 (2015) 4. Cai, M., Kitani, K.M., Sato, Y.: An ego-vision system for hand grasp analysis. IEEE Trans. Hum.-Mach. Syst. 47(4), 524–535 (2017) 5. Chortos, A., Liu, J., Bao, Z.: Pursuing prosthetic electronic skin. Nat. Mater. 15(9), 937 (2016) 6. Fan, Q., Shen, X., Hu, Y., Yu, C.: Simple very deep convolutional network for robust hand pose regression from a single depth image. Pattern Recogn. Lett. 119, 205–213 (2017) 7. Feix, T., Romero, J., Schmiedmayer, H.B., Dollar, A.M., Kragic, D.: The grasp taxonomy of human grasp types. IEEE Trans. Hum.-Mach. Syst. 46(1), 66–77 (2015) 8. Gao, Q., Liu, J., Ju, Z., Zhang, X.: Dual-hand detection for human-robot interaction by a parallel network based on hand detection and body pose estimation. IEEE Trans. Ind. Electron. 66, 9663–9672 (2019) 9. Ghazaei, G., Alameer, A., Degenaar, P., Morgan, G., Nazarpour, K.: Deep learningbased artificial vision for grasp classification in myoelectric hands. J. Neural Eng. 14(3), 036025 (2017)
˙ T.U., Peng, W.C.: TrackNet: a deep 10. Huang, Y.C., Liao, I.N., Chen, C.H., Ik, learning network for tracking high-speed and tiny objects in sports applications. In: 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–8. IEEE (2019) 11. Ibrahim, A., Valle, M.: Real-time embedded machine learning for tensorial tactile data processing. IEEE Trans. Circuits Syst. I Regul. Pap. 99, 1–10 (2018) 12. Li, Y., Wang, Y., Yue, Y., Xu, D., Case, M., Chang, S.F., Grinspun, E., Allen, P.K.: Model-driven feedforward prediction for manipulation of deformable objects. IEEE Trans. Autom. Sci. Eng. 99, 1–18 (2018) 13. Markovic, M., Dosen, S., Popovic, D., Graimann, B., Farina, D.: Sensor fusion and computer vision for context-aware control of a multi degree-of-freedom prosthesis. J. Neural Eng. 12(6), 066022 (2015) 14. Mittal, A., Zisserman, A., Torr, P.H.: Hand detection using multiple proposals. In: BMVC, pp. 1–11. Citeseer (2011) 15. Pham, T.H., Kyriazis, N., Argyros, A.A., Kheddar, A.: Hand-object contact force estimation from markerless visual tracking. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2883–2896 (2017) 16. Ragusa, E., Cambria, E., Zunino, R., Gastaldo, P.: A survey on deep learning in image polarity detection: balancing generalization performances and computational costs. Electronics 8(7), 783 (2019) 17. Ragusa, E., Gianoglio, C., Zunino, R., Gastaldo, P.: Data-driven video grasping classification for low-power embedded system. In: 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 871–874. IEEE (2019) 18. Ragusa, E., Gianoglio, C., Zunino, R., Gastaldo, P.: Image polarity detection on resource-constrained devices. IEEE Intell. Syst. 35, 50–57 (2020) 19. Saudabayev, A., Rysbek, Z., Khassenova, R., Varol, H.A.: Human grasping database for activities of daily living with depth, color and kinematic data streams. Sci. Data 5, 180101 (2018) 20. Simon, T., Joo, H., Matthews, I., Sheikh, Y.: Hand keypoint detection in single images using multiview bootstrapping. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1145–1153 (2017) 21. Sundaram, S., Kellnhofer, P., Li, Y., Zhu, J.Y., Torralba, A., Matusik, W.: Learning the signatures of the human grasp using a scalable tactile glove. Nature 569(7758), 698 (2019) 22. Wang, T., Li, Y., Hu, J., Khan, A., Liu, L., Li, C., Hashmi, A., Ran, M.: A survey on vision-based hand gesture recognition. In: International Conference on Smart Multimedia, pp. 219–231. Springer (2018) 23. Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016) 24. Yang, Y., Fermuller, C., Li, Y., Aloimonos, Y.: Grasp type revisited: a modern perspective on a classical feature for vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 400–408 (2015)
Enabling YOLOv2 Models to Monitor Fire and Smoke Detection Remotely in Smart Infrastructures Sergio Saponara, Abdussalam Elhanashi, and Alessio Gagliardi(B) Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy [email protected]
Abstract. This paper presents the implementation of a centralized antifire surveillance management system based on video cameras. The system provides visualization information and guidance for a quick response to fire and smoke detection. We utilize a deep learning model (YOLOv2) and Jetson Nano boards with Raspberry Pi cameras as Internet of Things (IoT) sensors. The smart cameras will be mounted in indoor and outdoor environments and connected to the centralized computer via Ethernet cables and communication protocols according to an IoT scheme. Specific software will be used in the centralized computer to show the video stream from each camera in real time, while these cameras are responsible for detecting fire and smoke objects and for generating the alarms accordingly. The proposed approach is able to monitor and supervise fire and smoke detection from different cameras remotely. It is suitable for targeted applications such as smart cities, smart transports, or smart infrastructures. Keywords: YOLOv2 · Internet of Things (IoT) · Fire-smoke detection · Jetson nano · Smart cities
1 Introduction

Fire and smoke accidents have become a very big concern, as they cause severe destruction, including loss of human lives and damage to property [1]. Warning and alerting the citizens are of vital importance in terms of emergency management and preparedness in large cities. One of the targets of this research is to deploy integrated mass warning systems which can provide an emergency alert to the population. Several methods and techniques have been used to detect fire and smoke accidents. Most of the traditional methodologies use sensor-based techniques. The drawback of these technologies is that they can detect fire and smoke only in the vicinity where they are installed. Smoke and fire detection sensors are not adequate to provide the location, magnitude, and direction of the fire, and they cannot cover a large area for fire and smoke detection. Other similar works have already been proposed in the literature. The author in [2] developed a distributed video antifire surveillance system based on IoT embedded computing nodes. The system takes advantage of an existing Video Smoke Detection algorithm, providing a Web Application able to detect smoke in real-time from
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 S. Saponara and A. De Gloria (Eds.): ApplePies 2020, LNEE 738, pp. 30–38, 2021. https://doi.org/10.1007/978-3-030-66729-0_4
several cameras distributed in different areas. Although this system can give access to several users via a handy login page and a centralized dashboard, such a technique was only implemented for smoke detection. In our approach, we designed a deep learning model which is able to detect both fire and smoke objects and can be monitored remotely from a centralized management system. In this research, we propose a CNN deep learning detector (YOLOv2) for smoke and fire detection based on a video camera. YOLOv2 is a real-time deep learning model for object detection [3]. By exploiting YOLOv2, it is possible to notify early fire and smoke detection in a timely manner. Fire and smoke detection in an IoT environment is a promising component of early accident-related event detection in smart cities. The target of this paper is to reduce the utilization of sensor-based techniques, data processing, and communication resources to minimize energy consumption in favor of increased battery life, in compliance with regulations. This paper is organized as follows: Sect. 1 introduces the system architecture. Section 2 presents the YOLOv2 implementation and the hardware setup for remote control. Section 3 presents the software implementation. Section 4 discusses the experimental results, and it is followed by Sect. 5, where the smart IoT real-time model for fire and smoke detection is presented. Conclusions and further work are presented in Sect. 6.
2 YOLOv2 Implementation and Hardware Setup for Remote Control

This section introduces the neural network algorithm and the hardware setup components of the proposed system: the design of the YOLOv2 deep neural network, the training, validation, and evaluation of the model, the hardware components, and the system diagram for utilizing YOLOv2 to monitor fire and smoke accidents remotely.

2.1 YOLOv2 Design

The YOLOv2 model has been developed in MATLAB by using the Deep Network Designer toolbox. The deep neural network model was built with 25 layers, as shown in Fig. 1. We established a lightweight architecture to fit into low-cost IoT nodes, permitting a standalone solution for a Smart Antifire System. This lightweight model is suitable for real-time applications and is worth deploying on low-cost embedded devices. The YOLOv2 architecture includes the input layer, a set of middle layers, and YOLOv2-specific layers. Our network accepts as input 128 × 128 pixel RGB images and produces as output the object class probabilities (fire or smoke) and the coordinates of the bounding box. The middle layers used in this architecture consist of convolution, batch normalization, ReLU, and max pooling layers. For such a neural network we decided to use a 3 × 3 kernel size for each convolution layer. All layers after the input layer up to 'ReLU4' are to be considered feature extraction layers. The YOLOv2 subnetwork succeeding those layers was used to permit the object localization typical of the YOLOv2 algorithm. Finally, the output layer was constructed to predict the location and the class of the detected objects, like fire or smoke.
Fig. 1. The architecture of the proposed YOLOv2 neural network
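As a rough illustration of such a backbone, the PyTorch sketch below stacks 3 × 3 convolution, batch normalization, ReLU, and max pooling blocks on a 128 × 128 RGB input; the filter counts per block are assumptions for illustration only, since the original 25-layer model is defined in MATLAB.

import torch.nn as nn

def conv_block(in_ch, out_ch, pool=True):
    # 3x3 convolution + batch norm + ReLU (+ optional 2x2 max pooling)
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

# Feature-extraction backbone up to 'ReLU4'; the YOLOv2-specific detection
# layers (anchor boxes, transform and output layers) would follow it.
backbone = nn.Sequential(
    *conv_block(3, 16),                # input: 3 x 128 x 128 RGB image
    *conv_block(16, 32),
    *conv_block(32, 64),
    *conv_block(64, 128, pool=False),  # 'ReLU4' output feeds the YOLOv2 layers
)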
2.2 Training and Validation

The model was trained with 400 images of fire and smoke. The Ground Truth Labeler application was used to label the regions of interest (ROIs) of our dataset [4]. The designed YOLOv2 was trained with the stochastic gradient descent with momentum (SGDM) optimization method, which helps accelerate gradient vectors in the right directions, thus leading to faster convergence [5].

Table 1. Summary of validation results by ROC tool analysis

Metric             Validation values
Number of images   200
Accuracy           93%
Specificity        80%
Sensitivity        94%
A momentum of 0.9 was used to accelerate the training process of the architecture. We set the learning rate at 10−3 to control the model change in response to the error. YOLOv2 was validated with an independent dataset of 200 fire/smoke images (100 images with no fire/smoke, and 100 images with fire/smoke). According to the results from the Receiver Operating Characteristic (ROC) analysis, the accuracy for this validation reached up to 93%, see Table 1 and Fig. 2.
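For reference, the SGDM update used here (momentum 0.9, learning rate 10−3) accumulates a velocity term across iterations; the sketch below is a generic illustration, not code extracted from the MATLAB training script.

import numpy as np

def sgdm_step(w, grad, velocity, lr=1e-3, momentum=0.9):
    # One stochastic-gradient-descent-with-momentum update
    velocity = momentum * velocity - lr * grad   # smooth the gradient direction
    w = w + velocity                             # move the weights along it
    return w, velocity

# Toy usage on a single weight vector
w = np.zeros(4)
v = np.zeros(4)
grad = np.array([0.5, -0.2, 0.1, 0.0])
w, v = sgdm_step(w, grad, v)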
Fig. 2. ROC curve for validation dataset.
2.3 YOLOv2 Evaluation

The model has been tested with a dataset of videos which includes 170 fire/smoke videos and 117 no-fire/smoke videos. These videos are challenging because of color-based and motion-based objects, and have been collected from various realistic situations of smoke/fire and normal conditions. Following the confusion matrix criteria, different metrics (false-negative rate, false-positive rate, and accuracy) were analyzed to evaluate the performance of the YOLOv2 model for fire and smoke detection, see Eqs. (1–3). The proposed approach achieved promising classification results with an accuracy of 96.82% and outperforms all other methodologies [6, 7] and [8], see Table 2.

False negative rate = FN / (FN + TP)    (1)

False positive rate = FP / (FP + TN)    (2)

Accuracy = (TP + TN) / (TP + FN + TN + FP)    (3)

Where:
• True positive (TP): fire/smoke objects detected in positive videos.
• True negative (TN): no fire/smoke detected in negative videos.
• False positive (FP): fire/smoke objects detected in negative videos.
• False negative (FN): no fire/smoke detected in positive videos.
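A compact helper computing Eqs. (1–3) from the confusion-matrix counts follows; the counts in the usage line are made-up numbers for illustration only, not the paper's actual confusion matrix.

def detection_metrics(tp, tn, fp, fn):
    # False-negative rate, false-positive rate, and accuracy (Eqs. 1-3)
    fnr = fn / (fn + tp)
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return fnr, fpr, acc

# Illustrative counts only
print(detection_metrics(tp=90, tn=85, fp=10, fn=15))  # (0.143, 0.105, 0.875)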
Table 2. Performance of the proposed approach vs. state-of-the-art

Method                   False positive (%)  False negative (%)  Accuracy (%)
YOLOv2                   3.4                 2.9                 96.82
Di Lascio et al. [6]     13.33               0                   92.86
Fu et al. [7]            14                  8                   91
YOLO [8]                 5                   5                   90
2.4 Hardware Setup for Remote Control

The Jetson Nano is a powerful computer tailored for running machine learning and neural network models for object detection, and it is a suitable board for applications based on distributed networks [9]. The YOLOv2 models have been deployed on Jetson Nano boards. We used MATLAB and third-party support packages to generate the C code for the Nvidia devices to run the algorithm as a standalone application. The hardware setup consists of three Jetson Nano devices, three Raspberry Pi V2 cameras, Ethernet cables, a LAN switch, and a personal computer. We connected the Raspberry Pi cameras to the CSI (Camera Serial Interface) port of each board using a proper CSI flat cable. Such a system permits the connection of multiple cameras to Jetson Nano devices in different locations, guaranteeing the best performance for real-time fire and smoke detection monitored from a centralized computer, see Fig. 3.
Fig. 3. Hardware setup for fire and smoke smart surveillance detection system
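As an example of how frames from the CSI camera can be fed to the detector on each node, a common Jetson pattern is to open the camera through a GStreamer pipeline from OpenCV; the resolution, frame rate, and pipeline string below are typical values assumed for illustration, not the exact configuration used in this work.

import cv2

def open_csi_camera(width=1280, height=720, fps=30):
    # Open the Raspberry Pi CSI camera on a Jetson board via GStreamer
    pipeline = (
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )
    return cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

cap = open_csi_camera()
ok, frame = cap.read()   # each frame would then be passed to the YOLOv2 detector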
3 Software Implementations

We used the MobaXterm software on a Windows 10 Operating System (OS) to establish the communication between the main computer (the centralized fire and smoke management system) and the Nvidia Jetson Nano boards [10]. The communication is processed through an OpenSSH server with respect to the defined IP address of each Jetson Nano node. OpenSSH, or Secure Shell, is a remote ICT protocol that allows users to control
and transfer data between computers. The system is built with multiple access points of IP addresses through OpenSSH sessions in the MobaXterm software. Each OpenSSH session communicates with a Jetson Nano board through its designated IP address. We can visualize the status of fire and smoke from several Jetson Nano devices in one centralized fire and smoke management system. We implemented specific code in the Linux OS of the Jetson Nano for remote access, see Table 3. This code enables the Remote Frame Buffer (RFB) protocol for remote access to the Graphical User Interface (GUI) of the Jetson Nano boards. Table 3. Implemented code in XML file for remote access.
Enable remote access to the desktop
If true, allows remote access to the desktop via the RFB protocol. Users on remote machines may then connect to the desktop using a VNC viewer.
true
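MobaXterm sessions are opened interactively; as a scripted illustration of the same idea (one SSH session per node, each starting the detector), the sketch below uses the paramiko library. The IP addresses, username, password, and launch command are placeholder assumptions, not the actual values of the deployed system.

import paramiko

NODES = ["192.168.1.101", "192.168.1.102", "192.168.1.103"]  # placeholder IPs
CMD = "./yolov2_fire_smoke_detector"                          # placeholder launch command

def start_detector(host, user="nano", password="password"):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password)
    stdin, stdout, stderr = client.exec_command(CMD)          # run the standalone application
    return client, stdout

sessions = [start_detector(ip) for ip in NODES]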
4 Experiment Results

We started the communication between the centralized fire and smoke management computer and the Jetson Nano boards through the MobaXterm software that resides in the main computer. Each node is identified by a static IP address. The neural network is executed on each board through specific commands in the OpenSSH session terminal in the MobaXterm software. We displayed a set of videos of real fire and smoke on a PC screen and exposed them to the Raspberry Pi cameras connected to each Jetson Nano device. When fire and smoke were captured by the Raspberry Pi cameras, bounding boxes were created enclosing the detected objects (fire and smoke), see Fig. 4. We measured the latency time of the communication between the main computer and the Jetson boards, obtaining 0.3 ms. The execution time for the MobaXterm software was recorded as 0.008 s and the transmission bandwidth was measured at 7.91 Gbit/s. We recorded the average frames per second in the centralized computer, as processed by the Jetson Nano devices, using different sizes of video display (width and height). According to the results from this experiment, see Table 4, the real-time frame rate in the centralized computer reached up to 27 fps.
Fig. 4. MobaXterm software in the main computer for visualizing fire and smoke from different camera nodes.

Table 4. The real-time measurement (fps) in the centralized computer

Frame size (Width × Height)   Real-time in centralized computer (fps)
128 × 128                     27
224 × 224                     18.4
416 × 416                     11.2
640 × 480                     9.49
5 Smart IoT Real-Time Model for Fire and Smoke Detection

Early fire and smoke detection in smart cities can minimize large-scale damage and significantly improve public and societal safety. The proposed approach detects fire and smoke based on video signals coming from closed-circuit television (CCTV) cameras. This system can detect intrusion and explosive accidents in indoor and outdoor environments. We measured the decision time YOLOv2 needs to detect smoke or fire and to trigger an alarm. When the camera is in video mode, the time delay between the start of smoke/fire in the videos and the YOLOv2 detection is 1 to 2 s. It means that the presented architecture requires 1–2 s to trigger a fire alarm. YOLOv2 uses a single-stage object detection network, which is faster than two-stage deep learning detectors such as regions with convolutional neural networks (R-CNN) models. Regional convolutional neural network algorithms are slow and hard to optimize because each stage needs to be processed separately. We compared our approach with respect to other methodologies, see Table 5. Note that our design can produce a better decision time for fire and smoke detection in comparison to the method in [13], which proposed a Faster
R-CNN detector. This is an advantage of utilizing an IoT deep learning model (YOLOv2) to detect and provide an early warning for fire and smoke disasters.

Table 5. The proposed approach vs. other methodologies for decision time

Methodology           Decision time
Proposed approach     1–2 s
Shin-Juh et al. [11]  10 s
AdViSED [12]          3 s
Faster R-CNN [13]     10 s
6 Conclusion and Further Work

The objective of this research was to design a low-cost supervised management system for antifire surveillance from different video cameras simultaneously. The proposed approach takes advantage of several Nvidia Jetson Nano nodes, which are able to communicate with the main computer via Ethernet cables and the OpenSSH and RFB protocols. We designed a lightweight neural network model to meet the requirements of an embedded system. The YOLOv2 technique showed promising results, with real-time measurements of up to 27 frames per second in the centralized computer. Indeed, the decision time was the best (1–2 s) when compared to the other state-of-the-art methodologies. In the future, we intend to connect the proposed system to cloud facilities via wireless communication such as Wi-Fi and 4G LTE technologies. Acknowledgments. Work partially supported by the H2020 European Processor Initiative project n. 826647 and by the Dipartimenti di Eccellenza Crosslab Project by MIUR. We thank the Islamic Development Bank for their support of the Ph.D. work of A. Elhanashi.
References 1. Hall, J.R.: The Total Cost of Fire in the United States. National Fire Protection Association, Quincy, MA (2014) 2. Gagliardi, A., Saponara, S.: Distributed video antifire surveillance system based on IoT embedded computing nodes. In: International Conference on Applications in Electronics Pervading Industry, Environment and Society, pp. 405–411. Springer, Cham, September 2019 3. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 4. MathWorks Student Competitions Team. Using Ground Truth for Object Detection (2020). (https://www.mathworks.com/matlabcentral/fileexchange/69180-using-groundtruth-for-object-detection), MATLAB Central File Exchange. Accessed 27 July 2020
5. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and CCD Camera, IEEE Trans. on Instrumentation and Measurement, vol. 54, no. (4) (2005) 6. Di Lascio, R., Greco, A., Saggese, A., Vento, M.: Improving fire detection reliability by a combination of video analytics. In: International Conference Image Analysis and Recognition, Vilamoura, Portugal, Springer, Cham, CH (2014) 7. Fu, T.J., Zheng, C.E., Tian, Y., Qiu, Q.M., Lin, S.J.: Forest fire recognition based on deep convolutional neural network under complex background. Comput. Modernization 3, 52–57 (2016) 8. Lestari, D., et al.: Fire hotspots detection system on CCTV videos using you only look once (YOLO) method and tiny YOLO model for high buildings evacuation. In: 2nd International Conference of Computer and Informatics Engineering (IC2IE2019), Banyuwangi, Indonesia, pp. 87–92 (2019) 9. Jetson Nano Developer Kit. https://developer.nvidia.com/embedded/jetson-nano-develo per-kit. Accessed 25 Feb 2020 10. Mobatek (n.d.). MobaXterm free Xserver and tabbed SSH client for Windows. mobaxterm.mobatek.net. https://mobaxterm.mobatek.net. Accessed 21 Jul 2020 11. Chen, S.J., Hovde, D.C., Peterson, K.A., Marshall, A.W.: Fire detection using smoke and gas sensors. Fire Saf. J. 42(8), 507–515 (2007) 12. Gagliardi, A., Saponara, S.: AdViSED: advanced video smoke detection for real-time measurements in antifire indoor and outdoor systems. Energies 13(8), 2098 (2020) 13. Kim, B., Lee, J.: Video-based fire detection using deep learning models. Appl. Sci. 9(14), 2862 (2019)
Exploring Unsupervised Learning on STM32 F4 Microcontroller Francesco Bellotti1(B) , Riccardo Berta1 , Alessandro De Gloria1 , Joseph Doyle2 , and Fouad Sakr1 1 Department of Electrical, Electronic and Telecommunication Engineering (DITEN),
University of Genoa, Via Opera Pia 11a, 16145 Genoa, Italy {francesco.bellotti,riccardo.berta, alessandro.degloria}@unige.it, [email protected] 2 School of Electronic Engineering and Computer Science, Queen Mary University of London, London E14NS, UK [email protected]
Abstract. This paper investigated the application of unsupervised learning on a mainstream microcontroller, like the STM32 F4. We focused on the simple K-means technique, which achieved good accuracy levels on the four test datasets. These results are similar to those obtained by training a k-nearest neighbor (K-NN) classifier with the actual labels, apart from one case, in which K-NN performs consistently better. We propose an autonomous edge learning and inferencing pipeline, with a K-NN classifier which is periodically (i.e., when a given number of new samples have arrived) trained with the labels obtained from clustering the dataset via K-means. This system performs only slightly worse than pure K-means in terms of accuracy (particularly with small data subsets), while it achieves a reduction of about two orders of magnitude in latency times. To the best of our knowledge, this is the first proposal of this kind in the literature for resource-limited edge devices. Keywords: IoT · Development tools · Edge computing · Arduino · Platform-independence
1 Introduction

Machine learning (ML) is currently applied on the edge in a variety of applications (e.g., [1]) and platforms (e.g., [2]), also including resource-constrained mainstream microcontrollers (e.g., [3]). This approach, compared to the cloud computing paradigm, provides better latency, bandwidth occupation, data security, and energy consumption [4]. At present, most of the ML applications on the edge process sensor samples to make inferences (classification or regression) exploiting a model trained in the cloud. The processes of data cleaning and preparation, and of model training, are very time consuming [2]. This is a major reason for the growing interest towards unsupervised learning, which aims at discovering (previously) unknown patterns in a data set without
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 S. Saponara and A. De Gloria (Eds.): ApplePies 2020, LNEE 738, pp. 39–46, 2021. https://doi.org/10.1007/978-3-030-66729-0_5
the help of pre-defined labels. This could be of further relevance for embedded devices, as they could be left in the field to learn autonomously from the data they collect. A well-known unsupervised machine learning (ML) technique is clustering, which enables the detection of hidden patterns in data based on a cluster strategy and a distance function [5]. In this paper we are interested in exploring performance (accuracy and latency) of clustering data on a mainstream microcontroller with no a priori knowledge. Since the clustering works on the whole (available) dataset, we are also interested in understanding whether combination of clustering and classification (which works on each single record) may be beneficial. The remainder of this paper is organized as follows. Section 2 presents an outlook of related works. Section 3 proposes the pipeline methodology. Experimental results are shown in Sect. 4, while conclusions are drawn in Sect. 5.
2 Related Works

Few works in the literature have proposed models combining supervised and unsupervised learning. Agrawal et al. [6] developed a novel classification framework for the identification of breast cancer, featuring a pipeline with an ensemble classification stage after the ensemble clustering stage, in order to target the unclustered patients. Oliveira et al. [7] proposed an iterative methodology combining automatic clustering and expert analysis for labeling tweets to be used in k-nearest neighbours (K-NN) and Centroid-Based Classifier (CBC) classification. Chakraborty et al. [8] present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes, ranging from very large to very small families (even if previously unseen). Thanks to the proposed merging of classification and clustering to support binary and multi-class classification, EC2 constitutes an early warning system for new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs. Papas et al. [9] presented a data mining technique for software quality evaluation. They use K-means clustering to establish clusters of Java classes based on static metrics, and then build decision trees for identifying metrics which determine cluster membership. Our approach falls into the same group but, unlike other techniques, we target edge devices in the Internet of Things (IoT) field, with the goal of supporting completely autonomous systems.
3 The Autonomous Edge Pipeline 3.1 Background In this paper we deal with two very simple ML algorithms: K-means and K-NN. K-means is a very simple centroid-based, iterative, clustering algorithm that aims to partition data into k different and non-overlapping clusters where points with similar features belong to the same group. It randomly chooses K representative points as the initial centroids, and then each data point is assigned to the closest centroid. At the end of each iteration,
the centroids of each cluster are updated using the mean of all data points belonging to the same cluster, until there is no further change in their values [10]. K-NN is a simple classification algorithm based on feature similarity. It assigns to an input data point the class of the nearest set of previously labeled points. The performance of this method depends on k, the number of neighbors to be considered at each decision, which is the only hyper-parameter to be set for a model [11]. This paper presents an experimental analysis on a mainstream microcontroller first of the K-means clustering algorithm, then of a pipeline combining clustering and classification, as described in the following.

3.2 Methodology for the Autonomous Edge Pipeline (AEP)

Unsupervised classification of samples is typically done through clustering. This is very appealing for field-deployed devices, that would not need any prior knowledge, but it requires, for each new sample to classify, the processing of the whole dataset collected so far. We propose a different approach, with an iterative pipeline alternating clustering and classification. Particularly, clustering is executed periodically (e.g., after the reception of 100 samples) and provides the labels for the classifier, which performs the much faster classification of each single sample. This is expected to lead also to a reduction in energy consumption, because of the lower execution time needed for classification than for clustering, but we need to investigate the possible performance drop. The proposed Autonomous Edge Pipeline (AEP) implements a two-stage workflow, shown in Fig. 1. The initial step consists in filling the dataset until a certain level L (e.g., 50 records) is reached. Then, the K-means clustering is run on the dataset, providing the labels that are attached to the original records. Then, the continuous operation loop starts. Using the above labels, the K-NN classifier (which does not need training, apart from the definition of the k hyper-parameter) is used to classify the subsequent samples, which are also stored in the dataset. In order to avoid memory overflow, the dataset is implemented as a fixed maximum-length queue. After another L records, another clustering session is run and the K-NN classifier is updated.
Fig. 1. The AEP workflow.
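The following sketch illustrates the AEP loop in plain Python with scikit-learn, purely as a desktop-side illustration of the logic (the on-board implementation is platform-independent C); the refresh period, queue length, and k value are illustrative assumptions.

from collections import deque
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

L = 50                       # re-clustering period (illustrative)
dataset = deque(maxlen=500)  # fixed maximum-length queue (illustrative size)
classifier = None
seen = 0

def on_new_sample(x):
    # Store the sample, periodically refresh labels via K-means, then classify it
    global classifier, seen
    dataset.append(x)
    seen += 1
    if seen % L == 0:                                   # every L new samples
        data = list(dataset)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(data)  # unsupervised labels
        classifier = KNeighborsClassifier(n_neighbors=5).fit(data, labels)
    return classifier.predict([x])[0] if classifier else None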
4 Experimental Results

We conducted the experimental analysis on an STM NUCLEO-F401RE board, with 84 MHz processing speed, 512 kB flash memory and 96 kB SRAM. The F series is widely spread at the industrial level, as it offers a compact, high-performing and cost-effective solution [12]. As a simple baseline for desktop/cloud computation, we use a PC hosting a 2.7 GHz Core i7 processor with 16 GB of RAM and 8 MB cache. For data clustering at the edge, we implemented the K-means algorithm in platform-independent C (i.e., not using native OS libraries). On the desktop, with consistent results, we used the K-means implementation offered by the scikit-learn Python library. For our tests we use four binary classification datasets representing the IoT field. The first dataset is Seismic Mine (2584 samples × 18 features) [13], used for seismic hazard prediction. This dataset deals with the problem of high-energy seismic bumps (higher than 10^4 J), comprises data from two longwalls located in a coal mine, and is quite unbalanced (93% zeros). We randomly reduced the dataset size to 30%, to fit the MCU memory size, and considered only 4 features, which looks closer to a field device environment. Daphnet Freezing of Gait (28801 × 10) [14] is used to recognize gait freeze from wearable acceleration sensors placed on the legs and hip of Parkinson patients. Similarly, this dataset has been reduced to 5% and 3 features. The third one is IoT_Failure (951 × 9) [15], which is used to predict failures in the IoT field. The last one is Heart (303 × 12) [16], a popular medical dataset used to predict heart diseases. For simplicity, we chose binary-label datasets only. In the following experiments, we removed the actual labels from the training, and used them only as a ground truth for comparing the clustering/classification results.

4.1 Assessing K-means on IoT Datasets

As a first step, we assessed the performance of K-means clustering. We evaluated the performance of the clustering method using two common metrics. The silhouette score is a measure of how close a point is to its own cluster compared to other clusters; it ranges from −1 to 1, where a higher value indicates that the point is well matched to its own cluster. The Davies-Bouldin score is the ratio of within-cluster to between-cluster distances; lower values indicate better clustering. We also considered two scaling cases: no scaling, and Standard Scaler. Table 1 shows the obtained empirical results. The Silhouette value is the average over all the samples. In general, results show a certain discrepancy between the K-means clustering and the values obtained with the actual labels, which are generally worse. This seems to indicate the challenge of the classification task, which will have to guess the actual labels based on the dataset features. The scaling effect (standardization) improves the metrics in two cases but not in general. Table 2 shows the clustering time performance on both PC and F4. Results highlight the long latency on microcontrollers (measured with HAL_GetTick()), also compared with the classification latency, which is typically in the order of tens of milliseconds (see also Table 3).
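On the desktop side, both clustering metrics are available in scikit-learn; the snippet below shows how they could be computed for one dataset, with a random array standing in for the actual data.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score, davies_bouldin_score

X = np.random.rand(300, 4)                 # stand-in for one of the reduced datasets
X = StandardScaler().fit_transform(X)      # "Standard" scaling case
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

print("silhouette:", silhouette_score(X, labels))          # higher is better
print("Davies-Bouldin:", davies_bouldin_score(X, labels))  # lower is better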
Table 1. K-means clustering performance. Dataset
K Scaling
K-means labels
Actual labels
Silhouette Davies Silhouette Davies Mine Daphnet
2 None
0.45
0.4
2.9
2 Standard 0.67
0.9
1.01
0.17
3.92
2 None
0.77
−0.17
9.62
0.72
2 Standard 0.61 IoT_Failure 2 None Heart
0.94
0.84
−0.15
9.85
0.28
−0.03
6.83
2 Standard 0.17
2
0.14
2.28
2 None
0.97
0.04
4.51
2.2
0.1
2.9
0.38
2 Standard 0.16
Table 2. K-means timing performance. K-means clustering time Dataset
PC
F4
Mine
6 ms
1.9 s
Daphnet
8 ms
3.3 s
IoT_Failure 11 ms 3.9 s Heart
4 ms
1.6 s
4.2 Autonomous Edge Pipeline Figure 2 depicts the learning curves for the AEP on the four test datasets. Scaling is performed through a standard scaler. The learning curves represent the performance of a K-NN classifier as a function of training set size, which varies from 5% to 75% of the whole dataset. Performance is measured on the same 20% testing set. Table 3 reports the numerical values. The best k is obtained by a 3-fold cross validation with a value range from 1 to an upper limit which is 3, 5 or 10, depending on the number of samples in the dataset. The inference time using the K-NN classifier is reported only for the dataset size providing the best accuracy (e.g. 65% training set size for Mine). Accuracy does not seem to be affected by variable unbalanced-ness (Mine dataset case). For comparison Fig. 2 shows also the performance obtained by a K-NN classifier trained on the actual labels, which almost always outperforms the AEP, and by the simple K-means clustering algorithm. Not surprisingly, the K-means always performs better than AEP. This is because the AEP classification is trained on labels created by a run of K-means on the training set. Moreover, K-means computes the labels on the basis of the knowledge of the testing set, differently from AEP. However, the computational burden (and consequent energy consumption) of K-means is significantly higher (Table 2 vs 3). In AEP, the clustering algorithm is run only every Tl samples. Computation of the
best value of the k-hyperparameter requires one run for each candidate value. However, we observed that choosing a fixed value of 5 we got very similar results, with a (slight) decrease only for Heart (2%) and IoT failure (1%).
Fig. 2. Learning curves for AEP on various datasets.
Table 3. AEP performance. AEP Dataset
Best training set size
Accuracy
K
Inference time
Mine
65%+
90%
1
12 ms (65% case)
Daphnet
55%+
88%
6
23 ms (55%)
IoT_Failure
45%–65%
91%
7
16 ms (45%)
Heart
55%
82%
8
9 ms (55%)
The lines in Fig. 2 suggest some other interesting considerations. The performance of all datasets quickly saturates with low percentages of the training set (i.e., the learning rate is high). AEP performance tends to be less stable than the other techniques as the training size increases.
As a final experiment, we computed on the desktop (due to the lack of training tools on the STM32 F4 board) the performance of other classifiers besides K-NN in the AEP implementation. In the Heart dataset, Decision Tree starts (5% size) with 74% accuracy and reaches 80% at 65% size. SVM reaches 84% at 75% size. In Mine, SVM starts with 90% and ends with 86% (which is similar to the K-means performance), but has a drop at 15% (69%). Also this point confirms a certain instability of the AEP results, which we attribute to the sub-optimality of the training labels, but which should be better investigated. In Daphnet, both classifiers perform similarly to K-NN, while in IoT failures SVM always achieves at least 90% accuracy (93% at 75%).
5 Conclusions and Future Work

This paper investigated the application of unsupervised learning on a mainstream microcontroller, like the STM32 F4. We focused on the simple K-means technique, which achieves about 90% accuracy on two datasets (Mine and IoT failure), and slightly worse (83–86%) on the other two (Heart and Daphnet). These results are similar to those obtained by training a K-NN classifier with the actual labels, apart from the Daphnet case, in which K-NN performs about 10% better. We propose AEP, an autonomous edge learning and inferencing pipeline, with a K-NN classifier which is periodically trained with the labels obtained from clustering the dataset via K-means. This system performs only slightly worse than pure K-means in terms of accuracy (particularly with small data subsets), while it achieves a reduction of about two orders of magnitude in latency times. To the best of our knowledge, this is the first proposal of this kind in the literature for resource-limited edge devices. In the future, we intend to integrate the AEP in the Edge Learning Machine (ELM) framework [17] and expand its implementation to support other combinations of clustering and classification algorithms (e.g., SVM seems to have promising results), as well as fusion. This could also help understand a certain instability of the AEP results when increasing training set sizes. Other possible research directions concern the management of the dataset on the edge (e.g., filtering of K-NN training samples), also exploiting hierarchical architectures, so as to prevent memory overflow.
References

1. Albanese, A., d’Acunto, D., Brunelli, D.: Pest detection for precision agriculture based on IoT machine learning. In: Applepies 2019, Lecture Notes in Electrical Engineering, vol. 627, pp. 65–72 (2020). https://doi.org/10.1007/978-3-030-37277-4_8
2. Lipnicki, P., Lewandowski, D., Syfert, M., Sztyber, A., Wnuk, P.: Inteligent IoTSP - implementation of embedded ML AI tensorflow algorithms on the NVIDIA Jetson Tx chip. In: Proceedings - 2019 International Conference on Future Internet of Things and Cloud, FiCloud 2019, pp. 296–302 (2019). https://doi.org/10.1109/ficloud.2019.00049
3. Sakr, F., Bellotti, F., Berta, R., De Gloria, A.: Machine learning on mainstream microcontrollers. Sensors 20(9), 2638 (2020). https://doi.org/10.3390/s20092638
4. Lin, L., Liao, X., Jin, H., Li, P.: Computation offloading towards edge computing. Proc. IEEE 107, 1584–1607 (2019)
5. Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31(8), 651–666 (2010). https://doi.org/10.1016/j.patrec.2009.09.011
6. Agrawal, U., et al.: Combining clustering and classification ensembles: a novel pipeline to identify breast cancer profiles. Artif. Intell. Med. 97, 27–37 (2019). https://doi.org/10.1016/j.artmed.2019.05.002
7. De Oliveira, E., Gomes Basoni, H., Saúde, M.R., Ciarelli, P.M.: Combining clustering and classification approaches for reducing the effort of automatic tweets classification. https://doi.org/10.5220/0005159304650472
8. Chakraborty, T., Pierazzi, F., Subrahmanian, V.S.: EC2: ensemble clustering and classification for predicting android malware families. IEEE Trans. Dependable Secure Comput. 17(2), 262–277 (2020). https://doi.org/10.1109/tdsc.2017.2739145
9. Papas, D., Tjortjis, C.: Combining clustering and classification for software quality evaluation. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8445 LNCS, pp. 273–286 (2014). https://doi.org/10.1007/978-3-319-07064-3_22
10. Marsland, S.: Machine Learning: An Algorithmic Perspective, 2nd edn. CRC Press, Boca Raton (2015)
11. Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, New York (2014)
12. STM32 High Performance Microcontrollers (MCUs) - STMicroelectronics. http://www.st.com/en/microcontrollers-microprocessors/stm32-highperformance-mcus.html. Accessed 23 Jul 2020
13. Sikora, M., Wrobel, U.: Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Arch. Min. Sci. 55(1), 91–114 (2010)
14. Bächlin, M., Plotnik, M., Roggen, D., Giladi, N., Hausdorff, J.M., Tröster, G.: A wearable system to assist walking of Parkinson's disease patients: benefits and challenges of context-triggered acoustic cueing. Methods Inf. Med. 49(1), 88–95 (2010). https://doi.org/10.3414/ME09-02-0003
15. IoT_failure_prediction | Kaggle. https://www.kaggle.com/mukundhbhushan/iot-failure-prediction. Accessed 23 Jul 2020
16. Heart Disease UCI | Kaggle. https://www.kaggle.com/ronitf/heart-disease-uci/kernels. Accessed 23 Jul 2020
17. Edge-Learning-Machine GitHub. https://github.com/Edge-Learning-Machine. Accessed 31 Jul 2020
Environmental Monitoring and E-health
Unobtrusive Accelerometer-Based Heart Rate Detection

Yurii Shkilniuk1(B), Maksym Gaiduk1,2, and Ralf Seepold1,3

1 HTWG Konstanz, Alfred-Wachtel-Str. 8, 78462 Konstanz, Germany
{yshkilni,maksym.gaiduk,ralf.seepold}@htwg-konstanz.de
2 University of Seville, Av. Reina Mercedes s/n, 41012 Seville, Spain
3 I.M. Sechenov First Moscow State Medical University, Bolshaya Pirogovskaya St. 2-4, 119435 Moscow, Russian Federation
Abstract. Ballistocardiography (BCG) can be used to monitor heart rate activity. The accelerometer should have high sensitivity and minimal internal noise; in addition, a low-cost approach was taken into consideration. Several measurements were executed to determine the optimal positioning of a sensor under the mattress to obtain a signal strong enough for further analysis. A prototype of an unobtrusive accelerometer-based measurement system has been developed and tested in a conventional bed without any specific extras. The influence of the human sleep position on the output accelerometer data was tested. The obtained results indicate the potential to capture BCG signals using accelerometers. The measurement system can detect heart rate in an unobtrusive form in the home environment.
1 Introduction

Ballistocardiography (BCG) is a technique that measures the heart rate from the mechanical vibrations of the human body at each cardiac cycle. It can perform non-contact measurements of such quantities by studying the vibration patterns that propagate through an object mechanically coupled to the subject. For example, a bed can be used to track the HR of subjects lying overnight [1], resulting in a contactless, non-intrusive measurement. Some non-invasive techniques use ballistocardiography by placing sensors in a chair or in the bed where the patient is placed [2]. BCG can be realized in an unobtrusive sensor form and embedded with different configurations in the patient's environment. Owing to its physical nature, it can convey critical medical information about the cardiovascular system that might otherwise be unattainable, e.g., the force of the heart's contraction, which is a crucial indicator of the heart's physiologic age and its decline [3]. BCG can offer useful perspectives for application in preventive medicine, e.g., in determining the quality of sleep, in the detection of physical or mental stress, or in the early detection of coronary heart disease. The quality of sleep, sleep phases, and their duration can be determined using data on the heart rate and respiratory rate of the patient [4, 5].
The main objective of this work is to develop a prototype of an unobtrusive accelerometer-based measurement system and to investigate the accelerometer positioning for heart rate detection in a home environment, to be used for sleep stage classification.
2 Status and Experiment

Researchers have proposed several ways to measure BCG signals during sleep, differing in the type of sensors, their number, and their placement. Examples include a hydraulic sensor system filled with water [6], load sensors installed on the four legs of the bed [7], piezoelectric load and acceleration sensors [8], and pressure sensors installed under a mattress [4, 9]. With the development of highly sensitive accelerometers based on microelectromechanical systems (MEMS) technologies, their applications in various fields of measurement are growing rapidly. Unfortunately, the use of accelerometers for measuring BCG signals is still a little-studied area of measurement science. The available publications indicate that the sensor should have high sensitivity and minimal internal noise [8, 9]. Based on these requirements, a search for available accelerometers was carried out; they are compared in Table 1. For further tests, the LIS3DSHTR1 accelerometer was chosen, since it offers high sensitivity, low noise density, a built-in 16-bit analog-to-digital converter (ADC), and low cost.

Table 1. Comparison of accelerometers.

Accelerometer  | Measurement range    | Output data type  | Sensitivity               | Noise density   | Axes | Power supply | Price
ADXL 362       | ±2g, ±4g, ±8g        | Digital (12 bits) | 1 mg/LSB for ±2g range    | 175 µg/sqrt(Hz) | 3    | 3.3 V        | 8 €
SCA820-D04     | ±2g                  | Digital (12 bits) | 1.2 mg/LSB                | 2000 µg RMS     | 1    | 3.3 V        | 18 €
LIS3DSHTR      | ±2g, ±4g, ±8g, ±16g  | Digital (16 bits) | 0.06 mg/LSB for ±2g range | 150 µg/sqrt(Hz) | 3    | 3.3 V        | 2 €
SCA620-EF1V1B  | 2g                   | Analog            | 2 V/g                     | 2000 µg RMS     | 1    | 5 V          | 45 €
MXA 2500E      | ±1g                  | Analog            | 0.5 V/g                   | 200 µg/sqrt(Hz) | 2    | 3.3 V; 5 V   | 7 €
ADXL 103       | ±1.7g                | Analog            | 1 V/g                     | 110 µg/sqrt(Hz) | 2    | 3.3 V; 5 V   | 22 €
The bed frame and mattress are common low-cost products (Fig. 1a). A foam mattress 120 mm thick was used. Studies on measuring the heart rate with pressure sensors under the mattress show that the accuracy of the heart rate measurement depends on the position of the human body [1]. In this study, the influence of the human sleep position on the output accelerometer data was tested.
1 https://www.st.com/resource/en/datasheet/lis3dsh.pdf.
Four basic human sleep positions were investigated: lying on the chest, lying on the left side (with the arm folded back), lying on the right side, and lying on the back.
Fig. 1. Three methods of attaching the accelerometer (a - attached to a slat, b - attached to the mattress, c - attached to a cantilever between the slats).
Three methods of attaching the accelerometer to the bed and mattress were tested: on a slat (Fig. 1a), between slats and attached directly to the mattress (Fig. 1b), and between slats with a cantilever (Fig. 1c). The test cantilever was made of a rough polypropylene plastic plate 2 mm thick; the dimensions of its flexible part were 200 × 25 mm. The influence of the cantilever's geometric and mechanical characteristics on the measurement results has not been investigated.
A Raspberry Pi and a program written in the Python programming language were used to collect data from the accelerometer. The polling rate was 600 values per second. Data recording was carried out taking into account time ranges. Processing and visualization of the data were carried out on a personal computer using another Python program. Data were filtered using digital low-pass and high-pass filters to reduce the low-frequency bias and high-frequency noise that do not carry useful information. The frequency bandwidth of the heart rate component varies among publications, where the cutoff frequencies range between 0.1–1 Hz for the high-pass filter and 10–25 Hz for the low-pass filter [8, 9]. Based on some publications [3, 10] as well as empirical results, a bandwidth of 1–15 Hz was found to be overall more suitable for heart rate monitoring using the accelerometer.
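A minimal sketch of this filtering step is given below, assuming a Butterworth band-pass as one possible realization of the low-/high-pass pair (the paper gives only the 600 Hz polling rate and the 1–15 Hz band, not the filter type or order).

```python
# Band-pass filtering of the raw accelerometer stream: 1–15 Hz band at fs = 600 Hz.
# Filter family (Butterworth) and order are assumptions; only the band is from the text.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 600.0               # accelerometer polling rate [samples/s]
LOW, HIGH = 1.0, 15.0    # heart-rate band reported as most suitable [3, 10]

def bandpass_bcg(raw, fs=FS, low=LOW, high=HIGH, order=4):
    """Remove the low-frequency bias and high-frequency noise from one axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, raw)       # zero-phase filtering for offline processing

# Example on synthetic data standing in for 10 s of z-axis samples.
t = np.arange(0.0, 10.0, 1.0 / FS)
raw = 0.02 * np.sin(2 * np.pi * 1.2 * t) + 0.005 * np.random.randn(t.size)
bcg = bandpass_bcg(raw)
```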
3 Discussion of the Results

The developed system prototype provides the measurement of heart rate with minimal influence from body movements, including the movement caused by breathing. The measurement system was designed for use in experiments with different methods of attachment to the mattress and different human sleep positions. All measurements (except the one presented in Fig. 6) were done with the sensor placed directly opposite the heart. No BCG signal could be recognized from the accelerometer attached to the slat or directly to the mattress (Fig. 2). The only recognizable results were obtained with the cantilever method (Fig. 3).
Fig. 2. An output signal from the accelerometer attached to the slat.
Fig. 3. An output signal from the accelerometer located on the cantilever in lying chest position.
The experiments showed that the human sleep position significantly affects the precision of measuring the BCG signals with the accelerometer. The most precise results were obtained in the positions lying on the chest (Fig. 3) and on the left side, where it is possible to recognize the typical periodic BCG heartbeat signal described in the scientific literature [1, 11]. Lying on the right side, it is much more challenging to recognize the heartbeat (Fig. 4). Lying on the back, it was impossible to recognize the BCG signals (Fig. 5).
Fig. 4. An output signal from the accelerometer located on the cantilever in the lying-on-the-right-side position.
Fig. 5. An output signal from the accelerometer located on the cantilever in the lying-on-the-back position.
The most informative signals, with the largest amplitude, were obtained when the accelerometer was located under the mattress directly opposite the human heart (Table 2). As the accelerometer was moved away from the heart, the BCG signal level dropped sharply. At a distance of 10 cm to the side of the vertical projection of the human heart, the BCG signal in the chest position looks as shown in Fig. 6 (Table 2).
Table 2. Signal recognition results for different sleep positions and sensor placement.

Sensor placement                     | Chest            | Left side               | Right side              | Back
Directly opposite the heart          | Good recognition | Good recognition        | Poor recognition        | Impossible to recognize
10 cm from the vertical of the heart | Poor recognition | Impossible to recognize | Impossible to recognize | Poor recognition
Fig. 6. BCG signal in the chest position at a distance of 10 cm to the side of the vertical projection of the human heart.
Accelerometers with high sensitivity and low noise density can be used for measuring the heart rate from the mechanical vibrations of the body caused by the heart movement. For successful measurements, a special cantilever design is needed. As the accelerometer moves away from the human heart, the BCG signal level drops sharply. Not all human sleep positions are suitable for clearly recognizing BCG signals with a single-accelerometer system.
4 Conclusions and Outlook

A prototype of an accelerometer-based measurement system was developed. Using the system does not cause inconvenience during sleep, as the sensor is placed under the mattress. The influence of human sleep position and accelerometer placement on the quality of heart rate detection in a home environment was investigated. In this research, the most precise heart rate recognition was achieved when subjects were lying on the chest (prone position) and on the left side (left lateral position), whereas it was almost impossible to identify the heart rate in the other sleep positions. In the left lateral and prone positions, most heartbeats were detected when the sensor was placed no further than 10 cm from the heart axis under the mattress.
Further research into the use of accelerometers for measuring BCG can focus on the mechanical processes that occur when BCG signals propagate through the mattress. It can also help determine the influence of the cantilever's mechanical and geometric characteristics on the measurement results. Another possible aim of further work could be to investigate the use of several accelerometers, to overcome the drop of the signal when the sensor is far from the heart.
Acknowledgments. This research was partially funded by the Ministry of Economics, Labour and Housing Baden-Württemberg (Germany) under the contract ‘Errichtung und Betrieb eines (virtuellen) Kompetenzzentrums Markt- und Geschäftsprozesse Smart Home & Living Baden-Württemberg’. The author is responsible for the content of this publication. This research was partially funded by the EU Interreg V-Program “Alpenrhein-Bodensee-Hochrhein”: Project “IBH Living Lab Active and Assisted Living”, grants ABH040, ABH04, ABH066 and ABH068.
References

1. Brüzer, C., Stadlthanner, K., Waele, S., Leonhardt, S.: Adaptive beat-to-beat heart rate estimation in ballistocardiograms. IEEE Trans. Inf. Technol. Biomed. 15, 778–786 (2011)
2. Sadek, I., Biswas, J.: Non-intrusive heart rate measurement using ballistocardiogram signals: a comparative study. Signal Image Video Process. 13, 475–482 (2019)
3. Albukhari, A., Lima, F., Mescheder, U.: Bed-embedded heart and respiration rates detection by longitudinal ballistocardiography and pattern recognition. Sensors 19, 1451 (2019)
4. Gaiduk, M., Seepold, R., Martínez Madrid, N., Orcioni, S., Conti, M.: Recognizing breathing rate and movement while sleeping in home environment. Appl. Electron. Pervading Ind. Environ. Soc. 627, 333–339 (2019)
5. Rodríguez, E.T., Seepold, R., Gaiduk, M., Martínez Madrid, N., Orcioni, S., Conti, M.: Embedded system to recognize movement and breathing in assisted living environments. In: Applications in Electronics Pervading Industry, Environment and Society. LNEE, pp. 391–397. Springer, Cham (2019)
6. Jiao, C., Su, B., Lyons, P., Zare, A., Ho, L.C., Skubic, M.: Multiple instance dictionary learning for beat-to-beat heart rate monitoring from ballistocardiograms. IEEE Trans. Biomed. Eng. 65, 2634–2648 (2018)
7. Sivanantham, A.: Measurement of heartbeat, respiration and movements detection using smart bed. In: IEEE Recent Advances in Intelligent Computational Systems (2015)
8. Feng, X., Dong, M., Levy, P., Xu, Y.: Non-contact home health monitoring based on low-cost high-performance accelerometers. In: IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies, pp. 356–364 (2017)
9. Gomez-Clapers, J., Serra-Rocamora, A., Casanella, R., Pallas-Areny, R.: Towards the standardization of ballistocardiography systems for J-peak timing measurement. Measurement 58, 310–316 (2014)
10. Lima, F., Albukhari, A., Zhu, R., Mescheder, U.: Contactless sleep monitoring measurement setup. In: Proceedings, vol. 2 (2018)
11. Inan, O.T., Migeotte, P.F., Park, K.S., Elemadi, M., Tavakolian, K., Casanella, R., Zanetti, J., Tank, J., Funtova, I., Prisk, G.K., Rienzo, H.K.: Ballistocardiography and seismocardiography: a review of recent advances. IEEE J. Biomed. Health Inform. 19, 1414–1427 (2014)
A Lightweight SiPM-Based Gamma-Ray Spectrometer for Environmental Monitoring with Drones

Marco Carminati1,2(B), Davide Di Vita1,2, Luca Buonanno1,2, Giovanni L. Montagnani1, and Carlo Fiorini1,2

1 Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
[email protected]
2 Istituto Nazionale di Fisica Nucleare, Sezione di Milano, via Celoria 16, 20133 Milan, Italy
Abstract. A wireless, compact (8 × 8 × 11 cm3) and lightweight (…)

Analysis and Design of Integrated VCO in 28 nm CMOS
P. Prosperi et al.

The oscillation start-up requires gm > 1/RP, where gm is the transconductance of the NMOS inside the cross-coupled cell and RP is the parasitic resistance of the inductor [12]. The varicap value was chosen to obtain the correct tuning range (TR) needed to recover frequency deviations over PVT corners. The tail current was fixed to 1 mA.

3.1 Output Buffer Design

After a preliminary analysis and sizing of the VCO with an ideal output capacitance as load, a decoupling resistive CML buffer was sized and connected at the VCO outputs. This buffer is necessary to decouple the VCO from the rest of the circuit, in order to fix the oscillation frequency and to pick up the oscillation voltage. Then, since the VCO has to be tested and the first-stage buffer is not able to drive the complex load presented by the IC pads, wire bonding and measurement instrument, a second-stage output buffer was designed.
This second-stage buffer was designed as an inductively tuned amplifier, in order to deliver a reasonable power to the measurement instrument and to improve the matching with the impedance seen looking towards the pads. The entire circuit, VCO and buffers, is shown in Fig. 3.
Fig. 3. Complete circuit schematic.
4 Layout Implementation

The entire circuit, LC-VCO and buffers, has been implemented in layout view, shown for the VCO in Fig. 4. In the design of this layout, all choices were made to reduce the parasitic resistance and to guarantee a good matching of the simple current mirrors and transistor pairs. Indeed, a high parasitic resistance can lead to gain degradation, and this
Fig. 4. VCO layout in 28 nm CMOS technology.
could cause a weak start-up condition for the VCO. The main contribution to the area occupation is clearly that of the inductor, implemented as a three-turn differential inductor already available in the technology libraries. The spacing between the devices is the minimum allowed by the technology rules, which helps to minimize device mismatch.
5 Results and Comparison

5.1 Schematic (Pre-layout) Simulations

In the schematic design, the target central frequency was set above 25 GHz in order to leave a margin for the subsequent layout implementation. Table 1 summarizes the main performances of the designed voltage controlled oscillators. For a better comparison between the designed VCOs, a widely used figure of merit (FOM) can be introduced [12, 13]. This FOM, defined in Eq. (1), allows different VCOs to be compared taking into account several important performance parameters at the same time, such as the phase noise L(Δf), the dissipated power Pdc and the central frequency f0. The parameter Δf is the frequency offset from the carrier at which the phase noise, and thus the FOM, are evaluated.

Table 1. Main performances of the designed VCOs (schematic design).

Parameter                  | Ring a)        | Ring b)        | Ring c)        | LC-VCO
VCO type                   | Single ended   | Pseudo-diff    | Fully-diff     | Fully-diff
Vdd [V]                    | 0.9            | 0.9            | 1.2            | 0.9
Temperature range          | [−40–100] °C   | [−40–100] °C   | [−40–100] °C   | [−40–125] °C
Used transistors           | Core RF-MOS    | Core RF-MOS    | ULVT RF-MOS    | Core RF-MOS
Frequency [GHz]            | 28             | 28             | 28             | 30
Power diss. [mW]           | 9.85           | 20.6           | 12.78          | 0.9
Kvco [GHz/V]               | 75             | 75             | 12             | 3.3
PVT variations             | High           | High           | Mid-low        | Low
Area [µm^2]                | A1 > 4         | A2 ≈ 2×A1      | A3 > 350       | A4 ≥ 10000
L (Δf = 1 MHz) [dBc/Hz]    | −63.6          | −65.97         | −65.6          | −90
FOM (Δf = 1 MHz) [dBc/Hz]  | −142.6         | −141.77        | −143           | −180
FOM(Δf) = L(Δf) − 20 log10(f0/Δf) + 10 log10(Pdc/1 mW)   (1)
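As a quick numerical cross-check of Eq. (1), the FOM values in Table 1 can be reproduced from the reported phase noise, frequency and power figures; the small helper below is ours, not part of the original work.

```python
# Numerical check of the VCO figure of merit, Eq. (1):
# FOM(df) = L(df) - 20*log10(f0/df) + 10*log10(Pdc / 1 mW)
import math

def vco_fom(pn_dbc_hz, f0_hz, delta_f_hz, pdc_w):
    """FOM at offset delta_f_hz for a VCO with phase noise pn_dbc_hz and power pdc_w."""
    return (pn_dbc_hz
            - 20 * math.log10(f0_hz / delta_f_hz)
            + 10 * math.log10(pdc_w / 1e-3))

# Ring a): L = -63.6 dBc/Hz at 1 MHz from 28 GHz, Pdc = 9.85 mW  ->  about -142.6 dBc/Hz
print(round(vco_fom(-63.6, 28e9, 1e6, 9.85e-3), 1))
# LC-VCO (schematic): L = -90 dBc/Hz at 1 MHz from 30 GHz, Pdc = 0.9 mW  ->  about -180 dBc/Hz
print(round(vco_fom(-90.0, 30e9, 1e6, 0.9e-3), 1))
```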
It is important to note that the temperature range for the ring VCOs had to be reduced from −40 °C to 125 °C down to −40 °C to 100 °C, because the electromigration (EM) current density
specifications at 125 °C are too stringent to be met with the RF-MOS devices used in this work for these topologies. For the CML ring VCO, a higher supply voltage of 1.2 V and the use of Ultra-Low-Threshold-Voltage (ULVT) devices are needed to overcome gain issues due to the low Vdd/Vth ratio. The LC-tank architecture, instead, respects all the declared constraints, so its Vdd is 0.9 V and its temperature range is −40 °C to 125 °C.
From Table 1 it can be seen that the pseudo-NMOS ring VCO structures have a very high sensitivity to PVT (Process, Voltage, Temperature) variations; because of this, high Kvco values are required to recover the target oscillation frequency across all corners. The CML-based structure has a lower sensitivity to PVT variations, but a higher area occupation, in addition to a higher supply voltage and the use of non-core transistors. Phase noise performances are very similar for all ring VCOs, around −65 dBc/Hz at 1 MHz offset from the 28 GHz carrier. The FOM values also confirm the comparability of the ring structures in terms of noise/power-dissipation performance: they are around −142 dBc/Hz, again at 1 MHz offset from the 28 GHz carrier. The LC-tank VCO shows better noise, power and PVT-response performance: at the 30 GHz schematic target frequency, its PN at 1 MHz offset is around −90 dBc/Hz and its dissipated power is about 900 µW, resulting in a much higher FOM of −180 dBc/Hz at 1 MHz offset from the 30 GHz carrier. The drawback of the LC architecture is obviously the area occupation, which is much larger than that of the ring VCOs.

5.2 Post Layout Simulations

Post-layout simulations show that the target frequency of 25 GHz is reached and recovered in every PVT corner, from the slowest to the fastest, by varying the control voltage Vc on the varicaps (Fig. 5). The post-layout tuning range is typically 1.3 GHz, and the typical Kvco is around 2.3 GHz/V.
Fig. 5. Post layout slowest (green), fastest (yellow) and typical (red) tuning range curves.
The power dissipation of the single VCO is about 860 µW and its PN at 1 MHz frequency offset is −95 dBc/Hz, with a typical FOM of −184 dBc/Hz. The total power dissipation, buffers included, is about 8.5 mW. The area of the single VCO is about 160 µm × 110 µm, while the total area, including buffers, mirror and interconnections, is about 410 µm × 220 µm. The typical single-ended VCO swing is about 550 mV, while the single-ended swing on the measurement instrument is around 185 mV.
6 SEE Simulations

An aerospace application requires a radiation tolerance analysis. The relatively low level of cumulative radiation dose in the space environment allows it to be neglected, focusing the attention on the disturbances coming from high-energy particles hitting the substrate (Single Event Effects). SEEs refer to the consequences that particles may cause when they strike the silicon substrate of an electronic device [14, 15, 16]. The hit generates a charge collection, which can be simulated in a CAD environment as an injected current peak, modeled as a double exponential, in every pn-junction node of the circuit, in order to find the most sensitive ones. SEE-like disturbances may also be caused by glitches due to EM interference in automotive applications. Figure 6a shows a typical response to a quite high SEE injected charge of 375 fC: both frequency and amplitude vary with a SEE hit. Frequency variations are wider when a SEE hits a VCO node, near the varicaps.
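The injection can be sketched with the double-exponential current model commonly used for SEE simulations; only the 375 fC charge comes from the text, while the rise and fall time constants below are illustrative assumptions.

```python
# Double-exponential current pulse used to model a SEE charge injection.
# Q = 375 fC is taken from the text; the time constants are illustrative assumptions.
import numpy as np

Q = 375e-15          # injected charge [C]
TAU_RISE = 10e-12    # assumed rise time constant [s]
TAU_FALL = 200e-12   # assumed fall time constant [s]

def see_pulse(t, q=Q, tau_r=TAU_RISE, tau_f=TAU_FALL):
    """I(t), normalized so that its time integral equals the injected charge q."""
    return q * (np.exp(-t / tau_f) - np.exp(-t / tau_r)) / (tau_f - tau_r)

t = np.linspace(0.0, 3e-9, 30001)          # one 3 ns injection period
i_t = see_pulse(t)
print(np.sum(i_t) * (t[1] - t[0]))         # ~3.75e-13 C, i.e. the 375 fC of charge
```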
Fig. 6. SEE simulation results when each node of the circuit is stimulated by a charge injection with a period of 3 ns, for the circuit a) without and b) with the mitigation technique.
Amplitude variations, instead, can be critical when a SEE hits the common-gate node of the current mirrors. To increase circuit reliability in a radiation environment, guard rings and deep N-wells are used for the transistors and a differential architecture is adopted. To mitigate the amplitude variations due to SEEs at the common-gate nodes of the current mirrors, a technique based on increasing the RC constant of those nodes [15], by raising their capacitance, is applied (Fig. 7).
Fig. 7. Proposed mitigation technique.
Since those nodes are directly connected to the IC pads, through which the external bias currents are injected, it is possible to integrate these capacitances under the pads, saving die area. Figure 6b shows the improved results.
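To illustrate the principle of the mitigation (not the actual design values), the sketch below integrates the same double-exponential pulse into a parallel R-C model of the common-gate node: increasing the capacitance reduces the peak voltage glitch. R, C and the time constants are all assumed numbers.

```python
# First-order illustration of the RC-based SEE mitigation at a mirror common-gate node:
# the same current pulse produces a smaller voltage glitch on a larger capacitance.
# All component values and time constants are illustrative assumptions.
import numpy as np

def see_pulse(t, q=375e-15, tau_r=10e-12, tau_f=200e-12):
    return q * (np.exp(-t / tau_f) - np.exp(-t / tau_r)) / (tau_f - tau_r)

def peak_glitch(t, i, r_node, c_node):
    """Peak |v| of a parallel R-C node driven by i(t), via forward-Euler integration."""
    dt = t[1] - t[0]
    v = np.zeros_like(t)
    for k in range(1, t.size):             # C dv/dt = i - v/R
        v[k] = v[k - 1] + dt * (i[k - 1] - v[k - 1] / r_node) / c_node
    return float(np.max(np.abs(v)))

t = np.linspace(0.0, 5e-9, 50001)
i_t = see_pulse(t)
print(peak_glitch(t, i_t, r_node=10e3, c_node=500e-15))   # nominal node capacitance
print(peak_glitch(t, i_t, r_node=10e3, c_node=10e-12))    # extra capacitance under the pad
```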
7 State of the Art Comparison and Conclusions

This work has presented a comparison of various VCO structures in a 28 nm process and the complete design of a radiation-tolerant VCO working at a central frequency of 25 GHz, with a low supply voltage of 0.9 V and able to operate over a temperature range from −40 °C to 125 °C. The comparison has highlighted that ring-oscillator-based VCOs, both in their CMOS and CML versions, suffer from some problems in terms of reliability and of overall performance, resulting from the issues introduced by low-voltage ultra-scaled technologies in analog/RF design. The LC-tank oscillator has instead shown better performance, so an LC-tank VCO has been fully implemented. The designed LC-VCO covers the frequency range from 24.35 to 25.65 GHz, with a power dissipation

Table 2. State of the art comparison.

Parameter               | This work      | [2]      | [4]            | [17]
Technology [nm]         | 28             | 65       | 65             | 28
VCO type                | NMOS LC        | CMOS LC  | NMOS LC        | NMOS LC
Rad. tolerance          | yes            | yes      | yes            | no
Vdd [V]                 | 0.9            | 1.2      | 1.2            | 0.85
Temperature range       | [−40–125] °C   | –        | [−55–125] °C   | –
Frequency [GHz]         | 25             | 2.56     | 6.25           | 15
Power diss. [mW]        | 0.86           | 1.8      | 1.8            | 6.8
L (1 MHz) [dBc/Hz]      | −95            | −118     | −105           | −97
FOM (1 MHz) [dBc/Hz]    | −184           | −188.7   | −178           | −172