Synthesis Lectures on Engineering, Science, and Technology
The focus of this series is general topics, and applications about, and for, engineers and scientists on a wide array of applications, methods and advances. Most titles cover subjects such as professional development, education, and study skills, as well as basic introductory undergraduate material and other topics appropriate for a broader and less technical audience.
Khaled Salah Mohamed
Deep Learning-Powered Technologies Autonomous Driving, Artificial Intelligence of Things (AIoT), Augmented Reality, 5G Communications and Beyond
Khaled Salah Mohamed
Siemens Digital Industries Software
Fremont, CA, USA
ISSN 2690-0300   ISSN 2690-0327 (electronic)
Synthesis Lectures on Engineering, Science, and Technology
ISBN 978-3-031-35736-7   ISBN 978-3-031-35737-4 (eBook)
https://doi.org/10.1007/978-3-031-35737-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
I would like to express my deepest gratitude to my late father, Eng. Salah Abouelhassan, for introducing me to the world of engineering and for always encouraging me to explore my curiosity. I am forever grateful for the countless lessons he has taught me, and for always pushing me to be the best version of myself. I am also indebted to my mother, Prof. Layla Kotb, who instilled in me a love for philosophy and critical thinking from a young age. Her unwavering support and guidance have been invaluable to me throughout my life, and I am honored to have her as a role model and mentor. Lastly, I would like to thank my beautiful daughters, Lojayne and Joury, who are a constant source of inspiration and joy in my life. Watching them grow and learn has been a privilege, and I am grateful for the ways in which they challenge me to be a better person and writer. They are, in many ways, a small version of me, and I am proud to call them my daughters.
About This Book
This book provides extensive coverage of cutting-edge deep learning technologies, making it an essential read for those seeking to expand their knowledge and understanding of this exciting field. It delves into various deep learning applications, ranging from autonomous driving to augmented reality, AIoT, 5G, and beyond, thereby providing readers with a comprehensive overview of the diverse ways in which deep learning can be used to enhance a variety of industries. Furthermore, the book offers an in-depth exploration of fundamental techniques and tools used in deep learning, as well as a detailed discussion and comparative study of different deep learning architectures, applications, tools, and platforms. In addition, the book showcases state-of-the-art applications of both machine and deep learning, providing valuable insights into the integration of these techniques with various application domains such as autonomous driving, augmented reality, AIoT, 5G, and beyond. Overall, this book serves as an invaluable reference for readers looking to design and develop deep learning applications.
Contents
1 An Introduction to Deep Learning
  1.1 Machine Learning Versus Data Mining Versus Statistics
  1.2 Classical Supervised Machine Learning Algorithms Versus Deep Learning
    1.2.1 K-Nearest Neighbor
    1.2.2 Support Vector Machine
    1.2.3 Decision Tree
    1.2.4 Linear Regression
    1.2.5 Logistic Regression
    1.2.6 Naive Bayes: Probabilistic Machine Learning
    1.2.7 Random Forest
    1.2.8 K-Means Clustering
    1.2.9 Q-Learning
    1.2.10 Deep Learning
  1.3 Deep Learning Models
    1.3.1 Feedforward Neural Network
    1.3.2 Recurrent Neural Network
    1.3.3 Convolutional Neural Network
    1.3.4 Spike Neural Network
    1.3.5 GANs: Generative Deep Learning
    1.3.6 Transfer Learning Accelerator
    1.3.7 Autoencoder
    1.3.8 Comparison Between Different NN Models
  1.4 Deep Learning Architectures
    1.4.1 AlexNet: Classification
    1.4.2 VGG: Classification
    1.4.3 GoogLeNet/Inception: Classification
    1.4.4 ResNets: Classification
    1.4.5 MobileNets: Classification
    1.4.6 RetinaNet: Classification
  1.5 Deep Learning Platforms and Libraries
    1.5.1 TensorFlow
    1.5.2 Keras
    1.5.3 Pytorch
  1.6 Challenges of Deep Learning
    1.6.1 Overfitting and Underfitting
    1.6.2 Long Computation Time
    1.6.3 Vanishing Gradient
    1.6.4 Hyper-Parameters Tuning: Weights and Learning Rate
  1.7 Optimization of Deep Learning
  1.8 Different Types of Distance Measures in Machine Learning
    1.8.1 Euclidean Distance
    1.8.2 Manhattan Distance
    1.8.3 Hamming Distance
    1.8.4 Cosine Distance
    1.8.5 Minkowski Distance
    1.8.6 Jaccard Distance
  1.9 Classification Evaluation Metrics
    1.9.1 Confusion Metric
    1.9.2 Accuracy
    1.9.3 True Positive Rate (TPR)
    1.9.4 False Negative Rate (FNR)
    1.9.5 True Negative Rate (TNR)
    1.9.6 False Positive Rate (FPR)
    1.9.7 Precision
    1.9.8 Recall
    1.9.9 F1-Score
  1.10 New Trends in Machine Learning
    1.10.1 Hamiltonian Neural Networks
    1.10.2 Quantum Machine Learning
    1.10.3 Federated Learning
    1.10.4 Self-supervised Learning
    1.10.5 Zero-Shot Learning and Few-Shot Learning
    1.10.6 Neurosymbolic AI
    1.10.7 Binarized Neural Networks
    1.10.8 Text to Video Machine Learning
    1.10.9 Graph Neural Networks
    1.10.10 Large Language Model (LLM)
  1.11 Conclusions
  References

2 Deep Learning for Autonomous Driving
  2.1 Introduction
  2.2 Automotive Basics
  2.3 AUTOSAR
  2.4 Automotive Protocols
    2.4.1 CAN Bus
    2.4.2 FlexRay
    2.4.3 Local Interconnect Network (LIN)
    2.4.4 Media Oriented Systems Transport (MOST)
  2.5 ADAS Technology
  2.6 SAE International Standard (J3016)
  2.7 Color Image Processing
    2.7.1 Color Models
    2.7.2 Image Processing Operations
  2.8 DL-Based Object Detection Algorithms
    2.8.1 You Only Look Once (Yolo): Localization
    2.8.2 Single Shot Detector (SSD): Localization
    2.8.3 Region-Based Convolutional Neural Networks (R-CNN): Localization
    2.8.4 Fast R-CNN: Localization
    2.8.5 Faster R-CNN: Localization
    2.8.6 Spatial Pyramid Pooling (SPP-Net): Localization
    2.8.7 Mask R-CNN: Localization
    2.8.8 Comparison Between Different Object Detection Algorithms
  2.9 V2X Communications and Security
  2.10 Hardware Platforms for Object Detection
    2.10.1 FPGA
    2.10.2 High Level Synthesis (C/C++ to RTL)
    2.10.3 High Level Synthesis (Python to HDL)
    2.10.4 MATLAB
    2.10.5 Java to VHDL
  2.11 Autonomous Driving Simulator
  2.12 Autonomous Driving Challenges
  2.13 Conclusions
  References

3 Deep Learning for IoT "Artificial Intelligence of Things (AIoT)"
  3.1 Introduction
  3.2 Cyber-Physical Systems
  3.3 AI, Big Data and IoT
  3.4 Edge Intelligence
  3.5 IoT Security Requirements
    3.5.1 Authorization
    3.5.2 Authentication
    3.5.3 Integrity
    3.5.4 Confidentiality
    3.5.5 Resilience to Attacks
  3.6 IoT Challenges
  3.7 IoT Applications
    3.7.1 Industrial Automation
    3.7.2 Healthcare
    3.7.3 Environment Monitoring
    3.7.4 Smart Buildings
    3.7.5 Smart Logistics
  3.8 Digital Twin
    3.8.1 Digital Twin Enabling Technologies
    3.8.2 Digital Twin Versus Simulation
    3.8.3 How to Build a Digital Twin
  3.9 New Trends for IoT/AIoT
    3.9.1 IIoT
    3.9.2 LLM for AIoT
    3.9.3 IoUT
    3.9.4 IoBNT
    3.9.5 IoIV
    3.9.6 TinyML
  3.10 Conclusions
  References

4 Deep Learning for Spatial Computing: Augmented Reality and Metaverse "the Digital Universe"
  4.1 Introduction
  4.2 Immersive Technologies: Virtual Reality, Augmented Reality, Augmented Virtuality, Mixed Reality
    4.2.1 Types of Augmented Reality
    4.2.2 Augmented Reality Key Components
    4.2.3 Metaverse
    4.2.4 Interaction with Virtual Digital Representation
  4.3 Virtual/Augmented Reality Applications
    4.3.1 Augmented Reality in Digital Learning
    4.3.2 Augmented Reality in Tourism
    4.3.3 Augmented Reality in Games
    4.3.4 Augmented Reality in eCommerce
    4.3.5 Augmented Reality in Consumer Electronic
    4.3.6 Augmented Reality in Manufacturing
    4.3.7 Augmented Reality in Health
    4.3.8 Augmented Reality in Construction
    4.3.9 Augmented Reality in Fashion
  4.4 Conclusions
  References

5 Deep Learning for 5G and Beyond
  5.1 Introduction
  5.2 Deep Learning Applications in 5G/6G
  5.3 Deep Learning Applications at Physical Layer
    5.3.1 Channel Coding
    5.3.2 Synchronization
    5.3.3 Positioning
    5.3.4 Channel Estimation/Prediction
    5.3.5 Beamforming Design
    5.3.6 Optimization
    5.3.7 Spectrum Sensing, Sharing and Management
    5.3.8 Interference Management
    5.3.9 Error Detection and Correction
  5.4 Deep Learning Applications at MAC Layer
    5.4.1 Flexible Duplex
    5.4.2 Power Management
    5.4.3 Resource Allocation and Management
    5.4.4 Modulation and Coding Scheme Selection
    5.4.5 Scheduling
    5.4.6 Link Evaluation
  5.5 Deep Learning Applications at Network Layer
    5.5.1 Routing
    5.5.2 Anomaly Detection
    5.5.3 Traffic Prediction
  5.6 Deep Learning Applications at Application Layer
    5.6.1 Performance Management
  5.7 ML Deployment for 5G/6G: Case-Studies
    5.7.1 BER Estimator
    5.7.2 Blockchain for Security
  5.8 Conclusions
  References

6 Python for Deep Learning: A General Introduction
  6.1 Introduction
    6.1.1 What Is Python?
    6.1.2 Why Should You Learn Python?
    6.1.3 Python Applications: What Can You Do with It?
    6.1.4 Python Programming Environment and Tools: What Do You Need to Write Programs in Python?
    6.1.5 Integrated Development Environment (IDE)
    6.1.6 The Big Picture of Any Program
    6.1.7 Create Your First Program in Python: Hello World Program
    6.1.8 Python Versus Java
    6.1.9 Python Versus C++
  6.2 Data Types
    6.2.1 Numbers and Functions of Numbers
    6.2.2 Strings and Functions of Strings
    6.2.3 List and Functions of List
    6.2.4 Tuples
    6.2.5 Dictionary
    6.2.6 Classes and Objects
  6.3 Inputs
  6.4 External Functions: User-Defined Functions
  6.5 Control: If Statement, While Loop, For Loop
    6.5.1 If/If Else/If Elif Else Statement
    6.5.2 While Loop
    6.5.3 For Loop
  6.6 Reading and Writing from External Files
  6.7 Modules and pip
  6.8 Debugging
    6.8.1 Comments
    6.8.2 Printing Quotations
    6.8.3 Assert
    6.8.4 Try-Except
  6.9 Data Visualization
  6.10 Database
  6.11 GUI
  6.12 From Python to .exe
  6.13 Images
  6.14 Web Browser
  6.15 Android
  6.16 Python to HDL
  6.17 Python for Machine Learning
  6.18 Conclusions
  References

Index
1 An Introduction to Deep Learning
Machine learning (ML) algorithms try to learn the mapping from an input to an output from data, rather than through explicit programming. ML uses algorithms that iteratively learn from data to describe data or predict outcomes, i.e., to produce precise models based on that data. As the training data becomes larger, it is possible to produce more precise models. Deep neural networks (DNNs) have achieved great success in many different applications. Moreover, DNNs have inspired new applications in different technologies such as autonomous driving, the Internet of Things (IoT), 5G and beyond, and augmented reality (Fig. 1.1). Deep learning (DL) is a form of ML, and DL architectures can be adapted, for example, to solve the detection problem in camera-based tracking for augmented reality (AR). Autonomous driving systems are an example of using deep learning with AR. The Internet of Things (IoT) is a global infrastructure for the information society, enabling advanced services by interconnecting (physical and virtual) things based on existing and evolving interoperable information and communication technologies. AR can be combined with IoT in a new paradigm called the Virtual Environment of Things (VEoT). In this work, we introduce applications of DL in different domains such as IoT, AR, 5G and beyond, and autonomous driving. IoT devices capture data from the physical world so that it can be analyzed using deep learning, and AR devices take that digital data and render it back onto the physical world for people to view and interact with. IoT can be combined with AI to form a new paradigm called the Artificial Intelligence of Things (AIoT). 5G and beyond requires real-time decision-making. Deep learning applications for 5G systems include, but are not limited to, prediction of faults, fraud detection, improvement of service quality, chatbots, recommendation systems, and optimization of device resource usage. Self-driving cars deploy deep neural network (DNN) inference for real-time object detection to avoid obstacles on the road. DL has many challenges: model size, as DNNs have very many parameters; training time, which has become a huge bottleneck, since a long training time limits the productivity of ML researchers; and energy efficiency.
Fig. 1.1 Some of deep learning applications: autonomous driving, IoT, 5G and beyond, and augmented reality
So, we need efficient deep learning models and methods. Deep neural networks rely on massive, high-quality data to achieve satisfying performance. When training a large and complex architecture, data volume and quality are very important, as a deeper model usually has a huge set of parameters to be learned and configured. No single ML algorithm or model fits all scenarios. To choose the right ML model, several factors should be taken into consideration, such as the complexity of the problem, the type of data (structured, unstructured, texts, images, etc.), and the required latency (real-time or not). The absence of high-quality labelled data may result in inaccurate ML algorithms. In such cases, we need to integrate prior knowledge from physics-based models with ML techniques. We are in the era of data; the important challenge is to convert the data to the wisdom level (Fig. 1.2).
1.1 Machine Learning Versus Data Mining Versus Statistics
Machine learning and data mining often employ the same methods, but while machine learning focuses on prediction, data mining focuses on the discovery of previously unknown properties in the data. In other words, data mining uses many machine learning methods, but with different goals. Machine learning and statistics are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns.
Fig. 1.2 From data to wisdom: data (symbols) → information (who, what, where, when) → knowledge (how) → understanding (why) → wisdom (evaluated understanding)
1.2 Classical Supervised Machine Learning Algorithms Versus Deep Learning
The types of machine learning are supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning (Figs. 1.3 and 1.4). In supervised ML, a machine learns from past data and then produces the desired output. Supervised learning algorithms can be classified into two groups, namely regression and classification. Regression is a supervised learning technique whose output variables are continuous and can be assigned any real value; examples are weight, size, and amount. Classification is a type of supervised learning in which the output vectors are usually discrete or non-continuous, such as gender and color. There are three kinds of classification tasks: binary classification, where there are only two exclusive classes; multiclass classification, where there are more than two classes and the classifier can only report one of them as output; and multilabel classification, which is a combination of multiple independent binary classifiers. When a machine learns from unlabeled data, or discovers the input pattern itself, it is known as unsupervised learning. When the machine learns from both labeled and unlabeled data, it is known as semi-supervised learning. Researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy. Reinforcement learning does not need training examples: models are given an environment, a group of actions, a goal, and a reward, and the algorithm learns by rewards and penalties. ML applications are summarized in Table 1.1.

Fig. 1.3 Machine learning types (e.g., SVM, naïve Bayes, nearest neighbor, random forest, decision tree, linear regression, logistic regression, Q-learning, neural networks, K-means clustering)

Fig. 1.4 Machine learning types versus data types

Computer vision (CV) and pattern recognition is one of the earliest applications of DL. Image classification, object detection, object/instance segmentation, image reconstruction, and captioning are within the scope of CV models. Natural language processing (NLP) deals with processing sequential data. Recurrent neural networks (RNNs), long short-term memory (LSTM), and gated recurrent units (GRUs) were used in language modeling until the state-of-the-art Transformer model was introduced. NLP models are used in machine translation, text summarization, speech recognition, syntactic and semantic parsing, question answering, dialog systems, etc. There are three classifications of machine learning: rule-based learning, feature-based learning, and representation learning (Fig. 1.5) [1–4].
Table 1.1 ML applications

Supervised learning: face recognition, pattern recognition [7], speech recognition [8], email spam filtering, handwriting recognition [9], intrusion detection
Unsupervised learning: social network analysis, analysis of cancer diagnosis
Semi-supervised learning: sentiment analysis
Reinforcement learning [10]: computer games, stock market analysis, robotics
Fig. 1.5 a Rule-based learning (input + rules → output), b feature-based learning (input → handcrafted features → mapping → output), c representation learning (input → automated features → mapping → output)
Spark NLP [5] is an open-source natural language processing (NLP) library built on top of Apache Spark, a popular big data processing framework. It provides a high-level API for building scalable and efficient NLP pipelines on large datasets, leveraging the distributed processing capabilities of Spark. Spark NLP offers a wide range of NLP functionalities, including tokenization, part-of-speech tagging, named entity recognition, sentiment analysis, and text classification. It also supports pre-trained models for several NLP tasks, such as word embeddings and language detection. Spark NLP is designed to work with several programming languages, including Python, Scala, and Java, and it can be used with several big data platforms, including Apache Hadoop, Amazon EMR, and Google Cloud Dataproc. One of the key benefits of Spark NLP is its scalability: it can process large volumes of text data in a distributed and parallelized manner, making it suitable for handling big data NLP tasks. It also provides a unified API for NLP tasks, making it easier to build and maintain NLP pipelines.

Classical ML techniques can make use of the data volume only up to some magnitude; their performance will not improve further even if we add more data. DL methods can exploit big data by building complex models that learn nonlinear relationships between data. The analogy between the human brain and different types of learning is shown in Fig. 1.6.

In the context of machine learning, the level of automation for batch, real-time, and near real-time processing is typically related to the speed at which data is processed and analyzed by machine learning models. Batch processing involves processing a large dataset in batches or chunks, often during off-peak hours or in the background. It is commonly used for tasks such as training machine learning models or running analytics on historical data, and it can be highly automated using tools such as Apache Spark, which can distribute and parallelize computations across a cluster of machines. Real-time processing involves processing data as it arrives, with minimal delay. It is commonly used for applications such as fraud detection or anomaly detection, where immediate action is required based on incoming data, and it requires highly automated systems that can rapidly process and analyze incoming data, such as stream processing frameworks like Apache Flink or Apache Kafka. Near real-time processing involves processing data in smaller batches, typically every few minutes, to provide faster results than traditional batch processing but not as fast as real-time processing. It is commonly used in applications such as predictive maintenance or recommendation systems, and it can be automated using tools such as Apache Beam or Apache NiFi, which process data in near real-time by leveraging batch processing techniques [6].

Fig. 1.6 Analogy between human brain and different types of learning. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons [11]
1.2.1 K-Nearest Neighbor
K-nearest neighbor (K-NN) is a probabilistic pattern recognition technique that classifies an output based on the most common class among its k nearest neighbors in the training data. Nearness can be computed with a distance or correlation metric; here, the similarity function is the Euclidean distance. KNN is a simple algorithm that uses the whole dataset in the training stage: when a forecast is required for an unseen data point, the whole training set is searched for the k most similar cases, and the output of the most similar occurrence is returned as the forecast. KNN is often used in search applications to find items similar to a given one, and it can be used for both classification and regression. The concept of KNN is as follows: first, compute the distance between the new sample and the training samples; then, according to the classes of the neighbors, obtain the class of the new sample, i.e., the new sample is categorized into the nearest category. The nearest element in the space is obtained according to the standard Euclidean distance, given by [12]:

$$d_j = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \quad (1.1)$$

where $x_i$, $y_i$ are two elements in the space, k is generated automatically in the model construction process, and the forecast is computed from the time-series values $d_j$.
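As a minimal sketch of this idea, the following computes Eq. (1.1) against a tiny labeled dataset and takes a majority vote; the arrays and function name here are hypothetical, not from the book:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new sample to every training sample (Eq. 1.1)
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([7.5, 8.1]), k=3))  # -> 1
```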
1.2.2 Support Vector Machine
SVM was originally used for the linear classification of data; however, the use of a linear kernel limits its accuracy in nonlinear classification tasks. Therefore, the SVM algorithm was extended by implementing a Gaussian kernel [13]. This allows the algorithm to map data to an unlimited-dimensional space where the data can become more separable in a higher dimension [14, 15]. The model attempts to obtain the maximum margin when classifying between the elements, using the hyperplane in the middle. The training elements that define the hyperplane are called support vectors. In a 2D feature space, the hyperplane is a line, and in a 3D feature space it is a flat plane. If the data is linear, the hyperplane can easily be identified to separate each class. On the other hand, if the data is nonlinear and inseparable with linear planes, kernel functions map the nonlinear input data to a high-dimensional space to make it linearly separable.
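As a sketch of this in practice, scikit-learn's SVC with an RBF (Gaussian) kernel performs this nonlinear mapping implicitly; the toy data below is made up for illustration:

```python
from sklearn.svm import SVC

# Toy 2D data: class 0 clustered near the origin, class 1 surrounding it,
# which is not separable by any straight line
X = [[0, 0], [0.2, 0.1], [-0.1, 0.2], [2, 0], [0, 2], [-2, 0], [0, -2]]
y = [0, 0, 0, 1, 1, 1, 1]

# The RBF kernel maps the data into a higher-dimensional space where a
# separating hyperplane exists; gamma controls the kernel width
clf = SVC(kernel="rbf", gamma=1.0)
clf.fit(X, y)

print(clf.predict([[0.1, 0.0], [1.8, 0.5]]))  # expected: [0 1]
print(len(clf.support_vectors_))  # the training points that define the hyperplane
```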
1.2.3 Decision Tree
A decision tree is a supervised learning technique that splits classification into a set of decisions that determine the class of the signal. The output of the algorithm is a tree whose decision nodes have multiple branches and whose leaf nodes decide the classes [16]. An example is shown in Fig. 1.7, and a small code sketch follows the figure. Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
Fig. 1.7 Decision tree example
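A minimal sketch of fitting such a tree with scikit-learn (the toy features and labels here are hypothetical); the learned branch/leaf structure described above can be printed directly:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [temperature, humidity] -> play outside (1) or not (0)
X = [[30, 80], [25, 40], [10, 30], [5, 90]]
y = [0, 1, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# Print the decision nodes (branches) and the classes at the leaves
print(export_text(tree, feature_names=["temperature", "humidity"]))
```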
1.2.4 Linear Regression
Regression analysis deals with the problem of fitting straight lines to patterns of data. The expected value of the dependent variable Y is a linear function of the independent variables X [17]:

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} \quad (1.2)$$

where the betas are obtained by least squares, that is, by minimizing the squared prediction error within the sample. A presentation of linear regression is shown in Fig. 1.8.

Fig. 1.8 Presentation of linear regression
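A small sketch of estimating the betas by least squares with NumPy; the data below is synthetic and the names are hypothetical:

```python
import numpy as np

# Synthetic data generated from Y = 2 + 3*X1 - 1*X2 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + 0.1 * rng.normal(size=100)

# Prepend a column of ones so beta_0 (the intercept) is estimated too
A = np.column_stack([np.ones(len(X)), X])
# Least squares: minimize ||A @ beta - y||^2 (Eq. 1.2)
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [2, 3, -1]
```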
1.2.5 Logistic Regression
Logistic regression is one of the best-known ML algorithms. It predicts a dependent variable from a given set of independent variables; it is used for classification problems and is based on the idea of probability. Logistic regression calculates the output of a dependent variable, so the outcome is a discrete value: yes or no, zero or one, valid or invalid. Logistic regression is used when the dependent variable has only two values, such as 0 and 1.
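A short sketch, assuming hypothetical binary-labeled data: the model outputs a probability (via the sigmoid function) that is then thresholded to a discrete 0/1 class:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> pass (1) or fail (0)
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)
# predict_proba returns P(class 0) and P(class 1); predict thresholds at 0.5
print(model.predict_proba([[5]]))  # roughly [0.5, 0.5] at the midpoint
print(model.predict([[2], [8]]))   # expected: [0 1]
```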
1.2.6 Naive Bayes: Probabilistic Machine Learning
Probability is used to represent uncertainty about the relationship being learned. Naive Bayes is a fast classification algorithm based on Bayes' theorem. The reason it is called naive is that the algorithm assumes the independence of the predictor variables; in other words, it assumes that a particular feature is not affected by the presence of other features. For each data point, the algorithm predicts the probability that the features belong to each class, and the class for which the probability ranks highest is chosen as the probable class [18–21]. Before explaining naive Bayes, we should first examine Bayes' theorem, which is used to find the probability of a hypothesis given the evidence:

$$P(A|B) = \frac{P(A)\,P(B|A)}{P(B)} \quad (1.3)$$

where A is the hypothesis and B is the evidence, P(B|A) is the probability of B given that A is true, and P(A) and P(B) are the independent probabilities of A and B. It can also be expressed as in (1.4):

$$P(\text{hypothesis}|\text{proof}) = \frac{P(\text{hypothesis})\,P(\text{proof}|\text{hypothesis})}{P(\text{proof})} \quad (1.4)$$

The probability function tells us the probability of data y given a parameter vector θ. The likelihood function performs a mapping in the opposite direction: given the data y, it gives the likelihood of the values in the parameter vector θ.
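A worked numeric instance of Eq. (1.3), with made-up numbers: suppose 2% of emails are spam (P(A)), 90% of spam emails contain the word "offer" (P(B|A)), and 10% of all emails contain it (P(B)):

```python
# Bayes' theorem (Eq. 1.3): P(spam | "offer") = P(spam) * P("offer" | spam) / P("offer")
p_spam = 0.02             # prior: P(A)
p_offer_given_spam = 0.9  # likelihood of the evidence: P(B|A)
p_offer = 0.10            # probability of the evidence: P(B)

posterior = p_spam * p_offer_given_spam / p_offer
print(posterior)  # 0.18 -> an email containing "offer" is spam with probability 18%
```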
1.2.7 Random Forest
Random forest is a popular supervised learning algorithm. It is a collection of decision trees and works on the bagging technique, which is an ensemble method. In an ensemble, instead of using an individual model, a collection of models is used to predict the output; the most popular ensemble is the random forest, a collection of many decision trees. Bagging stands for bootstrap aggregation, where bootstrap refers to creating bootstrap samples of a given dataset containing 30–70% of the data, and aggregation refers to aggregating the results of the various models present in the ensemble [22]. Random forest is a classifier that combines several decision trees on different subsets of a dataset and averages the results to increase the predictive accuracy on the dataset. Instead of relying on a single decision tree, random forest collects the forecasts from each tree and predicts the final output based on the majority vote of the predictions (Fig. 1.9). Random forest is also a standard decision-tree-based algorithm for time-series analysis.
Fig. 1.9 Random forest algorithm
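A brief sketch of the majority-vote idea with scikit-learn (hypothetical toy data); n_estimators is the number of trees in the forest, each fit on its own bootstrap sample:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: each tree is trained on a bootstrap sample of these rows
X = [[0, 0], [1, 1], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# The prediction is the majority vote across the 50 trees
print(forest.predict([[0.5, 0.5], [5.5, 5.5]]))  # expected: [0 1]
```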
1.2.8 K-Means Clustering
The K-means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different (far apart) as possible. It assigns data points to a cluster such that the sum of the squared distances between the data points and the cluster's centroid (the arithmetic mean of all the data points that belong to that cluster) is at a minimum. The less variation we have within clusters, the more homogeneous (similar) the data points are within the same cluster [23, 24].
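A compact sketch with scikit-learn (the 2D points are hypothetical); n_clusters is K, cluster_centers_ holds the centroids, and inertia_ is the within-cluster sum of squared distances described above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 2D points (hypothetical data)
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment of each point
print(kmeans.cluster_centers_)  # centroid (arithmetic mean) of each cluster
print(kmeans.inertia_)          # sum of squared distances to the nearest centroid
```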
1.2.9 Q-Learning
Neural networks are usually trained under the paradigm of supervised learning, i.e., on input–output pairs from some ground-truth data set. A different paradigm is that of reinforcement learning (RL), where an agent is not told what action it should take, but instead receives a reward or penalty for its actions. Rewards and penalties are dictated by an external environment. The goal of RL is for the agent to learn a policy (strategy) that maximizes its reward [25]. Q-learning is a reinforcement learning method that finds the next best action given a current state: it explores actions and aims to maximize the cumulative reward, and it determines the optimal policy for any finite Markov decision process.
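The core of tabular Q-learning is the update rule Q(s, a) ← Q(s, a) + α[r + γ·max_a′ Q(s′, a′) − Q(s, a)]. A minimal sketch of one update step follows; the environment, state/action counts, and constants here are hypothetical:

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # Q-table: expected return per (state, action)
alpha, gamma = 0.1, 0.9              # learning rate, discount factor

def q_update(s, a, reward, s_next):
    # Move Q(s, a) toward the observed reward plus the discounted
    # best value achievable from the next state
    best_next = np.max(Q[s_next])
    Q[s, a] += alpha * (reward + gamma * best_next - Q[s, a])

# One hypothetical transition: in state 0, action 1 yields reward 1.0 and leads to state 2
q_update(s=0, a=1, reward=1.0, s_next=2)
print(Q[0, 1])  # 0.1 after the first update
```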
1.2.10 Deep Learning

The deep learning journey dates back to the early 1940s. Deep learning (DL) is a class of machine learning whose algorithms consist of neural networks with several hidden layers. The advantage of deep learning algorithms over traditional machine learning algorithms is their ability to automatically extract features from the input data, without the bias that comes with manual feature extraction in classical machine learning algorithms. However, DL algorithms require significantly more data and computation resources than classical machine learning algorithms. In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice, and text processing [26].
1.3 Deep Learning Models
Artificial intelligence (AI) aims to make machines mimic the seven-layer layered reference model of the brain (LRMB), which comprises more than 43 cognitive processes. Deep learning (DL) is the foundation of AI. Deep learning is an advanced form of artificial neural networks (ANNs), which are an imitation of the human nervous system. DL has a variety of applications, such as image classification, audio recognition, object detection, autonomous driving, computer vision, and natural language processing. Deep learning processing requires huge computation intensity and large memory storage, and conventional computing platforms are not efficient for it as Moore's law scaling reaches its end. Moreover, CPUs/GPUs no longer benefit from increasing the number of cores; i.e., deep learning is still too computationally heavy even with the help of CPUs/GPUs. Thus, specialized accelerator architectures play a vital role in addressing these challenges by increasing speed and lowering power. Convolutional neural networks (CNNs), multilayer perceptrons (MLPs), recurrent neural networks (RNNs), and generative adversarial networks (GANs) are the major cores of deep learning. Matrix–vector multiplication (MVM) is the major computational module in CNNs; MVM consists of massive multiplication and accumulation operations (MACs), and MACs are also the major computation component in MLPs. The main difference between an MLP and an RNN is that in an RNN the hidden layer is additionally connected to itself [27]. These operations can be processed independently to increase temporal/spatial parallelism, and memories can also be accessed in parallel; FPGA-based hardware accelerators can exploit this to improve the handling of MVM operations. FPGAs have recently been adopted to accelerate the development of DL because they can achieve lower latency and consume less power compared to GPUs, while being more flexible than ASICs, which cannot adopt new RNN models [28].
The most important deep learning architectures are: multilayer perceptrons (MLPs), the basic models that are generally used in many learning tasks; convolutional neural networks (CNNs), which use the convolution operation to reduce the input size and obtain feature maps, and are often used in image recognition tasks; and recurrent neural networks (RNNs), which are most suitable for learning tasks that require sequential models. Autoencoder-based deep learning models are used for dimension reduction, and generative adversarial networks (GANs) are used to generate samples similar to the available dataset. The analogy between a biological neuron (perceptron) and its mathematical model is shown in Fig. 1.10 and Table 1.2. A single-layer network can be extended to a multiple-layer network, referred to as a multilayer perceptron. A multilayer perceptron, or MLP for short, is an artificial neural network with more than a single layer. An artificial neural network (ANN) is one way to realize machine learning: it uses the structural and functional features of biological neural networks to build mathematical models for estimating or approximating functions. ANNs are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.

Fig. 1.10 Analogy between a biological neuron (perceptron) (a) and its mathematical model (b)

The deep learning process involves two stages: training and inference. Training, or parameter learning, is the process of estimating the parameters of the neural network model based on known data. Regarding the size of the training data, the number of training examples should be at least five to ten times the number of weights of the network. In inference, the trained neural network model is deployed in the system to predict the labels of unknown data obtained by the sensors in real time (Tables 1.3 and 1.4). Data pre-processing is required before training any model, to make sure that the training process behaves well by improving the numerical condition of the optimizer; this also ensures that the various initializations are made appropriately. A neural network is trained through an optimization process that uses a loss function to calculate the error between the predicted value of the model and the expected output; typically, the loss function is minimized when training the neural network. There are many loss functions, and it can be challenging to choose a suitable one for a specific problem; mean square error (MSE) can be employed as the loss function. The flow is shown in Fig. 1.11. Backpropagation (BP) is a method to update the weights of neural networks in combination with gradient descent. Gradient descent is one of the most common algorithms used to optimize neural networks: by calculating the gradient and moving in the reverse direction, the error between the predicted result of the network and the expected output is reduced during the training process (Fig. 1.12). Backpropagation adjusts the weights of the NN in order to minimize the network's total mean squared error (Fig. 1.13). The activation functions play a determining role in the training process, and consequently in the network's effectiveness, by adjusting the neurons' outputs. A function is attached to each neuron in the network and decides whether it should be activated or not, based on whether its input is relevant for the model's prediction. The sigmoid activation function is graphed in Fig. 1.14, and Table 1.4 summarizes some of the important activation functions.
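A tiny sketch of gradient descent minimizing an MSE loss for a single linear neuron (made-up data; in a real network, backpropagation computes these gradients layer by layer):

```python
import numpy as np

# Made-up data from y = 4x; the "network" is a single weight w
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.0, 8.0, 12.0, 16.0])

w, lr = 0.0, 0.01  # initial weight and learning rate
for epoch in range(50):
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)     # MSE loss
    grad = np.mean(2 * (y_pred - y) * x)  # dLoss/dw
    w -= lr * grad                        # move against the gradient
print(w)  # converges toward 4.0
```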
Table 1.2 Comparison between a human neuron and a neural network

Human neuron | Neural network
Cell body | Node
Axon | Weight
Synapses | Activation

Table 1.3 Comparison between training mode and inference mode

Attribute | Training | Inference
Performance metric | Time | Throughput, latency
Memory capacity | High | Medium
Data types | FP | Integer
DL is a subset of ML in which several linear as well as nonlinear processing units are organized in a deep layered design so that the model can capture the abstraction in the data. Figure 1.15 shows the basic architecture of DL, which can learn features automatically. Deep neural networks are usually overparameterized. Pruning removes the redundant elements in neural networks to reduce the model size and computation (Fig. 1.16). Pruning can be performed at different granularities: fine-grained pruning removes individual elements from the weight tensor, while coarse-grained pruning removes a regular tensor block for better hardware efficiency. To train a neural network sufficiently well, the learning model needs a huge amount of raw data to work with. Since large data is not always available, there is a need for data augmentation: a technique in which the quantity of raw data is increased through the addition of slightly modified copies of the already existing data.
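A minimal sketch of fine-grained (magnitude) pruning on a weight matrix with NumPy; this zeroes out the smallest individual weights, one common pruning criterion, and the 50% sparsity target is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # a hypothetical trained weight tensor

sparsity = 0.5  # fraction of weights to remove
threshold = np.quantile(np.abs(W), sparsity)
mask = np.abs(W) >= threshold  # keep only the largest-magnitude weights
W_pruned = W * mask

print(f"zeroed {100 * (1 - mask.mean()):.0f}% of the weights")
```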
1.3.1 Feedforward Neural Network
This is one of the simplest types of artificial neural networks. In a feedforward neural network, the data passes through the different input nodes until it reaches the output node; in other words, data moves in only one direction, from the first tier onwards until it reaches the output node (connections between the nodes do not form a cycle). This is also known as a front-propagated wave, usually achieved by using a classifying activation function. Unlike in more complex types of neural networks, there is no backpropagation: data moves in one direction only. A feedforward neural network may have a single layer, or it may have hidden layers. In a feedforward neural network, the sums of the products of the inputs and their weights are calculated. It can be a single-layer perceptron, a multilayer perceptron, or a radial basis function network. The feedforward neural network architecture is shown in Fig. 1.17 [29].
Table 1.4 Activation functions examples

| Comparison | Sigmoid | Tanh (hyperbolic tangent) | ReLU (rectified linear units) |
|---|---|---|---|
| Equation | f(x) = 1/(1 + e^(−x)) | f(x) = tanh(x) | f(x) = max(0, x) |
| Advantages | • Smooth gradient • Good classifier • Clear predictions | • More efficient than sigmoid as it has a wider range for faster learning • Smooth gradient • Good classifier • Clear predictions | • Most efficient • Good classifier |
| Disadvantages | • Vanishing gradient • Computationally expensive | • Vanishing gradient • Computationally expensive | • The dying ReLU problem: for any input that is close to zero or negative, the gradient will be zero |
Fig. 1.11 NN model development lifecycle: train the model, test it, and if the accuracy is accepted, release the model; otherwise augment the data and retrain
Fig. 1.12 Gradient descent: 3D view of the error landscape. When we train a network, we start from the initial weights (possibly random) and iteratively adjust them. The error goes down, or descends, the surface and gradually reaches the minimum point
Fig. 1.13 Back propagation concept
Fig. 1.14 Sigmoid activation function
Fig. 1.15 The basic architecture of deep learning
Fig. 1.16 Pruning
Fig. 1.17 Feedforward neural network
Deep Belief Networks (DBNs) are structurally similar to FC networks, but they differ in their training (Fig. 1.18). DBNs are trained by exploiting the concept of successive RBM pairs. Every layer in a DBN is a visible RBM layer for the next neighbor and a hidden RBM layer for the previous one. This pairwise training is a beneficial substitute for the BP training scheme in FC networks. A DBN, as a supervised method, can use labeled data to fine-tune the weights after RBM initialization. It overcomes the gradient vanishing and gradient explosion problems of the BP method by employing the RBM training scheme. It can be used in both supervised and unsupervised learning. However, it usually requires a larger training time, especially when it undergoes a fine-tuning step [30, 31].
1.3.2 Recurrent Neural Network
RNNs allow us to solve many time-sequence problems that traditional deep neural networks have difficulty dealing with, such as audio or video [32]. The main difference between an MLP and an RNN is that in an RNN the hidden layer is additionally connected to itself (Fig. 1.19). An RNN performs matrix/vector operations on the given input sequences and is more efficient in terms of area and energy. Both deep feedforward and convolutional neural networks have the characteristic that their network structures are arranged in order and there is no feedback structure.
Fig. 1.18 Deep belief network
Fig. 1.19 RNN architecture
However, in particular tasks such as speech signal processing, to better capture the time-sequential features of the input vector, it is sometimes necessary to combine the sequential inputs [33]. RNNs have the capability to identify the dynamic temporal behaviors of time sequences. Long short-term memory (LSTM) is a special class of RNNs which has the capacity to learn long-term dependencies [34, 35]. An LSTM keeps track of crucial information, which positively affects the network performance.
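A minimal sketch (with arbitrary dimensions and random weights) of the recurrence that distinguishes an RNN from an MLP: the hidden state at step t depends on both the current input and the previous hidden state through the hidden layer's self-connection:

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 4

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden (self-connection)
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                     # initial hidden state
sequence = rng.normal(size=(5, input_dim))   # toy input sequence of length 5

for x_t in sequence:
    # The hidden layer feeds back into itself through W_hh
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print("final hidden state:", h)
```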
1.3.3 Convolutional Neural Network
A convolutional neural network (CNN) is a special type of multilayer feedforward neural network that is well suited for pattern classification. A CNN is a stacked structure of layers (Fig. 1.20). The major computation in a CNN layer is the weighted sum of products between the inputs and the model weights, where the weights are learned during the training phase and the inputs come from the previous layer [36].
Fig. 1.20 CNN architecture
Convolution in CNNs shifts a group of 3D filters over an input tensor and outputs a result tensor. CNN training is an iterative process in which each iteration consists of a forward phase and a backward phase. The forward phase feeds input images sequentially through the layers; the output of the previous layer is the input to the next. The final layer computes a loss between its output and the ground truth of the input. The backward phase propagates the loss layer by layer [37]. The gradients of each layer's weights are also computed during the backward phase, and at the end of the backward phase all the weights are updated [38]. The CNN can be described by Eqs. (1.5)–(1.7), where the variables used are summarized in Table 1.5.

$$y = \sum_{i=1}^{n} w_i x_i \quad (1.5)$$

$$\text{Loss} = \text{Cost function} = \sum_{i=1}^{n} (\text{Prediction}_i - \text{Target}_i)^2 \quad (1.6)$$

$$w_{i+1} = w_i - \eta \frac{\partial \text{Loss}}{\partial w_i} \quad (1.7)$$
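The following sketch mirrors Eqs. (1.5)–(1.7) for a single linear unit: a forward weighted sum, a squared-error loss, and a gradient-descent weight update. The data, target, and learning rate are illustrative only:

```python
import numpy as np

x = np.array([1.0, 2.0, -1.5])   # inputs x_i
w = np.array([0.3, -0.2, 0.5])   # weights w_i
target = 0.8                     # desired output
eta = 0.1                        # learning rate

for step in range(3):
    y = np.dot(w, x)              # Eq. (1.5): y = sum_i w_i x_i
    loss = (y - target) ** 2      # Eq. (1.6): squared-error loss
    grad = 2 * (y - target) * x   # dLoss/dw_i for the squared error
    w = w - eta * grad            # Eq. (1.7): gradient-descent update
    print(f"step {step}: loss={loss:.4f}")
```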
There are many convolutional neural network (CNN) architectures, such as VGG, AlexNet, GoogLeNet, and ResNet. Different convolutional neural networks are compared in terms of accuracy, number of layers, and floating-point operations in Table 1.6. The input signal of a CNN can be any type of signal, such as image, video, sound, or text. A convolutional neural network is a kind of feedforward neural network with convolution operations and a deep structure. The convolutional neural network, as an important supporting technology of deep learning, promotes the development of artificial intelligence.
Table 1.5 Summary of CNN descriptive variables in Eqs. (1.5)–(1.7)

| Variable | Meaning |
|---|---|
| w_i | Weights |
| x_i | Inputs |
| y | Output |
| ∂Loss/∂w | The gradients of each layer's weights |
| η | Learning rate |
| w_{i+1} | The updated weights |
Convolution operators can effectively extract spatial information and are widely used in the field of visual images, including image recognition [39]. The CNN architecture consists of the following layers [40]:

• Convolution layer: extracts low-level features such as orientation, edges, gradient, and color.
• Pooling layer: helps to reduce the dimension of the features extracted in the convolution layer, which reduces the computational complexity of classifying the data.
• Fully connected layer: takes the output of the convolution and pooling layers as input and assigns a label to an image in the dataset. To do this, it converts the output of the convolution and pooling layers into a single vector that represents the probability of a feature belonging to a class.

Sparse CNNs leverage the fact that a substantial fraction of the values in weights and activations are zeros. Hardware accelerators exploit weight and activation sparsity by skipping multiplications by zero and by not storing zero values. Sparsity in weights and activations arises for different reasons, and to different degrees. First, weight pruning removes filter weights with near-zero values. This process creates significant weight sparsity [41, 42].

Table 1.6 Comparison between different types of CNN [43]

| | AlexNet [44] | VGG [45] | GoogLeNet [46] | ResNet [47] |
|---|---|---|---|---|
| Accuracy % | 80.2 | 89.6 | 89.9 | 96.3 |
| Layers | 8 | 19 | 22 | 152 |
| FLOPS | 729 M | 19.6 G | 1.5 G | 11.3 G |
Fig. 1.21 Spike neural network
1.3.4 Spike Neural Network
Spiking neural networks (SNNs) are inspired by information processing in biology, where sparse and asynchronous binary signals are communicated and processed in a massively parallel fashion. SNNs on neuromorphic hardware exhibit favorable properties such as low power consumption, fast inference, and event-driven information processing. This makes them interesting candidates for the efficient implementation of deep neural networks and the method of choice for many machine-learning tasks (Fig. 1.21). Spiking neural networks (SNNs) use unsupervised bio-inspired neurons and synaptic connections, trainable with either biological learning rules such as spike-timing-dependent plasticity (STDP) or supervised statistical learning algorithms such as surrogate gradient [48, 49]. Unlike conventional feed-forward neural networks (ANNs), spiking neural networks (SNNs) give smooth tradeoffs between accuracy, latency, and power. This property makes them capable of “fast and slow” anytime perception [50, 51].
1.3.5 GANs: Generative Deep Learning
In a GAN training phase, a generator (G) and a discriminator (D) are iteratively trained against each other through an adversarial process. G generates synthetic samples based on an input image, while D implements a binary classifier to differentiate the samples generated by the generator from real samples [52]. In supervised training, a lot of human
effort is needed to generate labeled training data. To solve this problem, GANs are recognized as an attractive solution, since they can generate new samples from high-dimensional data distributions [53]. A generative adversarial network (GAN) is an unsupervised learning algorithm that creates new data samples that resemble the training data [54–56].
1.3.6 Transfer Learning Accelerator
Recently, transfer learning emerged as an efficient training paradigm that re-utilizes an existing neural network for a different task, thus reducing the effort needed for training and data labeling in supervised learning [57]. Training a learning system from scratch is difficult because it requires many labeled images for training and a significant amount of expertise in the field to ensure that the model converges properly. A promising alternative is to refine a learning system that has already been developed. Transfer learning techniques, as opposed to conventional machine learning techniques that try to learn each task from scratch, try to transfer information from previous tasks to a target task when the latter has less high-quality training data [58]. Transfer learning is actually very similar to the way humans learn; the knowledge gained by completing a task can also be used to solve other related tasks. The more related the tasks are, the easier the knowledge is transferred (Fig. 1.22) [59]. The transfer learning steps can be summarized as follows (a code sketch follows the list):

• Select a pre-trained model.
• Remove the last layer(s) of the model.
• Add new layer(s) to the model.
• Train the model on the new dataset.
• Fine-tune the model.
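A hedged Keras sketch of these steps, reusing a pre-trained MobileNetV2 backbone; the choice of backbone, the 10-class head, and all hyperparameters are assumptions made only for illustration:

```python
import tensorflow as tf

# 1. Select a pre-trained model (ImageNet weights, without its classifier head).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # 2. Freeze the backbone (its old top layers are already removed).

# 3. Add new layer(s) for the target task (assumed: 10 classes).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 4. Train on the new dataset (new_train_ds is a placeholder tf.data.Dataset).
# model.fit(new_train_ds, epochs=5)

# 5. Fine-tune: unfreeze the backbone and train again with a small learning rate.
# base.trainable = True
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
#               loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(new_train_ds, epochs=2)
```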
Fig. 1.22 Transfer learning versus learning from scratch
1.3
Deep Learning Models
25
Fig. 1.23 Autoencoder architecture
1.3.7 Autoencoder
An autoencoder (AE) is a neural network that is trained to learn efficient representations of the input data; that is, it learns to copy its input to its output. AEs are mainly used for solving unsupervised problems. These networks first compress (encode) the input and then reconstruct (decode) the output from this representation by minimizing the differences between the input and the reconstructed version of the input. The point of this data compression is to get a smaller representation of the input that can be passed around. AEs are well suited to fault detection and intrusion detection. When training the AE model, the goal is for the output of the decoder to be as similar as possible to the input of the encoder [60]. The autoencoder architecture is shown in Fig. 1.23 [61].
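A minimal Keras sketch of the encoder–decoder structure in Fig. 1.23: the encoder compresses the input to a small code and the decoder reconstructs it, trained to minimize the difference between input and reconstruction. The layer sizes are assumptions (e.g., flattened 28 × 28 images):

```python
import tensorflow as tf

input_dim, code_dim = 784, 32  # assumed sizes for illustration

inputs = tf.keras.Input(shape=(input_dim,))
code = tf.keras.layers.Dense(code_dim, activation="relu")(inputs)       # encoder
outputs = tf.keras.layers.Dense(input_dim, activation="sigmoid")(code)  # decoder

autoencoder = tf.keras.Model(inputs, outputs)
# The training target is the input itself: the AE learns to copy input to output
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)  # x_train is a placeholder dataset
```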
1.3.8 Comparison Between Different NN Models
Comparison between different NN models is shown in Table 1.7.
Table 1.7 Comparison between different NN models

| | Convolutional | Spike | Recurrent |
|---|---|---|---|
| Applications | • Image recognition • Video analysis • Drug discovery • Natural language processing • Checkers game • Human pose estimation • Document analysis | • Information processing • Studying the operation of biological neural circuits; can model the central nervous system of a virtual insect seeking food without prior knowledge of the environment | • Language modelling and prediction • Speech recognition • Machine translation • Image recognition • Grammar learning • Handwriting recognition • Human action recognition |
| Advantages | • More efficient in terms of memory and complexity • Good feature extractors | • Recognizes patterns with little data • Excellent low-resolution performance | • Stores information • Much smaller set of input nodes • Given an RNN's ability to remember past input, it is suitable for any task where that would be valuable |
| Disadvantages | • From a memory and capacity standpoint, the CNN is not much bigger than a regular two-layer network • At runtime the convolution operations are computationally expensive and take up about 67% of the time • CNNs are about 3× slower than their fully connected equivalents (size-wise) | • Due to the information load and format, the inherent computational costs become much more complicated • Practical speed implementations are not at an apt level for the general consumer public | • It cannot process very long sequences if it uses tanh as its activation function • It is very unstable if ReLU is used as its activation function • RNNs cannot be stacked into very deep models • RNNs are not able to keep track of long-term dependencies |
| Learning algorithm | • Gradient descent and global optimization methods | • One-shot training and feed-forward | • Supervised training with backpropagation |

1.4 Deep Learning Architectures
In this section, we will present different deep learning architectures. Table 1.8 shows a comparison of different deep learning architectures.
1.4.1 AlexNet: Classification
AlexNet is composed of five convolutional layers (to obtain features from the image) and three fully connected layers. The last fully connected layer is connected to an N-way softmax classifier, where N is the number of classes. Moreover, it uses dropout for
Table 1.8 Comparison of different deep learning architectures

| Architecture | Layers | Parameters (×10⁶) | FLOPs (×10⁹) |
|---|---|---|---|
| AlexNet | 7 | 62 | 1.5 |
| VGG | 16 | 138 | 15.5 |
| GoogLeNet/Inception | 22 | 6.7 | 1.6 |
| ResNets | 50 | 25.6 | 3.8 |
Fig. 1.24 AlexNet architecture
regularization and ReLU for faster training convergence. It focuses on a smaller receptive window size to improve accuracy [62]. The AlexNet architecture is shown in Fig. 1.24.
1.4.2 VGG: Classification
VGG is composed of a stack of convolutional layers (from 8 to 16 layers) with three fully connected layers, followed by a softmax layer. VGG uses small convolution filters to construct networks of varying depths. It is a resource-intensive network [63]. The VGG-16 architecture is shown in Fig. 1.25.
1.4.3 GoogLeNet/Inception: Classification
GoogLeNet is a 22-layer deep network made up by stacking multiple Inception modules on top of each other. Inception modules are networks that have multiple-sized filters at the same level. Input feature maps pass through these filters and are concatenated and forwarded to the next layer. The network also has auxiliary classifiers in the intermediate layers to help regularize and propagate the gradient. GoogLeNet uses a locally sparse connection architecture instead of a fully connected one to mitigate overfitting (Fig. 1.26) [64]. Its architecture is shown in Fig. 1.27. The main idea of
Fig. 1.25 VGG-16 architecture
using Inception Blocks in GoogLeNet is to increase the depth and width of the network while keeping the computation constant.
1.4.4 ResNets: Classification
ResNet is realized by the addition of a skip connection between the layers. This connection is an elementwise addition between the input and output of the block and does not add extra parameters or computational complexity to the network. It reduces the number of parameters [65]. The ResNet architecture is shown in Fig. 1.28. A Residual Block is a stack of two 3 × 3 convolutional layers, where the number of kernels of the two layers is equal. A Residual Block/2 is a stack of two 3 × 3 convolutional layers, where the number of kernels of the first layer is one half that of the second layer.
1.4.5 MobileNets: Classification
MobileNet consists of 28 separate convolutional layers, each followed by batch normalization and a ReLU activation function. A standard convolution uses kernels on all input channels and combines them in one step, while the depthwise separable convolution (which factorizes a standard convolution into a depthwise convolution and a 1 × 1 pointwise convolution) uses a different kernel for each input channel and uses the pointwise convolution to combine the inputs. This separation of filtering and combining of features reduces the computation cost and model size. MobileNet is a lightweight object detection algorithm [66]. The effectiveness of MobileNet has been proven in a wide range of applications, with good performance in areas such as target detection and face recognition [67, 68].
Fig. 1.26 a Dense network, b sparse network
1.4.6 RetinaNet: Classification
RetinaNet is a popular one-stage object detection model which works well with dense objects. RetinaNet architecture consists of a Feature Pyramid Network (FPN) built on
Fig. 1.27 GoogLeNet architecture
Fig. 1.28 ResNet architecture
top of a deep feature extractor network, followed by two subnetworks, one for object classification and the other for bounding box regression [69, 70].
1.5 Deep Learning Platforms and Libraries
A comparison between different deep learning platforms is shown in Table 1.9. Moreover, Amazon, Google, IBM, Microsoft, and virtually every other CSP offer many ML services (cloud-based ML).
1.5.1 TensorFlow
TensorFlow is a major technical tool for writing machine learning applications. It is one of the libraries commonly used to execute machine learning and various computations, and it includes a large set of scientific operations. TensorFlow's central components are the computational graph and the tensors that move through edges between its nodes. TensorFlow is an end-to-end, open-source platform developed by Google that uses data-flow graphs to construct machine learning and deep learning models. A tensor is the data representation: a multi-dimensional array, and all computations in TensorFlow are based on tensors [71]. Different dimensions of TensorFlow are shown in Fig. 1.29. TensorFlow is a powerful tool that can be used to create a robust object detection framework with a set of standard functions, eliminating the need for users to write code from scratch. It also provides a list of pre-trained models, which are useful not only for inference but also for building models with new data.
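To illustrate tensors of different dimensions and the data-flow graphs mentioned above, a small sketch with arbitrary values (tf.function traces the Python function into a TensorFlow graph):

```python
import tensorflow as tf

# Tensors of different ranks (dimensions)
scalar = tf.constant(3.0)                       # rank 0
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2

@tf.function  # traces the computation into a data-flow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

w = tf.ones((2, 2))
b = tf.zeros((2,))
print(affine(matrix, w, b))
print("ranks:", scalar.ndim, vector.ndim, matrix.ndim)
```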
1.5.2 Keras
Keras can adopt TensorFlow as a backend. Keras was built on top of TensorFlow as a high-level API for quickly and easily designing and training network models. Keras has no low-level operations of its own; it runs on top of an open-source deep learning library called a backend, such as TensorFlow [72]. Keras does not handle low-level operations such as building the computational graph or making tensors and other variables, because these are handled only by the backend engine [73]. Keras is one of the leading high-level neural network APIs; it is written in Python and supports several backend neural network computing engines. Keras is created to be a user-friendly code interface, easy to understand and extend, and it supports modularity.
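A small sketch of the kind of quick model definition Keras is designed for; the layer sizes, the 20-feature input, and the 10-class output are assumptions for illustration:

```python
import tensorflow as tf
from tensorflow import keras

# A tiny fully connected classifier built with the high-level Keras API
model = keras.Sequential([
    keras.Input(shape=(20,)),                      # assumed 20 input features
    keras.layers.Dense(64, activation="relu"),     # hidden layer
    keras.layers.Dense(10, activation="softmax"),  # assumed 10 output classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints the layer structure and parameter counts
```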
Table 1.9 Comparison between different NN platforms

| | TensorFlow [75] | Keras [76] | PyTorch [77] |
|---|---|---|---|
| Definition | TensorFlow is an open-source library. It is one of the more famous libraries for dealing with deep neural networks | Keras is an open-source network library written in Python. It is capable of running on top of TensorFlow. Developers can use Keras to quickly build neural networks without worrying about the mathematical aspects of tensor algebra, numerical techniques, and optimization methods | PyTorch is a machine learning library based on the Torch library, used in applications such as computer vision and natural language processing |
| Speed | Faster and suitable for high performance | Slower | Fast, the same as TensorFlow |
| Level of API | Provides both high- and low-level APIs | High-level API | Provides a lower-level API |
| Architecture | Not very easy to use | Has a simple architecture, so Keras can be used as a "more readable" framework that makes work easier | Has a complex architecture, and its readability is lower |
| Debugging | There is a special dedicated tool called the TensorFlow debugger | Has a lot of computational junk in its abstractions, so it becomes difficult to debug | Allows easy access to the code, and it is easier to focus on the execution of each line |
| Dataset | Used for high-performance models and large datasets that require fast execution | Used for small datasets, as it is the slowest | Same as TensorFlow |
| Popularity | In the middle | Has the highest popularity | The lowest |
| Operating system | Android, iOS, macOS, Windows, Linux, and Raspberry Pi | Linux and OSX | Windows, Linux, and OSX |
| Conclusions | Suitable for: • Large datasets • High performance • Functionality | Suitable for: • Rapid prototyping • Small datasets • Multiple backend support | Suitable for: • Flexibility • Short training duration • Debugging capability |
Fig. 1.29 Different dimensions of TensorFlow
1.5.3 PyTorch
PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. It has many computer vision, natural language processing, and other ML techniques and libraries. It enables programmers to compute with GPU-accelerated tensors and aids in the creation of computational graphs [74].
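A brief PyTorch sketch (with arbitrary tensor sizes) of GPU-accelerated tensors and the automatically built computational graph:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, 3, device=device)                      # random input tensor
w = torch.randn(3, 3, device=device, requires_grad=True)  # tracked by autograd

y = (x @ w).sum()   # operations are recorded in the computational graph
y.backward()        # autograd traverses the graph to compute dy/dw
print(w.grad)
```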
1.6 Challenges of Deep Learning
Deep neural networks have many challenges, especially in training.
1.6.1 Overfitting and Underfitting
Overfitting in ML means that the model fits the noise points in the training data as well as the signal. This has a negative impact on the performance of the model, reducing its accuracy. Underfitting means the model is so simple that it fails to fit the data points in the training set correctly, leaving most of the points in the testing data as outliers or noise (Fig. 1.30). To overcome overfitting, several techniques have been
Fig. 1.30 Underfitting versus overfitting
proposed, including dropout and sampling of the training data. Dropout is probably the most effective and most common solution presented to deal with overfitting. In this technique, in every training epoch a small number of nodes are randomly selected instead of all nodes; that is, a chosen percentage of the nodes' outputs is set to zero. To reduce the effect of overfitting, the simplest way is to reduce the size of the model, i.e., the number of layers and the number of nodes in each layer. These parameters are also called hyperparameters. Another technique is weight regularization, where instead of only trying to reduce the error, the network tries to reduce a penalized loss. L1 regularization adds the summed absolute values of the weights, i.e., Σ|W|, as a penalty term to the loss function, and L2 regularization adds the summed squared weights, i.e., ΣW², as a penalty term to the loss function. L1 regularization penalizes the neural network weights by driving weight values that are already close to 0 to exactly 0. The general intuition behind L1 is that if a weight value is close to 0 or very small, its effect on the overall performance of the model will be very small, so setting this weight to 0 will not affect the performance of the model and can reduce its memory consumption. L2 regularization also penalizes weight values. For small and relatively large weight values, L2 regularization converts the values to a number close to 0, but not completely 0.
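A hedged Keras sketch combining dropout with L1/L2 weight regularization; the layer sizes, dropout rate, and penalty coefficients are illustrative choices, not recommendations:

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),  # assumed 20 input features
    # L2 adds sum(W^2) to the loss as a penalty term
    keras.layers.Dense(128, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.5),  # randomly zeroes 50% of outputs each training step
    # L1 adds sum(|W|) to the loss, pushing small weights toward exactly zero
    keras.layers.Dense(1, activation="sigmoid",
                       kernel_regularizer=keras.regularizers.l1(0.001)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```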
1.6.2 Long Computation Time
There are ways to speed up the learning and training of deep learning models, such as better parameter selection and utilizing hardware support, e.g., a GPU or FPGA.
1.6.3 Vanishing Gradient
The vanishing gradient problem occurs when the backpropagation algorithm is unable to propagate the network error back to the early layers to update their weights. In the backpropagation process, the gradient is propagated backwards from the output layer to the input layer; as a result, early hidden layers learn much more slowly than later hidden layers, because the early layers receive only a fraction of the gradient (vanishing gradient). In some cases, the gradient instead grows larger as it is propagated backwards (exploding gradient).
1.6.4 Hyper-Parameters Tuning: Weights and Learning Rate
Several algorithms have been applied to systematically search for the optimal hyperparameter values, such as Monte Carlo methods and genetic algorithms.
1.7 Optimization of Deep Learning
There are many techniques used to optimize deep learning, such as stochastic gradient descent and mini-batch gradient descent. Table 1.10 provides a comparison between gradient descent, stochastic gradient descent, and mini-batch gradient descent.
Table 1.10 Comparison between gradient descent, stochastic gradient descent, and mini-batch gradient descent

| Criteria | Gradient descent | Stochastic gradient descent (SGD) | Mini-batch gradient descent |
|---|---|---|---|
| Dataset size used | Entire | Random subset | Fixed-size subset |
| Computation cost | Expensive | Less expensive | Less expensive |
| Convergence | Slow | Fast | Fast, and more stable than SGD |
| Local minima | May get stuck | Can escape them | Can escape them, with less noise than SGD |
| Global minimum | More reliable | Less reliable, but more efficient | Less reliable, but more efficient |
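A mini-batch gradient-descent sketch on synthetic linear-regression data (all values invented): each weight update uses a fixed-size subset of the shuffled dataset, rather than the entire dataset (gradient descent) or a single random sample (SGD):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))                    # toy dataset
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)
eta, batch_size = 0.05, 32                        # illustrative hyperparameters

for epoch in range(20):
    idx = rng.permutation(len(X))                 # shuffle, then take fixed-size subsets
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        pred = X[batch] @ w
        grad = 2 * X[batch].T @ (pred - y[batch]) / len(batch)  # MSE gradient
        w -= eta * grad                           # update from this mini-batch only
print("learned weights:", np.round(w, 2))
```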
1.8 Different Types of Distance Measures in Machine Learning
Distance measures are used to measure the similarity between two or more vectors in a multi-dimensional space. A classification of the different types of distance measures is shown in Fig. 1.31 [78].
1.8.1 Euclidean Distance
Euclidean Distance represents the shortest distance between two vectors. It is the square root of the sum of squares of differences between corresponding elements (Fig. 1.32).
Fig. 1.31 Classifications of distance measures
Fig. 1.32 Euclidean distance
Fig. 1.33 Manhattan distance
Fig. 1.34 Hamming distance
1.8.2 Manhattan Distance
Manhattan Distance is the sum of absolute differences between points across all the dimensions (Fig. 1.33).
1.8.3 Hamming Distance
Hamming Distance measures the similarity between two strings of the same length. The Hamming Distance between two strings of the same length is the number of positions at which the corresponding characters are different (Fig. 1.34).
1.8.4 Cosine Distance
Cosine similarity is simply the cosine of the angle between two vectors. It equals the inner product of the vectors when they are normalized to both have length one (Fig. 1.35).
Fig. 1.35 Cosine distance
1.8.5 Minkowski Distance
This distance metric is a generalization of the Euclidean and Manhattan distance metrics. It determines the similarity of distances between two or more vectors in space. It is also known as the p-norm distance, where p represents the order of the norm. The parameter p enables different distances to be measured: p = 1 yields the Manhattan distance and p = 2 the Euclidean distance (Fig. 1.36).
Fig. 1.36 Minkowski distance
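Since the Minkowski distance generalizes the other vector distances, a single short function (with made-up example vectors) covers several of the measures above:

```python
import numpy as np

def minkowski(a, b, p):
    # p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.5])
print("Manhattan (p=1):", minkowski(a, b, 1))
print("Euclidean (p=2):", minkowski(a, b, 2))
print("p=3            :", minkowski(a, b, 3))
```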
Fig. 1.37 Jaccard distance
1.8.6 Jaccard Distance
Jaccard Similarity is a common proximity measurement used to compute the similarity between two objects, such as two text documents. Jaccard similarity can be used to find the similarity between two asymmetric binary vectors or to find the similarity between two sets (Fig. 1.37) [79].
1.9 Classification Evaluation Metrics
ML systems should be assessed using performance measures that represent real-world error costs. The performance of ML classifiers is evaluated based on accuracy, precision, recall, F1-score, and the confusion matrix [80]. A classifier is typically represented as a set of discriminant functions: the classifier assigns a feature vector x to the i-th class if the i-th discriminant function yields the largest value, as depicted in Fig. 1.38.
1.9.1 Confusion Matrix

The confusion matrix is one of the most widely used tools for evaluating classification. From the confusion matrix, several metrics can be derived that can be used to evaluate binary-class as well as multi-class classification models. The confusion matrix is a matrix of size m × m, where m is the number of classes. It can be split into four separate elements: true positive (TP), true negative (TN), false positive (FP), and false negative (FN), as depicted in Fig. 1.39. For each class c:
Fig. 1.38 Representation of a classifier
Fig. 1.39 Confusion matrix
• True Positive (TP): A true positive is when a sample is correctly classified as belonging to class c.
• False Positive (FP): A false positive is when a sample is incorrectly classified as belonging to class c.
• True Negative (TN): A true negative is when a sample is correctly classified as not belonging to class c.
• False Negative (FN): A false negative is when a sample is incorrectly classified as not belonging to class c.

It can be formulated as [81]:

$$\text{Confusion Matrix} = \begin{bmatrix} TP & FP \\ FN & TN \end{bmatrix}$$

1.9.2 Accuracy
Accuracy is defined as the ratio of the total number of samples correctly predicted by the model to the total number of samples; its expression is given in (1.8). If the dataset is imbalanced, we cannot depend solely on accuracy as a metric.

$$\text{Accuracy} = \frac{\text{No. of correctly classified samples}}{\text{Total no. of samples}} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1.8)$$

1.9.3 True Positive Rate (TPR)
TPR is the ratio of correctly predicted positive labels to the total number of actual positive labels.

$$TPR = \frac{TP}{TP + FN} \quad (1.9)$$

1.9.4 False Negative Rate (FNR)
FNR is the ratio of actual positive labels that are predicted incorrectly to the total number of actual positive labels.

$$FNR = \frac{FN}{TP + FN} \quad (1.10)$$
1.9.5 True Negative Rate (TNR)
TNR is the ratio of correctly predicted negative labels to the total number of actual negative labels.

$$TNR = \frac{TN}{FP + TN} \quad (1.11)$$

1.9.6 False Positive Rate (FPR)
FPR is the ratio of actual negative labels that are predicted incorrectly to the total number of actual negative labels.

$$FPR = \frac{FP}{FP + TN} \quad (1.12)$$

1.9.7 Precision
Precision is the ratio of correctly predicted positive samples to all samples predicted as positive.

$$\text{Precision} = \frac{TP}{TP + FP} \quad (1.13)$$

1.9.8 Recall
Recall is the ratio of correctly predicted positive samples to all actual positive samples. There is typically a trade-off between precision and recall: improving one tends to reduce the other.

$$\text{Recall} = \frac{TP}{TP + FN} \quad (1.14)$$

1.9.9 F1-Score
In cases where we are unable to decide between precision and recall, we use the F1-score, which is the harmonic mean of precision and recall.

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (1.15)$$
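The sketch below computes Eqs. (1.8) and (1.13)–(1.15) from the four confusion-matrix elements; the counts are made up for illustration:

```python
def classification_metrics(tp, fp, tn, fn):
    # Derived from the confusion-matrix elements, as in Eqs. (1.8), (1.13)-(1.15)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # identical to the true positive rate, Eq. (1.9)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Made-up counts for a binary classifier
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
```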
1.10 New Trends in Machine Learning
1.10.1 Hamiltonian Neural Networks

Feed-forward neural networks, or multilayer perceptrons, are the foundation of contemporary deep learning machines widely utilized in image, video, and audio processing. These neural machines typically contain an input layer, many hidden layers, and an output layer. Nodes within the same layer do not interact, but instead, they connect with the next layer's nodes through a set of weights and biases, whose values are established during training, often utilizing stochastic gradient descent (SGD) as the standard method. However, the process of how networks in different layers collaborate to solve specific problems remains enigmatic. Hamiltonian Neural Networks (HNNs) are a type of neural network architecture that use the principles of Hamiltonian mechanics to model and learn dynamic systems. Hamiltonian mechanics is a branch of classical mechanics that describes the evolution of a system in terms of its energy and momentum, rather than its position and velocity. In an HNN, the network architecture is designed to represent the Hamiltonian of a given system. The input to the network is the state of the system at a particular time, and the output is the Hamiltonian at that time. By training the network on a set of known trajectories, HNNs can learn to accurately model and predict the behavior of a dynamic system. HNNs have been successfully applied to a variety of domains, including physics, robotics, and control systems. They have been shown to be particularly effective at learning the dynamics of complex physical systems, such as fluid flow or molecular dynamics. One of the key advantages of HNNs is that they can be used to model systems that are difficult or impossible to model analytically. This makes them a powerful tool for scientific discovery and engineering applications [82].
1.10.2 Quantum Machine Learning

The race to conquer the quantum information domain has accelerated the pace of research and innovation toward developing radically innovative approaches for quantum-based technologies. Quantum computing, communication, sensing, combinatorial optimization, and quantum machine learning are among the most competitive and dynamic research and development areas, currently on the verge of unlocking practical new quantum capabilities that will have a profound effect on almost every aspect of our lives. Recently, the new field of quantum machine learning has emerged, with the goal of combining machine learning and quantum computing. The objective is to find quantum implementations of machine learning algorithms which have the expected power of quantum computers and the flexibility and learning capabilities of machine learning algorithms. One of the problems to be solved in quantum machine learning is the limitation present in
the quantity of input data that the proposed implementations can handle. Although many-body quantum systems have a Hilbert space whose dimension increases exponentially with the size of the system, permitting the storage and manipulation of a huge quantity of data, an important problem is to initialize this quantum state accurately and efficiently with the desired data. In machine learning this stage is essential, since learning a problem needs a lot of training data. Quantum computation promises to expedite the training of classical neural networks and the computation of big data applications, generating faster results. A quron is a qubit in which the two levels stand for the active and resting neural firing states. This allows a neural network to be in a superposition of firing patterns [83]. When considering the usefulness of Quantum Machine Learning (QML), it is important to note that simply attempting to accelerate classical ML algorithms on a quantum processing unit may not be the best approach. QML involves both the data generation process and the data processing device, with each component being either classical or quantum. Traditional ML relies on classical processes for both components, while QML often involves classical data generation and quantum processing. However, a more promising avenue for QML is when the data generation process is intrinsically quantum, such as in fields like physics, chemistry, or biology, where researchers deal with quantum data on a regular basis. By utilizing quantum data to run quantum ML algorithms, there is evidence that this could lead to significant advancements in scientific research. Hybrid quantum–classical architectures, which combine classical and quantum processing units, may also be useful in certain QML applications, such as TensorFlow Quantum.
1.10.3 Federated Learning

Federated learning is a promising learning mode which assures privacy: the model parameters are learned via a central unit, either a data center or an edge host, while the data are kept in the peripheral nodes. In centralized federated learning, the devices do not send their data to any remote server. They only share local estimates of the parameters to be learned. Each device can boost its performance without exchanging data, thus preserving privacy [84–86]. Federated Learning (FL) enables collaborative training of machine learning models for edge devices (e.g., mobile phones) over a network without revealing the raw data of the participants. Note that edge devices have limited hardware resources [87].
1.10.4 Self-supervised Learning

Self-Supervised Learning (SSL) is an ML paradigm where a model, when fed with unstructured data as input, generates data labels automatically, which are further used in subsequent iterations as ground truths. Self-supervised learning has been successful in
multiple fields, e.g., text, image/video, speech, and graphs. The term self-supervised refers to creating one's own supervision, i.e., learning without external supervision or labels. Self-supervised learning is one category of unsupervised learning. Transfer learning is a key enabler for self-supervised learning [88].
1.10.5 Zero-Shot Learning and Few-Shot Learning

Zero-shot learning is a machine learning technique that allows a model to generalize to new, unseen classes that it has not been trained on. In traditional machine learning, a model is trained on a set of labeled data and can only recognize the classes that it has seen during training. However, zero-shot learning aims to extend this capability to recognize new, previously unseen classes. In zero-shot learning, the model is trained on a set of labeled data that includes both seen and unseen classes. During training, the model learns to associate each class with a set of attributes or features, such as visual or semantic characteristics. These attributes are used to create a semantic space that represents the relationships between different classes. When the model is presented with a new, unseen class during testing, it can use the learned semantic space to infer the attributes or features associated with the class. This allows the model to recognize the new class, even though it has not seen any examples of it during training. Zero-shot learning has many potential applications, particularly in scenarios where it may be difficult or impractical to collect large amounts of labeled data for all possible classes. For example, in image classification, zero-shot learning could be used to recognize new types of objects or scenes, such as rare or exotic animals, without the need for additional labeled data [89]. Few-shot learning is a machine learning technique that involves training a model to recognize new objects or classes with only a small amount of labeled data. In traditional machine learning approaches, large amounts of labeled data are required to train a model to perform a task accurately. However, in real-world scenarios, obtaining labeled data for every new task or class can be time-consuming and expensive. Few-shot learning addresses this issue by training a model to learn how to generalize from a few examples. Typically, the training set consists of a small number of examples, often in the range of 1–10 examples per class. The model is then trained to recognize new objects or classes using this small amount of data. One popular approach to few-shot learning is meta-learning, also known as "learning to learn." In this approach, the model is trained to learn how to learn from a few examples, by being trained on multiple few-shot learning tasks. The model then applies the learned knowledge to new tasks. Few-shot learning has applications in a variety of areas, including computer vision, natural language processing, and robotics. It enables machines to learn and adapt to new tasks quickly, making it an important technique for artificial intelligence systems that need to operate in dynamic and changing environments [90].
46
1 An Introduction to Deep Learning
In summary, zero-shot learning is a technique for recognizing new objects or concepts without any labeled examples from that class, while few-shot learning involves recognizing new objects or concepts with a small number of labeled examples, typically between 1 and 10 examples per class.
1.10.6 Neurosymbolic AI

Neurosymbolic AI refers to an emerging approach in artificial intelligence that integrates the strengths of both symbolic and neural AI models. This approach aims to overcome some of the limitations of each paradigm by combining them in a synergistic way. The first wave of AI was symbolic AI, which used logic and rule-based systems to reason about the world. This approach was limited by its inability to deal with uncertainty and the complexity of real-world data. The second wave of AI was neural networks, which are based on the idea of learning from data. This approach has been successful in areas such as image and speech recognition but is limited by its lack of explainability and interpretability. The third wave of AI, neurosymbolic AI, seeks to address these limitations by combining the strengths of both symbolic and neural AI models. The symbolic part of the approach provides the ability to reason about abstract concepts and make decisions based on logical rules. The neural part of the approach provides the ability to learn from data and deal with uncertainty. Neurosymbolic AI has the potential to revolutionize many areas of AI, including natural language processing, robotics, and scientific discovery. By combining the strengths of symbolic and neural AI models, neurosymbolic AI could lead to more intelligent and flexible systems that can reason about the world in a way that is both explainable and interpretable [91].
1.10.7 Binarized Neural Networks

Network compression techniques have given rise to binarized neural networks (BNNs), where weights and activations are represented by a single bit. As a result, the generally expensive multiply-accumulate (MAC) operations between weights and activations can be reduced to low-complexity bitwise XNOR and bit-count operations. This makes BNNs a power-efficient, computationally faster, and memory-saving alternative to conventional CNNs. Binary neural networks are networks with binary weights and activations at run time. At training time these binary weights and activations are used for computing gradients; however, the gradients and true weights are stored in full precision. The binary values can be {0, 1} or {+1, −1} [92].
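A toy NumPy sketch of the binarization step (values and shapes are arbitrary): full-precision weights are kept for training, while the forward pass uses only their signs, which is what hardware can reduce to XNOR and bit-count operations:

```python
import numpy as np

rng = np.random.default_rng(3)
w_real = rng.normal(size=(4, 4))     # full-precision weights kept during training

w_bin = np.sign(w_real)              # binarized weights in {-1, +1}
w_bin[w_bin == 0] = 1                # map the rare sign(0) case to +1

x_bin = np.sign(rng.normal(size=4))  # binarized activations
y = w_bin @ x_bin                    # in hardware this becomes XNOR + popcount
print(y)
```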
1.10.8 Text to Video Machine Learning

Text to video machine learning is a field of artificial intelligence that involves training algorithms to generate videos based on text input. This technology is also known as text-to-scene or scene generation. The process involves breaking down the text input into relevant concepts and identifying the key elements that need to be included in the video. These elements can include characters, settings, actions, and emotions. The algorithm then uses this information to generate a visual representation of the text, often in the form of an animated video. Text to video machine learning has a wide range of potential applications, including creating explainer videos, generating content for social media, and even creating animated films. However, it is still a relatively new field and requires significant development and refinement to produce high-quality results. As with any machine learning technology, text to video algorithms require large amounts of data to train effectively. This can include video footage, text descriptions, and other relevant information. As more data becomes available and algorithms continue to improve, we can expect to see text to video machine learning become an increasingly important tool in content creation and storytelling [93].
1.10.9 Graph Neural Networks

Graph neural networks (GNNs) have gained considerable attention in various fields, including natural language processing, image processing, and computational biology. One of the main advantages of these neural networks is that they enable users to specify the connectivity pattern of input states, providing greater flexibility. However, the challenge lies in updating the model based on this connectivity pattern, which has been the subject of research in several models. Additionally, modern approaches are exploring the possibility of reshaping the graph, such as converting edges into nodes, to adjust computational costs and increase flexibility (Fig. 1.40) [94, 95].
1.10.10 Large Language Model (LLM)

A large language model (LLM) is a type of machine learning model that can perform a variety of natural language processing (NLP) tasks, including generating and classifying text, answering questions in a conversational manner, and translating text from one language to another (Fig. 1.41). The label "large" refers to the number of values (parameters) the model can change autonomously as it learns. Some of the most successful LLMs have hundreds of billions of parameters [96]. Large language models use reinforcement learning for fine-tuning. Reinforcement learning is a feedback-driven machine learning
Fig. 1.40 GNN
Fig. 1.41 NLP representation: NLU, NLG
method based on a reward system. An agent learns to perform in an environment by completing certain tasks and observing the results of those actions. The agent gets positive feedback for every good action and a penalty for each bad action. However, the combination of supervised learning and reinforcement learning can attain optimal performance. Examples of LLMs include the Generative Pre-trained Transformer 3 (GPT-3), Bidirectional Encoder Representations from Transformers (BERT), Large Language Model Meta AI (LLaMA), and multimodal large language models (MLLMs).
In the context of deep learning, a tokenizer is a crucial component for processing textual data, which is commonly used in natural language processing (NLP) tasks. A tokenizer is a function or a module that breaks down a string of text into individual tokens or words, usually by separating them based on whitespace or punctuation marks. This process of tokenization converts raw text into a sequence of numerical values that can be easily fed into a neural network for training and inference. Tokenization is an essential step in many NLP tasks, including sentiment analysis, text classification, and language translation. Deep learning models rely heavily on tokenization to preprocess textual data and convert them into a format that can be effectively processed by the neural network. There are various types of tokenizers used in deep learning, including word-level tokenizers, character-level tokenizers, subword-level tokenizers, and byte-pair encoding (BPE) tokenizers. Each type of tokenizer has its strengths and weaknesses, depending on the specific task and the characteristics of the data being processed.
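A naive word-level tokenizer sketch in plain Python (the splitting rule and the vocabulary scheme are simplifications for illustration): the text is broken into tokens and each token is mapped to an integer id that a network could consume:

```python
import re

def word_tokenize(text):
    # Split on word boundaries; punctuation marks become their own tokens
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens):
    # Map each unique token to an integer id for the neural network
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

tokens = word_tokenize("Tokenization converts raw text into numbers!")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['tokenization', 'converts', 'raw', 'text', 'into', 'numbers', '!']
print(ids)
```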
1.10.10.1 GPT

OpenAI introduced the Generative Pretrained Transformer (GPT). It is a very large language model trained on a massive dataset from the internet and Wikipedia [97]. Because it was pretrained on a large amount of text from the Internet, it is ready to respond to questions. It uses a particular kind of neural network architecture called the Transformer [98–100]. The transformer neural network is a novel architecture that aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease (Fig. 1.42). It was proposed to overcome the disadvantages of RNNs, such as slow training time and the vanishing gradient; transformers process the entire input all at once, unlike RNNs. Their major application is NLP. GPT-3 was first introduced in June 2020. It is trained on a massive dataset of over 570 GB of text data, which includes a diverse range of sources such as books, articles, and websites. GPT-3 has 175 billion parameters, making it the largest language model to date. GPT-4 is able to process multiple types of data, including videos, images, sounds, numbers, etc.
1.10.10.2 BERT

Google introduced Bidirectional Encoder Representations from Transformers (BERT). It makes use of the Transformer, an attention mechanism that learns contextual relations between words in a text [101]. In its vanilla form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. The difference between GPT-3 and BERT is their architecture: GPT-3 is an autoregressive model, while BERT is bidirectional. GPT-3 considers only the left context when making predictions, while BERT considers both the left and right context (Figs. 1.43 and 1.44) [102, 103]. BERT is pretrained on massive data and is used in general-domain applications.
Fig. 1.42 Transformer architecture: stacked encoders and decoders. A transformer uses Encoder stack to model input and uses decoder stack to model output using input information from encoder side. If we do not have input, we just want to model the next word, we can get rid of the Encoder side of a transformer and output “next word” one by one. This gives us GPT. If we are only interested in training a language model for the input for some other tasks, then we do not need the Decoder of the transformer, that gives us BERT
Fig. 1.43 GPT versus BERT. Pre-trained representations can be context-free or contextual; contextual representations can be uni-directional (GPT) or bi-directional (BERT)
1.10.10.3 LLaMA

Meta introduced the Large Language Model Meta AI (LLaMA). It is a collection of foundational large language models ranging from 7 to 65 billion parameters. LLaMA is creating excitement because it is smaller than GPT-3 but performs better. One of the most important features of LLaMA is that it is available under a non-commercial license to researchers and other organizations, making it easier for them to use it for their work [104].
Fig. 1.44 GPT versus BERT. BERT uses a bidirectional transformer. GPT uses a left to right transformer
1.10.10.4 MLLM

Microsoft introduced a multimodal large language model (MLLM) called Kosmos-1. It can perceive general modalities to answer questions given different modalities, such as image and text. Moreover, it demonstrates visual dialogue capabilities and shows potential in more complex tasks such as visual IQ tests [105].
1.10.10.5 Green AI

Green Artificial Intelligence (AI) refers to the development of AI technologies that are environmentally sustainable and aimed at reducing the carbon footprint of AI models and applications. The goal of Green AI is to minimize the negative impact of AI on the environment and promote more sustainable practices in the field. AI models and applications can have a significant carbon footprint due to the large amounts of computational power and energy required for training and inference. Green AI addresses this issue by using techniques such as model compression, sparsity, and quantization to reduce the computational requirements of AI models. Additionally, Green AI promotes the use of renewable energy sources, such as solar or wind power, to power the computing infrastructure used for training and inference. Green AI is an important area of research as the demand for AI technologies continues to grow, and concerns about the environmental impact of AI are becoming more prominent. By developing more sustainable and eco-friendly AI technologies, we can help mitigate the negative effects of AI on the environment and promote a more sustainable future [106].
1.11 Conclusions
The real advantage of neural networks lies in their power and ability to represent both linear and non-linear relationships, and in their ability to learn these relationships directly from the modeled data using different training mechanisms. This chapter explored different ML and DL algorithms and architectures, and it gave insights into future directions for machine learning. Deep Learning has emerged as one of the most powerful and transformative technologies in the field of artificial intelligence. By using deep neural networks with multiple layers, Deep Learning has enabled breakthroughs in computer vision, natural language processing, speech recognition, and many other areas. With the availability of large amounts of data and computational power, Deep Learning has been able to achieve remarkable results in tasks that were previously considered difficult or even impossible for machines. However, as with any powerful technology, Deep Learning also has its limitations and challenges. These include the need for large amounts of data, the difficulty of interpreting and understanding the internal workings of deep neural networks, and the potential for biased or unfair results. Nevertheless, Deep Learning is likely to play an increasingly important role in shaping the future of technology and society, and it is important for researchers, developers, and policymakers to stay informed about its potential and limitations.
References

1. F. Hayes-Roth, Rule-based systems. Commun. ACM 28(9), 921–932 (1985)
2. G. Chandrashekar, F. Sahin, A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014)
3. Y. Bengio, A. Courville, P. Vincent, Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
4. Y. Geng, M. Liu, Q. Li, R. He, Introduction of Machine Learning (Institution of Engineering and Technology (IET), London, 2019)
5. https://github.com/JohnSnowLabs/spark-nlp
6. V. Kennady, P. Mayilsamy, Migration of batch processing systems in financial sectors to near real-time processing. Int. J. Sci. Res. Publ. (IJSRP) 12, 483–493 (2022). https://doi.org/10.29322/IJSRP.12.07.2022.p12755
7. W. Shi, R. Rajkumar, Point-GNN: graph neural network for 3D object detection in a point cloud, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1711–1719
8. H. Ai, W. Xia, Q. Zhang, Speaker recognition based on lightweight neural network for smart home solutions, in Proc. Int. Symp. Cyberspace Saf. Secur. (Springer, Cham, Switzerland, 2019), pp. 421–431
9. G.R. Ball, S.N. Srihari, Semi-supervised learning for handwriting recognition, in ICDAR'09, 10th International Conference on Document Analysis and Recognition (IEEE, 2009)
10. B. Zhu, J. Jiao, M.I. Jordan, Principled reinforcement learning with human feedback from pairwise or k-wise comparisons (2023). arXiv preprint arXiv:2301.11270
11. K. Gurney, An Introduction to Neural Networks (1996)
12. F. Wang, Z. Zhen, B. Wang, Z. Mi, Comparative study on KNN and SVM based weather classification models for day ahead short term solar PV power forecasting. Appl. Sci. 8(1) (2018)
13. H. Drucker, D. Wu, V.N. Vapnik, Support vector machines for spam categorization. IEEE Trans. Neural Netw. 10(5), 1048–1054 (1999)
14. https://www.tutorialspoint.com/machine_learning_with_python/classification_algorithms_support_vector_machine.htm
15. P. Lingras, C.J. Butz, Precision and recall in rough support vector machines, in Proc. of the 2007 IEEE Int. Conference on Granular Computing (GRC 2007) (IEEE Computer Society, Washington, DC, USA, 2007), pp. 654–654
16. K. Mayur, Decision Trees for Classification: A Machine Learning Algorithm (Xoriant, 2017)
17. X. Yan, X. Su, Linear Regression Analysis: Theory and Computing (World Scientific, Singapore, 2009)
18. R. Agarwal, P. Sagar, A comparative study of supervised ML algorithms for fruit prediction. J. Web Dev. Web Des. 4(1), 14–18 (2019)
19. K.P. Murphy, Probabilistic Machine Learning: An Introduction (PML-1) (The MIT Press, 2022)
20. K.P. Murphy, Probabilistic Machine Learning: Advanced Topics (PML-2) (The MIT Press, 2022)
21. C.M. Bishop, Pattern Recognition and Machine Learning (PRML) (Springer, 2007)
22. A. Bosch, A. Zisserman, X. Munoz, Image classification using random forests and ferns, in Proceedings of the International Conference on Computer Vision (2007)
23. https://towardsdatascience.com/k-means-clustering-algorithm-applications-evaluation-methods-and-drawbacks-aa03e644b48a
24. J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd edn. (Morgan Kaufmann Publishers, 2006), pp. 1–6
25. V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski et al., Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
26. L. Alzubaidi et al., Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J. Big Data 8(1), 53 (2021). https://doi.org/10.1186/s40537-021-00444-8
27. https://www.tero.co.uk/writing/ml-rnn.php
28. K.I. Lakhtaria, D. Modi, Deep learning: architectures and applications, in Handbook of Research on Deep Learning Innovations and Trends, ed. by A. Hassanien, A. Darwish, C. Chowdhary (IGI Global, Hershey, PA, 2019), pp. 114–130
29. D. Svozil, V. Kvasnicka, J. Pospichal, Introduction to multi-layer feed-forward neural networks. Chemom. Intell. Lab. Syst. 39(1), 43–62 (1997)
30. Y. Zhang, P. Li, X. Wang, Intrusion detection for IoT based on improved genetic algorithm and deep belief network. IEEE Access 7, 31711–31722 (2019)
31. G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
32. G. Sreenu, M.A. Saleem Durai, Intelligent video surveillance: a review through deep learning techniques for crowd analysis. J. Big Data (2019)
33. S. Albawi, T.A. Mohammed, S. Al-Zawi, Understanding of a convolutional neural network, in Proceedings of International Conference on Engineering and Technology (2017), pp. 1–6
34. B. Hammer, Learning with recurrent neural networks, in Lecture Notes in Control and Information Sciences (Springer, London, 2000)
54
1 An Introduction to Deep Learning
35. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 36. Q. Xiao, Y. Liang, Fune: an FPGA tuning framework for CNN acceleration. IEEE Des. Test (2019) 37. A. Khan, A. Sohail, U. Zahoora, A.S. Qureshi, A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 53, 5455–5516 (2020) 38. L. Zhao, Y. Zhang, J. Yang, SCA: a secure CNN accelerator for both training and inference, in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (IEEE, 2020) 39. J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79(8), 2554–2558 (1982) 40. S. Haykin, Neural Networks and Learning Machines (Prentice-Hall of India, 2011) 41. A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S.W. Keckler, W.J. Dally, SCNN: an accelerator for compressed-sparse convolutional neural networks, in Proc. ISCA-44 (2017) 42. Y.-H. Chen, J. Emer, and V. Sze, Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks, in Proc. ISCA-43 (2016) 43. A. Mirzaeian, H. Homayoun, A. Sasan, NESTA: hamming weight compression-based neural proc. engine, in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) (IEEE, 2019) 44. Networks, in Advances in Neural Information Processing Systems, vol. 25 (2012), pp. 1097– 1105 45. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556 46. C. Szegedy et al., Going deeper with convolutions, in Proc.of the IEEE Conf. on Computer Vision and Pattern Recognition (2015), pp. 1–9 47. K. He et al., Deep residual learning for image recognition, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (2016), pp. 770–778 48. W. Zhang, P. Li, Temporal spike sequence learning via backpropagation for deep spiking neural networks. Adv. Neural. Inf. Process. Syst. 33, 12022–12033 (2020) 49. N. Rathi, A. Agrawal, C. Lee, A.K. Kosta, K. Roy, Exploring spike-based learning for neuromorphic computing: prospects and perspectives, in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) (IEEE, 2021), pp. 902–907 50. S. Deng, S. Gu, Optimal conversion of conventional artificial neural networks to spiking neural networks, in International Conference on Learning Representations (2022) 51. M. Dutson, Y. Li, M. Gupta, Spike-based anytime perception, in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA (2023), pp. 5283– 5293. https://doi.org/10.1109/WACV56688.2023.00526 52. F. Chen, L. Song, Y. Chen, Regan: a pipelined reram-based accelerator for generative adversarial networks, in 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC) (IEEE, 2018), pp. 178–183 53. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courvile, Y. Bengio, Generative adversarial nets, in NIPS (2014), pp. 2672–2680 54. A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, A.A. Bharath, Generative adversarial networks: an overview. IEEE Signal Process. Mag. 35, 53–65 (2018) 55. A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, in: International Conference on Learning Representations (2016), pp. 1–16 56. D. Foster, Generative deep learning: teaching machines to paint, in Write, Compose and Play (O’REILLY, 2019)
References
55
| Feature | 1G | 2G | 3G | 4G | 5G | 6G |
| Mobility | | | | | | ≥ 1000 km/h |
| Latency | – | 300 ms | 100 ms | 10 ms | 1 ms | 10–100 µs |
| Multiplexing | FDMA | TDMA/CDMA | CDMA | CDMA | CDMA | OFDM |
| Modulation | FM, FSK | GMSK | QPSK | TDD, FDD | Massive MIMO, Advanced MIMO | Massive MIMO, Advanced MIMO |
| Web standard | – | WWW | WWW (IPv4) | WWW (IPv4) | WWW (IPv6) | WWWW (IPv6) |
| Applications | Voice | Voice, text | Voice, multimedia | Voice, mobile Internet, mobile TV | Virtual reality, augmented reality, wearable devices, Internet of Things | Fully automated vehicles, driverless cars |
4. 4G: The fourth generation provided even faster data transfer speeds and lower latency, enabling applications such as high-definition video streaming, online gaming, and video conferencing. 4G also introduced LTE (Long-Term Evolution), which improved network performance and efficiency.
5. 5G: The fifth generation provides even faster data transfer speeds, lower latency, and greater capacity than 4G. 5G enables new use cases, such as autonomous vehicles, remote surgery, and smart cities, and it uses advanced technologies, such as millimeter wave (mmWave) and massive multiple-input multiple-output (MIMO), to achieve these capabilities.
Fig. 5.1 Different key performance indicators for different generations
Fig. 5.2 Spectrum in 5G and 6G
6. 6G: Still in the research and development stage, 6G is expected to provide even faster speeds and lower latency than 5G, enabling new use cases such as holographic communications and real-time virtual and augmented reality [60, 61].

Deep learning (DL) can play a vital role in ensuring the efficient and successful implementation of 5G/6G wireless communication technologies. ML is a sub-branch of AI with the cognitive capability to perform computational tasks and predict outcomes based on predefined ML algorithms. An ML algorithm predicts results based on what the agent learned during the training phase; the accuracy depends on the computational capability of the machine and on the available training datasets. DL uses cascaded neural network layers to extract features from the input data automatically and make a decision. The most important deep learning architectures for the wireless communications domain are:

• Multi-layer perceptrons (MLPs): basic models that are generally used in many learning tasks.
• Convolutional neural networks (CNNs): use convolution operations to reduce the input size and are often used in image recognition tasks.
• Recurrent neural networks (RNNs): most suitable for learning tasks that require sequential models.
• Autoencoders (AEs): used for dimension reduction.
• Generative adversarial networks (GANs): used to generate samples similar to the available dataset.
• Reinforcement learning (RL) models: enable base stations (BSs) to learn from the real-time feedback of a dynamic/uncertain environment and of mobile users, as well as from their historical experiences.

Table 5.2 shows different ML techniques and their wireless communication applications. Deploying DL/ML in wireless communications can make systems self-operational, self-configuring, and self-managed.

Table 5.2 5G/6G: different ML techniques and their wireless communication applications

| ML technique [17] | Learning model | Application |
| Supervised | Support vector machines | Predict propagation path loss in wireless networks |
| Unsupervised | K-means clustering | Deciding which low-power node is upgraded to a high-power node in order to reduce latency in wireless cellular networks |
5.2 Deep Learning Applications in 5G/6G
In current-generation wireless communication, the physical layer suffers significantly from channel impairments, such as fading and interference, and from hardware impairments, such as amplifier distortion and local-oscillator leakage. To communicate reliably and efficiently in the presence of these combined impairments, multiple design parameters must be controlled and optimized jointly. Advanced deep learning technologies can help optimize the physical-layer performance of wireless networks. Existing wireless systems face various physical-layer issues including noise, channel and hardware impairments, quadrature imbalance, interference, and fading, and AI technologies can provide an optimal way to communicate among different hardware. 5G networks can transfer large amounts of data at high speeds, making it easier for deep learning algorithms to process and analyze large amounts of data; this can lead to more accurate models and improved performance for various applications. AI/DL can be used to address many wireless communication problems of high complexity [1]. Many problems in 5G wireless communications are nonlinear and can only be solved approximately, and NN-based machine learning can be used for such problems [15, 16]. Machine learning in 5G wireless communications can address optimal physical-layer design, complex decision making, network management, resource optimization (spectrum management), energy minimization, and network traffic classification and management [2]. ANNs can be used for denoising signals and enhancing the SNR, instead of the traditional approach of constellation de-mapping and LDPC decoding [3]. DL has also been applied to NOMA, massive MIMO, mm-wave communications, and channel estimation [4, 5]. Moreover, AI/DL can be used for dynamic spectrum sharing, ML-based channel estimation, and handover enhancement/optimization [18]. Table 5.3 summarizes ML applications at different layers of the communications protocol stack. The physical layer's primary functions include encoding/decoding, error detection, modulation/demodulation, digital and analog beamforming, RF processing, and MIMO handling; ML-based techniques enhance the properties of the physical layer and make these processes efficient and optimized [17, 27]. More intelligence is needed: a complex and heterogeneous network such as 6G will require an AI paradigm that is self-aware, self-adaptive, proactive, and prescriptive in order to make the network intelligent. An AI-based wireless transceiver is compared with a conventional one in Fig. 5.4. ML algorithms should be deployed and trained at different levels of the network: the management layer, the core, the radio base stations, and the mobile devices. Given sufficient samples, a DNN can extract important features from network inputs and realize end-to-end learning for prediction or regression.
Table 5.3 5G and 6G: ML applications at different layers of the communications protocol stack

| Layer | ML application |
| Application layer | Performance management |
| Network layer | Routing; anomaly detection; traffic prediction |
| MAC layer | Flexible duplex; power management; resource allocation and management; modulation and coding scheme selection; scheduling; link evaluation |
| Physical layer | Channel coding; synchronization; positioning; channel estimation/prediction; beamforming design; optimization; spectrum sensing, sharing and management; interference management; error detection |
Federated learning is a promising learning mode which assures privacy: the model parameters are learned at a central unit, either a data center or an edge host, while the data are kept in the peripheral nodes. In centralized federated learning, the devices do not send their data to any remote server; they only share local estimates of the parameters to be learned. Each device can thus boost its performance without exchanging data, preserving privacy [6, 7, 26]. The huge advances in technological applications such as unmanned aerial vehicles (UAVs), massive robotics, vehicle-to-everything (V2X), and the Internet of Things (IoT) will require more advanced wireless communication systems to meet their massive data-exchange requirements and demands. Machine learning will play a vital role in future wireless communication networks, and it will be an enabling technology for several advanced services and functions. We can deploy ML technology in 5G/6G wireless communications to address challenges at the physical, MAC, network, and application layers [8, 14]. The physical layer transmits data over the wireless medium using channel coding and advanced modulation and transmission schemes such as OFDM and MIMO.
Fig. 5.3 From 1 to 6G
The medium access control (MAC) layer of a cellular network performs tasks such as user selection, user pairing for MIMO systems, resource allocation, modulation and coding scheme selection, power management, control of uplink transmissions and random access, and mobility and handover control. ML can be trained to learn the characteristics of, and to model, systems that cannot be represented by a mathematical equation. It can also be used in power control, beamforming, modulation and coding scheme selection, interference detection and mitigation, uplink/downlink reciprocity in FDD, and channel prediction. The network layer is responsible for connection management, mobility management, load management, and routing protocols [29, 30]. AI is a key enabler for next-generation 6G mobile networks, and there are many levels at which ML can be integrated into a 6G wireless communication system; Table 5.3 also summarizes ML applications at the different 6G layers of the protocol stack, keeping in mind that power, cost, and size are always important in implementations of neural networks. In 6G communications, the physical layer will be intelligent, so that the end-to-end system will be capable of self-learning and optimization, combining AI technologies, advanced sensing and data collection, and domain-specific signal processing methods [31, 42].
Deep neural networks rely on massive, high-quality data to achieve satisfying performance. When training a large and complex architecture, data volume and quality are very important, as a deeper model usually has a huge set of parameters to be learned and configured; this remains true in mobile network applications [42]. In the context of wireless networks, the stability and flexibility of machine learning (ML) techniques are crucial factors that must be considered: these algorithms must be adaptable to the dynamic changes that occur in the network environment. For example, the weights of a neural network are updated online using the learned data, but this may not always be suitable for wireless networks, particularly in a standard where coordination between different entities, belonging to different operators and offered by different suppliers, must coexist. Furthermore, the demand for rapid reaction may preclude the use of certain alternatives. To address these challenges, several solutions have been proposed, including pre-trained or partially trained neural networks, cloud-based downloadable datasets for NN training, codebook-based NNs, and hybrid NNs. Codebook-based NNs use a codebook of different NNs agreed upon by all parties involved, while hybrid NNs use a cloud-based downloadable dataset for NN training. Another challenge associated with ML algorithms in wireless networks is the complexity of these techniques: in most cellular network entities, there are constraints on battery life, storage, processing capability, and communication bandwidth. As a result, the cost-performance tradeoff of an ML model becomes a basic concern, as is the speed, or the number of time-steps, at which training and inference must be executed [59, 61] (Fig. 5.4).
5.3 Deep Learning Applications at Physical Layer

5.3.1 Channel Coding
Channel coding introduces redundancy into the transmitted signals to protect the information from the channel noise and interference that create unwanted errors. Noise and interference disturb the reliability of digital communication systems, and the error-correcting codes applied to control these effects are classified into linear block codes (i.e., Hamming codes, BCH codes, and Reed-Solomon codes) and convolutional codes. Channel codes can be evaluated based on their signal-to-noise ratio (SNR) and block error rate (BLER) performance. In 3G and 4G wireless communication, the turbo code, a mix between convolutional and block codes, was implemented, given that it is one of the best forward error correction (FEC) codes performing closest to the Shannon limit, with remarkable power efficiency in additive white Gaussian noise (AWGN) and flat-fading channels at moderately low bit error rates (BER). However, for 5G wireless communication, due to its high-throughput and low-latency requirements, low-density parity-check (LDPC) codes are adopted over turbo codes.
Fig. 5.4 a Conventional wireless transceiver, b AI-based wireless transceiver
Although LDPC codes achieve better performance and faster decoding, they have a higher encoding complexity than turbo codes and typically require more iterations than iterative turbo decoding, which can lead to higher latency. Since 5G communication systems have adopted LDPC and polar codes, DL has been used to discover methods to blindly identify LDPC codes, reduce the decoding delay, analyze the trade-offs of LDPC codes for channel coding, develop error-correction codes for nonlinear channels [38], and optimize the decoding algorithm by solving a non-convex minimization problem [44, 51].
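To make the idea of a learned decoder concrete, the sketch below trains a small MLP to decode a Hamming(7,4) block code received over an AWGN channel. It is a deliberately tiny, self-contained example: the architecture, SNR, and training sizes are arbitrary choices for illustration, not taken from the cited works.

import numpy as np
import tensorflow as tf

# Generator matrix of the Hamming(7,4) code
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def make_batch(n, snr_db):
    msgs = np.random.randint(0, 2, (n, 4))
    codewords = msgs @ G % 2                    # encode
    x = 1 - 2 * codewords                       # BPSK mapping {0,1} -> {+1,-1}
    sigma = np.sqrt(0.5 / 10 ** (snr_db / 10))  # noise std for the chosen SNR
    y = x + sigma * np.random.randn(n, 7)       # AWGN channel
    return y.astype("float32"), msgs.astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(7,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="sigmoid"),   # per-bit posteriors
])
model.compile(optimizer="adam", loss="binary_crossentropy")

y_train, m_train = make_batch(100_000, snr_db=4.0)
model.fit(y_train, m_train, epochs=5, batch_size=256, verbose=0)

y_test, m_test = make_batch(10_000, snr_db=4.0)
ber = np.mean((model.predict(y_test, verbose=0) > 0.5) != m_test)
print(f"estimated BER: {ber:.4f}")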
5.3.2 Synchronization
All wireless devices must go through time, frequency, and cell synchronization procedures [22]. Before the decoder decodes the received data, it needs to detect the location of the frame header. The frame header is a sequence of specific, constant data; the task is to detect it despite the impairments of the channel, which is similar to the application of CNNs to ImageNet classification [45].
5.3.3 Positioning
Positioning means identifying the location of users in indoor or outdoor environments, based on various signals received from mobile devices. The problem of location determination can be formulated as a multi-output regression problem: to predict the (x, y) coordinates, the authors in [46] implemented Linear Regression (LR), Polynomial Regression (PR), Support Vector Regression (SVR), and a KNN regressor, using mostly default parameters.
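A brief sketch of the regression formulation using only the KNN variant; the access-point layout and the log-distance path-loss model used to synthesize RSSI readings are illustrative assumptions, not the experimental setup of [46].

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
positions = rng.uniform(0, 50, (2000, 2))             # true (x, y) in meters
aps = np.array([[0, 0], [50, 0], [0, 50], [50, 50]])  # assumed access points

# Log-distance path loss to synthesize RSSI from the 4 APs, plus shadowing
d = np.linalg.norm(positions[:, None, :] - aps[None, :, :], axis=2)
rssi = -40 - 20 * np.log10(d + 1) + rng.normal(0, 2, d.shape)

X_tr, X_te, y_tr, y_te = train_test_split(rssi, positions, random_state=0)
knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)   # multi-output regression
err = np.linalg.norm(knn.predict(X_te) - y_te, axis=1)
print(f"mean localization error: {err.mean():.2f} m")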
5.3.4 Channel Estimation/Prediction
Since a DNN has to be trained offline, owing to the long training period and the large training data required, mismatches between the real channels and the channels seen in the training phase may cause performance degradation. Online training is a promising approach to overcome this problem. CNN-based techniques can also be used to adjust the physical-layer parameters in noisy environments [20].
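The following sketch shows the offline-training workflow in miniature: an MLP learns to refine noisy least-squares pilot estimates toward the true channel. The Rayleigh/AWGN channel model, subcarrier count, and network size are all assumptions made for illustration.

import numpy as np
import tensorflow as tf

N = 64  # subcarriers

def batch(n, snr_db=10):
    h = (np.random.randn(n, N) + 1j * np.random.randn(n, N)) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)
    noise = sigma * (np.random.randn(n, N) + 1j * np.random.randn(n, N)) / np.sqrt(2)
    h_ls = h + noise                                     # noisy LS pilot estimate
    to_ri = lambda z: np.concatenate([z.real, z.imag], axis=1).astype("float32")
    return to_ri(h_ls), to_ri(h)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(2 * N,)),
    tf.keras.layers.Dense(2 * N),                        # refined channel estimate
])
model.compile(optimizer="adam", loss="mse")

x_train, h_train = batch(50_000)                         # offline training data
model.fit(x_train, h_train, epochs=5, batch_size=256, verbose=0)

x_test, h_test = batch(5_000)                            # degradation appears if the
print(model.evaluate(x_test, h_test, verbose=0))         # test channel model differs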
5.3.5 Beamforming Design
Beamforming improves the signal-to-interference-plus-noise ratio (SINR) at the intended users and reduces the interference to other users by improving the directionality of the transmit signal. Beamforming is performed by multiplying the transmit signals by beamforming coefficients, which are calculated according to the channel state. Machine learning can be incorporated into such a system to improve the efficiency of the beamforming calculation and reduce its computational complexity, and DL can enhance the beamforming design to provide optimized beam recognition [19]. Advanced ML-based algorithms can be used for intelligent beam prediction at both the base stations and the devices. This reduces signalling overhead, which can improve usable network capacity and device battery life.
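As a hedged illustration of learned beam prediction, the classifier below picks the best beam index in a DFT codebook directly from a noisy channel estimate; the codebook, array size, and noise model are assumptions for the sketch, not a design from the cited works.

import numpy as np
import tensorflow as tf

Nt, B = 16, 16   # transmit antennas, codebook size
codebook = np.exp(-2j * np.pi * np.outer(np.arange(Nt), np.arange(B)) / Nt) / np.sqrt(Nt)

def batch(n, snr_db=10):
    h = (np.random.randn(n, Nt) + 1j * np.random.randn(n, Nt)) / np.sqrt(2)
    best = np.argmax(np.abs(h @ codebook) ** 2, axis=1)   # label: best beam index
    sigma = 10 ** (-snr_db / 20)
    h_noisy = h + sigma * (np.random.randn(n, Nt) + 1j * np.random.randn(n, Nt)) / np.sqrt(2)
    x = np.concatenate([h_noisy.real, h_noisy.imag], axis=1).astype("float32")
    return x, best

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(2 * Nt,)),
    tf.keras.layers.Dense(B, activation="softmax"),        # probability per beam
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
x, y = batch(50_000)
model.fit(x, y, epochs=5, batch_size=256, verbose=0)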
5.3.6 Optimization
ML is capable of modeling systems that cannot be represented by a mathematical equation, which can help in better optimization. For example, deep learning algorithms can be used to dynamically adjust the transmission parameters to minimize the power consumption of the 5G base station and the user equipment. Recently, deep neural networks (DNNs) have been widely adopted in the design of intelligent communication systems, thanks to their strong learning ability and low testing complexity [57]. A DNN-based approach can solve general optimization problems in wireless communication by training a dedicated DNN for each data sample: the optimization variables and the objective function are treated as network parameters and loss function, respectively [58].
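A toy rendering of that idea (my own illustration, not the method of [58]): the optimization variables become trainable parameters and the negative objective becomes the loss, so gradient descent performs the search. Here, a total power budget is allocated across K parallel channels to maximize the sum rate, with a softmax re-parameterization enforcing the budget constraint.

import tensorflow as tf

K, P_total = 8, 10.0
gains = tf.random.uniform([K], 0.1, 2.0)       # illustrative channel gains
logits = tf.Variable(tf.zeros(K))              # the "network parameters"
opt = tf.keras.optimizers.Adam(learning_rate=0.05)

for _ in range(500):
    with tf.GradientTape() as tape:
        p = P_total * tf.nn.softmax(logits)    # softmax keeps sum(p) = P_total
        rate = tf.reduce_sum(tf.math.log(1.0 + gains * p)) / tf.math.log(2.0)
        loss = -rate                           # maximizing rate = minimizing -rate
    grads = tape.gradient(loss, [logits])
    opt.apply_gradients(zip(grads, [logits]))

print(P_total * tf.nn.softmax(logits))         # optimized power allocation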
5.3.7 Spectrum Sensing, Sharing and Management
Frequency spectrum is the most valuable and limited resource among wireless communication resources, and various ML techniques have been proposed to cope with the huge traffic demands and efficiently manage spectrum resources [25]. Proper training of the AI framework allows an offline-trained model to easily recognize online spectrum management solutions. Spectrum sensing (SS) is an important tool in finding new opportunities for spectrum sharing, and federated learning can be used for 5G radio spectrum sensing [37–40].
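A minimal sketch of learned spectrum sensing, assuming a synthetic narrowband signal and two simple hand-crafted features (energy and spectral peakedness); a real sensor would use richer inputs such as raw IQ samples or spectrograms.

import numpy as np
from sklearn.linear_model import LogisticRegression

def band_samples(occupied, n=128, snr_db=0.0):
    noise = (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
    if not occupied:
        return noise
    tone = 10 ** (snr_db / 20) * np.exp(2j * np.pi * 0.1 * np.arange(n))
    return tone + noise                      # occupied band: tone plus noise

X, y = [], []
for label in [0, 1] * 2000:
    iq = band_samples(bool(label))
    spec = np.abs(np.fft.fft(iq))
    X.append([np.mean(np.abs(iq) ** 2),      # average energy
              spec.max() / spec.mean()])     # spectral peakedness
    y.append(label)

clf = LogisticRegression().fit(np.array(X), np.array(y))
print(clf.score(np.array(X), np.array(y)))   # accuracy on the training set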
5.3.8 Interference Management
Interference is a major challenge in wireless communications. Due to the increased usage of spectrum caused by the exponential growth of wireless devices, detecting and avoiding interference has become an increasingly relevant problem for ensuring uninterrupted wireless communications [41]. DL can be used to develop algorithms for efficient interference management by learning the interference patterns in the network and dynamically adjusting the transmission parameters to mitigate them. Moreover, deep learning algorithms can be used to detect weak signals in the presence of strong interference, which can improve the reliability of the network.
5.3.9 Error Detection and Correction
DL can be used to develop algorithms for efficient error correction in 5G networks, which can help improve the reliability of the network and reduce transmission errors. The authors of [56] present new NR-decoders, based on deep learning and machine learning, and combine them with ECC decoding.
5.4 Deep Learning Applications at MAC Layer

5.4.1 Flexible Duplex
Flexible duplex means supporting the symmetric, asymmetric, and variable-rate traffic demanded by different users. This can be achieved by dynamically allocating the corresponding numbers of subcarriers to users with different rate and QoS requirements [21]. ML is a good candidate for such an optimization task.
5.4.2 Power Management
Both spectrum reuse and spectrum sharing cause inter-user interference and lead to errors unless the signal power is kept under constraints. Effective control of the signal power can reduce inter-user interference and increase system throughput. The key advantages of deep learning are the efficient learning of enormous amounts of data and the precise analysis of hidden distributions; deep learning methods can therefore be used to solve complicated but useful energy-efficiency optimization problems in wireless communication systems [54, 55].
5.4.3 Resource Allocation and Management
Resource allocation and management means the effective and efficient use of resources [32]. Classical resource allocation (RA) schemes rely on numerical techniques to optimize various performance metrics. Most of these works can be described as instantaneous, since the optimization decisions are derived from the current network state without considering past network states. While utility theory can incorporate long-term effects into these optimization actions, the growing heterogeneity and complexity of network environments has rendered the RA problem intractable. One prospective candidate is reinforcement learning (RL), a dynamic programming framework which solves RA problems optimally over varying network states.
Still, such methods cannot handle the high-dimensional state-action spaces that arise in cloud radio access network (C-RAN) problems. DL algorithms can be used to allocate resources dynamically in NOMA and OFDM systems, which can help improve the spectral efficiency of the network [53].
5.4.4 Modulation and Coding Scheme Selection
ML can be used for modulation and coding scheme selection based on the channel estimate [23, 24], which can help improve the spectral efficiency of the network. Fully connected layers can be used to design the channel estimator; for example, an architecture consisting of two multi-layer perceptron (MLP) networks estimates the real and imaginary parts of the channel state information (CSI) independently [43].
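An intentionally small illustration of data-driven MCS selection; the SNR thresholds used as the ground-truth policy here are invented for the example, not drawn from any standard.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

snr_db = np.random.uniform(0, 30, 5000)
# Assumed policy: class 0 (QPSK) below 10 dB, 1 (16-QAM) to 20 dB, else 2 (64-QAM)
mcs = np.digitize(snr_db, [10, 20])
clf = DecisionTreeClassifier(max_depth=3).fit(snr_db[:, None], mcs)
print(clf.predict([[7.5], [25.0]]))   # -> [0 2]: the tree recovers the thresholds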
5.4.5 Scheduling
With 5G, air-interface scheduling has become more complicated due to the use of massive MIMO and beamforming and the introduction of higher modulation schemes with varying numerologies. DL algorithms can be used to dynamically schedule data transmission based on the network conditions, which can help improve network performance and reduce latency [52].
5.4.6 Link Evaluation
In wireless networks, the propagation channel conditions for radio signals may vary significantly in time and space, affecting the quality of radio links. In order to ensure reliable and sustainable performance in such networks, effective link quality estimation (LQE) is required by some protocols and their mechanisms, so that the radio link parameters can be adapted and an alternative or more reliable channel can be selected for wireless data transmission. DL algorithms can be used to dynamically adjust the transmission parameters based on the channel conditions, which can help improve link quality and reduce transmission errors. Additionally, deep learning algorithms can be used to evaluate link quality in real time, which can help improve network performance and reliability [52].
5.5 Deep Learning Applications at Network Layer

5.5.1 Routing
Network traffic routing is fundamental in networking, and entails selecting a path for packet transmission. Different machine learning techniques are employed to handle the dynamics of routing mechanisms [48].
5.5.2 Anomaly Detection
Anomaly detection is one of the most promising areas of research for detecting novel attacks; however, its adoption in real-world applications is hindered by system complexity requiring a large amount of testing, tuning, and evaluation. Anomaly detection means identifying rare items or events in data sets that do not conform to expected behavior. An anomalous traffic pattern in a computer or an IoT device could mean that the device is compromised or hacked. Machine learning can be used to provide end-to-end protection for 5G networks [49].
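One common recipe, sketched below under the assumption that normal-traffic feature vectors are available: train an autoencoder on normal traffic only, then flag anything whose reconstruction error exceeds a threshold. The feature dimension and layer sizes are arbitrary placeholders.

import numpy as np
import tensorflow as tf

normal = np.random.randn(5000, 16).astype("float32")   # stand-in traffic features

ae = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(4, activation="relu"),        # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(16),                          # reconstruction
])
ae.compile(optimizer="adam", loss="mse")
ae.fit(normal, normal, epochs=10, batch_size=64, verbose=0)

errs = np.mean((ae.predict(normal, verbose=0) - normal) ** 2, axis=1)
threshold = np.percentile(errs, 99)      # e.g., worst 1% of normal error

def is_anomalous(x):
    err = np.mean((ae.predict(x, verbose=0) - x) ** 2, axis=1)
    return err > threshold               # True where traffic looks abnormal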
5.5.3 Traffic Prediction
Network traffic prediction plays a key role in network operations and management for today's increasingly complex and diverse networks, and it entails forecasting future traffic. DL algorithms can be used to predict the future traffic demand in the network, which can help improve network planning and resource allocation. ML and time-series analysis, which have been used in a variety of applications, are powerful tools for modelling and forecasting network traffic: ML allows the systematic extraction of useful information from traffic data and the automated discovery of correlations that would otherwise be too complicated for human specialists to obtain [50].
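The sketch below forecasts the next value of a synthetic, daily-periodic load series with an LSTM; the window length, network size, and the sinusoid-plus-noise traffic model are assumptions for the example.

import numpy as np
import tensorflow as tf

t = np.arange(2000)
load = 50 + 30 * np.sin(2 * np.pi * t / 288) + 3 * np.random.randn(2000)

win = 48   # look-back window
X = np.stack([load[i:i + win] for i in range(len(load) - win)])
X = X[..., None].astype("float32")        # (samples, timesteps, features)
y = load[win:].astype("float32")          # next-step target

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(win, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=3, batch_size=64, verbose=0)

print(model.predict(X[-1:], verbose=0))   # forecast for the next interval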
5.6 Deep Learning Applications at Application Layer

5.6.1 Performance Management
Performance management with machine learning can prevent bottlenecks and performance issues. Tuning operating parameters and achieving cross-layer optimization to maximize end-to-end performance is a challenging task, especially given the huge traffic demands and the heterogeneity of deployed wireless technologies. To address these challenges, machine learning (ML) is increasingly used to develop advanced approaches that can autonomously extract patterns and predict trends based on environmental measurements and performance indicators as input. Such patterns can be used to optimize the parameter settings at different protocol layers, e.g., the PHY, MAC, or network layer [47].
5.7 ML Deployment for 5G/6G: Case Studies

5.7.1 BER Estimator
Estimating the bit error rate (BER) is a critical task in guaranteeing a wireless communication system's performance. The bit error rate or bit error ratio (BER) is defined as the number of bit errors divided by the total number of transferred bits during a studied time interval. It is not practical to directly estimate an extremely small BER, because it would take a long time to observe a bit error. A CNN can be used as the BER estimator: channel, noise, and Rx performance data are used as the training set, and BER curves and images are used as labels during training of the ML model.
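A stripped-down version of the idea, with heavy assumptions: instead of measured channel/noise/Rx data, analytically generated BPSK-over-AWGN points are used, and a small dense network (not a CNN) regresses log10(BER) from the SNR, so that tiny BERs can be read off where direct bit-error counting would be slow.

import numpy as np
import tensorflow as tf
from scipy.special import erfc

snr_db = np.random.uniform(0, 12, 20000).astype("float32")
ber = 0.5 * erfc(np.sqrt(10 ** (snr_db / 10)))   # exact BPSK/AWGN BER
log_ber = np.log10(ber).astype("float32")        # regress in the log domain

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(snr_db[:, None], log_ber, epochs=10, batch_size=128, verbose=0)

# Predicted log10(BER) at 8 dB, without any long bit-error simulation
print(model.predict(np.array([[8.0]], dtype="float32"), verbose=0))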
5.7.2 Blockchain for Security
The integration of blockchain with 6G is a possible solution to the security and privacy issues associated with 6G. The blockchain is capable of providing better and more effective privacy in several ways. First, the data are stored in a time-stamped and immutable mode to prevent data modification. For data exchange across the network, smart contracts can protect data from malicious users by providing user authenticity. A DoS attack is one of the most challenging threats in 6G; although many solutions have been proposed to mitigate the issue, such as using AI techniques to detect DoS, it will become even more challenging in 6G due to the massive connectivity. The blockchain is a decentralized and distributed ledger in which transactions are added to the network and confirmed by a majority of the participating nodes. A SHA-256 hash is used to link each new block to the preceding one; as a result, the blockchain provides a secure and immutable environment. Moreover, the data in the blockchain have a unique encrypted hash to guarantee their integrity [33, 34, 36].
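The hash-linking just described can be shown in a few lines of Python with the standard hashlib module; this toy chain (the block fields and the example payload are invented) is only meant to illustrate why tampering with one block breaks every later link.

import hashlib, json, time

def make_block(data, prev_hash):
    block = {"time": time.time(), "data": data, "prev": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()   # SHA-256 link
    return block

genesis = make_block("genesis", "0" * 64)
b1 = make_block("spectrum lease: operator A -> operator B", genesis["hash"])

# Changing genesis["data"] changes its hash, so b1["prev"] no longer matches:
# the tampering is immediately detectable, which is the immutability property.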
5.8 Conclusions
Deep learning (DL) can play a vital role in improving the performance of wireless communications; however, deep learning for wireless communications is still in its early stages. DL can solve many complex 6G network problems, and in this chapter a discussion of the potential technologies enabling solutions for 5G/6G network design and optimization was introduced. Deep learning for 5G and beyond has the potential to unlock the full potential of the next generation of wireless networks, enabling new applications and services that were once unimaginable. With the emergence of 5G, we are entering a new era of connectivity where data speeds will be faster and latency will be reduced. Deep learning algorithms can be used to optimize network performance, enable intelligent traffic management, and provide real-time analytics, enabling new use cases such as autonomous vehicles, smart cities, and remote healthcare. However, there are also significant challenges to be addressed, such as the need for standardized APIs, the complexity of integrating AI into the network, and the potential for bias or unfair results. It is important for stakeholders from industry, academia, and government to work together to address these challenges and to ensure that deep learning for 5G and beyond is developed and deployed in a responsible and safe manner.
References

1. N.C. Luong, D.T. Hoang, S. Gong, D. Niyato, P. Wang, Y.-C. Liang, D.I. Kim, Applications of deep reinforcement learning in communications and networking: a survey. IEEE Commun. Surv. Tutor. 21(4), 3133–3174 (2019)
2. C. Zhang, P. Patras, H. Haddadi, Deep learning in mobile and wireless networking: a survey. IEEE Commun. Surv. Tutor. 21, 2224–2287 (2019)
3. K. Karra, S. Kuzdeba, J. Petersen, Modulation recognition using hierarchical deep neural networks, in IEEE Int. Symp. Dynamic Spectrum Access Networks (DySPAN) (2017)
4. H. Ye, G.Y. Li, B.H. Juang, Power of deep learning for channel estimation and signal detection in OFDM systems. IEEE Wirel. Commun. Lett. 7(1), 114–117 (2018)
5. X. Li, F. Dong, S. Zhang, W. Guo, A survey on deep learning techniques in wireless signal recognition. Wirel. Commun. Mob. Comput. 2019, 1–12 (2019)
6. Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) (2019)
7. S. Niknam, H.S. Dhillon, J.H. Reed, Federated learning for wireless communications: motivation, opportunities, and challenges. IEEE Commun. Mag. 58(6), 46–51 (2020)
8. K.B. Letaief, W. Chen, Y. Shi, J. Zhang, Y.A. Zhang, The roadmap to 6G: AI empowered wireless networks. IEEE Commun. Mag. 57(8), 84–90 (2019)
9. G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, K. Huang, Toward an intelligent edge: wireless communication meets machine learning. IEEE Commun. Mag. 58(1), 19–25 (2020)
10. C. She et al., Deep learning for ultra-reliable and low-latency communications in 6G networks. IEEE Netw. 34(5), 219–225 (2020)
11. N. Kato, B. Mao, F. Tang, Y. Kawamoto, J. Liu, Ten challenges in advancing machine learning technologies toward 6G. IEEE Wirel. Commun. 27(3), 96–103 (2020)
12. M. McClellan, C. Cervelló-Pastor, S. Sallent, Deep learning at the mobile edge: opportunities for 5G networks. Appl. Sci. 10(14), 35–47 (2020)
13. W. Guo, Explainable artificial intelligence for 6G: improving trust between human and machine. IEEE Commun. Mag. 58(6), 39–45 (2020)
14. P.V.R. Ferreira et al., Multiobjective reinforcement learning for cognitive satellite communications using deep neural network ensembles. IEEE J. Sel. Areas Commun. 36(5), 1030–1041 (2018)
15. R. Shafin, L. Liu, V. Chandrasekhar, H. Chen, J. Reed, J. Zhang, Artificial intelligence-enabled cellular networks: a critical path to beyond-5G and 6G. IEEE Wirel. Commun. 1–6 (2020)
16. Y. Xiaohu, C. Zhang, X. Tan, S. Jin, H. Wu, AI for 5G: research directions and paradigms. Sci. China Inf. Sci. 62, 21301 (2019)
17. M.Z. Chowdhury, M. Shahjalal, S. Ahmed, Y.M. Jang, 6G wireless communication systems: applications, requirements, technologies, challenges, and research directions. IEEE Open J. Commun. Soc. 1, 957–975 (2020)
18. M. Abbasi, A. Shahraki, A. Taherkordi, Deep learning for network traffic monitoring and analysis (NTMA): a survey. Comput. Commun. 170, 19–41 (2021)
19. G. Gui, M. Liu, F. Tang, N. Kato, F. Adachi, 6G: opening new horizons for integration of comfort, security, and intelligence. IEEE Wirel. Commun. 27(5), 126–132 (2020)
20. X. Bao, W. Feng, J. Zheng, J. Li, Deep CNN and equivalent channel-based hybrid precoding for mmWave massive MIMO systems. IEEE Access 8, 19327–19335 (2020)
21. R. Rajashekar, C. Xu, N. Ishikawa, L.-L. Yang, L. Hanzo, Multicarrier division duplex aided millimeter wave communications. IEEE Access 7, 100719–100732 (2019)
22. D. Gündüz, P. de Kerret, N.D. Sidiropoulos, D. Gesbert, C.R. Murthy, M. van der Schaar, Machine learning in the air. IEEE J. Sel. Areas Commun. 37(10), 2184–2199 (2019)
23. S. Hu, Y. Pei, P.P. Liang, Y.-C. Liang, Robust modulation classification under uncertain noise condition using recurrent neural network, in IEEE Glob. Commun. Conf. (GLOBECOM) (2018), pp. 1–7
24. F. Meng, P. Chen, L. Wu, X. Wang, Automatic modulation classification: a deep learning enabled approach. IEEE Trans. Veh. Technol. 67(11), 10760–10772 (2018)
25. R. Li, Z. Zhao, Z. Xuan, G. Ding, C. Yan, Z. Wang, H. Zhang, Intelligent 5G: when cellular networks meet artificial intelligence. IEEE Wirel. Commun. 24(5), 175–183 (2017)
26. A. Imteaj, M.H. Amini, FedPARL: client activity and resource-oriented lightweight federated learning model for resource-constrained heterogeneous IoT environment. Front. Commun. Netw. 2, 10 (2021)
27. M. Katz, P. Pirinen, H. Posti, Towards 6G: getting ready for the next decade, in 2019 16th International Symposium on Wireless Communication Systems (ISWCS) (IEEE, 2019), pp. 714–718
28. Z. Zhang et al., 6G wireless networks: vision, requirements, architecture, and key technologies. IEEE Veh. Technol. Mag. 14(3), 28–41 (2019)
29. C. De Alwis, A. Kalla, Q.-V. Pham, P. Kumar, K. Dev, W.-J. Hwang, M. Liyanage, Survey on 6G frontiers: trends, applications, requirements, technologies and future research. IEEE Open J. Commun. Soc. 2, 836–886 (2021)
30. M. Elsayed, M. Erol-Kantarci, AI-enabled future wireless networks: challenges, opportunities, and open issues. IEEE Veh. Technol. Mag. 14(3), 70–77 (2019)
31. B. Abdualgalil, S. Abraham, Applications of machine learning algorithms and performance comparison: a review, in 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) (IEEE, 2020), pp. 1–6
32. K. Lin, Y. Li, Q. Zhang, G. Fortino, AI-driven collaborative resource allocation for task execution in 6G-enabled massive IoT. IEEE Internet Things J. 5264–5273 (2021)
33. D.C. Nguyen, P.N. Pathirana, M. Ding, A. Seneviratne, Blockchain for 5G and beyond networks: a state of the art survey. J. Netw. Comput. Appl. 166, 102693 (2020)
34. T. Maksymyuk, Blockchain-empowered framework for decentralized network management in 6G. IEEE Commun. Mag. 58(9), 86–92 (2020)
35. K.S. Mohamed, Wireless Communications Systems Architecture: Transceiver Design and DSP Towards 6G (Springer Nature, 2022)
36. K.S. Mohamed, New Frontiers in Cryptography: Quantum, Blockchain, Lightweight, Chaotic and DNA, 1st edn. (Springer, 2020), pp. 41–63
37. M. Wasilewska, H. Bogucka, A. Kliks, Federated learning for 5G radio spectrum sensing. Sensors 22(1), 198 (2022). https://doi.org/10.3390/s22010198
38. J. Xu, H. Wang, Client selection and bandwidth allocation in wireless federated learning networks: a long-term perspective. IEEE Trans. Wirel. Commun. 20, 1188–1200 (2021)
39. Z. Zhao, C. Feng, W. Hong, J. Jiang, C. Jia, T.Q.S. Quek, M. Peng, Federated learning with non-IID data in wireless networks. IEEE Trans. Wirel. Commun. (2021)
40. Z. Yang, M. Chen, W. Saad, C.S. Hong, M. Shikh-Bahaei, Energy efficient federated learning over wireless communication networks. IEEE Trans. Wirel. Commun. 20, 1935–1949 (2021)
41. C. Robinson, D. Uvaydov, S. D'Oro, T. Melodia, Narrowband interference detection via deep learning (2023). https://doi.org/10.48550/arXiv.2301.09607
42. C. Zhang, P. Patras, H. Haddadi, Deep learning in mobile and wireless networking: a survey. IEEE Commun. Surv. Tutor. 21(3), 2224–2287 (2019). https://doi.org/10.1109/comst.2019.2904897
43. N. Soltani et al., Neural network-based OFDM receiver for resource constrained IoT devices. IEEE Internet Things Mag. 5(3), 158–164 (2022). https://doi.org/10.1109/IOTM.001.2200051
44. A. Ly, Y.-D. Yao, A review of deep learning in 5G research: channel coding, massive MIMO, multiple access, resource allocation, and network security. IEEE Open J. Commun. Soc. 2, 396–408 (2021)
45. H. Wu, Z. Sun, X. Zhou, Deep learning-based frame and timing synchronization for end-to-end communications. J. Phys.: Conf. Ser. 1169, 012060 (2019). https://doi.org/10.1088/1742-6596/1169/1/012060
46. K. Paudel, R. Kadel, D. Babarenda Guruge, Machine-learning-based indoor mobile positioning using wireless access points with dual SSIDs: an experimental study. J. Sens. Actuator Netw. 11 (2022). https://doi.org/10.3390/jsan11030042
47. M. Kulin, T. Kazaz, E. De Poorter, I. Moerman, A survey on machine learning-based performance improvement of wireless networks: PHY, MAC and network layer. Electronics 10, 318 (2021). https://doi.org/10.3390/electronics10030318
48. P. Nayak, G.K. Swetha, S. Gupta, K. Madhavi, Routing in wireless sensor networks using machine learning techniques: challenges and opportunities. Measurement 178, 108974 (2021). https://doi.org/10.1016/j.measurement.2021.108974
49. J. Lam, R. Abbas, Machine learning based anomaly detection for 5G networks (2020). arXiv preprint arXiv:2003.03474
50. F. Xu, Y. Lin, J. Huang, D. Wu, H. Shi, J. Song, Y. Li, Big data driven mobile traffic understanding and forecasting: a time series approach. IEEE Trans. Serv. Comput. 9(5), 796–805 (2016)
51. R.N.S. Rajapaksha, Potential deep learning approaches for the physical layer. Master's thesis, 1–59 (2019)
52. G. Cerar, H. Yetgin, M. Mohorcic, C. Fortuna, Machine learning for wireless link quality estimation: a survey. IEEE Commun. Surv. Tutor. (2021). https://doi.org/10.1109/COMST.2021.3053615
53. A.A.M. Habiby, A. Thoppu, Application of reinforcement learning for 5G scheduling parameter optimization (2019). arXiv preprint arXiv:1911.07608
54. M.-L. Tham, A. Iqbal, Y.C. Chang, Deep reinforcement learning for resource allocation in 5G communications, in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China (2019), pp. 1852–1855. https://doi.org/10.1109/APSIPAASC47483.2019.9023112
55. T. Wang, C.-K. Wen, H. Wang, F. Gao, T. Jiang, S. Jin, Deep learning for wireless physical layer: opportunities and challenges. China Commun. 14(11), 92–111 (2017)
56. Y. Xing, Y. Qian, L. Dong, Deep learning for optimized wireless transmission to multiple RF energy harvesters, in Proc. of IEEE VTC Fall (2018)
57. P. Upadhyaya, A. Jiang, Machine learning for error correction with natural redundancy (2019)
58. J. Gao, C. Zhong, G.Y. Li, Z. Zhang, Deep neural network for optimization in wireless communications (2022)
59. G. Villarrubia et al., Artificial neural networks used in optimization problems. Neurocomputing 272, 10–16 (2018)
60. V.P. Rekkas, S. Sotiroudis, P. Sarigiannidis, G.K. Karagiannidis, S.K. Goudos, Unsupervised machine learning in 6G networks: state-of-the-art and future trends, in Proceedings of the 2021 10th International Conference on Modern Circuits and Systems Technologies (MOCAST), Thessaloniki, Greece (2021), pp. 1–4
61. H. Dahrouj, R. Alghamdi, H. Alwazani, S. Bahanshal, A.A. Ahmad, A. Faisal, R. Shalabi, R. Alhadrami, A. Subasi, M. Alnory et al., An overview of machine learning-based techniques for solving optimization problems in communications and signal processing. IEEE Access 9, 74908–74938 (2021)
62. H. Yang, A. Alphones, Z. Xiong et al., Artificial-intelligence-enabled intelligent 6G networks. IEEE Netw. 34(6), 272–280 (2020)
63. R. Shafin, L. Liu, V. Chandrasekhar et al., Artificial intelligence-enabled cellular networks: a critical path to beyond-5G and 6G. IEEE Wirel. Commun. 27(2), 212–217 (2020)
6 Python for Deep Learning: A General Introduction
6.1 Introduction
Python is a high-level programming language that has become increasingly popular over the years, thanks to its simplicity, ease of use, and the wide range of libraries and tools available to developers. In recent years, Python has also emerged as a dominant programming language for deep learning, a subfield of artificial intelligence that involves building and training neural networks to learn from data. Python's popularity in the deep learning community is due to its powerful libraries, such as NumPy, TensorFlow, PyTorch, and Keras, which allow rapid development of and experimentation with deep learning models. These libraries, combined with the simplicity and flexibility of Python, have made it an ideal choice for researchers and developers working in deep learning. In this chapter, we will explore the basics of Python programming and its use in deep learning, including popular libraries and tools and best practices for developing and deploying deep learning models in Python [1].

Which programming language should you learn? When you try to figure out which language to start with, the specific choice matters less than you might think. Whatever language you choose to learn, what makes the difference is how you build your:

1. Coding skills.
2. Problem-solving skills.
3. Understanding of algorithms.
However, you should take into consideration that some languages are easier to learn than others.
6.1.1 What Is Python?
Python is a general-purpose programming language dating back to 1991. It is an open-source and free language [2]. There are two major versions: Python 2 and Python 3. Python is a dynamic language. Dynamic languages can be interpreted directly, which means that the actual text of the program (the source code) is used while the program is running. In contrast, a static language is executed in two phases: first the program is translated from source code to binary code, and then the binary code is executed. Python is designed for readability (it has similarities to the English language). Although Python is a general-purpose programming language like C, Java, etc., it is higher level: you can express concepts in fewer lines of code than in C or C++.
6.1.2 Why Should You Learn Python?
Python is a nice language for beginning programming, for several reasons:

• The syntax is simple and clear.
• It is easy to learn.
• It is powerful and can be used in many applications.
• You can write powerful and interesting programs without a lot of work.
• It is widely used.
• It is quick for testing out new ideas.
• It needs fewer lines of code compared to C or Java.
• It has a vast function library geared towards strings and files.
• It saves coding time and automates a lot of tasks.
• It is intuitive; code is concise but human readable.
6.1.3 Python Applications: What Can You Do with It?

• Web development.
• Machine learning.
• Data analytics and visualization.
• Scripting.
• Game development.
Fig. 6.1 Python interpreter
• Desktop applications and GUIs.
• Embedded applications.
• As a hardware description language (HDL): myHDL is software that converts from Python to HDL [3].
6.1.4 Python Programming Environment and Tools: What Do You Need to Write Programs in Python?
• A text editor for writing and changing your source code easily (e.g., Notepad++).
  – Automatically indents the code: Python uses indentation to determine the code structure; use 4 spaces, not tabs, and keep indentation consistent throughout the whole code.
  – Color-codes the code to clarify its meaning.
  – Jumps from a variable name to its definition.
  – Jumps from a function call to its definition.
  – Python code is placed in a regular text file that must end with the .py extension.
• An interpreter to translate and execute your program (Fig. 6.1).
6.1.5 Integrated Development Environment (IDE)

6.1.5.1 Definition of IDE
An IDE is a large computer program that provides tools for increasing productivity in software development, i.e., it has everything you need to start coding, such as:

• Text editor
• Auto-completion
• Real-time error checking
• Debugging
• Interpreter
6.1.5.2 Types of IDE

(A) IDLE IDE
Visit python.org/download and install the version for Windows or for Mac. The IDLE IDE is included with the Python installation, and it is written in Python. To run a program:

• Open the command prompt or terminal.
• Navigate to the location of the Python file:
  (a) Use the cd command to access folders.
  (b) Use the cd command with two periods (cd ..) to move backward in the filesystem.
• Run/interpret the Python file:
  (a) Type python filename.py (where filename is the file you are trying to run).

(B) 3rd-Party IDEs
• PyCharm [4]
• Spyder [5]
• Anaconda [6]
• Miniconda [7]
• Jupyter [8]
(C) Online IDEs

• www.onlinegdb.com
• http://www.skulpt.org
• https://colab.research.google.com
6.1.6 The Big Picture of Any Program
Any program consists of computation, communication, and control between different parts of the code (Fig. 6.2). As we will learn later in this chapter, Python can be used to program all these different parts.
Fig. 6.2 The big picture of any program
6.1.7 Create Your First Program in Python: Hello World Program
Python is a case-sensitive language, so print, not PRINT nor Print. As shown in Fig. 6.3, using the online IDE, you write only print("hello world"). When you press the play button, all instructions are executed in order.
6.1.8 Python Versus Java
Although performance is not always a problem in software, it should always be a consideration. Where network I/O costs or database access dominate, the specific efficiency of a language is less significant than other aspects of technology choice and design when it comes to overall efficiency. Although neither Java [9] nor Python is especially suited to high-performance computing, when performance matters, Java has the edge by platform and by design. A lot of Java efficiency comes from optimizations to virtual machine execution. A JVM can translate bytecode into native machine code as a program executes. This Just-In-Time (JIT) compilation is why Java’s performance can often rival that of native languages. Relying on JIT is a reasonably portable assumption as HotSpot, the default Oracle JVM, offers it.
Fig. 6.3 Hello world program using idle IDE
Java has had support for concurrency from its first public version, whereas Python is more resolutely a sequential language. This has implications for taking advantage of current multi-core processor trends, with Java code more readily able to do so. Both Java and Python enjoy a seemingly endless supply of open-source libraries populated by code from individuals and companies who have solved common and uncommon problems, and who are happy to share so that others can take advantage of their solutions. Indeed, both languages have benefited from online forums and open-source development. Java has been in the mainstream longer than Python, so it is probable that the libraries and tools for Java are more mature and/or more capable, but that is very debatable.
6.1.9 Python Versus C++
C++ is a general-purpose programming language developed from the original C language. C++ [10] is a statically typed, free-form, multi-paradigm, compiled programming language. Python is quite different from C++: it is a general-purpose, high-level language considered to be cleaner and more direct, with an emphasis on code readability. C++ is now commonly used for hardware design: the design is first described in C++, then analyzed, architecturally constrained, and scheduled to create a register-transfer-level hardware description, through high-level synthesis. An advantage of Python is that its code is considerably shorter than that of most other programming languages, which allows programmers to express concepts in fewer lines of code than in C or C++. Python's language constructs are intended to enable clear programs on both a small and a large scale. Another advantage of Python is that it supports multiple programming paradigms, including object-oriented, imperative, and functional programming styles. It features a dynamic type system and automatic memory management, and it has a large and comprehensive standard library, all of which improves Python's usability. Python interpreters are also available for many operating systems. Like other dynamic languages, Python is often used as a scripting language, but it is also often used in non-scripting contexts, and Python code can be packaged into standalone executable programs using third-party tools. The differences between Python and C++ can be summarized in the points below:

• Python uses garbage collection, whereas C++ does not.
• C++ is a statically typed language, while Python is dynamically typed.
• Python is easier to use than C++.
• Python is run through an interpreter, whilst C++ is pre-compiled; hence, C++ is faster than Python.
• C++ supports pointers and powerful memory management.
• Python supports very fast development and rapid, continuous language evolution.
• Python has less backwards compatibility.
• The majority of all applications are built in C++.
• The majority of all 3D applications offer Python access to their APIs.
• Python code tends to be 5–10 times shorter than that written in C++.
• In Python, there is no need to declare types explicitly.
• The smaller code size in Python leads to rapid prototyping, which offers speed of development.
• Python requires an engine to run and is interpreted each time it runs.
• Python can be hard to install on a Windows box, which makes distribution of the program problematic.
• C++ compiles to a pure binary that links to existing libraries to assist the coding.
• In Python, variables are in scope even outside the loops in which they are first instantiated.
• In Python, a function may accept an argument of any type and return a value of any type, without any kind of declaration beforehand.
• Python provides flexibility in calling functions and returning values.
• Python looks cleaner, is object oriented, and still maintains a little strictness about types.
6.2 Data Types
A variable is a container that holds/stores values. Variables make changes in the code easier: instead of changing many lines manually, we change the value once. The data stored in a variable can be of many types. Python has different data types:

• Numbers
• Strings
• List
• Tuple
• Dictionary
• Classes
counter = 100     # an integer
miles = 100.2     # a floating point
name = "Khaled"   # a string
a = b = c = 1     # Python allows you to assign a single value to several variables simultaneously
6.2.1 Numbers and Functions of Numbers
Number data types store numeric values.

var1 = 1
var2 = 10.3
var3 = -7
var4 = True    # Boolean
var5 = False
print(var2)    # print helps in debugging too, besides displaying results
Functions are little blocks of code that we can run to perform specific operations. They can be built-in or external (user-defined). They are used to modify data or to get information about it. Functions can take any number of arguments.

print(abs(-3))    # built-in
print(pow(3, 3))
print(min(4, 3))
print(round(4.3))
print(sqrt(4))    # external: this raises an error unless we first add the line:
from math import *
Python's arithmetic operations are shown in Table 6.1, and the order of arithmetic operations is shown in Table 6.2.

Table 6.1 Arithmetic operations
Table 6.2 Order of arithmetic operations (highest precedence first)

| Operator | Meaning |
| () | Brackets (inner before outer) |
| ** | Exponent |
| *, /, //, % | Multiplication, division, floor division, modulo |
| +, − | Addition, subtraction |
| = | Assignment |
Table 6.3 Python relational operations

| Operator | Meaning |
| > | Greater than |
| == | Equal |
| >= | Greater than or equal |
| != | Not equal |

Table 6.4 Python logical operators

| Operator | Meaning |
| and | Evaluates to true if both expressions are true |
| or | Evaluates to true if at least one expression is true |
| not | Returns true if the expression is false |
x = 3 * 2 ** 3    # = 24, since ** binds tighter than *
x = (3 * 2) ** 3  # = 216
Python relational operations are shown in Table 6.3, and Table 6.4 shows Python logical operators.
6.2.2 Strings and Functions of Strings
Strings in Python are identified as a contiguous set of characters between quotation marks; we write strings between the quotation marks.

Code:

str1 = "Hello World"   # renamed from str to avoid shadowing the built-in
Name = "Khaled"
print("my name is: " + Name + "." + "\n my age is 30")  # + appends strings; \n starts a new line

Results:

my name is: Khaled.
 my age is 30
There are functions associated with strings. We can run cascaded functions. Name = "Khaled" print (Name.lower()) # khaled print (Name.upper()) #KHALED print (Name.isupper()) # False
print(Name.upper().isupper())  # True -- we can run cascaded functions
print(len(Name))               # 6
print(Name[0])                 # K
print(Name[-1])                # d -- indexing starts from 0; from the end it starts from -1
print(Name.index("d"))         # 5
print(Name.replace("d", "f"))  # Khalef

# Other functions
ord("K")              # convert a character to its numeric (ASCII/Unicode) code
chr(75)               # convert a numeric code back to a character
"-".join(["a", "b"])  # join a sequence of strings with a separator: "a-b"
"{} {}".format(1, 2)  # build a string from values, an alternative to + concatenation
6.2.3 List and Functions of List
A list stores a collection of items. It uses square brackets.

list = [1234, "khaled"]   # note: this shadows the built-in name list
print(list[0])    # 1234
print(list[0:1])  # [1234] -- the stop index is excluded
print(list[:1])   # [1234] -- if start is omitted, the selection is made from the beginning
print(list[0:])   # [1234, 'khaled'] -- if stop is omitted, the selection is made up to the end
print(list[:])    # [1234, 'khaled'] -- if both are omitted, all items are selected
We can have grids of lists (lists of lists):

grid = [[1, 2], [3, 4], [5, 6]]   # the values here are illustrative
empty = []
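A short sketch of how such a grid is indexed (the values are illustrative):

grid = [[1, 2], [3, 4], [5, 6]]
print(grid[0])     # [1, 2] -- the first inner list
print(grid[0][1])  # 2 -- second item of the first inner list
print(len(grid))   # 3 -- number of inner lists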
Functions of list:

list = [1234, "khaled"]
list.append(7)     # add to the end
print(list)        # [1234, 'khaled', 7]
list.pop()         # remove and return the last item (7)
print(list)        # [1234, 'khaled']

# Other functions
list.extend(L)     # append all items of list L
list.insert(i, x)  # insert x at position i
list.remove(x)     # remove the first occurrence of x
list.index(x)      # position of the first occurrence of x
list.count(x)      # how many times x appears
list.sort()        # sort the list in place
list.reverse()     # reverse the list in place
6.2.4 Tuples
A tuple is similar to a list except that it cannot be changed, which suits data such as coordinates. Unlike lists, tuples are enclosed within parentheses ().

tuple = (123, "khaled")  # note: this shadows the built-in name tuple
tuple[0] = 5             # Error! Tuples do not support item assignment
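For balance, a brief sketch of what tuples do support, namely indexing and unpacking (the values are illustrative):

point = (3, 5)   # e.g. a coordinate that should not change
print(point[0])  # 3 -- reading by index is allowed
x, y = point     # unpacking the tuple into two variables
print(x + y)     # 8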
6.2.5 Dictionary
A dictionary consists of key-value pairs. Dictionaries are enclosed by curly brackets {}.
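Since the text gives no code here, the following is a minimal sketch with illustrative keys and values:

ages = {"Ahmed": 22, "Ramy": 24}  # key: value pairs
print(ages["Ahmed"])              # 22 -- look up a value by its key
ages["Sara"] = 20                 # add a new key-value pair
print(len(ages))                  # 3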
6.2.6 Classes and Objects
We cannot represent all things by strings or numbers, such as persons and cars. We can create our own data type, for example a data type called student. A class creates a template for the new data type (student); an object creates an actual student. The def keyword, the self parameter, and the __init__ method form the preamble the interpreter needs. A double underscore prefix "__" is used to make a variable visible inside the class only. __init__ is the constructor of the class.

class student:  # definition (note the two levels of indentation below)
    def __init__(self, name, grade):  # initialization (constructor)
        self.name = name     # attribute 1
        self.grade = grade   # attribute 2
student1 = student("Ahmed", 2)  # call of class: create an object
student2 = student("Ramy", 4)   # create another object
print(student1.name)            # Ahmed
For code reusability, we can create another class that inherits the attributes of an existing class. super() is used in inherited classes to call the constructor of the base class, as sketched below.
class class2(class1):  # class2 inherits from class1
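As a concrete sketch building on the student class above (the graduate_student class and its thesis attribute are illustrative, not from the text):

class graduate_student(student):  # graduate_student inherits from student
    def __init__(self, name, grade, thesis):
        super().__init__(name, grade)  # base-class constructor sets name and grade
        self.thesis = thesis           # extra attribute of the derived class

g = graduate_student("Ahmed", 2, "Deep Learning")
print(g.name)    # Ahmed -- inherited attribute
print(g.thesis)  # Deep Learning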
6.3 Inputs
The input() function lets the program get string information from the user.

name = input("enter your name: ")
print(name)
a = input("enter first number: ")
b = input("enter second number: ")
c = int(a) + int(b)  # type conversion is important: input() returns strings
print(c)
6.4 External Functions: User-Defined Functions
Functions are defined with the def statement. Indentation is very important for functions; without it, Python raises an error. The body lines are indented relative to the def line.

def fn_name(param1, param2):
    value = do_something()
    return value
def iseven(num):          # definition
    return (num % 2 == 0)

result = iseven(100)      # call the function
print(result)             # True
6.5 Control: If Statement, While Loop, For Loop
6.5.1 If/If Else/If Elif Else Statement
In an if statement, the condition is evaluated, and if the condition is true, the list of statements is executed, as depicted in Figs. 6.4, 6.5 and 6.6.

if <condition>:
    <statements>
else:
    <statements>
Fig. 6.4 If statement (flowchart: if Grade >= 60 is true, print "Passed")

Fig. 6.5 If else statement (flowchart: if Grade >= 60 is true, print "Passed"; otherwise print "Failed")
Fig. 6.6 If elif else statement (flowchart: conditions a, b, …, z are tested in order; the first true condition's case actions run; if none is true, the default actions run)

years = 2
if years > 10:  # do not forget indentation
    bonus = 10
else:
    bonus = 5
if years > 10:
    bonus = 5
elif years == 10:  # == tests equality; a single = is assignment
    bonus = 5
else:
    bonus = 5
6.5.2 While Loop
while <condition>:
    <statements>
counter = 1
while counter <= 5:
    print(counter)  # prints 1 through 5
    counter += 1    # increment, or the loop never ends