English Pages 271 [272] Year 2023
Mohammad “Sufian” Badar Editor
A Guide to Applied Machine Learning for Biologists
Editor
Mohammad “Sufian” Badar
Department of Computer Science and Engineering, School of Engineering Sciences and Technology (SEST), Jamia Hamdard University, New Delhi, India
Senior Teaching Faculty (Former), Department of Bioengineering, University of California, Riverside, Riverside, CA, USA
ISBN 978-3-031-22205-4
ISBN 978-3-031-22206-1 (eBook)
https://doi.org/10.1007/978-3-031-22206-1
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book is lovingly dedicated to my mother (late Nurunnisa), father (late Mohd. Badre Alam), mother-in-law (Fakhrunnisa Kamal), sister (Shahnaz Badar), brother-in-law (late Professor Zaki Ahmed), wife (Rana Kamal), daughters (Sarah and Aisha), and my brothers and sister. Their support, encouragement, guidance, and constant love have sustained me throughout my life.
Foreword
Artificial intelligence (AI) is the study and design of intelligent agents. An intelligent agent perceives its environment, retrieves the necessary information from that environment, and takes actions that maximize its chances of success. An ideal AI system can reason and take the course of action best suited to achieving a goal. Machine learning (ML) is a branch of AI in which machines are trained to solve a problem, in effect generating their own programs automatically. Today, machine learning is a buzzword, representing a major step forward in how computers can learn and act. Machine learning engineers are in high demand because of evolving technology and the generation and processing of large amounts of data, also called big data. Deep learning is a subset of machine learning that uses multilayered artificial neural networks to classify and order information beyond simple input/output rules. At present, there is no book available in the market that explains artificial intelligence in layman's terms and correlates it with machine learning. This book explains artificial intelligence in simple words, keeping in mind that the reader is a novice, and covers its main areas. It then relates artificial intelligence to machine learning, giving the reader a clear understanding of both fields and of the boundary between them. The book is valuable for college students and research scholars. Machine learning has many concepts and applications across different fields, and this book will help college students (especially freshmen and sophomores), professionals, and researchers in artificial intelligence and machine learning. The editor has compiled this book to provide a basic understanding of these areas to those without a background in computer science. It is an introductory book and a handy reference in academic and professional communities.
David K. Mills
Professor, Department of Biological Sciences
Director, Center for Biomedical Engineering and Rehabilitation Sciences
Louisiana Tech University, Ruston, LA, USA
Preface
At present, most books on Machine Learning are written for readers who already have some basic knowledge of it, such as college students planning to take a course in the subject at the undergraduate or graduate level, or researchers in the field. Newcomers are not part of this audience. However, Machine Learning Engineers are in high demand because of evolving technology and the generation and processing of large amounts of data. Newcomers (students from other disciplines or new researchers) who want to understand and learn the basics of Machine Learning often search unsuccessfully for source material that meets their needs. This can lead a novice to lose interest in the field, because no beginner-level book explains everything in simple terms. This book is intended to meet the needs of such newcomers, as it has been written specifically for them.
This book covers basics such as probability theory, the multilayer perceptron, dimensionality reduction, clustering, different types of learning models, artificial neural networks (ANNs), coding in Python, and applications of Machine Learning. The chapters start with the basics of computers, programming languages, and essential mathematics. Because some knowledge of Bioinformatics is needed to understand and apply Machine Learning to biological problems, Bioinformatics is also explained. The book is written to help the reader gain a basic understanding of Artificial Intelligence and Machine Learning, and applications of Bioinformatics and Machine Learning are also explored. Lastly, we discuss the prospects of these newly emerging areas, Artificial Intelligence and Machine Learning.
A unique feature of this book is that it includes two case studies:
Case Study 1: Human Emotion Detection
Case Study 2: Brain Tumor Classification
The first, on Human Emotion Detection, uses code written in Python that gives results with reasonable accuracy. The second uses clinical data, and the results obtained are highly accurate. I believe it is the only book for novices in Machine Learning in which readers can see the results and analyze them, even without a background in AI/ML or prior coding knowledge.
Riverside, CA, USA
Mohammad “Sufian” Badar
Acknowledgments
First and foremost, I would like to praise and thank Allah SWT, the almighty, who has granted me countless blessings, knowledge, and opportunities so that I can edit my first book. I am extremely grateful to my parents (late Mohd. Badre Alam and late Nurunnisa) for their love, prayers, caring, and sacrifices in educating and preparing me for my future. Most importantly, I wish to thank my wife, Rana Kamal, and daughters, Sarah Sufian Badar and Aisha Sufian Badar, for their patience, assistance, support, and faith in me. Rana's tolerance of my occasional lows is a testament in and of itself to her unyielding devotion and love. I am also thankful to my mother-in-law, Fakhrunnisa Kamal, Dr. Azfar Kamal, Dr. Farheen Asaf Kamal, Feraz Kamal, Farha Kamal, Shaheen Kamal, and Mohd Salahuddin. I will forever be thankful to my sister (Shahnaz Badar), my brothers (M. Rizwan Badar, M. Hassan Badar, Dr. M. Affan Badar, and Dr. M. Rehan Badar), my sisters-in-law (Sayyada Shaheen, Mahtab, Dr. Sadia Saba, and Ishrat Raza), brother-in-law (late Professor Md. Zaki Ahmed), nieces (Dr. Aafia Tasneem, Dr. Labeebah Nesa, and Aalia Tasneem), and son-in-law (Mansoor Alam) for their support and valuable prayers.
Contents
Basics of Modern Computer Systems (p. 1)
Hussam Bin Mehare, Jishnu Pillai Anilkumar, Ekbal Ahmed, and Ali Al Qahtani
The Python Programming Language (p. 27)
Hussam Bin Mehare, Jishnu Pillai Anilkumar, and Naushad Ahmad Usmani
Basic Mathematics (p. 61)
Hussam Bin Mehare, Jishnu Pillai Anilkumar, and Iqbal Hasan
Introduction to the World of Bioinformatics (p. 105)
Sarbani Mishra, Sudiptee Das, Madhusmita Rout, Sanghamitra Pati, Ravindra Kumar, and Budheswar Dehury
Introduction to Artificial Intelligence & ML (p. 127)
Sarath Panat and Ravindra Kumar
Fundamentals of Machine Learning (p. 147)
Joel Mathew Cherian and Ravindra Kumar
Applications in the Field of Bioinformatics (p. 175)
M. Parvez and Tahira Khan
Future Prospects (p. 189)
Hussam Bin Mehare, Jishnu Pillai Anilkumar, and Mohammad “Sufian” Badar
Case Study 1: Human Emotion Detection (p. 221)
Jishnu Pillai Anilkumar, Hussam Bin Mehare, and Mohammad “Sufian” Badar
Case Study 2: Brain Tumor Classification (p. 243)
Jishnu Pillai Anilkumar, Hussam Bin Mehare, and Mohammad “Sufian” Badar
Further Reading (p. 257)
Index (p. 261)
Basics of Modern Computer Systems
Hussam Bin Mehare, Jishnu Pillai Anilkumar, Ekbal Ahmed, and Ali Al Qahtani
1 Computer Generations
1.1 First Generation: Vacuum Tubes
In this generation of computers, programs and data were stored in memory in the same manner as they are today. Programs were written in assembly language and then translated into machine language for execution. Logic functions were built with vacuum tubes, and basic arithmetic operations took a few milliseconds. Mercury delay-line memory was used at first. Typewriter-like devices handled input/output (I/O). Magnetic core memory and magnetic tape storage technologies were also developed. The UNIVAC and ENIAC computers are examples of first-generation computing technology [1, 3].
H. B. Mehare (*)
Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
J. P. Anilkumar
Department of Computer Science & Engineering, Presidency University, Bengaluru, Karnataka, India
E. Ahmed
Data Center Engineer at Diwan of Royal Court, Muscat, Sultanate of Oman
A. A. Qahtani
Agricultural and Technical State University, North Carolina, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_1
1.2 Second Generation: Transistors
The transistor, invented at AT&T Bell Laboratories, was a significant improvement over the vacuum tube, allowing computers to be smaller, faster, cheaper, more energy-efficient, and more reliable than their predecessors. Magnetic core memory and magnetic drum storage were widely used in the second generation, and magnetic disk storage was developed during this period. The first high-level languages, such as Fortran, were developed, making application programs much easier to write. Compilers were developed to translate these high-level language programs into assembly code, which was then translated into executable machine code [2, 3].
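Python is not compiled to native machine code the way Fortran is, but its standard dis module gives a hands-on feel for the same idea: one high-level statement is translated into a sequence of lower-level instructions. The sketch below is only an analogy, assuming the standard CPython interpreter; the instructions it lists are CPython bytecode rather than processor machine code, and their exact names differ between Python versions.

```python
import dis

# A one-line "high-level program"
source = "total = price * quantity + tax"

# Translate the source text into a CPython code object containing bytecode
code = compile(source, "<example>", "exec")

# List the lower-level instructions the interpreter will execute, one per line
for instr in dis.get_instructions(code):
    print(instr.opname, instr.argrepr)
```

Running it shows separate instructions for loading each name, performing the multiplication and addition, and storing the result, which is the kind of step-by-step translation a compiler performs.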
1.3 Third Generation: Integrated Circuits
Texas Instruments and Fairchild Semiconductor pioneered integrated circuit technology, which made it possible to fabricate many transistors on a single silicon chip. This led to faster and less costly CPUs and memory units. Integrated circuit memory increasingly replaced magnetic core memory. Other developments included microprogramming, parallelism, and pipelining. Operating system software allowed several user programs to share a computer system efficiently. Cache memory and virtual memory were introduced: cache memory makes main memory appear faster than it is, whereas virtual memory makes it appear larger. The third-generation market was dominated by IBM’s System/360 mainframe computers and Digital Equipment Corporation’s PDP minicomputer series [1–3].
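Virtual memory makes memory appear larger by letting programs use virtual addresses, which are mapped onto physical memory one fixed-size page at a time. The sketch below shows only this address-translation step. The 4 KiB page size and the contents of the page_table dictionary are illustrative assumptions; real systems add translation lookaside buffers (TLBs), multi-level page tables, and protection bits.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (4 KiB)

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 5, 1: 2, 2: 9}

def translate(virtual_address: int) -> int:
    """Map a virtual address to a physical address using the page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # In a real operating system this is a page fault: the page is loaded from disk
        raise LookupError(f"page fault: virtual page {page_number} is not in memory")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset

print(translate(4100))  # virtual page 1, offset 4 -> frame 2 -> 8196
```

The offset within a page is kept unchanged; only the page number is replaced by a frame number, which is why programs can be given an address space larger than the physical memory actually present.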
1.4 Fourth Generation: Microprocessors
The term very large-scale integration (VLSI) originated in the early 1970s, when integrated circuit manufacturing methods had advanced to the point where thousands of transistors could be packed onto a single chip. A complete processor housed on a single chip is known as a microprocessor. Companies including Intel, National Semiconductor, Motorola, Texas Instruments, and Advanced Micro Devices adopted this technology. With modern VLSI technology, several processors (cores) and cache memory can be combined on a single chip. Graphical user interfaces (GUIs), the mouse, and portable devices were also developed for fourth-generation computers [1–3]. To meet the needs of particular applications, especially in embedded computer systems, system developers can design and implement processor, memory, and input/output (I/O) circuits on a single chip using field-programmable gate arrays (FPGAs), a kind of VLSI technology. Companies such as Altera and Xilinx offer this technology together with the necessary software development infrastructure. As the fourth generation matured, the development of organizational concepts such as parallelism and hierarchical memory made today’s high-performance computing systems possible.
2 Attributes of a Modern-Day Computer
2.1 Speed and Accuracy
Modern computers can perform millions of computations per second while processing data. For instance, weather forecasting requires processing a sizable amount of temperature, pressure, and humidity data from various regions within a few seconds. Another example is calculating and generating pay slips for a company’s employees. Computers are also highly accurate: for instance, a computer can compute the quotient of any two integers to ten decimal places [2].
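The division example can be tried directly in Python. The sketch below uses the standard decimal module so that the ten decimal places are computed exactly and then rounded, rather than relying on ordinary binary floating point. The function name divide_to_places is hypothetical, and the internal precision of 50 significant digits is an assumption that is ample for ordinary-sized integers.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

def divide_to_places(a: int, b: int, places: int = 10) -> Decimal:
    """Divide two integers and round the quotient to a fixed number of decimal places."""
    with localcontext() as ctx:
        ctx.prec = 50  # significant digits used internally (assumed to be enough here)
        quotient = Decimal(a) / Decimal(b)
        step = Decimal(1).scaleb(-places)  # 1E-10 when places == 10
        return quotient.quantize(step, rounding=ROUND_HALF_UP)

print(divide_to_places(22, 7))  # 3.1428571429
print(divide_to_places(1, 3))   # 0.3333333333
print(22 / 7)                   # 3.142857142857143 (ordinary binary floating point)
```

Ordinary floats carry roughly 15 to 17 significant digits, which is enough for ten decimal places of small quotients; the decimal module is useful when exact decimal rounding matters, as in financial calculations such as pay slips.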
2.2 Diligence
A computer does not become tired or worn out when utilized for prolonged periods of time. It can always do complicated calculations quickly and effectively [1].
2.3 Storage Capability and Versatility
Enormous amounts of data and information can be stored and retrieved as needed. Only a limited amount of data can be held in main memory at any given time, but massive volumes of data can be stored permanently on secondary storage devices. Computers can also readily handle many jobs: in one minute, a person can use a computer to compose a letter, and in the next, play music or print a document. On the other hand, computers are limited to the tasks they have been programmed to perform [3].
3 Computer Hardware Basics
A modern-day computer consists of various parts, which are discussed briefly below [4–6]:
1. Computer case/chassis: The enclosure that contains the main components of a computer.
2. Motherboard/mainboard/system board: The main printed circuit board, which carries the system buses. All CPUs, memory modules, plug-in cards, daughterboards, and peripheral devices are connected to sockets on the motherboard.
3. CPU/processor: Often called the computer’s brain, it performs arithmetic and logical operations on data. It consists of an arithmetic and logic unit (ALU), a control unit, registers, and buses.
4. Memory/RAM: Operating system software, programs, and data are typically transferred into RAM from a disk drive so that the central processing unit (CPU) can operate on them quickly and directly.
5. Hard disk drive: A storage device that saves data permanently on rotating magnetic surfaces.
6. Optical disk drive (ODD): A storage device that stores or retrieves data using laser light. CD and DVD drives are common examples.
7. Sound card/audio adapter: An internal expansion card that enables computer programs to control the input and output of audio signals.
8. LAN/Ethernet adapter: Provides a standardized method for linking computers to form a network.
9. MODEM (modulator-demodulator): A device that allows two computers to communicate over a telephone line. It converts the computer’s digital signals into analog signals for transmission and converts received analog signals back into digital signals.
10. TV/FM card: Hardware that allows analog or digital broadcast television signals to be received and converted for display by a computer.
11. Memory card/flash memory card: A small storage medium that stores data such as text, pictures, audio, and video on small, portable, or remote computing devices.
12. RJ45 connector: Registered Jack-45, an eight-wire connector used to connect computers on local area networks (LANs), especially Ethernet networks.
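The MODEM entry above describes turning digital signals into analog ones and back. One simple way to do this is frequency-shift keying (FSK), in which each bit is sent as a short tone at one of two frequencies. The pure-Python sketch below simulates that idea with sampled sine waves. The sample rate, bit duration, and tone frequencies are illustrative assumptions and do not correspond to any particular modem standard.

```python
import math

SAMPLE_RATE = 8000           # samples per second (assumed)
SAMPLES_PER_BIT = 80         # 10 ms per bit, i.e. 100 bits per second (assumed)
FREQ_0, FREQ_1 = 1000, 2000  # tone frequencies in Hz for bit 0 and bit 1 (assumed)

def modulate(bits):
    """Turn a list of 0/1 bits into 'analog' samples: one tone per bit (FSK)."""
    samples = []
    for bit in bits:
        freq = FREQ_1 if bit else FREQ_0
        for n in range(SAMPLES_PER_BIT):
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples

def tone_energy(chunk, freq):
    """Measure how strongly one frequency is present in a chunk of samples."""
    s = sum(x * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n, x in enumerate(chunk))
    c = sum(x * math.cos(2 * math.pi * freq * n / SAMPLE_RATE) for n, x in enumerate(chunk))
    return s * s + c * c

def demodulate(samples):
    """Recover bits by checking which tone dominates each bit period."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        bits.append(1 if tone_energy(chunk, FREQ_1) > tone_energy(chunk, FREQ_0) else 0)
    return bits

message = [1, 0, 1, 1, 0, 0, 1, 0]
print(demodulate(modulate(message)) == message)  # True
```

Each bit period contains a whole number of cycles of both tones, so the two frequencies do not interfere when the receiver measures them; real modems must also cope with noise, timing drift, and much higher data rates.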
3.1 Central Processing Unit
The majority of CPU chips are divided into four sections [7, 8]:
1. ALU: The ALU gives the computer the ability to perform arithmetic and logical operations. It is described further under Functional Units.
2. Registers: Temporary data storage spaces that keep track of instructions and hold the addresses and results of operations.
3. Control section: It times and monitors the performance of the whole computer system, using its instruction decoder to interpret the bit patterns in a specific register and translate them into operations such as addition and comparison. It also uses its interrupt input to regulate how much CPU time is allocated to each task and to specify the order in which different processes use the CPU.
4. Internal bus: A network of communication lines that links the processor’s internal components and leads to external connections that connect it to other parts of the computer.
The main functions of the microprocessor (CPU chip) include the following [8, 9]:
• Control primary storage (i.e., main memory) in storing data and instructions.
• Control the sequence of operations.
• Give commands to all parts of the computer system.
• Carry out processing.
3.2 Instruction Cycle
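The processor carries out each instruction in a repeating cycle [3, 4]:
Fetch: Get the instruction from memory into the processor.
Decode: Determine internally what operation the instruction specifies (for example, an addition).
Execute: Perform the operation, for example by taking the values from the registers and adding them together.
Store: Store the result back into another register.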
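The fetch, decode, execute, and store steps can be seen in a toy processor simulated in Python. The instruction set below (LOAD, ADD, HALT), the register names, and the function run are hypothetical and chosen only for illustration; real processors decode binary instruction words rather than Python tuples.

```python
# Hypothetical toy instruction set: each instruction is (opcode, operands...)
program = [
    ("LOAD", "R1", 7),          # R1 <- 7
    ("LOAD", "R2", 5),          # R2 <- 5
    ("ADD", "R3", "R1", "R2"),  # R3 <- R1 + R2
    ("HALT",),
]

def run(program):
    registers = {"R1": 0, "R2": 0, "R3": 0}
    pc = 0  # program counter: address of the next instruction
    while True:
        # Fetch: read the instruction at the address held in the program counter
        instruction = program[pc]
        pc += 1
        # Decode: split the instruction into its operation and operands
        opcode, *operands = instruction
        if opcode == "LOAD":
            register, value = operands
            registers[register] = value                     # store the loaded value
        elif opcode == "ADD":
            destination, source_a, source_b = operands
            result = registers[source_a] + registers[source_b]  # execute
            registers[destination] = result                     # store
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")

print(run(program))  # {'R1': 7, 'R2': 5, 'R3': 12}
```

The program counter plays the role of the address sent to memory during the fetch step; incrementing it after each fetch is what makes the processor step through the program in order.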
3.3 Functional Units
3.3.1 Input Unit
An input device is any part of an information processing system, such as a computer or other information appliance, that supplies it with data and control signals. Through input devices, computers receive coded information and convert it from a human-readable to a computer-readable format [5]. The most common input device is the keyboard: when a key is pressed, the corresponding letter or digit is instantly converted to binary code and sent to the processor. Other input devices for human-computer interfaces include the trackball, joystick, mouse, and touchpad, which are widely used as graphic input devices in combination with displays. Audio can be captured by microphones and then sampled and converted to digital codes for processing and storage. Cameras can be used to record video input. A computer may also receive data from databases and other computers across digital communication networks such as the Internet [3, 10].
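The sketch below, in Python, illustrates both conversions mentioned above: a typed character becomes a binary code, and a sound wave is sampled and each sample converted to a digital code. The character codes are the standard ASCII/Unicode values returned by ord; the 440 Hz tone, 8,000 Hz sample rate, 8-bit resolution, and the function name sample_and_quantize are illustrative assumptions.

```python
import math

# Keyboard input: each character maps to a numeric code (its ASCII/Unicode value)
for char in "Hi":
    code = ord(char)
    print(char, code, format(code, "08b"))
# H 72 01001000
# i 105 01101001

def sample_and_quantize(freq_hz, sample_rate, n_samples, bits=8):
    """Sample a sine 'sound' and convert each sample to an unsigned integer code."""
    levels = 2 ** bits
    codes = []
    for n in range(n_samples):
        amplitude = math.sin(2 * math.pi * freq_hz * n / sample_rate)  # value in [-1, 1]
        # Map the range [-1, 1] onto the integer codes 0 .. levels - 1
        codes.append(round((amplitude + 1) / 2 * (levels - 1)))
    return codes

print(sample_and_quantize(440, 8000, 8))
```

A real sound card performs the same two steps in hardware with an analog-to-digital converter, typically at higher rates and resolutions (for example, 16-bit samples).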
3.3.2 Memory Unit
The memory unit holds programs and data. It is built from silicon chips containing transistors and capacitors. A capacitor (or condenser) is an electronic component that stores electrical energy (charge) between two plates of electrically conducting material separated by a dielectric insulator. Such components are fabricated as integrated circuits through semiconductor device manufacturing [9, 10]. Storage falls into the following classes:
1. Registers: All data must first be placed in a register, a special high-speed storage location within the CPU, before it can be processed. Instead of the data itself, a register may contain the address of the memory location where the data is kept. The number and size (number of bits) of the registers affect a CPU’s performance and speed.
2. Primary memory: Also known as main memory, this fast memory operates at electronic speeds. Programs must be kept in this memory while they are running. Each semiconductor storage cell in the memory can hold one bit of data [7]. The cells are handled in fixed-size groups called words, and a single word can be stored in or retrieved from memory in one operation. The number of bits in a word is called the word length, which is commonly 16, 32, or 64 bits. Each word location in memory has a unique address for easy access; addresses are consecutive integers starting from 0. To store or retrieve a word, the processor supplies its address together with a control command specifying the operation [6]. Under the control of the CPU, instructions and data can be written to or read from memory. In a random access memory (RAM), any location can be reached in a short, fixed amount of time once its address is specified. The time needed to access one word is called the memory access time, and it does not depend on the location of the word being accessed. The access time of modern RAM units ranges from a few nanoseconds (ns) to about 100 ns.
Static RAM (SRAM) uses transistors to store data statically; it runs at speeds comparable to the processor and does not need to be refreshed. Dynamic RAM (DRAM) is built from transistors and capacitors; the data is stored in the capacitors, which leak charge and must therefore be refreshed periodically. On Macintosh computers, parameter RAM (PRAM) holds internal data that must be retained after the computer shuts down, such as the date, time, and other configuration data. Pseudo-static RAM (PSRAM) is designed particularly for portable computers. On video adapter cards, video RAM (VRAM) is used to improve the visual display [8] (Table 1).

Table 1 Memory types and usages
Name | Usage
SRAM (static RAM) | Cache memory and PCMCIA memory cards
DRAM (dynamic RAM) | Personal computers
PRAM (parameter RAM) | The equivalent of CMOS on a Macintosh computer
PSRAM (pseudo-static RAM) | Notebooks and portable PCs
VRAM (video RAM) | Frame buffer for video and color graphics support

Read-only memory (ROM) is persistent storage: it retains its contents whether the power is on or off, and the data stored on ROM devices cannot be changed.
3. Cache memory: The cache is a smaller but faster RAM unit that supplements main memory. It holds the parts of a program that are currently executing, together with their data. The cache is often on the same integrated circuit chip as the CPU and closely connected to it. Its purpose is to speed up the execution of instructions. All program instructions and their data are kept in main memory. As execution proceeds, instructions are fetched into the processor chip, and a copy of each is kept in the cache. Similarly, when an instruction requests data from main memory, the data is fetched and a copy is stored in the cache. Suppose that several instructions are executed repeatedly, as in a program loop. If these instructions are held in the cache, they can be fetched quickly each time they are used. Likewise, if the same data locations are accessed repeatedly and copies of their contents are held in the cache, those copies can be retrieved quickly. A disk cache is a portion of main memory, or extra memory added to the hard drive, that speeds up the transfer of data and programs from the hard drive to RAM; large blocks of frequently accessed data are held on the disk controller board. There are two types of cache memory:
• An internal cache, also called a primary cache, is located inside the CPU chip.
• An external cache, also called a secondary cache, is located on the motherboard.
Level 1 cache: Level 1 (L1) cache, also known as internal cache, is the cache memory nearest to the CPU and is located on the processor chip.
Level 2 cache: The second level of cache is called level 2 (L2) cache.
L2 cache is sometimes confused with external cache; however, L2 cache may also be located on the CPU chip. A level 3 (L3) cache, if present, sits between the L2 cache and main memory (RAM).
Static RAM (SRAM) is used in cache memory and in PCMCIA (Personal Computer Memory Card International Association) memory cards. Most CPUs include several separate caches, including instruction and data caches, with the data cache often organized as a hierarchy of cache levels (L1, L2, L3).
4. Secondary storage: Although main memory is essential, it is expensive and does not retain information when power is switched off (i.e., primary memory is volatile). When large volumes of data and many programs must be saved, especially information that is accessed infrequently, additional, less expensive, permanent secondary storage is used. Secondary storage has slower access times than primary memory.
• A hard drive (HD), also called a hard disk drive or disk drive, stores and provides relatively quick access to large amounts of data on one or more magnetically coated surfaces.
• A solid-state drive/disk (SSD) is a data storage device that stores data persistently in solid-state memory. Because it contains no moving parts, an SSD eliminates the seek time, latency, and other electromechanical drawbacks of traditional hard disk drives.
• An optical disk drive (ODD) uses laser light to read data from or write data to optical discs.
• A flash disk is a storage module consisting of flash memory chips, with no mechanical platters or access arms. It is called a “disk” because its storage structure is emulated so that data is accessed as if it were on a hard drive.
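The benefit of caching instructions that are executed repeatedly in a loop, described above, can be seen in a small simulation. The sketch below models a cache as a fixed number of slots with least-recently-used (LRU) replacement. Real hardware caches are organized in lines and sets and use a variety of replacement policies, so the class name LRUCache, its capacity, the memory contents, and the address trace are all hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """A tiny cache model holding a few memory words and evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> cached copy of the memory word
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
        else:
            self.misses += 1
            self.lines[address] = main_memory[address]  # copy from (slower) main memory
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used entry
        return self.lines[address]

main_memory = list(range(100, 200))  # hypothetical contents of 100 memory words
cache = LRUCache(capacity=4)

# A loop body of three instructions at addresses 10, 11, and 12, executed five times
for _ in range(5):
    for address in (10, 11, 12):
        cache.read(address, main_memory)

print(cache.hits, cache.misses)  # 12 3
```

Only the first pass through the loop has to go to main memory; the remaining twelve reads are served from the cache, which is why loops run much faster when their instructions fit in the cache.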
3.4 Memory Hierarchy in Computer Systems (Fig. 1)
3.4.1 Arithmetic and Logic Unit
Most computer activities, notably arithmetic or logic operations such as addition, subtraction, multiplication, division, or number comparison, are initiated by bringing the necessary operands into the processor, where the ALU works. When operands are fed into the processor, they are stored in registers, which are high-speed storage components that can hold one word of data.
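Because a register holds a fixed number of bits, an ALU result must fit in one word, and processors typically record status flags, such as whether the result was zero or whether a carry occurred. The sketch below models this with an assumed 8-bit word. The function name alu, the operation names, and the two-flag set are illustrative assumptions, not those of any specific processor. Two numbers can be compared by subtracting them and checking the zero flag.

```python
WORD_BITS = 8                # assumed word size for this example
MASK = (1 << WORD_BITS) - 1  # 0xFF: keeps results within one word

def alu(operation, a, b):
    """Model an ALU on unsigned 8-bit words; returns (result, flags)."""
    if operation == "ADD":
        full = a + b
    elif operation == "SUB":
        full = a - b
    elif operation == "AND":
        full = a & b
    elif operation == "OR":
        full = a | b
    else:
        raise ValueError(f"unsupported operation: {operation}")
    result = full & MASK  # the destination register holds only WORD_BITS bits
    flags = {
        "zero": result == 0,               # result is zero
        "carry": full > MASK or full < 0,  # carry out (ADD) or borrow (SUB)
    }
    return result, flags

print(alu("ADD", 200, 100))  # (44, {'zero': False, 'carry': True})
print(alu("SUB", 9, 9))      # (0, {'zero': True, 'carry': False})
```

The first call shows an overflow: 300 does not fit in 8 bits, so only the low 8 bits (44) remain and the carry flag is set.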
Fig. 1 Memory hierarchy in computer systems
3.4.2 Output Unit
The job of the output unit is to send processed results to the outside world. Some units, such as graphic displays, combine an output function, which shows text and graphics, with an input function provided through a touchscreen. Because of this dual function, such units are often referred to by the single name input/output (I/O) unit.
3.4.3 Control Unit
The control unit acts as a nerve center, sending commands to the other units and monitoring their state. I/O transfers, which consist of input and output operations, are controlled by program instructions that identify the devices involved and the information to be transferred. Control circuits generate the timing signals that govern these transfers and determine when a given action takes place. Through timing signals, the control unit also supervises data transfers between the CPU and memory. A computer’s operation can be summarized as follows:
• The computer accepts information in the form of programs and data through an input unit and stores it in memory.
• Information stored in memory is fetched, under program control, into the arithmetic and logic unit, where it is processed.
• Processed information leaves the computer through the output unit.
• The control unit directs all activities in the computer.
3.5 Hardware Connections
To work, hardware makes extensive use of connections that allow components to interact and communicate. A bus is a shared communication system, made up of wires or circuitry, that organizes and carries data between internal components [6, 9]. A serial connection is a cable or group of wires that sends data from the CPU to an external device such as a mouse, keyboard, modem, scanner, or printer. Because it transfers only one bit at a time, this type of connection is relatively slow, but it has the advantage of providing reliable connectivity over long distances. A parallel connection uses multiple wires to send several bits of a data block at the same time. This is the connection traditionally used by most scanners and printers. A parallel link is significantly faster than a serial connection, but it is limited to distances of less than 3 m (10 ft) between the CPU and the external device.
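The difference between the two kinds of link can be made concrete by counting transfers. The sketch below frames each byte the way a common asynchronous serial format does: one start bit, eight data bits sent least significant bit first, and one stop bit (often written 8N1). Treat this framing as an illustrative assumption, since serial links use various formats; the function names serial_frame and serial_unframe are hypothetical. A parallel link with eight data lines could instead move each byte in a single transfer.

```python
def serial_frame(byte):
    """Frame one byte for serial transmission: start bit, 8 data bits (LSB first), stop bit."""
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]

def serial_unframe(frame):
    """Recover the byte from a 10-bit frame, checking the start and stop bits."""
    if frame[0] != 0 or frame[-1] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))

message = b"OK"
line = [bit for byte in message for bit in serial_frame(byte)]  # bits sent one at a time
print(len(line))  # 20 bit times on a serial link; an 8-bit parallel link needs 2 transfers
received = bytes(serial_unframe(line[i:i + 10]) for i in range(0, len(line), 10))
print(received)  # b'OK'
```

The start and stop bits let the receiver find where each byte begins and ends without a shared clock wire, which is part of why serial links work well over longer cables.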
3.6 Computer Networking
A computer network is any group of autonomous computers that exchange information over a common communication medium. Networks are classified as follows.
3.6.1 Local Area Network (LAN)
A LAN is usually confined to a limited geographical area, such as a building or group of buildings, and may be quite small. The development of common networking protocols and media has led to the widespread use of LANs in businesses and educational institutions worldwide.
3.6.2 Metropolitan Area Network (MAN)
A MAN is a type of WAN that interconnects LANs and computers within a specific geographical area, such as a city. The distance covered is between 1 and 10 km.
3.6.3 Campus Area Network (CAN)
A CAN is a variation of a LAN that extends to computers in buildings close to one another, such as an office park, the buildings of a college or manufacturing company, or another campus setting.
3.6.4 Wide Area Network (WAN)
Wide area networking joins several LANs that are physically distant. The LANs are connected using dedicated leased lines such as T1 or T3, dial-up phone lines (both synchronous and asynchronous), satellite links, and data packet carrier services. A WAN can be as simple as a modem and a remote access server that employees dial into, or as complex as hundreds of branch offices linked globally. Special routing techniques and filters reduce the cost of transmitting data over long distances.
3.6.5 Wireless Local Area Network (WLAN)
WLANs provide users with mobility, flexibility, and enhanced productivity by allowing them to connect to a local area network without using a cable.
3.6.6 Intranet: A Secure Internet-Like Network for Organizations
An intranet is a private network that uses Internet technologies, such as web servers and browsers, but is accessible only within the organization that owns it.
3.6.7
Extranet: A Secure Means for Sharing Information with Partners
Businesses often use extranets to share data securely with their business partners, whereas intranets are used to share confidential information within a company. Encryption and secure authentication techniques safeguard the data and ensure that only those with suitable access credentials can see it.
3.6.8
Networking Relationship
1. A server-based (client/server) network is a network of linked computers and peripherals with a centralized server that allows network data, software, and hardware resources to be shared. A client/server network usually has a central administrator who manages network rights and access. This model is used for most LANs, as well as for practically all WANs and other network types that link via a WAN.
2. A peer-to-peer network connects two or more computers directly for the purpose of sharing data and hardware resources. A peer-to-peer network has no central administrator and is generally practical only for small LANs of about ten PCs or fewer.
3.7
H. B. Mehare et al.
Topologies
The topology of a network describes the basic structure, layout, and technologies used to support the network [11–13].
3.7.1
Bus Topology
In a bus topology, nodes share a central backbone cable that spans the network; in Ethernet installations, nodes are often linked to hubs or switches, which in turn are linked to the backbone. The bus topology has traditionally been employed for Ethernet networks.
3.7.2
Ring (Token Ring) Topology
In the ring/token architecture, the primary network cable is arranged as a loop or ring, and workstations are connected to it at various points around the ring. PCs on a token ring network are linked through devices known as multistation access units (MAUs).
3.7.3
Star Topology
In a star architecture, each workstation has its own direct connection to a central device. The star architecture used by ARCNet is still used today with Ethernet and token ring networks to cluster workstations around hubs, which are then connected to the principal network cable.
Aside from these, a hybrid topology that combines two or more of the topologies above can be constructed (Fig. 2).
Fig. 2 Types of topology (a) Bus topology (b) Ring topology (c) Star topology
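A network topology is essentially a graph, so one way to compare the layouts in Fig. 2 is to count the links a message must cross. The following Python sketch models ring and star topologies as adjacency sets (workstations numbered from 0, with a hypothetical node named "hub" at the center of the star) and uses breadth-first search to find the fewest hops between two nodes. A bus is left out because all of its nodes share one medium rather than point-to-point links.

```python
from collections import deque


def ring(n):
    """Ring: node i is cabled to its two neighbours, i - 1 and i + 1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def star(n):
    """Star: workstations 0..n-1 each have one cable to a central 'hub' node."""
    graph = {i: {"hub"} for i in range(n)}
    graph["hub"] = set(range(n))
    return graph


def hops(graph, src, dst):
    """Breadth-first search: fewest links a message crosses from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable


print(hops(ring(6), 0, 3))  # 3: halfway around the ring
print(hops(star(6), 0, 3))  # 2: workstation -> hub -> workstation
```

The star keeps every path short but depends entirely on the hub; deleting `graph["hub"]` would disconnect all workstations, whereas a ring survives one broken link by routing the other way.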
4 Operating Systems
An operating system serves as a bridge between the computer user and the computer hardware. Its objective is to provide an environment in which a user can run applications in a convenient, productive, timely, and efficient way [1, 12, 15].
Fig. 3 Types of operating systems
A computer system consists of:
1. Hardware.
2. System programs.
In serving as an intermediary between the users of computer services and the computer’s resources, the operating system provides three basic types of services [15]:
1. It accepts and processes commands and requests from the user and the user’s programs and presents relevant output results.
2. It manages, loads, and executes programs.
3. It manages the computer’s hardware resources, including the interfaces to networks and other external parts of the system.
Typically, an operating system provides most or all of the following capabilities:
• The operating system provides interfaces for the user and the user’s programs.
• It provides file system access and file support services.
• It provides I/O support services that every program can use.
• It provides a means for starting the computer, known as bootstrapping or initial program load (IPL).
• It handles all interrupt processing, including error handling and recovery, I/O, and other routine interrupts.
• It provides services for networking. Most modern systems also offer services to support symmetric multiprocessing, clustering, and distributed processing.
• It provides services that allocate resources, including memory, I/O devices, and CPU time, to programs as they need them.
• It provides security and protection services: program and file control services that protect users’ programs and files and make communication between programs possible.
• It provides information and tools that the (human) system administrator can use to control, tailor, and tune the system for appropriate behavior and optimum performance.
4.1
Services and Facilities
There are 10 key blocks to consider, not all of which will be included in every operating system [14]:
• The command processor, application program interface, and user interface.
• The file management system.
• The input/output control system.
• Process control management and interprocess communication.
• Memory management.
• Scheduling and dispatching.
• Secondary storage management.
• Network management, communication support, and communication interfaces.
• System protection management and security.
• Support for system administration.
Other system operations, such as accounting and error handling, are occasionally handled as independent blocks but are more usually found inside the mentioned blocks. In various operating systems, these components may be blended or completely omitted.
4.2
Modes of CPU Operation
4.2.1
User Mode
The CPU is in user mode when it executes a user application. When a program requests a service from the operating system through a system call, or when an interrupt occurs, the CPU switches from user mode to kernel mode.
4.2.2
Kernel Mode
Certain instructions, known as privileged instructions, can be executed only when the CPU is in kernel mode. These instructions perform actions that, if a user program could trigger them, might interfere with the operating system or with another user application. When the system boots, it first operates in kernel mode while it loads the operating system, and it then launches applications in user mode.
4.3
Structure of an OS
The structure of an operating system is mostly determined by how its standard components are interrelated and integrated into the kernel. On this basis, there are the following operating system structures:
1. Simple structure.
2. Layered structure.
3. Micro-kernel.
4. Modular structure/approach.
Note:
• If a PC has Internet connectivity, it may be able to download its applications, including the operating system, from another computer on the network, such as a cloud server. This has given rise to the diskless workstation: a personal computer that, once started, stores and accesses data and applications solely over the network.
• Events can be caused by interrupts or by service requests from a program or a user.
• Computer designers strive to combine computer hardware and operating systems so that each supports the capabilities of the other, resulting in a robust environment for users and their programs. Such a setting is said to be symbiotic.
4.4
Input/Output Services
I/O device driver programs in the operating system receive I/O requests and perform the actual data transfers between the hardware and designated memory regions. In addition to the operating system’s I/O device drivers, current systems keep dedicated I/O drivers with minimal functionality in ROM to guarantee access to crucial devices such as the keyboard, display, and boot disk during system startup. Providing a single set of I/O services for each device ensures that applications do not conflict over the device and that its use is governed by a single point of control.
4.5
Process Control Management
When a program is admitted to the system, it is given memory space and the resources it needs to get started. As the process (the executing program) runs, it may request more resources or release resources that it no longer requires. Processes must be synchronized so that processes sharing a common resource do not step on one another’s toes by changing essential data or denying each other access to resources they require. Systems also allow distinct processes to communicate with one another by passing messages back and forth via interprocess messaging services. Process control management keeps track of each process in memory and records whether it is running, ready to run, or waiting for an event, such as the completion of I/O. In most contemporary systems, a process is further subdivided into smaller units known as threads. A thread is a section of a process that can be scheduled and run independently; it shares memory and other resources with all other threads in the same process.
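Because threads of one process share its memory, access to shared data must be synchronized. The following Python sketch, using the standard `threading` module, starts four threads that all increment one shared counter; a `Lock` ensures that only one thread updates the counter at a time. The thread count and number of increments are arbitrary illustrative values. In CPython, the global interpreter lock alone does not make `counter += 1` safe, because that statement compiles into several separate bytecode steps.

```python
import threading

counter = 0                      # shared data living in the process's memory
counter_lock = threading.Lock()  # guards every update to `counter`


def worker(increments: int) -> None:
    """Each thread repeatedly updates the same shared variable."""
    global counter
    for _ in range(increments):
        with counter_lock:  # only one thread may hold the lock at a time
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(counter)  # 40000
```

Without the lock, two threads could read the same old value and both write back the same result, silently losing an update.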
4.6
Memory Management
The purpose of the memory management system is to load programs and program data into memory so that each program has enough memory to operate. For multitasking to occur, multiple programs occupy memory at the same time, each in its own memory area. The memory management system has three primary functions:
1. It keeps records that identify each program placed in memory, the space it uses, and the space still available. It makes more space available to running applications as needed, and it stops programs from reading or writing memory beyond their assigned space, preventing them from inadvertently or purposefully harming other processes.
2. If necessary, it keeps one or more queues of programs waiting to be loaded into memory when space becomes available, ordered by characteristics such as priority and memory needs. When memory becomes available, it allocates it to the programs to be loaded next. This situation is uncommon in current computer systems.
3. When a program completes execution, it deallocates the program’s memory space so that other applications can use it.
Virtually every contemporary computer system provides virtual storage, using hardware support for advanced memory management. Virtual storage creates the illusion of a memory space much larger than the actual amount of physical memory in the computer system. Where virtual storage is available, the operating system’s memory management module works directly with the hardware and provides the tools needed to build an integrated memory management system that makes the most of virtual storage.
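The record keeping described above (which program occupies which space, what is still free, and reclaiming space when a program ends) can be illustrated with a first-fit allocator, one classic placement strategy among several (best fit and worst fit are others). This is a simplified, hypothetical model over a flat range of addresses; it is not how a virtual-memory system actually manages pages.

```python
class FirstFitAllocator:
    """Toy memory manager for a fixed-size memory (illustrative only)."""

    def __init__(self, size):
        self.free = [(0, size)]  # free holes as (start, length), sorted by start
        self.allocated = {}      # program name -> (start, length)

    def allocate(self, name, length):
        """Give `name` the first hole that is large enough; return its start or None."""
        for i, (start, hole) in enumerate(self.free):
            if hole >= length:
                self.allocated[name] = (start, length)
                if hole == length:
                    del self.free[i]  # hole used up completely
                else:
                    self.free[i] = (start + length, hole - length)  # shrink the hole
                return start
        return None  # nothing big enough: the program must wait in a queue

    def release(self, name):
        """Deallocate a finished program's block and merge neighbouring holes."""
        start, length = self.allocated.pop(name)
        self.free.append((start, length))
        self.free.sort()
        merged = []
        for hole_start, hole_len in self.free:
            if merged and merged[-1][0] + merged[-1][1] == hole_start:
                prev_start, prev_len = merged[-1]
                merged[-1] = (prev_start, prev_len + hole_len)
            else:
                merged.append((hole_start, hole_len))
        self.free = merged


mem = FirstFitAllocator(100)
print(mem.allocate("A", 30))  # 0
print(mem.allocate("B", 50))  # 30
print(mem.allocate("C", 40))  # None: only 20 units remain free
mem.release("A")
print(mem.allocate("C", 25))  # 0: C reuses the space A gave back
```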
4.7
Scheduling and Dispatch
Scheduling is divided into two levels. The first level, known as high-level scheduling, determines which jobs will be admitted to the system and in what sequence. When a job is admitted, it is queued in some order of priority and eventually allotted memory space and other resources. Some operating systems split this function into two operations, one for admission to the system and the other for memory allocation. In most systems, new processes are admitted and given memory space if it is available; otherwise, they are held until space becomes available and then admitted.
Dispatching, the second level of scheduling, is in charge of selecting which process the CPU runs at any particular time. The dispatcher enables concurrency by allocating CPU time so that numerous processes appear to run simultaneously. In systems that divide processes into threads, dispatching is done at the thread level rather than the process level. Because a single application cannot be allowed to “hog” the computer, the dispatcher must regularly interrupt whatever process is running and run itself to check the state of the machine’s resources and allocate CPU time so that every user and task receives what it requires. The dispatcher is also in charge of transferring control to the process being dispatched. This involves saving the program counter, register values, and other parameters that describe the state of the previously running program at the time it was halted, and restoring the saved state of the program being dispatched. This is known as context switching.
The dispatcher can be:
1. Preemptive: Preemptive multitasking uses the clock interrupt to preempt the executing program and to make a new decision as to which program executes next.
2. Non-preemptive: The dispatcher for a non-preemptive system replaces an executing program only if the program is blocked because of I/O or some other event or if the program voluntarily gives up the CPU. When necessary, the executing program may be suspended momentarily so that the CPU can handle interrupts, but when the interrupting task is complete, control is returned to the same program.
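The difference between the two kinds of dispatcher shows up in when each job finishes. The Python sketch below is a simplified simulation under stated assumptions: three hypothetical jobs A, B, and C all arrive at time 0 with known CPU burst times, there is no I/O, and context-switch overhead is ignored. Round-robin stands in for preemptive dispatching, with the quantum playing the role of the clock interrupt.

```python
from collections import deque


def non_preemptive(jobs):
    """FCFS dispatching: each job keeps the CPU until it finishes.

    jobs: list of (name, burst_time) pairs, all assumed to arrive at time 0.
    Returns {name: completion_time}.
    """
    clock, done = 0, {}
    for name, burst in jobs:
        clock += burst
        done[name] = clock
    return done


def round_robin(jobs, quantum):
    """Preemptive dispatching: a clock interrupt every `quantum` time units
    sends the running job to the back of the ready queue."""
    ready = deque(jobs)
    clock, done = 0, {}
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)
        clock += ran
        if remaining > ran:
            ready.append((name, remaining - ran))  # preempted: requeue the unfinished job
        else:
            done[name] = clock
    return done


jobs = [("A", 5), ("B", 2), ("C", 1)]
print(non_preemptive(jobs))          # {'A': 5, 'B': 7, 'C': 8}
print(round_robin(jobs, quantum=2))  # {'B': 4, 'C': 5, 'A': 8}
```

With preemption, the short jobs B and C finish much sooner, while the long job A finishes at the same time (8) in both cases.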
4.8
Secondary Storage Management
The secondary storage management system uses algorithms to order disk requests for more efficient disk utilization and faster completion of I/O operations. On a busy system, it is common to have many disk requests outstanding at the same moment, and the operating system tries to process them in an order that improves system performance. Several disk scheduling strategies are in use:
1. First-come, first-served (FCFS) scheduling.
2. Shortest distance first scheduling.
3. Scan scheduling.
4. n-Step c-scan scheduling.
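The first three strategies can be compared by the total distance the disk arm travels. The Python sketch below uses a classic textbook request queue (cylinders 98, 183, 37, 122, 14, 124, 65, 67, with the head at cylinder 53 on a 200-cylinder disk) and implements "shortest distance first" as shortest-seek-time-first (SSTF). The SCAN function always sweeps to the end of the disk before reversing; n-step C-SCAN is not implemented here.

```python
def head_movement(start, stops):
    """Total number of cylinders the arm crosses while visiting `stops` in order."""
    total, position = 0, start
    for cylinder in stops:
        total += abs(cylinder - position)
        position = cylinder
    return total


def fcfs(head, requests):
    """First-come, first-served: service requests in arrival order."""
    order = list(requests)
    return order, head_movement(head, order)


def sstf(head, requests):
    """Shortest seek time first: always move to the nearest pending cylinder."""
    pending, order, position = list(requests), [], head
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order, head_movement(head, order)


def scan(head, requests, disk_size, toward_zero=True):
    """Elevator algorithm: sweep to one end of the disk, then reverse direction."""
    below = sorted((c for c in requests if c < head), reverse=True)
    above = sorted(c for c in requests if c >= head)
    if toward_zero:
        order = below + above
        stops = below + [0] + above              # arm runs to cylinder 0 before reversing
    else:
        order = above + below
        stops = above + [disk_size - 1] + below  # arm runs to the last cylinder first
    return order, head_movement(head, stops)


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, queue)[1])                  # 640
print(sstf(53, queue)[1])                  # 236
print(scan(53, queue, disk_size=200)[1])   # 236
```

SSTF can starve requests far from the head when new nearby requests keep arriving; SCAN-style algorithms bound the wait by sweeping the whole disk.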
4.9
Network and Communications Support Services
The operating system’s network and communications support facilities execute the functions necessary to ensure that the system runs smoothly in a networked and distributed environment. TCP/IP allows you to find and connect to other computer systems; transfer application data in packet form from one system to another; access files, I/O devices, and programs from remote systems; provide error checking and correction when necessary; and support the requirements of distributed processing. The network and communication services of the operating system provide the communication software required to implement Wi-Fi, wired Ethernet, and TCP/IP functions and facilities. Most systems also include a comprehensive set of TCP/IP applications and extensions, such as e-mail, remote login, Web services, streaming multimedia, voice over IP telephony (VoIP), secure networking over the Internet (referred to as a virtual private network, or VPN), Bluetooth capability, and so on [13–15].
The operating system’s communications services also serve as an interface between the communication software and the OS I/O control mechanism, allowing access to the network. The I/O control system contains software drivers for modems, network interface cards, wireless communication cards, and other devices that link the computer to the network or networks physically and electrically.
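Application programs reach these network services through the socket interface that the operating system exposes. The Python sketch below uses the standard `socket` and `threading` modules to run a tiny TCP echo exchange entirely on the local machine (127.0.0.1); the message text is arbitrary, and the operating system's TCP/IP stack does all of the actual packet handling.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one TCP connection and send back whatever bytes arrive."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)  # one recv is enough for this tiny local message
        conn.sendall(data)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the operating system pick a free port
server.listen(1)
port = server.getsockname()[1]

thread = threading.Thread(target=echo_once, args=(server,))
thread.start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello over TCP")
    reply = client.recv(1024)

thread.join()
server.close()
print(reply)  # b'hello over TCP'
```

Real network code should loop on `recv` until all expected bytes arrive, because TCP delivers a byte stream, not discrete messages.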
4.10
Security and Protection Services
The operating system often provides interprocess messaging services that let processes interact with one another without compromising the system. Critical components of the operating system run in a specially protected mode of operation built into the CPU architecture, so the operating system can securely prevent applications from executing certain instructions or accessing memory areas it reserves. Each module in the operating system has safeguards for its own assets [11]. The operating system also includes login and password services to help prevent unauthorized users from gaining access to data, as well as access control capabilities that let users protect their files at varying levels of availability to other users and outsiders. Modern operating systems include firewall protection, which makes it more difficult, though not impossible, for outsiders to enter the system. Despite all of the security features of a current system, bugs, viruses, and vulnerabilities within the operating system, poor configuration of firewalls and other security features, and poor user management policies such as weak password enforcement can make a system vulnerable to outside attack [12].
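One idea behind login and password services is that a system need not store passwords themselves, only salted, deliberately slow hashes of them. The Python sketch below illustrates this with the standard library's `hashlib.pbkdf2_hmac` and `hmac.compare_digest`; real operating systems use their own password-hashing schemes, and the iteration count here is an illustrative assumption rather than a recommended setting.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # illustrative work factor; tune for the hardware in use


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values would be stored."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash from the attempt and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored)


salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))  # True
print(check_password("wrong guess", salt, stored))    # False
```

The per-user salt means two users with the same password get different stored hashes, and the high iteration count slows down anyone trying to guess passwords from a stolen hash file.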
5 Files and Directories
5.1
File Management
A file is commonly described as a collection of related data. Internally, a file may be divided into records or be a continuous stream of data. A file is a logical storage unit that is meaningful to the person or program that uses it [1]. The file management system creates and maintains a mapping between a file’s logical storage needs and the physical location where it is stored, identifies and manipulates files by user-supplied names, and keeps track of the available space on each device connected to the system. Users and programs access files by name, and the file management system handles the details [14]. File management systems are necessary where several users share secondary storage devices, because they provide a directory structure that ensures that the same physical storage is not allocated twice. Without this feature, users might accidentally overwrite each other’s files [15].
A path is a string of text that indicates the location of a file or directory. Files may be kept on the user’s computer or at a remote location, and they have the following attributes:
• Name: The symbolic file name is the only information kept in human-readable form.
• Identifier: This unique tag, usually a number, identifies the file within the file system; it is the file’s non-human-readable name.
• Type: This information is needed by systems that support different file types.
• Location: This is a pointer to a device and to the file’s location on that device.
• Size: This attribute holds the file’s current size (in bytes, words, or blocks) and possibly the maximum allowed size.
• Protection: Access control information determines who can read, write, execute, and so on.
• Time, date, and user identification: This information may be kept for creation, last modification, and last use. These records can be helpful for security, protection, and usage monitoring.
The operating system is responsible for the following activities in connection with file management:
• Creating, truncating, and deleting files.
• Writing, reading, and repositioning within a file.
• Creating and deleting directories to organize files.
• Supporting primitives for manipulating files and directories.
• Mapping files onto secondary storage.
• Backing up files on stable (non-volatile) storage media.
The file’s creator defines its contents and structure. A text file is a sequence of characters organized into lines (and possibly pages). A source file is a sequence of functions, each of which is organized into declarations followed by executable statements. An executable file is a series of code sections that the loader can bring into memory and run [16].
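Most of the file attributes and operations listed above are visible from an ordinary program. The Python sketch below uses the standard `pathlib` and `tempfile` modules to create a scratch file, read its size from the file system, reposition within it, read, truncate, and delete it. The file name and contents are arbitrary examples.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    path = Path(workdir) / "sample.txt"   # the path: containing directory + symbolic name

    path.write_bytes(b"ACGTTTGA\n")       # create the file and write to it
    size = path.stat().st_size            # size attribute kept by the file system (bytes)

    with path.open("r+b") as f:           # open for both reading and writing
        f.seek(4)                         # reposition the file pointer to byte 4
        middle = f.read(4)                # read the next 4 bytes
        f.truncate(4)                     # truncate the file to its first 4 bytes

    after_truncate = path.read_bytes()
    path.unlink()                         # delete the file
    still_exists = path.exists()

print(size, middle, after_truncate, still_exists)  # 9 b'TTGA' b'ACGT' False
```

The same `stat()` result also exposes timestamps (for example `st_mtime`, the last modification time) and the protection bits (`st_mode`).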
5.2
File Access Methods
5.2.1
Sequential Access
Sequential access is based on a tape model of a file: records are processed in order, one after the other. A read operation reads the next portion of the file and automatically advances a file pointer that tracks the current I/O location. Similarly, a write operation appends to the end of the file and advances the pointer to the end of the newly written data (the new end of the file).
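The following Python sketch illustrates sequential access; for brevity, an in-memory `io.BytesIO` stream stands in for a tape-like file, and the three records are arbitrary examples. Each write lands at the current end, and each `readline()` returns the next record and moves the file pointer forward by itself.

```python
import io

tape = io.BytesIO()  # in-memory stand-in for a sequential file
for record in [b"rec1", b"rec2", b"rec3"]:
    tape.write(record + b"\n")  # each write is appended at the current end
tape.seek(0)                    # rewind to the beginning before reading

positions, records = [], []
while True:
    position = tape.tell()      # file pointer before this read
    line = tape.readline()      # read the next record; the pointer advances automatically
    if not line:                # an empty result means end of file
        break
    positions.append(position)
    records.append(line.rstrip(b"\n").decode("ascii"))

print(records)    # ['rec1', 'rec2', 'rec3']
print(positions)  # [0, 5, 10]
```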
5.2.2
Direct/Relative Access
The direct-access method is based on a disk model of a file, since disks allow random access to any file block. The file is viewed as a numbered sequence of blocks or records that can be read or written in any order. Databases often use this method because it gives immediate access to large amounts of information. Other access methods, such as indexed access, can be built on top of a direct-access mechanism.
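With fixed-size records, direct access reduces to arithmetic: record n starts at byte n × record size, so the program can seek straight to it. The Python sketch below assumes a hypothetical record layout (a 4-byte integer ID followed by an 8-byte name) packed with the standard `struct` module, and again uses an in-memory stream in place of a disk file.

```python
import io
import struct

RECORD = struct.Struct("<i8s")  # hypothetical layout: 4-byte int id + 8-byte name = 12 bytes


def write_records(f, records):
    """Append fixed-size records; names are padded with NUL bytes (or cut) to 8 bytes."""
    for rec_id, name in records:
        f.write(RECORD.pack(rec_id, name.encode("ascii")))


def read_record(f, n):
    """Jump straight to record n (0-based) without reading records 0..n-1."""
    f.seek(n * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\0").decode("ascii")


db = io.BytesIO()
write_records(db, [(10, "alpha"), (20, "beta"), (30, "gamma")])
print(read_record(db, 2))  # (30, 'gamma')
print(read_record(db, 0))  # (10, 'alpha')
```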
5.3
Directories
A disk can be divided into partitions, each holding its own file system. Partitioning helps limit the size of individual file systems, allows several file system types to be stored on the same device, and allows part of the device to be used for other purposes, such as swap space. Each disk holding a file system must also carry information about that file system. The device directory stores information about all files on that volume, such as their name, location, size, and type.
5.4
Operations Performed on a Directory
The following operations can be performed on a directory:
• Search for a file: We must be able to search a directory structure to find the entry for a particular file. Because files have symbolic names, and similar names may indicate a relationship between files, we may want to find all files whose names match a particular pattern.
• Create a file.
• Delete a file.
• List a directory.
• Rename a file.
• Traverse the file system.
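Each of these directory operations has a counterpart in Python's standard library. The sketch below builds a small, hypothetical directory tree in a temporary location and then lists, searches by pattern, renames, deletes, and traverses it using `pathlib` and `os.walk`.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / "data" / "raw").mkdir(parents=True)        # create directories
    (base / "data" / "raw" / "sample1.fastq").touch()  # create files
    (base / "data" / "raw" / "sample2.fastq").touch()
    (base / "data" / "notes.txt").touch()

    listing = sorted(p.name for p in (base / "data").iterdir())  # list a directory
    matches = sorted(p.name for p in base.rglob("*.fastq"))      # search by name pattern

    (base / "data" / "notes.txt").rename(base / "data" / "README.txt")  # rename a file
    (base / "data" / "raw" / "sample2.fastq").unlink()                 # delete a file

    walked = []                                                 # traverse the whole tree
    for _dirpath, _dirnames, filenames in os.walk(base):
        walked.extend(filenames)

print(listing)         # ['notes.txt', 'raw']
print(matches)         # ['sample1.fastq', 'sample2.fastq']
print(sorted(walked))  # ['README.txt', 'sample1.fastq']
```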
The following are the most prevalent approaches for defining a directory’s logical structure:
1. Single-level directory.
2. Two-level directory.
3. Tree-structured directory.
4. Acyclic graph directory.
5. General graph directory.
5.5
File Sharing
File sharing depends on the semantics provided by the system and must take the following into account:
1. Multiple users.
2. Remote file systems.
3. Client/server model.
4. Distributed information systems.
5. Failure modes.
5.6
File Protection
Because files can be accessed directly, they must be protected. The most common approach to this problem is to make access dependent on the identity of the user. The most general way to implement identity-dependent access is to associate an access control list (ACL) with each file and directory, specifying user names and the types of access allowed for each user. To keep these lists short, many systems recognize three classes of users:
• Owner: The user who created the file.
• Group: A group is a collection of users sharing a file and requiring the same level of access.
• Universe: The universe comprises all other users in the system.
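On Unix-like systems, the owner/group/universe classes appear directly as permission bits, where the universe is called "others". The Python sketch below, which assumes a POSIX system (on Windows, `os.chmod` can only toggle the read-only flag), grants the owner read and write access, the group read access, and everyone else nothing. Full per-user ACLs, where a file system supports them, are managed with separate tools.

```python
import os
import stat
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:  # a scratch file to experiment on
    path = tmp.name

# owner: read + write, group: read, universe ("others"): no access
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)

st_mode = os.stat(path).st_mode
mode = stat.S_IMODE(st_mode)   # just the permission bits
text = stat.filemode(st_mode)  # the familiar `ls -l` notation
os.remove(path)

print(oct(mode), text)  # 0o640 -rw-r-----
```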
6 Programs and Shells
6.1
Programs
A single command is an instruction, and a computer program is a collection of instructions that, when executed by a computer, accomplishes a specific task. Most computing devices require programs to work. Programmers create computer programs using a programming language. As a rule of thumb, algorithms + data structures = programs [14, 16].
Here are a few examples of computer program applications:
• Microsoft Word, Microsoft Excel, Adobe Photoshop, Internet Explorer, Chrome, and other apps are examples of computer programs.
• Computer programs are used in filmmaking to produce pictures and special effects.
• Computer programs are used to do ultrasounds, X-rays, and other medical procedures.
• Computer apps on our mobile phones manage SMS, chat, and voice communication.
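The rule of thumb "algorithms + data structures = programs" can be seen in even a tiny program. In the Python sketch below, the data structure is a sorted list and the algorithm is binary search, which halves the search range at each step; the list of values is made-up example data.

```python
def binary_search(sorted_values, target):
    """Return the index of `target` in `sorted_values`, or -1 if it is absent.

    Data structure: a list kept in ascending order.
    Algorithm: compare with the middle element and discard half the range each step.
    """
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


lengths = [120, 250, 310, 480, 960]  # made-up example values, already sorted
print(binary_search(lengths, 480))  # 3
print(binary_search(lengths, 500))  # -1
```

The algorithm only works because of the data structure: on an unsorted list, the same code would give wrong answers.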
6.2
Shells
A shell is the user interface to the operating system. At its core, a shell is simply a macro processor that executes commands. The shell can read commands from a terminal or from a file [15]. A Unix shell is a command interpreter that provides a user interface to the vast array of Unix utilities, as well as a programming language that allows those utilities to be combined. A shell permits both synchronous and asynchronous execution of Unix commands. Shells can be used interactively or non-interactively, accepting input from the keyboard or from shell scripts. A shell script is a file containing a series of shell and operating system commands [16, 17]. The shell takes a command line and evaluates it according to a fixed set of rules to direct the operating system to take appropriate action. The shell recognizes three types of commands:
• An executable file containing object code produced by compiling source code.
• An executable file containing a series of shell command lines.
• An internal (built-in) shell command. Built-in commands make the shell a programming language as well as a command interpreter: like any high-level language, the shell has variables, flow control structures, quoting, and functions.
Each time a terminal window is opened, a new shell is launched. A shell can provide users with one or more of the following features:
• Create an environment that meets their needs.
• Write shell scripts.
• Define command aliases.
• Edit the command line.
• Manipulate the command history.
• Automatically complete the command line.
• Run lengthy tasks in the background.
• Store data in user-defined or shell-defined variables.
• Link any number of commands together (piping).
• Redirect program input and output.
User-related shells: Standard shell, C shell, KornShell. Security-related shells: Trusted shell, remote shell, restricted shell, secure shell.
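Because this book works in Python, it is worth seeing how a Python program can hand a command line to a shell and collect the result. The sketch below assumes a Unix-like system, where `shell=True` runs the command through `/bin/sh`, and that the standard `printf` and `sort` utilities are available; the shell itself sets up the pipe between the two commands.

```python
import subprocess

# The shell parses the command line, starts both programs, and connects them with a pipe.
result = subprocess.run(
    "printf 'gamma\\nalpha\\nbeta\\n' | sort",
    shell=True,           # let /bin/sh interpret the pipe symbol
    capture_output=True,  # collect the pipeline's standard output
    text=True,            # decode the output bytes to str
    check=True,           # raise an error if the pipeline fails
)
print(result.stdout)  # alpha, beta, gamma on separate lines
```

Passing untrusted input inside a `shell=True` command string is a security risk; when no shell features such as pipes are needed, pass a list of arguments instead.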
7 Programming Languages
Programmers (developers) use programming languages to communicate instructions to computers [18, 19].
7.1
Types
• Low-level programming languages: Low-level programming languages are machine-dependent. Programs written in machine language run without any translation, and assembly language needs only an assembler rather than a compiler or interpreter.
• Machine language (first-generation language, 1GL): A machine language is made up of strings of binary digits (0s and 1s), and it is the only language that the processor understands directly. It provides fast execution and low memory requirements, but code written in it is difficult to develop and debug.
• Assembly language (second-generation language, 2GL): Assembly language uses mnemonic codes (symbolic operation codes such as “ADD” for addition) instead of 0s and 1s, and an assembler converts it into machine code. The resulting program is known as object code. Assembly language speeds up development and debugging, although it is still hardware dependent.
• High-level programming languages: High-level programming languages are machine-independent. A compiler or interpreter translates a program into the machine language of the computer it runs on, so the same source program can run on different machines. High-level languages are easier to read, develop, and maintain because they are problem-oriented, e.g., C/C++, Java, Python, MATLAB, R, Objective-C, COBOL, Perl, LISP, Pascal, FORTRAN, and Swift. They can be further classified as:
• Procedural-oriented programming languages.
• Object-oriented programming languages.
• Natural languages.
7.2
Compiler vs Interpreter
Compilers and interpreters both translate high-level source code into machine-readable code. An interpreter translates and executes each statement of the high-level program while the program runs, whereas a compiler converts the entire program into object/binary code before execution. C/C++ and similar languages use compilers (Java is compiled to bytecode that a virtual machine then executes), whereas PHP, Perl, and Ruby use interpreters. Python, Java, C/C++, C#, JavaScript, R, PHP, Go, and Ruby are among the most popular programming languages. Python is the emphasis of this book.
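One visible consequence of the difference is when errors are caught. The Python sketch below uses the built-in `compile()` and `exec()` functions to imitate both styles on a three-line program whose last line contains a deliberate syntax error: the compiler-style function translates everything first and so runs nothing, while the interpreter-style function executes the first two lines before it reaches the bad one. This is an illustration, not a description of how any particular compiler works. CPython itself sits in between: it compiles a whole source file to bytecode before running it, then interprets that bytecode.

```python
program = [
    "x = 1",
    "y = x + 1",
    "z = (y +",  # deliberate syntax error on the last line
]


def compile_whole(lines):
    """Compiler style: translate the entire program first; run only if all of it translates."""
    namespace = {}
    try:
        code = compile("\n".join(lines), "<program>", "exec")
    except SyntaxError:
        return "rejected before any line ran", namespace
    exec(code, namespace)
    return "ran", namespace


def interpret_each(lines):
    """Interpreter style: translate and run one statement at a time."""
    namespace = {}
    for count, line in enumerate(lines):
        try:
            exec(compile(line, "<line>", "exec"), namespace)
        except SyntaxError:
            return f"stopped after running {count} line(s)", namespace
    return "ran", namespace


print(compile_whole(program)[0])   # rejected before any line ran
print(interpret_each(program)[0])  # stopped after running 2 line(s)
```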
8 Troubleshooting Computer Problems
Problems arise regularly, particularly with computers. It is not possible to know every troubleshooting technique and every issue that may arise, but learning how to use existing tools to address problems efficiently is a valuable skill. Troubleshooting often involves a process of elimination, in which a technician follows a set of procedures to identify or remedy a problem. While we are familiar with typical troubleshooting techniques such as running anti-malware software, restarting the device, upgrading software, and applying updates, some issues cannot be fixed immediately. Conducting research, finding solutions, and educating yourself about new topics, along with developing skills such as professional networking and public speaking, are critical to your continued personal growth and success.
8.1
Resources
8.1.1
Google
When it comes to error codes or complex problems, search engines can be your greatest friend. Refining your understanding of the problem and using advanced search strategies will get you to useful results much faster. Google Scholar is also an excellent resource for learning new things. Explore the following pages to learn more about searching with Google:
• http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861
• https://in.pcmag.com/software-services/147497/21-google-search-tips-youll-want-to-learn
Errors in code are a very common problem for beginners in Python. To troubleshoot one, copy the error message and paste it into the Google search bar within quotation marks, e.g., “SyntaxError: Missing parentheses”. The search results for such a query usually show how others have resolved the same issue.
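The part of a Python error worth searching for is usually the last line of the traceback, in the form "ErrorType: message". The sketch below, using the standard `traceback` and `urllib.parse` modules, triggers a typical beginner's error (Python 2 style `print` without parentheses), extracts that line, and wraps it in quotation marks for an exact-phrase search. Building the search URL by hand is only an illustration; pasting the quoted text into the search bar works just as well.

```python
import traceback
from urllib.parse import quote_plus

try:
    compile('print "hello"', "<example>", "exec")  # Python 2 style print: a SyntaxError in Python 3
except SyntaxError as exc:
    # The last formatted line is the "ErrorType: message" part to search for.
    error_line = traceback.format_exception_only(type(exc), exc)[-1].strip()

query = f'"{error_line}"'  # quotation marks ask for the exact phrase
url = "https://www.google.com/search?q=" + quote_plus(query)
print(error_line)  # begins with: SyntaxError: Missing parentheses in call to 'print'
print(url)
```

The exact wording after "Missing parentheses" differs slightly between Python versions, which is one more reason to search for the stable leading part of the message.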
8.1.2
Webopedia
It is an online computer dictionary and a good quick reference source.
References
1. Englander, I., & Wong, W. (2021). The architecture of computer hardware, systems software, and networking: An information technology approach. Wiley.
2. Kernighan, B. W. (2021). Understanding the digital world: What you need to know about computers, the internet, privacy, and security. Princeton University Press.
3. Wilson, K. (2019). Essential computer hardware: The illustrated guide to understanding computer hardware. Elluminet Press Ltd.
4. Hamacher, C., Vranesic, Z., Zaky, S., & Manjikian, N. (2019). Computer organization and embedded systems.
5. Ledin, J., & Farley, D. (2022). Modern computer architecture and organization: Learn x86, ARM, and RISC-V architectures and the design of smartphones, PCs, and cloud servers. Packt Publishing Ltd.
6. Clements, A. (2006). Principles of computer hardware. Oxford University Press.
7. Gilster, R. (2001). PC hardware: A beginner's guide. Tata McGraw-Hill Education.
8. Reddy, N. (2016). PC hardware. NEO Publishing House.
9. Thompson, R. B., & Thompson, B. F. (2003). PC hardware in a nutshell: A desktop quick reference. O’Reilly Media.
10. Patterson, D. A., & Hennessy, J. L. (2016). Computer organization and design ARM edition: The hardware software interface. Morgan Kaufmann.
11. Bello, S. A., Agunsoye, J. O., Adebisi, J. A., Adeyemo, R. G., & Hassan, S. B. (2020). Optimization of tensile properties of epoxy aluminum particulate composites using regression models. Journal of King Saud University-Science, 32(1), 402–411.
12. Mach, D. (2019). Networking for beginners: Easy guide to learn basic/advanced computer network, hardware, wireless, and cabling. LTE, internet and cyber security. Dylan Mach.
13. Kurose, J. F., & Ross, K. W. (1986). Computer networking: A top-down approach.
14. Silberschatz, A., Galvin, P. B., & Gagne, G. (2006). Operating system concepts. Wiley.
15. Deitel, H. M., Deitel, P. J., & Choffnes, D. R. (2005). Sistemi operativi. Pearson Italia Spa.
16. Pehlivan, H. (2000). A sophisticated Shell environment. http://www.cs.bris.ac.uk/Tools/Reports/Abstracts/2000-pehlivan.html
17. Cannon, J. (2015). Shell scripting: How to automate command line tasks using bash scripting and Shell programming. CreateSpace Independent Publishing Platform.
18. Brey, B. B. (2000). The Intel microprocessors 8086: Architecture, programming, and interfacing. Prentice Hall.
19. McConnell, S. (2004). Code complete. Pearson Education.
The Python Programming Language
Hussam Bin Mehare, Jishnu Pillai Anilkumar, and Naushad Ahmad Usmani
1 Short History
Guido van Rossum began working on Python in late 1989 at CWI in the Netherlands as a successor to the ABC programming language, capable of handling exceptions and interfacing with the Amoeba operating system. It was first released for public distribution in early 1991 (version 0.9.0) [1, 2]. Python is a well-known general-purpose, high-level programming language. It supports a variety of programming paradigms, including object-oriented, imperative, functional, and procedural programming. It has a large standard library, a dynamic type system, and automatic memory management. Python interpreters are available for a wide range of operating systems, allowing Python programs to run across several platforms [1–3]. A number of alpha, beta, and release candidate versions are provided as previews and for testing before each final release. The Python community has also contributed over 72,000 software modules to the Python Package Index (commonly known as PyPI), the official repository for third-party Python libraries (as of January 2016) [1–3]. At the time of the cited sources, the most recent version was Python 3.4.x, which offered improved and more consistent functionality, while many major frameworks still ran on version 2.7.x [1]. Python 2 has since reached its end of life (January 2020), and current development targets Python 3.
H. B. Mehare (✉) Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India J. P. Anilkumar Department of Computer Science & Engineering, Presidency University, Bengaluru, Karnataka, India N. A. Usmani Teaching Faculty, Buraimi University College, Buraimi, Oman © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_2
H. B. Mehare et al.
2 The Python Interpreter
The term "interpreter" is used in several senses when discussing Python. The Python REPL (read-eval-print loop), which you start by typing python at the command line, is also called an interpreter, and "the Python interpreter" and "Python" are sometimes used interchangeably to mean the whole process of running Python code from start to finish.
In this chapter, "interpreter" has a narrower meaning: it refers to the final stage of executing a Python program. Before the interpreter takes over, Python performs three steps: lexing, parsing, and compilation. Together, these steps transform the programmer's source code from lines of text into structured code objects that the interpreter can execute. The Python interpreter is a virtual machine and bytecode interpreter, that is, software that emulates the operation of a physical computer. It is stack-based: it carries out operations by pushing values onto and popping them off stacks. The interpreter takes bytecode as its input; bytecode is the intermediate representation of your source code that the interpreter actually executes [2–6].
By default, Python source files are treated as encoded in UTF-8. With that encoding, characters from most of the world's languages can be used in string literals, identifiers, and comments, although the standard library uses only ASCII characters for identifiers, a convention that any portable code should follow. To display all of these characters correctly, your editor must recognize that the file is UTF-8 and use a font that supports all of the characters in the file.
Some Python modules can also be run as scripts with the command python -m module [arg] ..., which executes the module's source file as if you had typed its full path on the command line [2–6].
Argument passing: When known to the interpreter, the script name and any additional arguments are turned into a list of strings and assigned to the argv variable in the sys module [2–6].
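The parsing, compilation, and interpretation steps, as well as argument passing, can be observed from Python itself. The following is a minimal sketch using the standard-library ast and dis modules; the source string and the main() helper are illustrative assumptions, and the exact bytecode listing printed by dis differs between Python versions.

```python
import ast
import dis
import sys

source = "x = 2\ny = x * 21\n"

tree = ast.parse(source)                   # lexing + parsing: text -> abstract syntax tree
code = compile(tree, "<example>", "exec")  # compilation: AST -> code object (bytecode)
dis.dis(code)                              # print the bytecode instructions

namespace = {}
exec(code, namespace)                      # interpretation: the virtual machine runs the bytecode
print(namespace["y"])                      # 42


def main(argv):
    """Hypothetical entry point: argv[0] is the script name, the rest are arguments."""
    script_name, *args = argv
    return script_name, args


if __name__ == "__main__":
    # Running "python greet.py Alice Bob" would give sys.argv == ["greet.py", "Alice", "Bob"].
    print(main(sys.argv))
```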
Interactive mode: When commands are read from a terminal (tty), the interpreter is said to be in interactive mode. In this mode, it asks for the next command with the primary prompt, usually three greater-than signs (>>>); for continuation lines, it asks with the secondary prompt, by default three dots (...). Before printing the first prompt, the interpreter displays a welcome message stating its version number and a copyright notice [2–6].
3 Basic Syntax
3.1 Python Identifiers
Identifiers are the names used to identify Python entities such as variables, functions, and classes [3–11]. These are the general rules for writing identifiers in Python:
• An identifier can be a combination of lowercase letters (a to z), uppercase letters (A to Z), digits (0–9), and underscores (_).
• It must not begin with a digit. For example, 1variable is invalid, whereas variable1 is valid.
• Reserved keywords (see Sect. 3.3) cannot be used as identifiers.
• Special characters such as !, @, #, $, and % are not allowed in identifiers.
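The rules above can be checked programmatically. This is a minimal sketch using the built-in str.isidentifier() method and the standard keyword module; the helper name is_valid_identifier is our own. Note that isidentifier() also accepts non-ASCII letters, which Python 3 allows in identifiers.

```python
import keyword


def is_valid_identifier(name: str) -> bool:
    """Apply the rules above: identifier syntax, and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["variable1", "1variable", "user_name", "_private", "class", "total$"]:
    print(f"{candidate!r:12} valid: {is_valid_identifier(candidate)}")
```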
3.2 Variables
Variables are the names (labels) given to the data that a program needs to store and work with. Variable names are case-sensitive: userName and username are two different variables [3–11]. For example, age = 20 and name = "John" create two variables.
3.3 Keywords
Keywords are reserved words that have a special meaning for the Python interpreter. All keywords are lowercase except True, False, and None. Examples include False, class, finally, is, return, None, continue, for, and try [5–7].
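The full keyword list of the running interpreter is available from the standard keyword module, which is a convenient way to confirm the capitalization rule stated above. A minimal sketch:

```python
import keyword

print(keyword.kwlist)   # every reserved keyword in the running Python version
capitalized = [kw for kw in keyword.kwlist if not kw.islower()]
print(capitalized)      # ['False', 'None', 'True']
print(keyword.iskeyword("for"), keyword.iskeyword("For"))  # True False
```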
Table 1 Arithmetic operators (examples assume a = 7 and b = 2)
Operator | Description | Example
+ | Addition | a + b = 9
- | Subtraction | a - b = 5
* | Multiplication | a * b = 14
/ | Division | a / b = 3.5
% | Modulus | a % b = 1
** | Exponentiation | a ** b = 49
// | Floor division | a // b = 3
3.4 Indentation
One of Python's most distinctive features is its use of indentation to mark blocks of code. All statements in the same block must be indented by the same amount, and a block ends when the indentation returns to the previous level [8–13]. For example, two statements such as x = 1 and y = 2, written one below the other at the same indentation level, belong to the same block.
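The following minimal sketch shows an indented block under an if statement, and shows that a block header without an indented body is rejected with IndentationError before any code runs. The variable names are illustrative.

```python
x = 1
y = 2

if x < y:
    # Both indented statements belong to the if block.
    smaller = x
    message = "x is smaller"
else:
    smaller = y
    message = "y is smaller or equal"
print(message)  # back at the outer level: runs after the if/else

# A block header without an indented body is rejected at compile time.
bad_source = "if True:\nprint('not indented')\n"
try:
    compile(bad_source, "<bad>", "exec")
    raised = False
except IndentationError as err:
    raised = True
    print("IndentationError:", err.msg)
```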
3.5 Basic Operators
3.5.1 Arithmetic Operators
Arithmetic operators are used to perform numerical operations such as addition, subtraction, multiplication, and division [3–11] (Table 1).
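A quick way to check each arithmetic operator is to run it on small values. This sketch assumes a = 7 and b = 2; it also shows two behaviors worth knowing: / always returns a float, and // rounds toward negative infinity.

```python
a, b = 7, 2
print(a + b)    # 9
print(a - b)    # 5
print(a * b)    # 14
print(a / b)    # 3.5  (true division always returns a float)
print(a % b)    # 1    (remainder)
print(a ** b)   # 49   (7 to the power 2)
print(a // b)   # 3    (floor division)
print(-a // b)  # -4   (floor division rounds toward negative infinity)
```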
3.5.2 Assignment Operators
Assignment operators are used to assign values to variables in Python [3–11] (Table 2).
Table 2 Assignment operators
Operator | Description | Example
= | Assigns the value of the right operand to the left operand | z = x + y assigns the value of x + y to z
+= (add AND) | Adds the right operand to the left operand and assigns the result to the left operand | z += x is equivalent to z = z + x
-= (subtract AND) | Subtracts the right operand from the left operand and assigns the result to the left operand | z -= x is equivalent to z = z - x
*= (multiply AND) | Multiplies the left operand by the right operand and assigns the result to the left operand | z *= x is equivalent to z = z * x
/= (divide AND) | Divides the left operand by the right operand and assigns the result to the left operand | z /= x is equivalent to z = z / x
%= (modulus AND) | Computes the modulus of the two operands and assigns the result to the left operand | z %= x is equivalent to z = z % x
**= (exponent AND) | Raises the left operand to the power of the right operand and assigns the result to the left operand | z **= x is equivalent to z = z ** x
//= (floor division AND) | Performs floor division of the left operand by the right operand and assigns the result to the left operand | z //= x is equivalent to z = z // x
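The augmented assignment operators can be traced step by step. In this minimal sketch, each comment gives the value of z after that line; note that /= turns the integer into a float, which then propagates through the later steps.

```python
z = 10
z += 5    # 15
z -= 3    # 12
z *= 2    # 24
z /= 4    # 6.0 (true division produces a float)
z %= 4    # 2.0
z **= 3   # 8.0
z //= 3   # 2.0
print(z)  # 2.0
```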
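Table 3 Relational operators (examples assume a < b, for example a = 3 and b = 5)
Operator | Description | Example
== | True if the two operands are equal | (a == b) is not true
!= | True if the two operands are not equal | (a != b) is true
> | True if the left operand is greater than the right operand | (a > b) is not true
< | True if the left operand is less than the right operand | (a < b) is true
>= | True if the left operand is greater than or equal to the right operand | (a >= b) is not true
<= | True if the left operand is less than or equal to the right operand | (a <= b) is true

Table 4 Logical operators (examples assume var1 and var2 are both True)
Operator | Description | Example
and (logical AND) | True if both operands are true | (var1 and var2) is True
or (logical OR) | True if at least one of the operands is true | (var1 or var2) is True
not (logical NOT) | Inverts the logical state of its operand | not (var1 and var2) is False

3.5.3 Comparison/Relational Operators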
Comparison (relational) operators compare two values and return True or False depending on whether the condition holds [3–11] (Table 3).
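This sketch evaluates each relational operator for a = 3 and b = 5 (the same relationship, a < b, assumed in Table 3). The last line shows that Python also lets comparisons be chained.

```python
a, b = 3, 5
print(a == b)      # False
print(a != b)      # True
print(a > b)       # False
print(a < b)       # True
print(a >= b)      # False
print(a <= b)      # True
print(1 < a < 10)  # True: chained comparison, equivalent to 1 < a and a < 10
```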
3.5.4 Logical Operators
The logical operators are and, or, and not (written in lowercase in Python). They combine or invert conditions, for example to check two variables against a specified condition, and the result determines whether the overall expression is True or False [3–11] (Table 4).
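The following minimal sketch applies the three logical operators and also illustrates short-circuit evaluation: and/or evaluate the right operand only when needed and return one of their operands rather than always a bool. The variable values are illustrative.

```python
var1, var2 = True, False
print(var1 and var2)    # False: both operands must be true
print(var1 or var2)     # True: at least one operand is true
print(not var1)         # False: inverts the logical state

# Short-circuit evaluation returns one of the operands.
print(0 or "default")   # default
print([] and "unused")  # []

age = 30
print(age >= 18 and age < 65)  # True: combining two comparisons
```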
3.5.5 Identity Operators
Identity operators check whether two variables refer to the same object in memory. Python has two identity operators: is and is not [3–11] (Table 5).
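The difference between identity (is) and equality (==) shows up clearly with lists. In this minimal sketch, b is a second name for the same list as a, while c is a separate list with equal contents.

```python
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # c is a different object with equal contents

print(a is b)      # True
print(a is c)      # False
print(a == c)      # True: equal values, different objects
print(a is not c)  # True

x = None
print(x is None)   # True: the idiomatic way to test for None
```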
Table 5 Identity operators
Operator | Description | Example
is | True if the variables on both sides of the operator point to the same object; otherwise, False | var1 is var2
is not | False if the variables on both sides of the operator point to the same object; otherwise, True | var1 is not var2
Table 6 Membership operators
Operator | Description | Example
in | Returns True if a value is in the provided sequence; otherwise, returns False | var1 in var2
not in | Returns True if a value is not in the provided sequence; otherwise, returns False | var1 not in var2

3.5.6 Membership Operators
The membership operators can be used to check whether a value exists in a sequence such as a string, list, tuple, set, or dictionary. There are two membership operators in Python: “in” and “not in” [3–11] (Table 6).
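Membership tests behave slightly differently depending on the container. In this minimal sketch, a list is searched by value, a string by substring, and a dictionary by key (not by value).

```python
nums = [1, 2, 3]
print(2 in nums)         # True
print(5 not in nums)     # True
print("ell" in "Hello")  # True: substring test for strings

ages = {"Liam": 54, "Lily": 13}
print("Liam" in ages)    # True: dictionaries test their keys
print(54 in ages)        # False: values are not checked
```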
3.5.7 Bitwise Operators
Computers represent all data as bits, that is, as sequences of 0s and 1s. Bitwise operators act directly on the individual bits of integers [3–11]. Python's bitwise operators are & (AND), | (OR), ^ (XOR), ~ (NOT), << (left shift), and >> (right shift).
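Writing the operands in binary makes the effect of each bitwise operator easy to follow. This sketch uses a = 12 (0b1100) and b = 10 (0b1010); bin() shows the results in binary.

```python
a, b = 0b1100, 0b1010  # 12 and 10
print(bin(a & b))  # 0b1000  (bits set in both)
print(bin(a | b))  # 0b1110  (bits set in either)
print(bin(a ^ b))  # 0b110   (bits set in exactly one)
print(~a)          # -13     (bitwise NOT of x equals -x - 1)
print(a << 1)      # 24      (shift left by one bit: multiply by 2)
print(a >> 2)      # 3       (shift right by two bits: floor-divide by 4)
```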
3.6 Comments
Comments explain code to human readers, including the author and other programmers; the interpreter ignores them. They are also useful during debugging, for example to note suspected errors or to temporarily disable a line of code [3–11].
3.6.1 Single Line Comment
The Python interpreter ignores everything from a # (hash) character to the end of the line, unless the # appears inside a string literal. Example: # In Python, this is a one-line comment.
3.6.2 Multiline Comments
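Python has no dedicated multiline comment syntax. A common practice is to place the text between a pair of triple quotes ("""), one at the beginning and one at the end of the comment. This actually creates a multiline string; because the string is not assigned or otherwise used, it has no effect on the program. (When such a string is the first statement of a module, function, or class, it becomes that object's docstring.) Example:
"""
This is an example of a multiline comment that spans several lines.
Everything in between is treated as a comment.
"""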
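The following minimal sketch contrasts the three forms discussed above: a # comment, a docstring (a triple-quoted string placed first in a function), and a bare triple-quoted string used as a multiline comment. The function area is illustrative.

```python
def area(width, height):
    """Return the area of a rectangle.

    A triple-quoted string placed first in a function becomes its docstring.
    """
    # Single-line comment: the interpreter ignores this line.
    return width * height  # an inline comment after code


"""
A bare triple-quoted string like this one is evaluated and then discarded,
so it is often used as a multiline comment.
"""

print(area(3, 4))                    # 12
print(area.__doc__.splitlines()[0])  # Return the area of a rectangle.
```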
3.7 Data Types
3.7.1 None
None is a singleton object that represents the absence of a value (Python's null value).
3.7.2 Boolean
The Boolean type (bool) has exactly two values: True and False.
3.7.3 Integers
Integers (int) are whole numbers without a fractional part, such as -5, -4, -3, 0, 5, and 7. Syntax: variable_name = initial_value Example: age = 18
3.7.4 Float
Floating-point numbers (float), such as 1.234, -0.023, and 12.01, have a fractional part. Syntax: variable_name = initial_value Example: userHeight = 1.82
3.7.5 String
A string (str) represents text. Syntax: variable_name = 'initial value' (single quotes) or variable_name = "initial value" (double quotes) Example: location = "Asia"
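The built-in type() function reports the data type of any value, which is a quick way to confirm the types described in this section. A minimal sketch using the example values above:

```python
nothing = None
is_ready = True
age = 18
userHeight = 1.82
location = "Asia"

for value in (nothing, is_ready, age, userHeight, location):
    print(repr(value), "->", type(value).__name__)
# None -> NoneType, True -> bool, 18 -> int, 1.82 -> float, 'Asia' -> str

print(nothing is None)        # True: None is a singleton, so test it with `is`
print(isinstance(True, int))  # True: bool is a subclass of int
```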
3.7.6 List
A list is an ordered collection of items whose values can be changed (lists are mutable). Use a list when you need an ordered sequence of items, usually of the same kind, whose values may change later in the program [5–11]. Syntax: list_name = [initial values]
Example:
listName = [] # an empty list
age = [21, 22, 23, 24, 25] # 21 is at index 0, 22 at index 1, and so on
(a) To modify an entry in a list, Syntax: list_name[index of item to be modified] = new_value (b) To add an entry to the end of a list, Syntax: list_name.append(new_value) (c) To remove an entry from a list, Syntax: del list_name[index of item to be deleted] Slice Syntax: list_name[start:stop:step] A slice includes the item at index start but excludes the item at index stop. For example, age_new = age[2:4] assigns the items of age from index 2 to index 3 (stop - 1 = 4 - 1 = 3) to age_new. In other words, age_new = [23, 24].
3.7.7 Tuple
Tuples are like lists, except that their values cannot be modified (tuples are immutable): the values given when the tuple is created are kept throughout the program. Use a tuple when you need an ordered sequence of items, possibly of different types, whose values will not need to change later in the program [7–15]. Syntax: tupleName = (initial values) Example: Months = ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"). Individual tuple values can be retrieved by their indices; for example, Months[0] returns "Jan".
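The list operations and slicing rules from Sect. 3.7.6 and the immutability of tuples can be checked with a short sketch. It reuses the age list from the text; the shortened months tuple is illustrative. Each comment shows the list contents after that line.

```python
age = [21, 22, 23, 24, 25]
age_new = age[2:4]       # [23, 24]: indices 2 and 3 (stop index 4 is excluded)
every_other = age[::2]   # [21, 23, 25]: step of 2

age[0] = 20              # modify: [20, 22, 23, 24, 25]
age.append(26)           # add:    [20, 22, 23, 24, 25, 26]
del age[1]               # remove: [20, 23, 24, 25, 26]
print(age, age_new, every_other)

months = ("Jan", "Feb", "Mar")
print(months[0])         # Jan
try:
    months[0] = "January"  # tuples are immutable
    tuple_is_immutable = False
except TypeError as err:
    tuple_is_immutable = True
    print("TypeError:", err)
```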
3.7.8 Dictionary
A dictionary is a collection of key-value pairs. Keys must be unique within a dictionary. A dictionary is ideal when you need to associate values with keys so that you can look the values up quickly by key [3–11]. Syntax: dictionary_name = {key: value} Example: user_details = {"Liam": 54, "Jack": 51, "Lily": 13, "Maria": "Not Available"} (a) To modify an entry in a dictionary, Syntax: dictionary_name[key of item to be modified] = new_value (b) To add an entry to a dictionary, Syntax: dictionary_name[new_key] = value (c) To remove an entry from a dictionary, Syntax: del dictionary_name[key]
3.7.9 Sets
A set is an unordered, unindexed, iterable collection that contains no duplicate entries. The set itself is mutable (items can be added or removed), but its items must be hashable, for example numbers, strings, or tuples. A set is the ideal choice when you do not need to keep duplicates and do not care about the order of the contents, for example when the main purpose is to check quickly whether a particular value is already present [3–11]. Syntax: set_name = {values} Example: Fruits = {"apple", "banana", "cherry"}.
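The dictionary operations (a) to (c) from Sect. 3.7.8 and the basic set behavior described above fit in one short sketch. It reuses the user_details example; the extra key "Noah" and the set values "mango" and "kiwi" are illustrative.

```python
user_details = {"Liam": 54, "Jack": 51, "Lily": 13, "Maria": "Not Available"}
user_details["Lily"] = 14                   # (a) modify an entry
user_details["Noah"] = 30                   # (b) add an entry
del user_details["Jack"]                    # (c) remove an entry
print(user_details["Liam"])                 # 54
print(user_details.get("Jack", "missing"))  # missing: .get() avoids a KeyError

fruits = {"apple", "banana", "cherry", "apple"}  # the duplicate "apple" is stored once
print(len(fruits))                  # 3
fruits.add("mango")                 # sets are mutable
print("banana" in fruits)           # True: fast membership test
print(fruits & {"cherry", "kiwi"})  # {'cherry'}: set intersection
```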
3.8 Type Casting
Type casting is the process of converting a variable or value from one data type to another. Three commonly used built-in functions for type casting are int(), float(), and str() [4–16].
3.8.1 int()
(i) int(5.712987) = 5 (the fractional part is truncated), (ii) int("4") = 4,
3.8.2 float()
(iii) float(2) = float("2") = 2.0, (iv) float("2.09109") = 2.09109,
3.8.3 str()
(v) str(2.1) = "2.1", (vi) str(30) = "30".
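The examples (i) to (vi) can be run directly. This sketch adds two cases that often surprise beginners: int() truncates toward zero for negative numbers too, and int() rejects a string that contains a decimal point.

```python
print(int(5.712987))          # 5   (truncates toward zero)
print(int(-5.7))              # -5  (also toward zero, not down to -6)
print(int("4"))               # 4
print(float(2), float("2"))   # 2.0 2.0
print(float("2.09109"))       # 2.09109
print(str(2.1), str(30))      # 2.1 30

try:
    int("4.5")                # not a valid integer literal
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("ValueError:", err)
```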
3.9 Escape Characters [3–11]
3.9.1 \n (Prints a newline)
print('Hello\nWorld') Output:
Hello
World
3.9.2 \\ (Prints the backslash character itself)
print('\\') Output: \
3.9.3 \" (Prints a quote character, so that the quote does not signal the end of the string)
print("I am 5\'9\" tall") Output: I am 5'9" tall
3.9.4 \t (Prints a tab space)
print('Hello\tWorld') Output: Hello    World (the two words are separated by a tab)
Note: If you do not want escape sequences to be interpreted (for example, if \t should not become a tab), use a raw string by putting r before the opening quote: print(r'Hello\tWorld') Output: Hello\tWorld
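Each escape sequence stands for a single character, which string lengths make visible. This minimal sketch stores the examples above in variables; the comparison of len(tabbed) and len(raw) shows that a raw string keeps the backslash and the t as two separate characters.

```python
newline_text = "Hello\nWorld"
backslash = "\\"
quoted = "I am 5'9\" tall"
tabbed = "Hello\tWorld"
raw = r"Hello\tWorld"

print(newline_text)           # Hello and World on separate lines
print(backslash)              # \
print(quoted)                 # I am 5'9" tall
print(tabbed)                 # Hello, a tab, then World
print(raw)                    # Hello\tWorld
print(len(tabbed), len(raw))  # 11 12
```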
3.10 Condition Statements
Every control flow statement evaluates a condition. The program proceeds differently depending on whether the condition is satisfied (True) or not (False) [3–14].
3.10.1 If Statement
The if statement is one of the most frequently used control flow statements. It lets the program check whether a particular condition is satisfied and, if so, execute the corresponding action. An if statement has the following structure [3–14]: Syntax:
if condition_1:
    do A
elif condition_2:
    do B
elif condition_3:
    do C
elif condition_4:
    do D
else:
    do E
where elif stands for "else if". Only the block of the first satisfied condition runs; if no condition is satisfied, the else block runs.
Inline If: For a short choice between two values, an inline if (a conditional expression) is more concise than a full if statement. Syntax: value_A if condition else value_B, which evaluates to value_A when the condition is true and to value_B otherwise. Example: (a) num1 = 12 if num == 10 else 13, (b) print("This is the first challenge" if num == 10 else "This is the second challenge").
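The if/elif/else structure and the inline if can be combined in a small sketch. The grading thresholds in classify() are hypothetical and chosen only to exercise each branch; num and num1 follow example (a) above.

```python
def classify(score):
    # Only the first branch whose condition is True runs.
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 50:
        return "C"
    else:
        return "F"


num = 10
num1 = 12 if num == 10 else 13  # conditional expression ("inline if")

print(classify(95), classify(80), classify(60), classify(20))  # A B C F
print(num1)  # 12
```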
3.10.2 For Loop
The for loop iterates over the items of an iterable (such as a list, a string, or a range of numbers) and executes its block of code once for each item, stopping after the last item [3–17].
Syntax:
for item in iterable:
    print(item)
Example: (a) Looping Through a List
laptop = ['HP', 'Macbook Air', 'Lenovo', 'Acer']
for myLaptop in laptop:
    print(myLaptop)
The first time the program runs through the loop, the variable myLaptop is assigned 'HP', and print(myLaptop) prints HP. The second time, myLaptop is assigned 'Macbook Air', which is then printed. The loop continues until it reaches the end of the list. (b) Looping Through a String
message = 'Hello'
for i in message:
    print(i)
Output: H e l l o (one character per line) (c) Looping Through a Sequence of Numbers. The range() function is handy for looping through a sequence of numbers. It returns a range object, a sequence of integers that is generated as needed rather than stored as a list. Syntax: range(start, stop, step) The sequence begins at start (0 if omitted), increases by step (1 if omitted), and stops before reaching stop. Note: 1. In Python, as in many other programming languages, counting and indexing start at zero unless specified otherwise. 2. A nested control statement places one control structure inside another, for example a while loop inside an if statement or a for loop inside a while loop. Example:
for i in range(5):
    print(i)
Output: 0 1 2 3 4 (one number per line)
3.10.3 While Loop
A while loop executes the instructions inside it repeatedly, as long as a specified condition remains true. Syntax:
while condition:
    do A
Example:
counter = 5
while counter > 0:
    print("Counter =", counter)
    counter = counter - 1
Output: Counter = 5, Counter = 4, Counter = 3, Counter = 2, Counter = 1 (one per line). Reducing counter by one on each pass ensures that the loop condition counter > 0 eventually evaluates to False; without this step, the loop would never end.
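The for-loop examples (a) to (c) and the while-loop example from this section can be run together as one sketch. It also prints a range with an explicit start and step to show that stop is excluded.

```python
laptop = ["HP", "Macbook Air", "Lenovo", "Acer"]
for myLaptop in laptop:       # one iteration per list item
    print(myLaptop)

for ch in "Hello":            # one iteration per character
    print(ch)

print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))  # [2, 5, 8]: start 2, step 3, stop 10 excluded

counter = 5
while counter > 0:
    print("Counter =", counter)
    counter -= 1              # without this the condition never becomes False
```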
3.10.4 Break
Sometimes you want to exit a loop entirely as soon as a certain condition is met. The break keyword does this: it immediately ends the innermost loop that contains it [6–11]. Example:
j = 0
for i in range(5):
    j = j + 2
    print('i =', i, ', j =', j)
    if j == 6:
        break
Because of range(5), the loop would normally run from i = 0 to i = 4. With the break keyword, the program leaves the loop early: when i = 2, j reaches the value 6, and break ends the loop.
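To confirm which iterations actually run, the break example can be wrapped in a function that records each (i, j) pair. The function name run_break_example is our own; the loop body is the one from the text.

```python
def run_break_example():
    """Rerun the break example and record each (i, j) pair visited."""
    visited = []
    j = 0
    for i in range(5):
        j = j + 2
        visited.append((i, j))
        print("i =", i, ", j =", j)
        if j == 6:
            break  # leave the loop immediately; i = 3 and i = 4 never run
    return visited


result = run_break_example()
print(result)  # [(0, 2), (1, 4), (2, 6)]
```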
3.10.5 Continue
When continue is executed, the rest of the loop body after it is skipped for the current iteration, and the loop moves on to the next iteration [6–14]. Example:
j = 0
for i in range(5):
    j = j + 2
    print('i =', i, ', j =', j)
    if j == 6:
        continue
    print('I will be skipped over if j=6')
Output:
i = 0 , j = 2
I will be skipped over if j=6
i = 1 , j = 4
I will be skipped over if j=6
i = 2 , j = 6
i = 3 , j = 8
I will be skipped over if j=6
i = 4 , j = 10
I will be skipped over if j=6
The line after the continue keyword is not printed when j = 6. Apart from that, every iteration runs as usual.
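To contrast continue with break, this sketch reruns the continue example and records which iterations skipped the final print. Unlike break, the loop still completes all five iterations. The two tracking lists are our own additions.

```python
skipped_for = []   # iterations where continue was executed
printed_for = []   # iterations that reached the final print
j = 0
for i in range(5):
    j = j + 2
    print("i =", i, ", j =", j)
    if j == 6:
        skipped_for.append(i)
        continue   # skip the rest of this iteration only
    printed_for.append(i)
    print("I will be skipped over if j=6")
```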
3.11 Try, Except
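The try statement determines how the program proceeds when an error (exception) occurs [8–15]. Syntax:
try:
    # Program code
except:
    # Handling of the exception (if required)
else:
    # Executed only if no exception occurred
finally:
    # Code that is always executed
The except, else, and finally clauses are optional, but a try statement needs at least one except or finally clause.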
Example:
a = 30
try:
    a = a / 0
    print(a)
except:
    print("An error occurred")
When you run the program, the message "An error occurred" is displayed. Because a number cannot be divided by zero, an error occurs when the program tries to execute a = a / 0 in the try block. The rest of the try block is skipped, and the statement in the except block is executed instead. To display more specific error messages depending on the problem, add the error type after the except keyword.
The try-except statement lets a program handle errors at run time instead of crashing; a bare except clause acts as a last resort that catches any unanticipated error. Common error types include [9–13]: 1. ValueError Raised when a built-in operation or function receives an argument of the right type but an inappropriate value. 2. ZeroDivisionError Raised when the program tries to divide by zero. 3. IOError Raised when an input/output (I/O) operation fails for an I/O-related reason, for example, "file not found". (In Python 3, IOError is an alias of OSError, and a missing file raises its subclass FileNotFoundError.)
4. ImportError Raised when an import statement fails to find or load a module. 5. IndexError Raised when a sequence index (string, list, or tuple) is out of range. 6. KeyError Raised when a dictionary key is not found. 7. NameError Raised when a local or global name is not found. 8. TypeError Raised when an operation or function is applied to an object of an inappropriate type. Each exception also carries a built-in error message. To display it, bind the exception to a name with the as keyword after the error type:
except Exception as e:
    print("Unknown error:", e)
Example:
def exception_handling(a):
    if a < 6:
        x = a / (a - 5)  # Raises ZeroDivisionError at a = 5
    print(x)             # Raises UnboundLocalError (a subclass of NameError) when a >= 6

try:
    exception_handling(3)
    exception_handling(5)
except ZeroDivisionError:
    print("ZeroDivisionError occurred and handled")
except NameError:
    print("NameError occurred and handled")
else:
    print("No exception occurred")
finally:
    print("The code has completed execution")
Output: -1.5, then ZeroDivisionError occurred and handled, then The code has completed execution (one per line).
3.12 Suites
In Python, a suite is a group of individual statements that together form a single code block. Compound statements such as if, while, def, and class consist of a header line followed by a suite [3–11].
3.13 Functions
A function is a named group of related statements that performs a computation, logic, or evaluation. Repetitive tasks are grouped into a single unit that can be reused many times with different inputs, as the program requires [3–14]. A function must be defined before it is called. The rules for defining a function in Python are as follows:
• A function block begins with the keyword def, followed by the function's name and a pair of parentheses. A colon (:) marks the end of the function header [3–11].
• Functions can accept arguments through parameters. Any input parameters are listed inside the parentheses of the function header [3–11].
• The statements that form the body of the function are indented beneath the function header to show that they belong to the function [3–11].
• A function can return a value to its caller with a return statement. A function without a return statement (or with a bare return) returns None; such a function behaves like what other languages call a subroutine or procedure, which is executed for its effects rather than for a result [3–11].
Functions are of two types: (a) Built-in (b) User-defined
Built-In [3–11] The Python programming language comes with a number of built-in functions, for example, abs(), hash(), and len(). 1. input() Syntax: variable_name = input("Message inside quotation marks")
Example: Name = input("Please enter your name: ")
The prompt "Please enter your name:" is shown on the screen to give the user instructions. The text the user types is returned as a string and stored in the variable Name. 2. print() The print() function displays information. It accepts zero or more expressions, separated by commas, as arguments. Syntax: print("message") (a) Simple Messages print("Hello World")
(b) Text with Variables Assume that name and age are two variables with some values. (i) print("My name is", name, "and I am", age, "years old.") (ii) The % formatting operator is another way to print a statement with variables: print("My name is %s and I am %s years old." % (name, age)) (iii) The format() method prints the same statement: print("My name is {} and I am {} years old.".format(name, age)) (c) Triple Quotes Triple quotes let a string span several lines, which improves the readability of long text. Example: print('''Hello World. My name is James and I am 20 years old.''')
Output: Hello World. My name is James and I am 20 years old.
User-Defined Python allows us to write and use our own functions. Syntax: 1. Making a function declaration and definition
def function_name(parameters):
    # code detailing what the function should do
    return [expression]
2. Making a function call
function_name(arguments)
def The def statement defines a function or method: it creates a function object and binds it to a name in the current namespace [3–11].
return Used to send a function's result back to the caller. It accepts zero or more comma-separated values (several values are returned together as a tuple), with None as the default; a function that does not explicitly return a value returns None [3–11].
parameter Formal parameters of a function can have default values. Mutable objects (such as lists or dictionaries) should not be used as default values, because a default value is evaluated only once, when the function is defined, and is then shared by every call. Parameters without default values must come before parameters with default values. Parameters should be listed in the following order, from left to right [3–11]:
• Normal arguments.
• Arguments with default values.
• Tuple: argument list (*args).
• Dictionary: keyword arguments (**kwargs).
arguments When calling a function, values can be passed as positional arguments or as keyword arguments. Positional arguments must appear before (to the left of) keyword arguments [6–11]. Example:
# Declaration and definition
def cars(name):
    print(name)

# Calling the function
cars('Rolls Royce')
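The points made about def, return, parameters, and arguments can be checked with a few small functions. The names min_max, add_item, and summarize are hypothetical examples; add_item uses the common None-default idiom to avoid sharing a mutable default between calls.

```python
def cars(name):
    print(name)               # no return statement


result = cars("Rolls Royce")  # prints: Rolls Royce
print(result)                 # None: the implicit return value


def min_max(values):
    """Return two values; Python packs them into a tuple."""
    return min(values), max(values)


def add_item(item, basket=None):
    # Use None rather than a mutable default such as [], because the default
    # is evaluated once, at definition time, and would be shared by all calls.
    if basket is None:
        basket = []
    basket.append(item)
    return basket


def summarize(label, precision=2, *args, **kwargs):
    """Normal, default, *args (a tuple), and **kwargs (a dict) parameters, in that order."""
    return label, precision, args, kwargs


low, high = min_max([4, 9, 1])
print(low, high)                               # 1 9
print(add_item("apple"), add_item("pear"))     # ['apple'] ['pear']
print(summarize("run", 3, 10, 20, unit="ms"))  # ('run', 3, (10, 20), {'unit': 'ms'})
print(summarize(label="run"))                  # ('run', 2, (), {})
```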
3.14 Scope of Variables
The scope of a variable defines where in the program a variable or identifier is accessible. There are two fundamental variable scopes in Python: global and local [8–17].
(a) Global Variables. Global variables are defined outside any function or code block, at the top level of a module. To assign a new value to a global variable inside a function, you must first declare it with the global statement, for example:
global variable_name
variable_name = value
(b) Local Variables. By default, assignment in a function or method creates a local variable, and any other binding operation (such as a function parameter or a for loop variable) also creates a local variable. Local variables are accessible only within the code block in which they are defined. For example:
# Global variable
x = 30

# Simple function to multiply two numbers
def product(y):
    # Local variable
    z = x * y
    return z

# Call the function and print the result
print(product(15))
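The following minimal sketch shows why the global statement matters: increment() rebinds the global counter, while shadow() assigns to a name of the same spelling and therefore creates a separate local variable, leaving the global unchanged.

```python
counter = 0  # global variable


def increment():
    global counter  # required to rebind the global name inside the function
    counter += 1


def shadow():
    counter = 100   # assignment creates a new local variable; the global is untouched
    return counter


increment()
increment()
print(counter)   # 2
print(shadow())  # 100
print(counter)   # 2
```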
4 Working with Popular Libraries (Modules)
4.1 Modules
A Python module is a file that contains Python source code; it is a piece of code that can be shared and imported. When a module is imported for the first time, its code is executed and a module object is created. That single module object is shared by all modules that import it. The module object has attributes [8–17]; some particularly useful ones are:
• __doc__: the module's documentation string.
• __name__: the module's name when it is imported, or the string "__main__" when the file is run as a script.
• The other names that the module defines (binds).
Python looks for modules in these places:
• The directories listed in sys.path, which is built from the directory of the script being run and the two locations below.
• The directories listed in the PYTHONPATH environment variable.
• Standard, installation-dependent locations (such as the standard library).
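The module attributes and the shared module object can be inspected directly. This minimal sketch uses the standard math module; the main() function is a hypothetical entry point illustrating the common if __name__ == "__main__" idiom.

```python
import importlib
import math
import sys

print(math.__name__)                 # math
print(math.__doc__.splitlines()[0])  # first line of the module's docstring
print(__name__)                      # "__main__" when this file is run as a script
print(sys.path[:3])                  # the first few locations searched for modules

# A module is executed once; later imports reuse the cached module object.
math_again = importlib.import_module("math")
print(math_again is math)            # True
print("math" in sys.modules)         # True


def main():
    print("Running as a script")


if __name__ == "__main__":
    main()  # skipped when this file is imported as a module
```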
4.2 Packages
A package is a directory on the file system that contains a file named __init__.py. The __init__.py file marks the directory as a package, so that users can import the package and the modules inside it; it is executed when an application first imports something from that package [8–17]. For example: from package_name import *
Simple Package Installation:
• python setup.py build
• python setup.py install # as root
(Calling setup.py directly is deprecated in current versions of setuptools; running pip install . in the package's root directory is the recommended equivalent.)
Complex Package Installation:
• Look for a README or INSTALL file in the package's root directory.
• pip is increasingly the standard tool for installing and maintaining Python packages.
• If you are concerned that installing a new package might change the behavior of existing packages on your computer, use a virtual environment (created with virtualenv or the standard-library venv module). It creates a directory containing a separate Python environment in which you can install and use packages without changing your main Python installation.
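To see when __init__.py runs, this sketch builds a throwaway package in a temporary directory and imports it. The package name demo_pkg and its tools module are hypothetical; in practice a package would live in your project or be installed with pip.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway package on disk (the name demo_pkg is hypothetical).
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(
    "print('initializing demo_pkg')\n"
    "from .tools import double\n"
)
(pkg_dir / "tools.py").write_text("def double(x):\n    return 2 * x\n")

sys.path.insert(0, str(root))  # make the parent directory searchable
importlib.invalidate_caches()  # recommended after creating modules at run time

demo_pkg = importlib.import_module("demo_pkg")  # runs __init__.py once
print(demo_pkg.double(21))                      # 42

from demo_pkg.tools import double               # import a module inside the package
print(double(5))                                # 10
```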
4.3 NumPy
The NumPy package is the foundation of scientific computing in Python. It provides a high-performance multidimensional array object and functions for manipulating it. It is the successor of the older Numeric package [4–17]. To include NumPy, import numpy as np
Array
1. Create a one-dimensional array. a = np.array([0, 1, 2])
2. Check the shape (size along each dimension) of an array. print(a.shape)
3. Change an array element. a[0] = 5
4. Create a multidimensional array. b = np.array([[0, 1, 2], [3, 4, 5]])
5. Create a 3x3 array of all zeros. a = np.zeros((3, 3)) Output: [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] # For ones, use a = np.ones((3, 3))
6. Create a 3x3 constant array. c = np.full((3, 3), 7) Output: [[7 7 7] [7 7 7] [7 7 7]]
7. Create a 3x3 array filled with random values. d = np.random.random((3, 3))
8. Create a 3x3 identity matrix. e = np.eye(3)
9. Convert a list to an array. f = np.array([2, 3, 1, 0])
10. Create an array of evenly spaced values from 0 to 19. g = np.arange(20)
11. Create a range array with a float data type. i = np.arange(1, 8, dtype=float) (the np.float alias used in older code was removed in NumPy 1.24)
12. Element-wise sum. np.add(x, y)
13. Inner product of vectors. np.dot(a, b)
14. Not a number. np.nan
15. Transpose an array. a.T
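Most of the numbered NumPy operations above can be run in one short script. This is a minimal sketch; the vectors x and y are illustrative, and it uses NumPy's seeded random generator (np.random.default_rng) instead of np.random.random so that the random values are reproducible.

```python
import numpy as np

a = np.array([0, 1, 2])
a[0] = 5                            # change an element in place
b = np.array([[0, 1, 2], [3, 4, 5]])
print(a.shape, b.shape, b.ndim)     # (3,) (2, 3) 2

z = np.zeros((3, 3))                # float zeros
c = np.full((3, 3), 7)              # an integer fill value gives an integer array
e = np.eye(3)                       # 3x3 identity matrix
g = np.arange(20)                   # 0, 1, ..., 19
i = np.arange(1, 8, dtype=float)    # 1.0, 2.0, ..., 7.0; use float, not np.float

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(np.add(x, y))                 # [5. 7. 9.]
print(np.dot(x, y))                 # 32.0 (1*4 + 2*5 + 3*6)
print(b.T.shape)                    # (3, 2)
print(np.isnan(np.nan))             # True; note that np.nan == np.nan is False

rng = np.random.default_rng(seed=0)  # seeded generator for reproducible values
d = rng.random((3, 3))               # uniform random values in [0, 1)
```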
4.4 Pandas
Pandas introduces two new NumPy-based data structures to Python: Series and DataFrame [4–17]. To include Pandas, import pandas as pd
1. Series A one-dimensional object that works like a column in a spreadsheet or SQL table. Values can be duplicated. pd.Series([1, 2, 3, 4, 5, 6], index=['A', 'B', 'C', 'D', 'E', 'F']) 2. DataFrame A two-dimensional object that looks like a spreadsheet or SQL table. All columns must have the same length. Example: data = {'Gender': ['F', 'M', 'M'], 'Age': [25, 27, 25]} df = pd.DataFrame(data, columns=['Gender', 'Age'])
3. Reading Data (a) Single Files
df = pd.read_csv('Data/file_name.csv') # from a CSV file
df = pd.read_csv('Data/file_name.txt', sep='\t') # from a tab-separated text file
df = pd.read_excel('Data/file_name.xlsx', 'Sheet2') # from Excel
(b) Multiple Sheets from One Excel File
Excel_file = pd.ExcelFile('file_name.xls')
sheet1_df = pd.read_excel(Excel_file, 'Sheet1')
sheet2_df = pd.read_excel(Excel_file, 'Sheet2')
4. Writing Data
df.to_csv('Data/mtcars_new.csv', index=False)
df.to_csv('Data/mtcars_new.txt', sep='\t', index=False)
df.to_excel('Data/mtcars_new.xlsx', sheet_name='Sheet1', index=False)
Note: (a) The index=False argument prevents the index values from being written; the default is True. (b) By default, these write methods overwrite any existing file with the same name. 5. Statistical Methods (a) describe() Returns summary statistics for each numeric column of the DataFrame: count, mean, std (standard deviation), min, first quartile (25%), median (50%), third quartile (75%), and max. Syntax: df.describe() (b) cov() Covariance describes how two variables vary together. A positive covariance indicates that the variables tend to increase together, whereas a negative covariance indicates that one tends to decrease as the other increases. df.cov() computes the pairwise covariances of the columns. Syntax: df.cov()
(c) corr() Correlation measures how strongly two variables are related. df.corr() computes the pairwise correlation coefficients of the columns (Pearson by default), which range from -1 to 1. Syntax: df.corr() 6. Viewing Data I. head View the first n records. The default value of n is 5. Syntax: df.head(n=value) II. tail View the last n records (default 5). Syntax: df.tail() III. columns Get the column names. Syntax: df.columns IV. dtypes Get the data type of each column. Syntax: df.dtypes V. values Get the data as a NumPy array. Syntax: df.values VI. rename Change the names of columns. Syntax: Specific column df.rename(columns={'old_columnname': 'new_columnname'}, inplace=True) All columns df.columns = ['col1_new_name', 'col2_new_name', ...] VII. drop_duplicates Remove duplicate rows. Syntax: Compare all columns: df = df.drop_duplicates() Compare specific columns only: df.drop_duplicates(['column_name']) Duplicates are then identified in the specified column, and keep='first' (the default) or keep='last' chooses which row of each duplicate set is kept: df.drop_duplicates(['column_name'], keep='first') VIII. dropna Remove rows that contain missing values (use axis=1 to remove columns instead). Syntax: df.dropna()
IX. fillna Replace all missing values with a given value. Syntax: df.fillna(value=numeric_value) X. mean Return the mean of each column. Syntax: df.mean() XI. max/min Return the maximum/minimum value of each column. Syntax: df.max() df.min() XII. sum Return the sum of each column. Syntax: df.sum() XIII. concat Concatenate two or more DataFrames. Syntax: df = pd.concat([df_1, df_2]) XIV. merge Merge two DataFrames on a common column. Syntax: df = pd.merge(df_1, df_2, on='common_column', how='inner'), where how is one of 'left', 'right', 'inner', or 'outer'. XV. count Return the number of non-missing values in each column. Syntax: df.count()
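The following sketch exercises many of the pandas operations above on a small in-memory DataFrame, so it runs without the Data/ files used in the text. The four-row data set and the labels table are illustrative, and an io.StringIO buffer stands in for a CSV file on disk.

```python
import io

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=["A", "B", "C", "D", "E", "F"])
data = {"Gender": ["F", "M", "M", "F"], "Age": [25, 27, 25, np.nan]}
df = pd.DataFrame(data, columns=["Gender", "Age"])

print(df.head(2))            # first two rows
print(df.dtypes)             # Gender: object, Age: float64 (NaN forces float)
print(df["Age"].describe())  # count, mean, std, min, quartiles, max
print(df["Age"].mean())      # mean of 25, 27, 25; NaN is skipped by default
print(df.count())            # non-missing values per column: Gender 4, Age 3

clean = df.dropna()                    # drops the row with Age = NaN
filled = df.fillna(value=0)            # replaces NaN with 0
deduped = df.drop_duplicates(subset=["Gender"], keep="first")
renamed = df.rename(columns={"Age": "age_years"})

labels = pd.DataFrame({"Gender": ["F", "M"], "Label": ["female", "male"]})
merged = pd.merge(df, labels, on="Gender", how="left")
stacked = pd.concat([df, df], ignore_index=True)

# Write to and read back from an in-memory CSV "file" instead of a path on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(len(clean), len(deduped), len(merged), len(stacked), len(round_trip))  # 3 2 4 8 4
```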
4.5 Matplotlib
Matplotlib is a library for visualizing data in graphical form. It builds on NumPy, Python's numerical mathematics extension. Charts help analysts and decision-makers visualize their analyses, understand difficult topics, and detect new trends [4–17]. To include Matplotlib, import matplotlib.pyplot as plt
df.hist() # Histogram (pandas plotting, drawn with Matplotlib)
df.plot() # Line graph
df.boxplot() # Box plot
plt.bar(x, y) # Bar plot; x and y are the x- and y-axis values
plt.scatter(x, y) # Scatter plot
plt.plot(x, y, label='Sample Label') # Line plot with a legend label
plt.title('Sample Plot Title') # Chart title
plt.xlabel('x axis label') # x-axis title
plt.ylabel('y axis label') # y-axis title
plt.grid(True) # Show gridlines
plt.savefig('filename.png') # Save the chart to a file (call before plt.show())
plt.show() # Display the plotted figure
4.6 Scikit-Learn
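Scikits (SciPy toolkits) are add-on packages built on SciPy, and scikit-learn is the one devoted to machine learning. It is designed to work with Python's NumPy and SciPy numerical and scientific libraries, and it includes classification, regression, and clustering algorithms such as support vector machines, random forests, gradient boosting, k-means, and DBSCAN [4–17]. To include scikit-learn (its submodules are imported explicitly):
import sklearn
import sklearn.cluster          # Built-in clustering algorithms and functions
import sklearn.datasets         # Built-in datasets and dataset loaders
import sklearn.linear_model     # Linear models and related functions
import sklearn.naive_bayes      # Naive Bayes models
import sklearn.neighbors        # Nearest neighbors models
import sklearn.neural_network   # Neural network models
import sklearn.svm              # Support vector machine models
import sklearn.tree             # Decision tree models
import sklearn.preprocessing    # Preprocessing and normalization techniques
import sklearn.ensemble         # Ensemble methods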
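A minimal sketch that touches three of the modules listed above (datasets, preprocessing, and ensemble). It also uses sklearn.model_selection and sklearn.pipeline, which are not in the list. The iris data ships with scikit-learn, so nothing is downloaded; the split ratio and random seeds are arbitrary choices. Random forests do not need feature scaling, so the scaler is included only to show where a preprocessing step goes in a pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)            # built-in iris flower dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = make_pipeline(
    StandardScaler(),                        # sklearn.preprocessing
    RandomForestClassifier(random_state=0),  # sklearn.ensemble
)
model.fit(X_train, y_train)                  # scaling and training in one call
accuracy = model.score(X_test, y_test)       # fraction of correct predictions
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the steps in a pipeline keeps the scaler from ever seeing the test data during fitting.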
4.7
TensorFlow
TensorFlow is a free and open-source library created by the Google Brain team. It is designed to operate on tf.Tensor objects, which are multidimensional arrays, and it aims to make machine learning faster and easier by letting Python programs describe numerical computations as data flowing through a series of operations. The name reflects this design: a tensor is a multidimensional array, and the flow is the path the data takes through the system [4–17]. To include TensorFlow:
import tensorflow as tf
tf.transpose(data)                                   # Transpose the given tensor
tf.concat([data_1, data_2, data_3], axis=value)      # Concatenate tensors along an axis
tf.Variable([0.0, 0.0, 0.0])                         # Mutable tensor for state such as model weights
tf.keras                                             # Keras API bundled with TensorFlow
tf.keras.datasets.mnist.load_data()                  # Load the MNIST dataset (TensorFlow 2.x)
4.8
Keras
Keras is a deep learning API written in Python that works on top of the TensorFlow machine learning framework. It was designed to enable fast experimentation [4–17]. To include Keras:
from tensorflow import keras
keras.models    # Keras models (e.g., Model, Sequential)
keras.layers    # Keras layers (e.g., Dense, Input, Conv2D)
keras.utils     # Keras utilities (e.g., plot_model)
4.9
PyTorch
PyTorch is a deep learning library for Python, originally developed by Facebook's AI Research lab. It is production-ready, with cloud support, a robust ecosystem, and distributed training [4–17]. To include PyTorch:
import torch
torch.tensor([value])                 # Define a tensor
torch.randn(value_1, value_2, ...)    # Tensor of the given shape filled with standard-normal random values
torch.autograd                        # Automatic differentiation
torch.optim                           # Optimization algorithms (e.g., SGD, Adam)
torch.nn                              # Neural network layers and containers (e.g., Sequential, Linear)
4.10
Pattern
Pattern is more of a “full suite” library: in addition to machine learning algorithms, it contains tools for collecting and analyzing data. Its data mining module can collect information from sources such as Google, Twitter, and Wikipedia, and it also provides a web crawler and an HTML DOM parser. Having these tools in one package means that data gathering and model training can be done in the same program. Because of the many functions it provides, it has proved useful for NLP, clustering, and classification, and it is most typically used for text processing (sentiment analysis, modality, spell checking, etc.) [4–17].
Google mining:
from pattern.web import Google
google = Google()
Twitter mining:
from pattern.web import Twitter
twitter = Twitter()
Other important Python libraries for machine learning include the Natural Language Toolkit (NLTK), mlpack, and OpenCV.
5 Advanced Concepts
5.1
Decorators
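A decorator is a callable that takes a function (or method) and returns a modified version of it, changing the function's behavior without editing its body. A decorator is applied by writing the “@” character followed by the decorator's name on the line immediately preceding the function definition [7–17]. Built-in examples include @classmethod, @staticmethod, and @property. For example, inside a class body:
class MyClass(object):
    @classmethod
    def HelloClass(cls, arg):
        pass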
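To show what a decorator does under the hood, here is a minimal custom one. The name count_calls and the gc_content function are hypothetical examples, not built-ins: the decorator wraps a function and counts how often it is called, and functools.wraps copies the wrapped function's name and docstring onto the wrapper.

```python
import functools


def count_calls(func):
    """Decorator that records how many times func has been called."""
    @functools.wraps(func)            # keep func's __name__ and __doc__
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls                          # same as: gc_content = count_calls(gc_content)
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


print(gc_content("ATGC"))    # 0.5
print(gc_content("GGCC"))    # 1.0
print(gc_content.calls)      # 2
print(gc_content.__name__)   # gc_content
```

The @ line is only shorthand: the decorated name is rebound to whatever the decorator returns.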
5.2
Lambda
A lambda creates a small anonymous function. It can take any number of arguments, but its body is a single expression whose value is returned; to return several results, the expression can build a tuple. In some cases, a lambda can be used as an event handler or callback [7–17]. A lambda is a convenient choice when the required function:
• Is anonymous (does not need a name).
• Consists of a single expression, with no statements.
For example: fn = lambda x, y, z: (x ** 2) + (y * 2) + z
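A typical use is passing a lambda as the key argument of sorted(), where a throwaway one-line function is all that is needed. The (gene, p-value) pairs below are hypothetical.

```python
# Hypothetical (gene, p_value) pairs from a differential-expression test
results = [("TP53", 0.04), ("BRCA1", 0.001), ("MYC", 0.2)]

# The lambda extracts the second element (the p-value) as the sort key
ranked = sorted(results, key=lambda pair: pair[1])

# The lambda from the text, called like any other function
fn = lambda x, y, z: (x ** 2) + (y * 2) + z

print(ranked)        # [('BRCA1', 0.001), ('TP53', 0.04), ('MYC', 0.2)]
print(fn(3, 4, 5))   # 9 + 8 + 5 = 22
```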
5.3
Iterators and Generators
Iterator An iterator is an object that can be used in a for statement because it follows the iterator protocol [7–17].
Iterator Protocol An object satisfies the iterator protocol when [7–17]:
• It has an __iter__ method that returns an iterator object (an iterator returns itself).
• It has a __next__ method, called by the built-in next() function, that returns the next item from a collection, sequence, stream, or other source of objects.
• When the items are exhausted, calling next() raises the StopIteration exception.
Generator A generator is an iterator created by a generator function, that is, a function whose body contains a yield expression. (An iterator can also be written by hand as a class that implements the protocol.) [7–17]
Yield Each yield hands a value back to the caller and suspends the function; when the next item is requested, execution resumes immediately after that yield. Because such functions can be suspended and resumed many times, they also form the basis of coroutines [7–17]. Example:
def mygenerator(items):
    for item_name in items:
        yield item_name
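The same task written both ways: iterating over the k-mers (overlapping substrings of length k) of a DNA sequence, first with a class that implements the iterator protocol by hand and then with a generator function. The names KmerIterator and kmers are illustrative.

```python
class KmerIterator:
    """Iterator over the overlapping k-mers of a sequence."""

    def __init__(self, seq, k):
        self.seq = seq
        self.k = k
        self.pos = 0

    def __iter__(self):
        return self                     # an iterator returns itself

    def __next__(self):
        if self.pos + self.k > len(self.seq):
            raise StopIteration         # signals that the items are exhausted
        kmer = self.seq[self.pos:self.pos + self.k]
        self.pos += 1
        return kmer


def kmers(seq, k):
    """Generator version: yield provides the same protocol automatically."""
    for pos in range(len(seq) - k + 1):
        yield seq[pos:pos + k]


print(list(KmerIterator("ATGCA", 3)))   # ['ATG', 'TGC', 'GCA']
print(list(kmers("ATGCA", 3)))          # ['ATG', 'TGC', 'GCA']

gen = kmers("ATG", 2)
print(next(gen))                        # AT
print(next(gen))                        # TG
# One more next(gen) would raise StopIteration
```

The generator needs no explicit state or StopIteration: the local variables are kept between calls, and returning from the function ends the iteration.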
5.4
Classes
Classes model things, often things from the “real” world. Methods implement the behavior of these objects, and member variables hold their state. Classes in Python let us construct new data types. When a class is defined, the class statement creates a class object and binds it to a name [7–17]. For example:
class A(object):
    pass
Defining Methods A method is a function defined in the scope of a class. Its first parameter, conventionally named self, must be declared explicitly; it refers to the current object, the one whose method is being run [7–17]. For example:
class B(object):
    def show(self):
        print('hello from B')
Constructor The __init__ method acts as the constructor (strictly, the initializer): Python calls it automatically when a new instance is created. For example:
class A(object):
    def __init__(self, name):
        self.name = name
Member Variables Member variables are created by assignment, usually inside __init__. Using None as the default below, rather than [], avoids sharing one mutable list among all instances. For example:
class A(object):
    def __init__(self, items=None):
        if items is None:
            self.items = []
        else:
            self.items = items
Calling Methods
• Use the dot operator on an instance.
• A method can be invoked from the same class or from a superclass.
Outside the class, use an instance:
object_1.method_1()
matrix[1].method_2()
From within the same class, use self:
self.a_method()
When a method in the superclass has the same name as a method in the current class (for example, __init__), you can still invoke the superclass version from within the subclass, either through the superclass name or with the built-in super() function, which is the usual way to extend inherited behavior:
Super_Class_Name.__init__(self, arg1, arg2)
super().__init__(arg1, arg2)   # Python 3 form of super(Present_Class, self).__init__(arg1, arg2)
Class Variables
• Also known as static data.
• A class variable is shared by all instances of the class.
• It is referenced as ClassName.variable.
Note: Assigning self.variable = x creates a new member (instance) variable on that instance, which shadows the class variable rather than changing it.
Class Methods and Static Methods
Instance (plain) methods: The instance is passed as the first argument to an instance method.
Class Methods
• The class is passed as the first argument to a class method.
• Class methods can be defined using the built-in function classmethod() or the decorator @classmethod.
Static Methods
• Neither the instance nor the class is passed as the first argument to a static method.
• Static methods can be defined using the built-in function staticmethod() or the decorator @staticmethod.
File Input and Output
A file object opened for reading in text mode is an iterable: iterating over it yields the lines of the file.
• line.rstrip() removes the newline and any other whitespace from the right side of each line. Use line.rstrip('\n') to remove only the newline and not other whitespace.
• file_obj.read(), file_obj.readline(), and file_obj.readlines() are other ways to read from a file/stream object.
• Unlike print(), the write() method does not add newline characters automatically.
• Written output may be buffered until the file is closed; call file_obj.flush() to force it out without closing the file.
• The with statement closes files automatically. For example:
def test():
    # 'w' denotes write to file and 'r' denotes read from file
    with open('filename.txt', 'w') as f:
        for ch in 'abcdefg':
            f.write(ch * 10)
            f.write('\n')
    # No f.close() is needed: leaving the with block closes the file

test()  # function call
6 Optimization Machine learning can be framed as function approximation: predicting outputs for new data by estimating the unknown underlying function that maps input examples to output examples. Function optimization, by contrast, seeks the set of inputs that produces the minimum or maximum value of an objective function. Function approximation is usually carried out through function optimization, since optimization is the easier problem to pose and solve: fitting a model typically means searching for the parameter values that minimize a loss function [9–17].
6.1
SciPy
SciPy, the open-source Python toolbox for scientific computing, includes a collection of optimization algorithms [4–17]:
• Scalar optimization: To optimize a convex function of a single variable.
• Local search: To optimize a unimodal function of multiple variables.
• Global search: To optimize a multimodal function of multiple variables.
• Least squares: To solve linear and nonlinear least squares problems.
• Curve fitting: To fit a curve to a data sample.
• Root finding: To locate the roots of a function.
• Linear programming: To optimize a linear objective subject to linear constraints.
All of these routines assume that the objective function is to be minimized. A maximization problem can be converted into a minimization problem by negating the values returned by the objective function. To include SciPy:
import scipy.optimize   # Methods for minimizing (or, by negation, maximizing) objective functions, possibly subject to constraints
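A minimal sketch of two of these tools: curve fitting, applied to the Michaelis-Menten rate law, and scalar optimization, used to maximize a function by minimizing its negative. The substrate concentrations are hypothetical and the rates are generated without noise, so the fit should recover the parameters used to generate them; p0 is an arbitrary starting guess.

```python
import numpy as np
from scipy.optimize import curve_fit, minimize_scalar


def michaelis_menten(s, vmax, km):
    """Reaction rate as a function of substrate concentration s."""
    return vmax * s / (km + s)


# Hypothetical, noise-free measurements generated with vmax = 10, km = 2
substrate = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
rate = michaelis_menten(substrate, 10.0, 2.0)

params, _ = curve_fit(michaelis_menten, substrate, rate, p0=[5.0, 1.0])
vmax_hat, km_hat = params                     # estimated parameters


def objective(x):
    """Hypothetical function to maximize: peak value 4 at x = 3."""
    return -(x - 3.0) ** 2 + 4.0


res = minimize_scalar(lambda x: -objective(x))   # minimize the negative
x_best, f_best = res.x, -res.fun                 # undo the sign flip
print(vmax_hat, km_hat, x_best, f_best)
```

With real, noisy data the second value returned by curve_fit (the covariance matrix, ignored here) gives an estimate of the parameter uncertainties.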
6.2
Local Search with SciPy
Local search, also known as local function optimization, refers to algorithms that seek the input to a function that gives the lowest or highest output, assuming that the function or constrained region being searched has a single optimum.
The SciPy library's minimize() function performs local search. It takes the objective function to be minimized (as a callable) and the starting point of the search, and it returns an OptimizeResult object that reports whether the search succeeded or failed and, if a solution was found, its details [9–17]. For example:
from scipy.optimize import minimize
result = minimize(objective, point)
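A runnable version of the call above, with a hypothetical unimodal objective whose minimum is known to be at (1, -2). Passing method="BFGS" is an explicit choice; for unconstrained problems without bounds, minimize() also selects BFGS by default.

```python
import numpy as np
from scipy.optimize import minimize


def objective(v):
    """A unimodal bowl with its minimum at (1, -2)."""
    x, y = v
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


point = np.array([5.0, 5.0])                 # starting point of the local search
result = minimize(objective, point, method="BFGS")

print(result.success)   # whether the search reports convergence
print(result.x)         # approximately [1, -2]
print(result.fun)       # approximately 0
```

Because the function has a single optimum, any starting point leads to the same answer; that guarantee disappears for multimodal functions.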
6.3
Global Search with SciPy
Global search, or global function optimization, refers to approaches that seek the input to a function that results in the lowest or highest output when the function or constrained region being searched is thought to have many local optima. Global search algorithms are often stochastic, meaning that they use randomness in the search process, and they may or may not maintain a population of candidate solutions. Each algorithm returns an OptimizeResult object that reports the success or failure of the search and, if a solution was found, its details [9–17].
Optimization problems in machine learning that can be automated with global search include:
(a) Data Preparation Choices include how to scale values, how to handle missing values, and how to transform the probability distributions of variables. An exhaustive search or a grid search over these choices can be used to optimize and automate them.
(b) Hyperparameter Tuning Although the general effect of many hyperparameters is known, how they will affect the final model's performance on a specific dataset is uncertain. As a result, it is standard practice to evaluate a range of values for an algorithm's critical hyperparameters. Bayesian optimization is a common choice here.
(c) Model Selection This comprises choosing the machine learning algorithm or pipeline that will produce the model. The chosen approach is then used to train a final model, which can subsequently make predictions on new data in the target application. As with hyperparameter tuning, a global search strategy that also builds an approximation of the objective function, such as Bayesian optimization, is typically used.
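SciPy's global optimizers include differential_evolution, dual_annealing, basinhopping, and shgo. A minimal sketch with differential_evolution (a stochastic, population-based method) on the two-dimensional Ackley function, a standard multimodal test problem with many local minima and a global minimum of 0 at (0, 0). The bounds and seed are arbitrary choices.

```python
import numpy as np
from scipy.optimize import differential_evolution


def ackley(v):
    """Ackley test function in two dimensions; global minimum 0 at (0, 0)."""
    x, y = v
    term1 = -20.0 * np.exp(-0.2 * np.sqrt(0.5 * (x ** 2 + y ** 2)))
    term2 = -np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
    return term1 + term2 + np.e + 20.0


bounds = [(-5.0, 5.0), (-5.0, 5.0)]              # search region for each variable
result = differential_evolution(ackley, bounds, seed=1)

print(result.success, result.x, result.fun)      # x close to [0, 0], fun close to 0
```

Setting seed makes the stochastic search reproducible; a local method such as minimize() started far from the origin may instead stop in one of the surrounding local minima.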
References
1. Tulchak, L. V., & Marchuk, A. O. (2016). History of python (Doctoral dissertation, ВНТУ).
2. Nosrati, M. (2011). Python: An appropriate language for real world programming. World Applied Programming, 1(2), 110–117.
3. Kuhlman, D. (2009). A Python book: Beginning python, advanced python, and python exercises (pp. 1–227). Dave Kuhlman.
4. Sheridan, C. (2016). The python language reference manual. Lulu Press, Inc.
5. Beazley, D., & Jones, B. K. (2013). Python cookbook: Recipes for mastering python 3. O’Reilly Media.
6. Ramalho, L. (2015). Fluent python: Clear, concise, and effective programming. O’Reilly Media.
7. Lutz, M. (2010). Programming python: Powerful object-oriented programming. O’Reilly Media.
8. Albon, C. (2018). Machine learning with python cookbook: Practical solutions from preprocessing to deep learning. O’Reilly Media.
9. Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learning (Vol. 4, No. 4, p. 738). Springer.
10. Swaroop, C. H. (2013). A byte of python. Independent.
11. Hellmann, D. (2011). The python standard library by example (p. 1302). Addison-Wesley.
12. Theobald, O. (2017). Machine learning for absolute beginners: A plain English introduction (Vol. 157). Scatterplot Press.
13. Robinson, S. (2017). The best machine learning libraries in python. Stack Abuse.
14. Matthes, E. (2019). Python crash course: A hands-on, project-based introduction to programming. No Starch Press.
15. Shaw, Z. A. (2017). Learn python 3 the hard way: A very simple introduction to the terrifyingly beautiful world of computers and code. Addison-Wesley Professional.
16. Dhruv, A. J., Patel, R., & Doshi, N. (2021). Python: The most advanced programming language for computer science applications. In Proceedings of the international conference on culture heritage, education, sustainable tourism, and innovation technologies (CESIT 2020) (pp. 292–299).
17. Siva Jyothi, P. N., & Yamaganti, R. (2019). A review on python for data science, machine learning and IOT. International Journal of Scientific & Engineering Research, 10.
Basic Mathematics Hussam Bin Mehare, Jishnu Pillai Anilkumar, and Iqbal Hasan
H. B. Mehare (✉) Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
J. P. Anilkumar Department of Computer Science & Engineering, Presidency University, Bengaluru, Karnataka, India
I. Hasan National Informatics Centre, Ministry of Electronics IT and Telecommunications, Government of India, Research Scholar, JMI New Delhi, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_3
1 Overview Machine learning uses the language of mathematics to convey its ideas. It seeks to provide general-purpose approaches for extracting meaningful patterns from data, ideally with little domain-specific knowledge. Mathematics supplies the concepts underpinning machine learning algorithms and helps in choosing the best approach by taking accuracy, training time, model complexity, and the number of features into consideration [1–3]. The mathematical foundations of machine learning include linear algebra, calculus, and statistics. Linear algebra is fundamental because data are represented as vectors and matrices. Statistics is essential for analyzing the outputs of learning algorithms and understanding data distributions. Calculus explains how the learning process works on the inside [1–3]. Mathematics provides the foundation for addressing real-world commercial and data-driven applications. Calculus is the branch of mathematics that analyzes how quantities change; in machine learning, it is used to improve the efficiency of algorithms and models. Without these mathematical concepts, and statistics in particular, we could not evaluate probabilities on data or draw credible conclusions from the information we have. The core themes of calculus include limits, derivatives, integrals, and functions. The focus of linear algebra is computation. It is used in deep learning and is essential for understanding the theory behind machine learning. It gives us a better grasp of how algorithms function in practice, helping us make better decisions.
2 Linear Algebra Basics Numerical data is represented as vectors, and a table containing such data is defined as a matrix. Linear algebra is the study of vectors, matrices, and the principles that manipulate them.
2.1
Scalars
A scalar is a quantity that is fully described by its magnitude, a single number. Scalars can be manipulated with the ordinary rules of algebra. Examples include volume, density, mass, energy, speed, and time [1–8].
2.2
Tensors
In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. For a tensor A with three axes, the element at coordinates (i, j, k) is denoted $A_{i,j,k}$ [1–8].
2.3
Vectors
A vector is a quantity with both magnitude and direction; equivalently, it is an ordered array of numbers. A vector is written as $\vec{a}$ and read as “vector a,” and its magnitude is written $\|a\|$. Because the numbers are arranged in order, each one can be identified by its index in that ordering.
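In code, scalars, vectors, matrices, and tensors are commonly represented as NumPy arrays that differ only in their number of axes (an assumption about tooling; the text does not prescribe a library). The sketch below also computes a vector's magnitude and reads an element by its index.

```python
import numpy as np

scalar = np.float64(3.5)                 # a single number: 0 axes
vector = np.array([3.0, 4.0])            # 1 axis
matrix = np.array([[1, 2], [3, 4]])      # 2 axes (rows, columns)
tensor = np.zeros((2, 3, 4))             # 3 axes; element tensor[i, j, k]

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, np.ndim(arr), np.shape(arr))   # number of axes and shape

print(np.linalg.norm(vector))   # magnitude ||a|| = sqrt(3**2 + 4**2) = 5.0
print(vector[1])                # element at index 1 (Python counts from 0): 4.0
```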
2.3.1
Geometric Vectors
By interpreting vectors as geometric vectors, we can reason about mathematical operations using our intuitions about direction and magnitude. For example, adding two vectors gives another vector: $\vec{x} + \vec{y} = \vec{z}$.
2.3.2
Polynomials
Polynomials are also vectors: adding two polynomials produces another polynomial, and multiplying a polynomial by a scalar $\lambda \in \mathbb{R}$ produces another polynomial. Elements of $\mathbb{R}^n$ (tuples of n real numbers) behave in the same way [1–8].
Note:
• Audio signals are represented as series of numbers. Adding two audio signals produces another audio signal, and scaling an audio signal also produces an audio signal; audio signals are therefore vectors as well.
• The dot product between two vectors x and y of the same dimensionality is the matrix product $x^{\top} y$ [1–8].
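The vector operations above, sketched with NumPy arrays. Representing a polynomial by its coefficient array (in increasing powers of t) is an illustrative assumption; it shows that adding polynomials amounts to adding vectors.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

z = x + y              # vector addition: another vector of the same length
scaled = 2.0 * x       # scalar multiplication: another vector
dot = x @ y            # dot product x^T y = 1*4 + 2*(-1) + 3*0.5 = 3.5

# Polynomials as coefficient vectors: p(t) = 1 + 2t and q(t) = 3 - t
p = np.array([1.0, 2.0])
q = np.array([3.0, -1.0])
r = p + q              # r(t) = 4 + t, again a polynomial
print(z, scaled, dot, r)
```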
2.4
Systems of Linear Equations
Systems of linear equations are an important part of linear algebra. Many problems can be expressed as systems of linear equations, and linear algebra provides the means to solve them. A linear equation in two variables can be pictured in two dimensions, and one in three variables in three dimensions. There are two families of methods for solving simultaneous linear equations: graphical methods and algebraic methods [1–8]. General form:
$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$
where $a_{ij} \in \mathbb{R}$ and $b_i \in \mathbb{R}$ are known constants and $x_1, \ldots, x_n$ are the unknowns.
In general, a real-valued system of linear equations has either no solution, exactly one solution, or infinitely many solutions. There are several methods for solving a system of linear equations:
2.4.1
Algebraic Method
The algebraic methods are a family of techniques for solving a pair of linear equations in two variables [1–8].
2.4.1.1
Substitution Method
The substitution method is an algebraic approach for solving simultaneous linear equations. As the name implies, the value of one variable obtained from one equation is substituted into the other equation. The pair of linear equations is thereby reduced to a single linear equation in one variable, which can then be solved readily.
• Simplify the given equations by expanding any parentheses or brackets.
• Solve one of the equations for either x or y.
• Substitute the solution from step 2 into the other equation.
• Solve the resulting equation, which now has only one variable, using basic arithmetic operations.
• Finally, substitute this value back into either equation to find the value of the second variable.
Let the system of linear equations be
$$x + y = 6 \qquad (1)$$
$$-3x + y = 2 \qquad (2)$$
From Eq. 1, $y = 6 - x$. Substituting this into Eq. 2,
$$-3x + (6 - x) = 2 \;\Rightarrow\; -4x = -4 \;\Rightarrow\; x = 1$$
Now substitute x = 1 into Eq. 1 to get the value of y:
$$1 + y = 6 \;\Rightarrow\; y = 5$$
Therefore, x = 1 and y = 5.
2.4.1.2
Elimination Method
Using basic arithmetic operations, we eliminate one of the variables and then simplify the resulting equation to obtain the value of the other variable. Once one variable is known, the other is found by substituting its value back into either equation.
• Multiply or divide one or both equations by a nonzero value so that one of the variables has the same coefficient in both equations.
• Add or subtract the equations so that the matching terms cancel.
• Simplify the result to obtain the value of the remaining variable (say, y) in the form y = c, where c is a constant.
• Finally, substitute this value into either original equation to find the value of the other variable.
Note:
(a) Equations representing two parallel lines have no solution. If we solve such equations by elimination, we obtain a false statement with two unequal numbers on either side of the equals sign (for example, 0 = 3).
(b) Equations representing two coincident lines have infinitely many solutions. If we solve such a system by elimination, we obtain 0 = 0, indicating a consistent system with infinitely many solutions.
Let the equations be
$$3x + 2y = 19$$
$$x + y = 8$$
One equation contains 2y and the other contains y, so multiply the second equation by 2 to match the coefficients of y:
$$3x + 2y = 19$$
$$2x + 2y = 16$$
Now subtract the second equation from the first:
$$(3x + 2y) - (2x + 2y) = 19 - 16 \;\Rightarrow\; x = 3$$
Next, divide the second equation by 2 to recover x + y = 8, and subtract x = 3 from it to obtain y = 5. Alternatively, substitute the value of x into one of the original equations: 3 + y = 8, so y = 5.
Therefore, x = 3 and y = 5.
Note:
• Substitution is often easier for small cases (two equations, or sometimes three).
• Elimination is easier for larger cases.
2.4.1.3
Graphical Method
The graphical technique, also known as the geometric method, solves a system of linear equations by drawing each equation as a straight line on a graph and finding any points of intersection. For each equation, we find at least two points on its line, for example by substituting values for x or by computing the x- and y-intercepts, and draw the line through them. The nature of the solution depends on the relative position of the lines:
• Consistent: If the two lines intersect at a single point, that point provides the unique solution to both equations, and the pair of equations is said to be consistent.
• Dependent: If the two lines coincide, there are infinitely many solutions, since every point on the line is a solution; the pair of equations is said to be dependent.
• Inconsistent: If the two lines are parallel, there is no solution, and the pair of equations is said to be inconsistent [1–8].
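For the pair of equations $a_1x + b_1y + c_1 = 0$ and $a_2x + b_2y + c_2 = 0$:
| Provided conditions | Graphical representation | Algebraic interpretation |
|---|---|---|
| $a_1/a_2 \neq b_1/b_2$ | Intersecting lines | Exactly one solution (unique) |
| $a_1/a_2 = b_1/b_2 = c_1/c_2$ | Coincident lines | Infinitely many solutions |
| $a_1/a_2 = b_1/b_2 \neq c_1/c_2$ | Parallel lines | No solution |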
2.5
Matrices
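A matrix is a rectangular array of numbers with m rows and n columns, where m and n are positive integers. The following operations are possible:
(a) Addition (A and B must have the same dimensions)
$$A + B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a + e & b + f \\ c + g & d + h \end{bmatrix}$$
Example: Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$. Then
$$A + B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$
(b) Subtraction (A and B must have the same dimensions)
$$A - B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} - \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} a - e & b - f \\ c - g & d - h \end{bmatrix}$$
Example: With the same A and B,
$$A - B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} -4 & -4 \\ -4 & -4 \end{bmatrix}$$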
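(c) Multiplication To multiply two matrices, the number of columns of the first matrix must equal the number of rows of the second. That is, an i × j matrix A times a j × k matrix B gives an i × k matrix C, where i, j, and k are positive integers [1–8].
$$AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}$$
Example: Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ and $B = \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix}$. Then
$$AB = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix} = \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix}$$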
Identity An identity matrix is a square matrix that leaves any vector unchanged when the vector is multiplied by it [1–8]: $I_n x = x$. Example:
$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
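(d) Inverse
$$A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
where $ad - bc$ is called the determinant; the inverse exists only when it is nonzero. The inverse must satisfy $AA^{-1} = I$.
Example: Let $A = \begin{bmatrix} 4 & 7 \\ 2 & 6 \end{bmatrix}$. Then
$$A^{-1} = \frac{1}{(4 \times 6) - (7 \times 2)} \begin{bmatrix} 6 & -7 \\ -2 & 4 \end{bmatrix} = \frac{1}{10} \begin{bmatrix} 6 & -7 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{bmatrix}$$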
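(e) Transpose
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{\top} = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$
Example: $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $A^{\top} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$
2.6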
Determinants
A determinant is a number associated with a square matrix that is used in the study and solution of systems of linear equations. Determinants are defined only for square matrices, which have the same number of rows and columns. For n = 1,
$$\det(A) = \det(a_{11}) = a_{11}$$
For n = 2,
$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$
For n = 3 (the rule of Sarrus),
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33}$$
Properties:
• The determinant of a matrix product is the product of the corresponding determinants: $\det(AB) = \det(A)\det(B)$.
• Determinants are invariant to transposition: $\det(A) = \det(A^{\top})$.
• If A is regular (invertible), then $\det(A^{-1}) = 1/\det(A)$.
• Adding a multiple of a column/row to another one does not change det(A).
• Swapping two rows/columns changes the sign of det(A).
Example: Find the determinant of
$$\begin{vmatrix} 8 & 6 \\ 3 & 4 \end{vmatrix} = (8 \times 4) - (6 \times 3) = 32 - 18 = 14$$
To experiment with matrices and develop a better grasp of their operations, use the online application Matrix Calculator [1–8].
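The determinant example and the listed properties can be checked numerically with NumPy's np.linalg.det (a tooling assumption). Because det is computed in floating point, comparisons use np.isclose rather than ==. The second matrix B and the 3 × 3 test matrix M are arbitrary choices, and det3_sarrus is a hypothetical helper that implements the n = 3 formula term by term.

```python
import numpy as np

A = np.array([[8.0, 6.0], [3.0, 4.0]])        # the worked example (det = 14)
B = np.array([[1.0, 2.0], [3.0, 4.0]])        # an arbitrary second matrix
det_A = np.linalg.det(A)

# The listed properties, checked with a floating-point tolerance
assert np.isclose(det_A, 14.0)
assert np.isclose(np.linalg.det(A @ B), det_A * np.linalg.det(B))   # det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A.T), det_A)                        # det(A^T) = det(A)
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / det_A)     # det(A^-1) = 1 / det(A)
A_added = A.copy()
A_added[1] += 5.0 * A[0]                                            # add 5 x row 1 to row 2
assert np.isclose(np.linalg.det(A_added), det_A)                    # determinant unchanged
assert np.isclose(np.linalg.det(A[::-1]), -det_A)                   # swapping rows flips the sign


def det3_sarrus(m):
    """Rule of Sarrus for a 3 x 3 matrix; m[i][j] holds a_(i+1)(j+1)."""
    return (m[0][0] * m[1][1] * m[2][2] + m[1][0] * m[2][1] * m[0][2]
            + m[2][0] * m[0][1] * m[1][2] - m[2][0] * m[1][1] * m[0][2]
            - m[0][0] * m[2][1] * m[1][2] - m[1][0] * m[0][1] * m[2][2])


M = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
print(det3_sarrus(M))                          # 6
assert np.isclose(det3_sarrus(M), np.linalg.det(np.array(M, dtype=float)))
```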
2.7
Slope-Intercept Equation
The slope-intercept form is a way of writing a linear equation: y = mx + b, where m denotes the slope and b denotes the y-intercept.
Note:
(a) The y-intercept is located at (0, b); to find the y-intercept of a graph, set x = 0 and solve for y.
(b) Slope measures how steep a line is. It is the ratio of the change in y to the change in x between any two points on the line:
Slope = change in y / change in x = rise / run
Example:
1. Let two points on a plane be (0, 8) and (3, 2). Since the y-intercept is (0, b), b = 8. The slope is m = change in y / change in x = (2 - 8) / (3 - 0) = -6 / 3 = -2. Hence, y = -2x + 8.
2. Let two points on a plane be (2, 5) and (4, 9). The slope is m = change in y / change in x = (9 - 5) / (4 - 2) = 4 / 2 = 2. Hence, y = 2x + b (Equation 1). Since the line passes through both points, choose (2, 5) and substitute it into Equation 1: 5 = 2 × 2 + b, so b = 5 - 4 = 1. Hence, y = 2x + 1 [1–8].
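The two-point procedure in the examples can be written as a small function. The name line_through is hypothetical; it assumes the two points have different x-coordinates, because a vertical line has no finite slope.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)     # slope = rise / run
    b = y1 - m * x1               # solve y1 = m*x1 + b for the intercept
    return m, b


print(line_through((0, 8), (3, 2)))   # (-2.0, 8.0)
print(line_through((2, 5), (4, 9)))   # (2.0, 1.0)
```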
2.8
Secant and Tangent Line
A secant line is a straight line that connects two points on the graph of a function. Its slope is the average rate of change of the function between those two points; that is, secant line slope = average rate of change. A tangent line is a straight line that touches the graph of a function at a given point and has the same direction as the graph there. Its slope is the function's instantaneous rate of change at that point, which equals the function's derivative there; that is, tangent line slope = instantaneous rate of change = derivative. As the two points used for the secant line move closer together, the average rate of change approaches the instantaneous rate of change, and the secant line approaches the tangent line [1–8]:
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$
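A numerical illustration of the limit: for $f(x) = x^2$, the secant slope between a and a + h simplifies to 2a + h, so at a = 3 the slopes approach the derivative 6 as h shrinks. The function names are illustrative.

```python
def f(x):
    return x ** 2


def secant_slope(func, a, h):
    """Average rate of change of func between a and a + h."""
    return (func(a + h) - func(a)) / h


# Derivative of x**2 at x = 3 is 2 * 3 = 6; secant slopes are 6 + h
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(f, 3.0, h))
```

Shrinking h much further eventually makes the result worse, not better, because subtracting two nearly equal floating-point numbers loses precision.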
2.9
Nonlinear Function
A nonlinear function is one whose graph is not a straight line; its graph can be any curve other than a straight line.
• $f(x) = x^2$ is nonlinear because it is a quadratic function.
• $f(x) = 2^x$ is nonlinear because it is an exponential function.
• $f(x) = x^3 - 3x$ is nonlinear because it is a cubic function [1–8].
3 Calculus Basics Calculus, a branch of mathematics developed by Newton and Leibniz, studies how variables are related through their rates of change. It has two main branches, known as differential calculus and integral calculus. Differential calculus is used to determine the rate of change of a quantity, whereas integral calculus is used to determine the quantity itself when its rate of change is known [1–3, 9–12].
3.1
Exponents
An exponential expression has the form $a^n$, where a is the base and n is the exponent. Some useful rules are as follows:
• The zeroth power of any nonzero number is 1: $a^0 = 1$
• The first power of a number is the number itself: $a^1 = a$
• When two powers with the same base are multiplied, their exponents are added: $a^x \cdot a^y = a^{x+y}$
• When two powers with the same base are divided, their exponents are subtracted: $a^x / a^y = a^{x-y}$
• When a power is raised to another power, the exponents are multiplied: $(a^x)^y = a^{x \cdot y}$
Example: $(3^2)^2 = 3^{2 \cdot 2} = 3^4 = 81$ [1–3, 9–12]
3.2
Logarithms
A logarithm is another way of writing an exponential equation. Some relevant rules are as follows:
• $\log_a N = x$, read as “log of N to the base a,” means $a^x = N$, where $N > 0$, $a > 0$, and $a \neq 1$. In particular, $\log_a 1 = 0$, $\log_a a = 1$, and $\log_{1/a} a = -1$.
• $\log_a(xy) = \log_a x + \log_a y$, for $x, y > 0$
• $\log_a(x/y) = \log_a x - \log_a y$, for $x, y > 0$
• $\log_a(x^y) = y \log_a x$, for $x > 0$
• $\log_a x = 1/\log_x a$, for $x > 0$, $x \neq 1$
• $a^{\log_a x} = x$, for $a > 0$, $a \neq 1$, $x > 0$
• $a^{\log_b c} = c^{\log_b a}$, for $a, b, c > 0$, $b \neq 1$
Example: if $2^x = 8$, then $x = \log_2 8 = 3$ [1–3, 9–12]
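The logarithm rules can be spot-checked with Python's math module, where math.log(x, a) computes $\log_a x$. Because the results are floating-point numbers, math.isclose is used instead of ==; the sample values x = 8, y = 4, a = 2 are arbitrary.

```python
import math

x, y, a = 8.0, 4.0, 2.0
assert math.isclose(math.log(x * y, a), math.log(x, a) + math.log(y, a))   # product rule
assert math.isclose(math.log(x / y, a), math.log(x, a) - math.log(y, a))   # quotient rule
assert math.isclose(math.log(x ** 3, a), 3 * math.log(x, a))               # power rule
assert math.isclose(math.log(x, a), 1 / math.log(a, x))                    # reciprocal rule
assert math.isclose(a ** math.log(x, a), x)                                # a^(log_a x) = x
assert math.isclose(3.0 ** math.log(5.0, a), 5.0 ** math.log(3.0, a))      # a^(log_b c) = c^(log_b a)

print(math.log2(8))   # 3.0, the solution of 2**x = 8
```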
3.3
Functions
A function is a rule that transforms one item into another. In classical mathematics, a function is defined as a relationship between two quantities, called variables, as their values change. The item you begin with is called the input, and it comes from a set known as the domain. The result is called the output, and it comes from a set known as the codomain. If each value of one variable, say x, corresponds to exactly one value of another variable, say y, then y is said to be a function of x. In this case, x is called the independent variable, and y is called the dependent variable, since its value depends on the value of x. For example, we can write y = f(x) = 3x + 9; in this form, y is an explicit function of x. By contrast, the equation 2x - y - 7 = 0 defines y as an implicit function of x: y is not isolated, but the explicit form y = 2x - 7 is found simply by rearranging the terms. Examples: Trigonometric functions include sines, cosines, tangents, and secants. Logarithmic functions are written with logs. Exponential functions are those in which the independent variable x appears as an exponent, such as $y = 2^x$ [1–3, 9–12].
3.3.1
Interval Notation
The expression [a, b] denotes the set of all x such that $a \le x \le b$. For example, [0, 100] is the set of all real numbers between 0 and 100, including 0 and 100. It contains the integers 0, 1, 2, ..., 100, fractions such as 98/8, and irrational numbers such as $\sqrt{99}$. An interval of the form [a, b] is called a closed interval. If the square brackets are replaced with parentheses, the interval is open: (a, b) denotes the set of all x such that $a < x < b$, which excludes a and b. For example, (0, 100) contains all real numbers strictly between 0 and 100, including fractions and irrational numbers, but not 0 or 100. To summarize, square brackets [ ] indicate that the endpoints are included, whereas parentheses ( ) indicate that they are excluded [1–3, 9–12].
3.3.2
Inverse Functions
So far we have seen that y = f(x). Inverting a function works as follows:
1. Begin with a function f such that, for each y in the range of f, there is exactly one number x with f(x) = y. In other words, different inputs give different outputs (f is one-to-one). We now define the inverse function f⁻¹.
2. The domain of f⁻¹ is the same as the range of f.
3. The range of f⁻¹ is the same as the domain of f.
4. The value of f⁻¹(y) is the number x such that f(x) = y.
H. B. Mehare et al.
So if we have y = f(x), we also have f⁻¹(y) = x. The transformation f⁻¹ acts like an undo button for f: if you start with x and transform it into y using f, you can undo the effect by applying the inverse function f⁻¹ to y to get x back. Example: f(x) = x², restricted to x ≥ 0 so that it is one-to-one. This means that y = f(x) = x², so x = √y. Hence, f⁻¹(y) = √y [1–3, 9–12].
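The "undo button" idea can be checked directly in code. This is a minimal sketch assuming the domain restriction x ≥ 0 from the example. The names `f` and `f_inv` are hypothetical.

```python
import math

def f(x):
    """f(x) = x**2, restricted to x >= 0 so that it is one-to-one."""
    if x < 0:
        raise ValueError("f is only defined for x >= 0 here")
    return x ** 2

def f_inv(y):
    """Inverse of f: the unique x >= 0 with f(x) = y, i.e. sqrt(y)."""
    return math.sqrt(y)

for x in [0.0, 1.5, 3.0]:
    y = f(x)
    # f_inv "undoes" f: applying it to y recovers x
    assert math.isclose(f_inv(y), x)
```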
3.3.3
Composition of Functions
Composition of functions yields a new function formed from simpler functions. Let's say there are two functions f and g. The new function h is of the form h(x) = g(f(x)): f is applied first, and g is then applied to its output. Another way of expressing this is to write h = g ∘ f, where the circle means “composed with.” Example: let f(x) = 3x and g(x) = x². Then (g ∘ f)(x) = g(f(x)) = g(3x) = (3x)² = 9x² [1–3, 9–12].
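Composition can be expressed as a higher-order function that returns x ↦ g(f(x)). The sketch below does this; `compose` is a hypothetical helper, not a standard-library function. It reproduces the example h(x) = 9x², and the accompanying check shows that f ∘ g generally differs from g ∘ f.

```python
def compose(g, f):
    """Return the composition g ∘ f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: 3 * x      # f(x) = 3x
g = lambda x: x ** 2     # g(x) = x^2
h = compose(g, f)        # h = g ∘ f, so h(x) = (3x)^2 = 9x^2

for x in [-2, 0, 1, 5]:
    assert h(x) == 9 * x ** 2
print(h(2))  # 36
```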
3.4
Trigonometry
Trigonometry is the study of the relationship between the sides and angles of a right-angled triangle. Using trigonometric formulae, functions, or identities, it is possible to find the missing or unknown angles or sides of a right triangle. Trigonometry is used in oceanography, seismology, meteorology, the physical sciences, astronomy, acoustics, navigation, electronics, and other domains. A full revolution measures 360°, or 2π radians, so angles convert as follows:
Angle in radians = (π / 180) × angle in degrees
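The conversion formula translates directly into code. In this sketch, `deg_to_rad` is a hypothetical helper that applies (π / 180) × degrees. Its results are compared with Python's built-in `math.radians`, which performs the same conversion.

```python
import math

def deg_to_rad(deg):
    """Angle in radians = (π / 180) × angle in degrees."""
    return math.pi / 180 * deg

for deg in [0, 30, 45, 90, 180, 360]:
    rad = deg_to_rad(deg)
    # the standard library's math.radians performs the same conversion
    assert math.isclose(rad, math.radians(deg), abs_tol=1e-12)
    print(deg, rad)
```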
A triangle’s trigonometric ratios are also known as its trigonometric functions. Angles in trigonometry can be measured in degrees or radians. The angles 0°, 30°, 45°, 60°, and 90° are among the most widely used in computations. A trigonometric function can be classified as even or odd.
(a) Odd trigonometric functions. A trigonometric function is odd if f(-x) = -f(x); its graph is symmetric with respect to the origin.
(b) Even trigonometric functions. A trigonometric function is even if f(-x) = f(x); its graph is symmetric with respect to the y-axis.
3.4.1
Trigonometric Formulae
• sin(-x) = -sin x
• cos(-x) = cos x
• tan(-x) = -tan x
• cosec(-x) = -cosec x
• sec(-x) = sec x
• cot(-x) = -cot x
• cot θ = 1/tan θ
• sec θ = 1/cos θ
• cosec θ = 1/sin θ
• sin²θ + cos²θ = 1
• tan²θ + 1 = sec²θ
• cot²θ + 1 = cosec²θ
• sin 2θ = 2 sin θ cos θ
• cos 2θ = 2cos²θ - 1 = 1 - 2sin²θ
• sin(90° - θ) = cos θ, sin(90° + θ) = cos θ
• sin(180° - θ) = sin θ, sin(180° + θ) = -sin θ
• sin(270° - θ) = -cos θ, sin(270° + θ) = -cos θ
• cos(90° - θ) = sin θ, cos(90° + θ) = -sin θ
• cos(180° - θ) = -cos θ, cos(180° + θ) = -cos θ
• cos(270° - θ) = -sin θ, cos(270° + θ) = sin θ
• sin(A + B) = sin A cos B + cos A sin B
• cos(A + B) = cos A cos B - sin A sin B
• sin(A - B) = sin A cos B - cos A sin B
• cos(A - B) = cos A cos B + sin A sin B
• tan(A + B) = (tan A + tan B) / (1 - tan A tan B)
• tan(A - B) = (tan A - tan B) / (1 + tan A tan B)
• cot(A + B) = (cot A cot B - 1) / (cot B + cot A)
• cot(A - B) = (cot A cot B + 1) / (cot B - cot A)
Several trigonometric function names begin with “co,” which stands for “complementary.” For example, the cosine of an angle is the sine of its complementary angle: cos θ = sin(90° - θ). As the first six identities show, the even trigonometric functions are cosine and secant, whereas the odd trigonometric functions are sine, tangent, cosecant, and cotangent [1–3, 9–12].
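A quick way to guard against sign errors in identity tables is to test the identities numerically at a few angles. The sketch below does this for several of the formulae above, including sin(180° ± θ). The sample angles are arbitrary choices, and passing at sampled points is a sanity check, not a proof.

```python
import math

def close(a, b):
    """Compare two floats with a small absolute tolerance."""
    return math.isclose(a, b, abs_tol=1e-9)

for A_deg, B_deg in [(20, 35), (50, 10), (73, 118)]:
    A, B = math.radians(A_deg), math.radians(B_deg)
    assert close(math.sin(A + B), math.sin(A) * math.cos(B) + math.cos(A) * math.sin(B))
    assert close(math.cos(A - B), math.cos(A) * math.cos(B) + math.sin(A) * math.sin(B))
    assert close(math.tan(A + B), (math.tan(A) + math.tan(B)) / (1 - math.tan(A) * math.tan(B)))

theta = math.radians(25)
assert close(math.sin(math.pi - theta), math.sin(theta))        # sin(180° - θ) = sin θ
assert close(math.sin(math.pi + theta), -math.sin(theta))       # sin(180° + θ) = -sin θ
assert close(math.sin(theta) ** 2 + math.cos(theta) ** 2, 1.0)  # sin²θ + cos²θ = 1
print("all identities hold at the sampled angles")
```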
3.4.2
Trigonometric Angles
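| Angle | sin θ | cos θ | tan θ | cosec θ | sec θ | cot θ |
|---|---|---|---|---|---|---|
| 0° | 0 | 1 | 0 | undefined | 1 | undefined |
| 30° | 1/2 | √3/2 | 1/√3 | 2 | 2/√3 | √3 |
| 45° | 1/√2 | 1/√2 | 1 | √2 | √2 | 1 |
| 60° | √3/2 | 1/2 | √3 | 2/√3 | 2 | 1/√3 |
| 90° | 1 | 0 | undefined | 1 | undefined | 0 |
3.5
Limits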
Limits underlie both the derivative, the core notion of differential calculus, and the integral, the essential concept of integral calculus. A limit describes the value that a function approaches as its input approaches a given value. A limit is generally stated using the following notation:
$$\lim_{x \to c} f(x) = A$$
This is read as “the limit of f of x as x approaches c equals A.” Because x is a temporary label that may be replaced by any other symbol, it is referred to as a dummy variable. To better understand limits, consider three friends who live in three neighboring houses in a row: friend A lives in house A, friend B in house B, and friend C in house C.
If A wishes to meet C without B noticing, A and C must meet close to house B but not at house B itself. How close each friend can get to house B without entering it plays the role of a limit. Friend A, who approaches from the left, represents the left-hand limit, and friend C, who approaches from the right, represents the right-hand limit. When both the left-hand and right-hand limits at x = c exist and are equal to each other, the ordinary two-sided limit exists. If the two one-sided limits differ by any amount, however small, the two-sided limit does not exist. In symbols, the statement
$$\lim_{x \to c^-} f(x) = A \quad \text{and} \quad \lim_{x \to c^+} f(x) = A$$
is the same as
$$\lim_{x \to c} f(x) = A.$$
If g(x) ≤ f(x) ≤ h(x) for all x near a, and
$$\lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L,$$
then (the squeeze theorem)
$$\lim_{x \to a} f(x) = L.$$
For the right-hand limit at x = a, only the values of f(x) for x > a matter. The behavior of f(x) to the left of x = a, as well as at x = a itself, is irrelevant. (In other words, for the right-hand limit it makes no difference what values f(x) takes for x ≤ a.) [1–3, 9–12]
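One-sided limits can be explored numerically by evaluating f ever closer to c from each side. This is a minimal sketch using the hypothetical example f(x) = |x|/x, whose left-hand and right-hand limits at 0 differ, so its two-sided limit does not exist. Sampling points like this only hints at a limit; it does not prove one.

```python
def f(x):
    """f(x) = |x| / x: -1 for x < 0, +1 for x > 0, undefined at x = 0."""
    return abs(x) / x

def one_sided(f, c, side, steps=6):
    """Evaluate f at c + side * 10**-k for k = 1..steps.

    side = -1 approaches c from the left, side = +1 from the right. The trend
    of the returned values hints at the one-sided limit; it is not a proof.
    """
    return [f(c + side * 10 ** -k) for k in range(1, steps + 1)]

left = one_sided(f, 0, -1)[-1]
right = one_sided(f, 0, +1)[-1]
print(left, right)   # -1.0 1.0
```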
3.5.1
Indeterminate Forms
An indeterminate form arises when direct substitution of the limiting value produces an expression whose value cannot be determined, such as 0/0, ∞/∞, ∞ - ∞, 0 × ∞, 1^∞, 0⁰, and ∞⁰.
These can be evaluated using the following methods:
(a) Factoring method (0/0 form). The expression is factored into its simplest form so that the factor causing the 0/0 cancels. The limiting value is then substituted.
(b) L’Hospital’s rule (0/0 or ∞/∞ form). L’Hospital’s rule is a general way of evaluating indeterminate forms such as 0/0 or ∞/∞. Instead of differentiating the quotient, the numerator and the denominator are each differentiated separately, and the limit of the new quotient is taken. The rule can be applied several times in succession and must be stopped as soon as a determinate form is produced. It does not evaluate limits directly, but it often simplifies their evaluation when used correctly:
$$\text{If } \lim_{x \to a} \frac{f(x)}{g(x)} = \frac{0}{0} \text{ or } \frac{\infty}{\infty}, \text{ then } \lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)},$$
provided the limit on the right exists.
(c) Division of each term by the highest power of the variable (∞/∞ form). The limit is found by dividing each term in the numerator and denominator by the highest power of the variable in the expression [1–3, 9–12].
3.5.2
Limits of Rational Functions as x → a
A limit of the form
$$\lim_{x \to c} \frac{p(x)}{q(x)}$$
can be solved by direct substitution whenever q(c) ≠ 0. Suppose we want
$$\lim_{x \to -1} \frac{5x + 2}{3 - x}.$$
Simply substitute -1 in place of x:
$$\frac{5(-1) + 2}{3 - (-1)} = \frac{-5 + 2}{3 + 1} = -\frac{3}{4}.$$
Hence the limit is -3/4. If the denominator becomes zero on substitution, direct substitution fails. When the numerator is also zero, the result is the indeterminate form 0/0. Otherwise, the limit may be ∞ or -∞, or it might not exist [1–3, 9–12].
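The two situations (direct substitution, and a 0/0 form resolved by factoring) can be compared in code. The sketch below does both. The second function, (x² - 1)/(x - 1) at x = 1, is a hypothetical example not taken from the text. Its factored form (x - 1)(x + 1)/(x - 1) = x + 1 gives the limit 2, and nearby evaluations of the original expression agree.

```python
def direct_substitution(x):
    """(5x + 2) / (3 - x): continuous at x = -1, so plug in directly."""
    return (5 * x + 2) / (3 - x)

print(direct_substitution(-1))  # -0.75, i.e. -3/4

def g(x):
    """(x**2 - 1) / (x - 1): substitution at x = 1 gives 0/0."""
    return (x ** 2 - 1) / (x - 1)

def g_factored(x):
    """After factoring x**2 - 1 = (x - 1)(x + 1) and cancelling (x - 1)."""
    return x + 1

print(g_factored(1))              # 2, the limit as x -> 1
print(g(1 + 1e-6), g(1 - 1e-6))   # both close to 2
```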
3.5.3
Limits of Square Roots as x → a
When the function contains a square root, the conjugate method should be used. The numerator and denominator are both multiplied by the conjugate of the expression containing the square root, that is, the same expression with the sign in front of the constant reversed. For example, consider
$$\lim_{x \to 5} \frac{\sqrt{x^2 - 9} - 4}{x - 5}.$$
Direct substitution gives 0/0. Multiplying by the conjugate (note that -4 becomes +4),
$$\lim_{x \to 5} \frac{\sqrt{x^2 - 9} - 4}{x - 5} \cdot \frac{\sqrt{x^2 - 9} + 4}{\sqrt{x^2 - 9} + 4} = \lim_{x \to 5} \frac{(x^2 - 9) - 16}{(x - 5)\left(\sqrt{x^2 - 9} + 4\right)} = \lim_{x \to 5} \frac{x^2 - 25}{(x - 5)\left(\sqrt{x^2 - 9} + 4\right)}.$$
Factoring the numerator as (x - 5)(x + 5) and cancelling (x - 5) gives
$$\lim_{x \to 5} \frac{x + 5}{\sqrt{x^2 - 9} + 4}.$$
Substituting x = 5 gives 10/(4 + 4) = 10/8 = 5/4 [1–3, 9–12].
3.5.4
Limits of Rational Functions as x → ∞
A general property of a polynomial p(x) as x → ∞ is
$$\lim_{x \to \infty} \frac{p(x)}{\text{leading term of } p(x)} = 1.$$
General notes: for any function of the form
$$\lim_{x \to \infty} \frac{p(x)}{q(x)},$$
where p and q are polynomials,
• if the degree of p equals the degree of q, the limit is finite and nonzero (the ratio of the leading coefficients);
• if the degree of p is greater than the degree of q, the limit is ∞ or -∞;
• if the degree of p is less than the degree of q, the limit is 0.
A related rule is that
$$\lim_{x \to \infty} \frac{C}{x^n} = 0,$$
where n > 0 and C is a constant [1–3, 9–12].
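The three degree rules can be observed numerically by evaluating p(x)/q(x) at increasingly large x. In this sketch, polynomials are represented as coefficient lists, highest power first; that representation and the three example pairs are assumptions for illustration. Large-x evaluation illustrates the trend but does not replace the algebraic argument.

```python
def ratio(p, q, x):
    """Evaluate p(x)/q(x), where p and q are coefficient lists (highest power first)."""
    def poly(coeffs, x):
        result = 0.0
        for c in coeffs:          # Horner's method
            result = result * x + c
        return result
    return poly(p, x) / poly(q, x)

# Hypothetical examples, one per rule above:
same_degree = ([3, 1], [2, -5])      # (3x + 1)/(2x - 5) -> 3/2
higher_top = ([1, 0, 0], [1, 1])     # x**2/(x + 1)      -> +infinity
lower_top = ([4], [1, 0, 0])         # 4/x**2            -> 0

for name, (p, q) in [("same", same_degree), ("higher", higher_top), ("lower", lower_top)]:
    print(name, [ratio(p, q, x) for x in (1e2, 1e4, 1e6)])
```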
3.6
Differential Calculus
Differential calculus deals with determining the rate of change of a function with respect to its variables. Derivatives are used to identify the maximum and minimum values of a function in order to find the best solution to a problem. The process of finding a derivative is known as differentiation. A function’s derivative is the rate of change of its output value with respect to its input value, whereas its differential, dy = f′(x) dx, approximates the actual change in the function.
If y is a function of the variable x, the rate of change of y with respect to x is written dy/dx. This is the general expression for the derivative of a function, f′(x) = dy/dx, where y = f(x), and the ratio dy/dx is called the differential coefficient of y with respect to x. Derivatives of higher order (two or above) are denoted dⁿy/dxⁿ, where n is a positive integer; for example, the second derivative is f″(x) = d²y/dx². If the input of f changes by a small amount h at the point x, the derivative of the function is defined as follows [1–3, 9–12]:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
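The limit definition suggests a direct numerical approximation: evaluate the difference quotient for a small but finite h. This sketch uses the hypothetical example f(x) = x³, whose exact derivative at x = 2 is 3 · 2² = 12. The helper name `derivative` is also hypothetical. The printed values move toward 12 as h shrinks.

```python
def derivative(f, x, h=1e-6):
    """Approximate f'(x) by the forward difference quotient (f(x + h) - f(x)) / h.

    This mirrors the limit definition with a small but finite h, so the
    result is only an approximation.
    """
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 3          # hypothetical example, f'(x) = 3x^2
for h in (1e-1, 1e-3, 1e-6):
    print(h, derivative(f, 2.0, h))   # approaches 12 as h shrinks
```

Making h extremely small (say 1e-15) eventually makes the estimate worse because of floating-point rounding, which is why a moderate value such as 1e-6 is used.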
3.6.1
Continuity
A function f(x) is said to be continuous at a point x = a if the following three conditions are satisfied:
• f(a) is defined;
• $\lim_{x \to a} f(x)$ exists;
• $\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = f(a)$.
When two continuous functions are added, subtracted, multiplied, or divided (except where the denominator is zero), the result is also continuous. Furthermore, logarithmic and exponential functions are continuous on their domains. A function can also be continuous on an interval, meaning that it is continuous at every point of that interval. According to the max-min (extreme value) theorem, if f is continuous on [a, b], then it attains at least one maximum and one minimum on [a, b] [1–3, 9–12].
3.6.2
Derivatives
The derivative is the instantaneous rate of change of one quantity with respect to another. A function’s derivative is written as:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
3.6.3
Notations
When a function is written as y = f(x), its derivative is indicated by the following notations:
1. D(y) or D[f(x)], called Euler’s notation.
2. dy/dx, called Leibniz’s notation.
3. f′(x), called Lagrange’s notation [1–3, 9–12].
3.6.4
Differentiation Rules
Common functions:
| Category | Function | Derivative |
|---|---|---|
| Constant | c | 0 |
| Line | x | 1 |
| Line | ax | a |
| Square | x² | 2x |
| Square root | √x | (1/2)x^(-1/2) |
| Exponential | eˣ | eˣ |
| Exponential | aˣ | ln(a) aˣ |
| Logarithms | ln(x) | 1/x |
| Logarithms | logₐ(x) | 1/(x ln(a)) |
| Trigonometry (x in radians) | sin(x) | cos(x) |
| Trigonometry (x in radians) | cos(x) | -sin(x) |
| Trigonometry (x in radians) | tan(x) | sec²(x) |
| Inverse trigonometry | sin⁻¹(x) | 1/√(1 - x²) |
| Inverse trigonometry | cos⁻¹(x) | -1/√(1 - x²) |
| Inverse trigonometry | tan⁻¹(x) | 1/(1 + x²) |

Rules:
| Rule | Function | Derivative |
|---|---|---|
| Multiplication by constant | cf | cf′ |
| Power rule | xⁿ | nxⁿ⁻¹ |
| Sum rule | f + g | f′ + g′ |
| Difference rule | f - g | f′ - g′ |
| Product rule | fg | fg′ + f′g |
| Quotient rule | f/g | (f′g - g′f)/g² |
| Reciprocal rule | 1/f | -f′/f² |
| Chain rule (as composition) | f ∘ g | (f′ ∘ g) × g′ |
| Chain rule (using ′) | f(g(x)) | f′(g(x)) g′(x) |
| Chain rule (using d/dx) | y = f(u), u = g(x) | dy/dx = (dy/du)(du/dx) |

General formulas (assume u and v are differentiable functions of x):
• Constant: d/dx (c) = 0
• Sum: d/dx (u + v) = du/dx + dv/dx
• Difference: d/dx (u - v) = du/dx - dv/dx
• Constant multiple: d/dx (cu) = c du/dx
• Product: d/dx (uv) = u dv/dx + v du/dx
• Quotient: d/dx (u/v) = (v du/dx - u dv/dx)/v²
• Power: d/dx (xⁿ) = nxⁿ⁻¹
• Chain rule: d/dx f(g(x)) = f′(g(x)) · g′(x)
Trigonometric functions:
• d/dx (sin x) = cos x
• d/dx (cos x) = -sin x
• d/dx (tan x) = sec²x
• d/dx (sec x) = sec x tan x
• d/dx (cot x) = -cosec²x
• d/dx (cosec x) = -cosec x cot x
Exponential functions:
• d/dx (eˣ) = eˣ
• d/dx (aˣ) = aˣ ln a
Logarithmic functions:
• d/dx (ln x) = 1/x
• d/dx (logₐ x) = 1/(x ln a)
Inverse trigonometric functions:
• d/dx (sin⁻¹ x) = 1/√(1 - x²)
• d/dx (cos⁻¹ x) = -1/√(1 - x²)
• d/dx (tan⁻¹ x) = 1/(1 + x²)
• d/dx (cot⁻¹ x) = -1/(1 + x²)
• d/dx (sec⁻¹ x) = 1/(|x|√(x² - 1))
• d/dx (cosec⁻¹ x) = -1/(|x|√(x² - 1))
Hyperbolic functions:
• d/dx (sinh x) = cosh x
• d/dx (cosh x) = sinh x
• d/dx (tanh x) = sech²x
• d/dx (coth x) = -cosech²x
• d/dx (sech x) = -sech x tanh x
• d/dx (cosech x) = -cosech x coth x
Inverse hyperbolic functions:
• d/dx (sinh⁻¹ x) = 1/√(1 + x²)
• d/dx (cosh⁻¹ x) = 1/√(x² - 1)
• d/dx (tanh⁻¹ x) = 1/(1 - x²), for |x| < 1
• d/dx (coth⁻¹ x) = 1/(1 - x²), for |x| > 1
• d/dx (sech⁻¹ x) = -1/(x√(1 - x²))
• d/dx (cosech⁻¹ x) = -1/(|x|√(1 + x²))
Parametric equations: if x = f(t) and y = g(t) are differentiable, then
$$y' = \frac{dy}{dx} = \frac{dy/dt}{dx/dt} \quad \text{and} \quad \frac{d^2y}{dx^2} = \frac{dy'/dt}{dx/dt}$$
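Entries in a derivative table can be spot-checked against a central-difference estimate. This is a sketch, not a proof: each formula is tested at one sample point, chosen inside its domain (for example, x > 1 for cosh⁻¹). The helper `d` and the chosen points are assumptions for illustration.

```python
import math

def d(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

sech = lambda x: 1 / math.cosh(x)

# (name, function, derivative from the table, sample point inside the domain)
checks = [
    ("tan",   math.tan,          lambda x: 1 / math.cos(x) ** 2,       0.6),
    ("atan",  math.atan,         lambda x: 1 / (1 + x ** 2),           0.6),
    ("asin",  math.asin,         lambda x: 1 / math.sqrt(1 - x ** 2),  0.6),
    ("log_2", math.log2,         lambda x: 1 / (x * math.log(2)),      0.6),
    ("2**x",  lambda x: 2 ** x,  lambda x: math.log(2) * 2 ** x,       0.6),
    ("sech",  sech,              lambda x: -sech(x) * math.tanh(x),    0.6),
    ("acosh", math.acosh,        lambda x: 1 / math.sqrt(x ** 2 - 1),  1.6),  # acosh needs x >= 1
]

for name, f, f_prime, x in checks:
    assert math.isclose(d(f, x), f_prime(x), rel_tol=1e-6), name
print("table entries agree with numerical derivatives")
```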
Differentiation is largely a matter of applying the correct rules to a given expression. The examples below show which rule to select and how to substitute [1–3, 9–12]. Examples:
• Differentiate n³ with respect to n. n³ has the form of the power rule, d/dx (xⁿ) = nxⁿ⁻¹. So, d/dn (n³) = 3n³⁻¹ = 3n².
• Differentiate 5z² + z³ - 7z⁴. Break the expression into three parts and apply the power rule to each: d/dz (z²) = 2z, d/dz (z³) = 3z², and d/dz (z⁴) = 4z³. Substituting these into the original expression, d/dz (5z² + z³ - 7z⁴) = 5 × 2z + 3z² - 7 × 4z³ = 10z + 3z² - 28z³.
• Differentiate 3/x. 3/x can be handled with the reciprocal rule, d/dx (1/f) = -f′/f², or equivalently by writing it as a power: d/dx (3/x) = 3 × d/dx (x⁻¹) = 3 × (-1) × x⁻¹⁻¹ = -3x⁻² = -3/x².
• Differentiate cos(x)sin(x). By the product rule, (fg)′ = fg′ + f′g, with f = cos(x) and g = sin(x). Since d/dx (sin(x)) = cos(x) and d/dx (cos(x)) = -sin(x), d/dx (cos(x)sin(x)) = cos(x)cos(x) - sin(x)sin(x) = cos²(x) - sin²(x).
• Differentiate cos(x)/x. By the quotient rule, (f/g)′ = (f′g - g′f)/g², with f = cos(x) and g = x. Since d/dx (x) = 1 and d/dx (cos(x)) = -sin(x), d/dx (cos(x)/x) = (x(-sin(x)) - cos(x)(1))/x² = -(x sin(x) + cos(x))/x².
• Differentiate x² + y² with respect to x. d/dx (x²) = 2x. However, d/dx (y²) is not equal to 2y, because we are differentiating with respect to x. Let u = y²; then du/dy = 2y, and d/dx (y²) = du/dx = du/dy × dy/dx = 2y dy/dx. Hence, d/dx (x² + y²) = 2x + 2y dy/dx.
• Differentiate sin(x²). By the chain rule, dy/dx = dy/du × du/dx, with u = x² and y = sin(u). Then d/dx (sin(x²)) = d/du (sin(u)) × d/dx (x²) = cos(u)(2x). Substituting u = x², d/dx (sin(x²)) = 2x cos(x²) [1–3, 9–12].
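The explicit worked examples can be double-checked numerically by comparing each hand-derived derivative with a central-difference estimate at a few points. This sketch covers all of them except the implicit x² + y² case, which has no single-variable closed form. The helper `d` and the sample points are assumptions for illustration.

```python
import math

def d(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (function, derivative derived in the text)
examples = [
    (lambda z: 5 * z**2 + z**3 - 7 * z**4, lambda z: 10 * z + 3 * z**2 - 28 * z**3),
    (lambda x: 3 / x,                      lambda x: -3 / x**2),
    (lambda x: math.cos(x) * math.sin(x),  lambda x: math.cos(x)**2 - math.sin(x)**2),
    (lambda x: math.cos(x) / x,            lambda x: -(x * math.sin(x) + math.cos(x)) / x**2),
    (lambda x: math.sin(x**2),             lambda x: 2 * x * math.cos(x**2)),
]

for f, f_prime in examples:
    for x in (0.5, 1.3, 2.0):
        assert math.isclose(d(f, x), f_prime(x), rel_tol=1e-6, abs_tol=1e-6)
print("all worked examples agree with the numerical derivative")
```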
3.7
Critical Points
A critical point of a function y = f(x) is a point in its domain where the graph has either a horizontal tangent or no well-defined slope (for example, a vertical tangent or a sharp corner). To locate critical points, we look for:
• the points at which f′(x) = 0;
• the points at which f′(x) is not defined.
Graphical estimation:
• Check for minimum and maximum points.
• Check the points where a horizontal or vertical tangent can be drawn.
• Check for sharp turning points.
To find the critical points of a multivariable function, say f(x, y), we set the partial derivatives with respect to each variable to 0, that is, fx = 0 and fy = 0, and solve the resulting equations simultaneously.
Uses:
• Finding maxima and minima.
• Finding the intervals on which a function is increasing or decreasing.
• Solving optimization problems.
Example: f(x) = x^(2/3). Then f′(x) = (2/3)x^(-1/3) = 2/(3x^(1/3)). Setting f′(x) = 0 gives 2/(3x^(1/3)) = 0, which would require 2 = 0. This can never happen, so no value of x satisfies f′(x) = 0. Next, check where f′(x) is not defined: 2/(3x^(1/3)) is undefined at x = 0. So the only critical point is x = 0, and its critical value is f(0) = 0^(2/3) = 0 [1–3, 9–12].
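Points where f′(x) = 0 can be located numerically by scanning for sign changes of f′ and refining each one with bisection. In this sketch, the function f(x) = x³ - 3x (so f′(x) = 3x² - 3) is a hypothetical example, and the grid size and tolerance are arbitrary choices. The method only finds zeros of f′ where it changes sign. Points where f′ is undefined, such as x = 0 in the x^(2/3) example, must still be found by inspection.

```python
def f_prime(x):
    """Derivative of the hypothetical example f(x) = x**3 - 3x, i.e. 3x**2 - 3."""
    return 3 * x ** 2 - 3

def bisect(g, lo, hi, tol=1e-10):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def critical_points(g, a, b, n=1000):
    """Scan [a, b] on a grid and refine every sign change of g with bisection."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    roots = []
    for x0, x1 in zip(xs, xs[1:]):
        if g(x0) == 0:
            roots.append(x0)
        elif (g(x0) < 0) != (g(x1) < 0):
            roots.append(bisect(g, x0, x1))
    return roots

print(critical_points(f_prime, -3, 3))   # values very close to -1 and 1
```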
3.8
Extreme Value Theorem
Critical points are useful for finding the possible maximum and minimum values of a function on specific intervals. The extreme value theorem guarantees that such values exist: if a function f(x) is continuous on a closed interval [a, b], then f(x) has both a maximum and a minimum value on [a, b]. The method for applying the theorem is as follows. First, verify that the function is continuous on the closed interval. Next, identify all critical points in the interval, and evaluate the function at these critical points as well as at the interval’s endpoints. The largest function value from the preceding step is the maximum value, and the smallest is the minimum value of the function on the given interval. Note that the terms local minimum and local maximum are often used interchangeably with relative minimum and relative maximum [1–3, 9–12].
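The procedure above (evaluate f at the critical points and at the endpoints, then pick the largest and smallest values) fits in a few lines of code. This sketch uses the hypothetical example f(x) = x³ - 3x on [-2, 3], whose critical points x = ±1 come from solving 3x² - 3 = 0 by hand.

```python
def f(x):
    """Hypothetical example: f(x) = x**3 - 3x on the closed interval [-2, 3]."""
    return x ** 3 - 3 * x

a, b = -2, 3
critical = [-1, 1]                 # solutions of f'(x) = 3x**2 - 3 = 0 inside [a, b]
candidates = [a, b] + critical     # endpoints plus critical points

values = {x: f(x) for x in candidates}
print(values)                      # {-2: -2, 3: 18, -1: 2, 1: -2}
print("max", max(values.values()), "min", min(values.values()))  # max 18 min -2
```

Here the maximum, 18, occurs at the endpoint x = 3. This shows why the endpoints must be checked along with the critical points.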
3.9
Partial Derivatives
The partial derivative of a function of several variables is its derivative with respect to one of those variables while the others are held constant. The partial derivative of a function f with respect to a variable x is written f′ₓ, fₓ, ∂ₓf, or ∂f/∂x, where ∂ is the partial derivative symbol. If f(x, y) is a function that depends on both x and y, and we differentiate f with respect to x and with respect to y, the resulting derivatives are called the partial derivatives of f. The partial derivative of f with respect to x, holding y constant, is
$$f_x = \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h},$$
and the partial derivative of f with respect to y, holding x constant, is
$$f_y = \frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}.$$
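The two limit definitions translate into finite differences that perturb only one argument at a time. This sketch uses central differences, a common and slightly more accurate variant of the one-sided quotient above. The test function f(x, y) = x²y + 3y is a hypothetical example with fₓ = 2xy and f_y = x² + 3, and the helper names are also hypothetical.

```python
def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of ∂f/∂x at (x, y), holding y constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of ∂f/∂y at (x, y), holding x constant."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

f = lambda x, y: x ** 2 * y + 3 * y   # hypothetical: f_x = 2xy, f_y = x**2 + 3
print(partial_x(f, 1.0, 2.0), partial_y(f, 1.0, 2.0))   # both close to 4
```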
Partial differentiation refers to the process of finding the partial derivatives of a given function. Geometrically, fₓ at a point is the slope of the tangent line to the curve obtained by slicing the graph of f with a plane on which y is held constant [1–3, 9–12].
Rules:
(a) Product rule. If u = f(x, y) · g(x, y), then
$$u_x = \frac{\partial u}{\partial x} = g(x, y)\frac{\partial f}{\partial x} + f(x, y)\frac{\partial g}{\partial x}, \qquad u_y = \frac{\partial u}{\partial y} = g(x, y)\frac{\partial f}{\partial y} + f(x, y)\frac{\partial g}{\partial y}.$$
(b) Quotient rule. If u = f(x, y)/g(x, y), then
$$u_x = \frac{g(x, y)\frac{\partial f}{\partial x} - f(x, y)\frac{\partial g}{\partial x}}{[g(x, y)]^2}, \qquad u_y = \frac{g(x, y)\frac{\partial f}{\partial y} - f(x, y)\frac{\partial g}{\partial y}}{[g(x, y)]^2}.$$
(c) Power rule. If u = [f(x, y)]ⁿ, then
$$u_x = n[f(x, y)]^{n-1}\frac{\partial f}{\partial x}, \qquad u_y = n[f(x, y)]^{n-1}\frac{\partial f}{\partial y}.$$
(d) Chain rule.
(i) One independent variable. Suppose x = g(t) and y = h(t) are differentiable functions of t, and z = f(x, y) is a differentiable function of x and y. Then z = f(g(t), h(t)) is a differentiable function of t, and its derivative with respect to t is
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}.$$
(ii) Two independent variables. Suppose x = g(u, v) and y = h(u, v) are differentiable functions of the two variables u and v, and z = f(x, y) is a differentiable function of x and y. Then z = f(g(u, v), h(u, v)) is a differentiable function of u and v, and its partial derivatives are
$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u}, \qquad \frac{\partial z}{\partial v} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v}.$$
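The one-variable chain rule can be checked by comparing its formula against a direct numerical derivative of the composite function z(t). In this sketch, z = x²y with x = cos t and y = sin t is a hypothetical choice of functions, and the helper names are also hypothetical.

```python
import math

f = lambda x, y: x ** 2 * y           # hypothetical z = f(x, y) = x^2 y
x_of_t, y_of_t = math.cos, math.sin   # hypothetical x = cos t, y = sin t

def dz_dt_chain(t):
    """Chain rule: dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt)."""
    x, y = x_of_t(t), y_of_t(t)
    dz_dx, dz_dy = 2 * x * y, x ** 2          # partial derivatives of x^2 y
    dx_dt, dy_dt = -math.sin(t), math.cos(t)  # derivatives of cos t and sin t
    return dz_dx * dx_dt + dz_dy * dy_dt

def dz_dt_numeric(t, h=1e-6):
    """Differentiate the composite z(t) = f(x(t), y(t)) directly (central difference)."""
    z = lambda s: f(x_of_t(s), y_of_t(s))
    return (z(t + h) - z(t - h)) / (2 * h)

for t in (0.3, 1.0, 2.5):
    assert math.isclose(dz_dt_chain(t), dz_dt_numeric(t), abs_tol=1e-6)
```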
3.10
Linearity of Differentiation
Linearity of differentiation (also known as the rule of linearity or the superposition rule for differentiation) combines the sum rule and the constant factor rule. It states that differentiation respects linear combinations of functions, and it follows directly from the corresponding limit laws. In other words, differentiation is a linear operation: if you differentiate a linear combination of functions, you obtain the same linear combination of their derivatives, d/dx (af + bg) = af′ + bg′ for constants a and b. Note: in practice, the rate of change of a quantity Q over time is the derivative of Q with respect to time t; that is, the rate of change of Q is dQ/dt.
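Linearity can be illustrated numerically: differentiating a·f + b·g gives the same result as combining the separate derivatives. In this sketch, the constants a and b and the functions sin and exp are arbitrary choices, and `d` is a hypothetical central-difference helper.

```python
import math

def d(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a, b = 4.0, -2.5                             # arbitrary constants
f, g = math.sin, math.exp                    # any two differentiable functions
combo = lambda x: a * f(x) + b * g(x)        # linear combination a·f + b·g

for x in (0.0, 0.7, 1.9):
    lhs = d(combo, x)                        # derivative of the combination
    rhs = a * d(f, x) + b * d(g, x)          # same combination of the derivatives
    assert math.isclose(lhs, rhs, rel_tol=1e-7, abs_tol=1e-7)
    # both also match the exact a·cos(x) + b·exp(x)
    assert math.isclose(lhs, a * math.cos(x) + b * math.exp(x), rel_tol=1e-7, abs_tol=1e-7)
```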
3.11
Integral Calculus
Integral calculus is the study of integrals and their properties. It is mainly used for two purposes:
• To recover a function f from its derivative f′. (If f is differentiable on the interval of interest, then f′ is defined on that interval.)
• To calculate the area under a curve [1–3, 9–12].
3.11.1
Integration
Integration is the inverse of differentiation. Whereas differentiation breaks a quantity into many small parts to study how it changes, integration assembles small parts into a whole. It is commonly used to calculate areas. In the notation ∫ f(x) dx, dx stands for a tiny width in x and the integral sign ∫ means “sum up all the little bits.” The whole expression is read as “the integral of f(x) with respect to x” [1–3, 9–12].
3.11.2
Indefinite Integral
An indefinite integral has no specified boundaries, i.e., no upper or lower limit is defined. An indefinite integral such as ∫f(x) dx represents a family of functions: all antiderivatives of f with respect to x. These functions differ only by a constant, so the result of indefinite integration is always accompanied by a constant of integration C. It is written as
∫f(x) dx = F(x) + C, where F′(x) = f(x) [1–3, 9–12].
Properties (each antiderivative follows from reversing a derivative):
(a) d/dx (c) = 0 ⇒ ∫0 dx = C
(b) d/dx (x) = 1 ⇒ ∫1 dx = x + C
(c) d/dx (kx) = k ⇒ ∫k dx = kx + C
(d) d/dx (xⁿ⁺¹/(n + 1)) = xⁿ ⇒ ∫xⁿ dx = xⁿ⁺¹/(n + 1) + C, for n ≠ -1
(e) d/dx (ln|x|) = 1/x ⇒ ∫(1/x) dx = ln|x| + C
(f) d/dx (eˣ) = eˣ ⇒ ∫eˣ dx = eˣ + C
(g) d/dx (aˣ/ln a) = aˣ ⇒ ∫aˣ dx = aˣ/ln a + C
(h) d/dx (sin x) = cos x ⇒ ∫cos x dx = sin x + C
(i) d/dx (-cos x) = sin x ⇒ ∫sin x dx = -cos x + C
(j) d/dx (tan x) = sec²x ⇒ ∫sec²x dx = tan x + C
(k) d/dx (-cot x) = cosec²x ⇒ ∫cosec²x dx = -cot x + C
(l) d/dx (sec x) = sec x tan x ⇒ ∫sec x tan x dx = sec x + C
(m) d/dx (-cosec x) = cosec x cot x ⇒ ∫cosec x cot x dx = -cosec x + C
(n) d/dx (sin⁻¹x) = 1/√(1 - x²) ⇒ ∫1/√(1 - x²) dx = sin⁻¹x + C
(o) d/dx (tan⁻¹x) = 1/(1 + x²) ⇒ ∫1/(1 + x²) dx = tan⁻¹x + C
(p) d/dx (sec⁻¹x) = 1/(x√(x² - 1)) ⇒ ∫1/(x√(x² - 1)) dx = sec⁻¹x + C, for x > 1
(q) d/dx (cos⁻¹x) = -1/√(1 - x²) ⇒ ∫-1/√(1 - x²) dx = cos⁻¹x + C
(r) d/dx (cot⁻¹x) = -1/(1 + x²) ⇒ ∫-1/(1 + x²) dx = cot⁻¹x + C
(s) d/dx (cosec⁻¹x) = -1/(x√(x² - 1)) ⇒ ∫-1/(x√(x² - 1)) dx = cosec⁻¹x + C, for x > 1
Examples:
• Integrate x² dx. According to the power rule, ∫xⁿ dx = xⁿ⁺¹/(n + 1) + C. Hence, ∫x² dx = x³/3 + C.
• Integrate (5x² + cos(x)) dx. Integrating the terms separately, ∫(5x² + cos(x)) dx = 5x³/3 + sin(x) + C.
• Integrate 1/x^(1/3) dx. By the power rule, ∫x^(-1/3) dx = x^(-1/3 + 1)/(-1/3 + 1) + C = x^(2/3)/(2/3) + C. Hence, ∫1/x^(1/3) dx = (3/2)x^(2/3) + C.
• Integrate ∛(x²) dx. Since ∛(x²) = x^(2/3), ∫x^(2/3) dx = x^(2/3 + 1)/(2/3 + 1) + C = x^(5/3)/(5/3) + C. Hence, ∫∛(x²) dx = (3/5)x^(5/3) + C.
• Integrate 7ˣ dx. According to the rules, ∫aˣ dx = aˣ/ln a + C. Hence, ∫7ˣ dx = 7ˣ/ln 7 + C [1–3, 9–12].
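Since F is an antiderivative of f exactly when F′ = f, each worked example can be checked by differentiating the claimed answer numerically. This sketch takes C = 0 and samples only positive x, so the fractional powers stay real. The helper `d` and the sample points are assumptions for illustration.

```python
import math

def d(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (integrand f, antiderivative F from the worked examples, with C = 0)
pairs = [
    (lambda x: x ** 2,                   lambda x: x ** 3 / 3),
    (lambda x: 5 * x ** 2 + math.cos(x), lambda x: 5 * x ** 3 / 3 + math.sin(x)),
    (lambda x: x ** (-1 / 3),            lambda x: 1.5 * x ** (2 / 3)),
    (lambda x: x ** (2 / 3),             lambda x: 0.6 * x ** (5 / 3)),
    (lambda x: 7 ** x,                   lambda x: 7 ** x / math.log(7)),
]

for f, F in pairs:
    for x in (0.5, 1.0, 2.0):        # positive x keeps the fractional powers real
        assert math.isclose(d(F, x), f(x), rel_tol=1e-6)
print("each F is an antiderivative of its f at the sampled points")
```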
3.11.3
Definite Integral
Using a definite integral and the accompanying graph of y = f(x), the area of the shaded region between the curve, the x-axis, and the vertical lines x = a and x = b (the interval [a, b]) can be found.
A definite integral has defined boundaries within which the function is integrated: the lower and upper limits of the independent variable are given. A definite integral is written as
$$\int_a^b f(x)\,dx = F(b) - F(a),$$
where F is an antiderivative of f, and a and b are called the endpoints or limits of integration. It is the signed area (in square units) of the region bounded by the curve y = f(x), the lines x = a and x = b, and the x-axis. The whole expression is read as “the integral of f(x) with respect to x from a to b” [1–3, 9–12].
Properties
(a) Order of integration: $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$.
(b) Zero-width interval: $\int_a^a f(x)\,dx = 0$.
(c) Constant multiple: $\int_a^b k f(x)\,dx = k\int_a^b f(x)\,dx$.
(d) Sum and difference: $\int_a^b (f(x) \pm g(x))\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx$.
(e) Additivity: $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx$.
(f) Max-min inequality: if f has a maximum value max f and a minimum value min f on [a, b], then $(\min f)(b - a) \le \int_a^b f(x)\,dx \le (\max f)(b - a)$.
(g) Domination: if f(x) ≥ g(x) on [a, b], then $\int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx$. In particular, if f(x) ≥ 0 on [a, b], then $\int_a^b f(x)\,dx \ge 0$.
Example: integrate $\int_0^2 (x + 2)\,dx$. Using the power and constant rules, an antiderivative is x²/2 + 2x. To apply the limits, substitute the upper and lower values and subtract:
$$\left[\frac{2^2}{2} + 2 \cdot 2\right] - \left[\frac{0^2}{2} + 2 \cdot 0\right] = 2 + 4 = 6.$$
Had the integral been $\int_2^2 (x + 2)\,dx$, the answer would have been zero by the zero-width interval property [1–3, 9–12].
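A definite integral can also be approximated numerically, which is a useful cross-check on a hand calculation. The sketch below uses composite Simpson's rule, a standard quadrature method chosen here for illustration rather than taken from the text. It reproduces the worked result 6, the zero-width property, and the sign flip from reversing the limits.

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule for ∫_a^b f(x) dx with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

f = lambda x: x + 2
print(simpson(f, 0, 2))   # ≈ 6, matching [x²/2 + 2x] from 0 to 2
print(simpson(f, 2, 2))   # 0.0, the zero-width interval property
print(simpson(f, 2, 0))   # ≈ -6, reversing the limits flips the sign
```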
3.11.4
Area Estimation
Given a function y = f(x), consider
$$I = \int_a^b y\,dx.$$
Divide the region under the curve into thin vertical strips. The area of each strip is its height multiplied by its width, or y dx square units. The integral adds up the areas of all the strips, taking the limit as the width of every strip approaches zero.
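The strip picture corresponds directly to a Riemann sum. This sketch sums n strips of width dx, using each strip's midpoint to set its height; the midpoint choice is one common option and is not specified in the text. The curve y = x² on [0, 1] is a hypothetical example with exact area 1/3, and the sums approach it as the strips get narrower.

```python
def strip_area(f, a, b, n):
    """Sum of n strips of width dx = (b - a)/n and height f at each strip's midpoint."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

f = lambda x: x ** 2          # hypothetical curve; exact area on [0, 1] is 1/3
for n in (4, 40, 400):
    print(n, strip_area(f, 0.0, 1.0, n))   # approaches 0.3333... as strips narrow
```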
There are three specific types [3] of area estimation:
(a) Unsigned area: $I = \int_a^b |y|\,dx = \int_a^b |f(x)|\,dx$.
(b) Area between two curves y = f(x) and y = g(x): $I = \int_a^b |f(x) - g(x)|\,dx$.
(c) Area between a curve and the y-axis: $I = \int_a^b |f^{-1}(y)|\,dy$, where a and b are now bounds on y [1–3, 9–12].
3.12
Mean Value Theorem
The mean value theorem for integrals states that a continuous function attains its average value at least once over an interval. Given a function y = f(x) that is continuous on [a, b], there exists a number c in (a, b) such that
$$f(c) = f_{\text{average}} = \frac{1}{b - a}\int_a^b f(x)\,dx.$$
In other words, somewhere between a and b, the function takes exactly its average value over [a, b] [1–3, 9–12].
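The theorem can be made concrete by computing an average value and then solving f(c) = f_average. In this sketch, f(x) = x² on [0, 3] is a hypothetical example: its exact average is (1/3) · 9 = 3, so c = √3 ≈ 1.732, which lies inside (0, 3) as the theorem promises. The integral is approximated with the midpoint rule, an assumption made for illustration.

```python
import math

def average_value(f, a, b, n=1000):
    """f_average = (1/(b - a)) ∫_a^b f(x) dx, using the midpoint rule for the integral."""
    dx = (b - a) / n
    integral = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
    return integral / (b - a)

f = lambda x: x ** 2                 # hypothetical example on [0, 3]
avg = average_value(f, 0.0, 3.0)     # exact value: (1/3) * 9 = 3
c = math.sqrt(avg)                   # solve f(c) = avg with c in (0, 3)
print(avg, c)                        # close to 3 and sqrt(3) ≈ 1.732
```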
Note: Use these online tools to enhance your learning and quickly compute solutions • Integral calculus: https://www.integral-calculator.com • Differential calculus: https://www.derivative-calculator.net
4 Probability Basics
Probability theory is a mathematical framework for expressing uncertain statements. Probability is the study of uncertainty. It may be interpreted either as the long-run frequency with which an event occurs or as a degree of belief that it will occur. The goal of probability theory is to define a mathematical structure for describing the random outcomes of experiments. It offers a method for quantifying uncertainty as well as axioms for deriving new uncertain claims. Quantifying uncertainty requires the concept of a random variable, a function that maps the outcomes of random experiments to quantities of interest. The probability distribution associated with a random variable is a function that quantifies how likely a particular outcome (or set of outcomes) is. Probability distributions are a building block for other concepts, such as probabilistic modeling, graphical models, and model selection [1–3, 13–18].
4.1
Sample Space
The sample space is the set of all possible outcomes of an experiment, usually denoted by S. For example, two successive coin flips have the sample space {hh, tt, ht, th}, where “h” represents “heads” and “t” represents “tails” [1–3, 13–18].
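For small experiments, the sample space can be enumerated mechanically as all ordered sequences of individual outcomes. The sketch below builds the two-flip example with `itertools.product`.

```python
from itertools import product

# Sample space for two successive coin flips: every ordered pair of outcomes.
S = {"".join(flips) for flips in product("ht", repeat=2)}
print(sorted(S))   # ['hh', 'ht', 'th', 'tt']
```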
4.2
Event Space
The event space is the collection of events that can occur in the experiment, where each event is a subset of the sample space. A subset A of the sample space belongs to the event space 𝒜 if, at the end of the experiment, we can observe whether the outcome lies in A. For discrete probability distributions, the event space 𝒜 is built from collections of subsets of S, and 𝒜 is often the power set of S [1–3, 13–18].
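When the event space is the power set of S, it contains every subset of S, from the empty event to S itself. This sketch enumerates it for the two-flip sample space; `power_set` is a hypothetical helper built from `itertools`. A set of four outcomes gives 2⁴ = 16 events.

```python
from itertools import chain, combinations

S = ["hh", "ht", "th", "tt"]    # sample space for two coin flips

def power_set(items):
    """All subsets of items, from the empty set up to the full set."""
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [set(c) for c in subsets]

events = power_set(S)
print(len(events))                    # 16 = 2**4 events
at_least_one_head = {"hh", "ht", "th"}
print(at_least_one_head in events)    # True
```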
4.3
Probability
With each event A ∈ 𝒜, we associate a number P(A) that measures the probability, or degree of belief, that the event will occur. P(A) is called the probability of A. The probability of a single event must lie in the interval [0, 1], and the total probability over all outcomes in the sample space must be 1, i.e., P(S) = 1 (the sample space is sometimes written Ω) [1–3, 13–18].
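When all outcomes are equally likely, the probability of an event is the number of outcomes in it divided by the size of the sample space. The sketch below assumes a fair coin, so that the four two-flip outcomes are equally likely. It uses `fractions.Fraction` to keep the results exact.

```python
from fractions import Fraction

S = {"hh", "ht", "th", "tt"}          # equally likely outcomes (fair coin assumed)

def P(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

print(P({"hh", "ht", "th"}))  # 3/4, probability of at least one head
print(P(S))                   # 1, the total probability of the sample space
print(P(set()))               # 0, the impossible event
```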
4.4
Random Variables
A random variable is a variable that can take on different values at random. It may be discrete or continuous. A discrete random variable has a finite or countably infinite number of states, whereas a continuous random variable takes real values, such as any value within an interval [1–3, 13–18].
4.5
Probability Distributions
A probability distribution describes how likely a random variable, or a group of random variables, is to take on each of its possible states. The way a probability distribution is described depends on whether the variables are discrete or continuous. The most common probability distributions include:
• Bernoulli distribution.
• Multinoulli distribution.
• Gaussian distribution.
• Exponential and Laplace distributions.
• Dirac distribution and empirical distribution [1–3, 13–18].
4.6
Probability Mass Functions
A probability distribution over discrete variables may be described using a probability mass function (PMF). The PMF maps each state of a random variable to the probability of the random variable taking on that state. The probability that x = x (the random variable x taking the particular value x) is denoted P(x = x) or simply P(x). A probability of 1 indicates that x = x is certain, and a probability of 0 indicates that x = x is impossible. Probability mass functions can act on many variables at the same time; such a probability distribution over many variables is known as a joint probability distribution. P(x = x, y = y), or P(x, y), denotes the probability that x = x and y = y simultaneously [1–3, 13–18].
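A PMF over a finite set of states can be stored as a dictionary from states to probabilities. This sketch assumes a fair six-sided die and checks the two requirements: every probability lies in [0, 1], and the probabilities sum to 1. It then builds a joint PMF for two dice, under the additional assumption that the dice are independent.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each state x has P(x) = 1/6 (fair die assumed).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert all(0 <= p <= 1 for p in pmf.values())   # every probability lies in [0, 1]
assert sum(pmf.values()) == 1                     # probabilities over all states sum to 1

# Joint PMF of two independent fair dice: P(x, y) = P(x) P(y).
joint = {(x, y): pmf[x] * pmf[y] for x in pmf for y in pmf}
print(joint[(3, 5)])             # 1/36
print(sum(joint.values()))       # 1
```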
4.7
Probability Density Functions
When working with continuous random variables, probability distributions are described using a probability density function (PDF). To be a probability density function, a function p must satisfy the following properties:
• The domain of p must be the set of all possible states of x.
• ∀x ∈ x, p(x) ≥ 0. (Note that p(x) ≤ 1 is not required.)
• ∫p(x) dx = 1.
A probability density function p(x) does not give the probability of a specific state directly. Instead, the probability of landing inside an infinitesimal region of volume δx is given by p(x)δx. The density must be integrated over a set of points to obtain the actual probability mass of that set [1–3, 13–18].
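The PDF properties can be illustrated with the Gaussian density, one of the distributions listed earlier. This sketch integrates the density numerically with the midpoint rule, an approximation chosen for illustration. The total comes out close to 1, and the probability of an interval is obtained by integrating over it. A narrow Gaussian also shows that a density value may exceed 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Gaussian distribution N(mu, sigma**2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of ∫_a^b f(x) dx."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

print(integrate(normal_pdf, -10, 10))   # close to 1: total probability mass
print(integrate(normal_pdf, -1, 1))     # about 0.683: P(-1 <= x <= 1)
narrow = normal_pdf(0.0, sigma=0.1)
print(narrow)                           # about 3.99: a density value may exceed 1
```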
4.8
Marginal Probability
When the probability distribution over a set of variables is known, the distribution over just a subset of them is called the marginal probability distribution. For discrete variables, the marginal P(x) can be computed with the sum rule; for continuous variables, integration is required instead.
Discrete variables: ∀x ∈ x, P(x = x) = Σ_y P(x = x, y = y)
Continuous variables: p(x) = ∫p(x, y) dy [1–3, 13–18]
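For a discrete joint distribution stored as a table, the sum rule means adding up the entries that share the same x. This sketch uses a small hypothetical joint PMF over x ∈ {0, 1} and y ∈ {a, b, c}. The numbers are invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint PMF P(x, y) for x in {0, 1} and y in {"a", "b", "c"}.
joint = {
    (0, "a"): F(1, 10), (0, "b"): F(2, 10), (0, "c"): F(1, 10),
    (1, "a"): F(3, 10), (1, "b"): F(1, 10), (1, "c"): F(2, 10),
}

def marginal_x(joint):
    """Sum rule: P(x) = Σ_y P(x, y)."""
    result = {}
    for (x, y), p in joint.items():
        result[x] = result.get(x, 0) + p
    return result

print(marginal_x(joint))   # {0: Fraction(2, 5), 1: Fraction(3, 5)}
```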
4.9
Conditional Probability
Conditional probability is used to compute the probability of an event given that some other event has happened. It is denoted as P(y = y | x = x). P(y = y | x = x) = P(y = y, x = x) / P(x = x) The conditional probability is only defined when P(x = x) > 0 as it cannot be computed on an event that never happens [1–3, 13–18].
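Using the same kind of hypothetical joint table as for marginals, P(y | x) is obtained by dividing each joint entry by the marginal P(x). This sketch also refuses to condition on an event of probability zero, as the definition requires. The helper name and the table values are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint PMF P(x, y) stored as a small table.
joint = {
    (0, "a"): F(1, 10), (0, "b"): F(2, 10), (0, "c"): F(1, 10),
    (1, "a"): F(3, 10), (1, "b"): F(1, 10), (1, "c"): F(2, 10),
}

def conditional_y_given_x(joint, x):
    """P(y | x) = P(y, x) / P(x); only defined when P(x) > 0."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    if p_x == 0:
        raise ValueError("P(x) = 0: conditional probability is undefined")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

print(conditional_y_given_x(joint, 0))   # {'a': Fraction(1, 4), 'b': Fraction(1, 2), 'c': Fraction(1, 4)}
```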
4.9.1
Chain/Product Rule
Any joint probability distribution over many random variables may be decomposed into a product of conditional distributions, each over only one variable. Example:
P(a, b, c) = P(a | b, c) P(b, c)
P(b, c) = P(b | c) P(c)
P(a, b, c) = P(a | b, c) P(b | c) P(c) [1–3, 13–18]
4.10
Expectation
The expectation, or expected value, of a function f(x) with respect to a probability distribution P(x) is the average (mean) value that f takes when x is drawn from P. For discrete variables it is computed with a sum, and for continuous variables with an integral:
Discrete variables: E_{x∼P}[f(x)] = Σ_x P(x) f(x)
Continuous variables: E_{x∼p}[f(x)] = ∫p(x) f(x) dx [1–3, 13–18]
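The discrete formula is a weighted sum over the states of a PMF. This sketch assumes a fair six-sided die, for which E[x] = 21/6 = 7/2 and E[x²] = 91/6. The helper name `expectation` is hypothetical.

```python
from fractions import Fraction

# Fair six-sided die (assumption): P(x) = 1/6 for x = 1..6.
P = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(f, P):
    """E_{x~P}[f(x)] = Σ_x P(x) f(x) for a discrete distribution P."""
    return sum(p * f(x) for x, p in P.items())

print(expectation(lambda x: x, P))        # 7/2
print(expectation(lambda x: x ** 2, P))   # 91/6
```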
4.11
Variance
The variance measures how much the values of a function of a random variable x vary as different values of x are sampled from its probability distribution. When the variance is low, the values of f(x) cluster close to their expected value.
Var(f(x)) = E[(f(x) - E[f(x)])²]
The square root of the variance is known as the standard deviation [1–3, 13–18].
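The variance formula can be implemented directly on top of a discrete expectation. This sketch again assumes a fair die. Its variance is E[x²] - (E[x])² = 91/6 - 49/4 = 35/12, and the standard deviation is the square root of that.

```python
import math
from fractions import Fraction

P = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die (assumption)

def expectation(f, P):
    """E_{x~P}[f(x)] = Σ_x P(x) f(x)."""
    return sum(p * f(x) for x, p in P.items())

def variance(f, P):
    """Var(f(x)) = E[(f(x) - E[f(x)])**2]."""
    mean = expectation(f, P)
    return expectation(lambda x: (f(x) - mean) ** 2, P)

var = variance(lambda x: x, P)
print(var, math.sqrt(var))   # 35/12 and its square root, the standard deviation
```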
4.12
Covariance
The covariance indicates how strongly two values are linearly related to each other, as well as the scale of these variables. High absolute values of the covariance mean that the values vary a lot and are both far from their respective means at the same time. If the sign of the covariance is positive, both variables tend to take relatively high values simultaneously. If the sign is negative, then when one variable takes a relatively high value, the other tends to take a relatively low value, and vice versa.
Cov(f(x), g(y)) = E[(f(x) - E[f(x)])(g(y) - E[g(y)])]
Covariance and dependence are related concepts, but they are not the same. Two independent variables have zero covariance, so two variables with nonzero covariance must be dependent. The converse does not hold: two variables can have zero covariance and still be dependent, because covariance only captures linear relationships [1–3, 13–18].
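The difference between zero covariance and independence can be seen in a standard counterexample. This sketch uses x uniform on {-1, 0, 1} and y = x², a hypothetical illustration not taken from the text. Here y is completely determined by x, yet their covariance is exactly 0.

```python
from fractions import Fraction

def expectation(f, joint):
    """E[f(x, y)] = Σ P(x, y) f(x, y) over a discrete joint distribution."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def covariance(joint):
    """Cov(x, y) = E[(x - E[x])(y - E[y])]."""
    ex = expectation(lambda x, y: x, joint)
    ey = expectation(lambda x, y: y, joint)
    return expectation(lambda x, y: (x - ex) * (y - ey), joint)

# x uniform on {-1, 0, 1} and y = x**2: y is completely determined by x ...
third = Fraction(1, 3)
joint = {(x, x ** 2): third for x in (-1, 0, 1)}
print(covariance(joint))   # 0 ... yet x and y are clearly dependent
```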
Basic Mathematics
4.13
Correlation
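Correlation normalizes the contribution of each variable in order to measure only how much the variables are related, rather than also being affected by the scale of the individual variables. The Pearson correlation coefficient divides the covariance by the product of the two standard deviations, so it always lies between -1 and 1:
$$\rho(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}$$
[1–3, 13–18].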
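To illustrate that correlation ignores scale, the sketch computes the Pearson coefficient for a small, invented height and weight dataset. It then recomputes the coefficient after converting heights from metres to centimetres. The covariance changes by a factor of 100, but the correlation does not.

```python
import math


def pearson_correlation(xs, ys):
    """Covariance divided by the product of the standard deviations (population form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


# Invented example data.
heights_m = [1.50, 1.60, 1.70, 1.80]
weights_kg = [50.0, 58.0, 66.0, 75.0]

r_m = pearson_correlation(heights_m, weights_kg)
r_cm = pearson_correlation([h * 100 for h in heights_m], weights_kg)
print(r_m, math.isclose(r_m, r_cm))  # same correlation regardless of units -> True
```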
4.14
Bayes’ Rule
It is the mathematical rule that describes how to update a belief, given some evidence:
$$P(x \mid y) = \frac{P(x)\, P(y \mid x)}{P(y)}, \quad \text{where } P(y) = \sum_{x} P(y \mid x)\, P(x)$$
Here,
• P(x | y) is called the posterior probability.
• P(x) is called the prior probability.
• P(y | x) is called the likelihood.
• P(y) is called the marginal probability [1–3, 13–18].
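As a worked example of Bayes' rule, the sketch uses a hypothetical diagnostic test with 1% prevalence, 95% sensitivity, and 90% specificity. These numbers are assumptions chosen for illustration. Even after a positive result, the posterior probability of disease stays below 9%, because the prior is so small.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete hypothesis x.

    prior:      {x: P(x)}
    likelihood: {x: P(y | x)} for the observed evidence y
    Returns {x: P(x | y)}.
    """
    # Marginal probability P(y) = sum over x of P(y | x) P(x).
    p_y = sum(likelihood[x] * prior[x] for x in prior)
    return {x: prior[x] * likelihood[x] / p_y for x in prior}


# Hypothetical test: 1% prevalence, 95% sensitivity, 90% specificity.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood_positive = {"disease": 0.95, "healthy": 0.10}  # P(test positive | x)
post = posterior(prior, likelihood_positive)
print(post["disease"])  # about 0.0876
```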
5 General Applications of Linear Algebra, Calculus, and Probability
5.1
Linear Algebra
5.1.1
Computer Graphics
Computer graphics is the art of using computers to create images, either a single image or a series of images. At the heart of graphics is the projection of a three-dimensional scene onto a two-dimensional screen, and in its simplest form projection is a linear map. Furthermore, linear algebra is needed to properly carry out and understand rotations, scaling, and perspective. There are two types of computer graphics:
• RASTER (composed of pixels).
• VECTOR (composed of paths) [1–3, 19–22].
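The sketch below shows rotation, scaling, and projection as matrix-vector products. It assumes an orthographic projection, which simply drops the z coordinate. Perspective projection is not linear in ordinary coordinates and is usually handled with homogeneous coordinates, which are omitted here.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rotation_z(theta):
    """3x3 matrix rotating points by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def scaling(k):
    """3x3 matrix scaling every coordinate by k."""
    return [[k, 0, 0], [0, k, 0], [0, 0, k]]


# Orthographic projection onto the xy-plane (the "screen"): a linear map that drops z.
PROJECT = [[1, 0, 0], [0, 1, 0]]

point = [1.0, 0.0, 5.0]
rotated = matvec(rotation_z(math.pi / 2), point)  # about [0, 1, 5]
scaled = matvec(scaling(2), rotated)              # about [0, 2, 10]
on_screen = matvec(PROJECT, scaled)               # about [0, 2]
print([round(c, 6) for c in on_screen])
```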
5.1.2
Network Flow
Many problems involve a network of conductors along which some quantity flows. A net flow enters and exits the system at various points. The analysis of such systems is based on the principle that the total flow into the system must equal the total flow out, and that the total flow into each junction of the network must equal the total flow out of it. This condition produces, for each junction, a linear equation that relates the flows in the conductors meeting at that junction. Examples include irrigation networks, electrical networks, and networks of streets or freeways [1–3, 19–22].
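The sketch below works through a hypothetical three-junction network. 30 units enter at junction A and split into x1 (A to B) and x2 (A to C). At B, the flow x1 leaves as x3 (B to C) plus 10 units that exit. At C, x2 + x3 exit as 20 units. The three junction equations are linearly dependent, so one flow (x3 = 4) is assumed to be measured. The system is then solved by Gauss-Jordan elimination with exact fractions.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: add more independent equations")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Unknowns: [x1, x2, x3]. Each row is "flow in = flow out" at a junction, or a measurement.
equations = [
    [1, 1, 0],   # junction A: x1 + x2 = 30
    [1, 0, -1],  # junction B: x1 - x3 = 10
    [0, 0, 1],   # hypothetical measurement: x3 = 4
]
rhs = [30, 10, 4]
flows = solve(equations, rhs)
print(*flows)  # 14 16 4
```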
5.1.3
Recommender Systems
Recommender systems are a class of machine learning systems that offer users relevant suggestions based on previously collected data [1–3, 19–22].
5.1.4
Natural Language Processing and Word Embedding
Natural language processing (NLP) applications include chatbots, speech recognition, and text analysis. Word embedding is a type of word representation, usually a vector of real numbers, that allows machine learning algorithms to recognize words with similar meanings. Examples include Grammarly and chatbots [1–3, 19–22].
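Similarity between word embeddings is commonly measured with cosine similarity, which is a dot product divided by the product of the vector lengths. The three-dimensional "embeddings" below are invented for illustration. Real embeddings are learned from text and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy embeddings; the numbers are invented for illustration.
embeddings = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal, sim_fruit)  # the first is close to 1, the second much smaller
```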
5.1.5
Error Correcting Codes
Another inconspicuous but commonly used application of linear algebra is coding theory. The purpose is to encode data so that the original data can be recovered even if some of the encoded data is corrupted [1–3, 19–22].
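As one concrete example, chosen by us and not named in the chapter, the Hamming(7,4) code is a linear code over GF(2). Data bits are encoded by multiplying by a generator matrix G (mod 2). A single flipped bit is then located by multiplying the received word by a parity-check matrix H (mod 2). The bit layout p1 p2 d1 p3 d2 d3 d4 is one standard convention.

```python
# Generator matrix G (4x7) and parity-check matrix H (3x7) over GF(2),
# for the codeword layout p1 p2 d1 p3 d2 d3 d4 (positions 1..7).
G = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(data):
    """Codeword = data . G with arithmetic mod 2."""
    return [sum(d * g for d, g in zip(data, col)) % 2 for col in zip(*G)]


def decode(received):
    """Fix up to one flipped bit using the syndrome H . r (mod 2), then read the data bits."""
    r = list(received)
    syndrome = [sum(h * b for h, b in zip(row, r)) % 2 for row in H]
    error_pos = syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]  # 1-based; 0 means no error
    if error_pos:
        r[error_pos - 1] ^= 1
    return [r[2], r[4], r[5], r[6]]


data = [1, 0, 1, 1]
codeword = encode(data)
corrupted = codeword.copy()
corrupted[4] ^= 1  # simulate a single-bit transmission error
print(decode(corrupted) == data)  # True
```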
5.1.6
Facial Recognition
Facial recognition can employ principal component analysis (PCA), a linear algebraic approach. Essentially, this amounts to finding a good basis for representing a database of face photos and reconstructing the images from a small number of eigenvectors [1–3, 19–22].
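The core PCA step can be sketched on toy data: find the top eigenvector of the covariance matrix, then reconstruct a point from its projection onto that eigenvector. Here the eigenvector is found by power iteration, a simple method chosen for illustration; practical systems usually use an SVD from a numerical library. The 2-D points stand in for flattened face images, which would have thousands of pixel dimensions.

```python
import math


def principal_component(data, iters=200):
    """Return (mean, top eigenvector of the sample covariance matrix) via power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruct(x, mean, v):
    """Project x onto the component and map back: mean + ((x - mean) . v) v."""
    coeff = sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v))
    return [mi + coeff * vi for mi, vi in zip(mean, v)]


# Toy 2-D points lying close to the line y = x.
points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
mean, v = principal_component(points)
print([round(c, 2) for c in v])         # roughly [0.71, 0.71], i.e. along y = x
print(reconstruct([3.0, 3.2], mean, v))  # the point rebuilt from one coefficient
```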
5.1.7
Signal Analysis
Signal analysis provides useful tools for encoding, analyzing, and modifying signals such as music, images, video, and even X-rays and light reflected through a crystal. The Fourier transform is best understood as a linear map that performs a change of basis [1–3, 19–22].
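To make the "linear map" view concrete, the sketch writes the discrete Fourier transform (DFT) as multiplication by an n × n matrix whose rows are the Fourier basis vectors. A cosine that completes two cycles over eight samples has all of its energy at frequencies 2 and 6 (= n - 2). The direct O(n²) matrix product is for illustration only; in practice a fast Fourier transform is used.

```python
import cmath
import math


def dft_matrix(n):
    """n x n DFT matrix with entries F[k][t] = exp(-2*pi*i*k*t / n)."""
    return [[cmath.exp(-2j * math.pi * k * t / n) for t in range(n)] for k in range(n)]


def dft(signal):
    """The DFT as a linear map: multiply the signal vector by the DFT matrix."""
    n = len(signal)
    f = dft_matrix(n)
    return [sum(row[t] * signal[t] for t in range(n)) for row in f]


n = 8
signal = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]  # 2 cycles over 8 samples
spectrum = dft(signal)
print([round(abs(c), 6) for c in spectrum])  # magnitude 4 at k = 2 and k = 6, 0 elsewhere
```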
5.1.8
Quantum Computing
Quantum computing, like quantum physics in general, is entirely based on linear algebra. Quantum algorithms are written in the language of linear algebra, which is used to describe qubit states and quantum operations and to predict how a quantum computer will respond to instructions [1–3, 19–22].
5.2
Calculus
5.2.1
Industrial Construction
Integration is used to calculate the quantity of materials required to build curved structures (such as domes) and to calculate the weight of such structures. Calculus is also used to improve the architecture of buildings and key infrastructure such as bridges [1–3, 9–12, 23, 24].
5.2.2
Space Flight and Research
5.2.3
Bacterial Growth
Biologists use differential calculus to estimate the growth rate of a bacterial culture when factors such as environmental conditions and food supply are changed [1–3, 9–12, 23, 24].
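A minimal sketch of the idea follows. It assumes unlimited exponential growth, P(t) = P0·e^(kt), which solves dP/dt = kP, and uses a hypothetical starting population and growth constant. The instantaneous growth rate is estimated with a finite difference and compared with the analytic derivative kP(t). The doubling time ln 2 / k is also computed.

```python
import math


def population(t, p0, k):
    """Exponential growth model P(t) = P0 * exp(k t), a solution of dP/dt = k P."""
    return p0 * math.exp(k * t)


def growth_rate(t, p0, k, h=1e-6):
    """Instantaneous rate dP/dt estimated with a central finite difference."""
    return (population(t + h, p0, k) - population(t - h, p0, k)) / (2 * h)


p0, k = 1000.0, 0.5  # hypothetical: 1000 cells, growth constant 0.5 per hour
t = 3.0
numeric = growth_rate(t, p0, k)
analytic = k * population(t, p0, k)  # about 2240.8 cells per hour
doubling_time = math.log(2) / k      # about 1.386 hours
print(numeric, analytic, doubling_time)
```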
5.2.4
Graphics
A graphics artist uses calculus to determine how different three-dimensional models will behave under rapidly changing conditions, which helps create realistic environments for films and video games [1–3, 9–12, 23, 24].
5.2.5
Chemistry
Calculus is used to determine the rate of a chemical reaction and to derive key quantities of radioactive decay, such as the amount of material remaining after a given time [1–3, 9–12, 23, 24].
5.2.6
Shipbuilding
Calculus has been used for many years to determine the curve of the ship’s hull (using differential calculus), the area under the hull (using integral calculus), and even in the general design of ships [1–3, 9–12, 23, 24].
5.2.7
Price Elasticity
Economists use calculus to calculate the price elasticity of demand, which measures how strongly the quantity demanded responds to a change in price. Because the supply-and-demand curve changes continuously, calculus allows elasticity to be evaluated at specific points on that curve [1–3, 9–12, 23, 24].
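Point elasticity is the derivative dQ/dP multiplied by P/Q at a chosen point on the curve. The sketch assumes a hypothetical linear demand curve Q = 100 - 2P, for which dQ/dP = -2 everywhere. The elasticity still changes along the curve, because P/Q does.

```python
def demand(price, a=100.0, b=2.0):
    """Hypothetical linear demand curve Q(P) = a - b P."""
    return a - b * price


def point_elasticity(price, a=100.0, b=2.0):
    """Point price elasticity of demand (dQ/dP) * (P / Q); here dQ/dP = -b."""
    return -b * price / demand(price, a, b)


for p in (10.0, 25.0, 40.0):
    # -0.25 (inelastic), -1.0 (unit elastic), -4.0 (elastic)
    print(p, point_elasticity(p))
```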
5.2.8
Astronomy
Astronomers study celestial bodies using the laws of planetary motion. Calculus is used to determine the orbits of moving bodies from the rate at which their positions change over time [1–3, 9–12, 23, 24].
5.2.9
Cardiac Output: Blood Flow
Calculus is also used to calculate cardiac output, the volume of blood pumped by the heart per unit time. In the dye-dilution method, a known amount of dye is injected and its concentration in the blood is measured over time. The cardiac output equals the amount of dye divided by the integral of that concentration curve over time [1–3, 9–12, 23, 24].
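The dye-dilution formula F = A / ∫c(t) dt needs the integral of the measured concentration curve. The sketch approximates that integral with the trapezoidal rule. The dye amount and concentration readings are hypothetical values chosen for illustration.

```python
def trapezoid(ts, cs):
    """Trapezoidal-rule approximation of the integral of c(t) dt from sampled values."""
    return sum((t1 - t0) * (c0 + c1) / 2 for t0, t1, c0, c1 in zip(ts, ts[1:], cs, cs[1:]))


def cardiac_output(dye_mg, ts, cs):
    """Dye-dilution estimate F = A / integral of c(t) dt; mg, mg/L, and s give L/s."""
    return dye_mg / trapezoid(ts, cs)


# Hypothetical measurements: 5 mg of dye, concentration sampled every 2 seconds.
times = [0, 2, 4, 6, 8, 10, 12]           # seconds
conc = [0.0, 4.0, 10.0, 9.0, 5.0, 2.0, 0.0]  # mg/L
flow_l_per_s = cardiac_output(5.0, times, conc)
print(flow_l_per_s * 60)  # litres per minute, about 5.0
```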
5.2.10
Cancer: Monitor Tumor
Calculus is also utilized to determine whether a tumor is growing or shrinking. It can also figure out how many cells are in the tumor. It employs an exponential function to examine the disease’s progress or reversal [1–3, 9–12, 23, 24].
5.2.11
Calculating Weather Patterns
Using differential equations, you can calculate how changing atmospheric conditions affect the weather, from variables such as temperature and pressure changes [1–3, 9–12, 23, 24].
5.3
Probability
5.3.1
Weather Planning
A probability forecast expresses, as a percentage, how likely a weather event is to occur, and conveys the risks associated with the weather [1–3, 13–18, 25].
5.3.2
Insurance
When structuring a policy or determining a premium rate, insurance firms apply probability theory. The probabilistic view is a statistical strategy for predicting the likelihood of future outcomes [1–3, 13–18, 25].
5.3.3
Games and Game Theory
Game theory is the study of mathematical models of strategic interaction among rational decision-makers. It is applied in social science, logic, systems science, and computer science. In blackjack, poker, gambling, sports, board games, and video games, probability determines how likely a team or individual is to win [1–3, 13–18, 25].
5.3.4
Elections
Political analysts use exit polls to estimate the probability of a candidate or party winning or losing an election. Probability techniques are used to predict the results after voting has taken place [1–3, 13–18, 25].
5.3.5
Winning Lottery
In a typical lottery game, each player chooses six different numbers from a particular range. The ticket holder wins the jackpot if all six numbers on the ticket match the winning numbers, regardless of their order [1–3, 13–18, 25].
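Because order does not matter, the number of possible tickets is the binomial coefficient C(n, 6), and a single ticket's jackpot probability is 1 / C(n, 6). The sketch assumes the numbers are drawn from 1 to 49, a common format that the chapter does not specify.

```python
from math import comb


def jackpot_probability(pool_size=49, picks=6):
    """Chance that one ticket matches all drawn numbers when order does not matter: 1 / C(n, k)."""
    return 1 / comb(pool_size, picks)


print(comb(49, 6))            # 13983816 possible tickets
print(jackpot_probability())  # about 7.15e-08
```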
5.3.6
Rate of Accidents
The likelihood of occurrence of vehicular accidents is estimated using the concepts and techniques in probability [1–3, 13–18, 25].
5.3.7
Typing on Smart Devices
When we type, the software constantly suggests words. It does this using probability, suggesting the words that most commonly follow what has already been typed [1–3, 13–18, 25].
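One simple way to turn word frequencies into suggestions is a bigram model. It estimates P(next word | previous word) from counts in a body of text. This is a sketch under that assumption; real keyboards use larger, more sophisticated language models. The tiny corpus is invented.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def suggest(counts, prev_word, k=2):
    """Return up to k next words with estimated P(next | prev) = count / total."""
    following = counts.get(prev_word.lower(), Counter())
    total = sum(following.values())
    return [(w, c / total) for w, c in following.most_common(k)]


corpus = "see you soon . see you later . see you soon . thank you"
model = train_bigrams(corpus)
print(suggest(model, "you"))  # 'soon' (about 0.67) ranked above 'later' (about 0.33)
```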
5.3.8
Traffic Signals
The idea of probability is utilized to manage and monitor traffic flow on roadways. People can also use the concept of probability to figure out how long they will have to wait at the signals [1–3, 13–18, 25].
5.3.9
Sports Betting
Probability is heavily used by sports betting companies to determine the odds they should set for specific teams to win certain games [1–3, 13–18, 25].
5.3.10
Sales Forecasting
Many retailers utilize probability to forecast the likelihood of selling a specific number of items on a particular day, week, or month. This enables businesses to predict how much inventory they will require [1–3, 13–18, 25].
5.3.11
Natural Disasters
Countries’ environmental ministries frequently use probability to estimate the chances of a natural disaster occurring in a particular year. If the probability is relatively high, the ministry makes decisions about housing, resource allocation, etc., that minimize the effects of natural disasters [1–3, 13–18, 25].
5.3.12
Investments
Investors use probability to assess how likely a particular investment is to pay off. Based on this probability, the investor decides how much of their net worth to invest [1–3, 13–18, 25].
References
1. Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for machine learning. Cambridge University Press.
2. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
3. Trask, A. W. (2019). Grokking deep learning. Simon and Schuster.
4. Axler, S. (2015). Linear algebra done right (Vol. 2). Springer.
5. Brownlee, J. (2018). Basics of linear algebra for machine learning. Machine Learning Mastery.
6. Aggarwal, C. C., Aggarwal, L. F., & Lagerstrom-Fife. (2020). Linear algebra and optimization for machine learning (Vol. 156). Springer.
7. Elgohary, A., Boehm, M., Haas, P. J., Reiss, F. R., & Reinwald, B. (2016). Compressed linear algebra for large-scale machine learning. Proceedings of the VLDB Endowment, 9(12), 960–972.
8. Dhanalakshmi, P. (2021). Linear algebra for machine learning. In Artificial intelligence theory, models, and applications (pp. 405–428). Auerbach Publications.
9. Banner, A. (2007). The calculus lifesaver: All the tools you need to excel at calculus. Princeton University Press.
10. Thompson, S. P., & Gardner, M. (1998). Calculus made easy. Macmillan.
11. Brownlee, J., Cristina, S., & Saeed, M. (2022). Calculus for machine learning. Machine Learning Mastery.
12. Laue, S., Mitterreiter, M., & Giesen, J. (2020). A simple and efficient tensor calculus for machine learning. Fundamenta Informaticae, 177(2), 157–179.
13. Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge University Press.
14. Morin, D. J. (2016). Probability: For the enthusiastic beginner. Createspace Independent Publishing Platform.
15. DasGupta, A. (2011). Probability for statistics and machine learning: Fundamentals and advanced topics (pp. 1057–7149). Springer.
16. Unpingco, J. (2016). Python for probability, statistics, and machine learning (Vol. 1). Springer.
17. Hernández-Orozco, S., Zenil, H., Riedel, J., Uccello, A., Kiani, N. A., & Tegnér, J. (2021). Algorithmic probability-guided machine learning on non-differentiable spaces. Frontiers in Artificial Intelligence, 3, 567356.
18. Coenen, L., Verbeke, W., & Guns, T. (2022). Machine learning methods for short-term probability of default: A comparison of classification, regression and ranking methods. Journal of the Operational Research Society, 73(1), 191–206.
19. Halim, S. (2020). Application of linear algebra in machine learning. Interface, 7(02).
20. Nicholson, W. K. (2020). Linear algebra with applications.
21. Gilbert, W. J., & Nicholson, W. K. (2004). Modern algebra with applications. Wiley.
22. Nicholson, K. W. (2019). Linear algebra with applications, 2019A version (Lyryx).
23. Marvin, L., Ellenbogen, D. J., & Surgent, S. J. (2014). Calculus and its applications, expanded version.
24. Niu, H., Chen, Y., Guo, L., & West, B. J. (2021, August). A new triangle: Fractional calculus, renormalization group, and machine learning. In International design engineering technical conferences and computers and information in engineering conference (Vol. 85437, V007T07A022). American Society of Mechanical Engineers.
25. Borovcnik, M., & Kapadia, R. (2012). Applications of probability: The Limerick experiments. Topic Study Group.
Introduction to the World of Bioinformatics
Sarbani Mishra, Sudiptee Das, Madhusmita Rout, Sanghamitra Pati, Ravindra Kumar, and Budheswar Dehury
1 Laying Foundation
In simple terms, bioinformatics may be defined as a biological field that deals with the storage, retrieval, and analysis of large amounts of biological information. Bioinformatics is still an emerging interdisciplinary field that is growing rapidly. It focuses on accelerating and scaling up biological research [1]. It conceptualizes biology in terms of macromolecules and then applies information technology and techniques derived from various other fields, such as mathematics, statistics, biophysics, biochemistry, and computer science, to understand, organize, and retrieve information related to these macromolecules on a large scale [2]. The basic principle lies in the use of computational techniques to solve problems that arise in biology. Initially an effort to make genome sequencing easier, it has now become a vibrant discipline that contributes significantly to all aspects of biological research [3]. Bioinformatics deals with the sequences and structures of genes and proteins; a sequence is the simplest way to represent a macromolecule. The structure of a macromolecule can provide more information about protein folding patterns and evolutionary relationships. Furthermore, bioinformatics covers genome structure and function, knowledge of which is continuously updated. In addition, it deals with bibliographic data, such as abstracts of scientific articles from research projects [4]. Bioinformatics applications include gene-protein interactions, protein structure prediction, protein function analysis, etc. Despite significant achievements made over the last few decades in the field of bioinformatics, spanning genes, proteins, genomes, transcriptomes, and metabolomes, it is still an emerging field, and many more developments are yet to come. This chapter provides a brief introduction to the world of bioinformatics, beginning with its origin and history, followed by its goals and the application of different bioinformatics tools and databases to the management of macromolecules such as proteins and nucleic acids.
Sarbani Mishra and Sudiptee Das have contributed equally to this chapter. S. Mishra · S. Das · M. Rout · S. Pati · B. Dehury (✉) Bioinformatics Division, ICMR-Regional Medical Research Centre, Bhubaneswar, Odisha, India R. Kumar (✉) School of Biotechnology, National Institute of Technology Calicut, Kozhikode, Kerala, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_4
2 A Brief History of Bioinformatics
Paulien Hogeweg coined the term bioinformatics in 1970, but systematic work in bioinformatics was begun by Margaret Dayhoff (the first bioinformatician) and her collaborators during the 1960s. Hence, Dayhoff is known as the pioneer of bioinformatics. Bioinformatics originated between 1950 and 1970. Protein analysis during the 1950s marked the beginning of bioinformatics, notably with the publication of the first protein sequence, that of insulin. In the early 1960s, the very first bioinformatics software was developed to help determine protein sequences. Dayhoff, along with Robert S. Ledley, developed COMPROTEIN, "a complete computer program for IBM 7090" for the determination of protein primary structure, at the National Biomedical Research Foundation (NBRF). It was the first de novo sequence assembler, in which both the input and output amino acid codes were represented in a three-letter code. Dayhoff later developed the one-letter code for amino acids [5]. In 1970, the world of bioinformatics grew further with the development of the Needleman-Wunsch algorithm, a dynamic programming algorithm for pairwise sequence comparison. The world's first nucleotide database, European Molecular Biology Laboratory (EMBL), was established in 1974. Another milestone was achieved when Dayhoff, Schwartz, and Orcutt developed the very first probabilistic model of amino acid substitution in 1978, i.e., the PAM substitution model [5]. At the beginning of the 1980s, the maximum likelihood method of phylogenetic tree construction was proposed. The growing number of sequences made fast sequence comparison necessary, which led to the development of BLAST, a tool for local sequence alignment, in 1990. The significant achievements in bioinformatics over the last few decades are illustrated in Fig. 1.
The three main breakthroughs in the history of bioinformatics are (i) the development of more efficient and cost-effective DNA sequencing methods, (ii) the rise of supercomputers and the Internet, and (iii) the genome projects [6]. The sequencing of the whole human genome using bioinformatics tools gave the world a new viewpoint. The Human Genome Project was officially completed in 2003. In the meantime, the genomes of many other organisms were sequenced using the shotgun sequencing method, such as Haemophilus influenzae (1995) [7],
Fig. 1 Various milestones achieved in the field of bioinformatics over the last few decades, starting from the 1970s. The development of the Needleman-Wunsch algorithm along with the Protein Data Bank, the sequencing of the first genome using Sanger sequencing, and the development of the PAM algorithm took place in the 1970s. In the 1980s came the development of the Smith-Waterman algorithm, the FASTA algorithm, and the neighbor-joining phylogenetic method, the creation of NCBI, and the sequencing of phage lambda. The development of the BLAST algorithm and the genome sequencing of S. cerevisiae and H. influenzae were achieved in the 1990s, along with making EMBL publicly available for the first time. The 2000s saw the completion of the Human Genome Project, the beginning of next-generation sequencing techniques such as 454 pyrosequencing, and the completion of rice genome sequencing. The years 2010–2020 witnessed the development of the SOLiD and Ion Torrent sequencing techniques, the complete genome sequencing of Drosophila melanogaster, AlphaFold, and the Nanopore sequencer. During the 2020 pandemic, bioinformatics played a significant role in the complex analysis of SARS-CoV-2.
Drosophila melanogaster (2010) [8], and many more. This led to the development of high-throughput, or next-generation, genome sequencing platforms, each new technology reducing both processing time and cost [9]. In 2018, an artificial intelligence system called AlphaFold was developed to address the 50-year-old problem of protein folding, an achievement recognized by the organizers of the biennial Critical Assessment of protein Structure Prediction (CASP) [10, 11]. In the last few years, bioinformatics has also played a crucial role in the complex analysis of SARS-CoV-2.
3 Goals of Bioinformatics
The development of bioinformatics focuses mainly on the following goals.
(i) Development of databases to store and organize the datasets produced by various research works, which help researchers to access
Fig. 2 The different aspects of bioinformatics, including similarity search techniques, sequence alignment, functional annotation of biological macromolecules, prediction of protein secondary and tertiary structures, and the study of evolutionary relationships between sequences
existing information and submit new entries. For example, databases such as GenBank, EMBL, and DDBJ store information about nucleotide sequences.
(ii) Development of tools and software packages for performing analyses on different data that are difficult to perform experimentally. The development of these tools requires thorough knowledge of both computational and biological theory. For example, the different variants of BLAST are used to search for local alignments between sequences that may be closely or distantly homologous.
(iii) Thorough investigation and interpretation, using these tools, of the data collected in the databases for various research works.
The ultimate aim of bioinformatics is to discover new biological insights and to create a global perspective from which unifying principles in biology can be discerned [12]. The different areas of bioinformatics are illustrated in Fig. 2.
4 Genome, Genes, and Sequence
4.1
Gene
A gene is a segment of DNA, located on a chromosome, that controls phenotypic characters in an organism. It is the structural and functional unit of heredity. In eukaryotes, it may also be defined as a structural arrangement of exons (coding regions) and introns (intervening non-coding regions) that alternate throughout the sequence [13]. It carries the information for forming a
specific polypeptide or RNA molecule. The study of genes and heredity is known as genetics. The process through which a gene is turned on, so that it is transcribed into RNA and, for protein-coding genes, translated into protein, is known as gene expression. Several gene and gene expression databases have been developed to record the details of the genes of different organisms and their characteristics.
4.1.1
Gene Prediction
Gene prediction may be defined as the process of identifying coding regions in DNA. It is one of the most critical problems in computational biology and bioinformatics. The gene identification problem involves predicting genes that are transcribed into RNAs, which are then translated into proteins. However, many functional genes do not code for any protein; they are only transcribed into RNA, which helps regulate gene expression and protein synthesis. These genes lack the sequence features of protein-coding genes, which makes them difficult to find with traditional gene-finding programs. Various computational tools have been developed to perform this tedious task. Because eukaryotic genes are more complex (for example, they are interrupted by introns), prokaryotic genes are easier to identify than eukaryotic genes. The two fundamental approaches for gene finding are (i) similarity-based approaches and (ii) ab initio-based approaches.
4.1.1.1
Similarity-Based Approaches
The main concept of this approach is to search for similarities between the input genome and known gene sequences, such as ESTs (expressed sequence tags) or sequences from other genomes [14]. This approach is based on the assumption that the functional regions of a gene are evolutionarily more conserved than non-functional regions [14]. The two widely used similarity search methods are local alignment and global alignment (which we shall study in Sect. 4.3). Various tools have been developed to perform these tasks; BLAST and NEEDLE are the most widely used tools for local and global alignment-based similarity searches, respectively. Many protein homology methods have been applied in gene prediction programs such as GENEWISE [15], GENOMESCAN [16], GeneParser [17], GRAIL [18], Genie [19], and HMMgene [20].
4.1.1.2
Ab Initio-Based Approaches
This approach identifies genes by using gene structure as a template to detect them, hence the name ab initio prediction. Ab initio prediction relies on two types of sequence information: signal sensors, which detect short functional sites such as start codons, stop codons, and splice sites, and content sensors, which detect statistical properties of coding regions such as codon usage [14]. Many different algorithms have been applied to model gene structures, such as dynamic programming, hidden Markov models, and neural networks. Based on these models, a
large number of ab initio gene prediction programs have been developed, some of which, such as GeneParser [17], Genie, and GRAIL, also incorporate similarity searches. Other programs include GeneID [21], FGENESH [22], GENSCAN [23], HMMgene, and AUGUSTUS [24]. Gene prediction is still an active area of research that requires comprehensive biological knowledge of gene expression at the molecular level. Hence, a great deal of experimental effort is still required to increase the pace of gene discovery [14].
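To give a feel for the simplest kind of signal sensor, the sketch scans the three forward reading frames for open reading frames (ORFs): a start codon ATG followed in frame by a stop codon (TAA, TAG, or TGA). It is only an illustration, and the function name `find_orfs` and the `min_codons` threshold are hypothetical. Tools such as GENSCAN or AUGUSTUS combine many signals with content statistics in probabilistic models. They also handle introns and the reverse strand, both of which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Scan the three forward frames for ATG ... stop open reading frames.

    Returns 0-based (start, end) pairs, half-open, with end including the stop codon.
    min_codons counts codons from the start codon up to (not including) the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # signal: start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # signal: stop codon closes the ORF
    return orfs


seq = "CCATGAAACCCGGGTTTTAGCC"
print(find_orfs(seq))  # [(2, 20)]: ATG AAA CCC GGG TTT TAG
```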
4.2
Genome
A genome is the complete set of genetic material in an organism, which, due to commonalities, tends to favor the same phenotypes [25]. As suggested by the WHO, genomics is the study of the genetic or epigenetic sequence information of organisms, which also tries to understand the structure and function of these sequences [26]. The main difference between genetics and genomics is that the former studies the function and composition of a single gene, whereas the latter examines the interrelationships among all genes, with the aim of identifying their combined influence on the growth and development of an organism [27]. All genomic information, such as genome size, the sequence of the whole genome, genome arrangement, and genome evolution, can be gathered from the study of genomics. Genomics also helps us understand the variability among genomes, which may arise from various sources such as duplication events, transposons, and microsatellites. It is the starting point for understanding the other "OMICS" sciences [27]. The databases useful for gene and genome analysis are summarized in Table 1. Genomics can be divided into three types: (i) functional genomics, (ii) comparative genomics, and (iii) structural genomics.
4.2.1
Functional Genomics
The main goal of functional genomics is to understand the relationship between an organism’s genome and phenotype. Functional genomics deals with the characterization of the genome at the functional level. It mainly focuses on transcription, translation, and protein-protein interactions (PPI). It describes the characterization of genes from the genome at the transcriptome level (transcriptomics) and at the proteome level (proteomics).
4.2.2
Structural Genomics
The detailed study of the organization and sequence of DNA in the whole genome is known as structural genomics. It can be studied by developing chromosomal maps,
Table 1 List of biological databases useful for analysis of genes and genomes
Gene and genomic database | Description
Gene Expression Omnibus (GEO) database [28] | A public repository that freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community
Online Mendelian Inheritance in Man (OMIM) [29] | Contains details about human genes, genetic disorders, and traits; it mainly focuses on the molecular relationship between genetic variation and phenotypic expression
RefSeqGene [30] | A collection of human gene-specific reference genomic sequences; RefSeqGene is a subset of NCBI's RefSeq database
BioProject [31] | A collection of biological data related to a single initiative, originating from a single organization or a consortium; the database provides users with links to the diverse data generated for that project
Gene [32] | A searchable database of genes that focuses on genomes that have been completely sequenced; it carries information about gene nomenclature, chromosomal localization, gene products, associated markers, phenotypes, interactions, etc.
Gene Expression Database (GXD) [33] | Gene expression information from the laboratory mouse
Gene Expression database of Normal and Tumoral tissues (GENT) [34] | Gene expression profiles across diverse human cancer and normal tissues, with samples generated on a microarray platform and processed consistently
generating expressed sequence tags from cDNA libraries, sequencing whole genomes, and whole-genome re-sequencing.
4.2.2.1
Genome Mapping
Genome mapping has proven to be the most critical component of structural genomics. There are various types of genome maps, such as cytogenetic maps, genetic (linkage) maps, and physical maps. Cytogenetic mapping deals with the visual appearance of stained chromosomes examined under a microscope. A cytogenetic map is created by determining, under the microscope, the positions of visible structures on fixed and stained mitotic or meiotic chromosomes. Such maps can be used to correlate information with genetic maps. The genetic map of a species shows the positions of its known genes or genetic markers relative to one another, based on recombination frequency. A physical map, in contrast, represents a chromosome in terms of the physical distances between genes at different positions, measured in nucleotide bases.
4.2.3
Comparative Genomics
Comparative genomics focuses on deriving the common features of two or more organisms, often encoded in DNA that has been conserved from their common ancestor. The importance of comparative genomics lies in comparing gene numbers, gene locations, and biological functions in the genomes of different organisms, with the aim of identifying the group of genes that plays a unique role in a particular organism. It can help predict the functions of many such genes by analyzing genomic context, gene fusions, gene distributions, and co-expression. Comparative genomics helps to study organisms' evolutionary history by comparing related species or strains. Because all living organisms share a common evolutionary history, it becomes possible to understand the significant differences and similarities between species. This technique proves helpful where financial and technical problems limit direct study [27].
4.3
Sequence
A biological sequence refers to the sequence of a nucleic acid (DNA or RNA) or a protein. Bioinformatics has proven helpful for analyzing DNA, RNA, and protein sequences, using various tools and software to extract knowledge about their properties, biological functions, structure, and evolution [35].
4.3.1
Sequence Homology
Sequence analysis is one of the major applications of bioinformatics. The tools developed for sequence analysis are based on sequence alignment and on determining the relatedness between a query sequence and template sequences in a database. Sequence homology refers to shared evolutionary ancestry. Based on homology, sequences can be classified as orthologous or paralogous. Homologous sequences that are separated by speciation events are called orthologs. Orthologs are conserved genes that often retain the same function. Examples of orthologs are the plant FLU regulatory protein in Arabidopsis and Chlamydomonas. Paralogous sequences are homologous sequences that are separated by gene duplication events. Paralogs occupy different positions in the genome and often have different functions. An example of a paralogous pair is the haemoglobin gene of humans and the myoglobin gene of chimpanzees.
4.3.2
Sequence Alignment
Sequence alignment is the identification and arrangement of regions of similarity in nucleic acids and protein sequences for understanding their evolutionary
relationships [36]. The comparison between a pair of sequences is known as pairwise sequence alignment (PSA), whereas the comparison among a group of sequences is known as multiple sequence alignment (MSA).
4.3.3
Pairwise Sequence Alignment
PSA is one of the most fundamental tools in bioinformatics and forms the basis for MSA. PSA methods are broadly divided into local alignment and global alignment. Local alignment searches for local regions of similarity between sequences, whereas global alignment aligns the entire sequences end to end. The most common algorithm for global alignment is the Needleman-Wunsch algorithm, and the corresponding algorithm for local alignment is the Smith-Waterman algorithm. The Needleman-Wunsch algorithm [37] finds optimal alignments between pairs of sequences. However, as the number of sequences increases, aligning them pair by pair becomes tedious and time-consuming. MSA methods were developed to overcome this limitation of PSA. BLAST (Basic Local Alignment Search Tool) is a widely used local alignment search tool.
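The dynamic-programming idea behind the two classic algorithms can be sketched in a few lines. The snippet below is a minimal teaching version. It assumes a simple match/mismatch score and a linear gap penalty; the values +1, -1 and -2 are arbitrary choices, not taken from the chapter. Real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties. `needleman_wunsch` fills the full score matrix and traces back one optimal global alignment. `smith_waterman_score` floors every cell at zero and keeps the best cell found anywhere in the matrix, which is what makes the alignment local.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # a[i-1] against a gap in b
                          F[i][j - 1] + gap)   # b[j-1] against a gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def smith_waterman_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score: cells are floored at 0 and the best cell anywhere wins."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 4
print(top)
print(bottom)
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))  # 3
```

Global alignment must pay for the extra T in GATTACA with one gap (six matches minus a gap penalty of 2 gives 4). The local score only counts the shared ACG core and ignores the dissimilar flanks.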
4.3.4
Multiple Sequence Alignment
MSA offers advantages over PSA because it considers more sequences of a family and therefore provides additional biological information [38]. MSA has proven helpful in comparative genomics for identifying and quantifying conserved regions in a sequence family, and many biological analyses require it. It is also a vital modeling tool that brings together computational and biological problems [39]. MSA can be used to build phylogenetic trees that describe the evolutionary relationships within a group of sequences, to predict protein secondary and tertiary structures, and to predict protein function, among other applications.
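Per-column identity is one simple way to quantify conserved regions in an MSA. It is the fraction of sequences that carry the most common residue in each column. The sketch below assumes the alignment has already been computed and is given as equal-length strings with '-' for gaps. The toy alignment and the default threshold of 1.0 (fully conserved) are illustrative choices. Entropy-based conservation scores are a common alternative.

```python
from collections import Counter


def column_identity(msa: list[str]) -> list[float]:
    """Per column: share of sequences carrying the most common non-gap residue."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [row[col] for row in msa if row[col] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(msa))  # gaps count against conservation
    return scores


def conserved_columns(msa: list[str], threshold: float = 1.0) -> list[int]:
    """0-based indices of columns whose identity reaches the threshold."""
    return [i for i, s in enumerate(column_identity(msa)) if s >= threshold]


# Toy alignment of three hypothetical protein fragments
msa = ["MKT-LV", "MKSALV", "MRT-LI"]
print(conserved_columns(msa))  # [0, 4]
```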
4.3.5
Algorithm for MSA
Because MSA is important in so many fields, many algorithms have been developed in recent years to make it more efficient. However, developing a perfect MSA tool is a complex and computationally intensive task, so this remains a very active area of research. Progressive alignment, developed by Feng and Doolittle, is the most popular heuristic for MSA. It builds the full alignment in three steps. First, it performs pairwise alignments using methods such as the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, or k-mer methods. Next, it clusters the sequences into a guide tree that reflects their relationships, using a phylogenetic tree construction method. Finally, it aligns the sequences in the order given by that tree. The weakness lies in the nature of the algorithm: it considers only two sequences (or groups) at a time and ignores the rest, and any mistake made in the early stages of the alignment cannot be fixed in later stages. As a result, the final alignment is not guaranteed to be optimal. Widely used progressive alignment programs include Clustal Omega (RRID: SCR_001591), MAFFT [40–44], Kalign [39], ProbCons [45], DIALIGN [46], PRANK [47], Probalign [48], and MSAProbs [49].
Iterative alignment, also known as iterative progressive alignment, is an improved version of progressive alignment that addresses the limitations of the latter. It works in the same manner as progressive alignment, except that it repeatedly applies dynamic programming to realign the initial sequences, improving the overall alignment quality while new sequences are added [50]. Iteration allows errors made in the early stages to be corrected, which increases the accuracy of the alignment. Widely used iterative alignment programs include MUSCLE [51], DIALIGN, SAGA [28], T-COFFEE [52], and PRRN [53].
S. Mishra et al.
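The first stage of progressive alignment, building a guide tree from pairwise distances, can be sketched as follows. This is a simplified illustration, not the procedure of any specific program. Distances are taken as 1 minus the Jaccard similarity of k-mer sets, an alignment-free shortcut in the spirit of the k-mer distances that fast aligners such as MUSCLE use in their first stage (MUSCLE's actual distance formula differs). Clusters are then merged with UPGMA-style average linkage. The sequence names are hypothetical. The later stage, which aligns sequences and profiles in tree order, is omitted.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 - Jaccard similarity of the two k-mer sets (0 = identical k-mer content)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def guide_tree(seqs: dict[str, str], k: int = 2):
    """Average-linkage (UPGMA-style) clustering; returns a nested tuple of names."""
    members = {name: [name] for name in seqs}  # cluster label -> sequence names
    trees = {name: name for name in seqs}      # cluster label -> subtree
    dist = {frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]], k)
            for pair in combinations(seqs, 2)}

    def cluster_distance(x: str, y: str) -> float:
        pairs = [(m, n) for m in members[x] for n in members[y]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(members) > 1:
        # Merge the closest pair of clusters; earlier merges are never revisited
        x, y = min(combinations(members, 2), key=lambda p: cluster_distance(*p))
        label = f"{x}+{y}"
        members[label] = members.pop(x) + members.pop(y)
        trees[label] = (trees.pop(x), trees.pop(y))
    return next(iter(trees.values()))


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGGCCAA"}
print(guide_tree(seqs))  # ('s3', ('s1', 's2'))
```

The nested tuple shows that s1 and s2 are joined first and s3 is added last. A progressive aligner would align s1 with s2 and then align s3 to the resulting profile. Because early merges are never revisited, an error at this stage propagates through the rest of the alignment.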
5
Protein and Structure
A protein is a polymer of amino acids joined by peptide bonds, synthesized by translating the genetic information encoded in cellular DNA. Each protein-coding gene specifies a particular protein. Amino acids are the building blocks of proteins. They are small organic molecules, each composed of a central α-carbon atom bonded to an amino group, a carboxyl group, a hydrogen atom, and a variable side chain, R (as illustrated in Fig. 3). There are 20 standard amino acids, and proteins are built from this set of 20, each with a unique side chain [54]. Protein synthesis typically begins with methionine, which is encoded by the start codon AUG. A peptide bond forms when the carboxyl group of one amino acid links to the amino group of the neighboring amino acid. Because proteins are long chains of amino acids joined in this way, they are also known as polypeptides. The repeating chain of atoms along a polypeptide forms the backbone of the protein, and the side chains attached to this backbone give each amino acid its unique properties [55].
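Proteins are translated three bases (one codon) at a time, so the mapping from a coding sequence to an amino acid chain can be sketched directly. The snippet below builds the standard genetic code from the conventional TCAG-ordered string of 64 amino acids, with '*' marking stop codons. It then translates a coding sequence starting at its first base. Several points are simplifying assumptions of this sketch: it ignores alternative genetic codes (for example, mitochondrial codes), it does not detect reading frames, and it outputs 'X' for codons containing ambiguous bases.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate in frame 1; by default, stop at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' for ambiguous codons
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))         # MA
print(translate("AUGGCCUAA", False))  # MA*
```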
Fig. 3 2D representation of the structure of amino acids comprising the amino group, carboxylic group, and the functional group (R)
Fig. 4 This figure illustrates the structure of a protein at its different levels of complexity (panels: primary, secondary, and tertiary structure)
5.1
Linear Protein Structure
The linear sequence of amino acids in a protein is known as its primary structure. It is built from the 20 standard amino acids, which differ in their side chains. The differing atomic compositions of these side chains give the amino acids different chemical properties. The different levels of protein structure are shown in Fig. 4.
6
Secondary Structure
The secondary structure of a protein refers to regular, repeating conformations of the protein backbone, stabilized by hydrogen bonds formed within a single chain or between chains. Each such hydrogen bond forms between the partially negative carbonyl oxygen of one residue and the partially positive hydrogen attached to the amide nitrogen of another. The two most common forms of secondary structure are the alpha helix and the beta sheet (Fig. 4).
6.1
α-Helix
The alpha helix is a helical coil held together by hydrogen bonds between the backbone C=O group of each residue i and the N–H group of residue i + 4, that is, between every fourth amino acid. Alpha helices occur in globular proteins, in fibrous proteins such as alpha-keratin (found in fingernails and toenails), and in transmembrane proteins, among others.
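The i to i + 4 hydrogen-bond pattern can be turned into a quick calculation together with the textbook geometry of an ideal alpha helix: about 3.6 residues per turn and a rise of about 1.5 Å per residue along the helix axis. This sketch lists the backbone hydrogen-bond pairs for a helix of n residues and estimates the helix length and number of turns. Real helices deviate from these idealized values, and the function names are illustrative.

```python
RISE_PER_RESIDUE = 1.5   # angstroms along the helix axis (idealized alpha helix)
RESIDUES_PER_TURN = 3.6  # idealized alpha helix


def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """1-based pairs (i, i + 4): the C=O of residue i bonds to the N-H of residue i + 4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Approximate (length in angstroms, number of turns) of an ideal helix."""
    return n_residues * RISE_PER_RESIDUE, n_residues / RESIDUES_PER_TURN


print(helix_hbond_pairs(8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
print(helix_geometry(18))    # (27.0, 5.0)
```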
6.2
β-Sheets
The beta-pleated sheet is the other common secondary structure. It consists of two or more regions of the polypeptide chain (beta strands) lying side by side and connected by hydrogen bonds. These structures make up the core of many globular proteins. There are two types of beta-pleated sheets: parallel and antiparallel. In parallel sheets, adjacent beta strands run in the same direction, whereas in antiparallel sheets they run in opposite directions [56].
6.3
Secondary Structure Prediction
Protein secondary structure prediction is a set of techniques for determining the local secondary structures of a protein from its amino acid sequence. The accuracy of secondary structure prediction is a crucial factor in predicting tertiary structure. Secondary structure prediction methods include the following.
(a) The Chou-Fasman method assigns each residue a set of propensity values and then applies a simple algorithm to those numbers. It is one of the first and most widely used methods for predicting secondary structure, with an accuracy of 50–60% [57]. It combines statistics-based and rule-based approaches. The algorithm begins by searching for helices and beta strands. A helix is predicted if four residues in a run of six favor helix formation, and the average helix propensity of the run is greater than 1.00 (100 on the scaled table) and greater than its average beta-strand propensity. Likewise, a strand is predicted if three residues in a run of five favor strand formation, and the average strand propensity is greater than 1.04 and greater than the average helix propensity [58].
(b) The GOR (Garnier-Osguthorpe-Robson) method is one of the most popular secondary structure prediction methods. It is based on information theory and Bayesian statistics. Its main advantage over neural network and nearest-neighbor methods is that it makes explicit which amino acids are taken into account. The method is based on the frequencies of amino acids observed in structure databases, and its parameters are easy to recalculate as the databases are updated [59].
(c) Hidden Markov models (HMMs) are stochastic models trained on sequences of known secondary structure. During evaluation, one test sequence is held out at a time and the remaining sequences are used as training data. Sequences homologous to the test sequence are removed from the training set. Each residue is then assigned the secondary structure state with the highest probability, as computed by the forward and backward algorithms.
(d) Neural networks were among the first machine learning approaches to secondary structure prediction. Deep learning, an emerging and evolving field of machine learning, is now used for effective and accurate structure prediction. A perfectly accurate secondary structure predictor has not yet been achieved with neural networks, but their progress is very promising.
(e) Combinatorial methods. A combinatorial approach to protein structure prediction focuses on reducing the complexity of the protein folding problem, and it helps produce more useful structure predictions. General-purpose secondary structure predictions are about 65% accurate. Various combinatorial packing algorithms have been developed that explore all geometrically sensible tertiary arrangements of the secondary structure elements [60].
Tools for protein secondary structure prediction include PSIPRED [61], JPred [62], PREDATOR [63], and BeStSel [64], a web server that estimates secondary structure from circular dichroism spectra.
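The nucleation rules quoted for the Chou-Fasman method can be sketched as sliding-window tests. This illustration is deliberately simplified. The propensity values below are made-up toy numbers, not the published Chou-Fasman table, and the extension and termination rules of the full method are omitted. Treating a residue as a "former" when its propensity exceeds 1.00 is an assumption of the sketch. A six-residue window is marked H when it has at least four helix formers and a mean helix propensity above 1.00 that also exceeds its mean strand propensity. A five-residue window is marked E when it has at least three strand formers and a mean strand propensity above 1.04 that also exceeds its mean helix propensity. Where an H window overlaps an E window, the helix assignment wins. This is one arbitrary way of resolving conflicts.

```python
# Toy propensities (illustrative values, NOT the published Chou-Fasman table)
TOY_P_HELIX = {"E": 1.5, "A": 1.4, "L": 1.2, "I": 1.1, "V": 1.0, "G": 0.6, "P": 0.6}
TOY_P_STRAND = {"E": 0.4, "A": 0.8, "L": 1.3, "I": 1.6, "V": 1.7, "G": 0.8, "P": 0.6}


def find_nuclei(seq, p_main, p_other, window, min_formers, avg_cutoff):
    """Start indices of windows that pass a Chou-Fasman-style nucleation test."""
    starts = []
    for s in range(len(seq) - window + 1):
        w = seq[s:s + window]
        formers = sum(1 for aa in w if p_main[aa] > 1.0)  # assumed "former" cutoff
        avg_main = sum(p_main[aa] for aa in w) / window
        avg_other = sum(p_other[aa] for aa in w) / window
        if formers >= min_formers and avg_main > avg_cutoff and avg_main > avg_other:
            starts.append(s)
    return starts


def predict(seq, p_helix, p_strand):
    """Label each residue H (helix), E (strand) or C (coil) from nucleation windows only."""
    labels = ["C"] * len(seq)
    for s in find_nuclei(seq, p_strand, p_helix, window=5, min_formers=3, avg_cutoff=1.04):
        labels[s:s + 5] = ["E"] * 5
    for s in find_nuclei(seq, p_helix, p_strand, window=6, min_formers=4, avg_cutoff=1.00):
        labels[s:s + 6] = ["H"] * 6  # helix overrides strand where windows overlap
    return "".join(labels)


print(predict("EAALEAGGVIVIVGP", TOY_P_HELIX, TOY_P_STRAND))  # HHHHHHHHEEEEEEE
```

The helix runs into the two glycines because every six-residue window that still contains four formers is accepted. The full method refines nucleus boundaries with additional extension and termination rules.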
7
Tertiary Structure
The tertiary structure of a polypeptide or nucleic acid refers to its three-dimensional (3D) structure in space. Protein tertiary structure is formed by interactions among the side chains of the various amino acids. Side chains can be polar, non-polar, or charged, which determines the physical and chemical properties of each amino acid. Polar and charged amino acids are hydrophilic (water-loving), whereas non-polar amino acids are hydrophobic (water-hating). The tertiary structure is therefore influenced by the properties of the amino acids (Fig. 4). The interactions that shape a protein's tertiary structure include hydrophobic interactions, hydrogen bonds, weak van der Waals forces, and ionic interactions, all of which are weak, non-covalent interactions. Disulfide bridges formed between two cysteine residues are covalent bonds that reinforce the shape of a protein. The primary structure determines the tertiary structure and function of a protein [65].
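The hydrophilic/hydrophobic distinction can be quantified with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale and a sliding-window average. This is a classic way to locate hydrophobic stretches, which tend to be buried in the protein core or to span membranes. The window size and helper names are illustrative choices, and a hydropathy profile on its own does not predict tertiary structure.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value of a segment (the GRAVY score for a whole protein)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment.upper()) / len(segment)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    return [mean_hydropathy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


profile = hydropathy_profile("KKKIIIKKK", window=3)
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak, profile[peak])  # 3 4.5
```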
7.1
Prediction Methods for Tertiary Structure of Proteins
There are three principal methods for the prediction of protein tertiary structure, which include comparative modeling, the ab initio method, and threading or fold recognition.
7.1.1
Comparative Modeling
Comparative modeling is also known as homology modeling. It predicts the structure of a sequence by comparing it to a known structure with a similar sequence. It relies on knowledge of known structures and on the premise that similar sequences have similar structures [66]. Based on sequence identity to the available templates, a target sequence falls into one of three zones. In the safe zone (identity above about 30%), reliable models can usually be built. In the twilight zone (identity below about 30%), homology becomes difficult to detect from sequence alone. In the midnight zone, identity is too low for homology to be inferred from sequence at all. The target protein sequence is aligned against a library of sequences with known structures from a 3D structure repository such as PDB. The template structures with the highest sequence identity are chosen to build models. The accuracy of each model is evaluated by checking its stereochemistry and the compatibility between the target sequence and the modeled structure. Comparative modeling tools include MODELLER [67–69], SWISS-MODEL [70–73], HOMELETTE [74], I-TASSER [75], and PRIMO [76].
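Template selection by sequence identity can be sketched as follows. The code assumes the target has already been aligned to each candidate template, with gaps written as '-'. It computes identity over the columns where neither sequence has a gap, which is one of several definitions in use, and picks the template with the highest identity. The template names are hypothetical. The zone cutoffs (30% and 20%) are assumed, commonly quoted values rather than fixed standards.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_zone(identity: float, safe_min: float = 30.0, midnight_max: float = 20.0) -> str:
    """Classify an identity value; both cutoffs are assumed, commonly quoted values."""
    if identity >= safe_min:
        return "safe"
    if identity > midnight_max:
        return "twilight"
    return "midnight"


def best_template(alignments: dict[str, tuple[str, str]]) -> tuple[str, float]:
    """Given {template_name: (aligned_target, aligned_template)}, return the best match."""
    scores = {name: percent_identity(t, tpl) for name, (t, tpl) in alignments.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical target-template alignments
alignments = {
    "template_1": ("MKTAYIAKQR", "MKTAYLAKQR"),
    "template_2": ("MKTAYIAKQR", "MRSAFLGEQK"),
}
name, identity = best_template(alignments)
print(name, identity, identity_zone(identity))  # template_1 90.0 safe
```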
7.1.2
Ab Initio
Predicting protein structure from the amino acid sequence alone, without prior knowledge of a structural template, is known as the ab initio method of tertiary structure prediction. In principle, it can produce models even when no template is available; however, it is very time-consuming and computationally costly. I-SITES [77], HMMSTR [77], and ROSETTA [78] are among the tools that use the ab initio method.
7.1.3
Fold Recognition
Fold recognition, or threading, is used to assign tertiary structures to protein sequences even when no clear homology to a known structure can be detected. It aims to assign folds to target sequences that have very low sequence identity to known structures by calculating how well each sequence fits each candidate structure [79]. Fold recognition tools and software include 3D-PSSM [80, 81], THREADER [82], and ProFIT [83].
7.1.4
Machine Learning Approach
Machine learning has proven to be a forward-looking technique for predicting the 3D structure of proteins. At CASP13, held in 2018, groups such as ROSETTA [84], DeepMind [85], and FEIG-R2 [86, 87] showed that deep learning methods can achieve fold-level accuracy for proteins that lack homologs in PDB [88]. AlphaFold, developed by DeepMind, predicts a protein's 3D structure from its amino acid sequence. The first release of AlphaFold DB contains over 360,000 predicted protein structures from different organisms. Because of the high accuracy of its predictions, it has had a major impact on biology. Further development of AlphaFold DB is still needed, and its predicted structures may then support research into various neglected diseases [89].
8
Databases
A database is an organized collection of data or structured information, stored in a computer system and updated regularly. Biological databases help users study the available data and understand biological phenomena, while reducing data redundancy. Biological databases may be divided into three major categories: primary databases, secondary databases, and composite (or derived) databases.
8.1
Primary Database
Primary databases store experimental data submitted directly by researchers. These data include protein sequences, nucleic acid sequences, and macromolecular structures. ENA, NCBI's GenBank, and DDBJ are the primary nucleotide sequence databases. Other primary databases include ArrayExpress, GEO, and the Protein Data Bank (PDB), a database of 3D macromolecular structures.
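Sequence records downloaded from primary databases such as GenBank, ENA, or DDBJ are commonly exchanged in FASTA format. Each record has a header line starting with '>' followed by one or more sequence lines. A minimal standard-library parser is sketched below with a hypothetical two-record example. For real work, a maintained library such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` handles more edge cases.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; lines before the first '>' are ignored."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record FASTA text
example = """>seq1 hypothetical nucleotide record
ATGGCC
TAA
>seq2 hypothetical protein record
MKTAYIAKQR
"""
for header, seq in read_fasta(io.StringIO(example)):
    print(header, len(seq))
```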
8.2
Secondary Database
Secondary databases contain data derived from the analysis of primary data, and their contents are highly curated. As a result, a secondary database often holds more valuable knowledge than a primary database. Examples of secondary databases include InterPro [90] (protein families, motifs, and domains), UniProtKB [91] (protein sequence information), and Ensembl [92] (variation, function, regulation, and more, layered onto whole-genome sequences).
8.3
Composite Database
Data entering composite databases are first compared and then filtered according to defined criteria. These databases combine information from both primary and secondary databases. Examples of composite databases include OWL [92], NRDB, Swiss-Prot [93, 94], and TrEMBL (Table 2).
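The "compare, then filter" step performed by composite databases such as NRDB can be illustrated by merging sequence collections and collapsing exact duplicates. This is a minimal sketch under simplifying assumptions: only identical sequences (compared case-insensitively) are merged, whereas real non-redundant databases apply further curation rules. The identifiers and sequences are hypothetical.

```python
def merge_nonredundant(*sources: dict[str, str]) -> dict[str, list[str]]:
    """Collapse identical sequences (case-insensitive) into one entry listing all source IDs."""
    merged: dict[str, list[str]] = {}
    for source in sources:
        for seq_id, seq in source.items():
            merged.setdefault(seq.upper(), []).append(seq_id)
    return merged


# Hypothetical entries from two source databases
source_1 = {"P1": "MKTAYIAKQR", "P2": "MSDNE"}
source_2 = {"Q9": "mktayiakqr", "Q7": "MGLSD"}
nr = merge_nonredundant(source_1, source_2)
print(len(nr))           # 3
print(nr["MKTAYIAKQR"])  # ['P1', 'Q9']
```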
Table 2 List of different databases used in bioinformatics research and life sciences
Database | Description
ArrayTrack™ [95] | A microarray database, data analysis, and interpretation tool developed by NCTR. It is MIAME (Minimum Information About a Microarray Experiment)-supportive, storing both microarray data and the experiment parameters associated with a pharmacogenomics or toxicogenomics study.
European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) [96] | Its resources include gene and protein sequences, protein families, structures, gene expression data, protein interactions, pathways, and small molecules, to name a few.
Genomes Online Database (GOLD) [97] | A significant resource that catalogs and monitors genome and metagenome projects from around the world.
AlphaFold | Uses machine learning based approaches to predict the 3D structure of a protein from its amino acid sequence.
AceDB [98] | A database system developed for handling genome and bioinformatic data; it includes many powerful tools for manipulating, displaying, and annotating genomic data.
PlantMarkers [99] | Contains a comprehensive pool of predicted molecular markers (SNP, COS, and SSR).
HarvEST [100, 101] | Originated as EST database-viewing software supporting gene function analyses and oligonucleotide design.
9
Bioinformatics Tools
Bioinformatics tools are software programs designed to extract meaningful information from molecular biology data and biological databases, and to carry out sequence, structure, function, and network analyses. Bioinformatics tools that are widely used in computational biology are listed in Table 3. Many other widely used tools not listed there are also helpful for biological analyses and life science research.
10
Conclusion
Bioinformatics uses a range of computational approaches for biological analysis. These include sequence and structural alignment of proteins and nucleic acids, data mining, database design, phylogenetic tree construction for studying evolutionary relationships between organisms, protein structure and function prediction, gene prediction, and expression data analysis. Bioinformatics has contributed significantly to science by integrating a variety of computational tools, techniques, and algorithms. Many research groups are producing an abundance of genomic information with high-throughput computational tools and pipelines. Even so, the existing algorithms and tools still need improvement to accelerate the annotation of biological data, including genes, genomes, proteins, proteomes, transcripts, and transcriptomes.
Table 3 List of bioinformatic tools developed over the years
Tools | Description
SNP2CAPS [102] | Converts SNPs into CAPS markers.
TASSEL [103] | A package used to evaluate trait associations, evolutionary patterns, and linkage disequilibrium.
BRENDA: Comprehensive Enzyme Information System [104] | An information system representing one of the most comprehensive enzyme repositories, comprising molecular and biochemical information on enzymes classified by the IUBMB.
Cytoscape [105] | Often used to visualize gene, protein, and metabolic networks, but it can also visualize other biomedical networks, including the human disease network.
Maestro [106] | The graphical user interface for all of Schrödinger's computational programs, providing a powerful, fully integrated molecular visualization and analysis environment.
MEGAN4 (MEtaGenome ANalyzer) [107] | A stand-alone tool for metagenomic analysis of short-read data.
CCTop [108, 109] | Used for CRISPR/Cas9 target prediction.
AgBase v. 2.00 [110, 111] | Used for functional analysis of agricultural plant and animal gene products, including gene ontology annotations. It provides several functional analysis tools, including GOProfiler, Genome2Seq, and GORetriever.
FAST Tool Kit [112] | Command-line tools for pre-processing short-read FASTA/FASTQ files.
Ion Torrent NGS [113, 114] | High-throughput, rapid sequencing of the base pairs in DNA or RNA samples.
References
1. Hogeweg, P., & Tekaia, F., B. Information, T. Oxford, and E. Dictionary. (2001). History, aim and scope, pp. 1–8.
2. Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? A proposed definition and overview of the field. Methods of Information in Medicine, 40(4), 346–358.
3. Fenstermacher, D. (2005). Introduction to bioinformatics. Journal of the American Society for Information Science and Technology, 56(5), 440–446. https://doi.org/10.1002/asi.20133
4. Pongor, S., & Landsman, D. (2015). Bioinformatics and developing word. Biotechnology and Development Monitor, 40, 1–7.
5. Gauthier, J., Vincent, A. T., Charette, S. J., & Derome, N. (2019). A brief history of bioinformatics. Briefings in Bioinformatics, 20(6), 1981–1996. https://doi.org/10.1093/bib/bby063
6. Hagen, J. B. (2000). The origins of bioinformatics. Nature Reviews Genetics, 1(3), 231–236. https://doi.org/10.1038/35042090
7. Fleischmann, R. D., et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269(5223), 496–512. https://doi.org/10.1126/SCIENCE.7542800
8. Adams, M. D., et al. (2000). The genome sequence of Drosophila melanogaster. Science, 287(5461), 2185–2195. https://doi.org/10.1126/SCIENCE.287.5461.2185
9. Mariano, D. C. B., et al. (2016). SIMBA: A web tool for managing bacterial genome assembly generated by ion PGM sequencing technology. BMC Bioinformatics, 17(Suppl 18), 456. https://doi.org/10.1186/s12859-016-1344-7
10. AlphaFold: a solution to a 50-year-old grand challenge in biology. (2022). https://www.deepmind.com/blog/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology. Accessed 14 Sep 2022.
11. Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K., & Moult, J. (2019). Critical assessment of methods of protein structure prediction (CASP – Round XIII). Proteins: Structure, Function, and Bioinformatics, 87(12), 1011–1020. https://doi.org/10.1002/PROT.25823
12. Sayers, E. W., et al. (2022). Database resources of the national center for biotechnology information. Nucleic Acids Research, 50(D1), D20–D26. https://doi.org/10.1093/nar/gkab1112
13. Dey, N., & Bhateja, V. (2016). Medical imaging in clinical applications. Algorithmic and Computer-Based Approaches, 651.
14. Wang, Z., Chen, Y., & Li, Y. (2004). A brief review of computational gene prediction methods. Genomics, Proteomics & Bioinformatics, 2(4), 216–221. https://doi.org/10.1016/S1672-0229(04)02028-5
15. Birney, E., & Durbin, R. (2000). Using GeneWise in the drosophila annotation experiment. Genome Research, 10(4), 547–548. https://doi.org/10.1101/GR.10.4.547
16. Yeh, R. F., Lim, L. P., & Burge, C. B. (2001). Computational inference of homologous gene structures in the human genome. Genome Research, 11(5), 803–816. https://doi.org/10.1101/GR.175701
17. Snyder, E. E., & Stormo, G. D. (1995). Identification of protein coding regions in genomic DNA. Journal of Molecular Biology, 248(1), 1–18. https://doi.org/10.1006/JMBI.1995.0198
18. GrailEXP Home Page. (2022). http://pbil.univ-lyon1.fr/members/duret/cours/insa2004/exercise4/pgrail.html. Accessed 19 Sep 2022.
19. GENIES. https://www.genome.jp/tools/genies/. Accessed 19 Sep 2022.
20. HMMgene – 1.1 – Services – DTU Health Tech. (2022). https://services.healthtech.dtu.dk/service.php?HMMgene-1.1. Accessed 19 Sep 2022.
21. Geneid WEB Server. (2022). https://genome.crg.es/geneid.html. Accessed 19 Sep 2022.
22. FGENESH – HMM-based Gene Structure Prediction. (2022). http://www.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind. Accessed 19 Sep 2022.
23. GENSCAN bio.tools. (2022). https://bio.tools/genscan. Accessed 19 Sep 2022.
24. Stanke, M. (2003). The AUGUSTUS gene prediction tool.
25. Stencel, A., & Crespi, B. (2013). What is a genome? Molecular Ecology, 22(13), 3437–3443. https://doi.org/10.1111/mec.12355
26. Genomics. (2022). https://www.who.int/news-room/questions-and-answers/item/genomics. Accessed 14 Sep 2022.
27. Solanke, A., Tribhuvan, K., Kanika. (2015). Genomics: An integrative approach for molecular biology. In Biotechnology progress and prospects, pp. 234–270.
28. Notredame, C., & Higgins, D. G. (1996). SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research, 24(8), 1515–1524. https://doi.org/10.1093/nar/24.8.1515
29. Home – OMIM. (2022). https://omim.org/. Accessed 19 Sep 2022.
30. RefSeqGene. (2022). https://www.ncbi.nlm.nih.gov/refseq/rsg/. Accessed 19 Sep 2022.
31. Home – BioProject – NCBI. (2022). https://www.ncbi.nlm.nih.gov/bioproject/. Accessed 19 Sep 2022.
32. Home – Gene – NCBI. (2022). https://www.ncbi.nlm.nih.gov/gene. Accessed 19 Sep 2022.
33. MGI-Mouse Gene Expression Database (GXD). (2022). http://www.informatics.jax.org/expression.shtml. Accessed 19 Sep 2022.
34. Shin, G., Kang, T. W., Yang, S., Baek, S. J., Jeong, Y. S., & Kim, S. Y. (2011). GENT: Gene expression database of normal and tumor tissues. Cancer Informatics, 10, 149. https://doi.org/10.4137/CIN.S7226
35. Ranganathan, S., Gribskov, M., Nakai, K., & Schönbach, C. (2019). Applications. In Encyclopedia of bioinformatics and computational biology (vol. 3, pp. 938–952), Elsevier.
36. Wiltgen, M. (2018). Algorithms for structure comparison and analysis: Homology modelling of proteins. In Encyclopedia of bioinformatics and computational biology: ABC of bioinformatics (vol. 1–3, pp. 38–61). https://doi.org/10.1016/B978-0-12-809633-8.20484-6
37. Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3), 443–453. https://doi.org/10.1016/0022-2836(70)90057-4
38. Chowdhury, B., & Garai, G. (2017). A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics, 109(5–6), 419–431. https://doi.org/10.1016/j.ygeno.2017.06.007
39. Lassmann, T., & Sonnhammer, E. L. L. (2005). Kalign – An accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6, 1–9. https://doi.org/10.1186/1471-2105-6-298
40. Katoh, K., Rozewicki, J., & Yamada, K. D. (2018). MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Briefings in Bioinformatics, 20(4), 1160–1166. https://doi.org/10.1093/BIB/BBX108
41. Nakamura, T., Yamada, K. D., Tomii, K., & Katoh, K. (2018). Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics, 34(14), 2490–2492. https://doi.org/10.1093/BIOINFORMATICS/BTY121
42. Katoh, K., Kuma, K. I., Toh, H., & Miyata, T. (2005). MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Research, 33(2), 511–518. https://doi.org/10.1093/NAR/GKI198
43. Yamada, K. D., Tomii, K., & Katoh, K. (2016). Application of the MAFFT sequence alignment program to large data – Reexamination of the usefulness of chained guide trees. Bioinformatics, 32(21), 3246–3251. https://doi.org/10.1093/BIOINFORMATICS/BTW412
44. Rozewicki, J., Li, S., Amada, K. M., Standley, D. M., & Katoh, K. (2019). MAFFT-DASH: Integrated protein sequence and structural alignment. Nucleic Acids Research, 47(W1), W5–W10. https://doi.org/10.1093/NAR/GKZ342
45. PROBCONS: Probabilistic Consistency-based Multiple Alignment of Amino Acid Sequences. (2022). http://probcons.stanford.edu/. Accessed 19 Sep 2022.
46. DIALIGN: Home. (2022). https://dialign.gobics.de/. Accessed 19 Sep 2022.
47. PRANK – was@bi. (2022). http://wasabiapp.org/software/prank/. Accessed 19 Sep 2022.
48. Probalign Home Page. (2022). https://web.njit.edu/~usman/probalign/. Accessed 19 Sep 2022.
49. Liu, Y., & Schmidt, B. (2014). Multiple protein sequence alignment with MSAProbs. Methods in Molecular Biology, 1079, 211–218. https://doi.org/10.1007/978-1-62703-646-7_14/COVER
50. Daugelaite, J., Driscoll, A. O., & Sleator, R. D. (2013). An overview of multiple sequence alignments and cloud computing in bioinformatics. ISRN Biomathematics, 2013, 1–14. https://doi.org/10.1155/2013/615630
51. Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. https://doi.org/10.1093/nar/gkh340
52. T-COFFEE Multiple Sequence Alignment Server. (2022). https://tcoffee.crg.eu/. Accessed 19 Sep 2022.
53. Parallel PRRN: Multiple Sequence Alignment. (2022). https://www.genome.jp/tools/prrn/prrn_help.html. Accessed 19 Sep 2022.
54. Protein Structure | Learn Science at Scitable. (2022). https://www.nature.com/scitable/topicpage/protein-structure-14122136/. Accessed 14 Sep 2022.
55. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P. (2002). The shape and structure of proteins. [Online]. https://www.ncbi.nlm.nih.gov/books/NBK26830/. Accessed 14 Sep 2022.
56. Singh, M. (2005). Predicting protein secondary and supersecondary structure, pp. 29. https://doi.org/10.1201/9781420036275.pt7
57. Reeb, J., & Rost, B. (2018). Secondary structure prediction. In Encyclopedia of bioinformatics and computational biology: ABC of bioinformatics (vol. 1–3, pp. 488–496). https://doi.org/10.1016/B978-0-12-809633-8.20267-7
58. Singh, R., Deol, S. K., & Sandhu, P. S. (2010). Chou-Fasman method for protein structure prediction using cluster analysis. World Academy of Science, Engineering and Technology, 48, 980–985.
59. Garnier, J., Gibrat, J. F., & Robson, B. (1996). [32] GOR method for predicting protein secondary structure from amino acid sequence. Methods in Enzymology, 266(1995), 540–553. https://doi.org/10.1016/s0076-6879(96)66034-0
60. Protein Structure Prediction: A Practical Approach – Google Books. (2022). https://books.google.co.in/books?hl=en&lr=&id=u6ue1BygHnsC&oi=fnd&pg=PA207&dq=combinatorial+methods+for+structure+prediction&ots=SFdcHdm6BS&sig=rQMsoZ3pczFTDv6tbVsCCmn3AQ#v=onepage&q=combinatorial methods for structure prediction&f=false. Accessed 19 Sep 2022.
61. PSIPRED Workbench. (2022). http://bioinf.cs.ucl.ac.uk/psipred/. Accessed 19 Sep 2022.
62. Drozdetskiy, A., Cole, C., Procter, J., & Barton, G. J. (2015). JPred4: A protein secondary structure prediction server. Nucleic Acids Research, 43(W1), W389–W394. https://doi.org/10.1093/NAR/GKV332
63. NPS@: PREDATOR Secondary Structure Prediction. (2022). https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_predator.html. Accessed 19 Sep 2022.
64. Micsonai, A., Bulyáki, É., & Kardos, J. (2021). BeStSel: From secondary structure analysis to protein fold prediction by circular dichroism spectroscopy. Methods in Molecular Biology, 2199, 175–189. https://doi.org/10.1007/978-1-0716-0892-0_11
65. Rehman, I., Kerndt, C. C., & Botelho, S. (2022). Biochemistry, tertiary protein structure. In StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing, p. 470269. https://www.ncbi.nlm.nih.gov/books/NBK470269/
66. Moult, J., Hubbard, T., Bryant, S. H., Fidelis, K., & Pedersen, J. T. (1997). Critical assessment of methods of protein structure prediction (CASP: Round II). Proteins, Suppl 1, 2–6.
67. About MODELLER. (2022). https://salilab.org/modeller/. Accessed 19 Sep 2022.
68. Webb, B., & Sali, A. (2016). Comparative protein structure modeling using MODELLER. Current Protocols in Bioinformatics, 54, 5, 6(1). https://doi.org/10.1002/CPBI.3
69. Comparative Protein Structure Modeling Using MODELLER – Webb – 2016 – Current Protocols in Bioinformatics – Wiley Online Library. (2022). https://currentprotocols.onlinelibrary.wiley.com/doi/abs/10.1002/cpbi.3. Accessed 19 Sep 2022.
70. Bienert, S., et al. (2017). The SWISS-MODEL repository-new features and functionality. Nucleic Acids Research, 45(D1), D313–D319. https://doi.org/10.1093/NAR/GKW1132
71. SWISS-MODEL. (2022). https://swissmodel.expasy.org/. Accessed 19 Sep 2022.
72. Waterhouse, A., et al. (2018). SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Research, 46(W1), W296–W303. https://doi.org/10.1093/NAR/GKY427
73. Guex, N., Peitsch, M. C., & Schwede, T. (2009). Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis, 30(SUPPL), 1. https://doi.org/10.1002/ELPS.200900140
74. Junk, P., & Kiel, C. (2021). HOMELETTE: A unified interface to homology modelling software. Bioinformatics, 38(6), 1749–1751. https://doi.org/10.1093/bioinformatics/btab866
75. Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., & Zhang, Y. (2014). The I-TASSER suite: Protein structure and function prediction. Nature Methods, 12(1), 7–8. https://doi.org/10.1038/nmeth.3213
76. Bishop, T., Hatherley, R., Brown, D. K., & Glenister, M. (2016). PRIMO: An interactive homology modeling pipeline, pp. 1–20. https://doi.org/10.1371/journal.pone.0166698
77. Bystroff, C., & Shao, Y. (2002). Fully automated ab initio protein structure prediction using I-SITES, HMMSTR and ROSETTA. Bioinformatics, 18(suppl_1), S54–S61. https://doi.org/10.1093/bioinformatics/18.suppl_1.S54
78. Moretti, R., Lyskov, S., Das, R., Meiler, J., & Gray, J. J. (2018). Web-accessible molecular modeling with Rosetta: The Rosetta online server that includes everyone (ROSIE). Protein Science, 27(1), 259–268. https://doi.org/10.1002/PRO.3313
79. Mcguffin, L. J. (2010). Computational structural biology – Methods and applications. https://doi.org/10.1142/9789812778789
80. 3D-pssm bio.tools. (2022). https://bio.tools/3d-pssm. Accessed 19 Sep 2022.
81. Kelley, L. A., MacCallum, R. M., & Sternberg, M. J. E. (2000). Enhanced genome annotation using structural profiles in the program 3D-PSSM. Journal of Molecular Biology, 299(2), 501–522. https://doi.org/10.1006/jmbi.2000.3741
82. THREADER bio.tools. (2022). https://bio.tools/threader. Accessed 19 Sep 2022.
83. bioinf.org.uk – Prof. Andrew C.R. Martin’s group at UCL. (2022). http://www.bioinf.org.uk/software/profit/. Accessed 19 Sep 2022.
84. The Rosetta Software | RosettaCommons. (2022). https://www.rosettacommons.org/software. Accessed 20 Sep 2022.
85. AlphaFold. (2022). https://www.deepmind.com/research/highlighted-research/alphafold. Accessed 20 Sep 2022.
86. Heo, L., & Feig, M. (2020). High-accuracy protein structures by combining machine-learning with physics-based refinement. Proteins: Structure, Function, and Bioinformatics, 88(5), 637–642. https://doi.org/10.1002/PROT.25847
87. Heo, L., Arbour, C. F., & Feig, M. (2019). Driven to near-experimental accuracy by refinement via molecular dynamics simulations. Proteins: Structure, Function, and Bioinformatics, 87(12), 1263–1275. https://doi.org/10.1002/PROT.25759
88. Audagnotto, M., et al. (2022). Machine learning/molecular dynamic protein structure prediction approach to investigate the protein conformational ensemble. Scientific Reports, 12(1), 1–17. https://doi.org/10.1038/s41598-022-13714-z
89. Varadi, M., et al. (2022). AlphaFold protein structure database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1), D439–D444. https://doi.org/10.1093/nar/gkab1061
90. Blum, M., et al. (2021). The InterPro protein families and domains database: 20 years on. Nucleic Acids Research, 49(D1), D344–D354. https://doi.org/10.1093/nar/gkaa977
91. UniProt. (2022). https://www.uniprot.org/. Accessed 19 Sep 2022.
92. Ensembl Genome Browser 107. https://asia.ensembl.org/index.html. Accessed 19 Sep 2022.
93. Swiss-Prot. (2022). https://www.sib.swiss/swiss-prot. Accessed 19 Sep 2022.
94. Bansal, P., et al. (2022). Rhea, the reaction knowledgebase in 2022. Nucleic Acids Research, 50(D1), D693–D700. https://doi.org/10.1093/NAR/GKAB1016
126
S. Mishra et al.
95. Xu, J., Kelly, R., Fang, H., & Tong, W. (2010). ArrayTrack: A free FDA bioinformatics tool to support emerging biomedical research – An update. Human Genomics, 4(6), 428–434. https:// doi.org/10.1186/1479-7364-4-6-428 96. EMBL-EBI: EMBL’s European Bioinformatics Institute | EMBL’s European Bionformatics Institute. (2022). https://www.ebi.ac.uk/. Accessed 19 Sep 2022. 97. Mukherjee, S., et al. (2021). Genomes OnLine Database (GOLD) v.8: Overview and updates. Nucleic Acids Research, 49(D1), D723–D733. https://doi.org/10.1093/NAR/GKAA983 98. AceDB – Wellcome Sanger Institute. (2022). https://www.sanger.ac.uk/tool/acedb/. Accessed 19 Sep 2022. 99. Rudd, S., Schoof, H., & Mayer, K. (2005). PlantMarkers – A database of predicted molecular markers from plants. Nucleic Acids Research, 33, no. DATABASE ISS, 628–632. https://doi. org/10.1093/nar/gki074 100. HarvEST Home Page. (2022). https://harvest.ucr.edu/. Accessed 19 Sep 2022. 101. Muñoz-Amatriaín, M., et al. (2017). Genome resources for climate-resilient cowpea, an essential crop for food security. The Plant Journal, 89(5), 1042–1054. https://doi.org/10. 1111/TPJ.13404 102. SNP2CAPS. (2022). http://pgrc.ipk-gatersleben.de/snp2caps/. Accessed 19 Sep 2022. 103. Tassel. (2022). https://tassel.bitbucket.io/. Accessed 19 Sep 2022. 104. BRENDA Enzyme Database. (2022). https://www.brenda-enzymes.org/. Accessed 19 Sep 2022. 105. Otasek, D., Morris, J. H., Bouças, J., Pico, A. R., & Demchak, B. (2019). Cytoscape automation: Empowering workflow-based network analysis. Genome Biology, 20(1), 185. https://doi.org/10.1186/S13059-019-1758-4 106. Maestro | Schrödinger. (2022). https://www.schrodinger.com/products/maestro. Accessed 19 Sep 2022. 107. Computomics – Megan6. (2022). https://computomics.com/services/megan6.html. Accessed 19 Sep 2022. 108. CCTop – CRISPR/Cas9 Target Online Predictor. (2022). https://cctop.cos.uni-heidelberg. de:8043/. Accessed 19 Sep 2022. 109. 
Stemmer, M., Thumberger, T., Del Sol Keyer, M., Wittbrodt, J., & Mateo, J. L. (2015). CCTop: An intuitive, flexible and reliable CRISPR/Cas9 target prediction tool. PLoS ONE, 10(4). https://doi.org/10.1371/JOURNAL.PONE.0124633 110. Du, Z., Zhou, X., Ling, Y., Zhang, Z., & Su, Z. (2010). agriGO: A GO analysis toolkit for the agricultural community. Nucleic Acids Research, 38(SUPPL), 2. https://doi.org/10.1093/ NAR/GKQ310 111. Tian, T., et al. (2017). AgriGO v2.0: A GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Research, 45(W1), W122–W129. https://doi.org/10.1093/NAR/ GKX382 112. FASTX-Toolkit. (2022). http://hannonlab.cshl.edu/fastx_toolkit/. Accessed 19 Sep 2022. 113. Ion Torrent – IN. (2022). https://www.thermofisher.com/in/en/home/brands/ion-torrent.html. Accessed 19 Sep 2022 114. Ion Torrent | Thermo Fisher Scientific – IN. (2022). https://www.thermofisher.com/in/en/ home/brands/ion-torrent.html. Accessed 19 Sep 2022.
Introduction to Artificial Intelligence & ML
Sarath Panat and Ravindra Kumar
What You Will Learn
1. The history of machine learning, to understand how it evolved to have such a large impact.
2. What machine learning is and why it matters.
3. Different machine learning approaches to solving problems.
4. Real-world applications that show the impact of machine learning.
5. What artificial intelligence is and why it matters.
6. Basic principles of artificial intelligence.
7. Real-world applications that show the impact of artificial intelligence.
S. Panat · R. Kumar (✉)
School of Biotechnology, National Institute of Technology Calicut, Kozhikode, Kerala, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_5

1 Machine Learning

Just 50 years ago, machine learning was the stuff of science fiction. Computers held the promise of one day doing things that humans do with their brains, but back then they lacked the power to make that happen. Early thinkers and scientists showed how this promise could become reality, and machine learning emerged from their work. It grew out of a handful of fundamental questions. How do computers learn in the first place? What is the process by which a machine learns something? Can computers think? In 1959, Arthur Samuel coined the term “machine learning” to describe the field of study that gives computers the ability to learn without being explicitly programmed. Machine learning algorithms identify patterns and/or predict outcomes [1].

ML techniques differ from traditional programming. In traditional programming, we supply input data and a program (rules written by a human), and the computer produces outputs. In machine learning, we supply both the inputs and the corresponding outputs, and the algorithm produces the program, a model that maps one to the other.

Let’s begin with an interesting question: “How can a computer be taught to do tasks?” With traditional programming, we take some inputs, express our reasoning as code, and generate results. This works as long as the logic is straightforward and well understood. Suppose we are building an email spam detector. How can we identify spam emails using traditional programming? We might go through spam emails to see what they look like and notice specific keywords (such as “won,” “credit card,” “earn money,” and “100%”) or patterns in email addresses and subject lines. Using the logic we discovered, we can then write an algorithm that classifies an email as spam if it contains those keywords or patterns [2]. However, finding patterns becomes extremely difficult when dealing with a large number of emails, and we may fail to detect hidden patterns that recur in spam. We also need to review spam regularly to see whether spammers have changed tactics, for example by introducing new keywords. Even if we keep up, the program will most likely become a long list of complex rules that are difficult to maintain (Fig. 1).

Fig. 1 Traditional programming

Machine learning follows a different path. A machine learning algorithm takes inputs and outputs and generates a model, which can then be applied to new inputs to produce outputs. A spam filter based on machine learning learns which words and phrases are good predictors of spam by detecting word patterns that are unusually frequent in spam samples compared with non-spam samples. As a result, the program is likely to be more accurate, shorter, and easier to maintain (Fig. 2).
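The contrast between the two approaches can be made concrete in a short Python sketch. The rule-based filter hard-codes the keywords listed above. The learned filter is a small multinomial naive Bayes classifier that estimates word frequencies from labeled examples. Naive Bayes is one common choice for this task, assumed here; the chapter does not name an algorithm. The class name and the toy training emails are hypothetical.

```python
import math
from collections import Counter

# Traditional programming: a human hand-writes the rule (keywords taken from the text).
SPAM_KEYWORDS = ("won", "credit card", "earn money", "100%")

def rule_based_is_spam(email: str) -> bool:
    """Flag an email if it contains any hand-picked keyword (crude substring match)."""
    text = email.lower()
    return any(keyword in text for keyword in SPAM_KEYWORDS)

# Machine learning: word statistics are learned from labeled examples.
class NaiveBayesSpamFilter:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, emails, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for email, is_spam in zip(emails, labels):
            self.word_counts[is_spam].update(email.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, words, is_spam):
        # log P(class) + sum over words of log P(word | class)
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[is_spam] / total_docs)
        counts = self.word_counts[is_spam]
        denominator = sum(counts.values()) + len(self.vocab)
        for word in words:
            score += math.log((counts[word] + 1) / denominator)
        return score

    def predict(self, email):
        """Return True if the email is more likely spam than not."""
        words = email.lower().split()
        return self._log_score(words, True) > self._log_score(words, False)

# Hypothetical toy training data: inputs (emails) together with outputs (labels).
train_emails = [
    "claim your free prize now",
    "free money click now",
    "exclusive offer claim free gift",
    "meeting moved to friday",
    "please review the attached report",
    "lunch with the project team friday",
]
train_labels = [True, True, True, False, False, False]

model = NaiveBayesSpamFilter().fit(train_emails, train_labels)

if __name__ == "__main__":
    new_email = "free prize claim today"
    print("rule-based:", rule_based_is_spam(new_email))  # no hand-written keyword matches
    print("learned:   ", model.predict(new_email))
```

On this toy data the hand-written rule misses “free prize claim today” because none of its keywords appear. The learned filter flags it from the word statistics alone. Real spam filters are trained on far larger corpora.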
Let’s see the difference between traditional programming and machine learning through an object detection example (Figs. 3 and 4).
Fig. 2 Machine learning
Fig. 3 Traditional programming approach to solve object detection problem
Fig. 4 Machine learning approach to solve object detection problem
1.1 A History of Machine Learning
Before delving into machine learning itself, let’s look at the early thinkers, philosophers, and scientists whose outstanding innovations laid its foundation. Machine learning cannot be explained without their contributions.
1.1.1 Laying the Foundation
Blaise Pascal, a French mathematician and philosopher, invented an “arithmetic machine” [3] that could perform mathematical operations such as addition, subtraction, multiplication, and division. The binary number system developed by the German mathematician and philosopher Gottfried Wilhelm Leibniz [4] is the foundation of all modern computer systems. Charles Babbage, often called the father of the computer, designed the first general-purpose programmable machine. Ada Lovelace, regarded as the world’s first computer programmer, described a sequence of operations for solving mathematical problems on Babbage’s theoretical punch-card machine [5]. All of these are major episodes in the history of computer science. In 1936, Alan Turing described his “universal machine” [6]. With it he introduced the core premise of the modern computer: controlling the machine’s actions through a program of coded instructions stored in its memory. Without these major contributors, machine learning would not exist (Fig. 5).
Fig. 5 Machine foundation timeline
1.1.2 From Theory to Reality
Over the last half century, what was once science fiction has become reality. Neurophysiologist Warren McCulloch and mathematician Walter Pitts published a study of how human neurons operate [7], representing a simple “neural network” with electrical circuits. Computer scientists began putting these principles to use. In 1952, Arthur Samuel, a pioneer of machine learning, wrote a program that helped an IBM computer get better at checkers the more it played [8]. Board games are popular among machine learning researchers because their rules are simple but their strategy is complex. Later, a neural network (a model inspired by the human nervous system) was successfully used to improve the clarity of phone calls. MADALINE [9], a neural network model developed at Stanford, is still used today to eliminate echoes on phone lines; it was the first real-world application of a neural network. In 1997, IBM’s Deep Blue, a chess-playing expert system running on a purpose-built IBM supercomputer, defeated chess grandmaster and world champion Garry Kasparov [10]. This marked a turning point in the field: it was the first time a computer had beaten a reigning world champion in a match under standard tournament conditions. Kasparov demanded a rematch, but IBM refused and promptly retired Deep Blue. Another significant advance came in 1999, when computer-aided diagnosis began catching more tumors. Computers can’t (yet) cure cancer, but they can help humans diagnose it. The University of Chicago’s CAD Prototype Intelligent Workstation analyzed 22,000 mammograms and identified cancer 52 percent more effectively than radiologists [11]. These advances demonstrated that computers had begun to deliver on their promise [12] (Fig. 6).
Fig. 6 Machine learning: theory to reality timeline
1.2 Why Machine Learning
Machine learning and artificial intelligence have become buzzwords in recent years, and you have probably heard a lot about them, both good and bad. Although the two terms are frequently used interchangeably, they are not synonymous. Machine learning grew out of artificial intelligence research. It was a huge step forward for AI in general, but not the end of the road. As the Google Trends graph in Fig. 7 shows, “AI” was the more popular search term until “machine learning” overtook it in September 2015. Machine learning has become one of the most important applications of artificial intelligence; put another way, it is a subfield of AI and one method of achieving it.

Fig. 7 Google search trends for the terms ML and AI over time. (Data source: Google Trends)

Machine learning is the process of providing data to machines so that they can “learn” how to do something without being explicitly programmed to do so. A machine learning algorithm is a computer program that helps the machine make sense of the data it is given. Through categorization and statistical analysis, machine learning algorithms learn the relationships in the data. They can then make an educated “guess” by choosing the most probable outcome. Many can even learn from their mistakes, becoming “smarter” as they go along. Machine learning can be used to:

• Predict outcomes from input data. This is similar to regression analysis, but with more variables and at a larger scale. It is useful when a large amount of data must be evaluated to predict or recommend a result.
• Automatically identify patterns and relationships in large datasets. This is particularly useful for uncovering hidden insights that might not be obvious to a human observer.

Machine learning has become an integral part of our daily lives, assisting us with self-driving cars, medical diagnosis, financial decisions, product recommendations, and object identification, among many other applications [13]. Machines become incredibly powerful when they can make decisions based on prior experience (data) rather than on a pre-programmed algorithm. The ability of computers to solve real-world problems is rapidly improving. Machine learning is becoming increasingly important because of its wide range of applications and its ability to adapt and provide effective, efficient solutions to complex problems.

Another key idea in machine learning is deep learning. Deep learning describes algorithms that analyze data with a logical structure similar to the way a human would draw conclusions. It is built on the artificial neural network (ANN), a machine learning model inspired by the mechanism of the nervous system, that is, by neurons [14]. An ANN’s design is inspired by the biological neural network of the human brain, and this allows it to learn far more complex relationships than many standard machine learning models. Input is fed through multiple layers of artificial neurons. Much like signals passing across synapses, the output of each layer is passed on to the next. A network with a large number of such layers is called a deep neural network, and learning with it is called deep learning [15]. Deep learning algorithms can be considered an advanced and mathematically complex progression of machine learning algorithms. Deep learning has attracted a great deal of interest because it has enabled advances that were out of reach for traditional machine learning.
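The layer-to-layer flow described above can be sketched in a few lines of Python. Each artificial neuron computes a weighted sum of its inputs plus a bias and applies a sigmoid activation, and each layer’s output becomes the next layer’s input. The 2-3-1 layer sizes and all weights and biases are arbitrary illustrative assumptions, not a trained model. Training (for example, by backpropagation) is omitted.

```python
import math

def sigmoid(x: float) -> float:
    """Squash a neuron's weighted input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron sums its weighted inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through every layer; each layer's output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical 2-3-1 network with made-up (untrained) weights.
network = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.7, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer: 3 neurons
    ([[1.0, -1.5, 0.6]], [0.2]),                                  # output layer: 1 neuron
]

output = forward([1.0, 0.5], network)
print(output)  # a single value between 0 and 1
```

A “deep” network simply has many more entries in the list of layers. Learning consists of adjusting the weights so that the outputs match labeled examples.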
In brief, deep learning is a subset of machine learning that differs in its approach to solving problems. Traditional machine learning generally requires a domain expert to identify the most relevant features. Deep learning models, on the other hand, can extract features by themselves, which reduces the need for domain expertise. Deep learning also scales well because it can process huge amounts of data. It is preferred for massive datasets (big data), when domain understanding for feature extraction is lacking, or for complex problems such as speech recognition and natural language processing (NLP).

Artificial intelligence is discussed in detail in Sect. 2. To understand the relationship between artificial intelligence, machine learning, and deep learning, however, let us first look at it briefly. Artificial intelligence is the concept of creating intelligent machines that can perform tasks that normally require human intelligence. Machine learning is a subset of artificial intelligence that aids in the development of AI-driven applications. Deep learning is a subset of machine learning that achieves artificial intelligence capabilities using artificial neural networks. So far, deep learning is arguably the approach that has come closest to artificial intelligence.

We have come a long way in a short time. Thanks go to the philosophers and mathematicians who laid the groundwork and the computer scientists who turned theory into reality. They also go to all the researchers who have brought machine learning from the lab into our daily lives with solutions to challenging problems. As technology advances, so does the collection and processing of massive amounts of data, known as big data. Machine learning has become so crucial in the twenty-first century that it cannot be ignored.
1.3 Machine Learning Approaches
Machine learning requires well-organized and precise data to solve problems. Today, data is generated in almost every field; as the saying goes, data is the new oil. Companies have access to huge amounts of data, known as big data. Analyzing big data is time-consuming and difficult for humans, but for a machine learning algorithm, good-quality data is everything: the cleaner and more machine-readable the data, the better machine learning performs. Three main types of learning can be used to train a machine learning algorithm: supervised learning, unsupervised learning, and reinforcement learning. Before looking at these types in detail, let’s briefly consider the kinds of data they use. There are two kinds of data, labeled data and unlabeled data:

• Unlabeled data: raw data that has not been tagged with labels identifying its characteristics or properties. Photos, tweets, and audio clips are all examples of unlabeled data. No “explanation” is attached to each piece of unlabeled data; it simply contains the data.
• Labeled data: a collection of samples, each tagged with one or more labels. Labeling typically involves taking a set of unlabeled data and tagging each piece with meaningful, informative labels. Labels could, for example, indicate whether a photo contains a cat or a dog, or whether a tweet’s sentiment is positive, negative, or neutral.
1.3.1 Supervised Learning
Supervised learning, also known as supervised machine learning, is defined by its use of labeled datasets to train machine learning algorithms. Labeled data (data with meaningful tags, labels, or classes) is fed into a machine learning model, and the model learns, that is, adjusts its weights, until it fits the data accurately. Put another way, supervised learning uses training to teach a model the desired output for each input (task-driven). This is possible because the training dataset is labeled, meaning it includes both the inputs and the expected outputs. Over time, the model learns from the data by minimizing its error and increasing its prediction accuracy, so that it can predict labels for new, unlabeled inputs [16–18] (Fig. 8).
Fig. 8 Supervised machine learning
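The idea that a supervised model “adjusts its weights” to minimize error can be illustrated with the simplest case. The sketch below fits a straight line y ≈ w·x + b to labeled (x, y) pairs by gradient descent on the mean squared error. It is written in Python, and the data points, learning rate, and number of epochs are illustrative assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Learn w and b for y ≈ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # prediction errors on the labeled training data
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # gradients of the mean squared error with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # adjust the weights a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the data."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical labeled data generated from y = 2x + 1 (inputs xs, outputs ys).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, training MSE = {mse(xs, ys, w, b):.2e}")
```

Each epoch nudges w and b in the direction that lowers the training error. On this noise-free data they approach w = 2 and b = 1. With real, noisy measurements the error would shrink but not reach zero.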
Supervised learning is used to solve a wide range of real-world problems at scale, including spam detection, object detection, and customer sentiment analysis. Two types of problems can be solved using supervised learning, classification and regression:

• Classification: an algorithm assigns test data to specific categories [13].
• Regression: used to understand the relationship between dependent and independent variables, typically to predict a continuous value [13].
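For the classification case, k-nearest neighbors is one of the simplest supervised algorithms. It is chosen here for illustration; the chapter does not prescribe a particular method. The classifier assigns a new point the majority label among the k closest labeled training points. The sketch is in Python, and the two-feature measurements and class names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled neighbours."""
    # Euclidean distance from the query to every labeled training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # majority vote among the k closest points
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical measurements (feature_1, feature_2) with known classes.
train_points = [(1.0, 1.2), (1.5, 1.8), (1.2, 0.9), (5.0, 5.5), (5.5, 4.8), (6.0, 5.2)]
train_labels = ["class_A", "class_A", "class_A", "class_B", "class_B", "class_B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))  # near the class_A group
print(knn_predict(train_points, train_labels, (5.4, 5.0)))  # near the class_B group
```

The labeled training set does all the work here: the classifier has no explicit rules. Adding more labeled examples changes its decisions without any change to the code.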
1.3.2 Unsupervised Learning
Unsupervised learning, also known as unsupervised machine learning, is defined by its use of unlabeled datasets to train machine learning algorithms. When unlabeled data is fed into a machine learning model, the algorithm discovers hidden patterns and data groupings by analyzing similarities and differences in the information [17]. Put another way, unsupervised machine learning analyzes and clusters unlabeled datasets to find structure in them without predefined outputs (data-driven). Unsupervised learning is used to solve a wide range of real-world problems at scale, including exploratory data analysis, customer segmentation, news classification, and recommendation engines [18] (Fig. 9).

Fig. 9 Unsupervised machine learning

Two types of problems can be solved using unsupervised learning, clustering and dimensionality reduction:

• Clustering: a data-mining method that groups unlabeled data based on their similarities and differences [13].
• Dimensionality reduction: used when a dataset has too many features, or dimensions. It reduces the number of data inputs to a manageable size while preserving the integrity of the dataset as much as possible [19].

There is also another type of learning, called semi-supervised learning, which combines supervised and unsupervised learning techniques. Semi-supervised machine learning requires only a small amount of labeled data together with a large amount of unlabeled data. It avoids the challenge of finding a large amount of labeled data and provides the benefits of both supervised and unsupervised learning (Fig. 10).

Fig. 10 Semi-supervised machine learning

Semi-supervised machine learning is often used when labeling is difficult or expensive, or when only a limited amount of labeled data is available. It is commonly used in speech analysis, text classification, protein sequence classification, and other areas.
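Clustering can be illustrated with k-means, a widely used clustering algorithm. It is one of many, and the chapter does not single it out. K-means alternates between two steps: it assigns each unlabeled point to its nearest centroid, then moves each centroid to the mean of its assigned points. The Python sketch below uses hypothetical 2-D data. For simplicity, the sketch uses the first k points as initial centroids; this is an assumption, and practical implementations usually use randomized or k-means++ initialization. The points are therefore ordered so that the first two lie in different groups.

```python
import math

def kmeans(points, k, iterations=100):
    """Group unlabeled points into k clusters; returns (centroids, clusters)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D measurements forming two natural groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.2, 0.8), (8.5, 7.5), (7.8, 8.3)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied: the two groups emerge purely from the distances between points.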
1.3.3 Reinforcement Learning
Reinforcement learning, a branch of machine learning inspired by behaviorist psychology, is concerned with learning associations between stimuli, actions, and the occurrence of pleasant or unpleasant events, referred to as rewards or punishments [20]. Consider how a child learns to walk. The first thing a child notices is the way you move on two legs. Having grasped the idea, the child tries to copy you and fails most of the time, but eventually learns to crawl and then to stand. With each success or failure, the child learns more about balancing body weight, moving the hands, and other aspects of movement. The child then gradually learns to walk, although he or she often loses balance and falls. After many ups and downs, the child finally walks. This shows how difficult it is for a child to get up and walk. In short, the child is punished for falling and rewarded for making progress. Humans have a built-in reward system that motivates us to walk (the happiness one feels when walking). We also have a built-in punishment system that discourages actions that lead to falling (the pain of falling). This kind of reward-based learning, modeled mathematically, is called reinforcement learning. An agent acts in an environment, receives rewards or punishments, and learns to choose the actions that maximize its total reward (Fig. 11).
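Reward-based learning can be made concrete with tabular Q-learning, one standard reinforcement learning algorithm, used here for illustration. The example loosely echoes the walking analogy. A hypothetical agent on a five-position track starts in the middle. It earns a reward of +1 for reaching the goal at the right end and a penalty of -1 for “falling” off the left end. The environment, the reward values, and the hyperparameters (alpha, gamma, epsilon, episodes) are all assumptions.

```python
import random

N_STATES = 5          # positions 0..4; 0 = fall (terminal), 4 = goal (terminal)
ACTIONS = (-1, +1)    # step left or step right
START = 2

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = state + action
    if next_state == 0:
        return next_state, -1.0, True   # punished for falling
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # rewarded for reaching the goal
    return next_state, 0.0, False

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q(state, action): the expected discounted reward of each action."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # epsilon-greedy: usually exploit current estimates, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # move the estimate toward reward + discounted value of the next state
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# Learned policy for the non-terminal positions: +1 means "step right".
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_STATES - 1)}
print(policy)
```

Nobody tells the agent which way to go. It learns from rewards and punishments alone that stepping right, toward the goal, pays off in every position.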
Fig. 11 Reinforcement learning

1.4 Machine Learning Applications

Machine learning is making its way out of the lab and into our daily lives, with applications in fields as diverse as healthcare, transportation, finance, and cybersecurity [21]. Most industries that deal with large amounts of data recognize the benefits of machine learning technologies. By gleaning insights from this data, organizations can operate more effectively or gain an advantage over rivals.
1.4.1 Product Recommendation
You may be surprised when YouTube recommends exactly the video you want to watch. The same goes for when Google shows ads for products you just searched for, or when Amazon suggests products that are frequently bought together with the one you are about to purchase. These surprises are the work of machine learning algorithms. They analyze your previous purchases, search habits, watch history, cart history, and other activity to make product suggestions. Amazon’s product recommendations, Netflix’s movie and TV show recommendations in your feed, YouTube’s suggested videos, Spotify’s music, Facebook’s newsfeed, and Google Ads are all examples of recommender systems in action [22]. Let’s look at how machine learning helps. Because of time constraints and the difficulty of sorting through all the goods on a site, customers may struggle to find the specific product they want. This is where product suggestions come in: they help customers find relevant products that they need. Most websites implement personalized product recommendations, which improve both user experience and satisfaction.
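The “frequently bought together” idea can be sketched with simple item co-occurrence counting over past orders. This is one basic recommendation technique; the production recommender systems named above use far more sophisticated models, which are not described here. The Python sketch below uses a hypothetical order history, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(orders):
    """Count how often each ordered pair of products appears in the same order."""
    pair_counts = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts

def recommend(item, pair_counts, top_n=2):
    """Return the products most often purchased together with `item`."""
    scores = Counter({other: count for (first, other), count in pair_counts.items() if first == item})
    return [other for other, _ in scores.most_common(top_n)]

# Hypothetical purchase history from a lab-supplies store.
orders = [
    ["pipette tips", "microcentrifuge tubes", "gloves"],
    ["pipette tips", "gloves"],
    ["pipette tips", "microcentrifuge tubes"],
    ["agar plates", "inoculation loops"],
    ["pipette tips", "gloves", "agar plates"],
]

pair_counts = build_cooccurrence(orders)
print(recommend("pipette tips", pair_counts))  # ['gloves', 'microcentrifuge tubes']
```

The recommendations are learned from purchase data rather than from hand-written rules. When new orders arrive, the counts, and therefore the suggestions, update automatically.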
1.4.2 Image Recognition
Image recognition is another use of machine learning [23]. You might be surprised at how smartphones use face recognition to unlock within milliseconds each time you pick them up. You might also be impressed by how cameras at automated toll booths recognize license plates to make travel easier and faster. Many computer vision approaches are essentially machine learning models used for tasks such as object recognition, segmentation, and classification. Image recognition is one of the best-known and most essential of these approaches: a machine learning model recognizes patterns, shapes, or objects in a digital image. Machine learning image recognition algorithms are also actively used to assist doctors and deliver faster test results across many healthcare and biological sectors. If you have visited an automated laboratory for a blood cell count or a similar test, you may have noticed how much faster counting has become. Traditionally, blood cells are counted manually with a hemocytometer (an instrument for visually counting the cells in a blood sample or other fluid under a microscope) and other laboratory equipment. This is a time-consuming and labor-intensive task. Machine learning algorithms can recognize blood cells and provide cell counts without a human doing the counting [24]. As a result, counting time has been considerably reduced, and human counting errors have been largely eliminated. Image recognition algorithms have also aided in the fight against COVID-19 [25]. Machine learning models have been developed to detect COVID-19 cases from patients’ chest radiographs. They have been deployed as effective tools for triaging patients and supporting the healthcare system during the crisis. (Machine learning techniques have also been employed beyond image recognition, for example in infection prediction, patient screening, and accelerating drug development.) Machine learning is also widely used in object detection problems such as cancer diagnosis and pneumonia detection [26]. It is likewise used in medical image processing to reduce noise and enhance image quality. In short, machine learning is a powerful tool for addressing image-related problems.
1.4.3 Sentiment Analysis
Sentiment analysis is one of the most important machine learning applications, covering sentiment classification, opinion mining, and emotion analysis [27]. A sentiment analyzer is a machine learning program, often running in real time, that determines the mood or opinion of a speaker or writer. For example, if someone writes a review or an email (or any other type of document), a sentiment analyzer can quickly determine the sentiment and tone of the text. In product analytics, sentiment analysis helps companies understand brand buzzwords and consumer requirements and attitudes. It is also used in customer care systems. Applied correctly, sentiment analysis can help solve a problem while also providing additional insights. The results can then be used to improve the quality of a product or service.
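As a toy illustration of learning sentiment from labeled text, the Python sketch below trains a bag-of-words perceptron. The perceptron is a simple linear classifier chosen here for brevity; real sentiment analyzers typically use much larger models and training sets. The six training reviews and the helper names are hypothetical. The classifier only knows words it has seen during training.

```python
from collections import defaultdict

def features(text):
    """Bag-of-words: the lowercase tokens of the text."""
    return text.lower().split()

def train_perceptron(texts, labels, epochs=10):
    """Learn one weight per word; labels are +1 (positive) or -1 (negative)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            score = bias + sum(weights[w] for w in features(text))
            if label * score <= 0:  # misclassified (or undecided): nudge weights toward the label
                for w in features(text):
                    weights[w] += label
                bias += label
    return weights, bias

def predict(text, weights, bias):
    """Return 'positive' or 'negative' from the sign of the learned score."""
    score = bias + sum(weights.get(w, 0.0) for w in features(text))
    return "positive" if score > 0 else "negative"

# Hypothetical labeled reviews.
reviews = [
    "great product works perfectly",
    "excellent quality and fast delivery",
    "i love it great value",
    "terrible product broke quickly",
    "poor quality and slow delivery",
    "i hate it waste of money",
]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(reviews, labels)
print(predict("great value and excellent quality", weights, bias))
print(predict("terrible quality waste of money", weights, bias))
```

The model learns word weights such as a positive weight for “great” and negative weights for “terrible” and “waste” from the labeled examples. No one writes those rules by hand.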
2 Artificial Intelligence
2.1 What Is Artificial Intelligence
There are several definitions of artificial intelligence. John McCarthy, a well-known American computer scientist and cognitive scientist often called the father of artificial intelligence, defined it as follows: “It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable” [28]. Artificial intelligence is the broader concept of creating intelligent machines that can mimic the problem-solving and decision-making capabilities of the human mind [29]. To put it another way, artificial intelligence is the use of computers to do jobs that previously required human intelligence. This involves developing algorithms to categorize, analyze, and forecast data. It also involves implementing data-driven actions, learning from new data, and improving over time. Artificial intelligence is rapidly expanding into almost every field, much as a young child grows into a more intelligent adult. While humans make mistakes, artificial intelligence has the potential to produce more accurate results as research and development progress. This mirrors the difference between traditional programming and machine learning discussed earlier. An artificial intelligence program has the freedom to explore and improve on its own rather than sticking to predefined scenarios. Such a program is also called an intelligent agent. It perceives its surroundings to gather the information it needs, then takes the actions that best achieve its objective and maximize its success, based on past experience. We deal with artificial intelligence daily, in both our professional and personal lives; its applications are discussed in more depth in Sect. 2.3.
2.2 Basic Principles
As artificial intelligence systems become more creative, they can produce amazing and surprising results. In June 2020, the AI research organization OpenAI unveiled GPT-3, a language model that can generate text that is often difficult to distinguish from human writing [30]. As many countries head into elections, controlling deep fakes [31] is becoming a challenge, because deep fakes are increasingly sophisticated and accessible. Deep fakes are synthetic media in which the person in an existing image or video is replaced with someone else’s likeness. Artificial intelligence can provide meaningful solutions to many problems, but it must be used correctly so that it does not act in harmful ways. As AI advances, the AI community should follow AI ethics and develop responsible AI for the future [31, 32].
2.2.1 AI Ethics and Responsible AI
Artificial intelligence ethics is a set of moral ideas and strategies designed to guide the development and responsible application of AI. AI is a human-created technology that aims to mimic, augment, or replace human intelligence. As it becomes more embedded in products and services, businesses are beginning to create AI codes of ethics. To create insights, these systems often rely on enormous amounts of diverse data. Projects that are poorly designed or based on incorrect, inadequate, or biased data can have unexpected and potentially severe consequences. Responsible AI is the practice of developing AI that creates new opportunities to improve people’s lives while ensuring that the AI is safe, unbiased, and trustworthy. Responsible AI also protects privacy and ensures that models are explainable and ethical [33].
Introduction to Artificial Intelligence & ML
2.3 General Applications of Artificial Intelligence
When artificial intelligence can translate complex data into meaningful insights, it can be extremely valuable. Given the right amount of data, AI and its problem-solving abilities can either replace humans in laborious and repetitive tasks or improve the speed, precision, and efficacy of human efforts [34]. When AI takes over tasks that are monotonous or dangerous, it frees up human labor to focus on tasks that require creativity and empathy. AI has a wide range of applications that can make our lives easier and more efficient, including supporting environmental protection by advancing greener transportation and monitoring deforestation. Because the range of applications is so wide, this chapter covers only some of the more general ones.
2.3.1 Voice Assistants
“Okay, Google, what’s the weather like today?” You may have asked Google Assistant [35] or another voice assistant a question like this, or received responses from a chatbot or voice assistant when contacting customer support for information, complaints, or services. How do these systems interpret human language and respond to our questions? We obviously cannot pre-program an answer for every question a customer could ask. This is where artificial intelligence and machine learning come in. Machine learning algorithms are trained on human language to learn the relationships between words, sentences, and their meanings. Artificial intelligence is then used to interpret the results, find the answer the user needs, and deliver it through the voice assistant or chatbot. Many of the devices we use daily rely on voice assistants: they are on our smartphones and in our homes’ smart speakers, and they are used by many mobile apps and operating systems [36]. Certain technologies in automobiles, as well as in retail, education, healthcare, and telecommunications, can also be controlled by speech. Well-known examples of intelligent personal assistants include Apple Siri, Google Assistant, and Amazon Alexa. Although voice assistants have been in mainstream use for only around 10 years, they already account for more than 30 percent [37] of all conversations with technology, and their use continues to grow rapidly as machine learning and artificial intelligence improve.
2.3.2 Self-Driving Cars
Another exciting application of artificial intelligence is self-driving cars [38], in which the complex process of driving, which requires a human’s full attention on the road to analyze the surroundings, plan the route, read traffic signs, follow traffic rules, and drive
S. Panat and R. Kumar
safely, has been automated. You have probably seen videos of Tesla or Waymo self-driving cars navigating with little or no human assistance. As self-driving technology advances toward level 5 automation, which requires no human interaction at all, the role of artificial intelligence and machine learning becomes increasingly important. Self-driving cars use data acquired from sensors such as cameras and lidar (a remote sensing method that uses pulsed laser light to measure distances). This data is fed into machine learning algorithms that construct a detailed 3D map of the surroundings, recognize traffic lights and road signs, and make decisions based on this information and the traffic rules [39]. In summary, the AI software in the car connects to all the sensors, receives their inputs, and processes them. Using deep learning (a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain), it simulates human perception and decision-making and controls driving actions such as steering and braking. Why does this matter? According to the U.S. Department of Transportation (USDOT Releases 2016 Fatal Traffic Crash Data), 94 percent of crashes are caused by driver mistakes or behavior, such as impaired driving, unbelted occupants, speeding, drugged driving, or distraction. Driverless vehicles therefore hold great promise for reducing collisions and could have a significant impact on society in the coming years. Autonomous vehicles offer other benefits as well: highly automated vehicles can help people who are elderly, sick, or differently abled become more self-sufficient and live the lives they want.
Automated vehicles can also save money, provide greater personal freedom, increase productivity, and benefit the environment by reducing fuel use and carbon emissions. Fully autonomous (level 5) cars are being tested in many locations around the world, but none are yet available to the general public; AI may make them a reality in the near future.
2.3.3 Healthcare and Biology
Early detection and diagnosis are critical in the battle against cancer because there is no guaranteed treatment for cancer in its advanced stages [40]. Machine learning techniques can be introduced into the testing pipeline for early detection, which can significantly improve testing speed and accuracy and reduce human error. Machine learning models can be trained on medical images such as CT and X-ray scans to detect and segment probable disease regions, helping radiologists and doctors provide earlier treatment [40, 41]. Machine learning can also help with analyzing patient health records, predicting diabetes, processing images, personalizing treatment, and, most importantly, developing drugs [42]. One such initiative, Microsoft’s Hanover project, attempts to discover personalized drugs to treat acute myeloid leukemia. AlphaFold, developed by Google DeepMind, is an artificial intelligence system that predicts protein structure and has been described as solving a 50-year-old grand challenge in biology [43]. Proteins carry out a wide variety of processes in our bodies and are essential for life. They are large, complex molecules built from chains of amino acids. Because a protein’s shape is tightly tied to its function, knowing its unique three-dimensional structure helps us understand what it does and how it works. AlphaFold is a deep learning model that addresses the problem of determining the shape a protein folds into, generally known as the protein folding problem. The AI community considers AlphaFold a breakthrough that demonstrates how AI can greatly accelerate progress in some of the most challenging scientific areas. Drug discovery is another area where AI has recently seen many applications. Existing drug discovery methods are extremely expensive, and developing a medicine can take years [44]. Combining biology and artificial intelligence into integrated research methods can save both cost and time. Biological insight offers precision, while AI adds speed and reduces uncertainty by analyzing trillions of data points per tissue sample in days, which would be very hard for humans to do alone. AI’s ability to handle large datasets also applies to analyzing population-level data as well as individual patient data, which helps determine personalized treatment [45].
3 Conclusion
Artificial intelligence is a field of computer science and engineering that focuses on developing intelligent systems that can perform tasks that typically require human-like cognition, such as problem-solving, decision-making, and learning. The idea of machines that think and act like humans has been explored in mythology and science fiction for centuries, but only in recent decades have advances in technology and algorithms begun to make it a reality. Although it originated in computer science, artificial intelligence has found applications in a wide range of fields as it has progressed. It is evolving at a rapid pace and is becoming increasingly adept at implementing powerful, well-established cognitive principles. Artificial intelligence has the potential to increase workplace efficiency and augment human capabilities. When it takes over repetitive or dangerous work, it frees human labor to focus on tasks that require creativity and empathy, among other skills. Artificial intelligence and machine learning are becoming increasingly powerful, but their potential for good does not lie in the technology alone; it lies mainly in how people use them.
Take-Home Message
• Machine learning is the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.
• Machine learning can analyze large volumes of data (big data) to extract hidden insights within a short span of time.
• Deep learning is a popular machine learning technique used to solve complex machine learning challenges.
• Supervised, unsupervised, and reinforcement learning are the three main types of machine learning approaches.
• Artificial intelligence is the concept of creating intelligent machines that can mimic human problem-solving and decision-making capabilities to solve challenges.
• AI ethics and responsible AI refer to using AI to find meaningful solutions to problems in the right way.
• AI and ML have already been implemented in a range of products and services used in daily life, and they have a promising future in a variety of fields.
• Machine learning is a subset of artificial intelligence, and deep learning is a subset of machine learning.
References
1. Awad, M., & Khanna, R. (2015). Machine learning. In Efficient learning machines: Theories, concepts, and applications for engineers and system designers (pp. 1–18). Apress.
2. Dada, E. G., Bassi, J. S., Chiroma, H., et al. (2019). Machine learning for email spam filtering: Review, approaches and open research problems. Heliyon, 5, e01802. https://doi.org/10.1016/j.heliyon.2019.e01802
3. Rojas Sola, J. I., Río-Cidoncha, G., Sarriá, A., & Galiano-Delgado, V. (2021). Blaise Pascal’s mechanical calculator: Geometric modelling and virtual reconstruction. Machines, 9, 136. https://doi.org/10.3390/machines9070136
4. Ares, J., Lara, J., Lizcano, D., & Martínez, M. (2018). Who discovered the binary system and arithmetic? Did Leibniz plagiarize Caramuel? Science and Engineering Ethics, 24. https://doi.org/10.1007/s11948-017-9890-6
5. Skulrattanakulchai, A. (2017). A man before his time. Charles Babbage.
6. De Mol, L. (2021). Turing machines. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University.
7. McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115–133. https://doi.org/10.1007/BF02478259
8. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210–229. https://doi.org/10.1147/rd.33.0210
9. Widrow, B., & Lehr, M. A. (1990). 30 years of adaptive neural networks: Perceptron, Madaline, and backpropagation. Proceedings of the IEEE, 78, 1415–1442. https://doi.org/10.1109/5.58323
10. Campbell, M., Hoane, A. J., & Hsu, F. (2002). Deep Blue. Artificial Intelligence, 134, 57–83.
11. (2000). Computer technology helps radiologists spot overlooked small breast cancers. Oncology (Williston Park), 14, 1450.
12. Patel, M., & Jaiswal, M. (2021). Introduction to artificial intelligence.
13. Sarker, I. H. (2021). Machine learning: Algorithms, real-world applications and research directions. SN Computer Science, 2, 160. https://doi.org/10.1007/s42979-021-00592-x
14. Grossi, E., & Buscema, M. (2007). Introduction to artificial neural networks. European Journal of Gastroenterology & Hepatology, 19, 1046–1054. https://doi.org/10.1097/MEG.0b013e3282f198a0
15. Arnold, L., Rebecchi, S., Chevallier, S., & Paugam-Moisy, H. (2011). An introduction to deep learning. In Proceedings of the European Symposium on Artificial Neural Networks, ESANN 2011 (pp. 477–488).
16. Biswas, A., Saran, I., & Wilson, F. P. (2021). Introduction to supervised machine learning. Kidney360, 2. https://doi.org/10.34067/kid.0000182021
17. Vemuri, V. K. (2020). The hundred-page machine learning book. Journal of Information Technology Case and Application Research, 22, 136–138. https://doi.org/10.1080/15228053.2020.1766224
18. Badillo, S., Banfai, B., Birzele, F., et al. (2020). An introduction to machine learning. Clinical Pharmacology & Therapeutics, 107, 871–885. https://doi.org/10.1002/cpt.1796
19. Usama, M., Qadir, J., Raza, A., et al. (2019). Unsupervised machine learning for networking: Techniques, applications and research challenges. IEEE Access, 7, 65579–65615. https://doi.org/10.1109/ACCESS.2019.2916648
20. Kormushev, P., Calinon, S., & Caldwell, D. G. (2013). Reinforcement learning in robotics: Applications and real-world challenges. Robotics, 2, 122–148. https://doi.org/10.3390/robotics2030122
21. Majumder, A. K. M. J., & Veilleux, C. (2021). Smart health and cybersecurity in the era of artificial intelligence. In Computer-mediated communication. IntechOpen.
22. Portugal, I., Alencar, P., & Cowan, D. (2015). The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications, 97. https://doi.org/10.1016/j.eswa.2017.12.020
23. Khan, A. (2020). Machine learning in computer vision. Procedia Computer Science, 167. https://doi.org/10.1016/j.procs.2020.03.355
24. Alam, M. M., & Islam, M. T. (2019). Machine learning approach of automatic identification and counting of blood cells. Healthcare Technology Letters, 6, 103–108. https://doi.org/10.1049/htl.2018.5098
25. Ghaderzadeh, M., & Asadi, F. (2021). Deep learning in the detection and diagnosis of COVID-19 using radiology modalities: A systematic review. Journal of Healthcare Engineering, 2021, 6677314. https://doi.org/10.1155/2021/6677314
26. Račić, L., Popovic, T., Cakic, S., & Šandi, S. (2021). Pneumonia detection using deep learning based on convolutional neural network. In 25th International Conference on Information Technology (IT). IEEE.
27. Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5, 1093–1113. https://doi.org/10.1016/j.asej.2014.04.011
28. McCarthy, J. (2004). What is artificial intelligence?
29. Jarrahi, M. H. (2018). Artificial intelligence and the future of work: Human-AI symbiosis in organizational decision making. Business Horizons, 61. https://doi.org/10.1016/j.bushor.2018.03.007
30. Brown, T., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, et al. (Eds.), Advances in neural information processing systems (pp. 1877–1901). Curran Associates, Inc.
31. Nguyen, T., Nguyen, C. M., Nguyen, T., et al. (2019). Deep learning for deepfakes creation and detection: A survey.
32. Kayid, A. (2020). The role of artificial intelligence in future technology.
33. Ghallab, M. (2019). Responsible AI: Requirements and challenges. AI Perspectives, 1, 3. https://doi.org/10.1186/s42467-019-0003-z
34. Wang, W., & Siau, K. (2019). Artificial intelligence, machine learning, automation, robotics, future of work and future of humanity: A review and research agenda. Journal of Database Management, 30, 61–79. https://doi.org/10.4018/JDM.2019010104
35. Schalkwyk, J., Beeferman, D., Beaufays, F., et al. (2010). Google search by voice: A case study. In Advances in speech recognition: Mobile environments, call centers and clinics (pp. 61–90).
36. Terzopoulos, G., & Satratzemi, M. (2020). Voice assistants and smart speakers in everyday life and in education. Informatics in Education, 473–490. https://doi.org/10.15388/infedu.2020.21
37. Muthukumaran, A. (2020). Optimizing the usage of voice assistants for shopping. Indian Journal of Science and Technology, 13, 4407–4416. https://doi.org/10.17485/IJST/v13i43.1911
38. Cunneen, M., Mullins, M., & Murphy, F. (2019). Autonomous vehicles and embedded artificial intelligence: The challenges of framing machine driving decisions. Applied Artificial Intelligence, 33, 706–731. https://doi.org/10.1080/08839514.2019.1600301
39. Biggi, G., & Stilgoe, J. (2021). Artificial intelligence in self-driving cars research and innovation: A scientometric and bibliometric analysis.
40. Saba, T. (2020). Recent advancement in cancer detection using machine learning: Systematic survey of decades, comparisons and challenges. Journal of Infection and Public Health, 13, 1274–1289. https://doi.org/10.1016/j.jiph.2020.06.033
41. Makaju, S., Prasad, P. W. C., Alsadoon, A., et al. (2018). Lung cancer detection using CT scan images. Procedia Computer Science, 125, 107–114. https://doi.org/10.1016/j.procs.2017.12.016
42. Ahuja, A. S. (2019). The impact of artificial intelligence in medicine on the future role of the physician. PeerJ, 7, e7702. https://doi.org/10.7717/peerj.7702
43. Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2
44. Réda, C., Kaufmann, E., & Delahaye-Duriez, A. (2020). Machine learning applications in drug development. Computational and Structural Biotechnology Journal, 18, 241–252. https://doi.org/10.1016/j.csbj.2019.12.006
45. Schork, N. J. (2019). Artificial intelligence and personalized medicine. Cancer Treatment and Research, 178, 265–283. https://doi.org/10.1007/978-3-030-16391-4_11
Fundamentals of Machine Learning
Joel Mathew Cherian and Ravindra Kumar
What You Will Learn
This chapter focuses on:
1. Introducing two main types of learning, supervised and unsupervised learning.
2. Supervised learning.
3. Unsupervised learning.
4. A brief introduction to semi-supervised and reinforcement learning.
5. Deep learning.
6. Training and evaluating models.
1 Introduction
Machine learning is a branch of artificial intelligence (AI) that studies how machines can learn to perform tasks without being explicitly programmed to do so. To understand this process, it helps to reflect on how humans learn. When learning something new, humans start out performing poorly and gradually improve with experience. Similarly, when a machine learns to execute a particular task, it processes data associated with that task, looking for patterns in the data. These patterns give the machine the experience it needs to perform increasingly well on the task [1].
J. M. Cherian Computer Science and Engineering, National Institute of Technology Calicut, Kozhikode, Kerala, India R. Kumar (✉) School of Biotechnology, National Institute of Technology Calicut, Kozhikode, Kerala, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_6
J. M. Cherian and R. Kumar
Machine learning focuses on methods that automatically detect patterns in data [2]. These methods can be categorized into four main groups based on the type of learning, namely, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. The following sections take a look at the mechanism involved in making such learning possible.
2 Supervised Learning
Supervised learning is when a machine learns from labeled data. Data is said to be “labeled” when each training example consists of inputs together with the expected output. To give the reader a better idea of a labeled dataset, Table 1 shows a subset of the iris flower dataset [3]. Each training example in this subset has sepal and petal measurements as inputs and the iris species as the expected output (the label). Labeled data allows machines to be trained to predict the output for a given input. Algorithms based on supervised learning are called supervised learning algorithms [2, 4, 5]. Figure 1 shows the training mechanism for supervised learning. Before reviewing the mechanism as a whole, it is helpful to become familiar with the individual components of the diagram and the terminology involved.
Table 1 A labeled dataset (subset of the iris flower dataset) with the label “Species”; measurements in centimeters
Sepal length   Sepal width   Petal length   Petal width   Species
4.9            3.0           1.4            0.2           Iris setosa
5.7            2.5           5.0            2.0           Iris virginica
6.6            2.9           4.6            1.3           Iris versicolor
5.1            3.5           1.4            0.2           Iris setosa
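A labeled dataset like Table 1 can be held in code as a list of (inputs, label) pairs and then split into the inputs X and expected outputs y used in Fig. 1. The sketch below is a minimal illustration in plain Python; the variable names are hypothetical. The measurements 5.1, 3.5, 1.4, 0.2 belong to an Iris setosa flower in the published iris dataset, so that label is used for the last example.

```python
# Training examples as (inputs, label) pairs.
# Inputs: sepal length, sepal width, petal length, petal width (cm).
training_data = [
    ((4.9, 3.0, 1.4, 0.2), "Iris setosa"),
    ((5.7, 2.5, 5.0, 2.0), "Iris virginica"),
    ((6.6, 2.9, 4.6, 1.3), "Iris versicolor"),
    ((5.1, 3.5, 1.4, 0.2), "Iris setosa"),
]

# Separate the inputs (X) from the expected outputs (y), as in Fig. 1.
X = [inputs for inputs, _ in training_data]
y = [label for _, label in training_data]

print(len(X), "examples with", len(X[0]), "features each")
print(y)
```

Larger datasets are usually stored as arrays or data frames, but the idea stays the same: each row of inputs is paired with one expected output.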
[Figure: the training data (X, y) are passed to the model, which outputs ŷ; the loss function f(ŷ, y) computes the loss, which the optimization algorithm uses to update the model. X: input data; y: expected output; ŷ: predicted output; f: loss function]
Fig. 1 Training mechanism for supervised learning algorithms
Fundamentals of Machine Learning
1. Model: A model represents the prediction rules that map an input to its output [1, 6, 7]. It is simply a mathematical function [8] that performs operations on the provided inputs and on certain built-in parameters. When defining a model, both its operations and its parameters have to be specified. Once the model is defined, its operations and the number of its parameters stay fixed; only the values of the parameters are updated throughout the training process, enabling the model to learn and fit the data better.
2. Training data: This is the data on which the model is trained. As mentioned earlier, it contains pairs of inputs and expected outputs [6].
3. Loss function: This function gauges how good the model’s prediction is. It receives the predicted and the expected output as inputs and returns a value, often referred to as the “loss.” The loss measures the difference between the expected and the predicted output [1, 6, 9]. There are many loss functions, and different tasks require different loss functions; a few are discussed in the upcoming sections. The term “cost function” is often used interchangeably with loss function. Strictly, however, the loss function gives the loss for a single training example, while the cost function gives the average loss over the entire training set.
4. Optimization algorithms: Optimization algorithms try to reduce the value of the cost function by tuning the parameters of the model. They provide an organized way to update the parameter values so that the model progressively gets better [10]. Gradient descent, RMSprop, and Adam [11, 12] are some common optimization algorithms used to train models today. The discussion in this chapter is restricted to gradient descent.
Gradient Descent
Consider the plot of the cost function against the model parameters. To simplify the explanation, consider the case where the model has only one parameter. The plot is then two-dimensional and could look something like Fig. 2a.
[Figure: three panels (a), (b), and (c), each plotting the cost J, with a red dot marking the current value of θ]
Fig. 2 Representation of gradient descent. The red dot shows (a) the loss for the current value of θ, (b) the gradient at θ, and (c) the reduction in loss when the value of θ is updated
The model parameters are randomly initialized at the start of training. Let the red dot in Fig. 2a represent the initial position on the curve. Since optimization algorithms try to minimize the cost function, the red dot must move towards the minimum of the cost function. Mathematically, the derivative (gradient) of the curve at the red dot gives the direction of steepest increase of the function, as shown in Fig. 2b. Hence, the direction towards the minimum is given by the negative of the gradient [6, 8, 10]. The curve becomes flatter on approaching the minimum, so the magnitude of the derivative becomes smaller and smaller until it reaches zero at the minimum itself. Because each update is proportional to the derivative, the steps also shrink as the minimum is approached. Figure 2c shows how the cost function decreases towards the minimum. This discussion can be formulated mathematically [10] as follows:
$$\theta \leftarrow \theta - \alpha \frac{\partial J}{\partial \theta} \qquad (1)$$
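The update rule of Eq. 1 can be tried on a simple cost function. The sketch below assumes a hypothetical one-parameter cost J(θ) = (θ − 3)², whose minimum is at θ = 3 and whose derivative is 2(θ − 3); `learning_rate` plays the role of α (the learning rate). It is an illustration of plain gradient descent, not a production optimizer.

```python
def gradient_descent_1d(grad, theta0, learning_rate, n_steps):
    """Repeatedly apply Eq. 1: theta <- theta - learning_rate * dJ/dtheta."""
    theta = theta0
    history = [theta]
    for _ in range(n_steps):
        theta = theta - learning_rate * grad(theta)
        history.append(theta)
    return theta, history


# Hypothetical cost function J(theta) = (theta - 3)^2, minimized at theta = 3.
def cost(theta):
    return (theta - 3.0) ** 2


def cost_gradient(theta):
    return 2.0 * (theta - 3.0)  # dJ/dtheta


theta, history = gradient_descent_1d(cost_gradient, theta0=-2.0, learning_rate=0.1, n_steps=50)
print(theta)                          # close to 3.0
print(cost(history[0]), cost(theta))  # the cost drops from 25.0 to nearly 0

# The steps shrink as the minimum is approached, because the gradient gets smaller.
steps = [abs(b - a) for a, b in zip(history, history[1:])]
print(steps[0] > steps[-1])           # True

# A learning rate that is too large overshoots the minimum and diverges.
theta_big, _ = gradient_descent_1d(cost_gradient, theta0=-2.0, learning_rate=1.1, n_steps=20)
print(abs(theta_big - 3.0) > 5.0)     # True
```

With a learning rate of 0.1, each update shrinks the distance to the minimum by a constant factor, while a learning rate of 1.1 makes every step jump past the minimum by more than it started, so the parameter moves further away each time.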
In Eq. 1, θ represents the model parameter, J the cost function, and α the learning rate. The learning rate is an additional parameter introduced to control the speed at which the model learns; it usually has a value less than 1. If the model takes steps that are too large, it may overshoot the minimum, and the learning rate helps control that. This parameter is external to the model and is usually set by the designers of the model. Such external parameters are referred to as “hyperparameters,” and good values for them are usually found by trial and error. This approach can be extended to models with multiple parameters. For a model with n parameters, the plot of the cost function is no longer two-dimensional but (n + 1)-dimensional, and the update in Eq. 1 is applied to each parameter using the partial derivative of J with respect to that parameter. Note also that the gradient descent described here (often called batch gradient descent) updates the parameters only after passing through the entire training dataset. Other versions, such as stochastic gradient descent and mini-batch gradient descent, update the model parameters after calculating the loss for each individual training example and for a smaller subset of the training dataset, respectively [12].
Mechanism: The model parameters are initialized with random values, and the input data is passed to the model either one example at a time or in small batches. The model makes an output prediction for each input, and the predicted and expected outputs are passed to the loss function. The optimization algorithm then updates the model parameters to reduce the loss. Once all the training data has been processed, one epoch is complete. The training data is then passed through the model again, and the process is repeated for a specified number of epochs.
Supervised machine learning problems are of two types, namely, regression and classification [2].
1. Regression: These problems require the model to predict a single real number for the given inputs [2], for example, predicting the price of a house based on its number of rooms and its square footage. The data for regression tasks consist of input-output pairs in which the outputs are real-valued numbers.
2. Classification: These problems require the model to predict a category for the given inputs [2]. For example, classifying iris flowers based on the length and width of their sepals and petals is a classification task. The data for classification tasks consist of input-output pairs in which the outputs are categorical.
The following subsections discuss some popular machine learning algorithms built for these tasks.
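Before turning to specific algorithms, the difference between batch, stochastic, and mini-batch gradient descent described earlier comes down to how many examples are used per parameter update within one epoch. The sketch below is a minimal illustration with hypothetical helper names; it ignores the shuffling of examples at the start of each epoch that is common in practice.

```python
def make_batches(examples, batch_size):
    """Split one epoch's worth of training examples into consecutive batches."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


def updates_per_epoch(n_examples, batch_size):
    """Number of parameter updates in one epoch: one update per batch."""
    return len(make_batches(list(range(n_examples)), batch_size))


n = 100  # hypothetical training-set size
print("batch gradient descent:", updates_per_epoch(n, batch_size=n))        # 1
print("stochastic gradient descent:", updates_per_epoch(n, batch_size=1))   # 100
print("mini-batch gradient descent:", updates_per_epoch(n, batch_size=32))  # 4
```

With 100 examples and a batch size of 32, the last mini-batch holds only 4 examples; this is normal, and the final batch of an epoch is simply smaller.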
2.1 Linear Regression
Linear regression is a concept that was introduced in statistics and was later adopted by machine learning because of its usefulness in analyzing the relationship between input and output variables. Linear regression is a supervised learning algorithm used to perform regression on tabular data. For example, consider a table of prices for various houses. Assume a simple case where the table has two columns, one for the number of rooms (A) and the other for the cost of the house (R). Linear regression can be used to study how A affects R. Intuitively, a house with more rooms would have a higher price, but the real question is how much higher the price is for every additional room. If the price increases by the same amount for every additional room, then R is linearly dependent on A, and the relationship can be written mathematically as
$$R = \alpha A + \beta \qquad (2)$$
So, in this case, the model is the function shown in Eq. 2, and its parameters are α and β. Note that for given values of α and β, Eq. 2 represents a straight line in a plot of R against A, with slope α and intercept β. Now that the model has been defined, the next step is to define a loss function that can guide the model as it learns. A very common loss function for regression tasks is the least squared error (LSE), and the corresponding cost function is the mean squared error (MSE). Their formulas are given below [8, 12].
$$\mathrm{LSE} = \frac{1}{2}\left(y_{\mathrm{pred}} - y_{\mathrm{expected}}\right)^2 \qquad (3)$$
$$\mathrm{MSE} = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)}_{\mathrm{pred}} - y^{(i)}_{\mathrm{expected}}\right)^2 \qquad (4)$$
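Eqs. 3 and 4 translate directly into code: the loss is computed for one example, and the cost averages the losses over all m examples. The sketch below uses hypothetical predictions and expected outputs; the function names are illustrative.

```python
def lse(y_pred, y_expected):
    """Eq. 3: squared-error loss for a single training example (with the 1/2 factor)."""
    return 0.5 * (y_pred - y_expected) ** 2


def mse(y_preds, y_expecteds):
    """Eq. 4: cost = average of the per-example losses over the m examples."""
    m = len(y_preds)
    return sum(lse(p, e) for p, e in zip(y_preds, y_expecteds)) / m


# Hypothetical predictions and expected outputs for m = 3 examples.
y_preds = [2.0, 4.0, 7.0]
y_expecteds = [1.0, 4.0, 5.0]
print([lse(p, e) for p, e in zip(y_preds, y_expecteds)])  # [0.5, 0.0, 2.0]
print(mse(y_preds, y_expecteds))                          # 0.8333333333333334
```

Dividing the sum of squared differences by 2m (Eq. 4) is the same as averaging the per-example losses of Eq. 3, which is why the cost can be written as an average of losses.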
In Eq. 3, $y_{\mathrm{pred}}$ is the predicted output and $y_{\mathrm{expected}}$ is the expected output for a training example from the dataset. In Eq. 4, m is the total number of training examples, and $y^{(i)}_{\mathrm{pred}}$ and $y^{(i)}_{\mathrm{expected}}$ are the predicted and expected outputs for the ith training example. The loss function measures the difference between the expected and the predicted output for one example, while the cost function is the average of these losses over all m training examples. If Eq. 2 is viewed as a line, each loss is half the squared vertical distance (measured along the R axis) between a datapoint and the line. Reducing the cost function therefore makes the line fit the data better [8]. With the loss function defined, a linear regression model is trained as follows:
1. Initialize α and β with random values.
2. Compute the value of the cost function.
3. Apply gradient descent by computing the derivatives of the cost function with respect to α and β.
4. Update the values of α and β.
5. Repeat from step 2 until the specified number of epochs is complete.
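The five steps above can be sketched in plain Python using batch gradient descent on the cost of Eq. 4. Because the chapter uses α both for the slope in Eq. 2 and for the learning rate in Eq. 1, the code calls the slope `alpha` and the learning rate `learning_rate`. The housing data, learning rate, and number of epochs are hypothetical choices for illustration.

```python
import random


def train_linear_regression(A, R, learning_rate=0.05, epochs=5000, seed=0):
    """Fit R = alpha * A + beta (Eq. 2) by batch gradient descent on the cost of Eq. 4."""
    rng = random.Random(seed)
    alpha, beta = rng.uniform(-1, 1), rng.uniform(-1, 1)  # step 1: random initialization
    m = len(A)
    costs = []
    for _ in range(epochs):  # step 5: repeat for a fixed number of epochs
        errors = [alpha * a + beta - r for a, r in zip(A, R)]
        costs.append(sum(e ** 2 for e in errors) / (2 * m))  # step 2: cost (Eq. 4)
        # Step 3: partial derivatives of the cost with respect to alpha and beta.
        d_alpha = sum(e * a for e, a in zip(errors, A)) / m
        d_beta = sum(errors) / m
        # Step 4: gradient descent update (Eq. 1) for both parameters.
        alpha -= learning_rate * d_alpha
        beta -= learning_rate * d_beta
    return alpha, beta, costs


rooms = [1, 2, 3, 4, 5]
prices = [150, 200, 250, 300, 350]  # hypothetical prices in thousands: R = 50A + 100
alpha, beta, costs = train_linear_regression(rooms, prices)
print(round(alpha, 2), round(beta, 2))  # approximately 50.0 and 100.0
print(costs[0] > costs[-1])              # True: the cost decreases during training
```

The recorded costs can also serve as a stopping criterion: training could end as soon as the cost falls below a chosen threshold instead of running for a fixed number of epochs.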
Training can be stopped once the model’s predictions are accurate enough, for example, when the cost falls below a chosen threshold. Figure 3 shows how the line fits the data better as training continues. Since the steps become smaller as the minimum is approached, the exact minimum may never be reached; however, for practical purposes it is not necessary to reach the exact minimum to obtain good results. Linear regression can be extended to account for multiple independent variables using the following formula [9].
[Figure: two plots of R against A]
Fig. 3 (a) Datapoints and (b) the model getting progressively better as more epochs pass
$$R = \alpha_1 A_1 + \alpha_2 A_2 + \dots + \alpha_n A_n + \beta \qquad (5)$$
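A prediction with Eq. 5 is a weighted sum of the input variables plus the intercept. The sketch below assumes a hypothetical house-price model with two inputs, the number of rooms (A1) and the floor area in square meters (A2); the coefficient values are made up for illustration.

```python
def predict(alphas, beta, features):
    """Eq. 5: R = alpha_1*A_1 + ... + alpha_n*A_n + beta."""
    if len(alphas) != len(features):
        raise ValueError("need exactly one coefficient per input variable")
    return sum(a * x for a, x in zip(alphas, features)) + beta


# Hypothetical coefficients: 20 per room, 1.5 per square meter, intercept 50.
alphas = [20.0, 1.5]
beta = 50.0
print(predict(alphas, beta, [3, 100]))  # 20*3 + 1.5*100 + 50 = 260.0
```

Training works as in the single-variable case: each coefficient α_j is updated with Eq. 1 using the partial derivative of the cost with respect to α_j, which is the average of the prediction errors multiplied by the corresponding input A_j.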
While linear regression works well in some cases, there is an underlying issue with linear regression. Linear regression works successfully only as long as outputs relate to inputs linearly. What happens when the relationship is polynomial rather than linear?
2.2 Polynomial Regression
One possible approach is to use a polynomial function (Eq. 6) for the model rather than a linear one. This means that R no longer depends on A alone, but also on A², A³, and so on. Equation 6 represents a polynomial regression model of degree 2.
$$R = \alpha A^2 + \beta A + \gamma \qquad (6)$$
Equation 2 is thus simply a polynomial of degree 1, so linear regression can be thought of as a special case of polynomial regression. Figure 4a shows a polynomial function fitted to the data by polynomial regression. The degree of the polynomial is a hyperparameter, and its value is chosen by trial and error. Choosing the right degree is very important. If the degree is too small, the model will not be flexible enough to capture the pattern in the data (underfitting). If it is too large, the model will tend to memorize the training data and fail to make correct predictions for new datapoints (overfitting). These problems are common in machine learning and are not limited to polynomial regression; they are addressed in detail in an upcoming section.
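Because Eq. 6 is linear in its parameters α, β, and γ, polynomial regression can be treated as linear regression in which A², A, and a constant act as separate input variables. The sketch below assumes NumPy is available and, for brevity, solves the fit with NumPy’s least-squares routine (`numpy.linalg.lstsq`) instead of gradient descent. The data are hypothetical and noise-free, generated with α = 2, β = −3, γ = 5; comparing degrees 1 and 2 shows underfitting on the training data.

```python
import numpy as np


def fit_polynomial(A, R, degree):
    """Least-squares polynomial fit; returns coefficients with the highest power first."""
    # Columns A^degree, ..., A^1, A^0: each power of A acts as a separate input variable.
    X = np.vander(A, degree + 1)
    coeffs, *_ = np.linalg.lstsq(X, R, rcond=None)
    return coeffs


def training_cost(A, R, coeffs):
    """Cost of Eq. 4 on the training data for the fitted polynomial."""
    return float(np.mean((np.polyval(coeffs, A) - R) ** 2) / 2)


A = np.arange(10, dtype=float)
R = 2 * A**2 - 3 * A + 5  # hypothetical data following Eq. 6

for degree in (1, 2):
    coeffs = fit_polynomial(A, R, degree)
    print(degree, np.round(coeffs, 3), round(training_cost(A, R, coeffs), 3))
```

The degree-1 model (a straight line) leaves a large cost because it cannot bend to follow the curve, while the degree-2 model recovers coefficients close to 2, −3, and 5. Detecting overfitting would additionally require noisy data and a separate set of datapoints not used for training.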
[Figure: two plots of R against A]
Fig. 4 Result of datapoints being fit by (a) polynomial regression and (b) non-linear regression
J. M. Cherian and R. Kumar
2.3 Non-linear Regression
The approach adopted in polynomial regression introduces non-linearity, but polynomials by themselves do not cover the full range of non-linear functions. Equation 7 shows a non-linear regression model:
$$R = f(A) + C \qquad (7)$$
Here f can be any non-linear function, such as a trigonometric or exponential function, and C is a constant. Figure 4b shows an example of data fit by non-linear regression.
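As a sketch of non-linear regression, assume f is an exponential, so that Eq. 7 becomes R = a·e^(bA) + C with learnable parameters a, b, and C. This choice of f and the parameter names are assumptions. For a fixed b the model is linear in a and C, so one simple approach is to try a grid of b values and solve for a and C by least squares at each one. General-purpose iterative fitters such as SciPy's `scipy.optimize.curve_fit` handle arbitrary f; the grid approach is used here only to keep the example NumPy-only.

```python
import numpy as np

def fit_exponential(A, R, b_grid):
    """Fit R = a*exp(b*A) + C.

    For each candidate b the model is linear in a and C, so those two are solved
    exactly by least squares; the b with the smallest squared error is kept.
    """
    best = None
    for b in b_grid:
        X = np.column_stack([np.exp(b * A), np.ones_like(A)])
        (a, C), *_ = np.linalg.lstsq(X, R, rcond=None)
        sse = np.sum((X @ np.array([a, C]) - R) ** 2)
        if best is None or sse < best[0]:
            best = (sse, a, b, C)
    _, a, b, C = best
    return a, b, C

# Hypothetical noise-free data from R = 2*exp(0.8*A) + 1
A = np.linspace(0, 2, 20)
R = 2.0 * np.exp(0.8 * A) + 1.0
a, b, C = fit_exponential(A, R, b_grid=np.linspace(0.1, 2.0, 191))
print(a, b, C)
```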
2.4 Bayesian Regression
The types of regression discussed so far follow an approach in statistics where parameters are assumed to be fixed constants. In parallel to this approach, there is Bayesian statistics [13], where each parameter is treated as a distribution of values about which some prior information exists. When a parameter is treated as a distribution, its best value is the one with the highest probability given the training data, and this probability depends on the prior information available for the parameter. Incorporating this prior information can help the model fit the data better. A more detailed discussion of Bayesian regression is beyond the scope of this chapter.
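For readers who want a concrete picture, the following sketch covers the simplest Bayesian linear regression setting. It assumes a zero-mean Gaussian prior on the weights and Gaussian noise with known variance; both are assumptions, and other priors are possible. In this case the posterior over the weights is also Gaussian, and its mean is the most probable parameter value given the data. The function name and precision values are hypothetical. The formulas follow the standard conjugate-Gaussian treatment found in texts such as Bishop [9].

```python
import numpy as np

def bayesian_linear_regression(X, R, prior_precision=1.0, noise_precision=25.0):
    """Posterior over the weights w of R = X @ w + noise.

    Assumes a zero-mean Gaussian prior w ~ N(0, I / prior_precision) and Gaussian
    noise whose precision (1 / variance) is known. Returns the posterior mean, which
    here is also the most probable w, and the posterior covariance.
    """
    n_features = X.shape[1]
    precision = prior_precision * np.eye(n_features) + noise_precision * X.T @ X
    cov = np.linalg.inv(precision)
    mean = noise_precision * cov @ X.T @ R
    return mean, cov

# Hypothetical data from R = 1.5*A - 0.5 with noise of standard deviation 0.2
rng = np.random.default_rng(1)
A = rng.uniform(-1, 1, size=50)
X = np.column_stack([A, np.ones_like(A)])      # the column of ones models the intercept
R = 1.5 * A - 0.5 + rng.normal(scale=0.2, size=A.size)
mean, cov = bayesian_linear_regression(X, R)   # noise_precision 25 = 1 / 0.2**2
print("posterior mean [slope, intercept]:", mean)
print("posterior std:", np.sqrt(np.diag(cov)))
```

Besides a best estimate, the posterior covariance indicates how uncertain each parameter still is. With more data, the standard deviations shrink.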
2.5 Logistic Regression
Logistic regression, despite its name, is a supervised learning algorithm that performs classification. It is used chiefly with tabular data when the labels are categorical rather than continuous. For example, consider a task where a tumor is to be classified as malignant or benign based on its size. The data for this task would include a column for the size of the tumor (A) and a column of labels (R), with "1" for malignant and "0" for benign. Approach this task as before, using Eq. 2 to model the relationship between R and A; just as before, ⍺ and β are the parameters of the model. Here, however, R is not continuous but categorical: it can only be 0 or 1. Hence, the output of the right-hand side of Eq. 2 must be translated into a value between 0 and 1. This is done with the sigmoid function, whose equation is shown in Eq. 8 and whose graph is shown in Fig. 5.
Fig. 5 Sigmoid function
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}} \qquad (8)$$
For large positive inputs, the sigmoid outputs values close to 1, approaching 1 as the input tends to infinity. Likewise, for large negative inputs, it outputs values close to 0, approaching 0 as the input tends to negative infinity [8]. Applying the sigmoid function to the right-hand side of Eq. 2 therefore generates a value between 0 and 1:
$$R = \mathrm{Sigmoid}(\alpha A + \beta) \qquad (9)$$
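The following minimal sketch translates Eqs. 8 and 9 directly into NumPy. The parameter values ⍺ = 2 and β = -3 are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid(x) = 1 / (1 + e^(-x)) (Eq. 8), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))                 # rises from near 0, through 0.5 at x = 0, to near 1

# Eq. 9 with hypothetical parameters alpha = 2 and beta = -3
alpha, beta = 2.0, -3.0
A = np.array([0.5, 1.5, 3.0])
print(sigmoid(alpha * A + beta))  # every output lies between 0 and 1
```

For very large negative inputs (below about -709 in float64), `np.exp` in this straightforward form overflows. NumPy then emits a warning but still returns 0, the correct limit.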
During training, the model parameters are updated so that ⍺A + β becomes a large positive number for malignant tumors and a large negative number for benign tumors. Because the right-hand side of Eq. 9 outputs a value between 0 and 1, a threshold is needed to decide which outputs map to 0 and which map to 1. Equation 9 can be updated as follows:
$$Z = \mathrm{Sigmoid}(\alpha A + \beta) \qquad (10)$$
$$R = \begin{cases} 1, & \text{if } Z > \text{threshold} \\ 0, & \text{otherwise} \end{cases}$$
The threshold can be any value between 0 and 1, and its choice is up to the designer. An obvious choice is 0.5, but different problems may warrant different values. Due to the categorical nature of the output, the loss function used for linear regression is not suitable here. If the least squares error is used as the loss function, the model is not penalized enough for its incorrect predictions and has too little loss to learn from. To account for the categorical output, a different loss function known as binary cross-entropy (BCE) is used [7]:
$$\mathrm{BCE} = \begin{cases} -\log(p), & \text{if } y = 1 \\ -\log(1 - p), & \text{if } y = 0 \end{cases} \qquad (11)$$
In Eq. 11, p is the prediction (the output Z of Eq. 10), and y is the expected output. When the expected output is 1 and the prediction is close to 1, the BCE value is nearly 0. If the prediction is close to 0 instead, the BCE value becomes a large positive number. A similar analysis holds when the expected output is 0. With this general idea of logistic regression established, the steps taken by the algorithm are as follows:
1. Initialize ⍺ and β with random values.
2. Find the binary cross-entropy loss.
3. Apply gradient descent.
4. Update the values of ⍺ and β.
5. Repeat from step 2.
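The five steps can be sketched in NumPy as follows. The sketch assumes hypothetical tumor sizes in centimeters, a learning rate of 0.1, and a threshold of 0.5. For the sigmoid combined with BCE, the derivative of the loss with respect to ⍺A + β simplifies to p - y, which keeps the gradient expressions short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy (Eq. 11); clipping avoids log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_logistic(A, y, lr=0.1, epochs=20000, seed=0):
    rng = np.random.default_rng(seed)
    alpha, beta = rng.normal(size=2)        # step 1: random initial values
    for _ in range(epochs):                 # step 5: repeat
        p = sigmoid(alpha * A + beta)       # Z of Eq. 10 for every example
        # Steps 2-4: gradient of the mean BCE (for sigmoid + BCE, dLoss/dz = p - y),
        # followed by a gradient descent update of alpha and beta
        alpha -= lr * np.mean((p - y) * A)
        beta -= lr * np.mean(p - y)
    return alpha, beta

# Hypothetical tumor sizes (cm) and labels: 1 = malignant, 0 = benign
A = np.array([1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
alpha, beta = train_logistic(A, y)
Z = sigmoid(alpha * A + beta)
R = (Z > 0.5).astype(int)                   # threshold of 0.5
print(R, bce(Z, y))
```

With these settings, the thresholded predictions should match the labels for this cleanly separable toy data.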
Logistic regression can be extended from binary classification to classification problems with more than two classes. The version of binary cross-entropy modified to support N classes is called categorical cross-entropy. Along with logistic regression, support vector machines (SVM) and decision trees [1, 7, 8, 14, 15] are common supervised classification algorithms. There are also deep learning approaches to classification problems, which the coming sections explore.
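The following sketch shows categorical cross-entropy. It assumes the model produces one score per class and that a softmax function converts those scores into probabilities; softmax is an assumption here, since the chapter does not name it. With N = 2 classes and one-hot labels, the loss reduces to the binary cross-entropy of Eq. 11.

```python
import numpy as np

def softmax(scores):
    """Convert N class scores into N probabilities that sum to 1."""
    shifted = scores - np.max(scores)     # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_cross_entropy(p, y_onehot, eps=1e-12):
    """-sum_k y_k * log(p_k); with one-hot labels only the true class contributes."""
    return -np.sum(y_onehot * np.log(np.clip(p, eps, 1.0)))

p = softmax(np.array([2.0, 0.5, -1.0]))   # hypothetical scores for 3 classes
y = np.array([1, 0, 0])                   # the true class is the first one
print(p, categorical_cross_entropy(p, y))

# With two classes, [p, 1 - p] and label y = 1, this equals the BCE term -log(p)
print(categorical_cross_entropy(np.array([0.8, 0.2]), np.array([1, 0])), -np.log(0.8))
```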
3 Unsupervised Learning
Unsupervised learning is when a machine uses unlabeled data to learn. Data is "unlabeled" when training examples contain only inputs; Table 2 is an example of an unlabeled dataset. Since unlabeled datasets do not have expected outputs, they cannot be used to train a model to predict an exact output. Instead, this learning approach attempts to identify interesting patterns in the input data. Algorithms based on this approach are called unsupervised learning algorithms [2, 4]. Obtaining labeled data is a major challenge in machine learning, which is one reason unsupervised learning is used to learn from unlabeled data. While many problems call for unsupervised learning, only one of them, the clustering problem, is discussed in this chapter. Clustering algorithms try to identify similarities within the input data and group similar inputs together [14]. K-means clustering is a very common unsupervised clustering algorithm.
Table 2 An unlabeled dataset (subset of iris flower dataset)
Sepal length (cm)   Sepal width (cm)   Petal length (cm)   Petal width (cm)
4.9                 3.0                1.4                 0.2
5.7                 2.5                5.0                 2.0
6.6                 2.9                4.6                 1.3
5.1                 3.5                1.4                 0.2
3.1 K-Means Clustering
K-means clustering is based on the assumption that data points lying closer together when plotted are more similar than those that are further apart [14]. To explain K-means clustering, consider an example in which clusters of different kinds of leaves are to be identified based on their height and width. The data contains the height and width of various leaves; Fig. 6 plots the data for this example. The steps involved in K-means clustering are as follows [8]:
1. Choose the number of clusters (let this be k).
2. Choose k random points from the set of data points. These points are the initial centroids of the k clusters.
3. Find the distance between each point and every centroid, and assign the point to the cluster of the nearest centroid.
4. Find the mean of all the points in each cluster, and make these means the new centroids.
5. Reassign every point to the cluster of its nearest new centroid, as in step 3.
6. If the clusters changed, repeat from step 4; otherwise, continue to step 7.
7. Find the sum of the variances of all the clusters and store it.
8. Repeat from step 2 several times, each time with new random centroids.
9. Choose the clustering with the lowest total variance as the output.
Figure 7 is a pictorial representation of steps 1 through 6. In step 6, the process of finding the clusters is repeated so that, with every iteration, each centroid moves closer to the true centroid of its cluster. Steps 1 through 6 yield the correct result only if the random points chosen in step 2 converge to locations close to the actual centroids. Since this is not always the case, the process may have to be repeated multiple times, and the run with the least total variance (steps 7 through 9) is chosen. Variance is used here to select the best clustering because it measures how spread out the data is. A high total variance means that the data in the clusters is spread out significantly, which implies that the clustering is probably incorrect. Hence, the clustering with the lowest total variance is chosen.
Fig. 6 Datapoints of two different classes of leaves (Type 1 and Type 2) plotted based on their height and width
Fig. 7 Shifting of centroids (represented by the dots) over multiple iterations of the K-means clustering algorithm, with height plotted against width. (a) Centroids start at initial random positions. (b) Centroids shift closer to their respective clusters after one iteration of K-means. (c) Centroids continue to shift as more iterations pass. (d) Centroids reach their final positions and remain unchanged on further application of K-means.
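The nine steps translate into a compact NumPy sketch. The data are hypothetical leaf measurements drawn around two centers, and the number of restarts and iterations are illustrative choices. Following step 7, runs are compared by the sum of per-cluster variances. Library implementations such as scikit-learn's `KMeans` instead compare runs by the within-cluster sum of squared distances (its `inertia_` attribute), a closely related measure.

```python
import numpy as np

def kmeans(points, k, n_restarts=10, max_iter=100, seed=0):
    """K-means with random restarts (steps 1-9 above).

    Returns the labels and centroids of the run with the lowest total variance.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):                          # step 8: repeat from step 2
        # Step 2: pick k distinct data points as the initial centroids
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            # Steps 3 and 5: assign each point to its nearest centroid
            dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break                                    # step 6: clusters stopped changing
            labels = new_labels
            # Step 4: move each centroid to the mean of its points (keep it if the cluster is empty)
            centroids = np.array([
                points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
        # Step 7: total variance = sum of the variances of the clusters
        total_var = sum(points[labels == j].var(axis=0).sum()
                        for j in range(k) if np.any(labels == j))
        if best is None or total_var < best[0]:
            best = (total_var, labels, centroids)
    return best[1], best[2]                              # step 9

# Hypothetical leaf measurements (width, height) around two centers
rng = np.random.default_rng(1)
type1 = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
type2 = rng.normal(loc=[5.0, 6.0], scale=0.3, size=(20, 2))
points = np.vstack([type1, type2])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```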
4 Brief Introduction to Semi-Supervised and Reinforcement Learning
4.1 Semi-Supervised Learning Approach
Semi-supervised learning is when a machine uses a mix of labeled and unlabeled data to learn. This kind of learning is considered when a large dataset is available but only a small portion of it is labeled. In one approach, unsupervised learning algorithms are first used to assign labels to the unlabeled examples, after which supervised learning algorithms are used to learn from the data.
Fig. 8 (a) Collection of data points (height plotted against width) out of which the colored ones are labeled as Type 1, Type 2, or Type 3. (b) Output after performing clustering. (c) Result after assigning clusters with a label based on the labeled data points
Semi-supervised learning sits between supervised and unsupervised learning. Algorithms in this category receive more supervision than unsupervised learning algorithms, and this additional supervision allows them to be used for supervised learning tasks when only limited labeled data is available [16]. A classical semi-supervised learning approach is to apply clustering to all the data points to identify clusters, and then to use the labeled data points to assign labels to these clusters [16] (Fig. 8). Modern semi-supervised learning algorithms have advanced beyond this classical approach, and today there are specialized algorithms tailored to different domains.
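The classical cluster-then-label idea can be sketched as follows. The sketch assumes the cluster assignments already come from an algorithm such as K-means and that unlabeled points are marked with -1 (a convention chosen for this example). Each cluster takes the majority label of its labeled members; clusters without any labeled point stay unlabeled.

```python
import numpy as np

def label_clusters(cluster_ids, labels):
    """Give every point the majority label of the labeled points in its cluster.

    cluster_ids: cluster index of every point (from any clustering algorithm).
    labels: known label for labeled points, -1 for unlabeled ones.
    Clusters that contain no labeled point keep the label -1.
    """
    result = np.full_like(labels, -1)
    for c in np.unique(cluster_ids):
        in_cluster = cluster_ids == c
        known = labels[in_cluster & (labels != -1)]
        if known.size > 0:
            values, counts = np.unique(known, return_counts=True)
            result[in_cluster] = values[counts.argmax()]   # majority vote
    return result

cluster_ids = np.array([0, 0, 0, 1, 1, 1, 2, 2])           # hypothetical clustering output
labels      = np.array([1, -1, -1, -1, 2, -1, -1, -1])     # only two points are labeled
print(label_clusters(cluster_ids, labels))
```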
However, it must be noted that supervised learning algorithms typically outperform semi-supervised learning algorithms when labeled data are abundant. Semi-supervised learning algorithms are most useful when there are many unlabeled data points but only a few labeled ones. Some use cases where this happens are:
1. CT segmentation: Large datasets of unlabeled CT scans are available, but labeled ones are difficult to obtain because a trained professional would have to label each slice of a CT scan, which is time-consuming.
2. Language tasks: There is a large reserve of unlabeled text from online books, blog posts, etc. However, obtaining labeled versions of this text for tasks such as emotion classification is difficult.
4.2 Reinforcement Learning Approach
Reinforcement learning (RL) is a reward-based learning system in which the machine is rewarded for desirable behavior and penalized for undesirable behavior. RL algorithms make the machine try to maximize its overall reward and, in the process, learn to perform the desired task better. For example, an AI learning to drive is given a positive reward every time it moves in the right direction and a negative reward when it moves in the wrong direction. RL is often described as the form of learning closest to how humans learn: a machine learns from data generated through its interactions with an environment. For example, when a person learns to walk for the first time, they interact with the real-world environment. Every time they fail, they get hurt, which signals what they should not do. They then try to correct it or try something different, and eventually learn to walk. For interaction-based tasks, supervised learning algorithms are limited by the number of scenarios that can be captured in collected data, and collecting such a large amount of data is quite challenging. RL, on the other hand, lets the machine learn through interaction, which has shown promising results in many domains [17].
Consider a simplified example: a small maze game in which the objective is for the player to find a way out of the maze. At each step, the player can move up, down, left, or right. Assume a constraint that the player cannot backtrack along a path they have already walked, so the game is over when the player hits a dead end. A reinforcement learning algorithm could start by assigning a value to every possible game state (the player's current position in the maze). This value represents the probability of reaching a winning state from that game state.
The exit would have a probability of 1, the dead ends a probability of 0, and the other states could start at 0.5. If the machine reaches the exit, it receives a reward. Since the machine's main objective is to maximize the reward, it chooses to move to the state with the best probability of success when the game is played. As the machine keeps playing, it learns from its mistakes and updates the probability values of the non-terminal states. Over time, these values better represent the probability of winning the game from each state; the state values are what is being learned. Occasionally, the machine may choose a non-optimal game state to explore the opportunities that a lower-valued state could offer. Figure 9 is a visualization of this example.
Fig. 9 A machine learning agent learning to navigate a maze to the exit point
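The maze example can be sketched in plain Python. The maze is a hypothetical directed graph, so the no-backtracking rule is built into the allowed moves. State values start at 1 for the exit, 0 for dead ends, and 0.5 elsewhere. After each move, the current state's value is nudged toward the value of the state it led to (a simple temporal-difference style update). With probability `explore`, the agent picks a random move instead of the best-valued one. The learning rate, exploration rate, and episode count are illustrative assumptions.

```python
import random

# Hypothetical maze as a directed graph: each state lists the states reachable from it.
# Moves only lead forward, so the no-backtracking rule is built in.
MAZE = {
    "start": ["A", "B"],
    "A": ["exit"],
    "B": ["dead_end_1", "C"],
    "C": ["dead_end_2"],
    "exit": [],
    "dead_end_1": [],
    "dead_end_2": [],
}

def train(episodes=2000, learning_rate=0.1, explore=0.3, seed=0):
    rng = random.Random(seed)
    # Initial estimates of the probability of reaching the exit from each state
    value = {state: 0.5 for state in MAZE}
    value["exit"] = 1.0
    value["dead_end_1"] = value["dead_end_2"] = 0.0
    for _ in range(episodes):
        state = "start"
        while MAZE[state]:                          # play until a terminal state
            moves = MAZE[state]
            if rng.random() < explore:
                nxt = rng.choice(moves)             # occasionally explore a random move
            else:
                nxt = max(moves, key=value.get)     # otherwise pick the best-valued state
            # Move this state's value a little toward the value of the next state
            value[state] += learning_rate * (value[nxt] - value[state])
            state = nxt
    return value

values = train()
print({state: round(v, 2) for state, v in values.items()})
```

After training, the value of A (which leads to the exit) should approach 1, while the value of B (which only leads to dead ends) should fall toward 0. A greedy player starting at `start` therefore moves toward A.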
This example is a simplification of what can be achieved through reinforcement learning. Modern applications of reinforcement learning include robotics, self-driving cars, and even the medical domain [18].
5 Deep Learning
Deep learning is a subset of machine learning [19]. Its development was motivated by the inability of conventional machine learning models to process higher-dimensional data, such as images, audio, and video. Conventional models were computationally intensive on such data and could not model the complex relationships between input-output pairs. Deep learning uses "artificial neural networks" to model these complex relationships [20]. These networks can also be adapted to process multiple data types, including images, audio, and video files. Deep learning can be used for both supervised and unsupervised learning; however, this chapter covers only supervised deep learning algorithms.
5.1 Artificial Neural Networks
An artificial neural network is a large network of artificial "neurons" organized into layers. The mechanism by which this network learns is shown in Fig. 10. It is very similar to the mechanism of supervised learning algorithms shown in Fig. 1.
Fig. 10 Training mechanism of deep learning models. The training data (X, y) is passed through the neural network by forward propagation to produce ŷ. The loss function ƒ(ŷ, y) compares ŷ with y, and backpropagation sends the loss back through the network. X: input data; y: expected output; ŷ: predicted output; ƒ: loss function.
From this point forward, artificial neurons will be referred to as neurons and artificial neural networks as neural networks; however, be careful not to mistake these words for their biological counterparts. In the following sections, the structure of neural networks will be discussed. The two algorithms that help these networks learn – forward and backward propagation – will also be examined.
5.1.1 Artificial Neuron
Neurons are the most fundamental elements of neural networks. A neuron is a function: it takes in a set of inputs and produces an output, which is referred to as the activation of that neuron [20]. Consider a single neuron with inputs $x_1, x_2, x_3, \dots, x_n$. Each input has a weight associated with it, say $w_1, w_2, \dots, w_n$. The neuron first finds the weighted sum of its inputs:
$$Z = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n \qquad (12)$$
Each neuron also has another parameter, called a bias (b), which is added to the right-hand side of Eq. 12:
$$Z = w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n + b \qquad (13)$$
Note that Z in Eq. 13 is a linear combination of the inputs, so Eq. 13 alone can model only linear relationships. Most relationships are complex, however, and restricting the model to linear relationships limits its ability [21]. To tackle this issue, an activation function is applied. Activation functions introduce non-linearity into the model; common examples are tanh, the rectified linear unit (ReLU), and the sigmoid [5, 19, 21, 22]. Applying the activation function gives the final output of the neuron (Y):
$$Y = \mathrm{ActivationFunction}(Z) \qquad (14)$$
The sigmoid function was discussed in the previous section. Before moving on, one more activation function should be examined: the rectified linear unit, or ReLU [21, 22] (Fig. 11).
$$\mathrm{ReLU}(x) = \max(0, x) \qquad (15)$$
ReLU is a simple activation function: it outputs the input itself for all values greater than 0, and outputs 0 for all values less than 0. While the graph is linear for values greater than 0 (and constant at 0 for negative values), the bend at 0 makes the function non-linear, which allows it to model some complex relations. ReLU is commonly used as an activation function in many neural networks today [22].
Fig. 11 Rectified Linear Unit Function
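Equations 13 to 15 for a single neuron can be sketched in NumPy as follows; the inputs, weights, and bias are hypothetical values. With these numbers Z = -0.25, so ReLU outputs 0 while the sigmoid outputs about 0.44.

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z) (Eq. 15), element-wise."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid activation (Eq. 8)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=relu):
    """Weighted sum plus bias (Eq. 13), followed by an activation function (Eq. 14)."""
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([1.0, 2.0, 3.0])      # hypothetical inputs x1, x2, x3
w = np.array([0.5, -1.0, 0.25])    # hypothetical weights w1, w2, w3
b = 0.5                            # hypothetical bias
print(neuron(x, w, b))                      # ReLU output
print(neuron(x, w, b, activation=sigmoid))  # sigmoid output
```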
5.1.2 Structure of a Neural Network
Artificial neural networks are composed of multiple layers of neurons. Within a layer, there are no connections between neurons, but between two adjacent layers, each neuron connects to every neuron in the other layer. A neural network has three kinds of layers [19, 20]:
• Input layer: The neurons in the input layer do not perform any of the computations described earlier; they simply relay the inputs they receive to the first hidden layer.
• Hidden layers: A network can have multiple hidden layers, each composed of many neurons. The number of neurons in a hidden layer is a hyperparameter that can be tuned.
• Output layer: This is the last layer of the neural network. During training, its outputs are passed to the loss function to calculate the loss.
There are many types of neural networks, but this chapter is restricted to feedforward neural networks. A feedforward neural network uses inputs to produce outputs without a feedback mechanism [20] (neural networks with a feedback mechanism are called recurrent neural networks). Figure 12 shows a simple feedforward neural network, which will be used to understand how neural networks operate. Here $x_1$, $x_2$, and $x_3$ are the inputs; $a_1^{[1]}$, $a_2^{[1]}$, and $a_3^{[1]}$ are neurons in the input layer; $a_1^{[2]}$ and $a_2^{[2]}$ are neurons in the hidden layer; and $a_1^{[3]}$ and $a_2^{[3]}$ are neurons in the output layer.
5.1.3 Forward Propagation
The input layer passes the inputs to the first hidden layer, each neuron of which is connected to every neuron in the input layer. The inputs and the weights of the connections are used to calculate the output of each neuron in the first hidden layer (Eqs. 12–14). These outputs are then passed to the next hidden layer, and the process continues until the output layer is reached. This process of progressively calculating the output of each layer, from the input layer to the output layer, is called forward propagation [19].
Fig. 12 Structure of a small artificial neural network
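Forward propagation through a network shaped like Fig. 12 can be sketched with NumPy. The network has three inputs, a hidden layer of two neurons, and an output layer of two neurons. The random weights, zero biases, and use of ReLU in every layer are assumptions for illustration. Output layers often use a different activation, such as the sigmoid for binary classification.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Forward propagation. Each layer is a (W, b) pair; W has one row per neuron."""
    a = x                              # the input layer only relays the inputs
    for W, b in layers:
        a = relu(W @ a + b)            # Eqs. 13-14 for all neurons of the layer at once
    return a

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(2, 3)), np.zeros(2)),   # 3 inputs -> 2 hidden neurons
    (rng.normal(size=(2, 2)), np.zeros(2)),   # 2 hidden neurons -> 2 output neurons
]
x = np.array([0.2, -0.4, 1.0])                # hypothetical input values
y_hat = forward(x, layers)
print(y_hat.shape, y_hat)
```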
5.1.4 Backpropagation
Using the expected and predicted outputs, the loss of each output neuron can be calculated; it measures how far the value of each output neuron is from the desired value. The value of an output neuron depends on the following [23]:
1. The biases of the neurons in the current layer.
2. The weights of the connections between the previous layer and the current layer.
3. The outputs of the previous layer.
An optimization algorithm such as gradient descent is used to update the weights of the connections and the biases of the neurons. The outputs of the previous layer cannot be changed directly. To change them, that layer's biases and the weights of its connections to the layer before it must be updated. So the network is traversed recursively, updating the weights and biases at each layer until the input layer, which has no parameters to update, is reached. This recursive process of moving to previous layers and updating their weights and biases is called backpropagation, and it is what makes learning possible in neural networks [19]. The mathematical details of backpropagation are omitted in this chapter [23, 24]; the general idea is sufficient for what follows.
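Although the chapter omits the mathematics, a small NumPy sketch can show the idea. It uses a hypothetical network with three inputs, four sigmoid hidden neurons, one sigmoid output, and a squared-error loss (all assumptions). The error signal is computed at the output layer and passed back through the weights to the hidden layer, giving a gradient for every weight and bias. A finite-difference check confirms one of the gradients, and a single gradient descent step then updates all parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(x, y, W1, b1, W2, b2):
    """Squared-error loss of a two-layer sigmoid network and its gradients."""
    # Forward propagation
    a1 = sigmoid(W1 @ x + b1)                    # hidden layer outputs
    y_hat = sigmoid(W2 @ a1 + b2)                # output layer outputs
    loss = 0.5 * np.sum((y_hat - y) ** 2)
    # Backpropagation, starting at the output layer...
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # dLoss/dZ for the output neurons
    dW2, db2 = np.outer(delta2, a1), delta2
    # ...then passing the error back through W2 to the hidden layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # dLoss/dZ for the hidden neurons
    dW1, db1 = np.outer(delta1, x), delta1
    return loss, (dW1, db1, dW2, db2)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
loss, (dW1, db1, dW2, db2) = loss_and_grads(x, y, W1, b1, W2, b2)

# Check one backpropagated gradient against a central finite difference
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_and_grads(x, y, W1_plus, b1, W2, b2)[0]
           - loss_and_grads(x, y, W1_minus, b1, W2, b2)[0]) / (2 * eps)
print(dW1[0, 0], numeric)                        # the two values should agree closely

# One gradient descent step on every weight and bias
lr = 0.1
W1, b1, W2, b2 = W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
print(loss, loss_and_grads(x, y, W1, b1, W2, b2)[0])   # loss before and after the step
```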
5.1.5 Overview of Feedforward Neural Networks
Training data is passed to the neural network and fed forward through each layer, from the input layer to the output layer (forward propagation). The predicted output is compared with the expected output using a loss function. The loss is then used to update the weights and biases of each layer, starting from the output layer and moving toward the input layer [25] (backpropagation). After enough epochs, the network can model a complex relationship between inputs and outputs and make correct predictions.
Feedforward neural networks are very powerful, but the structure discussed above can be ineffective for more complex data types, such as large images. Consider a single-channel 512 × 512 image: a feedforward neural network would need 262,144 (512 × 512) neurons in its first layer. This is very large, and training a model to learn from so many inputs would be computationally intensive. Another problem is that the image would have to be flattened (converted to a 1D vector) before being passed to a feedforward neural network, and in doing so the spatial information in the image would be lost. Hence, there is a need for a computationally efficient way to process images that preserves spatial information. Researchers tackled this issue by borrowing a concept from image processing. This gave rise to convolutional neural networks, which are discussed in the following section.
5.2 Convolutional Neural Networks
While feedforward neural networks can model complex relations, they are impractical for large images. To work with large images, a class of neural networks called "convolutional neural networks" is used. Before discussing these networks, two operations that play a crucial role in them must be examined: the convolution operation and the pooling operation.
5.2.1 Convolution Operation
The convolution operation (usually denoted by *) is often used in image processing, and it plays a vital role in convolutional neural networks. For a 3 × 3 image with entries $a_{ij}$ and a 2 × 2 filter with entries $k_{ij}$, the output entries are:
$$\begin{aligned}
A_{11} &= k_{11} a_{11} + k_{12} a_{12} + k_{21} a_{21} + k_{22} a_{22} \\
A_{12} &= k_{11} a_{12} + k_{12} a_{13} + k_{21} a_{22} + k_{22} a_{23} \\
A_{21} &= k_{11} a_{21} + k_{12} a_{22} + k_{21} a_{31} + k_{22} a_{32} \\
A_{22} &= k_{11} a_{22} + k_{12} a_{23} + k_{21} a_{32} + k_{22} a_{33}
\end{aligned} \qquad (16)$$
Figure 13 details the convolution operation. The matrix marked in yellow in Fig. 13 is called the filter or kernel; the other input matrix is the image under consideration. The filter is typically much smaller than the image. The convolution operation takes the filter and the image as input and slides the filter over the image (Fig. 14). At each position, it multiplies the overlapping values element by element and sums the products [26]. Equation 16 gives the values of the output matrix.
Fig. 13 Convolution operation (a 2 × 2 filter with entries $k_{11}$…$k_{22}$ applied to a 3 × 3 image with entries $a_{11}$…$a_{33}$)
Fig. 14 Filter movement during the convolution operation (the filter visits the four 2 × 2 positions of the image; each position produces one output entry of Eq. 16)
The output of the convolution operation (without padding) is a matrix smaller than the original image. In effect, the convolution operation reduces the size of the image by combining it with a filter, and the filter allows the model to retain selected properties of the original image in the smaller output matrix. Part of a convolutional neural network is made of "convolutional layers." A convolutional layer contains multiple filters, each of which performs a convolution operation on the image. The outputs are then stacked together, so the output image has as many channels as there are filters, each channel being the output of one filter. Because the convolution operation establishes a linear relationship between the output and the input image, the output of a convolutional layer is passed through an activation function to introduce non-linearity.
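The sliding-window computation of Eq. 16 and a small convolutional layer can be sketched in NumPy as follows. The filter values are hypothetical. As in common deep learning libraries, the kernel is not flipped, no padding is added, and the stride is 1.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and, at each position,
    sum the element-wise products, as in Eq. 16. The kernel is not flipped."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)     # hypothetical 2x2 filter
print(convolve2d(image, kernel))              # values 6, 8, 12, 14

# A convolutional layer: several filters, outputs stacked as channels, then ReLU
filters = [kernel, np.array([[0, -1], [1, 0]], dtype=float)]
layer_output = np.maximum(0.0, np.stack([convolve2d(image, f) for f in filters]))
print(layer_output.shape)                     # (2, 2, 2): one 2x2 channel per filter
```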
5.2.2 Pooling Operation
The pooling operation is also used to reduce the size of the image while keeping its dominant features and removing minor, insignificant variations. It takes an image and a window size as input, slides a kernel of that size over the image, and performs a specified operation on the pixels the kernel covers [26]. This operation depends on the type of pooling. Two common types are discussed here: max pooling and average pooling. In max pooling, the kernel returns the maximum value among the pixels it covers, whereas in average pooling, it returns their average. Figure 15 and the equations below illustrate the pooling operation for a 4 × 4 image and a 2 × 2 window.
For max pooling:
$$\begin{aligned}
A_{11} &= \max(a_{11}, a_{12}, a_{21}, a_{22}) \\
A_{12} &= \max(a_{13}, a_{14}, a_{23}, a_{24}) \\
A_{21} &= \max(a_{31}, a_{32}, a_{41}, a_{42}) \\
A_{22} &= \max(a_{33}, a_{34}, a_{43}, a_{44})
\end{aligned} \qquad (17)$$
For average pooling:
$$\begin{aligned}
A_{11} &= \tfrac{1}{4}(a_{11} + a_{12} + a_{21} + a_{22}) \\
A_{12} &= \tfrac{1}{4}(a_{13} + a_{14} + a_{23} + a_{24}) \\
A_{21} &= \tfrac{1}{4}(a_{31} + a_{32} + a_{41} + a_{42}) \\
A_{22} &= \tfrac{1}{4}(a_{33} + a_{34} + a_{43} + a_{44})
\end{aligned} \qquad (18)$$
Fig. 15 Pooling operation (2 × 2 pooling of a 4 × 4 input with entries $a_{11}$…$a_{44}$ into a 2 × 2 output with entries $A_{11}$…$A_{22}$)
When used in convolutional neural networks, there are layers called “pooling layers,” which perform the pooling operation.
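Non-overlapping 2 × 2 max and average pooling (Eqs. 17 and 18) can be sketched with a NumPy reshape that groups each 2 × 2 block. The function assumes the image sides are divisible by the window size.

```python
import numpy as np

def pool2d(image, size=2, mode="max"):
    """Non-overlapping size x size pooling (Eqs. 17-18).

    Assumes both image sides are divisible by size.
    """
    h, w = image.shape
    # blocks[i, :, j, :] is the size x size window that produces output entry (i, j)
    blocks = image.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

image = np.arange(16, dtype=float).reshape(4, 4)   # entries 0..15 stand in for a11..a44
print(pool2d(image, mode="max"))
print(pool2d(image, mode="average"))
```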
5.2.3 Structure of the Network
Convolutional neural networks usually have two portions [26]:
1. Convolutional and pooling layers.
2. Feedforward layers.
The image is first passed through multiple convolutional and pooling layers until it is reduced to a size that a feedforward neural network can handle. The resulting matrix is then flattened into a single dimension and passed to the feedforward neural network. During backpropagation in a convolutional neural network, the parameters of the convolutional layers are updated along with those of the feedforward layers. The values of the filters are parameters, and they are updated just as weights and biases are updated in feedforward layers. This allows the filters to learn which features of the image to emphasize. Shallow layers of the model identify simple patterns within the image, such as horizontal and vertical edges. In contrast, deeper layers identify more complex features, such as combinations of shapes and even faces (when trained on a dataset of faces) [27].
5.2.4 Overview of Convolutional Neural Networks
An image is passed as input, and convolutional and pooling layers reduce its dimensions until it reaches an appropriate size, after which it is flattened and passed to a feedforward layer. The output of the final layer is used to calculate the loss. During backpropagation, the weights and biases of the feedforward layers and the kernel values of the convolutional layers are updated. This repeats for every training example over numerous epochs. Figure 16 is a pictorial representation of a small convolutional neural network. Convolutional neural networks have shown impressive results on multiple datasets and are used extensively today [28].
Note: While artificial neural networks are exceptionally powerful at learning complex relations, they are computationally expensive to train, so graphics processing units (GPUs) are commonly used to speed up training.
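A forward pass through a network shaped like Fig. 16 can be sketched in NumPy to show how the dimensions shrink. For brevity, each convolutional layer here uses a single 3 × 3 filter with random values, and the network is untrained. Real convolutional layers use many filters over multi-channel inputs, and the 30 × 30 input size is a hypothetical choice. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_relu(image, kernel):
    """Valid convolution (stride 1, kernel not flipped) followed by ReLU."""
    windows = sliding_window_view(image, kernel.shape)   # shape (out_h, out_w, kh, kw)
    return np.maximum(0.0, np.einsum("ijkl,kl->ij", windows, kernel))

def max_pool(image, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (image.shape[0] // size) * size
    w = (image.shape[1] // size) * size
    return image[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((30, 30))                 # hypothetical 30x30 grayscale image
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

x = max_pool(conv_relu(image, k1))           # 30x30 -> 28x28 -> 14x14
x = max_pool(conv_relu(x, k2))               # 14x14 -> 12x12 -> 6x6
flat = x.reshape(-1)                         # flattening: 36 values
W, b = rng.normal(size=(2, flat.size)), np.zeros(2)
scores = W @ flat + b                        # one feedforward layer with 2 outputs
print(x.shape, flat.shape, scores.shape)
```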
Fig. 16 Structure of a convolutional neural network (input image → convolutional layer + ReLU → pooling layer → convolutional layer + ReLU → pooling layer → flattening output → feedforward layers)
6 Training and Evaluating Models
6.1 Evaluation Metrics
Apart from creating and training models, it is important to evaluate their performance. Evaluation metrics are usually selected based on the type of task the model is expected to perform. For regression tasks, the mean squared error, which also serves as the cost function, is a common evaluation metric because it measures how close the predictions are to the correct results. For classification problems, the value of the categorical cross-entropy is harder to interpret, so other evaluation metrics are needed.
Confusion Matrix
The confusion matrix is a table used to summarize a model's predictions during evaluation [19]. It is a square matrix whose size equals the number of classes. Consider a simple confusion matrix for a binary classification problem, in which the model must predict one of two classes, 1 (positive) or 0 (negative). Figure 17 shows the general structure of a confusion matrix for a binary classification problem.
Fig. 17 Confusion matrix (rows: predicted class; columns: true class)
                      True Positive   True Negative
Predicted Positive    TP              FP
Predicted Negative    FN              TN
TP (true positives): the number of positive predictions (predicted class 1) that were correct.
TN (true negatives): the number of negative predictions (predicted class 0) that were correct.
FP (false positives): the number of positive predictions (predicted class 1) that were incorrect.
FN (false negatives): the number of negative predictions (predicted class 0) that were incorrect.
The confusion matrix can be used to define several evaluation metrics [19]:
1. Precision: the ratio of true positives to all positive predictions. Out of all the positive predictions, this metric tells how many were correct.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (19)$$
2. Accuracy: the ratio of correct predictions (true positives and true negatives) to all predictions. Out of all the predictions, this metric tells how many were correct.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (20)$$
3. Recall: the ratio of true positives to the sum of true positives and false negatives. Out of all the actual positives, this metric tells how many were predicted correctly.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (21)$$
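The confusion-matrix counts and Eqs. 19 to 21 can be computed from lists of true and predicted labels with a short NumPy sketch; the labels are hypothetical. For this example TP = 2, TN = 2, FP = 2, and FN = 1, so precision is 0.5, accuracy is 4/7, and recall is 2/3.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 = positive and 0 = negative."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, tn, fp, fn

def precision_accuracy_recall(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)                    # Eq. 19
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # Eq. 20
    recall = tp / (tp + fn)                       # Eq. 21
    return precision, accuracy, recall

y_true = [1, 1, 1, 0, 0, 0, 0]                    # hypothetical true classes
y_pred = [1, 1, 0, 1, 1, 0, 0]                    # hypothetical predictions
print(confusion_counts(y_true, y_pred))           # (2, 2, 2, 1)
print(precision_accuracy_recall(y_true, y_pred))
```

The sketch does not guard against division by zero, which happens for precision when the model makes no positive predictions. scikit-learn's `precision_score` and `recall_score` provide the same metrics with handling for such cases.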
6.2 Train Test Split
After the model has been carefully designed, it must be trained, and its hyperparameters must be tuned. When training a model, it is necessary to ensure that it is as general as possible, because only then will it perform well on data it has not seen [29]. To check that the model generalizes, the data must be split before training. Usually, the data is divided into two sets, the train set and the test set [30]. The train set is used to train the model, and after training, the test set is used to evaluate how well the model performs. There is no hard and fast rule for the proportions of the split; it is up to the implementer's discretion, and common choices are 60/40, 70/30, and 80/20 [30]. During training, performance is also typically monitored on held-out data, whose accuracy is referred to as the validation accuracy. If the train and validation accuracies are similar during training, the model can be said to be training well. If the validation accuracy is much lower than the train accuracy, the model is said to be overfitting. If both the train and validation accuracies are very low, the model is said to be underfitting. Overfitting and underfitting are very common when training a model, and if left uncorrected, they lead to poor models.
Underfitting
Underfitting usually occurs when the model is not powerful enough to capture the relationship between the input and output variables. It can also happen when the model is trained on very little data or for too short a time. In this case, both train and validation accuracies will be very low. This can be corrected by using a more powerful model, collecting more data, or training for more epochs [29]. Figure 18a shows an example of underfitting.
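A random train/test split can be sketched in NumPy by shuffling the example indices and holding out a fraction of them. This example uses an 80/20 split of 50 hypothetical examples. scikit-learn provides the same functionality as `sklearn.model_selection.train_test_split`; the function below only mimics its name and is not that implementation.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold out test_fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(100).reshape(50, 2)     # 50 hypothetical examples with 2 features
y = np.arange(50) % 2                 # hypothetical binary labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.2)
print(len(X_train), len(X_test))      # 40 10
```

Shuffling before splitting matters when the data is stored in some order (for example, grouped by class), since an unshuffled split could leave whole classes out of the train set.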
[Fig. 18, panels (a) and (b): model predictions and data points, plotted as y (output) against x (input)]
Fig. 18 (a) Output from a model that was underfitted and (b) output from a model that was overfitted
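The two regimes in Fig. 18 can be imitated with a small NumPy experiment that fits polynomials of increasing degree to noisy samples of a sine curve. The data, the 28/12 train/validation split, and the degrees 1, 3, and 15 are arbitrary choices for illustration. A straight line is typically too simple, so both errors stay high (underfitting). A degree-15 polynomial can chase the noise, giving a low train error but a higher validation error (overfitting). The exact numbers depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression problem: y = sin(pi * x) plus a little noise.
x = rng.uniform(-1, 1, 40)
y = np.sin(np.pi * x) + rng.normal(0, 0.1, x.size)

# Hold out 12 of the 40 points as a validation set.
idx = rng.permutation(x.size)
train, val = idx[:28], idx[28:]


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


results = {}
for degree in (1, 3, 15):
    coeffs = np.polyfit(x[train], y[train], deg=degree)   # least-squares polynomial fit
    results[degree] = (
        mse(y[train], np.polyval(coeffs, x[train])),      # train error
        mse(y[val], np.polyval(coeffs, x[val])),          # validation error
    )

for degree, (train_err, val_err) in results.items():
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, validation MSE {val_err:.4f}")
```

Each higher-degree model contains the lower-degree ones as special cases, so its train error can only go down. The validation error is what reveals whether the extra flexibility actually helps.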
Fundamentals of Machine Learning
Overfitting
When a model is trained for too long or on too little data, it, in a way, memorizes the training data and performs exceptionally well on the train set. But when evaluated on the validation set, it gives a much lower score. This situation is known as overfitting. Overfitting indicates that the model does not generalize well, so it will perform poorly on data that was not part of the train set. Figure 18b depicts overfitting. Many practices can be adopted during implementation to avoid overfitting and help fit the data better; this chapter focused mainly on introducing the fundamentals of machine learning and did not cover them. However, readers are invited to look at research that has analyzed these implementation techniques [30].

Take-Home Message
When it comes to machine learning, there are four learning approaches, namely, supervised, unsupervised, semi-supervised, and reinforcement learning. Supervised learning consists of regression and classification tasks, whereas unsupervised learning consists of clustering and non-clustering tasks. Semi-supervised learning attempts to learn from limited labeled data, and reinforcement learning uses feedback from an environment to enable agents to learn. The recent boom in machine learning is due to a subset of machine learning called deep learning, which is based on artificial neural networks. Modified versions of artificial neural networks, such as convolutional neural networks, are used when dealing with large images. To test the effectiveness of a model, evaluation metrics such as mean squared error for regression, and accuracy, precision, and recall for classification, need to be used.
References
1. Mitchell, T. (1997). Machine learning. McGraw Hill.
2. Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
3. Fisher, R. A. (2018). iris. IEEE Dataport.
4. Müller, A. C., & Guido, S. (2016). Introduction to machine learning with python: A guide for data scientists. Journal of Chemical Information and Modeling, 53.
5. Rätsch, G. (2004). A brief introduction into machine learning. In 21st Chaos Communication Congress.
6. Shalev-Shwartz, S., & Ben-David, S. (2013). Understanding machine learning: From theory to algorithms. Cambridge University Press.
7. Hastie, T., Tibshirani, R., & Friedman, J. (2017). The elements of statistical learning: Data mining, inference, and prediction (12th printing). Springer.
8. Grus, J. (2019). Data science from scratch: First principles with python. O’Reilly Media.
9. Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
10. Smola, A., & Vishwanathan, S. V. N. (2008). Introduction to machine learning. Cambridge University Press.
11. Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015 – Conference Track Proceedings.
12. Ruder, S. (2017). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
13. Gelman, A., Carlin, J. B., Stern, H. S., et al. (2013). Bayesian data analysis. Chapman and Hall/CRC.
14. Nilsson, N. J. (2005). Introduction to machine learning. Stanford University Press. Machine Learning.
15. Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques.
16. Chapelle, O., Schölkopf, B., & Zien, A. (2010). Semi-supervised learning. Adaptive computation and machine learning. MIT Press.
17. Sutton, R. S., & Barto, A. G. (2012). Reinforcement learning: An introduction (2nd ed.). MIT Press.
18. Yu, C., Liu, J., Nemati, S., & Yin, G. (2023). Reinforcement learning in healthcare: A survey. ACM Computing Surveys.
19. Patterson, J., & Gibson, A. (2017). Deep learning: A practitioner’s approach.
20. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
21. Buduma, N., & Locascio, N. (2017). Fundamentals of deep learning: Designing next-generation machine intelligence algorithms. O’Reilly Media.
22. Arora, R., Basu, A., Mianjy, P., & Mukherjee, A. (2018). Understanding deep neural networks with rectified linear units. arXiv preprint arXiv:16110149.
23. LeCun, Y. A., Bottou, L., Orr, G. B., & Müller, K. R. (2012). Efficient backprop. Springer.
24. Parr, T., & Howard, J. (2018). The matrix calculus you need for deep learning. arXiv preprint arXiv:1802.01528.
25. Gabella, M., Afambo, N., Ebli, S., & Spreemann, G. (2019). Topology of learning in artificial neural networks. arXiv.
26. Wu, J. (2017). Introduction to convolutional neural networks. National Key Lab for Novel Software Technology, Nanjing University, China.
27. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. Springer.
28. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM.
29. Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM.
30. Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine learning. arXiv.
Applications in the Field of Bioinformatics
M. Parvez and Tahira Khan
Abbreviations
AI  Artificial intelligence
ABM  Agent-based models
CT  Computed tomography
CVD  Cardiovascular disease
DB  Database
IoT  Internet of Things
KNN  K-nearest neighbor
MIMIC  Multiparameter Intelligent Monitoring in Intensive Care
ML  Machine learning
MRI  Magnetic resonance imaging
NER  Named-entity recognition
NGS  Next-generation sequencing
NN  Neural network
SVM  Support vector machines
What You Will Learn?
This chapter focuses on:
• Applications of AI in systems biology
• Importance of systems biology
• Biological data analysis and predictions
• Biomedical imaging and its importance
• Application of AI in healthcare and treatment
• Disease diagnosis and prediction using AI

M. Parvez
Department of School of Allied Health Sciences, DPSRU, New Delhi, New Delhi, India
T. Khan (✉)
Department of Pharmacology, SPER, Jamia Hamdard, New Delhi, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_7
1 Introduction
Getting familiar with the concepts of artificial intelligence (AI) and machine learning (ML) is one thing; building a system and training it to make predictions as part of an application is entirely another. The wide application of ML techniques shows how beneficial they are for modern-day researchers and the crucial role they will play in the coming years. This chapter discusses applications of the previously introduced techniques and algorithms in fields such as systems biology, text mining, biological data analysis and prediction, healthcare, diagnosis, and treatment.
2 Application of AI in Systems Biology
Emphasis has shifted toward a system-level understanding of biology rather than just knowledge of individual genes and proteins [1]. Identifying the genes and proteins in our body is analogous to identifying the hardware components of an assembled machine: it provides a list of all the parts. But this knowledge alone is insufficient to understand how the system works when its pieces are put together [2]. Therefore, the priority should be to understand the dynamics of a biological system. This helps us understand the equivalent of traffic patterns within the body: how they emerge, how signals are produced, and how these patterns can be controlled or modified.
2.1 Why Is It Important?
Systems biology helps us understand the complex workings of living organisms at the system level. The knowledge obtained from a given system helps us predict its behavior, forecast changes over time, and tailor treatments for living organisms. Applications include the discovery of biomarkers, precision medicine, targeted drugs, and various other treatments. Systems biology is an interdisciplinary effort that brings together domains such as biology, bioinformatics, physics, computer science, and engineering (Fig. 1).
Fig. 1 Interrelationship between biology, technology, and computation. New questions from fundamental biology create a need for new technologies. These technologies yield new datasets, and exploring those datasets thoroughly requires novel analytical tools
2.2 Intelligent Vaccine Design
Vaccine development is a crucial part of controlling the spread of any disease. The current route for developing vaccines against major infectious diseases is costly and time-consuming. Moreover, most vaccine candidates developed this way fail during phase II or phase III trials. Given the cost, the time, and the high failure rate, artificial intelligence methods come as a boon to systems biology. These methods have the potential to reduce the failure rate significantly, speed up the current development process, and consequently improve the efficacy of the resulting vaccines. Steps are being taken to obtain approval from the relevant regulatory authorities for applying these technologies in in silico trials supported by systems biology databases.
2.2.1 The Collaboration of AI and Systems Biology for Vaccine Design and Development
Systems biology draws on the pool of biological data available from the various omics levels, whether system-level or multiscale data. AI, on the other hand, uses knowledge-based methods to identify suitable targets, ML for epitope prediction, and agent-based models for systems immunology. Combining these techniques results in intelligent vaccine design. Immunotherapy in its current state is expensive to develop clinically and even more costly to sustain within public health systems. In such a situation, the collaboration of AI and systems biology in immunology has proven its significance. With the knowledge discovery approach, AI can help uncover hidden patterns that went unnoticed in the past. Using suitable ML-based epitope prediction, one can design epitope-based vaccines for various infectious diseases and even cancer [3]. AI approaches can also help predict the toxic effects of a drug. They can also help determine whether a vaccine antigen interacts favorably with human proteins or, in the worst case, triggers a chain of toxic reactions. Figure 2 summarizes how AI and systems biology can be combined to design an intelligent vaccine.

Fig. 2 Combination of artificial intelligence and systems biology for intelligent vaccine design
2.3 Approaches Used by AI to Design Intelligent Vaccines
2.3.1 Knowledge Discovery Approach
This approach mines data from specialized databases that were previously used for disease clinical trials. It uses ML techniques to simplify the available data and visualize it for easier interpretation. The data at hand is then processed further to reveal hidden characteristics of previously tested vaccines. This process has two stages:
(i) Named-entity recognition (NER): NER finds the relevant entities, such as diseases, tests, and procedures, and links them to known identifiers.
(ii) Information retrieval: Information is retrieved from the data collected in the first stage. Natural language processing techniques are applied to textual information, while image processing techniques are used for visual information.
Once the critical information is retrieved, it is analyzed with an unsupervised ML technique. This technique helps identify the factors crucial for antigen selection and also clusters the antigens.
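The simplest form of the NER stage is a dictionary lookup that maps known terms to identifiers. The sketch below shows that idea in plain Python. The lexicon, the entity types, and identifiers such as `DIS:0001` are hypothetical placeholders, not codes from a real ontology. Production NER systems typically use trained statistical or neural models rather than a fixed word list.

```python
import re

# Hypothetical lexicon: term -> (entity type, placeholder identifier).
# The IDs are made up for illustration, not codes from a real ontology.
LEXICON = {
    "tuberculosis": ("DISEASE", "DIS:0001"),
    "malaria": ("DISEASE", "DIS:0002"),
    "elisa": ("TEST", "TST:0001"),
    "intramuscular injection": ("PROCEDURE", "PRC:0001"),
}


def tag_entities(text, lexicon=LEXICON):
    """Return (start, end, surface text, type, identifier) for each lexicon hit."""
    lowered = text.lower()
    taken = []    # character spans already assigned to an entity
    hits = []
    # Longer terms first, so a multi-word entry wins over any shorter term inside it.
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, lowered):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps an entity found earlier
            taken.append((m.start(), m.end()))
            etype, ident = lexicon[term]
            hits.append((m.start(), m.end(), text[m.start():m.end()], etype, ident))
    return sorted(hits)  # order of appearance in the text


sentence = "Patients with Tuberculosis received an intramuscular injection; ELISA was used."
for _, _, surface, etype, ident in tag_entities(sentence):
    print(surface, etype, ident)
```

Matching on a lowercased copy while slicing the surface form from the original text keeps the reported entity in its original casing.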
2.3.2 Epitope and Agent Prediction Approach
Because of their variable nature, proteins are the preferred targets of the immune response. B and T cells can recognize protein antigens, but not all of these antigens induce immunity in our bodies. Appropriate ML models are used to find the relevant antigens, which are an essential part of vaccines. This process is called reverse vaccinology. Modern epitope-based vaccines can be designed with the help of epitope prediction rather than through a generalized approach [4, 5]. Computational tools used recently for reverse vaccinology include Jenner-Predict, VaxiJen, and VacSol [6–8].
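Classic sequence-based epitope prediction scores overlapping windows of a protein with an amino acid propensity scale. Such window scores are also typical input features for ML epitope predictors. The sketch below uses the Kyte-Doolittle hydropathy scale and flags strongly hydrophilic windows, which are more likely to be surface-exposed, as candidate B cell epitopes. The window length and threshold are arbitrary. This is a simplified baseline, not how VaxiJen, VacSol, or Jenner-Predict work.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq, window=7):
    """Mean hydropathy of every length-`window` stretch, paired with its start index."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        (start, sum(values[start:start + window]) / window)
        for start in range(len(values) - window + 1)
    ]


def candidate_regions(seq, window=7, threshold=-1.0):
    """Start positions of windows hydrophilic enough to flag as possible epitopes."""
    return [start for start, score in window_scores(seq, window) if score <= threshold]


print(candidate_regions("IIIIIKKKKK", window=5))  # [4, 5]
```

In an ML predictor, scores like these, computed for several propensity scales, would be stacked into feature vectors and passed to a classifier trained on known epitopes and non-epitopes.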
2.3.3 Agent-Based Model Approach
Agent-based models (ABMs) are a computational modeling approach used in artificial intelligence. They help determine how a system reaches higher-order, system-level behavior from the simulated interactions of its individual components. In ABMs, each cell is represented by an agent with a predetermined set of potential interactions. An organism's immune system, an intricate structure of many interacting cells, chemicals, and organs, is therefore well suited to this approach. ABM is thus useful for modeling and replicating the dynamics of the immune system and, consequently, for evaluating novel vaccines.

To summarize the overall process, the relevant regions of specific antigens are first identified using a knowledge-based approach, which helps eliminate vaccine candidates that have failed. The pertinent T and B cell epitopes are then chosen after further refinement using epitope prediction methods. These methods also help remove non-human immunogenic epitopes. Candidate vaccine formulations are found by pre-selecting B and T cell epitopes. The ABM technique then helps choose, with the help of systems biology data, the formulation that is expected to be most effective. Finally, clinical trials of the formulated vaccine are conducted, and the vaccine is made available for use.
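The agent idea can be made concrete with a deliberately tiny ABM in plain Python. Each immune cell is an agent that, at every step, may encounter antigen, clear one unit of it, and produce an activated daughter cell. All rules and parameter values here are hypothetical and chosen for readability. Real immune simulators used for vaccine evaluation model many more cell types, molecules, and interactions.

```python
import random
from dataclasses import dataclass


@dataclass
class ImmuneCell:
    """One agent: a lymphocyte that is either naive or activated."""
    activated: bool = False


def simulate(n_cells, n_antigen, steps, p_encounter, seed=0):
    """Toy ABM: cells meet antigen, clear one unit, and spawn one activated daughter.

    Returns the antigen count after each step and the final list of agents.
    """
    rng = random.Random(seed)
    cells = [ImmuneCell() for _ in range(n_cells)]
    antigen = n_antigen
    history = [antigen]
    for _ in range(steps):
        daughters = []
        for cell in cells:
            # Encounter chance scales with the remaining antigen load.
            if antigen > 0 and rng.random() < p_encounter * antigen / n_antigen:
                cell.activated = True
                antigen -= 1                                   # clear one antigen unit
                daughters.append(ImmuneCell(activated=True))   # clonal expansion
        cells.extend(daughters)  # new agents join from the next step
        history.append(antigen)
    return history, cells


history, cells = simulate(n_cells=10, n_antigen=200, steps=30, p_encounter=0.5)
print(history[:5], history[-1], sum(c.activated for c in cells))
```

Every clearance event creates exactly one daughter, so the final population equals the initial cells plus the antigen cleared. Simple invariants like this are handy for checking that an ABM behaves as intended.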
3 Application of AI in Biological Data Analysis and Predictions
With advances in technology, our computing and storage capabilities have evolved quickly, making large-scale biological data analysis feasible. Next-generation sequencing (NGS) is one of the revolutionary advances of the last decade. Whereas Sanger's method could only run one sequence at a time, NGS can run millions of fragments simultaneously. Together with the significant advances in AI over the last decade, these developments have opened up numerous options in different fields. Biological data analysis and prediction is one such area of progress in bioinformatics [9].
The use of AI in bioimaging, medical imaging, sequence analysis, and other fields has created opportunities to develop modeling techniques that have the potential to revolutionize the future [10]. Let's take a look at some of these applications.
3.1 Biomedical Imaging
Biomedical imaging has long been an indispensable diagnostic tool. High-resolution images are produced when electromagnetic waves of different wavelengths interact with the biological tissues of our body; in ultrasound, mechanical sound waves take the place of electromagnetic waves. The field of biomedical imaging is thus concerned with capturing images and using them for therapeutic or diagnostic purposes. These images help medical researchers analyze the current state of a tissue or an organ. The most common modalities are X-ray radiography, ultrasound, magnetic resonance imaging (MRI), X-ray computed tomography (CT), nuclear medicine, and high-resolution microscopy. Although the first imaging technique, X-ray imaging, was introduced in 1895, biomedical imaging is still evolving. Over time, image detail and resolution have improved drastically, enabling more reliable and accurate diagnosis. Today, ML and deep learning methods have given a further boost to this developing field. With modern AI techniques, diseases can be diagnosed more easily and quickly. This saves crucial time and gives clinicians and researchers a little extra time to find a treatment regimen for a life-threatening disease [11, 12].
3.1.1 ML for Feature Analysis
In biomedical imaging, it is necessary to reduce data dimensionality. In the past, this was achieved using principal component analysis (PCA) and independent component analysis, while the k-means algorithm, an unsupervised ML technique, was preferred for clustering problems. These techniques extract the necessary features from an image and feed them into a learning algorithm. The most popular such algorithms are random forests and support vector machines (SVMs). A random forest combines many decision trees, each trained on a random subset of the data. SVMs are popular because kernel functions make them easy to apply to non-linear problems (Fig. 3).
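The reduce-then-cluster pipeline described above can be sketched with NumPy alone: PCA computed from a singular value decomposition, followed by a basic k-means. The random 50-dimensional vectors stand in for image features and are purely synthetic. Farthest-first initialization is an assumption made here to keep the result deterministic; libraries such as scikit-learn default to k-means++ instead.

```python
import numpy as np


def pca(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Plain k-means with farthest-first initialization."""
    centroids = [X[0]]
    while len(centroids) < k:
        # next centroid: the point farthest from all centroids chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                  # assign to nearest centroid
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])                                             # move centroids to cluster means
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Toy "image feature" vectors: two groups of 20 samples with 50 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 50)), rng.normal(3, 1, (20, 50))])
Z = pca(X, n_components=2)        # 50 features -> 2 components
labels, _ = kmeans(Z, k=2)
print(Z.shape, labels)
```

Which group receives label 0 is arbitrary. Only the grouping itself carries meaning.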
Fig. 3 The process of biomedical imaging is completed in four steps, namely: (i) acquisition, (ii) reconstruction, (iii) restoration, and (iv) registration. The acquired data is then analyzed with ML models: after data pre-processing, models are created for the intended use and then tested for accuracy
3.1.2 Deep Learning in Biomedical Imaging
Early use of artificial intelligence in biomedical imaging dates back to 1995, when a convolutional neural network was used for lung screening on X-rays. Over the last two decades, the field has undergone revolutionary changes. AlexNet set the standard for image classification. The limitations of such classification networks led to the development of U-Net, which in its simplest form is an encoder-decoder network. Researchers believe that the full potential of deep learning for classification in medical imaging has yet to be realized, largely because deep learning methods lack sufficient training data. However, this may be resolved soon, thanks to open datasets of medical images [13].
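Convolution, pooling, and the encoder-decoder idea can be made concrete with a few NumPy functions. The sketch below applies one convolution with a ReLU, downsamples with 2 x 2 max pooling (the encoder half), and upsamples back to the input size (the decoder half). It only illustrates how data flows and changes shape. The edge-detecting kernel is fixed rather than learned, and the skip connections that distinguish U-Net are omitted.

```python
import numpy as np


def conv2d_same(image, kernel):
    """2D cross-correlation (what CNN layers compute) with zero padding.

    Assumes an odd-sized kernel so the output has the same shape as the input.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))   # zero padding by default
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2(x):
    """2 x 2 max pooling: halves each spatial dimension (odd edges are dropped)."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample2(x):
    """Nearest-neighbor upsampling: doubles each spatial dimension."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def tiny_encoder_decoder(image, kernel):
    features = np.maximum(conv2d_same(image, kernel), 0)   # convolution + ReLU
    encoded = max_pool2(features)                            # encoder: downsample
    decoded = upsample2(encoded)                             # decoder: upsample
    return encoded, decoded


image = np.random.default_rng(0).random((8, 8))
edge_kernel = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
encoded, decoded = tiny_encoder_decoder(image, edge_kernel)
print(image.shape, encoded.shape, decoded.shape)   # (8, 8) (4, 4) (8, 8)
```

In a trained network, many such kernels are learned from data, and the decoder also uses learned (transposed) convolutions rather than plain repetition.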
3.2 Prediction of COVID-19 Pneumonia
COVID-19, the disease caused by the SARS-CoV-2 coronavirus, emerged in Wuhan in December 2019. The disease spread rapidly across the world, and the WHO declared it a pandemic in March 2020. During the first wave, many patients developed severe pneumonia, which in many cases led to respiratory failure. Most countries implemented lockdowns, sealing their borders and restricting imports and exports. The medical industry is always looking for better ways and technologies to control and monitor the spread of disease, so AI was identified as an ideal tool for addressing various issues of COVID-19. Besides helping to curb the spread of COVID-19, it also helped with mortality rate prediction, treatment planning, and disease analysis (Fig. 4).
Fig. 4 Role of AI in different fields in the COVID-19 pandemic
3.2.1 Application in Early Detection
The utmost priority in a pandemic is to save lives through early detection and prevention. ML helped detect COVID-19 at an early stage and supported vaccine development and metabolic engineering [14]. AI has also helped the medical industry develop new diagnostic and treatment systems through various algorithms, thus helping us prepare for emergency conditions.

3.2.2 Treatment Monitoring
AI can help assess a patient's current state with the help of a neural network (NN). NNs can extract hidden features from biomedical images, allowing clinicians to provide appropriate treatment.

3.2.3 Application in Medical Image Analysis
The traditional method of detecting COVID-19 was time-consuming. Convolutional neural networks, adopted by researchers around the globe, have proven their worth in medical imaging systems. One such approach separates patients who need intensive care from those with mild disease. In this approach, COVID-19 medical image analysis quantifies the patterns of interstitial lung disease (ILD), since eight radiographic ILD patterns resemble those of COVID-19. Following quantification, the next crucial step is staging. In the case of ILD, staging helps clinicians determine the location and extent of the disease. Once this is done, key features from the staging process are identified. The method combines 2D and 3D deep learning architectures; among the 2D architectures, CovidENet and AtlasNet were used [15, 16].

Algorithm 1 AtlasNet inference
S ← sample
C_i ← the i-th trained network
for i ∈ 1..N do
    step 1: T_i ← argmin_T E(T : S, A_i)
            S_i^warped ← T_i(S)
    step 2: S_i^(warped,seg) ← C_i(S_i^warped)
    step 3: S_i^seg ← T_i^(-1)(S_i^(warped,seg))
end for
step 4: S^seg ← Combine(S_i^seg)

Outputs were recorded and compared with statistical and traditional techniques.
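Algorithm 1 follows a register, segment, map back, and combine pattern. The NumPy sketch below mirrors those four steps under strong simplifying assumptions:
- Registration is a brute-force search over integer translations, standing in for the deformable registration T_i, with the sum of squared differences as the energy E.
- Each "trained network" C_i is replaced by a hypothetical intensity threshold.
- The atlases A_i are given as arrays.
- Combine is a majority vote.

It shows the control flow of the algorithm, not the AtlasNet or CovidENet models themselves.

```python
import numpy as np


def register_translation(sample, atlas, max_shift=3):
    """Step 1 stand-in: integer shift T minimizing E(T : S, A_i), here the SSD."""
    best_shift, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            warped = np.roll(sample, (dy, dx), axis=(0, 1))
            err = np.sum((warped - atlas) ** 2)
            if err < best_err:
                best_shift, best_err = (dy, dx), err
    return best_shift


def atlas_inference(sample, atlases, networks):
    """Toy version of Algorithm 1: one vote per (atlas A_i, network C_i) pair."""
    votes = []
    for atlas, net in zip(atlases, networks):
        dy, dx = register_translation(sample, atlas)                  # step 1: T_i
        warped = np.roll(sample, (dy, dx), axis=(0, 1))              # S_i^warped
        warped_seg = net(warped)                                      # step 2: C_i
        votes.append(np.roll(warped_seg, (-dy, -dx), axis=(0, 1)))    # step 3: T_i^-1
    return (np.mean(votes, axis=0) > 0.5).astype(int)                # step 4: majority vote


def threshold_net(img):
    """Hypothetical stand-in for a trained segmentation network C_i."""
    return (img > 0.5).astype(int)


atlas = np.zeros((10, 10))
atlas[3:6, 3:6] = 1.0                               # bright "lesion" in the atlas
sample = np.roll(atlas, (1, 2), axis=(0, 1))        # same pattern, shifted
seg = atlas_inference(sample, [atlas] * 3, [threshold_net] * 3)
print(seg.sum())  # 9
```

Mapping each segmentation back with the inverse transform (step 3) is what lets the votes from different atlases be combined pixel by pixel in the sample's own coordinates.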
3.2.4 Cases and Mortality Prediction
With proper training on the available data, AI can help predict the number of cases across the world. It can also help predict the number of recoveries and deaths, the timing of peaks, and so on. AI can thus lend a helping hand to countries and their healthcare departments in taking the necessary steps [17–19].
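One of the simplest models behind early case-count prediction is an exponential-growth fit, which reduces to linear regression on the logarithm of daily cases. The sketch below uses synthetic counts generated inside the code, not real COVID-19 data. It is a baseline only: exponential growth holds at best early in an outbreak, and the AI models cited above are considerably more elaborate.

```python
import numpy as np


def fit_exponential_growth(days, cases):
    """Fit log(cases) = log(c0) + r * day by least squares; return (c0, r)."""
    r, log_c0 = np.polyfit(days, np.log(cases), deg=1)
    return float(np.exp(log_c0)), float(r)


def forecast(c0, r, day):
    """Projected cases on `day` under continued exponential growth."""
    return c0 * np.exp(r * day)


days = np.arange(10)
cases = 100 * np.exp(0.2 * days)          # synthetic, noise-free counts
c0, r = fit_exponential_growth(days, cases)
print(round(c0, 2), round(r, 4))          # 100.0 0.2
print(round(np.log(2) / r, 2))            # doubling time in days: 3.47
print(round(float(forecast(c0, r, 15))))  # 2009
```

The growth rate r translates directly into a doubling time, ln(2)/r, which is often easier to communicate than the rate itself.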
4 Application of AI in Healthcare Diagnosis and Treatment
The development of AI over the past two decades has shown its possibilities in healthcare, diagnosis, and treatment. Its exceptional use during the pandemic has encouraged researchers to broaden its use beyond prediction and imaging analysis. AI can help document and manage databases so that they can be used for diagnosis, treatment, and monitoring. Moreover, modern AI systems can behave in human-like ways by understanding text, images, and speech. This capability could enable complex tasks such as AI-assisted robotic surgery, reducing risk and increasing precision. Recent developments indicate that AI will play a significant role in healthcare diagnosis and treatment. This role spans biomedical research, biomedical information processing, data mining, image segmentation, and disease diagnostics. AI in healthcare is still in its early stages; here are some examples of its use to demonstrate its potential and inspire you.
Fig. 5 Use of artificial intelligence in the medical field
4.1 Disease Diagnostics and Prediction
AI has demonstrated a significant role in biomedicine, be it the accurate and timely prediction of COVID-19 or intelligent vaccine design [20]. Rapid AI-assisted diagnosis of disease can save many lives, as observed during the COVID-19 pandemic. Another example of life-saving technology is the use of biosensors and biochips to record gene expression as a diagnostic tool. Appropriate ML techniques can help analyze gene expression data to detect abnormalities or mutations, thus aiding the early prediction of cancer [21, 22] (Fig. 5).

4.1.1 Futuristic Biosensors with AI for Cardiac Healthcare
Cardiovascular diseases (CVDs) are disorders of the heart and blood vessels; they commonly include coronary heart disease, rheumatic heart disease, and cerebrovascular disease. These diseases are responsible for 32% of all deaths worldwide, accounting for 17.9 million lives each year. The most common risk factors for CVDs are an unhealthy diet, irregular sleep patterns, tobacco and alcohol use, and lack of exercise. These may result in high blood pressure, obesity, hyperglycemia, palpitations, chest pain, and other conditions. It is crucial to identify patients at high risk of CVDs and ensure that they receive appropriate care according to the severity of the disease. Many cardiac biomarkers, such as C-reactive protein, myoglobin, and B-type natriuretic peptide, have previously been identified. However, most of these biomarkers are still detected with the traditional approach, in which they are quantified using laboratory techniques to assess the patient's condition. In some areas, these tests may take from a few hours to several days, which is too long for a patient who needs an urgent diagnosis. Recent AI-driven advances in the medical industry have shed light on new techniques that can be combined with biomarkers to create futuristic biosensors, thus contributing to early detection and timely diagnosis [23].
Table 1 Commonly used databases for AI in CVDs
Database | Storage
Long-Term ST Database | ECG recordings
UCI Repository of Machine Learning | Non-invasive and clinical reports
CardioNet | Manually curated database for CVD research
IQRAA Hospital, Calicut, Kerala, India | ECG recording
Multiparameter Intelligent Monitoring in Intensive Care (MIMIC)-II | Physiological parameters and clinical reports

[Fig. 6 flowchart: Data Pre-Processing → Feature Extraction → Feature Selection → Learning Method]
Fig. 6 Process for deducing the outcome from the CVD databases in the diagnosis of disease
4.1.2 Role of ML
ML algorithms trained on ever-expanding medical databases have demonstrated their utility in modern medical practice. They aid clinicians in decision-making and in the computations involved in everyday practice. For CVDs, various databases are used for this purpose; these are listed in Table 1.

4.1.3 Procedure to Determine Results from the CVD Databases for Disease Diagnosis
This section describes the steps involved in obtaining the result from the CVD database using AI and ML (Fig. 6).

4.1.3.1 Data Pre-processing
Data pre-processing in machine learning removes noise and outliers from the data provided to the system. For ECG signals, low-pass and high-pass filters with cutoffs of 20 Hz and 0.3 Hz, respectively, are applied to remove noise above 20 Hz and below 0.3 Hz. Other noise is eliminated using band-reject filters, such as a 50-Hz notch filter for power-line interference.
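The filtering described above can be sketched with NumPy's FFT by zeroing every frequency outside the 0.3 to 20 Hz band and within a narrow band around 50 Hz. This "ideal" frequency-domain filter is a simplification chosen to keep the example dependency-free. Practical ECG pipelines more commonly use IIR or FIR filters, such as Butterworth and notch filters from `scipy.signal`. The 500 Hz sampling rate and the synthetic test signal are assumptions. With a 20 Hz low-pass, the 50 Hz notch is redundant, but it is kept to mirror the text.

```python
import numpy as np


def ecg_filter_fft(signal, fs, low=0.3, high=20.0, notch=50.0, notch_width=2.0):
    """Keep only `low`..`high` Hz and remove a band of `notch_width` Hz around `notch`."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    keep = (freqs >= low) & (freqs <= high)               # high-pass + low-pass
    keep &= np.abs(freqs - notch) > notch_width / 2       # band-reject (notch)
    spectrum[~keep] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


fs = 500                                   # samples per second (assumed)
t = np.arange(10 * fs) / fs                # 10 s of signal
clean = np.sin(2 * np.pi * 5 * t)          # 5 Hz component we want to keep
noisy = (clean
         + 0.8                                   # DC offset
         + 0.5 * np.sin(2 * np.pi * 0.1 * t)     # slow baseline wander
         + 0.3 * np.sin(2 * np.pi * 50 * t))     # power-line interference
filtered = ecg_filter_fft(noisy, fs)
print(np.max(np.abs(filtered - clean)) < 1e-6)   # True
```

The test tones fall exactly on FFT bins here, so the reconstruction is essentially perfect. Real recordings spread energy across bins, which is one reason smoother IIR/FIR filter designs are preferred in practice.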
4.1.3.2 Feature Extraction
Feature extraction helps detect hidden and essential features among the various variables in the dataset. It is an ML technique that transforms the raw data into numerical features while preserving the information in the original data.

4.1.3.3 Feature Selection
Feature selection is critical because it keeps only the important patterns from the dataset and avoids capturing noise or unnecessary patterns that could lead to an incorrect diagnosis. Only the essential features are chosen in this step, which also helps avoid overfitting the model.

4.1.3.4 Learning Method
There are four types of learning methods: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. The models most commonly used in healthcare are supervised and unsupervised. While supervised learning works on a labelled dataset, unsupervised learning works on an unlabelled dataset. Commonly used supervised techniques are K-nearest neighbor (KNN), SVM, random forest, and regression. In recent years, SVM and KNN have outperformed other techniques in many healthcare applications. KNN is a preferred classifier for coronary artery disease (CAD) detection because it makes no assumptions about the data distribution. It has also been reported to outperform SVM in detecting heart irregularities. A multilayer perceptron NN has recently been developed to predict the risk level of a heart disease patient; the NN can also predict whether a person will develop heart disease. Another learning technique applied to the processed data is the decision tree, which can predict structures or patterns through classification or regression. However, for big data, random forests are preferred because they are less prone to overfitting. With appropriate ML techniques and the integration of wireless capabilities, futuristic biosensors for real-time patient monitoring are under development. The Internet of Things (IoT) has gained interest in recent years; with the application of AI, IoT can help biosensors act as virtual assistants. Keeping the necessary limitations in mind, AI can also help develop biosensors for diseases other than CVD.

Take-Home Message
This chapter covers just a few of the many AI techniques used in bioinformatics to date. AI's current growth has piqued the interest of researchers all over the world, who are now looking for ways to use ML to solve existing problems rather than relying on traditional methods.
It is important for people to understand that AI will not replace humans in the future. Rather, it will unlock the hidden potential of various fields, such as biological data analysis, sequencing, imaging, diagnosis, and management.
References
1. Middendorf, M., Kundaje, A., Wiggins, C., Freund, Y., & Leslie, C. (2004). Predicting genetic regulatory response using classification. Bioinformatics, 20(suppl_1), 1232–1240.
2. Kitano, H. (2002). Systems biology: A brief overview. Science, 295(5560), 1662–1664.
3. Russo, G., Reche, P., Pennisi, M., & Pappalardo, F. (2020). The combination of artificial intelligence and systems biology for intelligent vaccine design. Expert Opinion on Drug Discovery, 15(11), 1267–1281.
4. Parvizpour, S., Pourseif, M. M., Razmara, J., Rafi, M. A., & Omidi, Y. (2020). Epitope-based vaccine design: A comprehensive overview of bioinformatics approaches. Drug Discovery Today, 25(6), 1034–1042.
5. Parvizpour, S., Razmara, J., & Omidi, Y. (2018). Breast cancer vaccination comes to age: Impacts of bioinformatics. BioImpacts: BI, 8(3), 223.
6. Rizwan, M., Naz, A., Ahmad, J., Naz, K., Obaid, A., Parveen, T., Ahsan, M., & Ali, A. (2017). VacSol: A high throughput in silico pipeline to predict potential therapeutic targets in prokaryotic pathogens using subtractive reverse vaccinology. BMC Bioinformatics, 18(1), 1–7.
7. Doytchinova, I. A., & Flower, D. R. (2007). VaxiJen: A server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics, 8(1), 1–7.
8. Jaiswal, V., Chanumolu, S. K., Gupta, A., Chauhan, R. S., & Rout, C. (2013). Jenner-predict server: Prediction of protein vaccine candidates (PVCs) in bacteria based on host-pathogen interactions. BMC Bioinformatics, 14(1), 1–1.
9. Chakraborty, I., Choudhury, A., & Banerjee, T. S. (2017). Artificial intelligence in biological data. Journal of Information and Software Technology, 7(4), 207.
10. Hayat, H., & Wang, P. (2020). The applications of artificial intelligence in biomedical imaging. AJBSR, 8(3), 228–231.
11. Mahmud, M., Kaiser, M. S., Hussain, A., & Vassanelli, S. (2018). Applications of deep learning and reinforcement learning to biological data. IEEE Transactions on Neural Networks and Learning Systems, 29(6), 2063–2079.
12. Jebril, N. A., & Al-Haija, A. (2021). Artificial intelligent and machine learning methods in bioinformatics and medical informatics. In Emerging technologies in biomedical engineering and sustainable telemedicine (pp. 13–30). Springer.
13. Panayides, A. S., Amini, A., Filipovic, N. D., Sharma, A., Tsaftaris, S. A., Young, A., Foran, D., Do, N., Golemati, S., Kurc, T., & Huang, K. (2020). AI in medical imaging informatics: Current challenges and future directions. IEEE Journal of Biomedical and Health Informatics, 24(7), 1837–1857.
14. Helmy, M., Smith, D., & Selvarajoo, K. (2020). Systems biology approaches integrated with artificial intelligence for optimized metabolic engineering. Metabolic Engineering Communications, 1(11), e00149.
15. Chassagnon, G., Vakalopoulou, M., Battistella, E., Christodoulidis, S., Hoang-Thi, T. N., Dangeard, S., Deutsch, E., Andre, F., Guillo, E., Halm, N., & El Hajj, S. (2021). AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia. Medical Image Analysis, 1(67), 101860.
16. Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495.
17. Vaishya, R., Javaid, M., Khan, I. H., & Haleem, A. (2020). Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes and Metabolic Syndrome: Clinical Research and Reviews, 14(4), 337–339.
18. Luo, H., Tang, Q. L., Shang, Y. X., Liang, S. B., Yang, M., Robinson, N., & Liu, J. P. (2020). Can Chinese medicine be used for prevention of corona virus disease 2019 (COVID-19)? A review of historical classics, research evidence and current prevention programs. Chinese Journal of Integrative Medicine, 26(4), 243–250.
M. Parvez and T. Khan
19. Aminian, A., Safari, S., Razeghian-Jahromi, A., Ghorbani, M., & Delaney, C. P. (2020). COVID-19 outbreak and surgical practice: Unexpected fatality in perioperative period. Annals of Surgery.
20. Sharma, A., Rani, S., & Gupta, D. (2020). Artificial intelligence-based classification of chest X-ray images into COVID-19 and other infectious diseases. International Journal of Biomedical Imaging, 6, 2020.
21. Haleem, A., Javaid, M., & Khan, I. H. (2019). Current status and applications of artificial intelligence (AI) in medical field: An overview. Current Medicine Research and Practice, 9(6), 231–237.
22. Rong, G., Mendez, A., Assi, E. B., Zhao, B., & Sawan, M. (2020). Artificial intelligence in healthcare: Review and prediction case studies. Engineering, 6(3), 291–301.
23. Vashistha, R., Dangi, A. K., Kumar, A., Chhabra, D., & Shukla, P. (2018). Futuristic biosensors for cardiac health care: An artificial intelligence approach. 3 Biotech, 8(8), 358.
Future Prospects
Hussam Bin Mehare, Jishnu Pillai Anilkumar, and Mohammad “Sufian” Badar
1 Automotive Industry
Machine learning holds enormous promise in the automotive industry for uncovering hidden relationships within data sets and generating predictions. The industry's market environment is being reshaped by shifting market conditions, fierce competition, globalization, budget limitations, and unpredictability. The combination of big data analytics and machine learning has enhanced the ability to analyze huge volumes of data, which has accelerated the emergence of AI systems [1–15].
1.1 Predictive Maintenance
In proactive maintenance, monitoring and prediction modeling are used to determine the current condition of a machine and to predict which component is likely to fail and when. Machine learning systems can help adjust maintenance intervals: the same maintenance is performed, but its timing or mileage is moved earlier or later.
H. B. Mehare (✉)
Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
J. P. Anilkumar
Department of Computer Science & Engineering, Presidency University, Bengaluru, Karnataka, India
M. S. Badar
Department of Computer Science and Engineering, School of Engineering Sciences and Technology (SEST), Jamia Hamdard, New Delhi, India
(Former) Department of Bioengineering, University of California, Riverside, CA, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_8
H. B. Mehare et al.
As a result, machine learning systems can strengthen predictive maintenance and help predict future failures accurately rather than merely identifying current ones. Text and tweet analytics, for example, can incorporate the findings of customer-feedback analysis from social media. This helps improve the efficiency of vehicles and subsystems, which will drive future product design. It also helps detect failure trends and establish the relationship between a failure and its causes. With appropriate data, organizations can use machine learning systems to make region-specific adjustments that improve product reliability. Machine learning can also incorporate analytical data and learn users' personality traits, producing user-specific profiles that can be used for customization and assistance [1–15].
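As a minimal sketch of shifting a maintenance interval, assume a hypothetical record of the mileages at which one component type has failed across a fleet. The illustrative functions `next_service_mileage` and `adjust_interval` schedule service at the mileage by which about 10% of past failures had occurred (the empirical 10th percentile) and report whether that moves the current interval earlier or later. The 10% risk level is an assumption; a production system would more likely fit a reliability model to live sensor and usage data.

```python
from statistics import quantiles


def next_service_mileage(failure_mileages, risk=0.10):
    """Mileage by which roughly `risk` of the recorded failures had occurred."""
    k = round(risk * 100)
    if not 1 <= k <= 99:
        raise ValueError("risk must lie between 0.01 and 0.99")
    # n=100 yields the 99 percentile cut points; index k - 1 is the k-th percentile
    cuts = quantiles(failure_mileages, n=100, method="inclusive")
    return cuts[k - 1]


def adjust_interval(current_interval, failure_mileages, risk=0.10):
    """Compare the current service interval with the data-driven target."""
    target = next_service_mileage(failure_mileages, risk)
    if target < current_interval:
        direction = "earlier"
    elif target > current_interval:
        direction = "later"
    else:
        direction = "unchanged"
    return target, direction


# Hypothetical failure mileages (km) for one component type
history = [40_000, 50_000, 60_000, 70_000, 80_000, 90_000,
           100_000, 110_000, 120_000, 130_000, 140_000]
print(adjust_interval(60_000, history))  # (50000.0, 'earlier')
```

Using a low percentile rather than the mean keeps most components from failing before their service date, at the cost of servicing some components early.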
1.2 Quality Control
Machine learning technologies such as image recognition and fault detection can be used to quickly identify and eliminate faulty parts before they enter the automotive manufacturing cycle. Parts manufacturers can photograph each component as it leaves the assembly line and immediately run the images through a machine learning model to detect flaws. Highly accurate anomaly detection systems can find defects as small as a millimeter. Predictive analytics can be used to assess whether a defective item can be repaired or must be discarded. Eliminating or rebuilding faulty components early is far less expensive than finding and resolving issues later: it avoids more serious problems later in the manufacturing process and reduces the likelihood of costly recalls. It also improves consumer safety, satisfaction, and retention. Image recognition and analytics models can also aid in developing new, better-performing tires by recognizing and evaluating minute variations in tread wear patterns, provide quality control for paint and other finishes, and help ADAS and autonomous driving systems avoid potential hazards. Given this range of uses, many businesses would benefit more from an enterprise data science platform than from a discrete solution designed for a single use case [1–15].
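Image-based defect detectors do not fit in a short example, but the underlying idea of flagging parts that deviate from the rest of a batch can be sketched on a single measured dimension. The snippet below swaps in the modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff) as a simple stand-in for the learned anomaly detectors described above; the part widths and the function name `flag_anomalies` are illustrative assumptions.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # All typical values are identical, so anything different is suspicious
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical widths (mm) of parts leaving the assembly line
widths = [25.01, 24.99, 25.00, 25.02, 24.98, 25.00, 26.10, 25.01]
print(flag_anomalies(widths))  # [6]
```

The median-based statistics are used because a single large defect would inflate an ordinary mean and standard deviation and could mask itself.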
1.3 Root Cause Analysis
Identifying the root cause(s) of a problem in the manufacturing process is a time-consuming and labor-intensive task. Root cause analysis draws on vast quantities of test data, sensor readings, manufacturer settings, and other factors. Machine learning technologies can significantly speed up root cause analysis and resolution. Anomaly detection algorithms can analyze enormous amounts of system and driver data efficiently, including data in novel formats. These approaches can uncover highly specific root causes months faster than traditional investigation and usually identify issues that would otherwise have gone undetected [1–15].
1.4 Supply Chain Optimization
Across the supply chain, analytical models are used to estimate demand levels under various marketing tactics, selling prices, locations, and a range of other factors. This predictive analysis ultimately estimates the quantity of inventory needed at each location. Various scenarios are investigated to ensure appropriate inventory levels, protect brand reputation, and eliminate unnecessary holding costs. After the gap between current and planned inventory levels has been examined, optimization models are built to manage the precise flow of goods from production to distribution hubs and, finally, to customer-facing storefronts. Machine learning is helping automakers and their logistics partners become more productive and profitable while also improving customer service and brand reputation [1–15].
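The gap between current and planned inventory at one location can be sketched as follows, assuming a hypothetical moving-average demand forecast and the textbook safety-stock formula z·σ·√L (z for the desired service level, σ the standard deviation of daily demand, L the lead time in days). The function name `inventory_gap` and all figures are illustrative; real systems would use richer forecasts that account for price, promotions, and location.

```python
from math import ceil, sqrt
from statistics import mean, stdev


def inventory_gap(daily_demand, on_hand, lead_time_days, z=1.65, window=7):
    """Units to send (+) or surplus (-) relative to the planned stock level."""
    recent = daily_demand[-window:]
    forecast = mean(recent)                                  # expected units per day
    safety_stock = z * stdev(recent) * sqrt(lead_time_days)  # buffer for variability
    planned = forecast * lead_time_days + safety_stock
    return ceil(planned - on_hand)


# Hypothetical daily sales at one distribution hub
sales = [8, 12, 10, 10, 9, 11, 10]
print(inventory_gap(sales, on_hand=30, lead_time_days=4))
```

A positive result is a shortfall that the flow-optimization step would need to cover; a negative one is surplus that could be redistributed.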
2 Aviation
Artificial intelligence is gaining appeal in a variety of industries, including airports, due to its ability to analyze massive amounts of data and accelerate operations and procedures [1–10, 16–20].
2.1 Recommendation Engine
Recommendation engines are commonplace in well-known Internet businesses such as Netflix and Amazon, as well as on various travel-booking websites. The AI platform analyzes a passenger's historical data, such as earlier registrations, behavior-tracking data, metadata, and purchase history, together with real-time data, to present highly personalized offers, increasing retention and customer lifetime value [1–10, 16–20].
2.2 Chatbots
Chatbots can direct customers to specific services or outlets, provide flight status updates, and more, reducing the need for human interaction and freeing up workers to focus on other important tasks. Like human agents, chatbots and other customer service automation can understand basic questions and respond in a casual, conversational manner. Airports can use chatbots to provide 24/7 customer care [1–10, 16–20].
2.3 Passenger Baggage Screening
Checked-in luggage can be screened more thoroughly with an AI-based, robot-assisted system that quickly flags high-risk baggage and redirects it for additional scrutiny. Today's AI-powered live-video facial recognition technologies also provide information about how people move through the space and allow much faster access [1–10, 16–20].
2.4 AI Thermal Cameras/AI-Based Video Analytics
Using algorithms and computer vision technology, AI-based video analytics evaluates camera feeds to detect patterns and trends. The analysis is done in real time and delivers actionable intelligence such as crowd formation, the emotions and actions of individuals, and general heat mapping. Artificial intelligence technologies that use facial recognition, for example, can simplify the check-in procedure by matching the passport photo with the live image of the customer. The data can also be linked to the checked baggage, making baggage collection at the destination airport easier and reducing the risk of customers picking up the wrong bags on arrival. It also aids airport security by matching baggage to its owner if prohibited items are found in the bags [1–10, 16–20].
2.5 Autonomous Taxiing, Takeoff, and Landing
With a fully autonomous system, pilots would be able to focus on strategic decision-making and mission management rather than on aircraft operation. The autonomous taxiing, takeoff, and landing (ATOL) system relies heavily on computer vision and machine learning and uses a multitude of cameras, radar, and LiDAR to perceive its surroundings. The system was installed on a full-sized Airbus A350-1000 airliner with seating for more than 400 passengers, which performed 450 fully human-controlled flights to collect video data and fine-tune the control algorithms before the system was allowed to operate on its own [1–10, 16–20].
2.6 Automatic Dependent Surveillance-Broadcast (ADS-B)
With ADS-B, aircraft regularly broadcast current status information such as their International Civil Aviation Organization (ICAO) identification number, longitude, latitude, and speed. An ADS-B-based system is less costly than traditional radar-based techniques, and the corresponding ADS-B receiver (at 1090 MHz or 978 MHz) can easily be connected to a home PC. The ADS-B messages received, together with other data collected via the Internet, can provide a large volume of aviation data that may be mined for military, agricultural, and commercial purposes. In civil aviation, ADS-B can be used to increase the precision of aircraft location and the reliability of air traffic management (ATM) systems. Malicious or fraudulent messages can be identified via multilateration (MLAT), giving all aircraft in the airspace an open, free, and secure view. ADS-B is divided into two subsystems: ADS-B OUT and ADS-B IN. In the ADS-B OUT subsystem, aircraft transmitters regularly broadcast their own information (e.g., identification, position, velocity) to other aircraft and ground stations, whereas in the ADS-B IN subsystem, aircraft receivers receive the OUT messages from other aircraft and ground stations [1–10, 16–20].
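Multilateration estimates an aircraft's position independently, from the differences in the times at which its signal reaches several ground receivers. Assuming such an MLAT estimate is already available, a minimal consistency check compares it with the position the aircraft reports in its ADS-B message. In the sketch below, the 2 km tolerance and the function names are hypothetical, and the great-circle distance uses the haversine formula with a mean Earth radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_suspicious(adsb_pos, mlat_pos, tolerance_km=2.0):
    """Flag an ADS-B report whose position disagrees with the MLAT estimate."""
    return haversine_km(*adsb_pos, *mlat_pos) > tolerance_km


# Reported (lat, lon) vs. a hypothetical MLAT estimate about 11 km away
print(is_suspicious((52.0, 4.0), (52.1, 4.0)))  # True
```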
2.7 Revenue Management
The use of data and analytics to determine how to sell a product to people who need it at the right price, at the right time, and through the right channel is known as revenue management (RM). Data analytics and machine learning are used by airlines in the following areas:
a. Flight routes
b. Willingness to pay
c. Expected marginal seat revenue
d. Ancillary price optimization [1–10, 16–20].
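Expected marginal seat revenue builds on Littlewood's rule for two fare classes: keep protecting seats for the high fare as long as the probability of selling one more seat at that fare, multiplied by the high fare, exceeds the low fare. The sketch below applies the rule under the assumption that high-fare demand is normally distributed; the fares, demand figures, and function names are hypothetical. The EMSR-b heuristic extends the same idea to more than two classes.

```python
from statistics import NormalDist


def protection_level(high_fare, low_fare, mean_high, sd_high):
    """Seats to reserve for the high fare under Littlewood's rule.

    Protect y seats, where P(high-fare demand > y) = low_fare / high_fare,
    assuming high-fare demand ~ Normal(mean_high, sd_high).
    """
    if not 0 < low_fare < high_fare:
        raise ValueError("expected 0 < low_fare < high_fare")
    ratio = low_fare / high_fare
    return NormalDist(mean_high, sd_high).inv_cdf(1 - ratio)


def low_fare_booking_limit(capacity, high_fare, low_fare, mean_high, sd_high):
    """Maximum number of seats to sell at the low fare."""
    y = protection_level(high_fare, low_fare, mean_high, sd_high)
    return max(0, capacity - round(y))


# Hypothetical flight: 150 seats, fares of 400 and 200, high-fare demand ~ N(60, 15)
print(low_fare_booking_limit(150, 400, 200, 60, 15))  # 90
```

When the low fare is half the high fare, the rule protects exactly the mean high-fare demand; a cheaper low fare pushes the protection level above the mean.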
3 Maritime Logistics
Machine learning allows users to apply advanced algorithms and analyze data to inform future decisions in sea transportation. Among other things, these methods may be used for marine network design, trip planning, cargo optimization, and maintenance procedures. ML algorithms can handle data from a vessel's entire operational history. Because data is critical to reducing uncertainty, adopting ML algorithms can help surface atypical data that may be critical for shipowners. Advanced machine learning algorithms will be able to improve trip optimization, for example by increasing fuel economy, improving personnel performance, improving voyage cost estimates, calculating the ideal route in a minute, and making speed and course suggestions [1–10, 21–25].
3.1 Imagery
For photographic data, most of the ML case studies demonstrated successful automated object detection and classification. Plankton images from video recorders, which have well-defined features, enable researchers to refine ML methods, whereas ML applications for underwater and aerial visual surveys of fish and marine mammals are more difficult to implement because imagery collected in these environments is degraded by light, turbidity, and other environmental variability. Object identification and classification are simpler when targets appear against less complex backgrounds; as a result, training with targets in more complex backgrounds is required. Another difficult application is the pixel-based assessment of benthic ecosystems [1–10, 21–25].
3.2 Active Acoustics and Passive Acoustics
Large volumes of acoustic backscatter data are collected by fishery surveys and ocean monitoring systems, so using machine learning to reduce human processing time offers a considerable advantage. Unlike photographs, acoustic surveys rely on additional technology and sampling to attribute acoustic backscatter to species. As a consequence, the immediate benefits of employing ML include reducing the time and human bias associated with time-consuming post-processing of active acoustic data [1–10, 21–25]. ML programs employing cloud resources will support the classification of marine animal songs from passive acoustic data. The collection of passive acoustic data has increased rapidly over the last decade, resulting in petabytes of data that represent our ocean soundscapes [1–10, 21–25].
3.3 Other Data Types
There is broad consensus that ML applications for various environmental data are beneficial in terms of delivering higher-quality and more timely scientific results for the long-term protection of living ocean resources. Machine learning (ML) can, for example, be utilized in shotgun sequencing (e.g., meta-genome assembly or metatranscriptome assembly) or amplicon sequencing (PCR meta-barcoding) of environmental DNA (eDNA) used to uncover organisms in the marine environment. Furthermore, machine learning technologies for data fusion and assimilation from various observation systems have significant promise for improved forecasting in marine ecosystems and earth science [1–10, 21–25].
3.4 Electronic Monitoring
Electronic monitoring refers to the collection of data on commercial and recreational fishing vessels to provide scientific information, such as catch per unit effort and bycatch estimates for a given region, that can be used in harvest management guidelines. Vessel monitoring systems (VMS) and image data from camera systems are two examples of fishery-dependent data collection. These data are subject to confidentiality restrictions that impede open access, so the wider community will not be able to use them for ML-based discovery or forecasting. Furthermore, training datasets will most likely need to be tailored to local fisheries and harvests. As a result, local on-premises computers are likely to be the best choice for data access, training datasets, and ML computations. Overall, machine learning applications for electronic monitoring data will save money by shortening processing times and providing more current scientific knowledge for regional harvest management rules [1–10, 21–25].
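Once catch has been counted, whether by observers or by ML models reading camera footage, catch per unit effort (CPUE) for a region is total catch divided by total fishing effort. A minimal sketch, assuming hypothetical trip records of (region, catch in kg, effort in hours) and an illustrative function name `cpue_by_region`:

```python
from collections import defaultdict


def cpue_by_region(trips):
    """Catch per unit effort per region from (region, catch_kg, effort_hours) records."""
    catch, effort = defaultdict(float), defaultdict(float)
    for region, catch_kg, effort_hours in trips:
        catch[region] += catch_kg
        effort[region] += effort_hours
    return {region: catch[region] / effort[region]
            for region in catch if effort[region] > 0}


trips = [("north", 120, 10), ("north", 80, 10), ("south", 60, 4)]
print(cpue_by_region(trips))  # {'north': 10.0, 'south': 15.0}
```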
4 Software Engineering
There are three types of entities in software engineering: processes (groups of software-related activities, such as constructing specifications, detailed design, or testing), products (artifacts, deliverables, or documents that result from a process activity, such as a specification document, a design document, or a segment of code), and resources (entities required by a process activity, such as personnel, software tools, or hardware) [1–10, 26–31]. Entities in these categories have internal and external attributes. Internal attributes describe the entity itself, whereas external attributes describe its behavior, that is, how the entity relates to its environment. The following SE tasks, among others, lend themselves well to machine learning applications:
1. Measuring internal or external attributes of processes, products, or resources, and predicting or estimating such measurements, for example:
• Software size estimation
• Software quality estimation
• Software cost prediction
• Software development effort prediction
• Maintenance task effort prediction
• Software resource analysis
• Correction cost estimation
• Software reliability prediction
• Defect prediction
• Reusability prediction
• Software release timing
• Testability prediction
• Productivity
• Execution time
2. Discovering internal or external properties of processes, products, or resources
3. Transforming products to achieve desirable or improved external attributes
4. Synthesizing various products
5. Reusing products or processes
6. Enhancing processes (such as recovering specifications from software)
7. Managing ad hoc products (such as design and development knowledge) [1–10, 26–31].
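Defect prediction, one of the tasks listed above, is commonly framed as binary classification over code metrics. The sketch below trains a logistic regression by batch gradient descent in plain Python; the two features (normalized complexity and normalized code churn per module), the toy labels, and the function names are hypothetical. A real project would more likely use an established library such as scikit-learn, far more data, and proper validation.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def defect_probability(model, x):
    """Predicted probability that a module with features `x` is defective."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical modules: [normalized complexity, normalized churn] -> had a defect?
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
model = train_logistic(X, y)
print(round(defect_probability(model, [0.85, 0.85]), 2))
```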
4.1 Bug and Error Identification
Given how many mistakes are overlooked owing to human error, and the massive volumes of data that must be processed and checked, machine learning algorithms that can detect and auto-correct errors with minimal human intervention can make software development easier [1–10, 26–31].
4.2 Strategic Decision-Making
By reviewing the efficacy of prior development projects, machine learning may help stakeholders and development teams make data-driven business decisions and prevent risks [1–10, 26–31].
4.3 Testing Tools
As long as we know how the system is supposed to behave, entering input and comparing the results with expectations is relatively easy. A match means the test passed; a discrepancy means the issue must be resolved. Machine learning allows software testers to deliver more accurate results while minimizing the possibility of errors. Furthermore, running a test and uncovering a potential flaw takes less time, and the volume of data examined can grow without placing additional load on the testing team [1–10, 26–31].
4.4 Rapid Prototyping
Turning a concept into a product normally takes months, because several stages must be completed, from brainstorming through wireframing to building a product prototype. Machine learning has the potential to cut the time spent on prototyping in software development. Additionally, with machine learning, fewer technical professionals may be needed to create software [1–10, 26–31].
4.5 Code Review
Clean code is required for long-term maintenance and team collaboration. When businesses update their technology, large-scale code reorganization is unavoidable. Machine learning technologies might be used to analyze and optimize the code automatically. Compilers, which are programs that compile and translate computer code written in a high-level programming language into machine language that a computer can understand and execute, may fix obsolete code without requiring the original source. They accelerate the next generation of programming by automating the repair of current code [1–10, 26–31].
4.6 Smart and Intelligent Assistants
Developers can spend a considerable share of their time searching documentation and debugging code. Smart programming assistants may significantly reduce this time by providing just-in-time assistance and guidance, such as relevant text, best practices, and code examples. Furthermore, programming assistants can learn from earlier errors and flag them automatically during the development process. Machine learning may even be used to detect issues in system logs. In the future, machine learning is expected to allow software to adjust in response to flaws without human intervention [1–10, 26–31].
4.7 Accurate and Precise Estimates
The budget and timeline for software development are routinely exceeded. To provide credible projections, the team must have substantial experience and context knowledge. To provide a more realistic budget estimate, machine learning may analyze prior project data such as product specifications, user stories, and estimates. Aside from these, machine learning may be used in spam detection, data security, and deployment management [1–10, 26–31].
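Estimating from prior project data is often done by analogy: find the completed projects most similar to the new one and average their actual effort. The following is a minimal k-nearest-neighbors sketch, assuming hypothetical features (number of user stories and team size) and effort measured in person-days; `estimate_effort` is an illustrative name. In practice, features should be scaled to comparable ranges before distances are computed.

```python
from math import dist


def estimate_effort(past_projects, new_features, k=2):
    """Average effort of the k most similar past projects (analogy-based estimate)."""
    nearest = sorted(past_projects, key=lambda p: dist(p[0], new_features))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Hypothetical history: ((user stories, team size), actual effort in person-days)
past = [((10, 3), 40), ((12, 3), 48), ((40, 8), 200), ((45, 9), 230)]
print(estimate_effort(past, (11, 3)))  # 44.0
```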
5 Marketing and Retail
Marketers utilize machine learning to find trends in user activity on a website. This enables them to estimate future user behavior and alter advertising offerings in real time. The goal of machine learning in marketing is to enable quick decisions based on massive amounts of data. Machine learning enables adaptation to changes in traffic quality induced by advertising campaigns. Machine learning systems can handle hundreds of requests, organize them, and offer results in the form of a ready answer to a question [1–10, 32–41, 70, 71, 74].
Machine learning (ML) has a significant impact on the retail industry, particularly for firms that rely on online sales, where the use of AI technology is growing in popularity. Large corporations such as eBay, Amazon, and Alibaba have successfully integrated AI across the sales cycle, from inventory management to post-sale customer service [1–10, 32–41, 69, 72].
Key benefits of machine learning in marketing:
• Improves the quality of data analysis
• Enables you to analyze more data in less time
• Adapts to changes and new data
• Allows you to automate marketing processes and avoid routine work
• Does all of the above quickly
• Enhanced customer segmentation
• Optimized marketing campaigns
• Personalized suggestions [1–10, 32–41, 69, 71–73].
5.1 Recommendation Systems
A recommendation system's goal is to offer customers items they are currently interested in. It predicts which items a customer is likely to buy and uses those predictions to populate emails, push notifications, and the "recommended products" and "similar products" blocks on a website [1–10, 32–41].
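A "similar products" block can be sketched with item-to-item collaborative filtering, in which items bought by the same customers are treated as similar. The example below, on hypothetical purchase data, represents each item as a 0/1 vector over customers and ranks the other items by cosine similarity (ties broken alphabetically). The function names are illustrative; production recommenders also weigh ratings and recency and work with much larger sparse matrices.

```python
from math import sqrt


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_items(purchases, item, top_n=2):
    """purchases maps customer -> set of purchased items."""
    items = sorted({i for basket in purchases.values() for i in basket})
    customers = sorted(purchases)
    # One 0/1 vector per item: did each customer buy it?
    vectors = {i: [1 if i in purchases[c] else 0 for c in customers] for i in items}
    scores = {other: cosine(vectors[item], vectors[other])
              for other in items if other != item}
    return sorted(scores, key=lambda o: (-scores[o], o))[:top_n]


purchases = {
    "ann": {"tent", "stove", "lamp"},
    "bob": {"tent", "stove"},
    "cy": {"tent", "lamp"},
    "dee": {"kayak", "paddle"},
}
print(similar_items(purchases, "tent"))  # ['lamp', 'stove']
```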
5.2 Forecast Targeting
In general, the core of every targeting approach is to spend the advertising budget only on target users. Forecast targeting predicts whether a user will make a purchase in the next n days. The most popular types of targeting are as follows:
• Segment targeting: show ads to groups of users with the same set of attributes
• Trigger targeting: show ads to users after they take a certain action (e.g., viewing a product or adding an item to the shopping cart)
• Predictive targeting: show ads to users based on the likelihood of them making a purchase [1–10, 32–41].
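The three targeting types differ only in the rule used to select an audience. The sketch below expresses each as a filter over a hypothetical `User` record; `purchase_probability` stands in for the output of a model that predicts a purchase within the next n days, and the 0.5 cutoff is an arbitrary example.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    segment: str
    actions: set = field(default_factory=set)  # e.g. {"viewed_product", "added_to_cart"}
    purchase_probability: float = 0.0          # assumed output of a forecasting model


def segment_targeting(users, segment):
    return [u.user_id for u in users if u.segment == segment]


def trigger_targeting(users, action):
    return [u.user_id for u in users if action in u.actions]


def predictive_targeting(users, threshold=0.5):
    return [u.user_id for u in users if u.purchase_probability >= threshold]


users = [
    User("u1", "new", {"viewed_product"}, 0.10),
    User("u2", "returning", {"added_to_cart"}, 0.72),
    User("u3", "returning", set(), 0.55),
]
print(segment_targeting(users, "returning"))      # ['u2', 'u3']
print(trigger_targeting(users, "added_to_cart"))  # ['u2']
print(predictive_targeting(users))                # ['u2', 'u3']
```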
5.3 Churn Rate Forecasting
In marketing, churn (or outflow) refers to customers who leave a company and the resulting revenue loss; it is frequently expressed in percentage or monetary terms. Churn rate forecasting allows you to predict a customer's intent to leave your product or service before they actually do so. It estimates the likelihood of users leaving for each user segment. These segments can then be exported to email or push notification providers, as well as to Google Ads, Facebook Ads, and other ad networks [1–10, 32–41].
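Two quantities are involved here: the observed churn rate for a period, usually customers lost divided by customers at the start of the period and expressed as a percentage, and a per-user churn probability from a predictive model. The sketch below computes the former and groups users into risk segments from the latter; the probabilities, the 0.4/0.7 band limits, and the function names are hypothetical.

```python
def churn_rate(customers_at_start, customers_lost):
    """Percentage of customers lost during the period."""
    return 100.0 * customers_lost / customers_at_start


def churn_segments(churn_probability, high=0.7, medium=0.4):
    """Group users by predicted probability of leaving."""
    segments = {"high": [], "medium": [], "low": []}
    for user, p in churn_probability.items():
        if p >= high:
            segments["high"].append(user)
        elif p >= medium:
            segments["medium"].append(user)
        else:
            segments["low"].append(user)
    return segments


print(churn_rate(2000, 150))  # 7.5
print(churn_segments({"u1": 0.82, "u2": 0.45, "u3": 0.05}))
# {'high': ['u1'], 'medium': ['u2'], 'low': ['u3']}
```

Each resulting list is the kind of segment that could be exported to a messaging or ad platform for a retention campaign.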
5.4 Choice Modeling
Many studies have been published on discrete and hierarchical choice models, choice interdependence, endogeneity and heterogeneity constraints, and decision-making. A heterogeneous Bayesian semiparametric approach provides a flexible and robust alternative to parametric techniques for modeling choice endogeneity. One such approach is based on a centered Dirichlet process mixture (CDPM) model that captures consumer preference heterogeneity nonparametrically. The findings provide a more robust alternative to models that rely on the normal distribution to deal with endogeneity and heterogeneity. Because consumers' decisions are not always rational, approaches for quantifying the limit of rationality (LoR) in choice modeling applications also need to be developed. Rational separation and choice graph techniques have made it possible to compute LoR quickly for applications such as supermarket sales data and product category identification, where going beyond rational choice models is essential to obtain acceptable performance [1–10, 32–41].
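For reference, the standard parametric baseline in choice modeling is the multinomial logit. In this model, the probability of choosing option i is exp(V_i) divided by the sum of exp(V_j) over all options, where V is a utility computed from attributes such as price. Semiparametric approaches such as the CDPM model relax the distributional assumptions behind models of this kind. The coefficients and products below are hypothetical.

```python
from math import exp


def utility(price, quality, beta_price=-0.5, beta_quality=1.2):
    """Linear utility; the coefficients are illustrative, not estimated."""
    return beta_price * price + beta_quality * quality


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities (softmax of utilities)."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {k: exp(v - top) for k, v in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


options = {
    "brand_a": utility(price=4.0, quality=3.0),
    "brand_b": utility(price=3.0, quality=2.0),
    "store_brand": utility(price=2.0, quality=1.0),
}
probs = mnl_probabilities(options)
print({k: round(p, 3) for k, p in probs.items()})
```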
5.5 Product Matching
The technique of recognizing and connecting identical goods in two or more catalogs is known as product matching. Product matching enables any organization to keep track of the prices charged by competitors for comparable items. It aids in pricing comparisons with competitors or other suppliers of the same items, merging many offers from different vendors into a single product page, and doing assortment gap studies with competitors, among other less obvious applications [1–10, 32–41].
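A simple baseline for product matching compares normalized titles. The sketch below tokenizes titles, scores pairs with Jaccard similarity (shared tokens divided by all distinct tokens), and accepts the best match above a threshold. The catalogs, the 0.6 threshold, and the function names are hypothetical; ML-based matchers typically also learn from attributes, images, and labeled pairs.

```python
import re


def tokens(title):
    """Lower-case alphanumeric tokens of a product title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match_catalogs(ours, theirs, threshold=0.6):
    """Map each of our titles to the most similar competitor title above `threshold`."""
    matches = {}
    if not theirs:
        return matches
    for title in ours:
        best = max(theirs, key=lambda other: jaccard(title, other))
        if jaccard(title, best) >= threshold:
            matches[title] = best
    return matches


ours = ["Acme Steel Water Bottle 750 ml", "Acme Camping Chair"]
theirs = ["ACME steel water-bottle (750 ml)", "Acme Yoga Mat"]
print(match_catalogs(ours, theirs))
# {'Acme Steel Water Bottle 750 ml': 'ACME steel water-bottle (750 ml)'}
```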
5.6 Predicting Customer Behavior
The purpose of a consumer behavior forecasting system is to anticipate how customers will act in the future based on historical behavior data. These tools enable merchants to segment customers and run targeted marketing campaigns that are more effective than broad alternatives. Responding to anticipated consumer needs also boosts loyalty and retention [1–10, 32–41].
5.7 Retail Stocking and Inventory
Optimizing inventory control and predictive maintenance is a vital problem for retailers as well as a significant logistical challenge.
5.7.1 Predicting Inventory
Using purchase data, machine learning systems can predict inventory requirements in real time. These algorithms can present a purchasing manager with a daily dashboard of recommended purchases based on the day of the week, season, nearby events, social media data, and customers' historical behavior. For each product, a prediction model is built from historical data such as previous stockouts, demand, price, and stock level. The model also produces accurate estimates from real-time data such as price fluctuations and date-time characteristics. When products on e-commerce sites are likely to go out of stock, users can be notified. Inventory planning models require consumer behavior data, such as purchase history or purchasing trends, but may also draw on social media activity and domain-specific expertise. These algorithms can also be used to optimize prices, although this requires a sales forecasting model [1–10, 32–41].
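A stockout notification of the kind described above can be sketched with a very small demand model: the average units sold on each weekday, learned from past sales. The functions below (all hypothetical names) project stock forward day by day and raise an alert if it would run out before a reorder could arrive. A real model would also use price, season, events, and other signals.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekday_profile(history):
    """Average units sold per weekday (0 = Monday) from (date, units) records."""
    totals, counts = defaultdict(float), defaultdict(int)
    for day, units in history:
        totals[day.weekday()] += units
        counts[day.weekday()] += 1
    return {wd: totals[wd] / counts[wd] for wd in totals}


def days_until_stockout(stock, start, profile, horizon=90):
    """Number of days (counting `start`) until expected sales exhaust `stock`."""
    day = start
    for n in range(1, horizon + 1):
        stock -= profile.get(day.weekday(), 0.0)
        if stock <= 0:
            return n
        day += timedelta(days=1)
    return None  # no stockout expected within the horizon


def needs_alert(stock, start, profile, lead_time_days):
    """True if stock is expected to run out before a reorder could arrive."""
    days = days_until_stockout(stock, start, profile)
    return days is not None and days <= lead_time_days


# Hypothetical two weeks of sales: 10 units on weekdays, 20 at weekends
first_day = date(2024, 1, 1)  # a Monday
history = []
for i in range(14):
    day = first_day + timedelta(days=i)
    history.append((day, 20 if day.weekday() >= 5 else 10))
profile = weekday_profile(history)
print(days_until_stockout(45, date(2024, 1, 15), profile))  # 5
```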
5.8 e-Commerce
5.8.1 Tagging and Copywriting Automation
Products can be hard to find on a busy website. That is why retail teams spend days writing descriptive text for products and categorizing them by their main characteristics and chosen taxonomy. When computer vision technologies are used to automate item labeling and descriptions, uniform and standardized data can be produced. Simply uploading a photo allows the system to fill in product data automatically and place the product in the appropriate category and hierarchy on the website [1–10, 32–41].
5.8.1.1 Image Retouching
Professional-quality photographs can give an e-commerce site an advantage, increasing the likelihood of a purchase and pleasing manufacturers with how their products are depicted. Pictures are often shot in a studio before being sent out for retouching. Depending on the site, retouching may include several stages, such as calibrating colors for accurate depiction, removing backgrounds, aligning or rotating items, smoothing wrinkles, and erasing mannequins. AI-assisted image retouching saves time and money by replicating, on a much larger scale, the usual Photoshop workflow followed by artists. This can significantly reduce the time required to edit thousands of images [1–10, 32–41].
6 Manufacturing
The next Industry 4.0 frontier has been characterized as connected, adaptive production (CAP): autonomous, networked, and environment-responsive equipment and processes. According to this view, such a dynamic application environment promises high-level efficiency benefits, such as process optimization through loss minimization, lead-time reduction, and adaptive routing and scheduling [1–10, 42–47].
6.1 Predictive Maintenance
Predictive maintenance is a significant use of machine learning in manufacturing because algorithms can forecast the failure of crucial machinery or components. Machine learning can find trends in data from previous maintenance cycles, which can be used to predict equipment failures and when future maintenance is needed. This information can then be used to schedule maintenance before problems emerge. This, in turn, can save manufacturers significant time and money by allowing them to address specific problems precisely when needed and in a highly focused way. Benefits to manufacturers include:
• Significant process-driven loss reductions
• Cost reductions driven by predictive maintenance
• Consumer-driven product creation thanks to smart factories
• Boost in capacity through process optimization
• Ability to scale product lines by streamlining and optimizing processes
• More efficient inventory management by using predictive analytics
• Extended life of machinery and equipment by predicting remaining useful life (RUL)
• Better supply chain management
• Enhanced quality control
• Improved safety conditions on the manufacturing floor through the implementation of deep learning techniques [1–10, 42–47].
6.2 Predictive Quality and Yield
Firms are finding it more difficult to tolerate process-based losses as consumer demand grows in parallel with population growth. AI and machine learning can help firms determine the root causes of losses in quality, productivity, energy efficiency, and other areas, helping them protect their bottom line and remain competitive. This is accomplished through continuous, multivariate analysis with process-tailored ML algorithms and machine learning-enabled root cause analysis (RCA). ML- and AI-driven RCA, in particular, is a powerful tool for eliminating process-based waste and is substantially more effective than manual RCA for the following reasons:
• Machine learning algorithms use models built on past data to detect trends in new data and forecast where losses may occur, preventing problems before they occur.
• It is entirely data-driven and unbiased.
• Free of the distractions of day-to-day administration and other manual duties that occupy process specialists, it focuses solely on improving processes [1–10, 42–47].
6.3 Digital Twin
A digital twin is a real-time digital representation of a physical product or process. Digital twins can help manufacturers transform their engineering processes while offering full customization of design, production, and operations. Manufacturing companies can create a virtual representation of their products and processes to test and optimize them before they are built. Benefits of ML-enabled digital twins in manufacturing include:
• Significant cost reductions
• Improved reliability of production lines
• Optimized performance and productivity
• Reduced risks on the shop floor
• Improved quality
• Full customization
• Streamlined maintenance [1–10, 42–47].
6.4 Generative Design and Smart Manufacturing
Based on specified parameters such as size, materials, and weight, AI and machine learning can generate an almost unlimited number of design solutions for any problem or product. This allows engineers to evaluate the best design choice for a product before it is built. In machine learning, discriminator and generator models are used to:
• Create new designs for specified products
• Distinguish between generated and real products
• Train deep learning algorithms to recognize and define every possible design solution
• Make the computer a design partner [1–10, 42–47].
6.5 Energy Consumption Forecasting
Manufacturers can now use machine learning algorithms to forecast future energy use based on factors such as temperature, lighting, and activity levels within a facility. Machine learning algorithms can search enormous data sets for patterns and relationships that would be difficult to find with traditional methods. Examples include:
• Sequential data measurements
• Autoregressive data models that identify cyclical or seasonal trends
Such forecasts help factory owners and operators plan for future energy needs. They can also help factories avoid production disruptions caused by unexpected changes in energy costs or availability [1–10, 42–47].
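The autoregressive idea can be sketched as a seasonal model. It predicts each period's consumption from the value one season earlier, y[t] = c + φ·y[t − s], with c and φ fitted by ordinary least squares. Several things here are assumptions for illustration:
• the season length;
• the toy series of four readings per day;
• the omission of drivers such as temperature and activity levels.
A production forecaster would include those external factors.

```python
def fit_seasonal_ar(series, season):
    """Fit y[t] = c + phi * y[t - season] by ordinary least squares."""
    x = series[:-season]  # values one season earlier
    y = series[season:]   # values to be explained
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series has no variation to fit")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next_season(series, season):
    """Forecast the next `season` values from the most recent season."""
    c, phi = fit_seasonal_ar(series, season)
    return [c + phi * value for value in series[-season:]]


# Hypothetical energy use (kWh) for four shifts per day over three days.
usage = [50, 80, 90, 60, 52, 83, 95, 61, 55, 85, 97, 64]
print(forecast_next_season(usage, season=4))
```

On a perfectly repeating series the fit returns φ = 1 and c = 0, so the forecast simply repeats the last season. On real data, the fitted coefficients capture drift between seasons.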
H. B. Mehare et al.
6.6
Manufacturing Ergonomics
6.6.1
Operator Model
This model analyzes, categorizes, and evaluates the ergonomic risks of a human operator's activity in manufacturing, including physically and mentally demanding workloads and fatigue. The goals are to identify the relationship between operator activities and work events, and to quantify the relationship between work posture and the degree of ergonomic risk. Wearable devices, sensors, and videos are the primary data sources for recording operators' activities. Typical tasks include:
• Sensing-based activity assessment
• Motion analysis through videos
• Risk stratification of physical workload
• Mental workload evaluation
• Fatigue classification [1–10, 42–47].
6.6.2
Operator and Workspace Interaction Model
Interactions with the surrounding environment are always part of an operator's job. Several factors therefore relate to the risks an operator faces: occupational safety, the impact of the workplace environment, and, in manufacturing, collaboration between human operators and robots (or machines).
• Occupational safety
• Workplace environment
• Human-robot collaboration [1–10, 42–47].
6.6.3
System Design and Optimization Model
In a manufacturing system, all operators' actions are interconnected, and their impacts can propagate throughout the system. Ergonomics should therefore be considered in every aspect of manufacturing system design and optimization, from products to activities and processes:
• Product design
• Task assignment
• Process planning [1–10, 42–47].
6.7
Fault Detection
Rapid and accurate diagnosis of process problems in manufacturing machinery gives industrial businesses a strategic advantage. It helps them stay competitive by reducing machine downtime. Customers expect manufacturers to deliver high-quality items quickly and at a reasonable cost. As a result, machine learning algorithms are likely to find wider use in fault diagnosis for production systems [1–10, 42–47].
7 Cybersecurity
Over the last half-century, the information and communication technology (ICT) industry has evolved dramatically and become pervasive and deeply integrated into modern society. Reliance on digitalization and the Internet of Things (IoT) keeps growing. As a result, security incidents have grown rapidly in recent years, including:
• unauthorized access;
• malware attacks and zero-day attacks;
• data breaches;
• denial of service (DoS);
• social engineering or phishing.
Machine learning algorithms can detect irregular or malicious behavior, as well as data-driven patterns of related security risks, to support informed decisions. The key to automated, intelligent security systems is to extract security event patterns or insights from cybersecurity data and to build the associated data-driven models [1–10, 48–52].
Different use cases of ML in cybersecurity:
(a) Detecting threats and stopping attacks
(b) Analyzing mobile endpoints
(c) Enhancing human analysis
Defense strategies are needed to protect information, information systems, and networks against cyberattacks or intrusions. They are primarily responsible for preventing data breaches and security incidents. They also monitor and respond to intrusions, defined here as any illegal action that damages an information system. An intrusion detection system (IDS) is a hardware or software application that monitors a computer network or system for malicious activity or policy violations. It analyzes security data from multiple critical points in a computer network or system to detect internal or external attacks [1–10, 48–52].
a. Signature-Based IDS
A signature is a predefined pattern, string, or rule that corresponds to a previously known attack. A signature-based IDS flags activity matching such a pattern as an instance of a comparable attack. This approach is also known as knowledge-based or misuse detection. It can process a large volume of network data efficiently, but it is limited to known threats. One of the most significant challenges for signature-based systems is detecting new or previously unseen attacks [1–10, 48–52].
b. Anomaly-Based IDS
Anomaly-based detection addresses the limitations of signature-based IDS. It examines network activity to identify dynamic patterns and automatically builds a data-driven model that profiles expected behavior. It then flags deviations from that profile as irregularities [1–10, 48–52].
Once an intrusion has been detected, an intrusion prevention system (IPS), which is designed to block malicious events, can mitigate the risks. It can do so in several ways, including manual, notification-based, and automatic responses [1–10, 48–52].
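The contrast between the two IDS families can be sketched in a few lines:
• The signature check looks for known attack patterns in a payload, so it misses anything not on its list.
• The anomaly check learns a baseline of normal request rates and flags large deviations. It can catch unfamiliar behavior, at the cost of more false alarms.
The two signatures, the request-rate feature, and the three-standard-deviation threshold are hypothetical choices for illustration. They are not rules from any real IDS product.

```python
from statistics import mean, stdev

# Hypothetical signatures: substrings associated with previously known attacks.
SIGNATURES = {
    "sql_injection": "' OR '1'='1",
    "path_traversal": "../../",
}


def signature_alerts(payload):
    """Signature-based check: report every known pattern found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def anomaly_alert(baseline_rates, observed_rate, threshold=3.0):
    """Anomaly-based check: flag rates far from the learned normal profile."""
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed_rate != mu
    return abs(observed_rate - mu) / sigma > threshold


baseline = [100, 110, 95, 105, 98]  # normal requests per minute (hypothetical)
print(signature_alerts("GET /login?user=admin' OR '1'='1"))  # ['sql_injection']
print(signature_alerts("GET /search?q=shoes"))               # []
print(anomaly_alert(baseline, 400))                          # True
print(anomaly_alert(baseline, 104))                          # False
```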
7.1
Quantum Computing
Quantum computers are powerful systems that process data using quantum-mechanical effects such as superposition and entanglement. They may run new types of algorithms that handle certain problems far more efficiently than classical computers [1–10, 48–52].
7.2
Cloud Computing
Cloud computing introduces significant new security issues that have yet to be solved. More importantly, it illustrates the need for novel approaches to cybersecurity, such as lifecycle management and enterprise integration [1–10, 48–52].
7.3
Predictive Semantics
Semantic technology is one of the least understood but most significant recent IT developments. Few companies have recognized the connection between semantic technology and cybersecurity. Semantic technology provides new techniques for integrating and analyzing data, which will support future predictive and visual analytics. These in turn will help with biometrics, identity management, and the integration of network behavior data [1–10, 48–52].
7.4
Behavioral Identity
One of the most significant advances in the coming decade will be a better understanding of what identity represents. Identity can be defined by a credential, a biometric identifier, or both. It will increasingly be evaluated using hundreds of features and real-time behavior [1–10, 48–52].
7.5
Dynamic Networks
A dynamic network is the next generation of network management, enabling better automation, self-repair, and performance. AI can help discover devices and hidden patterns when analyzing massive amounts of data. Machine learning can help monitor incoming and outgoing communications in the IoT ecosystem for behavioral anomalies. AI and machine learning can also be used to develop low-cost endpoint detection systems. This can be a life-saving alternative when IoT devices lack the computational capacity for heavier defenses and need less resource-intensive, behavior-based detection [1–10, 48–52].
Cybersecurity and machine learning are linked and can increase each other's efficacy. Their combination therefore offers several advantages:
• Stronger Security of ML Models
Attacks on ML models can affect their operation, performance, and predictions. These undesirable scenarios can be mitigated by adopting appropriate cybersecurity technologies, which protect the operation and performance of ML models as well as their input datasets [1–10, 48–52].
• Improved Performance of Cybersecurity Techniques
Machine learning algorithms can improve the efficiency of cybersecurity schemes. The choice among supervised learning, unsupervised learning, reinforcement learning, and deep learning depends on the communication environment and the connected systems [1–10, 48–52].
• Effective Detection of Zero-Day Attacks
Cybersecurity systems that use machine learning models to detect intrusions can be particularly effective at detecting zero-day threats (i.e., previously unknown malware attacks). This is because they compare program attributes rather than relying solely on known signatures. The deployed models collect and compare particular attributes, and a program whose features closely match those of known malicious software may be deemed malicious. Machine learning models can carry out this identification automatically [1–10, 48–52].
• Quick Scanning and Mitigation
Machine learning-based intrusion detection systems can be effective in detecting the presence of threats. Combining machine learning with cybersecurity systems therefore enables rapid monitoring of intrusions and a rapid response to any sign of intrusion [1–10, 48–52].
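The attribute-matching idea behind zero-day detection can be sketched with set similarity. Each program is described by a set of observed behaviors. A new sample is compared with known malware using Jaccard similarity: the number of shared features divided by the number of features in either set. Because the comparison tolerates partial overlap, a new variant that shares most behaviors with a known family can be flagged even though no exact signature exists. The feature names, the known sample, and the 0.5 threshold are hypothetical. Real systems learn such decisions from large labeled corpora rather than a single hand-set cutoff.

```python
def jaccard(a, b):
    """Share of features two programs have in common: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_program(features, known_malware, threshold=0.5):
    """Label a program malicious if it is close enough to any known sample."""
    best = max((jaccard(features, sample) for sample in known_malware), default=0.0)
    return ("malicious" if best >= threshold else "benign"), best


# Hypothetical behavioral features extracted from a known ransomware sample.
known_malware = [
    {"writes_run_key", "disables_defender", "encrypts_user_files", "contacts_tor"},
]

# An unseen variant: one behavior changed, so an exact signature would miss it.
variant = {"disables_defender", "encrypts_user_files", "contacts_tor", "deletes_shadow_copies"}
editor = {"reads_config", "opens_window", "writes_user_files"}

print(classify_program(variant, known_malware))  # ('malicious', 0.6)
print(classify_program(editor, known_malware))   # ('benign', 0.0)
```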
8 Healthcare
The healthcare system is organized into four categories: preventive, diagnostic, corrective, and therapeutic. These areas work together to provide a complete, all-encompassing experience for today's patients. Applying machine learning to these activities can give healthcare organizations many new opportunities:
• Allowing healthcare providers to focus on patient care rather than searching for or entering information
• Increasing diagnostic accuracy
• Developing precise treatment plans [1–10, 53–59].
Given rapid developments in artificial intelligence for image processing, most radiology and pathology images will almost certainly be examined by a computer at some point. Speech and text recognition are already used for tasks such as patient communication and capturing clinical notes. To be widely used, AI systems must meet several conditions. They must be:
• approved by regulators;
• integrated with electronic health record (EHR) systems;
• standardized so that similar products work similarly;
• taught to doctors;
• paid for by public or commercial payers;
• continuously updated in the field [1–10, 53–59].
8.1
Clinical Decision Support Systems
Machine learning is transforming the healthcare industry by applying cognitive technologies to interpret enormous volumes of medical data and to support diagnosis. Clinical decision support systems (CDSS) process massive volumes of data for several purposes: to help diagnose a disease, determine the next therapeutic step, flag potential complications, and improve the overall efficiency of patient care. A CDSS helps physicians work more efficiently and quickly. It also reduces the probability of an inaccurate diagnosis or an ineffective prescription [1–10, 53–59].
8.2
Smart Recordkeeping
Because data entry is time-consuming, it is difficult to keep all patient records up to date. Yet up-to-date records are crucial for sound decision-making and good patient care. One use of machine learning in healthcare is applying optical character recognition (OCR) to physicians' handwriting to speed up and simplify data entry. Other machine learning algorithms can then analyze these data to improve decision-making and patient care [1–10, 53–59].
8.3
Medical Imaging
For a long time, medical images such as X-rays were analog. This made it difficult to apply technology to anomaly identification, case grouping, and disease research in general. The industry's digitalization has since opened the door to several types of data analysis, including machine learning [1–10, 53–59]. Deep learning models can be trained to recognize specific findings in images, such as nodules on chest computed tomography or hemorrhage on brain magnetic resonance imaging. However, hundreds of such narrow detection tasks would be needed to identify all potential findings in medical imaging. In addition, radiologists do much more than read images. They:
• consult with other doctors on diagnosis and treatment;
• treat diseases (e.g., with local ablative therapies);
• perform image-guided interventions such as cancer biopsies and vascular stent placement;
• define the technical parameters of the imaging tests to be performed;
• relate image findings to other health records and diagnostic tests;
• discuss procedures and results with patients [1–10, 53–59].
8.4
Personalized Medicine
Patients often have several conditions that require concurrent treatment. Establishing an effective treatment plan then requires complex considerations that account for drug interactions while minimizing adverse effects. Based on a patient's history, machine learning can suggest many potential therapeutic options. Because these options are based on the patient's own data, they are more personalized and more likely to suit the patient [1–10, 53–59].
8.5
Predictive Adjustments to Treatment
For the most fatal diseases, early detection significantly increases the chances of effective therapy. Continuous monitoring also helps detect a potential worsening of the patient's condition before it happens. AI-powered wearables are being developed to monitor a person's health and alert users when something unusual is detected. These devices track heart rate, sleep cycle, breathing rate, activity level, blood pressure, and other vital signs 24 hours a day, 7 days a week. They may help predict some of the most serious illnesses in at-risk people, including signs of diabetes, liver and kidney problems, and cancer [1–10, 53–59].
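Below is a minimal sketch of the alerting logic, assuming a stream of heart-rate readings. Each reading is compared with the mean and standard deviation of the previous few readings, and values far outside that personal baseline are flagged. The window size, threshold, and readings are hypothetical. A real wearable would rely on clinically validated models and several vital signs rather than this single-signal rule.

```python
from collections import deque
from statistics import mean, stdev


def flag_readings(readings, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate strongly from the recent baseline."""
    recent = deque(maxlen=window)  # rolling personal baseline
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append((i, value))
        recent.append(value)  # note: flagged values also enter the baseline
    return flagged


# Hypothetical resting heart-rate readings (beats per minute).
heart_rate = [72, 74, 71, 73, 72, 75, 73, 118, 74, 72]
print(flag_readings(heart_rate))  # [(7, 118)]
```

Letting flagged values enter the baseline keeps the sketch short. However, it temporarily widens the baseline after a spike, so a production design might exclude confirmed anomalies.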
8.6
Elderly and Low-Mobility Group Care
Machine learning can help people with limited mobility improve their daily lives in several ways. It can provide smart reminders and scheduling assistance. It can predict and help avoid potential injuries by identifying common obstacles and determining the best paths. It can also summon help as quickly as possible [1–10, 53–59].
8.7
Robotic Process Automation
Robotic process automation (RPA) performs structured digital administrative tasks, such as those involving information systems, as if it were a human user following scripts or rules. Compared with other forms of AI, RPA is less expensive, easier to program, and more transparent in its operation. It combines workflow, business rules, and presentation-layer integration with information systems to act as a semi-intelligent user of those systems. In healthcare, it is used for repetitive tasks such as prior authorization, updating patient information, and billing. It can be combined with other technologies, such as image recognition, to extract information from faxed images and enter it into transactional systems [1–10, 53–59].
Surgical operations require exceptional precision, adaptability to changing circumstances, and a consistent approach over an extended period. Trained surgeons possess all of these qualities. Still, one of the possibilities of machine learning in healthcare is for robots to take on some of these tasks. Machine learning, in particular, has the potential to improve operation modeling and planning, assess a surgeon's competence, and simplify surgical tasks such as suturing. Robotic systems offer a high degree of flexibility and control for complex tasks. Robotic surgery is commonly used in gynecologic surgery, prostate surgery, and head and neck surgery [1–10, 53–59].
8.8
Drug Discovery and Production
ML algorithms can draw on previously obtained data on the active ingredients of drugs and how they affect the body. From these data, they can model an active compound that would work in another, analogous situation. This strategy could be used to design tailored therapies for people with a unique set of symptoms or special needs. In the future, this machine learning approach might be combined with nanotechnology to improve drug delivery [1–10, 53–59].
8.9
Clinical Research
New drugs and medical treatments must be proven safe before they are widely used, so clinical research and trials are costly and time-consuming. Machine learning algorithms can help streamline the process in several ways:
• identifying the best trial sample;
• gathering additional data points;
• analyzing ongoing data from trial participants;
• reducing data-related errors [1–10, 53–59].
8.10
Infectious Disease Outbreak Prediction
Predicting outbreaks is especially beneficial in low-resource countries with inadequate medical infrastructure and educational systems. Machine learning-based technologies can analyze satellite data, news, social media feeds, and even video sources. From these sources, they can identify an epidemic or pandemic early and estimate whether a disease is likely to grow out of control [1–10, 53–59].
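As a deliberately simple illustration of early warning, the sketch below uses a statistical rule rather than a machine learning model. It compares reported cases in the most recent week with the week before, and a ratio well above 1 signals accelerating spread. The 1.5 alert ratio and the case counts are hypothetical. The ML systems described above would combine such signals with satellite, news, and social media data.

```python
def weekly_growth_ratio(daily_cases):
    """Cases in the latest 7 days divided by cases in the 7 days before."""
    if len(daily_cases) < 14:
        raise ValueError("need at least 14 days of counts")
    recent = sum(daily_cases[-7:])
    previous = sum(daily_cases[-14:-7])
    if previous == 0:
        # Any cases after a week with none count as unbounded growth.
        return float("inf") if recent > 0 else 0.0
    return recent / previous


def early_warning(daily_cases, alert_ratio=1.5):
    """Raise an alert when weekly cases grow faster than the chosen ratio."""
    return weekly_growth_ratio(daily_cases) >= alert_ratio


# Hypothetical daily case counts for one district over two weeks.
cases = [2, 3, 2, 4, 3, 2, 4, 5, 6, 8, 7, 9, 10, 12]
print(weekly_growth_ratio(cases))  # 2.85
print(early_warning(cases))        # True
```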
8.11
Administration
Machine learning has a variety of administrative applications in healthcare, including claims processing, clinical documentation, revenue cycle management, and medical records administration. Chatbots can be used for patient communication, mental health and wellness, and telehealth. These natural language processing (NLP)-based applications can help with simple tasks such as prescription refills or appointment booking. Machine learning is also applicable to claims and payment administration, where it can be used for probabilistic data matching across multiple databases. Insurers must check millions of claims for accuracy. Detecting, evaluating, and correcting coding errors and fraudulent claims saves time, money, and effort for all stakeholders, including health insurers, governments, and providers. Incorrect claims that slip through the cracks represent substantial value that can be recovered through data matching and claims auditing [1–10, 53–59].
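Data matching across databases can be sketched with fuzzy string comparison. The sketch uses the standard library's `difflib.SequenceMatcher` to score name similarity after normalizing case and spacing. It pairs each claim with the most similar registry record, but only if the score clears a threshold. This is a simplified stand-in. A full probabilistic record-linkage model, such as the Fellegi-Sunter approach, would weigh agreement across several fields such as name, date of birth, and policy number. The record IDs, names, and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1], ignoring case and extra spaces."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def match_records(claims, registry, threshold=0.85):
    """Pair each claim with the most similar registry record, if similar enough."""
    matches = {}
    for claim_id, claim_name in claims.items():
        best_id, best_score = None, 0.0
        for record_id, record_name in registry.items():
            score = similarity(claim_name, record_name)
            if score > best_score:
                best_id, best_score = record_id, score
        # Below the threshold, leave the claim unmatched for manual review.
        matches[claim_id] = best_id if best_score >= threshold else None
    return matches


claims = {"C1": "Jon  Smith", "C2": "Maria Garcia", "C3": "Xavier Unknown"}
registry = {"P100": "John Smith", "P200": "Maria García", "P300": "Wei Chen"}
print(match_records(claims, registry))  # {'C1': 'P100', 'C2': 'P200', 'C3': None}
```

Unmatched claims (here `C3`) are the ones an auditor would examine. They may be new patients, data-entry errors, or potentially fraudulent submissions.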
8.12
Prescription Error
Machine learning can help detect and analyze prescription errors. It evaluates the patient's health data alongside the prescribed drugs to flag possible medication errors for correction [1–10, 53–59].
9 Agriculture
Agriculture is a critical component of the global economy because it provides one of humanity's most basic needs: food. In most countries, it is a key source of employment. Machine learning (ML) has grown alongside big data technologies and high-performance computing. Together they provide new ways to unravel, quantify, and understand data-intensive processes in agricultural operations [1–10, 60–64]. Many factors affect productivity: weather; seasonal sunlight; animal, bird, and insect migration patterns; crop-specific fertilizers and pesticides; planting cycles; and irrigation cycles. Understanding how they all interact is a fitting challenge for machine learning. By integrating machine learning with sensor data, farm management systems are evolving into full artificial intelligence systems. These systems provide deeper recommendations and insights for subsequent decisions and actions, with the ultimate objective of increasing productivity. In this context, ML models are expected to become increasingly common, enabling integrated and relevant solutions [1–10, 60–64].
9.1
Pre-harvesting
Crop and fruit growth are strongly influenced by pre-harvest conditions. In pre-harvesting, machine learning is used to capture several kinds of parameters:
• soil;
• seed quality;
• fertilizer application;
• pruning;
• genetic and environmental variables;
• irrigation [1–10, 60–64].
9.1.1
Soil
Machine learning algorithms can predict or identify soil properties such as pH, soil organic matter, and soil fertility indicators. Soil classification and evaluation help farmers avoid excessive fertilizer expenses, reduce the need for soil analysis specialists, increase profitability, and improve soil health [1–10, 60–64].
9.1.2
Seeds
Many machine learning and image recognition algorithms have been described for automating seed sorting and counting. For example, deep neural network (DNN) models built on convolutional neural networks (CNNs) have been used to count the number of seeds per pod. They have also been used to identify haploid seeds based on shape, morphological expression, and embryo location [1–10, 60–64].
9.1.3
Pesticide and Disease Detection
A real-time decision support system paired with a video sensor module can diagnose plant diseases from images and leaf reflectance. Physical features such as texture, color, hole structure on the fruit, and morphology can be used to identify diseases automatically. Agricultural teams using AI may be able to detect and diagnose pest infestations at an early stage. They can do this by combining infrared camera data from drones with ground sensors that monitor plant health. By merging intelligent sensors with visual data streams from drones, agricultural AI systems can already pinpoint the most infested locations in a growing area. They can then use supervised machine learning algorithms to find an effective pesticide mixture that keeps pests from spreading to healthy crops [1–10, 60–64].
9.1.4
Surveillance
Machine learning can reduce the likelihood of domestic and wild animals accidentally damaging crops. It can also reduce the risk of break-ins or burglaries at remote agricultural sites. Machine learning-based monitoring has proven effective in securing remote sites, optimizing harvests, and preventing trespassing. For example, it can recognize the employees who are authorized to work on site [1–10, 60–64].
9.2
Crop Management
9.2.1
Yield Prediction
Yield prediction is one of the most essential aspects of precision agriculture. It is needed for yield mapping, yield estimation, matching crop supply to demand, and crop management to enhance productivity. Agricultural specialists can now forecast potential yields for a given crop by combining 3D mapping, soil condition data from sensors, and drone-based soil color data [1–10, 60–64].
9.2.2
Crop Quality
Accurate detection and classification of crop quality parameters can increase crop prices while reducing waste. Weeds are difficult to identify and distinguish from crops, which makes weed identification crucial to sustainable agriculture. Combining ML algorithms with sensors can enable accurate, low-cost weed detection and classification with minimal environmental side effects [1–10, 60–64].
9.2.3
Species Recognition
The main goal is to reduce reliance on human specialists and shorten classification time by automatically identifying and classifying plant species. Leaf vein morphology carries detailed information about a leaf's characteristics. Compared with color and shape, it is a good aid to plant identification [1–10, 60–64].
9.3
Harvesting
At this stage, important variables to examine include:
• fruit or crop size;
• skin color and firmness;
• flavor and quality;
• maturity stage;
• market window;
• type identification and harvest classification.
Harvesting robots, machine learning, and deep learning are improving results and helping farmers reduce harvesting losses. Autonomous robots in the field can increase productivity, shorten harvesting time, and ultimately increase farmer profitability [1–10, 60–64].
9.4
Post Harvesting
After all operations from yield estimation through harvesting are complete, subtasks such as determining fruit and vegetable shelf life, post-harvest grading, and export remain. Each country has its own criteria for classifying edible imports and exports. Inadequate post-harvest management can reduce fruit quality and quantity, increasing overall losses. Degradation alone was responsible for 31% of storage losses. Other losses stem from poor harvesting, careless handling, and insufficient packaging and shipping [1–10, 60–64].
Fruit and vegetable quality is defined by attributes such as shape, size, texture, color, and defects. Classifying fruits and vegetables by these quality features involves several steps: data collection, data pre-processing, image segmentation, feature extraction, and classification [1–10, 60–64].
Crop price forecasting based on production rates is critical for developing pricing strategies for a specific crop [1–10, 60–64].
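The final classification step of that pipeline can be sketched with a nearest-centroid classifier. The sketch assumes that image segmentation and feature extraction have already produced three normalized features per fruit: size, color score, and defect fraction, each scaled to 0–1. The classifier averages the labeled examples of each grade and assigns a new fruit to the grade whose average is closest. The grades, features, and values are hypothetical. Nearest-centroid is a stand-in for whichever classifier a real grading line uses. `math.dist` requires Python 3.8 or later.

```python
from math import dist


def fit_centroids(samples):
    """Average the feature vectors of each labeled grade."""
    sums, counts = {}, {}
    for features, grade in samples:
        acc = sums.setdefault(grade, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[grade] = counts.get(grade, 0) + 1
    return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}


def predict_grade(centroids, features):
    """Assign the grade whose centroid is nearest in feature space."""
    return min(centroids, key=lambda g: dist(centroids[g], features))


# Hypothetical training data: ([size, color_score, defect_fraction], grade).
training = [
    ([0.90, 0.80, 0.02], "export"),
    ([0.85, 0.90, 0.05], "export"),
    ([0.60, 0.60, 0.20], "domestic"),
    ([0.55, 0.50, 0.25], "domestic"),
    ([0.30, 0.30, 0.60], "reject"),
    ([0.35, 0.20, 0.70], "reject"),
]
centroids = fit_centroids(training)
print(predict_grade(centroids, [0.88, 0.82, 0.03]))  # export
print(predict_grade(centroids, [0.30, 0.25, 0.65]))  # reject
```

Normalizing every feature to the same range matters here. Otherwise, the feature with the largest numeric scale would dominate the distance.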
9.5
Livestock Management
Livestock applications fall into two subcategories: animal welfare and livestock production. Animal welfare concerns the health and well-being of animals, and there machine learning is mostly used to monitor animal behavior for early detection of ailments. Livestock production, on the other hand, addresses problems in the production system. There, ML applications focus primarily on accurately predicting farmers' economic balances from production line monitoring [1–10, 60–64].
9.6
Water Management
Water management in agricultural production requires significant effort and is critical to hydrological, climatological, and agronomic balance. Machine learning can help farmers increase their output in several ways: finding irrigation leaks, adjusting irrigation systems, and determining how much frequent watering improves crop yields. Accurate computation of evapotranspiration is a complex procedure that is crucial for crop production resource management as well as irrigation system design and operation. The daily dew point temperature, in turn, is critical for predicting meteorological events and estimating evapotranspiration and evaporation [1–10, 60–64].
Challenges Associated with Machine Learning in Real-World Applications
The benefits of machine learning are enormous, but they come with downsides. The following are some of the issues encountered when building machine learning solutions.
1. Data
Data is the most fundamental requirement for developing machine learning models. The decisions made by a machine learning algorithm depend entirely on the data on which it was trained. Many researchers have encountered data challenges such as:
• a lack of data;
• data not available in the right format;
• low data quality;
• data containing extraneous parts.
Not all data is as precise and standardized as it should be: records have gaps, profiles are inconsistent, and other concerns arise [1–10, 71–74].
2. Data Pre-processing
Because data has so many flaws, numerous pre-processing approaches must be used to prepare it for training, testing, and validating the model, which can be time-consuming [1–10, 71–74].
3. Selection of Machine Learning Algorithms
Many machine learning algorithms are available, which makes it difficult to select the best one for a particular model. Often, an algorithm must be chosen somewhat arbitrarily, or the results of several algorithms are compared to choose the most suitable one. This trial-and-error strategy can delay model deployment [1–10, 71–74].
4. Training and Testing of the Machine Learning Model
Building an accurate model requires large volumes of training data. Before deployment, testing and validation are also required to confirm the model's accuracy. Building a model from scratch that achieves the planned, realistic outcomes requires extensive training and repeated testing, both of which take time. It also needs high-end hardware, programmers with subject-matter expertise, testing tools, and so on. Overfitting and underfitting are key challenges when developing models [1–10, 71–74].
5. Model Deployment
Moving models into production is often the most challenging phase. The difficulties include:
• a lack of deployment skills;
• third-party library dependencies;
• model size;
• complex real-world scenarios;
• hardware limits of deployment platforms (such as Android phones and embedded boards) [1–10, 71–74].
6. Compatibility
Combining machine learning with other domains involves many types of dynamic domain components and machine learning approaches. Furthermore, the data used in the analysis come from many sources, including IoT devices that communicate via a variety of protocols. Integrating several algorithms can lead to compatibility issues and/or system overload, hindering the system's actual functioning [1–10, 71–74].
7. Privacy and Data Security
Everyone should be able to keep their medical records private. In most cases, machine learning does not need data that identifies individuals. The data can therefore be anonymized so that a person's identity is not revealed, without compromising the model's accuracy [1–10, 71–74].
8. Autonomy
Deploying machine learning may lead people to give up their autonomy and simply do as they are told, narrowing their range of options to certain preferred ones. There should therefore be a clear balance between algorithmic recommendations and personal freedom [1–10, 71–74].
9. Transparency and Informed Consent
Many countries have laws that prohibit using personal, sensitive, or private data without the informed consent of the data's owner or holder. The use of machine learning in healthcare should therefore be accompanied by notice to the user. It should also include safeguards that keep the user's data secure [1–10, 71–74].
10. Representation and Inclusivity
When developing a comprehensive healthcare software system, make sure its algorithms perform well across a diverse set of patients. A machine learning solution should therefore be trained on a sufficiently diverse range of cases and backgrounds [1–10, 71–74].
11. Skill Management
It is critical to promote good team collaboration so that value can be delivered and the product's viability demonstrated as soon as possible. A successful machine learning development team should include the following roles:
• Business analyst
• Data architect
• Data engineer
• Data scientist
• Machine learning expert [1–10, 71–74].
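The trial-and-error comparison of algorithms, and the overfitting and underfitting concerns raised above, are commonly handled with k-fold cross-validation. Each candidate is trained on k − 1 folds and scored on the held-out fold. A model that is too simple to capture the trend (underfitting) or that merely memorizes its training data (overfitting) then shows a high held-out error. The sketch below compares two hypothetical candidates on synthetic data: a constant mean predictor and a simple linear regression. It uses contiguous folds for brevity; shuffled or stratified folds are usually preferable.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_size, remainder = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean (likely to underfit)."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate 2: least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_val_mse(fit, xs, ys, k=5):
    """Mean squared error on held-out folds, pooled across all k folds."""
    errors = []
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return sum(errors) / len(errors)


# Synthetic data: a linear trend with small alternating noise.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]

candidates = {"mean baseline": fit_mean, "linear regression": fit_linear}
scores = {name: cross_val_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("selected:", best)
```

The same loop extends to any number of candidates. This replaces ad hoc trial and error with a repeatable selection rule, although it multiplies training time by k.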
References
1. Mitchell, T. M. (1997). Machine learning (Vol. 1, No. 9). McGraw-Hill.
2. Mjolsness, E., & DeCoste, D. (2001). Machine learning for science: State of the art and future prospects. Science, 293(5537), 2051–2055.
3. Solomonoff, R. J. (2006). Machine learning-past and future. Dartmouth.
4. Surya, L. (2016). An exploratory study of Machine Learning and It's future in the United States. International Journal of Creative Research Thoughts (IJCRT), 2320–2882.
5. Zhou, Z. H. (2016). Learnware: On the future of machine learning. Frontiers of Computer Science, 10(4), 589–590.
6. Helm, J. M., Swiergosz, A. M., Haeberle, H. S., Karnuta, J. M., Schaffer, J. L., Krebs, V. E., et al. (2020). Machine learning and artificial intelligence: Definitions, applications, and future directions. Current Reviews in Musculoskeletal Medicine, 13(1), 69–76.
7. Morocho-Cayamcela, M. E., Lee, H., & Lim, W. (2019). Machine learning for 5G/B5G mobile and wireless communications: Potential, limitations, and future directions. IEEE Access, 7, 137184–137206.
8. Kopuru, M. S. K. (2020). A machine learning framework for prediction of diagnostic trouble codes in automobiles. Mississippi State University.
9. Dimitrakopoulos, G., & Demestichas, P. (2010). Systems based on cognitive networking principles and management functionality. IEEE Transactions on Vehicular Technology, 5, 77–84.
10. Alzubi, J., Nayyar, A., & Kumar, A. (2018, November). Machine learning from theory to algorithms: an overview. In Journal of physics: Conference series (Vol. 1142, No. 1, p. 012012). IOP Publishing.
11. Jindal, M., Gupta, J., & Bhushan, B. (2019, October). Machine learning methods for IoT and their future applications. In 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) (pp. 430–434). IEEE.
12. Qureshi, K. N., & Abdullah, A. H. (2013). A survey on intelligent transportation systems. Middle-East Journal of Scientific Research, 15(5), 629–642.
13. An, S. H., Lee, B. H., & Shin, D. R. (2011, July). A survey of intelligent transportation systems. In 2011 third international conference on computational intelligence, communication systems and networks (pp. 332–337). IEEE.
14. Figueiredo, L., Jesus, I., Machado, J. T., Ferreira, J. R., & De Carvalho, J. M. (2001, August). Towards the development of intelligent transportation systems. In ITSC 2001. 2001 IEEE intelligent transportation systems. Proceedings (Cat. No. 01TH8585) (pp. 1206–1211). IEEE.
15. Zhang, J., Wang, F. Y., Wang, K., Lin, W. H., Xu, X., & Chen, C. (2011). Data-driven intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems, 12(4), 1624–1639.
16. Samuel, A. L. (1967). Some studies in machine learning using the game of checkers. II – Recent progress. IBM Journal of Research and Development, 11(6), 601–617.
17. Jiang, Y., Liu, Y., Liu, D., & Song, H. (2020, August). Applying machine learning to aviation big data for flight delay prediction. In 2020 IEEE international conference on dependable, autonomic and secure computing, International conference on pervasive intelligence and computing, International conference on cloud and big data computing, International conference on cyber science and technology congress (DASC/PiCom/CBDCom/CyberSciTech) (pp. 665–672). IEEE.
18. Gui, G., Liu, F., Sun, J., Yang, J., Zhou, Z., & Zhao, D. (2019). Flight delay prediction based on aviation big data and machine learning. IEEE Transactions on Vehicular Technology, 69(1), 140–150.
19. Madeira, T., Melício, R., Valério, D., & Santos, L. (2021). Machine learning and natural language processing for prediction of human factors in aviation incident reports. Aerospace, 8(2), 47.
20. Hansen, C. J., DiCostanzo, D., Mumaw, R. J., & Patterson, E. S. (2020, September). Healthcare and aviation: Perspectives on alerts, machine learning, and future directions. In Proceedings of the international symposium on human factors and ergonomics in health care (Vol. 9, No. 1, pp. 113–115). SAGE Publications.
21. Michaels, W. L. (Ed.). (2019). Machine learning to improve marine science for the sustainability of living ocean resources: Report from the 2019 Norway-US Workshop. US Department of Commerce, National Oceanic and Atmospheric Administration, NOAA Fisheries.
22. Obradović, I., Miličević, M., & Žubrinić, K. (2014). Machine learning approaches to maritime anomaly detection.
Naše more: znanstveni časopis za more i pomorstvo, 61(5–6), 96–101. 23. Rawson, A., & Brito, M. (2022). A survey of the opportunities and challenges of supervised machine learning in maritime risk analysis. Transport Reviews, 1–23. 24. Akyuz, E., Cicek, K., & Celik, M. (2019). A comparative research of machine learning impact to future of maritime transportation. Procedia Computer Science, 158, 275–280. 25. Makridis, G., Kyriazis, D., & Plitsos, S. (2020, September). Predictive maintenance leveraging machine learning for time-series forecasting in the maritime industry. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) (pp. 1–8). IEEE. 26. Zhang, D., & Tsai, J. J. (Eds.). (2005). Machine learning applications in software engineering (Vol. 16). World Scientific. 27. Sperling, A., & Lickerman, D. (2012, July). Integrating AI and machine learning in software engineering course for high school students. In Proceedings of the 17th ACM annual conference on innovation and technology in computer science education (pp. 244–249). 28. Chiong, K. X., & Shum, M. (2019). Random projection estimation of discrete-choice models with large choice sets. Management Science, 65(1), 256–271. 29. Zhang, D., & Tsai, J. J. (2003). Machine learning and software engineering. Software Quality Journal, 11(2), 87–119. 30. Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., . . . & Zimmermann, T. (2019, May). Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSESEIP) (pp. 291–300). IEEE. 31. Zhang, D., & Tsai, J. J. (Eds.). (2006). Advances in machine learning applications in software engineering. Igi Global. 32. Brei, V. A. (2020). Machine learning in marketing: Overview, learning strategies, applications, and future developments. Foundations and Trends® in Marketing, 14(3), 173–236. 33. Ma, L., & Sun, B. (2020). 
Machine learning and AI in marketing–connecting computing power to human insights. International Journal of Research in Marketing, 37(3), 481–504. 34. Hair, J. F., Jr., & Sarstedt, M. (2021). Data, measurement, and causal inferences in machine learning: Opportunities and challenges for marketing. Journal of Marketing Theory and Practice, 29(1), 65–77.
Future Prospects
219
35. Siau, K., & Yang, Y. (2017, May). Impact of artificial intelligence, robotics, and machine learning on sales and marketing. In Twelve Annual Midwest Association for Information Systems Conference (MWAIS 2017) (Vol. 48, pp. 18–19). 36. Hagen, L., Uetake, K., Yang, N., Bollinger, B., Chaney, A. J., Dzyabura, D., et al. (2020). How can machine learning aid behavioral marketing research? Marketing Letters, 31(4), 361–370. 37. Miklosik, A., Kuchta, M., Evans, N., & Zak, S. (2019). Towards the adoption of machine learning-based analytical tools in digital marketing. IEEE Access, 7, 85705–85718. 38. Huber, J., & Stuckenschmidt, H. (2020). Daily retail demand forecasting using machine learning with emphasis on calendric special days. International Journal of Forecasting, 36(4), 1420–1438. 39. Kumar, M. R., Venkatesh, J., & Rahman, A. M. J. (2021). Data mining and machine learning in retail business: Developing efficiencies for better customer retention. Journal of Ambient Intelligence and Humanized Computing, 1–13. 40. Krishna, A., Akhilesh, V., Aich, A., & Hegde, C. (2018, December). Sales-forecasting of retail stores using machine learning techniques. In 2018 3rd International Conference on Computational Systems and Information Technology for Sustainable Solutions (CSITSS) (pp. 160–166). IEEE. 41. Kaneko, Y., & Yada, K. (2016, December). A deep learning approach for the prediction of retail store sales. In 2016 IEEE 16th International conference on data mining workshops (ICDMW) (pp. 531–537). IEEE. 42. Qi, X., Chen, G., Li, Y., Cheng, X., & Li, C. (2019). Applying neural-network-based machine learning to additive manufacturing: Current applications, challenges, and future perspectives. Engineering, 5(4), 721–729. 43. Lee, S., Liu, L., Radwin, R., & Li, J. (2021). Machine learning in manufacturing ergonomics: Recent advances, challenges, and opportunities. IEEE Robotics and Automation Letters, 6(3), 5745–5752. 44. Schuh, G., Scholz, P., & Nadicksbernd, M. 
(2020, October). Identification and characterization of challenges in the future of manufacturing for the application of machine learning. In 2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS) (pp. 1–10). IEEE. 45. Rai, R., Tiwari, M. K., Ivanov, D., & Dolgui, A. (2021). Machine learning in manufacturing and industry 4.0 applications. International Journal of Production Research, 59(16), 4773–4778. 46. Wang, C., Tan, X. P., Tor, S. B., & Lim, C. S. (2020). Machine learning in additive manufacturing: State-of-the-art and perspectives. Additive Manufacturing, 36, 101538. 47. Sharp, M., Ak, R., & Hedberg, T., Jr. (2018). A survey of the advancing use and development of machine learning in smart manufacturing. Journal of Manufacturing Systems, 48, 170–179. 48. Geluvaraj, B., Satwik, P. M., & Ashok Kumar, T. A. (2019). The future of cybersecurity: Major role of artificial intelligence, machine learning, and deep learning in cyberspace. In International conference on computer networks and communication technologies (pp. 739–747). Springer. 49. Sarker, I. H., Kayes, A. S. M., Badsha, S., Alqahtani, H., Watters, P., & Ng, A. (2020). Cybersecurity data science: An overview from machine learning perspective. Journal of Big data, 7(1), 1–29. 50. Fraley, J. B., & Cannady, J. (2017, March). The promise of machine learning in cybersecurity. In SoutheastCon 2017 (pp. 1–6). IEEE. 51. Shaukat, K., Luo, S., Varadharajan, V., Hameed, I. A., Chen, S., Liu, D., & Li, J. (2020). Performance comparison and current challenges of using machine learning techniques in cybersecurity. Energies, 13(10), 2509. 52. Wazid, M., Das, A. K., Chamola, V., & Park, Y. (2022). Uniting cyber security and machine learning: Advantages, challenges and future research. ICT Express. 53. Davenport, T., & Kalakota, R. (2019). The potential for artificial intelligence in healthcare. Future Healthcare Journal, 6(2), 94.
220
H. B. Mehare et al.
54. Bhardwaj, R., Nambiar, A. R., & Dutta, D. (2017, July). A study of machine learning in healthcare. In 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC) (Vol. 2, pp. 236–241). IEEE. 55. Obermeyer, Z., & Emanuel, E. J. (2016). Predicting the future – Big data, machine learning, and clinical medicine. The New England Journal of Medicine, 375(13), 1216. 56. Weiss, J., Kuusisto, F., Boyd, K., Liu, J., & Page, D. (2015). Machine learning for treatment assignment: Improving individualized risk attribution. In AMIA annual symposium proceedings (Vol. 2015, p. 1306). American Medical Informatics Association. 57. Choy, G., Khalilzadeh, O., Michalski, M., Do, S., Samir, A. E., Pianykh, O. S., et al. (2018). Current applications and future impact of machine learning in radiology. Radiology, 288(2), 318. 58. Bibault, J. E., Giraud, P., & Burgun, A. (2016). Big data and machine learning in radiation oncology: State of the art and future prospects. Cancer Letters, 382(1), 110–117. 59. Pallathadka, H., Mustafa, M., Sanchez, D. T., Sajja, G. S., Gour, S., & Naved, M. (2021). Impact of machine learning on management, healthcare and agriculture. In Materials today: Proceedings. 60. Meshram, V., Patil, K., Meshram, V., Hanchate, D., & Ramkteke, S. D. (2021). Machine learning in agriculture domain: A state-of-art survey. Artificial Intelligence in the Life Sciences, 1, 100010. 61. Liakos, K. G., Busato, P., Moshou, D., Pearson, S., & Bochtis, D. (2018). Machine learning in agriculture: A review. Sensors, 18(8), 2674. 62. Dutta, R., Smith, D., Rawnsley, R., Bishop-Hurley, G., Hills, J., Timms, G., & Henry, D. (2015). Dynamic cattle behavioural classification using supervised ensemble classifiers. Computers and Electronics in Agriculture, 111, 18–28. 63. Matthews, S. G., Miller, A. L., PlÖtz, T., & Kyriazakis, I. (2017). Automated tracking to measure behavioural changes in pigs for health and welfare monitoring. Scientific Reports, 7(1), 1–12. 64. 
Benos, L., Tagarakis, A. C., Dolias, G., Berruto, R., Kateris, D., & Bochtis, D. (2021). Machine learning in agriculture: A comprehensive updated review. Sensors, 21(11), 3758. 65. Goodell, J. W., Kumar, S., Lim, W. M., & Pattnaik, D. (2021). Artificial intelligence and machine learning in finance: Identifying foundations, themes, and research clusters from bibliometric analysis. Journal of Behavioral and Experimental Finance, 32, 100577. 66. Culkin, R., & Das, S. R. (2017). Machine learning in finance: The case of deep learning for option pricing. Journal of Investment Management, 15(4), 92–100. 67. Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial intelligence: An agenda (pp. 507–547). University of Chicago Press. 68. Aziz, S., Dowling, M., Hammami, H., & Piepenbrink, A. (2022). Machine learning in finance: A topic modeling approach. European Financial Management, 28(3), 744–770. 69. Warin, T., & Stojkov, A. (2021). Machine learning in finance: A metadata-based systematic review of literature. Journal of Risk and Financial Management, 14(7), 302. 70. Emerson, S., Kennedy, R., O'Shea, L., & O’Brien, J. (2019, May). Trends and applications of machine learning in quantitative finance. In the 8th international conference on economics and finance research (ICEFR 2019). 71. Kumar, A., Boehm, M., & Yang, J. (2017, May). Data management in machine learning: Challenges, techniques, and systems. In Proceedings of the 2017 ACM international conference on management of data (pp. 1717–1722). 72. Schelter, S., Biessmann, F., Januschowski, T., Salinas, D., Seufert, S., & Szarvas, G. (2018). On challenges in machine learning model management. 73. Paleyes, A., Urma, R. G., & Lawrence, N. D. (2020). Challenges in deploying machine learning: A survey of case studies. ACM Computing Surveys (CSUR). 74. Hutter, F., Kotthoff, L., & Vanschoren, J. (2019). Automated machine learning: Methods, systems, challenges (p. 219). Springer Nature.
Case Study 1: Human Emotion Detection

Jishnu Pillai Anilkumar, Hussam Bin Mehare, and Mohammad “Sufian” Badar
J. P. Anilkumar
Department of Computer Science and Engineering, Presidency University, Bengaluru, Karnataka, India

H. B. Mehare
Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India

M. S. Badar (✉)
Department of Computer Science and Engineering, School of Engineering Sciences and Technology (SEST), Jamia Hamdard, New Delhi, India
(Former) Department of Bioengineering, University of California, Riverside, CA, USA
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_9

1 Introduction

Convolutional neural networks (CNNs) are a state-of-the-art technology that enables machines to recognize and categorize human emotions in images and videos. Emotion detection has uses in many industries, including psychology, marketing, entertainment, and security, and it is gaining popularity rapidly because it offers insight into human emotions, behavior, and decision-making.

CNNs are deep neural networks that use convolutional layers to identify and classify features in an image. To detect emotions with a CNN, a model must be trained on a large dataset of images or videos annotated with the corresponding emotions, such as Anger, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise. During training, the model adjusts its weights to reduce the discrepancy between its predictions and the true labels of the training images, and in doing so learns the image patterns that characterize each emotion. Once the model is trained, the emotions in new images or videos can be predicted by passing them through the network. For each input image or video, the model computes a probability for each emotion and assigns the label of the emotion with the highest probability.

CNN-based emotion detection can run in real time, which makes it well suited to applications that require prompt emotion identification, such as social robots or virtual assistants. The accuracy of CNN-based emotion detection depends on several factors, including the design of the CNN model, the preprocessing methods employed, and the quality of the training data. Thanks to recent developments in deep learning, CNN-based models have achieved state-of-the-art results in emotion recognition, in some reports exceeding human accuracy.

In conclusion, CNN-based emotion detection is a powerful technology with the potential to change numerous industries by offering insights into human emotions and behavior. As deep learning advances and adoption grows, CNN-based models are becoming more accurate and dependable, which opens the door to new applications in the future.
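To make the final step concrete: the network's last layer produces one raw score per emotion, a softmax turns those scores into probabilities, and the predicted label is the one with the highest probability. The sketch below illustrates this in plain Python. The scores are made-up numbers, and the class order is an assumption for illustration (Keras actually assigns label indices from the sorted folder names).

```python
import math

# Order assumed for illustration only; Keras derives it from sorted folder names.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness", "Neutral", "Sadness", "Surprise"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_emotion(scores):
    """Return the most probable emotion and the full probability list."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs

# Hypothetical raw scores for one face image.
label, probs = predict_emotion([0.2, -1.0, 0.1, 2.5, 0.7, -0.3, 1.1])
print(label)                 # Happiness
print(round(sum(probs), 6))  # 1.0
```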
2 Google Colaboratory

There are many ways to create and deploy a machine learning or deep learning model. In this case study, we use Google Colaboratory, or Colab for short: an online development environment for writing and executing Jupyter Notebook files. Colab is built on top of Google Drive and provides free access to GPUs and TPUs (the free version has usage limits). With Colab, we can write and run Python code, perform machine and deep learning tasks, and collaborate with others. Before we begin coding, we need to set up a few things. Follow the steps below:
2.1 Click on Runtime.

2.2 Click on Change runtime type.

2.3 Select GPU from the drop-down menu and click Save.
That completes the setup. We can now start coding our CNN model.

Note: Use this link to access the online resource that can help you get started with Google Colaboratory: https://colab.research.google.com/
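After saving the runtime change, it can be worth confirming that a GPU is actually visible to the notebook. The following small check is a sketch, not part of the authors' notebook. It assumes TensorFlow 2.x, whose `tf.config.list_physical_devices("GPU")` lists the GPUs TensorFlow can use, and it simply returns False when TensorFlow is not installed.

```python
import importlib.util

def gpu_available():
    """Return True if TensorFlow is installed and can see at least one GPU."""
    if importlib.util.find_spec("tensorflow") is None:
        return False  # TensorFlow is not installed in this environment
    import tensorflow as tf
    return len(tf.config.list_physical_devices("GPU")) > 0

print("GPU available:", gpu_available())
```

In a Colab session where the GPU runtime was selected, this should print True; if it prints False, recheck the runtime type.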
3 Model Implementation

The code and all related materials are available at https://www.github.com/Jishnnu/Emotion-Detection
We briefly touched upon the Python libraries for machine learning in Chap. 2. Let us now take a hands-on approach to deepen our understanding of these concepts. Emotion detection, a subfield of machine learning, involves identifying and evaluating human emotions from a variety of sources, including text, audio, facial expressions, and physiological signals. Psychology, healthcare, customer service, and marketing are just a few of the industries where emotion detection has applications. Machine learning algorithms are trained on labeled datasets to identify patterns in the input data that correspond to various emotional states. For instance, facial expression recognition algorithms may be trained to recognize expressions indicative of emotions such as surprise, anger, sadness, or happiness. Here, we are particularly interested in exploring an image dataset with a deep learning model.
3.1 Deep Learning

Deep learning entails training artificial neural networks with many layers so that they learn hierarchical representations of data. Because these deep neural networks can autonomously learn and extract complex features from high-dimensional data, they achieve state-of-the-art performance on a number of tasks, including speech and image recognition, natural language processing, and game playing.
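3.2 Dataset

We are using an image-based dataset [2] that shows seven different human emotions: Anger, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise. In Colab, the following code mounts Google Drive so that the dataset stored there can be accessed from the notebook:

```python
from google.colab import drive
drive.mount('/content/drive')
```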
3.3 Convolutional Neural Network

Convolutional neural networks (CNNs) are deep neural networks typically employed to analyze visual data. For tasks like image classification, object recognition, and image segmentation, they use a specific architecture with convolutional layers that automatically learn features from images and other forms of multidimensional data. The code in the following cells builds a CNN model for human emotion detection.
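Before the Keras code, it may help to see what a single convolutional filter computes. The sketch below is plain Python and not how Keras implements Conv2D: it slides a 3 x 3 filter over a toy 6 x 6 image without padding, applies ReLU, and then applies 2 x 2 max pooling. The image and the hand-made vertical-edge filter are assumptions for illustration; in a CNN, the filter weights are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding ("valid") and stride 1.
    Like Keras Conv2D, this is cross-correlation: the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def relu(fmap):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2 x 2 patch."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

# Toy 6 x 6 grayscale image: a bright vertical stripe in column 3.
image = [[0, 0, 0, 9, 0, 0] for _ in range(6)]
# Hand-made filter that responds where brightness drops from left to right.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

fmap = relu(conv2d_valid(image, kernel))
print(fmap)                # [[0, 0, 0, 27], [0, 0, 0, 27], [0, 0, 0, 27], [0, 0, 0, 27]]
print(max_pool_2x2(fmap))  # [[0, 27], [0, 27]]
```

The negative response on the stripe's left edge is removed by ReLU, and pooling halves each spatial dimension while keeping the strongest response in each patch.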
```python
# Importing necessary libraries
import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
```
A convenient way to import data into Google Colab is to upload a zipped version of the dataset to Google Drive and use the following command to unzip it into the runtime. Redirecting the output to > /dev/null suppresses the unzip command's output.

```python
# Unzipping the dataset
!unzip /content/drive/MyDrive/Emotion_Detection/images.zip > /dev/null

# Setting the paths for our training and validation data.
# This is my custom path, and it may be different from your folder/file path
folder = "/content/images"
train_data_path = os.path.join(folder, "train")
test_data_path = os.path.join(folder, "validation")
```
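If you prefer to stay in Python rather than use the shell command, the standard-library `zipfile` module can perform the same extraction. This is a sketch that assumes, as the paths above imply, that the archive contains a top-level `images/` folder with `train/` and `validation/` subfolders. The helper name `extract_dataset` and the throwaway demo archive are hypothetical.

```python
import os
import tempfile
import zipfile

def extract_dataset(zip_path, dest_dir):
    """Unzip zip_path into dest_dir (like running `!unzip` from dest_dir) and
    return the train and validation folder paths of the assumed layout."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    folder = os.path.join(dest_dir, "images")
    return os.path.join(folder, "train"), os.path.join(folder, "validation")

# Self-contained demo with a throwaway archive. In Colab you would instead call
# extract_dataset("/content/drive/MyDrive/Emotion_Detection/images.zip", "/content")
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "images.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("images/train/happy/0.png", b"")  # empty placeholder files
    zf.writestr("images/validation/happy/1.png", b"")

train_path, val_path = extract_dataset(demo_zip, demo_dir)
print(os.listdir(train_path), os.listdir(val_path))  # ['happy'] ['happy']
```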
The following code displays nine images of the emotion stored in the emotion variable in a 3 x 3 grid. The images are loaded from the training directory under the folder variable with the load_img() function from the keras.preprocessing.image module and resized to 48 x 48 pixels. The plt.style.use('dark_background') call gives the plot a dark background, plt.figure(figsize=(12, 12)) sets the figure size to 12 x 12 inches, the imshow() function of the Matplotlib library draws each image, and plt.show() displays the plot.

```python
# Viewing images in the dataset
emotion = 'happy'
plt.style.use('dark_background')
plt.figure(figsize=(12, 12))

image_dir = os.path.join(folder, "train", emotion)
file_names = os.listdir(image_dir)  # list the folder once instead of on every iteration

for i in range(1, 10):
    plt.subplot(3, 3, i)
    img = load_img(os.path.join(image_dir, file_names[i]), target_size=(48, 48))
    plt.imshow(img)

plt.show()
```
3.4 Data Generators
The following code creates data generators for the training and validation datasets, which load images in batches during model training. The flow_from_directory method of each ImageDataGenerator object takes the path of the directory containing the images, the target size of the images, the batch size, the color mode (here, grayscale), and the class mode (here, categorical). It returns an iterator that yields batches of images together with their labels while the model is being trained.

```python
# Create data generators for our train and validation datasets
batch_size = 32
img_size = (48, 48)

# The following lines define an ImageDataGenerator object for data augmentation
# during training of a neural network for image classification.
train_datagen = ImageDataGenerator(
    rescale=1./255,          # Scales the pixel values of the image to be between 0 and 1.
    rotation_range=20,       # Randomly rotates the image by up to 20 degrees.
    zoom_range=0.2,          # Randomly zooms into or out of the image by up to 20%.
    width_shift_range=0.2,   # Randomly shifts the image horizontally by up to 20% of its width.
    height_shift_range=0.2,  # Randomly shifts the image vertically by up to 20% of its height.
    horizontal_flip=True,    # Randomly flips images horizontally.
    fill_mode='nearest'      # Fills the empty space created by the transformations above
                             # with the nearest pixel value.
)

# Validation images are only rescaled; no augmentation is applied to them.
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_path,
    target_size=img_size,
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='categorical'
)

val_generator = val_datagen.flow_from_directory(
    test_data_path,
    target_size=img_size,
    batch_size=batch_size,
    color_mode='grayscale',
    class_mode='categorical'
)
```

```
Found 28821 images belonging to 7 classes.
Found 7066 images belonging to 7 classes.
```
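flow_from_directory infers the classes from the subfolder names and maps them to label indices in alphanumeric order; the mapping it used is available afterwards as `train_generator.class_indices`. The sketch below imitates that bookkeeping with the standard library so the idea can be tried without TensorFlow. The helper names and the three demo folders are hypothetical, and unlike Keras, which only counts files with image extensions, it counts every file in each folder.

```python
import os
import tempfile

def infer_classes(directory):
    """Treat each subfolder as a class and number the classes in sorted order."""
    classes = sorted(
        d for d in os.listdir(directory) if os.path.isdir(os.path.join(directory, d))
    )
    class_indices = {name: i for i, name in enumerate(classes)}
    counts = {name: len(os.listdir(os.path.join(directory, name))) for name in classes}
    return class_indices, counts

def one_hot(index, num_classes):
    """class_mode='categorical' represents label `index` as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

# Demo on a throwaway folder tree with hypothetical folder names.
root = tempfile.mkdtemp()
for name, n_files in [("sad", 2), ("happy", 3), ("angry", 1)]:
    os.makedirs(os.path.join(root, name))
    for k in range(n_files):
        open(os.path.join(root, name, f"{k}.png"), "wb").close()

indices, counts = infer_classes(root)
print(indices)  # {'angry': 0, 'happy': 1, 'sad': 2}
print(counts)   # {'angry': 1, 'happy': 3, 'sad': 2}
print(one_hot(indices["happy"], len(indices)))  # [0.0, 1.0, 0.0]
```

Checking the per-class counts this way is also a quick means of spotting class imbalance before training.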
3.5 Hyper-Parameter Description

| Hyper-parameter | Description |
|---|---|
| Activation: ReLU | The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs the input directly if it is positive and zero otherwise. It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance. |
| Model: Sequential | A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. |
| Max pooling | Maximum pooling, or max pooling, is a pooling operation that calculates the maximum value in each patch of each feature map. The results are downsampled (pooled) feature maps that highlight the most present feature in each patch, rather than the average presence of the feature as in average pooling. |
| Padding | The padding parameter of the Keras Conv2D class can take one of two values: “valid” or “same”. Setting it to “valid” means that the input is not zero-padded, so the spatial dimensions shrink as the convolution is applied. Setting it to “same” pads the input with zeros so that, with a stride of 1, the output has the same spatial dimensions as the input. |
| Batch normalization | Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This stabilizes the learning process and can substantially reduce the number of training epochs required to train deep networks. |
| Dropout | Dropout is a technique used to prevent a model from overfitting. It works by randomly setting the outputs of a fraction of hidden units (the neurons that make up hidden layers) to 0 at each update during training; the fraction is the dropout rate, 0.5 in this model. |
| SGD | Stochastic gradient descent (SGD) follows the negative gradient of the objective after seeing only a single training example or a small batch of examples, rather than the full training set. Its use in neural networks is motivated by the high cost of running backpropagation over the full training set. |
| RMSprop | RMSprop is a gradient-based optimization technique used in training neural networks. It divides each gradient by the square root of a moving average of its recent squared values. This normalization balances the step size, decreasing the step for large gradients to avoid exploding and increasing it for small gradients to avoid vanishing. |
| Adam | Adam can be viewed as a combination of RMSprop and SGD with momentum. Like RMSprop, it uses the squared gradients to scale the learning rate, and like SGD with momentum, it uses a moving average of the gradient instead of the gradient itself. |
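The Adam entry above can be made concrete with a few lines of code. The sketch below applies the textbook Adam update to a single parameter: m is the moving average of gradients (the momentum part) and v is the moving average of squared gradients (the RMSprop part), and both are bias-corrected because they start at zero. The beta and epsilon values match Keras's Adam defaults, but the toy loss (w - 3)^2 and the learning rate of 0.1 (Keras defaults to 0.001) are assumptions chosen so the example converges quickly.

```python
import math

def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Minimize a one-parameter function with the textbook Adam update."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # moving average of squared gradients (RMSprop)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy loss L(w) = (w - 3)^2 with gradient 2(w - 3); its minimum is at w = 3.
w = adam_minimize(lambda w: 2 * (w - 3), w0=0.0)
print(round(w, 3))
```

Dividing by the square root of v_hat means the step size depends on the relative, not absolute, size of recent gradients, which is what lets one learning rate work across layers with very different gradient scales.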
3.6 Model Definition

• The model stacks convolutional layers of increasing depth (64, 128, 256, and 512 filters), while MaxPooling layers reduce the spatial dimensions of the feature maps.
• The Rectified Linear Unit (ReLU), which is well known for performing well in image recognition tasks, is the activation function used in all convolutional layers.
• When the padding is set to “same,” zeros are added around the border of the input so that the output has the same spatial dimensions as the input.
• Each pair of Conv2D layers is followed by a BatchNormalization layer that normalizes the activations from the preceding layer.
• To minimize overfitting, a Dropout layer is introduced after each MaxPooling layer.
• To produce the probability distribution over the 7 emotions, 2 dense layers are added at the end: one with 128 units and ReLU activation (described above) and the other with 7 units and softmax activation.
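The effect of “same” padding and 2 x 2 max pooling on the feature-map size can be checked with a little arithmetic. The sketch below uses TensorFlow's output-size rules (ceil(n / stride) for “same” and ceil((n - k + 1) / stride) for “valid”) and assumes stride 1 for every Conv2D layer and the default pooling stride, equal to the pool size, as in the model code. The helper names are hypothetical.

```python
import math

def conv_output_size(n, kernel, padding, stride=1):
    """Spatial size after Conv2D, using TensorFlow's "same"/"valid" rules."""
    if padding == "same":
        return math.ceil(n / stride)                 # zero-padding keeps n at stride 1
    if padding == "valid":
        return math.ceil((n - kernel + 1) / stride)  # no padding: the border is lost
    raise ValueError(f"unknown padding: {padding}")

def pool_output_size(n, pool=2):
    """MaxPooling2D with stride equal to the pool size and "valid" padding."""
    return (n - pool) // pool + 1

size = 48  # the 48 x 48 grayscale input
for filters in (64, 128, 256, 512):
    size = conv_output_size(size, 3, "same")  # first Conv2D of the block
    size = conv_output_size(size, 3, "same")  # second Conv2D of the block
    size = pool_output_size(size)             # MaxPooling2D((2, 2))
    print(f"after the {filters}-filter block: {size} x {size} x {filters}")

flattened = size * size * 512  # length of the Flatten() output
print("Flatten output:", flattened)                                # Flatten output: 4608
print("valid padding instead:", conv_output_size(48, 3, "valid"))  # valid padding instead: 46
```

The resulting sizes (24, 12, 6, 3) and the flattened length of 4,608 are the values that appear in the model summary.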
```python
# DEFINE CNN MODEL
# An empty sequential model is created first; successive layers are then added to it.
model = Sequential()

# Block 1: two 64-filter convolutions, then normalization, pooling, and dropout
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 input_shape=(img_size[0], img_size[1], 1)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))

# Block 2: 128 filters
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))

# Block 3: 256 filters
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))

# Block 4: 512 filters
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(512, (3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))

# The Flatten layer turns the convolutional output into a one-dimensional vector,
# which can then be fed into a fully connected (Dense) layer.
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))  # one probability per emotion

# The summary method prints the model's architecture, including the number of parameters in each layer.
model.summary()
```
```
Model: "sequential"
_________________________________________________________________________________
 Layer (type)                                Output Shape          Param #
=================================================================================
 conv2d (Conv2D)                             (None, 48, 48, 64)    640
 conv2d_1 (Conv2D)                           (None, 48, 48, 64)    36928
 batch_normalization (BatchNormalization)    (None, 48, 48, 64)    256
 max_pooling2d (MaxPooling2D)                (None, 24, 24, 64)    0
 dropout (Dropout)                           (None, 24, 24, 64)    0
 conv2d_2 (Conv2D)                           (None, 24, 24, 128)   73856
 conv2d_3 (Conv2D)                           (None, 24, 24, 128)   147584
 batch_normalization_1 (BatchNormalization)  (None, 24, 24, 128)   512
 max_pooling2d_1 (MaxPooling2D)              (None, 12, 12, 128)   0
 dropout_1 (Dropout)                         (None, 12, 12, 128)   0
 conv2d_4 (Conv2D)                           (None, 12, 12, 256)   295168
 conv2d_5 (Conv2D)                           (None, 12, 12, 256)   590080
 batch_normalization_2 (BatchNormalization)  (None, 12, 12, 256)   1024
 max_pooling2d_2 (MaxPooling2D)              (None, 6, 6, 256)     0
 dropout_2 (Dropout)                         (None, 6, 6, 256)     0
 conv2d_6 (Conv2D)                           (None, 6, 6, 512)     1180160
 conv2d_7 (Conv2D)                           (None, 6, 6, 512)     2359808
 batch_normalization_3 (BatchNormalization)  (None, 6, 6, 512)     2048
 max_pooling2d_3 (MaxPooling2D)              (None, 3, 3, 512)     0
 dropout_3 (Dropout)                         (None, 3, 3, 512)     0
 flatten (Flatten)                           (None, 4608)          0
 dense (Dense)                               (None, 128)           589952
 batch_normalization_4 (BatchNormalization)  (None, 128)           512
 dropout_4 (Dropout)                         (None, 128)           0
 dense_1 (Dense)                             (None, 7)             903
=================================================================================
Total params: 5,279,431
Trainable params: 5,277,255
Non-trainable params: 2,176
_________________________________________________________________________________
```
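The parameter counts in this summary follow from simple formulas: a Conv2D layer with a k x k kernel has (k·k·c_in + 1)·c_out parameters (weights plus one bias per filter), a Dense layer has (n_in + 1)·n_out, and a BatchNormalization layer has four per channel, of which the moving mean and moving variance are non-trainable. The sketch below is a cross-check, not part of the authors' code; it recomputes the totals for this architecture (pooling, dropout, and flatten layers have no parameters).

```python
def conv2d_params(k, c_in, c_out):
    """k x k kernel weights per input channel, plus one bias per filter."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """One weight per input per unit, plus one bias per unit."""
    return (n_in + 1) * n_out

def batchnorm_params(channels):
    """gamma and beta are trained; the moving mean and variance are not."""
    return 2 * channels, 2 * channels  # (trainable, non-trainable)

trainable, non_trainable = 0, 0
c_in = 1  # grayscale input
for c_out in (64, 128, 256, 512):
    # Two 3 x 3 Conv2D layers followed by BatchNormalization
    trainable += conv2d_params(3, c_in, c_out) + conv2d_params(3, c_out, c_out)
    bn_t, bn_nt = batchnorm_params(c_out)
    trainable += bn_t
    non_trainable += bn_nt
    c_in = c_out

trainable += dense_params(3 * 3 * 512, 128)  # Dense(128) after Flatten
bn_t, bn_nt = batchnorm_params(128)
trainable += bn_t
non_trainable += bn_nt
trainable += dense_params(128, 7)            # Dense(7) softmax output

print(f"Total params: {trainable + non_trainable:,}")  # Total params: 5,279,431
print(f"Trainable params: {trainable:,}")              # Trainable params: 5,277,255
print(f"Non-trainable params: {non_trainable:,}")      # Non-trainable params: 2,176
```

Almost half of all parameters sit in the final 512-filter convolution, which is typical when filter counts double at each block.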
3.7 Model Compilation and Training

The model.compile() function configures the model for training. Here we compile the model with the Adam optimizer, a widely used variant of stochastic gradient descent. We specify categorical cross-entropy, which is commonly used for multiclass classification problems, as the loss function. Finally, we specify accuracy as the metric used to assess the model's performance during training.

The model.fit() function trains the model. We fit the model to our training data using train_generator as the input data, train_generator.samples // batch_size as steps_per_epoch (the number of batches of samples used in each epoch), 45 as the number of epochs, val_generator as the validation data, and val_generator.samples // batch_size as the number of validation steps. During training, the weights are adjusted to reduce the categorical cross-entropy loss, which in turn tends to increase accuracy. The returned History object records the training history, that is, the loss and metric values for each epoch.

```python
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=45,
    validation_data=val_generator,
    validation_steps=val_generator.samples // batch_size
)
```
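Before the training log, it may help to see what its two reported quantities mean. For a one-hot label, categorical cross-entropy reduces to the negative log of the probability the model assigns to the true class, and accuracy (as Keras computes it for one-hot labels) is the fraction of samples whose highest-probability class matches the label. The sketch below computes both in plain Python for two made-up predictions; the clipping constant eps is an assumption that mirrors the usual guard against log(0). The last line reproduces the steps_per_epoch and validation_steps arithmetic, which explains the 900 steps per epoch shown in the log.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss -sum(t * log(p)); with a one-hot y_true this is -log(p_true)."""
    return -sum(t * math.log(min(max(p, eps), 1 - eps)) for t, p in zip(y_true, y_pred))

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def categorical_accuracy(y_true_batch, y_pred_batch):
    """Fraction of samples whose most probable class is the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true_batch, y_pred_batch))
    return hits / len(y_true_batch)

# Two hypothetical samples over the 7 classes (made-up probabilities).
y_true = [[0, 0, 0, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 1, 0]]
y_pred = [[0.05, 0.05, 0.05, 0.60, 0.10, 0.05, 0.10],
          [0.10, 0.05, 0.20, 0.10, 0.40, 0.10, 0.05]]

losses = [categorical_crossentropy(t, p) for t, p in zip(y_true, y_pred)]
print([round(loss, 4) for loss in losses])   # [0.5108, 2.3026]
print(categorical_accuracy(y_true, y_pred))  # 0.5

# steps_per_epoch and validation_steps as computed in model.fit above
print(28821 // 32, 7066 // 32)               # 900 220
```

The second sample shows why loss and accuracy can move differently: a wrong prediction costs more loss the less probability it gives the true class, while accuracy only records that it was wrong.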
```
Epoch 1/45
900/900 [==============================] - 57s 44ms/step - loss: 2.0880 - accuracy: 0.2082 - val_loss: 1.7973 - val_accuracy: 0.2673
Epoch 2/45
900/900 [==============================] - 45s 50ms/step - loss: 1.8189 - accuracy: 0.2426 - val_loss: 1.9431 - val_accuracy: 0.2023
Epoch 3/45
900/900 [==============================] - 40s 44ms/step - loss: 1.7806 - accuracy: 0.2618 - val_loss: 1.9553 - val_accuracy: 0.2578
Epoch 4/45
900/900 [==============================] - 38s 42ms/step - loss: 1.7226 - accuracy: 0.3041 - val_loss: 1.5582 - val_accuracy: 0.3987
Epoch 5/45
900/900 [==============================] - 39s 43ms/step - loss: 1.6135 - accuracy: 0.3681 - val_loss: 1.4472 - val_accuracy: 0.4453
Epoch 6/45
900/900 [==============================] - 39s 44ms/step - loss: 1.5332 - accuracy: 0.4022 - val_loss: 1.5490 - val_accuracy: 0.4437
Epoch 7/45
900/900 [==============================] - 39s 44ms/step - loss: 1.4768 - accuracy: 0.4307 - val_loss: 1.3040 - val_accuracy: 0.4989
Epoch 8/45
900/900 [==============================] - 39s 44ms/step - loss: 1.4253 - accuracy: 0.4563 - val_loss: 1.2798 - val_accuracy: 0.5126
Epoch 9/45
900/900 [==============================] - 37s 41ms/step - loss: 1.3814 - accuracy: 0.4728 - val_loss: 1.1965 - val_accuracy: 0.5389
Epoch 10/45
900/900 [==============================] - 37s 41ms/step - loss: 1.3549 - accuracy: 0.4821 - val_loss: 1.2637 - val_accuracy: 0.5116
Epoch 11/45
900/900 [==============================] - 40s 44ms/step - loss: 1.3170 - accuracy: 0.5025 - val_loss: 1.1515 - val_accuracy: 0.5639
Epoch 12/45
900/900 [==============================] - 40s 45ms/step - loss: 1.2991 - accuracy: 0.5084 - val_loss: 1.1044 - val_accuracy: 0.5794
Epoch 13/45
900/900 [==============================] - 39s 44ms/step - loss: 1.2705 - accuracy: 0.5185 - val_loss: 1.0906 - val_accuracy: 0.5815
Epoch 14/45
900/900 [==============================] - 39s 43ms/step - loss: 1.2577 - accuracy: 0.5239 - val_loss: 1.0586 - val_accuracy: 0.5999
Epoch 15/45
900/900 [==============================] - 37s 41ms/step - loss: 1.2385 - accuracy: 0.5310 - val_loss: 1.0632 - val_accuracy: 0.5989
Epoch 16/45
900/900 [==============================] - 39s 44ms/step - loss: 1.2258 - accuracy: 0.5379 - val_loss: 1.1115 - val_accuracy: 0.5746
Epoch 17/45
900/900 [==============================] - 38s 43ms/step - loss: 1.2038 - accuracy: 0.5510 - val_loss: 1.1792 - val_accuracy: 0.5570
Epoch 18/45
900/900 [==============================] - 37s 41ms/step - loss: 1.1925 - accuracy: 0.5529 - val_loss: 1.1119 - val_accuracy: 0.5727
Epoch 19/45
900/900 [==============================] - 39s 43ms/step - loss: 1.1817 - accuracy: 0.5558 - val_loss: 1.0416 - val_accuracy: 0.6105
Epoch 20/45
900/900 [==============================] - 38s 42ms/step - loss: 1.1735 - accuracy: 0.5595 - val_loss: 1.0488 - val_accuracy: 0.6000
Epoch 21/45
900/900 [==============================] - 36s 40ms/step - loss: 1.1625 - accuracy: 0.5639 - val_loss: 1.1301 - val_accuracy: 0.5818
Epoch 22/45
900/900 [==============================] - 37s 42ms/step - loss: 1.1461 - accuracy: 0.5740 - val_loss: 1.0315 - val_accuracy: 0.6104
Epoch 23/45
900/900 [==============================] - 36s 40ms/step - loss: 1.1414 - accuracy: 0.5745 - val_loss: 0.9896 - val_accuracy: 0.6270
Epoch 24/45
900/900 [==============================] - 39s 44ms/step - loss: 1.1326 - accuracy: 0.5739 - val_loss: 1.0154 - val_accuracy: 0.6139
Epoch 25/45
900/900 [==============================] - 40s 45ms/step - loss: 1.1183 - accuracy: 0.5796 - val_loss: 1.0445 - val_accuracy: 0.5966
Epoch 26/45
900/900 [==============================] - 36s 40ms/step - loss: 1.1146 - accuracy: 0.5850 - val_loss: 0.9745 - val_accuracy: 0.6341
Epoch 27/45
900/900 [==============================] - 38s 43ms/step - loss: 1.1112 - accuracy: 0.5810 - val_loss: 1.0015 - val_accuracy: 0.6313
Epoch 28/45
900/900 [==============================] - 41s 45ms/step - loss: 1.1022 - accuracy: 0.5871 - val_loss: 0.9814 - val_accuracy: 0.6378
```
234
J. P. Anilkumar et al.
Epoch 29/45 900/900 [==============================] - 38s 43ms/step loss: 1.0996 accuracy: 0.5892 - val_loss: 1.0284 - val_accuracy: 0.6102 Epoch 30/45 900/900 [==============================] - 39s 43ms/step loss: 1.0891 accuracy: 0.5912 - val_loss: 0.9535 - val_accuracy: 0.6425 Epoch 31/45 900/900 [==============================] - 37s 41ms/step loss: 1.0797 accuracy: 0.5953 - val_loss: 0.9972 - val_accuracy: 0.6385 Epoch 32/45 900/900 [==============================] - 41s 45ms/step loss: 1.0758 accuracy: 0.5964 - val_loss: 0.9826 - val_accuracy: 0.6337 Epoch 33/45 900/900 [==============================] - 38s 42ms/step loss: 1.0670 accuracy: 0.6045 - val_loss: 1.0000 - val_accuracy: 0.6281 Epoch 34/45 900/900 [==============================] - 41s 46ms/step loss: 1.0657 accuracy: 0.6022 - val_loss: 0.9796 - val_accuracy: 0.6277 Epoch 35/45 900/900 [==============================] - 40s 44ms/step loss: 1.0583 accuracy: 0.6048 - val_loss: 1.0089 - val_accuracy: 0.6209 Epoch 36/45 900/900 [==============================] - 40s 44ms/step loss: 1.0611 accuracy: 0.6043 - val_loss: 1.0444 - val_accuracy: 0.6186 Epoch 37/45 900/900 [==============================] - 40s 45ms/step loss: 1.0474 accuracy: 0.6105 - val_loss: 0.9603 - val_accuracy: 0.6442 Epoch 38/45 900/900 [==============================] - 40s 44ms/step loss: 1.0513 accuracy: 0.6102 - val_loss: 0.9506 - val_accuracy: 0.6460 Epoch 39/45 900/900 [==============================] - 39s 43ms/step loss: 1.0485 accuracy: 0.6117 - val_loss: 0.9502 - val_accuracy: 0.6480 Epoch 40/45 900/900 [==============================] - 37s 41ms/step loss: 1.0334 accuracy: 0.6176 - val_loss: 0.9776 - val_accuracy: 0.6303 Epoch 41/45 900/900 [==============================] - 39s 43ms/step loss: 1.0350 accuracy: 0.6137 - val_loss: 0.9485 - val_accuracy: 0.6489 Epoch 42/45 900/900 [==============================] - 36s 40ms/step -
Case Study 1: Human Emotion Detection
235
loss: 1.0344 accuracy: 0.6111 - val_loss: 0.9676 - val_accuracy: 0.6408 Epoch 43/45 900/900 [==============================] - 38s 42ms/step loss: 1.0237 – accuracy: 0.6198 - val_loss: 0.9331 - val_accuracy: 0.6503 Epoch 44/45 900/900 [==============================] - 41s 45ms/step loss: 1.0200 accuracy: 0.6208 - val_loss: 0.9264 - val_accuracy: 0.6626 Epoch 45/45 900/900 [==============================] - 37s 41ms/step loss: 1.0217 accuracy: 0.6162 - val_loss: 0.9225 - val_accuracy: 0.6622
3.8 Model Evaluation
These lines of code use the Python plotting library Matplotlib to create two graphs: one showing the model's accuracy during training and validation, and the other showing its loss during training and validation. The comments in the code explain each step:

import matplotlib.pyplot as plt

# Extract the per-epoch metrics recorded during training in the history object
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
# Plot the training and validation accuracy recorded for each epoch
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
# Set the title of the first graph
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(1, 2, 2)
# Plot the training and validation loss recorded for each epoch
plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
# Set the title of the second graph
plt.title('Training and validation loss')
# Add a legend to the graph
plt.legend()
# Display the figure
plt.show()

# Save the model
model.save('emotion_detection_model.h5')
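To read the curves numerically rather than visually, a small helper can pick out the epoch with the highest validation accuracy and the gap between training and validation accuracy at that epoch. This is a minimal sketch that assumes `history.history` is a plain dictionary of per-epoch lists, as Keras returns it; the function name `summarize_history` is hypothetical. The sample values are copied from epochs 43 to 45 of the training log above.

```python
def summarize_history(hist, metric="accuracy"):
    """Return (best_epoch, best_val, train_at_best, gap) for a Keras-style history dict.

    hist: dict mapping metric names to per-epoch lists, e.g. history.history.
    best_epoch is 1-based, matching the "Epoch N/45" lines of the training log.
    gap is training score minus validation score at the best epoch.
    """
    val = hist["val_" + metric]
    train = hist[metric]
    # Index of the epoch with the highest validation score
    best_idx = max(range(len(val)), key=lambda i: val[i])
    gap = train[best_idx] - val[best_idx]
    return best_idx + 1, val[best_idx], train[best_idx], gap


# Values copied from epochs 43-45 of the emotion detection training log
sample = {
    "accuracy": [0.6198, 0.6208, 0.6162],
    "val_accuracy": [0.6503, 0.6626, 0.6622],
}
best_epoch, best_val, train_acc, gap = summarize_history(sample)
print(best_epoch, best_val, train_acc, round(gap, 4))  # 2 0.6626 0.6208 -0.0418
```

For the three-epoch sample the helper returns position 2, which corresponds to epoch 44 of the full run; passing the complete `history.history` returns the epoch number exactly as it appears in the log. A negative gap simply means the model scored higher on the validation data than on the training data at that epoch.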
3.9 OpenCV
OpenCV (Open Source Computer Vision Library) is a software library for computer vision and machine learning. It provides an array of algorithms and methods for many computer vision tasks, including image and video processing, object detection and recognition, and more. The code below loads the facial emotion detection model trained above and applies it to a video stream. It captures the video with OpenCV, uses a pre-trained Haar cascade face detector to find the faces in each frame, and then uses the trained CNN model to predict the emotion of each detected face. Each frame, annotated with a bounding box and the predicted emotion label, is written to a new video file. Finally, the code releases the video capture and writer objects and closes any open windows.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Load the trained facial emotion detection model
model = load_model('/content/emotion_detection_model.h5')

# Define a dictionary to map emotion labels to their names
emotions = {
    0: 'angry',
    1: 'disgust',
    2: 'fear',
    3: 'happy',
    4: 'neutral',
    5: 'sad',
    6: 'surprise'
}

# Create a VideoCapture object to capture the video stream
# This is my custom path, and it may be different from your file path
cap = cv2.VideoCapture('/content/drive/MyDrive/Emotion_Detection/Emotions.mp4')

# Define the face detection model
face_cascade = cv2.CascadeClassifier('/content/drive/MyDrive/Emotion_Detection/haarcascade_frontalface_default.xml')

# Define the output video codec and frame rate
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Define the output video writer (same frame size as the input video)
out = cv2.VideoWriter('Emotions_Output.mp4', fourcc, fps,
                      (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))

# Loop through each frame in the video stream
while cap.isOpened():
    # Read the next frame from the video stream
    ret, frame = cap.read()
    # If a frame cannot be read (error or end of video), break out of the loop
    if not ret:
        break

    # Convert the frame to grayscale for face detection
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces in the frame using the face detection model
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                          minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)

    # For each face detected, predict the emotion using the trained model
    for (x, y, w, h) in faces:
        # Crop the face region from the frame and resize it to the model's 48x48 input
        face_image = gray[y:y+h, x:x+w]
        face_image = cv2.resize(face_image, (48, 48))
        face_image = np.reshape(face_image, (1, 48, 48, 1))

        # Normalize the pixel values to be between 0 and 1
        face_image = face_image / 255.0

        # Predict the emotion using the trained model
        emotion_probabilities = model.predict(face_image)[0]
        predicted_emotion = emotions[np.argmax(emotion_probabilities)]

        # Draw the predicted emotion label on the frame
        cv2.putText(frame, predicted_emotion, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

        # Draw a rectangle around the face on the frame
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

    # Write the frame to the output video
    out.write(frame)

    # Exit the loop if the user presses the 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the VideoCapture and VideoWriter objects
cap.release()
out.release()

# Close all windows
cv2.destroyAllWindows()
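The per-face steps inside the loop can be pulled out into two small functions, which makes them easy to check without a video file or a trained model. This is a minimal sketch assuming the crop has already been resized to 48×48 (for example with `cv2.resize`, as in the code above); the names `to_model_input` and `decode` are hypothetical, and the probability vector in the example is made up for illustration rather than produced by the model.

```python
import numpy as np

# Same label mapping as the `emotions` dictionary in the code above
EMOTIONS = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy',
            4: 'neutral', 5: 'sad', 6: 'surprise'}


def to_model_input(face_48x48):
    """Turn one 48x48 grayscale crop (values 0-255) into a (1, 48, 48, 1) batch in [0, 1]."""
    face = np.asarray(face_48x48, dtype=np.float32)
    if face.shape != (48, 48):
        raise ValueError(f"expected a 48x48 crop, got {face.shape}")
    # Add the batch and channel axes the CNN expects, then scale pixels to [0, 1]
    return face.reshape(1, 48, 48, 1) / 255.0


def decode(probabilities, labels=EMOTIONS):
    """Return (label, confidence) for the class with the highest predicted probability."""
    probs = np.asarray(probabilities)
    idx = int(np.argmax(probs))
    return labels[idx], float(probs[idx])


# A fully white crop and a hypothetical probability vector, for illustration only
batch = to_model_input(np.full((48, 48), 255, dtype=np.uint8))
print(batch.shape, batch.max())
print(decode([0.05, 0.01, 0.04, 0.70, 0.10, 0.05, 0.05]))
```

Returning the confidence alongside the label also makes it easy to skip drawing labels for low-confidence predictions, should you want to.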
1/1 [==============================] - 0s 36ms/step
1/1 [==============================] - 0s 32ms/step
1/1 [==============================] - 0s 35ms/step
1/1 [==============================] - 0s 31ms/step
1/1 [==============================] - 0s 34ms/step
1/1 [==============================] - 0s 32ms/step
1/1 [==============================] - 0s 30ms/step
1/1 [==============================] - 0s 28ms/step
1/1 [==============================] - 0s 27ms/step
1/1 [==============================] - 0s 33ms/step
1/1 [==============================] - 0s 37ms/step
1/1 [==============================] - 0s 30ms/step
1/1 [==============================] - 0s 28ms/step
1/1 [==============================] - 0s 25ms/step
1/1 [==============================] - 0s 25ms/step
1/1 [==============================] - 0s 33ms/step
1/1 [==============================] - 0s 27ms/step
1/1 [==============================] - 0s 28ms/step
1/1 [==============================] - 0s 26ms/step
1/1 [==============================] - 0s 31ms/step
1/1 [==============================] - 0s 32ms/step
1/1 [==============================] - 0s 38ms/step
1/1 [==============================] - 0s 31ms/step
1/1 [==============================] - 0s 26ms/step
1/1 [==============================] - 0s 27ms/step
1/1 [==============================] - 0s 29ms/step
1/1 [==============================] - 0s 31ms/step
1/1 [==============================] - 0s 34ms/step
1/1 [==============================] - 0s 33ms/step
1/1 [==============================] - 0s 35ms/step
1/1 [==============================] - 0s 40ms/step
1/1 [==============================] - 0s 45ms/step
1/1 [==============================] - 0s 44ms/step
1/1 [==============================] - 0s 35ms/step
4 Output and Conclusion
The use of CNNs in emotion detection allows high-level features to be extracted from images and videos, making it possible to identify subtle changes in facial expressions that can reveal a person's emotional state. CNN-based emotion detection models have been reported to outperform traditional methods such as Support Vector Machines (SVM) and Decision Trees. As the example frames below show, our model appears to do a reasonable job of detecting emotions, although its final validation accuracy of about 66% leaves room for improvement. You can try the model on any video, or record your own, to evaluate its accuracy. As with any machine learning model, obtaining high accuracy largely depends on the quality of the data used to train the model. The availability of large-scale annotated datasets has made it possible to create more reliable models. However, handling unbalanced and noisy datasets continues to be a major problem in emotion detection.
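One common way to soften the effect of an unbalanced dataset is to weight each class inversely to its frequency during training. The sketch below illustrates that technique; it is not something the original code does, and the helper name and toy labels are hypothetical. With a Keras directory iterator, the per-image class indices are available as its `classes` attribute, and the resulting dictionary is the form accepted by the `class_weight` argument of `Model.fit()`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    This mirrors the "balanced" heuristic used by scikit-learn and returns a
    {class_index: weight} dict, so rare classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in sorted(counts.items())}


# Hypothetical toy labels: class 1 is four times rarer than class 0
toy_labels = [0] * 80 + [1] * 20
print(balanced_class_weights(toy_labels))  # {0: 0.625, 1: 2.5}
```

Class weights only rebalance the loss; they do not address label noise, which usually requires cleaning or relabeling the data.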
Overall, CNN-based emotion detection is a powerful technology that is transforming our ability to understand and analyze human emotions from visual cues. With continued research and development, it has the potential to transform numerous fields and provide valuable insights into the emotional state of individuals in various contexts. Visit this web page to see our model in action: https://huggingface.co/spaces/Jishnnu/Emotion-Detection
Emotion Detected: Sad
Emotion Detected: Neutral
Emotion Detected: Fear
Emotion Detected: Neutral and Happy (from left to right)
Emotion Detected: Sad and Fear (from left to right)
5 Credits
1. KISS Institute for Practical Robotics Face Detection Model – Haar Cascade File (haarcascade_frontalface_default.xml).
2. Face Expression Recognition Dataset. https://www.kaggle.com/datasets/jonathanoheix/face-expression-recognition-dataset
Case Study 2: Brain Tumor Classification
Jishnu Pillai Anilkumar, Hussam Bin Mehare, and Mohammad “Sufian” Badar
1 Introduction
Brain tumor classification is a very important area in medical image analysis because it has the potential to improve the diagnosis and treatment of brain tumors. Brain tumors are one of the major causes of cancer-related deaths globally, and classifying them accurately is essential for planning therapy and managing patients. Recent developments in machine learning and deep learning have produced high-accuracy brain tumor classification models. Convolutional neural networks (CNNs) are widely employed in medical image analysis and have demonstrated impressive results in brain tumor classification tasks. In this case study, we examine the use of ResNet, a deep neural network architecture, to classify brain tumors. We will use a publicly available set of MRI scans of brain tumors. The dataset includes four classes: glioma, meningioma, notumor, and pituitary. Our objective is to create a CNN model that can accurately categorize these classes. Once the CNN model has been trained, new images or videos can be passed through the network to predict the type of brain tumor they show.
J. P. Anilkumar
Department of Computer Science and Engineering, Presidency University, Bengaluru, Karnataka, India

H. B. Mehare
Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India

M. S. Badar (✉)
Department of Computer Science and Engineering, School of Engineering Sciences and Technology (SEST), Jamia Hamdard, New Delhi, India
(Former) Department of Bioengineering, University of California, Riverside, CA, USA
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1_10
For each input image or video, the model estimates the likelihood of each type of brain tumor and assigns the label with the highest likelihood. To increase the diversity of the training data and reduce overfitting, we will employ data augmentation techniques. We will also apply transfer learning, adapting a ResNet model pre-trained on ImageNet to our brain tumor dataset. Finally, we will assess our model's performance on a held-out test set and compare it to the most recent techniques for classifying brain tumors. This case study shows how deep learning techniques work in medical image analysis and how they may be used to improve patient outcomes in the detection and treatment of brain tumors. As deep learning research advances, CNN-based models continue to improve in accuracy and reliability, opening the door to new applications in the future.
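Concretely, the final softmax layer turns the network's raw scores into one probability per class, and the predicted label is the class with the largest probability. The sketch below assumes the class order that Keras' `flow_from_directory()` produces (sub-directory names sorted alphanumerically) and uses made-up scores purely for illustration; they are not outputs of the trained model.

```python
import numpy as np

# flow_from_directory() maps sub-directory names to label indices in sorted order
class_names = sorted(["pituitary", "glioma", "notumor", "meningioma"])
class_indices = {name: i for i, name in enumerate(class_names)}


def softmax(logits):
    """Numerically stable softmax: turns raw scores into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()  # subtracting the max leaves the result unchanged but avoids overflow
    e = np.exp(z)
    return e / e.sum()


# Hypothetical raw scores for one MRI image
probs = softmax([2.0, 0.5, -1.0, 0.1])
predicted = class_names[int(np.argmax(probs))]
print(class_indices)
print(np.round(probs, 3), predicted)
```

Knowing this ordering matters when the saved model is later used on its own (for example in a web demo): the index returned by `argmax` must be mapped back to the same class names the generator used during training.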
2 Kaggle Kernel
There are many ways to create and deploy a machine learning or deep learning model. In this case study, we will use a Kaggle Kernel, an online space for data science exploration and collaboration. It is similar to Google Colaboratory; we use this platform because the wide variety of datasets available on Kaggle makes it easy and fun to build machine learning models. It enables users to create, run, and share their code in a variety of programming languages, including Python and R. Kernels can be used for data exploration, visualization, and the development of machine learning models. The dataset we are using is rather large and difficult to work with on other platforms, so let us see how to set up and get started with our case study:
After you create a Kaggle account, click Create -> New Notebook from the menu.
Once the notebook loads, click Add Data under the Data section on the right-hand side.
Type Brain Tumor MRI Dataset in the search bar and click the + symbol to add the dataset to your working directory.
If you have verified your phone number, you are eligible to use Kaggle GPUs and TPUs. To turn this feature on, click the three dots at the top right-hand side -> Accelerator, and choose the one you prefer to work with.
That completes the setup. We can now start coding our model.
Note
• TPUs may not always be available, so use the GPUs when you encounter such issues.
• Use this link to access the online resource that can help you get started with Kaggle Kernel: https://www.kaggle.com/docs.
3 Dataset
The dataset [1] is a collection of MRI scans in four classes: three types of brain tumor (glioma, meningioma, and pituitary) and scans with no tumor (notumor).
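Because `flow_from_directory()` treats every sub-directory as a class, it can be worth checking how many images each class folder contains before training. The function below is a minimal sketch; its name and the accepted file extensions (.jpg, .jpeg, .png) are assumptions. The demo builds a tiny throwaway folder tree instead of reading the real data; on Kaggle you would pass a path such as the Training folder used in the data generator code below (the path may differ in your notebook).

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Count image files in each class sub-directory of a flow_from_directory-style folder."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTS
            )
    return counts


# Demo on a tiny throwaway folder tree that mimics the Training/ layout
with tempfile.TemporaryDirectory() as root:
    for name, n in [("glioma", 2), ("notumor", 1)]:
        (Path(root) / name).mkdir()
        for i in range(n):
            (Path(root) / name / f"img{i}.jpg").touch()
    (Path(root) / "glioma" / "notes.txt").touch()  # non-image files are ignored
    demo_counts = count_images_per_class(root)

print(demo_counts)  # {'glioma': 2, 'notumor': 1}
```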
4 Model Implementation
You can find the code and all related materials at https://www.github.com/Jishnnu/Brain-Tumor-Classification.

In [1]:
# Import necessary libraries
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator
In [2]:
# Define the input shape
input_shape = (150, 150, 3)

# Define the number of classes
num_classes = 4
4.1 Data Generators
The ImageDataGenerator class is used to augment and preprocess the image data. A common preprocessing step for image data is to scale the pixel values to lie between 0 and 1, which is done with the rescale option. The data augmentation options shear_range, zoom_range, and horizontal_flip diversify the training data and can improve model performance; they are applied only to the training generator. Data generators are then created from the image directories with the flow_from_directory() method: the training and testing folders are used to build train_generator and test_generator, respectively, which yield batches of preprocessed (and, for training, augmented) image data. The target_size option gives the size to which every image is resized, which must match the input size the model expects. The batch_size option specifies the number of images in each batch. Because there are more than two classes, class_mode is set to 'categorical', which yields one-hot encoded labels.
In [3]:
# Define the data generators
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

# Define the paths to the training and test data. This path might differ in your case
train_data_path = '/kaggle/input/brain-tumor-mri-dataset/Training'
test_data_path = '/kaggle/input/brain-tumor-mri-dataset/Testing'

# Create the generators
train_generator = train_datagen.flow_from_directory(train_data_path, target_size=input_shape[:2],
                                                    batch_size=32, class_mode='categorical')
test_generator = test_datagen.flow_from_directory(test_data_path, target_size=input_shape[:2],
                                                  batch_size=32, class_mode='categorical')

Found 5712 images belonging to 4 classes.
Found 1311 images belonging to 4 classes.
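To see what two of these options do to the pixel data, the NumPy sketch below applies the same arithmetic by hand to a tiny made-up image. It illustrates the effect of rescale and horizontal_flip; it is not the Keras implementation itself, and the image values are hypothetical.

```python
import numpy as np

# A tiny 2x3 single-channel "image" with pixel values in 0-255 (hypothetical data)
img = np.array([[0, 128, 255],
                [64, 32, 16]], dtype=np.uint8)[..., np.newaxis]  # shape (2, 3, 1)

# rescale=1./255: multiply every pixel by 1/255 so values fall in [0, 1]
rescaled = img.astype(np.float32) * (1.0 / 255)

# horizontal_flip=True: mirror the image left-to-right (reverse the width axis).
# ImageDataGenerator applies this at random to roughly half of the images;
# here the flip is applied deterministically to show its effect.
flipped = rescaled[:, ::-1, :]

print(rescaled[..., 0].round(3))
print(flipped[..., 0].round(3))
```

Because only train_datagen enables the flip (and the shear and zoom options), test images are merely rescaled, so evaluation always sees the images as they are.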
4.2 ResNet Model
These lines of code use TensorFlow's Keras API to define a ResNet101 model. The ResNet101 deep neural network architecture was pre-trained on ImageNet, a large dataset with millions of annotated images. Setting weights='imagenet' loads these pre-trained weights, and include_top=False omits the original ImageNet classification layers so that our own classifier can be added on top.

In [4]:
# Define the ResNet model
resnet_model = tf.keras.applications.ResNet101(include_top=False, weights='imagenet', input_shape=input_shape)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels_notop.h5
171446536/171446536 [==============================] - 1s 0us/step
4.3 CNN Model
These lines of code build a ResNet-based convolutional neural network (CNN) for classifying brain tumors. A Sequential model named classifier is created, and the pre-trained ResNet model is added as its first layer using the add() method. The next layer, Flatten(), transforms the ResNet model's 3D output (a feature map of height × width × channels) into a 1D feature vector that can be fed into the fully connected layers. The flattened vector is passed to a fully connected Dense() layer with 256 units and ReLU activation. To reduce overfitting, a Dropout() layer with a dropout rate of 0.5 is added next. Finally, a Dense() layer with one unit per output class and softmax activation produces the output probability for each class. The model is compiled with the compile() method, using the Adam optimizer, the categorical cross-entropy loss function, and the accuracy metric. During training, the loss function measures the discrepancy between the predicted and actual labels, and the optimizer updates the network's weights to reduce it. The accuracy metric is used to assess the model's performance during training and testing.

In [5]:
# Add the classification layers on top of ResNet
classifier = keras.Sequential()
classifier.add(resnet_model)
classifier.add(layers.Flatten())
classifier.add(layers.Dense(256, activation='relu'))
classifier.add(layers.Dropout(0.5))
classifier.add(layers.Dense(num_classes, activation='softmax'))

# Compile the model
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
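Because Flatten() keeps every position of the ResNet feature map, the first Dense layer of the head is large. The arithmetic below is a worked sketch that assumes the ResNet101 base (include_top=False) produces a 5 × 5 × 2048 feature map for 150 × 150 × 3 inputs, since the network downsamples its input by a factor of about 32; calling classifier.summary() on the real model is the authoritative way to confirm the shapes and counts.

```python
def dense_params(n_inputs, units):
    """Trainable parameters of a Dense layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * units + units


# Assumed ResNet101 (include_top=False) output for a 150x150x3 input
feature_map = (5, 5, 2048)
flat = feature_map[0] * feature_map[1] * feature_map[2]  # Flatten() output length

hidden = dense_params(flat, 256)  # Dense(256, activation='relu')
output = dense_params(256, 4)     # Dense(num_classes=4, activation='softmax')
print(flat, hidden, output, hidden + output)  # 51200 13107456 1028 13108484
```

Under that assumption, the classification head adds about 13.1 million trainable parameters, almost all of them in the Dense(256) layer; the Flatten and Dropout layers have none.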
4.4 Train the Model
These lines of code train the CNN model on the training dataset using the generator object train_generator, which produces batches of augmented images for each epoch. The fit() method is called on the classifier object with the following arguments:
• train_generator: The generator object that supplies the augmented images for training.
• steps_per_epoch: The number of batches drawn from the training data in each epoch. It is computed by integer-dividing the total number of training samples by the batch size.
• epochs: The number of times the whole training dataset is iterated over during training.
• validation_data: The generator that supplies the (non-augmented) images used for validation; here, test_generator. It is used to track the model's performance on data it is not trained on.
• validation_steps: The number of batches drawn from the validation data in each epoch. It is computed by integer-dividing the total number of validation samples by the batch size.
The fit() method trains the model for the given number of epochs and, after each epoch, records the model's loss and accuracy on the training and validation data. These values are stored in the history object and can be used to examine how the model performed throughout training.

In [6]:
# Train the model
history = classifier.fit(train_generator,
                         steps_per_epoch=train_generator.samples // train_generator.batch_size,
                         epochs=50,
                         validation_data=test_generator,
                         validation_steps=test_generator.samples // test_generator.batch_size)
Epoch 1/50
178/178 [==============================] - 168s 516ms/step - loss: 1.6124 - accuracy: 0.6662 - val_loss: 3.5850 - val_accuracy: 0.2477
Epoch 2/50
178/178 [==============================] - 68s 381ms/step - loss: 0.9414 - accuracy: 0.7887 - val_loss: 5.7073 - val_accuracy: 0.2281
Epoch 3/50
178/178 [==============================] - 68s 381ms/step - loss: 0.5286 - accuracy: 0.8136 - val_loss: 1.4026 - val_accuracy: 0.2344
Epoch 4/50
178/178 [==============================] - 68s 379ms/step - loss: 0.5818 - accuracy: 0.8445 - val_loss: 3.1178 - val_accuracy: 0.3469
Epoch 5/50
178/178 [==============================] - 68s 378ms/step - loss: 0.4691 - accuracy: 0.8674 - val_loss: 1.2281 - val_accuracy: 0.4664
Epoch 6/50
178/178 [==============================] - 68s 380ms/step - loss: 0.4422 - accuracy: 0.8792 - val_loss: 0.6995 - val_accuracy: 0.7398
Epoch 7/50
178/178 [==============================] - 67s 376ms/step - loss: 0.2978 - accuracy: 0.9070 - val_loss: 0.4949 - val_accuracy: 0.8156
Epoch 8/50
178/178 [==============================] - 68s 381ms/step - loss: 0.2817 - accuracy: 0.9081 - val_loss: 7.4410 - val_accuracy: 0.6164
Epoch 9/50
178/178 [==============================] - 68s 383ms/step - loss: 0.2530 - accuracy: 0.9157 - val_loss: 0.3382 - val_accuracy: 0.8867
Epoch 10/50
178/178 [==============================] - 68s 379ms/step - loss: 0.2232 - accuracy: 0.9308 - val_loss: 1.2623 - val_accuracy: 0.6492
Epoch 11/50
178/178 [==============================] - 69s 385ms/step - loss: 0.3170 - accuracy: 0.9113 - val_loss: 1.0645 - val_accuracy: 0.7445
Epoch 12/50
178/178 [==============================] - 69s 386ms/step - loss: 0.1938 - accuracy: 0.9333 - val_loss: 0.3485 - val_accuracy: 0.8789
Epoch 13/50
178/178 [==============================] - 68s 383ms/step - loss: 0.2355 - accuracy: 0.9213 - val_loss: 0.2923 - val_accuracy: 0.9023
Epoch 14/50
178/178 [==============================] - 68s 380ms/step - loss: 0.2698 - accuracy: 0.9217 - val_loss: 3.2239 - val_accuracy: 0.3688
Epoch 15/50
178/178 [==============================] - 68s 379ms/step - loss: 0.2198 - accuracy: 0.9356 - val_loss: 0.2600 - val_accuracy: 0.9141
Epoch 16/50
178/178 [==============================] - 68s 383ms/step - loss: 0.1727 - accuracy: 0.9452 - val_loss: 0.7824 - val_accuracy: 0.7820
Epoch 17/50
178/178 [==============================] - 68s 380ms/step - loss: 0.3644 - accuracy: 0.9014 - val_loss: 115.1347 - val_accuracy: 0.3086
Epoch 18/50
178/178 [==============================] - 67s 374ms/step - loss: 0.4011 - accuracy: 0.8910 - val_loss: 0.5298 - val_accuracy: 0.8047
Epoch 19/50
178/178 [==============================] - 68s 379ms/step - loss: 0.2799 - accuracy: 0.9225 - val_loss: 1193.4662 - val_accuracy: 0.3125
Epoch 20/50
178/178 [==============================] - 67s 374ms/step - loss: 0.4156 - accuracy: 0.9083 - val_loss: 61.0873 - val_accuracy: 0.5992
Epoch 21/50
178/178 [==============================] - 68s 378ms/step - loss: 0.2734 - accuracy: 0.9255 - val_loss: 23.4512 - val_accuracy: 0.3141
Epoch 22/50
178/178 [==============================] - 67s 377ms/step - loss: 0.4708 - accuracy: 0.9109 - val_loss: 4.9691 - val_accuracy: 0.7875
Epoch 23/50
178/178 [==============================] - 67s 376ms/step - loss: 0.2782 - accuracy: 0.9294 - val_loss: 1.1295 - val_accuracy: 0.7172
Epoch 24/50
178/178 [==============================] - 68s 378ms/step - loss: 0.1980 - accuracy: 0.9433 - val_loss: 0.2136 - val_accuracy: 0.9234
Epoch 25/50
178/178 [==============================] - 67s 378ms/step - loss: 0.2100 - accuracy: 0.9421 - val_loss: 0.2040 - val_accuracy: 0.9336
Epoch 26/50
178/178 [==============================] - 66s 372ms/step - loss: 0.1881 - accuracy: 0.9530 - val_loss: 0.3867 - val_accuracy: 0.8570
Epoch 27/50
178/178 [==============================] - 68s 378ms/step - loss: 0.1273 - accuracy: 0.9618 - val_loss: 0.1067 - val_accuracy: 0.9656
Epoch 28/50
178/178 [==============================] - 67s 373ms/step - loss: 0.1468 - accuracy: 0.9579 - val_loss: 1053.6357 - val_accuracy: 0.3148
Epoch 29/50
178/178 [==============================] - 67s 377ms/step - loss: 0.2129 - accuracy: 0.9574 - val_loss: 0.1300 - val_accuracy: 0.9523
Epoch 30/50
178/178 [==============================] - 69s 388ms/step - loss: 0.1423 - accuracy: 0.9651 - val_loss: 0.2734 - val_accuracy: 0.9469
Epoch 31/50
178/178 [==============================] - 71s 397ms/step - loss: 0.1071 - accuracy: 0.9639 - val_loss: 1.1121 - val_accuracy: 0.6656
Epoch 32/50
178/178 [==============================] - 71s 398ms/step - loss: 0.2856 - accuracy: 0.9319 - val_loss: 155.1677 - val_accuracy: 0.2977
Epoch 33/50
178/178 [==============================] - 70s 394ms/step - loss: 0.4239 - accuracy: 0.8808 - val_loss: 0.7172 - val_accuracy: 0.8516
Epoch 34/50
178/178 [==============================] - 70s 393ms/step - loss: 0.1874 - accuracy: 0.9350 - val_loss: 0.2975 - val_accuracy: 0.9086
Epoch 35/50
178/178 [==============================] - 70s 392ms/step - loss: 0.2327 - accuracy: 0.9350 - val_loss: 3084.9941 - val_accuracy: 0.3680
Epoch 36/50
178/178 [==============================] - 69s 387ms/step - loss: 0.1619 - accuracy: 0.9552 - val_loss: 0.2330 - val_accuracy: 0.9320
Epoch 37/50
178/178 [==============================] - 70s 390ms/step - loss: 0.1828 - accuracy: 0.9540 - val_loss: 0.2234 - val_accuracy: 0.9227
Epoch 38/50
178/178 [==============================] - 70s 393ms/step - loss: 0.2110 - accuracy: 0.9579 - val_loss: 0.7641 - val_accuracy: 0.8156
Epoch 39/50
178/178 [==============================] - 75s 422ms/step - loss: 0.1894 - accuracy: 0.9514 - val_loss: 0.5340 - val_accuracy: 0.8102
Epoch 40/50
178/178 [==============================] - 72s 403ms/step - loss: 0.1235 - accuracy: 0.9569 - val_loss: 0.1776 - val_accuracy: 0.9414
Epoch 41/50
178/178 [==============================] - 71s 397ms/step - loss: 0.0921 - accuracy: 0.9722 - val_loss: 0.0859 - val_accuracy: 0.9641
Epoch 42/50
178/178 [==============================] - 71s 398ms/step - loss: 0.1393 - accuracy: 0.9581 - val_loss: 0.2020 - val_accuracy: 0.9273
Epoch 43/50
178/178 [==============================] - 71s 399ms/step - loss: 0.0915 - accuracy: 0.9680 - val_loss: 0.2554 - val_accuracy: 0.9156
Epoch 44/50
178/178 [==============================] - 76s 426ms/step - loss: 0.0870 - accuracy: 0.9734 - val_loss: 0.1290 - val_accuracy: 0.9539
Epoch 45/50
178/178 [==============================] - 71s 395ms/step - loss: 0.0587 - accuracy: 0.9799 - val_loss: 0.0840 - val_accuracy: 0.9750
Epoch 46/50
178/178 [==============================] - 71s 395ms/step - loss: 0.0616 - accuracy: 0.9805 - val_loss: 0.1492 - val_accuracy: 0.9664
Epoch 47/50
178/178 [==============================] - 71s 397ms/step - loss: 0.0792 - accuracy: 0.9764 - val_loss: 0.1646 - val_accuracy: 0.9422
Epoch 48/50
178/178 [==============================] - 72s 401ms/step - loss: 0.0611 - accuracy: 0.9778 - val_loss: 0.7863 - val_accuracy: 0.8562
Epoch 49/50
178/178 [==============================] - 71s 397ms/step - loss: 0.1615 - accuracy: 0.9585 - val_loss: 2.0743 - val_accuracy: 0.6062
Epoch 50/50
178/178 [==============================] - 71s 395ms/step - loss: 0.0906 - accuracy: 0.9699 - val_loss: 0.1145 - val_accuracy: 0.9656
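The 178/178 shown in every progress bar follows directly from the steps_per_epoch formula. The arithmetic below reproduces it from the image counts reported by flow_from_directory(), assuming the batch size of 32 set in the data generators. It also shows that 16 training images and 31 test images do not fit into the full batches that fit() is told to draw in each epoch.

```python
batch_size = 32
train_samples = 5712  # "Found 5712 images belonging to 4 classes."
test_samples = 1311   # "Found 1311 images belonging to 4 classes."

# Integer division, exactly as in the fit() call
steps_per_epoch = train_samples // batch_size
validation_steps = test_samples // batch_size

# Images that do not fit into the full batches counted above
train_skipped = train_samples - steps_per_epoch * batch_size
val_skipped = test_samples - validation_steps * batch_size

print(steps_per_epoch, validation_steps)  # 178 40
print(train_skipped, val_skipped)         # 16 31
```

So each per-epoch val_accuracy is computed on 40 × 32 = 1280 test images, whereas the final classifier.evaluate(test_generator) call below passes no steps argument and covers all 1311 images (its reported accuracy, 0.96491..., equals 1265/1311).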
# Save the model. This path might differ in your case
classifier.save('/kaggle/working/Classifier.h5')

# Evaluate the model
score = classifier.evaluate(test_generator, verbose=0)
print('Validation loss:', score[0])
print('Validation accuracy:', score[1])

Validation loss: 0.11681337654590607
Validation accuracy: 0.9649122953414917
In the previous case study, we did not evaluate the model with quantitative metrics. Instead, we tested it on user-supplied images and videos to demonstrate the emotion detection model's performance first-hand. Here, you can also see and test the deployed model yourself, but some readers may find it difficult to judge its performance from a few examples alone. Hence, in this case study, we also report the model's evaluation metrics (loss and accuracy) to quantify its performance.
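A single accuracy figure can hide how errors are distributed across tumor classes, and that distribution matters in a medical setting. The sketch below computes a confusion matrix and per-class precision and recall in plain Python. It uses toy labels chosen only for illustration; they are not results from this case study. To apply it to the real model, you would need true and predicted labels for the test set. We assume (this is not shown in the case study) that the predicted labels come from `classifier.predict(test_generator)` followed by an argmax over classes. We also assume `test_generator` was created with `flow_from_directory(..., shuffle=False)`, so that its `.classes` attribute lists the true labels in the same order. The helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_precision_recall(cm):
    """Return a list of (precision, recall) tuples, one per class."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Toy labels for four hypothetical classes (illustration only)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=4)
accuracy = sum(cm[k][k] for k in range(4)) / len(y_true)
print("accuracy:", accuracy)
for k, (p, r) in enumerate(per_class_precision_recall(cm)):
    print(f"class {k}: precision={p:.2f} recall={r:.2f}")
```

In this toy example, the overall accuracy is 0.75. Class 3 nevertheless has a recall of only 0.5, and a low recall for a tumor class means missed cases. In practice, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` produce the same kind of summary directly.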
5 Output and Conclusion

Our deep learning model achieved encouraging results in classifying the MRI brain tumor images in the provided dataset. It reached an accuracy of about 96.5% (loss 0.117) when evaluated on test_generator. ResNet50 proved effective in our case. Its residual (skip) connections allow a very deep network to be trained reliably. By the final epoch, the gap between training accuracy (0.970) and validation accuracy (0.966) was small, which suggests limited overfitting, although validation accuracy fluctuated considerably from epoch to epoch. As with any machine learning model, high accuracy depends largely on the quality of the training data. The availability of large-scale annotated datasets has made more reliable models possible. However, handling unbalanced and noisy datasets remains a major challenge in medical image classification.

There is still room for improvement. Adding more training data might help the model capture features that are not present in the current dataset, which could increase its accuracy. Tuning the model's hyperparameters may also produce better results.

In conclusion, this work offers a practical way to classify different types of brain tumors from their MRI images, and the model's output may be used to improve future approaches to brain tumor classification. To see the model in action, visit: https://huggingface.co/spaces/Jishnnu/Brain-Tumor-Detection
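One common way to reduce the effect of an unbalanced dataset is to weight each class's contribution to the loss. The minimal sketch below computes "balanced" weights as n_samples / (n_classes × count of class c). This is the same heuristic that scikit-learn's `compute_class_weight(class_weight='balanced', ...)` uses. The label counts are hypothetical and are not taken from the case-study dataset. The resulting dictionary could be passed to Keras through `model.fit(..., class_weight=weights)`. This assumes the dictionary keys match the integer class indices used by the data generator (for a directory iterator, see its `class_indices` attribute).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class index to n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Hypothetical, deliberately unbalanced labels (illustration only)
labels = [0] * 50 + [1] * 30 + [2] * 20
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, rare classes get larger weights. Each class's total weight (weight × count) comes out equal, so no single class dominates the loss simply because it has more images.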
6 Credits
1. MRI Brain Tumor Image Dataset. https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset
2. ResNet101. https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet
Further Reading
Hussam Bin Mehare and Jishnu Pillai Anilkumar
Department of Mechanical Engineering, Z.H. College of Engineering and Technology, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
Department of Computer Science & Engineering, Presidency University, Bengaluru, Karnataka, India
In an era of rapid technological breakthroughs, we can be confident that computing systems and newly developed technologies will continue to set new standards and become ever more versatile. In the 1950s, artificial intelligence pioneer Arthur Samuel created one of the first self-learning programs, a checkers player, and the term “machine learning” is generally credited to him (1959). The defining characteristic of machine learning is its capacity to learn from experience, that is, from observations and trials, and to draw reliable conclusions from data through computer algorithms. Mathematics is its foundation, and calculus, probability, linear algebra, and statistics are four of its cornerstones. Statistical concepts underpin all models. Calculus allows us to understand and optimize them. Linear algebra provides an efficient way to represent and compute with large datasets. Probability lets us quantify uncertainty and predict how future events are likely to turn out. Machine learning has exploded in mainstream popularity, propelled by developments in mathematics, computer science, and modern computing systems, as well as by better datasets, deep learning, big data, and the internet of things. Its benefits can be seen in industries such as healthcare, bioinformatics, search engines, marketing, education, automobiles, aviation, maritime, engineering, manufacturing, logistics, agriculture, and finance. Bioinformatics increasingly relies on machine learning to conduct predictive analytics and derive holistic insights into intricate biological processes. Python has cemented its position as the language of choice for machine learning in the data science and IT sectors. Its simplicity, short development time, and consistent syntax make it well suited to machine learning applications.
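The way calculus is used to optimize a model can be made concrete with a small example. The sketch below fits a straight line y ≈ w·x + b by gradient descent. The partial derivatives of the mean squared error tell us which direction to move each parameter. The data points, learning rate, and step count are illustrative assumptions, and `fit_line` is a hypothetical helper rather than code from any of the references below.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error (MSE)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every data point
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(err^2) with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Take a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Points lying exactly on y = 2x + 1 (illustrative data)
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(f"w = {w:.3f}, b = {b:.3f}")
```

Deep learning frameworks apply the same idea to millions of parameters. They compute the gradients automatically and use linear algebra (vectorized matrix operations) to do it efficiently.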
Listed below are references to help you explore the breadth of machine learning, the mathematics behind it, its applications, and its prospects.
1. Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1), 230-265.
2. Samuel, A. L. (1967). Some studies in machine learning using the game of checkers. II—Recent progress. IBM Journal of Research and Development, 11(6), 601-617.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Badar (ed.), A Guide to Applied Machine Learning for Biologists, https://doi.org/10.1007/978-3-031-22206-1
3. Burkov, A. (2019). The hundred-page machine learning book (Vol. 1, p. 32). Quebec City, QC, Canada: Andriy Burkov.
4. Theobald, O. (2017). Machine learning for absolute beginners: a plain English introduction (Vol. 157). Scatterplot Press.
5. Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media.
6. Kelleher, J. D., Mac Namee, B., & D'Arcy, A. (2020). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies. MIT Press.
7. Maini, V., & Sabri, S. (2017). Machine learning for humans. Online: https://medium.com/machine-learning-for-humans.
8. Harrington, P. (2012). Machine learning in action. Simon and Schuster.
9. Brewka, G. (1996). Artificial intelligence—a modern approach by Stuart Russell and Peter Norvig, Prentice Hall. Series in Artificial Intelligence, Englewood Cliffs, NJ. The Knowledge Engineering Review, 11(1), 78-79.
10. Murphy, K. P. (2022). Probabilistic machine learning: an introduction. MIT Press.
11. McKinney, W. (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O'Reilly Media.
12. Ng, A. (2017). Machine learning yearning. URL: http://www.mlyearning.org/
13. Liu, Y. H. (2017). Python Machine Learning By Example. Packt Publishing Ltd.
14. Burkov, A. (2019). The hundred-page machine learning book (Vol. 1, p. 32). Quebec City, QC, Canada: Andriy Burkov.
15. Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260.
16. Wang, J. T., Zaki, M. J., Toivonen, H. T., & Shasha, D. (2005). Introduction to data mining in bioinformatics. In Data mining in bioinformatics (pp. 3-8). Springer, London.
17. Wu, X., Wang, J. T., Jain, L., Zaki, M. J., Shasha, D., & Toivonen, H. (Eds.). (2005). Data mining in bioinformatics. Springer Science & Business Media.
18. Zaki, M. J., Wang, J. T., & Toivonen, H. T. (2002). Biokdd 2002: Recent advances in data mining for bioinformatics. ACM SIGKDD Explorations Newsletter, 4(2), 112-114.
19. Lee, B. D., Gitter, A., Greene, C. S., Raschka, S., Maguire, F., Titus, A. J., Kessler, M. D., Lee, A. J., Chevrette, M. G., Stewart, P. A., Britto-Borges, T., Cofer, E. M., Yu, K. H., Carmona, J. J., Fertig, E. J., Kalinin, A. A., Signal, B., Lengerich, B. J., Triche, T. J., Jr, & Boca, S. M. (2022). Ten quick tips for deep learning in biology. PLoS Computational Biology, 18(3), e1009803. https://doi.org/10.1371/journal.pcbi.1009803
20. Ramsundar, B., Eastman, P., Walters, P., & Pande, V. (2019). Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O'Reilly Media.
21. Yang, A., Zhang, W., Wang, J., Yang, K., Han, Y., & Zhang, L. (2020). Review on the application of machine learning algorithms in the sequence data mining of DNA. Frontiers in Bioengineering and Biotechnology, 8, 1032.
22. Raschka, S., & Mirjalili, V. (2019). Python machine learning: Machine learning and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing Ltd.
23. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
24. Taigman, Y., Yang, M., Ranzato, M. A., & Wolf, L. (2014). Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1701-1708).
25. Russo, G., Reche, P., Pennisi, M., & Pappalardo, F. (2020). The combination of artificial intelligence and systems biology for intelligent vaccine design. Expert Opinion on Drug Discovery, 15(11), 1267-1281.
26. Dubitzky, W., & Azuaje, F. (Eds.). (2004). Artificial intelligence methods and tools for systems biology. Heidelberg: Springer.
27. Almalki, Y. E., Qayyum, A., Irfan, M., Haider, N., Glowacz, A., Alshehri, F. M., ... & Rahman, S. (2021, May). A novel method for COVID-19 diagnosis using artificial intelligence in chest X-ray images. In Healthcare (Vol. 9, No. 5, p. 522). Multidisciplinary Digital Publishing Institute.
28. Adams, S. J., Henderson, R. D., Yi, X., & Babyn, P. (2021). Artificial intelligence solutions for analysis of X-ray images. Canadian Association of Radiologists Journal, 72(1), 60-72.
29. Thomas, R. S., Rank, D. R., Penn, S. G., Zastrow, G. M., Hayes, K. R., Pande, K., ... & Bradfield, C. A. (2001). Identification of toxicologically predictive gene sets using cDNA microarrays. Molecular Pharmacology, 60(6), 1189-1194.
30. Kalinin, A. A., Higgins, G. A., Reamaroon, N., Soroushmehr, S., Allyn-Feuer, A., Dinov, I. D., ... & Athey, B. D. (2018). Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics, 19(7), 629-650.
31. Neftci, E. O., & Averbeck, B. B. (2019). Reinforcement learning in artificial and biological systems. Nature Machine Intelligence, 1(3), 133-143.
32. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., ... & Dean, J. (2019). A guide to deep learning in healthcare. Nature Medicine, 25(1), 24-29.
33. O'Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.
34. Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512.
35. Trask, N., Huang, A., & Hu, X. (2022). Enforcing exact physics in scientific machine learning: a data-driven exterior calculus on graphs. Journal of Computational Physics, 456, 110969.
36. Giesen, J., Klaus, J., Laue, S., & Schreck, F. (2019, June). Visualization Support for Developing a Matrix Calculus Algorithm: A Case Study. In Computer Graphics Forum (Vol. 38, No. 3, pp. 351-361).
37. Stokes, J. M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz, A., Donghia, N. M., ... & Collins, J. J. (2020). A deep learning approach to antibiotic discovery. Cell, 180(4), 688-702.
38. Lauzon, F. Q. (2012, July). An introduction to deep learning. In 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA) (pp. 1438-1439). IEEE.
39. Ebrahim, M., Al-Ayyoub, M., & Alsmirat, M. A. (2019, June). Will transfer learning enhance imagenet classification accuracy using imagenet-pretrained models?. In 2019 10th International Conference on Information and Communication Systems (ICICS) (pp. 211-216). IEEE.
40. Bolukbasi, T., Wang, J., Dekel, O., & Saligrama, V. (2017, July). Adaptive neural networks for efficient inference. In International Conference on Machine Learning (pp. 527-536). PMLR.
41. Wang, M., Cui, Y., Wang, X., Xiao, S., & Jiang, J. (2017). Machine learning for networking: Workflow, advances and opportunities. IEEE Network, 32(2), 92-99.
42. Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., & Schoellig, A. P. (2022). Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5, 411-444.
43. Brink, H., Richards, J., & Fetherolf, M. (2016). Real-world machine learning. Simon and Schuster.
44. Voulodimos, A., Doulamis, N., Doulamis, A., & Protopapadakis, E. (2018). Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, 2018.
45. Thadeshwar, H., Shah, V., Jain, M., Chaudhari, R., & Badgujar, V. (2020, September). Artificial intelligence based self-driving car. In 2020 4th International Conference on Computer, Communication and Signal Processing (ICCCSP) (pp. 1-5). IEEE.
46. Manoharan, S. (2019). An improved safety algorithm for artificial intelligence enabled processors in self driving cars. Journal of Artificial Intelligence, 1(02), 95-104.
47. Burley, S. K., Arap, W., & Pasqualini, R. (2021). Predicting proteome-scale protein structure with artificial intelligence. New England Journal of Medicine, 385(23), 2191-2194.
48. LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10), 1995.
49. Huang, S., Yang, J., Fong, S., & Zhao, Q. (2020). Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges. Cancer Letters, 471, 61-71.
50. Gibas, C., Jambeck, P., & Fenton, J. (2001). Developing bioinformatics computer skills. O'Reilly Media.
51. Do, J. H., & Choi, D. K. (2006). Computational approaches to gene prediction. The Journal of Microbiology, 44(2), 137-144.
52. Brian, W. (2021). Bioinformatics and machine learning in prevention, detection and treatment of HIV/AIDS (Doctoral dissertation, Brac University).
53. Arnold, K., Bordoli, L., Kopp, J., & Schwede, T. (2006). The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics, 22(2), 195-201.
54. Notredame, C., & Higgins, D. G. (1996). SAGA: sequence alignment by genetic algorithm. Nucleic Acids Research, 24(8), 1515-1524.
55. Layeb, A., Meshoul, S., & Batouche, M. (2006, April). Multiple sequence alignment by quantum genetic algorithm. In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium (pp. 8-pp). IEEE.
56. Richards, F. M., & Kundrot, C. E. (1988). Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure. Proteins: Structure, Function, and Bioinformatics, 3(2), 71-84.
57. Al-Lazikani, B., Jung, J., Xiang, Z., & Honig, B. (2001). Protein structure prediction. Current Opinion in Chemical Biology, 5(1), 51-56.
58. Nussinov, R., Tsai, C. J., Shehu, A., & Jang, H. (2019). Computational structural biology: Successes, future directions, and challenges. Molecules, 24(3), 637.
59. Gustavsen, J. A., Pai, S., Isserlin, R., Demchak, B., & Pico, A. R. (2019). RCy3: Network biology using Cytoscape from within R. F1000Research, 8.
60. Manghwar, H., Li, B., Ding, X., Hussain, A., Lindsey, K., Zhang, X., & Jin, S. (2020). CRISPR/Cas systems in genome editing: methodologies and tools for sgRNA design, off-target evaluation, and strategies to mitigate off-target effects. Advanced Science, 7(6), 1902312.
Index
A
Artificial intelligence (AI), 127–144, 147, 160, 175–186, 189, 191, 192, 198, 202, 207, 208, 210–213
Automotive industry, 189–191
Aviation, 191–193

B
Bioinformatics, 105–121, 176–186

C
Central processor unit (CPU), 4–7, 9, 10, 14, 15, 17–19
Computational biology, 109, 120

D
Data privacy, 216
DeepMind, 118, 142
Differentiation, 53, 81, 83–85, 87–89
Directories, 19–22, 47

G
Genomics, 110–113, 120

I
In silico, 177
Integration, 89, 90, 99, 206, 216

L
Libraries, 27, 28, 46–54, 59, 111, 118, 216
Linear algebra, 61–68, 70–73, 97–101

M
Machine learning (ML), 52–54, 58, 59, 61, 98, 118, 127–144, 147–173, 176–182, 184–186, 189–198, 200–203, 205, 207–217
Manufacturing, 2, 6, 10, 190, 201–205
Memory management, 14, 16, 17, 27

O
Operating systems, 2, 13–20, 23, 27, 141
Operators, 30–32, 56, 203, 204

P
Probability, 59, 61, 93–102, 132, 154, 160, 161, 208
Proteomics, 110
Python, 24–59
Python identifiers, 29

S
SARS-CoV-2, 107
Statistics, 49, 61, 105, 116, 142, 151, 154
Supervised learning, 134–136, 148–151, 154, 158–160, 162, 173, 186, 207
System biology, 176–179

U
Unsupervised learning, 134–136, 147, 148, 156–159, 162, 173, 186, 207