Key Concepts in Artificial Intelligence (Team-IRA) 1774691450, 9781774691458

This book explains numerous key terms that are related to Artificial Intelligence. Artificial intelligence is the simula


English Pages 287 [291] Year 2022


Table of contents :
Cover
Title Page
Copyright
ABOUT THE AUTHOR
TABLE OF CONTENTS
List of Figures
List of Abbreviations
Preface
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Z
Bibliography
Index
Back Cover

Copyright of this book belongs to Arcler.


Key Concepts in Artificial Intelligence


Jocelyn O. Padallan

www.arclerpress.com

Key Concepts in Artificial Intelligence Jocelyn O. Padallan

Arcler Press 224 Shoreacres Road Burlington, ON L7L 2H2 Canada www.arclerpress.com Email: [email protected]

e-book Edition 2022 ISBN: 978-1-77469-324-7 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated and copyright remains with the original owners. Copyright for images and other graphics remains with the original owners as indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data. Authors or Editors or Publishers are not responsible for the accuracy of the information in the published chapters or consequences of their use. The publisher assumes no responsibility for any damage or grievance to the persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The authors or editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify.

Notice: Registered trademarks of products or corporate names are used only for explanation and identification without intent of infringement. © 2022 Arcler Press ISBN: 978-1-77469-145-8 (Hardcover) Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com


ABOUT THE AUTHOR

Jocelyn O. Padallan is an Assistant Professor II at Laguna State Polytechnic University, Philippines. She is currently pursuing her Master of Science in Information Technology at Laguna State Polytechnic University, San Pablo Campus, and holds a Master of Arts in Education from the same university. She has a passion for teaching and has been an Instructor and Program Coordinator at Laguna State Polytechnic University.


TABLE OF CONTENTS


List of Figures
List of Abbreviations
Preface
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Z
Bibliography
Index

LIST OF FIGURES

Figure 1. Goal Setting and action planning
Figure 2. Components of adaptive resonance theory (ART)
Figure 3. Algorithm basic functioning process
Figure 4. Negamax alpha-beta pruning
Figure 5. Artificial intelligence
Figure 6. Artificial neural network image recognition
Figure 7. Fully automated diagnostic system with artificial intelligence
Figure 8. ALVIN vehicle
Figure 9. Backtracking representation
Figure 10. Bayes rule
Figure 11. The architecture of Bidirectional association memory
Figure 12. Cycle of case-based reasoning
Figure 13. Cassiopeia constellation map
Figure 14. Formula for Chi-squared statistics
Figure 15. Computer vision syndrome
Figure 16. Convolutional neural network feed forward example
Figure 17. Data mining process
Figure 18. Data science
Figure 19. Problem solution decision
Figure 20. Image showing edge coloring
Figure 21. Image showing evolutionary algorithm
Figure 22. Expectation-maximization (EM) algorithm
Figure 23. Fuzzy logic
Figure 24. Fuzzy cognitive chart of drug crime
Figure 25. A FPGA based bitcoin mining board
Figure 26. Man facial recognition technology
Figure 27. OLAP
Figure 28. Ontology language XOL used for cross-application communication
Figure 29. Optical character recognition (OCR)
Figure 30. Use of optical flow technique for different sequences
Figure 31. Parallel processing
Figure 32. Path matrix
Figure 33. Plan recognition
Figure 34. Different components of the predictive model
Figure 35. Architecture of probabilistic neural network (PNN)
Figure 36. Image showing saddle point
Figure 37. Image portraying semantic segmentation
Figure 38. Set Point functioning
Figure 39. Set point tree
Figure 40. Shafer-Shenoy Architecture for a join tree
Figure 41. Portrayal of shift register
Figure 42. Image showing signal filtering
Figure 43. Graph showing signal processing
Figure 44. Image showing Simpson’s paradox
Figure 45. Image showing simulated annealing
Figure 46. Representation of SIMD
Figure 47. Different smart apps in computing device
Figure 48. An example of smart machine
Figure 49. Robotic process automation
Figure 50. Pictorial wooden description of software
Figure 51. Spectral analysis of light source
Figure 52. Representation of voice recognition
Figure 53. Image showing statistical learning
Figure 54. STEM description
Figure 55. Thought Process behind Strong AI
Figure 56. Factors involved in supervised learning
Figure 57. SVM separating hyperplanes
Figure 58. Swarm behavior of birds
Figure 59. The environment of the TASA system
Figure 60. Tensor processing unit
Figure 61. Image showing travelling salesman problem
Figure 62. Truth management system on the basis of logic
Figure 63. The various subdomains integrated in the UMLS
Figure 64. Unsupervised learning
Figure 65. Wearable computer systems
Figure 66. Word error rate
Figure 67. WTM portray representation
Figure 68. Working envelope of a robot
Figure 69. Different types of wearable technology
Figure 70. Web 2.0 description
Figure 71. Word 2 VEC image
Figure 72. Zero sum game
Figure 73. Zero shot learning

LIST OF ABBREVIATIONS

ABS       Assumption Based System
AI        Artificial Intelligence
AIC       Akaike Information Criteria
AID       Automatic Interaction Detection
ALVINN    Autonomous Land Vehicle in a Neural Net
ANN       Artificial Neural Network
APACHE    Acute Physiology and Chronic Health Evaluation
APR       Apple Print Recognizer
ART       Adaptive Resonance Theory
ART       Advanced Reasoning Tool
ATR       Automatic Target Recognition
AVAS      Additivity and Variance Stabilization
BAM       Bidirectional Associative Memory
BIC       Bayesian Information Criteria
BN        Bayesian Network
BNIF      Bayesian Network Interchange Format
BPM       Business Process Modeling
CAD       Coronary Artery Disease
CAI       Computer-Aided Instruction
CART      Classification and Regression Trees
CBR       Case-Based Reasoning
CBT       Case-Based Thinking
CDM       Critical Decision Method
CHAMP     Churn Analysis, Modeling, and Prediction
CKML      Conceptual Knowledge Markup Language
CLIPS     C Language Integrated Production System
CNF       Conjunctive Normal Form
CNN       Convolutional Neural Networks
CTA       Cognitive Task Analysis
DAG       Directed Acyclic Graph
DMQL      Data Mining Query Language
DNF       Disjunctive Normal Form
EA        Evolutionary Algorithm
EDR       Electronic Dictionary Research
EM        Expectation-Maximization
EP        Evolutionary Programming
ETPS      Educational Theorem Proving System
FAM       Fuzzy Associative Memory
FCM       Fuzzy Cognitive Map
FIFO      First-In, First-Out
FPGA      Field-Programmable Gate Array
FRA       Fuzzy Rule Approximation
FRL       Frame Representation Language
GA        Genetic Algorithm
GAMs      Generalized Additive Models
GLD       Generalized Logic Diagram
GPS       General Problem Solver
GPTs      General-Purpose Technologies
GRNN      General Regression Neural Network
HME       Hierarchical Mixtures of Experts
HMM       Hidden Markov Model
HPC       High-Performance Computing
HTML      HyperText Markup Language
ICAI      Intelligent Computer-Aided Instruction
ICR       Image Character Recognition
IDB       Intensional DataBase
IES       Intelligent Enterprise Strategy
ILP       Inductive Logic Programming
IoT       Internet of Things
IOU       Intersection Over Union
KDD       Knowledge Discovery in Databases
KDDMS     Knowledge and Data Discovery Management Systems
KDT       Knowledge Discovery in Text
KIF       Knowledge Interchange Format
KQML      Knowledge Query and Manipulation Language
KRL       Knowledge Representation Language
LAD       Least Absolute Deviations
LDA       Linear Discriminant Analysis
LGG       Least General Generalization
LVQ       Learning Vector Quantization
MAP       Maximum a Posteriori
MAPS      Modular Automated Parking System
MARS      Multivariate Adaptive Regression Spline
MBR       Model-Based Reasoning
MCMC      Markov Chain Monte Carlo
MDDB      Multi-Dimensional DataBase
MDLP      Minimum Description Length Principle
MDP       Markov Decision Problem
ME        Mixture-of-Experts
MGCI      Most General Common Instance
MGU       Most General Unifier
MIMD      Multiple Instruction Multiple Datastream
MITA      Metropolitan Life Intelligent Text Analyzer
MLP       Multiple Layer Perceptron
MML       Minimum Message Length
MSE       Mean Square Error
NIL       New Implementation of Lisp
NLP       Natural Language Processing
NLPCA     NonLinear Principal Components Analysis
NNF       Negation Normal Form
NPPC      Nuclear Power Plant Consultant
NRBF      Normalized Radial Basis Function
OCR       Optical Character Recognition
OLAP      Online Analytical Processing
OLS       Ordinary Least Squares
OML       Ontology Markup Language
PAC       Probably Approximately Correct
PBD       Programming by Demonstration
PCFG      Probabilistic Context Free Grammar
PESKI     Probabilities Expert Systems Knowledge and Inference
PIERS     Pathology Expert Interpretative Reporting System
PLS       Partial Least Squares
PNN       Probabilistic Neural Networks
POS       Part-of-Speech
RAD       Rapid Application Development
RALPH     Rapidly Adapting Lateral Position Handler
RBF       Radial Basis Function
RBM       Restricted Boltzmann Machines
RDR       Ripple Down Rules
RNN       Recurrent Neural Network
ROLAP     Relational OnLine Analytic Processing
RPA       Robotic Process Automation
RTN       Recursive Transition Network
RTOS      Real-Time Operating System
SHOE      Simple HTML Ontology Extension
SIC       Schwarz Information Criteria
SIMD      Single Instruction Multiple Datastream
SME       Subject-Matter Expert
SOM       Self-Organizing Map
SQL       Structured Query Language
SVMs      Support Vector Machines
SVR       Support Vector Regression
TOVE      Toronto Virtual Enterprise
TPS       Theorem Proving System
UMLS      Unified Medical Language System
VRML      Virtual Reality Modeling Language
WER       Word Error Rate
WNG       Word N-Gram
WTM       Word-Tag Model
WWW       World Wide Web
XML       eXtensible Markup Language

PREFACE

Artificial intelligence (AI) refers to the ability of a digital computer or computer-controlled robot to carry out tasks commonly associated with intelligent beings. The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, such as the ability to reason, discover meaning, observe, or learn from past experience. Since the development of the digital computer in the 1940s, it has been shown that computers can be programmed to carry out very complex tasks, such as finding proofs for mathematical theorems or playing chess, with great proficiency. Still, despite continuing advances in processing speed and memory capacity, there are as yet no programs that can match human flexibility over wide domains or in tasks requiring much everyday knowledge. On the other hand, some programs have reached the performance levels of human experts and professionals in certain specific tasks, so that artificial intelligence in this limited sense is found in applications as varied as medical diagnosis, computer search engines, and voice or handwriting recognition.

The ideal characteristic of artificial intelligence is its ability to reason and take actions that have the best chance of achieving a specific goal. A subset of artificial intelligence is machine learning, which refers to the idea that computer programs can automatically learn from and adapt to new data without human assistance. Deep learning techniques enable this automatic learning by absorbing large amounts of unstructured data, for example, text, images, or video.

Artificial intelligence rests on the principle that human intelligence can be defined in such a way that a machine can mimic it and execute tasks, from the simplest to the most complex. The goals of artificial intelligence include mimicking human cognitive activity. Researchers and developers in the field are making surprisingly rapid strides in mimicking activities such as learning, reasoning, and perception, to the extent that these can be concretely defined. Some believe that innovators may soon be able to develop systems that exceed the capacity of humans to learn or reason out any subject. Others remain skeptical, because all cognitive activity is laced with value judgments that are subject to human experience.

As technology advances, earlier benchmarks that defined artificial intelligence become outdated. For instance, machines that calculate basic functions or recognize text by means of optical character recognition are no longer considered to embody artificial intelligence, since this function is now taken for granted as an inherent computer function.

This book introduces readers to the field of artificial intelligence and its fundamentals. It is designed for readers and students with no prior experience of AI and its applications, and touches upon a range of fundamental topics. By the end of the book, readers will understand the basics of artificial intelligence and the key concepts involved.


A

Aalborg Architecture

The Aalborg architecture provides a method for computing marginals in a join tree representation of a belief net. It handles new data quickly and flexibly and is considered the architecture of choice for calculating marginals of factored probability distributions. It does not, however, allow data to be retracted, because it stores only the current results rather than all the data.

Abduction

Abduction is a form of nonmonotone logic, first suggested by Charles Peirce in the 1870s. It attempts to quantify patterns and to suggest plausible hypotheses for a set of observations.

ABEL

ABEL is a modeling language that supports assumption-based reasoning. It is currently implemented in Macintosh Common LISP and is also available on the World Wide Web (WWW).

ABS

ABS is an acronym for Assumption Based System, a logic system that uses assumption-based reasoning.

ABSTRIPS

Derived from the STRIPS program, ABSTRIPS was also designed to solve robotic placement and movement problems. Unlike STRIPS, it orders the differences between the current state and the goal state, working from the most critical to the least critical difference.

Accuracy

In a machine learning system, accuracy measures the percentage of correct predictions or classifications made by the model over a specific data set. Its complement, the proportion of incorrect predictions on the same data, is called the error rate.
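The relationship between accuracy and error rate can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names `accuracy` and `error_rate` and the toy labels are assumptions made for the example, not something defined in the book.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Complement of accuracy on the same data set."""
    return 1.0 - accuracy(y_true, y_pred)


# Hypothetical toy data: three of the four predictions are correct.
y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]
print(accuracy(y_true, y_pred), error_rate(y_true, y_pred))
```

Both numbers are always computed over the same data set, so they sum to one.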


ACE

ACE is a regression-based technique that estimates additive models for smoothed response attributes. The transformations it finds can be very useful in understanding the nature of the problem at hand, as well as in providing predictions.

ACORN

ACORN was a hybrid rule-based Bayesian system for advising on the management of chest pain patients in the emergency room. It was developed and used in the mid-1980s.

Action-based Planning

Figure 1. Goal Setting and action planning. Source: Image by Capterra Blog.

The goal of action-based planning is to determine how to decompose a high-level action into a network of subactions that perform the requisite task. The major task within such a planning system is therefore to manage the constraints that apply to the interrelationships (e.g., ordering constraints) between actions. In fact, action-based planning is best viewed as a constraint satisfaction problem. The search for a plan cycles through the following steps: choose a constraint and apply the constraint check; if the constraint is not satisfied, choose a bug from the set of constraint bugs; choose and apply a fix, yielding a new plan and possibly a new set of constraints to check. In contrast, state-based planners generally conduct their search for a plan by reasoning about how the actions within a plan affect the state of the world and how the state of the world affects the applicability of actions.
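The check, bug, fix cycle described above can be sketched in a few lines of Python. This is a deliberately small illustration, not a real planner: it assumes that the only constraints are ordering constraints of the form "a must come before b", that action names are unique, and that the only available fix is to move the offending action in front of the other. The function name `repair_plan` and the example actions are hypothetical.

```python
def repair_plan(plan, ordering, max_steps=100):
    """Repeatedly check ordering constraints and fix the first violated one.

    plan     -- list of unique action names (assumed representation)
    ordering -- list of (before, after) pairs that the plan must respect
    """
    plan = list(plan)
    for _ in range(max_steps):
        # Constraint check: collect the "bugs" (violated constraints).
        bugs = [(a, b) for a, b in ordering if plan.index(a) > plan.index(b)]
        if not bugs:
            return plan  # every constraint is satisfied
        # Choose a bug and apply a fix: move `a` to just before `b`.
        a, b = bugs[0]
        plan.remove(a)
        plan.insert(plan.index(b), a)
    raise RuntimeError("no consistent plan found within max_steps")


constraints = [("buy_paint", "paint"), ("sand", "paint")]
print(repair_plan(["paint", "sand", "buy_paint"], constraints))
```

Each fix yields a new plan whose constraints are checked again; the `max_steps` guard is an assumption added so that contradictory constraints end in an error instead of an endless loop.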


Activation Functions

Neural networks obtain much of their power through the use of activation functions instead of the linear functions of classical regression models. Typically, the inputs to a node in a neural network are weighted and then summed. This sum is then passed through a nonlinear activation function. Usually, these functions are sigmoidal (monotone increasing) functions such as the logistic function or the cumulative Gaussian function, although output nodes should have activation functions matched to the distribution of the output variables. Activation functions are closely related to the link functions in statistical generalized linear models and have been intensively studied in that context.
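As a minimal sketch of the computation described above, the following Python code weights and sums the inputs to a single node and passes the sum through a logistic activation. The names `logistic` and `node_output` are illustrative, and the `bias` term is an assumption added for the example; the entry itself only mentions weighted inputs.

```python
import math


def logistic(z):
    """Sigmoidal (monotone increasing) activation mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias=0.0):
    """Weight and sum the inputs, then apply the nonlinear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(z)


# A weighted sum of zero sits exactly at the midpoint of the logistic curve.
print(node_output([0.0, 0.0], [1.0, 1.0]))
print(node_output([1.0, 2.0], [0.5, -0.25], bias=0.1))
```

Replacing `logistic` with the identity function would reduce the node to the linear combination used in classical regression.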

Active Learning

A proposed method for modifying machine learning algorithms by allowing them to specify test regions in order to improve their accuracy. At any point, the algorithm can choose a new point x, observe the output y, and incorporate the new (x, y) pair into its training base. It has been applied to neural networks, prediction functions, and clustering functions.
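One very simple instance of this idea is a learner that has to locate an unknown decision threshold on an interval and is free to choose where to ask for labels. The sketch below assumes a noise-free `oracle` that returns True when x is at or above the threshold; that oracle, the interval bounds, and the name `learn_threshold` are all hypothetical choices made for the example.

```python
def learn_threshold(oracle, lo=0.0, hi=1.0, queries=20):
    """Actively query the most uncertain point (the midpoint) on each round."""
    training = []
    for _ in range(queries):
        x = (lo + hi) / 2.0      # the learner chooses the new point x
        y = oracle(x)            # observe the output
        training.append((x, y))  # add the (x, y) pair to the training base
        if y:
            hi = x               # threshold is at or below x
        else:
            lo = x               # threshold is above x
    return (lo + hi) / 2.0, training


estimate, seen = learn_threshold(lambda x: x >= 0.3)
print(round(estimate, 4), len(seen))
```

Because every query halves the region of uncertainty, a handful of chosen points does the work of a much larger randomly drawn sample in this toy setting.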

Act-R

A goal-oriented cognitive architecture, Act-R is organized around a single goal stack. Its memory contains both declarative memory elements and procedural memory that contains production rules. The declarative memory elements have both activation values and associative strengths with other elements.

Acute Physiology and Chronic Health Evaluation (APACHE III)

APACHE is a system designed to predict a person’s risk of dying in a hospital. The system is based on a large collection of case data and uses 27 attributes to predict a patient’s outcome. It can also be used to evaluate the effect of a proposed or actual treatment plan.

ADABOOST

ADABOOST is a recently developed method for improving machine learning techniques. It can dramatically improve the performance of classification techniques (e.g., decision trees). It works by repeatedly applying the method to the data, evaluating the results, and then reweighting the observations to give greater weight to the cases that were misclassified. The final classifier uses all of the intermediate classifiers to classify an observation by a majority vote of the individual classifiers. It also has the interesting property that the generalization error (i.e., the error in a test set) can continue to decrease even after the error in the training set has stopped decreasing or reached 0. The technique is still under active development and investigation (as of 1998).
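The reweighting loop can be illustrated with a compact Python sketch. It assumes one-dimensional inputs, labels in {-1, +1}, and decision stumps as the base classifier; none of these choices come from the entry. It also follows the common formulation in which each intermediate classifier's vote is weighted by a coefficient `alpha`, which is a refinement of the plain majority vote mentioned above.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best decision stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=10):
    """Fit a list of (alpha, threshold, polarity) stumps by repeated reweighting."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Reweight: misclassified cases gain weight, correct ones lose it.
        updated = []
        for x, y, w in zip(xs, ys, weights):
            pred = polarity if x >= threshold else -polarity
            updated.append(w * math.exp(-alpha * y * pred))
        total = sum(updated)
        weights = [w / total for w in updated]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all intermediate classifiers."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump can separate this pattern
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

In this toy example no individual stump classifies all six points correctly, but the weighted vote of three stumps does.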

Adaptive Interface

A computer interface that automatically and dynamically adapts to the needs and competence of each individual user of the software. The benefit of an adaptive user interface lies in its ability to conform to a user’s needs. Its properties allow it to show only relevant and accurate information based on the current user. This creates less confusion for less experienced users and provides ease of access throughout a system.

Adaptive Resonance Theory (ART)

Figure 2. Components of adaptive resonance theory (ART). Source: Image by Pixabay.

A class of neural networks based on neurophysiologic models for neurons. They were invented by Stephen Grossberg in 1976. ART models use a hidden layer of ideal cases for prediction. If an input case is sufficiently close to an existing case, it “resonates” with the case, and the ideal case is updated to incorporate the new case. Otherwise, a new ideal case is added. ARTs are often represented as having two layers, referred to as the F1 and F2 layers. The F1 layer performs the matching and the F2 layer chooses the result. It is a form of cluster analysis.
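The "resonate or add a new ideal case" behaviour can be imitated with a very loose Python sketch. This is not an implementation of Grossberg's ART equations: it assumes that closeness is measured by Euclidean distance, that the match criterion is a fixed distance threshold (called `vigilance` here by analogy), and that a resonating ideal case is moved part of the way toward the new case. All names and parameters are hypothetical.

```python
import math


def art_like_cluster(cases, vigilance, rate=0.5):
    """Assign each case to a resonating ideal case or create a new one."""
    prototypes = []  # the stored ideal cases
    labels = []
    for case in cases:
        best, best_dist = None, math.inf
        for i, proto in enumerate(prototypes):
            dist = math.dist(case, proto)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist <= vigilance:
            # Resonance: update the ideal case to incorporate the new case.
            prototypes[best] = [p + rate * (c - p)
                                for p, c in zip(prototypes[best], case)]
        else:
            # No ideal case is close enough: add a new one.
            prototypes.append(list(case))
            best = len(prototypes) - 1
        labels.append(best)
    return prototypes, labels


cases = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
prototypes, labels = art_like_cluster(cases, vigilance=1.0)
print(labels, prototypes)
```

The number of clusters is not fixed in advance; it grows whenever an input fails to resonate with any stored ideal case, which is the property the entry describes.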


Adaptive Vector Quantization

A neural network approach that views the vector of inputs as forming a state space and the network as a quantization of those vectors into a smaller number of ideal vectors or regions. As the network “learns,” it adapts the location (and number) of these vectors to the data.

Additive Models

A modeling technique that uses weighted linear sums of the possibly transformed input variables to predict the output variable, but does not include terms, such as cross-products, that depend on more than a single predictor variable. Additive models are used in a number of machine learning systems, such as boosting, and in Generalized Additive Models (GAMs).
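The structure of an additive model can be shown in a short Python sketch: every term depends on exactly one input variable, possibly transformed, and the prediction is the weighted sum of those terms. The function name `additive_predict`, the variable names, the weights, and the `intercept` are illustrative assumptions rather than a fitted model.

```python
import math


def additive_predict(x, terms, intercept=0.0):
    """Weighted sum of single-variable transforms; no cross-product terms.

    x     -- dict mapping variable name to value
    terms -- list of (variable name, weight, transform) triples
    """
    return intercept + sum(weight * transform(x[name])
                           for name, weight, transform in terms)


# Hypothetical model: 1.0 + 0.5 * log(age) + 2.0 * income
terms = [
    ("age", 0.5, math.log),
    ("income", 2.0, lambda v: v),
]
print(additive_predict({"age": 1.0, "income": 3.0}, terms, intercept=1.0))
```

A term such as `age * income` would depend on two predictors at once and is exactly what the additive form excludes.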

Additivity and Variance Stabilization (AVAS)

AVAS, an acronym for Additivity and Variance Stabilization, is a modification of the ACE technique for smooth regression models. It adds a variance-stabilizing transform to the ACE technique and thus eliminates many of ACE’s difficulties in estimating a smooth relationship.

ADE Monitor

ADE Monitor is a CLIPS-based expert system that monitors patient data for evidence that a patient has suffered an adverse drug reaction. The system will include the capability for modification by physicians and will be able to notify the appropriate agencies when required.

Adjacency Matrix

An adjacency matrix is a useful way to represent a binary relation over a finite set. If the cardinality of a set A is n, the adjacency matrix for a relation on A is an n × n binary matrix, with a one for the i, j-th element if the relationship holds for the i-th and j-th elements and a zero otherwise. A number of path and closure algorithms implicitly or explicitly operate on the adjacency matrix. An adjacency matrix is reflexive if it has ones along the main diagonal, and is symmetric if the i, j-th element equals the j, i-th element for all i, j pairs in the matrix.
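The definition translates directly into code. The sketch below builds the binary matrix for a relation given as a collection of ordered pairs and checks the two properties mentioned above; the helper names and the "divides" example relation are assumptions chosen for illustration.

```python
def adjacency_matrix(elements, relation):
    """Build the n x n binary matrix of a relation given as (a, b) pairs."""
    index = {element: i for i, element in enumerate(elements)}
    n = len(elements)
    matrix = [[0] * n for _ in range(n)]
    for a, b in relation:
        matrix[index[a]][index[b]] = 1
    return matrix


def is_reflexive(matrix):
    """True if every entry on the main diagonal is one."""
    return all(matrix[i][i] == 1 for i in range(len(matrix)))


def is_symmetric(matrix):
    """True if entry (i, j) equals entry (j, i) for all pairs."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Example relation on A = {1, 2, 3}: "a divides b".
elements = [1, 2, 3]
divides = [(a, b) for a in elements for b in elements if b % a == 0]
m = adjacency_matrix(elements, divides)
print(m, is_reflexive(m), is_symmetric(m))
```

The "divides" relation is reflexive, since every number divides itself, but not symmetric, since 1 divides 2 while 2 does not divide 1.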


Advanced Reasoning Tool (ART)

The Advanced Reasoning Tool (ART) is a LISP-based knowledge engineering language. It is a rule-based system but also allows frame and procedure representations. It was developed by Inference Corporation. The same abbreviation (ART) is also used to refer to methods based on Adaptive Resonance Theory.

Advanced Scout

A specialized system, developed by IBM during the 1990s, that uses Data Mining techniques to organize and interpret data from basketball games.

Advice Taker

A program proposed by J. McCarthy that was intended to show commonsense and improvable behavior. The program was represented as a system of declarative and imperative sentences. It reasoned through immediate deduction. This system was a forerunner of the Situational Calculus suggested by McCarthy and Hayes in a 1969 article in Machine Intelligence.

Agenda Based Systems

An inference process that is controlled by an agenda or job list. It breaks the system into explicit, modular steps. Each of the entries, or tasks, in the job list is some specific task to be accomplished during a problem-solving process.

Agent Architecture

There are two levels of agent architecture when a number of agents are to work together toward a common goal. There is the architecture of the system of agents, which determines how they work together and need not be concerned with how individual agents fulfil their sub-missions; and there is the architecture of each individual agent, which determines its inner workings. The architecture of one software agent will permit interactions among most of the following components (depending on the agent’s goals): perceptors, effectors, communication channels, a state model, a model-based reasoner, a planner/scheduler, a reactive execution monitor, its reflexes (which enable the agent to react immediately to changes in its environment that it cannot wait on the planner to deal with), and its goals. The perceptors, effectors, and communication channels will also enable interaction with the agent’s outside world.

Agent CLIPS

Agent CLIPS is an extension of CLIPS that allows the creation of intelligent agents that can communicate on a single machine.

Agents

Agents are software programs that are capable of autonomous, flexible, purposeful, and reasoning action in pursuit of one or more goals. They are designed to take timely action in response to external stimuli from their environment on behalf of a human. When multiple agents are used together in a system, individual agents are expected to interact as appropriate to achieve the goals of the overall system. Also called autonomous agents, assistants, brokers, bots, droids, intelligent agents, or software agents.

AI Effect

The great practical benefits of AI applications, and even the existence of AI in many software products, go largely unnoticed by many people despite the already widespread use of AI techniques in software. This is the AI effect. Many marketing people do not use the term “artificial intelligence” even when their company’s products rely on some AI techniques. Why not? It may be because AI was oversold in the first giddy days of practical rule-based expert systems in the 1980s, with the peak perhaps marked by the Business Week cover of July 9, 1984 announcing, “Artificial Intelligence, IT’S HERE.”

James Hogan, in his book Mind Matters, offers his own explanation of the AI effect. AI researchers, he writes, talk about a peculiar phenomenon known as the “AI effect.” At the outset of a project, the goal is to coax a performance from machines in some designated area that everyone agrees would require “intelligence” if done by a human. If the project fails, it becomes a target of derision, pointed at by the skeptics as an example of the absurdity of the idea that AI could be possible. If it succeeds, with the process demystified and its inner workings laid bare as lines of ordinary computer code, the subject is dismissed as “not really all that intelligent after all.” Perhaps, Hogan suggests, the real threat that we resist is simply further demystification. It seems to happen repeatedly that a line of AI work finds itself diverted in such a direction that the measures that were supposed to mark its attainment are demonstrated brilliantly. The resulting new knowledge then typically stimulates demands for its application, and a burgeoning industry, market, and additional facet of our way of life comes into being, which within a decade we take for granted; but by then, of course, it is no longer AI.

AI Languages and Tools AI software has different requirements from other, conventional software. Therefore, specific languages for AI software have been developed. These include LISP, Prolog, and Smalltalk. While these languages often reduce the time needed to develop an artificial intelligence application, they can lengthen the time needed to execute it. Therefore, much AI software is now written in languages such as C++ and Java, which typically increases development time but shortens execution time. In addition, to reduce the cost of AI software, a range of commercial software development tools has been developed. Stottler Henke has developed its own proprietary tools for some of the specialized applications it is experienced in creating.

AI-QUIC AI-QUIC is a rule-based application used by American International Group’s underwriting section. It eliminates manual underwriting tasks and is designed to adapt quickly to changes in underwriting rules.

Akaike Information Criterion (AIC) The AIC is an information-based measure for comparing multiple models for the same data. It was derived by considering the loss of precision in a model when data-based estimates of the model’s parameters are substituted for the correct values. The equation for this loss includes a constant term, defined by the true model, −2 times the log-likelihood for the data given the model, plus a constant multiple (2) of the number of parameters in the model. Since the first term, involving the unknown true model, enters as a constant (for a given set of data), it can be dropped, leaving two known terms which can be evaluated. Algebraically, AIC is the sum of a (negative) measure of the errors in the model and a positive penalty for the number of parameters in the model. Increasing the complexity of the model will only improve the AIC if the fit (measured by the log-likelihood of the data) improves more than the cost for the extra parameters. A set of competing models can be compared by computing their AIC values and picking the model that has the smallest AIC value, the implication being that this model is closest to the true model. Unlike the usual statistical techniques, this allows for comparison of models that do not share any common parameters.
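The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data points, the two candidate models (a constant mean and a least-squares line), and the helper names aic and gaussian_log_likelihood are all hypothetical, and a Gaussian error model is assumed so that the maximized log-likelihood has a closed form.

```python
import math

def aic(log_likelihood, num_params):
    """AIC = -2 * log-likelihood + 2 * (number of fitted parameters)."""
    return -2.0 * log_likelihood + 2.0 * num_params

def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, with the variance estimated
    as the mean squared residual (an assumption of this sketch)."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

# Hypothetical data that is close to a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

# Model A: constant mean (fitted parameters: mean and variance, k = 2).
mean_y = sum(ys) / len(ys)
res_a = [y - mean_y for y in ys]

# Model B: least-squares line (slope, intercept and variance, k = 3).
mean_x = sum(xs) / len(xs)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
res_b = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

aic_a = aic(gaussian_log_likelihood(res_a), 2)
aic_b = aic(gaussian_log_likelihood(res_b), 3)

# The model with the smaller AIC is preferred.
best = "line" if aic_b < aic_a else "constant"
print(best)
```

Here the line pays a larger parameter penalty than the constant model, but its fit improves by far more than that penalty, so it receives the smaller AIC.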

Aladdin A pilot Case-Based Reasoning (CBR) system developed and tested at Microsoft in the 1990s. It addressed issues involved in setting up Microsoft Windows NT 3.1 and, in a second version, addressed support issues for Microsoft Word on the Macintosh. In tests, the Aladdin system was found to allow support engineers to provide support in areas for which they had little or no training.

Algorithm An algorithm is a set of instructions that explains how to solve a problem. It is usually first stated in English and mathematics, and from this, a programmer can translate it into executable code (that is, code to be run on a computer).

Algorithmic Distribution A probability distribution whose values can be determined by a function or algorithm which takes as an argument the configuration of the attributes and, optionally, some parameters. When the distribution is a mathematical function with a “small” number of parameters, it is commonly referred to as a parametric distribution.

Figure 3. Algorithm basic functioning process. Source: Image by MIT Technology Review.

Alpha-Beta Pruning

Figure 4. Negamax alpha-beta pruning. Source: Image by Wikimedia Commons.

An algorithm to prune, or shorten, a search tree. It is used by systems that generate trees of possible moves or actions. A branch of a tree is pruned when it can be shown that it cannot lead to a solution that is any better than a known good solution. As a tree is generated, the algorithm tracks two numbers called alpha and beta.
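A minimal Python sketch of the idea follows. It assumes a hypothetical game tree written as nested lists, where a number is a leaf score and a list is an internal node; the visited argument is an illustrative addition that records which leaves were actually evaluated.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Return the minimax value of node, skipping branches that cannot matter.

    alpha: best score the maximizing player can already guarantee.
    beta:  best score the minimizing player can already guarantee.
    """
    if not isinstance(node, list):          # leaf: a plain number
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:               # remaining children are pruned
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:                   # remaining children are pruned
            break
    return value

# Hypothetical two-level tree: the root maximizes, its children minimize.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
best = alphabeta(tree, visited=seen)
print(best, seen)
```

In this example the leaves 9 and 1 are never evaluated: once the second and third branches are known to be worth at most 2 and 0, they cannot beat the value 3 already secured by the first branch.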

Analogy A method of reasoning or learning that works by comparing the current situation with other situations that are in some sense similar.

Analytical Model In Data Mining, a structure and process for analyzing and summarizing a database. Some examples would include a Classification and Regression Trees (CART) model to classify new observations, or a regression model to predict new values of one (set of) variable(s) given another set.

Ancestral Ordering Since Directed Acyclic Graphs (DAGs) do not contain any directed cycles, it is possible to generate a linear ordering of the nodes so that any descendants of a node follow their ancestors in the ordering. This can be used in probability propagation on the net.
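One common way to produce such an ordering is a topological sort. The sketch below uses Kahn's algorithm, which is one of several possible methods; the function name, the input format (a dictionary mapping every node to a list of its parents), and the small example graph are assumptions made for illustration.

```python
from collections import deque

def ancestral_ordering(parents):
    """Return the nodes so that every node appears after all of its parents.

    parents maps each node to a list of its parent nodes; every parent
    must itself appear as a key (an assumption of this sketch).
    """
    children = {node: [] for node in parents}
    remaining = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)

    ready = deque(node for node, count in remaining.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:       # all parents already placed
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("graph contains a directed cycle")
    return order

# Hypothetical three-node DAG.
dag = {"rain": [], "sprinkler": ["rain"], "wet_grass": ["rain", "sprinkler"]}
order = ancestral_ordering(dag)
print(order)
```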

AND Versus OR Nondeterminism Logic programs do not specify the order in which AND propositions and “A if B” propositions are evaluated. This can affect the efficiency of the program in finding a solution, particularly if one of the branches being evaluated is very long.

Apple Print Recognizer (APR) The Apple Print Recognizer (APR) is the handwriting recognition engine supplied with the eMate and later Newton systems. It uses an artificial neural network classifier, language models, and dictionaries to allow the systems to recognize printing and handwriting. Stroke streams were segmented and then classified using a neural net classifier. The probability vectors produced by the Artificial Neural Network (ANN) were then used in a content-driven search driven by the language models.

Arcing Procedures Arcing procedures are a general class of Adaptive Resampling and Combining techniques for improving the performance of machine learning and statistical techniques. Two prominent examples are ADABOOST and bagging. In general, these techniques iteratively apply a learning technique, such as a decision tree, to a training set, and then reweight, or resample, the data and refit the learning technique to the data. This produces a collection of learning rules. New observations are run through all members of the collection, and the predictions or classifications are combined to produce a combined result by averaging or by a majority-rule prediction. Although less interpretable than a single classifier, these techniques can produce results that are far more accurate than a single classifier. Research has shown that they can produce minimal (Bayes) risk classifiers.

ARF A general problem solver developed by R.R. Fikes in the late 1960s. It combined constraint satisfaction methods and heuristic searches. Fikes also developed REF, a language for stating problems for ARF.

ARIS ARIS is a commercially applied AI system that assists in the allocation of airport gates to arriving flights. It uses rule-based reasoning, constraint propagation, and spatial planning to assign airport gates, and provides the human decision makers with an overall view of the current operations.

Artificial Intelligence Artificial intelligence (AI) is the mimicking of human thought and cognitive processes to solve complex problems automatically. AI uses techniques for writing computer code to represent and manipulate knowledge. Different techniques mimic the different ways that people think and reason (see Case-based Reasoning and Model-based Reasoning for examples). AI applications can be either stand-alone software, such as decision support software, or embedded within larger software or hardware systems. AI has been around for about 50 years, and while the early optimism about quickly matching human reasoning capabilities has not yet been realized, there is a significant and growing set of valuable applications. AI has not yet mimicked much of the common-sense reasoning of a five-year-old child.

Figure 5. Artificial intelligence. Source: Image by Wikimedia Commons.

Nevertheless, it can successfully mimic many expert tasks performed by trained adults, and there is probably more artificial intelligence being used in practice in one form or another than most people realize. Truly intelligent applications will only be achievable with artificial intelligence, and it is the mark of a successful designer of AI software to deliver functionality that cannot be delivered without using AI. See citations.

Artificial Neural Network (ANN) A learning model created to act like a human brain that solves tasks that are too difficult for traditional computer systems to solve.

Figure 6. Artificial neural network image recognition. Source: Image by Wikimedia Commons.

ARTMAP A supervised learning version of the ART-1 model. It learns specified binary input patterns. There are various supervised ART algorithms that are named with the suffix “MAP,” as in Fuzzy ARTMAP. These algorithms cluster both the inputs and targets and associate the two sets of clusters. The main disadvantage of the ARTMAP algorithms is that they have no mechanism to avoid overfitting and hence should not be used with noisy data.

Assembly Language A type of computer language that uses simple abbreviations and symbols to stand for machine language. The code is processed by an assembler, which translates the text file into a set of computer instructions. For example, the assembly language instruction that causes the program to store the value 3 in location 27 might be STO 3 @27.

Assertion In a knowledge base, logic system, or ontology, an assertion is any statement that is defined a priori to be true. This can include things like axioms, values, and constraints.

Association Rule Templates Searches for association rules in a large database can produce a very large number of rules. These rules can be redundant, obvious, and otherwise uninteresting to a human analyst. A mechanism is needed to weed out rules of this type and to emphasize rules that are interesting in a given analytic context. One such mechanism is the use of templates to exclude or emphasize rules related to a given analysis. These templates act as regular expressions for rules. The elements of templates could include attributes, classes of attributes, and generalizations of classes (e.g., C+ or C* for one or more members of C or zero or more members of C). Rule templates could be generalized to include C- or A- terms to forbid specific attributes or classes of attributes. An inclusive template would retain any rules that match it, while a restrictive template could be used to reject rules that match it. There are the usual problems when a rule matches multiple templates.

Association Rules An association rule is a relationship between a set of binary variables W and a single binary variable B, such that when W is true then B is true with a specified level of confidence (probability). The statement that the set W is true means that all of its components are true. Association rules are one of the common techniques in data mining and other Knowledge Discovery in Databases (KDD) areas. As an example, suppose you are looking at point-of-sale data. If you find that a person shopping on a Tuesday night who buys beer also buys diapers about 20% of the time, then you have an association rule {Tuesday, beer} → {diapers} that has a confidence of 0.2. The support for this rule is the proportion of cases that record that a purchase is made on Tuesday and that it includes beer.
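The two quantities in the example above can be computed directly from a list of transactions. The following Python sketch uses a hypothetical ten-row data set and a hypothetical function name; it follows this entry's definition of support (the share of cases matching the antecedent W), whereas some texts instead define support over W and B together.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support:    share of all transactions containing every antecedent item.
    confidence: share of those transactions that also contain the consequent.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    matches_w = [t for t in transactions if antecedent <= set(t)]
    matches_both = [t for t in matches_w if consequent <= set(t)]
    support = len(matches_w) / len(transactions)
    confidence = len(matches_both) / len(matches_w) if matches_w else 0.0
    return support, confidence

# Hypothetical point-of-sale records (day of week plus items bought).
transactions = [
    {"Tuesday", "beer", "diapers"},
    {"Tuesday", "beer"},
    {"Tuesday", "beer", "chips"},
    {"Tuesday", "beer"},
    {"Tuesday", "beer", "bread"},
    {"Tuesday", "milk"},
    {"Monday", "beer", "diapers"},
    {"Monday", "bread"},
    {"Friday", "beer"},
    {"Friday", "diapers"},
]

support, confidence = support_and_confidence(
    transactions, {"Tuesday", "beer"}, {"diapers"})
print(support, confidence)
```

With this made-up data, five of the ten records match {Tuesday, beer}, and one of those five also contains diapers, which reproduces the confidence of 0.2 used in the text.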

Associative Memories Associative memories work by recalling information in response to an information cue. Associative memories can be autoassociative or heteroassociative. Autoassociative memories recall the same information that is used as a cue, which can be useful for completing a partial pattern. Heteroassociative memories are useful as a memory. Human long-term memory is thought to be associative because of the way in which one thought retrieved from it leads to another. When we want to store a new item of information in our long-term memory, it typically takes us 8 seconds to store an item that cannot be associated with a pre-stored item, but only one or two seconds if there is an existing information structure with which to associate the new item.

Associative Memory Classically, locations in memory or within data structures, such as arrays, are indexed by a numeric index that starts at zero or one and is incremented sequentially for each new location. For example, in a list of persons stored in an array named person, the locations would be stored as person[0], person[1], person[2], and so on. An associative array allows the use of other forms of indices, such as names or arbitrary strings. In the above example, the index might become a relationship, or an arbitrary string such as a social security number, or some other meaningful value. Thus, for example, one could look up person[“mother”] to find the name of the mother, and person[“OldestSister”] to find the name of the oldest sister.
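In Python, the built-in dict type plays the role of an associative array. The short sketch below contrasts it with a numerically indexed list; the names stored are invented purely for illustration.

```python
# Classical array: positions are numbered 0, 1, 2, ...
persons = ["Ann", "Bob", "Cara"]
first = persons[0]

# Associative array: positions are identified by meaningful keys.
person = {"mother": "Ann", "OldestSister": "Cara"}
mother = person["mother"]
oldest_sister = person["OldestSister"]

# Looking up a key that was never stored; get() returns None
# instead of raising an error.
father = person.get("father")

print(first, mother, oldest_sister, father)
```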

Associative Property In formal logic, an operator has the associative property if the arguments in a clause or formula using that operator can be regrouped without changing the value of the formula. In symbols, if the operator O is associative, then a O (b O c) = (a O b) O c. Two common examples are the + operator in ordinary addition and the “and” operator in Boolean logic.

Automated Diagnosis Systems

Figure 7. Fully automated diagnostic system with artificial intelligence. Source: Image by Science Direct.com.

Most diagnosis work is done by expert humans such as mechanics, engineers, doctors, firefighters, customer service agents, and analysts of various kinds. All of us usually do at least a little diagnosis, even if it is not a major part of our working lives. We use a range of techniques for our diagnoses. Primarily, we compare a current situation with past ones and reapply, perhaps with small modifications, the best past solutions. If this does not work, we may run small mental simulations of possible solutions through our minds, based on first principles. We may do more complex simulations using first principles on paper or on computers, looking for solutions. Some problems are also amenable to quantitative solutions. We may hand off the problem to greater experts than ourselves, who use the same methods. The problem with humans doing diagnosis is that it often takes a long time and a lot of mistakes to learn to become an expert. Many situations simply do not recur frequently, and we may have to encounter each situation several times to become familiar with it. Automatic diagnosis systems can help avoid these problems, while helping humans to become experts faster. They work best in combination with a few human experts, as there are some diagnosis problems that humans are better at solving, and also because humans are more creative and adaptive than computers in coming up with new solutions to new problems.

Automatic Target Recognition (ATR) The ability of an algorithm or device to recognize targets or other objects based on data obtained from sensors.

Autonomous Agents A piece of AI software that automatically performs a task on a human’s behalf, or even on behalf of another piece of AI software, so that together they accomplish a useful task for a person somewhere. They are capable of independent action in dynamic, unpredictable environments. “Autonomous agent” is a trendy term that is sometimes reserved for AI software used in conjunction with the Internet (for example, AI software that acts as your assistant in intelligently managing your email). Autonomous agents present the best hope for gaining additional utility from computing facilities. Over the past few years, the term “agent” has been used very loosely. Our definition of a software agent is: “an intelligent software application with the authorization and capability to sense its environment and work in a goal-directed manner.” Generally, the term “agent” implies “intelligence,” meaning that the level of complexity of the tasks involved approaches that which would previously have required human intervention.

Assumption Based Reasoning Assumption-based reasoning is a logic-based extension of the symbolic evidence theory known as Dempster-Shafer theory. It is intended to tackle problems involving ambiguous, partial, or inconsistent data. It starts with a collection of propositional symbols, some of which are assumptions. When given a hypothesis, it tries to find arguments or explanations for it. Arguments that are sufficient to explain a hypothesis provide quasi-support for the hypothesis, whereas arguments that do not contradict the hypothesis constitute support for it. Arguments that contradict the hypothesis are known as doubts, and arguments under which the hypothesis is possible are referred to as plausibilities. The process of assumption-based reasoning thus entails determining the sets of supports and doubts. Note that this reasoning is done qualitatively. When probabilities are assigned to the assumptions, an assumption-based system (ABS) can also reason quantitatively. In this situation, the degrees of support, doubt, and plausibility can be computed as in the Dempster-Shafer theory. A language called ABEL was created to conduct these calculations.

Asymptotically Stable A dynamic system, such as one used in robotics and other control systems, is asymptotically stable with regard to a certain equilibrium point if, whenever it starts near the equilibrium point, it remains near the equilibrium point and approaches it asymptotically.

Atom An atom is the basic building block in the LISP programming language. It is a sequence of characters that begins with a letter, a digit, or any special character other than “(” or “)”. Examples are “atom,” “cat,” “3,” and “2.79.”

Attribute A (typically) named quantity with a range of values. These values are the domain of the attribute and can, in general, be either quantitative or qualitative, although they can also comprise other things, such as a picture. Its definition is synonymous with that of the statistical term “variable.” An attribute’s value is often referred to as its feature. Attributes with numerical values are frequently characterized as nominal, ordinal, integer, or ratio valued, as well as discrete or continuous.

Attribute-Based Learning Machine learning techniques such as classification and regression trees, neural networks, regression models, and related or derivative techniques are referred to as attribute-based learning. All of these techniques learn from the values of attributes but do not specify relationships between the parts of objects. Inductive Logic Programming is an alternative approach that focuses on learning relationships.

Augmented Transition Network Grammar Also known as an ATN. This provides a representation for linguistic rules that a machine can use in an efficient manner. The ATN is an extension of another transition grammar network, the Recursive Transition Network (RTN). ATNs add registers to hold partial parse structures and can be configured to record attributes (such as the speaker) and to execute tests on the validity of the present analysis.

Autoassociative The same collection of variables is used as predictors and targets in an autoassociative model. Typically, the purpose of these models is to conduct some type of data reduction or clustering.

AutoClass AutoClass is a machine learning tool that conducts unsupervised multivariate data categorization (clustering). It automatically determines the number of clusters using a Bayesian model and can handle combinations of discrete and continuous data as well as missing values. It probabilistically classifies the data, allowing an observation to be classified into many classes.

Autoepistemic Logic Autoepistemic logic is a nonmonotonic logic that was devised in the 1980s. It extends first-order logic by introducing a new operator that stands for “I know” or “I believe.” This feature enables introspection, which means that if the system knows some fact A, it also knows that it knows A, and it allows the system to change its beliefs when new information is acquired. Variations of autoepistemic logic may incorporate default logic within the autoepistemic logic.

Automatic Interaction Detection (AID) The Automatic Interaction Detection (AID) program was designed in the 1950s. This program was a precursor to Classification and Regression Trees (CART), CHAID, and other tree-based “automatic” data modeling programs. It employed recursive significance testing to find interactions in the database under consideration. As a result, the trees it grew tended to be very large and overly aggressive.

Autonomous Land Vehicle in a Neural Net (ALVINN) The Autonomous Land Vehicle in a Neural Net (ALVINN) is an example of a neural network applied to a real-time control problem. It was a three-layer neural network. Its input nodes were the elements of a 30 by 32 array of photosensors, each connected to five middle nodes. The middle layer was connected to a 32-element output array. It was trained using a mix of human experience and generated examples.

Figure 8. ALVINN vehicle. Source: Image by Huffpost UK.

Axiom In a logic system, an axiom is a sentence or relation that is presumed to be true. The axioms of Euclidean geometry and Kolmogorov’s axioms of probability are two well-known examples. A more mundane example would be the axiom that “all animals have a mother and a father” in a genetics tracking system (e.g., BOBLO).

B

Backpropagation A classical method for error propagation when training Artificial Neural Networks (ANNs). For standard backpropagation, the parameters of each node are changed according to the local error gradient. The method can be slow to converge, although it can be improved through the use of methods that slow the error propagation and by batch processing. Many alternative methods, such as the conjugate gradient and Levenberg-Marquardt algorithms, are more effective and reliable.

Backtracking

Figure 9. Backtracking representation. Source: Image by Wikimedia Commons.

A method used in search algorithms to retreat from an unacceptable position and restart the search at a previously known “good” position. Typical search and optimization problems involve choosing the “best” solution, subject to some constraints (for example, purchasing a house subject to budget limitations, proximity to schools, and so on). A “brute force” approach would look at all available houses, eliminate those that did not meet the constraints, and then order the solutions from best to worst. An incremental search would gradually narrow in on the houses under consideration. If, at one step, the search wandered into a neighborhood that was too expensive, the search algorithm would need a method to back up to a previous state.
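The retreat-and-retry pattern can be illustrated with a short Python sketch. It uses the classic N-queens puzzle rather than the house-hunting example above, simply because the constraints are easy to state in code; the function and variable names are illustrative.

```python
def solve_n_queens(n):
    """Place n queens on an n-by-n board so that none attack each other.

    Returns a list where entry r is the column of the queen in row r,
    or None if no placement exists.
    """
    cols = []  # cols[r] = column chosen for row r so far

    def safe(row, col):
        # A new queen conflicts with an earlier one if they share a
        # column or a diagonal.
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == row - r:
                return False
        return True

    def place(row):
        if row == n:
            return True                 # every row has a queen
        for col in range(n):
            if safe(row, col):
                cols.append(col)        # tentative choice
                if place(row + 1):
                    return True
                cols.pop()              # dead end: back up to the previous state
        return False

    return cols if place(0) else None

print(solve_n_queens(4))
print(solve_n_queens(3))
```

Each call to cols.pop() is the backtracking step: the search abandons a choice that led to a dead end and resumes from the last position that was still acceptable.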

Backward Chaining An alternative name for backward reasoning in expert systems and goal-planning systems.

Backward Reasoning In backward reasoning, a goal or conclusion is specified and the knowledge base is then searched to find subgoals that lead to this conclusion. These subgoals are compared with the premises and are either falsified, verified, or retained for further investigation. The reasoning process is repeated until the premises can be shown to support the conclusion, or until it can be shown that no premises support the conclusion.
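A minimal Python sketch of this goal-driven process is given below. The rule format (a dictionary mapping each conclusion to alternative lists of premises), the function name, and the toy knowledge base are all assumptions made for illustration; practical expert systems also handle variables and unification, which are omitted here.

```python
def backward_chain(goal, rules, facts, in_progress=None):
    """Return True if goal can be established from facts using rules.

    rules maps a conclusion to a list of alternative premise lists;
    facts is the set of statements known to be true.
    """
    if in_progress is None:
        in_progress = set()
    if goal in facts:
        return True                     # goal is verified directly
    if goal in in_progress:
        return False                    # avoid circular reasoning
    in_progress.add(goal)
    for premises in rules.get(goal, []):
        # Every premise becomes a subgoal that must itself be proved.
        if all(backward_chain(p, rules, facts, in_progress) for p in premises):
            in_progress.discard(goal)
            return True
    in_progress.discard(goal)
    return False                        # no rule supports this goal

# Hypothetical knowledge base.
rules = {
    "can_fly": [["is_bird", "has_wings"]],
    "is_bird": [["lays_eggs", "has_feathers"]],
}
facts = {"lays_eggs", "has_feathers", "has_wings"}

print(backward_chain("can_fly", rules, facts))
print(backward_chain("can_swim", rules, facts))
```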

Bag of Words Representation A technique used in certain Machine Learning and textual analysis algorithms. The bag of words representation of a text collapses the text into a list of words without regard for their original order. Unlike other forms of natural language processing, which treat the order of the words as significant (e.g., for syntactic analysis), the bag of words representation allows the algorithm to concentrate on the marginal and multivariate frequencies of words. It has been used in developing article classifiers and related applications.
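As a small illustration, the following Python sketch builds a bag of words with the standard library's Counter. The tokenization rule (lowercase runs of letters and apostrophes) is a simplifying assumption, and the two sentences are invented.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each word occurs, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")

print(a)
print(a == b)
```

The two sentences mean different things, yet they produce identical bags, which shows exactly what information this representation discards.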

Bayes Rule

Figure 10. Bayes rule. Source: Image by Wikimedia Commons.

The Bayes rule, or Bayes classifier, is an ideal classifier that can be used when the distributions of the inputs given the classes are known exactly, as are the prior probabilities of the classes themselves. Since everything is assumed to be known, it is a straightforward application of Bayes’ Theorem to compute the posterior probabilities of each class. In practice, this ideal state of knowledge is rarely attained, so the Bayes rule provides a goal and a basis for comparison for other classifiers.

Bayes’ Theorem Bayes’ Theorem is a fundamental theorem in probability theory that allows one to reason about causes based on effects. The theorem shows that if you have a proposition H, and you observe some evidence E, then the probability of H after seeing E should be proportional to your initial probability times the probability of E if H holds. In symbols, P(H|E) ∝ P(E|H)P(H), where P() is a probability, and P(A|B) represents the conditional probability of A when B is known to be true. Bayes’ Theorem provides a method for updating a system’s knowledge about propositions when new evidence arrives. It is used in many systems, such as Bayesian networks, that need to perform belief revision or need to make inferences conditional on partial data.
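The proportionality becomes an equality once the products P(E|H)P(H) are divided by their sum over all competing hypotheses. The Python sketch below shows this normalization; the function name and the numbers (a 1% prior and a test that is positive 95% of the time when H holds and 5% of the time otherwise) are hypothetical.

```python
def posterior(priors, likelihoods):
    """Return P(H | E) for each hypothesis H.

    priors[h]      = P(h), the probability before seeing the evidence
    likelihoods[h] = P(E | h), the probability of the evidence if h holds
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())       # P(E)
    return {h: value / evidence for h, value in unnormalized.items()}

# Hypothetical numbers for a positive test result.
priors = {"disease": 0.01, "healthy": 0.99}
likelihoods = {"disease": 0.95, "healthy": 0.05}

post = posterior(priors, likelihoods)
print(post)
```

Even with a fairly accurate test, the posterior probability of the disease in this made-up case is only about 0.16, because the prior probability was so low.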

Bayesian Belief Function A belief function that corresponds to an ordinary probability function is referred to as a Bayesian belief function. In this case, all of the probability mass is assigned to singleton sets, and none is assigned directly to unions of the elements.

Bayesian Hierarchical Model Bayesian hierarchical models specify layers of uncertainty on the phenomena being modeled and allow for multi-level heterogeneity in models for attributes. A base model is specified for the lowest-level observations, and its parameters are specified by prior distributions. Each level above this also has a model that can include other parameters or prior distributions.

Bayesian Knowledge Discoverer Bayesian Knowledge Discoverer is a freely available program to construct and estimate Bayesian belief networks. It can automatically estimate the network and export the results in the Bayesian Network Interchange Format (BNIF).

Bayesian Learning Classical modeling methods usually produce a single model with fixed parameters. Bayesian models instead represent the data with a distribution of models. Depending on the technique, this can be a posterior distribution on the weights for a single model, a variety of different models (e.g., a “forest” of classification trees), or some combination of these. When a new input case is presented, the Bayesian model produces a distribution of predictions that can be combined to get a final prediction, estimates of variability, and so on. Although more complicated than the usual models, these techniques also generalize better than the simpler models.

Bayesian Methods Bayesian methods provide a formal method for reasoning about uncertain events. They are grounded in probability theory and use probabilistic techniques to assess and propagate the uncertainty.

Bayesian Network (BN) A Bayesian Network is a graphical model that is used to represent probabilistic relationships among a set of attributes. The nodes, representing the states of attributes, are connected in a Directed Acyclic Graph (DAG). The arcs in the network represent probability models connecting the attributes. The probability models offer a flexible means to represent uncertainty in knowledge systems. They allow the system to specify the state of a set of attributes and infer the resulting distributions in the remaining attributes. The networks are called Bayesian because they use Bayes’ Theorem to propagate uncertainty throughout the network. Note that the arcs are not required to represent causal directions but rather represent the directions in which probability propagates.

Bayesian Network Interchange Format (BNIF) The Bayesian Network Interchange Format (BNIF) is a proposed format for describing and interchanging belief networks. This will allow the sharing of knowledge bases that are represented as a Bayesian Network (BN) and allow the many Bayes network programs to interoperate.

Bayesian Networks A modeling technique that provides a mathematically sound formalism for representing and reasoning about uncertainty, imprecision, or unpredictability in our knowledge. For example, seeing that the front lawn is wet, one might wish to determine whether it rained during the previous night. Inference algorithms can use the structure of the Bayesian network to calculate conditional probabilities based on whatever data has been observed (e.g., the street does not appear wet, so it is very likely that the wetness is due to the sprinklers). Bayesian networks offer or enable a set of benefits not provided by any other system for dealing with uncertainty: an easy-to-understand graphical representation, a strong mathematical foundation, and effective automated tuning mechanisms. These techniques have proved useful in a wide variety of tasks including medical diagnosis, natural language understanding, plan recognition, and intrusion detection. Also called belief networks, Bayes networks, or causal probabilistic networks.

Bayesian Updating A method of updating the uncertainty about an action or an event based on new evidence. The revised probability of an event E is P(E given new data) = P(E prior to data) * P(data given E) / P(data).

Beam Search Many search problems (e.g., a chess program or a planning program) can be represented by a search tree. A beam search evaluates the tree similarly to a breadth-first search, progressing level by level down the tree, but it only follows a best subset of nodes down the tree, pruning branches that do not have high scores based on their current state. A beam search that follows only the best current node is also termed a best-first search.
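The level-by-level pruning can be sketched in Python as follows. The function signature, the toy problem (spelling a target string one letter at a time), and the scoring rule are all hypothetical and chosen only to keep the example small.

```python
import heapq

def beam_search(start, expand, score, is_goal, beam_width, max_depth):
    """Explore level by level, keeping only the beam_width best states."""
    frontier = [start]
    for _ in range(max_depth):
        for state in frontier:
            if is_goal(state):
                return state
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            return None
        # Prune: keep only the highest-scoring states at this level.
        frontier = heapq.nlargest(beam_width, candidates, key=score)
    for state in frontier:
        if is_goal(state):
            return state
    return None

# Hypothetical toy problem: build the string "cab" letter by letter.
TARGET = "cab"

def expand(s):
    return [s + ch for ch in "abc"] if len(s) < len(TARGET) else []

def score(s):
    # Number of positions that already match the target.
    return sum(1 for x, y in zip(s, TARGET) if x == y)

result = beam_search("", expand, score, lambda s: s == TARGET,
                     beam_width=2, max_depth=3)
print(result)
```

Because only two states survive at each level, most of the 27 possible three-letter strings are never generated. The price of this saving is that a beam search can miss the goal if the scoring function ranks a necessary intermediate state too low.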

Belief A freely available program for the manipulation of graphical belief functions and graphical probability models. As such, it supports both belief and probabilistic manipulation of models. It also allows second-order models (hyper-distribution or meta-distribution). A commercial version is in development under the name of GRAPHICAL-BELIEF.

Belief Chain A belief net whose Directed Acyclic Graph (DAG) can be ordered as in a list, so that each node has one predecessor, except for the first, which has no predecessor, and one successor, except for the last, which has no successor.

Belief Core The core of a set in the Dempster-Shafer theory is the probability that is directly assigned to the set but not to any of its subsets. The core of a belief function is the union of all the sets in the frame of discernment which have a non-zero core (also known as the focal elements).

Belief Function In the Dempster-Shafer theory, the probability certainly assigned to a set of propositions is referred to as the belief for that set. It is a lower probability for the set. The upper probability for the set is the probability assigned to sets containing the elements of the set of interest and is the complement of the belief function for the complement of the set of interest (i.e., Pu(A) = 1 - Bel(not A)). The belief function is the function which returns the lower probability of a set. Belief functions can be compared by considering that the probabilities assigned to some repeatable event are a statement about the average frequency of that event. A belief function and upper probability only specify upper and lower bounds on the average frequency of that event. The probability addresses the uncertainty of the event, but is precise about the averages, while the belief function includes both uncertainty and imprecision about the average.
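
A minimal sketch of the lower and upper probabilities, assuming a mass assignment over subsets of a three-element frame. The frame, the masses, and the function names are hypothetical.

```python
def belief(mass, subset):
    """Lower probability: total mass of focal sets contained in `subset`."""
    return sum(m for focal, m in mass.items() if focal <= subset)

def plausibility(mass, subset):
    """Upper probability: total mass of focal sets that intersect `subset`."""
    return sum(m for focal, m in mass.items() if focal & subset)

# Hypothetical frame of discernment and mass assignment (masses sum to 1).
frame = frozenset({"a", "b", "c"})
mass = {frozenset({"a"}): 0.4, frozenset({"b", "c"}): 0.3, frame: 0.3}

A = frozenset({"a"})
bel_A = belief(mass, A)
pl_A = plausibility(mass, A)
print(bel_A, pl_A, 1 - belief(mass, frame - A))
```

The last two printed values agree, which is the relation Pu(A) = 1 - Bel(not A); the gap between the belief and the upper probability reflects the mass that was assigned to sets larger than A.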

Belief Net Used in probabilistic expert systems to represent relationships among variables, a belief net is a Directed Acyclic Graph (DAG) with variables as nodes, along with conditionals for each arc entering a node. The attribute(s) at the node are the head of the conditionals, and the attributes with arcs entering the node are the tails. These graphs are also referred to as Bayesian Networks (BN) or graphical models.

Belief Revision Belief revision is the process of modifying an existing knowledge base to account for new information. When the new information is consistent with the old information, the process is usually straightforward. When it contradicts existing information, the belief (knowledge) structure has to be revised to eliminate contradictions. Some methods include expansion, which adds new “rules” to the database; contraction, which eliminates contradictions by removing rules from the database; and revision, which maintains existing rules by changing them to adapt to the new information.

Belle A chess-playing system developed at Bell Laboratories. It was rated as a master-level chess player.

Berge Networks A chordal graphical network that has clique intersections of size one. Useful in the analysis of belief networks, models defined as Berge Networks can be collapsed into unique evidence chains between any desired pair of nodes, allowing easy inspection of the evidence flows.

Bernoulli Process The Bernoulli process is a simple model for a sequence of events that produce a binary outcome (usually represented by zeros and ones). If the probability of a “one” is constant over the sequence, and the events are independent, then the process is a Bernoulli process.

Best First Algorithm Used in exploring tree structures, a best first algorithm maintains a list of explored nodes with unexplored sub-nodes. At each step, the algorithm chooses the node with the best score and evaluates its sub-nodes. After the nodes have been expanded and evaluated, the node set is re-ordered and the best of the current nodes is chosen for further development.
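
One way to sketch this loop is with a priority queue that keeps the open nodes ordered by score. The function names, the convention that a lower score is better, and the toy problem are assumptions made for the example.

```python
import heapq

def best_first_search(start, expand, score, is_goal):
    """Repeatedly expand the best-scoring open node (lower score is better)."""
    counter = 0  # tie-breaker so the heap never compares nodes directly
    open_nodes = [(score(start), counter, start)]
    seen = {start}
    while open_nodes:
        _, _, node = heapq.heappop(open_nodes)
        if is_goal(node):
            return node
        for child in expand(node):
            if child not in seen:
                seen.add(child)
                counter += 1
                heapq.heappush(open_nodes, (score(child), counter, child))
    return None

# Toy problem (hypothetical): reach 13 from 1 using "+1" and "*2".
goal = 13
result = best_first_search(
    1,
    expand=lambda n: [n + 1, n * 2],
    score=lambda n: abs(goal - n),
    is_goal=lambda n: n == goal,
)
print(result)
```

The heap plays the role of the re-ordered node set: after each expansion the best of all open nodes, not just the newest ones, is chosen next.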

BESTDOSE It is an expert system that is designed to provide physicians with patient-specific drug dosing information. It was developed by First Databank, a provider of electronic drug information, using the Neuron Data “Elements Expert” system. It can alert physicians if it detects a potential problem with a dose and provide citations to the literature.

Bias Input Neural network models often allow for a “bias” term in each node. This is a constant term that is added to the sum of the weighted inputs. It acts in the same fashion as an intercept in a linear regression or an offset in a generalized linear model, letting the output of the node float to a value other than zero at the origin (when all the inputs are zero). This can also be represented in a neural network by a common input to all nodes that is always set to one.
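
The two equivalent ways of representing a bias term can be sketched as below, for a single node without an activation function. The function names and the weights are hypothetical.

```python
def node_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a constant bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def node_output_constant_input(inputs, weights_with_bias):
    """Same node, with the bias stored as the weight of an input fixed at 1."""
    extended = [1.0] + list(inputs)
    return sum(w * x for w, x in zip(weights_with_bias, extended))

weights, bias = [0.5, -0.25], 2.0   # hypothetical values
at_origin = node_output([0.0, 0.0], weights, bias)
elsewhere = node_output([2.0, 1.0], weights, bias)
same = node_output_constant_input([2.0, 1.0], [bias] + weights)
print(at_origin, elsewhere, same)
```

With all inputs at zero the output equals the bias, and the constant-input form gives the same result as the explicit bias term, which is why many implementations fold the bias into the weight vector.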

Bidirectional Associative Memory (BAM)

Figure 11. The architecture of a bidirectional associative memory. Source: Image by ResearchGate.

A two-layer feedback neural network with fixed connection matrices. When presented with an input vector, repeated application of the connection matrices causes the vector to converge to a learned fixed point.
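
A minimal sketch of recall in such a memory, assuming bipolar (+1/-1) patterns and a connection matrix built as a sum of outer products of the stored pairs. The stored patterns and helper names are hypothetical.

```python
def sign(value, previous):
    """Bipolar threshold; keep the previous state on a tie."""
    return 1 if value > 0 else -1 if value < 0 else previous

def train(pairs):
    """Connection matrix W[i][j] = sum over stored pairs of x[i] * y[j]."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    return [[sum(x[i] * y[j] for x, y in pairs) for j in range(m)]
            for i in range(n)]

def recall(W, x, max_iters=20):
    """Bounce between the two layers until the (x, y) pair stops changing."""
    y = [1] * len(W[0])
    for _ in range(max_iters):
        new_y = [sign(sum(W[i][j] * x[i] for i in range(len(x))), y[j])
                 for j in range(len(y))]
        new_x = [sign(sum(W[i][j] * new_y[j] for j in range(len(y))), x[i])
                 for i in range(len(x))]
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y

pairs = [([1, -1, 1, -1, 1, -1], [1, 1, -1, -1]),
         ([1, 1, 1, -1, -1, -1], [1, -1, 1, -1])]
W = train(pairs)
noisy = [1, -1, 1, -1, 1, 1]   # first stored pattern with its last bit flipped
x_out, y_out = recall(W, noisy)
print(x_out, y_out)
```

The same matrix is used in both directions (transposed on the way back), and the noisy input settles on the first stored pair after a couple of passes.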

Bidirectional Network A two-layer neural network where each layer provides input to the other layer, and where the synaptic matrix of layer 1 to layer 2 is the transpose of the synaptic matrix from layer 2 to layer 1.

Binary Resolution A formal inference rule that permits computers to reason. When two clauses are expressed in the proper form, a binary inference rule attempts to “resolve” them by finding the most general common clause. More formally, a binary resolution of the clauses A and B, with literals L1 and L2, respectively, one of which is positive and the other negative, such that L1 and L2 are unifiable ignoring their signs, is found by obtaining the Most General Unifier (MGU) of L1 and L2, applying that substitution to the clauses A and B to yield C and D respectively (with L1 and L2 becoming L3 and L4), and forming the disjunction of C-L3 and D-L4. This method has found many applications in expert systems, automatic theorem proving, and formal logic.
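
The sketch below shows only the propositional special case, where literals have no variables and so no unifier is needed; it is a simplification of the rule described above, not a full first-order implementation. The "~" prefix for negation and the function names are assumptions.

```python
def negate(literal):
    """Flip the sign of a literal written as "P" or "~P"."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(clause_a, clause_b):
    """Return every resolvent of two clauses (frozensets of literals)."""
    resolvents = []
    for literal in clause_a:
        if negate(literal) in clause_b:
            # Drop the complementary pair and join what remains.
            rest_a = clause_a - {literal}
            rest_b = clause_b - {negate(literal)}
            resolvents.append(rest_a | rest_b)
    return resolvents

a = frozenset({"P", "Q"})     # P or Q
b = frozenset({"~P", "R"})    # (not P) or R
resolvents = resolve(a, b)
print(resolvents)
```

Resolving a clause with its exact negation yields the empty clause, which is how resolution-based provers signal a contradiction.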

Binary Tree A binary tree is a specialization of the generic tree requiring that each non-terminal node has exactly two child nodes, usually referred to as a left node and a right node.

Binding An association in a program between an identifier and a value. The value can be either a location in memory or a symbol. Dynamic bindings are temporary and usually exist only briefly in a program. Static bindings typically last for the entire life of the program.

Binning Many learning algorithms only work on attributes that take on a small number of values. The process of converting a continuous attribute, or an ordered discrete attribute with many values, into a discrete variable with a small number of values is called binning. The range of the continuous attribute is partitioned into a number of bins, and the continuous attribute of each case is classified into a bin. A new attribute is constructed which consists of the bin number associated with the value of the continuous attribute. There are many algorithms to perform binning. Two of the most common are equi-length bins, where all the bins are the same size, and equiprobable bins, where each bin gets the same number of cases.
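
The two common schemes can be sketched as follows. The function names and the sample values are hypothetical, and ties in the equal-frequency version are split arbitrarily by sort order.

```python
def equal_width_bins(values, n_bins):
    """Bin numbers from splitting the value range into equal-length intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Bin numbers chosen so each bin receives about the same number of cases."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

ages = [18, 19, 20, 21, 22, 23, 60, 90]   # hypothetical attribute values
by_width = equal_width_bins(ages, 4)
by_count = equal_frequency_bins(ages, 4)
print(by_width)
print(by_count)
```

On skewed data such as this, equal-length bins put most cases into the first bin and leave one bin empty, while equiprobable bins spread the cases evenly at the cost of very unequal interval lengths.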

Binomial Distribution The binomial distribution is a basic distribution used in modeling collections of binary events. If events in the collection are assumed to have an identical probability of being a “one” and they occur independently, the number of “ones” in the collection will follow a binomial distribution. When the events can each take on the same set of multiple values but are still otherwise identical and independent, the distribution is called a multinomial. A classic example would be the result of a sequence of six-sided die rolls. If you were interested in the number of times the die showed a 1, 2, . . ., 6, the distribution of states would be multinomial. If you were only interested in the probability of a five or a six, without distinguishing them, there would be two states, and the distribution would be binomial.
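
As a small worked sketch, the probability of seeing exactly k "ones" in n independent events can be computed directly. The function name and the choice of ten rolls are assumptions; the die example treats a roll of five or six as a "one", so p = 2/6 for a fair die.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k "ones" in n independent events."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A "one" is a roll of five or six on a fair die, so p = 2/6.
n, p = 10, 2 / 6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
most_likely = max(range(n + 1), key=lambda k: pmf[k])
print(most_likely, sum(pmf))
```

The probabilities over k = 0, ..., n sum to one, and the most likely count sits near n * p.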

Bipartite Graph A bipartite graph is a graph with two types of nodes such that arcs from one type can only connect to nodes of the other type.

Blackboard A blackboard architecture system provides a framework for cooperative problem solving. Each of multiple independent knowledge sources can communicate with the others by writing to and reading from a blackboard database that contains the global problem states. A control unit determines the area of the problem space on which to focus.

Blocks World An artificial environment used to test planning and understanding systems. It is composed of blocks of various sizes and colors in a room or series of rooms.

Boolean Circuit A Boolean circuit of size N over k binary attributes is a device for computing a binary function or rule. It is a Directed Acyclic Graph (DAG) with N vertices that can be used to compute a Boolean result. It has k “input” vertices which represent the binary attributes. Its other vertices have either one or two input arcs. The single-input vertices complement their input variable, and the binary-input vertices take either the conjunction or disjunction of their inputs. Boolean circuits can represent concepts that are more complex than k-decision lists, but less complicated than a general disjunctive normal form.

Boosted Naïve Bayes (BNB) Classification The Boosted Naïve Bayes (BNB) classification algorithm is a variation on the ADABOOST classification with a Naïve Bayes classifier that re-expresses the classifier in order to derive weights of evidence for each attribute. This allows evaluation of the contribution of each attribute. Its performance is similar to ADABOOST.

Boosted Naïve Bayes Regression Boosted Naïve Bayes regression is an extension of ADABOOST to handle continuous data. It behaves as if the training set has been expanded in an infinite number of replicates, with two new variables added. The first is a cut-off point which varies over the range of the target variable, and the second is a binary variable that indicates whether the actual variable is above (1) or below (0) the cut-off point.

Bootstrapping Bootstrapping can be used as a means to estimate the error of a modeling technique, and can be considered a generalization of cross-validation. Basically, each bootstrap sample from the training data for a model is a sample, with replacement, from the entire training sample. A model is trained for each sample, and its error can be estimated from the data left unselected by that sample. Typically, a large number of samples (>100) are selected and fit. The technique has been extensively studied in the statistics literature.
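
A minimal sketch of the procedure, assuming a deliberately trivial "model" (the sample mean) and a squared-error measure. The function names, data, sample count, and seed are all hypothetical.

```python
import random

def bootstrap_error(data, fit, error, n_samples=200, seed=0):
    """Average error on the cases left out of each bootstrap sample."""
    rng = random.Random(seed)
    n = len(data)
    errors = []
    for _ in range(n_samples):
        picked = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        chosen = set(picked)
        held_out = [data[i] for i in range(n) if i not in chosen]
        if not held_out:
            continue  # rare: the sample happened to include every case
        model = fit([data[i] for i in picked])
        errors.append(error(model, held_out))
    return sum(errors) / len(errors)

def fit_mean(sample):
    """Toy model: predict the mean of the training sample."""
    return sum(sample) / len(sample)

def squared_error(model, cases):
    return sum((y - model) ** 2 for y in cases) / len(cases)

data = [2.1, 2.9, 3.2, 3.8, 4.1, 4.4, 5.0, 5.6, 6.3, 7.0]
estimate = bootstrap_error(data, fit_mean, squared_error)
print(estimate)
```

On average roughly a third of the cases are left out of each bootstrap sample, so every model is scored on data it did not see during fitting.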

Bottom-Up This modifier, like the top-down modifier, describes the strategy of a program or method used to solve problems. Given a goal and the current state, a bottom-up strategy would investigate all feasible steps (or states) that can be generated or reached from the current state. These are then added to the current state, and the procedure is repeated. Once the goal is reached or all derivative steps have been exhausted, the process comes to an end. These methods are sometimes known as data-driven or forward search or inference.

Bound and Collapse Bound and Collapse is a two-step approach for learning a Bayesian Network (BN) from incomplete data in databases. The two (repeated) steps are bounding the estimates with values that are consistent with the current state, followed by collapsing the estimate bounds using a convex combination of the bounds. Bayesian Knowledge Discoverer is an experimental program that employs this method.

Boundary Region The boundary region in a rough set analysis of a concept X is the (set) difference between the upper and lower approximations for that concept. In a rough set analysis of credit data, when the concept is “high credit risk,” the lower approximation of “high credit risk” would be the largest set containing only high credit risk cases. The upper approximation would be the smallest set containing all high credit risk cases, and the boundary region would be the cases that are in the upper approximation but not in the lower approximation. The cases in the boundary region include, by definition, some cases that do not belong to the concept, and they reflect the inconsistency of the attribute tables.
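
The credit example can be sketched as below, assuming that cases with identical attribute values are indistinguishable and therefore form one equivalence class. The table, the attribute names, and the function name are hypothetical.

```python
from collections import defaultdict

def approximations(cases, in_concept):
    """Lower approximation, upper approximation and boundary of a concept.

    `cases` maps a case id to its attribute tuple; cases with identical
    attributes fall into the same equivalence class.
    """
    classes = defaultdict(set)
    for case_id, attributes in cases.items():
        classes[attributes].add(case_id)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= in_concept:      # class lies wholly inside the concept
            lower |= members
        if members & in_concept:       # class overlaps the concept
            upper |= members
    return lower, upper, upper - lower

# Hypothetical credit table: (income, past default) per applicant.
cases = {1: ("low", "yes"), 2: ("low", "yes"), 3: ("low", "no"),
         4: ("low", "no"), 5: ("high", "no")}
high_risk = {1, 2, 3}
lower, upper, boundary = approximations(cases, high_risk)
print(lower, upper, boundary)
```

Applicants 3 and 4 share the same attributes but only one of them is high risk, so both land in the boundary region: the table cannot tell them apart.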

Bound Variable or Symbol When a variable or symbol is said to be bound, it has a value given to it. If no value has been assigned, the variable or symbol is unbound.

Box Computing Dimension A simplified form of the Hausdorff dimension (more commonly known as the box-counting dimension), used in document and vision analysis to evaluate the fractal dimension of a collection.

Box-Jenkins Analysis Box-Jenkins Analysis is a type of time series analysis in which the output is viewed as a series of systematic changes and cumulative random shocks. An alternative type of analysis is spectral analysis, which views the series of events as the output of a continuous process and models the amplitude of the frequencies of that output.

Boxplot A boxplot is a simple tool for illustrating a distribution. In its most basic form, it consists of a horizontal axis with a box above it, often with a spike protruding from each end. The box’s beginning and ending points represent a pair of percentiles, such as the 25th and 75th percentile points. The extreme percentiles (10 and 90) can be shown at the ends, and the center can be marked by a vertical line (median or mean).

Branch-and-Bound Search Branch-and-bound searches are used to improve searches through a tree representation of a solution space. As the algorithm traverses the tree, it keeps a list of all the partial paths that have already been evaluated. At each iteration, it selects the best (lowest cost) known path and develops it to the next level, scoring each of the new viable paths. These new paths are added to the list of viable paths, replacing their common ancestor, and the process is repeated at the best path currently available. When a solution is discovered, it can be improved by reevaluating the stored paths in order to eliminate more expensive options. The remaining alternatives can then be evaluated until they either produce a better solution or become more costly than the best-known solution.

Branching Factor A branching factor is a measure of a problem’s or search algorithm’s complexity. If an algorithm constructs a tree with a maximum depth of d and N nodes, the branching factor is B = N^(1/d). This metric can be used to compare different methods and techniques for a wide range of problems. It has been demonstrated that alpha-beta pruning produces the best results of any general game-searching algorithm for a wide range of tree types.
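
As a small worked sketch of the formula B = N^(1/d), with a hypothetical tree size and depth:

```python
def branching_factor(n_nodes, depth):
    """Branching factor B = N ** (1 / d) for N nodes and maximum depth d."""
    return n_nodes ** (1.0 / depth)

# Hypothetical search: 1000 nodes generated down to depth 3.
b = branching_factor(n_nodes=1000, depth=3)
print(round(b, 3))
```

A search that generates 1000 nodes by depth 3 behaves, on average, like a tree in which every node has about ten children; a pruning technique that generates fewer nodes at the same depth has a smaller branching factor.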

Brier Scoring Rule This distance measure is the squared Euclidean distance between two categorical distributions. It has been used as a scoring rule in classification and pattern recognition.
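
A minimal sketch, comparing a forecast distribution over three categories with the outcome coded as a distribution that puts all its mass on the observed category. The numbers and the function name are hypothetical.

```python
def brier_score(predicted, observed):
    """Squared Euclidean distance between two categorical distributions."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

forecast = [0.7, 0.2, 0.1]   # predicted class probabilities
outcome = [1, 0, 0]          # the first class actually occurred
score = brier_score(forecast, outcome)
print(round(score, 2))
```

A perfect forecast scores zero, and lower scores are better. Some authors divide by the number of categories or average over cases; that normalization is omitted here.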

Brute Force Algorithm The term “brute force algorithms” refers to algorithms that exhaustively examine every alternative. While this approach will always result in the “best” solution, it can also take an excessive amount of time or other resources in contrast to techniques that use another property of the problem to arrive at a solution, techniques that use a greedy approach, or techniques that use a limited look-ahead. As an example, consider the problem of determining the maximum of a function. A brute force approach would divide the feasible region into a fine grid and then evaluate the function at each grid point. For a “well-behaved” function, a smarter algorithm might evaluate it at a few points and use the results of those evaluations to move iteratively toward a solution, arriving at the maximum faster than the brute force technique.
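
The grid-evaluation approach for maximizing a function can be sketched as follows, in one dimension. The function being maximized, the interval, and the grid size are hypothetical.

```python
def brute_force_max(f, low, high, steps=1000):
    """Evaluate f at every grid point and return the best (x, f(x))."""
    best_x, best_value = low, f(low)
    for i in range(1, steps + 1):
        x = low + (high - low) * i / steps
        value = f(x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value

def f(x):
    return -(x - 2.0) ** 2 + 3.0   # hypothetical function with its maximum at x = 2

best_x, best_value = brute_force_max(f, 0.0, 5.0)
print(best_x, best_value)
```

The cost is one function evaluation per grid point, and the grid grows exponentially with the number of dimensions, which is why methods that exploit the shape of the function usually reach the maximum with far fewer evaluations.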

Bubble Graph A bubble graph is a type of Directed Acyclic Graph (DAG) in which the nodes represent groups of variables instead of just a single variable as in a DAG. They are used to represent multivariate head/tail relationships for conditionals in probabilistic expert systems.

BUGS BUGS is a freely available program for fitting Bayesian models. It can fit certain graphical models using Markov Chain Monte Carlo techniques, along with a large range of traditional models. The Microsoft Windows version, WinBUGS, includes a graphical interface as well as the ability to draw graphical models for later analysis.

C Computer Language A higher-level computer language designed for general systems programming at Bell Labs around the end of the 1960s and the early 1970s. It has the advantage of being very powerful and somewhat “close” to the machine, so it can generate very fast programs. Many production expert systems are based on C routines.

Caduceus An expert system for medical diagnosis developed by H. Myers and H. Pople at the University of Pittsburgh in 1985. This system is a successor to the INTERNIST program that incorporates causal relationships into its diagnoses.

Cardinality The cardinality of a set is the number of elements in the set. In general, the cardinality of an object is a measure, usually by some form of counting, of the size of the object.

Case An instance or example of an object, corresponding to an observation in traditional science or a row in a database table. A case has an associated feature vector, containing values for its attributes.

Case-Based Reasoning (CBR)

Figure 12. Cycle of case-based reasoning. Source: Image by ResearchGate.

Case-Based Reasoning (CBR) is a knowledge-based technique for automating reasoning from previous cases. When a CBR system is presented with an input configuration, it searches its database for similar configurations and makes predictions or inferences based on similar cases. The system is capable of learning through the addition of new cases into its database, along with some measure of the goodness, or fitness, of the solution.

Case-Based Reasoning Case-based reasoning (CBR) solves a current problem by retrieving the solutions to previous similar problems and altering those solutions to meet the current needs. It is based upon previous experiences and patterns of previous experiences. Humans with years of experience in a particular job or activity use this technique to solve many of their problems (e.g., a skilled paramedic arriving on an accident scene can often automatically know the best procedure to deal with a patient). One advantage of CBR is that inexperienced people can draw on the knowledge of experienced colleagues, including ones who are not in the organization, to solve their problems. Synonym: Reasoning by analogy.

CASSIOPEE

Figure 13. Cassiopeia constellation map. Source: Image by Wikimedia Commons.

A troubleshooting expert system developed as a joint venture between General Electric and SNECMA and applied to diagnose and predict problems for the Boeing 737. It used Knowledge Discovery in Databases (KDD) based clustering to derive “families” of failures.

Categorical Variable A variable or attribute that can only take a limited number of values. It is usually assumed that the values have no inherent order. Prediction problems with categorical outputs are usually referred to as classification problems.

Category Proliferation The term refers to ART networks and other machine learning algorithms’ tendency to generate a high number of prototype vectors as the size of input patterns grows.

Cautious Monotonicity Cautious monotonicity is a restricted form of monotone logic that allows existing theorems to be retained whenever new information follows from an old premise.

CHAID An early follow-on to the Automatic Interaction Detection (AID) technique, CHAID substituted Chi-Square tests on contingency tables for the earlier technique’s reliance on normal theory techniques and measures, such as t-tests and analyses of variance. The method performs better than the AID methodology on problems with many n-ary attributes (variables). However, the method suffers from its reliance on repeated statistical significance testing, because the theory on which these tests are based assumes, among other things, the independence of the data sets used in repeated testing (which is clearly violated when the tests are performed on recursive subsets of the data).

Chain Graph A different way of displaying multivariate relationships in a belief net. This graph contains both directed and undirected arcs, where the directed arcs represent head/tail relationships like in a belief graph while the undirected arcs represent multivariate relationships between sets of variables.

Chain Rule The chain rule is a technique for breaking down multivariable functions into simpler univariate functions. Two common examples are backpropagation in neural nets, where the prediction error at a neuron is broken down into a part due to local coefficients and a part due to error in incoming signals, which can be passed down to those nodes; and probability-based models, which can decompose complex probability models into products of conditional distributions. An example of the latter is a decomposition of P(A, B, C) into the product of P(A|B, C), P(B|C), and P(C), where P(X|Y) is the conditional probability of X given Y. This latter decomposition underlies much of the theory of belief nets.
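
The probability decomposition P(A, B, C) = P(A|B, C) P(B|C) P(C) can be checked numerically on a small joint distribution. The distribution below (three binary variables with arbitrary weights) and the helper names are assumptions made for the example.

```python
from itertools import product

# Hypothetical joint distribution over three binary variables (sums to 1).
weights = [1, 2, 3, 4, 5, 6, 7, 8]
joint = {abc: w / sum(weights)
         for abc, w in zip(product((0, 1), repeat=3), weights)}

def marginal(**fixed):
    """Sum the joint over every variable not named in `fixed`."""
    total = 0.0
    for (a, b, c), p in joint.items():
        world = {"a": a, "b": b, "c": c}
        if all(world[name] == value for name, value in fixed.items()):
            total += p
    return total

def chain_rule(a, b, c):
    """Rebuild P(a, b, c) as P(a | b, c) * P(b | c) * P(c)."""
    p_c = marginal(c=c)
    p_b_given_c = marginal(b=b, c=c) / p_c
    p_a_given_bc = marginal(a=a, b=b, c=c) / marginal(b=b, c=c)
    return p_a_given_bc * p_b_given_c * p_c

max_gap = max(abs(chain_rule(*abc) - joint[abc]) for abc in joint)
print(max_gap)
```

The product of the three factors reproduces every entry of the joint table up to rounding error. In a belief net, each factor is further simplified by dropping the conditioning variables that have no arc into the node.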

Character Recognition The ability of a computer to recognize a character’s picture as a character. This has been a long-term AI aim, and it has proven relatively successful for both machine- and hand-printed items.

Chatbots A chat robot (chatbot for short) is a computer program that simulates a conversation with human users by conversing via text chats, voice commands, or both. They are a frequent interface for computer applications with AI capabilities.

Checkers Playing Programs The best early checkers playing programs were written by Arthur Samuel from 1947 to 1967, and they can beat most players. Game-playing programs are significant because they provide an excellent environment for testing and evaluating various algorithms, as well as a way to test various learning and knowledge representation theories.

Chernoff Bound The Chernoff bound is a result from probability theory that provides upper and lower bounds on the deviation of a sample mean from the true mean. It is frequently used in the analysis of machine learning algorithms and other areas of computer science.

Chinook Chinook is a checkers playing program that holds the man-machine checkers championship. Chinook won the championship in 1994 by forfeit of the reigning human champion, Marion Tinsley, who resigned due to health problems during the match and later died of cancer. Since then, the program has defended its title. Chinook employs an alpha-beta search algorithm with a hand-tuned evaluation function and can search around 21 moves ahead. It features an endgame library of approximately 400 billion positions, as well as a large database of opening sequences.

Chi-Squared Statistic

Figure 14. Formula for Chi-squared statistics. Source: Image by Flickr.

A Chi-Squared Statistic is a test statistic used to compare a collection of data to a hypothesized distribution. When the data and the hypothesis diverge, this statistic has a high value. Its values are typically compared to those of a Chi-Squared distribution. It is often used as a measure of independence in contingency tables (cross classifications). In this context, it is the sum over all cells of the squared difference between the observed and expected counts in a cell, divided by the expected count (i.e., the sum of (observed - expected)^2 / expected).
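
For a contingency table, the expected count in each cell under independence is (row total * column total) / grand total. A minimal sketch, with hypothetical counts and function name:

```python
def chi_squared(table):
    """Chi-squared statistic for independence in a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

observed = [[10, 20], [20, 10]]   # hypothetical counts
stat = chi_squared(observed)
print(round(stat, 3))
```

The statistic would then be compared with a Chi-Squared distribution with (rows - 1) * (columns - 1) degrees of freedom; that lookup is not shown here.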

Chromosome This is a data structure in genetic algorithms that carries a sequence of task parameters known as genes. They are frequently encoded in such a way that mutations and crossovers are simple (i.e., changes in value and transfer between competing solutions).

Chunking Chunking is a knowledge representation approach used in systems like Soar. Data conditions are grouped together in such a way that data in one state implies data in another. Soar’s learning and goal-seeking behavior can be accelerated as a result of this chunking. When Soar resolves an impasse, its algorithms determine which working-memory elements enabled the impasse to be resolved. Those elements are then chunked. When a similar situation arises, the chunked results can be reused.

Church Numerals Church numerals are a functional representation of non-negative integers that allows numerical relationships to be manipulated on a purely logical level.
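
In this encoding, the numeral n is the function that applies another function n times. A minimal sketch using Python lambdas follows; the names `zero`, `succ`, `add`, `mul`, and `to_int` are chosen for the example.

```python
# The numeral n takes a function f and returns "apply f n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Decode a Church numeral by counting applications of "+1" from 0."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
five = to_int(add(two)(three))
six = to_int(mul(two)(three))
print(five, six)
```

Addition and multiplication are defined here without any built-in arithmetic; ordinary integers appear only in `to_int`, which is used to read off the result.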

Church’s Thesis An assertion that any algorithmic process defines a mathematical function that belongs to a well-defined class of functions known as recursive functions. It has enabled the proof that certain problems are unsolvable, as well as the proof of a number of other significant mathematical results. It also serves as a conceptual underpinning for the notions that AI is feasible and can be implemented in computers. It essentially implies that intelligence can be reduced to mechanical processes.

Churn Analysis, Modeling, and Prediction (CHAMP) Churn Analysis, Modeling, and Prediction (CHAMP) is a Knowledge Discovery in Databases (KDD) program being developed at GTE. Its goal is to model and predict cellular customer turnover (churn), allowing the company to reduce or influence that turnover.

Circumscription Circumscription is a type of nonmonotone logic. It works by supplementing the basic predicate logic with formulas that limit (circumscribe) the predicates in the initial formulae. A formula having a p-ary predicate symbol, for example, can be circumscribed by replacing the p-ary symbol with a predicate expression of arity p. Circumscription reaches its full potential in second-order logic, but its application has been limited due to existing computational constraints.

Class In a representation system, a class is an abstract grouping of objects, such as the class of automobiles. A class can have sub-classes, such as four-door sedans or convertibles, and (one or more) super-classes, such as the class of four-wheeled vehicles. An instance of a class is a specific object that meets the definitions of the class. The class can have slots that describe the class (own slots), slots that describe instances of the class (instance slots), and assertions that characterize the class (such as facets).

CLASSIC A knowledge representation system developed by AT&T for use in situations where rapid response to queries is more critical than the system’s expressive power. It is object-oriented and can express many of the features of a semantic network. Three different versions have been created. The most powerful version of CLASSIC is the original, written in LISP. C-Classic, a less powerful variant, was written in C. Neo-Classic, the most recent version, is written in C++. It is almost as powerful as the LISP version of CLASSIC.

Classification Classifiers, such as decision trees, have been found to be quite effective at distinguishing and characterizing very large amounts of data. Based on a set of observed features, they assign items to one of a set of predefined classes of objects. For example, depending on its color, size, and gill size, one can tell whether a mushroom is “poisonous” or “edible.” Through supervised learning, classifiers can be learned automatically from a set of examples. Classification rules are rules that distinguish between different partitions of a database based on various attributes included in the database. The database partitions are based on an attribute known as the classification label (for example, “faulty” and “good”).

Classification Methods Methods used in data mining and related fields (statistics) to create classification rules that can classify data into one of several predefined groups. The output of the rules can be a type of membership function, which is a specific form of regression. It gives a rough estimate of the likelihood that an observation belongs to one of the classes. The membership can be crisp or imprecise. A discriminant function that finds the most likely class, implicitly setting the membership in that class to one and in the others to zero, is an example of a crisp assignment. A multiple logistic regression or a Classification and Regression Trees (CART) tree, which provides a probability of membership for many classes, is an example of an imprecise membership function.

Classification The process of categorizing a group of records from a database (observations in a dataset) into one of a “small” number of pre-specified disjoint categories. Regression, which predicts a range of values, and clustering, which (usually) allows the categories to create themselves, are both related techniques. In numerous ways, the classification can be described as “fuzzy.” In the traditional sense, the classification technique allows a single record to belong to many (disjoint) categories, each with an estimated probability of being in that class. When the categories are created using a hierarchical model or an agglomerative technique, they may overlap. Finally, the classification can be fuzzy in the sense that it employs “fuzzy logic” techniques.

Clustering Clustering is a learning strategy that aims to automatically arrange items into meaningful groups based on their similarity. In contrast to classification, clustering does not require the groups to be predefined; the hope is that the algorithm will discover useful but previously unknown groupings of data points. NASA’s discovery of a new class of stellar spectra was a well-publicized success of a clustering system. Examples of applications that use clustering include IQE, GIIF, Web Mediator, Rome Graphics, and data mining.

Cognitive Science Artificial intelligence can be defined as the imitation of human intellect in order to accomplish valuable tasks such as problem-solving. This continuous involvement with the human mind, the motivation for AI, is required for the development of new paradigms, algorithms, and methodologies. To this purpose, AI software designers collaborate with cognitive psychologists and employ cognitive science concepts, particularly in knowledge elicitation and system design.

Cognitive Task Analysis Cognitive task analysis (CTA) is a systematic approach for identifying the cognitive elements of task performance. These elements comprise both domain knowledge and cognitive processing. In contrast to behavioral task analysis, which breaks the activity down into observable, procedural steps, CTA focuses on mental operations that cannot be observed. CTA is most useful for tasks with few observable behaviors. Cognitive processing elements include deciding, judging, noticing, assessing, recognizing, interpreting, prioritizing, and anticipating. Domain knowledge elements include concepts, principles, and interrelationships; goals and goal structures; rules, strategies, and plans; implicit knowledge; and mental models. CTA results can be used to identify content for training programs on complex cognitive tasks, to study expert-novice differences in domain knowledge and cognitive processing during task performance, to model expert performance in support of expert system design, and to design human-machine interfaces.

Collaborative Filtering A method of using historical data on the preferences of a group of users to help create recommendations or filter information for a specific user. Intuitively, the purpose of these strategies is to find what is interesting to people who are similar to that user in order to establish an idea of what may be attractive to that user. Examples of programs that use collaborative filtering techniques are GIIF and IQE.
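
The idea of finding what similar people liked can be sketched as follows. This is a minimal, user-based example under stated assumptions: the ratings table, the user names, and the choice of cosine similarity are all hypothetical and are not taken from GIIF or IQE.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def recommend(target, ratings):
    """Rank items the target has not rated by similarity-weighted ratings of others."""
    scores, weights = {}, {}
    for user, theirs in ratings.items():
        if user == target:
            continue
        sim = cosine(ratings[target], theirs)
        for item, rating in theirs.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = {item: scores[item] / weights[item] for item in scores if weights[item] > 0}
    return sorted(ranked, key=ranked.get, reverse=True)

# Hypothetical historical preferences (1 = disliked, 5 = liked).
ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 5, "D": 1},
    "cy":  {"A": 1, "B": 5, "C": 1, "D": 5},
}
suggestions = recommend("ann", ratings)
```

Because "ann" agrees with "bob" on the items they share, bob's opinions carry more weight in her recommendations than cy's do.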

Commonsense Reasoning Ordinary people can complete an astonishing number of complex tasks by employing simple, informal mental processes based on a large amount of common knowledge. They can easily plan and carry out a shopping expedition to six or seven different stores, as well as pick up the kids from soccer and return a book to the library, without logically considering the hundreds of thousands of other ways to plan such an outing. Using commonsense reasoning, they can manage their personal finances or dance their way across a crowded room without colliding with anyone. Except for a few tasks, artificial intelligence lags considerably behind humans in employing such reasoning, and tasks that rely largely on commonsense reasoning are typically poor candidates for AI applications.

Computer Vision

Figure 15. Computer vision syndrome. Source: Image by Flickr.

Making sense of what we see is usually easy for humans but extremely difficult for computers. To date, practical vision systems have been limited to working in highly controlled conditions. Synonym: machine vision.

Constraint Satisfaction Events, situations, or laws that limit our options for accomplishing a task are referred to as constraints. For example, a building’s foundation must be laid before framing can begin; a car must be refueled every 400 miles; brain surgery requires a neurosurgeon; and a Walkman can only run on a 9-volt battery. Satisfying constraints is especially important when scheduling complex activities. The number of candidate schedules to evaluate in a search for an acceptable one can be greatly reduced by applying the relevant constraints first, which makes the search considerably more efficient. Constraint satisfaction techniques can be used to tackle scheduling problems directly. Heuristic constraint-based search and annealing are two types of constraint satisfaction algorithms.

Convolutional Neural Network (CNN)

Figure 16. Convolutional neural network feed forward example. Source: Image by Wikimedia Commons.

A form of neural network that recognizes and interprets images.

D

DADO A parallel machine architecture optimized for artificial intelligence. Its processing elements are organized into a binary tree.

Daemon A daemon is a self-contained computer process that can (appear to) run alongside other processes on a computer. The daemon can communicate with other processes and respond to external (user) requests. A typical example is a mail transport daemon, which waits for mail and handles local delivery.

Dante Dante I and Dante II were ambitious semi-autonomous experimental robots designed to work in hostile environments such as volcanoes and extraterrestrial terrain. Dante I was created to explore Mt. Erebus in the Antarctic, but it was unable to do so because of the harsh cold. Dante II had better luck investigating another volcano, but it eventually slipped, rolled over, and was unable to right itself.

Data Cleaning The process of validating data prior to data analysis or data mining. This includes ensuring that the data values are legitimate for a given attribute or variable (for example, heights are all positive and within a plausible range) and that the values for a given record or set of records are consistent. Examples include ensuring that age increases over time or that height and weight data are consistent (no seven-foot person weighs less than 100 pounds). When dealing with anomalous values, care must be taken that valid outliers are not deleted incorrectly, since such outliers can be quite instructive. Data cleaning is an important first step in the development of data warehouses and in data mining. Failure to clean the data appropriately can result in erroneous findings caused by outliers and/or unrealistic values, or in missed relationships.
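
The checks described above can be sketched as a small validation function. The field names (`height_in`, `weight_lb`, `ages_by_visit`) and the upper height bound are assumptions chosen for illustration. The function only flags problems rather than deleting records, so that a valid outlier can still be reviewed by hand.

```python
def check_record(record):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = []
    height, weight = record["height_in"], record["weight_lb"]
    # Range check: the value must be legitimate for the attribute.
    # The 108-inch ceiling is an assumed bound for this example.
    if not 0 < height <= 108:
        problems.append("height out of range")
    # Cross-attribute consistency (the seven-foot example from the entry).
    if height >= 84 and weight < 100:
        problems.append("height and weight inconsistent")
    # Consistency across a set of records: age must not decrease over time.
    ages = record["ages_by_visit"]
    if any(later < earlier for earlier, later in zip(ages, ages[1:])):
        problems.append("age decreases over time")
    return problems

# Hypothetical records.
clean = {"height_in": 70, "weight_lb": 160, "ages_by_visit": [30, 31, 32]}
suspect = {"height_in": 85, "weight_lb": 90, "ages_by_visit": [30, 29]}
```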

Data Dictionary A data dictionary is a database that contains information on variables found in other data tables. In a data warehouse, it holds “meta-data” on the structure, contents, and relationships of the other databases and their attributes.

Data Fusion Data fusion is the information processing that associates, correlates, and integrates data and information from single and multiple sources in order to obtain a more complete and accurate assessment of a situation. The process is characterized by continuous refinement of its estimates and assessments, and by evaluation of the need for additional sources or for modification of the process itself, in order to achieve better results.

Data Mart A data warehouse that has been tailored to certain analyses. The data is often a subset of a broader data warehouse that answers a specific inquiry or is formatted for a specific toolset.

Data Mining

Figure 17. Data mining process. Source: Image by Flickr.

A term used in statistics and Knowledge Discovery in Databases to describe the application of automatic or semi-automatic techniques to data in order to identify previously unknown patterns. There are several families of techniques, each of which is covered in more detail in other entries. These include classification procedures, which seek to learn how to assign objects to previously established classes; regression procedures, which attempt to predict or assign a value to specified output fields based on specified input fields; clustering methods, which attempt to find groups of similar data points; dependency models (both quantitative and structural), such as graphical models, which attempt to model the interrelationships within a set of data; and deviation methods, which attempt to extract the most significant deviations from a set of normative cases or values.

Data Mining Query Language (DMQL) A query language created for use in Data Mining applications. It is based on Structured Query Language (SQL), which is a standard language used in database applications.

Data Mining The challenging process of identifying interesting and valuable relationships and patterns in very large databases in order to guide better commercial and technological decisions. Data mining is becoming increasingly significant because all types of commercial and government entities now log massive amounts of data and need a way to make the best use of these large resources. Data mining techniques are distinguished from more traditional statistical and machine learning approaches by the size of the databases to which they are applied, which can make them computationally expensive. Data mining is a component of the broader process of “Knowledge Discovery in Databases.” It is preceded by the preliminary steps of data preparation and cleaning and is followed by the incorporation of other relevant knowledge and the final interpretation. See all of the data mining projects for examples of how data mining techniques can be used.

Data Navigation In Data Mining, the analyst often works with many data attributes at the same time. Data navigation tools allow the analyst to view particular slices (subsets of the whole attribute set) and subsamples (subsets of the rows), and to move between them.

Data Preprocessing Prior to undertaking Data Mining, the data must be preprocessed in order to normalize its structure. This can comprise data cleaning, data selection or sampling, and data reduction and mapping.

Data Reduction A term used primarily in scientific data analysis to describe the extraction of essential variables, or functions of variables, from a large number of available attributes.

Data Science

Figure 18. Data science. Source: Image by Wikimedia Commons.

An interdisciplinary area that combines scientific methods, systems, and processes from statistics, information science, and computer science in order to provide insight into phenomena using either structured or unstructured data.

Data Visualization Complex data structures and interactions are frequently difficult to comprehend. Data visualization technologies aim to graphically portray these relationships and information. Data visualization tools range from simple histograms and scatter plots to complicated 3-D structures such as a Virtual Reality Modeling Language (VRML) representation.

Data Warehouse This phrase refers to a vast, centralized collection of business or other organizational data, typically gathered from numerous sources. The databases should have been cleaned, and the attribute names, values, and relationships should have been regularized. Meta-data, or data containing information about the data collection itself, is commonly included in these large collections. Data warehouses can provide economies of scale, but they must also balance ease of access with corporate security and privacy considerations.

Data Warehousing A business and database term that refers to the collection and cleaning of transactional data in order to facilitate online analysis methods such as online analytical processing and “decision support.” The establishment of a data warehouse is a necessary initial step in the linked topic of Knowledge Discovery in Databases (KDD).

Database Marketing A Data Mining strategy that combines customer databases and Data Mining tools to accurately target new clients. Also referred to as a mailshot response.

DataLogic/R DataLogic/R is a database “mining” system that analyses data at various levels of knowledge representation using Rough Set Theory and inductive logic. Reduct Systems, Inc. is the vendor.

Decision Aids Software that assists humans in making judgments, particularly in difficult situations where a high level of expertise is required to make a successful decision.

Decision Attribute In a Rough Set Data Analysis, decision attributes are the output or target variables. Because the decision depends on the values of the predictor or classification attributes, those predictors are referred to as condition attributes.

Decision List An ordered sequence of k if-then-else rules constitutes a k-decision list. The rules in the sequence are tested one at a time until one of them is satisfied. When a rule is satisfied, the related action is executed. Both the k-term Conjunctive Normal Form (CNF) and the k-term Disjunctive Normal Form (DNF) can be represented as k-decision lists, which are Probably Approximately Correct (PAC) learnable.
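
The evaluation order described above can be sketched in a few lines. The rule contents and the default action are hypothetical; only the first-match behavior comes from the entry.

```python
def evaluate(decision_list, example, default):
    """Test each rule in order and return the action of the first one satisfied."""
    for condition, action in decision_list:
        if condition(example):
            return action
    return default  # assumed fallback when no rule is satisfied

# Hypothetical rules, most specific first.
rules = [
    (lambda x: x["fever"] and x["rash"], "refer"),
    (lambda x: x["fever"], "rest"),
]
```

Order matters: if the two rules were swapped, the second would never fire, because every example that satisfies it also satisfies the more general rule.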

Decision Problem

Figure 19. Problem solution decision. Source: Image by Pixabay.

A decision problem can be represented by three elements: a set of valid actions A, a set of states S, and a utility or payoff function u(a, s) > 0 defined for each action a in A and each state s in S. When the state is uncertain, the problem is to decide which action to take.

Decision Support The term “decision support” refers to a broad category of applications for artificial intelligence software. There are numerous scenarios in which people want technology, particularly computers, either to aid them automatically in making decisions or to make decisions and act on their behalf. There are many non-AI decision support systems, such as most of the process control systems that successfully run chemical plants, power plants, and the like under steady-state conditions. However, when situations become more complex (for example, in chemical plants that do not operate at steady state, or in enterprises where humans and equipment interact), intelligent decision support is necessary, and that can only be provided by artificial intelligence-based decision support software. Stottler Henke has developed a number of decision support programs that illustrate such scenarios. Synonym: intelligent decision support.

Decision Support System (DSS) A data modeling and reporting system designed to address specific ongoing business problems or issues. It is typically distinguished from traditional information systems by an emphasis on “real-time,” or interactive, analysis, in which the business analyst can apply numerous tools to the data to get answers “now.” It differs from Data Mining and Knowledge Discovery in Databases (KDD), where the emphasis is on discovering “new” associations to enhance the present data model.

Decision Table A decision table is a decision tree variant that can be expressed as a dimension-stacked two- or three-dimensional table. A single characteristic is picked to form the split in a typical decision tree or classification tree, and the attribute is chosen independently of the attributes chosen at other nodes. A decision table, on the other hand, selects pairs of characteristics at each level and applies the same split to all nodes on the tree at that level. Although this may result in a classifier that does not perform as well as the generic tree model, it has the advantage of being simple to present and explain to a non-technical user.

Decision Theory Decision Theory is a formal mathematical theory that describes how to make rational decisions when the outcomes are uncertain. The theory, which is frequently referred to as Bayesian decision theory, relies heavily on Bayesian methods for combining data. Decision theory provides a foundation for making decisions under uncertainty by assigning probabilities and payoffs to all possible outcomes of each decision. A decision tree represents the space of possible actions and states of the world.
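
One common way to turn probabilities and payoffs into a choice is to pick the action with the highest expected utility. The sketch below assumes that rule; the actions, states, probabilities, and payoff values are hypothetical.

```python
def best_action(actions, state_probs, utility):
    """Pick the action with the highest expected utility over uncertain states."""
    def expected_utility(action):
        return sum(p * utility(action, s) for s, p in state_probs.items())
    return max(actions, key=expected_utility)

# Hypothetical payoff table u(action, state).
payoff = {
    ("umbrella", "rain"): 8, ("umbrella", "dry"): 6,
    ("no umbrella", "rain"): 1, ("no umbrella", "dry"): 10,
}
choice = best_action(
    actions=["umbrella", "no umbrella"],
    state_probs={"rain": 0.5, "dry": 0.5},
    utility=lambda a, s: payoff[(a, s)],
)
```

With an even chance of rain, carrying the umbrella has expected utility 7 against 5.5 for leaving it at home; lowering the probability of rain to 0.1 reverses the choice.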

Decision Tree A decision tree is one way to describe a decision sequence, regression function, or classification function. As a set of nested choices or questions, the tree symbolizes the decision-making or regression/classification process. At each stage, a single binary or multinomial question is presented, and the response decides the following set of options. This strategy is commonly referred to as recursive partitioning in a regression or classification setting.

The tree-generation technique is similar to clustering in that the goal is to find homogeneous groupings of examples. It differs in that it partitions the dataset on a specific set of predictors in order to achieve a homogeneous value of a separate dependent variable, whereas clustering techniques partition the dataset on the same set of variables on which the homogeneity criterion is measured. Most partitioning methods are orthogonal, splitting on one predictor at a time; some, however, are oblique, partitioning on several variables at once. The tree can also be expressed as a set of rules (IF condition 1 THEN condition 2).

Decision Trees A decision tree is a graphical depiction of a hierarchical collection of rules that specify how to evaluate or categorize an object of interest depending on the responses to a series of questions. A decision tree, for example, can define the series of tests that a clinician might use to diagnose a patient. A decision tree of this type will prioritize the tests based on their value to the diagnostic task. The outcome of each subsequent test determines the path a person travels through the tree and, as a result, the tests (and their order) that are offered. When a person reaches a point where no further testing is recommended, the patient has been fully diagnosed. Due to their hierarchical rule structure, decision trees have the advantage of being simple to grasp, and explanations for their diagnosis may be easily and automatically generated. Decision trees may be generated automatically from a group of samples and can reveal powerful predicting rules even when a huge number of variables are involved. These algorithms work by selecting the test that best discriminates between classes/diagnoses and then repeating this process on each of the subsets matching the different test outcomes (e.g., “patients with temperatures greater than 101ºF” and “patients with temperatures less than or equal to 101ºF”).
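
The step of selecting the test that best discriminates between classes can be sketched for a single numeric attribute. Gini impurity is used here as the measure of how mixed a subset is; that choice, the function names, and the sample data are assumptions for illustration, since the entry does not name a specific criterion.

```python
def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return the threshold whose 'value <= t' test leaves the purest two subsets."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        # Weighted impurity of the two subsets produced by this test.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Hypothetical patients: temperature in degrees Fahrenheit and diagnosis.
temps = [98.6, 99.1, 101.0, 102.3, 103.0]
diagnosis = ["well", "well", "well", "flu", "flu"]
split = best_threshold(temps, diagnosis)
```

A full tree builder would call the same routine again on each of the two subsets, which is the recursive repetition the entry describes.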

Decision-Centered Design Decision-centered design emphasizes the use of cognitive task analysis methodologies to identify expertise and decision requirements. It promotes designs that prioritize difficult decisions and unexpected situations over routine activities. While decision-centered design focuses on identifying the significant decisions rather than exhaustively documenting all conceivable cognitive requirements, it also acknowledges that individual differences in expertise play an essential role in decision making. Critical decision method (CDM) interviews and concept mapping are two methodologies for determining design requirements. They can, for example, be employed in the design of a crew position aboard a surveillance aircraft and in the redesign of anti-air warfare stations in navy vessels’ combat information centers.

Declarative Representation A way for knowledge to be represented in a knowledge base. The knowledge base comprises factual statements (and, optionally, truth-values) such as “All persons are mortal” and “Socrates was a person,” among other things. The inference engine can then deduce Socrates’ attributes, such as the fact that he was mortal. In contrast to a procedural representation, this type of representation has the advantage of being modular and easy to change. This form, however, lacks the search control that a procedural representation provides.
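
The Socrates example can be sketched with facts and rules held as plain data and a small, separate inference routine. The tuple encoding and the `infer` function are assumptions made for this illustration, not a description of any particular inference engine.

```python
# Declarative knowledge: facts and rules are data, not procedure.
facts = {("person", "Socrates")}
# Each rule reads: if (premise, X) is a fact, then (conclusion, X) is a fact.
rules = [("person", "mortal")]

def infer(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(known):
                if predicate == premise and (conclusion, subject) not in known:
                    known.add((conclusion, subject))
                    changed = True
    return known

derived = infer(facts, rules)
```

The modularity mentioned in the entry shows up here: adding a fact or a rule means appending to a collection, and the inference routine does not change.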

Deductive Database (DDB) A generalization of the common relational database. A DDB treats the items in the database as a set of ground assertions and adds a set of axioms known as an Intensional DataBase (IDB). Such a database can be viewed as a logic program with no function symbols.

Deep learning The ability for machines to autonomously mimic human thought patterns through artificial neural networks composed of cascading layers of information.

Dependency Maintenance Dependency maintenance is the practice of recording why certain beliefs are held, decisions are made, or actions are taken, in order to help revise those beliefs, decisions, or actions when circumstances change. Various families of truth maintenance systems have been designed to help with dependency maintenance in specific situations (e.g., the need to consider many alternative scenarios versus a single scenario, the frequency with which assumptions change, etc.).

Document Clustering Document clustering techniques allow documents to be automatically sorted into logical classes, allowing users of a full-text database to conveniently search across related content. Finding specific documents among vast online, full-text collections has become increasingly difficult in recent years, owing to the dropping cost of computer storage capacity and the networking of document databases to large numbers of users. Traditional library indexing has not been adequate for retrieving information from these vast sources. Document clustering techniques typically include some natural language processing as well as a set of statistical measurements.

Domain For AI professionals, this is an overused term. A “domain” can refer to a subject area, field of knowledge, an industry, a specific job, an area of activity, a sphere of influence, or a range of interests, such as chemistry, medical diagnosis, putting out fires, managing a nuclear power plant, arranging a wedding, or diagnosing car defects. A domain, in general, is a system in which a specific set of rules, facts, or assumptions operates. Humans can typically deduce what is intended from the context in which “domain” is used; computers, on the other hand, are unlikely to deduce what a human means when he or she says “domain.”

Domain Expert The individual who knows how to perform an activity within the domain and whose knowledge will be the subject of an expert system. The knowledge and working methods of this individual or individuals are observed, recorded, and entered into a knowledge base for use by an expert system. The knowledge of the domain expert may be augmented by the written knowledge that the experts themselves use, such as operating manuals, standards, specifications, and computer programs. Synonym: subject-matter expert (SME).

E

Early Stopping A method for preventing overfitting in neural networks and other adaptive data modeling tools. The data is divided into a training set and a validation set, and the modeling technique is “detuned” to learn very slowly. As the model learns on the training set, the error on the validation set is monitored, and the learning (optimization) process is terminated when the validation error begins to increase. In practice, the validation error can fluctuate while its overall trend is downward (and, likewise, can appear to decrease when the overall trend is upward). One way to handle this is to save the intermediate models while training the system to convergence and then back out to the true minimum (i.e., overshoot and correct).
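
The sketch below illustrates the monitoring logic on a recorded sequence of validation errors. It uses a patience counter, a common variant that tolerates a few epochs without improvement before stopping, rather than training all the way to convergence as the entry describes; the function name, the `patience` parameter, and the error values are assumptions for this example.

```python
def early_stopping(validation_errors, patience=3):
    """Return (best_epoch, best_error), stopping once the error has failed
    to improve for `patience` consecutive epochs."""
    best_epoch, best_error = None, float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # In a real training loop, the model would be saved here.
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # treat the rise as a real trend and back out to the saved model
    return best_epoch, best_error

# Hypothetical validation errors, one per epoch.
errors = [0.90, 0.70, 0.72, 0.60, 0.65, 0.66, 0.70, 0.40]
```

The brief rise at epoch 2 does not stop training, which reflects the fluctuation mentioned above. The trade-off is also visible: with `patience=3` the run stops before the late improvement at the final epoch, while `patience=5` reaches it.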

EASE EASE is a knowledge-based system that evaluates workplace exposure to potentially harmful new substances. It is an extension of the C Language Integrated Production System (CLIPS) expert system, with the user interface provided by wxCLIPS.

Edge Coloring

Figure 20. Image showing edge coloring. Source: Image by Wikipedia.

Edge coloring, like node coloring, provides a visual way to emphasize information in a graphical model. Edge coloring emphasizes the flow of information through the model by coloring the edges according to some estimate of the amount of information flowing through each edge. One such measure is the weight of evidence. Also see: graphical model, node coloring, and weight of evidence.

Edge Detection A set of techniques used in image and vision systems to determine the edges of an item when given an image of the object. In general, this entails comparing the brightness of adjacent parts in the image and looking for a “sharp” change.
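
The comparison of adjacent brightness values can be sketched on a single row of pixels. The function name, the threshold, and the sample scanline are assumptions for illustration; practical systems work on two-dimensional images and usually smooth the data first.

```python
def edge_positions(row, threshold):
    """Return indices i where brightness jumps sharply between pixel i and i + 1."""
    return [
        i for i in range(len(row) - 1)
        if abs(row[i + 1] - row[i]) > threshold
    ]

# Hypothetical brightness values: dark background, bright object, dark background.
scanline = [10, 12, 11, 200, 205, 203, 15, 14]
edges = edge_positions(scanline, threshold=50)
```

The small variations inside the dark and bright regions fall below the threshold, so only the two boundaries of the object are reported.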

Effectors Effector is a general term in robotics for a motor-driven device that a robot can use to have an effect on its surroundings. Effectors can include hands, arms, legs, and any attached tools.

Electronic Dictionary Research (EDR) Project The Japanese Electronic Dictionary Research (EDR) project is a long-term effort to create a Japanese-English dictionary. It comprises a bilingual word dictionary, concept classification and concept description dictionaries, a co-occurrence dictionary, and a massive corpus of textual material, all of which can help computers understand natural language.

ElimBel A simple belief propagation algorithm for a Bayesian Network. It can be applied to both singly connected and multiply connected networks. However, it requires recompiling the entire network whenever new evidence is added, and it needs as many passes as there are outcome nodes. The algorithm requires the nodes to be ordered and only updates the final node in the ordering.

Eliza A well-known program that imitates a “Rogerian” psychotherapist. Although it has no knowledge of the dialogue, it can appear intelligent by repeating past assertions that contain important terms in the form of questions.
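
The trick of turning a statement that contains a key term back into a question can be sketched as follows. The keyword list, the word substitutions, and the reply templates are invented for this illustration and are not taken from the original program.

```python
import re

# Swap first- and second-person words so the statement can be echoed back.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}
# Hypothetical "important terms" that trigger an echo.
KEYWORDS = ("mother", "father", "sad", "afraid")

def respond(statement):
    """Echo a statement back as a question if it contains a keyword."""
    words = re.findall(r"[a-z']+", statement.lower())
    if not any(word in KEYWORDS for word in words):
        return "Please go on."
    reflected = " ".join(REFLECTIONS.get(word, word) for word in words)
    return "Why do you say " + reflected + "?"
```

For example, `respond("I am afraid of my mother.")` returns "Why do you say you are afraid of your mother?". Nothing in the code models the meaning of the sentence, which is the point of the entry.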

Elliptical Basis Function Networks Radial basis function networks typically compute the Euclidean distance between their inputs and a center (the node), which amounts to a radius around that center. When the inputs are first filtered through a linear layer that scales and rotates them, the Euclidean distance on the filtered inputs is equivalent to an elliptical distance on the original (pre-linear) inputs, and the resulting function is referred to as an elliptical basis function.
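
The equivalence can be checked numerically in two dimensions. In this sketch the linear layer is assumed to rotate the input and then scale each axis; the function names and sample values are hypothetical.

```python
from math import cos, sin, sqrt, pi

def linear_layer(point, scale, angle):
    """Rotate a 2-D point by `angle`, then scale each axis."""
    x, y = point
    xr = cos(angle) * x - sin(angle) * y
    yr = sin(angle) * x + cos(angle) * y
    return (scale[0] * xr, scale[1] * yr)

def euclidean(p, q):
    """Ordinary straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def elliptical(p, q, scale, angle):
    """Distance on the original inputs, weighted along the rotated axes."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    along = cos(angle) * dx - sin(angle) * dy
    across = sin(angle) * dx + cos(angle) * dy
    return sqrt((scale[0] * along) ** 2 + (scale[1] * across) ** 2)

# Hypothetical input x, center c, and layer parameters.
x, c, scale, angle = (3.0, 1.0), (1.0, 2.0), (2.0, 0.5), pi / 6
filtered = euclidean(linear_layer(x, scale, angle), linear_layer(c, scale, angle))
direct = elliptical(x, c, scale, angle)
```

The two values agree up to rounding. With a scale of (1, 1) the elliptical distance reduces to the plain Euclidean distance, that is, to the radial case.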

Embedded Systems Embedded systems are computers that are built into another device, such as a car, a dishwasher, or a camera. These systems function as intelligent controllers, attempting to optimize performance (like in an automobile) or satisfy the desired standard, such as adjusting the exposure in a camera to ensure a “good” picture. These systems are typically far simpler and more resilient than the standard computer with which most people are familiar. They usually feature a specialized CPU, ROM to store the operating system and program code, some RAM for computation and temporary storage, and some I/O devices to determine the device state and control various functions. The system’s parameters are normally specified when the software is written to ROM, but some systems feature FlashRAM or another type of dynamic memory that allows the system to be altered or “learn” after it has been built. Such systems could be considered highly specialized expert systems, despite the fact that they typically lack the ability to engage with their owners or explain their activities.

Emergence Emergence is the phenomenon in which complex patterns of behavior arise from the many interactions of simple agents, each of which may operate according to a few simple rules. In other words, an emergent system is considerably more than the sum of its parts. Emergence can occur without any grand master outside the system instructing individual agents on how to act. Without an overall plan, all of the people in a modern city, acting in their respective capacities as growers, processors, distributors, vendors, purchasers, and consumers of food, create a food market that matches the supply and demand of thousands of different items. Another example is an ant colony, whose members follow a few simple rules yet together produce a larger system that finds food and provides shelter and protection for its members. Artificial intelligence software running on powerful computers can also exhibit valuable emergent behavior, such as that seen in automatic scheduling software, which generates near-optimal schedules for complex activities with several constraints.

Empirical Natural Language Processing Traditional “rationalist” natural language processing relies on hand-coded rules to understand languages. It has been highly successful in constrained domains, such as queries about specialized databases (for example, moon rocks or aviation maintenance), and in special worlds such as Winograd’s “blocks” world. Empirical approaches are far more data-driven and can be partially automated through the use of statistical (stochastic) and other machine learning methods. These approaches often concentrate on the distribution of words and word clusters within a large body of related text, and they are frequently based on techniques like the Hidden Markov Model (HMM) and Probabilistic Context-Free Grammar (PCFG). The techniques can be classified as either supervised or unsupervised. Supervised techniques require expert annotation of the text to indicate the parts of speech and semantic senses of the words. Unsupervised training is more challenging and requires “proper” sentences in the target language. Although unsupervised techniques are preferable in terms of preparation time, supervised approaches often outperform them.

EMYCIN This system is a variant of the MYCIN program. It could be used to build rule-based expert systems.

Epistasis A gene is said to be epistatic in a Genetic Algorithm when it has a strong interaction with another gene. This is in contrast to the conventional biological understanding, which states that a gene is epistatic if certain of its alleles can suppress another gene. The presence of interaction among the parameters suggests that the problem will be more difficult to solve, because the effect of modifying one gene depends on the state of others.

EQP EQP is an automated theorem-proving program for first-order equational logic. Its strengths include good implementations of associative-commutative unification and matching, a variety of strategies for equational reasoning, and fast search. It appears to perform well on a wide range of problems involving lattice-like structures. It is freely available on the World Wide Web.

Equilibrium State In robotics, an equilibrium state is a point of a dynamic system such that, once the system is at that point, it remains there. The equilibrium is stable if a system that starts at a nearby point stays nearby, and asymptotically stable if the system also converges to the equilibrium point.

Estimation The procedure for producing a score or scoring function for a dataset. This can refer to either the process of fitting the model, in which you estimate the coefficients of the equations (for example, the parameters of a regression function or the split points in a decision tree), or the process of giving scores to individual observations using the model (e.g., credit ratings). This usually refers to a prediction or scoring function rather than a classification function, but it can also be used in the latter case.

Euclidean Distance The Euclidean distance, as used in nearest neighbor approaches, is a basic measure of the distance between two objects. It is calculated as the square root of the sum of the squared differences between the two objects over each attribute. This distance is a multi-attribute generalization of the plane geometry distance measure.
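
The calculation described above translates directly into code. The function name and the requirement that both objects list the same attributes in the same order are assumptions for this sketch.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Square root of the summed squared differences over each attribute."""
    if len(a) != len(b):
        raise ValueError("objects must have the same attributes")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With two attributes this is the familiar plane-geometry distance: the points (0, 0) and (3, 4) are 5 units apart.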

Evolutionary Algorithm (EA) An Evolutionary Algorithm (EA) is a broad category of fitting or maximizing techniques. They all maintain a pool of structures or models on hand that can be modified and evolved. Each model is graded at each stage of the algorithm, and the better models are permitted to reproduce or mutate for the following round. Some strategies enable successful models to interbreed.

Figure 21. Image showing evolutionary algorithm. Source: Image by Wikimedia Commons.

They are all inspired by the biological process of evolution. Some techniques are asexual (no cross-breeding), while others are sexual, allowing successful models to exchange “genetic” information. Asexual techniques allow a wide range of models to compete, whereas sexual techniques require the models to share a common “genetic” code.

Evolutionary Programming (EP) Evolutionary Programming (EP) is a Machine Learning (model estimation) technique that generalizes the Genetic Algorithm (GA) by retaining evolutionary behavior while removing the close connection to biological genetics. It works similarly to asexual reproduction in that each generation is evaluated and the fitter models are more likely to reproduce. Each “child” model is allowed to mutate during reproduction. This strategy focuses on the behavior of the models rather than their internal representation, which makes it easier to combine many types of models in the same mix. Note that this technique frequently chooses “winners” stochastically rather than deterministically.

Exchangeability A sequence is said to be exchangeable with regard to some measure P(.) if the sequence’s measure has the same value when any two of its components are exchanged. This notion is useful in analyzing and manipulating probabilistic belief networks.

Expectation (Mathematical) The expectation of an attribute is a measure of the attribute’s “typical” location. It is the attribute’s arithmetic mean or average value with regard to a certain probability distribution. It is determined by multiplying each potential attribute value by its probability (or density) and summing (or integrating). The ordinary arithmetic average of a set of k numbers is an expectation with regard to a probability distribution that assigns probability 1/k to each member of the set.
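
As a small illustration for the discrete case, the sketch below computes an expectation and checks that the uniform distribution 1/k reproduces the arithmetic mean. The function name and the example numbers are assumptions.

```python
def expectation(values, probabilities):
    """Sum of each value multiplied by its probability (discrete case only)."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probabilities))

numbers = [2.0, 4.0, 6.0, 8.0]
k = len(numbers)
uniform = [1.0 / k] * k            # probability 1/k for each member of the set

mean_via_expectation = expectation(numbers, uniform)
arithmetic_mean = sum(numbers) / k
print(mean_via_expectation, arithmetic_mean)
```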

Expectation-Maximization (EM) Algorithm

Figure 22. Expectation-maximization (EM) algorithm. Source: Image by KDnuggets.

The Expectation-Maximization (EM) algorithm is a machine learning estimation approach. It can be used if the model meets specific technical requirements. If the model can be simplified by assuming that specific hidden values exist, the model learns by first guessing the hidden values and then alternating between two steps: maximizing the model parameters given the current hidden values, and re-estimating the hidden values given the current maximized parameters. It has been utilized successfully in a wide range of Machine Learning and statistical models, such as the AutoClass clustering program.
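
The alternation between the two steps can be sketched on a toy problem. The example below is an illustration only: it assumes a one-dimensional mixture of two Gaussians with unit variance and equal mixing weights, so that the component means are the only parameters. The hidden values are the (soft) component memberships of each data point. All names and numbers are assumptions.

```python
import math

def em_two_means(data, initial_means, iterations=20):
    """Toy EM: estimate the means of two unit-variance Gaussian components."""
    means = list(initial_means)
    for _ in range(iterations):
        # E step: estimate the hidden memberships given the current means.
        memberships = []
        for x in data:
            density = [math.exp(-0.5 * (x - m) ** 2) for m in means]
            total = sum(density)
            memberships.append([d / total for d in density])
        # M step: maximize the means given the current memberships.
        for k in range(len(means)):
            weight = sum(r[k] for r in memberships)
            means[k] = sum(r[k] * x for r, x in zip(memberships, data)) / weight
    return means

data = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
means = em_two_means(data, initial_means=(-1.0, 1.0))
print(means)  # close to -2 and 2
```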

Expert System An expert system captures the specialist information received from a human expert (such as a bond trader or a loan underwriter) and uses that expertise to make choices automatically. For example, doctors’ knowledge on how to detect an illness can be encapsulated in software. The process of obtaining knowledge from experts and their documentation and properly implementing it into software is known as knowledge engineering, and it needs significant ability to perform successfully. Customer service and helpdesk support, computer or network problems, regulatory tracking, autocorrect functions in word processors, document preparation such as tax forms, and scheduling are all examples of such applications.

Exploratory Data Analysis This term refers to the use of “simple” tabular and graphical displays to acquire a better understanding of a dataset’s structure. Originally coined by Tukey to describe a set of techniques for quickly characterizing a batch of data without resorting to “heavy” statistical modeling, it has since evolved into an alternative approach to modeling, focusing on intuitive and visual techniques for rapidly summarizing data rather than traditional statistical estimation and testing techniques. “Five-number summaries” (median, upper and lower quartiles, and upper and lower “fences”), box and whisker plots, and other smoothed histograms and scatter plots are some of the regularly used techniques. These can be linked in computer animations to allow different sorts of brushing to examine the relationship between different views. More sophisticated algorithms enable “grand tours” of high-dimensional data, which can be led or run in a projection pursuit mode, where the software looks for “interesting” perspectives of the data.

F

Facet A facet in a class-based ontology represents information about a slot, such as a constraint on the values of an instance’s slot. Examples are an assertion about the allowable value types (all mothers must be of type female, for example) or information on the slot’s cardinality.

Factor Graph A factor graph is a bipartite graph in which one set of nodes represents the variables in the model and the other set of nodes represents the local (probability) functions that describe interactions between the variable nodes. Each function node is linked to the variable nodes on which it depends. Similarly, each variable node is linked to the function nodes through which it influences, or is influenced by, other variables. Factor graphs can have directed edges. In terms of expressing the factorization of a multivariate distribution, factor graphs are more general than Markov Random Fields or Bayesian networks.

Feature Extraction In speech recognition and image processing, feature extraction refers to the reduction of an input signal into a collection of higher-level features that can be used for subsequent analysis. More broadly, the term is used to refer to the process of variable reduction.

Feature Vector A feature vector is one approach for representing a textual or visual object in a numeric form suitable for machine learning. A block of text (for example, an article in a newspaper) could be collapsed into a (sorted) list of terms. This list might be compared to a standard vocabulary of 50,000 words and represented by a 50,000-element binary vector containing ones (1s) for words that appeared in the document and zeros (0s) for those that did not. This vector could then be used to categorize the document or for further analysis. This sort of representation, which disregards the document’s word order, is also known as a bag of words representation. A feature vector is also a general term used in Machine Learning and related fields for a vector or list holding the values of the attributes of a case. It usually has a defined length (dimension). It is also known as a record or a tuple.
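
A minimal sketch of the binary bag of words vector described above. The four-word vocabulary stands in for the 50,000-word list, and the whitespace tokenization is a simplifying assumption.

```python
def binary_feature_vector(text, vocabulary):
    """1 for each vocabulary word that occurs in the text, 0 otherwise."""
    words = set(text.lower().split())   # word order and counts are discarded
    return [1 if term in words else 0 for term in vocabulary]

# Hypothetical miniature vocabulary (a real one might hold 50,000 words).
vocabulary = ["ai", "logic", "robot", "tree"]
document = "The robot used fuzzy logic and the robot learned"

vector = binary_feature_vector(document, vocabulary)
print(vector)  # [0, 1, 1, 0]
```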

Feedback In general, this word refers to systems or inputs where the current output or state can change the effect of the input. Positive feedback acts on the output as an amplifier or magnifier (e.g., the rich get richer and the poor get poorer). Negative feedback reduces large inputs while amplifying small inputs. This is critical in keeping a system under control or “on target.” Error-driven feedback systems create a corrective term based on the system’s deviation from the current setpoint or target point. This idea is central to any discussion of robotics and control theory.
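
An error-driven feedback loop can be sketched as follows. This is a simple proportional correction chosen for illustration; the gain of 0.5, the number of steps, and the function name are assumptions.

```python
def run_feedback(setpoint, value, gain=0.5, steps=20):
    """Repeatedly correct `value` in proportion to its error from the setpoint."""
    history = [value]
    for _ in range(steps):
        error = setpoint - value        # deviation from the target point
        value = value + gain * error    # corrective term derived from the error
        history.append(value)
    return history

history = run_feedback(setpoint=10.0, value=0.0)
print(history[1], history[-1])  # first correction, then a value close to 10
```

Each pass halves the remaining error, so the value settles on the setpoint rather than running away from it.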

FFOIL FFOIL is a specialization of the FOIL program that focuses on learning functional relationships. In empirical experiments, it has learned functions such as the greatest common divisor or Ackermann’s function much faster than FOIL.

Floyd’s Shortest Distance Algorithm This is one of the numerous algorithms for determining the shortest distance or lowest cost paths between nodes in a graph. An adjacency matrix represents the connections and costs between the nodes. Floyd’s algorithm generates a cost matrix in $n^3$ steps, where n is the number of nodes. When this number is too large, when only a few paths are required, or when the costs are dynamic, other algorithms can be employed.
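
A compact sketch of the algorithm, assuming the graph is given as a square cost matrix in which a missing connection is marked as infinite. The three nested loops account for the cubic number of steps; the function name and the example matrix are assumptions.

```python
INF = float("inf")

def floyd_shortest_distances(cost):
    """Return the matrix of lowest path costs between every pair of nodes."""
    n = len(cost)
    dist = [row[:] for row in cost]          # work on a copy of the input
    for k in range(n):                       # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# cost[i][j] is the cost of the direct connection from node i to node j.
cost = [
    [0,   4,   INF],
    [INF, 0,   1],
    [2,   INF, 0],
]
dist = floyd_shortest_distances(cost)
print(dist)  # [[0, 4, 5], [3, 0, 1], [2, 6, 0]]
```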

FOIL FOIL is an inductive logic program that learns first-order relationships. It employs a trimmed-down version of Prolog, excluding cuts, fail, disjunctive goals, and functions other than constants. It learns by expanding clauses using a divide-and-conquer strategy until no more examples can be absorbed, and it can simplify definitions by pruning.

Fuzzy Logic Traditional Western logic systems assume that things fall into one of two categories. In practice, however, we know that this is not always the case. People aren’t just short or tall; they can be fairly short or fairly tall, and we all have different ideas about what height actually counts as tall. A cake’s ingredients aren’t only combined or not combined; they might also be relatively well combined. When a computer makes an automatic decision, fuzzy logic allows it to take into account our commonsense knowledge that most things are a matter of degree. One rice cooker, for example, employs fuzzy logic to cook rice correctly even if the cook adds too little or too much water.

Figure 23. Fuzzy logic. Source: Image by GeeksforGeeks.

Fuzzy Sets Fuzzy sets are sets in mathematics that have elements with varying degrees of membership. Lotfi A. Zadeh and Dieter Klaua independently introduced fuzzy sets in 1965 as an extension of the classical notion of a set. At the same time, Salii (1965) proposed a more general type of structure known as L-relations, which he examined in an abstract algebraic framework. Fuzzy relations, which are now employed in linguistics (De Cock et al., 2000), decision-making (Kuzmin, 1982), and clustering (Bezdek, 1978), are special cases of L-relations when L is the unit interval [0, 1]. In classical set theory, the membership of elements in a set is assessed in binary terms according to a bivalent condition: an element either belongs or does not belong to the set. In contrast, fuzzy set theory allows for the gradual assessment of the membership of elements in a set, which is defined using a membership function valued in the real unit interval [0, 1]. Fuzzy sets generalize classical sets because the indicator functions of classical sets are special cases of the membership functions of fuzzy sets, namely those that take only the values 0 or 1. Classical bivalent sets are commonly referred to as crisp sets in fuzzy set theory. Fuzzy set theory has applications in a variety of fields where information is incomplete or imprecise, such as bioinformatics.

FORTH A low-level, extensible, stack-based programming language. It employs a Reverse Polish (postfix) syntax, such that adding two integers is specified by the command sequence a b +, which places the resulting sum (a+b) at the top of the stack. Although basic FORTH is a low-level language, it offers operations that allow the programmer to easily define new operations and redefine old ones. FORTH was created by Charles Moore for machine control in astronomy and has since spread to a variety of other fields, most notably embedded systems. It has been applied to mobile robots.
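
The stack behaviour of a postfix command sequence can be illustrated with a small evaluator. The sketch is written in Python rather than FORTH and supports only integers and three arithmetic operators, which is an assumption made to keep it short.

```python
import operator

# Hypothetical, minimal set of operations; real FORTH offers far more.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate_postfix(program):
    """Evaluate a whitespace-separated postfix sequence and return the stack."""
    stack = []
    for token in program.split():
        if token in OPERATIONS:
            b = stack.pop()             # top of stack is the second operand
            a = stack.pop()
            stack.append(OPERATIONS[token](a, b))
        else:
            stack.append(int(token))    # operands are pushed as they appear
    return stack

print(evaluate_postfix("2 3 +"))      # [5]
print(evaluate_postfix("2 3 + 4 *"))  # [20]
```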

Forward Reasoning The process of deducing conclusions from premises. This can result in a rapid accumulation of findings that are unrelated to the desired outcome in automated logic systems.

Fractal A fractal is a compound object made up of numerous sub-objects, each of which has a locally measurable characteristic that is identical to the same characteristic measured on the complete object. Fractals and fractal dimension are concepts used in document and vision analysis.

Frame of Discernment The frame of discernment refers to the collection of propositions of interest in Dempster-Shafer theory. It differs from the conventional Universe used by probability theory in that it might include sets whose members are not in the frame of discernment. It can be refined by splitting apart its sets and coarsened by aggregating them. Two frames of discernment are compatible if they are equivalent after refinement and/or coarsening.

Frame Representation Language (FRL) In the late 1970s, Roberts and Goldstein of the Massachusetts Institute of Technology created the Frame Representation Language (FRL). The frame templates (classes) are arranged in a hierarchy, with the relationship between two classes stated as “a kind of.” Persons, for example, are a kind of mammal, which in turn may be a kind of mortal. Socrates would be an instance of the person class.

Frames In 1975, M. Minsky proposed frames as a method of organizing knowledge. Objects are represented by a complex data structure (a frame) containing a number of named slots that characterize the type of object as well as its links to other objects in the knowledge base. An object’s template can contain default values. The slots can also be constrained by generic relationships (for example, a person’s age is less than the age of the person’s parents) and by specific constraints for a single object. Actions (functions) and goals can also be stored in the slots, and the data in the slots can then be used to determine appropriate actions and goals.

Function A function is a relationship that accepts zero or more attributes and returns a single item, which may be compound. A slot is a function that takes a single term and returns a single term in class and frame-based systems. A multivariate function is one that returns a compound object, such as a list or an array that can be thought of as a specific single multidimensional item.

Functional Programming Languages Functional programming languages are defined exclusively by well-specified mathematical functions that take arguments and return values without any side effects. Since there is no assignment in a pure functional language, the computation can be distributed across numerous machines with little requirement for synchronization.

Fusion and Propagation A key algorithm for obtaining many marginals from a graphical model that can be represented as a tree is the fusion and propagation algorithm. It defines a rule for fusing incoming messages at each node and spreading those messages out of the node. Because the fusion occurs in the local frame of discernment, the entire joint frame is never formally required.

Fuzzy Associative Memory (FAM) A fuzzy function or model that accepts a k-dimensional fuzzy input and generates a 1-dimensional fuzzy output. Similar to a regression model or a neural network. The model can either learn (estimate) from data or have its parameters specified in another way, such as by the model designer.

Fuzzy CLIPS Fuzzy CLIPS is an enhanced version of the C Language Integrated Production System (CLIPS), developed at the National Research Council of Canada, that enables the creation of fuzzy expert systems. Domain experts can use their own fuzzy terminology to express rules. It supports any combination of fuzzy and normal terms, numeric-comparison logic controls, and uncertainties in rules and facts. In approximate reasoning, fuzzy sets and relations handle fuzziness, whereas certainty factors for rules and facts handle uncertainty. The use of these extensions is optional, and existing CLIPS programs continue to run correctly.

Fuzzy Cognitive Map (FCM)

Figure 24. Fuzzy cognitive chart of drug crime. Source: Image by Flickr.

A graph having nodes connected by signed and directed arcs. The arc from node i to node j indicates node i’s influence on node j. Similarly, the arc from j to i indicates the influence of node j on node i. These arcs are not required to be symmetric. After the nodes have been initialized to some state, the connection matrix obtained from the graph can be applied repeatedly to trace the evolution of the system and to identify any fixed points or limit cycles.
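
The repeated application of the connection matrix can be sketched as below. The entry does not fix an update rule, so the binary threshold used here (a node switches on when its summed incoming influence is positive) is an assumption, as are the function names and the three-node example.

```python
def step(state, weights):
    """One update: weights[i][j] is the signed influence of node i on node j."""
    n = len(state)
    totals = [sum(state[i] * weights[i][j] for i in range(n)) for j in range(n)]
    return tuple(1 if t > 0 else 0 for t in totals)   # assumed threshold rule

def run_until_repeat(state, weights, max_steps=100):
    """Iterate until a state recurs; return the trajectory and the repeating part."""
    seen = [tuple(state)]
    for _ in range(max_steps):
        nxt = step(seen[-1], weights)
        if nxt in seen:
            return seen, seen[seen.index(nxt):]
        seen.append(nxt)
    return seen, []

# Node 0 excites node 1, node 1 excites node 2, node 2 inhibits node 0.
weights = [
    [0, 1, 0],
    [0, 0, 1],
    [-1, 0, 0],
]
trajectory, cycle = run_until_repeat((1, 0, 0), weights)
print(trajectory)
print(cycle)  # a single repeating state is a fixed point
```

A repeating part of length one is a fixed point; a longer one is a limit cycle.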

Fuzzy Intersection In (crisp) set theory, the intersection of two sets is the set of all items that are in both sets. The fuzzy intersection is the set of all elements having non-zero membership in both sets. An element’s membership in this new set is defined as the minimum of its memberships in the two parent sets. Thus, in fuzzy sets, the intersection of a set and its complement (A ∩ not-A), which is always the empty set (of measure zero) for conventional crisp sets, can be non-empty.

Fuzzy Rule Approximation (FRA) A fuzzy logic-based inference approach. A collection of fuzzy rules defines a map from the input to the output space. An FRA system tries to substitute fuzzy rules with a neural network that resembles the rule base.

Fuzzy Union The set of all items having non-zero memberships in either of the two sets is termed as the fuzzy union of two sets. The maximum of the two memberships in the parent sets is defined as the membership function of elements in the new set.
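
The minimum and maximum rules for fuzzy intersection and fuzzy union can be sketched with dictionaries that map each element to its membership. The complement rule (one minus the membership) is the standard choice but is an assumption here, as are the names and example values.

```python
def fuzzy_intersection(a, b):
    """Minimum of the two memberships; elements with zero membership are dropped."""
    result = {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}
    return {x: m for x, m in result.items() if m > 0}

def fuzzy_union(a, b):
    """Maximum of the two memberships; elements with zero membership are dropped."""
    result = {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}
    return {x: m for x, m in result.items() if m > 0}

def fuzzy_complement(a):
    """Assumed standard complement: one minus the membership."""
    return {x: 1.0 - m for x, m in a.items()}

tall = {"ann": 0.8, "bob": 0.3, "cy": 1.0}
not_tall = fuzzy_complement(tall)

# Unlike crisp sets, a fuzzy set can overlap with its own complement.
overlap = fuzzy_intersection(tall, not_tall)
print(sorted(overlap))  # ['ann', 'bob']
```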

False Negative An example of a prediction model incorrectly classifying an item as negative. A false negative, for example, illustrates a circumstance in which a junk email model states that a specific email message is not spam (the negative class) while the email message is actually spam, causing the end user to be annoyed by the junk message appearing in their inbox. In a more serious scenario, a false negative is when a medical diagnostic model fails to recognize a disease that is present in a patient.

False Positive An example of the predictive model incorrectly classifying an object as positive. For example, the model concluded that a specific email message was spam (the positive class), but that email message was not spam, causing an end user to miss reading a possibly essential communication. A false positive in a more serious circumstance is when a disease is diagnosed as existent when it is not, potentially leading to unneeded and expensive treatments.
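
Counting the two kinds of error for the spam example can be sketched as follows. The labels and the five-message dataset are made up for illustration.

```python
def count_errors(actual, predicted, positive="spam"):
    """Return (false positives, false negatives) for the given positive class."""
    false_positives = sum(
        1 for a, p in zip(actual, predicted) if p == positive and a != positive
    )
    false_negatives = sum(
        1 for a, p in zip(actual, predicted) if p != positive and a == positive
    )
    return false_positives, false_negatives

actual    = ["spam", "ham",  "spam", "ham", "ham"]
predicted = ["spam", "spam", "ham",  "ham", "ham"]

fp, fn = count_errors(actual, predicted)
print(fp, fn)  # 1 1
```

Here the second message is a false positive (a legitimate message flagged as spam) and the third is a false negative (spam that reached the inbox).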

Federated Data Repository A repository of virtual data that connects data from multiple sources (e.g., other repositories), giving a centralized access point for locating and accessing data.

Field-Programmable Gate Array (FPGA)

Figure 25. An FPGA-based bitcoin mining board. Source: Image by Wikimedia Commons.

An integrated circuit with configurable interconnects that the user can program after manufacture to tailor it to specific functions. FPGAs provide greater flexibility than ASICs, although at the expense of performance.

Facial Recognition

Figure 26. Man facial recognition technology. Source: Image by Pixabay.

A system that can identify or authenticate a person based on a digital image or a video frame. The software detects facial traits and then categorizes them by matching the image to faces in a database.

Feature or Predictor The quantifiable variable that a machine learning model uses to predict an outcome. An individual’s biometrics, such as height and gender, could be utilized to estimate his or her weight, for example.

First Order Logic Synonymous with predicate calculus. It is referred to as first order logic because its statements accept variables that range over entities (in contrast to simpler logic systems such as Boolean or propositional logic, which do not allow variables). Higher order logic systems support more complicated logical expressions by also allowing variables that range over predicates and functions.

Fisherian Statistics A fundamental school of inferential statistics, distinct from other modern schools of inference such as Frequentist or Bayesian statistics. It was developed by Sir Ronald Fisher and is also known as fiducial inference. Statistical inference in Fisherian statistics is based on both probability and likelihood, whereas in Frequentist and Bayesian approaches it is based solely on probability.

Formal Concept Analysis A formal mathematical method for deriving hierarchical conceptual ontologies from data. It can be used for data mining, knowledge management, and machine learning.

Forward Chaining Rules An approach in which an expert system works “forward” from a problem in order to find a solution. Forward chaining, which uses a rule-based approach, requires the artificial intelligence to decide which “if” rules to apply until the target is achieved.

Field Programmable Gate Array (FPGA) A type of specialized computing module that may be designed to execute specific tasks quickly.

Frequentist Statistics An approach to inference that focuses on the proportion (frequency) with which an outcome occurs in samples of data. In a frequentist approach, probabilities are therefore evaluated with respect to a well-defined random experiment.

Fully Connected Networks Multi-level networks where all nodes are connected. A generic Boltzmann machine is a form of fully connected network model.

G

Gain and Lift Charts Charts that data scientists use to compare the performance of a model against a random process, that is, against the results obtained without the prediction model.

Gallium Nitride A material other than silicon that can be used to make transistors. Gallium nitride transistors have higher electron mobility than silicon and can switch faster, have higher thermal conductivity, and have lower on-resistance than similar silicon alternatives.

Game Theory A branch of mathematics that is used in various fields of research, including economics, biology, and internet and network design, to determine outcomes when participants interact in ways that can result in zero-sum games (victory for one results in loss for another) or non-zero-sum games. John von Neumann, one of the forefathers of modern computers, was an early pioneer in establishing the mathematical principles of game theory, which were later extended by John Nash and others to the more general setting of noncooperative game theory.

Gate Function In mixture-of-experts or hierarchical mixture-of-experts systems, a gate, or mixing function, is used to combine or choose among the various experts’ predictions or selections for output (at that level). Consider a system that estimates creditworthiness on the basis of salary and other variables. If the individual experts consisted of models based on non-overlapping partitions of the training data, the gate function would be a simple selection function that selects the proper sub-model. If the sub-models were created using overlapping data regions, the gate function might weight the predictions of the sub-models based on their distance from this data point or on their past accuracy.

Gated Recurrent Unit A form of recurrent neural network (RNN), specifically a simplified version of the LSTM type of RNN. GRUs are widely used in language modeling, as well as for sequential or time series data. A GRU, like an LSTM, controls the flow of information in the individual cells (units) of the neural network architecture, making model training much more manageable.

Gaussian Distribution A form of continuous probability distribution characterized by two parameters, the mean µ and the standard deviation σ. Also known as the normal distribution or the bell curve.

Gaussian Function The Gaussian function is the traditional “bell” curve, which is commonly used in statistical analysis and as a weight function in interpolation (smoothing) nets. The relevance of a reference point to a given value is proportional to $e^{-2(x-y)^2/s}$, where x is the value, y is the reference point, and s is a bell width parameter. The Gaussian function is also employed as a radial basis function.

General AI A type of artificial intelligence which can be deployed to execute a wide variety of jobs that a human can perform in a broad variety of circumstances.

General Problem Solver (GPS) Inference Engine GPS searched for a sequence of operations that would eliminate the difference between an initial state and a final goal. The steps were chosen using means-ends analysis. The GPS inference engine was designed by Newell, Shaw, and Simon.

General Purpose Technology The importance of general-purpose technologies (GPTs) stems from their overall impact on society, and also the wide range of complementary breakthroughs they facilitate. Today, electricity, the internet, and information technology are most likely the most important. Another significant advancement in general-purpose technology is artificial intelligence.

General Regression Neural Network (GRNN) A General Regression Neural Network (GRNN) is the continuous analogue of Donald Specht’s Probabilistic Neural Network (PNN). Rather than returning a 0–1 signal from each matched hidden unit, it returns a weight and the output corresponding to each matched hidden unit. The final prediction is the weighted average of these outputs.
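
The weighted-average prediction can be sketched for a single input variable. In this illustration each training case acts as one hidden unit, and the Gaussian weight with width parameter s is an assumption about the form of the weight, not a statement of Specht’s exact formulation. All names and values are hypothetical.

```python
import math

def grnn_predict(x, train_x, train_y, s=1.0):
    """Weighted average of stored outputs, weighted by closeness of x to each case."""
    # Assumed Gaussian weight; s controls how quickly relevance falls off.
    weights = [math.exp(-2.0 * (x - xi) ** 2 / s) for xi in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

train_x = [0.0, 2.0]
train_y = [0.0, 10.0]

print(grnn_predict(1.0, train_x, train_y))  # midway between the two stored outputs
print(grnn_predict(0.0, train_x, train_y))  # dominated by the nearer training case
```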

Generalization curve A loss curve showing the loss for both the training set and the validation set. A generalization curve can assist in detecting potential overfitting. For example, a curve in which the loss for the validation set eventually becomes much larger than the loss for the training set suggests overfitting.

Generalization Refers to the capacity of a model to produce correct predictions on new, previously unseen data, as opposed to the data used to train the model.

Generalized EM Algorithm The EM algorithm finds an exact maximum solution to an estimation problem, such as estimating probabilities in a partially observed network. This strategy, however, may be computationally or analytically impracticable. A generalized EM method substitutes approximations for the E and/or M steps. A Gibbs sampling method, for example, might be used to compute an approximate expectation (the E step), or a lower bound on the maximum likelihood might be used for the M (maximization) step.

Generalized Forward-Backward Algorithm A method for propagating probabilities through a network. Following some rearrangement of the network, probability calculations are propagated down the network, with each node storing its state, and then back up the network to produce final estimates of the network’s state.

Generalized Linear Model A generalization of least squares regression models, which are founded on Gaussian noise, to other types of models founded on other types of noise, such as Poisson noise or categorical noise. Logistic regression, multi-class regression, and least squares regression are examples of generalized linear models. Convex optimization can be used to find the parameters of a generalized linear model. Generalized linear models have the property that the average prediction of the optimal least squares regression model equals the average label on the training data. The power of a generalized linear model is limited by its features. A generalized linear model, unlike a deep model, cannot “learn new features.”

Generalized List A generalized list over a set of elements is a list whose components are either elements of the set or generalized lists over the set. It can therefore contain elements as well as other lists.

Generalized Logic Diagram (GLD) A generalized logic diagram (GLD) is a generalization of the Karnaugh map. The Karnaugh map uses dimension stacking to embed numerous binary attributes into a two-dimensional table whose entries correspond to the binary output of each combination. A GLD supports multi-level attributes and allows a general response, which may include a graphic, as the entries. Continuous attributes can be incorporated after binning. A decision table is a special type of GLD.

Generate and Test Method Generate and test is a term for a method that produces a sequence of candidate answers and tests them against the current situation(s). Such methods typically consist of two modules: a generator module that generates solutions and a test module that scores them. The approach can be exhaustive, examining all conceivable answers, or it can continue testing until some level of acceptability is reached. This approach is frequently used to recognize new samples or scenarios that are presented to it.

Generative Adversarial Networks Pairs of models that are trained alternately using competitive deep learning methods. One model is trained to distinguish genuine data from synthetic data produced by the second model. The resulting ability to capture and reproduce variations within a dataset can be used in areas such as healthcare and pharmacology to better understand risk and recovery.

Generative Model In practice, a model that does either of the following: (1) Creates (generates) new examples from the training dataset. A generative model, for example, may generate poetry after being trained on a dataset of poems. This category includes the generator component of a generative adversarial network. (2) Determines the likelihood that a new example comes from the training set or was generated by the same mechanism that generated the training set. For instance, after training on a dataset of English sentences, a generative model may assess the likelihood that new input is a valid English sentence. In theory, a generative model can identify the distribution of examples or specific attributes in a dataset.

Generator The subsystem of a generative adversarial network that generates fresh examples.

Genetic Algorithm (GA) A method for estimating computer models (for example, in Machine Learning) that is based on ideas derived from the science of genetics in biology. To employ this technique, potential model behaviors are encoded into “genes.” After each generation, the current models are graded and allowed to mate and reproduce depending on their fitness. Genes are exchanged during mating, and crossovers and mutations can occur. The existing population is discarded, and its descendants become the next generation. Genetic Algorithm is also a phrase used to describe a range of modeling or optimization strategies that aim to mimic some aspect of biological evolution in searching for an optimum. Typically, the object being modeled is represented in a way that allows for automatic modification. The present data is then used to produce a large number of candidate models, which are then tested. Each model is graded, and only the “best” models are kept for the next generation. Retention is either deterministic (choose the best k models) or random (choose k models with probability proportional to the score). These models are then randomly perturbed (as in asexual reproduction), and the procedure is repeated until convergence occurs. If the models are designed in such a way that they have “genes,” the winners will be able to “mate” to produce the following generation.
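
The generation loop can be sketched on a toy problem in which the “genes” are bits and the score is the number of ones. Everything specific here is an assumption made for illustration: the bit-string encoding, single-point crossover, the mutation rate, and the population settings.

```python
import random

def fitness(genes):
    """Toy score: the number of 1 bits in the gene string."""
    return sum(genes)

def evolve(n_genes=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)               # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Random retention: parents are chosen with probability proportional
        # to their score (the +1 avoids an all-zero weight list).
        weights = [fitness(p) + 1 for p in population]
        children = []
        for _ in range(pop_size):
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, n_genes)          # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children               # the old population is discarded
        best = max(population + [best], key=fitness)
    return best

best = evolve()
print(fitness(best), best)
```

The function also remembers the best model seen so far, a convenience that is not part of the description above.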

Genetic Programming A subset of artificial intelligence in which computer programs are stored as collections of genes that are updated via evolutionary algorithms. Genetic programming follows Darwin’s principles of natural selection: the computer program determines which solutions are the strongest and advances them, while the weaker choices are eliminated.

Genomic Analysis At the genomic scale, genomic analysis methods are used to discover, measure, and compare genomic properties such as DNA sequence, structural variation, gene expression, and regulatory and functional element annotation.

Gesture Recognition The ability of a computer to interpret a particular human gesture or motion. Gesture recognition technology can detect movements or characteristics in any body action or state.

Gini Index A figure of merit used in machine learning and statistical models such as Classification and Regression Trees (CART). The Gini Index of a grouping is given as $1 - \sum_i p_i^2$, where $p_i$ is the proportion of cases in the i-th category. It is 0 when all of the cases in a node or grouping are in the same category, and it reaches its maximum when the cases are evenly distributed across all of the categories.
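
A short sketch of the calculation, assuming a grouping is given as a list of category labels (the function name and example labels are assumptions):

```python
from collections import Counter

def gini_index(labels):
    """One minus the sum of squared category proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_index(["a", "a", "a", "a"]))  # 0.0, all cases in one category
print(gini_index(["a", "b", "a", "b"]))  # 0.5, evenly spread over two categories
```

For k equally represented categories the index equals 1 - 1/k, which is its maximum for that number of categories.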

Gradient Clipping When employing gradient descent to train a model, a popular approach for mitigating the exploding gradient problem is to artificially constrain (clip) the maximum value of gradients.
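
Two common ways of clipping can be sketched as follows. Both function names are assumptions for this example, and the gradient is represented as a plain list of floats rather than a tensor from any particular library.

```python
import math

def clip_by_value(grad, limit):
    """Clamp every component of the gradient to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grad]

def clip_by_norm(grad, max_norm):
    """Rescale the gradient so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

print(clip_by_value([-5.0, 0.5, 7.0], 1.0))
print(clip_by_norm([3.0, 4.0], 1.0))
```

Clipping by norm keeps the direction of the gradient and only shortens it, whereas clipping by value can change the direction.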

Gradient Descent or Hill Climbing A popular strategy for estimating parameters in Machine Learning algorithms such as neural nets or other regression-style techniques. These algorithms “learn” by altering their parameters in order to reduce some measure of error. At any stage in the learning process, the algorithm analyses how a change in the parameters could reduce the current error and moves in that direction. Typically, the algorithm determines the direction as a function of the error function’s derivative (its gradient) with respect to the parameters, or of a weighted derivative, where the weights are determined by the matrix of second derivatives (the Hessian matrix). Once the direction is determined, it is multiplied by a learning rate parameter and added to the parameter vector’s current values.
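
A minimal sketch of the update rule, assuming a hypothetical error function E(w) = (w0 - 3)^2 + (w1 + 1)^2 whose minimum is at (3, -1). The direction used here is the negative gradient, scaled by the learning rate; the function names and parameter values are illustrative only.

```python
def gradient(w):
    """Gradient of the assumed error function E(w) = (w0 - 3)^2 + (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

def gradient_descent(w, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        g = gradient(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

w = gradient_descent([0.0, 0.0])
print(w)  # close to [3.0, -1.0]
```

With a learning rate that is too large (above 1.0 for this particular function) the iterates would diverge instead of converging.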

Gradient The partial derivative vector with respect to all independent variables. The gradient in machine learning refers to the vector of partial derivatives of the model function. The gradient indicates the direction of the steepest ascent.

Granularity Refers to the smallest unit size that can be modified. Commonly refers to the level of detail or abstraction with which a specific problem is examined. According to Jerry R. Hobbs, one trait of human intelligence is the ability to perceive a world at several levels of granularity (complexity) and to shift between them while analyzing problems and scenarios. The coarser the grain may be while still providing efficient answers to the problem, the simpler the problem.

Graph A graph is a collection of items that are linked to one another. The items are commonly referred to as nodes, while the connections are referred to as arcs or edges. The graph is referred to as a directed graph if the connections have directions attached to them. A directed graph is acyclic if no path that follows the directions leads from a node back to itself. A graph with both qualities is known as a Directed Acyclic Graph (DAG).
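
One way to test whether a directed graph is a DAG, taking acyclic to mean that no directed path leads from a node back to itself, is to remove nodes with no incoming arcs one at a time (Kahn’s algorithm). The function name and the representation of arcs as (source, destination) pairs are assumptions for this sketch.

```python
from collections import deque

def is_acyclic(nodes, arcs):
    """Return True if the directed graph (nodes, arcs) contains no cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in arcs:
        successors[src].append(dst)
        indegree[dst] += 1
    # Start from the nodes that have no incoming arcs.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every node can be removed only if there is no cycle.
    return removed == len(nodes)

print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c")]))
print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
```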

Graph Execution A TensorFlow programming environment wherein the programme first constructs a graph and then executes all or a portion of it. In TensorFlow 1.x, the default execution mode is graph execution.

Graphical Model The relationships between the attributes of a graphical model can be represented as a (mathematical) graph. The graph’s nodes generally denote variables or actions, while the arcs represent dependencies or information flows.

Graphics Processing Units (GPUs) A specialized electronic circuit (chip) designed to perform calculations at a rapid rate. Graphics processing unit architectures, which are primarily used for image display, are found in a wide variety of devices, such as mobile phones, personal computers, workstations and game consoles.

Gray Codes A type of binary encoding that allows the integers $[0, \ldots, 2^N - 1]$ to be encoded as binary strings of length N. They have the distinctive property that the codes of consecutive integers differ in only one bit, which is frequently termed the “adjacency property.” They have been used to encode numbers in Genetic Algorithms, since a minor mutation in a Gray code encoding can result in a small change in the value. Patented by Frank Gray in 1953.
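
The standard reflected binary Gray code can be computed with a shift and an exclusive-or. The sketch below, with assumed function names to_gray and from_gray, shows the encoding, its inverse, and the adjacency property for 3-bit values.

```python
def to_gray(n):
    """Encode a non-negative integer as its reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Decode a reflected binary Gray code back to the integer."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

for i in range(8):
    print(i, format(to_gray(i), "03b"))
```

In the printed table, each code differs from the one on the previous row in exactly one bit position.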

Greedy Algorithm Algorithms are typically described as greedy when they try to tackle a larger problem by taking short, locally optimal steps whilst neglecting combinations of steps that may lead to a better overall solution. For instance, greedy algorithms are frequently used to build decision trees. At any given node, the greedy algorithm will choose the best single split as the next level in the tree, instead of exploring combinations of splits that may produce a better solution a few steps later. Greedy algorithms are advantageous in the sense that they are easier to implement and faster than global algorithms, and that they frequently produce “good enough” outcomes.

Greedy Policy A policy in reinforcement learning that always selects the action with the highest predicted return.

Ground Truth A procedure for testing the accuracy of a training dataset in verifying or refuting a research hypothesis, usually performed on-site. Self-driving automobiles, for example, employ ground truth data to train artificial intelligence to properly assess road and street situations.

Group Attribution Bias The assumption that what is true for one person is true for everyone in the group. When data is collected through convenience sampling, the consequences of group attribution bias might be amplified. Attributions that do not reflect reality might be made in a non-representative sample.

Goodness of Fit A model’s goodness of fit indicates how well it matches a set of observations. Measures of goodness of fit generally describe the difference between observed values and the values predicted under the model. A good fit for a machine learning algorithm occurs when the model’s error on both the training and test data is minimal. As the algorithm learns, the model’s error on the training data decreases, as does the model’s error on the test dataset. If training continues for too long, the model’s error on the training dataset may continue to decrease because the model is overfitting and learning irrelevant detail and noise from the training dataset. At the same time, the error on the test set begins to increase again as the model’s capacity to generalize declines. Thus, the good fit of the model is defined as the point immediately before the error on the test dataset begins to increase, where the model has good skill on both the training dataset and the unseen test dataset.

H

HACKER G.J. Sussman’s HACKER produced plans for solving problems in the “blocks world.” The programme, which is based on human problem-solving techniques, attempts to find a solution by searching an “answer library.” If no immediate solution is available, it seeks to modify one that is similar. A “criticizer” subsystem then searches for and seeks to correct faults in the plan.

Hamming Distance Classical distance measures, such as Euclidean distance or Manhattan distance, demand that the feature vector’s component properties be on an interval or ratio level. The Hamming distance is a distance measure that counts the number of attributes on which the pair varies. It is ideal for nominal or ordinal measures.
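
A short illustration, assuming two cases described by the same attributes in the same order; the function name and the example attribute values are made up for this sketch.

```python
def hamming_distance(a, b):
    """Count the attributes on which two equal-length cases differ."""
    if len(a) != len(b):
        raise ValueError("both cases must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))

case_1 = ("red", "small", "round")
case_2 = ("red", "large", "round")
print(hamming_distance(case_1, case_2))
print(hamming_distance("10110", "11100"))
```

Because only equality is tested, the attributes may be nominal labels, ordinal grades, or bits.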

Hardy Hardy is a diagramming tool that works using hypertext. Hardy has been integrated with NASA’s rule-based and object-oriented language C Language Integrated Production System Version 6.0 (CLIPS 6.0), which allows users to quickly construct diagram-related applications.

HARPY In 1975, B. Lowerre developed HARPY, a speech understanding system. The programme contained a series of precompiled words, and “understanding” was accomplished as a sequence of transitions between these words.

Hashing A mechanism in machine learning for bucketing categorical data, especially when the number of categories is enormous but the number of categories actually occurring in the dataset is comparatively small. For example, there are around 60,000 tree species on Earth. Each of the 60,000 tree species could be represented by its own bucket, giving 60,000 distinct categorical buckets. Alternatively, if just 200 of those tree species occur in a dataset, hashing might be used to divide the tree species into perhaps 500 buckets. Multiple tree species could end up in a single bucket. For example, hashing could place two genetically dissimilar species, baobab and red maple, in the same bucket. Hashing is nevertheless an excellent way to map large categorical sets into the required number of buckets. By grouping values in a deterministic manner, hashing reduces a categorical feature with a large number of possible values to a significantly smaller number of values.
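
A minimal sketch of such bucketing, assuming 500 buckets as in the example above. The function name bucket_for is hypothetical, and SHA-256 is used only because it gives the same result on every run; any stable hash function would serve.

```python
import hashlib

def bucket_for(category, num_buckets=500):
    """Map a category name to a bucket index in a repeatable way."""
    digest = hashlib.sha256(category.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then fold into the bucket range.
    return int.from_bytes(digest[:8], "big") % num_buckets

for species in ["baobab", "red maple", "silver birch"]:
    print(species, "->", bucket_for(species))
```

Python’s built-in hash() is deliberately not used here, because for strings its value changes between interpreter runs unless hash randomization is disabled.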

HASP A blackboard-based system for real-time sonar signal interpretation in a specified area of the ocean. It detects and identifies the location and movement of ships in its vicinity by using input from several sensors in recognized places.

Hasse Diagram A Hasse diagram is a reduced depiction of a partially ordered set’s complete graph. A complete representation of a partially ordered set contains arcs connecting all nodes a and b for which a R b, where R denotes the ordering relation. A Hasse diagram is a graph that removes all linkages between two nodes that have a longer path between them.

Hebbian Learning A type of unsupervised learning in neural networks that tries to minimize the sum of squared distances between the input cases and a linear subspace of the training cases’ data space. It resembles a Principal Components Analysis.

Helmholtz Machine The computation of probability propagation on a large, multiply-connected Bayesian network can be challenging. A Helmholtz machine addresses this issue by combining the original generative network with a second recognition network that generates a rapid approximation to the intended result. Different recognizer networks may be needed for different sets of visible variables in the network. Recognizer networks are classified as either factorial networks, which assume a basic Naïve Bayes model given the visible variables, or nonfactorial networks, which allow for a more sophisticated relationship between the hidden variables given the visible variables.

Helpmate Helpmate is an autonomous delivery robot designed for use in hospitals and other similar settings. Meals and medications are among the items that Helpmate can supply. It has the ability to navigate around unpredictable obstructions (humans or other cars). It is currently being tested in the United States.

Hessian Matrix In learning algorithms that use gradients or related approaches to learn, the gradients are sometimes weighted by the inverse of the matrix of cross-products or of the Hessian matrix. The Hessian matrix is the matrix of the error function’s second derivatives with respect to the learning rule’s parameters.

Heteroassociative Heteroassociative models are models that relate one set of variables to another, as in prediction or classification. The predictive variables are not the same as the target variable(s).

Heterogeneous Databases Databases that store various types of data, such as text and numerical data.

Heuristic A heuristic is also known as a rule of thumb. In other words, a heuristic is an approach for solving a problem that does not always guarantee a successful answer, but generally does. George Polya, a mathematician, is credited with coining the phrase. To illustrate, a heuristic would be to start looking for a missing object in the last place one can recall using it.

Heuristics Approximation strategies or “rules of thumb” for problem solving. They are usually applied when the exact answer to a certain problem is unknown or when an exact method is known but would be too complex or impractical to apply. They provide a “pretty good” solution at a “reasonable” cost.

Hidden Layer Many nodes in an artificial neural network may be neither the initial input nodes nor the final output nodes. These are commonly known as “hidden” nodes. These nodes are frequently organized into layers, with each layer receiving input from a “previous” layer and producing output that becomes the input for the following layer. Complicated network schemes might make layer counting challenging. Also see: input layer, output layer.

Hidden Markov Model A Markov chain-based model featuring additional, parallel states that are totally dictated by the underlying Markov chain states. The underlying states are typically not detected, but secondary state information is. Multiple hidden states can produce the same observable states, while a single hidden state can produce multiple observable states.

Hierarchical Clustering A type of clustering method that produces a tree of clusters. Hierarchical clustering works effectively with hierarchical data such as botanical taxonomy. Hierarchical clustering algorithms are classified into two types: Agglomerative clustering allocates each sample to its own cluster and then combines the nearest clusters iteratively to form a hierarchical tree. Divisive clustering puts all samples into one cluster and then splits the cluster repeatedly into a hierarchical tree.

Hierarchical Mixtures of Experts (HME) The HME architecture is a generalization of the concepts underlying decision trees and recursive partitioning techniques. The main idea behind this design is to break a large (hard) problem down into a group of smaller problems that “experts” can resolve. In this technique, the problem is decomposed a priori into a hierarchical input space that allows overlapping clusters (i.e., “soft splits”). The “experts” (neural networks) at the bottom of the tree each generate predictions for an input vector, which are then probabilistically mixed as the predictions are passed “up” the tree.

Hinge Loss A family of loss functions for classification that seeks to locate the decision boundary as far away from each training example as feasible, hence maximizing the margin between examples and the boundary. Hinge loss or a related function (like squared hinge loss) are employed by KSVMs.
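
For a single example, the usual form of the hinge loss is max(0, 1 - y * s), where y is the true label coded as -1 or +1 and s is the raw score produced by the model. The sketch below assumes that coding; the function names are illustrative.

```python
def hinge_loss(y_true, score):
    """Hinge loss for one example; y_true must be -1 or +1."""
    return max(0.0, 1.0 - y_true * score)

def squared_hinge_loss(y_true, score):
    """Squared variant, which penalizes large margin violations more heavily."""
    return hinge_loss(y_true, score) ** 2

print(hinge_loss(+1, 2.0))   # correct side, outside the margin
print(hinge_loss(+1, 0.5))   # correct side, inside the margin
print(hinge_loss(-1, 0.5))   # wrong side of the boundary
```

The loss is zero only when an example is on the correct side of the boundary by a margin of at least 1, which is what pushes the boundary away from the training examples.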

Holdout Data Examples that are purposefully omitted (“held out”) during training. The validation dataset and the test dataset are examples of holdout data. Holdout data helps assess a model’s capacity to generalize to data other than the data on which it was trained. The loss on the holdout set approximates the loss on an unseen dataset better than the loss on the training set does.

Hold-Out Sample When several data-adaptive models are fitted, the models have a propensity to over-adapt to a specific dataset, resulting in restricted generality. In order to control or measure this effect, one way is to “hold out” some of the data to assess the fitted model on this “fresh” data. There are several approaches to this, including the use of training, validation, and test sets, as well as methods such as jack-knifing, cross-validation, and bootstrapping. All of them have the feature of constructing the model on a portion of the data and evaluating/tuning it on another.

Honda Human Honda Corporation presented the Honda Human, a humanoid robot capable of bipedal walking, turning, ascending stairs, and other functions, in 1996. It is the outcome of a continuous endeavor to create human assistants and substitutes for the factory floor.

Hopfield Network An additive autoassociative network in which the signal functions are bounded, monotone increasing functions and the synaptic matrix is symmetric. These networks are globally stable and rapidly converge to fixed points for all inputs. Thus, after such a network is initialized and receives some input x(0), it will compute x(1), x(2), and so on until it reaches a stable x(∞).

Horizon Effect Berliner invented this term to describe the impact of partial look-ahead in game-playing algorithms. If a programme can predict the end of the game, it can (hypothetically) choose the best sequence of moves. Due to its restricted horizon, the programme may pick subpar moves when it can only view ahead partially (e.g., five moves in a chess game).

Horn Clause Logical clauses that contain at most one positive literal. These are classified into three types, namely, assertions, conditional assertions, and denials.

Horn Core The Horn Core of a propositional formula is a weakest Horn formula that implies that formula. It is also known as the greatest Horn lower bound. The Horn Core need not be unique, as several inequivalent Horn formulae can imply a given formula.

Horn Envelope The Horn envelope of a given propositional formula X is the strongest Horn formula implied by X. It is often referred to as the least Horn upper bound. A formula’s approximation by its Horn envelope and Horn Core can aid in speedy approximate reasoning.

Hough Transform The Hough Transform is a technique for image analysis that permits a histogram analysis of an input feature space. The peaks of the feature space histogram correspond to elements in the input image. For example, a detected edge point would add a vote to every conceivable line passing through that point in the image’s “line space” transform. A line in the input image would then appear as a peak in the resulting line space.

Human-Centered Computing Computers and other technology should be built to satisfy people’s needs and requirements adequately. All too frequently, they are not. People frequently cite the difficulty they experience in setting up their VCR to record a TV show, as well as the problems they have in setting up a home computer or connecting to the Internet. Artificial intelligence software can be used to provide more human-centered computing, improve system usability, extend the capability of human thinking, allow better collaboration between humans and machines, and encourage human learning. One aim of human-centered computing is for people and machines to compensate for each other’s weaknesses in support of human goals. For example, machines can compensate for people’s limited short-term memory and the slow pace at which they can search through numerous alternative solutions to a problem, while people can compensate for machines’ more limited capacity for pattern recognition, language processing and other tasks.

Hybrid Systems Many artificial intelligence applications from Stottler Henke have combined multiple AI approaches. Case-based reasoning (CBR), for instance, can be employed in an automated diagnostic system together with model-based reasoning. Case-based reasoning, which is less costly to build and run, might rely on historical databases of past faults, failure diagnoses, remedies and results. CBR can therefore be used to diagnose most failures. Model-based reasoning can be utilized for the diagnosis of less common and costly faults as well as for fine-tuning repair methods in the CBR case base.

Hypergraph In a typical graph representing a problem or solution, each node represents a single condition. The graph is sometimes called a hypergraph if the nodes represent compound conditions, as in a complex rule. The compound nodes may also be called hypernodes.

Hyperparameter In models of knowledge representation, the value of an unknown attribute can be represented as a probability distribution. If the parameters of that distribution are themselves described by another distribution, the parameters of the second distribution are called hyperparameters. For instance, the uncertainty in binary events is frequently modelled with a Bernoulli distribution or, in aggregate, a binomial distribution. Both distributions require a parameter, p, to indicate the probability of the binary event. When the value of p is uncertain, it may be described by a beta distribution with two parameters, a and b. In this example, a and b are hyperparameters for the original event.

Hyperplane A boundary that divides a space into two subspaces. For example, a line is a hyperplane in two dimensions and a plane is a hyperplane in three dimensions. More typically in machine learning, a hyperplane is the boundary separating a high-dimensional space. Kernel Support Vector Machines use hyperplanes in a high-dimensional space to separate positive classes from negative classes.

Hyperresolution Hyperresolution is an inference rule used in automated reasoning systems. It operates on two or more clauses, with one of them (the nucleus) having at least one negative literal and the others (the satellites) having no negative literals. Briefly, it yields a conclusion if a unifier (a substitution of terms for variables) is found that produces identical (except for sign) pairs of literals, each pairing a positive literal from a satellite with a negative literal from the nucleus. The conclusion is obtained by discarding the paired literals, applying the unifier to the nucleus and satellites simultaneously, and combining the remaining literals. It is a generalization of binary resolution, which considers exactly two clauses at a time.

Hypothesis Testing A statistical technique for testing hypotheses or conjectures concerning unobserved or unobservable numerical features of a population. The statistic that quantifies the quantity of interest is evaluated, together with its variability, on a sample of population data and compared to a range of values consistent with one of the hypotheses. The hypothesis is rejected in classical frequentist hypothesis testing if the value is rare or uncommon.

High-Performance Computing (HPC) Developing, constructing, and running very large computers (together with the necessary software, hardware, facilities, and supporting infrastructure) in order to push the computational upper limits of resolution, dimensionality, and complexity.

Homomorphic Encryption A method for performing calculation directly on encrypted data without the need for a secret key. The outcome of such a computation stays encrypted and can be divulged later by the owner of the secret key.

Human-Machine Teaming (or Human-AI Teaming) The ability of people and AI systems to collaborate in a range of situations to complete complicated, evolving tasks with smooth handoff between human and AI team members. Efforts are being directed toward building effective policies for controlling human and machine efforts, computing approaches that best complement humans, strategies that enhance teamwork goals, and designs that improve human-AI interaction.

I

Image Analytics Image analytics is the technique of extracting and analyzing information from image data using digital image processing. Image analytics algorithms can recognize multiple features at once (including logos, objects, scenes, etc.), in addition to identifying faces to determine age, gender, and sentiment. Simple examples include bar codes and QR codes, while more complicated applications include biometrics and location and movement analysis.

Image Recognition It is a technique that uses machine vision in conjunction with a camera, statistical methodologies, and artificial intelligence to identify items, places, people, writing, and actions in photographs.

Image Search The use of specialized data search methods to locate images is referred to as image search. Image search engines like Google image search and Microsoft photos® present users with images that match their query (keywords, links, or even other images).

Implicant An implicant B of a statement A is a satisfying set of conditions that entails A. When no proper subset of B is also an implicant of A (i.e., when B is minimal), B is a prime implicant of A.

Implicit Bias Automatically making an association or assumption based on one’s mental models and memories. Implicit bias can have a negative impact on (1) the manner in which data is gathered and categorized, and (2) the design and development of machine learning systems. For example, an engineer might use the presence of a white dress in a photo as a feature while creating a classifier to recognize wedding photos. White dresses, however, have been customary only in certain eras and cultures.

Imputation In datasets, the process of “filling in” missing values. A wide range of approaches is available, with the majority of them being addressed in statistical and survey data literature.

Incidence Calculus Incidence calculus is a system for managing uncertainty in expert systems. Rather than explicitly computing belief in a proposition, one may simulate alternative truth-values for the propositions that precede the hypothesis and count the number of times the hypothesis is true.

Incompatibility of Fairness Metrics The idea that certain notions of fairness are mutually incompatible and cannot be satisfied at the same time. As a result, there is no single universal metric for measuring fairness that can be applied to all machine learning problems. Rather than discouraging fairness efforts, this implies that fairness must be defined in the context of a specific machine learning problem, with the aim of avoiding harms specific to its use cases.

Independence In probabilistic expert systems, an attribute X is said to be conditionally independent of an attribute Y given the attributes in W when the probability distribution for X can be generated without reference to the state of Y, given the state of the attributes in W. When the set W is empty (i.e., there are no intervening variables), X and Y are said to be unconditionally independent. Otherwise, X and Y are dependent.

Independent and Identically Distributed In most Machine Learning applications, the algorithms are built on the assumption that all of the observations were chosen randomly and come from the same distribution. A counter-example would be a case in which observations were sampled differently from two or more subgroups, or one in which the selection of one observation from the population makes another observation or group of observations more likely to be picked. The first counter-example violates the assumption of identically distributed data, whereas the second breaches the assumption of independence. If this assumption is not met, the generalizability of any model or knowledge derived from the data may be limited.

Indiscernible With respect to a set of attributes A, two objects are indiscernible if they have identical values for each attribute in A. An equivalence class of A is the collection of all objects in the “universe” that are indiscernible from one another with respect to A. If the universe consists of the integers in the range (0,100) and a single attribute of an integer i is defined as mod(i,3), then the integers 1 and 4 are indiscernible with respect to mod(i,3) because they both have the same value, 1. This attribute can take three different values (0,1,2), resulting in three different equivalence classes.
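
The mod(i,3) example can be reproduced by grouping objects on the tuple of their attribute values. The function name equivalence_classes is an assumption, and the universe is taken here to be the integers 0 to 100 inclusive.

```python
from collections import defaultdict

def equivalence_classes(universe, attributes):
    """Group objects that share the same value on every attribute."""
    classes = defaultdict(list)
    for obj in universe:
        key = tuple(attribute(obj) for attribute in attributes)
        classes[key].append(obj)
    return dict(classes)

classes = equivalence_classes(range(0, 101), [lambda i: i % 3])
print(len(classes))          # number of equivalence classes
print(classes[(1,)][:4])     # first few integers with mod(i,3) equal to 1
```

Since mod(i,3) can only be 0, 1, or 2, the universe splits into three classes, and 1 and 4 fall in the same one.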

Individual Fairness A fairness statistic that examines whether similar people are categorized in the same way. Brobdingnagian Academy, for example, could aim to ensure individual fairness by assuring that two students with identical grades and standardized test scores have an equal chance of being accepted. Individual fairness is totally dependent on how you define “similarity” (in this case, grades and test scores), and if your similarity metric lacks vital information (such as the difficulty of a student’s curriculum), you risk introducing additional fairness issues.

ILP (Inductive Logic Programming) ILP (Inductive Logic Programming) is a different way of looking at Machine Learning. The majority of strategies in this field are attribute-based, focusing on determining the values of attributes that best predict outcome variables. ILP starts from background knowledge (a set of predicate definitions), a set of positive examples, and a set of negative examples. The task of an ILP system is then to extend the background knowledge so that it entails all of the positive examples while rejecting all of the negative ones. Although this appears similar to the fundamental problem of inductive learning, ILP systems require that the background knowledge and its extensions be stated in predicate logic, most commonly Prolog.

Inductive Reasoning It is a method of arriving at certain conclusions based on evidence and data. In reality, this should be very similar to regular programming because it works with existing datasets rather than creating new ones.

Inference Engine Refers to programs or subprograms that draw inferences. The phrase refers to the part of the program that performs the inference, rather than to the data and premises that supply the knowledge. It is the component of an expert system that is in charge of inferring new conclusions from existing data and rules. The inference engine is a part of an expert system that can be reused (together with the user interface, knowledge base editor, and explanation system) and can function with a variety of case-specific data and knowledge bases.

Inference In machine learning, often refers to the process of making predictions by applying the trained model to unlabelled examples. In statistics, inference refers to the process of fitting the parameters of a distribution based on some observed data.

Information Distance A measure of how similar two objects are. It is utilized in a variety of methods, including unsupervised clustering, to locate groups of objects that are similar. The Levenshtein Distance is one example.
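
The Levenshtein Distance mentioned above counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A compact dynamic-programming sketch follows; the function name is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn string a into b."""
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute, or keep
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so the memory used grows with the length of the second string rather than with the product of both lengths.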

Information Extraction Information extraction is a field within Natural Language Processing (NLP). It combines NLP technologies like parsing, tagging, and dictionaries with expert systems or machine learning tools to recognize and structure the concepts in a document.

Information Filtering An information filtering system sorts through enormous amounts of dynamically generated data to give the user the information nuggets that are most likely to meet his or her immediate needs. Information filtering is a subset of the older topic of information retrieval, which also involves information selection. Many of the characteristics of information retrieval system design (for example, representation, similarity measures or dichotomous selection, document space visualization) are also found in information filtering systems. Information retrieval from a constantly changing information space is broadly defined as information filtering.

Information Operations Information operations are the tactics, techniques, and procedures used to gain a competitive edge through the offensive and defensive use of information.

Information Retrieval Information retrieval is a branch of computer science concerned with the tools, procedures, and capabilities for extracting, sorting, and organizing usable data from a variety of sources.

Information Table A data matrix is referred to as an information table in Rough Set Theory. Cases are represented by rows, and attributes are represented by columns. Condition attributes and decision attributes are the two sorts of attributes.

In-Group Bias Showing favoritism for one’s own group or qualities. In-group bias may invalidate product testing or the dataset if testers or raters are the machine learning developer’s friends, family, or co-workers.

Inheritance Hierarchy An object hierarchy in which each object inherits the properties of the object “above” it and transmits all of its properties to any objects that inherit from it, usually in the form of a tree.

Innovation Diffusion The ability of artificial intelligence (AI) to create a multiplier effect for economic growth. The driverless car is one example of innovation diffusion. Driverless cars’ AI systems will create huge amounts of data that will generate opportunities for others to develop new services and products. Insurers, for example, will generate new revenue streams as a result of their capacity to cover new risks and provide clients with new data-driven services. Urban planners and others will be able to take advantage of the technology and develop new services, such as new ways of charging for road usage. Other AI-enabled technologies could have a similar impact and spur growth across a wide range of business activities.

Input Function A function in TensorFlow that returns input data to an Estimator’s training, evaluation, or prediction procedure. The training input function, for example, returns a set of features and labels from the training set.

Input Layer The input layer consists of the nodes of a neural network that accept external input. The term refers to the standard layered network architecture.

Instance Link An instance link is a link in a knowledge representation scheme that connects a generic class of objects, such as checking accounts in a banking schema, to a specific instance, such as a specific person’s checking account. All of the slots that the general class possessed, such as interest rates, monthly charges, and an account number, will be passed down to the specific account.

Instance Slot An instance slot is a slot in a class that is used to define the properties of instances of that class. It does not describe the class itself.

Intellect INTELLECT is a natural language interface to database systems. It is sold by the Artificial Intelligence Corporation of Waltham, Massachusetts, and is one of the first commercially viable AI programs.

Intelligent Automation A term used to describe an automation solution that has been augmented with cognitive capabilities, allowing programs and machines to learn, interpret, and respond. Automobile cruise control is an early example, whereas the self-driving car represents the present state of the art.

Intelligent Computer-Aided Instruction (ICAI) The application of AI ideas and techniques to computer-aided teaching. ICAI differs from standard Computer-Aided Instruction (CAI) in that it allows for student engagement. Rather than using simple scripts to describe the subject matter, ICAI systems store the data in knowledge networks, which comprise facts about the subject matter as well as rules and relationships between them. ICAI systems are reactive, modifying their behavior in response to the student’s answers, whereas traditional CAI programs are limited to the author’s scripts. These programs also include student models, which track which sections of the knowledge network the student appears to grasp, and diagnostic error rules, which seek to explain the “why” of the student’s errors.

Intelligent Enterprise Strategy (IES) Intelligent Enterprise Strategy (IES) is a management concept that focuses on using technology and new service paradigms to boost corporate performance.

Intelligent Entities An intelligent entity is one that exhibits a significant degree of intelligence. It can reason, make and carry out plans, acquire knowledge from its environment, learn, manipulate its environment, and, to some extent, interact with other entities within that environment.

Intelligent Products Intelligent products communicate with humans and with each other over a global network such as the internet. Examples in artificial intelligence settings include Amazon Alexa, autonomous vehicles, and smart energy meters.

Intelligent Sensing Intelligent sensing applies advanced signal processing techniques, data fusion techniques, intelligent algorithms, and AI concepts to improve the integration of sensors and the extraction of features. The aim is a better understanding of sensor data, leading to actionable knowledge that can be used in smart sensing applications.

Intelligent System An intelligent system is a machine with an embedded, internet-connected computer that can gather and analyze data, communicate with other systems, and adapt itself to current data.

Intelligent Tutoring Intelligent tutoring uses artificial intelligence (AI) software technologies and cognitive psychology models to encode and apply the subject matter and teaching expertise of experienced instructors. This allows the system to provide coaching and hinting, evaluate each student’s performance, assess the student’s knowledge and skills, provide instructional feedback, and select appropriate next exercises for the student.

Intellipath Intellipath is an expert system for clinical pathologists. It uses a Bayesian Network to provide diagnostic capabilities and to assist in differential diagnoses. It can also explain its decisions and recommend tests and data to confirm its diagnoses.

Intension of a Concept The intension of a concept is the set of all properties that are satisfied or shared by all members of an extension of the concept. Machine learning and knowledge discovery techniques can use the intension and extension to generalize new cases and generate new concepts.

Interlingua A term used in machine translation to describe artificial metalanguages. Direct translation among a set of languages would require a translator for every pair of languages. Three languages would require three translation pairs, four languages would require six pairs, five languages would require 10, and so on. If, on the other hand, items are translated from the source language into a single interlingua and thence into the target language, the problem only requires maintaining translators from each language into and from the interlingua.
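
The growth in the number of translators can be checked with a short calculation. The following is a minimal sketch, assuming one translator per unordered pair of languages for direct translation and one into-and-from translator per language for the interlingua approach; the function names are illustrative only.

```python
def direct_translation_pairs(n_languages):
    """Number of unordered language pairs: n choose 2."""
    return n_languages * (n_languages - 1) // 2


def interlingua_translators(n_languages):
    """One translator (into and from the interlingua) per language."""
    return n_languages


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, direct_translation_pairs(n), interlingua_translators(n))
```

The pair count grows quadratically with the number of languages, while the interlingua count grows only linearly.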

Internal Disjunction An internal disjunction is an attribute-level disjunction formed by taking disjunctions on the values of attributes. Manipulating conjunctive forms that allow internal disjunctions can cause combinatorial problems.

Internet of Things (IoT) A global infrastructure for the information society that enables advanced services by interconnecting things, based on existing and evolving interoperable information and communication technologies.

Interpolation Net An interpolation net is a two-layer neural network that is used to estimate a response based on input values. The first layer computes the Gaussian distance between each of the nodes and the input point, and the second layer combines each of the node’s values according to the Gaussian weights.
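
The two layers can be sketched in a few lines. This is an illustrative sketch only: the normalization of the weights, the single shared `width` parameter, and all names are assumptions rather than details given in the definition above.

```python
import math


def interpolation_net(x, centers, values, width=1.0):
    """Estimate a response at point x from stored nodes.

    centers: list of node locations (tuples of floats)
    values:  response value stored at each node
    width:   assumed common Gaussian width for every node
    """
    # Layer 1: Gaussian weight of each node, based on its distance to x.
    weights = []
    for c in centers:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        weights.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    # Layer 2: combine the node values according to the Gaussian weights
    # (normalized here so the estimate is a weighted average).
    total = sum(weights)
    if total == 0.0:
        raise ValueError("input is too far from every node")
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    print(interpolation_net((0.5,), [(0.0,), (1.0,)], [0.0, 10.0]))
```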

Interpretability The ability to understand the value and accuracy of system output. Interpretability refers to the extent to which cause and effect can be observed within a system, or to which the result of a change in input or algorithmic parameters can be predicted. Interpretability complements explainability.

Interpretable AI A framework that provides a rigorous approach to the interpretation of machine learning models by non-data scientists, going beyond quality metrics or statistical measurements.

Interpreter An interpreter is a computer program that reads input files and immediately translates and executes the program. Examples include LISP and BASIC interpreters, which allow you to write and evaluate code dynamically. As a result, it usually takes less time to interpret and test a program than it does to compile the complete program.

Inter-Rater Agreement A measurement of how often human raters agree when doing a task. If raters disagree, the task instructions may need to be improved. Inter-rater agreement is also sometimes called inter-annotator agreement or inter-rater reliability. Cohen’s kappa is one of the most popular inter-rater agreement measurements.
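
As an illustration, Cohen’s kappa for two raters can be computed as the observed agreement corrected for the agreement expected by chance. The sketch below assumes two raters who each label the same list of items; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    a = ["yes", "yes", "no", "no"]
    b = ["yes", "no", "no", "no"]
    print(cohens_kappa(a, b))
```

A value of 1 indicates perfect agreement, while a value near 0 indicates agreement no better than chance.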

Intersection Over Union (IOU) The intersection of two sets divided by their union. In machine-learning image-detection tasks, IOU is used to measure the accuracy of the model’s predicted bounding box with respect to the ground-truth bounding box. In this case, the IOU for the two boxes is the ratio between the overlapping area and the total area, and its value ranges from 0 (no overlap of predicted bounding box and ground-truth bounding box) to 1 (predicted bounding box and ground-truth bounding box have the exact same coordinates).
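
For axis-aligned bounding boxes the ratio can be computed directly from the corner coordinates. The following is a minimal sketch, assuming each box is given as a hypothetical `(x_min, y_min, x_max, y_max)` tuple.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(iou(predicted, ground_truth))
```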

Interval Attribute An interval-valued attribute uses numeric values for which the relative differences are meaningful and whose meaning would not be changed if the values were translated or multiplied by a positive constant. The zero in these scales is arbitrary. A familiar example is temperature in degrees Celsius. The difference between 20 and 30 is twice as large as the difference between 20 and 25, but the 30-degree temperature is not 50 percent larger than the 20-degree temperature. This level of attribute supports the same operations as ordinal and nominal attributes, as well as sums and differences. Scale- and origin-dependent operations, such as multiplication and exponentiation, do not have an intrinsic meaning for these types of attributes.

Irreflexive An irreflexive binary relationship R is one in which the relationship aRa is never true. An example is the relationship “is a parent of.”

Item Matrix In recommendation systems, a matrix of embeddings generated by matrix factorization that holds latent signals about each item. Each row of the item matrix holds the value of a single latent feature for all the items. For instance, consider a movie recommendation system. Each column in the item matrix represents a single movie. The latent signals might represent genres, or might be harder-to-interpret signals that involve complex interactions among genre, stars, movie age, or other factors. The item matrix has the same number of columns as the target matrix that is being factorized. For instance, given a movie recommendation system that evaluates 10,000 movie titles, the item matrix will have 10,000 columns.

Items In a recommendation system, the entities that a system recommends. For example, videos are the items that a video store recommends, while books are the items that a bookstore recommends.

Itemset A term used in data mining to denote a grouping of attributes. For instance, the set of all 2-itemsets in a database is the set of all pairs of attributes. In the mining of association rules, interest usually focuses on k-itemsets in which all k elements are true and for which a “large” number of cases exist. The number (or proportion) of cases in an itemset is referred to as the support for the itemset.
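
The ideas of k-itemsets and support can be illustrated with a small example. This is a sketch under assumed conventions: each case is a dictionary mapping hypothetical attribute names to true or false, and support is reported as a proportion of cases.

```python
from itertools import combinations


def all_k_itemsets(attributes, k):
    """Every grouping of k distinct attributes."""
    return list(combinations(sorted(attributes), k))


def support(cases, itemset):
    """Proportion of cases in which every attribute of the itemset is true."""
    hits = sum(all(case.get(attr, False) for attr in itemset) for case in cases)
    return hits / len(cases)


if __name__ == "__main__":
    cases = [
        {"bread": True, "milk": True, "eggs": False},
        {"bread": True, "milk": False, "eggs": True},
        {"bread": True, "milk": True, "eggs": True},
        {"bread": False, "milk": True, "eggs": False},
    ]
    for itemset in all_k_itemsets(["bread", "milk", "eggs"], 2):
        print(itemset, support(cases, itemset))
```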

Iteration A single update of a model’s weights during training. An iteration consists of computing the gradients of the loss with respect to the parameters on a single batch of data.

J

Java Sun Microsystems developed this object-oriented computer language in the 1990s to support programming for small devices. It has since become very popular on the World Wide Web and is making inroads into general programming. Java is syntactically similar to C and C++ but has many semantic similarities to languages such as MODULA.

Java Bayes Java Bayes is the first full implementation of a Bayesian Network in the Java programming language. In this system, a user can assign values to some of the variables in the network. The system will use these values to infer the optimal levels of the remaining values. It can also derive such quantities as marginal distributions for any unknown values, average values for univariate functions, and maximum a posteriori configurations and univariate functions. The system is freely available and runs on multiple platforms.

Jeremiah Jeremiah is a commercial rule-based/fuzzy logic system that provides dentists with orthodontic treatment plans for cases suitable for treatment by general dental practitioners with a knowledge of removable orthodontic techniques.

Jess Jess is a version of the CLIPS expert system that is written entirely in Java. It was developed by Ernest Friedman-Hill at Sandia Laboratories and is available through the World Wide Web (WWW). The system allows you to build Java applications and applets that have the ability to reason.

Jitter One method of avoiding over-fitting is to jitter the training data, that is, to deliberately add noise to it. This is a sampling-based form of smoothing and works well when the model is trying to estimate a smooth relationship. The technique will, however, obscure any discontinuities in the underlying relationship. It is closely related to kernel techniques, where the data points are replaced by multivariate distributions, and to ridge regression, which adds “prior” information to the regression function. Choosing the size of the jitter is equivalent to the statistical problem of choosing a kernel bandwidth or the size of the prior information in ridge regression.
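
Jittering itself takes only a few lines. The sketch below assumes Gaussian noise with a hypothetical `scale` parameter that plays the role of the jitter size; the choice of a Gaussian and the fixed seed are illustrative assumptions.

```python
import random


def jitter(rows, scale, seed=0):
    """Return a copy of the training rows with Gaussian noise added.

    rows:  list of feature lists
    scale: standard deviation of the added noise (the "size" of the jitter)
    seed:  fixed seed so the example is repeatable
    """
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, scale) for value in row] for row in rows]


if __name__ == "__main__":
    training = [[1.0, 2.0], [3.0, 4.0]]
    print(jitter(training, scale=0.1))
```

In practice a fresh jittered copy would typically be drawn for each pass over the training data, so that the model never sees exactly the same points twice.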

J-measure The J-measure is a scoring rule for association rules. In evaluating a rule of the form “if X occurs, then Y will occur,” with a given frequency (support) and confidence, the J-measure provides a single measure that trades off rule frequency and rule confidence. The J-measure is the product of the probability of X, the IF part of the rule, with the cross-entropy of the rule.
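
A commonly cited form of this product, for a binary outcome Y, multiplies p(X) by the cross-entropy between p(Y|X) and p(Y). The sketch below assumes that form, base-2 logarithms, and a prior p(Y) strictly between 0 and 1; the parameter names are illustrative.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """J-measure of the rule "if X then Y" for a binary outcome Y.

    p_x:         probability of the IF part (rule frequency)
    p_y:         prior probability of Y, assumed strictly between 0 and 1
    p_y_given_x: probability of Y when X holds (rule confidence)
    """
    def term(p, q):
        # By convention 0 * log(0 / q) is taken to be 0.
        return 0.0 if p == 0 else p * math.log2(p / q)

    cross_entropy = term(p_y_given_x, p_y) + term(1.0 - p_y_given_x, 1.0 - p_y)
    return p_x * cross_entropy


if __name__ == "__main__":
    print(j_measure(p_x=0.5, p_y=0.5, p_y_given_x=1.0))
```

A rule whose confidence equals the prior probability of Y scores zero, since knowing X then tells us nothing new about Y.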

Junction Graph A junction graph is formed from the cliques of a graphical model. Each clique becomes a node in this graph, and any nodes that share a common variable are connected through an intersection node labeled with the name of that common variable. The junction graph is usually formed from the moral graph of the graphical model.

Junction Tree A junction tree is a spanning Markov tree formed from a junction graph. Any attribute shared by two nodes also appears on the path between those two nodes. A well-constructed junction tree lowers the computational costs on a graphical model. Junction trees are used to derive conditional distributions from belief nets after evidence (data) on some of the variables has been obtained.

K

KAISSA A Russian chess-playing program that won the 1974 world chess computer championship.

Kalman Filter A Kalman filter is an adaptive linear filter or controller. When the measurements and error terms meet certain distributional criteria, the filter will quickly converge to optimal behavior.
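
The adaptive behavior can be seen in a one-dimensional example. The following is a minimal sketch, assuming a scalar state that is roughly constant over time and known noise variances; the names `process_var` and `meas_var` and the starting values are illustrative assumptions.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant hidden value.

    process_var: assumed variance of the drift in the hidden value
    meas_var:    assumed variance of the measurement noise
    x0, p0:      initial estimate and its variance
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    readings = [5.0] * 50
    print(kalman_1d(readings, process_var=1e-4, meas_var=1.0)[-1])
```

The gain shrinks as the estimate becomes more certain, so early measurements move the estimate a great deal and later ones only slightly.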

KAPPA KAPPA is a rule-based, object-oriented expert system tool and application developer. It is written in C and is available for PCs. See also AI languages and tools.

Karnaugh Map A Karnaugh map is a method for displaying multi-attribute binary relations in a single cross-classification. The attributes are combined into two groups and, within each group, are stacked to create two compound attributes.

K-D Tree The k-d tree provides a fast method for finding the nearest neighbors of a case in a high-dimensional space. The entire space of cases or examples is stored as a decision tree. The non-terminal nodes represent decisions, or tests, that are used to narrow the set of candidate points recursively. The final node points to a set of possible matches or neighbors, which can be examined individually to determine the best match. This method allows procedures to identify neighbors in logarithmic rather than linear time.
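
A small version of the structure can be sketched as follows. This is an illustrative sketch, not a tuned implementation: it assumes points are tuples of equal length, stores one point per node rather than a set of candidates in each leaf, and uses dictionary-based nodes chosen only for brevity.

```python
def build_kdtree(points, depth=0):
    """Recursively split the points on one coordinate per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    # Search the side of the split containing the target first.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the other side only if the splitting plane is close enough
    # that it could still hide a closer point.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


if __name__ == "__main__":
    pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    tree = build_kdtree(pts)
    print(nearest(tree, (9, 2)))
```

The pruning test in `nearest` is what lets the search skip most of the tree when the points are well spread out.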

Kernel Function Many local learning algorithms use a weighted distance function to control the influence that distant cases have on a prediction. The weighting function is referred to as a kernel function. Kernel functions are typically positive functions that integrate to one and have compact support. Examples include a radial basis function, used in some forms of neural nets, and a simple windowing function, which is defined to be one over a small range and zero otherwise.

Kernel Regression A technique that bases predictions, or classifications, on the values of the other cases that are within a given distance of the target case or observation. The influence of the other cases is weighted by their distance from the target.
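
A one-dimensional version can be sketched as a weighted average. The sketch below assumes a triangular weighting function that falls to zero at a hypothetical `radius`; other kernels could be substituted, and the names are illustrative.

```python
def kernel_regression(xs, ys, x0, radius):
    """Predict the response at x0 from cases within `radius` of it.

    Weights fall linearly from 1 at x0 to 0 at distance `radius`
    (a triangular kernel, chosen here only for simplicity).
    """
    weights = [max(0.0, 1.0 - abs(x - x0) / radius) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no cases within the given distance of the target")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 1.0, 2.0, 3.0]
    print(kernel_regression(xs, ys, x0=1.5, radius=1.0))
```

The radius plays the role of the kernel bandwidth: a larger radius gives a smoother but less local prediction.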

Kinematics Kinematics is the study of spatial relationships within a configuration of mechanically interconnected and constrained rigid bodies.

Kismet Kismet is part of the COG project, which is developing a robot designed for social interactions with humans. The project focuses on teaching robots to engage in meaningful social exchanges with humans. The researchers have chosen to model a caretaker-infant dyad in which the human acts as the caretaker. Kismet is capable of a wide range of facial expressions.

KL-ONE KL-ONE is a frame-based language for knowledge representation, developed by R. Brachman at BBN in 1978. It extended the ideas of Frame Representation Language (FRL) and Knowledge Representation Language (KRL) by including the capability to have constraints involving more than one slot. For instance, the value in a person’s birthday slot must be less than the value in his/her high-school graduation day slot.

KnowEXEC KnowEXEC is an extension of CLIPS. It is designed to be used as a helper extension in the Netscape Navigator browser and allows users to download and execute CLIPS knowledge bases.

Knowledge and Data Discovery Management Systems (KDDMS) The term Knowledge and Data Discovery Management Systems (KDDMS) refers to proposed “second generation” Knowledge Discovery in Databases (KDD) systems, which would include extended database support and KDD query languages.

Knowledge Base The collection of facts available to a program. Just as a database can be thought of as an organized collection of data, a knowledge base can be thought of as an organized collection of “facts” or statements of relationships between objects. The particular organization of the knowledge is dictated by the knowledge representation method chosen by the program designer.

Knowledge Compilation The term knowledge compilation has been used for the process of representing a propositional formula by its optimal upper and lower Horn approximations. These approximations are also known as the Horn envelope and Horn core of the formula.

Knowledge Discovery in Databases (KDD) Knowledge discovery in databases is the general process of discovering knowledge in data. The process has nine steps. The first is the development of an understanding of the application domain and the goal of the process. The second step involves the creation of a data set (the “data mine”), which leads to the third and fourth steps of cleaning, preprocessing, reduction, and projection. The fifth step is choosing the Data Mining method to match the goals chosen in the first step. The sixth and seventh steps are the exploratory analysis and model/hypothesis selection, followed by the actual mining process. The final two steps involve the interpretation of the patterns and acting on the derived knowledge. In summary, KDD is an attempt to automate the entire art and practice of data analysis and database inference.

Knowledge Discovery in Text (KDT) This subdiscipline of Knowledge Discovery in Databases (KDD) adapts and extends the techniques of KDD, which are primarily oriented toward data that can be represented numerically or structured into tables and databases, into the area of textual collections, which lack the structure and numeric content of databases. One example is a tool to analyze financial news, possibly applied to a portfolio advisor.

Knowledge Engineering Knowledge engineering is the process of collecting knowledge from human experts in a form suitable for designing and implementing an expert system. The person conducting knowledge engineering is called a knowledge engineer.

Knowledge Engineering This term refers to the techniques and tools used to build expert systems. A knowledge engineer is one who carries out the process of interviewing domain experts or reviewing case histories and representing the extracted knowledge in an expert system or rule base.

Knowledge Interchange Format (KIF) The Knowledge Interchange Format is a proposed standard for the specification and interchange of ontologies. It uses a first-order calculus and allows the definition of objects, functions, and relations, as well as the representation of meta-knowledge. The use of this format would allow ontologies to be exchanged between knowledge systems and would allow systems to import “foreign” ontologies. The developers of this format have also released a number of ontologies as well as an ontology editor.

Knowledge Management Knowledge management (KM) is the process of capturing, developing, sharing, and effectively using organizational knowledge. It refers to a multi-disciplined approach to achieving organizational objectives by making the best use of knowledge. It includes courses taught in various fields. More recently, other fields have started contributing to KM research; these include information and media, computer science, public health, and public policy. Knowledge management typically focuses on organizational objectives such as improved performance, competitive advantage, innovation, the sharing of lessons learned, integration, and continuous improvement of the organization. KM efforts overlap with organizational learning and may be distinguished from it by a greater focus on the management of knowledge as a strategic asset and on encouraging the sharing of knowledge. KM has been seen as an enabler of organizational learning and a more concrete mechanism than the previous abstract research.

Knowledge Query and Manipulation Language (KQML) The Knowledge Query and Manipulation Language (KQML) is both a language and a protocol for exchanging information and knowledge between software agents and knowledge bases. It is extensible and specifies the operations that an agent or knowledge base will allow to be performed by other agents or another knowledge base. It also specifies the behavior of communication facilitators, a special class of software agents. These software facilitators are designed to coordinate the behavior of other agents.

Knowledge Representation Knowledge representation is one of the two basic techniques of artificial intelligence; the other is the capability to search for end points from a starting point. The way in which knowledge is represented has a powerful effect on the prospects for a computer or a person to draw conclusions or make inferences from that knowledge. Consider, for example, which is easier: adding 10 + 50 in Arabic numerals, or adding X plus L in Roman numerals. Consider also the use of algebraic symbols in solving problems for unknown numerical quantities, compared with trying to do the same problems with just words and numbers.

Knowledge Representation Language (KRL) A frame-based language for knowledge representation developed by Bobrow and Winograd in 1977. It includes the frames of the Frame Representation Language (FRL) and adds a pattern-based inference method as well as a number of primitives for forming instances of classes.

Knowledge Representation The Knowledge Representation of a program is the means by which the program is supplied with the information from which it reasons. The collection of information available to a program is often referred to as a knowledge base. Several issues must be addressed when building a knowledge base. These include the notation used to represent the knowledge, the operations allowed in this notation, a method of assigning meaning to the facts in the Knowledge Base, the choice of a declarative or procedural representation, and finally a method for organizing the knowledge (e.g., frames). The Knowledge Representation must also match the inference scheme that will be used in the program. Schemes for representing knowledge can generally be divided into two types: propositional and analog. In an analog representation of a situation, every element appears once, along with all of its relationships to the other elements. These representations can be manipulated by procedures that are often similar to physical procedures, whereas a propositional representation is manipulated by using general rules of inference. Analog representations are often in a form that can be continuously modified, as in the location of a vase on a table, unlike propositional representations, which tend to be discrete. In an analog representation, the structure of the representation reflects the structural relationships among the objects being represented, rather than those relationships being represented by truth values attached to a proposition.

Knowledge Source Combination A technique used in speech recognition and other areas in which multiple sources of information (“knowledge”) are combined to provide the best interpretation of the input. For instance, several alternate probabilistic methods might be applied to the same input and their scores combined to determine the “most likely” meaning of the input.

Knowledge The knowledge of a model is the representation of the values, parameters, and rules that have been learned from the data.

Knowledge-Based Planning Knowledge-based planning represents the planner’s incomplete knowledge state and the domain actions. Actions are modeled in terms of how they modify the knowledge state of the planner rather than in terms of how they modify the physical world. This approach scales better and supports features that make it applicable to much richer domains and problems. Knowledge-rich approaches, such as hierarchical task network planning, have the advantages of scalability, expressiveness, continuous plan modification during execution, and the ability to interact with humans. However, these planners also have limitations, such as requiring complete domain models and failing to model uncertainty, that often make them inadequate for real-world problems.

Knowledge-Based Representations The form or structure of databases and knowledge bases for expert and other intelligent systems, chosen so that the information and solutions provided by a system are both accurate and complete. Typically, it involves a logic-based language capable of both syntactic and semantic representation of time, events, actions, processes, and entities. Knowledge representation languages include Lisp, Prolog, Smalltalk, OPS5, and KL-ONE. Structures include rules, scripts, frames, endorsements, and semantic networks.

Knowledge-Based Systems Usually a synonym for expert system, though some think of expert systems as knowledge-based systems that are designed to work on practical, real-world problems.

Knowledge Seeker Knowledge Seeker is an early commercial classification tree program. Unlike Classification and Regression Trees, it offered multi-way splits. It also used a different set of splitting criteria.

Kohonen Network A Kohonen network is a form of unsupervised learning in neural networks and is similar to a k-means cluster analysis. Each of the hidden units acts as a cluster center, and the error measure is the sum of squared differences between the input data and the nearest cluster center. The clusters compete with each other for the right to respond to an input pattern.

Kullback-Leibler Information Measure The Kullback-Leibler information measure provides a single-number summary for comparing two distributions or models. The distance between a true distribution (or model) and some other distribution (or model) is defined to be the average difference between the log density of the true and the other distribution, averaging over the true distribution. It can be derived as an information-theoretic criterion and is often used for comparing various models for data. Because the Kullback-Leibler distance requires knowledge of the true distribution, related measures, such as the Akaike Information Criterion (AIC), are often used instead.
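
For discrete distributions the definition reduces to a short sum. The following is a minimal sketch, assuming both distributions are given as lists of probabilities over the same outcomes and that the comparison distribution is nonzero wherever the true distribution is nonzero.

```python
import math


def kl_divergence(true_dist, other_dist):
    """Average log-density difference, averaged over the true distribution.

    Both arguments are lists of probabilities over the same outcomes.
    Terms where the true probability is zero contribute nothing.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(true_dist, other_dist)
        if p > 0
    )


if __name__ == "__main__":
    true_dist = [0.5, 0.5]
    model = [0.25, 0.75]
    print(kl_divergence(true_dist, model))
    print(kl_divergence(model, true_dist))  # not symmetric in general
```

The second call illustrates that the measure is not symmetric, which is why it is a "distance" only in a loose sense.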

L

Lamarckian Evolution A form of evolution in which children are allowed to inherit acquired characteristics as well as, or instead of, simple genetic characteristics.

Lambda Calculus The mathematics of functions that can take other functions as arguments and return other functions as results. It forms the foundation for procedures in LISP and later computer languages.

Language Model In speech recognition, the language model is used to compute or assign the probability of a word sequence W from an input acoustic sequence. It is typically probabilistic and depends on previously derived grammars, as well as the application domain and the speaker.

Last-In First-Out Automated reasoning systems must maintain a set of retained clauses for further evaluation. A last-in first-out strategy chooses the most recently retained clause to drive the reasoning, as opposed to choosing the first retained clause.

Latent Variable An unobserved or unobservable attribute in a system that varies over units and whose variation influences recorded attributes. This can affect the results of Data Mining or other efforts based on databases that were collected for other unrelated purposes, or on data collections that are formed to capture “natural” data sources (e.g., credit card transactions). If the data were not collected with a specific purpose in mind and are missing some pertinent variables, then these variables would be latent.

Lattice A lattice is a partially ordered set that contains a least upper bound and a greatest lower bound for every pair of elements in the set. One implication is that conjunction and disjunction are well defined on any finite subset of the lattice, as the subset’s greatest lower bound and least upper bound, respectively. If every subset has both a greatest lower bound and a least upper bound (which is always true for a finite lattice), then the lattice is said to be complete. A complete lattice always has a unit and a zero.

Lauritzen-Spiegelhalter Architecture A method for propagating data through a join tree representation of a belief net. It is somewhat less general than the Shafer-Shenoy architecture, in that it requires that continuers exist for the tree.

Learning Rate In neural networks and other machine learning methods, the learning rate parameter determines how fast the model adapts itself to a new case. A fast learning rate leads to quick adaptation but can also lead to instability, while a slow learning rate can cause the algorithm to ignore new cases. These parameters are often set by hand but can also be determined by techniques such as cross-validation.

Learning Rate parameter Many learning algorithms include a parameter, known as a learning rate parameter, that is usually between 0 and 1 and controls how quickly the algorithm can change or update its internal parameters in response to new data, or reduce its errors during learning. If the parameter is too large, the model will repeatedly overreact to new data or errors and fail to converge; if it is too small, it will react slowly and take a long time to converge. In gradient learning algorithms, the learning rate is often called the step size, because it controls the size of the step from one parameter vector to the next at each step (iteration) of the learning (estimation) process. It can be set by hand or algorithmically.
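
The effect of the step size can be illustrated with a minimal sketch. The objective f(w) = 2w^2, the function name gradient_descent and the three rates below are assumptions chosen for illustration, not taken from the text.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise the illustrative objective f(w) = 2 * w**2 (gradient 4 * w)."""
    w = start
    for _ in range(steps):
        # The learning rate scales the size of each step along the gradient.
        w -= learning_rate * 4 * w
    return w

# A tiny rate converges slowly, a moderate rate converges quickly,
# and a rate that is too large overshoots further on every step.
for rate in (0.005, 0.2, 0.6):
    print(rate, gradient_descent(rate))
```

With the largest rate each update overshoots the minimum at w = 0 by more than the previous error, so the iterates grow instead of shrinking.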

Learning Vector Quantization Networks A supervised form of a Kohonen network. Each class has one or more codebook vectors assigned to it, and a case is classified by assigning it to the nearest cluster, as in a nearest neighbor algorithm.

Least General Generalization (LGG) Least General Generalization (LGG) is a cautious generalization technique used in Inductive Logic Programming (ILP). It assumes that if two clauses are true, then their most specific generalization is also likely to be true. The LGG of two clauses is computed by computing the LGG of the literal terms in their heads and bodies, substituting a variable for the parts that do not match.
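
The term-level step can be sketched as follows. This is a simplified illustration for a single pair of terms, not a full clause-level ILP procedure; the tuple encoding of terms and the variable names X0, X1, ... are assumptions made for the example.

```python
def lgg(s, t, table=None):
    """Least general generalization (anti-unification) of two terms.

    A compound term is a tuple (functor, arg1, ...); anything else is a constant.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        # Same functor and arity: generalize argument by argument.
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Mismatch: substitute a variable, reusing it for the same pair of subterms.
    if (s, t) not in table:
        table[(s, t)] = "X%d" % len(table)
    return table[(s, t)]

print(lgg(("parent", "ann", "bob"), ("parent", "ann", "carl")))
```

Reusing one variable for a repeated mismatching pair is what keeps the result as specific as possible: p(a, a) and p(b, b) generalize to p(X0, X0) rather than p(X0, X1).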

Lift Lift is a measure, adapted from direct marketing techniques, of the gain in classification or prediction due to a classifier or other model. If the baseline success rate is f(Y), and f(Y|X) is the success rate given the classifier, then the lift is measured as f(Y|X)/f(Y). When the model produces a continuous response, an alternative representation, known as the cumulative lift, measures the lift. This is often plotted as the cumulative success versus cumulative categories of the predictor, ordered from best to worst. The figure depicts a predictor (solid line) that outperforms a "non-informative" predictor (dotted 45° line).
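
The ratio f(Y|X)/f(Y) can be computed directly from labeled cases. The data and the function name lift below are hypothetical, chosen only to illustrate the calculation.

```python
def lift(outcomes, selected):
    """Lift = success rate among selected cases / overall success rate."""
    base_rate = sum(outcomes) / len(outcomes)            # f(Y)
    chosen = [y for y, s in zip(outcomes, selected) if s]
    selected_rate = sum(chosen) / len(chosen)            # f(Y|X)
    return selected_rate / base_rate

# Hypothetical data: 1 = success; the model selects the first four cases.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
selected = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(lift(outcomes, selected))
```

A lift above 1 means the selected cases succeed more often than the population as a whole.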

Likelihood In probabilistic and statistical techniques, the likelihood is a measure of the evidence for the data when a hypothesis is assumed to be true. It is commonly used in Bayesian and quasi-Bayesian methods. A related measure, minimum message length, can be derived from information-theoretic principles. As an example, suppose one observes a data value X that is assumed to be Gaussian. The likelihood for that data when the assumed mean is, say, 5 and the variance is 10 is proportional to exp(-(X-5)^2/(2*10)), the kernel of a Gaussian distribution. A likelihood is typically calculated using the kernel rather than the complete distribution function, which includes the normalizing constants.
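
The following sketch suggests why the kernel is usually sufficient: when two hypotheses share the same variance, the normalizing constant cancels in any comparison. The observed value 7.0 and the alternative mean 8 are assumptions for illustration.

```python
import math

def gaussian_kernel(x, mean, variance):
    """Unnormalised Gaussian likelihood: exp(-(x - mean)^2 / (2 * variance))."""
    return math.exp(-((x - mean) ** 2) / (2 * variance))

def gaussian_density(x, mean, variance):
    """Full Gaussian density, including the normalising constant."""
    return gaussian_kernel(x, mean, variance) / math.sqrt(2 * math.pi * variance)

x = 7.0  # hypothetical observation
kernel_ratio = gaussian_kernel(x, 5, 10) / gaussian_kernel(x, 8, 10)
density_ratio = gaussian_density(x, 5, 10) / gaussian_density(x, 8, 10)
print(kernel_ratio, density_ratio)  # the two ratios agree
```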

Likelihood Ratio When evaluating the likelihood of the evidence while comparing two disjoint hypotheses, HA and HB, the likelihood ratio is the ratio of P(e|HA) to P(e|HB). It is sometimes called a Bayes factor for HA versus HB. The posterior odds for any prior can be calculated by multiplying this ratio by the prior odds ratio.
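
As a small worked sketch of the last sentence, with hypothetical probabilities (the numbers and the function name posterior_odds are assumptions):

```python
def posterior_odds(p_e_given_ha, p_e_given_hb, prior_odds):
    """Posterior odds of HA versus HB = likelihood ratio * prior odds."""
    likelihood_ratio = p_e_given_ha / p_e_given_hb
    return likelihood_ratio * prior_odds

# Hypothetical values: the evidence is four times as likely under HA,
# but HA starts out four times less likely than HB.
odds = posterior_odds(0.8, 0.2, prior_odds=0.25)
# Converting odds to a probability assumes HA and HB are the only hypotheses.
probability_ha = odds / (1 + odds)
print(odds, probability_ha)
```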

Linear Discriminant Analysis (LDA) Linear Discriminant Analysis (LDA) is a form of supervised learning, originally developed as a statistical tool, for finding an optimal linear rule to separate two classes (or concepts). The algorithm finds the line in a hyperspace that connects the centroids of the positive and negative examples and chooses a breakpoint along that line that best separates the classes, weighting the coefficients to account for any linear correlations between the attributes. When the concept is linearly separable, LDA can completely separate the classes. The "linear" in this term refers to the linearity of the function that combines the attributes, not to the attributes themselves. The attribute list can include powers and products of other attributes, as well as other functions such as spline functions of the "real" attributes. The rule is of the form a + b.x + c.y + d.z + ..., i.e., it is linear in the given attributes. The value of a, often known as the bias or the intercept, is often chosen so that a positive score is taken to mean membership in one class and a negative score membership in the other. It can be extended to learn multiple classes through a method called Multiple Discriminant Analysis (MDA), which estimates k-1 linear functions to separate k classes on the basis of the attributes.

Linear Layer Neural networks are typically arranged as layers of nodes. Each node (except an input node) usually collects its inputs and performs a transformation using an activation function. Activation functions are typically nonlinear (logistic functions or radial basis functions), but some networks, such as nonlinear principal components networks or elliptical basis function networks, include a layer of nodes that simply fit a linear model to their inputs. When the number of nodes equals the number of inputs, this is equivalent to performing an oblique rotation on the input data; when the number of nodes in this layer is smaller than the number of inputs, the layer is reducing dimensionality.

Linear Model A linear model is an analytic model that is, in some sense, linear in the parameters of the model (rather than in the attributes that are used to form the input to the model). A classic example is linear regression, where the output is modeled as a weighted sum of the input or predictor variables. The weights are the parameters, and if they enter linearly, the model is a linear model. Models can also be considered linear if there is some invertible one-to-one transformation of the predicted variable that produces a linear model. A classic example is the logistic model for binary data. The logistic model is a linear model for log(p/(1-p)), where p is the proportion or probability being estimated. An example of a non-linear model would be a linear regression modified to include an unknown power transformation on one of the input variables.

Linear Regression A special type of linear model in which the output (dependent) variable is a simple weighted sum of the input (independent) variables. When the weights are functionally independent, the model is linear in the parameters. Classical linear regression also requires the assumption of a common error term over the domain of the predictors, as well as assumptions of independence and normality of the errors.
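
For the simplest case of one input variable, the weights can be obtained in closed form by ordinary least squares. The sketch below uses made-up data and a hypothetical helper name, fit_line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```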

Linearly Separable A concept or class is said to be linearly separable in a space of (binary) attributes if the examples that are members of the class can be separated from the negative examples by a hyperplane in the attribute space. For example, in one dimension (along a line), a concept would be linearly separable if all the positive examples lie on one side of a point on the line and all the negative examples lie on the other side. Both perceptrons and Linear Discriminant Analysis can learn concepts that are linearly separable.

Link Analysis Link analysis is a technique for exploring associations among large numbers of objects of various types. It is a relatively new area and is used in police investigations, fraud detection, epidemiology, and similar domains. Unlike many graph-based techniques, which induce a graph from a multivariate structure, link analysis begins with data that can be represented in the form of links.

Linked Inference Rules Linked inference rules relax the syntactic constraints of ordinary inference rules by allowing link clauses to serve as bridges between the clauses that begin inferences and the clauses that complete them. UR-resolution, hyperresolution, and paramodulation can all be extended by allowing link clauses to connect the nucleus with the satellite clauses.

LISP LISP (short for list processing language) is a computer language invented in 1958 by John McCarthy, one of the pioneers of artificial intelligence, and is regarded as the most popular language for AI work. It was designed from the beginning to be a symbolic processing language, which makes it admirably suited to AI tasks. The language is ideal for representing knowledge (for instance, if a fire alarm is ringing, then there is a fire) from which conclusions are to be derived. It has been redesigned and extended a number of times since its original design. Related or descendant languages include MacLisp, Common LISP, and Scheme.

LISP Machines During the late 1970s and 1980s, substantial gains in LISP processing speed were obtained by building computers that were specialized for the LISP language. Eventually, general-purpose computers became fast and cheap enough to take over the market for these specialized machines.

Literate Programming The practice of programming and writing documentation at the same time, in such a way that the result is designed to be read by humans while also producing "real" computer code. The term was coined by D. Knuth during his development of the TeX system. Because the documentation is interwoven with the source code or pseudo-code, systems for literate programming are often called WEB systems, following Knuth's nomenclature. This use of the term WEB considerably predates the World Wide Web (WWW). It is a technique that is well suited to teaching.

Local A local method is one that uses information only from cases that are in some sense "close" to the target area. For example, classical linear regression is global because all observations contribute to the linear functions that are used to estimate the intercept and slope parameters. LOESS regression is local because only the observations that are close to the target point are used in constructing the estimate at the target point.

Local Operators A term used in feature analysis for functions that operate in a limited neighborhood around a given item (for example, a pixel in a picture).

Local Reasoning Monotone logic has the property that if A => B, then A&C => B as well. This gives the logic the attractive property of supporting local reasoning, in which conclusions learned from part of the information hold for the whole set of information.

Local Regression Classical regression approaches assume that a particular structural relationship holds over the entire range of predictors. Local regression techniques assume that (usually) simple structures hold around each case in the data space, but that the "interestingness" of data points decreases the further one goes from that region. Local regressions can typically reproduce a classical regression approach when the global regularity holds and outperform it when the regularity does not hold. Forms of local regression include smoothing splines, neural networks, and regression and classification trees.

Locally Optimal Searches (Solutions) A locally optimal search algorithm uses a procedure that can find the best one-step or two-step result but is not guaranteed to find the globally optimal solution. Often, improved results can be found by iterating the search or by adding random perturbations to local solutions.

Loebner Prize A prize awarded annually for the computer program that best emulates natural human behavior. The winner is determined at an annual contest in which judges sitting at computer terminals attempt to determine whether the hidden respondent is a machine or a human.

LOESS A local regression technique developed in statistics and used in Data Mining. A classical regression analysis uses all of the data to fit the regression line, so all of the data influence the prediction at any given point. LOESS performs a local regression, taking into account only those points that are near the target point when making a prediction. "Nearness" is controlled by a window width chosen by the analyst.
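
The idea can be sketched as a weighted straight-line fit around a single target point. This is a simplified illustration rather than a full LOESS implementation: the fixed window width, the tricube weighting and the function name local_linear_estimate are assumptions, and the window is assumed to contain at least two distinct x values.

```python
def local_linear_estimate(xs, ys, target, width):
    """Fit a weighted line using only points within `width` of `target`."""
    weights = []
    for x in xs:
        d = abs(x - target) / width
        # Tricube window: points outside the window get zero weight.
        weights.append((1 - d ** 3) ** 3 if d < 1 else 0.0)
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (target - mean_x)

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]      # made-up data on y = 2x + 1
outlier = [100] + ys[1:]      # corrupt a point far from the target
print(local_linear_estimate(xs, ys, 2.5, 2.0))
print(local_linear_estimate(xs, outlier, 2.5, 2.0))
```

The corrupted point at x = 0 lies outside the window around 2.5, so it has no effect on the local estimate, whereas it would shift a global regression line.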

Logic Databases Also known as Declarative Logic Programming, these databases represent knowledge as logical relations and use deduction to solve problems. Logic databases are purely declarative, having no procedural elements as in Logic Programming.

Logic Means or techniques for reasoning from a "known" or given set of facts and assumptions to other facts and conclusions. There are multiple systems of logic, including inductive inference, nonmonotone logic, predicate logic, and multiple deductive logics.

Logic Programming Classical computer science deals with "how-to" knowledge, in contrast to mathematics, which deals with declarative ("what is") knowledge. Logic Programming, which grew out of research in automated theorem proving, attempts to develop mathematical relations that hold for multiple values, rather than a specific value. A logic program manipulates symbols and relations to deduce or infer new relations among the symbols.

Logic, Modal A logic of necessity and possibility developed by C. I. Lewis (1883-1964). If a proposition is not necessarily false, then it is possible. A possible proposition may also be true.

Logical Belief Function A belief function that assigns all of its mass directly to one subset of the frame is a logical belief function. Such a function behaves like an ordinary (Boolean) logic function.

Logistic Autoregressive Network An autoregressive network in which the probability of the current node is a linear logistic function of the ancestral nodes.

Logistic Function Usually refers to the cumulative logistic function, y = 1/(1+exp(-bx)). This function is commonly used as an output or signal function for neural network nodes and as a link function in statistical generalized linear models, both for its simplicity and for theoretical reasons.
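
A short sketch of the function and its inverse, the log-odds log(p/(1-p)) that appears in the logistic model. The function names and the default b = 1 are assumptions for illustration.

```python
import math

def logistic(x, b=1.0):
    """Cumulative logistic function y = 1 / (1 + exp(-b * x))."""
    return 1.0 / (1.0 + math.exp(-b * x))

def logit(p):
    """Inverse of the logistic function (for b = 1): the log-odds of p."""
    return math.log(p / (1.0 - p))

print(logistic(0.0))          # midpoint of the curve
print(logit(logistic(1.3)))   # recovers the input, up to rounding
```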

Logistic Regression A special form of regression that can be used to fit regression functions for binary and ordinal variables. It is a generalized linear model that uses a logistic function to relate the expected value of the response to the linear model. Among the reasons it is preferred for variables of this type is that it correctly handles the bounded responses and the dependence between the mean and the error.

Look-ahead Look-ahead techniques can be used to improve the results of global searches or of a greedy algorithm. At each stage in a search, the program considers the results for several steps ahead of the current stage and chooses the step that gives the best result several steps ahead. At the next stage, the program repeats the search, possibly reusing results saved from the previous look-ahead. The number of steps that the program looks ahead is often called the horizon of the program. Large horizon values can lead to a combinatorial explosion and prevent the program from reaching a solution in a reasonable amount of time.
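
A minimal sketch of the effect of the horizon, using a tiny hypothetical reward tree. The tree encoding (move mapped to a reward and a subtree) and the function names are assumptions made for the example.

```python
def best_total(tree, horizon):
    """Best total reward reachable within `horizon` further steps."""
    if horizon == 0 or not tree:
        return 0
    return max(reward + best_total(subtree, horizon - 1)
               for reward, subtree in tree.values())

def choose_move(tree, horizon):
    """Pick the move whose best outcome `horizon` steps ahead is largest."""
    return max(tree, key=lambda move:
               tree[move][0] + best_total(tree[move][1], horizon - 1))

# Hypothetical search tree: move -> (immediate reward, subtree).
tree = {
    "a": (5, {"c": (0, {})}),
    "b": (1, {"d": (10, {})}),
}
print(choose_move(tree, horizon=1))  # greedy choice
print(choose_move(tree, horizon=2))  # choice with one extra step of look-ahead
```

The recursion visits every path of length up to the horizon, which is where the combinatorial explosion for large horizons comes from.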

Lower Approximation in Rough Set Theory The lower approximation of a concept (class) X is the largest definable set contained in X. For instance, in a medical database on heart attacks, the lower approximation of the heart attack concept would be the largest definable set of attributes among those cases with heart attacks. A measure of uncertainty for the lower approximation is the ratio of the number of records in the lower approximation to the total number of records in the dataset. It is a relative frequency measure, as well as a Dempster-Shafer belief function.

Lower Envelope In quasi-Bayesian representations of probability, the lower envelope of a probability is the minimum probability for a proposition over the (convex) set of probability distributions attached to those propositions. It is also directly related to the upper envelope of the probability of its complement: lower(P(X)) = 1 - upper(P(Xc)), where X is the proposition, Xc is its complement, and lower(P) and upper(P) denote the lower and upper probability operators.
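
The relation between the two envelopes can be checked on a small example. The three distributions below are hypothetical; a finite list is used in place of a convex set, which gives the same envelopes because the minimum and maximum of a probability over a convex hull are attained at its extreme points.

```python
distributions = [
    {"rain": 0.2, "dry": 0.8},
    {"rain": 0.35, "dry": 0.65},
    {"rain": 0.5, "dry": 0.5},
]

def probability(event, dist):
    """Probability of an event (a set of outcomes) under one distribution."""
    return sum(dist[outcome] for outcome in event)

def lower(event, dists):
    return min(probability(event, d) for d in dists)

def upper(event, dists):
    return max(probability(event, d) for d in dists)

X = {"rain"}
Xc = {"dry"}  # complement of X
print(lower(X, distributions), 1 - upper(Xc, distributions))
```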

Lower Expectation The lower expectation of an action or a decision is the minimum expectation (or average) of that action over all of the probability distributions that are feasible for that action. It is also called a lower prevision for the action.

Lower Prevision Like the lower expectation, the lower prevision is the minimum expectation of an action over a set of probability distributions. The term "prevision" is used in this context to emphasize the subjective nature of the probabilities involved.

Lyapunov Function A function of a dynamic system that allows one to determine the stability and equilibrium points of the system.

M

MACE MACE is an automated reasoning system that searches for small finite models of statements. It uses the Otter language to represent problems and performs exhaustive searches.

Machine intelligence An umbrella term that covers machine learning, deep learning, and classical learning algorithms.

Machine Language The binary instructions that a computer executes. Machine language is specific to a particular type of computer and is (essentially) unintelligible to people. It specifies simple operations in a form that can be executed immediately by a computer.

Machine Learning The ability of computers to automatically acquire new knowledge is called machine learning. The knowledge may come, for instance, from cases or past experience, from the computer's own experiences, or from exploration. Machine learning has many uses, such as finding rules to direct marketing campaigns based on lessons learned from analysis of data from supermarket loyalty campaigns, or learning to recognize characters from people's handwriting. Machine learning enables computer software to adapt to changing circumstances, allowing it to make better decisions than non-AI software. Synonyms for machine learning include learning and automatic learning. More generally, it is the ability of a program to acquire or develop new knowledge or skills. The focus of Machine Learning research is on developing computational methods for discovering new knowledge from data.

Machine Perception The ability of a system to receive and interpret data from the outside world in a manner similar to how humans use their senses. This is usually achieved by means of attached hardware.

Machine Translation The translation of speech or text from one language to another by computers. There are two main kinds of machine translation: knowledge-based systems, which draw on dictionaries, grammars, and so forth, and statistical machine translation, which is enabled by deep learning analysis of bilingual texts to derive meanings that allow translation from one language to another.

Machine Vision The technology used to provide image-based automatic inspection and analysis using optical sensors. In industry, image processing and machine vision are used for robot guidance, automatic inspection, security monitoring, and damage assessment.

MacLISP A derivative of the original LISP language that experimented with many new ideas that greatly influenced the development of Common LISP. Its successor was the New Implementation of Lisp (NIL).

Macro A term for a piece of computer code that expands to generate more code. This code can contain other macros, and so on, until there are no macros left to expand. The expansion is done when a program is interpreted or compiled, in order to simplify repetitive code. Some languages or interpreters allow only a single level of macros.

MACSYMA A mathematics programming system that can be used to help scientists and others in deriving, evaluating, and solving complex mathematics. Other such programs include REDUCE, MATHEMATICA, MAPLE, and MATHCAD. The particular strengths and features of these programs change over time, but in general they greatly simplify the solution of problems involving complex mathematics.

Mahalanobis Distance Mahalanobis distances are a generalization of classical Euclidean distances that allows changes in some directions to be more difficult or more "expensive" than changes in other directions. The relative "cost" of the differences is summarized in a weight matrix W, and the distance is computed as the square root of (x-y)'W(x-y), where x and y are the attribute vectors of the two objects being compared.
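
The formula can be written out in a few lines. The weight matrices below are hypothetical; in statistical practice W is usually taken to be the inverse of the covariance matrix of the attributes.

```python
import math

def mahalanobis(x, y, W):
    """Square root of (x - y)' W (x - y) for vectors x, y and weight matrix W."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    weighted = [sum(W[i][j] * diff[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

identity = [[1, 0], [0, 1]]
costly_first_axis = [[4, 0], [0, 1]]   # differences on the first attribute cost more
print(mahalanobis([0, 0], [3, 4], identity))           # reduces to Euclidean distance
print(mahalanobis([0, 0], [3, 4], costly_first_axis))
```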

Mainframe A mainframe is a computer system that is generally used by large organizations for large-scale data-processing tasks. These can include statistical analysis, ERP functions, and financial transactions. During the 1960s and 1970s, mainframe computers were largely associated with IBM™ owing to its dominant market share. Mainframes remain a primary computing resource for many large organizations and are expected to remain so for many years.

Manhattan Distance The Manhattan distance between two cases is the sum of the distances between the two objects on each of the attributes involved in the distance measure. It differs from the standard Euclidean distance in that it does not require that a straight path between the two objects be meaningful or feasible. It is named for the distance one would need to walk or drive between two points in Manhattan.
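
A brief sketch comparing the two distances on a made-up pair of points (the function names are illustrative):

```python
import math

def manhattan(x, y):
    """Sum of the absolute differences on each attribute."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))
```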

Map and Reduce Map and Reduce is a process applied to big data sets. It groups and stores chunks of data in order to perform filtering/sorting operations (the map step) before reducing the data by applying a summary operation (the reduce step).
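
The pattern can be sketched in a single process with a word count, a common teaching example. The function names and the input chunks are assumptions; real systems distribute the chunks across many machines.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (key, value) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def group_by_key(pairs):
    """Sort the emitted pairs into groups that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Apply the summary operation (here, a sum) to each group."""
    return {key: sum(values) for key, values in groups.items()}

chunks = ["the cat sat", "the dog sat down"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(group_by_key(pairs))
print(counts)
```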

Marginal Distribution When a multivariate probability distribution is summed over one (or more) of its attributes (or dimensions), the resulting distribution is called a "marginal" distribution of the original distribution. For example, the following multivariate distribution for the classification of a population by age and sex has two marginal distributions, one for age (summing over sex), with values of 0.2, 0.5, and 0.3, and another for sex (summing over age), with values of 0.5 and 0.5. All three distributions are proper probability distributions.
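
The summation can be sketched with a small joint table. The joint probabilities and category labels below are hypothetical, chosen only so that they reproduce the marginal values quoted in the entry.

```python
# Hypothetical joint distribution over (age group, sex).
joint = {
    ("young", "female"): 0.12, ("young", "male"): 0.08,
    ("middle", "female"): 0.23, ("middle", "male"): 0.27,
    ("old", "female"): 0.15, ("old", "male"): 0.15,
}

def marginal(joint, axis):
    """Sum the joint distribution over every attribute except `axis`."""
    totals = {}
    for key, p in joint.items():
        totals[key[axis]] = totals.get(key[axis], 0.0) + p
    return totals

age = marginal(joint, 0)   # summing over sex
sex = marginal(joint, 1)   # summing over age
print(age)
print(sex)
```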

Marginalization The process of reducing a multi-variable function to a function of (usually) one variable, typically by summing or integrating the multi-variable function over all possible values of the variables being removed, is called marginalization. The process is often used in, for example, belief nets to remove or absorb variables and constants that are irrelevant to a specific question. For example, consider a model for a medical diagnosis of, say, heart disease that is based on a graphical model. Once the available diagnostic information has been entered and propagated through the model, the model could be marginalized to reduce it to a simple probability statement about possible diagnoses for a particular patient.

Market Basket Data A common problem in Data Mining involves a type of data called market basket data. This is data in which each record consists of a list of items that were purchased at the same time or are otherwise naturally grouped. The classic example would be purchase records from a retail store, although the techniques could be applied to many other problems (for example, medical symptoms). The term is generally used when the aim of the analysis is to discover association rules or dependency rules. The former are simply rules stating that if item A is in the "basket," then item B is also in the basket, with two measures: the support (the proportion of cases that contain both A and B) and the confidence (the proportion of cases with B among those that have A). The latter are a generalization of association rules that is covered elsewhere.
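
Support and confidence for a rule "A implies B" can be computed as follows. The baskets are made-up data and the function names are illustrative.

```python
# Hypothetical purchase records, one set of items per basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(a, b, baskets):
    """Proportion of all baskets that contain both A and B."""
    return sum(1 for basket in baskets if a in basket and b in basket) / len(baskets)

def confidence(a, b, baskets):
    """Proportion of baskets containing B among those that contain A."""
    with_a = [basket for basket in baskets if a in basket]
    return sum(1 for basket in with_a if b in basket) / len(with_a)

print(support("bread", "milk", baskets), confidence("bread", "milk", baskets))
```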

Markov Chain A model describing state transitions, in which each state has a set of possible successor states and an associated probability distribution. The probability distribution depends only on the current state, and not on the history or path of states that were traversed to reach the current state. It can be represented either by its state graph, a directed graph with arcs showing allowable transitions, or by a transition matrix. In the example below, the transition matrix describes a three-state system. If the system is in the first state, it stays there with probability 0.5 or moves to either of the other two states with probability 0.25 each. If the system is in the second state, it stays there with probability 0.3, moves to the first state with the same probability, or moves to the third state with probability 0.4. When the system is in the third state, it stays there with probability 0.7 or moves to the second state with probability 0.3. It never moves from state three to state one. The transition matrix forms a conditional in the probabilistic expert system sense.
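
The three-state system described in the entry can be written as a transition matrix and stepped forward. The probabilities come from the entry; the function name step and the choice of starting state are illustrative.

```python
# Row i holds the probabilities of moving from state i to states 1, 2 and 3.
P = [
    [0.5, 0.25, 0.25],
    [0.3, 0.3, 0.4],
    [0.0, 0.3, 0.7],
]

def step(dist, P):
    """Advance a distribution over states by one transition."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

start = [1.0, 0.0, 0.0]          # begin in the first state
after_one = step(start, P)
after_two = step(after_one, P)
print(after_one)
print(after_two)
```

Each row of the matrix sums to one, and the next distribution depends only on the current one, which is the Markov property stated in the entry.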

Markov Chain Monte Carlo (MCMC) Methods A family of sampling-based techniques for estimating the distribution of a belief net given a set of data. Typically, the unobserved values in a belief net are initialized at random. The algorithm then cycles through the set of unobserved values, sampling each from its distribution conditional on the current settings of the other values. This process is repeated until enough data has been accumulated to produce the desired answer. It allows one to compute far more than generalized belief net architectures do, including complicated multivariate probabilities and expectations over complicated domains of the data.
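
The cycle described above corresponds to Gibbs sampling, one member of the MCMC family. The sketch below applies it to a toy model of two binary variables rather than a real belief net; the joint table, the burn-in length and the function names are assumptions for illustration.

```python
import random

# Hypothetical joint distribution over two binary variables (a, b).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample_given(other_value, position, rng):
    """Sample one variable from its distribution conditional on the other."""
    def key(v):
        return (v, other_value) if position == 0 else (other_value, v)
    p_one = JOINT[key(1)] / (JOINT[key(0)] + JOINT[key(1)])
    return 1 if rng.random() < p_one else 0

def gibbs(n_samples, seed=0, burn_in=500):
    rng = random.Random(seed)
    a, b = rng.randint(0, 1), rng.randint(0, 1)   # random initialisation
    samples = []
    for i in range(n_samples + burn_in):
        a = sample_given(b, 0, rng)               # resample a given b
        b = sample_given(a, 1, rng)               # resample b given a
        if i >= burn_in:
            samples.append((a, b))
    return samples

samples = gibbs(20000)
estimate = sum(1 for a, b in samples if a == 1 and b == 1) / len(samples)
print(estimate)   # should be close to JOINT[(1, 1)]
```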

Markov Condition An assumption that if an attribute Y is not an effect of X, then X and Y are conditionally independent given the direct causes of X.

Markov Decision Problem (MDP) A repeated decision problem against "Nature" in which the decision maker has complete information and strategic certainty, and in which Nature's next action depends probabilistically on its current state and on the selected action.

Markov Grammar A Markov grammar is an alternative to conventional statistical parsers. Rather than storing a set of explicit parsing rules and their associated probabilities, a Markov grammar stores transition probabilities that allow it to construct rules on the fly.

Markov Random Field A Markov random field is a general graphical model for the dependencies of a set of variables represented as nodes. It has the property that the distribution of a particular node is a function only of its immediate neighbors, and the global distribution is the product of the clique potentials.

Markov Tree A Markov tree representation of a graphical net is needed to apply the fusion and propagation algorithm to that net. A tree is a Markov tree if its nodes are labeled with sets of variables and the tree has the property that, for each variable that appears on the tree, the subtree formed from the nodes containing that variable is also connected.

Mathematica Mathematica is a computer system for symbolic mathematics and mathematical/logical computation. It is a descendant of programs such as MACSYMA and REDUCE. Mathematica was created by Stephen Wolfram and is available from Wolfram Research.

Mathematical Induction Programs or routines that can derive new mathematical relationships from a starting set of known relationships.

Maximin Criteria In a decision problem with a payoff function u(a, s), where a is a member of the set of actions A and s is a member of the set of states S, the maximin value is the value of the action whose minimum payoff is the largest. That action is the maximin action. It is an attractive action in decision problems where the adversary is assumed to be actively working against the decision maker.
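
A minimal sketch of the criterion on a hypothetical payoff table u(a, s); the action and state labels and the payoffs are made up for illustration.

```python
# Hypothetical payoffs u(a, s): payoff[action][state].
payoff = {
    "a1": {"s1": 3, "s2": 1},
    "a2": {"s1": 2, "s2": 2},
    "a3": {"s1": 5, "s2": 0},
}

def maximin(payoff):
    """Return the action whose worst-case payoff is largest, and that payoff."""
    action = max(payoff, key=lambda a: min(payoff[a].values()))
    return action, min(payoff[action].values())

print(maximin(payoff))
```

Action a3 offers the highest single payoff, but its worst case is the lowest, so the maximin criterion passes over it.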

Maximum a Posteriori (MAP) In Machine Learning and statistical inference, an algorithm or person sometimes needs to select a single hypothesis (or value) from among many. If the system is based on Bayesian methods, that choice will often be the hypothesis (or value) with the Maximum A Posteriori (MAP) probability, that is, the hypothesis with the highest probability after the data have been combined with prior information.

Maximum Entropy Principle A principle for selecting a probability distribution to represent uncertainty in a system. The principle selects the probability distribution with the greatest entropy from the set of distributions that satisfy the constraints imposed on that distribution. For instance, the maximum entropy distribution for a continuous variable with a given mean and variance is the normal (i.e., Gaussian) distribution.

Maximum Likelihood An optimization criterion used in Machine Learning, Statistics, and other modeling. The parameters are chosen so as to maximize the “likelihood” function of the responses, based on their assumed probability distributions. For instance, a least squares fit is a “maximum likelihood” fit when the response is normally distributed with a common variance around its mean. In neural networks, maximum likelihood is used for multinomial (that is, multistate) responses.

Maximum Likelihood Estimate The maximum likelihood estimate is the value of the parameters in a model that maximizes the likelihood function for the data. It identifies the distribution that is most likely to have generated the data, based solely on the data and the distribution. It differs from the Maximum A Posteriori (MAP) estimate, which maximizes the product of the probabilities (data and distribution) and other prior information on feasible values of the estimates.
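
The difference between the two estimates can be seen on a coin-flip (Bernoulli) example. The sketch below assumes a Beta(alpha, beta) prior on the success probability, with alpha and beta greater than 1; that prior and the default values are illustrative choices, not something specified above.

```python
def mle_bernoulli(successes, trials):
    """Maximum likelihood estimate: depends on the data only."""
    return successes / trials

def map_bernoulli(successes, trials, alpha=2.0, beta=2.0):
    """MAP estimate: mode of the Beta(alpha + s, beta + n - s) posterior.

    The Beta(alpha, beta) prior is an assumption made for this example.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)

# Three successes in three trials.
print(mle_bernoulli(3, 3))  # 1.0
print(map_bernoulli(3, 3))  # 0.8
```

With so little data the prior pulls the MAP estimate away from the extreme value 1.0; as the number of trials grows, the two estimates converge.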

Max-Pooling In convolutional neural networks (CNNs), max-pooling refers to combining clusters of neurons in one layer into a single neuron in the following layer by selecting the maximum value from each cluster.
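
The pooling step can be sketched without any library. This is a minimal version that assumes non-overlapping square windows over a two-dimensional grid of activations; the function name and the sample grid are hypothetical.

```python
def max_pool_2d(grid, size=2):
    """Replace each non-overlapping size x size block with its maximum value."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

activations = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]

print(max_pool_2d(activations))  # [[4, 5], [3, 7]]
```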

Mean Square Error (MSE) Criterion The Mean Square Error (MSE) criterion can be used to provide a combined measure of the accuracy and reliability of an estimate. It is calculated as the expected squared difference between an estimate or prediction and its true value. Mathematically, it can be decomposed into two parts: a squared bias, which arises from the difference between the model’s average prediction and the true average for that combination of attributes, and a variance term, which is due to the variability of the data. The expected MSE is used to evaluate the performance of models that can be analyzed analytically, whereas the Predicted MSE or Predictive MSE (PMSE), which is evaluated on data by comparing the predicted value to the true one, can be used as a criterion for model fitting.
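
The decomposition into squared bias and variance can be checked numerically. The sketch below treats a small list of repeated estimates of one true value as the whole population of estimates; the numbers are hypothetical.

```python
from statistics import fmean

def mse_decomposition(estimates, true_value):
    """Return (mse, squared_bias, variance) for repeated estimates of one value."""
    mean_estimate = fmean(estimates)
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    squared_bias = (mean_estimate - true_value) ** 2
    variance = fmean([(e - mean_estimate) ** 2 for e in estimates])
    return mse, squared_bias, variance

mse, squared_bias, variance = mse_decomposition([9, 11, 10, 14], true_value=10)
print(mse, squared_bias, variance)  # 4.5 1.0 3.5
```

The squared bias and the variance add up to the MSE, which is the identity the entry describes.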

Means-Ends Analysis A problem-solving method in which the current state is repeatedly compared to the desired end state, and a new move is selected based on what currently appears best for reaching it. It was first used in the GPS (General Problem Solver) program and has since been employed in programs such as FDS, STRIPS, ABSTRIPS, and MPS.

Member A member of a set is an element that is in the set. The corresponding LISP function tests a list to determine whether an element is a member of that list. If it is, the function returns the tail of the list beginning with that element; otherwise it returns the empty list.
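
The behavior of the LISP function can be mimicked in Python. This is only an analogue written for illustration (the name `member` and the use of `==` for comparison are choices made here), not the LISP implementation itself.

```python
def member(item, items):
    """Return the tail of `items` starting at the first match, or [] if absent."""
    for index, candidate in enumerate(items):
        if candidate == item:
            return items[index:]
    return []

print(member("c", ["a", "b", "c", "d"]))  # ['c', 'd']
print(member("z", ["a", "b", "c", "d"]))  # []
```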

Membership Function A function m(A, x) that returns the truth-value of the membership of an object x in a set A. It is normally used in fuzzy logics, where it measures the degree of membership of an object in a set. In that case, the membership function can range from 0, meaning no membership in the set A, to 1, meaning full membership in A.
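
One common shape for such a function is a triangle. The sketch below assumes a triangular membership function for a hypothetical set "warm" defined over temperatures; the breakpoints 10, 20, and 30 are illustrative.

```python
def triangular_membership(x, low, peak, high):
    """Degree of membership in [0, 1]: 0 outside (low, high), 1 at peak."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

# Hypothetical set A = "warm", with full membership at 20 degrees.
for temperature in (5, 15, 20, 25):
    print(temperature, triangular_membership(temperature, 10, 20, 30))
```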

Membership Query In Machine Learning, the learner often needs to query an external source, that is, to put questions to a “teacher.” When it asks a question such as “Is a robin a bird?” while trying to learn the concept of bird, it is performing a membership query.

Memetics Refers to the anthropological study of how ideas (memes) can be understood through the lens of Darwinian evolution. Memetics helps to explain how ideas spread through cultures much like genes, with a lifecycle that mimics genetic evolution as a meme combines, mutates, and copies itself into further memes.

Merge/Purge Problem A common problem in Knowledge Discovery in Databases (KDD) and data warehousing is the consolidation of multiple large databases from various sources with different representations. The data must be cleaned and merged into a single consistent whole before the KDD process can begin. This important process goes by numerous names, including the Merge/Purge problem, record linkage, instance identification, semantic integration, and data cleaning.

Merge-Purge A Knowledge Discovery in Databases-based (KDD-based) system developed in the mid-1990s to identify duplicate records in large databases. It has been applied successfully to the identification of duplicate welfare claims in data from the Welfare Department of the State of Washington.

Merit Function In many problem areas, algorithms need a numerical technique for evaluating how “good” a solution, choice, or feature is for a given objective. The function used to assign a value to one of these is known as a merit function. Common merit functions are least squares or, more generally, maximum likelihood, information gain or entropy, and the Gini criterion.

Metadata Data about data. Used in Data Mining and data warehousing to refer to data about the meaning and domains of attributes, their relationships, and their locations. Metadata provides the context needed to understand the raw data.

Meta-Knowledge A term used to mean “knowledge about knowledge,” where a program not only “knows” something (that is, it can access a database of knowledge), but “knows what it knows.”

MetLife’s Intelligent Text Analyzer (MITA) MetLife’s Intelligent Text Analyzer (MITA) is a large-scale system for the textual analysis of life insurance applications. It uses information extraction techniques from natural language processing to structure the information in free-form textual fields. MITA uses an ontology built from the SNOMED system.

Micro-Planner A subset of the PLANNER language. It was the basis for the SHRDLU program and led to the development of several other languages, including CONNIVER.

MIM MIM is a commercial package for fitting graphical and hierarchical models to continuous and categorical data. It supports both automatic and user-directed model construction through either a command language or a graphical interface.

Minimax Action In a decision problem with a payoff function u(a, s), where a is a member of the set of actions A and s is a member of the set of states S, the minimax action is the action with the smallest maximum loss. It is the best action in a pessimistic game, where the opponent has perfect information and knowledge.

Minimax Procedures Minimax procedures are procedures that work to minimize the maximum loss that can result from a move or plan. They are commonly used in game theory and related problems that assume an opposing player who has complete information.

Minimum Description Length (MDL) A formalization of the Occam’s Razor principle that is applied to find the best model (that is, description) for capturing the key aspects of data. MDL assumes that the best and most probable explanation of the data is the simplest, most compact representation of the data.

Minimum Description Length Principle (MDLP) The Minimum Description Length Principle (MDLP) states that the best hypothesis for a given set of data is the one that minimizes the sum of the length of the theory and the length of the data when the theory is used as a predictor of the data. Both lengths are measured in bits, and the encoding scheme reflects one’s a priori probabilities. The MDLP can also be viewed as a Bayesian Maximum A Posteriori (MAP) estimate.

Minimum Message Length (MML) Minimum Message Length (MML) is a method for evaluating the complexity of a rule or set of rules that takes into account the complexity of both the data and the rule. Choosing the MML rule is, essentially, an application of Ockham’s Razor. Here, the complexity of an item is measured as the negative log (base 2) of its probability, so it is directly related to the likelihood function.

Minimum Viable Product A development technique widely used by start-ups to build and release new products or websites quickly, assess their viability, and fast-track modifications. A final set of features is developed only after feedback from early adopters has been taken into account. Groupon’s™ early platform, built on third-party technologies, offers an example of a minimum viable product.

Missing At Random (MAR) If the probability that a response attribute is missing is independent of its value but depends on the values of the predictors, then it is Missing At Random. Likelihood-based analyses can ignore the missing-data mechanism, but many standard supervised learning methods can produce invalid results if the missing cases are simply ignored.

Missing Completely At Random (MCAR) If the probability that a response attribute is missing is independent of its value and of the values of the predictors, then it is Missing Completely At Random, and the missingness can simply be ignored in an analysis.

Missing Data Many databases contain cases in which not all attribute values are known. This can happen for structural reasons (for example, parity for males), because of changes or variations in the data collection methodology, or because of nonresponse. In the latter case, it is important to distinguish between ignorable and nonignorable nonresponse. Nonignorable nonresponse must be addressed explicitly, whereas ignorable nonresponse can generally be treated as random.

Mixture-of-Experts (ME) A mixture-of-experts (ME) model allows a model to incorporate multiple submodels, or “experts,” within the overall model. The experts are combined by a gate function that takes the outputs of the individual experts and combines them to provide a final output. The experts take the form of local models that are optimal over limited sub-domains of the overall problem domain. Their combination can lead to more accurate models than a single “global” model. An example is a medical diagnosis problem in which every expert is a model for a specific type of disease. Each expert could then predict the probability that a given patient has that particular disease. The gate function could then combine the individual predictions using a “softmax” function or, in some cases, other voting functions. An extension of the ME model is the hierarchical ME (HME) model. In this model, the experts are organized in a tree, so that an individual expert’s model is a combination of the models “below” it on the tree. In this respect, a Classification and Regression Tree (CART), or decision tree, is a very early form of HME.
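
A gate that combines expert outputs with softmax weights can be sketched as follows. The two experts and the gate scoring function here are hypothetical stand-ins chosen to keep the arithmetic easy to follow; in practice both would be learned models.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts, gate_scores):
    """Weighted sum of expert outputs, with weights from the gate function."""
    weights = softmax(gate_scores(x))
    return sum(w * expert(x) for w, expert in zip(weights, experts))

# Hypothetical experts and gate, for illustration only.
experts = [lambda x: 2 * x, lambda x: x + 10]

def equal_gate(x):
    return [0.0, 0.0]  # equal scores give each expert weight 0.5

print(mixture_of_experts(4, experts, equal_gate))  # 11.0
```

A gate that scores one expert much higher than the other makes the output track that expert almost exclusively, which is how each expert comes to specialize on its own sub-domain.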

MNIST A very well-known dataset of handwritten numerical digits, often used in image recognition benchmarks. It is frequently used as a nontrivial example for classification tasks.

Mobile Describes how the internet, online services, voice calls, applications, information, and content are accessed through smartphones or other mobile devices. Mobile is sometimes viewed as a distinct market sector.

Mobile Robot A free-roving robot that is capable of moving through space in order to accomplish its tasks. In addition to the problems that face ordinary robots, these robots need to be able to locate themselves in space, navigate and/or find a path to their goal, and perform their task upon arrival. Classic examples include Dante, a robot designed for space and volcanic exploration, and robots used for nuclear repair.

Model A representation or simulation of some real-world phenomenon. Various kinds of models can be built, for instance iconic models, analogic models, and analytic models. In machine learning, analytic models are produced by running a learning algorithm against some data.

Model Equivalence In many Machine Learning and Data Mining settings, complex models can yield essentially the same predictions from very different sets of variables and assumptions. This is referred to as model equivalence.

Model Training The process by which an artificial intelligence (AI) is taught to carry out its functions; in many ways it follows the same process that a new human recruit must go through. AI training data needs to be unbiased and comprehensive to ensure that the AI’s actions and decisions do not unintentionally disadvantage a group of people. A key attribute of responsible AI is the ability to demonstrate how an AI has been trained.

Model Workflow Functions within a workflow that can be examined and inspected, by means of business process modelling (BPM), before beneficial changes are made to that workflow.

Model-Based Reasoning Model-based reasoning (MBR) focuses on reasoning about the behavior of a system from an explicit model of the mechanisms underlying that behavior. Model-based techniques can represent knowledge very concisely, more completely, and at a greater level of detail than techniques that encode experience, because they employ models that are compact axiomatic systems from which large amounts of information can be derived.

Model-Free A “model-free” model, for instance a neural network or a fuzzy system, is a model that is too complex to write down explicitly, or in which some behaviors of the input or output are left unspecified. For instance, a simple linear regression model for three inputs and one output can be written down in a single line, whereas a multi-layer feedforward neural network could take many lines to write down and is therefore considered model-free. The linear regression model is readily amenable to analysis, and it is possible to prove many general optimality and other properties for it, whereas the neural net is harder to analyze and is therefore termed model-free. This usage is similar to the use of the term nonparametric in statistics.

Modular Automated Parking System (MAPS) The Modular Automated Parking System (MAPS) is a fuzzy logic-based system developed by Robotic Parking to park and retrieve vehicles in garages.

Momentum The speed at which algorithms such as neural nets learn can be improved in some situations by updating the estimates in a direction other than the current gradient. One useful direction to move is often a mix of the current best direction and the previous best direction. This has the effect of reducing oscillations in the updated parameter values.
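
The mixing of the current and previous directions can be sketched as gradient descent with a velocity term. This is one common formulation (sometimes called the heavy-ball update); the learning rate, the momentum coefficient `beta`, and the quadratic test function are assumptions made for the example.

```python
def minimize_with_momentum(grad, x0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent whose step blends the new gradient with the previous step."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # beta * velocity carries over the previous direction;
        # -lr * grad(x) is the current steepest-descent direction.
        velocity = beta * velocity - lr * grad(x)
        x += velocity
    return x

# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = minimize_with_momentum(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))
```

Setting `beta=0` recovers plain gradient descent, which makes it easy to compare the two behaviors on the same function.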

Monotone Function A monotone increasing (decreasing) function, for instance an activation function in a neural network, is a function that always increases (decreases) in value as its argument(s) increase.

Monotone Logic Classical logic is usually a monotone logic, which has the property that if A => B, then (A & C) => B must also hold. This prevents a system based on monotone logic from withdrawing a conclusion on the basis of new or conflicting evidence (propositions).

Monotonic Logic A logic that assumes that once a fact is established it cannot be altered during the remainder of the process.

Monte Carlo Method A statistical method in which repeated random sampling is used to obtain a numerical result. Such methods are useful in practice for solving problems mainly in optimization and in sampling from probability distributions.
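
A classic small illustration is estimating π by sampling random points in the unit square and counting how many fall inside the quarter circle. The function name, the sample size, and the fixed seed below are choices made for this sketch.

```python
import random

def estimate_pi(samples, seed=0):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / samples

print(estimate_pi(100_000))
```

The estimate improves slowly: the error shrinks roughly in proportion to one over the square root of the number of samples.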

Moral Graph One of the steps in converting a Directed Acyclic Graph (DAG) into a junction tree is the conversion of the graph into a moral graph. The moral graph of a DAG connects all immediate parents of each node and converts the resulting graph to an undirected graph. The cliques of this graph are used in the construction of the junction graph and the junction tree.
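
The two operations, connecting the parents of each node and dropping edge directions, can be sketched as below. The representation of the DAG as a dictionary from each node to its list of parents is an assumption made for this example.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    `parents` maps each node to the list of its immediate parents.
    """
    edges = set()
    for child, node_parents in parents.items():
        for parent in node_parents:
            edges.add(frozenset((parent, child)))  # drop the edge direction
        for p, q in combinations(node_parents, 2):
            edges.add(frozenset((p, q)))           # connect ("marry") co-parents
    return edges

# Hypothetical DAG: A -> C <- B, and C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
for edge in sorted(sorted(e) for e in moralize(dag)):
    print(edge)
```

In this small graph the only added edge is A-B, because A and B share the child C.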

Most General Common Instance (MGCI) The Most General Common Instance (MGCI) of two expressions A and B is an expression C that is an instance of both A and B, such that any other expression D that is also an instance of both A and B is an instance of C.

Most General Unifier (MGU) The Most General Unifier (MGU) is used in binary resolution; it is the substitution that yields the MGCI of two unified literals.

Motion Analysis Techniques used to reconstruct the three-dimensional motion of an object from a sequence of perspective views (that is, three-dimensional images that portray height, width, and depth for a more realistic image).

MPS A program designed to solve certain very difficult puzzle problems, such as Rubik’s cube. Like Means-Ends analysis, it tracks the current state and applies a sequence of operations designed to bring the current state closer to the final state. The operators were defined in such a way that any components already in their final state are not disturbed by the movement of another, less important one.

MSBN MSBN is a belief network manipulation and inference tool developed by the Decision Theory and Adaptive Systems Group at Microsoft. It provides a graphical interface for constructing and modifying belief networks and can perform the assessment of probabilities.

Multi-Class Most classification systems are developed for binary data. Multi-class variables are those with more than two categories. This type of variable is also referred to as multinomial.

Multi-Dimensional DataBase (MDDB) MDDB is the name of a Case-Based Reasoning system applied to the diagnosis of dysmorphic syndromes, a domain with limited medical knowledge. The system is built on records of approximately 3000 patients. The term also refers to the multi-dimensional database commonly used in Online Analytical Processing (OLAP) and related systems. Instead of treating a database of multi-attribute records as a two-dimensional table with records as rows and attributes (variables) as columns, the data is arranged as a k-dimensional rectangle, with one dimension for each attribute. The dimension corresponding to the i-th attribute has n_i levels, each corresponding to one of the values that the i-th attribute can take. A cell at the intersection of particular values for each of the k attributes holds summary data on all the records that are classified as belonging to that cell. Generally, the marginal cells (that is, cells where one or more dimensions have been suppressed) hold information on every cell in the missing dimension(s).

Multi-Label Most class variables belong to a single class. Target variables that can belong to more than one class are referred to as multi-label. An example would be categorizations of textual objects that can belong to several categories.

Multiple Imputation A technique for “filling in” missing values in datasets. A model for the missingness is assumed or estimated, and multiple imputed datasets are generated. Each is analyzed in the usual fashion, and the results are combined and adjusted for the imputation.

Multiple Instruction Multiple Datastream (MIMD) Multiple Instruction Multiple Datastream (MIMD) computer architectures have multiple processors, each of which performs a different set of computations on its own data.

Multiple Layer Perceptron (MLP) The Multiple Layer Perceptron (MLP) is a neural network that has one or more layers of hidden nodes. Each node applies an activation function to the inner product of the inputs and the weights (that is, a generalized linear model) plus a “bias” term.
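
A forward pass through such a network can be sketched with plain lists. The choice of a sigmoid activation, a single hidden layer, and the particular weights below are assumptions made for illustration; no training is shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, activation=sigmoid):
    """Each node: activation(inner product of inputs and weights, plus bias)."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer followed by an output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Hypothetical weights: two inputs, two hidden nodes, one output node.
output = mlp_forward(
    [1.0, 2.0],
    hidden_w=[[0.5, -0.25], [0.1, 0.3]],
    hidden_b=[0.0, -0.2],
    out_w=[[1.0, -1.0]],
    out_b=[0.05],
)
print(output)
```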

Multivalent Refers to a logic or system that can take multiple values. The values may be discrete, for example Lukasiewicz’s trivalent logic [0, 1/2, 1], or continuous, such as probabilities or fuzzy membership scores, both of which are defined on the range 0–1. In each case, a value of zero indicates absolute falsity or impossibility, while a value of one (1) signifies absolute truth or necessity. Multivalent logics allow one to capture degrees of certainty and support non-monotone logics.

Multivariate Adaptive Regression Spline (MARS) Multivariate Adaptive Regression Spline (MARS) is a statistical technique for adaptively estimating a multi-attribute function. Like CART and other methods, this technique recursively fits a function to the data. It adaptively adjusts for regions with poor fits by adding spline functions to the model for that region.

Multivariate Probability Distribution A multivariate probability distribution is a distribution whose states are indexed by several variables. An example would be a bivariate distribution on Age and Gender, where Age has been categorized into three states (young, middle-aged, and old) and Gender has two states (female and male). A multivariate distribution on Age and Gender would have six states (that is, young female, young male, middle-aged female, etc.), each with an associated probability. The six probabilities would sum to one.
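
The Age and Gender example can be written as a small table of joint probabilities. The six probability values below are hypothetical; only their structure (six states that sum to one) comes from the entry.

```python
import math

# Hypothetical joint probabilities for the six (age, gender) states.
joint = {
    ("young", "female"): 0.20,
    ("young", "male"): 0.15,
    ("middle-aged", "female"): 0.20,
    ("middle-aged", "male"): 0.20,
    ("old", "female"): 0.15,
    ("old", "male"): 0.10,
}

def marginal(joint, position):
    """Sum out every variable except the one at `position` in the state tuple."""
    totals = {}
    for state, probability in joint.items():
        key = state[position]
        totals[key] = totals.get(key, 0.0) + probability
    return totals

print(math.isclose(sum(joint.values()), 1.0))  # True
print(marginal(joint, 0))  # distribution of Age alone
print(marginal(joint, 1))  # distribution of Gender alone
```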

Mutation In evolutionary programming, some mechanism for generating “new” behaviors is required; it takes its name from biological mutation. In biological mutation, random errors can be introduced into offspring. In the same vein, machine models introduce variability into their offspring, with the aim of improving their fitness.
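
For candidates encoded as bit strings, one simple mutation operator flips each bit independently with a small probability. The bit-string encoding and the mutation rate are assumptions made for this sketch; other encodings use other operators.

```python
import random

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]

rng = random.Random(42)  # fixed seed so that runs are repeatable
parent = [0, 1, 1, 0, 1, 0, 0, 1]
child = mutate(parent, rate=0.1, rng=rng)
print(parent)
print(child)
```

Whether a mutated child is kept is decided separately, by comparing its fitness with that of the other candidates.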

MxNet An open-source deep learning library supported by cloud vendors such as AWS and Microsoft Azure.

Mycin A medical diagnosis system that was built to be a consultant on difficult cases of meningitis and bacterial infection. It also incorporates “certainty” factors in its diagnosis to indicate the strength of belief in a hypothesis.

N

Naïve Bayes A naïve Bayes classifier, like a Bayes classifier, classifies based on the predicted probabilities of the classes given the inputs. However, a naïve Bayes classifier treats the inputs as independent given the class, and estimates the distribution by simple counting, so that it is really a frequentist approach.
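
The counting approach can be sketched for categorical inputs. The toy data, the feature meanings, and the add-one smoothing (which assumes every feature has exactly two possible values) are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            feature_counts[(label, index)][value] += 1
    return class_counts, feature_counts

def predict(model, row):
    """Pick the class with the largest prior times product of feature probabilities."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for index, value in enumerate(row):
            # Add-one smoothing; the "+ 2" assumes two possible values per feature.
            score *= (feature_counts[(label, index)][value] + 1) / (n + 2)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical data: (outlook, wind) -> whether a game was played.
rows = [("sunny", "calm"), ("sunny", "calm"), ("sunny", "windy"),
        ("rainy", "windy"), ("rainy", "calm"), ("rainy", "windy")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

model = train(rows, labels)
print(predict(model, ("sunny", "calm")))   # yes
print(predict(model, ("rainy", "windy")))  # no
```

Multiplying the per-feature probabilities is exactly where the independence assumption enters.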

Natural Language Generation The complement of natural language understanding, Natural Language Generation is concerned with the computer generation of text in order to explain items, provide directions, or ask questions.

Natural Language Interface A vague term, this phrase is generally used to describe program front ends that appear to understand questions or directives provided by a user.

Natural Language Processing English is an instance of a natural language; a computer language isn’t. For a computer to process a natural language, it would have to mimic what a human does. That is, the computer would have to recognize the sequence of words spoken by a person or another computer, understand the grammar or syntax of the words (i.e., do a syntactic analysis), and then extract the meaning of the words. A limited amount of meaning can be derived from a sequence of words taken out of context (i.e., by semantic analysis), but much more of the meaning depends on the context in which the words are spoken (e.g., who spoke them, under what conditions, with what tone, and what else was said, especially just before the words), which would require a pragmatic analysis to extract. To date, natural language processing is poorly developed, and computers are not yet able to even approach the ability of humans to extract meaning from natural languages; yet there are already valuable practical applications of the technology.

Natural Language Understanding The ability of a program or system to understand (usually restricted) questions or directives from persons. A wide range of techniques can be employed, such as pattern matching, Semantic Grammar, syntactic parsing, word experts, connectionism, etc.

NAVEX A real-time expert system that uses radar data to monitor the velocity and position of the space shuttle. The program is rule based and uses frames.

NAVLAB Project The NAVLAB project is an ongoing program that is developing self-navigating robot vehicles. It has equipped a series of vehicles (including vans, cars, and buses) with computer and sensor equipment, with the goal of developing vehicles that are capable of self-navigation both on regular roads and off-road.

NDS NDS is a rule-based expert system developed by Smart Systems Technology and Shell Development Corporation to detect faults in the COMNET national communications network.

Nearest Neighbor A term for techniques that classify or predict an observation on the basis of the values of previous observations that are “near” to the target observation in some sense. Typical distance measures can be based on a Manhattan metric or a function of the Euclidean distance. The neighbor set can either be the k nearest neighbors or all of the neighbors within a given distance. In the latter case, the technique can also be referred to as a kernel predictor or classifier.
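
The k-nearest-neighbor variant can be sketched as a majority vote among the closest stored observations. The training points, labels, and the value k=3 below are hypothetical; `math.dist` (Python 3.8 and later) supplies the Euclidean distance, and a Manhattan metric is included as the alternative the entry mentions.

```python
import math
from collections import Counter

def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_classify(training, query, k=3, metric=math.dist):
    """Majority label among the k stored observations closest to `query`."""
    nearest = sorted(training, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled observations: (point, label).
training = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]

print(knn_classify(training, (2, 2)))                    # a
print(knn_classify(training, (8, 7), metric=manhattan))  # b
```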

Necessary and Sufficient Conditions In developing descriptions for an object or for a rule to hold, attributes can have necessary conditions (ranges, values, etc.) that must hold for the rule to hold, and sufficient conditions, meaning that if the condition on the attribute holds then the rule must hold. A necessary and sufficient set of conditions on a set of attributes is a collection of conditions such that the rule is always true when the condition(s) hold and the conditions are always met whenever the rule is true.

Negation Normal Form (NNF) In mathematical logic, both the Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF) are special cases of a negation normal form (NNF). A proposition is in Negation Normal Form when it is either a literal (i.e., an atom in the language or the negation of an atom), a conjunction of NNFs, or a disjunction of NNFs.
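
Any formula built from "and", "or", and "not" can be brought into NNF by pushing negations inward with De Morgan's laws and removing double negations. The sketch below assumes formulas are represented as nested tuples with atoms as strings; that representation is a choice made for this example.

```python
def to_nnf(formula):
    """Push negations down to the atoms.

    Atoms are strings; compound formulas are ("not", f), ("and", f, g),
    or ("or", f, g).
    """
    if isinstance(formula, str):
        return formula
    operator = formula[0]
    if operator in ("and", "or"):
        return (operator, to_nnf(formula[1]), to_nnf(formula[2]))
    # operator == "not"
    inner = formula[1]
    if isinstance(inner, str):
        return formula                      # negated atom: already a literal
    if inner[0] == "not":
        return to_nnf(inner[1])             # double negation
    dual = "or" if inner[0] == "and" else "and"
    return (dual, to_nnf(("not", inner[1])), to_nnf(("not", inner[2])))  # De Morgan

# not (p and not q)  becomes  (not p) or q
print(to_nnf(("not", ("and", "p", ("not", "q")))))
```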

Negative Evidence Evidence that can be used to forecast the non-occurrence of various conditions.

Negative Paramodulation Negative paramodulation is an inference rule that can be used in automated reasoning systems. It reasons from negated equalities rather than from equalities. The rule is sound if the functions involved satisfy certain cancellation-like properties. For instance, from A≠B and AC=D, we can derive BC≠D by negative paramodulation, provided that right cancellation holds for the product.

Negmax A technique for searching game trees that is equivalent to the minimax procedure.
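
The equivalence can be seen by running both procedures on the same tree. In this sketch a game tree is a nested list whose leaves are numbers scored from the first player's point of view; that representation and the sample tree are assumptions made for the example.

```python
def minimax(node, maximizing=True):
    """Classic form: alternate between maximizing and minimizing levels."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def negmax(node, color=1):
    """Single-case form: always maximize the negated value of the children."""
    if isinstance(node, (int, float)):
        return color * node  # leaf value from the point of view of the player to move
    return max(-negmax(child, -color) for child in node)

tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree), negmax(tree))  # 3 3
```

The negated form needs only one case per level, which is why it is often preferred in implementations.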

Nelder-Mead Simplex Algorithm The Nelder-Mead Simplex Algorithm is a simple method for searching for minima or maxima over smooth functions. For a k-dimensional problem, the algorithm starts with k+1 (often randomly chosen) starting points and evaluates the function at each point. It is chiefly interested in the point that has the highest value. If the difference between that value and the others is sufficiently large, it moves in that direction and evaluates another set of points. When the difference between the points is not large, it contracts the simplex around the center of the existing points and re-evaluates the function. When the difference is sufficiently small, it terminates.

NeoGanesh NeoGanesh is a knowledge-based system for ventilator management in Intensive Care Units. It interprets data in real time and can control the mechanical assistance provided to the patient.

Netica Netica is a commercial system for building, manipulating, and deploying expert systems based on influence diagrams and Belief Networks. The networks can be compiled into a form suitable for fast processing, and can accept data in a wide range of formats. These networks can be used to find optimal decisions and to create conditional plans.

NETL A language for the frame-based representation of semantic networks. It was developed at MIT and implemented in MacLISP.

NETtalk NETtalk, which was created in 1987, is a classic example of training a multilayer perceptron network; its main aim was to learn the conversion of text to spoken words. The network converted the text input into a series of feature vectors that were then mapped into a sequence of stress markers and phonemes. When the system was trained on 1024 words extracted from a child’s speech, it was 95 percent accurate in producing the correct phoneme, and attained 78 percent accuracy on a second set of about 450 words that were not in the training set. The network was fairly robust to differences in the node weights and needed much less storage space than comparable dictionary lookup programs.

Neural Net Programs There are a large number of sharewares, freeware, and commercial neural net packages available. Majority of them can be downloaded over the Internet. A large listing is maintained in the Sarle97 reference.

Neural Networks Neural networks are an approach to machine learning that developed out of attempts to model the processing that occurs within the neurons of the brain. By using simple processing units (neurons) arranged in a layered and highly parallel architecture, it is possible to perform arbitrarily complex calculations. Learning is achieved through repeated minor modifications to selected neurons, which can result in an extremely powerful classification system. A problem with neural networks is that it is very difficult to understand their internal reasoning process, and thus to obtain an explanation for any specific conclusion. They are therefore best used when the results of a model are more important than understanding how the model works. Neural network software can be used to recognize handwriting and to control chemical processes so that they run at desired conditions. Other applications include stock market analysis, character recognition, fingerprint identification, speech recognition, scientific analysis of data, credit analysis, and neurophysiological research. Neural networks are also known as connectionism, neural nets, and parallel associative memory.

NEUREX A rule-based expert system used for the diagnosis of diseases of the nervous system. It uses both backward and forward chaining, as well as MYCIN-like certainty factors, to help locate and classify the damage.

Neuron In biology, a neuron is a specialized form of cell that conveys electrical impulses. Generally, it consists of a central body or soma, input tendrils called dendrites, and an output tendril called the axon. When a neuron receives a sufficiently large signal along its dendrites from other neurons or sensory nerves, it produces an electrical impulse that travels along the axon. At the end of the axon there are synaptic junctions with other neurons or other outputs, such as muscles. The signal causes the synaptic junction to release chemicals that (may) cause the target to fire. In neural networks, a neuron is a single processing unit that receives inputs from other processing units, sums or otherwise combines the inputs, and produces an output signal as a (usually non-linear) function of the combined inputs.
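
The artificial neuron described in the last sentence can be sketched as follows. This is only an illustration: the weighted sum and the logistic (sigmoid) output function are common choices assumed here, and the name `neuron_output` is hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Combine the inputs as a weighted sum, then apply a non-linear function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Logistic (sigmoid) function: squashes the combined input into (0, 1).
    return 1.0 / (1.0 + math.exp(-total))


print(neuron_output([0.0, 0.0], [1.0, 1.0], 0.0))  # 0.5
print(neuron_output([1.0, 2.0], [0.5, 0.25], -0.5))
```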

n-gram This is a general term for a related family of Markov techniques for modeling natural language. A bigram model treats natural language as a series of word pairs and models the language as a series of probabilistic transitions between pairs of words. Similarly, a trigram-based model would model the transition to the next word based on the last two words. In general, an n-gram-based model would model the language on the basis of the last n-1 words or units. A popular “toy” based on n-grams is the family of variations on the so-called “travesty” program, where the input is a chunk of text and the output is a bigram-based or trigram-based random walk down the transition tree. This is often used as an example of natural language generation. As the order of the approximation increases, the output begins to look more and more like the input. Although presented above as a technique operating at the word level, it can also be applied at a “higher” level to various syntactic units (e.g., treating the red-haired boy as a single “unit” instead of four separate words) or as a class-based model, where the probability of a transition is estimated from the classes of the words that make up the vocabulary. An example of the latter approach would be to pool all cases of the pair (fruit name) ripens to estimate the probability that the word apple is followed by the word ripens.
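
A bigram “travesty” generator of the kind described above can be sketched as follows. The function names `build_bigram_model` and `generate` are assumptions made for this example, and words are split on whitespace only.

```python
import random
from collections import defaultdict


def build_bigram_model(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return transitions


def generate(transitions, start, length, seed=0):
    """Random walk through the transitions, starting from a given word."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        followers = transitions.get(word)
        if not followers:          # dead end: the word was never followed
            break
        # Repeated followers are kept in the list, so frequent pairs
        # are chosen proportionally more often.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


model = build_bigram_model("the cat sat on the mat and the cat ran")
print(generate(model, "the", 8))
```

A trigram version would key the table on the last two words instead of the last one.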

Node Coloring Node coloring is a technique used for examining or selecting important nodes in a graph-based model. Each node in the target graph is assessed against some measure of importance and assigned a “color” to match the value. A plot of the graph can then be inspected to find the important nodes. This is used, for instance, in the examination of belief nets.

Node In a network or graph, a node is a point on the graph that can be connected to other points through arcs. Typically, a node represents some object or concept, and the arcs represent connections between the objects or concepts.

Noisy Channel Model The noisy channel model is mainly used in empirical natural language processing as a basic model for the statistical analysis of language. The model assumes that language is produced “cleanly” and then passed through a noisy channel before being “received.” The objective is to recover the original “clean” communication from the noisy input.


Noisy Data Data that contain errors because of the way in which they are collected and measured are generally referred to as being noisy. Continuous measurements are generally mixed with Gaussian noise, unless they are near the lower or upper bounds of the measurement system, in which case the noise tends to be skewed towards the midpoint of the scale. Many statistical methods make the a priori assumption that the data is noisy.

Nominal Attribute (Type) A classical taxonomy for numerically valued attributes divides them into nominal, ordinal, interval, and ratio types. An attribute is nominal if its meaning doesn’t change when you rearrange the categories or shuffle the values assigned to each meaning. An example would be a variable such as sex, which could be encoded as 0=male and 1=female just as meaningfully as 1=male and 0=female. Since the numeric values of these attributes are arbitrary, any operations beyond counting and manipulations equivalent to proportions are meaningless.

Non-Ignorably Missing When a response attribute can be missing because of another unobserved attribute of a case that is related to the response attribute, the data is non-ignorably missing. This can arise when, for instance, a loan officer selects applicants on the basis of an unmeasured attribute, such as a general impression. In order to build a valid model for the response, the mechanism that causes the missing data should also be modeled.

Non-Linear Model A classification or prediction model that is not linear in its parameters. This does not usually include models that can be transformed or re-expressed as a linear model. For instance, the model y = α + βx^δ is non-linear in α, β, and δ, while y = βx^δ is not intrinsically non-linear, as it can be rewritten as log(y) = log(β) + δ log(x).
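
The re-expression of y = βx^δ as a linear model can be illustrated with a short sketch: take logarithms of both variables, fit a straight line, and transform the intercept back. The function name `fit_power_law` is an assumption for this example, and the sketch requires strictly positive x and y values.

```python
import math


def fit_power_law(xs, ys):
    """Estimate beta and delta in y = beta * x**delta via a log-log line fit."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    delta = sxy / sxx                            # slope of the log-log line
    beta = math.exp(mean_y - delta * mean_x)     # intercept, transformed back
    return beta, delta


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x ** 1.5 for x in xs]   # noise-free data with beta=2, delta=1.5
print(fit_power_law(xs, ys))
```

No such transformation exists for y = α + βx^δ, which is why that form needs an iterative non-linear fit.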

Nonlinear Principal Components Analysis (NLPCA) NLPCA is a nonlinear extension of principal components analysis and can be used for data reduction or compression and for autoassociative model development. It is generally implemented as a multilayer feed-forward auto-associative neural network with five layers. The network has input and output layers, two “dense” inner layers with sigmoidal activation functions, and a sparse linear layer in the middle. The first sigmoidal layer models any complex associations in the input, the linear layer acts as a bottleneck to reduce these complex associations to a small number of linear components, and the second sigmoidal layer expands the linear components into a model for the output data, which, in this case, is the input data.

Nonlinearity A descriptor applied to functions or systems where the output or response is not proportional to the input or impulse driving the system over the (whole) range of system inputs. In a range where the system yields a comparatively small response to large changes in the input, the system can be thought of as “damping” the input. Conversely, if the system yields a comparatively large change in response to small changes in the input, the system is amplifying the input. An example of a linear system would be the relationship y=b*x. In this case the change in the output is proportional to the change in the input. A simple nonlinear system would be y=x*x, whose local slope is 2x. When x is in the range (-0.5, 0.5), the change in y for a small change in x is smaller than the change in x (i.e., damped), while elsewhere the change is larger (i.e., amplified).

Nonmonotone Logic Systems with default reasoning capability can draw conclusions that use assumed premises to compensate for incomplete information. These reasoning methods usually have a nonmonotone property, such that the addition of further information to the system can cause it to revise or abandon conclusions that were held prior to the addition of that data. This is in contrast to standard logic, where a conclusion based on a set of premises still holds when further premises are added. For instance, in mathematics, if a conclusion is known to hold when a given set of assumptions holds, then further assumptions (that agree with the earlier ones) can only strengthen the conclusion or allow you to derive more restrictive conclusions. The original conclusions, however, still hold. By contrast, a nonmonotone logic (e.g., a probabilistic one) might drop a conclusion when further information is added. For instance, in reasoning about a series of games, such as a football or baseball season, one might conclude early in the season that a certain team was best. As the season progressed, this conclusion might be abandoned or disproved by the results of later games.

Nonmonotonic Reasoning Reasoning techniques that allow conclusions to be withdrawn as additional evidence is obtained.

Nonparametric A procedure is nonparametric if it does not depend on “simple” parametric forms in the data, such as Gaussian, binomial, or Poisson distributions. Normally, a nonparametric procedure relies on the data to form the distribution, and derives its properties from the way the data was sampled (selected) and the way in which it is manipulated (e.g., classification trees, randomization tests). This makes the procedure more robust, but less efficient when a parametric distribution is reasonable. For instance, a tree-based classifier can outperform a logistic regression when the data are not linear, but will be less efficient when the data do meet the linearity assumptions of a logistic regression. It is important to note that nonparametric procedures effectively treat the data as the distribution, so that the base distribution involved is the observed data distribution, which is a form of tabular distribution. These procedures are sometimes called “distribution free,” again meaning “doesn’t use a simple parametric form,” rather than “doesn’t have any distribution.”

Nonterminal Symbol A symbol in a grammar that can be rewritten into further symbols when processing a statement.

Non-Von Neumann architecture A family of massively parallel computer architectures developed at Columbia University in the mid-1980s. The systems are characterized by a special form of “active memory” consisting of many elements, each with a small amount of local memory, a specialized CPU, I/O, and switches that allow the machine to be dynamically reconfigured.


Normalization Normalization generally refers to the process of dividing a data set by the square root of its sum of squares, so that the data set has a length of one. With respect to probability scores, normalization refers to the process of dividing the scores by their sum, so that the scores total one. In the context of neural networks, normalization refers to the process of rescaling a vector to the (0, 1) range.
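
The three senses of normalization can be compared side by side in a short sketch. The function names are hypothetical, and the min-max formula used for the neural network case is one common way of rescaling to the unit range, assumed here for illustration.

```python
import math


def unit_length(values):
    """Divide by the square root of the sum of squares (length becomes one)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def sum_to_one(scores):
    """Divide the scores by their sum (total becomes one)."""
    total = sum(scores)
    return [s / total for s in scores]


def rescale_unit_range(values):
    """Min-max rescaling: smallest value maps to 0, largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


print(unit_length([3.0, 4.0]))
print(sum_to_one([1.0, 1.0, 2.0]))          # [0.25, 0.25, 0.5]
print(rescale_unit_range([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```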

Normalized Mutual Information Normalized mutual information is a variant of the entropy measure used in building classification trees. Entropy is known to favor multi-way splits, and normalized mutual information penalizes multi-way splits by dividing the gain from a split by the logarithm of the number of leaves in the split.

Normalized Radial Basis Function (NRBF) A normalized radial basis function (NRBF) network is a radial basis function (RBF) network in which the outputs of the basis functions are normalized so that they sum to one across the nodes.

Not Applicable Data Some attributes are not meaningful for all members of a universe and are labeled as not applicable. An example would be the parity of a person (the number of births), which is only meaningful for adult females. Attributes that are only relevant for subsets of the data can cause substantial difficulty for Machine Learning and Data Mining algorithms that are not designed to handle them.

NPC An expert system developed by Digital Equipment Corporation (DEC) for troubleshooting DECnet-based computer networks.


NRBFEQ A normalized radial basis function (NRBF) with equal heights and widths on each node. The NRBFEQ architecture is a smoothed variant of the learning vector quantization (LVQ) and counter propagation architectures.

NRBFUN A normalized radial basis function (NRBF) with unequal heights and widths on each node.

Nuclear Power Plant Consultant (NPPC) The Nuclear Power Plant Consultant (NPPC) is an expert system that assists nuclear plant operators in determining the causes of abnormal events.

NUDGE A frame-based front end for planning and scheduling algorithms. It takes incomplete and possibly contradictory scheduling requests and attempts to complete and reconcile them.

Null Generally, a null is a symbol representing an empty set or a zero (i.e., a symbol for “nothing”). In LISP, a null is an S-expression that is both an atom and a list (e.g., “()”).


O

Object Oriented Language A computer language that treats data and data structures as objects, which can send and receive commands or messages and act upon them. This differs from traditional procedural languages, where results are obtained through a series of procedures that are applied to the data.

Object-Oriented Programming An object-oriented problem-solving approach is similar to the way a human solves problems. It consists of identifying objects and the correct sequence in which to use these objects to solve the problem. In other words, object-oriented problem solving consists of designing objects whose individual behaviors and interactions solve a specific problem. Interactions between objects take place through the exchange of messages, where a message to an object causes it to perform its operations and solve its part of the problem. The object-oriented problem-solving method thus has four steps: 1) identify the problem; 2) identify the objects required for the solution; 3) identify the messages to be sent to the objects; and 4) create a sequence of messages to the objects that solves the problem. In an object-oriented system, objects are data structures used to represent knowledge about physical things (e.g., pumps, arteries, computers, any equipment) or concepts (e.g., plans, requirements, designs). They are typically organized into hierarchical classes, and each class of object has information about its attributes stored in instance variables associated with each instance in the class. The only thing that an object knows about another object is that object’s interface. Each object’s data and logic are hidden from other objects. This allows the developer to separate an object’s implementation from its behavior. This separation creates a “black-box” effect where the user is isolated from implementation changes. As long as the interface remains the same, any changes to the internal implementation are invisible to the user. Objects provide considerable leverage in representing the world in a natural way and in reusing code that operates on common classes of objects.
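
The ideas of hidden state, interfaces, and message passing can be illustrated with a small sketch. The `Pump` and `Tank` classes and all of their methods are hypothetical, invented only for this example.

```python
class Pump:
    """A pump hides its state; other objects see only its methods (its interface)."""

    def __init__(self, litres_per_minute):
        self._litres_per_minute = litres_per_minute
        self._running = False

    def start(self):
        self._running = True

    def stop(self):
        self._running = False

    def flow(self):
        # Report the current flow without exposing how it is stored.
        return self._litres_per_minute if self._running else 0.0


class Tank:
    def __init__(self, capacity_litres):
        self._capacity = capacity_litres
        self._level = 0.0

    def fill_from(self, pump, minutes):
        # The tank sends the pump a "flow" message; it never reads pump internals.
        self._level = min(self._capacity, self._level + pump.flow() * minutes)

    def level(self):
        return self._level


# Step 4 of the method: a sequence of messages that solves the problem.
pump = Pump(litres_per_minute=10.0)
tank = Tank(capacity_litres=100.0)
pump.start()
tank.fill_from(pump, minutes=5)
pump.stop()
print(tank.level())  # 50.0
```

Because `Tank` relies only on the `flow` method, the way `Pump` stores its rate could change without any change to `Tank`.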


Occam Algorithm An Occam Algorithm is a general structure for applying Probably Approximately Correct (PAC) models. It has two basic steps. First, draw a random sample from the target distribution; then return all the rules that are consistent with the concepts in the sample. The sample size should be large in relation to the number of attributes and the set of possible rules.

OCEAN An expert system developed by Teknowledge for internal use by NCR. Akin to XCON, it checks system configurations.

OCEAN SURVEILLANCE A rule-based expert system that is being developed for the United States Navy to identify and track remotely sensed naval vessels.

OCSS An expert system that helps chemists in synthesizing complex organic molecules.

Odds Ratio The odds ratio is the ratio of the probability for a proposition to the probability against the proposition, P(A)/(1-P(A)). A value less than one indicates that the complement is more probable, while a value greater than one indicates that the proposition is more probable. The odds ratio for a proposition is commonly used when assessing probabilistic data.
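
The formula P(A)/(1-P(A)) and its inverse can be written as two small functions. The names `odds` and `probability` are assumptions for this illustration; a result above one corresponds to P(A) above 0.5.

```python
def odds(p):
    """Odds in favor of a proposition with probability p (0 <= p < 1)."""
    return p / (1.0 - p)


def probability(odds_value):
    """Recover the probability from the odds."""
    return odds_value / (1.0 + odds_value)


print(odds(0.75))        # 3.0, i.e., "3 to 1" in favor
print(odds(0.2))         # below one: the complement is more probable
print(probability(3.0))  # 0.75
```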

Off-line Training Iterative learning techniques that process the whole learning set as a batch and use the combined error to adjust the estimates of the model for the next iteration of the training process.

Online Analytical Processing (OLAP) OLAP is an approach to the analysis of data warehouses. OLAP tools focus on supporting multidimensional analysis and on simplifying and supporting the data analytic process carried out by people. This is in contrast to a Knowledge Discovery in Databases (KDD) approach, which tries to automate as much of the process as possible.

Figure 27. OLAP. Source: Image by OLAP.com.

Online Training Machine Learning techniques that continuously update their estimates with each new observation are using on-line training. Generally, the learning algorithm can be re-expressed as a series of difference equations, so that the calculations are simple and quick.
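
A running mean is perhaps the simplest case of such a difference equation: the new estimate equals the old estimate plus a correction that depends only on the latest observation. The class name `RunningMean` is an assumption made for this sketch.

```python
class RunningMean:
    """Update an estimate of the mean one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        # Difference equation: mean_n = mean_(n-1) + (x - mean_(n-1)) / n
        self.mean += (x - self.mean) / self.n
        return self.mean


estimate = RunningMean()
for value in [2.0, 4.0, 6.0]:
    print(estimate.update(value))  # 2.0, then 3.0, then 4.0
```

No past observations need to be stored, in contrast to off-line (batch) training, which processes the whole learning set in each iteration.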

ONCOCIN ONCOCIN is a rule-based expert system that advises physicians on treatment protocols for lymphoma and Hodgkin’s disease. Written in INTERLISP at Stanford University, the program uses both backward and forward chaining.

Ontolingua Ontolingua is a set of computer tools built around the Knowledge Interchange Format (KIF) that facilitate the construction, translation, and analysis of computer ontologies.


Ontology A formal ontology is a rigorous specification of a set of specialized vocabulary terms and their relationships, sufficient to describe and reason about the range of situations of interest in some domain. In other words, it is a conceptual representation of the entities, events, and relationships that make up a particular domain. Two primary relationships of interest are abstraction (“a cat is a specific instance of a more general entity called animal”) and composition (“a cat has whiskers and claws”). Ontologies are normally used to model a domain of interest, allowing inferential and deductive reasoning by learning systems.

Ontology An ontology is a particular theory or model about the nature of a domain of objects and the relationships among them. Any knowledge model has an implicit or explicit ontology. A formal ontology comprises a set of terms, their definitions, and axioms (a priori rules) that relate the terms. The terms are normally organized into some type of taxonomy. Axioms represent the relationships between the terms and can specify constraints on the values and uses of terms.

Ontology Markup Language (OML)

Figure 28. Ontology language XOL used for cross-application communication. Source: Image by IntechOpen.

OML is an eXtensible Markup Language (XML) application that extends the SHOE effort to add knowledge representation to the World Wide Web (WWW) into a full XML DTD. It also differs from SHOE in that OML files are intended to be separate from the HyperText Markup Language (HTML) pages to which they refer. OML also forms a subset of Conceptual Knowledge Markup Language (CKML), which supports richer knowledge representation capabilities.


Operating System A program that manages a computer’s software and hardware components. This program schedules the operation of other programs and offers a uniform interface to the hardware. Examples include UNIX, Windows NT, and DOS.

Operational Definition A definition of a concept or method that provides exact instructions on how to observe, measure, or implement the idea. For instance, although probability is formally defined as a measure on a set that maps the set onto [0, 1], an operational definition might specify probability as a function of the odds that would make a specific bet fair.

Operationalization The process of transforming an abstract specification of a process into a set of specific, concrete steps to perform the process.

Operator A function or procedure to transform a problem or program state into another, generally simpler problem or program state.

OPM A blackboard-based planning system. It attempts to plan sequences of tasks that satisfy conflicting goals and constraints. Given a list of tasks, their dependencies, constraints, and priorities, it attempts to find possible solutions to the problem.

Opportunistic Search A search method used by systems that do not have a fixed approach to solving a problem. At various stages throughout the problem-solving process, these systems can reconsider their strategy for solving the problem.

OPS5 A language for building expert systems. It stores knowledge in the form of if-then rules and supports a wide variety of control structures and an efficient forward chaining interpreter.


OPS83 A derivative of OPS5, this is a compiled language for use in production expert systems. It also allows procedural programming in the form of a PASCAL-like language. It was developed at Carnegie-Mellon University and runs on a variety of computers.

Optical Character Recognition (OCR)

Figure 29. Optical character recognition (OCR). Source: Image by www.ad-ins.com.

The process of translating the image of an item containing text into a character-based representation of the image. A distinction is generally made between optical character recognition (OCR) and image character recognition (ICR). In this sense, optical character recognition refers to a real-time process, with the recognition engine receiving input directly from an optical sensor, while image character recognition indicates that the recognition engine is working from a stored image and is not necessarily operating in real time.

Optical Flow The pattern of movement of image features. A technique that is used in Motion Analysis and image understanding.


Figure 30. Use of optical flow technique for different sequences. Source: Image by Wikimedia commons.

Optimal Factoring Problem Probabilistic Networks offer a very flexible way to represent uncertainty and its propagation through attributes. However, it can be very expensive to compute marginal and joint probabilities from large and complex nets. One solution is to find an optimal factoring of the network for a given set of target nodes in the network that minimizes the cost of computing the target probabilities.

Optimal Solution A solution that is known to be best according to some criterion; e.g., an optimal cost solution would be the one that is cheapest, while an optimal time solution is the one that is fastest, but not necessarily cheapest.

Ordinal Attribute An ordinal attribute is one that takes on values whose order has external meaning but whose specific values or differences between values do not have meaning. An example is an unanchored rating by a person on a nine-point scale. Assigning one object a value of three and another a value of six does not imply that the second object has twice the value of the first. It only implies that it has more. By extension, one can also infer that the latter has more than any other objects rated as 1, 2, 4, or 5 by the same person, and less than objects rated as 7, 8, or 9.


The results of operations other than order-specific counting (e.g., the number of cases with a score less than three) and proportion-level operations depend on the scaling, which is arbitrary. Applying techniques intended for continuous measures to ordinal variables usually produces misleading or silly results.

Ordinary Least Squares (OLS) An ordinary least squares (OLS) function uses the sum of the squared deviations between the fitted and observed values as its minimization criterion. This is equivalent to minimizing the Euclidean distance between the fitted and observed values. Variants include weighted least squares, which weights the individual squared differences according to a set of weights; Least Absolute Deviations (LAD), which minimizes the absolute differences; and general Lp criteria, where the p-th power of the absolute difference is minimized. OLS and LAD are L2 and L1, respectively. OLS is known in the neural network literature as “least mean squares.” The same acronym is also used in the neural network literature for Orthogonal Least Squares, a technique for forward stepwise selection in Radial Basis Function (RBF) networks. The latter technique starts with a large set of candidate points and selects a subset that is important for predictions.
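
The L2 (OLS) and L1 (LAD) criteria can be compared on a toy data set. The sketch below only evaluates the criteria for two constant fits; it is not a fitting routine, and the function name `lp_criterion` is an assumption for this example.

```python
def lp_criterion(observed, fitted, p):
    """Sum of the p-th powers of the absolute differences (p=2 is OLS, p=1 is LAD)."""
    return sum(abs(o - f) ** p for o, f in zip(observed, fitted))


observed = [1.0, 2.0, 9.0]
mean_fit = [4.0] * 3     # constant fit at the mean of the observations
median_fit = [2.0] * 3   # constant fit at the median of the observations

print(lp_criterion(observed, mean_fit, 2), lp_criterion(observed, median_fit, 2))  # 38.0 50.0
print(lp_criterion(observed, mean_fit, 1), lp_criterion(observed, median_fit, 1))  # 10.0 8.0
```

The mean gives the smaller L2 value and the median the smaller L1 value, which shows how the choice of criterion changes which fit counts as best.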

Orthographic Projection A method of representing a three-dimensional object in a two-dimensional space (i.e., on paper or a screen). A point (x, y, z) in three-dimensional space is represented by a (scaled) point (x, y) in two-dimensional space, instead of being foreshortened by a perspective transformation. This preserves distances within planes parallel to the projection plane, up to the scale factor.

Orthoplanner Orthoplanner is a knowledge-based system that assists dentists with orthodontic treatment plans for cases where fixed orthodontic appliance techniques should be used. It uses a number of techniques, including rule-based reasoning, backward chaining, forward chaining, and fuzzy logic-based representations of orthodontic knowledge.


OTTER OTTER is an automated deduction system designed to prove theorems stated in first-order logic with equality. Otter’s inference rules are based on resolution and paramodulation, and it includes facilities for term rewriting, term orderings, Knuth-Bendix completion, weighting, and strategies for directing and restricting searches for proofs. Otter can also be used as a symbolic calculator and has an embedded equational programming system. Otter is a fourth-generation Argonne National Laboratory deduction system whose ancestors (dating from the early 1960s) include the TP series, NIUTP, AURA, and ITP. Currently, the main application of Otter is research in abstract algebra and formal logic.

Outlier A record or observation that has data value(s) outside the expected or normal range. A simple form arises when a single attribute is “out of range,” but other forms can arise when combinations of values are individually valid but jointly unusual. The implication of calling a record an outlier is that the values are correct, but that they do not fit the current model for the data. A main task in data cleaning is identifying unusual attribute values and distinguishing between those that are simply in error and those that are correct but unusual.

Out-of-Sample Testing Out-of-sample testing is a general name for split-sample and related techniques for giving “honest” estimates of the errors produced by an induction/learning technique. Two samples are drawn from the data. The (usually) larger of the two is designated the “training” set, and the other the “validation” set. Models developed on the training set are then applied to the validation set to estimate the error of the models.
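
A split-sample evaluation can be sketched as follows. The helper names, the 75/25 split, and the deliberately crude “model” (predicting the mean training response) are all assumptions made for this illustration.

```python
import random


def split_sample(records, train_fraction=0.75, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(predict, records):
    return sum((predict(x) - y) ** 2 for x, y in records) / len(records)


records = [(x, 2.0 * x + 1.0) for x in range(8)]
training, validation = split_sample(records)

# "Model": predict the mean response seen in the training set.
train_mean = sum(y for _, y in training) / len(training)


def predict(x):
    return train_mean


print(mean_squared_error(predict, training))    # in-sample error
print(mean_squared_error(predict, validation))  # out-of-sample ("honest") error
```

Only the second figure estimates how the model would behave on data it has not seen.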

Output Smearing Recent research has focused on methods for combining predictors such as classification trees or neural networks. Generally, multiple training sets are created and a predictor is developed on each set. The predictions can then be combined using techniques such as arcing/boosting (ADABOOST) or Bootstrap AGGregation (bagging). Output smearing and output flipping offer an alternative way of producing multiple training sets by keeping the same feature vectors but perturbing the output. When the output is continuous or has been encoded in a set of binary vectors, new training sets are made by adding small amounts of Gaussian noise to the response. Output flipping is a similar technique for categorical responses, where the labels are randomly flipped or exchanged so as to maintain the same relative proportions of the classes.
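
Both perturbations can be sketched in a few lines. The function names are hypothetical, and exchanging labels between randomly chosen pairs of cases is a simplification assumed here because it keeps the class proportions exactly unchanged; published flipping schemes may choose the labels to change differently.

```python
import random


def smear_outputs(responses, noise_sd, seed=0):
    """Add Gaussian noise to continuous responses; features are left untouched."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, noise_sd) for y in responses]


def flip_outputs(labels, n_swaps, seed=0):
    """Exchange labels between random pairs of cases (class counts are preserved)."""
    rng = random.Random(seed)
    flipped = list(labels)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(flipped)), 2)
        flipped[i], flipped[j] = flipped[j], flipped[i]
    return flipped


responses = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "a", "b", "b", "c"]

# Each seed yields a different perturbed training set for one ensemble member.
for seed in range(3):
    print(smear_outputs(responses, noise_sd=0.1, seed=seed))
    print(flip_outputs(labels, n_swaps=2, seed=seed))
```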

Overfitting Overfitting is a term used in neural networks, recursive partitioning, and other automated modeling areas. If the training data is considered to contain both signal and noise (i.e., noisy data), a modeling technique has begun to overfit when it starts capturing the “noise” instead of the “signal.” This generally arises when a model is allowed to increase its number of “parameters,” such as regression coefficients, splits in recursive partitioning schemes, or hidden units in neural networks. The effect of overfitting is to limit the applicability of the model to other data sets (i.e., to limit its generalizability). In the extreme, the model can only “predict” its input. The most straightforward way to reduce this effect is to require a much larger number of observations per parameter in the model.

Overlap In fuzzy logic, the overlap of a fuzzy set A is the fuzzy intersection of set A with its complement. In classical set theory, this intersection would be empty. In fuzzy logic it is the fuzzy set defined by the element-wise minimums of the membership function m(A, x) and (1-m(A, x)).

OWL A knowledge-engineering language for frame-based representations. It uses a semantic net taxonomy and was implemented in LISP at MIT.

Own Slot An own slot is a slot in a class that is used to define properties that pertain to the class itself, as opposed to any specific instance of the class.


Object Detection It is basically a type of image segmentation task where the goal is to recognize one or more real-world objects in a digital image or video and confine them within the image. With progress in deep learning, object detection has been broadly adopted in manifold areas. Applications comprise navigation in autonomous vehicles, crowd monitoring through quality control and video analytics in manufacturing processes.

Open Source Platform It basically refers to any program whose source code is made available for use or modification by other developers or users. An open source platform is generally developed as a public collaboration and made available freely. An instance of an artificial intelligence open source platform is Google’s TensorFlow®. There are several licensing models for open source, which enforce agreeing organizations to proliferate open standards. Linux is one of the great successes of the open source movement.

Optical Character Recognition Converts images of typed, printed, or handwritten text into machine-encoded text. Sources can include a scanned document, a photo of a document, a scene photo, or even text superimposed on an image. Google™ optical character recognition, for instance, can detect and convert text in more than 248 languages.

Oversegmentation A process through which objects being segmented from images are themselves broken down into separate fragments. The goal in oversegmentation is not to segment an image into distinct high-level objects, but into smaller “superpixel” regions that are roughly uniform in texture and color. Python is widely used as a programming language for oversegmentation.


P

Pandemonium One of the earliest attempts (1959) at Machine Learning, using an approach that is similar to a neural network. Lower-order demons learned to extract features in the input data, which were passed up to higher-order demons, and so on.

Paradigm A point of view or means of approaching problems. Particular tools may be suitable for one paradigm but not for another.

Parallel Coordinate Visualization A visualization technique mainly used in Data Mining, clustering, and related fields. A database with many attributes is represented as a two-dimensional figure with multiple parallel axes along one dimension. Any point in the database is represented as a line connecting its coordinates. Groups of similar points show up as bands of lines.

Parallel Processing

Figure 31. Parallel processing. Source: Image by Safe Software.


A computer architecture that comprises (or emulates) multiple CPUs. This allows multiple programs to run concurrently and/or multiple programs to attack the same problem concurrently.

Parameter Learning A learning method used when a fixed functional form of the solution is assumed and only the value of a parameter B, possibly vector-valued, needs to be specified. The parameter B is estimated from a set W of training cases and chosen to minimize some criterion, such as mean squared error.
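As a minimal sketch of this idea, assume the fixed functional form is a straight line, f(x) = b0 + b1 * x, so that the parameter B is the vector (b0, b1). The function names and the training cases in W below are hypothetical and chosen only for illustration; B is picked to minimize mean squared error using the closed-form least-squares solution.

```python
def learn_parameter(W):
    """Estimate B = (b0, b1) for the assumed form f(x) = b0 + b1 * x.

    W is a list of (x, y) training cases. The returned B minimizes the
    mean squared error over W (ordinary least squares for a line).
    """
    n = len(W)
    mean_x = sum(x for x, _ in W) / n
    mean_y = sum(y for _, y in W) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in W)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in W)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def mean_squared_error(B, W):
    """The criterion being minimized: average squared residual over W."""
    b0, b1 = B
    return sum((b0 + b1 * x - y) ** 2 for x, y in W) / len(W)

# Hypothetical training cases that lie exactly on y = 1 + 2x.
W = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
B = learn_parameter(W)
print("B =", B, " MSE =", mean_squared_error(B, W))
```

Only the numbers in B are learned; the shape of the solution (a line) was fixed in advance, which is what distinguishes parameter learning from methods that also search over functional forms.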

Parametric Bayesian Networks Many uses of Bayesian networks, specifically for binary and multinomial data, use the observed conditional distributions to propagate probability through the network. A parametric Bayesian network replaces these observed distributions with some parametric form (e.g., a logistic function).
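The following sketch shows what such a replacement might look like for a three-node network in which two binary parents (rain, sprinkler) point to one binary child (wet grass). The network, the priors, and the logistic weights are all hypothetical values chosen for illustration. Instead of a conditional probability table with one observed entry per parent configuration, the child's distribution is given by a logistic function of its parents.

```python
import math
from itertools import product

def logistic(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical prior probabilities for the two binary parent nodes.
P_RAIN = 0.2
P_SPRINKLER = 0.4

# Hypothetical logistic parameters: intercept and one weight per parent.
W0, W_RAIN, W_SPRINKLER = -3.0, 4.0, 2.5

def p_wet_given(rain, sprinkler):
    """P(wet = 1 | rain, sprinkler) from the parametric form, not a table."""
    return logistic(W0 + W_RAIN * rain + W_SPRINKLER * sprinkler)

def p_wet():
    """Marginal P(wet = 1), obtained by summing over parent configurations."""
    total = 0.0
    for rain, sprinkler in product((0, 1), repeat=2):
        p_rain = P_RAIN if rain else 1 - P_RAIN
        p_sprinkler = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
        total += p_rain * p_sprinkler * p_wet_given(rain, sprinkler)
    return total

print("P(wet | rain only) =", p_wet_given(1, 0))
print("P(wet)             =", p_wet())
```

With two parents the table would have four entries and the logistic form has three parameters, so the saving is small here; the difference grows quickly as the number of parents increases.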

Parametric Distribution A probability function that is completely specified by a mathematical function and a “few” numbers (parameters). Common examples include the normal, binomial, and Poisson distributions. The normal distribution needs a mean and a variance parameter, the Poisson needs a mean parameter, and the binomial needs both a size and a probability parameter.
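The three examples named above can be written out directly, which makes the role of the parameters concrete. This is a small sketch using only the standard library; the function names are chosen for illustration.

```python
import math

def normal_pdf(x, mean, variance):
    """Normal density: fully specified by a mean and a variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def poisson_pmf(k, mean):
    """Poisson probability of k events: fully specified by a mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def binomial_pmf(k, size, prob):
    """Binomial probability of k successes: specified by size and probability."""
    return math.comb(size, k) * prob ** k * (1 - prob) ** (size - k)

print("normal   pdf at 0   (mean=0, variance=1):", normal_pdf(0.0, 0.0, 1.0))
print("poisson  pmf at k=0 (mean=2):            ", poisson_pmf(0, 2.0))
print("binomial pmf at k=3 (size=10, prob=0.5): ", binomial_pmf(3, 10, 0.5))
```

In each case, once the one or two parameters are fixed, the probability of every possible outcome is determined by the formula.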

Paramodulation An inference rule that can be used in automated reasoning systems. Paramodulation always operates on two clauses, requiring that one of the clauses contain at least one literal asserting the equality of two expressions.

PARRY A program that imitates a paranoid personality. Like Eliza, it is based on a small word set and simple rules but seems to reply like an intelligent person.


Parsing The act of breaking down a sentence and recognizing its components. The set of acceptable sentences is described by the grammar of the language that the parser is intended to accept.
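A toy example may help make this concrete. The grammar and lexicon below are hypothetical and deliberately tiny (S → NP VP, NP → Det N, VP → V NP | V); the sketch is a simple recursive-descent parser that returns the components of an acceptable sentence as a nested tuple and returns None for anything the grammar does not describe.

```python
# Hypothetical lexicon: each word category and the words it accepts.
LEXICON = {
    "Det": {"the", "a"},
    "N": {"dog", "cat"},
    "V": {"chased", "slept"},
}

def _word(tokens, pos, category):
    """Match one word of the given category at position pos."""
    if pos < len(tokens) and tokens[pos] in LEXICON[category]:
        return (category, tokens[pos]), pos + 1
    return None

def _noun_phrase(tokens, pos):
    """NP -> Det N"""
    det = _word(tokens, pos, "Det")
    if det is None:
        return None
    noun = _word(tokens, det[1], "N")
    if noun is None:
        return None
    return ("NP", det[0], noun[0]), noun[1]

def _verb_phrase(tokens, pos):
    """VP -> V NP | V"""
    verb = _word(tokens, pos, "V")
    if verb is None:
        return None
    obj = _noun_phrase(tokens, verb[1])
    if obj is not None:
        return ("VP", verb[0], obj[0]), obj[1]
    return ("VP", verb[0]), verb[1]

def parse(sentence):
    """S -> NP VP. Returns a parse tree, or None if the sentence is rejected."""
    tokens = sentence.lower().split()
    subject = _noun_phrase(tokens, 0)
    if subject is None:
        return None
    predicate = _verb_phrase(tokens, subject[1])
    if predicate is None or predicate[1] != len(tokens):
        return None  # leftover or missing words: not in the language
    return ("S", subject[0], predicate[0])

print(parse("the dog chased a cat"))
print(parse("dog the chased"))
```

The set of sentences for which `parse` returns a tree is exactly the set described by the toy grammar; a real parser differs mainly in the size of the grammar and in how it handles ambiguity.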

Partial Least Squares (PLS) A technique used in Machine Learning when linear or multiple regression techniques are inadequate. Multiple Linear Regression (MLR) techniques work well when there are a (comparatively) small number of independent (orthogonal) attributes and the focus is on understanding the relationship between the attributes and the target variable. However, in many Machine Learning situations, the attributes are highly collinear and the goal is prediction rather than understanding the implications of individual coefficients. In this case, MLR methods may not be adequate or efficient. PLS offers an alternative means to address this situation. The predictor attributes and the target variable(s) are projected into a lower-dimensional space, where the regression is solved. The projections are selected so that the target dimensions and the attribute dimensions have strong pairwise associations. The results are then projected back into the original attribute measurement space.
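The projection steps can be sketched for the simplest case: a single target variable and a single PLS component. The function names and the data are assumptions made for illustration. The two attributes are perfectly collinear (the second is twice the first), so ordinary multiple regression has no unique solution, yet the one-component projection still yields usable coefficients in the original attribute space.

```python
import math

def pls1_one_component(X, y):
    """One-component PLS regression for a single target variable.

    Returns (coef, intercept) expressed in the original attribute space.
    """
    n, p = len(X), len(X[0])
    # Centre the attributes and the target.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the attribute-space direction with the greatest
    # covariance with the target.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Project each case onto that direction (the lower-dimensional space).
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Solve the regression of the target on the projected scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    # Project the result back into the original attribute space.
    coef = [q * wj for wj in w]
    intercept = y_mean - sum(coef[j] * x_mean[j] for j in range(p))
    return coef, intercept

def predict(coef, intercept, row):
    return intercept + sum(c * v for c, v in zip(coef, row))

# Hypothetical data: x2 = 2 * x1 (perfect collinearity), y = 3 * x1 + 1.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
y = [4.0, 7.0, 10.0, 13.0, 16.0]

coef, intercept = pls1_one_component(X, y)
print("coefficients:", coef, " intercept:", intercept)
print("prediction for [6, 12]:", predict(coef, intercept, [6.0, 12.0]))
```

Full PLS implementations extract several components in turn, deflating the data after each one, and handle multiple target variables; this sketch stops after the first component to keep the projection and back-projection steps visible.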

Partially Ordered Set A partially ordered set (or poset) is a set taken together with a partial order on it. Formally, a partially ordered set is defined as an ordered pair P = (X,